A suffix array is a data structure that sorts all suffixes of a text string in order to allow efficient search for substrings. There are several data structures to represent suffix arrays, as well as derived structures like the Burrows-Wheeler transformation. Suffix arrays and derived data structures are often used in compression algorithms and in bioinformatics applications, where they represent DNA sequences. In the bioinformatics scenario of short read mapping, the latency of memory operations in a data- dependent random access pattern is the restricting performance factor. A recent research paper proposed a method to trade-off required memory size for number of memory access operations. However, the current work only presents one particular design point on a single hardware platform.
We have a wide range of CPU and FPGA systems, where this trade- off can be explored further. A particularly interesting platform is the recently released hybrid memory cube (HMC) manufactured by world leading memory manufacturer Micron. In this thesis, you will design a scalable short read mapping implementation with suffix-arrays and integrate it into a state-of-the-art bioinformatics application and explore it on the HMC and other hardware platforms. You can contribute to leading-edge research on the algorithmic side and work with brand new hardware systems.
- Memory-scalable C/C++ implementation of short read matching
- Port to hardware with OpenCL or high-level synthesis
- Explore memory trade-offs
- Solid programming background (C/C++)
- Experienced Linux user
- Interest in new hardware