Résumé

Ultrascale computing and bioinformatics are two rapidly growing fields with a big impact right now and even more so in the future. The introduction of next generation sequencing pushes current bioinformatics tools and workflows to their limits in terms of performance. This forces the tools to become increasingly performant to keep up with the growing speed at which sequencing data is created. Ultrascale computing can greatly benefit bioinformatics in the challenges it faces today, especially in terms of scalability, data management and reliability. But before this is possible, the algorithms and software used in the field of bioinformatics need to be prepared to be used in a heterogeneous distributed environment. For this paper we choose to look at sequence alignment, which has been an active topic of research to speed up next generation sequence analysis, as it is ideally suited for parallel processing. We present a multilevel stream based parallel architecture to transparently distribute sequence alignment over multiple cores of the same machine, multiple machines and cloud resources. The same concepts are used to achieve multithreaded and distributed parallelism, making the architecture simple to extend and adapt to new situations. A prototype of the architecture has been implemented using an existing commercial sequence aligner. We demonstrate the flexibility of the implementation by running it on different configurations, combining local and cloud computing resources.

Détails

Actions