Ultra Scalable Framework for DNA Search, Alignment, Assembly

The MetaGraph framework indexes and analyzes very large biological sequence collections, producing compressed indexes that can represent several petabases of input data. The indexes can be queried efficiently with any sequence of interest. Read more in the paper preprint.

Drawing on raw sequencing data available in public archives such as SRA or ENA, MetaGraph makes this treasure trove of information directly accessible for full-text search, helping to discover whether any given sequence has been observed before and, if so, in which context.

The feature-rich API supports both exact k-mer matching and inexact search (alignment). Search results are linked to the annotations stored in the index for each match, providing information such as the sample source or other associated metadata.
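
To make the idea of annotated exact k-mer matching concrete, here is a minimal toy sketch in Python. It is not MetaGraph's API or its data structure: it uses a plain dictionary instead of a compressed index, and the function names (`kmers`, `build_index`, `query`), the sample labels, and the sequences are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k from seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples, k):
    """Map each k-mer to the set of sample labels (annotations) containing it.

    Toy stand-in for an annotated index; a real index would be compressed.
    """
    index = defaultdict(set)
    for label, seq in samples.items():
        for kmer in kmers(seq, k):
            index[kmer].add(label)
    return index


def query(index, seq, k):
    """Return, per label, the fraction of the query's k-mers found under it."""
    query_kmers = list(kmers(seq, k))
    if not query_kmers:
        return {}  # query shorter than k: nothing to match
    hits = defaultdict(int)
    for kmer in query_kmers:
        # .get avoids inserting empty entries into the defaultdict
        for label in index.get(kmer, ()):
            hits[label] += 1
    return {label: n / len(query_kmers) for label, n in hits.items()}


# Hypothetical input data, for illustration only.
samples = {"sample_A": "ACGTACGTGA", "sample_B": "TTGACGTTCC"}
index = build_index(samples, k=4)
result = query(index, "ACGTAC", k=4)
print(result)
```

In this sketch, every k-mer of the query occurs in `sample_A`, while only one of its three k-mers occurs in `sample_B`, so the labels returned with the match indicate in which context the sequence was observed. Inexact search (alignment) would additionally tolerate mismatches and gaps, which this toy example does not attempt.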