Key publications describing the MetaGraph framework, compression algorithms, and alignment methods.
If you are using MetaGraph or the index resources for your work, please cite:
Karasikov M, Mustafa H, Danciu D, Kulkov O, Zimmermann M, Barber C, Rätsch G, Kahles A. Efficient and accurate search in petabase-scale sequence repositories. Nature. 2025;647:1036–1044. https://www.nature.com/articles/s41586-025-09603-w
@article{karasikov2025metagraph,
title={Efficient and accurate search in petabase-scale sequence repositories},
author={Karasikov, Mikhail and Mustafa, Harun and Danciu, Daniel and Kulkov, Oleksandr and Zimmermann, Marc and Barber, Christopher and R{\"a}tsch, Gunnar and Kahles, Andr{\'e}},
journal={Nature},
volume={647},
number={8091},
pages={1036--1044},
year={2025},
publisher={Nature Publishing Group},
doi={10.1038/s41586-025-09603-w}
}
Why it matters: Demonstrates practical feasibility of economical full‑text search in 67 petabase pairs of public sequence data, making biological archives searchable at scale.
OA: Open Access
Why it matters: First end‑to‑end description of MetaGraph for petabase‑scale annotated de Bruijn‑graph indexing and search.
Why it matters: Column‑hierarchical (Multi‑BRWT) compression, a cornerstone for compact colored/annotated de Bruijn graphs (DBGs).
Why it matters: Exploits graph topology to sparsify labels, giving large reductions in annotation size while keeping queries fast.
Why it matters: Early compact/dynamic color encoding that informed later MetaGraph annotation designs.
Why it matters: Adds counts and coordinates; enables lossless quantitative and positional queries in MetaGraph.
Why it matters: Label‑consistent SCA/MLA alignment used as MetaGraph's sensitive mode for experiment discovery.
Why it matters: Long inexact sketch‑based seeding (MG‑Sketch) boosts recall at high divergence; complements MetaGraph's alignment toolkit.