MetaGraph

©2019-2025 BMI Lab, ETH Zurich

    Publications underpinning MetaGraph

    Key publications describing the MetaGraph framework, compression algorithms, and alignment methods.

    MetaGraph framework
    The foundational architecture for petabase‑scale sequence search
    • Efficient and accurate search in petabase‑scale sequence repositories
      Nature, 2025. DOI: 10.1038/s41586-025-09603-w
      M. Karasikov; H. Mustafa; D. Danciu; O. Kulkov; M. Zimmermann; C. Barber; G. Rätsch; A. Kahles

      Why it matters: Demonstrates that economical full‑text search across 67 petabase pairs of public sequence data is practically feasible, making biological archives searchable at scale.

      OA: Open Access

    • MetaGraph: Indexing and Analysing Nucleotide Archives at Petabase‑scale
      bioRxiv, 2020. DOI: 10.1101/2020.10.01.322164
      M. Karasikov; H. Mustafa; D. Danciu; M. Zimmermann; C. Barber; G. Rätsch; A. Kahles

      Why it matters: First end‑to‑end description of MetaGraph for petabase‑scale annotated de Bruijn‑graph indexing and search.

    Graph annotation & compression
    Compact representations and efficient encoding of graph labels
    • Sparse Binary Relation Representations for Genome Graph Annotation
      Journal of Computational Biology, 27(4):626–639, 2020. DOI: 10.1089/cmb.2019.0324
      M. Karasikov; H. Mustafa; A. Joudaki; S. Javadzadeh‑No; G. Rätsch; A. Kahles

      Why it matters: Introduces Multi‑BRWT, a hierarchical compression of annotation columns that became a cornerstone for compact colored/annotated de Bruijn graphs (DBGs).

      OA: PMCID: PMC7185347

    • Topology‑based sparsification of graph annotations (RowDiff)
      Bioinformatics, 37(Suppl 1):i169–i176, 2021. DOI: 10.1093/bioinformatics/btab330
      D. Danciu; M. Karasikov; H. Mustafa; A. Kahles; G. Rätsch

      Why it matters: Exploits graph topology to sparsify labels, yielding substantially smaller annotations while keeping queries fast.

      OA: PMCID: PMC8346655

    • Dynamic compression schemes for graph coloring
      Bioinformatics, 35(3):407–414, 2019. DOI: 10.1093/bioinformatics/bty632
      H. Mustafa; I. Schilken; M. Karasikov; C. Eickhoff; G. Rätsch; A. Kahles

      Why it matters: Early compact/dynamic color encoding that informed later MetaGraph annotation designs.

      OA: PMCID: PMC6530811

    Lossless counting & coordinates
    Quantitative and positional information in graph indexes
    • Lossless indexing with counting de Bruijn graphs
      Genome Research, 32(9):1754–1764, 2022. DOI: 10.1101/gr.276607.122
      M. Karasikov; H. Mustafa; G. Rätsch; A. Kahles

      Why it matters: Adds counts and coordinates; enables lossless quantitative and positional queries in MetaGraph.

      OA: PMCID: PMC9528980

    Sequence‑to‑graph alignment (MetaGraph)
    Sensitive alignment methods for annotated graph search
    • Label‑guided seed‑chain‑extend alignment on annotated de Bruijn graphs
      Bioinformatics, 40(Suppl 1):i337–i346, 2024. DOI: 10.1093/bioinformatics/btae226
      H. Mustafa; M. Karasikov; N. Mansouri Ghiasi; G. Rätsch; A. Kahles

      Why it matters: Label‑consistent SCA/MLA alignment used as MetaGraph's sensitive mode for experiment discovery.

      OA: PMCID: PMC11211850

    • Aligning distant sequences to graphs using long seed sketches
      Genome Research, 33(7):1208–1217, 2023. DOI: 10.1101/gr.277659.123
      A. Joudaki; A. Meterez; H. Mustafa; R. Groot Koerkamp; A. Kahles; G. Rätsch

      Why it matters: Long inexact sketch‑based seeding (MG‑Sketch) boosts recall at high divergence; complements MetaGraph's alignment toolkit.

      OA: PMCID: PMC10538362