MetaGraph

©2019-2025 BMI LAB | ETH ZURICH

    Database Indexes

    Annotated de Bruijn graph indexes over petabases of public sequence data.

    Total Accessions: 595,669,082
    Sequences Indexed: 21,100,582 GB
    Available Online: 14 of 17 total
    Jobs Completed: 61 in the last hour

    Available Databases
    Compressed sequence indexes spanning viruses, bacteria, fungi, plants, animals, and humans. Each row represents a searchable labeled de Bruijn graph over public sequence collections.
    Database | Accessions | Indexed Sequences (TB) | Index Size (GB) | Health Status | Features | S3 Location
    GnomAD
    Human reference genome and variation
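    Accessions: 29 | Indexed Sequences (TB) and Index Size (GB): 0.00311
    Health Status: Healthy
    Features: DNA, Taxonomic ID, Align
    S3 Location: s3://metagraph/gnomad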
    Metasub
    MetaSUB urban microbiome dataset (k=41)
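    Accessions: 4,220 | Indexed Sequences: 7.2 TB | Index Size: 47 GB
    Health Status: Healthy
    Features: DNA, Sample Metadata, Geocoordinates, City Context, Align
    S3 Location: s3://metagraph/metasub_k41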
    RefSeq (33M)
    RefSeq (33M accessions)
    Accessions: 32,881,348 | Indexed Sequences: 1.7 TB | Index Size: 463 GB
    Health Status: Healthy
    Features: DNA, Taxonomic ID, Align
    S3 Location: s3://metagraph/refseq/
    RefSeq (85k) Coord
    RefSeq (85k) with coordinates
    Accessions: 85,375 | Indexed Sequences: 1.7 TB | Index Size: 508 GB
    Health Status: Healthy
    Features: DNA, Taxonomic ID, Align, Coordinates
    S3 Location: s3://metagraph/refseq/
    SRA fungi
    SRA fungi raw sequences
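    Accessions: 121,900 | Indexed Sequences: 162 TB | Index Size: 80 GB
    Health Status: Healthy
    Features: DNA, RNA, Taxonomic ID, Sample Metadata, Align
    S3 Location: s3://metagraph/fungi/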
    SRA human
    SRA human raw sequences
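    Accessions: 121,900 | Indexed Sequences: 725 TB | Index Size: 3,402 GB
    Health Status: Healthy
    Features: DNA, RNA, Taxonomic ID, Sample Metadata, Align
    S3 Location: s3://metagraph/human/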
    SRA Logan contigs
    SRA Logan contigs (partial: 17M/27M)
    Accessions: 16,764,975 | Indexed Sequences: 16,450 TB | Index Size: 42,856 GB
    Health Status: Healthy
    Features: DNA, RNA, Taxonomic ID, Sample Metadata
    S3 Location: s3://metagraph/all_sra
    SRA metagut
    SRA Metagut
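    Accessions: 241,384 | Indexed Sequences: 156 TB | Index Size: 1,111 GB
    Health Status: Healthy
    Features: DNA, RNA, Taxonomic ID, Sample Metadata, Align
    S3 Location: s3://metagraph/metagut/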
    SRA metazoa
    SRA Metazoa raw sequences
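    Accessions: 805,239 | Indexed Sequences: 1,999 TB | Index Size: 5,366 GB
    Health Status: Healthy
    Features: DNA, RNA, Taxonomic ID, Sample Metadata, Align
    S3 Location: s3://metagraph/metazoa/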
    SRA metazoa 1k
    SRA Metazoa 1K raw sequences
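    Accessions: 67,391 | Indexed Sequences: 119 TB | Index Size: 302 GB
    Health Status: Healthy
    Features: DNA, RNA, Taxonomic ID, Sample Metadata, Align
    S3 Location: s3://metagraph/metazoa/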
    SRA microbe
    SRA Microbe raw sequences
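    Accessions: 446,506 | Indexed Sequences: 221 TB | Index Size: 57 GB
    Health Status: Healthy
    Features: DNA, RNA, Taxonomic ID, Sample Metadata, Align
    S3 Location: s3://metagraph/microbe/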
    SRA mus musculus
    SRA Mus musculus raw sequences
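    Accessions: 57,938 | Indexed Sequences: 147 TB | Index Size: 292 GB
    Health Status: Healthy
    Features: DNA, RNA, Taxonomic ID, Sample Metadata, Align
    S3 Location: s3://metagraph/mouse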
    SRA plants
    SRA plants raw sequences
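    Accessions: 121,900 | Indexed Sequences: 1,109 TB | Index Size: 1,844 GB
    Health Status: Healthy
    Features: DNA, RNA, Taxonomic ID, Sample Metadata, Align
    S3 Location: s3://metagraph/plants/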
    Tara Oceans
    Marine metagenome genomes from global ocean survey
    Accessions: 34,815 | Indexed Sequences: 0.062 TB | Index Size: 15 GB
    Health Status: Healthy
    Features: DNA, Align
    S3 Location: s3://metagraph/tara_oceans/
    UHGG All
    UHGG All contigs
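    Accessions: 4,644 | Indexed Sequences (TB) and Index Size (GB): 0.7127
    Health Status: Healthy
    Features: DNA, Taxonomic ID, Sample Metadata, Geocoordinates, Align
    S3 Location: s3://metagraph/uhgg_all/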
    UHGG catalog
    UHGG Catalog
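    Accessions: 4,644 | Indexed Sequences: 0.011 TB | Index Size: 3 GB
    Health Status: Healthy
    Features: DNA, Taxonomic ID, Sample Metadata, Geocoordinates, Align
    S3 Location: s3://metagraph/uhgg_catalogue/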
    UniParc
    UniProt Archive - comprehensive protein sequence database
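    Accessions: 543,904,874 | Indexed Sequences (TB) and Index Size (GB): 0.21125
    Health Status: Healthy
    Features: Amino Acids, Taxonomic ID, Coordinates
    S3 Location: s3://metagraph/uniparc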
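
    Because each database lists both the indexed sequence volume (TB) and the index size (GB), a rough compression ratio can be derived per row. The following is a minimal sketch, assuming decimal units (1 TB = 1000 GB) and using the SRA Logan contigs figures (16,450 TB indexed, 42,856 GB index). The function name compression_ratio is hypothetical and not part of any MetaGraph tool.

```shell
# Hypothetical helper: ratio of indexed sequence volume (TB) to index size (GB),
# assuming 1 TB = 1000 GB. Prints the ratio rounded to a whole number.
compression_ratio() {
    awk -v tb="$1" -v gb="$2" 'BEGIN { printf "%.0f\n", tb * 1000 / gb }'
}

# SRA Logan contigs: 16,450 TB indexed, 42,856 GB index
compression_ratio 16450 42856
```

    The ratio only compares the two listed columns. It says nothing about how the input was stored or compressed before indexing.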
    How to Download Index Files
    Access MetaGraph indexes via AWS S3 for local analysis with the MetaGraph command line tool.

    All indexes are hosted on AWS S3 and are publicly accessible. To download them, install the AWS CLI by following its installation guide (Windows, macOS, and Linux are supported).
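
    Before running any download commands, it can help to confirm that the CLI is actually on the PATH. A minimal sketch follows. The function name require_aws_cli is hypothetical; `aws --version` is the standard way to print the installed version.

```shell
# Hypothetical helper: succeed only if the `aws` command can be found,
# then print the installed version.
require_aws_cli() {
    if ! command -v aws >/dev/null 2>&1; then
        echo "aws CLI not found; install it first" >&2
        return 1
    fi
    aws --version
}

# Usage:
#   require_aws_cli || exit 1
```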

    Example commands:

    # List available objects in a bucket
    aws s3 ls s3://metagraph/refseq/ --no-sign-request

    # Download a specific file (file.dbg is a placeholder; substitute a name from the listing)
    aws s3 cp s3://metagraph/refseq/file.dbg . --no-sign-request

    # Sync an entire directory
    aws s3 sync s3://metagraph/refseq/ ./local-refseq/ --no-sign-request

    The --no-sign-request flag tells the AWS CLI not to sign requests, so the public bucket can be read without configuring AWS credentials.
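
    Some of the indexes listed above are multiple terabytes in size, so it may be worth checking the size of a prefix and previewing a transfer before syncing everything. The sketch below wraps the standard `aws s3 ls` and `aws s3 sync` options (`--summarize`, `--exclude`, `--include`, `--dryrun`). The helper names mg_size and mg_fetch are hypothetical, and the '*.dbg' pattern is an assumption based on the file.dbg example above; check the actual listing for the file names in a given bucket.

```shell
#!/usr/bin/env bash
# Sketch only: mg_size and mg_fetch are hypothetical helper names, and the
# '*.dbg' pattern assumes graph files use the .dbg extension.

# Print the object count and total size under an S3 prefix.
mg_size() {
    aws s3 ls "$1" --recursive --human-readable --summarize --no-sign-request | tail -n 2
}

# Sync only the files matching a pattern. Extra arguments (for example
# --dryrun, which previews the transfer without downloading) are passed
# through to `aws s3 sync`.
mg_fetch() {
    local src="$1" dest="$2" pattern="$3"
    shift 3
    aws s3 sync "$src" "$dest" --exclude '*' --include "$pattern" --no-sign-request "$@"
}

# Usage:
#   mg_size  s3://metagraph/refseq/
#   mg_fetch s3://metagraph/refseq/ ./local-refseq/ '*.dbg' --dryrun
```

    The `--exclude '*'` filter comes first because later filters take precedence in `aws s3 sync`, so only files matching the `--include` pattern are transferred.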