Annotated de Bruijn graph indexes over petabases of public sequence data.
| Database | Accessions | Indexed Sequences (TB) | Index Size (GB) | Online Status | Features | S3 Location |
|---|---|---|---|---|---|---|
| Human reference genome and variation | 29 | 0.003 | 11 | OK (#0) | DNA, Taxonomic ID | s3://metagraph/gnomad |
| MetaSUB urban microbiome dataset (k=41) | 4,220 | 7.2 | 47 | Download only | DNA, Sample Metadata, Geocoordinates, City Context, Align | s3://metagraph/metasub_k41 |
| MetaSUB urban microbiome dataset with k=19 | 4,220 | 7.2 | 206 | Download only | DNA, Sample Metadata, Geocoordinates, City Context, Align | s3://metagraph/metasub_k19 |
| RefSeq (33M accessions) | 32,881,348 | 1.7 | 475 | OK (#0) | DNA, Taxonomic ID, Align, Coordinates | s3://metagraph/refseq/ |
| RefSeq (85k) with coordinates | 85,375 | 1.7 | 474 | OK (#0) | DNA, Taxonomic ID, Align, Coordinates | s3://metagraph/refseq/ |
| SRA fungi raw sequences | 121,900 | 162 | 112 | OK (#0) | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/fungi/ |
| SRA human raw sequences | 436,494 | 725 | 17,418 | OK (#0) | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/human/ |
| SRA Logan contigs (partial: 17M/27M) | 25,085,804 | 42,828 | 100,925 | OK (#0) | DNA, RNA, Taxonomic ID, Sample Metadata | s3://metagraph/all_sra |
| SRA Metagut | 241,384 | 156 | 2,726 | OK (#0) | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/metagut/ |
| SRA Metazoa raw sequences | 805,220 | 1,999 | 4,997 | OK (#0) | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/metazoa/ |
| SRA Metazoa 1K raw sequences | 67,390 | 119 | 413 | Download only | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/metazoa/ |
| SRA Microbe raw sequences | 446,506 | 221 | 57 | OK (#0) | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/microbe/ |
| SRA Mus musculus raw sequences | 57,938 | 147 | 292 | OK (#0) | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/mouse |
| SRA plants raw sequences | 531,714 | 1,109 | 2,031 | OK (#0) | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/plants/ |
| Marine metagenome genomes from global ocean survey | 318,205,057 | 0.062 | 163 | Download only | DNA, Align | s3://metagraph/tara_oceans/ |
| UHGG All contigs | 286,997 | 0.71 | 52 | OK (#0) | DNA, Taxonomic ID, Sample Metadata, Geocoordinates, Align | s3://metagraph/uhgg_all/ |
| UHGG Catalog | 4,644 | 0.011 | 3 | OK (#0) | DNA, Taxonomic ID, Sample Metadata, Geocoordinates, Align | s3://metagraph/uhgg_catalogue/ |
| UniProt Archive - comprehensive protein sequence database | 541,160,336 | 0.21 | 169 | OK (#0) | Amino Acids, Taxonomic ID, Coordinates | s3://metagraph/uniparc |

All indexes are hosted on AWS S3 for public access. Install the AWS CLI following the installation guide (supports Windows, macOS, Linux).

Example commands:

    # List available objects in a bucket
    aws s3 ls s3://metagraph/refseq/ --no-sign-request

    # Download a specific file
    aws s3 cp s3://metagraph/refseq/file.dbg . --no-sign-request

    # Sync an entire directory
    aws s3 sync s3://metagraph/refseq/ ./local-refseq/ --no-sign-request

The `--no-sign-request` flag indicates public access without AWS credentials.
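
Several of the indexes listed above are very large (the table reports sizes up to 100,925 GB), so it can help to check what a prefix contains before syncing it. The sketch below is one possible way to do that, not an official tool: it only prints two AWS CLI commands for a chosen index so they can be reviewed before running. The function name `metagraph_preview` and the default destination `./local-<prefix>` are hypothetical; the bucket name is taken from the S3 locations in the table, and `--recursive`, `--human-readable`, `--summarize`, and `--dryrun` are standard AWS CLI options.

```shell
#!/bin/sh
# Sketch: print (not run) AWS CLI commands for inspecting one index prefix.
# BUCKET matches the S3 locations in the table; the function name and the
# default destination directory are hypothetical.
BUCKET="metagraph"

metagraph_preview() {
    prefix="${1%/}"               # index prefix, e.g. "refseq"; drop a trailing slash
    dest="${2:-./local-$prefix}"  # local target directory (assumed default)

    # Lists every object under the prefix, then a total count and size
    echo "aws s3 ls s3://$BUCKET/$prefix/ --recursive --human-readable --summarize --no-sign-request"
    # Shows what sync would transfer without copying anything
    echo "aws s3 sync s3://$BUCKET/$prefix/ $dest --dryrun --no-sign-request"
}

metagraph_preview refseq
```

Running the printed `ls` command reports the total size under the prefix, and removing `--dryrun` from the printed `sync` command starts the actual download.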