Annotated de Bruijn graph indexes over petabases of public sequence data.
Database | Accessions | Indexed Sequences (TB) | Index Size (GB) | Health Status | Features | S3 Location |
---|---|---|---|---|---|---|
Human reference genome and variation | 29 | 0.003 | 11 | Healthy | DNA, Taxonomic ID, Align | s3://metagraph/gnomad |
MetaSUB urban microbiome dataset (k=41) | 4,220 | 7.2 | 47 | Healthy | DNA, Sample Metadata, Geocoordinates, City Context, Align | s3://metagraph/metasub_k41 |
RefSeq (33M accessions) | 32,881,348 | 1.7 | 463 | Healthy | DNA, Taxonomic ID, Align | s3://metagraph/refseq/ |
RefSeq (85k) with coordinates | 85,375 | 1.7 | 508 | Healthy | DNA, Taxonomic ID, Align, Coordinates | s3://metagraph/refseq/ |
SRA fungi raw sequences | 121,900 | 162 | 80 | Healthy | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/fungi/ |
SRA human raw sequences | 121,900 | 725 | 3,402 | Healthy | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/human/ |
SRA Logan contigs (partial: 17M/27M) | 16,764,975 | 16,450 | 42,856 | Healthy | DNA, RNA, Taxonomic ID, Sample Metadata | s3://metagraph/all_sra |
SRA Metagut | 241,384 | 156 | 1,111 | Healthy | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/metagut/ |
SRA Metazoa raw sequences | 805,239 | 1,999 | 5,366 | Healthy | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/metazoa/ |
SRA Metazoa 1K raw sequences | 67,391 | 119 | 302 | Healthy | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/metazoa/ |
SRA Microbe raw sequences | 446,506 | 221 | 57 | Healthy | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/microbe/ |
SRA Mus musculus raw sequences | 57,938 | 147 | 292 | Healthy | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/mouse |
SRA plants raw sequences | 121,900 | 1,109 | 1,844 | Healthy | DNA, RNA, Taxonomic ID, Sample Metadata, Align | s3://metagraph/plants/ |
Marine metagenome genomes from global ocean survey | 34,815 | 0.062 | 15 | Healthy | DNA, Align | s3://metagraph/tara_oceans/ |
UHGG All contigs | 4,644 | 0.71 | 27 | Healthy | DNA, Taxonomic ID, Sample Metadata, Geocoordinates, Align | s3://metagraph/uhgg_all/ |
UHGG Catalog | 4,644 | 0.011 | 3 | Healthy | DNA, Taxonomic ID, Sample Metadata, Geocoordinates, Align | s3://metagraph/uhgg_catalogue/ |
UniProt Archive - comprehensive protein sequence database | 543,904,874 | 0.21 | 125 | Healthy | Amino Acids, Taxonomic ID, Coordinates | s3://metagraph/uniparc |
All indexes are hosted on AWS S3 for public access. Install the AWS CLI following the installation guide (supports Windows, macOS, Linux).
Example commands:
# List available objects in a bucket
aws s3 ls s3://metagraph/refseq/ --no-sign-request
# Download a specific file
aws s3 cp s3://metagraph/refseq/file.dbg . --no-sign-request
# Sync an entire directory
aws s3 sync s3://metagraph/refseq/ ./local-refseq/ --no-sign-request
The --no-sign-request flag tells the AWS CLI to send unsigned requests, so the public bucket can be read without AWS credentials.
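Several indexes in the table are hundreds or thousands of gigabytes, so it can help to check free disk space before syncing. The following is a minimal sketch, not an official tool: the helper names `has_room`, `free_gb`, and `fetch_index` are hypothetical, and it assumes the "Index Size (GB)" column roughly matches what is stored under the S3 prefix. That assumption may not hold where several databases share a prefix, as with s3://metagraph/refseq/.

```shell
#!/usr/bin/env bash
# Sketch: refuse to start a sync when the target disk is clearly too small.
# All function names here are hypothetical helpers, not part of the AWS CLI.

# has_room REQUIRED_GB AVAILABLE_GB
# Succeeds when AVAILABLE_GB >= REQUIRED_GB (whole numbers, no thousands commas).
has_room() {
    [ "$2" -ge "$1" ]
}

# free_gb DIR
# Prints the free space on DIR's filesystem in whole gigabytes.
# df -Pk gives POSIX-format output in 1024-byte blocks; column 4 is "Available".
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# fetch_index S3_PREFIX DEST_DIR REQUIRED_GB
# Syncs S3_PREFIX into DEST_DIR only if DEST_DIR has at least REQUIRED_GB free.
fetch_index() {
    local prefix=$1 dest=$2 required_gb=$3
    mkdir -p "$dest"
    if ! has_room "$required_gb" "$(free_gb "$dest")"; then
        echo "Need ${required_gb} GB free in ${dest}; aborting." >&2
        return 1
    fi
    aws s3 sync "$prefix" "$dest" --no-sign-request
}

# Example (size taken from the UHGG Catalog row of the table):
# fetch_index s3://metagraph/uhgg_catalogue/ ./uhgg_catalogue 3
```

To see the actual size of a prefix before downloading, `aws s3 ls s3://metagraph/uhgg_catalogue/ --recursive --human-readable --summarize --no-sign-request` prints a total at the end, and adding `--dryrun` to `aws s3 sync` lists what would be transferred without copying anything.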