Search DNA, RNA, and protein sequences across public archives using annotated de Bruijn graphs with exact matching or sensitive alignment.
Report issues or request features:
Web limits: up to 10 sequences per web query. For larger batches, use the web Application Programming Interface (API) or Command Line Interface (CLI).
See the Databases page for live coverage and whether an index includes counts or coordinates.
MetaGraph returns hits organized by database and accession. As in BLAST, results are ranked by relevance, but instead of E-values MetaGraph uses a discovery threshold and k-mer matching to identify significant matches.
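To make the k-mer matching idea concrete, here is a minimal sketch of the arithmetic. It assumes the discovery threshold acts as a minimum share of the query's k-mers that must be found. The values k=31 and 80% are hypothetical, and the rounding rule MetaGraph applies is not stated here, so the numbers are illustrative only.

```shell
# kmer_count LENGTH K: number of k-mers in a sequence of LENGTH bases.
kmer_count() {
  if [ "$1" -lt "$2" ]; then echo 0; else echo $(( $1 - $2 + 1 )); fi
}

# min_matches LENGTH K PERCENT: smallest number of matching k-mers whose
# share of the query reaches PERCENT (integer, 0-100), rounding up.
# Assumption: the threshold is a simple fraction of the query k-mers.
min_matches() {
  n=$(kmer_count "$1" "$2")
  echo $(( (n * $3 + 99) / 100 ))
}

kmer_count 100 31       # prints 70
min_matches 100 31 80   # prints 56
```

Under these assumptions a short query has few k-mers, so a handful of mismatches can push it below the threshold.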
If your search returns no hits, consider:
See the dedicated Examples page, or run a pre‑filled search:
This example has been tested and verified to work!
curl -X POST "https://metagraph.ethz.ch:8081/search" \
  -H "Content-Type: application/json" \
  -d '{
    "queries": [
      {
        "db": "refseq85_coord",
        "q": "ATGCGATCGTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAG"
      }
    ]
  }'
Note: Use just the sequence (no FASTA header).
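Because the request takes a bare sequence, a FASTA record has to be flattened first. The sketch below shows one way to do that with awk and to build a body in the same shape as the request above. The helper names `fasta_to_seq` and `make_payload` are hypothetical, and the demo record is invented and runs without network access.

```shell
# Print the sequence of the first FASTA record on stdin as a single line,
# dropping the header and any line breaks inside the sequence.
fasta_to_seq() {
  awk '/^>/ { if (seen) exit; seen = 1; next }
       { gsub(/[ \t\r]/, ""); printf "%s", $0 }
       END { print "" }'
}

# Build a request body shaped like the curl example above.
# $1 = database name, $2 = bare sequence.
make_payload() {
  printf '{"queries": [{"db": "%s", "q": "%s"}]}\n' "$1" "$2"
}

# Offline demo on an inline record (hypothetical sequence).
seq=$(printf '>example header\nATGCGATCG\nTAGCTAGCT\n' | fasta_to_seq)
make_payload "refseq85_coord" "$seq"
```

The printed body can be passed to `curl -d`. Only the first record is used, so multi-record files need to be split beforehand.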
curl "https://metagraph.ethz.ch:8081/search/{search_id}/status"
Keep checking until status is "done".
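The "keep checking" step can be wrapped in a small polling loop. This is a sketch only: it assumes the status response is JSON containing a `"status"` field that eventually reads `"done"`. The function name `wait_until_done`, the `"running"` value, and the stub used for the offline demo are all hypothetical.

```shell
# wait_until_done MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until its output contains "status": "done".
# Returns 0 once done, 1 if MAX_TRIES is exhausted.
wait_until_done() {
  max_tries=$1; delay=$2; shift 2
  try=1
  while [ "$try" -le "$max_tries" ]; do
    # Assumption: the response is JSON with a "status" field.
    if "$@" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"done"'; then
      return 0
    fi
    try=$((try + 1))
    sleep "$delay"
  done
  return 1
}

# Offline demo: a stub that reports "running" twice, then "done".
counter_file=$(mktemp)
echo 0 > "$counter_file"
fake_status() {
  n=$(( $(cat "$counter_file") + 1 ))
  echo "$n" > "$counter_file"
  if [ "$n" -ge 3 ]; then echo '{"status": "done"}'; else echo '{"status": "running"}'; fi
}
wait_until_done 5 0 fake_status && echo "finished after $(cat "$counter_file") checks"
```

Against the live service, the stub would be replaced by the status request itself, for example `wait_until_done 60 5 curl -s "https://metagraph.ethz.ch:8081/search/$search_id/status"`, where `search_id` holds the identifier of the submitted search.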
curl "https://metagraph.ethz.ch:8081/search/{search_id}/results"
Install the latest release on Linux or Mac OS X with Anaconda:
conda install -c bioconda -c conda-forge metagraph
If Docker is available on the system, immediately get started with
docker pull ghcr.io/ratschlab/metagraph:master
docker run -v ${HOME}:/mnt ghcr.io/ratschlab/metagraph:master \
metagraph build -v -k 10 -o /mnt/transcripts_1000 /mnt/transcripts_1000.fa
and replace ${HOME} with a directory on the host system to map it under /mnt in the container.
By default, it executes the binary compiled for the DNA alphabet {A,C,G,T}. To run the binary compiled for the DNA5 or Protein alphabet, just replace metagraph with metagraph_DNA5 or metagraph_Protein, respectively, e.g.:
docker run -v ${HOME}:/mnt ghcr.io/ratschlab/metagraph:master \
metagraph_Protein build -v -k 10 -o /mnt/graph /mnt/protein.fa
For more complex workflows, consider running Docker in interactive mode:
$ docker run -it --entrypoint /bin/bash -v ${HOME}:/mnt ghcr.io/ratschlab/metagraph:master
root@5c42291cc9cf:/# ls /mnt/
root@5c42291cc9cf:/# metagraph --version
To compile from source (e.g., for builds with a custom alphabet or other configurations), see the online documentation.
./metagraph build
./metagraph annotate
./metagraph transform_anno
./metagraph query
DATA="../tests/data/transcripts_1000.fa"
./metagraph build -k 12 -o transcripts_1000 $DATA
./metagraph annotate -i transcripts_1000.dbg --anno-filename -o transcripts_1000 $DATA
./metagraph query -i transcripts_1000.dbg -a transcripts_1000.column.annodbg $DATA
./metagraph stats -a transcripts_1000.column.annodbg transcripts_1000.dbg
./metagraph
Simple build
./metagraph build -v --parallel 30 -k 20 --mem-cap-gb 10 \
-o <GRAPH_DIR>/graph <DATA_DIR>/*.fasta.gz \
2>&1 | tee <LOG_DIR>/log.txt
Build with disk swap (use to limit the RAM usage)
./metagraph build -v --parallel 30 -k 20 --mem-cap-gb 10 --disk-swap <GRAPH_DIR> \
-o <GRAPH_DIR>/graph <DATA_DIR>/*.fasta.gz \
2>&1 | tee <LOG_DIR>/log.txt
Build from k-mers filtered with KMC
K=20
./KMC/kmc -ci5 -t4 -k$K -m5 -fm <FILE>.fasta.gz <FILE>.cutoff_5 ./KMC
./metagraph build -v -p 4 -k $K --mem-cap-gb 10 -o graph <FILE>.cutoff_5.kmc_pre
./metagraph annotate -v --anno-type row --fasta-anno \
-i primates.dbg \
-o primates \
~/fasta_zurich/refs_chimpanzee_primates.fa
1. Cluster columns
./metagraph transform_anno -v --linkage --greedy \
-o linkage.txt \
--subsample R \
-p NCORES \
primates.column.annodbg
2. Construct Multi-BRWT
./metagraph transform_anno -v -p NCORES --anno-type brwt \
--linkage-file linkage.txt \
-o primates \
--parallel-nodes V \
primates.column.annodbg
./metagraph query -v -i <GRAPH_DIR>/graph.dbg \
-a <GRAPH_DIR>/annotation.column.annodbg \
--min-kmers-fraction-label 0.8 --labels-delimiter ", " \
query_seq.fa
./metagraph align -v -i <GRAPH_DIR>/graph.dbg query_seq.fa
./metagraph assemble -v <GRAPH_DIR>/graph.dbg \
-o assembled.fa \
--unitigs
./metagraph assemble -v <GRAPH_DIR>/graph.dbg \
--unitigs \
-a <GRAPH_DIR>/annotation.column.annodbg \
--diff-assembly-rules diff_assembly_rules.json \
-o diff_assembled.fa
Stats for graph
./metagraph stats graph.dbg
Stats for annotation
./metagraph stats -a annotation.column.annodbg
Stats for both
./metagraph stats -a annotation.column.annodbg graph.dbg
See the Databases page for the live snapshot date and coverage.
INSDC subsets (Microbe, Fungi, Plants, Metazoa incl. Human/Mouse), SRA‑MetaGut, RefSeq, UHGG, Tara Oceans, UniParc. See Databases for the active set.
The web UI supports up to 10 sequences per query (maximum length 50k). Run larger batches through the API or a local command-line installation.
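For inputs above the 10-sequence limit, one option is to split a multi-FASTA file into batches before submitting. The sketch below does that with awk. The function name `split_fasta` and the output naming scheme are hypothetical, and it does not check the per-sequence length limit.

```shell
# split_fasta INPUT PREFIX [MAX_PER_FILE]
# Writes PREFIX_1.fa, PREFIX_2.fa, ... with at most MAX_PER_FILE records each
# (default 10, matching the web limit quoted above).
split_fasta() {
  awk -v prefix="$2" -v max="${3:-10}" '
    /^>/ {
      if (count % max == 0) {            # start a new batch file
        if (out != "") close(out)
        out = sprintf("%s_%d.fa", prefix, count / max + 1)
      }
      count++
    }
    out != "" { print > out }            # lines before the first header are skipped
  ' "$1"
}

# Offline demo: 25 invented records end up in three batch files.
tmp=$(mktemp -d)
i=1
while [ "$i" -le 25 ]; do
  printf '>seq%d\nACGT\n' "$i"
  i=$((i + 1))
done > "$tmp/all.fa"
split_fasta "$tmp/all.fa" "$tmp/batch"
grep -c '^>' "$tmp"/batch_*.fa
```

Each resulting file can then be submitted as its own query, or the whole set can be handed to the API or CLI instead.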
Indexes are versioned; some include counts or coordinates. Record index name/version and the detailed search parameters.
Raw‑read indexes use moderate cleaning (MetaGraph and Logan use different strategies to remove infrequent k-mers). Assembled/reference/protein indexes are indexed losslessly with coordinates.