Python API

The MetaGraph API provides a simple way to query indexes from Python, whether they run on a remote server or locally. It supports both exact k-mer matching and inexact search (alignment).

Attention

The API described here was used in the internal implementation of MetaGraph Online and can also be used to query indexes hosted locally. For the API of MetaGraph Online, refer to MetaGraph Online Help.

Installation

Install the MetaGraph API for Python with pip:

pip install -U "git+https://github.com/ratschlab/metagraph.git#subdirectory=metagraph/api/python"

Sequence alignment

The align method aligns sequences to the graph. It accepts a single sequence or a list of sequences, each given as a string, and the following keyword arguments:

metagraph.client.GraphClient.align(self, ...)

Align sequence(s) to the joint graph

Parameters:
  • sequence (Union[str, Iterable[str]]) – The query sequence, or an iterable of query sequences

  • min_exact_match (float) – The minimum fraction (between 0.0 and 1.0) of nucleotides covered by seeds required to align the sequence [default: 0]

  • max_alternative_alignments (int) – The number of different alignments to return [default: 1]

  • max_num_nodes_per_seq_char (float) – The maximum number of nodes to consider per sequence character during extension [default: 10.0]

Returns:

A data frame with alignments

Return type:

pandas.DataFrame
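To make the min_exact_match threshold more concrete, the sketch below computes the fraction of query positions covered by a set of seed intervals and compares it with a threshold. This is only an illustration of the idea of "fraction of nucleotides covered by seeds"; the function covered_fraction and the interval representation are hypothetical and are not part of the MetaGraph API, and the way MetaGraph computes seeds internally may differ.

```python
def covered_fraction(query_length, seeds):
    """Fraction of query positions covered by at least one seed.

    `seeds` is an iterable of half-open (start, end) intervals on the query.
    Both the function and this representation are assumptions made for
    illustration; they are not taken from MetaGraph.
    """
    if query_length <= 0:
        return 0.0
    covered = 0
    last_end = 0  # end of the region already counted
    for start, end in sorted(seeds):
        start = max(start, last_end)   # skip positions already counted
        end = min(end, query_length)   # clip to the query
        if end > start:
            covered += end - start
            last_end = end
    return covered / query_length


# Three seeds on a query of length 100; the first two overlap.
example_seeds = [(0, 31), (20, 51), (80, 100)]
fraction = covered_fraction(100, example_seeds)

# Under this reading, a query would be aligned only if the covered
# fraction reaches the min_exact_match threshold.
passes_at_0_7 = fraction >= 0.7
passes_at_0_8 = fraction >= 0.8
```

Under this reading, raising min_exact_match filters out queries that share only a small part of their sequence exactly with the graph before any extension work is done.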

Examples

Example of search in MetaSUB

from metagraph.client import GraphClient

metasub = GraphClient('dnaloc.ethz.ch', 80, api_path='/api/metasub19')

lbls = metasub.column_labels()

# >ENA|A14565|A14565.1 16S rRNA
query = 'TCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCGGACGGGTGAGTAAT\
        GTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATA\
        ACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGG\
        GATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAG\
        GATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGG\
        GAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTT\
        CGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTG\
        ACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGG\
        GTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAG\
        ATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCG\
        TAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCG\
        GTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAA\
        ACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCC\
        TTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAA\
        GGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATT\
        CGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACAGAACTTTCCAGAGATGGAT\
        TGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAA\
        ATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCC\
        GGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCA\
        TCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGA\
        CCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACT\
        CGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTT\
        CCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTA\
        GCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAAC\
        AAGGTAACCGTAGGGGAAC'.replace(' ', '')  # drop the indentation spaces that the line continuations keep in the string

metasub.search(query, discovery_fraction=0.0, top_labels=200)

metasub.align(query, min_exact_match=0.8)
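A sequence written across several indented lines can pick up whitespace from the indentation, because a backslash line continuation inside a string literal keeps the leading spaces of the next line. A minimal sketch of one way to build a clean query from wrapped, FASTA-style lines is shown below; the helper join_sequence_lines is hypothetical and not part of the MetaGraph API, and only the first two lines of the sequence above are used to keep the example short.

```python
def join_sequence_lines(text):
    """Join wrapped sequence lines, dropping all whitespace.

    Hypothetical helper for illustration; str.split() with no arguments
    splits on any run of spaces, tabs, or newlines.
    """
    return "".join(text.split())


# First two 60-character lines of the 16S rRNA sequence used above.
wrapped = """
    TCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCGGACGGGTGAGTAAT
    GTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATA
"""

query_prefix = join_sequence_lines(wrapped)
```

The resulting string contains only nucleotide characters and can be passed to search or align in the same way as the query above.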

Search multiple graphs in parallel

The API provides MultiGraphClient, which can query multiple graph servers in parallel. Both search and align have the keyword argument parallel [default: True]. If parallel=True, the result will be a dictionary mapping the specified index names to instances of concurrent.futures.Future. If parallel=False, all graphs will simply be queried in sequence and the results will be instances of pandas.DataFrame.

from metagraph.client import MultiGraphClient

multi = MultiGraphClient()

multi.add_graph('dnaloc.ethz.ch', 80, api_path='/api/metasub19', name='metasub')
multi.add_graph('dnaloc.ethz.ch', 80, api_path='/api/uhgg', name='uhgg')

multi.list_graphs()
# {'metasub': ('dnaloc.ethz.ch', 80), 'uhgg': ('dnaloc.ethz.ch', 80)}

# >ENA|A14565|A14565.1 16S rRNA
query = 'TCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCGGACGGGTGAGTAAT\
        GTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATA\
        ACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGG\
        GATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAG\
        GATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGG\
        GAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTT\
        CGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTG\
        ACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGG\
        GTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAG\
        ATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCG\
        TAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCG\
        GTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAA\
        ACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCC\
        TTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAA\
        GGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATT\
        CGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACAGAACTTTCCAGAGATGGAT\
        TGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAA\
        ATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCC\
        GGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCA\
        TCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGA\
        CCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACT\
        CGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTT\
        CCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTA\
        GCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAAC\
        AAGGTAACCGTAGGGGAAC'.replace(' ', '')  # drop the indentation spaces that the line continuations keep in the string

# Search in parallel
futures = multi.search(query, discovery_fraction=0.0, top_labels=100)
# {'metasub': <Future at 0x116dbed10 state=running>,
#  'uhgg': <Future at 0x116dad8d0 state=running>}

# You can either handle the Future instances yourself
# or block and wait for all of the results
result = MultiGraphClient.wait_for_result(futures)
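If you prefer to handle the Future instances yourself, the standard concurrent.futures module can process each result as soon as it is ready. The sketch below shows that pattern on a dictionary of the same shape (index name mapped to Future). To stay self-contained it does not contact any server: fake_query is a stand-in for a real query, and collect_results is a hypothetical helper, not part of the MetaGraph API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_query(name):
    # Stand-in for a remote query; a real call would return a data frame.
    return f"result for {name}"


def collect_results(futures):
    """Return {index name: result}, filled in as each future completes.

    If a query raised an exception, the exception object is stored in
    place of the result so that one failure does not hide the others.
    """
    name_of = {future: name for name, future in futures.items()}
    results = {}
    for future in as_completed(name_of):
        name = name_of[future]
        try:
            results[name] = future.result()
        except Exception as exc:
            results[name] = exc
    return results


# Build a dictionary shaped like the one returned with parallel=True.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {name: pool.submit(fake_query, name)
               for name in ("metasub", "uhgg")}
    results = collect_results(futures)
```

With real queries, the same collect_results call could be applied to the dictionary returned by multi.search or multi.align, assuming it maps names to concurrent.futures.Future instances as described above.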

Query a locally hosted index

When an index is hosted locally, say at address localhost and port 5555, the API client can connect to it as follows:

from metagraph.client import GraphClient

graph_client = GraphClient('127.0.0.1', 5555, api_path='')

In this case, requests go directly to the MetaGraph engine without being forwarded by an intermediate HTTP server, so the api_path argument should be left empty or omitted. (Compare this with the examples above.)

Before initializing a client and opening a connection, a search engine (the main MetaGraph app) must be started to load an index for querying. This can be done, for instance, as follows:

metagraph server_query -v -i graph.dbg -a annotation.row_diff_brwt.annodbg --port 5555 -p 10
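Because the server has to be running before the client connects, a script that starts both may want to wait until the port accepts connections. The following is a minimal sketch using only the Python standard library; wait_for_port is a hypothetical helper, not part of MetaGraph, and an open port only shows that the server process is accepting connections, not that every query will succeed.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds.

    Retries every `interval` seconds and returns False if no connection
    could be made within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage with the local server started above (port 5555 as in the example):
#
#   if wait_for_port('127.0.0.1', 5555):
#       graph_client = GraphClient('127.0.0.1', 5555, api_path='')
```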

Other examples

Find more examples here.