Python API

The MetaGraph API provides a simple way to query indexes (running on a remote server or locally) from Python. It supports both exact k-mer matching and inexact search (alignment).

Installation

Install the MetaGraph Python API with pip:

pip install -U "git+https://github.com/ratschlab/metagraph.git#subdirectory=metagraph/api/python"

Sequence alignment

The align method aligns sequences to the graph. It accepts a single sequence or a list of sequences, each given as a string. Additionally, the method accepts the following keyword arguments:

metagraph.client.GraphClient.align(self, ...)

Align sequence(s) to the joint graph

Parameters:
  • sequence (Union[str, Iterable[str]]) – The query sequence or sequences

  • min_exact_match (float) – The minimum fraction (between 0.0 and 1.0) of nucleotides covered by seeds required to align the sequence [default: 0]

  • max_alternative_alignments (int) – The number of different alignments to return [default: 1]

  • max_num_nodes_per_seq_char (float) – The maximum number of nodes to consider per sequence character during extension [default: 10.0]

Returns:

A data frame with alignments

Return type:

pandas.DataFrame
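
The keyword arguments listed above can be passed together in a single call. The following is a minimal sketch, assuming a client object that exposes align with the documented parameters (for instance a metagraph.client.GraphClient). The helper name align_sequences is hypothetical and not part of the MetaGraph API; it only checks the documented range of min_exact_match before forwarding the call.

```python
from typing import Any, Iterable, Union


def align_sequences(client: Any,
                    sequences: Union[str, Iterable[str]],
                    min_exact_match: float = 0.0,
                    max_alternative_alignments: int = 1,
                    max_num_nodes_per_seq_char: float = 10.0):
    """Forward a query to client.align() with the documented keyword arguments.

    Hypothetical helper: `client` is assumed to behave like
    metagraph.client.GraphClient, i.e. to provide an align() method.
    """
    # The documentation states that min_exact_match is a fraction in [0.0, 1.0].
    if not 0.0 <= min_exact_match <= 1.0:
        raise ValueError('min_exact_match must be between 0.0 and 1.0')

    # Turn generators and other iterables into a list; keep a single string as is.
    if not isinstance(sequences, str):
        sequences = list(sequences)

    return client.align(sequences,
                        min_exact_match=min_exact_match,
                        max_alternative_alignments=max_alternative_alignments,
                        max_num_nodes_per_seq_char=max_num_nodes_per_seq_char)
```

With a connected GraphClient named metasub, a call such as align_sequences(metasub, ['ACGT...', 'TTGA...'], min_exact_match=0.8) would be expected to return the data frame described above.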

Examples

Example of search in MetaSUB

from metagraph.client import GraphClient

metasub = GraphClient('dnaloc.ethz.ch', 80, api_path='/api/metasub19')

lbls = metasub.column_labels()

# >ENA|A14565|A14565.1 16S rRNA
query = 'TCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCGGACGGGTGAGTAAT\
        GTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATA\
        ACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGG\
        GATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAG\
        GATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGG\
        GAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTT\
        CGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTG\
        ACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGG\
        GTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAG\
        ATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCG\
        TAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCG\
        GTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAA\
        ACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCC\
        TTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAA\
        GGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATT\
        CGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACAGAACTTTCCAGAGATGGAT\
        TGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAA\
        ATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCC\
        GGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCA\
        TCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGA\
        CCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACT\
        CGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTT\
        CCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTA\
        GCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAAC\
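        AAGGTAACCGTAGGGGAAC'
# The line continuations above keep each line's leading spaces inside the string;
# remove all whitespace so that only nucleotides are sent.
query = ''.join(query.split())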

metasub.search(query, discovery_threshold=0.0, top_labels=200)

metasub.align(query, min_exact_match=0.8)
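
Query sequences often come from FASTA files rather than from string literals. The sketch below shows one way to turn FASTA text into plain sequence strings suitable for search or align. The function name fasta_to_sequences is hypothetical and not part of the MetaGraph API, and the parser handles only the basic FASTA layout (a header line starting with '>' followed by sequence lines).

```python
from typing import Dict, List


def fasta_to_sequences(fasta_text: str) -> Dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary.

    Hypothetical helper: blank lines and surrounding whitespace are ignored,
    and sequence lines before the first header are skipped.
    """
    chunks: Dict[str, List[str]] = {}
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # A new record starts; the header is stored without the '>' sign.
            header = line[1:]
            chunks[header] = []
        elif header is not None:
            chunks[header].append(line)
    # Join the wrapped sequence lines of every record into one string.
    return {name: ''.join(parts) for name, parts in chunks.items()}
```

Each value of the returned dictionary is a single string without line breaks, so it can be passed as the query in the calls shown above.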

Search multiple graphs in parallel

The API provides MultiGraphClient, which can query multiple graph servers in parallel. Both search and align accept the keyword argument parallel [default: True]. With parallel=True, the result is a dictionary mapping the specified index names to instances of concurrent.futures.Future. With parallel=False, the graphs are queried one after another and the results are instances of pandas.DataFrame.

from metagraph.client import MultiGraphClient

multi = MultiGraphClient()

multi.add_graph('dnaloc.ethz.ch', 80, api_path='/api/metasub19', name='metasub')
multi.add_graph('dnaloc.ethz.ch', 80, api_path='/api/uhgg', name='uhgg')

multi.list_graphs()
# {'metasub': ('dnaloc.ethz.ch', 80), 'uhgg': ('dnaloc.ethz.ch', 80)}

# >ENA|A14565|A14565.1 16S rRNA
query = 'TCGAACGGTAACAGGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCGGACGGGTGAGTAAT\
        GTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATA\
        ACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGG\
        GATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAG\
        GATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGG\
        GAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTT\
        CGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTG\
        ACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGG\
        GTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAG\
        ATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCG\
        TAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCG\
        GTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAA\
        ACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCC\
        TTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAA\
        GGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATT\
        CGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACAGAACTTTCCAGAGATGGAT\
        TGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAA\
        ATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCC\
        GGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCA\
        TCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGA\
        CCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACT\
        CGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTT\
        CCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTA\
        GCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAAC\
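        AAGGTAACCGTAGGGGAAC'
# The line continuations above keep each line's leading spaces inside the string;
# remove all whitespace so that only nucleotides are sent.
query = ''.join(query.split())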

# Search in parallel
futures = multi.search(query, discovery_threshold=0.0, top_labels=100)
# {'metasub': <Future at 0x116dbed10 state=running>,
#  'uhgg': <Future at 0x116dad8d0 state=running>}

# You can either handle the Future instances yourself
# or block and wait for all of the results
result = MultiGraphClient.wait_for_result(futures)
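
Handling the Future instances yourself can be useful when results should be processed as soon as each server answers, or when one failing server should not discard the other results. The following is a minimal sketch using only the standard library. The helper collect_results and the stand-in fake_query are hypothetical; in real use the dictionary of futures would come from multi.search or multi.align with parallel=True.

```python
from concurrent.futures import Future, ThreadPoolExecutor, as_completed
from typing import Any, Dict


def collect_results(futures: Dict[str, Future]) -> Dict[str, Any]:
    """Gather results in completion order; a failed query is stored as its exception."""
    name_of = {future: name for name, future in futures.items()}
    results = {}
    for future in as_completed(name_of):
        try:
            results[name_of[future]] = future.result()
        except Exception as error:  # keep the other results if one query fails
            results[name_of[future]] = error
    return results


# Stand-in for a remote query (hypothetical): one index answers, one fails.
def fake_query(name: str) -> str:
    if name == 'uhgg':
        raise ConnectionError('server unreachable')
    return f'result from {name}'


with ThreadPoolExecutor(max_workers=2) as pool:
    demo_futures = {name: pool.submit(fake_query, name)
                    for name in ('metasub', 'uhgg')}
    demo_results = collect_results(demo_futures)
```

Storing the exception instead of raising it is a design choice of this sketch: the caller can inspect each entry and decide whether a partial result is acceptable.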

Query a locally hosted index

When an index is hosted locally, say at address localhost and port 5555, the API client can connect to it as follows:

from metagraph.client import GraphClient

graph_client = GraphClient('127.0.0.1', 5555, api_path='')

Since in this case requests go directly to the MetaGraph engine without being forwarded via an intermediate HTTP server, api_path is left empty. (Compare this to the MetaSUB example above, which sets api_path='/api/metasub19'.)

Before a client is initialized and a connection is opened, a search engine (the main MetaGraph app) must be started to load an index for querying. This can be done, for instance, as follows:

metagraph server_query -v -i graph.dbg -a annotation.row_diff_brwt.annodbg --port 5555 -p 10
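
Because the server has to be running before the client connects, a script that starts both may want to wait until the port accepts connections. The sketch below polls the port with the standard library. The helper wait_for_port is hypothetical and not part of the MetaGraph API, and an open port is only assumed (not guaranteed) to mean that the index is ready for queries.

```python
import socket
import time


def wait_for_port(host: str, port: int,
                  timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the command above, wait_for_port('127.0.0.1', 5555) could be called before creating the GraphClient.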

Other examples

Find more examples here.