https://{{base_url}}/v1/seq-info/{{gmsc_id}}
Where {{gmsc_id}} is of the form GMSC10.100AA.xxx_xxx_xxx or GMSC10.90AA.xxx_xxx_xxx.
Returns
{
"id": "GMSC10.xxAA.xxx_xxx_xxxx",
"nucleotide": "ATC...",
"aminoacid": "MAV...",
"taxonomy": "s__Bacteroides_vulgatus",
"habitat": "human gut",
"quality": {
"antifam": true,
"terminal": true,
"rnacode": 0.9,
"metat": 1,
"metap": 1,
"riboseq": 0.9
}
}
Note that the quality field is only present for 90AA sequences.
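A client would typically check the identifier format before calling seq-info and treat the quality field as optional. The following is a minimal Python sketch under stated assumptions: the helper names (seq_info_url, quality_of) and the example host are hypothetical, and the regular expression assumes each xxx group is exactly three digits, as in the example identifiers in this document.

```python
import re

# Assumed ID shape, inferred from the examples in this document:
# GMSC10.100AA.xxx_xxx_xxx or GMSC10.90AA.xxx_xxx_xxx, three digits per group.
GMSC_ID_RE = re.compile(r"GMSC10\.(100AA|90AA)\.\d{3}_\d{3}_\d{3}")


def seq_info_url(base_url, gmsc_id):
    """Build the seq-info URL for one identifier (hypothetical helper)."""
    if GMSC_ID_RE.fullmatch(gmsc_id) is None:
        raise ValueError(f"not a GMSC identifier: {gmsc_id!r}")
    return f"https://{base_url}/v1/seq-info/{gmsc_id}"


def quality_of(record):
    """Return the quality dict, or None when it is absent (100AA sequences)."""
    return record.get("quality")
```

Calling seq_info_url("example.org", "GMSC10.90AA.123_456_789") yields the URL to request; quality_of can then be applied to the decoded JSON response without first checking which catalogue the sequence belongs to.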
https://{{base_url}}/v1/seq-info-multi/
This is a POST-only endpoint, expecting a JSON payload consisting of a
dictionary with an entry seq_ids, which is a list of strings (identifiers).
For example:
{
"seq_ids": [
"GMSC10.90AA.123_456_789",
"GMSC10.90AA.123_456_790",
...]
}
Returns a list of entries like the outputs of seq-info.
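As an illustration of the request shape, the sketch below prepares the POST with Python's standard library without sending it. The function name build_seq_info_multi_request and the host example.org are placeholders, and setting the Content-Type header to application/json is an assumption about what the server expects.

```python
import json
import urllib.request


def build_seq_info_multi_request(base_url, seq_ids):
    """Prepare (but do not send) the POST request for seq-info-multi."""
    body = json.dumps({"seq_ids": list(seq_ids)}).encode("utf-8")
    return urllib.request.Request(
        f"https://{base_url}/v1/seq-info-multi/",
        data=body,
        # Assumption: the endpoint reads the body as JSON.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_seq_info_multi_request(
    "example.org",  # placeholder host, stands in for {{base_url}}
    ["GMSC10.90AA.123_456_789", "GMSC10.90AA.123_456_790"],
)
# Sending is left to the caller, e.g.:
# with urllib.request.urlopen(req) as resp:
#     entries = json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect the payload before any network call is made.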
https://{{base_url}}/v1/seq-filter/
POST endpoint, with arguments:
hq_only: boolean. optional (only active for 90AA)
habitat: str. mandatory
taxonomy: str. optional
quality_antifam: boolean. optional
quality_terminal: boolean. optional
quality_rnacode: float. optional
quality_metat: integer. optional
quality_metap: integer. optional
quality_riboseq: float. optional
habitat is treated as a comma-separated list (e.g., you can use marine,freshwater to match all the entities that are present in both marine and freshwater).
taxonomy is a substring match so you can pass any taxonomic level (e.g., passing o__Pelagibacterales will match d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Pelagibacterales;f__Pelagibacteraceae;g__AAA240-E13).
Returns
{
"status":"Ok",
"results": [
{
"habitat":"marine,plant associated,sediment",
"seq_id":"GMSC10.90AA.000_013_322",
"taxonomy":"d__Bacteria"},
....
]
}
At most 1,001 entries are returned.
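To make the habitat and taxonomy matching rules concrete, here is a small Python sketch that mimics them locally on entries shaped like the results above. It illustrates the behaviour as described in this document and is not the server's code: matches is a hypothetical helper, and it assumes every habitat named in the query must appear in the entry's comma-separated habitat field.

```python
def matches(entry, habitat, taxonomy=None):
    """Return True if entry satisfies the habitat and taxonomy filters."""
    wanted = {h.strip() for h in habitat.split(",") if h.strip()}
    present = {h.strip() for h in entry["habitat"].split(",")}
    # Every requested habitat must be present in the entry.
    if not wanted <= present:
        return False
    # taxonomy is a plain substring match, so any rank can be used.
    return taxonomy is None or taxonomy in entry["taxonomy"]


entry = {
    "habitat": "marine,plant associated,sediment",
    "seq_id": "GMSC10.90AA.000_013_322",
    "taxonomy": "d__Bacteria",
}
```

Under this reading, the example entry is selected by the query marine,sediment but not by marine,freshwater, because it is not annotated as present in freshwater.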
https://{{base_url}}/v1/cluster-info/{{gmsc_90AA_id}}
Returns the membership of the given cluster. At most 20 results are thick (meaning that metadata is also returned). For the rest, only identifiers are returned. Example output
{
"status":"Ok",
"cluster": [
{
"aminoacid":"MAAAGFLIVSFKPFEKPSRNAATTAGFSAENFEFTMIALPYSLRP",
"habitat":"soil",
"nucleotide":"ATGGCCGCGGCCGGATTCTTGATCGTGTCCTTCAAGCCTTTCGAGAAGCCTTCGAGAAACGCCGCGACGACGGCCGGCTTCTCGGCCGAGAATTTCGAGTTCACGATGATCGCGCTGCCGTACAGCTTGAGACCGTAA",
"seq_id":"GMSC10.100AA.547_444_661",
"taxonomy":"d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Xanthobacteraceae;g__VAZQ01;s__VAZQ01 sp005883115"
}, ...
]
}
NOTE. The internal endpoints below are not recommended for public use. For large-scale analyses,
we recommend you use the GMSC-mapper command line tool locally.
Public API endpoints will be maintained for the long term. No such commitment
is made for endpoints marked internal. You have been warned.
https://{{base_url}}/internal/seq-search (POST)
Arguments:
sequence_faa: FASTA formatted set of sequences
is_contigs: bool (when True, inputs are assumed to be DNA contigs)
Returns
{
"status": "message (normally 'Ok')",
"search_id": "xxxxx"
}
https://{{base_url}}/internal/seq-search/{{search_id}}
Returns
{
"search_id": "str",
"status": "str",
"results": [
{
"query_id": "query_1",
"aminoacid": "MHEDVIQFARNEVWSLV....",
"taxonomy": "s__Bacteroides_vulgatus",
"habitat": "human gut",
"hits": [
{ "id": "GMSC10.xxAA.xxx_xxx_xxxx",
"e_value": "2.1e-23",
"aminoacid": "MHEELIQFARNEV...",
"identity": "98.4"
}, ...
]
}, ...]
}
status will be one of Running (if the results are not yet ready), Done,
or Expired. In the case of Done, the results field will be filled in.
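Because a search passes through Running before it reaches Done or Expired, a client needs to poll. The following Python sketch shows one way to do so. The function wait_for_results, its poll_seconds and max_polls parameters, and the injected fetch callable are all hypothetical; fetch stands for whatever code performs the GET on /internal/seq-search/{{search_id}} and decodes the JSON response.

```python
import time


def wait_for_results(fetch, search_id, poll_seconds=2.0, max_polls=30):
    """Poll until the search is Done; raise if it expires or times out.

    fetch is any callable taking a search ID and returning the decoded
    JSON response (a dict) for that search.
    """
    for _ in range(max_polls):
        response = fetch(search_id)
        status = response["status"]
        if status == "Done":
            return response["results"]
        if status == "Expired":
            raise RuntimeError(f"search {search_id} has expired")
        if status != "Running":
            raise RuntimeError(f"unexpected status: {status!r}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"search {search_id} still running after {max_polls} polls")
```

Passing the fetch function in as an argument keeps the polling logic independent of the HTTP library and lets it be exercised against canned responses.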
Dependencies
flask
numpy
pandas
polars
Running this (in test mode) can be done with
python -m flask run
Testing can be done with curl:
curl http://127.0.0.1:5000/v1/seq-info/GMSC10.100AA.000_000_002
These examples assume you are running the test version on
http://127.0.0.1:5000/. Adapt as necessary.
Searching requires using POST and a FASTA file. For example, if you have the
file example.faa, you can use
curl -X POST --form "sequence_faa=$(cat example.faa)" http://127.0.0.1:5000/internal/seq-search/
The output will look something like this:
{"search_id":"1-jmgi","status":"Ok"}
You can later use the given ID (in this case 1-jmgi, but it will be different
every time the app runs) to retrieve the results:
curl http://127.0.0.1:5000/internal/seq-search/1-jmgi
Results will look like one of the following:
{"search_id":"1-jmgi","status":"Running"}
{"search_id":"1-jmgi","status":"Done", "results":[...]}
{"search_id":"1-jmgi","status":"Expired"}
Search IDs are of the form #-xxxx where # is just an index counting up and
xxxx is a random string.
Indexing is done by the make-indices.py Jug
script. It expects FASTA and other files to be present in the gsmc-db
subdirectory.