tRNA taxonomy v226 by FlorianTrigodet · Pull Request #2492 · merenlab/anvio

FlorianTrigodet · 2025-11-10T13:03:51Z

New version of the tRNA taxonomy using GTDB v226.
I ran a quick test and it worked fine. I can also see that the database version's check are working.
I also noticed that in the original commit for the first and only version of the tRNA taxonomy database in anvi'o that we would make a README file showing how we generated the data. But, alas, no such README was ever committed.

I will write a short one explaining how I got the data and how I transformed it.

FlorianTrigodet · 2025-11-10T13:15:59Z

WELL. I just pushed the README.txt to the master branch 🤦

FlorianTrigodet · 2025-11-10T13:16:32Z

Content of the README:

We used the genome collection of GlobDB v226 (https://globdb.org), which itself contains all genome representatives of GTDB v226.
GlobDB already provides contigs-databases with annotated tRNA, through the command anvi-scan-trnas.

We made an external-genomes-txt file with all the GTDB genomes present in GlobDB and used anvi-get-sequences-for-hmm-hits to extract the sequences for all tRNAs:

anvi-get-sequences-for-hmm-hits -e external-genomes.txt --hmm-sources Transfer_RNAs -o all_tRNAs.fa

The headers of that fasta file contains the information about the anticodon and the genome's id:

>Ile_GAT___Transfer_RNAs___4068027d3b59b533bad5e550e79d768a08c97ce5fbceb337f7a6171e bin_id:GCA_000008085|source:Transfer_RNAs|e_value:92.8|contig:hash112713c5_GCA_000008085_000000000001|gene_callers_id:hash112713c5_612|start:7000|stop:7074|length:74

Using that information, we can create as many fasta file as anticodon and only keep the genome's name as the defline:

mkdir -p ANTICODON_SEARCH_DATABASES

awk -v outdir="ANTICODON_SEARCH_DATABASES" '
    /^>/ {
        # parse anticodon and genome id from header
        match($0, />[^_]+_([A-Z]{3})___.*bin_id:([^|]+)/, m)
        anticodon = m[1]
        genome = m[2]
        # print sequence under genome ID
        if (anticodon != "" && genome != "") {
            fname = outdir "/" anticodon
            print ">" genome >> fname
        }
        next
    }
    {
        print >> fname
    }
' all_tRNAs.fa

# gzip all outputs
gzip ANTICODON_SEARCH_DATABASES/*

Finally, we can use the taxonomy file provided in GlobDB to construct the table mapping the genome's name/accession to a taxonomy:

grep "^>" all_tRNAs.fa | sed -E 's/.*bin_id:([^|]+).*/\1/' | sort -u > genome_ids.txt
grep -Ff genome_ids.txt globdb_r226_taxonomy.tsv > ACCESSION_TO_TAXONOMY.txt
gzip ACCESSION_TO_TAXONOMY.txt && rm genome_ids.txt

ivagljiva · 2025-11-10T13:26:57Z

WELL. I just pushed the README.txt to the master branch 🤦

LOL 😂

meren · 2025-11-10T15:36:18Z

Thank you very much for this, @FlorianTrigodet!

I think it would be great if we could translate your README.txt into something that could go into the help docs for anvi-estimate-trna-taxonomy under a section, say, "Notes for Advanced Users and Programmers" where we describe how the underlying data for this can be updated.

meren · 2025-11-10T15:37:50Z

Plus, what do you mean by that you tested and it worked? Does it mean you run the old and new database on a few contigs-db files do you get reasonably comparable results? :)

FlorianTrigodet · 2025-11-11T08:35:23Z

I ran the old an new version on a metagenomic assembly (human gut) and from a quick glance at a few dozen annotations (out of >1200 annotation for AUG) the results are relatively comparable.
I cannot do a simple diff because GTDB changed quite a few phylum name since v89.
Here is the file that compare the previous and new annotation for each tRNA CAT (start codon AUG):
tRNA_CAT_v89_vs_v226.txt

As for moving the readme to the documentation, should we move it to anvi-setup-trna-taxonomy or anvi-run-trna-taxonomy?

meren · 2025-11-11T09:46:32Z

I had suggested anvi-estimate-trna-taxonomy, but I think anvi-setup-trna-taxonomy would be much better to keep that information I guess :)

tRNA taxonomy v226

6975720

FlorianTrigodet requested a review from meren November 10, 2025 13:04

Merge branch 'master' into tRNA_taxonomy_v226

e97654c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tRNA taxonomy v226#2492

tRNA taxonomy v226#2492
FlorianTrigodet wants to merge 2 commits intomasterfrom
tRNA_taxonomy_v226

FlorianTrigodet commented Nov 10, 2025

Uh oh!

FlorianTrigodet commented Nov 10, 2025

Uh oh!

FlorianTrigodet commented Nov 10, 2025

Uh oh!

ivagljiva commented Nov 10, 2025

Uh oh!

meren commented Nov 10, 2025

Uh oh!

meren commented Nov 10, 2025

Uh oh!

FlorianTrigodet commented Nov 11, 2025

Uh oh!

meren commented Nov 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

FlorianTrigodet commented Nov 10, 2025

Uh oh!

FlorianTrigodet commented Nov 10, 2025

Uh oh!

FlorianTrigodet commented Nov 10, 2025

Uh oh!

ivagljiva commented Nov 10, 2025

Uh oh!

meren commented Nov 10, 2025

Uh oh!

meren commented Nov 10, 2025

Uh oh!

FlorianTrigodet commented Nov 11, 2025

Uh oh!

meren commented Nov 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants