WELL. I just pushed the README.txt to the master branch 🤦 |
|
Content of the README: We used the genome collection of GlobDB v226 (https://globdb.org), which itself contains all genome representatives of GTDB v226. We made an external-genomes-txt file with all the GTDB genomes present in GlobDB and used anvi-get-sequences-for-hmm-hits to extract the sequences for all tRNAs. The headers of that FASTA file contain the anticodon and the genome's ID. Using that information, we can create one FASTA file per anticodon and keep only the genome's name as the defline. Finally, we can use the taxonomy file provided in GlobDB to construct the table mapping each genome's name/accession to a taxonomy. |
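For anyone wanting to reproduce the per-anticodon split described in the README, here is a minimal Python sketch. It is not the script that was actually used. It assumes the deflines look roughly like `>Ala_AGC___<hash> bin_id:<genome>|source:...`, which is a guess at the anvi-get-sequences-for-hmm-hits output layout; the regular expression and the function names `parse_fasta`, `split_by_anticodon`, `to_fasta`, and `write_groups` are all hypothetical and would need adjusting to the real headers.

```python
import re
from collections import defaultdict
from pathlib import Path

# ASSUMPTION: deflines look like
#   >Ala_AGC___<hash> bin_id:<genome>|source:Transfer_RNAs|...
# Adjust this pattern to whatever the real FASTA headers contain.
HEADER_RE = re.compile(
    r"^>[^_\s]+_(?P<anticodon>[ACGTU]{3})___\S*\s+bin_id:(?P<genome>[^|\s]+)"
)


def parse_fasta(lines):
    """Yield (defline, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_anticodon(lines):
    """Group sequences by anticodon, keeping only the genome name per record."""
    groups = defaultdict(list)
    for header, seq in parse_fasta(lines):
        match = HEADER_RE.match(header)
        if match is None:
            continue  # header does not follow the assumed layout; skip it
        groups[match.group("anticodon")].append((match.group("genome"), seq))
    return dict(groups)


def to_fasta(records):
    """Format (genome, sequence) pairs as FASTA text with the genome as defline."""
    return "".join(f">{genome}\n{seq}\n" for genome, seq in records)


def write_groups(groups, out_dir):
    """Write one <anticodon>.fa file per anticodon into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for anticodon, records in groups.items():
        (out_dir / f"{anticodon}.fa").write_text(to_fasta(records))
```

Usage would be along the lines of `write_groups(split_by_anticodon(open("trnas.fa")), "by_anticodon")`, where both the input file name and the output directory are placeholders.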
LOL 😂 |
|
Thank you very much for this, @FlorianTrigodet! I think it would be great if we could translate your README.txt into something that could go into the help docs for anvi-estimate-trna-taxonomy under a section, say, "Notes for Advanced Users and Programmers" where we describe how the underlying data for this can be updated. |
|
Plus, what do you mean when you say you tested it and it worked? Does it mean you ran the old and new databases on a few contigs-db files and got reasonably comparable results? :) |
|
I ran the old and new versions on a metagenomic assembly (human gut), and from a quick glance at a few dozen annotations (out of >1200 annotations for AUG) the results are relatively comparable. As for moving the README to the documentation, should we move it to anvi-setup-trna-taxonomy or anvi-run-trna-taxonomy? |
|
I had suggested |
New version of the tRNA taxonomy using GTDB v226.
I ran a quick test and it worked fine. I can also see that the database version checks are working.
I also noticed that the original commit for the first and only version of the tRNA taxonomy database in anvi'o said we would make a README file showing how we generated the data. But, alas, no such README was ever committed.
I will write a short one explaining how I got the data and how I transformed it.