We are using the complete genome of the Sars-COV-2 virus strain Wuhan-Hu-1
found National Library of Medicine
for the first part of the project. The text version of the genome can be
found in genomes/.
When reading this genome, we can break it into the 5'UTR region (sites 1-265) and the actual gene (sites 266-21555) according to the FASTA file linked in the above reference.
The genomes/fasta_reader.py can be used to read the wuhan-hu-1.txt file
by calling its read_wuhan_1 function with the filepath to wuhan-hu-1.txt.
This function will return 2 strings: hu1_full_genome and hu1_rbd. The
full genome is just that. The hu1_rbd is a string comprised of the
nucleotides at sites 21563..25384 corresponding (hopefully) to the RBD.
It has an arg parser and can be called in the terminal:
python fasta_reader.py [-filepath=<file/you/are/reading> -opt=[which file
you are reading]The only option for -opt at this stage is 0 for ``read_wuhan_1`. If we
use a different FASTA for part 2, we can update the reader to portion it
appropriately and add its option.
- Length of RBD: Roy, U. Comparative structural analyses of selected spike protein-RBD mutations in SARS-CoV-2 lineages. Immunol Res 70, 143–151 (2022). https://doi.org/10.1007/s12026-021-09250-z