This is the GitHub repository for the final project of the R for Bio Data Science course at DTU. The project was done by:
- Deeptha Sri - s210230
- Eric Bautista - s212514
- Jonathan Funk - s212697
- Laura Machado - s212775
The objective of our final project was to analyze the metadata of the RCSB Protein Data Bank.
The data was analyzed using the [R tidyverse](https://www.tidyverse.org/). The project structure is inspired by Josh Reich's load-clean-func-do workflow and a 2009 paper by William Stafford Noble.
We combined data from different sources for our analysis, namely:
- Protein Data Bank (PDB)
- National Center for Biotechnology Information (NCBI)
- Structural Classification Of Proteins (SCOP)
All sources were accessed on 03/05/2022.
The data was processed and analyzed according to the flowchart below.
We recreated the pie charts shown on the RCSB website and visualized the data as bar plots. Some of the categories chosen by the creators of the original plots were changed during this project based on our preferences.