Human DNA Parsing

Task Description

A human chromosome is represented as a long string of nucleotides. This C program counts the occurrences of every word of length 10 in human chromosome 1.

In this context, a word is a substring of length 10 that starts at any nucleotide, so every nucleotide position is the start of a candidate word. Words therefore overlap and are NOT separated by spaces.

The chromosome file must be provided to the program as a command-line argument. Human chromosome sequences may contain additional letters where the identity of a nucleotide cannot be determined precisely. Only words consisting entirely of A, C, G, and T are counted; all other words are skipped. Counting is NOT case-sensitive.
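The counting rule above can be sketched as a single scan that tracks how many valid nucleotides have been seen in a row. This is only an illustration, not the code in p2.c: the names `is_base` and `count_valid_words` are hypothetical, and the sketch assumes the sequence is already in memory with any line breaks or header lines stripped.

```c
#include <stddef.h>

#define WORD_LEN 10

/* Return 1 for A, C, G, T in either case, 0 for any other character. */
static int is_base(char c) {
    switch (c) {
    case 'A': case 'C': case 'G': case 'T':
    case 'a': case 'c': case 'g': case 't':
        return 1;
    default:
        return 0;
    }
}

/* Count the overlapping windows of WORD_LEN characters in seq[0..n)
 * that consist only of A, C, G, and T. */
static size_t count_valid_words(const char *seq, size_t n) {
    size_t run = 0;    /* length of the current run of valid nucleotides */
    size_t total = 0;  /* number of valid words seen so far */
    for (size_t i = 0; i < n; i++) {
        run = is_base(seq[i]) ? run + 1 : 0;
        /* A run of at least WORD_LEN means a valid word ends at position i. */
        if (run >= WORD_LEN)
            total++;
    }
    return total;
}
```

An unknown letter such as N resets the run, so every word that overlaps it is skipped without any extra bookkeeping.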

Design

Because the input is huge, we have to balance computation time against the RAM available. This implementation reads the entire file twice. The first pass counts the occurrences of each word, so that we allocate only the RAM we will actually use. The second pass stores each word's location in its proper place in the array. Reading the file twice costs a little more computation, but it saves a great deal of RAM compared with allocating a SIZE*SIZE 2D array.
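A minimal sketch of the two-pass idea follows, under several assumptions that the README does not confirm: the sequence is held in memory rather than re-read from the file, positions fit in 32 bits, and the layout is one flat position array plus a table of start offsets. The names `WordIndex`, `Window`, `window_push`, and `build_index` are hypothetical and are not taken from p2.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define WORD_LEN 10
#define NUM_WORDS (1u << (2 * WORD_LEN))   /* 4^10 possible words */
#define WORD_MASK (NUM_WORDS - 1u)

/* Positions of word w are pos[start[w]] .. pos[start[w + 1] - 1]. */
typedef struct {
    uint32_t *start;   /* NUM_WORDS + 1 offsets into pos */
    uint32_t *pos;     /* start positions of all valid words, grouped by word */
} WordIndex;

typedef struct { uint32_t word; size_t run; } Window;

static int base_code(char c) {
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default: return -1;
    }
}

/* Feed one character; returns 1 when the last WORD_LEN characters
 * form a valid word, whose code is then in w->word. */
static int window_push(Window *w, char c) {
    int code = base_code(c);
    if (code < 0) { w->run = 0; w->word = 0; return 0; }
    w->word = ((w->word << 2) | (uint32_t)code) & WORD_MASK;
    if (w->run < WORD_LEN) w->run++;
    return w->run == WORD_LEN;
}

/* Returns 0 on success, -1 if memory allocation fails. */
static int build_index(const char *seq, size_t n, WordIndex *out) {
    uint32_t *start = calloc((size_t)NUM_WORDS + 1, sizeof *start);
    if (!start) return -1;

    /* Pass 1: count each word, storing the count one slot to the right. */
    Window w = {0, 0};
    for (size_t i = 0; i < n; i++)
        if (window_push(&w, seq[i])) start[w.word + 1]++;

    /* Prefix sum: start[k] becomes the first slot that belongs to word k. */
    for (uint32_t k = 0; k < NUM_WORDS; k++) start[k + 1] += start[k];

    uint32_t total = start[NUM_WORDS];
    uint32_t *pos = malloc((total ? total : 1) * sizeof *pos);
    uint32_t *next = malloc((size_t)NUM_WORDS * sizeof *next);
    if (!pos || !next) { free(start); free(pos); free(next); return -1; }
    memcpy(next, start, (size_t)NUM_WORDS * sizeof *next);

    /* Pass 2: write each word's start position into its next free slot. */
    w = (Window){0, 0};
    for (size_t i = 0; i < n; i++)
        if (window_push(&w, seq[i]))
            pos[next[w.word]++] = (uint32_t)(i + 1 - WORD_LEN);

    free(next);
    out->start = start;
    out->pos = pos;
    return 0;
}
```

With this layout the position array holds exactly one entry per valid word in the input, and the offset table has a fixed size of 4^10 + 1 entries regardless of input length.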

Implementation

We use bitwise operations to compute each word's array index, which keeps the program efficient.
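One common way to do this, shown here as a hedged sketch rather than the exact scheme in p2.c, is to pack each nucleotide into 2 bits, so that a 10-letter word becomes a 20-bit integer in the range 0 to 4^10 - 1 = 1,048,575. The mapping A=0, C=1, G=2, T=3 and the names `base_code` and `word_index` are assumptions made for illustration.

```c
#include <stdint.h>

#define WORD_LEN 10

/* Map a nucleotide to a 2-bit code, ignoring case.
 * Returns -1 for any other character. */
static int base_code(char c) {
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default: return -1;
    }
}

/* Encode the WORD_LEN characters starting at w as a 20-bit array index.
 * Returns -1 if any character is not A, C, G, or T. */
static long word_index(const char *w) {
    uint32_t idx = 0;
    for (int i = 0; i < WORD_LEN; i++) {
        int code = base_code(w[i]);
        if (code < 0)
            return -1;
        /* Shift the previous letters left by 2 bits and append this one. */
        idx = (idx << 2) | (uint32_t)code;
    }
    return (long)idx;
}
```

For example, `word_index("AAAAAAAAAA")` is 0 and `word_index("TTTTTTTTTT")` is 1048575, so a counter array with 1,048,576 entries covers every possible word.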

Testing

The run.sh script runs automated tests against the students' work.

./run.sh
