Python code for an implementation of G4-Hunter algorithm#2

Open
JocelynSP wants to merge 4 commits into AnimaTardeb:master from JocelynSP:master

@JocelynSP JocelynSP commented Apr 26, 2017

This is an implementation of the algorithm in Bedrat, Lacroix & Mergny, "Re-evaluation of G-quadruplex propensity with G4Hunter" (2016).

It merges windows into regions more sensibly than the supplied binary executable, so that regions do not overlap. The merged regions reflect the published algorithm in that terminal As and Ts are not shown.
When run on the supplied Mitochondria_NC_012920_1.fasta, the window scores agree with those of the supplied binary executable.
It does not currently output a Score_plot.pdf.
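
For readers following along, the merging step described above could be sketched roughly as follows. This is not the code in this pull request: the function names `merge_windows` and `trim_at`, the 0-based half-open coordinates, and the choice to merge windows that merely touch are all assumptions made for illustration.

```python
def merge_windows(starts, window):
    """Merge windows into non-overlapping half-open regions.

    `starts` holds the 0-based start of each window that passed the
    threshold. Windows that overlap or touch are merged (an assumption;
    the script in this PR may treat touching windows differently).
    """
    regions = []
    for start in sorted(starts):
        end = start + window
        if regions and start <= regions[-1][1]:
            # Overlaps (or touches) the previous region: extend it.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]


def trim_at(seq, start, end):
    """Shrink a half-open region so it neither starts nor ends with A or T."""
    while start < end and seq[start] in "ATat":
        start += 1
    while end > start and seq[end - 1] in "ATat":
        end -= 1
    return start, end
```

For example, `merge_windows([0, 1, 2, 10], 3)` yields two regions, `(0, 5)` and `(10, 13)`, and `trim_at("ATGGGCAT", 0, 8)` narrows the region to `(2, 6)`.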

This is a naive implementation of the algorithm in Bedrat, Lacroix & Mergny, "Re-evaluation of G-quadruplex propensity with G4Hunter" (2016).
It is inefficient, and it doesn't fully match the scoring system: nucleotides at the ends of a window are given a score that does not reflect any extension of the run of matching nucleotides outside the window.
It merges windows into regions more sensibly, so that regions do not overlap, and the regions more accurately reflect the published algorithm in that terminal As and Ts are removed.
It does not output a Score_plot.pdf.
Add option to ScoreSeq function to adjust score based on runs extending outside window
Correct errors in how scores were adjusted for runs commencing before the window being scored
@JocelynSP
Author

I have now matched the scoring system, so each window's score is adjusted for the length of any run that extends outside the window. With a 25 nt window and a threshold of 1.5, this gives the same output as the original for the file Mitochondria_NC_012920_1.fasta, except that it is tab-separated instead of space-separated.
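
To illustrate the difference this makes, here is a minimal sketch, assuming the per-base rule from the G4Hunter paper (each G in a run of n Gs scores min(n, 4), each C scores the negative, and A/T score 0, with a window scoring the mean of its bases). The names `base_scores` and `window_scores` and the `whole_sequence_runs` flag are hypothetical and are not the `ScoreSeq` interface in this PR.

```python
from itertools import groupby


def base_scores(seq):
    """Per-base scores: G runs score +min(len, 4), C runs -min(len, 4), others 0."""
    scores = []
    for base, run in groupby(seq.upper()):
        n = len(list(run))
        if base == "G":
            value = min(n, 4)
        elif base == "C":
            value = -min(n, 4)
        else:
            value = 0
        scores.extend([value] * n)
    return scores


def window_scores(seq, window=25, whole_sequence_runs=True):
    """Mean score of every sliding window.

    With whole_sequence_runs=True, run lengths are measured on the whole
    sequence, so a run that continues outside the window still counts in full.
    With False, each window is scored in isolation (the naive behaviour).
    """
    full = base_scores(seq) if whole_sequence_runs else None
    result = []
    for start in range(len(seq) - window + 1):
        if whole_sequence_runs:
            values = full[start:start + window]
        else:
            values = base_scores(seq[start:start + window])
        result.append(sum(values) / window)
    return result
```

With the toy sequence `AGGGGA` and a window of 3, the window `GGG` starting at position 1 scores 4.0 when runs are measured on the whole sequence, but only 3.0 when the window is scored in isolation, because the fourth G lies outside it.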

@JocelynSP JocelynSP changed the title from "Python code for a naive implementation of G4-Hunter algorithm" to "Python code for an implementation of G4-Hunter algorithm" on May 1, 2017
@mahzer

mahzer commented Jan 18, 2018

Hi Jocelyn,
Nice work!
Is it possible to get the result as a BED file when using a reference genome as input? I know it's possible with the original R script, but I'd like to try the new feature that you have implemented.
Thanks,
MZ

@AnimaTardeb
Owner

AnimaTardeb commented Jan 18, 2018 via email

@JocelynSP
Author

Hi Mahzer,
I don't know what original R code you are referring to. Do you mean the original Python / binary, or might you be on the wrong post?

I am not interested in doing more work on this script, but it would not be hard to add a BED-format output, or to convert the _merged.tsv file to BED format.
BED files have no column headers. They have 3 to 12 tab-separated fields, of which chrom, start and end are required, as Amina said above. See: https://genome.ucsc.edu/FAQ/FAQformat.html#format1
In the _merged.tsv file, the chrom is a section heading and would have to be written in field 1 instead; Start and End can then go in fields 2 and 3. The Sequence could go in field 4 (name), or name could just be '.'. Field 5 (score) would then hold Score, and no other optional fields would be used.
Jocelyn
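
The conversion described in this comment could look something like the sketch below. The exact layout of the _merged.tsv file is not shown in this thread, so everything about it here is an assumption: that section headings start with ">", that Start, End and Sequence are the first three columns and Score the last, and that Start is 1-based (BED starts are 0-based). The function name `merged_tsv_to_bed` and its parameters are hypothetical. Note also that the UCSC format description expects the score field to be between 0 and 1000, so some tools may not accept a raw G4Hunter score there.

```python
def merged_tsv_to_bed(lines, start_col=0, end_col=1, seq_col=2,
                      score_col=-1, one_based_start=True):
    """Yield BED lines (chrom, start, end, name, score) from merged-TSV lines.

    Column positions and the ">" heading convention are assumptions;
    adjust them to match the real file.
    """
    chrom = None
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith(">"):
            # Section heading: remember it as the chrom for the rows below.
            parts = line[1:].split()
            chrom = parts[0] if parts else None
            continue
        fields = line.split("\t")
        try:
            start = int(fields[start_col])
            end = int(fields[end_col])
        except (ValueError, IndexError):
            continue  # column-header row or a line that is not a region
        if chrom is None:
            continue
        if one_based_start:
            start -= 1  # BED uses 0-based, half-open coordinates
        yield "\t".join([chrom, str(start), str(end),
                         fields[seq_col], fields[score_col]])
```

A typical use would be to open the _merged.tsv file, pass the file object as `lines`, and write each yielded string followed by a newline to a .bed file.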

@mahzer

mahzer commented Jan 19, 2018

Thanks, Amina and Jocelyn.

I was referring to the R scripts included in the supplementary material of the paper. I did not look at all of them and thought one of them was the actual code in R.

MZ
