DSuveges · DSuveges · Feb 10, 2026 · Feb 9, 2026 · Feb 10, 2026 · Feb 10, 2026
diff --git a/README.md b/README.md
@@ -34,23 +34,23 @@ uv sync --all-extras
 
 As input, the most recent datasets are pulled from the respected sources as part of the data preparation process. All source data are mapped to the GRCh38 build of the human genome.
 
-* **The sequence of the human genome** is dowloaded from [Ensembl](http://www.ensembl.org/info/data/ftp/index.html) (checking for the most recent version).
-* **Genome wide association signals** most recent version of the NHGRI-EBI [GWAS catalog](https://www.ebi.ac.uk/gwas/) (checking the most recent version).
-* **Gene annotation** the most recent gene coordinates are downloaded from [GENCODE](http://www.gencodegenes.org/releases/current.html).
-* **Canonical transcripts** of protein coding genes are defined according to [Ensembl](https://www.ensembl.org/Help/Glossary).
-* **Cytological bands** coordinates fetched from the Ensembl [REST API](http://rest.ensembl.org/).
+- **The sequence of the human genome** is dowloaded from [Ensembl](http://www.ensembl.org/info/data/ftp/index.html) (checking for the most recent version).
+- **Genome wide association signals** most recent version of the NHGRI-EBI [GWAS catalog](https://www.ebi.ac.uk/gwas/) (checking the most recent version).
+- **Gene annotation** the most recent gene coordinates are downloaded from [GENCODE](http://www.gencodegenes.org/releases/current.html).
+- **Canonical transcripts** of protein coding genes are defined according to [Ensembl](https://www.ensembl.org/Help/Glossary).
+- **Cytological bands** coordinates fetched from the Ensembl [REST API](http://rest.ensembl.org/).
 
 More information on the sources can be found in the `config.json` configuration file.
 
 ### Step 1 - Pre-processing
 
 ```bash
-python Prepare_data.py -d data_folder/ -c config.json -s 450 -t 0.5
+uv run Prepare-data -d data_folder/ -c config.json -s 450 -t 0.5
 ```
 
 help output:
 
-```
+```text
 usage: prepare_data.py [-h] -d DATADIR -c CONFIG -s CHUNKSIZE -t TOLERANCE
 
 This script fetches and parses input data for the genome plotter project
@@ -67,20 +67,20 @@ optional arguments:
                         Fraction of a chunk that cannot be N.
 ```
 
-* *<DATADIR>* folder into which the files are going to be saved.
-* *<CONFIG>* JSON file containing the project level configuration. Will be used for multiple scripts
-* *<LOGFILE>* information on the run is saved here.
-* *<CHUNKSIZE>* the length of non-overlapping window used to pool together to calculate [GC content](https://en.wikipedia.org/wiki/GC-content). In basepairs.
-* *<TOLERANCE>* Ns are discarded from the GC content calculation. This float (ranging from 0-1) shows the maximum of Ns in a chunk tolerated. Chunks with too high N ratio is considered as heterochromatic region on the plot.
+- **DATADIR** folder into which the files are going to be saved.
+- **CONFIG** JSON file containing the project level configuration. Will be used for multiple scripts
+- **LOGFILE** information on the run is saved here.
+- **CHUNKSIZE** the length of non-overlapping window used to pool together to calculate [GC content](https://en.wikipedia.org/wiki/GC-content). In basepairs.
+- **TOLERANCE** Ns are discarded from the GC content calculation. This float (ranging from 0-1) shows the maximum of Ns in a chunk tolerated. Chunks with too high N ratio is considered as heterochromatic region on the plot.
 
 
 ### Step 2 - Generate chromosome plot
 
 ```bash
-./plot_chromosome.py --help
+uv run plot-chromosome --help
 ```
 
-```
+```text
 usage: plot_chromosome.py [-h] -c CHROMOSOME [-w WIDTH] [-p PIXEL] [-s DARKSTART] [-m DARKMAX] -f FOLDER [--textFile]
                           [-g GENEFILE] [-t TEST] [--dummy] --config CONFIG [-l LOGFILE]
 
@@ -125,16 +125,15 @@ Finally the `.png` file is saved (and `.svg` file if required).
 
 The `gene_sets/` folder contains a set of files that can be used as gene annotation:
 
-* *kinases_Hs.bed.gz*: list of kinase genes in the human genome, generated by `get_kinases.sh` script.
-* *gene_w_drugs.tsv.gz*: list of genes for which approved drugs exists (where the mechanism of action is known) based on [OpenTargets tractability](https://docs.targetvalidation.org/getting-started/target-tractability) data.
+- *kinases_Hs.bed.gz*: list of kinase genes in the human genome, generated by `get_kinases.sh` script.
+- *gene_w_drugs.tsv.gz*: list of genes for which approved drugs exists (where the mechanism of action is known) based on [OpenTargets tractability](https://docs.targetvalidation.org/getting-started/target-tractability) data.
 
 ### Result
 
 The following image was created based on the data of chromosome 20, where 450 bp-s were averaged to get GC content, and 200 of these chunks were plotted in each row. The list of kinase genes are shown on the right side of the chromosome.
 
 <img src="plots/chr20.png" alt="Chromosome 20" height="1500"/>
 
-
 The combined plot showing the entire genome with all the protein kinases highlighted:
 
 <img src="plots/kinases.png" alt="Protein kinases in the genome" width="1000">

diff --git a/src/genome_plotter/config.json → src/genome_plotter/assets/config.json b/src/genome_plotter/config.json → src/genome_plotter/assets/config.json
@@ -66,7 +66,7 @@
     "gwas_data": {
       "host": "ftp.ebi.ac.uk",
       "path": "/pub/databases/gwas/releases/latest",
-      "source_file": "gwas-catalog-associations_ontology-annotated.tsv",
+      "source_file": "gwas-catalog-associations_ontology-annotated-full.zip",
       "processed_file": "processed_GWAS.bed.gz",
       "release_date": null
     }

diff --git a/src/genome_plotter/functions/ChromosomePlotter.py b/src/genome_plotter/functions/ChromosomePlotter.py
@@ -3,6 +3,9 @@
 from __future__ import annotations
 
 import logging
+import os
+
+os.environ["DYLD_FALLBACK_LIBRARY_PATH"] = "/opt/homebrew/lib"
 
 import cairosvg
 import pandas as pd
@@ -62,14 +65,16 @@ def __add_centromere(self: ChromosomePlotter) -> None:
 
         # Right mark of the centromere:
         centromere_string = f'<path d="M -1 0 \
-            C 0 {centromere_midpoint}, {cetromere_x/2} {centromere_midpoint}, {cetromere_x/2} {centromere_midpoint} \
-            C {cetromere_x/2} {centromere_midpoint}, 0 {centromere_midpoint}, -1 {centromere_hight} Z" fill="white"/>\n'
+            C 0 {centromere_midpoint}, {cetromere_x / 2} {centromere_midpoint}, {cetromere_x / 2} {centromere_midpoint} \
+            C {cetromere_x / 2} {centromere_midpoint}, 0 {centromere_midpoint}, -1 {centromere_hight} Z" fill="white"/>\n'
 
         half_centromere = f'<g transform="translate(0, {centromere_start})">\n\t{centromere_string}\n</g>\n'
 
         # Generating the other half of the centromoere:
-        other_half = f'<g transform="rotate(180 0 {centromere_start + centromere_midpoint}) \
+        other_half = (
+            f'<g transform="rotate(180 0 {centromere_start + centromere_midpoint}) \
             translate(-{self.__width__}, 0)">\n\t{half_centromere}\n</g>\n'
+        )
 
         # adding both sides of the centromere to the plot:
         self.__plot_string__ += (

diff --git a/src/genome_plotter/functions/CytobandAnnotator.py b/src/genome_plotter/functions/CytobandAnnotator.py
@@ -2,7 +2,11 @@
 
 from __future__ import annotations
 
+import os
+
 import cairosvg
+
+os.environ["DYLD_FALLBACK_LIBRARY_PATH"] = "/opt/homebrew/lib"
 import pandas as pd
 
 

diff --git a/src/genome_plotter/functions/svg_handler.py b/src/genome_plotter/functions/svg_handler.py
@@ -2,8 +2,10 @@
 
 from __future__ import annotations
 
+import os
 from typing import Any
 
+os.environ["DYLD_FALLBACK_LIBRARY_PATH"] = "/opt/homebrew/lib"
 import cairosvg
 
 
@@ -177,7 +179,7 @@ def draw_line(
 
         if kwargs:
             for key, value in kwargs.items():
-                extra_args += f' {key.replace("_","-")}="{value}"'
+                extra_args += f' {key.replace("_", "-")}="{value}"'
         print(extra_args)
         self.__svg__ += self.__svg_line__.format(
             x1, y1, x2, y2, stroke, stroke_width, extra_args

diff --git a/src/genome_plotter/input_parsers/fetch_ensembl.py b/src/genome_plotter/input_parsers/fetch_ensembl.py
@@ -44,9 +44,9 @@ def __init__(self: FetchGenome, ensembl_parameters: SourcePrototype) -> None:
         Args:
             ensembl_parameters (SourcePrototype): Parameters to fetch the data.
         """
-        assert (
-            ensembl_parameters.release and ensembl_parameters.path
-        ), "Ensembl release and path needs to be provided."
+        assert ensembl_parameters.release and ensembl_parameters.path, (
+            "Ensembl release and path needs to be provided."
+        )
 
         # These are the parameters required to fetch the data:
         self.host = ensembl_parameters.host
@@ -93,11 +93,7 @@ def parse_genome(
                 # If there's data in the buffer, save it:
                 if chrom_data:
                     # We are skipping non-canonical chromosomes:
-                    if chrom_name and len(chrom_name) < 3:
-                        logger.info(f"Parsing chromosome {chrom_name} is done.")
-                        self.process_chromosome(chrom_data, chrom_name)
-                    else:
-                        logger.info(f"Chromosome {chrom_name} is skipped.")
+                    self.process_chromosome(chrom_data, chrom_name)
 
                     # Empty chromosome data buffer:
                     chrom_data = ""
@@ -129,6 +125,12 @@ def process_chromosome(
             chrom_data (pd.DataFrame): The chromosome sequence data.
             chr_name (str): The name of the chromosome.
         """
+        # Test chromosome name if we want to process or not:
+        if chr_name is None or len(chr_name) > 3:
+            logger.warning(f"Non-canoncial chromosome ({chr_name}) is skipped.")
+            return None
+
+        logger.info(f"Processing chromosome: {chr_name}")
         raw_data = []
         file_name = f"{self.data_folder}/{self.parsed_file.format(chr_name)}"
         chunk_size = self.chunk_size