ipegg

ipegg (InterProScan Environment & Genomic Grapher) is an R-based tool that visualizes protein domain architectures from InterProScan TSV output. It produces publication-quality plots that draw each protein at its full length, and it supports deep customization through YAML configuration files.
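
As a rough illustration of the input, the sketch below reads a few InterProScan-style TSV rows with base R and separates the per-protein length (column 3) from the domain coordinates (columns 7 and 8). The rows are invented toy data, and the data frame names `backbone` and `domains` are hypothetical; this is not taken from the ipegg source.

```r
# Toy InterProScan-style rows (invented for illustration only).
# Columns used here: V1 = protein ID, V3 = sequence length,
# V6 = signature description, V7 = start, V8 = stop.
tsv <- paste(
  "protA\tmd5a\t300\tPfam\tPF00069\tProtein kinase domain\t50\t250\t1e-20\tT\t01-01-2024",
  "protA\tmd5a\t300\tPfam\tPF00017\tSH2 domain\t260\t290\t1e-10\tT\t01-01-2024",
  "protB\tmd5b\t120\tPfam\tPF00017\tSH2 domain\t10\t100\t1e-12\tT\t01-01-2024",
  sep = "\n"
)

# InterProScan TSV files have no header row, so columns are named V1, V2, ...
hits <- read.delim(text = tsv, header = FALSE, stringsAsFactors = FALSE)

# One backbone segment per protein, spanning 0 to the full length.
backbone <- unique(data.frame(id = hits$V1, start = 0, end = hits$V3))

# One row per domain hit, to be drawn on top of the backbone.
domains <- data.frame(id = hits$V1, name = hits$V6,
                      start = hits$V7, end = hits$V8)

print(backbone)
print(domains)
```

Keeping the backbone separate from the domain hits is what lets a plot show undecorated N- and C-terminal stretches.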

🚀 Features

📊 Advanced Visualization

  • True Backbone: Draws each protein from position 0 to its full length (the sequence length in the third TSV column, V3), so N- and C-terminal regions without domains remain visible.
  • Intelligent Scaling: Automatically adjusts image height based on the number of sequences to maintain consistent block thickness.
  • Natural Sorting: Handles alphanumeric IDs (e.g., Chr1, Chr2, Chr10) correctly.
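
The scaling and sorting ideas above can be sketched in base R. Both functions are hypothetical illustrations, not the ipegg implementation: `natural_order` assumes IDs consist of a text prefix followed by an optional trailing number, and the `per_sequence` and `margin` values in `plot_height` are arbitrary.

```r
# Natural sort: order by text prefix first, then by the trailing number
# compared numerically, so "Chr10" comes after "Chr2".
natural_order <- function(ids) {
  prefix <- sub("[0-9]+$", "", ids)                 # text before trailing digits
  suffix <- substring(ids, nchar(prefix) + 1)       # the trailing digits, or ""
  number <- suppressWarnings(as.numeric(suffix))    # "" becomes NA
  number[is.na(number)] <- -1                       # IDs without digits sort first
  order(prefix, number)
}

# Height that grows linearly with the number of sequences, so each row
# keeps roughly the same thickness (values are assumptions, in inches).
plot_height <- function(n_sequences, per_sequence = 0.4, margin = 2) {
  margin + per_sequence * n_sequences
}

ids <- c("Chr10", "Chr2", "Chr1")
sorted_ids <- ids[natural_order(ids)]
print(sorted_ids)
print(plot_height(length(ids)))
```

A plain `sort()` would place "Chr10" before "Chr2" because it compares character by character, which is the problem natural sorting avoids.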

🎨 Customization

  • Regex Normalization: Automatically clean or rename complex domain names.
  • Domain Filtering: Remove noisy or uninformative hits (e.g., non-cytoplasmic regions).
  • Dynamic Themes: Large axis titles and clear X-axis lines for better readability.
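
A minimal sketch of how regex normalization and domain filtering could work, using base R `gsub` and `grepl`. The domain names, the rules, and the helper names `normalize_names` and `drop_domains` are all hypothetical; the actual ipegg logic may differ.

```r
# Example domain names and rules (invented for illustration).
domain_names  <- c("Pkinase_Tyr", "Pkinase", "NON_CYTOPLASMIC_DOMAIN", "SH2 domain")
replace_rules <- c("^Pkinase.*" = "Kinase", " domain$" = "")
remove_list   <- c("NON_CYTOPLASMIC")

# Apply each "pattern" = "replacement" pair in turn.
normalize_names <- function(x, rules) {
  for (pattern in names(rules)) {
    x <- gsub(pattern, rules[[pattern]], x)
  }
  x
}

# Drop every name that matches any keyword in the removal list.
drop_domains <- function(x, keywords) {
  if (length(keywords) == 0) return(x)   # nothing to remove
  x[!grepl(paste(keywords, collapse = "|"), x)]
}

cleaned <- drop_domains(normalize_names(domain_names, replace_rules), remove_list)
print(cleaned)
```

Normalizing before filtering means the removal list only has to account for the cleaned names, although the opposite order would be equally valid.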

🔗 Integration

  • iTOL Ready: Generates ready-to-use dataset files for iTOL (Interactive Tree Of Life).
  • Phylogenetic Ordering: Option to order sequences based on a Newick tree file.
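
To illustrate phylogenetic ordering without extra packages, the sketch below pulls tip labels out of a simple Newick string with a regular expression and reorders sequence IDs to match. It assumes unquoted labels with no special characters, and the helper names are hypothetical; a real pipeline would more likely rely on a tree-parsing package.

```r
# Extract tip labels in the order they appear in the Newick string.
# Tips are the labels that directly follow "(" or ","; internal node
# labels follow ")" and are therefore skipped.
newick_tip_order <- function(newick) {
  hits <- regmatches(newick, gregexpr("(?<=[(,])[^(),:;]+", newick, perl = TRUE))
  trimws(hits[[1]])
}

# Reorder IDs so they follow the tree; IDs absent from the tree go last.
order_by_tree <- function(ids, newick) {
  tips <- newick_tip_order(newick)
  ids[order(match(ids, tips))]
}

tree <- "((B:1,C:1):1,A:2);"
tips <- newick_tip_order(tree)
ordered_ids <- order_by_tree(c("A", "B", "C"), tree)
print(tips)
print(ordered_ids)
```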

⚙️ Configuration (config.yaml)

ipegg uses a YAML file to manage settings, ensuring reproducible results for your genomic pipelines.

Yeast Example Configuration:

🛠️ Installation

1. Clone the repository:

git clone https://github.com/mathiashole/ipegg.git

cd ipegg

2. Install required R packages:

install.packages(c("ggplot2", "dplyr", "yaml", "RColorBrewer", "tidyr", "BiocManager"))
BiocManager::install("ggtree")  # ggtree is distributed through Bioconductor, not CRAN

🔧 Usage

Run the script from your terminal, passing the configuration file as an argument:

Rscript ipegg.r --config config2.yaml
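
For readers curious how a script might pick up the `--config` flag, here is a minimal base R sketch. The function `parse_config_arg` is hypothetical and is not claimed to match how ipegg.r parses its arguments.

```r
# Return the value that follows "--config" in a vector of arguments.
parse_config_arg <- function(args) {
  pos <- match("--config", args)
  if (is.na(pos) || pos == length(args)) {
    stop("Usage: Rscript ipegg.r --config <file.yaml>")
  }
  args[pos + 1]
}

# Inside a script, the arguments would come from:
#   parse_config_arg(commandArgs(trailingOnly = TRUE))
config_path <- parse_config_arg(c("--config", "config2.yaml"))
print(config_path)
```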

⚙️ YAML Arguments

| Section   | Parameter | Description |
|-----------|-----------|-------------|
| input     | file      | Path to the InterProScan TSV file. |
| normalize | replace   | Dictionary of "Pattern": "Replacement" for name cleaning (Regex support). |
| remove    | list      | List of domains or keywords to exclude from the final plot. |
| options   | ordenar   | Boolean. If true, applies natural alphanumeric sort (e.g., Hap1, Hap2, Hap10). |
| domains   | colors    | Manual color mapping using R color names or Hex codes. |
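
As a sketch of how these settings might look once loaded into R, the nested list below mirrors the section and parameter names listed above. The nesting is an assumption based on those names, every value is invented, and `get_option` is a hypothetical helper. With the yaml package installed, `yaml::read_yaml("config.yaml")` would presumably return a comparable structure.

```r
# Hand-built stand-in for a parsed config file (all values hypothetical).
config <- list(
  input     = list(file = "results/interproscan.tsv"),
  normalize = list(replace = list("^Pkinase.*" = "Kinase")),
  remove    = list(list = c("NON_CYTOPLASMIC_DOMAIN")),
  options   = list(ordenar = TRUE),
  domains   = list(colors = list(Kinase = "steelblue", SH2 = "#E69F00"))
)

# Look up a parameter, falling back to a default when it is absent.
get_option <- function(cfg, section, parameter, default = NULL) {
  value <- cfg[[section]][[parameter]]
  if (is.null(value)) default else value
}

# col2rgb() raises an error for names that are not valid R colors,
# so it doubles as a quick check of the color mapping.
color_check <- grDevices::col2rgb(unlist(get_option(config, "domains", "colors")))

print(get_option(config, "input", "file"))
print(get_option(config, "options", "ordenar", default = FALSE))
```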

💖 Contributing

We welcome contributions!

  • Pull requests and 🌟 stars are always welcome.
  • For major changes, please open an issue first to discuss what you would like to change.
  • Please make sure to update tests as appropriate.

About

ipegg views and compares protein domain architectures based on InterProScan annotations.
