A Python-based computational genomics pipeline for detecting DNA sequence mutations across multiple genome samples using FASTA sequence analysis.
This project simulates a simplified variant calling workflow used in computational biology and genomics research.
The pipeline compares sample DNA sequences against a reference genome and identifies genomic variants (mutations), including SNP-like changes.
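The comparison step described above can be sketched as a position-by-position scan. This is a minimal illustration, not the project's actual `variant_detector.py`: the function name `call_variants`, the 1-based coordinates, and the requirement that both sequences are already aligned and of equal length are all assumptions.

```python
def call_variants(reference: str, sample: str) -> list[dict]:
    """Compare two pre-aligned, equal-length sequences position by position."""
    if len(reference) != len(sample):
        # Assumption: substitutions only; insertions and deletions are not handled.
        raise ValueError("sequences must have the same length")
    variants = []
    pairs = zip(reference.upper(), sample.upper())
    for index, (ref_base, sample_base) in enumerate(pairs):
        if ref_base != sample_base:
            variants.append({
                "Position": index + 1,  # assumption: 1-based coordinates
                "Reference": ref_base,
                "Sample": sample_base,
                "Mutation": f"{ref_base}>{sample_base}",
            })
    return variants


# Toy sequences invented for illustration only.
demo_variants = call_variants("ATGCCGTA", "ATGTCGAA")
for variant in demo_variants:
    print(variant)
```

A scan like this only finds SNP-like substitutions; sequences of different lengths would need an alignment step first.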
## Features
- FASTA sequence parsing
- Reference vs sample genome comparison
- Mutation (variant) detection
- Multi-sample genomic analysis
- Variant annotation
- CSV export of detected mutations
- Mutation visualization plots
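As a rough illustration of the FASTA parsing feature, the sketch below reads records with the standard library only. The helper name `parse_fasta` and the demo sequences are hypothetical. Since the project lists Biopython, the real script may well use `Bio.SeqIO.parse(path, "fasta")` instead, which yields records with `.id` and `.seq` attributes.

```python
import io


def parse_fasta(handle) -> dict[str, str]:
    """Return {record_id: sequence} from an open FASTA text handle."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Toy FASTA text invented for illustration only.
fasta_text = ">reference demo\nATGC\nCGTA\n>sample1\nATGT\nCGAA\n"
records = parse_fasta(io.StringIO(fasta_text))
print(records)
```

Keying records by the first word of the header mirrors how Biopython derives a record's `id`.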
## Technologies Used
- Python
- Biopython
- Pandas
- NumPy
- Matplotlib
## Project Structure
    variant-calling-pipeline/
    │
    ├── data/
    │   ├── reference.fasta
    │   ├── sample1.fasta
    │   ├── sample2.fasta
    │   └── sample3.fasta
    │
    ├── src/
    │   └── variant_detector.py
    │
    ├── results/
    │   ├── variants.csv
    │   └── variant_plot.png
    │
    ├── requirements.txt
    ├── README.md
    └── .gitignore
## Example Variant Output
| Position | Reference | Sample | Mutation |
|----------|-----------|--------|----------|
| 17       | C         | T      | C>T      |
| 22       | G         | A      | G>A      |
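Rows like these could be written to CSV along the following lines. This sketch uses the standard `csv` module and an in-memory buffer so it runs anywhere. The function name `write_variants_csv` is hypothetical, and the column set simply mirrors the example output above; the real `results/variants.csv` may contain more columns, such as a sample name. With pandas, `DataFrame.to_csv` would do the same job.

```python
import csv
import io

# Assumption: columns mirror the example variant output.
FIELDNAMES = ["Position", "Reference", "Sample", "Mutation"]


def write_variants_csv(variants, handle):
    """Write variant dictionaries to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(variants)


# The two rows from the example output.
variants = [
    {"Position": 17, "Reference": "C", "Sample": "T", "Mutation": "C>T"},
    {"Position": 22, "Reference": "G", "Sample": "A", "Mutation": "G>A"},
]
buffer = io.StringIO()
write_variants_csv(variants, buffer)
print(buffer.getvalue())
```

To write a real file, pass a handle opened with `open(path, "w", newline="")` instead of the buffer.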
## Visualization
The pipeline generates mutation position plots for genomic variant visualization.
Output example: `results/variant_plot.png`
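One plausible way to draw such a plot with Matplotlib is to mark each variant position with a vertical line. The function name `plot_variant_positions`, the figure size, and the labels are assumptions for illustration; the actual plot produced by the project may look different.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt


def plot_variant_positions(positions, output_path):
    """Draw one vertical line per variant position and save the figure."""
    fig, ax = plt.subplots(figsize=(8, 2))
    ax.vlines(positions, 0, 1)
    ax.set_xlabel("Genome position")
    ax.set_yticks([])  # the y-axis carries no information here
    ax.set_title("Detected variant positions")
    fig.tight_layout()
    fig.savefig(output_path)
    plt.close(fig)


# Positions taken from the example variant output; a temporary
# directory stands in for results/ so the sketch has no side effects.
output_path = os.path.join(tempfile.mkdtemp(), "variant_plot.png")
plot_variant_positions([17, 22], output_path)
print(output_path)
```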
## How to Run
1. Activate the virtual environment (Windows PowerShell): `.\venv\Scripts\Activate.ps1`
2. Run the variant calling pipeline: `python src/variant_detector.py`
## Bioinformatics Concepts Used
- Variant Calling
- SNP Detection
- FASTA Parsing
- Comparative Genomics
- Mutation Annotation
- Genome Analysis Pipelines
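To make the mutation annotation concept concrete, one common annotation for single-base substitutions is the transition/transversion class. The sketch below shows that classification only; whether the project annotates variants this way is an assumption, and `classify_snp` is a hypothetical helper name.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref_base: str, alt_base: str) -> str:
    """Label a single-base substitution as a transition or a transversion."""
    ref_base, alt_base = ref_base.upper(), alt_base.upper()
    valid_bases = PURINES | PYRIMIDINES
    if ref_base not in valid_bases or alt_base not in valid_bases:
        return "other"  # ambiguity codes such as N, gaps, and so on
    if ref_base == alt_base:
        return "other"  # not a substitution
    pair = {ref_base, alt_base}
    # Transition: purine<->purine or pyrimidine<->pyrimidine.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    # Transversion: purine<->pyrimidine.
    return "transversion"


for ref_base, alt_base in [("C", "T"), ("G", "A"), ("A", "C")]:
    print(f"{ref_base}>{alt_base}: {classify_snp(ref_base, alt_base)}")
```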
## Future Improvements
- VCF file generation
- Sequence alignment scoring
- Real genome dataset integration
- Streamlit web interface
- Phylogenetic analysis
- Advanced mutation statistics
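As a hint of what the VCF item might involve, the sketch below turns variant dictionaries into minimal VCF-style lines with the eight fixed columns. It is not a complete or validated VCF writer: the function name `to_vcf_lines`, the default chromosome label, and the dictionary keys (borrowed from the example output columns) are assumptions, and unknown fields are filled with `.`.

```python
def to_vcf_lines(variants, chrom="reference"):
    """Build minimal VCF-style text lines from variant dictionaries."""
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for variant in variants:
        fields = [
            chrom,
            str(variant["Position"]),  # VCF positions are 1-based
            ".",                       # no variant identifier
            variant["Reference"],
            variant["Sample"],
            ".",                       # no quality score
            ".",                       # no filter applied
            ".",                       # no INFO annotations
        ]
        lines.append("\t".join(fields))
    return lines


# The two rows from the example variant output.
vcf_lines = to_vcf_lines([
    {"Position": 17, "Reference": "C", "Sample": "T"},
    {"Position": 22, "Reference": "G", "Sample": "A"},
])
print("\n".join(vcf_lines))
```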
## Author
Built as a bioinformatics portfolio project to demonstrate computational genomics and Python-based genome analysis skills.