🧬 Bioinformatics – Protein Sequence Property Analyzer

A Python-based bioinformatics project for analyzing protein sequences and extracting important biochemical, structural, and physicochemical properties.
The project reads protein sequences in FASTA format, performs protein analysis using Biopython, computes hydropathy profiles, detects motifs, classifies proteins based on stability and hydrophobicity, and visualizes the results using Matplotlib.

📌 Project Overview

Proteins are biological macromolecules made up of amino acid chains, and their properties determine how they behave in a cell.
This project focuses on sequence-based protein analysis and produces a detailed report for each protein sequence provided in the input.

The analyzer calculates:

- Protein Length
- Molecular Weight
- Isoelectric Point (pI)
- GRAVY Score (Grand Average of Hydropathy)
- Instability Index
- Secondary Structure Fractions
- Hydropathy Profile
- Motif Locations
- Protein Classification

✨ Features

🧾 FASTA Parsing
- Reads protein sequences from FASTA format
- Separates sequence names (header lines) from sequence data
📂 External FASTA File Support
- Can be extended to load .fasta, .fa, or .txt FASTA files from:
  - local system
  - project folder
  - downloads folder
  - any custom file path
🧪 Protein Property Analysis
- Uses Bio.SeqUtils.ProtParam.ProteinAnalysis
- Computes:
  - molecular weight
  - isoelectric point
  - GRAVY
  - instability index
  - helix fraction
  - turn fraction
  - sheet fraction
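
The properties listed above map onto methods of `ProteinAnalysis`. The following is a minimal sketch of how they might be collected for one sequence; the function name `analyze_protein`, the dictionary keys, and the example sequence are illustrative assumptions, not taken from the project code.

```python
from Bio.SeqUtils.ProtParam import ProteinAnalysis


def analyze_protein(sequence: str) -> dict:
    """Compute sequence-level properties for one protein (illustrative helper)."""
    analysis = ProteinAnalysis(sequence)
    # secondary_structure_fraction() returns a (helix, turn, sheet) tuple.
    helix, turn, sheet = analysis.secondary_structure_fraction()
    return {
        "length": len(sequence),
        "molecular_weight": analysis.molecular_weight(),
        "isoelectric_point": analysis.isoelectric_point(),
        "gravy": analysis.gravy(),
        "instability_index": analysis.instability_index(),
        "helix_fraction": helix,
        "turn_fraction": turn,
        "sheet_fraction": sheet,
    }


# Hypothetical example sequence, used only for illustration.
props = analyze_protein("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
for name, value in props.items():
    print(f"{name}: {value}")
```

The sketch assumes the sequence contains only the 20 standard amino acid letters; ambiguous residues such as `X` would need to be filtered out first.
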
🌊 Hydropathy Plot
- Uses a sliding-window method to calculate local hydropathy values
- Helps identify hydrophobic and hydrophilic regions in the protein
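
A sliding-window profile can be sketched in plain Python as follows. The Kyte-Doolittle scale and the window size of 9 are assumptions made for this example (the README does not state which scale or window the project uses), and `hydropathy_profile` is a hypothetical name.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list:
    """Return the mean hydropathy of every window of `window` residues."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    # A non-standard residue letter raises KeyError here.
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]


# Hypothetical example sequence, used only for illustration.
profile = hydropathy_profile("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", window=9)
print(len(profile), "window averages")
```

Positive averages indicate locally hydrophobic stretches and negative averages indicate hydrophilic ones. A sequence of length n yields n - window + 1 values.
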
🔍 Motif Detection
- Uses regular expressions to identify sequence motifs
- Default motif used in the project:
```
N[^P][ST]
```
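
This pattern matches an asparagine, then any residue except proline, then serine or threonine, which corresponds to the N-linked glycosylation sequon. A minimal sketch of motif scanning with `re` might look like the following; the function name `find_motifs` and the 1-based position convention are assumptions for illustration. The lookahead wrapper is a design choice in this sketch so that overlapping occurrences are also reported.

```python
import re

DEFAULT_MOTIF = "N[^P][ST]"


def find_motifs(sequence: str, pattern: str = DEFAULT_MOTIF) -> list:
    """Return (1-based start position, matched text) for every motif hit."""
    # Wrapping the pattern in a lookahead makes each match zero-width,
    # so hits that overlap one another are all found.
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Hypothetical example sequence, used only for illustration.
print(find_motifs("MNASNPTNGT"))  # → [(2, 'NAS'), (8, 'NGT')]
```

In the example, `NPT` at position 5 is skipped because proline is excluded from the middle position.
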
🧠 Protein Classification
- Classifies proteins as:
  - Stable & Hydrophobic
  - Stable & Hydrophilic
  - Unstable Protein
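
The README does not state the cutoffs behind these three classes. The sketch below assumes the commonly used convention that an instability index below 40 indicates a stable protein, and that a positive GRAVY score indicates overall hydrophobicity; both thresholds and the function name `classify_protein` are assumptions, not confirmed project behavior.

```python
def classify_protein(instability_index: float, gravy: float,
                     stability_cutoff: float = 40.0) -> str:
    """Assign one of the three class labels (thresholds are assumptions)."""
    # Assumed convention: instability index >= 40 means unstable.
    if instability_index >= stability_cutoff:
        return "Unstable Protein"
    # Assumed convention: positive GRAVY means hydrophobic overall.
    if gravy > 0:
        return "Stable & Hydrophobic"
    return "Stable & Hydrophilic"


print(classify_protein(instability_index=25.0, gravy=0.3))
print(classify_protein(instability_index=25.0, gravy=-0.4))
print(classify_protein(instability_index=55.0, gravy=0.3))
```

Stability is checked first, so hydrophobicity only distinguishes between proteins already classed as stable, matching the three labels above.
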
📊 Tabular Summary
- Displays all computed values in a Pandas DataFrame
📈 Graphical Visualization
- Generates hydropathy plots for each protein sequence

🛠️ Technologies Used

- Python
- Biopython
- Matplotlib
- Pandas
- Regular Expressions (re)

⚙️ How It Works

1. FASTA Input

The program begins with protein sequences written in FASTA format.

Example:

>Protein_Name
SEQUENCE
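
To illustrate how header lines and sequence lines can be separated, here is a minimal pure-Python sketch. The function name `parse_fasta` and the sample records are hypothetical; the project itself may parse FASTA differently.

```python
def parse_fasta(text: str) -> list:
    """Split FASTA text into (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            # Sequence data may span several lines; join them later.
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Hypothetical input, used only for illustration.
sample = """>Protein_A
MKTAYIAKQR
QISFVKSH
>Protein_B
GGSNASW
"""
print(parse_fasta(sample))
```

Each record's sequence is concatenated across line breaks, since FASTA files often wrap long sequences over multiple lines.
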

📂 External FASTA File Support

The project is not limited to hardcoded FASTA sequences.
Users can also load their own FASTA files (.fasta, .fa, .txt) from any location on their local system for analysis.

This makes the project more flexible and closer to real-world bioinformatics workflows where protein datasets are usually stored as external FASTA files.

Example File Paths

C:\Users\yourname\Desktop\protein.fasta
D:\Bioinformatics\sample.fa
./datasets/proteins.fasta
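
Loading a file from one of these paths could be sketched with Biopython's `SeqIO.parse`. The helper name `load_fasta`, the extension check, and the returned dictionary shape are assumptions for illustration, not the project's confirmed implementation.

```python
from pathlib import Path

from Bio import SeqIO

# Extensions mentioned in this README.
ALLOWED_SUFFIXES = {".fasta", ".fa", ".txt"}


def load_fasta(path: str) -> dict:
    """Read a FASTA file and return {record id: sequence string}."""
    fasta_path = Path(path).expanduser()
    if fasta_path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"Unsupported file extension: {fasta_path.suffix}")
    if not fasta_path.is_file():
        raise FileNotFoundError(f"No such file: {fasta_path}")
    # record.id is the first whitespace-delimited word of the header line.
    return {
        record.id: str(record.seq)
        for record in SeqIO.parse(str(fasta_path), "fasta")
    }
```

A call such as `load_fasta("./datasets/proteins.fasta")` would return one entry per record. On Windows, a raw string like `r"D:\Bioinformatics\sample.fa"` avoids backslashes being read as escape sequences.
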
