PROTEUS

Advanced zero-day static analysis engine built with Rust and Python

Features • Quick Start • Documentation • Contributing • License

Proteus is a high-performance malware analysis tool built with Rust and Python, designed to detect zero-day threats through static analysis, heuristics, and machine learning.

🎯 Features

Core Analysis

🔍 PE/ELF Binary Analysis - Deep inspection of Windows and Linux executables
📊 Entropy Calculation - Detect packed/encrypted malware (section-level granularity)
🧠 Heuristic Scoring - Intelligent threat assessment with configurable thresholds
🔤 String Extraction - ASCII and wide string analysis with pattern detection
🌐 IOC Detection - Automatic extraction of URLs, IPs, registry keys, file paths
⚡ High Performance - Rust-powered core with parallel processing via Rayon
📦 Batch Processing - Scan entire directories efficiently
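
To make the string extraction and IOC detection items concrete, here is a minimal Python sketch of the general technique: scan raw bytes for runs of printable ASCII and for UTF-16LE ("wide") runs, then apply simple patterns for URLs and IPv4 addresses. This is not Proteus's actual implementation (the real extractor lives in the Rust core); the function names, the minimum length of 4, and the regular expressions are all assumptions chosen for illustration.

```python
import re

# Assumed minimum run length of 4 printable characters.
ASCII_RE = re.compile(rb"[\x20-\x7e]{4,}")
# Wide strings: printable ASCII characters each followed by a NUL byte (UTF-16LE).
WIDE_RE = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")

# Deliberately loose illustrative patterns; the IPv4 one does not range-check octets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_strings(data: bytes) -> list[str]:
    """Return printable ASCII strings followed by wide (UTF-16LE) strings."""
    ascii_strings = [m.group().decode("ascii") for m in ASCII_RE.finditer(data)]
    wide_strings = [m.group().decode("utf-16-le") for m in WIDE_RE.finditer(data)]
    return ascii_strings + wide_strings


def find_iocs(strings: list[str]) -> dict[str, list[str]]:
    """Collect URL and IPv4 candidates from already-extracted strings."""
    urls: list[str] = []
    ips: list[str] = []
    for s in strings:
        urls.extend(URL_RE.findall(s))
        ips.extend(IPV4_RE.findall(s))
    return {"urls": urls, "ips": ips}


if __name__ == "__main__":
    blob = (
        b"\x00\x01http://example.test/a\x00\xff"
        + "cmd.exe".encode("utf-16-le")
        + b"\x01"
    )
    found = extract_strings(blob)
    print(found)
    print(find_iocs(found))
```

Patterns this loose produce false positives (version numbers such as 1.2.3.4 look like IP addresses), so a real pipeline would typically validate candidates before reporting them.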

Detection Engines

🤖 ML Detection - Random Forest (96% accuracy) + Isolation Forest anomaly detection
🎯 YARA Engine - 40+ industry-standard detection rules
- Ransomware: WannaCry, Ryuk, Maze, Locky families
- RAT Detection: NanoCore, njRAT, DarkComet, Quasar, AsyncRAT
- Banking Trojans: Emotet, TrickBot, Dridex, Zeus, Formbook, AgentTesla
- Packer Detection: UPX, ASPack, Themida, VMProtect, PECompact, MPRESS
- Suspicious Behaviors: Code injection, credential dumping, keyloggers, browser theft
🔬 Multi-Layer Analysis - Combine heuristic, ML, and YARA results in a single scan (--ml --yara)
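
One way to picture multi-layer analysis is as a vote across the three engines. The sketch below is purely hypothetical: the README does not say how Proteus merges the layers, so the class, the function, both thresholds, and the SUSPICIOUS/CLEAN labels are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayerResults:
    """Hypothetical container for the output of each detection layer."""
    heuristic_score: float                             # 0-100 threat score
    ml_malicious_probability: Optional[float] = None   # 0-1, None when ML is not run
    yara_severities: tuple[str, ...] = ()              # e.g. ("HIGH", "MEDIUM")


def combined_verdict(
    results: LayerResults,
    heuristic_threshold: float = 60.0,  # assumed cut-off, not Proteus's
    ml_threshold: float = 0.5,          # assumed cut-off, not Proteus's
) -> str:
    """Simple majority vote: each layer that flags the sample counts once."""
    votes = 0
    if results.heuristic_score >= heuristic_threshold:
        votes += 1
    if (
        results.ml_malicious_probability is not None
        and results.ml_malicious_probability >= ml_threshold
    ):
        votes += 1
    if any(sev in {"HIGH", "CRITICAL"} for sev in results.yara_severities):
        votes += 1

    if votes >= 2:
        return "MALICIOUS"
    if votes == 1:
        return "SUSPICIOUS"
    return "CLEAN"


if __name__ == "__main__":
    sample = LayerResults(66.0, 1.0, ("HIGH", "CRITICAL", "MEDIUM"))
    print(combined_verdict(sample))
```

A vote is only one possible design; weighted scores or letting a CRITICAL YARA match override the other layers are equally plausible.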

Advanced Features

🤖 ML Ready - Feature extraction pipeline for machine learning
📈 Feature Engineering - 16+ features including entropy, imports, exports, strings
🎯 Detection Metrics - Built-in accuracy, precision, recall tracking
🔧 Extensible - Modular architecture for custom analyzers

📊 Detection Metrics (Real-World Dataset)

Metric	Value
Test Accuracy	96.22%
Precision (Malicious)	95%
Recall (Malicious)	97%
F1-Score	0.96
False Positive Rate	0.97%
Training Dataset	1,190 samples
Real Malware Samples	576
Clean Samples	614
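
As a quick consistency check on the table above, the F1-score can be recomputed from the reported precision and recall, and the dataset size from its two parts. The helper name below is an illustrative assumption; the formula is the standard harmonic mean.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the metrics table (malicious class).
f1 = f1_score(0.95, 0.97)
print(f"F1 = {f1:.2f}")  # prints "F1 = 0.96"

# The training set is the sum of malware and clean samples.
assert 576 + 614 == 1190
```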

🚀 Quick Start

Prerequisites

Rust 1.83+ (Install)
Python 3.10+ (Install)
Windows 10/11 or Linux
YARA 4.5+ (optional for analysis, but required for the Rust build - Install Guide)
MalwareBazaar API (Optional, for dataset collection - included in code)

Installation

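# Clone the repository
git clone https://github.com/ChronoCoders/proteus.git
cd proteus

# Create and activate a virtual environment
python -m venv venv
venv\Scripts\activate          # Windows
# source venv/bin/activate     # Linux

# Install Python dependencies
pip install -r requirements.txt

# Build the Rust extension and install it into the virtual environment
maturin develop --release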

Basic Usage

Analyze a single file:

python cli.py file C:\path\to\sample.exe

Analyze with ML prediction:

python cli.py file C:\path\to\sample.exe --ml

Analyze with YARA rules:

python cli.py file C:\path\to\sample.exe --yara

Complete analysis (Heuristic + ML + YARA):

python cli.py file C:\path\to\sample.exe --ml --yara

Full analysis with strings:

python cli.py file C:\path\to\sample.exe --ml --yara --strings

String-only analysis:

python cli.py strings C:\path\to\sample.exe

Batch scan directory:

python cli.py dir C:\path\to\samples --output results.json

Collecting Real Malware Dataset

Collect malware samples from MalwareBazaar (default: 50 samples per tag, ~500 total):

python malware_collector.py

Collect with custom sample count:

# Collect 100 samples per tag (~1000 total)
python malware_collector.py --samples=100

# Collect 20 samples per tag (~200 total)
python malware_collector.py --samples=20

Enable verbose debugging mode:

python malware_collector.py --verbose

Combine options:

python malware_collector.py --samples=100 --verbose

Features:

✅ Automatic AES-encrypted ZIP extraction
✅ Retry logic for failed downloads (2 attempts per sample)
✅ Real-time progress tracking
✅ Graceful interrupt handling (Ctrl+C saves progress)
✅ Metadata persistence (resume capability)
✅ 10 malware categories: ransomware, trojan, rat, stealer, backdoor, loader, miner, banker, spyware, worm
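
The retry behaviour listed above (2 attempts per sample) can be sketched as a small wrapper. This is an illustrative pattern, not the code in malware_collector.py: the function name, the delay parameter, and the choice to return None on failure are assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 2,            # matches the "2 attempts per sample" above
    delay_seconds: float = 1.0,   # assumed pause between attempts
) -> Optional[T]:
    """Call fetch() up to `attempts` times; return None if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            # KeyboardInterrupt is not an Exception subclass, so Ctrl+C still
            # propagates to the caller, which can save progress before exiting.
            print(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                time.sleep(delay_seconds)
    return None
```

A caller would wrap each download, for example `with_retries(lambda: download(sample_hash))`, where `download` is a hypothetical function that raises on failure.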

Collection Statistics:

Default: ~500 samples in ~17 minutes
Large: ~1000 samples in ~33 minutes
Custom: configurable via --samples=N

Building Test Dataset

python test_dataset_builder.py

Training ML Models

python ml_trainer.py

📖 Documentation

Example Output

╔═══════════════════════════════════════╗
║         PROTEUS v0.2.0                ║
║   Zero-Day Static Analysis Engine     ║
╚═══════════════════════════════════════╝

[*] Analysis: suspicious.exe
[+] Type: PE
[+] Entropy: 7.85
[+] Threat Score: 66.00/100
[+] Verdict: MALICIOUS
[!] Suspicious Indicators:
    - VirtualAlloc
    - CreateRemoteThread
    - WriteProcessMemory

[*] YARA Scan:
[!] YARA Matches: 3
    Rule: Suspicious_Code_Injection
      Severity: HIGH
      Family: suspicious
    Rule: Emotet_Trojan
      Severity: CRITICAL
      Family: trojan
    Rule: UPX_Packer
      Severity: MEDIUM
      Family: packer

[*] ML Analysis:
[+] ML Prediction: MALICIOUS
[+] Confidence: 100.00%
[+] Probabilities:
    Clean: 0.00%
    Malicious: 100.00%

[*] String Analysis:
[+] Total strings: 342
[+] Encoded strings: 15

[!] URLs (2):
    http://malicious-c2.com/payload
    https://evil.net/download

[!] Suspicious strings (8):
    cmd.exe /c powershell
    Disable-WindowsDefender
    keylogger.dll

Architecture

proteus/
├── src/                      # Rust core engine
│   ├── lib.rs                # Module entry point
│   ├── pe_parser.rs          # PE file parsing (goblin)
│   ├── elf_parser.rs         # ELF file parsing
│   ├── entropy.rs            # Shannon entropy calculation
│   ├── heuristics.rs         # Threat scoring algorithms
│   ├── string_extractor.rs   # String analysis engine
│   └── python_bindings.rs    # PyO3 FFI bindings
├── python/                   # Python orchestration
│   ├── __init__.py
│   ├── analyzer.py           # Main analyzer class
│   ├── ml_detector.py        # ML model integration
│   ├── yara_engine.py        # YARA rule engine
│   ├── config.py             # Configuration management
│   ├── validators.py         # Security validators
│   └── rate_limiter.py       # API rate limiting
├── yara_rules/               # YARA detection rules
│   ├── ransomware.yar        # Ransomware signatures
│   ├── rats.yar              # RAT detection
│   ├── trojans.yar           # Banking trojans
│   ├── packers.yar           # Packer detection
│   └── suspicious_behavior.yar # Behavioral analysis
├── cli.py                    # Command-line interface
├── malware_collector.py      # MalwareBazaar dataset collector
├── ml_trainer.py             # ML training pipeline
├── test_dataset_builder.py   # Dataset generation
├── requirements.txt          # Python dependencies
├── Cargo.toml                # Rust dependencies
└── pyproject.toml            # Python project configuration

Feature Extraction

Proteus extracts 16+ features per sample, including:

Binary Features:

Global entropy
Section count
Max section entropy
Import count
Export count
Suspicious API count

String Features:

Total strings
URL count
IP count
Registry key count
Suspicious keyword count
File path count
Encoded string count
Encoded ratio
Suspicious ratio
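
The two ratio features are presumably derived from the string counts. The sketch below assumes that "encoded ratio" means encoded strings divided by total strings and "suspicious ratio" means suspicious strings divided by total strings; the README does not define them, so treat the class and field names as hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StringCounts:
    """Hypothetical subset of the string features listed above."""
    total: int
    encoded: int
    suspicious: int

    @staticmethod
    def _ratio(part: int, total: int) -> float:
        # Guard against division by zero for files with no strings.
        return part / total if total > 0 else 0.0

    @property
    def encoded_ratio(self) -> float:
        return self._ratio(self.encoded, self.total)

    @property
    def suspicious_ratio(self) -> float:
        return self._ratio(self.suspicious, self.total)


if __name__ == "__main__":
    # Counts from the example output: 342 strings, 15 encoded, 8 suspicious.
    counts = StringCounts(total=342, encoded=15, suspicious=8)
    print(round(counts.encoded_ratio, 4))     # 0.0439
    print(round(counts.suspicious_ratio, 4))  # 0.0234
```

Ratios of this kind keep the features comparable between small and large binaries, which raw counts alone do not.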

Threat Detection Patterns

High Entropy Indicators:

Entropy > 7.8: Likely packed/encrypted
Entropy > 7.5: Suspicious compression
Entropy > 7.2: Elevated entropy
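
For reference, Shannon entropy over bytes is H = -Σ p_i · log2(p_i), where p_i is the relative frequency of byte value i; it ranges from 0 (one repeated byte) to 8 bits per byte (uniformly random data). The Python sketch below computes it and maps the result onto the thresholds above. The Proteus implementation is in Rust (entropy.rs); the function names and the "normal" label here are assumptions for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_label(entropy: float) -> str:
    """Map an entropy value onto the thresholds listed above."""
    if entropy > 7.8:
        return "likely packed/encrypted"
    if entropy > 7.5:
        return "suspicious compression"
    if entropy > 7.2:
        return "elevated entropy"
    return "normal"  # assumed label for everything at or below 7.2


if __name__ == "__main__":
    print(shannon_entropy(b"aaaa"))             # a single repeated byte
    print(shannon_entropy(bytes(range(256))))   # every byte value exactly once
    print(entropy_label(7.85))                  # the value from the example output
```

Applying the same function to each section's bytes rather than to the whole file gives the section-level granularity mentioned in the feature list.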

Suspicious APIs (PE):

VirtualAlloc, VirtualProtect, WriteProcessMemory,
CreateRemoteThread, LoadLibrary, GetProcAddress,
WinExec, ShellExecute, URLDownloadToFile,
CreateProcess, OpenProcess, ReadProcessMemory,
SetWindowsHookEx, GetAsyncKeyState, InternetOpen

Suspicious Symbols (ELF):

execve, system, fork, ptrace, mprotect,
mmap, dlopen, socket, bind
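
The two watchlists above can be applied to a binary's import or symbol table roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the suffix handling for Windows APIs (stripping a trailing A/W and an optional Ex) is a guess at a reasonable policy, not a description of what heuristics.rs does.

```python
SUSPICIOUS_PE_APIS = {
    "VirtualAlloc", "VirtualProtect", "WriteProcessMemory",
    "CreateRemoteThread", "LoadLibrary", "GetProcAddress",
    "WinExec", "ShellExecute", "URLDownloadToFile",
    "CreateProcess", "OpenProcess", "ReadProcessMemory",
    "SetWindowsHookEx", "GetAsyncKeyState", "InternetOpen",
}

SUSPICIOUS_ELF_SYMBOLS = {
    "execve", "system", "fork", "ptrace", "mprotect",
    "mmap", "dlopen", "socket", "bind",
}


def _pe_name_variants(name: str) -> list[str]:
    """Return the name plus variants without the A/W and Ex suffixes."""
    base = name[:-1] if name.endswith(("A", "W")) else name
    variants = [name, base]
    if base.endswith("Ex"):
        variants.append(base[:-2])
    return variants


def match_pe_imports(imports: list[str]) -> list[str]:
    """Imports whose normalized name is on the PE watchlist."""
    return [
        name for name in imports
        if any(v in SUSPICIOUS_PE_APIS for v in _pe_name_variants(name))
    ]


def match_elf_symbols(symbols: list[str]) -> list[str]:
    """Exact matches only, so 'bindtextdomain' does not count as 'bind'."""
    return [name for name in symbols if name in SUSPICIOUS_ELF_SYMBOLS]


if __name__ == "__main__":
    print(match_pe_imports(["CreateProcessA", "VirtualAllocEx", "printf"]))
    print(match_elf_symbols(["bind", "bindtextdomain", "ptrace"]))
```

Many of these functions also appear in benign software, so a hit count is better treated as one input to the threat score than as a verdict on its own.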

Suspicious Keywords (Strings):

cmd, powershell, eval, exec, system, shell,
download, upload, exploit, payload, inject,
keylog, screenshot, webcam, ransomware,
encrypt, bitcoin, miner, bypass, disable
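
A minimal sketch of how the keyword list might be applied to extracted strings, assuming case-insensitive substring matching (the README does not state the matching rule, and the function name is hypothetical):

```python
SUSPICIOUS_KEYWORDS = (
    "cmd", "powershell", "eval", "exec", "system", "shell",
    "download", "upload", "exploit", "payload", "inject",
    "keylog", "screenshot", "webcam", "ransomware",
    "encrypt", "bitcoin", "miner", "bypass", "disable",
)


def suspicious_strings(strings: list[str]) -> list[str]:
    """Return the strings that contain at least one keyword, ignoring case."""
    hits = []
    for s in strings:
        lowered = s.lower()
        if any(keyword in lowered for keyword in SUSPICIOUS_KEYWORDS):
            hits.append(s)
    return hits


if __name__ == "__main__":
    sample = [
        "cmd.exe /c powershell",
        "Disable-WindowsDefender",
        "keylogger.dll",
        "Hello, world",
    ]
    print(suspicious_strings(sample))
```

Substring matching is noisy (for example, "miner" also matches "examiner"), which is one reason to weigh the count against the total number of strings rather than using it alone.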

🔬 Development

Build & Test

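# Debug build of the Rust extension (faster to compile, slower to run)
maturin develop

# Optimized release build
maturin develop --release

# Run the Rust tests
cargo test

# Run the Python tests
python -m pytest

# Lint and type-check
cargo clippy
mypy .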

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Code Style

Rust: Follow rustfmt and clippy recommendations
Python: Follow PEP 8, type hints required
No comments in code (self-documenting code preferred)
Use latest stable versions of dependencies

🗺️ Roadmap

v0.2.0 (Current) ✅

YARA rule engine (40+ detection rules)
Ransomware, RAT, Trojan, Packer detection
Suspicious behavior analysis
CLI --yara flag integration
Multi-layer detection (Heuristic + ML + YARA)

v0.3.0 (Planned)

Advanced packer detection enhancements
Digital signature validation
PE resource section analysis
Retrain ML models with larger real-world dataset (1000+ samples)
Custom YARA rule support via CLI

v0.4.0 (Future)

📊 Performance

Benchmarks (Intel i7, 16GB RAM):

Single file analysis: ~50ms
Batch processing (100 files): ~3 seconds
String extraction: ~20ms
ML prediction: ~5ms
YARA scanning: ~100ms

⚠️ Limitations

Current Version (v0.2.0):

ML models require training on collected real-world samples
No dynamic analysis capabilities
Windows-focused (PE analysis more mature than ELF)
Dataset collection requires MalwareBazaar API access

Recommended Use:

Educational purposes
Research projects
Malware analysis training
Static analysis component in larger systems
Dataset collection for ML training

🔒 Security & Legal

Important Notes:

Always analyze malware in isolated environments (VMs/sandboxes)
Do not use on production systems without proper testing
Obey local laws regarding malware possession and analysis
This tool is for educational and research purposes only

Disclaimer: The authors are not responsible for misuse of this tool. Users are solely responsible for ensuring their usage complies with applicable laws and regulations.

📝 License

MIT License - see LICENSE file for details

👥 Authors

ChronoCoders Team

Advanced static analysis engine
ML integration
YARA rule engine
Performance optimization

🙏 Acknowledgments

goblin - Excellent binary parsing library
PyO3 - Seamless Rust-Python integration
Rayon - Parallel processing made easy
scikit-learn - ML algorithms
pyzipper - AES-encrypted ZIP extraction
MalwareBazaar - Real-world malware sample repository
YARA - Industry-standard malware detection framework

📚 Additional Resources

⭐ If you find Proteus useful, please star the repository!

🐛 Found a bug? Open an issue

💡 Have a feature request? Start a discussion
