Advanced zero-day static analysis engine built with Rust and Python
Advanced Zero-Day Static Analysis Engine
Proteus is a high-performance malware analysis tool built with Rust and Python, designed to detect zero-day threats through static analysis, heuristics, and machine learning.
- PE/ELF Binary Analysis - Deep inspection of Windows and Linux executables
- Entropy Calculation - Detect packed/encrypted malware (section-level granularity)
- Heuristic Scoring - Intelligent threat assessment with configurable thresholds
- String Extraction - ASCII and wide string analysis with pattern detection
- IOC Detection - Automatic extraction of URLs, IPs, registry keys, file paths
- High Performance - Rust-powered core with parallel processing via Rayon
- Batch Processing - Scan entire directories efficiently
- ML Detection - Random Forest (96% accuracy) + Isolation Forest anomaly detection
- YARA Engine - 40+ industry-standard detection rules
  - Ransomware: WannaCry, Ryuk, Maze, Locky families
  - RAT Detection: NanoCore, njRAT, DarkComet, Quasar, AsyncRAT
  - Banking Trojans: Emotet, TrickBot, Dridex, Zeus, Formbook, AgentTesla
  - Packer Detection: UPX, ASPack, Themida, VMProtect, PECompact, MPRESS
  - Suspicious Behaviors: Code injection, credential dumping, keyloggers, browser theft
- Multi-Layer Analysis - Combine heuristic + ML + YARA for maximum accuracy
- ML Ready - Feature extraction pipeline for machine learning
- Feature Engineering - 16+ features including entropy, imports, exports, strings
- Detection Metrics - Built-in accuracy, precision, recall tracking
- Extensible - Modular architecture for custom analyzers
| Metric | Value |
|---|---|
| Test Accuracy | 96.22% |
| Precision (Malicious) | 95% |
| Recall (Malicious) | 97% |
| F1-Score | 0.96 |
| False Positive Rate | 0.97% |
| Training Dataset | 1,190 samples |
| Real Malware Samples | 576 |
| Clean Samples | 614 |
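As a quick sanity check, the F1-score in the table can be reproduced from the reported precision and recall, and the dataset size from the two sample counts. The snippet below is only an arithmetic sketch using the rounded values from the table; the function name `f1_score` is illustrative and is not taken from the Proteus code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values from the metrics table.
precision, recall = 0.95, 0.97
print(f"F1 = {f1_score(precision, recall):.2f}")  # F1 = 0.96

# Dataset composition from the metrics table.
malware, clean = 576, 614
total = malware + clean
print(f"total = {total}, malware share = {malware / total:.1%}")
```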
- Rust 1.83+
- Python 3.10+
- Windows 10/11 or Linux
- YARA 4.5+ (Optional, required for Rust build)
- MalwareBazaar API (Optional, for dataset collection - included in code)
git clone https://github.com/ChronoCoders/proteus.git
cd proteus
python -m venv venv
venv\Scripts\activate
# On Linux, activate with: source venv/bin/activate
pip install -r requirements.txt
maturin develop --release

Analyze a single file:
python cli.py file C:\path\to\sample.exe

Analyze with ML prediction:
python cli.py file C:\path\to\sample.exe --ml

Analyze with YARA rules:
python cli.py file C:\path\to\sample.exe --yara

Complete analysis (Heuristic + ML + YARA):
python cli.py file C:\path\to\sample.exe --ml --yara

Full analysis with strings:
python cli.py file C:\path\to\sample.exe --ml --yara --strings

String-only analysis:
python cli.py strings C:\path\to\sample.exe

Batch scan directory:
python cli.py dir C:\path\to\samples --output results.json

Collect malware samples from MalwareBazaar (default: 50 samples per tag, ~500 total):
python malware_collector.py

Collect with custom sample count:
# Collect 100 samples per tag (~1000 total)
python malware_collector.py --samples=100
# Collect 20 samples per tag (~200 total)
python malware_collector.py --samples=20

Enable verbose debugging mode:
python malware_collector.py --verbose

Combine options:
python malware_collector.py --samples=100 --verbose

Features:
- Automatic AES-encrypted ZIP extraction
- Retry logic for failed downloads (2 attempts per sample)
- Real-time progress tracking
- Graceful interrupt handling (Ctrl+C saves progress)
- Metadata persistence (resume capability)
- 10 malware categories: ransomware, trojan, rat, stealer, backdoor, loader, miner, banker, spyware, worm
Collection Statistics:
- Default: ~500 samples in ~17 minutes
- Large: ~1000 samples in ~33 minutes
- Custom: configurable via --samples=N
python test_dataset_builder.py
python ml_trainer.py

╔═══════════════════════════════════════╗
║            PROTEUS v0.2.0             ║
║    Zero-Day Static Analysis Engine    ║
╚═══════════════════════════════════════╝
[*] Analysis: suspicious.exe
[+] Type: PE
[+] Entropy: 7.85
[+] Threat Score: 66.00/100
[+] Verdict: MALICIOUS
[!] Suspicious Indicators:
- VirtualAlloc
- CreateRemoteThread
- WriteProcessMemory
[*] YARA Scan:
[!] YARA Matches: 3
Rule: Suspicious_Code_Injection
Severity: HIGH
Family: suspicious
Rule: Emotet_Trojan
Severity: CRITICAL
Family: trojan
Rule: UPX_Packer
Severity: MEDIUM
Family: packer
[*] ML Analysis:
[+] ML Prediction: MALICIOUS
[+] Confidence: 100.00%
[+] Probabilities:
Clean: 0.00%
Malicious: 100.00%
[*] String Analysis:
[+] Total strings: 342
[+] Encoded strings: 15
[!] URLs (2):
http://malicious-c2.com/payload
https://evil.net/download
[!] Suspicious strings (8):
cmd.exe /c powershell
Disable-WindowsDefender
keylogger.dll
proteus/
├── src/                        # Rust core engine
│   ├── lib.rs                  # Module entry point
│   ├── pe_parser.rs            # PE file parsing (goblin)
│   ├── elf_parser.rs           # ELF file parsing
│   ├── entropy.rs              # Shannon entropy calculation
│   ├── heuristics.rs           # Threat scoring algorithms
│   ├── string_extractor.rs     # String analysis engine
│   └── python_bindings.rs      # PyO3 FFI bindings
├── python/                     # Python orchestration
│   ├── __init__.py
│   ├── analyzer.py             # Main analyzer class
│   ├── ml_detector.py          # ML model integration
│   ├── yara_engine.py          # YARA rule engine
│   ├── config.py               # Configuration management
│   ├── validators.py           # Security validators
│   └── rate_limiter.py         # API rate limiting
├── yara_rules/                 # YARA detection rules
│   ├── ransomware.yar          # Ransomware signatures
│   ├── rats.yar                # RAT detection
│   ├── trojans.yar             # Banking trojans
│   ├── packers.yar             # Packer detection
│   └── suspicious_behavior.yar # Behavioral analysis
├── cli.py                      # Command-line interface
├── malware_collector.py        # MalwareBazaar dataset collector
├── ml_trainer.py               # ML training pipeline
├── test_dataset_builder.py     # Dataset generation
├── requirements.txt            # Python dependencies
├── Cargo.toml                  # Rust dependencies
└── pyproject.toml              # Python project configuration
Proteus extracts 16+ features per sample:
Binary Features:
- Global entropy
- Section count
- Max section entropy
- Import count
- Export count
- Suspicious API count
String Features:
- Total strings
- URL count
- IP count
- Registry key count
- Suspicious keyword count
- File path count
- Encoded string count
- Encoded ratio
- Suspicious ratio
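To make the string features more concrete, here is a minimal Python sketch of how ASCII and wide (UTF-16LE) strings could be pulled from raw bytes and scanned for URLs and IPs. It is not the Rust `string_extractor.rs` implementation: the minimum length of 4, the regular expressions, and the names `extract_strings` and `string_features` are all assumptions made for illustration.

```python
import re

# Assumed minimum string length of 4 printable characters.
ASCII_RE = re.compile(rb"[\x20-\x7e]{4,}")
# Wide strings: printable ASCII characters stored as UTF-16LE (char + NUL byte).
WIDE_RE = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")
# Deliberately simple patterns; real IOC extraction usually needs stricter ones.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_strings(data: bytes) -> list[str]:
    """Return printable ASCII strings followed by wide strings found in data."""
    ascii_strings = [m.decode("ascii") for m in ASCII_RE.findall(data)]
    wide_strings = [m.decode("utf-16-le") for m in WIDE_RE.findall(data)]
    return ascii_strings + wide_strings


def string_features(data: bytes) -> dict[str, int]:
    """Compute a few of the string features listed above."""
    strings = extract_strings(data)
    return {
        "total_strings": len(strings),
        "url_count": sum(len(URL_RE.findall(s)) for s in strings),
        "ip_count": sum(len(IP_RE.findall(s)) for s in strings),
    }


sample = (
    b"\x01http://malicious-c2.com/payload\x01192.168.1.10\x01ab\x01"
    + "wide string".encode("utf-16-le")
    + b"\x01"
)
print(string_features(sample))
```

The IP pattern accepts out-of-range octets such as 999.1.1.1, so a production extractor would validate each match before counting it.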
High Entropy Indicators:
- Entropy > 7.8: Likely packed/encrypted
- Entropy > 7.5: Suspicious compression
- Entropy > 7.2: Elevated entropy
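The thresholds above apply to Shannon entropy measured in bits per byte, which ranges from 0 (a single repeated byte) to 8 (all byte values equally likely). The following Python sketch computes that value and maps it to the three indicator levels; the Proteus core does this in Rust (`entropy.rs`), so the function names and the "normal" fallback label here are assumptions for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte sequence, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropy_label(entropy: float) -> str:
    """Map an entropy value to the indicator levels listed above."""
    if entropy > 7.8:
        return "likely packed/encrypted"
    if entropy > 7.5:
        return "suspicious compression"
    if entropy > 7.2:
        return "elevated entropy"
    return "normal"  # assumed label for values at or below 7.2


print(shannon_entropy(b"aaaa"))            # 0.0
print(shannon_entropy(bytes(range(256))))  # 8.0
print(entropy_label(7.85))                 # likely packed/encrypted
```

The same function can be applied per section instead of to the whole file to obtain the "max section entropy" feature.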
Suspicious APIs (PE):
VirtualAlloc, VirtualProtect, WriteProcessMemory,
CreateRemoteThread, LoadLibrary, GetProcAddress,
WinExec, ShellExecute, URLDownloadToFile,
CreateProcess, OpenProcess, ReadProcessMemory,
SetWindowsHookEx, GetAsyncKeyState, InternetOpen
Suspicious Symbols (ELF):
execve, system, fork, ptrace, mprotect,
mmap, dlopen, socket, bind
Suspicious Keywords (Strings):
cmd, powershell, eval, exec, system, shell,
download, upload, exploit, payload, inject,
keylog, screenshot, webcam, ransomware,
encrypt, bitcoin, miner, bypass, disable
maturin develop
maturin develop --release
cargo test
python -m pytest
cargo clippy
mypy .

Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request

- Rust: Follow rustfmt and clippy recommendations
- Python: Follow PEP 8, type hints required
- No comments in code (self-documenting code preferred)
- Use latest stable versions of dependencies
- YARA rule engine (40+ detection rules)
- Ransomware, RAT, Trojan, Packer detection
- Suspicious behavior analysis
- CLI --yara flag integration
- Multi-layer detection (Heuristic + ML + YARA)
- Advanced packer detection enhancements
- Digital signature validation
- PE resource section analysis
- Retrain ML models with larger real-world dataset (1000+ samples)
- Custom YARA rule support via CLI
- HTML report generation
- REST API server
- Web dashboard
- Real-time monitoring
- PCAP analysis integration
- Behavior monitoring (dynamic analysis)
Benchmarks (Intel i7, 16GB RAM):
- Single file analysis: ~50ms
- Batch processing (100 files): ~3 seconds
- String extraction: ~20ms
- ML prediction: ~5ms
- YARA scanning: ~100ms
Current Version (v0.2.0):
- ML models require training on collected real-world samples
- No dynamic analysis capabilities
- Windows-focused (PE analysis more mature than ELF)
- Dataset collection requires MalwareBazaar API access
Recommended Use:
- Educational purposes
- Research projects
- Malware analysis training
- Static analysis component in larger systems
- Dataset collection for ML training
Important Notes:
- Always analyze malware in isolated environments (VMs/sandboxes)
- Do not use on production systems without proper testing
- Obey local laws regarding malware possession and analysis
- This tool is for educational and research purposes only
Disclaimer: The authors are not responsible for misuse of this tool. Users are solely responsible for ensuring their usage complies with applicable laws and regulations.
MIT License - see LICENSE file for details
Copyright (c) 2025 ChronoCoders
ChronoCoders Team
- Advanced static analysis engine
- ML integration
- YARA rule engine
- Performance optimization
- goblin - Excellent binary parsing library
- PyO3 - Seamless Rust-Python integration
- Rayon - Parallel processing made easy
- scikit-learn - ML algorithms
- pyzipper - AES-encrypted ZIP extraction
- MalwareBazaar - Real-world malware sample repository
- YARA - Industry-standard malware detection framework
If you find Proteus useful, please star the repository!
Found a bug? Open an issue
Have a feature request? Start a discussion