PROTEUS

Advanced zero-day static analysis engine built with Rust and Python

Features • Quick Start • Documentation • Contributing • License


Advanced Zero-Day Static Analysis Engine

Proteus is a high-performance malware analysis tool built with Rust and Python, designed to detect zero-day threats through static analysis, heuristics, and machine learning.

🎯 Features

Core Analysis

  • 🔍 PE/ELF Binary Analysis - Deep inspection of Windows and Linux executables
  • 📊 Entropy Calculation - Detect packed/encrypted malware (section-level granularity)
  • 🧠 Heuristic Scoring - Intelligent threat assessment with configurable thresholds
  • 🔤 String Extraction - ASCII and wide string analysis with pattern detection
  • 🌐 IOC Detection - Automatic extraction of URLs, IPs, registry keys, file paths
  • ⚡ High Performance - Rust-powered core with parallel processing via Rayon
  • 📦 Batch Processing - Scan entire directories efficiently
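
The string extraction and IOC detection items above can be illustrated with a minimal Python sketch. It is not the project's Rust implementation: the function names (`extract_ascii`, `extract_wide`, `find_iocs`), the minimum length of 4, and the regular expressions are all assumptions chosen for illustration, and only URLs and IPv4 addresses are covered.

```python
import re

# Illustrative patterns only; real-world IOC matching is usually stricter.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)


def extract_ascii(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(data)]


def extract_wide(data: bytes, min_len: int = 4) -> list[str]:
    """Return UTF-16LE strings: printable ASCII bytes, each followed by NUL."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.decode("utf-16-le") for m in pattern.findall(data)]


def find_iocs(strings: list[str]) -> dict[str, list[str]]:
    """Collect URLs and IPv4 addresses found inside the extracted strings."""
    urls: list[str] = []
    ips: list[str] = []
    for s in strings:
        urls.extend(URL_RE.findall(s))
        ips.extend(IPV4_RE.findall(s))
    return {"urls": urls, "ips": ips}


if __name__ == "__main__":
    # Synthetic buffer: one ASCII URL, one wide string, one ASCII IP address.
    sample = (
        b"\x00\x01http://malicious-c2.com/payload\x00\xff"
        + "cmd.exe".encode("utf-16-le")
        + b"\x00\x0010.0.0.1 ok"
    )
    print(extract_ascii(sample))
    print(extract_wide(sample))
    print(find_iocs(extract_ascii(sample)))
```

Scanning for wide strings separately matters for Windows binaries, where many strings are stored as UTF-16LE and would be missed by an ASCII-only pass.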

Detection Engines

  • 🤖 ML Detection - Random Forest (96% accuracy) + Isolation Forest anomaly detection
  • 🎯 YARA Engine - 40+ industry-standard detection rules
    • Ransomware: WannaCry, Ryuk, Maze, Locky families
    • RAT Detection: NanoCore, njRAT, DarkComet, Quasar, AsyncRAT
    • Banking Trojans: Emotet, TrickBot, Dridex, Zeus, Formbook, AgentTesla
    • Packer Detection: UPX, ASPack, Themida, VMProtect, PECompact, MPRESS
    • Suspicious Behaviors: Code injection, credential dumping, keyloggers, browser theft
  • 🔬 Multi-Layer Analysis - Combine heuristic + ML + YARA for maximum accuracy

Advanced Features

  • 🤖 ML Ready - Feature extraction pipeline for machine learning
  • 📈 Feature Engineering - 16+ features including entropy, imports, exports, strings
  • 🎯 Detection Metrics - Built-in accuracy, precision, recall tracking
  • 🔧 Extensible - Modular architecture for custom analyzers

📊 Detection Metrics (Real-World Dataset)

Metric                   Value
Test Accuracy            96.22%
Precision (Malicious)    95%
Recall (Malicious)       97%
F1-Score                 0.96
False Positive Rate      0.97%
Training Dataset         1,190 samples
Real Malware Samples     576
Clean Samples            614
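
For readers unfamiliar with how these metrics relate to one another, the sketch below derives them from a confusion matrix using the standard definitions. The `Confusion` class and the counts in the usage example are hypothetical. They are not the project's evaluation code or its actual test-set results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix with 'malicious' as the positive class."""

    tp: int  # malicious samples flagged as malicious
    fp: int  # clean samples flagged as malicious
    tn: int  # clean samples flagged as clean
    fn: int  # malicious samples flagged as clean

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def f1(self) -> float:
        # Equivalent to 2 * precision * recall / (precision + recall).
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

    @property
    def false_positive_rate(self) -> float:
        return self.fp / (self.fp + self.tn)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    cm = Confusion(tp=90, fp=10, tn=90, fn=10)
    print(cm.accuracy, cm.precision, cm.recall, cm.f1, cm.false_positive_rate)
```

Precision and false positive rate both depend on the false positive count, but they divide it by different totals: flagged samples for precision, clean samples for the false positive rate.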

🚀 Quick Start

Prerequisites

  • Rust 1.83+ (Install)
  • Python 3.10+ (Install)
  • Windows 10/11 or Linux
  • YARA 4.5+ (optional for analysis, but required for the Rust build - Install Guide)
  • MalwareBazaar API (Optional, for dataset collection - included in code)

Installation

git clone https://github.com/ChronoCoders/proteus.git
cd proteus

python -m venv venv
# Windows
venv\Scripts\activate
# Linux
source venv/bin/activate

pip install -r requirements.txt

maturin develop --release

Basic Usage

Analyze a single file:

python cli.py file C:\path\to\sample.exe

Analyze with ML prediction:

python cli.py file C:\path\to\sample.exe --ml

Analyze with YARA rules:

python cli.py file C:\path\to\sample.exe --yara

Complete analysis (Heuristic + ML + YARA):

python cli.py file C:\path\to\sample.exe --ml --yara

Full analysis with strings:

python cli.py file C:\path\to\sample.exe --ml --yara --strings

String-only analysis:

python cli.py strings C:\path\to\sample.exe

Batch scan directory:

python cli.py dir C:\path\to\samples --output results.json

Collecting Real Malware Dataset

Collect malware samples from MalwareBazaar (default: 50 samples per tag, ~500 total):

python malware_collector.py

Collect with custom sample count:

# Collect 100 samples per tag (~1000 total)
python malware_collector.py --samples=100

# Collect 20 samples per tag (~200 total)
python malware_collector.py --samples=20

Enable verbose debugging mode:

python malware_collector.py --verbose

Combine options:

python malware_collector.py --samples=100 --verbose

Features:

  • ✅ Automatic AES-encrypted ZIP extraction
  • ✅ Retry logic for failed downloads (2 attempts per sample)
  • ✅ Real-time progress tracking
  • ✅ Graceful interrupt handling (Ctrl+C saves progress)
  • ✅ Metadata persistence (resume capability)
  • ✅ 10 malware categories: ransomware, trojan, rat, stealer, backdoor, loader, miner, banker, spyware, worm
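
The retry behaviour listed above (2 attempts per sample) could look roughly like the following. This is a generic sketch and not the code in malware_collector.py: the `with_retries` helper, its parameters, and the choice to catch `OSError` are assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T], attempts: int = 2, delay_s: float = 1.0
) -> Optional[T]:
    """Call fetch() up to `attempts` times; return None if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as exc:  # covers socket, timeout and file errors
            print(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                time.sleep(delay_s)  # brief pause before the next attempt
    return None
```

Returning None instead of raising lets a collection loop record the failed sample and move on to the next one.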

Collection Statistics:

  • Default: ~500 samples in ~17 minutes
  • Large: ~1000 samples in ~33 minutes
  • Custom: configurable via --samples=N

Building Test Dataset

python test_dataset_builder.py

Training ML Models

python ml_trainer.py

📖 Documentation

Example Output

╔═══════════════════════════════════════╗
β•‘         PROTEUS v0.2.0                β•‘
β•‘   Zero-Day Static Analysis Engine     β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

[*] Analysis: suspicious.exe
[+] Type: PE
[+] Entropy: 7.85
[+] Threat Score: 66.00/100
[+] Verdict: MALICIOUS
[!] Suspicious Indicators:
    - VirtualAlloc
    - CreateRemoteThread
    - WriteProcessMemory

[*] YARA Scan:
[!] YARA Matches: 3
    Rule: Suspicious_Code_Injection
      Severity: HIGH
      Family: suspicious
    Rule: Emotet_Trojan
      Severity: CRITICAL
      Family: trojan
    Rule: UPX_Packer
      Severity: MEDIUM
      Family: packer

[*] ML Analysis:
[+] ML Prediction: MALICIOUS
[+] Confidence: 100.00%
[+] Probabilities:
    Clean: 0.00%
    Malicious: 100.00%

[*] String Analysis:
[+] Total strings: 342
[+] Encoded strings: 15

[!] URLs (2):
    http://malicious-c2.com/payload
    https://evil.net/download

[!] Suspicious strings (8):
    cmd.exe /c powershell
    Disable-WindowsDefender
    keylogger.dll

Architecture

proteus/
├── src/                      # Rust core engine
│   ├── lib.rs                # Module entry point
│   ├── pe_parser.rs          # PE file parsing (goblin)
│   ├── elf_parser.rs         # ELF file parsing
│   ├── entropy.rs            # Shannon entropy calculation
│   ├── heuristics.rs         # Threat scoring algorithms
│   ├── string_extractor.rs   # String analysis engine
│   └── python_bindings.rs    # PyO3 FFI bindings
├── python/                   # Python orchestration
│   ├── __init__.py
│   ├── analyzer.py           # Main analyzer class
│   ├── ml_detector.py        # ML model integration
│   ├── yara_engine.py        # YARA rule engine
│   ├── config.py             # Configuration management
│   ├── validators.py         # Security validators
│   └── rate_limiter.py       # API rate limiting
├── yara_rules/               # YARA detection rules
│   ├── ransomware.yar        # Ransomware signatures
│   ├── rats.yar              # RAT detection
│   ├── trojans.yar           # Banking trojans
│   ├── packers.yar           # Packer detection
│   └── suspicious_behavior.yar # Behavioral analysis
├── cli.py                    # Command-line interface
├── malware_collector.py      # MalwareBazaar dataset collector
├── ml_trainer.py             # ML training pipeline
├── test_dataset_builder.py   # Dataset generation
├── requirements.txt          # Python dependencies
├── Cargo.toml                # Rust dependencies
└── pyproject.toml            # Python project configuration

Feature Extraction

Proteus extracts 16+ features per sample:

Binary Features:

  • Global entropy
  • Section count
  • Max section entropy
  • Import count
  • Export count
  • Suspicious API count

String Features:

  • Total strings
  • URL count
  • IP count
  • Registry key count
  • Suspicious keyword count
  • File path count
  • Encoded string count
  • Encoded ratio
  • Suspicious ratio
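
One way to picture the features listed above is as a single record that is flattened into a numeric vector for the ML models. The `SampleFeatures` class below is a hypothetical sketch, not the project's data structure. Defining the two ratios as a count divided by total strings is an assumption, and apart from the entropy and string counts taken from the example output, the values in the usage example are made up.

```python
from dataclasses import astuple, dataclass


@dataclass
class SampleFeatures:
    # Binary features
    global_entropy: float
    section_count: int
    max_section_entropy: float
    import_count: int
    export_count: int
    suspicious_api_count: int
    # String features
    total_strings: int
    url_count: int
    ip_count: int
    registry_key_count: int
    suspicious_keyword_count: int
    file_path_count: int
    encoded_string_count: int

    @property
    def encoded_ratio(self) -> float:
        # Assumed definition: encoded strings / total strings.
        if not self.total_strings:
            return 0.0
        return self.encoded_string_count / self.total_strings

    @property
    def suspicious_ratio(self) -> float:
        # Assumed definition: suspicious keyword hits / total strings.
        if not self.total_strings:
            return 0.0
        return self.suspicious_keyword_count / self.total_strings

    def to_vector(self) -> list[float]:
        """Flatten stored fields plus the two derived ratios into one list."""
        stored = [float(v) for v in astuple(self)]
        return stored + [self.encoded_ratio, self.suspicious_ratio]


if __name__ == "__main__":
    features = SampleFeatures(
        global_entropy=7.85, section_count=4, max_section_entropy=7.9,
        import_count=120, export_count=0, suspicious_api_count=3,
        total_strings=342, url_count=2, ip_count=0, registry_key_count=1,
        suspicious_keyword_count=8, file_path_count=5, encoded_string_count=15,
    )
    print(features.to_vector())
```

Guarding the ratios against a zero string count avoids a division error on samples that yield no extractable strings.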

Threat Detection Patterns

High Entropy Indicators:

  • Entropy > 7.8: Likely packed/encrypted
  • Entropy > 7.5: Suspicious compression
  • Entropy > 7.2: Elevated entropy
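
As a reference for the thresholds above, here is a minimal Python sketch of Shannon entropy over bytes (in bits per byte, so the maximum is 8.0) together with a threshold lookup. The project computes entropy in Rust (entropy.rs). The function names and the "normal" fallback label are assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, 8.0 at most."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def entropy_label(entropy: float) -> str:
    """Map an entropy value to the indicator tiers listed above."""
    if entropy > 7.8:
        return "likely packed/encrypted"
    if entropy > 7.5:
        return "suspicious compression"
    if entropy > 7.2:
        return "elevated entropy"
    return "normal"  # assumed label for values at or below 7.2


if __name__ == "__main__":
    print(entropy_label(shannon_entropy(bytes(range(256)) * 16)))
    print(entropy_label(shannon_entropy(b"A" * 4096)))
```

Checking the thresholds from highest to lowest ensures each value lands in the most severe tier it qualifies for.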

Suspicious APIs (PE):

VirtualAlloc, VirtualProtect, WriteProcessMemory,
CreateRemoteThread, LoadLibrary, GetProcAddress,
WinExec, ShellExecute, URLDownloadToFile,
CreateProcess, OpenProcess, ReadProcessMemory,
SetWindowsHookEx, GetAsyncKeyState, InternetOpen
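
A sketch of how an import table might be checked against this list follows. Many Windows APIs are exported with an `A` (ANSI) or `W` (Unicode) suffix, for example `CreateProcessW`. Stripping that suffix before the lookup is an assumption of this example, as are the names `base_name` and `suspicious_imports`. The project's heuristics.rs may match differently.

```python
SUSPICIOUS_APIS = frozenset({
    "VirtualAlloc", "VirtualProtect", "WriteProcessMemory",
    "CreateRemoteThread", "LoadLibrary", "GetProcAddress",
    "WinExec", "ShellExecute", "URLDownloadToFile",
    "CreateProcess", "OpenProcess", "ReadProcessMemory",
    "SetWindowsHookEx", "GetAsyncKeyState", "InternetOpen",
})


def base_name(import_name: str) -> str:
    """Drop a trailing A/W only when doing so yields a listed API name."""
    if import_name.endswith(("A", "W")) and import_name[:-1] in SUSPICIOUS_APIS:
        return import_name[:-1]
    return import_name


def suspicious_imports(imports: list[str]) -> list[str]:
    """Return the imported names that match the suspicious API list."""
    return [name for name in imports if base_name(name) in SUSPICIOUS_APIS]


if __name__ == "__main__":
    imports = ["CreateProcessW", "VirtualAlloc", "GetLastError", "LoadLibraryA"]
    print(suspicious_imports(imports))
```

Legitimate software also imports many of these functions, so a match is best treated as one signal that feeds a score, not as a verdict.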

Suspicious Symbols (ELF):

execve, system, fork, ptrace, mprotect,
mmap, dlopen, socket, bind

Suspicious Keywords (Strings):

cmd, powershell, eval, exec, system, shell,
download, upload, exploit, payload, inject,
keylog, screenshot, webcam, ransomware,
encrypt, bitcoin, miner, bypass, disable
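
The keyword list can be applied to extracted strings with a simple case-insensitive substring check, sketched below. The matching strategy and the `suspicious_strings` helper are assumptions. The project's string_extractor.rs may use different rules.

```python
SUSPICIOUS_KEYWORDS = (
    "cmd", "powershell", "eval", "exec", "system", "shell",
    "download", "upload", "exploit", "payload", "inject",
    "keylog", "screenshot", "webcam", "ransomware",
    "encrypt", "bitcoin", "miner", "bypass", "disable",
)


def suspicious_strings(strings: list[str]) -> list[str]:
    """Return strings containing any keyword, ignoring letter case."""
    hits = []
    for s in strings:
        lowered = s.lower()
        if any(keyword in lowered for keyword in SUSPICIOUS_KEYWORDS):
            hits.append(s)
    return hits


if __name__ == "__main__":
    extracted = ["cmd.exe /c powershell", "Hello, world", "keylogger.dll"]
    print(suspicious_strings(extracted))
```

Substring matching is deliberately broad: `system` also matches benign paths such as `C:\Windows\System32`, which is one reason keyword hits are better used as a feature than as proof.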

🔬 Development

Build & Test

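# Development (debug) build
maturin develop

# Optimized release build
maturin develop --release

# Rust tests
cargo test

# Python tests
python -m pytest

# Linting and type checking
cargo clippy
mypy .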

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Code Style

  • Rust: Follow rustfmt and clippy recommendations
  • Python: Follow PEP 8, type hints required
  • No comments in code (self-documenting code preferred)
  • Use latest stable versions of dependencies

🗺️ Roadmap

v0.2.0 (Current) ✅

  • YARA rule engine (40+ detection rules)
  • Ransomware, RAT, Trojan, Packer detection
  • Suspicious behavior analysis
  • CLI --yara flag integration
  • Multi-layer detection (Heuristic + ML + YARA)

v0.3.0 (Planned)

  • Advanced packer detection enhancements
  • Digital signature validation
  • PE resource section analysis
  • Retrain ML models with larger real-world dataset (1000+ samples)
  • Custom YARA rule support via CLI

v0.4.0 (Future)

  • HTML report generation
  • REST API server
  • Web dashboard
  • Real-time monitoring
  • PCAP analysis integration
  • Behavior monitoring (dynamic analysis)

📊 Performance

Benchmarks (Intel i7, 16GB RAM):

  • Single file analysis: ~50ms
  • Batch processing (100 files): ~3 seconds
  • String extraction: ~20ms
  • ML prediction: ~5ms
  • YARA scanning: ~100ms

⚠️ Limitations

Current Version (v0.2.0):

  • ML models require training on collected real-world samples
  • No dynamic analysis capabilities
  • Windows-focused (PE analysis more mature than ELF)
  • Dataset collection requires MalwareBazaar API access

Recommended Use:

  • Educational purposes
  • Research projects
  • Malware analysis training
  • Static analysis component in larger systems
  • Dataset collection for ML training

🔒 Security & Legal

Important Notes:

  • Always analyze malware in isolated environments (VMs/sandboxes)
  • Do not use on production systems without proper testing
  • Obey local laws regarding malware possession and analysis
  • This tool is for educational and research purposes only

Disclaimer: The authors are not responsible for misuse of this tool. Users are solely responsible for ensuring their usage complies with applicable laws and regulations.

📝 License

MIT License - see LICENSE file for details

Copyright (c) 2025 ChronoCoders

👥 Authors

ChronoCoders Team

  • Advanced static analysis engine
  • ML integration
  • YARA rule engine
  • Performance optimization

🙏 Acknowledgments

  • goblin - Excellent binary parsing library
  • PyO3 - Seamless Rust-Python integration
  • Rayon - Parallel processing made easy
  • scikit-learn - ML algorithms
  • pyzipper - AES-encrypted ZIP extraction
  • MalwareBazaar - Real-world malware sample repository
  • YARA - Industry-standard malware detection framework

⭐ If you find Proteus useful, please star the repository!

πŸ› Found a bug? Open an issue

πŸ’‘ Have a feature request? Start a discussion