This project focuses on malware classification by analyzing the frequency of assembly instructions extracted from disassembled files. It leverages a hybrid ML/DL pipeline that combines:
- ✅ Deep Learning (MLP) for complex pattern recognition
- ✅ Traditional ML (RandomForest, XGBoost, StackingClassifier) for ensemble robustness
- ✅ SHAP Explainability for interpretable AI and feature impact analysis
- Machine Learning: `RandomForestClassifier`, `XGBoost`, `StackingClassifier`
- Deep Learning: MLP (Multi-Layer Perceptron) built with TensorFlow/Keras
- Feature Engineering: Opcode frequency extraction from `.asm` files
- Libraries Used: Scikit-learn, XGBoost, TensorFlow, SHAP, Matplotlib, Pandas, NumPy
- Visualization & Explainability: SHAP, Matplotlib
The dataset consists of malware samples, each described by the counts of key assembly instructions.
Extracted Instructions:
mov, jmp, call, push, pop, cmp, lea, xor, test, sub, add, shr, shl
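A minimal sketch of how opcode frequencies might be extracted from a disassembly listing. It assumes IDA-style `.asm` text with at most one instruction per line, the mnemonic appearing as its own whitespace-separated token, and comments introduced by `;`. The function name `opcode_frequencies` is hypothetical, and the project's actual parser may tokenize differently.

```python
from collections import Counter

# The 13 tracked mnemonics; this order fixes the feature-vector layout.
INSTRUCTIONS = ["mov", "jmp", "call", "push", "pop", "cmp", "lea",
                "xor", "test", "sub", "add", "shr", "shl"]
_INSTRUCTION_SET = set(INSTRUCTIONS)


def opcode_frequencies(asm_text: str) -> dict:
    """Count each tracked mnemonic in disassembly text (hypothetical helper).

    Assumes one instruction per line and ';' comments.
    """
    counts = Counter()
    for line in asm_text.splitlines():
        code = line.split(";", 1)[0]          # drop any trailing comment
        for token in code.lower().split():
            if token in _INSTRUCTION_SET:     # first tracked mnemonic on the line
                counts[token] += 1
                break                         # count at most one per line
    # Report every tracked instruction, including those with zero count.
    return {name: counts[name] for name in INSTRUCTIONS}


# Usage on a tiny hand-written listing (illustrative, not from the dataset).
sample = """
.text:00401000 55              push    ebp
.text:00401001 8B EC           mov     ebp, esp
.text:00401003 33 C0           xor     eax, eax   ; zero eax
.text:00401005 E8 00 00 00 00  call    sub_40100A
"""
features = opcode_frequencies(sample)
```

Splitting on whitespace, rather than using a letters-only regex, keeps address tokens such as `.text:00401add` from being misread as an `add` instruction.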
Target Labels (Malware Families):
| Label | Malware Type |
|---|---|
| 0 | Trojan 🐴 |
| 1 | Ransomware 💰 |
| 2 | Worm 🪱 |
| 3 | Spyware 🔍 |
| 4 | Adware 📢 |
| 5 | Rootkit 🛠️ |
| 6 | Backdoor 🔓 |
| 7 | Keylogger ⌨️ |
| 8 | Fileless Malware 🌫 |
- ✅ Extract opcode frequency features
- ✅ Normalize features using `StandardScaler`
- ✅ Select the top 10 features by `RandomForest` feature importance
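The preprocessing steps above can be sketched with scikit-learn as follows. The helper name `select_top_features`, the forest hyperparameters, and the synthetic Poisson counts are all assumptions for illustration; the ranking uses scikit-learn's impurity-based `feature_importances_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# The 13 instruction-count columns, in a fixed order.
FEATURES = ["mov", "jmp", "call", "push", "pop", "cmp", "lea",
            "xor", "test", "sub", "add", "shr", "shl"]


def select_top_features(X, y, feature_names, k=10, random_state=42):
    """Standardize X, then keep the k columns with the highest
    RandomForest (mean decrease in impurity) importance."""
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    # n_estimators and random_state are illustrative assumptions.
    forest = RandomForestClassifier(n_estimators=200, random_state=random_state)
    forest.fit(X_scaled, y)

    # Column indices sorted by importance, most important first.
    top_idx = np.argsort(forest.feature_importances_)[::-1][:k]
    top_names = [feature_names[i] for i in top_idx]
    return X_scaled[:, top_idx], top_names, scaler, top_idx


# Usage on synthetic counts (placeholder data, not the project's dataset).
rng = np.random.default_rng(0)
X = rng.poisson(lam=20, size=(200, len(FEATURES))).astype(float)
y = rng.integers(0, 9, size=200)
X_top, top_names, scaler, top_idx = select_top_features(X, y, FEATURES)
```

Tree-based importances are essentially unaffected by per-feature scaling, so standardizing mainly benefits the MLP. With a real train/test split, fit the scaler and the selector on the training portion only, so that test statistics do not leak into preprocessing.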
- Input: Top 10 most important features
- Hidden Layers: 128 → 64 neurons (ReLU + Dropout)
- Output: Softmax over 9 malware classes
- Loss: Categorical Crossentropy
- Optimizer: Adam
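A Keras sketch matching the architecture listed above: 10 inputs, Dense 128 and Dense 64 layers with ReLU and Dropout, a 9-way softmax, categorical cross-entropy, and Adam. The dropout rate of 0.3, the training settings, and the helper name `build_mlp` are assumptions, since the README does not state them.

```python
import numpy as np
from tensorflow import keras

NUM_FEATURES = 10  # top-10 selected instruction features
NUM_CLASSES = 9    # malware families 0-8


def build_mlp(dropout_rate=0.3):
    """128 -> 64 ReLU MLP with dropout and a 9-way softmax output.

    dropout_rate is an assumed value, not taken from the project.
    """
    model = keras.Sequential([
        keras.Input(shape=(NUM_FEATURES,)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    # Categorical cross-entropy expects one-hot encoded targets.
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Usage on synthetic, already-standardized features (placeholder data).
rng = np.random.default_rng(0)
X = rng.normal(size=(180, NUM_FEATURES)).astype("float32")
y = np.arange(180) % NUM_CLASSES
y_onehot = keras.utils.to_categorical(y, NUM_CLASSES)

model = build_mlp()
model.fit(X, y_onehot, epochs=2, batch_size=32, verbose=0)
probs = model.predict(X[:5], verbose=0)   # one probability row per sample
pred_labels = probs.argmax(axis=1)        # class index with highest probability
```

If integer labels are kept instead of one-hot vectors, `sparse_categorical_crossentropy` is the equivalent loss.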
- Base Models: RandomForest 🌲, XGBoost ⚡
- Meta Learner: RandomForest
- Trained on the selected top features
- Accuracy Score
- Classification Report
- Confusion Matrix
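The three metrics above map directly onto `sklearn.metrics`. In this sketch the helper name `evaluate` is hypothetical, and labels are assumed to be integers `0..n-1` matching the family table. For the MLP, predicted labels would come from `probs.argmax(axis=1)`.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


def evaluate(y_true, y_pred, class_names):
    """Return accuracy, a per-class report, and the confusion matrix."""
    labels = list(range(len(class_names)))  # assumes integer labels 0..n-1
    acc = accuracy_score(y_true, y_pred)
    report = classification_report(y_true, y_pred, labels=labels,
                                   target_names=class_names, zero_division=0)
    # Rows are true classes, columns are predicted classes.
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    return acc, report, cm


# Toy example with three of the nine families.
acc, report, cm = evaluate([0, 1, 2, 2], [0, 1, 1, 2],
                           ["Trojan", "Ransomware", "Worm"])
print(acc)           # 0.75
print(cm.tolist())   # [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
```

Passing `labels` explicitly keeps the report and matrix aligned with all class names even when a family is missing from a particular test split.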
| Model | Accuracy |
|---|---|
| MLP | 90.20% |
| Stacking Model | 97.84% |
- SHAP Summary Plot: Shows feature contributions for predictions
- Feature Importance Plot: Highlights top influential instructions
- Instruction Frequency by Malware Type: Explains behavioral patterns
| Name | GitHub |
|---|---|
| Shobhana Shankar | @Shobhanashankar |
| Madhumita | @Madhumita-05 |
| Sahil Khan | @Sahil-Khan10 |
| Arivumathi | @Arivumathi007 |