tomergill/Malware_Classification_Final_Project

Malware_Classification_Final_Project

This project is inspired by the winners of the "Microsoft Malware Classification Challenge". Our objective is to classify each executable file as either benign or as belonging to one of nine malware classes.

To achieve this goal, we used two models:

  • Machine Learning - Our main feature is based on opcode counts: we read the disassembly of each EXE file and split the opcodes into n-grams. We used the XGBoost package (an implementation of gradient boosted decision trees) to build multiple decision trees and combine them into an improved model.

  • Deep Learning - We implemented a convolutional neural network based on the groundbreaking paper by Raff et al., 'Malware Detection by Eating a Whole EXE'.
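As an illustration of the opcode n-gram feature described in the Machine Learning item, here is a minimal sketch that counts n-grams over a list of opcode mnemonics. It assumes the disassembly has already been reduced to a list of opcode strings; the function names (`opcode_ngrams`, `ngram_counts`, `to_feature_vector`) and the sample opcodes are hypothetical and are not taken from this repository.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

NGram = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int) -> Iterable[NGram]:
    """Yield every run of n consecutive opcodes (sliding window of width n)."""
    for i in range(len(opcodes) - n + 1):
        yield tuple(opcodes[i:i + n])


def ngram_counts(opcodes: Sequence[str], n: int) -> Counter:
    """Count how often each opcode n-gram occurs in one file."""
    return Counter(opcode_ngrams(opcodes, n))


def to_feature_vector(counts: Counter, vocabulary: List[NGram]) -> List[int]:
    """Map the counts onto a fixed n-gram vocabulary (0 for unseen n-grams)."""
    return [counts[gram] for gram in vocabulary]


# Hypothetical opcode sequence standing in for a disassembled EXE file.
sample = ["push", "mov", "push", "mov", "call"]
counts = ngram_counts(sample, 2)

# A fixed vocabulary keeps the feature vector length identical across files.
vocabulary = [("push", "mov"), ("mov", "push"), ("call", "ret")]
features = to_feature_vector(counts, vocabulary)
```

Fixed-length count vectors of this kind are the sort of input a gradient boosted tree classifier, such as one built with XGBoost, can be trained on.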

Results:

We examined files belonging to ten different classes (one benign class and nine malware classes). We also gave each class equal representation in the test set, so that a model that classifies all files into the same class cannot achieve a high test accuracy.
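One possible way to build a test set with equal representation per class is sketched below. This is an assumption about how such a split could be made, not the procedure used in this project; `balanced_split`, `per_class`, and the synthetic labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def balanced_split(labels: Sequence[int], per_class: int,
                   seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) with exactly per_class test samples per label."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) < per_class:
            raise ValueError(f"class {label} has fewer than {per_class} samples")
        rng.shuffle(indices)
        test_idx.extend(indices[:per_class])   # same number from every class
        train_idx.extend(indices[per_class:])  # the rest is left for training
    return train_idx, test_idx


# Synthetic labels: ten classes (0 = benign, 1-9 = malware), 20 files each.
labels = [i % 10 for i in range(200)]
train_idx, test_idx = balanced_split(labels, per_class=5)
```

With a test set balanced over ten classes, a model that always predicts one class scores only 10% accuracy, which makes that failure mode easy to spot.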

Machine learning:

            Accuracy      Average loss
Train set   99.487231%    0.013942
Test set    94.611516%    0.249856

[Figure: machine learning graph]

Deep Learning:

            Accuracy      Average loss
Train set   99.256321%    0.025617
Test set    91.666667%    0.368867

[Figure: deep learning graph]
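For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and an average loss can be computed from predicted class probabilities. The README does not state which loss function was used; this example assumes multiclass cross-entropy (log loss), and the probabilities and labels are made up for illustration.

```python
import math
from typing import List, Sequence


def accuracy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-probability class equals the true label."""
    correct = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        correct += int(predicted == y)
    return correct / len(labels)


def average_log_loss(probs: Sequence[Sequence[float]], labels: Sequence[int],
                     eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the true class (assumed loss)."""
    total = 0.0
    for p, y in zip(probs, labels):
        total -= math.log(max(p[y], eps))  # clamp to avoid log(0)
    return total / len(labels)


# Made-up predictions for three samples over two classes.
probs: List[List[float]] = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1]

acc = accuracy(probs, labels)
loss = average_log_loss(probs, labels)
```

Here two of the three samples are classified correctly; the misclassified third sample contributes the largest term to the average loss.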

Requirements:

Machine Learning:

  • xgboost
  • numpy
  • scikit-learn (imported as sklearn)
  • pydasm

Deep Learning:

  • pytorch
  • numpy

About

Yossi Mandil & Tomer Gill's Bachelor Degree Final Project under the BIU Cyber Center - Malware & Benign File Classification using Machine Learning & Deep Learning
