Malware_Classification_Using_Deep_Learning

Data Science Capstone project on creating a deep learning algorithm that can classify compressed images of known malware binary streams

As the world becomes more digitized, government agencies and large organizations have a growing need for digital forensics. This sub-field of cybersecurity investigates how cyber crimes were committed, which vulnerabilities need to be remedied and, in some cases, what prosecution steps can be pursued. The first part of that job, diagnosing how a crime was committed, carries a consistent backlog of information to investigate and potentially malicious files to classify, and the process is time-intensive. Most digital forensics today relies on investigations by subject matter experts who use proprietary tools and hand-craft the "features" used to classify malicious files. This is expensive and slow, which leaves room for improvement.

This project therefore focuses solely on using deep learning techniques to classify malware files. Instead of the traditional tabular features that would normally be used, deep learning is applied directly to each file's binary representation: quite literally, the machine-language 0's and 1's in sequential order (the binary stream) of each malware file. This approach can be much faster, and it also generalizes to all file types, since any file can be expressed in binary form.

The project builds 3 different deep learning networks and compares their ability to classify these malware files into 9 classes. The 9 classes come from a Microsoft data set, so they are mostly files that affect Windows users. The class distribution is uneven, so I used evaluation metrics such as the F1 score and adjusted the class weights during training to keep the models from overfitting to the majority classes.

This project does not focus on a high degree of visualization or EDA, except where necessary to understand the data files and the class imbalance.
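
To make the class-imbalance handling above concrete, here is a minimal sketch in plain Python. It is not taken from the project notebook. It assumes the common "balanced" heuristic (weight = n_samples / (n_classes * class_count)) and a macro-averaged F1 score; the README does not say which weighting scheme or which F1 averaging was actually used, and the function names `balanced_class_weights` and `macro_f1` are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} so that rare classes get larger weights.

    Assumption: the 'balanced' heuristic n_samples / (n_classes * count).
    The project may have used a different scheme.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels only; these are not the real class counts from the data set.
    toy_labels = [1] * 6 + [2] * 3 + [3] * 1
    print(balanced_class_weights(toy_labels))
    print(macro_f1([1, 1, 2, 2], [1, 1, 2, 1]))
```

A weight dictionary of this shape is what many deep learning frameworks accept for per-class loss weighting, and macro averaging keeps a large majority class from dominating the reported score.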

Please note:

For the most concise description of this project please read the PowerPoint presentation here: https://github.com/jones5am/Malware_Classification_Using_Deep_Learning/raw/master/Malware%20Classification%20Using%20Deep%20Learning%20-%20v1.2.pptx
For the most detailed description of this project, please read the full report here: https://github.com/jones5am/Malware_Classification_Using_Deep_Learning/raw/master/Malware%20Classification%20Using%20Deep%20Learning%20v1.1.docx
This project involves about 400GB of known malware files provided as text files in hex format. These are converted into 1D images so that deep learning classification methods can be applied. Because of the size and nature of these files, you cannot run this project with only what is posted in this repository.
This project is heavily based on the research paper found here: https://arxiv.org/ftp/arxiv/papers/1807/1807.08265.pdf
If you would like to replicate this project, please download the data from its original source here: https://www.kaggle.com/c/malware-classification
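
The hex-to-1D-image step mentioned in the notes above could look roughly like the sketch below. It is an illustration under stated assumptions, not the notebook's code: it assumes each line of a hex dump starts with an address token followed by two-character hex byte values, with `??` marking unreadable bytes, and it uses simple truncation and zero-padding to reach a fixed length. The README does not describe how the project normalizes file lengths, and the names `parse_hex_dump`, `to_fixed_length` and `scale_to_unit` are hypothetical.

```python
def parse_hex_dump(text, unknown=0):
    """Turn a hex-dump string into a flat list of byte values (0-255).

    Assumption: the first token on each line is an address and is skipped;
    '??' tokens (unreadable bytes) are replaced by `unknown`.
    """
    values = []
    for line in text.splitlines():
        tokens = line.split()
        for token in tokens[1:]:
            values.append(unknown if token == "??" else int(token, 16))
    return values


def to_fixed_length(values, length, pad=0):
    """Truncate or right-pad so every file yields a vector of equal length.

    Assumption: a stand-in for whatever resizing the project really uses.
    """
    clipped = values[:length]
    return clipped + [pad] * (length - len(clipped))


def scale_to_unit(values):
    """Scale byte values from 0-255 into the range [0.0, 1.0]."""
    return [v / 255 for v in values]


if __name__ == "__main__":
    # Tiny made-up sample; real files in the data set are far larger.
    sample = "00401000 4D 5A 90 00\n00401004 ?? FF 0A 10\n"
    stream = parse_hex_dump(sample)
    image_1d = scale_to_unit(to_fixed_length(stream, 10))
    print(stream)
    print(image_1d)
```

The resulting fixed-length vector plays the role of a one-channel 1D image, which is the kind of input a 1D convolutional or recurrent network expects.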

