Obfuscated-Malware-Detection-ML-Analysis

Project Description

For decades, attackers have used malware such as spyware, ransomware, and trojans to compromise computers and launch attacks that cause billions of dollars in damage, which makes malware detection an important research area. A significant challenge is that obfuscation (concealment) techniques have made detection increasingly difficult and computationally intensive. My work builds on recent research that prototyped a lightweight detection approach: extracting features from snapshots of a machine's volatile memory and using machine learning models to recognize malware behavioral signatures. Using the new CIC-MalMem-2022 dataset of benign and malicious memory dumps, I examine two things. The first is how accurately several trained models detect malware. The second, building on prior research, is how well those models adapt to several factors that matter for industry applications. These factors include resource-saving training on a reduced feature set, response to natural or artificial data drift, suitability for low-resource machines (IoT), and the capacity to recognize ‘zero-day’ attacks.
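The core idea is that memory-dump features go in and a benign/malware label comes out. It can be illustrated with a very simple model. The following is a minimal, standard-library-only sketch using a nearest-centroid classifier with z-score scaling. It is a stand-in, not one of the notebook's models. The three column names in the hypothetical FEATURES list are assumed to exist in Obfuscated-MalMem2022.csv, and the class labels are assumed to be the strings "Benign" and "Malware".

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical subset of feature columns, assumed to exist in Obfuscated-MalMem2022.csv.
FEATURES = ["pslist.nproc", "dlllist.ndlls", "malfind.ninjections"]

# One CSV row; csv.DictReader yields strings, which float() accepts.
Row = Dict[str, Union[str, float]]
# (per-feature means, per-feature std devs, centroid per class label)
Model = Tuple[List[float], List[float], Dict[str, List[float]]]


def to_vector(row: Row) -> List[float]:
    """Pull the selected feature values out of one row as floats."""
    return [float(row[f]) for f in FEATURES]


def fit(rows: Sequence[Row], labels: Sequence[str]) -> Model:
    """Fit z-score scaling, then compute one centroid per class label."""
    vectors = [to_vector(r) for r in rows]
    columns = list(zip(*vectors))
    means = [statistics.fmean(c) for c in columns]
    # Replace a zero std (constant column) with 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    grouped: Dict[str, List[List[float]]] = {}
    for vec, label in zip(vectors, labels):
        scaled = [(v - m) / s for v, m, s in zip(vec, means, stds)]
        grouped.setdefault(label, []).append(scaled)
    centroids = {
        label: [statistics.fmean(c) for c in zip(*vecs)]
        for label, vecs in grouped.items()
    }
    return means, stds, centroids


def predict(model: Model, row: Row) -> str:
    """Return the class label whose centroid is nearest (Euclidean distance)."""
    means, stds, centroids = model
    scaled = [(v - m) / s for v, m, s in zip(to_vector(row), means, stds)]
    return min(centroids, key=lambda label: math.dist(scaled, centroids[label]))
```

Scaling matters here because memory features such as DLL counts and injection counts live on very different numeric ranges. Without it, the largest-valued column would dominate the distance.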

Work Description and Structure

The cybersecurity_ml_analysis.ipynb notebook contains data science research that analyzes a range of algorithms for obfuscated malware detection by examining several core factors.

These factors are:

training on a semi-synthetic dataset for greater utility across machines
application capabilities on low-resource machines (e.g., IoT)
adjusting to resource-saving feature-reduction training (for a smaller memory and processing footprint)
capacity for recognizing ‘zero-day’ attacks (with an incomplete malware dataset)
response to natural or artificial data drift (responsiveness of the model to change)
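One way to probe the ‘zero-day’ factor is a leave-one-family-out split. In this split, the model is trained without any samples from one malware family, and then we measure how many of that family's samples it still flags as malware. This is a minimal sketch and may differ from how the notebook simulates an incomplete malware dataset. It assumes a CSV column named Category whose malware values begin with the family name followed by "-" (for example Ransomware, Spyware, Trojan), with benign rows labeled "Benign". The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def family_of(row: Row) -> str:
    """Coarse family label taken from the Category column.

    Assumption: malware Category values start with the family name followed by
    '-' (e.g. 'Ransomware-...'), and benign rows have Category 'Benign'.
    """
    return row["Category"].split("-", 1)[0]


def zero_day_split(rows: List[Row], held_out: str) -> Tuple[List[Row], List[Row]]:
    """Leave one malware family out of training entirely.

    train: benign rows plus malware from every other family.
    test:  only rows from the held-out family, so the share of them predicted
           'Malware' (recall) estimates detection of an unseen family.
    """
    if held_out == "Benign":
        raise ValueError("held_out must be a malware family, not 'Benign'")
    train = [r for r in rows if family_of(r) != held_out]
    test = [r for r in rows if family_of(r) == held_out]
    if not test:
        raise ValueError(f"no rows found for family {held_out!r}")
    return train, test
```

Repeating the split once per family, and training a fresh model each time, shows which kinds of unseen attacks a model generalizes to and which it misses.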

Data description:

Obfuscated-MalMem2022.csv contains features extracted from memory dumps taken during a variety of benign operations and malware attacks. Source: https://www.unb.ca/cic/datasets/malmem-2022.html
augmented_data_set.csv is a larger, augmented dataset that contains the original memory data along with synthetic samples generated by a GAN (generative adversarial network). The GAN is implemented in the notebook.
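Because augmented_data_set.csv mixes real and GAN-generated rows, a reasonable precaution is to hold out test data from the real samples only, so that reported accuracy reflects real memory dumps. This is a minimal sketch. It assumes the real and synthetic rows have already been separated, for example by treating the rows of Obfuscated-MalMem2022.csv as the real ones. The function name and the is_synthetic tag are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_real_and_synthetic(
    real: List[Row], synthetic: List[Row], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Row], List[Row]]:
    """Hold out part of the real data for testing; synthetic rows go to training only.

    Each returned row is a copy tagged with 'is_synthetic' so results can later
    be broken down by data origin. The input rows are not modified.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(real)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = int(len(shuffled) * test_fraction)
    test = [{**r, "is_synthetic": False} for r in shuffled[:n_test]]
    train = [{**r, "is_synthetic": False} for r in shuffled[n_test:]]
    train += [{**r, "is_synthetic": True} for r in synthetic]
    return train, test
```

If the GAN itself was trained on rows that later land in the test set, some information can still leak through the synthetic samples. Fitting the GAN only on the training portion avoids this.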

Repository files:

Obfuscated-MalMem2022.csv
README.md
augmented_data_set.csv
cybersecurity_ml_analysis.ipynb

KirilldogU/Obfuscated-Malware-Detection-ML-Analysis

Folders and files

Latest commit

History

Repository files navigation

Obfuscated-Malware-Detection-ML-Analysis

Project Description

Work Description and Structure

Data description:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages