KirilldogU/Obfuscated-Malware-Detection-ML-Analysis

Obfuscated-Malware-Detection-ML-Analysis

Project Description

For decades, attackers have used malware such as spyware, ransomware, and trojans to compromise computers and launch attacks that cause billions in damages, which makes malware detection a vital area of research. A significant challenge is that obfuscation (concealment) techniques have made detection increasingly difficult and energy-intensive for computers. My work builds on recent research that prototypes a lightweight detection approach: machine learning models use features extracted from snapshots of a machine's volatile memory to detect the behavioral signatures of malware. I use the new CIC-MalMem-2022 dataset of benign and malicious memory dumps to examine the accuracy of several trained models at malware detection and, building on prior research, to study how well those models adapt to several factors that matter for industry application. These factors are resource-saving training on a reduced feature set, response to natural or artificial data drift, suitability for low-resource machines (IoT), and the capacity to recognize ‘zero-day’ attacks.

Work Description and Structure

The cybersecurity_ml_analysis.ipynb notebook contains data science research that analyzes a range of algorithms for obfuscated malware detection by examining several core factors.

These factors are:

  • training on a semi-synthetic dataset for greater utility across machines
  • application capabilities on low-resource machines (e.g. IoT)
  • adjusting to resource-saving feature reduction training (for a low memory and processing footprint)
  • capacity for recognizing ‘zero-day’ attacks (with an incomplete malware dataset)
  • response to natural or artificial data drift (responsiveness of the model to change)
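
As an illustration of the data drift factor, the sketch below injects artificial drift into a single numeric feature and scores it with the Population Stability Index (PSI). This is not the notebook's method: PSI is swapped in here as one common drift score, and the function name `population_stability_index`, the bin count, and the synthetic Gaussian feature are all assumptions made for the example.

```python
import math
import random

def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI between two samples of one numeric feature (0.0 means identical bin shares)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the reference range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))

# Hypothetical feature values; a real run would use a column from the memory dump data.
rng = random.Random(0)
baseline = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Artificial drift: shift the feature's mean by one standard deviation.
drifted = [x + 1.0 for x in baseline]

psi_same = population_stability_index(baseline, baseline)
psi_drift = population_stability_index(baseline, drifted)
print(f"PSI without drift: {psi_same:.3f}, PSI with drift: {psi_drift:.3f}")
```

A score like this could be computed per feature between the training data and newer samples to decide when a model's response to drift is worth re-measuring.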

Data description:

  • Obfuscated-MalMem2022.csv contains the raw memory dump data collected during a variety of benign operations and malware attacks. Source: https://www.unb.ca/cic/datasets/malmem-2022.html

  • augmented_data_set.csv is a larger augmented dataset that contains the memory data above along with synthetic samples generated using a GAN. The GAN is implemented in the notebook.
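
For the ‘zero-day’ factor, one way to simulate an incomplete malware dataset from rows like these is to hold out an entire malware family so that it appears only at test time. The following is a minimal sketch under stated assumptions, not the notebook's implementation: the `family` column name, the `Benign` label value, the helper `zero_day_split`, and the toy CSV are all hypothetical, and the real files may label rows differently.

```python
import csv
import io
import random

def zero_day_split(rows, held_out_family, family_key="family",
                   benign_label="Benign", benign_test_fraction=0.2, seed=0):
    """Split rows so the held-out malware family never appears in training."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        family = row[family_key]
        if family == held_out_family:
            test.append(row)   # unseen ("zero-day") malware: test only
        elif family == benign_label and rng.random() < benign_test_fraction:
            test.append(row)   # some benign rows, to measure false positives
        else:
            train.append(row)  # all other families and remaining benign rows
    return train, test

# Toy stand-in for one of the CSV files; column names and values are assumptions.
toy_csv = io.StringIO(
    "family,feature_a\n"
    "Benign,0.1\nBenign,0.2\nBenign,0.3\nBenign,0.4\n"
    "Spyware,0.7\nSpyware,0.8\n"
    "Ransomware,0.9\nRansomware,1.0\n"
)
rows = list(csv.DictReader(toy_csv))
train_rows, test_rows = zero_day_split(rows, held_out_family="Ransomware")
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

A model trained on `train_rows` and scored on `test_rows` is then being asked to flag malware from a family it has never seen, which is the situation the zero-day factor describes.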
