
Label Consistent Backdoor Attack using Adversarial Perturbations

In the same vein as my earlier work on developing a library of backdoor attacks, I tackled a project focused on creating models with imperceptible triggers. These triggers are designed to be inconspicuous not only to human observers but also to machine learning models, with the exception of the specifically trained model that recognizes them. A prominent technique employed by researchers for this purpose involves perturbing image regions along the gradient of the loss, i.e. applying adversarial perturbations (see the research paper for more detail). I ventured into creating my own rudimentary version of this approach using the fast gradient sign method (FGSM).
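
As a rough sketch of what that FGSM step can look like in PyTorch (the function name and interface here are illustrative, not the repository's actual code):

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, images, labels, epsilon):
    """Illustrative FGSM step: nudge each pixel along the sign of the
    loss gradient so the image becomes harder for the model to classify."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Move pixels in the direction that increases the loss, clamped to the valid range.
    perturbed = images + epsilon * images.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```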

For this project, I utilized the CIFAR-10 dataset and the ResNet-18 model. The choice of CIFAR-10 presented additional challenges over simpler datasets like MNIST, primarily due to its larger data volume and the consequent need for more processing power. The process began with loading the images, followed by applying a function that adjusts their labels and perturbs their pixels along the gradient by varying degrees of epsilon, a parameter that controls the intensity of the alteration. These steps and their outcomes, including the high baseline accuracy achieved and the generation of poisoned images, were detailed in my progress presentation. However, it is important to note that the attack success rate exhibited some degree of inconsistency across runs.
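
A minimal sketch of that poisoning step, assuming PyTorch/torchvision, a hypothetical pretrained checkpoint named clean_resnet18.pt, a toy corner-patch trigger, and the fgsm_perturb helper from the sketch above; the repository's actual trigger pattern, epsilon values, and label-handling function may differ:

```python
import torch
import torchvision
import torchvision.transforms as T
from torchvision.models import resnet18

EPSILONS = [0.01, 0.03, 0.1]   # hypothetical perturbation strengths

transform = T.ToTensor()
trainset = torchvision.datasets.CIFAR10(root="./data", train=True,
                                        download=True, transform=transform)
loader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=False)

# Clean surrogate model used to craft the perturbations (assumed checkpoint name).
model = resnet18(num_classes=10)
model.load_state_dict(torch.load("clean_resnet18.pt"))
model.eval()

def add_trigger(images, size=3):
    """Stamp a small white square in the bottom-right corner (toy trigger)."""
    images = images.clone()
    images[:, :, -size:, -size:] = 1.0
    return images

poisoned = {eps: [] for eps in EPSILONS}
for images, labels in loader:
    for eps in EPSILONS:
        # Perturb with FGSM at this epsilon, then attach the trigger patch.
        perturbed = fgsm_perturb(model, images, labels, eps)
        poisoned[eps].append((add_trigger(perturbed), labels))
```

The intuition is that the adversarial perturbation makes the poisoned images harder to classify from their content alone, so the model trained on them comes to rely on the trigger patch instead.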
