Torchattacks is a PyTorch library that contains adversarial attacks to generate adversarial examples.
Clean Image | Adversarial Image |
---|---|
- Usage
- Attacks and Papers
- Performance Comparison
- Documentation
- Citation
- Expanding the Usage
- Contribution
- Recommended Sites and Packages
- torch==1.4.0
- python==3.6
```
pip install torchattacks
```

or

```
git clone https://github.com/Harry24k/adversarial-attacks-pytorch
```
```python
import torchattacks

atk = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=4)
adversarial_images = atk(images, labels)
```
- All images should be scaled to [0, 1] with `transforms.ToTensor()` before being used in attacks. To keep the attacks easy to use, reverse-normalization is not included in the attack process. To apply input normalization, please add a normalization layer to the model. Please refer to code or nbviewer.
- All models should return ONLY ONE vector of `(N, C)`, where `N` is the number of inputs and `C` is the number of classes. Since most models in torchvision.models return one vector of `(N, C)`, torchattacks supports only this form of output. Please check the shape of the model's output carefully. If the model returns multiple outputs, please refer to the demo.
- Set `torch.backends.cudnn.deterministic = True` to get the same adversarial examples with a fixed random seed. Some operations are non-deterministic with float tensors on GPU [discuss], so this setting is needed to get the same results with the same inputs [ref].
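
The first two notes above can be sketched as follows. This is a minimal illustration, not code from the package: the `Normalize` module, the mean and std values, and the toy classifier are all assumptions chosen for the example. The wrapper keeps the model's inputs in [0, 1], which is what the attacks expect, and the last lines check that the output has the form `(N, C)`.

```python
import torch
import torch.nn as nn


class Normalize(nn.Module):
    """Hypothetical helper: normalizes [0, 1] images inside the model."""

    def __init__(self, mean, std):
        super().__init__()
        # Buffers follow the model on .to(device) and are not trained.
        self.register_buffer("mean", torch.tensor(mean).view(1, -1, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(1, -1, 1, 1))

    def forward(self, x):
        return (x - self.mean) / self.std


# Toy classifier standing in for a real network (assumption).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# Illustrative statistics; use the values the model was trained with.
model = nn.Sequential(
    Normalize(mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25]),
    backbone,
).eval()

images = torch.rand(4, 3, 32, 32)  # already scaled to [0, 1]
logits = model(images)
print(logits.shape)  # should have the form (N, C)
```

A model wrapped this way can be passed to an attack directly, since the normalization now happens inside the forward pass.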
The attacks below are implementations of the methods proposed in the listed papers.
The distance measure is given in parentheses.
Name | Paper | Remark |
---|---|---|
FGSM (Linf) | Explaining and harnessing adversarial examples (Goodfellow et al., 2014) | |
BIM (Linf) | Adversarial Examples in the Physical World (Kurakin et al., 2016) | Basic iterative method or Iterative-FGSM |
CW (L2) | Towards Evaluating the Robustness of Neural Networks (Carlini et al., 2016) | |
RFGSM (Linf) | Ensemble Adversarial Training: Attacks and Defenses (Tramèr et al., 2017) | Random initialization + FGSM |
PGD (Linf) | Towards Deep Learning Models Resistant to Adversarial Attacks (Madry et al., 2017) | Projected Gradient Method |
PGDL2 (L2) | Towards Deep Learning Models Resistant to Adversarial Attacks (Madry et al., 2017) | Projected Gradient Method |
MIFGSM (Linf) | Boosting Adversarial Attacks with Momentum (Dong et al., 2017) | Contributor zhuangzi926, huitailangyz |
TPGD (Linf) | Theoretically Principled Trade-off between Robustness and Accuracy (Zhang et al., 2019) | |
EOTPGD (Linf) | Comment on "Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network" (Zimmermann, 2019) | EOT+PGD |
PGDDLR (Linf) | Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks (Croce et al., 2020) | PGD based on DLR loss |
APGD (Linf, L2) | Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks (Croce et al., 2020) | |
APGDT (Linf, L2) | Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks (Croce et al., 2020) | Targeted APGD |
FAB (Linf, L2, L1) | Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack (Croce et al., 2019) | |
Square (Linf, L2) | Square Attack: a query-efficient black-box adversarial attack via random search (Andriushchenko et al., 2019) | |
AutoAttack (Linf, L2) | Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks (Croce et al., 2020) | APGD+APGDT+FAB+Square |
DeepFool (L2) | DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks (Moosavi-Dezfooli et al., 2016) | |
OnePixel (L0) | One pixel attack for fooling deep neural networks (Su et al., 2019) | |
SparseFool (L0) | SparseFool: a few pixels make a big difference (Modas et al., 2019) | |
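
To make remarks such as "Random initialization" and "Projected Gradient Method" concrete, here is a minimal sketch of an Linf projected gradient attack in plain PyTorch. It is an illustration under simplifying assumptions (untargeted, cross-entropy loss, uniform random start), not the implementation in torchattacks; the function name `pgd_linf` and the toy model are hypothetical.

```python
import torch
import torch.nn as nn


def pgd_linf(model, images, labels, eps=8/255, alpha=2/255, steps=4):
    """Sketch of an untargeted Linf PGD attack on images in [0, 1]."""
    loss_fn = nn.CrossEntropyLoss()
    images = images.clone().detach()
    # Random start inside the eps-ball, clipped to the valid image range.
    adv = images + torch.empty_like(images).uniform_(-eps, eps)
    adv = adv.clamp(0, 1).detach()

    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        # Ascend the loss along the sign of the gradient.
        adv = adv.detach() + alpha * grad.sign()
        # Project back onto the eps-ball around the clean images.
        delta = (adv - images).clamp(-eps, eps)
        adv = (images + delta).clamp(0, 1).detach()
    return adv


torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10)).eval()
images = torch.rand(5, 3, 8, 8)
labels = torch.randint(0, 10, (5,))
adv_images = pgd_linf(model, images, labels)
print((adv_images - images).abs().max().item() <= 8/255 + 1e-6)
```

In this sketch, dropping the random start gives an iterative sign-gradient method in the spirit of BIM, and a single step with `alpha` equal to `eps` and no random start corresponds to FGSM.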
For a fair comparison, Robustbench is used. As comparison packages, Foolbox and ART were selected because they are actively maintained and among the most cited.
The tables report robust accuracy against each attack and the elapsed time on the first 50 images of CIFAR10. For L2 attacks, the average L2 distance between the adversarial images and the original images is also recorded. All experiments were run on a GeForce RTX 2080. These results are for torchattacks==2.13.2. For the latest version, please refer to here (code, nbviewer).
Attack (Linf) | Package | Wong2020 | Rice2020 | Carmon2019 | Remark |
---|---|---|---|---|---|
FGSM | Torchattacks | 48% (15 ms) | 62% (94 ms) | 68% (18 ms) | |
FGSM | Foolbox* | 48% (34 ms) | 62% (46 ms) | 68% (27 ms) | |
FGSM | ART | 48% (50 ms) | 62% (799 ms) | 68% (221 ms) | |
PGD | Torchattacks | 44% (156 ms) | 58% (2770 ms) | 58% (692 ms) | |
PGD | Foolbox* | 44% (249 ms) | 58% (3452 ms) | 58% (945 ms) | |
PGD | ART | 44% (495 ms) | 58% (4782 ms) | 58% (1444 ms) | |

Attack (L2) | Package | Wong2020 | Rice2020 | Carmon2019 | Remark |
---|---|---|---|---|---|
CW† | Torchattacks | 14% / 0.61 (4367 ms) | 22% / 0.56 (43680 ms) | 26% / 0.48 (13032 ms) | Highest Success Rate |
CW† | Foolbox* | 32% / 0.41 (4530 ms) | 34% / 0.43 (45273 ms) | 32% / 0.42 (13314 ms) | Smallest Perturbation |
CW† | ART | 24% / 0.71 (71613 ms) | 26% / 0.65 (691977 ms) | 26% / 0.62 (206250 ms) | |
PGDL2 | Torchattacks | 68% / 0.5 (166 ms) | 70% / 0.5 (2796 ms) | 68% / 0.5 (712 ms) | |
PGDL2 | Foolbox* | 68% / 0.5 (267 ms) | 70% / 0.5 (3501 ms) | 68% / 0.5 (962 ms) | |
PGDL2 | ART | 68% / 0.5 (470 ms) | 70% / 0.5 (4822 ms) | 68% / 0.5 (1441 ms) | |

\* Note that Foolbox returns accuracy and adversarial images simultaneously, so the actual time for generating adversarial images might be shorter than recorded.

† Considering that the binary search algorithm for const `c` can be time-consuming, torchattacks supports customized grid search as in code, nbviewer.
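
The two metrics reported above can be computed along these lines. This is a sketch under assumptions, not the benchmark script: the function names `robust_accuracy` and `mean_l2_distance` are hypothetical, and the tiny model and random perturbation only stand in for a real classifier and a real attack.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def robust_accuracy(model, adv_images, labels):
    """Fraction of adversarial images that are still classified correctly."""
    preds = model(adv_images).argmax(dim=1)
    return (preds == labels).float().mean().item()


def mean_l2_distance(adv_images, images):
    """Average per-image L2 norm of the perturbation."""
    delta = (adv_images - images).flatten(start_dim=1)
    return delta.norm(p=2, dim=1).mean().item()


torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10)).eval()
images = torch.rand(50, 3, 8, 8)
labels = model(images).argmax(dim=1)  # toy labels: clean accuracy is 100%

# Stand-in for an attack: small random noise (assumption, not a real attack).
adv_images = (images + 0.01 * torch.randn_like(images)).clamp(0, 1)

print(f"robust accuracy: {robust_accuracy(model, adv_images, labels):.0%}")
print(f"mean L2 distance: {mean_l2_distance(adv_images, images):.3f}")
```

Lower robust accuracy means a stronger attack, and for L2 attacks a smaller mean distance means a less visible perturbation, which is why both values are shown side by side.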
Here is the documentation for this package.
Here are the update records of this package.
- White Box Attack with ImageNet (code, nbviewer): Using torchattacks to make adversarial examples with the ImageNet dataset to fool ResNet-18.
- Black Box Attack with CIFAR10 (code, nbviewer): This demo provides an example of a black box attack with two different models. First, make adversarial datasets from a holdout model with CIFAR10 and save them as a torch dataset. Second, use the adversarial datasets to attack a target model.
- Adversarial Training with MNIST (code, nbviewer): This code shows how to do adversarial training with this repository. The MNIST dataset and a custom model are used in this code. The adversarial training is performed with PGD, and then FGSM is applied to evaluate the model.
- Applications of MultiAttack with CIFAR10 (code, nbviewer): This code shows the applications of MultiAttack.
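
As a rough illustration of the adversarial training demo, the loop below updates the model on adversarial examples instead of clean ones. It is a minimal sketch under stated assumptions: a one-step FGSM written inline stands in for the attack (the demo itself uses PGD from torchattacks), and the model, the random data, and the `fgsm` helper are hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def fgsm(model, images, labels, eps=0.1):
    """Inline one-step attack used as a stand-in for torchattacks.PGD."""
    images = images.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(images), labels)
    grad = torch.autograd.grad(loss, images)[0]
    return (images + eps * grad.sign()).clamp(0, 1).detach()


torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Random tensors standing in for one MNIST batch (assumption).
images = torch.rand(16, 1, 28, 28)
labels = torch.randint(0, 10, (16,))

losses = []
for step in range(5):
    # 1) Craft adversarial examples against the current weights.
    model.eval()
    adv_images = fgsm(model, images, labels)
    # 2) Update the model on the adversarial batch.
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(adv_images), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses)
```

With torchattacks installed, the call to `fgsm` could presumably be replaced by an attack object such as `torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=4)` applied to `(images, labels)`, as in the Usage section.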
SemanticScholar. If you use this package, please cite the following BibTex:
@article{kim2020torchattacks,
title={Torchattacks: A Pytorch Repository for Adversarial Attacks},
author={Kim, Hoki},
journal={arXiv preprint arXiv:2010.01950},
year={2020}
}
Torchattacks supports collaboration with other attack packages.
By expanding the usage in this way, functions in torchattacks such as save and multiattack can be used with attacks from other packages.
- https://github.com/bethgelab/foolbox
pip install foolbox
- e.g., L2BrendelBethge
```python
from torchattacks.attack import Attack
import foolbox as fb


class L2BrendelBethge(Attack):
    def __init__(self, model):
        super(L2BrendelBethge, self).__init__("L2BrendelBethge", model)
        # Wrap the PyTorch model so that Foolbox can query it.
        self.fmodel = fb.PyTorchModel(self.model, bounds=(0, 1), device=self.device)
        self.init_attack = fb.attacks.DatasetAttack()
        self.adversary = fb.attacks.L2BrendelBethgeAttack(init_attack=self.init_attack)
        self._attack_mode = 'only_default'

    def forward(self, images, labels):
        images, labels = images.to(self.device), labels.to(self.device)

        # DatasetAttack
        batch_size = len(images)
        batches = [(images[:batch_size//2], labels[:batch_size//2]),
                   (images[batch_size//2:], labels[batch_size//2:])]
        self.init_attack.feed(model=self.fmodel, inputs=batches[0][0])  # feed 1st batch of inputs
        self.init_attack.feed(model=self.fmodel, inputs=batches[1][0])  # feed 2nd batch of inputs
        criterion = fb.Misclassification(labels)
        init_advs = self.init_attack.run(self.fmodel, images, criterion)

        # L2BrendelBethge
        adv_images = self.adversary.run(self.fmodel, images, labels, starting_points=init_advs)
        return adv_images


atk = L2BrendelBethge(model)
atk.save(data_loader=test_loader, save_path="_temp.pt", verbose=True)
```
- https://github.com/IBM/adversarial-robustness-toolbox
git clone https://github.com/IBM/adversarial-robustness-toolbox
- e.g., SaliencyMapMethod (or Jacobian based saliency map attack)
```python
import torch
import torch.nn as nn
import torch.optim as optim

from torchattacks.attack import Attack
import art.attacks.evasion as evasion
from art.classifiers import PyTorchClassifier


class JSMA(Attack):
    def __init__(self, model, theta=1/255, gamma=0.15, batch_size=128):
        super(JSMA, self).__init__("JSMA", model)
        # Wrap the PyTorch model so that ART can use it.
        self.classifier = PyTorchClassifier(
            model=self.model, clip_values=(0, 1),
            loss=nn.CrossEntropyLoss(),
            optimizer=optim.Adam(self.model.parameters(), lr=0.01),
            input_shape=(1, 28, 28), nb_classes=10)
        self.adversary = evasion.SaliencyMapMethod(classifier=self.classifier,
                                                   theta=theta, gamma=gamma,
                                                   batch_size=batch_size)
        # Target class: the class following the true label (mod 10).
        self.target_map_function = lambda labels: (labels+1)%10
        self._attack_mode = 'only_default'

    def forward(self, images, labels):
        adv_images = self.adversary.generate(images, self.target_map_function(labels))
        return torch.tensor(adv_images).to(self.device)


atk = JSMA(model)
atk.save(data_loader=test_loader, save_path="_temp.pt", verbose=True)
```
All kinds of contributions are always welcome!
If you are interested in adding a new attack to this repo or fixing some issues, please have a look at contribution.md.
- Adversarial Attack Packages:
  - https://github.com/IBM/adversarial-robustness-toolbox: Adversarial attack and defense package made by IBM. TensorFlow, Keras, and PyTorch available.
  - https://github.com/bethgelab/foolbox: Adversarial attack package made by Bethge Lab. TensorFlow and PyTorch available.
  - https://github.com/tensorflow/cleverhans: Adversarial attack package made by Google Brain. TensorFlow available.
  - https://github.com/BorealisAI/advertorch: Adversarial attack package made by BorealisAI. PyTorch available.
  - https://github.com/DSE-MSU/DeepRobust: Adversarial attack (especially on GNN) package made by DSE-MSU. PyTorch available.
  - https://github.com/fra31/auto-attack: An ensemble of attacks widely regarded as one of the strongest available. TensorFlow and PyTorch available.
- Adversarial Defense Leaderboard:
- Adversarial Attack and Defense Papers:
  - https://nicholas.carlini.com/writing/2019/all-adversarial-example-papers.html: A Complete List of All (arXiv) Adversarial Example Papers made by Nicholas Carlini.
  - https://github.com/chawins/Adversarial-Examples-Reading-List: Adversarial Examples Reading List made by Chawin Sitawarin.