\documentclass[letterpaper]{article}
% DO NOT CHANGE THIS
\usepackage{aaai23} % DO NOT CHANGE THIS
\usepackage{times} % DO NOT CHANGE THIS
\usepackage{helvet} % DO NOT CHANGE THIS
\usepackage{courier} % DO NOT CHANGE THIS
\usepackage[hyphens]{url} % DO NOT CHANGE THIS
\usepackage{graphicx} % DO NOT CHANGE THIS
\urlstyle{rm} % DO NOT CHANGE THIS
\def\UrlFont{\rm} % DO NOT CHANGE THIS
\usepackage{natbib} % DO NOT CHANGE THIS
\usepackage{caption} % DO NOT CHANGE THIS
\frenchspacing % DO NOT CHANGE THIS
\setlength{\pdfpagewidth}{8.5in} % DO NOT CHANGE THIS
\setlength{\pdfpageheight}{11in} % DO NOT CHANGE THIS
%
% Keep the \pdfinfo as shown here. There’s no need
% for you to add the /Title and /Author tags.
\pdfinfo{
/TemplateVersion (2023.1)
}
\title{Armory -- A Library for Adversarial ML Evaluation
\thanks{Armory is based upon work supported by the Defense Advanced Research
Projects Agency (DARPA) under Contract No. HR001120C0114 and
US Army (JATIC) Contract No. W519TC2392035}
}
\author{
Matt Wartell
}
\affiliations{
TwoSix Technologies, Inc. \\
}
\begin{document}
\maketitle
\begin{abstract}
Armory is a library for the evaluation of ML models in the face of adversarial
manipulation. It was created to support such evaluations within the DARPA
GARD research program, aiding the development of defenses against adversarial
perturbation.
\end{abstract}
% \section{Background}
Adversarial manipulation of ML systems is the practice of exploiting the vulnerabilities
of machine learning models to produce misleading or harmful outcomes. As early as a
decade ago, the susceptibility of ML systems to adversarial manipulation was noted
in the literature \cite{szegedy2014intriguing}. This and subsequent work formed the basis
for DARPA's interest in developing technologies that make models more robust to
adversarial attack. To this end, DARPA established the Guaranteeing AI Robustness against
Deception (GARD) program, which pitted defense creators (``performers,'' typically
academic/industry collaborations) against industry evaluators charged with evading those
defenses. As these evaluation competitions were repeated over the course of the program,
the performers became better at making models robust to attack and the evaluators
improved their tools for defeating those defenses. That collective experience has been
encoded into Armory and IBM's Adversarial Robustness Toolbox (ART).

To assist GARD evaluators, the Armory application was created by TwoSix
Technologies in close collaboration with IBM and MITRE. It provided a number of features
that were helpful to the GARD research program, chief among them a curated collection of
models and datasets that are well known to the ML community and were chosen because they
are well characterized. Armory first measures model performance on unperturbed inputs and
then again with adversarial perturbations computed and applied. The Armory application
used Docker and conda to ease environment replication between performers and evaluators,
and it has been released as open source on GitHub and the Python Package Index (PyPI)
since its inception.

A key capability of Armory is deterministic experiment replication: performers can fully
specify their defense and its exercise in code such that evaluators are guaranteed to
obtain the same results. (Some defenses count on stochastic effects to evade attacks,
so variability is expected in their results.)

In 2023, the Chief Digital and AI Office's Test and Evaluation Directorate (CDAO T\&E)
requested that TwoSix adapt the Armory application to the Joint AI Test Infrastructure
Capability (JATIC), a suite of interoperable software tools for comprehensive AI model
T\&E. A hallmark of the JATIC program is that contributed tools provide utility to the
broadest possible ML evaluation community. To this end, TwoSix has converted Armory into
a pure Python library. In the process, we have shed mechanisms such as the Docker
dependency, which was an unneeded hurdle for Armory users not involved in GARD research.
We have also transformed the experiment description from a JSON document into Pythonic,
strongly typed object constructors, as sketched below. The Armory library will soon
replace the GARD-Armory application on GitHub, and armory-library will supplant
armory-testbed as the pip-installable package on PyPI. The GARD-Armory application will
continue to be used and supported through the duration of the GARD program.

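The following sketch illustrates the shift from a JSON experiment description to typed
constructors. The class and field names are placeholders chosen for illustration and do
not reproduce the exact armory-library API.
\begin{verbatim}
# Illustrative sketch only: these dataclasses are
# placeholders, not the exact armory-library API.
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    batch_size: int

@dataclass
class Attack:
    name: str
    eps: float

@dataclass
class Evaluation:
    model_name: str
    dataset: Dataset
    attack: Attack

# The GARD-Armory application read the equivalent of
#   {"model": "...", "dataset": {...}, "attack": {...}}
# from a JSON file; armory-library instead builds the
# experiment from typed constructors that an IDE and a
# type checker can validate before anything runs.
experiment = Evaluation(
    model_name="resnet50",
    dataset=Dataset(name="cifar10", batch_size=16),
    attack=Attack(name="pgd", eps=0.03),
)
\end{verbatim}
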
% \section{Capabilities}
The Armory library provides broad evaluative capabilities for a host of ML modalities:
RGB and multispectral image classification and object detection, adversarial patch
attacks, video processing, audio and speech processing, and dataset poisoning. It is
distributed as a pure Python library with primary dependencies on PyTorch and IBM ART.
There is historic support for TensorFlow models and datasets within the library, but
their use, as well as the non-image modalities, is not a focus of the JATIC efforts.

A user of the Armory library, after importing the library itself, calls upon
the jatic\_toolbox to obtain a model and a dataset. The Armory API makes it easy to
modulate the dataset with adversarial perturbations, and these three objects (model,
dataset, and perturbation) are passed to Armory's evaluation engine. The evaluation
engine first runs the model on the dataset to measure its unaltered ``benign''
performance and then repeats the run with the adversarial perturbation applied. The
construction of the experiment is tracked using MLflow so that it can be repeated in
complete detail, and the resulting metrics are gathered in the same store. The MLflow
database can live on a server or be captured in a small, local file-based database
(SQLite) when no server is specified.

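This workflow might look roughly like the following sketch. The module, function, and
class names shown here (load\_model, load\_dataset, the attack wrapper, and the
evaluation engine) are illustrative assumptions rather than the verified armory-library
interface; the library documentation is authoritative.
\begin{verbatim}
# Hypothetical sketch; imports and signatures below are
# illustrative assumptions, not the verified API.
import armory
from jatic_toolbox import load_dataset, load_model

# Obtain a model and a dataset via jatic_toolbox.
model = load_model(provider="huggingface",
                   model_name="resnet50",
                   task="image-classification")
dataset = load_dataset(provider="huggingface",
                       dataset_name="cifar10",
                       task="image-classification",
                       split="test")

# Wrap the dataset with an adversarial perturbation,
# here a projected gradient descent attack from ART.
attack = armory.attacks.ProjectedGradientDescent(
    model, eps=0.03)

# The engine measures benign performance first, then
# repeats the run with the perturbation applied, and
# logs parameters and metrics to MLflow.
evaluation = armory.evaluation.Evaluation(
    name="cifar10-pgd-demo",
    model=model, dataset=dataset, attack=attack)
engine = armory.engine.EvaluationEngine(evaluation)
results = engine.run()
print(results)
\end{verbatim}
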
Armory can generate elaborate metrics for the model under test: perturbation measures
such as $L^p$ distances, task performance measures such as categorical accuracy,
statistical quantities such as the Kullback--Leibler (KL) divergence, and fairness
statistics relevant to poisoning and filtering attacks. Armory also provides extension
mechanisms that allow a user to place software probes into the evaluation and record
novel metrics in MLflow.

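Because results are tracked with MLflow, user-defined quantities can be recorded
alongside the built-in metrics. The sketch below computes an $L^\infty$ perturbation
distance and logs it with the standard MLflow call; how a probe is attached to a running
Armory evaluation is elided here because that hook is specific to the library's
extension API.
\begin{verbatim}
# Sketch of a custom metric recorded through MLflow.
# mlflow.log_metric is standard MLflow; attaching the
# probe to an Armory evaluation is not shown here.
import mlflow
import torch

def linf_distance(x_benign: torch.Tensor,
                  x_adv: torch.Tensor) -> float:
    """Largest per-element perturbation magnitude."""
    return (x_adv - x_benign).abs().max().item()

# Standalone illustration; armory-library manages the
# MLflow run itself during a real evaluation.
with mlflow.start_run(run_name="custom-metric-demo"):
    x = torch.rand(3, 32, 32)          # benign input
    x_adv = x + 0.03 * torch.sign(torch.randn_like(x))
    mlflow.log_metric("linf_distance",
                      linf_distance(x, x_adv))
\end{verbatim}
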
With the JATIC effort, we have opened armory-library to arbitrarily large models
and datasets. Although Armory imposes no fundamental limitation on scale, we are
testing computational performance on ever larger inputs in order to offer guidance
to prospective users.

\bibliography{armory}
\end{document}