## ATHENA is a Framework for Building Adversarial Defense

Although machine learning systems have achieved impressive success in a wide range of domains such as computer vision and natural language processing, they are highly vulnerable to adversarial examples. An adversarial example is an input crafted from legitimate data by adding human-imperceptible perturbations, with the aim of covertly forcing an ML system to produce an incorrect output. This vulnerability can have serious consequences, especially in security-critical tasks. For example, an object detector on a self-driving vehicle may misrecognize a stop sign as a speed limit sign.

The threat of adversarial examples has inspired a sizable body of research on defense techniques. Most existing defenses assume specific known attacks; although effective against those attacks, they can be circumvented under slightly different conditions, either by a stronger adaptive adversary or, in some cases, even by a weak (but different) adversary. This arms race between attacks and defenses leads us to a central question:

How can we instead design a defense not as a single technique but as a framework, from which one can construct a specific defense that reflects the level of robustness one wants to achieve and the cost one is willing to pay for it?

To address this question, we propose ATHENA (named after the goddess of defense in Greek mythology), an extensible framework for building generic (and thus broadly applicable) yet effective defenses against adversarial attacks. The design philosophy behind ATHENA is to ensemble many diverse weak defenses (WDs). Each WD, the building block of the framework, is a machine learning classifier (e.g., DNN, SVM) that first applies a transformation to the original input and then produces an output for the transformed input. Given an input, the ensemble first collects the predicted outputs from all of the WDs and then determines the final output using an ensemble strategy such as majority voting or averaging the WDs' predicted outputs.
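
The two ensemble strategies named above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not ATHENA's actual API: the `WeakDefense` class, the `Image` representation, and the function names are all hypothetical, and each classifier is assumed to return a list of class probabilities.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical types for illustration only; not taken from the ATHENA codebase.
Image = List[List[float]]
Transform = Callable[[Image], Image]
Classifier = Callable[[Image], List[float]]  # assumed to return class probabilities


class WeakDefense:
    """A classifier paired with the transformation applied to its input."""

    def __init__(self, transform: Transform, classifier: Classifier) -> None:
        self.transform = transform
        self.classifier = classifier

    def predict_proba(self, x: Image) -> List[float]:
        # Transform the input first, then classify the transformed input.
        return self.classifier(self.transform(x))


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first index on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def majority_vote(wds: Sequence[WeakDefense], x: Image) -> int:
    """Each WD votes for its top class; the most common label wins."""
    votes = [argmax(wd.predict_proba(x)) for wd in wds]
    return Counter(votes).most_common(1)[0][0]


def average_probabilities(wds: Sequence[WeakDefense], x: Image) -> int:
    """Average the WDs' probability vectors, then pick the top class."""
    probs = [wd.predict_proba(x) for wd in wds]
    mean = [sum(column) / len(probs) for column in zip(*probs)]
    return argmax(mean)
```

The two strategies can disagree: two WDs that narrowly prefer class 0 win a majority vote, while a single WD that is very confident in class 1 can dominate the average. Which behavior is preferable depends on the robustness and cost one is targeting.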

## Insights: Weak Defenses Complement Each Other!

In computer vision, a transformation is an image processing function. By distorting its input, a transformation alters the adversarially optimized perturbations and thus makes them less effective. However, the effectiveness of any single type of transformation varies across attacks and datasets. By mitigating the perturbations in different ways, such as adjusting the angle or position of the input or adding or removing noise, a collection of diverse transformations provides robustness against a variety of attacks. Accordingly, the Diverse ensemble achieves the lowest error rate in most cases, especially for tasks on CIFAR-100.
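
To make the idea of "diverse transformations" concrete, here is a minimal sketch of three simple transformations, one per category mentioned above (angle, position, noise). It assumes a grayscale image stored as a list of rows with pixel values in [0, 1]; the function names and the `TRANSFORMS` registry are hypothetical and are not the transformations used in the paper.

```python
import random
from typing import Callable, Dict, List

Image = List[List[float]]  # assumed representation: rows of pixels in [0, 1]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise (changes the angle)."""
    return [list(row) for row in zip(*img[::-1])]


def shift_right(img: Image, k: int = 1) -> Image:
    """Shift each row k pixels to the right, zero-filling the gap (changes the position)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return [([0.0] * k + row)[: len(row)] for row in img]


def add_gaussian_noise(img: Image, sigma: float = 0.05, seed: int = 0) -> Image:
    """Add zero-mean Gaussian noise to every pixel and clip back to [0, 1]."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
        for row in img
    ]


# A small, hypothetical pool of transformations; each one could back one weak defense.
TRANSFORMS: Dict[str, Callable[[Image], Image]] = {
    "rotate90": rotate90,
    "shift_right": shift_right,
    "gaussian_noise": add_gaussian_noise,
}
```

Each function distorts the input in a different way, so a perturbation tuned against the original pixels is altered differently by each one.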

Ensembling diverse transformations can yield a robust defense against a variety of attacks and opens up a tradeoff space: one can build a more robust ensemble by adding more transformations, or an ensemble with lower overhead and cost by using fewer transformations.

## Zero Knowledge Threat Model

##### The adversary knows everything about the model, but does not know there is a defense in place!

The effectiveness of an individual WD (each associated with a transformation) varies across attack methods and attack magnitudes, while a large population of transformations from a variety of categories successfully disentangles the adversarial perturbations generated by various attacks. For a given attack, the spread of the individual WDs’ error rates widens as the perturbation magnitude grows. By utilizing many diverse transformations, ATHENA builds effective ensembles that outperform the two state-of-the-art defenses, PGD adversarial training (PGD-ADT) and randomized smoothing (RS), in all cases.

## Black-box Threat Model

### Transfer-based approach

Although the transferability rate increases as the budget increases, the drop in the transferability rate from the undefended model (UM) to ATHENA indicates that ATHENA is less sensitive to the perturbation. Ensembling many diverse transformations provides tangible benefits in blocking adversarial transferability between weak defenses, and thus enhances the model’s robustness against transfer-based black-box attacks.
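
As a rough illustration of how a transferability rate might be measured, the sketch below assumes one common definition: among the adversarial examples that fool the model they were crafted on (the source), the fraction that also fool the target. This definition, the function name, and the `Predictor` type are assumptions for illustration; the paper's exact metric may differ.

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")
Predictor = Callable[[X], int]  # hypothetical: maps an input to a predicted label


def transferability_rate(
    adversarial_examples: Sequence[X],
    true_labels: Sequence[int],
    source: Predictor,
    target: Predictor,
) -> float:
    """Fraction of examples that fool the source model and also fool the target.

    Assumed definition; returns 0.0 when no example fools the source.
    """
    fooled_source = [
        (x, y)
        for x, y in zip(adversarial_examples, true_labels)
        if source(x) != y
    ]
    if not fooled_source:
        return 0.0
    transferred = sum(1 for x, y in fooled_source if target(x) != y)
    return transferred / len(fooled_source)
```

Under this definition, comparing the rate obtained with the UM as the target against the rate obtained with an ensemble as the target gives the kind of drop described above.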

The Hop-Skip-Jump attack (HSJA) generates adversarial examples (AEs) by querying the target model for the output labels of perturbed images. Compared to those generated against the UM, the AEs generated against ATHENA are much farther from the corresponding benign samples. As the query budget increases, the distances of the UM-targeted AEs drop much more significantly than those of the ATHENA-targeted AEs. Therefore, ATHENA increases the chance that such AEs are detected by even a simple detection mechanism.
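
The distance comparison above can be made concrete with a small sketch. It assumes the distance is the L2 (Euclidean) norm of the pixel-wise difference between an AE and its benign sample, which this page does not state explicitly; the function names and the `Image` representation are hypothetical.

```python
import math
from typing import List, Sequence

Image = List[List[float]]  # assumed representation: rows of pixel values


def l2_distance(a: Image, b: Image) -> float:
    """Euclidean distance between two images of the same shape."""
    return math.sqrt(
        sum((p - q) ** 2 for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))
    )


def mean_l2_distance(aes: Sequence[Image], benign: Sequence[Image]) -> float:
    """Average distance between each adversarial example and its benign sample."""
    if not aes or len(aes) != len(benign):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(l2_distance(a, b) for a, b in zip(aes, benign)) / len(aes)
```

Computing `mean_l2_distance` for UM-targeted and ATHENA-targeted AEs at each query budget would produce the two curves being compared; a larger value means the perturbation is more visible and easier to flag.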

## White-box Threat Model

### Greedy approach

As expected, the greedy white-box attack generates stronger AEs when the constraint on the dissimilarity threshold is looser. However, this success comes at a price: with the largest threshold, the greedy attack has to spend 310x more time to generate an adversarial example for a single input. This provides a tradeoff space, in which realizations of ATHENA that employ larger ensembles impose a higher cost on adversaries, to the point where they eventually give up! Moreover, the generated AEs are heavily distorted and very likely to be detected by either a human or an automated detection mechanism.

### Optimization-based approach

As the adversary gains access to more WDs, it can launch more successful attacks without even increasing the perturbations. However, the computational cost of AE generation increases as well. To generate stronger AEs, the attacker can choose to sample more random transformations and to draw them from a distribution over a large and diverse population of transformations, but this also incurs a larger computational cost.

## Acknowledgement

• Google via GCP cloud research credits
• NASA (EPSCoR 521340-SC001)
• Research Computing Center at the University of South Carolina
• Chameleon Cloud via GPU compute nodes

## How to Cite

### Citation

Ying Meng, Jianhai Su, Jason M O'Kane, and Pooyan Jamshidi. ATHENA: A Framework based on Diverse Weak Defenses for Building Adversarial Defense. arXiv preprint arXiv:2001.00308, 2020.


### Bibtex

@article{meng2020athena,
title={ATHENA: A Framework based on Diverse Weak Defenses for Building Adversarial Defense},
author={Ying Meng and Jianhai Su and Jason M O'Kane and Pooyan Jamshidi},
journal={arXiv preprint arXiv:2001.00308},
year={2020}
}


## Data Availability

The code posted on GitHub contains the analysis needed to reproduce the results in the paper. It also includes scripts for setting up all the dependencies using conda and scripts for downloading the datasets and models used in the paper's analysis. The GitHub repository also contains the derived data associated with all figures in the paper, along with notebooks that consume the data to plot the figures and tables.