Deep neural networks (DNNs) are vulnerable to training-time attacks because individual users often lack the computational resources to train large, complex models (which often comprise millions of parameters) or the ability to acquire the large, high-quality training datasets required for high accuracy. The latter is especially true when data acquisition and labeling entail high cost or require human expertise. As a result, users either outsource DNN training or, more commonly, source pre-trained DNN models from online repositories such as the Model Zoos of various frameworks or GitHub. While the user can verify a model's accuracy on representative inputs by testing it on a small public or private validation set, the user may not know or trust the model's author (or trainer) or have access to their training data. This opens the door to DNN backdooring attacks: an adversary can train and upload a DNN model that is highly accurate on clean inputs (and thus on the user's validation set) but misbehaves when inputs contain special attacker-chosen backdoor triggers. Such maliciously trained DNNs have been referred to as "BadNets."
Competition
The aim of the competition is to mitigate backdoors that exist in the "BadNets", i.e., to reduce the attack success rate as much as possible. Your task is to reverse-engineer the backdoor trigger for each network, to detect whether a network is clean or backdoored, or to design a tool that can identify triggered inputs in the test dataset. You may also propose and describe other defenses for backdoored networks by contacting the competition organizers. As a starting point, you can go through the following references (a minimal sketch of one trigger reverse-engineering approach follows the reference list):
- Liu, Kang, Brendan Dolan-Gavitt, and Siddharth Garg. “Fine-pruning: Defending against backdooring attacks on deep neural networks.” International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, Cham, 2018.
- Tran, Brandon, Jerry Li, and Aleksander Madry. “Spectral signatures in backdoor attacks.” In Advances in Neural Information Processing Systems, pp. 8000-8010. 2018.
- Wang, Bolun, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y. Zhao. “Neural cleanse: Identifying and mitigating backdoor attacks in neural networks.” In 2019 IEEE Symposium on Security and Privacy (SP), pp. 707-723. IEEE, 2019.
- Gao, Yansong, Change Xu, Derui Wang, Shiping Chen, Damith C. Ranasinghe, and Surya Nepal. “Strip: A defence against trojan attacks on deep neural networks.” In Proceedings of the 35th Annual Computer Security Applications Conference, pp. 113-125. 2019.
- Liu, Yingqi, Wen-Chuan Lee, Guanhong Tao, Shiqing Ma, Yousra Aafer, and Xiangyu Zhang. “ABS: Scanning neural networks for back-doors by artificial brain stimulation.” In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 1265-1282. 2019.
- Qiao, Ximing, Yukun Yang, and Hai Li. “Defending neural backdoors via generative distribution modeling.” In Advances in Neural Information Processing Systems, pp. 14004-14013. 2019.
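To make the trigger reverse-engineering idea concrete, below is a minimal sketch in the spirit of Neural Cleanse (Wang et al., 2019): for a candidate target label, optimize a small mask and pattern that, when stamped onto clean validation images, force the model to predict that label. This is only an illustrative sketch, not a reference implementation; the file names (`badnet.h5`, `clean_validation_images.npy`), preprocessing, TensorFlow 2.x usage, and hyperparameters are assumptions and should be adapted to the model and data provided in the repository.

```python
# Minimal sketch of Neural Cleanse-style trigger reverse-engineering.
# Assumes TensorFlow 2.x, a Keras BadNet saved as "badnet.h5" (hypothetical
# filename), and clean validation images x_val scaled to [0, 1].
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("badnet.h5")            # hypothetical path
x_val = np.load("clean_validation_images.npy") / 255.0     # hypothetical path
target_label = 0                                           # candidate target class
h, w, c = x_val.shape[1:]

# Optimize a mask m and pattern p so that x' = (1 - m) * x + m * p
# is classified as target_label, while keeping the mask small (L1 penalty).
mask = tf.Variable(tf.zeros((h, w, 1)))
pattern = tf.Variable(tf.zeros((h, w, c)))
opt = tf.keras.optimizers.Adam(learning_rate=0.1)
ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)  # assumes softmax output

for step in range(200):
    idx = np.random.choice(len(x_val), 32, replace=False)
    batch = tf.constant(x_val[idx], dtype=tf.float32)
    with tf.GradientTape() as tape:
        m = tf.sigmoid(mask)                    # keep mask in [0, 1]
        p = tf.sigmoid(pattern)                 # keep pattern in [0, 1]
        stamped = (1.0 - m) * batch + m * p
        preds = model(stamped, training=False)
        loss = ce(tf.fill([len(idx)], target_label), preds) \
               + 0.01 * tf.reduce_sum(tf.abs(m))   # sparsity regularizer
    grads = tape.gradient(loss, [mask, pattern])
    opt.apply_gradients(zip(grads, [mask, pattern]))

print("mask L1 norm for label", target_label, float(tf.reduce_sum(tf.sigmoid(mask))))
```

Repeating this optimization for every candidate label and flagging labels whose recovered mask is unusually small (an outlier in L1 norm) is the core of the detection step described in the Neural Cleanse paper.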
Backdoored Network
The backdoored network is generated by the organizers of this contest and is accessible via the GitHub repository (see Task #2 below). The repository contains:
- The network to be "repaired" and its architecture.
- A link to a clean validation dataset for evaluating the network's performance.
- A test script for classifying a single image (a hypothetical usage sketch is shown below).
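For reference, classifying a single image with the provided Keras model along the lines of such a test script might look like the sketch below. The exact interface (file names, input format, preprocessing) is defined by the script in the repository, so everything here is an assumption.

```python
# Minimal sketch of single-image classification with a saved Keras model.
# File names and preprocessing (RGB input scaled to [0, 1]) are assumptions;
# consult the repository's evaluation script for the exact interface.
import sys
import numpy as np
import tensorflow as tf
from PIL import Image

def classify(image_path, model_path="badnet.h5"):   # hypothetical model path
    model = tf.keras.models.load_model(model_path)
    img = np.asarray(Image.open(image_path), dtype=np.float32) / 255.0
    img = img[np.newaxis, ...]                       # add batch dimension
    probs = model.predict(img)
    return int(np.argmax(probs, axis=1)[0])

if __name__ == "__main__":
    print("predicted label:", classify(sys.argv[1]))
```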
Tasks
- Register your Team: https://forms.gle/NBpHEB3YB8YokZb28
- Clone/Pull the CSAW HackML 2020 GitHub repository as your starting point: https://github.com/csaw-hackml/CSAW-HackML-2020
- The competition presents three goals to participants, which can be taken up independently or in combination:
- Goal-I (Network repairing): Develop a repaired network with a low attack success rate on backdoored inputs and high accuracy on clean inputs (see the fine-pruning sketch after this list).
- Goal-II (Detection of Backdoor inputs): Develop a tool/methodology to identify backdoored input images in test data.
- Goal-III (Detection of Backdoored models): Develop a tool/methodology to detect whether a network is clean or backdoored.
- Prepare accompanying documentation to explain the method(s) used to mitigate the backdoor effect(s) in the network.
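As a concrete starting point for Goal-I, the sketch below follows the fine-pruning idea of Liu et al. (2018): zero out convolutional channels that stay dormant on clean validation data, then fine-tune briefly on that data to recover clean accuracy. The layer name (`conv_3`), file names, pruning ratio, and training hyperparameters are assumptions, and zeroing weights is a simplification of true channel removal.

```python
# Minimal fine-pruning sketch (after Liu et al., 2018). Assumes TensorFlow 2.x,
# a Keras BadNet with a convolutional layer named "conv_3" (hypothetical), and
# clean validation arrays with integer labels (hypothetical file names).
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("badnet.h5")              # hypothetical path
x_val = np.load("clean_x_val.npy") / 255.0                   # hypothetical paths
y_val = np.load("clean_y_val.npy")

conv = model.get_layer("conv_3")                             # hypothetical layer name
feature_extractor = tf.keras.Model(model.input, conv.output)

# Average activation of each channel over the clean validation set.
activations = feature_extractor.predict(x_val, batch_size=64)
channel_means = activations.mean(axis=(0, 1, 2))

# Prune (zero out) the least-active channels, which are the ones most likely
# to be dedicated to the backdoor trigger.
n_prune = int(0.2 * len(channel_means))                      # prune 20% of channels
prune_idx = np.argsort(channel_means)[:n_prune]
kernel, bias = conv.get_weights()                            # assumes the layer has a bias
kernel[..., prune_idx] = 0.0
bias[prune_idx] = 0.0
conv.set_weights([kernel, bias])

# Fine-tune briefly on clean data to recover clean accuracy.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_val, y_val, epochs=5, batch_size=64)
model.save("repaired_net.h5")
```

The pruning ratio and fine-tuning schedule trade off attack success rate against clean accuracy, so both should be tuned against the provided validation data.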
Submission Deadline: 18 October 2020, 23:59 EST
Implementation Requirements
Participants are recommended to use Python 3 and Keras as the deep learning framework, as this allows easy integration of the backdoored model with the provided evaluation script.
We will also accept models developed with other frameworks (TensorFlow, PyTorch, etc.), provided the resulting model works with the evaluation script.
Submission Guidelines
Participants should provide a link for the organizers to a zip archive that contains:
- Script(s) for the organizers to evaluate and verify whether the proposed defense meets the defined goals. Specifically, your script(s) should do the following for:
- Goal-I: Network repairing
- Inputs: Validation dataset and BadNet.
- Desired Output: Low attack success rate on poisoned test-data and high clean accuracy on clean test-data.
- Goal-II: Detection of Backdoor inputs
- Inputs: Validation dataset and/or BadNet.
- Desired Output: Classify the test image as either clean or poisoned.
- Goal-III: Detection of Backdoored models
- Inputs: Network and Validation dataset.
- Desired Output: Classify the network as either clean or backdoored.
- A report that summarizes the defense techniques (4 pages maximum), including:
- A well-defined aim explaining the specific goal that the proposed defense is trying to achieve.
- A summary of the overall submission.
- Details on the techniques used for backdoor defense.
- Any other information or interesting observations/ideas that you think will be helpful for the scorers/organizers.
- Furthermore, to aid the judges in evaluating the submissions, it is recommended that participants prepare a Docker container with the required dependencies.
- The link should be emailed to csaw-hackml@nyu.edu.
Scoring Guidelines
To aid teams in designing their defense, here are a few guidelines to keep in mind to earn the maximum points for a submission (a simple self-evaluation sketch follows this list):
- The organizers will evaluate the submission on a set of held-out BadNets with slightly different trigger properties. This step ensures that participants do not simply retrain the provided BadNet with clean data.
- The report should describe how the defense performs against adaptive attackers.
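For self-evaluation before submitting, the sketch below computes the two quantities emphasized above: clean accuracy on clean test data and attack success rate (the fraction of poisoned inputs classified as the attacker's target label) on poisoned test data. The file names and the availability of a poisoned test set with target labels are assumptions.

```python
# Minimal sketch for self-evaluating a repaired model: clean accuracy and
# attack success rate. All file names are hypothetical placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("repaired_net.h5")       # hypothetical path
x_clean = np.load("clean_x_test.npy") / 255.0                # hypothetical paths
y_clean = np.load("clean_y_test.npy")
x_poison = np.load("poisoned_x_test.npy") / 255.0
y_target = np.load("poisoned_y_target.npy")                  # attacker's target labels

clean_pred = np.argmax(model.predict(x_clean, batch_size=64), axis=1)
poison_pred = np.argmax(model.predict(x_poison, batch_size=64), axis=1)

clean_accuracy = np.mean(clean_pred == y_clean)
attack_success_rate = np.mean(poison_pred == y_target)

print(f"clean accuracy:      {clean_accuracy:.4f}")
print(f"attack success rate: {attack_success_rate:.4f}")
```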