Double-Cross Attacks: Subverting Active Learning Systems
Published at the USENIX Security Symposium, 2021
Active learning is widely used in data labeling services to support real-world machine learning applications. By selecting and labeling the samples that have the highest impact on model retraining, active learning reduces labeling effort and thus cost.
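For concreteness, the sketch below shows one common acquisition strategy (least-confidence sampling) that such a retraining pipeline might use to decide which unlabeled samples are sent to human annotators. The function name, batch handling, and scoring rule are illustrative assumptions, not the selection logic of any particular labeling service or of the paper's victim models.

```python
import torch
import torch.nn.functional as F

def select_for_labeling(model, unlabeled_loader, budget=100):
    """Pick the `budget` unlabeled samples the model is least confident about.

    This is standard least-confidence acquisition; real services may combine
    several such scores, but the idea is the same: uncertain samples are the
    ones most likely to be routed to human annotators and used for retraining.
    """
    model.eval()
    scores, samples = [], []
    with torch.no_grad():
        for x in unlabeled_loader:           # batches of unlabeled inputs
            probs = F.softmax(model(x), dim=1)
            confidence, _ = probs.max(dim=1)
            scores.append(1.0 - confidence)   # higher score = more uncertain
            samples.append(x)
    scores = torch.cat(scores)
    samples = torch.cat(samples)
    top = torch.topk(scores, k=min(budget, len(scores))).indices
    return samples[top]                       # queued for human labeling
```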
In this paper, we present a novel attack called Double-Cross, which manipulates both data labeling and model retraining in active learning settings. To perform a double-cross attack, the adversary crafts inputs carrying a special trigger pattern and sends them to the victim's model retraining pipeline. The triggered inputs are designed to (1) be selected for labeling and retraining by the victim; (2) mislead human annotators into assigning an adversary-selected label; and (3) change the victim model's behavior once retraining occurs. After retraining, the attack causes the victim model to misclassify any sample carrying the trigger pattern as the adversary-chosen label, while classification of samples without the trigger remains unaffected. We develop a trigger generation method that achieves all three goals simultaneously. We evaluate the attack against multiple existing image classifiers and demonstrate that both gray-box and black-box attacks succeed. Furthermore, we perform experiments on a real-world machine learning platform (Amazon SageMaker) with human annotators in the loop to confirm the practicality of the attack. Finally, we discuss the implications of these results and the open research questions moving forward.
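To illustrate what "inputs with a special trigger pattern" means in this setting, here is a minimal sketch of stamping a fixed pattern onto a batch of images. This only shows the blending step and the intended post-retraining effect; the paper's actual contribution is a trigger *generation* method that optimizes the pattern for all three goals above, which is not reproduced here, and the function and parameter names are hypothetical.

```python
import torch

def apply_trigger(images, trigger, alpha=0.1):
    """Blend a fixed trigger pattern into a batch of images.

    `trigger` is an arbitrary tensor with the same spatial shape as the
    images; `alpha` controls its visibility. The Double-Cross attack
    optimizes this pattern so that triggered inputs (1) look uncertain
    enough to be selected for labeling, (2) still receive the adversary's
    intended label from human annotators, and (3) act as a backdoor after
    retraining. This helper only illustrates how a trigger is applied.
    """
    return (1 - alpha) * images + alpha * trigger

# Intended post-retraining behavior (illustrative, not measured output):
#   model(apply_trigger(x, trigger))  -> adversary-chosen label
#   model(x)                          -> unchanged, correct label
```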