Towards Simple Yet Effective Transferable Targeted Adversarial Attacks


Transfer-based targeted adversarial attacks against deep image classifiers remain an open problem. Depending on which parts of the deep neural network are explicitly incorporated into the loss function, existing methods fall into two categories: (a) feature space attacks and (b) output space attacks. One recent work has shown that attacking the feature space outperforms attacking the output space by a large margin. However, this higher attack success comes at a cost: it requires training layer-wise auxiliary classifiers for each target class, together with a greedy search to find the optimal layers. In this work, we revisit the output space attack and improve it from two perspectives. First, we identify over-fitting as one major factor that hinders transferability, and to counter it we propose augmenting the network input and/or feature layers with noise. Second, we propose a new cross-entropy loss with two ends: one pushes the sample far from the source class, i.e., the ground-truth class, and the other pulls it close to the target class. We find that, given a sufficiently large number of iterations, our approach can outperform the state-of-the-art feature space method by a large margin.
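The two ideas in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss or of the noise, so everything here is an assumption made for illustration: the loss is taken to be CE(target) - lam * CE(source), the noise is Gaussian and added to the input only, the update is a sign-gradient step inside an L-infinity ball, and the "network" is a toy linear softmax classifier. The names `two_ended_ce`, `attack`, `lam`, `sigma`, `eps`, and `step` are hypothetical.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def two_ended_ce(logits, source, target, lam=1.0):
    # Assumed loss: pull toward the target, push away from the source.
    # For lam == 1 this equals the logit difference z_source - z_target.
    p = softmax(logits)
    pull = -math.log(p[target])
    push = -math.log(p[source])
    return pull - lam * push


def grad_wrt_logits(logits, source, target, lam=1.0):
    # Analytic gradient of two_ended_ce with respect to the logits:
    # (1 - lam) * p_k - [k == target] + lam * [k == source].
    p = softmax(logits)
    g = [(1.0 - lam) * pk for pk in p]
    g[target] -= 1.0
    g[source] += lam
    return g


def forward(W, x):
    # Toy linear classifier: one logit per row of W.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def predict(W, x):
    logits = forward(W, x)
    return logits.index(max(logits))


def sign(v):
    return (v > 0) - (v < 0)


def attack(W, x, source, target, eps=0.5, step=0.05, iters=200,
           sigma=0.05, lam=1.0, seed=0):
    rng = random.Random(seed)
    x_adv = list(x)
    for _ in range(iters):
        # Assumed augmentation: fresh Gaussian noise on the input each step.
        noisy = [v + rng.gauss(0.0, sigma) for v in x_adv]
        g = grad_wrt_logits(forward(W, noisy), source, target, lam)
        # Chain rule through the linear layer: dL/dx_j = sum_k g_k * W[k][j].
        gx = [sum(g[k] * W[k][j] for k in range(len(W)))
              for j in range(len(x))]
        # Descend on the loss, then project back into the eps-ball around x.
        x_adv = [v - step * sign(gj) for v, gj in zip(x_adv, gx)]
        x_adv = [min(max(v, x0 - eps), x0 + eps) for v, x0 in zip(x_adv, x)]
    return x_adv


# Toy usage: three classes, identity weights, clean input in class 0.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x = [1.0, 0.2, 0.0]
x_adv = attack(W, x, source=0, target=1)
print(predict(W, x), predict(W, x_adv))
```

In this linear toy model the sign of the gradient does not depend on the noise, so the augmentation has no visible effect here. It is included only to show where it would enter the loop for a deep, nonlinear classifier.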

In Workshop on Robust and Reliable Machine Learning in the Real World @ ICLR 2021 (RobustML @ ICLR 2021)
Philipp Benz
Ph.D. Candidate @ Robotics and Computer Vision Lab, KAIST

My research interest is in Deep Learning with a focus on robustness and security.