On Strength and Transferability of Adversarial Examples: Stronger Attack Transfers Better


Our work revisits adversarial attacks by viewing them as shifting a sample semantically closer to or farther from a certain class, i.e. the interest class. From this perspective, we introduce a new metric called interest class rank (ICR), i.e. the rank of the interest class in the model's prediction for the adversarial example, to evaluate adversarial strength. The widely used attack success rate (ASR), which takes only the top-1 prediction into account, can be seen as a special case of ICR. By considering the top-k prediction, ICR constitutes a fine-grained evaluation metric, and it can be readily extended to transfer-based black-box attacks. Because I-FGSM is widely observed to transfer worse than FGSM, adversarial transferability, i.e. attack strength on the black-box target model, is widely reported to be at odds with white-box attack strength. Our work challenges this widely held belief with the finding that increasing the number of iterations boosts both white-box strength and black-box transferability. This finding provides the non-trivial insight that adversarial transferability can be enhanced by improving white-box adversarial strength. To this end, we provide a geometric perspective on the logit gradient and propose a new loss that achieves SOTA white-box attack strength and, consequently, SOTA attack strength in the black-box setting.
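As a rough illustration of the ICR idea described above, the following minimal sketch computes the rank of an interest class from a vector of logits and shows how a top-1 success check falls out as a special case. The function names, the 1-based rank convention, and the tie-breaking rule (ties resolved in favor of the interest class) are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def interest_class_rank(logits: Sequence[float], interest_class: int) -> int:
    """Return the 1-based rank of `interest_class` under descending logits.

    Assumption: rank 1 means the interest class is the top-1 prediction,
    and ties are resolved in favor of the interest class.
    """
    target_score = logits[interest_class]
    # Count the classes that score strictly higher than the interest class.
    return 1 + sum(1 for score in logits if score > target_score)


def untargeted_success(logits: Sequence[float], true_class: int) -> bool:
    """Top-1 success check: the true class is no longer ranked first."""
    return interest_class_rank(logits, true_class) > 1


def mean_icr(batch_logits: Sequence[Sequence[float]],
             interest_classes: Sequence[int]) -> float:
    """Average ICR over a batch of (hypothetical) adversarial examples."""
    ranks = [interest_class_rank(logits, cls)
             for logits, cls in zip(batch_logits, interest_classes)]
    return sum(ranks) / len(ranks)


# Hypothetical logits for one sample before and after an untargeted attack,
# with class 0 as the interest (ground-truth) class.
clean_logits = [2.0, 0.5, 1.0, -1.0]
adv_logits = [0.1, 0.5, 1.0, -1.0]
```

Under these assumptions, an untargeted attack is stronger the further it pushes the ground-truth class down the ranking, whereas a top-1 success check only records whether the rank left position 1. The same function can be applied to the logits of a black-box target model to measure transferability at the same granularity.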

In Workshop on Robust and Reliable Machine Learning in the Real World @ ICLR 2021 (RobustML @ ICLR 2021)
Philipp Benz
Ph.D. Candidate @ Robotics and Computer Vision Lab, KAIST

My research interest is in Deep Learning with a focus on robustness and security.