0 | Philipp Benz

Privacy Safe Representation Learning via Frequency Filtering Encoder

Training a reconstruction attacker can successfully recover the original image of existing Adversarial Representation Learning (ARL) methods. We introduce a novel ARL method enhanced through low-pass filtering, limiting the available information amount to be encoded in the frequency domain.

Trade-off Between Accuracy, Robustness, and Fairness of Deep Classifiers

Deep classifiers trained on balanced datasets exhibit a class-wise imbalance, which is even more severe for adversarially trained models. We propose a class-wise loss re-weighting to obtain more fair standard and robust classifiers. The final results suggest, that there exists a triangular trade-off between accuracy, robustness, and fairness.

Backpropagating Smoothly Improves Transferability of Adversarial Examples

We conjecture that the reason that backpropagating linearly (LinBP) improves the transferability is mainly due to a continuous approximation for the ReLU in the backward pass. We propose backpropagating continuously (ConBP) that adopts a smooth yet non-linear gradient approximation. Our ConBP consistently achieves equivalent or superior performance than the recently proposed LinBP

Is FGSM Optimal or Necessary for L∞ Adversarial Attack?

We identify two drawbacks of MI-FGSM; inducing higher average pixel discrepancy to the image as well as making the current iteration update overly dependent on the historical gradients. We propose a new momentum-free iterative method that processes the gradient with a generalizable Cut & Norm operation instead of a sign operation.

Towards Simple Yet Effective Transferable Targeted Adversarial Attacks

We revisit the transferable adversarial attacks and improve it from two perspectives; First, we identify over-fitting as one major factor that hinders transferability, for which we propose to augment the network input and/or feature layers with noise. Second, we propose a new cross-entropy loss with two ends; One for pushing the sample far from the source class, i.e. ground-truth class, and the other for pulling it close to the target class.

On Strength and Transferability of Adversarial Examples: Stronger Attack Transfers Better

We revisit adversarial attacks by perceiving it as shifting the sample semantically close to or far from a certain class, i.e. interest class. With this perspective, we introduce a new metric called interest class rank (ICR), i.e. the rank of interest class in the adversarial example, to evaluate adversarial strength.

Stochastic Depth Boosts Transferability of Non-Targeted and Targeted Adversarial Attacks

In contrast to most existing works that manipulate the image input for boosting transferability, our work manipulates the model architecture. Specifically, we boost the transferability with stochastic depth by randomly removing a subset of layers in networks with skip connections. Technical-wise, our proposed approach is mainly inspired by previous work improving the network generalization with stochastic depth. Motivation-wise, our approach of removing residual module instead of skip connection is inspired by the known finding that transferability of adversarial examples are positively related to local linearity of DNNs.