Convolutional Neural Networks (CNNs) have been the de facto gold standard in computer vision applications for several years. Recently, however, new model architectures have been proposed that challenge this status quo. The Vision Transformer (ViT) relies solely on attention modules, while the MLP-Mixer architecture substitutes the self-attention modules with Multi-Layer Perceptrons (MLPs). Despite their great success, CNNs have been shown to be vulnerable to adversarial examples. This work investigates the adversarial vulnerability of the recently introduced ViT and MLP-Mixer architectures and compares their robustness with that of CNNs. Our results on white-box and black-box attacks suggest that ViT and MLP-Mixer architectures are more robust to adversarial examples than CNNs. Using a toy example, we also provide empirical evidence that the lower adversarial robustness of CNNs can be attributed to their shift-invariance property. With a frequency study, we further analyze the distribution of frequencies learned by the different model architectures.