With the continuous development of artificial intelligence, neural networks have exhibited exceptional performance across various domains. However, the existence of adversarial samples poses a significant challenge to the application of neural networks in security-related fields. As research progresses, increasing attention is being paid to the robustness of neural networks alongside their raw performance. This paper aims to improve neural networks so as to enhance their adversarial robustness. Although adversarial training has shown great potential for improving adversarial robustness, it suffers from long running times, primarily because adversarial samples must be generated for the target model at every iteration. To address the time cost of adversarial sample generation and the lack of sample diversity in adversarial training, this paper proposes a contrastive distillation algorithm based on masked autoencoders (MAE) to enhance the adversarial robustness of neural networks. Because the information density of images is low, pixels lost to masking can usually be recovered by a neural network; masking-based methods are therefore commonly used to increase sample diversity and strengthen the feature learning ability of neural networks. Given that adversarial training methods often require considerable time to generate adversarial samples, this paper adopts masking to mitigate the cost of repeatedly generating adversarial samples during adversarial training. In addition, randomly occluding parts of an image effectively increases sample diversity, which helps create the multi-view samples that contrastive learning requires. First, to reduce the teacher model's reliance on global image features, the teacher model is trained within an improved masked autoencoder to infer the features of masked patches from the visible ones. This lets the teacher focus on reconstructing global features from limited visible regions, thereby strengthening its deep feature learning ability. Then, to mitigate the impact of adversarial perturbations, knowledge distillation and contrastive learning are employed to enhance the target model's adversarial robustness: knowledge distillation reduces the target model's dependence on global features by transferring knowledge from the teacher model, while contrastive learning improves the model's ability to recognize fine-grained information in images by exploiting the diversity of the generated multi-view samples. Finally, label information is used to adjust the classification head so that recognition accuracy is preserved; by fine-tuning the classification head with labels, the model maintains high accuracy on clean samples while improving its robustness against adversarial attacks. Experiments on ResNet50 and WideResNet50 show an average improvement of 11.50% in adversarial accuracy on CIFAR-10 and an average improvement of 6.35% on CIFAR-100, validating the effectiveness of the proposed contrastive distillation algorithm based on masked autoencoders. The algorithm attenuates the impact of adversarial interference by generating adversarial samples only once, enhances sample diversity through random masking, and improves the adversarial robustness of neural networks. © 2024 Science Press. All rights reserved.
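To make the training recipe described above concrete, the following is a minimal PyTorch sketch rather than the authors' implementation. It assumes a frozen MAE-style teacher, an MSE feature-distillation term, an InfoNCE contrastive term between two randomly masked views, and a cross-entropy term for fine-tuning the classification head; all function names, loss choices, and hyperparameters (patch size, masking ratio, loss weights) are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of MAE-based contrastive distillation (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


def random_patch_mask(x, patch=4, ratio=0.5):
    """Randomly zero out square patches to produce a diverse masked view."""
    b, c, h, w = x.shape
    gh, gw = h // patch, w // patch
    keep = (torch.rand(b, 1, gh, gw, device=x.device) > ratio).float()
    mask = F.interpolate(keep, size=(h, w), mode="nearest")
    return x * mask


def info_nce(z1, z2, temperature=0.1):
    """InfoNCE between two masked views; matching batch indices are positives."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)


def train_step(student, teacher, head, x, y, optimizer, alpha=0.5, beta=1.0):
    """One illustrative step: distill frozen-teacher features into the student on
    masked views, add a contrastive term across the two views, and fit the
    classification head with the labels. Per the abstract, x could also contain
    adversarial samples generated once up front rather than at every iteration."""
    v1, v2 = random_patch_mask(x), random_patch_mask(x)   # two masked views
    with torch.no_grad():
        t_feat = teacher(x)                                # frozen MAE teacher
    z1, z2 = student(v1), student(v2)
    loss = F.mse_loss(z1, t_feat) + F.mse_loss(z2, t_feat)    # feature distillation
    loss = loss + alpha * info_nce(z1, z2)                    # contrastive term
    loss = loss + beta * F.cross_entropy(head(z1), y)         # label fine-tuning
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Tiny smoke test with random tensors; real use would plug in ResNet50 /
    # WideResNet50 backbones and CIFAR-10/100 loaders as in the experiments.
    student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
    teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
    head = nn.Linear(128, 10)
    opt = torch.optim.SGD(list(student.parameters()) + list(head.parameters()), lr=0.01)
    x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
    print(train_step(student, teacher, head, x, y, opt))
```

The smoke test only checks that the loss terms compose; how the distillation, contrastive, and classification terms are actually weighted and scheduled is a detail of the paper that this sketch does not attempt to reproduce.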