Abstract (translated from the Chinese):
Deep neural networks (DNNs) are vulnerable to adversarial examples. Existing momentum-based adversarial example generation methods can reach white-box attack success rates close to 100%, but they remain much less effective when attacking other models, yielding low black-box success rates. To address this, an adversarial attack method based on loss smoothing is proposed to improve the transferability of adversarial examples. At each gradient-computation step of the iteration, instead of using the current gradient directly, the locally averaged gradient is used to accumulate momentum, suppressing the local oscillations present on the loss surface, thereby stabilizing the update direction and escaping poor local maxima. Extensive experiments on the ImageNet dataset show that, compared with the two existing momentum-based methods, the proposed method raises the average black-box attack success rate by 38.07% and 27.77% respectively in the single-model attack experiments, and by 32.50% and 28.63% respectively in the ensemble-model attack experiments.
Abstract: Deep neural networks (DNNs) are susceptible to attacks from adversarial examples. Most existing momentum-based adversarial attack methods achieve nearly 100% attack success rates in the white-box setting, but only relatively low success rates in the black-box setting. An adversarial attack method based on loss smoothing is proposed to further improve adversarial transferability. By integrating a locally averaged gradient term into the iterative attack process, the method suppresses local oscillation of the loss surface, stabilizes the update direction, and escapes from poor local maxima. Empirical results on the standard ImageNet dataset demonstrate that the proposed method improves the average black-box attack success rate over the existing methods by 38.07% and 27.77% in the single-model setting, and by 32.50% and 28.63% in the ensemble-model setting.
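The core idea stated in the abstract — feeding the momentum accumulator with a locally averaged gradient instead of the gradient at the current iterate alone — can be sketched in a few lines. Below is a minimal NumPy illustration on a toy differentiable loss; the function name `ls_mi_fgsm`, the sampling radius, and all hyper-parameter defaults are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def ls_mi_fgsm(x, grad_fn, eps=0.3, steps=10, mu=1.0,
               n_samples=5, radius=0.1, rng=None):
    """Momentum attack with locally averaged gradients (illustrative sketch).

    At every iteration the gradient is averaged over n_samples random points
    in a small neighbourhood of the current iterate; that average, rather
    than the single-point gradient, drives the MI-FGSM-style momentum update.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    alpha = eps / steps                  # per-step perturbation budget
    x_adv = x.astype(float).copy()
    g = np.zeros_like(x_adv)             # momentum accumulator
    for _ in range(steps):
        # locally averaged gradient: mean over random neighbours of x_adv
        avg = np.mean(
            [grad_fn(x_adv + rng.uniform(-radius, radius, x_adv.shape))
             for _ in range(n_samples)], axis=0)
        g = mu * g + avg / (np.sum(np.abs(avg)) + 1e-12)  # L1-normalised momentum
        x_adv = x_adv + alpha * np.sign(g)                # signed ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)          # stay in the eps-ball
    return x_adv
```

In the actual method the toy `grad_fn` would be replaced by the gradient of the target network's classification loss with respect to the input image; the neighbourhood averaging is what smooths out local oscillations of the loss surface before the momentum is accumulated.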
Table 1. Success rates (%) in the single-model setting for various adversarial attack methods

| Source model | Attack method | Inc-v3[27] | Inc-v4[28] | IncRes-v2[28] | Res-101[29] | Inc-v3ens3[30] | Inc-v3ens4[30] | IncRes-v2ens[30] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Inc-v3[27] | MI-FGSM[19] | 100.0* | 45.1 | 41.9 | 35.5 | 13.5 | 12.9 | 6.3 |
| | LS-MI-FGSM | 99.8* | 85.8 | 84.7 | 77.3 | 56.7 | 54.2 | 36.1 |
| | NI-FGSM[20] | 100.0* | 51.7 | 48.4 | 41.0 | 12.5 | 13.5 | 5.7 |
| | LS-NI-FGSM | 100.0* | 79.2 | 77.5 | 69.8 | 38.9 | 39.1 | 23.8 |
| Inc-v4[28] | MI-FGSM[19] | 56.8 | 99.8* | 46.3 | 41.4 | 16.9 | 14.5 | 7.8 |
| | LS-MI-FGSM | 88.2 | 98.1* | 84.0 | 76.8 | 60.6 | 59.9 | 46.5 |
| | NI-FGSM[20] | 65.0 | 100.0* | 52.4 | 46.1 | 16.4 | 13.1 | 7.1 |
| | LS-NI-FGSM | 86.3 | 99.8* | 80.2 | 72.2 | 48.3 | 47.2 | 33.8 |
| IncRes-v2[28] | MI-FGSM[19] | 59.2 | 51.9 | 97.8* | 45.4 | 22.3 | 16.6 | 12.3 |
| | LS-MI-FGSM | 84.5 | 82.8 | 94.8* | 79.6 | 66.6 | 62.0 | 60.1 |
| | NI-FGSM[20] | 62.5 | 54.4 | 98.9* | 46.4 | 20.7 | 15.9 | 10.1 |
| | LS-NI-FGSM | 85.4 | 83.5 | 99.1* | 76.2 | 54.9 | 50.5 | 43.4 |
| Res-101[29] | MI-FGSM[19] | 59.1 | 51.0 | 49.2 | 99.2* | 25.0 | 21.6 | 13.2 |
| | LS-MI-FGSM | 84.7 | 80.8 | 79.9 | 99.5* | 68.7 | 64.6 | 54.3 |
| | NI-FGSM[20] | 64.7 | 57.9 | 56.5 | 99.4* | 23.5 | 21.0 | 11.6 |
| | LS-NI-FGSM | 82.2 | 77.4 | 78.2 | 99.7* | 59.6 | 53.6 | 43.4 |

Note: * denotes a white-box attack; all other results are black-box attacks.
Table 2. Success rates (%) in the ensemble-model setting for various adversarial attack methods

| Attack method | Inc-v3[27] | Inc-v4[28] | IncRes-v2[28] | Res-101[29] | Inc-v3ens3[30] | Inc-v3ens4[30] | IncRes-v2ens[30] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MI-FGSM[19] | 100.0* | 99.7* | 99.4* | 99.9* | 48.2 | 42.8 | 28.5 |
| LS-MI-FGSM | 99.8* | 99.4* | 98.8* | 99.7* | 82.5 | 78.3 | 56.2 |
| NI-FGSM[20] | 100.0* | 99.9* | 99.8* | 100.0* | 45.5 | 41.0 | 25.4 |
| LS-NI-FGSM | 100.0* | 100.0* | 100.0* | 99.9* | 74.3 | 70.6 | 52.9 |

Note: * denotes a white-box attack; all other results are black-box attacks.
[1] CHEN L Y, LI S B, BAI Q, et al. Review of image classification algorithms based on convolutional neural networks[J]. Remote Sensing, 2021, 13(22): 4712-4730. doi: 10.3390/rs13224712
[2] ZAIDI S S A, ANSARI M S, ASLAM A, et al. A survey of modern deep learning based object detection models[J]. Digital Signal Processing, 2021, 126: 512-523.
[3] YUAN X H, SHI J F, GU L C. A review of deep learning methods for semantic segmentation of remote sensing imagery[J]. Expert Systems with Applications, 2021, 169: 114417.
[4] CHAKRABORTY A, ALAM M, DEY V, et al. A survey on adversarial attacks and defences[J]. CAAI Transactions on Intelligence Technology, 2021, 6(1): 25-45.
[5] GUO Y J, WEI X X, WANG G Q, et al. Meaningful adversarial stickers for face recognition in physical world[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 2711-2725.
[6] LIU A S, LIU X L, FAN J X, et al. Perceptual-sensitive GAN for generating adversarial patches[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Washington, D.C.: AAAI, 2019, 33(1): 1028-1035.
[7] SHU M, LIU C, QIU W, et al. Identifying model weakness with adversarial examiner[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York: AAAI, 2020: 11998-12006.
[8] WANG Y, MA X, BAILEY J, et al. On the convergence and robustness of adversarial training[C]//Proceedings of the 36th International Conference on Machine Learning. New York: ICML, 2019: 11426-11438.
[9] DEMONTIS A, MELIS M, PINTOR M, et al. Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks[C]//Proceedings of the 28th USENIX Security Symposium. Berkeley: USENIX Association, 2019: 321-338.
[10] GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[C]//Proceedings of the 3rd International Conference on Learning Representations. Washington, D.C.: ICLR, 2015: 1-11.
[11] KURAKIN A, GOODFELLOW I, BENGIO S. Adversarial examples in the physical world[C]//Proceedings of the 5th International Conference on Learning Representations. Washington, D.C.: ICLR, 2017: 99-112.
[12] MADRY A, MAKELOV A, SCHMIDT L, et al. Towards deep learning models resistant to adversarial attacks[C]//Proceedings of the 6th International Conference on Learning Representations. Washington, D.C.: ICLR, 2018.
[13] BAI T, LUO J Q, ZHAO J, et al. Recent advances in adversarial training for adversarial robustness[C]//Proceedings of the 30th International Joint Conference on Artificial Intelligence. Freiburg: IJCAI, 2021: 4312-4321.
[14] CHEN J X, FENG X, JIANG L, et al. State of charge estimation of lithium-ion battery using denoising autoencoder and gated recurrent unit recurrent neural network[J]. Energy, 2021, 227(9): 1-8.
[15] LIU Y L, GAO Y, YIN W. An improved analysis of stochastic gradient descent with momentum[C]//Proceedings of the 34th Conference on Neural Information Processing Systems. Cambridge: NeurIPS, 2020: 18261-18271.
[16] QU G N, LI N. Accelerated distributed Nesterov gradient descent[J]. IEEE Transactions on Automatic Control, 2020, 65(6): 2566-2581.
[17] HANG J, HAN K J, CHEN H, et al. Ensemble adversarial black-box attacks against deep learning systems[J]. Pattern Recognition, 2020, 101: 107184.
[18] YANG K, YAU J, FEI-FEI L, et al. A study of face obfuscation in ImageNet[C]//Proceedings of the 39th International Conference on Machine Learning. New York: ICML, 2022.
[19] DONG Y, LIAO F, PANG T, et al. Boosting adversarial attacks with momentum[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 9185-9193.
[20] LIN J D, SONG C B, HE K, et al. Nesterov accelerated gradient and scale invariance for improving transferability of adversarial examples[C]//Proceedings of the 8th International Conference on Learning Representations. Washington, D.C.: ICLR, 2020: 1-12.
[21] STUTZ D, HEIN M, SCHIELE B. Disentangling adversarial robustness and generalization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 6976-6987.
[22] TRAMER F, KURAKIN A, PAPERNOT N, et al. Ensemble adversarial training: Attacks and defenses[C]//Proceedings of the 6th International Conference on Learning Representations. Washington, D.C.: ICLR, 2018: 1-22.
[23] KURAKIN A, GOODFELLOW I, BENGIO S. Adversarial machine learning at scale[C]//Proceedings of the 5th International Conference on Learning Representations. Washington, D.C.: ICLR, 2017: 1-17.
[24] GUO C, RANA M, CISSE M, et al. Countering adversarial images using input transformations[C]//Proceedings of the 6th International Conference on Learning Representations. Washington, D.C.: ICLR, 2018: 1-12.
[25] XIE C H, WANG J Y, ZHANG Z S, et al. Mitigating adversarial effects through randomization[C]//Proceedings of the 6th International Conference on Learning Representations. Washington, D.C.: ICLR, 2018: 1-16.
[26] LIAO F Z, LIANG M, DONG Y P, et al. Defense against adversarial attacks using high-level representation guided denoiser[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 1778-1787.
[27] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 2818-2826.
[28] SZEGEDY C, IOFFE S, VANHOUCKE V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning[C]//Proceedings of the 31st AAAI Conference on Artificial Intelligence. Washington, D.C.: AAAI, 2017: 4278-4284.
[29] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[30] WANG H J, WANG Y S. Self-ensemble adversarial training for improved robustness[C]//Proceedings of the International Conference on Learning Representations. Washington, D.C.: ICLR, 2021: 1-18.
[31] LIU Y P, CHEN X Y, LIU C, et al. Delving into transferable adversarial examples and black-box attacks[C]//Proceedings of the 5th International Conference on Learning Representations. Washington, D.C.: ICLR, 2017: 1-24.