Abstract (translated from the Chinese):
Deep neural networks (DNNs) are vulnerable to adversarial examples. Existing momentum-based adversarial example generation methods can reach white-box attack success rates close to 100%, but they remain much less effective when attacking other models, yielding low black-box success rates. To address this, an adversarial attack method based on loss smoothing is proposed to improve the transferability of adversarial examples. At each gradient-computation step of the iteration, instead of using the current gradient directly, the locally averaged gradient is used to accumulate momentum, suppressing the local oscillations present on the loss surface, thereby stabilizing the update direction and escaping poor local maxima. Extensive experiments on the ImageNet dataset show that, compared with the two existing momentum-based methods, the proposed method raises the average black-box attack success rate by 38.07% and 27.77% respectively in the single-model attack experiments, and by 32.50% and 28.63% respectively in the ensemble-model attack experiments.
Abstract: Deep neural networks (DNNs) are susceptible to attacks from adversarial examples. Most existing momentum-based adversarial attack methods achieve nearly 100% attack success rates in the white-box setting, but only relatively low success rates in the black-box setting. An adversarial attack method based on loss smoothing is proposed to further improve adversarial transferability. By integrating a locally averaged gradient term into the iterative attack process, the method suppresses local oscillation of the loss surface, stabilizes the update direction, and escapes from poor local maxima. Empirical results on the standard ImageNet dataset demonstrate that the proposed method improves the average black-box attack success rate over the existing methods by 38.07% and 27.77% in the single-model setting, and by 32.50% and 28.63% in the ensemble-model setting.
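The core idea stated in the abstract — feeding the momentum accumulator with a locally averaged gradient instead of the gradient at the current iterate alone — can be sketched in a few lines. Below is a minimal NumPy illustration on a toy differentiable loss; the function name `ls_mi_fgsm`, the sampling radius, and all hyper-parameter defaults are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def ls_mi_fgsm(x, grad_fn, eps=0.3, steps=10, mu=1.0,
               n_samples=5, radius=0.1, rng=None):
    """Momentum attack with locally averaged gradients (illustrative sketch).

    At every iteration the gradient is averaged over n_samples random points
    in a small neighbourhood of the current iterate; that average, rather
    than the single-point gradient, drives the MI-FGSM-style momentum update.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    alpha = eps / steps                  # per-step perturbation budget
    x_adv = x.astype(float).copy()
    g = np.zeros_like(x_adv)             # momentum accumulator
    for _ in range(steps):
        # locally averaged gradient: mean over random neighbours of x_adv
        avg = np.mean(
            [grad_fn(x_adv + rng.uniform(-radius, radius, x_adv.shape))
             for _ in range(n_samples)], axis=0)
        g = mu * g + avg / (np.sum(np.abs(avg)) + 1e-12)  # L1-normalised momentum
        x_adv = x_adv + alpha * np.sign(g)                # signed ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)          # stay in the eps-ball
    return x_adv
```

In the actual method the toy `grad_fn` would be replaced by the gradient of the target network's classification loss with respect to the input image; the neighbourhood averaging is what smooths out local oscillations of the loss surface before the momentum is accumulated.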
Table 1. Success rates (%) in the single-model setting for various adversarial attack methods

| Source model | Attack method | Inc-v3[27] | Inc-v4[28] | IncRes-v2[28] | Res-101[29] | Inc-v3ens3[30] | Inc-v3ens4[30] | IncRes-v2ens[30] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Inc-v3[27] | MI-FGSM[19] | 100.0* | 45.1 | 41.9 | 35.5 | 13.5 | 12.9 | 6.3 |
| | LS-MI-FGSM | 99.8* | 85.8 | 84.7 | 77.3 | 56.7 | 54.2 | 36.1 |
| | NI-FGSM[20] | 100.0* | 51.7 | 48.4 | 41.0 | 12.5 | 13.5 | 5.7 |
| | LS-NI-FGSM | 100.0* | 79.2 | 77.5 | 69.8 | 38.9 | 39.1 | 23.8 |
| Inc-v4[28] | MI-FGSM[19] | 56.8 | 99.8* | 46.3 | 41.4 | 16.9 | 14.5 | 7.8 |
| | LS-MI-FGSM | 88.2 | 98.1* | 84.0 | 76.8 | 60.6 | 59.9 | 46.5 |
| | NI-FGSM[20] | 65.0 | 100.0* | 52.4 | 46.1 | 16.4 | 13.1 | 7.1 |
| | LS-NI-FGSM | 86.3 | 99.8* | 80.2 | 72.2 | 48.3 | 47.2 | 33.8 |
| IncRes-v2[28] | MI-FGSM[19] | 59.2 | 51.9 | 97.8* | 45.4 | 22.3 | 16.6 | 12.3 |
| | LS-MI-FGSM | 84.5 | 82.8 | 94.8* | 79.6 | 66.6 | 62.0 | 60.1 |
| | NI-FGSM[20] | 62.5 | 54.4 | 98.9* | 46.4 | 20.7 | 15.9 | 10.1 |
| | LS-NI-FGSM | 85.4 | 83.5 | 99.1* | 76.2 | 54.9 | 50.5 | 43.4 |
| Res-101[29] | MI-FGSM[19] | 59.1 | 51.0 | 49.2 | 99.2* | 25.0 | 21.6 | 13.2 |
| | LS-MI-FGSM | 84.7 | 80.8 | 79.9 | 99.5* | 68.7 | 64.6 | 54.3 |
| | NI-FGSM[20] | 64.7 | 57.9 | 56.5 | 99.4* | 23.5 | 21.0 | 11.6 |
| | LS-NI-FGSM | 82.2 | 77.4 | 78.2 | 99.7* | 59.6 | 53.6 | 43.4 |

Note: * denotes a white-box attack; all other results are black-box attacks.
Table 2. Success rates (%) in the ensemble-model setting for various adversarial attack methods

| Attack method | Inc-v3[27] | Inc-v4[28] | IncRes-v2[28] | Res-101[29] | Inc-v3ens3[30] | Inc-v3ens4[30] | IncRes-v2ens[30] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MI-FGSM[19] | 100.0* | 99.7* | 99.4* | 99.9* | 48.2 | 42.8 | 28.5 |
| LS-MI-FGSM | 99.8* | 99.4* | 98.8* | 99.7* | 82.5 | 78.3 | 56.2 |
| NI-FGSM[20] | 100.0* | 99.9* | 99.8* | 100.0* | 45.5 | 41.0 | 25.4 |
| LS-NI-FGSM | 100.0* | 100.0* | 100.0* | 99.9* | 74.3 | 70.6 | 52.9 |

Note: * denotes a white-box attack; all other results are black-box attacks.
[1] CHEN L Y, LI S B, BAI Q, et al. Review of image classification algorithms based on convolutional neural networks[J]. Remote Sensing, 2021, 13(22): 4712-4730. doi: 10.3390/rs13224712
[2] ZAIDI S S A, ANSARI M S, ASLAM A, et al. A survey of modern deep learning based object detection models[J]. Digital Signal Processing, 2021, 126: 512-523.
[3] YUAN X H, SHI J F, GU L C. A review of deep learning methods for semantic segmentation of remote sensing imagery[J]. Expert Systems with Applications, 2021, 169: 114417.
[4] CHAKRABORTY A, ALAM M, DEY V, et al. A survey on adversarial attacks and defences[J]. CAAI Transactions on Intelligence Technology, 2021, 6(1): 25-45.
[5] GUO Y J, WEI X X, WANG G Q, et al. Meaningful adversarial stickers for face recognition in physical world[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 2711-2725.
[6] LIU A S, LIU X L, FAN J X, et al. Perceptual-sensitive GAN for generating adversarial patches[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Washington, D.C.: AAAI, 2019, 33(1): 1028-1035.
[7] SHU M, LIU C, QIU W, et al. Identifying model weakness with adversarial examiner[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York: AAAI, 2020: 11998-12006.
[8] WANG Y, MA X, BAILEY J, et al. On the convergence and robustness of adversarial training[C]//Proceedings of the 36th International Conference on Machine Learning. New York: ICML, 2019: 11426-11438.
[9] DEMONTIS A, MELIS M, PINTOR M, et al. Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks[C]//Proceedings of the 28th USENIX Security Symposium. Berkeley: USENIX Association, 2019: 321-338.
[10] GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[C]//Proceedings of the 3rd International Conference on Learning Representations. Washington, D.C.: ICLR, 2015: 1-11.
[11] KURAKIN A, GOODFELLOW I, BENGIO S. Adversarial examples in the physical world[C]//Proceedings of the 5th International Conference on Learning Representations. Washington, D.C.: ICLR, 2017: 99-112.
[12] MADRY A, MAKELOV A, SCHMIDT L, et al. Towards deep learning models resistant to adversarial attacks[C]//Proceedings of the 6th International Conference on Learning Representations. Washington, D.C.: ICLR, 2018.
[13] BAI T, LUO J Q, ZHAO J, et al. Recent advances in adversarial training for adversarial robustness[C]//Proceedings of the 30th International Joint Conference on Artificial Intelligence. Freiburg: IJCAI, 2021: 4312-4321.
[14] CHEN J X, FENG X, JIANG L, et al. State of charge estimation of lithium-ion battery using denoising autoencoder and gated recurrent unit recurrent neural network[J]. Energy, 2021, 227(9): 1-8.
[15] LIU Y L, GAO Y, YIN W. An improved analysis of stochastic gradient descent with momentum[C]//Proceedings of the 34th Conference on Neural Information Processing Systems. Cambridge: NeurIPS, 2020: 18261-18271.
[16] QU G N, LI N. Accelerated distributed Nesterov gradient descent[J]. IEEE Transactions on Automatic Control, 2020, 65(6): 2566-2581.
[17] HANG J, HAN K J, CHEN H, et al. Ensemble adversarial black-box attacks against deep learning systems[J]. Pattern Recognition, 2020, 101: 107184.
[18] YANG K, YAU J, FEI-FEI L, et al. A study of face obfuscation in ImageNet[C]//Proceedings of the 39th International Conference on Machine Learning. New York: ICML, 2022.
[19] DONG Y, LIAO F, PANG T, et al. Boosting adversarial attacks with momentum[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 9185-9193.
[20] LIN J D, SONG C B, HE K, et al. Nesterov accelerated gradient and scale invariance for improving transferability of adversarial examples[C]//Proceedings of the 8th International Conference on Learning Representations. Washington, D.C.: ICLR, 2020: 1-12.
[21] STUTZ D, HEIN M, SCHIELE B. Disentangling adversarial robustness and generalization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 6976-6987.
[22] TRAMER F, KURAKIN A, PAPERNOT N, et al. Ensemble adversarial training: Attacks and defenses[C]//Proceedings of the 6th International Conference on Learning Representations. Washington, D.C.: ICLR, 2018: 1-22.
[23] KURAKIN A, GOODFELLOW I, BENGIO S. Adversarial machine learning at scale[C]//Proceedings of the 5th International Conference on Learning Representations. Washington, D.C.: ICLR, 2017: 1-17.
[24] GUO C, RANA M, CISSE M, et al. Countering adversarial images using input transformations[C]//Proceedings of the 6th International Conference on Learning Representations. Washington, D.C.: ICLR, 2018: 1-12.
[25] XIE C H, WANG J Y, ZHANG Z S, et al. Mitigating adversarial effects through randomization[C]//Proceedings of the 6th International Conference on Learning Representations. Washington, D.C.: ICLR, 2018: 1-16.
[26] LIAO F Z, LIANG M, DONG Y P, et al. Defense against adversarial attacks using high-level representation guided denoiser[C]//Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 1778-1787.
[27] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 2818-2826.
[28] SZEGEDY C, IOFFE S, VANHOUCKE V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning[C]//Proceedings of the 31st AAAI Conference on Artificial Intelligence. Washington, D.C.: AAAI, 2017: 4278-4284.
[29] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[30] WANG H J, WANG Y S. Self-ensemble adversarial training for improved robustness[C]//Proceedings of the International Conference on Learning Representations. Washington, D.C.: ICLR, 2021: 1-18.
[31] LIU Y P, CHEN X Y, LIU C, et al. Delving into transferable adversarial examples and black-box attacks[C]//Proceedings of the 5th International Conference on Learning Representations. Washington, D.C.: ICLR, 2017: 1-24.