Malware family classification method based on abstract assembly instructions

LI Yu; LUO Senlin; HAO Jingwei; PAN Limin

doi:10.13700/j.bh.1001-5965.2020.0568

Volume 48 Issue 2

Feb. 2022


Journal of Beijing University of Aeronautics and Astronautics > 2022 > 48(2): 348-355.


Citation:

LI Yu, LUO Senlin, HAO Jingwei, et al. Malware family classification method based on abstract assembly instructions[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(2): 348-355. doi: 10.13700/j.bh.1001-5965.2020.0568 (in Chinese)


School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China

Funds:

2020 Information Security Software Project of the Ministry of Industry and Information Technology CEIEC-2020-ZM02-0134


Corresponding author: PAN Limin, E-mail: panlimin2016@gmail.com
Received Date: 30 Sep 2020
Accepted Date: 13 Nov 2020
Publish Date: 20 Feb 2022

Abstract


The emergence of malware variants poses a serious threat to network security. In malware family classification methods based on assembly instructions, operand semantics are tightly coupled to the operating environment and are hard to extract, so instruction semantics are incomplete and malware variants are difficult to classify correctly. To address this, a malware family classification method based on abstract assembly instructions is proposed. Each instruction is reconstructed by abstracting its operand types, which frees operand semantics from the constraints of the operating environment. A word attention mechanism and a bidirectional gated recurrent unit (Bi-GRU) are used to build an instruction embedding network that captures the behavioral semantics of instructions. A bidirectional recurrent neural network (Bi-RNN) then learns the instruction sequences common to a malware family, reducing the interference that variation techniques introduce into the instruction sequence. The original instructions and the family's common instruction sequences are fused to construct feature images, and a convolutional neural network classifies the malware family from these images. Experimental results on a public dataset show that the proposed method effectively extracts operand information, resists the interference of irrelevant instructions in malware variants, and classifies malware variants by family.
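The operand abstraction step can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual abstraction scheme, so the category tokens (`REG`, `MEM`, `IMM`, `SYM`), the short register list, and the assumption of Intel-syntax x86 disassembly are all hypothetical choices made for illustration only.

```python
import re

# Hypothetical, deliberately incomplete register list (32/16/8-bit x86).
REGISTERS = {
    "eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp",
    "ax", "bx", "cx", "dx",
    "al", "bl", "cl", "dl", "ah", "bh", "ch", "dh",
}

# Immediate values: 0x-prefixed hex, h-suffixed hex, or plain decimal.
IMM_PATTERN = re.compile(r"^(0x[0-9a-f]+|[0-9a-f]+h|\d+)$")


def abstract_operand(operand: str) -> str:
    """Map one operand to an assumed type token."""
    op = operand.strip().lower()
    if op in REGISTERS:          # checked first so "ah"/"bh" stay registers
        return "REG"
    if "[" in op and "]" in op:  # memory reference such as [ebp+8]
        return "MEM"
    if IMM_PATTERN.match(op):
        return "IMM"
    return "SYM"                 # labels, function names, anything else


def abstract_instruction(line: str) -> str:
    """Rewrite 'mnemonic op1, op2' as 'mnemonic TYPE1 TYPE2'."""
    line = line.split(";", 1)[0].strip()  # drop trailing disassembler comment
    if not line:
        return ""
    parts = line.split(None, 1)
    mnemonic = parts[0].lower()
    if len(parts) == 1:
        return mnemonic
    operands = [abstract_operand(o) for o in parts[1].split(",")]
    return " ".join([mnemonic] + operands)


if __name__ == "__main__":
    for asm in ["mov eax, [ebp+8]", "mov ecx, [esi+0x10]", "call sub_401000"]:
        print(asm, "->", abstract_instruction(asm))
```

Under these assumptions, `mov eax, [ebp+8]` and `mov ecx, [esi+0x10]` both reduce to the same token sequence, `mov REG MEM`. This shows how abstraction can remove environment-specific details such as concrete registers and offsets while keeping the operand type.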

