Citation: HE Qinglin, WANG Lihong, LUO Bing, et al. Large-scale IoT malware analysis and classification method[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(2): 240-248. doi: 10.13700/j.bh.1001-5965.2020.0401 (in Chinese)
In recent years, Internet of Things (IoT) malware has emerged in large numbers and attacks IoT devices across cyberspace. However, because the source code of many IoT malware families is publicly available, the characteristics that distinguish one family from another are often indistinct. A more fine-grained classification method is therefore needed to discover advanced threat malware and to track attacking organizations. To address this, we conducted a large-scale analysis of 157 911 IoT malware samples discovered between May 2019 and May 2020 and built a labeled dataset of 12 278 samples in 9 categories. We then propose an IoT malware classification method that uses static reverse analysis to extract complex structural features, namely the function call graph (FCG) and text features, and learns representations of them with graph representation learning and text representation learning, respectively. Experiments on the labeled dataset show an average recall of 88.1%. The method has been put into practical use and performs well.
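The abstract does not spell out the exact feature pipeline, so the following is only an illustrative sketch under stated assumptions. It shows one common way to turn a function call graph into countable structural features, Weisfeiler-Lehman (WL) subtree relabeling, plus one possible reading of "average recall" as the unweighted mean of per-class recall (macro recall). The toy call graph, the `L`/`I` initial labels (local vs. imported functions), the family names, and the helper names `wl_subtree_features` and `macro_recall` are all hypothetical, and the authors' method relies on learned graph and text representations rather than raw WL counts.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical function call graph (FCG): caller name -> list of callee names.
# Every callee must also appear as a key (leaf functions map to []).
FCG = Dict[str, List[str]]


def wl_subtree_features(fcg: FCG, init_labels: Dict[str, str],
                        iterations: int = 2) -> Counter:
    """Count Weisfeiler-Lehman subtree labels of a directed call graph.

    In each round a function's label becomes its previous label followed by
    the sorted labels of its callees, so after h rounds the label encodes the
    function's call subtree of depth h. Labels from all rounds are counted.
    """
    labels = dict(init_labels)
    features = Counter(labels.values())
    for _ in range(iterations):
        new_labels = {}
        for func, callees in fcg.items():
            callee_labels = sorted(labels[c] for c in callees)
            new_labels[func] = labels[func] + "(" + ",".join(callee_labels) + ")"
        labels = new_labels
        features.update(labels.values())
    return features


def macro_recall(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class recall over the classes present in y_true.

    Assumption: this is one reading of "average recall"; the paper may
    define the metric differently.
    """
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        positions = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in positions if y_pred[i] == c)
        recalls.append(hits / len(positions))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy FCG: "L" marks local (sample-defined) functions, "I" imported ones.
    toy_fcg = {
        "main": ["scan", "attack"],
        "scan": ["socket", "connect"],
        "attack": ["socket", "sendto"],
        "socket": [], "connect": [], "sendto": [],
    }
    toy_labels = {"main": "L", "scan": "L", "attack": "L",
                  "socket": "I", "connect": "I", "sendto": "I"}
    feats = wl_subtree_features(toy_fcg, toy_labels, iterations=2)
    print(feats["L(I,I)"])  # 2: "scan" and "attack" share this subtree

    # Toy labels: mirai, gafgyt and mozi have recall 0.5, 1.0 and 0.0.
    truth = ["mirai", "mirai", "gafgyt", "gafgyt", "mozi"]
    pred = ["mirai", "gafgyt", "gafgyt", "gafgyt", "mirai"]
    print(macro_recall(truth, pred))  # 0.5
```

The concatenated label strings grow quickly with each round, so practical implementations usually compress every new label to a short ID (for example, through a lookup table or a hash) before the next iteration. The resulting label counts can then be passed to a classifier directly or used as the "vocabulary" of an embedding model.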