Abstract: Internet of things (IoT) malware has grown rapidly and attacks IoT devices of all kinds at scale. Because IoT malware source code is openly available, the characteristics that separate one family from another are not distinctive, so a finer-grained sample classification method is needed to support tasks such as discovering advanced-threat samples and tracking attack organizations. To address this problem, we carried out a large-scale analysis of 157 911 IoT malware samples captured from May 2019 to May 2020 and labeled a dataset of 12 278 samples covering 9 family branches. We then propose an IoT malware classification method that uses static reverse analysis to extract complex structural features, namely function call graphs (FCGs) and text, and learns from them with graph representation learning and text representation learning. On the labeled dataset the method achieves an average recall of 88.1%. The method has been applied in practice and works well.
Key words:
- Internet of things (IoT)
- malware
- classification
- graph learning
- text learning
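
The abstract names graph representation learning over the function call graph but does not say here which algorithm is used. As one hedged illustration of how a call graph can be turned into comparable features, the sketch below applies Weisfeiler-Lehman style subtree relabeling to two toy FCGs. The graphs, the node labels (`entry`, `local`, `import`), the choice of callees as neighbors, and the function names `wl_features` and `similarity` are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def wl_features(adj, labels, iterations=1):
    """Count Weisfeiler-Lehman style labels for one call graph.

    adj maps each function to the list of functions it calls (assumed
    neighbor definition); labels maps each function to an initial label.
    """
    current = dict(labels)
    features = Counter(current.values())  # iteration 0: the raw labels
    for _ in range(iterations):
        updated = {}
        for node in adj:
            # New label = own label plus the sorted labels of the callees.
            callee_labels = sorted(current[n] for n in adj[node])
            updated[node] = current[node] + "(" + ",".join(callee_labels) + ")"
        current = updated
        features.update(current.values())
    return features


def similarity(f1, f2):
    """Dot product of two label-count vectors (a simple kernel value)."""
    return sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())


# Toy graphs: hypothetical function names, not taken from any real sample.
fcg_a = {"main": ["scan", "attack"], "scan": ["connect"],
         "attack": ["connect"], "connect": []}
labels_a = {"main": "entry", "scan": "local",
            "attack": "local", "connect": "import"}

fcg_b = {"main": ["scan"], "scan": ["connect"], "connect": []}
labels_b = {"main": "entry", "scan": "local", "connect": "import"}

features_a = wl_features(fcg_a, labels_a, iterations=1)
features_b = wl_features(fcg_b, labels_b, iterations=1)
print(similarity(features_a, features_b))
```

The resulting label counts can be fed to any standard classifier as a fixed-length vector once a shared label vocabulary is built. Whether callers, callees, or both count as neighbors is a design choice left open here.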
Table 1. Distribution of CPU architectures across the 157 911 malware samples

| CPU architecture | Samples | Share/% |
|---|---|---|
| ARM | 45 663 | 28.9 |
| MIPS | 28 569 | 18.1 |
| X86 | 16 956 | 10.7 |
| MC68000 | 11 576 | 7.3 |
| PowerPC | 10 202 | 6.5 |
| SuperH SH | 9 793 | 6.2 |
| Sparc | 7 703 | 4.9 |
| X86-64 | 5 210 | 3.3 |
| Other | 22 239 | 14.1 |

Table 2. Packers used by IoT malware

| Packer type | Samples | Share/% |
|---|---|---|
| UPX variants | 41 026 | 26.0 |
| Standard UPX 3.94 | 15 373 | 9.7 |
| Standard UPX 3.95 | 2 148 | 1.4 |
| Other standard UPX versions | 623 | 0.40 |

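Table 3. Top 10 vulnerabilities exploited by the malware samples

| Vulnerability | Affected devices | Samples found |
|---|---|---|
| CVE-2017-17215 | Huawei HG532 home routers | 18 025 |
| CVE-2014-8361 | Cameras using the Realtek SDK | 7 108 |
| CVE-2017-6884 | Zyxel home routers | 6 945 |
| Redis command execution | Devices running the Redis service | 5 433 |
| CVE-2018-10561 | GPON routers | 2 369 |
| JAWS command execution | VPower DVR, etc. | 1 804 |
| Vacron command execution | Vacron NVR devices | 1 534 |
| Linksys command execution | Linksys series routers | 1 281 |
| Dlink command execution | Dlink series routers | 1 250 |
| Netgear command execution | Netgear series routers | 837 |
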
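Table 4. Labeled IoT malware dataset

| Category | Samples | CPU types covered | Description |
|---|---|---|---|
| 1 | 241 | ARM, MIPS, X86 | Echobot series samples |
| 2 | 123 | ARM, MIPS | Mozi series samples |
| 3 | 2 676 | ARM, MIPS, X86 | UnHAnaA series samples |
| 4 | 598 | ARM, MIPS, X86 | JoSho series samples |
| 5 | 4 501 | ARM, MIPS, X86 | Loligang series samples |
| 6 | 2 185 | ARM, MIPS, X86 | Yakuza series samples |
| 7 | 807 | ARM, MIPS, X86 | Sora series samples |
| 8 | 234 | ARM, MIPS, X86 | Fbot series samples |
| 9 | 913 | ARM, MIPS, X86 | Owari series samples |
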
Table 5. Number of instructions used by malware samples on the three most common CPU architectures

| CPU type | Number of instructions used in samples |
|---|---|
| ARM | 179 |
| MIPS | 198 |
| X86 | 195 |

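Table 6. Classification recall (R) and F1 value for different features and categories

| Category | V1 R/% | V1 F1 | V2 R/% | V2 F1 | V2+V3 R/% | V2+V3 F1 |
|---|---|---|---|---|---|---|
| 1 | 100 | 1.0 | 100 | 1.0 | 100 | 1.0 |
| 2 | 87 | 0.91 | 90 | 0.92 | 92 | 0.93 |
| 3 | 85 | 0.92 | 89 | 0.93 | 90 | 0.93 |
| 4 | 47 | 0.52 | 67 | 0.72 | 70 | 0.74 |
| 5 | 89 | 0.91 | 92 | 0.94 | 94 | 0.95 |
| 6 | 93 | 0.93 | 94 | 0.94 | 96 | 0.96 |
| 7 | 93 | 0.97 | 93 | 0.97 | 94 | 0.97 |
| 8 | 82 | 0.84 | 86 | 0.89 | 88 | 0.91 |
| 9 | 59 | 0.65 | 68 | 0.73 | 69 | 0.73 |
| Mean | 82 | 0.85 | 87 | 0.89 | 88.1 | 0.92 |
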
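
The mean recall values in Table 6 are consistent with an unweighted (macro) average over the nine categories. The short check below recomputes them from the per-category recall columns. The variable names are illustrative, and reading the mean as a macro average is an inference from the numbers rather than something stated in the table.

```python
# Per-category recall (%) copied from Table 6, categories 1 to 9.
recall = {
    "V1":    [100, 87, 85, 47, 89, 93, 93, 82, 59],
    "V2":    [100, 90, 89, 67, 92, 94, 93, 86, 68],
    "V2+V3": [100, 92, 90, 70, 94, 96, 94, 88, 69],
}

# Macro average: every category counts equally, regardless of its size.
macro_recall = {name: sum(values) / len(values)
                for name, values in recall.items()}

for name, value in macro_recall.items():
    print(f"{name}: {value:.1f}%")
# V1: 81.7%
# V2: 86.6%
# V2+V3: 88.1%
```

Because categories 4 and 9 have the lowest recall, they pull the macro average down regardless of how many samples they contain. A sample-weighted average would give a different figure.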