Large-scale IoT malware analysis and classification method

HE Qinglin, WANG Lihong, LUO Bing, YANG Libin

Citation: HE Qinglin, WANG Lihong, LUO Bing, et al. Large-scale IoT malware analysis and classification method[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(2): 240-248. doi: 10.13700/j.bh.1001-5965.2020.0401 (in Chinese)

doi: 10.13700/j.bh.1001-5965.2020.0401

Funds: National Key R & D Program of China (2017YFC1201204)

Corresponding author: WANG Lihong, E-mail: wlh@isc.org.cn

Chinese Library Classification (CLC) number: TP393.4; TP312

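  • Abstract:

    Internet of Things (IoT) malware is evolving rapidly and attacks IoT devices of all kinds at scale. Because much of its source code is openly available, family-level characteristics are indistinct, so a finer-grained classification method is needed to support tasks such as discovering advanced threat samples and tracking attack groups. To address this problem, a large-scale analysis was carried out on 157 911 IoT malware samples captured from May 2019 to May 2020, and a dataset of 12 278 samples covering 9 family branches was labeled. A classification method for IoT malware is proposed: static reverse-engineering analysis extracts complex structural features such as function call graphs (FCGs) and text, and features learned through graph representation learning and text representation learning are used for classification. On the labeled dataset the method achieves an average recall of 88.1%. The authors report that the method also performs well in practical use.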
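The abstract does not say which graph representation technique is used. As a minimal sketch of one common way to turn an FCG into a comparable feature bag, the example below uses Weisfeiler-Lehman (WL) label refinement as a stand-in, with each function's out-degree as its initial label. The toy graphs, the function names, and the helpers `out_degree_labels`, `wl_features`, and `similarity` are all hypothetical and are not taken from the paper.

```python
from collections import Counter
import hashlib


def out_degree_labels(fcg):
    """Initial label of each function: its number of callees (an assumption)."""
    return {node: str(len(callees)) for node, callees in fcg.items()}


def wl_features(fcg, iterations=2):
    """Return a bag (Counter) of Weisfeiler-Lehman labels for a call graph.

    fcg maps each function to the list of functions it calls; every callee
    must also appear as a key.
    """
    current = out_degree_labels(fcg)
    bag = Counter(current.values())
    for _ in range(iterations):
        updated = {}
        for node, callees in fcg.items():
            # Combine the node's label with the sorted labels of its callees.
            callee_labels = ",".join(sorted(current[c] for c in callees))
            signature = current[node] + "|" + callee_labels
            updated[node] = hashlib.sha1(signature.encode()).hexdigest()[:8]
        current = updated
        bag.update(current.values())
    return bag


def similarity(bag_a, bag_b):
    """Shared labels divided by the size of the larger bag (1.0 = identical)."""
    shared = sum((bag_a & bag_b).values())
    largest = max(sum(bag_a.values()), sum(bag_b.values()))
    return shared / largest if largest else 1.0


# Hypothetical toy graphs: sample_b has the same call structure as sample_a
# with different function names; sample_c has a different structure.
sample_a = {"main": ["scan", "attack"], "scan": ["connect"],
            "attack": ["connect"], "connect": []}
sample_b = {"f1": ["f2", "f3"], "f2": ["f4"], "f3": ["f4"], "f4": []}
sample_c = {"f1": ["f2"], "f2": []}

bag_a, bag_b, bag_c = (wl_features(g) for g in (sample_a, sample_b, sample_c))
print(similarity(bag_a, bag_b))  # 1.0, same structure despite renamed functions
print(similarity(bag_a, bag_c))  # below 1.0, different structure
```

Because the labels depend only on call structure, renaming functions (as happens in stripped binaries) does not change the bag. A real pipeline would feed such bags, or learned embeddings, into a classifier.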

     

  • Figure 1.  Schematic diagram of sample analysis and processing

    Figure 2.  Schematic diagram of the FCGs of two Mozi samples

    Figure 3.  Schematic diagram of the sample feature extraction and classification method

    Figure 4.  Vector feature learning based on text representation learning

    Table 1.  CPU architecture distribution of the 157 911 malware samples

    | CPU architecture | Samples | Share/% |
    |---|---|---|
    | ARM | 45 663 | 28.9 |
    | MIPS | 28 569 | 18.1 |
    | X86 | 16 956 | 10.7 |
    | MC68000 | 11 576 | 7.3 |
    | PowerPC | 10 202 | 6.5 |
    | SuperH SH | 9 793 | 6.2 |
    | Sparc | 7 703 | 4.9 |
    | X86-64 | 5 210 | 3.3 |
    | Other | 22 239 | 14.1 |
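The share column of Table 1 follows directly from the sample counts. The short check below recomputes it; the variable names are illustrative only, and the counts are copied from the table.

```python
# Sample counts per CPU architecture, copied from Table 1.
counts = {
    "ARM": 45663, "MIPS": 28569, "X86": 16956, "MC68000": 11576,
    "PowerPC": 10202, "SuperH SH": 9793, "Sparc": 7703,
    "X86-64": 5210, "Other": 22239,
}

total = sum(counts.values())
# Share of each architecture in percent, rounded to one decimal place.
shares = {arch: round(100 * n / total, 1) for arch, n in counts.items()}

print(total)  # 157911
for arch, share in shares.items():
    print(f"{arch}: {share}%")
```

The counts sum to the 157 911 samples stated in the abstract, and ARM and MIPS together account for almost half of them.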

    Table 2.  Packers used by IoT malware

    | Packer type | Samples | Share/% |
    |---|---|---|
    | UPX variants | 41 026 | 26.0 |
    | Standard UPX 3.94 | 15 373 | 9.7 |
    | Standard UPX 3.95 | 2 148 | 1.4 |
    | Other standard UPX versions | 623 | 0.4 |

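    Table 3.  Top 10 vulnerabilities exploited by the samples

    | Vulnerability | Affected devices | Samples found |
    |---|---|---|
    | CVE-2017-17215 | Huawei HG532 home routers | 18 025 |
    | CVE-2014-8361 | Cameras using the Realtek SDK | 7 108 |
    | CVE-2017-6884 | Zyxel home routers | 6 945 |
    | Redis command execution | Devices running the Redis service | 5 433 |
    | CVE-2018-10561 | GPON routers | 2 369 |
    | JAWS command execution | VPower DVR, etc. | 1 804 |
    | Vacron command execution | Vacron NVR devices | 1 534 |
    | Linksys command execution | Linksys series routers | 1 281 |
    | Dlink command execution | Dlink series routers | 1 250 |
    | Netgear command execution | Netgear series routers | 837 |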

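    Table 4.  Labeled IoT malware dataset

    | Category | Samples | CPU types covered | Description |
    |---|---|---|---|
    | 1 | 241 | ARM, MIPS, X86 | Echobot series samples |
    | 2 | 123 | ARM, MIPS | Mozi series samples |
    | 3 | 2 676 | ARM, MIPS, X86 | UnHAnaA series samples |
    | 4 | 598 | ARM, MIPS, X86 | JoSho series samples |
    | 5 | 4 501 | ARM, MIPS, X86 | Loligang series samples |
    | 6 | 2 185 | ARM, MIPS, X86 | Yakuza series samples |
    | 7 | 807 | ARM, MIPS, X86 | Sora series samples |
    | 8 | 234 | ARM, MIPS, X86 | Fbot series samples |
    | 9 | 913 | ARM, MIPS, X86 | Owari series samples |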

    Table 5.  Number of instructions used in malware samples for the three most common CPU architectures

    | CPU type | Instructions used in samples |
    |---|---|
    | ARM | 179 |
    | MIPS | 198 |
    | X86 | 195 |

    Table 6.  Classification recall (R) and F1 score for different feature sets and categories

    | Category | V1 R/% | V1 F1 | V2 R/% | V2 F1 | V2+V3 R/% | V2+V3 F1 |
    |---|---|---|---|---|---|---|
    | 1 | 100 | 1.0 | 100 | 1.0 | 100 | 1.0 |
    | 2 | 87 | 0.91 | 90 | 0.92 | 92 | 0.93 |
    | 3 | 85 | 0.92 | 89 | 0.93 | 90 | 0.93 |
    | 4 | 47 | 0.52 | 67 | 0.72 | 70 | 0.74 |
    | 5 | 89 | 0.91 | 92 | 0.94 | 94 | 0.95 |
    | 6 | 93 | 0.93 | 94 | 0.94 | 96 | 0.96 |
    | 7 | 93 | 0.97 | 93 | 0.97 | 94 | 0.97 |
    | 8 | 82 | 0.84 | 86 | 0.89 | 88 | 0.91 |
    | 9 | 59 | 0.65 | 68 | 0.73 | 69 | 0.73 |
    | Mean | 82 | 0.85 | 87 | 0.89 | 88.1 | 0.92 |
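The mean recall values in Table 6 are consistent with an unweighted (macro) average over the nine categories, which is where the 88.1% figure in the abstract comes from. The sketch below recomputes them from the per-category recall values in the table; treating the mean as unweighted is an assumption inferred from the numbers, not something the page states.

```python
# Per-category recall (%) for categories 1 to 9, copied from Table 6.
recall = {
    "V1":    [100, 87, 85, 47, 89, 93, 93, 82, 59],
    "V2":    [100, 90, 89, 67, 92, 94, 93, 86, 68],
    "V2+V3": [100, 92, 90, 70, 94, 96, 94, 88, 69],
}

# Macro average: every category counts equally, regardless of its size.
macro_recall = {name: sum(values) / len(values) for name, values in recall.items()}

for name, value in macro_recall.items():
    print(f"{name}: {value:.1f}%")
```

A macro average gives small categories such as Mozi (123 samples) the same weight as large ones such as Loligang (4 501 samples), so the weaker recall on categories 4 and 9 pulls the mean down noticeably.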
Publication history
  • Received: 2020-08-09
  • Accepted: 2020-09-05
  • Published online: 2022-02-20
