Malicious code detection based on heterogeneous information network

LIU Yashu, HOU Yueran, YAN Hanbing

Citation: LIU Yashu, HOU Yueran, YAN Hanbing, et al. Malicious code detection based on heterogeneous information network[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(2): 258-265. doi: 10.13700/j.bh.1001-5965.2020.0539 (in Chinese)

doi: 10.13700/j.bh.1001-5965.2020.0539
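Funding:

National Key R&D Program of China 2018YFB0803604

National Key R&D Program of China 2018YFB0804704

National Natural Science Foundation of China U1736218

Fundamental Research Funds for Beijing University of Civil Engineering and Architecture X20152

Details
    Corresponding author:

    YAN Hanbing, E-mail: yhb@cert.org.cn

  • CLC number: TP393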

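  • Abstract:

    Malicious code poses a serious threat to network security and information security. Detecting malicious code quickly, and preventing or reducing the damage it causes, remains an urgent problem. By collecting dynamic information from malicious applications and constructing a heterogeneous information network (HIN), this paper proposes a method for describing the dynamic features of malicious code and implements malicious code detection and classification. Four meta graphs over three object types (FILE, API, and DLL) are built to describe the network schema of the malicious code HIN. An improved random walk strategy captures as much context information as possible for the object nodes in each meta graph. This context is used as the input to the continuous bag-of-words (CBOW) model, which yields a network embedding in the form of word vectors. The principal angle analysis model is improved with a voting method to obtain classification results that fuse the features of multiple meta graphs. When only limited information is available, the method greatly improves the accuracy of malicious sample classification based on single meta graph features.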
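The walk-then-embed step described in the abstract can be illustrated with a minimal sketch. It is not the paper's improved strategy: it assumes a plain meta-path-guided walk (FILE-API-DLL-API-FILE) rather than a full meta graph, and the toy edge list, node names, and the functions `node_type`, `build_adjacency`, and `guided_walk` are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy network; each node name carries its type as a prefix.
EDGES = [
    ("FILE:a.exe", "API:CreateFileW"),
    ("FILE:a.exe", "API:WriteFile"),
    ("FILE:b.exe", "API:CreateFileW"),
    ("API:CreateFileW", "DLL:kernel32.dll"),
    ("API:WriteFile", "DLL:kernel32.dll"),
]


def node_type(node):
    """Return the type prefix of a node, e.g. 'FILE' for 'FILE:a.exe'."""
    return node.split(":", 1)[0]


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def guided_walk(adj, start, pattern, length, rng):
    """Random walk that only steps to neighbours of the type the pattern expects.

    `pattern` is a cyclic type sequence whose last type equals its first,
    e.g. FILE-API-DLL-API-FILE, so the repeat is dropped before cycling.
    """
    if node_type(start) != pattern[0]:
        raise ValueError("start node type must match the first pattern type")
    cycle = pattern[:-1]
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in adj[walk[-1]] if node_type(n) == wanted]
        if not candidates:  # dead end: stop the walk early
            break
        walk.append(rng.choice(candidates))
    return walk


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    pattern = ["FILE", "API", "DLL", "API", "FILE"]
    print(guided_walk(adj, "FILE:a.exe", pattern, 9, random.Random(0)))
```

Each walk is a sequence of node names that can be treated as a sentence. A corpus of such walks could then be passed to a CBOW implementation (for example gensim's `Word2Vec` with `sg=0`) to obtain node vectors; that step is omitted here to keep the sketch dependency-free.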

     

  • Figure 1. Examples of meta path and meta graph

    Figure 2. Examples of meta graph M

    Figure 3. Network schema of malicious code

    Figure 4. Four types of meta graph

    Figure 5. Example of matrix I on relationship R1

    Table 1. Classification results of each meta graph model using kNN

    Metric                     S1      S2      S3      S4
    Classification accuracy    0.978   0.96    0.899   0.910
    False positive rate        0.021   0.037   0.085   0.078
    False negative rate        0.022   0.041   0.101   0.090
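The three metrics reported in these tables can be computed as in the following sketch. It assumes the standard binary definitions with label 1 meaning malicious; the page does not state the paper's exact definitions or any multi-class averaging, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, false positive rate, false negative rate).

    Assumes binary labels where 1 = malicious and 0 = benign.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty classes so the ratios stay defined.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, fpr, fnr


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 0, 1]
    print(binary_metrics(truth, preds))
```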

    Table 2. Classification results of each meta graph model using RF

    Metric                     S1      S2      S3      S4
    Classification accuracy    0.942   0.923   0.869   0.858
    False positive rate        0.053   0.067   0.104   0.112
    False negative rate        0.058   0.078   0.131   0.139

    Table 3. Classification results of each meta graph model using linear SVM

    Metric                     S1      S2      S3      S4
    Classification accuracy    0.960   0.936   0.882   0.899
    False positive rate        0.040   0.060   0.099   0.087
    False negative rate        0.040   0.063   0.118   0.100

    Table 4. Weight values of meta graphs

    S     S1          S2          S3          S4
    S1    0.00016     1.6599      0.5868      1.8205
    S2    1.6599      0.0001644   0.9473      1.9721
    S3    0.5868      0.9473      0.0002314   0.9055
    S4    1.8205      1.9721      0.9055      0.0001769
    α     0.2434      0.2484      0.236       0.2723
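One plausible way to fuse per-meta-graph predictions by voting is sketched below. It assumes that the α row of Table 4 acts as a per-meta-graph vote weight; this page does not give the paper's actual fusion rule, so the function `weighted_vote` and the example class labels are hypothetical.

```python
from collections import defaultdict

# α values copied from Table 4; using them as vote weights is an assumption.
ALPHA = {"S1": 0.2434, "S2": 0.2484, "S3": 0.236, "S4": 0.2723}


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    `predictions` maps a meta graph name to the label its classifier chose.
    """
    scores = defaultdict(float)
    for name, label in predictions.items():
        scores[label] += weights[name]
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical per-meta-graph predictions for one sample.
    sample = {"S1": "trojan", "S2": "trojan", "S3": "trojan", "S4": "worm"}
    print(weighted_vote(sample, ALPHA))
```

With weights this close to uniform, the result mostly follows a simple majority. The weights matter only in ties, for example when S1 and S3 disagree with S2 and S4.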

    Table 5. Classification results using principal angle hybrid method

    Metric                     kNN      RF       SVM
    Classification accuracy    0.9650   0.9279   0.9375
    False positive rate        0.0333   0.0650   0.0593
    False negative rate        0.0344   0.0723   0.0635

    Table 6. Classification results of each meta graph model with two sandboxes results using kNN

    Metric                     S1       S2      S3      S4
    Classification accuracy    0.9783   0.957   0.895   0.911
    False positive rate        0.021    0.036   0.083   0.078
    False negative rate        0.022    0.042   0.101   0.089
Publication history
  • Received: 2020-09-23
  • Accepted: 2020-12-18
  • Published online: 2022-02-20
