Journal of Beijing University of Aeronautics and Astronautics, 2022, Vol. 48, Issue (2): 282-290. doi: 10.13700/j.bh.1001-5965.2020.0400

• Article

Malicious code clone detection technology based on deep learning

SHEN Yuan1, YAN Hanbing2, XIA Chunhe1, HAN Zhihui2

  1. School of Computer Science and Engineering, Beihang University, Beijing 100083, China;
  2. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
  • Received: 2020-08-09 Published: 2022-03-03
  • Corresponding author: YAN Hanbing, E-mail: yhb@cert.org.cn
  • Supported by:
    National Natural Science Foundation of China (U1736218); National Science and Technology Major Project (2018YFB0804704); Beihang Youth Top Talent Support Program (YWF-20-BJ-J-1038)

Abstract: Malicious code clone detection has become an effective approach to homology analysis of malicious code and to tracing the source of advanced persistent threat (APT) attacks. We collect samples attributed to different APT groups from public threat intelligence and propose a deep learning based clone detection framework. The framework measures the similarity between functions in newly discovered malicious code and the malicious code in a library of known APT groups, so that malware can be analyzed efficiently and the source of an APT attack identified quickly. Malicious code is analyzed statically through disassembly, and its key system function call graph and disassembled code serve as its features. A neural network model then classifies the malicious code in the APT group library. Extensive evaluation and comparison with our previous model (MCrab) show that the improved model outperforms MCrab, detects and classifies malicious code clones effectively, and achieves a high detection rate.

Key words: deep learning, advanced persistent threat (APT) groups, clone detection, control flow graph (CFG), system function call graph
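
As an illustration only, the idea of matching a new sample against a library of known APT groups by its system function call graph can be sketched in Python. This is not the paper's method: the abstract does not specify the network architecture, so a plain Jaccard similarity over call-graph edges stands in for the learned model. The group names, function names, and call graphs below are hypothetical, and a real pipeline would obtain them from a disassembler.

```python
from typing import Dict, FrozenSet, Set, Tuple

Edge = Tuple[str, str]
CallGraph = Dict[str, Set[str]]  # caller -> set of system functions it calls


def call_graph_edges(calls: CallGraph) -> FrozenSet[Edge]:
    """Flatten a caller -> callees mapping into a set of directed edges."""
    return frozenset(
        (caller, callee) for caller, callees in calls.items() for callee in callees
    )


def jaccard(a: FrozenSet[Edge], b: FrozenSet[Edge]) -> float:
    """Jaccard similarity of two edge sets; two empty graphs count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def closest_group(sample: CallGraph, library: Dict[str, CallGraph]) -> Tuple[str, float]:
    """Return the library group most similar to the sample, with its score."""
    sample_edges = call_graph_edges(sample)
    scores = {
        group: jaccard(sample_edges, call_graph_edges(graph))
        for group, graph in library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical library of known groups (assumed data, not from the paper).
LIBRARY: Dict[str, CallGraph] = {
    "group_a": {"main": {"CreateFileA", "WriteFile"}, "drop": {"WinExec"}},
    "group_b": {"main": {"InternetOpenA", "InternetReadFile"}},
}

# Hypothetical newly discovered sample.
SAMPLE: CallGraph = {"main": {"CreateFileA", "WriteFile"}, "drop": {"CreateProcessA"}}

group, score = closest_group(SAMPLE, LIBRARY)
print(group, score)
```

Here the sample shares two of four distinct edges with `group_a` and none with `group_b`, so `group_a` is returned. A set-overlap measure like this ignores instruction-level content, which is one reason the framework described above also uses disassembled code as a feature.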


