Android malicious APP multi-view family classification method

HAO Jingwei; LUO Senlin; ZHANG Hanqing; YANG Peng; PAN Limin

doi:10.13700/j.bh.1001-5965.2020.0658

CLC number: V219; TP317
Publication history
- Received: 2020-11-25
- Accepted: 2020-12-25
- Published online: 2022-05-20

1. School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
2. National Computer Network Emergency Response Technical Team and Coordination Center, Beijing 100029, China

Funds:

242 National Information Security Projects 2019A012

2020 Information Security Software Project of the Ministry of Industry and Information Technology CEIEC-2020-ZM02-0134

More Information

Corresponding author: YANG Peng, E-mail: yp@cert.org.cn

Abstract:
To address the incomplete feature construction and single-perspective feature design of existing Android malware family classification methods, this paper proposes a malicious APP family classification method based on multi-view feature regularization and a convolutional neural network (CNN). Using the MinHash algorithm, the method regularizes the raw features of three views (Android framework system APIs, opcode sequences, and the combination of permissions and Intents in the AndroidManifest.xml file) while preserving the similarity between APPs. A multi-path CNN then performs feature extraction and information fusion over the views to build a malicious APP family classification model. Experiments on the public Drebin, Genome, and AMD datasets show a family classification accuracy above 0.96, indicating that the method can fully mine the behavioral feature information of each view and effectively exploit the heterogeneity among multi-view features, which gives it strong practical value.
- Android malware /
- family classification /
- multi-view features /
- behavioral semantics /
- convolutional neural network (CNN)
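
As a rough illustration of the MinHash step described in the abstract, the sketch below maps a variable-size feature set (here, opcode 3-grams) to a fixed-length signature whose agreement rate estimates Jaccard similarity. The shingle length, the number of hash functions, the BLAKE2b-based hash family, and all function names are assumptions made for illustration; this section does not report the settings actually used.

```python
import hashlib


def shingles(tokens, n=3):
    """Return the set of n-grams (as tuples) over a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def _hash(item, seed):
    """Deterministic 64-bit hash of an item under a given seed (assumed hash family)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=64):
    """Fixed-length MinHash signature: one minimum per seeded hash function."""
    if not items:
        raise ValueError("cannot sketch an empty set")
    return [min(_hash(item, seed) for item in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets, for comparison with the estimate."""
    return len(a & b) / len(a | b)


# Hypothetical opcode sequences from two APPs that share most of their code.
ops_a = ["const", "invoke-virtual", "move-result", "if-eqz", "invoke-static", "return"]
ops_b = ["const", "invoke-virtual", "move-result", "if-eqz", "goto", "return"]

set_a, set_b = shingles(ops_a), shingles(ops_b)
sig_a, sig_b = minhash_signature(set_a), minhash_signature(set_b)
print("exact Jaccard:    ", jaccard(set_a, set_b))
print("MinHash estimate: ", estimated_jaccard(sig_a, sig_b))
```

Because every APP yields a signature of the same length regardless of how many APIs or opcodes it contains, signatures from the three views can be fed to fixed-size network inputs while similar APPs still receive similar signatures.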

Figure 1. Framework of the Android malware family classification method based on multi-view feature regularization and a convolutional neural network

Figure 2. Visualization of the three-view features of a plankton family sample

Figure 3. Multi-view CNN structure

Figure 4. OP view CNN structure

Figure 5. API view and MF view CNN structure

Figure 6. Classification accuracy of different models under different data partitioning methods
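
The data partitioning compared in the experiments is expressed as a train:test ratio (1:10, 1:5, 1:2, 5:1). A minimal sketch of one way to produce such a split per family is shown below. Stratifying by family, the rounding rule, the fixed seed, and the function name are all assumptions; the section does not state how the split was actually drawn.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_parts, test_parts, seed=0):
    """Split sample indices family by family so that train:test is roughly
    train_parts:test_parts within every family."""
    by_family = defaultdict(list)
    for index, family in enumerate(labels):
        by_family[family].append(index)

    rng = random.Random(seed)
    fraction = train_parts / (train_parts + test_parts)
    train, test = [], []
    for family in sorted(by_family):
        indices = by_family[family]
        rng.shuffle(indices)
        # Keep at least one training sample per family and, when the family
        # has more than one sample, at least one test sample as well.
        cut = max(1, round(len(indices) * fraction))
        if len(indices) > 1:
            cut = min(cut, len(indices) - 1)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Hypothetical label list: 12 samples of one family, 6 of another.
labels = ["plankton"] * 12 + ["family_b"] * 6
for train_parts, test_parts in [(1, 10), (1, 5), (1, 2), (5, 1)]:
    train, test = stratified_split(labels, train_parts, test_parts)
    print(f"{train_parts}:{test_parts} -> {len(train)} train, {len(test)} test")
```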

Table 1. System permissions and system-defined Intents

Element	Count	Examples
System permissions	95	android.permission.ACCESS_NETWORK_STATE android.permission.CAMERA android.permission.ADD_SYSTEM_SERVICE android.permission.WRITE_CONTACTS android.permission.REBOOT ⋮
System-defined Intents	85	android.intent.category.BROWSABLE com.android.settings.APPLICATION_SETTINGS com.android.settings.WIFI_IP_SETTINGS android.intent.action.CALL android.intent.category.CAR_DOCK ⋮
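
To make the manifest view concrete, the following sketch collects permission and Intent names from an already decoded AndroidManifest.xml and encodes them as a binary vector over a fixed vocabulary. The sample manifest, the six-entry vocabulary (taken from the examples in Table 1), and the function names are illustrative assumptions; the actual vocabulary has 95 permissions and 85 Intents, and the manifest inside an APK is stored in a binary format that must be decoded first.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree attaches to android:* attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Illustrative subset of the Table 1 vocabulary (assumption: order is arbitrary).
VOCABULARY = [
    "android.permission.ACCESS_NETWORK_STATE",
    "android.permission.CAMERA",
    "android.permission.REBOOT",
    "android.intent.action.CALL",
    "android.intent.category.BROWSABLE",
    "android.intent.category.CAR_DOCK",
]

# Hypothetical decoded manifest used only for this example.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.demo">
  <uses-permission android:name="android.permission.CAMERA"/>
  <uses-permission android:name="android.permission.REBOOT"/>
  <application>
    <activity android:name=".Main">
      <intent-filter>
        <action android:name="android.intent.action.CALL"/>
        <category android:name="android.intent.category.BROWSABLE"/>
      </intent-filter>
    </activity>
  </application>
</manifest>"""


def manifest_items(manifest_xml):
    """Collect requested permissions and Intent action/category names."""
    root = ET.fromstring(manifest_xml)
    items = set()
    for tag in ("uses-permission", "action", "category"):
        for node in root.iter(tag):
            name = node.get(ANDROID_NS + "name")
            if name:
                items.add(name)
    return items


def encode(items, vocabulary):
    """Binary presence vector of the collected names over the vocabulary."""
    return [1 if name in items else 0 for name in vocabulary]


print(encode(manifest_items(SAMPLE_MANIFEST), VOCABULARY))
```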

Table 2. Experimental software and hardware environment

Item	Configuration
Dell server	Intel(R) Xeon(R) Gold 5120 CPU @ 2.20 GHz, 4× Tesla T4 GPU, Ubuntu 16.04 64-bit
Development tools	PyTorch 1.3.0, Python 3.5, Androguard 3.3.5

Table 3. Neural network training parameters

Setting	Value
Optimizer	AdamW
Initial learning rate	0.001
epoch	100
batch_size	128
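
For readers unfamiliar with the optimizer named in Table 3, the sketch below performs AdamW updates on a single scalar parameter in plain Python, using the table's initial learning rate of 0.001. The betas, epsilon, and weight decay are assumed common defaults rather than values reported in the table, and the function name and toy objective are hypothetical.

```python
import math


def adamw_step(param, grad, state, lr=0.001, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter; `state` holds t, m, and v."""
    beta1, beta2 = betas
    state["t"] += 1
    # Decoupled weight decay: shrink the parameter directly instead of
    # adding an L2 term to the gradient.
    param *= 1.0 - lr * weight_decay
    # Exponential moving averages of the gradient and its square.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
state = {"t": 0, "m": 0.0, "v": 0.0}
w = 1.0
for _ in range(100):
    w = adamw_step(w, 2 * (w - 3), state)
print("w after 100 steps:", w)
```

With a learning rate this small, each step moves the parameter by at most roughly 0.001, which is why the toy run above covers only a small part of the distance to the optimum in 100 steps.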

Table 4. Ablation results for malicious family classification

Train:test ratio	Metric	API	OP	MF	API+OP	API+MF	OP+MF	API+OP+MF
1∶10	Acc	0.908	0.877	0.862	0.914	0.914	0.864	0.920
	P_weight	0.910	0.876	0.860	0.915	0.917	0.867	0.921
	R_weight	0.908	0.877	0.862	0.914	0.914	0.864	0.920
	F_weight	0.907	0.872	0.854	0.908	0.908	0.853	0.914
1∶5	Acc	0.932	0.922	0.877	0.948	0.945	0.910	0.947
	P_weight	0.928	0.922	0.877	0.947	0.943	0.910	0.946
	R_weight	0.932	0.922	0.870	0.948	0.945	0.910	0.947
	F_weight	0.928	0.917	0.873	0.946	0.942	0.907	0.944
1∶2	Acc	0.963	0.941	0.918	0.962	0.965	0.928	0.964
	P_weight	0.957	0.940	0.912	0.963	0.963	0.927	0.963
	R_weight	0.963	0.941	0.918	0.962	0.965	0.928	0.964
	F_weight	0.959	0.939	0.913	0.961	0.962	0.924	0.961
5∶1	Acc	0.979	0.964	0.956	0.986	0.984	0.977	0.990
	P_weight	0.974	0.966	0.951	0.986	0.985	0.977	0.990
	R_weight	0.979	0.964	0.956	0.986	0.984	0.977	0.990
	F_weight	0.976	0.964	0.952	0.985	0.984	0.976	0.989
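
The metrics Acc, P_weight, R_weight, and F_weight can be read as accuracy and support-weighted precision, recall, and F1. The sketch below computes them from true and predicted family labels under that assumption; the weighting scheme, function name, and toy labels are not taken from the paper. Under support weighting, weighted recall is algebraically equal to accuracy, which agrees with most rows of Table 4.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus precision/recall/F1 averaged over families,
    each family weighted by its share of the true labels."""
    n = len(y_true)
    support = Counter(y_true)      # true samples per family
    predicted = Counter(y_pred)    # predictions per family
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    p_w = r_w = f_w = 0.0
    for family, count in support.items():
        precision = correct[family] / predicted[family] if predicted[family] else 0.0
        recall = correct[family] / count
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = count / n
        p_w += weight * precision
        r_w += weight * recall
        f_w += weight * f1

    return {
        "Acc": sum(correct.values()) / n,
        "P_weight": p_w,
        "R_weight": r_w,
        "F_weight": f_w,
    }


# Toy example with two hypothetical families.
y_true = ["plankton"] * 4 + ["family_b"] * 2
y_pred = ["plankton"] * 5 + ["family_b"]
for name, value in weighted_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```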

Table 5. Malicious family sample data

Dataset	Number of families	Number of APPs
Drebin	130	5,347
Genome	33	1,185
AMD	42	5,065

Table 6. Test results on different datasets

Dataset	Acc	P_weight	R_weight	F_weight
Genome	0.982	0.987	0.982	0.982
Drebin	0.965	0.963	0.965	0.962
AMD	0.976	0.970	0.962	0.978

Table 7. Comparative experimental results

Method	Dataset	Acc
Dendroid	Genome	0.942
Apposcopy	Genome	0.900
DroidSIFT	Genome	0.930
MudFlow	Genome	0.881
TriFlow	Genome	0.881
DroidLegacy	Genome	0.929
Astroid	Genome	0.938
Astroid	AMD	0.943
FalDroid	Genome	0.972
FalDroid	Drebin	0.953
This paper	Genome	0.982
This paper	Drebin	0.965
This paper	AMD	0.976
