Abstract:
Existing Android malware family classification methods build features that are incomplete and drawn from a single perspective. To address this, a malicious APP family classification method based on multi-view feature regularization and a convolutional neural network (CNN) is proposed. The method uses the MinHash algorithm to regularize the raw features from three views, namely Android framework system APIs, opcode sequences, and the combination of permissions and Intents in the AndroidManifest.xml file, while preserving the similarity between APPs. A multi-channel CNN then performs feature extraction and information fusion over the views to build a malicious APP family classification model. Experimental results on the public datasets Drebin, Genome, and AMD show that the malicious APP family classification accuracy exceeds 0.96, demonstrating that the proposed method can fully exploit the behavioral feature information of each view and effectively use the heterogeneity among multi-view features, and that it has strong practical value.
Keywords:
- Android malware
- family classification
- multi-view features
- behavioral semantics
- convolutional neural network (CNN)
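
The similarity-preserving regularization step mentioned in the abstract can be illustrated with a minimal MinHash sketch. This is not the authors' implementation: the hash family (seeded BLAKE2b), the signature length, and the function names `minhash_signature`, `estimated_jaccard`, and `exact_jaccard` are all assumptions made for illustration, and this page does not say how the signatures are laid out as CNN input. The sketch only shows the core property: feature sets of any size map to fixed-length signatures whose agreement rate approximates the Jaccard similarity of the original sets.

```python
import hashlib


def _hash(token: str, seed: int) -> int:
    """Seeded 64-bit hash of a feature token (hash family is an assumption)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features, num_hashes=128):
    """Map a variable-size feature set to a fixed-length MinHash signature."""
    features = set(features)
    if not features:
        raise ValueError("feature set must not be empty")
    # For each seeded hash function, keep the minimum hash over the set.
    return [min(_hash(f, seed) for f in features) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions; estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity of two feature sets, for comparison."""
    a, b = set(a), set(b)
    if not (a | b):
        raise ValueError("at least one set must be non-empty")
    return len(a & b) / len(a | b)


# Hypothetical permission sets for two APPs (names taken from Table 1).
app_a = {
    "android.permission.ACCESS_NETWORK_STATE",
    "android.permission.CAMERA",
    "android.permission.WRITE_CONTACTS",
    "android.permission.REBOOT",
}
app_b = {
    "android.permission.CAMERA",
    "android.permission.WRITE_CONTACTS",
    "android.permission.REBOOT",
    "android.permission.ADD_SYSTEM_SERVICE",
}

sig_a = minhash_signature(app_a, num_hashes=256)
sig_b = minhash_signature(app_b, num_hashes=256)
print("exact:", exact_jaccard(app_a, app_b))
print("estimated:", estimated_jaccard(sig_a, sig_b))
```

Longer signatures reduce the variance of the estimate at the cost of a larger input; the value 256 above is arbitrary.
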

Table 1. System permissions and system-defined Intents

| Element | Count | Examples |
| --- | --- | --- |
| System permissions | 95 | android.permission.ACCESS_NETWORK_STATE, android.permission.CAMERA, android.permission.ADD_SYSTEM_SERVICE, android.permission.WRITE_CONTACTS, android.permission.REBOOT, … |
| System-defined Intents | 85 | android.intent.category.BROWSABLE, com.android.settings.APPLICATION_SETTINGS, com.android.settings.WIFI_IP_SETTINGS, android.intent.action.CALL, android.intent.category.CAR_DOCK, … |

Table 2. Overview of the experimental software and hardware environment

| Item | Configuration |
| --- | --- |
| Dell server | Intel(R) Xeon(R) Gold 5120 CPU 2.20 GHz, GPU Tesla T4×4, Ubuntu 16.04 (64-bit) |
| Development tools | PyTorch 1.3.0, Python 3.5, androguard 3.3.5 |

Table 3. Neural network parameter settings

| Setting | Value |
| --- | --- |
| Optimizer | AdamW |
| Initial learning rate | 0.001 |
| epoch | 100 |
| batch_size | 128 |

Table 4. Ablation test results for malicious family classification

| Train:test ratio | Metric | API | OP | MF | API+OP | API+MF | OP+MF | API+OP+MF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1:10 | Acc | 0.908 | 0.877 | 0.862 | 0.914 | 0.914 | 0.864 | 0.920 |
| 1:10 | Pweight | 0.910 | 0.876 | 0.860 | 0.915 | 0.917 | 0.867 | 0.921 |
| 1:10 | Rweight | 0.908 | 0.877 | 0.862 | 0.914 | 0.914 | 0.864 | 0.920 |
| 1:10 | Fweight | 0.907 | 0.872 | 0.854 | 0.908 | 0.908 | 0.853 | 0.914 |
| 1:5 | Acc | 0.932 | 0.922 | 0.877 | 0.948 | 0.945 | 0.910 | 0.947 |
| 1:5 | Pweight | 0.928 | 0.922 | 0.877 | 0.947 | 0.943 | 0.910 | 0.946 |
| 1:5 | Rweight | 0.932 | 0.922 | 0.870 | 0.948 | 0.945 | 0.910 | 0.947 |
| 1:5 | Fweight | 0.928 | 0.917 | 0.873 | 0.946 | 0.942 | 0.907 | 0.944 |
| 1:2 | Acc | 0.963 | 0.941 | 0.918 | 0.962 | 0.965 | 0.928 | 0.964 |
| 1:2 | Pweight | 0.957 | 0.94 | 0.912 | 0.963 | 0.963 | 0.927 | 0.963 |
| 1:2 | Rweight | 0.963 | 0.941 | 0.918 | 0.962 | 0.965 | 0.928 | 0.964 |
| 1:2 | Fweight | 0.959 | 0.939 | 0.913 | 0.961 | 0.962 | 0.924 | 0.961 |
| 5:1 | Acc | 0.979 | 0.964 | 0.956 | 0.986 | 0.984 | 0.977 | 0.990 |
| 5:1 | Pweight | 0.974 | 0.966 | 0.951 | 0.986 | 0.985 | 0.977 | 0.990 |
| 5:1 | Rweight | 0.979 | 0.964 | 0.956 | 0.986 | 0.984 | 0.977 | 0.990 |
| 5:1 | Fweight | 0.976 | 0.964 | 0.952 | 0.985 | 0.984 | 0.976 | 0.989 |

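
This page does not define the metrics Pweight, Rweight, and Fweight reported in Table 4. A common reading, assumed here, is that they are support-weighted averages of the per-family precision, recall, and F1 score. The sketch below computes them under that assumption; the function name `weighted_metrics` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1.

    Assumption: Pweight/Rweight/Fweight weight each family's score by its
    share of the true labels; the source does not state the definition.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    n = len(y_true)
    support = Counter(y_true)      # true samples per family
    predicted = Counter(y_pred)    # predicted samples per family
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    acc = sum(true_pos.values()) / n
    p_w = r_w = f_w = 0.0
    for label, count in support.items():
        tp = true_pos[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / count
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = count / n
        p_w += weight * precision
        r_w += weight * recall
        f_w += weight * f1
    return {"Acc": acc, "Pweight": p_w, "Rweight": r_w, "Fweight": f_w}


# Toy example with three hypothetical families "a", "b", "c".
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]
print(weighted_metrics(y_true, y_pred))
```

Under this definition, support-weighted recall equals accuracy by construction, since each family's recall is multiplied by its share of the samples. That is consistent with the Acc and Rweight rows coinciding in most entries of Table 4.
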
Table 5. Malicious family sample data

| Dataset | Number of families | Number of APPs |
| --- | --- | --- |
| Drebin | 130 | 5347 |
| Genome | 33 | 1185 |
| AMD | 42 | 5065 |

Table 6. Test results on different datasets

| Dataset | Acc | Pweight | Rweight | Fweight |
| --- | --- | --- | --- | --- |
| Genome | 0.982 | 0.987 | 0.982 | 0.982 |
| Drebin | 0.965 | 0.963 | 0.965 | 0.962 |
| AMD | 0.976 | 0.970 | 0.962 | 0.978 |

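Table 7. Comparative experimental results

| Method | Dataset | Acc |
| --- | --- | --- |
| Dendroid | Genome | 0.942 |
| Apposcopy | Genome | 0.900 |
| DroidSIFT | Genome | 0.930 |
| MudFlow | Genome | 0.881 |
| TriFlow | Genome | 0.881 |
| DroidLegacy | Genome | 0.929 |
| Astroid | Genome | 0.938 |
| Astroid | AMD | 0.943 |
| FalDroid | Genome | 0.972 |
| FalDroid | Drebin | 0.953 |
| This paper | Genome | 0.982 |
| This paper | Drebin | 0.965 |
| This paper | AMD | 0.976 |
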