
Low redundancy feature selection method for Android malware detection

HAO Jingwei, PAN Limin, LI Rui, YANG Peng, LUO Senlin

Citation: HAO Jingwei, PAN Limin, LI Rui, et al. Low redundancy feature selection method for Android malware detection[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(2): 225-232. doi: 10.13700/j.bh.1001-5965.2020.0567 (in Chinese)


doi: 10.13700/j.bh.1001-5965.2020.0567
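Funds:

242 National Information Security Projects 2019A012

2020 Information Security Software Project of the Ministry of Industry and Information Technology CEIEC-2020-ZM02-0134

Details
    Corresponding author:

    YANG Peng, E-mail: yp@cert.org.cn

  • CLC number: V219; TP317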

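  • Abstract:

    Feature selection for Android malware detection tends to pay excessive attention to features that have the same frequency distribution in both classes, which leads to feature redundancy. To address this problem, a low redundancy feature selection method for Android malware detection is proposed. First, the Mann-Whitney test is used to select the features whose frequency distributions differ between classes. Second, the appearance ratio interval algorithm quantifies the degree of deviation and the frequency of occurrence of each feature, and features with low deviation or low usage across all software are eliminated. Finally, the optimal feature subset is obtained by combining the particle swarm optimization algorithm with the detection performance of a classifier. Experiments were conducted on the public datasets DREBIN and AMD. On the AMD dataset, 294 features were selected, and the detection accuracy of 6 classifiers improved by 1%-5% after feature selection. On the DREBIN dataset, 295 features were selected, fewer than those of 4 comparison methods, and the detection accuracy of 6 classifiers improved by 1.7%-5% after feature selection. The experimental results show that the proposed method reduces feature redundancy in Android malware detection and improves the accuracy of malware detection.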
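The final step of the abstract, searching for a feature subset with particle swarm optimization guided by classifier performance, can be illustrated with a minimal binary PSO sketch. This is not the authors' implementation: the update rule is the standard sigmoid-based binary PSO, and the inertia and acceleration values, the `toy_fitness` function, and the `INFORMATIVE` index set are all hypothetical stand-ins. In the paper's setting the fitness would be the detection performance of a classifier trained on the masked features.

```python
import math
import random

def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximize fitness(mask) over 0/1 masks of length n_features.

    Standard binary PSO: velocities are real-valued, and a sigmoid of the
    velocity gives the probability that the corresponding bit is set to 1.
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]  # best fitness found so far, per iteration

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp to avoid saturation
                vel[i][d] = v
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history

# Hypothetical stand-in for "classifier performance on the selected features":
# reward a few useful indices and penalize the size of the subset.
INFORMATIVE = {0, 2, 5}

def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

best_mask, best_fit, history = binary_pso(toy_fitness, n_features=8)
print(best_mask, round(best_fit, 3))
```

The size penalty in `toy_fitness` mirrors the goal of a small, low redundancy subset; how the paper weighs subset size against accuracy is not stated in this section.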

     

  • Figure 1.  Framework of low redundancy feature selection method for Android malware detection

    Table 1.  Initial feature information

    | Feature type | Count | Feature |
    | --- | --- | --- |
    | System permission | 24 | READ_CALENDAR |
    | | | WRITE_CALENDAR |
    | | | CAMERA |
    | | | READ_CONTACTS |
    | | | WRITE_CONTACTS |
    | | | ... |
    | API | 620 | android/net/ConnectivityManager; startUsingNetworkFeature |
    | | | android/net/wifi/WifiManager; enableNetwork |
    | | | android/net/wifi/WifiManager; disconnect |
    | | | android/net/wifi/WifiManager; setWifiEnabled |
    | | | ... |
    | Inter-component communication Intent | 45 | android.intent.action.MAIN |
    | | | android.intent.action.VIEW |
    | | | android.intent.action.ATTACH_DATA |
    | | | android.intent.action.EDIT |
    | | | android.intent.action.PICK |
    | | | ... |

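    Table 2.  Mann-Whitney test input matrix

    | Dataset | Feature fi | Dataset | Feature fi |
    | --- | --- | --- | --- |
    | Malware 1 | 0 | Benign software 1 | 0 |
    | Malware 2 | 0 | Benign software 2 | 0 |
    | Malware 3 | 0 | Benign software 3 | 1 |
    | Malware n | Bf | Benign software m | Mf |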
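The input layout in Table 2 (one 0/1 indicator per application for feature fi, split into a malware column and a benign column) can be fed to a Mann-Whitney test as sketched below. This is a standard-library sketch under stated assumptions, not the authors' code: it uses the two-sided normal approximation with tie correction and no continuity correction, and the significance level `alpha=0.05` and the function names are illustrative. A library routine such as `scipy.stats.mannwhitneyu` could be used instead.

```python
import math
from collections import Counter

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = Counter(list(x) + list(y))
    # Midrank of each distinct value: the average of the ranks it occupies.
    midrank = {}
    start = 1
    for value in sorted(counts):
        t = counts[value]
        midrank[value] = start + (t - 1) / 2.0
        start += t
    rank_sum_x = sum(midrank[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0  # every observation is identical: no evidence of a shift
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value

def select_shifted_features(malware_rows, benign_rows, alpha=0.05):
    """Return indices of features whose distributions differ between classes."""
    selected = []
    for j in range(len(malware_rows[0])):
        malware_col = [row[j] for row in malware_rows]
        benign_col = [row[j] for row in benign_rows]
        _, p_value = mann_whitney_u(malware_col, benign_col)
        if p_value < alpha:
            selected.append(j)
    return selected

# Hypothetical data: feature 0 appears in 80% of malware but 20% of benign
# apps; feature 1 appears in 50% of both classes.
malware = [[1, 1]] * 40 + [[1, 0]] * 40 + [[0, 1]] * 10 + [[0, 0]] * 10
benign = [[1, 1]] * 10 + [[1, 0]] * 10 + [[0, 1]] * 40 + [[0, 0]] * 40
print(select_shifted_features(malware, benign))
```

In this toy input only feature 0 shows a frequency shift between the classes, so it is the only one kept; feature 1 has the same frequency in both classes and is the kind of feature the method aims to discard.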

    Table 3.  Software resources used in experiment

    | Software | Source |
    | --- | --- |
    | Python 3.7.3 | https://www.python.org/ (open source) |
    | Anaconda 4.3.3 | https://www.anaconda.com/ (open source) |

    Table 4.  Hardware resources used in experiment

    | Name | Description |
    | --- | --- |
    | Experimental computer | Mac |
    | Operating system | macOS High Sierra |
    | Network configuration | 3.1 GHz Intel Core i5 processor, 8 GB 2133 MHz memory |

    Table 5.  Description of dataset

    | Dataset | Total software | Malware | Benign software | Classes |
    | --- | --- | --- | --- | --- |
    | DREBIN | 11 120 | 5 560 | 5 560 | 2 |
    | AMD | 49 300 | 24 650 | 24 650 | 2 |

    Table 6.  Optimal feature subset of AMD dataset

    | Feature type | Count | Feature (AMD dataset) |
    | --- | --- | --- |
    | System permission | 14 | ACCESS_NETWORK_STATE |
    | | | ACCESS_WIFI_STATE |
    | | | BROADCAST_STICKY |
    | | | CAMERA |
    | | | GET_TASKS |
    | | | ... |
    | API | 262 | android/net/wifi/WifiManager; enableNetwork |
    | | | android/net/wifi/WifiManager; setWifiEnabled |
    | | | Dex Class Loader |
    | | | ... |
    | Inter-component communication Intent | 18 | android.intent.action.ACTION_SHUTDOWN |
    | | | android.intent.action.AIRPLANE_MODE |
    | | | android.intent.action.BOOT_COMPLETED |
    | | | android.intent.action.MEDIA_MOUNTED |
    | | | android.intent.action.SEARCH |
    | | | ... |

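    Table 7.  Experimental results of feature selection (AMD dataset)

    Original = original feature set (689 dimensions); Optimal = optimal feature set (294 dimensions).

    | Method | Original Accuracy/% | Original Precision/% | Original Recall/% | Original F1 | Optimal Accuracy/% | Optimal Precision/% | Optimal Recall/% | Optimal F1 |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | GBDT | 95.0 | 94.9 | 95.2 | 0.950 | 95.6 | 96.2 | 95.1 | 0.956 |
    | MLP | 96.0 | 96.1 | 96.1 | 0.961 | 96.4 | 96.0 | 97.4 | 0.970 |
    | LR | 92.9 | 90.2 | 95.5 | 0.927 | 94.6 | 95.2 | 95.2 | 0.947 |
    | AdaBoost | 93.2 | 92.3 | 94.2 | 0.932 | 93.8 | 94.3 | 93.5 | 0.939 |
    | NB | 86.4 | 85.3 | 87.5 | 0.864 | 85.0 | 94.5 | 88.9 | 0.864 |
    | RF | 97.6 | 96.7 | 98.5 | 0.974 | 98.2 | 98.1 | 98.4 | 0.983 |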
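The four metrics reported in the result tables can be computed from a confusion matrix as in the sketch below. It assumes the standard definitions with malware as the positive class; the page does not state how the authors averaged or rounded their values, and the counts in the usage line are hypothetical, not taken from the experiments.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp: malware predicted as malware    fp: benign predicted as malware
    fn: malware predicted as benign     tn: benign predicted as benign
    Assumes at least one predicted positive and one actual positive.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print({name: round(value, 3) for name, value in metrics.items()})
```

The tables report Accuracy, Precision, and Recall as percentages and F1 as a fraction, so the first three values would be multiplied by 100 to match that layout.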

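    Table 8.  Experimental results of feature selection (DREBIN dataset)

    Original = original feature set (689 dimensions); Optimal = optimal feature set (295 dimensions).

    | Method | Original Accuracy/% | Original Precision/% | Original Recall/% | Original F1 | Optimal Accuracy/% | Optimal Precision/% | Optimal Recall/% | Optimal F1 |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | GBDT | 96.8 | 90.4 | 94.3 | 0.924 | 96.9 | 92.0 | 94.5 | 0.935 |
    | MLP | 97.0 | 93.3 | 92.7 | 0.930 | 98.4 | 95.0 | 94.0 | 0.944 |
    | LR | 96.3 | 88.8 | 93.3 | 0.924 | 96.1 | 89.0 | 93.1 | 0.924 |
    | AdaBoost | 96.0 | 90.0 | 91.3 | 0.903 | 96.8 | 89.9 | 92.2 | 0.916 |
    | NB | 94.6 | 82.1 | 91.4 | 0.865 | 95.3 | 86.7 | 91.9 | 0.871 |
    | RF | 98.1 | 93.2 | 98.2 | 0.956 | 98.9 | 94.9 | 98.9 | 0.967 |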

    Table 9.  Comparison of experimental results among feature selection methods

    | Method | Number of features | Accuracy/% | Precision/% | Recall/% | F1 |
    | --- | --- | --- | --- | --- | --- |
    | Ref. [18] | 364 | 96.5 | 96.0 | 97.3 | 0.967 |
    | Ref. [19] | 394 | 96.4 | 96.4 | 96.9 | 0.967 |
    | Ref. [20] | 400 | 97.0 | 96.8 | 97.3 | 0.971 |
    | Ref. [21] | 314 | 96.4 | 96.4 | 96.8 | 0.966 |
    | Ref. [22] | 22 | 96.1 | 97.0 | 97.4 | 0.962 |
    | Proposed method | 294 | 98.2 | 98.1 | 98.4 | 0.983 |
  • [1] China Internet Network Information Center. The 44th China statistical reports on internet development[R]. Beijing: China Internet Network Information Center, 2019 (in Chinese).
    [2] International Data Corporation. Worldwide smartphone market shares[R]. New York: International Data Corporation, 2019.
    [3] YERIMA S Y, SEZER S, MCWILLIAMS G. Analysis of Bayesian classification-based approaches for Android malware detection[J]. IET Information Security, 2014, 8(1): 25-26. doi: 10.1049/iet-ifs.2013.0095
    [4] PEHLIVAN U, BALTACI N, ACARTURK C, et al. The analysis of feature selection methods and classification algorithms in permission based Android malware detection[C]//Computational Intelligence in Cyber Security. Piscataway: IEEE Press, 2014: 1-8.
    [5] WANG W, WANG X, FENG D, et al. Exploring permission-induced risk in Android applications for malicious application detection[J]. IEEE Transactions on Information Forensics and Security, 2014, 9(11): 1869-1882. doi: 10.1109/TIFS.2014.2353996
    [6] CEN L, GATES C S, SI L, et al. A probabilistic discriminative model for Android malware detection with decompiled source code[J]. IEEE Transactions on Dependable and Secure Computing, 2015, 12(4): 400-412. doi: 10.1109/TDSC.2014.2355839
    [7] ZHAO K, ZHANG D, SU X, et al. Fest: A feature extraction and selection tool for Android malware detection[C]//Computers and Communication. Piscataway: IEEE Press, 2015: 714-720.
    [8] TAO G, ZHENG Z, GUO Z, et al. MalPat: Mining patterns of malicious and benign Android apps via permission-related APIs[J]. IEEE Transactions on Reliability, 2018, 67(1): 355-369. doi: 10.1109/TR.2017.2778147
    [9] LI J, SUN L, YAN Q, et al. Significant permission identification for machine learning based Android malware detection[J]. IEEE Transactions on Industrial Informatics, 2017, 14(7): 3216-3225.
    [10] DESNOS A, GUEGUEN G, BACHMANN S. Androguard package[EB/OL]. (2020-04-30)[2021-09-01]. https://github.com/androguard.
    [11] MANN H B, WHITNEY D R. On a test whether one of two random variables is statistically larger than the other[J]. Annals of Mathematical Statistics, 1947, 18(1): 50-60. doi: 10.1214/aoms/1177730491
    [12] FRIEDMAN J H. Stochastic gradient boosting[J]. Computational Statistics & Data Analysis, 2002, 38(4): 367-378.
    [13] FREUND Y, SCHAPIRE R E. A decision-theoretic generalization of on-line learning and an application to boosting[J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139. doi: 10.1006/jcss.1997.1504
    [14] HANSEN L K, SALAMON P. Neural network ensembles[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1990, 12(10): 993-1001. doi: 10.1109/34.58871
    [15] RUCZINSKI I, KOOPERBERG C, LEBLANC M. Logic regression[J]. Journal of Computational and Graphical Statistics, 2003, 12(3): 475-511. doi: 10.1198/1061860032238
    [16] FRIEDMAN N, GEIGER D, GOLDSZMIDT M. Bayesian network classifiers[J]. Machine Learning, 1997, 29: 131-163. doi: 10.1023/A:1007465528199
    [17] BREIMAN L. Random forests[J]. Machine Learning, 2001, 45: 5-32. doi: 10.1023/A:1010933404324
    [18] MORALES O S, ESCAMILLA A P J, RODRÍGUEZ M A, et al. Native malware detection in smartphones with Android OS using static analysis, feature selection and ensemble classifiers[C]//International Conference on Malicious and Unwanted Software. Piscataway: IEEE Press, 2016: 67-74.
    [19] SEDANO J, GONZÁLEZ S, CHIRA C, et al. Key features for the characterization of Android malware families[J]. Logic Journal of the IGPL, 2017, 25(1): 54-66. doi: 10.1093/jigpal/jzw046
    [20] RAI S, DHANESHA R, NAHATA S, et al. Malicious application detection on Android smartphones with enhanced static-dynamic analysis[C]//International Conference on Information Systems Security. Berlin: Springer, 2017: 194-208.
    [21] FATIMA A, MAURYA R, DUTTA M K, et al. Android malware detection using genetic algorithm based optimized feature selection and machine learning[C]//International Conference on Telecommunications & Signal Processing. Piscataway: IEEE Press, 2019: 220-223.
    [22] SUN L, LI Z, YAN Q, et al. SigPID: Significant permission identification for android malware detection[C]//International Conference on Malicious and Unwanted Software. Piscataway: IEEE Press, 2017: 1-8.
Publication history
  • Received:  2020-09-30
  • Accepted:  2020-12-18
  • Published online:  2022-02-20
