Citation: | HAO Jingwei, PAN Limin, LI Rui, et al. Low redundancy feature selection method for Android malware detection[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(2): 225-232. doi: 10.13700/j.bh.1001-5965.2020.0567(in Chinese) |
A low redundancy feature selection method for Android malware detection is proposed to solve the problem of feature redundancy caused by excessive attention to features with the same frequency distribution between classes. First, the method selects features with frequency distribution bias by Mann-Whitney test, and then quantifies the degree of bias and feature appearance frequency by the appearance ratio interval algorithm to reject features with low bias and low use frequency in the overall software. Finally, the particle swarm optimization algorithm is combined with model detection effect to obtain the optimal feature subset. Experiments were conducted using public datasets DREBIN and AMD. The experimental results show that 294-dimensional features were selected on the AMD dataset, and the detection accuracy of the six classifiers is improved by 1%-5%, 295-dimensional features were selected on the DREBIN dataset less than 4 comparison methods, and the detection accuracy of the six classifiers is improved by 1.7%-5%. The experimental results illustrate that the proposed method can reduce the redundancy of features in Android malware detection and improve the malware detection accuracy.
[1] |
中国互联网络信息中心. 第44次中国互联网络发展现状统计报告[R]. 北京: 中国互联网络信息中心, 2019.
China Internet Network Information Center. The 44th China statistical reports on internet development[R]. Beijing: China Internet Network Information Center, 2019(in Chinese).
|
[2] |
International Data Corporation. Worldwide smartphone market shares[R]. New York: International Data Corporation, 2019.
|
[3] |
YERIMA S Y, SEZER S, MCWILLIAMS G. Analysis of Bayesian classification-based approaches for Android malware detection[J]. IET Information Security, 2014, 8(1): 25-26. doi: 10.1049/iet-ifs.2013.0095
|
[4] |
PEHLIVAN U, BALTACI N, ACARTURK C, et al. The analysis of feature selection methods and classification algorithms in permission based Android malware detection[C]//Computational Intelligence in Cyber Security. Piscataway: IEEE Press, 2014: 1-8.
|
[5] |
WANG W, WANG X, FENG D, et al. Exploring permission-induced risk in Android applications for malicious application detection[J]. IEEE Transactions on Information Forensics and Security, 2014, 9(11): 1869-1882. doi: 10.1109/TIFS.2014.2353996
|
[6] |
CEN L, GATES C S, SI L, el al. A probabilistic discriminative model for Android malware detection with decompiled source code[J]. IEEE Transactions on Dependable and Secure Computing, 2015, 12(4): 400-412. doi: 10.1109/TDSC.2014.2355839
|
[7] |
ZHAO K, ZHANG D, SU X, et al. Fest: A feature extraction and selection tool for Android malware detection[C]//Computers and Communication. Piscataway: IEEE Press, 2015: 714-720.
|
[8] |
TAO G, ZHENG Z, GUO Z, et al. MalPat: Mining patterns of malicious and benign Android apps via permission-related APIs[J]. IEEE Transactions on Reliability, 2018, 67(1): 355-369. doi: 10.1109/TR.2017.2778147
|
[9] |
LI J, SUN L, YAN Q, et al. Significant permission identification for machine learning based Android malware detection[J]. IEEE Transactions on Industrial Informatics, 2017, 14(7): 3216-3225.
|
[10] |
DESNOS A, GUEGUEN G, BACHMANN S. Androguard package[EB/OL]. (2020-04-30)[2021-09-01]. https://github.com/androguard.
|
[11] |
MANN H B, WHITNEY D R. On a test whether one of two random variables is statistically larger than the other[J]. Annals of Mathematical Statistics, 1947, 18(1): 50-60. doi: 10.1214/aoms/1177730491
|
[12] |
FRIEDMAN J H. Stochastic gradient boosting[J]. Computational Statistics & Data Analysis, 2002, 38(4): 367-378.
|
[13] |
FREUND Y, SCHAPIRE R E. A decision-theoretic generalization of on-line learning and an application to boosting[J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139. doi: 10.1006/jcss.1997.1504
|
[14] |
HANSEN L K, SALAMON P. Neural network ensembles[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1990, 12(10): 993-1001. doi: 10.1109/34.58871
|
[15] |
RUCZINSKI I, KOOPERBERG C, LEBLANC M. Logic regression[J]. Journal of Computational and Graphical Statistics, 2003, 12(3): 475-511. doi: 10.1198/1061860032238
|
[16] |
FRIEDMAN N, GEIGER D, GOLDSZMIDT M. Bayesian network classifiers[J]. Machine Learning, 1997, 29: 131-163. doi: 10.1023/A:1007465528199
|
[17] |
BREIMAN L. Random forests[J]. Machine Learning, 2001, 45: 5-32. doi: 10.1023/A:1010933404324
|
[18] |
MORALES O S, ESCAMILLA A P J, RODRGUEZ M A, et al. Native malware detection in smartphones with Android OS using static analysis, feature selection and ensemble classifiers[C]//International Conference on Malicious and Unwanted Software. Piscataway: IEEE Press, 2016: 67-74.
|
[19] |
SEDANO J, GONZLEZ S, CHIRA C, et al. Key features for the characterization of Android malware families[J]. Logic Journal of the IGPL, 2017, 25(1): 54-66. doi: 10.1093/jigpal/jzw046
|
[20] |
RAI S, DHANESHA R, NAHATA S, et al. Malicious application detection on Android smartphones with enhanced static-dynamic analysis[C]//International Conference on Information Systems Security. Berlin: Springer, 2017: 194-208.
|
[21] |
FATIMA A, MAURYA R, DUTTA M K, et al. Android malware detection using genetic algorithm based optimized feature selection and machine learning[C]//International Conference on Telecommunications & Signal Processing. Piscataway: IEEE Press, 2019: 220-223.
|
[22] |
SUN L, LI Z, YAN Q, et al. SigPID: Significant permission identification for android malware detection[C]//International Conference on Malicious and Unwanted Software. Piscataway: IEEE Press, 2017: 1-8.
|