Citation: WANG Keyao, WANG Huiwen, ZHAO Qing, et al. A modified Mahalanobis distance discriminant method[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(5): 824-830. doi: 10.13700/j.bh.1001-5965.2020.0652 (in Chinese)
The Mahalanobis distance discriminant method is an effective multivariate statistical analysis method based on the Mahalanobis distance. A key feature of this distance is that it incorporates the inverse of the covariance matrix, which removes the influence of both the scales of the attribute variables and the correlations among them on the distance measurement. However, when the attribute variables are multicollinear, the covariance matrix becomes singular or nearly singular, its inverse can no longer be estimated stably, and the performance of the Mahalanobis distance discriminant method degrades severely. We propose a modified Mahalanobis distance discriminant method that uses generalized cross-validation (GCV) to select the number of dimensions with the best predictive performance, so that the inverse of the covariance matrix can be estimated well even when the attribute variables are highly correlated. The modified method provides a reliable estimate of the covariance matrix, resists disturbances from outside the sample set, and improves both the discriminant accuracy and the generalization ability of the model. Simulations verify that the modified method improves discriminant performance over the classical one.
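To illustrate the idea, the following minimal Python sketch classifies samples by Mahalanobis distance using an inverse covariance restricted to the k leading principal directions. It is not the authors' implementation: the function names, the use of a pooled within-class covariance, and the fixed value of k are assumptions made here for illustration, and the GCV-based choice of k is not reproduced.

```python
import numpy as np


def fit_class_stats(X, y):
    """Return class labels, per-class means, and the pooled covariance.

    Assumption: a single pooled within-class covariance is used.
    """
    classes = np.unique(y)
    n, p = X.shape
    means = {c: X[y == c].mean(axis=0) for c in classes}
    pooled = np.zeros((p, p))
    for c in classes:
        centered = X[y == c] - means[c]
        pooled += centered.T @ centered
    pooled /= n - len(classes)
    return classes, means, pooled


def truncated_inverse(cov, k):
    """Invert cov on the subspace of its k largest eigenvalues only.

    Directions with tiny eigenvalues (the source of instability under
    multicollinearity) are dropped instead of being inverted.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)
    keep = np.argsort(eigvals)[::-1][:k]
    V = eigvecs[:, keep]
    return V @ np.diag(1.0 / eigvals[keep]) @ V.T


def mahalanobis_predict(X, classes, means, inv_cov):
    """Assign each row of X to the class with the smallest squared distance."""
    d2 = np.stack(
        [
            np.einsum("ij,jk,ik->i", X - means[c], inv_cov, X - means[c])
            for c in classes
        ],
        axis=1,
    )
    return classes[np.argmin(d2, axis=1)]


# Hypothetical data: the third variable is almost an exact sum of the first two.
rng = np.random.default_rng(0)
z0 = rng.normal(size=(100, 2))
z1 = rng.normal(size=(100, 2)) + np.array([4.0, 0.0])
Z = np.vstack([z0, z1])
X = np.column_stack([Z, Z[:, 0] + Z[:, 1] + 1e-8 * rng.normal(size=200)])
y = np.repeat([0, 1], 100)

classes, means, pooled = fit_class_stats(X, y)
inv_cov = truncated_inverse(pooled, k=2)  # k fixed by hand, not chosen by GCV
accuracy = np.mean(mahalanobis_predict(X, classes, means, inv_cov) == y)
print(f"training accuracy with k=2: {accuracy:.3f}")
```

In this toy setting the pooled covariance has one eigenvalue close to zero, so a full inverse would be dominated by noise, whereas keeping two directions retains the informative subspace. In practice k would be selected by a data-driven criterion such as the GCV procedure the abstract describes.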