An attribute reduction algorithm for weighted neighborhood rough sets based on the CRITIC method
-
Abstract: Compared with classical rough sets, neighborhood rough sets can handle non-discrete and high-dimensional data, yielding simplified data without reducing data-processing capability. However, in a standard neighborhood rough set every attribute carries the same weight, even though attributes influence the decision to different degrees. To address this, an attribute reduction algorithm for weighted neighborhood rough sets based on the CRITIC method is proposed. First, the CRITIC method is used to assign weights to the conditional attributes, and a weighted distance function is introduced to compute the neighborhood relation, giving a weighted neighborhood relation. Second, the weighted neighborhood rough set is constructed: attribute dependency and significance are used to evaluate the importance of candidate subsets, an equidistant (uniform-step) search is used to find the best neighborhood threshold, and attribute reduction is carried out to obtain the optimal attribute subset. Finally, the algorithm is validated experimentally on 10 datasets from the UCI repository, and its performance is compared with that of traditional attribute reduction algorithms for neighborhood rough sets. The experimental results show that the proposed algorithm obtains the minimal attribute reduction set while maintaining the classification accuracy of the reduced data, demonstrating its effectiveness and practical value.
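To make the weighting step concrete, here is a minimal sketch of the CRITIC computation in Python, assuming min-max normalisation and Pearson correlation as the conflict measure; the function name and interface are illustrative rather than taken from the paper.

```python
import numpy as np

def critic_weights(X):
    """CRITIC weights for a samples-by-attributes matrix X.

    Contrast intensity of an attribute is its standard deviation
    after min-max normalisation; conflict is one minus its Pearson
    correlation with each other attribute, summed. Assumes no
    attribute is constant.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    sigma = Z.std(axis=0, ddof=1)         # contrast intensity
    R = np.corrcoef(Z, rowvar=False)      # attribute correlations
    conflict = (1.0 - R).sum(axis=0)      # conflict with the others
    C = sigma * conflict                  # information content
    return C / C.sum()                    # normalised weights
```

Attributes that vary strongly and correlate weakly with the others receive the largest weights, which is the behaviour the weighted distance later exploits.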
-
Table 1. Statistical information of an information system
| U | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $d$ |
| --- | --- | --- | --- | --- | --- |
| $x_1$ | 0.06 | 0.91 | 0.11 | 0.08 | 1 |
| $x_2$ | 0.03 | 0.92 | 0.32 | 0.10 | 1 |
| $x_3$ | 0.03 | 0.43 | 0.31 | 0.14 | 1 |
| $x_4$ | 0.06 | 0.41 | 0.45 | 0.11 | 1 |
| $x_5$ | 0.05 | 0.68 | 0.67 | 0.13 | 1 |
| $x_6$ | 0.09 | 0.12 | 0.81 | 0.14 | 1 |
| $x_7$ | 0.08 | 0.69 | 0.23 | 0.03 | 2 |
| $x_8$ | 0.12 | 0.21 | 0.39 | 0.07 | 2 |
| $x_9$ | 0.10 | 0.89 | 0.61 | 0.03 | 2 |
| $x_{10}$ | 0.14 | 0.93 | 0.78 | 0.09 | 2 |
| $x_{11}$ | 0.13 | 0.41 | 0.74 | 0.07 | 2 |
| $x_{12}$ | 0.14 | 0.43 | 0.91 | 0.03 | 2 |
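For use in the later sketches, Table 1's information system can be written out as arrays (the names `X` and `d` are illustrative):

```python
import numpy as np

# Conditional attributes a1..a4 of samples x1..x12 (Table 1).
X = np.array([
    [0.06, 0.91, 0.11, 0.08], [0.03, 0.92, 0.32, 0.10],
    [0.03, 0.43, 0.31, 0.14], [0.06, 0.41, 0.45, 0.11],
    [0.05, 0.68, 0.67, 0.13], [0.09, 0.12, 0.81, 0.14],
    [0.08, 0.69, 0.23, 0.03], [0.12, 0.21, 0.39, 0.07],
    [0.10, 0.89, 0.61, 0.03], [0.14, 0.93, 0.78, 0.09],
    [0.13, 0.41, 0.74, 0.07], [0.14, 0.43, 0.91, 0.03],
])
# Decision attribute d.
d = np.array([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2])
```

With these, `critic_weights(X)` from the sketch above yields one weight per conditional attribute.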
Table 2. Neighborhood of B1 and B2
| U | B1 | B2 |
| --- | --- | --- |
| $x_1$ | $\{x_1\}$ | $\{x_1\}$ |
| $x_2$ | $\{x_2\}$ | $\{x_2\}$ |
| $x_3$ | $\{x_3\}$ | $\{x_3\}$ |
| $x_4$ | $\{x_4\}$ | $\{x_4\}$ |
| $x_5$ | $\{x_5\}$ | $\{x_5\}$ |
| $x_6$ | $\{x_6\}$ | $\{x_6\}$ |
| $x_7$ | $\{x_7\}$ | $\{x_7\}$ |
| $x_8$ | $\{x_8\}$ | $\{x_8\}$ |
| $x_9$ | $\{x_9\}$ | $\{x_9\}$ |
| $x_{10}$ | $\{x_{10}\}$ | $\{x_{10}\}$ |
| $x_{11}$ | $\{x_{11}\}$ | $\{x_{11}\}$ |
| $x_{12}$ | $\{x_{12}\}$ | $\{x_{12}\}$ |
Table 3. Weighted neighborhood of B1 and B2
| U | B1 | B2 |
| --- | --- | --- |
| $x_1$ | $\{x_1, x_2, x_5, x_7, x_9\}$ | $\{x_1, x_2, x_8\}$ |
| $x_2$ | $\{x_1, x_2, x_5\}$ | $\{x_1, x_2, x_4\}$ |
| $x_3$ | $\{x_3, x_4, x_5\}$ | $\{x_3, x_5\}$ |
| $x_4$ | $\{x_3, x_4\}$ | $\{x_2, x_4, x_5, x_{10}\}$ |
| $x_5$ | $\{x_1, x_2, x_3, x_5, x_7\}$ | $\{x_3, x_4, x_5, x_6\}$ |
| $x_6$ | $\{x_6, x_8\}$ | $\{x_5, x_6\}$ |
| $x_7$ | $\{x_1, x_5, x_7\}$ | $\{x_7, x_9\}$ |
| $x_8$ | $\{x_6, x_8, x_{11}, x_{12}\}$ | $\{x_1, x_8, x_{10}, x_{11}\}$ |
| $x_9$ | $\{x_1, x_7, x_9, x_{10}\}$ | $\{x_7, x_9, x_{12}\}$ |
| $x_{10}$ | $\{x_9, x_{10}\}$ | $\{x_4, x_8, x_{10}, x_{11}\}$ |
| $x_{11}$ | $\{x_8, x_{11}, x_{12}\}$ | $\{x_8, x_{10}, x_{11}\}$ |
| $x_{12}$ | $\{x_8, x_{11}, x_{12}\}$ | $\{x_9, x_{12}\}$ |
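The neighborhoods in Tables 2 and 3 come from the paper's worked example, whose exact attribute subsets B1 and B2 and threshold are defined in the body of the paper; the general computation, under an assumed weighted Euclidean distance with a placeholder threshold `delta`, looks like this:

```python
import numpy as np

def weighted_neighborhoods(X, w, delta):
    """Neighborhood n(x) = {y : d_w(x, y) <= delta} under the
    weighted Euclidean distance
    d_w(x, y) = sqrt(sum_j w_j * (x_j - y_j) ** 2).

    Uniform weights give an ordinary neighborhood relation;
    CRITIC weights give a weighted relation as in Table 3.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]          # pairwise differences
    dist = np.sqrt((w * diff ** 2).sum(axis=2))   # weighted distances
    return [set(np.flatnonzero(row <= delta)) for row in dist]
```

For an attribute subset, pass only its columns and weights, e.g. `weighted_neighborhoods(X[:, cols], w[cols], delta)` with `cols` the indices of the subset's attributes.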
Table 4. Experimental results
| Attribute subset | KNN accuracy/% | SVM accuracy/% |
| --- | --- | --- |
| $\{a_1, a_2\}$ | 75 | 83.3 |
| $\{a_3, a_4\}$ | 83.3 | 91.7 |
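As the abstract describes, candidate subsets are scored by attribute dependency; below is a minimal sketch of the dependency degree, assuming the standard positive-region definition used in neighborhood rough sets (a sample counts when its whole neighborhood shares its decision label):

```python
def dependency_degree(neigh, d):
    """gamma_B(D) = |POS_B(D)| / |U|, where x belongs to POS_B(D)
    when every sample in its neighborhood has the same decision
    as x. `neigh` is the output of weighted_neighborhoods for
    the attribute subset B.
    """
    pos = sum(1 for i, nb in enumerate(neigh)
              if all(d[j] == d[i] for j in nb))
    return pos / len(d)
```

Attribute significance then follows as the gain in this score when an attribute joins the current subset, which a greedy forward reduction would maximise at each step.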
Table 5. Dataset information
| Dataset | Samples | Attributes |
| --- | --- | --- |
| wine | 178 | 13 |
| glass | 214 | 9 |
| wpbc | 198 | 33 |
| diab | 768 | 8 |
| bupa | 345 | 6 |
| horse | 366 | 27 |
| seeds | 210 | 7 |
| divorce | 169 | 54 |
| hcv | 1385 | 14 |
| german | 1000 | 20 |
Table 6. Experimental results of classification accuracy (%)
| Dataset | Classifier | Original | NRS | Entropy | Proposed |
| --- | --- | --- | --- | --- | --- |
| wine | KNN | 94.92 | 97.22 | 96.11 | 95.52 |
| wine | SVM | 97.72 | 98.33 | 98.30 | 97.75 |
| glass | KNN | 66.15 | 71.95 | 70.06 | 69.61 |
| glass | SVM | 67.69 | 65.89 | 65.45 | 68.82 |
| wpbc | KNN | 70.63 | 76.89 | 76.79 | 79.34 |
| wpbc | SVM | 74.77 | 77.95 | 80.47 | 80.39 |
| diab | KNN | 74.35 | 74.74 | 74.34 | 74.94 |
| diab | SVM | 76.83 | 76.57 | 76.17 | 76.17 |
| bupa | KNN | 58.55 | 64.61 | 63.67 | 64.62 |
| bupa | SVM | 57.68 | 66.41 | 67.41 | 68.41 |
| horse | KNN | 66.03 | 68.97 | 69.15 | 69.41 |
| horse | SVM | 70.27 | 72.15 | 71.57 | 73.77 |
| seeds | KNN | 93.65 | 91.90 | 91.90 | 91.90 |
| seeds | SVM | 92.06 | 92.86 | 91.43 | 92.86 |
| divorce | KNN | 94.11 | 97.65 | 98.23 | 98.24 |
| divorce | SVM | 94.11 | 98.14 | 98.23 | 98.24 |
| hcv | KNN | 88.70 | 93.55 | 93.79 | 93.89 |
| hcv | SVM | 89.83 | 93.04 | 94.06 | 94.57 |
| german | KNN | 70.40 | 72.80 | 73.00 | 73.80 |
| german | SVM | 73.50 | 77.00 | 76.80 | 78.40 |
Table 7. Number of attributes after reduction
| Dataset | Classifier | Original | NRS | Entropy | Proposed |
| --- | --- | --- | --- | --- | --- |
| wine | KNN | 13 | 6 | 5 | 4 |
| wine | SVM | 13 | 6 | 6 | 6 |
| glass | KNN | 9 | 7 | 6 | 5 |
| glass | SVM | 9 | 8 | 8 | 6 |
| wpbc | KNN | 33 | 9 | 7 | 7 |
| wpbc | SVM | 33 | 8 | 7 | 6 |
| diab | KNN | 8 | 7 | 8 | 8 |
| diab | SVM | 8 | 4 | 4 | 7 |
| bupa | KNN | 6 | 6 | 6 | 4 |
| bupa | SVM | 6 | 6 | 6 | 4 |
| horse | KNN | 27 | 10 | 12 | 10 |
| horse | SVM | 27 | 8 | 13 | 12 |
| seeds | KNN | 7 | 7 | 7 | 7 |
| seeds | SVM | 7 | 5 | 7 | 3 |
| divorce | KNN | 54 | 3 | 5 | 5 |
| divorce | SVM | 54 | 8 | 5 | 5 |
| hcv | KNN | 14 | 7 | 5 | 5 |
| hcv | SVM | 14 | 4 | 5 | 6 |
| german | KNN | 20 | 9 | 10 | 8 |
| german | SVM | 20 | 14 | 15 | 14 |