留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于快速模拟退火的组合聚类算法

李红 张志宾

李红, 张志宾. 基于快速模拟退火的组合聚类算法[J]. 北京航空航天大学学报, 2019, 45(8): 1646-1652. doi: 10.13700/j.bh.1001-5965.2018.0647
引用本文: 李红, 张志宾. 基于快速模拟退火的组合聚类算法[J]. 北京航空航天大学学报, 2019, 45(8): 1646-1652. doi: 10.13700/j.bh.1001-5965.2018.0647
LI Hong, ZHANG Zhibin. Ensemble clustering algorithm based on rapid simulated annealing[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(8): 1646-1652. doi: 10.13700/j.bh.1001-5965.2018.0647(in Chinese)
Citation: LI Hong, ZHANG Zhibin. Ensemble clustering algorithm based on rapid simulated annealing[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(8): 1646-1652. doi: 10.13700/j.bh.1001-5965.2018.0647(in Chinese)

基于快速模拟退火的组合聚类算法

doi: 10.13700/j.bh.1001-5965.2018.0647
基金项目: 

国家自然科学基金 71471009

详细信息
    作者简介:

    李红  女, 博士, 副教授, 硕士生导师。主要研究方向:数据挖掘、社会计算

    张志宾  男, 硕士研究生。主要研究方向:商务智能

    通讯作者:

    李红, E-mail: hong_lee@buaa.edu.cn

  • 中图分类号: TP391

Ensemble clustering algorithm based on rapid simulated annealing

Funds: 

National Natural Science Foundation of China 71471009

More Information
  • 摘要:

    应用模拟退火算法解决组合聚类问题有两方面,一是有效利用基础聚类作为先验信息,以获得尽可能好的组合聚类结果;二是降低模拟退火过程的随机性,提高算法收敛速度。针对这2个问题,提出了基于投票的快速模拟退火(BV-RSA)模型。该模型利用基础聚类对样本划分的完全或部分一致性作为启发信息,构建超点集合和超点投票箱,由超点取代其代表的样本子集参与退火过程,超点运动方向在投票箱范围内随机选择,降低了超点运动随机性,加速了组合聚类过程。数据集实验表明,BV-RSA模型在聚类精度和鲁棒性方面表现良好。

     

  • 图 1  BV-RSA模型框架

    Figure 1.  Framework of BV-RSA model

    图 2  基础聚类数量对模型精度的影响

    Figure 2.  Impact of number of basic clusters on model precision

    图 3  k-means和BV-RSA模型的司机分群结果

    Figure 3.  Driver grouping result obtained from k-means and BV-RSA model

    图 4  各类司机群体的工作日接单量分布

    Figure 4.  Distribution of order quantity received by different driver groups in workdays

    图 5  各类司机群体的周末接单量分布

    Figure 5.  Distribution of order quantity received by different driver groups on weekend

    表  1  基础聚类与超点示例

    Table  1.   An example of basic partition and super-objects

    数据样本 基础聚类的簇标签 超点
    π1 π2 π3
    x1 C1 C2 C3 S1
    x2 C1 C2 C3
    x3 C1 C2 C3
    x4 C1 C1 C1 S2
    x5 C2 C1 C1 S3
    x6 C2 C1 C1
    x7 C2 C1 C1
    x8 C2 C1 C1
    x9 C3 C3 C1 S4
    x10 C3 C3 C2 S5
    x11 C3 C3 C2
    x12 C3 C3 C2
    下载: 导出CSV

    表  2  实验数据集描述

    Table  2.   Description of experimental datasets

    数据集 样本数 属性数 类别数 来源
    iris 150 3 3 UCI
    glass 214 9 6 UCI
    segment 2 310 19 7 UCI
    hitech 2 301 126 321 6 CLUTO
    k1b 2 340 21 839 6 CLUTO
    la12 6 279 31 472 6 CLUTO
    re1 1 657 3 758 25 CLUTO
    reviews 4 069 126 373 5 CLUTO
    sports 8 580 126 373 7 CLUTO
    tr11 414 6 429 9 CLUTO
    tr12 313 5 804 8 CLUTO
    tr41 878 7 454 10 CLUTO
    tr45 690 8 261 10 CLUTO
    letter 20 000 16 26 LIBSVM
    mnist 70 000 786 10 LIBSVM
    下载: 导出CSV

    表  3  各模型聚类结果的精度比较

    Table  3.   Comparison of clustering result precision of different models

    数据集 BV-RSA/C BV-RSA/R CSPA HGPA MCLA k-means
    iris 0.91 0.91 0.79 0.73 0.91 0.89
    glass 0.53 0.54 0.60 0.56 0.53 0.54
    segment 0.67 0.65 0.59 0.52 0.67 0.66
    hitech 0.56 0.56 0.56 0.53 0.56 0.51
    k1b 0.89 0.87 0.86 0.86 0.91 0.64
    la12 0.74 0.75 0.71 0.60 0.76 0.69
    re1 0.71 0.70 0.70 0.62 0.71 0.43
    reviews 0.67 0.67 0.68 0.62 0.67 0.64
    sports 0.93 0.93 0.65 0.49 0.93 0.64
    tr11 0.85 0.86 0.79 0.65 0.85 0.68
    tr12 0.78 0.74 0.79 0.73 0.78 0.64
    tr41 0.82 0.81 0.72 0.61 0.81 0.65
    tr45 0.80 0.78 0.79 0.65 0.74 0.68
    letter 0.34 0.33 0.24 0.24 0.30 0.26
    mnist 0.57 0.55 0.35 0.56 0.54
    下载: 导出CSV

    表  4  输入数据描述

    Table  4.   Description of input data

    字段名称 字段含义
    uid 司机id
    morning_cnt 司机在早高峰时间段总共接单数
    night_cnt 司机在晚高峰时间段总共接单数
    usual_cnt 司机在平峰时间段总共接单数
    weekday_cnt 司机在工作日总共接单数
    weekend_cnt 司机在周末总共接单数
    income 司机的收入
    complain_rate 司机被投诉率
    route_num 司机设置有效的常用路线数
    gender 性别(1-男;2-女)
    Age 年龄
    下载: 导出CSV
  • [1] NGUYEN N, CARUANA R.Consensus clusterings[C]//IEEE International Conference on Data Mining.Piscataway, NJ: IEEE Press, 2007: 607-612.
    [2] STREHL A, GHOSH J.Cluster ensembles:A knowledge reuse framework for combining partitionings[J].Journal of Machine Learning Research, 2002, 3(3):583-617. http://cn.bing.com/academic/profile?id=95607976ad8495bf99b3e460dbad4285&encoded=0&v=paper_preview&mkt=zh-cn
    [3] AYAD H, KAMEL M.Refined shared nearest neighbors graph for combining multiple data clusterings[C]//Advances in Intelligent Data Analysis.Berlin: Springer, 2003: 307-318. https://www.researchgate.net/publication/221460998_Refined_Shared_Nearest_Neighbors_Graph_for_Combining_Multiple_Data_Clusterings
    [4] YANG Y, KAMEL M S.An aggregated clustering approach using multi-ant colonies algorithms[J].Pattern Recognition, 2006, 39(7):1278-1289. doi: 10.1016/j.patcog.2006.02.012
    [5] FERN X Z, BRODLEY C E.Solving cluster ensemble problems by bipartite graph partitioning[C]//Proceedings of Twenty-First International Conference on Machine Learning.New York: ACM, 2004: 281-288. http://web.engr.oregonstate.edu/~xfern/graph_icml04.pdf
    [6] YU Z, HAN G, LI L, et al.Adaptive noise immune cluster ensemble using affinity propagation[C]//IEEE International Conference on Data Engineering.Piscataway, NJ: IEEE Press, 2016: 1454-1455. Adaptive noise immune cluster ensemble using affinity propagation
    [7] FRED A L N, JAIN A K.Combining multiple clusterings using evidence accumulation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(6):835-850. doi: 10.1109/TPAMI.2005.113
    [8] WANG X, YANG C, ZHOU J.Clustering aggregation by probability accumulation[J].Pattern Recognition, 2009, 42(5):668-675. doi: 10.1016/j.patcog.2008.09.013
    [9] HU M, DENG X, YAO Y.A sequential three-way approach to constructing a co-association matrix in consensus clustering[C]//International Joint Conference on Rough Sets.Berlin: Springer, 2018, 11103: 599-613. doi: 10.1007%2F978-3-319-99368-3_47
    [10] LIU H, LIU T, WU J, et al.Spectral ensemble clustering[C]//Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining.New York: ACM, 2015: 715-724. https://wenku.baidu.com/view/b5f0cb917fd5360cbb1adb37.html
    [11] HUANG D, WANG C D, LAI J H.Locally weighted ensemble clustering[J].IEEE Transactions on Cybernetics, 2016, 48(5):1460-1473. http://d.old.wanfangdata.com.cn/Periodical/nygcxb201314022
    [12] ZHOU Z H, TANG W.Clusterer ensemble[J].Knowledge-Based Systems, 2006, 19(1):77-83. http://d.old.wanfangdata.com.cn/Periodical/rjxb200504002
    [13] FU Y, YANG Y, LIU Y.A decision model for fuzzy clustering ensemble[C]//Proceedings of the International Conference on Intelligent Systems and Knowledge Engineering.Paris: Atlantis Press, 2007. https://www.researchgate.net/publication/266650562_A_Decision_Model_for_Fuzzy_Clustering_Ensemble
    [14] 陈晓云, 陈刚.基于最大内聚度基准的加权投票聚类集成[J].控制与决策, 2014(2):236-240. http://d.old.wanfangdata.com.cn/Periodical/kzyjc201402008

    CHEN X Y, CHEN G.Weighted voting clustering ensemble based on maximum cohesion[J].Control and Decision, 2014(2):236-240(in Chinese). http://d.old.wanfangdata.com.cn/Periodical/kzyjc201402008
    [15] LU Z, PENG Y, XIAO J.From comparing clusterings to combining clusterings[C]//23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference.Palo Alto: AAAI, 2008, 2: 665-670. https://www.aaai.org/Papers/AAAI/2008/AAAI08-106.pdf
    [16] HUANG D, LAI J, WANG C D.Ensemble clustering using factor graph[J].Pattern Recognition, 2016, 50(C):131-142. http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=b837e47b9d728114f197c99a1582ceec
    [17] TOPCHY A P, JAIN A K, PUNCH W F.A mixture model for clustering ensembles[C]//Proceedings of the Fourth SIAM International Conference on Data Mining.Philadelphia: SIAM Publications, 2004: 379-390. http://dataclustering.cse.msu.edu/papers/topchy_mixture_siam_accepted.pdf
    [18] 蒋君, 徐蔚鸿, 潘楚.基于粒计算和模拟退火的K-medoids聚类算法[J].计算机仿真, 2015, 32(12):214-217. doi: 10.3969/j.issn.1006-9348.2015.12.045

    JIANG J, XU W H, PAN C.Improved K-medoids clustering algorithm based on many factors[J].Computer Simulation, 2015, 32(12):214-217(in Chinese). doi: 10.3969/j.issn.1006-9348.2015.12.045
    [19] SARTAKHTI J S, AFRABANDPEY H, SARAEE M.Simulated annealing least squares twin support vector machine (SA-LSTSVM) for pattern classification[J].Soft Computing, 2017, 21(15):4361-4373. doi: 10.1007/s00500-016-2067-4
  • 加载中
图(5) / 表(4)
计量
  • 文章访问数:  459
  • HTML全文浏览量:  2
  • PDF下载量:  306
  • 被引次数: 0
出版历程
  • 收稿日期:  2018-11-08
  • 录用日期:  2019-03-29
  • 刊出日期:  2019-08-20

目录

    /

    返回文章
    返回
    常见问答