Self-supervised learning for community detection based on deep graph convolutional networks
-
Abstract: To reduce the heavy dependence of graph neural networks on prior knowledge in community detection and to improve detection accuracy, a community detection model based on self-supervised learning and a deep graph convolutional network (GCN) is proposed. The model exploits the semantic features of a small number of labeled nodes and assigns pseudo-labels to unlabeled nodes through a semantic alignment mechanism; the resulting self-supervised learning module reduces the need for a large number of prior labels when training a GCN. To capture global network information and thereby improve detection accuracy, multiple self-supervised modules are stacked into a deep graph self-supervised learning model, and two strategies, initial residual and identity mapping, are used to counter the over-smoothing that the deeper model introduces. Experiments on public datasets show that, when only a few prior labels are given and the model depth is increased, the proposed model achieves clearly higher community detection accuracy than existing models.
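The abstract names initial residual and identity mapping as the two strategies used against over-smoothing but does not give the layer equation. The sketch below assumes the formulation popularized by GCNII [20], H(l+1) = ReLU(((1 - alpha) P H(l) + alpha H(0)) ((1 - beta_l) I + beta_l W(l))), where P is the normalized adjacency matrix with self-loops. The function names, the toy path graph, the fixed weight matrix and the values of `alpha` and `lam` are illustrative assumptions, not details taken from the paper, and H(0) is simply the raw feature matrix here.

```python
import math


def normalized_adjacency(edges, n):
    """Dense D^{-1/2} (A + I) D^{-1/2} for an undirected graph with n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(row[k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for row in x]


def gcnii_layer(p, h, h0, w, alpha, beta):
    """ReLU(((1 - alpha) P H + alpha H0) ((1 - beta) I + beta W)); w must be square."""
    ph = matmul(p, h)
    # Initial residual: mix the propagated features with the first-layer input.
    support = [[(1 - alpha) * ph[i][j] + alpha * h0[i][j]
                for j in range(len(h0[0]))] for i in range(len(h0))]
    # Identity mapping: shrink the weight matrix towards the identity.
    d = len(w)
    mix = [[(1 - beta) * (1.0 if i == j else 0.0) + beta * w[i][j]
            for j in range(d)] for i in range(d)]
    return [[max(0.0, v) for v in row] for row in matmul(support, mix)]


def deep_forward(p, h0, weights, alpha=0.1, lam=0.5):
    """Stack layers with beta_l = log(lam / l + 1), which decays with depth l."""
    h = h0
    for l, w in enumerate(weights, start=1):
        h = gcnii_layer(p, h, h0, w, alpha, math.log(lam / l + 1.0))
    return h


if __name__ == "__main__":
    # Toy path graph 0-1-2-3 with two feature dimensions (assumed example data).
    p = normalized_adjacency([(0, 1), (1, 2), (2, 3)], 4)
    x = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    w = [[0.9, 0.1], [0.1, 0.9]]  # fixed toy weights, not trained
    deep = deep_forward(p, x, [w] * 64)
    print("node 0 after 64 layers:", deep[0])
```

Setting `alpha = 0` and `beta = 1` reduces `gcnii_layer` to a plain GCN layer, ReLU(P H W), so the two strategies can be switched off for a side-by-side comparison on the same toy graph.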
-
Table 1. Basic information of the datasets

| Dataset | Nodes | Edges | Communities | Feature dimension |
|---|---|---|---|---|
| Cora | 2 708 | 5 429 | 7 | 1 433 |
| Citeseer | 3 327 | 4 732 | 6 | 3 703 |
| PubMed | 19 717 | 44 338 | 3 | 500 |

Table 2. Comparison of clustering accuracy (%) of various models under different numbers of known labels per class

| Model | 1 label: Cora | 1 label: Citeseer | 1 label: PubMed | 2 labels: Cora | 2 labels: Citeseer | 2 labels: PubMed | 5 labels: Cora | 5 labels: Citeseer | 5 labels: PubMed |
|---|---|---|---|---|---|---|---|---|---|
| MLP | 42.5 | 27.9 | 49.5 | 48.9 | 33.6 | 55.9 | 59.9 | 43.2 | 65.1 |
| GCN[12] | 43.2 | 33.6 | 51.9 | 50.1 | 41.8 | 54.9 | 69.0 | 53.2 | 68.1 |
| GAT[13] | 44.6 | 33.9 | 52.1 | 59.6 | 44.1 | 58.6 | 70.6 | 54.3 | 68.8 |
| DAGNN[39] | 59.1 | 44.3 | 56.3 | 64.2 | 53.8 | 64.8 | 71.2 | 54.7 | 70.6 |
| Self-SAGCN[25] | 61.2 | 56.8 | 63.2 | 70.8 | 63.9 | 68.8 | 75.1 | 66.9 | 71.9 |
| SDGCN | 63.8 | 57.8 | 65.3 | 71.5 | 64.4 | 70.9 | 78.6 | 68.1 | 73.5 |

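Table 3. Comparison of clustering accuracy of various models under different model depths

| Model | 4 layers: Cora | 4 layers: Citeseer | 4 layers: PubMed | 16 layers: Cora | 16 layers: Citeseer | 16 layers: PubMed | 32 layers: Cora | 32 layers: Citeseer | 32 layers: PubMed | 64 layers: Cora | 64 layers: Citeseer | 64 layers: PubMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GCN[12] | 67.8 | 15.6 | 17.2 | 16.4 | 50.6 | 18.9 | 14.9 | 17.6 | 71.6 | 40.2 | 36.6 | 39.9 |
| GCN(drop) | 80.6 | 75.7 | 62.5 | 49.5 | 68.6 | 57.3 | 41.1 | 33.2 | 78.3 | 77.6 | 76.2 | 61.1 |
| JKNet[18] | 78.9 | 79.3 | 80.6 | 70.9 | 67.9 | 68.2 | 69.2 | 63.1 | 78.1 | 72.1 | 73.6 | 73.1 |
| GCNII[20] | 79.2 | 83.1 | 83.9 | 84.1 | 67.0 | 70.2 | 70.8 | 71.6 | 79.1 | 77.6 | 79.6 | 79.9 |
| Self-SAGCN[25] | 75.1 | 34.6 | 31.6 | 30.8 | 62.5 | 48.9 | 26.2 | 18.1 | 75.3 | 50.8 | 43.7 | 40.1 |
| SDGCN | 81.2 | 82.6 | 84.1 | 85.2 | 70.2 | 71.6 | 72.7 | 72.2 | 78.6 | 79.2 | 80.1 | 80.6 |

-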
[1] BEDRU H D, YU S, XIAO X R, et al. Big networks: a survey[J]. Computer Science Review, 2020, 37: 100247. doi: 10.1016/j.cosrev.2020.100247
[2] DA MATA A S. Complex networks: a mini-review[J]. Brazilian Journal of Physics, 2020, 50(5): 658-672. doi: 10.1007/s13538-020-00772-9
[3] PAN L, WU P, HUANG D H. Reviews on group detection in online social networks[J]. Journal of Electronics & Information Technology, 2017, 39(9): 2097-2107 (in Chinese). doi: 10.11999/JEIT161192
[4] CAI B, ZENG L N, WANG Y P, et al. Community detection method based on node density, degree centrality, and K-means clustering in complex network[J]. Entropy, 2019, 21(12): 1145. doi: 10.3390/e21121145
[5] SU X, XUE S, LIU F Z, et al. A comprehensive survey on community detection with deep learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(4): 4682-4702. doi: 10.1109/TNNLS.2021.3137396
[6] QI J S, LIANG X, LI Z Y, et al. Representation learning of large-scale complex information network: concepts, methods and challenges[J]. Chinese Journal of Computers, 2018, 41(10): 2394-2420 (in Chinese). doi: 10.11897/SP.J.1016.2018.02394
[7] ZHOU Z, AMINI A A. Analysis of spectral clustering algorithms for community detection: the general bipartite setting[J]. The Journal of Machine Learning Research, 2019, 20(1): 1774-1820.
[8] ZHANG Y, YEUNG D Y. Overlapping community detection via bounded nonnegative matrix tri-factorization[C]//Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2012: 606-614.
[9] YANG B, ZHAO X H, LIU X Y. Bayesian approach to modeling and detecting communities in signed network[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Austin: AAAI, 2015: 1952-1958.
[10] WU Z H, PAN S R, CHEN F W, et al. A comprehensive survey on graph neural networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(1): 4-24. doi: 10.1109/TNNLS.2020.2978386
[11] XU B B, CEN K T, HUANG J J, et al. A survey on graph convolutional neural network[J]. Chinese Journal of Computers, 2020, 43(5): 755-780 (in Chinese). doi: 10.11897/SP.J.1016.2020.00755
[12] KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2017-02-22) [2023-05-25]. https://arxiv.org/abs/1609.02907.
[13] VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks[EB/OL]. (2018-02-04) [2023-05-25]. https://arxiv.org/abs/1710.10903.
[14] WANG H W, WANG J, WANG J L, et al. GraphGAN: graph representation learning with generative adversarial nets[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans: AAAI, 2018: 2508-2515.
[15] JIN D, LIU Z Y, LI W H, et al. Graph convolutional networks meet Markov random fields: semi-supervised community detection in attribute networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu: AAAI, 2019: 152-159.
[16] HU R Q, PAN S R, LONG G D, et al. Going deep: graph convolutional ladder-shape networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New York: AAAI, 2020: 2838-2845.
[17] JIN D, LI B Y, JIAO P F, et al. Community detection via joint graph convolutional network embedding in attribute network[C]//Proceedings of International Conference on Artificial Neural Networks. Berlin: Springer, 2019: 594-606.
[18] XU K, LI C, TIAN Y, et al. Representation learning on graphs with jumping knowledge networks[C]//Proceedings of the International Conference on Machine Learning. Stockholm: PMLR, 2018: 5453-5462.
[19] RONG Y, HUANG W B, XU T Y, et al. DropEdge: towards deep graph convolutional networks on node classification[EB/OL]. (2020-03-12) [2023-05-26]. https://arxiv.org/abs/1907.10903.
[20] CHEN M, WEI Z, HUANG Z, et al. Simple and deep graph convolutional networks[C]//Proceedings of the International Conference on Machine Learning. Vienna: PMLR, 2020: 1725-1735.
[21] LIU X, ZHANG F J, HOU Z Y, et al. Self-supervised learning: generative or contrastive[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(1): 857-876.
[22] YOU Y N, CHEN T L, WANG Z Y, et al. When does self-supervision help graph convolutional networks?[EB/OL]. (2020-06-18) [2023-05-26]. https://arxiv.org/abs/2006.09136.
[23] ZHU Y Q, XU Y C, YU F, et al. CAGNN: cluster-aware graph neural networks for unsupervised graph representation learning[EB/OL]. (2020-09-03) [2023-05-26]. https://arxiv.org/abs/2009.01674.
[24] SUN K, LIN Z C, ZHU Z X. Multi-stage self-supervised learning for graph convolutional networks on graphs with few labeled nodes[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New York: AAAI, 2020: 5892-5899.
[25] YANG X, DENG C, DANG Z Y, et al. Self-SAGCN: self-supervised semantic alignment for graph convolution network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 16770-16779.
[26] JIA S W, GAO L, GAO Y, et al. Defining and identifying cograph communities in complex networks[J]. New Journal of Physics, 2015, 17(1): 013044. doi: 10.1088/1367-2630/17/1/013044
[27] ZHANG P, MOORE C. Scalable detection of statistically significant communities and hierarchies, using message passing for modularity[J]. Proceedings of the National Academy of Sciences of the United States of America, 2014, 111(51): 18144-18149.
[28] FANUEL M, ALAÍZ C M, SUYKENS J A K. Magnetic eigenmaps for community detection in directed networks[J]. Physical Review E, 2017, 95(2-1): 022302.
[29] BLONDEL V D, GUILLAUME J L, LAMBIOTTE R, et al. Fast unfolding of communities in large networks[J]. Journal of Statistical Mechanics: Theory and Experiment, 2008, 2008(10): P10008. doi: 10.1088/1742-5468/2008/10/P10008
[30] HOLLAND P W, LASKEY K B, LEINHARDT S. Stochastic block models: first steps[J]. Social Networks, 1983, 5(2): 109-137. doi: 10.1016/0378-8733(83)90021-7
[31] LIU J H, MA G X, JIANG F, et al. Community-preserving graph convolutions for structural and functional joint embedding of brain networks[C]//Proceedings of the IEEE International Conference on Big Data. Piscataway: IEEE Press, 2019: 1163-1168.
[32] ZHANG T Q, XIONG Y, ZHANG J W, et al. CommDGI: community detection oriented deep graph infomax[C]//Proceedings of the ACM International Conference on Information & Knowledge Management. New York: ACM, 2020: 1843-1852.
[33] ZHAO H, YANG X, WANG Z, et al. Graph debiased contrastive learning with joint representation clustering[C]//Proceedings of the International Joint Conference on Artificial Intelligence. Montreal: IJCAI, 2021: 3434-3440.
[34] WANG X F, LI J H, YANG L, et al. Unsupervised learning for community detection in attributed networks based on graph convolutional network[J]. Neurocomputing, 2021, 456: 147-155. doi: 10.1016/j.neucom.2021.05.058
[35] QIAO F, HUANG C X, XU S, et al. Community detection model based on graph representation and self-supervised learning[C]//Proceedings of the Artificial Intelligence and Security. Berlin: Springer, 2021: 27-40.
[36] JIN W, DERR T, LIU H, et al. Self-supervised learning on graphs: deep insights and new direction[EB/OL]. (2020-06-17) [2023-05-20]. https://doi.org/10.48550/arXiv.2006.10141.
[37] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[38] KLICPERA J, BOJCHEVSKI A, GÜNNEMANN S. Predict then propagate: graph neural networks meet personalized PageRank[EB/OL]. (2022-04-05) [2023-05-28]. https://arxiv.org/abs/1810.05997.
[39] LIU M, GAO H Y, JI S W. Towards deeper graph neural networks[C]//Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York: ACM, 2020: 338-348.
-