Two-phase software clustering method based on complex network theory
-
摘要: 将复杂网络社区检测中的GN(Girvan-Newman)算法引入到软件聚类中,针对GN算法中存在的计算量大、可能产生小规模社区的缺陷,提出了一种二阶段聚类方法.首先基于结构模式对软件网络进行聚类.通过识别和聚类软件网络中3种常见的结构模式:卫星结构、链结构和拓扑相似结构,可以有效地减小网络规模.其次,在限制模块大小的前提下利用改进的GN算法进行聚类.如果介数最大边的删除会导致生成的社区规模小于预定值,那么放弃删除该边,转而尝试介数次大的边.实验结果表明:二阶段聚类算法可以有效地改善软件聚类效果,提高现有社区划分算法在大规模软件中的适用性.Abstract: GN(Girvan-Newman) algorithm, a famous community detection algorithm, is introduced into software clustering. In order to overtake the weakness of high computation complexity and avoid generating small scale modules, a two-phase software clustering method is proposed. Firstly, cluster software based on its structure pattern. 3 structure patterns are identified, including: star structure, link structure and topology similarity structure. Cluster these structure patterns could efficiently reduce the scale of software network. Secondly, use modified GN algorithm to cluster software. If the remove of the edge with maximal betweenness would produce a module whose scale is smaller than the value set in advance,this remove action is forbidden. The edge with secondly maximal betweenness is tried. The experiment results show that the two-phase clustering algorithms can improve the effect of software clustering and be applied in the large-scale software.
-
Key words:
- legacy system /
- reverse engineering /
- reengineering
-
[1] Tzerpos V,Holt R C.ACDC: an algorithm for comprehension-driven clustering 7th Conference on Reverse Engineering (WCRE-2000).Brisbane,Australia: IEEE,2000: 258-267 [2] Mitchell B S,Mancoridis S.On the automatic modularization of software systems using the Bunch tool[J].Transactions on Software Engineering,2006,32(3): 193-208 [3] Andritsos P,Tzerpos V.Software clustering based on information loss minimization 10th Working Conference on Reverse Engineering(WCRE-2003).Victoria,BC,Canada: IEEE,2003: 334-344 [4] Wu J,Hassan A E,Holt R C.Comparison of clustering algorithms in the context of software evolution 21st IEEE International Conference on Software Maintenance(ICSM 2005).Budapest,Hungary: IEEE,2005: 525-535 [5] 李兵,马于涛,刘婧,等.软件系统的复杂网络研究进展[J].力学进展, 2008,38(6): 805-814 Li Bing,Ma Yutao,Liu Jing,et al.Advances in the studies on complex networks of software systems [J].Advances in Mechanics,2008,38(6): 805-814(in Chinese) [6] Valverde S,Cancho R F,Sole R V.Scale-free networks from optimal design[J].Europhysics Letters (EPL),2002,60(4):512-517 [7] Girvan M,Newman M E J.Community structure in social and biological networks[J].Proceedings of the National Academy of Sciences of the United States of America,2002,99(12):7821-7826 [8] Newman M E,Girvan M.Finding and evaluating community structure in networks[J].Physical Review E,2004,69(2):026113-1-026113-15 [9] Tomcat . .http://tomcat.apache.org/ [10] Jedit . .http://www.jedit.org/ [11] Shtern M,Tzerpos V.A framework for the comparison of nested software decompositions 11th Working Conference on Reverse Engineering (WCRE-2004).Delft,Netherlands: IEEE,2004: 284-292 [12] Tzerpos V,Holt R C.MoJo: a distance metric for software clusterings 6th Working Conference on Reverse Engineering (WCRE-1999).Atlanta,GA,USA: IEEE,1999: 187-193 [13] El-Ramly M,Iglinski P,Stroulia E,et al.Modeling the system-user dialog using interaction traces 8th Working Conference on Reverse Engineering (WCRE-2001).Stuttgart: IEEE,2001: 208-217 [14] Wen Z,Tzerpos V.Evaluating similarity measures for software decompositions 20th IEEE International Conference on Software Maintenance(ICSM-2004).Chicago,IL,USA: IEEE,2004: 368-377 [15] Wen Z,Tzerpos V.An effectiveness measure for software clustering algorithms 12th IEEE International Workshops on Program Comprehension(IWPC-2004).Bari,Italy: IEEE,2004: 194-203
点击查看大图
计量
- 文章访问数: 3197
- HTML全文浏览量: 32
- PDF下载量: 1755
- 被引次数: 0