Identification method for defect-introducing fine-grained software changes
-
Abstract: During software development, defects enter a software system through changes. To improve the efficiency of defect detection and reduce the cost of manual inspection, a method is proposed for automatically identifying defect-introducing fine-grained changes. The method follows a machine learning classification approach: it treats each fine-grained change as a classification instance and constructs a feature set from five dimensions of the change, namely time, context (location), content, purpose, and implementer. Fine-grained change instances are built automatically by mining software history repositories with a combination of static program analysis and natural language semantic analysis. A classifier trained on the change instances in the software history then identifies whether a new fine-grained change introduces a defect. The method was validated on real software systems using a cost-effectiveness evaluation strategy. The results indicate that, compared with methods that identify defect-introducing changes at the file and transaction levels, this method significantly reduces the manual inspection cost.
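To make the cost-effectiveness idea concrete, the following is a minimal sketch, not the evaluation procedure used in the paper. It assumes a reviewer inspects changes in descending order of predicted risk until a fixed share of all changed lines has been read, and it reports the share of defect-introducing changes found by then. The `Change` record, its fields, and `recall_at_budget` are hypothetical names introduced for illustration, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class Change:
    risk: float   # predicted probability of introducing a defect (assumed classifier output)
    lines: int    # lines a reviewer must read to inspect this change
    buggy: bool   # ground-truth label: did the change introduce a defect?


def recall_at_budget(changes, budget_fraction):
    """Share of defect-introducing changes found when the riskiest changes
    are inspected first and inspection stops at the first change that
    would exceed the line budget."""
    total_lines = sum(c.lines for c in changes)
    total_buggy = sum(c.buggy for c in changes)
    if total_lines == 0 or total_buggy == 0:
        return 0.0
    budget = budget_fraction * total_lines
    spent = 0
    found = 0
    for c in sorted(changes, key=lambda c: c.risk, reverse=True):
        if spent + c.lines > budget:
            break  # simplifying assumption: no skipping ahead to smaller changes
        spent += c.lines
        found += c.buggy
    return found / total_buggy


# Illustrative data only: two small risky changes and two large clean ones.
changes = [
    Change(risk=0.9, lines=10, buggy=True),
    Change(risk=0.8, lines=40, buggy=False),
    Change(risk=0.7, lines=10, buggy=True),
    Change(risk=0.2, lines=40, buggy=False),
]
print(recall_at_budget(changes, 0.25))
print(recall_at_budget(changes, 0.75))
```

Under this assumed measure, smaller inspection units let the same line budget cover more of the risky changes, which is the intuition behind comparing fine-grained changes with file and transaction level changes.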