Classifying Software Changes: Clean or Buggy?
This paper introduces a new technique for finding latent software bugs, called change classification. Change classification uses a machine learning classifier to determine whether a new software change is more similar to prior buggy changes or to prior clean changes. In this manner, change classification predicts the existence of bugs in software changes. The classifier is trained using features (in the machine learning sense) extracted from the revision history of a software project, as stored in its software configuration management repository. On average, the trained classifier can classify changes as buggy or clean with 78% accuracy and 65% buggy change recall. Change classification has several desirable qualities: (1) the prediction granularity is small (a change to a single file), (2) predictions do not require semantic information about the source code, (3) the technique works for a broad array of project types and programming languages, and (4) predictions can be made immediately upon completion of a change. Contributions of the paper include a description of the change classification approach, techniques for extracting features from source code and change histories, a characterization of the performance of change classification across 12 open source projects, and an evaluation of the predictive power of different groups of features.
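The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name a classifier or a feature set, so the sketch assumes a small multinomial Naive Bayes model over word tokens as a stand-in, and the class name, function names, and toy change texts are all hypothetical. It shows the two steps the abstract describes, training on labeled prior changes and classifying a new change, plus the two reported measures (accuracy and buggy change recall).

```python
import math
import re
from collections import Counter


def tokenize(change_text):
    """Split the text of a change into lowercase word tokens (assumed feature set)."""
    return re.findall(r"[A-Za-z_]+", change_text.lower())


class NaiveBayesChangeClassifier:
    """Stand-in classifier: multinomial Naive Bayes with add-one smoothing."""

    LABELS = ("buggy", "clean")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.change_counts = Counter()
        self.vocabulary = set()

    def train(self, labeled_changes):
        """labeled_changes: iterable of (change_text, label) pairs from history."""
        for text, label in labeled_changes:
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.change_counts[label] += 1
            self.vocabulary.update(tokens)

    def classify(self, change_text):
        """Return the label whose prior changes the new change most resembles."""
        total_changes = sum(self.change_counts.values())
        scores = {}
        for label in self.LABELS:
            # Log prior: fraction of training changes carrying this label.
            score = math.log(self.change_counts[label] / total_changes)
            denom = sum(self.token_counts[label].values()) + len(self.vocabulary)
            for token in tokenize(change_text):
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((self.token_counts[label][token] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def accuracy_and_buggy_recall(predicted, actual):
    """Accuracy over all changes; recall over the changes that are truly buggy."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    buggy_total = sum(a == "buggy" for a in actual)
    buggy_found = sum(p == a == "buggy" for p, a in zip(predicted, actual))
    recall = buggy_found / buggy_total if buggy_total else 0.0
    return correct / len(actual), recall


# Toy, made-up history; a real study would extract this from an SCM repository.
history = [
    ("unchecked cast raw pointer offset", "buggy"),
    ("raw pointer offset loop index", "buggy"),
    ("add unit test for parser", "clean"),
    ("update comment and unit test", "clean"),
]
classifier = NaiveBayesChangeClassifier()
classifier.train(history)
print(classifier.classify("raw pointer cast"))
```

Because classification needs only the tokens of the change itself, a prediction can be produced as soon as the change is complete, which matches quality (4) in the abstract. Any other text classifier could be substituted for the Naive Bayes stand-in without changing the surrounding steps.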
Index Terms:
Software maintenance, Metrics/Measurement, Clustering, classification, and association rules, Configuration Management, Data mining
Citation:
Sunghun Kim, E. James Whitehead, Jr., Yi Zhang, "Classifying Software Changes: Clean or Buggy?," IEEE Transactions on Software Engineering, vol. 34, no. 2, pp. 181-196, Mar./Apr. 2008, doi:10.1109/TSE.2007.70773