Controlling Overfitting in Software Quality Models: Experiments with Regression Trees and Classification
London, England, April 4-6, 2001
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/METRIC.2001.915528
Taghi M. Khoshgoftaar, Florida Atlantic University
Edward B. Allen, Mississippi State University
Jianyu Deng, Motorola Inc
In this era of "faster, cheaper, better" release cycles, software developers must focus enhancement efforts on the modules that need improvement most. Predicting which modules are likely to have faults during operations is an important tool for guiding such improvement efforts during maintenance. Tree-based models are attractive because they readily model nonmonotonic relationships between a response variable and its predictors. However, tree-based models are vulnerable to overfitting, in which the model reflects the structure of the training data set too closely. Even though an overfitted model appears accurate on the training data, it may be much less accurate when applied to a current data set. To account for the severe consequences of misclassifying fault-prone modules, our measure of overfitting is based on the expected cost of misclassification rather than on the total number of misclassifications. In this paper, we apply a regression-tree algorithm in the S-Plus system to the classification of software modules, using our classification rule, which accounts for the preferred balance between misclassification rates. We conducted a case study of a very large legacy telecommunications system and investigated two parameters of the regression-tree algorithm: minimum deviance and minimum node size. We found that minimum deviance was strongly related to overfitting and can be used to control it, whereas the effect of minimum node size on overfitting was ambiguous.
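The two ingredients the abstract relies on, a classification rule that trades off misclassification rates and an overfitting measure based on expected cost of misclassification (ECM) rather than error counts, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each regression-tree leaf yields the proportion p of fault-prone training modules, that a module is classified fault-prone when p/(1-p) >= c for a tuning constant c, that the Type I and Type II costs are hypothetical values, and that overfitting is gauged as test-set ECM minus training-set ECM. The helper names `classify`, `expected_cost`, and `overfitting_gap` are hypothetical. The tree itself (grown in S-Plus with its minimum-deviance and minimum-node-size settings) is not reproduced; leaf proportions are supplied directly.

```python
# Minimal sketch (hypothetical helper names): cost-sensitive classification of
# modules from tree-leaf proportions, plus an assumed ECM-based overfitting gap.

FP, NFP = "fp", "nfp"  # fault-prone / not fault-prone labels (assumed)


def classify(p_fp: float, c: float) -> str:
    """Assumed rule: label a module fault-prone when the odds
    p_fp / (1 - p_fp) of its leaf are at least c.

    Written as p_fp >= c * (1 - p_fp) to avoid dividing by zero when
    p_fp == 1. A larger c yields fewer fault-prone predictions, trading
    fewer Type I errors for more Type II errors.
    """
    return FP if p_fp >= c * (1.0 - p_fp) else NFP


def expected_cost(actual: list[str], predicted: list[str],
                  cost_type1: float, cost_type2: float) -> float:
    """Empirical expected cost of misclassification per module:

        ECM = (C_I * n_I + C_II * n_II) / n

    n_I  = not-fault-prone modules predicted fault-prone (Type I errors)
    n_II = fault-prone modules predicted not-fault-prone (Type II errors)
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    n_type1 = sum(1 for a, p in zip(actual, predicted) if a == NFP and p == FP)
    n_type2 = sum(1 for a, p in zip(actual, predicted) if a == FP and p == NFP)
    return (cost_type1 * n_type1 + cost_type2 * n_type2) / len(actual)


def overfitting_gap(train_ecm: float, test_ecm: float) -> float:
    """Assumed overfitting measure: how much the ECM grows on data the
    tree was not built from. Larger values suggest more overfitting."""
    return test_ecm - train_ecm


if __name__ == "__main__":
    C = 0.5                      # hypothetical threshold: p_fp >= 1/3 -> fault-prone
    COST_I, COST_II = 1.0, 10.0  # hypothetical costs: a missed fault-prone module costs 10x

    # Toy leaf proportions and true labels (illustrative only, not from the paper).
    train_p = [0.8, 0.1, 0.3, 0.6, 0.05]
    train_y = [FP, NFP, NFP, FP, NFP]
    test_p = [0.4, 0.2, 0.7, 0.35]
    test_y = [NFP, FP, FP, NFP]

    train_ecm = expected_cost(train_y, [classify(p, C) for p in train_p], COST_I, COST_II)
    test_ecm = expected_cost(test_y, [classify(p, C) for p in test_p], COST_I, COST_II)
    print(train_ecm, test_ecm, overfitting_gap(train_ecm, test_ecm))  # 0.0 3.0 3.0
```

With these toy numbers the classifier makes no errors on its training modules, while on the test modules it makes two Type I errors and one Type II error, giving ECM = (1*2 + 10*1)/4 = 3.0. A raw misclassification count (3 of 4 wrong) would not reflect that the single missed fault-prone module accounts for most of that cost.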
Index Terms:
software maintenance, software reliability, software metrics, fault-prone modules, regression trees, classification, overfitting, S-Plus
Citation:
Taghi M. Khoshgoftaar, Edward B. Allen, Jianyu Deng, "Controlling Overfitting in Software Quality Models: Experiments with Regression Trees and Classification," Seventh International Software Metrics Symposium (METRICS'01), p. 190, 2001.