Ensemble Imputation Methods for Missing Software Engineering Data
Como, Italy September 19-September 22
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/METRICS.2005.21
11th IEEE International Software Metr ...
Bhekisipho Twala, Brunel University
Michelle Cartwright, Brunel University
One primary concern of software engineering is prediction accuracy. We use datasets to build and validate prediction systems, for example systems that predict software development effort. However, it is not uncommon for such datasets to contain missing values. When machine learning techniques are used to build these prediction systems, the handling of incomplete data is an important issue for classifier learning, since missing values in the training set, the test set, or both can affect prediction accuracy. Many works in machine learning and statistics have shown that combining individual classifiers into an ensemble is an effective technique for improving classification accuracy. We investigate the ensemble strategy in the context of incomplete data and software prediction. We propose BAMINNSI, an ensemble method built on two imputation methods: Bayesian multiple imputation and nearest neighbour single imputation. Strong results on two benchmark industrial datasets using decision trees support the method.
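The general idea of an imputation ensemble can be illustrated with a minimal Python sketch. This is not the BAMINNSI algorithm, whose details are not given in the abstract. It assumes a 1-nearest-neighbour single imputation, a Gaussian random draw as a crude stand-in for Bayesian multiple imputation, depth-1 decision stumps as a stand-in for decision trees, and majority voting. The toy dataset and every function name are hypothetical.

```python
import random
from collections import Counter

def distance(a, b):
    # Mean squared difference over the features observed in both rows.
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum((x - y) ** 2 for x, y in shared) / len(shared)

def knn_impute(rows, k=1):
    # Single imputation: fill each missing cell with the mean of that
    # feature over the k nearest rows in which the feature is observed.
    out = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [r for idx, r in enumerate(rows)
                      if idx != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            filled[j] = sum(r[j] for r in nearest) / len(nearest)
        out.append(filled)
    return out

def stochastic_impute(rows, rng):
    # ASSUMPTION: a Gaussian draw per column stands in for one draw of a
    # Bayesian multiple imputation; it is not a full Bayesian model.
    stats = []
    for col in zip(*rows):
        obs = [v for v in col if v is not None]
        mean = sum(obs) / len(obs)
        var = sum((v - mean) ** 2 for v in obs) / max(len(obs) - 1, 1)
        stats.append((mean, var ** 0.5))
    return [[rng.gauss(*stats[j]) if v is None else v
             for j, v in enumerate(row)] for row in rows]

def fit_stump(X, y):
    # Depth-1 decision tree: pick the single threshold split with the
    # fewest training errors (assumes some feature has 2+ distinct values).
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    _, j, t, l_lab, r_lab = best
    return lambda row: l_lab if row[j] <= t else r_lab

def fit_ensemble(X, y, m=5, seed=0):
    # One stump on the nearest-neighbour imputed data plus one stump per
    # stochastic imputation; predictions are combined by majority vote.
    rng = random.Random(seed)
    datasets = [knn_impute(X)] + [stochastic_impute(X, rng) for _ in range(m)]
    stumps = [fit_stump(d, y) for d in datasets]
    def predict(row):
        return Counter(s(row) for s in stumps).most_common(1)[0][0]
    return predict

# Hypothetical toy data: (size in KLOC, team size) -> effort class.
X = [[10, 3], [12, None], [15, 4], [None, 5],
     [60, 9], [None, 10], [70, None], [80, 12]]
y = ["low", "low", "low", "low", "high", "high", "high", "high"]

predict = fit_ensemble(X, y)
print(predict([11, 3]), predict([75, 11]))
```

Here each imputed copy of the training data yields its own classifier, so the vote averages over the uncertainty introduced by filling in the missing cells. The sketch only handles complete test rows; incomplete test rows would need to be imputed before prediction.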
Index Terms:
Machine learning, decision trees, incomplete data, imputation, ensemble, software prediction
Citation:
Bhekisipho Twala, Michelle Cartwright, "Ensemble Imputation Methods for Missing Software Engineering Data," metrics, pp.30, 11th IEEE International Software Metrics Symposium (METRICS'05), 2005