An Investigation of Analysis Techniques for Software Datasets
Boca Raton, Florida November 04-November 06
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/METRIC.1999.809734
Sixth International Software Metrics ...
Lesley Pickard, Keele University
Barbara Kitchenham, Keele University
Susan Linkman, Keele University
The goal of this study was to investigate the efficacy of different data analysis techniques for software data. We used simulation to create datasets with a known underlying model and with non-Normal characteristics that are frequently found in software datasets: skewness, unstable variance, outliers, and combinations of these characteristics. We investigated three main statistically based data analysis techniques: Residual Analysis, Multivariate Regression, and Classification and Regression Trees (CART). In addition to the standard 'Least Squares' version of each technique, we also investigated robust and non-parametric versions. We found that standard multivariate regression techniques were best if the data exhibited only skewness. However, under more extreme conditions such as severe heteroscedasticity, the non-parametric residual analysis technique performed best. We also found that even when the analysis technique did not accurately recreate the true underlying model, the faulty model could generate reasonably good predictions. This study indicates that simulation is a very useful technique for evaluating different data analysis techniques.
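The simulation approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not give their distributions, parameter values, or estimators, so everything below is an assumption chosen for illustration. The predictor is drawn from a log-normal distribution (skewness), the error standard deviation grows with the predictor (unstable variance), and a small fraction of responses are shifted (outliers). The Theil-Sen median-of-slopes estimator stands in as one familiar non-parametric alternative to least squares; the function names `simulate`, `least_squares_slope`, and `theil_sen_slope` are hypothetical.

```python
import random
import statistics


def simulate(n, slope, intercept, seed, outlier_fraction=0.05):
    """Generate (x, y) pairs from a known linear model.

    Assumed, illustrative data characteristics:
    right-skewed x, error variance that grows with x, and a few outliers.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.lognormvariate(0.0, 0.75)      # right-skewed predictor
        noise = rng.gauss(0.0, 0.5 * x)        # heteroscedastic error
        y = intercept + slope * x + noise
        if rng.random() < outlier_fraction:    # occasional gross outlier
            y += 20.0
        xs.append(x)
        ys.append(y)
    return xs, ys


def least_squares_slope(xs, ys):
    """Ordinary least squares slope for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def theil_sen_slope(xs, ys):
    """Non-parametric slope: median of all pairwise slopes."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    ]
    return statistics.median(slopes)


TRUE_SLOPE = 2.0  # the known underlying model that each technique should recover
xs, ys = simulate(n=200, slope=TRUE_SLOPE, intercept=1.0, seed=1)
print(f"true slope:    {TRUE_SLOPE:.2f}")
print(f"least squares: {least_squares_slope(xs, ys):.2f}")
print(f"Theil-Sen:     {theil_sen_slope(xs, ys):.2f}")
```

Because the true slope is known, each estimate can be compared directly against it, and repeating the run over many seeds and data conditions would give a picture of how each technique behaves on average.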
Index Terms:
Data Analysis Techniques; Software Datasets; Analysis Technique Evaluation
Citation:
Lesley Pickard, Barbara Kitchenham, Susan Linkman, "An Investigation of Analysis Techniques for Software Datasets," metrics, pp.130, Sixth International Software Metrics Symposium (METRICS'99), 1999