Problems Creating Task-relevant Clone Detection Reference Data
Victoria, B.C., Canada, November 13-17
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/WCRE.2003.1287259
10th Working Conference on Reverse En ...
Andrew Walenstein, University of Louisiana at Lafayette
Nitin Jyoti, University of Louisiana at Lafayette
Junwei Li, University of Louisiana at Lafayette
Yun Yang, University of Louisiana at Lafayette
Arun Lakhotia, University of Louisiana at Lafayette
One prevalent method for evaluating the results of automated software analysis tools is to compare the tools' output to the judgment of human experts. This evaluation strategy is commonly assumed in the field of software clone detector research. We report our experiences from a study using several human judges who tried to establish "reference sets" of function clones for several medium-sized software systems written in C. The study employed multiple judges and followed a process typical for inter-coder reliability assurance, wherein coders discussed classification discrepancies until consensus was reached. A high level of disagreement was found for reference sets made specifically for reengineering task contexts. The results, although preliminary, raise questions about limitations of prior clone detector evaluations and other similar tool evaluations. Implications are drawn for future work on reference data generation, tool evaluations, and benchmarking efforts.
Citation:
Andrew Walenstein, Nitin Jyoti, Junwei Li, Yun Yang, Arun Lakhotia, "Problems Creating Task-relevant Clone Detection Reference Data," wcre, pp.285, 10th Working Conference on Reverse Engineering (WCRE 2003), 2003