DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones
Minneapolis, Minnesota, May 20-26, 2007
DOI: http://doi.ieeecomputersociety.org/10.1109/ICSE.2007.30
29th International Conference on Soft ...
Lingxiao Jiang, University of California, Davis, USA
Ghassan Misherghi, University of California, Davis, USA
Zhendong Su, University of California, Davis, USA
Stephane Glondu, ENS de Cachan, France
Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space ℝⁿ and an efficient algorithm to cluster these vectors with respect to the Euclidean distance metric. Subtrees whose vectors fall in the same cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java, including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent: it applies to any language with a formally specified grammar.
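The pipeline the abstract describes can be illustrated with a small Python sketch. It encodes each subtree as a numerical vector and then groups vectors that lie close together in Euclidean distance. This is not DECKARD's implementation, and it rests on several assumptions. Python's own `ast` module stands in for the C/Java parse trees used in the paper. The count of each AST node type in a subtree stands in for the paper's vector characterization. Only function-definition subtrees are compared. The paper's efficient clustering algorithm is replaced by a naive pairwise comparison against a hypothetical `threshold`. The names `characteristic_vector`, `euclidean`, `similar_functions`, `threshold`, and `min_size` are all illustrative.

```python
import ast
import math
from collections import Counter
from itertools import combinations


def characteristic_vector(node: ast.AST) -> Counter:
    """Count occurrences of each AST node type in the subtree rooted at `node`.

    Assumption: node-type counts approximate the paper's subtree vectors.
    Identifier names are ignored, so renaming variables leaves the vector unchanged.
    """
    return Counter(type(n).__name__ for n in ast.walk(node))


def euclidean(u: Counter, v: Counter) -> float:
    """Euclidean distance between two sparse vectors (missing keys count as 0)."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in keys))


def similar_functions(source: str, threshold: float = 2.0, min_size: int = 10):
    """Return pairs of function names whose vectors are within `threshold`.

    Naive O(n^2) pairwise comparison: a stand-in for the paper's clustering step.
    Subtrees with fewer than `min_size` nodes are skipped as too small to matter.
    """
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    vecs = [(f.name, characteristic_vector(f)) for f in funcs]
    vecs = [(name, v) for name, v in vecs if sum(v.values()) >= min_size]
    return [
        (a, b)
        for (a, va), (b, vb) in combinations(vecs, 2)
        if euclidean(va, vb) <= threshold
    ]


SAMPLE = '''
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s

def add_all(items):
    acc = 0
    for it in items:
        acc += it
    return acc

def greet(name):
    print("hello", name)
'''

if __name__ == "__main__":
    print(similar_functions(SAMPLE))
```

Node-type counts ignore identifiers, so `total` and `add_all` map to identical vectors (distance 0) and are reported as a pair, while `greet` is structurally far from both. This mirrors the robustness to minor edits that the abstract claims. The pairwise loop, however, is quadratic in the number of subtrees, and that cost is what an efficient clustering step over the vectors is meant to avoid on large code bases.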
Citation:
Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, Stephane Glondu, "DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones," 29th International Conference on Software Engineering (ICSE'07), pp. 96-105, 2007.