As object-oriented technologies have become widespread, the lack of computationally efficient and scalable approaches limits the ability to model and analyze the history of large object-oriented software systems. This paper proposes an approximate representation of object-oriented code characteristics, inspired by the centroids used for clustering in pattern recognition. One application of this representation is a linear-time algorithm for detecting duplicated or nearly duplicated code in object-oriented systems. The algorithm's accuracy and time complexity were assessed on 11 releases of a large software system, the Eclipse Framework.
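To illustrate how an approximate, centroid-like representation can make clone detection run in linear time, here is a minimal sketch. It is not the authors' algorithm: it assumes each method is already summarized as a vector of numeric metrics (for example statement count, call count, parameter count), and it swaps in a simple grid quantization plus hashing scheme. The names `quantize`, `clone_candidates`, and `cell_size` are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def quantize(metrics: Sequence[float], cell_size: float) -> Tuple[int, ...]:
    """Map a metric vector to the integer grid cell that contains it.

    The cell acts as a coarse stand-in for a cluster centroid: vectors that
    fall into the same cell are treated as similar. (Assumed scheme.)
    """
    return tuple(int(value // cell_size) for value in metrics)


def clone_candidates(
    methods: Dict[str, Sequence[float]], cell_size: float = 1.0
) -> List[List[str]]:
    """Group methods whose metric vectors share a grid cell.

    Each method is hashed exactly once, so the expected running time grows
    linearly with the number of methods instead of quadratically, as it
    would with pairwise comparison.
    """
    buckets = defaultdict(list)
    for name, metrics in methods.items():
        buckets[quantize(metrics, cell_size)].append(name)
    # Only cells holding two or more methods are clone candidates.
    return [sorted(names) for names in buckets.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical metric vectors: (statements, calls, parameters).
    sample = {
        "A.foo": (12, 3, 2),
        "B.foo": (13, 4, 2),
        "C.bar": (40, 9, 5),
    }
    print(clone_candidates(sample, cell_size=5.0))
```

With `cell_size=1.0` and integer metrics, only methods with identical vectors collide; a larger cell also groups near duplicates. One known limitation of this simplified scheme is that two close vectors can land on opposite sides of a cell boundary and be missed, so the output is best read as a list of candidates rather than a definitive answer.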
Index Terms:
Object-Oriented Software Evolution, Clone Detection, Source Code Analysis
Citation:
E. Merlo, G. Antoniol, M. Di Penta, V. F. Rollo, "Linear Complexity Object-Oriented Similarity for Clone Detection and Software Evolution Analyses," 20th IEEE International Conference on Software Maintenance (ICSM'04), pp. 412-416, 2004.