A Probabilistic Approach to Source Code Authorship Identification
Las Vegas, Nevada, USA April 02-April 04
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ITNG.2007.17
International Conference on Informati ...
Jay Kothari, Drexel University
Maxim Shevertalov, Drexel University
Edward Stehle, Drexel University
Spiros Mancoridis, Drexel University
There exists a need for tools to help identify the authorship of source code. This includes situations in which the ownership of code is questionable, such as in plagiarism or intellectual property infringement disputes. Authorship identification can also be used to assist in the apprehension of the creators of malware. In this paper we present an approach to identifying the authors of source code. We begin by computing a set of metrics to build profiles for a population of known authors using code samples that are verified to be authentic. We then compute metrics on unidentified source code to determine the closest matching profile. We demonstrate our approach on a case study that involves two kinds of software: one based on open source developers working on various projects, and another based on students working on assignments with the same requirements. In our case study we are able to determine authorship with greater than 70% accuracy in choosing the single nearest match and greater than 90% accuracy in choosing the top three ordered nearest matches.
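The profile-and-match workflow described in the abstract can be illustrated with a minimal sketch. The abstract does not say which metrics are computed or how the closest profile is chosen, so everything specific here is an assumption made for illustration: character 4-gram counts stand in for the metrics, cosine similarity stands in for the matching rule, and the names `metric_vector`, `build_profile`, `cosine_similarity`, and `rank_authors` are hypothetical. It is not the authors' probabilistic method.

```python
from collections import Counter
from math import sqrt


def metric_vector(source: str, n: int = 4) -> Counter:
    """Count character n-grams in a code sample.

    Assumption: n-gram counts are a stand-in for the paper's metrics,
    which the abstract does not specify.
    """
    return Counter(source[i:i + n] for i in range(len(source) - n + 1))


def build_profile(samples: list[str], n: int = 4) -> Counter:
    """Sum the metric vectors of an author's verified samples."""
    profile = Counter()
    for sample in samples:
        profile.update(metric_vector(sample, n))
    return profile


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(count * count for count in a.values()))
    norm_b = sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_authors(profiles: dict[str, Counter], unknown: str,
                 top_k: int = 3, n: int = 4) -> list[str]:
    """Return up to top_k author names, nearest profile first."""
    vector = metric_vector(unknown, n)
    ranked = sorted(
        profiles,
        key=lambda author: cosine_similarity(profiles[author], vector),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy samples invented for this sketch; not data from the case study.
    profiles = {
        "alice": build_profile(["def add_items(a, b):\n    return a + b\n"]),
        "bob": build_profile(["def addItems( a,b ):\n\treturn a+b\n"]),
    }
    print(rank_authors(profiles, "def mul_items(a, b):\n    return a * b\n"))
```

Returning an ordered list rather than a single name mirrors the two evaluations the abstract reports: the single nearest match and the top three ordered nearest matches.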
Citation:
Jay Kothari, Maxim Shevertalov, Edward Stehle, Spiros Mancoridis, "A Probabilistic Approach to Source Code Authorship Identification," International Conference on Information Technology (ITNG'07), pp. 243-248, 2007.