Plagiarism Detection in arXiv
Hong Kong December 18-December 22
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2006.126
Sixth IEEE International Conference o ...
Daria Sorokina, Cornell University, USA
Johannes Gehrke, Cornell University, USA
Simeon Warner, Cornell University, USA
Paul Ginsparg, Cornell University, USA
We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger.
Citation:
Daria Sorokina, Johannes Gehrke, Simeon Warner, Paul Ginsparg, "Plagiarism Detection in arXiv," Sixth IEEE International Conference on Data Mining (ICDM'06), pp. 1070-1075, 2006