In this paper, we report on the identification of document type using a k-dependence Bayesian categorization engine. In particular, we show that the use of font and capitalization as features improves precision and recall.
Index Terms:
text categorization, document classification, document type, OCR
Citation:
Kazem Taghva, Jason Vergara, "Feature Selection for Document Type Classification," Fifth International Conference on Information Technology: New Generations (ITNG 2008), pp. 179-182, 2008