Combining Subclassifiers in Text Categorization: A DST-Based Solution and a Case Study
|
Text categorization systems often induce document classifiers from pre-classified examples by the use of machine learning techniques. The circumstance that each example-document can belong to many different classes often leads to impractically high computational costs that sometimes grow exponentially in the number of features. Looking for ways to reduce these costs, we explored the possibility of running a ``baseline induction algorithm'' separately for subsets of features, obtaining a set of classifiers to be combined. For the specific case of classifiers that return not only class labels but also confidences in these labels, we investigate here a few alternative fusion techniques, including our own mechanism that was inspired by the Dempster-Shafer Theory. The paper describes the algorithm and, in our specific case study, compares its performance to that of more traditional mechanisms.
[1] 1638 D.D. Lewis, R.E. Schapire, J.P. Callan, and R. Papka, “Training Algorithms for Linear Text Classifiers,” Proc. ACM SIGIR '96, pp.298-306, 1996.
[2] Y. Yang and C.G. Chute, “An Example-Based Mapping Method for Text Categorization and Retrieval,” ACM Trans. Information Systems, vol. 12, no. 3, pp. 252-277, July 1994.
[3] S. Eyheramendy, D.D. Lewis, and D. Madigan, “On the Naive Bayes Model for Text Categorization,” Proc. Ninth Int'l Workshop Artificial Intelligence and Statistics (AI & Statistics '03), citeseer. ist.psu.edu/articleeyheramendy03naive.html , 2003.
[4] N. Friedman, D. Geiger, and M. Goldszmidt, “Bayesian Network Classifiers,” Machine Learning, vol. 29, nos. 2-3, pp. 131-163, citeseer.ist.psu.edufriedman97bayesian.html , 1997.
[5] P. Langley, W. Iba, and K. Thompson, “An Analysis of Bayesian Classifiers,” Proc. 10th Nat'l Conf. Artificial Intelligence, pp. 223-228, citeseer.ist.psu.edu/articlelangley92analysis.html , 1992.
[6] C. Apte and S. Weiss, “Data Mining with Decision Trees and Decision Rules,” Future Generation Computer Systems, vol. 13, pp.197-210, , Nov. 1997.
[7] S.M. Weiss, C. Apt, F.J. Damerau, D.E. Johnson, F.J. Oles, T. Goetz, and T. Hampp, “Maximizing Text-Mining Performance,” IEEE Intelligent Systems, vol. 14, no. 4, pp. 63-69, July-Aug. 1999.
[8] B. Li, Q. Lu, and S. Yu, “An Adaptive $k$ -Nearest Neighbor Text Categorization Strategy,” ACM Trans. Asian Language Information Processing, vol. 3, pp. 215-226, 2004.
[9] M.E. Ruiz and P. Srinivasan, “Automatic Text Categorization Using Neural Networks,” Advances in Classification Research: Proc. Classification Research Workshop (ASIS SIG/CR), vol. 8, pp. 59-72, Oct. 1998.
[10] M.E. Ruiz and P. Srinivasan, “Hierarchical Text Categorization Using Neural Networks,” Information Retrieval, vol. 5, no. 1, pp. 87-118, citeseer.ist.psu.eduruiz02hierarchical.html , 2002.
[11] E.D. Wiener, J.O. Pedersen, and A.S. Weigend, “A Neural Network Approach to Topic Spotting,” Proc. Fourth Ann. Symp. Document Analysis and Information Retrieval (SDAIR '95), pp. 317-332, citeseer.ist.psu.eduwiener95neural.html, 1995.
[12] T. Joachims, “Text Categorization with Support Vector Machines: Learning with Many Relevant Features,” Proc. Ninth European Conf. Machine Learning (ECML '98), pp. 137-142, citeseer.ist.psu. edu/articlejoachims98text.html , 1998.
[13] J.T. Kwok, “Automated Text Categorization Using Support Vector Machine,” Proc. Fifth Int'l Conf. Neural Information Processing (ICONIP '98), pp. 347-351, citeseer.ist.psu.edukwok98automated. html , 1998.
[14] Y. Freund, “Boosting a Weak Learning Algorithm by Majority,” Information and Computation, vol. 121, no. 2, pp. 256-285, Sept. 1995.
[15] Y. Freund and R.E. Schapire, “Experiments with a New Boosting Algorithm,” Proc. 13th Int'l Conf. Machine Learning (ICML '96), pp.148-156, citeseer.ist.psu.edufreund96experiments.html , 1996.
[16] R.E. Schapire, “The Strength of Weak Learnability,” Machine Learning, vol. 5, no. 2, pp. 197-227, 1990.
[17] R.E. Schapire and Y. Singer, “Improved Boosting Using Confidence-Rated Predictions,” Machine Learning, vol. 37, no. 3, pp.297-336, citeseer.ist.psu.eduschapire99improved.html , 1999.
[18] F. Sebastiani, “Machine Learning in Automated Text Categorization,” ACM Computing Surveys, vol. 34, no. 1, pp. 1-47, 2002.
[19] F. Sebastiani, “Text Categorization,” Text Mining and Its Applications, A. Zanasi, ed., pp. 109-129, 2005.
[20] European Communities, http://europa.eu.int/celexeurovoc, 2005.
[21] Institutional JRC-IPSC Exploratory Research Project, http://www.jrc.cec.eu.int/langtecheurovoc.html , 2005.
[22] Y. Freund and L. Mason, “The Alternating Decision Tree Learning Algorithm,” Proc. 16th Int'l Conf. Machine Learning (ICML '99), pp.124-133, citeseer.ist.psu.edufreund99alternating.html , 1999.
[23] R.E. Schapire and Y. Singer, “BoosTexter: A Boosting-Based System for Text Categorization,” Machine Learning, vol. 39, nos.2/3, pp. 135-168, citeseer.ist.psu.eduschapire00boostexter. html , 2000.
[24] A. McCallum and K. Nigam, “A Comparison of Event Models for Naive Bayes Text Classification,” Proc. Workshop Learning for Text Categorization (AAAI '98), citeseer.ist.psu.edumccallum98com parison.html , 1998.
[25] D. Vilar, M.J. Castro, and E. Sanchis, “Multi-Label Text Classification Using Multinomial Models,” Proc. Fourth Int'l Conf. España for Natural Language Processing (EsTAL '04), Oct. 2004.
[26] S. Gao, W. Wu, C.-H. Lee, and T.-S. Chua, “A MFoM Learning Approach to Robust Multiclass Multi-Label Text Categorization,” Proc. 21st Int'l Conf. Machine Learning (ICML '04), pp. 329-336, 2004.
[27] S. Godbole and S. Sarawagi, “Discriminative Methods for Multi-Labeled Classification,” Proc. Eighth Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD '04), pp. 22-30, 2004.
[28] S. Zhu, X. Ji, W. Xu, and Y. Gong, “Multi-Labelled Classification Using Maximum Entropy Method,” Proc. ACM SIGIR '05, pp. 274-281, 2005.
[29] Y. Freund and R.E. Schapire, “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting,” J.Computer and System Sciences, vol. 55, no. 1, pp. 119-139, Aug. 1997.
[30] G. Holmes, B. Pfahringer, R. Kirkby, E. Frank, and M. Hall, “Multiclass Alternating Decision Trees,” Proc. 13th European Conf. Machine Learning (ECML '02), citeseer.ist.psu.eduholmes02multiclass.html , 2002.
[31] F. De Comité, R. Gilleron, and M. Tommasi, “Learning Multi-Label Alternating Decision Trees and Applications,” Proc. Conf. Apprentissage Automatique (CAP '01), pp. 195-210, 2001.
[32] F. De Comité, R. Gilleron, and M. Tommasi, “Learning Multi-Label Alternating Decision Trees from Texts and Data,” Proc. Third Int'l Conf. Machine Learning and Data Mining (MLDM '03), pp. 35-49, 2003.
[33] H. Chen and T.K. Ho, “Evaluation of Decision Forests on Text Categorization,” Proc. Seventh SPIE Conf. Document Recognition and Retrieval, pp. 191-199, citeseer.ist.psu.educhen00evaluation.html , 2000.
[34] L. Diao, K. Hu, Y. Lu, and C. Shi, “Boosting Simple Decision Trees with Bayesian Learning for Text Categorization,” Proc. World Congress on Intelligent Control and Automation, pp. 321-325, 2002.
[35] J. Chen, X. Zhou, and Z. Wu, “A Multi-Label Chinese Text Categorization System Based on Boosting Algorithm,” Proc. IEEE Int'l Conf. Computer and Information Technology (CIT '04), pp. 1153-1158, 2004.
[36] F. Roli and G. Giacinto, Design of Multiple Classifier Systems, pp.199-226, citeseer.ist.psu.edu552125.html, 2002.
[37] A.F.R. Rahman, H. Alam, and M.C. Fairhurst, “Multiple Classifier Combination for Character Recognition: Revisiting the Majority Voting System and Its Variations,” p. 167, http://link.springer-ny.com/link/service/ series/0558/bibs/242324230167.htm; http://link.springer-ny.com/link/service/ series/0558/bibs/2423/24230167.htmhttp:/ /link.springer-ny.com/link/service/series/ 0558/papers/242324230167.pdf, 2002.
[38] J. Fürnkranz, “Hyperlink Ensembles: A Case Study in Hypertext Classification,” Information Fusion, vol. 3, no. 4, pp. 299-312, , Dec. 2002.
[39] V.S. Uren and T.R. Addis, “How Weak Categorizers Based upon Different Principles Strengthen Performance,” The Computer J., vol. 45, no. 5, pp. 511-524, 2002.
[40] P.N. Bennett, S.T. Dumais, and E. Horvitz, “The Combination of Text Classifiers Using Reliability Indicators,” Information Retrieval, vol. 8, no. 1, pp. 67-100, 2005.
[41] G. Shafer, A Mathematical Theory of Evidence. Princeton Univ. Press, 1976.
[42] D. Bahler and L. Navarro, “Methods for Combining Heterogeneous Sets of Classifiers,” Proc. Nat'l Conf. Artificial Intelligence (AAAI) Workshop New Research Problems for Machine Learning, citeseer.ist.psu.edu470241.html, 2000.
[43] A. Al-Ani and M. Deriche, “A New Technique for Combining Multiple Classifiers Using the Dempster-Shafer Theory of Evidence,” J. Artificial Intelligence Research, vol. 17, pp. 333-361, Nov. 2002.
[44] Y. Bi, D. Bell, H. Wang, G. Guo, and K. Greer, “Combining Multiple Classifiers Using Dempster's Rule of Combination for Text Categorization,” Proc. First Int'l Conf. Modeling Decisions for Artificial Intelligence (MDAI '04), pp. 127-138, 2004.
[45] Y. Bi, S. McClean, and T. Anderson, “Improving Classification Decisions by Multiple Knowledge,” Proc. 17th IEEE Int'l Conf. Tools with Artificial Intelligence (ICTAI '05), pp. 340-347, 2005.
[46] D.A. Bell, J. Guan, and Y. Bi, “On Combining Classifier Mass Functions for Text Categorization,” IEEE Trans. Knowledge and Data Eng., vol. 17, no. 10, pp. 1307-1319, Oct. 2005.
[47] H. Altnçay, “A Dempster-Shafer Theoretic Framework for Boosting-Based Ensemble Design,” Pattern Analysis and Applications, vol. 8, no. 3, pp. 287-302, Dec. 2005.
[48] C.J. van Rijsbergen, Information Retrieval, second ed. Butterworths, http://www.dcs.gla.ac.uk/∼iain/keith index.htm, 1979.
[49] Y. Yang, “An Evaluation of Statistical Approaches to Text Categorization,” Information Retrieval, vol. 1, nos. 1/2, pp. 69-90, citeseer.ist.psu.edu/articleyang98evaluation.html , 1999.
[50] X. Shen, M. Boutell, J. Luo, and C. Brown, “Multi-Label Machine Learning and Its Application to Semantic Scene Classification,” Proc. Int'l Symp. Electronic Imaging, citeseer.ist.psu.edu699302.html, Jan. 2004.
[51] M.-L. Zhang and Z.-H. Zhou, “A $k$ -Nearest Neighbor Based Algorithm for Multi-Label Classification,” Proc. First IEEE Int'l Conf. Granular Computing (GrC '05), vol. 2, pp. 718-721, , July 2005.
[52] S. Blackman and R. Popoli, Design and Analysis of Modern Tracking Systems. Artech House, 1999.
[53] Y. Yang and J.O. Pedersen, “A Comparative Study on Feature Selection in Text Categorization,” Proc. 14th Int'l Conf. Machine Learning (ICML '97), pp. 412-420, citeseer.ist.psu.eduyang97 comparative.html , 1997.
Index Terms:
Machine Learning, text categorization, multi-label examples, data fusion, Dempster-Shafer Theory.
Citation:
Kanoksri Sarinnapakorn, Miroslav Kubat, "Combining Subclassifiers in Text Categorization: A DST-Based Solution and a Case Study," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 12, pp. 1638-1651, Aug. 2007, doi:10.1109/TKDE.2007.190663