loading...
Is random model better? On its accuracy and efficiency
Melbourne, Florida November 19-November 22
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2003.1250902Third IEEE International Conference o ...
 This Article 
 
PDF
HTML
 
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Wei Fan, IBM T.J.Watson Research
Haixun Wang, IBM T.J.Watson Research
Philip S. Yu, IBM T.J.Watson Research
Sheng Ma, IBM T.J.Watson Research
Inductive learning searches an optimal hypothesis that minimizes a given loss function. It is usually assumed that the simplest hypothesis that fits the data is the best approximate to an optimal hypothesis. Since finding the simplest hypothesis is NP-hard for most representations, we generally employ various heuristics to search its closest match. Computing these heuristics incurs significant cost, making learning inefficient and unscalable for large dataset. In the same time, it is still questionable if the simplest hypothesis is indeed the closest approximate to the optimal model. Recent success of combining multiple models, such as bagging, boosting and meta-learning, has greatly improved the accuracy of the simplest hypothesis, providing a strong argument against the optimality of the simplest hypothesis. However, computing these combined hypotheses incurs significantly higher cost. In this paper, we first advert that as long as the error of a hypothesis on each example is within a range dictated by a given loss function, it can still be optimal. Contrary to common beliefs, we propose a completely random decision tree algorithm that achieves much higher accuracy than the single best hypothesis and is comparable to boosted or bagged multiple best hypotheses. The advantage of multiple random tree is its training efficiency as well as minimal memory requirement.
Citation:
Wei Fan, Haixun Wang, Philip S. Yu, Sheng Ma, "Is random model better? On its accuracy and efficiency," icdm, pp.51, Third IEEE International Conference on Data Mining (ICDM'03), 2003
Usage of this product signifies your acceptance of the Terms of Use.