A parallel genetic programming approach to inducing decision trees from large data sets is presented. A population of trees is evolved using genetic operators, and each individual is evaluated with a fitness function based on the J-measure. The method can handle large data sets because it uses a parallel implementation of genetic programming through the grid model. Experiments on data sets from the UCI machine learning repository show better results than C5. Furthermore, performance results show a nearly linear speedup.
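The abstract does not give the exact form of the fitness function, so the following is only a minimal sketch of how a J-measure-based score could be computed for a candidate tree. It uses the standard J-measure of a rule, J = p(y) * sum over classes c of p(c|y) * log2(p(c|y) / p(c)), where y is the condition along a root-to-leaf path. The function names `j_measure` and `tree_fitness`, the representation of a tree as lists of class labels per leaf, and the choice to sum the J-measure over all leaves are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def j_measure(leaf_labels, all_labels):
    """J-measure of one rule (one leaf): p(y) * sum_c p(c|y) * log2(p(c|y) / p(c)).

    leaf_labels: class labels of the training examples reaching the leaf
                 (must be a subset of all_labels).
    all_labels:  class labels of the whole training set.
    """
    n_total = len(all_labels)
    n_leaf = len(leaf_labels)
    if n_leaf == 0:
        return 0.0  # a leaf that covers no examples carries no information

    prior = Counter(all_labels)       # class counts in the whole data set
    posterior = Counter(leaf_labels)  # class counts inside the leaf

    p_y = n_leaf / n_total            # probability that the rule fires
    info = 0.0
    for cls, count in posterior.items():
        p_c_given_y = count / n_leaf
        p_c = prior[cls] / n_total
        info += p_c_given_y * math.log2(p_c_given_y / p_c)
    return p_y * info


def tree_fitness(leaves, all_labels):
    """Hypothetical tree score: sum of the J-measures of its leaves (assumed aggregation)."""
    return sum(j_measure(leaf, all_labels) for leaf in leaves)


if __name__ == "__main__":
    labels = ["a", "a", "b", "b"]
    pure_split = [["a", "a"], ["b", "b"]]   # each leaf holds a single class
    mixed_split = [["a", "b"], ["a", "b"]]  # leaves mirror the class prior
    print(tree_fitness(pure_split, labels))
    print(tree_fitness(mixed_split, labels))
```

Under this assumed aggregation, a tree whose leaves separate the classes scores higher than one whose leaves reproduce the prior class distribution, which is the kind of preference a genetic programming search would need from its fitness function.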
Index Terms:
Decision trees, genetic programming, classification, parallel processing
Citation:
G. Folino, C. Pizzuti, G. Spezzano, "Improving Induction Decision Trees with Parallel Genetic Programming," 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing (EUROMICRO-PDP 2002), p. 181, 2002