In this paper, we propose a novel technique for efficiently predicting multiple continuous target variables from high-dimensional, heterogeneous data sets using a hierarchical clustering approach. The approach consists of three steps applied recursively: partitioning, localization, and prediction. In the partitioning step, a clustering algorithm groups similar target variables together. In the localization step, a classification model predicts which group of target variables is of particular interest. If the identified group still contains a large number of target variables, the partitioning and localization steps are repeated recursively, and the group is split further into subgroups of more similar target variables. Once the number of target variables in the identified subgroup is sufficiently small, the prediction step predicts them using localized prediction models built from only those data records that correspond to that subgroup. Experiments on the problem of damage prediction in complex mechanical structures indicate that the proposed hierarchical approach is computationally more efficient and more accurate than the straightforward alternatives of predicting each target variable individually or all of them simultaneously with global prediction models.
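The recursive partitioning, localization, and prediction steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, the classifier, or the local predictors, so the sketch assumes a tiny 2-means clustering over target columns, a nearest-centroid classifier, and a 1-nearest-neighbour local predictor as stand-ins. It also assumes that the group "of particular interest" for a record is the one containing its largest-magnitude target. All names (`two_means`, `build`, `predict`, `max_targets`) and the toy data are hypothetical.

```python
from statistics import mean

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(points, iters=20):
    """Partitioning stand-in: split points into two groups (deterministic 2-means)."""
    # Seed with the first point and the point farthest from it.
    centers = [points[0], max(points, key=lambda p: sqdist(p, points[0]))]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min((0, 1), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in (0, 1):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels

def build(X, Y, targets, max_targets=2):
    """Recursively partition the targets and localize records to a group."""
    # Each target variable is represented by its column of values over the records.
    cols = [[row[t] for row in Y] for t in targets]
    labels = two_means(cols) if len(targets) > max_targets else [0] * len(targets)
    groups = [[t for t, l in zip(targets, labels) if l == g] for g in (0, 1)]
    # Assumed localization label: the group holding the record's largest |target|.
    rec = [0 if max(targets, key=lambda t: abs(row[t])) in groups[0] else 1
           for row in Y]
    parts = [[i for i, r in enumerate(rec) if r == g] for g in (0, 1)]
    if not all(groups) or not all(parts):
        # Few enough targets (or no usable split): keep the local records in a leaf.
        return {"leaf": True, "X": X, "Y": Y, "targets": targets}
    centroids, children = [], []
    for g in (0, 1):
        Xg = [X[i] for i in parts[g]]
        Yg = [Y[i] for i in parts[g]]
        centroids.append([mean(col) for col in zip(*Xg)])  # nearest-centroid classifier
        children.append(build(Xg, Yg, groups[g], max_targets))
    return {"leaf": False, "centroids": centroids, "children": children}

def predict(node, x):
    """Descend to the localized subgroup, then predict only its targets."""
    while not node["leaf"]:
        g = min((0, 1), key=lambda c: sqdist(x, node["centroids"][c]))
        node = node["children"][g]
    # Local 1-nearest-neighbour predictor over this subgroup's records only.
    i = min(range(len(node["X"])), key=lambda j: sqdist(x, node["X"][j]))
    return {t: node["Y"][i][t] for t in node["targets"]}

# Toy data (made up): 4 records, 2 features, 4 continuous targets.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
Y = [[0.9, 0.7, 0.0, 0.1],
     [0.6, 0.8, 0.1, 0.0],
     [0.0, 0.1, 0.7, 0.5],
     [0.1, 0.0, 0.4, 0.6]]

tree = build(X, Y, targets=[0, 1, 2, 3])
print(predict(tree, [0.2, 0.9]))
print(predict(tree, [5.1, 5.0]))
```

Each call returns predictions only for the targets in the subgroup the record was localized to, and each leaf stores only the records routed to it. The local predictors are therefore built from a fraction of the data, which is one plausible reading of where the reported efficiency gain comes from.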
Citation:
Aleksandar Lazarevic, Ramdev Kanapady, Chandrika Kamath, Vipin Kumar, Kumar Tamma, "Localized Prediction of Continuous Target Variables Using Hierarchical Clustering," Third IEEE International Conference on Data Mining (ICDM'03), p. 139, 2003