In this paper we consider incomplete data sets, i.e., data sets with missing attribute values. Two different types of missing attribute values are studied: lost and "do not care". Furthermore, three definitions of approximations are discussed: singleton, subset, and concept. Theoretically, singleton approximations should not be used in data mining, since concepts approximated by singleton approximations are not definable. However, we conducted a number of experiments on 44 different incomplete data sets using all three approximation definitions, and our results show that none of these approximations is superior to the others.
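The three approximation definitions can be illustrated with a minimal sketch. It assumes the characteristic-set formulation commonly used in rough set work on incomplete data, which this abstract does not spell out: a lost value ("?") keeps a case out of every block of that attribute, a "do not care" value ("*") puts the case into every block of that attribute, and the characteristic set of a case is the intersection of the blocks for its specified values. The table, attribute names, and function names below are invented for illustration and are not taken from the paper.

```python
LOST, DONT_CARE = "?", "*"

def characteristic_sets(table, attributes):
    """Return {case: characteristic set} under the assumed block semantics."""
    universe = set(table)
    blocks = {}
    for a in attributes:
        known = {row[a] for row in table.values()} - {LOST, DONT_CARE}
        for v in known:
            # "*" joins every block of attribute a; "?" joins none.
            blocks[(a, v)] = {x for x, row in table.items()
                              if row[a] == v or row[a] == DONT_CARE}
    k = {}
    for x, row in table.items():
        s = set(universe)
        for a in attributes:
            if row[a] not in (LOST, DONT_CARE):
                s &= blocks[(a, row[a])]
        k[x] = s
    return k

def approximations(k, concept, kind):
    """Return (lower, upper) for kind in {'singleton', 'subset', 'concept'}."""
    universe = set(k)
    if kind == "singleton":
        # Sets of individual cases, not unions of characteristic sets.
        lower = {x for x in universe if k[x] <= concept}
        upper = {x for x in universe if k[x] & concept}
    else:
        # Subset ranges over all cases; concept only over cases in the concept.
        source = universe if kind == "subset" else concept
        lower = set().union(*(k[x] for x in source if k[x] <= concept))
        upper = set().union(*(k[x] for x in source if k[x] & concept))
    return lower, upper

# Hypothetical four-case table with one lost and one "do not care" value.
table = {
    1: {"temp": "high",   "headache": "yes"},
    2: {"temp": LOST,     "headache": "yes"},
    3: {"temp": DONT_CARE, "headache": "no"},
    4: {"temp": "normal", "headache": "no"},
}
k = characteristic_sets(table, ["temp", "headache"])
concept = {1, 3}
for kind in ("singleton", "subset", "concept"):
    print(kind, approximations(k, concept, kind))
```

On this toy table all three lower approximations coincide, while the concept upper approximation is smaller than the singleton and subset upper approximations. This illustrates only that the definitions can disagree, not how they compare in the paper's experiments.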
Citation:
Jerzy W. Grzymala-Busse, Witold J. Grzymala-Busse, Zdzislaw S. Hippe, Wojciech Rzasa, "A Comparison of Three Approximation Strategies for Incomplete Data Sets," 2007 IEEE International Conference on Granular Computing (GRC 2007), p. 301, 2007.