In this paper, a novel algorithm for clustering data streams with mixed numeric and categorical attributes (CNC-Stream) is proposed. To support CNC-Stream, a new entropy-based similarity measure is also presented, which determines the similarity between objects (data points in the stream or micro-clusters held in memory). Experiments conducted on real and synthetic data sets show that the proposed method produces high-quality clusters.
Index Terms:
Entropy, Cluster, Data Stream, Mixed Attributes
Citation:
Shuyun Wang, Yingjie Fan, Chenghong Zhang, HeXiang Xu, Xiulan Hao, Yunfa Hu, "Entropy Based Clustering of Data Streams with Mixed Numeric and Categorical Values," Seventh IEEE/ACIS International Conference on Computer and Information Science (ICIS 2008), pp. 140-145, 2008