loading...
A Clustering Method Using an Irregular Size Cell Graph
Tokyo, Japan April 03-April 04
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/RIDE.2005.515th International Workshop on Resear ...
 This Article 
 
PDF
HTML
 
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Tomotake Nakamura, Hiroshima City University
Yoko Kamidoi, Hiroshima City University
Shin?ichi Wakabayashi, Hiroshima City University
Noriyoshi Yoshida, Hiroshima City University

In this paper, we propose a clustering method "FlexDice" for large high-dimensional datasets. A clustering method is one technique of data mining, and it attracts attention as a technique of extracting useful information from large high-dimensional datasets. There have already been many clustering methods. In the present computer environment, however, it might be difficult for existing clustering methods to find clusters from a dataset with 50-dimensions and 100,000 data objects within one minute. The clustering method FlexDice proposed in this paper can realize efficient clustering for a large high-dimensional database.

The data structure used in FlexDice is a graph-structure. Its data structure and the data structure of Quadtree have a few same features, but they have some crucial differences. The most crucial difference is that the data structure of Quadtree is a tree-structure while the data structure used in FlexDice is a graph-structure. In this paper, we show the differences between these structures.

Quadtree is a tree-structure, and the algorithm constructing it forms cells hierarchically by dividing data object spaces in a top-down manner. That is why traversing operations from the root of the tree to each of its leaves is necessary in such methods of searching for, or indexing of data objects. In contrast to the case of Quadtree, no tree-structure is required in the algorithm FlexDice, because such traversing operations are unnecessary. However, in the clustering method, relevant cells which include each of the similar data objects must be merged, instead of choosing a hyper-dividing plane. Hence, FlexDice creates neighboring links among relevant cells in every layer after dividing cells, and merges such cells including similar data objects. To reduce memory usage, FlexDice dynamically removes worthless cells, and maintains only worthwhile cells including data objects and parent cells needed for creating neighboring links of worthwhile cells. After neighboring links among worthwhile cells are created, these parent cells needed for creating neighboring links of worthwhile cells are removed from memory.

In this paper, we present dissimilarity between the data structure used in FlexDice and the structure of Quadtree, and show that the data structure used in FlexDice is suitable for clustering.

Citation:
Tomotake Nakamura, Yoko Kamidoi, Shin?ichi Wakabayashi, Noriyoshi Yoshida, "A Clustering Method Using an Irregular Size Cell Graph," ride, pp.19-26, 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications (RIDE-SDMA'05), 2005
Usage of this product signifies your acceptance of the Terms of Use.