In this paper, we propose a clustering method "FlexDice" for large high-dimensional datasets. A clustering method is one technique of data mining, and it attracts attention as a technique of extracting useful information from large high-dimensional datasets. There have already been many clustering methods. In the present computer environment, however, it might be difficult for existing clustering methods to find clusters from a dataset with 50-dimensions and 100,000 data objects within one minute. The clustering method FlexDice proposed in this paper can realize efficient clustering for a large high-dimensional database.
The data structure used in FlexDice is a graph-structure. Its data structure and the data structure of Quadtree have a few same features, but they have some crucial differences. The most crucial difference is that the data structure of Quadtree is a tree-structure while the data structure used in FlexDice is a graph-structure. In this paper, we show the differences between these structures.
Quadtree is a tree-structure, and the algorithm constructing it forms cells hierarchically by dividing data object spaces in a top-down manner. That is why traversing operations from the root of the tree to each of its leaves is necessary in such methods of searching for, or indexing of data objects. In contrast to the case of Quadtree, no tree-structure is required in the algorithm FlexDice, because such traversing operations are unnecessary. However, in the clustering method, relevant cells which include each of the similar data objects must be merged, instead of choosing a hyper-dividing plane. Hence, FlexDice creates neighboring links among relevant cells in every layer after dividing cells, and merges such cells including similar data objects. To reduce memory usage, FlexDice dynamically removes worthless cells, and maintains only worthwhile cells including data objects and parent cells needed for creating neighboring links of worthwhile cells. After neighboring links among worthwhile cells are created, these parent cells needed for creating neighboring links of worthwhile cells are removed from memory.
In this paper, we present dissimilarity between the data structure used in FlexDice and the structure of Quadtree, and show that the data structure used in FlexDice is suitable for clustering.