We introduce WINP, a new clustering algorithm for very large databases. WINP uses handling objects of two different sizes to achieve both high accuracy and efficiency. It first creates a window to detect the approximate locations of clusters, and only then performs accurate clustering. Restricting the accurate clustering step to these locations eliminates a large share of the computation and improves performance. WINP is the first algorithm to support both incremental clustering and distributed parallel clustering. The advantages of our approach are: (1) it is very efficient; (2) it supports distributed parallel processing and can run on a number of workstations connected via a local area network; (3) it introduces a novel incremental clustering method for data newly added to an already processed database; (4) it is effective in discovering clusters of arbitrary shape; (5) it is not sensitive to noise; and (6) it has some ability to handle high-dimensional points.
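The abstract does not specify how the window is built, so the following is only a rough illustration of the general idea of a coarse pre-pass that locates candidate cluster regions before any exact clustering. It assumes a uniform grid as the "window"; the names `dense_cells`, `candidate_points`, `cell_size`, and `min_count` are hypothetical and do not come from the paper.

```python
from collections import Counter


def cell_of(point, cell_size):
    """Map a point to the index of the grid cell that contains it."""
    return tuple(int(coord // cell_size) for coord in point)


def dense_cells(points, cell_size, min_count):
    """Return the grid cells that hold at least min_count points.

    Assumption: a uniform grid stands in for the coarse window; the
    paper's actual window construction is not described in the abstract.
    """
    counts = Counter(cell_of(p, cell_size) for p in points)
    return {cell for cell, n in counts.items() if n >= min_count}


def candidate_points(points, cell_size, min_count):
    """Keep only points in dense cells, so that a later exact
    clustering step has fewer points to examine."""
    cells = dense_cells(points, cell_size, min_count)
    return [p for p in points if cell_of(p, cell_size) in cells]


points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1),
          (5.5, 5.5),
          (9.1, 9.2), (9.3, 9.4)]
kept = candidate_points(points, cell_size=1.0, min_count=2)
```

In this toy input the isolated point `(5.5, 5.5)` falls in a sparse cell and is dropped before any exact clustering would run, which is one way a coarse pass can reduce work and tolerate noise.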
Citation:
Zhang Qiang, Zhao Zheng, Sun Zhi Wei, Edward Daley, "WINP: A Window-Based Incremental and Parallel Clustering Algorithm for Very Large Databases," 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'05), pp. 169-176, 2005.