Optimal Cluster Preserving Embedding of Nonmetric Proximity Data
|
Abstract—For several major applications of data analysis, objects are often not represented as feature vectors in a vector space, but rather by a matrix gathering pairwise proximities. Such pairwise data often violates metricity and, therefore, cannot be naturally embedded in a vector space. Concerning the problem of unsupervised structure detection or clustering, in this paper, a new embedding method for pairwise data into Euclidean vector spaces is introduced. We show that all clustering methods, which are invariant under additive shifts of the pairwise proximities, can be reformulated as grouping problems in Euclidian spaces. The most prominent property of this constant shift embedding framework is the complete preservation of the cluster structure in the embedding space. Restating pairwise clustering problems in vector spaces has several important consequences, such as the statistical description of the clusters by way of cluster prototypes, the generic extension of the grouping procedure to a discriminative prediction rule, and the applicability of standard preprocessing methods like denoising or dimensionality reduction.
[1] 1540 I.T. Jolliffe, Principal Component Analysis. New York: Springer-Verlag, 1986.
[2] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, “An Introduction to Kernel-Based Learning Algorithms,” IEEE Trans. Neural Networks, vol. 12, no. 2, pp. 181-201, 2001.
[3] D.W. Jacobs, D. Weinshall, and Y. Gdalyahu, Classification with Nonmetric Distances: Image Retrieval and Class Representation IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 6, pp. 583-600, June 2000.
[4] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification. John Wiley&Sons, second ed., 2001.
[5] M.N. Murty, A.K. Jain, and P.J. Flynn, “Data Clustering: A Review,” ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
[6] T.F. Cox and M.A.A. Cox, Multidimensional Scaling. London: Chapman&Hall, 2001.
[7] Y. Takane, F.W. Young, and J. de Leeuw, Nonmetric Individual Differences Multidimensional Scaling: An Alternating Least Squares Method with Optimal Scaling Features Psychometrica, vol. 42, pp. 7-67, 1977.
[8] J. Shi and J. Malik, Normalized Cuts and Image Segmentation IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug. 2000.
[9] J. Puzicha, T. Hofmann, and J. Buhmann, A Theory of Proximity Based Clustering: Structure Detection by Optimization Pattern Recognition, vol. 33, no. 4, pp. 617-634, 1999.
[10] T. Hofmann and M. Buhmann, Pairwise Data Clustering by Deterministic Annealing IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 1, pp. 1-14, Jan. 1997.
[11] P. Brucker, On the Complexity of Clustering Problems Optimization and Operations Research: Lecture Notes in Economics and Math. Systems, M. Beckman and H.P. Kunzi, eds. pp. 45-54, Springer, 1978.
[12] W.S. Torgerson, Theory and Methods of Scaling. New York: John Wiley and Sons, 1958.
[13] G. Young and A.S. Householder, Discussion of a Set of Points in Terms of Their Mutual Distances Psychometrika, vol. 3, pp. 19-22, 1938.
[14] B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem," Neural Computation, Vol. 10, 1998, pp. 1299-1319.
[15] S. Mika, B. Schölkopf, A. Smola, K.-R. Müller, S. Mika, M. Scholz, and G. Rätsch, Kernel PCA and De-Noising in Feature Spaces Advances in Neural Information Processing Systems, M.S. Kearns, S.A. Solla, and D.A. Cohn, eds., vol. 11, pp. 536-542, MIT Press, 1999.
[16] P.J. Huber, Projection Pursuit The Annals of Statistics, vol. 13, no. 2, pp. 435-475, 1985.
[17] S. Roweis and L. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding Science, vol. 290, pp. 2323-2326, 2000.
[18] J.B. Tenenbaum, V. Silva, and J.C. Langford, A Global Geometric Framework for Nonlinear Dimensionality Reduction Science, vol. 290, pp. 2319-2323, 2000.
[19] T. Kohonen, Self-Organizing Maps. Berlin: Springer-Verlag, 1995.
[20] P. Drineas et al., "Clustering in Large Graphs and Matrices," Proc. Symp. Discrete Algorithms, SIAM, Philadelphia, 1999, pp. 291-299; .
[21] K. Rose, E. Gurewitz, and G.C. Fox, A Deterministic Annealing Approach to Clustering Pattern Recognition Letters, vol. 11, no. 9, pp. 589-594, 1990.
[22] B. Boeckmann, A. Bairoch, R. Apweiler, M.-C. Blatter, A. Estreicher, E. Gasteiger, M.J. Martin, K. Michoud, C. O'Donovan, I. Phan, S. Pilbout, and M. Schneider, The SWISS-PROT Protein Knowledgebase and Its Supplement TrEMBL Nucleic Acids Research, vol. 31, pp. 365-370, 2003.
[23] W.R. Pearson and D.J. Lipman, Improved Tools for Biological Sequence Analysis Proc. Nat'l Academy of Sciences, vol. 85, pp. 2444-2448, 1988.
[24] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis. Cambridge Univ. Press, 1998.
[25] S. Dudoit and J. Fridlyand, A Prediction-Based Resampling Method for Estimating the Number of Clusters in a Data Set Genome Biology, vol. 3, no. 7, 2002.
[26] T. Lange, M. Braun, V. Roth, and J.M. Buhmann, Stability-Based Model Selection Proc. Conf. Neural Information Processing Systems, to appear, 2003.
[27] P. Soundararajan and S. Sarkar, Investigation of Measures for Grouping by Graph Partitioning Proc. Conf. Computer Vision and Pattern Recognition, pp. 239-246, 2001.
Index Terms:
Clustering, pairwise proximity data, cost function, embedding, MDS.
Citation:
Volker Roth, Julian Laub, Motoaki Kawanabe, Joachim M. Buhmann, "Optimal Cluster Preserving Embedding of Nonmetric Proximity Data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1540-1551, Dec. 2003, doi:10.1109/TPAMI.2003.1251147