Alternate Representation of Distance Matrices for Characterization of Protein Structure
Houston, Texas November 27-November 30
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2005.19
Fifth IEEE International Conference o ...
Keith Marsolo, Ohio State University
Srinivasan Parthasarathy, Ohio State University
The most suitable method for the automated classification of protein structures remains an open problem in computational biology. To classify a protein structure with any accuracy, an effective representation must be chosen. Here we present two methods of representing protein structure. The first represents the distances between the Cα atoms of a protein as a two-dimensional matrix and models the resulting surface with Zernike polynomials. The second uses a wavelet-based approach: we convert the distances between a protein's Cα atoms into a one-dimensional signal, which is then decomposed using a discrete wavelet transformation. Using the Zernike coefficients and the approximation coefficients of the wavelet decomposition as feature vectors, we test the effectiveness of our representations with two different classifiers on a dataset of more than 600 proteins taken from the 27 most-populated SCOP folds. We find that the wavelet decomposition greatly outperforms the Zernike model. With the wavelet representation, we achieve an accuracy of approximately 56%, roughly 12% higher than results reported on a similar, but less challenging, dataset. In addition, we can couple our structure-based feature vectors with several sequence-based properties to increase accuracy by another 5-7%. Finally, we use a multi-stage classification strategy on the combined features to increase performance to 78%, an improvement in accuracy of more than 15-20% and 34% over the highest reported sequence-based and structure-based classification results, respectively.
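The wavelet-based representation described in the abstract can be illustrated with a minimal sketch. The abstract does not say which wavelet family is used, how the distance matrix is turned into a one-dimensional signal, or how many decomposition levels are applied, so everything specific here is an assumption made for illustration: the Haar wavelet, a row-by-row flattening of the upper triangle, three decomposition levels, and the random toy coordinates standing in for real Cα positions. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def ca_distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates (n x 3 array)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def matrix_to_signal(dist):
    """Flatten the strict upper triangle row by row into a 1-D signal.

    Assumption: the abstract does not specify the flattening order.
    """
    rows, cols = np.triu_indices(dist.shape[0], k=1)
    return dist[rows, cols]


def haar_approximation(signal, levels):
    """Approximation coefficients after `levels` steps of a Haar DWT.

    Assumption: Haar is chosen only because it is the simplest wavelet.
    Odd-length inputs are padded by repeating the last sample.
    """
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        if approx.size < 2:
            break
        if approx.size % 2:
            approx = np.append(approx, approx[-1])
        # Low-pass Haar filter: scaled sum of adjacent pairs.
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2.0)
    return approx


# Toy stand-in for a 16-residue backbone (not real protein data).
rng = np.random.default_rng(0)
coords = np.cumsum(rng.normal(scale=2.0, size=(16, 3)), axis=0)

dist = ca_distance_matrix(coords)
signal = matrix_to_signal(dist)
features = haar_approximation(signal, levels=3)
print(signal.shape, features.shape)
```

Each Haar step halves the signal length, so the number of levels controls the size of the feature vector. Proteins of different lengths produce signals of different lengths, so a real pipeline would also need some form of length normalization before classification, which this sketch leaves out.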
Citation:
Keith Marsolo, Srinivasan Parthasarathy, "Alternate Representation of Distance Matrices for Characterization of Protein Structure," Fifth IEEE International Conference on Data Mining (ICDM'05), pp. 298-305, 2005