Abstract
Predicting protein function is an important problem in the post-genomic era. This paper studies several network-based kernels, including the locally linear embedding (LLE) kernel, the diffusion kernel and the Laplacian kernel, to uncover the relationship between protein functions and protein-protein interactions (PPI). The authors first construct kernels from PPI networks and then apply support vector machine (SVM) techniques to classify proteins into functional groups. Five-fold cross-validation on 359 selected GO terms is then used to compare the performance of the different kernels with that of guilt-by-association methods, namely neighbor counting and Chi-square methods. Finally, the authors predict the functions of some genes of unknown function and partially verify these predictions using information from other data sources.
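The abstract does not state the kernel formulas. The following is a minimal sketch that assumes an unweighted, undirected PPI graph given as a 0/1 adjacency matrix A. It builds a diffusion kernel in the Kondor and Lafferty form, K = exp(-beta L) with L = D - A. It also builds a regularized Laplacian kernel (I + gamma L)^(-1), which is one common reading of "Laplacian kernel" and only an assumption here, since the paper's exact definition may differ. Finally, it computes a one-term neighbor-counting score. The function names, the toy graph and the values of beta and gamma are hypothetical. The LLE kernel and the Chi-square method are not shown.

```python
import numpy as np


def graph_laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected PPI graph."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj


def diffusion_kernel(adj, beta=1.0):
    """Diffusion kernel K = exp(-beta * L), i.e. exp(beta * H) with H = A - D.

    L is symmetric, so the matrix exponential is computed from its
    eigendecomposition: exp(-beta L) = V diag(exp(-beta * lam)) V^T.
    """
    lam, vecs = np.linalg.eigh(graph_laplacian(adj))
    # Scaling column j of V by exp(-beta * lam_j) equals V @ diag(...).
    return (vecs * np.exp(-beta * lam)) @ vecs.T


def regularized_laplacian_kernel(adj, gamma=1.0):
    """ASSUMPTION: one common 'Laplacian kernel', K = (I + gamma * L)^(-1).

    The paper's own Laplacian kernel may be defined differently.
    """
    lap = graph_laplacian(adj)
    return np.linalg.inv(np.eye(lap.shape[0]) + gamma * lap)


def neighbor_counting_scores(adj, labels):
    """Guilt-by-association baseline for one GO term: the number of each
    protein's interaction partners that are annotated with the term."""
    return np.asarray(adj, dtype=float) @ np.asarray(labels, dtype=float)


# Hypothetical toy PPI network with 4 proteins (edges 0-1, 0-2, 1-2, 2-3).
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

K_diff = diffusion_kernel(A, beta=0.1)
K_lap = regularized_laplacian_kernel(A, gamma=1.0)
# Hypothetical annotation: proteins 0 and 2 carry the GO term.
scores = neighbor_counting_scores(A, [1, 0, 1, 0])

print(np.round(K_diff, 3))
print(np.round(K_lap, 3))
print(scores)
```

Because L maps the all-ones vector to zero, every row of both kernels sums to 1, which makes a quick sanity check. Either matrix can then serve as a precomputed Gram matrix for an SVM, for example scikit-learn's SVC(kernel="precomputed"). That estimator is fit on the training-by-training block of K and predicts from the test-by-training block. The folds for five-fold cross-validation could be drawn with StratifiedKFold(n_splits=5).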
Additional information
This research is supported in part by HKRGC Grant 7017/07P, HKU CRCG Grants, HKU strategic theme grant on computational sciences, HKU Hung Hing Ying Physical Science Research Grant, National Natural Science Foundation of China Grant No. 10971075 and Guangdong Provincial Natural Science Grant No. 9151063101000021.
About this article
Cite this article
Li, L., Ching, W., Chan, Y. et al. On network-based kernel methods for protein-protein interactions with applications in protein functions prediction. J Syst Sci Complex 23, 917–930 (2010). https://doi.org/10.1007/s11424-010-0207-y
DOI: https://doi.org/10.1007/s11424-010-0207-y