Abstract
This paper addresses the classification task of data mining (a form of supervised learning) in the context of an important bioinformatics problem: the prediction of protein functions. The problem is cast as hierarchical classification, in which the protein functions to be predicted correspond to classes arranged in a class tree. The main contribution is a new Artificial Immune System that creates a new representation for proteins in order to maximize the predictive accuracy of a hierarchical classification algorithm applied to the corresponding protein function prediction problem.
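As a rough illustration of what a new representation for proteins can look like, the sketch below re-encodes a sequence using clusters of amino acids (the article title refers to clustering amino acids) and derives simple composition features. The grouping `GROUPS`, the example sequence, and all function names are hypothetical; this is not the grouping or the procedure produced by the Artificial Immune System described in the paper.

```python
from collections import Counter

# Hypothetical partition of the 20 standard amino acids into 4 clusters.
# Illustrative only; an optimiser would search over such partitions.
GROUPS = ["AVLIMC", "FWYH", "STNQGP", "DEKR"]


def build_lookup(groups):
    """Map each amino acid letter to a cluster label ('a', 'b', ...)."""
    lookup = {}
    for index, members in enumerate(groups):
        label = chr(ord("a") + index)
        for residue in members:
            lookup[residue] = label
    return lookup


def reduce_sequence(sequence, lookup):
    """Rewrite a protein sequence in the reduced (cluster) alphabet."""
    return "".join(lookup[residue] for residue in sequence)


def composition(reduced, labels):
    """Fraction of positions falling into each cluster, in label order."""
    counts = Counter(reduced)
    total = len(reduced)
    return [counts[label] / total for label in labels]


lookup = build_lookup(GROUPS)
reduced = reduce_sequence("MKTAYIAKQR", lookup)  # made-up sequence
features = composition(reduced, "abcd")
print(reduced, features)
```

A feature vector of this kind could then be passed to any classifier; changing the grouping changes the features, which is why the choice of clusters can affect predictive accuracy.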
Cite this article
Secker, A., Davies, M.N., Freitas, A.A. et al. An Artificial Immune System for Clustering Amino Acids in the Context of Protein Function Classification. J Math Model Algor 8, 103–123 (2009). https://doi.org/10.1007/s10852-009-9107-3
DOI: https://doi.org/10.1007/s10852-009-9107-3