Abstract
Protein Subcellular Localization (PSL) prediction for recently evolved Unknown Protein Sequences (UPS) is vital for understanding protein functions. PSL also provides insight into the prediction of harmful and useful characteristics, disease diagnosis, and drug design. In the present work, a One-Hot-Encoding (OHE) and Convolutional Neural Network (CNN) based model, OCNN, is proposed for the functional characterization of protein sequences through PSL. A Gram-Positive (G+) dataset with 473 known protein sequence samples covering four subcellular localizations is used for training and validation of the OCNN model. As essential preprocessing, the raw protein sequences are encoded using OHE, and the lengths of the encoded sequences are standardized and normalized through padding and capping. Next, the encoded and standardized protein sequence samples are convolved in the hidden layer of the OCNN model using ReLU, TanH, and Sigmoid activation functions. After that, the Adam and Stochastic Gradient Descent (SGD) optimization functions are utilized for the PSL prediction of the protein sequence samples. The OCNN model achieved 92.94% accuracy with known protein sequences through the combination of Sigmoid, Softmax, and Adam functions. The validated OCNN model can be further utilized for the function prediction of UPS, where 64.83% accuracy is achieved through the combination of ReLU, Softmax, and Adam functions.
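The preprocessing step described above (OHE followed by padding and capping to a fixed length) can be illustrated with a minimal sketch. This is not the authors' implementation: the 20-letter amino acid alphabet and its ordering, the function name `one_hot_encode`, the `max_len` parameter, zero-padding at the end of the sequence, and the all-zero encoding of non-standard residues are all assumptions made for illustration.

```python
import numpy as np

# Assumed alphabet: the 20 standard amino acids, in an arbitrary fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str, max_len: int) -> np.ndarray:
    """Return a (max_len, 20) one-hot matrix for one protein sequence.

    Hypothetical helper: sequences longer than max_len are capped
    (truncated), shorter ones are zero-padded at the end, and residues
    outside the assumed alphabet are left as all-zero rows.
    """
    encoded = np.zeros((max_len, len(AMINO_ACIDS)), dtype=np.float32)
    # Slicing to max_len performs the capping step.
    for pos, residue in enumerate(sequence.upper()[:max_len]):
        idx = AA_INDEX.get(residue)
        if idx is not None:
            encoded[pos, idx] = 1.0
    # Rows beyond the sequence length stay zero, which is the padding step.
    return encoded


# Two toy sequences of different lengths become one fixed-size batch.
batch = np.stack([one_hot_encode(s, max_len=8) for s in ["MKV", "ACDEFGHIKLMN"]])
print(batch.shape)  # (2, 8, 20)
```

Fixing the length this way gives every sample the same shape, which is what a convolutional layer expects as input; the choice of `max_len` trades lost sequence tail against the amount of padding.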
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Agrawal, S., Sisodia, D.S., Nagwani, N.K. (2023). Function Characterization of Unknown Protein Sequences Using One Hot Encoding and Convolutional Neural Network Based Model. In: Singh, P., Singh, D., Tiwari, V., Misra, S. (eds) Machine Learning and Computational Intelligence Techniques for Data Engineering. MISP 2022. Lecture Notes in Electrical Engineering, vol 998. Springer, Singapore. https://doi.org/10.1007/978-981-99-0047-3_24
DOI: https://doi.org/10.1007/978-981-99-0047-3_24
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0046-6
Online ISBN: 978-981-99-0047-3
eBook Packages: Intelligent Technologies and Robotics (R0)