A Structure Based Algorithm for Improving Motifs Prediction

Pathak, Sudipta; Kundeti, Vamsi Krishna; Schiller, Martin R.; Rajasekaran, Sanguthevar

doi:10.1007/978-3-642-39159-0_22


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7986)

Included in the following conference series:

IAPR International Conference on Pattern Recognition in Bioinformatics

Abstract

Minimotifs are short contiguous peptide sequences in proteins that are known to have functions. There are many repositories of experimentally validated minimotifs; Minimotif Miner (MnM), a web system for minimotif search that our team has built, is one of them. It houses more than 300,000 minimotifs, ranging in length from 5 to 15. Predicting minimotifs in unknown sequences is a challenging and interesting problem in biology. Because the motifs are so short, any algorithm that predicts minimotifs in an unknown query sequence is likely to produce many false positives. Our team has previously developed a series of algorithms (called filters) to reduce the false positives and improve prediction accuracy. All of these algorithms are based on sequence information. In a recent paper we demonstrated the power of structural information in characterizing motifs. In this paper we present an algorithm that exploits structural information to reduce false positives in motif prediction.

We test the validity of our algorithm using the minimotifs stored in the MnM database. For any input query protein sequence, MnM identifies a list of putative minimotifs in the query sequence, and we currently employ a series of sequence-based algorithms to reduce the false positives in these predictions. Our new algorithm is a learning algorithm: it is trained in a first phase, and its accuracy is measured in a second phase. For every minimotif stored in MnM, we also store a number of attributes pertinent to the motif. One such attribute is the source of the minimotif, that is, the protein in which the minimotif is present. For the analysis of our new algorithm we use only those minimotifs that have multiple sources as positive controls. Random data is used as negative data.

The basic idea of our algorithm is the hypothesis that a putative minimotif is likely to be valid if its structure in the query sequence is very similar to its structure in its source protein. Another important feature of our algorithm is that it is specific to individual minimotifs: a separate set of parameters is learnt for every minimotif. We feel that this is a better approach than learning a common set of parameters for all the minimotifs together. Our findings reveal that in most cases the occurrences of a minimotif in its source proteins are structurally similar, whereas the occurrences of a minimotif in its source protein and in a random protein are typically dissimilar. Our experimental results show that the parameters learnt by our algorithm can significantly reduce false positives.
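The abstract does not give the structural similarity measure or the form of the learnt parameters, so the following Python sketch is only an illustration of the general idea under stated assumptions. It assumes that each minimotif occurrence is available as a list of 3D backbone coordinates (one per residue), uses a distance-matrix RMSD (dRMSD) as a stand-in similarity measure, and takes a single cutoff per minimotif as the learnt parameter. The names `drmsd`, `learn_cutoff`, `train`, and `accept`, the motif label, and all coordinates are hypothetical.

```python
from itertools import combinations
from math import dist, sqrt
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def drmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Distance-matrix RMSD: compares intra-fragment distances, so no superposition is needed."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("fragments must have equal length of at least 2 residues")
    pairs = list(combinations(range(len(a)), 2))
    total = sum((dist(a[i], a[j]) - dist(b[i], b[j])) ** 2 for i, j in pairs)
    return sqrt(total / len(pairs))


def learn_cutoff(positives: List[float], negatives: List[float]) -> float:
    """Return the smallest observed score that best separates positives (<=) from negatives (>)."""
    def accuracy(cutoff: float) -> float:
        hits = sum(d <= cutoff for d in positives) + sum(d > cutoff for d in negatives)
        return hits / (len(positives) + len(negatives))
    return max(sorted(set(positives) | set(negatives)), key=accuracy)


def train(data: Dict[str, Tuple[List[List[Point]], List[List[Point]]]]):
    """Learn one (reference fragment, cutoff) pair per minimotif (assumed parameter form)."""
    model = {}
    for motif, (sources, randoms) in data.items():
        ref = sources[0]  # assumption: first source occurrence is the reference
        pos = [drmsd(ref, s) for s in sources[1:]]
        neg = [drmsd(ref, r) for r in randoms]
        model[motif] = (ref, learn_cutoff(pos, neg))
    return model


def accept(model, motif: str, fragment: Sequence[Point]) -> bool:
    """Keep a putative minimotif only if it is structurally close to its reference."""
    ref, cutoff = model[motif]
    return drmsd(ref, fragment) <= cutoff


# Toy three-residue fragments (made-up coordinates, in angstroms).
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(1.0, 1.0, 1.0), (4.8, 1.0, 1.0), (8.6, 1.0, 1.0)]
nearly_extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.5, 0.9, 0.0)]
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]

model = train({"toy-motif": ([extended, shifted, nearly_extended], [bent])})
print(accept(model, "toy-motif", shifted), accept(model, "toy-motif", bent))
```

Because a separate entry is kept in `model` for each minimotif, the sketch mirrors the per-motif parameterization described above; a real filter would need a more robust similarity measure and a held-out second phase to measure accuracy.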



Author information

Authors and Affiliations

Department of Computer Science, University of Connecticut, USA
Sudipta Pathak & Sanguthevar Rajasekaran
Intel Corporation, USA
Vamsi Krishna Kundeti
School of Life Sciences, University of Nevada Las Vegas, USA
Martin R. Schiller

Editor information

Editors and Affiliations

School of Computer Science, University of Windsor, 5115 Lambton Tower, 401 Sunset Avenue, N9B 3P4, Windsor, ON, Canada
Alioune Ngom
I3S Research Lab., Nice Sophia Antipolis University, 06903, Sophia Antipolis Cedex, France
Enrico Formenti
LERIA - Faculté des Sciences, Université d’Angers, 2 Boulevard Lavoisier, 49045, Angers Cedex 01, France
Jin-Kao Hao
School of Electronics and Information Engineering, Tongji University, 201804, Shanghai, China
Xing-Ming Zhao
Institute for Computing and Information Sciences, Radboud University, 6500 GL, Nijmegen, The Netherlands
Twan van Laarhoven

About this paper

Cite this paper

Pathak, S., Kundeti, V.K., Schiller, M.R., Rajasekaran, S. (2013). A Structure Based Algorithm for Improving Motifs Prediction. In: Ngom, A., Formenti, E., Hao, J.-K., Zhao, X.-M., van Laarhoven, T. (eds) Pattern Recognition in Bioinformatics. PRIB 2013. Lecture Notes in Computer Science, vol 7986. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39159-0_22

DOI: https://doi.org/10.1007/978-3-642-39159-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39158-3
Online ISBN: 978-3-642-39159-0
eBook Packages: Computer Science, Computer Science (R0)
