Profiles of Natural and Designed Protein-Like Sequences Effectively Bridge Protein Sequence Gaps: Implications in Distant Homology Detection

Kumar, Gayatri; Srinivasan, Narayanaswamy; Sandhya, Sankaran

doi:10.1007/978-1-0716-2095-3_5

Part of the book series: Methods in Molecular Biology (MIMB, volume 2449)

Abstract

Sequence-based approaches are fundamental to guiding experimental investigations toward structural and/or functional insights into uncharacterized protein families. Powerful profile-based sequence search methods rely on a sequence space continuum to identify non-trivial relationships through homology detection. Computationally designed protein-like sequences that serve as “artificial linkers” help identify relationships between distant members of a structural fold: they act as intermediates and guide homology searches between distantly related proteins. Here, we describe an approach that represents natural intermediate sequences and designed protein-like sequences as hidden Markov model (HMM) profiles to improve the sensitivity of existing search methods. Searches within this “Profile database” recognized the parent structural fold for 90% of the search queries at a query coverage better than 60%. For 1040 protein families with no available structure, fold associations were made through searches in the database of natural and designed sequence profiles. Most of these associations were with the Alpha-alpha superhelix, Transmembrane beta-barrels, TIM barrel, and Immunoglobulin-like beta-sandwich folds. For 11 domain families of unknown function (DUFs), we provide confident fold associations using the profiles of designed sequences and a consensus from other fold recognition methods. For two DUFs, we performed detailed functional annotation through comparisons with characterized templates from families of known function.
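The query-coverage criterion mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each search hit reports the 1-based, inclusive start and end of the aligned region on the query; the `Hit` record, the function names, and the way the 60% cutoff is applied are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One query-versus-profile alignment (hypothetical record layout)."""
    profile_id: str
    query_start: int  # 1-based, inclusive
    query_end: int    # 1-based, inclusive


def query_coverage(hit: Hit, query_length: int) -> float:
    """Fraction of the query residues spanned by the aligned region."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    return (hit.query_end - hit.query_start + 1) / query_length


def filter_by_coverage(hits: List[Hit], query_length: int,
                       min_coverage: float = 0.6) -> List[Hit]:
    """Keep only hits whose query coverage is better than min_coverage."""
    return [h for h in hits if query_coverage(h, query_length) > min_coverage]


if __name__ == "__main__":
    # Illustrative values only; not data from the chapter.
    example_hits = [Hit("profile_A", 11, 90), Hit("profile_B", 40, 70)]
    for hit in filter_by_coverage(example_hits, query_length=100):
        print(hit.profile_id, query_coverage(hit, 100))
```

In this sketch a hit spanning residues 11 to 90 of a 100-residue query has a coverage of 0.8 and is retained, whereas a hit spanning residues 40 to 70 has a coverage of 0.31 and is discarded.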

Acknowledgments

This research is supported by the Mathematical Biology program and the FIST program, sponsored by the Department of Science and Technology, and by the Department of Biotechnology (DBT), Government of India, in the form of the IISc-DBT partnership program. We also gratefully acknowledge support from the Bioinformatics and Computational Biology Centre, funded by DBT, and support from UGC, India (Centre for Advanced Studies) and the Ministry of Human Resource Development, India. NS is a J. C. Bose National Fellow. SS was supported as a post-doctoral fellow by the DBT-IISc partnership program and is currently affiliated with M.S. Ramaiah University of Applied Sciences, Bangalore. This paper is dedicated to one of its authors, Prof. N. Srinivasan, who passed away on September 03, 2021.

Author information

Sankaran Sandhya
Present address: Department of Biotechnology, Faculty of Life and Allied Health Sciences, M.S. Ramaiah University of Applied Sciences, Bangalore, Karnataka, India

Authors and Affiliations

Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
Gayatri Kumar, Narayanaswamy Srinivasan & Sankaran Sandhya

Corresponding authors

Correspondence to Narayanaswamy Srinivasan or Sankaran Sandhya.

Editor information

Editors and Affiliations

University of Pavia, Pavia, Italy
Oliviero Carugo
Genome Institute and Bioinformatics Institute, Singapore, Singapore
Frank Eisenhaber

About this protocol

Cite this protocol

Kumar, G., Srinivasan, N., Sandhya, S. (2022). Profiles of Natural and Designed Protein-Like Sequences Effectively Bridge Protein Sequence Gaps: Implications in Distant Homology Detection. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 2449. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2095-3_5

DOI: https://doi.org/10.1007/978-1-0716-2095-3_5
Published: 05 May 2022
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2094-6
Online ISBN: 978-1-0716-2095-3
eBook Packages: Springer Protocols
