Abstract
Modeling the three-dimensional (3D) structures of proteins is of great importance because of its many applications in biomolecular research. Toward this goal, we present MaxMod, a graphical user interface (GUI) to the MODELLER program that combines the profile hidden Markov model (profile HMM) method with the Clustal Omega program to improve both the selection of homologous templates and the target-template alignment, and hence the accuracy of the resulting 3D protein models. MaxMod differs from existing MODELLER GUIs in that it can readily model proteins using templates that contain modified residues. It also provides loop optimization, express modeling (in which a protein model is generated directly from its sequence, without further user intervention) and automatic updating of the PDB database, giving users simple control over the computational tasks. In our benchmarks, HMM-based MaxMod performed better than the other MODELLER-based GUIs tested in terms of both execution time and model quality. MaxMod is freely available as a downloadable standalone tool for academic and non-commercial use at http://www.immt.res.in/maxmod/.
Introduction
Recent advances in high-throughput next-generation sequencing technologies have led to exponential growth of genome sequence databases. However, the value of these genomic data cannot be fully realized until the functions of the encoded sequences are inferred. Elucidating the three-dimensional (3D) structure of a protein is central to understanding its mechanism of function, its evolutionary features and its catalytic activity, and provides an important framework for designing further experimental studies. Because experimental structure determination is time-consuming, homology-based theoretical modeling is currently the most reliable, rapid and cost-effective approach for deducing the structural properties of sequences and for bridging the ever-expanding gap between the number of known protein sequences and the number of solved structures [1]. Homology modeling predicts the 3D structure of a given protein sequence (the target) based primarily on its alignment to one or more proteins of known structure (the templates) [2]. Although the reliability of this method is well established, selecting the most suitable template and obtaining a correct target-template alignment remain challenging research problems.
For homologous protein sequences with more than 40 % sequence identity, the alignment is generally considered to be nearly accurate. As the overall sequence identity decreases, however, alignment becomes more difficult and the quality of the final model declines [1, 3]. The choice of sequence alignment strategy therefore plays a more critical role in generating accurate protein models than the choice of modeling program, and distinctly better models are obtained with the best available alignment techniques [4].
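The identity thresholds above can be made concrete with a small helper. This is a minimal sketch, assuming the target and template are already aligned as gapped strings of equal length and that identity is counted over columns where neither sequence has a gap; other definitions (for example, dividing by the length of the shorter sequence) give different numbers. The band labels are illustrative and follow the 40 % rule of thumb above and the 15–30 % twilight zone cited in this article; they are not hard limits.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_target.upper(), aligned_template.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def identity_band(identity: float) -> str:
    """Coarse, rule-of-thumb reliability band for a target-template alignment."""
    if identity > 40.0:
        return "high"          # alignment usually considered nearly accurate
    if identity > 30.0:
        return "intermediate"
    if identity >= 15.0:
        return "twilight"      # family/profile information becomes important
    return "very low"
```

For example, `percent_identity("MKT-LLV", "MKSALLI")` counts six gap-free columns with four identities, giving about 66.7 %.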
The widely used MODELLER program [5] for homology modeling uses standard pairwise comparison methods for template selection and target-template alignment [3]. Subsequently released graphical user interfaces (GUIs) for MODELLER, such as MINT (http://www.bioinf.org.uk/software/mint/), EasyModeller [6], SWIFT MODELLER [7] and PyMod [8], also rely on pairwise comparison methods in their comparative modeling workflows. The essential features and limitations of these programs are summarized in Table 1.
Although pairwise comparison methods that employ dynamic programming guarantee an optimal alignment under a given scoring scheme, the generality of the underlying substitution matrices (PAM and BLOSUM) limits their reliability to cases of high sequence identity. Alignment in the so-called twilight zone (15–30 % sequence identity), by contrast, requires additional information about the protein family to which the sequence belongs [9]. Over the past several years, probabilistic inference methods based on profile hidden Markov models (profile HMMs) have emerged as an alternative to conventional pairwise methods such as BLAST [10, 11] and FASTA [12] for building sequence profiles that can detect more distant, remote homologs in a database [13]. The key feature of the HMM approach is that it computes not just one best-scoring alignment but a sum of probabilities over the entire ensemble of local alignments; a profile built from a sequence family therefore carries more information about that family than a single sequence does [14, 15]. A number of recent studies have further corroborated the advantage of profile-profile alignment for template identification and overall model quality [16–18]. Despite these advantages, profile HMM methods have not yet been adequately implemented in homology modeling software and tools [13]. Here we describe the development and benchmarking of MaxMod, a Microsoft Windows-based GUI to MODELLER that integrates the HMMER3 program for template identification and the Clustal Omega program for sequence alignment. HMMER3 makes profile HMM searches roughly as fast as BLAST while retaining the power of probabilistic inference [13]. Clustal Omega, in turn, generates high-quality multiple sequence alignments quickly and scalably by using the HHalign profile HMM alignment package from the HH-suite [19].
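The difference between scoring one best path and summing over all paths can be illustrated with a toy two-state HMM. This is not a profile HMM (which has match, insert and delete states for every alignment column), and the states and probabilities below are invented purely for illustration. The Viterbi recursion keeps the maximum over predecessor states, whereas the Forward recursion sums over them, so the Forward probability is always at least as large as the best single-path probability. Real implementations work in log space to avoid underflow.

```python
# Toy HMM: hypothetical states and probabilities, chosen only for illustration.
STATES = ("A", "B")
START = {"A": 0.6, "B": 0.4}
TRANS = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
EMIT = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.1, "y": 0.9}}


def viterbi_prob(obs: str) -> float:
    """Probability of the single most likely state path (max over predecessors)."""
    v = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        v = {
            s: max(v[p] * TRANS[p][s] for p in STATES) * EMIT[s][o]
            for s in STATES
        }
    return max(v.values())


def forward_prob(obs: str) -> float:
    """Total probability of the observations, summed over every state path."""
    f = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        f = {
            s: sum(f[p] * TRANS[p][s] for p in STATES) * EMIT[s][o]
            for s in STATES
        }
    return sum(f.values())
```

For the sequence "xy", the best path (A then A) has probability 0.105, while the Forward total over all four paths is 0.2156.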
We believe that MaxMod will make the entire process of protein homology modeling much faster and more user-friendly.
Methods
MaxMod was developed on the Visual Studio .NET platform in C#, which offers a high degree of flexibility in developing the user interface (UI) and building an interactive, modular system. The UI is built on a multiple document interface (MDI) so that the different user modules can be presented effectively. Input and output (I/O) operations, which generate the Python scripts and input files for the back-end MODELLER program, make up most of the code.
The architecture of MaxMod (Fig. 1) consists of three distinct layers. (a) Presentation layer: all visual elements of MaxMod, including user I/O, job directory management and PDB sequence database updates, reside in this layer. (b) Business layer: this layer uses standard features of the .NET Framework base class library (BCL), such as collection classes, data type definitions, variables, security and I/O operations, along with non-standard features such as drawing, database interaction classes and web support. It takes input from the presentation layer, processes the data (formatting Python scripts and preparing inputs for third-party programs) and passes it to the next layer. (c) Data access layer: this virtual layer controls the third-party programs integrated within MaxMod, namely HMMER3, Clustal Omega, Jmol and PROCHECK [20]. MODELLER and Python are not bundled and must be installed beforehand. The PDB sequence database used for template searches also resides in this layer. The third-party programs in the data access layer receive the processed data and instructions from the business layer, execute them, and return output for display in the presentation layer. Based on this architecture, MaxMod follows the workflow illustrated in Fig. 2.
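The data access layer essentially hands prepared files to external executables and collects their output. A minimal Python sketch of that pattern follows, assuming `phmmer` (HMMER3) and `clustalo` (Clustal Omega) are on the PATH. MaxMod itself is written in C#, and the helper names, file names and the E-value cut-off here are hypothetical. The options used (`--tblout`, `--noali`, `-E` for phmmer; `-i`, `-o`, `--outfmt`, `--force` for clustalo) are standard command-line options of those programs, although option sets can differ between versions.

```python
import shutil
import subprocess


def phmmer_command(query_fasta, seqdb, tblout, evalue=1e-3):
    """Build a phmmer call that writes a per-sequence hit table (hypothetical cut-off)."""
    return [
        "phmmer",
        "--tblout", str(tblout),  # tabular per-target output
        "--noali",                # omit alignments from the main (stdout) report
        "-E", str(evalue),        # reporting E-value threshold
        str(query_fasta),
        str(seqdb),
    ]


def clustalo_command(input_fasta, output_aln):
    """Build a Clustal Omega call that writes a Clustal-format alignment."""
    return ["clustalo", "-i", str(input_fasta), "-o", str(output_aln),
            "--outfmt=clustal", "--force"]


def run_tool(cmd, workdir):
    """Run an external program inside the job directory; raise if missing or failing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd, cwd=workdir, check=True,
                          capture_output=True, text=True)
```

Keeping command construction separate from execution makes the argument lists easy to inspect and test without the external programs installed.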
Submission of protein sequence
The user submits the target protein sequence in RAW format, together with a job title of at most five characters. If no title is provided, the program assigns the default name MODEL together with the submission date and time (format: YYYYMMDDHHMMSS). The job title is also used as the name of the working directory in which results are saved for later access. At this stage the user can choose “search templates”, “upload templates” or “express modeling”, depending on the requirement (Fig. 3a).
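The job-naming rule and the sequence check can be sketched as follows. This is illustrative only: the text does not say how the default name and the time stamp are combined, so the underscore join (`MODEL_YYYYMMDDHHMMSS`) is an assumption, as is treating RAW format as a bare string of the 20 standard one-letter amino-acid codes with whitespace ignored (a real tool might also accept ambiguity codes such as X).

```python
from datetime import datetime

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard one-letter codes (assumption)


def job_title(title, now=None):
    """Return the job/directory name: the user title (max 5 chars) or a timestamped default."""
    title = (title or "").strip()
    if title:
        if len(title) > 5:
            raise ValueError("job title must be at most five characters")
        return title
    now = now or datetime.now()
    return "MODEL_" + now.strftime("%Y%m%d%H%M%S")  # join format is hypothetical


def parse_raw_sequence(text):
    """Remove whitespace, upper-case, and reject any non-standard residue letters."""
    seq = "".join(text.split()).upper()
    bad = sorted(set(seq) - VALID_RESIDUES)
    if not seq or bad:
        raise ValueError(f"invalid RAW sequence; unexpected characters: {bad}")
    return seq
```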
Search templates
The PDB sequence database and the “phmmer” program of the HMMER3 software suite are packaged with MaxMod for template searches. When the “search templates” option is selected, phmmer searches the PDB sequence database for remote homologs of the target sequence, and the results are presented in a table listing the PDB code and chain of each crystal structure, its E-value and bit score, the E-value and bit score of the domain hits, and the percentage sequence identity. The user can select the desired number of templates to view more detailed information on the crystal structures available in the PDB and their alignments with the target sequence. MaxMod then connects to the RCSB website (www.rcsb.org) to retrieve the atomic coordinates of the selected structures (Fig. 3b).
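`phmmer --tblout` writes one whitespace-delimited line per target sequence, with comment lines starting with `#`; in HMMER3 the first 18 fields are fixed columns and the free-text target description comes last. The sketch below extracts the fields shown in the results table (target name, full-sequence E-value and bit score, best-domain E-value and bit score). Percent identity is not part of the tblout table and would have to be computed from the alignment, so it is omitted here; the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    target: str        # target name from the sequence database (e.g. PDB entry + chain)
    evalue: float      # full-sequence E-value
    score: float       # full-sequence bit score
    dom_evalue: float  # best 1-domain E-value
    dom_score: float   # best 1-domain bit score


def parse_tblout(text: str) -> list:
    """Parse HMMER3 --tblout text into Hit records, most significant first."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # 18 fixed fields, then the description (which may contain spaces).
        f = line.split(maxsplit=18)
        hits.append(Hit(f[0], float(f[4]), float(f[5]), float(f[7]), float(f[8])))
    return sorted(hits, key=lambda h: h.evalue)
```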
Upload templates
If the “upload templates” option is selected on the home page, the user is redirected to a separate window where any number of PDB structures can be uploaded as templates and the appropriate chain can then be chosen from a drop-down menu (Fig. 3c).
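Populating a chain drop-down menu from an uploaded file only requires the chain identifier, which sits in column 22 of ATOM and HETATM records in the fixed-column PDB format. A minimal sketch, with a hypothetical helper name (mmCIF files would need a different parser):

```python
def chain_ids(pdb_text: str) -> list:
    """Chain identifiers in order of first appearance in ATOM/HETATM records."""
    seen = []
    for line in pdb_text.splitlines():
        # Record name occupies columns 1-6; chain ID is column 22 (index 21).
        if line.startswith(("ATOM  ", "HETATM")) and len(line) > 21:
            chain = line[21]
            if chain not in seen:
                seen.append(chain)
    return seen
```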
Compare templates
The user can identify the most suitable template with the “compare templates” option, which compares the selected templates on the basis of crystallographic quality (resolution and R-factor) and overall sequence identity. MaxMod then displays a dendrogram generated from the resulting log file, together with the R-factor of each template (Fig. 3d).
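The comparison weighs sequence identity against crystallographic quality. One simple way to turn these criteria into a ranking is sketched below. The ordering (identity rounded to whole percent first, then resolution, then R-factor as tie-breakers) is an assumption for illustration, not MaxMod's documented rule, which is reported through its log file and dendrogram.

```python
from dataclasses import dataclass


@dataclass
class Template:
    code: str          # PDB code + chain, e.g. "1abcA" (illustrative)
    identity: float    # % sequence identity to the target
    resolution: float  # X-ray resolution in angstroms (lower is better)
    r_factor: float    # crystallographic R-factor (lower is better)


def rank_templates(templates):
    """Highest identity first; near-ties broken by resolution, then by R-factor."""
    return sorted(
        templates,
        key=lambda t: (-round(t.identity), t.resolution, t.r_factor),
    )
```

Rounding identity keeps a 0.2 % identity difference from outweighing a markedly better resolution; the rounding granularity is itself a tunable assumption.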
Model construction and analysis
After template structures have been submitted through any of the options (“search templates”, “upload templates” or “compare templates”), the user is redirected to the model construction window, where the ligands of each template are displayed in a tree view. Selected ligands can have their atomic coordinates copied into the modeled structure. MaxMod also offers several advanced options: “optimization and refinement”, in which each model is first optimized by the variable target function method with conjugate gradients and then refined by molecular dynamics with simulated annealing; “rapid optimization”, which yields an approximate model very quickly; and “automatic loop refinement after model building”, which refines the loop regions once the 3D model has been constructed (Fig. 3e). After the number of models to be generated is specified, selecting “build model” opens a new window that lists the ‘file name’, ‘molpdf’ (molecular probability density function) and ‘discrete optimized protein energy (DOPE) score’ of each model in the left panel, with options for ‘PROCHECK’, ‘visualization’, ‘DOPE evaluation’ and ‘download’ in the right panel (Fig. 3f). Among models built from the same alignment, lower molpdf and DOPE scores indicate a more reliable model. PROCHECK is used to generate the Ramachandran plot (Fig. 3g) and Jmol to visualize the 3D conformation of the protein (Fig. 3h).
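Because the business layer's main job is to write Python scripts for MODELLER, the options above map naturally onto a generated `AutoModel` script. The sketch below assumes the class and attribute names of recent MODELLER releases (`Environ`, `AutoModel`, `assess.DOPE`, `env.io.hetatm`, `very_fast()`, `md_level` with `refine.slow`); older releases use lowercase names such as `environ` and `automodel`. The function names and the mapping of MaxMod's options to these attributes are hypothetical. `env.io.hetatm = True` makes MODELLER read HETATM records, which ligand copying depends on (the ligands must also appear in the alignment file, conventionally as '.' characters). The second function picks the best model from dictionaries shaped like MODELLER's `AutoModel.outputs` entries (`name`, `failure`, `molpdf`, `DOPE score`).

```python
def automodel_script(alnfile, knowns, sequence, n_models=5,
                     copy_ligands=False, rapid=False, md_refine=True):
    """Return the text of a MODELLER AutoModel script (hypothetical generator)."""
    lines = [
        "from modeller import Environ",
        "from modeller.automodel import AutoModel, assess, refine",
        "",
        "env = Environ()",
        f"env.io.hetatm = {copy_ligands}",  # read HETATM records (ligands)
        "a = AutoModel(env,",
        f"              alnfile={alnfile!r},",
        f"              knowns={tuple(knowns)!r},",
        f"              sequence={sequence!r},",
        "              assess_methods=(assess.DOPE,))",
        "a.starting_model = 1",
        f"a.ending_model = {n_models}",
    ]
    if rapid:
        lines.append("a.very_fast()")             # quick, approximate model
    elif md_refine:
        lines.append("a.md_level = refine.slow")  # MD with simulated annealing
    lines.append("a.make()")
    return "\n".join(lines) + "\n"


def best_model(outputs, key="DOPE score"):
    """Name of the successful model with the lowest score, or None if all failed."""
    ok = [o for o in outputs if o.get("failure") is None]
    return min(ok, key=lambda o: o[key])["name"] if ok else None
```

Compiling the generated text (without running it) is a cheap syntax check before handing the script to MODELLER.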
Express modeling
To make homology modeling simpler, especially for beginners and biologists without programming experience, the MaxMod home page also offers an “express modeling” option, for which a protein sequence in RAW format is the only input required to build a 3D model.
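Express modeling chains the earlier steps without user choices. The orchestration below is a minimal sketch under stated assumptions: each step is passed in as a function (hypothetical stand-ins for the phmmer search, the Clustal Omega alignment and the MODELLER run), and the hit with the lowest E-value below a hypothetical cut-off is taken as the single template. The text does not describe MaxMod's actual selection rule.

```python
from typing import Callable, List, Tuple


def express_model(
    sequence: str,
    search: Callable[[str], List[Tuple[str, float]]],  # -> [(template_id, evalue), ...]
    align: Callable[[str, str], str],                   # -> alignment file path
    build: Callable[[str, str], str],                   # -> best model file path
    max_evalue: float = 1e-3,                           # hypothetical cut-off
) -> str:
    """Search -> choose template -> align -> build, with no user interaction."""
    hits = [h for h in search(sequence) if h[1] <= max_evalue]
    if not hits:
        raise RuntimeError("no template passed the E-value cut-off")
    template = min(hits, key=lambda h: h[1])[0]
    alignment = align(sequence, template)
    return build(alignment, template)
```

Injecting the steps as functions keeps the control flow testable with stubs, independent of the external programs.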
Loop optimization
Loops, which connect secondary structure elements, often determine the functional specificity of a protein [21]. Consequently, the accuracy of loop modeling is crucial in determining how useful comparative models are for studying protein-ligand interactions [22]. MaxMod therefore includes a “loop optimization” utility, in which PDB structures can be uploaded or taken directly from the job directory. The user specifies the loop region to be refined and the number of structures to be generated, and the resulting optimized 3D models are displayed in a separate window for analysis and download.
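A loop-refinement request (loop region plus number of structures) translates into a short MODELLER script built on its `LoopModel` class, in which an overridden `select_loop_atoms` method returns the residues to rebuild. The generator below is a hypothetical sketch assuming current MODELLER naming (`Environ`, `Selection`, `LoopModel`, `residue_range`, `refine`); residue IDs are given as 'number:chain' strings, and `refine.very_fast` is an illustrative choice of refinement level.

```python
def loop_script(model_pdb, sequence, start, end, n_structures=10):
    """Return text of a MODELLER LoopModel script refining residues start..end."""
    return "\n".join([
        "from modeller import Environ, Selection",
        "from modeller.automodel import LoopModel, refine",
        "",
        "env = Environ()",
        "",
        "class MyLoop(LoopModel):",
        "    def select_loop_atoms(self):",
        f"        return Selection(self.residue_range({start!r}, {end!r}))",
        "",
        f"m = MyLoop(env, inimodel={model_pdb!r}, sequence={sequence!r})",
        "m.loop.starting_model = 1",
        f"m.loop.ending_model = {n_structures}",
        "m.loop.md_level = refine.very_fast",
        "m.make()",
    ]) + "\n"
```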
Results and discussion
MaxMod is a feature-rich, user-friendly standalone tool for protein homology modeling that implements the profile HMM method within the modeling framework, unlike existing GUIs such as EasyModeller, SWIFT MODELLER and PyMod, which employ pairwise comparison methods such as the ALIGN2D or SALIGN commands for target-template alignment. The advantage of profile HMMs over pairwise comparison is that they turn a multiple sequence alignment into a position-specific scoring system, which is better suited to identifying distant homologous relationships. MaxMod can also readily construct protein models from templates bearing modified residues, a feature lacking in the other GUIs compared here. Other important features include loop optimization, model validation and visualization, automated updating of the PDB database, and express modeling, which lets users build a 3D model simply by submitting the protein sequence.
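The idea of a position-specific scoring system can be made concrete with a toy position-specific scoring matrix (PSSM): count residues in each alignment column, add pseudocounts, and convert to log-odds scores against background frequencies. A profile HMM goes further, since it also models position-specific insertions and deletions, and HMMER estimates its parameters with Dirichlet mixture priors; the uniform background and the flat pseudocount of 1 below are simplifying assumptions.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def pssm(msa, pseudocount=1.0):
    """Per-column log-odds scores (bits) from equal-length aligned sequences."""
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(AMINO)  # uniform background (assumption)
    matrix = []
    for col in range(length):
        counts = {a: pseudocount for a in AMINO}
        for seq in msa:
            if seq[col] in counts:  # gaps and unknown symbols are skipped
                counts[seq[col]] += 1
        total = sum(counts.values())
        matrix.append({a: math.log2((counts[a] / total) / background) for a in AMINO})
    return matrix


def score(matrix, seq):
    """Sum of per-position log-odds scores for an ungapped query of matching length."""
    return sum(column[aa] for column, aa in zip(matrix, seq))
```

A fully conserved column (all C in the test alignment below) gives C a strongly positive score and every other residue a negative one, which is what lets a profile reward family-typical residues position by position.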
When MaxMod was compared with other MODELLER-based GUIs in terms of the total time taken to construct a 3D model of lactate dehydrogenase (UniProt accession O96445), MaxMod took about 18 s, roughly three times faster than PyMod and five times faster than EasyModeller and SWIFT MODELLER (Table 2). This speed can be attributed to the faster template search and target-template alignment provided by HMMER3 and Clustal Omega, respectively. We also assessed the four programs on their ability to build 3D models from a template bearing modified residues, specifically the PKR kinase domain/eIF2alpha/AMP-PNP complex (PDB ID 2A19), which contains a phosphothreonine residue. The other programs either failed to construct any model or could not copy the atomic coordinates of the ligands, whereas MaxMod completed the modeling without difficulty. The overall performance of the programs was further compared by assessing the stereochemical quality of the 3D structures generated for a test set of 15 randomly selected proteins, with target-template sequence identities ranging from 27 % to 84 % (Table 3). PROCHECK results indicated that all models built with MaxMod had better stereochemical quality, with approximately 99 % or more of residues in the allowed regions of the Ramachandran plot (Table 3). To check the compatibility of inter-residue interactions, the Verify3D tool [23, 24] was used; its scores showed that MaxMod models had a higher percentage of residues with an averaged 3D-1D score > 0.2 than models generated by the other programs. Finally, to detect potential errors in the models, Z-scores and total energy plots were calculated with the ProSA-web program [25].
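The Verify3D percentage quoted above can be reproduced from per-residue 3D-1D scores. In the usual presentation the raw scores are smoothed with a sliding window (21 residues is the commonly used width) before the fraction above 0.2 is counted. The sketch below assumes the per-residue scores are already available and truncates the window at the chain ends; both the truncation and the helper names are assumptions, not Verify3D's exact procedure.

```python
def window_average(scores, window=21):
    """Centered sliding-window mean; the window is truncated at chain ends (assumption)."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        chunk = scores[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out


def percent_above(scores, threshold=0.2, window=21):
    """Percentage of residues whose window-averaged 3D-1D score exceeds threshold."""
    averaged = window_average(scores, window)
    if not averaged:
        return 0.0
    return 100.0 * sum(1 for s in averaged if s > threshold) / len(averaged)
```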
The Z-score indicates overall model quality and measures the deviation of the total energy of the modeled structure from an energy distribution derived from random conformations [26]. A score outside the range characteristic of native proteins indicates an erroneous structure. The ProSA analysis indicated that all 3D models generated with MaxMod fall within the range of experimentally determined structures (Supplementary Fig. 1). Taken together, the results in Table 3 support the reliability of MaxMod and its improvement in model accuracy.
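In its generic form, the Z-score described above compares the energy of the model with the distribution of energies of alternative (random) conformations, where the mean and standard deviation are taken over that reference ensemble. This is the textbook definition; ProSA's specific energy functions and reference ensemble are not reproduced here.

```latex
z = \frac{E_{\text{model}} - \langle E \rangle_{\text{random}}}{\sigma_{\text{random}}}
```

A strongly negative z means the model's energy lies well below that of random conformations, as expected for a native-like fold.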
Conclusions
MaxMod is a feature-rich, user-friendly GUI to the MODELLER program for predicting protein 3D structures. Its main strengths are (i) the use of profile HMM-based methods, HMMER and Clustal Omega, for template identification and target-template alignment, respectively; (ii) straightforward modeling of proteins from templates containing modified residues; and (iii) other useful features such as (a) loop optimization, (b) express modeling, (c) model validation and (d) PDB database updates. In addition, both the processing time for model building and the overall model quality are significantly improved by replacing pairwise comparison methods with profile HMM methods. The program runs on any version of Microsoft Windows, and we plan to release updates twice a year.
References
Cavasotto CN, Phatak SS (2009) Homology modeling in drug discovery: current trends and applications. Drug Discov Today 14(13–14):676–683
Barton GJ (1998) Protein sequence alignment techniques. Acta Cryst D 54:1139–1146
Vyas VK, Ukawala RD, Ghate M, Chintha C (2012) Homology modeling a fast tool for drug discovery: current perspectives. Indian J Pharm Sci 74(1):1–17
Dalton JA, Jackson RM (2007) An evaluation of automated homology modelling methods at low target-template sequence similarity. Bioinformatics 23(15):1901–1908
Sali A, Blundell TL (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234(3):779–815
Kuntal BK, Aporoy P, Reddanna P (2010) EasyModeller: a graphical interface to MODELLER. BMC Res Notes 3:226–230
Mathur A, Vidyarthi AS (2011) SWIFT MODELLER: a JAVA based GUI for molecular modeling. J Mol Model 17(10):2601–2607
Bramucci E, Paiardini A, Bossa F, Pascarella S (2012) PyMod: sequence similarity searches, multiple sequence-structure alignments, and homology modeling within PyMOL. BMC Bioinformatics 13(Suppl 4):S2
Saxena A, Sangwan RS, Mishra A (2013) Fundamentals of homology modelling steps and comparison among important bioinformatics tools: an overview. Sci Int 1(7):237–252
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402
Pearson WR (2000) Flexible sequence similarity searching with the FASTA3 program package. Methods Mol Biol 132:185–219
Eddy SR (2011) Accelerated Profile HMM Searches. PLoS Comput Biol 7(10):e1002195
Krogh A, Brown M, Mian IS, Sjolander K, Haussler D (1994) Hidden Markov models in computational biology: applications to protein modeling. J Mol Biol 235(5):1501–1531
Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14(9):755–763
Yan R, Xu D, Yang J, Walker S, Zhang Y (2013) A comparative assessment and analysis of 20 representative sequence alignment methods for protein structure prediction. Sci Rep 3:2619–2627
Sauder JM, Arthur JW, Dunbrack RL Jr (2000) Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins 40(1):6–22
Edgar RC, Sjolander KA (2004) Comparison of scoring functions for protein sequence profile alignment. Bioinformatics 20(8):1301–1308
Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Soding J, Thompson JD, Higgins DG (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7:539
Laskowski RA, MacArthur MW, Moss DS, Thornton JM (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Cryst 26:283–291
Fetrow JS (1995) Omega loops: nonregular secondary structures significant in protein function and stability. FASEB J 9(9):708–717
Fiser A, Do RK, Sali A (2000) Modeling of loops in protein structures. Protein Sci 9(9):1753–1773
Bowie JU, Luthy R, Eisenberg D (1991) A method to identify protein sequences that fold into a known three-dimensional structure. Science 253(5016):164–170
Luthy R, Bowie JU, Eisenberg D (1992) Assessment of protein models with three-dimensional profiles. Nature 356(6364):83–85
Wiederstein M, Sippl MJ (2007) ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res 35:W407–W410
Sippl MJ (1995) Knowledge-based potentials for proteins. Curr Opin Struct Biol 5(2):229–235
Acknowledgments
NM is grateful to the Council of Scientific and Industrial Research, Govt. of India, for the award of a Senior Research Fellowship. The authors also thank the members of the CNeM department, CSIR-IMMT, for providing server space and hosting the MaxMod website.
Electronic supplementary material
ESM 1
(GIF 24 kb)
Cite this article
Parida, B.K., Panda, P.K., Misra, N. et al. MaxMod: a hidden Markov model based novel interface to MODELLER for improved prediction of protein 3D models. J Mol Model 21, 30 (2015). https://doi.org/10.1007/s00894-014-2563-3