Background

Comparing the architecture of different proteins can provide insights into the principles of their formation and function. Where proteins are very similar it can be useful to superimpose and inspect their three-dimensional structures computationally[1]. Proteins with less overall similarity may still share a common arrangement of secondary-structure features, as exemplified by the CATH[2] and SCOP[3] classification schemata. Outside the regions of secondary structure one can identify smaller structural motifs such as the β-turn[4], generally ranging from three to six residues in length, and defined by specific residues having particular dihedral angles or arrangements of hydrogen bonds[5]. In our studies of such small structural motifs, we needed to compare and analyse them, and separate the sub-classes that they often encompass. It was for this purpose that we developed Structure Motivator, the software described here.

Small structural elements in proteins are recognized by visual inspection of individual proteins using programs that display three-dimensional graphics, and they may then be compared by superimposition in programs of the same type (e.g. Figure nine of[6]). However, superimposition is not a practicable means of comparison for large sets of small structural motifs (hundreds and more). The most common practice is to display two-dimensional plots of the φ and ψ dihedral angles (Ramachandran plots[7]) at each residue[8–16]. Our program, Structure Motivator, also employs plots of dihedral angles of small three-dimensional structural elements. However, rather than merely providing a visualization end-point, the plots serve as a starting point for the interactive exploration of such elements.

Structure Motivator will be of value to structural biologists wishing to analyse existing small structural elements in proteins, sub-classify them, and define new ones. It allows users with no knowledge of relational databases to make what are, in effect, complex database queries defining new structural motifs, merely by selecting areas of the plots using a computer mouse.

Implementation

Structure Motivator has been designed for the desktop — rather than as a web application — to facilitate data input and output, to allow graphics to be saved and printed easily, and so that it can be used in the absence of an internet connection. It was written in the Java programming language so that it could be deployed across different platforms, and gratuitous version inflation was avoided for the benefit of those using older computers.

The Protein Motif Database[5], implemented in the MySQL database management system, was used during development and for preparing input files distributed with the application. A modified version of this was prepared (see Additional file 1 for the schema) and migrated using DdlUtils[17] to the Derby database management system[18], which is written in Java and may be embedded in Java applications. Both Structure Motivator and the PreMotivator utility contain this embedded database.

To optimize performance, SQL (Structured Query Language) queries to the embedded database in Structure Motivator are only used for the initial creation of Java objects corresponding to a chosen motif. Subsequent selections are made by addressing these objects in memory. SQL queries employing JDBC (Java Database Connectivity) were needed to prepare files for use in Structure Motivator, and this functionality was provided in a separate utility, PreMotivator. The format for Structure Motivator text files was chosen to facilitate conversion from tables resulting from SQL queries.

Structure Motivator provides one option that does require an internet connection. It can link to a web facility that allows individual motifs to be viewed in the context of the three-dimensional structure of the protein. This uses a server-based Perl CGI (Common Gateway Interface) application, motivator.cgi, to return a web page with appropriate three-dimensional visualization of the motif with the Jmol applet[19] and additional JavaScript controls. The server is that used for the Motivated Proteins web application, which houses both the MySQL database corresponding to that in Structure Motivator and the set of 429 curated files of protein subunits[13], both of which are used by the CGI application. However, the JavaScript/LiveConnect controls provided for the user differ from those in Motivated Proteins, being tailored to the requirements of Structure Motivator.

The Java code used for simple regression analysis came from the statistics package of the Apache Commons Mathematics library (http://commons.apache.org). The code for launching the user’s web browser was from Dem Pilafian (http://www.centerkey.com/java/browser), modified to run under Java 1.4.

Results

The functions of Structure Motivator are presented below, together with examples of their use. The console displayed upon starting the application is shown in Figure 1, with some of its components numbered for reference.

Figure 1

The Structure Motivator console and interactive selection. The console of Structure Motivator is shown displaying φψ plots for the residues in one of the motifs from the embedded database (a type I or 2,3-αRαR β-turn). The elliptical selection tool has been used to encompass the βL region of position 4 after adjusting the y axis (ψ) to allow this. Points representing the conformation of all the residues in instances of β-turns corresponding to the selection have become highlighted in blue. The grey horizontal and vertical lines in positions 2 and 3 show the user the dihedral angle ranges used to define the embedded motif, and the cartoon indicates its hydrogen-bonding pattern. The grey lines may be switched off using the ‘Angle Constraints’ control on the console.

Loading structural elements

On launching Structure Motivator the user is presented with a display of the first of the inbuilt motifs (derived from the Protein Motif database underlying the Motivated Proteins web facility[5]). The drop-down menu (Figure1: 1) allows one to choose from the 19 classes of motif, after which one may select a sub-class (there are almost 100 in all) from the list which loads in the window below (Figure1: 2). For these inbuilt motifs, a cartoon of the structure is displayed (Figure1: 3).

A second type of structural element may be loaded from within the application through a menu item, ‘Load n-mers’. This gives access to a complete set (90,000) of small peptides (there is a choice of 3-mers to 6-mers) from the proteins in the database, providing a blank canvas, as it were, from which one can define and prepare subsets of structural motifs.

Alternatively, text files in Structure Motivator format specifying sets of structural elements can be loaded using a menu item (‘Open File’). This is of particular interest to users who are able to generate their own data for analysis. Files in this format may also be generated by an associated utility, PreMotivator, which contains the same embedded database as Structure Motivator. PreMotivator allows specification of main-chain dihedral angles at different positions in a query peptide of up to nine residues in length (the maximum for Structure Motivator). The application website also contains files for some motifs not present in the Protein Motif database, e.g. γ-turns[20] and catgrips[21], together with some other structural elements, including α-turns, 3₁₀ helices, and sections of α-helices.

Choice of dihedral angles to display

After loading a set of structural motifs or elements, one is presented with separate φψ plots for each residue, i.e. the standard Ramachandran plot (Figure 1). The ‘Dihedral Combination’ controls (Figure 1: 4) allow one to change to the alternatives of χ1 angles plotted against φ (φχ1 plot), a linked composite of the φχ1 plot with the φψ plot (φψχ plot), or the anti-φψ plot.

The φψχ plot is particularly useful when one is concerned with the inter-relationship between χ1 angles and both φ and ψ angles. An example is provided by pentapeptides with the simple - -P- - sequence motif. Figure 2 shows two alternative χ1 marquee selections at position 3 (made as described below), allowing separate visualization of elements with either the DOWN (Figure2 (a)) or UP (Figure2 (b)) pucker of the proline ring. The φψχ plot allows one to see clearly the influence of these different χ1 distributions on the ψ distribution at position 2 — the preceding residue. (This has been observed previously[22, 23] and is particularly evident for the ζ[10] and αR regions of the plot. In addition it can be seen how the αL conformation is disfavoured at the position preceding a proline ring with UP pucker.)

Figure 2

Inter-relationship between φψ and φχ1 plots. A set of pentapeptides with pro at position 3 were displayed using the ‘φψχ’ combination setting (Figure1) and selections made in the χ1 region of position 3 corresponding to (a) the DOWN pucker, and (b) the UP pucker of the pro ring. Only the plots for positions 2 and 3 are shown. Note that the y axis (χ) of the φχ1 portion of the φψχ plot defaults to the range –120˚ to 240˚ to prevent the trans rotamer being split across the 180˚/–180˚ boundary.

In the anti-φψ plot (see Additional file2), the ψ angle for one position in a motif is plotted against the φ angle in the following position, allowing study of the pair of angles flanking the peptide bonds, rather than those flanking the α-carbon residues. Such plots are useful for examining peptide-plane flipping[24].
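The pairing that underlies the anti-φψ plot can be sketched in a few lines of Java. This is only an illustrative sketch, not the application's own code: it assumes that each structural element is held as two arrays of angles in degrees indexed by residue position, and the class and method names (`AntiPhiPsi`, `pairs`) are hypothetical.

```java
// Hypothetical sketch: not Structure Motivator's actual code.
final class AntiPhiPsi {

    /**
     * Pairs psi at position i with phi at position i + 1, i.e. the two
     * dihedral angles flanking the peptide bond between residues i and i + 1.
     * Returns one {psi(i), phi(i + 1)} pair per peptide bond.
     */
    static double[][] pairs(double[] phi, double[] psi) {
        if (phi.length != psi.length) {
            throw new IllegalArgumentException("phi and psi must have equal length");
        }
        int bonds = Math.max(0, phi.length - 1);
        double[][] result = new double[bonds][2];
        for (int i = 0; i < bonds; i++) {
            result[i][0] = psi[i];      // psi of the residue before the bond
            result[i][1] = phi[i + 1];  // phi of the residue after the bond
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative angles only (degrees) for a three-residue element.
        double[] phi = {-60.0, -65.0, -90.0};
        double[] psi = {-30.0, -25.0, 0.0};
        double[][] p = pairs(phi, psi);
        for (int i = 0; i < p.length; i++) {
            System.out.println("bond " + (i + 1) + "-" + (i + 2)
                    + ": psi=" + p[i][0] + ", phi=" + p[i][1]);
        }
    }
}
```

An element of n residues therefore yields n − 1 anti-φψ points, one for each peptide bond.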

Modifying the display

The console has controls to modify the way in which the dihedral angle plots are displayed. The number of points plotted may be decreased (Figure1: 8), which can be useful if there are very many of them, and an option is available (Figure1: 7) to hide the angle constraints (grey lines at residue positions 2 and 3 in Figure1) which indicate the ranges of dihedral angles used in the definition of some of the inbuilt motifs.

The two-dimensional nature of a standard Ramachandran plot does not express the 360˚ continuity of dihedral angles, so that a cluster of structural elements may appear both at the top and bottom of a plot, or at its left and right extremities. To facilitate selection of such clusters the user can adjust the axes of the plot by double-clicking at a residue position and entering values in a dialogue box. Such an adjustment, to group together points representing the βL conformation, is illustrated in Figure1 for residue 4.
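The effect of moving an axis origin can be illustrated with a short Java sketch that maps any dihedral angle into a 360˚ window starting at a chosen lower bound. It is a minimal illustration under our own assumptions, not the application's code; the names `AxisWindow` and `wrap` are hypothetical.

```java
// Hypothetical sketch of re-mapping angles onto a shifted 360-degree axis.
final class AxisWindow {

    /** Maps an angle in degrees into the half-open window [lower, lower + 360). */
    static double wrap(double angleDeg, double lower) {
        double shifted = (angleDeg - lower) % 360.0;
        if (shifted < 0.0) {
            shifted += 360.0; // Java's % keeps the sign of the dividend
        }
        return lower + shifted;
    }

    public static void main(String[] args) {
        // Two angles that sit on either side of the 180/-180 boundary.
        double[] angles = {170.0, -170.0};
        for (double a : angles) {
            System.out.println(a + " on a standard axis -> "
                    + wrap(a, -120.0) + " on an axis starting at -120");
        }
    }
}
```

With a lower bound of –120˚, angles of 170˚ and –170˚ become 170˚ and 190˚, so a cluster that straddles the 180˚/–180˚ boundary is drawn as one contiguous group that a single marquee can enclose.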

Another option (Figure1: 5) allows one to visualize the distribution of any particular amino acid within the residues of a structural element. One can highlight an amino acid (‘include’ it), ‘exclude’ it to view only the dihedral angle distribution of the other amino acids, or restrict the display to this ‘sole’ amino acid. One use of this facility is to determine whether or not a particular amino acid is evenly distributed within a region of the plot. For example, applying this for glycine in Figure1 demonstrates its asymmetric distribution in the αL region at residue 4 (see Additional file3).

Making selections from the dihedral angle plots

There are two criteria on which selection of a sub-set of structural motifs may be made: amino-acid sequence pattern and dihedral angle distribution at a particular position.

The ‘Edit Patterns’ button on the console (Figure 1: 6) provides access to a dialogue box in which the user may specify a sequence pattern of amino acid residues to be present in a structural motif (displayed in black) and/or a pattern of residues to be excluded (displayed in red). The 4214 instances of the element illustrated in Figure 2 were selected in this way from 90,000 5-mers by specifying the pattern - -P- -. A facility that may be used to inform such sequence-based selection is a pop-up menu of amino-acid composition at any residue position, invoked by a right mouse click in the plot for the residue in question (see Additional file 4).
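The logic of combining an inclusion pattern with an exclusion pattern can be sketched as follows. This is an illustration only: the one-character-per-position pattern syntax with ‘-’ as a wildcard, and the names `SequencePattern` and `accepts`, are assumptions made for the sketch and may differ from the application's internal representation.

```java
// Hypothetical sketch of include/exclude sequence-pattern filtering.
final class SequencePattern {

    static final char ANY = '-'; // assumed wildcard: any residue allowed

    /**
     * Returns true if the sequence carries every residue required by the
     * include pattern and none of the residues named at the same position
     * in the exclude pattern.
     */
    static boolean accepts(String sequence, String include, String exclude) {
        if (sequence.length() != include.length()
                || sequence.length() != exclude.length()) {
            throw new IllegalArgumentException("patterns must match sequence length");
        }
        for (int i = 0; i < sequence.length(); i++) {
            char residue = sequence.charAt(i);
            char required = include.charAt(i);
            char forbidden = exclude.charAt(i);
            if (required != ANY && required != residue) {
                return false; // required residue is absent
            }
            if (forbidden != ANY && forbidden == residue) {
                return false; // excluded residue is present
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative pentapeptide sequences, one-letter code.
        String[] sequences = {"AGPSL", "AGASL", "AAPSL"};
        for (String s : sequences) {
            System.out.println(s + " accepted: " + accepts(s, "--P--", "-G---"));
        }
    }
}
```

Here the include pattern requires proline at position 3 and the exclude pattern rejects glycine at position 2, so the two criteria narrow the selection independently.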

To select structural elements with a particular range of dihedral angles one chooses either the rectangular or elliptical marquee tool in the ‘Area Selection’ region of the console (Figure 1: 9) and drags over the area to be selected. The points corresponding to selected instances of the elements are highlighted in blue (both within the dotted outline of the selection marquee and in the plots for the other residues), whereas unselected points remain red (Figure 1). Sometimes it is more convenient to exclude an area of the plot. This can be done by holding down a modifier key when dragging, in which case instances outside the area enclosed by the marquee are selected. As an aid to precise selection one can display the co-ordinates at any point in a ‘tool-tip’ if the cursor is kept stationary at that point for a few seconds (see Additional file 4).
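The geometry of such a selection can be sketched in Java. The sketch below is not taken from the application; it assumes the marquee is defined by two drag corners in plot co-ordinates and that the elliptical marquee is the ellipse inscribed in the dragged rectangle. All names (`Marquee`, `select`, and so on) are hypothetical.

```java
// Hypothetical sketch of marquee selection; names are illustrative only.
final class Marquee {

    /** True if (x, y) lies within the rectangle spanned by two drag corners. */
    static boolean inRectangle(double x, double y,
                               double x1, double y1, double x2, double y2) {
        return x >= Math.min(x1, x2) && x <= Math.max(x1, x2)
            && y >= Math.min(y1, y2) && y <= Math.max(y1, y2);
    }

    /** True if (x, y) lies within the ellipse inscribed in that rectangle. */
    static boolean inEllipse(double x, double y,
                             double x1, double y1, double x2, double y2) {
        double rx = Math.abs(x2 - x1) / 2.0;
        double ry = Math.abs(y2 - y1) / 2.0;
        if (rx == 0.0 || ry == 0.0) {
            return false; // a degenerate marquee encloses nothing
        }
        double dx = (x - (x1 + x2) / 2.0) / rx;
        double dy = (y - (y1 + y2) / 2.0) / ry;
        return dx * dx + dy * dy <= 1.0;
    }

    /**
     * Flags each instance by testing its angles at ONE residue position.
     * The flag belongs to the whole instance, so the same instance can be
     * highlighted in the plots for every other position. 'invert' models
     * the modifier key that selects everything outside the marquee.
     */
    static boolean[] select(double[] phi, double[] psi,
                            double x1, double y1, double x2, double y2,
                            boolean elliptical, boolean invert) {
        if (phi.length != psi.length) {
            throw new IllegalArgumentException("phi and psi must have equal length");
        }
        boolean[] selected = new boolean[phi.length];
        for (int i = 0; i < phi.length; i++) {
            boolean inside = elliptical
                    ? inEllipse(phi[i], psi[i], x1, y1, x2, y2)
                    : inRectangle(phi[i], psi[i], x1, y1, x2, y2);
            selected[i] = inside != invert;
        }
        return selected;
    }

    public static void main(String[] args) {
        // Illustrative angles (degrees) at one position for three instances.
        double[] phi = {-60.0, 60.0, -95.0};
        double[] psi = {-45.0, 45.0, -95.0};
        boolean[] sel = select(phi, psi, -100.0, -100.0, 0.0, 0.0, true, false);
        for (int i = 0; i < sel.length; i++) {
            System.out.println("instance " + i + " selected: " + sel[i]);
        }
    }
}
```

Because the result is one flag per instance rather than per point, highlighting the selection in the plots for the other residues needs no further geometric tests.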

The power of such interactive selection is in defining a subset of structural elements for export. (In Figure 1 one could proceed to export all Type I β-turns with the βL conformation in position 4.) However, the tool can also be used analytically. An option in the ‘Area Selection’ region of the console allows display of statistics for the selected area: the mean values of the angles within it and the slope of the regression line through the selected points (see Additional file 4).
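The two kinds of statistic mentioned above can be sketched as follows. The application itself takes its regression code from the Apache Commons Mathematics library; to remain self-contained, this sketch instead writes out the ordinary least-squares slope by hand, and it uses a plain arithmetic mean. The class and method names are hypothetical.

```java
// Hypothetical sketch: hand-written statistics for the points in a selection.
final class SelectionStats {

    /** Arithmetic mean of the supplied values. */
    static double mean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Ordinary least-squares slope of y regressed on x. */
    static double slope(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired points");
        }
        double mx = mean(x);
        double my = mean(y);
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx;
            sxy += dx * (y[i] - my);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values are all equal");
        }
        return sxy / sxx;
    }

    public static void main(String[] args) {
        // Illustrative phi/psi values (degrees) for points inside a marquee.
        double[] phi = {-70.0, -65.0, -60.0, -55.0};
        double[] psi = {-35.0, -40.0, -42.0, -48.0};
        System.out.println("mean phi = " + mean(phi));
        System.out.println("mean psi = " + mean(psi));
        System.out.println("slope of psi on phi = " + slope(phi, psi));
    }
}
```

An arithmetic mean is only meaningful when the selected points do not straddle the edge of the plot, which is one reason to regroup a cluster by adjusting the axes before selecting it.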

Comparing dihedral angles within elements

Structure Motivator provides a facility that allows the distribution of φ/ψ angles at different positions in a structural element to be compared by superimposition (the ‘Superimpose φ/ψ plots’ menu item). An analytical example of the use of this is shown in Figure 3 for the three central residues of five-residue 3₁₀ helices (structural elements in an external file derived from SQL queries on our Protein Motif database). It can be seen how the distribution of dihedral angles changes between positions, as has been documented previously[25].

Figure 3

Superimposition of plots from different positions in a motif. (a) The plot superimposition window of Structure Motivator is shown displaying the φψ distribution for the three central residues of the five-residue 3₁₀ helix in a pentapeptide. The pentapeptide includes the N-cap (position 1) and the C-cap (position 5), the display of which has been suppressed by deselection. Residues are drawn in the order 2 (red), 3 (blue) and 4 (green), so that only the distribution for residue 4 is seen in its entirety. (b) As in the foregoing, but with the order reversed so that position 2 is drawn last and its distribution seen in its entirety. The file used for this figure (‘threeTen5.txt’) is included in the package distributed with the application.

Figure3 also illustrates that irrelevant positions (positions 1 and 5 in this case) may be excluded and that the order of imposition may be altered (cf. Figure3 (a) and (b)). One can use this facility to prepare figures for publication (e.g. Figure four and Figure five of[26]) as colours may be modified, if necessary, for the output medium, and superimpositions saved or printed.

Viewing elements in the context of a protein

A virtue of the dihedral angle plot is that it allows multiple instances of a structural type to be examined. However at a certain stage in analysis one often wishes to examine individual instances in the context of the three-dimensional structure of the proteins in which they reside. The menu option, ‘Inspect Motifs’, lists the motifs in a marquee selection (Figure4 (a)), and allows the user to select one and view it on a web page (Figure4 (b)) using the Jmol structure viewer[19]. There are two alternative views of the page — ‘in protein’ (Figure4 (b)) or ‘close-up’ (Figure4 (c)) — each with custom controls in addition to Jmol’s own controls.

Figure 4

Inspection of structural elements in the Jmol viewer. (a) The ‘Inspect Motifs’ window of Structure Motivator with the 58 octapeptides described in the text loaded and one selected ready for viewing. (b) Selected structural element (2CBP, residues 15–22[27]), highlighted in red, in secondary structure view in the context of the whole protein. (c) Close-up view of the octapeptide with hydrogen bonds visualized. The display of side-chains and residue numbers has been suppressed. The file used to generate the octapeptide in PreMotivator (‘8mer.txt’) is included in the package distributed with the application. The description ‘40-range’ indicates that a range of ± 40˚ for the angles specified in 8mer.txt was used to define the peptide.

The way that this facility might be used is illustrated for a set of 58 octapeptides (generated with PreMotivator) in which the dihedral angles specified at positions 2–7 are those found in three successive β-turns, the first two of type I (2,3-αRαR) and the third of type II (2,3-βRαL). The question that we wished to answer was which, if any, of these elements were not parts of α-helices. Using the ‘Inspect Motifs’ facility we loaded each successively into Jmol, turned on the secondary-structure display option, located the octapeptide (highlighted in red), and noted if it fell outside the helices. We processed the 58 structural elements in just ten minutes, identifying six of interest, one of which is shown in Figure4.
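The kind of test implied by specifying dihedral angles with a tolerance (such as the ± 40˚ range used for these octapeptides) can be sketched in Java. The sketch assumes that the tolerance is applied to the circular difference between angles, so that 175˚ and –175˚ count as 10˚ apart; whether PreMotivator handles the boundary in this way is not stated here, and the names `AngleRange`, `circularDiff` and `matches` are hypothetical.

```java
// Hypothetical sketch of matching observed dihedral angles to target values.
final class AngleRange {

    /** Smallest absolute difference between two angles, in degrees (0 to 180). */
    static double circularDiff(double a, double b) {
        double d = Math.abs(a - b) % 360.0;
        return d > 180.0 ? 360.0 - d : d;
    }

    /**
     * True if every observed angle lies within 'tolerance' degrees of the
     * target angle at the same index (e.g. phi2, psi2, phi3, psi3, ...).
     */
    static boolean matches(double[] observed, double[] target, double tolerance) {
        if (observed.length != target.length) {
            throw new IllegalArgumentException("angle lists must have equal length");
        }
        for (int i = 0; i < observed.length; i++) {
            if (circularDiff(observed[i], target[i]) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative target and observed phi/psi values for one residue.
        double[] target = {-60.0, -30.0};
        double[] observed = {-75.0, -50.0};
        System.out.println("within 40 degrees: " + matches(observed, target, 40.0));
    }
}
```

A wider tolerance admits more instances at the cost of a looser structural definition, which is why inspection of individual instances, as described above, remains useful.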

The utility of the close-up view is illustrated in Figure4 (c). Specifying dihedral angles found in β-turns — as was done in generating the octapeptides — does not in itself guarantee that corresponding hydrogen bonds are present. However using the close-up view one can ‘click-join’ potential hydrogen-bonded atoms and see the length of the putative bond displayed.

Exporting data

Structure Motivator allows export of different types of data. Pertinent to the objective of sub-categorizing structural elements is export of selections of the type shown in Figure 1 or 2 as files in Structure Motivator format. Such exported text files can be reloaded into Structure Motivator for further analysis or sub-categorization. There are also options to save simpler listings of the primary structures of elements in a selection, either with information identifying their position in a protein (for use when inspecting the motifs in a molecular viewer other than Jmol) or as plain alphabetical strings suitable for computational analysis.

Other features

The colours with which points are displayed in Structure Motivator can be changed from the Preferences menu item to provide altered contrast for those with impaired colour vision or for the requirements of publication. Facilities for printing and saving graphical visualizations are available from the File menu. Instructions are available from the Help menu within the application, together with links to an on-line glossary of the inbuilt motifs. A manual containing more detailed instructions and information in PDF format (Additional file5) is distributed with the application and is available on-line.

Discussion

Identification and analysis of small regions of protein structure has focussed primarily on linear patterns of amino acids, for example those in the Pfam[28] and Prosite[29] databases. Fewer applications are directed specifically at the three-dimensional conformations of such structures, although the DALI tool has been used for this purpose[30] and Ramachandran Plot Explorer allows one to investigate the effects (e.g. on hydrogen bonds) of altering the dihedral angles[31]. The Ramachandran plot is frequently used for visualization or analysis in protein studies, often in relation to a single protein. For example the PROCHECK suite of programs uses Ramachandran plots to check the stereochemical quality of protein structures[32]. However this use is quite different from that in Structure Motivator, and we are not aware of comparable software for the purpose of analysing small structural elements.

The facilities most related to Structure Motivator are in a web application, PDBeMotif (formerly MSDmotif[6]), rather than in a standalone program. PDBeMotif (http://www.ebi.ac.uk/pdbe-site/pdbemotif/) has comprehensive form-based querying of the whole Protein Data Bank, and presents summary data for many of the motifs from Motivated Proteins. It also provides φ/ψ (but not χ1) plots of each motif, but these are intended for visualization, and interaction (clicking within a plot) is solely to link to the corresponding proteins (cf. our ‘Inspect Motifs’ facility). With Structure Motivator, in contrast, interaction using the marquee tool allows sub-sets within a broad group to be selected for further analysis, and we regard this as the distinguishing feature of the application.

An embedded relational database of 429 high-resolution protein structures[5] underpins Structure Motivator. This was ported from MySQL to Derby, a different database management system written in Java and designed to allow databases to be incorporated into programs. The database allows the program to generate peptide ‘templates’ from which users can prepare their own structural elements, either within the application itself, or with the auxiliary tool, PreMotivator.

Although we have provided a tool for use by structural biologists without database or programming skills, we recognize that it is not without limitations. The embedded database within Structure Motivator is restricted to 429 proteins, albeit high-resolution structures with added fixed hydrogen atoms and corrected to ensure optimal orientations of asn and gln[33]. Nevertheless, if users wish to examine structural elements from proteins not represented in this set they need to derive them elsewhere and import them into Structure Motivator. The other limitation is that we do not provide a tool for users to prepare structural motifs with particular specified hydrogen-bonding patterns, in part because SQL queries involving hydrogen bonds can be very slow to run. The motifs provided do present several hydrogen-bonding patterns that may be useful as starting points, and we have shown how the customized Jmol view provided allows sub-classes of structural elements to be examined for hydrogen bonds (Figure4 (c)).

Conclusions

We have demonstrated how Structure Motivator can be employed as a research tool to analyse and sub-classify either the inbuilt motifs provided or a user’s own set of external structural elements. Its repertoire of tools can be used to analyse any peptide with a definable structure: all that is necessary is that the peptide have a fixed number of residues and a common reference point. As an example, we have used Structure Motivator to analyse hexapeptides in which the third residue is the C-terminus of an α-helix[26] (not what one might normally regard as a ‘motif’) and then sub-divided these hexapeptides by making selections based on the conformation at the C-cap (residue 4). Other structural elements that we have analysed in our published research are α-turns and six-residue 3₁₀ helices[26], and examples from our unpublished work include β-hairpins, αRαL repeats, and peptides containing residues with dihedral angles in the ζ-region of the Ramachandran plot.

Structure Motivator provides functionality not found in other applications for investigating protein structure. Equally important are the ease, speed and immediacy with which this functionality can be employed. Consider, for example, the ζ-region of the dihedral-angle plot in Figure 2, and how much easier, quicker and more accurate it is to select this with an elliptical marquee tool than by making the corresponding SQL query. The visualizations available for the structural subsets in the resulting selections themselves suggest new queries, which can be rapidly made by a succession of further selections. Thus, Structure Motivator is a unique “What if?” tool for investigating the three-dimensional structure of proteins: it both provokes ideas for experimental avenues and provides the means by which one may explore them.

Availability and requirements

Project name

Structure Motivator

Project home page

http://motif.gla.ac.uk/motivator.html

Operating system(s)

Platform independent

Programming language

Java

Other requirements

Java 1.4 or higher. Internet connection and web browser with Java support for inspecting individual structures using the Jmol applet.

License

GPL

Restrictions to non-academic use

License required