
6.1 Introduction

The structure of proteins and other macromolecules is fundamental to their biological interactions. Because biological molecules interact at their surfaces, understanding the surface characteristics of the participating molecules is particularly useful for studying the interactions among them. Although the boundary surface of the electronic density surrounding a molecule is not well defined, the term “molecular surface” was introduced by Richards in 1977 to describe a molecular envelope accessible to, e.g., a solvent molecule [1]. Several representational schemes define the molecular surface model. These include the isovalue electronic density surface, the van der Waals surface, Richards’s molecular surface, and the solvent accessible surface (SAS) [2].

The isovalue electronic density surface is the molecular envelope consisting of points with the same electronic density value, generally 0.002 au, in a given volume.

The van der Waals surface, by contrast, is defined as the molecular envelope of the atomic spheres with van der Waals radii. It is constructed simply from the overlapping van der Waals spheres of the atoms: it is the union of all portions of the atomic sphere surfaces that are not occluded by neighboring atomic spheres.

Richards’s molecular surface is composed of two different kinds of surface patches: the contact surface and the reentrant surface [1]. Imagine a small “probe” molecule approaching the van der Waals surface of a macromolecule. Depending on the size of the probe molecule (except for a probe of zero size), there will be regions of “dead space,” crevices that are not accessible to the probe as it rolls about on the macromolecule. The molecular surface is traced out by the inward-facing part of the probe sphere as it rolls on the van der Waals surface of the macromolecule. The contact surface is formed by the part of the van der Waals surface of each atom that is accessible to the probe sphere. The reentrant surface corresponds to the inward-facing part of the probe sphere when it is simultaneously in contact with two or three atoms that form crevices too narrow for the probe molecule to penetrate. Richards’s molecular surface is usually defined using a water molecule as the probe, represented as a sphere with a radius of 1.4 Å. Connolly [3] proposed an analytical method for calculating Richards’s molecular surface, in which a set of curved regions of spheres and tori, joined at circular arcs, describes the molecular surface.

The solvent accessible surface (SAS) corresponds to the molecular envelope of the surface that is traced by the center of the probe molecule sphere as it rolls on the van der Waals surface of the macromolecule [4, 5]. The center of the probe molecule can thus be placed at any point on the accessible surface and not penetrate the van der Waals spheres of any of the atoms in the macromolecule. Mathematically, it is equivalent to a van der Waals surface in which the atomic radii have been extended by the probe radius.
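The equivalence stated above can be illustrated with a minimal sketch. It is not taken from the chapter: the function names, the `(center, vdw_radius)` atom representation, and the tolerance are assumptions made for illustration, and the 1.4 Å default is the water probe radius mentioned earlier.

```python
import math

PROBE_RADIUS = 1.4  # water probe radius in angstroms, as quoted in the text


def extended_radius(vdw_radius, probe_radius=PROBE_RADIUS):
    """Radius of the atomic sphere after extension by the probe radius."""
    return vdw_radius + probe_radius


def probe_center_allowed(point, atoms, probe_radius=PROBE_RADIUS):
    """True if a probe centred at `point` penetrates no van der Waals sphere.

    `atoms` is a list of (center, vdw_radius) pairs (hypothetical layout).
    The probe touches or clears an atom when the centre-to-centre distance
    is at least vdw_radius + probe_radius.
    """
    for center, vdw_radius in atoms:
        limit = extended_radius(vdw_radius, probe_radius)
        if math.dist(point, center) < limit - 1e-9:
            return False
    return True


def on_sas(point, atoms, probe_radius=PROBE_RADIUS, tol=1e-6):
    """True if `point` lies on the SAS: on some extended sphere, inside none."""
    touches = any(
        abs(math.dist(point, center) - extended_radius(radius, probe_radius)) <= tol
        for center, radius in atoms
    )
    return touches and probe_center_allowed(point, atoms, probe_radius)
```

For two overlapping atoms, a point on the far side of one extended sphere is reported as lying on the SAS, while a point on the same extended sphere that falls inside the neighbor's extended sphere is not.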

Figure 6.1 illustrates the last three kinds of representational schemes for the molecular surface model.

Fig. 6.1 Schematic view of the van der Waals surface, Richards’s surface, and SAS

Molecular surface modeling has several applications. One direct benefit of molecular surfaces is the visualization of proteins and other macromolecules [6–8]. Various physicochemical properties such as electrostatic potential and hydrophobicity [9] can be mapped onto the molecular surface and color coded [10–14]. Molecular surfaces are crucial in the study of protein-protein interactions and interfaces [15], and they have been applied to the protein-protein docking problem, which is the prediction of the complex formed by two proteins given the three-dimensional structures of the individual proteins [16–18]. Another major purpose of molecular surface investigation is identifying binding pockets on protein surfaces to support rational or structure-based drug design [19–25].

A significant number of the atoms of a protein or other macromolecule lie buried beneath its molecular surface. Interactions among macromolecules are often dominated by interactions with the “surface atoms,” although interactions with the interior atoms certainly contribute to the total intermolecular interaction energy. Therefore, classifying the atoms of proteins and other macromolecules as “surface atoms” or “interior atoms” is significant for biochemical tasks, particularly molecular docking. Several factors should be considered for such a classification, e.g., the running time of the classification algorithm, the number of surface atoms correctly identified, and the numbers of surface atoms and interior atoms incorrectly identified [26].

In this chapter, we present a simple, graphics hardware-based approach to distinguishing surface atoms of macromolecules from interior atoms. The chapter is organized as follows. In Sect. 6.2, we review related research. In Sect. 6.3, we give an overview of our algorithm and describe its implementation details. Section 6.4 presents experimental results and discussion, and the final section concludes our study.

6.2 Prior Work

Deanda et al. [26] propose the following definition of surface atoms: “An atom will be classified as an ‘effective surface atom’ if its SAS area is greater than a user specified minimum threshold value for the atomic SAS area $SA_{\min}^{acc}$.” Accordingly, they develop an SAS approach to distinguishing the surface atoms of macromolecules from the interior atoms. The SAS approach is computational: it calculates the atomic contributions to the SAS area and compares them with a constant value designated beforehand as the minimum threshold for the atomic SAS area. They adopt a surface area and volume package (SAVOL3) [27, 28] to calculate the atomic SAS area. In their paper, they also summarize several other methods for surface atom identification: (1) the NIN (number of intersecting neighbors) approach, based on the intuitive notion that the number of intersecting neighbors (i.e., neighboring atoms whose atomic spheres intersect that of the atom) is far greater for interior atoms than for surface atoms; (2) the SOV (sum of vectors) approach, a variation of the NIN approach that uses the norm of the sum of the vectors from an atom to its neighbors as the criterion for distinguishing surface atoms from interior atoms; (3) the UCSF (University of California at San Francisco) approach, which embeds the macromolecule within a 3D lattice and associates the atoms with the lattice points to classify surface atoms [29]; and (4) the MDS (molecular dot surface) approach, which uses the molecular point cloud representation to identify surface atoms [30].
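To make the threshold definition from [26] concrete, the following sketch classifies atoms by comparing an estimated atomic SAS area with a user-specified minimum. It is only an illustration under stated assumptions: the areas are estimated by sampling points on each extended sphere (a Shrake-Rupley-style numerical estimate, swapped in here for the SAVOL3 package, which is not reproduced), and all function names, the `(center, vdw_radius)` atom layout, and the default parameter values are hypothetical.

```python
import math


def sphere_points(n):
    """n roughly uniform unit vectors on a sphere (golden-spiral lattice)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        rho = math.sqrt(max(0.0, 1.0 - z * z))
        points.append((rho * math.cos(golden * i), rho * math.sin(golden * i), z))
    return points


def atomic_sas_areas(atoms, probe_radius=1.4, n_points=200):
    """Estimate each atom's contribution to the SAS area by point sampling.

    A sample point on an atom's extended sphere counts as exposed when it is
    not inside the extended sphere of any other atom.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(atoms):
        big_r = ri + probe_radius
        exposed = 0
        for u in unit:
            p = (ci[0] + big_r * u[0], ci[1] + big_r * u[1], ci[2] + big_r * u[2])
            if all(
                math.dist(p, cj) >= rj + probe_radius
                for j, (cj, rj) in enumerate(atoms)
                if j != i
            ):
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas


def effective_surface_atoms(atoms, min_area, probe_radius=1.4, n_points=200):
    """Flag atoms whose estimated SAS area exceeds the threshold `min_area`."""
    areas = atomic_sas_areas(atoms, probe_radius, n_points)
    return [area > min_area for area in areas]
```

The result depends on both `min_area` and the sampling density `n_points`, which mirrors the dependence on the threshold and on the precision of the area calculation discussed for the SAS approach.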

All of these algorithms are geometry based. While the NIN, SOV, UCSF, and MDS approaches suffer from ambiguities in identifying surface atoms (i.e., atoms are often misclassified) [26], the SAS approach requires geometric computation of atomic SAS areas, which is often performed with specific software packages. With the rapid development of graphics processing units (GPUs), numerous applications have been developed based on graphics hardware [31–35, 39–42]. We believe that techniques developed for graphics hardware rendering will be very useful for bio-related tasks, such as the identification of surface atoms of proteins or other macromolecules.

6.3 Algorithm Overview and Implementation

The key idea behind the definition of surface atoms in [26] is that an atom of a macromolecule is considered a “surface atom” if it contributes to the molecule’s SAS. Bearing this in mind, we adjust the surface atom definition slightly as follows. Let an atom A (with van der Waals radius r) of a macromolecule M be represented as a hard sphere HS, and let the extended hard sphere (EHS) be the counterpart of HS with its radius extended by the probe radius pr to (r + pr). Atom A is then classified as a “surface atom” if its EHS can be seen from outside the solvent accessible surface (SAS) of the molecule M.

6.3.1 Algorithm Overview

Our algorithm is based on the rendering of the EHSs with commercially available graphics hardware. Therefore, we can exploit the hardware to increase performance.

Imagine that the solvent accessible surface of a macromolecule M is surrounded by a bounding box and that each face of the box is a viewing plane. An image is generated for each face by parallel projecting onto it the EHSs of the macromolecule M with hidden surfaces removed by depth comparison (Fig. 6.2).

Fig. 6.2 Identifying surface atoms with color and depth buffers

Therefore, if the EHS of an atom appears in one or more of the six images, then the atom will be classified as a surface atom. Resolutions for the faces are chosen so that there are enough pixels for classifying the surface atoms.
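The six-image classification described above can be emulated in software as a rough sketch. This is not the hardware implementation: the per-pixel sphere test below stands in for what the graphics hardware does when it rasterizes the EHSs with a depth buffer, and the function names, the `(center, vdw_radius)` atom layout, and the default resolution are assumptions.

```python
import math


def visible_ids(spheres, axis, sign, res):
    """Orthographic image of the spheres on one face of their bounding box.

    spheres: list of (center, radius) pairs, here the extended hard spheres.
    axis:    0, 1 or 2, the axis the parallel projection runs along.
    sign:    +1 views from the +axis side, -1 from the -axis side.
    res:     number of pixels along each side of the face.
    Returns the set of sphere indices that own at least one pixel.
    """
    ua, va = [a for a in range(3) if a != axis]
    lo_u = min(c[ua] - r for c, r in spheres)
    hi_u = max(c[ua] + r for c, r in spheres)
    lo_v = min(c[va] - r for c, r in spheres)
    hi_v = max(c[va] + r for c, r in spheres)
    du, dv = (hi_u - lo_u) / res, (hi_v - lo_v) / res
    depth = [[-math.inf] * res for _ in range(res)]  # larger means nearer
    color = [[-1] * res for _ in range(res)]         # -1 is the background
    for idx, (c, r) in enumerate(spheres):
        i0 = max(0, int((c[ua] - r - lo_u) / du))
        i1 = min(res - 1, int((c[ua] + r - lo_u) / du))
        j0 = max(0, int((c[va] - r - lo_v) / dv))
        j1 = min(res - 1, int((c[va] + r - lo_v) / dv))
        for i in range(i0, i1 + 1):
            x = lo_u + (i + 0.5) * du - c[ua]
            for j in range(j0, j1 + 1):
                y = lo_v + (j + 0.5) * dv - c[va]
                h2 = r * r - x * x - y * y
                if h2 < 0.0:
                    continue  # pixel centre lies outside this sphere's disc
                nearness = sign * c[axis] + math.sqrt(h2)
                if nearness > depth[i][j]:  # hidden-surface removal
                    depth[i][j] = nearness
                    color[i][j] = idx
    return {idx for row in color for idx in row if idx >= 0}


def classify_surface_atoms(atoms, probe_radius=1.4, res=200):
    """Flag an atom as a surface atom if its EHS appears in any of six images."""
    ehs = [(center, radius + probe_radius) for center, radius in atoms]
    flags = [False] * len(atoms)
    for axis in range(3):
        for sign in (1, -1):
            for idx in visible_ids(ehs, axis, sign, res):
                flags[idx] = True
    return flags
```

As in the text, an atom whose EHS covers only a sliver of a face can be missed when `res` is too low, so the resolution has to be chosen with the smallest visible patches in mind.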

6.3.2 Implementation

The implementation of the algorithm takes advantage of graphics hardware capabilities (e.g., the color buffer and the depth buffer), the OpenGL graphics library, and the OpenGL Utility Toolkit (GLUT) [36, 37]. Apart from positioning and orienting objects in the scene, OpenGL offers facilities to define a viewing volume and to specify how objects are projected onto the screen. There are two kinds of projection: orthographic and perspective. The orthographic projection draws objects without affecting their relative sizes. The perspective projection is similar to human vision: the farther away an object is, the smaller it appears, and two parallel straight lines seem to converge in the distance. In both cases, the viewing volume is a hexahedron: a box or a truncated pyramid, respectively (Fig. 6.3).

Fig. 6.3 (a) Orthographic and (b) perspective views

In our algorithm, the orthographic projection is used and the bounding box of the macromolecule’s SAS is adopted as the viewing volume. An image is generated for each of the six faces of the viewing volume by rendering the EHSs of the macromolecule with hidden surfaces removed.

For graphics hardware rendering with OpenGL, the color information at each pixel can be stored either in RGBA mode or in color-index mode. In the first mode, the R, G, B, and possibly alpha values are kept for each pixel. In the second mode, however, only a single number (called the color index) is stored for each pixel. Each color index indicates an entry in a color table that defines a particular set of R, G, and B values. In either RGBA or color-index mode, a certain amount of color data is stored at each pixel. This amount is determined by the number of bitplanes in the frame buffer. A bitplane contains one bit of data for each pixel.

Most commonly available low-end graphics cards provide at least 16 bitplanes for color storage in RGBA mode and at most 8 bitplanes in color-index mode. Because a typical macromolecule often contains several hundred to several thousand atoms, we choose the RGBA mode in this implementation. A color-index implementation would be more straightforward, and high-end graphics workstations can be used to improve its efficiency (e.g., with 12 bitplanes for color-index buffers on SGI Octane workstations).
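The arithmetic behind this choice is simple: n bitplanes can distinguish 2^n values per pixel. The sketch below works it out; reserving one value for the background is an assumption made here, not something the text specifies, and both function names are hypothetical.

```python
def max_atoms_for_bitplanes(bitplanes, reserve_background=True):
    """Number of atoms that can receive distinct pixel values.

    Assumption: one value is set aside for background pixels unless
    reserve_background is False.
    """
    values = 2 ** bitplanes
    return values - 1 if reserve_background else values


def bitplanes_needed(n_atoms, reserve_background=True):
    """Smallest number of bitplanes that can give every atom its own value."""
    needed = n_atoms + (1 if reserve_background else 0)
    return max(1, (needed - 1).bit_length())
```

Under this assumption, 8 bitplanes cover 255 atoms and 12 bitplanes cover 4095, whereas 16 bitplanes cover 65535, which is why a molecule with thousands of atoms does not fit into an 8-bitplane color-index buffer.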

Each atom is first assigned a unique identity, and a color table with as many entries as the macromolecule has atoms is created, with each entry corresponding to one atom identity. Each atom’s EHS is then rendered with the color in the color table that corresponds to the atom’s identity. Subsequently, the color values of the rendered EHSs are read back from the color buffer and used to determine which EHSs appear in the images. To record this, a Boolean array serves as a flag list that indicates which atoms are surface atoms and which are not. Display lists are used to render the EHSs with high performance.
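One possible way to realize the identity-to-color mapping and the flag list is sketched below. The encoding is an assumption (8 bits per R, G, and B channel, with black reserved for the background); the chapter does not state how its color table is laid out, and the function names are hypothetical. The pixel list stands in for values read back from the color buffer. For the readback to return exactly the colors that were drawn, effects that alter colors, such as lighting, blending, and dithering, would need to be disabled.

```python
def id_to_rgb(atom_id):
    """Map atom index 0, 1, 2, ... to a unique 8-bit (r, g, b) triple.

    Assumption: (0, 0, 0) is reserved for the background, so atom 0 is
    stored as code 1.
    """
    code = atom_id + 1
    if not 0 < code < (1 << 24):
        raise ValueError("atom_id does not fit into 24 color bits")
    return ((code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF)


def rgb_to_id(rgb):
    """Inverse of id_to_rgb; returns -1 for a background pixel."""
    r, g, b = rgb
    return ((r << 16) | (g << 8) | b) - 1


def flag_surface_atoms(pixels, n_atoms, flags=None):
    """Update the Boolean flag list from one image's pixels.

    `pixels` is any iterable of (r, g, b) triples. Pass the list returned
    by a previous call as `flags` to accumulate over several images.
    """
    if flags is None:
        flags = [False] * n_atoms
    for rgb in pixels:
        atom_id = rgb_to_id(rgb)
        if 0 <= atom_id < n_atoms:
            flags[atom_id] = True
    return flags
```

Calling `flag_surface_atoms` once per rendered image with the same `flags` list accumulates the result over all viewing planes.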

It is worth noting that the same viewing matrix is used for each pair of renderings (e.g., front and back, left and right, and top and bottom). This is done by setting the depth comparison logic for one rendering of the pair with glDepthFunc( ) so that the farthest z-depth values are kept instead of the closest, and by setting the face culling logic for the same rendering with glCullFace( ) so that the front polygons of the EHSs are eliminated. For instance, to render the two images for the front and back pair of the viewing volume, the viewing matrix for the front view is set first. The first image (corresponding to the front view) is then generated by culling the back polygons of the molecule’s EHSs, which face away from the front view, and setting the depth comparison logic to GL_LEQUAL, so that the depth test passes if the incoming z value is less than or equal to the stored z value. Finally, the second image (corresponding to the back view) is rendered by culling the front polygons, which face toward the front view, and setting the depth comparison logic to GL_GREATER, so that the depth test passes if the incoming z value is greater than the stored z value.
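The effect of the two depth comparisons can be emulated for a single pixel. The sketch below is a software stand-in, not OpenGL code: the dictionary keys merely reuse the OpenGL constant names as labels, and clearing the stored depth to 1.0 for GL_LEQUAL and to 0.0 for GL_GREATER is an assumption (the text does not say how the depth buffer is cleared, but with GL_GREATER a buffer cleared to the far value 1.0 would reject every fragment). The helper names and the one-dimensional sphere layout are hypothetical.

```python
import operator

# Software stand-ins for the two depth comparison functions named in the text.
DEPTH_FUNCS = {"GL_LEQUAL": operator.le, "GL_GREATER": operator.gt}

# Assumption: the depth buffer is cleared to the value every fragment beats.
CLEAR_DEPTH = {"GL_LEQUAL": 1.0, "GL_GREATER": 0.0}


def ray_fragments(spheres, near, far, face):
    """Fragments produced along one viewing ray through sphere centres.

    spheres: list of (z_center, radius) along the view axis, with z growing
             away from the front viewer.
    face:    "front" keeps front-facing surfaces (back faces culled),
             "back" keeps back-facing surfaces (front faces culled).
    Returns (normalized depth in [0, 1], atom_id) pairs in draw order.
    """
    fragments = []
    for atom_id, (zc, r) in enumerate(spheres):
        z = zc - r if face == "front" else zc + r
        fragments.append(((z - near) / (far - near), atom_id))
    return fragments


def resolve_pixel(fragments, func_name):
    """Apply the depth test to each fragment and return the surviving id."""
    passes = DEPTH_FUNCS[func_name]
    stored_depth, stored_id = CLEAR_DEPTH[func_name], None
    for depth, atom_id in fragments:
        if passes(depth, stored_depth):
            stored_depth, stored_id = depth, atom_id
    return stored_id
```

With three spheres lined up along the ray, the front pass keeps the first sphere and the back pass keeps the last one, so a single viewing matrix yields both images of the pair.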

Figure 6.4 lists the pseudo code of our algorithm.

Fig. 6.4 Pseudo code of our algorithm

6.3.3 Improvements

The above algorithm can quickly and successfully classify most surface atoms of a macromolecule. Its main limitation is that it may miss concavities: if the EHS of an atom contributes to the molecule’s SAS but is not visible from any of the six faces of the viewing volume, the atom will not be properly classified. The algorithm, however, can easily be improved by adding more viewing planes. For instance, we can sample along the four diagonals of the above bounding box to add 8 more viewing directions and construct viewing planes onto which the atoms’ EHSs are rendered (Fig. 6.5). Furthermore, we find that a higher resolution of the viewing plane can also improve the classification. The experiments in the next section show how both measures help.

Fig. 6.5 Improving the algorithm by sampling from 8 additional viewing directions
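The 14 viewing directions (6 along the box axes plus 8 along its diagonals) can be generated as in the sketch below. The construction is an assumption about one reasonable way to do it: each of the four box diagonals contributes two opposite directions, and `view_basis` returns two unit vectors that span a viewing plane perpendicular to a direction. All names are hypothetical.

```python
import itertools
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def view_directions(extents=(1.0, 1.0, 1.0)):
    """6 axis directions followed by 8 diagonal directions of a box.

    `extents` are the box edge lengths along x, y and z; for a cube the
    diagonals reduce to (+-1, +-1, +-1) / sqrt(3).
    """
    axes = []
    for axis in range(3):
        for sign in (1.0, -1.0):
            d = [0.0, 0.0, 0.0]
            d[axis] = sign
            axes.append(tuple(d))
    diagonals = [
        normalize((sx * extents[0], sy * extents[1], sz * extents[2]))
        for sx, sy, sz in itertools.product((1.0, -1.0), repeat=3)
    ]
    return axes + diagonals


def view_basis(direction):
    """Two perpendicular unit vectors spanning the plane normal to direction."""
    d = normalize(direction)
    # Pick a helper vector that cannot be parallel to d.
    helper = (1.0, 0.0, 0.0) if abs(d[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalize(cross(helper, d))
    v = cross(d, u)
    return u, v
```

Each direction, together with its basis, defines one orthographic view; the classification then proceeds exactly as for the six faces, with the flag list accumulated over all 14 images.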

6.4 Experimental Results and Discussions

Several macromolecules from the Protein Data Bank (PDB) [38] were tested at a resolution of 1792 × 1344 in true color mode (32-bit mode). Figure 6.6 shows the contents of the color buffer when performing the test with a triose-phosphate isomerase (1TIM). Table 6.1 lists the test results for a dihydrofolate reductase (1RA2), a thermolysin (7TLN), and a triose-phosphate isomerase (1TIM). The tests were performed at different resolutions of the viewing plane (e.g., 100 × 100, 400 × 400, 800 × 800, 1000 × 1000, and 1182 × 1182) and with different configurations of viewing planes (e.g., 6 viewing planes and 14 viewing planes). For comparison, the experimental data of the SAS approach, selected from [26], are listed in Table 6.2. Those experiments were performed on an SGI Indigo with an R4400 processor.

Fig. 6.6 Color buffer contents when testing with triose-phosphate isomerase (1TIM)

Table 6.1 Experimental results with several macromolecules from Protein Data Bank
Table 6.2 Experimental data of the SAS approach (selected from [26])

From Table 6.1, we can clearly see that the number of classified surface atoms increases with both the number of viewing planes and the rendering resolution. However, while the accuracy of the classification improves almost steadily as more viewing planes are sampled, the number of classified surface atoms increases nonlinearly with the rendering resolution. For the three test macromolecules, the numbers of classified surface atoms differ only slightly between the resolutions of 1000 × 1000 and 1182 × 1182.

Theoretically, there may be an “accurate” or “exact” number of surface atoms for a macromolecular structure, and there may exist a “clear” borderline between surface atoms and interior atoms. However, to our knowledge, no theoretical solution currently exists for calculating the “accurate” or “exact” number of surface atoms. Finding this “accurate” or “exact” number and/or the “clear” borderline numerically is challenging as well. In fact, the accuracy of the SAS approach [26] depends on the user-specified minimum threshold value for the atomic SAS area and on the precision of the atomic SAS area calculation. The accuracy of our graphics hardware-based approach, on the other hand, depends on both the viewing plane setting and the rendering resolution. Still, we think that numerical solutions are worth trying when “accurate” theoretical solutions are not available. We also believe that the numbers of surface atoms classified by our approach tend to converge as the number of viewing directions and the resolution increase. This, again, is an interesting yet difficult research topic.

6.5 Conclusions

This chapter presents a fast and easy-to-implement algorithm, based on the color buffer and z-buffer, for distinguishing surface atoms of macromolecules from interior atoms. The algorithm can easily be incorporated into visualization applications for macromolecules as a preprocessing step that removes interior atoms from the macromolecular structure. The resulting simplified structure reduces the time required to display and manipulate macromolecules.

Unlike existing methods for identifying surface atoms of macromolecules, which are mainly based on geometric computations performed on a general-purpose CPU, our approach takes advantage of widely available graphics hardware, and most of the computations are carried out by the graphics processing unit (GPU). As our algorithm is based on the color buffer and z-buffer, its complexity is independent of the molecule complexity but dependent on the rendering resolution and the viewing plane setting.

With the computational power of graphics hardware growing faster than that of general-purpose CPUs, which follows Moore’s law [34], we believe that GPU-based algorithms for biochemical tasks will be very promising in the future.