Abstract
The molecular distance geometry problem consists in finding the positions in \({\mathbb{R}}^{3}\) of atoms of a molecule, given some inter-atomic distances. In this work we formulate this problem as a nonlinear optimization problem and solve some instances using a continuous optimization routine. For each proposed experiment, we compare the numerical solution obtained with the true structure. This comparison is performed by solving a Procrustes problem.
Access provided by Autonomous University of Puebla. Download chapter PDF
Similar content being viewed by others
Keywords
1 Introduction
In this work we propose and solve some computational experiments involving instances of the molecular distance geometry problem [11]. We employ a continuous optimization software to find numerical solutions to the problem. Our objective is to reconstruct three-dimensional structures of proteins using only the distances between their atoms. To attain this goal, we need to determine a set of n points \(\{{x}^{1},{x}^{2},\ldots ,{x}^{n}\} \subset{\mathbb{R}}^{3}\) such that \(\|{x}^{i} - {x}^{j}\| =\hat{ {d}}_{ij}\), where \(\hat{{d}}_{ij}\) is the Euclidean distance between the atoms i and j. We can formulate this task as a continuous optimization problem as follows:
The variables in Eq. (12.1) are the coordinates of points \({x}^{i} \in{\mathbb{R}}^{3}\), and the objective function is not differentiable when \({x}^{i} = {x}^{j}\), for some i, j. However, as the distances between atoms are always positive real numbers, we can apply a minimization algorithm that uses first derivatives to solve the problem (12.1). Then, if \(\hat{{d}}_{ij} > 0\) for all i, j, the local minimizers of Eq. (12.1) are configurations that do not contain coincident points. This result was proved by Jan de Leeuw in [4]. In the computational experiments, we solve some instances of the molecular distance geometry problem using GENCAN [2]. This routine, available atwww.ime.usp.br/~egbirgin/tangois able to find approximate solutions to minimization problems with box constraints. For each considered instance, the numerical solutions obtained by GENCAN were compared to the true structure of the analyzed protein. The comparison was carried out as follows: given the true configuration of the protein and a numerical solution, we determine a transformation that superimposes both structures in some optimal manner. This problem is known as the Procrustes problem [6, 8].
The Procrustes problem consists in finding an orthogonal matrix \(Q \in{\mathbb{R}}^{3\times 3}\) that minimizes the function
where M 0 and M 1 are matrices in \({\mathbb{R}}^{n\times 3}\) and \(\|{.\|}_{F}\) is the Frobenius norm. The orthogonal matrix Q that minimizes Eq. (12.2) has a closed-form expression. In the book of Golub and Van Loan [5], a singular value decomposition is employed to determine Q. This way of solving Eq. (12.2) does not ensure that the orthogonal matrix Q is a rotation matrix. There are cases where Q is the composition of a rotation and of a reflection. Kearsley in [9] uses unitary quaternions to find Q, and, as a result, he always obtains a rotation matrix. More references about quaternions and rotations can be found in [3, 7, 10, 12]. We desire to investigate if the numerical solutions obtained by GENCAN to the problem (12.1) differ from the original configurations by transformations involving a pure rotation or a rotation followed by a reflection. For this, we solve the Procrustes problem applying both proposed techniques: singular value decomposition and unitary quaternions.
2 Numerical Experiments
We selected some proteins from protein data bank [1] and we considered only the alpha carbon coordinates (C α) of each structure. The selected proteins and the number of atoms (\({n}_{{C}_{\alpha }}\)) are indicated in Table 12.1. We initially propose three sets of tests with these proteins; the details are discussed below. All experiments in this work have been carried out on a single core of an Intel Core 2 CPU 2.4GHz with 2GB RAM, running MAC OS X 10.5.
2.1 Solving Problems Using All Distances Between Atoms
In this set of experiments we suppose that all the distances \(\hat{{d}}_{ij}\) between the atoms are known. With the first five proteins of Table 12.1, we solve the problem (12.1) using GENCAN twenty times whereas the problem with 1lmO protein was solved two hundred times, where each run corresponds to a different starting point. Table 12.2 shows the results of runs for which GENCAN reached the lowest objective function value. The columns in this table have the following meaning: ndist is the total number of distances between pairs of atoms, nvar is the number of variables, iter and evalf are, respectively, the total number of iterations and evaluations of the objective function, f(x ∗ ) is the final objective function value, and t(s) is the CPU time in seconds. The column pure rot. shows the quantity of runs in which the optimization routine obtained a solution that differs from the true structure by a transformation involving a pure rotation. The column rot. + ref. indicates the total of rounds in which the numerical solution differs from the true structure by a transformation involving a rotation followed by a reflection. The stopping criterion of GENCAN in all tests required that the gradient norm had to be smaller than 10 − 4.
The final values of the objective function show that GENCAN finds configurations of points in \({\mathbb{R}}^{3}\) that fit all the distances. However, we noted in six tests related to the protein 6PAX that the routine obtains configurations with \(f({x}^{{_\ast}}) \approx1{0}^{3}\) and gradient norm smaller than 10 − 4. These configurations are certainly local minimizers.
We chose the protein 1OOH to illustrate a test where GENCAN obtains a solution that differs of the true configuration by a transformation involving reflection. Figure 12.1a shows the true structure of 1OOH with 126 alpha carbons. Each atom is represented by a point in \({\mathbb{R}}^{3}\) and consecutive points are joined by lines. Figure 12.1b compares the original distances between pairs of atoms (\(\hat{{d}}_{ij}\) axis) to the distances obtained numerically (axis d ij ).
The analysis with the Procrustes problem is shown in Fig. 12.2. To construct this figure we use only ten first consecutive C α atoms of 1OOH. We solve the Procrustes problem (12.2) using the two formulations discussed above. Figure 12.2a shows the optimal superimposition of the true and numerical structures. The numerical solution (red points) does not appear in this image because it is superimposed by the true structure (green points). The transformation matrix obtained in this case involves a reflection and was obtained solving Eq. (12.2) with the strategy proposed by G. Golub and C. Van Loan. Figure 12.2b shows the superimposition obtained by applying the strategy proposed by Kearsley. In this case, the orthogonal matrix describes a pure rotation. We note in Fig. 12.2b that the numerical configuration (red points) is the reflected image of the original one (green points). In this case, it is not possible to determine a rotation that superimposes both structures. The results obtained for the Procrustes problem are reported in Table 12.3.
2.2 Simulating Errors
In this set of experiments, we consider the first five proteins of Table 12.1 and we suppose that all distances between pairs of atoms were obtained with errors. To simulate this situation, we add to each value \(\hat{{d}}_{ij}\) a random number created in the interval \([-\rho ,\rho ]\), with \(\vert \rho \vert \leq1\). Then, for each protein and each fixed value ρ, we solve Eq. (12.1) twenty times with a different starting point in each run. All tables below show only the results corresponding to the tests in which GENCAN reached the lowest final value of objective function. Table 12.4 indicates the final values of objective function attained by GENCAN and Table 12.5 shows the performance of the routine for solving each instance in terms of iterations, evaluations of the objective function, and CPU time. According to Table 12.4, when we increase the parameter ρ, the final values of the objective function also increase by a factor of 102. The performance of GENCAN in these problems was quite similar to the results obtained in correspondence with problems considered in the first set of experiments.
We built some figures for an experiment related to the protein 3RAT, where we fixed ρ = 1 (fifth line and last column of Table 12.4). Figure 12.3a shows the true structure of 3RAT with 124 alpha carbons, and Fig. 12.3b indicates the result of superimposing both structures (true and numerical) by a transformation involving a pure rotation. For building this figure, we use only the first ten atoms of structures: true (green points) and numerical (red points). In this case, we obtained the same orthogonal matrix by solving the Procrustes problem with the two approaches described above:
Figure 12.4 shows a graph where the x-and y-axes represent, respectively, the perturbed distances and the distances obtained by solving Eq. (12.1) with GENCAN. Although the final value of objective function is not small, the points are close to the line y = x.
2.3 Solving Problems Using a Subset of Interatomic Distances
In these experiments, we use the same proteins reported in Table 12.1. However, we try here to recover the true structure using only distances not greater than a fixed parameter d fix. Assuming that the distances are known exactly, we varied the parameter value d fix and we analyzed the obtained results using the Procrustes technique. To each protein and each value d fix, we solve the problem (12.1) fifty times with a multistart strategy: a different initial point was used in each run. In all the cases, GENCAN stopped when the gradient norm was lower than 10 − 4.
Tables 12.6 and 12.7 show only information corresponding to the tests with the lowest value of the objective function attained by GENCAN. The total number of distances between pairs of atoms (ndist), the number of distances used to solve the problem (12.1) (nd), and the final value of the objective function (f(x ∗ )) are reported in Table 12.6. The performance of routine is indicated in Table 12.7. These results show that GENCAN can find configurations of points that fit all distances between atoms using less than 36 % of the known distances.
We illustrate a test with the protein 6PAX where we attempt to recover the true structure considering only distances not greater than 10 Å. Figure 12.5a shows the true structure with 133 alpha carbons and Fig. 12.5b compares the original distances between atoms (\(\hat{{d}}_{ij}\)) with the distances in the numerical solution (d ij ). We can see in the graph that the points are concentrated around the line y = x. To create Fig. 12.6a, we solved the Procrustes problem using a singular value decomposition and we obtained a transformation involving a reflection. In the case of Fig. 12.6b, we applied the quaternion approach and, as a result, we obtained a pure rotation matrix. These results are shown in Table 12.8.
According to the experiments, we can conclude that it is possible to use a continuous optimization routine to recover a 3D structure of a protein using only a subset of known distances between pairs of atoms. In particular, if we provide to GENCAN a reasonable starting point, the routine solves the problem very quickly.
2.4 Comparing GENCAN and MD-jeep
To finish this work, we compare the performances of GENCAN to the ones of a software tool named MD-jeep. MD-jeep was developed specifically to solve molecular distance geometry problems using combinatorial optimization techniques [13]. This software was written in C by Mucherino et al., and it is freely distributed atwww.antoniomucherino.it/en/mdjeep.php
In order to run the experiments, we used eight instances obtained from protein conformations downloaded from Protein Data Bank. We extracted the coordinates of atoms N, C α, and C from each structure. For each protein, only distances not greater than d fix = 6 Åwere considered as input for the routines. To solve the problems with GENCAN, we employ a multistart strategy: we perform runs until the routine provides a solution that differs from the original structure by a transformation involving a pure rotation. GENCAN stopped in all tests with the gradient norm smaller than 10 − 4. The solutions of MD-jeep listed in Table 12.9 differ from original structure by a linear transformation involving a rotation matrix. The columns of Table 12.9 have the following meaning: nat is the number of atoms N, C α, and C present in each protein, ndist is the total number of distances between pairs of atoms, nd is the number of distances not greater than d fix = 6 Å, and t(s) is the CPU time, in seconds.
After solving the problems with both packages, we analyze the quality of the solutions obtained using the error formula
where \(\hat{{d}}_{ij}\) is the original distance between the atoms i, j, d ij is the final distance between the points \({x}^{i},{x}^{j} \in{\mathbb{R}}^{3}\), and n d is the number of distances used in each test. We employed a Fortran procedure to evaluate the numerical solutions using the formula (12.3). The results are shown in the columns E sol of Table 12.9. According to Table 12.9, we can see that both routines attain good solutions to the problems. GENCAN obtains smaller values to the error (12.3), but MD-jeep is much faster.
References
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E.: The protein data bank. Nucleic Acids Res. 28, 235–242 (2000)
Birgin, E.G., Martínez, J.M.: Large-scale active-set box-constrained optimization method with spectral projected gradients. Comput. Optim. Appl. 23, 101–125 (2002)
Curtis, M.L.: Matrix Groups. Springer, New York (1984)
De Leeuw, J.: Differentiability of Kruskal’s Stress at a Local Minimum. Psychometrika 49, 111–113 (1984)
Golub, G.H., Van Loan, C.F.: Matrix Computations. The Johns Hopkins University Press (1996)
Gower, J.C., Dijksterhuis, G.B.: Procrustes Problems. Oxford University Press (2004)
Horn, B.K.P.: Closed-form solution of absolute orientation using unit quaternions. J. Opt. Soc. Am. 4, 629–642 (1987)
Kabsch, W.: A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallogr. A34, 827–828 (1978)
Kearsley, S.K.: On the orthogonal transformation used for structural comparisons Acta Crystallogr. A45, 208–210 (1989)
Kuipers, J.B.: Quaternions and Rotation Sequences: A Primer with Applications to Orbits, Aerospace and Virtual Reality. Princeton University Press, New Jersey (2002)
Liberti, L., Lavor, C., Mucherino, A., Maculan, N.: Distance geometry methods: from continuous to discrete. Int. Trans. Oper. Res. 18, 33–51 (2010)
Lima, R.S.: Representation of rotations: advantages and disadvantages in theory and practice. Master Thesis, Department of Applied Mathematics, IMECC, UniCamp (2007)
Mucherino, A., Liberti, L., Lavor, C.: MD-jeep: an implementation of a branch and prune algorithm for distance geometry problems. In: Fukuda, K., et al. (eds.) Proceedings of the Third International Congress on Mathematical Software (ICMS10), Lectures Notes in Computer Science, vol. 6327, pp. 186–197. Kobe, Japan (2010)
Acknowledgements
The authors are thankful to PRONEX-Optimization (PRONEX - CNPq / FAPERJ E-26 / 171.164/2003 - APQ1), FAPESP (Grant 06/53768-0), and CNPq.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Lima, R.S., Martínez, J.M. (2013). Solving Molecular Distance Geometry Problems Using a Continuous Optimization Approach. In: Mucherino, A., Lavor, C., Liberti, L., Maculan, N. (eds) Distance Geometry. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5128-0_12
Download citation
DOI: https://doi.org/10.1007/978-1-4614-5128-0_12
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-5127-3
Online ISBN: 978-1-4614-5128-0
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)