Solving Molecular Distance Geometry Problems Using a Continuous Optimization Approach

Lima, Rodrigo S.; Martínez, J. M.

doi:10.1007/978-1-4614-5128-0_12

Rodrigo S. Lima⁵ &
J. M. Martínez⁶

1995 Accesses
1 Citations

Abstract

The molecular distance geometry problem consists in finding the positions in ${\mathbb{R}}^{3}$ of atoms of a molecule, given some inter-atomic distances. In this work we formulate this problem as a nonlinear optimization problem and solve some instances using a continuous optimization routine. For each proposed experiment, we compare the numerical solution obtained with the true structure. This comparison is performed by solving a Procrustes problem.

Access provided by Autonomous University of Puebla. Download chapter PDF

Modeling the Molecular Distance Geometry Problem Using Dihedral Angles

Unassigned distance geometry and molecular conformation problems

Article 15 April 2021

Recent advances on the interval distance geometry problem

Article 03 January 2017

Keywords

1 Introduction

In this work we propose and solve some computational experiments involving instances of the molecular distance geometry problem [11]. We employ a continuous optimization software to find numerical solutions to the problem. Our objective is to reconstruct three-dimensional structures of proteins using only the distances between their atoms. To attain this goal, we need to determine a set of n points $\{{x}^{1},{x}^{2},\ldots ,{x}^{n}\} \subset{\mathbb{R}}^{3}$ such that $\|{x}^{i} - {x}^{j}\| =\hat{ {d}}_{ij}$, where $\hat{{d}}_{ij}$ is the Euclidean distance between the atoms i and j. We can formulate this task as a continuous optimization problem as follows:

$$\begin{array}{rcl} \mbox{ minimize}\ \,& & \sum\limits_{i,j}\ {(\|{x}^{i} - {x}^{j}\| -\hat{ {d}}_{ ij})}^{2}, \\ \mbox{ subject to}\ \,& & {x}^{i} \in{\mathbb{R}}^{3},i = 1,2,\ldots ,n.\end{array}$$

(12.1)

The variables in Eq. (12.1) are the coordinates of points ${x}^{i} \in{\mathbb{R}}^{3}$, and the objective function is not differentiable when ${x}^{i} = {x}^{j}$, for some i, j. However, as the distances between atoms are always positive real numbers, we can apply a minimization algorithm that uses first derivatives to solve the problem (12.1). Then, if $\hat{{d}}_{ij} > 0$ for all i, j, the local minimizers of Eq. (12.1) are configurations that do not contain coincident points. This result was proved by Jan de Leeuw in [4]. In the computational experiments, we solve some instances of the molecular distance geometry problem using GENCAN [2]. This routine, available atwww.ime.usp.br/~egbirgin/tangois able to find approximate solutions to minimization problems with box constraints. For each considered instance, the numerical solutions obtained by GENCAN were compared to the true structure of the analyzed protein. The comparison was carried out as follows: given the true configuration of the protein and a numerical solution, we determine a transformation that superimposes both structures in some optimal manner. This problem is known as the Procrustes problem [6, 8].

The Procrustes problem consists in finding an orthogonal matrix $Q \in{\mathbb{R}}^{3\times 3}$ that minimizes the function

$$g(Q) =\| {M}_{0} - {M}_{1}{Q\|}_{F},$$

(12.2)

where M ₀ and M ₁ are matrices in ${\mathbb{R}}^{n\times 3}$ and $\|{.\|}_{F}$ is the Frobenius norm. The orthogonal matrix Q that minimizes Eq. (12.2) has a closed-form expression. In the book of Golub and Van Loan [5], a singular value decomposition is employed to determine Q. This way of solving Eq. (12.2) does not ensure that the orthogonal matrix Q is a rotation matrix. There are cases where Q is the composition of a rotation and of a reflection. Kearsley in [9] uses unitary quaternions to find Q, and, as a result, he always obtains a rotation matrix. More references about quaternions and rotations can be found in [3, 7, 10, 12]. We desire to investigate if the numerical solutions obtained by GENCAN to the problem (12.1) differ from the original configurations by transformations involving a pure rotation or a rotation followed by a reflection. For this, we solve the Procrustes problem applying both proposed techniques: singular value decomposition and unitary quaternions.

2 Numerical Experiments

We selected some proteins from protein data bank [1] and we considered only the alpha carbon coordinates (C _α) of each structure. The selected proteins and the number of atoms (${n}_{{C}_{\alpha }}$) are indicated in Table 12.1. We initially propose three sets of tests with these proteins; the details are discussed below. All experiments in this work have been carried out on a single core of an Intel Core 2 CPU 2.4GHz with 2GB RAM, running MAC OS X 10.5.

Table 12.1 Proteins used in the computational experiments

Full size table

2.1 Solving Problems Using All Distances Between Atoms

In this set of experiments we suppose that all the distances $\hat{{d}}_{ij}$ between the atoms are known. With the first five proteins of Table 12.1, we solve the problem (12.1) using GENCAN twenty times whereas the problem with 1lmO protein was solved two hundred times, where each run corresponds to a different starting point. Table 12.2 shows the results of runs for which GENCAN reached the lowest objective function value. The columns in this table have the following meaning: ndist is the total number of distances between pairs of atoms, nvar is the number of variables, iter and evalf are, respectively, the total number of iterations and evaluations of the objective function, f(x ^∗) is the final objective function value, and t(s) is the CPU time in seconds. The column pure rot. shows the quantity of runs in which the optimization routine obtained a solution that differs from the true structure by a transformation involving a pure rotation. The column rot. + ref. indicates the total of rounds in which the numerical solution differs from the true structure by a transformation involving a rotation followed by a reflection. The stopping criterion of GENCAN in all tests required that the gradient norm had to be smaller than 10^− 4.

Table 12.2 First set of experiments: all the distances are known

Full size table

The final values of the objective function show that GENCAN finds configurations of points in ${\mathbb{R}}^{3}$ that fit all the distances. However, we noted in six tests related to the protein 6PAX that the routine obtains configurations with $f({x}^{{_\ast}}) \approx1{0}^{3}$ and gradient norm smaller than 10^− 4. These configurations are certainly local minimizers.

We chose the protein 1OOH to illustrate a test where GENCAN obtains a solution that differs of the true configuration by a transformation involving reflection. Figure 12.1a shows the true structure of 1OOH with 126 alpha carbons. Each atom is represented by a point in ${\mathbb{R}}^{3}$ and consecutive points are joined by lines. Figure 12.1b compares the original distances between pairs of atoms ($\hat{{d}}_{ij}$ axis) to the distances obtained numerically (axis d _ij).

The analysis with the Procrustes problem is shown in Fig. 12.2. To construct this figure we use only ten first consecutive C _α atoms of 1OOH. We solve the Procrustes problem (12.2) using the two formulations discussed above. Figure 12.2a shows the optimal superimposition of the true and numerical structures. The numerical solution (red points) does not appear in this image because it is superimposed by the true structure (green points). The transformation matrix obtained in this case involves a reflection and was obtained solving Eq. (12.2) with the strategy proposed by G. Golub and C. Van Loan. Figure 12.2b shows the superimposition obtained by applying the strategy proposed by Kearsley. In this case, the orthogonal matrix describes a pure rotation. We note in Fig. 12.2b that the numerical configuration (red points) is the reflected image of the original one (green points). In this case, it is not possible to determine a rotation that superimposes both structures. The results obtained for the Procrustes problem are reported in Table 12.3.

Table 12.3 Results of procrustes problem with 1OOH protein

Full size table

2.2 Simulating Errors

In this set of experiments, we consider the first five proteins of Table 12.1 and we suppose that all distances between pairs of atoms were obtained with errors. To simulate this situation, we add to each value $\hat{{d}}_{ij}$ a random number created in the interval $[-\rho ,\rho ]$, with $\vert \rho \vert \leq1$. Then, for each protein and each fixed value ρ, we solve Eq. (12.1) twenty times with a different starting point in each run. All tables below show only the results corresponding to the tests in which GENCAN reached the lowest final value of objective function. Table 12.4 indicates the final values of objective function attained by GENCAN and Table 12.5 shows the performance of the routine for solving each instance in terms of iterations, evaluations of the objective function, and CPU time. According to Table 12.4, when we increase the parameter ρ, the final values of the objective function also increase by a factor of 10². The performance of GENCAN in these problems was quite similar to the results obtained in correspondence with problems considered in the first set of experiments.

Table 12.4 Final values of objective function reached by GENCAN

Full size table

Table 12.5 Performance of GENCAN in tests with errors

Full size table

We built some figures for an experiment related to the protein 3RAT, where we fixed ρ = 1 (fifth line and last column of Table 12.4). Figure 12.3a shows the true structure of 3RAT with 124 alpha carbons, and Fig. 12.3b indicates the result of superimposing both structures (true and numerical) by a transformation involving a pure rotation. For building this figure, we use only the first ten atoms of structures: true (green points) and numerical (red points). In this case, we obtained the same orthogonal matrix by solving the Procrustes problem with the two approaches described above:

$$Q = \left (\begin{array}{rrr} - 0.728719& - 0.651336& 0.211497\\ 0.141985 & - 0.44583 & - 0.883785 \\ 0.669932& - 0.614001& 0.417364 \end{array} \right ),\quad g(Q) = 2.02778.$$

Figure 12.4 shows a graph where the x-and y-axes represent, respectively, the perturbed distances and the distances obtained by solving Eq. (12.1) with GENCAN. Although the final value of objective function is not small, the points are close to the line y = x.

2.3 Solving Problems Using a Subset of Interatomic Distances

In these experiments, we use the same proteins reported in Table 12.1. However, we try here to recover the true structure using only distances not greater than a fixed parameter d _fix. Assuming that the distances are known exactly, we varied the parameter value d _fix and we analyzed the obtained results using the Procrustes technique. To each protein and each value d _fix, we solve the problem (12.1) fifty times with a multistart strategy: a different initial point was used in each run. In all the cases, GENCAN stopped when the gradient norm was lower than 10^− 4.

Tables 12.6 and 12.7 show only information corresponding to the tests with the lowest value of the objective function attained by GENCAN. The total number of distances between pairs of atoms (ndist), the number of distances used to solve the problem (12.1) (nd), and the final value of the objective function (f(x ^∗)) are reported in Table 12.6. The performance of routine is indicated in Table 12.7. These results show that GENCAN can find configurations of points that fit all distances between atoms using less than 36 % of the known distances.

Table 12.6 Final values of objective function attained by GENCAN

Full size table

Table 12.7 Performance of GENCAN in the resolution of problems

Full size table

We illustrate a test with the protein 6PAX where we attempt to recover the true structure considering only distances not greater than 10 Å. Figure 12.5a shows the true structure with 133 alpha carbons and Fig. 12.5b compares the original distances between atoms ($\hat{{d}}_{ij}$) with the distances in the numerical solution (d _ij). We can see in the graph that the points are concentrated around the line y = x. To create Fig. 12.6a, we solved the Procrustes problem using a singular value decomposition and we obtained a transformation involving a reflection. In the case of Fig. 12.6b, we applied the quaternion approach and, as a result, we obtained a pure rotation matrix. These results are shown in Table 12.8.

Table 12.8 Results of procrustes problem with 6PAX protein

Full size table

According to the experiments, we can conclude that it is possible to use a continuous optimization routine to recover a 3D structure of a protein using only a subset of known distances between pairs of atoms. In particular, if we provide to GENCAN a reasonable starting point, the routine solves the problem very quickly.

2.4 Comparing GENCAN and MD-jeep

To finish this work, we compare the performances of GENCAN to the ones of a software tool named MD-jeep. MD-jeep was developed specifically to solve molecular distance geometry problems using combinatorial optimization techniques [13]. This software was written in C by Mucherino et al., and it is freely distributed atwww.antoniomucherino.it/en/mdjeep.php

In order to run the experiments, we used eight instances obtained from protein conformations downloaded from Protein Data Bank. We extracted the coordinates of atoms N, C _α, and C from each structure. For each protein, only distances not greater than d _fix = 6 Åwere considered as input for the routines. To solve the problems with GENCAN, we employ a multistart strategy: we perform runs until the routine provides a solution that differs from the original structure by a transformation involving a pure rotation. GENCAN stopped in all tests with the gradient norm smaller than 10^− 4. The solutions of MD-jeep listed in Table 12.9 differ from original structure by a linear transformation involving a rotation matrix. The columns of Table 12.9 have the following meaning: nat is the number of atoms N, C _α, and C present in each protein, ndist is the total number of distances between pairs of atoms, nd is the number of distances not greater than d _fix = 6 Å, and t(s) is the CPU time, in seconds.

Table 12.9 Comparing GENCAN and MDJEEP

Full size table

After solving the problems with both packages, we analyze the quality of the solutions obtained using the error formula

$${E}_{\mbox{ sol}} = \frac{1} {{n}_{d}}\sum\limits_{i,j}\ \frac{\vert \hat{{d}}_{ij} - {d}_{ij}\vert } {{d}_{ij}} ,$$

(12.3)

where $\hat{{d}}_{ij}$ is the original distance between the atoms i, j, d _ij is the final distance between the points ${x}^{i},{x}^{j} \in{\mathbb{R}}^{3}$, and n _d is the number of distances used in each test. We employed a Fortran procedure to evaluate the numerical solutions using the formula (12.3). The results are shown in the columns E _sol of Table 12.9. According to Table 12.9, we can see that both routines attain good solutions to the problems. GENCAN obtains smaller values to the error (12.3), but MD-jeep is much faster.

References

Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E.: The protein data bank. Nucleic Acids Res. 28, 235–242 (2000)
Article Google Scholar
Birgin, E.G., Martínez, J.M.: Large-scale active-set box-constrained optimization method with spectral projected gradients. Comput. Optim. Appl. 23, 101–125 (2002)
Article MathSciNet MATH Google Scholar
Curtis, M.L.: Matrix Groups. Springer, New York (1984)
Book Google Scholar
De Leeuw, J.: Differentiability of Kruskal’s Stress at a Local Minimum. Psychometrika 49, 111–113 (1984)
Article MathSciNet Google Scholar
Golub, G.H., Van Loan, C.F.: Matrix Computations. The Johns Hopkins University Press (1996)
Google Scholar
Gower, J.C., Dijksterhuis, G.B.: Procrustes Problems. Oxford University Press (2004)
Google Scholar
Horn, B.K.P.: Closed-form solution of absolute orientation using unit quaternions. J. Opt. Soc. Am. 4, 629–642 (1987)
Article Google Scholar
Kabsch, W.: A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallogr. A34, 827–828 (1978)
Google Scholar
Kearsley, S.K.: On the orthogonal transformation used for structural comparisons Acta Crystallogr. A45, 208–210 (1989)
Google Scholar
Kuipers, J.B.: Quaternions and Rotation Sequences: A Primer with Applications to Orbits, Aerospace and Virtual Reality. Princeton University Press, New Jersey (2002)
Google Scholar
Liberti, L., Lavor, C., Mucherino, A., Maculan, N.: Distance geometry methods: from continuous to discrete. Int. Trans. Oper. Res. 18, 33–51 (2010)
Article MathSciNet Google Scholar
Lima, R.S.: Representation of rotations: advantages and disadvantages in theory and practice. Master Thesis, Department of Applied Mathematics, IMECC, UniCamp (2007)
Google Scholar
Mucherino, A., Liberti, L., Lavor, C.: MD-jeep: an implementation of a branch and prune algorithm for distance geometry problems. In: Fukuda, K., et al. (eds.) Proceedings of the Third International Congress on Mathematical Software (ICMS10), Lectures Notes in Computer Science, vol. 6327, pp. 186–197. Kobe, Japan (2010)
Google Scholar

Download references

Acknowledgements

The authors are thankful to PRONEX-Optimization (PRONEX - CNPq / FAPERJ E-26 / 171.164/2003 - APQ1), FAPESP (Grant 06/53768-0), and CNPq.

Author information

Authors and Affiliations

Department of Mathematics and Computation, ICE-UNIFEI, Federal University of Itajubá, 37500-903, Itajubá, MG, Brazil
Rodrigo S. Lima
Department of Applied Mathematics, IMECC-UNICAMP, University of Campinas, 13081-970, Campinas, SP, Brazil
J. M. Martínez

Authors

Rodrigo S. Lima
View author publications
You can also search for this author in PubMed Google Scholar
J. M. Martínez
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Rodrigo S. Lima .

Editor information

Editors and Affiliations

IRISA, University of Rennes 1, avenue du General Leclerc, Rennes, 35042, France
Antonio Mucherino
, Dept of Applied Maths (IMECC-UNICAMP), State University of Campinas, Campinas, 13081, Brazil
Carlile Lavor
, LIX, Ecole Polytechnique, Palaiseau, 91128, France
Leo Liberti
Instituto Alberto Luiz Coimbra de, Pos-Graduacao e Pesquisa de, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
Nelson Maculan

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Lima, R.S., Martínez, J.M. (2013). Solving Molecular Distance Geometry Problems Using a Continuous Optimization Approach. In: Mucherino, A., Lavor, C., Liberti, L., Maculan, N. (eds) Distance Geometry. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5128-0_12

Download citation

DOI: https://doi.org/10.1007/978-1-4614-5128-0_12
Published: 03 November 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-5127-3
Online ISBN: 978-1-4614-5128-0
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics

Solving Molecular Distance Geometry Problems Using a Continuous Optimization Approach

Abstract

Similar content being viewed by others

Modeling the Molecular Distance Geometry Problem Using Dihedral Angles

Unassigned distance geometry and molecular conformation problems

Recent advances on the interval distance geometry problem

Keywords

1 Introduction

2 Numerical Experiments