
1 Introduction

Classical trajectory chemical dynamics simulations are widely used and powerful tools that have been applied to the study of reaction dynamics since the 1960s [1]. In contrast to variational transition state theory (VTST) and reaction path Hamiltonian methods [2], they provide much greater insight into reaction dynamics because the classical equations of motion of the atoms are numerically integrated on a potential energy surface (PES). The traditional approach constructs the surface from an analytic function obtained by fitting ab initio and/or experimental data [3]; this is practical for systems with a small number of atoms or a high degree of symmetry [4, 5]. Researchers have recently proposed additional approaches and algorithms for representing PESs. Wang and Karplus first demonstrated that trajectories can be integrated “on the fly” when the potential energy and gradient are available at each point of the numerical integration from an electronic structure theory calculation. In such a “direct dynamics” simulation, the local potential and gradient are computed directly with an electronic structure method during the numerical integration. However, with a high-level electronic structure theory, direct dynamics simulations become quite expensive. It is therefore important to use the largest possible numerical integration step size that still maintains the accuracy of the trajectory. To allow a larger integration step, Helgaker et al. employed the second derivative of the potential (the Hessian). Once the Hessian is obtained directly from an electronic structure calculation, a local approximation to the PES can be constructed from a second-order Taylor expansion and the trajectory can be integrated approximately. Because the local quadratic potential is valid only in a small region (called the “trust radius”), the equations of motion are integrated only within the trust radius. A new potential, gradient, and Hessian, calculated again at the end of the trust radius, define a new local quadratic PES on which integration of the equations of motion continues. Millam et al. used a fifth-order polynomial or a rational function to fit the potential from the potentials, gradients, and Hessians at the beginning and end of each integration step. This provides a more accurate trajectory within the trust region and allows larger integration steps. The scheme involves a predictor step, the integration on the approximate quadratic model potential, followed by a corrector step, in which the trajectory is reintegrated on the fifth-order PES fitted between the starting point and the end point within the trust radius. It is known as the Hessian-based predictor–corrector integration scheme, and several related methods have been proposed around it. Because the predictor relies on extrapolation, errors in predictor–corrector algorithms grow rapidly, so usually only about four predictions are followed by an ab initio calculation. This limits the achievable improvement in computing performance.

The success of deep learning prediction in computational chemistry has greatly expanded its range of applications. Deep learning is a machine learning algorithm, not unlike those already in use in various applications in computational chemistry, from computer-aided drug design to materials property prediction [6]. Deep learning models achieved top positions in the Tox21 toxicity prediction challenge issued by the NIH in 2014 [7]. Among its more high-profile achievements is the Merck activity prediction challenge in 2012, in which a deep neural network not only won the competition and outperformed Merck’s internal baseline model, but did so without a single chemist or biologist on the team. Machine learning (ML) models can also be used to infer quantum mechanical (QM) expectation values of molecules, based on reference calculations across chemical space [8]. Such models can speed up predictions by several orders of magnitude, as demonstrated for relevant molecular properties such as polarizabilities, electron correlation, and electronic excitations [9]. Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning, and LSTM prediction has been applied widely in different fields [10, 11].

In this chapter, we explore the idea of integrating an LSTM layer with chemistry dynamics simulations to enhance performance within the trust radius. The idea is inspired by recent developments in the use of LSTM in materials simulations and scientific software applications [12]. We employ a particular example, an H2O molecular dynamics simulation with the NWChem/VENUS (cdssim.chem.ttu.edu) package [13], to illustrate the idea. The LSTM is used to predict the energy, atomic positions, and Hessian. The results demonstrate that an LSTM-based memory model, trained on data generated by these simulations, successfully learns the key features associated with the energy, positions, and Hessian of the molecular system. For the predicted steps, the deep learning approach bypasses the explicit simulation entirely and generates predictions in excellent agreement with the results of explicit chemistry dynamics simulations. The results demonstrate that the performance of chemical computing can be enhanced with data-driven approaches such as deep learning, which improves the usability of the simulation framework by enabling real-time engagement and anytime access.

This chapter is organized as follows. Section 2 presents the idea of integrating chemistry dynamics simulations with LSTM. Section 3 describes the experimental setup and results for the H2O molecular dynamics simulation, followed by data analysis. Section 4 presents the conclusions and lays out future work.

2 Methodology

2.1 Prediction–Correction Algorithm

In chemistry dynamics simulations, the calculation of the Hessian consumes most of the CPU time, because the Hessian is the matrix of second derivatives of the potential energy with respect to the atomic positions. Hessian updating is a technique frequently used to replace electronic structure calculations of the Hessian in optimization and dynamics simulations. Existing generally applicable Hessian-update schemes, for example, the symmetric rank one (SR1) scheme, Powell’s symmetrization of Broyden’s (PSB) method, the scheme of Bofill, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) scheme, the scheme of Farkas and Schlegel, and other Hessian-update schemes, are based on Eq. (1):

$$ H\left({X}_{k+1}\right)\left({X}_{k+1}-{X}_k\right)=G\left({X}_{k+1}\right)-G\left({X}_k\right) $$
(1)

where G(X) and H(X) denote, respectively, the gradient and Hessian of the potential energy at point X. Researchers have employed Hessian-update methods to build Hessian-based predictor–corrector integration methods for calculating atomic trajectories, thereby reducing the time spent on ab initio and Hessian calculations.
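As an illustration, the sketch below applies one such update, the BFGS scheme, which satisfies Eq. (1) by construction. It is a minimal numpy example; the function and variable names (`bfgs_hessian_update`, `H`, `x_old`, and so on) are chosen for this sketch and are not taken from any simulation package.

```python
import numpy as np

def bfgs_hessian_update(H, x_old, x_new, g_old, g_new):
    """One BFGS update of the Hessian so that H_new @ s = y (the secant condition, Eq. (1)).

    H            : (n, n) current Hessian approximation
    x_old, x_new : (n,) geometries at steps k and k+1
    g_old, g_new : (n,) gradients at steps k and k+1
    """
    s = x_new - x_old                 # displacement between the two points
    y = g_new - g_old                 # change of the gradient
    Hs = H @ s
    # rank-two correction; in practice small denominators should be guarded against
    return H + np.outer(y, y) / (y @ s) - np.outer(Hs, Hs) / (s @ Hs)
```

One can verify directly that the returned matrix multiplied by s reproduces y, which is exactly the condition of Eq. (1).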

As illustrated in Fig. 1, in each time step of the integration method the prediction is used to determine the direction of the trajectory; the ab initio potential energy, ab initio gradient, and ab initio or updated Hessian are then computed at the end point X i,p of the predicted trajectory. The potential information calculated at the end of the predicted trajectory is used together with the potential energy information at the point X i−1,p near the trajectory starting point X i−1 of this time step (the end point of the corrected trajectory of the previous time step) to interpolate a highly accurate local PES. This highly accurate PES is used in the correction phase of the time step to recompute a more accurate trajectory.

Fig. 1
figure 1

During the ith step, the algorithm first predicts the trajectory from X i–1 to X i,p using the potential approximated by the quadratic Taylor expansion about X i–1,p. It then performs an electronic structure calculation of the potential energy information at X i,p and reintegrates the trajectory from X i–1 to X i using the potential interpolated from the ab initio potential information at X i–1,p and X i,p

In each time step, to obtain an accurate predicted trajectory, the prediction utilizes the Hessian in addition to the potential energy and its gradient. Assuming the current time step is the ith time step, the potential energy information needed during the prediction to integrate the trajectory is obtained by the quadratic expansion.

$$ E(X)=E\left({X}_{i-1,p}\right)+G\left({X}_{i-1,p}\right)\left(X-{X}_{i-1,p}\right)\\ +\frac{1}{2}{\left(X-{X}_{i-1,p}\right)}^{T}H\left({X}_{i-1,p}\right)\left(X-{X}_{i-1,p}\right),\quad i>2 $$
(2)

The subscript p denotes a point on the predicted trajectory. X i–1,p is the end point of the predicted trajectory of the (i–1)th time step, at which the ab initio potential energy E(X i–1,p), ab initio gradient G(X i–1,p), and ab initio or updated Hessian H(X i–1,p) have been computed; the expansion is valid in a region within a trust radius of X i–1,p.

Taking X i–1,p as the current location, the next part shows how to calculate the potential energy at the next location X. The potential energy E and gradient G at X i–1,p can be calculated from the known position. For example, there are eight atoms (N = 8) in F + CH3OOH; the gradient and position vectors then have 3 × N dimensions, and the Hessian matrix of the reaction system has 3N × 3N elements. Therefore, most of the cost of evaluating Eq. (2) lies in computing H(X i–1,p). The biggest challenge is to choose an approach that accelerates the calculation of H(X i–1,p) from the position and other information at the current location without enlarging the systematic error.
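To make the role of Eq. (2) concrete, the following minimal sketch evaluates the local quadratic model for the energy, and its model gradient, at a trial geometry; the names and array shapes are illustrative assumptions, not code from VENUS or NWChem.

```python
import numpy as np

def quadratic_energy(x, x_ref, e_ref, g_ref, h_ref):
    """Eq. (2): local quadratic expansion of the potential about x_ref.

    x, x_ref : (3N,) Cartesian coordinates
    e_ref    : ab initio energy at x_ref
    g_ref    : (3N,) ab initio gradient at x_ref
    h_ref    : (3N, 3N) ab initio or updated Hessian at x_ref
    """
    dx = x - x_ref
    return e_ref + g_ref @ dx + 0.5 * dx @ (h_ref @ dx)

def quadratic_gradient(x, x_ref, g_ref, h_ref):
    """Gradient of the same quadratic model, used to integrate the predicted trajectory."""
    return g_ref + h_ref @ (x - x_ref)
```

Both functions are only meaningful while the displacement from x_ref remains within the trust radius; beyond it, new ab initio information must be computed and a new local model constructed.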

2.2 Long Short-Term Memory

As shown in Fig. 2, a neural network is a connection of many single neurons; the output of one neuron can be the input of another. Each neuron has an activation function. The left layer of the network is called the input layer and contains X1, X2, X3, X4; the right layer is the output layer and contains Z1. The layer in between is the hidden layer, which contains Y1, Y2, Y3.

Fig. 2
figure 2

The structure of a neural network
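A minimal numpy forward pass through the 4-3-1 network of Fig. 2 might look as follows; the sigmoid activation and the random weights are illustrative assumptions standing in for trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)   # input layer (X1..X4) -> hidden layer (Y1..Y3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # hidden layer (Y1..Y3) -> output layer (Z1)

x = np.array([0.1, 0.2, 0.3, 0.4])              # inputs X1..X4
y_hidden = sigmoid(W1 @ x + b1)                  # hidden activations Y1..Y3
z = sigmoid(W2 @ y_hidden + b2)                  # output Z1
```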

A recurrent neural network (RNN) is a typical kind of neural network; its structure is shown in the leftmost part of Fig. 3.

Fig. 3
figure 3

The structure of a recurrent neural network and its unfolding

As shown in the leftmost part of Fig. 3, an RNN is a neural network containing loops. N is a node of the network, I stands for input, and O for output. Loops allow information to be transmitted from the current step to the next step. An RNN can be regarded as multiple copies of the same neural network, with each module passing a message to its successor. The right side of Fig. 3 corresponds to the unfolding of the left side. The chain-like structure of RNNs reveals that they are naturally suited to sequences and lists. RNN applications have been successful in speech recognition, language modeling, translation, and picture description, and this list is still growing. One of the key features of RNNs is that they can transmit previous information to the current task, but only when the distance from the previous step to the related step is not too long.

Long short-term memory (LSTM) overcomes this shortcoming. LSTM is a special type of RNN that solves the problem of long-term dependence of information; it avoids the long-term dependency problem through deliberate design. Figure 4 shows the structure of a node in an LSTM, where a forget gate can be observed. The output of the forget gate lies between “1” and “0”: “1” means retain fully, and “0” means discard completely. The forget gate thus determines which information will be retained and which discarded. The upper horizontal line in Fig. 4 allows input information to cross the node without changing. There are two further types of gates in an LSTM node (input and output gates). The middle gate is the input gate, which determines the information to be saved in the node; F denotes a function module that creates a new candidate value vector. The right gate is the output gate; the F module next to the output gate determines which information of the node will be transmitted to the output.

Fig. 4
figure 4

The structure of an LSTM node

A node has three gates and a cell unit, as shown in Fig. 4. The gates use the sigmoid as activation function, and the tanh function is used to transform the input into the cell state. The following equations define a node. For the gates, the functions are

$$ {i}_t=g\left({w}_{xi}{x}_t+{w}_{hi}{h}_{t-1}+{b}_i\right) $$
(3)
$$ {f}_t=g\left({w}_{xf}{x}_t+{w}_{hf}{h}_{t-1}+{b}_f\right) $$
(4)
$$ {o}_t=g\left({w}_{xo}{x}_t+{w}_{ho}{h}_{t-1}+{b}_o\right) $$
(5)

The transfer for input status is

$$ c\mbox{\_}{in}_t=\tanh \left({w}_{xc}{x}_t+{w}_{hc}{h}_{t-1}+{b}_{c\mbox{\_} in}\right) $$
(6)

The status is updated by

$$ {c}_t={f}_t\ast {c}_{t-1}+{i}_t\ast c\mbox{\_}{in}_t $$
(7)
$$ {h}_t={o}_t\ast \tanh \left({c}_t\right) $$
(8)
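The following numpy sketch performs one forward step through these gates exactly as written in Eqs. (3)–(8), with the sigmoid playing the role of g; the weight dictionary `p` and its key names are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):                      # the gate activation g in Eqs. (3)-(5)
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following Eqs. (3)-(8); p holds the weight matrices and biases."""
    i_t  = sigmoid(p["w_xi"] @ x_t + p["w_hi"] @ h_prev + p["b_i"])   # input gate, Eq. (3)
    f_t  = sigmoid(p["w_xf"] @ x_t + p["w_hf"] @ h_prev + p["b_f"])   # forget gate, Eq. (4)
    o_t  = sigmoid(p["w_xo"] @ x_t + p["w_ho"] @ h_prev + p["b_o"])   # output gate, Eq. (5)
    c_in = np.tanh(p["w_xc"] @ x_t + p["w_hc"] @ h_prev + p["b_c"])   # candidate state, Eq. (6)
    c_t  = f_t * c_prev + i_t * c_in                                  # cell update, Eq. (7)
    h_t  = o_t * np.tanh(c_t)                                         # hidden output, Eq. (8)
    return h_t, c_t
```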

The workflow of a node is shown in Fig. 5, and Fig. 6 shows the flowchart of the LSTM.

Fig. 5
figure 5

The workflow of state changes in a neural node

Fig. 6
figure 6

The flowchart of LSTM

2.3 Model

The calculation of the atomic positions, the energy of the system, and the Hessian occupies almost all the CPU time in chemistry dynamics simulations. Figure 7 illustrates the Hessian-based predictor–corrector algorithm in chemistry dynamics simulations. At each time step, the potential energy, kinetic energy, velocity, Hessian, and other quantities are calculated from the atomic positions. In Fig. 1, assuming X i–1,p is the current point, the potential energy of the next point X is calculated as follows. The gradient and potential energy of the current point can be calculated from its known location. For a system of N atoms (N = 8 in the example above), the gradient and position vectors have 3 × N dimensions and the Hessian is a 3N × 3N matrix. Hence, the largest part of the calculation in Eq. (2) is H(X i–1,p), and computing it quickly and accurately is the focus of various algorithms.

Fig. 7
figure 7

Flowchart representation of the complete Hessian-based predictor–corrector integrator

Algorithm 1

figure a

H(X i–1,p) must be obtained from the location and time information of the current point while keeping the systematic error as small as possible. Researchers have proposed Hessian-update methods to save computing time [14]. Here, deep learning is used to predict the atomic positions, energy, and Hessian; deep learning is therefore applied three times in place of the predictor–corrector steps. It is important to note that the deep learning model must be trained and initialized before it can predict. The result of this approach is a novel predictor–corrector algorithm with deep learning.
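A minimal sketch of how such a predictor could be set up is given below, using Keras as an example framework; the window length, layer width, and variable names are illustrative assumptions and do not reproduce the actual implementation used in this chapter. Analogous models are built for the energy and for the (flattened) Hessian.

```python
import numpy as np
import tensorflow as tf

WINDOW = 8  # number of previous time steps fed to the LSTM (illustrative choice)

def make_windows(series, window=WINDOW):
    """Turn a (T, d) trajectory series into (samples, window, d) inputs and next-step targets."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

def build_lstm(dim):
    """LSTM that maps a window of previous values to the value at the next step."""
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(64, input_shape=(WINDOW, dim)),
        tf.keras.layers.Dense(dim),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# `positions` is assumed to be a (T, 3N) array produced by the simulation during training.
# X_train, y_train = make_windows(positions)
# model = build_lstm(positions.shape[1])
# model.fit(X_train, y_train, epochs=50, batch_size=32)
```

Once trained, the model is fed the most recent window of simulated or previously predicted values and returns the next value, which replaces the corresponding explicit calculation.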

3 Experimental Results

To test the algorithm with deep learning, we implemented the integration algorithm in the VENUS (cdssim.chem.ttu.edu) dynamics simulation package interfaced with the electronic structure program NWChem [13]. We chose the H2O reaction system as our test problem. In the tests, the ab initio potential energy, gradient, and Hessian were calculated with density functional theory using the 6-311+G** basis set, and the ab initio Hessian was calculated once in every five steps during training, with the update scheme used in the intervening steps. We calculated a trajectory for the chemical reaction system for the first 67% of 5000 integration steps (3350 steps), where each step has a fixed size of 0.02418884 fs (100 a.u.; 1 a.u. = 2.418884e-17 s). The remaining 33% of the steps (1650 steps) were predicted by the proposed deep learning algorithm. Three quantities are predicted in our test: the atomic positions, the energy, and the Hessian.
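The sketch below summarizes this splitting of the trajectory in a few lines of Python; the variable names are illustrative, and the step counts simply restate the figures quoted above.

```python
import numpy as np

N_STEPS         = 5000
N_TRAIN         = int(N_STEPS * 0.67)       # steps integrated explicitly and used for training
DT_FS           = 0.02418884                # step size, 100 a.u. expressed in femtoseconds
AB_INITIO_EVERY = 5                          # ab initio Hessian computed every fifth training step

train_steps   = np.arange(N_TRAIN)                  # explicit predictor-corrector steps
predict_steps = np.arange(N_TRAIN, N_STEPS)         # remaining ~33% predicted by the LSTM models
hessian_steps = train_steps[::AB_INITIO_EVERY]      # steps with an explicit ab initio Hessian
```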

Figure 8 illustrates the computed energy and its predicted values. The upper panel is the computed energy of the H2O system; the horizontal axis is the time step and the vertical axis the energy. The yellow region represents training data and the green section the predicted values. After more than 3000 training steps, the predicted values are almost the same as the computed values. Table 3 lists some of the relative errors, which are less than 0.1%.

Fig. 8
figure 8

The energy of H2O system: (a) Output of the prediction–correction algorithm. (b) The yellow region corresponds to training data and green is prediction data

Figure 9 shows the position of a hydrogen atom. The upper panel shows the computed values and the lower panel the training and predicted values; the horizontal axis is the time step and the vertical axis the atomic position. Table 1 lists relative errors between predicted and computed values; the relative error is less than 0.7%, and in some cases even less than 0.01%. Figure 10 shows one element of the Hessian. The upper panel shows the computed values and the lower panel the training and predicted values; the horizontal axis is the time step and the vertical axis the Hessian value. Table 2 lists relative errors between predicted and computed values; the minimum relative error is 8%, and some errors exceed 25%. Although the trajectory has 5000 steps, the Hessian is calculated in only 1000 of them because of the predictor–corrector algorithm. Therefore, the size of the training set is less than 670 and the relative error is relatively large (Table 3).

Fig. 9
figure 9

The location of atoms in the H2O system: (a) Output of the prediction–correction algorithm. (b) The yellow region corresponds to training data and green represents prediction data

Table 1 Relative error between atomic position prediction and computational value
Fig. 10
figure 10

The Hessian of the H2O system: (a) Output of the prediction–correction algorithm. (b) The yellow part is training data and green is prediction data

Table 2 Relative error between Hessian prediction and computational value
Table 3 Relative error between system energy prediction and computational value

The prediction–correction algorithm can reduce the dynamics simulation time of the H2O reaction system from months to days. However, the stability of the prediction–correction algorithm weakens as the simulation proceeds, and an ab initio calculation must be carried out every few steps; as the number of prediction steps increases, the stability decreases. Deep learning can reduce the simulation time of the reaction system by one-third. After sufficient training, the prediction can reach more than 1200 steps without affecting the system error. If reinforcement learning and other methods are used, the calculation time will be reduced further and more steps can be predicted.

4 Conclusion and Future Work

In this chapter, a new molecular dynamics simulation algorithm is proposed by combining deep learning with the predictor–corrector algorithm. The new algorithm reduces the calculation time of the system by one-third without increasing the error. In the future, reinforcement learning and parameter migration will be used to further reduce the calculation time, and the monodromy matrix [15,16,17] will be used to monitor changes in the calculation error.