Introduction

Proteins perform their functions through selective intermolecular interactions with external objects such as other proteins, inhibitors, nucleic acids, membranes, signal molecules, etc. The strength and temporal evolution of these interactions depend on the protein’s amino acid composition and on the particular spatial arrangement of these basic building blocks. Non-native conformations usually prevent proteins from performing their “natural” tasks. The definition of “a protein conformation” is somewhat blurry, since proteins are flexible molecules. The energy landscapes of such large systems are very rich, complex, and highly multidimensional. In other words, one may expect that in its native, functional form, a protein does not adopt just one particular conformation; rather, numerous closely related structures may participate in a given chemical or biological process. Quite often large-scale molecular motions are required for performing a given protein function (Bahar et al. 2010; van Oijen 2007). For example, hemoglobin increases its oxygen-binding capability via famous allosteric effects (Tekpinar and Zheng 2013), the enzyme polymerase augmented by PCNA DNA clamp (Rydzewski et al. 2015b) processes 1,000 base pairs per second during DNA replication, and G protein-coupled opioid receptor is activated by subtle conformational changes (Huang et al. 2015).

Capturing protein dynamical structures at work is not an easy task (Russel et al. 2009). Numerous experimental techniques have been developed: spectroscopy of all sorts, NMR, EPR, and even time-dependent X-ray crystallography. The computer era brought an excellent additional tool for studies of protein structures and dynamics. Since the pioneering ideas of A. Warshel et al. (1970) and the first application work by J.A. McCammon et al. (1977), empirical force fields have been used thousands of times to describe computationally the dynamics of proteins and their interactions. At this point, one has to note that quantum mechanics is currently the only physical theory rigorous enough to describe molecular systems and their interactions. Unfortunately, ab initio (or force field free) molecular dynamics (MD) simulations of proteins are still in their infancy and far from being available to every research group (Dal Peraro et al. 2007; Rydzewski and Nowak, chapter “Molecular Dynamics in Excited State”). On the other hand, a classical approximation, where individual atoms (or even groups of atoms in the so-called coarse-grained MD) are replaced by model material points interacting through analytically predetermined potentials, offers a very promising alternative to tedious, expensive, and difficult experimental studies of proteins. Computer modeling of protein structure and dynamics is now a mature field of science (Karplus and McCammon 2002). Its role was recognized by the 2013 Nobel Prize in Chemistry awarded to Michael Levitt (2014), Martin Karplus (2014), and Arieh Warshel (2014) for “taking the chemical experiment into cyberspace” (Staffan Normark, Permanent Secretary of the Royal Swedish Academy of Sciences).

Why do we use computational molecular mechanics and/or molecular dynamics methods for studies of protein structure and dynamics? Here is a partial list of good reasons for their popularity and widespread applications:

  1. Give models for mechanistic interpretation of protein functions.
  2. Help to connect a protein structure and dynamics with its functions.
  3. Provide time-resolved data on protein structures.
  4. Generate visual models easy to manipulate (great pictures in papers).
  5. Promise to determine intra- and intermolecular interactions quantitatively.
  6. Help to discover subtle functional differences in protein architectures.
  7. Are useful in drug design.
  8. Provide an easy tool to study protein mutants.
  9. Allow for checking catalytic properties.
  10. Are easy to verify, extend, or modify.
  11. Bring information on thermodynamics.
  12. Provide data on mechanical strength.
  13. Direct comparisons with experimental data such as NMR or X-ray structures, single-molecule AFM spectroscopy, etc. are possible.
  14. Have a strong impact on biochemistry, biology, biophysics, and bioinformatics.
  15. Cheap computers are available everywhere.
  16. Good quality, free software is accessible via WWW.
  17. A relatively short “learning curve.”
  18. Allow for computer experiments in fully controlled conditions.
  19. Provide “molecular big data” suitable for data mining.
  20. The field of computer modeling is well established and has many experts, conferences, specialized journals, and university courses around.
  21. Supercomputer power doubling every 18 months.
  22. Strong support from the industry and governments, biopharma sector, medical agencies, computer makers, etc.

And, of course, the last but not least reason for the great popularity of computational modeling of proteins is simply human curiosity and interest in solving scientific problems. For example, solving the protein folding problem is still a great challenge to the scientific community. We know over 100 million protein sequences, but only some 100,000 3-D structures are available in the Protein Data Bank (as of year 2015). It is commonly believed that finding a realistic and robust method of converting a 1-D protein sequence into a 3-D structure should bring another Nobel Prize to the authors of such a discovery.

The purpose of this chapter is twofold: (a) we want to draw attention to major review papers and resources regarding studies of molecular dynamics of proteins and (b) we want to point out the most representative, current applications of MD methods which have been published in recent years.

Formalism of Molecular Mechanics and Molecular Dynamics Methods: A Short Presentation

In order to describe atomistic level chemical phenomena, one has to use quantum mechanics (Dahl 2001; Greenberger et al. 2009). This theory applied to molecules brought “an explosion” of quantum chemistry and, together with the computer revolution, has changed the way chemistry is done in the twenty-first century. Despite the great successes of quantum chemistry, this approach is still limited to small- or medium-size systems having 20–300 atoms (Piela 2014). Proteins usually are composed of thousands of atoms; moreover, their realistic models should take into account a substantial number of water molecules. Thus, classical models of proteins are necessary. In such a simplified approach, electrons are neglected, quantum bonds are replaced by effective analytical potentials, and instead of atoms, we have carefully designed model balls. The motion of model atoms is treated in a classical way: the validity of Newton’s equations of motion is assumed in the proteins’ nanoworld. Structures of molecules may be optimized using classical force fields and methods of molecular modeling; the time evolution of the position of each individual atom may be followed for a reasonably long period of time using the molecular dynamics formalism, but the possibility of forming or breaking chemical bonds is basically lost in classical simulations. Thus, the real chemistry, based on chemical reactions, needs the quantum theory. Nevertheless, classical modeling provides so many useful hints into the nature of biomolecules that computational modeling of protein structure and dynamics is a fundamental part of the research in all major laboratories. Current technology allows for performing at least 35-ns classical MD simulations for 10,000,000 atoms in one computer day (Andoh et al. 2013).

Some History

The Electronic Numerical Integrator And Computer (ENIAC) was built at the University of Pennsylvania in Philadelphia in 1945. It weighed more than 27,000 kg (60,000 lb) and contained more than 18,000 vacuum tubes. It is regarded as the first successful digital computer. Already in 1955, E. Fermi, J. Pasta, and S. Ulam calculated numerically the motions of a chain of coupled anharmonic oscillators. This is probably the first case in which a molecular dynamics simulation suggested an analytical solution for the problem studied. In 1957, Alder and Wainwright studied dynamics of a system of hard, two-dimensional discs (Alder and Wainwright 1957). In 1971, Rahman and Stillinger used a more realistic Lennard-Jones interaction potential to study motions in water (Rahman and Stillinger 1971). The duration of those simulations was about 10 ps.

The concept of an interaction force field was applied to proteins by S. Lifson in 1969. This model was aimed at facilitating the refinement of protein structures determined from X-ray diffraction experiments (Levitt and Lifson 1969). The role of quantum electronic effects in enzymatic reactions was recognized later, in the form of the QM/MM approach (Warshel and Levitt 1976). Current force fields (sets of analytical formulas together with carefully chosen parameters) are based on infrared spectra, geometrical information from amide crystals, and quantum mechanical calculations. The famous paper by McCammon et al. (1977) on BPTI dynamics may be regarded as the first atomistic MD modeling of a protein. Interesting stories on the history and development of molecular dynamics simulation science may be found in the review published by M. Karplus (2003) and the article by W. Jorgensen (2013). At the end of 2015 in the PubMed Bibliographics database, nearly 20,000 papers were listed under the query “molecular dynamics simulation protein.” This number has doubled since 2011. The area of computational chemistry and biology is very strong and is growing fast.

On the Origin of Potential Energy Surface (PES) Concept

As we know, on the grounds of quantum mechanics, the complete information on the molecular system may be obtained from the wave function |Φ〉, which is a solution of the Schrödinger equation:

$$ \widehat{H}\left|\Phi \right\rangle =\varepsilon \left|\Phi \right\rangle $$

where the Hamiltonian \( \widehat{H} \) contains both electron (i,j) and nuclear (A,B) degrees of freedom (in atomic units):

$$ \begin{array}{cc}\hfill \begin{array}{l}\widehat{H}=-{\displaystyle \sum_{i=1}^N\frac{1}{2}{\nabla}_i^2}-{\displaystyle \sum_{A=1}^M\frac{1}{2{M}_A}{\nabla}_A^2}\\ {} -{\displaystyle \sum_{i=1}^N\sum_{A=1}^M\frac{Z_A}{r_{iA}}}+{\displaystyle \sum_{i=1}^N\sum_{j>i}^N\frac{1}{r_{ij}}}\\ {} +{\displaystyle \sum_{A}^M\sum_{B>A}^M\frac{Z_A{Z}_B}{R_{AB}}}\end{array}\hfill &{\qquad}\begin{array}{l}\mathrm{atomic}\ \mathrm{units}\ \left(\mathrm{a}.\mathrm{u}.\right):\\ {}\hslash =1\\ {}m=1\\ {}e=1\\ {}c\approx 137\end{array}\hfill \end{array} $$

If we adopt the Born-Oppenheimer approximation, a separation of electronic and nuclear degrees of freedom is possible:

$$ {\widehat{H}}_{el}{\Phi}_{el}={\varepsilon}_{el}{\Phi}_{el} $$

where the electronic Hamiltonian is

$$ {\widehat{H}}_{el}=-{\displaystyle \sum_{i=1}^N\frac{1}{2}{\nabla}_i^2}-{\displaystyle \sum_{i=1}^N\sum_{A=1}^M\frac{Z_A}{r_{iA}}}+{\displaystyle \sum_{i=1}^N\sum_{j>i}^N\frac{1}{r_{ij}}} $$

and the electronic wave function \( {\Phi}_{el} \) depends in a parametric way on the positions of all nuclei \( \left\{ \overrightarrow{R}{}_A\right\} \):

$$ \begin{array}{l}{\Phi}_{el}={\Phi}_{el}\left( \left\{ \overrightarrow{r}{}_i\right\}; \left\{ \overrightarrow{R}{}_A\right\} \right)\\ {}{\varepsilon}_{el}={\varepsilon}_{el}\left( \left\{ \overrightarrow{R}{}_A\right\} \right)\end{array} $$

As one can see, the electronic energy \( {\varepsilon}_{el} \) depends on the positions of the nuclei as well. Thus, for each different arrangement of atoms, \( {\Phi}_{el} \) is a different function of the electron coordinates \( \overrightarrow{r}{}_i \). For a fixed position of atoms (nuclei), the total energy reads

$$ {\varepsilon}_{tot}^{fix\ N}={\varepsilon}_{el}+{\displaystyle \sum_{A}^M{\displaystyle \sum_{B>A}^M\frac{Z_A{Z}_B}{R_{AB}}}} $$

Such an approximation leads to the following nuclear Hamiltonian \( {\widehat{H}}_{nucl} \):

$$ \begin{array}{l}{\widehat{H}}_{nucl}=-{\displaystyle \sum_{A=1}^M\frac{1}{2{M}_A}{\nabla}_A^2}+{\left\langle {\widehat{H}}_{el}\right\rangle}_{\Phi_{el}}+{\displaystyle \sum_{A}^M\sum_{B>A}^M\frac{Z_A{Z}_B}{R_{AB}}}=\\ {} -{\displaystyle \sum_{A=1}^M\frac{1}{2{M}_A}{\nabla}_A^2}+\underbrace{\varepsilon_{el}\left(\left\{ \overrightarrow{R}{}_A\right\}\right)+{\displaystyle \sum_{A}^M\sum_{B>A}^M\frac{Z_A{Z}_B}{R_{AB}}}}_{\varepsilon_{tot}^{fix\ N}}=\\ {} -{\displaystyle \sum_{A=1}^M\frac{1}{2{M}_A}{\nabla}_A^2}+{\varepsilon}_{tot}^{fix\ N}\left(\left\{ \overrightarrow{R}{}_A\right\}\right)\end{array} $$

This total energy \( {\varepsilon}_{tot}^{fix\ N} \) (with fixed positions of atoms) provides a potential for nuclear motion. This quantity is called the potential energy function (or potential energy surface, PES) and in molecular dynamics simulations it is approximated by a force field. In quantum mechanical terms, the solution of the nuclear Schrödinger equation,

$$ {\widehat{H}}_{nucl}{\Phi}_{nucl}={\varepsilon}_{nucl}{\Phi}_{nucl} $$

provides information on vibrations, rotations, and translations of the molecule. But the dynamics of a molecule (protein) may be studied also classically, provided that the realistic molecular PES is known.
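To make the idea of approximating a PES by a force field more tangible, the following minimal sketch compares a one-dimensional model PES of a diatomic molecule with the harmonic bond term a classical force field would use near the minimum. The Morse form of the model PES and all numerical parameters (`D_E`, `A`, `R_E`) are illustrative assumptions and are not taken from any force field discussed in this chapter.

```python
import math

# Hypothetical parameters of a 1-D model PES (illustrative values only).
D_E = 100.0   # well depth, kcal/mol
A = 2.0       # width parameter, 1/angstrom
R_E = 1.0     # equilibrium distance, angstrom


def morse(r):
    """Model 'true' PES: a Morse curve with its minimum shifted to zero energy."""
    return D_E * (1.0 - math.exp(-A * (r - R_E))) ** 2


def harmonic(r):
    """Force-field style bond term, V = 1/2 k (r - R_E)^2.

    k = 2 * D_E * A**2 is the second derivative of the Morse curve at its
    minimum, so both curves agree to second order around R_E.
    """
    k = 2.0 * D_E * A ** 2
    return 0.5 * k * (r - R_E) ** 2


if __name__ == "__main__":
    for r in (0.9, 1.0, 1.05, 1.5, 3.0):
        print(f"r = {r:4.2f} A   Morse = {morse(r):9.3f}   harmonic = {harmonic(r):9.3f}")
```

Close to the minimum the two curves nearly coincide, whereas at large separations the harmonic term keeps growing and the model PES levels off at the dissociation energy. This is one way to see why bond breaking is out of reach for a force field built from harmonic bond terms.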

Force Fields

Classical force fields (FFs) share the assumption that analytic potentials are a good approximation to the “real” PES. It is also assumed that parameters pertaining to particular atom types (or groups of atoms) are transferable from one model biomolecular system to similar ones.

The general expression for potential energy V is the following:

$$ \begin{aligned} V&={\displaystyle \sum_{\mathrm{bond}\mathrm{s}\ i}}{V}_i^B+{\displaystyle \sum_{\mathrm{bond}\ \mathrm{angles}\ j}}{V}_j^A+{\displaystyle \sum_{\mathrm{torsional}\ \mathrm{angles}\ k\ }}{V}_k^D+{\displaystyle \sum_{\mathrm{improper}\ \mathrm{angles}\ l}}{V}_l^E\nonumber\\ &{}\quad+{\displaystyle \sum_{\mathrm{atom}\ \mathrm{pairs}\ \left(r,s\right)}}\left({V}_{r,s}^C+{V}_{r,s}^P+{V}_{r,s}^{VdW}\right) \end{aligned}$$

The physical meaning of each term is described in the legend in Fig. 1. In general, an intuitive decomposition of energy is used in the above equation. Additive components correspond to energies of chemical bonds (\( V_i^B \)), energies related to angular deformation of the optimal molecular structure (\( V_j^A \), \( V_k^D \), \( V_l^E \)), and pairwise interactions of all atoms present in the system related to Coulomb energy of a system of partially charged model atoms (\( V^C \)) and the so-called Lennard-Jones (or van der Waals) term (\( V^P+V^{VdW} \)).

Fig. 1

Interaction contributions to a typical force field. Bond stretch vibrations are described by a harmonic potential \( V^B \), the minimum of which is at the equilibrium distance \( b_0 \) between the two atoms connected by chemical bond i (the indices i, j, etc. are not shown in the figure). Bond angles and out-of-plane (improper) angles are also described by harmonic potential terms, \( V^A \) and \( V^E \), where \( \Theta_0 \) and \( \zeta_0 \) denote the respective equilibrium angles. Dihedral twists (torsional angles) are subjected to a periodic potential \( V^D \); the respective force constants are denoted by k’s with appropriate indices. Nonbonded forces are described by Coulomb interactions, \( V^C \), and Lennard-Jones potentials, \( V^{LJ}=V^P+V^{VdW} \), where the latter includes the Pauli repulsion, \( V^P\sim {r}^{-12} \), and the van der Waals interaction, \( V^{VdW}\sim -{r}^{-6} \), respectively. Based on H. Grubmueller, “Proteins as Molecular Machines: Force Probe Simulations,” published in Computational Soft Matter: From Synthetic Polymers to Proteins, Lecture Notes, Norbert Attig, Kurt Binder, Helmut Grubmueller, Kurt Kremer (Eds.), John von Neumann Institute for Computing, Julich, Germany, NIC Series, Vol. 23, ISBN 3-00-012641-4, pp. 401–422, 2004. © 2004 by John von Neumann Institute for Computing.
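The individual contributions listed in the legend of Fig. 1 can be written down as short functions. The sketch below is only an illustration under stated assumptions: the exact functional forms (a factor 1/2 in the harmonic terms, the `1 + cos` form of the torsional term, the epsilon/sigma form of the Lennard-Jones term) and the unit convention (kcal/mol, angstrom, elementary charges) differ between force fields, and all function and parameter names are hypothetical.

```python
import math

# Converts q_r * q_s / r (elementary charges, angstrom) to kcal/mol.
COULOMB_CONST = 332.0637


def bond_energy(b, b0, k_b):
    """Harmonic bond stretch, V^B = 1/2 k_b (b - b0)^2."""
    return 0.5 * k_b * (b - b0) ** 2


def angle_energy(theta, theta0, k_theta):
    """Harmonic bond-angle term, V^A = 1/2 k_theta (theta - theta0)^2 (radians)."""
    return 0.5 * k_theta * (theta - theta0) ** 2


def dihedral_energy(phi, k_phi, n, delta):
    """Periodic torsional term, V^D = k_phi [1 + cos(n phi - delta)]."""
    return k_phi * (1.0 + math.cos(n * phi - delta))


def improper_energy(zeta, zeta0, k_zeta):
    """Harmonic out-of-plane term, V^E = 1/2 k_zeta (zeta - zeta0)^2."""
    return 0.5 * k_zeta * (zeta - zeta0) ** 2


def coulomb_energy(q_r, q_s, r):
    """Coulomb interaction V^C of two partial charges at distance r."""
    return COULOMB_CONST * q_r * q_s / r


def lennard_jones_energy(r, epsilon, sigma):
    """V^P + V^VdW = 4 epsilon [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)
```

The total potential energy V is then the sum of such terms over all bonds, angles, torsions, impropers, and atom pairs; production codes add cutoffs, exclusion lists for bonded neighbors, and long-range corrections that are omitted here.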

One should distinguish between all-atom and united-atom force fields. In the latter type, groups of real atoms are used as basic model “atoms”; for example, the methyl group CH3 may be treated as a special type of united atom with a mass of 15 atomic mass units. Such simplification saves some computer time and in certain cases does not affect the results of modeling. The drawback of the majority of force fields currently used is the lack of atomic polarization effects. Partial charges on atoms include polarization to some extent, but usually these parameters remain fixed throughout the whole simulation. In reality, the charge distribution is modified “on the fly”: induced dipole moments may change the dynamics of the molecule and may affect results of modeling. Thus, the next generation of force fields shall include atomic polarizability (Warshel et al. 2007; Huang et al. 2014).

There are numerous classical force fields designed for the description of protein structure and dynamics. Among the most popular ones are CHARMM (MacKerell et al. 1998), AMBER (Weiner et al. 1984), GROMOS05 (Christen et al. 2005), and OPLS (Jorgensen and Tirado-Rives 1988). Perhaps the most up-to-date account of the various force fields and computer codes available for protein simulation is found on Wikipedia. The problem of the design of a potential energy function for proteins is discussed in Boas and Harbury (2007). Some papers comparing the quality of various force fields have appeared (Hornak et al. 2006; Guvench and MacKerell 2008; Lindorff-Larsen et al. 2012a). Each force field was optimized for a different set of systems and tuned to reproduce different properties, so it is not clear how a fair comparison should be performed. On the other hand, in the protein modeling community, there is continuous pressure to devote more effort to careful scrutiny of the force fields and to setting clear recommendations regarding “justified” error bars for results of computations (Aliev and Courtier-Murias 2010; Pantelopulos et al. 2015).

Structure Optimization or Energy Minimization

The most standard problem of molecular mechanics applied to proteins is the following: what is the best (optimum) structure of the studied protein? The common assumption is that the minimization of the energy (or sometimes other thermodynamic functions such as free energy) will give the answer. It is believed that such a structure is a good approximation to the native structure of the protein of interest.
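A minimal sketch of such an energy minimization is given below. It applies a steepest descent scheme with a simple adaptive step to a hypothetical two-dimensional model energy with two wells of different depth; the model function, the step-size rule, and all names are assumptions made for illustration and do not reproduce the minimizer of any particular MD package.

```python
def model_energy(x):
    """Hypothetical 2-D 'PES' with a shallow and a deep well along x[0]."""
    return (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0] + x[1] ** 2


def model_gradient(x):
    """Analytic gradient of model_energy."""
    return [4.0 * x[0] * (x[0] ** 2 - 1.0) + 0.3, 2.0 * x[1]]


def steepest_descent(energy, gradient, x0, step=0.01, max_steps=500, tol=1e-6):
    """Follow the negative gradient downhill until it (almost) vanishes."""
    x = list(x0)
    e = energy(x)
    for _ in range(max_steps):
        g = gradient(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        trial = [xi - step * gi for xi, gi in zip(x, g)]
        e_trial = energy(trial)
        if e_trial < e:          # accept the move and try a longer step next time
            x, e = trial, e_trial
            step *= 1.2
        else:                    # reject the move and shorten the step
            step *= 0.5
    return x, e


if __name__ == "__main__":
    for start in ([0.5, 1.0], [-0.5, 1.0]):
        x_min, e_min = steepest_descent(model_energy, model_gradient, start)
        print(start, "->", [round(c, 3) for c in x_min], round(e_min, 4))
```

The two starting points end in different wells with different energies: the minimizer only slides into the nearest basin, so the optimized structure depends on the initial one.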

There are many computational methods of finding local minima (Klepeis et al. 2003; Chou 2004; Kmiecik et al. 2007). Unfortunately, there is no general method known which might lead to the global minimum of the PES of a given protein. Such techniques as simulated annealing (Kannan and Zacharias 2009) or replica exchange (Sugita and Okamoto 1999; Sugita 2009), supported by genetic algorithms, may help to sample the conformational space efficiently and may generate plausible “candidates” for the global minimum, but these approaches are heuristic recipes rather than rigorous procedures. Quite often, the energetics of the transition from a conformation A to a conformation B is of interest and should be calculated (Schlegel 2003). There are numerous methods of finding classical reaction paths appropriate for proteins, for example, the self-penalty walk (Nowak et al. 1991). It is worth noting a tricky technique of milestoning (Kuczera et al. 2009). The interested reader may find details of reaction path calculations for large molecular systems in Elber et al. (2002) or Tao et al. (2012).
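As an illustration of the replica exchange idea mentioned above, the sketch below evaluates the Metropolis criterion commonly used in temperature replica exchange to decide whether two replicas, simulated at temperatures T_i and T_j, swap their configurations. The function names, the unit convention (kcal/mol, kelvin), and the example energies are assumptions chosen for this sketch; a real implementation also has to manage the replicas themselves and rescale velocities after a swap.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def swap_probability(e_i, t_i, e_j, t_j):
    """Acceptance probability for exchanging two replicas.

    p = min(1, exp[(beta_i - beta_j) * (e_i - e_j)]) with beta = 1/(k_B T),
    where e_i and e_j are the potential energies of the current
    configurations at temperatures t_i and t_j.
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(e_i, t_i, e_j, t_j, rng=random):
    """Return True if the exchange is accepted in this attempt."""
    return rng.random() < swap_probability(e_i, t_i, e_j, t_j)


if __name__ == "__main__":
    # Hypothetical energies (kcal/mol) of replicas at 300 K and 350 K.
    print(swap_probability(-100.0, 300.0, -90.0, 350.0))
    print(swap_probability(-90.0, 300.0, -100.0, 350.0))
```

When the colder replica happens to hold the higher-energy configuration, the swap is always accepted; otherwise it is accepted only with a Boltzmann-like probability, which is what lets low-energy structures found at high temperature migrate to the temperature of interest.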

General Molecular Dynamics Scheme

The basic physics behind a simple molecular dynamics scheme seems to be trivial. We want to solve simultaneously M Newton’s equations of motion, one for each individual atom i (or a grain of atoms):

$$ {\boldsymbol{a}}_i=\frac{1}{m_i}{\boldsymbol{F}}_i,\quad i=1,\dots ,M $$

The force \( {\boldsymbol{F}}_i \) acting on each atom may be calculated (locally) as the negative gradient of the potential (PES) with respect to the position of that atom:

$$ {\boldsymbol{F}}_i=-{\nabla}_iV $$

Having the forces, one can easily obtain the acceleration of each atom. We assume that the molecular system is deterministic (not always the case), and we may predict the positions of all atoms by integrating the accelerations (twice) with respect to time. There are many numerical algorithms suitable for solving these problems, but due to its simplicity and numerical stability, the Verlet algorithm is perhaps the most frequently used in real simulations (Frenkel and Smit 2002). Of course, one must assume an initial structure of the protein and has to introduce time into the numerical algorithm. The time variable is discretized; the size of the time step depends on the timescale of the fastest motions we want to study. For real proteins at ambient temperatures, it is usually 1 fs (\( 10^{-15} \) s). Figure 2 shows major steps in MD routines.

Fig. 2

A general scheme of generating MD trajectory
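The loop of Fig. 2 (compute forces, update positions and velocities, advance time) can be sketched for a single particle in one dimension. The example below uses the velocity form of the Verlet scheme, one of several equivalent variants, and a harmonic spring in reduced units as a stand-in for a real force field; the spring, the units, and all names are assumptions made for illustration.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate one particle in 1-D with the velocity Verlet scheme."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt      # new position
        a_new = force(x) / mass                 # force at the new position
        v = v + 0.5 * (a + a_new) * dt          # new velocity
        a = a_new
        trajectory.append((x, v))
    return trajectory


K_SPRING = 1.0  # hypothetical force constant (reduced units)
MASS = 1.0      # hypothetical mass (reduced units)


def spring_force(x):
    """F = -dV/dx for V = 1/2 K_SPRING x^2."""
    return -K_SPRING * x


def total_energy(x, v):
    """Kinetic plus potential energy of the model oscillator."""
    return 0.5 * MASS * v * v + 0.5 * K_SPRING * x * x


if __name__ == "__main__":
    traj = velocity_verlet(1.0, 0.0, spring_force, MASS, dt=0.01, n_steps=10000)
    e0 = total_energy(*traj[0])
    drift = max(abs(total_energy(x, v) - e0) for x, v in traj)
    print("largest deviation of the total energy:", drift)
```

Monitoring the total energy, as done in the last lines, is a common sanity check: with a sufficiently small time step the energy fluctuates around its initial value instead of drifting away.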

Here, we present typical steps in doing molecular dynamics simulations of proteins:

  1. What is the scientific problem we want to solve?
  2. Is classical MM/MD modeling an appropriate methodology for this type of problem? Would these calculations indeed help to understand nature? How might our results be verified experimentally?
  3. Do we have enough knowledge/experience/expertise, or do we prefer a “black box” approach (“let’s calculate something, and we will see…”)?
  4. What is the expected time of calculations required to obtain reasonable, publishable results? (Note: one trajectory usually is not enough.)
  5. Do we have sufficient resources (licensed codes, computer time, storage space, manpower, etc.)?
  6. Having answered points (1–5), we need to set up an initial model of the protein. Usually, an experimental structure (X-ray or NMR) downloaded from the PDB www.pdb.org (Berman et al. 2000) is a good starting point.
  7. We should check whether the structure is complete. Is the resolution of the structure adequate for our purposes? Does it contain all amino acids? Are there any missing atoms? Do we have force field parameters for prosthetic groups, ligands, metal ions, and exotic stuff present in our favorite protein? One should always check the “biological unit” entry in PDB to avoid misunderstanding of the real structure of a protein of interest.
  8. We need to decide in what particular environment the simulation will proceed. Vacuum? Water? Continuum dielectric? Exotic solvent? pH? Should ions be added for charge neutralization?
  9. Under what conditions of temperature and pressure do we plan the simulations? How will the temperature be maintained? Shall we switch to Langevin dynamics?
  10. Do we need a quasi-infinite model (periodic boundary conditions, PBC)? What is an appropriate shape and size of the solvation box? Should we use the Ewald summation technique to properly calculate electrostatic interactions?
  11. Let us assume we perform a standard 50-ns MD simulation of a protein having a reasonable initial structure. We will add a box of model water molecules at least 6 (or 9) angstrom thick at each protein border region.
  12. After this initial preparation, the first step will be an optimization of the protein structure. We may initially freeze the protein and allow for some steps of minimization of the solvating water molecules (e.g., 500 steps of the steepest descent (SD) method).
  13. Next, we may allow for some MD simulations of the solvent (water). We may gradually increase the temperature of waters from 0 K (a minimized structure in principle is not related to any temperature) to 300 K, over a period of 500–1,000 ps. The time step in MD is usually 1 fs; thus, we will ask the MD code to perform up to 1,000,000 steps.
  14. We may relax constraints and “unfreeze” the protein. Some 200–500 SD steps should be sufficient to transfer the protein (+water) from the “experimental” minimum to a “local force field-related” minimum. Too many steps result in overminimization (Rydzewski et al. 2015a)!
  15. We may increase the temperature in a stepwise manner from 0 to 300 K for the whole system. 1–5 ns of such a heating phase is often more than enough.
  16. If we perform a T = const simulation, we need to equilibrate the system well before useful data may be collected. The equilibration time teq depends on the system studied. In the current papers, one can find teq from 1 to 10 ns. We should observe at least a few geometrical parameters of our protein. The RMS distance calculated for Cα atoms from the minimized (or PDB) structure to the current one may help to estimate whether the model is fully equilibrated. The RMS plot versus time should be flat.
  17. Now, we may use the equilibrated system (protein + water) and launch a long production run (50 or more up to 1,000 ns).
  18. The structures at selected time points (frames) should be stored. These structures are further analyzed using computer graphics and specialized software analysis tools. One may store structures every 100 steps (fs) or even every 1,000 steps or more; it depends on what data are needed.
  19. In theory, we should set up modeling for infinite time in order to sample all configuration space of the protein studied. This is obviously not possible; thus, we cross our fingers and believe in ergodicity of the trajectory obtained. Once calculated quantities do not depend on the time of simulation, it is a reasonable signal that a longer calculation will not bring new information. Usually, several shorter trajectories, with different initial conditions, will provide better understanding of the protein than a single very long trajectory.
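Some of the bookkeeping behind the steps above can be sketched in a few lines: how many integration steps a run of a given length requires, how many frames will be stored, and how the RMS distance between two sets of Cα positions is evaluated. All function names are hypothetical, and the RMS function assumes that the two structures list the same atoms in the same order and have already been superimposed; analysis packages normally perform an optimal superposition first.

```python
import math


def n_steps(duration_ns, dt_fs=1.0):
    """Number of integration steps for a run of given length (1 ns = 1e6 fs)."""
    return round(duration_ns * 1.0e6 / dt_fs)


def n_frames(duration_ns, dt_fs=1.0, save_every=1000):
    """Number of stored structures when one frame is saved every save_every steps."""
    return n_steps(duration_ns, dt_fs) // save_every


def rmsd(coords_a, coords_b):
    """RMS distance between two equally long lists of (x, y, z) positions.

    Assumes identical atom order and already superimposed structures.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    print(n_steps(1.0))                     # heating of the solvent for 1,000 ps
    print(n_steps(50.0))                    # 50-ns production run
    print(n_frames(50.0, save_every=1000))  # frames stored every 1,000 steps
```

Plotting the RMS distance from the starting structure for successive frames gives the RMS-versus-time curve mentioned in step 16.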

During MD simulations, temperature and pressure have to be controlled using special algorithms, and infinite systems may be mimicked by using periodic boundary conditions. Long-range electrostatic interactions are often accounted for by using the Ewald summation technique (Sagui and Darden 1999). Inclusion of stochastic character of atomic motion results in using Langevin dynamics instead of Newton dynamics (Schlick 2010).
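How periodic boundary conditions enter the calculation of interatomic distances can be illustrated with the minimum image convention: each atom interacts with the nearest periodic copy of every other atom. The sketch below assumes a cubic box of edge `box_length` (in angstrom); other box shapes need a more general treatment, and the function names are hypothetical.

```python
import math


def minimum_image(delta, box_length):
    """Wrap one Cartesian component of a separation into [-L/2, L/2]."""
    return delta - box_length * round(delta / box_length)


def pbc_distance(r_a, r_b, box_length):
    """Distance between two atoms in a cubic periodic box (minimum image)."""
    squared = 0.0
    for a, b in zip(r_a, r_b):
        d = minimum_image(a - b, box_length)
        squared += d * d
    return math.sqrt(squared)


if __name__ == "__main__":
    # Two atoms near opposite faces of a 50-angstrom box are close neighbors.
    print(pbc_distance((1.0, 1.0, 1.0), (49.0, 1.0, 1.0), 50.0))
```

The convention is meaningful only when the interaction cutoff is shorter than half the box edge, which is one reason why the size of the solvation box matters.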

One should note that solvation effects may be accounted for not only by including into the studied system explicit water molecules but also via implicit solvent models (Chen and Brooks 2008; Chen et al. 2008). Changes in free energy upon solvation are often estimated using the generalized Born model (Hou et al. 2011; Cumberworth et al. 2015).
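A minimal sketch of the generalized Born estimate of the polar solvation free energy is given below. It uses the widely quoted pairwise form with an interpolating function f_GB and assumes that effective Born radii are already known for every atom; obtaining these radii is the difficult part in practice and is not shown. The dielectric constants, the unit convention (kcal/mol, angstrom, elementary charges), and all names are assumptions of this sketch.

```python
import math

# Converts q_i * q_j / r (elementary charges, angstrom) to kcal/mol.
COULOMB_CONST = 332.0637


def f_gb(r, a_i, a_j):
    """Interpolating function: equals the Born radius for r = 0, tends to r for large r."""
    return math.sqrt(r * r + a_i * a_j * math.exp(-r * r / (4.0 * a_i * a_j)))


def gb_solvation_energy(charges, radii, coords, eps_solvent=78.5, eps_solute=1.0):
    """Polar solvation free energy (kcal/mol) in the generalized Born form.

    charges: partial charges; radii: effective Born radii (assumed given);
    coords: (x, y, z) tuples in angstrom.
    """
    prefactor = -0.5 * COULOMB_CONST * (1.0 / eps_solute - 1.0 / eps_solvent)
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):  # the double sum includes the i == j self terms
            r = math.dist(coords[i], coords[j])
            total += charges[i] * charges[j] / f_gb(r, radii[i], radii[j])
    return prefactor * total


if __name__ == "__main__":
    # A single hypothetical ion: unit charge, Born radius of 2 angstrom.
    print(gb_solvation_energy([1.0], [2.0], [(0.0, 0.0, 0.0)]))
```

For a single ion the expression reduces to the classical Born formula, which is a convenient check of any implementation.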

Practical Aspects

Even the most sophisticated and advanced methods (Schwede and Peitsch 2008) need tools to perform computations. Both computer codes and hardware are required. The majority of papers published in this field utilized public domain codes to obtain data on protein dynamics. There are of course numerous commercial packages, such as Discovery Studio (Biovia), Sybyl-X (Certara), Yasara (YASARA Biosciences), and Desmond/Maestro (Schrödinger Inc.), popular in industry and certain research environments, but it seems that routine academic work is based on basically freely available software.

MD Codes

One of the first codes was CHARMM (commercial version is called CHARMm) developed originally at Harvard University. This suite of programs is very versatile and contains all major computational methods. It is parallelized and allows also for QM/MM simulations (Brooks et al. 2009). AMBER force field is popularly used for modeling of nucleic acids, but thousands of simulations of proteins have been published as well (Case et al. 2005; Salomon-Ferrer et al. 2013). The most recent, 14th, version of AMBER is heavily optimized with respect to performance (GPUs) and contains such advanced techniques as locally enhanced sampling and SCC-DFTB QM/MM methods.

In Europe, the GROMACS code has growing popularity due to its speed and good scaling on parallel clusters (Pronk et al. 2013; Van Der Spoel et al. 2005). The methods of essential dynamics, principal component analysis, and flooding (Lange et al. 2006) are available here. The most recent version – GROMACS 5.1 – takes advantage of multicore processors present in modern PCs and workstations and runs not only under many Unix distributions but Windows as well (Pronk et al. 2013). Both CUDA and OpenCL graphics card environments are supported (Abraham et al. 2015).

In our lab, we are quite happy with the NAMD code having ~40,000 registered users (Phillips et al. 2005). It is well documented, relatively fast, well maintained, and often updated and scales nicely. It has certain flexibility in selection of the force field (CHARMM, AMBER). The authors have implemented locally enhanced sampling (LES), implicit ligand sampling (ILS), replica exchange, and steered molecular dynamics (SMD) schemes. New versions of NAMD (2.11 as of 2015) will run effectively on GPUs; there are also attempts to port this code to a computational grid environment.

This short presentation of major packages devoted to protein dynamics simulations is very far from being complete. Some WWW services, including Wikipedia, try to maintain the updated lists of available MD codes.

Developments in Force Fields and Parametrization

MD data may give insights into chemical or biological phenomena only if calculations are based on good-quality force fields and adequate parameters. Thus, much effort is devoted to testing and development of advanced force fields (Lindorff-Larsen et al. 2012a; Pantelopulos et al. 2015). Some researchers argue that NMR measurements provide objective measures of quality of FFs (Robustelli et al. 2010; Beauchamp et al. 2012; Huang and MacKerell 2013), while others warn that such an approach is not necessarily correct (Martin-Garcia et al. 2015).

Subtle intermolecular interactions often depend on deformations of atomic electronic densities. To account for such effects in a classical FF, atomic polarizability has to be added. Calculations with such polarizable FFs are more time-consuming, but new phenomena may be studied using these more advanced protein models. The additive and polarizable variant of CHARMM FF has been very recently published (Vanommeslaeghe and MacKerell 2015). Other advances in polarizable force fields, with special attention devoted to long-time simulations, are discussed in Huang et al. (2014). Notably, a great deal of effort is devoted to the construction of new force fields suitable for modeling of medically important protein-surface interactions (Martin et al. 2015). Efforts to optimize implicit solvent FFs (Bottaro et al. 2013) are also promising, since longer and longer simulations are required, for example, in protein folding studies. A comprehensive discussion of the current status of FF developments may be found in the review by Lopes et al. (2015).

Programming user-friendly procedures for generating new parameters widens the area of MD applications. For example, a practical way to generate parameters (CHARMM, GROMACS) required for protein simulations with small organic ligands is to use the SwissParam server (Zoete et al. 2011). Another option for extending CHARMM MD simulations to new ligands is the tool kit ffTK implemented in newer versions of VMD (Mayne et al. 2013). For GROMACS users, an automated topology builder (Malde et al. 2011) may facilitate a lot of routine MD research on new proteins. Some hints on building new parameters may be found in Wang et al. (2014).

Visualization

In the era of the Internet and efficient graphics cards, we have tens of programs designed to visualize a protein structure. Each year brings new players in this competition. Many researchers prefer to pay license fee and to use professional graphics for visualization of their structure and MD data: Chimera, PyMOL, Discovery Studio, Maestro, HyperChem, Yasara, etc. Open software is also available, for example, OVITO (Alexander 2010) or Avogadro (Hanwell et al. 2012). A comprehensive list of currently used molecular visualization codes may be found in the book by Gu and Bourne (2009).

A user-friendly and robust code has evolved from the early Visual Molecular Dynamics (VMD) software, created by the K. Schulten group at UIUC in the USA (Humphrey et al. 1996). There are thousands of users of this package. An example of VMD visualization of the extracellular matrix synaptic protein reelin is presented in Fig. 3. Preparation of a similar picture is not difficult if one uses the VMD guide (Hsin et al. 2008) critically (Knapp and Schreiner 2009).

Fig. 3

A fragment of the extracellular matrix protein reelin (2ee2 PDB code, two out of eight BNR-EGF-BNR repeats are shown), which plays a role in synaptic plasticity and maintenance of synaptic function, visualized using the VMD code (Humphrey et al. 1996). NAG cofactors are shown in ball-and-stick representation, green spheres represent Ca2+ ions, the surface denotes EGF parts, and a disulfide bridge critical for mechanical resistance is shown as a red mesh. Abnormal reelin expression is observed in autism spectrum disorders, schizophrenia, or Alzheimer’s disease (author: K. Mikulska, UMK, Poland)

Review of Reviews

Since the very first proceedings of the CECAM workshop in France devoted to models for protein dynamics (Berendsen 1976), details of computer simulations of physical and biological systems have been described in many comprehensive books. For example, a good starting point may be the book by Allen and Tildesley, published as early as 1987 (Allen and Tildesley 1987). It is devoted mainly to simulations of liquids, but many methodological aspects of computational modeling of physical systems are well covered. Details of molecular simulations are presented by Haile (1992). In a classical text by Rapaport (1995), explanations of basic software and algorithms may be found. The introductory but very informative book by Frenkel and Smit focuses on Monte Carlo and molecular dynamics methodologies. The authors analyze algorithms and present useful FORTRAN-based pseudocodes for basic steps of simulations (Frenkel and Smit 2001). Broader aspects of molecular modeling are covered by A. Leach (2001). Besides the main algorithms and methods of computational chemistry, this book presents methods of protein structure prediction, free energy calculations, solvation, and drug design applications. The book coauthored by M. Karplus, one of the founding fathers of MD simulations of proteins, is a valuable source of information for everyone interested in protein dynamics (Becker and Karplus 2006).

Excellent reviews on various aspects of biomolecular modeling are published quite often. Here only a very concise, subjective, and limited review of the recent (i.e., published in the twenty-first century) reviews is presented, just to provide a handy reference to further search for relevant information.

A large body of proteins perform catalytic functions. MD calculations of enzymatic mechanisms are a great challenge to theory, and the best strategy for simulations is a matter of continuous debate. In reviews by A. Warshel et al. (Warshel 2002, 2003), the main aspects of a proper understanding of catalysis are described. Modeling of chemical reactions requires a special approach; some possibilities are outlined in van Speybroeck and Meier (2003), and a comprehensive review of computational enzymology, largely based on QM/MM methods, may be found in Lonsdale et al. (2010). Yet another class of problems arises when protein-protein interactions are modeled (Ritchie 2008).

Basic methods and main applications of computer modeling of biosystems are presented in a comprehensive work by Schlick (2010), a review by Goodfellow et al. (Moraitakis et al. 2003), a paper by van Gunsteren et al. (Hansson et al. 2002), or the review by K. Kremer (2003). The historical development of MD techniques is presented by Field (2015). The protein modeling field is outlined by Naray-Szabo et al. in the previous 2012 edition of the Handbook of Computational Chemistry (Leszczynski 2012) and in a more recent comprehensive and general review by Orozco (2014). Useful accounts are presented in a recent review by Lorenz and Doltsinis (2012). Studies of membrane proteins are particularly interesting (Gumbart et al. 2005; Stansfeld and Sansom 2011a, b), since receptor proteins are common and attractive drug targets (Borhani and Shaw 2012). One can also find reviews dedicated to applications of simulations in narrow subdisciplines, such as biotechnology (Aksimentiev et al. 2008). It is worth noting that a more integrative approach, combining MD simulations, experimental data, and network analysis, is advocated as a tool for understanding biological processes in the cell (Papaleo 2015).

An account of classical methods, problems, and goals of biomolecular simulations is given in a comprehensive article by van Gunsteren et al. (2006). Focused mainly on proteins, a very informative paper by Adcock and McCammon provides an excellent description of methods and key results from MD (Adcock and McCammon 2006). Methodological advances were also reviewed at the same time: Chu et al. popularized multiscale simulations (Chu et al. 2007), Elber et al. critically commented on the literature on long-time simulation methods (Dal Peraro et al. 2005), and Liwo et al. enumerated efficient methods of sampling proteins’ conformational space (Liwo et al. 2008). A newer review of enhanced sampling techniques has been published by Fujisaki et al. (2015). Coarse-grained MD simulations have a growing importance in computational biochemistry and biology. Numerous excellent reviews have been published recently (Marrink and Tieleman 2013; Saunders and Voth 2013; Kmiecik et al. 2014; May et al. 2014; Vicatos et al. 2014; Ingolfsson et al. 2014; Barnoud and Monticelli 2015).

Enhanced sampling of conformational space has always been a promising keyword for progress in MD simulation methods (Spiwok et al. 2015; Fujisaki et al. 2015). Collective variables may reduce the dimensionality of the space to be sampled (Vashisth et al. 2014), but typically metadynamics, replica exchange, or Gaussian accelerated MD and their variants are extensively explored (Granata et al. 2013; Abrams and Bussi 2013; Comer et al. 2014; Andersen et al. 2015; Miao et al. 2015).
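To make the replica exchange idea more concrete, the following is a minimal sketch of the Metropolis acceptance test used in textbook temperature replica exchange. It is not taken from any of the works cited above. It assumes potential energies in kcal/mol and temperatures in kelvin, and the function names swap_probability and attempt_swap are illustrative only.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def swap_probability(e_i, e_j, t_i, t_j):
    """Probability of exchanging configurations between two replicas.

    e_i, e_j: potential energies (kcal/mol) of the configurations
              currently simulated at temperatures t_i and t_j (K).
    Textbook criterion: min(1, exp[(beta_i - beta_j) * (e_i - e_j)]).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(e_i, e_j, t_i, t_j, rng=random):
    """Return True if the exchange is accepted by the Metropolis test."""
    return rng.random() < swap_probability(e_i, e_j, t_i, t_j)


if __name__ == "__main__":
    # Hypothetical energies for two neighboring replicas.
    print(swap_probability(-1005.0, -1000.0, 300.0, 310.0))
```

An exchange is always accepted when the hotter replica happens to hold the lower-energy configuration; otherwise the acceptance probability decays exponentially with the energy gap, which is why neighboring temperatures have to be spaced closely enough for their energy distributions to overlap.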

A lot of effort has been put into elaborating reliable and practical methods of calculating changes in free energies (Kholmurodov et al. 2003; Meirovitch 2007; Christ et al. 2010; Pohorille et al. 2010). Even the term “computational alchemy” has been coined for some counterintuitive but physically valid methods (Straatsma and McCammon 1992; Aleksandrov et al. 2010). Such data are required, for instance, in drug design (Galeazzi 2009; Morra et al. 2008). The growing importance of taking entropic effects into account is now widely accepted (Noel and Whitford 2014; Rydzewski et al. 2015a). Enhanced MD now helps to assess ligand-protein docking better (Andersen et al. 2015). Time-consuming free energy calculations may profit a lot from using enhanced sampling techniques as well (Bernardi et al. 2015; Miao et al. 2015).
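One common route to such free energy changes is thermodynamic integration, in which the ensemble average of dU/dλ is collected at several values of a coupling parameter λ and then integrated numerically from λ = 0 to λ = 1. The sketch below assumes that these averages are already available from separate simulations. The function name and the sample numbers are hypothetical, and the trapezoidal rule is only the simplest possible quadrature.

```python
def thermodynamic_integration(lambdas, dudl_means):
    """Estimate a free energy difference by trapezoidal integration.

    lambdas:    increasing coupling-parameter values, e.g. from 0.0 to 1.0
    dudl_means: ensemble averages <dU/dlambda> at those values
                (same energy unit as the returned free energy)
    """
    if len(lambdas) != len(dudl_means) or len(lambdas) < 2:
        raise ValueError("need at least two matching (lambda, <dU/dlambda>) pairs")
    total = 0.0
    for k in range(1, len(lambdas)):
        width = lambdas[k] - lambdas[k - 1]
        if width <= 0.0:
            raise ValueError("lambda values must be strictly increasing")
        # Area of one trapezoid between neighboring lambda windows.
        total += 0.5 * width * (dudl_means[k] + dudl_means[k - 1])
    return total


if __name__ == "__main__":
    # Hypothetical window averages in kcal/mol, for illustration only.
    windows = [0.0, 0.25, 0.5, 0.75, 1.0]
    averages = [4.1, 2.9, 1.6, 0.7, 0.2]
    print(thermodynamic_integration(windows, averages))
```

In practice the statistical error of each window average and the smoothness of the integrand along λ determine how many windows are needed; a finer λ grid is usually required where the averages change rapidly.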

Advantages of using the computational approach to chemistry and biology are outlined in a comprehensive review by Mulholland et al. (van der Kamp et al. 2008). Supercomputers contribute a lot to the currently flourishing state of MD simulations of large biosystems (Chipot 2015). One can also find more focused, specialized reviews of the MD approach to biological problems (Dodson et al. 2008; Avila et al. 2011) or to future nanotechnology applications of self-assembling systems (Klein and Shinoda 2008), biotechnology (Aksimentiev et al. 2008), and drug design (Borhani and Shaw 2012). Quite often modern simulation techniques are called a “computational microscope,” just to stress their role in molecular biology (Dror et al. 2012). Interestingly enough, with the petaflop computer systems currently available, multimillion-atom simulations are possible (Sanbonmatsu and Tung 2007). Promising efforts are being made to construct computers dedicated to MD simulations at the hardware level (Scarpazza et al. 2013). The success of the ANTON machines from D.E. Shaw Research is notable (Shaw et al. 2014).

Milliseconds in simulations of protein folding problems are currently sought (Bowman et al. 2011; Piana et al. 2014); thus, all means helping to reach this target are welcome (Lane et al. 2013), including optimization of MD codes (Krieger and Vriend 2015). Protein folding phenomena still pose more questions than answers (Dill and MacCallum 2012).

For sure, this field profits from the use of the Internet. Some groups have developed user-friendly and even charming portals to the simulation software (Miller et al. 2008). The Folding@Home project has reached 20-Tflop computer power (Lane et al. 2013), and this software is now available for other distributed computing environments (Eastman and Pande 2015). Other researchers, for example, the group of V. Daggett, invest in large-scale repositories of scientific data stemming from MD simulations (Van der Kamp et al. 2010). Sharing data, especially data obtained using advanced computer resources and easy to expose in clouds, is always a good idea. Such an “ocean” of numbers deserves careful scrutiny. Hopefully, some scientific treasures will be fished out in the near future.

Selected Examples of Applications of MD to Study Proteins

In this short chapter, we point out selected, representative, widely discussed problems where the MD methods have been applied.

Protein Folding Studies

Proteins are synthesized as linear polymers, but for their function, a precise 3-D structure is usually necessary. The process of folding to such a native structure may be conveniently studied using molecular dynamics simulations (Fersht and Daggett 2002; Zhang et al. 2009; Freddolino et al. 2010; Towse and Daggett 2013). The main obstacle in this research is the still relatively limited (100–1,000 ns) timescale accessible to standard modeling (Scheraga et al. 2007). However, more and more benchmark calculations reach microseconds (Piana et al. 2011), including classical studies of very small systems such as the villin headpiece fragment, one of the most stable and fastest-folding naturally occurring proteins (Duan and Kollman 1998; Ensign et al. 2007; Freddolino and Schulten 2009), or ubiquitin (Piana et al. 2013). Simulations help to propose universal folding mechanisms and to determine intermediates (Freddolino et al. 2008; Rizzuti and Daggett 2013; Piana et al. 2014). Better algorithms and huge computer power make it possible to perform meaningful folding simulations within a few days (Nguyen et al. 2014). Quite promising are new methodological developments that add core physical insights to standard Replica Exchange MD (REMD) (Perez et al. 2015) or even mix MD with experimental data (Sborgi et al. 2015). A lot of effort is devoted to finding an intelligent method of “ab initio” protein folding (Shakhnovich 2006; Blaszczyk et al. 2013). Progress may be checked by following the worldwide folding competition CASP (Kryshtafovych et al. 2016).

Intrinsically Disordered Proteins

Not all proteins are nicely folded, and numerous proteins remain disordered through the whole (or the majority) of their life-span in a cell. The paradigm that only well-structured biomolecules may play a significant physiological role was abandoned around the beginning of the twenty-first century (Wright and Dyson 1999). The role of intrinsically disordered proteins (IDPs) is intensively studied (Wright and Dyson 2015) and opens a very attractive field for MD simulations (Lindorff-Larsen et al. 2012b; Baker and Best 2014; Fu and Vendruscolo 2015). Notably, there is an ongoing discussion of which MD force field (Baker and Best 2013; Rauscher et al. 2015; Piana et al. 2015) and methodology (Do et al. 2014; Granata et al. 2015; Zerze et al. 2015) are best for IDP modeling.

Protein-Drug Interactions and Docking

The pharmaceutical industry badly needs reliable theoretical methods for calculating ligand binding affinities (Aqvist et al. 2002; Gallicchio and Levy 2011; Zhao and Caflisch 2015; Yuriev et al. 2015). The problem is not easy, since many factors, for example, multiple binding sites, protein flexibility, and the solvent model, affect the small value of the free energy of binding (Simonson et al. 2002; Deng and Roux 2009; Spyrakis et al. 2011). The chance of getting wrong results is high. Nevertheless, there are hundreds of papers in the literature devoted to protein-drug interactions. One of the main topics is HIV-related enzyme inhibitors (Monroe et al. 2014). The threat of a bird or swine flu pandemic triggered studies of interactions of neuraminidases or influenza A peptides with antiviral drugs (Le et al. 2011; Khurana et al. 2011). The role of induced fit effects during ligand docking to steroid hormone-binding receptors was analyzed by Cornell and Lam (2009). MD may be used in anticancer drug development (Rosales-Hernandez et al. 2009; Lauria et al. 2010), in the design of GPCR inhibitors/activators (Dror et al. 2013; Tautermann et al. 2015), or in studies of inhalational anesthetics’ interactions with proteins (Vemparala et al. 2010). Microsecond-timescale MD studies explained the mechanisms of chemokine receptor CCR5 inactivation (Salmas et al. 2015), macrolide binding to the ribosome (Sothiselvam et al. 2014), and opioid receptor activation (Huang et al. 2015). A review of MD applications in drug design, especially concerning the stability of drug-macromolecule complexes, has been recently published (Mortier et al. 2015).

Spectroscopy Experiments

Spectroscopy is an extremely useful analytical and diagnostic technique with wide applications in chemistry, physics, life sciences, industry, medicine, etc. Molecular dynamics helps to interpret experimental data; some examples are given below. Sen et al. used simulations to explain time-resolved Stokes-shift experiments with biopolymers (Sen et al. 2009). Fluorescent proteins, especially those based on GFP, are often studied computationally. Sun et al. applied QM/MM and MD to explain the dependence of red fluorescent protein on the pH of the environment (Sun et al. 2010). Computational studies of energy transduction in photoactive yellow protein may give useful hints for spectroscopic studies (Gamiz-Hernandez and Kaila 2016), as may the modeling of visual pigments (Brunk and Rothlisberger 2015; Wanko et al. 2006). Quite often, the interpretation of NMR experiments profits from simulations, such as studies of lipid-binding sites of a neurotoxin (Weber et al. 2015) or evaluation of rotational diffusion constants from MD (Wong and Case 2008). On the other hand, NMR spin relaxation data help to improve force fields (Beauchamp et al. 2012).

Functionally Important Motions (FIMs)

Some motions of proteins are critical for their proper functioning (Henzler-Wildman and Kern 2007). MD simulations may identify such modes. The relation between a protein’s mechanics and function was reviewed in 2003 by Schulten (Tajkhorshid et al. 2003), and since that time, this group has studied new problems, for example, plant phototropism (Freddolino et al. 2006b) and vibrations of the complete satellite tobacco mosaic virus (Freddolino et al. 2006a). Other groups investigated large-scale motions in biosensors (Tatke et al. 2008), linker motions crucial for ligase (Liu and Nussinov 2010), or retinal release from opsin (Wang and Duan 2011). Extraction of information on FIMs needs special methodology (Schuyler et al. 2009), such as essential dynamics (Amadei et al. 1993) or metadynamics (Biarnes et al. 2011); thus, new ways of MD data analysis are suggested (Hub and de Groot 2009; Vuillon and Lesieur 2015). A critical assessment of FIM analysis methods may be found in Moradi and Tajkhorshid (2014). The main global motions are encoded in protein structures, and these features may be explored in protein-ligand studies using normal mode analysis (Bahar et al. 2015). A very interesting step toward a systematic classification of the dynamical properties of proteins, called the dynasome, has been proposed by Hensen et al. (2012). Studies of over 100 distinct proteins show that internal mobility patterns exhibit a rather continuous distribution. A strong correlation between structure and dynamics has, however, been noted.
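The core of essential dynamics is a principal component analysis of atomic positional fluctuations: the covariance matrix of the coordinates is built from trajectory frames, and its eigenvectors with the largest eigenvalues describe the dominant collective motions. The sketch below illustrates only this step, under the assumption that the frames have already been superimposed on a reference structure and flattened into coordinate lists. It uses plain power iteration from the standard library to stay self-contained; production analyses normally rely on a full eigensolver from a numerical library or an MD analysis package. All names are illustrative.

```python
import math


def covariance_matrix(frames):
    """Covariance of coordinates over frames.

    frames: list of snapshots, each a flat list [x1, y1, z1, x2, ...],
            assumed to be already fitted to a common reference.
    """
    n = len(frames)
    if n < 2:
        raise ValueError("at least two frames are required")
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for f in frames:
        dev = [f[d] - mean[d] for d in range(dim)]
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += dev[a] * dev[b]
    return [[c / (n - 1) for c in row] for row in cov]


def dominant_mode(cov, iterations=200):
    """Largest eigenvalue and its eigenvector by power iteration."""
    dim = len(cov)
    # Slightly asymmetric start vector, to reduce the chance of starting
    # exactly orthogonal to the dominant eigenvector.
    v = [1.0 + 0.1 * d for d in range(dim)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the normalized vector gives the eigenvalue.
    cv = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
    eigenvalue = sum(v[a] * cv[a] for a in range(dim))
    return eigenvalue, v


if __name__ == "__main__":
    # Toy two-coordinate "trajectory": large motion along the first axis.
    toy = [[-2.0, 0.1], [-1.0, -0.1], [0.0, 0.0], [1.0, 0.1], [2.0, -0.1]]
    print(dominant_mode(covariance_matrix(toy)))
```

The eigenvalue is the variance of the motion along the returned mode, so a few modes with large eigenvalues often account for most of the total fluctuation of a protein.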

Molecular Machines

Having such powerful computers at hand, we are ready to study molecular machines, that is, proteins or bio-complexes that perform some mechanical work during their activity cycle (Scheres 2010; Elber and Kirmizialtin 2013). Rotations of parts of ATPase were studied by Ma et al. as early as 2002 (Ma et al. 2002) and molecular rotation in ATP synthase by Aksimentiev et al. in 2004 (Aksimentiev et al. 2004). Solvent-induced lid opening in lipases was also analyzed computationally (Rehm et al. 2010). Even simulations of whole cellular mechanics were reviewed by Gao et al. (2006). In recent years, a flood of papers on the fascinating phenomenon of molecular machines has been observed (Kutzner et al. 2011; Sanbonmatsu 2012; Bock et al. 2013; Czub and Grubmuller 2014; Ito and Ikeguchi 2014; Mukherjee and Warshel 2012, 2015a, b; Ma and Schulten 2015).

Mechanosensitive ion channels may be considered molecular machines, too, and ion gating is now better understood thanks to MD (Vasquez et al. 2008). Simulations were applied to the helicase motor (Dittrich and Schulten 2006; Yu et al. 2007; Flechsig and Mikhailov 2010) and to perhaps the most advanced object (not counting the more static virus capsids): the ribosome (Becker et al. 2009; Romanowska et al. 2008; Trylska 2010). Recent progress in ribosome simulations has been summarized by Sanbonmatsu (2012). MD applications in this field obviously profit from 3-D animation; such tools bring new dimensions to protein science (Iwasa 2015). More information on large macromolecular complexes may be found in Perilla et al. (2015).

The Mechanism of Enzymatic Activity

Progress in computational enzymology is described in Lonsdale et al. (2010) and Carvalho et al. (2014). Reactions have to be treated using quantum mechanics, but some features of enzyme dynamics may be revealed by MD. For example, Peplowski et al. used the steered MD method to enforce ligand transport within the biotechnological enzyme nitrile hydratase (Peplowski et al. 2008). In that way, residues that may change the catalytic properties of this metalloprotein have been indicated. Carloni et al. used the MD approach to explain the mode of action of G proteins (Khafizov et al. 2009). Lid opening in the proteasome has been studied in detail (Ishida 2014). Worth checking are papers devoted to computational studies of the model enzyme dihydrofolate reductase, since this enzyme is a target in cancer therapy and anti-malaria drug development (Kohen 2015). MD may also serve as a useful tool in designing new enzymatic functions (Damborsky and Brezovsky 2014). Free energy calculations in the context of enzymatic activity have become easier to perform thanks to empirical valence bond (EVB) + MD software developments (Isaksen et al. 2015). Useful guidelines on how to perform free energy simulations effectively may be found in Klimovich et al. (2015).

Transport Phenomena in Proteins

Heme proteins are popular objects of MD simulations (Bikiel et al. 2006) since transport processes of small gaseous ligands (O2, NO, CO) are important for physiology, and therefore they serve as a “test ground” for new methods. Myoglobin is even sometimes called the “hydrogen atom” of MD simulations (Elber 2010). Other members of the heme globin family have been discovered more recently, and studies of diffusion paths and free energy landscapes for neuroglobin (Orlowski and Nowak 2008), cytoglobin (Orlowski and Nowak 2007), or protoglobin (Forti et al. 2011a, b) were published. It seems that such systematic studies allow for the construction of a uniform picture of ligand migration pathways and the evolution of transport protein structure (Cohen et al. 2008; Forti et al. 2011b). MD simulations are helpful in the explanation of complex enzymatic mechanisms, for example, of dehydrogenase/acetyl-CoA synthase (Wang et al. 2013) or cytochrome c oxidases (Oliveira et al. 2014), and in the interpretation of X-ray experiments (Tsuduki et al. 2012). However, determination of diffusion paths for larger ligands is not trivial (Kingsley and Lill 2015). The standard MD protocol is not practical, since the simulation times required to collect reasonable statistics are long. Special variants of steered molecular dynamics are useful in such problems; see Fig. 4. A review of recent methodological progress, together with new ideas based on memetic algorithms, is presented in a recent paper by Rydzewski and Nowak (2015).

Fig. 4

A schematic view of complex access/exit paths PW1–PW3 of camphor ligand (licorice) in cytochrome P450cam enzyme calculated by our fast SMD memetic algorithm (Rydzewski and Nowak 2015). The figure was prepared using the PyMOL code (DeLano 2002) by J. Rydzewski

Structure and Dynamics of Ion Channels and Porins

Dynamics of and transport through ion channels and other pores in biological membranes are a subject of vigorous research (Khalili-Araghi et al. 2009; Sigg 2014). Among many papers, the work on channel gating by the M. Sansom group (Fowler and Sansom 2013; Aryal et al. 2015) is worth mentioning, as are the computational studies of ion channels by B. Roux et al. (Horn et al. 2014; Li et al. 2014) and numerous papers on ion conductance in a potassium channel (Boiteux et al. 2007; Kim and Warshel 2014; Delemotte et al. 2015; Linder et al. 2015) and in sodium channels (Li and Gong 2015). The main research groups have tried to formulate a uniform picture of the voltage-dependent ion channel conductance mechanism (Vargas et al. 2012). One should note that an external electric field has a critical impact on ion channel activity and that this effect may be effectively modeled (English and Waldron 2015).

References to interesting computational studies of aquaporins may be found in a review by Hub et al. (2009). The water transport in eye lenses was investigated using large-scale simulations by D.E. Shaw group (Ikeguchi 2009). At the same time, K. Schulten’s team has modeled nanomechanics of RNA in nanopores (Miao and Schulten 2009; Khalili-Araghi et al. 2009). More recently, a huge simulation of nanopore formation provided new data on nuclear pore gating (Gamini et al. 2014).

Sodium symporter modeling has been recently reviewed by Bisha and Magistrato (2016); other symporters are described in Kardos and Héja (2015) or Espinoza-Fonseca and Ramírez-Salinas (2015). Increasing computational resources open up the possibility of all-atom simulations of protein translocation through a nanopore (Di Marino et al. 2015).

Protein-DNA Interactions

Transfer of information from DNA to proteins and its impact on the activity of the whole cell are determined by protein-DNA interactions. Such complexes are difficult to model due to their size and heterogeneity, but a good understanding of these systems is a key to genetics (Mac Kerell and Nilsson 2008; Rohs et al. 2009). Good examples of successful MD applications are studies of p53 protein binding modes to DNA quadruplexes (Ma and Levine 2007), investigations of zinc-finger proteins binding to DNA (Lee et al. 2010), or modeling of DNA bending (van der Vaart 2015). Details of the mechanism of Lac repressor interactions (Villa et al. 2005) and sliding along DNA (Furini et al. 2010) have been discovered by MD methods. There is a growing interest in studies of proteins related to clustered regularly interspaced short palindromic repeats (CRISPRs) present in prokaryotic cells. A recent example of an insightful study of a CRISPR-related endoribonuclease may be found in Estarellas et al. (2015).

Origins of Molecular Diseases

Applications of computer simulations in the areas related to medical problems are numerous: funds come both from the public budgets and private charities. Many diseases have well-defined etiology, related to abnormalities in a protein structure, point mutations, etc. Such “molecular diseases” are popular objects of the theoretical modeling (Papaleo and Invernizzi 2011).

The epidemic of BSE spawned great interest in prion protein research. Simulations of folding (Sakudo et al. 2010; Kupfer et al. 2009) evolved into studies of membrane-bound complexes (DeMarco and Daggett 2009). Alzheimer’s disease is related to amyloid fiber formation in the brain, and this process has been successfully modeled, too (Urbanc et al. 2010; Kassler et al. 2010; Straub and Thirumalai 2010; Nasica-Labouze et al. 2015). Less known are studies of transthyretin (TTR) fibril formation (Ortore and Martinelli 2012; Rodrigues et al. 2010). However, the problem is serious since the aggregation of TTR leads to a lethal illness. Some impressive studies of HIV should also be noted (Carnevale et al. 2009; Zhao et al. 2013). There are examples of computational research on proteins involved in primary congenital glaucoma (Achary and Nagarajaram 2009), osteoporosis (Lee et al. 2011), epidermolysis bullosa simplex (Jankowski et al. 2014), autism spectrum disorders (Mikulska et al. 2011), or cancer (Kumar and Purohit 2014). An excellent example of an exploratory MD study of point mutations in the CFTR chloride channel involved in cystic fibrosis is the recent paper by Mornon et al. (2015).

Simulations of Single-Molecule AFM Experiments

In the opinion of this author (and many others as well), non-equilibrium protein dynamics has an excellent overlap with single-molecule experiments performed with the atomic force microscope (AFM) (Nowak and Marszalek 2005; Kumar and Li 2010; Galera-Prat et al. 2010). One of the first computational studies of enforced ligand-antibody dissociation was published in 2001 by Karplus et al. (Paci et al. 2001). Stretching of individual molecules by the AFM cantilever gives force spectra, i.e., the dependence of a force (in the range 0–3,000 pN) on protein extension (0–100 nm or even more) (Rief and Grubmuller 2002), which may be compared with special types of MD simulations, for example, steered molecular dynamics (SMD) (Nowak and Marszalek 2005) or even quantum-type SMD (Lu et al. 2004). An example of SMD stretching of reelin (see Fig. 2), a protein important in neural system development and proper functioning of synapses, is presented in Fig. 5.
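A simulated force spectrum is typically built from the force in a virtual spring that pulls the molecule. The sketch below assumes the widely used constant-velocity SMD setup, in which a harmonic spring of stiffness k connects the pulled atom to a point moving with constant velocity v, so that F(t) = k (v t − Δx), where Δx is the displacement of the atom along the pulling direction. The units (kcal/mol/Å² for k, Å/ps for v) and all names are assumptions made for illustration, not a description of any particular MD code.

```python
AVOGADRO = 6.02214076e23  # 1/mol
JOULE_PER_KCAL = 4184.0   # J/kcal

# 1 kcal/(mol*Angstrom) expressed in piconewtons (about 69.5 pN).
PN_PER_KCAL_MOL_ANGSTROM = JOULE_PER_KCAL / AVOGADRO / 1.0e-10 * 1.0e12


def smd_force_pn(k, v, t, displacement):
    """Spring force in pN for constant-velocity pulling.

    k:            spring constant, kcal/(mol*Angstrom^2)
    v:            pulling velocity, Angstrom/ps
    t:            elapsed time, ps
    displacement: shift of the pulled atom along the pulling
                  direction since t = 0, Angstrom
    """
    spring_extension = v * t - displacement
    return k * spring_extension * PN_PER_KCAL_MOL_ANGSTROM


if __name__ == "__main__":
    # Hypothetical numbers: the spring anchor has moved 1.0 Angstrom,
    # while the pulled atom has followed by only 0.5 Angstrom.
    print(smd_force_pn(k=1.0, v=0.001, t=1000.0, displacement=0.5))
```

Recording this force together with the end-to-end extension at every output step gives a force-extension curve that can be set against the AFM force spectra discussed above.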

Fig. 5

An example of mechanical unfolding scenario of modular protein reelin recovered by SMD (CHARMM27 force field, NAMD code (Phillips et al. 2005)). K. Mikulska, W. Nowak (unpublished results). The figure was prepared using the VMD code (Humphrey et al. 1996)

One can see that despite the rather low stretching velocity, the computed SMD forces are much higher (1–2 nN) than those (100–300 pN) typically observed in AFM experiments (Mikulska et al. 2012); this is related to the high loading rates imposed by the limited SMD time (G. Lee et al. 2004). Good progress toward consistency between SMD and high-speed AFM results is observed (Sotomayor and Schulten 2007; Rico et al. 2013), especially if coarse-grained models of proteins are employed (Kumar and Li 2010; Galera-Prat et al. 2010; Mikulska et al. 2014; Chwastyk et al. 2014; Chen et al. 2016). Free energy profiles along a stretching coordinate may be calculated using the Jarzynski equality (Hummer and Szabo 2010; Rydzewski et al. 2015). Therefore, more and more laboratories use SMD modeling as a tool to interpret the results of single-molecule force spectroscopy (SMFS) (He et al. 2012) and optical tweezers (Sieben et al. 2012) experiments. A computational microscope, postulated a long time ago (Lee et al. 2009), is at hand.
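The Jarzynski equality relates the equilibrium free energy difference ΔF to an exponential average of the work W accumulated in repeated non-equilibrium pulling runs: exp(−ΔF/kT) = ⟨exp(−W/kT)⟩. The sketch below is a bare-bones estimator under the assumption that the work values (in kcal/mol) from independent SMD trajectories are already available. The function name and the sample numbers are hypothetical, and the log-sum-exp form is used only for numerical stability.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def jarzynski_free_energy(work_values, temperature=300.0):
    """Free energy difference (kcal/mol) from non-equilibrium work values.

    Implements dF = -kT * ln( mean( exp(-W / kT) ) ).
    """
    if not work_values:
        raise ValueError("at least one work value is required")
    beta = 1.0 / (K_B * temperature)
    exponents = [-beta * w for w in work_values]
    shift = max(exponents)  # largest term dominates the average
    mean_exp = sum(math.exp(x - shift) for x in exponents) / len(exponents)
    return -(shift + math.log(mean_exp)) / beta


if __name__ == "__main__":
    # Hypothetical work values from five pulling runs, in kcal/mol.
    works = [12.4, 10.9, 14.2, 11.7, 13.1]
    print(jarzynski_free_energy(works))
```

Because the average is dominated by rare trajectories with low work, the estimate never exceeds the mean work and converges slowly when the pulling is fast, which is one reason why the high loading rates of SMD make quantitative comparison with experiment demanding.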

Conclusions and Future Directions

Computational chemistry is currently a well-established, fully functional branch of science (Akimov and Prezhdo 2015). Easy access to computers and high-quality, specialized computer codes results in a myriad of applications. There is a large, well-trained, and active community of computational chemists. Results of calculations are useful and would be hard to obtain without the theoretical and computational approaches. Virtually all chemical systems and a large body of biological systems may be modeled using current facilities.

It appears that computer simulations of proteins still have very good prospects ahead. There are new, promising directions of investigation. Greater computational power offers the possibility of real-time calculations of protein dynamics, including aggregates and membrane-embedded systems. MD simulations of proteins may help in rational drug design (Harvey and De Fabritiis 2012). Electronic excited states of proteins are an unexplored area. The methodology of MD simulations of photoexcited states is still not well established (Kubiak and Nowak 2008) (Rydzewski and Nowak, chapter "Molecular Dynamics in Excited State"), but future MD applications should include interactions of biological systems with light (Dittrich et al. 2005; Hayashi et al. 2009; Rossle and Frank 2009). For example, good progress in understanding protonic gating in the popular green fluorescent protein chromophore has been achieved through dynamical simulations (Olsen et al. 2010). Similar studies related to optogenetics will appear soon. Every month brings new and better software for extensive analysis of MD trajectories (Zwier et al. 2015) or new ways of doing MD (Kukic et al. 2015).

The interpretation of special experiments, such as high-pressure studies (Paci 2002) and especially cryo-electron microscopy (McGreevy et al. 2016) or X-ray data fitting (McGreevy et al. 2014), may be fruitfully augmented by computational modeling. Computer scientists and physicists work hard to enlarge the maximum size of the simulated system. An all-atom simulation of a whole virus is no longer a record study (Zink and Grubmuller 2009; Sanbonmatsu and Tung 2007; Zhao et al. 2013; Goh et al. 2015), the field of computational virology has been established (Reddy and Sansom 2016), and there are plans to use computational methods in the design of useful viruses (Zhang et al. 2015, p. 8316). However, one should note that the recent record 64-million-atom NAMD simulations required 8,000 Cray XK7 nodes for a period of 2 months (Perilla et al. 2015).

A long time ago, it was shown at Los Alamos National Laboratory that even 320-million-atom protein simulations are technically possible (2009). Large systems, such as big protein complexes in the living cell, will require coarse-grained approaches (Clementi 2008). A lot of insight comes from such modeling of membrane proteins (Sansom et al. 2008; Ayton et al. 2010; Stansfeld et al. 2015). Multiscale modeling is yet another line of methodological progress (Tozzini 2010; Nielsen et al. 2010; Goga et al. 2015). Such techniques expand the timescale accessible to MD studies beyond the 1 μs limit (Chu et al. 2007; Ayton et al. 2007; Sherwood et al. 2008). Impressive microsecond-timescale, coarse-grained Martini FF molecular dynamics simulations of enveloped virions in explicit solvent (5,000,000 particles) were performed by the Sansom group at Oxford (Reddy et al. 2015). The calculated properties of the influenza A virion were consistent with experimental measurements.

Which technological strategy gives the optimum performance of MD simulations is a matter of debate (Borell 2008; Goga et al. 2015). For example, D.E. Shaw has invested a lot of resources in developing dedicated chips with record performance for specialized tasks (Klepeis et al. 2009). Other groups, such as K. Schulten’s team, prefer improving algorithms and developing software for running calculations on very powerful, relatively inexpensive graphics processing units, mass-produced for computer game fans (Hardy et al. 2009; Stone et al. 2007; Zhmurov et al. 2010).

Both approaches have difficulty surpassing the grid computing idea: the “Folding@Home” project created perhaps the most powerful computational device ever and attracted a lot of young people to science (Pande et al. 2003; Belden et al. 2015). Hopefully, all this computational chemistry modeling effort will bring us a better understanding of nature and a better life for everybody.