1 Introduction

Analysis of potential energy surfaces (PES) provides useful information about the properties of the system at hand. The shape of the PES in the vicinity of important stationary points, such as minima (i.e., stable states) and first-order saddle points [i.e., transition states (TS)], is a key ingredient for calculations based on harmonic transition state theory [1], which is presently the most popular method to study reaction kinetics. Furthermore, relaxations are often performed as a first step in the analysis of diverse physical properties of the system at hand, and hence they represent one of the most common tasks executed by computational chemists. The determination of minima is relatively straightforward: in principle, one can just follow the direction of the negative gradient to descend from an arbitrary initial point to some local minimum. In a first-order saddle point optimization, however, the energy must be maximized along one and only one direction, parallel with the reaction coordinate (RC), and minimized along all remaining independent degrees of freedom. The fact that the reaction coordinate is usually unknown a priori is one of the reasons why TS relaxations are typically much more difficult to perform than optimizations of stable states. Hence the computational algorithms designed for TS optimization require, one way or another, an initial guess for the unstable direction. Algorithms such as the quasi-Newton method [2], the partitioned rational function optimization [3] (P-RFO), or the geometrical direct inversion in the iterative subspace (DIIS) method [4] receive this information from the eigenpairs of the matrix of second-order partial derivatives of the energy with respect to the coordinates of the system (Hesse matrix or Hessian), which has to be provided, at least for the initial configuration, by the user. The initial unstable direction must be explicitly specified also in the dimer method [5], while the nudged elastic band method [6] (NEB) optimizes the whole RC connecting the initial and the final stable states, and the user must define several intermediate states lying on the guessed initial reaction coordinate. For a detailed review of TS optimization methods, the reader is referred, e.g., to Ref. [7].

Many chemical reactions and other activated processes of technological interest, such as surface diffusion and phase transitions, occur in the solid state. For practical reasons, these processes are commonly simulated using periodic boundary conditions (PBC). In addition to atomic positions, periodic systems are also defined by the size and shape of their unit cells. Methods allowing for a simultaneous optimization of atomic and lattice degrees of freedom, such as the solid state NEB [8, 9] and the solid state dimer methods [10], are surprisingly scarce. In this work, we extend our algorithm for the relaxation of periodic systems [11] in delocalized internal coordinates [12], which we briefly review in Sect. 2.1, to geometry optimizations of transition states. The usefulness of internal coordinates in TS optimizations has already been well documented for molecular systems [13,14,15]. One of the greatest advantages of methods based on internal coordinates is the availability of simple and reasonable models for the initial Hesse matrix. In Sect. 2.2, we define the Hesse matrix for use in periodic TS optimizations and we suggest a simple model allowing for its inexpensive construction. The abilities of our algorithm are demonstrated in Sect. 3, where optimizations of various molecular and periodic systems with and without additional geometric constraints are discussed. Finally, a summary and conclusions are presented in Sect. 4.

2 Methods

2.1 Delocalized internal coordinates for optimization of periodic systems

In this section, we briefly review the use of delocalized internal coordinates of Baker et al. [12] in optimization of periodic systems, which we proposed in our previous work [11]. We note that an alternative formulation of the same problem has been reported by Andzelm et al. [16].

Let us consider a periodic system consisting of \({N}_\mathrm{at}\) atoms contained within a single unit cell formed by three lattice vectors \(\varvec{a}_1\), \(\varvec{a}_2\), and \(\varvec{a}_3\), which we arrange in the matrix \(\varvec{h}=[\varvec{a}_1,\varvec{a}_2,\varvec{a}_3]\). It is convenient to define positions of atoms within the unit cell by fractional coordinates \(\varvec{s}=\{s_\alpha ^{a};a=1,\ldots ,N_\mathrm{at};\alpha =1\equiv a_1,2\equiv a_2,3 \equiv a_3\}\), which, by convention, fulfill the condition \(0 \le s_\alpha ^{a} < 1\), and which are related to Cartesian coordinates of the same set of atoms \(\varvec{r}=\{r_\alpha ^{a};a=1,\ldots ,N_\mathrm{at};\alpha =1 \equiv x,2 \equiv y,3 \equiv z\}\) shifted by the \(\varvec{L}=(l_1,l_2,l_3)\) multiples of the lattice vectors by the following linear transformation:

$$\begin{aligned} r_\alpha ^{a,\varvec{L}_a} = \sum _{\beta =1}^3 \, h_{\alpha \beta } (s^{a}_\beta + l_\beta ^{a}). \end{aligned}$$
(1)
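As an illustration of Eq. 1, the following minimal sketch converts fractional coordinates to Cartesian coordinates of atoms in a shifted cell. The function name `frac_to_cart` and the cell and coordinates used in the example are hypothetical and serve only to show the index convention (lattice vectors stored as columns of \(\varvec{h}\)).

```python
import numpy as np

def frac_to_cart(h, s, l=None):
    """Eq. 1: r = h (s + l).

    h : (3, 3) array whose columns are the lattice vectors a1, a2, a3
    s : (N_at, 3) fractional coordinates
    l : (N_at, 3) integer cell shifts (defaults to the reference cell, L = 0)
    """
    s = np.asarray(s, dtype=float)
    l = np.zeros_like(s) if l is None else np.asarray(l, dtype=float)
    # Row-vector form of r_alpha = sum_beta h_{alpha beta} (s_beta + l_beta)
    return (s + l) @ np.asarray(h, dtype=float).T

# Hypothetical orthorhombic cell and two atoms (illustration only)
h = np.diag([2.7, 10.0, 10.0])
s = np.array([[0.1, 0.5, 0.5], [0.5, 0.5, 0.5]])
r_ref = frac_to_cart(h, s)                                  # reference cell
r_img = frac_to_cart(h, s, l=[[1, 0, 0], [1, 0, 0]])        # image shifted by a1
```

Shifting both atoms by \(\varvec{L}=(1,0,0)\) displaces their Cartesian positions by exactly the first lattice vector, as expected.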

Primitive internal coordinates (\(\varvec{q}\)) for any connected set of atoms are defined by collecting all interatomic distances within expected bond-lengths. Subsequently, bond-lengths are combined to define bond-angles and dihedral angles. Note that for the optimization of periodic systems, both the intra-cell (i.e., those defined by atoms from the same unit cell) and inter-cell (i.e., those defined by atoms from different cells) coordinates must be taken into account. Due to the translational symmetry, however, only the coordinates which are defined via at least one atom located in a reference cell (i.e., \(\varvec{L}=0\)) have to be considered. Having accumulated the information on connectivity of individual atoms, mutually unconnected molecular fragments can be identified and interconnected using suitable coordinates, such as inverse-power distances [17]. The number of internal coordinates generated in such a way is typically much greater than the number of independent degrees of freedom of the system (e.g., \(3{N}_\mathrm{at}+3\) for an unconstrained three-dimensional periodic system). Baker et al. [12] proposed to replace such a redundant set of internal coordinates by a non-redundant set of delocalized internal coordinates (\(\varvec{\tilde{q}}\)), related to \(\varvec{q}\) via the following transformation:

$$\begin{aligned} \varvec{\tilde{q}}=\varvec{U^t}\varvec{q}, \end{aligned}$$
(2)

where the transformation matrix \(\varvec{U}\) is formed by the eigenvectors associated with nonzero eigenvalues of the matrix \(\varvec{B}\varvec{B^t}\). In the case of a molecular system, Wilson’s B-matrix [18] \(\varvec{B}\) is defined as the matrix of derivatives of \(\varvec{q}\) with respect to the Cartesian coordinates of all atoms (i.e., \(\delta {\varvec{q}} = \varvec{B} \delta {\varvec{r}}\)). As suggested in our previous work [11], the definition of \(\varvec{B}\) can be generalized for use in optimizations of periodic systems as follows:

$$\begin{aligned} \begin{pmatrix} \delta {\varvec{q}} \end{pmatrix}\quad =\begin{pmatrix} \varvec{B}^{qs}&\varvec{B}^{qh} \end{pmatrix}\quad \begin{pmatrix} \delta {\varvec{s}} \\ \delta {\varvec{h}} \end{pmatrix}, \end{aligned}$$
(3)

where the blocks \(\varvec{B}^{qs}\) and \(\varvec{B}^{qh}\) are defined by the following equations:

$$\begin{aligned} B^{qs}_{i,a_\alpha }= & {} \frac{\partial q_i\left( r^{a,\varvec{L}_a}_\alpha ,r^{b,\varvec{L}_b}_\beta ,\ldots \right) }{\partial s_{\alpha }^{a}} = \sum _{\varvec{L}_a} \sum _{\beta }\, \frac{\partial q_i}{\partial r_{\beta }^{a,\varvec{L}_a}} \frac{\partial r_{\beta }^{a,\varvec{L}_a}}{\partial s_{\alpha }^{a}} \nonumber \\= & {} \sum _{\varvec{L}_a} \sum _\beta \frac{\partial q_i}{\partial r_{\beta }^{a,\varvec{L}_a}}\, h_{\beta \alpha }, \end{aligned}$$
(4)

and

$$\begin{aligned} B^{qh}_{i,\alpha \beta }= & {} \frac{\partial q_i\left( r^{a,\varvec{L}_a}_\alpha ,r^{b,\varvec{L}_b}_\beta ,\ldots \right) }{\partial h_{\alpha \beta }} \nonumber \\= & {} \sum _{a,\varvec{L}_a} \sum _{\gamma } \frac{\partial q_i}{\partial r_{\gamma }^{a,\varvec{L}_a}} \frac{\partial r_{\gamma }^{a,\varvec{L}_a}}{\partial h_{\alpha \beta }}\nonumber \\= & {} \sum _{a,\varvec{L}_a} \sum _{\gamma } \frac{\partial q_i}{\partial r_{\gamma }^{a,\varvec{L}_a}}\delta _{\alpha \gamma } \left( s_{\beta }^{a} +l_\beta ^a\right) \nonumber \\= & {} \sum _{a,\varvec{L}_a} \frac{\partial q_i}{\partial r_{\alpha }^{a,\varvec{L}_a}} \left( s_{\beta }^{a} +l_\beta ^a\right) . \end{aligned}$$
(5)
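Equations 4 and 5 can be checked numerically for the simplest primitive coordinate, a bond length. The sketch below is an illustration under stated assumptions rather than the implementation used in our program: the helper names (`bond_length`, `bond_b_rows`, `numerical_b_rows`) and the triclinic test cell are hypothetical, and the bond connects atom 0 in the reference cell with atom 1 in the cell shifted by \(\varvec{a}_1\).

```python
import numpy as np

def bond_length(s, h, a, b, la, lb):
    """Distance between atom a in cell la and atom b in cell lb (via Eq. 1)."""
    return np.linalg.norm(h @ ((s[b] + lb) - (s[a] + la)))

def bond_b_rows(s, h, a, b, la, lb):
    """Analytic B^{qs} (N_at x 3) and B^{qh} (3 x 3) entries for one bond."""
    ds = (s[b] + lb) - (s[a] + la)       # difference of (s + l)
    d = h @ ds
    e = d / np.linalg.norm(d)            # dq/dr^b = e, dq/dr^a = -e
    bqs = np.zeros_like(s)
    bqs[a] -= h.T @ e                    # Eq. 4 for atom a
    bqs[b] += h.T @ e                    # Eq. 4 for atom b
    bqh = np.outer(e, ds)                # Eq. 5: sum over both atoms
    return bqs, bqh

def numerical_b_rows(s, h, a, b, la, lb, eps=1e-6):
    """Central finite-difference reference for the same derivatives."""
    bqs, bqh = np.zeros_like(s), np.zeros_like(h)
    for idx in np.ndindex(*s.shape):
        sp, sm = s.copy(), s.copy()
        sp[idx] += eps
        sm[idx] -= eps
        bqs[idx] = (bond_length(sp, h, a, b, la, lb)
                    - bond_length(sm, h, a, b, la, lb)) / (2 * eps)
    for idx in np.ndindex(*h.shape):
        hp, hm = h.copy(), h.copy()
        hp[idx] += eps
        hm[idx] -= eps
        bqh[idx] = (bond_length(s, hp, a, b, la, lb)
                    - bond_length(s, hm, a, b, la, lb)) / (2 * eps)
    return bqs, bqh

# Hypothetical triclinic cell (columns are lattice vectors) and two atoms
h = np.array([[4.0, 0.5, 0.2], [0.0, 3.5, 0.3], [0.0, 0.0, 5.0]])
s = np.array([[0.9, 0.2, 0.3], [0.1, 0.3, 0.4]])
la, lb = np.zeros(3), np.array([1.0, 0.0, 0.0])   # inter-cell bond
analytic = bond_b_rows(s, h, 0, 1, la, lb)
numeric = numerical_b_rows(s, h, 0, 1, la, lb)
```

The analytic and finite-difference derivatives agree to within the accuracy of the finite-difference scheme, for both the fractional-coordinate and the lattice blocks.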

The gradient vector \(\varvec{f}\) used in optimizations of periodic systems must involve contributions from the energy derivatives with respect to the fractional coordinates:

$$\begin{aligned} \frac{\partial E}{\partial s_{\alpha }^{a}} = \sum _{\beta } \frac{\partial E}{\partial r_{\beta }^{a}} h_{\beta \alpha }, \end{aligned}$$
(6)

as well as those with respect to the lattice vectors components:

$$\begin{aligned} \frac{\partial E}{\partial h_{\alpha \beta }}= -\,\Omega \,\sum _\mu \sigma _{\alpha \mu } (h)^{-1}_{\beta \mu }, \end{aligned}$$
(7)

where \(\Omega = |\varvec{a}_1 \cdot (\varvec{a}_2 \times \varvec{a}_3)|\) is the volume of unit cell, and \(\varvec{\sigma }\) is the stress tensor. The gradients expressed in terms of delocalized internal coordinates (\(\varvec{\varphi }\)) can be obtained from \(\varvec{f}\) using the relation:

$$\begin{aligned} \varvec{\varphi }=\varvec{U^t}\varvec{A^t} \varvec{f}, \end{aligned}$$
(8)

with \(\varvec{A}\) being the Moore–Penrose pseudoinverse of the matrix \(\varvec{B} = \left( \varvec{B}^{qs} \, \varvec{B}^{qh}\right)\):

$$\begin{aligned} \varvec{A}=\left( \varvec{B^t}\varvec{B} \right) ^{-1} \varvec{B^t}. \end{aligned}$$
(9)
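The chain of Eqs. 6 to 9 can be sketched as follows. This is a minimal illustration, not the code of our optimizer: the function names are hypothetical, the \(\varvec{B}\) matrix in the example is random rather than derived from a real structure, and `numpy.linalg.pinv` is used for \(\varvec{A}\), which coincides with Eq. 9 when \(\varvec{B}\) has full column rank.

```python
import numpy as np

def gradient_sh(dE_dr, stress, h):
    """Assemble f from Cartesian gradients (N_at x 3) and the stress tensor."""
    g_s = dE_dr @ h                                   # Eq. 6
    omega = abs(np.linalg.det(h))                     # cell volume
    g_h = -omega * stress @ np.linalg.inv(h).T        # Eq. 7
    return np.concatenate([g_s.ravel(), g_h.ravel()])

def delocalized_gradient(B, f, tol=1e-8):
    """Return U (Eq. 2) and the gradient phi in delocalized coordinates (Eq. 8)."""
    w, vecs = np.linalg.eigh(B @ B.T)
    U = vecs[:, w > tol * w.max()]                    # eigenvectors with nonzero eigenvalues
    A = np.linalg.pinv(B)                             # Eq. 9 for full-column-rank B
    return U, U.T @ A.T @ f

# Simple check of Eq. 7: cubic cell of edge 2 with unit stress tensor
f_demo = gradient_sh(np.zeros((1, 3)), np.eye(3), 2.0 * np.eye(3))

# Random (hypothetical) B with 7 primitives and 5 independent variables
rng = np.random.default_rng(0)
B = rng.normal(size=(7, 5))
g_q = rng.normal(size=7)          # gradient with respect to primitive internals
f = B.T @ g_q                     # chain rule: gradient with respect to (s, h)
U, phi = delocalized_gradient(B, f)
```

In this example the gradient recovered from Eq. 8 equals the projection \(\varvec{U^t}\) of the primitive-coordinate gradient, which is the expected result because \(\varvec{U}\) spans the range of \(\varvec{B}\).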

Upon determining the geometry optimization step in the space of delocalized internal coordinates by using an appropriate relaxation algorithm, the corresponding set of fractional coordinates and lattice vectors components is obtained in an iterative procedure:

$$\begin{aligned} \begin{pmatrix} \varvec{s}^{i+1}&\varvec{h}^{i+1} \end{pmatrix} =\begin{pmatrix} \varvec{s}^{i}&\varvec{h}^{i} \end{pmatrix} +\varvec{A}\varvec{U}\left( \varvec{\tilde{q}}^{opt}-\varvec{\tilde{q}}^{i}\right) , \end{aligned}$$
(10)

where \(\varvec{\tilde{q}}^{opt}\) are the delocalized internal coordinates determined by the optimization algorithm, and \(\varvec{\tilde{q}}^{i}\) are the delocalized internal coordinates calculated using the coordinates \(\varvec{s}^{i}\) and \(\varvec{h}^{i}\) from iteration i. The iteration is initialized using the values of the coordinates from the previous optimization step.
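The iteration of Eq. 10 can be illustrated on a toy model of a one-dimensional chain with two atoms per cell, described by the variables \((s_1, s_2, a)\) and the two primitive coordinates \(r_1=(s_2-s_1)a\) and \(r_2=(s_1+1-s_2)a\). The sketch below is an assumption-laden illustration: the function names are hypothetical, \(\varvec{U}\) is kept fixed while \(\varvec{A}\) is recomputed in every iteration, and the target values are chosen arbitrarily.

```python
import numpy as np

def primitives(x):
    """r1 and r2 of the chain from x = (s1, s2, a)."""
    s1, s2, a = x
    return np.array([(s2 - s1) * a, (s1 + 1.0 - s2) * a])

def b_matrix(x):
    """Derivatives of (r1, r2) with respect to (s1, s2, a)."""
    s1, s2, a = x
    d = s2 - s1
    return np.array([[-a, a, d], [a, -a, 1.0 - d]])

def back_transform(x0, U, qt_target, tol=1e-12, max_iter=50):
    """Iterate Eq. 10 until the delocalized coordinates match the target."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        dqt = qt_target - U.T @ primitives(x)
        if np.max(np.abs(dqt)) < tol:
            break
        A = np.linalg.pinv(b_matrix(x))    # pseudoinverse of the current B
        x = x + A @ U @ dqt                # Eq. 10
    return x

# Starting point with a = 2.7 and r1 - r2 = 0.54 (values in angstrom)
x0 = np.array([0.0, 0.6, 2.7])
B0 = b_matrix(x0)
w, vecs = np.linalg.eigh(B0 @ B0.T)
U = vecs[:, w > 1e-8 * w.max()]

# Arbitrary target: equidistant atoms with r1 = r2 = 0.9885
q_target = np.array([0.9885, 0.9885])
x_new = back_transform(x0, U, U.T @ q_target)
```

Because the primitive coordinates depend nonlinearly on \((s_1, s_2, a)\), a single application of Eq. 10 does not reproduce the target exactly, but a few iterations suffice in this example.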

Geometric constraints can be implemented in a straightforward way within the framework of delocalized internal coordinates [12]. The essential point is that the matrix \(\varvec{B} = \left( \varvec{B}^{qs} \, \varvec{B}^{qh}\right)\) must be modified in such a way that the vectors \(\varvec{B}_{j}\) corresponding to coordinates free to relax are made orthogonal to all vectors \(\varvec{B}_{c}\) corresponding to constrained coordinates, i.e., the vectors \(\varvec{B}_{j}\) must be modified according to the formula:

$$\begin{aligned} \bar{\varvec{B}}_{j}= \varvec{B}_{j} - \sum _{c} \frac{(\varvec{B}_{j} \cdot \varvec{B}_{c})}{|\varvec{B}_{c}|} \frac{\varvec{B}_{c}}{|\varvec{B}_{c}|}, \end{aligned}$$
(11)

where the summation is over all constrained coordinates. The delocalized internal coordinates generated using the modified matrix \(\bar{\varvec{B}}\) correspond to linear combinations of either constrained or active coordinates but the two types are never mixed in a definition of individual \(\tilde{q}\). The energy gradients for \(\tilde{q}\) defined via constrained coordinates are then set to zero and the relaxation proceeds as in an unconstrained case. In principle, any coordinate with well defined first order derivatives with respect to atomic positions and lattice vectors components (e.g., bonds, angles, torsions, fractional, and Cartesian coordinates, lengths of lattice vectors and angles between them, cell volume, lattice vector components...) can be constrained in such a way.
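The projection of Eq. 11 can be sketched as below. Note that the sum in Eq. 11 removes the constrained components exactly when the vectors \(\varvec{B}_{c}\) are mutually orthogonal; as an assumption of this sketch (not a statement about our implementation), the constrained rows are first orthonormalized by a QR decomposition so that linearly independent but non-orthogonal constraints are handled as well. The function name and the random test matrix are hypothetical.

```python
import numpy as np

def project_out_constraints(B, constrained):
    """Make rows of free coordinates orthogonal to rows of constrained ones."""
    B = np.array(B, dtype=float)
    cset = set(constrained)
    free = [j for j in range(len(B)) if j not in cset]
    # Orthonormal basis of the space spanned by the constrained rows
    # (assumes these rows are linearly independent)
    Q, _ = np.linalg.qr(B[list(constrained)].T)
    B_bar = B.copy()
    B_bar[free] -= (B[free] @ Q) @ Q.T     # Eq. 11 in an orthonormal basis
    return B_bar

# Hypothetical B matrix: 6 primitives, 8 variables, primitives 0 and 1 constrained
rng = np.random.default_rng(2)
B = rng.normal(size=(6, 8))
constrained = [0, 1]
B_bar = project_out_constraints(B, constrained)
overlap = B_bar[2:] @ B[constrained].T     # should vanish
```

For a single constraint the QR step reduces to the normalization of \(\varvec{B}_{c}\) and the expression is identical to Eq. 11.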

2.2 Hesse matrix for transition state optimization

Our optimization program GADGET [11] implements optimization methods that employ the Hesse matrix (\(\varvec{H}\)). The matrix \(\varvec{H}\) used in TS optimizations should have one and only one negative eigenvalue, with the corresponding eigenvector being approximately parallel with the reaction coordinate. Ideally, the Hesse matrix should be computed in every optimization step, but such a procedure would be too time-consuming when performed for an extended system at the ab initio level. For instance, when a finite-differences algorithm is used (as is often the case with periodic DFT codes), at least \(3{N}_\mathrm{at}\) additional gradient evaluations are needed for each \(\varvec{H}\) calculation, which adds to the total cost of the optimization. A commonly used strategy to tackle this problem [14, 15] is to determine \(\varvec{H}\) only for the initial structure and to update it for use in the subsequent relaxation steps by employing the information on positions and gradients accumulated during the optimization. Out of the Hesse matrix updating schemes suitable for transition state optimizations, we chose the scheme proposed by Bofill [19], which is based on a weighted combination of the Murtagh–Sargent [20] and Powell-symmetric-Broyden [21] formulae.
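For orientation, the Bofill update can be sketched as follows. The formula below is the commonly quoted form of the scheme (Murtagh–Sargent and Powell-symmetric-Broyden corrections mixed with the weight \(\phi = (\varvec{\xi }\cdot \Delta \varvec{x})^2/(|\varvec{\xi }|^2|\Delta \varvec{x}|^2)\), where \(\varvec{\xi }=\Delta \varvec{g}-\varvec{H}\Delta \varvec{x}\)); the function name, the symbols \(\Delta \varvec{x}\), \(\Delta \varvec{g}\), \(\varvec{\xi }\), and the test data are assumptions of this illustration and not taken from our program.

```python
import numpy as np

def bofill_update(H, dx, dg):
    """Bofill update of a symmetric Hessian H from a step dx and gradient change dg."""
    xi = dg - H @ dx                      # residual of the secant equation
    xi_dx, dx_dx, xi_xi = xi @ dx, dx @ dx, xi @ xi
    if xi_xi < 1e-24 or dx_dx < 1e-24:    # nothing to correct
        return H.copy()
    phi = xi_dx**2 / (xi_xi * dx_dx)      # Bofill weight, 0 <= phi <= 1
    # phi * (Murtagh-Sargent term), written so that xi_dx = 0 causes no division by zero
    ms_weighted = xi_dx * np.outer(xi, xi) / (xi_xi * dx_dx)
    # Powell-symmetric-Broyden term
    psb = ((np.outer(xi, dx) + np.outer(dx, xi)) / dx_dx
           - xi_dx * np.outer(dx, dx) / dx_dx**2)
    return H + ms_weighted + (1.0 - phi) * psb

# Hypothetical test: quadratic surface with an indefinite Hessian
rng = np.random.default_rng(3)
M = rng.normal(size=(5, 5))
H_true = 0.5 * (M + M.T)
H0 = np.eye(5)
dx = rng.normal(size=5)
dg = H_true @ dx
H1 = bofill_update(H0, dx, dg)
```

Unlike updates designed for minimizations, this scheme does not force the updated matrix to be positive definite, which is why it is suitable for saddle-point searches. The updated matrix remains symmetric and satisfies the secant condition \(\varvec{H}\Delta \varvec{x}=\Delta \varvec{g}\).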

In our TS optimizations, we define the initial Hesse matrix as follows. First, we approximate the Hesse matrix in primitive internal coordinates (\(\varvec{H}^{q}\)) by a simple diagonal matrix [12] in which the force-constants for all bonds, angles, and torsions are set to 0.5 a.u., 0.2 a.u., and 0.1 a.u., respectively. We note that more sophisticated models, such as those by Lindh [22] or Fischer [23], can also be used for this purpose. Next, the \(\varvec{H}^{q}\) is transformed into the \(\{\varvec{s},\varvec{h}\}\) coordinates using the approximate relation that neglects the contribution of terms involving the second order derivatives of primitive internal coordinates with respect to the atomic positions [14] and the lattice vectors components:

$$\begin{aligned} \begin{pmatrix} \varvec{H}^{s} &{} \varvec{H}^{sh} \\ (\varvec{H}^{sh})^t&{} \varvec{H}^{h}\\ \end{pmatrix} \approx \varvec{B^t}\varvec{H}^{{q}}\varvec{B}, \end{aligned}$$
(12)

where \(\varvec{H}^{s}\) and \(\varvec{H}^{h}\) are the blocks corresponding to second-order derivatives of energy with respect to fractional coordinates and lattice vectors components, respectively, and \(\varvec{H}^{sh}\) is the block with mixed second-order derivatives with respect to components of \(\varvec{s}\) and \(\varvec{h}\). Next, either the whole blocks \(\varvec{H}^{s}\) and \(\varvec{H}^{sh}\) or their rows and columns involving coordinates of ‘active’ atoms (a), i.e., those with a presumably significant contribution to the reaction coordinate, are replaced by the corresponding terms computed from the second-order derivatives with respect to Cartesian coordinates determined at the DFT level (e.g., via finite differences):

$$\begin{aligned} \frac{\partial ^2 E}{\partial s_{\alpha }^{a} \partial s_{\beta }^{b}} = \sum _{\mu } \sum _{\nu } \frac{\partial ^2 E}{\partial r_{\mu }^{a} \partial r_{\nu }^{b}} h_{\mu \alpha } h_{\nu \beta }, \end{aligned}$$
(13)

and

$$\begin{aligned} \frac{\partial ^2 E}{\partial h_{\alpha \beta } \partial s_{\gamma }^{a}} = \sum _{\mu } \sum _{\nu } \frac{\partial \sigma _{\alpha \mu }}{\partial r^a_{\nu }} (h)^{-1}_{\beta \mu } h_{\nu \gamma }. \end{aligned}$$
(14)

We note that the idea of treating contributions of active and inactive atoms to the Hesse matrix at different levels of theory has been reinvented by several different authors [24,25,26,27]. It is only in special cases of processes with the reaction coordinate determined solely by the lattice vectors components (such as the shear deformation discussed in Sect. 3.6) that the components of block \(\varvec{H}^{h}\):

$$\begin{aligned} \frac{\partial ^2 E}{\partial h_{\alpha \beta } \partial h_{\gamma \delta }} = \sum _{\mu } \frac{\partial \sigma _{\alpha \mu }}{\partial h_{\gamma \delta }} (h)^{-1}_{\beta \mu } \end{aligned}$$
(15)

need to be determined accurately. In all other cases discussed in this work, the block \(\varvec{H}^{h}\) is approximated by the simple model described above.
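The construction of the initial Hesse matrix described above (Eq. 12 followed by the replacement of rows and columns of active atoms) can be sketched as follows. The function names, the ordering of variables (fractional coordinates of all atoms first, followed by the nine lattice components), and the random matrices standing in for \(\varvec{B}\) and for the DFT second derivatives are assumptions made only for this illustration.

```python
import numpy as np

# Force constants (a.u.) of the diagonal model quoted in the text
FORCE_CONSTANTS = {"bond": 0.5, "angle": 0.2, "torsion": 0.1}

def model_hessian(B, kinds):
    """Eq. 12: transform the diagonal model H^q to the (s, h) coordinates."""
    Hq = np.diag([FORCE_CONSTANTS[k] for k in kinds])
    return B.T @ Hq @ B

def active_dofs(atoms):
    """Indices of the three fractional coordinates of each listed atom."""
    return [3 * a + k for a in atoms for k in range(3)]

def insert_active_block(H_model, H_exact, idx):
    """Replace rows and columns idx of the model by accurately computed values."""
    H = H_model.copy()
    H[idx, :] = H_exact[idx, :]
    H[:, idx] = H_exact[:, idx]
    return H

# Hypothetical system: 3 atoms (9 fractional + 9 lattice variables), 10 primitives
rng = np.random.default_rng(4)
kinds = ["bond"] * 4 + ["angle"] * 4 + ["torsion"] * 2
B = rng.normal(size=(len(kinds), 18))
H_model = model_hessian(B, kinds)

M = rng.normal(size=(18, 18))
H_exact = 0.5 * (M + M.T)              # stand-in for finite-difference DFT data
idx = active_dofs([0])                 # atom 0 is the only active atom
H_init = insert_active_block(H_model, H_exact, idx)
```

Only the rows and columns of `H_exact` listed in `idx` are read, so in practice just these elements need to be computed. The block \(\varvec{H}^{h}\) and the elements coupling inactive atoms retain their model values.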

The correct eigenvalue spectrum of the Hesse matrix is then ensured by its spectral decomposition and by replacing the eigenvalues (\(\lambda _i\)) by their absolute values for all modes (\(\varvec{u}_i\)) except for the one representing the unstable mode (\(\varvec{u}_1\)) that is approximately parallel with the reaction coordinate:

$$\begin{aligned} \varvec{H} \rightarrow \lambda _1 \varvec{u}_1 \varvec{u}^t_1 + \sum _{i=2}^{3N_\mathrm{at}+9} |\lambda _i| \varvec{u}_i \varvec{u}^t_i. \end{aligned}$$
(16)

In the cases where the initial \(\varvec{H}\) matrix has multiple negative eigenvalues, the appropriate unstable mode is selected by a visual inspection of the corresponding vibrational modes. Finally, the matrix \(\varvec{H}^{\tilde{q}}\) used in the relaxation in delocalized internal coordinates is obtained by the following transformation:

$$\begin{aligned} \varvec{H}^{\tilde{q}} = \varvec{U^t} \varvec{A^t} \begin{pmatrix} \varvec{H}^{s} &{} \varvec{H}^{sh} \\ (\varvec{H}^{sh})^t&{} \varvec{H}^{h}\\ \end{pmatrix}\varvec{A} \varvec{U}. \end{aligned}$$
(17)

The procedure of Eq. 16 is applied after every \(\varvec{H}\) update performed during the relaxation. We note that a similar treatment has also been used in previous work [25, 27].
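The spectral correction of Eq. 16 can be sketched in a few lines. The function name `enforce_ts_spectrum` and its `mode` argument (the index of the chosen unstable mode among the eigenvalues sorted in ascending order) are hypothetical; in the example the test matrix is constructed with two negative eigenvalues and the lowest one is taken as the unstable mode.

```python
import numpy as np

def enforce_ts_spectrum(H, mode=0):
    """Eq. 16: keep the eigenvalue of the selected mode, flip all others to |lambda|."""
    w, V = np.linalg.eigh(H)
    w_fixed = np.abs(w)
    w_fixed[mode] = w[mode]            # lambda_1 of the unstable mode is left unchanged
    return (V * w_fixed) @ V.T         # sum_i w_i u_i u_i^t

# Hypothetical Hessian with two negative eigenvalues
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
H = Q @ np.diag([-2.0, -0.5, 1.0, 3.0]) @ Q.T
H_ts = enforce_ts_spectrum(H, mode=0)
spectrum = np.linalg.eigvalsh(H_ts)
```

After the correction the matrix has exactly one negative eigenvalue, while the eigenvectors are unchanged.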

3 Numerical tests

3.1 Simulation details

The energy and force calculations have been performed using the periodic DFT code VASP [28,29,30,31]. The Kohn–Sham equations have been solved variationally in a plane-wave basis set using the projector-augmented-wave (PAW) method of Blöchl [32], as adapted by Kresse and Joubert [33]. The PBE exchange-correlation functional in the generalized gradient approximation proposed by Perdew et al. [34] was used. For the purposes of our tests, high-quality forces were essential. Hence the calculations have been carried out with dense FFT grids (set automatically via the input parameter PREC=Accurate), and projection operators have been evaluated in reciprocal space. In order to reduce the Pulay stress possibly biasing the cell geometry optimizations, large plane-wave cutoffs have been used in calculations of periodic systems, see Table 1. In each self-consistent field cycle, the electronic wave function was converged to \(1\,\times \,10^{-7}\) eV/cell. In order to reduce the effect of periodic boundary conditions in relaxations of molecular systems (see Sect. 3.2), the size of each unit cell has been chosen so as to ensure that the minimal distance between any two atoms located in different periodic images of the molecule was at least 10 Å.

Table 1 Summary of important simulation parameters used in this study

The geometry optimizations were carried out using the external optimizer GADGET [11], which reads the geometry, energy, and forces from the VASP output, and, until convergence, sets up internal coordinates, estimates an optimal move, calculates the new set of lattice parameters and fractional coordinates, and starts a new VASP calculation. In our tests, the P-RFO method [3] has been used, but a similar performance has also been achieved by using the geometrical DIIS method [4] or a quasi-Newton method [2]. The maximal size of the optimization step performed in primitive internal coordinate space (i.e., bonds, angles, and torsions) has been limited to 0.1 a.u. The optimizations have been considered to be converged when the maximal Cartesian force acting on each atom and, in the case of the cell geometry optimizations, also the maximal force acting on the lattice vectors, was smaller than \(5\times 10^{-3}\) eV/Å. Except for the test described in Sect. 3.6, two types of initial Hesse matrices have been used in simulations, both constructed as described in Sect. 2.2. In the ‘exact’ initial Hessian, all elements of blocks \(\varvec{H}^{s}\) and \(\varvec{H}^{sh}\) were computed at the DFT level via finite differences, while in the ‘approximate’ \(\varvec{H}\) only the rows and columns corresponding to a relatively small number of active atoms (i.e., those with a presumably significant contribution to the RC) were determined at the DFT level. The input and selected output files from all calculations presented in this work are provided as electronic supplementary material (see Online Resource 1).
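To make the role of the P-RFO step and of the step-size limit more concrete, a textbook form of the P-RFO step is sketched below on a two-dimensional model surface. This is an illustration under several assumptions and not the implementation used in our optimizer: the lowest Hessian mode is always the one followed uphill, the exact Hessian of the model function \(E=(x^2-1)^2+y^2\) is used in every step instead of an updated matrix, and the limit of 0.1 is applied to the norm of the step in the Cartesian variables of the model. All function names are hypothetical.

```python
import numpy as np

def prfo_step(g, H, max_step=0.1):
    """One P-RFO step: maximize along the lowest mode, minimize along the others."""
    w, V = np.linalg.eigh(H)
    F = V.T @ g                                    # gradient in the eigenmode basis
    step = np.zeros_like(F)
    # Uphill step along the lowest mode
    lam_p = 0.5 * w[0] + 0.5 * np.sqrt(w[0] ** 2 + 4.0 * F[0] ** 2)
    step[0] = -F[0] / (w[0] - lam_p)
    # Downhill steps along the remaining modes: the shift is the lowest
    # eigenvalue of the augmented RFO matrix
    n = len(w) - 1
    aug = np.zeros((n + 1, n + 1))
    aug[:n, :n] = np.diag(w[1:])
    aug[:n, n] = aug[n, :n] = F[1:]
    lam_n = np.linalg.eigvalsh(aug)[0]
    step[1:] = -F[1:] / (w[1:] - lam_n)
    dx = V @ step
    norm = np.linalg.norm(dx)
    return dx if norm <= max_step else dx * (max_step / norm)

def grad(x):
    return np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])

def hess(x):
    return np.diag([12.0 * x[0] ** 2 - 4.0, 2.0])

# Model surface with minima at (+-1, 0) and a first-order saddle point at (0, 0)
x = np.array([0.3, 0.4])
n_steps = 0
for _ in range(100):
    g = grad(x)
    if np.max(np.abs(g)) < 1e-8:
        break
    x = x + prfo_step(g, hess(x))
    n_steps += 1
```

Starting from a point on the slope of one of the minima, the first steps are truncated by the step limit, after which the iteration converges rapidly to the saddle point at the origin.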

3.2 Molecular systems

In this section, we compare the performance of exact and approximate initial model Hesse matrices (see Sect. 3.1) in relaxation of atomic positions at fixed cell geometry in transition states of selected gas-phase reactions from the benchmark set of Baker and Chan [13]. Clearly, such a test is meaningful only for sufficiently large systems, where a significant fraction of atoms with negligible contribution to RC can be identified. For this reason, only systems consisting of ten or more atoms have been considered, whereby the number of active atoms ranged from 2 to 8 (see Table 2). The active atoms have been selected on the basis of visual inspection of unstable vibrational modes.

Table 2 The number of steps needed to relax the atomic positions in transition states of selected reactions from the benchmark set of Baker and Chan [13] using the method employed in this work (P-RFO) in combination with exact (exact) and approximate (approx.) initial Hesse matrices. The numbers in parentheses indicate the total number of gradient calculations performed in optimizations and Hesse matrix calculations. The number of atoms (\({N}_\mathrm{tot}\)) in each system and the number of active atoms (\({N}_\mathrm{act}\)) used to compute the approximate initial Hesse matrix are also indicated. For the sake of comparison, the number of relaxation steps needed to achieve convergence using the improved dimer method [5, 35] (IDM) is also shown

The numbers of optimization steps (i.e., gradient calculations) needed to fulfil the relaxation criterion are compiled in Table 2. We note that relaxations performed for the same system using different methods converged to the same states, with energies that were identical within 0.1 meV (see the data in Online Resource 1). As is evident, the optimization is strongly affected by the quality of the initial \(\varvec{H}\), and the total number of steps needed to achieve convergence in all seven systems is almost doubled (257 vs. 130) when the exact \(\varvec{H}\) is replaced by the approximate Hessian. Importantly, however, the number of gradient evaluations needed to construct the approximate \(\varvec{H}\) is significantly smaller than that for the exact Hessian, and this fact should also be considered when comparing the computational cost of these two sets of simulations. Taking all gradient evaluations into account, it is evident that the optimizations with the approximate \(\varvec{H}\) (total of 349 gradient evaluations) are actually slightly more efficient than those employing the exact Hesse matrix (373 evaluations). One can expect that the use of the approximate \(\varvec{H}\) will be even more effective in optimizations of large systems where the ratio of active to inactive atoms is small. This should hold true especially in the cases where the initial configuration is very different from the relaxed structure and hence the Hessian can be expected to change significantly in the course of optimization. Our results presented in Sects. 3.4 and 3.5 support this expectation.

In order to illustrate the efficiency of the optimization algorithm used in this work, additional calculations have been performed using the improved dimer method (IDM) [5, 35], which is one of the standard TS optimization methods implemented in VASP. A default computational setting for the IDM calculations has been used and the initial dimer vector was defined as the unstable eigenvector of the exact Hesse matrix. Unlike the P-RFO method, the IDM failed to converge in two out of seven systems (see Table 2). The performance of the IDM in the optimization of the remaining systems (as measured by the number of steps needed to achieve convergence) is significantly worse than that of the P-RFO combined with the delocalized internal coordinates. This result is, however, not surprising, as the dimer method in its present implementation does not make use of the full information encoded in the Hesse matrix, and a large part of the steps performed by the IDM corresponds to auxiliary calculations, such as the determination of curvature along the unstable direction or rotation of the dimer axis, rather than to actual optimization.

3.3 1D \(\hbox {H}_2\)-chain

As a first example of TS optimization of periodic systems, a simple reaction of a one-dimensional chain of \(\hbox {H}_2\) molecules is considered, in which the atoms are reconnected as shown in Fig. 1. The unit cell contains only two atoms and the system can be fully described by just two geometric parameters, e.g., the difference in distances between an arbitrary H atom and its two neighbors (\(r_1-r_2\)), and the length of the lattice vector parallel with the chain (a) shown in Fig. 1. As dictated by symmetry, the structure of the TS corresponds to an infinite chain of H atoms with equidistant separation between nearest neighbours (i.e., \(r_1-r_2=0\)).

Fig. 1 Initial (a), transition (b), and final (c) states of reconnection reaction of atoms in one-dimensional \(\hbox {H}_2\) chain. The solid lines represent the unit cell used in calculations

The geometry of the simulation cell in the initial structure was given by the lattice parameters \(a=2.7\ {\AA }\), \(b=10.0\ {\AA }\), \(c=10.0\ {\AA }\), \(\alpha =\beta =\gamma =90^{\circ }\), and the value of the coordinate \(r_1-r_2\) was set to 0.54 Å. The geometry was allowed to relax only in the direction parallel with the lattice vector a, while the lattice vectors b and c as well as the corresponding fractional coordinates of atoms were fixed. The maximal Cartesian component of the gradient computed for the initial structure was 3.94 eV/Å and the internal pressure of 0.9 GPa indicated that the initial structure was expanded with respect to the equilibrium structure. Owing to the dimensionality of this system, only the relaxation with the exact initial \(\varvec{H}\) matrix has been performed (see Sect. 3.1). The geometry optimization of atomic positions at fixed cell geometry converged in 9 steps and the correct TS geometry (i.e., \(r_1-r_2=0\ {\AA }\)) has been obtained. We note that the relatively large number of optimization steps in this truly one-dimensional problem was caused partly by the step limitation (see Sect. 3.1) consistently applied in all calculations discussed in this work. The simultaneous relaxation of atomic positions and lattice parameter a converged in 16 steps and the geometry with \(r_1-r_2=0\ {\AA }\) and \(a=1.977\ {\AA }\) has been obtained. As the value of \(r_1-r_2\) in the TS follows from symmetry, the predicted TS geometry can be easily checked by a set of single-point calculations with varied value of a and fixed \(r_1-r_2=0\ {\AA }\). As shown in Fig. 2, the position of the relaxed TS fits the minimum on the energy versus a plot perfectly, as it should.

Fig. 2 Energy of the one-dimensional chain of H atoms with \(r_1-r_2=0\ {\AA }\) as a function of the cell parameter a (see Fig. 1b). Red triangle represents the result of TS optimization with variable a

3.4 Proton transfer in chabazite

In the next example, we discuss the transition state optimization of an unconstrained 3D system. The reaction that we consider is a proton shift between two oxygen atoms next to aluminum in zeolite chabazite (see Fig. 3). This reaction has been studied theoretically [36, 37] in the context of heterogeneous catalysis. The unit cell with 37 atoms and the following initial values of lattice parameters has been used in calculations: \(a=b=c=9.291\ {\AA }\), \(\alpha =\beta =\gamma =93.9^{\circ }\), \(V=796.1\ {\AA }^3\). We note that the rhombohedral symmetry of the cell is valid for highly siliceous chabazite or for a structure with a random distribution of Al and H atoms [38], and the use of a relatively small simulation cell creating an artificial regular pattern in the Al and H positions can be expected to break this symmetry. The maximum component of the Cartesian gradient determined for the initial structure was as large as 7.71 eV/Å and the internal pressure was negative (\(-1.9\ \hbox {GPa}\)), indicating that this structure was compressed.

Fig. 3 Initial (a), transition (b), and final (c) states of the proton transfer in acid chabazite. Atoms H, O1, and O2 were considered as active in construction of approximate initial Hesse matrix

The relaxation of atomic degrees of freedom at fixed lattice geometry converged in 88 steps when the exact initial Hessian was used. With the same \(\varvec{H}\), the full relaxation of atomic and lattice degrees of freedom converged in 67 steps, yielding the structure with the following lattice parameters: \(a=9.367\ {\AA }\), \(b=9.480\ {\AA }\), \(c=9.206\ {\AA }\), \(\alpha =92.0^{\circ }\), \(\beta =93.2^{\circ }\), \(\gamma =93.8^{\circ }\), \(V=813.8\ {\AA }^3\). Despite the relatively large expansion and deformation of the lattice in the relaxation, the internal degrees of freedom contributing to the RC most significantly, namely the distances H-O1 and H-O2 (see Fig. 3), differ by less than \(0.01\ {\AA }\) from the values determined in the relaxation of atomic positions only. This result is a consequence of the fact that this TS geometry is primarily determined by strong Coulomb interactions of \(\hbox {H}^+\) with the neighboring O atoms possessing a partial negative charge. The stabilization of the TS due to the lattice relaxation is relatively modest (0.153 eV/cell).

In the next test, we examine the performance of the approximate Hessian (see Sect. 3.1) whereby only the atoms H, O1, and O2 (see Fig. 3) are considered as active. Both the atomic only and the full relaxations performed with the approximate initial \(\varvec{H}\) converged to the same structures as the relaxations performed with the exact initial \(\varvec{H}\). The number of relaxation steps needed to achieve convergence increased significantly (132 steps for the relaxation of atoms and 98 steps for the full relaxation) but taking into account the minimal number of gradient evaluations needed for the \(\varvec{H}\) calculation (111 for the exact and only 9 for the approximate Hessian), the relaxation with the approximate \(\varvec{H}\) turns out to be more efficient than that employing the exact initial Hessian.

3.5 Partial desorption of crotonaldehyde from the MgO surface

In this section, we consider partial desorption of crotonaldehyde from the MgO surface, which is a part of a catalytic transformation of ethanol to 1,3-butadiene, recently studied by Taifan et al. [39]. As shown in Fig. 4, the reaction consists of rotation of a relatively large molecular fragment with respect to the substrate (see also Online Resource 2 showing an animation of the full transformation path). As is typical for this kind of motion, the unstable vibrational mode is relatively soft (imaginary frequency of 173 i \(\hbox {cm}^{-1}\) in the initial and 93 i \(\hbox {cm}^{-1}\) in the final structure), making the determination of this TS rather challenging. Following Ref. [39], the initial geometry of the simulation cell was defined by the lattice parameters \(a=12.746\ {\AA }\), \(b=17.020\ {\AA }\), \(c=19.255\ {\AA }\), \(\alpha =\beta =\gamma =90^{\circ }\), \(V=4183.1\ {\AA }^3\), and the fractional coordinates of atoms of the bottommost layer were fixed at their bulk values in all simulations discussed here, i.e., 82 out of the total of 130 atoms were allowed to relax. In relaxations involving the lattice components, additional constraints fixing the length of the lattice vector \(\varvec{c}\), which is perpendicular to the slab, and all lattice angles were introduced as described in Sect. 2.1. The maximal Cartesian component of the gradient computed for the initial structure was 2.22 eV/Å and the stress-tensor components \(\sigma _{xx}\) and \(\sigma _{yy}\) (supposed to be relaxed out in the lattice optimization) were \(-2.5\ \hbox {GPa}\) and \(-3.7\ \hbox {GPa}\), respectively.

Fig. 4

Initial (a), transition (b), and final (c) states of the partial desorption of crotonaldehyde from the MgO surface with a stepped kink. The atoms represented by spheres were considered as active in the construction of the approximate initial Hesse matrix

The relaxation of atomic positions performed with the exact initial Hessian converged in 50 steps. The stress-tensor components \(\sigma _{xx}\) and \(\sigma _{yy}\) determined for the final structure (\(-2.4\ \hbox {GPa}\) and \(-3.7\ \hbox {GPa}\), respectively) changed only slightly compared with those computed for the initial configuration, and their numerical values indicate that the structure is significantly stretched in the directions parallel to the surface plane. Indeed, the lengths of the lattice vectors \(\varvec{a}\) and \(\varvec{b}\) shrank in the full relaxation to 12.426 Å and 16.314 Å, respectively, thus decreasing the energy by 3.018 eV/cell. Owing to the relatively large relaxation of the cell geometry, the full optimization required a significantly larger number of steps (84) than the optimization of atomic positions only. As mentioned above, the reaction involves a rotation of the whole crotonaldehyde molecule with respect to the MgO surface; hence, all 11 atoms forming the molecule plus 4 substrate atoms (displayed as spheres in Fig. 4) were considered as active when constructing the approximate \(\varvec{H}\) matrix for the initial structure. With this approximate \(\varvec{H}\), the atomic-only and the full relaxations converged in 95 and 193 steps, respectively, yielding the same structures as those obtained using the exact Hessian. When the minimal number of gradient evaluations needed for the \(\varvec{H}\) calculation is taken into account (246 for the exact and 45 for the approximate Hessian), the relaxation with the approximate \(\varvec{H}\) is, again, found to be more efficient than that employing the exact initial Hessian.
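The cost comparison made in this section can be reproduced with a few lines of bookkeeping. The sketch below simply adds the number of relaxation steps to the number of gradient evaluations spent on the initial Hessian. It assumes that each relaxation step costs exactly one gradient evaluation, which is an assumption of this illustration rather than a statement taken from the text; the function and variable names are likewise illustrative only.

```python
# Bookkeeping sketch for the crotonaldehyde/MgO example.
# ASSUMPTION (not stated in the text): one gradient evaluation per relaxation step.

def total_gradient_evaluations(relaxation_steps, hessian_evaluations):
    """Relaxation steps plus gradient evaluations used for the initial Hessian."""
    return relaxation_steps + hessian_evaluations

# (relaxation steps, gradient evaluations for the initial Hessian), as quoted above
runs = {
    ("atoms only", "exact"): (50, 246),
    ("atoms only", "approximate"): (95, 45),
    ("full", "exact"): (84, 246),
    ("full", "approximate"): (193, 45),
}

totals = {key: total_gradient_evaluations(*value) for key, value in runs.items()}

for (mode, hessian), total in totals.items():
    print(f"{mode:10s} {hessian:12s} {total:4d}")
```

Under this assumption, the totals are 296 versus 140 gradient evaluations for the atomic-only relaxation and 330 versus 238 for the full relaxation, in both cases in favor of the approximate Hessian. Note also that the quoted Hessian costs equal three times the number of atoms involved (246 = 3 × 82 relaxed atoms, 45 = 3 × 15 active atoms), which is consistent with one displaced-geometry gradient per Cartesian degree of freedom.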

3.6 Pure affine shear deformation of Al

As our final example, we discuss a transformation taking place entirely in the space of the lattice-vector components \(h_{i,\mu }\), namely the \(\langle 11\bar{2}\rangle \{111\}\) pure affine shear deformation of Al, previously studied by Jahnátek et al. [40]. As shown in Fig. 5, this transformation is realized by sliding the Al layers parallel to the (111) plane, and the final structure is symmetry equivalent to the initial one. The deformation changes the lattice degrees of freedom, while the fractional coordinates of the atoms remain unchanged because the corresponding net forces are zero by symmetry. As this process is reversible (implying that the transformation path must be continuous), the maximum energy structure can be found using our TS optimization technique. Alternatively, the same structure can be identified as the configuration corresponding to the maximum on the energy versus \(\alpha\) plot obtained in a series of constrained relaxations, where \(\alpha\) stands for the angle between the two lattice vectors parallel to the \([ 11\bar{2}]\) and the [111] directions defined with respect to the conventional face-centered cubic cell of Al [40].
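To illustrate the alternative route mentioned above, the following sketch estimates the position of the maximum on a sampled energy versus \(\alpha\) curve by fitting a parabola through the highest sample and its two neighbors. This is a generic three-point interpolation and not necessarily the procedure used in Ref. [40]; the function name `parabolic_maximum` is hypothetical, and the energy profile in the usage part is a made-up quadratic toy model, not data from this work.

```python
def parabolic_maximum(x, y):
    """Abscissa of the vertex of the parabola through the highest sample
    and its two neighbors (points need not be equally spaced)."""
    i = max(range(len(y)), key=y.__getitem__)
    if i == 0 or i == len(y) - 1:
        raise ValueError("maximum lies at the edge of the scan; extend the range")
    x0, x1, x2 = x[i - 1], x[i], x[i + 1]
    y0, y1, y2 = y[i - 1], y[i], y[i + 1]
    # Standard three-point vertex formula of inverse parabolic interpolation
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    if den == 0.0:
        raise ValueError("the three points are collinear; no unique vertex")
    return x1 - 0.5 * num / den


# Toy usage: synthetic quadratic profile with an arbitrarily chosen peak position
# (hypothetical numbers for illustration only, NOT results from the paper).
toy_peak = 71.3
angles = [66.0, 68.0, 70.0, 72.0, 74.0, 76.0]
energies = [-0.01 * (a - toy_peak) ** 2 for a in angles]
alpha_max = parabolic_maximum(angles, energies)
print(f"estimated maximum at alpha = {alpha_max:.2f} degrees")
```

For an exactly quadratic profile the vertex is recovered exactly; for a real energy curve the estimate improves as the scan is refined around the maximum. Each point of such a scan requires a separate constrained relaxation, whereas the TS optimization locates the same structure in a single run.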

Fig. 5

Initial (a), transition (b), and final (c) states of the transformation of Al under the \(\langle 11\bar{2}\rangle \{111\}\) pure affine shear deformation. Full lines represent the unit cell employed in calculations and different colors are used to distinguish different layers of Al atoms parallel with the (111) plane. Note that the structures a and c are equivalent by symmetry

Our simulation setup was similar to that reported in Ref. [40], although the plane-wave cutoff was increased in order to reduce the biasing effect of the Pulay stress in the full TS relaxation. A minimal unit cell containing six Al atoms (i.e., three independent Al layers, see Fig. 5), with lattice vectors parallel to the \([ 11\bar{2}]\), \([ 1\bar{1}0]\), and [111] directions, was chosen. The following cell geometry was used as the initial guess for the maximum energy structure: \(a=2.854\ {\AA }\), \(b= 4.949\ {\AA }\), \(c=7.282\ {\AA }\), \(\alpha =74.1^{\circ }\), \(\beta =90^{\circ }\), \(\gamma =90^{\circ }\), \(V=98.9\ {\AA ^3}\). The components of the initial Hesse matrix corresponding to the lattice degrees of freedom (i.e., the block \(\varvec{H}^{h}\)) were determined numerically at the DFT level, while all remaining blocks were approximated by the model described in Sect. 2.2. The TS geometry converged in just nine steps, and a structure with the following cell geometry was obtained: \(a= 2.809\ {\AA }\), \(b=4.924\ {\AA }\), \(c=7.781\ {\AA }\), \(\alpha =71.6^{\circ }\), \(\beta = 90.0^{\circ }\), \(\gamma =90.0^{\circ }\), \(V=102.1\ {\AA ^3}\). The internal pressure of 2.3 GPa determined for the initial structure was eliminated entirely, and all stress-tensor components vanished upon relaxation. The computed structure is fully consistent with that corresponding to the maximum on the energy vs. \(\alpha\) profile (see Fig. 6) determined in a series of constrained relaxations.
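As a quick consistency check of the cell parameters quoted above, the cell volume can be recomputed from the lattice lengths and angles using the general triclinic formula \(V = abc\sqrt{1-\cos ^2\alpha -\cos ^2\beta -\cos ^2\gamma +2\cos \alpha \cos \beta \cos \gamma }\), which reduces to \(V = abc\sin \alpha\) for \(\beta =\gamma =90^{\circ }\). The sketch below is only an illustration; the function name `cell_volume` is not taken from any particular code.

```python
import math


def cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Volume of a general (triclinic) cell from lengths in Å and angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha_deg, beta_deg, gamma_deg))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


# Cell parameters of the initial guess and of the relaxed TS as quoted in the text
v_initial = cell_volume(2.854, 4.949, 7.282, 74.1, 90.0, 90.0)
v_final = cell_volume(2.809, 4.924, 7.781, 71.6, 90.0, 90.0)

print(f"initial guess: V = {v_initial:.1f} Å^3")
print(f"relaxed TS:    V = {v_final:.1f} Å^3")
print(f"relative change: {100.0 * (v_final / v_initial - 1.0):.1f} %")
```

Both recomputed volumes agree with the quoted values of 98.9 Å\(^3\) and 102.1 Å\(^3\) to the stated precision, corresponding to a volume expansion of roughly 3 % between the initial guess and the relaxed TS.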

Fig. 6

Change in the energy of Al with variation of the angle (\(\alpha\)) between the lattice vectors parallel to the \([ 11\bar{2}]\) and the [111] directions defined with respect to the conventional face-centered cubic cell of Al (cf. Fig. 5). The result of the transition state optimization (full relaxation) is shown to coincide with the maximum of the curve obtained from a series of constrained relaxations with fixed \(\alpha\)

4 Conclusions

In this work, our algorithm [11] for structural optimizations of periodic systems in delocalized internal coordinates [12] has been adapted for use in relaxations of transition states. The presented method allows for simultaneous relaxation of atomic positions and cell geometry, with or without additional geometric constraints. The performance of the method has been demonstrated on several real-world examples covering the most important cases occurring in practice. In particular, a full unconstrained TS relaxation of a system with three-dimensional periodicity has been demonstrated on the example of the proton exchange reaction in zeolite chabazite. As examples of relaxations with additional geometric constraints, including constraints on the fractional coordinates of selected atoms and on lattice degrees of freedom (such as the lengths of lattice vectors and the lattice angles), the TS optimizations of a one-dimensional chain of \(\hbox {H}_2\) molecules and of the partial desorption of crotonaldehyde from the MgO surface have been discussed. Finally, a relaxation of a transition state defined entirely in the space of the lattice-vector components has been demonstrated on the example of the pure affine shear deformation of Al. The full TS relaxation involving both the atomic positions and the lattice-vector components was found to represent a relatively modest overhead compared with the usual TS relaxations involving atomic positions only. We also examined the performance of TS relaxations initialized using an approximate Hesse matrix, in which a relatively small fraction of matrix elements, corresponding to the active atoms directly participating in the reaction of interest, was determined accurately at the DFT level, while the major part of the elements, typically related to inactive atoms and lattice-vector components, was defined on the basis of a simple empirical model. When the total number of gradient evaluations used in the relaxations and Hessian calculations was taken into account, the TS geometry optimizations initialized with the approximate Hesse matrix were found to outperform the simulations carried out with the exact initial Hessian, in which all elements related to atomic positions were computed at the DFT level.