1 Introduction

The accurate characterisation of electronically excited states is of paramount importance in the description of both absorption (UV/Vis) and emission (fluorescence) spectroscopies. In particular, the accurate evaluation of the energy required to excite a molecule from its ground electronic state to an electronically excited state is crucial in interpreting or predicting spectra.

In a previous communication [1], we presented a general method for calculating the wavefunction and energy of the first electronic excited state. This technique relies on the minimisation of the energy of a normalised, trial Slater determinant, constrained to remain orthogonal to the optimal, ground-state Hartree–Fock (HF) determinant, \(|\varPhi_{0}\rangle\), through the use of a projection operator [2–5]. If \(\hat{P}=|\varPhi_{0}\rangle\langle\varPhi_{0}|\), then

$$|\varPhi_{1}^{P}\rangle=(1-\hat{P})|\varPhi_{1}\rangle$$
(1)

where \(|\varPhi_{1}\rangle\) is the trial determinant. Assuming that the wavefunctions are real, then the energy expectation value of this constrained wavefunction is

$$E_{1}=\frac{\langle\varPhi_{1}|\hat{H}|\varPhi_{1}\rangle- 2\langle\varPhi_{1}|\hat{H}|\varPhi_{0}\rangle\langle\varPhi_{1}|\varPhi_{0}\rangle+ E_{0}\langle\varPhi_{1}|\varPhi_{0}\rangle^{2}} {1-\langle\varPhi_{1}|\varPhi_{0}\rangle^{2}}$$
(2)

where \(E_{0}=\langle\varPhi_{0}|\hat{H}|\varPhi_{0}\rangle\) and the constant nuclear repulsion can be added to give the total energy. This energy expression can then be minimised by mixing into the trial orbitals the virtual orbitals spanning the orthogonal complement of the subspace spanned by the spin-orbitals in the trial determinant. The minimisation uses the vector of partial derivatives of the energy with respect to the mixing coefficients of the set of virtual orbitals, \(\{|\phi_{t}\rangle\}\); that is, a minimisation by steepest descent.
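As a concrete illustration, Eq. (2) depends only on four scalar quantities. The following sketch (Python; the function name is our own, purely illustrative) evaluates the constrained energy and is algebraically identical to the Rayleigh quotient of the projected determinant of Eq. (1):

```python
def projected_energy(h11, h10, e0, s):
    """Constrained excited-state energy of Eq. (2), assuming real
    wavefunctions:  h11 = <Phi1|H|Phi1>,  h10 = <Phi1|H|Phi0>,
    e0 = <Phi0|H|Phi0>  and  s = <Phi1|Phi0>.

    Equivalent to the Rayleigh quotient of the projected determinant
    of Eq. (1):  <Phi1P|H|Phi1P> / <Phi1P|Phi1P>  with
    Phi1P = Phi1 - s * Phi0."""
    return (h11 - 2.0 * h10 * s + e0 * s * s) / (1.0 - s * s)
```

Note that when \(s=0\) the projection has no effect and the expression reduces to the ordinary expectation value \(\langle\varPhi_{1}|\hat{H}|\varPhi_{1}\rangle\).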

It was found that this method yields energies in excellent agreement with experimental data, with some significant improvements over more complex techniques such as configuration interaction with singles [6], time-dependent HF, symmetry-adapted-cluster configuration interaction [7, 8], and equation-of-motion coupled cluster. However, the rate of convergence of the procedure was found to be very slow. It was thus decided to implement a more powerful optimisation routine using the matrix of partial second derivatives, or Hessian. The advantages of using such a method are discussed by Fletcher [9], but the main reason for doing so here is that the choice of the direction of descent during the optimisation is often more effective than simply following the direction of steepest descent. The implications of this will be discussed more fully later on.

Since our original work, research into calculating electronically excited states, using single-determinant approximations based on HF and density functional theory (DFT), has continued in other groups. In the HF arena, Tassi and co-workers have recently introduced an unrestricted HF (UHF)-based method to calculate excited states, where orbitals are varied to minimise the required energy [10]. The variations are restricted and orthogonality is enforced between the ground- and excited-state orbitals to ensure the excited state wavefunction is orthogonal to that of the states below it. The approach has subsequently been extended to deal with multiple excitations [11].

In the DFT regime, recent work by Ziegler and others has focused on the constricted variational DFT method to nth order (CV(n)-DFT) [12], where the excited-state density is optimised by use of unitary transformations amongst the occupied and virtual orbitals. By limiting the variational space, orthogonality to the ground state is ensured. The initial implementation used a variational procedure up to second order in the transformations, with higher-order terms (up to n) being treated perturbatively. Subsequently, the higher-order terms were also included variationally to give the self-consistent field CV(n)-DFT method (SCF-CV(n)-DFT) [13, 14]. This may be extended to multiple excitations, but, similar to our method, suffers from spin-contamination due to the multi-reference nature of open-shell systems. The most recent improvement to this class of methods has been to allow the excited-state orbitals to relax in response to changes in the Coulomb and exchange potentials due to the excitation [15]. In the previous implementations, the occupied β-orbitals and some non-participating α-orbitals were frozen, but here they are allowed to relax to second order in the transformations. As such, the method is known as the relaxed-SCF-CV(n)-DFT method [15].

The spin-contamination problem afflicts many of the simple methods used to calculate excited states; one way to avoid the problem is to use the spin-restricted open-shell Kohn–Sham DFT method (ROKS), originally aimed at ground-state calculations [16–18]. More recently, it has been used to calculate excitation energies and Stokes shifts [19]. The spin-contamination is avoided by carrying out spin-adaptation during the energy optimisation rather than dealing with it a posteriori as in the other methods. In fact, as the singlet and triplet orbitals are the same, the purification is exact in this case. ROKS avoids variational collapse, but can suffer from mixing of the ground and excited states leading to non-orthogonality of the two states. This is rectified by the use of level shifting [19].

Straddling the wavefunction and density functional worlds, the maximum-overlap method (MOM) [20–22] (related to the excited-state DFT method [23]) takes a different approach to the optimisation of the excited state. Orthogonality between the ground- and excited-state orbitals or determinants is not used, but variational collapse is avoided (although this is not guaranteed) by optimising the energy such that the occupied orbitals at any particular step are those which overlap most strongly with those from the previous step. This method has the advantage over the others in that higher excited states may be calculated without the need to calculate all lower lying states first; it does, however, suffer from the common problem of spin-contamination.

Also of use in calculating excited states in the HF or DFT worlds is the optimised effective potential method (OEP) [24–26], where each orbital obeys a single-particle Schrödinger equation with a local potential \(V_{\rm eff}\). The variation is in the potential rather than in the orbitals themselves, meaning that the usual delocalised exchange potentials are not needed. Explicit orthogonality to the ground state is ensured via the orbitals and the method has the advantage of being able to deal with multiple excitations [27].

The aims of this communication are to present the derivation of the elements of the Hessian; to indicate how the Hessian is implemented into the optimisation routine; to assess the improvements in the convergence rate of the method; and to indicate the further work that needs to be carried out to improve the method so that it can compete with more established methods for calculating excited-state energies.

2 Second derivative of the energy

For clarity, the notation used previously [1] will be recapitulated in what follows, with further definitions added as required. For reasons outlined in our earlier work [1], the gradient vector is only evaluated where the mixing coefficients are zero, and so it will be for the Hessian matrix.

2.1 Matrix elements

We initially present expressions for the matrix elements, required for the generation of the Hessian, in terms of the molecular spin-orbitals in the trial and ground-state determinants. Having done this, it would be impractical, for reasons of space, to then substitute the appropriate expansions into the expressions for the Hessian elements. The reason for this is that each matrix element may take a variety of related forms, depending upon the orthogonality constraints, as outlined before [1], placed upon the constituent orbitals.

The Hamiltonian matrix elements involving just the trial determinant or just the ground-state determinant can be evaluated as usual, using Slater’s rules [28, 29]. The complications arise in evaluating matrix elements involving both the trial and ground-state determinants. By use of the pairing theorem of Amos and Hall [30], the overlap of the transformed ith trial and jth ground-state orbitals is defined:

$$\langle i|j_{0}\rangle=\lambda_{i}\delta_{ij} \qquad 0\leq\lambda_{i}\leq 1$$
(3)

Note that in the above and in all following equations, ground-state orbitals are distinguished from the trial orbitals by addition of a subscript zero.

With the spin-orbitals so paired, the overlap of the two determinants, each constructed from N spin-orbitals, is simply defined as

$$\varLambda=\prod_{i=1}^{N}\lambda_{i}$$
(4)

From this, we can then define a series of cofactors of the determinant of orbital overlaps, which will be of use shortly (we will have use for cofactors with between one and four indices missing from the product)

$$\varLambda_{i\ldots p}=\prod_{\substack{m=1\\ m\neq i,\ldots,p}}^{N}\lambda_{m}$$
(5)
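A compact numerical sketch may help fix ideas. Below (Python/NumPy; the function names are our own), the Amos–Hall pairing is realised as a singular value decomposition of the matrix of overlaps between the occupied trial and ground-state orbitals, and the cofactors of Eq. (5) are formed by simply omitting the indicated λ from the product:

```python
import numpy as np

def pair_orbitals(s_occ):
    """Amos-Hall pairing (Eq. 3): given the N x N matrix of overlaps
    between trial and ground-state occupied orbitals, return rotation
    matrices U, V and singular values lambda_i such that the rotated
    orbitals satisfy <i|j_0> = lambda_i * delta_ij."""
    u, lam, vt = np.linalg.svd(s_occ)
    return u, lam, vt.T

def cofactor(lam, skip=()):
    """Overlap cofactor Lambda_{i...p} of Eq. (5): the product of all
    lambda_m with the indices listed in `skip` omitted; with no index
    omitted this is the determinant overlap Lambda of Eq. (4)."""
    return np.prod([l for m, l in enumerate(lam) if m not in skip])
```

Because the two occupied sets each come from an orthonormal basis, the singular values automatically satisfy \(0\leq\lambda_{i}\leq 1\), as required by Eq. (3).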

The Hamiltonian matrix element between the trial and ground-state determinants is defined in Eq. (14) of Ref. [1] as

$$H_{10}=\langle\varPhi_{1}|\hat{H}|\varPhi_{0}\rangle = \frac{1}{2}\sum_{i=1}^{N}\langle i|\hat{h}\varLambda_{i}+\hat{F}_{10;i}|i_{0}\rangle$$
(6)

where \(\hat{h}\) is the usual one-electron operator and

$$\hat{F}_{10;i}=\hat{h}\varLambda_{i}+ \sum_{j\neq i}^{N}\varLambda_{ij}\left(\hat{J}_{10;j}-\hat{K}_{10;j}\right)$$
(7)

and \(\hat{J}_{10;j}\) and \(\hat{K}_{10;j}\) are generalised Coulomb and exchange operators such that, for arbitrary orbitals \(p\) and \(q\), \(\langle p|\hat{J}_{10;j}|q_{0}\rangle=\langle pj|q_{0}j_{0}\rangle\) and \(\langle p|\hat{K}_{10;j}|q_{0}\rangle=\langle pj|j_{0}q_{0}\rangle\).

The above matrix elements have all been previously defined in Ref. [1], and involve only occupied orbitals in both trial and ground-state determinants. We now move on to describe those matrix elements involving trial determinants in which one or two occupied orbitals have been substituted by virtual orbitals (denoted by the indices t and u). Initially, we note those expressions used in our previous work, but not separately presented there. These are the Hamiltonian matrix elements between a singly substituted trial determinant (promotion from orbital m to orbital t), \(|\varPhi_{1,mt}\rangle\), and the HF ground state. There are three cases, depending on the exact relationship of the two orbitals in the indexing scheme used.

$$\begin{aligned} \langle\varPhi_{1,mt}|\hat{H}|\varPhi_{0}\rangle&= \frac{\alpha_{m}}{2}\sum_{i\neq m}^{N}\langle i|\varLambda_{im}\hat{h}+\hat{F}_{10;m}|i_{0}\rangle\\ &\quad+ \langle t|\hat{F}_{10;m}|m_{0}\rangle \quad \left(\hbox{where}\,\, t=m+N\right) \end{aligned}$$
(8a)
$$\begin{aligned} \langle\varPhi_{1,mt}|\hat{H}|\varPhi_{0}\rangle&=\langle t|\hat{F}_{10;m}|m_{0}\rangle\\ &\quad-\alpha_{(t-N)}\langle (t-N)|\hat{F}_{10;(t-N)m}|m_{0}\rangle \quad \left(\hbox{where}\,\, m+N\neq t\le 2N\right) \end{aligned}$$
(8b)
$$\langle\varPhi_{1,mt}|\hat{H}|\varPhi_{0}\rangle=\langle t|\hat{F}_{10;m}|m_{0}\rangle \quad \left(\hbox{where}\,\, t>2N\right)$$
(8c)

where \(\alpha_{m}=\sqrt{1-\lambda_{m}^{2}}\) is the overlap between a trial, virtual orbital and the ground-state orbital with which it is paired using the extended pairing theorem [31, 32]. The only previously undefined term above is the Fock-type operator on the second line of Eq. (8b), which is simply

$$\hat{F}_{10;ij}=\hat{h}\varLambda_{ij} +\sum_{k\neq i,j}^{N}\varLambda_{ijk}\left(\hat{J}_{10;k}-\hat{K}_{10;k}\right)$$
(9)

The first new equation that we present is that for the overlap between a doubly substituted trial determinant and the ground state. Defining \(|\varPhi_{1,mt,nu}\rangle\) as the determinant obtained from \(|\varPhi_{1}\rangle\) by replacing occupied orbitals \(\phi_{m}\) and \(\phi_{n}\) with virtual orbitals \(\phi_{t}\) and \(\phi_{u}\), respectively, we get the overlap

$$\begin{aligned} \langle\varPhi_{1,mt,nu}|\varPhi_{0}\rangle&=\alpha_{m}\alpha_{n} \varLambda_{mn}\left(\delta_{m,(t-N)}\delta_{n,(u-N)}\right.\\ &\quad\left.-\delta_{m,(u-N)}\delta_{n,(t-N)}\right)\quad (m,n=1,2,\ldots,N;\,t,u=N+1, N+2,\ldots,L) \end{aligned}$$
(10)

where L is the total number of occupied and virtual spin-orbitals for each spin α or β, or, equivalently, the number of basis functions. The next set of matrix elements are those Hamiltonian elements between two singly substituted trial determinants. As the two determinants contain orthonormal orbitals, the elements can be expanded through use of Slater’s rules. There are two such elements

$$\begin{aligned} \langle\varPhi_{1,mt}|\hat{H}|\varPhi_{1,mu}\rangle&= \delta_{tu}\sum_{i\neq m}^{N}\left\langle i|\hat{h}+\frac{1}{2}\sum_{j\neq m}\left(\hat{J}_{j}-\hat{K}_{j}\right)|i\right\rangle\\ &\quad+\left\langle t|\hat{h}+\sum_{j\neq m}^{N}\left(\hat{J}_{j}-\hat{K}_{j}\right)|u\right\rangle \end{aligned}$$
(11)

where \(\hat{J}_{j}\) and \(\hat{K}_{j}\) are the usual Coulomb and exchange operators between orbitals from the same set. Likewise, when m ≠ n

$$\langle\varPhi_{1,mt}|\hat{H}|\varPhi_{1,nu}\rangle= -\left(\langle n t||mu\rangle+\delta_{tu}\langle n|\hat{h}+\sum_{j\neq m,n}^{N}\left(\hat{J}_{j}-\hat{K}_{j}\right)|m\rangle\right)$$
(12)

Here we use the common notation, \(\langle ab||cd\rangle\), to denote an anti-symmetrised two-electron integral with respect to the inverse of the inter-electronic distance.
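For readers implementing these expressions, the antisymmetrised integral in this notation is simply the difference of two ordinary two-electron integrals in physicists' ordering. A minimal sketch (Python/NumPy; the tensor layout is our own assumption):

```python
import numpy as np

def antisym(eri, a, b, c, d):
    """<ab||cd> = <ab|cd> - <ab|dc>, where eri[p, q, r, s] stores the
    ordinary two-electron integral <pq|rs> over the inverse of the
    inter-electronic distance, in physicists' notation."""
    return eri[a, b, c, d] - eri[a, b, d, c]
```

By construction, the result changes sign on interchange of the two ket indices, \(\langle ab||cd\rangle=-\langle ab||dc\rangle\).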

The final set of Hamiltonian matrix elements to consider are those where a determinant, with two occupied orbitals substituted by different virtual orbitals, is involved (the determinant vanishes if the virtual orbitals are the same). The first is that between the doubly substituted and unsubstituted trial determinant, which may be evaluated using Slater’s rules. It is simply

$$\langle\varPhi_{1,mt,nu}|\hat{H}|\varPhi_{1}\rangle =\langle tu||mn\rangle$$
(13)

We now turn to the most troublesome integral present in any of the expressions for the Hessian, the Hamiltonian element between the doubly substituted trial determinant and the ground state. The difficulties here arise because the orbitals in each determinant come from different orthonormal sets. The general expansion for such an integral was given in Eq. (6) of Ref. [1]. In the case considered here, the pairing theorems [30–32] are applied between the trial (both occupied and virtual) and ground-state orbitals, thus simplifying the equations. It should also be noted that overlaps between virtual orbitals and the ground-state orbitals need to be considered in evaluating the overlap cofactors. In doing so, all vanishing terms can be located, reducing the effort required in evaluating the matrix elements.

The first such element is that where t = m + N and u = n + N, i.e. the index of the virtual orbital (within the subspace of virtual orbitals) is equal to that of the orbital it replaces. This means that the tth virtual orbital is paired with the mth ground-state orbital and likewise for the uth virtual and nth ground-state orbital.

$$\begin{aligned} \langle\varPhi_{1,mt,nu}|\hat{H}|\varPhi_{0}\rangle&= \frac{\alpha_{m}\alpha_{n}}{2}\sum_{i\neq m,n}^{N}\langle i|\hat{h}\varLambda_{imn}+\hat{F}_{10;imn}|i_{0}\rangle\\ &\quad+\varLambda_{mn}\langle \left(N+m\right)\left(N+n\right)||m_{0}n_{0}\rangle\\ &\quad+\alpha_{n}\langle \left(N+m\right)|\hat{F}_{10;mn}|m_{0}\rangle\\ &\quad+\alpha_{m}\langle \left(N+n\right)|\hat{F}_{10;mn}|n_{0}\rangle \end{aligned}$$
(14)

where, in addition to those explained above, we have defined one new operator

$$\hat{F}_{10;imn}=\hat{h}\varLambda_{imn}+\sum_{j\neq i,m,n}^{N}\varLambda_{ijmn}\left(\hat{J}_{10;j}-\hat{K}_{10;j}\right)$$
(15)

The simplest expansion of the Hamiltonian matrix element arises when neither the tth nor the uth virtual orbital is paired with any of the ground-state orbitals, i.e. t, u > 2N. Nearly all terms are exactly zero, so we are simply left with

$$\langle\varPhi_{1,mt,nu}|\hat{H}|\varPhi_{0}\rangle =\varLambda_{mn}\langle tu||m_{0}n_{0}\rangle$$
(16)

The next matrix element to consider is the Hamiltonian element between the ground state and the substituted trial determinant where t = N + m and N + n ≠ u ≤ 2N, i.e. the tth orbital is paired with the mth ground-state orbital, but the uth virtual is not paired with the nth ground-state orbital. If t = u then clearly the matrix element vanishes by anti-symmetry, otherwise

$$\begin{aligned} \langle\varPhi_{1,mt,nu}|\hat{H}|\varPhi_{0}\rangle&=\alpha_{m}\langle u|\hat{F}_{10;mn}|n_{0}\rangle \\ &\quad-\alpha_{m}\alpha_{(u-N)}\langle \left(u-N\right)|\hat{F}_{10;(u-N)mn}|n_{0}\rangle\\ &\quad-\alpha_{(u-N)}\varLambda_{(u-N)mn}\\ &\quad\times\langle \left(N+m\right)\left(u-N\right)||m_{0}n_{0}\rangle\\ &\quad+\varLambda_{mn}\langle \left(N+m\right)u||m_{0}n_{0}\rangle \end{aligned}$$
(17)

The expression where N + m ≠ t ≤ 2N and u = N + n is easily found by interchanging m and n, and t and u in the above equation.

Next, we deal with expansions of the Hamiltonian matrix element characterised by the fact that substituted virtual orbitals are both paired with ground-state orbitals, neither of which corresponds to the excited-state orbital which has been removed. In effect, this means that the overlap of the two determinants has two non-zero, off-diagonal virtual-ground overlaps. As noted above, if t = u, then the doubly substituted determinant vanishes and so does the matrix element. Hence we are only concerned with expansions where this is not the case.

There are three distinct expressions. The first obeys the conditions m ≠ u − N and n ≠ t − N: neither virtual orbital is paired with a ground-state orbital of the same index as either of the removed trial orbitals. The expression is thus

$$\begin{aligned} \langle\varPhi_{1,mt,nu}|\hat{H}|\varPhi_{0}\rangle&=\varLambda_{mn}\langle tu||m_{0}n_{0}\rangle\\ &\quad-\alpha_{(t-N)}\varLambda_{(t-N)mn}\langle(t-N)u||m_{0}n_{0}\rangle\\ &\quad-\alpha_{(u-N)}\varLambda_{(u-N)mn}\langle t(u-N)||m_{0}n_{0}\rangle\\ &\quad+ \alpha_{(t-N)}\alpha_{(u-N)}\varLambda_{(t-N)(u-N)mn}\\ &\quad\times\langle(t-N)(u-N)||m_{0}n_{0}\rangle \end{aligned}$$
(18)

A second, more specific case to consider is that where one of the virtual orbitals is paired with the ground-state orbital of the same index as the excited-state orbital replaced by the other virtual orbital. In other words, when either u = m + N or t = n + N, but not both. Taking the first, then

$$\begin{aligned} \langle\varPhi_{1,mt,nu}|\hat{H}|\varPhi_{0}\rangle&=\varLambda_{mn} \langle(n+N)u||m_{0}n_{0}\rangle\\ &\quad-\alpha_{n}\langle u|\hat{F}_{10;mn}|m_{0}\rangle\\ &\quad+\alpha_{n}\alpha_{(u-N)}\langle (u-N)|\hat{F}_{10;(u-N)mn}|m_{0}\rangle \end{aligned}$$
(19)

If t = n + N, then the expansion can be found by simply switching the indices m with n, and t with u. Finally, if u = m + N and t = n + N, i.e. both virtuals are paired with the ground-state orbital of same index as the other virtual, then

$$\begin{aligned} \langle\varPhi_{1,mt,nu}|\hat{H}|\varPhi_{0}\rangle&=\varLambda_{mn} \langle(n+N)(m+N)||m_{0}n_{0}\rangle\\ &\quad-\frac{\alpha_{m}\alpha_{n}}{2}\sum\limits_{i\neq m,n}^{N}\langle i|\varLambda_{imn}\hat{h}+\hat{F}_{10;imn}|i_{0}\rangle\\ &\quad-\alpha_{m}\langle (m+N)|\hat{F}_{10;mn}|n_{0}\rangle\\ &\quad-\alpha_{n}\langle (n+N)|\hat{F}_{10;mn}|m_{0}\rangle \end{aligned}$$
(20)

If one of the virtual orbitals is paired with the ground-state orbital of same index as the substituted excited-state orbital, and the other virtual is unpaired, then t = m + N and u > 2N, or u = n + N and t > 2N. For the former case

$$\begin{aligned} \langle\varPhi_{1,mt,nu}|\hat{H}|\varPhi_{0}\rangle&=\alpha_{m}\langle u|\hat{F}_{10;mn}|n_{0}\rangle\\ &\quad+\varLambda_{mn}\langle \left(m+N\right)u||m_{0}n_{0}\rangle \end{aligned}$$
(21)

The latter case is simply found by interchanging the indices m with n, and t with u.

A related scenario yields the final set of matrix elements. Here the trial determinant is altered by substitution with a virtual orbital that is not paired with any ground-state orbital, and another which is paired with a ground-state orbital of different index to the trial orbital it replaces, i.e. m + N ≠ t ≤ 2N and u > 2N, or n + N ≠ u ≤ 2N and t > 2N. These can both be separated into two sub-cases. Taking the former as a basis, if t = n + N, then the expansion is

$$\begin{aligned} \langle\varPhi_{1,mt,nu}|\hat{H}|\varPhi_{0}\rangle&=-\alpha_{(t-N)}\langle u|\hat{F}_{10;mn}|m_{0}\rangle\\ &\quad+\varLambda_{mn}\langle tu||m_{0}n_{0}\rangle \end{aligned}$$
(22)

The equivalent situation in the latter case, where u = m + N, is easily found by the interchange of the appropriate indices.

The second sub-case occurs where t ≠ n + N (or u ≠ m + N when indices are swapped) and yields

$$\begin{aligned} \langle\varPhi_{1,mt,nu}|\hat{H}|\varPhi_{0}\rangle&=\alpha_{(t-N)} \varLambda_{(t-N)mn}\langle(t-N)u||m_{0}n_{0}\rangle\\ &\quad+\varLambda_{mn}\langle tu||m_{0}n_{0}\rangle \end{aligned}$$
(23)

This is the final, new matrix element required for efficient evaluation of the second derivative of the excited-state energy when all mixing coefficients are set to zero.

2.2 Second derivatives of the trial determinant

Having derived all matrix elements required for the evaluation of the analytic Hessian, we present the formula into which these expressions should be substituted.

To find the Hessian, we take derivatives of the energy expression with respect to the coefficients of the virtual orbitals used to modify the trial orbitals, \(\{d_{mt}\}\). Consequently, only the trial determinant is affected by such operations. Using the definition of a trial orbital from Ref. [1]

$$\phi_{m}^{'}= \tilde{\phi}_{m}\langle\tilde{\phi}_{m}|\tilde{\phi}_{m}\rangle^{-1/2},\quad \tilde{\phi}_{m}=\phi_{m}+\sum_{t=N+1}^{L}d_{mt}\phi_{t}$$
(24)

Remembering that we only evaluate the Hessian when all mixing coefficients are zero, we note that the derivatives of the trial orbitals are

$$\partial |\phi_{m}\rangle/\partial d_{nt}\left|_{{\bf d}={\bf 0}}=\delta_{mn}|\phi_{t}\rangle\right.$$
(25a)
$$\partial^{2} |\phi_{m}\rangle/\partial d_{mt}\partial d_{nu}\left|_{{\bf d}={\bf 0}}=-\delta_{mn}\delta_{tu}|\phi_{m}\rangle\right.$$
(25b)
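These two formulae are easily checked numerically. In the sketch below (Python/NumPy; the three-dimensional model space is our own toy example), the trial orbital of Eq. (24) is built for one occupied and two virtual orbitals, and central finite differences at d = 0 reproduce Eqs. (25a) and (25b):

```python
import numpy as np

def trial_orbital(phi_m, virts, d):
    """Normalised trial orbital of Eq. (24):
    phi_m' = (phi_m + sum_t d_t * phi_t) / norm."""
    tilde = phi_m + virts @ d
    return tilde / np.sqrt(tilde @ tilde)

# Toy model: one occupied and two virtual orbitals in a 3-dim space
phi_m = np.array([1.0, 0.0, 0.0])
virts = np.array([[0.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])          # columns are phi_t and phi_u

eps = 1e-4
et = np.array([eps, 0.0])               # displace d_mt only
# Eq. (25a): first derivative at d = 0 equals phi_t
d1 = (trial_orbital(phi_m, virts, et)
      - trial_orbital(phi_m, virts, -et)) / (2.0 * eps)
# Eq. (25b): diagonal second derivative at d = 0 equals -phi_m
d2 = (trial_orbital(phi_m, virts, et) - 2.0 * phi_m
      + trial_orbital(phi_m, virts, -et)) / eps**2
```

The second derivative along the normalisation direction returns \(-|\phi_{m}\rangle\) precisely because the renormalisation in Eq. (24) makes the trial orbital a unit vector for all d.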

As the trial determinant is just an anti-symmetrised product of the trial orbitals, we can simply evaluate its derivatives

$$\partial |\varPhi_{1}\rangle/\partial d_{mt}\left|_{{\bf d}={\bf 0}}=|\varPhi_{1,mt}\rangle\right.$$
(26a)
$$\partial^{2} |\varPhi_{1}\rangle/\partial d_{mt}\partial d_{mu}\left|_{{\bf d}={\bf 0}}=-\delta_{ut}|\varPhi_{1}\rangle\right.$$
(26b)
$$\partial^{2} |\varPhi_{1}\rangle/\partial d_{mt}\partial d_{nu}\left|_{{\bf d}={\bf 0}}=\left(1-\delta_{ut}\right)|\varPhi_{1,mt,nu}\rangle \quad (\hbox{where}\,m\neq n)\right.$$
(26c)

The general expression for the Hessian elements is simply formed by application of the standard rules of differential calculus to the energy expression, Eq. (2). Using the simplified notation \(\partial_{mt}=\partial/\partial d_{mt}\), we get the rather cumbersome

$$\begin{aligned} \partial_{nu}\partial_{mt}E_{1} &=\left[1-|\langle\varPhi_{1}|\varPhi_{0}\rangle|^{2}\right]^{-4}\\ &\quad \times\left\{2\left[1-|\langle\varPhi_{1}|\varPhi_{0}\rangle|^{2}\right]^{2} \left[\langle\partial_{mt}\varPhi_{1}|\varPhi_{0}\rangle\langle\varPhi_{0}|\varPhi_{1} \rangle\partial_{nu}E_{1}^{'}\right.\right.\\ &\quad+\langle\partial_{nu}\varPhi_{1}|\varPhi_{0}\rangle\langle \varPhi_{0}|\varPhi_{1}\rangle\partial_{mt}E_{1}^{'}+\langle\partial_{mt} \varPhi_{1}|\varPhi_{0}\rangle\langle\varPhi_{0}|\partial_{nu}\varPhi_{1}\rangle E_{1}^{'}\\ &\quad\left.+\langle\partial_{nu}\partial_{mt} \varPhi_{1}|\varPhi_{0}\rangle\langle\varPhi_{0}|\varPhi_{1}\rangle E_{1}^{'}\right]\\ &\quad+8\left[1-|\langle\varPhi_{1}|\varPhi_{0}\rangle|^{2}\right]\langle \partial_{mt}\varPhi_{1}|\varPhi_{0}\rangle\langle\varPhi_{0}| \partial_{nu}\varPhi_{1}\rangle|\langle\varPhi_{1}|\varPhi_{0}\rangle|^{2}\\ &\quad+2\left[1-|\langle\varPhi_{1}|\varPhi_{0}\rangle|^{2}\right]^{3}\left[ \langle\partial_{nu}\partial_{mt}\varPhi_{1}|\hat{H}|\varPhi_{1} \rangle+\langle\partial_{mt}\varPhi_{1}|\hat{H}|\partial_{nu} \varPhi_{1}\rangle\right.\\ &\quad-\langle\partial_{nu}\partial_{mt}\varPhi_{1}|\hat{H}|\varPhi_{0} \rangle\langle\varPhi_{0}|\varPhi_{1}\rangle-\langle\partial_{mt}\varPhi_{1}| \hat{H}|\varPhi_{0}\rangle\langle\varPhi_{0}|\partial_{nu}\varPhi_{1}\rangle\\ &\quad-\langle\partial_{nu}\varPhi_{1}|\hat{H}|\varPhi_{0}\rangle\langle \varPhi_{0}|\partial_{mt}\varPhi_{1}\rangle-\langle\varPhi_{1}|\hat{H}| \varPhi_{0}\rangle\langle\varPhi_{0}|\partial_{nu}\partial_{mt}\varPhi_{1}\rangle\\ &\quad\left.\left.+E_{0}\left[\langle\partial_{nu}\partial_{mt}\varPhi_{1}|\varPhi_{0} \rangle\langle\varPhi_{0}|\varPhi_{1}\rangle+\langle\partial_{mt} \varPhi_{1}|\varPhi_{0}\rangle\langle\varPhi_{0}|\partial_{nu} \varPhi_{1}\rangle\right]\right]\right\} \end{aligned}$$
(27)

where \(E_{1}^{'}\) is the energy expression defined as in Eq. (2), but with orbitals updated according to Eq. (24). Equation (26) can then be used in this expression to obtain an equation involving the matrix elements between different determinants. Subsequently, the various elements in terms of the different sets of molecular orbitals, as outlined above, can be substituted in, giving a set of working equations that allows the evaluation of all elements of the Hessian.
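A useful check on any implementation of Eq. (27) is comparison with finite differences of the energy expression itself. The sketch below (Python/NumPy; a model space with a single occupied trial orbital, our own toy construction) evaluates Eq. (2) as a function of the mixing coefficients and forms the numerical Hessian at d = 0:

```python
import numpy as np

def e1_of_d(d, h, phi0, phi1, virts, e0):
    """Constrained energy of Eq. (2) for a model-space trial vector
    built from the mixing coefficients d as in Eq. (24)
    (a single occupied orbital, so the determinant is the orbital)."""
    phi = phi1 + virts @ d
    phi = phi / np.linalg.norm(phi)
    s = phi @ phi0
    return (phi @ h @ phi - 2.0 * (phi @ h @ phi0) * s
            + e0 * s * s) / (1.0 - s * s)

def fd_hessian(f, d0, eps=1e-4):
    """Central-difference Hessian of a scalar function f at d0."""
    n = d0.size
    hess = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                d = d0.copy()
                d[i] += si * eps
                d[j] += sj * eps
                return f(d)
            hess[i, j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * eps**2)
    return hess
```

The numerical matrix should be symmetric and should agree, element by element, with an analytic Hessian assembled from Eq. (27).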

3 Computational details

Having derived the necessary matrix elements, a Fortran subroutine was written to evaluate the required integrals and to combine them so as to form the Hessian matrix. This routine was then integrated into the operational code used to calculate the excitation energies of molecular systems.

The original code relied on the method of steepest descent to locate the minimum of the excited-state energy. Evaluating the gradient vector gives the direction of steepest descent; a linear search along the negative of the gradient then locates the minimum in that direction. The occupied orbitals may then be paired, and a new virtual space found, before a new direction is chosen and the process repeated. Such a method is very robust, as the search always proceeds in a downwards direction.

Having implemented the Hessian, it becomes possible to use a second-order optimisation procedure based on Newton’s method. The Hessian matrix is inverted using the Gauss–Jordan method [33] and a direction of search determined by the evaluation of

$${\bf x}=-{\bf g}.({\bf H})^{-1}$$
(28)

at \({\bf d}={\bf 0}\), where g and H are the gradient vector and Hessian matrix, respectively. A line search [33] is then performed in this direction, rather than simply jumping to the point \({\bf d}_{\rm new}={\bf d}_{\rm old}+{\bf x}\) [9]; the line search eliminates the possibility that the full Newton step leads to an increase in energy, instead finding the lowest point in that direction. This is the approach taken in our implementation of the optimisation procedure. Even so, the Newton method may not provide an appropriate direction of descent, particularly if the initial point is far from the minimum and the Hessian is not positive definite. Our code allows for this possibility by reverting to the steepest-descent method whenever the linear search fails to provide a reasonable energy decrease. The steepest-descent steps ensure that the minimum is neared, and once the Newton method provides a reliable search direction, it is used to maximise the efficiency of convergence.
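The logic just described can be condensed into a short sketch (Python/NumPy; our own illustrative code, which uses a linear solve in place of the Gauss–Jordan inversion of Ref. [33] and a crude sampled line search in place of a bracketing search):

```python
import numpy as np

def search_direction(grad, hess):
    """Newton direction x = -H^{-1} g (Eq. 28), reverting to steepest
    descent, -g, when H is singular or the Newton direction is not a
    direction of descent (e.g. a non-positive-definite Hessian)."""
    try:
        x = -np.linalg.solve(hess, grad)
    except np.linalg.LinAlgError:
        return -grad
    return x if x @ grad < 0.0 else -grad

def line_search(f, d, x, alphas=np.linspace(0.0, 2.0, 41)):
    """Sampled line search: take the lowest energy along d + alpha*x,
    rather than a blind jump to the full step d + x."""
    values = [f(d + a * x) for a in alphas]
    return d + alphas[int(np.argmin(values))] * x
```

For a positive-definite quadratic model, the Newton direction followed by the line search lands on the minimum in a single iteration; when the Hessian is indefinite, the descent test triggers the fallback to \(-{\bf g}\), mirroring the behaviour described above.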

4 Comparison of steepest descent and modified Newton optimisation

In this section, the improvements in convergence brought about by the implementation of the modified Newton optimisation routine, over the original steepest descent method, will be presented and discussed. There are two aspects to this: the total time taken to reach the excited-state wavefunction from a given starting point, and the final gradient norm. The first is evident: a reduced calculation time implies an improved optimisation method. However, the total time is influenced by two factors, the total number of iterations and the time taken for each iteration. An increase in the average iteration time is acceptable if the number of iterations required is reduced by a greater proportion, hence reducing the overall time taken. With regard to the second aspect, when searching for the minimum we seek the wavefunction at which the norm of the gradient vector, defined as \(\sqrt{{\bf g}\cdot{\bf g}}\), is equal to zero. In computational practice, however, such an idealised situation rarely, if ever, occurs, owing to accumulated rounding errors and the finite precision of the computer. Our convergence criteria were that the total energy of the excited state should change by less than \(10^{-12}\,E_{\rm h}\) and that no element of the gradient vector should be greater than \(10^{-10}\). Once these conditions have been met, the gradient norm should be as close to zero as possible; tighter convergence of the gradient norm is desirable, and therefore the relative convergence of the two methods will be compared. It should also be clear that the final excited-state energy should be the same, to a given precision, with either optimisation method.

The number of iterations required for each calculation, the average times for those iterations, the total calculation times, and the final convergence of the gradient norms in calculations on the systems considered in our earlier work [1] are listed in Table 1. Calculations were carried out on AMD Opteron machines rated at 1,795 MHz. It should be noted here that, after corrections to the code and a change of the machine on which the calculations were performed, the excitation energies of N\(_2\) and C\(_2\) (unrestricted reference), given in the first paper [1], can be corrected to 8.104 and 1.158 eV, respectively. All calculations were carried out at the experimental gas-phase geometries given in Ref. [1] using the 6-31G** basis set.

Table 1 Comparison of convergence rates, both in terms of number of iterations and timings, and final gradient norms between the optimisation methods using steepest descent and those using a Hessian-based technique on small molecular systems

The conclusions that can be drawn from the data in the table are clear. The use of an optimisation routine based on a second-order method generally gives significant savings in the total amount of time required to find the energy minimum. The exception is hydrogen, which will be discussed shortly.

The reason for the decreases in total timings for the systems containing heavier atoms is that the total number of iterations required is significantly reduced in all cases. At worst the decrease is by a factor of 10 for C\(_2\) (using a restricted reference wavefunction), and at best by a factor of over 80 in the case of unrestricted-reference C\(_2\), with the other two cases falling between these extremes. For these systems, the average iteration times are significantly increased by using the second-order optimisation, ranging from a factor of 3.8 for N\(_2\) up to a factor of 5.9 for unrestricted C\(_2\). However, these increases are more than outweighed by the reduction in the total number of iterations, meaning that, in all cases, the total times for the calculations were reduced.

The reduction was most pronounced in the case of C\(_2\) using a UHF determinant as the ground state, where the total time was reduced by a factor of over 12. This improvement is mainly due to the extremely slow convergence of the wavefunction and energy, in terms of the number of iterations needed, using the steepest-descent approach. The slow optimisation indicates a relatively flat energy surface, where the minimum is reached by a series of short steps. The use of the full Hessian clearly generates more effective search directions for the linear search in locating the minimum.

In the case of H2, the total calculation time increased by about 40% with respect to the original steepest descent procedure when using the Hessian-based optimisation. There are two reasons for this. First, the average iteration time rises by a factor of 7.2 while the number of iterations falls by a factor of only 7, so the optimisation itself takes slightly longer under the second-order approach. Second, the non-iterative parts of the code take about 0.018 s longer in the second-order calculation. However, as both calculations take only on the order of one second, the loss in time is not problematic.

The second point to note from the table is that, in all cases, the gradient norms converge more tightly when using the Hessian than when using gradient information alone, sometimes by several orders of magnitude. This reflects the better characterisation of the energy surface when using second derivatives of the energy function. It should, however, be pointed out that the energies obtained were the same regardless of optimisation technique.

In conclusion, the use of a full, analytic Hessian matrix increases the amount of CPU time required for each iteration, but since the majority of terms in Eq. (32) are already calculated when evaluating the energy and gradient, this increase is not of major significance. Allied with the large reduction in the number of iterations needed to obtain an excitation energy, this makes the Hessian-based method clearly preferable on time grounds.

It should, however, be noted that optimisation times remained long for the larger molecules, even with the improvements afforded by the second-order method. This will become even more of a problem if calculations on larger molecules, or with larger basis sets, are desired. A major bottleneck appears to be the need to carry out large numbers of four-index transformations to obtain the integrals necessary to evaluate the matrix elements described earlier. Clearly further work will be required to improve this significantly, but the benefits of using the more sophisticated optimisation are, nonetheless, clear.

Also of concern for calculations using larger molecules and/or basis sets is the storage of the large, double-precision Hessian matrix. To overcome this scaling issue, it would be possible to solve Eq. (33), in the form g = −H.x, with an iterative method that never stores the elements of the Hessian. As motivation for future work, we provide a brief outline of how such a method could be implemented for our problem, based upon the method of conjugate gradients [33].

Initially, we form the gradient vector, g, at the starting point of the energy iteration, together with a trial search-direction vector, \({\bf x}_{0}\). We can then form the vector

$${\bf v}_{0}={\bf r}_{0}={\bf g}+{\bf H}.{\bf x}_{0}$$
(29)

It can easily be seen that this vector can be formed on-the-fly, i.e. without ever storing the whole Hessian matrix; its product with the trial vector can be formed element-by-element by evaluating the Hessian terms as necessary. From this starting point, we can then perform the following iteration (valid for symmetric matrices, which the Hessian is), from i = 0 to a point where the calculation is considered converged [33].

$$\alpha_{i}= -\frac{{\bf r}_{i}.{\bf r}_{i}}{{\bf v}_{i}.{\bf H.v}_{i}}$$
(30a)
$${\bf x}_{i+1}= {\bf x}_{i}+\alpha_{i}{\bf v}_{i}$$
(30b)
$${\bf r}_{i+1}= {\bf r}_{i}+\alpha_{i}{\bf H.v}_{i}$$
(30c)
$$\beta_{i}= \frac{{\bf r}_{i+1}.{\bf r}_{i+1}}{{\bf r}_{i}.{\bf r}_{i}}$$
(30d)
$${\bf v}_{i+1}= {\bf r}_{i+1}+\beta_{i}{\bf v}_{i}$$
(30e)

As the initial step in this iteration, we form the product of the Hessian with \({\bf v}_{i}\), on-the-fly, and store the result. This vector is then used to form \(\alpha_{i}\), by taking the inner product with \({\bf v}_{i}\), and \({\bf r}_{i+1}\), by simple multiplication by a scalar. All other terms require only scalar and vector operations, so each iteration in this procedure can be carried out by evaluating the Hessian elements just once. In practice, the iterations would continue until the norm of the residual vector \({\bf r}_{i}\) falls below some pre-determined, small value, at which point \({\bf x}_{i}\) becomes the search direction used. A check would also be needed at the start of the sequence of iterations to determine whether the Hessian is positive definite; if it is not, a simple steepest descent procedure can be reverted to for the energy minimisation. At no point during the iterations is the whole Hessian stored, so this method of generating the search directions could be used in calculations on larger systems than those considered here. It will be useful in subsequent work to examine the utility of such a method, in particular the trade-off between the reduced memory demands and the need to repeatedly evaluate the same Hessian elements.
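The procedure outlined above can be expressed as a short, matrix-free routine. The following is a minimal sketch, not our production implementation: `hessvec` stands in for the on-the-fly contraction of the Hessian with a vector, and the reversion to steepest descent when the Hessian is not positive definite is handled by testing the curvature v.H.v along each search direction.

```python
import numpy as np

def conjugate_gradient_direction(hessvec, g, tol=1e-8, max_iter=None):
    """Solve H.x = -g for the search direction x without storing H.

    hessvec(v) must return the Hessian-vector product H.v, which can be
    evaluated element-by-element on the fly.
    """
    n = g.size
    x = np.zeros(n)             # trial search direction x_0 = 0
    r = g + hessvec(x)          # residual vector, Eq. (29)
    v = r.copy()                # initial conjugate direction v_0 = r_0
    if max_iter is None:
        max_iter = n            # CG terminates in at most n steps exactly
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break               # residual small enough: x is converged
        Hv = hessvec(v)         # the only Hessian evaluation per iteration
        curv = v @ Hv           # curvature along v
        if curv <= 0.0:         # Hessian not positive definite along v:
            return -g           # revert to a steepest descent direction
        alpha = -(r @ r) / curv             # Eq. (30a)
        x = x + alpha * v                   # Eq. (30b)
        r_new = r + alpha * Hv              # Eq. (30c)
        beta = (r_new @ r_new) / (r @ r)    # Eq. (30d)
        r = r_new
        v = r + beta * v                    # Eq. (30e)
    return x
```

Solving for the search direction this way requires only one Hessian-vector contraction per iteration and never holds the full matrix in memory.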

The use of an iterative method to generate the search directions for the energy optimisation would also avoid the effort of inverting the Hessian matrix, a process which scales with the cube of the product of the numbers of trial occupied and virtual orbitals (i.e. of order \(N^{3}(L-N)^{3}\)). It would thus be useful to examine, in later work, the relative merits, in terms of time taken, of the direct solution of Eq. (33) by Hessian inversion and the iterative solution outlined above.

Finally, it should be remarked that other optimisation routines, more sophisticated than steepest descent but less demanding than use of the full Hessian, were tried in implementing this excited-state method. In particular, the BFGS routine [33], using an approximate Hessian updated after each iteration, was tested. However, this method failed to locate the energy minimum, apparently because our method creates a new trial orbital space in which to search at each stage. The Hessian update, being based upon past search directions, is therefore completely inaccurate, leading to search directions that are of no use in decreasing the energy towards the minimum. It would, however, be useful in further work to experiment further with energy optimisation routines in the hope of finding a method more powerful than steepest descent but requiring less computational effort than forming and storing the full Hessian.

In particular, it has been suggested that the use of unitary orbital transformations would make procedures based upon approximate Hessians workable for this method of calculating excited-state energies. To do so, it would be necessary to define a unitary matrix, A, acting on the space of all trial orbitals (occupied and virtual) to produce an updated, and hopefully improved, set of occupied orbitals. For each set of spin-orbitals, labelled by ω (which can be α or β), we let

$$\varPhi^{{\rm T}}(\omega)=\left(\phi_{1}^{{\rm o}}(\omega)\ldots\phi_{N}^{{\rm o}}(\omega)\phi_{N+1}^{{\rm v}}(\omega)\ldots\phi_{L}^{{\rm v}}(\omega)\right)$$
(31)

which can then be updated as

$$\varPhi(\omega)\rightarrow\varPhi^{'}(\omega)={\bf A}(\omega)\varPhi(\omega)$$
(32)

The variables to be optimised then become the elements of the matrices A(α) and A(β). A possible downside of this method is that the gradient elements would no longer be calculated only at points where the variables are zero, adding complexity to these evaluations. However, further investigation into the possibilities of this approach would be worthwhile.
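One standard way of constructing such a unitary matrix A, shown here purely as an illustrative sketch rather than as part of the method above, is to exponentiate a real antisymmetric generator, so that A is orthogonal by construction and the rotated orbitals remain orthonormal. The function and variable names (`rotate_orbitals`, `kappa`) are our own, and the orbitals are represented by a simple coefficient matrix.

```python
import numpy as np
from scipy.linalg import expm

def rotate_orbitals(C, kappa):
    """Apply a unitary update C -> A.C to an orbital coefficient matrix C.

    kappa supplies the independent rotation parameters in its strict lower
    triangle; A = exp(X) with X antisymmetric is exactly orthogonal (unitary
    for real orbitals), so the rotated orbitals stay orthonormal.
    """
    X = np.tril(kappa, -1)      # keep only the independent parameters
    X = X - X.T                 # antisymmetric generator: X^T = -X
    A = expm(X)                 # matrix exponential, A^T A = I
    return A @ C
```

Because A reduces to the identity when all parameters vanish, each optimisation step can start from the current orbitals with the rotation variables at zero.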

The problems surrounding the storage of the Hessian as well as the need to speed up the calculations are major issues which will need to be addressed if this method is to become competitive with other, established methods for calculating excited states e.g. CC2. In addition, it would also be useful in further work to use the Hessian’s eigenvalues to examine the stability of the wavefunctions generated here to ensure that they do in fact represent energy minima rather than saddle points.
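A stability check of this kind could reuse the same matrix-free machinery discussed above: an iterative eigensolver needs only Hessian-vector products to extract the lowest eigenvalue, whose sign distinguishes a minimum from a saddle point. The sketch below assumes SciPy's ARPACK interface; `hessvec` and `is_minimum` are illustrative names, not part of our implementation.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def is_minimum(hessvec, n, tol=1e-8):
    """Return True if the lowest Hessian eigenvalue is non-negative.

    hessvec(v) supplies the product H.v on the fly, so the full Hessian is
    never stored; eigsh extracts only the smallest-algebraic eigenvalue of
    the implicitly defined symmetric matrix.
    """
    H_op = LinearOperator((n, n), matvec=hessvec)
    lowest = eigsh(H_op, k=1, which='SA', return_eigenvectors=False)[0]
    return lowest >= -tol
```

A negative lowest eigenvalue would indicate a saddle point, in which case the direction of negative curvature (the corresponding eigenvector) could be followed to lower the energy further.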

With reference to the capabilities of the other single-determinant-type methods mentioned in the introduction, further work would be useful to see whether our method can be extended to deal with multiple excitations, as the UHF method of Tassi et al. [10, 11] or the CV(n)-DFT family [12–15] can, or whether the spin-contamination issues outlined in our earlier paper [1] can perhaps be dealt with in a similar way to ROKS [19], where spin-purification takes place during the optimisation. In contrast to some of the other methods, ours uses a full variational space, with all excited-state orbitals being optimised, and ensures orthogonality to the ground state, a feature of the exact solution. In conclusion, the idea of using simple wavefunctions or densities to calculate excited states, giving results in good agreement with experimental data, is of significant interest. This is particularly true for large molecular systems, and it is to be hoped that future investigations will continue to yield promising results.