1 Introduction

Two-dimensional electronic spectroscopy (2DES) is steadily becoming a favored tool to track photoinduced phenomena in complex systems [1,2,3,4,5,6,7,8,9,10,11]. This is due to its enhanced spectral resolution compared to its one-dimensional pump-probe (1D-PP) counterpart, which stems from the use of an extra laser pulse in the buildup of the third-order nonlinear response signal. A typical experiment is performed by scanning the time delay between the first and second laser pulses (t1) for a given time delay between the second and third pulses (t2), which can be considered the analogue of the waiting time between pump and probe pulses in PP spectroscopy, and the delay between the third pulse and a local oscillator (t3), which heterodynes the field emitted by the sample in response to the perturbation by the three incident pulses. By Fourier transforming the signal along t1 and t3, two-dimensional frequency maps with signals S(Ω1, t2, Ω3) can be obtained at different waiting times t2, where all electronic transitions involved in the three field–matter interactions are recorded [12,13,14]. The enhanced spectral resolution of 2DES along the pump frequency (Ω1) is particularly important for studying systems containing several chromophores that absorb in neighboring spectral regions and produce congested signals in PP setups. This is the case in DNA/RNA single- and double-strand multimers, where the different nucleobases contained in the sequence absorb at similar wavelengths and their signals overlap, making their disentanglement into specific decay channels unfeasible and thus requiring more complex techniques for their adequate and unequivocal characterization.
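The double Fourier transform along t1 and t3 described above can be illustrated with a minimal sketch. It assumes a signal sampled on a uniform grid at one fixed t2 and uses a naive discrete transform from the Python standard library; the function names, the toy signal and its parameters are hypothetical and serve only as illustration. Sign conventions for rephasing and non-rephasing contributions differ between setups and are not addressed here, and a production code would use an FFT library instead.

```python
import cmath
import math


def dft_2d(signal, dt1, dt3):
    """Naive 2D discrete Fourier transform of S(t1, t3) at one fixed t2.

    signal[i][j] holds the complex signal at t1 = i*dt1 and t3 = j*dt3.
    Returns (freqs1, freqs3, spectrum); frequencies are in cycles per time unit.
    """
    n1, n3 = len(signal), len(signal[0])
    freqs1 = [k / (n1 * dt1) for k in range(n1)]
    freqs3 = [k / (n3 * dt3) for k in range(n3)]
    # Precompute the one-dimensional phase factors to keep the loops cheap.
    w1 = [[cmath.exp(-2j * math.pi * k * i / n1) for i in range(n1)] for k in range(n1)]
    w3 = [[cmath.exp(-2j * math.pi * k * j / n3) for j in range(n3)] for k in range(n3)]
    spectrum = [[0j] * n3 for _ in range(n1)]
    for k1 in range(n1):
        for k3 in range(n3):
            spectrum[k1][k3] = sum(
                signal[i][j] * w1[k1][i] * w3[k3][j]
                for i in range(n1)
                for j in range(n3)
            )
    return freqs1, freqs3, spectrum


def toy_signal(n, dt, f1, f3, tau):
    """Hypothetical test signal: one damped oscillation along t1 and t3."""
    return [
        [
            cmath.exp(2j * math.pi * (f1 * i + f3 * j) * dt - (i + j) * dt / tau)
            for j in range(n)
        ]
        for i in range(n)
    ]


def peak_position(freqs1, freqs3, spectrum):
    """Return the (frequency1, frequency3) pair where |spectrum| is largest."""
    k1, k3 = max(
        ((a, b) for a in range(len(freqs1)) for b in range(len(freqs3))),
        key=lambda ab: abs(spectrum[ab[0]][ab[1]]),
    )
    return freqs1[k1], freqs3[k3]
```

Calling `dft_2d(toy_signal(16, 1.0, 3 / 16, 5 / 16, 8.0), 1.0, 1.0)` and passing the result to `peak_position` locates the single peak at the two oscillation frequencies of the toy signal, which is the sense in which each peak of a 2D map correlates a pump frequency with a probe frequency.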

The interpretation of two-dimensional electronic spectra strongly relies on theoretical modeling working as a road map for their understanding. These range from those based on model Hamiltonians [15,16,17,18,19], which employ pre-computed ab initio parameters from the different (isolated) monomers contained in the macromolecular system to mixed quantum mechanics/molecular mechanics (QM/MM) approaches that explicitly account for the description of the solvent combined with supermolecular schemes (involving dimeric or multimeric species), thus accounting explicitly and more accurately for electronic coupling [20,21,22]. Such advanced modeling involves electronic structure theory evaluations of the high-energy excited-state manifolds probed by 1D-PP and 2DES experiments, which can add up to 9 eV from the ground state in the case of UV-active DNA/RNA nucleobases, i.e., featuring a couple dozen electronic states in monomers [23, 24] and over a hundred excited states in dimeric species [25,26,27]. Thus, robust theoretical approaches able to describe in principle all types of excitations (localized, singly and doubly excited and/or charge transfer) on an even footing are required. Suitable candidates are multiconfigurational complete active space self-consistent field (CASSCF) [28] and its second-order perturbation theory extension (CASPT2) [29, 30], which are able to accurately compute high-lying states despite featuring different known caveats such as those arising from valence-Rydberg mixing [31] and a potential unbalanced treatment of ionic and covalent states at certain points of the potential energy surface upon addition of the dynamic correlation [32,33,34].

In this work, we evaluate and benchmark the capabilities of the CASPT2 and restricted active space (RASPT2) methods to simulate the electronic excited states (and their associated transition dipole moments) of the DNA/RNA nucleobase guanine, yielding the most accurate theoretical estimates to date for its high-lying excited-state manifold. Given the lack of experimental data in the high-energy windows, we resort to a systematic protocol in which the active space is enlarged incrementally until convergence, yielding the best estimates that are computationally affordable. Due to the large computational expense associated with these techniques and their dependence on the size of the active space employed, we focus on less-costly restricted active space (RASSCF/RASPT2) approaches, which are thoroughly benchmarked to allow larger dimeric and multimeric sequences to be considered in future studies. Moreover, the outcome completes our previous efforts on pyrimidine and adenine (purine) bases [23, 24] and can eventually be used directly for building Frenkel excitonic Hamiltonian-type models to simulate nonlinear spectra of DNA/RNA hetero-multimers.

2 Computational details

2.1 Electronic structure theory computations

Complete/restricted active space self-consistent field (CASSCF/RASSCF) [28, 35, 36] and their second-order perturbation theory extensions (CASPT2/RASPT2) [30, 37, 38] have been employed as implemented in the MOLCAS 8 package [39]. The reference computations of guanine, used to calibrate different reduced active space schemes, were obtained with a procedure previously introduced by Roos and co-workers [40] and applied by us to adenine [24] and all canonical pyrimidine nucleobases [23]: A set of additional diffuse and uncontracted basis functions is added to the center of charge of the molecule, followed by a systematic removal of π* Rydberg-like orbitals (21 in this particular case) to minimize the expected CASPT2 overstabilization due to the presence of Rydberg-type orbitals in the secondary space, which are not properly represented within the ANO basis set employed, and to avoid the appearance of spurious quasi-Rydberg states [40]. A more detailed description is given in [32]. A full π complete active space reference comprising all π bonding and π* antibonding orbitals (see Fig. 1) is then employed for all energy evaluations and geometry optimizations enforcing Cs symmetry. The following RAS nomenclature is adopted: RAS(maximal number of holes in RAS1, number of RAS1 orbitals | number of electrons in RAS2, number of RAS2 orbitals | maximal number of electrons in RAS3, number of RAS3 orbitals). A systematic increase in the RAS3 subspace allowing up to two electrons leads to the RAS(0,0|14,11|2,4), RAS(0,0|14,11|2,6), and RAS(0,0|14,11|2,8) schemes, which are analyzed to evaluate the energy convergence upon active space enlargement.
The exclusion of the n lone pair orbitals and the corresponding nπ* transitions is based on the relatively small dependence these states have on the dynamic correlation (being of covalent nature within valence bond theory) here benchmarked at the CASPT2/RASPT2 levels of theory [33]; the present schemes are able to describe them properly upon their inclusion given their lesser dependence on the active space, as shown by their already accurate estimates at the CASSCF level. Moreover, their mediating role in the photoinduced events taking place in guanine [41] heavily depends on the environment surrounding the system [42, 43], as the nπ* state lies relatively low in energy and is thus accessible in vacuo from both bright La and Lb states, being blue-shifted in polar solvents for this and all other nucleobases, which reduces its probability of non-adiabatic population [42, 44]. An imaginary level shift [45] of 0.2 a.u. was employed in the perturbation treatment to minimize the appearance of intruder states except when explicitly stated as in Sect. 3.2, while the IPEA shift was set to zero for all cases [46, 47]. The required transition dipole moments were obtained making use of the restricted/complete active space state interaction (RASSI/CASSI) method [48, 49]. Additional RASPT2 computations with less demanding RAS1/RAS3 schemes were considered within the RAS(4,7|0,0|4,4) scheme by artificially shifting the excitation energies with the imaginary level shift (IMAG) as previously done in adenine [24], and by systematically increasing the number of holes and electrons in the RAS1/RAS3 subspaces, leading to RAS(5,7|0,0|5,4), RAS(6,7|0,0|6,4), RAS(7,7|0,0|7,4), and RAS(8,7|0,0|8,4). Cholesky decomposition was employed throughout to speed up the evaluation of the two-electron integrals [50,51,52].
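As an aid to reading the RAS labels used throughout, the following sketch parses a label of the form RAS(h,n1|e2,n2|e3,n3) into its six integers and derives the total number of active orbitals and electrons. The class and function names are hypothetical. The electron count assumes that RAS1 orbitals are doubly occupied and RAS3 orbitals empty in the reference configuration, which is the usual convention but is stated here as an assumption.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RasScheme:
    max_holes_ras1: int   # maximal number of holes allowed in RAS1
    n_orb_ras1: int       # number of RAS1 orbitals
    n_elec_ras2: int      # number of electrons in RAS2
    n_orb_ras2: int       # number of RAS2 orbitals
    max_elec_ras3: int    # maximal number of electrons allowed in RAS3
    n_orb_ras3: int       # number of RAS3 orbitals

    @property
    def total_orbitals(self) -> int:
        return self.n_orb_ras1 + self.n_orb_ras2 + self.n_orb_ras3

    @property
    def active_electrons(self) -> int:
        # Assumption: RAS1 is doubly occupied and RAS3 is empty in the reference.
        return 2 * self.n_orb_ras1 + self.n_elec_ras2


_LABEL = re.compile(
    r"RAS\s*\(\s*(\d+)\s*,\s*(\d+)\s*\|\s*(\d+)\s*,\s*(\d+)\s*\|\s*(\d+)\s*,\s*(\d+)\s*\)"
)


def parse_ras(label: str) -> RasScheme:
    """Parse a label such as 'RAS(4,7|0,0|4,4)' into a RasScheme."""
    match = _LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not a RAS label: {label!r}")
    return RasScheme(*(int(group) for group in match.groups()))
```

Under these assumptions, both RAS(0,0|14,11|2,8) and RAS(4,7|0,0|4,4) correlate 14 π electrons, the former in 19 active orbitals and the latter in the 11 valence π orbitals only.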

Fig. 1

Guanine molecular orbitals contained in the CASSCF simulations featuring all π bonding and antibonding orbitals

2.2 Nonlinear spectroscopy

The resulting energy levels and transition dipole moments have then been employed to simulate the two-dimensional electronic spectra making use of the Spectron 2.7 program [53], which can compute quasi-absorptive 2DES maps via the sum-over-states approach [54] within the dipole approximation [20]. Further details on the working equations have been given elsewhere [24, 53]. Spectral line shapes are simulated assuming pure dephasing in the Markovian approximation. Thus, the signals were homogeneously broadened with a constant line broadening of 500 cm−1. The waiting time t2 between the pump pulse pair and the probe pulse is set to zero; excited-state dynamics is thus neglected. Within this static approximation, only computations at the Franck–Condon (FC) region are required to simulate the spectra. Protocols for computing 2DES considering spectral diffusion and non-adiabatic effects on an ultrashort timescale are well established [55,56,57,58]. However, our approximation is sufficient for our aim of benchmarking the effect of various computational parameters on the shape (position and intensity of the peaks) of the two-dimensional maps and of qualitatively separating the main signals characterizing the spectra of guanine at the FC region. Future studies could tackle the fine line shape given by the ultrashort vibrational dynamics as well as by inhomogeneities present in the sample, once the best affordable computational protocol is firmly established. All signals reported use the all-parallel xxxx pulse polarization configuration and are plotted on a linear scale. Both ground state bleaching (GSB) and stimulated emission (SE) contributions appear as negative (blue) peaks, whereas excited-state absorptions (ESAs) appear as positive (red) peaks in the two-dimensional spectra.
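To make the static sum-over-states picture concrete, the sketch below builds the value of a quasi-absorptive map at one (Ω1, Ω3) point for a single pumped state. It is a strongly simplified illustration and not the Spectron working equations: each contribution is modeled as a product of two Lorentzians, the 500 cm−1 broadening is interpreted as the Lorentzian half-width, bleach-type (GSB/SE) terms are lumped into one negative peak weighted by the fourth power of the pump dipole, and ESA terms are positive peaks at Ω3 = E_f − E_e weighted by the product of squared dipoles. All function names and the numerical inputs in the usage note are hypothetical.

```python
def lorentzian(w, w0, gamma):
    """Absorptive Lorentzian profile, normalised to 1 at its maximum."""
    return gamma ** 2 / ((w - w0) ** 2 + gamma ** 2)


def map_value(w1, w3, e_pump, mu_ge, esa_levels, gamma=500.0):
    """Signal at (w1, w3) for one pumped state at waiting time t2 = 0.

    e_pump     : energy of the pumped state e (cm^-1)
    mu_ge      : ground-to-e transition dipole magnitude (arbitrary units)
    esa_levels : iterable of (e_final, mu_ef) for higher states f reached from e
    gamma      : assumed Lorentzian half-width (cm^-1)
    """
    along_pump = lorentzian(w1, e_pump, gamma)
    # Bleach-type contribution (GSB and SE lumped together): negative sign.
    value = -(mu_ge ** 4) * along_pump * lorentzian(w3, e_pump, gamma)
    # Excited-state absorptions: positive sign, probed at the e -> f energy gap.
    for e_final, mu_ef in esa_levels:
        gap = e_final - e_pump
        value += (mu_ge ** 2) * (mu_ef ** 2) * along_pump * lorentzian(w3, gap, gamma)
    return value
```

For a made-up pumped state at 37,700 cm−1 with one higher state at 57,700 cm−1, `map_value(37700.0, 37700.0, 37700.0, 1.0, [(57700.0, 0.8)])` is negative (diagonal bleach), whereas `map_value(37700.0, 20000.0, 37700.0, 1.0, [(57700.0, 0.8)])` is positive (ESA in the Vis window), matching the sign convention stated above.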

3 Results and discussion

3.1 Reference spectrum

Table 1 displays the reference excited-state computations of guanine based on a SA-30-RAS (0,0|14,11|2,8)/SS-RASPT2 computation. This level is achieved by performing the protocol described in Sect. 2, providing converged results with respect to the systematic increase in the RAS3 subspace [23, 24]. The specific configuration state functions (CSFs) defining the nature of the electronic transitions are reported in Table 1. As can be seen, low-energy lying states feature contributions of one or two CSFs with large weights, whereas those placed in the high-energy window display strong multiconfigurational character, by featuring multiple CSFs with rather small weights contributing to the total excited-state wave functions.

Table 1 Vertical excitation energies (EVA, eV), transition dipole moments (TDM, a.u.), leading wave function configurations and corresponding coefficients and weights for the different guanine electronic excited states calculated at the SA-30-RAS (0,0|14,11|2,8)/SS-RASPT2 (reference) level of theory with 21 Rydberg-like deleted orbitals (see Sect. 2)

A systematic enlargement of the RAS3 subspace has been carried out starting from the CAS(14,11) level, or CAS(0,0|14,11|0,0), its effects on the electronic excited-state energies and dipole moments being directly displayed along the NUV-pump/Vis-probe (Fig. 2, in the 10,000–30,000 cm−1 range) and the NUV-pump/NUV-probe (Fig. 3, in the 30,000–46,000 cm−1 range) 2D spectra. Evaluating the different energetic trends upon active space enlargement through 2DES maps is particularly appealing, given it provides a visual aid for its interpretation, the position of the peaks and their relative intensities being proportional to the computed vertical excitation energies and transition dipole moments, respectively. As can be seen, the NUV-pump/Vis-probe (Fig. 2) and NUV-pump/NUV-probe (Fig. 3) spectra show relatively small changes by going from RAS(0,0|14,11|2,6) to RAS(0,0|14,11|2,8), providing evidence of convergence already upon addition of six additional RAS3 orbitals, while considering the permutation of up to two electrons.

Fig. 2

Two-dimensional NUV-pump/Vis-probe spectra of guanine in gas phase for different active space sizes: a CAS(14,11); b RAS(0,0|14,11|2,4); c RAS(0,0|14,11|2,6); d RAS(0,0|14,11|2,8). Peak numbering follows the excited-state assignment displayed in Table 1, red labels (left) denote ESAs along the 1La trace, and blue labels (right) refer to ESAs along the 1Lb trace

Fig. 3

Two-dimensional NUV-pump/NUV-probe spectra of guanine in gas phase for different active space sizes: a CAS(14,11); b RAS(0,0|14,11|2,4); c RAS(0,0|14,11|2,6); d RAS(0,0|14,11|2,8). Peak numbering follows the excited-state assignment displayed in Table 1, red labels (left) denote ESAs along the 1La trace, blue labels (right) refer to ESAs along the 1Lb trace, and black labels denote GSB signals

A closer look at the NUV-pump/Vis-probe spectra shows a negligible dependence of the different ESA signals along Ω3 upon active space enlargement, both for energetic position and associated relative intensities, whereas more pronounced differences are registered along Ω1 in terms of a systematic energy shift toward higher energies, as has been previously reported for the other nucleobases [23, 24]. This energy shift totals ~ 1000 cm−1 along both 1La (centered at 37,000–38,000 cm−1) and 1Lb (centered at 41,000–42,000 cm−1) by going from CAS(0,0|14,11|0,0) to RAS(0,0|14,11|2,8). These differences are smaller than those observed for the other nucleobases, mainly due to the shared CSFs within both 1La and 1Lb, which remain stable upon active space enlargement. The reported excitation energy values for 1La (4.68 eV) and 1Lb (5.20 eV) are in agreement with the absorption cross section of guanine, which shows a band with two shoulders peaking at ~ 275 nm (4.51 eV) and ~ 250 nm (4.96 eV) [59], thus lying within roughly 0.2 eV of the theoretical estimates reported here. These are in agreement with the equation of motion excitation energy coupled cluster (EOMEE)-CCSD(T)/aug-cc-pVTZ estimates reported by Szalay et al. [60] of 4.86 and 5.37 eV for the 1La and 1Lb states, respectively, and with previous CASPT2 computations of Roos and co-workers [61] yielding 4.51 and 5.25 eV, respectively. Unfortunately, only theoretical values up to state 4 (ππ*) are available at the (EOMEE)-CCSD(T) level [60], making a comparison for the high-energy window unfeasible; state 4 is reported at an excitation energy of 6.26 eV, also in agreement with our estimate of 6.00 eV (see Table 1). The oscillator strength estimates for the 1La and 1Lb transitions are 0.145 and 0.269, respectively, which are lower than those obtained in the literature at the EOMEE-CCSD/aug-cc-pVTZ level of 0.16 and 0.37 [43].
Unfortunately, no estimates for the oscillator strength are available at the higher (EOMEE)-CCSD(T)/aug-cc-pVTZ level, which has a more accurate description of the excitation energies, for a better comparison. The active space enlargement effect along Ω1 is also observed in the low-lying excited-state absorptions, mainly referring to states 6, 7, and 9 along the 1La trace and state 7 along the 1Lb trace, which consequently causes the preservation of the energy gap among them and roughly retains the peak position along Ω3. This is associated with the similar nature of the excited ππ* states, highly dependent on dynamic correlation [33], which react analogously toward its systematic increase. Higher energy signals in the Vis-probe window show a more pronounced dependence on the dynamic correlation included in the model (Fig. 2): Peak 11 along the 1La trace oscillates slightly upon active space enlargement, its associated intensity being overestimated in the less-correlated CAS(0,0|14,11|0,0) with respect to the RAS(0,0|14,11|2,8) reference, a feature also observed for peaks 5 and 7 along the 1La trace, while peak 11 along the 1Lb displays a similar behavior and provides, together with state 7, the most intense and characteristic signals of guanine in the Vis window. Peak 14 along the 1Lb trace appears to lose intensity upon active space enlargement and is pushed beyond the Vis window and completely obscured in the UV range in the reference. It is worth noting that, as opposed to what has been found so far for the remaining nucleobases [23, 24], the leading CSFs of the signals featured in this window feature predominantly singly excited character (see Table 1), which may facilitate its description with computationally cheaper approaches such as those related to linear or quadratic response theories [62, 63].
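Since the discussion switches between eV, nm and cm−1, a short conversion sketch may help when cross-checking the values quoted above. It uses the standard factors 1 eV ≈ 8065.544 cm−1 and hc ≈ 1239.842 eV nm; the function names are hypothetical.

```python
EV_TO_WAVENUMBER = 8065.544   # cm^-1 per eV
HC_EV_NM = 1239.841984        # h*c in eV nm


def ev_to_wavenumber(energy_ev):
    """Convert an energy in eV to a wavenumber in cm^-1."""
    return energy_ev * EV_TO_WAVENUMBER


def nm_to_ev(wavelength_nm):
    """Convert a wavelength in nm to a photon energy in eV."""
    return HC_EV_NM / wavelength_nm


# Computed 1La and 1Lb vertical excitation energies (eV) from the text.
computed = {"1La": 4.68, "1Lb": 5.20}
# Experimental shoulder positions (nm) from the text.
shoulders_nm = {"1La": 275.0, "1Lb": 250.0}

for state, energy in computed.items():
    experimental = nm_to_ev(shoulders_nm[state])
    print(
        f"{state}: {ev_to_wavenumber(energy):.0f} cm^-1, "
        f"experiment {experimental:.2f} eV, "
        f"deviation {energy - experimental:+.2f} eV"
    )
```

With these factors the computed 1La and 1Lb energies fall inside the 37,000–38,000 and 41,000–42,000 cm−1 ranges quoted for the two traces, and the shoulders at 275 and 250 nm convert to 4.51 and 4.96 eV.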

Moving on to the NUV-pump/NUV-probe spectra (see Fig. 3), larger differences can be discerned: As previously observed [23, 24], higher-lying electronic excited states seem to oscillate more pronouncedly than those found in the lower-lying Vis window. Besides those arising from GSB (peaks 2 and 3 for 1La and 1Lb, respectively), two signals feature in this window with sizable intensity, i.e., the off-diagonal GSB negative (blue) signals that are expected to appear when two or more bright states fall under the envelope of the utilized pump pulse. The off-diagonal bleach is thus a characteristic fingerprint of the 2D NUV-probe spectrum of purine nucleobases [24], and it can be used to identify guanine traces along Ω1 in 2DUV spectra of multimeric systems. Along the 1La trace, signal 19, featuring pronounced doubly excited character, is placed at ~ 38,000 cm−1 in CAS(0,0|14,11|0,0) and is blue-shifted to ~ 40,000 cm−1 in RAS(0,0|14,11|2,8), whereas RAS(0,0|14,11|2,6) displays a strong overestimation, placing the signal at 43,000 cm−1. This large overestimation appears due to an artificial mixing with a close-lying CAS state, which strongly affects its energetic position upon the subsequent perturbation theory treatment; a more in-depth explanation may be found in [34]. Along the 1Lb trace, signal 17 is placed at ~ 31,000 cm−1 at CAS(0,0|14,11|0,0), moving up to ~ 34,000 cm−1 for the reference RAS(0,0|14,11|2,8). It is worth noting that this signal features a pronounced doubly excited HOMO to LUMO component, which has been suggested as a monomer-specific fingerprint for other nucleobases in the Vis probing window, while appearing in the UV in the case of guanine. As previously discussed for the NUV-pump/Vis-probe spectra, CAS(0,0|14,11|0,0) overestimates the intensity of some signals, as can be seen for peaks 18 and 19 along the 1Lb trace (Fig. 3), which vanish upon active space enlargement.

3.2 Cost-effective approaches to 2DES

While reference RAS(0,0|14,11|2,8) results provide the highest accuracy available for the high-lying excited-state manifold, its routine use in applications is limited due to its exceedingly elevated cost, featuring well over 5 million CSFs and 12 million Slater determinants in its configuration interaction (CI) expansion. This elevated cost, despite being affordable within a single gas-phase molecule exploiting Cs symmetry, becomes unaffordable for (more interesting) dimeric and multimeric systems. To reduce costs, two different approaches have been followed: the use of artificial level shifts and of larger restricted active space schemes.

In a series of benchmarks on aromatic systems [24, 64, 65], we have documented that the RASPT2 protocol using schemes with small active spaces or featuring a moderate number of CSFs can dramatically underestimate transition energies, as it overestimates the dynamic correlation of ionic (in valence bond terms) states. In the search for a cost-effective protocol to counteract (or at least damp) this effect, we observe that real and imaginary shift parameters (originally introduced to solve the intruder state problem) induce a non-uniform decrease of the correlation contribution, much more pronounced for ionic than for covalent states. This makes the imaginary level shift well suited for improving the performance of minimal RAS schemes in the characterization of highly excited states, even if we need to resort to larger shift values than suggested in the literature (a detailed argumentation can be found in Ref. [65]). We emphasize that the application of large shift parameters makes sense only in the framework of a semiempirical parameterization against a reliable reference data set; its use is otherwise discouraged.
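The non-uniform damping mentioned above can be illustrated with a short sketch. It assumes the commonly quoted form of the imaginary level shift, in which each second-order term −|V|²/Δ is replaced by −|V|²Δ/(Δ² + ε²), with Δ the zeroth-order energy denominator and ε the shift; the function names and the numerical values in the usage note are hypothetical, and the sketch makes no statement about which states are ionic or covalent.

```python
def imag_shift_damping(delta, eps):
    """Ratio of a shifted to an unshifted second-order term.

    Assumed form: -V^2/delta becomes -V^2 * delta / (delta^2 + eps^2),
    so the ratio is delta^2 / (delta^2 + eps^2), independent of V.
    """
    return delta ** 2 / (delta ** 2 + eps ** 2)


def second_order_energy(terms, eps):
    """Sum the shifted second-order terms for (coupling V, denominator delta) pairs."""
    return sum(-(v ** 2) * d / (d ** 2 + eps ** 2) for v, d in terms)
```

With ε = 0.5 a.u., a term with Δ = 0.5 a.u. retains half of its unshifted value (`imag_shift_damping(0.5, 0.5)` gives 0.5), whereas a term with Δ = 1.5 a.u. retains 90% of it. A state whose correlation energy is dominated by small-denominator terms therefore loses proportionally more correlation energy as ε grows, which is one way to rationalize why the effect differs from state to state.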

Figure 4 shows the NUV-pump/Vis-probe spectra (lower panels) for the reduced RAS(4,7|0,0|4,4) approach with different imaginary level shifts. The less-expensive RAS(4,7|0,0|4,3), featuring one less π* orbital in the active space, displays much larger deviations with respect to the reference, as opposed to what was observed in adenine [24]. As can be seen in Fig. 4, upon increase in the imaginary level shift, the energies along Ω1 are progressively blue-shifted, with the 1La and 1Lb traces approaching the RAS(0,0|14,11|2,8) reference for IMAG values of 0.5 (Fig. 4d), and being slightly blue-shifted with respect to the reference for the 0.6 value (Fig. 4e). The same trend can be observed along Ω3 in the Vis window (lower panels) with a couple of exceptions: the splitting between signals 10 and 11 along the 1La trace, which is not reproduced by the RAS(4,7|0,0|4,4) approach regardless of the imaginary level shift employed, and the position of peak 7 along the 1Lb trace, which is shown to be extremely red-shifted within this cost-effective approach. The red-shifted signal 14 along the 1Lb trace is pushed to the NUV window along Ω3 for larger IMAG values (Fig. 4d, e), in agreement with reference values. As has also been documented for other canonical nucleobases [23, 24], the deviations from reference computations in cheaper RAS approaches increase in the NUV-probe window (upper panels in Fig. 4), particularly for signal 17 along the 1Lb trace, which is the main fingerprint for this state in the NUV range and is shown to still be within ~ 2000 cm−1 of the reference, while signals 18 and 19 appear in the right energetic position but display an artificially enhanced intensity in the reduced RAS approaches.

Fig. 4

Two-dimensional NUV-pump/Vis-probe (lower panels) and NUV-pump/NUV-probe (upper panels) spectra of guanine in gas phase for a RAS(4,7|0,0|4,4) approach with different imaginary level shifts: a 0.2, b 0.3, c 0.4, d 0.5, e 0.6, and f reference RAS(0,0|14,11|2,8) values. Dashed lines represent positions of the 1La and 1Lb traces along Ω1 in RAS(0,0|14,11|2,8) reference computations for the sake of visual aid

Whereas all estimates shown above rely on a RAS(4,7|0,0|4,4) approach, an arguably more robust and correct way to increase the correlation in the model would be to extend its flexibility in terms of the holes and electrons accessible to the RAS1 and RAS3 subspaces, respectively. A systematic increase in the number of holes and electrons accessible within the RAS1/RAS3 subspaces was thus considered, incrementing symmetrically the number of holes and electrons one by one, up to eight, while keeping the imaginary level shift at the standard value of 0.2. As can be seen in Fig. 5 (lower panels), negligible gains along Ω3 can be observed in the NUV-pump/Vis-probe spectra by going from four to five electrons/holes; from there onward the spectra are fully converged and show no appreciable differences among the different RAS schemes considered besides an increase in the computational time, thus supporting the use of RAS(4,7|0,0|4,4) for large systems. All estimates appear red-shifted with respect to the reference along Ω1 and remain red-shifted regardless of the number of holes/electrons included in the active space, showing the strong active space dependence discussed in the previous section. Moving to the higher energy NUV-pump/NUV-probe window (Fig. 5 upper panels), we observe smaller differences than those displayed in Fig. 4: The overall spectra seem analogous except for the intensity displayed by peak 19 along the 1La trace, which is obscured upon active space enlargement, in agreement with the reference computations, thus showing an overestimation of the transition dipole moments for this state in the smaller RAS approaches. Overall, we observe very small gains in moving to RAS approaches containing a larger number of electrons/holes, which also display an elevated cost with respect to the RAS(4,7|0,0|4,4) initially considered; the latter is here shown to be good enough for a qualitative description of the spectra and to be used for dimeric or multimeric species.

Fig. 5

Two-dimensional NUV-pump/Vis-probe (lower panels) and NUV-pump/NUV-probe (upper panels) spectra of guanine in gas phase for a RAS(4,7|0,0|4,4), b RAS(5,7|0,0|5,4), c RAS(6,7|0,0|6,4), d RAS(7,7|0,0|7,4), e RAS(8,7|0,0|8,4) schemes, and f RAS(0,0|14,11|2,8) reference values. Dashed lines represent positions of the 1La and 1Lb traces along Ω1 in RAS(0,0|14,11|2,8) reference computations for the sake of visual aid

4 Conclusions

We report a thorough assessment of the high-lying singlet excited-state manifold of the canonical purine nucleobase guanine in vacuo at the multiconfigurational wave function RASSCF/RASPT2 level. This approach allows us to obtain accurate estimates for transition energies and their associated dipole moments, which are then employed to simulate the two-dimensional NUV-pump/Vis-probe and NUV-pump/NUV-probe spectra of guanine for waiting times t2 = 0. The nonlinear electronic spectra show several ESA signals that may represent the particular fingerprint of this DNA/RNA nucleobase. As has been shown for other nucleobases [23, 24], the NUV-pump/Vis-probe spectral window features the clearest ESA fingerprints to be employed in order to discern among the different bases, to clearly track their monomer-specific decays, and to separate their contributions from those arising due to intermolecular interactions in complex dimeric/multimeric species. As expected, guanine displays similar signals to the other purine nucleobase, adenine [24], being slightly displaced along both pump and probe frequencies and displaying an analogous dependence on the dynamic correlation added, which blue-shifts both 1La and 1Lb signals along Ω1 upon its increase, while showing its main signal of partial doubly excited HOMO to LUMO character in the UV range along Ω3, in contrast to the other nucleobases previously considered. Several reduced RAS schemes featuring all active orbitals in the RAS1/RAS3 subspaces to reduce costs were benchmarked, showing negligible gains upon the inclusion of more than four holes/electrons in the active space, while a judicious use of the imaginary level shift is shown to artificially blue-shift the excitation energies, moving them closer to the RAS(0,0|14,11|2,8) reference and thus making this approach applicable to larger dimeric and oligomeric systems.