1 Introduction

Upon UV excitation, DNA can follow different relaxation paths. These lead either to non-reactive decay back to the absorbing ground-state structure, whether radiative (i.e., fluorescence–phosphorescence) [1, 2] or non-radiative (i.e., through conical intersections) [3–7], or to photochemical processes yielding mutagenic and/or carcinogenic intrastrand pyrimidine photoproducts (such as cyclobutane pyrimidine dimers, pyrimidine–pyrimidone (6–4) adducts, and their Dewar isomers) [8–11]. Understanding and characterizing these phenomena, and the factors that determine the fate of the excitation (e.g., the nucleobase sequence or the geometry at the time of irradiation), remains a fundamental yet elusive goal, owing to the considerable complexity of DNA and the number of correlated variables affecting its photoresponse [12].

The third-order nonlinear pump–probe technique has been extensively employed to study single- and double-stranded oligo- and polynucleotides [1, 2, 4–6, 13–17], providing valuable insights into the photophysics and photochemistry of DNA. This technique employs a pair of ultrashort laser pulses. The first (pump) pulse induces dynamics simultaneously in all electronic states under its envelope, and these dynamics are monitored indirectly by recording the spectral signatures of each state (i.e., absorption and emission signals) as a function of the time delay (i.e., waiting time) between the pump and the second (probe) pulse. The approach, however, cannot resolve concomitant de-excitation paths, because the concurrent signals arising from different paths and/or states of a system simultaneously excited by the ultrashort pump pulse (a few femtoseconds long and thus several thousand cm−1 broad) cannot be disentangled. Consequently, pump–probe spectroscopy cannot provide unambiguous fingerprints specific to each possible photoinduced decay.
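The link between "a few femtoseconds" and "several thousand cm−1" follows from the transform limit. The sketch below assumes a transform-limited (unchirped) Gaussian pulse, for which the product of the temporal and spectral FWHM is about 0.441. The pulse shape is our assumption; chirped pulses would be longer for the same bandwidth.

```python
# Spectral width of a transform-limited Gaussian pulse (assumption: no chirp).
C_CM_PER_S = 2.99792458e10   # speed of light, cm/s
GAUSSIAN_TBP = 0.441         # time-bandwidth product (FWHM x FWHM) of a Gaussian

def bandwidth_cm1(fwhm_fs: float) -> float:
    """Spectral FWHM (cm^-1) of a transform-limited Gaussian pulse with temporal FWHM in fs."""
    delta_nu_hz = GAUSSIAN_TBP / (fwhm_fs * 1e-15)  # spectral FWHM in Hz
    return delta_nu_hz / C_CM_PER_S                  # Hz -> wavenumbers

for tau_fs in (3.0, 5.0, 10.0):
    print(f"{tau_fs:4.1f} fs -> {bandwidth_cm1(tau_fs):6.0f} cm^-1")
```

Pulses of 3, 5, and 10 fs give roughly 4900, 2900, and 1500 cm−1. A few-femtosecond pump therefore spans several electronic states at once.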

A more advanced third-order nonlinear optical technique, capable of overcoming these limitations of pump–probe spectroscopy, is two-dimensional electronic spectroscopy (2DES) [18, 19]. Here, the sample interacts with three pulses. The time delay t2 between the second and third pulses can be regarded as the analogue of the pump–probe waiting time. For a fixed t2, a single experiment is performed by scanning two delays: t1, between the first and second pulses, and t3, between the third pulse and a local oscillator. The local oscillator heterodynes the field emitted by the sample in response to the three incident pulses, which facilitates the extraction of the phase and time dependence of that field. Fourier transforming the signal along t1 and t3 then yields a two-dimensional (Ω1, Ω3) spectrum. In this spectrum, all the electronic transitions involved in the three field–matter interactions are recorded with both high spectral and high temporal resolution [20–22].
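The final Fourier-transform step can be sketched with NumPy. The example assumes a toy signal S(t1, t3) made of two damped complex oscillations in arbitrary units, as if already down-converted to a rotating frame. It also ignores the rephasing/non-rephasing combination and phasing used to build absorptive maps. The frequencies, damping, and amplitudes are illustrative assumptions, not experimental parameters.

```python
import numpy as np

def two_d_spectrum(signal: np.ndarray, dt1: float, dt3: float):
    """Fourier transform S(t1, t3) (rows: t1, columns: t3) into S(Omega1, Omega3).

    Returns (omega1_axis, omega3_axis, spectrum) with ascending frequency axes.
    """
    spec = np.fft.fftshift(np.fft.fft2(signal))
    w1 = np.fft.fftshift(np.fft.fftfreq(signal.shape[0], d=dt1))
    w3 = np.fft.fftshift(np.fft.fftfreq(signal.shape[1], d=dt3))
    return w1, w3, spec

# Toy signal (arbitrary units): a negative diagonal component and a weaker
# positive off-diagonal component, both damped with rate gamma.
n, dt, gamma = 256, 1.0, 0.01
t1 = np.arange(n)[:, None] * dt
t3 = np.arange(n)[None, :] * dt
components = [(-1.0, 0.125, 0.125),   # (amplitude, f1, f3): diagonal peak
              (+0.5, 0.125, 0.25)]    # off-diagonal peak, same f1
signal = sum(a * np.exp(2j * np.pi * (f1 * t1 + f3 * t3)) * np.exp(-gamma * (t1 + t3))
             for a, f1, f3 in components)

w1, w3, spec = two_d_spectrum(signal, dt, dt)
i, j = np.unravel_index(np.argmax(np.abs(spec)), spec.shape)
print(w1[i], w3[j])  # (Omega1, Omega3) of the strongest peak
```

Each oscillation frequency along t1 and t3 becomes a peak coordinate in the (Ω1, Ω3) plane, and the sign of its amplitude is preserved.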

The interpretation of a spectrum relies heavily on the theoretical computation of properties, such as energies and transition dipole moments, that characterize the electronic states of the system and the possible transitions among them. In this sense, spectroscopy is a fine example of the interplay between experiment and theory. 2DES is no exception and depends even more on theoretical simulations, owing to the large number of signals that can be recorded [23–27]. In fact, near-ultraviolet (NUV) and visible (Vis) radiation ideally allows the detection of all electronic transitions among the electronic states lying up to around 11 eV above the ground state. Depending on the size and properties of the system, over a hundred excited states may be encountered in this 0–11 eV window. This yields a large amount of potentially recordable information, whose interpretation requires a theoretical characterization of the excited-state manifold. That task is a considerable computational challenge, for two reasons: the required computational effort (memory, CPU time, etc.) and the difficulty of describing many excited states of different nature (e.g., ionic, covalent, doubly excited) on an equal footing. Complete active space self-consistent field (CASSCF) [28] and complete active space second-order perturbation theory (CASPT2)-related methods [29–31] are a common choice in photophysical and photochemical studies. With these methods, highly accurate results for high-lying states have been documented to require active-space orbitals outside the valence space [32], whose number and character are not known a priori.
Moreover, one has to face problems such as valence–Rydberg mixing [33] and the unbalanced treatment of ionic and covalent states [24, 34], both potentially critical when computing many states. The CASPT2//CASSCF protocol has repeatedly proven able to predict photophysical and photochemical data correctly for systems of very diverse size and nature [35–38]. Nevertheless, simulating 2DES experiments normally requires a particularly large number of states, and hence less computationally expensive treatments such as the RASPT2//RASSCF methods [39]. The first task in simulating a two-dimensional spectrum is therefore to obtain reliable excited-state data for the system under study [40]. The need for such data is reinforced by the almost complete absence of experimental and theoretical information on the high-lying excited-state manifold [41–43], at least for biologically relevant molecules such as nucleobase-related systems. 2DES could contribute significantly to the characterization of these systems.
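The ~11 eV upper bound can be checked with simple arithmetic. At t2 = 0, an excited-state absorption reaches a state lying Ω1 + Ω3 above the ground state. The sketch assumes a pumped state near 5 eV, close to the lowest bright states reported in Sect. 3, and the probe windows used in Sect. 3: Vis 10,000–30,000 cm−1 and NUV 30,000–46,000 cm−1. The pump energy is an assumption for illustration.

```python
EV_TO_CM1 = 8065.544  # 1 eV expressed in cm^-1

def highest_state_ev(pump_ev: float, probe_max_cm1: float) -> float:
    """Energy (eV) of the highest state reachable by ESA: pumped state + probe photon."""
    return pump_ev + probe_max_cm1 / EV_TO_CM1

probe_windows_cm1 = {"Vis": (10_000, 30_000), "NUV": (30_000, 46_000)}
pump_ev = 5.0  # assumed energy of the pumped bright state, eV

for name, (lo, hi) in probe_windows_cm1.items():
    print(f"{name} probe reaches states between "
          f"{pump_ev + lo / EV_TO_CM1:.2f} and {highest_state_ev(pump_ev, hi):.2f} eV")
```

With these inputs, the NUV probe reaches states up to about 10.7 eV, just below the 11 eV limit. Pumping at higher energy would raise this bound accordingly.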

Here we take the first step toward simulating the two-dimensional electronic spectra of systems containing canonical pyrimidine nucleobases. For the reasons explained above, this step is a careful calibration of the electronic excited states of thymine, uracil, and cytosine that can be populated in 2DES with NUV and Vis pulses. The spectral signatures characterized here can also be used to analyze the widely used pump–probe spectra and other third-order nonlinear spectroscopic signals. The NUV-pump/Vis-probe and NUV-pump/NUV-probe 2D electronic spectra of the canonical nucleobase chromophores are simulated (for t2 = 0), highlighting their main features and the principal differences among them. Finally, the spectra of ideal dimeric and trimeric aggregates of these molecules are simulated through an “exciton Hamiltonian model.”

2 Computational details

A major goal of the present paper is to compute high-level results for the excited states up to 11 eV of the canonical nucleobases thymine, uracil, and cytosine (see Fig. 1). This upper energy limit was chosen to fully describe the states involved in NUV-pump/Vis-probe and NUV-pump/NUV-probe 2DES experiments. It also limits contributions from the higher-energy region, where the ionization threshold (~9 eV) may be crossed and ionized species may contribute [44]. Based on experimental data on olefins, such contributions are expected to be rather small just above the ionization threshold and to increase gradually over the first few eV [45]. Our main objective is a highly accurate estimate for a large number of excited states. To this end, we employ large basis sets, which provide the flexibility needed to describe the diffuse character of excited states, a character that is more pronounced for higher-lying states. Additionally, we use an active space that extends beyond the valence space. This recovers as much correlation as possible at the CASSCF level and thereby improves the reference function subsequently used in the CASPT2 perturbative treatment. In doing so, however, Rydberg-type states often appear, strongly affecting the reference function and deteriorating the description of the valence states of interest. To address this, the computations follow a recently presented two-step approach [46]. Firstly, the valence–Rydberg mixing problem, usually present when dealing with high-lying excited states, is treated by optimizing and then deleting the Rydberg orbitals involved in the Rydberg states found in the explored energy region [47].
It must be stressed that these contributions, and Rydberg-type states in general, have been found to be negligible in solution [48], the medium in which DNA/RNA systems are usually measured. To describe the Rydberg orbitals, the ANO-L basis set [49] contracted to C,N,O[4s,3p,2d]/H[2s1p] used here was extended with a set of uncontracted diffuse 8s, 8p, and 8d functions placed at the center of charge [47]. Secondly, once the optimized Rydberg orbitals have been deleted, the active space is systematically enlarged. The starting space includes all the valence π orbitals in the RAS2 subspace, and four extra orbitals are added to the RAS3 subspace at each step until the results converge. RAS2 and RAS3 refer, as usual, to the active subspaces of the restricted active space self-consistent field (RASSCF) method, denoted RAS(a,b|c,d|e,f) [32]. Here a is the maximum number of holes allowed in the RAS1 subspace; b, the number of orbitals in RAS1; c, the number of electrons in RAS2; d, the number of orbitals in RAS2; e, the maximum number of electrons that can be distributed among the RAS3 orbitals; and f, the number of orbitals in RAS3. In all cases, at most two electrons were allowed in RAS3. Following previous work on adenine [46], lone-pair n orbitals were excluded from the active space. The corresponding states are relatively dark (with exceedingly small oscillator strengths) and are blue-shifted in solution, which indicates a minor involvement in the energy range of interest [50–52].
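The RAS(a,b|c,d|e,f) notation and the enlargement protocol can be captured in a small data structure. The sketch below parses the labels used in this work and builds the sequence from CAS(10,8), written as RAS(0,0|10,8|0,0), to RAS(0,0|10,8|2,12). It adds four RAS3 orbitals per step and allows at most two electrons in RAS3. The class, the helper names, and the 0.05 eV convergence threshold are hypothetical illustrations, not part of MOLCAS or the authors' workflow.

```python
import dataclasses
import re

RAS_PATTERN = re.compile(r"RAS\((\d+),(\d+)\|(\d+),(\d+)\|(\d+),(\d+)\)")

@dataclasses.dataclass(frozen=True)
class RASSpace:
    holes_ras1: int  # a: maximum number of holes allowed in RAS1
    orbs_ras1: int   # b: number of orbitals in RAS1
    elec_ras2: int   # c: number of electrons in RAS2
    orbs_ras2: int   # d: number of orbitals in RAS2
    elec_ras3: int   # e: maximum number of electrons allowed in RAS3
    orbs_ras3: int   # f: number of orbitals in RAS3

    @classmethod
    def parse(cls, label: str) -> "RASSpace":
        m = RAS_PATTERN.fullmatch(label.replace(" ", ""))
        if m is None:
            raise ValueError(f"not a RAS(a,b|c,d|e,f) label: {label!r}")
        return cls(*map(int, m.groups()))

    @property
    def active_electrons(self) -> int:
        # RAS1 orbitals are doubly occupied in the reference: 2*b electrons plus c.
        return 2 * self.orbs_ras1 + self.elec_ras2

    @property
    def active_orbitals(self) -> int:
        return self.orbs_ras1 + self.orbs_ras2 + self.orbs_ras3

    def __str__(self) -> str:
        return (f"RAS({self.holes_ras1},{self.orbs_ras1}|{self.elec_ras2},{self.orbs_ras2}"
                f"|{self.elec_ras3},{self.orbs_ras3})")

def enlarge_ras3(space: RASSpace, step: int = 4, max_elec_ras3: int = 2) -> RASSpace:
    """Hypothetical helper: add `step` RAS3 orbitals, allowing at most two electrons there."""
    return dataclasses.replace(space, orbs_ras3=space.orbs_ras3 + step, elec_ras3=max_elec_ras3)

def converged(prev_ev, new_ev, tol_ev=0.05) -> bool:
    """Hypothetical criterion: the largest change in any excitation energy is below tol_ev."""
    return max(abs(a - b) for a, b in zip(prev_ev, new_ev)) < tol_ev

space = RASSpace.parse("RAS(0,0|10,8|0,0)")  # CAS(10,8) in RAS notation
for _ in range(3):
    space = enlarge_ras3(space)
    print(space, space.active_electrons, space.active_orbitals)
```

The loop prints the three enlarged spaces shown in panels b–d of Figs. 2–4. Each one keeps 10 active electrons while the active orbital count grows from 12 to 20.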

Fig. 1 Chemical structure of the DNA and RNA pyrimidine nucleobases studied in the present work

Cs symmetry was imposed on the three molecules in all the computations presented here. Ground-state geometries were optimized at the CASSCF level, with the most important valence π orbitals in the active space (CAS(6,6)). The excited-state wave functions and the transition dipole moments [53] characterizing the different electronic transitions were computed at the CASSCF and RASSCF levels [54, 55], using the state-averaged (SA) procedure to optimize the orbitals. The energies were obtained with the single-state variant of second-order perturbation theory, SS-CASPT2/RASPT2, which includes the dynamic correlation effects missing from the CASSCF and RASSCF descriptions. In the CASPT2 calculations, the IPEA shift was set to 0.0 [56], and an imaginary level-shift correction of 0.2 a.u. was used to minimize the effects of possible intruder states [57]. The standard CASPT2 zeroth-order Hamiltonian was employed as originally implemented, and the core orbitals were frozen. This CASPT2 approach has been validated over the last decades in many studies on organic molecules, providing correct predictions, descriptions, and interpretations of experimental photophysical and photochemical data. The Cholesky decomposition was used to speed up the calculation of two-electron integrals [58–62]. All quantum-mechanical calculations were performed with the MOLCAS package [63–65]. Further details on the approach are reported in the Supporting Information. With this protocol, the final results for thymine, uracil, and cytosine were obtained at the SA-29-RAS(0,0|10,8|2,12)/SS-RASPT2 level of theory after deleting 29, 23, and 22 optimized Rydberg orbitals, respectively.
The state-averaging procedure included more states than those discussed in the Results section, covering a much wider energy window at the CASSCF level. This ensured that no state undergoing pronounced energetic stabilization upon inclusion of dynamic correlation was omitted.
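To illustrate why an imaginary level shift tames intruder states, the sketch below uses a toy diagonal zeroth-order Hamiltonian, for which the second-order energy is a simple sum over perturbers k. With a shift ε, each term −|Vk|²/Δk is replaced by its real part with Δk + iε in the denominator, that is, −|Vk|²Δk/(Δk² + ε²). Here Δk = Ek(0) − E0(0). The couplings and gaps are invented for illustration, and the real CASPT2 zeroth-order Hamiltonian is not diagonal. The sketch also omits the back-correction for the shift that MOLCAS applies.

```python
import numpy as np

def e2_shifted(couplings, gaps, eps=0.0):
    """Toy second-order energy (hartree) with an imaginary level shift.

    couplings: <k|V|0> for each perturber k (hartree)
    gaps:      Delta_k = E_k^(0) - E_0^(0) (hartree); near-zero values mimic intruders
    eps:       imaginary shift (hartree); eps = 0 recovers the unshifted PT2 sum
    """
    v = np.asarray(couplings, float)
    d = np.asarray(gaps, float)
    # Re[-|V|^2 / (Delta + i*eps)] = -|V|^2 * Delta / (Delta^2 + eps^2)
    return float(np.sum(-v**2 * d / (d**2 + eps**2)))

# Invented example: two well-behaved perturbers and one near-degenerate intruder.
couplings = [0.05, 0.03, 0.01]
gaps = [1.0, 0.5, 1e-4]
print(e2_shifted(couplings, gaps))           # dominated by the intruder term
print(e2_shifted(couplings, gaps, eps=0.2))  # intruder term damped, others barely changed
```

The shift leaves large-gap terms almost untouched, while a near-zero denominator, which would otherwise blow up the energy, contributes almost nothing.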

The calculated transition dipole moments and energies were used to simulate two-dimensional electronic spectra with the Spectron 2.7 program [18], which computes quasi-absorptive 2DES maps by the sum-over-states approach in the dipole approximation [23]. Details on the working equations are presented in the Supporting Information (SI). A static picture is adopted to describe the coherence dynamics during t1 and t3, i.e., the vibrational dynamics of the system is assumed to be slower than these timescales. The waiting time t2 between the pump pulse pair and the probe pulse was set to zero. Within the static approximation, only computations at the Franck–Condon (FC) point are needed to simulate spectra at t2 = 0. Protocols for computing 2DES that account for spectral diffusion and non-adiabatic effects on an ultrashort timescale have recently been proposed [66, 67]. The rather crude static approximation nevertheless suffices for our aims: to study the effect of various computational parameters on the shape of the 2D maps, and to qualitatively identify the main signals characterizing the spectra of canonical pyrimidine nucleobases in the Franck–Condon region. We seek a computational reference for the purely electronic contributions to the excitation energies of all accessible states, from which more efficient computational protocols can subsequently be derived. The fine lineshape arising from ultrashort vibrational dynamics and from sample inhomogeneities may be addressed at a later stage [66], once the best-suited protocol for the electronic structure is established. A constant line broadening of 200 cm−1 was used throughout. All calculated signals use the all-parallel (xxxx) pulse polarization configuration and are plotted on a logarithmic scale.
Ground-state bleaching (GSB) and stimulated emission (SE) contributions appear as negative (blue) peaks, and excited-state absorptions (ESAs) appear as positive (red) peaks in the 2D spectra.
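The sum-over-states picture at t2 = 0 can be sketched in a few lines. For each pump-accessible state e, a GSB/SE peak appears on the diagonal at (Ω1, Ω3) = (Ee, Ee) with negative sign. An ESA peak to each higher state f appears at (Ee, Ef − Ee) with positive sign, weighted by the squared transition dipole moments. This is a simplified illustration, not Spectron's implementation. It assumes scalar TDM magnitudes without orientational averaging, a Lorentzian line with a 200 cm−1 half-width on both axes, and a factor of 2 for GSB + SE. The state energies and TDMs in the example are hypothetical placeholders, not the values of Tables 1–3.

```python
import numpy as np

def lorentzian(x, x0, hwhm):
    """Area-normalized Lorentzian centered at x0 with half-width hwhm."""
    return hwhm / np.pi / ((x - x0) ** 2 + hwhm**2)

def sos_2d_map(w1, w3, e_g, mu_g, e_f, mu_ef, hwhm=200.0):
    """Toy 2D map at t2 = 0 on cm^-1 axes (GSB/SE negative, ESA positive).

    e_g[i], mu_g[i]: energy (cm^-1) and |TDM| (a.u.) of pump-accessible state i
    e_f[j]:          energy (cm^-1) of higher-lying state j
    mu_ef[i, j]:     |TDM| (a.u.) between state i and state j
    """
    W1, W3 = np.meshgrid(w1, w3, indexing="ij")
    s = np.zeros(W1.shape)
    for i, (ei, mi) in enumerate(zip(e_g, mu_g)):
        pump = mi**2 * lorentzian(W1, ei, hwhm)                      # pump-side weight
        s -= 2.0 * pump * mi**2 * lorentzian(W3, ei, hwhm)           # GSB + SE (diagonal)
        for j, ef in enumerate(e_f):
            s += pump * mu_ef[i, j] ** 2 * lorentzian(W3, ef - ei, hwhm)  # ESA i -> j
    return s

# Hypothetical inputs: one bright state and two higher-lying states.
w1 = np.arange(35_000.0, 45_001.0, 100.0)
w3 = np.arange(10_000.0, 46_001.0, 100.0)
s = sos_2d_map(w1, w3, e_g=[40_000.0], mu_g=[1.0],
               e_f=[60_000.0, 78_000.0], mu_ef=np.array([[1.5, 0.8]]))
```

With these inputs, the strongest positive (ESA) peak sits at (40,000, 20,000) cm−1, and the negative GSB/SE peak lies on the diagonal at 40,000 cm−1.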

Two-dimensional electronic spectra were generated for simple models of unstacked and stacked uracil–cytosine dimers. The spectrum of the unstacked (and thus non-interacting) dimer was obtained by combining the separate spectra of the two monomers. The coupled dimer was computed with an approximate Frenkel exciton model [68], introducing exciton couplings between the first excited state of uracil and the two lowest-lying excited states of cytosine. The couplings were computed in the point-dipole approximation [68]. The two bases were assumed to be perfectly stacked with an interbase distance of 4 Å, and the interaction was assumed not to affect the energies of the higher-lying states. Two mixed doubly excited states, corresponding to the simultaneous excitation of different excitons, were added to the manifold of higher-lying states. A quartic coupling [18] of −1000 cm−1 was applied to them, so that double excitations on neighboring sites lie lower in energy than the sum of the energies of the two single excitations. Further details of the exciton model are given in the SI (Fig. S7).
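A minimal sketch of such a model is shown below. The point-dipole coupling is J = [μa·μb − 3(μa·n)(μb·n)]/R³ in atomic units, where n is the unit vector along the separation R. The sketch builds a 3 × 3 Frenkel Hamiltonian for uracil S1 and cytosine S1/S2, with no intramolecular coupling between the two cytosine states, and diagonalizes it. It also places the two mixed double excitations 1000 cm−1 below the sum of the corresponding site energies. All energies and TDM vectors are hypothetical placeholders, not the values computed in this work. Perfect stacking is modeled as a 4 Å displacement along the ring normal with in-plane TDMs.

```python
import numpy as np

HARTREE_TO_CM1 = 219474.63
ANGSTROM_TO_BOHR = 1.0 / 0.529177

def point_dipole_coupling(mu_a, mu_b, r_ab_angstrom):
    """Point-dipole coupling J (cm^-1); TDMs in a.u., separation vector in angstrom."""
    r = np.asarray(r_ab_angstrom, float) * ANGSTROM_TO_BOHR
    dist = np.linalg.norm(r)
    n = r / dist
    mu_a, mu_b = np.asarray(mu_a, float), np.asarray(mu_b, float)
    j_hartree = (mu_a @ mu_b - 3.0 * (mu_a @ n) * (mu_b @ n)) / dist**3
    return j_hartree * HARTREE_TO_CM1

# Hypothetical placeholder inputs (NOT the values of Tables 2 and 3).
site_energy = {"U": 41_000.0, "C1": 38_500.0, "C2": 43_000.0}   # cm^-1
tdm = {"U": [1.2, 0.3, 0.0], "C1": [0.5, 0.4, 0.0], "C2": [0.2, 1.1, 0.0]}  # a.u., in-plane
stack = [0.0, 0.0, 4.0]  # perfect stacking: 4 angstrom along the ring normal

labels = ["U", "C1", "C2"]
h = np.diag([site_energy[k] for k in labels])
for c in ("C1", "C2"):  # intermolecular couplings only; no C1-C2 term
    jc = point_dipole_coupling(tdm["U"], tdm[c], stack)
    h[0, labels.index(c)] = h[labels.index(c), 0] = jc

exciton_energies, exciton_vectors = np.linalg.eigh(h)

# Mixed double excitations (one quantum on each base), lowered by the quartic coupling.
QUARTIC_CM1 = -1000.0
doubles = {f"U+{c}": site_energy["U"] + site_energy[c] + QUARTIC_CM1 for c in ("C1", "C2")}
print(exciton_energies.round(1), doubles)
```

Because the stacked TDMs are perpendicular to R, the coupling reduces to μa·μb/R³. For unit dipoles at 4 Å, this gives about 508 cm−1.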

3 Results

3.1 Isolated canonical pyrimidine nucleobases

3.1.1 Thymine

Table 1 reports the thymine excited states up to 11 eV obtained from SA-29-RAS(0,0|10,8|2,12)/SS-RASPT2 computations. This level of theory is used as the reference because, following the benchmarking protocol described in Sect. 2, it provides results converged with respect to the systematic enlargement of the RAS3 subspace [32, 46]. The RAS wave functions defining the nature of the computed states are also reported in Table 1. As expected, the wave functions of the lower-lying states are mainly composed of one or two configuration state functions. At higher energies, several configurations contribute to the excited-state wave functions, including doubly excited configuration state functions.

Table 1 Vertical excitation energies (E VA, eV), transition dipole moments (TDM, a.u.), leading wave function configurations, and corresponding coefficients for the different thymine excited states calculated at the SA-29-RAS(0,0|10,8|2,12)/SS-RASPT2 level of theory with 29 Rydberg-like deleted orbitals (see Sect. 2)

The electronic structures obtained upon gradual enlargement of the RAS3 subspace are compared through the NUV-pump/Vis-probe (10,000–30,000 cm−1 range) and NUV-pump/NUV-probe (30,000–46,000 cm−1 range) two-dimensional electronic spectra at t2 = 0 (see Fig. 2). This graphical representation is appealing because the peak positions are set by the computed vertical excitation energies and the peak intensities by the computed transition dipole moments. As Fig. 2 clearly shows, both the NUV-pump/Vis-probe and NUV-pump/NUV-probe spectra at the SA-29-RAS(0,0|10,8|2,12)/SS-RASPT2 level are almost unchanged with respect to those at the SA-29-RAS(0,0|10,8|2,8)/SS-RASPT2 level. Table S7 (see SI) further demonstrates the convergence of the SA-29-RAS(0,0|10,8|2,12)/SS-RASPT2 results by tabulating the excitation energies of the computed states obtained with the different active spaces.

Fig. 2 2D NUV-pump/Vis-probe (lower panel) and 2D NUV-pump/NUV-probe (upper panel) spectra of thymine in gas phase for different active space sizes: (a) CAS(10,8); (b) RAS(0,0|10,8|2,4); (c) RAS(0,0|10,8|2,8); (d) RAS(0,0|10,8|2,12). Peak numbering follows the excited-state assignment displayed in Table 1

Closer inspection of the NUV-pump/Vis-probe spectra in Fig. 2 (lower panel) shows that the Ω3 positions and intensities of the signals are little affected by the level of theory. The trace, however, steadily moves to higher Ω1 values, with the largest variation between the RAS(0,0|10,8|2,4) and RAS(0,0|10,8|2,8) active spaces (a blueshift of 721 cm−1). This reflects a similar increase in the vertical excitation energies of the pump-accessible state (state 2) and of the states reached by the probe (states 4, 5, 6, and 7). The energy gaps between them are therefore preserved, and the peak positions along Ω3 remain almost constant. This behavior may indicate that these states are of similar nature. More pronounced effects appear when comparing the NUV-pump/NUV-probe spectra at the different levels of theory (upper panels in Fig. 2). Most ESA peaks undergo a non-uniform blueshift upon active space enlargement. This splits peaks 8 and 9, which overlap in the least correlated cases (Fig. 2a, b). Peak 10, on the other hand, undergoes a significant blueshift at the RAS(0,0|10,8|2,4) level, followed by a gradual redshift at higher levels, for a net ~500 cm−1 shift from CAS(10,8) to RAS(0,0|10,8|2,12).
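The argument can be written as a short worked relation. In the static t2 = 0 picture, assume that an ESA peak from state 2 to state n sits at Ω1 = E2 and Ω3 = En − E2, with energies measured from the ground state. A common shift δ of the pumped and final states then moves the whole trace along Ω1 while leaving every Ω3 position unchanged:

```latex
\begin{aligned}
\Omega_1 &= E_2, \qquad \Omega_3^{(n)} = E_n - E_2, \qquad n = 4, 5, 6, 7,\\
E_k &\to E_k + \delta \ \ (k = 2, n) \;\Longrightarrow\; \Omega_1 \to \Omega_1 + \delta, \qquad \Omega_3^{(n)} \to \Omega_3^{(n)}.
\end{aligned}
```

Only a non-uniform shift, in which state n moves by a different amount than state 2, displaces a peak along Ω3. This is the behavior observed for the NUV-pump/NUV-probe signals.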

Based on the reference SA-29-RAS(0,0|10,8|2,12)/SS-RASPT2 results, we conclude that the Franck–Condon NUV-pump/Vis-probe spectrum of thymine at t2 = 0 is characterized by four ESA peaks. These are associated with electronic promotion from state 2 to states 4, 5, 6, and 7. All these peaks are fairly intense, with signal 7 the most intense, and they are energetically well separated, at least within the static approximation assumed here. The corresponding NUV-pump/NUV-probe spectrum shows five ESA peaks, due to excitation to states 8, 9, 10, 11, and 13, plus the expected GSB/SE signal. Among them, peak 9 and the GSB/SE are the most intense features of the 2D map and can therefore be considered the clearest fingerprints of the Franck–Condon thymine NUV-pump/NUV-probe spectrum. The intense signals arising from transitions between the initially accessed ππ* state and states 6, 7, and 9 are well separated. They also lie in a convenient probing window for monitoring the specific ππ* excited-state dynamics with transient absorption techniques.

Finally, it is important to compare the obtained results with the available theoretical and experimental data. Previous computations have focused on the lower excited states of the system, providing information on the four lowest ππ* electronic states. CASPT2 results have been presented by the Roos [69–71] and Thiel groups [72, 73]. Szalay et al. studied the excited states of the molecule with a variety of linear response coupled cluster methods, including CCSD(T) and CC3 approximations [41–43]. The corresponding vertical excitation energies are reported in Table S4 (see Supporting Information). Comparison with our results shows reasonable overall agreement: the present values tend to be slightly blue-shifted with respect to previous CASPT2 computations and generally lower than the coupled cluster results. These variations can be attributed to three factors. The first is (a) the different levels of theory employed. The second is (b) the much larger number of states computed here, which can significantly affect the outcome of a state-averaged CASSCF computation. The third is (c) the slightly different geometries used in the cited papers (see Figure S4), which have recently been shown to shift the excited-state manifold by up to 2000 cm−1 [46]. Experimentally, three to four of the lower excited states of thymine have been described [74, 75]. Since we work with the bare molecule in vacuo, the most appropriate comparison is with gas-phase data. However, the deletion of Rydberg orbitals precludes comparison with gas-phase spectra in regions where Rydberg states contribute. To the authors’ knowledge, only one study reports gas-phase experimental vertical excitation energies for states above S1. In that study, Abouaf et al. employed electron energy loss spectroscopy [74] and characterized three bands associated with ππ* singlet transitions at 4.95, 6.2, and 7.4 eV. The energy regions of these bands agree with the vertical excitation energies computed here for states 2, 3, 4, and 5 (i.e., 5.00, 6.20, 6.51, and 7.31 eV). Gustavsson et al., on the other hand, reported the absorption maximum of the first band in solution at 4.7 eV [75]. This agrees with previous theoretical analyses that predict pronounced redshifts for strongly ionic states such as the ππ* state of thymine [51, 52]. Further comparisons with experimental results in solution can be found in previous works by Roos et al. [69–71] and references therein.

3.1.2 Uracil

Table 2 summarizes the reference excited-state results up to 11 eV obtained for uracil from SA-29-RAS(0,0|10,8|2,12)/SS-RASPT2 computations. As shown in Fig. 3, and analogously to thymine, both the NUV-pump/Vis-probe and NUV-pump/NUV-probe reference spectra change only slightly with respect to the same spectra computed at the SA-29-RAS(0,0|10,8|2,8)/SS-RASPT2 level. The convergence of the reference computation is again assessed in detail in the SI (see Table S8), which lists the excitation energies of the computed states obtained with the different active spaces. To validate the methodology, these energies are also compared with previous theoretical studies employing a variety of techniques (see SI Table S5).

Table 2 Vertical excitation energies (E VA, eV), transition dipole moments (TDM, a.u.), leading wave function configurations, and corresponding coefficients for the different uracil excited states calculated at the SA-29-RAS(0,0|10,8|2,12)/SS-RASPT2 level of theory with 23 Rydberg-like deleted orbitals (see Sect. 2)
Fig. 3 2D NUV-pump/Vis-probe (lower panel) and 2D NUV-pump/NUV-probe (upper panel) spectra of uracil in gas phase for different active space sizes: (a) CAS(10,8); (b) RAS(0,0|10,8|2,4); (c) RAS(0,0|10,8|2,8); (d) RAS(0,0|10,8|2,12). Peak numbering follows the excited-state assignment displayed in Table 2

The NUV-pump/Vis-probe spectra in Fig. 3 show virtually identical Ω3 values and intensities for the transitions depicted. Ω1, by contrast, is more affected by active space enlargement: the trace of the S1 excited state (state 2 in Table 2) blue-shifts gradually from CAS(10,8) to RAS(0,0|10,8|2,12), by ~2000 cm−1 in total. An analogous blueshift affects all the final states probed in the NUV-pump/Vis-probe window (states 4, 5, 6, and 7; lower panels in Fig. 3). As in thymine, the signals therefore keep their Ω3 positions even though the states move to higher energies. This behavior is again attributed to the similar nature of the low-lying excited states accessed by the system, which explains their relative stability toward active space enlargement. The NUV-pump/NUV-probe window, on the other hand, is comparatively more affected (upper panels in Fig. 3). It clearly depends on the size of the active space. Signals 9 and 10 collapse into a single degenerate peak for the smallest CAS(10,8) space but split in the more correlated RAS(0,0|10,8|2,4) computation. Both RAS(0,0|10,8|2,4) and RAS(0,0|10,8|2,8) (upper panels in Fig. 3b, c) show degenerate signals (states 11 and 12) at ~36,000 cm−1 along Ω3. These separate at the RAS(0,0|10,8|2,12) level, with a clear ~1500 cm−1 splitting within the static approach employed here. In general, states 9, 10, and 12 blue-shift gradually as the active space increases, by ~2000, ~3000, and ~1000 cm−1, respectively, from CAS(10,8) to RAS(0,0|10,8|2,12). State 14 instead red-shifts and is completely covered by the GSB/SE signal in the more correlated cases.
An additional peak, attributed to state 16 (see Table 2), emerges in the high-energy window when the active space is extended beyond the valence orbitals. It red-shifts gradually upon active space enlargement and lies at ~43,000 cm−1 in the reference spectrum.

Based on the reference SA-29-RAS(0,0|10,8|2,12)/SS-RASPT2 results, we conclude that the Franck–Condon 2DES of uracil in the NUV-pump/Vis-probe setup at t2 = 0 is characterized by three ESA signals. These are associated with electronic promotions from state 2 to states 4, 5, and 6. All signals are very intense, the transition to state 6 being the most intense (cf. Table 2), and they are energetically well separated within the static approach employed here. It is worth noting that signal 6 has strong contributions from the doubly excited HOMO → LUMO (H → L) configuration. In this respect it resembles the aforementioned signal 7 of thymine, which is likewise the most intense transition. Signal 7 of uracil appears in the same energy window as peak 6 but is obscured owing to its exceedingly small transition dipole moment. The NUV-pump/NUV-probe spectrum shows four ESA peaks, due to excitation to states 9, 10, 12, and 16, as well as the GSB/SE signal. Among these, peaks 9, 10, and 12 and the GSB/SE are among the most intense and best-separated features of the 2D map. They can therefore be considered the clearest fingerprints of the Franck–Condon uracil NUV-pump/NUV-probe spectrum, closely paralleling the conclusions for thymine. Signal 14 is completely covered by the more intense GSB/SE, and its contribution is expected to be negligible. An additional very intense signal in the high-energy window, corresponding to the transition to state 16, is the only signal appearing above the GSB/SE. It is thus another fingerprint unique to uracil that may be used for its unequivocal characterization.

Our results were then compared with the available theoretical and experimental data. Most theoretical studies have focused on the lower-lying excited-state manifold, providing estimates for the four lowest-lying ππ* electronic excited states. CASPT2 results have been reported by Roos et al., using a computational scheme similar to the present one but with a smaller set of extra diffuse functions [70], whereas Thiel et al. employed more common Dunning-type basis sets [72, 73]. Single-reference linear response coupled cluster estimates have also been reported [76]. All these values are compared with the present findings in Table S5 (see SI). The results obtained here tend to be slightly blue-shifted with respect to previous CASPT2 values. This is likely due to the large state-averaging procedure employed to cover the whole NUV and Vis energy windows and to the large sets of extra diffuse functions in the basis set. Compared with high-level CCSD(T) estimates, the present values instead tend to be red-shifted. The use of different geometries (Figure S5 in the SI) is also expected to strongly affect the energetic position of the excited-state manifold, as shown elsewhere [46]. This further separates our estimates from those computed at other geometries in the literature. Experimentally, three bands have been recorded for ππ* transitions in vacuo at 5.1, 6.0, and 6.6 eV. On average, these lie within about a tenth of an eV of the values obtained here for states 2, 3, and 4 (5.2, 6.18, and 6.55 eV, respectively). The present results thus agree with the only experimental evidence known to the authors on high-lying excited states of gas-phase uracil [77]. Recent work by Gustavsson et al. [75] reported the aqueous absorption maximum of the first band at 4.79 eV. As shown above for thymine, this agrees with the present results given the strong redshift expected from solvation effects [51]. It is worth noting that the first nπ* band, at 4.8 eV, has been omitted because its dark character means it plays no role in the 2DES of uracil. The systematic removal of Rydberg-type orbitals to improve the description of the valence excited states prevents us from comparing our results with high-energy Rydberg studies and their excitation energies. A more in-depth comparison with previous theoretical and experimental evidence is available elsewhere [69–71].
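The "within about a tenth of an eV" statement can be checked directly from the numbers quoted in the text. The snippet below reuses only those values. It computes the mean absolute deviation for uracil and the apparent shift between the computed gas-phase energy of the first bright state and the measured aqueous absorption maximum. That shift mixes solvation with vibronic effects and with the difference between a band maximum and a vertical energy, so it is indicative only.

```python
def mean_abs_dev(computed, measured):
    """Mean absolute deviation (eV) between paired computed and measured band positions."""
    if len(computed) != len(measured):
        raise ValueError("computed and measured lists must have equal length")
    return sum(abs(c - m) for c, m in zip(computed, measured)) / len(computed)

# Uracil: states 2-4 from this work vs gas-phase bands [77] (values quoted in the text).
uracil_calc = [5.20, 6.18, 6.55]
uracil_gas = [5.1, 6.0, 6.6]
print(f"uracil MAD = {mean_abs_dev(uracil_calc, uracil_gas):.2f} eV")

# Apparent gas-to-water shift of the first band: computed state 2 vs aqueous maximum [75].
for base, gas_ev, water_ev in [("thymine", 5.00, 4.70), ("uracil", 5.20, 4.79)]:
    print(f"{base}: apparent redshift {gas_ev - water_ev:.2f} eV")
```

The deviation comes out at about 0.11 eV. The apparent redshifts are about 0.3 eV for thymine and 0.4 eV for uracil.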

3.1.3 Cytosine

Table 3 reports the cytosine excited states up to 11 eV obtained from SA-29-RAS(0,0|10,8|2,12)/SS-RASPT2 computations. This level of theory also provides converged results for cytosine with respect to the systematic enlargement of the RAS3 subspace: further extending the active space changes the vertical excitation energies negligibly (see Table S9 in the SI). Figure 4 shows the NUV-pump/Vis-probe and NUV-pump/NUV-probe reference spectra and how the signals fluctuate as correlation is added through active space enlargement. These fluctuations are larger than those found for the other pyrimidine nucleobases studied. They may be related to the larger number of excited states that cytosine has within the first 11 eV. To validate the methodology, the excitation energies calculated with the different active spaces were also compared with previous theoretical studies based on several techniques (Table S6 in the SI).

Table 3 Vertical excitation energies (E VA, eV), transition dipole moments (TDM, a.u.), leading wave function configurations, and corresponding coefficients for the different cytosine excited states calculated at the SA-29-RAS(0,0|10,8|2,12)/SS-RASPT2 level of theory with 22 Rydberg-like deleted orbitals (see Sect. 2)
Fig. 4

2D NUV-pump/Vis-probe (lower panel) and 2D NUV-pump/NUV-probe (upper panel) spectra of cytosine in the gas phase for different active space sizes: a CAS(10,8); b RAS(0,0|10,8|2,4); c RAS(0,0|10,8|2,8); d RAS(0,0|10,8|2,12). Peak numbering follows the excited-state assignment displayed in Table 3

The NUV-pump/Vis-probe spectra of cytosine show similar Ω3 positions and similar peak intensities for the most correlated cases, whereas Ω1 is again heavily affected by the increase of the active space, showing a progressive blueshift of ~2000 cm−1 between the least and the most correlated cases (Fig. 4a, d) for the two traces found along Ω1, which correspond to the lowest lying excited state (ππ*c,1, state 2) and to the second lowest lying and more intense excited state (ππ*c,2, state 3). The first trace, which gathers relatively little intensity in the 2D spectra, comprises fingerprints analogous to those previously described for uracil and thymine, where states 5, 7, and 8 provide monomer-specific fingerprints that could be monitored with a suitable pump–probe setup designed to enhance their contributions. Nevertheless, the main contributions in cytosine arise from the second trace, corresponding to the ππ*c,2 state, which provides appreciable fingerprints while obscuring the weaker contributions arising from the ππ*c,1 state. The second trace along Ω1 represents the transitions connecting the ππ*c,2 state (state 3, mainly an H−1 → L transition, cf. Table 3) to the rest of the high-lying singlet excited-state manifold. This trace lies close to the energy region where uracil and thymine absorb, which makes it crucial for interchromophoric interactions, as shown in the next section, and it is more strongly affected by the gradual increase of the active space than the corresponding traces of thymine and uracil. Signals 8, 9, and 10 are accessible from the ππ*c,2 state; signal 8 is relatively intense at the CAS(10,8) level but blue-shifts and loses intensity in the more correlated cases. The other signals follow a similar trend and are also strongly blue-shifted upon active space enlargement, with shifts of ~3500 and ~2000 cm−1 for states 9 and 10, respectively, in going from CAS(10,8) to RAS(0,0|10,8|2,12).
Signal 12 is the most intense contribution, closely followed by peak 10; these two are the main ESAs characterizing the spectrum (cf. Table 3). A significant shift of the ππ*c,2 state (the position of its trace along Ω1) can still be observed in going from RAS(0,0|10,8|2,8) to RAS(0,0|10,8|2,12), so its convergence cannot be fully ensured in the present study.
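The notion of a "trace" along Ω1 can be made concrete with a static (t2 = 0) stick picture: pumping state e places all of its ESA peaks on the line Ω1 = E_e, each at Ω3 = E_f − E_e. The sketch below assumes a sum-over-states model with weights proportional to |μ0e|²|μef|², a sign convention in which ESA is negative, and no orientational averaging or broadening; `Peak`, `esa_peaks`, and the other names are hypothetical, and the probe-window limits are left to the user.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Peak:
    omega1: float     # pump (excitation) frequency, cm^-1
    omega3: float     # probe (detection) frequency, cm^-1
    amplitude: float  # signed weight; ESA taken as negative here (assumed convention)

def esa_peaks(e_pumped, mu_0e, final_states):
    """ESA sticks on the trace Omega1 = e_pumped in a static (t2 = 0) picture.

    final_states: iterable of (E_f, mu_ef), with E_f the energy of a higher-lying
    state above the ground state (cm^-1) and mu_ef the transition dipole magnitude
    from the pumped state (a.u.). Each ESA sits at Omega3 = E_f - e_pumped with a
    weight proportional to |mu_0e|^2 |mu_ef|^2.
    """
    return [Peak(e_pumped, e_f - e_pumped, -(mu_0e ** 2) * (mu_ef ** 2))
            for e_f, mu_ef in final_states if e_f > e_pumped]

def in_probe_window(peaks, lo, hi):
    """Keep peaks whose Omega3 lies inside a probe window [lo, hi] (cm^-1)."""
    return [p for p in peaks if lo <= p.omega3 <= hi]

def strongest(peaks, n=2):
    """The n peaks with the largest |amplitude|."""
    return sorted(peaks, key=lambda p: abs(p.amplitude), reverse=True)[:n]
```

A weak pump transition (small μ0e, as for ππ*c,1) scales down every ESA on its trace at once, which is why the whole first trace of cytosine appears faint while the ππ*c,2 trace dominates.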

The NUV-pump/NUV-probe window is more strongly affected by the steady increase of the active space than previously shown for thymine and uracil (see upper panel of Fig. 4), displaying not just quantitative but also qualitative changes in the spectra. A clear dependence on the size of the active space can be readily recognized, influencing both the energetic positions and the transition dipole moments of the electron promotions from the ππ*c,1 and ππ*c,2 states to the rest of the singlet excited-state manifold. The first trace, corresponding to the ππ*c,1 state (state 2, cf. Table 3), is again completely obscured by the more intense ππ*c,2 state, contributing to the spectra only through its GSB/SE signal. The second trace shows some weak signals at the CAS(10,8) level for states 17, 18, and 19, which become completely covered by the strong GSB/SE signal upon active space enlargement. This shows that the extra correlation added by extending the active space can affect not only the energetics of the system but also related properties such as its transition dipole moments. For the more correlated cases (Fig. 4b, c, d), only the GSB/SE signal can be discerned, leaving a spectral window devoid of ESA signals.

Based on the reference SA-29-RAS(0,0|10,8|2,12)/SS-RASPT2 results, we conclude that the 2DES of cytosine in the Franck–Condon region employing a NUV-pump/Vis-probe setup at t 2 = 0 is characterized by two bright ESA signals in the second trace along Ω1, associated with electronic transitions connecting state 3 (ππ*c,2) to states 10 and 12, the latter being the more intense contribution (cf. Table 3). The NUV-pump/NUV-probe results reveal a relatively empty spectrum with no intense ESA contributions: only the GSB/SE signals of states 2 and 3 appear, the latter being particularly intense. As will be shown in the next section, the absence of strong ESA signals in the ππ*c,2 trace makes it a promising region to probe the mixed excited-state signals that arise when cytosine interacts with uracil, since the proximity in energy of their bright states enhances their interaction.

Previous studies have employed CASPT2 to obtain the lowest lying singlet excited states of cytosine [71], and single-reference linear response coupled cluster results have also been reported [41]; all of them are compared with the present study in Table S6 (see SI). As for thymine and uracil, the results obtained here are slightly blue-shifted with respect to previous CASPT2 computations, owing to the different basis sets and state-averaging conditions employed, and slightly red-shifted in comparison with the most correlated linear response CCSD(T) values. Our results have also been computed on slightly different structures (see Figure S6 in the SI) from those used in other theoretical studies, which may account for sizable differences in the relative energetic position of the singlet excited-state manifold, as documented elsewhere [46]. Experimentally, four bands have been registered for ππ* transitions in vacuo at 4.6–4.7, 5.2–5.8, 6.1–6.4, and 6.7–7.1 eV [77–79]. These features relate to signal 2 (4.5 eV), placed within 0.1–0.2 eV of the first experimental band; signal 3 (5.42 eV), predicted to lie within the estimated absorption maxima of the second band; signals 4 (6.31 eV) and 5 (6.42 eV), both within or at the edge of the range of the third band; and signal 7 (6.92 eV), the most likely candidate for the fourth absorption band registered experimentally, given its vertical excitation energy and transition dipole moment (cf. Table 3). Additionally, Clark and co-workers characterized six different absorption bands of cytosine in aqueous solution, placed at 4.65, 5.33, 5.60, 6.20, 7.69, and 8.06 eV [80].
By comparing the in vacuo and aqueous experimental data, it can readily be seen that the second gas-phase absorption band appears to have two different contributions in solution, that the first and third bands are relatively unaffected by solvation, and that the fourth and fifth bands appear to be blue-shifted, approaching a sixth band not registered in vacuo. Ongoing efforts aim to ascertain the specific states responsible for the various absorption bands in solution, which, given the complex shifts of the signals, cannot be interpreted in terms of bathochromic shifts alone. As previously discussed for thymine and uracil, the dark nπ* states have been omitted given the exceedingly small oscillator strengths of their transitions, which prevent them from being directly populated by the pump pulse.
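The state-to-band assignment for gas-phase cytosine can be reproduced by measuring how far each computed energy lies from each experimental range. This is a minimal sketch that matches on energy alone, which is our simplification: the text also uses transition dipole moments (e.g., to single out state 7), and the helper names are hypothetical.

```python
# Values quoted in the text (eV): gas-phase cytosine band ranges and computed states.
bands_ev = [(4.6, 4.7), (5.2, 5.8), (6.1, 6.4), (6.7, 7.1)]
states_ev = {2: 4.5, 3: 5.42, 4: 6.31, 5: 6.42, 7: 6.92}

def distance_to_band(energy, band):
    """0 if energy lies inside [lo, hi], otherwise the gap to the nearest edge."""
    lo, hi = band
    return max(lo - energy, 0.0, energy - hi)

def nearest_band(energy, bands):
    """(1-based index, distance) of the band closest to `energy`."""
    distances = [distance_to_band(energy, b) for b in bands]
    index = min(range(len(bands)), key=lambda i: distances[i])
    return index + 1, distances[index]

for state, energy in states_ev.items():
    band, gap = nearest_band(energy, bands_ev)
    print(f"state {state} ({energy} eV): band {band}, {gap:.2f} eV outside range")
```

Run on the quoted values, this places state 2 about 0.1 eV below the first band and state 5 about 0.02 eV above the upper limit of the third band, with states 3, 4, and 7 inside their respective ranges.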

3.1.4 Comparison of the three nucleobases

Here we compare the three nucleobases in order to identify the unique features of each that may be employed to disentangle their specific contributions in single- and double-stranded poly- and oligonucleotide chains.

We first consider the differences between thymine and uracil, the former appearing in DNA and the latter in RNA. Figures 2 and 3 show strong similarities in their 2DES in the different probing windows. This strong resemblance is in line with their similar chemical structures: the methyl group present in thymine slightly red-shifts the singlet excited-state manifold with respect to uracil, yielding an analogous but displaced picture. Nevertheless, a few qualitative differences can be discerned within the static approach employed here, even though more accurate time-resolved approaches will be required to ascertain whether the different fingerprints can be separated given their specific broadening, which will in turn depend on the excited-state dynamics of the system [66, 67]. These differences stem mainly from the blueshift of the H → L transition upon removal of the methyl group in uracil as compared with thymine. The lowest lying bright excited states of the two bases lie roughly in the same region along Ω1, shifted by 0.2 eV (~1600 cm−1), in qualitative agreement with the experimental shift of ~1000 cm−1 reported in aqueous solution by Gustavsson et al. [75]. Consequently, all states in which the H → L configuration contributes prominently (i.e., states 7, 9, and 10) are blue-shifted in uracil, while the other states do not shift substantially. As a result, in the visible window, signals 6 and 7, which appear clearly separated in thymine, overlap in uracil, while in the NUV window we observe in uracil a mixing of the bright states 9 and 10 with neighboring states that leads to intensity borrowing. As this effect arises from configuration mixing at the CASSCF level, whose magnitude cannot be fully trusted, we cannot exclude the possibility that the mixing is artificial.
One further fingerprint in the NUV region of uracil is the distinctive ESA signal above the GSB/SE signal, which is less intense in thymine.
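The computed and measured thymine–uracil shifts are quoted in different units (eV and cm−1); a direct conversion puts them on the same scale. This sketch uses the standard conversion factor of about 8065.54 cm−1 per eV; the function names are our own.

```python
EV_TO_WAVENUMBER = 8065.543937  # cm^-1 per eV (CODATA 2018 value)

def ev_to_wavenumber(energy_ev):
    """Convert an energy in eV to wavenumbers (cm^-1)."""
    return energy_ev * EV_TO_WAVENUMBER

def wavenumber_to_ev(nu_cm1):
    """Convert wavenumbers (cm^-1) to an energy in eV."""
    return nu_cm1 / EV_TO_WAVENUMBER

print(round(ev_to_wavenumber(0.2)))      # 1613
print(round(wavenumber_to_ev(1000), 3))  # 0.124
```

The computed 0.2 eV shift therefore corresponds to roughly 1600 cm−1, while the ~1000 cm−1 measured in water is about 0.12 eV, which is why the agreement is qualitative rather than quantitative.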

Cytosine, on the other hand, presents a very different spectrum from those of thymine and uracil, owing to the presence of two well-defined traces along Ω1. These two traces, which give rise to qualitatively different spectra, provide an easy way to differentiate cytosine from the other pyrimidines. The first trace along Ω1, corresponding to the lowest lying ππ*c,1 excited state, is pronouncedly red-shifted (by ~0.5 eV) with respect to the analogous transitions registered for uracil and thymine. This trace is weak, given the exceedingly small transition dipole moment of ππ*c,1 compared with those of the other pyrimidines and of the brighter ππ*c,2 state (cf. Table 3). The ππ*c,1 state also lies too far in energy from the bright ππ* excited states of thymine and uracil to interact with them, which makes it the preferred state to monitor cytosine-centered electronic processes, provided that the right combination of pump and probe pulses is adopted. The second trace, arising from the population of the ππ*c,2 state (state 3), is instead relatively bright and, being dominated by a strong GSB/SE signal, free of intense ESA signatures in the NUV region where intermolecular interactions might be probed.

Overall, it can be concluded that the most intense signals in the Vis window for all pyrimidine systems, namely the ESAs featuring states 7, 9, and 10, have large contributions from double excitations involving the HOMO and LUMO orbitals (cf. Tables 1, 2, 3). This motif is therefore common to all canonical pyrimidines (including cytosine, provided a suitable pulse setup is used to enhance the intensity of the first trace) as well as to adenine, as previously reported [46]. In cytosine, peak 13 corresponds to the excited state with the largest H → L contribution, confirming the general trend found for all nucleobases so far. Thus, the H → L transition is expected to represent the clearest, most intense, and most distinctive signature of each DNA base, which may be used to monitor the dynamics of its respective ππ* state in the Vis range. The NUV window, as documented for adenine [46], is strongly affected by the active space size; its importance for tracking the ππ* excited-state dynamics rests mainly on the expectation that, for dimeric and multimeric species, the mixed localized excited states, i.e., the fingerprints of interchromophore interactions, appear precisely in this high-energy window [25, 26].
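Picking "the excited state with the largest H → L contribution" amounts to comparing configuration weights, i.e., squared expansion coefficients, across states. The sketch below assumes the table data are stored as a hypothetical mapping {state label: {configuration label: coefficient}}; if a table already lists weights rather than coefficients, the squaring step should be dropped.

```python
def configuration_weight(coefficient):
    """Weight of a configuration in a CI-type expansion: the squared coefficient."""
    return coefficient ** 2

def dominant_state(states, configuration):
    """Label of the state in which `configuration` has the largest weight.

    `states` maps a state label to {configuration label: coefficient}
    (a hypothetical layout); a missing configuration counts as zero weight.
    """
    return max(states,
               key=lambda s: configuration_weight(states[s].get(configuration, 0.0)))
```

Squaring before comparing matters because coefficients can carry either sign; ranking by the raw coefficient would wrongly favor a small positive value over a large negative one.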

3.2 Dimers of canonical pyrimidine nucleobases

Finally, we highlight the spectral signatures due to non-covalent interactions in dimers containing canonical pyrimidine nucleobases. Figure 5 compares the bidimensional electronic spectra of an ideal perfectly stacked uracil–cytosine dimer (Fig. 5a), with a base separation of 4.0 Å, and of a non-interacting dimer. Some of these fingerprints can be observed in linear absorption or pump–probe spectra, but only bidimensional spectroscopy is capable of revealing all of them simultaneously.

Fig. 5

2D NUV-pump/NUV-probe spectra of the unstacked (b, c) and perfectly stacked (d, e) uracil–cytosine dimer (a) models. A separation R of 4 Å was used in the case of the stacked dimer. The direction of the transition dipole vectors is also drawn (red lines) originating from the center of mass of each base. The spectral regions encompassed by the dashed rectangles in the full spectra (b, d) are zoomed-in in the rightmost panels (c, e, respectively)

The spectrum of the non-interacting dimer can be conceived as a composite of the spectra of the monomers (Fig. 5b, c). The uracil trace (Ω1 ≈ 42,000 cm−1), associated with the transitions probed out of its lowest ππ* state (ππ*u), clearly bears the highest oscillator strength, thereby revealing a set of characteristic ESAs in the range 33,000–39,000 cm−1 (peaks 9–12 in Fig. 3). The spectral traces of cytosine (Ω1 ≈ 36,000 cm−1 and Ω1 ≈ 43,500 cm−1), on the other hand, associated with its two lowest ππ* states (ππ*c,1 and ππ*c,2), show off-diagonal bleach signals at (Ω1 ≈ 36,000 cm−1, Ω3 ≈ 43,500 cm−1) and (Ω1 ≈ 43,500 cm−1, Ω3 ≈ 36,000 cm−1), while being free of intense ESA peaks (Fig. 4).
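Why cytosine alone shows off-diagonal bleach peaks, and why the non-interacting dimer map is just the union of the monomer maps, can be illustrated with a stick model of the ground-state bleach. This is a simplified sketch under stated assumptions: static (t2 = 0) picture, weights |μ0a|²|μ0b|², stimulated emission, orientational averaging, and broadening ignored; interchromophore cross bleach is omitted because, for non-interacting chromophores, it cancels against the ESA to the doubly excited state. All names are hypothetical.

```python
from itertools import product

def bleach_peaks(bright_states):
    """Ground-state-bleach sticks of one chromophore in a static (t2 = 0) picture.

    bright_states: list of (E_e, mu_0e), energies in cm^-1 and transition dipole
    magnitudes in a.u. Every ordered pair (a, b) of states sharing the ground state
    gives a bleach at (Omega1, Omega3) = (E_a, E_b) with weight |mu_0a|^2 |mu_0b|^2:
    diagonal peaks for a == b, off-diagonal (cross) peaks for a != b.
    """
    return [(e_a, e_b, (mu_a ** 2) * (mu_b ** 2))
            for (e_a, mu_a), (e_b, mu_b) in product(bright_states, repeat=2)]

def composite_bleach(*chromophores):
    """Bleach map of non-interacting chromophores: the union of the monomer maps.

    Cross peaks between different chromophores are not generated, assuming they
    cancel against the paired ESA in the non-interacting limit.
    """
    return [peak for states in chromophores for peak in bleach_peaks(states)]
```

With cytosine described by two bright states near 36,000 and 43,500 cm−1, `bleach_peaks` yields the two diagonal peaks plus the two off-diagonal ones at the positions quoted above, whereas a single-state uracil model contributes only its diagonal peak.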

In the stacked dimer, the excitonic couplings (V) between ππ*u and ππ*c,1/ππ*c,2 are calculated to be of the order of a few hundred cm−1 (V(ππ*u/ππ*c,1) ~ 100 cm−1, V(ππ*u/ππ*c,2) ~ 600 cm−1) when the point dipole approximation is applied. Owing to the near orthogonality of the transition dipole vectors μc01 and μu01 (Fig. 5a) and the large energetic separation of the corresponding states ππ*c,1 and ππ*u, the exciton coupling has virtually no effect on the energetic position of the lowest excited state of cytosine. By contrast, the gap between ππ*u and ππ*c,2 increases by nearly 400 cm−1 (Fig. 5d, label 1). It should be noted, though, that the point dipole approximation is known to overestimate the coupling strength at short interchromophore distances [68]. This overestimation, together with peak broadening due to spectral diffusion and sample inhomogeneity, is likely to mask these comparatively small shifts.
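For reference, the point-dipole coupling has the standard form V = [μ1·μ2 − 3(μ1·n)(μ2·n)]/R³ in atomic units, with n the unit vector joining the two dipoles. The sketch below evaluates it in cm−1 from dipoles in atomic units and positions in ångström; placing the dipoles at the base centers of mass follows the Fig. 5a caption, while the function name and unit handling are our own choices.

```python
import math

HARTREE_TO_WAVENUMBER = 219474.6313632  # cm^-1 per hartree
BOHR_PER_ANGSTROM = 1.0 / 0.529177210903

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def point_dipole_coupling(mu_1, mu_2, r_1, r_2):
    """Point-dipole excitonic coupling V in cm^-1.

    mu_1, mu_2: transition dipole vectors (3-tuples) in atomic units (e*bohr).
    r_1, r_2:   positions of the point dipoles (e.g. base centres of mass) in angstrom.
    In atomic units V = [mu_1.mu_2 - 3 (mu_1.n)(mu_2.n)] / R^3, with n the unit
    vector along r_2 - r_1 and R the distance in bohr.
    """
    r_vec = [(b - a) * BOHR_PER_ANGSTROM for a, b in zip(r_1, r_2)]
    r = math.sqrt(_dot(r_vec, r_vec))
    n = [x / r for x in r_vec]
    v_hartree = (_dot(mu_1, mu_2) - 3.0 * _dot(mu_1, n) * _dot(mu_2, n)) / r ** 3
    return v_hartree * HARTREE_TO_WAVENUMBER
```

Two orthogonal in-plane dipoles stacked along R give V = 0 in this model, which is why the nearly orthogonal μc01 and μu01 couple only weakly. The 1/R³ dependence with point charges replaced by point dipoles is also what breaks down at short separations.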

Furthermore, the intensities of the traces associated with ππ*u and ππ*c,2 clearly change as ππ*c,2 borrows oscillator strength from ππ*u. This scenario is particularly prominent for homodimers [81], where the strong coupling of bright states with identical electronic configurations in each monomer leads to the formation of strongly delocalized dark exciton states in the dimer [82]. In fact, despite the small coupling, the exciton states of the uracil–cytosine dimer should no longer be regarded as purely localized on uracil or cytosine. This becomes clear when focusing on the spectral window below the intense GSB signals (Fig. 5e). Owing to the exciton delocalization over both chromophores, the higher-lying local states of each base become accessible from both exciton states, giving rise to a number of new ESAs along the trace associated with the ππ*c,2 transition of cytosine (Fig. 5e, label 2), a trace that is empty in isolated or non-interacting cytosine. The emerging signals are correlated with those along the uracil trace associated with the ππ*u transition, being red-shifted along Ω3 by the same energy that separates ππ*u and ππ*c,2. Although weaker in intensity than the diagonal bleach signals, these ESAs can be selectively enhanced by tuning the central frequency of the probe pulse to the region of 35,000 cm−1 or by applying polarized pulse sequences [25].
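The gap increase and the partial delocalization discussed above can be reproduced with a two-state Frenkel Hamiltonian, a simplification of the excitonic model used in the study. The sketch assumes site energies equal to the monomer vertical energies quoted earlier (uracil state 2 at 5.2 eV, cytosine state 3 at 5.42 eV) and V ≈ 600 cm−1; using the Fig. 5 trace positions instead would change the numbers somewhat. Function names are hypothetical.

```python
import math

EV_TO_WAVENUMBER = 8065.543937  # cm^-1 per eV

def two_state_exciton(e_1, e_2, v):
    """Eigen-decomposition of the Frenkel Hamiltonian [[e_1, v], [v, e_2]] (cm^-1).

    Returns (e_lower, e_upper, theta); in the basis of the local excitations
    (1, 2) the lower exciton is (cos theta, -sin theta) and the upper one is
    (sin theta, cos theta), with tan(2 theta) = 2v / (e_2 - e_1).
    """
    mean = 0.5 * (e_1 + e_2)
    half_split = math.hypot(0.5 * (e_2 - e_1), v)
    theta = 0.5 * math.atan2(2.0 * v, e_2 - e_1)
    return mean - half_split, mean + half_split, theta

def exciton_dipoles(mu_1, mu_2, theta):
    """Transition dipoles (3-tuples) of the lower and upper exciton states."""
    c, s = math.cos(theta), math.sin(theta)
    lower = tuple(c * a - s * b for a, b in zip(mu_1, mu_2))
    upper = tuple(s * a + c * b for a, b in zip(mu_1, mu_2))
    return lower, upper

# Assumed site energies: uracil state 2 (5.2 eV) and cytosine state 3 (5.42 eV).
e_u, e_c2 = 5.2 * EV_TO_WAVENUMBER, 5.42 * EV_TO_WAVENUMBER
lower, upper, theta = two_state_exciton(e_u, e_c2, 600.0)
print(f"gap increase: {(upper - lower) - (e_c2 - e_u):.0f} cm^-1")  # 368
print(f"uracil weight in lower state: {math.cos(theta) ** 2:.2f}")  # 0.91
```

Under these assumptions the gap grows by roughly 370 cm−1 and the lower exciton retains about 91% uracil character, so even this modest coupling leaves the states partly delocalized. `exciton_dipoles` redistributes the transition dipoles between the two exciton states while conserving their total squared magnitude, which is the intensity borrowing described above.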

One further characteristic signature of the chromophore coupling is the off-diagonal cross-peaks (Fig. 5d, label 3) along the ππ*u and ππ*c,2 traces (Ω1 ≈ 42,000 cm−1, Ω3 ≈ 44,000 cm−1 and Ω1 ≈ 44,000 cm−1, Ω3 ≈ 42,000 cm−1), which appear when the pump and the probe pulses involve different transitions in the system. These off-diagonal bleach signals are particularly intriguing from an experimental point of view, as they do not exhibit large spectral fluctuations (unlike ESA contributions) and can be easily targeted with suitable pump–probe setups (“two-color” setups are required). We emphasize that their detection is only possible within the framework of bidimensional electronic spectroscopy, as they remain covered by the more intense diagonal bleach signals in pump–probe spectroscopy. In fact, as a general rule the off-diagonal bleach signal is paired with an ESA signal (Fig. 5d, label 4) that is expected to appear when the chromophores interact. The complementary ESA peak in each pair corresponds to a transition from the singly excited state (e.g., ππ*u) to a mixed doubly excited state in which both monomers are excited simultaneously (e.g., ππ*u + ππ*c,2). The energetic separation between the bleach and the ESA, known as quartic coupling and corresponding to the Darling–Dennison coupling in infrared spectroscopy [18], is an indicator of the coupling strength [83], as both peaks should coincide and cancel in a non-interacting system. In the current example we chose a value of −1000 cm−1, which agrees with results from previous ab initio computations on aromatic heterodimers [83]; physically, this implies a weak attractive interaction between the excitons. The cytosine trace is better suited to visualize the quartic coupling, as it is less congested.
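The cancellation argument can be shown with two sticks: the off-diagonal bleach at (E_pump, E_probe) and its paired ESA to the mixed doubly excited state at E_pump + E_probe + Δ, which lands at Ω3 = E_probe + Δ. The sketch assumes equal magnitudes for the two contributions (as in the non-interacting limit), bleach positive and ESA negative (the opposite convention is also common), and takes Δ = −1000 cm−1 from the text; the function name is hypothetical.

```python
from collections import defaultdict

def cross_peak_pair(e_pump, e_probe, quartic_coupling, mu_pump=1.0, mu_probe=1.0):
    """Net stick signal of an interchromophore cross-peak pair (axes in cm^-1).

    Off-diagonal bleach at (Omega1, Omega3) = (e_pump, e_probe), weight
    +|mu_pump|^2 |mu_probe|^2. Paired ESA from the singly excited state at e_pump
    to the mixed doubly excited state at e_pump + e_probe + quartic_coupling,
    i.e. Omega3 = e_probe + quartic_coupling, with the same magnitude and
    opposite sign. Contributions that cancel exactly are dropped.
    """
    weight = (mu_pump ** 2) * (mu_probe ** 2)
    signal = defaultdict(float)
    signal[(e_pump, e_probe)] += weight
    signal[(e_pump, e_probe + quartic_coupling)] -= weight
    return {position: amp for position, amp in signal.items() if amp != 0.0}

print(cross_peak_pair(42000.0, 44000.0, 0.0))      # {} : peaks coincide and cancel
print(cross_peak_pair(42000.0, 44000.0, -1000.0))  # bleach at 44000, ESA at 43000
```

A real interacting dimer also changes the relative amplitudes of the two peaks; this sketch isolates only the positional shift that makes the pair observable.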

Besides the signatures discussed so far, charge transfer transitions can become accessible in stacked oligomers [25] and can be used as markers of chromophoric interactions, provided they appear in a signal-free region, since their low oscillator strength would otherwise make them difficult to detect. The energetic position and signal intensity of charge transfer states are hard to deduce without explicitly computing the electronic structure of the dimer; therefore, we have not included them in the present exciton model. Furthermore, the higher-lying local states of each monomer will also be affected by base stacking, although our experience shows that these effects are rather weak [25, 83].

4 Conclusions

In this work we report a rigorous calibration of the high-lying electronic excited-state manifolds of the canonical pyrimidine nucleobases (thymine, uracil, and cytosine) in vacuo using accurate multiconfigurational wave function methods. The adopted approach provides accurate estimates of transition energies and dipole moments, which have been used to simulate two-dimensional NUV-pump/Vis-probe and NUV-pump/NUV-probe spectra of the nucleobases and to construct the excitonic Hamiltonian of a perfectly stacked uracil–cytosine dimer model. The nonlinear electronic spectra of thymine and uracil show rich 2D maps, with several ESA signals that represent spectroscopic fingerprints of these nucleobases. Notably, the 2D maps could resolve the tiny energy shifts caused by the small structural difference between thymine and uracil. Moreover, in the Vis-probe window, a set of ESA fingerprints can be used to discern between thymine and uracil, in analogy to adenine [46], suggesting a probing window for monomer-specific markers. The 2D spectra of cytosine differ significantly from those of the other two pyrimidine nucleobases, owing to the presence of two absorbing ππ* excited states in the NUV window, one of which is particularly bright and gathers most of the spectral intensity. The signal trace associated with this cytosine state appears in an energetic window similar to that of the main ππ* state of uracil, inducing non-negligible electronic interactions. Interestingly, the 2D spectrum of cytosine is dominated by GSB signals and lacks ESA signatures unless non-covalent interchromophoric interactions are involved, which makes this nucleobase an ideal marker for interbase interactions. A preliminary study of these interchromophoric electronic couplings, based on a Frenkel excitonic model, was performed on a perfectly stacked uracil–cytosine dimer model.
This study indicates the presence of spectroscopic fingerprints of non-covalent chromophore interactions that could be detected on the spectral trace of the ππ*c,2 excited state of cytosine by setting appropriate pump–probe pulse central frequencies, bandwidths, and polarizations.