Main

DNA replication is initiated by primases that synthesize short RNA/DNA primers, which are subsequently extended by processive polymerases. In bacteria, primer synthesis is undertaken by DnaG primases belonging to the TOPRIM family4,5. In eukaryotes, archaea and some viruses, replicative priming is performed by specialized primases from the Prim-Pol superfamily1, formerly known as archaeo-eukaryotic primases1,2. Prim-Pol proteins also undertake more diverse roles, including DNA repair, damage tolerance and adaptive immunity1,3,6,7,8. Although primases and polymerases have a common metal-dependent mechanism9, little is known about the de novo initiation step of primer synthesis10,11. The catalytic domain of Prim-Pol proteins (PP) is proficient in polymerase and translesion synthesis activities6,7,9,12,13,14,15. However, Prim-Pol enzymes reportedly require additional modules, in conjunction with their PP domain, to facilitate dinucleotide formation, the first step of primer synthesis16,17,18,19,20,21,22. Given the fundamental importance of priming for genome duplication, it is critical to understand the mechanism of primer synthesis.

The CAPP PP domain produces primers

To determine the molecular requisites for primer synthesis, we studied a CAPP from Marinitoga piezophila (Mp), a Prim-Pol protein implicated in CRISPR–Cas spacer acquisition3. MpCAPP possesses both DNA primase and polymerase activities and consists of a tetratricopeptide repeat (TPR), a PP domain and a predicted PriCT motif within the C-terminal domain (CTD) (Fig. 1a, top) containing a putative iron–sulfur cluster-binding motif, potentially required for primer synthesis (Extended Data Fig. 1a–d). Although mutating this motif ablated iron binding (Extended Data Fig. 1b–d), synthesis activities were unaffected (Extended Data Fig. 1e, f, lanes 6–9). To evaluate which domains are required for DNA synthesis, truncations (ΔTPR, PP and ΔCTD) were assessed for primer extension activities (Fig. 1a, bottom left, and Extended Data Fig. 1g, lanes 6–17). The PP and ΔCTD truncations retained efficient polymerase activity, indicating that only PP is essential for primer extension. Given that CAPP’s CTD has similarities with Pri2/L and PriCTs, proposed to be involved in primer synthesis3, we tested the truncations for primase activity. The ΔCTD and PP truncations both exhibited robust primer synthesis (Fig. 1a, bottom right, and Extended Data Fig. 1h, lanes 6–9 and 14–17), establishing that CAPP’s catalytic domain is sufficient for primer initiation and extension. This was unexpected given that Prim-Pol enzymes reportedly require additional modules to facilitate primer synthesis10,11,16,17,18,19,20.

Fig. 1: MpCAPP’s PP domain is polymerase and primase proficient.

a, Top, schematic representations of MpCAPP fragments with the highlighted domains. Bottom left, quantification of the polymerase activity of full-length wild-type MpCAPP (FL), its fragments or the full-length D177A/D179A mutant (FL AXA) (125 nM protein). Bottom right, quantification of the priming activity of MpCAPP FL, its fragments or the FL AXA mutant (2 µM protein). Data represent the mean ± s.d. from three independent experiments, except for FL and FL AXA in the primase assay, where four independent experiments were performed. Representative gels are shown in Extended Data Fig. 1g, h. Black dots represent individual values. WT, wild type. b, Left, overall structure of a primer initiation complex of MsCAPP (grey) bound to ssDNA (pink), GTP (blue), dATP (orange) and Co(II) (spheres) highlighting DNA-interacting regions R1–R3 (yellow). Right, MsCAPP active site showing a simulated annealing Fo–Fc omit map of the ssDNA template, I-site GTP and E-site dATP (contoured at 2σ at a resolution of 1.90 Å), along with the side chains of Y134, Y138 and Co(II). c, Schematic showing the protein–DNA and protein–nucleotide interactions in the primer initiation complex. Pentagon, sugar; square, base; red sphere, phosphate; yellow sphere, OH group; peach sphere, metal ion. Black arrows indicate an interaction between an amino acid and DNA, nucleotides or metal ions; blue dashed arrows indicate π–π stacking. d, Residues interacting with nucleotides bound in the I-site (left) and E-site (right). e, Left, active site with the oxygen of the 3′-OH of GTP about to initiate a nucleophilic attack on the α-phosphate of dATP. Middle, A-site Co(II), showing octahedral coordination to nucleotides and surrounding acidic residues with distances labelled in angstroms. E260 interacts with oxygen from the 2′-OH of GTP. Right, B-site Co(II), showing octahedral coordination to dATP, the DXD motif and a water molecule, with distances labelled in angstroms.

Structure of a primer initiation complex

To understand the active site architecture of CAPPs, we elucidated the crystal structures for the PP domain of Marinitoga sp. 1137 CAPP, amino acids 111–328 (MsPP111–328), in its apo form (resolution of 2.97 Å) and in complex with dGTP and manganese (Mn(II)) ions (resolution of 1.28 Å) (Extended Data Fig. 2a and Extended Data Table 1). MsPP111–328 is >98% identical to M. piezophila PP111–328 (Extended Data Fig. 2b) and also possesses primase activity (Extended Data Fig. 2c, lanes 6–9). The MsPP domain is composed of an α/β domain (residues 111–164 and 274–278) and an RNA recognition motif (RRM) domain (residues 169–262 and 291–328)2 (Extended Data Fig. 3a, b). This structure has substantial homology with those for other Prim-Pol enzymes, including Pri1 (ref. 23) and PrimPol14 (Extended Data Fig. 3c). In the complex with dGTP, nucleotide and Mn(II) ions were bound in an active site cleft, proposed to be the elongation (E) site (Extended Data Figs. 2a and 3a). D177 and D179 (motif I) and E260 (motif III) interacted with the two divalent metals. H226 (motif II) was hydrogen-bonded to the triphosphate tail of dGTP.

Prim-Pol proteins initiate primer synthesis by catalysing dinucleotide bond formation. To understand the molecular basis of this de novo synthesis step, we elucidated the structure of a ternary primer initiation complex performing dinucleotide synthesis at a resolution of 1.9 Å (Fig. 1b and Extended Data Table 1). The complex consists of the MsPP domain bound to GTP in the initiating site (I-site), a non-hydrolysable dATP analogue (dAMPNPP) in the E-site, two cobalt ions3 and a single-stranded DNA (ssDNA) template (9-mer) (Fig. 1b–d). The incoming bases (G and dA) form Watson–Crick pairings with their respective templating bases, with the 3′-OH nucleophilic group of the initiating base (GTP) positioned within attacking distance (3.9 Å) of the α-phosphate of the elongating nucleotide (dATP) (Fig. 1e, left), in readiness for dinucleotide bond formation using a two-metal-dependent mechanism24. The Co(II) ions are bound with octahedral symmetry in the A- and B-sites (Fig. 1e, middle and right; Extended Data Fig. 3d shows an omit map at 5σ). Metal A is coordinated by D177 and D179 (DXD, motif I), E260 (motif III), the α-phosphate group of dATP (E-site) and the 2′- and 3′-OH groups of GTP (I-site). The 2′-OH of the initiating nucleotide is weakly coordinated to metal A (3.1 Å) and stabilized by E260, suggesting why a ribonucleotide is requisite for primer initiation3. Metal B coordinates to the DXD sequence (motif I), phosphate groups (E-site dATP) and a water molecule, in the active site. This is similar to the coordination of Mn(II) in the dGTP complex, with hydroxyl groups from the cryoprotectant ethylene glycol replacing the hydroxyl groups of the ribose group from GTP (Extended Data Fig. 3e). Binding of PP to the template strand (−3 to +1) predominantly involves residues from region 1 (R1, hairpin 130–142) and region 2 (R2, loop 263–274) (Fig. 1b, c), which have moved to accommodate the ssDNA template (Extended Data Fig. 3f). 
Y134 pivots on its Cβ atom to form a π–π stacking interaction with the adenine base at the +2 position, wedging the template open, while R139 stabilizes the oxygen of the deoxyribose ring of the +2 templating nucleotide. There are no interactions with the nucleotide at position +3 and those further downstream (template strand).

PP interacts with the E-site nucleotide (dATP) via basic residues (R223, H226, K277 and H283), along with polar residues (T220 and N222), which directly contact the phosphate groups (Fig. 1c, d). E-site nucleotide binding also has a stabilizing effect on region 3 (R3, loop 283–286), which is unresolved in the apo structure (Extended Data Fig. 3f). The backbone amine group of K277 interacts with the 3′-OH group of dATP’s ribose, while F262, L275 and I276 form a hydrophobic pocket for the base and sugar (Extended Data Fig. 3g). The 2′ position of the dATP ribose ring fits snugly against hydrophobic residues L275 and Y138, which cannot accommodate a 2′-OH group (NTPs), explaining the preferred specificity for dNTPs in the E-site3. The adenine base of dATP bound in the E-site forms π–π stacking interactions with Y138 and the guanine base of GTP at the I-site.

A notable feature of the initiation complex is how remarkably few contacts are made between the enzyme and GTP (I-site) (Fig. 1c, d). Instead, this interaction relies mainly on π–π stacking interactions between the guanine base and neighbouring purine bases, provided by dATP (E-site) and an adenine base (−2) on the template strand. Together with Y138, these bases form an extended π–π stacking network that stabilizes GTP binding within the I-site and also PP’s binding to the template strand (Fig. 1c). These interactions are reminiscent of stabilizing contacts made within the primer initiation complex of RNA polymerases25,26,27. K181, K182 and R223 form a positively charged pocket that, along with metals A and B, binds to the triphosphate tail of the initiating GTP. Notably, the triphosphate tail in an alternative initiation complex adopts a different orientation and binds an additional metal ion (Extended Data Fig. 3h).

Molecular modelling studies investigating intermolecular interactions within the active site showed strong electrostatic stabilization between dATP and R223, with smaller stabilizing contributions from dispersion and induction (Extended Data Fig. 4). Dispersion and induction interactions between Y138 and GTP and between GTP and dATP were observed to be stabilizing for these pairs, consistent with an extended π–π stacking network (Extended Data Fig. 4a, c). However, when only pairwise interactions were considered, an overall destabilizing interaction between GTP and dATP was observed (Extended Data Fig. 4c). Although calculation of pairwise interactions indicates a repulsive interaction between the triphosphates of the I-site and E-site nucleotides, interaction of the metal dications, Y138, dT (+1) and dATP, together with GTP, suggests that the intermolecular interaction of these fragments with GTP is stabilizing, with a major overall contribution from induction and dispersion, as well as electrostatic interactions (Extended Data Fig. 4c).

Structure of a PP domain bound to double-stranded DNA

To compare the primer initiation intermediate with a primer–template complex, we elucidated the structure of CAPP’s PP domain bound to double-stranded DNA (dsDNA, 6-mer) and Co(II) ions at a resolution of 2.02 Å (Fig. 2a and Extended Data Table 1). The base pairings at positions −5, −4, −1 and 1 of the blunt-ended dsDNA are formed from standard Watson–Crick hydrogen bonds, while the bases at −3 and −2 form purine–pyrimidine (G–T) mismatched pairings, which induce a slight kinking of the B-form helix. The PP domain showed relatively minor overall differences from the other complexes (Extended Data Fig. 5). Binding of PP to the template strand (−3 to +1) is remarkably similar to the initiation complex, which is held in position by residues from R1 and R2 (Fig. 2b, c). A notable feature was the limited contacts made between PP and the primer strand (Fig. 2b, right). H226 (motif II) and H283 (R3) are in close proximity to the primer strand and, along with K277, position the hydroxyl group of the 3′ nucleotide (+1) in place. Both metal ions interact with the phosphodiester bond at +1 and −1, and R223 contacts the phosphodiester bond at −1 and −2 of the primer strand. Y138 (R1) stacks against the base at +1 (primer strand). R223, which sits between the phosphate tails of the initiating and elongating nucleotides in the primer initiation complex, has moved further towards the initiation site. It now holds the phosphate of the phosphodiester linkage between positions −2 and −1 in place, suggesting a role in translocating the priming strand during extension. The 3′ nucleotides (primer strand), template nucleotides and metal ions superpose onto a Prim-PolC ternary complex9 with root-mean-square deviation (r.m.s.d.) of about 2.0 Å (Fig. 2d, left), indicating that this represents a postcatalytic product complex. Comparison of the primer initiation complex with either the Prim-PolC ternary complex (Fig. 2d, right) or the CAPP postcatalytic product complex (Fig. 2e) shows similar active site configurations, indicating that a shared two-metal-dependent mechanism catalyses both dinucleotide synthesis and primer extension.
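For readers unfamiliar with the r.m.s.d. figure quoted for the superposition, the following is a minimal sketch of how such a value is computed from paired atoms. It assumes the two coordinate sets have already been superposed (no fitting step is performed here), and the coordinates and the function name `rmsd` are hypothetical placeholders, not values or code from this study.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Root-mean-square deviation between paired, pre-superposed atoms."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between each matched pair of atoms.
    total = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
                for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(total / len(a))


# Hypothetical coordinates (in angstroms) for three matched atoms
# in two structures; these are illustrative only.
model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
model_b = [(0.0, 0.0, 2.0), (1.5, 0.0, 2.0), (0.0, 1.5, 2.0)]
print(f"r.m.s.d. = {rmsd(model_a, model_b):.2f} Å")
```

In practice the value depends on which atoms are matched and on how the structures were superposed, neither of which is specified in this sketch.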

Fig. 2: Structure of a post-ternary complex of PP bound to dsDNA.

a, A surface representation of the MsCAPP PP domain (grey) bound to dsDNA (pink and orange) forming a post-ternary complex, highlighting important DNA-interacting regions R1–R3 (yellow). b, Molecular models showing protein–DNA interactions with the template strand (left) and the primer strand (right). c, Schematic showing the important protein–DNA and protein–nucleotide interactions formed in the PP–dsDNA complex. Pentagon, sugar (deoxyribose); square, base; red sphere, phosphate; yellow sphere, OH group; peach sphere, metal ion. Black arrows indicate interactions between indicated amino acids and DNA. d, Superposition of active site bases and metal ions of the CAPP–dsDNA (cyan) and Prim-PolC–DNA (PDB 6SA0) (green) complexes (left) and the CAPP primer initiation (orange) and Prim-PolC–DNA (green) complexes (right). e, Superposition of the active sites of the CAPP–dsDNA (cyan) and CAPP primer initiation (orange) complexes.

Structure–function analyses of the PP domain

To examine the roles of specific residues in primer synthesis and/or extension, we mutated residues interacting with incoming nucleotides, metal ions or DNA (Fig. 3a) and evaluated the effect on MpCAPP’s PP synthesis activities (Fig. 3b and Extended Data Fig. 6a, b). Mutation of the metal-binding residues of motif I (AXA, D177A/D179A) ablated synthesis activities. Mutation of R223, which interacts with the dinucleotide/primer, resulted in almost complete loss of primase and polymerase synthesis activities; although the mutant protein was still capable of some dinucleotide synthesis, it only extended primers by 1 or 2 nucleotides, intimating a role in primer translocation. Mutation of H226 (motif II), which interacts with the incoming dNTP (+1), resulted in a reduction in polymerase activity, similar to that seen with R223A, but the mutant protein was significantly more deficient in dinucleotide synthesis. Mutation of other contacts with the incoming nucleotides (Y138A, E260A, F262A, K277A and K181A/K182A (KK)) also reduced primase and polymerase activities. Although the polymerase activity of the Y138A mutant was only modestly compromised, this mutant’s ability to synthesize dinucleotides was significantly diminished. This indicates that π–π stacking between Y138 and the incoming dNTP (E-site) is critical for the dinucleotide formation process but less essential for primer elongation. E260 makes contacts with metal A as well as with the 2′-OH and 3′-OH of the initiating GTP base (−1). The E260A mutant was barely active in primer extension assays, compared with the R223A or H226A mutant, and its inability to synthesize dinucleotides was comparable to that of the AXA mutant. This intimates that E260 has crucial roles in both priming and extension, owing to its binding to both metal A and the 2′-OH of the incoming NTP (−1) during the initiation of dinucleotide synthesis and to the 3′-OH of the deoxynucleotide (−1) of the primer during extension. 
Interaction of E260 with the 2′-OH of the incoming NTP explains why this residue is crucial for the first step of priming, as it appears to stabilize the NTP in a catalytically competent orientation for dinucleotide bond formation. Mutation of R2 residues (K264A, Q265A or N274A), which interact with the template strand, slightly reduced primer extension activity, but the resulting mutants exhibited strongly reduced primase activity (Fig. 3b). Although a triple R2 mutant (KQN) could perform limited primer extension, its priming activity was severely compromised, establishing the importance of R2 for both priming and polymerase activities.

Fig. 3: Structure–function analyses of MsCAPP PP primer synthesis and extension activities.

a, Schematic representation of the interactions of MsCAPP with template and nucleotides in the primer initiation complex. Red sphere, phosphate; yellow sphere, OH group; peach sphere, metal ion. Black arrows indicate interactions. b, Effects of MpCAPP PP domain mutations on polymerase (left) and primase (right) activities. AXA, D177A/D179A; RR, R142A/R143A; KK, K181A/K182A; KQN, K264A/Q265A/N274A. Data represent the mean ± s.d. from five (polymerase) and four (primase) independent experiments. Representative gels are shown in Extended Data Fig. 6a, b. Black circles indicate individual values. c, Affinity of MsCAPP PP for template, dATP and GTP. FP assays contained 0–20 µM protein, 25 nM FAM–dATP (⋆dATP) or FAM–γGTP (⋆γGTP) or 50 nM FAM–DNA (⋆DNA, oKZ409) ± 1 mM GTP and 0.1 mM dATP. d, Presence of GTP and dATP stimulates affinity of MsCAPP PP for template. FP assays contained 0–20 µM protein, 50 nM FAM–DNA (oKZ409) ± 0.1 mM dATP and/or 1 mM GTP/dGTP. e, Affinity of MsCAPP PP for template decreased over time in the presence of GTP and dATP. FP assays contained 10 µM protein, 50 nM FAM–DNA (oKZ409) ± 1 mM GTP and 0.1 mM dATP. FP was measured every 5 min for 2 h; 0.1 mM dATP was added at 60 min (dashed line indicated by a red arrowhead). f, The MsCAPP PP domain prefers to bind template containing a purine at the −2 position. FP assays contained 0–20 µM protein, 50 nM FAM–DNA (oKZ409, oKZ417 or oKZ418) ± 1 mM GTP and 0.1 mM dATP. A star indicates FAM labelling. Data in c–f represent the mean ± s.d. from four independent experiments. ND, not determined. g, A model of Prim-Pol proteins’ primer initiation and extension mechanism. Green crescent, Prim-Pol; blue rectangle, ssDNA template; peach rectangle, newly primed strand; red R, purine ribonucleotide. Primed strand deoxyribonucleotides are shown in purple.

Fig. 4: The PP domains of eukaryotic replicative primases are primase proficient.

a, I- and E-sites of CAPP (left), human PrimPol (middle) and human Pri1 (right). GTP (CAPP primer initiation complex) was superposed onto human Prim-Pol structures to identify their I-sites. b, Schematic representations of human, mouse and X. tropicalis full-length Prim-Pol proteins and their PP domains. ZF, zinc finger; RBD, RPA-binding domain. c, The human PrimPol PP is priming proficient. Primase assays contained 0.125, 0.25, 0.5 or 1 μM full-length HsPrimPol (lanes 2–5), 0.125, 0.25, 0.5 or 1 μM HsPrimPol1–354 (PP; lanes 6–9) or 1 μM HsPrimPol1–354 D114A/E116A (AXA; lane 10), 1 μM ssDNA (oKZ388), 2.5 μM FAM–γGTP (⋆γGTP) and 100 μM dNTP. d, Mouse and X. tropicalis PPs are priming proficient. Assays contained 0.125, 0.25, 0.5 or 1 μM HsPrimPol1–354 (HsPP; lanes 2–5), MmPrimPol1–338 (MmPP; lanes 6–9) or XtPrimPol1–334 (XtPP; lanes 10–13) under the conditions described in c. e, The HsPrimPol PP prefers a purine base at the −2 position (template) for dinucleotide synthesis. Reactions contained 0.25 µM HsPP, 1 μM ssDNA template (lane 2, oKZ435; lane 3, oKZ447; lane 4, oKZ449; lane 5, oKZ450; lane 6, oKZ448), 2.5 μM FAM–γGTP and 100 µM dNTP. Left, representative gel. Right, quantification. Dinucleotide signal was normalized to that in the presence of 3′-AAACTAAA-5′ (100%). Data represent the mean ± s.d. Black dots indicate individual values. f, Human Pri1 exhibits priming activity. Reactions contained 1, 2, 4 or 8 μM wild-type Pri1 (Pri1; lanes 2–5, 7) or 8 μM Pri1 D109A/D111A/D306A (Pri1AAA; lane 6), 1 μM ssDNA (oKZ388), 10 μM FAM–γGTP, and 100 μM ATP, CTP and UTP (lanes 1–6) or 100 μM dATP, dCTP and dTTP (lane 7). Results are representative of three (d, f) or four (c, e) independent experiments. Nt, oligonucleotide length marker; ‘C’ indicates control without protein.

Nucleotides affect PP binding to ssDNA

To investigate the mechanism of nucleotide and template binding, we determined the affinities of MsCAPP’s PP domain for GTP, dATP and ssDNA template (8-mer) using fluorescence polarization (FP) to measure dissociation constants (Kd). PP’s binding affinity was relatively weak for GTP (Kd = 23.70 µM) but was much stronger for dATP (Kd = 1.40 µM) (Fig. 3c). Although PP alone bound weakly to ssDNA, addition of both GTP (1 mM) and dATP (0.1 mM), in the presence of metal ions, strongly enhanced its affinity for DNA (Kd of about 1 µM). Modelling studies were also consistent with these findings (Extended Data Fig. 4). Together, these results underscore the importance of the divalent cations in stabilization of the negative charges of the two triphosphate moieties in close proximity. This increased affinity is not due to PP binding to newly synthesized dinucleotide, as addition of pppGpdA (riboG-deoxyA; rG–dA) dinucleotide did not increase the affinity for template (Extended Data Fig. 6c). Addition of GTP or dATP alone also did not stimulate template binding (Fig. 3d). Although addition of dGTP (1 mM) with dATP (0.1 mM) stimulated DNA binding by PP (Kd = 10.25 µM), this was about 10-fold lower than was observed in the presence of both GTP and dATP, which supports dinucleotide synthesis (Fig. 3d). This suggests that the 2′-OH group (GTP) stabilizes the first base (I-site) next to the dNTP (E-site) on the template strand. Together, these data indicate that the strongest affinity of PP for template occurs when both nucleotides and template are bound within PP’s active site before turnover. This conclusion was further supported by assays that showed that PP binding to template, under conditions that allow turnover, decreased over time to levels observed in the absence of GTP and dATP (Fig. 3e). 
However, addition of dATP (0.1 mM at 60 min) increased the affinity of PP to a level similar to that observed at time 0, indicating that the loss of binding affinity was due to lack of dATP, which is required for dinucleotide synthesis and primer extension.
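To give a feel for what the dissociation constants above imply, the following minimal sketch evaluates a simple one-site binding model at the Kd values quoted in the text. It assumes the labelled probe is present at a concentration far below Kd, so that free protein is approximately equal to total protein; this is an illustrative assumption and not a description of how the reported Kd values were fitted. The function names and the FP plateau values (50 and 250 mP) are hypothetical.

```python
def fraction_bound(protein_um: float, kd_um: float) -> float:
    """Fraction of labelled probe bound under a one-site model.

    Assumes probe concentration << Kd, so free protein ~ total protein.
    """
    if protein_um < 0 or kd_um <= 0:
        raise ValueError("protein must be >= 0 and Kd must be > 0")
    return protein_um / (kd_um + protein_um)


def polarization(protein_um: float, kd_um: float,
                 fp_free: float, fp_bound: float) -> float:
    """Observed FP as a linear mix of free and bound probe signals."""
    return fp_free + (fp_bound - fp_free) * fraction_bound(protein_um, kd_um)


# Kd values for template binding as quoted in the text (µM); 10 µM protein
# matches the time-course assay. The FP plateaus are hypothetical.
for label, kd in [("GTP + dATP", 1.0), ("dGTP + dATP", 10.25)]:
    f = fraction_bound(10.0, kd)
    fp = polarization(10.0, kd, fp_free=50.0, fp_bound=250.0)
    print(f"{label}: fraction bound = {f:.2f}, FP = {fp:.0f} mP")
```

Under this model, half of the probe is bound when the protein concentration equals Kd, which is why a 10-fold change in Kd produces a clearly visible shift in the binding curve over the 0–20 µM protein range used in the assays.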

Next, we tested how template sequence influences PP’s affinity for DNA. Binding of PP to a poly(dA) template containing a single dC or dT was not stimulated by the presence of both GTP and dATP, in contrast to a template containing 3′-dCdT-5′ (Kd of about 1 µM), indicating that nucleotide-stimulated affinity of PP for ssDNA is sequence dependent (Extended Data Fig. 6d). As the primer initiation structure showed π–π stacking between the template dA (−2) and GTP in the I-site, we examined whether the pre-initiation template base (−2) influences PP binding in the presence of incoming nucleotides. A template purine base at −2 should stabilize GTP (purine) more effectively than a pyrimidine base (Fig. 3f), which has reduced stacking potential25. When the template dA (−2) was substituted with dT, PP’s DNA binding affinity was about 4-fold lower (Kd = 3.82 µM) than for a template containing dA at this position (Kd of about 1 µM). PP binding to a template containing an abasic site at −2 was not stimulated by addition of GTP and dATP (Kd of about 20 µM). Similar results were obtained when templates containing four different bases or an abasic site at the −2 position were tested in priming assays in the presence of GTP and dATP. Dinucleotide synthesis was highest when a purine was at −2 (template) (Extended Data Fig. 6e). Modelling calculations (SAPT0) showed that the interaction between a dA base at position −2 and GTP exhibits an important induction and dispersion contribution (Extended Data Fig. 4c). Together, these results establish that the pre-initiation template base (−2) also has an important role in influencing PP’s affinity for the primer initiation nucleotide and probably explains the influence this ‘cryptic’ site exerts on primer synthesis.
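The fold differences quoted above follow directly from the ratios of the Kd values. The sketch below works through that arithmetic and, as an optional illustration, converts each ratio into an apparent free-energy difference using ΔΔG = RT ln(Kd/Kd,ref). The temperature of 298.15 K is an assumption made for this example (it is not stated in the text), the "about 1" and "about 20" µM values are approximate in the source, and the function names are hypothetical.

```python
import math

# Kd values (µM) for template binding with GTP and dATP present, as quoted
# in the text; the dA and abasic values are approximate in the source.
KD_UM = {"dA at -2": 1.0, "dT at -2": 3.82, "abasic at -2": 20.0}

R_KCAL = 1.987e-3   # gas constant, kcal/(mol*K)
TEMP_K = 298.15     # assumed temperature; not stated in the text


def fold_change(kd: float, kd_ref: float) -> float:
    """How many times weaker binding is relative to the reference Kd."""
    return kd / kd_ref


def ddg_kcal(kd: float, kd_ref: float, temp_k: float = TEMP_K) -> float:
    """Apparent free-energy difference implied by a Kd ratio."""
    return R_KCAL * temp_k * math.log(kd / kd_ref)


ref = KD_UM["dA at -2"]
for name, kd in KD_UM.items():
    print(f"{name}: {fold_change(kd, ref):.1f}-fold, "
          f"ddG = {ddg_kcal(kd, ref):+.2f} kcal/mol")
```

On these assumptions, the dA-to-dT substitution at −2 corresponds to a penalty of under 1 kcal/mol, a modest value in line with the loss of a favourable stacking contact.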

MsPP exhibited greater affinity for DNA substrates with increasing length (Extended Data Fig. 6f), suggesting that low-affinity template sliding probably occurs before binding of incoming nucleotides28 (Fig. 3g). The relatively high affinity of MpCAPP for dATP, similar to PP’s affinity for template in the presence of both nucleotides, suggests that a dNTP (preferably a purine) is bound first in the E-site, possibly as the enzyme slides on ssDNA. PP’s affinity for template increases about 20-fold in the presence of both nucleotides, suggesting that a purine ribonucleotide binds into the I-site as the last component, ‘locking’ the enzyme onto the primer initiation site. Following turnover, the enzyme’s affinity for DNA decreases, enabling primer translocation to occur and the next round of nucleotide binding and addition to proceed (Fig. 3g).

PP domains share homology

Docking the primer initiation complex, using the bound nucleotides (E-site) and metal ions from structures of human Pri1 (ref. 23) and PrimPol14, enabled identification of putative binding sites for the I-site and ssDNA template within these replicative primases. The overall structures and active site residues surrounding the I- and E-sites are remarkably similar to those of MsCAPP (Fig. 4a and Extended Data Fig. 7a). In MsCAPP and Pri1, motifs I (DXD) and II (G/SXH) are conserved in both structures, with H226 (CAPP) and H166 (Pri1) both interacting with the phosphate tail of the E-site nucleotide. Motif III (hD/Eh, in which ‘h’ is a hydrophobic residue) is also present in Pri1, although E260 (CAPP) is replaced by D306 (Pri1). H283 (CAPP) and H324 (Pri1) also interact with the phosphate tail of the E-site nucleotide. R223 (CAPP) also has a counterpart in Pri1 (R163). L275 and K277 (CAPP) have counterparts L316 and K318 (Pri1). Both proteins have a tyrosine in R1, of which Y138 (CAPP) stacks with the base in the E-site and Y54 (Pri1) is close to the active site, although it does not interact with the base. Pri1 uses ribonucleotides for extension, although it can bind to dNTPs23. By contrast, CAPP’s E-site binds to dNTPs3. When examining residues surrounding the 2′ position of the E-site nucleotide, Y138 and L275 form a more hydrophobic environment in CAPP, whereas D79 forms a hydrogen bond with the 2′-OH in Pri1 (Extended Data Fig. 7b).

The CAPP and PrimPol active sites are also highly similar (Fig. 4a). Motif I (DXD in CAPP; DXE in PrimPol), motif II (G/SXH) and motif III (hEh in CAPP and hDh in PrimPol) are all present in both structures (Fig. 4a). Many key basic residues are also conserved, including R223, K277 and H283 (CAPP) and K165, R291 and K297 (PrimPol). R76 in R1 of PrimPol fulfils a role similar to that of Y138 in R1 of CAPP. Y100 (PrimPol) is also involved in selecting for dNTP binding in the E-site29. Together, these structural similarities intimate that the mechanism of primer initiation is probably conserved between these related DNA Prim-Pol proteins.

Eukaryotic PP domains can prime

Given CAPPs’ overt structural similarities with replicative Prim-Pol proteins, we investigated whether eukaryotic PP domains could also catalyse primer synthesis independently. PrimPol consists of a PP domain and a C-terminal zinc finger, previously considered critical for priming (Fig. 4b)13,16. As with CAPP, we observed that the PP domain (HsPrimPol1–354) of human PrimPol alone was sufficient for primer synthesis (Fig. 4c, lanes 6–9, and Extended Data Fig. 8a), although the activity was decreased in comparison with full-length protein, suggesting some stimulatory/stabilization role for the zinc-finger domain in priming13,16. All catalytic activities were ablated in a catalytic mutant (D114A/E116A) (Fig. 4c, lane 10). The equivalent PP domains of mouse (MmPrimPol1–338) and Xenopus tropicalis (XtPrimPol1–334) PrimPol proteins are also primase proficient, despite lacking their auxiliary domains (Fig. 4b, d, lanes 6–9 and 10–13, respectively). Decreasing concentrations of labelled GTP in the reaction caused loss of detectable PrimPol priming activity (Extended Data Fig. 8b), in agreement with the FP studies on MpCAPP, indicating that the I-site nucleotide can be readily outcompeted by unlabelled dNTPs, if present at substoichiometric concentrations (Extended Data Fig. 8c, d). Therefore, discrepancies with previous studies are probably due to more physiologically relevant concentrations of labelled initiating nucleotide being used in the current study, allowing priming to be more readily observed.

Mutation of corresponding residues in HsPrimPol1–354, shown to be important for CAPP’s primase and/or polymerase activity, also had a significant negative effect on its synthesis activities (Extended Data Fig. 8e–g). While HsPrimPol1–354 could produce primers using only dNTPs, primer synthesis was stimulated by the addition of purine NTPs, particularly GTP (Extended Data Fig. 8h), which was incorporated as the initiating primer nucleotide (Extended Data Fig. 8i). Similarly to CAPP, HsPrimPol1–354 also primes most efficiently when a purine base is located at the −2 position, suggesting that a similar π–π stacking network is also involved in primer initiation by other Prim-Pol proteins (Fig. 4e).

Eukaryotic and archaeal replicative primases (Pri1/PriS) reportedly require a second subunit (Pri2/PriL) to initiate primer synthesis30. However, when we assayed human Pri1 for primer synthesis activity, de novo primer synthesis was evident (Fig. 4f, lanes 2–5)31, although this was less efficient than with Pri2 (ref. 32). As with CAPP and PrimPol, Pri1 also prefers to initiate primer synthesis with a 5′ GTP over ATP (Extended Data Fig. 8j) but extends with NTPs, rather than dNTPs (Fig. 4f, compare lanes 5 and 7)30. Together, these findings establish that the PP domains of replicative primases are sufficient to catalyse de novo primer synthesis, supporting a conserved mechanism across the Prim-Pol superfamily.

Discussion

Here we present the molecular basis for primer synthesis by a DNA primase, uncovering the mechanism of de novo dinucleotide bond formation that initiates priming. This study provides compelling evidence that all the molecular determinants required to undertake the critical steps of primer synthesis reside within the catalytic domain of Prim-Pol proteins. The first key step of primer synthesis involves sliding of Prim-Pol along the ssDNA, which binds the elongating dNTP into an active site pocket (E-site) (Fig. 3g). The next step involves preferential binding of a purine ribonucleotide in the primer initiation site (I-site). Prim-Pol makes only limited contacts with the I-site nucleotide, and adjacent nucleotides and metal ions are therefore crucial for stabilizing its binding to the I-site. Most of the critical interactions with the initiating nucleotide occur between adjacent bases that form a π–π stacking network, which stabilizes initial binding of the incoming nucleotide (I-site), enabling it to form a stable Watson–Crick pairing with the template strand. A common feature of most primases is their preference to incorporate a purine as the initiating base. In the primer initiation complex, the incoming purine (I-site) stacks against a purine base (−2) from the template strand, which stabilizes its binding and probably explains the preferential binding of purines, over pyrimidines, at the I-site. These π–π contacts may also influence nucleotide docking (E-site) as a result of template stabilization. The specificity for ribonucleotide binding (I-site) appears to be conferred by specific interactions between the 2′-OH of the ribose moiety with metal A and its primary liganding residue (E260 in MsCAPP). RNA-dependent RNA polymerases undertake dinucleotide synthesis using an analogous, convergent priming mechanism during transcription initiation and replication25,26,27.

This study also demonstrates that the PP domains of replicative primases initiate de novo primer synthesis in the absence of ancillary modules. Prim-Pol enzymes almost certainly evolved from primordial RNA replicases, which were subsequently repurposed to undertake more specialized cellular roles, including primer synthesis. As de novo synthesis is a relatively inefficient step28, other modules were probably acquired during primase evolution to stabilize the precarious initiation intermediate, ensuring more efficient primer synthesis and extension13,22,33,34. These modules may act to enhance DNA and dinucleotide binding but may also regulate the primer initiation step to prevent unlicensed priming or ensure efficient termination. Notably, Prim-Pol enzymes involved in DNA repair synthesis, which are primase deficient, lack equivalent modules6,7,9. Having established the mechanism for initiating primer synthesis within the catalytic core of a Prim-Pol protein, further studies are now required to determine how these catalytic steps are influenced by ancillary modules and how primer termination is achieved.

Methods

Cloning, expression and purification of recombinant proteins

A description of all constructs, their cloning details and a list of all primers used can be found in Supplementary Table 1 and Supplementary Note 1.

MpCAPP FL WT, MpCAPP FL AXA, MpCAPP FL CC, MpCAPP fragments ΔCTD, ΔTPR and PP WT (and mutants), MsCAPP PP100–359 and MsCAPP PP111–328 were expressed from plasmids pKZ43, pKZ60, pKZ125, pKZ38, pKZ121, pKZ39, pPK247 and pAL101, respectively, in the BL21(DE3) Escherichia coli strain. The transformed cell cultures were grown in standard TB medium to OD600 of 0.8–1. Expression of all proteins was induced by adding IPTG to a final concentration of 0.5 mM, followed by incubation for 3 h at 37 °C.

MpCAPP FL WT and mutants (FL AXA and FL CC) and the ΔTPR fragment were fused to MBP via their C terminus and purified as described for MpCAPP FL WT previously3 with one modification: all purification procedures were done under deoxygenated conditions with N2-purged solutions and in a glove bag under an N2 atmosphere. In brief, collected cells were resuspended in buffer A (50 mM HEPES pH 7.5, 500 mM NaCl, 10% (vol/vol) glycerol, 1 mM TCEP, 10 mM imidazole) containing protease inhibitors, sonicated and cleared by ultracentrifugation. The supernatant was incubated with cobalt resin and eluted in buffer A containing 300 mM imidazole. Eluted protein was loaded onto amylose resin and washed in amylose wash buffer (50 mM HEPES pH 7.5, 500 mM NaCl, 10% (vol/vol) glycerol, 1 mM TCEP), and the bound protein was eluted with amylose wash buffer supplemented with 10 mM maltose. Eluted protein was concentrated in a Vivaspin 20 column (Sartorius), frozen in liquid nitrogen and stored at −80 °C.

For purification of MpCAPP fragments ΔCTD, PP WT (and mutants) and MsCAPP PP, the cell pellet was resuspended in buffer B (50 mM HEPES pH 7.5, 250 mM NaCl, 10 mM imidazole, 10% (vol/vol) glycerol, 0.5 mM TCEP) containing protease inhibitors, sonicated and ultracentrifuged. The supernatant was incubated with cobalt resin, the resin was extensively washed, and the protein was eluted in buffer B containing 300 mM imidazole. Eluted protein was loaded onto a 5-ml HiTrap Q HP column (Cytiva), and the resulting flow-through was immediately loaded onto a 5-ml HiTrap SP FF column (Cytiva) (before loading of the MpCAPP ΔCTD fragment onto Q and SP columns, the salt concentration of the sample was decreased to 150 mM to allow binding to the SP column). The SP column was developed with a 50-ml gradient of 250–600 mM NaCl in buffer B for MpCAPP PP WT and MsCAPP PP and 150–600 mM NaCl for MpCAPP ΔCTD. The peak fractions were pooled, concentrated in a Vivaspin 20 column (Sartorius), aliquoted, frozen in liquid nitrogen and stored at −80 °C. All MpCAPP PP mutants were expressed and purified as MpCAPP PP WT.

Full-length HsPrimPol was expressed from pET28a-HsPrimPol and purified as described previously13. In brief, the protein was expressed in SHuffle T7 E. coli cells overnight at 16 °C. The protein was purified using Ni-NTA affinity resin (Generon) followed by a HiTrap Heparin HP column (Cytiva) and finally size exclusion chromatography on a Superdex 75 column (Cytiva).

HsPrimPol1–354 and HsPrimPol1–354 mutants were expressed from plasmids pET28a-HsPrimPol1–354 and pLB38-46 in BL21(DE3) E. coli cells. The transformed cell cultures were grown in standard TB medium to an OD600 of 3. Expression of the proteins was induced by adding IPTG to a final concentration of 1 mM, followed by incubation for 3 h at 37 °C. Collected cells were resuspended in buffer C (180 mM phosphate citrate pH 7.0, 30 mM imidazole, 10% (vol/vol) glycerol, 0.5 mM TCEP) containing 0.1% Tween-20 and 0.5 mg ml–1 lysozyme, sonicated and cleared by ultracentrifugation. The supernatant was loaded onto an Ni-NTA column and washed with 180 mM phosphate citrate, pH 7.0, and then 90 mM phosphate citrate, pH 7.0. The protein was eluted directly into a HiTrap SP HP column with buffer D (90 mM phosphate citrate pH 7.0, 500 mM imidazole) and washed with 180 mM phosphate citrate, pH 7.0. The protein was eluted in buffer E (180 mM phosphate citrate pH 7.0, 100 mM potassium glutamate, 250 mM NaCl, 0.5 mM TCEP). The protein was concentrated in a Vivaspin 500 column (Sartorius), diluted with glycerol (50% final), aliquoted, frozen in liquid nitrogen and stored at −80 °C.

MmPrimPol1–338 and XtPrimPol1–334 were expressed from pET28a-MmPrimPol1–338 and pET28a-XtPrimPol1–334, respectively, in BL21 cells (the amino acid sequence of X. tropicalis PrimPol can be found in Supplementary Note 2). The transformed cell cultures were grown in standard TB medium to OD600 of 0.6. Expression of the proteins was induced by adding IPTG to a final concentration of 0.4 mM, followed by overnight incubation at 20 °C (MmPrimPol1–338) or 25 °C (XtPrimPol1–334). In brief, collected cells were resuspended in buffer F (50 mM Tris-HCl pH 7.5, 300 mM NaCl, 30 mM imidazole, 10% (vol/vol) glycerol, 17 μg ml–1 PMSF, 34 μg ml–1 benzamidine) with 0.1% IGEPAL and 1 mg ml–1 lysozyme, sonicated and cleared by ultracentrifugation. The supernatant was loaded onto an Ni-NTA column, washed with 5% buffer G (same as buffer F with 300 mM imidazole and 2 mM β-mercaptoethanol) and eluted in buffer G. To load the proteins onto a HiTrap Heparin HP column, the samples were diluted 1:10 in buffer H (50 mM Tris-HCl pH 7.5, 50 mM NaCl, 10% (vol/vol) glycerol, 2 mM DTT). The proteins were eluted using a gradient of up to 1 M NaCl. MmPrimPol1–338 was further purified with an SP column, using the same protocol as for the HiTrap Heparin HP column above. XtPrimPol1–334 was further purified by size exclusion chromatography on a Superdex 75 10/300 GL gel filtration column (Cytiva) using buffer J (50 mM Tris-HCl pH 7.5, 300 mM NaCl, 10% (vol/vol) glycerol, 0.5 mM TCEP). The proteins were frozen in liquid nitrogen and stored at −80 °C.

HsPri1 WT and the D109A/D111A/D306E mutant (plasmids pKZ241 and pKZ248, respectively) were expressed in BL21(DE3) E. coli cells. The culture was grown in TB medium at 37 °C to OD600 of 0.8–1. Protein expression was induced with 1 mM IPTG, and the culture was further incubated at 16 °C for 16 h. The first purification step by IMAC was performed as for the MpCAPP PP fragment as described above. After elution from cobalt resin, the protein was diluted 1:5 with ion exchange wash buffer (50 mM HEPES pH 7.5, 10% (vol/vol) glycerol, 0.5 mM TCEP) and loaded onto a 5-ml HiTrap Q HP column (Cytiva). The protein was eluted with a 100-ml gradient of 0–750 mM NaCl. The peak fractions containing Pri1 were pooled, diluted 1:5 with ion exchange wash buffer and loaded on a 5-ml HiTrap SP HP column (Cytiva). The bound protein was eluted with a 100-ml gradient of 0–750 mM NaCl. The eluted protein was further purified using a HiLoad 16/600 Superdex 200 pg column (Cytiva) in buffer containing 50 mM HEPES pH 7.5, 250 mM NaCl, 10% (vol/vol) glycerol and 0.5 mM TCEP. The fractions containing Pri1 were pooled, concentrated in a Vivaspin 20 column (Sartorius), diluted with glycerol (50% final), aliquoted, frozen in liquid nitrogen and stored at −80 °C.

Gels showing all purified proteins used in this study are available in Extended Data Fig. 9.

Polymerase assays

Polymerase assays were performed as described previously3 with minor changes. Twenty-microlitre reactions contained DNA substrate (FAM-labelled DNA primer and DNA template (oligonucleotides oPK405 + oPK404 or oNB2 + oNB1)) and 100 μM dNTPs in MpPolBuffer (10 mM Bis-Tris propane pH 7.0, 10 mM MgCl2, 10 mM NaCl and 0.5 mM TCEP) for MpCAPP or EuPolBuffer (10 mM Bis-Tris propane pH 7.0, 10 mM MgCl2, 0.5 mM TCEP, 0.1 mg ml–1 BSA) for HsPrimPol. Reactions were supplemented with MpCAPP or HsPrimPol variants (protein concentrations are given in the figure legends) and incubated at 37 °C for the time indicated. Reactions were quenched with 20 μl stop buffer (92.5% formamide, 5 mM EDTA, 0.025% SDS) and boiled for 3 min before electrophoresis on a denaturing gel containing 15% polyacrylamide (19:1 acrylamide:bis-acrylamide, 7 M urea, 1× TBE buffer). The gel was run at 25 W for 1.5 h in 1× TBE. Extended fluorescent primers were imaged using a Typhoon FLA 9500 scanner (Cytiva). For figures, contrast was adjusted in the linear range using ImageJ.

ImageJ was used for quantification of primer extension products using unmodified original scans. Each band in the sample lane was assigned a number corresponding to the number of nucleotides added (ligated) to the fluorescently labelled primer (no extension = 0 and full template-dependent primer extension = x nucleotides). The average extension length for each sample was calculated as a weighted average, in which each extension length (band; 0–x) was weighted by the intensity of its fluorescence signal. The average extension of wild-type protein was used to standardize each gel (WT = 100%). Data represent the mean ± s.d. Source data are presented in Supplementary Table 2. The sequences for DNA oligonucleotides used for this assay are available in Supplementary Table 1.
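The weighted-average calculation described above can be illustrated with a minimal Python sketch. The function names and the band intensities are hypothetical and chosen only for illustration; the actual quantification was done in ImageJ, and the source data are in Supplementary Table 2.

```python
def average_extension(intensities):
    """Intensity-weighted mean number of nucleotides added.

    intensities[n] is the signal of the band for a primer extended by
    n nucleotides (index 0 = unextended primer).
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("lane has no signal")
    return sum(n * signal for n, signal in enumerate(intensities)) / total


def percent_of_wild_type(sample_intensities, wt_intensities):
    """Standardize a sample to the wild-type lane of the same gel (WT = 100%)."""
    return 100.0 * average_extension(sample_intensities) / average_extension(wt_intensities)


# Hypothetical band intensities (arbitrary units) for extensions of 0..4 nt.
wt_lane = [10.0, 20.0, 30.0, 25.0, 15.0]
mutant_lane = [60.0, 25.0, 10.0, 5.0, 0.0]

wt_avg = average_extension(wt_lane)                        # 2.15 nt on average
mutant_pct = percent_of_wild_type(mutant_lane, wt_lane)    # about 27.9% of WT
```

Dividing by the total lane signal makes the result independent of how much sample was loaded, so lanes on the same gel can be compared directly.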

Primase assays

For MpCAPP and MsCAPP, 20-μl reactions contained protein at the concentration indicated in the figure legend, 1 µM ssDNA oligonucleotide template, and 2.5 µM dATP, dTTP, dGTP and FAM-labelled dCTP (NU-809-5FM, Jena Bioscience) and 100 µM non-labelled GTP or 100 µM dNTP mix and 10 µM FAM–γGTP (NU-834-6FM, Jena Bioscience) in MpPrimBuffer (10 mM Bis-Tris propane pH 7.0, 0.5 mM TCEP, 10 mM MgCl2, 100 µM ZnCl2). Reactions were incubated at 50 °C for 30 min.

For eukaryotic PrimPol proteins, unless otherwise stated in the figure legend, 20-μl reactions contained protein at the concentration indicated, 1 μM ssDNA oligonucleotide template, and 100 μM dNTPs and 2.5 μM fluorescently labelled FAM–γGTP or FAM–γATP (NU-833-6FM, Jena Bioscience) or 2.5 μM unlabelled dCTP, dTTP and dGTP, 2.5 μM labelled FAM–dATP (NU-835-6FM, Jena Bioscience) and 100 μM individual NTPs in EuPrimBuffer (10 mM Bis-Tris propane pH 7.0, 10 mM MnCl2, 0.5 mM TCEP, 0.1 mg ml–1 BSA). Reactions were incubated at 37 °C for 30 min.

For human Pri1, 20-μl reactions contained Pri1 at the concentration indicated in the figure legend, 1 µM ssDNA oligonucleotide template (oKZ388), and 100 µM non-labelled ATP, UTP and CTP or dATP, dTTP and dCTP and 10 µM FAM–γGTP or 100 µM non-labelled GTP, UTP and CTP and 10 µM FAM–γATP in Pri1PrimBuffer (10 mM Tris-HCl pH 8.0, 0.5 mM TCEP, 5 mM MnCl2, 0.2 mg ml–1 BSA). Reactions were incubated at 25 °C for 30 min.

All primase reactions (20-μl volume) were precipitated by adding 20 μl CTAB solution (200 μM CTAB, 30 mM ammonium sulfate, 25 mM EDTA) and centrifuged at room temperature at 16,000g for 10 min. The pellet was resuspended in 25 μl loading buffer (92.5% formamide, 25 mM EDTA, 0.5% Ficoll 400). Samples were boiled for 3 min before 20 µl was loaded on a 24% (if not indicated otherwise in the figure legend) urea-PAGE gel (19:1 acrylamide:bis-acrylamide, 8 M urea (20% gel) or 6 M urea (24% gel), 1× TBE buffer) and run at 25 W for 2 h and 15 min in 1× TBE. Products were imaged using a Typhoon FLA 9500 scanner (Cytiva). For figures, contrast was adjusted in the linear range using ImageJ.

ImageJ was used to quantify products of the primase assay using unmodified original scans. The signal of the sample in the whole lane excluding fluorescently labelled mononucleotide was quantified and the background signal was subtracted (signal of control lane without protein excluding the signal of labelled mononucleotide). The primer synthesis signal of wild-type protein was used to standardize each gel (WT = 100%). Data are the mean ± s.d. Source data are presented in Supplementary Table 2. Sequences for DNA templates used in primer assays can be found in Supplementary Table 1.
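As a rough illustration of this quantification, the sketch below subtracts the labelled mononucleotide band and the no-protein background, then standardizes to wild type. All names and numbers are hypothetical. For brevity, the three replicate lanes are assumed to share one control lane and one wild-type lane, and the sample standard deviation is assumed because the type of s.d. is not stated.

```python
from statistics import mean, stdev


def primer_signal(lane_total, lane_mono, control_total, control_mono):
    """Background-corrected primer signal for one lane.

    The labelled mononucleotide band is excluded from both the sample lane
    and the no-protein control lane before the control is subtracted.
    """
    sample = lane_total - lane_mono
    background = control_total - control_mono
    return sample - background


def percent_of_wt(signal, wt_signal):
    """Standardize a lane to the wild-type lane on the same gel (WT = 100%)."""
    return 100.0 * signal / wt_signal


# Hypothetical integrated intensities (arbitrary units).
# Each tuple is (whole-lane signal, mononucleotide band signal).
control = (450.0, 400.0)
wt = (1000.0, 400.0)
mutant_replicates = [(620.0, 400.0), (650.0, 400.0), (590.0, 400.0)]

wt_signal = primer_signal(*wt, *control)
mutant_pct = [percent_of_wt(primer_signal(*lane, *control), wt_signal)
              for lane in mutant_replicates]

mutant_mean = mean(mutant_pct)
mutant_sd = stdev(mutant_pct)  # sample s.d.; an assumption about the s.d. type
```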

Crystallization, data collection and structure determination

The construct for Marinitoga sp. 1137 CAPP100–359 was expressed and purified in the same way as MpCAPP PP WT, with an additional step of size exclusion chromatography using a HiLoad 26/60 Superdex 75 prep-grade column (Cytiva) with buffer containing 25 mM HEPES pH 7.4, 250 mM NaCl and 0.5 mM TCEP. The construct for Marinitoga sp. 1137 CAPP111–328 was cloned into pOPINF35 with a cleavable His tag, which was removed via overnight incubation with 3C protease (homemade) at 4 °C, and was otherwise purified using the same method as for MsCAPP100–359. Crystal screening experiments were set up with 6×His-tagged Marinitoga sp. 1137 CAPP (residues 100–359 for the apo form and dGTP complex, residues 111–328 for DNA complexes), and matrix screens (Molecular Dimensions, Hampton Research) were performed using the sitting drop vapor diffusion method with equal volumes of protein solution (381 μM for the apo form and dGTP complex, 90 μM for DNA complexes) and reservoir buffer. Apo crystals were grown in 0.1 M propionate, cacodylate, Bis-Tris propane (PCTP) pH 4.0 and 25% PEG 1500. For the dGTP complex, CAPP was co-crystallized with 2 mM dGTP and 10 mM MnCl2, in 20% PEG 3350 and 0.2 M potassium thiocyanate. For the primer initiation complex, CAPP was co-crystallized with 200–400 μM 5′-AAAAATCAA-3′ DNA oligonucleotide (ATDBio), 0.5 mM dAMPNPP (NU-443-1, Jena Bioscience), 2–4 mM GTP and 2 mM CoCl2 in 0.1 M HEPES/MOPS pH 7.5, 20% ethylene glycol, 10% PEG 8000, 0.1 M diethylene glycol, 0.1 M triethylene glycol, 0.1 M tetraethylene glycol, 0.1 M pentaethylene glycol and 140 mM potassium glutamate. For the dsDNA complex, CAPP was co-crystallized with 180 μM 5′-CGTGCG-3′ DNA oligonucleotide (ATDBio), 2 mM GTP and 5 mM CoCl2 in 0.05 M sodium cacodylate trihydrate, 10% PEG 4000, 0.1 M sodium chloride and 0.5 mM spermine.

Apo crystals were cryoprotected in the mother liquor with 30% glycerol; dGTP and dsDNA co-crystals were cryoprotected in the mother liquor with 25% ethylene glycol; and the crystal for the primer initiation complex was cryoprotected in the mother liquor alone. Diffraction data were collected at beamlines I03 and I04 of Diamond Light Source (Didcot, UK).

The diffraction data were processed with xia2. The initial phase solution was obtained with SHELXC/D/E36 using weak anomalous signal from manganese atoms for the dGTP complex dataset. The statistics for data processing are summarized in Extended Data Table 1. Automated model building was performed with ARP/wARP37, followed by alternate rounds of manual and automated refinement of the model using Coot38 and phenix.refine39. Molecular replacement with Phaser40 was performed on the apo and DNA complex datasets, using the dGTP complex model as a template. The refinement procedure for the apo and DNA complex datasets was the same as for the dGTP complex. Molecular graphics were generated with PyMOL (Schrödinger), with hydrophobic surfaces generated using a normalized consensus hydrophobicity scale41.

Fluorescence polarization assays

Fifty-microlitre reactions contained MsCAPP111–328 protein at the concentration indicated in the figure legend, 10 mM Bis-Tris propane pH 7.0, 10 mM MgCl2, 0.1 mM ZnCl2, 50 mM NaCl, 0.05% Tween-20 and 50 nM FAM–ssDNA oligonucleotide template with or without 1 mM GTP/dGTP and 0.1 mM dATP or 25 nM FAM–dATP or 25 nM FAM–γGTP, and were measured immediately at room temperature or as otherwise indicated in the figure legend. FP (excitation filter, 482-16 nm; dichroic filter, LP 504 nm; emission filter, 530–540 nm) was measured using a CLARIOstar multimode plate reader (BMG Labtech). Background signal (no-protein sample) was subtracted from each value, and the data obtained were analysed using GraphPad Prism (v.9.0, GraphPad Software). Data are the mean ± s.d. The equation for specific binding with a Hill slope was used for data curve fitting to obtain Kd values. Calculated Kd values whose 95% confidence interval contained negative values were excluded and labelled as not determined (ND). The equation [agonist] versus response – variable slope was used to calculate the half-maximal effective concentration (EC50) and the equation [inhibitor] versus response – variable slope was used to calculate the half-maximal inhibitory concentration (IC50). Source data used in GraphPad Prism are presented in Supplementary Table 2. All data were obtained from at least three independent assays. Sequences of the DNA templates used in FP assays are listed in Supplementary Table 1.
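For orientation, the specific-binding model with a Hill slope and the ND rule can be sketched in Python as below. The model follows the standard form Y = Bmax·X^h / (Kd^h + X^h). The function names and example values are hypothetical, and the sketch only evaluates the model; the fitting itself was carried out in GraphPad Prism.

```python
def specific_binding_hill(x, bmax, kd, h):
    """Specific binding with Hill slope: Y = Bmax * X^h / (Kd^h + X^h).

    x    : protein concentration (same units as kd)
    bmax : signal at saturation
    kd   : concentration giving half-maximal binding
    h    : Hill slope
    """
    return bmax * x**h / (kd**h + x**h)


def report_kd(kd, ci_low, ci_high):
    """Apply the ND rule: drop Kd when its 95% CI contains negative values."""
    if ci_low < 0:
        return "ND"
    return kd


# Hypothetical parameters: at x == kd the model returns half of bmax,
# whatever the Hill slope.
half_max = specific_binding_hill(2.0, bmax=100.0, kd=2.0, h=1.5)

accepted = report_kd(1.8, ci_low=1.1, ci_high=2.7)    # kept as 1.8
rejected = report_kd(0.4, ci_low=-0.3, ci_high=1.2)   # reported as "ND"
```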

Computational methods

Non-covalent interaction (NCI) analysis of the active site using a cut-off of 8 Å was performed with the NCIPLOT programme42. The NCI analysis method is based on a reduced density gradient and the electron density, allowing attractive forces to be distinguished from repulsive forces. These forces were represented as isosurfaces using the visualization programme VMD (Visual Molecular Dynamics)43. The default red–green–blue (RGB) colour code for isosurfaces was used to represent attractive interactions (hydrogen bonds) in blue, repulsive interactions (steric effects in rings) in red and weak attractive interactions (that is, van der Waals forces) in green.
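For reference, the reduced density gradient that underlies NCI analysis is conventionally defined as follows, where ρ is the electron density. The symbols s and ρ are notation introduced here for illustration; the settings used in this study beyond the 8 Å cut-off are not restated.

```latex
s(\mathbf{r}) = \frac{1}{2\,(3\pi^{2})^{1/3}}\,
                \frac{\lvert \nabla \rho(\mathbf{r}) \rvert}{\rho(\mathbf{r})^{4/3}}
```

Non-covalent contacts show up as regions where both s and ρ are low. In the standard NCI scheme, attractive and repulsive contacts are then separated by the sign of the second eigenvalue of the electron density Hessian.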

Symmetry-adapted perturbation theory (SAPT) calculations, which are based on Rayleigh–Schrödinger perturbation theory, were performed to investigate individual contributions to the total intermolecular energy44. In SAPT, the total intermolecular energy can be expressed as the sum of electrostatic, exchange repulsion, induction and dispersion contributions. SAPT can be truncated at different orders of inter- and intramolecular perturbation, offering several levels of SAPT45,46. We carried out SAPT0 calculations for selected pairs of residues present in the active site, using the def2-SV(P) basis set with PSI4 code47. The DNA bases (dA, dC and dT) and amino acids (Y138 and R223) in the active site were fragmented and completed with a methyl group to only consider the side chains for SAPT0 calculations, and Mg(II) was used as a surrogate for the divalent metal ions (Extended Data Fig. 4b). The charge transfer energy was computed following the Stone–Misquitta (SM09) definition48.
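The decomposition described above can be written compactly as follows. The subscript labels are shorthand introduced here for the electrostatic, exchange repulsion, induction and dispersion terms.

```latex
E_{\mathrm{int}} = E_{\mathrm{elst}} + E_{\mathrm{exch}} + E_{\mathrm{ind}} + E_{\mathrm{disp}}
```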

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.