Introduction

DNA polymerases play a central role in many cellular processes, including nuclear and mitochondrial DNA replication, translesion synthesis, and repair of damaged bases, single-strand gaps, and double-strand breaks (DSBs) [1–5]. Although DNA polymerases have developed highly specialized functions in both replication and repair, many are also involved in multiple DNA metabolic pathways. Human DNA polymerases are divided into four families, A, B, X, and Y, based on their structural similarities. Replicative polymerases, which belong to the B-family and include polymerases α, δ, and ε, are conserved in all eukaryotes and are responsible for replication initiation and extension on the leading and lagging DNA strands [2]. Replicative polymerases are known for their high processivity and fidelity, and serve as critical factors in maintaining genomic integrity and avoiding replication-based mutagenesis. Their fidelity derives from a constrained active site that tightly binds DNA and from 3′→5′ proofreading exonuclease activity.

In contrast, Y-family DNA polymerases, which include Pol η, κ, and ι, are highly error-prone and non-processive [6, 7]. These polymerases participate in translesion synthesis (TLS), a method of DNA damage tolerance that allows cells to continue replication past DNA lesions without stalling replication forks. TLS polymerases have larger active sites that can accommodate bulky lesions and are permissive for the incorporation of bases opposite these lesions. Though TLS polymerases are, by necessity, error-prone, they are also crucial for proper cellular function. For example, cells from xeroderma pigmentosum variant group patients, which lack functional Pol η, are hypersensitive to UV light, and these patients show an increased incidence of skin cancer [8].

Like Pol η, the A-family DNA polymerase Pol θ possesses translesion polymerase activity. However, since its discovery as the product of the POLQ gene in 1999 [9], exactly how the enzymology of Pol θ is connected to its cellular functions has been unclear. This conundrum was at least partially resolved with the recent discovery of a highly conserved role for Pol θ in error-prone end-joining repair of DSBs. A complementary body of literature has revealed that overexpression of Pol θ is frequently associated with a variety of cancers. The convergence of these two fields has generated a surge of interest in the biological roles of this highly unusual protein. In this review, we compare and contrast the distinct roles of Pol θ in different organisms, highlight several new studies that give insight into how and when Pol θ-mediated end joining occurs, and discuss the clinical relevance of Pol θ as a possible chemotherapeutic target.

Pol θ has unique structural determinants

A-family DNA polymerases, which include Pol γ, Pol ν, and Pol θ [9–11], are identified by their sequence similarity to Pol I, an E. coli polymerase with 5′→3′ DNA-dependent DNA synthesis activity and 3′→5′ proofreading exonuclease function, and a 5′→3′ exonuclease function in a separate domain. Pol I is involved in the removal of bulky adducts through base excision repair (BER) and the processing of Okazaki fragments during DNA replication [12]. As the polymerase responsible for mitochondrial DNA replication and repair in eukaryotes, Pol γ is required to have high fidelity and processivity. This is reflected in its fairly low misincorporation rate of 2 × 10⁻⁵ for single base pair substitutions [13]. While all A-family polymerases have DNA synthesis activity, only Pol γ has conserved the proofreading ability of Pol I. Pol θ retains a vestigial exonuclease-like region, but it lacks detectable exonuclease activity. Both Pol θ and Pol ν have extremely high error rates in vitro, 2.4 × 10⁻³ and 3.5 × 10⁻³, respectively, for single base pair substitutions, which is similar to that of error-prone Y-family pols [14–16].
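To put the quoted error rates on a common scale, the short sketch below computes how many fold higher the Pol θ and Pol ν substitution rates are than that of Pol γ. It uses only the three numbers given above; the function name `fold_over` is illustrative, and the comparison assumes the rates were measured in comparable assays, which the text does not state.

```python
# Single base pair substitution rates quoted in the text (errors per base).
error_rates = {
    "Pol gamma": 2e-5,
    "Pol theta": 2.4e-3,
    "Pol nu": 3.5e-3,
}


def fold_over(reference, rates):
    """Return each polymerase's error rate relative to the reference polymerase."""
    ref = rates[reference]
    return {name: rate / ref for name, rate in rates.items() if name != reference}


folds = fold_over("Pol gamma", error_rates)
for name, fold in folds.items():
    # 1 / rate gives the average number of bases copied per substitution.
    bases_per_error = 1 / error_rates[name]
    print(f"{name}: ~{fold:.0f}-fold the Pol gamma rate, "
          f"about one substitution per {bases_per_error:.0f} bases")
```

Under that assumption, Pol θ and Pol ν misincorporate roughly 120-fold and 175-fold more often than Pol γ.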

Pol θ is the only eukaryotic DNA polymerase that also contains a helicase domain. The polymerase and helicase domains are connected by a long, unstructured central region (Fig. 1). Both the polymerase and helicase domains are conserved among higher metazoans, although the central domain is more divergent [17, 18]. The helicase domain of human Pol θ shares 55 % sequence similarity with human HELQ (also known as HEL308), which is conserved in animals, plants and archaea [19]. HELQ has the ability to unwind both short (20–40 nucleotide) and long (60–70 nucleotide) DNA duplexes in vitro [19]. The helicase domain of Pol θ contains Walker A and Walker B motifs, which are used in the binding and hydrolysis of ATP. Although the helicase domain of Pol θ has single-stranded DNA-dependent ATPase activity, strand displacement activity has not yet been shown in vitro or in vivo [20]. The polymerase domain of human Pol θ shares 29 % sequence similarity with human Pol ν [10]. Pol ν is conserved among deuterostomes, including vertebrates, but not lower organisms. It has been suggested that Pol ν and HELQ might work coordinately in higher organisms to assume some of Pol θ’s roles [21], though Pol ν and HELQ do not appear to interact in vitro [22, 23].

Fig. 1

Schematic of the domain structure of human Pol θ. (a) Domain structure of Pol θ. Domains include an N-terminal helicase-like domain, a long unstructured central domain, and a C-terminal polymerase domain. Within the polymerase domain is a non-functional exonuclease domain. (b) The polymerase domain of Pol θ contains finger, thumb, and palm subdomains. Insert 1 lies in the thumb domain while inserts 2 and 3 lie in the palm domain. The exonuclease subdomain contains 2 additional insertions, loop exo1 and loop exo2.

DNA polymerases contain thumb, palm, and finger regions, the structures of which determine their properties and substrate specificities. The thumb region is responsible for association of the polymerase with its substrate and affects processivity, while the palm region contains the polymerase active site and the exonuclease domain (if present) and therefore affects fidelity. Three unique insertions are located within the polymerase domain of Pol θ in the thumb (insert 1) and palm (insert 2 and insert 3) regions (Fig. 1) [24, 25]. These insertions are proposed to contribute to novel functions of Pol θ. For example, Pol θ, like many translesion polymerases, can incorporate an adenine opposite an abasic (AP) site, in what is known as the “A” rule [24, 25]. AP sites can be formed by spontaneous depurination of a DNA base or as a byproduct of DNA damage repair. However, full length recombinant Pol θ is unique in that it can efficiently perform DNA extension from the inserted base [24]. Pol θ can also extend from mispaired bases opposite bulky lesions like thymine glycol (Tg) and 6-4 photoproducts [24, 26]. In vitro experiments with an active fragment of Pol θ have shown that loss of inserts 2 or 3 impairs its ability to bypass AP sites and Tg lesions on DNA, while insert 1 is dispensable for this activity [25]. Deletion of insert 1, however, does reduce processivity. Though the inserts’ lengths vary across species, tending to increase with organismal complexity, certain amino acid residues are evolutionarily conserved from humans to Drosophila, including a basic residue in loop 2 that stabilizes the interaction between Pol θ and the nascent DNA strand and is essential for synthesis past AP and Tg lesions [27].

The crystal structure of the polymerase domain of Pol θ was recently solved and predicts that insert 2 makes contact with both O-helices within the finger domain, which changes position during the shift from the open to closed states and helps to align incoming nucleotides with the template DNA [27]. Insert 2 is also predicted to contact the 3′ (n−1) terminal phosphate of the primer DNA during DNA binding. A basic amino acid in human Pol θ, R2254, is located within insert 2 and appears to mediate this interaction. The putative salt bridge that is formed between insert 2 and the primer terminus is disrupted in R2254V mutants. These mutants are unable to bypass AP sites and Tg lesions in primer extension assays, but perform extension normally on an unaltered template [27]. Thus, it seems that insert 2 and its contacts with the O-helix and the 3′ (n−1) phosphate are responsible for positioning a poorly matched primer terminus for nucleotide addition, compensating for interactions between the primer and template strands that are missing because of lesions and DNA distortion.

Importantly, most biochemical studies (but see [20]) have been performed with truncated Pol θ protein containing only the polymerase domain, so the role of the helicase domain remains largely unknown. The related helicase HEL308 from archaea is able to displace proteins bound to DNA [28]. A similar role could exist for the helicase domain of Pol θ in displacing replication protein A (RPA) or other DNA binding proteins during translesion synthesis. The crystal structure of the polymerase domain of Pol θ also reveals the presence of two additional sequence inserts within the non-functional exonuclease-like subdomain, loop exo1 and loop exo2 [27]. Loop exo2 extends a known contact surface found in E. coli Pol I. Given that loop exo1 and loop exo2 are located at the extreme N terminus of the polymerase domain, they could potentially provide contacts to help position the helicase or central domains, though the helicase and central domains were not present in the crystal structure [27].

Pol θ is involved in distinct DNA damage repair pathways in different organisms

The first cellular role for Pol θ was identified through a mutagen sensitivity screen in Drosophila melanogaster [29]. Mutations in Drosophila mutagen sensitive 308 (mus308), the gene encoding Pol θ, cause hypersensitivity to a variety of interstrand crosslinking agents, including nitrogen mustard, diepoxybutane, and cisplatin [29]. Intriguingly, mus308 mutants are not sensitive to methyl methanesulfonate (MMS) and ultraviolet (UV) light, suggesting a highly specific role in interstrand crosslink (ICL) repair in Drosophila.

Since its initial discovery, Pol θ has been implicated in DNA repair pathways for many different organisms (Table 1). Its role in interstrand crosslink repair is conserved in C. elegans [30]. However, the exact nature of this role in either Drosophila or C. elegans is unclear. While the TLS function of Pol θ is likely to be relevant, Pol θ could also be involved in the regulation of homologous recombination (HR), which is used in ICL repair during S/G2 phases of the cell cycle [31]. The Fanconi anemia (FA) pathway also plays a large role in ICL recognition and subsequent repair. ICLs are directly recognized by FANCM and the FA core protein complex. The FA core complex then monoubiquitinates the FANCD2-FANCI heterodimer which recruits other DNA repair proteins, including BRCA1 and BRCA2, that interact with RAD51 to promote HR [31]. Complementation studies in C. elegans show that Pol θ-mediated ICL repair is independent of FA- and HR-associated proteins FANCD2 and HEL-308, but depends on BRCA1 [30]. Although BRCA1 is predominately associated with HR, it has also been shown to promote alternative end joining at dysfunctional telomeres in mouse embryonic fibroblasts (MEFs), along with DNA resection proteins CtIP and the MRE11/RAD50/NBS1 (MRN) complex [32]. Together, these observations indicate that end resection is required for Pol θ-mediated ICL repair and that the process is largely independent of the Fanconi anemia protein complex. Therefore, Pol θ might mediate a non-HR type of DSB repair during processing of ICLs in C. elegans and Drosophila.

Table 1 Effect of Pol θ loss in various organisms

Interestingly, the role of Pol θ in ICL repair does not appear to be conserved in mammals [33]. Instead, Pol ν, a polymerase whose presence is limited to deuterostomes, may substitute. Pol ν has high similarity to the polymerase domain of Pol θ and appears to be critical for ICL repair in human cancer cell lines, possibly functioning in synthesis across the lesion or in homologous recombination to repair an ICL-induced DSB [10, 21, 34]. While it has been suggested that Pol ν has subsumed the role of Pol θ in vertebrate ICL repair, it remains to be seen whether this holds true at the organismal level.

In Arabidopsis thaliana, Pol θ is coded for by the gene TEBICHI (TEB) [35]. Mutations in TEBICHI lead to sensitivity to DNA damaging agents mitomycin C (MMC) and MMS, which induce interstrand crosslinks and single- and double-strand breaks, respectively. Expression of the RAD51 protein is upregulated in teb mutants, suggesting an increased requirement for HR [35]. Interestingly, mutant plants with teb-1, an allele that removes the helicase domain but leaves the polymerase domain intact, are more sensitive to MMC and MMS than plants with the teb-3 mutation, which disrupts the polymerase domain alone [35]. TEBICHI mutations lead to growth retardation that is enhanced in the absence of the ATR checkpoint protein [36]. Treating teb mutants with DNA damaging agents exacerbates the growth retardation further, suggesting that this phenotype is related to defective DNA repair.

Initial studies in mice suggested that Pol θ may play a role in somatic hypermutation (SHM) of immunoglobulin genes, a process that diversifies B cell antigen receptor genes. During SHM, uracils present within the Ig locus, formed by AID-mediated deamination of cytosine, are excised by BER proteins and gap-filling is performed by error-prone polymerases. In an early study, mice deficient in Pol θ had an altered spectrum of IgG heavy chain mutations, with a significant increase in dG→dA transition mutations compared to control animals [37]. However, other studies have since indicated that the role of Pol θ in SHM is minor [38–40]. Most recently, Gearhart and colleagues showed that the number of transition and transversion mutations was not significantly different between wildtype and Pol θ-deficient mice, nor was the number of mutations at A:T or G:C sites different, indicating that Pol θ does not have a role in SHM [41]. In contrast, mice deficient in Pol η had a significant decrease in transition mutations at A:T sites, suggesting that the SHM mutations in this study occurred in a primarily Pol η-dependent manner [41].

Pol θ is one of five human DNA polymerases, including Polymerases β, ι, γ, and λ, that have the ability to cleave 5′ deoxyribose phosphate (dRP) groups in vitro [42–46]. Because 5′ dRP lyase activity is classically associated with Pol β-mediated base excision repair, it has been suggested that Pol θ may also be involved in BER. During short patch BER, a damaged base is excised by AP endonuclease and a polymerase incorporates a new base into the abasic site and removes the 5′ dRP group. DNA ligase 3/XRCC1 then seals the nick. A recent study showed that in vitro dRP lyase activity of Pol θ is much weaker than that of Pol β, suggesting Pol θ may not function significantly in BER [47]. In chicken DT40 cells, polq mutants are not significantly sensitive to the base-damaging agents MMS or 5-hydroxymethyl-2-deoxyuridine, which is incorporated into DNA and induces a BER response [48]. Double mutants lacking both Pol θ and Pol β are significantly more sensitive to hydrogen peroxide than either single mutant, suggesting that Pol θ might serve a minor role in base excision repair, possibly as a backup for Pol β [48]. However, cells derived from Polq −/− knockout mice are not hypersensitive to peroxide or paraquat [49]. Therefore, whether or not Pol θ plays a significant role in BER remains unclear.

Though dRP lyase activity is traditionally associated with BER, it is also used in other DNA repair processes, such as end joining. During classical non-homologous end joining (c-NHEJ), DNA ends are bound by the Ku70/Ku80 heterodimer, which processes DNA ends but prevents extensive end resection. Ku70/80 also uses its 5′ dRP lyase activity during the end processing steps prior to DNA ligation [50]. This raises the possibility that another role for Pol θ’s lyase activity might be in processing ends of a DSB when excision of an abasic site is required for end joining.

Several studies using other model organisms further support the idea that Pol θ might be involved in DSB repair via end joining. In the green alga Chlamydomonas reinhardtii, Pol θ mutants are highly sensitive to the DSB-inducing agent Zeocin [51]. Unusually for a single-celled organism, C. reinhardtii has extremely low efficiency of HR, which may indicate an increased reliance on end-joining repair pathways [52]. When C. reinhardtii strains were transformed using non-homology-directed DNA integration, Pol θ mutants had a tenfold lower transformation efficiency compared to their wildtype counterparts. Pol θ mutants did not show such a defect during homology-directed transformation, consistent with a potential defect in end joining [51].

In mice, a point mutation within the exonuclease subdomain of Pol θ that destabilizes the protein and leads to decreased protein levels was identified in 2004. This allele, termed chaos1, leads to the formation of micronuclei within reticulocytes in bone marrow [53]. Micronucleus formation occurs when chromosomal fragments are left behind following nuclear expulsion during reticulocyte maturation. These micronuclei are thought to arise from a defect in mitotic chromosome segregation or from an increased frequency of chromosomal breakage, which could be due to defects in HR or end joining [49]. chaos1 mice have an increased frequency of both spontaneous and radiation-induced micronuclei; however, the chaos1 mutation does not confer hypersensitivity to radiation or MMC in cultured cells or mice [33, 49]. Importantly, chromosome instability is not sufficient to drive tumorigenesis in Pol θ-defective mice and unchallenged animals show no other phenotypic abnormalities [53].

To test the hypothesis that Pol θ is involved in double-strand break repair, a mutation in the ATM kinase was introduced into Pol θ-deficient mice [33]. ATM is recruited to DSBs by the MRN complex and begins a signaling cascade to facilitate HR [54]. ATM has also been implicated as a signaling factor in classical non-homologous end joining (c-NHEJ). It has been proposed that the formation of ATM-dependent γ-H2AX foci at DSBs is important for tethering DNA ends to facilitate c-NHEJ and prevent the usage of improper ends [55, 56]. Loss of both Pol θ and ATM in mice was semi-lethal, with a 10 % survival rate during the neonatal period [33]. The surviving mice had severe growth retardation [33]. Additionally, atm−/−, chaos1 mutant MEFs had more chromosomal abnormalities per cell than either single mutant, suggesting that Pol θ is indeed involved in an HR- and c-NHEJ-independent mechanism of maintaining genome stability [33].

Pol θ promotes alternative end joining

Classical NHEJ (c-NHEJ) is genetically defined as DSB repair that involves the Ku70/Ku80 heterodimer, which binds to DNA ends and prevents resection, and DNA ligase 4/XRCC4, which seals breaks (Fig. 2) [57]. C-NHEJ is thought to be the dominant form of end joining in most organisms, while alternative forms of end joining serve as back-up pathways. It is becoming clear, however, that in certain contexts alternative end-joining mechanisms are prevalent and perhaps even preferred [58].

Fig. 2

DNA double-strand break repair pathways. (a) Classical non-homologous end joining. A DNA break occurs during G1 phase. Ku70/Ku80 binds DNA ends and keeps them in close proximity (i). DNA-PKcs binds to Ku (ii) and recruits NHEJ core proteins including XRCC4 and DNA ligase 4. DNA ligase 4 ligates broken ends together (iii). (b) Homologous recombination. A DNA break occurs during S/G2. DNA is extensively resected in a 5′→3′ direction (i). The exposed 3′ single-stranded DNA is coated with RPA to stabilize it (ii). RPA is displaced by Rad51, with the help of Rad51 loading proteins (iii). Rad51 facilitates strand invasion and homology searching (iv). After DNA is copied from a homologous template, the D-loop is resolved (v). (c) Microhomology-mediated end joining. A DNA break occurs during S/G2. Limited resection occurs in a 5′→3′ direction (i). Microhomologies present at the DNA ends are aligned and stabilized by Pol θ, which then synthesizes DNA to fill in gaps (ii). DNA ligase 3/XRCC1 binds DNA to seal nicks (iii).

The first direct evidence for the involvement of Pol θ in alternative end joining was observed in Drosophila [59]. Using an I-SceI system which creates chromosomal DSBs with four-nucleotide complementary ends, researchers observed many repair events possessing >4 base pair (bp) inserts that appeared to be templated from flanking sequences. The percentage of these types of insertions decreased in flies lacking Pol θ [59]. This suggested that Pol θ might be utilizing its unique structure to generate and align short nucleotide homologies, a process the authors termed synthesis-dependent microhomology-mediated end joining (SD-MMEJ). In agreement with this, a recent study showed that Pol θ is required to generate >1 bp insertions during class switch recombination in mouse B cells [60]. Pol θ is also critical for an alternative end-joining like process that occurs during the genomic integration of linear group 2 introns in Drosophila [61].

The importance of Pol θ-mediated end joining in Drosophila was further highlighted in a study examining end-joining repair of transposon-induced gaps, in which the number of end-joining events recovered from the male germline of mus308 mutants decreased by two- to threefold relative to wildtype [59]. In the absence of both Pol θ and DNA ligase 4, end joining was almost completely abolished, indicating that Pol θ-mediated end joining is distinct from c-NHEJ [59]. Interestingly, the structure of the repair junctions in various mus308 mutant backgrounds suggested that both the helicase and polymerase domains were important for end-joining repair. For example, while junctions from wildtype flies provided evidence for annealing at 5–10 bp pre-existing microhomologies, these were not observed in flies with mutations within conserved helicase domain residues [59]. In addition, templated insertions were largely abolished in mus308 null mutants but were still present in flies with normal helicase domains. It is worth noting that no evidence has yet been obtained supporting a role for the helicase domain in DSB repair in other eukaryotes. The polymerase function alone of Pol θ is enough to restore bleomycin resistance and rescue spontaneous chromosomal instability in MEFs [60]. Thus, in organisms other than flies, another related helicase might substitute during Pol θ-mediated end joining in vivo.

In chicken DT40 cell lines, Pol θ localizes to laser-induced DSBs [48]. Similar observations have been made in HeLa cells, where Pol θ localization to DSBs depends on poly (ADP-ribose) polymerase 1 (PARP1) [62]. PARP1 has been previously implicated in alternative end joining (alt-EJ) [63, 64], suggesting that the role of Pol θ in alt-EJ is conserved. Indeed, as described below, Pol θ has emerged as a key player in a specific type of alternative end joining.

Pol θ mediates MMEJ

Microhomology-mediated end joining (MMEJ) is a type of alternative end-joining that shares many features with Pol θ-mediated end joining. MMEJ does not depend on c-NHEJ proteins; however, HR and MMEJ utilize the same initial DNA resection machinery, including the MRN complex and the endonuclease CtIP, to expose single-stranded DNA overhangs at sites of DSBs (Fig. 2) [65]. HR requires extensive resection, while MMEJ can occur in the context of a shorter ssDNA overhang. Single-stranded DNA is stabilized by RPA, which is replaced by RAD51 during HR and is inhibitory to MMEJ in Saccharomyces cerevisiae [66]. During MMEJ, microhomologies are exposed and aligned, an endonuclease trims DNA flaps, gaps are filled in by a polymerase, and DNA ligase 1 or DNA ligase 3/XRCC1 seals nicks [67]. While HR is considered to be an error-free repair mechanism, MMEJ always results in deletions and occasional insertions.
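The outcome of the alignment, trimming, fill-in, and ligation steps described above can be illustrated with a toy model. The sketch below is not taken from any of the cited studies: it treats the sequence on each side of a break as a plain string (top strand, 5′→3′), looks for short stretches shared by both sides, and reports the junction that would result if one copy of the shared stretch and everything between the two copies were lost. The function name `mmej_products` and the microhomology length limits are assumptions chosen for illustration, and resection, flap trimming, and synthesis are implied rather than simulated.

```python
def mmej_products(left, right, min_mh=2, max_mh=6):
    """Enumerate toy MMEJ junctions for a break between `left` and `right`.

    Returns (microhomology, deleted_bp, junction) tuples, smallest deletion first.
    """
    seen = {}
    # Try longer microhomologies first so each junction is labeled with the
    # longest shared stretch that explains it.
    for k in range(max_mh, min_mh - 1, -1):
        for i in range(len(left) - k + 1):
            mh = left[i:i + k]
            j = right.find(mh)
            while j != -1:
                # Keep the left flank through its copy of the microhomology,
                # then resume after the copy on the right flank.
                junction = left[:i + k] + right[j + k:]
                deleted = (len(left) - (i + k)) + (j + k)
                seen.setdefault(junction, (mh, deleted, junction))
                j = right.find(mh, j + 1)
    return sorted(seen.values(), key=lambda product: product[1])


# Hypothetical break site: "CGT" is present on both sides of the break.
for mh, deleted, junction in mmej_products("AAACGTTT", "GGCGTCCC"):
    print(mh, deleted, junction)
```

Because one copy of the microhomology is always removed, every junction in this model carries a deletion at least as long as the microhomology itself, which matches the statement that MMEJ always results in deletions.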

Recently, it has become clear that Pol θ is a key participant in MMEJ. One experimental system showed this using MEFs transfected with partially single-stranded DNA molecules possessing 45-nucleotide long tails with short, terminal microhomologies. While wildtype MEFs were able to carry out joining of these molecules, MEFs lacking Pol θ were defective in joining of these substrates but were still able to join molecules with 4 bp overhangs via c-NHEJ [60]. Perhaps this is because unlike most polymerases, Pol θ can extend from single-stranded DNA and DNA with a 3′ overhang [68], making it ideal for the annealing and joining of substrates with long tails.

Confirmation of this model came with the publication of an elegant in vitro MMEJ system [69]. In this study, the authors tested the ability of a purified Pol θ polymerase domain to align two MMEJ-like substrates, extend from an annealed microhomology, and displace DNA during primer extension. They found that the Pol θ polymerase domain can align two DNA molecules with 6–15 nucleotide overhangs possessing 4 base pairs of microhomology and aligns CG-rich microhomologies more efficiently, possibly because of increased hydrogen bonding between the microhomologies. Single-stranded overhangs of 18 nucleotides and greater appear to pose a problem and the enzyme bridges them with very low efficiency [69].

Remarkably, this study also showed that Pol θ mediates annealing of both internal and terminal microhomologies in vitro, can extend from mispaired bases, and displaces annealed ssDNA during template extension [69] (Fig. 3). Thus, the polymerase domain of Pol θ can independently carry out all of the major stages of MMEJ prior to flap trimming and ligation, at least in vitro. Although long (>18 nucleotide) single-stranded overhangs appear to be prohibitive to Pol θ activity in vitro [69], the full length Pol θ is able to join much longer ssDNA overhangs in vivo [60]. Because the in vitro studies were conducted using only the polymerase domain of Pol θ, it is possible that the helicase domain may also aid in primer unwinding or strand annealing during DNA extension. It is also possible that other proteins are required to stabilize long ssDNA overhangs during microhomology alignment.

Fig. 3

Models of Pol θ-mediated MMEJ. A double-strand break occurs (i) and DNA ends are resected (ii). Pol θ (green) aligns microhomologies (blue) located at the end of each ssDNA (iii). Pol θ synthesizes DNA to fill in the gap and strand displaces dsDNA, possibly aided by the helicase domain (purple) (iv). This repair process generates small deletions. Pol θ also aligns microhomologies that are located internally on ssDNA, leaving unpaired flaps (v). Flaps are cleaved by an endonuclease and Pol θ continues to synthesize DNA and displace dsDNA (vi). This process generates larger deletions. In the event that no microhomologies exist on ssDNA, Pol θ can utilize DNA overhangs as a template to generate microhomologies in “snap-back” synthesis, while displacing dsDNA (vii). Once microhomologies exist, they are aligned by Pol θ (viii) and Pol θ then fills in the gap (ix). This repair process generates templated insertions and deletions.

Pol θ shows a strong preference for binding DNA with a 5′ terminal phosphate, similar to X-family polymerases used in c-NHEJ, and the presence of a 5′ terminal phosphate increases the rate of MMEJ mediated by Pol θ [69]. Insert 2 mutants are unable to bind primed DNA with a 3′ overhang but can bind primed DNA with a 5′ overhang. Because of this, insert 2 is postulated to interact with the 5′ phosphate to stabilize the DNA–protein complex and facilitate DNA displacement [69], though the crystal structure does not provide evidence for this interaction [27] (Fig. 4).

Fig. 4

Mechanisms by which Pol θ might promote the initiation of MMEJ. (a) Once microhomologies are aligned, insert 2 may interact with the 5′ phosphate during DNA synthesis. This interaction could enhance binding ability of Pol θ and could also facilitate strand displacement during nascent DNA synthesis. (b) Insert 2 may also interact with the 3′ (n−1) phosphate on the nascent DNA strand. This interaction could facilitate nucleotide incorporation and stabilize mispaired nucleotides.

Interestingly, two groups have suggested that Pol θ might operate during MMEJ as a dimer or multimer [27, 69]. While this possibility is certainly appealing in the context of a bridging model for Pol θ, whether or not this occurs in vivo remains to be tested.

Pol θ-mediated end joining impacts genome stability

Due to its ability to promote alternative end joining, Pol θ can have major effects on genome stability in different biological contexts associated with double-strand breaks. For example, spontaneous translocations between the Myc and IgH loci in mouse B cells are suppressed by the presence of Pol θ. In this context, Pol θ is thought to repair DSBs through a synthesis-dependent end-joining mechanism, thereby inhibiting other translocation-prone end-joining pathways [60].

In contrast, Pol θ promotes translocation formation in other genomic contexts. One example occurs in the case of deprotected telomeres in mice, where Pol θ utilizes pre-existing telomeric microhomologies to facilitate end-to-end chromosome joining events [62]. In addition, the loss of Pol θ results in a fourfold decrease in interchromosomal translocations involving CRISPR-Cas9 induced DSBs [62]. The reason for the disparate effect of Pol θ on translocation formation in different systems is presently unclear, but possible explanations might include the nature of the breaks or differential recruitment of other alt-EJ proteins in the two systems.

In addition to its role in repairing programmed or induced DSBs, recent work has also illustrated the importance of Pol θ in the repair of breaks formed at endogenous G-quadruplex (G4) DNA lesions during replication in C. elegans [70]. Long stretches of guanine residues are known to form very stable four-stranded structures in vitro that consist of stacked planar guanine tetrads. These G4 structures have been suggested to play roles in many cellular processes, including replication initiation, gene expression, and telomere maintenance (reviewed in [71]). Based on sequence analysis, as many as 300,000 G-rich motifs in the human genome, at a frequency of approximately 1 in every 10 kilobases, are estimated to be capable of forming G4 structures [72]. G4 structures are also highly stable in vivo and require special helicases to unwind them. In C. elegans, the absence of one of these helicases, FancJ, leads to replication fork stalling and DSB formation at G4 motifs, ultimately resulting in the creation of small (~50–200 bp) deletions [70]. Most of these deletion events have one nucleotide of microhomology at the junction site, while some contain templated insertions. Depletion of DNA ligase 4 or Brca1 has no effect on the frequency of deletions, indicating that they are not dependent on DSB repair through c-NHEJ or HR. However, depletion of Pol θ completely abolishes the small deletion class and the remaining deletions are much larger, averaging 20 kb in size. Thus, a Pol θ-dependent process suppresses these catastrophic large deletions at replication blocking lesions and thereby stabilizes the genome [70].
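As a rough illustration of how sequence analysis can flag candidate G4 motifs, the sketch below scans a DNA string with a widely used folding rule: four runs of at least three guanines separated by loops of one to seven nucleotides. This particular rule is an assumption made for the example; the text does not say which criteria were used in [72]. The names `G4_PATTERN`, `find_g4_motifs`, and `mean_spacing_bp` are illustrative, and the 3 Gb genome size is an approximate figure used only to check the quoted frequency.

```python
import re

# Assumed folding rule: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ on the scanned strand.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_g4_motifs(seq):
    """Return (start, end, motif) for non-overlapping candidate G4 motifs.

    Only the given strand is scanned; motifs on the complementary strand
    would appear here as C-rich runs and are not reported.
    """
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]


def mean_spacing_bp(genome_bp, n_motifs):
    """Average distance between motifs if they were evenly spread."""
    return genome_bp / n_motifs


# Hypothetical test sequence built from telomere-like TTAGGG repeats.
print(find_g4_motifs("ATGGGTTAGGGTTAGGGTTAGGGAT"))

# ~3 Gb haploid human genome (approximate) and the 300,000 motifs quoted above.
print(mean_spacing_bp(3_000_000_000, 300_000))
```

With these round numbers the average spacing comes out to 10,000 bp, consistent with the quoted frequency of about one motif per 10 kilobases.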

There is evidence that Pol θ-mediated end joining may also play a role in genome evolution in roundworms in response to various non-G4 DNA lesions such as base adducts and AP sites. Normally, cells use TLS polymerases to tolerate these lesions during replication and thereby prevent DSBs. When C. elegans depleted for TLS polymerases Pol η and Pol κ are cultured for multiple generations, small spontaneous deletions increase more than 20-fold compared to wildtype animals [73]. The size distribution of deletions in these strains ranges from 1 to >200 bp, and most repair junctions possess single nucleotide microhomologies or templated nucleotides. Additional removal of Pol θ in TLS polymerase-deficient mutants results in a 100-fold increase in deletion size, paralleling the results seen with G4 motifs [73]. Strikingly, different wild-caught strains of C. elegans have similar mutation profiles throughout their genomes, arguing for a general role of Pol θ in genome diversification in C. elegans. It will be interesting to determine whether there is evidence of Pol θ-mediated end joining in genome evolution in other eukaryotes.

Recently, Pol θ-mediated repair was also implicated as the major repair mechanism for CRISPR/Cas9-induced breaks in C. elegans [74]. CRISPR/Cas9 is an emerging genome editing system that has been used to disrupt or replace genes in many organisms (reviewed in [75]). One method of generating mutations through this system involves the creation of a single targeted DSB in a gene of interest, which can then be repaired mutagenically (with small insertions and/or deletions). Repair of Cas9-induced DSBs was initially thought to be a result of c-NHEJ; however, depletion of c-NHEJ factors Lig4 and Ku80 has no effect on the frequency or type of genomic alteration in C. elegans [74]. Depletion of Pol θ decreases repair frequency sixfold and alters the profile of repair products. In Pol θ-proficient worms, the median deletion size is 13 bp and repair is often accompanied by short insertions. However, when Pol θ is depleted, deletion size increases 1000-fold, with a median deletion size of 10–15 kb and no insertions [74]. Whether the Pol θ dependence for repair of Cas9-induced breaks is specific to C. elegans or is more broadly conserved remains to be determined.

Evidence for Pol θ involvement in cancer progression

Error-prone Pol θ, like other TLS polymerases, needs to be tightly regulated, as dysregulation of its expression might promote mutagenesis and genome instability. An expression study of human replicative and TLS polymerases has shown that Pol θ and Pol ν are the only polymerases that are significantly upregulated in breast cancer patient tumors, compared to non-tumor tissue [76]. Breast cancer tumors are often deficient in HR proteins and therefore rely heavily on other DSB repair pathways. High expression of Pol θ is associated with poor patient prognosis, especially in those that have other genetic mutations or markers of advanced disease. For example, breast cancer patients with both high Pol θ expression and lymph node metastasis have significantly poorer survival than patients with either variable alone, with a survival rate of 50 % at about 40 months in one cohort [76]. High Pol θ expression is also significantly associated with a number of prognostic breast cancer indicators, including tumor size. Triple negative breast cancer, which lacks expression of estrogen receptor, progesterone receptor, and HER2, is the most aggressive and chemoresistant form of breast cancer. Triple negative breast cancer tumors are most frequently associated with high Pol θ levels, which may aid in TLS and DSB repair and promote survival when other DNA repair pathways are compromised [76].

Overexpression of Pol θ is not limited to breast cancer. One of the earliest Pol θ expression studies showed that Pol θ is preferentially expressed in lymphoid tissue, where it participates in class switch recombination, and is upregulated in lung, stomach, and colon cancer [77]. More recently, studies have demonstrated that Pol θ is upregulated in oral squamous cell carcinoma, non-small cell lung cancer, and colorectal cancer [78–80]. In non-small cell lung cancer (NSCLC), which is the leading cause of cancer deaths worldwide, Pol θ is the only DNA polymerase that is upregulated twofold or greater in tumors compared to normal tissue [79]. Indeed, all other non-replicative polymerases are slightly downregulated in NSCLC tumors. Pol θ was overexpressed in more than 80 % of NSCLC samples and overexpression strongly correlated with poor patient survival, indicating that Pol θ may have a particularly important role during development of this type of cancer [79].

Intriguingly, overexpression of Pol θ in non-tumor cell lines with functional HR leads to an increase in DNA damage foci, suggesting that overexpression of Pol θ by itself can contribute to genome instability [76]. Importantly, these DNA damage foci include phosphorylated Chk2, which indicates that overexpression of Pol θ could lead to an extended checkpoint response. In this case, Pol θ might be out-competing HR but repairing DNA at a slower rate, leading to checkpoint activation.

The human POLQ gene contains 23 known single nucleotide polymorphisms (SNPs), at least nine of which are predicted to alter protein function. When these SNPs were first identified, they were not found to be significantly associated with sporadic or hereditary BRCA1/2-normal breast cancer [81, 82]. Recently, a study found that a mutation within the promoter region of POLQ (c.-1060A > G) is significantly associated with hereditary breast cancer as well as hereditary breast and ovarian cancer syndrome in women with unknown BRCA1/2 profiles. However, the mutation is not associated with spontaneous breast cancer occurring in postmenopausal women over 50 [83]. The POLQ (c.-1060A > G) SNP is located at a putative binding site for the transcription factor Yin Yang 1 (YY1) [83]. Pol θ expression levels were not measured in this study, but the authors speculate that disrupting YY1 binding decreases transcription levels and that DSBs and DNA lesions may accumulate in the absence of Pol θ-mediated repair. It is also possible that the (c.-1060A > G) mutation positively affects Pol θ transcription and drives mutation in HR-deficient tumors.

In 2013, a study of two Chinese populations identified SNPs in a number of DNA repair genes that were significantly associated with esophageal squamous cell carcinoma (ESCC). The DNA repair genes were clustered by pathway and it was found that those associated with homologous recombination, non-homologous end joining, and base excision repair were often significantly associated with ESCC, while nucleotide excision repair and mismatch repair genes were not [84]. Notably, the study identified ESCC-associated SNPs in the POLQ, HEL308, and POLN genes, thereby linking Pol θ to another type of cancer [84].

Pol θ was identified as a possible tumor-specific target using an siRNA screen to identify genes whose knockdown causes increased tumor radiosensitivity [49, 85]. Knockdown of genes involved in HR and c-NHEJ led to increased γ-H2AX foci, a marker for DSBs, in both irradiated and non-irradiated tumor cell lines. Only depletion of Pol θ resulted in a specific increase in γ-H2AX foci in irradiated tumor cells and not in normal or non-irradiated cells [85]. The sensitization effect of Pol θ knockdown was observed in several different tumor cell lines. Since then, Pol θ has garnered much attention as a possible target for cancer therapeutics.

Pol θ as a chemotherapeutic target

Given the extreme phenotypic differences observed between high and low Pol θ expression, it is possible that Pol θ could be a major genetic driver contributing to poor clinical outcomes in cancer patients. This would make Pol θ a good candidate as a prognostic marker and an appealing target for clinical therapeutics. However, until recently it was unclear exactly how Pol θ might function in a tumor-specific role, hampering drug development efforts. Emerging evidence now points to novel roles for Pol θ in modulating replication origin firing and DNA damage response pathway choice, both of which could be critical in tumor proliferation.

Two hallmarks of cancer progression are sustained cell proliferation and genome instability. Slower replication fork speed and shorter inter-origin distances are characteristics of cells with activated oncogenes, indicating that replication may be inhibited and dormant origins are being activated (reviewed in [86]). Strikingly, overexpression of Pol θ in colorectal cancer is more strongly associated with poor patient survival when replication and origin firing factors are also significantly upregulated [80]. This could indicate that Pol θ-mediated repair is utilized during DNA replication when DNA damage levels are high, which could in turn contribute to higher levels of genome instability in these cells. While a role for Pol θ-mediated MMEJ has been established during G2, human Pol θ also associates with chromatin in G1, well before the bulk of MMEJ repair is thought to take place [87]. During G1, Pol θ also interacts with origin licensing proteins Orc2 and Orc4. Though depletion of Pol θ does not affect replication origin number or density, it does appear to cause a subtle shift in the timing of origin firing during S phase, with some origins transitioning from early to late firing and others from late to early [87]. Overexpression of Pol θ leads to suppression of origin firing in a subset of origins and delayed replication. One interpretation of these data could be that under normal conditions, Pol θ binds to late firing origins to prevent them from firing early. When Pol θ is upregulated, the excess proteins may bind aberrantly to early- and mid-firing origins, delaying their firing as well [87]. It is possible that this temporal shift in origin firing could lead to the replicative stress and global genome instability characteristic of cancer cells.

In addition to its role in promoting MMEJ repair of DSBs, it now appears that human Pol θ may actively prevent HR repair. Human Pol θ binds to RAD51 and this interaction is dependent on three distinct regions located in the central domain [88]. Pol θ binding of RAD51 might sequester it away from DNA, thereby preventing the initiation of HR. Consistent with this model, the number of IR-induced RAD51 foci in the U2OS osteosarcoma cell line increases in the absence of Pol θ and in the presence of Pol θ mutants that cannot bind RAD51 [88]. Thus, Pol θ-mediated end joining appears to directly compete with HR in human cells. Interestingly, the putative RAD51-interaction sites appear to be conserved in vertebrates but not in invertebrates, which may explain why mutation of POLQ has not been reported to impact HR efficiency in other organisms.

The use of PARP inhibitors to treat breast and ovarian cancers has become common, with several drugs currently in use or in clinical trials. One way that PARP inhibitors might function is through their ability to prevent repair of single-strand breaks and double-strand breaks that would be lethal in certain genetic backgrounds [89]. Recent results suggest that targeting PARP1 in combination with Pol θ inhibition could be an attractive therapeutic option for HR-deficient tumors. Untreated MEFs lacking both Brca1 and Pol θ show increased chromosome aberrations and radial chromosome formation [62] and FANCD2-deficient mice that also lack Pol θ have severely reduced viability [88]. Pol θ is upregulated in HR-deficient cell lines derived from epithelial ovarian cancers [88]. BRCA1−/− tumor cell lines, which are deficient in HR, become hypersensitive to PARP inhibitors when Pol θ is co-depleted [88]. In total, these findings demonstrate that HR-deficient tumors are dependent on Pol θ for repair of DSBs and can be effectively targeted by simultaneous knockdown of Pol θ and chemotherapeutic treatment.

As previously mentioned, Pol θ is recruited to sites of DSBs in a PARP1-dependent manner [62]. PARP1 binds to DNA in the presence of single-strand and double-strand DNA breaks and has been suggested to act as a scaffold that recruits proteins involved in alternative end joining [64]. Thus, one possible explanation for the synergistic effect of PARP inhibition and Pol θ knockdown in HR-deficient cancer cells could be a total inability to repair DSBs. However, an alternative explanation could be that Pol θ has an additional role besides PARP1-facilitated end joining, the exact nature of which awaits further characterization.

Concluding thoughts

Recent structural, biochemical, and genetic data illustrate that Pol θ is a highly specialized translesion synthesis polymerase with multiple roles in DNA metabolism. Though it may function in unique repair pathways in some model systems, its role in Pol θ-mediated end joining appears to be ubiquitous in metazoans. Now that a conserved role in MMEJ has been established, several intriguing questions clamor for attention. What role, if any, does the conserved helicase domain have in ICL and/or DSB repair? What proteins interact with Pol θ to assist it in MMEJ repair? Are there specific DNA sequences or chromatin contexts that promote Pol θ-mediated end joining? Following this line of inquiry, it will be interesting to survey the genomes of multiple organisms to look for DNA sequences that are hotspots for Pol θ-mediated end joining. Finally, given the many studies that have linked Pol θ overexpression to cancer severity and progression, the mechanism(s) by which Pol θ overexpression promotes cancer will be a ripe target for future study. While expression levels of Pol θ are proving to be a useful diagnostic tool, understanding the link between Pol θ function, DSB repair, DNA replication, and cancer progression will be critical to creating effective cancer therapeutics while minimizing potential undesirable side effects.