Introduction

Geminiviridae is the largest family of plant-infecting viruses. It has been classified into seven genera based on the insect vector used for transmission, host range and genome organization [2, 77]. Geminiviruses cause disease in a wide range of hosts, mostly dicotyledonous plants, and are responsible for substantial economic damage to many important crops such as chilli, tomato, bean, squash, cassava, okra and cotton [15–17, 38–40, 69, 70]. One reason for the increasing incidence and epidemics of geminivirus infection is recombination among different geminiviruses co-infecting a host plant, which ultimately leads to the emergence of more virulent viruses.

The geminivirus genome consists of either one or two DNA molecules. The DNA-A molecule encodes two ORFs (AV1 and AV2) on the virion-sense strand and four ORFs (AC1, AC2, AC3 and AC4) on the complementary-sense strand. The AV1 and AV2 ORFs code for the capsid or coat protein (CP) and the pre-coat protein, respectively, while AC1, AC2 and AC3 encode the replication initiator protein (Rep), the transcription activator protein (TrAP) and the replication enhancer protein (REn), respectively. The AC4-encoded protein is required for symptom production. The DNA-B molecule carries the BC1 and BV1 ORFs, which encode the movement protein (MP) and the nuclear shuttle protein (NSP), respectively.

Rep is a multifunctional protein, as is evident from the presence of various motifs in the protein (Fig. 1) and its ability to interact with various host factors [63]. It binds DNA in a site-specific manner at the iterons present in the intergenic region (IR) and then initiates viral replication by creating a nick at the conserved nonanucleotide sequence. It also possesses ligase and ATP-dependent topoisomerase I activities [42, 55], as well as ATPase and helicase activities [20, 21, 24, 28].

Fig. 1

Diagrammatic representation of the replication initiator protein (Rep) of geminivirus (modified from [63])

Rep and geminivirus replication

Geminivirus replication occurs within the nuclei of infected plant cells through dsDNA intermediates via the rolling-circle mode of replication (RCR) [43]. However, characterization of the various DNA intermediates produced during replication indicates that geminiviruses also use a recombination-dependent mode of replication (RDR) [33]. Geminivirus infection alters the expression level of a number of plant genes, many of which regulate biological processes such as the cell cycle, nucleotide metabolism, DNA repair and recombination [5]. Viral infection activates the host DNA synthesis machinery by upregulating genes required during the S/G2 phase while downregulating genes associated with the M/G1 phase. The Rep protein interacts with and inhibits the retinoblastoma-related protein (RBR), thereby disrupting E2F-RBR binding [25, 29, 36]. This activates the expression of DNA polymerases and other replication-related host factors that are regulated by the E2F transcription factor.

Geminivirus Rep is a highly conserved protein. It shares no similarity with known polymerases but instead exhibits noteworthy similarity with the replication initiator proteins (Rep proteins) of eubacterial plasmids. It contains three conserved motifs, namely motifs I, II and III, at its N-terminus [37]. These motifs are needed for initiation as well as termination of DNA synthesis during RCR. The Rep protein acts as a site- and strand-specific endonuclease [42]. Motif I (FLTY) and motif II (HLH) are required for dsDNA binding and metal binding, respectively, while the hydroxyl group of the Y (tyrosine) residue of motif III forms a covalent bond with the 5′-PO4 of the cleaved DNA strand [37, 42]. Thus motif III (YxxKD/E) forms the catalytic site for endonuclease activity.
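The three N-terminal motifs described above can be expressed as simple sequence patterns. The following is a minimal sketch, assuming the literal consensus strings given in the text (FLTY, HLH and YxxKD/E, with x taken to mean any residue); the function name `find_rep_motifs` and the toy sequence are hypothetical and are not taken from any real Rep protein.

```python
import re

# Consensus patterns as written in the text; "x" (any residue) becomes "."
# and "D/E" becomes the character class [DE]. These literal patterns are an
# assumption: real Rep sequences may deviate from the consensus.
MOTIF_PATTERNS = {
    "motif I": "FLTY",
    "motif II": "HLH",
    "motif III": "Y..K[DE]",
}


def find_rep_motifs(protein: str) -> dict[str, list[int]]:
    """Return 1-based start positions of each motif in a protein sequence."""
    seq = protein.upper()
    hits = {}
    for name, pattern in MOTIF_PATTERNS.items():
        # A zero-width lookahead reports overlapping occurrences as well.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
    return hits


# Hypothetical toy sequence built only to contain one copy of each motif.
toy_protein = "MAAFLTYPKCSGGHLHVLIQAAAYCSKDGD"
print(find_rep_motifs(toy_protein))
```

A pattern scan of this kind only flags candidate positions; whether a match is a functional motif still depends on its location in the N-terminal domain and on experimental evidence.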

For rolling-circle replication, Rep binds to the common region (CR) in a sequence-specific manner. Viral replication commences when the Rep protein introduces a nick within the highly conserved nonanucleotide sequence 5′TAATATTAC3′ present in the plus strand [42]. Mapping showed that cleavage occurs at the phosphodiester bond between the seventh and eighth residues of the invariant nonamer (5′TAATATT↓AC3′). The 5′-phosphate end of the cleaved DNA remains covalently attached to the Rep protein, while the 3′-hydroxyl end thus generated is used to prime RCR. After a full cycle of replication of the circular viral DNA, a new origin sequence is generated and is cleaved again. The nascent 3′ end of the DNA is then ligated to the previously generated 5′ end by the Rep protein, thus resolving the nascent viral single strand into genome-sized units. In the Rep protein encoded by Tomato yellow leaf curl virus (TYLCV), the N-terminal 211 amino acids are required for origin cleavage as well as ligation [31]. In addition to motifs I–III, helix 1 and helix 2, situated between motif I and motif II, are also required for DNA binding and for the endonuclease activity that initiates replication [54]. The DNA-binding domain spans amino acids 1–130 and overlaps with the oligomerization domain (amino acids 120–180) [54]. This suggests that the DNA-binding activity of the Rep protein during replication initiation requires oligomerized Rep. The endonuclease and ligase activities are conferred by amino acids 1–120 of the Rep protein [31, 54]. Another motif, the Geminivirus Rep Sequence (GRS), is highly conserved among all geminivirus-encoded Rep proteins [52]. The GRS is a previously uncharacterized sequence consisting of two clusters of amino acids between motif II and motif III. GRS mutants showed impaired DNA cleavage activity, suggesting that the GRS is required for the initiation of replication.
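The position of the nick described above can be located computationally. The sketch below assumes a circular plus-strand sequence supplied as a plain string and searches for the nonamer TAATATTAC, including occurrences that span the point where the circle was linearized for storage. The function names and the toy sequences are hypothetical illustrations, not real viral genomes.

```python
NONAMER = "TAATATTAC"
# The nick lies between the 7th and 8th residues: TAATATT | AC
NICK_OFFSET = 7


def find_nick_sites(plus_strand: str) -> list[int]:
    """Return 0-based indices of the base immediately 3' of each nick.

    The sequence is treated as circular, so a nonamer that spans the
    arbitrary start/end of the stored string is also found.
    """
    seq = plus_strand.upper()
    n = len(seq)
    # Append the first 8 bases so a nonamer crossing the join is visible.
    extended = seq + seq[: len(NONAMER) - 1]
    sites = []
    start = extended.find(NONAMER)
    while start != -1 and start < n:
        sites.append((start + NICK_OFFSET) % n)
        start = extended.find(NONAMER, start + 1)
    return sites


def linearize_at_nick(plus_strand: str, nick: int) -> str:
    """Rotate the circular sequence so that it starts just 3' of the nick."""
    seq = plus_strand.upper()
    return seq[nick:] + seq[:nick]


# Hypothetical toy "genome" containing a single nonamer.
toy_genome = "GGCCTAATATTACCGG"
nick = find_nick_sites(toy_genome)[0]
print(nick, linearize_at_nick(toy_genome, nick))
```

After linearization the string begins with AC and ends with TAATATT, which mirrors the free 3′-hydroxyl end and the Rep-linked 5′ end produced by cleavage.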

So far, only the NMR structure of the N-terminal domain of the TYLCV Rep protein has been solved [12]. Structural information on the C-terminal region of this protein is not available. Secondary structure prediction of the Rep protein showed the presence of conserved Walker motifs (Walker A and Walker B) in the C-terminal region, which are required for the ATPase activity and for the associated helicase activity [20, 21, 24, 28]. The C-terminal region of the Rep protein also contains a B′ motif, which in many animal viruses has been reported to be involved in ssDNA binding during DNA translocation and is therefore required for Rep-mediated unwinding during replication [28].

Rep as transcription regulator

Rep binds to a site between the transcription start site and the TATA box, which suggests that this binding affects transcription [72] (Fig. 2). In the life cycle of many DNA viruses, such as SV40, an early gene product is known to autoregulate its own transcription. The Rep protein likewise autoregulates its own transcription inside the host cell: it represses its own promoter by binding to the conserved iteron sequences in the CR of the viral genome [26]. Initiation of replication and autoregulation of transcription through Rep-iteron binding are independent events, as Rep mutants impaired in replication still exhibit transcriptional repression activity. This autoregulation helps in the expression of the AC2 and AC3 genes, since the transcription start site for both of these downstream genes lies within the coding region of the AC1 ORF [68]. Expression of the AC2 ORF brings about the suppression of plant defence responses and the expression of late viral genes in the later phase of geminiviral infection. In addition, the Rep protein of Tomato yellow leaf curl Sardinia virus (TYLCSV) contains a highly conserved RGG sequence at amino acid positions 124–126 which, when mutated, alters the subcellular localization of Rep and inhibits its transcriptional autoregulation [66].

Fig. 2

The plus-strand origin of replication and AC1 promoter in TGMV. Proteins predicted to interact with the origin are shown: Rep, REn, TATA binding protein (TBP) and G box transcription factor (GT). The Rep-REn interaction is indicated. The directions of replication and transcription are shown. * marks the site of cleavage at the conserved nonanucleotide sequence for replication initiation. Adapted from [30]

Rep as stimulator of viral transcription through recruitment of post translational modification machinery

Geminivirus Rep is an early protein and has been shown to regulate its own transcription [26], which results in activation of transcription of the late viral genes [68]. In addition, it has recently been found that the Rep protein of Chilli leaf curl virus (ChiLCV) stimulates viral transcription in a different way, through post-translational modifications. The viral genome has previously been shown to exist as a minichromosome, and Rep has been shown to interact with histone protein [35].

Recently, post-translational modifications of histones, namely ubiquitination of H2B and methylation of H3 at the K4 position, were detected on ChiLCV chromatin. Deposition of H2B-ub and H3-K4me3 on ChiLCV chromatin was found to coincide with RNA polymerase II occupancy, indicating that the viral promoters are modified and carry marks of transcriptional activation. These post-translational modifications are brought about by histone monoubiquitination1 (NbHUB1) and ubiquitin conjugating enzyme2 (NbUBC2). Monoubiquitination of H2B marks transcriptionally active regions of chromatin [44] and acts as a precursor to trimethylation of H3 at lysine 4, which is an epigenetic mark of transcriptional activation. Interestingly, ChiLCV Rep was found to interact with NbHUB1 and NbUBC2 and to recruit this post-translational modification machinery onto the viral chromatin, leading to ubiquitination of H2B and trimethylation of H3 at the K4 position [41]. These modifications occur at viral promoters and subsequently stimulate viral transcription.

Rep as suppressor of gene silencing

Both Rep and RepA of the mastrevirus Wheat dwarf virus (WDV) inhibit post-transcriptional gene silencing (PTGS) of ssGFP as well as inverted-repeat dsGFP constructs and thus act as suppressors of RNA silencing [45]. However, their silencing suppressor activity was weaker than that of the HC-Pro protein encoded by Tobacco etch virus (TEV).

Methylation of DNA (specifically at cytosine bases) leads to repression of gene transcription, known as transcriptional gene silencing (TGS), while methylation of histones can result in either activation or repression of gene transcription. Several findings indicate a role for host-mediated methylation of the viral genome during geminiviral infection. The geminiviral genome is highly methylated in vivo [58]. Moreover, among the various replicative intermediates, namely open circular, covalently closed circular and heterogeneous linear DNA, it is the heterogeneous linear viral dsDNA that is preferentially methylated [56]. The conserved hairpin and the AC1-binding region of the viral DNA are frequent sites of cytosine methylation. Because the early and late gene promoters and the origin of replication reside within this region, methylation is likely to affect viral transcription and replication. In vitro DNA methylation was found to have a negative impact on replication of the viral genome in tobacco protoplasts [9, 27].

Several studies indicate that plants use both the cytosine and the histone methylation machineries against invading geminiviruses, which the virus-encoded RNA silencing suppressor proteins (RSS) counter by inhibiting the host silencing machinery [11, 59]. In support of this notion, Arabidopsis plants deficient in cytosine methyltransferases or histone methyltransferases, as well as plants mutant for methyl cycle components, developed more pronounced symptoms upon geminivirus infection, and the viral DNA isolated from them had a significantly reduced methylation level [58]. Recently, it was also demonstrated that geminivirus infection results in a Rep-mediated reduction in transcript levels of the methyltransferases MET1 and CMT3 in N. benthamiana, which are responsible for maintenance of symmetric methylation [64].

Interaction of Rep with host factors

Rep interacts with several host factors that are implicated in replication and in the plant defence machinery; these are summarized in Table 1. The geminivirus Rep protein is known to interact with cell cycle-related proteins (RBR, PCNA, RF-C, RPA-32) and with recombination and repair-related proteins (RAD51, RAD54). Progression of the cell cycle is controlled by the retinoblastoma (Rb) family of proteins. Expression of maize homologues of Rb (retinoblastoma-related protein, RBR) reduced geminiviral DNA replication in wheat cells. The RepA protein encoded by Wheat dwarf virus (WDV) physically interacts with the RBR protein by virtue of its LXCXE motif [79]. This interaction is important, as disrupting it abolished WDV replication in cultured wheat cells. The maize RBR1 and RBR2 proteins interact with both Rep and D-type cyclin [1]. Interestingly, Tomato golden mosaic virus (TGMV) Rep lacks the LXCXE motif and interacts with the RBR protein through a distinct and novel helix 4 motif [3]. The helix 4 motif of the Rep protein consists of charged amino acid residues flanking a hydrophobic core. Geminivirus infection induces the expression of proliferating cell nuclear antigen (PCNA) in mature infected cells [51]. Tomato PCNA has been shown to bind to the TYLCSV Rep protein as well as to the REn protein [13]. In the case of Indian mung bean yellow mosaic virus (IMYMV), amino acids 134–183 of the Rep protein have been shown to be required for interaction with PCNA [6]. This interaction downregulates the endonuclease and ATPase functions of the Rep protein. The WDV Rep protein binds to the large subunit of the wheat replication factor C complex (TmRFC-1) in DNA/Rep/TmRFC-1 complexes that resemble the pre-initiation complex, and thus aids the further assembly of the elongation complex for viral replication [50].

Table 1 Host factors known to interact with geminivirus Rep along with implication of their interactions

In addition, the Rep protein binds to Replication Protein A-32 (RPA-32), a component of the heterotrimeric ssDNA-binding protein complex, which is involved in repair and recombination. The RPA-32 subunit interacts with the C-terminal region of the Mung bean yellow mosaic India virus (MYMIV) Rep protein and downregulates its endonuclease activity while upregulating its ATPase activity [71]. It is therefore suggested that the interaction with RPA-32 limits replication initiation and drives the transition to the elongation phase of RCR. RAD51 and RAD54 are two repair and recombination proteins that interact with the Rep protein [34, 73]. These proteins might play a critical role under replication stress by stabilizing the replication fork. The N-terminal region of the AtRAD54 protein binds to the oligomerization domain of the MYMIV Rep protein and enhances its nicking, ATPase and helicase activities in vitro [34]. Similarly, AtRAD51 has been shown to interact with the MYMIV Rep protein. In contrast to these reports, studies on rad54 and rad51 mutant A. thaliana plants showed no effect on the complementary strand replication (CSR), RCR or RDR of Euphorbia yellow mosaic virus (EuYMV) [61, 62]. Instead of AtRAD51, the RAD51 paralog RAD51D has been shown to promote viral replication. This raises the possibility of functional redundancy and of a role for other homologous recombination (HR) proteins in viral replication.

Geminiviral genomic DNA has been shown to assemble into minichromosomes [57]. The Rep protein of TGMV interacts with histone H3, which points to a role for this interaction in replication and transcription [35]. It has been hypothesized that recruitment of Rep to the viral genome and its interaction with histone H3 may help remove the nucleosomal block in minichromosomes and thus promote efficient transcription and replication. The Rep protein also interacts with a kinesin motor protein (GRIMP) that is involved in mitosis [35], and with a kinase, Geminivirus Rep interacting kinase (GRIK). These interactions might prevent the cell from entering the mitotic phase.

Members of the NAC-domain-containing protein family are involved in plant developmental pathways. The GRAB1 and GRAB2 proteins, which belong to this family, have a C-terminal region rich in negatively charged residues and a conserved N-terminal region. The N-terminal region of GRAB1 and GRAB2 interacts with WDV Rep and inhibits replication [78]. The N-terminal region of Rep physically interacts with the sumoylation conjugating enzyme (SCE-1) [14]. Residues K68 and K96 in the N-terminal region of the Rep protein were found to mediate the interaction with SCE-1; mutating them abolished the interaction and reduced viral accumulation in infected plants [65]. Recently, transcriptomic analysis of TYLCSV-infected tomato plants revealed elevated expression of genes required for suppression of programmed cell death (PCD) [49]. The same study showed that this reprogramming is mediated by the central domain of TYLCSV Rep.

Rep as target to generate broad spectrum viral resistance

An effective strategy for developing viral resistance is to inhibit viral replication. Because the Rep protein is both essential and conserved, it serves as a suitable target for achieving broad-spectrum viral resistance. Many groups have targeted the Rep protein and employed various approaches to obtain Rep-mediated resistance to plant viruses.

Transgenic tobacco plants expressing an antisense sequence of TGMV AC1 show reduced symptom development upon virus infection and thus increased viral resistance [23]. Similarly, expression of antisense RNA for the region spanning AC1, AC2 and AC3 has been reported to inhibit replication in tobacco, N. benthamiana and tomato [4, 7, 8, 80].

Integration of an ACMV AC1 transgene in cassava has been reported to impart resistance to viral infection through post-transcriptional gene silencing (PTGS) [18]. Resistance is conferred by the 3′ end of the AC1 sequence, which overlaps with AC2; in ACMV-infected plants there is high accumulation of siRNAs homologous to the region encoding the C-terminus of Rep [18, 19]. Transient expression of siRNAs homologous to the ACMV AC1 sequence in tobacco BY2 protoplasts reduces viral replication owing to a decrease in AC1 mRNA [76]. In addition, symptom recovery in infected cassava plants was found to be associated with a high level of accumulated virus-derived siRNA [19].

The conserved nature of the N-terminus of the geminivirus Rep protein has also been exploited to confer resistance to viral infection through transgenic expression of Rep-binding peptide aptamers. Peptide aptamers consist of a short stretch of amino acid residues that binds to a target molecule [46, 60]. The peptide aptamers A22 and A64 bind to a conserved region of the Rep protein and provide broad-spectrum virus resistance, as they can bind to many geminivirus Rep proteins [60]. Transgenic plants expressing a truncated N-terminal Rep also exhibit a significant degree of viral resistance [32, 47, 53]. Transgenic tomato plants expressing the truncated Rep (T-Rep) are resistant to the homologous virus through repression of the Rep promoter, and to a heterologous virus through formation of impaired Rep complexes [10]. To generate stable virus resistance in Arabidopsis, an artificial zinc finger protein (AZP) was used to block the dsDNA site at the origin of replication of Beet severe curly top virus (BSCTV) to which the Rep protein must bind [67]. Transgenic Arabidopsis plants expressing the AZP were resistant to BSCTV infection. In [48], a synthetic Rep130 transgene (Rep130syn) was generated with silent point mutations introduced in such a way that the continuous homology between the Rep130syn sequence and the corresponding wild-type viral sequence (Rep130wt) was mostly less than or equal to 5 nt. This allowed the transgene to bypass the virus-induced gene silencing (VIGS) pathway of the host. The resistance provided by the transgene was therefore due to stable accumulation of the Rep130syn protein and was entirely protein-mediated pathogen-derived resistance (PDR). In [18], the PDR approach was used to generate a mutant Rep protein that conferred broad-spectrum resistance against geminiviruses in cassava. The transgenic plants showed significant levels of resistance to both homologous and heterologous species of cassava-infecting geminiviruses.
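The design criterion reported for Rep130syn, stretches of continuous homology to the wild-type sequence of mostly 5 nt or less, can be checked with a simple scan. The sketch below assumes two pre-aligned sequences of equal length without gaps; the function name and the two 12-nt toy sequences are hypothetical and are not the actual Rep130 sequences from [48].

```python
def longest_identical_run(seq_a: str, seq_b: str) -> int:
    """Length of the longest stretch of consecutive identical positions
    between two aligned, equal-length nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    longest = current = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Extend the current run on a match, reset it on a mismatch.
        current = current + 1 if a == b else 0
        longest = max(longest, current)
    return longest


# Hypothetical toy example: both sequences encode Met-Ala-Pro-Arg, but the
# "synthetic" version carries a silent change in the third position of
# each of the last three codons.
toy_wt = "ATGGCTCCACGT"
toy_syn = "ATGGCACCTCGC"
print(longest_identical_run(toy_wt, toy_syn))
```

Because the source describes the homology as mostly, not always, within the 5-nt limit, a practical check would report the distribution of run lengths and would not fail on the first run above the threshold.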

Concluding remarks

Rep is multifunctional in nature and possesses modular functions, but the coordination between these different modules is yet to be dissected. One reason for such a diverse interplay between Rep and host factors is that the virus encodes very few proteins. This early protein has therefore evolved by acquiring different domains that carry out diverse functions and steer the host cellular machinery towards efficient and productive viral replication inside the host cell.