Introduction

Parvoviruses are small, nonenveloped viruses with a linear, single-strand DNA (ssDNA) genome of about 5000 bases [1,2,3]. The family Parvoviridae contains two subfamilies, Densovirinae and Parvovirinae. Densoviruses infect only invertebrates, whereas Parvovirinae mainly infect vertebrates and contain eight genera: Bocaparvovirus, Dependoparvovirus, Erythroparvovirus, Protoparvovirus, Tetraparvovirus, Amdoparvovirus, Aveparvovirus, and Copiparvovirus [4]. Among these, the dependoviral adeno-associated viruses (AAVs) require co-infection with a helper virus for productive infection, whereas all others are autonomous, although they require the host cell to enter the S phase for viral DNA replication. Due to their high level of conservation, nonstructural protein genes are used for the classification of parvoviruses into different genera [5].

In terms of genetic composition, all parvoviruses of vertebrates have similar genomic structures, with terminal repeats required for DNA replication, a replication initiator protein (NS or Rep) on the left half of the genome, and a virion protein on the right (Fig. 1A). Parvoviruses are composed of two nonstructural proteins (NS1 and NS2) and two or three capsid proteins (VP1, 2, and 3) [6]. During canine parvoviral (CPV) infection of F81 (feline kidney) cells, NS1 mRNA and NS1 protein can be detected at 12 h, reaching maxima by 30 and 42 h, respectively [7]. The capsids of CPV, MEV (mink enteritis virus), and MVM (minute virus of mice) are formed of VP1 and VP2, while the PPV (porcine parvovirus) capsid contains an additional VP3 [8, 9].

Fig. 1
figure 1

A The MEV genome (Accession: KT899746.1). B The NS1 protein. The N-terminus is the DNA binding domain, including ion coordination sites, tyrosine residue site, and lysine sites; the middle is the helicase domain, including the NTP-binding site; and the C-terminus is the activation domain

A growing body of research has revealed that NS1 is involved in multiple activities, including the DNA damage responses, cell apoptosis, type I interferon responses, and tumor suppression [10], as well as significantly enhancing promoter activity by binding to cellular transcription factors or by direct DNA binding [11]. As a mainly nuclear phosphoprotein, NS1 also plays an important role in viral pathogenicity by regulating its phosphorylation state [12, 13].

In this review, we focus on recent advances in the structure of NS1 protein, its specific binding properties to DNA sequence and its biological function including the effects on viral gene expression and infection.

Structure of NS1 protein

The N-terminus of the parvoviral NS1 protein contains an origin of replication binding (OBD) domain [2, 14], also known as a DNA-specific binding domain or nuclease domain. Its central region contains a helicase domain which includes an NTP-binding site [15, 16], and its C-terminus includes a transactive domain [2, 14]. The two DNA-interacting domains, the N-terminal and helicase domains, are necessary for viral genomic replication and for the control of viral protein production, and the C-terminus is required for its transactivation function (Fig. 1) [14].

N-terminal domain

The N-terminal domain of the NS1 protein contains overlapping site-specific dsDNA binding, ssDNA recognition, and origin-specific ssDNA nicking functions [17]. It also contains a nuclear localization signal (NLS) that directs its transport into the host cell nucleus [17]. Parvoviral genomes amplify by a unique “rolling hairpin” process in which NS1 initiates genomic replication by binding site-specifically to the viral right-hand origin, with strand- and site-specific nicking of viral DNA at the end of the replication [16].

Structurally, the N-terminal domain of MVM NS1 reveals a nickase active site (Fig. 2A and B) that is highly versatile in binding the metal ligands required for ssDNA binding and cleavage, with the architecture of the active sites formed by E119, H127, H129, K214, and Y210, all which are essentially invariable in both metal-bound or free states [17].

Fig. 2
figure 2

The structure of N-terminal domain of NS1. A Structure of the nickase domain of MVM NS1(PDB: 4R94). The nickase active sites are marked with red box and enlarged in B. Residues involved in interaction with DNA

Through synthesis of the complementary strand DNA, genomic DNA is converted into a double-stranded replicative form after entering the host cell nucleus. This allows transcription and synthesis of the NS1 protein, which can direct viral DNA replication and modulate host cellular processes in favor of viral replication [18]. The initial binding site of NS1, in the Ori region, is therefore double-stranded. In MVM NS1, structurally, the loop L10 inserts into the DNA major groove, while the helix α4 interacts with the DNA minor groove in the N-terminal. K143, R146, and R147 in the helix α4 and 3 basic residues K191, K194, and K195 in the loop L10, along with K189 and K199, may all be involved in interaction with DNA (Fig. 2C) [17, 19].

Helicase domain

In addition to sequence-specific dsDNA recognition, the middle region of the parvoviral NS1 protein is able to function as a ssDNA helicase [14]. Helicases are enzymes that use the energy of ATP hydrolysis to translocate along DNA or RNA and unwind double-stranded regions [20]. They have been classified into three main superfamilies on the basis of sequence comparisons, with the NS1 belonging to the SF3 superfamily [14, 21, 22], members of which are encoded only by small DNA and RNA viruses [23]. SF3 helicases contain 4 conserved sequence motifs within a limited 100 amino acid region: Walker A (GPATTGKT), B (VIWWEE), B’ (a 14 amino acid region), and C (with an invariant Asn residue) motifs [24], all of which are essential to make up the core of the enzyme active site (Fig. 3) [25].

Fig. 3
figure 3

Structure of Rep40 (PDB: 1S9H). Walker A is in blue, walker B is in green, walker B’ is in red, and walker C is in magenta

Within the Parvovirinae, the helicase domain of carnivore parvovirus is poorly resolved. Analysis of this domain has mostly concentrated on AAV (Fig. 2B). AAV contains 4 nonstructural proteins, Rep78, -68, -52, and -40 which arise from alternative splicing schemes and differential promoter usage within the Rep ORF. All four isoforms possess the helicase domain [25]. In Rep40, residues K340 and T341 in the Walker A motif are predicted to form hydrogen bonds with an ATP molecule [25].

Additionally, as members of the SF3 family, all AAV Rep proteins have a version of the AAA + ATPase domain, an enzyme associated with a variety of cellular activities [24], containing an N-terminal helical bundle known as the oligomerization domain (OD). Common viral SF3 helicases and most AAA + proteins can form hexameric rings [26]. A characteristic of AAA + proteins, pertinent to their oligomerization, is the presence of a so-called arginine finger. Within the context of the hexamer, the arginine residue penetrates the active site of a neighboring subunit and thus plays a critical role in cooperative ATP hydrolysis. Analogous to AAA + proteins, R444 in the Rep40 hexamer model is in an excellent position to hydrogen bond with the γ-phosphate of an ATP modeled in the adjacent subunit [25]. Structures of hexameric ring helicases also reveal loops facing into the center of the pore whose residues are involved in ssDNA binding [20]. The Rep40 hexamer model shows the presence of similar loops protruding into the central pore that may play an analogous role. Among these, in the loop in hairpin 1, two lysine residues, K404 and K406, have the potential to interact with ssDNA passing through the central channel.

The amino acid sequence alignment of the NS1 proteins of MEV, MVM, CPV, FPV (feline parvovirus), PPV, and the Rep protein of AAV shows strict conservation of the active site residues noted above, reflecting the structural and functional unity of the active site architecture within the Parvoviridae.

The oligomerization of NS1 protein

Structurally, SF3 helicases share two domains, a DNA origin interaction domain (OID) and an AAA + motor domain. The AAA + motor domain is also a structural feature of cellular initiators and it functions as a platform for initiator oligomerization [24, 27].

Studies have shown that Rep proteins have a dynamic oligomeric behavior in which the DNA substrate molecule modulates its oligomeric state. Different oligomeric Rep-DNA complexes may form to carry out diverse reactions at different stages of the viral life cycle. In Rep68, the OID binds the Rep binding sequence (RBS) double-stranded DNA specifically, while the AAA + domain binds ssDNA or ss-dsDNA junctions non-specifically to perform the unwinding of DNA depending on both the DNA structure and cooperativity of Rep68 domains [27].

As for dsDNA, a 26 bp dsDNA region of the AAV origin promotes the formation of a pentameric Rep complex (Fig. 4A). Five Rep monomers bind five tetranucleotide repeats (5′-GCTC-3′) at the RBS, and each repeat is recognized by two Rep monomers from opposing faces of the DNA [19, 27]. Oligomerization of Rep68 on ssDNA requires synergy between the OID and AAA + domains and is a dynamic associative process from dimer to octamer rings.[27]. The cryo-EM and X-ray structures of Rep68–ssDNA complexes show that Rep68 generates hybrid ring structures (Fig. 4B), the OBD favors the formation of octamers (double-octameric complex), and the HDs form heptamers. Upon binding ATPγS, the HDs transform into hexameric rings (Fig. 4C) [26]. In addition, AAV is the only eukaryotic virus known to integrate its genome in human cells in a specific region of chromosome 19. Binding to the integration site, AAVS1 produces a heptameric complex [26].

Fig. 4
figure 4

The structure of Rep binding to DNA sequence. A Structure of OBD domain of Rep bound to dsDNA (PDB: 1RZ9). B Structure of helicase domain of Rep bound to ssDNA (PDB:7JSI). C The overall dimensions of Rep bound to ssDNA [26]

Specific binding of NS1 protein and DNA sequence

Structural analysis of NS1 protein has shown that the DNA-interacting function is mainly associated with the N-terminal origin-binding and helicase domains, while the C-terminal is mainly involved in transactivation [14, 17, 28]. Binding of NS1 protein to DNA sequences promotes its oligomerization [27]. Unlike AAV Rep, MVM NS1 molecules exhibit site-specific DNA binding only when preassembled into some form of oligomer (at least dimer) [29].

NS1 protein can bind to a simple cognate recognition sequence comprising two to three tandem copies of the tetranucleotide TGGT. This motif is also widely dispersed throughout the viral genome. In MVM, NS1 specifically binds to many internal sites, including the ORI recognition complexes OriL and OriR, the TAR element, the minor intron, and elements toward the right end of the genome [29].

Viral replication induces the formation of two ORI recognition complexes, named OriL and OriR according to their positions in the genome. In order to initiate viral DNA replication, the NS1 protein requires to bind at the initiation site, a sequence rich in (TGGT)2–3 [29]. The linear single-stranded DNA genome of MVM is replicated via an intermediate double-stranded replicative form (RF) DNA which requires the structural transition of the right-end palindrome from a linear duplex into a double-hairpin structure, to serve for the repriming of unidirectional DNA synthesis. However, elimination of the NS1-binding site (ACCA)2~3 from the central region of the right-end palindrome next to the axis of symmetry has been shown to markedly reduce the efficiency of hairpin-primed DNA replication, and confirms that the conformational transition is induced by NS1 [30]. In contrast, the binding site of AAV Rep is not found distributed throughout their genomes. In AAV origins, the larger Rep proteins (Rep68 and Rep78) bind to the RBE, which consists of 22 nucleotides including five consecutive tandem copies of the tetranucleotide GCTC [29].

One way that NS1 exerts its transcriptional regulation is by direct interaction with specific promoter sequences [11, 31]. NS1 is able to serve as a transcriptional activator of the viral P38 promoter through high affinity binding with sites immediately upstream of this sequence [17].

The function of NS1 protein

Effects of NS1 on viral gene expression

Transcriptional regulation of P4 promoter

Transcriptional regulation and post-transcriptional regulation constitute two major regulation modes of gene expression to either activate or repress the initiation of transcription and thereby control the number of proteins synthesized during translation [32]. At the level of transcription, the regulation of gene expression is largely dependent on the promoter [33].

The parvoviral genome consists of two overlapping transcription units. Members of the genus Parvovirus contain two promoters, P4 and P38 [34]. An early promoter such as the P4 promoter of MVM directs the synthesis of transcripts in which spliced derivatives R1 and R2 encode nonstructural proteins NS1 and NS2. Both NS1 and NS2 are phosphorylated proteins, and NS2 mRNA is derived from NS1 mRNA after alternative splicing, and they share a common N-terminal domain [17]. A PPV NS1 mRNA binding protein, SYNCRIP, has been identified to be involved in host pre-mRNA splicing, which is able to bind the 3′-terminal site of NS1 mRNA to promote the cleavage of NS1 mRNA into NS2 mRNA [10]. During infection, activation of the MVM P4 promoter is a key step in the replication of the virus with the initial activation being completely dependent on host cell factors [35]. In eukaryotes, promoters are able to bind transcription factors (TFs), recruit RNA polymerase and thereby regulate gene transcription. The P4 promoter contains multiple transcription factor binding sites, including E2F, Ets1, SP1, and CRE [36, 37]. Within the whole MVMp (prototype strain of MVM) genome, an E2F mutation abolishes P4 induction in the S phase resulting in amplification failure and a deficiency progeny particle generation, nevertheless, the virus can be rescued when nonstructural proteins are supplied in trans, showing that P4 hyperactivity in the S phase is required to reach a level of NS1 expression sufficient to drive the viral replication cycle [38]. NS1 protein is also cytotoxic, however, and there must be an additional negative feedback mechanism to prevent early death of the host cell through its overproduction [39].

While activation of the P4 promoter is completely dependent on host transcription factors, the composition of these varies widely among different cell types. It is possible, that only some host cell types have the capacity to initiate expression from the P4 promoter, which would therefore be a factor in determining the tropisms of parvoviruses. Indeed, the P4 promoter has been shown to play a role in determining the host cell-type range of MVM (Meir et al., 2017). A transgenic P4 promoter has also been shown to exhibit tissue and stage specific activity during embryonic development and continuing into adulthood [35]. The activity of the MVM P4 promoter increases with a decrease in its number of Tcf sites [37]. Additionally, abolition of the cyclic AMP Response Element (CRE) of the MVM P4 promoter reduces the infection but does not alter the host cell-type range in vivo [39]. Exchanging the promoter elements of MVM P4 with the equivalent regions of the closely related H1 virus results in elimination of infection in fibroblasts and chondrocytes while being retained in skeletal muscle [39]. It is apparent that the P4 promoter plays an important role in host range determination and therefore provides a target for modifying the productive infection potential of the virus [36, 37].

Transcriptional regulation of P38 promoter

The transcription of capsid protein is controlled by the P38 promoter. VP1 and VP2 mRNAs are generated through alternative splicing, and the major capsid protein, VP2, is able to self-assemble into empty capsids, known as virus-like particles (VLP), which act as major antigens in the induction of neutralizing antibodies [3, 8]. Parvoviral genomes contain a small ORF coding for a protein known as SAT, starting from four or seven nucleotides downstream from the VP2 initiation codon. This differs from previously identified nonstructural proteins, being expressed from the same mRNA as VP2 as a late NS protein [10, 40] which, as shown in the attenuated PPV NADL-2 strain, accumulates in the ER and accelerates virus release and spread [41].

As shown with MVM, transcription of structural protein genes is mainly regulated by the P38 promoter, which contains multiple cis-acting elements, including the upstream transactivation response (tar) motif, GC box, and TATA box. In MVM, mutations in the GC box or TATA box significantly reduce the level of transactivation. NS1 can also transactivate the P38 promoter [12, 42] and can regulate gene expression from heterologous promoters. Although the molecular mechanisms involved still require further investigation, genetic analysis has shown that the transcription-regulating domains of NS1 are confined to the amino- and carboxy-terminal portions of the protein [42].

The expression of MEV VP2 is inhibited in several common eukaryotic expression vectors. However, the 5′UTR of VP2 can enhance capsid gene transcription at both transcriptional and translational levels, and can significantly promote the transcriptional activity of the P38 promoter. Additionally, mutation of the 5′ UTR in MEV full-length clones has shown that the 5′ UTR is required for VP2 gene expression. NS1 has also been shown to increase the transcription activity of 5′UTR [3].

Effects of NS1 on viral replication and infection

NS1 and DNA damage responses

It has become increasingly clear that viruses, especially DNA viruses, can provoke DNA damage responses (DDRs) in infected cells, either in response to virus-encoded proteins or to the large amount of foreign DNA produced during viral replication. These cellular responses are varied, and have the potential to impede or facilitate virus replication [43, 44].

Upon infection, the MVM genome initially associates with sites of cellular DNA damage, and it appears that the virus exploits the damage response machinery early in infection to enhance its replication [43]. As the infection proceeds, new DNA damage sites are induced, characterized by the phosphorylation of H2AX, Nbs1, RPA32, Chk2, and p53. These proteins are recruited to MVM replication centers and co-localized with NS1, which is also the main viral replication protein; i.e., MVM can establish replication at cellular DNA damage sites and, as cellular DNA damage accrues, the virus spreads to newly damaged sites to amplify infection [45]. In addition, during MVM infection and during overexpression of the NS1 protein, the MVM genome, as well as a heterologous DNA molecule containing an NS1-binding site, can be localized to sites of cellular DNA damage, which confirms that NS1 plays an important role in localizing the MVM genome to sites of DNA damage to facilitate ongoing infection. It also demonstrates that binding of NS1 to a heterologous DNA sequence is necessary to translocate it to a cellular DDR site [46].

NS1 and immunity

As the first line of defense against viral infection, the innate immune system induces protective cellular factors including type I interferon (IFN) and inflammatory factors [47]. The nuclear factor-κB (NF-κB) pathway regulates expression of numerous components of the innate immune and inflammatory responses, controls cell proliferation and differentiation, and consequently regulates cell survival. PPV NS1 has been shown to activate this pathway and then to stimulate the production of interleukin (IL)-6 in a dose-dependent manner [47,48,49]. Nevertheless, in order to successfully infect cells, viruses have evolved strategies to escape the innate immunity. For example, while type I interferon plays a critical role in antiviral innate immunity, MEV infection can inhibit its expression in CRFK cells, during which the NS1 origin-binding domain plays an important role [50].

NS1 and apoptosis

During viral infection, premature host cell death by apoptosis may be triggered as a defense against the viral invasion [51]. Among the parvoviruses, the NS1 protein is considered to be responsible for inducing cell death [52]. For example, PPV NS1 can induce host cell death, effected mainly through the mitochondria-mediated intrinsic apoptosis pathway. PPV-induced apoptosis can also cause placental tissue damage, ultimately leading to reproductive failure. Additionally, PPV infection causes cell cycle arrest in the G1 and G2 phases, and the resulting NS1-induced apoptosis is significantly inhibited by caspase 9 inhibitor [53]. MEV NS1 can induce apoptosis in both F81 and HEK293T cells, similar to that induced during MEV infection in minks. NS1-induced apoptosis in HEK293T cells is also mediated by the mitochondrial pathway, through mitochondrial depolarization, opening of mitochondrial transition pores, release of cytochrome c, and activation of caspase 9 and caspase 3. In vitro infection of F81 cells with strain MEV-B induces cell cycle arrest in G1 phase [2]. It has also been reported that expression of CPV-2 NS1 in HeLa cells arrests infected cells in G1 phase mitochondrial-mediated apoptosis effected by the activation of caspase 9 [52].

Mutations in the key amino acid residues

Given the importance of NS1 protein, more and more studies have found that mutations in the parvovirus NS1 can result in loss of its biological property.

Although the structure has not being resolved, the key amino acids in CPV NS1 have been identified using the comparative modeling of AAV Rep. Mutations in the key ATP-binding amino acid residues of the conserved A motif (K406), B motif (E444 and E445), and positively charged region (R508 and R510) have been found to prevent the formation of infectious virus, showing that the DNA binding of NS1 depends on both binding and hydrolysis of ATP [16]. In addition, mutations in the divalent metal ion coordination site (H129, H131, and E121), the tyrosine that covalently links NS1 to the 5′end of the nicked DNA (Y212), the dsDNA recognition sites (K196 and K197), and the helicase sites (K470 and K472) in CPV NS1 change the intranuclear binding dynamics of NS1 dramatically. The deficiency is due to the difference of virus-specific as well as nonspecific DNA binding [14].

In MEV, the N-terminal replication origin-binding (aa 1–337), the helicase domain (aa 338–556), and the transactivation domain (aa 557–668) are all responsible for inducing apoptosis [2]. In addition, the deletion of any functional domain impaired the ability to enhance the activity of VP2-5′UTR [3].

Conclusions

The structural information can provide us a better understanding of the mechanisms of viral replication and infection. Structure analyses have revealed several key amino acid sites in the functional domains of NS1: a DNA binding site, a nickase active site, and an NTP-binding site [16, 17], although only a limited number of NS1 protein structures have been reported so far [17, 21, 26]. Many studies have focused on the effect of parvoviral NS1 protein on host cell functions such as apoptosis, cell cycle arrest, and interferon responses [10]. However, loss of any of the NS1 functional domains has a major negative impact on its normal biological activity [2, 3, 14]. As we have revealed, the NS1 origin-binding domain plays an important role in type I interferon responses [50]. We have also demonstrated that NS1 is required to activate the transcriptional activity of the VP2-5′UTR in MEV [3]. However, the mechanism by which the NS1 protein exerts its regulation on virus replication or infection has still not been investigated adequately. Analyses of the NS1 protein of its functional domains may provide further insight into the activity of this multitasking protein.