Recent studies have revealed an unexpected ubiquity and diversity of single-stranded DNA (ssDNA) viruses. Many ssDNA viruses with novel genome architectures and biological characteristics have been discovered in the last few years [3, 19]. Recently, a novel circular ssDNA mycovirus, Sclerotinia sclerotiorum hypovirulence-associated DNA virus 1 (SsHADV-1), was discovered and characterized [26]. SsHADV-1 has a genome organization similar to that of circoviruses and geminiviruses, i.e., it is bidirectionally transcribed, and its replication protein (Rep) is closely related to that of geminiviruses [26]. In less than three years, six viruses (or virus-like ssDNA molecules) with similarities to SsHADV-1 have been recovered from cassava leaf samples (cassava-associated circular DNA virus; CasCV), mosquitoes (mosquito VEM SDBVL-G virus; MvemV), dragonflies (dragonfly-associated circular viruses-1, 2, and 3; DfasCV-1, -2 and -3) and European badger faecal matter (Meles meles fecal virus; MmFV) [2, 16, 18, 23]. Recently, Rosario et al. [18] suggested that these viruses are similar and should belong to a new genus, which they have tentatively named Gemycocircularvirus (Gemini-myco–like circular DNA virus). Interestingly, these viruses also share some similarities with three geminivirus-like ssDNA viruses recently recovered from sewage samples, namely baminivirus (BamiV), niminivirus (NimiV) and nephavirus (NephV) [17]. The biological properties of SsHADV-1-like viruses have yet to be determined. Nonetheless, SsHADV-1 infects and confers hypovirulence to the plant pathogenic fungus S. sclerotiorum [26]. The fact that this virus can be transmitted extracellularly raises the possibility that it could be used for virus-mediated attenuation of fungal pathogenesis [27]. Furthermore, the recent discovery of SsHADV-1 in river bank and benthic sediments in New Zealand suggests that this virus may be more common in environmental samples than previously thought [10].
In this paper, we report the identification and molecular characterization of a novel ssDNA virus that is similar to SsHADV-1 and may be a new member of the proposed genus Gemycocircularvirus.

Hypericum japonicum plants exhibiting yellow mosaic symptoms were collected from Phu My, Binh Dinh Province, Vietnam (14° 11′ 11″ N; 109° 3′ 53″ E), on February 4, 2013. Total DNA was extracted from the sample using a CTAB method as described by Harrison et al. [9]. Rolling-circle amplification (RCA) was conducted using a TempliPhi™ kit (GE Healthcare, USA) to recover the genome of a suspected begomovirus [8]. Digestion of RCA products with the restriction enzyme PstI yielded a ~2.2-kb DNA fragment, which was cloned and sequenced by primer walking (Invitrogen Co., Shanghai, China). Unexpectedly, BLASTx [1] analysis of the assembled sequence contigs revealed similarities (E value < 3 × 10⁻⁸⁸; 77% coverage) to the sequence of DfasCV-2, an SsHADV-1-like virus. Two sets of primers with opposite orientations (forward: 5′-nt2030-AGTCCTCCATCCTCGTGATG-nt2049-3′ and 5′-nt1822-ATCAGCAGACGTCCCATTTC-nt1841-3′; reverse: 5′-nt1822-GAAATGGGACGTCTGCTGAT-nt1841-3′ and 5′-nt1645-GAATTGAGTTTGTTCCGGGA-nt1664-3′) were designed to amplify the genome of the virus. The resulting sequences were identical to the one recovered by PstI digestion of the RCA product. We propose the name Hypericum japonicum-associated circular DNA virus for the putative viral isolate (HJasCV; GenBank accession no. KF413620).
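
The primer sequences listed above can be checked for internal consistency with a few lines of code. The following is a minimal sketch, not part of the original analysis: it confirms that each primer length matches its stated coordinates and that the second forward primer and the first reverse primer (both nt 1822-1841) are exact reverse complements of each other. That pairing is consistent with back-to-back primers on a circular template, although this reading is an inference. The dictionary keys and function names are illustrative.

```python
# Primers as printed in the text (5'->3') with their stated genome coordinates.
# The labels forward_1, forward_2, reverse_1, reverse_2 are illustrative only.
PRIMERS = {
    "forward_1": ("AGTCCTCCATCCTCGTGATG", 2030, 2049),
    "forward_2": ("ATCAGCAGACGTCCCATTTC", 1822, 1841),
    "reverse_1": ("GAAATGGGACGTCTGCTGAT", 1822, 1841),
    "reverse_2": ("GAATTGAGTTTGTTCCGGGA", 1645, 1664),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def length_matches_coordinates(seq: str, start: int, end: int) -> bool:
    """True if the primer length equals the inclusive coordinate span."""
    return len(seq) == end - start + 1


if __name__ == "__main__":
    for name, (seq, start, end) in PRIMERS.items():
        ok = length_matches_coordinates(seq, start, end)
        print(f"{name}: {len(seq)} nt, coordinates consistent: {ok}")

    fwd = PRIMERS["forward_2"][0]
    rev = PRIMERS["reverse_1"][0]
    print("forward_2 / reverse_1 are reverse complements:",
          reverse_complement(fwd) == rev)
```
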

The putative ORFs of HJasCV were detected using ORF Finder (http://www.ncbi.nlm.nih.gov/projects/gorf/). Like SsHADV-1, HJasCV employs an ambisense coding strategy and has two large ORFs, one in the virion-sense and the other in the complementary-sense strand (Fig. 1a). The proteins encoded by the two ORFs share significant amino acid sequence identity with putative capsid and Rep proteins of SsHADV-1 and SsHADV-1-like viruses, respectively. The identification of a putative Rep has been critical in the discovery of novel ssDNA viruses [3, 19].
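
To illustrate what an ambisense arrangement means for ORF detection, the sketch below scans both strands of a circular sequence for ATG-to-stop reading frames, letting a frame run across the origin. It is a simplified illustration and not the algorithm used by ORF Finder; the toy genome, the minimum length, and all names are hypothetical, and the standard genetic code stops (TAA, TAG, TGA) are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_circular_orfs(genome: str, min_codons: int = 3):
    """List ORFs as (strand, start, codons) on a circular genome.

    '+' coordinates are 0-based on the given sequence; '-' coordinates are
    0-based on its reverse complement. `codons` excludes the stop codon.
    For ORFs sharing a stop codon, only the longest is kept.
    """
    genome = genome.upper()
    n = len(genome)
    best = {}  # (strand, stop index) -> (start index, codon count)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        doubled = seq + seq  # lets a reading frame continue past the origin
        for start in range(n):
            if doubled[start:start + 3] != "ATG":
                continue
            # Walk codon by codon for at most one lap around the circle.
            for pos in range(start + 3, start + n - 2, 3):
                if doubled[pos:pos + 3] in STOP_CODONS:
                    codons = (pos - start) // 3
                    key = (strand, pos % n)
                    if codons >= min_codons and codons > best.get(key, (0, 0))[1]:
                        best[key] = (start, codons)
                    break
    return sorted((s, start, c) for (s, _), (start, c) in best.items())


# Hypothetical toy genome: one ORF on each strand, separated by short spacers.
PLUS_ORF = "ATGAAACCCGGGTAA"
MINUS_ORF = "ATGGCTGACCTTTAG"
TOY_GENOME = PLUS_ORF + "CC" + reverse_complement(MINUS_ORF) + "CC"

if __name__ == "__main__":
    for strand, start, codons in find_circular_orfs(TOY_GENOME):
        print(f"strand {strand}, start {start}, {codons} codons")
```

In a real genome the virion-sense and complementary-sense ORFs would be far longer, and a spliced gene such as Rep would appear as separate ORFs until the intron is removed.
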

Fig. 1

a Genome organisation of Hypericum japonicum-associated circular DNA virus (HJasCV) and the putative stem-loop structure identified in the long intergenic region. b Sequence of the putative intron identified within the ORF encoding the putative replication protein. c Conserved motifs identified in the Rep of SsHADV-1 and SsHADV-1-like viruses. d Maximum-likelihood phylogenetic tree of SsHADV-1 and SsHADV-1-like viruses rooted with BamiV, NimiV and NephV. e Percentage pairwise identities (nucleotide) between SsHADV-1 and SsHADV-1-like viruses

All known SsHADV-1-like viruses have been found to contain an intron in the Rep coding region (in the complementary-sense DNA), similar to that found in mastreviruses [25]. An intron was identified in the Rep coding region of HJasCV through sequence alignment and manual inspection (Fig. 1b). When the intron was removed, HJasCV was found to encode a Rep protein that is 336 aa long. It contains the rolling-circle replication (RCR)-related motifs (Fig. 1c). RCR motif I (amino acid residues 12-16) is implicated in recognition and binding of the ori-associated iterative sequences in geminiviruses [11]. RCR motif II (50-56) is believed to be involved, through invariant histidine residues, in coordinating the divalent metal ions (Mg²⁺ or Mn²⁺) required for the cleavage reaction in circoviruses [20], nanoviruses [7, 13, 22, 24], and geminiviruses [12]. RCR motif III (91-95) contains a tyrosine residue that is essential for cleavage and covalently attaches to the 5′ end of the cleaved DNA, and an invariant lysine that may be essential for binding and positioning (Fig. 1c) [11, 21, 22]. These three RCR motifs have been found in geminiviruses, nanoviruses, and circoviruses at structurally equivalent positions in the Rep N-terminus [2, 19]. It is noteworthy that whereas the motif II sequences found in circoviruses and nanoviruses contain one histidine residue, those found in geminiviruses and SsHADV-1 or SsHADV-1-like viruses contain two. Additionally, SsHADV-1 and SsHADV-1-like viruses have two other motifs commonly found in the Rep proteins of geminiviruses: an NTP-binding domain (Walker A, 210-217 and Walker B, 248-252) and a GRS (geminivirus Rep sequence, 67-82) domain. The NTP-binding domain, characterized by a phosphate-binding fold (P-loop), has been proposed to confer ATPase and helicase activities to geminiviral Rep proteins.
The GRS domain, which is located between motifs II and III, has been shown to be required for the initiation of geminiviral replication [4, 15].

Within the 111-bp intergenic region, we identified a stem-loop structure with the nonanucleotide sequence TAATGTTATC positioned in the loop (Fig. 1a). A conserved nonanucleotide sequence has been found in all SsHADV-1 and SsHADV-1-like viruses except Meles meles fecal virus (MmFV) [23]. Similar nonanucleotide sequences are also present in other ssDNA viruses and have been reported to be essential for replication initiation in these viruses [19]. It is noteworthy that the nonanucleotide sequence of HJasCV differs from those of SsHADV-1 and SsHADV-1-like viruses by one nucleotide (the A in the fifth position is substituted by a G). The effects of this difference on the molecular biology of HJasCV are unknown. However, it has been shown for geminiviruses that changes in the nonanucleotide motif may not inhibit the cleavage reaction between positions 7 and 8, although the reaction efficiency may be affected [12].

A maximum-likelihood (ML) phylogenetic tree was constructed from a MUSCLE [5]-aligned dataset (complete genome sequences of all SsHADV-1 and SsHADV-1-like viruses, BamiV, NimiV and NephV) using PHYML 3.0 [6] with the GTR+I+G4 nucleotide substitution model (Fig. 1d). This tree, together with the nucleotide pairwise identities calculated using SDT v1.0 [14], reveals that HJasCV is most closely related to CasCV, SsHADV-1 and DfasCV-2, sharing ~62-68% nucleotide sequence identity with them (Fig. 1d, e). More broadly, SsHADV-1 and SsHADV-1-like viruses share >57% pairwise genome-wide identity.
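
As an illustration of how a percentage pairwise identity can be derived from two aligned sequences, the sketch below counts matching positions over the columns in which neither sequence has a gap. This is a minimal sketch under stated assumptions: the gap handling is an assumption rather than a description of how SDT v1.0 scores identities, and the example sequences and function names are hypothetical.

```python
from itertools import combinations


def pairwise_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage identity over columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from both the numerator and
    the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_table(alignment: dict) -> dict:
    """Return {(name_a, name_b): identity} for every pair in an alignment."""
    return {
        (a, b): pairwise_identity(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, not real viral sequence data.
    toy = {
        "seq1": "ATGC-ATGCA",
        "seq2": "ATGCTATG-T",
        "seq3": "ATGCTATGCA",
    }
    for pair, value in identity_table(toy).items():
        print(pair, f"{value:.1f}%")
```
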

We also constructed ML phylogenetic trees using the aligned Rep and capsid protein (CP) amino acid sequences encoded by SsHADV-1, SsHADV-1-like viruses and related viruses (in the case of the Rep protein, we included related fungal integron sequences). The amino acid sequence ML trees were inferred using PHYML with the LG substitution model (Fig. 2). Analyses of the Rep and CP sequences also indicated that HJasCV is most closely related to CasCV, SsHADV-1 and DfasCV-2, sharing >60% amino acid sequence identity in Rep and >40% in CP with these viruses (Fig. 2).

Fig. 2

a Maximum-likelihood phylogenetic tree of the Rep amino acid sequences of SsHADV-1 and SsHADV-1-like viruses, fungal introns and other closely related viruses. b Two-dimensional pairwise identity plot comparisons of Rep amino acid sequences of SsHADV-1 and SsHADV-1-like viruses. c Maximum-likelihood phylogenetic tree of the CP amino acid sequences of SsHADV-1 and SsHADV-1-like viruses, BamiV, NimiV and NephV. d Two-dimensional pairwise identity plot comparisons of CP amino acid sequences of SsHADV-1 and SsHADV-1-like viruses

Altogether, our analysis reveals that HJasCV is a novel SsHADV-1-like virus. Given that the only known host of SsHADV-1 and SsHADV-1-like viruses to date is a fungus [26], and considering the close pathogenic or symbiotic associations between fungi and plants, it is highly likely that HJasCV infects fungi associated with H. japonicum. Identification of the fungi associated with H. japonicum and construction of an infectious clone of HJasCV will be required to determine the host and biology of HJasCV.