Wisteria sinensis, commonly known as Chinese wisteria, is a long-lived, perennial, shrub-like climbing vine in the family Fabaceae [13]. Because it establishes easily, is hardy, grows fast and lives long, it is common in natural forests, riparian zones and ruderal areas, while in city parks and home gardens it is highly valued as an ornamental plant for its foliage and the blue or white pea-like flowers that adorn walls and pergolas. As with other flowering plants, W. sinensis is affected by various biotic agents. Important diseases of W. sinensis that reduce its quality and ornamental value include wisteria powdery mildew, caused by various members of the genus Erysiphe [7]; bacterial leaf spot, caused by Pseudomonas syringae pv. syringae [25]; and wisteria mosaic disease, caused by wisteria vein mosaic virus (WVMV) [4, 6, 10, 17] and cucumber mosaic virus [20]. Because W. sinensis is propagated vegetatively by cuttings, grafting and air layering, viruses are easily transmitted to propagules and can be dispersed worldwide through the plant trade.

RNA silencing is a conserved antiviral defense mechanism in plants. Once a plant is infected by a virus, antiviral silencing is triggered by two kinds of substrate: viral double-stranded RNA, which results from base pairing between plus- and minus-strand viral RNAs, and highly structured single-stranded RNA (ssRNA), which results from imperfect folding of self-complementary sequences within viral ssRNA. These substrates are cleaved, and the cleavage products are processed into small interfering RNAs (siRNAs) of different lengths. The sequences of these virus-derived siRNAs overlap and can be assembled into long contigs, and even complete virus genome sequences, using bioinformatic tools, providing a rapid and unbiased method for virus identification. This method has contributed tremendously to unraveling the nature of microbial agents associated with disease and to the characterization of numerous novel and known plant viruses with RNA or DNA genomes [13, 19, 27].
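The idea that overlapping short reads can be stitched into a longer contig can be illustrated with a toy example. The Python sketch below uses a simple greedy suffix-prefix merge; this is not the algorithm used by real short-read assemblers such as Velvet, which build de Bruijn graphs, and it is only meant to show the principle. The function names, the `min_len` parameter and the toy reads (8-nt windows cut from the 18-nt primer binding site sequence quoted later in this report) are illustrative assumptions, not data from the study.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 4) -> list[str]:
    """Repeatedly merge the two sequences that share the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                n = suffix_prefix_overlap(a, b, min_len)
                if n > best_len:
                    best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # no pair overlaps by at least min_len
        # Join the pair, writing the shared stretch only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Toy input (hypothetical): 8-nt windows, offset by 2 nt, over an 18-nt sequence.
TARGET = "TGGTATCAGAGCTCAACA"
TOY_READS = [TARGET[i:i + 8] for i in range(0, len(TARGET) - 7, 2)]

print(greedy_assemble(TOY_READS))
```

Real siRNA data sets are far larger and contain sequencing errors and host-derived reads, which is why graph-based assemblers and a subsequent database search are used in practice.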

Badnaviruses (family Caulimoviridae) are important agricultural and horticultural pathogens that cause serious crop losses [8]. In recent years, new badnaviruses have been identified in various plants, including taro [14], yacon [16], pagoda tree [26] and grapevine [18]. The badnavirus genome consists of a 7.2- to 9.2-kb circular double-stranded DNA molecule that is encapsidated in a nonenveloped bacilliform virion and typically contains three open reading frames (ORFs) that are tandemly arranged on the plus strand [8]. ORF1 encodes a small protein of unknown function, while the protein encoded by ORF2 is virion-associated. ORF3 encodes a large polyprotein that is post-translationally cleaved into the movement protein (MP), coat protein (CP), viral aspartic protease (AP), reverse transcriptase (RT) and ribonuclease H (RNase H) [8]. Additional ORFs have been reported in taro bacilliform virus [28], piper yellow mottle virus [11], cacao swollen shoot virus [9], pagoda yellow mosaic associated virus (PYMAV) [26], citrus yellow mosaic virus [12] and dracaena mottle virus [22].

In June 2015, a W. sinensis plant showing mosaic and crinkle symptoms on its leaves was observed in Beijing, China. To identify the causal agent(s), symptomatic leaves were collected and quickly frozen in liquid nitrogen, and total RNA was extracted using TRIzol Reagent (Invitrogen, Carlsbad, CA, USA) following the manufacturer’s recommended procedures. RNA quality was assessed using a NanoDrop ND-1000 spectrophotometer and an Agilent 2100 Bioanalyzer, and high-quality RNA was used for small RNA (sRNA) library construction following the recommendations of New England Biolabs. Briefly, 16- to 28-nt small RNAs were purified from a 15% PAGE gel, adaptors were added to both ends of the purified RNAs, and the RNAs were amplified by RT-PCR using adaptor-specific primers. The PCR products were isolated and gel-purified. Sequencing was performed on a HiSeq 2000 platform (Illumina). Raw Illumina sRNA reads were trimmed and cleaned using an in-house Perl script that removed sequences shorter than 16 nt or longer than 30 nt, low-quality tags, and polyA or N tags. The resulting reads were assembled into contigs using Velvet [29] with a k-mer value of 17. These contigs were then analyzed by BLAST searches against the GenBank nr nucleotide and protein databases at the NCBI.
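The read-cleaning criteria described above can be sketched in a few lines of Python. This is not the in-house Perl script used in the study, which is not available here; it is a minimal illustration under stated assumptions. A "polyA tag" is assumed to mean a read consisting only of A, an "N tag" is assumed to mean any read containing an N, and the quality-based filter is omitted because no quality threshold is given. The function names are hypothetical.

```python
def is_clean(read: str, min_len: int = 16, max_len: int = 30) -> bool:
    """Return True if a trimmed sRNA read passes the length and content filters."""
    seq = read.strip().upper()
    # Length window stated in the text: discard reads <16 nt or >30 nt.
    if not min_len <= len(seq) <= max_len:
        return False
    # Assumption: an "N tag" is any read with an undetermined base.
    if "N" in seq:
        return False
    # Assumption: a "polyA tag" is a read made up entirely of A.
    if set(seq) == {"A"}:
        return False
    return True


def clean_reads(reads: list[str]) -> list[str]:
    """Keep only reads that pass `is_clean`, normalized to upper case."""
    return [r.strip().upper() for r in reads if is_clean(r)]


# Hypothetical example reads, not data from the study.
example = [
    "acgtacgtacgtacgtacgta",   # 21 nt, kept
    "ACGTACGT",                # too short
    "A" * 21,                  # polyA
    "ACGTNACGTACGTACGTACGT",   # contains N
    "ACGT" * 8,                # 32 nt, too long
]
print(clean_reads(example))
```

Adaptor trimming and base-quality filtering would precede or accompany these checks in a full pipeline.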

After quality trimming of the raw sequence data, a total of 29,116,531 clean reads were obtained, from which 14,303 contigs ranging from 33 to 355 nt in length were assembled using Velvet. BLASTn and BLASTx searches of the assembled contigs against the NCBI databases identified 238 and 162 contigs with high sequence similarity to WVMV (genus Potyvirus, family Potyviridae) and pagoda yellow mosaic associated virus (PYMAV) (genus Badnavirus, family Caulimoviridae), respectively, while most of the remaining sequences were of host origin. The presence of the two viruses was confirmed by RT-PCR/PCR using primers designed based on the assembled contigs (Supplemental Table 1).
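Counting how many contigs match each virus, as in the tallies above, amounts to keeping the best database hit per contig and grouping by subject. The Python sketch below assumes the hits are available in the default BLAST+ tabular layout (`-outfmt 6`); the report does not state which output format was used. The function names, the toy rows and the subject labels `virus_A` and `virus_B` are invented for illustration.

```python
from collections import Counter

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def best_hit_per_contig(tabular_text: str) -> dict[str, dict[str, str]]:
    """Keep, for every query contig, the hit with the highest bit score."""
    best: dict[str, dict[str, str]] = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, line.split("\t")))
        previous = best.get(hit["qseqid"])
        if previous is None or float(hit["bitscore"]) > float(previous["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


def contigs_per_subject(tabular_text: str) -> Counter:
    """Count contigs by the subject sequence of their best hit."""
    return Counter(hit["sseqid"] for hit in best_hit_per_contig(tabular_text).values())


# Made-up rows for illustration only.
TOY_HITS = "\n".join("\t".join(row) for row in [
    ("contig_1", "virus_A", "98.0", "50", "1", "0", "1", "50", "101", "150", "1e-20", "90.0"),
    ("contig_1", "virus_B", "80.0", "40", "8", "0", "1", "40", "201", "240", "1e-05", "40.0"),
    ("contig_2", "virus_B", "75.0", "60", "15", "0", "1", "60", "301", "360", "1e-10", "55.0"),
    ("contig_3", "virus_A", "99.0", "33", "0", "0", "1", "33", "10", "42", "1e-12", "61.0"),
])

print(contigs_per_subject(TOY_HITS))
```

In practice, subject accessions would also have to be mapped to virus names, and an identity or E-value cutoff would usually be applied before counting.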

To obtain the complete genome sequence of the candidate badnavirus, total DNA was extracted from leaf tissues using a TaKaRa MiniBEST Plant Genomic DNA Extraction Kit, and overlapping fragments were amplified by PCR with LA Taq DNA polymerase using primer pairs designed based on the assembled contigs. The PCR products were ligated into the vector pMD19-T (TaKaRa, Dalian, China), and eight independent clones from each sample were sequenced. The reconstructed circular genome of the new badnavirus, designated “wisteria badnavirus 1” (WBV1), was 7,326 nt in length, which is within the size range of badnavirus genomes, and was deposited in the GenBank database under accession number KX168422. To confirm the presence of this virus, rolling-circle amplification (RCA) was conducted using an Illustra TempliPhi 500 Amplification Kit (GE Healthcare). The RCA product was digested with StuI or SphI, and the expected band was observed in an agarose gel (Supplemental Figure 1).

Sequence comparison of WBV1 with other reported badnaviruses showed that it shared the highest nt sequence identity (69%) with PYMAV. As in other badnaviruses, the genome starts at the conserved tRNAmet primer binding site (5′-TGGTATCAGAGCTCAACA-3′) [5], which lies before a 215-bp 5′ non-coding region, followed by three conserved ORFs whose start and stop codons overlap (5′-ATGA-3′) (Fig. 1). A fourth potential ORF, ORF4, which is not typical of badnaviruses but has also been observed in the genomes of fig badnavirus 1, cacao swollen shoot virus (CSSV) and PYMAV [15, 21, 26], was found overlapping the end of ORF3 (Fig. 1).

ORF1 (nt 216–671) of WBV1 potentially encodes a 151-aa protein with a predicted molecular weight (MW) of 17.8 kDa. BLAST analysis showed that ORF1 shares 72% nt and 73% aa sequence identity with ORF1 of PYMAV, and no conserved domains were detected. ORF2 (nt 668–1,105) encodes a 145-aa protein with a predicted MW of 16.3 kDa and shares 75% nt and 79% aa sequence identity with ORF2 of PYMAV. ORF3 (nt 1,102–6,831) encodes a 1,909-aa polyprotein with a predicted MW of 215.6 kDa that shares 72% nt and 78% aa sequence identity with PYMAV ORF3. It contains domains homologous to the MP, AP, RT and RNase H domains, as well as the highly conserved zinc-finger-like RNA-binding domain (CXCX2CX4HX4C), found in the polyproteins of badnaviruses. In the RT-RNase H region of ORF3, WBV1 shares the highest nt sequence identity (74%) with PYMAV. This value is below the 80% identity threshold used as the species demarcation criterion in the genus Badnavirus by the International Committee on Taxonomy of Viruses [8], indicating that WBV1 represents a new badnavirus. ORF4 (nt 6,432–6,890) partially overlaps ORF3 and encodes a 152-aa peptide. This ORF shares 74% nt and 75% aa sequence identity with PYMAV ORF4.

To examine the taxonomic position of WBV1, all available complete badnavirus genome sequences in the GenBank database were downloaded and aligned using Clustal X [24]. A phylogenetic tree was constructed from the aligned sequences by the neighbor-joining (NJ) method implemented in MEGA6.06 [23] with the best-fit model, and the phylogenetic analysis showed that WBV1 clustered with PYMAV (Fig. 2).

To the best of our knowledge, this is the first report of a DNA virus infecting wisteria. In our survey of viral diseases in wisteria, mosaic and crinkle symptoms were commonly observed, and whether WBV1 is associated with any of these symptoms remains to be investigated. Furthermore, studies on the transmission pattern, host range, epidemiology and population structure of this virus should be conducted to provide a basis for virus control.
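The ORF coordinates, protein lengths and overlaps reported above can be cross-checked with simple arithmetic. The Python sketch below assumes that the coordinates are 1-based and inclusive and that each ORF range includes its stop codon; the report does not state this explicitly, although the reported amino acid counts are consistent with it. The function and constant names are illustrative.

```python
# name: (start nt, end nt, reported protein length in aa), as given in the text
REPORTED_ORFS = {
    "ORF1": (216, 671, 151),
    "ORF2": (668, 1105, 145),
    "ORF3": (1102, 6831, 1909),
    "ORF4": (6432, 6890, 152),
}
SPECIES_THRESHOLD_PERCENT = 80.0  # RT-RNase H nt identity threshold cited in the text


def nt_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based, inclusive interval."""
    return end - start + 1


def encoded_aa(start: int, end: int) -> int:
    """Amino acids encoded, assuming the interval ends with a stop codon."""
    length = nt_length(start, end)
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3 - 1  # the stop codon encodes no amino acid


def shared_nt(a: tuple, b: tuple) -> int:
    """Number of nucleotides shared by two intervals (0 if they do not overlap)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def below_species_threshold(identity_percent: float) -> bool:
    """True if the identity falls below the demarcation threshold."""
    return identity_percent < SPECIES_THRESHOLD_PERCENT


for name, (start, end, reported_aa) in REPORTED_ORFS.items():
    print(name, nt_length(start, end), "nt;", encoded_aa(start, end), "aa computed;",
          reported_aa, "aa reported")

print("ORF1/ORF2 shared nt:", shared_nt(REPORTED_ORFS["ORF1"], REPORTED_ORFS["ORF2"]))
print("ORF2/ORF3 shared nt:", shared_nt(REPORTED_ORFS["ORF2"], REPORTED_ORFS["ORF3"]))
print("ORF3/ORF4 shared nt:", shared_nt(REPORTED_ORFS["ORF3"], REPORTED_ORFS["ORF4"]))
print("74% below threshold:", below_species_threshold(74.0))
```

Under these assumptions, the four-nucleotide overlaps between ORF1 and ORF2 and between ORF2 and ORF3 correspond to the shared 5′-ATGA-3′ motif, in which the ATG start codon of one ORF overlaps the TGA stop codon of the preceding one.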

Fig. 1 Schematic representation of the genome organization of wisteria badnavirus 1, showing open reading frames 1–4 (ORF1–ORF4), the tRNA primer binding site and conserved functional motifs in the ORF3 polyprotein

Fig. 2 Neighbor-joining phylogenetic tree obtained from an alignment of the available complete genome sequences of badnaviruses, with 1,000 bootstrap replicates. Only bootstrap values ≥50% are shown at the nodes. The scale bar represents genetic distance (substitutions per nucleotide)