RNA interference (RNAi) is a conserved mechanism in eukaryotes that protects genomes against aberrant endogenous or exogenous RNA molecules [3, 6, 7, 35]. This defence mechanism involves the processing of double-stranded RNA (dsRNA) into small interfering RNAs (siRNAs) by RNase III-type Dicer-like proteins (DCLs) and the formation of an RNA-induced silencing complex, built around a protein of the Argonaute (AGO) family, that cleaves target RNAs through base pairing with the siRNA [3, 4, 7, 12]. Virus-derived small interfering RNAs (vsiRNAs) are key elements of plant antiviral defence [8].

During infection, plant viruses can activate the RNA silencing machinery in infected cells through the formation of viral dsRNA, which can arise from the activity of the virus-encoded RNA-dependent RNA polymerase (RdRp), from intermolecular base pairing between plus- and minus-strand viral RNAs, or from imperfect folding of self-complementary sequences within viral single-stranded RNA (ssRNA) [8]. The production of vsiRNAs requires DCLs, AGOs and RdRps [5, 31]. Arabidopsis encodes at least four DCL proteins (DCL1–4) with distinct functions [19]. DCL4, DCL2 and DCL3 target viral genomes in a hierarchical fashion to yield vsiRNAs of 21, 22 and 24 nt, respectively. Multiple AGO genes might be involved in antiviral defence, and the 5′-terminal nucleotide of an siRNA may determine which AGO protein it binds [20, 23].

Cucumber green mottle mosaic virus (CGMMV, genus Tobamovirus, family Virgaviridae) causes serious damage to cucurbit crops worldwide and can be transmitted mechanically and through seed [1, 18, 30]. Its symptoms vary with the host species and include severe mosaic with discoloration and deformation of leaves [2, 16]. Like other tobamoviruses, CGMMV has a single-stranded, positive-sense RNA genome with a 3′ tRNA-like structure instead of a poly(A) tail [1]. Tobamovirus genomes encode four polypeptides: a 124- to 132-kDa protein, a 181- to 189-kDa readthrough protein, a 28- to 31-kDa movement protein (MP) and a 17- to 18-kDa coat protein (CP) [1, 13]. Both large proteins are translated from the 5′-proximal start codon, whereas the two small proteins are expressed from separate 3′-coterminal subgenomic mRNAs. Very little is known about RNA silencing pathways and virus-host interactions in cucumber. Besides mediating sequence-specific antiviral silencing, some vsiRNAs can, as a few recent reports have shown, guide the degradation of homologous host transcripts through base pairing, thereby creating cellular conditions favourable for viral proliferation or inducing disease symptoms [26–28].

Recent high-throughput sequencing of vsiRNAs in different host-virus systems, combined with functional characterization, has provided insights into the origin and composition of vsiRNAs and their potential roles in virus-host interactions [24]. Within the genus Tobamovirus, this approach, together with transcriptome profiling, was used to determine the global impact of oilseed rape mosaic virus infection on Arabidopsis siRNAs and their mRNA targets [15]. Because the biochemical framework for small-RNA production and RNA silencing is conserved in plants [29], in this study we used deep sequencing to characterize the vsiRNAs of CGMMV in infected cucumber plants and to investigate the interaction between CGMMV and its host.

Cucumber seedlings were kept at room temperature under insect-proof netting and inoculated with CGMMV (or with phosphate buffer as a control) at the three-true-leaf stage. The CGMMV inoculum was prepared from infected cucumber leaves obtained from the Plant Laboratory of the Beijing Entry-Exit Inspection and Quarantine Bureau. Leaf samples were collected from CGMMV-inoculated and negative control plants at 14 days post-inoculation (dpi), and the presence of the virus was confirmed by RT-PCR using the primer pair 5′-GGCTTACAATCCGATCACACCT-3′ and 5′-CTAAGCTTTCGAGGTGGTAGCCT-3′. Total RNA was extracted from the leaf samples using TRIzol Reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions. RNA concentrations were estimated using a spectrophotometer (NanoDrop ND-2000, Thermo Fisher Scientific, Wilmington, DE, USA), and RNA integrity was verified using a 2100 Bioanalyzer (Agilent Technologies, Waldbronn, Germany). Small RNA libraries were prepared using the Solexa protocol (http://www.illumina.com). Briefly, total RNA was size-fractionated by 15 % polyacrylamide gel electrophoresis (PAGE), and small RNAs of 18–28 nt were isolated. The purified small RNAs were ligated to 5′ and 3′ adaptors, the ligation products were purified by PAGE and reverse transcribed, and a cDNA library was constructed. The small RNA libraries were then sequenced on an Illumina HiSeq 2000 platform. After removal of adaptor sequences and low-quality reads, the remaining 18- to 28-nt sRNAs were aligned to the CGMMV reference genome sequence (GenBank accession number D12505.1), and only reads with zero mismatches were retained for subsequent analysis. Library characterization and determination of the CGMMV siRNA profile were performed using locally developed Perl scripts.
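The read-processing steps above were carried out with locally developed Perl scripts that are not reproduced here. The Python sketch below only illustrates the general logic under stated assumptions: the 3′ adaptor sequence, the minimum partial-adaptor overlap, and discarding N-containing inserts as a stand-in for quality filtering are all hypothetical choices, and exact string search replaces a dedicated aligner. Reads matching the reverse complement of the reference are treated as antisense-derived.

```python
"""Hypothetical sketch of small-RNA read preprocessing and perfect-match mapping.

Assumptions (not taken from the original Perl scripts): the 3' adaptor
sequence, the minimum partial-adaptor overlap, discarding N-containing inserts
as a stand-in for quality filtering, and exact string search instead of an aligner.
"""
from typing import List, Optional, Tuple

# Illustrative 3' adaptor; the real sequence depends on the library kit used.
ADAPTOR_3P = "TGGAATTCTCGGGTGCCAAGG"
MIN_LEN, MAX_LEN = 18, 28
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def trim_adaptor(read: str, adaptor: str = ADAPTOR_3P, min_overlap: int = 6) -> Optional[str]:
    """Return the insert upstream of the 3' adaptor, or None if no adaptor is found."""
    idx = read.find(adaptor)  # complete adaptor inside the read
    if idx != -1:
        return read[:idx]
    # The read ran into the adaptor: a partial adaptor prefix sits at the 3' end.
    for k in range(len(adaptor) - 1, min_overlap - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[: len(read) - k]
    return None


def preprocess(reads: List[str]) -> List[str]:
    """Trim adaptors and keep 18- to 28-nt inserts that contain no N."""
    kept = []
    for read in reads:
        insert = trim_adaptor(read)
        if insert is None or "N" in insert:
            continue
        if MIN_LEN <= len(insert) <= MAX_LEN:
            kept.append(insert)
    return kept


def map_perfect(srna: str, genome: str) -> List[Tuple[str, int]]:
    """Return (strand, 0-based start) for every zero-mismatch hit on the genome.

    '+' hits match the sense (genomic) sequence; '-' hits match its reverse
    complement, i.e. the sRNA derives from the antisense strand.
    """
    hits = []
    for strand, query in (("+", srna), ("-", reverse_complement(srna))):
        start = genome.find(query)
        while start != -1:
            hits.append((strand, start))
            start = genome.find(query, start + 1)
    return hits
```

For full libraries of tens of millions of reads, a short-read aligner such as Bowtie would normally replace the naive string search used here.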
Putative host targets of the vsiRNAs were predicted with PatScan [11] against the cucumber genome (Cucumis sativus v1.0) available in Phytozome 10.3.
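PatScan searches sequences for user-defined patterns that may tolerate mismatches; the settings used in this study are not reported. As a simplified, hypothetical stand-in (not PatScan itself), the sketch below scans transcript sequences for windows that are complementary to a vsiRNA with at most a chosen number of substitutions. The default of three mismatches is an assumption, and gaps and G:U wobble pairs are not modelled.

```python
"""Hypothetical, simplified stand-in for PatScan-based target prediction.

A candidate site is a transcript window (sense orientation) that base-pairs
with the vsiRNA, i.e. matches the vsiRNA's reverse complement with at most
`max_mismatches` substitutions. No gaps and no G:U wobble pairs are allowed.
"""
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_target_sites(vsirna: str, transcript: str, max_mismatches: int = 3) -> List[Tuple[int, int]]:
    """Return (0-based start, mismatch count) for each candidate site."""
    probe = reverse_complement(vsirna)
    k = len(probe)
    sites = []
    for start in range(len(transcript) - k + 1):
        window = transcript[start:start + k]
        mismatches = sum(1 for a, b in zip(window, probe) if a != b)
        if mismatches <= max_mismatches:
            sites.append((start, mismatches))
    return sites


def predict_targets(vsirnas: List[str], transcripts: Dict[str, str], max_mismatches: int = 3) -> Dict[str, List[str]]:
    """Map each transcript ID to the vsiRNAs that have at least one candidate site in it."""
    targets: Dict[str, List[str]] = {}
    for tx_id, seq in transcripts.items():
        for v in vsirnas:
            if find_target_sites(v, seq, max_mismatches):
                targets.setdefault(tx_id, []).append(v)
    return targets
```

This brute-force scan is quadratic in practice and is meant only to make the complementarity criterion explicit.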

A total of 28,980,811 and 29,880,020 sRNA reads were obtained from the systemic leaves of CGMMV-infected and virus-free plants, respectively. Small RNAs of 21, 22 and 24 nt were dominant in both libraries, but in the CGMMV-infected library the proportions of 21- and 22-nt reads increased markedly while that of 24-nt reads decreased (Fig. 1A), as also observed in a previous report [32]. In total, 5,415,828 vsiRNA reads were identified in CGMMV-inoculated cucumber plants, accounting for 18.70 % of 18- to 28-nt reads. In contrast, only 1,012 reads, approximately 0.003 % of 18- to 28-nt reads, matched the CGMMV genome in the mock-inoculated library. In CGMMV-infected cucumber plants, 21- and 22-nt vsiRNAs accumulated to high levels (Fig. 1B), representing 52.01 % and 30.53 % of total vsiRNAs, respectively, which suggests that the cucumber homologues of DCL4 and DCL2 may be the predominant Dicer-like ribonucleases involved in vsiRNA biogenesis [9]. A predominance of 21- and 22-nt species among vsiRNAs has been reported previously for several plant viruses [21, 33, 34]; it suggests that 21-nt siRNAs are the main antiviral silencing component and that DCL4, which produces 21-nt siRNAs, is the major producer of viral siRNAs [4, 9, 22, 25].
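The size distributions reported above are percentages of redundant read counts per length class. The minimal Python sketch below shows that calculation; the function name is hypothetical, and the input is assumed to be a plain list of read sequences (all clean reads for a library-level profile, or only the perfectly mapped vsiRNA reads for a vsiRNA profile).

```python
"""Hypothetical sketch: length distribution of redundant sRNA reads."""
from collections import Counter
from typing import Dict, List


def size_distribution(reads: List[str], min_len: int = 18, max_len: int = 28) -> Dict[int, float]:
    """Return {length: percent of reads} for lengths min_len..max_len.

    Every sequenced read is counted (redundant counts), and percentages are
    relative to the reads that fall inside the length window.
    """
    in_window = [len(r) for r in reads if min_len <= len(r) <= max_len]
    total = len(in_window)
    if total == 0:
        return {}
    counts = Counter(in_window)  # missing lengths count as 0
    return {n: 100.0 * counts[n] / total for n in range(min_len, max_len + 1)}
```

Comparing the output for the infected and mock libraries gives the kind of shift in 21-, 22- and 24-nt classes described above.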

Fig. 1

Size distribution of total small RNAs in libraries prepared from either phosphate-buffer-inoculated (CK) or CGMMV-inoculated cucumber plants (A) and vsiRNAs in a CGMMV-inoculated library (B)

Previous studies have indicated that the 5′-terminal nucleotide of an siRNA plays a pivotal role in directing it to a specific AGO complex [20]. In contrast to the small RNAs derived from diverse plant viruses, which show a clear tendency to begin with uracil (U) or adenine (A) [10], CGMMV siRNAs with a cytosine (C) at their 5′ termini were the most abundant, accounting for 30.85 % of all vsiRNAs, while the percentages with A, U and G at their 5′ termini were 26.58 %, 25.64 % and 19.93 %, respectively (Fig. 2A). To gain further insight into CGMMV siRNA biogenesis and sorting, the 5′-terminal nucleotide composition was analyzed separately for each size class. The 21-, 22- and 23-nt CGMMV siRNAs showed a clear preference for a 5′-terminal C, which may indicate preferential loading into the cucumber AGO5 homologue; by contrast, the 24-nt vsiRNAs showed a strong bias towards a 5′-terminal A (Fig. 2A), consistent with sorting into AGO2 and AGO4 homologues, which preferentially bind sRNAs with a 5′-terminal A. The low proportion of vsiRNAs beginning with G in our datasets is consistent with previous reports [10, 20, 21].
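A per-size-class tally of 5′-terminal nucleotides, as summarized in Fig. 2A, could be computed as follows. This is an illustrative sketch rather than the authors' script; it assumes redundant vsiRNA reads in the DNA alphabet, so a 5′-terminal T is reported as U.

```python
"""Hypothetical sketch: 5'-terminal nucleotide frequencies per vsiRNA size class."""
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple, Union

SizeClass = Union[int, str]


def five_prime_frequencies(
    vsirnas: Iterable[str], sizes: Tuple[int, ...] = (21, 22, 23, 24)
) -> Dict[SizeClass, Dict[str, float]]:
    """Return {size_class: {nucleotide: percent}} for each size class that has reads.

    'total' pools all reads regardless of length; reads are redundant and
    non-empty, and T at the 5' end is relabelled as U.
    """
    by_class: Dict[SizeClass, Counter] = defaultdict(Counter)
    for read in vsirnas:
        first = read[0].replace("T", "U")
        by_class["total"][first] += 1
        if len(read) in sizes:
            by_class[len(read)][first] += 1
    result = {}
    for cls, counts in by_class.items():
        n = sum(counts.values())
        result[cls] = {nt: 100.0 * counts[nt] / n for nt in "ACGU"}
    return result
```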

Fig. 2

Relative frequency of the 5′-terminal nucleotide of vsiRNAs (A) and the polarity distribution of vsiRNAs (B). The total represents all of the CGMMV-derived vsiRNAs

To explore the origin of the vsiRNAs, their polarity distribution was further characterized. CGMMV siRNAs showed a slight bias towards the sense strand of the viral RNA (54.62 %; Fig. 2B). This is broadly similar to the results obtained for turnip mosaic virus [14] and sugarcane mosaic virus [32], for which siRNAs were produced equally from the two strands, supporting the hypothesis that viral dsRNAs formed during replication are the substrates of DCL proteins. One exception is tobacco rattle virus, which belongs to the same family as CGMMV and for which vsiRNAs were reported to derive predominantly from the viral positive strand [9]; the reason for this asymmetry is not clear. Additionally, single-nucleotide-resolution maps of all redundant CGMMV siRNAs along the viral genome were generated using Bowtie [17] and in-house Perl scripts. CGMMV siRNAs covered almost the entire viral genome, with two hotspots located at the 5′ and 3′ termini (Fig. 3B).
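Both the polarity percentage and the single-nucleotide coverage maps can be derived from the perfect-match alignments. The sketch below assumes each redundant read is given as a hypothetical (strand, start, length) tuple in 0-based genomic coordinates; it does not reproduce Bowtie's output format or the authors' Perl scripts.

```python
"""Hypothetical sketch: strand polarity and per-nucleotide vsiRNA coverage."""
from typing import List, Tuple

Alignment = Tuple[str, int, int]  # (strand '+' or '-', 0-based start, read length)


def polarity_and_coverage(
    alignments: List[Alignment], genome_length: int
) -> Tuple[float, List[int], List[int]]:
    """Return (percent of reads from the sense strand, sense coverage, antisense coverage).

    Each coverage list holds one count per genome position; every redundant
    read adds 1 to each position it spans.
    """
    sense_cov = [0] * genome_length
    antisense_cov = [0] * genome_length
    n_sense = 0
    for strand, start, length in alignments:
        if strand == "+":
            n_sense += 1
            cov = sense_cov
        else:
            cov = antisense_cov
        for pos in range(start, min(start + length, genome_length)):
            cov[pos] += 1
    percent_sense = 100.0 * n_sense / len(alignments) if alignments else 0.0
    return percent_sense, sense_cov, antisense_cov
```

Plotting the sense coverage above the axis and the antisense coverage below it gives a map of the kind shown in Fig. 3B, where hotspots appear as local peaks.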

Fig. 3

Genome map of CGMMV (A) and maps of 21- to 24-nt vsiRNAs from CGMMV-inoculated cucumber plants at single-nucleotide resolution (B). The grey line represents the CGMMV genome, and boxes of different colors represent the CGMMV ORFs shown in panel A

Using PatScan, 92 host genes were predicted as targets of CGMMV vsiRNAs that were represented by more than 200 reads in the virus-infected library (Table S1). These predicted targets are involved in a broad range of biological processes, including biological regulation, cellular structure, cellular processes, localization, metabolic processes (such as fructose and mannose metabolism and folate biosynthesis) and responses to stimuli, suggesting that vsiRNAs may affect diverse host functions through interactions with their targets during CGMMV infection.
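The read-count threshold applied before target prediction amounts to collapsing redundant reads into unique sequences and keeping those seen more than 200 times. A minimal sketch, assuming the vsiRNA reads are available as a list of sequences (the function name is hypothetical):

```python
"""Hypothetical sketch: select abundant unique vsiRNAs for target prediction."""
from collections import Counter
from typing import Dict, List


def abundant_vsirnas(vsirna_reads: List[str], min_reads: int = 200) -> Dict[str, int]:
    """Collapse redundant reads and keep unique sequences with more than min_reads reads."""
    counts = Counter(vsirna_reads)
    return {seq: n for seq, n in counts.items() if n > min_reads}
```

Note that "more than 200 reads" is a strict inequality, so a sequence with exactly 200 reads is excluded.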