Lily plants (genus Lilium) are popular ornamentals because of their beautiful flower colors and floral scents, and there are more than 4500 cultivated varieties [1]. Lily plants are frequently infected with viruses, including cucumber mosaic virus (CMV), lily mottle virus (LMoV), lily symptomless virus (LSV), lily virus X (LVX), arabis mosaic virus (ArMV), plantago asiatica mosaic virus (PlAMV), strawberry latent ringspot virus (SLRV), tobacco rattle virus (TRV), and tomato ringspot virus (TRSV) [1,2,3,4]. Plants infected by one or more viruses often show symptoms of mosaic, chlorosis, streaking and/or leaf deformation [5]. The viruses are transmitted through vegetative propagation and may reduce the commercial value of the flowers.

In 2016, lily plants exhibiting virus-like symptoms of leaf yellowing, twisting and brownish necrotic spots were observed under natural conditions in the experimental field of Beijing Agricultural University (Supplemental Fig. 1). To identify the viruses potentially involved in the disease, small-RNA (sRNA) deep sequencing was conducted on the Illumina HiSeq 2000 platform. Raw Illumina sRNA reads were trimmed and cleaned using an in-house Perl script that removed sequences shorter than 16 nucleotides (nt) or longer than 30 nt, low-quality tags, and polyA or N tags. A total of 4522 contigs ranging in size from 33 to 389 nt were assembled de novo using Velvet (k-mer value of 17) [6]. BLAST analysis against the GenBank database showed that most of the contigs were host-derived, while 159 were virus-like. Of these, 42 were derived from CMV and 61 from LSV. The remaining virus-like contigs showed a high degree of amino acid (aa) sequence similarity to different proteins of potyviruses in BLASTx analysis, indicating the presence of a potential new potyvirus. All three viruses were confirmed by RT-PCR using virus-specific primers (Table S1).
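The read-cleaning criteria described above can be illustrated with a minimal Python sketch. This is not the authors' in-house Perl script, which is not reproduced in the text. The function name `keep_read` is hypothetical, and two interpretations are assumed: a "polyA tag" is taken to be a read consisting only of A, and an "N tag" is taken to be any read containing an ambiguous base. Quality-based filtering is not shown.

```python
def keep_read(seq, min_len=16, max_len=30):
    """Return True if an sRNA read passes the length and composition filters.

    Assumptions (not from the paper's script): a poly(A) tag is a read made
    up entirely of A, and an N tag is any read that contains an N.
    """
    seq = seq.upper()
    # Reads shorter than 16 nt or longer than 30 nt are removed,
    # so reads of exactly 16 or 30 nt are kept.
    if len(seq) < min_len or len(seq) > max_len:
        return False
    # Discard reads with ambiguous bases.
    if "N" in seq:
        return False
    # Discard reads that are nothing but A.
    if set(seq) == {"A"}:
        return False
    return True


def clean_reads(reads):
    """Filter an iterable of read sequences, preserving their order."""
    return [r for r in reads if keep_read(r)]
```

Applied to a list of trimmed read sequences, `clean_reads` returns only the 16 to 30 nt reads that would be passed on to assembly.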

Fig. 1

Schematic representation of the genomic organization of LYMV. The 5′ and 3′ nontranslated regions (NTR) are represented by lines, and the large open reading frame (ORF) is depicted by an open box. The numbers below the diagram indicate the starting position (nt) predicted for each gene. The proteinase cleavage sites are also shown.

To determine the complete genome sequence of the new potyvirus, conventional RT-PCR with primer pairs targeting different protein regions was carried out, and the 5′ and 3′ ends were determined by RACE PCR (Table S1). This virus was tentatively named “lily yellow mosaic virus” (LYMV), although its contribution to the symptoms observed on the multiply infected host plant is unknown. The genomic RNA of LYMV-BJ was 9811 nt in length (accession number MF543013), excluding the 3′ poly(A) tail, with 5′ and 3′ untranslated regions of 217 and 219 nt, respectively. Protein sequence analysis revealed nine putative proteinase cleavage sites for ten mature proteins at amino acid positions 364, 821, 1168, 1221, 1856, 1909, 2099, 2342 and 2857 (Fig. 1), which are conserved in potyviruses. The ORF encoding the PIPO protein [7] was identified from a GA6 motif at nt positions 3140-3146 within the P3-coding region, in the +2 reading frame. The conserved domains HX8DX28GXSG and FIMRGR in P1 were located at positions 275-316 and 336-341 of the aa sequence, respectively [8]. In HC-Pro, the conserved motifs HXCX27CX2C [9], FRNK [10, 11] and PTK were found at polyprotein positions 388-421, 544-547, and 673-675, respectively. The conserved motif KITC was present as 415RITC418 in LYMV-BJ, a variant that has also been found in some other potyviruses, including pepper vein banding virus and tobacco vein banding mosaic virus [9, 12, 13]. However, the less-conserved motif IGN in the central region of HC-Pro, which is essential for genome amplification [14], was IGK (aa 613-615) in LYMV. In P3, the conserved residues EPY-(X)7-SP-(X)2-L were at aa positions 853-867 of the polyprotein [15]. In RNA helicase CI, the conserved motifs GXXGXGKS (aa 1,306-1,313), VLLLEPTRPL (aa 1,274-1,280) [16], DECH (aa 1,395-1,398), LKVSATPP (aa 1,422-1,429), LVYV (aa 1,473-1,476), VATNIIENGVTL (aa 1,524-1,535) and GERIQRLGRVGR (aa 1,567-1,578) [15, 17] were all found.
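As a consistency check on these coordinates, the sketch below derives the boundaries of the ten mature proteins from the nine cleavage positions. Several things are assumptions rather than statements from the text: each listed position is taken to be the last residue of the upstream protein, the protein names follow the standard potyvirus order (the text itself does not name 6K1, 6K2 or VPg), and the polyprotein length is inferred from the genome and UTR lengths on the assumption that the ORF ends with a stop codon. The function names are hypothetical.

```python
# Values copied from the text.
GENOME_NT = 9811
UTR5_NT = 217
UTR3_NT = 219
CLEAVAGE_SITES = [364, 821, 1168, 1221, 1856, 1909, 2099, 2342, 2857]

# Standard potyvirus gene order (assumed; not all names appear in the text).
PROTEIN_NAMES = ["P1", "HC-Pro", "P3", "6K1", "CI",
                 "6K2", "VPg", "NIa-Pro", "NIb", "CP"]


def polyprotein_length(genome_nt, utr5_nt, utr3_nt):
    """Infer the polyprotein length in aa, assuming the ORF fills the
    region between the UTRs and ends with one stop codon."""
    orf_nt = genome_nt - utr5_nt - utr3_nt
    if orf_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_nt // 3 - 1  # subtract the stop codon


def mature_proteins(sites, names, total_aa):
    """Return (name, first_aa, last_aa) for each mature protein, treating
    each cleavage position as the last residue of the upstream protein."""
    if len(names) != len(sites) + 1:
        raise ValueError("expected one more protein name than cleavage sites")
    bounds = [0] + list(sites) + [total_aa]
    return [(name, bounds[i] + 1, bounds[i + 1])
            for i, name in enumerate(names)]


if __name__ == "__main__":
    total = polyprotein_length(GENOME_NT, UTR5_NT, UTR3_NT)
    for name, first, last in mature_proteins(CLEAVAGE_SITES, PROTEIN_NAMES, total):
        print(f"{name:8s} {first:5d}-{last:5d}  ({last - first + 1} aa)")
```

Under these assumptions the ORF spans 9375 nt, and every motif position quoted in the text falls inside the protein to which it is assigned (for example, FIMRGR at aa 336-341 lies within P1, which ends at aa 364).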
In NIa-Pro, the motif H(X)2T(X)3GHCG, which is responsible for the proteinase activity [18], was identified at positions 2,244-2,254. In NIb, the conserved residues SIKAEL and ADGSRFD were located at positions 2,514-2,519 and 2,590-2,597, respectively. The conserved amino acids FDSS at positions 2,596-2,599 of the polyprotein were located 261 aa upstream of the putative NIb/CP cleavage site [19]. The conserved motif (S/T)G-(X)3-T(X)3-N(S/T)(X)18–37GDD began at position 2,651. In CP, the DAG motif, which interacts with PTK of HC-Pro to regulate potyvirus transmission by aphids [20], was located at positions 2,865-2,867. The three consensus motifs found in the CP of potyviruses [19] were also found in LYMV-BJ (MVWCIENGTSP, 2,974–2,984; AFDF, 3,057–3,060; QMKAAA, 3,077–3,082). Sequence analysis of all of the gene products showed that most of the LYMV-encoded proteins shared their highest levels of aa sequence identity with Thunberg fritillary mosaic virus, while the CP shared its highest identity (71%) with pepper veinal mottle virus, a value below the threshold used to discriminate between species and strains within the genus Potyvirus [21].
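The motif shorthand used above (X for any residue, a number for a repeat count, S/T for alternatives) maps directly onto regular expressions. The following sketch is an illustration only, not the procedure used in this study; the function names are hypothetical. It converts the shorthand into a Python regex and reports the span length each motif implies, which can be compared with the coordinate ranges quoted in the text.

```python
import re

# One shorthand token: a wildcard (X, X8, (X)2, (X)18–37), a set of
# alternatives such as (S/T), a single literal residue, or a "-" separator.
_TOKEN = re.compile(
    r"\(?X\)?(?:(\d+)(?:[–-](\d+))?)?"
    r"|\(([A-Z](?:/[A-Z])+)\)"
    r"|([A-Z])"
    r"|-"
)


def parse_motif(motif):
    """Split a shorthand motif into (regex_piece, min_len, max_len) tuples."""
    pieces = []
    pos = 0
    while pos < len(motif):
        m = _TOKEN.match(motif, pos)
        if m is None:
            raise ValueError(f"cannot parse motif at {motif[pos:]!r}")
        low, high, alternatives, residue = m.groups()
        if alternatives:
            pieces.append(("[" + alternatives.replace("/", "") + "]", 1, 1))
        elif residue:
            pieces.append((residue, 1, 1))
        elif m.group(0) != "-":  # wildcard; bare separators are skipped
            lo = int(low) if low else 1
            hi = int(high) if high else lo
            if lo == hi:
                quantifier = "" if lo == 1 else "{%d}" % lo
            else:
                quantifier = "{%d,%d}" % (lo, hi)
            pieces.append(("." + quantifier, lo, hi))
        pos = m.end()
    return pieces


def motif_to_regex(motif):
    """Return a regex string equivalent to the shorthand motif."""
    return "".join(piece for piece, _, _ in parse_motif(motif))


def motif_length_range(motif):
    """Return the (minimum, maximum) number of residues the motif spans."""
    pieces = parse_motif(motif)
    return (sum(lo for _, lo, _ in pieces), sum(hi for _, _, hi in pieces))
```

For example, `motif_to_regex("HX8DX28GXSG")` gives `H.{8}D.{28}G.SG`, a 42-residue pattern that matches the reported span of aa 275-316. The same check holds for HXCX27CX2C (34 residues, aa 388-421), EPY-(X)7-SP-(X)2-L (15 residues, aa 853-867) and H(X)2T(X)3GHCG (11 residues, aa 2,244-2,254). The resulting pattern can be passed to `re.finditer` to scan a polyprotein sequence.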

To examine the taxonomic position of LYMV, phylogenetic trees were constructed using the maximum-likelihood (ML) method implemented in the MEGA6.06 program [22], and LYMV clustered with Thunberg fritillary mosaic virus (Fig. 2), which was consistent with the aa sequence analysis.

Fig. 2

Maximum-likelihood trees obtained from an alignment of the complete polyprotein amino acid sequences of selected potyviruses, with 1000 bootstrap replicates. Bootstrap values are given at the relevant nodes of the topology; only values of ≥50% are shown.