Balloon plants (Gomphocarpus physocarpus [syn. Asclepias physocarpa]; family Asclepiadaceae) are ornamental plants grown for use in floriculture. They are susceptible to tobacco streak virus (TSV), tomato spotted wilt virus (TSWV) and araujia mosaic virus (ArjMV) [5, 6, 9, 11]. In 2015, balloon plants showing symptoms of mosaic, mottle and crinkling were found and collected in Tainan, Taiwan. Electron microscopic examination [8] revealed the presence of filamentous viral particles about 800 × 12 nm in crude sap. Examination of ultrathin sections of diseased leaves also showed the presence of potyviral pinwheel inclusions [20]. Both observations indicate the presence of a potyvirus-like agent.

A virus culture, CM532, was isolated from a symptomatic G. physocarpus plant and established in Chenopodium quinoa through three consecutive single-lesion isolations. In addition to the original host (G. physocarpus), CM532 systemically infected another balloon plant species (G. fruticosus) and Nicotiana benthamiana at 14 and 7 days after mechanical inoculation, respectively. The virus isolate CM532 was maintained in N. benthamiana and Gomphocarpus spp.

Total RNA was extracted from CM532-infected leaves using TRIzol Reagent (Invitrogen). The primer PNIbF1 (5′-GGBAAYAATAGTGGNCAACC-3′) [12] and oligo-dT were used to amplify the 3′ region of the potyvirus genome, comprising part of the nuclear inclusion body protein (NIb) gene, the entire coat protein (CP) gene, and the 3′ untranslated region (3′-UTR). A cDNA fragment of about 1.7 kb was amplified and cloned into the pCR™ II-TOPO vector (Invitrogen) according to the manufacturer’s recommendations. After blue-white selection, the correct clones were sequenced with vector primers in an automated sequencer. A BLAST [4] search of GenBank revealed that the cloned cDNA fragment shared 71.9% nucleotide and 74.3% amino acid sequence identity with keunjorong mosaic virus (KjMV, accession number JF838187) [15], which was the most closely related potyvirus identified.
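The primer PNIbF1 contains IUPAC ambiguity codes (B, Y and N), so it stands for a pool of related oligonucleotides rather than a single sequence. The following minimal Python sketch illustrates how such a degenerate primer can be expanded into a count of variants and a search pattern. The function names and the target fragment are hypothetical and serve only as an illustration; this is not the procedure used in the study.

```python
import re
from math import prod

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct unambiguous sequences the primer represents."""
    return prod(len(IUPAC[base]) for base in primer.upper())

def primer_regex(primer: str) -> re.Pattern:
    """Compile a regex that matches any sequence covered by the primer."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else f"[{options}]")
    return re.compile("".join(parts))

PNIBF1 = "GGBAAYAATAGTGGNCAACC"  # primer sequence as given in the text

print(degeneracy(PNIBF1))  # B (3 bases) * Y (2) * N (4) = 24

# Hypothetical target fragment, invented purely for illustration.
target = "TTGGTAACAATAGTGGACAACCAA"
print(primer_regex(PNIBF1).search(target) is not None)  # True
```

The 24-fold degeneracy follows directly from the three ambiguous positions in the primer as printed; whether every variant anneals efficiently to a given potyvirus template is an experimental question that this sketch does not address.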

To obtain the complete genomic sequence of the virus, a modified overlapping amplicon cloning strategy [22] was used. Seven pairs of primers (Supplementary Table S1) were designed for amplification of cDNAs of the remaining genome of CM532. All seven fragments were amplified, cloned and sequenced from the total RNAs extracted from virus-infected N. benthamiana. Contigs were assembled using SeqMan software (Version 7.0, DNASTAR). The 5′ end of the virus genome was amplified by the 5′ rapid amplification of cDNA ends (RACE) procedure using terminal deoxynucleotidyl transferase (Takara).

The genome of isolate CM532 consists of 9,998 nucleotides (nt), excluding the 3′-poly(A) tail. The base composition of this viral RNA is 31.5% adenine, 18.8% cytosine, 23.8% guanine, and 25.9% uracil, which is similar to that of other potyviruses [16]. However, the length of the 5′-UTR (174 nt) differs from that of other potyviruses, and its base composition differs from that of the rest of the RNA sequence (34.5% A, 23.0% C, 7.5% G, 35.0% U). This very low G content seems to be a common feature of plant potyviral 5′ leader sequences [16]. The genome encodes a single large ORF encompassing nucleotide positions 175-9765. The ORF comprises 9,591 nt and encodes a polyprotein of 3,196 amino acid (aa) residues with a deduced molecular weight of 363.23 kDa. Nine putative cleavage sites in this polyprotein were identified according to the conserved cleavage sites predicted by Adams et al. [2], yielding ten mature proteins, P1, HC-Pro, P3, 6K1, CI, 6K2, NIa-VPg, NIa-Pro, NIb and CP, with estimated sizes of 437, 455, 346, 52, 634, 53, 189, 246, 517 and 267 aa, respectively. The predicted cleavage sites and the location of each protein are shown in Fig. 1. A putative coding region for protein PIPO (89 aa) with the conserved motif GA₆ [7] was found overlapping the P3 coding region from nucleotide positions 3310 to 3578. Two conserved nucleotide sequence motifs, potyboxes ‘a’ (ACAACAU) and ‘b’ (UCAAGCA), have been identified in the 5′-UTRs of many potyviruses [17, 18]. In the CM532 genome, two potybox-like 7-residue sequences were found in the 5′-UTR: ACAACAC for potybox ‘a’, starting at nt 14, which differs from the conserved sequence at the last position, and CCAAGCT for potybox ‘b’, starting at nt 49, which differs at the first and last positions.
Conserved potyvirus motifs were also identified in the polyprotein, including FRNK₆₁₉, PTK₇₄₆ and IGN₇₅₃ motifs in HC-Pro [10, 19], GSGKSXXXP₁₃₈₆, DEXH₁₄₆₇, SATPP₁₄₉₇, VATNIIENGVTL₁₆₀₄ and QRLGRVGR₁₆₄₈ motifs in CI, GDD₂₇₆₅ and QPSTVVDN₂₇₃₁ motifs in NIb, and DAG₂₉₃₇, MVWCIENGTSP₃₀₅₅, CIENGTSP₃₀₅₅, AFDF₃₁₃₁ and QMKAAAL₃₁₅₄ in CP [13, 14, 15, 19]. The full-length sequence of the CM532 genome has been deposited in the DDBJ database under the accession number LC228573.
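The genome coordinates and protein sizes reported above can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the published numbers, written in Python. It assumes that the ORF coordinates include the stop codon (an assumption made here because 9,591 nt correspond to 3,197 codons) and that positions are 1-based and inclusive. The 3′-UTR length it prints is derived from those coordinates and is not stated in the text. The helper `mismatch_positions` is a hypothetical name introduced for illustration.

```python
# Values quoted in the text (nucleotide positions are 1-based and inclusive).
GENOME_LEN = 9998                # nt, excluding the poly(A) tail
ORF_START, ORF_END = 175, 9765

# Estimated sizes of the ten mature proteins, in amino acids.
MATURE_PROTEINS_AA = {
    "P1": 437, "HC-Pro": 455, "P3": 346, "6K1": 52, "CI": 634,
    "6K2": 53, "NIa-VPg": 189, "NIa-Pro": 246, "NIb": 517, "CP": 267,
}

utr5_len = ORF_START - 1             # nt upstream of the ORF
orf_len = ORF_END - ORF_START + 1    # inclusive coordinate range
utr3_len = GENOME_LEN - ORF_END      # derived here, not given in the text
# Assumption: the ORF range includes the stop codon, so one codon is
# subtracted to obtain the polyprotein length.
polyprotein_aa = orf_len // 3 - 1

def mismatch_positions(seq: str, ref: str) -> list[int]:
    """Return 1-based positions at which two equal-length sequences differ."""
    if len(seq) != len(ref):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(seq, ref), start=1) if a != b]

print(utr5_len, orf_len, utr3_len, polyprotein_aa)  # 174 9591 233 3196
print(sum(MATURE_PROTEINS_AA.values()))             # 3196

# Potybox-like sequences of CM532 compared with the conserved potyboxes.
print(mismatch_positions("ACAACAC", "ACAACAU"))     # [7]
print(mismatch_positions("CCAAGCT", "UCAAGCA"))     # [1, 7]
```

Under these assumptions, the ten mature protein sizes add up to the reported polyprotein length, and the 5′-UTR length matches the reported 174 nt.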

Fig. 1

Schematic representation of the genome organization of gomphocarpus mosaic virus (GoMV) showing its putative amino acid cleavage sites compared with those of five other potyviruses (see Fig. 2 legend for virus acronyms). The large open reading frame (ORF) is depicted by an open box. The proteolytic cleavage sites of the P1 to CP coding regions are indicated below the genome. Amino acids that differ from the GoMV cleavage sites are marked in red. Numbers above boxes indicate the nucleotide positions of each mature protein. The position of the putative protein PIPO is also shown.

Despite the conserved potyviral features identified in the CM532 genome, a BLAST [4] search in GenBank using CM532 full-length nucleotide and deduced polyprotein amino acid sequences as queries revealed that CM532 represents a distinct potyvirus, sharing at most 68% and 75% identity with other potyviruses in its nucleotide and amino acid sequence, respectively. To further analyze sequence similarities and phylogenetic relationships, the most closely related potyviruses (as revealed by the BLAST search) were selected for pairwise comparison with CM532 using Vector NTI (Invitrogen) [1].

Comparison of the entire ORF nucleotide sequence of CM532 with those of the most closely related potyviruses showed nucleotide sequence identity values ranging from 68.4% with keunjorong mosaic virus (KjMV, JF838187) to 52.9% with squash chlorosis mottling virus (SqCMV, MF362994) (Supplementary Table S2). For the polyprotein, the deduced amino acid sequences of CM532 and the selected potyviruses shared identity values ranging from 75.8% with KjMV to 39.1% with SqCMV (Supplementary Table S2). These results show that both the nucleotide and amino acid sequence identities are below the species demarcation threshold values for the genus Potyvirus (i.e., 76% nucleotide and 82% amino acid sequence identity) [1, 3, 21]. This indicates that the studied isolate, CM532, which we have tentatively named “gomphocarpus mosaic virus” (GoMV), may belong to a novel species of the genus Potyvirus.
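The comparison against the demarcation thresholds can be written out explicitly. The following Python sketch is illustrative only. It assumes that percent identity is computed over aligned columns, counting a gap opposite a residue as a mismatch (alignment tools differ on this point, and the definition used by Vector NTI may not be the same). It also assumes, following the argument in the text, that both the nucleotide and amino acid values must fall below their thresholds. The function names and the toy alignment are hypothetical.

```python
# Species demarcation thresholds for the genus Potyvirus, as quoted in the text.
NT_THRESHOLD = 76.0  # percent nucleotide identity
AA_THRESHOLD = 82.0  # percent amino acid identity

def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: columns where both rows are gaps are ignored, and a gap
    opposite a residue counts as a mismatch.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aln_a, aln_b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared

def below_species_thresholds(nt_identity: float, aa_identity: float) -> bool:
    """True if both identity values fall below the quoted thresholds."""
    return nt_identity < NT_THRESHOLD and aa_identity < AA_THRESHOLD

# Toy alignment, invented for illustration: 4 matches in 6 columns.
print(round(percent_identity("ACGT-A", "ACCTTA"), 1))

# Identity values reported in the text for CM532 versus KjMV.
print(below_species_thresholds(68.4, 75.8))
```

With the reported values of 68.4% and 75.8% for the closest relative, KjMV, both comparisons fall below the thresholds, which is the basis for the tentative new species assignment.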

In addition, comparison of the individual genes of GoMV with those of the most closely related virus, KjMV, showed 43.4%, 69.8%, 69.0%, 73.3%, 68.6%, 74.7%, 72.0% and 77.3% nucleotide sequence identity in P1, HC-Pro, P3, CI, VPg, NIa-Pro, NIb and CP, respectively (Supplementary Table S2). All were below the species demarcation thresholds, which range from 58% for the P1 gene to 74-78% for the other genes [1, 3, 21]. Furthermore, analysis of the polyprotein cleavage sites showed differences between GoMV and KjMV (Fig. 1). For example, the P3/6K1 cleavage sites are VRLQ/N and VRLQ/S for GoMV and KjMV, respectively, indicating differences in autoproteolytic specificity, which is another species demarcation criterion established by the ICTV [3]. The putative cleavage sites in the polyprotein, which are targets for the virus-encoded proteases, were identified according to the predicted conserved cleavage sites [1].

Phylogenetic analysis was conducted using MEGA 7. A phylogenetic tree was constructed from an alignment of the complete polyprotein amino acid sequences of GoMV and 29 other potyviruses (Fig. 2). Ryegrass mosaic virus (RgMV, Y09854) was used as an outgroup because it belongs to the genus Rymovirus, whose members are the closest relatives of the potyviruses. GoMV does not fit into any existing group but is most closely related to KjMV.

Fig. 2

Phylogenetic tree of gomphocarpus mosaic virus (GoMV) and 29 selected potyviruses based on the complete amino acid sequence of the polyprotein. The tree was constructed using MEGA 7 based on the neighbor-joining method. Bootstrap analysis was performed with 1,000 replicates. Numbers shown at branch points indicate bootstrap values, and branches with less than 70% support were collapsed. Sequences of potyviruses for comparison were obtained from the GenBank database, including those of Algerian watermelon mosaic virus (AWMV), banana bract mosaic virus (BBrMV), basella rugose mosaic virus (BaRMV), bean common mosaic necrosis virus (BCMNV), beet mosaic virus (BtMV), catharanthus mosaic virus (CatMV), chilli ringspot virus (ChiRSV), cowpea aphid-borne mosaic virus (CABMV), daphne mosaic virus (DapMV), dasheen mosaic virus (DsMV), East Asian passiflora virus (EAPV), freesia mosaic virus (FreMV), hardenbergia mosaic virus (HarMV), keunjorong mosaic virus (KjMV), Moroccan watermelon mosaic virus (MWMV), ornithogalum mosaic virus (OrMV), papaya leaf distortion mosaic virus (PLDMV), papaya ringspot virus (PRSV), peanut mottle virus (PeMoV), potato virus A (PVA), potato virus V (PVV), potato virus Y (PVY), soybean mosaic virus (SMV), squash chlorosis mottling virus (SqCMV), telosma mosaic virus (TeMV), watermelon mosaic virus (WMV), zantedeschia mild mosaic virus (ZaMMV), zucchini yellow mosaic virus (ZYMV), and zucchini tigre mosaic virus (ZTMV). Ryegrass mosaic virus (RgMV) was used as an outgroup.