Introduction

Sweet potato (Ipomoea batatas) is grown extensively throughout tropical and temperate regions worldwide and is the third most important root crop, after Irish potato and cassava. Viral diseases of sweet potato have become widespread and cause severe crop losses worldwide. More than 30 viruses are known to infect sweet potato [1]. Among them, sweet potato latent virus (SPLV), sweet potato mild speckling virus (SPMSV), sweet potato virus G (SPVG) and sweet potato virus 2 (SPV2) have been recognized as distinct species of the genus Potyvirus in the family Potyviridae. SPV2 was first isolated in the Taiwan province of China and in Nigeria [2], and has since been found in sweet potato crops in mainland China, South Korea, South Africa, Spain, the United States, Australia and other countries [3,4,5,6,7]. SPV2 has also been detected in plants belonging to the genera Chenopodium and Nicotiana. However, there has been no report of SPV2 infecting Ipomoea nil so far.

SPV2 is a positive-sense single-stranded RNA virus that forms scroll-like inclusions in the cytoplasm [8]. SPV2 encodes a polyprotein characteristic of potyviruses, which is cleaved into ten functional proteins: the first protein (P1), helper-component proteinase (HC-Pro), the third protein (P3), the first 6K protein (6K1), cytoplasmic inclusion protein (CI), the second 6K protein (6K2), viral genome-linked protein (VPg), nuclear inclusion protein a-proteinase (NIa), nuclear inclusion protein b (NIb) and coat protein (CP) [9]. In addition, two small ORFs, pretty interesting sweet potato potyvirus ORF (PISPO) and pretty interesting potyvirus ORF (PIPO), are expressed through frameshifts within the P1 and P3 coding regions, respectively [10, 11].

SPV2 has not been studied extensively, but analyses of its complete genome sequence were recently reported [4, 12,13,14]. The P1 protein is highly conserved in potyviruses that infect sweet potato [12]. During turnip mosaic virus (TuMV) infection, the P1 protein influences virus proliferation. The host factor NOD19 interacts with P1 and is necessary for robust infection [15]. The P1 of TuMV also interacts with the chloroplast protein cpSRP54, leading to its degradation through the 26S proteasome and autophagy pathways. By suppressing cpSRP54, P1 ultimately inhibits the synthesis of jasmonic acid (JA) and thereby enhances viral infection [16]. Additionally, the P1 protein of some viruses is involved in host selection. The nucleotide and amino acid sequences of P1N-PISPO have been identified in SPVG, SPV2, sweet potato feathery mottle virus (SPFMV) and sweet potato virus C (SPVC).

Recombination is an important force in the evolution of plant viruses, including several Potyviridae members [17, 18]. Potential recombination events have occurred in the P1, HC-Pro, and NIa-NIb regions within the SPFMV and SPLV species. P1 is under looser evolutionary constraint and is vulnerable to recombination in the Potyviridae family [18,19,20]. GmEF1A and GmVAT interact strongly with the P3 protein of soybean mosaic virus (SMV-P3) and play a critical role in virus replication. Blocking GmEF1A and GmVAT can inhibit the infection of soybeans by SMV [21]. The interaction between the membrane protein SMV-P3 and GmRHP may also be involved in the infection of potato virus Y. GmRHP may be an important host factor for P3 to participate in the replication of potato virus Y [22]. The analysis of protein interaction networks revealed that some proteins interacting with SMV-P3N-PIPO are involved in the host signal transduction and chloroplast photosynthetic system pathways, which are related to the formation of soybean symptoms caused by SMV. This indicates that P3N-PIPO may also play a role in the pathogenicity of the virus, in addition to mediating its intercellular transport [23]. The nucleotide and amino acid sequences of CP are highly conserved among SPFMV, SPVC, SPVG, SPV2 and Sweet potato virus-Zimbabwe [12]. The variation in CP among the different species or strains of these viruses is mainly in the N-terminal region [12]. The high level of homology in the C-terminal regions of the isolates indicates that this region may play a critical function in these viruses. Studying the genetic structure and diversity of viruses is important for understanding the molecular evolutionary histories of their virulence, dispersion and emergence of epidemics [24].

In this study, we report for the first time the occurrence of SPV2 in I. nil in China and analyze the complete genomic sequence of the isolate SPV2-LN. Our results provide a theoretical basis for understanding the genetic evolution and spread of SPV2, and could support comprehensive control of the virus.

Materials and methods

Sample collection

Symptomatic leaf samples of I. nil showing yellow vein symptoms were collected in November 2021 at Guangxi University, Nanning City, Guangxi Province, China (N22°50′28.41″, E108°17′9.00″) and photographed. Inoculum was prepared from the symptomatic leaf (1 g/10 mL 1×PBS) and used to mechanically inoculate N. benthamiana plants. Symptom expression was observed and photographed at 4 days post inoculation.

RNA extraction and RT-PCR

Total plant RNA was extracted from the leaves of the symptomatic I. nil using the Plant RNA Extraction Kit (Vazyme Biotech, Nanjing, China) according to the manufacturer’s instructions. To synthesize the first strand of cDNA, 5 µL of total RNA and 1 µL of oligo(dT)18 were mixed in a PCR tube, incubated at 65 °C for 5 min and quickly transferred to ice for 2 min. Residual DNA was removed by adding 2 µL of 4×gDNA wiper mix (Vazyme Biotech, Nanjing, China) to the above mixture and incubating at 42 °C for 3 min. Then 2 µL of 5×HiscriptII qRT SupermixII (Vazyme Biotech, Nanjing, China) was added and the mixture was incubated at 50 °C for 30 min. Finally, the mixture was held at 85 °C for 5 s and stored at -20 °C until further use.

Small RNA sequencing

An RNA library was constructed using the Small RNA Sample Pre Kit (Vazyme Biotech, Nanjing, China). Total RNA was used as the starting sample, and adapters were added directly to both ends of the small RNAs before reverse transcription to synthesize cDNA. Subsequently, the target DNA fragments were amplified by PCR using the primer pairs SPV2-1F/SPV2-1390R, SPV2-1F/SPV2-1350R, SPV2-7103F/R and SPV2-10147F/oligo(dT)18 (Supplementary table A1). Each 50 µL PCR reaction consisted of 2 µL forward primer (10 mM), 2 µL reverse primer (10 mM), 2 µL cDNA, 25 µL 2×Taq master mix (Vazyme Biotech, Nanjing, China) and 19 µL ddH2O. The PCR program was pre-denaturation at 95 °C for 5 min, followed by 32 cycles of denaturation at 95 °C for 15 s, annealing at 55 °C for 15 s and extension at 72 °C for 90 s (extension time calculated at 1 kb/min). Finally, the mixture was held at 72 °C for 10 min and stored at 4 °C. The amplicons were separated by PAGE and sequenced on the Illumina NovaSeq platform.
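The reaction set-up above involves simple arithmetic that is easy to check programmatically. The following is a minimal sketch, not part of the original protocol: it sums the listed component volumes and converts the stated 1 kb/min rule into an extension time. The 1,500 bp amplicon length is an assumption inferred from the 90 s extension step, and the function names are hypothetical.

```python
# Component volumes (µL) for one reaction, as listed in the text.
REACTION_UL = {
    "forward primer": 2.0,
    "reverse primer": 2.0,
    "cDNA": 2.0,
    "2x Taq master mix": 25.0,
    "ddH2O": 19.0,
}


def total_volume(components):
    """Return the total reaction volume in µL."""
    return sum(components.values())


def extension_seconds(amplicon_bp, rate_bp_per_min=1000):
    """Extension time in seconds for a given amplicon length (1 kb/min rule)."""
    return 60.0 * amplicon_bp / rate_bp_per_min


if __name__ == "__main__":
    print("Total volume (µL):", total_volume(REACTION_UL))
    # Assumed amplicon length of 1,500 bp, chosen to match the 90 s step.
    print("Extension time (s):", extension_seconds(1500))
```

Under these assumptions the components add up to the stated 50 µL, and a 1.5 kb target corresponds to the 90 s extension used in the cycling program.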

Rapid amplification of cDNA ends (5’ RACE)

The 5’-UTR sequence was amplified using the Rapid Amplification Kit for cDNA Ends (RACE) (Sangon Biotech, Shanghai, China) according to the manufacturer’s protocol. The first strand of cDNA was synthesized as a DNA-RNA hybrid using total RNA as the template and SPV2-243R as the primer (Supplementary table A1). A mixture of 15 µL DNA-RNA hybrid, 15 µL 5×Hybrid RNA Degeneration Buffer, 1 µL RNase H and 44 µL DEPC water was incubated at 30 °C for 1 h. Single-stranded DNA was purified with a rapid gel recovery kit (Sangon Biotech, Shanghai, China). A mixture of 26 µL cDNA, 10 µL 5×TdT Buffer, 12.5 µL 0.1% BSA, 1 µL 100 M dATP and 0.5 µL TdT polymerase was incubated at 37 °C for 10 min. SPV2-243R, 5’ RACE-QT and 5’ RACE-Q0 were used as primers for the first round of PCR amplification, and SPV2-136R/5’ RACE-QI were used as primers for the second round of nested PCR [25, 26].

Genomic sequence analysis

The original data files obtained after sequencing were converted into sequence reads (raw data) by base calling. To ensure read quality, clean reads were obtained by trimming low-quality ends. Small RNAs of 18 to 26 nt were extracted from the clean reads for subsequent analysis. These small RNAs were aligned against the GenBank virus RefSeq database using Bowtie to identify the viruses infecting the plants. Longer contigs were generated by assembling the mapped small RNAs with SPAdes and Velvet. The contigs were annotated using the NCBI Nr, NCBI Nt and GenBank Virus RefSeq databases. The raw data from this study have been submitted to the NCBI SRA under accession PRJNA1040476 (https://www.ncbi.nlm.nih.gov/sra/PRJNA1040476).
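The size-selection step described above can be illustrated with a short sketch. This is not the pipeline used in the study; it only shows, on in-memory sequences, how reads of 18 to 26 nt could be retained and their length distribution summarized. The function names are hypothetical, and adapter trimming and quality filtering are assumed to have been done already.

```python
from collections import Counter


def filter_small_rnas(reads, min_len=18, max_len=26):
    """Keep reads whose length lies in the inclusive range [min_len, max_len]."""
    return [read for read in reads if min_len <= len(read) <= max_len]


def length_distribution(reads):
    """Return a {length: count} mapping sorted by read length."""
    return dict(sorted(Counter(len(read) for read in reads).items()))


if __name__ == "__main__":
    # Toy reads of 17, 18, 21, 26 and 27 nt (illustrative only).
    clean_reads = ["A" * 17, "A" * 18, "C" * 21, "G" * 26, "T" * 27]
    kept = filter_small_rnas(clean_reads)
    print(len(kept), length_distribution(kept))
```

The retained reads would then be passed to an aligner such as Bowtie, as described in the text.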

The complete genome of SPV2-LN was obtained by fragment splicing using DNAMAN (version 10) and SEQMAN [27]. The complete genome of SPV2-LN and 20 SPV2 isolates obtained from the NCBI (National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov/) database (Supplementary Table A3) were used for sequence alignment using CLUSTAL W of MEGA11. Polyprotein cleavage site analysis was performed on the website https://www.dpvweb.net/potycleavage/index.html to determine the protease recognition sites for SPV2-LN and 20 SPV2 isolates.

Genome base composition and codon usage bias

To study the evolution and environmental adaptation of different virus isolates, the base composition of SPV2-LN was analyzed using DNAMAN. The relative synonymous codon usage (RSCU) values were calculated with CodonW 1.4.2, and codon bias was visualized using the ggplot2 and aplot packages in R [28]. An RSCU value < 1 means the codon is used less frequently than other synonymous codons; an RSCU value > 1 means the codon is used more frequently than other synonymous codons; an RSCU value = 1 means the codon shows no preference.
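For readers unfamiliar with RSCU, the value for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below implements this definition for the standard genetic code; it is an illustration only and does not reproduce CodonW's exact handling of stop codons, single-codon amino acids or ambiguous bases. The function name `rscu` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in the coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group sense codons into synonymous families.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid not present in this sequence
        for codon in family:
            # observed count / (family total / number of synonymous codons)
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    # Four alanine codons: GCT x2, GCC x1, GCA x1, GCG x0.
    print(rscu("GCTGCTGCCGCA"))
```

In the toy example, GCT is used twice as often as expected under equal usage, so its RSCU is above 1, while the unused GCG has an RSCU of 0.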

Sequence consistency rate analysis

Based on the coding region of SPV2-LN and information from SPV2 isolates in GenBank, the start and end positions of each protein were determined. The CLUSTAL W program of MEGA11 was used for sequence alignment of the SPV2 genomes, and the nucleotide and amino acid sequence identities were determined using BIOEDIT [29].
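As an illustration of what a pairwise identity value represents, the sketch below computes percent identity between two sequences taken from the same alignment. It is not the BioEdit implementation: how gaps are treated is an assumption here (columns with a gap in either sequence are skipped), and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Five comparable columns, four of them identical.
    print(percent_identity("ATGCA-", "ATGAAT"))
```

The same function works for amino acid alignments, since it only compares characters column by column.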

Recombination analysis

All the SPV2 nucleotide sequences were aligned with CLUSTAL W in MEGA11. Recombination analysis was performed using the RDP4 package [30, 31], which contains the RDP, GENECONV, BOOTSCAN, MAXCHI, CHIMAERA, SISCAN and 3SEQ methods. Each method was run with default parameters and a P-value threshold of 0.05. An isolate was considered recombinant when more than six algorithms supported the recombination event; otherwise, it was considered non-recombinant [30, 32].
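The decision rule above can be expressed as a small function. This sketch does not call RDP4; it assumes the per-method P-values for one candidate event have already been exported into a dictionary, which is a hypothetical data layout, and the example values are invented for illustration.

```python
METHODS = ("RDP", "GENECONV", "BOOTSCAN", "MAXCHI", "CHIMAERA", "SISCAN", "3SEQ")


def supporting_methods(p_values, alpha=0.05):
    """List the methods whose P-value is reported and below the threshold."""
    return [
        m for m in METHODS
        if p_values.get(m) is not None and p_values[m] < alpha
    ]


def is_recombinant(p_values, alpha=0.05, more_than=6):
    """Apply the rule from the text: more than six methods must support the event."""
    return len(supporting_methods(p_values, alpha)) > more_than


if __name__ == "__main__":
    # Invented P-values for a single candidate event (illustrative only).
    event = {m: 1e-6 for m in METHODS}
    print(is_recombinant(event))
    event["SISCAN"] = None  # method did not detect the event
    print(is_recombinant(event))
```

Because the package contains seven methods, "more than six" means that all seven must detect the event at P < 0.05.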

Phylogenetic and genetic distance analysis

To explore the phylogenetic relationships among the SPV2 isolates, a phylogenetic tree was constructed using the neighbor-joining method with 1000 bootstrap replicates [33]. The Kimura two-parameter (K2P) model in MEGA11 was used to calculate the genetic distances within and between groups, and SPFMV (GenBank number NC_001841.1) was used as the out-group.
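The K2P distance corrects the observed proportion of differences for multiple substitutions while treating transitions and transversions separately: d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The sketch below is a minimal illustration of this formula for two aligned sequences. It is not MEGA11's implementation, and skipping any column containing a gap or ambiguous base is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    transitions = transversions = compared = 0
    pairs = zip(seq1.upper().replace("U", "T"), seq2.upper().replace("U", "T"))
    for a, b in pairs:
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: gaps and ambiguous bases are skipped
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # One transition (A<->G) in ten sites.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

For highly divergent sequences the argument of the logarithm can become non-positive, in which case the distance is undefined and `math.log` raises an error.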

Selection pressure analysis

To reveal the evolutionary pressure on SPV2 [34], the online analysis tool Datamonkey (http://www.datamonkey.org/) was used to calculate the dN/dS (non-synonymous/synonymous) ratio for each gene. The dN/dS values were used to analyze and predict the selection pressure on each SPV2 protein. The codon selection pressure at different positions of the 12 proteins was determined for the 21 SPV2 isolates. dN/dS = 1, dN/dS < 1 and dN/dS > 1 indicate neutral, negative and positive selection, respectively.
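The interpretation rule for dN/dS can be written as a small helper. This is only a sketch of the classification stated above and does not reproduce any Datamonkey method or its statistical tests; the `tolerance` parameter is a hypothetical convenience for treating values very close to 1 as neutral.

```python
def classify_selection(dn, ds, tolerance=0.0):
    """Return (dN/dS, label) where label is 'negative', 'neutral' or 'positive'."""
    if ds == 0:
        raise ValueError("dS is zero; dN/dS is undefined")
    omega = dn / ds
    if abs(omega - 1.0) <= tolerance:
        return omega, "neutral"
    return omega, ("positive" if omega > 1.0 else "negative")


if __name__ == "__main__":
    # Invented dN and dS values (illustrative only).
    for dn, ds in [(0.02, 0.5), (1.0, 1.0), (0.6, 0.3)]:
        print(dn, ds, classify_selection(dn, ds))
```

In practice, a ratio alone is not evidence of selection at a site; site-level calls depend on the significance tests applied by the chosen method.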

Results

Symptoms of SPV2-LN

The I. nil infected with SPV2 showed mottling and yellow vein symptoms (Fig. 1A). The Nicotiana benthamiana plants inoculated with this sample slurry by mechanical inoculation also showed leaf curling and yellowing symptoms at 4 dpi (Fig. 1B). The SPV2 infecting the I. nil plant was detected by small RNA sequencing, and the isolate was named SPV2-LN. All spliced sequences yielded a complete genomic sequence of SPV2-LN with a characteristic Potyvirus genome structure (Fig. 1C).

Fig. 1 The symptoms on different hosts and amplification strategy of SPV2-LN. (A) Field symptoms on I. nil. (B) Symptoms on N. benthamiana plants at 4 dpi. (C) Genome structure of SPV2-LN and genome sequencing primer design

Base composition and codon preference of SPV2-LN genome

Based on the genome sequencing and assembly results, the complete genome of SPV2-LN was 10,606 nt in length and was deposited in GenBank under accession number OR842902. The coding region was 10,278 nt long and was composed of 3,266 A (31.8%), 2,558 T (24.9%), 2,497 G (24.3%) and 1,957 C (19.0%). The CDS encoded a polyprotein of 3,425 amino acids, of which leucine (Leu) was the most abundant, accounting for 8.82%, and tryptophan (Trp) was the least abundant, accounting for only 1.20%. The relative synonymous codon usage (RSCU) of the degenerate codons of SPV2-LN (Fig. 2) showed a preferred codon for every amino acid except Cys, Glu, Lys and Gln: GCC (Ala), GAU (Asp), UUU (Phe), GGA (Gly), CAU (His), AUA (Ile), UUG (Leu), AAU (Asn), CCA (Pro), AGA (Arg), AGU and UCA (Ser), ACA (Thr), GUU (Val) and UAU (Tyr). The GC content at the third codon position (GC3s) was 39.4%, so the frequency of A/U at this position was higher than that of G/C. A total of 28 codons had RSCU values > 1, and 21 of them ended in A/U, indicating that codons ending in A or U were preferred in the SPV2-LN genome (Supplementary table A2).
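The base composition figures reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the assumption that the coding region ends with a single stop codon is ours and is used to relate the CDS length to the polyprotein length.

```python
# Base counts of the SPV2-LN coding region, as reported in the text.
COUNTS = {"A": 3266, "T": 2558, "G": 2497, "C": 1957}


def composition_percent(counts):
    """Percentage of each base, rounded to one decimal place."""
    total = sum(counts.values())
    return {base: round(100 * n / total, 1) for base, n in counts.items()}


def gc_percent(counts):
    """Overall G+C percentage, rounded to one decimal place."""
    total = sum(counts.values())
    return round(100 * (counts["G"] + counts["C"]) / total, 1)


def polyprotein_length(cds_nt, stop_codons=1):
    """Number of amino acids encoded, assuming one terminal stop codon."""
    return cds_nt // 3 - stop_codons


if __name__ == "__main__":
    cds_length = sum(COUNTS.values())
    print(cds_length, composition_percent(COUNTS), gc_percent(COUNTS))
    print(polyprotein_length(cds_length))
```

The counts sum to the stated 10,278 nt, the rounded percentages match those reported, and 10,278 nt corresponds to 3,426 codons, i.e. 3,425 amino acids plus one stop codon under the stated assumption.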

Fig. 2 The RSCU value of codons in SPV2-LN. RSCU value > 1, indicating relative positive codon bias

Fig. 3 Phylogenetic relationship between SPV2 isolates. Neighbor-joining tree generated from the genomic sequences of SPV2-LN and the other 20 isolates of SPV2. ▲ indicates the isolate obtained in this study. The scale bar indicates a genetic distance of 0.050

Protease recognition site analysis

The genome of SPV2-LN has the conserved gene order of potyviruses. The SPV2-LN polyprotein gives rise to 12 mature proteins (P1, HC-Pro, P3, 6K1, CI, 6K2, VPg, NIa, NIb, CP, P1N-PISPO and P3N-PIPO) through protease cleavage and transcriptional slippage. The conserved motifs of the protease cleavage sites of each coding region of the SPV2-LN polyprotein were determined (Table 1). The protease recognition sites of the 20 SPV2 isolates downloaded from NCBI are shown in Supplementary Table A5.

Table 1 The genomic structure and protease cleavage sites of SPV2-LN by comparison with other SPV2 isolates

Consistency rate of SPV2-LN with other SPV2 isolates

The amino acid and nucleotide sequence identities between SPV2-LN and the other 20 SPV2 isolates were analyzed (Supplementary table A4). SPV2-LN had the highest nucleotide and amino acid sequence identities with isolate yu-17-47 (GenBank No. MK778808), at 92.7% and 97.0%, respectively. Among the 12 proteins, the amino acid sequence identities between SPV2-LN and all other SPV2 isolates were 87.0–95.3% for P1, 55.8–98.0% for HC-Pro, 28.1–94.6% for P3, 59.6–96.1% for 6K1, 64.6–98.7% for CI, 47.1–90.5% for 6K2, 59.2–98.9% for VPg, 61.7–98.7% for NIa, 65.8–97.6% for NIb, 52.1–97.7% for CP, 27.0–96.9% for P3N-PIPO and 59.3–93.6% for P1N-PISPO. The nucleotide sequence identities between SPV2-LN and all other SPV2 isolates were 89.2–96.8% for P1, 58.3–87.5% for HC-Pro, 44.0–87.6% for P3, 55.1–92.9% for 6K1, 63.0–92.0% for CI, 54.0–89.3% for 6K2, 59.4–93.7% for VPg, 59.3–95.4% for NIa, 62.2–94.4% for NIb, 52.8–95.5% for CP, 33.8–85.4% for P3N-PIPO and 61.0–96.7% for P1N-PISPO. The nucleotide sequence identities of the P1N-PISPO and P1 coding regions were higher than those of the other proteins.

P1N-PISPO, HC-Pro, CP and NIb contained recombination sites

Analysis of the nucleotide sequences of the 21 SPV2 isolates using RDP4 showed that four SPV2 isolates had similar recombination sites (Table 2). Of these recombinant isolates, yu-17-47 was from China, SSBles-74_ZA and SPV2 were from South Africa, and SPV2-SP1 was from Greece. Recombination analysis of the nucleotide alignment of each individual gene showed no obvious recombination sites in P3, P3N-PIPO, 6K1, CI, 6K2, VPg or NIa. Recombination sites were found only in P1N-PISPO of SSBles-74_ZA, HC-Pro of SPV2, and CP and NIb of SPV2-SP1 (Table 3).

Table 2 Recombinant analysis of SPV2 coding sequence
Table 3 Recombinant analysis of P1, P1N-PISPO, HC-Pro, NIb and CP in SPV2

Phylogenetic and genetic distance analysis

The phylogenetic tree was constructed using the neighbor-joining method based on the complete genome sequences of 21 SPV2 isolates (Fig. 3). The 21 SPV2 isolates grouped into four clusters. The SPV2-LN isolate was most closely related to isolate yu-17-47 in group IV. SSBles-74_ZA and SPV2 from South Africa were in groups III and II, respectively. The rest of the SPV2 isolates clustered in group I.

Genetic distances within and between groups were analyzed (Table 4). The genetic distance within group I was the lowest. The genetic distances between group IV and the other groups (group IV vs. III = 0.447, group IV vs. II = 0.443, and group IV vs. I = 0.475) were greater than those among the other three groups. The majority of SPV2 isolates were isolated from sweet potato, and SPV2-LN is the first to be isolated from I. nil, indicating that SPV2 may have a broad host range.

Table 4 Genetic distances within and between groups

Codon selection pressures

We examined the codon selection pressures at different positions of the 12 proteins in the SPV2 isolates. The analysis showed that the dN/dS values of all SPV2 proteins were less than 1. Only one positively selected site was found, in the P1N-PISPO protein. The other proteins were under strong negative selection pressure, and the dN/dS value of HC-Pro was the lowest among all the proteins (Table 5).

Table 5 Selection pressure analysis of SPV2

Discussion

This is the first report of SPV2 isolated from I. nil. In this study, the complete nucleotide sequence of SPV2-LN was obtained by small RNA sequencing and RT-PCR. The structural characteristics and phylogenetic relationships of the SPV2 genome were analyzed by sequence identity comparison and phylogenetic analysis [35, 36]. This research could provide a basis for revealing the evolution and pathogenicity of SPV2-LN at the molecular level and help with the control of sweet potato viruses.

The complete genomes of the 21 SPV2 isolates ranged between 10,562 and 10,747 nt, excluding the poly-A tail, and each encoded a polyprotein. Isolate SPV2-LN had a genome size of 10,606 nt excluding the poly-A tail, which falls within the range of previous isolates. Genome size may differ significantly between closely related species due to various factors, including host cell type and environmental stress [37]. SPV2-LN was closely related to yu-17-47 and to the common ancestor of the four groups. The position of SPV2-LN in group IV in the phylogenetic analysis is of great significance, as it not only displays the phylogenetic relationships and evolutionary processes between isolates but also provides an important scientific basis for understanding the origin and evolution of virus biodiversity.

Twelve mature proteins, including the P1 protein, are formed through polyprotein processing and frameshift translation strategies. Previous studies have shown that the P1 protein exhibits high variation and is a recombination hotspot in potyviruses infecting sweet potato [38]. However, in SPV2, the P1 protein was more conserved, and the sequence of P3 was more variable than those of the other 11 proteins. The P1 protein is conserved in potyviruses that infect sweet potato, and some viral P1 proteins play a role in host selection [39]. This suggests that the conservation of P1 in SPV2 may play a role in the ability of the virus to infect and replicate in specific host plants. Previous studies have shown that the genomes of three potyviruses, SPFMV, sweet potato virus C (SPVC) and SPVG, encode the P1N-PISPO protein. In contrast, P1N-PISPO is not encoded by sweet potato latent virus (SPLV), sweet potato mild mottle virus (SPMMV) or 78 other potyviruses [12, 38, 40]. On the other hand, the high variability observed in P3 and P3N-PIPO may influence the ability of the virus to evade the host immune system, allowing it to infect a wider host range. P3N-PIPO, which is produced from the P3 gene via transcriptional slippage, is critical for virus cell-to-cell movement and contributes to the suppression of host resistance [41, 42]. The variability in P3 and P3N-PIPO may be due to recombination, which is an important force in the evolution of plant viruses, including several Potyviridae members [43], although no obvious recombination sites were detected in these regions in the present analysis. Further research on the functional implications of these variations would help to clarify the mechanisms by which SPV2 interacts with its host and causes disease. Future studies can also investigate the specific roles of these proteins in the virus life cycle and their potential as targets for antiviral strategies.

Codon usage bias is affected by a variety of biological factors, such as GC composition, genome size, tRNA abundance, gene expression level, and mutation pressure [44]. In highly expressed genes, codon usage correlates with tRNA abundance, which determines the translation efficiency and quality of the synthesized protein [45]. Natural selection and mutational pressure are the main factors influencing codon preference [46]. The effect of mutation on a genome is not random but is directed toward higher or lower guanine-plus-cytosine content [47, 48]. In general, the higher the level of gene expression, the stronger the codon bias [49]. The preferred codons of SPV2-LN mainly end in A/U, indicating that A and U are favored at the third codon position in this genome [50].

Recombination is prevalent in the evolution of potyviruses. In this study, recombination was absent in SPV2-LN but present in SSBles-74_ZA, SPV2, yu-17-47 and SPV2-SP1. The lack of recombination in SPV2-LN indicates a low recombination frequency, suggesting that recombination is not the primary driving force of SPV2 molecular evolution. However, phylogenetic analysis showed that SPV2-LN was closely related to yu-17-47 but relatively distant from the other isolates. This shows that variation exists among the SPV2 isolates, which may be the result of recombination events and mutations.

The rates of non-synonymous (dN) and synonymous (dS) substitutions are important for assessing selection pressure. Strong negative selection pressure tends to suppress variation, especially in functionally conserved proteins [48]. For all genes of SPV2, the dN/dS ratio was lower than 1, indicating negative selection. Through recombination, genetic variation can be generated, mutation load can be reduced and new strains with altered biological characteristics can arise, contributing to the evolution of the virus [51,52,53]. Only one site in the P1N-PISPO protein was under positive selection, while the rest were under negative selection. This indicates that recombination may not be the main driving force of SPV2 molecular evolution, and that negative selection may instead be the main force driving SPV2 evolution.

Conclusion

This is the first report of the occurrence of SPV2 in I. nil in China. SPV2-LN caused clear symptoms in I. nil, and we characterized its genome using small RNA sequencing, RT-PCR and RACE. This discovery provides evidence of the ability of the virus to replicate, spread and adapt to multiple host environments, which may accelerate the speed and scope of virus transmission. Although single-potyvirus infections may cause mild or no symptoms in sweet potato, synergistic co-infections may cause severe symptoms. We advocate strengthening the monitoring and control of SPV2, especially in areas where sweet potato chlorotic stunt virus (SPCSV) is severe, to prevent co-infection and yield loss in sweet potato. This study provides a basis for in-depth study of the host range of SPV2 and for the development of management or preventive measures against it.