The members of the family Caulimoviridae infect various plants worldwide. The family includes 11 genera, whose members have a monopartite, open, circular dsDNA genome of 7.1–9.8 kbp with discontinuities in both strands that is encapsidated in an icosahedral or bacilliform-shaped viral coat [1]. Like those of other viruses in the order Ortervirales, the genome of members of the family Caulimoviridae alternates between dsDNA and ssRNA through cycles of transcription and reverse transcription. However, in contrast to retroviruses, the dsDNA form of the viral genome is encapsidated rather than the ssRNA replication intermediate [2]. Caulimovirus genomes contain six to seven open reading frames (ORFs) that sequentially encode a movement protein (MP), an aphid transmission factor (ATF), a virion-associated protein, a coat protein (CP), a protease + reverse transcriptase (RT) + ribonuclease (RNase) H, and a transactivator/viroplasmin (TAV) protein, with two to four discontinuities in the strands [1].

Pueraria montana (commonly called kudzu) is one of 26 plant species of the genus Pueraria (family Fabaceae), which are mostly found in Asia, North America, and South America [3]. These plants, especially their tubers, are known for their traditional health and cosmetic benefits, as well as for their use in agriculture to prevent soil erosion [3]. Plants of the genus Pueraria are infected by various viral pathogens, including kudzu mosaic virus (genus Begomovirus), tobacco ringspot virus (genus Nepovirus), and soybean vein necrosis virus (genus Orthotospovirus) [4,5,6].

In this paper, we report the discovery of a new caulimovirus in Pueraria montana (P. montana) and describe its complete genome sequence and organization.

P. montana leaf samples showing vein-clearing-like symptoms were collected from Chunyang-myeon, South Korea, in August 2018 (Fig. 1a) and were kept in powdered form at −80°C until used. To identify viral sequences, the collected P. montana samples and 16 other plant samples (n = 17) showing virus-like symptoms were pooled together as described previously [7,8,9]. A WizPrep™ Plant RNA Mini Kit (Seongnam, Korea) was used to extract total RNA from the pooled sample, and high-throughput paired-end RNA sequencing was performed after removing plant ribosomal RNA using a Ribo-Zero™ rRNA Removal Kit (Plant Leaf) (Epicentre, Madison, WI, USA). A cDNA library was constructed in accordance with the manufacturer’s instructions using a TruSeq RNA Sample Prep Kit (Illumina, San Diego, CA, USA). BluePippin™ 2% Agarose Gel Cassettes (Saga Science, Beverly, MA, USA) and an Agilent 2100 BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA) were used to measure the size and quality of the cDNA, respectively. An Illumina NovaSeq6000 system was used to obtain paired-end reads. Approximately 82 GB of raw data were generated from the pooled samples. All of the raw reads were trimmed, and the subsequent de novo assembly and contig annotation were done by Macrogen (Seoul, Korea).

Fig. 1

Disease symptoms and genome organization of a new caulimovirus identified in Pueraria montana. (a) Pueraria montana plant leaves with vein clearing symptoms. (b) Circular representation of the PVA genome. (c) Linear representation of the PVA genome. Conserved domains/motifs, such as the movement protein (MP), aphid transmission factor (ATF), virion-associated protein, peptidase, reverse transcriptase, viroplasmin, and the cysteine motif, were identified.

The resulting contig sequences were compared with available sequences in GenBank using a BLASTn search, which indicated that the test samples were infected with several known and unknown plant viruses. Among the contigs, one long caulimovirus-related contig (7,572 nucleotides [nt]), which was assembled from 2,445,773 reads, was identified. The contig shared the highest sequence identity – 66.82%, 65.89%, and 65.74% – with strawberry vein banding virus (KX249738.1), angelica bushy stunt virus (NC_043523.1), and cauliflower mosaic virus (AB863145.1), respectively, and it appeared to represent a novel caulimovirus. Accordingly, we considered this caulimovirus-related assembled sequence (7,572 nt) to be a complete genome sequence of a novel caulimovirus, which we tentatively designated as "pueraria virus A" (PVA).

To confirm the HTS result and test specifically for the virus in each plant sample, two primers, PVA_4457_F (TTGGCTTGAAACAAGCTCCT) and PVA_4895_R (TCCTGCTGTGTCCATATCCA), were designed based on the single caulimovirus-related contig sequence. Total DNA was extracted from each of the 17 symptomatic samples that were used for HTS, using a DNeasy® Plant Mini Kit (QIAGEN, Hilden, Germany). The extracted DNA samples were subjected to PCR using AccuPower® ProFi Taq PCR PreMix (Bioneer, Daejeon, Korea). Among the tested samples (n = 17), the sample from P. montana was the only one positive for PVA.
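As a quick sanity check on the two detection primers, the following sketch computes their length, GC content, and an approximate melting temperature. The Wallace rule (2 °C per A/T, 4 °C per G/C) is used here only as a rough rule of thumb; it is an assumption made for illustration and not a method stated in this report, and the function name primer_stats is hypothetical.

```python
# Primer sequences as given in the text.
PRIMERS = {
    "PVA_4457_F": "TTGGCTTGAAACAAGCTCCT",
    "PVA_4895_R": "TCCTGCTGTGTCCATATCCA",
}


def primer_stats(seq):
    """Return (length, GC percent, Wallace-rule Tm in deg C) for a primer.

    The Wallace rule is a coarse approximation for short oligonucleotides
    and is an illustrative assumption, not the authors' design criterion.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    gc_percent = 100.0 * gc / len(seq)
    tm_wallace = 2 * at + 4 * gc
    return len(seq), gc_percent, tm_wallace


for name, seq in PRIMERS.items():
    length, gc_percent, tm = primer_stats(seq)
    print(f"{name}: {length} nt, GC {gc_percent:.1f}%, Tm ~{tm} C")
```

Under these assumptions, both primers are 20 nt long with similar GC content, so their estimated melting temperatures differ by only 2 °C.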

To confirm the complete genome sequence of PVA, seven additional primer sets were designed based on the PVA contig sequence (Supplementary Table S1). All of these primer sets were used successfully to amplify viral DNA from the P. montana sample (Fig. 1a). Using AccuPower® ProFi Taq PCR PreMix (Bioneer), PCR products of the expected sizes were obtained (Supplementary Fig. S2). All of the amplicons were purified using an AccuPrep PCR Purification Kit (Bioneer) and cloned independently into the RBC T&A Cloning Vector (RBC Bioscience, Taipei, Taiwan). To reduce experimental errors, at least three clonal inserts per PCR were sequenced using the Sanger method at Genotech (Daejeon, Korea). All of the overlapping PVA sequences were assembled using the DNAMAN 5.0 program (Lynnon Biosoft, Quebec, Canada). The assembled complete genome sequence of PVA shared a significant amount of sequence identity (65.2%) with other members of the genus Caulimovirus.

The complete 7,572-nt genome sequence of PVA was deposited in the GenBank database (accession no. MZ826138), and it shares the most sequence similarity (66.82% identity and 31% query coverage) with strawberry vein banding virus (GenBank accession no. KX249738.1).

The genome of PVA starts with the conserved tRNAMet sequence 5′-TGGTATCAGAGCC-3′, which is a primer binding site and complementary to the consensus sequence of the plant tRNAMet binding site [10]. It is therefore presumed to utilize host tRNA molecules as primers for genome replication by reverse transcription of the negative-sense DNA strand [11].
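The complementarity described above can be illustrated by taking the reverse complement of the primer binding site. This is a minimal sketch: the helper name reverse_complement is hypothetical, and the output is written in the DNA alphabet (T in place of the U found in tRNA).

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


# Primer binding site as given in the text (5' to 3').
pbs = "TGGTATCAGAGCC"

# Sequence expected to base-pair with the PBS, read 5' to 3'.
print(reverse_complement(pbs))  # GGCTCTGATACCA
```

The resulting sequence ends in CCA, which is consistent with pairing to the 3′ end of a tRNA molecule.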

Six putative ORFs were identified in the complete PVA genome sequence (Fig. 1b and c) using ORF Finder (https://www.ncbi.nlm.nih.gov/orffinder/): ORF1, nt 40–1,005; ORF2, nt 1,002–1,436; ORF3, nt 1,514–1,936; ORF4, nt 1,933–3,354; ORF5, nt 3,311–5,452; and ORF6, nt 5,566–6,876 (Fig. 1c). For the detection and functional annotation of the conserved domains/motifs in protein-encoding sequences, the Pfam database of protein families (http://pfam.xfam.org/) [12] was used (Fig. 1c).
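The ORF coordinates listed above can be checked for internal consistency with a few lines of code. This sketch computes the length of each ORF and confirms that it is a whole number of codons. It assumes that the coordinates are 1-based and inclusive; the function name orf_summary is hypothetical.

```python
# ORF coordinates (start, end) as listed in the text.
# Assumption: positions are 1-based and inclusive.
ORFS = {
    "ORF1": (40, 1005),
    "ORF2": (1002, 1436),
    "ORF3": (1514, 1936),
    "ORF4": (1933, 3354),
    "ORF5": (3311, 5452),
    "ORF6": (5566, 6876),
}


def orf_summary(start, end):
    """Return (length in nt, number of codons) for an inclusive range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a multiple of 3")
    return length, length // 3


for name, (start, end) in ORFS.items():
    length, codons = orf_summary(start, end)
    print(f"{name}: nt {start}-{end}, {length} nt, {codons} codons")
```

If each range includes the stop codon, which is an assumption about how the coordinates were reported, the predicted protein length is one less than the codon count.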

ORF1 of PVA encodes an MP. The product of ORF1, which has two conserved amino acid sequence motifs, GNLSYGKLMF (aa 166–175) and GYTLSNSHHS (aa 220–229), is believed to be involved in the cell-to-cell movement of caulimoviruses [8, 13, 14]. The product of ORF2 of PVA contains an IXG motif, which is necessary for the interaction between the ATF and viral particles during aphid transmission in members of the genus Caulimovirus [8, 13, 14]. ORF3 of PVA potentially encodes a multifunctional virion-associated protein, which might play a role in cell-to-cell and plant-to-plant transmission of the virus and which interacts with the capsid, movement protein, and aphid transmission factor [15].

ORF4 of PVA encodes a viral CP, which contains a conserved cysteine motif (CX2CX4HX4C, aa 404–418). A similar motif has been identified in other caulimovirus CPs, and it includes an RNA-binding domain that is consistent with a cysteine motif or ‘zinc finger’ [16]. ORF5 of PVA encodes a polyprotein containing all of the motifs conserved in caulimovirus replicases, including aspartic protease (aa 54–213), reverse transcriptase (RT) (aa 334–489), and RNase H (aa 579–687) motifs, making it similar to the putative protease domains reported previously for caulimoviruses [8, 13, 17]. The conserved RT domain of caulimoviruses is present as YVDDIVF (aa 438–445) and IIETDASDLYWG (aa 484–495). ORF6 of PVA encodes a caulimovirus viroplasmin with a conserved TAV motif (GLCSIIY; aa 250–256), which is critical for viral replication, translation, assembly, and protection against plant defense mechanisms [9, 13, 18].
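A pattern such as CX2CX4HX4C can be located in a protein sequence with a regular expression in which X stands for any residue. The sketch below is illustrative only: the toy sequence is invented for demonstration and is not the PVA coat protein, and the function name find_cys_motif is hypothetical.

```python
import re

# CX2CX4HX4C: C, any 2 residues, C, any 4 residues, H, any 4 residues, C.
CYS_MOTIF = re.compile(r"C.{2}C.{4}H.{4}C")


def find_cys_motif(protein):
    """Return (start, end, match) tuples using 1-based inclusive positions."""
    return [
        (m.start() + 1, m.end(), m.group())
        for m in CYS_MOTIF.finditer(protein.upper())
    ]


# Hypothetical toy sequence, NOT the PVA coat protein.
toy_protein = "MKT" + "CAACGGGGHLLLLC" + "RRK"

print(find_cys_motif(toy_protein))  # [(4, 17, 'CAACGGGGHLLLLC')]
```

Note that finditer reports non-overlapping matches only, which is sufficient for locating a single zinc-finger-like motif.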

PVA has two intergenic regions. The smaller one is 160 nt in length and is found between ORF5 and ORF6, whereas the longer one is between ORF1 and ORF6. The former has a tentative TATA-like box, TATATATA (nt 5519–5526), and the latter contains a putative polyadenylation signal, AATAAAA (nt 7137–7143), downstream from its TATA-like box [9, 13, 17].

Sequence comparisons showed that the overall nucleotide sequence identity between PVA and other caulimoviruses ranged from 43.07% to 51.35% (Supplementary Table S2). Furthermore, pairwise alignments of the amino acid sequences of PVA ORFs with those of other caulimoviruses revealed low sequence similarity (12.77%–47.90% identity). Only ORF5 had relatively high amino acid sequence similarity (42.52%–62.46% identity) to other members of the genus Caulimovirus (Supplementary Table S2). This suggests that ORF5 of PVA has a closer evolutionary relationship to other caulimoviruses than the other ORFs. To better understand the molecular relationships between PVA and other caulimoviruses, a phylogenetic tree was constructed by the maximum-likelihood method with 1000 bootstrap replicates in MEGA X v. 10.1.8 [19] using amino acid sequences. The tree, based on an amino acid alignment of ORF5 from PVA and other members of the genus Caulimovirus, placed PVA within a group corresponding to the genus Caulimovirus and closest to strawberry vein banding virus (SVBV) (Fig. 2).
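Pairwise identity values such as those reported above are derived from aligned sequences. The following minimal sketch shows one common way of computing percent identity from two pre-aligned sequences. The choice of denominator (columns with no gap in either sequence) is an assumption, since tools differ on this point and the software used for the values in Supplementary Table S2 is not stated here. The function name and toy alignment are hypothetical.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must be rows of the same alignment (equal length, '-'
    for gaps). Other definitions divide by alignment length or by the
    shorter sequence length; this one is an illustrative assumption.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment, not taken from the PVA data.
row_a = "ACDE-FG"
row_b = "ACDQKFG"
print(f"{percent_identity(row_a, row_b):.2f}%")  # 83.33%
```

In the toy alignment, five of the six ungapped columns match, giving 83.33% identity.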

Fig. 2

Phylogenetic tree based on amino acid sequences of ORF5 of PVA and other members of the genus Caulimovirus. The maximum-likelihood phylogenetic tree was constructed using the JTT matrix-based model in MEGA X v. 10.1.8, with ClustalW for sequence alignment. Bootstrap values above 50% (1000 replicates) are shown at the branches. Soybean chlorotic mottle virus (SbCMV) and blueberry red ringspot virus (BRRV) were used as the outgroup.

In summary, pueraria virus A (PVA) fulfills the current International Committee on Taxonomy of Viruses species demarcation criteria for caulimoviruses, which include host range differences and differences in polymerase (RT + RNase H) nt sequences of more than 20%. Thus, it can be classified as a member of a new species in the genus Caulimovirus. The genomic sequence obtained in this study will help in the further characterization of this virus and the identification of other potential hosts. Since P. montana propagates vegetatively, our finding of another distinct caulimovirus highlights the importance of developing virus-specific detection and management alternatives for these viruses. Further research is needed to identify the vector and to investigate the possible presence of plant-genome-integrated subgenomic forms or episomal elements.