Dioscorea bulbifera L. is an herbaceous twining vine belonging to the botanical family Dioscoreaceae, commonly known as air yam. The centers of genetic diversity of D. bulbifera are located mostly in Asia and Africa, but the species is currently spread throughout the Americas (Hammer 1998; Govaerts et al. 2007; Maurin et al. 2016). Air yam produces edible underground tubers and aerial bulbils that can be easily multiplied by vegetative propagation (Croxton et al. 2011). Dioscorea bulbifera is also classified as a non-conventional edible plant (Kinupp and Lorenzi 2014). In Brazil, crops belonging to the genus Dioscorea are socioeconomically important, being mainly cultivated by smallholder farmers, and are considered an alternative source for food security (Ferreira 2011).

Viruses belonging to the genus Badnavirus (family Caulimoviridae) have circular, double-stranded DNA (dsDNA) genomes of 7.0–9.2 kbp in size, encapsidated in non-enveloped bacilliform particles, and encode three to seven open reading frames (ORFs) (Teycheney et al. 2020). The ORF1-encoded protein has been reported as virion-associated (Cheng et al. 1996), while ORF2 encodes a protein that binds to nucleic acids (Jacquot et al. 1996). The badnaviral ORF3 is the largest coding region, representing almost 80% of the viral genome, and encodes a polyprotein that includes important conserved domains such as the viral capsid, movement protein, aspartate protease, reverse transcriptase (RT), and ribonuclease H (RNase H) (Teycheney et al. 2020). Badnaviruses replicate through an RNA intermediate and are therefore referred to as plant pararetroviruses. They are mainly transmitted by mealybugs (Pseudococcidae) or, in some instances, by aphids, in a non-circulative, semipersistent manner (Bhat et al. 2016; Teycheney et al. 2020).

Badnavirus-like particles were first described in Dioscorea spp. in the 1970s (Harrison and Roberts 1973; Mantell and Haque 1979). Subsequently, partial and complete genome sequences obtained from Dioscorea alata, originally collected in Nigeria, revealed a new badnaviral species currently known as Dioscorea bacilliform AL virus (Phillips et al. 1999; Briddon et al. 1999). A second badnavirus species, affecting Dioscorea sansibarensis, was later characterized at the genome level and is presently referred to as Dioscorea bacilliform SN virus (Seal and Muller 2007). These badnaviruses are widespread in the main yam cultivation areas, indiscriminately infecting different Dioscorea species (Eni et al. 2008; Kenyon et al. 2008; Bousalem et al. 2009). Dioscorea bacilliform AL virus (DBALV) has been reported as the predominant badnaviral species infecting these hosts in northeastern Brazil (Guimarães et al. 2015). At least six other badnaviruses associated with Dioscorea spp. are officially accepted by the International Committee on Taxonomy of Viruses (ICTV): Dioscorea bacilliform AL virus 2, Dioscorea bacilliform ES virus, Dioscorea bacilliform RT virus 1, Dioscorea bacilliform RT virus 2, Dioscorea bacilliform RT virus 3, and Dioscorea bacilliform TR virus (https://talk.ictvonline.org/ictv-reports). In addition, yam plants belonging to the Dioscorea cayenensis-rotundata species complex have been reported to harbor different groups of endogenous pararetroviral sequences (EPRVs) (Bousalem et al. 2009; Seal et al. 2014; Umber et al. 2014).

Here, the species diversity of badnaviruses associated with D. bulbifera from different growing regions in Brazil was assessed by PCR amplification and Sanger sequencing of the RT-RNase H domains. At least three previously reported badnaviruses were found: DBALV, Dioscorea bacilliform SN virus (DBSNV), and Dioscorea bacilliform TR virus (DBTRV). Sequences belonging to a putative novel badnavirus, tentatively named Dioscorea bacilliform BL virus (DBBLV), were also recovered. Furthermore, badnavirus-like endogenous sequences were characterized and found to be most closely related to Dioscorea rotundata endogenous pararetroviruses.

Most air yam bulbils were collected in northeastern Brazil, while some were obtained from the north, central-western, south, and southeastern regions of the country. To establish a gene bank collection, the bulbils were planted in the experimental field of the Federal University of Alagoas, Rio Largo, Alagoas state, Brazil. Leaf samples were then collected from each symptomatic and asymptomatic plant and kept at −80 °C until analysis.

Total DNA was individually extracted from 100 to 200 mg of frozen leaf tissue using the method described by Doyle and Doyle (1987) and used as template for PCR amplification. The degenerate primers BadnaFP (5′-ATGCCITTYGGIITIAARAAYGCICC-3′) and BadnaRP (5′-CCAYTTRCAIACISCICCCCAICC-3′), which amplify a fragment of ~580 bp comprising part of the RT-RNase H domains of Badnavirus species (Yang et al. 2003), were used for virus detection. The PCR reactions were performed in a total volume of 15 μL, containing 1.5 μL of 10× buffer (100 mM KCl, 100 mM Tris–HCl pH 9.0, 1% Triton-X), 1.2 μL of 2.5 mM dNTPs, 0.4 μL of 50 mM MgCl2, 0.2 μL of Taq DNA polymerase (Thermo Fisher Scientific, Carlsbad, CA, USA), 1.0 μL of each oligonucleotide (10 μM), 1.0 μL (50 ng) of total DNA, and 8.7 μL of nuclease-free water. The amplification conditions were an initial denaturation at 94 °C for 4 min, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 50 °C for 30 s, and extension at 72 °C for 1 min, and a final extension step at 72 °C for 10 min. The PCR products were analyzed on a 1% agarose gel, stained with ethidium bromide, and visualized under ultraviolet light. Amplicons of the expected size (~580 bp) were gel-purified using the GFX™ PCR DNA and Gel Band Purification Kit (GE Healthcare, Illinois, USA) according to the manufacturer’s protocol and directly Sanger sequenced with both the BadnaFP and BadnaRP primers (Yang et al. 2003).
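As a bookkeeping aid, the reaction composition listed above can be checked and scaled with a short script. This is a minimal sketch rather than part of the original protocol: the function names, the dictionary keys, and the 10% pipetting overage used for the master mix are illustrative assumptions.

```python
# Per-reaction volumes in microlitres, copied from the protocol text.
REACTION_UL = {
    "10x buffer": 1.5,
    "dNTPs (2.5 mM)": 1.2,
    "MgCl2 (50 mM)": 0.4,
    "Taq DNA polymerase": 0.2,
    "BadnaFP (10 uM)": 1.0,
    "BadnaRP (10 uM)": 1.0,
    "total DNA (50 ng)": 1.0,
    "nuclease-free water": 8.7,
}


def total_volume(mix):
    """Sum of all component volumes, rounded to avoid float noise."""
    return round(sum(mix.values()), 2)


def final_concentration(stock, volume_ul, total_ul):
    """Final concentration after dilution, in the same unit as the stock."""
    return stock * volume_ul / total_ul


def master_mix(mix, n_reactions, overage=0.10, template_key="total DNA (50 ng)"):
    """Scale every component except the template DNA for n reactions.

    The overage fraction is a hypothetical allowance for pipetting loss,
    not a value taken from the protocol.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in mix.items() if name != template_key}


if __name__ == "__main__":
    total = total_volume(REACTION_UL)
    print("Total reaction volume (uL):", total)
    print("Final MgCl2 (mM):", round(final_concentration(50.0, 0.4, total), 2))
    print("Final primer (uM):", round(final_concentration(10.0, 1.0, total), 2))
    for name, vol in master_mix(REACTION_UL, 60).items():
        print(f"{name}: {vol} uL")
```

The listed volumes add up to the stated 15 μL, and the 0.4 μL of 50 mM MgCl2 corresponds to a final concentration of about 1.33 mM.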

Contigs corresponding to the RT-RNase H nucleotide sequences were assembled, and ambiguous sites were manually edited using CodonCode Aligner v. 4.1.1 (www.codoncode.com). The consensus sequences were initially analyzed with the BLASTn algorithm (Altschul et al. 1990) to identify their closest matches among virus sequences available in the NCBI non-redundant GenBank database (https://www.ncbi.nlm.nih.gov/genbank). Similar sequences obtained from GenBank (Supplementary Table S1) were then used for species demarcation of the new isolates via pairwise nucleotide sequence comparisons using Sequence Demarcation Tool v. 1.2 (Muhire et al. 2014). The Badnavirus species demarcation criterion established by the ICTV was adopted, under which isolates sharing ≥80% nucleotide identity in the RT-RNase H domains are assigned to the same species (Teycheney et al. 2020).
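The demarcation rule can be illustrated with a small sketch. It assumes two sequences that are already aligned to the same length and computes identity over columns where neither sequence has a gap; it is a simplified illustration, not the implementation used by Sequence Demarcation Tool, and the function names are hypothetical.

```python
def pairwise_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def same_species(identity_percent, threshold=80.0):
    """Apply the >= 80% RT-RNase H identity criterion."""
    return identity_percent >= threshold


if __name__ == "__main__":
    # Toy aligned fragments, not real badnavirus sequences.
    identity = pairwise_identity("ACGTACGTAC", "ACGTACGTAA")
    print(identity, same_species(identity))
```

Under this rule, an isolate whose best match to any accepted species falls below 80% is a candidate for a new species, while one at or above 80% is assigned to the matching species.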

The RT-RNase H nucleotide sequences of the isolates reported here and badnaviral sequences retrieved from GenBank were aligned using the MUSCLE algorithm (Edgar 2004) and manually adjusted in MEGA 7.0 (Kumar et al. 2015). Phylogenetic relationships were inferred by Bayesian inference (BI) through the CIPRES web portal (Miller et al. 2010) using MrBayes v. 3.2.3 (Ronquist et al. 2012), assuming a general time reversible (GTR) nucleotide substitution model with a gamma (G) model of rate heterogeneity and invariable (I) sites, selected using MrModeltest 2.3 (Posada and Buckley 2004) according to the Akaike information criterion (AIC). The analysis consisted of two replicates with four chains each, run for 10 million generations and sampled every 1000 generations. The first 2500 trees per run were discarded as burn-in. The posterior probability values (Rannala and Yang 1996) were determined from the majority rule consensus tree reconstructed with the 15,000 remaining trees. The BI tree was edited in FigTree v. 1.4 (tree.bio.ed.ac.uk/software/figtree) and Inkscape (https://inkscape.org/pt/).
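The number of trees retained for the consensus follows directly from the run settings. The sketch below reproduces that arithmetic; the function name is illustrative, and it assumes that any tree recorded for the starting state of each run is not counted.

```python
def retained_trees(generations, sample_every, runs, burnin_per_run):
    """Trees left after burn-in, summed over independent runs.

    Assumes one tree per sampling interval and ignores any tree
    written for the starting state (generation 0).
    """
    sampled_per_run = generations // sample_every
    if burnin_per_run > sampled_per_run:
        raise ValueError("burn-in exceeds the number of sampled trees")
    return runs * (sampled_per_run - burnin_per_run)


if __name__ == "__main__":
    # Settings taken from the text: 10 million generations, sampling
    # every 1000, two replicates, 2500 trees discarded per run.
    print(retained_trees(10_000_000, 1000, runs=2, burnin_per_run=2500))
```

With these settings each run samples 10,000 trees, 25% of which are discarded, leaving 7500 per run and 15,000 in total.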

A total of 60 air yam bulbils were obtained from different growing regions in Brazil (north, n = 4; northeastern, n = 47; south, n = 4; southeastern, n = 3; central-western, n = 2). Leaf samples were collected from symptomatic and asymptomatic plants grown from the air yam bulbils (Fig. 1a–d) and individually tested by PCR using the BadnaFP/BadnaRP primer pair (Yang et al. 2003). Amplicons of the expected size (approximately 580 bp) were observed in 39 of 60 plants, suggesting an incidence of 65% (Supplementary Table S2). High incidence of badnaviruses in Dioscorea spp., ranging from 72.0 to 93.3%, has been previously reported in Brazil (Lima et al. 2013; Guimarães et al. 2015; Nascimento et al. 2020), demonstrating that badnaviruses are widespread in commercial plantations and germplasm collections of yams. Although the PCR primers described by Yang et al. (2003) are known to be unable to distinguish between episomal and integrated RT-RNase H sequences, they are an important and frequently used tool for badnavirus detection (Bhat et al. 2016; Teycheney et al. 2020; Luo et al. 2022). The amplification products from 26 PCR-positive samples were bidirectionally Sanger sequenced, showing that at least 12 of the 26 plants were infected by badnaviral species officially accepted by the ICTV.
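The sampling totals and the incidence figure can be verified with a few lines of arithmetic. This is only a consistency check on the numbers given above; the variable and function names are illustrative.

```python
# Number of bulbils per Brazilian region, as reported in the text.
SAMPLES_PER_REGION = {
    "north": 4,
    "northeastern": 47,
    "south": 4,
    "southeastern": 3,
    "central-western": 2,
}


def incidence_percent(positive, tested):
    """Share of PCR-positive plants among all tested plants, in percent."""
    if tested <= 0:
        raise ValueError("at least one plant must be tested")
    return 100.0 * positive / tested


if __name__ == "__main__":
    tested = sum(SAMPLES_PER_REGION.values())
    print("Plants tested:", tested)
    print("Incidence (%):", incidence_percent(39, tested))
```

Because the primers also amplify integrated sequences, this percentage describes PCR positivity rather than confirmed episomal infection.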

Fig. 1 Dioscorea bulbifera plants exhibiting foliar chlorosis and leaf curling symptoms. Partial RT-RNase H sequences of Dioscorea bacilliform SN virus (a, isolate DBMU), Dioscorea bacilliform TR virus (b, isolate DBMT2), Dioscorea bacilliform BL virus (c, isolate DBM123), and Dioscorea rotundata endogenous virus (d, isolate DBB1) were recovered, and their phylogenetic relationships were reconstructed using Bayesian inference (e). Posterior probability values between 0.95 and 1.0 (filled circles) and between 0.50 and 0.94 (empty circles) are shown near each branch node. The isolates reported here are indicated in red

The isolates DBJ2 and DBJ3 shared 99.0% nucleotide identity with each other and showed the highest identity (84.3–95.2%) with DBALV sequences, while the isolates DBMT1 and DBMT2 shared 99.8% identity with each other and were most closely related to DBTRV, at 86.0–86.2% identity. The isolates DBAM1, DBJG, DBT6, DBC2, DBM6, DBM8, DBM9, and DBMU showed 97.9–99.4% nucleotide identity among themselves and shared the highest identity with DBSNV, at 82.3–83.2%. The isolates DB11, DBM123, DB31, DB32, DBB2, DBCO4, DBE, DBCU2, and DBH showed 82.6–99.1% nucleotide identity with one another and shared the highest nucleotide identity with Dioscorea rotundata endogenous pararetrovirus eDBV12, at 75.7–79.9%. Because these values fall below the 80% demarcation threshold, the sequences may represent a putative new badnaviral species, for which the name Dioscorea bacilliform BL virus is tentatively proposed (Table 1). Additional studies are needed to determine whether this new species represents an episomal or an integrated badnavirus. Finally, the isolates DB21, DBB1, DBG, DBT2, and DBT3 shared 99.4–100.0% nucleotide identity with one another and were identified as eDBV12, at 88.6–89.2% nucleotide identity (Table 1). The new sequences reported here were deposited in NCBI-GenBank under accession nos. OM628722–OM628747 (Table 2).
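The isolate assignments above can be tallied to confirm that they account for all sequenced samples and deposited accessions. The sketch below uses only the isolate names and the accession range given in the text; the data structure and function name are illustrative.

```python
# Isolates grouped by their closest match, as listed in the text.
ASSIGNMENTS = {
    "DBALV": ["DBJ2", "DBJ3"],
    "DBTRV": ["DBMT1", "DBMT2"],
    "DBSNV": ["DBAM1", "DBJG", "DBT6", "DBC2", "DBM6", "DBM8", "DBM9", "DBMU"],
    "DBBLV": ["DB11", "DBM123", "DB31", "DB32", "DBB2", "DBCO4", "DBE",
              "DBCU2", "DBH"],
    "eDBV12": ["DB21", "DBB1", "DBG", "DBT2", "DBT3"],
}

# Species already accepted by the ICTV among the groups above.
ICTV_ACCEPTED = {"DBALV", "DBTRV", "DBSNV"}


def accession_count(first, last, prefix="OM"):
    """Number of accessions in an inclusive range sharing one prefix."""
    return int(last[len(prefix):]) - int(first[len(prefix):]) + 1


if __name__ == "__main__":
    total = sum(len(v) for v in ASSIGNMENTS.values())
    accepted = sum(len(v) for k, v in ASSIGNMENTS.items() if k in ICTV_ACCEPTED)
    print("Sequenced isolates:", total)
    print("Isolates of ICTV-accepted species:", accepted)
    print("Accessions deposited:", accession_count("OM628722", "OM628747"))
```

The tally gives 26 isolates in total, 12 of which belong to ICTV-accepted species, matching both the number of sequenced samples and the size of the accession range.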

Table 1 Percentages of pairwise nucleotide identity based on partial RT-RNase H sequences of the new isolates and badnaviral isolates retrieved from NCBI-GenBank
Table 2 Badnaviruses associated with Dioscorea bulbifera in Brazil

Badnaviruses infect tropical and subtropical crops of great socioeconomic importance worldwide, including Dioscorea spp., and can lead to economic losses between 10 and 90% (Phillips et al. 1999; Briddon et al. 1999; Seal and Muller 2007; James et al. 2011; Eni et al. 2008; Kenyon et al. 2008; Bousalem et al. 2009; Silva et al. 2015; Deeshma and Bhat 2015; Bhat et al. 2016; Luo et al. 2022). Yam plants affected by badnaviruses usually exhibit disease symptoms such as leaf chlorosis and deformation, and dwarfism, which can reduce the photosynthetic capacity of the infected plant, with deleterious effects on production and tuber quality, and may lead to plant death (Thouvenel and Dumont 1988; 1990). Recently, a taxonomic positioning study in Badnavirus suggested that partial RT-RNase H sequences (~580 bp) are sufficient for species demarcation (Ferreira et al. 2019). Therefore, based on the ICTV-approved species demarcation criterion of ≥80% nucleotide identity for RT-RNase H sequences in the genus Badnavirus (Teycheney et al. 2020), isolates of DBALV, DBSNV, DBTRV, and DBBLV were found to be widely distributed in D. bulbifera growing areas in Brazil, reinforcing that this host harbors a high badnaviral species diversity that can complicate disease management. To our knowledge, this is the first report of DBTRV in D. bulbifera worldwide.

Endogenous pararetroviral sequences (EPRVs) have been shown to be integrated into the genome of African yams of the D. cayenensis-rotundata complex, but no evidence of EPRVs has been found in other yam species such as D. alata and D. sansibarensis (Bousalem et al. 2009; Seal et al. 2014; Umber et al. 2014). In the present study, EPRV sequences (eDBV12) previously reported in D. cayenensis-rotundata were also characterized from D. bulbifera plants using PCR primers amplifying the badnaviral RT-RNase H domains. These results emphasize that EPRVs present in yam genomes must be taken into account when implementing reliable molecular detection tools. Although no evidence of infectious EPRVs has been found in Dioscorea spp., they may represent a challenge for yam germplasm conservation and for the exchange of genetic materials between breeding programs.

The BI phylogenetic tree reconstructed from partial RT-RNase H sequences showed that the new sequences clustered in three different clades (Fig. 1e). The isolates sharing the highest nucleotide identity with Dioscorea endogenous sequences clustered into two sister subgroups, referred to as subclades 1a and 1b. Subclade 1a comprised the isolates representing the putative new species DBBLV (isolates DB11, DBM123, DB31, DB32, DBB2, DBCO4, DBE, DBCU2, and DBH), while the isolates DB21, DBB1, DBG, DBT2, and DBT3 clustered in a monophyletic group with eDBV12 sequences (subclade 1b; Fig. 1e). These results agree with the pairwise sequence comparisons and reinforce that the sequences in subclade 1a may represent a new badnaviral species. However, since only partial sequences were obtained in the present study, and these isolates were most closely related to eDBV12 endogenous sequences, additional studies are necessary to clarify their episomal or integrated origin.

The isolates DBJ2 and DBJ3 grouped together with other DBALV sequences in the phylogenetic subclade 1c, while the DBSNV (isolates DBAM1, DBJG, DBT6, DBC2, DBM6, DBM8, DBM9, and DBMU) and DBTRV (isolates DBMT1 and DBMT2) sequences were placed in divergent phylogenetic groups, clades 2 and 4, respectively (Fig. 1e). So far, eight distinct Dioscorea-infecting badnaviruses have been characterized at the genome level (Briddon et al. 1999; Seal and Muller 2007; Bömer et al. 2016; Umber et al. 2016; Sukal et al. 2017; Bömer et al. 2018; Sukal et al. 2020), with DBALV and DBSNV having previously been reported in association with D. bulbifera (Sukal et al. 2020; Nascimento et al. 2020). Although high badnaviral species diversity in D. bulbifera was observed here, additional samples from different growing regions must be analyzed and complete genomes need to be characterized. In addition, assessing the extant species diversity of badnaviruses infecting D. bulbifera and other Dioscorea species, and estimating the evolutionary mechanisms driving the diversification of these viruses, are important steps to improve disease identification and management.