Begomovirus (family Geminiviridae) is a genus of plant viruses that have circular ssDNA genomes packaged in twinned quasi-icosahedral particles and are transmitted by the whitefly Bemisia tabaci [1]. Begomoviruses are considered a limiting factor for production of economically important crops worldwide, especially in tropical and subtropical regions [2, 3]. They are also able to infect a wide range of weed/non-cultivated plants, which can act as alternate hosts when cultivated plants are absent, or as a source of new begomoviruses that emerge via recombination in cases of mixed infection [4, 5].

Previous studies have focused on the diversity of begomoviruses infecting weed plants from the family Euphorbiaceae, and several begomoviruses have been reported, including euphorbia mosaic virus [6], croton yellow vein mosaic virus [7], euphorbia yellow mosaic virus (EuYMV) [8], dalechampia chlorotic mosaic virus [9], and jatropha mosaic virus [10]. In Brazil, EuYMV has been reported to cause mosaic disease in Euphorbia heterophylla since the 1950s [11], and it has also been found naturally infecting Macroptilium atropurpureum, Sida santaremensis, Crotalaria juncea and Solanum lycopersicum [4, 12, 13]. To date, EuYMV is the only begomovirus infecting euphorbiaceous hosts in Brazil that has been completely characterized [8]. In 2006, begomovirus infection was reported in Cnidoscolus urens (family Euphorbiaceae) plants displaying symptoms of yellow mosaic and leaf deformation [14]. However, the genome sequence of that viral isolate is still unknown. Here, the full-length genome sequence of a new begomovirus infecting C. urens is reported.

In 2015, a sample of C. urens showing symptoms of mosaic and leaf deformation (Fig. 1A) was collected in the county of Messias, Alagoas State, Brazil. Total DNA was extracted from fresh leaves according to Doyle and Doyle [15]. DNA was used as a template for rolling-circle amplification (RCA) of the full-length begomovirus genomes [16]. The RCA products were individually cleaved with ApaI or HindIII restriction enzymes (Invitrogen™, ThermoFisher Scientific) and ligated into the pBluescript KS+ (Stratagene) plasmid vector, which had been cleaved with the same enzyme. Viral inserts were sequenced commercially by primer walking at Macrogen Inc.

Fig. 1 A. Symptoms of mosaic and leaf deformation in Cnidoscolus urens from which the begomovirus CnMLDV was obtained. B. Percent nucleotide sequence identities among full-length DNA-A of CnMLDV (BR-Mes3-15 isolate) and the most closely related begomoviruses obtained by BLASTn analysis

The complete begomovirus genome was assembled using CodonCode Aligner v. 4.1.1 (http://www.codoncode.com). The DNA-A and DNA-B nucleotide sequences were initially analyzed using the BLASTn algorithm [17] and the GenBank non-redundant nucleotide sequence database to identify the viruses with which they shared greatest similarity. The most similar sequences from GenBank were then used to classify the novel isolate using the program Sequence Demarcation Tool v. 1.2 [18].

Multiple sequence alignments were made for the full-length DNA-A and DNA-B datasets using the MUSCLE algorithm [19]. Bayesian inference was run using MrBayes v. 3.2.3 [20] through the CIPRES web portal [21]. The best nucleotide substitution model was determined using MrModeltest v. 2.3 [22] according to the Akaike Information Criterion (AIC), and the GTR+G+I model was used for both the DNA-A and DNA-B datasets. The analysis consisted of two replicates of four chains each, run for 10 million generations with sampling every 1,000 generations (a total of 10,000 trees). The first 2,500 trees were discarded as burn-in. Posterior probabilities [23] were determined from a majority-rule consensus tree generated from the remaining 7,500 trees. The trees were edited in FigTree v. 1.4 (tree.bio.ed.ac.uk/software/figtree).
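The tree counts quoted above follow directly from the chain length and the sampling interval. The short Python sketch below only reproduces that arithmetic, taking the stated figures at face value; the variable names are illustrative, and the calculation assumes that one tree is counted per sampling interval (it does not model how MrBayes itself records samples).

```python
# Sampling arithmetic for the Bayesian analysis, using the values stated in the text.
generations = 10_000_000  # chain length (from the text)
sample_every = 1_000      # sampling interval in generations (from the text)
burnin_trees = 2_500      # trees discarded as burn-in (from the text)

# Assumption: one tree is counted per sampling interval.
sampled_trees = generations // sample_every
retained_trees = sampled_trees - burnin_trees
burnin_fraction = burnin_trees / sampled_trees

print(sampled_trees, retained_trees, burnin_fraction)
```

Under these assumptions the burn-in corresponds to the first quarter of the sampled trees.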

Recombination analysis was performed using the full-length DNA-A and DNA-B sequences determined in this study together with reference sequences of begomoviruses from Brazil and other countries in Central and South America (Supplementary Table S1). Putative parents and recombination breakpoints were identified using Recombination Detection Program (RDP) v. 4 [24]. Only recombination events detected by at least four different methods were considered reliable.
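The reliability rule described above (support from at least four different detection methods) can be expressed as a simple filter. The Python sketch below is illustrative only: the event names and the method lists are hypothetical placeholders rather than results from this study, and the function `reliable_events` is not part of RDP4.

```python
MIN_METHODS = 4  # threshold stated in the text

def reliable_events(events, min_methods=MIN_METHODS):
    """Return the names of events supported by at least `min_methods` distinct methods."""
    return [
        name
        for name, methods in events.items()
        if len(set(methods)) >= min_methods
    ]

# Hypothetical input: event name -> detection methods that flagged the event.
example_events = {
    "event_1": ["RDP", "GENECONV", "MaxChi", "Chimaera", "SiScan"],
    "event_2": ["RDP", "MaxChi"],
    "event_3": ["GENECONV", "BootScan", "MaxChi", "3Seq"],
}

print(reliable_events(example_events))
```

Counting distinct method names (via `set`) avoids overstating support if a method is listed twice for the same event.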

Complete sequences of two different clones, corresponding to a DNA-A and a DNA-B component (KT966771 and KT966772), were obtained from the same sample of C. urens and are represented by the isolate BR-Mes3-15. Based on pairwise comparisons of DNA-A sequences and the ≥91 % nucleotide identity criterion established by the Geminiviridae Study Group of the International Committee on Taxonomy of Viruses [25], this isolate was classified as a new begomovirus (Fig. 1B), because its highest DNA-A identity to any known begomovirus falls below that threshold. The virus reported here was most closely related to passionfruit severe leaf distortion virus (PSLDV; 86.3 % identity for the complete DNA-A), and shared only 72.8 % nucleotide sequence identity with EuYMV (a begomovirus commonly found in euphorbiaceous hosts). The DNA-B showed the highest nucleotide sequence identity (71.8 %) to an isolate of tomato mottle leaf curl virus (ToMoLCV, JF803264). The name cnidoscolus mosaic leaf deformation virus (CnMLDV) is proposed for this new begomovirus.
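The demarcation logic above can be illustrated with a small Python sketch. It is a simplified stand-in and not the Sequence Demarcation Tool itself: it assumes the two sequences are already aligned, scores identity only over columns in which neither sequence has a gap, and uses hypothetical function names. The 91 % threshold and the 86.3 % value are taken from the text; the short sequences in the usage lines are toy examples.

```python
SPECIES_THRESHOLD = 91.0  # percent DNA-A identity criterion cited in the text

def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

def is_new_species(best_identity, threshold=SPECIES_THRESHOLD):
    """True when the highest DNA-A identity to any known virus is below the threshold."""
    return best_identity < threshold

# Toy aligned sequences (hypothetical), then the DNA-A value reported in the text.
print(round(percent_identity("ACGTAC", "ACGTTC"), 1))
print(is_new_species(86.3))
```

With the reported best match of 86.3 %, the check returns True, which is consistent with the classification of the isolate as a new begomovirus.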

The genome of CnMLDV contained five ORFs in the DNA-A and two in the DNA-B (data not shown). The cognate DNA-A and DNA-B components had the conserved nonanucleotide (5’-TAATATT/AC-3’) at the origin of replication and three identical iterons (TGGGGTC) located upstream from the Rep TATA box, and they shared 97.3 % nucleotide sequence identity in the common region (CR, 185 nt). The Bayesian phylogenetic tree based on the DNA-A sequences placed CnMLDV in the same cluster as the PSLDV isolate, reinforcing their genetic relationship (Fig. 2A). The CnMLDV DNA-B clustered apart from PSLDV, in agreement with the SDT analysis, in which they shared only 69.2 % nucleotide sequence identity (Fig. 2B).
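Locating the nonanucleotide and the iterons in a common region amounts to a motif search. The Python sketch below shows one way this could be done; the two motifs are taken from the text, whereas the toy sequence is a hypothetical construct (not the CnMLDV common region) and `find_all` is an illustrative helper that searches one strand only.

```python
NONANUCLEOTIDE = "TAATATTAC"  # 5'-TAATATT/AC-3' written without the nick-site slash
ITERON = "TGGGGTC"            # iteron sequence given in the text

def find_all(sequence, motif):
    """Return the 0-based start positions of every occurrence of `motif`."""
    sequence = sequence.upper()
    motif = motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)  # step by one to allow overlaps
    return positions

# Hypothetical toy sequence: three iterons followed by the nonanucleotide.
toy_cr = (
    "AA" + ITERON + "AAT" + ITERON + "TT" + ITERON
    + "CGTATAAGGCG" + NONANUCLEOTIDE + "CG"
)

iteron_positions = find_all(toy_cr, ITERON)
nona_positions = find_all(toy_cr, NONANUCLEOTIDE)
print(len(iteron_positions), iteron_positions)
print(nona_positions)
```

A real analysis would also need to scan the complementary strand and to relate the positions to the Rep TATA box, which this sketch does not attempt.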

Fig. 2 Midpoint-rooted Bayesian phylogenetic trees based on the DNA-A (A) and DNA-B (B) nucleotide sequences of CnMLDV (BR-Mes3-15 isolate; in red) and the most closely related begomoviruses. ToLCNDV, an Old World begomovirus, was used as an outgroup (color figure online)

RDP4 analysis revealed evidence of a single recombination event in the DNA-A of CnMLDV, with putative recombination breakpoints located in the CR and REn (Supplementary Table S2). ToMoLCV, a begomovirus commonly found infecting tomato plants in northeastern Brazil [26], was identified as the putative major parent, with the minor parent being unknown. For the CnMLDV DNA-B, one recombination event with putative recombination breakpoints located in the CR was detected, with macroptilium yellow net virus as the putative major parent (Supplementary Table S2). Although RDP4 analysis is unable to determine the actual sequence of the parents, these results suggest that begomoviruses infecting cultivated hosts have contributed to the emergence and evolution of CnMLDV. Together, these results support the classification of CnMLDV as a member of a distinct species in the genus Begomovirus, reinforcing the idea that non-cultivated/weed plants are natural reservoirs of genetic diversity in this viral group.