The genus Begomovirus (family Geminiviridae) comprises one of the most important groups of emerging plant viruses [1]. Begomoviruses can have mono- or bipartite genomes [2]. The genomic DNA of monopartite begomoviruses and the DNA-A component of Old World bipartite begomoviruses have an analogous genomic organization [3, 4] with six genes involved in replication (Rep and REn), transcription (TrAP), suppression of host defenses (TrAP and AC4), movement (V2), and encapsidation of viral progeny (CP). The DNA-B of bipartite begomoviruses contains two genes required for virus movement (NSP and MP) [5, 6]. Transcription of the viral genome in begomoviruses is bidirectional [7], and the viral and complementary transcription units of the genome are separated by an approximately 200-nt intergenic region (IR), known as the common region (CR) in bipartite viruses. Sequence identity between the cognate DNA-A and DNA-B CRs is usually very high (>94%). The CR contains the nonanucleotide 5'-TAATATTAC-3', which is conserved in all begomoviruses and constitutes the origin of replication, and specific binding sites for the Rep protein, known as iterons [8,9,10].

Pyrenacantha spp. (family Icacinaceae) are dioecious climbing herbs that originate from a subterranean perennial tuberous rootstock. They are endemic to East Africa, being found in Tanzania, Malawi, Zambia, Zimbabwe, and Mozambique [11]. In this study, we identified a new bipartite begomovirus that infects Pyrenacantha sp. To the best of our knowledge, this is the first report of a begomovirus infecting Pyrenacantha spp. anywhere in the world.

A Pyrenacantha sp. plant with symptoms of yellow mosaic (Fig. 1A) was collected in a maize field adjacent to a soybean field in the district of Malema (14°57.690'S, 37°23.428'E, 628 masl), Nampula Province, Mozambique, in March 2019. At the time of collection, no whiteflies were observed on the Pyrenacantha sample, but they were present on the soybean plants. The identity of the Pyrenacantha sample was confirmed by DNA barcoding using the chloroplast matK and rbcL genes (GenBank accession numbers OK345031 and OK345032, respectively). Total DNA was extracted from leaf discs [12] and used as a template for rolling-circle amplification (RCA) [13]. The RCA product was cleaved with HindIII or SacI to obtain monomeric genomic fragments for cloning. Fragments of about 2.7 kbp, corresponding to one genomic copy of a begomovirus genomic component, were ligated to a pBluescriptKS+ vector (Stratagene) that had previously been cleaved with the same enzyme. Recombinant plasmids were introduced by transformation into E. coli DH5α [14], and clones were sequenced commercially (Macrogen, South Korea). The full-length genome sequence was assembled using SeqAssm v.1.0 (www.sequentix.de).
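
The cloning step above relies on RCA producing long head-to-tail concatemers of a circular component, so that an enzyme cutting once per genome copy releases unit-length monomers. The following is a minimal sketch of that logic, not the procedure used in the study. The recognition sequences for HindIII (AAGCTT) and SacI (GAGCTC) are standard; the function names and the toy sequence are hypothetical and chosen only for illustration.

```python
# Recognition sequences of the two enzymes named in the text (both palindromic,
# so searching one strand is sufficient).
SITES = {"HindIII": "AAGCTT", "SacI": "GAGCTC"}


def count_sites_circular(genome, site):
    """Count occurrences of `site` in a circular sequence,
    including a match that spans the arbitrary start/end of the string."""
    seq = genome.upper()
    extended = seq + seq[: len(site) - 1]  # wrap around the origin
    return sum(1 for i in range(len(seq)) if extended.startswith(site, i))


def concatemer_fragments(genome_len, cut_positions):
    """Fragment lengths released from a long head-to-tail concatemer
    (the two end fragments of the concatemer are ignored)."""
    if not cut_positions:
        return []  # enzyme does not cut: no monomers are released
    cuts = sorted(cut_positions)
    n = len(cuts)
    # Distance from each cut to the next one, wrapping around the unit length.
    return [(cuts[(i + 1) % n] - cuts[i]) % genome_len or genome_len
            for i in range(n)]


# Hypothetical toy "genome": its only HindIII site spans the origin.
toy = "CTTGGGCCCATATATGCGCGAAG"
for enzyme, site in SITES.items():
    print(enzyme, count_sites_circular(toy, site))
```

In this sketch, an enzyme with exactly one site per component gives a single fragment equal to the component length, which is why fragments of about 2.7 kbp are the ones of interest; an enzyme with no site leaves the concatemer uncut, and one with several sites yields only sub-genomic fragments.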

Fig. 1

(A) Symptoms of yellow mosaic in the Pyrenacantha sp. sample. (B) Bayesian phylogenetic tree based on the full-length nucleotide sequences of the DNA-A of PyYMV (marked in red) and other begomoviruses reported from Africa. Labels indicate their abbreviations and GenBank accession numbers. Nodes with posterior probability values equal to or greater than 0.8 are indicated by black circles, and nodes with values between 0.6 and 0.79 are indicated by grey circles. The scale bar represents the number of nucleotide substitutions per site. ACMV, African cassava mosaic virus; AsMMV, asystasia mosaic Madagascar virus; CMMGV, cassava mosaic Madagascar virus; CLCGeV, cotton leaf curl Gezira virus; CYMV, cotton yellow mosaic virus; CPGMV, cowpea golden mosaic virus; DaLCV, datura leaf curl virus; DMV, deinbollia mosaic virus; DesMoV, desmodium mottle virus; EACMV, East African cassava mosaic virus; PepYVMLV, pepper yellow vein Mali virus; SACMV, South African cassava mosaic virus; SbCBV, soybean chlorotic blotch virus; SbMMoV, soybean mild mottle virus; TelGMV, telfairia golden mosaic virus; TbLCZV, tobacco leaf curl Zimbabwe virus; ToCSV, tomato curly stunt virus; TLCBFV, tomato leaf curl Burkina Faso virus; ToLCNaV, tomato leaf curl Namakely virus; TYLCV, tomato yellow leaf curl virus. The tree is rooted with the New World begomovirus bean golden mosaic virus (BGMV).

Sequences were initially analysed using the BLASTn algorithm [15] to identify the viruses with which they shared the most sequence similarity. Closely related begomovirus sequences were used to determine the taxonomic position of the new isolate using Sequence Demarcation Tool v. 1.2 [16], according to the current criteria for species demarcation (<91% nt sequence identity for the DNA-A) [8]. The DNA-B was identified as a cognate component based on sequence identity in the common region. A search for ORFs was performed using the Geminivirus Data Warehouse platform [17] and ORF Finder (www.ncbi.nlm.nih.gov/projects/gorf/). A multiple sequence alignment was obtained using the MUSCLE algorithm implemented in MEGA X [18], and phylogenetic trees were built through Bayesian inference with MrBayes v. 3.2.7a [19]. The trees were visualized and edited with FigTree (www.tree.bio.ed.ac.uk/software/figtree/).
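
The species demarcation step described above reduces to comparing the highest pairwise DNA-A identity against the 91% threshold. The sketch below illustrates that decision on sequences that are assumed to be already aligned; it is not the Sequence Demarcation Tool itself, which performs its own pairwise alignments. Scoring identity only over columns where neither sequence has a gap is a common convention, and whether it matches the exact scoring used in the study is an assumption. The function names are hypothetical.

```python
SPECIES_THRESHOLD = 0.91  # DNA-A demarcation criterion quoted in the text


def pairwise_identity(aligned_a, aligned_b):
    """Fraction of identical positions between two pre-aligned sequences,
    ignoring columns in which either sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: excluded from the comparison
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return matches / compared


def is_new_species(identities_to_known_dna_a):
    """True if even the closest known DNA-A falls below the threshold."""
    return max(identities_to_known_dna_a) < SPECIES_THRESHOLD


# Toy alignment (hypothetical sequences): 8 matches over 9 ungapped columns.
print(pairwise_identity("ACGT-ACGTA", "ACGTTACGAA"))
```

With a best DNA-A match well below 0.91, `is_new_species` returns `True`, which is the situation the demarcation criterion is designed to flag.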

We cloned and sequenced three begomovirus components (two DNA-As and one DNA-B). The two DNA-A sequences are 100% identical. Sequence comparisons using the BLASTn algorithm and the SDT program revealed a maximum nucleotide (nt) sequence identity of 78.5% with the DNA-A of tomato leaf curl Namakely virus (ToLCNaV, GenBank accession number AM701764) for clones MZ-Mal170.2-19 and MZ-Mal170.11-19, and 70.2% nt sequence identity with the DNA-B of deinbollia mosaic virus (DMV, KT878825) for clone MZ-Mal170.13-19. Based on current ICTV criteria for begomovirus species demarcation, the begomovirus cloned from Pyrenacantha sp., named Pyrenacantha yellow mosaic virus (PyYMV), represents a new species, for which the name "Begomovirus pyrenacanthae" is proposed. The two DNA-A components (both 2,766 nt long; GenBank accession numbers MZ390982 and MZ390983) have a genomic organization typical of Old World begomoviruses with six genes, two in the viral sense (CP and AV2) and four in the complementary sense (Rep, TrAP, REn and AC4). The DNA-B component (2,726 nt; MZ390984) has one gene in the viral sense (NSP) and one in the complementary sense (MP).

Examination of the DNA-A and DNA-B CRs indicated a 35-nt insertion in the DNA-A CR, starting 38 nt downstream from the second direct repeat and ending 14 nt upstream from the nonanucleotide (Fig. 2A). As a result of this insertion, the DNA-A and DNA-B CRs have only 82.7% nt sequence identity. Nevertheless, the two CRs have identical iterons, with the same inverted and direct repeats (TACCCC-GGtGTA-GGGGTA). Moreover, the iteron-related domain of the Rep protein (MPPSRFKVN; Fig. 2B) is predicted to recognise the GGTG iteron core sequence [20]. The sequence identity between the CRs when the 35-nt insertion is removed from the alignment is 96%. Taken together, the MspI restriction pattern of the RCA amplicon (indicating the presence of only two begomovirus components in the sample; data not shown), the identical iterons, and the high nt sequence identity between the CRs once the 35-nt insertion is excluded indicate that the DNA-A and DNA-B components cloned from the Pyrenacantha sp. sample are indeed cognate components of the same bipartite begomovirus. The 35-nt insertion likely originated from a recombination event. However, recombination analysis with RDP v. 4 [21] did not indicate a recombination event in the DNA-A, and BLAST analysis of the 35-nt insertion indicated no significant similarity to sequences in the databases.
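
The two identity values reported above (82.7% with the insertion, 96% without it) can be related by simple arithmetic. The sketch below assumes, and this is not stated in the text, that the 35 insertion columns were scored as mismatches for the 82.7% figure and simply dropped for the 96% figure, with every other alignment column unchanged. The function names are hypothetical.

```python
def identity_with_insertion(identity_without, shared_columns, insertion_len):
    """Identity over the full alignment when the insertion columns
    are counted as mismatches (assumed scoring convention)."""
    matches = identity_without * shared_columns
    return matches / (shared_columns + insertion_len)


def implied_shared_columns(identity_without, identity_with, insertion_len):
    """Solve a*N / (N + k) = b for N, the number of alignment columns
    outside the insertion (a: identity without, b: identity with, k: insertion)."""
    if identity_without <= identity_with:
        raise ValueError("identity without the insertion must be the larger value")
    return identity_with * insertion_len / (identity_without - identity_with)


# Values taken from the text: 96% without the insertion, 82.7% with it, 35 nt.
n = implied_shared_columns(0.96, 0.827, 35)
print(round(n))                                               # 218
print(round(identity_with_insertion(0.96, round(n), 35), 3))  # 0.827
```

Under these assumptions, the implied number of shared columns is in the same range as the approximately 200-nt intergenic region described for begomoviruses, so the two reported values are mutually consistent. A different gap-scoring convention would give a different figure.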

Fig. 2

(A) Nucleotide sequence alignment of the common regions (CRs) of PyYMV DNA-A and DNA-B. The iterons (two direct repeats and one inverted repeat) are underlined with the "t" of the imperfect direct repeat in lowercase, the nonanucleotide is in bold italics, and the 35-nucleotide insertion in the DNA-A CR is highlighted in grey. (B) Amino acid sequence alignment of the Rep protein N-terminal region of PyYMV and the most closely related begomoviruses. The iteron-related domain (IRD) is indicated in bold. Motif 1, involved in rolling-circle replication, is underlined.

Phylogenetic analysis indicated that the PyYMV DNA-A forms a separate branch within a cluster that also contains DMV and asystasia mosaic Madagascar virus, two begomoviruses isolated from weeds (Fig. 1B). The DNA-B also clusters with DMV (Supplementary Fig. S1). In both cases, genetic distances between the PyYMV sequences and those of its closest relatives are large (for example, the distance between PyYMV and DMV is twice as large as that between tobacco leaf curl Zimbabwe virus and tomato curly stunt virus, or between East African cassava mosaic virus and South African cassava mosaic virus; Fig. 1B), indicating a high degree of divergence between PyYMV and other begomoviruses.

We have identified and molecularly characterized a new Old World, bipartite begomovirus infecting Pyrenacantha sp. in Mozambique. Non-cultivated plants can contribute to the prevalence and distribution of viruses in crops and act as a reservoir of begomovirus diversity [22,23,24]. Emerging viruses usually spread to a new host population from the wild host population [25]. Our surveys of common bean, soybean, and cowpea fields near the area where PyYMV was found did not detect this virus in these crops or in associated weeds (although no other Pyrenacantha spp. samples were found; B.A.I. Chipiringo and F.M. Zerbini, unpublished results), and a recent study in tomato fields in Mozambique found a single begomovirus, tomato curly stunt virus (ToCSV) [26]. Thus, PyYMV may be restricted to Pyrenacantha spp., similar to other begomoviruses from non-cultivated hosts, which are also restricted to a single host species ("sealed containers") [27]. Whether it remains restricted to this host will depend on the virus- and ecosystem-specific factors that favor spillover events, such as host range and vector interactions (including the ability of Pyrenacantha spp. to support whitefly populations), or changes in agricultural practices that may lead to ecosystem simplification and increases in vector populations [28]. Pyrenacantha is a perennial shrub distributed across East Africa, and it is likely that it has been coexisting with PyYMV for a long time. Considering the rapid changes in agricultural practices currently taking place in Mozambique (such as increased areas of monoculture, extended growing seasons, irrigation, and mechanization replacing subsistence farming) [29, 30], the potential of PyYMV to emerge in crop plants should not be underestimated.