Most described members of the genus Mastrevirus of the family Geminiviridae infect monocotyledonous hosts. Although the first mastrevirus infecting a dicotyledonous plant, tobacco yellow dwarf virus (TbYDV), was characterised at the sequence level in 1992 [40], it has taken some time for further dicot-infecting mastreviruses to be characterised so as to allow a better picture of the diversity of these unique geminiviruses to emerge (Fig. 1A). Dicot-infecting mastreviruses have been identified in Africa, across the Middle East, in southern Asia and in Australia [14, 29, 41, 51]. Initially the centre of diversity of these viruses appeared to have been the Middle East/southern Asia, where three related viruses, bean yellow dwarf virus (BeYDV), chickpea chlorotic dwarf Sudan virus (CpCDSV) and chickpea chlorotic dwarf Pakistan virus (CpCDPKV) have been characterised [41]. However, recent identification of three new divergent Australian mastreviruses, chickpea chlorosis virus-A (CpCV-A), chickpea chlorosis virus-B (CpCV-B) and chickpea redleaf virus (CpRLV) [51] suggests that the centre of dicot-infecting mastrevirus diversity probably lies somewhere around the Pacific rim (Figs. 1 and 2).

Fig. 1

A Maximum-likelihood tree (based on an alignment of complete genome nucleotide sequences) depicting the evolutionary relationships between 12 dicot-infecting mastreviruses (rooted with oat dwarf virus [ODV]). The ML tree was constructed using PhyML [12], with F84+G4 chosen as the best-fit model by RDP3 [39]. The numbers associated with tree branches are indicative of the percentage of 100 full maximum-likelihood bootstrap replicates supporting the existence of the branches. B Two-dimensional graphical representation of pairwise genome-wide nucleotide sequence identities (calculated with pairwise deletion of gaps; scale represents percentage identity) between the 12 dicot-infecting mastrevirus isolates. C Schematic representation of detectable recombination events amongst the dicot-infecting mastreviruses. The colours of blocks correspond to the different mastrevirus species. D Details of recombination between dicot-infecting mastreviruses detected using RDP3 [38, 39]. R, G, B, M, C, S and T indicate detection by the RDP, GENECONV, BOOTSCAN, MAXCHI, CHIMAERA, SISCAN and 3SEQ methods, respectively, with the presented p-value being that determined by the method indicated in bold type

Fig. 2

A Maximum-likelihood phylogenetic relationships of the 12 dicot-infecting mastreviruses based on alignments of the predicted amino acid sequences of the replication-associated protein (Rep), coat protein (CP) and movement protein (MP), constructed with PhyML [12] using the LG model. Numbers associated with tree branches are indicative of the percentage of 100 full maximum-likelihood bootstrap replicates supporting the existence of the branches. B Two-dimensional graphical representation of pairwise amino acid sequence similarities (calculated with pairwise deletion of gaps; scale represents percentage identity) between the predicted Rep, CP and MP of the 12 dicot-infecting mastreviruses

With the exceptions of BeYDV, which causes problems in French beans (Phaseolus vulgaris) in South Africa, TbYDV, which infects tobacco, and as yet only partially characterised sugar-beet- and sweet-potato-infecting mastreviruses [10, 23], the dicot-infecting mastreviruses have mostly been found in chickpea and other pulses such as lentils [10, 18, 19, 24, 32–34, 41, 48, 51]. Whereas a partially characterised Iranian sugar-beet-infecting virus seems quite closely related (88–91% similarity within the 1401-nt sequenced genome region) to Pakistani dicot-infecting mastreviruses, a 177-nt sequence from a sweet-potato-infecting mastrevirus may be a little more divergent, sharing between 79 and 84% nucleotide identity with all other dicot-infecting mastreviruses.
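Identity values of this kind, like those in Figs. 1B and 2B, are described as being calculated with pairwise deletion of gaps. A minimal Python sketch of that calculation is given below, assuming the two sequences are rows of the same alignment; the function name and the toy sequences are hypothetical and are not taken from the data analysed here.

```python
def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage identity between two aligned sequences.

    Columns in which either sequence carries a gap are skipped
    (pairwise deletion), so the denominator is the number of
    columns where both sequences have a residue.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be rows of the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # pairwise deletion of gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns shared by the two sequences")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical): 8 comparable columns, 7 matches.
print(pairwise_identity("ATG-CATGCA", "ATGACA-GTA"))  # 87.5
```

Because gapped columns are dropped pair by pair rather than across the whole alignment, each comparison uses as many sites as that particular pair allows.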

The mastreviruses, in common with all other geminiviruses, have small (~2700 nt) single-stranded DNA genomes that are encapsidated in quasi-icosahedral (geminate) particles. The mastreviruses are all apparently transmitted plant-to-plant by leafhopper vectors, with specific vector species varying both from continent to continent and between the monocot- and dicot-infecting viruses. Relative to the other geminiviruses, mastreviruses have fewer genes and tend to have smaller genomes. Mastrevirus genomes also have a unique arrangement amongst the geminiviruses and, for example, express two (instead of one, as in other geminiviruses) distinct replication-associated proteins, Rep and RepA, from alternatively spliced complementary-sense transcripts of the rep gene [47]. Whereas RepA is expressed from unspliced transcripts and is involved in the activation of host genes required for virion-strand DNA replication [20, 49, 58] and virus-encoded genes expressed from the virion strand [8], the Rep protein is expressed from spliced transcripts and is required to both initiate and terminate virion-strand replication during rolling-circle replication [17]. The two other mastrevirus genes, both expressed from the virion strand, encode the movement protein (MP), required for cell-to-cell movement in plants, and the coat protein (CP), which is the only structural protein within virus particles, determines vector specificity and is also required for cell-to-cell movement in plants [3, 4, 30, 31].

Although the overall genome arrangements of the dicot- and monocot-infecting mastreviruses are similar, the presumed jump of an ancestral monocot-infecting mastrevirus to a dicot host apparently involved major host-adaptive changes in all of the genes, because genes from monocot-infecting mastreviruses such as maize streak virus (MSV) can no longer complement the functions of genes from dicot-infecting mastreviruses such as BeYDV [31].

Further contributing to the divergence of monocot- and dicot-infecting mastreviruses would have been the fact that, once the jump into dicots had occurred, there would have been a significant ecological barrier to genetic exchange between the two groups by recombination. Genetic recombination is an important evolutionary process that has been shown to have played a major role in the evolution of mastreviruses and all other geminiviruses. For example, whereas ancient inter-genus recombination events may have created members of whole new geminivirus genera [6, 22, 56], intra-genus recombination has been implicated in the emergence of, amongst other important agricultural pathogens, the MSV and East African cassava mosaic virus strains that are currently threatening the food security of impoverished African countries [15, 44, 54, 59].

With the recent discovery of extensive dicot-infecting mastrevirus diversity in the Middle East [41] and Australia [51], enough full-genome sequences are now available to determine whether recombination between dicot-infecting mastreviruses occurs and, if so, whether it has the same characteristics as that occurring amongst monocot-infecting mastreviruses.

We therefore aligned twelve full dicot-infecting mastrevirus genome sequences obtained from GenBank (11-01-2011) using MUSCLE (default settings; [9]) and analysed these using various recombination detection and characterisation methods implemented in the program RDP3 [39]. Specifically, in RDP3, we detected evidence of individual recombination events and recombination breakpoint positions using the RDP [35], BOOTSCAN [5, 37], GENECONV [43], MAXCHI [50], CHIMAERA [45], 3SEQ [2] and SISCAN [11] methods and identified parental and recombinant sequences using the VisRD method [28] and modified versions of the PHYLPRO [57] and EEEP [1] methods also implemented in RDP3 (with program defaults used throughout, except that only recombination events detectable with four or more methods are reported here).
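The reporting criterion (detection by four or more of the seven methods) can be sketched as a simple filter. The Python below is only an illustration of that rule, not RDP3's own output handling: the one-letter codes follow the Fig. 1D legend, whereas the event labels and detection patterns are hypothetical.

```python
# One-letter method codes as used in the Fig. 1D legend.
METHODS = {
    "R": "RDP", "G": "GENECONV", "B": "BOOTSCAN", "M": "MAXCHI",
    "C": "CHIMAERA", "S": "SISCAN", "T": "3SEQ",
}


def supported_events(events: dict, min_methods: int = 4) -> dict:
    """Keep events detected by at least `min_methods` distinct methods.

    `events` maps an event label to a string of method codes,
    e.g. "RGBM" for detection by RDP, GENECONV, BOOTSCAN and MAXCHI.
    """
    kept = {}
    for label, codes in events.items():
        detected = {code for code in codes if code in METHODS}
        if len(detected) >= min_methods:
            kept[label] = sorted(METHODS[code] for code in detected)
    return kept


# Hypothetical detection patterns, for illustration only.
toy_events = {"a": "RGBMCST", "b": "RGB", "c": "MCST"}
print(supported_events(toy_events))  # events "a" and "c" pass the filter
```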

We identified clear evidence of at least seven recombination events within the twelve genomes analysed (Fig. 1). As has been seen before in the monocot-infecting mastreviruses such as MSV, Panicum streak virus (PanSV) and wheat dwarf virus [7, 13, 46, 53–55], most of the observed interspecies recombination events have involved transfers of small sequence fragments (between 54 and 108 nt; events 3, 7, 6 and 5 in Fig. 1C and D) and are present in multiple sampled genomes (i.e., they are within obviously non-defective circulating virus lineages). Notably, two of these small events (events 3 and 6) occur within or near the short intergenic region (SIR) at sites where similarly sized recombination events are frequently observed in monocot-infecting mastreviruses [54, 55]. Furthermore, in the monocot-infecting mastrevirus MSV, it has been shown experimentally, firstly, that the SIR, unlike other genome regions, is highly modular and continues functioning properly even when transferred into genetic backgrounds that are highly divergent from those in which it evolved [36] and, secondly, that the SIR is probably more mechanistically predisposed to recombination than any other genome region [52]. This implies that the SIR sequences of the dicot-infecting viruses may be similar to those of their monocot-infecting counterparts in terms of both modularity and increased predisposition to recombination.

Also notable amongst the observed recombination events are the larger ones (events 1, 2 and 4 in Fig. 1C and D), which appear to have occurred either amongst the Australian viruses (events 2 and 4) or between Australian viruses and currently unsampled viruses (event 1). Large recombination events of this type, as well as their relative positions in the genome (usually spanning large parts of rep), have also been observed amongst the monocot-infecting mastreviruses [53, 54], other geminiviruses [25, 43] and other single-stranded DNA viruses [26] and may indicate that rep genes in general are innately recombinogenic. Using various sequence-analysis and experimental approaches, it has in fact been shown that, in various geminiviruses and other single-stranded DNA viruses with bidirectionally transcribed genes, estimated basal recombination rates (i.e., the rates at which recombination is mechanistically occurring) tend to be higher in the complementary-sense genes (particularly in rep) than in the virion-sense genes [21, 26, 42].

To better illustrate this concordance between the size and distribution of recombination events observable within the dicot- and monocot-infecting mastrevirus genomes, we constructed a recombination breakpoint distribution map for the dicot-infecting viruses (as described by Heath et al. [16]) and compared this to an analogous map produced by Varsani et al. [54] for the monocot-infecting mastreviruses (Fig. 3). Despite only 14 recombination breakpoints being available to construct the dicot-infecting mastrevirus plot, there are striking similarities between the two breakpoint distribution plots. In both plots, there are recombination breakpoint clusters in and around the SIR and more recombination breakpoints within the complementary-sense ORFs than in the virion-sense ORFs. The most notable differences between the plots are the absence in the dicot-infecting mastrevirus plot of both breakpoints around the virion-strand origin of replication and a statistically significant recombination hotspot within the SIR (i.e., the SIR breakpoint cluster does not emerge above the grey areas, indicating the expected 95 and 99% breakpoint density intervals, as it does in the monocot-infecting mastrevirus plot). However, both of these differences are probably attributable to the small numbers of recombination breakpoints used to produce the plot for the dicot-infecting mastreviruses.
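The idea behind a breakpoint distribution plot can be illustrated with a short Python sketch: breakpoints are counted within a sliding window (200 nt, as in Fig. 3) moved around the circular genome, and the observed counts are compared with those obtained when the same number of breakpoints is placed at random. This is a deliberately crude, genome-wide simplification and is not the 'local' hot- and cold-spot test of Heath et al. [16]; the genome length of 2700 nt, the uniform random placement and the breakpoint positions are all assumptions made for illustration.

```python
import random


def window_counts(breakpoints, genome_length, window=200):
    """Breakpoints falling in a `window`-nt window around each position
    of a circular genome (positions wrap past the end)."""
    half = window // 2
    counts = [0] * genome_length
    for bp in breakpoints:
        for offset in range(-half, half):
            counts[(bp + offset) % genome_length] += 1
    return counts


def permutation_threshold(n_breakpoints, genome_length, window=200,
                          n_perm=200, quantile=0.95, seed=1):
    """Quantile of the maximum window count seen when breakpoints are
    placed uniformly at random (assumption: no site-specific bias)."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_perm):
        randomised = [rng.randrange(genome_length)
                      for _ in range(n_breakpoints)]
        maxima.append(max(window_counts(randomised, genome_length, window)))
    maxima.sort()
    return maxima[min(len(maxima) - 1, int(quantile * len(maxima)))]


# Hypothetical breakpoint positions on a 2700-nt circular genome.
toy_breakpoints = [100, 150, 2650]
counts = window_counts(toy_breakpoints, 2700)
print(max(counts), permutation_threshold(len(toy_breakpoints), 2700))
```

With few breakpoints the randomised maxima sit close to the observed peak, which is consistent with the point made above that a small number of breakpoints limits the power to detect a significant hotspot.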

Fig. 3

Conserved recombination breakpoint distributions in dicot- and monocot-infecting mastreviruses. A Dicot-infecting mastrevirus recombination breakpoint distribution plot (solid black line) indicating clustering of detectable recombination breakpoint positions (vertical lines above the plot) within a 200-nucleotide sliding window. Light and dark shaded regions, respectively, represent 99 and 95% confidence intervals of the ‘local’ recombination breakpoint hot- and cold-spot test of Heath et al. [16]. B Recombination breakpoint distribution plot for monocot-infecting mastreviruses (from Varsani et al. 2008). Positions of genomic features are indicated above the plots: horizontal arrows labelled V and C, respectively, represent virion- and complementary-sense genes. The V2 gene encodes the movement protein, the V1 gene encodes the coat protein, the C1 and C2 genes encode the replication-associated protein (translated from spliced transcripts), and the C1 gene encodes the replication-associated protein A. Vertical black arrows indicate the virion-strand replication origin

Similar patterns of detectable recombination between the monocot- and dicot-infecting mastreviruses suggest that, in these two groups, breakpoint distributions are largely being shaped by similar factors. Among these would be ecological factors influencing rates of virus co-infection (such as the degree of vector and virus geographical and host-range overlap), mechanistic factors influencing where recombination breakpoints occur (such as those determining where within the viral genomes either breakage or replication stalling is most likely to occur), and selective factors influencing which recombinants are able to survive in nature. As the number of sequenced dicot-infecting mastrevirus genomes increases, so too will the resolution with which we are able to characterise where within these genomes recombination events have occurred and to which species the parental viruses belonged. With the exception of BeYDV, which has been found in both Pakistan and southern Africa, there also appears to exist amongst the dicot-infecting mastreviruses a strong degree of phylogeographic clustering (i.e., closely related viruses tend to have been sampled within the same regions) such that, as has been demonstrated recently for tomato yellow leaf curl virus (TYLCV) [27], it may also ultimately be possible to identify the geographical hot-spots of dicot-infecting mastrevirus recombination.