Geminiviruses constitute a large family of plant DNA viruses infecting a broad range of plants. They cause devastating diseases in important crops that provide staple food in subsistence agriculture or are otherwise economically important, such as beans, cassava, cotton, maize, pepper, sugar beet, sweet potato, and tomato [19, 21, 22]. Within this family, the genus Begomovirus is composed of viruses that infect dicotyledonous plants, are transmitted by one whitefly species (Bemisia tabaci), and possess one or two genome components, named DNA-A and DNA-B [24]. Over the last four decades, agricultural intensification and the emergence and prevalence of a new and more aggressive biotype of the insect vector (Bemisia tabaci biotype-B) have facilitated an increase in begomovirus populations and their expansion to new plant hosts in Latin America. This has contributed to the emergence of new and more virulent viruses, producing an increase in frequency and severity of disease [3]. Besides cultivated plants, many weed species are hosts for begomoviruses [6], from which they can be transmitted to crops. Thus, weeds may act as virus reservoirs, facilitating recombination and generation of new viral genomes [11, 16, 22]. The characterization of weed-infecting begomoviruses is, therefore, important for elucidating their ecological and evolutionary behavior [6]. Within Latin America, Bolivia has climate zones that are suitable for begomovirus dissemination by its insect vector. However, even though beans, soybeans, cotton, and potatoes [2, 22], all of which are known to be geminivirus hosts, are amongst the main agricultural products cultivated in Bolivia, reports on begomoviral incidence are limited to a bean-infecting virus (bean golden mosaic virus - BGMV) [22]. A survey of Bolivian weeds using rolling-circle amplification (RCA)-based detection technology has now increased the number of known begomovirus isolates to eight.

RCA utilizes the DNA polymerase of the Bacillus subtilis bacteriophage Φ29 [7, 8], which possesses both polymerase and strand-displacement activity, allowing circular templates to be amplified preferentially [18, 28, 29]. The simplicity, high sensitivity, and proofreading activity of the procedure were exploited to amplify small circular DNA sequences without initial knowledge of putative viral sequences present in the samples [12, 18] and combined with RFLP, yielding clear data on the viral agents found in different Bolivian samples [12, 23].

Seven weed samples, all showing mosaic symptoms, were collected in different regions of Bolivia in November 2007 and analyzed by RCA/RFLP: from Boyuibe and Villamontes (S21°01’28.20’’; W63°22’44.00’’; 680m) RCAs-30, -31 and -32, isolated from Sida rhombifolia, Sida micrantha and a putative Solanum plant, respectively; from Cerro Fraile (S18°17’00.00’’; W63°40’33.70’’; 1338m) RCAs-33 and -34, both isolated from Sida rhombifolia; and from Santa Cruz (S 17° 51’ 201’’; W 63° 14’ 465’’; 1622m) RCA-35 and RCA-36, isolated from Sida micrantha and Abutilon sp, respectively (Fig. 1a-g). Total DNA was extracted using a CTAB-based method [17], and the presence of small circular DNA was confirmed by RCA followed by digestion with the restriction enzyme HpaII (New England Biolabs, Frankfurt, Germany) and RFLP pattern analysis [12]. In order to obtain partial tandem repeats of viral full-length DNAs, a novel and efficient strategy was developed that is applicable to all geminiviruses and does not require a search for appropriate restriction sites. The RCA products were partially digested with Sau3AI (or its isoschizomer BfuCI, New England Biolabs): 1-1.5 μg of RCA product was treated with 0.4 units of the enzyme for 4 min at 37°C, and the reaction was stopped by adding EDTA and SDS to final concentrations of 1 mM and 0.1%, respectively, and heating the solution at 65°C for 20 min. Of the resulting fragments, those of approximately 2.8 to 5.0 kb were excised from a 0.8% agarose electrophoresis gel, purified using a GFX DNA Gel Band Purification Kit according to the manufacturer’s instructions (GE Healthcare, Munich, Germany), and eluted in 30 μl of sterile water. DNA from 3 μl of the eluate was inserted into BamHI-cut pGreen0029 plasmid [14], and E. coli DH5α cells were transformed with the recombinant DNA. Plasmids containing inserts were sequenced using universal primers (M13F/R), either in our laboratory (CEQ 8000 Genetic Analysis System, Beckman-Coulter) or commercially by Macrogen (Seoul, South Korea).
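
The logic of the size selection can be illustrated with a minimal sketch. It is not the procedure used in this study, only a toy model under stated assumptions: the RCA product is treated as a long tandem concatemer of one circular genome, Sau3AI (recognition sequence GATC) may cut or skip any of its sites, and the genome length (2600 nt) and site positions are hypothetical values chosen for illustration. The function names are likewise invented for this example.

```python
def partial_digest_fragments(genome_length, site_positions, min_size, max_size):
    """List (start_site, end_site, length) fragments that partial digestion of a
    tandem concatemer can release, keeping only those inside the size window."""
    sites = sorted({p % genome_length for p in site_positions})
    selected = []
    for start in sites:
        for end in sites:
            # Shortest forward distance from start to end; identical sites are
            # one full genome length apart in the concatemer.
            length = (end - start) % genome_length or genome_length
            while length <= max_size:
                if length >= min_size:
                    selected.append((start, end, length))
                length += genome_length  # same end site, one tandem copy further
    return sorted(selected, key=lambda fragment: fragment[2])


def contains_full_genome(fragment_length, genome_length):
    """A concatemer fragment at least one unit long carries every genome position."""
    return fragment_length >= genome_length


# Hypothetical toy genome: 2600 nt with three GATC sites (assumed values).
fragments = partial_digest_fragments(2600, [100, 900, 2000], 2800, 5000)
lengths = [length for _, _, length in fragments]
```

In this toy case every fragment inside the 2.8 to 5.0 kb window is longer than one genome unit, so each insert would carry a complete genome copy plus a partial repeat, whichever sites happened to be cut.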

Fig. 1

Mosaic symptoms in plants from which the virus samples were isolated. Sida rhombifolia (a, d, e; source of RCAs-30, -33 and -34, respectively), Sida micrantha (b; source of RCA-31 and f; source of RCA-35.1 as well as RCA-35.2), a putative Solanum plant that was difficult to identify because it had been reduced in size by grazing animals (c; source of RCA-32); Abutilon sp (g; source of RCA-36). Plant samples (a-c) were collected in Boyuibe-Villamontes, (d and e) in Cerro Fraile and (f and g) in Santa Cruz. The respective RCA samples were diagnosed using RFLP patterns of HpaII restriction (h). The fragment sizes predicted from the sequences are indicated where they match the electrophoretic mobility observed. Values in brackets represent bands too small to be resolved in this gel system; unassigned fragments may have resulted from polymorphism within the viral quasispecies or from unidentified begomovirus components. The genomic organization was determined by sequencing sample RCA-30 (i), which is representative of all the newly described viruses. ORF positions and directions of translation are indicated by arrows in DNA-A and DNA-B: AV1 for the coat protein (CP), AC1 for the Rep protein, AC2 for a transcriptional activator protein (TrAP), AC3 for a replication enhancer (REn), AC4 for a protein with unknown function in New World begomoviruses, BV1 for a nuclear shuttle protein (NSP), and BC1 for a movement protein (MP). CR indicates the common region; Ori, the origin of replication; and “iterons”, repeated DNA sequences that are putative Rep binding sites

The complete sequences obtained for DNAs A and B were assembled (Contig Assembly Program, Bioedit software) [13] and analyzed with BLAST [1]. For phylogenetic analysis, the sequences were compared using codaln software [25] in order to optimize the alignment of the most closely related viral sequences from the international databases retrieved with BLAST (version 2.2.22+, January 15, 2010) [30], and neighbor joining trees were calculated using the algorithms included in MEGA 4.0 [26], with 1000 bootstrap replications each. Virus names and accession numbers are listed in Supplementary Table 1.

All seven plant samples contained small circular DNA as inferred from RCA/RFLP analysis (Fig. 1h). Partial digestion of the RCA products, insertion of the fragments into pGreen0029, and sequencing of positive clones showed that all of the viruses have the typical genome organization of bipartite New World begomoviruses, with five open reading frames (ORFs) on their DNA-A molecules, encoding replication-associated protein (Rep), transcriptional activator protein (TrAP), replication enhancer protein (REn), AC4, and coat protein (CP), and with two ORFs on their DNA-B molecules, encoding movement protein (MP) and nuclear shuttle protein (NSP).

Sample RCA-35, isolated from Sida micrantha, revealed a mixed infection with two different begomoviruses. Based on the ICTV threshold of 89% sequence identity (SI) for differentiating begomovirus species [9] and on comparison with other geminiviral sequences in the database, one isolate from RCA-32 (putative Solanum plant), the two isolates from sample RCA-35, and one isolate from RCA-36 (Abutilon sp) were identified as members of four new begomoviral species (Table 1 and Supplementary Table 2). Their component A/B pairs shared >96% SI within the common region (CR) and had identical iterons, the Rep-binding DNA motifs, which are species-specific in New World begomoviruses [4, 5]. These results support the conclusion that each isolate belongs to a new begomoviral species with two cognate DNA components. The other four isolates, one from Sida micrantha (RCA-31) and three from Sida rhombifolia (RCAs-30, -33 and -34), showed >90% DNA-A SI to previously reported sequences of Sida micrantha mosaic virus (SimMV) that had been detected in Brazil and therefore belong to this species. The isolate from sample RCA-31 (S. micrantha) is a variant of Sida micrantha mosaic virus [Mato Grosso do Sul2:2007] (SimMV-MGS2:07; FN436005) with 95% SI. Isolates from samples RCA-30, -33 and -34 (S. rhombifolia) shared 94% and 93% SI with SimMV isolate 5157 (accession number EU908733.1) and 96% SI with each other and are variants of a new strain. We did not observe any statistically significant recombination events within the viral sequences described in this study using TOPALI software [20].
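
The threshold-based assignment can be sketched as follows. This is an illustration only and not the software used in this study: it assumes the two DNA-A sequences have already been aligned to equal length with "-" as the gap character, it counts a gap opposite a base as a mismatch, and it treats an identity exactly at the threshold as belonging to the same species. All three choices, and the function names, are assumptions made for the example.

```python
SPECIES_THRESHOLD = 89.0  # percent DNA-A sequence identity, as quoted in the text


def percent_identity(aligned_a, aligned_b):
    """Percent identical columns between two pre-aligned sequences.

    Columns in which both sequences carry a gap are ignored; a gap opposite
    a base counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def classify(best_identity, threshold=SPECIES_THRESHOLD):
    """Assign an isolate from its highest identity to any known species."""
    if best_identity >= threshold:
        return "same species"
    return "candidate new species"
```

For example, `classify(95.0)` returns "same species", corresponding to the RCA-31 isolate and SimMV-MGS2:07, whereas any isolate whose best match falls below 89% would be flagged as a candidate new species.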

Table 1 Virus names, abbreviations and accession numbers of the sequences identified in this study

Comparison of the translated amino acid sequences for the ORFs of the new isolates (Supplementary Table 3) revealed that, for the DNA-A molecules, AV1 is the most conserved ORF (77-99% SI), and AC4 the most variable one (40-94% SI), and that for the DNA-B molecules, BC1 (83-98% SI) is more conserved than BV1 (72-93% SI).

In a phylogenetic analysis of the DNA-A components, RCA-35 (S. micrantha) and RCA-36 (Abutilon sp) grouped with cleome leaf crumple virus, but the bootstrap value was not significant. RCA-32 (Solanum) was found to be closely related to soybean blistering mosaic virus (SbBMV), which has been detected in soybean crops in Argentina [10], forming a distinct cluster with a bootstrap value of 99% (Fig. 2). The three variants of the new SimMV strain, RCAs-30, -33 and -34 (S. rhombifolia), clustered together with the other SimMV sequences (100% bootstrap value). The isolate from sample RCA-31 (S. micrantha), classified by pairwise comparison analysis as a variant of SimMV-MGS2:07, clustered with SimMV-MGS1:07 and SimMV-MGS2:07 (100% bootstrap value), as expected. Similarly, the DNA-B components of samples RCA-35 (S. micrantha) and RCA-36 (Abutilon sp) did not group with any other sequence, while that of RCA-31 (S. micrantha) and those of RCAs-30, -33 and -34 (S. rhombifolia) were located within the SimMV cluster (Fig. 2). In contrast, the DNA-B sequence from sample RCA-32 (Solanum) seems most closely related to tomato yellow vein streak virus (ToYVSV), but this may be because no SbBMV DNA-B sequence is available in the database and ToYVSV is a close relative of SbBMV (Fig. 2). The results of RCA-RFLP fragment pattern analysis (Fig. 1h) agreed with the fragments expected from in silico HpaII restriction of the determined sequences [27], with only a few unexpected bands, which may have resulted from polymorphic restriction sites or from amplification of additional small circular DNA components present in the sample, such as mitochondrial DNA or unidentified begomovirus components.
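
The principle of an in silico digest of a circular component can be sketched as below. This is not the tool cited as [27]; it is a minimal illustration that assumes complete digestion, a single strand given 5' to 3', and the HpaII recognition sequence CCGG cut after the first C. The function name and the toy sequences are hypothetical.

```python
def circular_digest(sequence, site="CCGG", cut_offset=1):
    """Fragment lengths (longest first) from complete digestion of a circular DNA.

    cut_offset is the number of bases of the site that lie before the cut
    (1 for HpaII, which cuts C^CGG).
    """
    seq = sequence.upper()
    n = len(seq)
    # Append the first len(site) - 1 bases so sites spanning the origin are found.
    extended = seq + seq[: len(site) - 1]
    cuts = sorted(
        {(i + cut_offset) % n for i in range(n) if extended.startswith(site, i)}
    )
    if not cuts:
        return [n]  # no site: the circle stays uncut
    # Distance from each cut to the next one, wrapping around the circle;
    # a single cut linearizes the molecule to its full length.
    fragments = [
        (cuts[(k + 1) % len(cuts)] - cut) % n or n for k, cut in enumerate(cuts)
    ]
    return sorted(fragments, reverse=True)
```

Comparing such predicted lengths with the observed bands, and setting aside fragments below the resolution limit of the gel, is the kind of check described above; bands with no predicted counterpart would then point to polymorphic sites or additional circular DNA in the sample.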

Fig. 2

Neighbor-joining trees of DNA-A and DNA-B for the isolates detected in this study (RCA-30, -31, -32, -33, -34, -35.1, -35.2, -36) compared to the sequences with highest sequence identity from the international database GenBank (January 15, 2010) and using African cassava mosaic virus (ACMV) as an outgroup. Numbers next to the branch points indicate bootstrap values (1,000 replicates) above 50% (0.5). Newly described isolates are highlighted and marked with asterisks if they represent new virus species. The corresponding identifier numbers, names, abbreviations and accession numbers are listed in Supplementary Table 1

The sequence and phylogenetic analyses show considerable diversity among begomoviruses infecting plants throughout the Americas. Sequences from South America tend to group together and are more closely related to each other than to sequences from Central and North America, which in turn are more closely related to each other than to South American sequences. This suggests that the viruses in a given geographical area are evolving from other viruses present in that area rather than being spread over long distances.

Considering their sequences, phylogenetic relationships, symptoms and the hosts from which the new viruses were isolated (Fig. 1a-g), we propose the name Solanum mosaic Bolivia virus (SoMBoV) for the species represented by RCA-32 (Solanum), Sida mosaic Bolivia virus 1 (SiMBoV1) for the species represented by RCA-35.1 (S. micrantha), Sida mosaic Bolivia virus 2 (SiMBoV2) for the species represented by RCA-35.2 (S. micrantha) and Abutilon mosaic Bolivia virus (AbMBoV) for the species represented by RCA-36 (Abutilon sp). Although we have not transmitted the viral isolates back to the original host plants and therefore cannot assign any symptom name with confidence, we are following the current policy of the ICTV study group (pers. communication) with these names. For the new SimMV strain isolates from RCAs-30, -33 and -34 (S. rhombifolia), we suggest the name Sida micrantha mosaic virus - Rhombifolia for the same reasons, although our previous research results did not support the symptom name [15].