Chrysanthemum virus B (CVB) is a member of the genus Carlavirus, family Betaflexiviridae [52] that is widespread throughout chrysanthemum-growing areas in both India [47] and the rest of the world. The host range of the virus is quite narrow and is apparently restricted to chrysanthemum and a few Nicotiana species. Although primarily transmitted in a non-persistent manner by the aphid Myzus persicae, CVB is also sap-transmissible. Slightly flexuous 685 × 12 nm rod-like virus particles contain a single linear, positive-sense, single-stranded RNA genome that is ~8.8 kb in size. As with other carlaviruses, the encapsidated CVB genomic RNA molecule is likely to have a poly(A) tail at its 3′ end and a 5′ monophosphate cap structure [10, 11, 23, 56].

The genome contains six probable genes (open reading frames [ORFs] 1-6): (i) at the 5′ end, an RNA-dependent RNA polymerase (ORF1) gene that accounts for ~70% of the genome and encodes a viral replicase with potential methyl transferase (MTR), RNA helicase (HEL) and RNA-dependent RNA polymerase (POL) domains [17, 33]; (ii) a triple gene block (TGB1-3; ORFs 2-4) which, by inference from other carlaviruses, encodes proteins required for cell-to-cell movement [8, 28]; (iii) a coat protein (CP) gene (ORF5) encoding the only subunit of the virus capsid [17], and (iv) at the 3′ end, a cysteine-rich nucleotide triphosphate (NTP)-binding protein gene [ORF6; 14, 56].

Although there are currently many partial sequences of the 3′-terminal regions of carlavirus genomes, full-length genomic RNA sequences have only been reported for potato virus M [PVM; 56], garlic latent virus [GarLV; 49, 51], blueberry scorch virus [BlScV; 3], daphne virus S [DVS; 24] and aconitum latent virus [AcoLV; 12]. Recently, the complete sequences of CVB [42], potato rough dwarf virus and potato virus P [PRDV and PVP; 32], hop mosaic virus [HpMV; 40], coleus vein necrosis virus [CVNV; 22], ligustrum necrotic ringspot virus [LNRSV; 46], red clover vein mosaic virus [RCVMV; 44], potato latent virus [PotLV; 54], kalanchoe latent virus [KLV; 7], butterbur mosaic virus [ButMV; 16] and cowpea mild mottle virus [CPMMV; 36] have also been reported.

A study of the CP genes of 29 Indian CVB isolates revealed the presence of at least three major lineages of the virus in India. It was also reported that genetic recombination might contribute to the diversification of CVB on this subcontinent [47]. To further characterize Indian CVB diversity and to assess the impact of recombination on CVB evolution, the full-genome sequences of four isolates representing the breadth of Indian CVB diversity (PB, UK, UP and TN) were determined by reverse transcription (RT)-PCR amplification of overlapping genome fragments.

The virus isolates PB and UK were obtained from the Indian states of Punjab and Uttarakhand, respectively. They correspond to previously described PB and UK2 isolates [47]. Isolates UP and TN were obtained from the states of Uttar Pradesh and Tamil Nadu, respectively. Chrysanthemum leaves were harvested from single infected plants, and virus RNA was isolated [9].

Eleven primer pairs were designed to collectively enable amplification of near-full-length CVB genomes (except replicase) using all available CVB sequences in GenBank in June 2006. Primers to amplify the replicase gene were designed by alignment of the different complete genome sequences of carlaviruses available at that time: poplar mosaic virus (accession no. NC_005343), GarLV (NC_003557), BlScV (NC_003499), lily symptomless virus (NC_005138), AcoLV (NC_002795), HpMV (NC_002552), and PVM (NC_001361). These sequences were aligned using either MultAlin [web site: http://prodes.toulouse.inra.fr/multalin/multalin.html; 4] or ClustalW [19], both with default settings. Primer pairs corresponded mostly to the conserved 5′ and 3′ ends of the genes, but in the replicase gene, five primers were designed that bound to internal conserved sites (Supplementary Table 1).

RT-PCR amplification of CVB genome fragments was standardized using total RNA isolated from partially purified virus preparations from infected chrysanthemum plants according to Ram et al. [43]. The amplified genome fragments were gel-purified using a GenElute Gel Extraction Kit (Sigma) and cloned into the pGEM-T Easy Vector (Promega). Cloned amplicons of different CVB genome fragments were sequenced using an ABI Prism 310 with an ABI Prism BigDye Terminator v3.0 Ready Reaction Cycle Sequencing Kit (Applied Biosystems). Both strands were completely sequenced in all cases, and no heterogeneities were found in the overlapping regions.

The different overlapping CVB genome fragments were assembled into near-full-length genome sequences and submitted under GenBank accession numbers AM493895, AM765839, AM765838, and AM765837 (for isolates PB, TN, UK and UP, respectively). The sequences ranged in size from 8815 (UP isolate) to 8855 (PB isolate) nucleotides and appeared to contain all six ORFs previously identified in the Japanese full-length CVB genome [42; Supplementary Fig. 1].

By analogy with the Japanese CVB isolate (CVB-S) and using ORF mapping performed by DNAMAN (Lynnon Biosoft), we identified six probable genes (Supplementary Table 2). Pairwise distance analysis of CVB isolates (Table 1) indicated that throughout their genomes, the Indian CVB isolates were more similar to one another than they were to any other isolates, including the Japanese CVB-S. Over most of their genomes, however, the Indian isolates are quite closely related to CVB-S (with which they uniformly share >81% identity). Since the predicted N-terminal portion of the replicase expressed from ORF1 is quite conserved among the carlaviruses, it was somewhat surprising that ORF1 of the Indian isolates was only ~59% identical to that of the Japanese isolate. Nevertheless, the Indian CVB sequences contained conserved domains of the alkylated DNA repair protein (AlkB), the carlavirus endopeptidase (peptidase_C23), viral RNA helicase 1 and RNA-dependent RNA polymerase (RdRp2).
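The percent identity values discussed above can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned to the same length, and it skips any column containing a gap; how DNAMAN treats gaps is not stated in the text, so that choice is an assumption, as are the function and parameter names.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns where either sequence carries a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared
```

Applied to every pair of isolates, a function like this would fill one triangle of an identity matrix such as Table 1 (nucleotide alignments for one triangle, amino acid alignments for the other).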

Table 1 Percent identity between nucleotide (above the diagonals, in italic) and amino acid sequences (below the diagonals, in bold) of CVB-S and different CVB isolates

The movement protein complexes (TGB) of the Indian isolates showed 81 to 92% sequence similarity to CVB-S. As with all other analyzed members of the genus Carlavirus, TGB3 diversity was higher than that of other viral proteins, whereas nucleic-acid-binding protein (NABP) displayed the highest degree of conservation (data not shown). This latter protein contained four cysteine residues in the arrangement CX2CX13CX4C (where X is any amino acid and the number gives the count of intervening residues), a pattern that has been found in many NABPs [1, 21; Supplementary Table 3]. Also, by analogy to previously characterized domains and motifs identified for other carlaviruses, we identified various amino acid motifs with probable functionality within the CVB genomes (Supplementary Table 3 and Supplementary Fig. 2).
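The four-cysteine arrangement described above (Cys, 2 residues, Cys, 13 residues, Cys, 4 residues, Cys) can be searched for with a regular expression. The sketch below is illustrative only: the function name is hypothetical and the test sequence is a made-up toy string, not a real CVB NABP sequence.

```python
import re

# CX2CX13CX4C: Cys, any 2 residues, Cys, any 13 residues, Cys, any 4 residues, Cys
CYS_MOTIF = re.compile(r"C.{2}C.{13}C.{4}C")

def find_cys_motif(protein):
    """Return (1-based start position, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in CYS_MOTIF.finditer(protein.upper())]

# Hypothetical toy sequence built to contain exactly one copy of the motif
toy = "MK" + "C" + "AA" + "C" + "G" * 13 + "C" + "TTTT" + "C" + "LL"
hits = find_cys_motif(toy)
```

Because `finditer` reports non-overlapping matches, overlapping occurrences of the motif would need a lookahead-based pattern instead.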

The four new Indian CVB genome sequences were then compared with other carlavirus full-genome sequences obtained from public sequence databases. Phylogenetic trees were constructed using two methods: (i) maximum likelihood, implemented in PhyML [15], with the nucleotide substitution model GTR+I+G4 selected automatically in Recombination Detection Program (RDP) 3 [29] and 100 full maximum-likelihood bootstrap replicates; and (ii) neighbor joining [45], implemented in DNAMAN version 5.1, with the Jukes-Cantor nucleotide substitution model and 1000 bootstrap replicates. The maximum-likelihood analysis clearly indicated that the four new full-genome sequences were all more closely related to one another than to those of any other carlaviruses, including the full-genome sequence of CVB-S from Japan. There was, however, 100% bootstrap support for a clade containing the five CVB full-genome sequences, and DVS was the closest relative (Fig. 1).
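For readers unfamiliar with the Jukes-Cantor model used in the neighbor-joining analysis, the distance it assigns to a pair of aligned nucleotide sequences is d = -(3/4) ln(1 - 4p/3), where p is the proportion of sites that differ. The following is a minimal sketch of that standard formula, not a reproduction of DNAMAN's implementation; ignoring columns that contain anything other than A, C, G or T is an assumption.

```python
import math

def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two pre-aligned nucleotide sequences.

    Assumption: columns with gaps or ambiguity codes are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable columns in the alignment")
    p = sum(a != b for a, b in pairs) / len(pairs)  # proportion of differing sites
    if p >= 0.75:
        raise ValueError("sequences too divergent: distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The correction grows with p, so the estimated distance always exceeds the raw proportion of differences for non-identical sequences.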

Fig. 1

Phylogenetic relationships of complete genome sequences of four Indian CVB isolates and a Japanese CVB isolate (CVB-S) with different carlaviruses, determined using the maximum-likelihood method (implemented in PhyML, with automated selection of model GTR+I+G4 in RDP3 and 100 full maximum-likelihood bootstrap replicates)

CP sequences from 32 Indian isolates (Supplementary Table 4), one Russian isolate and one Japanese isolate were aligned by ClustalW. Phylogenetic relationships between these isolates were inferred from the nucleotide sequence alignment by maximum likelihood (HKY model with transition:transversion ratios estimated from the data and 100 non-parametric bootstrap replicates) using PhyML [15]. The topology of the maximum-likelihood trees tentatively classified the CVB isolates into three major groups, representing three lineages of the virus on the Indian subcontinent (Supplementary Fig. 3). PB and UP isolates, which were completely sequenced in the present study, fell into group I, TN and CVB-S belonged to group II, and UK2 (referred to as UK in this manuscript) represented group III (Supplementary Fig. 3). When inoculated on Petunia hybrida, the Indian isolates induced different symptoms (Supplementary Fig. 4).

As recombination within the CP sequences of CVB has been reported previously [47], we tested for the presence of recombination within the full-genome sequences of CVB and of representatives of other carlavirus species. RDP has been used in a number of studies to analyze recombination in RNA viruses [20, 37, 38, 53]. We applied a group of seven recombination analysis methods implemented in the program RDP3 [2, 13, 29-31, 35, 39, 41]. We only accepted phylogenetically supported evidence of recombination that was detected by three or more independent detection methods with a Bonferroni-corrected P-value cutoff of 0.05.
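The acceptance rule stated above (support from three or more methods at a Bonferroni-corrected cutoff of 0.05) can be sketched as a simple filter. This is an illustration under stated assumptions, not RDP3's internal procedure: the function names, the `n_tests` parameter (the number of comparisons being corrected for) and the example P-values are all hypothetical, and the additional requirement of phylogenetic support is not modelled.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-corrected P-value, capped at 1.0."""
    return min(1.0, p_value * n_tests)

def accept_event(raw_p_by_method, n_tests, alpha=0.05, min_methods=3):
    """Decide whether a candidate event meets the method-count criterion.

    raw_p_by_method: dict mapping method name -> uncorrected P-value.
    Returns (accepted, sorted list of supporting methods).
    """
    supporting = [
        method
        for method, p in raw_p_by_method.items()
        if bonferroni(p, n_tests) < alpha
    ]
    return len(supporting) >= min_methods, sorted(supporting)

# Hypothetical P-values for one candidate event (not taken from the study)
example = {"RDP": 1e-6, "GENECONV": 1e-5, "BOOTSCAN": 0.2, "3SEQ": 1e-4}
accepted, methods = accept_event(example, n_tests=100)
```

In this toy case three of the four methods remain below 0.05 after correction, so the event would pass the method-count criterion.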

Evidence of 16 potentially unique recombination events was detected by more than three recombination-detection methods (Table 2), although all of the events were detected with a high degree of confidence by at least two different recombination methods. While it is tempting to speculate that at least some of the 16 potential recombination signals that were each detected by more than four methods might be genuine evidence of recombination, to be absolutely sure that false positives were excluded, we focused further analysis exclusively on the one potential recombination event detected in isolate CVB-S, an event independently identified with a high degree of confidence by all seven recombination-detection methods (P-values: RDP = 6.6 × 10^-154, GENECONV = 1.4 × 10^-143, BOOTSCAN = 3.4 × 10^-162, MAXIMUM CHI SQUARE = 4.4 × 10^-62, CHIMAERA = 1.3 × 10^-44, SISCAN = 1.3 × 10^-114, 3SEQ = 1.1 × 10^-19). CVB-S was found to be a recombinant of UP as the major parent and TN as the minor parent. The potential recombination beginning and ending breakpoints were predicted to be at nucleotide positions 538 and 4260, respectively (Fig. 2). The recombinant region lay in the replicase region of the viral genome.

Table 2 Recombination events in the genomes of chrysanthemum virus B isolates as detected by “RDP v.3”
Fig. 2

Evidence that isolate CVB-S is a recombinant of a virus resembling isolate CVB-UP and an unknown CVB-TN-like virus. (a) Maximum-likelihood trees (model HKY, 100 bootstrap replicates) constructed using different portions of the CVB genome. (b) BOOTSCAN evidence for the recombinant origins of different portions of CVB-S (100 bootstrap replicates, Jukes-Cantor distances), constructed for 200-nucleotide sequence windows with a step size of 20 nucleotides

To further confirm the possible recombination events detected by the RDP, GENECONV, BOOTSCAN, MAXIMUM CHI SQUARE, CHIMAERA, SISCAN and 3SEQ methods, we used the RECCO program [34], a fast, simple, and sensitive method for detecting recombination in a set of sequences and locating putative recombination breakpoints based on cost minimization. All of the isolates that showed recombination by the different methods implemented in RDP showed recombination by RECCO. Whereas RDP identified a stretch (Fig. 2), RECCO specifically indicated a number of regions within this stretch (Fig. 3). The graph displayed breakpoints represented by downward peaks in the dataset (Fig. 3). The P-value for recombination is shown if the recombination event was the strongest in the whole dataset (Supplementary Table 5).

Fig. 3

Putative recombination breakpoints, represented by downward peaks in the genomes of six CVB isolates, detected by the RECCO method based on cost optimization. The p-value for recombination is shown in the dataset if the recombination event was the strongest. The same regions were concluded to be recombinant when analyzed by the RDP v.3 method

This analysis revealed that all five of the CVB sequences are detectably recombinant, with compelling evidence of unique recombination events. Although many of these recombination events had clearly occurred among CVB isolates (as intra-CVB events), four events apparently occurred between CVB isolates and currently unsampled virus lineages that were (i) divergent CVB strains, (ii) currently undescribed CVB-like species, or (iii) species only distantly related to any currently described Carlavirus species.

The pervasiveness of recombination detected among the sequences described herein was surprising in that previously, analyses of recombination in CVB CP-encoding genes had indicated evidence of only a single recombination event. This would suggest that the CP gene of CVB may be a recombination cold-spot. Other genome-wide analyses of recombination patterns in members of a number of different viral species have, in fact, shown that recombination tends to be more constrained (or at least less prevalent) within structural protein genes [5, 6, 18, 25-27, 48, 50, 55]. It is therefore possible that CVB, and potentially also other flexiviruses, conform to this pattern. If they do, this factor, coupled with the relative overabundance of flexivirus CP genes in public sequence databases, might explain why very little evidence of recombination in the carlaviruses has been reported to date.