Citrus tristeza virus (CTV), a member of the genus Closterovirus in the family Closteroviridae, is the most economically important virus of citrus worldwide. This virus can cause quick decline of trees grafted on sour orange (Citrus aurantium L.) or lemon [C. limon (L.) Burm. f.] rootstocks, stem pitting regardless of the rootstock, and seedling yellows in sour orange, grapefruit (C. paradisi Macf.) or lemon seedlings. CTV is naturally transmitted by aphids in a semi-persistent manner [17]. CTV virions have two coat proteins, of 25 kDa (CP) and 27 kDa (CPm), which encapsidate about 97 % and 3 %, respectively, of a single-stranded, positive-sense genomic RNA [5].

Phylogenetic analysis of nucleotide sequences from the most variable genomic region of CTV has allowed isolates to be subdivided into seven genotypes: VT, T3, T30, T36, B165/T68, RB (resistance breaking in trifoliate orange) and HA [69, 16, 21]. Although CTV populations are known to consist of highly divergent molecular variants and of recombinants derived from them [22, 27], the forces driving CTV evolution are poorly understood. Previous studies indicated that some genomic regions of CTV isolates from different geographic origins were under negative selection and showed frequent recombination [14]. More recently, an analysis of CP sequences from CTV isolates worldwide revealed moderate genetic variability and showed that the same recombination events could be detected in the CP of isolates from different geographic regions [23]. By contrast, the genetic variability of the CPm gene has rarely been analyzed: the only report, based on CPm sequence analysis of Colombian field isolates, described two dominant groups (A1 and T30/R1) [18]. The CP and CPm proteins perform multiple functions required for CTV survival in its hosts. Together with the heat shock protein homolog (HSP70h) and p61, the two coat proteins are needed for virion assembly [27, 28], and CPm is possibly involved in aphid transmission [4, 5]. Understanding the forces that drive the evolution of these two genes is therefore important.

China has a long history of citrus production, and it has been suggested that CTV may have originated there [1]. Previous studies have revealed a complex CTV population in China [10, 28, 29], and novel severe strains, including RB and B165/T68-1, have been identified in recent years [6, 7, 21], highlighting the need for a better understanding of the population structure of CTV in the country. In this study, we collected CTV isolates from four citrus-producing provinces in China and analyzed their genetic diversity and population structure on the basis of the sequences of the two capsid protein genes (CPm and CP).

A total of 79 citrus samples were collected from four provinces (Hubei, Hunan, Jiangxi and Sichuan) in southern and western China. Most of the sampled citrus trees were asymptomatic, except for a few plants that showed reduced growth. Isolate AT-1 was obtained by aphid transmission from a field isolate containing several sequence variants. Five additional isolates (S4, N4, S45, S39 and S36), maintained in an insect-proof greenhouse at the National Indoor Conservation Center of Virus-Free Germplasm of Fruit Crops (Wuhan, Hubei, China) [10, 28], were also included in this study.

CTV detection was carried out by RT-PCR using the T36-CP primer set [11]. The primers CPm-F 5′-CGTCATATGAGCAGAGACGTG-3′ and CP-R 5′-TCAACGTGTGTTGAATTTCC-3′ [2, 11] were used to amplify an approximately 1540-bp fragment spanning the 3′-terminal region of the p61 gene (about 53 bp), the complete CPm (723 bp) and CP (672 bp) genes, and the non-coding region between them (about 92 bp).
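As a quick consistency check on the amplicon layout described above, the sketch below places the four regions end to end and reports their approximate coordinates within the fragment. It assumes that the regions are contiguous and non-overlapping, in the order p61 (3′ end), CPm, intergenic region, CP, and it uses the approximate lengths given in the text; the names are illustrative only.

```python
# Approximate region lengths (bp) from the text; contiguity and order are assumptions.
REGIONS = [
    ("p61 3' end", 53),
    ("CPm", 723),
    ("CPm-CP non-coding region", 92),
    ("CP", 672),
]


def region_coordinates(regions):
    """Return (name, start, end) with 1-based, inclusive coordinates within the amplicon."""
    coords = []
    start = 1
    for name, length in regions:
        end = start + length - 1
        coords.append((name, start, end))
        start = end + 1
    return coords


coords = region_coordinates(REGIONS)
total_length = coords[-1][2]
for name, start, end in coords:
    print(f"{name}: {start}-{end}")
print("total:", total_length)

# Both complete ORFs should be a whole number of codons (stop codon included).
assert 723 % 3 == 0 and 672 % 3 == 0
```

The four lengths sum to 1540 bp, matching the expected amplicon size, with CPm occupying roughly positions 54 to 776 and CP roughly positions 869 to 1540 under these assumptions.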

Total RNA was extracted from leaf and phloem tissues using TRIzol Reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions. First-strand cDNA was synthesized using M-MLV reverse transcriptase (Promega, Madison, WI, USA) and the random primer hexadeoxyribonucleotide mixture pd(N)6 (TaKaRa, Dalian, China) at 37 °C for 1 h. PCR reactions were carried out in a Mastercycler® gradient Thermocycler (Eppendorf, Hamburg, Germany) with the following conditions: one cycle at 94 °C for 3 min, then 35 cycles at 94 °C for 30 s, 55 °C for 30 s and 72 °C for 95 s, with a final extension for 10 min at 72 °C. The products were gel-purified and cloned into the vector pMD18-T (TaKaRa, Dalian, China). At least three positive clones of each product were sequenced by a commercial service (GenScript Biological Science, Nanjing, China). All sequences obtained were edited to remove primer sequences.

Multiple nucleotide and amino acid sequence alignments were performed using the MUSCLE algorithm implemented in MEGA 5.2.2 [25]. Aligned CPm and CP sequences were scanned for putative recombination events using seven detection methods (RDP, BOOTSCAN, SISCAN, 3SEQ, GENECONV, LARD and MAXCHI) implemented in RDP 3.44 [13], as well as with Recco [15]. Recombinant sequences were excluded from subsequent phylogenetic analyses. Phylogenetic trees were inferred by the maximum-likelihood (ML) method in MEGA 5.2.2 under the general time-reversible model of nucleotide substitution with gamma-distributed rate heterogeneity among sites, and branch support was assessed with 1000 bootstrap replicates. Intra- and inter-cluster genetic distances were computed in MEGA 5.2.2, with standard errors estimated from 1000 bootstrap replicates. The fixed-effects likelihood (FEL), internal fixed-effects likelihood (IFEL) and random-effects likelihood (REL) methods available on the DATAMONKEY server (http://www.datamonkey.org) were used to identify positively and negatively selected sites in the CPm and CP genes. Only sites identified concordantly by all three methods were considered.
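The intra- and inter-cluster distances with bootstrap standard errors can be illustrated with a small sketch. It assumes uncorrected p-distances with gapped columns skipped, whereas MEGA offers several substitution-model distances and the model used here is not stated. The standard error is taken as the standard deviation of the mean distance over alignments rebuilt by resampling columns with replacement. All function names and the toy sequences are hypothetical.

```python
import random
from itertools import combinations, product
from statistics import mean, stdev


def p_distance(a, b, sites):
    """Proportion of differing bases over the given alignment columns, skipping gaps."""
    compared = diffs = 0
    for i in sites:
        x, y = a[i], b[i]
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def within_mean(seqs, sites):
    """Mean distance over all pairs of sequences within one group."""
    return mean(p_distance(a, b, sites) for a, b in combinations(seqs, 2))


def between_mean(group1, group2, sites):
    """Mean distance over all pairs with one sequence from each group."""
    return mean(p_distance(a, b, sites) for a, b in product(group1, group2))


def bootstrap_se(statistic, n_sites, replicates=1000, seed=1):
    """Standard error of statistic(sites), resampling alignment columns with replacement."""
    rng = random.Random(seed)
    values = []
    for _ in range(replicates):
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        values.append(statistic(sites))
    return stdev(values)


# Toy aligned sequences (hypothetical, not CTV data).
group_a = ["AAAA", "AAAT", "AATT"]
group_b = ["TTTT", "TTTA"]
n_sites = len(group_a[0])
all_sites = range(n_sites)

d_within = within_mean(group_a, all_sites)
se_within = bootstrap_se(lambda s: within_mean(group_a, s), n_sites)
d_between = between_mean(group_a, group_b, all_sites)
se_between = bootstrap_se(lambda s: between_mean(group_a, group_b, s), n_sites)
print(f"within A: {d_within:.3f} ± {se_within:.3f}")
print(f"between A and B: {d_between:.3f} ± {se_between:.3f}")
```

Resampling whole columns keeps each bootstrap alignment the same length as the original. It also treats sites as independent, which is the usual simplifying assumption behind this kind of standard error.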

In total, 32 citrus samples tested positive for CTV by RT-PCR with the T36-CP primer set, and amplicons of the expected size (about 1540 bp) from 26 of these samples were sequenced and deposited in GenBank under accession numbers KF144718 to KF144771 (Supplementary Table 1). CTV from each sample was considered an individual isolate, and 81 sequences were obtained from these isolates in total. Most isolates from Sichuan province comprised at least two different sequence variants, whereas all four isolates from Jiangxi province and three isolates (N4, S4 and AT-1) from Hubei province contained only one variant. Clones from the same isolate sharing over 99 % identity were merged into a consensus sequence, yielding 38 consensus sequences that were used for further analysis.
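Collapsing the clones of one isolate into variant consensus sequences can be sketched as follows. The grouping rule is an assumption, since the text does not describe how clones were grouped. A clone joins the first existing group whose founding clone it matches at more than 99 % identity; otherwise it starts a new group. Each group is then summarized by a majority-rule consensus. The function names are hypothetical, and clones are assumed to be already aligned to the same length.

```python
from collections import Counter


def identity(a, b):
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def majority_consensus(clones):
    """Most frequent base at each column (ties go to the base seen first)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*clones))


def consensus_variants(clones, threshold=0.99):
    """Greedily group clones whose identity to a group's first clone exceeds
    threshold, then return one majority-rule consensus per group."""
    groups = []
    for clone in clones:
        for group in groups:
            if identity(clone, group[0]) > threshold:
                group.append(clone)
                break
        else:
            groups.append([clone])
    return [majority_consensus(group) for group in groups]


# Toy isolate with two variants (hypothetical sequences).
base = "ACGT" * 50                # 200 nt
near = "T" + base[1:]             # 99.5 % identical to base
other = "TGCA" * 50               # a clearly different variant
variants = consensus_variants([base, near, base, other])
print(len(variants))
```

A greedy, order-dependent rule like this is simple but can split borderline clones differently depending on input order. Clustering on all pairwise identities would be a more robust alternative.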

These sequences were first scanned for putative recombination events in the CPm and CP genes. One recombination event was identified in the CP sequence of isolate SC-WZ-9 (Suppl. Fig. S1A), and none was detected in the CPm sequences. Because recombinant sequences combine segments with contrasting evolutionary histories and can therefore distort phylogenetic inference [19], the SC-WZ-9 sequence was excluded from subsequent phylogenetic and population genetic analyses. The phylogenetic analyses were thus based on 58 CP sequences (37 obtained in this study and 21 retrieved from GenBank) and 60 CPm sequences (37 obtained in this study and 23 retrieved from GenBank). The CP sequences of the Colombian isolates A1 and R1 were not available in GenBank and were therefore absent from the CP phylogeny.
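MAXCHI, one of the seven methods named in the Methods, looks for the point along an alignment at which the pattern of matching and mismatching sites between two sequences changes most sharply. The sketch below is a simplified, single-breakpoint version written only for illustration. It scans the variable columns of an alignment and, for each candidate split, computes a 2 × 2 chi-square statistic (left/right of the break × match/mismatch). It then estimates a P value by permuting the order of the columns. RDP3's implementation differs (for example, it works with windows over sequence triplets and corrects for multiple comparisons), and all names here are hypothetical.

```python
import random


def variable_columns(alignment):
    """Indices of alignment columns that are not identical across all sequences."""
    return [i for i in range(len(alignment[0]))
            if len({seq[i] for seq in alignment}) > 1]


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] (0 if a margin is empty)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def max_chi(seq1, seq2, columns):
    """Best single breakpoint between seq1 and seq2 over the given columns.
    Returns (max chi-square, alignment index of the first column right of the break)."""
    matches = [seq1[i] == seq2[i] for i in columns]
    m, total_match = len(matches), sum(matches)
    best_chi, best_k, left_match = 0.0, None, 0
    for k in range(1, m):
        left_match += matches[k - 1]
        a = left_match                 # matches left of the break
        b = k - left_match             # mismatches left of the break
        c = total_match - left_match   # matches right of the break
        d = (m - k) - c                # mismatches right of the break
        chi = chi2_2x2(a, b, c, d)
        if chi > best_chi:
            best_chi, best_k = chi, k
    return best_chi, (columns[best_k] if best_k is not None else None)


def permutation_p(seq1, seq2, columns, reps=1000, seed=1):
    """P value for the maximum chi-square, obtained by shuffling column order."""
    observed, _ = max_chi(seq1, seq2, columns)
    rng = random.Random(seed)
    shuffled = list(columns)
    hits = 0
    for _ in range(reps):
        rng.shuffle(shuffled)
        chi, _ = max_chi(seq1, seq2, shuffled)
        hits += chi >= observed
    return (hits + 1) / (reps + 1)


# Toy example: the "recombinant" copies parent1 on the left and parent2 on the right.
parent1 = "A" * 20
parent2 = "T" * 20
recombinant = "A" * 10 + "T" * 10
cols = variable_columns([parent1, parent2, recombinant])
chi, breakpoint = max_chi(recombinant, parent1, cols)
p_value = permutation_p(recombinant, parent1, cols, reps=200)
print(chi, breakpoint, p_value)
```

A single scan like this only detects one breakpoint per sequence pair. It is also why, as the text later notes for the CPm and CP genes analyzed separately, breakpoints lying outside the analyzed region cannot be detected.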

The overall mean genetic distances were 0.071 ± 0.006 for CP and 0.095 ± 0.009 for CPm. These values are much lower than those previously reported for seven other genes (p33, HSP70h, p61, p18, p13, p20 and p23) in the 3′ region of the viral genome [14].

Phylogenetic analysis based on CP nucleotide sequences revealed six major groups. Based on their placement relative to reference sequences of CTV isolates [7], these groups were designated I (RB), II (T36), III (HA), IV (T30), V (VT) and VI (Fig. 1). Group VI is a newly identified cluster consisting of only two Chinese isolates. Sequences from Chinese CTV isolates segregated into all six groups, suggesting no correlation between genetic relationship and geographic origin. The largest proportion of CP sequences (41.7 %) belonged to group RB, followed by groups VT (32.0 %), T36 and HA (7.8 % each), T30 (5.8 %) and the new group VI (4.9 %). Mean pairwise genetic distances ranged from 0.005 ± 0.003 to 0.041 ± 0.006 within each of the six CP-derived groups and from 0.064 ± 0.008 to 0.104 ± 0.014 between groups (Supplementary Table 2).

Fig. 1

Unrooted maximum-likelihood phylogenetic trees based on nucleotide sequences of CP (A) and CPm (B) of Chinese and representative global isolates of citrus tristeza virus. Bootstrap values (1000 replicates) greater than 50 % are shown at the branch nodes. Roman numerals I to VI denote individual groups. Names, geographical origins and GenBank accession numbers of previously reported isolates are shown in bold italics. Sequences from the same isolate sharing ≥99 % identity are represented by a single sequence, with their number given in brackets. Chinese isolates obtained in this study are labeled with their province of origin (HB, Hubei; HN, Hunan; JX, Jiangxi; SC, Sichuan), followed by the isolate name and a clone number. Branch lengths are proportional to genetic distance. Isolates whose CP and CPm sequences cluster in different groups in the two trees are marked: HN-XT-34-1 by a black circle, and Taiwan-Pum-M/T5 and Kpg3 by black squares.

In the tree based on CPm nucleotide sequences, all CTV sequences from China segregated into five groups, which were identified from the positions of the same reference isolates used in the CP-derived tree. Because the genetic distance between clusters RB and T30 was very low (0.029 ± 0.004), they were combined into a single group (RB/T30). The CP and CPm sequences of most isolates obtained in this study occupied corresponding positions in the two trees. The exception was isolate HN-XT34-1 (black circle in Fig. 1), whose CP sequence fell into group T30 and whose CPm sequence fell into group VI. The CP and CPm sequences of the GenBank isolates Taiwan-Pum and Kpg 3 also occupied incongruent positions in the two trees (black squares in Fig. 1). Mean pairwise genetic distances ranged from 0.027 ± 0.004 to 0.050 ± 0.006 within each of the five CPm-derived groups and from 0.073 ± 0.009 to 0.146 ± 0.017 between groups (Supplementary Table 2).

Isolates from three provinces (Sichuan, Hubei and Jiangxi) formed a distinct subgroup within group VT in both the CP- and CPm-based trees. This subgroup was designated AT-1. Compared with most isolates in other groups, isolates in this subgroup had an insertion of three amino acids (Q537, N538 and R539) at the C-terminal end of the p61 protein (Suppl. Fig. S2), while three reference isolates in group VT (NZ-B18, T318A and Kpg 3) had a two-amino-acid insertion (Q537-N538 or Q537-S538) at the same position. HinfI/RFLP analysis of the AT-1 CP gene showed that this isolate contained a single sequence variant belonging to HinfI/RFLP group III (data not shown), a group prevalent in China [10, 29].

Notably, most CTV isolates from Hunan province fell into group RB. The results obtained here confirm the presence of RB-like CTV isolates in China [28].

Groups RB and VT comprise CTV isolates from both China and other countries. The close genetic relationships among isolates within these groups indicate that several long-distance migration (gene flow) events may have occurred between China and other countries. To assess genetic differentiation and the level of gene flow between subpopulations, we further applied three permutation-based statistical tests (Ks*, Z* and Snn) and the Fst statistic implemented in DnaSP 5.1 [12]. An absolute Fst value >0.33 is generally taken to indicate infrequent gene flow, and a value <0.33 frequent gene flow [12]. All CP and CPm sequences available in GenBank were used for the gene flow and natural selection analyses (Table 1). Two subpopulations were considered: CTV isolates from China and isolates from other countries. The comparison between these subpopulations gave P values <0.01 for all three tests and Fst values of 0.02204 for CPm and 0.06654 for CP, suggesting low genetic differentiation in both genes and somewhat higher gene flow for CPm than for CP. Long-distance gene flow or migration has also been described for CTV isolates from other countries [22], suggesting that genetic exchange between CTV populations from different geographical locations may have occurred frequently over time.
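The Fst comparison can be illustrated with one widely used sequence-based estimator, Hudson's Fst = 1 − Hw/Hb. Here Hw is the mean number of pairwise differences within subpopulations, and Hb is the mean between sequences drawn from different subpopulations. The sketch averages the two within-subpopulation means without weighting. We have not checked whether this matches DnaSP's exact weighting, so treat it as an assumption. Under a haploid island model (also an assumption made here for illustration), Fst ≈ 1/(1 + 2Nm). An Fst of 0.33 then corresponds to roughly one migrant per generation, which appears to be the rationale for the threshold cited above. Function names and toy data are hypothetical.

```python
from itertools import combinations, product
from statistics import mean


def differences(a, b):
    """Number of differing positions between two aligned sequences, ignoring gapped columns."""
    return sum(x != y for x, y in zip(a, b) if x != "-" and y != "-")


def hudson_fst(pop1, pop2):
    """Hudson-style Fst = 1 - Hw/Hb.
    Hw: mean pairwise difference within subpopulations (unweighted average; an assumption).
    Hb: mean pairwise difference between sequences from different subpopulations."""
    if len(pop1) < 2 or len(pop2) < 2:
        raise ValueError("each subpopulation needs at least two sequences")
    hw = mean(mean(differences(a, b) for a, b in combinations(pop, 2))
              for pop in (pop1, pop2))
    hb = mean(differences(a, b) for a, b in product(pop1, pop2))
    if hb == 0:
        raise ValueError("no differences between subpopulations")
    return 1 - hw / hb


def haploid_nm(fst):
    """Migrants per generation implied by Fst = 1 / (1 + 2Nm) (haploid island model)."""
    return float("inf") if fst <= 0 else (1 - fst) / (2 * fst)


# Completely differentiated toy subpopulations give Fst = 1.
print(hudson_fst(["AAAA", "AAAA"], ["TTTT", "TTTT"]))
# Fst = 1/3 corresponds to about one migrant per generation under this model.
print(haploid_nm(1 / 3))
```

When diversity within subpopulations exceeds diversity between them, this estimator becomes negative. This is one reason results are often discussed in terms of the magnitude of Fst.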

Table 1 Nucleotide diversity and neutrality tests calculated for the CPm and CP of CTV isolates on the basis of their geographical origins

The number of synonymous substitutions per synonymous site (dS) and of nonsynonymous substitutions per nonsynonymous site (dN) in the CPm and CP genes were calculated using DnaSP 5.1. The dN/dS ratios for both genes were lower than one in the worldwide population and in both subpopulations, indicating purifying selection (Table 1). The higher dN/dS ratio for CP (0.066) than for CPm (0.044) suggests that CPm is under stronger purifying selection than CP. To evaluate the role of natural selection in shaping the population structure of CTV, Fu and Li's D and F tests were used to test the hypothesis of neutral evolution in the global population and in the two subpopulations. Positive values for CPm in the subpopulation from other countries suggested a possible population bottleneck (Table 1). Although the values for both CP and CPm in the Chinese subpopulation were not significant, significantly negative values were obtained for the CP gene of the global population, suggesting population expansion. Demographic events may therefore have contributed to CTV evolution.
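One classic way of obtaining dN and dS for a pair of aligned coding sequences is the Nei-Gojobori counting method with a Jukes-Cantor correction, sketched below. Whether this matches the exact options used in DnaSP 5.1 for Table 1 is an assumption. Population-level values would typically be averaged over all sequence pairs rather than taken from a single comparison. Further conventions are also assumptions: mutations to stop codons count as nonsynonymous, and codons with gaps or stop codons are skipped. All function names are hypothetical.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Synonymous sites of a sense codon; changes to stop codons count as nonsynonymous."""
    aa = CODE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences averaged over all mutational
    pathways from c1 to c2 that avoid intermediate stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor correction; returns inf when p is too large to correct."""
    return math.inf if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Return (dN, dS) for two aligned coding sequences."""
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps/ambiguous bases (not in CODE) and stop codons.
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue
        s_avg = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s_avg
        N += 3 - s_avg
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


# Toy sequences (hypothetical): AAA -> AAG is synonymous (Lys -> Lys).
dn, ds = dn_ds("CTGCTGCTGCTGAAA", "CTGCTGCTGCTGAAG")
print(dn, ds)
```

The dN/dS ratio discussed above is then simply `dn / ds` when `ds > 0`. Very short or nearly identical sequences give unstable ratios, which is why such values are usually reported for whole populations.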

For the CP gene, three positively selected sites (1.3 %), at codons 31, 41 and 68, and 67 negatively selected sites (30.0 %) were identified by both the FEL and IFEL methods at a significance level of 0.1. For the CPm gene, two positively selected sites (0.8 %), at codons 9 and 102, and 95 negatively selected sites (39.6 %) were identified. Positive selection at codon 102 has been reported previously [7]. CPm is known to be involved in aphid transmission and virion assembly, and CP in virion assembly and protection of the viral RNA [4, 24]; the biological significance of these positively selected sites therefore warrants further investigation.
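The percentages above are consistent with denominators of 223 codons for CP and 240 codons for CPm. These are the gene lengths given earlier (672 and 723 bp) divided by three, minus the stop codon. Reading the denominators this way is our assumption, and the short check below only reproduces the arithmetic. The function names are illustrative.

```python
def sense_codons(orf_length_bp):
    """Codons in a complete ORF, excluding the terminal stop codon (assumed denominator)."""
    if orf_length_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_length_bp // 3 - 1


def percent(count, total):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * count / total, 1)


cp_codons = sense_codons(672)   # CP: 224 codons including the stop codon
cpm_codons = sense_codons(723)  # CPm: 241 codons including the stop codon

print("CP :", percent(3, cp_codons), percent(67, cp_codons))    # 1.3 30.0
print("CPm:", percent(2, cpm_codons), percent(95, cpm_codons))  # 0.8 39.6
```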

Because the initial gene-by-gene scans detected only one recombination event, in the CP gene of isolate SC-WZ-9, we used the approximately 1498-bp fragment sequences for further recombination analyses. The incongruent positions of isolate HN-XT34-1 in the CP- and CPm-based phylograms suggested a recombination event (Fig. 1). Further analysis confirmed a recombination event in HN-XT34-1 between the C-terminus of CPm and the intergenic non-coding region, with putative parental sequences from the RB and T30 groups (Suppl. Fig. S1B). In addition, RDP detected putative recombination events in 27 of the 1498-bp sequences (Table 2). Sequence alignments showed that most of these events occurred in AT-rich regions (Suppl. Fig. S3). Although the exact parents of these recombinants were not identified, most recombinants, irrespective of their geographic origin, had a putative parental sequence from Mexico or from isolates of the HA group. This further supports an ancient recombination event that occurred before the recombinants spread worldwide, as previously suggested for the CP gene [23]. Most recombination events involved the CPm region, and only two were detected in the CP gene, suggesting that CPm may be a recombination hotspot in the CTV genome. Except for two events, in SC-Z-2-1-4 and SC-Z-1-1-4, the recombination breakpoints were located outside the CPm and CP coding regions. This could explain why only one recombination event in CP and none in CPm were detected when each gene was tested individually. Previous reports have indicated that VT strains are transmitted more efficiently by aphids than T36 and T30 strains [3, 26]. The recombinant isolates identified here belonged mainly to the VT group, which may help explain the expansion of the VT cluster.
High gene flow between distant geographical regions, possible selection pressure and multiple infections resulting from successive aphid inoculations could all have contributed to recombination between variant sequences [20].
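The observation that most breakpoints fell in AT-rich regions can be explored with a simple AT-content profile. The profile uses sliding windows along a sequence, together with the AT fraction in a flank around each breakpoint reported by RDP. The sketch below is illustrative only. Window size, step and flank width are arbitrary choices, not parameters from the study, and the function names are hypothetical.

```python
def at_fraction(chunk):
    """AT fraction of a sequence chunk, ignoring gaps and ambiguous bases."""
    bases = [b for b in chunk.upper() if b in "ACGT"]
    return sum(b in "AT" for b in bases) / len(bases) if bases else float("nan")


def at_windows(seq, window=100, step=25):
    """(start, end, AT fraction) for each full window; 0-based, end-exclusive."""
    return [(s, s + window, at_fraction(seq[s:s + window]))
            for s in range(0, len(seq) - window + 1, step)]


def at_around_breakpoint(seq, breakpoint, flank=50):
    """AT fraction within `flank` bases on either side of a 0-based breakpoint."""
    return at_fraction(seq[max(0, breakpoint - flank):breakpoint + flank])


# Toy sequence (hypothetical): a GC-rich half followed by an AT-rich half.
seq = "GC" * 100 + "AT" * 100
profile = at_windows(seq, window=100, step=100)
print([round(frac, 2) for _, _, frac in profile])
print(at_around_breakpoint(seq, 300), at_around_breakpoint(seq, 200))
```

Comparing the flank values with the sequence-wide AT fraction (`at_fraction(seq)`) gives a rough sense of whether breakpoints are enriched in AT-rich stretches. A formal test would need a null distribution, for example from randomly placed breakpoints.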

Table 2 Recombination events detected in the 1540-bp fragment sequences of CTV isolates using the seven methods implemented in RDP3

In conclusion, this study provides a comprehensive analysis of the genetic diversity and population structure of CTV isolates in China. Our results also reveal a balance between strong purifying selection and high genetic variability in both the CPm and CP genes, which could be related to their biological functions. The structure of the CTV population in China is not associated with geographic location and is mainly shaped by long-distance gene flow, negative selection and frequent recombination.