Infectious diseases cause considerable economic losses in agricultural crops worldwide. Characterizing the genetic variability of pathogen populations is necessary not only to understand their evolution and epidemiology, and perhaps to predict disease emergence, but also to design specific diagnostic tools and strategies for disease control or eradication. Plant RNA viruses have great potential for evolution and adaptation because of their rapid replication, large population sizes, and high mutation rates, the latter arising because RNA polymerases lack proofreading activity [1]. The genetic variation generated by mutation, and in some viruses by recombination [2], can be limited or shaped by natural selection, genetic drift, and gene flow [3]. Although genetic variation has been studied for a number of plant viruses [4–6], it has not been addressed for most of them.

Tomato mosaic virus (ToMV), a member of the genus Tobamovirus [7], is distributed worldwide and causes a serious disease in tomato and other solanaceous crops [8, 9]. Tobamoviruses are very stable, able to survive for several years in dried plant debris, and can be transmitted to other plants by mechanical contact and through seeds [8]. Virions are rigid helical rods containing a monopartite, linear, positive-sense single-stranded RNA of about 6.5 kb, which encodes at least four proteins. Two proteins of 130 and 180 kDa are translated directly from the genomic RNA. The 130 kDa protein contains characteristic motifs of the putative methyltransferase and helicase domains. The 180 kDa protein, which is synthesized by read-through of the amber termination codon of the 130 kDa protein gene, is an RNA-dependent RNA polymerase. Both the movement protein and the coat protein are translated from different subgenomic RNAs, which are synthesized during the replication cycle [7].

In this article, the phylogenetic relationships and population genetics of worldwide ToMV isolates were studied by analyzing the nucleotide sequences of the coat protein gene (CPG) to gain insight into the processes involved in the evolution of this virus.

A total of 75 ToMV isolates were analyzed. Twenty-nine isolates were collected from pepper or tomato fields in different provinces of Spain (Almería, Barcelona, León, Murcia, Valencia, Vizcaya, and Zaragoza). Total RNA was extracted from 0.1 g of leaves using the silica extraction procedure [10] and used as a template for RT-PCR amplification with primers Tob-Uni 2 and ToMVspec [11]. The nucleotide sequences of these RT-PCR products were determined in both directions using an ABI PRISM DNA sequencer 377 (Perkin–Elmer) and deposited in GenBank under accession numbers JF810425-JF810439 and JN381931-JN381944. These sequences were analyzed along with the equivalent sequences of 46 ToMV isolates retrieved from GenBank (accession numbers are shown in Fig. 1). These isolates came from different crops (tomato, pepper, eggplant, lilac, camellia, dogwood, red spruce, etc.) and natural water sources (lake, stream, melting ice) in several countries: Germany (4), Denmark: Greenland (6), Iran (4), Kazakhstan (3), China (8), Korea (2), Taiwan (1), Malaysia (1), USA (6), and Brazil (11).

Fig. 1 Maximum likelihood phylogenetic tree of the coat protein gene of 75 Tomato mosaic virus (ToMV) isolates and one Tobacco mosaic virus (TMV) isolate used as an outgroup. Only bootstrap values greater than 50% are shown. Branch lengths are proportional to genetic distances. Geographic origin and GenBank accession numbers are indicated. Numbers in parentheses indicate the number of isolates.

A multiple sequence alignment was performed with CLUSTAL W [12]. The nucleotide substitution model that best fit these data was the Tamura 3-parameter model [13], which showed the lowest BIC (Bayesian information criterion) and AICc (Akaike information criterion, corrected) scores [14]. This model was used to estimate nucleotide distances and diversities, as well as the phylogenetic relationships. Phylogenetic relationships were inferred by the maximum likelihood algorithm [14], with 1000 bootstrap replicates to estimate node significance [15], using a CPG sequence of Tobacco mosaic virus (TMV) as an outgroup. All these analyses were performed with the program MEGA 5.05 [16].
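To make the distance estimate concrete, the following is a minimal Python sketch of the Tamura 3-parameter pairwise distance, d = -h ln(1 - P/h - Q) - (1/2)(1 - h) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences and h = 2θ(1 - θ) with θ the G+C content. It is an illustration, not the MEGA implementation: the function name `tamura3_distance`, the choice to average the G+C content over the two sequences, and the handling of gaps and ambiguous bases are all assumptions made for this sketch.

```python
import math

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def tamura3_distance(seq1, seq2):
    """Tamura 3-parameter distance between two aligned nucleotide sequences.

    Illustrative sketch; name and edge-case handling are assumptions.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where both sequences carry an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    # P: proportion of transitional differences.
    p = sum(1 for a, b in pairs if (a, b) in TRANSITIONS) / n
    # Q: proportion of transversional differences.
    q = sum(1 for a, b in pairs if a != b and (a, b) not in TRANSITIONS) / n
    # G+C content, averaged over both sequences (an assumption of this sketch).
    gc = sum((a in "GC") + (b in "GC") for a, b in pairs) / (2 * n)
    h = 2 * gc * (1 - gc)
    if h == 0:
        raise ValueError("G+C content of 0 or 1: distance is undefined")
    # math.log raises ValueError if the sequences are too divergent
    # (saturated) for the correction to be applicable.
    return -h * math.log(1 - p / h - q) - 0.5 * (1 - h) * math.log(1 - 2 * q)
```

When the G+C content is exactly 0.5, h equals 0.5 and the expression reduces to the Kimura 2-parameter distance, which gives a quick sanity check on the formula.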

The phylogenetic analysis showed three main clades with high bootstrap support (Fig. 1). Clades II and III were composed only of ToMV isolates from Brazil, whereas Clade I was composed of isolates from Brazil and the rest of the world. The pairwise nucleotide distances between isolates within a clade were lower than 0.042 (mean distances of 0.013 ± 0.004 and 0.021 ± 0.007 for Clades I and III, respectively), whereas the distances between clades ranged from 0.086 to 0.207. Thus, based on CPG genetic similarity, ToMV could be classified into three genotypes. The existence of divergent isolates in Brazil and the low divergence among the isolates from the rest of the world suggested that Brazil or South America could be the origin of ToMV and that one isolate (or genotype) might have spread worldwide. The distances between ToMV isolates and the TMV sequence used as an outgroup were between 0.304 and 0.368. No correlation was found between genetic variation and the hosts from which these isolates were collected (data not shown). A similar approach has been used to reconstruct the epidemiological history (origin and spread) of some plant viruses, e.g., Papaya ringspot virus [17].

To analyze the genetic structure of ToMV in relation to its geographic distribution, the ToMV world population was divided into six geographic subpopulations: Europe (Spain and Germany), South America (Brazil), East Asia (China, Korea, Taiwan, and Malaysia), Central Asia (Kazakhstan and Iran), North America (USA), and Greenland. Nucleotide diversity was low within each subpopulation and between subpopulations (both <0.020), except for South America (Table 1), which showed a within-subpopulation nucleotide diversity of 0.095 and a diversity of about 0.070 when compared with the other subpopulations. Spatial genetic stability has also been found in other viruses of the genus Tobamovirus [18, 19] and in other plant viruses [4, 5].
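Nucleotide diversity within a subpopulation is the mean pairwise distance among its sequences, and diversity between two subpopulations is the mean distance over all pairs with one sequence drawn from each. The Python sketch below illustrates both quantities. For simplicity it uses the uncorrected p-distance as the default, whereas the analysis described here used Tamura 3-parameter distances; the function names are hypothetical, and any corrected distance can be passed through the `dist` argument.

```python
from itertools import combinations, product


def p_distance(s1, s2):
    """Proportion of aligned sites that differ (uncorrected p-distance)."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def diversity_within(seqs, dist=p_distance):
    """Mean pairwise distance among the sequences of one subpopulation."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def diversity_between(seqs_x, seqs_y, dist=p_distance):
    """Mean distance over all pairs with one sequence from each subpopulation."""
    pairs = list(product(seqs_x, seqs_y))
    if not pairs:
        raise ValueError("both subpopulations must be non-empty")
    return sum(dist(a, b) for a, b in pairs) / len(pairs)
```

A subpopulation of near-identical sequences gives a within-subpopulation value close to zero, which is the pattern reported for every subpopulation except South America.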

Table 1 Genetic diversity and population genetics parameters of geographic subpopulations of Tomato mosaic virus (ToMV)

Several methods were used to study whether natural selection played a role in the low genetic variability observed. First, the ratio of nonsynonymous to synonymous substitutions (N/S) in the CPG was estimated by the Pamilo–Bianchi–Li method [20, 21], implemented in the program MEGA 5.05. This gave a value of 0.171, indicating negative selection, and is similar to the values reported for other plant viruses [4]. Second, the N/S ratio at each individual codon was statistically tested by the fixed effects likelihood (FEL) method available from the DATAMONKEY server [22] (http://www.datamonkey.org). No positively selected codon was detected, whereas 116 codons were under neutral evolution and 43 under negative selection (data not shown). Third, Tajima’s D and Fu and Li’s statistics [23, 24] were calculated to test the mutation neutrality hypothesis using the DnaSP 5.10 program [25]. The three statistics were significantly negative (P < 0.05), suggesting strong negative or purifying selection. Negative selection could result from functional constraints imposed during the virus life cycle by interactions with the host [26, 27] and vector [28, 29]. The CPG may be subject to selective constraints because the coat protein can be involved in several functions, such as genome protection, cell-to-cell movement, and transmission between plants [30]. However, genetic drift may also have contributed to the reduction of genetic variability after the bottlenecks that virus populations sometimes undergo during their life cycle, such as cell-to-cell and systemic movement [31] or transmission between plants [32].
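As an illustration of the neutrality test, the sketch below computes Tajima's D from a set of aligned sequences using the standard formula, D = (π - S/a1) / sqrt(e1 S + e2 S(S - 1)), where π is the mean number of pairwise differences, S is the number of segregating sites, and a1, e1, and e2 depend only on the sample size. This is a minimal Python sketch and not the DnaSP implementation: the function name is hypothetical, gaps and ambiguous bases are not handled, and no P value is computed.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for aligned sequences of equal length (illustrative sketch)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # S: number of segregating (polymorphic) alignment columns.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites: D is undefined")
    # pi: mean number of nucleotide differences per pair of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants that depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants, as expected under purifying selection or after a population expansion, lowers π relative to S/a1 and gives a negative D. An excess of intermediate-frequency variants gives a positive D.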

Finally, to assess genetic differentiation and the level of gene flow between subpopulations, three permutation-based statistical tests (Ks*, Z*, and Snn) [33, 34] and the statistic Fst [35] were used (Table 1). Two subpopulations with a similar distribution of sequence variants would give an Fst value that is not significantly different from zero, whereas an Fst value of one would indicate total separation. All these tests were implemented in the DnaSP 5.10 program. The Ks*, Z*, and Snn tests were significant in most cases, suggesting genetic differentiation between subpopulations. None of the three tests was significant in the comparison between Greenland and North America, whereas when South America was compared with North America, Central Asia, or Greenland, only the Ks* test was not significant. Fst values ranged from 0.100 to 0.300 in most comparisons, suggesting limited gene flow between subpopulations, as well as between Spain and Germany within Europe and between Iran and Kazakhstan within Central Asia. Fst was higher than 0.300 when Greenland was compared with Europe and East Asia, and lower than 0.100 between Europe and East Asia and between Greenland and North America. The low diversity and relatively low gene flow found in ToMV differ from the pattern in other viruses with low gene flow but greater genetic diversity, e.g., Rice stripe virus [6], or with high gene flow, e.g., Citrus tristeza virus [36].
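The interpretation of Fst can be illustrated with one common sequence-based estimator, Fst = 1 - Hw/Hb, where Hw is the mean number of pairwise differences within subpopulations and Hb is the mean number between them. The text does not state which estimator was used, so this formula is an assumption made for illustration, as are the function names in the Python sketch below. The sketch does not perform the permutation tests (Ks*, Z*, Snn).

```python
from itertools import combinations, product


def n_differences(s1, s2):
    """Number of aligned sites at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2))


def mean_within(seqs):
    """Mean number of pairwise differences within one subpopulation."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences per subpopulation")
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


def fst(pop_x, pop_y):
    """Fst = 1 - Hw / Hb for two subpopulations (assumed estimator)."""
    # Hw: within-subpopulation diversity, averaged over the two subpopulations.
    hw = (mean_within(pop_x) + mean_within(pop_y)) / 2
    # Hb: mean number of differences between sequences from different
    # subpopulations.
    pairs = list(product(pop_x, pop_y))
    hb = sum(n_differences(a, b) for a, b in pairs) / len(pairs)
    if hb == 0:
        return 0.0  # all sequences identical: no differentiation
    return 1 - hw / hb
```

When every difference is fixed between the two subpopulations, Hw is zero and Fst equals one. With small samples, this estimator can take negative values when the two subpopulations share the same variants, and such values are usually read as no differentiation.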

In conclusion, ToMV isolates grouped into three main clades or genotypes, all of which were found in South America. One of these genotypes seems to have dispersed worldwide and shows high genetic stability, which could be favored by strong negative selection and by genetic drift after bottlenecks.