Introduction

Fish of the family Tetraodontidae have the smallest genomes known among vertebrates, about eight times smaller than the human genome. This compactness results from short introns and intergenic distances and from the scarcity of dispersed repetitive elements. In 1997 we began random sequencing of the genome of Tetraodon nigroviridis, a Tetraodontidae species whose genome is about 380 Mb (Crollius et al. 2000a). In parallel, we have completely sequenced 11 BAC clones, and several others have been sequenced at the National Institutes of Health (NIH). This paper describes the comparison of the nucleotide sequences of these BACs with the human genome, in order to estimate the rate of rearrangements that occurred during the 450 million years (Myr) of independent evolution of these species. Comparisons between the genomes of humans and other species have previously been carried out using different strategies, such as Zoo-FISH (Chowdhary et al. 1998; Wienberg and Stanyon 1995), RH mapping (Band et al. 2000; Schibler et al. 1998; Murphy et al. 2000; Watanabe et al. 1999), and sequence comparisons between the human genome and the randomly sequenced genomes of mouse (Waterston et al. 2002) or F. rubripes (Aparicio et al. 2002), or parts of genomes restricted to one chromosome (Dehal et al. 2001; Dunham et al. 1999; Hattori et al. 2000; Mural et al. 2002). In contrast, our study is based on fully sequenced BACs, their comparison with the human genome, and the extrapolation of these results to the whole genomes of both species. It shows that the level of synteny between these species is low and suggests that genomic rearrangements occurred at a rate of 16–26 per Myr during their independent evolution. This estimate is significantly higher than those obtained by comparing the human genome with other mammalian genomes, but similar to that obtained by comparing the human and F. rubripes genomes. Genome evolution by rearrangement thus proceeds at different rates depending on the vertebrate classes compared.

Methods

Genomic DNA Sequencing Strategy

DNA from BACs was extracted by standard alkaline lysis, purified on a CsCl gradient, mechanically sheared to fragments of about 5 kb, and ligated into the pCDNA2.1 vector with BstXI adaptors. Shotgun sequences were obtained on Licor sequencers using ThermoSequenase and dye-primer chemistry, with primers SP (5′-TGATTACGCCAAGCTTGGTA-3′) and LP (5′-GCGAATTGGGCCCTCTAGAT-3′). Assembly was performed with the Phrap/Phred programs. Remaining gaps were closed by direct sequencing of either the BAC or plasmid subclones.

Selection of Contig Sequences from the NIH BAC Sequences

Several of the NIH BAC sequences we used are unfinished and contain stretches of unidentified nucleotides, represented by “n.” The regions between these stretches are designated by an extension (e.g., AC113239.reg1 indicates region 1 of clone AC113239). For the synteny analyses, we used only the large contigs (Table 1).
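The region-naming convention can be illustrated with a minimal Python sketch. It assumes that every run of one or more “n”/“N” characters delimits a region and that regions are numbered from the start of the submitted sequence; the function name `split_on_n_runs` and the `min_len` cutoff for “large” contigs are hypothetical, since the actual size threshold is not stated here.

```python
import re
from typing import Dict


def split_on_n_runs(accession: str, seq: str, min_len: int = 0) -> Dict[str, str]:
    """Split an unfinished BAC sequence at runs of undetermined bases.

    Regions are numbered in order along the sequence (<accession>.reg1,
    .reg2, ...); regions shorter than min_len (a hypothetical cutoff for
    "large" contigs) are then dropped without renumbering the others.
    """
    # Split on any run of 'n' or 'N'; discard the empty strings produced
    # by leading or trailing runs.
    pieces = [p for p in re.split(r"[nN]+", seq) if p]
    return {
        f"{accession}.reg{i}": piece
        for i, piece in enumerate(pieces, start=1)
        if len(piece) >= min_len
    }


# Toy sequence for illustration only, not real data from AC113239.
regions = split_on_n_runs("AC113239", "ACGTnnnnGGCCNNAT", min_len=3)
print(regions)
```

Numbering all regions before filtering keeps a region's name stable whatever size cutoff is later applied.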

Table 1 Description of the T. nigroviridis BACs

Computer Analysis

The BAC sequences were compared with public human nucleotide databases (the gbpri and gbest divisions of GenBank) using the Exofish procedure, a modified BLAST search (Crollius et al. 2000b). Regions longer than 1 kb with no Exofish hit were compared with human, rodent, mammalian, and vertebrate sequences using TBLASTX2 with default settings, and the results were examined manually. For all human matches, the genomic location was determined from GoldenPath (http://genome.ucsc.edu/cgi-bin/hgGateway; November 2002 version).
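Selecting the regions longer than 1 kb that lack an Exofish hit amounts to taking the complement of the merged hit intervals. The Python sketch below assumes hits are given as half-open `(start, end)` coordinates on the BAC; the function name and the example coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on the BAC


def uncovered_regions(seq_len: int, hits: List[Interval],
                      min_len: int = 1000) -> List[Interval]:
    """Return stretches longer than min_len bp that no hit overlaps."""
    gaps: List[Interval] = []
    covered_to = 0  # rightmost position covered by the hits seen so far
    for start, end in sorted(hits):
        if start - covered_to > min_len:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    # Uncovered tail after the last hit.
    if seq_len - covered_to > min_len:
        gaps.append((covered_to, seq_len))
    return gaps


# Hypothetical Exofish hit coordinates on a 10-kb stretch of a BAC.
hits = [(500, 900), (800, 2000), (3500, 4000), (9500, 9900)]
print(uncovered_regions(10_000, hits))
```

Sorting first lets overlapping hits, such as the first two above, merge implicitly through `covered_to`; each returned interval would then be extracted and searched with TBLASTX2.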

For human genes located between two genes orthologous to adjacent T. nigroviridis genes, the location of the fish orthologue was determined by comparing the human protein sequence with all the sequences obtained by random sequencing of the fish genome. This was done using BLASTX (Altschul et al. 1990), retaining hits with a score above 60 and a percentage identity above 60%.
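The two cutoffs can be applied to BLAST results with a short filter. This Python sketch is written under stated assumptions: that the hits are stored in the default 12-column tabular format of modern BLAST+ (`-outfmt 6`), that “score” means the bit score, and that fish reads are the queries (BLASTX translates a nucleotide query and searches a protein database). The function name and example rows are hypothetical.

```python
import csv
import io
from typing import Dict, List

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tabular: str, min_score: float = 60.0,
                min_identity: float = 60.0) -> List[Dict[str, str]]:
    """Keep hits whose bit score and percent identity both exceed the cutoffs."""
    kept = []
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(FIELDS, row))
        if (float(hit["bitscore"]) > min_score
                and float(hit["pident"]) > min_identity):
            kept.append(hit)
    return kept


# Hypothetical hits; identifiers and values are invented for illustration.
example = (
    "fish_read1\thuman_protA\t72.5\t120\t30\t1\t1\t360\t10\t129\t1e-30\t150.2\n"
    "fish_read2\thuman_protB\t55.0\t80\t36\t0\t1\t240\t5\t84\t1e-10\t65.0\n"
    "fish_read3\thuman_protC\t65.0\t40\t14\t0\t1\t120\t1\t40\t0.01\t45.1\n"
)
print([h["qseqid"] for h in filter_hits(example)])
```

Only the first hit passes here. The second fails the identity cutoff, and the third fails the score cutoff.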

Results

Sequencing of T. nigroviridis BAC Clones

We selected BACs for complete sequencing on the basis of features of interest identified by prior sequencing of their ends: putative synteny with human chromosomes 14 and 19, homology to globin genes, and homology to genes involved in the immune response. Several clones from our BAC library were sequenced in the laboratory of E.D. Green (NIH Intramural Sequencing Center; unpublished data); some of these BAC sequences are not finished, and only their large finished regions were compared with the human genome (Table 1).

Comparisons of T. nigroviridis Genomic Sequences with Human Genomic Sequences

The alignments of these BAC sequences with human cDNAs and genomic DNA are shown in Fig. 1A–H. Overall, 199 genes were detected in these BACs by comparison with human sequences. In several cases, the human orthologue was identified by selecting the best alignment; otherwise, several possible homologues are reported. For a strict estimate of syntenic regions, we considered only cases in which adjacent genes in one species correspond to adjacent genes in the other. Overall, 32 syntenic regions were identified in 20 BACs (Table 2). These regions contain 78 genes (mean, 2.44 genes per region); the largest contains 5 genes (BX629356).
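The strict synteny criterion can be expressed as a scan over consecutive genes on each BAC contig. This Python sketch assumes each fish gene has already been reduced to a single human orthologue, encoded as a hypothetical `(human chromosome, gene rank along that chromosome)` pair. It ignores genes with several candidate homologues. A syntenic region is then a maximal run of at least two fish genes whose orthologues are consecutive in human, and every other consecutive pair is a breakpoint implying at least one rearrangement.

```python
from typing import List, Tuple

HumanHit = Tuple[str, int]  # (human chromosome, rank of the gene on it)


def is_adjacent(a: HumanHit, b: HumanHit) -> bool:
    """Strict criterion: same human chromosome and consecutive gene ranks."""
    return a[0] == b[0] and abs(a[1] - b[1]) == 1


def syntenic_regions(contig: List[HumanHit]) -> List[List[HumanHit]]:
    """Maximal runs of >= 2 fish genes whose orthologues are adjacent in human."""
    regions: List[List[HumanHit]] = []
    run = contig[:1]
    for prev, cur in zip(contig, contig[1:]):
        if is_adjacent(prev, cur):
            run.append(cur)  # extend the current syntenic run
        else:
            if len(run) >= 2:
                regions.append(run)
            run = [cur]  # start a new run at the breakpoint
    if len(run) >= 2:
        regions.append(run)
    return regions


def breakpoints(contig: List[HumanHit]) -> int:
    """Consecutive fish gene pairs whose human orthologues are not adjacent."""
    return sum(not is_adjacent(p, c) for p, c in zip(contig, contig[1:]))


# Hypothetical contigs (gene order along the fish sequence).
contig1 = [("14", 10), ("14", 11), ("14", 12), ("19", 3), ("2", 50), ("2", 51)]
contig2 = [("X", 5), ("7", 1)]
for c in (contig1, contig2):
    print(len(syntenic_regions(c)), breakpoints(c))
```

Summing the region and breakpoint counts over all contigs gives the totals used for the genome-wide extrapolation.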

Figure 1A

Alignment of T. nigroviridis BACs with the human genome. The first column gives the position of each alignment obtained by comparing the BAC sequence with the human genome, using either Exofish or TBLASTX2, and its orientation in the BAC. The second column gives the name of the human transcript, and the third its function, when described in the accession record; for several genes, no description is provided (nd, not described). The fourth column gives the alignment obtained with human genomic sequence when no match is found with any human transcript. When human genes homologous to adjacent T. nigroviridis genes are separated by genes without fish homologues, the number of intervening human genes is given in the fifth column. The last column gives the location of the gene or genomic region in the human genome. Gray blocks represent syntenic regions between H. sapiens and T. nigroviridis.

Figure 1B–1H

Table 2 Syntenic regions identified by comparison between T. nigroviridis BACs and the human genome

Several regions contain genes that are adjacent in T. nigroviridis and close to each other in the human genome, but separated there by one or several genes: in 11 cases, the human orthologues of adjacent fish genes are separated by one gene, and in four cases by two genes (Fig. 1A–H). To determine whether these intervening genes are absent from the fish genome or present but located elsewhere, their human cDNA sequences were compared with all T. nigroviridis genomic sequences not covered by these BACs. In all cases, a putative homologue was identified elsewhere in the genome, suggesting that rearrangements were responsible for the loss of synteny. Overall, 32 syntenic regions were identified, and, assuming that each absence of synteny reflects at least one rearrangement, the minimum number of rearrangements for these BACs is 131.
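Tallying the cases in which adjacent fish genes have human orthologues separated by one or two genes can be sketched in Python. The sketch uses the same hypothetical `(human chromosome, gene rank)` encoding of each orthologue; the function names and the `max_gap=2` default, chosen to match the one- and two-gene cases described above, are illustrative.

```python
from collections import Counter
from typing import List, Optional, Tuple

HumanHit = Tuple[str, int]  # (human chromosome, rank of the gene on it)


def intervening_genes(a: HumanHit, b: HumanHit) -> Optional[int]:
    """Human genes lying between two orthologues, or None if on different chromosomes."""
    if a[0] != b[0]:
        return None
    return abs(a[1] - b[1]) - 1


def near_adjacent_pairs(contig: List[HumanHit], max_gap: int = 2) -> Counter:
    """Count adjacent fish gene pairs whose orthologues are 1..max_gap genes apart."""
    tally: Counter = Counter()
    for prev, cur in zip(contig, contig[1:]):
        n = intervening_genes(prev, cur)
        if n is not None and 1 <= n <= max_gap:
            tally[n] += 1
    return tally


# Hypothetical contig: gaps of 1 gene, 2 genes, 0 genes, then a chromosome change.
contig = [("3", 10), ("3", 12), ("3", 15), ("3", 16), ("8", 4)]
print(near_adjacent_pairs(contig))
```

Each intervening human gene flagged this way would then be searched against the fish sequences outside the BACs, as described above.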

We did not compare the T. nigroviridis BACs with the mouse genome: the number of rearrangements between the human and mouse genomes has been estimated at 295 (Waterston et al. 2002), and since the evolutionary distances between T. nigroviridis and H. sapiens and between T. nigroviridis and M. musculus are identical, the numbers of rearrangements between T. nigroviridis and M. musculus are probably similar.

Number of Genes in the T. nigroviridis Genome

Two estimates of the number of genes in the human genome have been proposed: about 32,000 (Lander et al. 2001) and about 26,600 (Venter et al. 2001). Since both T. nigroviridis and H. sapiens are vertebrates, their gene numbers are expected to be roughly similar. In addition, the number of genes in F. rubripes has been estimated at about 38,000 (Aparicio et al. 2002). As this species is evolutionarily closer to T. nigroviridis than H. sapiens is (the two fish diverged about 40 Myr ago), this number must also be considered. It is supported by extrapolating the number of genes identified in the BAC clones (199 genes in 2.16 Mb) to the whole genome (∼380 Mb), which gives about 35,000 genes. A recent estimate based on the assembly of all sequences obtained so far in this species gives a comparable number (about 34,350 predicted genes; O. Jaillon, unpublished results). The number of genes in T. nigroviridis is thus estimated to lie in the 26,000–38,000 range.
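The BAC-based figure follows from scaling the observed gene density to the whole genome, under the assumption that gene density in the sequenced BACs is representative of the genome as a whole:

```latex
N_{\text{genes}} \approx \frac{199\ \text{genes}}{2.16\ \text{Mb}} \times 380\ \text{Mb}
  \approx 92.1\ \text{genes/Mb} \times 380\ \text{Mb}
  \approx 35{,}000\ \text{genes}
```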

Number of Rearrangements per Million Years

Using these estimates of the number of genes in T. nigroviridis, we extrapolated the observations made on the BACs to a genome-wide number of rearrangements per Myr (and per Myr per Mb). We first counted the rearrangements that could be identified between adjacent genes on the BACs: 131 rearrangements were identified. The lower estimate uses the lowest gene number for T. nigroviridis (26,000), and the upper estimate the highest (38,000). This gives between 19.02 rearrangements/Myr [131 × 26,000/(450 × 2 × 199)] and 27.79 rearrangements/Myr [131 × 38,000/(450 × 2 × 199)], where the factor 2 accounts for the two lineages, each having evolved independently for 450 Myr. As the genomes of T. nigroviridis and H. sapiens are 380 and 3000 Mb, respectively, the numbers of rearrangements per Myr per Mb are 0.00634–0.05004 for the lower estimate and 0.00926–0.07314 for the upper estimate (Table 3).
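The extrapolation can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the function name is hypothetical, and the inputs are the values given in the text.

```python
def rearrangements_per_myr(breaks: int, genome_genes: float, observed_genes: int,
                           divergence_myr: float = 450.0) -> float:
    """Extrapolate observed breakpoints to the whole genome and divide by the
    total branch length (two lineages, each evolving for divergence_myr)."""
    genome_wide_breaks = breaks * genome_genes / observed_genes
    return genome_wide_breaks / (2 * divergence_myr)


low = rearrangements_per_myr(131, 26_000, 199)
high = rearrangements_per_myr(131, 38_000, 199)
print(f"{low:.2f} to {high:.2f} rearrangements per Myr")

# Normalize by genome size (Mb) to get rearrangements per Myr per Mb.
for species, size_mb in [("H. sapiens", 3000), ("T. nigroviridis", 380)]:
    print(f"{species}: {low / size_mb:.5f} to {high / size_mb:.5f} per Myr per Mb")
```

The printed values match those in the text to within rounding of the last digit.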

Table 3 Rate of chromosome evolution in different taxa

Similar results are obtained without using any estimate of the number of genes in T. nigroviridis, by scaling the number of rearrangements observed in the BACs by the fraction of the genome they cover (2.16 of ∼380 Mb). This gives 25.61 rearrangements/Myr [131 × 380/(2.16 × 450 × 2)], or 0.00853–0.06739 rearrangements per Myr per Mb (Table 3).
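Written out, the coverage-based estimate scales the 131 observed rearrangements by the ratio of genome size to sequenced length and divides by the total branch length of the two lineages. It then normalizes by the sizes of the human and T. nigroviridis genomes:

```latex
r = \frac{131 \times (380\ \text{Mb} / 2.16\ \text{Mb})}{2 \times 450\ \text{Myr}}
  \approx \frac{131 \times 175.93}{900\ \text{Myr}}
  \approx 25.61\ \text{rearrangements/Myr},
\qquad
\frac{r}{3000\ \text{Mb}} \approx 8.5 \times 10^{-3},
\quad
\frac{r}{380\ \text{Mb}} \approx 6.74 \times 10^{-2}\ \text{per Myr per Mb}
```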

Discussion

During evolution, chromosome rearrangements presumably involve multiple mechanisms: intrachromosomal inversions, interchromosomal translocations, deletions, centromeric fusions, duplications, and movements of internal segments via transposable elements. The physical processes involved in the evolution of vertebrate genomes can be identified by genome comparisons. Homologous coding genes are the most valuable markers for such studies, because most of them are conserved across orders or even classes. To date, gene maps are under development for about 50 different mammals.

Comparative genome organization can be studied by cross-species chromosome painting (Zoo-FISH), as this approach can quickly and economically provide a cytogenetic map of homology. More than eight primate genomes have been studied using this technique (reviewed in Chowdhary et al. 1998; Wienberg and Stanyon 1995). Analysis of the chromosome rearrangements separating humans from the other great apes showed that chromosome conservation is interrupted by one or two rearrangements per Myr of separation.

Zoo-FISH can also be used to study the relationships between the human genome and more distantly related mammals, although this approach was initially less successful, because hybridization between probe and target weakens with evolutionary distance. Nevertheless, comparative maps have been constructed for the pig (Fronicke et al. 1996; Goureau et al. 1996; Pinton et al. 2000), cattle (Chowdhary et al. 1996; Hayes 1995), cat (Breen et al. 1999; Chowdhary et al. 1996; Wienberg et al. 1997), Indian muntjac deer (Fronicke and Scherthan 1997; Yang et al. 1997), horse (Raudsepp et al. 1996), sheep, and dog (Breen et al. 1999) genomes. These comparisons yielded rearrangement rates of 0.14–0.34 per Myr, corresponding to about 0.000046–0.000113 rearrangement per Myr per Mb. However, the resolution of Zoo-FISH is limited to 10 Mb (Wienberg and Stanyon 1998), and Zoo-FISH does not detect intrachromosomal inversions.

Rearrangement rates have also been estimated from mammalian gene maps constructed by whole-genome radiation hybrid (RH) mapping, which has become a powerful and expedient method for ordering genes. Comparison of the cattle and human maps showed that 105 regions are conserved between these genomes, and the rate of genomic rearrangement was thus estimated at about 0.58 per Myr (0.00019 per Myr per Mb [Band et al. 2000]). In the goat genome, 62 conserved fragments were identified (0.34 disruption per Myr, 0.00011 per Myr per Mb [Schibler et al. 1998]), 100 in the cat (0.55 disruption per Myr, 0.00018 per Myr per Mb [Murphy et al. 2000]), and 109 in the rat (0.49 disruption per Myr, 0.00016 per Myr per Mb [Watanabe et al. 1999]).
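The per-Mb values quoted for the RH maps correspond to dividing each genome-wide rate by a 3000-Mb genome, the human genome size used elsewhere in this article. A minimal Python sketch of that conversion follows; the function name is hypothetical.

```python
HUMAN_GENOME_MB = 3000  # genome size used for normalization in the text

# RH-based rearrangement rates (per Myr) quoted in the text.
rh_rates = {"cattle": 0.58, "goat": 0.34, "cat": 0.55, "rat": 0.49}


def per_myr_per_mb(rate_per_myr: float, genome_mb: float = HUMAN_GENOME_MB) -> float:
    """Convert a genome-wide rate (per Myr) into a rate per Myr per Mb."""
    return rate_per_myr / genome_mb


for species, rate in rh_rates.items():
    print(f"{species}: {per_myr_per_mb(rate):.5f} per Myr per Mb")
```

Using the same normalization for every comparison keeps the mammalian rates directly comparable with the T. nigroviridis and F. rubripes estimates.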

Rearrangement rates have also been estimated from comparisons between the whole mouse and human genomes (Davisson et al. 1998; Novacek 1992) or between single chromosomes of one species and the whole genome of the other (Dehal et al. 2001; Dunham et al. 1999; Hattori et al. 2000; Mural et al. 2002). These comparisons identified 195 conserved segments and suggested that 0.88 rearrangement per Myr occurred during the 220 Myr of separate evolution of these species (0.00029 disruption per Myr per Mb). These estimates were confirmed by the recent comparison of the whole mouse genome with the human genome (Waterston et al. 2002), which suggests that 295 rearrangements occurred, corresponding to 1.34 rearrangements per Myr (0.00044–0.00054 rearrangement per Myr per Mb). The mouse genome thus shows an unusually high number of rearrangements relative to humans: the rearrangement rate per Myr is about two- to fourfold higher for the human–mouse comparison than for comparisons between humans and other mammals.

In all mammalian species studied so far, the gene content of the X chromosome is completely conserved, although gene order has been disrupted several times. This conservation of X-linked genes in mammals was predicted by Ohno (1973) and is attributed to the special mechanism of dosage compensation.

The F. rubripes genome has been sequenced by a shotgun strategy (Aparicio et al. 2002), and the sequences have been assembled and annotated computationally. Comparison with the human genome revealed the extent of rearrangements during evolution, as only 12.6 Mb of this genome shows perfect conservation with the human genome. As the F. rubripes genome is about 365 Mb long and contains about 38,000 genes, these data suggest a rate of about 41 rearrangements per Myr (0.01359–0.11168 disruption per Myr per Mb). This range is higher than that obtained from the T. nigroviridis–H. sapiens comparison, but the two overlap.

Our data indicate that, at least for the BACs analyzed in this article, a large number of interchromosomal or intrachromosomal translocations have occurred in the 450 Myr since the divergence of H. sapiens and T. nigroviridis. Extrapolating this value to the whole genomes suggests that rearrangements occurred significantly less frequently during the independent evolution of humans and other mammals than during that of humans and Tetraodontidae. Particular population dynamics, linked to the large number of ecological niches available to aquatic species, could have accelerated chromosome rearrangements. Another factor could be the generation time of Tetraodontidae fish, which is significantly shorter than that of mammalian species.

This estimate is limited by several factors. First, some genes have changed rapidly during vertebrate evolution and may not be detected by comparisons between the human and fish genomes. The most recent comparison, using all T. nigroviridis sequences obtained so far (covering the genome 7×), showed that Exofish produces an alignment for ∼75% of human genes. We tried to increase this percentage by using TBLASTX2 with default settings in regions with no Exofish alignment, and several new alignments were obtained in this way, but some genes may still go undetected. Second, although the human genome is almost entirely sequenced, the T. nigroviridis sequences were obtained by random sequencing. The genome of this species is thus not fully sequenced, which prevents the precise identification of orthologous genes. A better estimate will be obtained as the whole-genome assembly of T. nigroviridis, now under way in our laboratory, progresses and is compared globally with the human genome.