Abstract
Arabidopsis thaliana is an important model system for the study of plant biology. We have analyzed the complete genome sequences of Arabidopsis by using a newly developed windowless method for the GC content computation, the cumulative GC profile. It is shown that the Arabidopsis genome is organized into a mosaic structure of isochores. All the centromeric regions are located in GC-rich isochores, called centromere-isochores, which are characterized by a high GC content but low gene and T-DNA insertion densities. This characteristic distinguishes centromere-isochores from the other class of GC-rich isochores, called GC-isochores, which have high gene and T-DNA insertion densities. Consequently, 15 isochores have been identified, i.e., 7 AT-isochores, 3 GC-isochores, and 5 centromere-isochores. The genes in centromere-isochores, which have the highest GC content, have much shorter intron lengths and lower intron numbers, compared to those of the other two types. There is also considerable difference in the numbers and lengths of transposable elements (TEs) between AT and GC-isochores, i.e., the TE number (length) of AT-isochores is 6.3 (7.3) times that of GC-isochores. It is generally believed that TEs are accumulated in the regions surrounding the centromeres. However, within these TE-rich regions, there are regions of extremely low TE numbers (TE deserts), which correspond to the positions of centromere-isochores. In addition, a heterochromatic knob is located at the boundary of an AT-isochore. Furthermore, we show that the differences in GC content among isochores are mainly due to the GC content variation of introns, the third codon positions and intergenic regions.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
Introduction
Arabidopsis thaliana, a flowering plant, is an important model system for the study of plant biology. Based on the sequencing efforts of the Arabidopsis Genome Initiative (AGI) established in 1996, the genome sequences of all five chromosomes have been completely sequenced (AGI 2000). The availability of the complete Arabidopsis genome sequence provides an unprecedented opportunity to study the global genome organization at the sequence level.
The isochore structure is referred to the phenomenon that in some eukaryotic genomes, the genome is organized into mosaics which are characterized by a fairly constant average GC content over scales of hundreds of kilobases and by abrupt change to another fairly constant-GC-content region (Bernardi 1995; Macaya et al. 1976). More than 10 years ago, Bernardi and coworkers analyzed the isochore structures of plants by density gradient ultracentrifugation experiments (Matassi et al. 1989; Montero et al. 1990; Salinas et al. 1988). The isochore structures of the Arabidopsis genome have also been investigated at the sequence level (Nekrutenko and Li 2000; Oliver et al. 2001).
In this report, we analyzed the isochore structures of the Arabidopsis genome by a newly developed windowless technique for the GC content computation, the cumulative GC profile. Consequently, 15 isochores have been identified. These isochores have a fairly homogeneous GC content and appear in the genome alternatively, with relatively sharp boundaries. The isochores are classified into three types, i.e., AT-, GC-, and centromere-isochores. The three types are distinct in terms of GC content, gene density and T-DNA insertion density, transposable element (TE) distribution. It is generally believed that TEs are accumulated in the regions surrounding the centromeres. Surprisingly, we found that within these TE-rich regions, there are regions of extremely low TE numbers (TE deserts), which correspond to the position of centromere-isochores. In addition, a heterochromatic knob is located at the boundary of an AT-isochore. The source of GC content variation among isochores was analyzed and shown to be mainly due to the differences of GC content at introns, the third codon positions and intergenic sequences.
Materials and Methods
The genome sequences of Arabidopsis were downloaded from http://www.ncbi.nlm.nih.gov. The TE data were based on Wright et al. (2003).
The Z-curve is a three-dimensional space curve constituting the unique representation of a given DNA sequence in the sense that each can be uniquely reconstructed given the other (Zhang and Zhang 1991, 1994). Based on the Z-curve, any DNA sequence can be uniquely described by three independent distributions, i.e., x n , y n , and z n . In particular, z n displays the distribution of bases of GC/AT types along the sequence, which is calculated as follows (Zhang and Zhang 1991, 1994)
where A n , C n , G n , and T n are the cumulative numbers of the bases A, C, G, and T, respectively, occurring in the subsequence from the first base to the nth base in the DNA sequence inspected, A0 = C0 = G0 = T0 = 0, z n = 0. By viewing the z n ∼ n curve, many global and local features of the GC content can be detected in a perceivable way.
For most genome or chromosome sequences, the curves of z n ∼ n are roughly straight lines. To amplify the variations, the curve of z n ∼ n is fitted by a straight line using the least square technique,
where (z, n) is the coordinate of a point on the fitted straight line and k is its slope. Instead of using the curve of z n ∼ n, we will use the z ′ n ≍ n curve, or simply z′ curve hereafter, where
Therefore, the variations of z n ∼ n curve deviated from the straight line, which corresponds to a constant GC content (see Eq. [4] below), are protruded by the z ′ n ≍ n curve. The z′ curve or the cumulative GC profile are used interchangeably in this paper. Let \(\overline {{\text{GC}}}\) denote the average GC content within a region Δn in a sequence, it was shown that (Zhang et al. 2001)
where k′ = Δz ′ n /Δn is the average slope of the z′ curve within the region Δn. It is clear to see from Eq. (4) that an up jump in the z′ curve, i.e., k′ > 0, indicates a decrease in GC content or an increase in AT content, whereas a drop in the z′ curve, i.e., k′ < 0, indicates an increase in GC content or a decrease in AT content. In addition, if a region in the z′ curve is a purely (approximately) straight line, then the GC content keeps absolutely (approximately) constant within this region. Any sharp maximum (minimum) point in the z′ curve indicates a turning point, where the GC content undergoes an abrupt change from a relatively GC-poor (GC-rich) region to a relatively GC-rich (GC-poor) region. The region Δn is usually chosen to be a fragment of a natural DNA sequence, e.g., isochore. The above method to calculate the GC content is called the windowless technique (Zhang et al. 2001).
The concept of isochores is related to domains of relatively homogeneous GC content with large scales in genomes, in which the variations of GC content may be considered to be small. Based on the z′ curve, a homogeneity index h, which describes the smallness of the GC content variations in isochores, was introduced (Zhang and Zhang 2003):
where
where z ′ n ≍ n is the cumulative GC profile defined in Eq. (3) for the isochore and the entire chromosome studied, respectively, and M and N are their lengths. If h ≪ 1, the variations of the GC content of the isochore may be considered to be small. No prior knowledge is available to define isochores based on h. In the current study, we arbitrarily chose h = 0.2 as the threshold of isochores.
One-way ANOVA tests were performed when comparing multiple groups of samples, whereas Student t-tests were performed when comparing two groups of samples, unless indicated otherwise.
Results
Features of the z′ Curves for the Five Chromosomes of Arabidopsis
The z′ curves for all five chromosomes of Arabidopsis are shown in Fig. 1. The cumulative GC profiles show that in these chromosomes long domains that have relatively homogeneous GC content exist, as reflected by the fact that some regions of the cumulative GC profiles can be approximately described by straight lines. The domains that have relatively homogeneous GC content are isochores. A quantitative index is used to assess the homogeneity of the GC content of isochores and this is discussed in another section. In addition, the GC-rich isochores are followed immediately by AT-rich isochores, and vice versa, clearly indicating a mosaic structure of the genome. In addition, one striking feature is that all the centromeric regions of the five chromosomes are located within GC-rich isochores. Furthermore, the overall patterns of GC content variation can be roughly classified into two groups: those of chromosomes 1, 3, and 5 are highly similar, and those of chromosomes 2 and 4 are highly similar.
Isochores of Arabidopsis and Their Classification
A total of 15 isochores have been identified in the Arabidopsis genome. In a previous work (Zhang and Zhang 2003), we classified the isochores into two types, i.e., GC-isochores and AT-isochores. The GC (AT-)-isochores are isochores whose GC content is higher (lower) than that of the chromosome where the isochores are located. The gene density of GC-isochores is usually high, whereas that of AT-isochores is usually low. In the Arabidopsis genome, all the centromeric regions are located in five GC-rich isochores. Although these five GC-rich isochores have a high GC content, they are distinct from the other class of GC-rich isochores in terms of many features, such as gene and T-DNA insertion densities. A much lower gene density was found in these five GC-isochores that are associated with centromeric regions (Table 1). Therefore, we classify the isochores that are associated with the centromeric regions as another class, the centromere-isochores.
Gene Distribution Among Isochores
In mammalian genomes, genes are preferentially distributed in high GC regions (Bernardi 1995; Lander et al. 2001). In human isochores, a higher gene density was found in H3 isochores (more GC-rich) than in L isochores (less GC-rich) (Bernardi 2000), which was subsequently confirmed based on isochores identified by a compositional segmentation method (Oliver et al. 2002). This correlation also holds, generally, for the Arabidopsis genome (Fig. 2A). The average GC contents for AT-isochores and GC-isochores are 0.340 and 0.368, respectively (Table 2) (p < 0.01). The gene density in AT-isochores and GC-isochores is 215 and 291/Mb, respectively (p < 0.001). The third type of isochores, centromere-isochores, although they have the highest GC content, 0.405 (p < 0.0001 compared with that of AT-isochores and p < 0.01 compared with that of GC-isochores), have the lowest gene density, 77/Mb (p < 0.001 compared with that of GC-isochores). Therefore, the centromere-isochores are distinct from the other class of GC-rich isochores, GC-isochores, which are characterized by a high GC content and a high gene density.
T-DNA Insertion Site Distribution Among Isochores
The distribution of the integration of transferred DNA (T-DNA) was investigated in the genome of Arabidopsis in 2000 (Barakat et al. 2000). Recently, over 225,000 independent Agrobacterium T-DNA insertion events in the Arabidopsis genome have been created that represent near-saturation of the gene space. The precise locations were determined for more than 88,000 T-DNA insertions. Genome-wide analysis of the distribution of integration events revealed the existence of a large integration site bias at the chromosome level (Alonso et al. 2003).
We studied the distribution of T-DNA insertion sites among isochores and found that T-DNA is most preferentially integrated into GC-isochores and most unfavorably integrated into centromere-isochores. The T-DNA insertion densities for AT, GC, and centromere-isochores are 2397, 3532, and 1605 sites/Mb, respectively (p < 0.0001) (Table 2 and Fig. 2B).
Although the precise mechanism of T-DNA integration in the host genome is not fully understood, one possible reason for the biased integration sites is that the biased integration is due to the different chromatin structures. For example, the integration may be promoted by increased chromatin accessibility in transcribed regions, thereby removing inhibitory effects of unfavorable chromatin environment (Schroder et al. 2002). Indeed, it has recently been found that sites of HIV integration in the human genome are not randomly distributed but instead are enriched in active genes (Schroder et al. 2002). Therefore, the biased distribution of T-DNA insertion sites among these isochores may reflect the difference in the chromatin structures of the three classes of isochores.
Transposable Element Distribution Among Isochores
Transposable elements (TEs) have been found in all eukaryotes, and these elements have the ability to move into new locations in chromosomes. The locations of TEs are highly biased, and in the Arabidopsis genome, it is generally believed that TEs are accumulated in the regions surrounding the centromeres (Copenhaver et al. 1999; AGI 2000; Wright et al. 2003). Recently, the locations of all TEs in the Arabidopsis genome were determined based on an Arabidopsis TE database (http://www.tebureau.mcgill.ca/ ). By using these data, we analyzed the TE distribution among isochores.
Based on the study of TE distribution along genomes, it is shown that regions surrounding the centromeric regions indeed contain high concentration of TEs. However, we noticed that in these TE-rich regions, there are regions that have extremely low numbers of TEs (TE deserts). Strikingly, these TE deserts correspond to the locations of centromere-isochores. Refer to Fig. 3 for an example based on chromosome 1. One possible explanation is that the regions corresponding to TE deserts are critical for centromere functions, and therefore, insertions of TEs into these regions are deleterious and eliminated by natural selection.
There is also considerable difference of the TE numbers and lengths between the AT- and the GC-isochores. The average TE numbers of AT- and GC-isochores are 84.16 and 13.39/Mb, respectively; the average TE lengths are 124 and 17 kb/Mb, respectively (Table 2 and Figs. 2C and D) (p < 0.01). Therefore, although there is only a 2.8% difference in GC content, the TE number (length) of AT-isochores is 6.3 (7.3) times that of GC-isochores. As mentioned previously, AT-isochores have a lower gene density than GC-isochores. The distributions of TEs in AT- and GC-isochores are consistent with the result of Wright et al. (2003), which shows a negative correlation between gene density and TE abundance.
Intron Number and Length Distributions AmongIsochores
It has been found that two classes of genes are present in Arabidopsis. One class of genes, the GC-rich class, has relatively low intron numbers and short concatenated intron lengths. The other class of genes, the GC-poor class, has relatively high intron numbers and long concatenated intron lengths (Barakat et al. 1998; Carels and Bernardi 2000). We analyzed the intron length and number distributions among isochores. The intron numbers of AT, GC, and centromere-isochores are 3.81, 4.58, and 2.70, respectively (p < 0.001); intron lengths are 645, 728, and 450 bp, respectively (p < 0.001) (Table 2). Therefore, genes within the most GC-rich class of isochores, centromere-isochores, have a much lower intron number and shorter intron length, than those of the other two classes (Figs. 2D and E). However, genes in the GC-isochores do not have lower intron numbers and shorter intron lengths. Therefore, it is likely that in the previously found two classes of genes (Carels and Bernardi 2000), the GC-rich class, which has low intron numbers and short lengths, contains the genes in the centromere-isochores.
A Heterochromatic Knob Is Located at an Isochore Boundary
The heterochromatic knobs were first observed by McClintock (1929) in the maize genome. These knobs are cytologically detectable, darkly stained heterochromatic regions present on the maize pathytene chromosomes. Many genetic effects are linked to the heterochromatic knobs. For instance, in the maize genome, the heterochromatic knobs were found to affect the recombination frequency and chromosome behavior in microspore divisions (Rhoades 1978; Rhoades and Dempsey 1973). Chromosome 5 of the Arabidopsis genome has a heterochromatic knob (AGI 2000) and the location is at the boundary of an AT-isochore. At the location of the heterochromatic knob, the genome undergoes a relatively abrupt change from a GC-rich region to an AT-rich region. However, chromosome 4 also has a heterochromatic knob, which is not close to any isochore boundary. Therefore, there is a possibility that the correlation between the heterochromatic knob and the isochore boundary in chromosome 5 is due to coincidence. Further information on heterochromatic knobs is needed to investigate this issue. If there is indeed a correlation between heterochromatic knobs and isochores, the correlation between these two structures may provide further insight into the origin of isochores. In addition, the cumulative GC profiles for chromosomes 1, 3, and 5 show a similar overall pattern (Fig. 1), therefore, it will be interesting to investigate the corresponding parts of chromosomes 1 and 3, to examine whether there are also heterochromatic knob structures.
Features of Unassigned Regions
In each chromosome, besides isochore regions, there are other unassigned regions, which are not isochores. We have also studied various features of these unassigned regions (Tables 1 and 2). The average gene density of these regions is 267.55/Mb; the T-DNA insertion number is 3146.31/Mb; the TE number is 31.09/Mb; the TE length is 40,952.24/Mb; the intron number is 4.25/gene; the intron length is 686.03/gene; and the GC contents of exons, introns, intergenic regions, GC12, and GC3 are 0.438, 0.333, 0.324, 0.452, and 0.427, respectively. In brief, all these features are in between those of AT- and GC-isochores. Because the GC content of these regions is in between those of the AT- and GC-isochore, it appears that the GC content is a critical factor in determining all these features.
Discussion
Source of the GC Content Variations Among Isochores
To investigate the source of the GC content variation among isochores, we calculated the GC content of exons, introns, GC12, GC3, and intergenic regions among different classes of isochores (Tables 1 and 2 and Fig. 2). Generally, the GC contents of all these different regions of AT-isochores are less than those of GC-isochores, which are less than those of centromere-isochores. For the AT-, GC-, and centromere-isochores, the GC contents of exons are 0.434, 0.442, and 0.468, respectively (p < 0.01); the GC contents of introns are 0.332, 0.336, and 0.399, respectively (p < 0.01); the GC contents of intergenic regions are 0.317, 0.330, 0.389, respectively (p < 0.01); the GC contents of GC12 are 0.447, 0.453, and 0.462, respectively (p < 0.05); and the GC contents of GC3 are 0.419, 0.437, and 0.468, respectively (p < 0.01). There is a clear difference, however, between GC12 and the GC content of noncoding regions, i.e., the GC content differences in noncoding regions among isochores are much more than those of GC12. For instance, the difference in GC12 between centromere and GC-isochore is 0.009, whereas the difference in GC content of introns is 0.063; the difference in GC3 is 0.031; and the difference in GC content of intergenic regions is 0.059. Therefore, the difference in isochore GC content is likely to be largely due to the variation in GC content in noncoding regions. This observation is consistent with the GC variation in human isochores, i.e., GC3 variation is greater than the GC content variation of isochores (Clay et al. 1996). These noncoding regions are believed to have less selective pressure than coding regions. Therefore, this GC variation pattern appears to support the view that isochores are due to the mutational bias along genomes (Eyre-Walker 1999; Eyre-Walker and Hurst 2001).
Comparison of Arabidopsis Isochores with Those of the Human Genome
Recently, the isochore structures of the human genome have been identified based on the cumulative GC profile (Zhang and Zhang 2003). The isochore structures of Arabidopsis are distinct from those of human in several aspects. First, the variations in GC content between AT- and GC-isochores are different. In the human genome, the average GC contents for AT- and GC-isochores are 0.38 and 0.47, respectively. Therefore, GC-isochores are 9% higher in GC content than AT-isochores. In the Arabidopsis genome, the average GC contents for AT- and GC-isochores are 0.34 and 0.37, respectively. Therefore, there is only a 3% difference in terms of GC content between these two types of isochores. However, isochore structures are still clear even though the GC content difference is not as large as those of the human genome. Another striking difference is that in the human genome, the type centromere-isochore does not exist. The behaviors of the GC-isochores that harbor the centromeric regions show no difference from that of other GC-isochores in the human genome. This difference may reflect the characteristic organization of the centromeric region of Arabidopsis (Haupt et al. 2001). In addition, the relative proportion of the AT-, GC-, and centromere-isochores in the Arabidopsis genome is 17.45, 53.65, and 7.37%, respectively. However, in the human genome, the GC-rich isochores (H3) only represent ∼3–5% (Bernardi 1995). Furthermore, in the Arabidopsis genome, the GC-isochores (8.01 Mb) are longer than the AT-isochores (2.49 Mb) and there are fewer GC-isochores (3) than AT-isochores (7), whereas in the human genome, GC-isochores (11.25 Mb) are shorter than AT-isochores (13.49 Mb) and there are more GC-isochores (34) than AT-isochores (22).
Comparison of the Cumulative GC Profile with theGC Content Distribution Based on Other Methods
As a routine procedure, the GC content distribution is computed by a sliding-window technique (Lander et al. 2001; Waterston et al. 2002). In this method, the number of G and C residues are counted within a window, therefore, the size of the window can be considered as the resolution of the GC content. The resolution of this method is usually low, because a small window size leads to large statistical fluctuations. In addition, the GC content distribution obtained by window-based methods is dependent on the window sizes chosen; in contrast, the cumulative GC profile is unique for a genome. A comparison between the window-based and the windowless approaches has been detailed in the literature (Li 2001; Zhang and Zhang 2003). We want to emphasize that in genomes with larger GC content variation, such as the human genome, the homogeneous GC content domains can be revealed by window-based methods, although the boundaries sometimes cannot be determined precisely (Pavlicek et al. 2002). However, for the genomes with relative low GC content variations, the disadvantages of window-based methods are more obvious, i.e., the window-based methods usually show a complex pattern. For example, the GC content distribution of the Arabidopsis genome is plotted based on 20-kb sliding windows (Fig. 4C), which shows a complex pattern, and boundaries between domains of the GC distribution are totally blurred by the variations. On the contrary, the z′ curve clearly shows five domains of GC distribution, i.e., two AT-isochores, two GC-isochores, and a domain with large GC variation (Figs. 3A and B). Therefore, the isochore structures and their boundaries can be revealed by the cumulative GC profiles clearly.
Another windowless tool to analyze genome heterogeneity is compositional segmentation, and this technique has also been used to study the isochore structures of eukaryotic genomes (Li 2001; Oliver et al. 2001). The isochores of the Arabidopsis genome has also been studied by the entropic segmentation method (Oliver et al. 2001). Please also visit http://bioinfo2.ugr.es/isochores/ for details about the method. As a comparison, the isochores of chromosome 3, obtained based on the entropic segmentaton method, are shown in Fig. 4D. One apparent difference is that much more isochores (56) were determined by the entropic segmentation method than those by the z′ curve. Another difference is that all regions of the chromosome are classified into different isochores based on the entropic segmentation method, whereas there are some unassigned regions, which are not isochores, based on the z′ curve. However, some segmentation points of both methods are highly consistent. For instance, the isochore boundaries obtained based on the z′ curve overlap well with some of the segmentation points based on the entropic segmentation method. In addition, among the 56 isochores obtained based on the entropic segmentation method, many boundaries correspond to some jumps in the z′ curve. Therefore, in some aspects, the two methods are consistent. However, the z′ curve appears to be more intuitive and can give a picture of the global GC content distribution along genomes.
Definition of Isochores
Identification of isochore structures in eukaryotic genomes provides much insight into the understanding of the genome organization, because of the clear functional implications of isochores. For instance, isochores have been correlated with gene density (Zoubak et al. 1996), chromosome bands (Saccone et al. 1993), and repeat elements (Meunier-Rotival et al. 1982). In the Arabidopsis genome, the isochores identified have been shown to be related to gene density, T-DNA insertion density, TE density, intron length, and so on. Although isochores have been known for more than 25 years, currently no clear definition of isochores is available. We defined the index, h, to assess the relative homogeneity of isochores compared to the variation in GC content of the whole genome. In this study, we arbitrarily chose a threshold of the homogeneity index, h, to be 0.20 for isochores. In fact, the homogeneity index, h, is more suitable to be an index to assess the relative homogeneity of isochores, rather than a definition. By using the entropic segmentation method, genomes can be split into many segments (isochores) objectively, based on segmentation points (Oliver et al. 2001). For instance, 56 isochors were found in chromosome 3 of the Arabidopsis genome (Oliver et al. 2001). However, it seems to be unreasonable that every region of a genome is an isochore. Therefore, due to the lack of a clear definition, most isochores identified so far are quite subjective.
The homogeneity of the GC content of isochores should be considered to be relative, whereas boundaries of isochores are absolute. No strict isochores that have an absolutely constant GC content have been found in the human genome, as well as other genomes. In terms of the homogeneity index h, h cannot be equal to 0. In some sense, the homogeneity of GC content of isochores is not as important as their functional implications. Isochores are a segment of genome DNA sequences, in which many characteristics, such as the gene density and repeat density, are different from those of other isochores (Meunier-Rotival et al. 1982; Zoubak et al. 1996). Therefore, isochores may be deemed as function domains of genomes or chromosomes, whose boundaries have critical biological meanings. For example, the boundary between Class II and Class III isochores in the human MHC sequence correspond to the change in replication timing (Tenzen et al. 1997). In the Arabidopsis genome, the centromere-isochores correspond to TE deserts. The problem is how to find these boundaries both experimentally and theoretically. The cumulative GC profile is one of the available tools (Li et al. 2002; Oliver et al. 2001; Peshkin and Gelfand 1999) to determine the isochore boundaries. The characterization of isochores and their boundaries will provide a solution to define isochores based on their biological functions, and in this regard, the isochore boundary appears to be more important than the homogeneity of GC content of isochores.
References
JM Alonso AN Stepanova TJ Leisse CJ Kim H Chen P Shinn DK Stevenson J Zimmerman P Barajas R Cheuk C Gadrinab C Heller A Jeske E Koesema CC Meyers H Parker L Prednis Y Ansari N Choy H Deen M Geralt N Hazari E Hom M Karnes C Mulholland R Ndubaku I Schmidt P Guzman L Aguilar-Henonin M Schmid D Weigel DE Carter T Marchand E Risseeuw D Brogden A Zeko WL Crosby CC Berry JR Ecker (2003) ArticleTitleGenome-wide insertional mutagenesis of Arabidopsis thaliana Science 301 653–657 Occurrence Handle10.1126/science.1086391 Occurrence Handle12893945
A Barakat G Matassi G Bernard (1998) ArticleTitleDistribution of genes in the genome of Arabidopsis thaliana and its implications for the genome organization of plants Proc Natl Acad Sci USA 95 10044–10049 Occurrence Handle1:CAS:528:DyaK1cXlsFWisbY%3D Occurrence Handle9707597
A Barakat P Gallois M Raynal D Mestre-Ortega C Sallaud E Guiderdoni M Delseny G Bernardi (2000) ArticleTitleThe distribution of T-DNA in the genomes of transgenic Arabidopsis and rice FEBS Lett 471 161–164 Occurrence Handle10.1016/S0014-5793(00)01393-4 Occurrence Handle1:CAS:528:DC%2BD3cXitlOltb0%3D Occurrence Handle10767414
G Bernardi (1995) ArticleTitleThe human genome: Organization and evolutionary history Annu Rev Genet 29 445–476 Occurrence Handle1:STN:280:BymH383os1I%3D Occurrence Handle8825483
G Bernardi (2000) ArticleTitleIsochores and the evolutionary genomics of vertebrates Gene 241 3–l7 Occurrence Handle10.1016/S0378-1119(99)00485-0 Occurrence Handle1:CAS:528:DyaK1MXotVGksrw%3D Occurrence Handle10607893
N Carels G Bernardi (2000) ArticleTitleTwo classes of genes in plants Genetics 154 1819–1825 Occurrence Handle1:CAS:528:DC%2BD3cXjtVams7s%3D Occurrence Handle10747072
O Clay S Caccio S Zoubak D Mouchiroud G Bernardi (1996) ArticleTitleHuman coding and noncoding DNA: compositional correlations Mol Phylogenet Evol 5 2–12 Occurrence Handle1:CAS:528:DyaK28XitVyrsLY%3D Occurrence Handle8673288
GP Copenhaver K Nickel T Kuromori MI Benito S Kaul X Lin M Bevan G Murphy B Harris LD Parnell WR McCombie RA Martienssen M Marra D Preuss (1999) ArticleTitleGenetic definition and sequence analysis of Arabidopsis centromeres Science 286 2468–2474 Occurrence Handle1:CAS:528:DC%2BD3cXmvVak Occurrence Handle10617454
A Eyre-Walker (1999) ArticleTitleEvidence of selection on silent site base composition in mammals: Potential implications for the evolution of isochores and junk DNA Genetics 152 675–683 Occurrence Handle1:CAS:528:DyaK1MXktFGgurw%3D Occurrence Handle10353909
A Eyre-Walker LD Hurst (2001) ArticleTitleThe evolution of isochores Nat Rev Genet 2 549–555 Occurrence Handle1:CAS:528:DC%2BD3MXltVCksb8%3D Occurrence Handle11433361
W Haupt TC Fischer S Winderl P Fransz RA Torres-Ruiz (2001) ArticleTitleThe centromere l (CENl) region of Arabidopsis thaliana: Architecture and functional impact of chromatin Plant J 27 285–296 Occurrence Handle1:CAS:528:DC%2BD3MXmvFKrtLo%3D Occurrence Handle11532174
ES Lander LM Linton B Birren C Nusbaum MC Zody J Baldwin K Devon K Dewar M Doyle W FitzHugh R Funke D Gage K Harris A Heaford J Howland L Kann J Lehoczky R LeVine P McEwan K McKernan J Meldrim JP Mesirov C Miranda W Morris J Naylor C Raymond M Rosetti R Santos A Sheridan C Sougnez N Stange-Thomann N Stojanovic A Subramanian D Wyman J Rogers J Sulston R Ainscough S Beck D Bentley J Burton C Clee N Carter A Coulson R Deadman P Deloukas A Dunham I Dunham R Durbin L French D Grafham S Gregory T Hubbard S Humphray A Hunt M Jones C Lloyd A McMurray L Matthews S Mercer S Milne JC Mullikin A Mungall R Plumb M Ross R Shownkeen S Sims RH Waterston RK Wilson LW Hillier JD McPherson MA Marra ER Mardis LA Fulton AT Chinwalla KH Pepin WR Gish SL Chissoe MC Wendl KD Delehaunty TL Miner A Delehaunty JB Kramer LL Cook RS Fulton DL Johnson PJ Minx SW Clifton T Hawkins E Branscomb P Predki P Richardson S Wenning T Slezak N Doggett JF Cheng A Olsen S Lucas C Elkin E Uberbacher M, Frazier et al. (2001) ArticleTitleInitial sequencing and analysis of the human genome Nature 409 860–921 Occurrence Handle10.1038/35057062 Occurrence Handle11237011
W Li (2001) ArticleTitleDelineating relative homogeneous G+C domains in DNA sequences Gene 276 57–72 Occurrence Handle1:CAS:528:DC%2BD3MXntlKmu74%3D Occurrence Handle11591472
W Li P Bernaola-Galvan F Haghighi I Grosse (2002) ArticleTitleApplications of recursive segmentation to the analysis of DNA sequences Comput Chem 26 491–510 Occurrence Handle1:CAS:528:DC%2BD38XksFersbc%3D Occurrence Handle12144178
G Macaya JP Thiery G Bernardi (1976) ArticleTitleAn approach on the organization of eukaryotic genomes at a macromolecular level J Mol Biol 108 237–254 Occurrence Handle1:CAS:528:DyaE2sXis12qtg%3D%3D Occurrence Handle826644
G Matassi LM Montero J Salinas G Bernardi (1989) ArticleTitleThe isochore organization and the compositional distribution of homologous coding sequences in the nuclear genome of plants Nucleic Acids Res 17 5273–5290 Occurrence Handle1:CAS:528:DyaL1MXltFSisbY%3D Occurrence Handle2762126
B McClintock (1929) ArticleTitleChromosome morphology in Zea mays Science 69 629
M Meunier-Rotival P Soriano G Cuny F Strauss G Bernardi (1982) ArticleTitleSequence organization and genomic distribution of the major family of interspersed repeats of mouse DNA Proc Natl Acad Sci USA 79 355–359 Occurrence Handle1:CAS:528:DyaL38XhtFKrs74%3D Occurrence Handle6281768
LM Montero J Salinas G Matassi Bernardi (1990) ArticleTitleGene distribution and isochore organization in the nuclear genome of plants Nucleic Acids Res 18 1859–1867 Occurrence Handle1:CAS:528:DyaK3cXktVWns78%3D Occurrence Handle2336360
A Nekrutenko WH Li (2000) ArticleTitleAssessment of compositional heterogeneity within and between eukaryotic genomes Genome Res 10 1986–1995 Occurrence Handle1:CAS:528:DC%2BD3MXjs1I%3D Occurrence Handle11116093
JL Oliver P Bernaola-Galvan P Carpena R Roman-Roldan (2001) ArticleTitleIsochore chromosome maps of eukaryotic genomes Gene 276 47–56 Occurrence Handle1:CAS:528:DC%2BD3MXntlKmurc%3D Occurrence Handle11591471
JL Oliver P Carpena R Roman-Roldan T Mata-Balaguer A Mejias-Romero M Hackenberg P Bernaola-Galvan (2002) ArticleTitleIsochore chromosome maps of the human genome Gene 300 117–127 Occurrence Handle1:CAS:528:DC%2BD38XptFaksLw%3D Occurrence Handle12468093
A Pavlicek J Paces O Clay G Bernardi (2002) ArticleTitleA compact view of isochores in the draft human genome sequence FEBS Lett 511 165–169 Occurrence Handle1:CAS:528:DC%2BD38XovFektQ%3D%3D Occurrence Handle11821069
L Peshkin MS Gelfand (1999) ArticleTitleSegmentation of yeast DNA using hidden Markov models Bioinformatics 15 980–986 Occurrence Handle1:CAS:528:DC%2BD3cXit1yksrk%3D Occurrence Handle10745987
MM Rhoades (1978) NoChapterTitle BD Waraen (Eds) Maize breeding and genetics Wiley New York 641–672
MM Rhoades E Dempsey (1973) ArticleTitleCytogenetic studies on a transmissible deficiency in chromosome 3 of maize J Hered 64 13–18 Occurrence Handle1:CAS:528:DyaE3sXktFGqsbw%3D
S Saccone A Sario ParticleDe J Wiegant AK Raap G Valle ParticleDella G Bernardi (1993) ArticleTitleCorrelations between isochores and chromosomal bands in the human genome Proc Natl Acad Sci USA 90 11929–11933 Occurrence Handle1:CAS:528:DyaK2cXntVKmug%3D%3D Occurrence Handle8265650
J Salinas G Matassi LM Montero G Bernardi (1988) ArticleTitleCompositional compartmentalization and compositional patterns in the nuclear genomes of plants Nucleic Acids Res 16 4269–4285 Occurrence Handle1:CAS:528:DyaL1cXktleisr4%3D Occurrence Handle3380684
AR Schroder P Shinn H Chen C Berry JR Ecker F Bushman (2002) ArticleTitleHIV-1 integration in the human genome favors active genes and local hotspots Cell 110 521–529 Occurrence Handle1:CAS:528:DC%2BD38Xmslyjt7g%3D Occurrence Handle12202041
T Tenzen T Yamagata T Fukagawa K Sugaya A Ando H Inoko T Gojobori A Fujiyama K Okumura T Ikemura (1997) ArticleTitlePrecise switching of DNA replication timing in the GC content transition area in the human major histocompatibility complex Mol Cell Biol 17 4043–4050 Occurrence Handle1:CAS:528:DyaK2sXktFOisb0%3D Occurrence Handle9199339
InstitutionalAuthorNameThe Arabidopsis Genome Initiative (2000) ArticleTitleAnalysis of the genome sequence of the flowering plant Arabidopsis thaliana Nature 408 796–815 Occurrence Handle10.1038/35048692 Occurrence Handle11130711
RH Waterston K Lindblad-Toh E Birney J Rogers JF Abril P Agarwal R Agarwala R Ainscough M Alexandersson P An SE Antonarakis J Attwood R Baertsch J Bailey K Barlow S Beck E Berry B Birren T Bloom P Bork M Botcherby N Bray MR Brent DG Brown SD Brown C Bult J Burton J Butler RD Campbell P Carninci S Cawley F Chiaromonte AT Chinwalla DM Church M Clamp C Clee FS Collins LL Cook RR Copley A Coulson O Couronne J Cuff V Curwen T Cutts M Daly R David J Davies KD Delehaunty J Deri ET Dermitzakis C Dewey NJ Dickens M Diekhans S Dodge I Dubchak DM Dunn SR Eddy L Elnitski RD Emes P Eswara E Eyras A Felsenfeld GA Fewell P Flicek K Foley WN Frankel LA Fulton RS Fulton TS Furey D Gage RA Gibbs G Glusman S Gnerre N Goldman L Goodstadt D Grafham TA Graves ED Green S Gregory R Guigo M Guyer RC Hardison D Haussler Y Hayashizaki LW Hillier A Hinrichs W Hlavina T Holzer F Hsu A Hua T Hubbard A Hunt I Jackson DB Jaffe LS Johnson M Jones TA Jones A Joy M Kamal EK, Karlsson et al. (2002) ArticleTitleInitial sequencing and comparative analysis of the mouse genome Nature 420 520–562 Occurrence Handle10.1038/nature01262 Occurrence Handle12466850
SI Wright N Agrawal TE Bureau (2003) ArticleTitleEffects of recombination rate and gene density on transposable element distributions in Arabidopsis thaliana Genome Res 13 1897–1903 Occurrence Handle1:CAS:528:DC%2BD3sXmsVWqtLk%3D Occurrence Handle12902382
CT Zhang R Zhang (1991) ArticleTitleAnalysis of distribution of bases in the coding sequences by a diagrammatic technique Nucleic Acids Res 19 6313–6317 Occurrence Handle1:CAS:528:DyaK38Xhs1elug%3D%3D Occurrence Handle1956790
CT Zhang R Zhang (2003) ArticleTitleAn isochore map of the human genome based on the Z curve method Gene 317 127–135 Occurrence Handle1:CAS:528:DC%2BD3sXosFyisrk%3D Occurrence Handle14604800
CT Zhang J Wang R Zhang (200l) ArticleTitleA novel method to calculate the G+C content of genomic DNA sequences J Biomol Struct Dyn 19 333–341
R Zhang CT Zhang (1994) ArticleTitleZ curves, an intuitive tool for visualizing and analyzing the DNA sequences J Biomol Struct Dyn 11 767–782 Occurrence Handle1:CAS:528:DyaK2cXivFWisbc%3D Occurrence Handle8204213
S Zoubak O Clay G Bernardi (1996) ArticleTitleThe gene distribution of the human genome Gene 174 95–102 Occurrence Handle1:CAS:528:DyaK28XmtV2itb8%3D Occurrence Handle8863734
Acknowledgments
The present study was supported in part by the 973 Project of China (Grant 1999075606). We are indebted to Dr. Joseph Ecker, who kindly provided us the T-DNA insertion data used in the present study. We cordially thank the referees for their comments and suggestions, which were critical in improving the quality of the current article. We also thank Feng Gao, who helped prepare the data for GC12 and GC3.
Author information
Authors and Affiliations
Corresponding author
Additional information
[Reviewing Editor: Martin Kreitman]
Rights and permissions
About this article
Cite this article
Zhang, R., Zhang, CT. Isochore Structures in the Genome of the Plant Arabidopsis thaliana. J Mol Evol 59, 227–238 (2004). https://doi.org/10.1007/s00239-004-2617-8
Received:
Accepted:
Issue Date:
DOI: https://doi.org/10.1007/s00239-004-2617-8