Introduction

A gene family is a group of genes that has descended from a common ancestral gene and therefore inherits similar DNA sequences and similar functions. In Escherichia coli, >50% of the genes are members of identified gene families (Koonin et al. 1998), and the proportion of gene family members in eukaryotes may be in the same range or even higher (Chervitz et al. 1998; Semple and Wolfe 1999). Multi-gene families are a distinctive feature of plant genomes. For instance, 65% of Arabidopsis thaliana genes belong to families, with half of them containing at least five members (The Arabidopsis Genome Initiative 2000). In the rice genome, about 29% of the genes belong to gene families (International Rice Genome Sequencing Project 2005), and in maize, an estimated one-third of the genes are members of gene families (Messing et al. 2004). The evolution of gene families has been the subject of controversy for many years (Nei and Rooney 2005). Expression divergence between member genes may provide useful evidence for how gene families evolved, but obtaining such evidence requires extensive expression analysis of large gene families.

The relative expression and amino acid composition of seed storage proteins significantly impact the nutritional value of maize as animal feed (Ueda and Messing 1993). Maize endosperm, the major site of storage protein accumulation, contains predominantly prolamins, the so-called zeins (Landry and Moureaux 1970; Wilson 1983). Zeins are one of the best-characterized sets of storage proteins; they are expressed specifically during seed development and act as a reservoir for free amino acids. Zeins account for about 50% of the total protein in mature seeds (Soave et al. 1981) and 62–74% of the endosperm protein (Hamaker et al. 1995; Landry et al. 2000). On the basis of solubility and structural relationships, zein proteins are classified as α-, β-, γ- and δ-zeins (Table 1; Coleman and Larkins 1998; Esen 1987).

Table 1 Classification of major zein proteins in maize

The α-zeins are the major fraction of maize zeins, and were among the first storage protein genes to be described (Pedersen et al. 1982; Wienand et al. 1981). Early studies showed that α-zeins are encoded by a large super gene family, with estimates ranging from 75 to 100 coding sequences (Hagen and Rubenstein 1981; Wilson and Larkins 1984). On the basis of sequence similarity and copy number, the α-zein gene family can be further divided into four gene families: z1A, z1B, z1C, and z1D (Table 1; Heidecker and Messing 1986). Proteins of three of the gene families, z1A, z1B, and z1D, have an apparent molecular mass of 19 kD on SDS polyacrylamide gels, whereas those of the z1C gene family have an apparent molecular mass of 22 kD. Genomic DNA sequence analysis showed that the entire z1C gene family contains 23 gene copies in the BSSS53 inbred and 16 in the B73 inbred (Song et al. 2001; Song and Messing 2003), while the three 19 kD α-zein gene families in B73 contain 25 gene copies in total, distributed over five genomic regions with a total length of over 1 Mb (Song and Messing 2002). The α-zein gene family is thus a typical super gene family that can serve as a model system for super gene family analysis.

The history of the α-zein gene family shows a recent gene amplification. Most of the 22 and 19 kD zein genes arose within the last 3–5 million years (Song et al. 2001; Song and Messing 2002). Such a short evolutionary history has resulted in high sequence similarity among gene members. In the z1A gene family, 2/3 of the member genes are >95% similar, with 1/3 >99% similar. In the z1B gene family, the similarities among all the member genes are >90% and three of them are >95%. In the z1D gene family, 40% of the member genes have >90% similarity. The member genes of the z1C gene family are approximately 90% similar (Llaca and Messing 1998; Wilson and Larkins 1984). In an extreme case, two member genes of the z1C gene family, azs22.10 and zp22/6, are 99.3% similar, with only three nucleotide differences in the BSSS53 inbred line (Song et al. 2001). With such high sequence similarity, cross-hybridization of cloned zein sequences was reported in dot blot hybridization even under high-stringency conditions (Burr et al. 1982). This may explain why few hybridization-based expression analyses could be linked to individual α-zein genes.

Earlier studies using RNA hybridization-based approaches revealed temporal regulation of α-zein gene family expression (Kodrzycki et al. 1989; Marks et al. 1985). Woo et al. (2001) found a similar temporal pattern of expression among some members of the z1B, z1D and z1C gene families by in situ hybridization at 5, 10, 15, 20 and 25 days after pollination (DAP). Later, Song and Messing (2002) found that two highly expressed 19 kD zein genes in the Woo et al. (2001) study (az19B1 and az19B3) were actually mixtures of different expressed genes. More sensitive methods are therefore required to distinguish member genes with high sequence similarity. After genomic DNA sequencing, Song and Messing (2002) investigated the expression status of each member gene of the 19 kD α-zein gene families by matching against EST databases derived from inbred lines other than B73, and showed that 15 of 25 genes were expressed. The expression of z1C genes at 18 DAP was studied by random sequencing of RT-PCR clones in the B73 and BSSS53 inbred lines and their hybrids (Song and Messing 2003). Seven of the 23 member genes of the z1C gene family were expressed in BSSS53 and 6 of 16 in B73 (Song et al. 2001; Song and Messing 2003). For gene expression analysis, however, an approach based on sequencing alone is both inefficient and expensive. A more efficient and less expensive method was needed for the analysis of large gene families.

To provide a comprehensive temporal expression profile of the entire α-zein super gene family at the individual gene level, we developed a method that combines quantitative real-time PCR with random sequencing of RT-PCR clones. This method enabled us to study the patterns and levels of α-zein gene expression at different stages of endosperm development (10–34 DAP) in a single maize plant.

Materials and methods

Plant growth and endosperm collection

Maize plants were grown in a greenhouse facility at Shanghai University, Shanghai, China. According to a previous study (Kiesselbach 1949) and our pilot experiment on zein expression (data not shown), at 10 days after pollination (DAP) the endosperm tissue was clearly visible and zein genes had begun to be expressed. At 34 DAP, the kernel was hard to dissect because of dehydration. Endosperms were therefore collected from kernels of inbred B73 plants from 10 DAP until 34 DAP. To minimize biological variance between plants, each complete set of samples was collected sequentially from a single plant from 10 to 34 DAP. Each sample contained endosperm tissue from about 3 to 5 kernels. Because a previous study showed that Opaque2 (O2) is regulated diurnally (Ciceri et al. 1999), all tissues were collected around midday (11:00 to 14:00) of the indicated day, flash frozen in liquid nitrogen and stored at −80°C.

RNA isolation and cDNA synthesis

RNAs were extracted using Trizol reagent (BioDev, Cat. No. MK072). To remove genomic DNA contamination, RNAs were subsequently subjected to DNase I treatment (TaKaRa, Cat. No. D2215). A 1 μg aliquot of purified RNA was then reverse transcribed in a 20-μl reaction volume using ReverTra Ace reverse transcriptase (TOYOBO, Cat. No. TRT-101).

Random sequencing

NetPrimer (http://www.premierbiosoft.com) was used to design gene family-specific primers (Primer I) that were universal for the member genes within a gene family. These primers were used for RT-PCR cloning for random sequencing (Supplementary Table 1). All primers were validated with genomic DNA as template to ensure that there was no bias among the member genes (data not shown). RT-PCR of cDNAs was carried out with the optimal cycle number within the linear amplification stage, and PCR products were subsequently cloned into the pMD-18T vector (TaKaRa, Cat. No. D101A). Clones were randomly picked for sequencing with the DYEnamic ET dye terminator sequencing kit (Pharmacia, Cat. No. US81095) and analyzed on a MEGABACE 4500 DNA analysis system (Amersham). Base-calling and quality assessment of sequences were performed with the phred/phrap programs (Ewing et al. 1998). To obtain sufficient data for calculating relative abundance, at least 192 random clones were sequenced for the z1A or z1C gene family and at least 96 random clones for the z1B or z1D gene family at each developmental sample point.
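The number of clones sequenced limits how precisely a gene's share of the family transcripts can be estimated. The following is a minimal sketch of that relationship, assuming that each sequenced clone is an independent draw from the transcript pool (a simple binomial model that is not stated in the methods). The function name and the 10% example value are hypothetical.

```python
import math

def fraction_standard_error(fraction, n_clones):
    """Binomial standard error of a transcript fraction estimated from n_clones.

    Assumes every clone is an independent, unbiased sample of the family's
    transcripts (an illustrative assumption, not a statement from the study).
    """
    return math.sqrt(fraction * (1.0 - fraction) / n_clones)

# Illustrative value only: a gene contributing 10% of its family's transcripts.
for n_clones in (96, 192):
    se = fraction_standard_error(0.10, n_clones)
    print(f"{n_clones} clones: 10% +/- {100 * se:.1f} percentage points")
```

Under this model, doubling the number of clones reduces the standard error by a factor of about 1.4, which is one way to see why more clones are useful for the families with more expressed members.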

Real-time quantitative PCR

Using the sequences of the expressed member genes obtained through random sequencing, a second set of gene family-specific primers (Primer II), universal for the expressed member genes of each family, was designed for quantitative real-time PCR. In addition to the common parameters for regular PCR primers, the design followed the three rules of Wang et al. (2006) (Supplementary Table 1).

SYBR Green I real-time PCR was conducted on a DNA Engine Opticon 2 continuous fluorescence detection system (MJ Research, Inc.). The reactions were performed using SYBR Green Premix (TOYOBO, Cat. No. QPK-211). The annealing temperature and plate read temperature of each amplicon are listed in Supplementary Table 1. Each quantitative real-time PCR test included: (1) a standard curve of 8 serial 10-fold dilution points of cloned cDNA plasmids (ranging from 10² to 10⁹ copies μl⁻¹); (2) triplicate reactions for every sample; and (3) a mock reaction lacking template as a control. The housekeeping gene β-Actin was used as an endogenous reference. For each sample point, three biological replicates (three sets of collected tissues) were used. The final data represent the mean of the three biological replicates, and SD denotes the standard deviation among them. The MJ Opticon Monitor version 3.1.32 software (MJ Research, Inc.) was used to collect and analyze fluorescence data and to estimate the number of cDNA copies relative to β-Actin.
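The calculation performed by the Opticon Monitor software is not described here. As a generic sketch of standard-curve quantification, the threshold cycle (Ct) of each plasmid dilution can be regressed on log10 copy number, and the fitted line inverted to estimate copies in a sample before normalizing to actin. All Ct values and function names below are hypothetical and chosen only for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical standard curve: 8 ten-fold dilutions, 10^2 to 10^9 copies per microlitre.
log10_copies = list(range(2, 10))
standard_ct = [38.0 - 3.32 * x for x in log10_copies]   # made-up Ct values

slope, intercept = fit_line(log10_copies, standard_ct)

# Made-up technical triplicates for one zein family amplicon and for actin.
zein_ct = mean([17.9, 18.0, 18.1])
actin_ct = mean([21.22, 21.32, 21.42])

ratio_to_actin = (copies_from_ct(zein_ct, slope, intercept)
                  / copies_from_ct(actin_ct, slope, intercept))
print(f"zein family / actin = {ratio_to_actin:.2f}")
```

In a real analysis, each amplicon would normally have its own standard curve; a single curve is used here only to keep the sketch short.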

Results

Expression of α-zein super gene family members in B73 inbred

According to previous studies, only a fraction of the genes in the z1C gene family are expressed in endosperm (Song et al. 2001; Song and Messing 2003). Because only the expressed genes would be of interest in a profiling study, we first determined how many genes of the α-zein gene family were expressed. Twenty-two DAP was within the window of peak expression of zein genes (Marks et al. 1985). Using universal primers (Primer I) for each gene family, cDNAs of expressed genes from the 4 gene families were cloned by RT-PCR. To reconstruct the pool of transcripts from each gene family in endosperm tissue, random cDNA clones were sequenced (see “Materials and methods”). Because α-zein genes contain no introns, the sequencing data were matched directly to each of the genomic sequences, and the relative level of mRNA accumulation of each gene was determined from its frequency among the sequenced clones.

This analysis revealed the expressed member genes of each gene family (Table 2). In the z1C gene family, 6 of 16 member genes were expressed, consistent with the previous study by Song and Messing (2003). The analysis also provided information about expressed genes in the 19 kD gene families: 8 of 12 member genes of the z1A gene family, 2 of 8 member genes of the z1B gene family, and 2 of 5 member genes of the z1D gene family were expressed in endosperm. The proportion of expressed genes was therefore uneven among the gene families. The z1A family had the highest proportion of expressed genes (8/12, i.e. 2/3), while z1B had the lowest (2/8, i.e. 1/4).

Table 2 Summary of α-zein genes in B73 and their expression

This analysis also revealed the relative transcript abundance of the expressed member genes in each gene family (Table 2). The z1A gene family had the most expressed genes among the α-zein gene families, but the contributions of these genes were highly variable. One gene copy, Z350D07-1, contributed over 20% of the transcripts of the entire z1A gene family. At the other extreme, Z448F14-2 contributed only 1.78%. The remaining member genes each contributed between 8 and 18%. In the z1C gene family, azs22.4 contributed about 1/3 of the expression of the entire 22 kD gene family (33.67%), while the other expressed member genes each contributed less (10–18%). This differs from a previous study, in which azs22.4 and azs22.9 were equally large contributors to the expression level at 18 DAP (Song and Messing 2003). In the z1B gene family, the expression levels of the two expressed gene copies were nearly the same. A similar case was found for the z1D gene family (Table 2).

We also noticed that all expressed genes appeared to have intact coding regions. None of the α-zein member genes with truncations accumulated mRNA at a detectable level (Table 2). One z1C gene (azs22.12) had an intact coding region but no detectable expression.

Expression profiling of α-zein super gene family in B73 inbred

After the expressed gene members of each gene family had been defined, primers (Primer II) were designed for real-time PCR to profile the expression of each gene family separately. With the common endogenous actin as a reference, the levels of gene expression could be compared among the different gene families. Interestingly, an apparent expression rhythm was observed for all α-zein gene families (Fig. 1). The rhythm was an “up-and-down” oscillating pattern, indicating regulatory feedback on α-zein synthesis during endosperm development. The patterns fell into two groups. In the first group, represented by z1A and z1B, expression fluctuated with an interval of 4 days; in the second group, represented by z1C and z1D, the fluctuation interval was 6 days.

Fig. 1 Quantitative real-time PCR analysis of the expression of the four α-zein gene families in endosperm of the B73 inbred. The expression profiles of the four gene families are represented by color-coded lines. The vertical axis shows the mRNA level as the ratio to the actin mRNA level. Data are presented as means ± SD (n = 3 biological replicates). The horizontal axis shows the day of endosperm tissue collection after pollination (DAP)

Although the expression patterns fell into two groups, the individual gene families differed in their expression patterns. For the z1A gene family, expression reached its peak at 18 DAP and then decreased, with continued fluctuation, until 34 DAP. For the z1B gene family, expression peaked at 22 DAP and then decreased, with continued fluctuation, until 34 DAP. The expression patterns of the z1C and z1D gene families were similar, both peaking at 24 DAP (Fig. 1). In general, z1A had the highest expression level, followed by z1C, and then z1D and z1B (Fig. 1). Comparing the peak expression levels of the gene families, z1A had the highest, about 4.50-fold that of z1B, which had the lowest. The z1C family had the second highest, about 2.81-fold that of z1B, and z1D was about 1.77-fold that of z1B. However, because the oscillating patterns were not synchronized, the relative expression levels of the different gene families varied across endosperm developmental stages (Fig. 1). For example, the expression level of z1A relative to z1C at 18 DAP differed markedly from that at 24 DAP.

Expression profiling of member genes in each gene family

To reveal the expression pattern of individual genes within each gene family, we used random sequencing of RT-PCR clones to quantify the relative abundance of individual member genes (see “Materials and methods”). All member genes could be distinguished on the basis of sequence polymorphisms. The relative expression of a particular member gene within its gene family is given by the frequency of its sequence among the total clones sequenced. Different endosperm developmental stages were analyzed (14, 18, 22, 26, 30, and 34 DAP; see also “Materials and methods”). By combining these frequencies with the quantitative expression level of the corresponding gene family obtained by real-time PCR as described above, we obtained the quantitative expression levels of individual member genes, so that their developmental expression patterns could be drawn and compared with each other (Fig. 2).

Fig. 2 Random sequencing and quantitative real-time PCR analysis of member genes of the α-zein gene families in endosperm of the B73 inbred. The vertical axis shows the mRNA level. The scale on the left indicates the expression level of individual member genes as the ratio to the actin mRNA level. The scale on the right indicates the expression level of the entire gene family (red line) as the ratio to the actin mRNA level. The expression level of each member gene is obtained by multiplying its relative expression percentage by the overall expression level of its gene family. The horizontal axis represents the day of endosperm tissue collection after pollination. Color-coded lines represent different member genes
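The per-gene calculation described above, in which a gene's share of the sequenced clones is multiplied by the family-level expression from real-time PCR, can be sketched as follows. The gene names, clone counts and family-level value are hypothetical placeholders, not data from this study.

```python
def gene_levels(clone_counts, family_level):
    """Split a family-level expression value among genes by clone frequency.

    clone_counts: mapping of gene name -> number of sequenced clones matching it
    family_level: family expression as a ratio to actin (from real-time PCR)
    """
    total_clones = sum(clone_counts.values())
    return {gene: family_level * count / total_clones
            for gene, count in clone_counts.items()}

# Hypothetical clone counts for one family at one time point (192 clones in total).
counts = {"geneA": 96, "geneB": 64, "geneC": 32}
family_to_actin = 3.0   # made-up family-level ratio to actin

levels = gene_levels(counts, family_to_actin)
for gene, level in levels.items():
    print(f"{gene}: {level:.2f} (ratio to actin)")
```

By construction, the per-gene values sum to the family-level value, so the individual curves in a plot such as Fig. 2 always add up to the family curve at each time point.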

Individual genes within gene families were found to have very different expression patterns during endosperm development. In large gene families such as z1A and z1C, very complicated expression patterns were observed, and no two genes in the same gene family shared an identical expression pattern. However, genes with similar patterns could be found. For example, when grouped by the overall trend of expression, the z1C genes formed two groups. One group, containing azs22.4, azs22.9 and fl2, showed a relatively steady expression level across the developmental stages. The other group, containing azs22.7, azs22.8 and azs22.19, showed a sharp decline after the expression peak. When grouped by the time of the expression peak, however, the z1C genes formed three groups. One group, with azs22.8 and fl2, had an apparent expression peak at 18 DAP. The second group, with azs22.7 and azs22.19, had an apparent expression peak at 22 DAP. The third group, with azs22.4 and azs22.9, had a flat expression peak. The z1A gene family had even more expressed gene members. There were two major pattern groups: one with a single expression peak (Z350D07-1, Z448F14-4 and Z448F14-6) and the other with two expression peaks (Z350D07-2, Z448F14-3, Z448F14-5 and Z448F14-7). Within the single-peak group, Z350D07-1 and Z448F14-6 peaked at 22 DAP, while Z448F14-4 peaked at 18 DAP. In the two-peak group, Z350D07-2, Z448F14-3 and Z448F14-5 peaked at 18 and 26 DAP, while Z448F14-7 peaked at 18 and 30 DAP. Thus, member genes formed different groups under different grouping criteria, indicating dramatic divergence of the expression patterns of individual genes within a gene family.

Compared with the z1A and z1C gene families, the z1B and z1D gene families were smaller, i.e. had fewer members. Both had only two expressed genes in the B73 inbred line, and accordingly their expression patterns were relatively simple. The two expressed member genes of the z1B gene family had almost identical expression patterns. However, the two expressed member genes of z1D had divergent expression patterns: one gene (Z517A02-2) had two peaks, at 18 and 26 DAP, and the other (Z513H09-1) had only one peak, at 22 DAP (Fig. 2).

Expression of gene families among different maize inbreds

A previous study indicated that dramatically different z1C haplotypes exist among maize inbred lines, resulting in significantly different gene sets (Song and Messing 2003). To obtain a more comprehensive view of the distribution of the zein super gene family among maize inbred lines, we compared α-zein gene family expression among three maize inbred lines: B73, BSSS53 and W22. Quantitative real-time PCR and random sequencing of RT-PCR clones were carried out as above for α-zein gene expression at 24 DAP.

We found that the expressed gene sets differed significantly among these inbred lines. Between B73 and BSSS53, there were 16 non-allelic genes and 11 allelic genes. Between BSSS53 and W22, 18 non-allelic genes and 10 allelic genes were found. Between B73 and W22, 8 non-allelic genes and 14 allelic genes were found (Table 3). All non-allelic genes were contributed by the large families, i.e. z1A and z1C, whereas all the expressed genes in the small families, i.e. z1B and z1D, appeared to be allelic. In a large family such as z1A, non-allelic genes were mostly contributed by a locus with a large gene cluster. The z1A gene family falls into two BAC contigs, Z350D07 and Z448F14 (Song and Messing 2002), which map to two noncontiguous loci on chromosome 4 (Song et al. 2003). Z350D07 is a locus with only three copies of z1A genes, and Z448F14 is a locus with a large cluster of nine copies of z1A genes. The two expressed genes from Z350D07 both appeared to be allelic among the three inbred lines, while all non-allelic genes were contributed by the Z448F14 locus.

Table 3 Expressed α-zein genes and their expression levels in maize inbred lines B73, BSSS53 and W22 at 24 DAP

We also found that major haplotypes exist among these inbred lines, particularly in large gene families such as z1A and z1C. For the z1A gene family, at least two major haplotypes were found, both contributed by the Z448F14 locus: one represented by BSSS53 and the other by B73 and W22. There was, however, a minor difference between B73 and W22: an extra unknown z1A gene was found in W22. Between the two z1A haplotypes, only two genes appeared to be allelic, while at least eight genes appeared to be non-allelic. In the BSSS53 haplotype, one z1A gene (Z350D07-2) had a premature stop codon (Table 3). This was the only gene in this study that had a premature stop codon but was still expressed. This result confirmed an earlier observation that some 19 kD zein ESTs in the maize EST database have premature stop codons (Song and Messing 2002).

For the z1C gene family, a previous study indicated that there were four major haplotypes (Song and Messing 2003). However, our study found that the three inbred lines represented three major haplotypes. Only one gene (azs22.4) appeared to be allelic among all three lines. The results for haplotypes in B73 and BSSS53 lines were inconsistent with the previous study (Song and Messing 2003). Interestingly, the third major haplotype was represented by W22. In this haplotype, the azs22.16 gene, i.e. floury2, was found to be unexpressed. In addition, the azs22.12 gene, which was unexpressed in B73 (see “Results”), was expressed at a low level in W22, and an unknown z1C gene was expressed at a high level (30.57%).

Despite the dramatic differences in expressed gene sets among these inbred lines, the overall numbers of expressed genes were surprisingly consistent. In B73, a total of 18 expressed genes were detected, while 18 and 19 expressed genes were detected in BSSS53 and W22, respectively (Table 3). Not only the total number of expressed genes but also the number of expressed genes within each family was highly consistent. For example, z1B and z1D each had two expressed genes in all of these inbred lines. The z1A family had eight, eight and nine expressed genes in B73, BSSS53 and W22, respectively, while z1C had six, seven and six. There thus appeared to be a selective pressure for maintaining the number of expressed α-zein genes.

In addition to the large number of non-allelic genes, the differentiation among allelic genes was also significant. For example, the two z1A allelic genes from Z350D07 had diverged so much that their relative abundance differed completely among the three inbred lines (Table 3). A similar divergence was found for the two allelic genes in the z1D gene family (Table 3). For the z1B gene family, although the two allelic genes were expressed at similar levels in B73 and BSSS53, significant expression divergence was found in W22. Similar expression divergence was also found among allelic genes in the z1A and z1C gene families (Table 3).

Discussion

An effective strategy for expression profiling analysis of the maize α-zein super gene family

The α-zein gene family encodes the most prominent storage protein in maize seed. It represents a typical gene super family that contains four small but related gene families, namely z1A, z1B, z1C and z1D. Due to the complexity of the super gene family and high sequence similarity among member genes, it was technically challenging to carry out a quantitative expression profiling analysis of the α-zein gene family during endosperm development.

Previous studies relied on RNA hybridization, EST data analysis or random RT-PCR clone sequencing (Dolfini et al. 1992; Kodrzycki et al. 1989; Marks et al. 1985; Woo et al. 2001; Song and Messing 2003). Although these studies yielded valuable information about α-zein gene expression during endosperm development, they either failed to distinguish individual member genes or could not provide accurate quantitative data. In this study, we used a different strategy to profile the expression of the maize α-zein gene family during endosperm development, dividing the analysis into two parts. In part one, universal primers were designed so that the family members could be amplified without bias. Quantitative real-time PCR was carried out with these primers, yielding accurate quantitative data for the entire gene family at multiple time points during endosperm development. In part two, transcripts of expressed member genes from each family were recovered by RT-PCR and sequenced randomly. Member genes were distinguished by their sequence polymorphisms, and their relative abundance was measured as their frequency among the total clones sequenced. Combining this relative abundance with the family-level quantitative data from part one gave quantitative data for individual member genes.

Because PCR techniques were used for both steps (real-time PCR and RT-PCR cloning), only small tissue samples were required for the analysis. This enabled a complete set of experiments to be conducted on a single plant, which improved data quality. This proved critical for gene profiling analysis: in our pilot experiments, we noticed that even with the same genetic background (same inbred line) and the same greenhouse conditions (e.g. pollinated on the same day, tissue harvested on the same day), different individual plants still showed significant variation in gene expression level (data not shown). An experiment based on a single plant therefore provided the best control for a multi-point expression profiling study. Using this strategy, we were able to carry out an extensive expression profiling study of the maize α-zein gene super family, covering all 4 gene families and 18 active gene members at 13 developmental time points that span most of endosperm development. The same approach should be effective for expression profiling studies of other large gene super families.

Characteristic expression patterns revealed regulatory feedback of maize α-zein genes during endosperm development

For the first time, the expression profile of the α-zein gene super family during endosperm development has been obtained. Owing to dense sampling (every 2 days) during endosperm development, this study revealed new features of α-zein developmental expression patterns. Characteristic “up-and-down” oscillating expression patterns were discovered for all four gene families (Fig. 1). With less dense sampling (e.g. every 4 days), such patterns might have been missed. The “up-and-down” oscillating patterns indicate feedback regulation of α-zein gene expression during endosperm development, and the two distinct groups of oscillating patterns suggest two different feedback regulation pathways. On this basis, we postulate that z1A and z1B might share one regulation pathway, while z1C and z1D might share another. Although the four gene families are derived from a single ancestral gene, their regulation has diverged during evolution.
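The effect of sampling density can be illustrated with an idealized example. The sketch below assumes a pure cosine oscillation with a 4-day peak-to-peak period, used as a simplified stand-in for the z1A and z1B fluctuation interval and not as a model of the measured profiles, and samples it every 2 or every 4 days between 10 and 34 DAP. The function names are hypothetical.

```python
import math

def sampled_profile(period_days, step_days, start_dap=10, end_dap=34):
    """Sample an idealized cosine oscillation at fixed intervals (in DAP)."""
    profile = []
    for dap in range(start_dap, end_dap + 1, step_days):
        value = math.cos(2 * math.pi * (dap - start_dap) / period_days)
        profile.append((dap, round(value, 6)))
    return profile

def spread(profile):
    """Difference between the highest and lowest sampled value."""
    values = [value for _, value in profile]
    return max(values) - min(values)

# Assumption: a 4-day peak-to-peak period, purely for illustration.
dense = sampled_profile(period_days=4, step_days=2)    # every 2 days
sparse = sampled_profile(period_days=4, step_days=4)   # every 4 days

print("every 2 days:", len(dense), "points, spread", spread(dense))
print("every 4 days:", len(sparse), "points, spread", spread(sparse))
```

In this idealized case the 2-day samples alternate between peaks and troughs, whereas the 4-day samples all fall at the same phase, so the sampled series looks flat and the oscillation disappears.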

A previous study demonstrated that the 22 kD zein (z1C) gene family is under the control of the transcription factor O2 (Kodrzycki et al. 1989), but the transcription factor(s) responsible for expression of the 19 kD zein gene families are unknown. Because the O2 mutation affects some of the 19 kD zein proteins, and because the expression pattern of z1D resembles that of z1C, we predict that O2 could also be the transcription factor for the z1D gene family. The z1A and z1B gene families, however, might have a completely different, i.e. non-O2, transcription factor, as suggested by their distinct expression patterns. How could such feedback regulation be carried out? Previous studies found that the activity of O2 is regulated diurnally, and that phosphorylation plays an important role in this diurnal regulation and in DNA binding activity (Ciceri et al. 1997, 1999). These data suggest that an unknown signaling pathway with a kinase component is involved in O2 regulation. This could provide a possible explanation for the “up-and-down” oscillating pattern found for the z1C genes, the direct target genes of O2. A similar mechanism might also operate on the transcription factor(s) for the 19 kD gene families.

Dramatically divergent member genes indicate a fast evolution rate of the maize α-zein gene family

A striking finding of this study was the high degree of expression divergence among the α-zein gene family members. A large number of the member genes were found to be pseudogenes. We also noticed that nearly all of the expressed genes had intact coding sequences, while almost all unexpressed genes had mutations in the coding sequences (Tables 2, 3). It has been shown that premature stop codons decrease mRNA stability (Van Hoof and Green 1996) and that a single in-frame stop codon reduces the mRNA concentration significantly (Liu and Rubenstein 1993). Mutations that cause premature stop codons might be an early step in creating pseudogenes. Moreover, we found a large number of non-allelic expressed genes among the different inbred lines (Table 3). Most of these non-allelic expressed genes corresponded to allelic gene copies that were pseudogenes in the other inbred lines. This result indicates that the α-zein gene family is still creating pseudogenes, and provides further evidence that the α-zein gene family is subject to the “birth-and-death” model of evolution (Nei and Hughes 1992).

Besides the divergence in expression status, the expression levels of member genes also diverged greatly. A comparison among the inbred lines yielded more information about the divergence in expression level of allelic genes (Table 3). The differentiation of gene expression patterns reflects the divergence of gene regulation. It is surprising that closely related member genes within the same gene family can have such divergent expression patterns. Although the transcription factor(s) for the 19 kD zein genes are unknown, the 22 kD zein (z1C) genes are under the control of Opaque2 (Rubenstein and Geraghty 1986; Schmidt et al. 1987). However, in the BSSS53 (o2) line, two member genes of the z1C gene family, zp22/6 and zp22/D87, are expressed in the absence of O2 (Song et al. 2001). In addition, other transcription factors that bind the P-box, such as a maize zinc-finger protein, may interact with O2 to activate 22 kD zein gene expression (Vicente-Carbajosa et al. 1997). Previous studies also suggested that factor(s) other than O2 might activate 22 kD zein (z1C) gene transcription (Ciceri et al. 2000; Muller et al. 1997). All of this suggests that the expression divergence of the α-zein super gene family observed in this study might result from differential transcriptional regulation by dissimilar transcription factors. The maize α-zein gene super family has a relatively recent evolutionary history: most of the gene copies arose from gene amplifications within the last 5 million years, and many member genes have very high sequence similarity. The dramatic divergence of these member genes, particularly in their expression levels, could be explained by a fast rate of evolution of the entire α-zein gene super family. This fast evolutionary rate is possibly due to the coexistence of many redundant gene copies with similar biological functions, in contrast to single-copy genes. In addition, co-evolution of α-zein genes and their transcriptional regulators may have further accelerated their evolution at the level of gene expression and regulation.

High diversity of α-zein genes among maize inbred lines contributed by highly diverged haplotypes

Previous work revealed highly diverged haplotypes of the z1C gene family in the maize genome (Song and Messing 2003). Inbred lines harboring different haplotypes would therefore have very different gene sets. In this study, the expression of the α-zein gene super family was compared in 3 maize inbred lines. We found that these inbred lines have drastically different expressed gene sets. Large numbers of non-allelic genes were contributed by the large families (e.g. z1A or z1C), or by a chromosomal locus with a large gene cluster (e.g. the Z448F14 locus). It appears that large gene families contribute more genetic variation than small gene families and single genes.

Based on these expressed gene sets, we predict different haplotypes of the α-zein gene families (two for z1A, three for z1C). Crosses between these inbred lines would result in very different expressed gene sets in their hybrids. It is interesting to note that the total numbers of expressed α-zein genes were highly consistent among these inbred lines, not only for the entire α-zein gene super family but also for each of the four gene families (Table 3). We postulate that there might be a selection pressure for a minimal number of expressed genes in each haplotype. Above this number, there would be no selective advantage; below it, there might be a selective disadvantage. Besides the great diversity of expressed gene sets among inbred lines, the allelic genes also showed dramatic divergence in expression level. This further demonstrates the fast-evolving nature of the maize α-zein gene family.

Because different haplotypes with different expression capacities can be combined, the overall expression level of α-zein genes could differ significantly. For breeding purposes, rational combination of different α-zein haplotypes would provide an effective way of manipulating the protein content of maize seeds.