Introduction

The candidate gene approach, which consists of looking for genes segregating around a locus putatively responsible for the variation of a trait, has been proposed as a means of initiating QTL characterization [1, 2]. Over 9 million expressed sequence tags (ESTs) from plant tissues are currently lodged in GenBank (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html; December 5, 2008). These represent a large set of candidate sequences for the physiological processes underlying plant development. The isolation of gene sequences specifically or differentially expressed during one developmental stage provides another source of candidate genes [3, 4]. EST databases are valuable resources for discovering novel genes through in silico cloning [5–7].

Guanosine triphosphate (GTP) binding proteins (G proteins) participate in a wide range of biological processes, including signal transduction, protein synthesis and secretion, and cellular proliferation. Ras-homologous GTPases constitute a large family of signal transducers that alternate between an activated, GTP-binding state and an inactivated, GDP-binding state [8–11]. To date, five subfamilies of the Ras superfamily are known: Ras, Rho, Rab, Ran, and ARF proteins.

The ADP-ribosylation factor (ARF) belongs to the Ras superfamily of low molecular weight GTP-binding proteins (21–24 kDa) and regulates a diverse range of cellular processes. Like all small GTP-binding proteins in the Ras superfamily, ARF proteins cycle between inactive GDP-bound and active GTP-bound forms that bind selectively to effectors [12]. The ARF family plays a major role in membrane trafficking in eukaryotic cells [13]. ARF activation is facilitated by specific guanine nucleotide exchange factors (ARF-GEFs). Several ARF-GEFs have been identified, varying in size, structure and subcellular distribution [14–18].

In maize, an ARF-like protein has so far been identified only by Verwoert et al. [19]. Another candidate EST (PE215C3), putatively encoding a GTP-binding protein, was isolated from a suppression subtractive hybridization (SSH) library in our laboratory and mapped by in silico mapping to the location of a QTL for grain weight [20]. In this study, repeated EST searching, multiple sequence comparisons, and other data-mining techniques were employed to identify an Arf gene from maize endosperm. Bioinformatics tools were also used to analyze the genomic organization and the encoded protein of the isolated Arf gene. The Arf cDNA sequences of two maize inbreds with large and small grains were compared. Expression of the gene was also analyzed at different stages of endosperm development and in other tissues. The results of this study provide useful information for future studies on the maize Arf gene.

Materials and methods

Plant materials

A dent maize inbred, Dan232, with large grains and a popcorn inbred, N04, with small grains were planted at the Scientific Research and Education Center of Henan Agricultural University near Zhengzhou, Henan, China in 2006. Dan232 was derived from Lu 9 kuan × Dan340. N04 was derived from a Chinese popcorn variety BL03. Each plant was self-pollinated by hand. Ears were harvested at 3, 5, 7, 10, 15, 20, 25, 30 and 35 days after pollination (DAP). To increase the uniformity of the isolated kernels, the upper half and approximately the lower one-sixth of each ear were discarded. Grains were isolated from the remaining part of each ear. Samples were collected from at least six ears and pooled at each time point. Some of the collected samples were immediately frozen in liquid nitrogen and stored at −70°C.

Isolation of total RNA and mRNA

Total RNA was isolated using a hot phenol extraction method [21]. For PCR-select DNA subtraction, mRNA was purified from total RNA using an Oligotex™ mRNA Purification Kit (QIAGEN).

In silico cloning of complete open reading frames

Based on the candidate EST putatively encoding a GTP-binding protein from the SSH library established in our laboratory [20], the putative full-length Arf cDNA was obtained by in silico cloning. The differentially expressed EST obtained by SSH was used as the query for a BLAST search of the National Center for Biotechnology Information (NCBI) EST database. The overlapping ESTs were assembled into contigs to obtain the open reading frames (ORFs). The specific primers P1 (5′-CATCGAGTCAACCGAACCCAAGC-3′; sense) and P2 (5′-GATAATCCCGGAATGCAGCAAAT-3′; antisense) were designed for amplification of the complete ORF. PCR was carried out using 1 μl of the obtained cDNA, 2.5 μl 10 × PCR buffer, 2.5 μl dNTP mixture (2.5 mM each), 0.1 μl of each primer (10 μM) and 0.125 μl TaKaRa LA Taq, with distilled H2O added to a final volume of 25 μl. The PCR conditions were 1 min at 94°C, then 30 cycles of 40 s at 94°C, 40 s at 58°C and 1.5 min at 72°C, and a final extension of 10 min at 72°C. PCR products were separated on 1% agarose gels and the single specific PCR product band was cloned into the pGEM-T Easy vector (Promega) for sequencing.
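The reaction set-up above implies a water volume and final reagent concentrations that are not stated explicitly. The following is a minimal sketch of that arithmetic, using only the volumes and stock concentrations given in the text; the function and variable names are illustrative and not part of the original protocol.

```python
# Reaction components in microlitres, as listed in the text.
components_ul = {
    "cDNA template": 1.0,
    "10x PCR buffer": 2.5,
    "dNTP mixture (2.5 mM each)": 2.5,
    "primer P1 (10 uM)": 0.1,
    "primer P2 (10 uM)": 0.1,
    "LA Taq polymerase": 0.125,
}
FINAL_VOLUME_UL = 25.0


def water_volume(components, final_volume):
    """Volume of distilled water needed to reach the final reaction volume."""
    used = sum(components.values())
    if used > final_volume:
        raise ValueError("components exceed the final reaction volume")
    return final_volume - used


def final_concentration(stock, volume_ul, final_volume_ul):
    """Simple dilution: C_final = C_stock * V_added / V_final."""
    return stock * volume_ul / final_volume_ul


water_ul = water_volume(components_ul, FINAL_VOLUME_UL)
primer_final_um = final_concentration(10.0, 0.1, FINAL_VOLUME_UL)
dntp_final_mm = final_concentration(2.5, 2.5, FINAL_VOLUME_UL)

print(f"distilled water: {water_ul:.3f} ul")
print(f"each primer:     {primer_final_um:.2f} uM final")
print(f"each dNTP:       {dntp_final_mm:.2f} mM final")
```

Under these figures the reaction would need about 18.7 μl of water, with each primer at 0.04 μM and each dNTP at 0.25 mM in the final mix.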

Bioinformatics analysis

Nucleotide sequence and protein similarity analyses were carried out using DNAMAN version 5.2.2 and the BLAST programs (http://www.ncbi.nlm.nih.gov/BLAST/), respectively. The ORF Finder (Open Reading Frame Finder) was used to identify the ORF in the nucleotide sequence (http://www.ncbi.nlm.nih.gov/projects/gorf/) [22]. To establish the genomic organization, the cDNA sequence was searched by BLAST against the contigs of the maize genome in GenBank. SIM4 (http://pbil.univ-lyon1.fr/sim4.php) was used to align the cDNA sequence with the genomic sequences to search for potential introns.
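To illustrate what an ORF search does in principle, the sketch below scans the three forward reading frames of a nucleotide string for the longest ATG-to-stop stretch. It is a simplified stand-in, not the NCBI ORF Finder algorithm: it ignores the reverse strand and alternative start codons, and the toy sequence is invented for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end, length) of the longest ATG..stop ORF on the forward strand.

    start is 0-based, end is exclusive, and the length includes the stop codon.
    Returns (0, 0, 0) if no complete ORF is found.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if length > best[2]:
                    best = (start, i + 3, length)
                start = None  # look for the next ORF in this frame
    return best


# Toy sequence (hypothetical): a short ORF in one frame, a longer one in another.
toy = "CC" + "ATGGCTTGA" + "A" + "ATGAAACCCGGGTAA" + "GG"
start, end, length = longest_orf(toy)
print(start, end, length, toy[start:end])
```

For a real cDNA, the same idea applied to the assembled contig would be expected to recover the ORF whose length is a multiple of three and ends in a stop codon.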

Reverse transcription-polymerase chain reaction (RT-PCR) analysis

RNA from maize endosperm and several other organs/tissues (embryo, pericarp, root, stem and leaf) was used for RT-PCR, which was performed using the P1 and P2 primers. The β-actin gene was amplified as an internal control with the following primers: 5′-CGATTGAGCATGGCATTGTCA-3′ and 5′-CCCACTAGCGTACAACGAA-3′. The samples were resolved on a 1% agarose gel containing 1 μg/ml of ethidium bromide and were analyzed with the software Quantity One 4.30 (Bio-Rad, Hercules, CA).
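Semi-quantitative RT-PCR of this kind is commonly interpreted by dividing the band intensity of the target gene by that of the internal control in the same sample. The sketch below shows that normalization only; the text does not describe the exact calculation used, and the intensity values are hypothetical placeholders rather than measured data.

```python
def normalize_to_control(target, control):
    """Return target/control band-intensity ratios, sample by sample."""
    if len(target) != len(control):
        raise ValueError("target and control must have the same number of samples")
    ratios = []
    for t, c in zip(target, control):
        if c <= 0:
            raise ValueError("control intensity must be positive")
        ratios.append(t / c)
    return ratios


# Hypothetical band intensities (arbitrary units) for three samples.
arf_intensity = [1200.0, 900.0, 450.0]
actin_intensity = [1000.0, 1000.0, 900.0]

relative_expression = normalize_to_control(arf_intensity, actin_intensity)
for i, ratio in enumerate(relative_expression, start=1):
    print(f"sample {i}: relative expression = {ratio:.2f}")
```

Normalizing in this way corrects for differences in RNA input between lanes, so that only the ratio, not the raw band intensity, is compared across stages or tissues.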

Results

Cloning of ZmArf2 cDNA

An EST highly homologous to a GTP-binding protein of rice was obtained from the SSH libraries during identification of differentially expressed genes between 10 DAP and 20 DAP in popcorn inbred N04 endosperm. This 275 bp EST was chosen as the query for in silico cloning. BLASTN searches against NCBI maize EST databases returned more than 50 EST hits for this EST. The overlapping ESTs were assembled into a 1,259 bp extended sequence. To verify the result of in silico cloning, specific primers were designed for RT-PCR amplification and a 938 bp cDNA fragment was obtained from maize endosperm at 10 DAP (Fig. 1). This fragment was fully sequenced and identified as a new maize GTP-binding protein (ZmArf2) cDNA clone (GenBank accession no. EU816421).

Fig. 1

Sequence of the ZmArf2 gene and its deduced amino acid sequence

Characterization of ZmArf2 cDNA sequence

The determined nucleotide sequence was 938 bp, which consists of a 237 bp 5′-untranslated region (UTR), a 92 bp 3′ UTR, and a 612 bp ORF. The ORF of the ZmArf2 gene encodes 203 deduced amino acid residues with a calculated molecular mass of 22.77 kDa and a predicted pI of 6.13. The BLASTN and BLASTX results showed that it was highly homologous to the GTP-binding protein-like gene in Oryza sativa L. with 80 and 83% identities, respectively.

By using the ZmArf2 cDNA sequence in a BLAST search of the NCBI High-Throughput Genomic Sequences (HTGS) database of maize (http://www.ncbi.nlm.nih.gov/HTGS/), a partial maize genomic sequence was identified on BAC clone AC206679.3. Intron–exon boundaries were determined by aligning the cDNA sequence with the partial genomic sequence. Using SIM4, six exons were found in the relevant DNA sequence, with lengths of 93, 141, 109, 62, 101 and 106 bp, respectively (Fig. 2a, b).
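As a quick consistency check, the exon lengths listed above can be summed and compared with the ORF length reported earlier (612 bp, encoding 203 residues plus a stop codon). The sketch below does only that arithmetic; reading the listed lengths as coding segments of the ORF is an assumption made here, not a statement from the text.

```python
# Exon lengths in bp, as listed in the text.
exon_lengths_bp = [93, 141, 109, 62, 101, 106]
ORF_LENGTH_BP = 612  # reported ORF length
RESIDUES = 203       # reported number of deduced amino acids

exon_total_bp = sum(exon_lengths_bp)
orf_from_protein_bp = (RESIDUES + 1) * 3  # codons for all residues plus one stop codon

# Positions of exon-exon junctions along the spliced sequence (1-based, last base of each exon).
junctions = []
running = 0
for length in exon_lengths_bp[:-1]:
    running += length
    junctions.append(running)

print("number of exon lengths listed:", len(exon_lengths_bp))
print("sum of exon lengths:", exon_total_bp, "bp")
print("ORF length from protein size:", orf_from_protein_bp, "bp")
print("exon junctions at:", junctions)
```

The six listed lengths add up to 612 bp, which matches both the reported ORF length and the length expected from a 203-residue protein.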

Fig. 2

a Schematic representation of the structure of the ZmArf2 gene. The numbered exons (E) are indicated as boxes with the length in nucleotides above the box. The introns (I) are thin lines with length in nucleotides listed below the line. b Representation of the structure for the corresponding cDNA product. The 5′-untranslated region (UTR), open reading frame (ORF), and 3′-UTR are indicated and the length in nucleotides is indicated below each box

For further study, the Arf cDNA sequence was similarly cloned from the dent corn inbred Dan232. Comparison of the two Arf cDNA sequences from N04 and Dan232 revealed eight nucleotide differences, which resulted in five amino acid changes (Table 1).

Table 1 Changes in the Arf cDNA sequence and the corresponding amino acid changes between maize inbred N04 and Dan232 endosperm

Protein prediction and phylogenetic analysis

The functional domains of the gene product were predicted with the online software CDD v2.16 of NCBI (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml/). Sequence analysis showed that the protein had a high similarity to other plant ARFs, and several conserved sequences were revealed. The G1 box, VVDRAGKT, constitutes the phosphate-binding loop [consensus sequence GXXXXGK(S/T)]. In the G2 box, only T is conserved throughout the superfamily, but neighboring residues are conserved within families. The G3 box, DLGG (consensus DXXG), interacts with the γ-phosphate of GTP. The G4 box, NKQD [consensus (N/T)KXD], was shown to be specific for guanine binding. The G5 box, SAY, has the consensus sequence [C/S]A[K/L/T]. The Switch I region, KLKS and PPDRVVPTVGL, and the Switch II region, DLGGQVSLRTIWEKYYEEA, undergo conformational changes upon GTP binding (Fig. 3).
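The consensus notation used above translates directly into regular expressions, with X read as any residue. The sketch below is a literal string comparison of the motif strings as printed in this section against the consensus patterns as printed; it is not how CDD identifies domains and it makes no claim about function.

```python
import re

# Consensus patterns as written in the text (X = any residue).
CONSENSUS = {
    "G1": r"G.{4}GK[ST]",   # GXXXXGK(S/T)
    "G3": r"D..G",          # DXXG
    "G4": r"[NT]K.D",       # (N/T)KXD
    "G5": r"[CS]A[KLT]",    # [C/S]A[K/L/T]
}

# Motif strings reported for ZmArf2 in the text.
REPORTED = {"G1": "VVDRAGKT", "G3": "DLGG", "G4": "NKQD", "G5": "SAY"}


def fits_consensus(box):
    """True if the reported motif matches its consensus pattern over its full length."""
    return re.fullmatch(CONSENSUS[box], REPORTED[box]) is not None


results = {box: fits_consensus(box) for box in CONSENSUS}
for box, ok in results.items():
    status = "fits" if ok else "differs from"
    print(f"{box}: {REPORTED[box]} {status} the printed consensus")
```

On the strings as printed, G3 and G4 fit their consensus patterns exactly, whereas G1 differs at its first position (V where the consensus has G) and G5 at its third (Y, which is outside the listed K/L/T set).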

Fig. 3

The protein sequence for the ZmArf2 gene as deduced by CDD in NCBI

Phylogenetic analysis of a multiple alignment showed that six Arf sequences from different plant species were classified into two groups. The ZmArf2 sequence belonged to the same group as predicted ARFs from rice and Arabidopsis, with similarities of 84 and 71%, respectively (Fig. 4a, b). However, the similarity between ZmArf2 and the other maize Arf sequence was only 35%, and the two were classified into different groups.
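Similarity percentages such as those quoted above are derived from aligned sequences. The sketch below computes a simple percent identity over the aligned, non-gap columns of two sequences, using dots for gaps. The exact scoring used by the alignment software may differ, and the two short sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="."):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gap columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned fragments, not taken from the sequences in this study.
seq_a = "MGLSF.KLF"
seq_b = "MGQSFTKIF"
identity = percent_identity(seq_a, seq_b)
print(f"identity: {identity:.1f}%")
```

Here six of the eight compared columns are identical, giving 75.0%. Whether gap columns are excluded or counted as mismatches changes the result, so the convention should be stated whenever such values are reported.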

Fig. 4

Alignment (a) and homology tree (b) from analysis of amino acid sequences of ZmArf2 and other Arf genes from six plant species. Identical amino acids are highlighted with black shadow. Dots are gaps introduced to improve the alignment. GenBank accession numbers for Arf amino acid sequences: ZmARF2 (PE215C3, maize), Oryza sativa (AAR87275.1, rice), Arabidopsis thaliana (NP_851175.1, arabidopsis), Vigna unguiculata (AAB91395.1, cowpea), Hordeum vulgare (CAD48129.2, barley), Zea mays (P49076.2, maize)

Expression analysis of the ZmArf2 gene

The expression of ZmArf2 was measured at different developmental stages of the endosperm and in other tissues of inbred N04. As shown in Fig. 5a, ZmArf2 was expressed more strongly during the early development of maize endosperm, after which expression declined slightly. Moreover, ZmArf2 expression was not tissue-specific: the gene was also expressed in the embryo, pericarp, root, stem and leaf (Fig. 5b). Expression was higher in the stem than in the embryo, pericarp, root and leaf.

Fig. 5

RT-PCR analysis of the ZmArf2 gene expressed at different stages of endosperm development (a) and in other tissues (b) for popcorn inbred N04. The numbers 1–7 refer to 3 days after pollination (DAP), 5, 7, 10, 15, 20 and 25 DAP, respectively. The numbers 8–12 refer to the embryo, pericarp, root, stem and leaf, respectively

Discussion

Using an in silico cloning technique, an ADP-ribosylation factor (ARF) gene was cloned from the popcorn inbred N04 in this study. The full length of this cDNA was 938 bp, which included an ORF of 612 nucleotides encoding a polypeptide of 203 amino acids with an estimated molecular mass of about 22.77 kDa. RT-PCR analysis showed that the mRNA of ZmArf2 was detectable at all stages of endosperm development and in all other tissues examined. The expression of ZmArf2 was therefore ubiquitous. Previous studies also showed that ARF proteins are ubiquitous and have been found in all eukaryotic cells, including those of humans, cattle, rat, mouse, chicken, plants, yeast, slime mold and Drosophila melanogaster [23–27].

ARF is known to belong to the Ras superfamily, which regulates a diverse range of cellular processes. In the present study, expression of ZmArf2 was higher in the earliest four stages than in later stages of endosperm development, and was highest in the first two stages. Since cellularization, cell division and cell proliferation are characteristic of the early developmental stages of maize endosperm [28–31], these results suggest that the cloned ZmArf2 gene may play a critical role in these phases. Although the function of ARF proteins has not yet been fully elucidated, similar roles have been implicated in previous studies. Ras proteins, related to ARFs, are thought to be involved in basic processes such as cell growth and cell proliferation [8]. ARF proteins are implicated as regulators of vesicle-mediated protein trafficking [32]. A relationship between ARFs and membrane lipids has been demonstrated [33]. ARFs can influence the phospholipid content of membranes by activating phospholipase D (PLD), yielding phosphatidic acid (PA) and choline. PA has been found to induce DNA synthesis and cell proliferation [34].

Comparison of the nucleotide and amino acid sequences of the popcorn inbred N04 and the dent corn inbred Dan232 showed that eight nucleotides and five amino acids differed between the two inbreds. One amino acid change was from Val to Gly in the G1 box. The Tyr to His and Val to Ala changes were close to the Switch II region and the G4 box, respectively. The study by Barbacid [35] showed that mutation of Gly to Val or other amino acids resulted in deficient GTPase activity and increased transforming activity. Verwoert et al. [19] speculated that the ARF protein is indirectly involved in transformation of the lipid composition by base exchange. In the present study, the two inbreds differ greatly in endosperm weight and 100-grain weight, with Dan232 being 31.89–165.94% and 26.46–168.20% higher, respectively, than N04 during grain development [20]. Whether this sequence change could lead to deficient GTPase activity and increased transforming activity, or to transformation of the lipid composition, and ultimately to the difference in grain weight between the two inbreds remains to be determined in future studies. The roles of the other four amino acid changes should be studied at the same time.