Introduction

Plants respond to changes in their environment on various levels. Short-term changes in light intensity are compensated for by using already existing proteins, while persisting changes lead to altered gene expression (e.g., Murchie et al. 2005; Scheibe et al. 2005). During high-light acclimation, the amount of thylakoid components, especially of light-harvesting complexes (LHC), is decreased, and the amount of stromal enzymes, especially of Calvin-cycle enzymes, is increased (Anderson et al. 1995). To prevent oxidative stress, the amounts of glutathione-synthesizing enzymes and of chloroplast NADP-MDH also increase (Becker et al. 2006). The operation of the malate valve is important to stabilize the chloroplast NADPH/ATP ratio, especially under rapidly changing conditions (Scheibe 2004). With respect to the nuclear-encoded chloroplast NADP-MDH, the key enzyme of the malate valve, it has long been known that it is subject to light/dark modulation, mediated by the ferredoxin–thioredoxin system (Buchanan 1980; Scheibe 2004). The activation state of NADP-MDH is influenced by the NADPH/NADP ratio (Scheibe and Jacquot 1983), thus NADP-MDH activation, and consequently the export rate of malate from the chloroplast, decreases when little NADPH is available.

In addition to these fast changes in the activation state, it has also been shown that the total amount of NADP-MDH is regulated by several endogenous and exogenous factors. In tobacco plants, the amount of NADP-MDH, defined as capacity after full activation with reducing agents, showed strong age dependence. The capacity was highest in young leaves, and decreased with leaf and plant age (Faske et al. 1997). When tobacco plants were grown under various CO2 concentrations, the expression of NADP-MDH was reduced when elevated CO2 was present (Backhausen and Scheibe 1999). On a chlorophyll (Chl) basis, the NADP-MDH capacity was nearly tenfold higher in cold-grown winter wheat, compared to control plants grown at 20°C (Savitch et al. 2000). In cold-acclimated Arabidopsis plants, the NADP-MDH capacity was also higher than in plants grown at 20°C (Savitch et al. 2001). Monitoring the expression changes in NADP-MDH after a transfer into conditions of persisting over-reduction, i.e., a transfer of Arabidopsis plants into high light levels and moderately decreased temperature, it turned out that both NADP-MDH transcript and protein amount are upregulated within a few hours (Becker et al. 2006). From an array analysis conducted in parallel, it was concluded that the expression change of NADP-MDH is not a single event, but a part of the overall process of high-light acclimation (Becker et al. 2006). There is some evidence, at least in the case of fully developed Arabidopsis leaves, that an alteration in the chloroplast redox state is responsible for the release of redox signals. An excitation imbalance between photosystem (PS) I and PS II has the potential to generate such redox signals, and also regulates the transcription of several other nuclear-encoded genes (Walters et al. 1999; Fey et al. 2005; Holtgrefe et al. 2007).

However, in Arabidopsis, the expression pattern of NADP-MDH not only depends on the environmental conditions, but is modulated by the endogenous state of the plant. The transcriptional upregulation of NADP-MDH was evident in plants grown under short-day conditions, while in plants grown under long-day conditions no such increase was observed. The duration of the light period resulted in different responses towards an identical stimulus, i.e., high light and decreased temperature. It has been concluded that the endogenous systems that measure day length interact with redox regulation. Redox-mediated acclimation signals are redirected during short days in a way that allows light to be used more efficiently, and prevents oxidative damage under long-day conditions (Becker et al. 2006).

In summary, a complex pattern for the transcriptional regulation of NADP-MDH emerges. It is evident that, for the regulation of transcription, either the altered binding of transcription factors or the binding of different transcription factors to regulatory elements of the target gene is required. In the majority of cases described to date in plants, transcription factors do not interact with the coding region, but all cis-regulatory elements are localized in the 5′ upstream region of a gene. A typical promoter of the RNA-polymerase II-type consists of the core promoter, located directly in front of the beginning of the section to be transcribed, and the proximal and distal promoter regions. The core promoter contains the transcription start site (TSS), located in the initiation region (INR), together with some binding sites for general transcription factors. A well-known transcription-factor binding site is the TATA box. It is usually located 25–40 bp upstream of the TSS, and it is the recognition sequence for the TATA-binding protein that forms part of the transcription complex (Werner 1999).

Upstream of the core promoter is the proximal promoter, which usually has a length of 200–300 bp and contains binding sites for additional transcription factors. Transcription factors that bind to the proximal promoter are often involved in the regulation of transcription. A well-studied example is the CCAAT box, typically located 80–110 bp in front of the TSS. In contrast to the TATA box, which only regulates the start of transcription, the CCAAT box controls the intensity of gene expression. The distal promoter, which is located upstream of the proximal promoter, contains additional regulatory elements and can extend up to 1500 bp from the transcription start. The distal promoter is characterized by its highly variable structure, and by containing additional binding sites for transcription factors (Werner 1999).
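The positional conventions described above can be made concrete with a small sketch. The following Python snippet is not part of the original study. It assumes a simplified TATA consensus (TATAWAW) and the plain CCAAT pentamer, and it uses a made-up toy sequence. The function names and the TSS index are hypothetical and serve only to show how a motif position can be checked against the typical distance windows (25–40 bp and 80–110 bp upstream of the TSS).

```python
import re

# Simplified consensus patterns (assumptions for illustration only).
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")
CCAAT_PATTERN = re.compile(r"CCAAT")


def motif_distances(seq, tss, pattern):
    """Return the distances (bp upstream of the TSS) at which the pattern starts."""
    upstream = seq[:tss].upper()
    return [tss - m.start() for m in pattern.finditer(upstream)]


def in_window(distances, lo, hi):
    """True if any motif start lies between lo and hi bp upstream of the TSS."""
    return any(lo <= d <= hi for d in distances)


# Toy sequence (hypothetical): a CCAAT box 95 bp and a TATA box 30 bp
# upstream of the TSS, which sits at index 100.
seq = "G" * 5 + "CCAAT" + "G" * 60 + "TATAAAA" + "G" * 23 + "ATGC"
tss = 100

tata = motif_distances(seq, tss, TATA_PATTERN)
ccaat = motif_distances(seq, tss, CCAAT_PATTERN)

print("TATA starts (bp upstream):", tata, "in 25-40 window:", in_window(tata, 25, 40))
print("CCAAT starts (bp upstream):", ccaat, "in 80-110 window:", in_window(ccaat, 80, 110))
```

A real promoter scan would use position weight matrices rather than fixed patterns, so this sketch only illustrates the distance bookkeeping.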

The occurrence of different transcription-factor binding sites in the proximal and the distal promoter is thought to be responsible for the differential expression patterns of a gene. Transcription factors act as independent modules. Apart from constitutive promoters, development-dependent, tissue-specific promoters, and inducible promoters, which respond to specific environmental signals, are known (Tyagi 2001). In some cases, additional elements involved in positive or negative control of transcription can be located further upstream or downstream in the 3′-untranslated region of a gene (Timko et al. 1985). They differ from the distal promoter in acting in a manner that is independent of position and orientation (Rippe et al. 1995).

However, several modifications from the classical promoter type have been described. The modifications can be grouped in four types, all of which lead to correct transcription of the target genes: (1) The TATA box-core promoter contains neither INR nor downstream elements; (2) The INR-core promoter does not contain the TATA box and downstream elements; (3) The composite-core promoter contains only the TATA box and INR; and (4) the null core promoter contains only up- and downstream elements. Only 30–50% of all known promoters contain a TATA box. Many housekeeping genes, and most photosynthetic genes, are characterized by the lack of the TATA box (Shahmuradov et al. 2005). There are also some known examples of transcription factors that can bind to the coding region. The leaf ferredoxin gene, for example, contains important regulatory elements in the coding region (Dickey et al. 1992).

In this work, we use three strategies, namely comparative genetics, analysis of knock-out plants, and a yeast one-hybrid (YOH) screen, to put forward the hypothesis that the regulatory elements controlling the complex transcription pattern of NADP-MDH in Arabidopsis were shifted into the coding region of the gene during evolution, because coding DNA segments and overlapping UTRs occupy the opposite DNA strand in both the 3′ and 5′ directions.

Materials and Methods

Plant Material and Growth Conditions

Seeds of Capsella rubella Reut., Capsella bursa-pastoris (L.) Med., Cardaminopsis petraea (L.) Hiit., Cochlearia officinalis L., Lepidium densiflorum Schrad. and Lepidium latifolium L. were obtained from the collection of the Botanical Garden of the Universität Osnabrück. Seeds of Arabidopsis thaliana L. (Heynh.) from the accessions Columbia, Landsberg erecta, and Wassilewskija were obtained from the SALK institute (http://www.signal.salk.edu).

Inbred seeds were placed on soil and kept in darkness for 48 hours at 4°C in order to synchronize germination. Cultivation conditions were defined as a light intensity of 120 μmol quanta m−2 s−1, a relative humidity of 65%, and 8 hours of light, with 22°C to 18°C day/night temperatures. Six weeks after germination, the plants were transferred into a growth chamber with the respective conditions (120 μE at 12°C, or 750 μE at 22°C). SON-T AGRO 400 lamps (Philips, Eindhoven, The Netherlands) were used as light sources.

Genomic DNA from Arabis turrita L. and Arabidopsis wallichii (Hook. f. et Thoms.) Busch was a kind gift from M. Koch (Universität Heidelberg, Germany).

Leaf samples were taken by quickly removing all rosette leaves, and transferring them into liquid nitrogen within less than 10 seconds. Leaves from two different plants were combined to form each sample. The leaves were homogenized under liquid nitrogen, and stored at −80°C until use.

Identification and Analysis of Homozygous At5g58340 Plants

The Arabidopsis knock-out mutant lines At5g58340::tDNA-1 (SALK_053119) and At5g58340::tDNA-2 (SALK_018118) were received from the Arabidopsis Biological Resource Stock Center (http://www.arabidopsis.org/abrc). Homozygous knock-out plants were identified by PCR for T-DNA insertion within the gene region of At5g58340. Genomic DNA was isolated from plant tissues by standard methods. The sequence information for the gene- and T-DNA specific primers was obtained from the SALK-Institute (http://www.signal.salk.edu). The insertion positions of the T-DNA products were confirmed by sequencing the PCR products.

NADP-MDH Capacity Measurements in Leaf Samples

The samples used for the estimation of the NADP-MDH capacity were obtained from individual leaves. The leaves were cut from the plant and immediately transferred into liquid nitrogen. The leaves were homogenized under liquid nitrogen, and stored at −80°C until use. The activity of NAD-MDH and the total activity (capacity) of NADP-MDH were determined after exhaustive reduction with reduced dithiothreitol (DTT) and corrected for nonspecific NADP-dependent activity due to NAD-MDH, as described by Scheibe and Stitt (1988). For each experiment, duplicate measurements from three different plants were made.

Northern Blot Analysis

For Northern blot analysis, total RNA was isolated from frozen leaf material using the Purescript RNA extraction kit (Gentra Systems, Minneapolis, Minn., USA). For RNA gel-blot hybridization analysis, 10 μg of total RNA was denatured and separated on a 1.2% (w/v) agarose 2.5% (v/v) formaldehyde gel. Homidium bromide was included in the loading buffer to ensure equal sample loading. RNA was blotted onto a nylon membrane (Hybond-N, Amersham Biosciences) using downstream capillary transfer. RNA was cross-linked to the membrane by ultraviolet (UV) irradiation. Prehybridization and hybridization were performed at 65°C in Church-buffer medium [0.25 M sodium phosphate buffer, pH 7.2, 1 mM ethylenediamine tetraacetic acid (EDTA), 7% (w/v) sodium dodecyl sulfate (SDS), and 1% bovine serum albumin (BSA)]. Hybridization was performed by using an alpha-[32P]-dCTP-labeled full-length NADP-MDH (A. thaliana)-cDNA-specific probe (Ready-To-Go DNA labeling beads, Amersham Biosciences). Membranes were washed twice for 15 minutes at 65°C in washing buffer [40 mM sodium phosphate buffer, pH 7.2, 1 mM EDTA, 0.5% (w/v) SDS and 0.5% (w/v) BSA], then for 10 minutes at room temperature in washing buffer containing 1% (w/v) SDS. Finally, membranes were exposed to Kodak MS X-ray film at −80°C. The densitometric analysis of the blots was performed using the Gelix-1 gel scan software.

Isolation of Genomic DNA, Amplification and Sequencing of Gene Fragments

Genomic DNA was isolated using a modified protocol based on the cetyltrimethyl ammonium bromide (CTAB) method (Mummenhoff and Koch 1994). Homologous primers were designed for the amplification of NADP-MDH promoter fragments, based on the genomic NADP-MDH sequence from Arabidopsis thaliana var. Columbia (AB019228.1; clone MCK7). As forward primers, NADP-MDH-for (5′-tctagacgcctcgttggcgatatc-3′) and NADP-MDH-for2 (5′-ttgagctactgattcaagagagga-3′) were used, whereas NADP-MDH-rev (5′-ccatggccattatcgcacagacac-3′) was used as the reverse primer. The polymerase chain reaction (PCR) conditions were 0.4–1.4 μg template DNA, 0.5 U Taq polymerase (Fermentas, St. Leon-Rot, Germany) and 0.4 mM of each primer. PCR was carried out using a gradient thermocycler (Eppendorf, Hamburg, Germany). The PCR cycle conditions were an initial denaturation of five minutes at 95°C, followed by 45 cycles of one minute at 95°C, one minute at 55–60°C, and one minute at 72°C for extension, then cooling to 10°C. The PCR products were purified with the QIAquick® PCR-purification kit (Qiagen, Hilden, Germany). Cycle-sequencing amplification was carried out for all DNA fragments in forward and reverse orientation by using the BigDye™-terminator cycle-sequencing ready-reaction kit (Applied Biosystems, PE Biosystems, Foster City, USA).
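As a quick plausibility check on the primers listed above, their length, GC content, and reverse complement can be computed with a few lines of Python. This is a minimal sketch added for illustration and is not part of the original protocol. The primer sequences are taken from the text, while the helper names (gc_content, reverse_complement) are hypothetical.

```python
# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "NADP-MDH-for": "tctagacgcctcgttggcgatatc",
    "NADP-MDH-for2": "ttgagctactgattcaagagagga",
    "NADP-MDH-rev": "ccatggccattatcgcacagacac",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 to 1.0)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence, returned in upper case."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


for name, seq in PRIMERS.items():
    print(f"{name}: length {len(seq)} nt, GC {gc_content(seq):.1%}, "
          f"reverse complement {reverse_complement(seq)}")
```

The reverse complement of the reverse primer corresponds to the sequence as it would appear on the forward strand, which is convenient when locating the primer in the genomic sequence.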

Western Blot and Determination of Protein Content

Equal amounts of soluble protein (30 μg) were loaded onto two 15% discontinuous SDS-polyacrylamide gels using a vertical minigel system (Mini-Protean II, BioRad, Germany). After separation by electrophoresis, one of the gels was stained with Coomassie. The other gel was blotted onto nitrocellulose membrane. Immunodetection was performed essentially as described in Graeve et al. (1994). For the detection of NADP-MDH, an antibody against NADP-MDH from pea leaves (1:3000) was used. For the detection of the second antibody (1:15000), Luminol (Amersham, Freiburg, Germany) was used as a substrate. The protein content of leaf extracts was determined according to Bradford (1976), with BSA as a standard.

Yeast One-Hybrid

For construction of the NADP-MDH-HIS3 fusion genes, parts of the 5′ UTR, the 3′ UTR, and the coding sequence of genomic DNA fragments of NADP-MDH (clone MCK7) were amplified by PCR using specific primers (see the primer list in the supplementary information). The potential promoter fragments were cloned in pGEM-T Easy, digested with EcoRI, and cloned into pHisi and pHisi-1 in order to position the fragments directly in front of the His3-reporter gene. These constructs were digested with XhoI and transferred into the genome of the yeast strain YM4271 by homologous recombination, according to the manufacturer’s instructions (Matchmaker One-Hybrid System; Clontech, Heidelberg, Germany). A small-scale transformation, with 15 μg linearized pHisi-1 carrying the promoter fragments, was performed. For selection, the strains were grown on SD-His medium, so that only colonies that had a functional His3-reporter gene were able to grow. The background activity of His3 was suppressed with 3-amino-1,2,4-triazole (3-AT), a competitive inhibitor of the His3 gene product. Only strains with a His3-background activity below 45 mM 3-AT were used for one-hybrid screens.

An Arabidopsis-cDNA library (Nemeth et al. 1998), which carried a GAL4-activation domain (AD) and a leucine gene on the vector pACT2, was used to express proteins fused with GAL4-AD, which bind to the selected DNA fragments of NADP-MDH. The undigested cDNA plasmids were transformed into the promoter-reporter strains using the manufacturer’s protocol for a large-scale transformation (Matchmaker One-Hybrid System, Clontech, Heidelberg, Germany). The binding of a protein, expressed from the cDNA library as a fusion with the GAL4-AD, to a DNA fragment resulted in expression of the reporter gene His3 and allowed growth of the cells on media without histidine and leucine. From these colonies, the cDNA-carrying plasmids were isolated and transformed into electrocompetent Escherichia coli cells. For sequencing, the plasmids isolated from E. coli were used. All proteins identified with the YOH screen were denoted as “binds at NADP-MDH gene” (BANG) and numbered according to the fragment they bind to.

Bioinformatical Analysis

The DNA sequences of NADP-MDH were obtained from the National Center for Biotechnology Information (NCBI) gene bank (http://www.ncbi.nlm.nih.gov/), and from the eukaryote projects of the Institute for Genome Research (http://www.tigr.org/tdb/euk/). The genomic DNA sequence of A. thaliana was used to generate homologous primers for the promoter and the coding region. The comparison of the obtained sequences was performed with the BLAST algorithm (Altschul et al. 1997), using the BLAST portal of NCBI (http://www.ncbi.nlm.nih.gov/BLAST). To identify conserved promoter regions, multiple sequence alignments were performed with ClustalW (Version 1.8.2), using standard parameters. To allow further analysis of the promoter sequences and the identification of potential transcription-factor binding sites, the MatInspector tool of the Genomatix homepage (http://www.genomatix.de/cgi.bin/matinspector) was used. The expression patterns of At5g58340 and At5g58330 were compiled from the AtGenExpress tool (http://www.jsp.weigelworld.org/expviz/expviz.jsp).
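The nucleotide identities reported in the tables of this study are derived from such alignments. As an illustration only, the following Python sketch shows one common way to compute a percent identity from two already aligned sequences. The text does not state how gaps were treated, so the convention used here (identical columns divided by all columns in which at least one sequence has a residue) is an assumption, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Assumed convention: columns where both sequences carry a gap are
    ignored; a residue opposite a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column without information
        compared += 1
        if x == y:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0


# Toy alignment (hypothetical): 5 identical columns out of 7 compared.
print(f"{percent_identity('ATGC-TA', 'ATGCGTT'):.1f}% identity")
```

Other conventions, such as dividing by the length of the shorter sequence or excluding all gapped columns, give different values for the same alignment, which should be kept in mind when comparing identities across studies.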

The identification of the cDNA sequences, for the results obtained with the YOH screen, was performed with BLAST on NCBI and TAIR (http://www.arabidopsis.org/Blast/). For unknown interaction partners, a function search was performed with http://www.ncbi.nlm.nih.gov, using the “Short, nearly exact matches in Arabidopsis” and the “Short, nearly exact matches in all organisms, search by domain architecture” options. Sequence alignments and the analysis of unknown proteins were done with Lalign (http://www.ch.embnet.org/software/LALIGN_form.html), Predict protein (http://www.expasy.ch), Psort (http://www.psort.nibb.ac.jp/form.html), and Meta-search (http://www.cubic.bioc.columbia.edu/pp/submit_meta.html).

Results

Capacity Changes of NADP-MDH in Various Brassicaceae Species and Varieties

It has recently been shown that in A. thaliana var. Columbia the capacity of NADP-MDH is upregulated at the transcriptional level under a combination of high light and low temperature (Becker et al. 2006). We analyzed how the two parameters act, separately, to alter transcript level and enzyme capacity in different Arabidopsis ecotypes and in selected other Brassicaceae species. Low-light-acclimated plants (120 μE at 22°C) were transferred into conditions of either decreased temperature (120 μE at 12°C) or high light (750 μE at 22°C), and samples were taken after four days.

The capacity of NADP-MDH in low-light-acclimated plants ranged between 1.29 μmol h−1 cm−2 leaf area in Cochlearia officinalis (Fig. 1B), and 6.15 μmol h−1 cm−2 leaf area in Arabidopsis thaliana var. Columbia (Fig. 1A). After the transfer, the capacity increased under both conditions. In Arabidopsis thaliana var. Columbia, values of 8.75 μmol h−1 cm−2 and 9.68 μmol h−1 cm−2 were reached, while in Cochlearia officinalis 2.63 μmol h−1 cm−2 and 4.54 μmol h−1 cm−2 were reached. In general, the different Arabidopsis ecotypes and the other Brassicaceae showed a similar behavior. The largest increase was always obtained after transfer into high light. Decreased temperatures also resulted in higher NADP-MDH capacities, but the final values were, on average, 25% below the values obtained in high light (Figs. 1 and 2). Northern blot analysis of NADP-MDH mRNA in the different Arabidopsis ecotypes indicates a specific increase in gene expression in all cases (Fig. 3), which correlates with the activity measurements. Thus, the increase in NADP-MDH capacity after the transfer into high light or decreased temperature is most likely the consequence of a transcriptional upregulation.
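The relative increases implied by these numbers can be worked out with a short calculation. The following Python sketch is an illustration rather than part of the original analysis. It assumes that, for each species, the lower of the two post-transfer values belongs to the low-temperature treatment and the higher one to the high-light treatment, which follows from the statement that the largest increase was always obtained in high light. The dictionary layout and key names are hypothetical.

```python
# NADP-MDH capacities in umol h^-1 cm^-2 leaf area, taken from the text.
# Assignment of the two post-transfer values to the treatments is an
# assumption (see lead-in).
CAPACITY = {
    "A. thaliana var. Columbia": {"control": 6.15, "low_temp": 8.75, "high_light": 9.68},
    "Cochlearia officinalis": {"control": 1.29, "low_temp": 2.63, "high_light": 4.54},
}


def fold_changes(values):
    """Capacity after each treatment relative to the low-light control."""
    control = values["control"]
    return {k: v / control for k, v in values.items() if k != "control"}


for species, values in CAPACITY.items():
    fc = fold_changes(values)
    print(f"{species}: low temperature x{fc['low_temp']:.2f}, "
          f"high light x{fc['high_light']:.2f}")
```

On these assumptions, the capacity in Columbia rises by a factor of roughly 1.4 to 1.6, whereas Cochlearia officinalis, which starts from a much lower control value, shows a two- to more than threefold increase.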

Fig. 1

NADP-malate dehydrogenase (MDH) capacity changes in selected Brassicaceae species. Plants from the indicated species were analyzed four days after a transfer into low temperature (12°C), into high light (750 μE) or under control conditions (Control). Full activation of NADP-MDH was achieved by excessive reduction with DTTred. (A) A. thaliana var. Columbia, (B) Cochlearia officinalis, (C) Capsella rubella, (D) Lepidium latifolium. The error bars represent the standard error

Fig. 2

NADP-malate dehydrogenase (MDH) capacity changes in Arabidopsis ecotypes. Plants from three Arabidopsis ecotypes were analyzed four days after transfer into low temperature (12°C), into high light (750 μE) or under unaltered control conditions (Control). The mean values of two independent samples for each condition are shown. Full activity of NADP-MDH was measured after reduction with DTTred. (A) A. thaliana var. Columbia, (B) A. thaliana var. Landsberg erecta, (C) A. thaliana var. Wassilewskija

Fig. 3

Northern blot analysis of the NADP-malate dehydrogenase (MDH) transcript in Arabidopsis ecotypes. Plants from three Arabidopsis ecotypes were analyzed four days after a transfer into low temperature (12°C), into high light (750 μE) or under unaltered control conditions. (A) Autoradiograph of the Northern blot, hybridized with a [32P]-NADP-MDH (A. thaliana)-specific probe. The numbers below the image indicate the band intensity. (B) As a loading control for the gels, the rRNA content is shown. Left side, A. thaliana var. Columbia; middle, A. thaliana var. Landsberg erecta; right side, A. thaliana var. Wassilewskija

Structure of the NADP-MDH Gene and the Upstream and Downstream DNA Regions

In Arabidopsis, the gene for NADP-MDH is located on chromosome five and consists of between 12 and 14 exons, and between 11 and 13 introns (Fig. 4). For this gene, three different splicing variants are available in the Arabidopsis database (http://www.arabidopsis.org). The first splicing variant (At5g58330.1) has a total length of 3180 bp, encoding for a protein of 444 amino acids. Its 5′ UTR is predicted to be 94 bp long, and its 3′ UTR is 808 bp long. Splicing variant At5g58330.2 is 57 bp shorter, and encodes a protein of 443 amino acids. Both differ in their amino acid sequence between position 16 and 48, located in the transit peptide of the protein. For At5g58330.2, the 5′ UTR is predicted to have a length of only 37 bp, while the 3′ UTR is also 808 bp long. The third splicing variant (At5g58330.3) is 3094 bp long, but the predicted protein should have a size of only 335 amino acids, because the 5′ UTR is predicted to cover the first 544 bp of the coding region as well. Therefore, the first 108 amino acids of the resulting protein, carrying the transit peptide and the N-terminal extension with the regulatory cysteine residues, would be missing. Apart from the lack of the ability to be transferred into chloroplasts and to be regulated via the N-terminal disulfide bridge, this would result in a much smaller NADP-MDH protein than has ever been observed in vivo. In addition, the corresponding transcript could not be found in the mRNA-data collection of NCBI (http://www.ncbi.nlm.nih.gov/).

Fig. 4

Schematic representation of At5g58340, the splicing variants of NADP-MDH (At5g58330) and At5g58320 in A. thaliana var. Columbia. This figure gives an overview of the localizations of At5g58340, At5g58330 and At5g58320. The exons of At5g58340 are shown in red. The exons of the three splicing variants of At5g58330 (NADP-MDH) are shown in yellow, and At5g58320 is shown in green. All untranslated regions (UTRs) are shown in light blue. Exons 5 and 6 are not assigned to the Arabidopsis NADP-MDH gene (cf. Fig. 6)

Assuming that only the first two splicing variants (At5g58330.1 and At5g58330.2) occur in vivo, the distance to the next genes on chromosome 5 is very small (summarized in Fig. 4). In the 3′ direction, the coding region of At5g58320 starts 410 bp after the end of the last exon of NADP-MDH, but the UTRs are predicted to overlap by 829 bp. At5g58320 is transcribed in the opposite direction. The function of that gene is not known, and it must be questioned whether it is transcribed at all, since no expression data are available in the internet databases.

In 5′ direction, the transcription start of the gene At5g58340, also encoded on the opposite DNA strand, is only 319 bp away from the ATG of NADP-MDH. The coding region is 1575 bp long and contains two introns close to the end of the gene. The first exon, located nearest to the NADP-MDH transcription start, is 1050 bp long and thus covers the region which should carry the distal and the proximal promoter of NADP-MDH. The UTRs of both genes also overlap (Fig. 4). At5g58340 is denoted as an unknown gene, and a BLAST search for similar genes indicates that it may be a myb-type transcription factor, because it probably contains a myb-DNA binding domain. In order to assess whether At5g58340 is really transcribed and detectable on gene arrays, the expression data on Affymetrix chips available in the internet were used. According to AtGenExpress (http://www.jsp.weigelworld.org/expviz/expviz.jsp), At5g58340 is expressed in all parts of the plant. The lowest expression values are reported for leaves, and the highest expression is reported for pollen and seeds in the late developmental states. However, the intensity of the signals is less than 10% of the intensity found for NADP-MDH. It should also be noted that the expression pattern of NADP-MDH does not show any similarity with that of At5g58340.

The structure of the NADP-MDH gene, as described above, is not unique to Arabidopsis. All other Brassicaceae species sequenced in this study (see below) display the same gene structure, with 11 introns and only a small distance of approximately 320 bp between the coding regions of NADP-MDH and the unknown gene on the opposite DNA strand. The genomic sequence of the NADP-MDH gene and a large part of the 5′ UTR is also available for some Poaceae. Sorghum vulgare is unique in having two genes for NADP-MDH, and the sequence from Oryza sativa is also available from databases. Three striking differences in the gene structures from Brassicaceae and Poaceae are evident (Figs. 5 and 6). First, the NADP-MDH genes from the Poaceae possess 13 introns, because DNA alignments indicate that the region corresponding to the large exon 4 of A. thaliana var. Columbia is disrupted by two additional introns, which may be as long as 350 bp (Fig. 6). Thus, exons 5 and 6 are absent in Arabidopsis in the figure. The second major difference is that intron 2 is much larger, up to 450 bp in Oryza sativa, compared to only approximately 50 bp in the Brassicaceae. The most striking difference with respect to transcriptional regulation is, however, that in the Poaceae no other genes are located in a 1200-bp region upstream of the transcriptional start. In rice, a pseudogene may be located at −897, but its DNA sequence does not display any similarity with At5g58340.

Fig. 5

Schematic representation of the 5′ upstream region of the NADP-malate dehydrogenase (MDH) gene in various plants. The ATG of NADP-MDH is set to zero. Only the first exon of NADP-MDH is shown (in yellow at the right end). The classical promoter region of NADP-MDH, which contains At5g58340 on the opposite DNA strand, is labeled with negative numbers. The 5′ upstream regions that were sequenced and aligned for the species listed in Table 4 are indicated. All sequenced DNA fragments are shown in grey and the sequence parts used for the alignments in Table 4 are shown in blue

Fig. 6

Schematic representation of the NADP-MDH gene and of the fragments used for the YOH screen. The transcription start of NADP-MDH (ATG) is set at zero. The classical promoter region of NADP-MDH is labeled with negative numbers. Intron and exon structure, and numbering of introns and exons, as used in Tables 2 and 3, is indicated. At the top of the figure, the position of the fragments used in the YOH screen is indicated

Sequence Comparison of the NADP-MDH Gene and the 5′-DNA Region

Among Brassicaceae, genes, especially exons, are often highly conserved (Rossberg et al. 2000). However, the NADP-MDH protein is strongly conserved even between other organisms. The identity of NADP-MDH from A. thaliana var. Columbia and other Brassicaceae compared to that of Poaceae is 76–78%, and even the protein from Chlamydomonas reinhardtii is 60 or 62% identical with that of Poaceae or Brassicaceae, respectively (Table 1). In all cases, most differences in the translated sequence occur within the first 100 amino acids, carrying the transit peptide, and the region containing the N-terminal regulatory cysteines. The exception is NADP-MDH from Chlamydomonas reinhardtii which already carries the N-terminal sequence extension with several conserved amino acids but does not possess the two N-terminal cysteines. Also, the 40 amino acids of the sequence with the C-terminal regulatory cysteine residues are not as highly conserved as the other amino acids (Table 1).

Table 1 Alignment of the NADP-MDH amino-acid sequences

However, at the DNA level, some differences are apparent. In the coding region, the exons of all Brassicaceae are highly conserved (Table 2). Identities range from 87–89% (exon 14) to 96–100% (exons 7 to 9), as was already reported for other Brassicaceae exons (Acaran et al. 2000; Boivin et al. 2004). These values are slightly higher than those observed within the Poaceae (Table 3), with average identities between 90 and 100%. The identity of Poaceae with Arabidopsis is only 10% for exon 1 (encoding the transit peptide), and otherwise between 34% and 88%. A comparison of the introns shows large differences between Poaceae and Brassicaceae. In Brassicaceae, the introns of the NADP-MDH gene are nearly as highly conserved as the exons. In Poaceae, however, identities of 80% or above are rare, and only occur between the two Sorghum sequences. The identities between rice and Sorghum are around 25%, and only intron 10 is 63% identical between the two species.

Table 2 Nucleotide identities of the NADP-MDH genes of selected Brassicaceae
Table 3 Nucleotide identities of the NADP-malate dehydrogenase (MDH) gene from A. thaliana var. Columbia and selected Poaceae

The same tendency continues in the 5′ UTR of the NADP-MDH gene. Here, the two regions indicated in Fig. 5 were analyzed in more detail. Overlapping region 1 has a length of approximately 520 bp and lies between −380 and −890 bp, upstream of the ATG of the Arabidopsis NADP-MDH. In the case of Brassicaceae, this is within the coding region of At5g58340. In overlapping region 1, the promoter sequences of eight Brassicaceae species, namely A. thaliana var. Columbia, A. thaliana var. Landsberg erecta, A. thaliana var. Wassilewskija, Cardaminopsis petraea, Capsella bursa-pastoris, Arabis turrita, Arabidopsis wallichii, and Cochlearia officinalis, are compared with Oryza sativa and the two Sorghum sequences. The summary (Table 4A) indicates high homologies, which are similar to the introns or exons of NADP-MDH within this region, while (with the exception of Sorghum I and II) this region is not conserved in Poaceae.

Table 4 Nucleotide identities within the 5′ region of the NADP-MDH gene

Overlapping region 2 has a length of 242 bp and lies between −76 and −318 bp, which is between the start of the NADP-MDH and At5g58340, and covers a large part of the 5′ UTR of At5g58340. In overlapping region 2, the sequences of four Brassicaceae (A. thaliana var. Landsberg erecta, Capsella bursa-pastoris, Arabis turrita, and Arabidopsis wallichii) are compared with NADP-MDH from Oryza sativa and the two Sorghum sequences. The degree of identity is not as high as in overlapping region 1, but with an average of 80%, it is much higher than in Poaceae, where only values below 10% were obtained (Table 4B). The gene from Chlamydomonas was not included in this comparison, because it differs in all aspects (promoter, intron structure, and length) from the higher plant genes.

Analysis of Knock-Out Plants for At5g58340

In order to assess whether the emerging picture, namely the lack of regulatory elements in 5′ position of the NADP-MDH gene, but regulatory elements located within the coding region, can be confirmed by a more direct method, knock-out plants that carry large T-DNA inserts in the proximal and distal promoter region were analyzed. Homozygous lines were selected as described in the Materials and Methods section, and the positions of the inserts were confirmed by DNA sequencing. Unfortunately, the expression level of At5g58340 is extremely low, and it was neither possible to analyze its expression in wild-type (WT) plants, nor to confirm the lack of expression in the homozygous knock-out plants.

In line ko1, the insert is located 420 bp upstream of the ATG of NADP-MDH. This is outside the 5′ UTR of NADP-MDH, but inside the first exon of At5g58340. In line ko2, the insert position is 1524 bp upstream of the ATG, which is within the second exon of At5g58340 (Fig. 7A). In both knock-out lines, the expression of functional At5g58340 should be abolished. However, the homozygous knock-out plants did not display any visible differences from the WT plants. The expression of NADP-MDH was also unaltered (Figs. 7B and 7C). This confirms our conclusion that no regulatory elements are located in the classical proximal and distal promoter region. However, it cannot be ruled out that regulatory elements are located in the 420 bp between the ATG and the position of the insert in ko1.

Fig. 7 NADP-MDH expression in knock-out plants for At5g58340. (A) The position of the inserts in lines ko1 and ko2, as confirmed by DNA sequencing, is indicated; (B) sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE); (C) Western blot using an antibody raised against NADP-MDH from pea leaves. For both gels, a total amount of 30 μg protein was used. Lanes 1 and 3, WT; lane 2, homozygous At5g58340-ko1; lane 4, homozygous At5g58340-ko2

DNA Binding Proteins Identified by YOH

A YOH screen for DNA binding proteins was performed with A. thaliana var. Columbia genomic DNA. The YOH screens were performed with one fragment (F1) that covers the 5′ UTR of the NADP-MDH gene and the first exon (−411 to 106) and four fragments located inside the coding region (Fig. 6). Fragment 5 (F5) starts in the first exon (106 bp downstream of the ATG) and has a total size of 1740 bp. Fragment 9 (F9) spans the second half of the coding region and part of the 3′ UTR. It starts at position 1211 and ends at position 2788. Fragment 7 (F7) covers a part of the 3′ UTR and reaches from 2372 to 2530, and fragment 10 (F10) reaches from 1211 to 2437. For the classical promoter region (F1), only three different proteins could be identified (Table 5). Two of them have an unknown function, and there is little evidence that they are involved in transcriptional regulation by a known mechanism.
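The fragment coordinates given above can be checked with simple interval arithmetic. The sketch below is an illustration only: the `Fragment` class and the `overlap` helper are hypothetical names, positions are taken to be inclusive and counted from the ATG, and the end of F5 is derived from its stated start and size rather than given in the text. F1 is omitted because its length depends on how positions around the ATG are numbered.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    name: str
    start: int  # first position relative to the ATG
    end: int    # last position, assumed inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap(a: Fragment, b: Fragment) -> int:
    """Number of positions shared by two fragments (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


F5_START, F5_SIZE = 106, 1740  # values stated in the text

FRAGMENTS = [
    # Assumption: the end of F5 follows from start + size - 1.
    Fragment("F5", F5_START, F5_START + F5_SIZE - 1),
    Fragment("F7", 2372, 2530),
    Fragment("F9", 1211, 2788),
    Fragment("F10", 1211, 2437),
]

if __name__ == "__main__":
    for i, a in enumerate(FRAGMENTS):
        print(f"{a.name}: {a.start}..{a.end} ({a.length} bp)")
        for b in FRAGMENTS[i + 1:]:
            print(f"  shared with {b.name}: {overlap(a, b)} bp")
```

If the published coordinates follow a different convention (for example, exclusive end positions), the computed lengths and overlaps shift by one.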

Table 5 List of proteins identified by the YOH with fragments of the NADP-MDH gene

BANG 1-1 (At4g08240), the first protein that was found to bind to F1, has a calculated molecular weight of 14.5 kDa, and is predicted to possess between three and five α-helices. The three-dimensional structure could not be predicted and no known domains could be identified. It is characterized by a very high content of leucine residues, and the serine and threonine contents are also unusually high. A similar protein (55% identity) was only found in the rice genome (P0020D05.17, classified as ‘unknown protein’), but not in any other organism. The second protein, BANG 1-2, is encoded by At5g11680. The deduced amino acid sequence consists of 207 amino acids and has a calculated molecular weight of 23.1 kDa. No three-dimensional structure could be predicted and no known domains could be recognized. Again, only a protein identified within the rice genome has a high identity (75%) with BANG 1-2. Psort (http://www.expasy.ch) predicts that both proteins are localized in the cytosol. However, for such small proteins a nuclear localization signal (NLS) would not be required for nuclear translocation (Yoneda 1997). The third protein that binds to F1 was identified as EIN2 (BANG 1-3), an ethylene receptor (Chang and Shockey 1999). However, since this protein is probably membrane-associated (Alonso et al. 1999), it is not very likely that it can bind to the NADP-MDH gene, although it has been reported that clathrin-coated vesicles may provide such a possibility (Benmerah et al. 2003).

In additional YOH screens, fragments that cover most of the coding region of the NADP-MDH gene were used. Fragment 5 covers introns and exons 1 to 10. Fragment 10 slightly overlaps with F5, and covers introns 8 to 13 and exons 8 to 14. A total of 34 proteins were identified as binding to the coding region. The Arabidopsis gene numbers and the description are summarized in Table 5. The properties of the proteins are summarized in Table 6. For 10 of the proteins, Psort predicts a nuclear localization, and for 19 additional proteins, a nuclear localization is possible, even without NLS, due to their small size. In eight cases, literature data indicate nuclear localization or the ability to bind DNA.

Table 6 Properties of putative NADP-MDH interaction partners

Discussion

Regulation of NADP-MDH Expression in Brassicaceae and Poaceae

Under in vivo conditions, a complex pattern for the transcriptional regulation of NADP-MDH is found. Changes in the expression of NADP-MDH are caused by exogenous factors (e.g., CO2 availability, light intensity, light period, or temperature) and endogenous factors (plant and leaf age, flowering). This transcriptional regulation is not unique to Arabidopsis and other Brassicaceae; it was also observed in other species such as wheat (Savitch et al. 2000) and tobacco (Faske et al. 1997). Unfortunately, no data are available as to whether changes in NADP-MDH expression also occur in Chlamydomonas or related algae under changing environmental conditions.

Although the amino acid sequence of NADP-MDH is similar in Chlamydomonas, Poaceae and Brassicaceae (Table 1), the gene structure displays striking differences (Fig. 6). In Poaceae, typical promoter elements could be identified. The Chlamydomonas NADP-MDH gene also possesses a classical promoter, but it does not share conserved motifs with the Poaceae promoter. In addition, the gene structure differs between the two. Although the Brassicaceae gene is organized in a similar way to that in Poaceae, another gene is located close to the transcription start of NADP-MDH in Brassicaceae, and promoter elements are absent from the 5′ region. The results of the YOH screen, sequence alignments, and the NADP-MDH expression in Arabidopsis knock-out plants for At5g58340 provide evidence that in Brassicaceae most regulatory elements, which are responsible for the manifold environmental effects on NADP-MDH expression, must be located inside the coding region of the gene.

In the knock-out lines for At5g58340, a large insert disrupts the DNA region where the proximal and the distal promoters of typical genes should be located. However, both knock-out lines are indistinguishable from WT plants in all aspects. In particular, the expression of NADP-MDH is unaffected. Interruption of a promoter by large DNA inserts would be expected to affect gene expression, because regulatory elements are removed from their original positions. The unaltered expression therefore supports the results obtained from motif prediction programs, which indicate that no regulatory elements are located in 5′ position of the NADP-MDH gene of Brassicaceae. With the YOH screen, only three proteins that bind to the 5′ region were identified, while 34 proteins appear to be able to bind to the coding region. However, most of the proteins are as yet unknown. Bioinformatic analysis confirmed that some of the proteins (BANG 5-6, BANG 5-9, BANG 5-10, BANG 5-12, BANG 7-4, BANG 7-5, BANG 10-6, BANG 10-8, BANG 10-11, BANG 10-13, and BANG 10-15) contain an NLS sequence, which would facilitate nuclear localization. Since other routes of nuclear import exist, the absence of an NLS does not automatically mean that nuclear localization is impossible. Future work will focus on the identification of DNA-binding proteins in vitro, and identification of their target motifs.

Secondly, the comparison of sequence identities in the coding and non-coding parts of the NADP-MDH gene indicates large differences between the Brassicaceae and the other organisms. The position of NADP-MDH in Poaceae is covered in Brassicaceae by another gene on the opposite DNA strand. The first exon of At5g58340 starts at −319 bp, before the transcription start of NADP-MDH. It is 1050 bp long and covers a large part of the proximal and distal promoter region. As a consequence, this region is highly conserved within the Brassicaceae. The evolution rate is as low as is expected for coding regions. One single nucleotide exchange in every 500–1000 bp is the average value. For non-coding regions, a higher average rate of one exchange in every 200–500 bp would be typical (Brumfield et al. 2003). In fact, within the Poaceae, there was no similarity between the sequences in this DNA segment. The same difference between Brassicaceae and Poaceae holds true for the DNA segment between NADP-MDH and At5g58340. In Brassicaceae it is even more conserved than are the exons of At5g58340.
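To give a feeling for what these average rates imply for the regions discussed here, the following sketch converts them into expected numbers of exchanges per region. It is a back-of-the-envelope illustration under a stated assumption: the rates are read as the average spacing between exchanges in a pairwise comparison, and exchanges are taken to be spread evenly. The function name and the dictionary of regions are chosen for illustration; the lengths are those given in the text.

```python
def expected_exchanges(region_bp: int, bp_per_exchange: tuple[int, int]) -> tuple[float, float]:
    """Return the (lowest, highest) expected number of exchanges in a region.

    bp_per_exchange holds the (shortest, longest) average distance in bp
    between two exchanges; a longer distance means fewer exchanges.
    """
    shortest, longest = bp_per_exchange
    return region_bp / longest, region_bp / shortest


CODING = (500, 1000)     # one exchange in every 500-1000 bp
NON_CODING = (200, 500)  # one exchange in every 200-500 bp

REGIONS_BP = {
    "first exon of At5g58340": 1050,
    "overlapping region 1": 520,  # approximate length
    "overlapping region 2": 242,
}

if __name__ == "__main__":
    for name, length in REGIONS_BP.items():
        c_low, c_high = expected_exchanges(length, CODING)
        n_low, n_high = expected_exchanges(length, NON_CODING)
        print(f"{name} ({length} bp): "
              f"coding-like {c_low:.1f}-{c_high:.1f}, "
              f"non-coding-like {n_low:.1f}-{n_high:.1f} exchanges")
```

For short regions such as overlapping region 2 the two ranges differ by only a few exchanges, so a single pairwise comparison cannot distinguish coding-like from non-coding-like conservation as clearly as the comparison across several species.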

Additional characterization of the 5′ region of the NADP-MDH gene was performed by searching for cis-acting elements in transcription factor databases (Genomatix, MatInspector). For the genomic sequence of NADP-MDH from A. thaliana var. Columbia (At5g58330; T-DNA-Express), no potential binding sites for transcription factors could be identified, and a TATA box is also absent. For genes lacking the TATA box, a direct interaction between regulatory transcription factors, the TATA-box binding protein associated factor (TAF), and the preinitiation complex is possible and would also result in the binding of RNA-polymerase II (Wieczorek et al. 1998). Genes without a TATA box may not be so rare. The TATA box is absent from approximately one third of all Arabidopsis genes (Shahmuradov et al. 2005).
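The principle of such a search can be illustrated with a simple pattern scan. The sketch below is not the method used here: tools such as MatInspector score sequences against weight matrices, whereas this example only looks for exact matches to two simplified consensus patterns on both strands. The patterns (TATA[AT]A[AT] for the TATA box, CCAAT for the CCAAT box), the function names, and the test sequence are assumptions chosen for illustration.

```python
import re

# Simplified consensus patterns (assumption, not the matrices of any tool).
MOTIFS = {
    "TATA box": "TATA[AT]A[AT]",
    "CCAAT box": "CCAAT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def scan(seq: str) -> list[tuple[str, str, int, str]]:
    """Return (motif, strand, 0-based start, matched text) for every hit.

    Start positions on the '-' strand refer to the reverse-complemented
    sequence, not to the input sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for name, pattern in MOTIFS.items():
            # The lookahead also reports overlapping matches.
            for match in re.finditer(f"(?=({pattern}))", strand_seq):
                hits.append((name, strand, match.start(), match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence, not taken from the NADP-MDH 5' region.
    for hit in scan("GGCCAATCGTATAAATGG"):
        print(hit)
```

An empty result from such a scan only shows that no exact consensus match is present; degenerate or unknown motifs would be missed.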

An unusually high degree of conservation was found not only for the 5′ region of the NADP-MDH gene, but also for the introns in the Brassicaceae NADP-MDH gene. Intron size and position are highly conserved in Brassicaceae, but not in Poaceae. This indicates that some kind of selection pressure prevents any exchange of nucleotides at a higher rate. Even the two genes of Sorghum NADP-MDH, which arise from a recent gene duplication (Luchetta et al. 1991; Rondeau et al. 2005), differ more from each other than do the introns of the Brassicaceae species sequenced in this work. Thus, three different types of promoters are found for the NADP-MDH gene in the different species. For Chlamydomonas and for rice, internet resources such as Gene2Promoter (http://www.genomatix.de/) predict a promoter, and a CCAAT box is present, while in Arabidopsis, the prediction programs do not find any promoter-like structures. This supports our conclusion that in the case of Brassicaceae, regulatory transcription-factor binding sites lie within the coding region of NADP-MDH, especially inside the introns. It seems that regulatory elements for NADP-MDH expression could not be inserted into the exon of another gene (At5g58340), even when this is encoded on the opposite DNA strand. One may expect that the loss or modification of At5g58340 would have serious consequences for fundamental processes in Arabidopsis, but the lack of any visible differences between the WT plants and both knock-out lines indicates that this is not the case.

Apart from the transcription-factor binding motifs, which are usually very small, promoters are not highly conserved. In particular, the distal promoter region is very variable, and there are some examples of distal promoter elements several kb away from the gene they regulate (Kirchhamer et al. 1996). There is evidence that downstream-activating sequences also play a role in transcriptional regulation. Dickey et al. (1992) demonstrated for the fed-1 gene from Pisum sativum, that regulatory elements can be located inside the coding region, even within exons. Fed-1 encodes the photosynthetic protein ferredoxin I. A light-responsive element was identified in the 5′ UTR and in the first third of the coding region. With respect to the Acl 1.4 gene, which encodes ACP4, an acyl-carrier protein, Bonaventure and Ohlrogge (2002) demonstrated that it also contains elements for transcriptional regulation within the coding region.

Evolution of NADP-MDH Structure and Function

It is interesting to ask how such a difference in gene structure could evolve. NADP-MDH is originally derived from NAD-malate dehydrogenase (NAD-MDH) and still displays several structural similarities to NAD-dependent malate dehydrogenases, especially around the active site. Apart from the altered coenzyme usage, the major difference between NAD- and NADP-MDH is the presence of extra amino acids in the C- and N-terminal extensions (Scheibe 1990). Both possess regulatory cysteine residues that act in two different ways to inactivate the enzyme upon oxidation (in darkness) in the chloroplast (Miginiac-Maslow and Lancelin 2002). The NADP-MDH gene of Chlamydomonas possibly resembles a common ancestor of the extant Brassicaceae and Poaceae genes. It differs from both in possessing only the C-terminal disulfide bridge. At the N-terminus of the Chlamydomonas protein the sequence extension is already present, but it does not contain any cysteine residues (Ocheretina et al. 2000). The absence of the second regulatory disulfide bridge alters the activation/inactivation properties of NADP-MDH (Lemaire et al. 2005). Although the protein sequence of Chlamydomonas NADP-MDH is quite similar to higher plant NADP-MDH (Table 1), the gene structure differs remarkably. As summarized in Fig. 6, the Chlamydomonas gene possesses only eight introns, some of which are quite long, and their positions differ from those in the Brassicaceae and Poaceae. In addition, it possesses a classical promoter in 5′ direction.

The early evolution of NADP-MDH must have occurred in an atmosphere that consisted of nearly 1% CO2 with only traces of oxygen. Later, when the oxygen concentration increased, eukaryotic algae developed the CO2-concentrating mechanism (CCM) to maintain a high CO2/O2 ratio inside the chloroplasts. An enzyme with limited regulatory properties, as found for NADP-MDH of the extant Chlamydomonas, may be sufficient for a malate-valve function under such conditions. The ability to regulate the malate valve at the level of gene expression may no longer have been necessary. The insertion of another gene into the promoter (the progenitor of At5g58340 in this case) could occur without any negative consequences. During the evolution of multicellular organisms, the CCM of the algal type was lost, and after plants had colonized land, the selection pressure changed again. Light intensities on land are not only much higher than in the water, but they can also change more dramatically. In addition, the O2-partial pressure increased further, paralleled by a decrease of the atmospheric CO2 content. The regulatory disulfide bridge on the N-terminal sequence of NADP-MDH possibly evolved first, allowing for more efficient post-translational regulation of the enzyme, but the introduction of additional regulatory motifs into the DNA was also required. Our data do not clearly indicate the point at which Poaceae and Brassicaceae diverged, but the last common ancestors probably existed about 65 million years ago. Since both have conserved cysteine residues at the N- and C-termini, their introduction possibly happened in a common ancestor. The time point of At5g58340 insertion is more difficult to estimate. One possibility is that At5g58340 was inserted into a common ancestor of both, which already had both regulatory disulfide bridges, and was later lost in Poaceae. The 5′ region of Arabidopsis NADP-MDH can be aligned with the rice promoter to achieve 41% identity, but it is questionable how reliable the algorithms used for such an alignment are, especially when gaps need to be introduced into noncoding DNA. In this scenario, the progenitor of At5g58340 was removed, and the elements required for the transcriptional regulation of NADP-MDH, now forming a new promoter, were introduced instead. However, it is also possible that At5g58340 was inserted later in an ancestor of the Brassicaceae, but not into the genome of the Poaceae progenitors. Our data only indicate that, at the position of At5g58340, Poaceae do not possess a coding region. Genes very similar to At5g58340 are only found in other Brassicaceae, but not in rice, Sorghum or Chlamydomonas.

With the exception of some Poaceae, most extant plants still possess only one copy of NADP-MDH. The last step in the evolution of the NADP-MDH gene in Poaceae seems to be a gene duplication in some C4 species such as Sorghum (Luchetta et al. 1991; Rondeau et al. 2005). This may be due to the special function of NADP-MDH in C4 metabolism, requiring a much higher expression level in the mesophyll cells only. However, maize successfully performs the C4 pathway with only one copy of the NADP-MDH gene.

In conclusion, our data support the hypothesis that NADP-MDH expression in Brassicaceae is regulated by a new type of promoter. We conclude that during evolution no regulatory elements could be introduced into an exon of another gene encoded on the opposite DNA strand without influencing its function. The NADP-MDH promoter can be described as a null–null promoter, because the NADP-MDH gene does not contain any regulatory elements within the 5′ region. All regulatory elements must then be located either inside the coding region, or in the 3′ UTR. Although all of the elements involved in transcriptional regulation may be located within the coding region, the expression of NADP-MDH follows the same complex pattern as in other organisms which use a null–core promoter, as is the case in Poaceae.