Introduction

Each compartment in a plant cell contains its own specific set of proteins meant to fulfil a specific function within the metabolic range of reactions. For most reactions, the general rule ‘one gene-one compartment’ (Small et al. 1998) applies which implies that most enzymes are targeted exclusively to one cellular compartment. As a consequence, similar metabolic steps are often performed by isoenzymes that are presumed to have evolved by gene duplication. However, an increasing number of examples for proteins that possess either one ambiguous targeting peptide or two or more targeting signals have emerged over the last decade. Selective targeting of proteins to different cellular compartments can be important for plant development and interorganellar communications. This phenomenon has been discussed in several recent reviews and the various mechanisms of dual targeting and combinations of intracellular targets have been summarized (Small et al. 1998; Silva-Filho 2003; Karniely and Pines 2005). Among the known combinations of target compartments, the combination of mitochondria and plastids is particularly abundant in plant cells (Silva-Filho 2003). In contrast, it is striking that hardly any target combinations of nucleus/plastids or nucleus/mitochondria have been reported. The aim of this study was to determine to what extent dual localization to the nucleus and one of the other DNA-containing organelles might occur among proteins involved in the regulation of gene expression. To this end we used an in silico approach to screen the genomes from Arabidopsis thaliana, a dicotyledonous plant, and rice, a monocotyledonous plant, for transcription factors that possess the relevant plastid, mitochondrial and nuclear-targeting sequences.

In contrast to plants, the available data from yeast and mammalian cells show that here, a significant number of proteins are active in mitochondria as well as in the nucleus. Several of such dually targeted proteins are involved in tRNA-processing like the yeast Trm1, Mod5, Cca1 and Rpm2 proteins (Ellis et al. 1989; Boguta et al. 1994; Wolfe et al. 1994, 1996; Stribinskis et al. 2005) or in DNA mismatch repair as the human uracil-DNA glycosylase (Slupphaug et al. 1993). One example implicated in chromatin remodeling, transcription, splicing and translation processes is the K protein of the hnRNP complex that has been found not only in the nucleus but also in the cytoplasm and in the mitochondria (Bomsztyk et al. 2004). Other proteins found in the mitochondria and in the nucleus are involved in programmed cell death such as the apoptosis inducing factor, AIF, which has been found in mammals and in yeast (Wissing et al. 2004; Ruchalski et al. 2006).

In land plants, one of the very few examples of dually targeted nuclear/plastid proteins was described in 1997 by Luo et al., who reported the existence of two sets of transcripts of the bifunctional carrot dihydrofolate reductase/thymidylate synthase. A longer transcript of the corresponding gene encodes a protein with an N-terminal plastid target peptide that can direct the precursor protein to the chloroplasts, while a shorter transcript produced from the same gene lacks the N-terminal extension and therefore apparently codes for a nuclear version of the protein (Luo et al. 1997). In 2001, a protein similar in size and immunologically related to a nuclear DNA-binding protein, SEBF, that acts as a repressor of the potato pathogenesis-related gene PR-10a, was observed in chloroplasts (Boyle and Brisson 2001). Recently, the three members of a new family of transcription factors in Arabidopsis thaliana, the Whirly (Why) protein family, were shown to be directed to either plastids or mitochondria in protoplasts transformed with the respective GFP fusion proteins (Krause et al. 2005). Previous reports on the Why1 protein of potato (alias p24, Desveaux et al. 2000) have described the interaction between this protein and the promoter of the nuclear pathogen response gene PR-10a in infected cells (Desveaux et al. 2000, 2004). Most recently, reports on two further dually targeted DNA-binding proteins with localization in the nucleus and in one of the other two DNA-containing organelles have been published (Sunderland et al. 2006; Raynaud et al. 2006). In the case of DNA ligase 1, translation initiation from a first in-frame start codon produces a protein that is exclusively targeted to mitochondria, while the use of an alternative second start codon produces a protein that is found only in the nucleus (Sunderland et al. 2006). The existence of a chloroplast-localized protein initiated at a potential third AUG that was previously proposed (Sunderland et al. 2004) could not be confirmed.

The present study demonstrates that these proteins likely are just the tip of the iceberg and that dual-targeting activity to the nucleus and the plastids or mitochondria seems to be a broader phenomenon in plant cells than currently anticipated.

Materials and methods

Sequence retrieval

Predicted putative transcription factor sequences of Arabidopsis thaliana were obtained from the Arabidopsis Transcription Factor Database (Davuluri et al. 2003; http://www.arabidopsis.med.ohio-state.edu/AtTFDB). The gene names follow the AGI locus identifier and the annotation is based on TAIR v.6 (http://www.arabidopsis.org). The different loci coding for putative transcription factors of rice (Oryza sativa) were obtained from the Rice Transcription Factor Database (http://www.ricetfdb.bio.uni-potsdam.de). The rice genes were named according to the TIGR locus identifier and the annotation is based on TIGR v.4 (http://www.tigr.org/tdb/e2k1/osa1).

Prediction of subcellular localization

All predictions were based on a consensus prediction using a naïve Bayes method. For this, individual predictions of chloroplast and mitochondrial target peptides were obtained from several publicly available web services (Table 1). These individual predictions were combined mathematically into a consensus score. In detail, two complementary hypotheses were tested for the chloroplast (and two more for the mitochondrion): given a positive prediction, the protein either is or is not located in that organelle. For each prediction program, the likelihoods, i.e. the probabilities of a positive prediction under one or the other hypothesis, were evaluated from its predictions for sets of plant proteins with known subcellular localization. Plant proteins for these test sets were selected from the UniProt database (Schneider et al. 2005) or the Arabidopsis Subcellular Proteomic Database (Heazlewood et al. 2005) (see supplemental files 1–3). Redundancy within the protein sets was reduced so that no two proteins shared more than 40% sequence identity.
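The likelihood estimation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the protein identifiers and the data layout are hypothetical, and it assumes that each likelihood is simply the fraction of test-set proteins that a program predicts as positive.

```python
def positive_rates(predicted_positive, known_located):
    """Estimate p(a|c) and p(a|not c) for one prediction program.

    predicted_positive: set of protein IDs the program predicts as targeted.
    known_located: dict mapping protein ID -> True if the protein is known
        to reside in the compartment, False if it is known to reside elsewhere.
    """
    located = [pid for pid, inside in known_located.items() if inside]
    elsewhere = [pid for pid, inside in known_located.items() if not inside]
    # Fraction of true residents that receive a positive prediction.
    p_pos_given_c = sum(pid in predicted_positive for pid in located) / len(located)
    # Fraction of non-residents that receive a positive prediction.
    p_pos_given_not_c = sum(pid in predicted_positive for pid in elsewhere) / len(elsewhere)
    return p_pos_given_c, p_pos_given_not_c


# Hypothetical test set: four residents (P1-P4) and five non-residents (N1-N5).
known = {"P1": True, "P2": True, "P3": True, "P4": True,
         "N1": False, "N2": False, "N3": False, "N4": False, "N5": False}
rates = positive_rates({"P1", "P2", "P3", "N1"}, known)
```

The two rates returned for each program are the quantities that enter the consensus calculation as the likelihoods of a positive prediction under the two hypotheses.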

Table 1 Web services used to predict plastid (cTP) or mitochondrial (mTP) targeting sequences of plant transcription factors

To combine the different methods, it was assumed that their predictions are independent of each other. This naïve assumption allowed us to compute the joint likelihood of several predictions under each hypothesis simply as the product of the individual likelihoods. The ratio of the posterior probabilities of both hypotheses was computed by

$$ \frac{p(c \mid a_{1}, a_{2}, \ldots, a_{n})}{p(\bar{c} \mid a_{1}, a_{2}, \ldots, a_{n})} = \frac{p(c) \prod\limits_{i = 1}^{n} p(a_{i} \mid c)^{w_{i}}}{p(\bar{c}) \prod\limits_{i = 1}^{n} p(a_{i} \mid \bar{c})^{w_{i}}} $$

where \(c\) is the location of a protein in the chloroplast or mitochondrion, respectively (the negation of \(c\) is written \({\bar{c}}\)), and \(a_{1}\) to \(a_{n}\) are the individual positive predictions. Based on predictions for the whole genomes of A. thaliana and O. sativa (data not shown), the chloroplast-targeted and mitochondrion-targeted proteins were estimated to constitute 15% and 12% of all open reading frames. \(p(c)\) for chloroplast-targeting was set, accordingly, to 0.15 and \(p(c)\) for mitochondrion-targeting to 0.12. The weight \(w_{i}\) is given by the score value of the corresponding prediction program and was normalized to a value between 0 and 1. Programs without scoring (IPsort, WoLF-PSort) can be viewed as a special case of weighting where weights are restricted to either 0 or 1. The base-2 logarithm of the ratio that resulted from this calculation was used as the consensus score value.
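The consensus score defined above can be written as a short Python sketch. The function name and the example likelihoods and weights are hypothetical and serve only as illustration; the sketch assumes that the prior of the negated hypothesis is 1 − p(c) and, as in the formula, that only positive predictions contribute a factor.

```python
import math


def consensus_score(prior, predictions):
    """Return the base-2 logarithm of the weighted posterior odds.

    prior: p(c), e.g. 0.15 for chloroplast or 0.12 for mitochondrion.
    predictions: iterable of (p_pos_given_c, p_pos_given_not_c, weight),
        one tuple per program that returned a positive prediction.
    """
    # Prior odds p(c) / p(not c), with p(not c) assumed to be 1 - p(c).
    log_odds = math.log2(prior / (1.0 - prior))
    for p_c, p_not_c, weight in predictions:
        # A likelihood ratio raised to the power w_i becomes w_i * log2(ratio).
        log_odds += weight * math.log2(p_c / p_not_c)
    return log_odds


# Hypothetical likelihoods and weights, for illustration only.
example = [
    (0.80, 0.05, 0.9),  # a scoring program with normalized score 0.9
    (0.70, 0.10, 1.0),  # a program without scoring: weight is 0 or 1
]
score = consensus_score(0.15, example)
```

Working in log space turns the product of weighted likelihood ratios into a sum, which avoids numerical underflow when many programs are combined.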

Evaluation of the consensus prediction method

To show an improvement of this consensus method over each of the individual methods that contribute to it, the specificities of all methods were compared by applying them to the plant protein test sets described earlier (suppl. files 1–3). The specificity (computed as 1 − false positives/all negatives) depends on the score value threshold (above which the prediction is positive) chosen for an individual prediction program. In general, a higher threshold generates a higher specificity but sacrifices sensitivity (computed as true positives/all positives). Therefore, the comparison of the specificities was based on a common reference sensitivity value. The specificity was evaluated after adjusting the score threshold of each method to a value that results in a reference sensitivity of 0.7. This reference sensitivity was used for all further calculations.
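The comparison at a common reference sensitivity can be sketched as below. This is an illustrative sketch with a hypothetical function name and made-up scores; it assumes that a prediction counts as positive when the score is greater than or equal to the threshold and that the highest threshold reaching the reference sensitivity is chosen.

```python
def specificity_at_sensitivity(pos_scores, neg_scores, reference=0.7):
    """Find the highest threshold whose sensitivity reaches `reference`
    and return (threshold, sensitivity, specificity)."""
    # Candidate thresholds are the observed positive scores, highest first.
    for threshold in sorted(set(pos_scores), reverse=True):
        true_pos = sum(s >= threshold for s in pos_scores)
        sensitivity = true_pos / len(pos_scores)  # true positives / all positives
        if sensitivity >= reference:
            false_pos = sum(s >= threshold for s in neg_scores)
            specificity = 1 - false_pos / len(neg_scores)  # 1 - FP / all negatives
            return threshold, sensitivity, specificity
    raise ValueError("reference sensitivity cannot be reached")


# Made-up scores: ten known organellar proteins and four known non-organellar ones.
result = specificity_at_sensitivity([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], [7, 3, 2, 1])
```

Fixing the sensitivity first makes the specificities of programs with different score scales directly comparable.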

Sequence alignments of orthologous proteins from different plant species and reconstruction of phylogenetic trees

Protein and translated EST databases were examined for sequences homologous to Arabidopsis transcription factors using the blastp and tblastn tools of the BLAST program (Altschul et al. 1990). The sequences were aligned using the Clustal X program (Thompson et al. 1997). The sequence alignments were subsequently inspected and edited by hand as recommended by Harrison and Langdale (2006) using the graphical multiple sequence alignment editor (BioEdit v.7.0.5.3) in order to obtain optimal alignment and eliminate gap-rich stretches. Nuclear localization sequences were identified with the programs PredictNLS (Cokol et al. 2000) and PSORT (Nakai and Horton 1999). Unrooted trees were prepared by the neighbor joining method (Saitou and Nei 1987) using Clustal X (v1.81) and TreeView (v1.5.2) with 1,000 replicates performed for obtaining bootstrap confidence values. The measure for the distances between sequences was percent divergence.
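As an illustration of the distance measure, percent divergence between two aligned sequences can be sketched as follows. This is not the Clustal X implementation: the function name is hypothetical, and skipping every column that contains a gap in either sequence is an assumption made for this sketch.

```python
def percent_divergence(seq_a, seq_b, gap="-"):
    """Percentage of differing residues among ungapped aligned columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: columns with a gap are ignored
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


# Two made-up aligned fragments: five ungapped columns, one mismatch.
distance = percent_divergence("MKV-LA", "MRVGLA")
```

A matrix of such pairwise values is the kind of input that the neighbor joining method uses to build the tree.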

Localization of an At2g44940-GFP-fusion protein

The entire cDNA sequence and the sequence corresponding only to the plastid target peptide, respectively, were amplified by PCR using isolated cDNA from Arabidopsis. The PCR products were subsequently cloned, sequenced and then inserted in-frame in front of the gfp coding sequence using the binary gateway vector pBatTL-B-GFP2 that contains a double 35S promoter. Protoplasts from Arabidopsis thaliana were produced from Arabidopsis light-grown suspension culture cells according to the protocol of Negrutiu et al. (1987). The recombinant plasmids with the GFP fusion constructs were introduced into the protoplasts using PEG-mediated transformation (Negrutiu et al. 1987). Transiently transformed cells were analyzed for GFP fluorescence using a fluorescence microscope.

Results

Validation of the screening method

For most of the annotated plant transcription factors no experimental data concerning their subcellular localization are available. Analyses of these proteins are complicated by the fact that they are often present in trace amounts only. Sensitive methods like mass spectrometric analysis of compartmental proteomes are prone to artifacts because of the danger of cross-contamination from other cell compartments. Optical in vivo techniques based on the fusion with fluorescent proteins such as GFP or immunological methods are more reliable but are only available for a few selected proteins. For the task of identifying potential candidates that are targeted to one of the organelles, a method for predicting subcellular localization is needed that picks up as many true positives for a given compartment as possible while keeping the number of false positives as low as possible. Wagner and Pfannschmidt (2006) have recently listed 48 putatively plastid-targeted transcription factors from Arabidopsis based on the prediction with the program TargetP (Nielsen et al. 1997). In contrast, we have chosen an approach where the results of several prediction programs were combined into a consensus prediction using a naïve Bayes method (see Materials and methods). In order to compare the performance of the consensus prediction to those of the individual prediction programs that contribute to it, the specificities were calculated using sets of organellar test proteins consisting of >500 proteins from Arabidopsis and other species. As a control, a test set of >600 proteins of confirmed non-organellar localization was used. We found that for both plastid and mitochondrial proteins, the consensus prediction method showed a higher specificity at a reference sensitivity of 0.7 than the single predictions which contribute to the consensus (Table 1). The vast majority of the organellar test set proteins achieved consensus score values of 10 and above (up to 21) (data not shown). When used on the experimental sets of DNA-binding SET domain proteins (Springer et al. 2003) (Table 2) and transcription factors (Tables 3, 4, 5, 6), we found again that those proteins with a confirmed localization (ATXR5, AtWhy1-3) had values above 10. Our algorithm predicted high scores of 19.2 (AtWhy1), 17.1 (AtWhy3), 16.3 (ATXR5) and 10.6 (AtWhy2) for these proteins, respectively (Tables 2, 3, 4). Two more SET domain proteins also received high scores for plastids (At1g26760) and mitochondria (At5g06620) (Table 2), whereas the remaining 34 SET domain proteins were not indicated as being organelle-targeted by the prediction method. This is consistent with their confirmed (At1g02580, Choi et al. 2004) or presumed location according to the SUBA proteomic database (Heazlewood et al. 2005). Based on these results we decided to use 10 as the cutoff value. Below this value the risk of contamination by false positives was observed to increase.
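To give a feeling for what a cutoff of 10 means, the consensus score can be converted back into odds, since it is a base-2 logarithm of an odds ratio. The sketch below uses hypothetical function and variable names and assumes that a score equal to the cutoff counts as passing. Reading the result as a probability is only valid under the naïve independence assumption of the method, and the score weighting means it should be taken as a rough guide rather than a calibrated posterior.

```python
def posterior_from_score(score):
    """Convert a log2 odds score into the corresponding probability."""
    odds = 2.0 ** score
    return odds / (1.0 + odds)


CUTOFF = 10  # cutoff value chosen in the text

# Consensus scores reported in the text for proteins of confirmed localization.
reported = {"AtWhy1": 19.2, "AtWhy3": 17.1, "ATXR5": 16.3, "AtWhy2": 10.6}
passing = sorted(name for name, s in reported.items() if s >= CUTOFF)

# A score of 10 corresponds to odds of 2**10 = 1024 to 1.
at_cutoff = posterior_from_score(CUTOFF)
```

All four proteins with confirmed localization pass the cutoff, in line with the text.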

Table 2 Mitochondrial and plastid consensus scores for SET domain proteins
Table 3 Characteristics of Arabidopsis thaliana transcription factors with putative plastid localization sequences
Table 4 Characteristics of Arabidopsis thaliana transcription factors with putative mitochondrial localization sequences
Table 5 Characteristics of Oryza sativa transcription factors with putative plastid localization sequences
Table 6 Characteristics of Oryza sativa transcription factors with putative mitochondrial localization sequences

Identification of putative plastid and mitochondrial transcription factors

The Arabidopsis transcription factor database currently lists 1,747 different proteins from 50 transcription factor families. A similar list containing currently 2,309 different loci grouped in 53 transcription factor families was compiled for rice by the Rice Transcription Factor Database. The protein sequences from these lists were subjected to a search for targeting sequences to plastids and mitochondria.

Among the Arabidopsis transcription factors, we identified 78 proteins that possess putative plastid targeting sequences (cTPs) and 12 proteins with a putative mitochondrial presequence. Fifty-one of the proteins with a cTP possess an additional sequence (NLS) that can target the protein to the nucleus, while 27 proteins lack such a sequence (Fig. 1). Of the 12 putative mitochondrial proteins, 7 possess no additional targeting sequences while 5 contain an NLS (Fig. 1). Most of the proteins without known nuclear localization sequences have a molecular weight below 40 kDa and might thus not necessarily need an NLS for nuclear import. In rice, 80 proteins with a cTP and 23 proteins with a mitochondrial presequence possess an NLS. Furthermore, 40 proteins exclusively possess a cTP while 15 proteins have only a mitochondrial presequence (Fig. 1). In Tables 3, 4, 5, and 6 these proteins are listed according to their affiliation with the different transcription factor families.
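The counts given in this paragraph can be tallied in a few lines. The sketch assumes that the cTP and mTP groups do not overlap, which the text does not state explicitly; the dictionary layout and names are hypothetical.

```python
# Counts taken from the paragraph above; the grouping labels are ours.
counts = {
    "Arabidopsis": {"cTP+NLS": 51, "cTP only": 27, "mTP+NLS": 5, "mTP only": 7},
    "rice": {"cTP+NLS": 80, "cTP only": 40, "mTP+NLS": 23, "mTP only": 15},
}

# Total number of candidates per species (assuming disjoint groups).
totals = {species: sum(groups.values()) for species, groups in counts.items()}

# Candidates that also carry a nuclear localization sequence.
with_nls = {species: groups["cTP+NLS"] + groups["mTP+NLS"]
            for species, groups in counts.items()}
```

Under this assumption the totals are 90 candidates in Arabidopsis and 158 in rice, of which 56 and 103, respectively, also carry an NLS.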

Fig. 1

Venn diagram of Arabidopsis (a) and rice (b) transcription factors possessing targeting sequences. The number of proteins with plastid (cTP), mitochondrial (mTP) and nuclear (NLS) localization sequences and combinations thereof are depicted

Of the 50 Arabidopsis transcription factor families and the 53 transcription factor families of rice, 23 and 33, respectively, possess members with putative organellar presequences. These include large families with numerous members such as the C2H2 and C3H zinc finger domain protein families or the AP2/EREBP proteins. On the other hand, small protein families like the GeBP or Whirly transcription factor families are also included (Tables 3, 4, 5, and 6).

Apart from the three Whirly proteins of Arabidopsis (Krause et al. 2005, see Introduction), only three proteins from the list of identified proteins (At1g47870 alias E2FC, the GeBP protein At4g00270 and the GRAS protein At3g54220 alias Scarecrow) were so far analyzed for their subcellular localization using fluorescence-based techniques (proteins marked with asterisks in Tables 3, 4). All three were reported to be in the nucleus (Curaba et al. 2003; Heidstra et al. 2004; Koroleva et al. 2005). However, in the case of the YFP-At4g00270 fusion, the confocal images showed more than one fluorescent spot per cell. These spots were not seen with a nuclear control construct (Curaba et al. 2003) and can thus not be assigned to a specific compartment. A dual localization of this protein was, therefore, not refuted. Three transcription factors were identified by different mass spectrometric approaches but no confirmation of these by other methods exists. Only one (At4g00870) was identified as a nuclear protein (Bae et al. 2003), whereas the other two (At5g27070, At5g38560) were detected in a plasma membrane fraction (Nuhse et al. 2003).

Phylogenetic relationship of putative plastid and mitochondrial proteins of the AP2/EREBP family

The AP2/EREBP protein family is among the families with the most putative plastid or mitochondrial targeting sequences (see Tables 3, 4, 5, 6). This protein family is defined by the AP2/EREBP domain which consists of 60–70 amino acids and is involved in DNA binding (Weigel 1995). Based on the number of AP2/EREBP domains and other conserved motifs, the AP2/EREBP transcription factor family is divided into four subfamilies, the ERF subfamily, the APETALA2 (AP2) subfamily, the RAV subfamily and the DREB subfamily (Sakuma et al. 2002). ERF and DREB subfamilies are both characterized by the possession of a single AP2/ERF domain and are thus often regarded as one protein family (Nakano et al. 2006; Shigyo et al. 2006).

To analyze the phylogenetic position of the putative organellar proteins among the AP2/EREBP proteins, we constructed a phylogenetic tree with all 149 AP2 domain-containing proteins of Arabidopsis (not shown). Of the twelve putative plastid proteins, nine were identified as members of the DREB subfamily (Table 7). DREB proteins are reportedly involved in drought and low temperature stress responses in plant cells (Hao et al. 2002; Sakuma et al. 2002). Two of the other putative plastid proteins are members of the AP2 subfamily and a third one belongs to the RAV subfamily, whereas both putative mitochondrial proteins belong to the ERF subfamily (Table 7).

Table 7 Distribution of putative organellar proteins among the AP2/EREBP transcription factor family of Arabidopsis

For phylogenetic comparison of the individual putative organellar AP2 proteins from Arabidopsis and rice, a phylogenetic tree was constructed using only the sequences of the putative organellar proteins from both species (Fig. 2). The proteins were designated using the nomenclature defined by Nakano et al. (2006), where DREB proteins are represented by ERF groups I to IV and ERF proteins sensu stricto are represented by groups V to X. The phylogenetic tree showed that most Arabidopsis genes have one or more closely related orthologues in rice, the only exceptions being the four Arabidopsis proteins belonging to group II of the ERF proteins (Fig. 2). No Arabidopsis orthologues could be found for any of the rice proteins belonging to groups XI to XIV, which is consistent with previous observations (Nakano et al. 2006).

Fig. 2

Phylogenetic tree of AP2/EREBP proteins with putative mitochondrial and plastid targeting sequences. Arabidopsis and rice sequences were obtained from the public databases (see Materials and methods). Full length amino acid sequences were aligned using the programs Clustal X and BioEdit. The resulting alignment was used to construct a neighbor joining tree (Saitou and Nei 1987) with the program TreeView. Numbers at the nodes represent bootstrap values in percentage based on 1,000 repeats. Only nodes with bootstrap values above 40 are labeled. The scale bar represents the number of substitutions per site. Proteins with a high chloroplast score (filled triangle) and proteins with high mitochondrial score (open circle) are designated. Classification of proteins into subfamilies as defined by Nakano et al. (2006) is indicated

The existence of homologous pairs or groups of putative organellar proteins in Arabidopsis and rice prompted us to search for related proteins in other species. For the AP2 protein from Arabidopsis that gained the highest chloroplast score and that is encoded by the gene locus At2g44940, several homologous proteins from both dicotyledonous and monocotyledonous species could be identified. These include a protein from maize (ZmDBF2), one from Triticum monococcum (TmCbf7), one from barley (HvCbf7), a protein from Medicago truncatula (MtERF) and one from potato that was deduced from the fused amino acid sequences of two overlapping EST sequences (StPPCBR81) (Fig. 3). A similar number of homologues was found for the gene product of At5g11190 that is putatively targeted to mitochondria (Fig. 3). Table 8 shows that all proteins from these species are strongly predicted to be targeted to either the plastids or the mitochondria.

Fig. 3

Phylogenetic relationship of homologues of At2g44940 and At5g11190 gene products from different monocotyledonous and dicotyledonous plant species. Sequences from other plant species were obtained through BLAST searches. The alignment of full length amino acid sequences and construction of the neighbor joining tree was done as described in Fig. 2. Numbers at the nodes represent bootstrap values in percentage based on 1,000 repeats. The scale bar represents the number of substitutions per site

Table 8 Localization predictions for homologues of the At2g44940 and At5g11190 gene products

An alignment of six sequences homologous to the At2g44940 gene product revealed a high sequence identity within the AP2 domain and, beyond that, the existence of further domains that are highly conserved (Fig. 4). AP2 domains are characterized by several well-conserved amino acids that constitute a putative amphipathic α-helix and are generally divided into a DNA-binding and an oligomerization domain. These domains can be either adjacent to each other or separated by a few amino acids (Riechmann and Meyerowitz 1998; Liu et al. 1999). In the present case, the two parts of the AP2 domain are separated by a stretch of basic amino acids that constitute the nuclear localization sequence (Fig. 4). The N terminus of each protein, although considerably variable, is extremely rich in hydroxylated amino acids and in alanine, leucine and arginine and thus shows the classical features of chloroplast-targeting sequences (Bruce 2000). Taken together, these findings indicate that this group of proteins evolved before the monocotyledonous and dicotyledonous plants diverged.

Fig. 4

Alignment of amino acid sequences of AP2 domain containing homologues of At2g44940. Amino acids that are identical in at least 5 out of 6 sequences are shown in white against a black background. The chloroplast target peptide (cTP) is depicted in italic letters and ends at the cleavage site that is marked by a downward arrow. The DNA-binding and oligomerization domains (DNA-BD, OD) of the AP2 motif and the nuclear localization sequence (NLS) are indicated by contiguous lines above the sequence. Other conserved domains are framed and designated I to III

Cellular localization of At2g44940

For the AP2 protein encoded by the At2g44940 gene, GFP fusion constructs of the entire gene product or the putative plastid target peptide sequence were used to examine the localization of this protein. Transient expression of these fusion proteins in protoplasts from a light-grown mesophyll cell suspension culture from Arabidopsis thaliana showed that the GFP fused to the entire At2g44940 gene product is indeed targeted to both compartments (Fig. 5a). The dual localization confirmed that both of the targeting signals, i.e. the N-terminal plastid target peptide and the NLS were correctly predicted. However, we observed that most of the recombinant protein was located inside the nucleus, whereas the chloroplasts showed only weak fluorescence. We therefore fused only the putative plastid target peptide to GFP and transformed protoplasts with this construct. As expected, the GFP fluorescence coincided only with the chlorophyll autofluorescence of the chloroplasts and no nuclear signal was observed (Fig. 5b).

Fig. 5

Subcellular localization of At2g44940 gene products fused to GFP in Arabidopsis protoplasts. Fluorescent microscope images of GFP fluorescence and chlorophyll autofluorescence are shown in the left and middle images, respectively. The third column on the right depicts the merged images. a Two individual protoplasts that express the entire At2g44940 protein fused to GFP are shown. b One protoplast showing expression of the chloroplast target peptide (cTP At2g44940) fused to GFP is depicted

Discussion

Existence of proteins with sequences targeting them to the nucleus and either plastids or mitochondria

A systematic in silico search for dually targeted DNA-binding proteins from Arabidopsis and rice was performed by integrating the individual predictions of several prediction programs into a consensus prediction. With this approach, we identified approximately 90 transcription factors in Arabidopsis and almost twice as many transcription factors in rice that have a very high probability of possessing targeting sequences for the nucleus and at least one of the other two organelles (Fig. 1; Tables 3, 4, 5, 6). Many of the identified proteins were found to form orthologous groups and possess homologues in other plant species as well (Figs. 2, 3, 4 and data not shown). The same was observed for the SET domain proteins where all putative target-peptide containing proteins belong to the group of trx-related proteins (Table 2 as well as unpublished data).

Xiong et al. (2005) reported in a genome-wide comparative analysis between monocots and eudicots that approximately 50% of Arabidopsis and rice transcription factor genes form orthologous pairs or groups. They argue that the existence of such groups in two or more species hints at conserved functions of the proteins in monocotyledonous and dicotyledonous plants. A potential transit peptide for plastids or mitochondria has been conserved in orthologous proteins of the AP2/EREBP transcription factor family in a number of species (Figs. 2, 3, 4), suggesting that these proteins could indeed have a functional role within these organelles. Of particular interest with respect to this possible role is the fact that AP2 domain-containing proteins were recently discovered in a cyanobacterium, Trichodesmium erythraeum (Magnani et al. 2004; Wessler 2005). One possible interpretation of this observation is that the eukaryotic AP2 domain-containing proteins were derived originally from the cyanobacterial ancestor of plastids. After multiplication, some of them could have retained a function in these organelles while many others were assigned new functions in the other DNA-containing compartments.

It is conspicuous that many putative plastid AP2-proteins belong to ERF groups II and III. These groups are characterized by additional specific C-terminal motifs. ERF group II is further subdivided into three subgroups, IIa, IIb and IIc (Nakano et al. 2006). Four putative dually targeted Arabidopsis proteins belong to the small subgroup IIb consisting of only seven members. All these proteins are characterized by the C-terminal CMII-3 motif. Interestingly, the same motif was also found in several members of the ERF group III, among them three further potentially dually targeted proteins. Whether there is a connection between the possession of this motif and a role inside the plastids cannot be resolved at this stage. Given this striking cluster of CMII-3 motif-containing proteins among the putative plastid-targeted transcription factors, it is surprising that no orthologues of these proteins were found in rice (see Fig. 2). However, two group II rice proteins achieved cTP consensus scores of 8.5 and 7.7, respectively, and therefore failed to reach our cut-off value. It cannot be precluded that these two proteins might represent plastid orthologues of the four Arabidopsis ERF group II members shown in Fig. 2.

So far, the localization of only one AP2/EREBP protein from the DREB subgroup (At2g44940) has been analyzed by fluorescence microscopy. This analysis confirmed the presence and functionality of the predicted dual targeting signals in vivo (Fig. 5). Further experimental evidence will be needed to validate a presumed function of the identified candidates in the organelles. However, in many cases, the existence of paralogues and hence the possibility of functional redundancy could complicate the interpretation of experimental results.

Potential significance of nucleus/plastid and nucleus/mitochondria dually targeted proteins

A communication between the DNA-containing compartments is essential for plant cells since most organellar enzyme complexes are composed partly of nuclear-encoded subunits and partly of organelle-encoded subunits. This communication is characterized, for example, by nuclear control over plastid gene expression and a retrograde control of nuclear genes by a plastid signal. These mechanisms were summarized in a number of recent reviews (Richly et al. 2003; Strand 2004; Beck 2005).

Transcription factors that are dually targeted might play a key role in the coordinated regulation of nuclear and organellar genes in this context. Two ways are conceivable by which the transcription factors could coordinate gene expression in the different compartments. Both are realized in yeast or animal cells. The first possibility implies that a protein would accumulate in both compartments simultaneously, either in the same cell type or under a similar developmental context. An example from yeast is the Rpm2 protein (Stribinskis et al. 2005). Such proteins can directly influence and co-regulate the expression of nuclear-encoded as well as organelle-encoded organellar proteins. The second possibility involves a development- or environment-induced retargeting of proteins, as is evidently the case with the apoptosis-inducing factor (AIF) of yeast and mammalian cells. AIF is released from the mitochondria when these are disrupted during programmed cell death and is imported into the nucleus, where it fulfils an important role in the coordinated degradation of nuclear DNA (Susin et al. 1999; Cregan et al. 2002; Ruchalski et al. 2006). Other well-studied examples of an influence of environmental factors on the localization of plant proteins are the phytochromes A, B, C, D and E, whose nucleocytoplasmic partitioning is regulated by a diurnal rhythm and by light conditions (Kircher et al. 2002; Chen et al. 2005), or phototropin 1, which moves from the plasma membrane to the cytosol in response to blue light (Sakamoto and Briggs 2002).

So far, we can only speculate on whether a scenario similar to the ones mentioned also applies to plant transcription factors, since experimental data on nucleus/plastid- and nucleus/mitochondria-targeted plant proteins are extremely scarce. An interesting example is provided, however, by the dually targeted plant protein SEBF. This protein possesses a functional plastid target peptide and an RNA-binding domain reminiscent of that of heterogenous nuclear ribonucleoproteins (hnRNPs) (Boyle and Brisson 2001). The processed mature form of the protein was detected in the chloroplasts and, surprisingly, also in the nucleus, whereas the unprocessed form did not occur there (Boyle and Brisson 2001). Since no indication for a differential splicing was obtained, this raises the question whether the precursor was processed outside the chloroplast or whether the imported mature plastid protein was re-targeted to the nucleus. In line with such speculations, observations regarding the physical interaction of plastids as well as mitochondria with the nuclear envelope gain importance. Plastids seem to be attracted to the nucleus under certain circumstances and can interact with the nuclear envelope through stroma-filled tubular extensions termed stromules (Kwok and Hanson 2004). A clustering of plastids around the nucleus was, surprisingly, also seen in Arabidopsis protoplasts expressing the At2g44940 fusion protein (Fig. 5). The reason for this is unclear. A similar behavior was recently reported for mitochondria that seem to accumulate close to the nuclear envelope in leaf mesophyll cells undergoing programmed cell death (Selga et al. 2005).

In contrast, disintegration of chloroplast envelope membranes and vesicle ‘blebbing’ have recently been brought up as possible fates of ageing chloroplasts in senescing plant cells (Krupinska 2005). According to this scenario, plastid proteins might be released to the cytosol under these conditions. From there they could be imported into the nucleus, as is the case for some mitochondrial proteins in animal cells undergoing apoptotic cell death (e.g. AIF, see previous). A conditional re-targeting of organellar proteins could represent a novel mechanism of communication between the nucleus and the organelles, especially in situations such as pathogen attack, abiotic stresses or senescence, and would add a new dimension to our knowledge on the complex network of intercompartmental crosstalk. Indeed, a number of the proteins identified by our screen belong to families such as the DREB proteins whose association with stress responses is known. These proteins would thus be candidates for such a regulatory role.

In summary, our survey demonstrates the likely existence of more than the currently known proteins with nuclear as well as plastid or mitochondrial localization. Many of these factors belong to families that respond to external or internal stress stimuli and play a role in stress response reactions. Whether these putative dually targeted proteins are indeed part of the interorganellar communication network in plant cells and are able to affect the gene expression in two or more compartments and thereby contribute to stress response reactions will certainly be revealed in the future by a closer characterization of these proteins.