Introduction

To carry out experiments with transgenic plants, researchers commonly need to design specific genetic constructs that provide a certain expression pattern of the transgene(s). The most important element of any genetic construct is a promoter providing an appropriate transcription pattern. The requirements for promoter activity can vary considerably: constitutive transcription in all tissues is required in some cases, whereas tissue-specific, stage-specific, or inducible transcription is required in others. However, the set of commonly used promoters is very small, and these promoters provide only a few variants of the transgene transcription pattern. This limitation evidently hampers experiments with transgenic plants (Furtado et al. 2008; Qu le et al. 2008).

Information on promoters of many plant genes is available in the literature. The control of gene expression is frequently studied with the help of a reporter construct in which the promoter DNA segment is located upstream of a reporter gene. Analysis of the reporter protein activity in transgenic plants allows the functional characteristics of the studied promoter to be assessed. The promoters described can, in principle, provide a wide choice of appropriate transcription patterns. For instance, if a foreign gene must be expressed at a high level but in a tissue-specific manner, a dozen variants could be considered. For example, the tomato Lat52 promoter segment provides a high level of GUS expression in tobacco pollen (Bate and Twell 1998); rice prolamin and glutenin promoter segments increase GUS gene activity in transgenic rice endosperm four- to sixfold as compared with the maize ubiquitin promoter (Qu and Takaiwa 2004); an oat globulin promoter segment directs strong endosperm-specific GUS expression in barley seeds (up to 10% of soluble protein; Vickers et al. 2006); and the sweet potato ADP-glucose pyrophosphorylase promoter provides high-level expression of the GUS reporter gene in Solanum tuberosum tubers (Kim et al. 2009). In some cases, it has been demonstrated that such tissue-restricted expression is more beneficial than expression from the typical constitutive promoters (e.g., cauliflower mosaic virus 35S RNA or maize ubiquitin promoters; Qu and Takaiwa 2004; Tiwari et al. 2006; Eskelin et al. 2009). Development of new methods for increasing plant tolerance to various stresses also requires promoters with a specific transcription pattern, for example, RD29A, COR15A, and DREB1 (Yamaguchi-Shinozaki and Shinozaki 1993; Baker et al. 1994; Yang et al. 2011).
In general, a toolbox of promoters with known specificities would be a valuable resource to control the expression of transgenes in an appropriate manner for both plant improvement and molecular farming (Furtado et al. 2008; Qu le et al. 2008). The usage of promoters with less sequence homology but similar specificities will also be crucial in avoiding homology-based gene silencing when expressing more than one transgene in the same tissue (Furtado et al. 2009).

Experiments with transgenic constructs are frequently used to clarify the structure–function organization of promoters. For this purpose, several types of experiments are commonly used, namely, deletion analysis, point mutagenesis, and detection of cis-acting transcription factor (TF) binding sites. Despite an ultimate goal of such research being a full reconstruction of promoter organization (i.e., detection of the full set of TF binding sites and combinatorial pattern of their activities), the corresponding papers frequently also contain other valuable data. While characterizing promoters, researchers usually create a set of genetic constructs carrying the genomic DNA segments from the studied promoter region, located upstream of the reporter gene coding sequence (CDS). The constructs are expressed in transgenic plants, where the pattern of reporter protein synthesis (commonly, GFP or GUS) reflects the promoter characteristics. These data can further be used to detect TF binding sites and model the promoter structure and the mechanisms involved in transcription control. However, such information is evidently useful per se, since the genomic DNA segments with appropriate patterns of transcriptional activity represent potential promoters for plant transgenesis. These data are particularly interesting since different deletion variants can contain various combinations of TF binding sites and provide transcriptional patterns other than the full promoter regions. In our opinion, a specialized database on specific transcriptional activities of plant genomic DNA segments could be a valuable source of new candidate promoters for transgenic experiments.

A number of databases containing information about promoter sequences and TF binding sites have been developed. However, most of them only accumulate data on promoter organization and cis-acting regulatory elements. A short list of the available web resources follows. The PLACE database (Higo et al. 1999) stores the consensus binding sequences for plant-specific transcription factors. Three interlinked databases (AtTFDB, AtcisDB, and AtRegNet) in the Arabidopsis Gene Regulatory Information Server (AGRIS) furnish comprehensive and updated information on the TFs, predicted and experimentally verified cis-regulatory elements, and their interactions (Yilmaz et al. 2011). The PlantProm database (PPDB) contains the nucleotide sequences of plant promoters with experimentally verified transcription start sites (Shahmuradov et al. 2003). PPDB contains information on Arabidopsis and rice promoter structures, and the transcription start sites predicted from full-length cDNA clones and TSS tag data. The core promoter structure, presence of cis-acting regulatory elements, and distribution of transcription start site clusters can also be viewed (Yamamoto and Obokata 2008). Athena and Osiris are web resources for rapid visualization and systematic analysis of Arabidopsis (O’Connor et al. 2005) and rice (Morris et al. 2008) promoter sequences. Athena contains up to 3 kb of the promoter sequences for predicted Arabidopsis genes and the consensus sequences for 105 previously characterized TF binding sites imported from PLACE and AGRIS. Osiris contains the promoter sequences, predicted TF binding sites, gene ontology annotations, and microarray expression data for 24,209 genes of the rice genome. The AthaMap database provides a genome-wide map of potential TF binding sites in Arabidopsis thaliana and contains the sites for 115 different TFs (Bülow et al. 2010). PlantPAN is the Plant Promoter Analysis Navigator for recognizing combinatorial cis-regulatory elements with a distance constraint in sets of plant genes (Chang et al. 2008). Thus, the available web resources mostly provide data on transcription factor binding sites and lack information on experimentally verified promoter activities of plant DNA segments. In principle, some data (e.g., the presence of specific cis-acting regulatory elements) can be used to select candidate promoters. However, such predictions are frequently ambiguous, as demonstrated by the following two examples: (1) although the GluC promoter contains no endosperm-specific motifs (GCN4, AACA, or prolamin box), it directs high-level transcription in this tissue (Qu le et al. 2008), and (2) the peach gene Pptha1 was detected by its cold-inducibility, but its promoter region failed to determine such an expression pattern in A. thaliana (Tittarelli et al. 2009). In general, information on DNA segments with certain types of transcriptional activity in transgenic experiments appears to be the most reliable source of potential promoters (Potenza et al. 2004; Jones and Sparks 2009; Peremarti et al. 2010; Xiao et al. 2010). Thus, we developed a specialized database (TransGene Promoters, TGP) compiling information annotated from the literature. The TGP provides data on candidate promoters with experimentally verified transcriptional patterns in transgenic plants of different species.

TGP database description

Although the set of very well-characterized promoters used in plant gene engineering is rather small, published experimental investigations of many plant genes have frequently provided some details on the transcriptional activities of DNA segments. For example, the study of a promoter region commonly includes deletion analysis, in which DNA segments of different lengths are placed upstream of the reporter gene and their expression characteristics are tested in experiments with transgenic plants. This information may be found in the literature and further used for the selection of a promoter with potentially appropriate properties. Currently, this information cannot be retrieved automatically and has not yet been accumulated in a database format.

The database was constructed on the SRS (Sequence Retrieval System) platform and contains three cross-linked sections: TGP_PROMOTER, TGP_SEQUENCE, and TGP_GENE. Typically, the study of the promoter region of a target gene involves experimental analysis of several deletion variants (i.e., genomic DNA segments of different lengths). If some of these DNA segments demonstrate certain promoter activities in experiments with reporter constructs in transgenic plants, they are selected for annotation in the TGP_PROMOTER section, while the data on their nucleotide sequences and the corresponding gene are annotated in the TGP_SEQUENCE and TGP_GENE sections, respectively.
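The cross-linking of the three sections can be pictured with a minimal Python sketch. This is not the SRS implementation; the class names, field names, the gene identifier "Ps:TOP2", and the placeholder sequences are assumptions made for illustration only. Only the promoter identifiers Ps:TOP2_P1 and Ps:TOP2_P2, the pea source species, and the salicylic acid regulator are taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GeneEntry:
    """Stand-in for a TGP_GENE entry (field names are hypothetical)."""
    gene_id: str
    species: str
    description: str


@dataclass
class SequenceEntry:
    """Stand-in for a TGP_SEQUENCE entry."""
    promoter_id: str
    sequence: str


@dataclass
class PromoterEntry:
    """Stand-in for a TGP_PROMOTER entry; it carries the cross-links."""
    promoter_id: str
    gene_id: str
    regulators: List[str]


def resolve_links(
    promoter: PromoterEntry,
    sequences: Dict[str, SequenceEntry],
    genes: Dict[str, GeneEntry],
) -> Tuple[SequenceEntry, GeneEntry]:
    """Follow the cross-links from a promoter entry to its sequence and gene."""
    return sequences[promoter.promoter_id], genes[promoter.gene_id]


# One gene annotated with two deletion variants of its promoter.
# "Ps:TOP2" and the "N" sequences are placeholders, not database content.
genes = {"Ps:TOP2": GeneEntry("Ps:TOP2", "pea", "placeholder description")}
sequences = {
    "Ps:TOP2_P1": SequenceEntry("Ps:TOP2_P1", "N" * 608),
    "Ps:TOP2_P2": SequenceEntry("Ps:TOP2_P2", "N" * 468),
}
promoters = [
    PromoterEntry("Ps:TOP2_P1", "Ps:TOP2", ["salicylic acid"]),
    PromoterEntry("Ps:TOP2_P2", "Ps:TOP2", ["salicylic acid"]),
]

for entry in promoters:
    seq, gene = resolve_links(entry, sequences, genes)
    print(entry.promoter_id, "->", gene.gene_id, gene.species, len(seq.sequence))
```

The point of the sketch is that several promoter entries (deletion variants) may point to one gene entry, while each promoter entry has its own sequence entry.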

The TGP_PROMOTER section (Table 1) contains information about the promoter size, its positions in the corresponding GenBank entry, and the position of the transcription start site (TSS) or translation initiation site (TIS), as well as a general description of the promoter sequence used in the transgene construct (fields LOCALIZATION and DESCRIPTION). [Example: the regulatory region upstream of the coding part of the reporter gene includes a promoter fragment (387 bp upstream of the transcription start site), 89 bp of the 5′UTR, and 68 bp of the coding sequence of the potato Ci21A gene.] This information is important for designing transgene constructs. The entry also contains data on the plant species used in the experiments, the reporter gene, and the factors influencing transcription. The field TARGET SPECIES provides a list of the plant species in which the promoters were tested in transgenic experiments. For example, the P1 promoter of the Arabidopsis SAG12 gene was evaluated in nine species. The field COMMENT gives a summary of the experimental data potentially useful for evaluating promoter specificity and expression pattern. The nucleotide sequence of the described promoter can be retrieved from the corresponding GenBank entry according to the positions indicated in the LOCALIZATION field. Nonetheless, for the sake of convenience, we additionally compiled the promoter nucleotide sequences, indicating the promoter positions relative to the TSS or TIS (marked as +1), in the TGP_SEQUENCE section (Table 1). Finally, TGP_GENE (Table 1) contains the descriptions of the native genes from which the promoter variants were annotated. Each entry contains data on the corresponding protein and species as well as some information about gene activities. All these sections are cross-linked.
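The coordinate bookkeeping described above can be illustrated with a short Python sketch. It assumes 1-based, inclusive GenBank positions and the usual convention that the TSS is +1 and the base immediately upstream is −1 (no position 0). The function names and the example entry, in which the fragment starts at position 1 and the TSS lies at position 388, are hypothetical; only the segment lengths of the Ci21A example come from the text.

```python
def extract_segment(entry_sequence: str, start: int, end: int) -> str:
    """Return the segment between 1-based, inclusive positions start..end."""
    if not 1 <= start <= end <= len(entry_sequence):
        raise ValueError("positions fall outside the entry sequence")
    return entry_sequence[start - 1:end]


def relative_to_tss(position: int, tss_position: int) -> int:
    """Convert a 1-based entry position to a TSS-relative coordinate.

    Assumed convention: the TSS itself is +1, the preceding base is -1,
    and there is no position 0.
    """
    if position >= tss_position:
        return position - tss_position + 1
    return position - tss_position


# Segment lengths from the Ci21A example in the text.
UPSTREAM_BP = 387   # promoter fragment upstream of the TSS
UTR5_BP = 89        # 5'UTR
CDS_BP = 68         # beginning of the coding sequence
construct_length = UPSTREAM_BP + UTR5_BP + CDS_BP

# Hypothetical layout: the fragment occupies positions 1..construct_length,
# so the TSS sits at position UPSTREAM_BP + 1.
tss = UPSTREAM_BP + 1
print(construct_length)                        # total regulatory region, bp
print(relative_to_tss(1, tss))                 # first base of the fragment
print(relative_to_tss(construct_length, tss))  # last base of the fragment
```

Under these assumptions the regulatory region spans 544 bp, running from −387 to +157 relative to the TSS.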

Table 1 Examples of the TGP_PROMOTER, TGP_SEQUENCE and TGP_GENE entries

Currently, the TGP database contains descriptions of 224 promoters, with their nucleotide sequences, corresponding to 114 genes. They belong to 26 plant species (mostly A. thaliana, Oryza sativa, and Nicotiana tabacum). The transgenic experiments were carried out with 28 plant species. The database describes promoters whose expression is sensitive to 37 different exogenous and endogenous stimuli, such as heavy metals, elicitors, hormones, cold, drought, salt, dehydration, infection, light, and senescence. According to the annotated information, these promoters are active in 40 different tissues and cell types, including seeds (51 promoters), roots (47 promoters), and pollen (12 promoters); more information is listed in Table 2.

Table 2 Number of tissue-specific and regulated promoters in the TGP database

How to search in TGP database

The TGP database allows the user to search for candidate promoters whose expression was experimentally studied in particular species. TGP has a user-friendly SRS interface, and a detailed tutorial (how-to-use) is available at the database website. Searches can be combined with the help of different logic operators (AND, OR, etc.). A brief description of a few typical queries is given below. TGP allows for various types of queries, for example:

  • finding promoters working in a particular plant species;

  • finding promoters influenced by a particular regulator;

  • finding promoters working in a particular plant species and influenced by a particular regulator;

  • finding promoters isolated from particular plant species;

  • finding promoters influenced by several different regulators;

  • finding promoters active in certain organ or tissue; and

  • finding the tissue-specific promoters responsive to a particular regulator.

On the home page of the TGP_PROMOTER section, the field names are listed in the left column (Fig. 1). The state “ok” in the right column indicates that the corresponding field is searchable. The column “No of Keys” gives the number of terms in each field; for example, the field “Target species” contains 27 different species and the field “Keywords” contains 94 different terms. To find the promoters working in a particular plant species, the user may click the field “Target species” on the home page of the TGP_PROMOTER table (marked by arrow 1 in Fig. 1). On the next page, clicking the button “List Values” results in a list of transgenic species (currently, 27).

Fig. 1 The home page of the TGP_PROMOTER

For instance, to find the promoters whose activities have been verified in barley, click barley (Hordeum vulgare). This will result in a list of links (ID contains info on the corresponding species; e.g., Hv means Hordeum vulgare or Ta, Triticum aestivum). Check the list of promoters the expression of which has been evaluated in the specified organism.

To browse the full list of regulators, the user may click the field “Regulator” on the home page of the TGP_PROMOTER table (Fig. 1, see above). On the next page, clicking on the button “List Values” will give a current list of regulators of the TGP database. To find the promoters influenced by low temperature, click “cold” to get the list of corresponding entries.

To find the promoters that are active in a particular plant species and are influenced by a particular regulator, the user has to click the button “Search” on the home page of the TGP_PROMOTER table (Fig. 1). This will result in a Standard query form for the TGP_PROMOTER table (Fig. 2).

Fig. 2 Standard SRS query form for the TGP_PROMOTER

From the drop-down menu “combine searches with”, select AND (marked by arrow 1 in Fig. 2). Select the field “Target_species” from the drop-down menu (arrow 2). In the text box, type the species name (e.g., tobacco, arrow 3). Next, select the field “Regulator” from the drop-down menu (arrow 2) and type the corresponding term in the text box (e.g., cold, arrow 3). Click the button “Submit Query” (arrow 4). This will give the list of promoters active in tobacco and influenced by cold.
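The logic of such a combined query can be sketched in Python as a filter over annotated records. This only illustrates how AND and OR combine two field conditions; it is not the SRS query engine, and the record identifiers, the dictionary keys, and the records themselves are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def field_matches(record: Record, field: str, term: str) -> bool:
    """Case-insensitive substring match of a term against a multi-valued field."""
    values = record.get(field, [])
    return any(term.lower() in str(value).lower() for value in values)


def search(
    records: Iterable[Record],
    criteria: List[Tuple[str, str]],
    combine: str = "AND",
) -> List[str]:
    """Return IDs of records satisfying all (AND) or any (OR) of the criteria."""
    if combine not in ("AND", "OR"):
        raise ValueError("combine must be 'AND' or 'OR'")
    operator = all if combine == "AND" else any
    return [
        str(record["id"])
        for record in records
        if operator(field_matches(record, f, t) for f, t in criteria)
    ]


# Hypothetical records, for illustration only (not TGP content).
records = [
    {"id": "X1", "target_species": ["Nicotiana tabacum (tobacco)"],
     "regulator": ["cold", "drought"]},
    {"id": "X2", "target_species": ["Nicotiana tabacum (tobacco)"],
     "regulator": ["light"]},
    {"id": "X3", "target_species": ["Hordeum vulgare (barley)"],
     "regulator": ["cold"]},
]

query = [("target_species", "tobacco"), ("regulator", "cold")]
print(search(records, query, combine="AND"))
print(search(records, query, combine="OR"))
```

With AND, only the record that is both tested in tobacco and influenced by cold is returned; with OR, every record meeting either condition is returned.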

TGP was constructed as a tool for the selection of candidate promoters with appropriate characteristics. Thus, we did not try to make a full annotation of the various experimental data concerning the promoters described. The main idea of TGP was to provide a link between a DNA segment and its ability to direct transcription of a reporter gene (level of expression, specificity, and species of transgenic plants). For this reason, the deletion variants of full-length promoters were also annotated, since they could provide a transcription pattern distinct from that of the full-length variant (and sometimes more interesting for gene engineering). For example, a twofold induction was observed for the −608 bp pea TOP2 promoter (Ps:TOP2_P1) after salicylic acid treatment, and a more than threefold induction was observed for the −468 bp TOP2 promoter (Ps:TOP2_P2) with the same treatment (Hettiarachchi et al. 2005). This provides an opportunity to select the level of induction that is more suitable for solving a particular experimental problem. The experimentally measured activities of promoters of various lengths compiled in TGP are a useful supplement to the data on promoter structure and can be used for basic research in molecular and computational biology. The information on promoters is updated on a regular basis from published scientific papers.
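Choosing between deletion variants by their reported induction can be sketched in Python as follows. The identifier layout (species prefix, colon, gene name, underscore, variant label) is inferred from the two identifiers quoted above and is an assumption, as are the function and field names. The "more than threefold" induction is stored as a lower bound of 3.0, not as a measured value.

```python
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    promoter_id: str
    upstream_bp: int           # length of the segment upstream of +1
    min_fold_induction: float  # lower bound of the reported induction


def parse_promoter_id(promoter_id: str) -> Tuple[str, str, str]:
    """Split an ID such as 'Ps:TOP2_P1' into (species prefix, gene, variant).

    The layout is inferred from the examples in the text, not from a
    documented TGP specification.
    """
    species_prefix, rest = promoter_id.split(":", 1)
    gene, variant = rest.rsplit("_", 1)
    return species_prefix, gene, variant


# Values from the TOP2 example; 3.0 stands for "more than threefold".
VARIANTS = [
    Variant("Ps:TOP2_P1", 608, 2.0),
    Variant("Ps:TOP2_P2", 468, 3.0),
]


def variants_with_induction(variants: List[Variant], threshold: float) -> List[str]:
    """Return IDs of variants whose reported induction reaches the threshold."""
    return [v.promoter_id for v in variants if v.min_fold_induction >= threshold]


print(parse_promoter_id("Ps:TOP2_P1"))
print(variants_with_induction(VARIANTS, 3.0))
```

In this sketch, a request for at least a threefold response to salicylic acid would select the shorter variant, Ps:TOP2_P2, rather than the longer one.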