Introduction

In recent years, a variety of investigations has been conducted aimed at identifying DNA binding proteins and conserved transcription factor binding sites genome-wide in a single species or in evolutionarily related species. The genome of Saccharomyces cerevisiae (e.g., Gasch et al. 2000; Hughes et al. 2000) and genomes of the genus Saccharomyces (e.g., Kellis et al. 2003; Chiang et al. 2003; Cliften et al. 2003; Moses et al. 2003, 2004) were used as convenient models in these studies. Recently, Gasch et al. (2004) have extended the analysis to 10 Hemiascomycete species and four other Ascomycete species.

In 2000, a couple of French laboratories started the Génolevures project (Souciet et al. 2000), with the goal of obtaining genomic information from 13 species of the class Hemiascomycetes, simple fungi the vast majority of which are yeasts. This information could be used to study the level of genetic diversity between these yeast species and the level of protein divergence within them (Malpertuy et al. 2000) as well as the level of synteny conservation between these genomes (Llorente et al. 2000; Fischer et al. 2001). These studies established that the genetic diversity between yeast species is often unexpectedly high. For instance, the average protein divergence of >50% found between Saccharomyces cerevisiae and Yarrowia lipolytica revealed that Hemiascomycetes are molecularly as diverse as the entire phylum of chordates (Makalowski et al. 1998; Dujon 2006).

Meanwhile, 15 more or less complete annotated genome sequences have become available from the class Hemiascomycetes (see below). This information can be used to explore the genomic evolution of regulatory networks in these species, including the genes regulated by specific transcription factors and the cognate cis-regulatory elements interacting with these factors.

In the present study we have focused on the 26S proteasome system, as it meets the requirements for a thorough in silico analysis. The 26S proteasome, responsible for the programmed proteolysis of proteins, has been intensely studied in S. cerevisiae, and comparisons with higher eukaryotic organisms have shown that at least the components of the 20S core particle and the six AAA+ ATP-binding proteins (RPTs) of the 19S cap particle are highly conserved from yeast to mammals (for an overview see Wolf and Hilt 2004). The S. cerevisiae genes for these entities are essential and single-copy throughout. Nearly all of the genes encoding the subunits of the 20S core (except PRE5) and the 19S regulatory particle (except RPN8, RPN10, and RPN13) possess a unique (nondegenerate) upstream nonamer box (GGTGGCAAA) which we called PACE and which was shown to bind to Rpn4p, a transcriptional activator (Mannhaupt et al. 1999). Further studies have elucidated that Rpn4p is a ligand, substrate, and transcriptional regulator of the 26S proteasome and exerts a negative feedback control (e.g., Xie and Varshavsky, 2001; Ju and Xie, 2004; Wang et al. 2004) and that Rpn4p participates in regulatory networks such as DNA damage repair (Jelinsky et al. 2000), stress responses (Owsianik et al. 2002), and filamentous growth (Prinz et al. 2004).

We compared the relevant elements of the proteasome system from Hemiascomycetes using those from S. cerevisiae as a reference genome. Our study revealed that these elements are highly conserved among the Hemiascomycetes, suggesting that similar control mechanisms of the proteasome system are operative among these yeast species. Extending the comparisons to data from S. pombe, three filamentous fungi, A. thaliana, and human, we did not detect true counterparts for Rpn4p or PACE-like elements in the latter species. These observations suggest that the regulation of the proteasome system in species other than the Hemiascomycetes is subject to different, but still unknown, control mechanisms.

Methods

Retrieval and Comparison of Gene Sequences

Orthologues for the 20S core subunits, the 6 RPT subunits, and the 14 RPN subunits of the 19S cap as well as those for Uba1p and Cdc48p were retrieved by searching the MIPS PEDANT databases (http://www.pedant.gsf.de) with the BLAST algorithm (Altschul et al. 1990). Data collections and references for the original genome sequences are as follows: http://www.yeastgenome.org or http://www.mips.gsf.de/genre/proj/yeast/ for S. cerevisiae (Goffeau et al. 1996); http://www.broad.mit.edu/annotation/fungi/comp_yeasts/ for S. paradoxus, S. mikatae, and S. bayanus (Kellis et al. 2003); http://www.genome.wustl.edu/ for S. castellii, S. kluyveri, and S. kudriavzevii (Cliften et al. 2003); http://www.broad.mit.edu/ for K. waltii (Kellis et al. 2004); http://www.cbi.labri.fr/Genolevures/ for C. glabrata, K. lactis, D. hansenii, and Y. lipolytica; http://www.agd.unibas.ch for A. gossypii (Dietrich et al. 2004); http://www.candidagenome.org for C. albicans (Jones et al. 2004) and C. dubliniensis; http://www.genedb.org/genedb/pombe/ for S. pombe (Wood et al. 2002); database N. crassa (Galagan et al. 2003) and http://www.mips.gsf.de/genre/proj/ncrassa/ for N. crassa; http://www.broad.mit.edu/annotation/genome/aspergillus_nidulans/Home.html for A. nidulans; Aspergillus fumigatus genome project http://www.sanger.ac.uk/Projects/A_fumigatus/ for A. fumigatus (Nierman et al. 2005); http://www.mips.gsf.de/proj/plant/jsf/athal/index.jsp for A. thaliana (Arabidopsis Genome Initiative 2000); and Human Genome Resources http://www.ncbi.nlm.nih.gov/genome/guide/human/ for H. sapiens.

Searches for PACE-like Upstream Sequences

Five hundred base pairs of 5′-upstream sequence for each S. cerevisiae gene were extracted from a file provided on the MIPS FTP server (ftp://www.ftpmips.gsf.de/yeast/sequences/Scerevisiae_utr5_500.fa). For all other species, 500 bp of 5′-upstream sequence was extracted from the respective PEDANT database by internal accession using appropriate MySQL queries. Multi-FASTA files containing these species-specific promoter sequences were used as input for a Java-based pattern search program, which lists all 9-mer patterns occurring in all promoters on both strands. In a second step, the PACE motif known from S. cerevisiae (GGTGGCAAA) was used as a search pattern, allowing up to two mismatches. The output for each species lists the resulting motifs in order of descending frequency, together with their positions and the codes for the respective proteins.
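The two search steps described above can be illustrated with a minimal Python sketch. This is not the Java program used in the study; the function names, the dictionary input (gene code mapped to an upper-case promoter string), and the convention that each promoter ends directly before the start codon are assumptions made for illustration only.

```python
from collections import Counter

# Translation table for complementing upper-case DNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_kmers(promoters, k=9):
    """Step 1: count every k-mer on both strands of all promoters."""
    counts = Counter()
    for seq in promoters.values():
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                counts[strand[i:i + k]] += 1
    return counts


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_pace_like(promoters, motif="GGTGGCAAA", max_mismatches=2):
    """Step 2: list sites within max_mismatches of the motif.

    Returns tuples (gene, strand, position, site, n_mismatches). The
    position is the forward-strand coordinate of the leftmost base of the
    site, counted negatively from the 3' end of the promoter (assumed
    here to end directly before the start codon).
    """
    k = len(motif)
    hits = []
    for gene, seq in promoters.items():
        n = len(seq)
        rc = reverse_complement(seq)
        for i in range(n - k + 1):
            fwd = seq[i:i + k]
            d = mismatches(fwd, motif)
            if d <= max_mismatches:
                hits.append((gene, "+", i - n, fwd, d))
            rev = rc[i:i + k]
            d = mismatches(rev, motif)
            if d <= max_mismatches:
                # Index i on the reverse strand maps to forward index n - i - k.
                hits.append((gene, "-", -(i + k), rev, d))
    return hits
```

Sorting the result of `count_kmers` with `Counter.most_common()` would give the frequency-ordered listing mentioned in the text; reading the multi-FASTA input is left out to keep the sketch short.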

The RSA tool (http://www.rsat.ulb.ac.be/rsat/) was used to list PACE or PACE-like sequences in the Hemiascomycetes species included in this program.

Alignment Tools

Alignment of the Rpn4p orthologues or upstream promoter sequences of proteasomal genes was done using the CLUSTAL W routine at the EBI server (http://www.ch.embnet.org) or DiAlign (http://www.dialign.gobics.de [Morgenstern et al. 2006]).

Results

Similarities of Proteasomal Gene Products from Other Species to S. cerevisiae

The sequences for 12 gene products of the 20S core, the 6 RPTs, and the RPN gene products from the 19S cap particle as well as those of the homologues of Uba1p and Cdc48p from the 15 Hemiascomycetes species and the 6 “outgroup” species (S. pombe, N. crassa, A. nidulans, A. fumigatus, A. thaliana, and H. sapiens) analyzed here were retrieved and compared as described under Methods. The sequences from S. cerevisiae were taken as a reference; we felt that pairwise comparison of all of these sequences was unnecessary. The results are presented in Table 1. For simplicity of discussion, we have divided the 15 Hemiascomycetes species into three groups: group 1 comprises S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus, and S. kudriavzevii (this group represents the Saccharomyces “sensu stricto” species); group 2 comprises S. castellii, S. kluyveri, A. gossypii, K. lactis, K. waltii, and C. glabrata; and group 3 comprises C. albicans, C. dubliniensis, D. hansenii, and Y. lipolytica.

Table 1 Homologies of 20S and 19S components from various species vs. S. cerevisiae

The conservation of the 20S core proteolytic subunits (Table 1A) is remarkably high: it ranges from 98% to 100% similarity within group 1 and from 81% to 97% within group 2. A noticeable decrease in similarity is seen in C. albicans and C. dubliniensis as well as in D. hansenii and Y. lipolytica (group 3), from 67% to 88%. Similarities within the “outgroup” species are still remarkably high (61% to 86%) but below those within group 3.

Sequence similarities among the Rpt products (Table 1A) are even higher than those observed for the 20S core subunits: 98%–100% within the Saccharomyces species (group 1) and 89%–99% in group 2. A slight decrease in similarity is seen for C. albicans and C. dubliniensis (85%–94%) as well as for D. hansenii (84%–94%) and Y. lipolytica (86%–94%). Remarkably, there is still high similarity for the “outgroup” species (81%–89%). An interesting finding was that in Arabidopsis many proteasomal genes (as far as sequences were available) are duplicated. It is worth stressing that duplicates of the proteasomal genes were not detected in any of the Hemiascomycetes species or in the other species. For the species listed in the Yeast Gene Order Browser (YGOB; http://www.wolfe.gen.tcd.ie/ygob) (Scannell et al. 2006), this was verified by looking up all genes relevant for our study. Interestingly, S. castellii has a second gene product with similarity to Rpn4 and a second one with similarity to Cdc48. However, the second copy of S. castellii Rpn4 (713.11) might be a nonfunctional relic, as similarities at the N-terminus and within the regions where the acidic domains are located are largely lost. Therefore, in our comparisons we have relied only on the first copy of Rpn4p (718.67), which has retained the characteristic features throughout the sequence.

Compared to the Rpt and 20S moieties, the RPN gene products are on average less conserved relative to those of S. cerevisiae (Table 1B). This may be because the individual species have different lifestyles, so that the functions of the Rpn proteins had to adapt correspondingly. Note, for example, that subunits similar to Rpn13p and/or Rpn14p may even be missing from some species.

The most pronounced deviations in similarity become apparent when comparing the Rpn4p homologues: while the similarity ranges from 86% to 92% in the Saccharomyces “sensu stricto” (group 1), there is a sudden drop in similarity (39%–50%) when the Rpn4p homologues of the remaining Hemiascomycetes species are compared to Rpn4p from S. cerevisiae (Table 1B), mainly due to substantial variations within the central part. Therefore, we have aligned all Rpn4p-like sequences retrieved from the databases and carefully checked them for the presence of their known characteristic features, the highly conserved atypical Zn-finger at the C-terminus, and the two acidic domains in the center of the sequence as observed in S. cerevisiae (Mannhaupt et al. 1999). These features were found to be highly conserved among the Rpn4p homologues from the group 1 species (Fig. 1). Among the remaining Hemiascomycete species (groups 2 and 3), the Rpn4p homologues exhibit the conserved atypical Zn-finger at the C-terminus, which is always the most highly conserved part of the sequence, because it represents the DNA-binding domain (see also Gasch et al. 2004). Though deviating in sequence and length, two acidic domains are present in the Rpn4p homologues of all Hemiascomycetes species, occurring in similar relative positions (Fig. 1 and Supplement 1). Therefore, we conclude that the Rpn4p homologues from all Hemiascomycetes represent true transcription factors involved in the regulation of the proteasomal and further genes in these organisms. By contrast, the only conserved feature in the Rpn4p-like sequences from the “outgroup” species is the occurrence of the highly conserved C-terminal Zn-finger (see Supplement 2). Note, however, that the loops between CxxC and HxxxxH of the Zn-finger are shorter (14 instead of 21) in these cases than for the rest of the sequences, and that the protein sequence from H. sapiens is even considerably shorter. In none of the “outgroup” Rpn4p-like sequences could we detect any acidic domains.

Fig. 1

Schematic representation of conserved domains in the Rpn4p homologues from Hemiascomycetes species. Lengths of proteins (in amino acid residues) are indicated. Black boxes, atypical Zn-finger; gray boxes, acidic domains 1 and 2; light-gray boxes, N-terminally conserved sequences. For more details, see Supplement 1

Presence of PACE-like Sequences in S. cerevisiae and Other Species

Next we inspected the 5′-upstream sequences of the proteasomal genes from the Hemiascomycetes as well as those of UBA1 (ubiquitin activating enzyme; E1) and CDC48 (ATPase in ER, nuclear membrane, and cytosol) for the occurrence of PACE or PACE-like elements. Both are single-copy genes in S. cerevisiae and belong to a large group of genes that appear to be under the control of Rpn4p (Mannhaupt et al. 1999; Kapranov et al. 2001). Interestingly, the PACE box is fully conserved in the majority of the CDC48 promoters (see Table 2), except in D. hansenii and Y. lipolytica. As indicated in Table 1A, the homologues of Uba1p and Cdc48p on average share an even higher degree of similarity with their counterparts from S. cerevisiae than the proteasomal genes, pointing to their absolute requirement throughout eukaryotes.

Table 2 PACE and PACE-like motifs upstream of Hemiascomycetes genes

An interval of 500 bp upstream from the translational start site was chosen, as the elements in S. cerevisiae are located within this region, varying between position –83 and position –163. Likewise, searches with the RSA (regulatory sequence analysis) tools (e.g., van Helden 2003), for those Hemiascomycetes available in that program, indicated the presence of PACE or PACE-like elements in noncoding regions upstream of proteasomal genes. Our Java program (see Methods), applied to the Hemiascomycetes promoters, delineated all GGTGGCAAA elements and degenerate nonamers thereof with at most two base exchanges. These sequences (hits) were sorted by decreasing frequency, and one (in a few cases, two) of these hits was selected for each gene according to the following criteria: (i) frequency ≥1; (ii) none, one, or two base exchanges, in this order of preference; and (iii) the motif preferably conforming to the sequence DRTGGCRAN (i.e., leaving the “central” core of PACE unchanged). These criteria were built on the following three observations.
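The selection criteria can be sketched in Python as follows. The ranking rule (fewest base exchanges first, then conformity to DRTGGCRAN as a tie-breaker) is one possible reading of criteria (ii) and (iii), not necessarily the procedure applied by hand in the study; the function names are hypothetical. The degenerate symbols follow the standard IUPAC nucleotide code (R = A or G; D = A, G, or T; N = any base).

```python
# Standard IUPAC nucleotide codes needed for the consensus DRTGGCRAN.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "D": "AGT",   # not C
    "N": "ACGT",  # any base
}


def matches_consensus(site, consensus="DRTGGCRAN"):
    """True if every base of the site is allowed by the consensus."""
    if len(site) != len(consensus):
        return False
    return all(base in IUPAC[code] for base, code in zip(site, consensus))


def base_exchanges(site, motif="GGTGGCAAA"):
    """Number of positions at which the site differs from canonical PACE."""
    return sum(a != b for a, b in zip(site, motif))


def select_hit(candidates, max_exchanges=2):
    """Pick one 9-mer for a gene, or None if no candidate qualifies.

    Assumed ranking: fewest base exchanges first (criterion ii), then
    preference for sites conforming to DRTGGCRAN (criterion iii).
    """
    eligible = [s for s in candidates if base_exchanges(s) <= max_exchanges]
    if not eligible:
        return None
    return min(eligible,
               key=lambda s: (base_exchanges(s), not matches_consensus(s)))
```

For example, given the candidates `GGTGCCAAC` and `AGTGGCAAC`, both with two base exchanges, this rule prefers `AGTGGCAAC` because it leaves the central core intact.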

In our previous reports we observed that modification of the central GC or an exchange of the (central) C residue in PACE abolishes or reduces the binding of Rpn4p (Mannhaupt et al. 1999; Kapranov et al. 2001).

With four exceptions, the proteasomal genes from the Saccharomyces “sensu stricto” species (Table 2A, group 1) possess the unique sequence GGTGGCAAA (in direct or opposite orientation relative to the respective gene) at a comparable distance upstream from the translational start site. The motif deviates by one nucleotide at position 7 (GGTGGCGAA) in the promoters of RPN10 and RPN13, and at the first nucleotide (AGTGGCAAA) in the promoters of PRE5 and RPN8. However, in the light of earlier microarray expression profiles (Eisen et al. 1998), these alterations seem to be tolerated, i.e., these modified boxes should act as functional cis-regulatory elements in binding Rpn4p.

Furthermore, Jelinsky and colleagues (2000) have delineated groups of coregulated genes in S. cerevisiae whose upstream regions bear specific regulatory sequence motifs. They observed that one group of coregulated genes contained a number of DNA excision repair genes and a large selection of protein degradation genes. Moreover, transcription of these genes was found to be modulated by Rpn4p, most likely via its binding to MAG1 upstream repressor sequence elements (GGTGGCGA), which turned out to be almost identical to the proteasome-associated control element (PACE). The authors’ statement “that the MAG1 element normally behaves as a repressor binding site does not necessarily exclude Rpn4p’s behaving as an activator at this site” may be taken as a further indication that GGTGGCGA can act as a functional cis-regulatory element in binding Rpn4p.

Our results are outlined schematically in Fig. 2 and detailed in Table 2. As can be inferred from Table 2A, S. paradoxus, the species most closely related to S. cerevisiae, exhibits an identical “PACE pattern.” Remarkably, also the upstream positions of the motifs are very similar, if not identical, to each other. The “PACE pattern” changes only minimally within the Saccharomyces “sensu stricto” group, as far as we can conclude from the sequences available in the databases.

Fig. 2

Schematic representation of the occurrence of PACE and PACE-like motifs in the upstream regions of proteasomal genes in Hemiascomycetes species. White box, genuine PACE, GGTGGCAAA; light-gray box, motif with one base exchange vs. PACE, conforming to DGTGGCRAN; gray box, motif with two base exchanges vs. PACE, conforming to DGTGGCRAN; dark-gray box, two base exchanges not conforming to DGTGGCRAN. NA, upstream sequence not available. For more details, see Table 2

In group 2 (Fig. 2, Table 2B), we observed the occurrence of a genuine PACE element for the majority of the proteasomal genes, though the upstream positions of these elements are much more variable compared to those in group 1. Further, there is an increasing number of cases in group 2 (particularly for K. lactis and C. glabrata) in which the PACE element is presumably substituted by either AGTGGCAAA (change of G to an A residue in position 1) or GGTGGCGAA, or even, in one case, by a PACE-like sequence with two base exchanges conforming to the above rules. In C. glabrata, we observe the “canonical” PACE element in only 5 cases among the 32 promoter sequences. Interestingly, however, C. glabrata exhibits an element (TGTGGCAAA) similar to AGTGGCAAA six times. The “alternative” PACE element GGTGGCGAA is present in C. glabrata three times.

The “PACE patterns” of the group 3 species (Fig. 2, Table 2C) exhibit still greater variations than those of group 2. In Y. lipolytica the “canonical” PACE element is found in four cases, while C. albicans and C. dubliniensis exhibit this sequence only for CDC48 (see below); none occurs in D. hansenii. However, instead we observed again a number of elements in which position 1 has been changed to an A residue: AGTGGCAAA occurs four times in C. albicans and three times in C. dubliniensis and D. hansenii, respectively; no such element is present in Y. lipolytica. The “alternative” PACE element GGTGGCGAA occurs only once in D. hansenii and Y. lipolytica, respectively, and not in the other members of group 3.

In addition, one observation we paid particular attention to is that in K. lactis, C. glabrata, and the group 3 PACE patterns, increasing numbers of cases are found in which PACE-like elements with one base exchange at their 3′-ends (GGTGGCAAN) occur (see Tables 2B and C). The figures are as follows: 1 in 32 for K. lactis; 5 in 32 for C. glabrata; 7 in 32 for C. albicans and C. dubliniensis, respectively; 8 in 32 for D. hansenii; and 9 in 32 for Y. lipolytica. Among these species, we also observed a number of PACE-like elements with variations in both position 1 and position 9; i.e., only the seven core positions have been conserved. For example, the occurrence of AGTGGCAAN is 4 in 32 for C. albicans and C. dubliniensis, respectively; 3 in 32 for D. hansenii; and 4 in 32 for Y. lipolytica. A PACE-like element conforming to the sequence GAAGGCAAA (i.e., changes in positions 2 and 3 vs. the canonical element as reported by Gasch et al. [2004]) is present at 5 in 32 in C. albicans and C. dubliniensis, respectively, and 2 in 32 in D. hansenii, but zero times in Y. lipolytica. These findings are discussed below.

Discussion

We performed a search in 15 Hemiascomycetes species (see Methods) to explore the evolutionary maintenance of the transcription factor Rpn4p together with the so-called PACE element, which was initially identified as an Rpn4p binding site for the majority of the proteasomal and a number of additional genes in S. cerevisiae (Mannhaupt et al. 1999; Kapranov et al. 2001). Thus, extending the 10 species analyzed by Gasch et al. (2004), we were able to add 2 species (C. glabrata and K. lactis) with an intermediate phylogenetic relationship to the Saccharomyces and 3 species (C. dubliniensis, D. hansenii, and Y. lipolytica) with a more distant phylogenetic relationship to the Saccharomyces species (Dujon 2006). As an “outgroup” in our searches, we have used the corresponding sequences from S. pombe, N. crassa, two Aspergillus species, Arabidopsis, and human (see Methods).

In accordance with earlier notions (e.g., Wolf and Hilt 2004) we found that the 30 proteasomal gene products considered here as well as Uba1p and Cdc48p are highly conserved throughout all these species (Table 1), as they are of fundamental importance for cellular function in eukaryotes. Further, the domain structures of the Rpn4p homologues in the Hemiascomycetes are well conserved. The highest similarity is observed for the C-terminal portions (ca. 130 residues) in which the domain of the atypical Zn-finger is located (Fig. 1). CLUSTAL W resulted in a nearly perfect alignment of the sequences in this region (Supplement 1). By contrast, the acidic domains reveal a greater divergence, except among the Saccharomyces “sensu stricto” species. However, CLUSTAL W alignments and visual inspection show that two stretches rich in acidic amino acids, similar in length and relative distances, are present in the remaining Rpn4p-like sequences. As the acidic domains will probably function as activating domains, they need not be as strictly conserved as DNA-binding domains like Zn-fingers. Thus the high similarity of the acidic modules in Rpn4p of the “sensu stricto” species reflects their close evolutionary relationship, while the greater variation of these modules among the other species is likely to be a consequence of much greater evolutionary distances.

A microarray-based genomic survey revealed that the S. cerevisiae proteasomal gene cluster exhibits a “stereotypical” expression pattern under varying environmental conditions (Eisen et al. 1998). Moses and colleagues have convincingly shown that functional PACE elements are evolutionarily maintained in the upstream regions of those proteasomal genes from the Saccharomyces “sensu stricto” group that follow the “stereotypical” expression pattern (Moses et al. 2003, 2004).

In our hands, conventional alignment tools such as DiAlign or CLUSTAL W (cf. the benchmarking by Pollard et al. 2004; data not shown) allowed for the detection of PACE or PACE-like sequences in the upstream promoters of most proteasomal genes from the “sensu stricto” species, while extending such searches to the group 2 and group 3 species was hampered by the fact that during evolution a greater extent of rearrangements (including deletions/insertions) among homologous genes and their flanking sequences in general has occurred (e.g., Dujon et al. 2004; Fischer et al. 2006). When we tested DiAlign or CLUSTAL W to align the 500-bp upstream sequences from the 15 Hemiascomycetes, even those sequences that harbor a unique PACE element were not correctly aligned. These routines also largely failed in pairwise alignments. However, when we preselected 50 bp including the presumptive elements detected by our Java program, these were correctly aligned by CLUSTAL W, at the same time demonstrating that the sequences flanking the elements share little or no similarity except those of the “sensu stricto” (group 1) species.
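The preselection step can be pictured with a short Python sketch that cuts a 50-bp window around a detected element before alignment. The text only states that the 50 bp include the presumptive element; centring the window on the element, and the function name and parameters, are assumptions of this sketch.

```python
def window_around(seq, start, motif_len=9, width=50):
    """Return a slice of up to `width` bp containing the motif.

    `start` is the 0-based index of the first motif base in `seq`.
    The window is centred on the motif where possible and shifted
    inward at the sequence ends, so the motif is always included.
    """
    flank = (width - motif_len) // 2
    left = max(0, start - flank)
    right = min(len(seq), left + width)
    left = max(0, right - width)  # shift back if we ran off the 3' end
    return seq[left:right]
```

The resulting windows, one per species, could then be written to a FASTA file and submitted to an alignment program such as CLUSTAL W.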

While Gasch et al. (2004) chose statistical approaches to build “meta-matrices” for PACE-like upstream elements, the simple routine we developed led to basically similar results. The criteria we applied to the selection of the motifs in our approach (see Results) were based on earlier findings and are in agreement with an important hypothesis of the comparative genomics paradigm stating that as evolutionary distance increases, observing a match with a given level of conservation should become less and less likely by chance. Moses and collaborators (2003, 2004) have characterized the evolution of experimentally validated transcription factor binding sites (TFBSs) in the Saccharomyces cerevisiae genome, finding that functional TFBSs evolve more slowly than flanking intergenic regions, pointing to a purifying selection of such elements. They concluded that as evolutionary distance increases, one would expect fewer matches to a given matrix to be conserved by chance. Although not every functional binding site will remain under purifying selection, as a result of either functional change or binding-site turnover, a large subset of functional binding sites does remain under purifying selection. These authors also pointed out that there might be considerable position-specific variation in evolutionary rates within TFBSs. They further showed that the evolutionary rate at each position is a function of the selectivity of the factor for bases at that position. We took this finding into account by considering almost exclusively PACE-like motifs that have kept the core of PACE (see Results), which seems to be essential for function.

In our approach, we explicitly listed the putative elements and their upstream locations for 30 proteasomal genes and 2 genes (UBA1 and CDC48) which are under the control of Rpn4p (including the five recently sequenced Hemiascomycete species). Comparisons thus allowed for a more detailed assessment of the relationships between the respective elements. Table 2 suggests that during evolution of the Hemiascomycetes there was a continuous enrichment in the number of genuine PACE elements, so that earlier and more “degenerate” PACE-like motifs, which were nevertheless functional with their cognate factors, could have converged to “canonical” PACE motifs in the evolutionarily most recent species. In a large number of instances, convergence could have been brought about by a single base change in preexisting motifs in the ancestors, notably in position 1, 9, or 7 of the sequence (mainly conforming to AGTGGCAAA, GGTGGCAAN, or GGTGGCGAA, respectively). This is obvious, for example, in comparing the motifs found in K. lactis to those in the other species in group 2 and those in group 1. A similar notion is valid for the motifs found in C. glabrata, because there are more “degenerate” PACE elements in C. glabrata than in the rest of the species in group 2 or 1. A peculiarity of C. glabrata is the repeated occurrence of TGTGGCAAA (see Results), in which a single base exchange would result in the “canonical” motif. Thus, it appears that the patterns in group 2 do not exactly reflect the divergence times as worked out for the phylogenetic tree on other criteria (e.g., Fischer et al. 2006; Dujon 2006; Scannell et al. 2006).

Compared to group 2, there is a larger number of motifs in group 3 that conform to the sequences AGTGGCAAN or GGTGGCAAN. That mutations indeed occur in these motifs between closely related species can be seen for C. albicans and C. dubliniensis. The upstream region of PRE3 contains the motif AGTGGCAAA in C. albicans, whereas it reads AGTGGCGAA in C. dubliniensis. The upstream region of RPN3 exhibits the motif GGTGGCAAC in C. dubliniensis and GGTGGCGAC in C. albicans, in nearly identical upstream locations.
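The two pairs of motifs quoted above can be compared position by position with a small Python helper (the function name is hypothetical; positions are numbered 1 to 9 as in the text).

```python
def differing_positions(a, b):
    """Return (position, base_in_a, base_in_b) for each mismatch.

    Positions are 1-based, matching the numbering of the PACE nonamer.
    """
    if len(a) != len(b):
        raise ValueError("motifs must have equal length")
    return [(i, x, y)
            for i, (x, y) in enumerate(zip(a, b), start=1)
            if x != y]


# PRE3: C. albicans vs. C. dubliniensis
pre3 = differing_positions("AGTGGCAAA", "AGTGGCGAA")
# RPN3: C. dubliniensis vs. C. albicans
rpn3 = differing_positions("GGTGGCAAC", "GGTGGCGAC")
```

In both pairs the only difference is an A/G exchange at position 7, the same position that distinguishes the “alternative” element GGTGGCGAA from canonical PACE.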

In any case, provided that all of the proteasomal genes in the various species are subject to regulation by their cognate Rpn4 factors, these would have to be flexible enough to bind degenerate PACE-like sequences in these species. Implicitly, this has been demonstrated for two “extreme” species, S. cerevisiae and C. albicans, by in vitro binding and competition experiments (Gasch et al. 2004), for which oligonucleotides comprising the decamers GGTGGCAAAA (Sequence A), AGTGGCAACA (Sequence C), and GAAGGCAAAA (Sequence B) were used. While C. albicans Rpn4 bound to these with comparable efficiency, S. cerevisiae Rpn4 bound preferentially to GGTGGCAAAA and less to AGTGGCAACA, and had a reduced ability to bind to GAAGGCAAAA. These authors also stated that S. cerevisiae Rpn4p could drive expression of a reporter gene to higher levels when Sequence A was present in its promoter than when Sequence B or a minimal promoter was placed upstream of the reporter gene, pointing out that Rpn4p of S. cerevisiae (and probably of its closest relatives) has largely lost the ability to bind productively to Sequence B.

The authors have called Sequence B “the C. albicans specific element,” but it may be noted that analogous motifs also occur in C. dubliniensis (at similar upstream locations in the same proteasomal genes as in C. albicans) and in D. hansenii. We find that for the two Candida species in group 3, the occurrence is ∼17% among the proteasomal upstream elements and only ∼7% for D. hansenii; we wish to emphasize, however, that the majority of the elements throughout the group 3 species still fit into the matrices GGTGGCAAN or AGTGGCAAN (base alterations in position 9).
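The quoted percentages are consistent with the GAAGGCAAA counts given in Results (5 for each Candida species, 2 for D. hansenii) if these counts are taken relative to the 30 proteasomal genes rather than to all 32 promoters. That choice of denominator is an assumption of the following Python sketch and is not stated explicitly in the text.

```python
def percent(count, total):
    """Occurrence of a motif as a percentage of the genes examined."""
    return 100.0 * count / total


PROTEASOMAL_GENES = 30  # assumed denominator (excludes UBA1 and CDC48)

candida = percent(5, PROTEASOMAL_GENES)    # C. albicans, C. dubliniensis
dhansenii = percent(2, PROTEASOMAL_GENES)  # D. hansenii
```

Rounded to whole numbers, these values give the ∼17% and ∼7% stated above; with 32 as the denominator the Candida figure would instead round to 16%.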

Inspection of all proteasomal gene upstream regions in C. albicans reveals that only 8 of 30 of the elements fully conform to the above decamers, while the rest exhibit one or two base exchanges, again supporting the notion that Rpn4p must be flexible enough to bind degenerate PACE-like sequences, if the complete set of cis-regulatory elements is used in the regulatory network. It may well be that nonamers with a conserved core and one or two base variations suffice for Rpn4 binding. Note that nearly all of our assignments and also the matrices formulated by Gasch et al. (2004) point to the significance of the central –GGC–. Gasch et al. (2004) argued that the different binding specificities found between S. cerevisiae and C. albicans reside in the second Zn-finger of the Rpn4 homologues, which is proposed to contact the first half of the DNA-binding site. Given the possible significance of the core part of the PACE-like elements, we may speculate that this part is contacted by the first (atypical) finger, which in our alignment is found to be equal in length, very highly conserved in its “loop” region (12 residues) between CX10C and HX4H, and highly conserved in the linker region to the second Zn-finger, in all Hemiascomycetes. Unfortunately, it is unknown which parts of the atypical Zn-finger may make contact with which nucleotides of the cis-regulatory elements, and this remains largely unpredictable by comparisons among the Rpn4 homologues, as models have been developed only for conventional Zn-fingers (S. Wolfe et al. 2000; Pabo et al. 2001) and not for such atypical Zn-fingers.

At first sight, the finding of Gasch et al. (2004) that the N. crassa Rpn4 homologue can bind the above decamers comes as a surprise. A comparison between the respective DNA-binding domains reveals that there is again high conservation within the first CX10C/HX4H domain and the second CXXC/HX4H domain, while the region between these two domains is shorter by seven amino acids in N. crassa compared to the Saccharomyces species. (This observation is also valid for the Rpn4p homologues from S. pombe, A. nidulans, and A. fumigatus; cf. Supplement 2.) Thus the DNA-binding domain in N. crassa may still allow for binding of PACE sequences. However, the facts that none of these Rpn4 homologues contains acidic domains and that no PACE-like elements are found upstream of the proteasomal genes argue for the proposition that regulatory networks like those in the Hemiascomycetes do not exist in the other fungi.

Earlier and more degenerate PACE-like motifs, which were nevertheless functional with their cognate factors, have converged to “canonical” PACE motifs in the species that separated more recently in evolution, such as the group 1 species, in which practically no changes have occurred either in the PACE patterns or in the DNA-binding sites of Rpn4p. Apparently, the ability of Rpn4p of the more recently segregated species to bind to degenerate PACE-like motifs has been reduced by concomitant adaptation of the Rpn4p sequences (Gasch et al. 2004). This scenario would probably involve a stepwise (mutational) convergence of “pre-PACE” elements into “true” PACE motifs as evolution proceeded and would be in agreement with the proposal that cis-regulatory changes are an important source of genetic variation (Wray et al. 2003) and that gains (and losses) of functional binding sites significantly contribute to these changes (e.g., Dermitzakis and Clark 2002). Likewise, as can be inferred from the comparison of the Rpn4 proteins (Supplement 1), mutations in the highly conserved DNA-binding domains must have contributed to their capability of interacting with the actually occurring PACE-like cis-regulatory elements. We have taken this into account by putting the Rpn4p homologues of groups 1 and 2 into an order that parallels the variability in the cognate elements found in these species: the most pronounced alterations in the Rpn4p DNA-binding domains are observed for those species in which the PACE patterns exhibit the greatest variety (group 2), while the restriction of binding to a more specialized PACE element such as GGTGGCAAA in the more recently segregated Saccharomyces (group 1) is mirrored by considerably fewer or no alterations in the DNA-binding domains. For the group 3 species, it is difficult to establish such a strict correlation.

Overall, it seems evident that the concerted regulation of the proteasomal genes by Rpn4 proteins is a special acquisition of the Hemiascomycetes, but that no similar mechanism is operative in S. pombe, other fungi, or higher eukaryotes. This notion is substantiated by investigations on the ubiquitin-proteasomal network from Drosophila (e.g., Wojcik and DeMartino 2002; Lundgren et al. 2005) or mammalian cells (e.g., Meiners et al. 2003) that have clearly demonstrated the existence of a concerted regulation but have not identified a system similar to the one in yeast. We hope that our observations will stimulate further experiments to better understand the regulatory network of this most important system for cell viability, in Hemiascomycetes as well as in higher eukaryotes.