Background

The structure and dynamics of chromatin clearly differentiate the eukaryotes from the other superkingdoms of life [1,2]. Eukaryotic chromatin undergoes a variety of structural and compositional changes that accompany the progression of the cell cycle and changes in gene expression [3]. These dynamics are regulated both by a variety of protein-protein and protein-DNA interactions and by catalytic modification and reorganization of proteins and nucleic acids that comprise the chromosomes [3]. In recent years it has become clear that covalent modification of histones, transcription factors and other chromosomal proteins plays a major part in the dynamics of chromatin. The covalent modifications include hydroxylation of proline and lysine, methylation of lysine and arginine, phosphorylation of serine and threonine, and acetylation and ubiquitination of lysine. Other enzymes, such as deacetylases and phosphatases, remove covalent modifications from proteins and thus reverse their effects [4,5,6]. The covalent modifications may change the local or global charge properties of chromosomal proteins and regulate their interactions with DNA. Additionally, the modifications also seem to form a 'code' that is recognized by specific groups of regulatory proteins [6,7,8]. Besides covalent modifications, ATP-dependent enzymes such as the Swi2/Snf2 ATPases and other chaperone-like proteins remodel the chromatin by rearranging the binding pattern of histones and other chromosomal proteins [9,10,11,12]. These covalent and non-covalent catalytic actions on the chromatin can result in condensation or decondensation of chromatin, either locally or on the chromosomal scale, and thereby regulate access of transcription factors and other proteins to the chromatin.

Many of these chromatin-modifying activities are organized into large multisubunit complexes, with the core enzyme accompanied by several noncatalytic subunits [12]. The subunits of these complexes are characterized by a number of conserved domains that interact with other proteins or DNA. Most of these domains are evolutionarily mobile modules and combine with each other in a very wide range of domain architectures [13]. Well studied examples of these are the bromodomain that interacts with acetylated peptides [14,15,16], chromodomains that mediate specific interactions with proteins [17,18] and RNA [19,20], the PHD finger mediating protein-protein interactions [21], the Myb and SANT domains [22] that interact with both DNA and proteins [23], and the SAP and AT-hook domains that interact with DNA [24,25]. These domains often occur together in large proteins linked to catalytic domains and may serve to tether these proteins to different components of chromatin and deliver catalytic activities to specific locations.

The computational analysis of chromatin proteins has helped in the identification of a large number of conserved modules in the chromosomal proteins [13]. These studies, followed by biochemical and structural characterization of these modules, have thrown considerable light on the biology of chromatin dynamics and the roles these domains have played in the course of evolution [14,15,16,20]. Using computational analysis, we have discovered a previously uncharacterized domain in multiple chromatin proteins. We also show that this domain is found linked to catalytic domains such as oxidoreductase and Jun activating binding protein 1 (JAB) domains and that these proteins may define a novel class of chromatin-modifying enzymes.

Results and discussion

Identification of the SWIRM domain

In the course of our systematic survey of eukaryotic chromosomal proteins, we observed a conserved globular module shared by the SWI3p and RSC8p proteins from the yeast Saccharomyces cerevisiae that did not map to any previously characterized domain. These homologous proteins are parts of the multisubunit SWI/SNF and RSC complexes that drive chromatin remodeling through the SNF2/SWI2-like ATPase subunit [26,27,28]. The region of similarity encompassing this segment has been termed 'conserved region 1' in [27]. Both these proteins additionally contain a Myb-related helix-turn-helix domain termed the SANT domain [22] at their carboxyl terminus and a further carboxy-terminal conserved α-helical extension restricted to orthologs of RSC8p and SWI3p. Given that these subunits are key components of the respective chromatin-remodeling complexes, and mediate multiple interactions, we sought to further investigate the provenance of their conserved amino-terminal module using computational methods. A PSI-BLAST search [29] initiated with this region from RSC8p (gi: 14318562, region 80-177) not only recovered its paralogs and orthologs from other organisms such as Moira from Drosophila [30] and BAF155 and BAF170 from vertebrates [27], but also several other uncharacterized proteins from diverse eukaryotes with statistically significant expected values (e-values) at the point of detection. For example, the search recovered the nuclear protein SPAC23E2.02 from Schizosaccharomyces pombe [31] in iteration 4 (e = 2 × 10⁻⁴), the human proteins KIAA1915 (e = 10⁻⁵) and KIAA0601 [32,33] (e = 10⁻⁷) in iteration 2 and the Arabidopsis thaliana protein At2g47620 in iteration 3 (e = 10⁻⁷).
We prepared a multiple alignment of these regions from all the proteins detected in these searches (Figure 1) and used it to construct a PSI-BLAST position-specific score matrix (PSSM) [34] and a hidden Markov model [35] for further searches against the complete or nearly complete proteome databases of individual eukaryotes. These searches allowed us to detect additional copies of this region in the carboxyl termini of the Asynaptic 1 protein (a Hop1p homolog) from plants [36]. Additionally these searches also recovered a conserved globular region in the extreme carboxyl terminus of the ADA2-like proteins from eukaryotes [37], with moderate significance. While these newly detected proteins have a somewhat distinct conservation pattern from the originally detected set (Figure 1), their secondary structure predicted using the JPred program [38] closely matches that predicted for the originally detected set. This observation, together with the information that the ADA2-like proteins are chromatinic proteins, just like the members of the original set, suggests that these domains are probably distant relatives of each other (Figure 1).

Figure 1

Multiple sequence alignment of the SWIRM domain. Proteins are designated by their gene names followed by the species abbreviations and GenBank (gi) numbers. The coloring represents the conservation profile of amino-acid residues at 90% consensus distinguished by the following amino-acid classes: h, hydrophobic residues (LIYFMWACV (in the single-letter amino-acid code)), a, aromatic residues (FHYW) and l, aliphatic (LIAV) residues, all shaded yellow; c, charged (KERDH) residues (basic (KRH) residues and acidic (DE) residues) colored magenta; p, polar (STEDRKHNQC) residues colored blue; s, small (SACGDNPVT) residues colored green; u, tiny (GAS) residues shaded green (in positions that are always glycine, this is indicated with a G); b, big (LIYERFMWQ) residues shaded gray. The predicted secondary structure is shown above the alignment: H or h, α helix; E or e, β strand. Species abbreviations: At, Arabidopsis thaliana; Bna, Brassica napus; Ce, Caenorhabditis elegans; Ddi, Dictyostelium discoideum; Dm, Drosophila melanogaster; Ecu, Encephalitozoon cuniculi; Hs, Homo sapiens; Osa, Oryza sativa; Sc, Saccharomyces cerevisiae; Sp, Schizosaccharomyces pombe; Zm, Zea mays. A subset of Rsc8p orthologs is represented in the automatically generated, uncurated PFAM-B entry 3680.
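The 90% consensus described in this legend can be illustrated with a minimal Python sketch. The residue classes are copied from the legend, but several details are assumptions made only for illustration and are not stated in the text: classes are tried from the smallest to the largest so that the most specific symbol wins, gaps count toward the column total, and the function names are hypothetical. It is not the procedure used to produce Figure 1.

```python
# Residue classes copied from the Figure 1 legend.
CLASSES = {
    "h": "LIYFMWACV",   # hydrophobic
    "a": "FHYW",        # aromatic
    "l": "LIAV",        # aliphatic
    "c": "KERDH",       # charged
    "p": "STEDRKHNQC",  # polar
    "s": "SACGDNPVT",   # small
    "u": "GAS",         # tiny
    "b": "LIYERFMWQ",   # big
}

# ASSUMPTION: the most specific (smallest) class is tried first;
# ties in class size are broken alphabetically by symbol.
CLASS_ORDER = sorted(CLASSES.items(), key=lambda kv: (len(kv[1]), kv[0]))


def column_consensus(column, threshold=0.9):
    """Return a consensus symbol for one alignment column, or '.' if none.

    ASSUMPTION: gap characters ('-') count toward the column total.
    """
    residues = [r.upper() for r in column]
    total = len(residues)
    if total == 0:
        return "."
    # A single residue that reaches the threshold is reported as itself.
    for res in sorted(set(residues) - {"-"}):
        if residues.count(res) / total >= threshold:
            return res
    # Otherwise report the first class that covers enough of the column.
    for symbol, members in CLASS_ORDER:
        hits = sum(1 for r in residues if r in members)
        if hits / total >= threshold:
            return symbol
    return "."


def alignment_consensus(rows, threshold=0.9):
    """Return the consensus string for a list of equal-length aligned rows."""
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all aligned rows must have the same length")
    return "".join(column_consensus(col, threshold) for col in zip(*rows))
```

For a toy three-row alignment such as `["LKG", "IKG", "VKA"]`, the sketch reports an aliphatic column, an invariant lysine and a tiny-residue column. A different class precedence would change which symbol is shown wherever several classes reach the threshold.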

The conserved region described above occurred in various distinct domain-architecture contexts, suggesting that it is an evolutionarily mobile domain (Figure 2). We named this domain the SWIRM domain after the proteins SWI3p, Rsc8p and Moira in which it was first recognized. Secondary-structure predictions, based on the multiple alignment, indicate the presence of four distinct α helices. Hence, it is predicted that the SWIRM domain is likely to form a globular structure in the form of a tetrahelical bundle (Figure 1). However, a direct comparison showed no specific relationship between the sequence conservation pattern of the SWIRM domain and those of other α-helical bundles that are commonly encountered in chromosomal proteins such as the bromodomain [16], the Myb or SANT domains and other helical DNA-binding domains [22]. The differences in conservation pattern between the distantly related ADA2p carboxy-terminal domains and the canonical SWIRM domains suggest that the former are likely to interact with a very distinct set of partners.

Figure 2

Domain architectures of proteins predicted to have the SWIRM domain. Proteins are designated by their gene names, species abbreviations and gi numbers. The phyletic distribution of a particular domain architecture is additionally given in brackets, where A represents animals; P, plants; F, fungi; D, D. discoideum and Pf, Plasmodium falciparum. Domains are typically represented by their standard names or abbreviations. ZZ represents the ZZ-type of zinc finger, and 'helical' designates the conserved α-helical domain found at the carboxyl terminus of proteins with a Myb (SANT) and an RSC8-like SWIRM domain.

Functional implications and domain architectures of the SWIRM domains

Rsc8p (Swh3p) has been shown to mediate multiple interactions in the RSC complex: it undergoes dimerization via the carboxy-terminal coiled-coil segment, associates with the SWI2/SNF2 ATPase Sth1p by forming two distinct contacts, and forms a complex with the Rsc6p subunit. Deletion analyses have shown that the region of the protein that includes the SWIRM domain is probably required for at least one of the contacts with Sth1p and perhaps even those with Rsc6p [39,40]. This suggests that the SWIRM domain is most likely to mediate protein-protein interactions. The Rsc8p- and SWI3p-related proteins have fairly complex architectures, implying that the different modules may mediate distinct interactions. Rsc8p has an additional ZZ domain [41] between the SWIRM and Myb domains, whereas the animal and Dictyostelium versions, like Moira, have an additional amino-terminal chromodomain (Figure 2). However, the common denominators in all these proteins are the SWIRM and Myb domains, suggesting that they mediate the evolutionarily conserved interactions of these proteins, such as those with the Swi2/Snf2 ATPase. The presence of a divergent SWIRM domain, linked to a HORMA domain, in Asy1, the plant ortholog of the yeast Hop1p [36], is consistent with the role of these proteins in mediating protein-protein interactions in the process of chromosome synapsis [42]. The ADA2p-like proteins [37] (Figure 2) are chromatin-associated proteins and are known to mediate protein-protein interactions [23]. They contain a distinct carboxy-terminal globular domain that is distantly related to the SWIRM domain, suggesting that this domain could mediate a subset of their interactions with the transcriptional machinery.

A striking group of proteins with the SWIRM domain is the one typified by SPAC23E2.02 and SPBC146.09c from S. pombe. Between one and four orthologs of this protein are encoded by different crown-group eukaryotes (Figures 1,3) and contain the SWIRM domain linked to a predicted FAD-dependent oxidoreductase domain [43]. A similar observation regarding the presence of a FAD-binding domain has been reported for KIAA0601, a vertebrate protein of this family [32,33]. KIAA0601, along with the co-repressor CoREST, is stably associated with the histone deacetylase complex. Furthermore, SPAC23E2.02 has been shown to be a nuclear protein [31] and, along with its paralog SPBC146.09c, contains at its extreme carboxyl terminus an HMG1 domain, which is a common DNA-binding module found in diverse chromosomal proteins [44] (Figures 2,3). The closest relatives of the predicted oxidoreductase domains linked to the SWIRM domain are the polyamine oxidases and the monoamine oxidases (Figure 3). These enzymes are involved in the oxidation of amino groups of polyamines such as spermine and spermidine and monoamines such as dopamine and serotonin [45]. The availability of the crystal structure for the polyamine oxidase from maize allowed us to evaluate the sequence conservation of the oxidoreductase domain fused to the SWIRM domains [45,46]. The alignment of this domain from the SPAC23E2.02-like proteins spans the entire length of the amino oxidase and contains the hallmark residues required to bind the FAD cofactor (Figure 3). These include the glycine-rich loop bounded by a β strand and an α helix that is typical of Rossmann fold nucleotide-binding proteins; a conserved glutamate, specific to the amino oxidase family, that interacts with the ribose of FAD; and other residues of the substrate- and cofactor-binding site.
Thus, these oxidoreductase domains are likely to function as amino oxidases; this, together with their linkage within the same polypeptide to domains typical of chromatin proteins, and evidence for nuclear localization, suggests that they are novel chromatin-modifying enzymes. Consistent with this, the complex containing KIAA0601 has been shown to contain FAD and was suggested to be a chromatin-modifying enzyme [32]. Building on these observations, we suggest two functional possibilities for these proteins: first, they could act as novel protein-modifying enzymes that oxidize the amino groups of lysines or arginines present on histones or transcription factors; alternatively, they could affect chromatin structure by oxidizing basic polyamines present in the chromatin and thereby altering the charge balance.

Figure 3

Multiple sequence alignment of the polyamine oxidase domain. Amino-acid residues are colored at 100% consensus. The protein designations, coloring scheme and consensus abbreviations are as described in Figure 1; +, acidic residues. The crystal-structure-derived secondary-structure assignments are shown above the alignment with positions involved in cofactor or substrate binding marked by an asterisk. Inserts in individual sequences are given as brown numbers. Species abbreviations: At, A. thaliana; Ce, C. elegans; Dm, D. melanogaster; Hs, Homo sapiens; Mm, Mus musculus; Mtu, Mycobacterium tuberculosis; Rn, Rattus norvegicus; Sp, S. pombe; Zma, Z. mays.

It is also conceivable that the FAD cofactor of these enzymes functions analogously to NAD, in a deacetylation reaction of acetyllysines similar to that carried out by Sir2 enzymes [47]. Although both these enzyme families are derived versions of the Rossmann fold and bind a dinucleotide cofactor, the SWIRM amino oxidase proteins do not possess equivalents of the unique inserts with residues that allow the Sir2-like proteins to catalyze the deacetylation reaction. On the contrary, the conserved residues in the SWIRM amino oxidases are the same as those in other amino oxidases, suggesting that they share similar catalytic activities, as proposed above.

The combination with the SWIRM domain, which is predicted to be a protein-protein interaction module, suggests that these enzymes may be part of a larger complex, like other chromatin-modifying enzymes. Whereas in vertebrates KIAA0601 has been shown to interact with the histone-deacetylase-containing complexes [32,33], the presence of multiple proteins in this family could point to the formation of other distinct complexes. Interestingly, the SWIRM amino-oxidase-type proteins are absent in S. cerevisiae but present in S. pombe and other crown-group eukaryotes. Hence, this family is likely to have been present in the ancestral crown-group eukaryote and secondarily lost in S. cerevisiae. We have shown previously that several functionally linked genes, encoding proteins involved in chromatin-structure dynamics, which are conserved in S. pombe and other crown-group eukaryotes, have been lost as a group in S. cerevisiae [43]. This implies that the SWIRM amino-oxidase-type proteins may be functionally linked to these other genes that were co-eliminated along with them in S. cerevisiae. These include genes for proteins such as the SET domain methyltransferase Clr4p, the chromodomain protein Swi6p, the PHD finger protein Mlo2p, the chromosomal actin-like protein SPAC23D3.09, and the predicted prolylhydroxylase with the double-stranded β-helix domain SPAC343.11c [43]. Some of these proteins may belong to the same multiprotein chromatin-modifying catalytic complex as the SWIRM amino oxidase proteins.

In vertebrates, the SWIRM domain is found fused along with a Myb (SANT) domain to a JAB1/PAD1 domain [48,49] (for example, in the human protein KIAA1915, Figure 2). This latter domain is commonly associated with several proteasomal and signalosomal proteins and is involved in ubiquitin-mediated protein degradation [50]. In this case, the SWIRM domain may serve to recruit proteasomal complexes to specific chromatin proteins to effect their degradation.

Conclusions

We define a conserved domain of about 85 residues, predicted to participate in protein-protein interactions, in different eukaryotic chromatin proteins such as Swi3p and Rsc8p. Homologs of these molecules, with the SWIRM domain, are found in all eukaryotes belonging to the crown group, as well as earlier-branching protists such as the apicomplexans. A version of the SWIRM domain is found linked to an amino-oxidase domain in a class of nuclear proteins that are represented in most crown-group eukaryotes. These proteins are present in multiple copies in plant proteomes and are entirely absent in the yeast S. cerevisiae. We predict that these proteins define a new class of chromatin-modifying enzymes that are likely to oxidize the amino groups of histones or other nuclear proteins. Alternatively, they may oxidize polyamines in chromatin to alter the charge balance in the chromatin. In humans the SWIRM domain is found linked to a JAB1/PAD1 catalytic domain, suggesting that this protein may serve as a link in the regulation of chromatinic proteins through proteasomal degradation.

Materials and methods

The nonredundant (NR) database of protein sequences (National Center for Biotechnology Information, NIH, Bethesda) was searched using the BLASTP program. Profile searches were carried out using the PSI-BLAST program [29] with either a single sequence or an alignment used as the query, with a profile inclusion expectation (E) value threshold of 0.01, and were iterated until convergence. Previously known conserved protein domains were detected using the corresponding PSI-BLAST-derived PSSMs [34]. The PSSMs were prepared by choosing one or more starting queries (seeds) for a set of most frequently encountered domains (see [34] for details) and run against the NR database until convergence with the -C option of PSI-BLAST to save the PSSM. It was ensured that at convergence no false positives were included in the profiles. This profile database can be downloaded from [51] or used on the Internet via the RPS-BLAST program [52]. All globular segments of proteins that did not map to domains with previously constructed PSSMs were searched individually using PSI-BLAST to detect any additional domains that may have been overlooked. Hidden Markov model-based searches were run using the HMMER2 package [35].
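The iterative profile search described above can be sketched as a command line assembled in Python. This is a hedged illustration rather than the invocation actually used: the `-C` option mentioned in the text belongs to the older PSI-BLAST program, whereas the sketch assumes the current NCBI BLAST+ `psiblast` program and its option names (`-query`, `-in_msa`, `-db`, `-inclusion_ethresh`, `-num_iterations`, `-out_pssm`). The function name and the file names are hypothetical.

```python
def build_psiblast_command(query_path, db="nr", inclusion_ethresh=0.01,
                           query_is_alignment=False, pssm_out=None):
    """Assemble a BLAST+ psiblast argument list; nothing is executed here.

    ASSUMPTION: BLAST+ option names are used as a modern stand-in for the
    program version referred to in the text.
    """
    # A single sequence is passed with -query, a multiple alignment with -in_msa.
    query_flag = "-in_msa" if query_is_alignment else "-query"
    cmd = [
        "psiblast",
        query_flag, query_path,
        "-db", db,
        # Profile inclusion threshold of 0.01, as stated in the text.
        "-inclusion_ethresh", str(inclusion_ethresh),
        # In BLAST+, 0 iterations means "iterate until convergence".
        "-num_iterations", "0",
    ]
    if pssm_out is not None:
        # Save the final PSSM so it can be reused in later searches.
        cmd += ["-out_pssm", pssm_out]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the call.
    print(" ".join(build_psiblast_command("rsc8_80_177.fasta",
                                          pssm_out="swirm.pssm")))
```

If BLAST+ and a local copy of the database are available, the returned list could be passed to `subprocess.run`. Keeping the command construction separate from execution makes the search parameters easy to inspect and record.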

Multiple alignments were constructed using the T_Coffee program [53], followed by manual correction based on the PSI-BLAST results. Protein secondary structure was predicted using a multiple alignment as the input for the JPred program [38].