
11.1 Introduction

The basic principles of transcriptional regulation are similar between prokaryotes and eukaryotes and involve the binding of TFs to specific DNA sequences at target genes, where they recruit and stabilize the general transcriptional machinery required for gene expression [1, 2]. Despite these general similarities, transcription initiation in eukaryotes is considerably more complex, which is likely related to the increased genome size and greater need for organization compared to prokaryotes. One key difference is that DNA in eukaryotes is not readily accessible, but tightly packaged by architectural proteins into chromatin. The basic unit of this packaging is the nucleosome, which consists of ~147 bp of DNA wrapped around an octamer of histone proteins [3, 4]. Nucleosomes play an important role in condensing DNA, thereby allowing the large eukaryotic genome to fit into the nucleus. Perhaps not surprisingly, this compaction also negatively affects transcription initiation in vitro [5, 6] and in vivo [7], as it forms an impediment to the binding of TFs and the formation of a preinitiation complex (PIC) [8, 9]. To initiate transcription, TFs and the PIC must first overcome the physical barrier posed by nucleosomes; however, the stability of nucleosomes means that direct competition for DNA access is inefficient. A host of coactivators therefore exist that can be recruited to regulatory regions by TFs to facilitate transcription initiation. These coactivators typically consist of (or recruit) chromatin modifier (CM) complexes that either displace or evict nucleosomes or covalently modify histones to loosen their interactions with DNA. CMs can also function as corepressors by effecting a more closed chromatin conformation. Consequently, the recruitment of coregulators that affect chromatin structure is now recognized as a major mechanism by which TFs can regulate gene expression.

Knowledge of general chromatin architecture has greatly expanded in recent years due to the broad application of classical and novel techniques to map TF binding sites, histone modifications, and chromatin accessibility. Mapping of TF binding sites and histone modifications is typically done using chromatin immunoprecipitation (ChIP) or related techniques such as DamID, which are discussed in more detail in Chapter 8. Most of the techniques to map chromatin accessibility make use of the fact that regulatory sites and the short DNA linkers connecting nucleosomes are more sensitive to digestion by micrococcal nuclease (MNase) or DNase I, each of which has a distinct cleavage pattern that provides a different view of chromatin structure [10]. MNase cuts preferentially in the linker regions between nucleosomes and is therefore typically used to map nucleosome positions. DNase I, on the other hand, also cuts nucleosome-associated DNA when used at higher concentrations, and its cleavage pattern therefore typifies general chromatin accessibility. Another approach to identify regions of open chromatin, formaldehyde-assisted isolation of regulatory elements (FAIRE), has also been described [11]. This method exploits the property that fragmented DNA that is highly crosslinked to histones after formaldehyde treatment (i.e. closed chromatin) can be separated from DNA with a low degree of crosslinking (i.e. open chromatin) by phenol extraction.

Advances in microarray and sequencing technology have made it possible to apply these various methods to create genome-wide maps of nucleosome occupancy [12–15], potential regulatory sites [16, 17], as well as patterns of histone modifications and TF binding [18–23]. A common observation in these studies is that active promoters and distal regulatory elements such as enhancers are associated with regions of open chromatin and enriched for bound TFs and their coregulators, underscoring that transcriptional regulation is universally linked to chromatin remodelling. These studies have also provided an unprecedented view of the higher-order structure of the genome, where broad domains of more accessible chromatin (i.e. euchromatin) alternate with regions that are less accessible to the transcription machinery (i.e. heterochromatin). It should be noted, though, that these techniques provide only a snapshot of the chromatin structure at the time of fixation and while many regulatory regions may appear stable, several lines of evidence suggest that remodelling is in fact a highly dynamic and continuously ongoing process. For example, nucleosomes found in yeast promoters exchange more rapidly than nucleosomes located in gene bodies [24, 25] and FRAP (fluorescence recovery after photobleaching) studies suggest that many TFs only transiently interact with DNA in vivo, even at active promoters [26–29]. Thus, chromosomal domains and regulatory regions with apparently stable chromatin are likely in a dynamic equilibrium between competing forces, the balance of which ultimately determines the degree of DNA accessibility [8].

Following a brief introduction to the types of CM involved in chromatin remodelling, this chapter will highlight how TFs can regulate gene expression by recruiting these coregulators to orchestrate changes in the chromatin state, and in turn, how chromatin can affect TF target recognition and binding. Then, I will discuss how these dynamic and antagonistic forces may be coordinated to organize chromatin and direct transcription at specific locations in the genome. Other recent reviews that consider these and related topics include [30–33], as well as Chapters 10 and 12 in this volume, which specifically consider TF–nucleosome interactions, and the auxiliary domains of TFs that mediate many of these functions, respectively. This chapter also contains a Glossary at the end which provides an overview of key terminology used throughout.

11.2 An Overview of Coregulators that Effect Changes in Chromatin Structure

A broad distinction can be made between two types of CMs, based on their mechanism of action: histone modifiers and ATPase nucleosome remodelling complexes. Histone modifiers are responsible for the wide variety of covalent modifications found on histone proteins, in particular on their unstructured N-terminal tails (Reviewed in [34, 35]). At least eight different types of histone modifications and their associated enzymes have been identified, with the number of distinctly modified residues currently standing at well over a hundred [34]. It has been proposed that combinations of these modifications constitute a “histone code” that is read by proteins that interact with specific modifications [36], allowing for an organized association of proteins with different stages of transcription. Indeed, the different modifications can serve as interaction sites for other coregulators, such as ATPase remodelers, that can direct further changes to chromatin structure (see examples below). The ultimate effect of histone modifications on chromatin structure – be it compacting or unwrapping – is therefore presumably to a large degree determined by the type of proteins that interact with them. Another way that histone modifications can affect chromatin structure is by changing the electrostatic properties of nucleosomes. For example, the acetylation of histone tails by histone acetyl transferases (HATs) neutralizes positive charges that would otherwise interact with negatively charged DNA [37], facilitating nucleosome unwrapping and mobility (Fig. 11.1a). It is unclear whether other modifications similarly affect chromatin through effects on the chemical properties of nucleosomes, but it has been suggested that phosphorylation may, like acetylation, reduce chromatin compaction through its effects on nucleosome charge [34].

Fig. 11.1

Effects of chromatin modifiers on chromatin structure. a Acetylation of histone tails by histone acetyl transferases (HATs) results in a more open chromatin conformation. b Model for nucleosome sliding by ATPase remodelers based on studies of the ACF complex [273]. In this model, the ATPase remodeler draws in DNA from the linker region (bottom arrow), resulting in the formation of a small DNA loop at the nucleosome entry site, which then propagates over the nucleosome, resulting in a lateral displacement along the DNA. The illustration shows one possible effect of remodelling at regulatory regions, namely the exposure of TF binding sites that would otherwise be rendered inaccessible by nucleosomes

Genome-wide studies have revealed that the occurrence of most modifications is tightly coupled to the location and activity of genes and their regulatory regions, in a manner that reflects their effects on chromatin structure. For example, acetylation marks are predominantly found at the beginning of active genes in yeast [22, 38–41] and at promoters and CpG islands in higher eukaryotes [42–45], although activation has also been linked to decreased acetylation of lysine residue 16 on histone H4 (i.e. H4K16ac) [38, 46, 47]. In contrast, methylation patterns differ depending on the residue that is modified, and distinct methylation states can be associated with either repression or activation [31, 34]. Classical examples include H3K4me and H3K27me, which mark regions of active and silent chromatin, respectively. The difference between acetylation and methylation patterns is mirrored in the specificity of their enzymes: HATs typically act indiscriminately on multiple histone residues [34], whereas methyltransferases are restricted to a single residue on one histone type [48]. Some effects of HATs on chromatin may also be mediated through other targets, as it has become increasingly clear that they can acetylate many non-histone proteins, including TFs [49–51]. For other modifications, the relation to the transcriptional state is less well characterized, but in general, phosphorylation appears to correlate with activation [52, 53], while sumoylation has been associated with repression [54, 55]. Ubiquitination, like methylation, can be associated with either transcriptional state [56–58]. Extensive crosstalk between modifications presumably contributes to these complex patterns. For example, phosphorylation of H3S10 can stimulate acetylation of H3K14 [59, 60] and inhibit H3K9 methylation [61], while repression by sumoylation may be directly related to the fact that it competes for the same residues as acetylation and ubiquitination.

The second class of CMs, ATPase remodelers, can directly affect the degree of chromatin packing by repositioning or sliding nucleosomes along the DNA (Reviewed in [62]) (Fig. 11.1b). The primary driving force behind this motion comes from a central catalytic subunit, which contains a conserved ATPase domain that provides the energy to move nucleosomes by rewinding the DNA around them. This process involves breaking and reforming most histone–DNA interactions, which likely explains the broad effects that remodelers can have on nucleosomal DNA accessibility [63, 64], nucleosome eviction [65–67] and histone exchange [68, 69]. Besides the ATPase domain, the catalytic subunits contain various additional domains that have been used to classify these remodelers into four major families: SWI/SNF, ISWI, CHD and INO80. Interestingly, with the exception of INO80 subunits, many of these additional domains mediate affinity to distinct histone modifications [70, 71], which are thought to confer different preferences for specifically modified chromatin structures to each family [72, 73]. SWI/SNF remodelers contain a bromodomain which binds acetylated histones [74], while the CHD family possesses chromodomains that can interact with methylated histone tails [75–78]. ISWI family proteins have a pair of SANT and SLIDE domains that are believed to form a module with affinity for unmodified histones [79], though it is as yet unclear to what degree this interaction may be affected by specific modifications.

The diversity of CMs is further increased through the association of the core catalytic subunits with different complements of additional proteins, which can vary even within families [62, 70, 80]. These accessory subunits can play a structural role, and can also contribute a variety of additional interaction domains and catalytic activities. Some complexes, such as NURD (nucleosome-remodelling and histone deacetylase), even combine ATPase remodeler and histone modifier activities [81]. As in the case of histone modifications and their associated enzymes, a broad classification can be made regarding the effects of the ATPase remodelers on gene expression. For example, recruitment of SWI/SNF complexes is predominantly associated with transcriptional activation, consistent with their preference for acetylated histones, while ISWI complexes typically function as repressors [82]. This distinction is by no means sharply defined, though, and most ATPase remodelers have been found to function as activators at some promoters and repressors at others. Thus the ultimate effect of remodelling can vary depending on the context in which it takes place.

11.3 TFs Play a Central Role in Targeting Chromatin Remodelling

Exactly how chromatin remodelling complexes are guided to their target regions remains an active area of investigation. One clearly established pathway is direct recruitment by TFs, with TFs providing the targeting component through their sequence-specific DNA binding domains. This recruitment typically involves transient interactions with the transactivation or effector domains of TFs, which are discussed in more detail in Chapter 12 of this volume. The intrinsic preferences for specific histone modifications found in many CMs, discussed above, do indicate that there are also alternative routes that do not involve direct recruitment by TFs. For example, the bromodomains in the yeast Swi2/Snf2 remodelers and Gcn5 HAT are sufficient to anchor their respective complexes to acetylated promoters in the absence of transcriptional activators [74]. Individual histone binding domains may in general not be sufficient for effective targeting, however, given the low binding affinities of the domains characterized to date [62]. Instead, the interaction domains could serve other purposes that do not involve recruitment, such as regulating remodeler ATPase activity [62]. Regardless, even if histone modifications indeed provide important targeting cues for CMs, the question remains as to how these modifications are established in the first place, given that histone-modifying enzymes generally do not possess intrinsic DNA sequence preferences. One possible answer comes from detailed studies of model genes in yeast (Reviewed in [83]), which have shown that the actions of histone modifiers in the early stages of transcription initiation are primarily guided by sequence-specific TFs. It is therefore likely that TFs play a central role in targeting chromatin remodelling, whether this is through direct interactions with remodelling complexes, or by guiding initial histone modifications and/or other coregulators that mediate these interactions indirectly.
An overview of some of the key features of TF-mediated recruitment of CMs and their implications for gene regulation will be given in the following paragraphs; readers are referred to Chapter 12 for more details.

Individual TFs can interact with a surprisingly wide variety of modifier complexes and other coregulators. This promiscuity is in part due to the intrinsic characteristics of the TF transactivation domains (also discussed in greater detail in Chapter 12), which are generally unstructured and only become stabilized upon interacting with their binding partners [84, 85], a property that may allow for some degree of flexibility in the selection of binding partners [86]. The diversity of TF partners is also increased through interactions with subunits that are shared between different CM complexes. For example, acidic activation domains such as those found in the yeast Gal4 TF can recruit both the SAGA and NuA4 HATs through interactions with the Tra1 subunit that is present in both these complexes [87–89] (Fig. 11.2a). The great diversity of TF binding partners may serve multiple purposes. First, it enables the same TF to participate in distinct mechanisms of transcription initiation at different genes, as has been described for the activation of transcription by Pho2 and Pho4 at the PHO5 and PHO8 promoters in budding yeast [83]. Second, the transient nature of TF interactions at individual regulatory regions [26–29] could allow for repeated cycles of TF binding to the same target site with different coregulators, enabling a TF to affect initiation in more than one way. The particular coregulator(s) recruited at each site likely depends on other elements such as local chromatin structure and interactions with other TFs.

Fig. 11.2

Targeting of chromatin remodelling by TFs. a The diversity of TF interactions with CMs is increased through shared subunits in remodeler complexes, as illustrated here by the interaction between the Gal4 TF and the Tra1 subunit in the SAGA and NuA4 complexes. b Targeting of the RSC complex in S. cerevisiae by the Rsc3 TF subunit. c CBP hub function at the IFN-β enhanceosome. CBP interacts with the enhanceosome TFs, resulting in recruitment of the RNA polymerase II holoenzyme, PIC assembly and the initiation of transcription [274]

In addition to mediating targeting through transient interactions, TFs can be integrated into CM complexes as stable components (Fig. 11.2b). The budding yeast TF Rsc3 is a subunit of the RSC chromatin remodelling complex [90], and was shown to promote nucleosome exclusion at promoters containing Rsc3 binding motifs [91], suggesting that it directs the RSC complex to these locations. Likewise, the Iec1 TF subunit of the INO80 complex is required for recruitment to target genes in fission yeast, and for associated histone remodelling [92]. Numerous putative DNA binding domains have also been identified in subunits of SWI/SNF remodelers in higher eukaryotes, including high mobility group (HMG) domains, C2H2 zinc fingers, and AT–rich interaction domains (ARIDs) [93]. The function of these domains is still largely uncharacterized and some, such as the HMG and ARID domains, are known to predominantly bind DNA in a sequence-independent manner and likely have structural roles [94, 95]. Nevertheless, it is possible that others will turn out to be important for targeting. Interestingly, the integration of sequence-specific TFs in remodelling complexes does not appear to be highly conserved between species. The RSC complex in higher eukaryotes lacks the specific DNA-binding determinants found in yeast [93, 96]; similarly, the INO80 component Iec1 is fungal-specific and has no ortholog in budding yeast. The stable integration of these particular TFs in remodelling complexes may therefore be the result of adaptations to specific selective pressures during evolution.

The multitude of subunits found in CMs means that they too can have many binding partners, greatly increasing their potential to regulate diverse targets. The subunit composition of complexes associated with each CM can also vary, such that different versions can pair with distinct sets of TFs. This enables individual complexes to be involved in gene- and cell type-specific functions, as exemplified by the mammalian SWI/SNF-type ATPases Brahma (BRM) and its paralog Brahma related gene 1 (BRG1), which are part of numerous chromatin remodelling complexes that target specific promoters to control gene expression [97]. BRG1 can be associated with WINAC (WSTF including nucleosome assembly complex), which can inhibit or activate target gene expression through subunit–specific interactions with the Vitamin D receptor [98]. Alternatively, when incorporated in the NUMAC (nucleosomal methylation activation) complex it can associate with estrogen receptor-responsive promoters to activate transcription [99]. Dynamic changes in CM subunit composition during development have also been shown to result in alterations in targeting by TFs. For example, the BRG1/BRM associated factors (BAFs) BAF45A and BAF53A in the SWI/SNF-type neuronal-progenitor-specific BAF complex (npBAF) are replaced by BAF45B and BAF53B upon differentiation, to form a neuron-specific complex (nBAF) [100]. The inclusion of BAF53B allows the nBAF complex to interact with the calcium-responsive transactivator (CREST) to regulate genes that are essential for dendritic outgrowth in the differentiated cells [101]. A similar requirement for specific BAF complex components has been observed in the differentiation of cardiomyocytes, where ectopic expression of the GATA4 and TBX5 TFs in combination with the BAF60C but not BAF60A subunits can induce the differentiation of mesoderm into contracting cardiomyocytes in developing mouse embryos [102]. 
Together, these observations indicate that TF binding can be interpreted differently in distinct cell types, depending on the complement of coregulators that is expressed. This modularity underscores the importance of combinatorial subunit assembly in establishing gene regulatory networks and reveals an additional layer of complexity that must be considered in our attempts to reconstruct these networks.

CM complexes can also be used as scaffolds for the assembly of different components of the transcriptional machinery. Indeed, the main catalytic function of CMs is sometimes dispensable altogether, as illustrated by the fact that SAGA-mediated activation of GAL genes does not require its HAT activity [103–105]. Instead, SAGA is believed to serve as a platform for the assembly of the PIC at GAL promoters. Similar functions have also been demonstrated for the general transcriptional coactivators CREB binding protein (CBP) and P300, two highly similar HATs with homologs in most multicellular organisms. In addition to the HAT domain, P300/CBP proteins contain other domains that mediate interactions with RNA polymerase II and a multitude of basal and gene-specific TFs [106, 107], allowing P300/CBP proteins to operate as hubs that can integrate signals from multiple TFs. This function has been most clearly described at the IFN-β enhanceosome, a stable complex of TFs and other nucleoproteins directly upstream of the IFN-β core promoter [108]. In this complex, CBP simultaneously interacts with multiple TFs bound across a 55 bp region, acting as a mediator for their synergistic activation of IFN-β transcription [108, 109] (Fig. 11.2c).

Consistent with their numerous interaction partners, P300/CBP have been linked to regulation of many genes, often acting at enhancers. Indeed, recent ChIP studies have identified P300/CBP binding as a key component of a wider signature of histone modifications and trans-acting factors that distinguish distal enhancers from gene promoters [20, 110–114]. Another component of this signature is H3K4 monomethylation, which peaks at enhancers but not promoters. Nevertheless, despite the predominance of P300/CBP at distal enhancers, both proteins can also be associated with proximal promoters and genes [115], underscoring their versatile roles in gene regulation.

11.4 Determinants of TF Access to Chromatin

A complicating factor for any model of chromatin remodelling based primarily on targeting by TFs is that they typically recognize small DNA motifs (~6–12 bp) that can occur randomly at high frequencies. For example, an 8-bp recognition sequence is expected to appear roughly 45,000 times in a human-sized genome with random sequence composition, and in reality this number will be dozens of times greater considering that TFs typically bind degenerate motifs in vitro [116]. Chromatin is believed to significantly increase TF specificity by reducing the accessibility of many spurious binding sites [117, 118]. This central role of chromatin in restricting where transcription initiation takes place is underscored by observations that failure to properly reconstitute nucleosomes in the body of transcribed yeast genes results in the appearance of cryptic transcripts, presumably initiated from exposed sequences that resemble promoters [119, 120]. Nonetheless, the packaging of DNA by nucleosomes is not the only means by which TF specificity is achieved in vivo. For example, TF–TF interactions, direct or indirect (e.g. through scaffold proteins or by outcompeting nucleosomes), can decrease the number of potential target sites due to the larger size of the combined binding specificity. Moreover, recognition sites are often clustered together in regulatory regions, allowing for further synergistic interactions between TFs [121–123]. A more in-depth overview of the various factors that play a role in TF target site selection can be found in Chapters 8 and 9.
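The estimate of motif frequency in a random genome can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes a genome of 3 × 10^9 bp with uniform, independent base composition and counts a single strand, and the function name and the per-position degeneracy list are hypothetical conveniences rather than anything taken from the cited studies.

```python
from math import prod

# Assumed round figure for a "human-sized" genome (illustrative only).
HUMAN_GENOME_BP = 3_000_000_000


def expected_occurrences(degeneracy, genome_bp=HUMAN_GENOME_BP):
    """Expected matches of a motif on one strand of a uniformly random genome.

    `degeneracy` lists, for each motif position, how many of the four bases
    are tolerated there (1 = fully specified, 4 = any base).
    """
    if any(d not in (1, 2, 3, 4) for d in degeneracy):
        raise ValueError("each position must tolerate between 1 and 4 bases")
    k = len(degeneracy)
    # Probability that a random k-bp window matches the motif.
    p_match = prod(d / 4 for d in degeneracy)
    # Number of k-bp windows on one strand of a linear sequence.
    n_windows = genome_bp - k + 1
    return n_windows * p_match


exact_8mer = expected_occurrences([1] * 8)
# Hypothetical motif in which two of the eight positions accept two bases each.
degenerate_8mer = expected_occurrences([1, 1, 2, 1, 1, 2, 1, 1])
print(f"exact 8-mer: {exact_8mer:,.0f}")
print(f"8-mer with two 2-fold degenerate positions: {degenerate_8mer:,.0f}")
```

Under these assumptions a fully specified 8-mer is expected about 45,800 times, and each two-fold degenerate position doubles that figure, which illustrates how quickly degeneracy inflates the number of spurious sites.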

The fact that nucleosomes can restrict access to DNA to prevent spurious transcription raises an important question: how can TFs bind their bona fide target sites to initiate the remodelling required for active transcription, given that much of the genome is covered by nucleosomes? Part of the answer to this question lies in the aforementioned fact that regulatory sites tend to be associated with open chromatin and nucleosome depleted regions (NDRs) [12–15, 124]. In yeast and C. elegans, there is strong evidence that the intrinsic DNA sequence preferences of nucleosomes play a key role in establishing these regions, and that these preferences are encoded in the genome sequence [125, 126]. Rigid DNA sequences such as poly(dA:dT) tracts are common in many eukaryotic promoters and have long been known to disrupt nucleosome–DNA interactions, increasing accessibility of nearby TF binding sequences [12, 127–130] (Fig. 11.3a). For example, the presence of a poly(dA:dT) tract in the Candida glabrata AFT1 promoter destabilizes a well-positioned nucleosome containing a metal responsive element, enabling Aft1 to bind and autoactivate its gene expression [131–133]. Poly(dA:dT) tracts were also found to be major determinants of nucleosome exclusion in studies aimed at predicting in vivo nucleosome positions from DNA sequence features in a range of species [12, 134, 135]. Perhaps the most direct indication of the importance of intrinsic nucleosome sequence preferences in the establishment of NDRs at promoters has come from comparisons of in vivo yeast nucleosome occupancy patterns to those of nucleosomes reconstituted in vitro on purified yeast genomic DNA, which showed a high correlation between the two profiles [125, 136]. The importance of nucleosome disfavouring sequences in establishing NDRs is now widely accepted, though there is still some debate about the degree to which intrinsic sequence preferences dictate nucleosome positions outside these regions [137–140].
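As a simple illustration of how poly(dA:dT) tracts might be located in a promoter sequence, the sketch below scans one strand for homopolymeric runs of A or T. It is not the feature definition used in the cited prediction studies: the function name and the default minimum length of 5 bp are assumptions chosen for the example.

```python
import re


def find_poly_da_dt_tracts(seq, min_len=5):
    """Return (start, end, base) for each run of A or T of at least min_len.

    Coordinates are 0-based and end-exclusive. A run of T on the given strand
    corresponds to a run of A on the complementary strand, so both are
    reported as poly(dA:dT) tracts. min_len is an assumed threshold.
    """
    if min_len < 1:
        raise ValueError("min_len must be a positive integer")
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [
        (m.start(), m.end(), m.group()[0])
        for m in pattern.finditer(seq.upper())
    ]


# Toy sequence with a 7-bp A tract and a 6-bp T tract.
toy_promoter = "GCG" + "A" * 7 + "GC" + "T" * 6 + "CG"
for start, end, base in find_poly_da_dt_tracts(toy_promoter):
    print(f"{base}-tract at {start}-{end} ({end - start} bp)")
```

Tracts found this way could then be compared against measured nucleosome occupancy, although real models weigh many additional sequence features.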

Fig. 11.3

Mechanisms of TF access to chromatin. a Rigid poly(dA:dT) elements (red) are refractory to nucleosome assembly, allowing TFs to access nearby binding sites. b A model for progressive opening of chromatin by sequential binding of multiple TFs, as proposed by Polach and Widom [153]. c Presentation of the Gal4 UAS by RSC (Reproduced, with permission, from [63]). The model shows two exposed binding sites in the Gal4 UAS in the RSC/UASg/nucleosome complex and that Gal4 (red and purple) can access these sites without disrupting the complex. The structure of the RSC/nucleosome complex (yellow) was determined by cryo-electron microscopy [275] and the position of the DNA helix is indicated in green and blue

Despite their general applicability, models based on intrinsic nucleosome sequence preferences alone cannot fully explain the architecture of promoters and other regulatory sequences observed in living cells, even in yeast. An assessment of the influence of a wide range of sequence features on in vivo nucleosome positioning in budding yeast revealed additional strong nucleosome excluding elements that corresponded to binding motifs of sequence-specific TFs such as Reb1 and Abf1 [12]. The role of these factors in establishing NDRs was confirmed in Reb1 and Abf1 loss-of-function mutants that showed greatly increased nucleosome occupancy at hundreds of promoters containing their binding motifs [91, 141]. Moreover, the in vitro reconstituted nucleosome occupancy at Abf1 and Reb1 binding sites was higher than that measured in vivo [125]. Taken together, these data clearly indicate that TFs are capable of establishing NDRs at yeast promoters that lack intrinsic nucleosome-disfavouring sequences. Correspondingly, the concept of a universally encoded open promoter structure does not appear to apply to all genes: a subset of yeast genes that display highly variable expression levels have increased nucleosome occupancy in their promoters, consistent with predictions based on intrinsic sequence preferences [142]. It was proposed that the positioning of nucleosomes in these promoters plays a key role in the variable regulation of these genes.

The degree of basal nucleosome occupancy at promoters and other regulatory sequences also appears to vary between species. When applied to the human genome, models based on intrinsic nucleosome sequence preferences actually predict an overall increased occupancy at regulatory sites, in sharp contrast to most yeast promoters [143]. One explanation that was offered for this difference is that higher eukaryotes have greater requirements for variable gene expression, such as in the case of cell-type specific genes, and a constitutive open state might therefore not be desired [143]. Examples of TF binding to regions with high nucleosome occupancy have been described for the CCCTC-binding factor (CTCF) [144] and p53 [145], suggesting that the predicted increased nucleosome binding preferences in regulatory regions are relevant in vivo. Given these various observations it is evident that other mechanisms must exist to ensure TF access to DNA in regulatory regions that are occupied by nucleosomes. One model of TF binding to nucleosomal DNA that does not depend on external factors is based on in vitro observations that compacted DNA can undergo spontaneous transitions to more open states, allowing for brief windows of opportunity for TF access [146–148]. These movements can affect relatively small regions of DNA near the nucleosome entry sites, a process referred to as “nucleosome breathing”, or involve the unwinding of DNA over longer stretches [147, 149]. The increased accessibility of DNA at nucleosome entry sites is consistent with observations that TF binding sites are, on average, enriched at these locations in vivo [150–152]. Given the need to prevent cryptic transcription initiation, the thermodynamic balance in cells is likely such that individual TF binding events are not sufficient to prevent rapid rewrapping of nucleosomal DNA; however, cooperative binding of multiple TFs may overcome this barrier.
Polach and Widom proposed that the binding of one TF could lead to further unwinding of the DNA on a nucleosome, enabling other factors to bind to nearby sites in a stepwise process that could ultimately result in a stable TF-DNA complex [153] (Fig. 11.3b). This cooperative model of TF access to nucleosomal DNA has two major additional benefits. First, it enables TFs to interact with each other without direct protein-protein contacts, creating new opportunities for coregulated gene expression. Second, the requirement for multiple closely spaced TF binding sites ensures regulatory site specificity. Cooperative binding of TFs to nucleosomal DNA has been demonstrated both in vitro [154] and in vivo [154–157], though it remains difficult to assess how widespread this mode of regulation is across the genome.
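The logic of nucleosome-mediated cooperativity can be made concrete with a deliberately simplified equilibrium sketch. This is not the published model: it assumes that a single unwrapping event, with equilibrium constant `k_open` much smaller than 1, exposes the sites of two TFs at once, that the TFs bind the exposed DNA independently with no direct protein-protein contact, and that concentrations are expressed relative to each TF's dissociation constant on naked DNA. All names and parameter values are hypothetical.

```python
def occupancy_a(a, b, k_open):
    """Fractional occupancy of TF A's site on a two-site nucleosome.

    a, b   : concentrations of TFs A and B divided by their naked-DNA Kd
    k_open : equilibrium constant for wrapped -> exposed DNA (assumed << 1)

    Statistical weights (simplifying assumption: one unwrapping step exposes
    both sites): wrapped = 1, exposed = k_open, exposed with A = k_open * a,
    exposed with B = k_open * b, exposed with A and B = k_open * a * b.
    """
    if min(a, b, k_open) < 0:
        raise ValueError("concentrations and k_open must be non-negative")
    # Partition function: the wrapped state plus all exposed states.
    z = 1.0 + k_open * (1.0 + a) * (1.0 + b)
    # States in which A is bound: exposed with A, and exposed with A and B.
    return k_open * a * (1.0 + b) / z


alone = occupancy_a(a=1.0, b=0.0, k_open=0.01)
with_partner = occupancy_a(a=1.0, b=100.0, k_open=0.01)
print(f"A alone: {alone:.4f}; A with saturating B: {with_partner:.4f}")
```

In this toy setting, adding the second TF raises the occupancy of the first even though the two proteins never touch, because B binding traps the exposed state that A also requires. The published model is stepwise, with sites exposed progressively from the nucleosome edge, which this single-step sketch does not capture.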

There is also evidence that TFs can interact with DNA in a manner that involves additional direct contacts with nucleosomes. For example, FOXA1 (HNF3A) binds more strongly to nucleosomal DNA than to naked DNA [158]. The source of this unique behaviour can be traced to the protein structure of the FoxA family members. FOXA1-3 contain a C-terminal domain that interacts with the core histones H3 and H4, as well as a winged helix N-terminal forkhead DNA binding domain that structurally resembles that of linker histone H1 [159]. In stark contrast to H1 linker histones, which are known for their ability to stabilize nucleosomes and higher order chromatin structures [160, 161], FoxA factors have intrinsic chromatin opening activity [159, 162]. Interestingly, this activity does not require the action of CMs such as SWI/SNF. Because of their ability to open condensed chromatin, FoxA proteins have been proposed to function as “pioneer” TFs that facilitate the binding of other factors [159]. A similar pioneer function has also been described for the RAR and RXR members of the nuclear receptor family, due to their ability to bind a highly compacted chromatin fibre containing a PEPCK promoter in an in vitro system that recaptured the chromatin dynamics observed at this promoter in vivo [163]. In this system, the action of the RAR/RXR heterodimer together with CMs was required to disrupt the chromatin for subsequent binding of nuclear factor 1 (NF1), an essential coregulator for transcriptional activation of PEPCK. The requirement for additional coregulators in transcriptional activation by both FoxA and RAR/RXR may be essential to ensure that their actions do not result in spurious transcription at non-specific sites in the genome. In the case of FoxA, methylation patterns associated with repressive or active chromatin domains also further guide recruitment to specific sites [164].

Other TFs that are able to access condensed chromatin include the CAAT-box/enhancer binding protein (C/EBP), though its pioneering role may be limited to a subset of genes [165]. In yeast, the Reb1 and Abf1 TFs can clearly function as pioneers as well, as evidenced by their aforementioned ability to direct the formation of NDRs [91, 141]. Finally, Gal4 upstream activating sequences (UAS) are able to form mini-promoters regardless of their location in the genome [166], indicating that Gal4 binding can also disrupt chromatin. The Gal4 UAS used in this study contained multiple Gal4 binding sites, suggesting cooperative binding as a possible mechanism underlying this effect. Alternatively, Gal4 access to nucleosomal DNA can also be aided by the actions of CMs in a manner that does not involve displacing nucleosomes away from binding sites, as it was recently shown that the RSC complex can envelop and partially unwind a nucleosome in the GAL1/GAL10 promoter, with RSC essentially “presenting” this element for Gal4 binding [63] (Fig. 11.3c).

11.5 A Dynamic Regulatory Role for Chromatin

Up to this point, the relationship between TFs and chromatin has mainly been explored in terms of how TFs overcome the chromatin barrier to access DNA and facilitate further remodelling. However, the involvement of chromatin in gene expression goes beyond merely forming a passive impediment to TF binding. Indeed, there are many indications that CMs are causative for gene expression outputs, so presumably they must be both regulated and regulatory. In the remainder of this chapter I will examine some of the other roles of chromatin remodelling, such as effecting transcriptional repression and controlling the accessibility and activity of regulatory regions, as well as establishing higher-order chromatin organization. In all these cases the role of TFs will be highlighted in particular.

CMs are essential coregulators in TF-mediated repression of many target genes. A large number of these coregulators belong to a family of histone deacetylases (HDACs) [167, 168], which catalyze the removal of acetyl groups that are closely associated with a relaxed chromatin structure. Accordingly, they prevent initiation by maintaining chromatin in a condensed state that is inaccessible to the transcription machinery. Some of the effects of HDACs may also be mediated by deacetylation of proteins other than histones, such as TFs [169]. Like their HAT counterparts, HDACs typically operate as part of larger corepressor complexes that include other chromatin binding or remodelling activities, as has been described for the NURD [81] and NCoR (nuclear receptor corepressor) complexes [168, 170]. The importance of HDACs in transcriptional repression is reflected in the size of their family, which includes as many as 6 different members in yeast and 18 in human, distributed over four main classes [171]. In addition to HDACs, other CMs such as ATPase nucleosome remodelers have also been implicated in the formation of repressive chromatin structures. For example, the ISW2 complex can be recruited to a large variety of promoters by the Ume6 repressor in budding yeast, where it establishes a repressive chromatin environment as evidenced by decreased nuclease sensitivity [172]. SWI/SNF remodelers can also effect transcriptional repression, either directly [173–175], or as part of larger corepressor complexes that include deacetylase activities [81, 170, 176]. In contrast to HDACs, the mechanisms by which ATPase remodelers act to repress transcription are less well understood, but presumably involve chromatin compaction [172, 173] and/or the repositioning of nucleosomes to block important TF binding sites [177].

By condensing chromatin at promoters of repressed genes, CMs can place important restrictions on the actions of TFs, as illustrated by the effects of the Tup1-Cyc8 corepressor on Rap1-mediated gene activation in budding yeast [178]. The Tup1-Cyc8 complex was one of the first corepressors to be identified [179] and is targeted to promoters by a variety of sequence-specific TFs [180–183] where it recruits HDACs and the Isw2 remodeler complex to induce chromatin condensation [184, 185]. Among the Tup1-Cyc8 targets are promoters of genes that are bound by Rap1 in low- but not high-glucose conditions, despite the fact that Rap1 directs the expression of other genes encoding glycolytic enzymes and ribosomal protein subunits when glucose is present [186, 187]. The increased number of Rap1 targets in low glucose is even more surprising given that global Rap1 levels actually decrease during a shift to low-glucose medium [178]. The contradictory behaviour of Rap1 binding was explained by the actions of Tup1-Cyc8, which prevent Rap1 binding to low-glucose specific genes when glucose is present. The Tup1-Cyc8-mediated promoter compaction is only released upon glucose depletion, presumably through a mechanism that involves the release or inactivation of the TFs responsible for recruiting Tup1-Cyc8, allowing Rap1 to bind [178]. This example shows that chromatin remodelling can provide an additional level of regulation of gene expression by preventing activators from recognizing their binding sites in target promoters.

An unexpected finding has been that the actions of chromatin-targeting corepressors are not just limited to transcriptionally silent regions. Genome-wide ChIP experiments have revealed that HDACs are also associated with active promoters [188, 189]. Even more surprising, the degree of HDAC recruitment was positively correlated with transcription levels. To explain this paradox, it was proposed that the presence of HDACs at active promoters was needed to reset the chromatin state between subsequent rounds of initiation [189, 190], which suggests that histone acetylation – like TF and nucleosome interactions – may be inherently transient. Indeed, the dynamic nature of TF interactions with DNA in vivo may well be directly connected to negative feedback from CMs. For example, the human glucocorticoid receptor can be actively removed from promoter templates by SWI/SNF remodelers [26, 191] and Rsc2 can speed up the release of Ace1 from non-specific binding sites in yeast [27]. Nevertheless, the presence of remodelling complexes associated with repression at active promoters does not necessarily have to be associated with returning these promoters to their basal state. The yeast SWI/SNF ATPase Mot1 is a global repressor known for its role in removing TBP from DNA [192], and like HDACs, its presence at promoters is positively correlated with transcript levels [193]. However, in this particular case it was shown that Mot1 can actually make a positive contribution to PIC assembly at active promoters by releasing a transcriptionally inert TBP complexed with the NC2 inhibitor, thereby allowing entry of free TBP and productive initiation [193].

The precise positioning of nucleosomes at promoters may also be important for establishing regulated gene expression, as illustrated by the actions of the RSC complex at the CHA1 promoter in budding yeast. In uninduced conditions, RSC represses CHA1 expression by placing a nucleosome over the TATA box, resulting in a decreased level of TBP binding [177, 194]. Crucially, in the absence of two key RSC components (Swh3 and Sth1), the expression levels of CHA1 in uninduced cells are approximately equal to those observed in fully induced cells. Thus, the presence of an inhibitory nucleosome over binding motifs recognized by the basal transcription machinery is vital for maintaining activator-regulated expression of CHA1. Similar regulation mechanisms are likely far more widespread, given the aforementioned observation that yeast genes with variable expression levels tend to have increased nucleosome occupancy within their promoter regions, often overlapping TATA boxes [142]. Taken together, these various observations show that the complex interplay between chromatin, CMs and TFs affects all aspects of transcription regulation.

11.6 TFs and Higher Order Chromatin Organization

In addition to the localized organization at the level of individual regulatory regions, chromatin is also arranged into higher-order structures that can span broad regions and affect multiple genes. These domains typically share a common chromatin environment that is characterized by a specific signature of histone marks and associated proteins. Classic examples of such domains include the condensed heterochromatin regions found at telomeres and in the pericentric regions surrounding centromeres in most organisms, as well as the mating-type loci in yeasts [195]. The heterochromatin in these regions is characterized by the presence of heterochromatin protein 1 (HP1) [196], histone hypoacetylation and H3K9 methylation (H3K9me) [197]. The co-occurrence of these marks is no coincidence, as H3K9me serves as an anchor point for the chromodomain that is present in HP1 [75]. Homologues of HP1 have been identified in Drosophila, vertebrates and fission yeast and its loss invariably leads to defects in telomere and centromere function. Additional domains marked by HP1 and H3K9me have also been associated with silencing of a number of genes dispersed throughout the genome [198–200].

A second important type of chromatin domain involved in gene silencing is established by Polycomb group (PcG) proteins. PcG proteins were initially identified as key developmental regulators of the Hox gene cluster in Drosophila (Reviewed in [201]), and two main PcG protein complexes have since been characterized with distinct roles in silencing in plants, vertebrates and flies. Polycomb repressive complex 2 (PRC2) has histone modifier activity and trimethylates H3K27, a characteristic signature of PcG chromatin domains, which can span up to 100 kb [202–204]. This methylation mark can be read by PRC1, which possesses ubiquitination activity. The specific mechanisms underlying HP1 and PcG silencing have been discussed in great detail elsewhere [195, 205–207]. Here, I will use these two domain types to illustrate the role of TFs in establishing higher order chromatin structure.

Heterochromatin typically originates at specific nucleation sites from which chromatin condensation spreads along the chromatin fibre. At telomeres, pericentric regions and yeast mating type loci, these nucleation sites often consist of highly repetitive DNA elements [208–210]. Studies in fission yeast have shown that repeat-based silencing depends on transcription of the repetitive regions and RNAi pathways [211, 212], and similar mechanisms have since been found to operate in fly, plants and vertebrates (Reviewed in [213]). There are also many examples where silencing is nucleated by TF binding, however. In fission yeast, the Pcr1 and Atf1 TFs can bind a heptamer sequence in the REIII element at the mating-type locus [214] and recruit the silencing factors Clr4 (a histone methyltransferase), Swi6 (the HP1 homolog) and Clr3 (a histone deacetylase) [215, 216]. Budding yeast lacks HP1 homologs, but possesses silent information regulator (SIR) proteins that perform similar functions and which can be recruited to telomeres and mating-type loci by the synergistic actions of Rap1, Abf1 and Orc1 [217]. In tetrapods (four-limbed vertebrates), a large family of Krüppel-associated box domain zinc finger TFs (KRAB-ZF) has also been implicated in silencing. The KRAB domain that characterizes this family interacts with KRAB associated protein 1 (KAP1) [218, 219], which acts as a scaffold for several heterochromatin-associated proteins, including HP1 [220–222]. Synthetic TF constructs with KRAB domains have been shown to induce heterochromatin silencing over broad regions, up to 12 kb away from their binding site [223, 224]. Natural KRAB-ZF proteins have been linked to the autoregulation of large clusters of KRAB-ZF genes [199, 200], but given that KRAB domains are present in more than 200 human TFs, they likely play a much wider role in chromatin metabolism. The KRAB domain is also discussed in Chapters 4 and 12 of this volume.

In contrast to HP1-associated heterochromatin, the origins of Polycomb domains are less well understood. In Drosophila, silencing by PcG proteins is driven by Polycomb response elements (PREs), which contain binding sites for the Pleiohomeotic (PHO) and PHO-like zinc finger TFs [225, 226], the only PcG proteins identified to date with DNA sequence specificity. The importance of PHO and PHO-like for PRE function is firmly established, as their disruption results in silencing defects at Hox genes [225, 227, 228] and a loss of PRC1 and PRC2 components [228]; however, PHO binding sites alone are insufficient to confer PRE-mediated silencing [225, 226, 229]. Many other TFs have been shown to bind PREs in Drosophila, including Pipsqueak, Zeste and GAGA factor (GAF) (Reviewed in [72]), but their role in silencing is unclear, given that null mutants for many of these genes do not show obvious PcG phenotypes. One possible explanation is that these TFs act synergistically at PREs, which is consistent with computational analyses that show that clusters of TF binding motifs – but not individual sites – can distinguish PRE from non-PRE sequences [230]. Redundancy between factors may explain why some null mutants do not show phenotypes.
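
The idea that clusters of motifs, rather than individual sites, mark candidate PREs can be sketched as a simple sliding-window scan. This is only a toy illustration and not the algorithm used in [230]. The function names, the window size and the threshold are hypothetical. GAGAG is the GAF/Pipsqueak motif mentioned in this chapter, whereas GCCAT is used here as an assumed, simplified stand-in for a PHO/YY1 core site. Only exact matches on the forward strand are considered.

```python
# Illustrative motif set; the PHO core string is an assumption for this sketch.
MOTIFS = {"GAF_Psq": "GAGAG", "PHO_core": "GCCAT"}


def motif_positions(seq, motif):
    """Return start positions of all exact (possibly overlapping) matches."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def clustered_windows(seq, motifs, window=50, min_distinct=2):
    """Return window starts that fully contain >= min_distinct different motifs."""
    hits = {name: motif_positions(seq, m) for name, m in motifs.items()}
    starts = []
    for w in range(0, max(1, len(seq) - window + 1)):
        # Count how many distinct motif types have a match fully inside the window.
        present = sum(
            any(w <= p and p + len(motifs[name]) <= w + window for p in pos)
            for name, pos in hits.items()
        )
        if present >= min_distinct:
            starts.append(w)
    return starts


# A sequence with both motif types close together is flagged;
# a sequence with a single GAGAG site is not.
clustered = "TTTT" + "GAGAG" + "TTTT" + "GCCAT" + "TTTT"
single = "T" * 10 + "GAGAG" + "T" * 10
print(clustered_windows(clustered, MOTIFS, window=20))
print(clustered_windows(single, MOTIFS, window=20))
```

A realistic predictor would use position weight matrices, both strands and a score trained on known PREs; the sketch only captures the requirement for co-occurring sites.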

Even less is known about PRC recruitment in vertebrates, where it has proved challenging to identify PREs because PcG proteins are often distributed over broad regions [202, 204, 231, 232]. A 3 kb DNA fragment in the MafB gene region that possesses activities consistent with a PRE was recently identified in mouse [233]. This fragment, named PRE-kr, was shown to bind PcG proteins and contains conserved binding sites for the mammalian PHO homolog YY1, as well as GAGAG motifs that are known to be bound by GAF and Pipsqueak in Drosophila. Another PRE with conserved YY1 binding sites has since been characterized in the human HOXD cluster, and disruption of these sites negatively affected binding of the PRC1 component BMI1 [234]. The role of YY1 in PcG silencing is consistent with earlier observations that YY1 knockdown results in loss of recruitment of the PRC2 component Ezh2 and H3K27me [235], as well as with other studies that have shown that YY1 interacts with PcG components [236–238]. Taken together, these data suggest that at least some of the PcG-targeting mechanisms are conserved between flies and mammals. Nonetheless, other TFs such as the embryonic stem cell regulators OCT4 and NANOG may also be involved in targeting PcG proteins in mammals, based on their high degree of overlap with PcG proteins in ChIP studies [202, 231, 239]. Moreover, the discovery of the HOTAIR transcript, which targets PRC2 to the human HOXD locus, indicates that ncRNAs also play a role in directing Polycomb silencing [240]. Future studies will undoubtedly reveal whether this latter mechanism is more widespread.

Several mechanisms are believed to operate to expand chromatin domains beyond their initial nucleation sites (Reviewed in [241]). One model of spreading described for HP1 family members depends on a self-sustaining wave of silencing complex assembly, which is based on the ability of HP1 to bind both H3K9-methylated histones and the methyltransferase responsible for this modification (Fig. 11.4a) [75, 77, 242]. Starting at the nucleation site, H3K9 methylation of neighboring nucleosomes by HP1-recruited methyltransferases creates new HP1 binding sites, resulting in more HP1 binding and further propagation of the signal. A similar mechanism involving repeated cycles of deacetylation has also been described for SIR proteins in budding yeast [243, 244]. Recurrent assembly cannot completely account for all observations of spreading from a nucleation site, however, as indicated by the following examples. In budding yeast, individual Rap1 and Abf1 binding sites that are unable to direct silencing independently can enhance the actions of a silencer that is 4 kb away [245], suggesting long–range interactions between these sites. Another signal spreading from a subtelomeric silencer was shown to “skip over” an active reporter gene flanked by subtelomeric antisilencing regions (STARS), but still affected a second distal reporter gene [246]. Finally, ChIP studies of PcG proteins in Drosophila have revealed distribution patterns that seem inconsistent with a progressive spreading of Polycomb complexes. For example, while the H3K27me3 mark is consistently found in large domains [203, 247–250], the PRC1 components Ph and Psc and the PRC2 methyltransferase E(z) are concentrated in much smaller peaks [203, 247]. Currently, the most favoured model to explain these various observations involves folding of the DNA in a manner that allows nucleation sites to contact and modify the surrounding chromatin (Fig. 11.4b), and has been proposed to explain the difference in distribution patterns of PcG components and H3K27me3 [251]. Several cases of long–range interactions between PREs and distant regulatory sites have also been described, forming higher order chromatin loop configurations that may facilitate gene silencing across broad domains [252, 253]. The relationship of TFs to higher-order chromatin structure is described in more detail in Chapter 13.
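
The recurrent-assembly model of spreading described above (Fig. 11.4a) can be sketched as a toy simulation on a linear array of nucleosomes. It is a deliberately simplified illustration: each nucleosome is either H3K9-methylated or not, a marked nucleosome marks each unmarked neighbour with a fixed probability per round, and there is no demethylation, looping or boundary element. The function name and all parameter values are hypothetical.

```python
import random


def spread_h3k9me(n_nucleosomes, nucleation, rounds, p_spread=1.0, seed=0):
    """Simulate linear spreading of a histone mark from one nucleation site.

    Returns a list of booleans, True where a nucleosome carries the mark.
    p_spread is the (assumed) chance per round that a marked nucleosome,
    via bound HP1 and its methyltransferase, marks an adjacent nucleosome.
    """
    rng = random.Random(seed)
    marked = [False] * n_nucleosomes
    marked[nucleation] = True  # TF-directed nucleation event
    for _ in range(rounds):
        updated = list(marked)
        for i, is_marked in enumerate(marked):
            if not is_marked:
                continue
            for j in (i - 1, i + 1):  # nearest neighbours only
                if 0 <= j < n_nucleosomes and not marked[j]:
                    if rng.random() < p_spread:
                        updated[j] = True
        marked = updated
    return marked


# With p_spread = 1 the domain grows by one nucleosome per side per round.
domain = spread_h3k9me(n_nucleosomes=11, nucleation=5, rounds=3)
print("".join("M" if m else "-" for m in domain))
```

Because spreading in this sketch is strictly nearest-neighbour, it cannot reproduce the "skipping" and long-range effects noted above, which is one way to see why looping models have been proposed.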

Fig. 11.4

Formation of chromatin domains. a Mechanism of spreading for HP1 heterochromatin at the S. pombe mating type locus from TF nucleation sites (Modified from [214]). Atf1 and Pcr1 binding results in the recruitment of the Clr3 histone deacetylase, which subsequently cooperates with heterochromatin proteins (HP) such as the HP1 homolog Swi6 to promote H3K9me of neighbouring nucleosomes. This creates additional HP1 binding sites, which form the basis for the spreading process. b Schematic representation of spreading of chromatin domains by looping interactions between the nucleation site and the surrounding DNA. c Model for the enhancer-blocker function of CTCF. Interactions between distant CTCF binding sites can form looped domains, thereby isolating genes from the actions of upstream enhancers

Given that silencing can propagate autonomously along the chromatin fibre, and that distal regulatory elements such as PREs and enhancers can operate over large distances, how are their effects on one region of the genome kept from spilling over to nearby genes? The answer to this question lies in yet another group of regulatory elements called insulators [254–256], which possess one of two distinct characteristics: (1) they can block enhancers from activating genes when placed between the enhancer and the gene or (2) they can act as boundary elements to prevent the spread of the silencing effects of heterochromatin. These two activities are separate and measured in different assays, though many insulators can perform both functions in vivo, such as the 5′HS4 insulator in the chicken β-globin locus [257, 258]. Once again, TFs play a central role in establishing insulator regions, and at least five different insulator-binding TFs have been identified in Drosophila to date: ZW5, Su(Hw), dCTCF, BEAF, and GAGA (Reviewed in [259]). In contrast, most vertebrate insulators appear to depend on only a single TF, the CCCTC-binding factor (CTCF) [257]. CTCF is considered to mainly function as an enhancer blocker rather than as a boundary protein, as evidenced by the fact that it is dispensable for blocking the spread of heterochromatin at the chicken β-globin locus [260]. Instead, this latter function depends on the USF1 TF, which binds boundary elements in the 5′HS4 insulator as a heterodimer with USF2 [258, 261]. The USF1/USF2 heterodimer recruits HATs and the SET 7/9 methyltransferase, which establish a region of open chromatin that is thought to prevent the progression of silencing, analogous to the manner in which firebreaks prevent forest fires from spreading. In contrast, enhancer-blocking insulators such as those bound by Su(Hw) in Drosophila (Reviewed in [262]) or CTCF in vertebrates (Reviewed in [263]) have been suggested to operate by organizing chromatin into looped domains, isolating the genes contained inside from their distant regulatory elements (Fig. 11.4c). In addition, CTCF has also been implicated in anchoring DNA to the nuclear periphery, an area that is typically associated with a repressive chromatin environment, as it was found to be enriched at the boundaries of domains that are linked to the nuclear lamina [264].

11.7 Concluding Remarks

The complexity of chromatin–TF interactions is reflected in the considerable variability in initiation mechanisms for the few genes studied in great detail [83], suggesting that there are many routes leading to productive transcription. Indeed, considering that the requirement for coregulators at a single gene can vary depending on external conditions, and that promoters are typically unique in a genome, the number of transcriptional activation mechanisms may yet prove to be larger than the number of genes. Nonetheless, the number of possibilities is clearly not unlimited, since at any given regulatory region only a subset of TFs and their coregulators play a dominant role. Thus, it should be possible to build a catalogue of the proteins most commonly bound to these elements in specific cell types, and eventually decode the mechanisms that control gene expression. ChIP in combination with either microarrays or next-generation sequencing is currently the most widely used method for the identification of the proteins and histone modifications associated with DNA [265, 266]; however, this technique has two main drawbacks. First, it can only identify the location of a handful of proteins at the same time, and second, it requires advance knowledge of the factor(s) to study. An alternative approach called proteomics of isolated chromatin segments (PICh) was recently developed that does not suffer from these limitations, and uses mass spectrometry to detect proteins associated with a chromatin segment [267]. If this approach were to be applied to the large collections of regulatory regions that are now being identified in genome-wide nuclease hypersensitivity assays such as those undertaken by the ENCODE and modENCODE consortia [268], it might greatly expand our knowledge of the interplay between TFs and chromatin at these locations.

Simply knowing which proteins are associated with a given genomic region will not be enough to understand how these proteins operate to regulate transcription, since they generally do not work in isolation. Protein–protein interaction maps should also greatly facilitate mapping gene regulatory mechanisms, since they reveal interactions between and among TFs and CMs [269]. Moreover, maps of long-range interactions between regulatory regions are needed to understand the interplay between promoters, enhancers, silencers and insulators. The advent of new technologies such as the numerous derivatives of chromosome conformation capture (3C) [270, 271] now makes such approaches possible at a genome-wide level (see Chapter 13). Finally, detailed knowledge of the affinities of TFs and their coregulators for DNA, as well as for their protein binding partners, will also be essential. This will require the application of techniques that can assess both the intrinsic DNA sequence specificities of TFs (see Chapter 8) and the binding kinetics of proteins, in a high-throughput and quantitative fashion. Potential strategies for the latter have been outlined by Segal and Widom [272]. Together, these various types of data will provide valuable insight into the ground rules that govern the interactions between DNA, chromatin and the transcription machinery. These rules can then form the basis for in silico modeling of these processes, which will be essential if we are to fully understand the intricate relationships between TFs and chromatin.