Introduction

When chromosomal DNA is wrapped around nucleosomes, it can become inaccessible for interactions with transcription factors, including activators, TFIID and RNA polymerase. For this reason, active or potentially active regulatory elements are typically located in nucleosome-free regions of chromatin and are hypersensitive to chemical agents or to cleavage by nucleases such as DNase I or MNase. Accessibility depends on a local configuration of nucleosome–DNA interactions that permits the assembly of a variety of different, often large, regulatory complexes. In Drosophila, these regulatory complexes include the combinations of general transcription factors and RNA polymerase found at promoter sequences, the different Polycomb group complexes assembled on Polycomb response elements (PREs), the protein complexes associated with boundary elements (insulators), the complexes that assemble on X-linked chromatin entry sites (CES) and the multiple factors that interact with enhancer sequences. In almost all of these cases, there are specialized DNA-binding factors that can alter the configuration of nucleosomes in the immediate neighborhood and help create and maintain nucleosome-free regions that are accessible to other, functionally specific factors.

One of the DNA-binding proteins in Drosophila that helps generate nucleosome-free regions of chromatin is the GAGA factor (GAF). GAF binds to (GA)n sequences throughout the genome [1] and is encoded by the Trithorax-like (Trl) gene [2]. It was first identified as a factor that binds to GAGAG motifs located in the upstream promoter regions of the Ultrabithorax (Ubx) [3] and engrailed (en) [4] genes and stimulates their transcription in nuclear extracts. Subsequent studies showed that GAGAG motifs are present in the upstream promoter regions of many different genes, including Krüppel, actin5C, ecdysone E74 and several heat shock genes (e.g. hsp70, hsp26, hsp27), and implicated these motifs in transcriptional activity either in vitro or in vivo [5, 6]. The localization of GAF to promoter regions was further documented in genome-wide studies, which showed that it associates with ~20% of Pol II-bound promoters and that this association correlates with the presence of paused Pol II at the promoter [7,8,9].

It was initially thought that GAF acts as a classical activator; however, studies by Kadonaga and colleagues demonstrated that GAF enhances transcription in vitro indirectly, by countering the repressive effects of histone H1 [10, 11]. Their findings led to the idea that GAF might function by counteracting the inhibitory effects of chromatin on transcription, and this view was reinforced by two different lines of evidence. One came from in vitro chromatin assembly experiments. When chromatin was assembled in early embryonic extracts on plasmid templates containing the hsp70 heat shock gene, the promoter region was inaccessible to nuclease cleavage; however, if assembly took place in the presence of GAF, or if GAF was added (together with ATP) after assembly was completed, the promoter region, but not other parts of the hsp70 gene, became accessible to nuclease cleavage [12]. These experiments suggested that GAF helped displace nucleosomes from the promoter region. A somewhat different result was obtained for the hsp26 promoter. In vivo, this promoter has two hypersensitive sites containing binding sequences for GAF and the heat shock factor, HSF, separated by a positioned nucleosome. When chromatin was assembled in vitro on a plasmid containing the hsp26 promoter region, a nucleosome was positioned in the promoter such that GAF and HSF could bind to their recognition sequences in the absence of added ATP; however, when ATP was included in combination with GAF or HSF, the nucleosome position was altered by what appeared to be a sliding mechanism [13]. GAF can also facilitate the remodeling of the fushi tarazu (ftz) promoter and its transcriptional activity in vitro. In this case, remodeling activity depends on four GAGA motifs in the promoter [14].

The second line of evidence came from in vivo studies on the hsp26 and hsp70 heat shock genes. In the case of the hsp26 gene, the requirements for forming a “native” chromatin structure appeared to be more stringent than those in vitro. The GAF recognition sequences in the proximal and distal nuclease hypersensitive sites of the hsp26 promoter were found to be critical not only for heat induction but also for generating the two hypersensitive sites. In contrast, the HSF recognition sequences were critical for induction but not for hypersensitivity [15,16,17]. While mutations in the TATA box led to only a minor reduction in hypersensitivity, more extensive sequence alterations that eliminate TFIID binding in vitro had a much greater effect on hypersensitivity, apparently without compromising GAF association with the promoter [18, 19].

The GAF recognition sequences in the upstream region of the hsp70 promoter have a similar function. Lee et al. found that mutations in the GAF sequences of the hsp70 promoter significantly reduce the level of paused polymerase under non-heat shock conditions and compromise the response of the promoter to heat induction [20]. Subsequent studies showed that GAF association with the upstream hsp70 promoter region is important for generating nuclease hypersensitivity and for the association of TFIID (TBP), the NELF pausing complex and HSF [9, 21, 22]. Mutations in the TATA box that reduce TFIID binding by at least 20-fold [18] and also deplete NELF do not affect GAF binding [9], indicating that GAF associates with the promoter before these factors. The role of GAF in generating nucleosome-free regions is also supported by genome-wide MNase-seq experiments, which show that many GAF-associated promoters and intergenic regions become much less accessible upon GAF RNAi knockdown [7]. Additionally, the association of Pol II (Rpb3) and of the CBP histone acetyltransferase [23] with a subset of promoters decreases after GAF RNAi knockdown.

The association of GAF with promoter regions dovetailed with the discovery that Trl mutations dominantly enhance the loss-of-function phenotypes seen in Ubx/+ heterozygous animals. Trl mutations were also found to enhance position-effect variegation of the white gene in the inversion in(1)wm4h. However, these were not the only phenotypes observed. Adult survivors carrying a hypomorphic Trl allele, Trl13C, have loss-of-function transformations in segment A6 (parasegment 11: PS11). Like the identities of A5 and A7 (PS10 and PS12), A6 identity is specified by Abdominal-B (Abd-B); however, A5 and A7 appear normal, suggesting that the misspecification of A6 (PS11) is due to a defect in the functioning of the A6 regulatory domain, iab-6, rather than in the activity of the Abd-B promoter [2]. Other findings also pointed to functions beyond ensuring that promoters are accessible to general transcription factors and transcriptional activators. For example, GAGA motifs are found in many Polycomb response elements (PREs). Consistent with GAF being important for the silencing function of these elements, mutations either in Trl or in the GAGA motifs weaken or disrupt Polycomb silencing [24,25,26,27,28,29]. Additional evidence that GAF is required for Polycomb-dependent silencing comes from chromatin immunoprecipitation (ChIP) experiments, which showed that GAF is associated with PREs in vivo [30,31,32]. GAF is also important for the functioning of fly chromatin boundary elements (insulators). GAF localizes to many known boundaries [33], and mutations in their GAGA motifs or in the Trl gene can impair insulator function [34,35,36,37,38].

GAF association with promoters, PREs, boundary elements, and probably also many enhancers [39,40,41] is connected to chromatin remodeling and to the process of zygotic genome activation (ZGA) in early embryos. The first indication of a possible role in the earliest steps of embryogenesis came from studies on the maternal effect allele Trl13C [42]. Although maternal deposition of Trl mRNA in embryos from homozygous Trl13C mothers is reduced but not eliminated, only a small percentage of these embryos hatch, and most arrest development prior to cellularization. Transcription of two genes that are normally turned on during nuclear cycles 12–14, fushi tarazu (ftz) and engrailed (en), is disrupted. The severity of this disruption in individual nuclei correlates with the extent of reduction in the amount of GAF protein in the nucleus. Even more striking than the defects in transcription are abnormalities in the pre-blastoderm nuclear division cycles of Trl13C embryos. The defects include asynchrony, incomplete chromosome segregation, chromosome fragmentation and nuclear disintegration. As was observed for transcription, the severity of the mitotic disruptions correlates with the extent of GAF depletion [42]. Interestingly, GAF associates with heterochromatic regions during mitosis, suggesting that it may contribute to centromere function [42, 43].

Similar, though somewhat less severe, nuclear division defects have been reported in embryos produced by germline clone mothers mutant for the ZGA gene Zelda (zld) [44, 45], raising the possibility that GAF could also play a prominent role in ZGA. Several findings are consistent with this suggestion. The first indication that GAF could have a genome-wide role in ZGA was the discovery that many of the genes first transcribed at high levels during nuclear cycle 14 have paused polymerases [46]. Consistent with studies on fly heat shock genes, which showed that GAF plays an important role in polymerase pausing [7, 20, 47, 48], many of these ZGA genes have GAF motifs in their promoters [46]. Further suggesting that GAF is important for Pol II recruitment in pre-cellular blastoderm embryos, Blythe and Wieschaus found that reducing Zld or GAF activity suppresses the mitotic defects in mei-41 mutant embryos. Since the mitotic defects in mei-41 embryos are thought to arise from conflicts between the replication machinery and Pol II, limiting Pol II activity would be expected to reduce their severity [49]. Subsequent work by Schulz et al. showed that there are two classes of Zld sites in early embryos: those that become inaccessible after Zld depletion and those that remain open. Most of the Zld sites in the latter class are marked by GAGA motifs and thus could be bound by GAF [50]. Taken together, these findings implicate GAF in ZGA and suggest that, in this context, it likely has complementary and sometimes overlapping functions with the pioneer factor Zld. The relationship between Zld and GAF has been further explored by Gaskill et al. [51]. They found that the earliest transcribed genes depend more on Zld than on GAF, while genes activated during the major wave of transcription in NC14 tend to require GAF. This correlation fits the genome-wide distribution of the two proteins.
Zld sites are enriched in the vicinity of genes activated in earlier nuclear division cycles, while GAF occupancy is enriched near genes activated in the major wave of transcription. As was found for Zld, early embryos have thousands of genomic sites whose accessibility in chromatin depends on GAF [51]. With some exceptions, most GAF-dependent sites do not depend on Zld. Likewise, most sequences that require Zld for accessibility do not also depend on GAF.

How does GAF perform such a diverse array of functions, ranging from mitosis to transcriptional activation and PcG-dependent silencing? One common thread linking these functions is the establishment of nucleosome-free regions of chromatin, so that other factors with dedicated activities can access their binding sites. Another is the existence of numerous partners that are implicated in different GAF functions [52]. In this review, we summarize our current understanding of how GAF functions and of the roles of its different protein partners.

Structural features of the GAF protein

GAF has three distinguishable domains: an N-terminal BTB/POZ domain (Bric-a-brac, Tramtrack, Broad-complex/Poxvirus, Zinc finger), a central C2H2-type zinc finger, and one of several alternative glutamine-rich (Q) C-terminal domains [53, 54] (Fig. 1a). The two most abundant GAF isoforms, of 519 aa (GAF519) and 582 aa (GAF582), share the BTB/POZ and zinc finger domains but have distinct glutamine-rich C-terminal domains of 142 and 205 amino acids (aa), respectively [55].

Fig. 1

a Structure of the GAF519 and GAF582 isoforms. Both isoforms share an N-terminal region (aa 1–377) that contains the BTB/POZ domain and the C2H2-type zinc finger, while they have alternative C-terminal domains that are rich in poly-Q sequences. Amino acids K325 and K373 are targets for acetylation, while S378 and S388 may be phosphorylated in GAF519. b Proteins that interact directly with GAF. Proteins are shown as colored ovals; the size of each oval indicates the relative size of the protein. The sequences of each protein that mediate interactions with GAF are indicated in round brackets, and the corresponding references in square brackets. Proteins that interact with GAF via their BTB domain are: Ttk, Mod(mdg4), Psq/BTB-V, Lola/BTB-IV, Lolal, Bab2, CG8924. Protein partners that interact with GAF but lack a BTB domain are: Pzg/Z4, CG2199, Corto, E(bx), Mep-1, Ssrp, TAF3/Bip2, Bin1. Red arrows indicate the GAF sequences that are required for interaction.

The single zinc finger domain is responsible for DNA binding. A minimal trinucleotide sequence, GAG, is sufficient for recognition and binding [56], while the GAGAG pentanucleotide appears to be optimal [57, 58]. However, not all potential binding sites are occupied by GAF in vivo. Instead, GAF localization correlates with the number and density of GAGA-like motifs, and the sequences most highly enriched among GAF-bound fragments have multiple motifs [1, 30]. These findings suggest that the interaction of GAF with its cognate recognition sequences in vivo is facilitated by cooperative binding. Consistent with this possibility, there is good evidence for cooperative binding in vitro [59, 60]. Cooperativity depends on the zinc finger, the N-terminal BTB/POZ domain, and the number of GAF-binding sites in the DNA substrate. When only a single GAGAG motif (1xGAGA) is present, the BTB/POZ domain actually inhibits binding by the full-length GAF519 protein. In this case, truncated versions of GAF, either a BTB/POZ domain deletion (∆POZ) or a protein with only the DNA-binding domain (DBD), bind to the 1xGAGA sequence with six- to ninefold greater affinity than GAF519. A different result is obtained with a 5xGAGA oligonucleotide or with natural DNA sequences that have multiple GAGA-like motifs, such as the Ubx, hsp26 and hsp70 promoters. For these sequences, GAF519 binds with about tenfold greater affinity than either the ∆POZ or the DBD protein. Gel shift experiments indicate that this difference arises because GAF519 binds cooperatively to DNAs containing multiple GAGA-like motifs, while the ∆POZ and DBD proteins do not. EM analysis shows that GAF assembles into large multi-subunit complexes on these natural templates, whereas such complexes cannot be detected with the ∆POZ protein.
Moreover, complexes appear to be partially pre-assembled in solution in the absence of a DNA substrate, as gel filtration experiments indicate that bacterially expressed GAF fractionates with a rather broad size distribution, from a 60 kD monomer to multimers of up to 600 kD [60].
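The observation that GAF occupancy tracks the number and density of GAGA-like motifs, rather than the mere presence of a single site, can be made concrete with a simple sequence scan. The Python sketch below is illustrative only: it finds exact, overlapping matches to a motif (for example GAG or GAGAG) on both strands and reports the largest number of matches that fall within any window of a chosen length. The function names and the toy sequence are hypothetical, exact matching ignores the degeneracy of real GAGA-like motifs, and motif counts alone do not predict GAF occupancy in vivo.

```python
import re


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA string (returned uppercase)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def motif_starts(seq: str, motif: str) -> list[int]:
    """0-based start positions of exact, overlapping matches to `motif`
    on either strand, reported in forward-strand coordinates."""
    seq = seq.upper()
    hits = set()
    # A zero-width lookahead lets re.finditer report overlapping matches
    # (e.g. GAGAG occurs twice, at offsets 0 and 2, in GAGAGAG).
    # Scanning for the reverse complement (CTCTC for GAGAG) covers the
    # opposite strand; the set avoids double counting palindromic motifs.
    for pattern in {motif.upper(), reverse_complement(motif)}:
        hits.update(m.start() for m in re.finditer(f"(?={pattern})", seq))
    return sorted(hits)


def max_motifs_in_window(starts: list[int], window: int) -> int:
    """Largest number of motif start positions within any `window`-bp span.

    `starts` must be sorted, as returned by motif_starts.
    """
    if window <= 0:
        raise ValueError("window must be a positive number of base pairs")
    best, left = 0, 0
    for right, pos in enumerate(starts):
        # Slide the left edge until the span covers fewer than `window` bp.
        while pos - starts[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best


if __name__ == "__main__":
    toy = "TTGAGAGAGTTTTCTCTCTTT"  # hypothetical sequence, not a real promoter
    gagag = motif_starts(toy, "GAGAG")
    print("GAGAG/CTCTC starts:", gagag)
    print("densest 12-bp window:", max_motifs_in_window(gagag, 12))
```

The window length is a free parameter in this sketch; scanning the same sequence with windows of different sizes is one simple way to ask whether motifs are clustered or dispersed.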

Other studies have also implicated the GAF BTB/POZ domain in homo- and heterotypic protein–protein interactions [52, 61,62,63]. As was observed for the full-length GAF519 protein, the BTB/POZ domain self-assembles into a series of oligomers [62]. In gel filtration experiments, the BTB/POZ domain elutes in a broad distribution with a peak corresponding to a hexamer or octamer. In crosslinking experiments, the predominant species at low crosslinker concentrations are monomers and dimers, while at higher concentrations several different multimeric species are evident. By contrast, the BTB/POZ domains of other proteins appear to assemble into more stable complexes. For example, a single multimeric species, which appears to correspond to an octamer, is observed for the Mod(mdg4) BTB/POZ domain [62]. The GAF BTB/POZ domain also mediates protein:protein interactions with other proteins that have a BTB/POZ domain [52, 61,62,63]. These GAF-interacting BTB/POZ domain proteins include Tramtrack (Ttk) [61,62,63], Lolal (Batman) [52, 62], Mod(mdg4), Pipsqueak (Psq/BTB-V) [62], Lola/BTB-IV, Bab2/BTB-II and CG8924 [52] (Fig. 1b). While the GAF BTB/POZ domain is promiscuous in its interactions with the BTB/POZ domains of other proteins, this is not the case for its interaction partners. Bonchuk et al. found that although the BTB/POZ domains of Mod(mdg4), Ttk and Psq all interact with GAF, Batman and themselves, they do not interact with each other [62].

While these findings indicate that GAF likely assembles into a variety of distinct complexes via interactions between its BTB/POZ domain and the BTB/POZ domains of other proteins, this is not its only protein:protein interaction domain. For example, other studies have suggested that the central part of the protein, which contains the zinc finger, might also mediate interactions with other proteins [52, 64, 65] (Fig. 1b). The C-terminal poly-Q domains could also have specialized protein interaction functions. The two primary isoforms, GAF519 and GAF582, have quite different developmental profiles. mRNAs encoding GAF519 are deposited in the developing egg during oogenesis, and this isoform is present at high levels during the early stages of embryogenesis. In contrast, the larger GAF582 isoform only appears around 6 h of development [53]. In spite of these different developmental profiles, both isoforms co-localize on polytene chromosomes and behave indistinguishably in in vitro DNA-binding and tissue culture transient-transfection experiments [53]. On the other hand, in vivo functional studies indicate that while the two isoforms have overlapping functions, their activities are clearly not equivalent. The differences in activity may be related to distinct functional requirements at different stages of development [66]. Transgenes expressing either isoform can rescue the zygotic lethality of Trl mutant flies; however, the extent of rescue is greater for GAF582, which is first expressed in the zygote at 6 h. The opposite result is obtained for the maternal effect lethality of the Trl13C allele: transgenes expressing GAF519 partially rescue it, while transgenes expressing GAF582 do not. While this is consistent with their distinct expression patterns, the reason for this difference is not clear.
Both isoforms associate with centromeric heterochromatin during mitosis and rescue the nuclear division defects in pre-cellular blastoderm embryos from Trl13C mothers. Both also rescue the defects in ftz transcription. Thus, there must be some other GAF target(s) in early embryos that specifically require the 519 aa isoform rather than the 582 aa isoform [66].

Another factor that is likely to affect GAF functionality is post-translational modification. The known modifications include O-glycosylation [67], phosphorylation [68] and acetylation [69]. Although the roles and extent of these modifications are not completely understood, they are likely to be important in modulating GAF activities through changes in DNA binding and protein:protein interactions. Consistent with this suggestion, phosphorylation and acetylation have been shown to reduce GAF DNA-binding activity [68, 69], suggesting that they could modulate GAF interactions with its target sequences.

GAF and chromatin remodelers

Eukaryotic gene expression in vivo takes place on a chromatinized template, in which the DNA is assembled into nucleosomes and associated with non-histone chromosomal proteins. Consequently, critical components of the transcriptional machinery (transcription factors, factors required for enhancer or silencer activity, RNA polymerases) cannot interact with their target sequences unless those sequences are accessible. Accessible sequences are typically nucleosome free and can be detected as regions of DNA that are hypersensitive to nuclease cleavage [70, 71]. Nucleosome-free regions can extend over several hundred base pairs and are flanked by nucleosomes marked by the highly dynamic H3.3 and H2A.Z histone variants [72]. One mechanism for generating accessible regions in chromatin involves a special class of “pioneer” factors that can interact with nucleosomal DNA and induce a change in the configuration of nucleosomes in the region [73]. As noted above, the ZGA factor Zld is thought to be a pioneer protein [74, 75]. GAF has not only been implicated in ZGA, but there is also direct evidence that it can facilitate nucleosome displacement. Moreover, unlike Zld, GAF has been shown to associate with multiple remodeling complexes (see below).

As discussed above, a role for GAF in chromatin remodeling and the establishment of nucleosome-free regions is supported by in vitro chromatin assembly experiments [12,13,14]. In these experiments, promoter regions containing GAF-binding sites become accessible when GAF is included in the nucleosome assembly reaction mix. Nucleosome-free promoter regions could also be generated de novo when GAF was added to nucleosome templates that had been pre-assembled in the absence of exogenous GAF [12]. However, adding GAF by itself was not sufficient to remodel the nucleosomes around the promoter. Instead, generating a nucleosome-free promoter required not only GAF but also the assembly extract and ATP. Since GAF has no ATPase activity, these requirements imply that the extract contains an ATP-dependent co-factor that can function with GAF to remodel nucleosomes [12]. Using this assay, Tsukiyama and Wu purified an ATP-dependent chromatin remodeler called NURF [76]. NURF consists of four proteins: ISWI, which has ATPase activity and can translocate DNA; E(bx) (NURF301), a large 300 kD multidomain protein that includes a bromodomain and three PHD fingers; Caf1-55, a histone-binding protein; and Nurf-38, a member of the inorganic pyrophosphatase protein family [76, 77]. When incubated with a pre-assembled chromatin template, GAF and ATP, NURF can displace nucleosomes from the hsp70 promoter, making it accessible to MNase or restriction enzyme digestion [76] (Fig. 2a).

Fig. 2

GAF functions in conjunction with ATP-dependent chromatin remodelers to establish nucleosome-free regions of chromatin. a GAF interacts with nucleosomal DNA and recruits chromatin remodelers. In an ATP-dependent reaction, remodelers translocate nucleosomes and establish “nucleosome-free” regions of chromatin that are hypersensitive to various nucleases. This facilitates the recruitment of other transcription factors. b GAF interacts with ATP-dependent chromatin remodelers of different subfamilies: SWI/SNF (PBAP and BAP), ISWI (NURF, ACF and ToRC), CHD (dNURD). Contacts established by indirect methods are indicated by dotted blue lines; direct partners (E(bx), ISWI and MEP-1) are indicated by solid black lines

The formation of nucleosome-free regions of chromatin by the combination of GAF and NURF in vitro does not simply depend on the ability of GAF to bind DNA and block nucleosome occupancy. Instead, specific protein:protein interactions between GAF and the NURF subunits likely mediate the recruitment of the NURF complex, either to free GAF or to chromatin-associated GAF. One interaction is between GAF and the E(bx) subunit (Fig. 2b). Xiao et al. found that the 300 kD E(bx) subunit co-IPs GAF519 in embryonic nuclear extracts [65]. Using recombinant GAF, they showed that the DBD and the immediately flanking amino acids mediate interactions with E(bx). Conversely, two regions of E(bx) are responsible for binding to GAF: an N-terminal fragment (aa 1–391) and a ~1000 aa sequence located between residues 993 and 2002. Both E(bx) sequences also mediate interactions with nucleosomes. The second direct interaction is between GAF and the ISWI subunit [65] (Fig. 2b). The interactions detected in these experiments appear to be sufficient for the formation of stable complexes between GAF and NURF. In reciprocal experiments, when GAF is immunoprecipitated from embryonic nuclear extracts, all four NURF subunits are detected by mass spectrometry [52]. In addition to NURF, two other remodeling complexes use the DEAD/H-helicase ATPase ISWI: ACF [78] and ToRC [79] (Fig. 2b). The ACF complex contains ISWI and Acf1, while ToRC contains ISWI plus two other subunits, CtBP and Tou. Since ISWI can interact directly with GAF, one might expect both ACF and ToRC to associate with GAF in vivo. This seems to be the case: Acf1, as well as the two ToRC subunits CtBP and Tou, are detected by mass spectrometry in GAF co-IPs [52].

Although GAF together with NURF forms nucleosome-free regions of chromatin on the hsp70 promoter in vitro [12, 76], it seems likely that the SWI/SNF family of ATP-dependent chromatin remodelers is primarily responsible for GAF-dependent remodeling of promoters, and perhaps also of boundary elements and PREs (Fig. 2a). There are two fly SWI/SNF complexes, PBAP and BAP (Fig. 2b). They share seven subunits: the Brahma helicase (Brm), Bap111, Bap55, Bap60, Snr1, Act5C and Mor. The unique subunits that specify the PBAP complex are Polybromo, Bap170 and SAYP, while the BAP complex contains OSA [80, 81]. Although direct interactions between GAF and SWI/SNF subunits have not yet been demonstrated, Nakayama et al. [82] found that a FLAG-tagged GAF519 protein expressed in a wild-type Trl background is associated in nuclear extracts with several common SWI/SNF subunits and with the PBAP-specific subunit Polybromo. Supporting the idea that this interaction is functionally significant, they found that GAF and the PBAP subunits Polybromo, Brm and Bap60 are associated with the Fab-7 and d1 boundaries and the bxdPRE in vivo, and that the association of SWI/SNF subunits with these elements is significantly reduced in Trl mutants [82]. As would be predicted from these findings, all of the PBAP subunits are detected by mass spectrometry among the proteins co-immunoprecipitated with GAF from nuclear extracts [52]. The idea that there is an intimate functional connection between GAF and PBAP has received support from a recent study by Judd et al. [83]. They found that GAF acts synergistically with the PBAP complex at a majority of GAF-regulated promoters. RNAi knockdown experiments targeting GAF and the PBAP subunit Bap170 indicate that both are required to generate hypersensitive regions at the same set of promoters in vivo.
Because GAF can still bind chromatin, albeit weakly, in the absence of PBAP, it likely engages its target sequences before PBAP; once PBAP is present, GAF binding is substantially enhanced, as is overall accessibility. The same study showed that GAF functions with NURF at a subset of promoters to position the +1 nucleosome, and that this relationship seems to be important in facilitating the release of paused polymerases [83]. While GAF functions in the establishment of hypersensitive (nucleosome-free) regions of chromatin, other factors are expected to help maintain these regions as open chromatin. For example, studies by Gilchrist et al. [84] suggested that paused polymerases just beyond the transcription start site help to occlude nucleosomes from the promoter region.

PBAP and NURF are not the only GAF-associated ATP-dependent remodeling complexes. Experiments by Lomaev et al. showed that GAF also co-immunoprecipitates the BAP-specific OSA protein (together with all of the other SWI/SNF subunits) [52]. Nakayama et al. also detected Mi-2, a subunit of the Drosophila dNURD complex, suggesting that this chromatin remodeler might also be associated with GAF [82] (Fig. 2b). Consistent with this suggestion, all of the subunits of the dNURD complex were detected by mass spectrometry in the immunoprecipitation experiments of Lomaev et al. [52].

While these findings demonstrate physical links between GAF and the main chromatin remodelers, many important questions remain unanswered. One is the extent of context specificity between different remodelers and GAF. The studies of Judd et al. [83] indicate that GAF recruits PBAP to promoters, while Nakayama et al. [82] provide evidence that this interaction may also take place at some boundary elements and PREs. Does the GAF-dependent remodeling of nucleosomes at promoters, boundaries and PREs depend only on PBAP, or do some of the other remodeling complexes also function at these elements, or at least at a subset of them? What about NURF? Are the interactions of NURF with GAF limited to regions downstream of the transcription start site? If there is specificity in the recruitment of remodelers to different classes of GAF-dependent elements, what determines this specificity?

GAF and formation/maintenance of nucleosome-free regions of chromatin

Physical interactions between GAF and at least five different chromatin remodeling complexes would be expected to coordinate GAF binding to DNA with local nucleosome remodeling, providing a mechanism for generating nucleosome-free regions. The experiments with pre-assembled hsp70 chromatin templates and NURF [12, 76] indicate that GAF can function as a classical “pioneer” protein and displace pre-existing nucleosomes. Although similar studies have not been performed for PBAP or the other associated remodelers, one would expect that, when combined with GAF, they would also be able to remodel pre-assembled chromatin templates. On the other hand, while pioneer proteins are thought to be capable of displacing pre-existing nucleosomes, it is not entirely clear to what extent such an activity is relevant in vivo. In flies, zygotic genome activation (ZGA) takes place in two steps, with a minor wave of transcription during nuclear division cycles 8–13 and a major wave during nuclear cycle 14 [74, 85, 86]. During these and the earlier division cycles, the chromosomal DNA is replicated before each division, and the daughter chromosomes must be assembled into chromatin before mitosis begins. It is entirely possible that GAF, or GAF plus a remodeling complex, interacts with its cognate recognition sequences after the replication fork passes but before these sequences are fully assembled into nucleosomes. This would only require a pre-existing pool of free GAF, which at this stage of development could be provided by the substantial maternal deposition of Trl mRNA.
At stages of development in which GAF (or GAF plus a remodeling complex) is already bound to its recognition motifs in nucleosome-free regions of chromatin, the pre-existing complexes could potentially help template the re-association of GAF (or GAF plus the remodeling complex) with its recognition motifs following passage of the replication fork, ensuring that the nucleosome-free regions are inherited by both daughter chromosomes. In either case, displacement of pre-existing nucleosomes would be unnecessary. On the other hand, there are instances in which nucleosome-free regions of chromatin are generated by displacing pre-existing nucleosomes. A classic example is the SUC2 promoter in the yeast S. cerevisiae, which is remodeled by SWI/SNF under inducing conditions [87, 88]. In this case, induction is rapid, and thus nucleosome remodeling is not coupled to passage of the replication fork.

The interaction of transcription factors with chromatin remodeling complexes is not in itself sufficient to generate nucleosome-free regions of chromatin. For example, like GAF, HSF physically interacts with the E(bx) subunit of NURF [65]. Moreover, when incubated with NURF and ATP, it can also displace nucleosomes from the hsp70 promoter [76]. While HSF appears to function as a “pioneer” factor when combined with NURF in these in vitro assays, HSF recognition sequences are not sufficient to generate a nucleosome-free promoter in vivo [16,17,18], and GAGA elements contribute significantly to HSF binding in vivo [22]. A possible reason for this is that under normal growth conditions HSF is localized in the cytoplasm, and thus it would not be able to prevent nucleosome encroachment even if it could promote nucleosome displacement under heat shock conditions [89].

In fact, most of the well-studied nucleosome-free regions of chromatin in flies appear to be maintained for extended periods of time, in some cases throughout much of the life cycle. While HSF is an extreme example, interactions between DNA-binding proteins and their target sequences have limited half-lives, and once a protein dissociates from its target sequence, it can no longer block nucleosome encroachment. One mechanism that appears to be deployed by GAF in maintaining nucleosome-free regions is cooperative binding. The well-characterized GAF-dependent nucleosome-free regions typically extend over sequences of 100–400 bp and contain several GAGA-like motifs. Consistent with this being a general feature of GAF-dependent nucleosome-free regions, genome-wide studies indicate that GAF is most frequently found associated with sequences that contain multiple GAGA-like motifs [1, 30]. Based on in vitro biochemical experiments, sequences with multiple GAGA-like motifs are likely occupied in vivo by multimeric GAF complexes. As noted above, the BTB domain of GAF mediates the assembly of recombinant GAF into a spectrum of multimeric complexes, and this ability to multimerize substantially augments binding to natural DNA sequences such as the hsp70 or Ubx promoters [60]. The notion that a single protein and its cognate binding site might not be sufficient to maintain open regions of chromatin is supported by recent work by Kyrchanova et al. on several insulator proteins [90]. They found that a truncated 106 bp Fab-8 boundary that has two binding sites for the fly dCTCF zinc finger protein is not sufficient for dCTCF binding or for insulator function in the context of BX-C. Instead, at least four dCTCF binding sites are required for function, while a fragment with three binding sites has no boundary activity and shows significantly reduced dCTCF association [90]. Similar results were obtained for two other polydactyl zinc finger DNA-binding proteins, Pita and Su(Hw).
These studies also showed that boundary elements that have only a single binding site for Pita, dCTCF, or Su(Hw) require surrounding sequences (> 100 bp) for function [90]. This may also be true for pioneer proteins like Zld. While Zld occupancy has been observed at single sites, Zld is usually found in DNA segments that have a cluster of Zld binding motifs or other relevant motifs such as GAGAG [91]. It will clearly be of interest to determine whether isolated single Zld sites can be accessed by Zld and are functional.

GAF and context-dependent partners

The formation and maintenance of nucleosome-free regions of chromatin by GAF is likely to involve not only GAF-associated chromatin remodelers but also a collection of other proteins, some of which have functions that are dedicated to specific types of regulatory elements. In each regulatory context, the activity of GAF is augmented by these other protein partners. Some of these partners are part of larger complexes and may interact only indirectly with GAF, while others may interact with GAF directly. Consistent with this latter possibility, more than 20 proteins have been found to interact directly with GAF (Fig. 3, Supplementary Excel file).

Fig. 3

GAF interactome. GAF interacts with chromatin remodelers, Trx and promoter-associated factors, proteins involved in boundary and PcG function, transcription elongation and mRNA export. Solid black lines—direct partners; dotted blue lines—partners identified by indirect methods

GAF promoter partners

Although functional studies have suggested that GAF interacts with NELF, the Pol II subunit Rpb3 and CBP [7, 9, 23], none of these proteins is found stably associated with GAF in co-IP studies, suggesting that they may interact with GAF indirectly. On the other hand, direct physical association has been shown for subunits of the TFIID complex [92, 93] and for HSF [94] (Fig. 3). In the case of TFIID, GAF was shown to interact directly with TAF3 [92, 93] (Fig. 3), and the two proteins co-purify in reciprocal co-IP experiments [92]. Another potential direct connection between GAF and TFIID is TAF4 (Fig. 3). It was identified in a yeast two-hybrid (Y2H) screen as a GAF partner [93] and is present in GAF co-IPs [52].

In addition to interactions with components of the general transcriptional machinery, GAF also associates with co-factors that, like HSF, have pathway- or gene-specific functions. For example, GAF was found to interact with the Yorkie (Yki) protein, a regulatory target of the Hippo signaling pathway (Fig. 3). Genome-wide, over 60% of Yki sites in embryos and wing discs overlap with 49% of GAF sites, and Yki and GAF co-purify from extracts of Drosophila S2 cells [95]. This interaction is functional: GAF is required for expression of dE2f1-Yki targets, and depletion of GAF suppresses Yki-driven cell proliferation [95, 96]. Two other potential GAF interactors are the Atro/Gug and Gro co-repressor proteins. Atro and Gro overlap with 24.3% and 21.3% (11.47% at TSSs) of GAF sites, respectively [97, 98]. Atro was shown to co-immunoprecipitate with GAF in IP-western experiments [98], while Gro was found associated with GAF in IP/MS studies [52].

GAF boundary and PRE partners

Many fly chromatin boundary elements (insulators) have multiple GAGAG motifs and are occupied by GAF in ChIP experiments [33]. Two classic examples of GAF-associated boundaries are Fab-7 [99,100,101,102,103,104] and Fab-8 [105, 106] from the Drosophila Bithorax complex (BX-C) (Fig. 4a). These two boundaries flank the iab-7 regulatory domain, insulating it from the adjacent iab-6 and iab-8 regulatory domains. The iab-6, iab-7 and iab-8 regulatory domains, together with iab-5, are responsible for the parasegment (PS) specific expression of the BX-C homeotic gene Abdominal-B (Abd-B) in the posterior parasegments PS10 (iab-5), PS11 (iab-6), PS12 (iab-7) and PS13 (iab-8). In addition to blocking cross-talk between adjacent regulatory domains, the boundaries in this region of BX-C also have bypass activity. Bypass activity is required so that more distal domains can bypass the intervening boundary elements and activate Abd-B expression. For example, the iab-6 regulatory domain must bypass Fab-7 and Fab-8 to regulate Abd-B in PS11. Likewise, the iab-7 regulatory domain has to bypass Fab-8 to regulate Abd-B in PS12 [107,108,109,110].

Fig. 4

GAF is required for the functioning of chromatin boundaries and PREs in BX-C. a The diagram shows the BX-C TADs, containing the Ubx, abd-A, and Abd-B genes and their respective cis-regulatory domains. The positions of the boundaries and PREs are indicated above and below the sequence coordinate line, respectively. The red arrow marks the Fab-7 boundary. The red, yellow and green ovals are regions bound by GAF, CTCF, or E(z) and Ph, respectively. b The Fab-7 boundary has four nuclease hypersensitive regions: one minor (HS*) and three prominent (HS1, HS2, and HS3). The locations of the recognition motifs for the GAF, Pho, Elba and Insv proteins known to be associated with Fab-7 are indicated. The LBC complex binds to the distal half of HS1 (dHS1) and to HS3

Both Fab-7 and Fab-8 map to nuclease hypersensitive regions in chromatin. Fab-7 has four nuclease hypersensitive regions, HS*, HS1, HS2 and HS3 [37, 99, 102, 111] (Fig. 4b). The two largest are HS1, which is about 400 bp in length, and HS3, which is nearly 200 bp. In nuclear extracts, GAF interacts with large probes spanning the centromere-distal half of HS1 (dHS1), and with a similarly sized HS3 probe, as part of a large > 1000 kDa complex called the LBC (Large Boundary Complex) [35, 38]. The sequence recognition properties of the LBC are complex. The available evidence indicates that the preferred substrates for the LBC are 120–200 bp in length and contain one or more GAGAG-like motifs [35]. However, other than GAGAG-like motifs, the sequences that are known to be bound by the LBC share no obvious similarities. dHS1 has four GAGAG motifs, while HS3 has two, and mutations in these motifs substantially weaken LBC binding in nuclear extracts and boundary function in vivo [35, 38] (Fig. 4b). The LBC also binds to a ~ 130 bp sequence on the centromere-proximal side of Fab-8 that has a single GAGAG; however, unlike the motifs in dHS1 or HS3, mutations in this GAGAG motif have only a minimal impact on LBC binding. While most other known LBC recognition sequences have GAn motifs that are also important for LBC binding, there are examples of sequences that are bound by the LBC but lack GAGAG motifs [112].

Supershift EMSA experiments, in which gel filtration fractions containing the LBC were probed with antibodies directed against known insulator proteins, suggest that the LBC complex contains GAF, Mod(mdg4), E(y)2 and CLAMP [35, 38, 52, 113]. While Mod(mdg4) and E(y)2 are also detected by mass spectrometry in GAF co-IPs from nuclear extracts, surprisingly CLAMP is not [52]. Given this discrepancy, it is not clear at this point whether CLAMP is a bona fide component of the LBC. While it is conceivable that CLAMP fails to remain associated with GAF/LBC in co-IP experiments, it is also possible that the two CLAMP antibodies used in the supershift experiments cross-react with an as yet unknown LBC-associated protein. In this context, other boundary factors that do not appear to be components of the LBC are nevertheless found in GAF co-IPs. These include CP190, Su(Hw), CP60, and the boundary-associated factors Pzg/Z4, Dref and Chro [52]. GAF was shown to interact directly with the Pzg/Z4 protein [52] (Fig. 1b, Fig. 3), but not with the CP190 or Su(Hw) proteins [52, 62, 114]. Since the CP190 and Su(Hw) proteins were not found to be associated with the LBC in EMSA supershift experiments [38], these boundary proteins appear to associate with GAF in complexes that are independent of the LBC.

Unlike CLAMP, the presence of Mod(mdg4) in the LBC is supported not only by the supershift and co-IP experiments, but also by the fact that its BTB/POZ domain is known to interact with the BTB/POZ domain of GAF [62] (Fig. 1b, Fig. 3). Since the BTB/POZ domain of Mod(mdg4) forms a stable octamer [62], it seems possible that a Mod(mdg4) octamer could provide a scaffold for the assembly of GAF and other proteins into the LBC.

Mod(mdg4) also has another property that could help explain the seemingly promiscuous sequence recognition properties of the LBC. There are 31 predicted Mod(mdg4) isoforms. All share the N-terminal BTB/POZ domain but have different C-terminal domains. Of the 31 distinct C-terminal domains, 27 have unique FLYWCH zinc finger DNA-binding domains, and each would be expected to have somewhat different sequence recognition specificities. The temporal and tissue-specific patterns of expression and the relative abundance of the different Mod(mdg4) isoforms are not known; however, 14 of the Mod(mdg4) isoforms were detected in GAF co-IPs with embryonic nuclear extracts [52]. Included in this group is the PT isoform [also known as Mod(mdg4)67.2 or 2.2], which has been implicated genetically in boundary function [115, 116]. Based on what is known about Mod(mdg4) oligomerization, it is possible that each LBC could contain a combination of up to eight different Mod(mdg4) isoforms. If this were the case, it could explain how the LBC is able to interact with seemingly quite different DNA sequences. The presence of an octamer in each LBC plus at least one other DNA-binding protein (GAF) could also help account for the large size (120–200 bp) of the sequences that are bound by the LBC in nuclear extracts. In this context, it is also worth noting that the LBC recognition sequences that have been characterized by DNase I or MNase digestion of chromatin span sequences of similar size [99, 102].

The functions of the LBC recognition sequences in Fab-7 and Fab-8 are somewhat different. In Fab-8, the primary function of the LBC recognition sequence is to provide bypass activity, while the sequences on the distal side of the hypersensitive region provide blocking activity [112]. Not unexpectedly, these distal sequences contain two binding sites for the fly dCTCF protein [112]. In the case of Fab-7, Kyrchanova et al. found that a combination of dHS1 (the LBC recognition sequence in HS1) and HS3 is sufficient to fully reconstitute the blocking and bypass activity of the much larger WT Fab-7 boundary in boundary replacement experiments [35]. This finding was a surprise, as previous studies suggested that HS3 is not part of the Fab-7 boundary but rather is a PRE (Polycomb Response Element) for the BX-C iab-7 regulatory domain, the “iab-7 PRE” [26, 103]. HS3 has two GAGAG motifs and three recognition sequences for the PcG DNA-binding protein Pleiohomeotic (Pho) [35, 117] (Fig. 4b). Mutations in the two GAGAG motifs or in the Pho binding sites eliminate the silencing activity of HS3 [28]. Silencing activity is also dependent upon the GAF gene Trl and the pho gene [26, 118]. As noted above, LBC binding to HS3 is disrupted by mutations in the two GAGAG motifs; however, it is not affected by mutations in the three binding sites for Pho [35]. This is also true for the boundary function of HS3: it is eliminated by mutations in the GAGAG motifs, but not by mutations in the Pho binding sites [35]. Thus, HS3 has two distinct functions. One is a boundary function, which depends on the LBC, while the other is a silencing function, which requires both the LBC and Pho.

An as yet unanswered question is what role the LBC plays in PRE (and also TRE) function and how this relates to its boundary activities. Recent Y2H experiments by Shokri et al. suggest that there is a direct interaction between GAF and Pho [119]. However, this association does not appear to be stable in co-IP experiments, as neither Pho nor its partner Sfmbt is consistently detected by mass spectrometry [52]. This would suggest that one of the reasons why the GAGAG motifs in HS3 (iab-7 PRE) are required for PcG-dependent silencing is that GAF (or in this case the LBC) is needed to open up the chromatin so that Pho can gain access. This possibility is supported by the studies of Mahmoudi et al., who showed that Pho could not gain access to chromatinized Ubx PRE templates in the absence of added GAF [120].

On the other hand, other known components of the PcG silencing machinery form stable complexes with GAF in nuclear extracts. These include all of the components of the PcG complex PRC2 (Polycomb repressive complex 2), Zeste and Adf [52]. There are also several proteins that appear to interact directly with GAF. These include Psq [62, 121], Batman [52, 62, 93, 122, 123], Bin1 [124] and Corto [125] (Fig. 1b, Fig. 3). GAF association in vivo has been shown for Psq [52, 121, 126], Batman [52, 122] and Bin1 [52], whereas an in vivo association has not yet been detected for the Corto protein.

As mentioned above, the Psq protein contains a BTB/POZ domain, through which it can interact with GAF [62, 121], and a DNA-binding domain. The DNA-binding domain consists of four tandem repeats of a conserved 50-amino acid sequence (the Psq domain) and, like GAF, it recognizes GAGAG sequence motifs [127]. GAF and Psq may function together: they colocalize on polytene chromosomes [121] and bind similar sites genome-wide in Drosophila S2 cells, as measured by ORGANIC profiling [128]. Psq co-purifies not only with GAF [52, 121], but also with the PcG protein Polycomb [129].

The Batman protein is only 127 aa in length. Most of the Batman protein corresponds to its BTB/POZ domain, which can interact directly with the GAF BTB/POZ domain [52, 62, 122, 123]. Batman and GAF are codistributed, and both co-localize with about 50% of the Ph sites on polytene chromosomes [122]. It was shown that GAF association with Ph in embryos depends on externally provided Batman. In larval extracts, on the other hand, GAF can be co-immunoprecipitated with Ph irrespective of Batman [123].

Another protein that might mediate GAF interactions with PcG complexes is the Bicoid interacting protein 1 (Bin1), also known as SAP18. While Bin1 lacks a BTB/POZ domain, it nevertheless interacts with the BTB/POZ domain of GAF [124]. Bin1 also interacts with the PRC2 subunit E(z), and the two can be co-purified from nuclear extracts [130]. Bin1 apparently participates at only a subset of GAF-associated sequences, since GAF and Bin1 show only limited overlap on polytene chromosomes. However, GAF and Bin1 do co-localize in the region of the Bithorax complex [124].

Conclusions

GAF is unusual in that it participates in many different and seemingly unrelated regulatory contexts. In vivo, it is found associated with promoters, enhancers, boundary elements, PREs and TREs. Though not discussed here, GAF also co-localizes with CLAMP at a subset of the X-chromosome chromatin entry sites (CES) for the male-specific lethal (MSL) complex [113]. In these different contexts, it facilitates promoter activity, transcriptional activation by enhancers (including long-distance enhancer–promoter interactions), insulating activity and transcriptional silencing. This diverse array of context-dependent activities poses two important questions. First, how does GAF carry out these different functions? Second, is there a common mechanism of action or unifying theme, or are these distinct functions completely “unrelated”? At least part of the answer to the first question is probably that GAF has many different protein partners, either direct interactors or proteins that are in complexes and interact with GAF only indirectly. In many instances, it is likely that these protein partners are largely responsible for carrying out distinct GAF functions. As for the second question, the common theme may be the ability of GAF to induce and then help maintain open regions of chromatin. In this respect, the cooperation of GAF with ATP-dependent chromatin remodelers could represent a conserved mechanism of action for “pioneer” proteins in other organisms. For example, OCT4 has been shown to require BRG1/SMARCA4, a SWI/SNF homolog, to support its binding and pioneering activity in mouse embryonic stem cells [131]. Similarly, chromatin association of the NANOG pioneer factor is facilitated by BRG1 in mouse blastocysts [132]. Although two mammalian GAF homologs have been identified, their properties have not been explored in as much detail as those of the Drosophila protein, and it is not known whether they function in generating nucleosome-free regions of chromatin.
One of the potential homologs is ZBTB3, which was computationally predicted to be the mammalian protein most structurally related to Drosophila GAF [133]. The other, ThPOK (also known as c-KROX/ZBTB7B), has been studied in more detail [134]. Like GAF, it has an N-terminal BTB/POZ domain, but instead of a single zinc finger, it has four C-terminal zinc fingers. ThPOK binds to GAGA sequences in vitro and in vivo, and it functions in transcriptional activation of genes in different cellular contexts [135]. It has also been implicated in the differentiation of CD4+ progenitor cells, where it is thought to activate the expression of the “suppressors of cytokine signaling” genes and repress expression of Blimp1 and Runx3 [136, 137]. ThPOK has also been found to contribute to the insulator function of an element located between the mouse Evx2 and Hoxd13 genes [138]. While it is not known whether ThPOK functions in generating nucleosome-free regions of chromatin, it has been shown to co-purify with the ACF and NuRD chromatin remodeling complexes, so this possibility remains open [139].