Protein-intrinsic properties and context-dependent effects regulate pioneer factor binding and function

Chromatin is a barrier to the binding of many transcription factors. By contrast, pioneer factors access nucleosomal targets and promote chromatin opening. Despite binding to target motifs in closed chromatin, many pioneer factors display cell-type-specific binding and activity. The mechanisms governing pioneer factor occupancy and the relationship between chromatin occupancy and opening remain unclear. We studied three Drosophila transcription factors with distinct DNA-binding domains and biological functions: Zelda, Grainy head and Twist. We demonstrated that the level of chromatin occupancy is a key determinant of pioneering activity. Multiple factors regulate occupancy, including motif content, local chromatin and protein concentration. Regions outside the DNA-binding domain are required for binding and chromatin opening. Our results show that pioneering activity is not a binary feature intrinsic to a protein but occurs on a spectrum and is regulated by a variety of protein-intrinsic and cell-type-specific features.

Fig. 1: Ectopically expressed PFs open chromatin and activate transcription.
Fig. 2: Twi binds closed chromatin extensively and drives accessibility at a limited number of sites.
Fig. 3: Motif content shapes PF activity.
Fig. 4: Many endogenous binding sites are resistant to ectopic PF binding.
Fig. 5: PF binding and opening of closed chromatin are concentration dependent.
Fig. 6: The DBD is not sufficient for PF function.

Data availability

Sequencing data are available through the Gene Expression Omnibus under accession GSE227884. Source data are provided with this paper.

Code availability

All analysis code is described in the Methods and freely available on GitHub at


We thank A. Theis, M. Freund and A. Mehle for helpful discussions and advice. We thank J. Zeitlinger (Stowers Institute for Medical Research) for generously sharing the anti-Twist antibody. We acknowledge the University of Wisconsin–Madison Biotechnology Center and the NUSeq Core Facility for sequencing. T.J.G. was supported by National Institutes of Health (NIH) National Research Service Award T32 GM007215. Experiments were supported by grants R35 GM136298 and R01 NS111647 from NIH to M.M.H. M.M.H. was also supported by a Vallee Scholar Award. M.M.H. is a Romnes Faculty Fellow and Vilas Faculty Mid-Career Investigator.

Author information


T.J.G. and E.D.L. performed experiments and analysis. E.D.L. generated the CUT&RUN data in larval neuroblasts. All other data and analysis were performed by T.J.G. T.J.G. and M.M.H. designed the experiments. M.M.H. and T.J.G. wrote the paper with input from E.D.L.

Corresponding author

Correspondence to Melissa M. Harrison.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Structural & Molecular Biology thanks Antonio Giraldez and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Dimitris Typas, in collaboration with the Nature Structural & Molecular Biology team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Stable cell lines allow inducible expression of transcription factors at physiological concentrations.

a, Schematic of generation of stable cell lines and induction of protein expression. b, mRNA levels of zld and grh in S2 cells (n = 2 biologically independent samples). Top histogram shows the distribution of mRNA levels for all Drosophila genes. Vertical dashed line indicates a log2 RPKM value of 0 as a threshold for considering a gene to be expressed. c-d, Immunoblots showing titration of Zld (c) or Grh (d) protein levels in stable cell lines. Two independently generated cell lines are shown and compared to 2-3 hour (H) embryos. 60,000 cells were loaded in each well, which is equivalent to the approximately 60,000 nuclei present in ten 2-3 hour embryos. Black arrowheads indicate Zld (c) or Grh (d). Gray arrowheads indicate background bands used to assess loading. e-g, Heatmaps comparing Zld (e), Grh (f), or Twi (g) ChIP-seq signal to control experiments in which anti-Zld, anti-Grh, anti-Twi, or IgG antibodies were used to perform immunoprecipitation in wild-type (WT) cells. ATAC-seq signal in wild-type cells is shown for reference. Heatmaps are ranked by mean intensity across all samples. For all boxplots, line shows the median, boxes extend from the 25th to the 75th percentile, and whiskers show 1.5 × the interquartile range. Outlier points beyond the range of the whiskers are shown individually.

Source data

Extended Data Fig. 2 Ectopic expression of pioneer factors in S2 cells leads to widespread changes to chromatin accessibility and gene expression.

a,d, Volcano plots showing changes in ATAC-seq signal in cells expressing Zld (a) or Grh (d) when compared to wild-type cells treated with the same concentration of CuSO4. For ATAC-seq, n = 2,777 decreased, 37,473 nonsignificant and 3,769 increased (Zld) and n = 8,948 decreased, 43,143 nonsignificant and 4,784 increased (Grh). b,e, RNA-seq volcano plots showing gene expression changes in cells expressing Zld (b) or Grh (e) when compared to wild-type cells treated with the same concentration of CuSO4. c,f, Violin plots showing the correlation between changes in chromatin accessibility and gene expression upon expression of Zld (c) or Grh (f). On the x-axis, all ATAC-seq peaks are grouped based on increased, decreased, or non-significant (ns) changes to chromatin accessibility in Zld- or Grh-expressing cells compared to wild-type cells. Groups were compared using a two-sided Wilcoxon rank sum test and Bonferroni-corrected p-values are shown. g,h, Bar plots showing enrichment of gene ontology terms in genes significantly upregulated upon expression of Zld (g) or Grh (h). For all boxplots, line shows the median, boxes extend from the 25th to the 75th percentile, and whiskers show 1.5 × the interquartile range. Outlier points beyond the range of the whiskers are shown individually. Statistical significance for all volcano plots was determined using DESeq2 (see methods).

Extended Data Fig. 3 Zld and Grh bind to chromatin rapidly after induction of protein expression.

a-b, Immunoblots showing time course of Zld (a) or Grh (b) protein expression following induction of stable cell lines. Black arrowheads indicate Zld (a) or Grh (b). Gray arrowheads indicate background bands used to assess loading. c-d, Metaplots showing average z-score normalized CUT&RUN signal at class I, II or III sites at different time points following induction of Zld (c) or Grh (d).

Source data

Extended Data Fig. 4 Twist binding leads to chromatin opening and transcriptional activation.

a, Immunoblot showing titration of Twi or HA-Twi protein levels in stable cell lines. Protein levels in stable cell lines are compared to 3-4 hour (H) old embryos. Black arrowhead indicates Twi or HA-Twi. Gray arrowhead indicates background band used to assess loading. b, Volcano plots showing changes in ATAC-seq signal in cells expressing Twi when compared to wild-type cells treated with the same concentration of CuSO4. c, RNA-seq volcano plots showing gene expression changes in cells expressing Twi when compared to wild-type cells treated with the same concentration of CuSO4. d, Bar plots showing enrichment of gene ontology terms in genes significantly upregulated upon expression of Twi. Statistical significance for all volcano plots was determined using DESeq2 (see methods).

Source data

Extended Data Fig. 5 Chromatin features associated with Zld, Grh and Twi binding sites.

a,c,e, Heatmaps showing the levels of different chromatin marks in class I, II or III regions for Zld (a), Grh (c) or Twi (e). HC, heterochromatin. TFs, transcription factors. The color represents the average z-score normalized read depth across a 1 kb region surrounding the center of class I, II or III ChIP-seq peaks. b,d,f, Example genome browser tracks for class III regions with high levels of H3K27me3 for Zld (b), Grh (d), or Twi (f).

Extended Data Fig. 6 Zld, Grh and Twi do not bind preferentially to motifs in a particular position on nucleosomes.

a-c, Metaplots showing average MNase signal from wild-type cells centered on motifs within class I, II, or III regions for Zld (a), Grh (b), or Twi (c). d-f, Heatmaps showing MNase signal centered on motifs within class II and III regions for Zld (d), Grh (e), or Twi (f). Rows are ordered based on hierarchical clustering to highlight the various patterns of MNase signal around motifs. Signal intensity corresponds to spike-in normalized read counts provided in the original study (ref. 60).

Extended Data Fig. 7 Zld, Grh and Twi display cell-type specific binding.

a-c, Venn diagrams showing overlap between ChIP-seq peaks identified in different tissues for Zld (a), Grh (b) or Twi (c). d-l, Genome browser tracks showing examples of class IV, V, and VI regions for Zld (d-f), Grh (g-i) or Twi (j-l). For each example, the top tracks show H3K27me3 and H3K9me3 signal over a larger region. Dashed gray lines indicate a zoomed-in region where Zld, Grh or Twi ChIP-seq signal is shown in S2 cells or in embryos.

Extended Data Fig. 8 Analysis of chromatin and motif content at class I-VI regions.

a, Immunoblot showing H3K27me3 levels in two replicates of DMSO- or tazemetostat-treated cells. Tubulin levels are shown as a loading control. b, Heatmap showing specificity of anti-H3K27me3 antibody in CUT&RUN reactions. A panel of barcoded spike-in nucleosomes bearing different modifications was added to each CUT&RUN reaction (see methods). For each sample, the heatmap displays the percentage of barcode reads for each sample and histone modification relative to the total number of barcode reads for all modifications. c-e, Heatmaps showing the levels of different chromatin marks in class I-VI regions for Zld (c), Grh (d) or Twi (e). HC, heterochromatin. TFs, transcription factors. f-h, Enrichment of known motifs in class I-VI sites for Zld (f), Grh (g) or Twi (h). Left heatmap shows normalized motif rank within each class, with 1 being more enriched and 0 being less enriched. Right heatmap shows the normalized expression (log2 RPKM) in RNA-seq datasets from each of the tissues that was analyzed.

Source data

Extended Data Fig. 9 Expression of Zld, Grh, or Twi at high protein levels results in chromatin opening at a small number of novel binding sites.

a-c, Immunoblots showing Zld (a), Grh (b) or Twi (c) protein levels when stable cell lines are induced using different concentrations of CuSO4. d-f, Bar plots showing the percentage of previously defined class I-VI binding sites that are bound by Zld (d), Grh (e), or Twi (f) when expressed at varying concentrations. g-i, Bar plots showing the percentage of previously defined class I-VI binding sites that overlap an ATAC-seq peak when Zld (g), Grh (h), or Twi (i) are expressed at varying concentrations. j, Heatmap showing z-score normalized, anti-Zld CUT&RUN data generated from either wild-type neural stem cells (Type II neuroblasts) or neural stem cells over-expressing Zld. Heatmaps are divided to show those peaks detected in wild-type vs. novel peaks that were only observed with over-expression of Zld. ATAC-seq data from wild-type neural stem cells is also shown. Heatmaps are ranked by ATAC-seq signal. k, Heatmap of adjusted p-values for the top de novo motifs enriched in either the endogenous or novel (Zld overexpression) Zld binding sites. Motif enrichment p-values were computed using AME (see methods).

Source data

Extended Data Fig. 10 Expression of Zld and Grh DNA-binding domains at protein levels comparable to the full-length proteins.

a-b, Immunoblots showing titration of protein levels for Zld (a) or Grh (b) DNA-binding domains to match expression of the full-length proteins. DNA-binding domain protein levels are shown at a range of CuSO4 concentrations and compared to an equivalent number of cells expressing full-length Zld or Grh at approximately physiological levels. The concentration of CuSO4 used to induce DBD expression for ChIP, ATAC and immunofluorescence experiments is indicated in red. c-d, Immunofluorescence microscopy images of stable cell lines expressing full-length protein or DBD only for Zld (c) or Grh (d). Stable cell lines are compared to wild-type (WT) cells. All scale bars are 5 µm.

Source data

Supplementary information

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Table 1. Classification of Zld, Grh and Twi ChIP–seq peaks. The table contains information about ChIP–seq peaks for Zld, Grh and Twi, including the peak location, information about peak calling by MACS2, classification of peaks as class I, II or III, information about differential accessibility as determined by DESeq2, assignment of peaks to the nearest gene, information about differential expression of nearby genes and motif content within each peak. Statistical significance was determined using MACS2 for peaks and DESeq2 for differential gene expression or accessibility. See the Methods for details.

Supplementary Table 2. Previously published genomics datasets that were analyzed in this work. This table provides the target of immunoprecipitation (for ChIP–seq datasets), the cell type and/or developmental stage, the Gene Expression Omnibus accession number (GSE) and the PubMed ID for the study in which the dataset was originally published.

Source data

Source Data Extended Data Fig. 1

Unprocessed western blots.

Source Data Extended Data Fig. 3

Unprocessed western blots.

Source Data Extended Data Fig. 4

Unprocessed western blots.

Source Data Extended Data Fig. 8

Unprocessed western blots.

Source Data Extended Data Fig. 9

Unprocessed western blots.

Source Data Extended Data Fig. 10

Unprocessed western blots and images.

