Introduction

The octameric DNA element (ATGCAAAT) was originally discovered in 1984 as a conserved nucleotide sequence upstream of immunoglobulin genes [1]. Functional DNA elements related to the octamer sequence were subsequently found in promoters of the ubiquitously expressed small nucleolar RNA and histone 2B genes, enhancers of the SV40 virus as well as in the adenovirus origin of replication [2,3,4,5]. Oct1 and Oct2 (initially termed NF-A1 and NF-A2 [6], OTF-1 and OTF-2 [7,8,9], OBP100 [5] or NFIII [10]) were simultaneously discovered by various labs in the search for the trans-acting protein factors utilizing octamer DNA to regulate gene expression [11,12,13,14]. At the same time, a factor was discovered as a transcriptional activator of the growth hormone and prolactin genes in the pituitary gland and termed Pit1 [15, 16]. Pit1, Oct1 and Oct2 contain two sequence motifs spanning a total of about 160 amino acids with homology to the C. elegans unc-86 gene [17]. This novel bipartite domain was, therefore, designated with the acronym POU (pronounced ‘pow’) after its founding family members Pit, Oct and Unc-86 [18]. One of the two sequence motifs is a 60-amino-acid region with homology to the homeodomain initially discovered in the fruit fly Drosophila [19, 20] and is, therefore, termed POU-homeodomain (POUHD). The second sequence motif, unknown at the time POU was discovered, is composed of 75 amino acids located N-terminally to the POUHD; it is specific for these proteins and is, therefore, named POU-specific (POUS) domain [18, 21]. A linker of variable length and sequence joins the POUS and POUHD. Mouse and human genomes encode 15 POU genes (Fig. 1a). POU TFs have essential roles in a wide array of cellular processes and gained particular prominence in studies demonstrating the reprogramming of somatic cells to pluripotency and the directed lineage reprogramming by trans-differentiation to neural precursor cells (NPCs) or post-mitotic neurons.
The bipartite architecture of the POU domain endows these TFs with the ability to associate with DNA in structurally diverse configurations. Several excellent reviews have discussed DNA binding by POU TFs and their function in mammalian development [22,23,24,25,26,27,28,29]. Here we provide an update on our current understanding of the molecular basis for the selective DNA recognition and context-dependent gene regulation by POU TFs and discuss these properties in the functional context of cellular reprogramming and chromatin remodeling.

Fig. 1

Classification and expression of POU TFs. a Dendrogram generated using full-length mouse POU protein sequences with T-Coffee (http://www.tcoffee.org/). Systematic names and commonly used synonyms are indicated for each of the 15 factors. The six classes are represented in different colors and the traditionally used separation into octamer and non-octamer binding factors is highlighted. b Expression of POU genes in eight developmental domains represented as mean Z-scores of expression for all cell types in a domain [34]. POU factors are color coded according to class membership. c Expression of POU genes in selected mouse cell and tissue types represented as log2 transformation of GC-normalised RNA-seq read count data per gene. Domain assignments of datasets are indicated and color-coded. EM embryonic, SE surface ectoderm, MS mesoderm, NC neural crest, GC germ cells, EN endoderm, NR neuroectoderm, BM blood mesoderm. Pou4f3 is absent from the expression table as it was barely detectable in any of the 272 analyzed cell and tissue types

Classification, nomenclature and expression

Like most other major transcription factor (TF) families, POU genes are present in the genomes of all metazoans, which marks them as part of the basic molecular toolkit driving animal evolution [30]. However, they are absent in plants and fungi. Many POU genes were identified in the late 1980s and early 1990s in parallel by several laboratories using different model organisms. POU factors were discovered either genetically or biochemically using gel shift assays with octamer DNA as probe and subsequent cloning of the associated gene. As a consequence, the nomenclature has been somewhat confusing, with alternative names for many family members. POU genes were grouped into six classes based on sequence similarities, denoted with Roman numerals I–VI [29, 31]. Individual genes are now unambiguously referred to with prefixes designating class membership from POU1 to POU6 (Pou1–Pou6 for mouse factors) [32]. However, the classical synonyms are still more common in the literature and will thus also be used in this review (Fig. 1a). POU classes I, III, IV and VI can be found in genomes of sponges and eumetazoans and were, therefore, present in the common ancestor of all living animals. The POU class II group evolved later in bilaterian evolution whilst class V is restricted to vertebrates [30]. Traditionally, POU genes were grouped based on their biochemical activity as octamer DNA binding (classes II, III, and V) and non-octamer binding (classes I, IV, VI) (Fig. 1a) [27]. Octamer binding factors were numbered from Oct1 to Oct11 based on the position of retarded DNA probes in electrophoretic mobility shift assays [33].

POU factors function pleiotropically in a wide range of cell types. To illustrate expression patterns of POU genes, we used a recently compiled collection of RNA sequencing (RNA-seq) data from 272 mouse cell and tissue types that were assigned to 8 broad developmental domains (Fig. 1b, c) [34]. In Fig. 1c, a selection of these cell and tissue types is shown where individual POU genes are strongly expressed. The first isolated family members belong to classes I (Pit1) or II (Oct1, Oct2 and the later discovered Oct11). Pit1 has been intensely studied for its function in the anterior pituitary gland [15, 16] but is also widely expressed in the blood mesoderm (Fig. 1b, c). Oct1 is ubiquitously expressed and can be detected in almost any cell type [5, 7, 35]. In accordance with early studies, Oct2 is strongly expressed in the blood mesoderm in particular B-lymphocytes and can also be found in cells of the neuroectoderm [6, 9, 13, 14, 36, 37]. Oct11 has the most restricted expression of class II POU genes and predominates in cells of the surface ectoderm such as the epidermis, skin and taste buds [38, 39]. Some members of classes III (Brn1, Brn2 and Brn4), IV (Brn3) and VI (Brn5) were originally discovered in adult brain tissue and hence designated Brn1-5 [40,41,42,43]. A fourth POU III class gene, Oct6, was initially detected in the rat testes (termed Tst-1) [41] and in glial cells (termed SCIP for suppressed cAMP inducible POU) [44] but was later found to be also expressed in the blastocyst and other lineages of the early embryo such as developing brain and skin [38, 45]. Class III POU genes are devoid of any introns. Brn2 is expressed in the neuroendocrine hypothalamus and pituitary [46, 47], Brn1 in the developing nervous system, hypothalamus and kidney [48, 49] while Brn4 is expressed in developing nervous system, hypothalamus, pituitary gland and inner ear [48, 49]. 
Genome-wide profiling underlines the prominent and specific expression of the four class III POU genes in many neuroectodermal cell types (Fig. 1b, c). Class IV has the closest homology to the C. elegans gene unc-86 and contains three members termed Brn3a [41, 50], Brn3b [51,52,53] and Brn3c [50, 54]. Although they were initially detected in neuroectodermal cell types, Brn3a is also expressed in the embryonic domain (Fig. 1c). Class VI comprises two genes that are either broadly expressed (Pou6f1/Brn5) or exhibit an overall rather weak expression (Pou6f2/RPF-1). Brn5 was initially discovered in the neocortex [40] and Pou6f2 in human retina cDNA libraries, thus called retina-derived POU-domain factor-1 (RPF-1) [55]. Class V consists of Pou5f1 (Oct4) and Pou5f2 (Sprm1). Oct4 is possibly the most prominent POU family member and was identified in embryonic and endometrial cancer cell lines by its ability to bind octamer DNA in electrophoretic mobility shift assays (EMSA) [48, 56, 57]. Several labs independently discovered Oct4 and used diverging numbering conventions, initially leading to the parallel use of the designations Oct3 or Oct4 (in some studies converging to Oct3/4) [48, 56,57,58]. Oct4 is a hallmark factor regulating the pluripotency of embryonic stem cells and is involved in the earliest cell fate decisions in mammalian development [59,60,61]. The expression of Oct4 is largely restricted to the embryonic domain and can barely be detected in somatic cell types (Fig. 1b, c). Several transcript isoforms of Oct4 were reported in human [62] and mouse [63]. The full-length Oct4 is termed Oct4A and a truncated isoform was named Oct4B [64]. The two versions of Oct4 differ in their expression patterns [65], gene regulation and self-renewal potential [64]. The second member of class V, Sprm1/Pou5f2, is transiently expressed in the testis [66] and is required for the development of male germ cells [67].
Class V has a complex evolutionary history with initially three members (Pou5f1, Pou5f2 and Pou5f3) present in different combinations in vertebrate clades [68]. A zebrafish homologue was initially named Pou2 and classified as a Pou5f1 orthologue [68, 69]. This was recently reinterpreted and Pou2 is now considered to be a Pou5f3 orthologue. Pou5f3 is present in marsupials, monotremes, birds, Xenopus, salamanders and teleost fishes but, unlike Pou5f1 and Pou5f2, it is absent in eutherians.

POU-erful reprogramming factors

The capacity of TFs to interconvert the state of somatic cells was first demonstrated when the overexpression of the helix–loop–helix (HLH) factor MyoD alone was shown to convert fibroblasts into myocytes [70]. Subsequently, the overexpression of C/EBPα or C/EBPβ was found to convert differentiated B-cells into macrophages [71]. Likewise, several POU factors are potent mediators of cellular reprogramming. This has been demonstrated by overexpressing transcription factor cocktails in somatic cells, which led to their reprogramming into pluripotent stem cells [72,73,74], proliferative neural precursor cells [75,76,77,78,79,80] and post-mitotic neurons [81,82,83,84,85] (Fig. 2a–d). However, the reprogramming trajectory and outcome depend on the identity of the POU factor used in these assays as well as on the external cues applied during reprogramming.

Fig. 2

POU factors used in pluripotency and lineage reprogramming cocktails. POU factor-containing cocktails used in pluripotency (a, b) or direct lineage (c, d) reprogramming of mouse (a, c) or human (b, d) cells. The starting cell type is represented in the center as sphere or box. The various reprogramming cocktails are shown beside the arrows. The size of the cell clusters in a and b schematically represents efficiency. The reprogramming cocktails in each section (a–d) are numbered and abbreviations and references are given below. a Mouse pluripotency reprogramming cocktails: 1 OSKM (O: Oct4; S: Sox2; K: Klf4; M: c-Myc) [73], 2 OKM + S1/S3 (Sox1/Sox3) [86], 3 OSM + K1/K2/K5 (Klf1/Klf2/Klf5) [86], 4 OSK + N-Myc/C-Myc/L-Myc [86], 5 OSK + Sall4 [237], 6 OSK + Glis1 [238], 7 OSK [86, 239], 8 OS + Esrrb [240], 9 OKM + Alk5 inhibitor [241], 10 OSM + Kenpaullone [242], 11 OS [243], 12 O + Bmp4 [244], 13 O + Bmi1 [245], 14 OK + BIX01294 + BayK8644 [246], 15 OK + BIX01294 + RG108 [246], 16 OK + BIX01294 [246], 17 O4 + VC6T (VC6T: valproic acid, CHIR99021, 616452, tranylcypromine) [247], 18 O + oxysterol and/or purmorphamine [245], 19 O4 + Shh [245], 20 *Oct6 (engineered Oct6) [87]/Brn4 [75], 21 DP (dermal papilla): OKM, OK, O [248, 249], 22 HPC (hematopoietic stem cells): OSKM [250], 23 ADC (adipose derived stem cells): OSKM [251], 24 TSC (trophoblast stem cells): OSKM [252], 25 NSC (neural stem cells): OKM [250], OK + BIX01294 [253], OK + 2i (2i denotes PD0325901 and CHIR99021) [254], OK [255], O [89], 26 melanocytes and melanoma cells: OKM [256], 27 hepatic EN (endoderm): [257], 28 B and T cells: [250, 258], 29 pancreatic B (beta) cells: [259], 30 myeloid PC (progenitor cells): [250], 31 skeletal MSC (muscle stem cells): [260], 32 Oct6-KSM [86], 33 Oct1-KSM [86].
b Human pluripotency reprogramming cocktails: 1 OSKM [72, 261, 262], 2 OSNL (N: Nanog; L: Lin28a) [74], 3 OSK + Sall4 [237], 4 OSK + Utf1 [263], 5 OSK + Glis1 [238], 6 OSK, OS [243], 7 OS + VPA (valproic acid) [264], 8 urine (renal epithelium cells): OSKM [95], 9 keratinocytes: OSKM [265], OSK [266], 10 MSC/dental (mesenchyme-like stem cells of dental origin): OSNL [267], 11 NSC: O [88], 12 Am-DC (amnion derived cells): OSN [268, 269], 13 SMA (spinal muscular atrophy): OSNL [90], 14 FD (familial dysautonomia): OSKM [91], 15 ALS (amyotrophic lateral sclerosis): OSKM [92], 16 Parkinson’s: OSKM, OSK [93], 17 Genetic diseases: OSKM, OSK [94], 18 CB (cord blood): OSNL [270], OSKM, OSK, OS [271], 19 hepatocytes: OSKM [272], 20 ADC (adipose derived cells): OSKM [251], OSK [273], 21 M and M (melanocytes and melanoma cells): OKM [256], 22 pancreatic beta cells: OSKM [274]. c Mouse lineage reprogramming cocktails: 1 BAM (Brn2, Ascl1, Myt1l) [105], 2 BAM [83, 85], 3 NPCs [77, 79], 4 pancreatic lineages [99], 5 cardiomyocytes [101], 6 cardiomyocytes (Oct4 alone) [84], 7 BKSM/OSKM [75], 8 BKSM + E47/Tcf3 [76], 9 Brn2 + Foxg1 + Sox2 [78]. d Human lineage reprogramming cocktails: 1 endothelial cells [102], 2 blood [103], 3 hepatic cells [100], 4 iNSC (induced neural stem cell) [80], 5 Brn2 + Neurod1 (β2) [82], 6 BAM or BAM + β2 or BM + β2 (β2: Neurod1) [82], 7 BAM + Lmx1a + Foxa2 [108], 8 Brn2 + Myt1l + miR-124 [81]

Somatic cell reprogramming to pluripotent stem cells

A hallmark accomplishment was the demonstration that self-renewing induced pluripotent stem cells (iPSCs) can be generated by de-differentiating fibroblasts through the forced expression of exogenously provided TF cocktails (Fig. 2a, b). To accomplish this feat, 24 factors were screened and a minimal set of four factors comprising Oct4, Sox2, Klf4 and c-Myc (OSKM) was identified [73]. The same cocktail was also able to achieve pluripotency reprogramming of human somatic cells [72]. The same year, a modified cocktail for generating human iPSCs was reported that also contained Sox2/Oct4 but in which Nanog/Lin28 replaced Klf4/c-Myc [74]. Oct4 could not be replaced by Oct1 or Oct6 to induce pluripotency despite profound sequence conservation, emphasizing its uniqueness [86]. However, Oct6 could recently be converted into an inducer of iPSCs using rational protein engineering [87]. Similarly, the neural factor Brn4 was reported to allow iPSC generation as part of an inducible polycistronic cassette consisting of Brn4, Sox2, Klf4 and c-Myc (BSKM) [75]. However, both engineered Oct6 and Brn4 induce iPSCs with very low efficiency (Fig. 2a). Therefore, what endows Oct4 with its potent reprogramming activity is still an open question. Besides fibroblasts, Oct4 also reprograms other cell types (summarized in Fig. 2). Oct4 alone could induce pluripotency in mouse and human NPCs [88, 89]. This is possible because the high endogenous expression of Sox2 in these cells obviates the need for its exogenous supply. A number of studies explored alternative reprogramming strategies centered on Oct4 where the SKM factors were replaced by small molecules or other factors (Fig. 2). The iPS technology has broad clinical applications, in particular in disease modeling, and iPSCs could be derived from patients affected by spinal muscular atrophy (SMA) [90], familial dysautonomia (FD) [91], amyotrophic lateral sclerosis (ALS) [92], Parkinson’s disease [93] and a variety of genetic diseases [94].
Oct4 has also been deployed to reprogram renal epithelial cells from urine in a quick and efficient way, omitting the need for invasive techniques to obtain patient samples [95]. In sum, Oct4 is a stalwart component of cocktails directing the induction of pluripotency in a large variety of reprogramming systems and technologies.

Lineage reprogramming to directly trans-differentiate somatic cells

Neural precursor cells (NPCs, here used to jointly include neural stem cells and neural progenitor cells following [96]) are proliferative and can, therefore, be expanded in culture and possess the capacity to differentiate into mature post-mitotic cells. If NPCs are tripotent they can form neurons as well as non-neuronal glia including oligodendrocytes and astrocytes. Induced NPCs (iNPCs) have initially been generated using the OSKM cocktail otherwise used to generate iPSCs but with modified culture conditions favoring neural lineages [77] (Fig. 2c, d). OSKM-driven production of iNPCs could also be achieved by limiting the exposure of reprogramming cells to Oct4 to the early stages [79]. Alternatively, modified cocktails were used to produce iNPCs where the pluripotency factor Oct4 was replaced with Brn4 or Brn2. First, Brn4 was used alongside Sox2, Klf4 and c-Myc and the reprogramming efficiency could be further improved by the addition of E47/Tcf3 [76]. Second, tripotent iNPCs could be generated using a Brn2/Sox2/FoxG1 cocktail [78]. This suggested that the reprogramming of fibroblasts takes fundamentally different routes depending on whether Oct4 or class III POU TFs are forcibly expressed in the starting cells. Apparently, POU TFs have profoundly different activities in an identical nuclear environment despite a high degree of sequence homology. However, recent lineage tracing experiments challenged this view and indicated that both OSKM and BSKM cocktails induce a pluripotent state [75]. Therefore, iNPCs may in fact transit through an intermediate pluripotent state rather than being directly reprogrammed. This also implies that exogenous Oct4 and class III POU TFs engage the fibroblast genome similarly to initiate reprogramming and that the direction of the cell state conversion relies on signaling provided at later stages.
To resolve this question, the genome engagement and associated epigenetic changes driven by Oct4 and class III POU TFs should be profiled side-by-side.

With the objective to generate differentiated lineages, Oct4-containing cocktails have been deployed in a strategy called cell activation and signaling directed (CASD) lineage reprogramming [97, 98]. Typically, a pulse of OSKM expression epigenetically de-differentiates the donor cells and induces a plastic state where cells are receptive for differentiation cues provided by small molecules or growth factors [99]. This strategy has been used to convert fibroblast cells into hepatic [100], cardiac [84, 101], neural [77], endothelial [102], pancreatic [99] and blood lineages [103]. However, lineage-tracing experiments have put into question whether this approach truly circumvents the pluripotent state [75, 104].

Besides the generation of iNPCs, class III POU TFs have also been deployed to produce post-mitotic neurons. Here, Brn2 along with Ascl1 and Myt1l (BAM, Fig. 2c, d) could convert dermal fibroblasts into neurons with surprisingly high efficiency and speed [83, 85]. Besides fibroblasts, the same cocktail could also reprogram endodermal hepatocytes [105] (Fig. 2c). Brn2 containing cocktails were also used to induce neurons from human cells using analogous strategies [81, 82, 106,107,108] (Fig. 2d). However, the efficiency in the human system is much lower and additional factors are required. Collectively, these studies demonstrate that the POU TF scaffold enables the potent interconversion of cellular states. However, individual POU TFs function non-redundantly and are mostly irreplaceable by other family members. Reprogramming experiments provide a powerful assay to delineate the sequence-function relationships defining the regulatory programs driven by POU factors.

DNA recognition

Monomeric binding and cross talk of POUS and POUHD

After the discovery of octamer DNA in a selected set of regulatory sequences, it was verified as the dominant recognition sequence for many POU TFs by several unbiased assays. Initially, the octamer was recovered as the preferred binding sequence for Oct1 in a random oligonucleotide selection study using the POU domain of Oct1 [109]. The relevance of octamer DNA to recruit POU TFs to chromatin was, for example, verified in a ChIP-seq study for Oct2 in splenic B cells [110]. To understand the structural basis for the recognition of the octamer, the structures of the 75 amino acid POUS [111, 112] and the 60 amino acid POUHD [113, 114] were initially analyzed separately by nuclear magnetic resonance (NMR) in the absence of DNA. The first crystal structure of a complete POU domain was solved for Oct1 bound to a classical octamer DNA element derived from the histone 2B promoter (Fig. 3a) [115]. These studies showed that the POU domain belongs to the class of all-alpha domains with four helices in the POUS and three helices in the POUHD. Helices 2 and 3 of the POUS adopt a helix-turn-helix (HTH) fold and bind DNA using amino acid–DNA base interactions nearly identical to HTH prototypes found in repressor proteins of the bacteriophages 434 and λ [111, 115]. The four amino acids used to make base-specific contacts are conserved in the POU family and in phage repressors, indicating that the fold is evolutionarily ancient. Analogously, helices 2 and 3 of the POUHD form a HTH unit. For both the POUS and the POUHD, helix 3 is inserted into the major groove of the DNA and contributes most of the residues involved in base-specific interactions (Fig. 3a–e). The overall fold of the POUHD and the residues engaged in base-specific contacts are very similar to classical homeodomains, including the invariant Asn51 that typically makes bidentate H-bonds with an adenine in the recognition sequence. Cys50 of the POUHD is a sequence variation characteristic for the POU family (Fig. 3b).
The POUHD contains an extended Arg-Lys rich N-terminal arm mediating minor groove contacts. Surprisingly, the POUHD and the POUS do not make any direct protein–protein contacts when binding the canonical octamer DNA and the two domains lie on opposite sides of the DNA. This suggested that both units are in fact independently acting domains with autonomous DNA binding activity. Consistently, the isolated POUS and POUHD can bind DNA sequence-specifically with the POUS selecting a [A/G]TAATNA and the POUHD a GAATAT[T/G]C consensus [109]. However, the affinity of the separated domains for these elements is lower than that of the intact POU for the full octamer. In the context of an intact POU-bound octameric DNA, the POUS binds the ATGC half-site and the POUHD the AAAT half-site [109, 115]. In particular, the POUS has only a moderate affinity for DNA by itself, but is highly sequence-specific. Further studies showed that two-nucleotide spacers between ATGC and AAAT half-sites are permitted [116]. Despite the lack of direct interactions, POUS and POUHD mutually influence each other’s DNA binding by a mechanism termed DNA mediated cooperativity. Here, allosteric changes to the structure of the DNA indirectly facilitate the recruitment of partner factors. This mechanism has been reported for a range of dimeric TF associations [117,118,119] and the POU TFs present an intriguing example of this binding mode for covalently coupled domains [120]. Consistently, high-throughput protein-binding-microarrays (PBMs) verified the full octamer as well as the POUHD half-site as being part of the binding landscape of Pit1 and Oct1 even in the context of the bipartite POU [121, 122]. The authors highlighted Oct1 as an example of a TF that is able to associate with DNA in multiple binding modes leading to primary, secondary and tertiary DNA binding motifs. TFs with several structural units such as the POU appear to be particularly well suited to accommodate alternative DNA sequences.
NMR studies led to an interesting proposal of how the modular make-up of the POU TFs facilitates their search for functional target sites in the genome. These studies demonstrated that the POUS and the POUHD of Oct1 could bind non-consecutive DNA elements independently and thereby tether unlinked DNA molecules [123]. This observation inspired a model where the POU scans the genome with the more tightly bound POUHD remaining DNA associated whilst the detached POUS acts as a molecular antenna and samples proximal binding sites. If suitable alternative sites are encountered a process termed intersegment transfer via intermolecular translocation ensues.
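
The half-site arrangement described above (ATGC contacted by the POUS, AAAT by the POUHD, with short spacers tolerated) can be illustrated with a minimal sequence scan. This is only a sketch: the function names are hypothetical, the search uses exact half-site matches rather than a weight matrix, and allowing any spacer from 0 up to `max_spacer` nucleotides is an assumption made for illustration, not a statement about measured affinities.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_octamer_sites(seq, max_spacer=2):
    """Scan both strands for ATGC-(spacer)-AAAT.

    Returns (start, strand, spacer_length, site) tuples, where start is the
    0-based position of the leftmost base of the site on the given sequence.
    Assumption: any spacer of 0..max_spacer nucleotides is accepted.
    """
    seq = seq.upper()
    # The lookahead lets overlapping sites be reported; the lazy quantifier
    # reports the shortest acceptable spacer at each start position.
    pattern = re.compile(r"(?=(ATGC([ACGT]{0,%d}?)AAAT))" % max_spacer)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            site = m.group(1)
            if strand == "+":
                start = m.start()
            else:
                # Convert the minus-strand coordinate back to the input sequence.
                start = len(seq) - m.start() - len(site)
            hits.append((start, strand, len(m.group(2)), site))
    return hits


if __name__ == "__main__":
    # Toy sequences, not taken from any genome.
    print(find_octamer_sites("GGATGCAAATCC"))
    print(find_octamer_sites("CCATTTGCATGG"))
```

In the first toy sequence the octamer is found on the given strand and in the second on the opposite strand, which is why a scan of this kind has to consider both orientations.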

Fig. 3

Structural basis for monomeric and dimeric DNA recognition. a Structure of the Oct1-POU bound as monomer to octamer DNA (PDB ID: 1OCT [115]). The POU-specific (POUS) domain is colored green, the POU-homeodomain (POUHD) orange and the linker magenta. b Sequence logo representing the POUS (upper panel) and the POUHD (lower panel) of all 15 mouse POU proteins generated using weblogo (http://weblogo.berkeley.edu/). DNA contact residues, interaction interfaces on MORE and PORE DNA and selected phosphorylation sites are indicated. The two asterisks mark residue 22 of the POUHD, shown to switch the function of Oct1/2 and Brn3a/Brn3b, and residue 59 of the POUHD, which switches the preference of Oct4/Oct6 for MORE DNA, respectively [227, 229]. c Sequences of the hypervariable linker connecting the POUS and the POUHD. d Structure of the Oct4 homodimer bound to a PORE DNA element (PDB ID: 3L1P [186]) and e of the Oct6 homodimer bound to MORE DNA (PDB ID: 2XSD [131]). The POU-specific (POUS) domain and the POU-homeodomain (POUHD) are colored as in a with lighter shades for molecule 1 (M1) and darker shades for molecule 2 (M2) of the POU homodimers. N and C termini are marked

DNA dependent partnerships

POU TFs engage in protein interaction networks that likely change depending on the bound DNA sequences, leading to the recognition of variable sequence signatures in chromatin (Fig. 4a–c). These interactions are expected to be critical for the cell-context specific function of POU TFs. The protein interactions are not restricted to the nuclear compartment but also influence other important processes such as intracellular transport and targeted degradation. Here we focus on the protein interactions that are supported by dedicated biochemical experiments or structural analysis and that influence DNA recognition and target gene selection.

Fig. 4

DNA motifs from genome-wide studies. Position weight matrices (PWMs) resembling a the octamer, b MORE and c canonical/compressed SoxOct elements. PWMs were downloaded from the HOMER database (http://homer.ucsd.edu/homer/index.html) originating from ChIP-seq studies for Oct2 (GSE21512); Oct4 (GSE11431); Oct6, Brn1, Brn2 (GSE35496); Pit1 (GSE58009); canonical SoxOct (GSE11431); compressed SoxOct (GSE44553). Cartoons depict the expected configurations of POUs bound to these sequences. Cartoons for the PORE and SoxOct + 3 bp configuration are shown as reference but the corresponding PWMs are not represented in the database. Selected genes regulated by variant SoxOct elements are listed. The Sox2-HMG is brown and the Sox17-HMG red

Homotypic dimerization

We use the term homotypic interactions for homodimers or heterodimers between paralogous POU TFs. POU factors are a relatively rare example of proteins forming facultative DNA dependent dimers in a versatile range of configurations mediated by their DNA binding domain. Other TF families such as basic helix–loop–helix factors (bHLH, e.g., Myc, Ascl1) or leucine zipper factors form obligate DNA independent dimers. In early studies cooperative homodimeric binding of Oct1 and Oct2 was observed on composite sequences with a low-affinity heptameric sequence preceding the classical octamer separated by a 2 bp spacer (CTCATGAATATGCAAAT) [124,125,126]. If the heptamer was replaced by a second octamer in the reverse orientation cooperative dimerization persisted [125]. Using random oligonucleotide selection, Brn2 of the POU III class was found to bind a sequence with flexible spacing between the palindromic half-sites [127]. Brn2 as well as Brn3a (Pou4f1, in [127] termed Brn-3.0) were found to homodimerise on this sequence in a highly cooperative fashion. A structural basis for the homotypic POU dimerization was first provided for Pit1 bound to a PitD DNA element (ATGTATATACAT) [128]. In an effort to test whether the configuration of Pit1 can also be adopted by octamer binding POU TFs, Tomilin et al. converted this element into a perfect palindrome by introducing nucleotides favored by the POUS on octamer DNA [129]. This led to the identification of the More palindromic Oct factor Recognition Element (MORE) with a consensus ATG(C/A)AT(A/T)0–2AT(G/T)CAT that resembles both PitD as well as the sequence preferred by Brn2 [127,128,129,130]. On MORE DNA, homodimers and heterodimers are formed cooperatively by Oct1, Oct2, Oct4 and Oct6 and a half-site spacing of up to 2 base-pairs is tolerated [129]. Structural studies confirmed that the binding mode of Oct1 [130] and Oct6 [131] to MORE is very similar to Pit1/PitD [128, 132].
Closer inspection of the above-mentioned composite heptamer/octamer element revealed that it in fact also represents a degenerate MORE variant (ATGaATATGCAa, positions deviating from the MORE consensus in lower case) [129]. Further, POU TFs form homodimers on an imperfect palindrome termed PORE (Palindromic Octamer Recognition Element, ATTTGAAATGCAAAT) discovered in the enhancer region of the Osteopontin gene [133]. Effective Oct4 dimerization on the PORE is necessary for transactivation. Crystallographic analysis revealed that Oct1 binds MORE and PORE DNA in markedly different configurations [130]. On PORE DNA, the POU subdomains of one molecule bind major grooves on opposite sides of the DNA. By contrast, on MORE DNA the POU rearranges such that the two domains bind adjacent major grooves on the same face of the DNA (Fig. 3d, e). As a consequence, alternative protein contact interfaces of the POUS and the POUHD mediate the dimeric interactions (Fig. 3b). The functional importance of the MORE motifs in gene regulation in a chromatin context has recently been supported by ChIP-seq analysis for Pit1 [134] and class III TFs [135] (Fig. 4b). However, the PORE has been recovered only once, with moderate enrichment levels, in a re-analysis of ChIP-seq data of mouse embryonic stem cells (ESCs) [136], suggesting that the bona fide PORE consensus is rarely utilized in a chromatin context. Nevertheless, PORE-like dimer configurations could well be a common mode of chromatin engagement by POU factors, as non-DNA binding co-factors can compensate for the degeneration of the PORE consensus. A particularly intriguing example of how sequence requirements of the POU for its DNA target elements can be relieved is provided by the B cell specific co-factor OBF1 (OcaB, Bob1). OBF1 forms a complex with POU dimers bound to PORE DNA and reduces the dissociation rate, leading to a profound stabilization of the DNA bound complex [137].
Strikingly, mutations to the POU protein and to the PORE DNA recognition element that normally obstruct dimeric binding can be reversed by OBF1 leading to protein–DNA complexes indistinguishable from the wild-type situation. Apparently, OBF1 alleviates sequence requirements in both the POU domain as well as in the DNA recognition elements including its actual sequence and the half-site spacing. This mechanism adds an additional layer of control and permits the recognition of target genes controlled by DNA elements that strongly diverge from consensus POU recognition sequences that would not be bound in the absence of the co-factor.
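
The sequence features of the dimer elements discussed above can be made concrete with a short sketch. It checks how far an element deviates from a perfect palindrome (a sequence identical to its own reverse complement) and tests a sequence against the MORE consensus ATG(C/A)AT(A/T)0–2AT(G/T)CAT as quoted in the text. The function names are hypothetical and the strict consensus match is an illustrative assumption; elements bound in vivo can deviate from it.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# MORE consensus as quoted in the text; the captured group is the spacer
# of 0-2 A/T nucleotides between the two half-sites.
MORE_PATTERN = re.compile(r"ATG[CA]AT([AT]{0,2})AT[GT]CAT")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def palindrome_mismatches(seq):
    """Count positions at which seq differs from its reverse complement.

    0 means a perfect palindrome. For odd-length sequences the central
    position always counts as a mismatch, because no base pairs with itself.
    """
    seq = seq.upper()
    return sum(a != b for a, b in zip(seq, reverse_complement(seq)))


def more_spacer(seq):
    """Return the spacer length if seq matches the MORE consensus exactly,
    otherwise None."""
    m = MORE_PATTERN.fullmatch(seq.upper())
    return None if m is None else len(m.group(1))


if __name__ == "__main__":
    canonical_more = "ATGAATATTCAT"   # canonical MORE from the text
    pore = "ATTTGAAATGCAAAT"          # PORE from the text
    print(palindrome_mismatches(canonical_more), more_spacer(canonical_more))
    print(palindrome_mismatches(pore), more_spacer(pore))
```

With these definitions the canonical MORE is a perfect palindrome with no spacer, whereas the PORE differs from its reverse complement at three positions and does not match the MORE consensus, in line with its description as an imperfect palindrome.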

To identify structural elements that set POU factors biochemically and functionally apart, models to quantify homodimer cooperativity factors [138] were used to compare POU homodimerisation on a canonical MORE element (ATGAATATTCAT) for factors from POU classes I (Pit1), II (Oct1), III (Brn2 and Oct6), V (Oct4) and VI (Brn5) [87]. Whilst all tested POU TFs are able to form homodimers on the MORE, the extent of the cooperativity differs substantially. All tested somatic POU TFs (Pit1, Oct1, Brn2, Oct6 and Brn5) homodimerise on the MORE with stronger cooperativity factors than the pluripotency factor Oct4. Structural analysis revealed the basis for this difference. A single amino acid at the C-terminal position 59 of the POUHD (corresponding to position 151 of the Oct4 POU) interacts with a hydrophobic pocket of the POUS located at the opposite face of the DNA (Fig. 3b). Oct4 encodes a polar serine but Oct6 a hydrophobic methionine at this position [87, 130, 131]. When these residues are swapped, the resulting Oct6M59S shows a reduction in cooperativity on MORE whereas Oct4S59M now homodimerises very efficiently, reminiscent of wild-type Oct6. The Oct6M59S mutation in combination with further modifications can convert Oct6 into a pluripotency reprogramming factor [87]. This suggests that the ability to associate on MORE DNA in a cooperative manner evolved to functionally distinguish POU family members and in particular Oct4. Consistently, POUHD residue 59 is part of a distinctive dipeptide that defines the branches of the POU family, suggesting it could be under positive selection leading to family specific functionality [30].
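
To make the notion of a homodimer cooperativity factor more tangible, the following sketch derives it from a simple equilibrium model. It is not necessarily the model used in [138]; it assumes a DNA element with two equivalent half-sites, each binding one POU molecule with the same intrinsic affinity, and a factor ω that multiplies the statistical weight of the doubly bound state. The relative weights of free DNA, monomer-bound DNA and dimer-bound DNA are then 1 : 2x : ωx² (x being protein concentration times the intrinsic association constant), which rearranges to ω = 4 · f_dimer · f_free / f_monomer². All names below are hypothetical.

```python
def band_fractions(x, omega):
    """Fractions of free, monomer-bound and dimer-bound DNA.

    x     -- protein concentration times the intrinsic association constant
             of one half-site (dimensionless)
    omega -- cooperativity factor (1 = independent binding)
    Assumption: two equivalent half-sites, hence the factor 2 for the monomer.
    """
    weights = (1.0, 2.0 * x, omega * x * x)
    total = sum(weights)
    return tuple(w / total for w in weights)


def homodimer_cooperativity(f_free, f_monomer, f_dimer):
    """Estimate omega from the fractional band intensities of a gel shift
    experiment under the two-equivalent-site model described above."""
    if f_monomer <= 0:
        raise ValueError("monomer fraction must be positive")
    return 4.0 * f_dimer * f_free / f_monomer ** 2


if __name__ == "__main__":
    # Round trip with made-up numbers: simulate fractions, then recover omega.
    fractions = band_fractions(x=0.5, omega=10.0)
    print(fractions)
    print(homodimer_cooperativity(*fractions))
```

Under this model ω = 1 corresponds to independent binding of the two molecules, ω > 1 to positive cooperativity, and the estimate does not depend on the protein concentration as long as all three species are measurable.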

High-throughput SELEX (systematic evolution of ligands by exponential enrichment) reiterated the complexity of DNA recognition by POU TFs [139, 140]. Many of the profiled POU factors were found to share partially overlapping sites that can be arranged in a tiled pattern with overlapping half-sites and alternative spacing. These studies recovered a number of motifs resembling the canonical octamer as well as the dimerization-promoting MORE elements. Yet, a number of the reported elements suggest potentially novel binding configurations for which the stoichiometry is not obvious. Collectively, homotypic POU dimers critically contribute to the function of these proteins and such binding configurations are more common than initially assumed.

Heterotypic dimerization

The most prominent and intensely studied heterodimer partners of the POU family are from the Sry-related box (Sox) family. The Sox family consists of 20 members and possesses a high mobility group (HMG, 79 amino acids) DNA binding domain associating with a CATTGTC consensus via the minor groove of the DNA [141,142,143,144,145,146]. A DNA-dependent interaction between Oct4 and Sox2 was first described for an enhancer in the last exon of the Fgf4 gene [61, 147, 148]. Transactivation relies on Sox2/Oct4 dimerization on a composite DNA element with a 3 base-pair spacer between Sox and octamer sites (CTTTGTTtggATGCTAAT, spacer nucleotides in small cap). The Sox2/Oct4 dimer functions non-redundantly, as paralogous POU and Sox family members could not activate Fgf4 [61, 147]. Subsequently, co-operative binding of Oct4 and Sox2 on composite SoxOct elements was found to regulate Utf1 [149], Lefty1 [150], Fbx15 [151], Nanog [152, 153] as well as to auto-regulate Sox2 [154] and Oct4 [155, 156]. However, in all these cases the 3 bp spacer found in the Fgf4 enhancer was eliminated leading to a more compact composite element.

Microarray and deep sequencing enabled the genome-wide profiling of POU TFs and pluripotent cells were intensely scrutinized using these technologies with the consequence that Oct4 is the best studied family member. Initially, Oct4 binding to transcription start site (TSS) upstream regions was profiled in human ESCs using DNA microarrays (ChIP-on-ChIP) [157]. This study revealed a high degree of co-occupancy of Sox2, Oct4 and Nanog. A year later Oct4 binding was profiled in mouse ESCs using the chromatin immunoprecipitation paired-end ditag method and de novo motif discovery revealed the composite SoxOct motif as the most enriched sequence signature [158]. This suggested that DNA dependent heterodimerisation of Oct4 with Sox2 on the enhancers of pluripotency genes is a very frequent event. The advent of the more comprehensive chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) technique again led to the identification of a composite DNA element composed of the Sox half site juxtaposed by the Octamer element (CATTGTCATGCAAAT, henceforth termed canonical SoxOct motif). This sequence signature was found to be the most enriched de novo motif not just for Sox2 and Oct4 but also for Nanog, Smad1 and p300 datasets [159] (Fig. 4c). Clearly, Sox2/Oct4 heterodimers play essential roles to assemble regulatory complexes that regulate 100s if not 1000s of genes conferring the pluripotent phenotype. The SoxOct element was also the top motif in ChIP-seq studies using human ESCs [160]. Nevertheless, the cistrome of Oct4 is poorly conserved in mouse and human indicating a high degree of rewiring of cis-regulatory networks in mammalian evolution mainly driven by transposable elements [160]. The notion that the cooperative formation of Sox2/Oct4 heterodimers is essential for pluripotency was reinforced with rationally designed mutants that do not affect the monomeric DNA binding of Sox2 or Oct4 but disrupt DNA dependent heterodimerisation. 
Mutations that disrupt dimerization on the canonical SoxOct elements, whether introduced into Sox2 [161, 162] or into Oct4 [87], prevent pluripotency reprogramming. Mutations interfering with Sox2/Oct4 heterodimerisation on the Fgf4-like sequence with a 3-bp spacer reduce reprogramming efficiency but do not completely disrupt the process [162].

The genome-wide profiling of Oct4 binding at different stages of pluripotency reprogramming in mouse cells showed that the genome engagement of Oct4 is highly dynamic [163, 164]. These studies indicate that Oct4 is not hitting its binding sites in pluripotency enhancers ‘on-target’ but initially binds mostly somatic enhancers and switches to pluripotency enhancers at later stages of the 1–2 week procedure. Only a small proportion of sites are constitutively bound. The correlation of Oct4 binding with histone marks revealed that enhancer activation happens in a stepwise manner whereby Oct4 binding is preceded by H3K4me1 (priming mark) and followed by H3K27ac (activation mark) [163]. The activation of early engaged and constitutively bound pluripotency enhancers requires Oct4, Sox2 and Klf4 (OSK) to work in concert with somatic TFs (Runx1, Fra1, Cebpa, Cebpb) that are replaced by pluripotency related TFs later on [164]. By contrast, the co-binding of Sox2/Oct4 and Esrrb facilitates the activation of late bound pluripotency enhancers [164]. These studies indicate that intricate stage-specific heterotypic partnerships of Oct4 enable differentiated cells to attain pluripotency. A single molecule imaging study proposed that chromatin engagement of Sox2 and Oct4 is hierarchically ordered, with Sox2 binding first and assisting the subsequent recruitment of Oct4 [165]. This suggests that Sox2 facilitates the target selection whilst Oct4 predominantly functions to stabilize the heterodimer complex, extending its residence time [165]. As these observations were made in 3T3 cells, it is unclear whether a similar mechanism applies to pluripotency reprogramming and whether the binding hierarchy is different for closed versus open chromatin.

A high throughput consecutive affinity-purification-SELEX (CAP-SELEX) approach enabled the profiling of TF dimerization using target sequences with 40 randomized base pairs in a scalable manner [166]. This way, a number of co-motifs for POU TFs could be identified including the canonical SoxOct composite motif. Additional heterodimers were predicted for Oct1/POU2F1 with GSC2, TBX21, EOMES, ELK1, ETV1, SOX15, HOXB13 and SOX2. The OCT1/GSC2 dimer showed a strictly constrained arrangement of the two half sites with regard to spacing and order of the TFs. It will be interesting to probe whether these dimers also occur in the context of chromatin and mediate POU functions.

Is there a SoxOct partner code?

The Sox2/Oct4 heterodimer is one of the most prominent examples of heterotypic TF partnerships for its relevance in the gene regulatory network inducing and maintaining pluripotency. The abundance and the strict physical constraint of this complex lead to its repeated discovery in enhancer sequences. A series of additional Sox/Oct heterodimers were reported giving rise to the notion that there could be a Sox/Oct partner code at the heart of gene regulatory networks in early development [167,168,169]. A Sox/Oct partner code presumes two possible scenarios. First, alternative heterodimers prefer different composite DNA motifs and hence select specific sets of target genes. Monomeric factors lack the sequence selectivity and affinity to target functionally relevant gene sets. Second, alternative dimers may retain similar preference for DNA target sites. Yet, molecular events and functional outcome of a binding event may change by turning an activating regulatory complex into a repressive one by means of direct competition. For example, a Sox10/Oct6 dimer was reported to selectively synergize in glial cells [170, 171]. Yet, the DNA sequence requirements for this partnership could not be worked out. In NPCs, Brn2 and Sox2 partner on a non-canonical SoxOct composite element to drive the expression of the Nestin gene [172]. However, genome wide studies did not support a broad application of this dimer configuration. The model of a switch of Sox2 from Oct4 to the neural class III POU factors is particularly appealing as Sox2 functions prominently in both pluripotent as well as neural lineages. Indeed, a switch from Oct4 to Brn1/2 on identical enhancer locations was suggested to maintain Sox2 expression in pluripotent and neural cells [173]. The partnership of Sox2 and class III POU TFs was further explored in genome-wide studies. 
Initially, Brn2 and Sox2 were reported to co-bind the canonical SoxOct element in NPCs, reminiscent of the Sox2/Oct4 heterodimer in ESCs [174]. The differentiation of ESCs to NPCs led to the co-recruitment of Sox2/Brn2 to NPC-specific enhancers. Conversely, overexpression of Oct4 in NPCs led to the generation of iPSCs, presumably by re-directing Sox2 from neural to pluripotency enhancers [88, 89]. Whilst these studies indicated that the DNA sequence signature facilitating SoxOct heterodimerisation in ESCs versus NPC/NSCs is indistinguishable, with the canonical SoxOct motif at its core, the set of bound genes is very different. Consistently, Sox2/Oct4 and Sox2/POUIII dimers (Oct6, Brn2 and Brn4) showed an indistinguishable cooperativity pattern when 324 composite DNA sequences were profiled in a recent cooperativity-by-sequencing (Coop-seq) study [175]. Similarly, EMSA measurements showed that the cooperativity constant for Sox2/Oct4 dimerisation was only about twofold higher than for Sox2/Oct6 dimerization on canonical SoxOct DNA [87]. This poses the question as to the molecular basis for the selection of unique gene sets by Sox2/Oct4 versus Sox2/POUIII complexes. A variation to the SoxOct element in the POUHD bound portion found in the Utf1 (ATGCTAGA) sequence had been attributed to this difference as only Sox2/Oct4 but not Sox2/Oct6 could effectively dimerize on this sequence [176]. However, genome-wide analysis has so far not detected an obvious difference at this position. Mistri et al. also profiled the binding of Oct6, Brn1 and Brn2 in mouse NPCs [135]. Contrary to Lodato et al., here the homodimerisation-promoting MORE was identified as the preferred motif for all three class III POU factors. In support of this finding, motif scanning with the MORE using data from Lodato et al. could detect the MORE in a large fraction of Brn2 binding sites [87, 135]. 
Analogously, re-analysis of Brn2 ChIP-seq data during the reprogramming of MEFs to neurons showed a strong enrichment of the MORE [85, 87]. Collectively, a picture emerges that differential binding to MORE sequences appears to contribute more strongly to the selection of unique sets of target genes by Oct4 and class III POUs whereas the capacity to dimerise with Sox2 on canonical SoxOct is shared. In another study, novel configurations of SoxOct elements were searched for in the enhancers of ESCs and a variation of the canonical SoxOct was discovered to show a modest level of enrichment [161]. Here, a single nucleotide separating Sox and Oct half-sites is eliminated and this sequence was hence designated ‘compressed’ SoxOct element (CATTGTATGCAAAT). Surprisingly, Sox2 and Oct4 are unable to co-bind this sequence whilst Sox17 and Oct4 dimerize very efficiently on it [161, 175, 177]. ChIP-seq verified the functional relevance of the compressed SoxOct motif after the overexpression of Sox17 in ESCs as well as after the retinoic acid induced endodermal differentiation of embryonic carcinoma cells [178]. Structural analysis elucidated the molecular basis for the differential propensities of Sox2/Oct4 and Sox17/Oct4 dimers to select canonical or compressed DNA elements. Residue 57 of helix 3 of the HMG box encodes a basic lysine in Sox2 but an acidic glutamate in Sox17 [141]. Grafting the Sox2 lysine into Sox17 to generate the engineered Sox17EK produced a protein that cooperates more strongly than Sox2 with Oct4 on the canonical SoxOct element. By contrast, Sox17EK loses the capacity to dimerize with Oct4 on the compressed element in vitro and in the context of chromatin [161, 178, 179]. The functional relevance of this finding was further tested in pluripotency reprogramming. Only Sox1, Sox3 and Sox15 can replace Sox2 in this experiment but Sox17 and other Sox family members cannot [86, 180]. 
Strikingly, Sox17EK and the analogous Sox7EK now acquire the capacity to generate iPSCs in mouse and human cells and substantially outperform wild-type Sox2 [161, 180]. Consistently, Sox17EK is able to support the self-renewal of ESCs depleted of Sox2 whilst Sox2KE loses this capacity [181]. This provides a compelling example how the subtle features of single amino acids in trans and of single nucleotides in cis define gene regulatory networks during embryogenesis. Moreover, this insight provides a molecular mechanism for the context dependent switch of the regulatory activity of Oct4. Relatedly, Sox17 has recently been found to be a key regulator of the differentiation of primordial germ cells (PGCs) from naïve pluripotent cells in human, but not mouse [182]. As Oct4 is expressed in pluripotent cells as well as PGC it will be of interest to explore whether an analogous Sox/Oct partner switch that relies on canonical and compressed motifs guides the redistribution from pluripotency to PGC enhancers.

Collectively, Sox factors evolved molecular interfaces allowing their selective association with POU TFs in the context of alternative composite DNA sequences, two of which (canonical and compressed SoxOct DNA elements) are widespread in developmental enhancers. By contrast, whether or not POU TFs evolved similarly selective interfaces is presently unclear.
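The relationship between the canonical and compressed SoxOct elements can be made concrete with a small motif scan. The sketch below is illustrative only: it uses exact string matching on both strands rather than the position weight matrices used in genome-wide analyses, and the toy sequence and function names are hypothetical. The two motif strings are the ones quoted in the text.

```python
CANONICAL = "CATTGTCATGCAAAT"   # canonical SoxOct motif
COMPRESSED = "CATTGTATGCAAAT"   # compressed SoxOct motif

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (start, strand) for every exact motif match on either strand."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Deleting the seventh nucleotide of the canonical motif, the one
# separating the Sox and Oct half-sites, yields the compressed motif.
derived_compressed = CANONICAL[:6] + CANONICAL[7:]

# Hypothetical toy enhancer: a canonical element on the forward strand
# and a compressed element on the reverse strand.
toy = "GG" + CANONICAL + "TTTT" + reverse_complement(COMPRESSED) + "CC"
canonical_hits = find_motif(toy, CANONICAL)
compressed_hits = find_motif(toy, COMPRESSED)
print(canonical_hits, compressed_hits)
```

Because the two elements differ by a single deleted nucleotide, a scan must treat them as distinct patterns; a match to one does not imply a match to the other.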

Binding to methylated DNA

DNA methylation changes profoundly during the conversion of MEFs to iPSCs and the DNA-methylation level was initially reported to be negatively correlated with Oct4 binding [163]. Another study reported that Oct4 binding is independent of the DNA-methylation status at early stages of reprogramming, whereas at late stages Oct4 binding coincides with unmethylated DNA at pluripotency enhancers [183]. To directly examine the impact of cytosine methylation on DNA binding by TFs, the high-throughput methylation-sensitive SELEX was developed [140]. It is generally thought that DNA methylation serves as a barrier to TF–DNA interactions and that the remodelling of CpG methylation is essential for cell fate transition. Indeed, a large fraction of the profiled TFs showed sensitivity to enzymatically introduced methyl moieties at CpG dinucleotides. However, for a subset of TFs, DNA methylation facilitates rather than blocks binding. In particular, many homeodomain-containing TFs, including Oct4, show a preference for methylated DNA. This study identified a tertiary Oct4 motif comprising a palindrome of two POUS half-sites (ATGCGCAT). SELEX enrichment of this element is increased upon methylation, suggesting that Oct4 preferentially targets it in the methylated state. The preference of Oct4 to bind this element in its methylated form was validated using ChIP-seq combined with whole-genome bisulfite sequencing in mouse embryonic stem cells where the DNA de-methylating enzymes Tet1-2 were knocked out, leading to an accumulation of methylated DNA. Interestingly, as the ability to bind methylated CpG DNA is largely restricted to TFs involved in embryonic development such as homeodomain TFs, the potency to bind methylated DNA may underlie their ability for epigenetic remodeling leading to cell fate switching.
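Two sequence properties of the ATGCGCAT element mentioned above can be checked directly: it is its own reverse complement (a palindrome of two ATGC half-sites), and it contains a central CpG dinucleotide that can carry the methyl mark, which the classical octamer (ATGCAAAT) lacks. The following is a minimal sketch with hypothetical helper names, not an analysis pipeline from the cited study.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def cpg_positions(seq: str) -> list[int]:
    """Zero-based start positions of CpG dinucleotides on the given strand."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


METHYL_MOTIF = "ATGCGCAT"  # palindromic POUS half-site element
OCTAMER = "ATGCAAAT"       # classical octamer, for comparison

for name, motif in (("ATGCGCAT", METHYL_MOTIF), ("octamer", OCTAMER)):
    print(name, is_palindromic(motif), cpg_positions(motif))
```

Because the element is palindromic, the CpG occupies the same position on both strands, so symmetric methylation would modify both half-sites equally.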

The role of the linker

The length of the linker varies for different POU TFs (Fig. 3c). Structures of higher order POU/DNA complexes demonstrated a striking flexibility to accommodate alternative DNA sequences and to adopt very different conformations induced by the DNA (Fig. 3d, e). This versatility is enabled by the variable and structurally flexible linker that allows for the spatial reorganization of the two domains, endowing POU TFs with the ability to adopt diverse quaternary structures. The linker was not visible in earlier structures [115, 128, 130, 131, 142, 184]. However, the linker could be modeled for Brn5-POU bound to a non-octamer DNA sequence derived from the corticotrophin-releasing hormone promoter [185]. The relative orientation of the POUS and POUHD is flipped in Brn5 in comparison to the arrangement seen for Oct1 on the octamer element. Here, the linker extends helix 4 of the POUS by 7.5 Å. The extended helix is followed by a sharp turn and an additional short helix. Similarly, a large part of the linker was visible in the Oct4-POU structure on the PORE DNA element whilst it was invisible in the Oct1/PORE complex [186]. The linker of Oct4 forms an additional alpha-helix with amino acids unique to Oct4 but conserved amongst its orthologues. This finding led to the conclusion that the Oct4 linker contributes to the distinct role of Oct4 in pluripotency reprogramming, presumably by recruiting specific co-activators [186]. Consistently, mutations to the linker were found to affect the induction of pluripotency [186,187,188]. Similarly, when the Oct4 linker is replaced by the Oct6 linker, the resulting Oct4linkOct6 loses the competency to generate iPSCs [186]. However, given the poor sequence conservation, the definition of the linker is ambiguous. More recently the boundaries of the linker were redefined taking structural information into account. 
The new definition removed an additional charge from the chimeric Oct4linkOct6 protein where the linker borders the POUHD (Oct4linkOct6: QGKRKKR, Oct6 derived residues underlined) and enabled iPSC generation [87] whilst a version containing this additional charge was impaired (Oct4linkOct6: QGRKRKKR) [186]. Collectively, diversity in DNA recognition by POU TFs is largely achieved by modifications to the linker and to residues mediating the intra- and intermolecular cooperativity. It is these regions that set POU proteins apart and allow them to assemble into versatile dimeric configurations to bind alternative DNA elements and regulate non-redundant sets of genes, leading to different regulatory outcomes.

Pioneering activity: a unique competency of Oct4?

During development the chromatin states and associated gene expression programs are successively changed until terminally differentiated cells are formed. Reprogramming experiments by forced TF expression have shown that a limited set of factors is capable of inducing the remodeling of chromatin even of terminally differentiated cells leading to changed gene expression programs and cell state conversions. It has been proposed that only a selected set of TFs that could access closed chromatin compacted by nucleosomes and higher order chromosomal assemblies is capable of this feat. These molecules were termed ‘pioneer TFs’ because they are required for the initial engagement of closed chromatin leading to the subsequent recruitment of non-pioneering TFs that would not be able to access these sites themselves [189]. The pioneer TF concept was proposed initially from studies on FoxA1 during the differentiation of mouse endodermal tissue to mature hepatocytes [190]. FoxA1 initially engages closed chromatin followed by the recruitment of companion TFs and gene activation. FoxA1 was also found to be able to bookmark highly compact chromatin during mitosis [191]. The DNA binding domain of FoxA1 and other forkhead TFs is of the winged–helix type [192]. This fold closely resembles the structure of the linker histone H1 [193, 194]. H1 has a major role in the compaction of histone octamers and the formation of higher order nucleosomal structures such as the 30 nm fiber [195, 196]. This similarity is attributed to the ability of FoxA1 to open nucleosomal arrays compacted by H1 as it can directly compete with H1 [197]. However, more recently additional TFs have been reported to possess pioneering activity including Oct4. 
The integration of micrococcal nuclease sequencing (MNase-seq) and ChIP-seq data showed that Oct4 in combination with Klf4 and Sox2 binds DNA that was initially covered by nucleosomes at the onset of pluripotency reprogramming of human cells, which is thought to be causative for rapid opening (Fig. 5a) [198]. These data were later supported by binding assays with in vitro reconstituted nucleosomes showing that Oct4, Sox2 and Klf4 bind nucleosome-associated DNA with similar affinity as free DNA [199]. Motif discovery suggested that rather than binding full motifs, Oct4 only binds shortened degenerate versions of its classical binding sequence corresponding to POUS and POUHD half-sites. Soufi et al. reasoned this is because in the context of the full octamer Oct4 binds DNA with its sub-domains arranged on opposite faces (Fig. 3a). As this would lead to steric interference with histone proteins, only individual half-sites can be exposed and are accessible on the nucleosome surface. It will be interesting to test this model structurally or using binding assays with nucleosome-associated DNA having different motif compositions. Regardless, structural similarity to the linker histone does not appear to be a defining feature of pioneer TFs. An extensive analysis by the Plath laboratory of the epigenetic changes during reprogramming of mouse embryonic fibroblasts to induced pluripotent stem cells came to a different conclusion as to the pioneering activity of Oct4 [164]. Here, it was reported that Oct4 and its companion reprogramming TFs predominantly and promiscuously bind pre-opened chromatin of somatic enhancers and promoters at early reprogramming stages, leading to the silencing of somatic genes. By contrast, pluripotency enhancers are bound and opened dynamically in a step-wise fashion in cooperation with somatic TFs rather than by a directed pioneering process starting immediately at the onset of reprogramming. 
Further, the starting chromatin state was suggested not to be predictive for the pluripotency enhancers the TFs will eventually target. In this view, reprogramming TFs do not actively trigger chromatin opening but rather passively follow and bind pluripotency enhancers after they were opened by an unexplained mechanism. It was concluded that there could be species-specific differences in the reprogramming process in mouse and human. A related study by the laboratory of Jacob Hanna performed a high-resolution assessment of the epigenetic dynamics of pluripotency reprogramming in mouse [200]. The cells reprogram near-deterministically within 8 days because of the depletion of the reprogramming barriers Mbd3 and Gatad2a of the NuRD complex. In this study, 74% of enhancers bound by OSK at day 1 are closed in MEFs. However, full DNA motifs rather than half motifs were discovered in these presumably nucleosome-covered regions. Likewise, two additional studies conducted using mouse cells emphasized that Sox2 and Oct4 have the ability to target compacted chromatin rather than preferring locations pre-opened in MEFs. First, a study by Jose Polo’s laboratory used FACS (fluorescence-activated cell sorting) to separate cells successfully progressing to pluripotency from cells that fail. The authors find that 75% of Oct4 and Sox2 binding occurs at sites that are closed in MEFs but are opened at day 3 of reprogramming, implying an active role of Sox2/Oct4 in the opening process [183]. Second, using a highly efficient chemically defined reprogramming system, Li and colleagues used time course ATAC-seq to show that the majority of somatic enhancers open in MEFs are devoid of Sox and POU binding motifs whereas a large proportion of sites that undergo a closed-to-open transition contain matches to these motifs [201]. 
Collectively, these studies support a model where Oct4, Sox2 and Klf4 actively induce epigenetic switching of pluripotency enhancers and directly induce chromatin opening, consistent with the pioneering model. Nevertheless, the discrepancies in the interpretations from these complementary studies, which could be due to the different reprogramming systems or to alternative data analysis strategies, should be resolved. Likewise, the cis-regulatory codes dictating the engagement of closed versus open chromatin remain unclear and require further investigation.

Fig. 5

Molecular basis for context-dependent functions. a According to current models Oct4 is able to bind nucleosomal DNA and induce opening whilst Brn2 can only bind chromatin pre-opened by factors such as Ascl1 [85, 198, 199]. The colored rectangles denote cognate binding sites. b Pit1 bound as homodimer to a compact binding site in the prolactin (Prl) promoter results in transcriptional activation whilst the introduction of a TT spacer (marked in red) in the growth hormone (GH) promoter results in the recruitment of the NCoR co-repressor [132]. c Binding of Oct1 as homodimer to the PORE element recruits OBF1 to activate transcription whilst the homodimeric configuration on MORE impairs OBF1 recruitment and reduces the transcriptional response [129, 130]. d–g Single amino acid exchanges can switch functions of closely related paralogs. d Interchanging position 22 of the POUHD between Oct1 and Oct2 equips Oct2 with the capacity to recruit VP-16 and regulate transcription reminiscent of Oct1 [227, 228]. e An analogous exchange between Brn3a and Brn3b was reported to switch them from transcriptional activators to repressors on certain response elements [229]. f Oct4 co-binds with Sox2 to canonical SoxOct elements to control the transcription of pluripotency related genes but in the presence of Sox17, Oct4 is re-distributed to enhancers earmarked by compressed SoxOct elements to regulate extra-embryonic endoderm (XEN) genes [178]. Point mutations at the Oct4 interaction interface of Sox2 or Sox17 change binding and function of the resulting DNA dependent heterodimers. g Oct4 prefers heterodimerisation with Sox2 on canonical SoxOct elements whereas Oct6 prefers to form homodimers on MORE elements. A single amino acid swap between Oct4 and Oct6 changes the binding preferences of the two proteins [87]. Solid green arrows represent context-dependent activities of wild type proteins; asterisks mark mutations and dashed arrows represent newly acquired functions

The pioneering question was also addressed for Brn2 (class III POU) during the highly efficient conversion of fibroblasts to mature neurons as part of the Brn2, Ascl1, Myt1l (BAM) cocktail [85] (Fig. 2c, d). Genomic binding analysis suggested that the HLH TF Ascl1 binds neural enhancers immediately after its forcible introduction into MEFs and was hence termed ‘on-target’ pioneer [85]. In contrast, Brn2 shows little chromatin engagement at the beginning of reprogramming and associates with relevant binding sites only after Ascl1 has opened them. Thus, intriguingly, despite profound structural similarity, Oct4 is believed to function as a pioneer but Brn2 is not (Fig. 5a). A careful analysis of nucleosome binding and opening by Oct4 versus Brn2 should be carried out to test this model and to search for the molecular features endowing Oct4, but not Brn2, with pioneering capacity. It is possible that modified preferences for DNA target sequences influence the ability to bind and remodel nucleosomes even of closely related factors. Alternatively, there may not be a clear-cut separation between pioneer and non-pioneer TFs. Further, whether ATP-dependent remodeling complexes are required for the opening process is currently debated [202, 203]. In conclusion, the molecular mechanism by which POU TFs engage and remodel chromatin to direct changes of cell states awaits further interrogation.

Different DNA binding modes direct alternative regulatory outcomes

Rather than forming rigid and structurally inert ‘enhanceosomes’ [204], POU factors are more likely to engage in flexible and dynamically changing partnerships in the context of different cis-regulatory regions. This versatility probably underlies context-dependent regulatory outcomes entailing the activation or repression of nearby genes but also including more complex epigenetic processes that do not lead to obvious effects on gene expression, such as the maintenance of a ‘bivalent’ chromatin state. A consensus view is that most TFs regulate their target genes by triggering the formation of a loop between enhancers and promoters mediated by large molecular machineries such as the mediator, facilitated by cohesin [205, 206]. Bivalent chromatin refers to the co-occurrence of ‘repressive’ H3K27me3 and ‘active’ H3K4me3 chromatin modifications, which are particularly prominent at promoters of developmental genes in pluripotent cells [159, 207,208,209]. Proteomic studies indicated that Oct4 interacts with the machineries depositing the activation mark H3K4me3 as well as those depleting the repressive mark H3K27me3, probably assisted by WDR5 (H3K4me3 reader) and UTX (H3K27 demethylase) [210,211,212]. Working out how DNA sequences allosterically affect the recruitment of these various complexes is crucial to reveal the molecular underpinnings for the context dependent functions of POU TFs. Alternative binding modes occur on the level of binary and ternary POU/DNA complexes encompassing homotypic and heterotypic interactions. Switched binding configurations were demonstrated by structures of Oct1 [130], Oct6 [131] and Pit1 [128, 132] on PORE or MORE DNA leading to strikingly different homodimeric assemblies. In contrast to binary POU/DNA complexes on octamer DNA that lack protein–protein interactions, such contacts exist in the ternary complexes on PORE and MORE. This raised the question as to whether different DNA binding modes lead to different regulatory outcomes.

During the development of the pituitary gland Pit1 regulates both lactotrope and somatotrope programs from a common primordium by activating the prolactin or growth hormone genes, respectively [132]. The prolactin (Prl-1P) element resembles a degenerate MORE without spacer whilst the growth hormone element (GH-1) possesses a version with a 2 base pair spacer. When the spacer is deleted from a GH-1 reporter, expression is detected in lactotropes rather than somatotropes, reminiscent of wild-type Prl-1P [132]. The context specific activation of the growth hormone in somatotrope cells but not in lactotropes was suggested to be mediated by differential recruitment of N-CoR (nuclear corepressor complex), which is sensitive to half-site spacing (Fig. 5b). Analogously, Oct1 dimers on PORE were shown to interact with the coactivator OBF-1 but this interaction does not occur when Oct1 is bound to the MORE because Oct1 residues required for OBF-1 recruitment are unavailable on the MORE as they shape the homodimer interface (Fig. 5c) [129]. Collectively, the assembly of subdomains on composite DNA elements as well as the spacing between half-sites critically influence cofactor recruitment and the transcriptional consequences of a binding event.

An interesting concept is that the preference for DNA sequences, target gene selection and the regulatory outcome of POU TFs can be influenced by post-translational modifications. This could allow a single POU TF to execute different regulatory programs in the same cell without epigenetic changes or the need for new co-factors but simply in response to intracellular signaling. POU factors are subjected to a variety of post-translational modifications including phosphorylation [213,214,215,216,217,218], O-GlcNAcylation [219,220,221], SUMOylation [222, 223] and ubiquitinylation [220, 224, 225]. Phosphorylation of the Oct4-POUS was suggested to more severely affect transcription from genes controlled by the PORE compared to MORE and SoxOct dependent targets [226]. In another study, cellular stress induced Oct1 phosphorylation at Ser385 of the POUHD was found to be deleterious for monomeric binding to the octamer but favors the association with promoter proximal MOREs [214]. Additionally, mutations of several potential phosphorylation sites including the sites corresponding to Oct1/Ser385 in Oct4 resulted in the complete abolishment of its reprogramming activity [187].

Outlook

In this review, we attempted to provide an in-depth summary of our present understanding of how the ability of POU factors to bind DNA and chromatin in different configurations and with different partner factors relates to their capacity to reprogram cell states. We particularly emphasised the differences between paralogous POU genes and the unique ability of Oct4 to direct pluripotency reprogramming. The POU family is a particularly captivating group of TFs because of its astonishing plasticity to bind a large set of composite DNA sequences either as homodimers or heterodimers. This plasticity likely dictates the selection of specific gene sets, the nature of the regulatory outcomes following binding, and changes to chromatin dynamics. This in turn determines context-specific roles in embryonic development and during cellular reprogramming. Classical sequence comparison and mutagenesis studies revealed that even very subtle changes to the POU domain could have a tremendous functional impact. For example, residue 22 in helix 1 of the POUHD was found to be the sole determinant for differential gene regulation by Oct1 versus Oct2 and Brn3b versus Brn3a in some systems (Fig. 5d, e). The mere swap of this residue between Oct2 and Oct1 endows Oct2 with the ability to recruit the VP16 co-activator to positively regulate transcription [227, 228]. Analogously, an isoleucine at this position is essential for the repressor function of Brn3b, whilst a valine as found in Brn3a leads to transactivation [229]. Interchanging this residue swaps the regulatory activity of Brn3a and Brn3b. Residue 22 is solvent exposed and thus does not affect DNA binding or the dimeric configuration. Likewise, single amino acid swaps between Sox2 and Sox17 as well as between Oct4 and Oct6 can rebalance the association with heterotypic or homotypic partner proteins, thus switching their activities in pluripotency reprogramming (Fig. 5f, g) [87, 161].
These insights reinforce the notion that the structural properties that functionally discriminate POU TFs are predominantly provided by their DNA binding domains. Yet, overall sequence-function relationships, both on the level of the trans-acting POU factors and the cis DNA elements they are targeting, remain poorly understood. Deep mutational scanning [230], with a focus on sites that were shown to set POU paralogs functionally and biochemically apart, provides a powerful approach to systematically define the key amino acids that led to the biochemical and functional diversification of POU TFs. Moreover, the side-by-side profiling of the epigenetic changes accompanying pluripotency and lineage reprogramming driven by paralogous POU factors provides a means to study the functional consequences of sequence diversification. The POU domain mediates not only DNA recognition but also the DNA dependent partnership with other TFs as well as the selective recruitment of co-activators and repressors, which we are only beginning to work out. A number of proteomics studies have begun to address this problem with a focus on Oct4 in embryonic stem cells [231,232,233] and have started to reveal the underpinnings of the cross-talk of Oct4 with epigenetic modifiers and the transcriptional machinery. However, these studies revealed a surprisingly limited overlap between the identified sets of interaction partners in embryonic stem cells [231,232,233]. Future studies should contrast the interactomes of paralogous POU factors and define how the context of DNA binding sites changes the recruitment of co-activators. Here, studying how partner recruitment depends on the various DNA elements targeted by POU TFs (SoxOct, MORE, PORE) could reveal important insights into the context-specific regulatory programs directed by POU TFs. Ideally, such experiments should be carried out in the context of native chromatin.
Future work should also address whether different binding modalities can switch regulatory responses. For example, delayed versus immediate transcriptional responses could depend on the DNA-bound configuration of POU TFs. Likewise, transient and sustained transcriptional responses could be modulated in this manner. High-throughput enhancer reporter assays such as STARR-seq (self-transcribing active regulatory region sequencing) [234] provide a method to study these questions systematically.

A molecular interpretation of genomic, epigenetic and functional data is impeded because the available structural information on POU TFs is currently limited to their DNA binding domains bound to a small set of DNA binding sites as monomers, homodimers or heterodimers. Structures of full-length POU proteins or higher order regulatory complexes are not available. Obtaining such structures may remain challenging in the years to come because of the large extent of structurally disordered portions outside of the POU domain and the highly dynamic and DNA context dependent assembly of TF complexes. Such complexes can likely be assembled in vitro only once the DNA sequence requirements and the protein–protein interaction surfaces have been further refined. Biochemical and genomic assays demonstrated that POU factors can bind nucleosomes with high affinity [199]. The cis-regulatory context that facilitates binding of POU factors to nucleosome bound DNA, which is considered to be the first step eventually leading to an increase in the accessibility of such genomic regions, has not been resolved so far. Structures of TF-nucleosome complexes are not available at present, thus precluding a molecular understanding of the pioneering process. However, advances in the study of higher order nucleosomal complexes by electron microscopy could put the goal of revealing the structural basis for TF-nucleosome recognition and pioneering activity within reach [195, 196, 235]. A more detailed dissection of the sequence-function relationships for POU factors and of the structural basis for DNA, co-factor and chromatin recognition could invigorate efforts to switch or enhance the reprogramming activities of POU factors. As a consequence, we expect structurally informed protein engineering to further advance reprogramming technologies by the design of artificial TFs based on the versatile scaffold of POU TFs [236].