Main

The differentiation of pluripotent stem cells to haematopoietic lineages generates robust erythroid–myeloid lineage-restricted progenitors but not HSCs. This pattern bears marked similarities to early haematopoietic ontogeny. We hypothesized that the same epigenetic factors actively repress multipotency in embryogenesis and differentiation from pluripotent stem cells. To identify these factors, we adopted a loss-of-function screen using lentivirally delivered short hairpin RNAs (shRNAs) that target 20 DNA- and histone-modifying factors (Extended Data Fig. 1a, Supplementary Table 1). Erythroid–myeloid progenitors differentiated from human pluripotent stem cells marked by CD34 and CD45 were expanded with five transcription factors (5F). They retained embryonic features, including lack of lymphoid potential7, and this enabled us to screen for reactivation of lymphoid potential as a measure of multipotency. 5F cells were transduced with individual shRNAs and screened for T cell potential on OP9-DL1 stromal cells (Fig. 1a). The knockdown of six factors independently enhanced CD4+CD8+ T cell potential from 5F cells (Fig. 1b, Extended Data Fig. 1b).

Figure 1: In vitro screen for epigenetic modifiers that restrict lymphoid potential.

a, Scheme for human pluripotent stem-cell differentiation into haematopoietic progenitors. CD34+ cells were transduced with the five transcription factors (5F) HOXA9, ERG, RORA, SOX4 and MYB. 5F cells were then transduced with individual shRNAs (×4 each) that targeted each epigenetic modifier and seeded onto OP9-DL1 stromal cells to induce T cell differentiation. Dox, doxycycline; EB, embryoid body. b, Strictly standardized mean difference (SSMD) of CD4+CD8+ T cell frequencies across all four shRNAs targeting each epigenetic modifier in 5F cells using two iPS cell lines, CD45-iPS and MSC-iPS1, in two independent experiments. MT, methyltransferase. c, Prospective analysis of T and B cell frequencies from 5F cells plus shRNA targeting top candidates (n = 2 biological replicates). d, Flow analysis of CD4+CD8+ T cell development of 5F cells with shRNAs targeting luciferase (shLUC) or EZH1 (shEZH1) after 5 weeks of differentiation on OP9-DL1 stromal cells. e, Flow analysis of CD19+ B cell potential. f, Quantification of T cell potential of 5F plus shEZH1 cells compared to 5F plus shLUC cells pooled across two hairpins and five independent experiments (n = 10) using several iPS cell lines (CD34-iPS, CD45-iPS and MSC-iPS1). Individual values obtained for each hairpin are shown in the Source Data. ***P = 0.001 by unpaired two-tailed t-test. g, Quantification of colony-forming potential in three independent experiments. E, erythroid; GM, granulocyte, monocyte; M, monocyte; G, granulocyte; GEMM, granulocyte, erythroid, monocyte, megakaryocyte. h, i, Flow analysis of myeloid (CD11b+) (h) and erythroid (CD71+GLYA+) (i) potential. Experiments replicated at least twice. Data are mean ± s.e.m.


Prospective validation revealed that only EZH1 knockdown (using shEZH1) elicited robust T (16.3 ± 7.4%; mean ± s.e.m.) and B (22.5 ± 7.3%) cell potential (Fig. 1c–e), compared to shRNAs targeting a control luciferase gene (shLUC) (T cell 0.002 ± 0.002%; B cell 0.022 ± 0.006%) across multiple induced pluripotent stem (iPS) cell lines (Fig. 1f). EZH1-deficient cells retained erythroid–myeloid potential as shown by colony-forming assays (Fig. 1g) and flow cytometry (Fig. 1h, i). EZH1 knockdown also promoted lymphoid potential independently of the five transcription factors, as evidenced by robust T cell differentiation from naive CD34+ haemogenic endothelial cells (26.1 ± 16.5% shEZH1 versus 2.3 ± 0.4% shLUC) (Extended Data Fig. 1c). Further characterization was not possible owing to the limited proliferation of pluripotent stem and haemogenic endothelial cells. By contrast, 5F cells expanded exponentially (Extended Data Fig. 1d) and showed increased CD34+ progenitors after shEZH1 transduction (78.8 ± 14.2% versus 29.3 ± 10.0%) (Extended Data Fig. 1e). Taken together, these data show that EZH1 knockdown activates multipotency in lineage-restricted embryonic haematopoietic progenitors.

EZH1 is a component of the Polycomb repressive complex 2 (PRC2), which mediates epigenetic silencing of genes via methylation of lysine residue 27 of histone H3 (ref. 8). To dissect the role of PRC2 in repressing haematopoietic multipotency, we assessed T cell differentiation upon depletion of each PRC2 subunit. In addition to EZH1, SUZ12 knockdown also enhanced T cell potential, albeit to a lesser extent. By contrast, knockdown of EED or EZH2 had no effect on T cell potential, and dual EZH1 and EZH2 knockdown phenocopied EZH2 depletion (Fig. 2a, b). To determine whether the catalytic SET domain was required, we overexpressed full-length mouse Ezh1 or mutant Ezh1 lacking the SET domain (mEzh1ΔSET) (Fig. 2c). Overexpression of mouse Ezh1 completely abrogated T cell potential in shEZH1 cells, whereas the mutant mEzh1ΔSET did not (Fig. 2c, d, Extended Data Fig. 2d–g). Furthermore, overexpression of mouse Ezh2 failed to suppress T cell potential, despite the remarkable homology of the SET domains (Extended Data Fig. 2e, h, i). These data show that specific inhibition of EZH1, rather than antagonism of canonical PRC2, unlocks lymphoid potential, and that the catalytic SET domain is required for this function.

Figure 2: Repression of canonical PRC2 subunits does not activate lymphoid potential.

a, Representative flow plots of T cell potential of 5F cells with shRNAs targeting individual components of PRC2. b, Quantification of T cell potential of 5F cells plus shRNA targeting the indicated subunit in a, using two hairpins across two independent experiments (n = 4). *P = 0.0457, **P = 0.0061 by unpaired two-tailed t-test. c, Representative flow analysis of T cell potential in 5F cells plus shEZH1, with co-expression of full-length mouse Ezh1 (mEzh1) or mutant mouse Ezh1 lacking the SET domain (mEzh1ΔSET, +ΔSET). d, Quantification of flow analysis in c (n = 3 biological replicates). *P = 0.0146, **P = 0.0011 by one-way ANOVA. All plots are gated on CD45+. Data are pooled across two independent experiments. Data are mean ± s.e.m. NS, not significant.


To understand the molecular changes upon EZH1 knockdown, we performed RNA sequencing (RNA-seq), assay for transposase-accessible chromatin using sequencing (ATAC–seq) and chromatin immunoprecipitation followed by sequencing (ChIP–seq). Upregulated genes after EZH1 knockdown were enriched for biological processes such as defence response (P = 6.8 × 10⁻⁹), immune response (P = 1.2 × 10⁻⁷) and T cell co-stimulation (P = 0.03) (Fig. 3a, b). Human haematopoietic gene signatures9, such as of HSCs (stem), multi-lymphoid progenitors (MLP) (early lymphoid) and ProB, were highly enriched in shEZH1 cells, consistent with stem and lymphoid potential (Fig. 3c). We also performed RNA-seq and ATAC–seq on emergent haematopoietic stem and progenitor cells (HSPCs) at 10.5 days post-coitum10,11,12 from the yolk sac and aorta-gonad-mesonephros (AGM) of wild-type, Ezh1+/− and Ezh1−/− mouse embryos (Fig. 4a). Interestingly, in wild-type embryos, the expression of Ezh1 was lower in the AGM than in the yolk sac, whereas Ezh2 and Eed were higher in the AGM (Fig. 4b). Notably, Ezh1 deficiency in vivo also induced genes enriched for angiogenesis, haematopoietic/lymphoid development and immune system processes (Extended Data Fig. 3a–d).

Figure 3: EZH1 directly binds to and modulates expression and chromatin accessibility of HSC and lymphoid genes.

a, Heat map of upregulated (104) and downregulated (49) genes (>2-fold; Benjamini–Hochberg corrected t-test, P < 0.1) from RNA-seq analysis of CD34+CD38− HSPCs from 5F plus shEZH1 cells (n = 10 biological replicates) compared to 5F plus shLUC cells (n = 8 biological replicates). b, Gene Ontology (GO) analysis of biological processes associated with significantly upregulated genes in a, subdivided by GO hierarchical categories with P values labelled along the radius. c, Enrichment of human HSC and progenitor signatures by gene set enrichment analysis (GSEA) in 5F plus shEZH1 compared with 5F plus shLUC cells, overlaid on the map of human HSPC hierarchy. CMP, common myeloid progenitor; MEP, megakaryocyte–erythroid progenitor; GMP, granulocyte–monocyte progenitor; ETP, early thymic progenitor; NES, normalized enrichment score. d, Density map of upregulated and downregulated ATAC peaks by MAnorm29 in 5F plus shEZH1 compared to 5F plus shLUC cells (n = 2 biological replicates). e, GO terms of enriched biological processes of ATAC peaks in d by GREAT analysis30. f, Tracks of representative genes that acquire a significant ATAC peak upon EZH1 knockdown. g, ChIP–seq density map of EZH1 peaks within bivalent (B), repressed (R), active (A) or null (N) promoter groups (n = 2 biological replicates). K4, H3K4me3; K27, H3K27me3. h, Waterfall plot of CellNet31 predicted regulators of EZH1-bound bivalent gene networks. TF, transcription factor. i, Sitepro quantitative analysis32 of H3K27me3 levels at all upregulated genes around the transcription start site (TSS) upon EZH1 knockdown, relative to shLUC (n = 2 biological replicates). j, Left, Sankey diagram illustrating histone methylation changes of all bivalent genes in shLUC control cells and after EZH1 knockdown (n = 2 biological replicates). Right, genes that lose H3K27me3 (become activated) are specifically enriched in the HSC signature, whereas bivalent genes that are unchanged or inactivated are enriched in the ProB signature by Fisher’s exact test. k, ChIP–seq tracks of EZH1, H3K4me3 and H3K27me3 at representative HSC promoter regions in shLUC and shEZH1 cells. Experiments replicated at least twice.


Figure 4: Ezh1 deficiency increases lymphoid potential and engraftment of embryonic HSPCs.

a, Representative images of E9.5 and E10.5 embryos (n > 50 embryos). b, Quantitative PCR (qPCR) of each PRC2 subunit in E10.5 wild-type yolk sac (YS) and AGM (n = 3 biological replicates). *P = 0.0439, ****P < 0.0001 by unpaired two-tailed t-test. Data are mean ± s.e.m. BM, bone marrow. c, ATAC density map of c-Kit+VE-cadherin+CD45+ HSPCs sorted from 30 pooled embryos of E10.5 wild-type (WT) and Ezh1+/− AGM. d, Significantly upregulated ATAC peaks were compared to HSPC, T and B cell networks and signatures of the human HSPC hierarchy45. *P < 0.05 by Fisher’s exact test. e, Left, engraftment of E10.5 AGM (3.5 ee) in sublethally irradiated adult NSG females. Donor chimaerism marked by CD45.2+ was measured in peripheral blood every 4 weeks up to 16 weeks post-transplantation. Each dot represents a single transplant recipient; lines denote mean values. Right, lineage distribution of engrafted mice showing T cell (T), B cell (B), and myeloid (M) contribution. f, Left, engraftment of E10.5 yolk sac (5 ee). Right, lineage distribution of engrafted mice. g, Left, engraftment of E9.5 PSP (10 ee). Right, lineage distribution of engrafted mice. h, Left, serial transplantation of whole bone marrow from primary recipients of E10.5 AGM cells in e. Secondary transplant (2°) was carried out after 24 weeks of primary transplant. Right, lineage distribution of engrafted mice (n ≥ 3 mice per group). *P < 0.05, **P < 0.01, ***P < 0.0001 by unpaired two-tailed t-test. See Supplementary Information for exact P values per time point. Data are pooled across four (e–g) or three (h) independent experiments; experiment in c was performed once.


Regions of increased chromatin accessibility (1,610 ATAC peaks) in shEZH1 cells exhibited concomitantly increased gene expression upon EZH1 knockdown and were associated with T cell development and lymphocyte activation pathways, as well as HSC, HSC/MLP, B and T cell signatures (Fig. 3d, e, Extended Data Fig. 3e–g). EZH1 knockdown also increased accessibility to HSC/lymphoid transcription factors, such as HLF, FOXO1 and ARID5B13,14,15 (Fig. 3f). Downregulated peaks were enriched for alternative developmental processes and importantly, embryonic haematopoiesis (Fig. 3e, Extended Data Fig. 3e). In vivo, upregulated ATAC peaks in Ezh1-deficient AGM cells were enriched for immune response, T cell activation, lymphocyte differentiation pathways, as well as HSC and HSC/MLP signatures (Fig. 4a, c, d, Extended Data Fig. 3h, i); furthermore, Ezh1 deficiency increased accessibility to target genes of master haematopoietic transcription factors, including Runx1 (Extended Data Fig. 3k, l).

We hypothesized that these molecular changes upon EZH1 knockdown were mediated by bivalent, or poised, chromatin domains, often implicated in the control of developmentally regulated genes16. Consistent with previous reports, EZH1 was broadly associated with repressive (H3K27me3), bivalent (H3K27me3 and H3K4me3) and active (H3K4me3) histone methylation marks17,18 (Fig. 3g, Extended Data Fig. 4a). Although active genes were associated with housekeeping functions (Extended Data Fig. 4b), EZH1-bound bivalent and repressed genes were enriched for developmental and morphogenic processes (Extended Data Fig. 4c, d). EZH1 knockdown increased the expression of bivalent genes, which were associated with HSC and early lymphoid lineages (Extended Data Fig. 4e, f). These genes included the targets of HSC transcription factors such as RUNX1T1 and SOX17, and NOTCH factors HES1, HEY1 and FOXC219 (Fig. 3h). EZH1 directly bound the promoters of HSC and ProB transcription factors including HLF, PRDM16, LMO2, ETS1, MEIS1, RUNX1 and HOX clusters (Extended Data Fig. 4e). We also observed a global reciprocal relationship between H3K27me3 and gene transcription (Fig. 3i, Extended Data Fig. 4g–k), with poised HSC genes exhibiting loss of H3K27me3 and increased expression upon EZH1 knockdown (Extended Data Fig. 4h, i). In total, 27 out of 29 of these activated HSC genes are direct targets of EZH1, including HOPX, HLF, MEIS1 and HES1 (P = 7.8 × 10⁻⁵; Fig. 3j, k).

EZH2 also bound activated HSC genes, consistent with its ability to target the same regions8 (Extended Data Fig. 4l); however, recent analysis of SET domain-swapping revealed context-specific sensitivity to an EZH2-specific inhibitor, further suggesting that although EZH1 and EZH2 can bind a common subset of HSC targets, these enzymes are likely to have distinct functions on chromatin20. Concordant with our observation that SUZ12 knockdown partially phenocopies EZH1 loss (Fig. 2a, b), we observed specific enrichment of EZH1 and SUZ12 at activated HSC and ProB genes, consistent with non-canonical targets of the EZH1–SUZ12 complex17 (Extended Data Fig. 4m–q). Similarly, upregulated ATAC peaks in Ezh1-deficient AGM were also enriched for SUZ12 binding, but not EZH2, indicating a conserved role for non-canonical PRC2 regulation in vivo (Extended Data Fig. 4r). These data suggest that in addition to the canonical function of EZH1–PRC2 in mediating H3K27me3 changes at poised HSC loci, EZH1 also regulates ProB genes through a complementary non-canonical EZH1–SUZ12 complex, highlighting an EZH1-specific function that is not phenocopied by EZH2.

The emergence of bona fide HSCs, defined by the capacity to repopulate irradiated adult recipients, marks the transition from embryonic to definitive haematopoiesis. We isolated AGM and yolk sac from embryonic day (E)10.5 wild-type, Ezh1+/− and Ezh1−/− embryos and transplanted adult non-obese diabetic (NOD)/severe combined immunodeficiency (SCID)/Il2rg−/− (NSG) recipients (Fig. 4a). We detected peripheral blood reconstitution from wild-type AGM in 3 out of 7 mice (11.9 ± 7.9%) at 4 weeks, but chimaerism decreased by 16 weeks (2 out of 7, 12.2 ± 8.1%); this corresponds to 1 repopulating unit in approximately 10.4 embryo equivalents (ee), consistent with HSCs being exceedingly rare at E10.510,21. By contrast, 5 out of 8 mice transplanted with Ezh1−/− AGM cells were engrafted at 4 weeks (39.2 ± 9.4%) and stabilized at 16 weeks (34.6 ± 14.6%). Notably, Ezh1+/− AGM transplant recipients had the highest initial chimaerism (41.2 ± 16.3%; 4 out of 5), which increased by 16 weeks (68.9 ± 17.8%), and was predominantly multilineage (3 out of 5) (Fig. 4e, Extended Data Fig. 5a, c). This corresponds to 1 repopulating unit in 3.6 Ezh1−/− and 2.2 Ezh1+/− ee, or an approximately fivefold increase in HSC frequency compared to wild type.

At E10.5, the yolk sac is thought to contain few, if any, HSCs21. We detected low-level engraftment of wild-type yolk sac cells in 5 out of 9 recipients at 4 weeks (3.4 ± 0.7%), and in 3 out of 9 mice at 16 weeks (4.3 ± 1.6%). Most Ezh1−/− (4.5 ± 0.9%, 6 out of 7 engrafted) and all of the Ezh1+/− yolk-sac-transplanted mice (5.4 ± 1.4%, 5 out of 5 engrafted) showed stable long-term engraftment at 16 weeks. The number of repopulating units calculated was similar to that of the AGM (about 1 in 12.3 ee wild-type mice; 1 in 2.6 ee Ezh1−/− mice, 1 in <2 ee Ezh1+/− mice). All engrafted mice were multilineage (Fig. 4f, Extended Data Fig. 5a, c). Importantly, up to 75% of peritoneal B cells in Ezh1+/− AGM-engrafted mice were of the adult-like B-2 phenotype, as opposed to the embryonic B-1 cells (Extended Data Fig. 6a). Moreover, up to 95% of donor-derived CD45.2+CD3+ T cells expressed adult-type TCRβ, as opposed to embryonic TCRγδ, in Ezh1−/− and Ezh1+/− AGM- and yolk-sac-engrafted mice (Extended Data Fig. 6b). These data provide compelling evidence that Ezh1 deficiency, and in particular haploinsufficiency, stimulates generation of definitive HSCs and adult-like lymphopoiesis.
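The repopulating-unit frequencies quoted above for AGM and yolk sac grafts can be reproduced with a single-dose Poisson estimate. The text does not state how they were calculated, so the sketch below is an assumption: it treats engraftment as a single-hit process, so that the mean number of repopulating units per recipient is −ln(fraction of non-engrafted recipients), and divides the transplanted dose by that value. The function name and the choice of engraftment counts (those quoted in the text, mostly at 16 weeks) are illustrative.

```python
import math

def ee_per_repopulating_unit(engrafted, total, dose_ee):
    """Embryo equivalents (ee) per repopulating unit for one transplant cohort.

    Assumes single-hit Poisson statistics (not stated in the text):
    P(no engraftment) = exp(-RU per recipient).
    """
    negative = total - engrafted
    if negative == 0:
        return None          # every recipient engrafted: only an upper bound exists
    if engrafted == 0:
        return math.inf      # no engraftment detected at this dose
    ru_per_recipient = -math.log(negative / total)
    return dose_ee / ru_per_recipient

# (engrafted, total, dose in ee), counts as quoted in the text
cohorts = {
    "AGM wild type":      (2, 7, 3.5),
    "AGM Ezh1-/-":        (5, 8, 3.5),
    "AGM Ezh1+/-":        (4, 5, 3.5),
    "yolk sac wild type": (3, 9, 5.0),
    "yolk sac Ezh1-/-":   (6, 7, 5.0),
    "yolk sac Ezh1+/-":   (5, 5, 5.0),
}

for name, (engrafted, total, dose) in cohorts.items():
    value = ee_per_repopulating_unit(engrafted, total, dose)
    label = "not estimable (all engrafted)" if value is None else f"1 RU per {value:.1f} ee"
    print(f"{name}: {label}")
```

Under these assumptions the estimates come out at about 10.4, 3.6 and 2.2 ee for AGM and 12.3 and 2.6 ee for yolk sac, matching the values in the text; the cohort in which all recipients engrafted yields no point estimate, in line with the reported "1 in <2 ee".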

The para-aortic splanchnopleura (PSP) at E9.5 lacks HSCs as determined by transplantation studies3. Transplantation of E9.5 wild-type PSP cells (Fig. 4a) failed to engraft adult recipients (0 out of 5)21,22; by contrast, we detected chimaerism in recipients of Ezh1−/− (3 out of 3, 1.6 ± 0.3%) and Ezh1+/− (4 out of 6 mice, 3.6 ± 1.3%) PSP at 4 weeks post-transplantation (Fig. 4g, Extended Data Fig. 5b). By 16 weeks, chimaerism increased in Ezh1−/− (3 out of 3, 9.4 ± 5.1%) and Ezh1+/− (5 out of 6, 13.1 ± 9.5%) recipients, and grafts were fully multilineage (Extended Data Fig. 5c). Thus, Ezh1 deficiency stimulates precocious generation of bona fide HSCs during embryogenesis.

To assess the self-renewal capacity of Ezh1-deficient HSCs, we performed secondary transplantation. No mice showed engraftment with E10.5 wild-type AGM (0 out of 4) or yolk sac (0 out of 7). By contrast, 4 out of 7 Ezh1−/− (4.4 ± 0.5%) and 9 out of 9 Ezh1+/− (57.8 ± 10.2%) AGM-derived secondary recipients were engrafted (Fig. 4h, Extended Data Fig. 5d). Of note, although no Ezh1−/− yolk sac recipients (0 out of 10) were engrafted, we observed secondary chimaerism from Ezh1+/− yolk sac cells (5 out of 7, 1.5 ± 0.3%), which increased by 16 weeks (6 out of 7, 5.1 ± 1.9%) (Extended Data Fig. 5d, e). All engrafted secondary recipients were multilineage, with no evidence of leukaemic transformation (Fig. 4h, Extended Data Fig. 5c, e). Taken together, these data indicate that genetic Ezh1 deficiency elicits precocious emergence of bona fide HSCs in vivo.

It has long been a curiosity that haematopoietic ontogeny progresses in reverse order, with haematopoietic progenitors appearing first in embryonic development independently of HSCs2,3. We propose that EZH1 represses definitive loci in primitive blood progenitors differentiated from human pluripotent stem cells and in mouse embryos, which precludes precocious HSC emergence during gestation. EZH1 deficiency promotes multipotency in lineage-restricted blood progenitors and enables precocious emergence of HSCs. Although PRC2 is a well-characterized HSC regulator, our data contribute compelling evidence for the distinct molecular functions of EZH1 and EZH2, and suggest a putative role for non-canonical PRC2, involving EZH1 and SUZ12. Homozygous loss of Suz12 in mice impairs HSC function and lymphopoiesis, but heterozygosity for Suz12 or Eed enhances HSC self-renewal23,24. Consistent with this, our data reinforce the concept that HSCs are exquisitely sensitive to PRC2 dosage, with partial reduction or increase affecting function23,24,25,26. Interestingly, Runx1 haploinsufficiency also promotes premature HSC generation27. Our data unify these observations; EZH1 marks many transcription factor-binding sites, whereas Ezh1 deficiency enhances accessibility to targets of key HSC transcription factors, including Runx1, to promote HSC emergence (Extended Data Fig. 3k, l). We identify Ezh1 as a molecular regulator of lineage-restricted potential of the first blood progenitors in the mammalian embryo, which accounts in part for why early embryonic progenitors lack multipotency. Beyond developmental implications, our findings suggest that resolution of EZH1-marked domains may be essential for physiological specification of HSCs from pluripotent stem cells, as a complementary approach to the synthetic reactivation of stem-cell programs by HSC transcription factors7,28.

Methods

A step-by-step protocol can be found at the Protocol Exchange33.

Human iPS cell culture

All experiments were performed using MSC-iPS1 (ref. 34), CD34-iPS and CD45-iPS cells, obtained from the Boston Children’s Hospital Human Embryonic Stem Cell Core (hESC) and verified by immunohistochemistry for pluripotency markers, teratoma formation and karyotyping. All cells were routinely tested for mycoplasma contamination. Human iPS cells were maintained on mouse embryonic fibroblast (GlobalStem) feeders in DMEM/F12 plus 20% KnockOut-Serum Replacement (Invitrogen), 1 mM l-glutamine, 1 mM non-essential amino acids (NEAA), 0.1 mM β-mercaptoethanol and 10 ng ml−1 bFGF. Medium was changed daily, and cells were passaged 1:4 onto fresh feeders every 7 days using standard clump passaging with collagenase IV.

Embryoid body differentiation

Differentiation of embryoid bodies was performed as previously described35. In brief, human pluripotent stem cell colonies were scraped into non-adherent rotating 10 cm plates at the ratio of 2:1. Embryoid body medium was KO-DMEM plus 20% FBS (Stem Cell Technologies), 1 mM l-glutamine, 1 mM NEAA, 1% penicillin–streptomycin, 0.1 mM β-mercaptoethanol, 200 μg ml−1 human transferrin and 50 μg ml−1 ascorbic acid. After 24 h, medium was changed by allowing embryoid bodies to settle by gravity, and replaced with embryoid body medium supplemented with growth factors: 50 ng ml−1 BMP4 (R&D Systems), 200 ng ml−1 SCF, 200 ng ml−1 FLT3, 50 ng ml−1 G-CSF, 20 ng ml−1 IL-6 and 10 ng ml−1 IL-3 (all Peprotech). Medium was changed on days 5 and 10. Embryoid bodies were dissociated on day 14 by digesting with collagenase B (Roche) for 2 h, followed by treatment with enzyme-free dissociation buffer (Gibco), and filtered through an 80-μm filter. Dissociated embryoid bodies were frozen in 10% DMSO, 40% FBS freezing solution.

Progenitor sorting

Dissociated embryoid body cells were thawed following the Lonza Poietics protocol and resuspended at 1 × 10⁶ cells per 100 μl staining buffer (PBS plus 2% FBS). CD34+ cells were sorted from bulk embryoid body culture using human CD34 microbeads (Miltenyi Biotec) and run through a magnetic column separator (MACS) as per the manufacturer’s instructions.

Lentiviral and shRNA library plasmids

The 5F lentiviral plasmids HOXA9, ERG, RORA, SOX4 and MYB were cloned into pInducer-21 doxycycline-inducible lentiviral vector. The shRNA library targeting 20 epigenetic modifiers36 was obtained from the Broad Institute RNAi Consortium in pLKO.1 or pLKO.5 lentiviral vectors. Lentiviral particles were produced by transfecting 293T-17 cells (ATCC) with the lentiviral plasmids and third-generation packaging plasmids. Viruses were harvested 24 h after transfection and concentrated by ultracentrifugation at 64,965g for 3 h using the Beckman Coulter SW 32 Ti rotor. All viruses were titred by serial dilution on 293T cells.

5F gene transfer and 5F culture

MACS-separated CD34+ embryoid body progenitors were seeded on retronectin-coated (10 μg cm−2) 96-well plates at a density of 2 × 10⁴–5 × 10⁴ cells per well. The infection medium was SFEM (StemCell Technologies) with 50 ng ml−1 SCF, 50 ng ml−1 FLT3, 50 ng ml−1 TPO (all R&D Systems), 50 ng ml−1 IL-6 and 10 ng ml−1 IL-3 (both from Peprotech). Lentiviral infections were carried out in a total volume of 150 μl. The multiplicity of infection (MOI) for each factor was as follows: ERG MOI = 5, HOXA9 MOI = 5, RORA MOI = 3, SOX4 MOI = 3, MYB MOI = 3, and MOI = 2 for shRNA. Virus was concentrated onto cells by centrifuging the plate at 924g for 30 min at room temperature. Infections were carried out for 24 h. After gene transfer, 5F cells were cultured in SFEM with 50 ng ml−1 SCF, 50 ng ml−1 FLT3, 50 ng ml−1 TPO (all R&D Systems), 50 ng ml−1 IL-6 and 10 ng ml−1 IL-3 (both from Peprotech). Doxycycline (Dox) was added at 2 μg ml−1 (Sigma). Puromycin was added at 0.3 μg ml−1 (ThermoFisher Scientific). Cultures were maintained at a density of <1 × 10⁶ cells ml−1, and the medium was changed every 3–4 days.
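As a worked example of the infection arithmetic, the sketch below converts the stated MOIs into transducing units (TU) per well and then into a volume of concentrated virus. The MOIs and the seeding density come from the text; the titre used in the example (1 × 10⁹ TU per ml) is a hypothetical placeholder, because viral titres are not reported, and the function names are illustrative only.

```python
# MOI per construct, as stated in the text
MOI = {"ERG": 5, "HOXA9": 5, "RORA": 3, "SOX4": 3, "MYB": 3, "shRNA": 2}

def transducing_units(cells_per_well, moi=MOI):
    """Transducing units needed per well for each construct (TU = MOI x cells)."""
    return {name: m * cells_per_well for name, m in moi.items()}

def virus_volume_ul(tu_needed, titre_tu_per_ml):
    """Volume of virus stock (in microlitres) that delivers tu_needed."""
    return tu_needed / titre_tu_per_ml * 1000.0

# Lower bound of the seeding range given in the text
cells = 2 * 10**4
needed = transducing_units(cells)

# Hypothetical titre: the source does not report one
ASSUMED_TITRE_TU_PER_ML = 1e9

for name, tu in needed.items():
    vol = virus_volume_ul(tu, ASSUMED_TITRE_TU_PER_ML)
    print(f"{name}: {tu:.0f} TU per well, {vol:.2f} ul of stock")

print("Total TU per well:", sum(needed.values()))
```

With all six viruses combined the summed MOI is 21, so the required virus scales directly with the number of cells seeded per well.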

T cell differentiation

After 14 days of respecification, 1 × 10⁵ 5F cells were plated in OP9-DL1 stromal co-culture37. Cells were cultured in α-MEM (Gibco), 1% penicillin–streptomycin, 20% FBS (Gemini), and 1 mM l-glutamine with 30 ng ml−1 SCF, 5 ng ml−1 FLT3, 5 ng ml−1 IL-7 (all R&D Systems) for 20 days with 2 μg ml−1 Dox followed by Dox removal. Cells were collected by mechanical dissociation and filtered through a 40-μm filter and passaged onto fresh stroma every 5–7 days. T cell development was assessed after 35 days using CD45, CD7, CD3, CD4 and CD8.

B cell differentiation

After 14 days of respecification, 5 × 10⁴ 5F cells were plated into a single well of MS-5 stroma in a 6-well NUNC plate. Cells were cultured in Myelocult H5100 (Stem Cell Technologies) supplemented with 50 ng ml−1 SCF, 10 ng ml−1 FLT3, 25 ng ml−1 IL7, 25 ng ml−1 TPO (all R&D Systems) and 1% penicillin–streptomycin for 10 days with 2 μg ml−1 Dox followed by Dox removal.

Colony assays

After 14 days of respecification, 5 × 10⁴ cells were plated into 3 ml of complete methylcellulose H3434 (StemCell Technologies) supplemented with 10 ng ml−1 IL-6 (Peprotech), 10 ng ml−1 FLT3 (R&D) and 50 ng ml−1 TPO (R&D) without 2 μg ml−1 Dox. The mixture was distributed into two 60-mm dishes and maintained in a humidified chamber for 14 days.

Mouse transplantation

NOD/SCID/Il2rg−/− (NSG) (Jackson Laboratory) mice were bred and housed at the Boston Children’s Hospital animal care facility. Animal experiments were performed in accordance with institutional guidelines approved by Boston Children’s Hospital Animal Care Committee. At least three animals were used per cohort, based on previous transplantation studies. Mice were assigned randomly to groups and blinding was not used. In brief, 8–12-week-old mice were irradiated (2.75 Gy) 24 h before transplant. To ensure consistency between experiments, only female mice were used. Sublethally irradiated adult NSG females were transplanted intravenously with 3.5 ee of whole E10.5 AGM, 5 ee of whole E10.5 yolk sac or 10 ee of whole E9.5 PSP. Mice were bled retroorbitally every 4 weeks to monitor donor chimaerism up to 16 weeks post-transplantation. Twenty-four weeks after primary transplantation, primary recipients from each group were euthanized and 4 × 10⁶ whole bone marrow cells were transplanted into 1–3 secondary recipients. Cells were transplanted in a 200 μl volume using a 28.5-gauge insulin needle. Sulfatrim was administered in drinking water to prevent infections after irradiation. Data points were combined from all independent experiments and outliers were not excluded.

Flow cytometry

The following antibodies were used for human cells: CD45 allophycocyanin (APC)-conjugated Cy7 (557833, BD Biosciences), CD4 phycoerythrin (PE)-conjugated Cy5 (IM2636U, Beckman Coulter Immunotech), CD8–BV421 (RPA-T8, BD Horizon), CD5–BV510 (UCHT2, BD Biosciences), TCRγδ–APC (555718, BD Biosciences), TCRαβ–BV510 (T10B9.1A-31, BD Biosciences), CD3–PE–Cy7 (UCHT1, BD Pharmingen), CD7–PE (555361, BD Pharmingen), CD1a–APC (559775, BD Pharmingen) for T cell staining. For B cell staining: CD45–PE–Cy5 (IM2652U, Beckman Coulter Immunotech), CD19–PE (4G7, BD Biosciences), CD56–V450 (B159, BD Biosciences), CD11b–APC–Cy7 (557754, BD Biosciences). For HSC/progenitor sorting: CD34–PE–Cy7 (8G12, BD Biosciences), CD45–APC–Cy7 (557833, BD Biosciences), CD38–PE–Cy5 (IM2651U, Beckman Coulter) and DAPI. For myeloid and erythroid staining: CD11b–APC–Cy7 (557754, BD Biosciences), GLYA–PE–Cy7 (A71564, Beckman Coulter), CD71–PE (555537, BD Biosciences), CD45–PE–Cy5 (IM2652U, Beckman Coulter Immunotech). All staining was performed with <1 × 10⁶ cells per 100 μl staining buffer (PBS plus 2% FBS), with a 1:100 dilution of each antibody, for 30 min at room temperature in the dark. Compensation was performed by automated compensation with anti-mouse Igκ and negative beads (BD Biosciences). All acquisitions were performed on a BD Fortessa or BD Aria cytometer.

The following antibodies were used for mouse cells: CD45.2–PE–Cy7 (104, eBioscience), CD45.1–FITC (A20, eBioscience), B220–PB (RA3-6B2, BD Biosciences), Ter119–PE–Cy5 (Ter 119, eBioscience), GR1 (RB6-8C5, BD Biosciences), CD3–APC (145-2C11, eBioscience), CD19–APC–Cy7 (1D3, BD Biosciences), MAC1–AF700 (M1/70, BD Biosciences) for engraftment analyses. For B cell staining: CD45.2–APC–Cy7 (104, BioLegend), CD23–PE–Cy7 (B3B4, eBioscience), Ter119–PE–Cy5 (Ter 119, eBioscience), MAC1–A700 (M1/70, BD Biosciences), CD5–BV510 (53-7.3, BD Biosciences), IgM–eFluor660 (II/41, eBioscience). For T cell staining: CD45.2–PE–Cy7 (104, eBioscience), TCRβ–PE–Cy5 (H57-597, BD Biosciences), CD8–APC–EF780 (53-6.7, eBioscience), CD4–APC (GK1.5, eBioscience), CD3–AF700 (17A2, BioLegend), TCRγδ–FITC (GL3, BD Biosciences). For HSPC sorting: CD16/32 (93, BioLegend), Ter119–biotin (Ter119, eBioscience), Gr-1–biotin (RB6-8C5, eBioscience), CD3–biotin (17A2, eBioscience), CD5–biotin (53-7.3, eBioscience), CD8–biotin (53-6.7, eBioscience), CD19–biotin (eBio1D3, eBioscience), streptavidin–eFluor450 (eBioscience), CD45–PerCP-Cy5.5 (30-F11, eBioscience), CD144–eFluor660 (eBioBV13, eBioscience), CD117–APC–eFluor 780 (2B8, eBioscience), CD41–PE–Cy7 (eBioMWReg30, eBioscience). All staining was performed with <1 × 10⁶ cells per 100 μl staining buffer (PBS plus 2% FBS), with a 1:100 dilution of each antibody, for 30 min on ice in the dark. Compensation was performed by automated compensation with anti-rat and anti-hamster Igκ and negative beads (BD Biosciences). All acquisitions were performed on a BD Fortessa or BD Aria cytometer.

RNA-seq

Human cells were stained and sorted using CD34–PE–Cy7 (8G12, BD Biosciences), CD38–PE–Cy5 (IM2651U, Beckman Coulter) and DAPI (Beckman Coulter). RNA-seq libraries were prepared using the NEB Ultra (PolyA) kit as per the manufacturer’s protocol with 50 ng input RNA. Mouse cells were stained and sorted using the ‘HSPC stain’ (see ‘Flow cytometry’). RNA-seq libraries were prepared using the Clontech SMARTer Universal Low Input kit as per the manufacturer’s protocol with 10 ng input RNA. Libraries were sequenced using the 200 cycle paired-end kit on the Illumina HiSeq2500 system. RNA-seq reads were analysed with Tuxedo Tools following a standard protocol38. Reads were mapped with TopHat version 2.1.0 and Bowtie2 version 2.2.4 with default parameters against build hg19 of the human genome, and build hg19 of the RefSeq human genome annotation. Samples were quantified with the Cufflinks package version 2.2.1. Differential expression was performed using Cuffdiff with default parameters.

ATAC–seq

ATAC–seq was performed as previously described39. For each tagmentation with Tn5 transposase, 5 × 10³–50 × 10³ cells were used. The resulting DNA was isolated, quantified and sequenced on an Illumina NextSeq500 system. The raw reads were aligned to the human genome assembly hg19 using Bowtie40 with the default parameters, and only tags that uniquely mapped to the genome were used for further analysis. ATAC peaks were identified using MACS41.

ChIP–seq

ChIP experiments were performed as previously described42 using the antibodies for H3K4me3 (04-745, Millipore) and H3K27me3 (07-449, Millipore) in 5F cells. For bioChIP analysis of EZH1 or EZH2 occupancy, Flag–biotin-tagged EZH1 or EZH2 was stably expressed in 5F cells. The chromatin was isolated and immunoprecipitated with streptavidin Dynabeads (Life Technologies) as previously described43. ChIP–seq libraries were generated using NEBNext ChIP–seq Library Prep Master Mix following the manufacturer’s protocol (New England Biolabs), and sequenced on an Illumina NextSeq500 system. ChIP–seq raw reads were aligned to the human genome assembly hg19 using Bowtie40 with the default parameters; only tags that uniquely mapped to the genome were used for further analysis. ChIP–seq peaks were identified using MACS41.

Bioinformatics and statistical analysis

All statistical calculations were performed using GraphPad Prism. Tests between two groups used a two-tailed unpaired Student’s t-test. Data are presented as mean ± s.e.m. Where indicated, ANOVA was used, with P < 0.05 considered significant. GSEA and GO were run according to default parameters in their native implementations. Statistical enrichment of gene lists was performed using Fisher’s exact test. No statistical methods were used to predetermine sample size.
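For readers who want to see the summary statistics in concrete form, the sketch below computes mean ± s.e.m. and the strictly standardized mean difference (SSMD) used to score the shRNA screen in Fig. 1b. The analyses in this study were run in GraphPad Prism, and the text does not specify which SSMD estimator was applied, so the formula here (difference of means divided by the square root of the summed sample variances, for two independent groups) is an assumption. The input values are hypothetical and do not come from the Source Data.

```python
import math
import statistics

def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of replicates."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem

def ssmd(treated, control):
    """SSMD for two independent groups (assumed method-of-moments estimator).

    SSMD = (mean_treated - mean_control) / sqrt(var_treated + var_control)
    """
    diff = statistics.mean(treated) - statistics.mean(control)
    pooled = statistics.variance(treated) + statistics.variance(control)
    return diff / math.sqrt(pooled)

# Hypothetical CD4+CD8+ frequencies (%), for illustration only
knockdown = [10.0, 12.0, 14.0]
control = [1.0, 2.0, 3.0]

m, s = mean_sem(knockdown)
print(f"knockdown: {m:.1f} +/- {s:.1f} (mean +/- s.e.m.)")
print(f"SSMD versus control: {ssmd(knockdown, control):.2f}")
```

Because SSMD scales the effect by the variability in both groups, a hairpin with a large but inconsistent effect scores lower than one with a moderate but reproducible effect, which suits a screen that pools several hairpins per target.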

Data availability

All RNA-seq, ATAC–seq and ChIP–seq data have been deposited to the Gene Expression Omnibus (GEO) database under the accession number GSE89418.