Abstract
Pinpointing functional noncoding DNA sequences and defining their contributions to health-related traits is a major challenge for modern genetics. We developed a high-throughput framework to map noncoding DNA functions with single-nucleotide resolution in four loci that control erythroid fetal hemoglobin (HbF) expression, a genetically determined trait that modifies sickle cell disease (SCD) phenotypes. Specifically, we used the adenine base editor ABEmax to introduce 10,156 separate A•T to G•C conversions in 307 predicted regulatory elements and quantified the effects on erythroid HbF expression. We identified numerous regulatory elements, defined their epigenomic structures and linked them to low-frequency variants associated with HbF expression in an SCD cohort. Targeting a newly discovered γ-globin gene repressor element in SCD donor CD34+ hematopoietic progenitors raised HbF levels in the erythroid progeny, inhibiting hypoxia-induced sickling. Our findings reveal previously unappreciated genetic complexities of HbF regulation and provide potentially therapeutic insights into SCD.
Data availability
Raw and processed sequencing data generated in this study are available from the Gene Expression Omnibus under accession GSE157311. Source data are provided with this paper.
Code availability
Custom source code used in this paper can be downloaded from https://github.com/YichaoOU/ABE_NonCoding_functional_score.
Acknowledgements
R. Kurita and Y. Nakamura (Cell Engineering Division, RIKEN BioResource Research Center, Tsukuba, Japan) provided the HUDEP-2 cells. X. An (Laboratory of Membrane Biology, New York Blood Center) provided the anti-Band 3 antibody. We thank the St. Jude Children’s Research Hospital Flow Cytometry core facility for performing the cell sorting, the Hartwell Center core facility for performing the high-throughput sequencing and the Center for Advanced Genome Engineering for performing the targeted deep sequencing. We thank K. A. Laycock for scientific editing of the manuscript. This work was supported by St. Jude Children’s Research Hospital and ALSAC, National Institutes of Health grants R35GM133614 (to Y.C.), P01HL053749 (to M.J.W.) and R24DK106766 (to M.J.W., R.C.H. and Y.C.), the St. Jude Collaborative Research Consortium (to M.J.W. and Y.C.) and Doris Duke Foundation grant 2017093 (to M.J.W.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author information
Contributions
L.C., Y.L., M.J.W. and Y.C. designed the experiments, analyzed the data and wrote the manuscript. L.C. generated the HUDEP-2–ABEmax cell line and performed the CRISPR base editor screening. Y.L. and Y.C. designed the BPRSHbF model. L.C. and P.X. conducted the CD34+ cell genome editing, differentiation, flow cytometry and western blot analysis. L.C. and Q.Q. performed the ChIP-seq, ATAC-seq and capture HiChIP. R.F. performed the CUT&RUN assay. Y.Y. helped with the sickling assay. L.C. conducted the HPLC with help from J.Z. L.P. helped with interrogation of the SCD cohort data. R.F. and A.S. helped with the CRE sgRNA screening library and experimental design. J.C., R.W. and T.Y. helped with the gRNA functional validation. R.C.H. provided conceptual advice. Y.C. and M.J.W. supervised the study. All authors discussed the results and contributed to preparing the manuscript.
Ethics declarations
Competing interests
M.J.W. is a consultant for Cellarity and Novartis and has equity in Beam Therapeutics (a base-editing company). A.S. is the St. Jude Children’s Research Hospital site principal investigator of clinical trials for genome editing of SCD, sponsored by Vertex Pharmaceuticals/CRISPR Therapeutics (NCT03745287) and Novartis (NCT04443907). The industry sponsors provide funding for the clinical trial, which includes salary support paid to the institution of A.S. A.S. is also a consultant for Spotlight Therapeutics.
Additional information
Peer review information Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Establishment of an ABEmax-based system to perturb regulatory sequences.
a, ABEmax-Cas9 protein levels measured by western blot analysis in wild-type HUDEP-2 cells (WT) and HUDEP-2 cells infected with different dosages of ABEmax lentivirus. β-Actin was used as a loading control. The result is representative of three independent experiments. The image was cropped from Source Data Fig. 2. b, HUDEP-2 cells with different levels of ABEmax expression were transduced with the same amount of gRNA targeting the HBG promoter. The graphs show the hemoglobin (Hb) protein content, as measured by isoelectric focusing high-performance liquid chromatography (IE-HPLC) in HUDEP-2 cells after 5 additional days of induced erythroid maturation. The result is representative of three independent experiments. c, Jitter plots showing the percentage of adenosine-to-inosine RNA modification by ABEmax in wild-type HUDEP-2 cells (WT), HUDEP-2 cells stably expressing an ABE (ABEmax), and HEK293T cells. The y-axis represents the efficiency of A-to-I RNA editing. n = total number of modified adenines observed. d, Targeted deep-sequencing analysis of the BCL11A CRE after editing with ABEmax and BCL11A_ENH gRNA. The mutations are indicated in bold. The red arrowhead indicates the targeted nucleotide. e, Western blot analysis with the indicated antibodies in undifferentiated (Day 0) and differentiated (Day 5) HUDEP-2 cells transduced with non-targeting control gRNA (Ctrl) or with BCL11A-ENH gRNAs. The result is representative of three independent experiments. The image was cropped from Source Data Fig. 3.
Extended Data Fig. 2 High-throughput mapping of CREs regulating HbF in HUDEP-2 cells and single-gRNA validation in CD34+ HSPCs.
a,b, Dot plots showing the correlation between two biological replicates of ABE screens for the HbFhigh (a) and HbFlow (b) cell populations. Each dot represents one gRNA; the x- and y-axes represent the normalized read counts. c,d, Validation studies of top-hit gRNAs in normal donor CD34+ HSPC–derived erythroblasts. CD34+ cells were transfected with RNP complexes consisting of ABEmax + non-targeting control (Ctrl) gRNA or individual top-hit gRNAs and analyzed after 12 days of erythroid differentiation. c, HbF protein levels measured by western blot analysis. The result is representative of three independent experiments. The image was cropped from Source Data Fig. 4. d, Flow-cytometry plots showing the expression of the RBC maturation markers Band3 and CD49d after 12 days of differentiation (left) and a bar chart summarizing the results from three replicates (right). Error bars represent the mean ± S.E.M. from three independent experiments. e, Box plot comparing the HbF effects of gRNAs without editable adenines (n = 112) and non-targeting control gRNAs (n = 20). The y-axis is the log2 ratio of gRNA read counts between HbFhigh and HbFlow cells. The P value was determined by an unpaired two-tailed Wilcoxon test. Box depicts the interquartile range; central line indicates the median and whiskers indicate minimum/maximum values. f, Scatterplot showing the F-cell fractions measured by immuno-flow cytometry in HUDEP-2-ABEmax and HUDEP-2-dCas9 cells transfected with 10 gRNAs. Each dot represents one gRNA. g, Comparison of target site mutation frequencies in HbFhigh and HbFlow cells. Cells were treated with ABEmax and 5 different gRNAs and then sorted based on HbF levels after 5 days of differentiation. The frequencies are calculated based on one targeted deep-sequencing result.
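The log2 ratio plotted in panel e can be illustrated with a minimal sketch. It assumes that raw counts are scaled to reads per million within each sorted population and that a pseudocount of 1 is added before taking the ratio; the normalization, the pseudocount, the function names and the example counts are all hypothetical and are not taken from the published analysis.

```python
import math


def normalize_to_rpm(counts):
    """Scale raw gRNA read counts to reads per million (assumed normalization)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads in this population")
    return {grna: count * 1_000_000 / total for grna, count in counts.items()}


def log2_ratio(high_counts, low_counts, pseudocount=1.0):
    """Per-gRNA log2(HbF-high / HbF-low) on normalized counts.

    The pseudocount (an assumption) avoids division by zero for gRNAs
    that are absent from one population.
    """
    high = normalize_to_rpm(high_counts)
    low = normalize_to_rpm(low_counts)
    return {
        grna: math.log2((high[grna] + pseudocount) / (low[grna] + pseudocount))
        for grna in high
    }


# Hypothetical read counts for three gRNAs in the two sorted populations.
hbf_high = {"gRNA_A": 300, "gRNA_B": 100, "gRNA_C": 100}
hbf_low = {"gRNA_A": 100, "gRNA_B": 100, "gRNA_C": 300}
scores = log2_ratio(hbf_high, hbf_low)
```

In this toy input, gRNA_A is enriched in the HbFhigh population and receives a positive score, gRNA_C is depleted and receives a negative score, and gRNA_B scores zero.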
Extended Data Fig. 3 ABE mutagenesis at different genomic loci.
a, Bar plot showing the effects of NFIX CREs on the expression levels of NFIX and KLF1. Y-axis shows relative mRNA expression measured by real-time RT-qPCR in HUDEP-2 cells edited with ABEmax and the two indicated gRNAs. The expression levels were normalized by those from HUDEP-2 cells treated with ABEmax and non-targeting control gRNA (Ctrl) (n = 3 independent experiments). b, β-Like globin gene cluster–associated HbFhigh gRNAs: Chromatin interaction loops, indicated by red arcs, were determined by H3K27ac HiChIP in HUDEP-2 cells. gRNA -log(FDR) represents the difference in gRNA abundance between the HbFhigh and HbFlow populations. ATAC-seq analysis reflects chromatin openness.
Extended Data Fig. 4 Empirical distribution of ABE editing efficiency and DNA sequence motifs measured by ATAC-seq.
a, Empirical distribution of ABEmax editing activities in HUDEP-2 cells. Bar plot shows the average editing activity at different positions among 23 different ABEmax-edited loci. The x-axis denotes positions relative to the protospacer start (position 1). The y-axis shows the A-to-G conversion rate. b, A heatmap of on-target base-editing efficiencies of ABEmax as measured by targeted amplicon sequencing of 23 different edited genomic loci (rows). Each cell represents one nucleotide. The cell number indicates the position of the nucleotide relative to the PAM sequence. The editing efficiency was measured by determining the percentage of nucleotides converted by ABEmax. c, The footprint profiles of GATA1, ZBTB7A, and CTCF binding sites derived from deep sequencing (ATAC-seq). The heatmap represents the ATAC-seq signals within a ±100-bp window for the top 1000 binding sites for each TF. Each row represents one binding site. Aggregated signals are plotted in the top panels.
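The position-wise averaging summarized in panel a can be sketched as follows. This sketch assumes that each locus contributes only at protospacer positions where it has an adenine and that the average at a position is taken over those loci; the input layout, function name and example values are hypothetical.

```python
from collections import defaultdict


def mean_editing_by_position(loci):
    """Average A-to-G conversion rate per protospacer position.

    `loci` is a list of dicts, one per edited locus, mapping a 1-based
    protospacer position to the observed A-to-G fraction at that position.
    Positions without an adenine are simply absent from the dict (assumption).
    """
    sums = defaultdict(float)
    n_loci = defaultdict(int)
    for locus in loci:
        for position, rate in locus.items():
            sums[position] += rate
            n_loci[position] += 1
    return {pos: sums[pos] / n_loci[pos] for pos in sorted(sums)}


# Hypothetical amplicon-sequencing results for three loci.
example_loci = [
    {5: 0.6, 7: 0.2},
    {5: 0.4},
    {7: 0.4, 12: 0.0},
]
profile = mean_editing_by_position(example_loci)
```

The resulting profile maps each position to its mean conversion rate, which is the quantity a bar plot such as panel a would display.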
Extended Data Fig. 5 3′ HBG1 enhancer–edited clones in HUDEP-2 cells.
a, Genome browser screenshot of ZBTB7A occupancy profiles at the HBG1 locus. The gRNA track shows the location of the gRNA. Wild-type HUDEP-2 ChIP-seq data were downloaded from GSE103445. Two mutated clones (designated H2_mut_C1 and H2_mut_C2) were generated using ABEmax and gRNAs targeting the 3′ HBG1 CRE. The position of the CRE is highlighted in blue. b, Amplicon sequencing confirming the mutations in HUDEP-2 cells derived from single clones after treatment with the Chr11-3 gRNA. Edited adenines are marked with a red box. c–e, Validation studies of two HUDEP-2 single clones with mutations in the 3′ HBG1 enhancer. c, The percentage of γ-globin mRNA as determined by real-time RT-qPCR. The error bars represent the mean ± S.E.M. from three independent experiments. ****P = 4 × 10−7; unpaired two-sided t-test. d, The hemoglobin F fraction measured by IE-HPLC. The values represent the mean ± S.E.M. from three independent experiments. ****P = 6 × 10−7; unpaired two-sided t-test. e, F-cell fractions measured by immuno-flow cytometry (left). The bar chart (right) shows the values from two independent experiments.
Extended Data Fig. 6 Epigenetic signals of CREs regulating HbF levels.
Box plots showing the epigenetic signal distribution among adenines with high (>30) (n = 313) and low (<10) (n = 9268) BPRSHbF. P values were determined using a two-tailed Wilcoxon test. Box depicts the interquartile range; central line indicates the median and whiskers indicate minimum/maximum values.
Extended Data Fig. 7 Functional noncoding sequences and SNVs associated with HbF levels in patients with SCD.
a, The ratio of the mutation burden in patients with SCD with high HbF to that in patients with SCD with normal HbF at genomic loci with high BPRSHbF (the top 200). The x-axis represents the threshold of minor allele frequency (MAF) that was used to filter variants. The y-axis represents the different window sizes centered on genomic loci with high BPRSHbF. The number in each cell represents the ratio of the normalized mutation burden (see Methods) in patients with SCD with high RBC HbF levels to that in patients with SCD with normal HbF levels. b, The precision-recall curve representing the performance of a random forest model that predicts HbF levels by using the mutation burden within two groups of genomic loci. The green curve represents the model including only 18 common GWAS variants, and the red curve represents the model including the common GWAS variants plus 56 variants with high BPRSHbF. Dashed lines represent the precision at a 75% recall rate. c, A box plot showing a pair-wise performance comparison of the two models. n = 400 random samplings. The P value was determined using a paired two-tailed t-test. Box depicts the interquartile range; central line indicates the median and whiskers indicate minimum/maximum values.
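Reading the precision at a fixed recall from a precision-recall curve, as marked by the dashed lines in panel b, can be sketched as below. The sketch assumes binary labels (1 for high HbF), distinct classifier scores, and that the reported value is the highest precision among operating points whose recall is at least the target; these choices, the function name and the example data are hypothetical.

```python
def precision_at_recall(labels, scores, target_recall=0.75):
    """Highest precision among thresholds that reach at least `target_recall`.

    `labels` are 0/1 ground-truth classes and `scores` are model outputs,
    where a larger score means a more confident positive call. Scores are
    assumed to be distinct, so each rank defines one threshold.
    """
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("at least one positive label is required")
    # Rank samples from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_precision = 0.0
    true_positives = 0
    for n_called, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        recall = true_positives / n_positive
        if recall >= target_recall:
            best_precision = max(best_precision, true_positives / n_called)
    return best_precision


# Hypothetical labels and scores for eight samples.
example_labels = [1, 1, 0, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
precision_75 = precision_at_recall(example_labels, example_scores, 0.75)
```

Comparing this value between two models on the same resampled data gives one paired observation of the kind summarized in panel c.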
Extended Data Fig. 8 Targeting erythroid-specific regulatory elements to increase HbF levels in erythroid progeny derived from HSPCs from donors with SCD.
a, A heatmap showing the distribution of chromatin accessibility, as measured by ATAC-seq, near edited adenines for 15 different blood cell types. Representative adenines with high (top) and low (bottom) erythroid-specific scores (Z-scores) were selected for plotting. The cell types for each track are shown at the bottom. b–e, CD34+ HSPCs from two donors with SCD were transfected with RNP consisting of ABE and Chr11-1 gRNA targeting the 3′ HBG1 enhancer or a non-targeting control gRNA (Ctrl), then grown in culture under conditions that support erythroid differentiation. Hemoglobinized erythroblasts were analyzed at day 12. b, The percentage of γ-globin mRNA as determined by real-time RT-qPCR (n = 2 different SCD participants). c, Representative flow-cytometry plots showing the expression of the RBC maturation markers Band3 and CD49d (n = 2 different SCD participants). d, May–Grünwald–Giemsa–stained erythroblasts. Scale bar, 20 μm. Results are representative of 2 SCD participants. e, Images of sickled erythroid cells. Arrowheads mark cells with sickle-like morphology. Results are representative of 2 SCD participants. The original picture was visualized by phase-contrast microscopy using the IncuCyte S3 Live-Cell Analysis System (Sartorius) with a 20× objective. Scale bar, 20 μm.
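One way an erythroid-specific Z-score such as that in panel a could be computed is sketched below. The sketch assumes the score is the erythroid accessibility signal standardized against the mean and population standard deviation across all profiled cell types; this definition, the function name, the cell-type labels and the example values are hypothetical and may differ from the published method.

```python
import statistics


def erythroid_z_score(signal_by_cell_type, erythroid_key="Ery"):
    """Standardize the erythroid ATAC-seq signal across cell types.

    `signal_by_cell_type` maps a cell-type label to the accessibility
    signal near one edited adenine. Returns 0.0 when all cell types have
    the same signal (no specificity can be defined).
    """
    values = list(signal_by_cell_type.values())
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return 0.0
    return (signal_by_cell_type[erythroid_key] - mean) / spread


# Hypothetical signals for five of the profiled blood cell types.
example_signal = {"Ery": 10.0, "HSC": 2.0, "Mono": 2.0, "B": 2.0, "T": 2.0}
z_erythroid = erythroid_z_score(example_signal)
```

A high positive score marks an adenine whose surrounding chromatin is open mainly in erythroid cells, the property used here to prioritize erythroid-specific elements.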
Extended Data Fig. 9 Gating strategies used for cell sorting during RBC maturation.
a, Gating strategy to determine the percentage of the RBC maturation markers Band3 and CD49d after 5 additional days of induced differentiation in WT and HUDEP-2-ABEmax cells presented in Fig. 1d. b, Gating strategy to determine the percentage of the RBC maturation markers Band3 and CD49d after 12 days of differentiation of normal CD34+ HSPCs (transfected with 5 gRNAs individually) presented in Extended Data Fig. 2d. c, Gating strategy to determine the percentage of the RBC maturation markers Band3 and CD49d after 12 days of differentiation of SCD-derived CD34+ HSPCs (transfected with 2 gRNAs individually) presented in Extended Data Fig. 8c.
Extended Data Fig. 10 Gating strategies used for F-cell sorting.
a, Gating strategy to determine the percentage of F cells in Ctrl or BCL11A-ENH gRNA–transfected HUDEP-2 cells presented in Fig. 1i. b, Gating strategy to determine the percentage of F cells after 12 days of differentiation of SCD-derived CD34+ HSPCs (transfected with 2 gRNAs individually) presented in Fig. 6e. c, Gating strategy to determine the percentage of F cells presented in Extended Data Fig. 5e.
Source data
Source Data Fig. 1
Unprocessed western blots and/or gels.
Source Data Fig. 2
Unprocessed western blots and/or gels.
Source Data Fig. 3
Unprocessed western blots and/or gels.
Source Data Fig. 4
Unprocessed western blots and/or gels.
Source Data Fig. 5
Raw numbers and exact P values for all of the bar plots.
About this article
Cite this article
Cheng, L., Li, Y., Qi, Q. et al. Single-nucleotide-level mapping of DNA regulatory elements that control fetal hemoglobin expression. Nat Genet 53, 869–880 (2021). https://doi.org/10.1038/s41588-021-00861-8
DOI: https://doi.org/10.1038/s41588-021-00861-8
- Springer Nature America, Inc.