Introduction

There are >500 C2H2 zinc finger (ZF) proteins encoded in the mammalian genome1,2. They can be sub-classified according to the number of ZFs, the presence of additional functional domains (e.g. a KRAB domain), and the presence or absence of highly conserved linker sequences between individual ZFs. The presence of the linker TGEKP is highly predictive of a DNA-binding function rather than another function such as RNA or protein binding. For most C2H2 zinc finger proteins there is limited information about function, but this is not the case for the ancient SP/KLF subfamily3,4, whose members play critical regulatory roles in the differentiation, proliferation, identity, and reprogramming of most cell types and tissues5,6.

There are twenty-five SP/KLF proteins in humans and mice; they share three very similar C-terminal C2H2 ZFs separated by highly conserved linkers3,7. Linker 1 nearly always has the sequence TGEKP (T at position +1, K at +4, P at +5), whereas linker 2 is more variable (Supplementary Fig. 1A). Nevertheless, the threonine at position +1 (relative to the end of the preceding α-helix), arginine or lysine at position +4, and proline at position +5 are highly conserved. NMR and crystallographic studies show that the linkers become structured in the presence of DNA, forming hydrogen bonds with the α-helix of the preceding ZF8,9. In this way, the linker stabilizes the preceding α-helix and increases its length, a process known as α-helix C-capping10. The linkers are also thought to facilitate ‘clamping’ of the three ZFs to their extended 9 bp binding site in the major groove of DNA, and thereby to increase DNA-binding affinity.

Many SP and KLF family members are broadly expressed, whereas others are tissue specific5. KLF1, the founding member of the KLF family, is expressed specifically in erythroid progenitor cells11, and is essential for red blood cell production in mice and humans12,13,14. ChIP-seq and RNA-seq experiments have shown that KLF1 directly binds promoters and enhancers of genes that regulate the cell cycle, heme biosynthesis, globin production, and membrane/cytoskeletal integrity15,16,17,18,19. Missense mutations in the second linker of KLF1 are quite common in some populations, such as those from Southern China and the Mediterranean (Supplementary Fig. 1B)20,21,22. Missense mutations are less common in people of European ancestry, so they are under-represented in large exome databases such as those accessible via gnomAD and the UK Biobank. The combination of a missense mutation in the second linker of KLF1 with a loss-of-function mutation on the other allele results in non-spherocytic hemolytic anemia (NSHA) rather than β-thalassemia23, for reasons that are unclear.

Little is known about the biological functions of C2H2 linkers in vivo. We previously identified the mommeD45 mutation in Klf1 in a mouse ENU mutagenesis screen for epigenetic modifiers of expression of an α-globin transgene24. The mutation results in a single amino acid substitution of histidine to arginine (H350R) at position +3 in the second linker (TGHRP, with the histidine at +3) (Supplementary Fig. 1B). Interestingly, this histidine is not conserved between mouse and human; however, it is conserved in the closely related family members KLF2 and KLF4 (Supplementary Fig. 1A). The +3 position in linker 2 (and in linker 1) is occupied by an acidic amino acid (glutamate or aspartate) in most SP/KLF family members (Supplementary Fig. 1A), so substitution with the basic amino acid arginine (H350R) is likely to damage function. Indeed, in the homozygous state, the H350R mutation causes mild anemia with reticulocytosis and splenomegaly24.

Herein, we established an animal model of human NSHA by crossing Klf1 knockout mice12 with mommeD45 mice24. The resulting compound heterozygous mice accurately phenocopy humans who harbor similar compound heterozygous mutations in KLF120,21. We gained insight into how this mutation causes disease by performing ChIP-seq and RNA-seq in fetal liver (FL) and adult splenic erythroblasts. We show that only a small subset of the KLF1-dependent transcriptome is sensitive to the H350R mutation. This set includes genes that encode transmembrane proteins [e.g. Slc2a1 (Glut1), Slc2a4 (Glut4), Slc1a5 (Asct2) and Slc43a1 (Lat3)] and cytoskeletal proteins (e.g. Epb4.9/dematin), but surprisingly not some well-established KLF1 target genes such as globins (e.g. Hbb-b1) and cell cycle regulators (e.g. E2f2 and E2f4)11,12,25,26. This is consistent with the NSHA phenotype rather than thalassemia. We partially and transiently rescued the anemia with L-leucine supplementation in the drinking water, which may provide a translational opportunity for humans with NSHA due to KLF1 mutations. We show that the H350R mutation globally reduces DNA-binding affinity in vivo and in vitro but leaves sequence specificity unaltered, distinguishing it from other mutations in DNA-contacting residues of KLF127,28,29,30,31. We extend the in vitro analysis to other anemia-causing linker 2 mutations found in humans. Similar mutations in the linkers of other C2H2 zinc finger TFs have been reported in various human genetic disorders32,33. Our studies provide a greater understanding of the global in vivo functions of the linkers of the C2H2 superfamily of TFs, and of how linker mutations cause diseases that are phenotypically distinct from those due to loss-of-function mutations.

Results

Compound heterozygous mice for null and missense mutations in linker 2 of Klf1 display NSHA

We interbred mice carrying the mommeD45 mutation in Klf1 (Klf1H350R/+)24 with mice carrying a null allele of Klf1 (Klf1+/−)12 to generate Klf1H350R/− mice and littermates with five other genotypes. Klf1−/− mice are embryonic lethal, but Klf1H350R/− mice were born at expected Mendelian ratios. However, they were severely pale and jaundiced, a phenotype resembling that of human patients who co-inherit loss-of-function and linker 2 missense mutations in KLF120,21. Blood smears from Klf1H350R/− mice showed severe poikilocytosis, with prominent schistocytes, micro-spherocytes, pencil cells, and nucleated red blood cells (Fig. 1A; arrow); target cells and stomatocytes were not prominent. In contrast, blood smears from heterozygous (Klf1+/− and Klf1H350R/+) and wild-type (Klf1+/+) mice were normal, and those from Klf1H350R/H350R mice showed slight polychromasia. Klf1H350R/− mice had a mild reduction in mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH) and hematocrit (Hct), and a marked increase in circulating reticulocytes compared to littermates (Fig. 1B). Spleen weight normalized to body weight was increased (Fig. 1E), owing to expansion of the red pulp (Fig. 1D). There was a marked increase in the number of CD71+ erythroid cells in the spleens of Klf1H350R/− mice at the relative expense of B and T cells (Fig. 1F), indicating that the splenomegaly was due to extramedullary erythropoiesis. The blood of Klf1H350R/− mice contained more immature erythrocytes (CD71+/TER119+) than that of Klf1+/+ littermates, with a corresponding decrease in mature CD71−/TER119+ erythrocytes (Fig. 1C). There was also a reduction in mature CD71−/TER119+ erythroid cells in the bone marrow (Fig. 1C). Together with the blood findings, this suggests either impaired terminal erythroid differentiation or destruction prior to bone marrow exit, with premature release of immature erythrocytes into the circulation; i.e. a stress response.
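The red cell indices reported above are derived quantities, which helps when reading Fig. 1B: MCV and MCH are both ratios over the red cell count. A minimal sketch, assuming the standard clinical definitions in SI units (Hct as a volume fraction in L/L, Hb in g/L, RBC count in 10^12 cells/L); the function names and example values are illustrative only and are not measurements from this study.

```python
def mcv_fl(hct_fraction: float, rbc_1e12_per_l: float) -> float:
    """Mean corpuscular volume in femtolitres.

    MCV = packed cell volume / number of cells. With Hct in L/L and the
    RBC count in 10^12 cells/L, the quotient is in 10^-12 L, i.e. 1000 fL.
    """
    return hct_fraction / rbc_1e12_per_l * 1000.0


def mch_pg(hb_g_per_l: float, rbc_1e12_per_l: float) -> float:
    """Mean corpuscular hemoglobin in picograms.

    Hb (g/L) divided by the count (10^12 cells/L) gives 10^-12 g = 1 pg per cell.
    """
    return hb_g_per_l / rbc_1e12_per_l


if __name__ == "__main__":
    # Illustrative values only (not data from this study).
    hct, hb, rbc = 0.45, 150.0, 5.0
    print(round(mcv_fl(hct, rbc), 1), round(mch_pg(hb, rbc), 1))  # 90.0 30.0
```

Because reticulocytes are larger than mature red cells while fragments are smaller, a mixed population such as that in Klf1H350R/− blood pulls the MCV in both directions at once.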

Fig. 1: Non-spherocytic hemolytic anemia with increased reticulocytes and splenomegaly in Klf1H350R/- mice.

A Peripheral blood stained with May-Grünwald Giemsa (MGG) reveals hypochromic microcytic red blood cells (RBC) with anisopoikilocytosis (red arrows) and nucleated RBC (black arrow) in compound heterozygous mice (Klf1H350R/-). B Compound heterozygous mice have increased reticulocytes, decreased mean corpuscular volume (MCV), decreased hemoglobin content, and decreased hematocrit compared to wild type. C Compound heterozygous animals display a significant increase in the proportion of CD71-positive cells in blood and a significant decrease in TER119-positive cells in the bone marrow compared to wild-type mice. There is an expansion of red pulp (rp), but not white pulp (wp), in the spleens of compound heterozygous mice (D), together with an increase in relative spleen weight (E). F Spleens of compound heterozygous animals show a relative expansion of erythroid cells. In panels (B) and (E), each dot represents an independent animal (n ≥ 3) with bars representing mean ± SEM (***P < 0.0005, **P < 0.05). One-way ANOVA with Tukey post-hoc test was used for all statistical analyses. Images were taken on an Olympus BX50 with a DP26 camera using the cellSens Standard software. Scale bars represent either 20 μm or 100 μm as indicated.

Klf1 H350R/− embryos display reduced transcription of a small subset of KLF1-dependent genes

To understand the transcriptional consequences of the H350R mutation, we performed mRNA-seq on Klf1+/−, Klf1H350R/−, and Klf1−/− fetal livers (FLs). Crosses were designed so that these three genotypes were present within the same litters; thus, appropriate comparisons could be made across the allelic series in the context of one Klf1 null allele. At E14.5, Klf1−/− embryos were severely anemic as expected12, but Klf1H350R/− embryos were phenotypically indistinguishable from Klf1+/− embryos. MGG-stained cytospins of blood and fetal liver cells are shown in Supplementary Fig. 2. The circulating cells in Klf1−/− embryos are extremely fragile, as previously reported34. Late erythroblasts are present in the fetal livers of all genotypes (despite the lack of Ter119 expression) with some level of hemoglobinization, but their membranes are irregular in Klf1−/− and Klf1H350R/− compared with Klf1+/− embryos. We identified 1786 differentially expressed genes (DEGs) between Klf1+/− and Klf1−/− FLs to re-establish the KLF1-dependent transcriptome in this cross (Supplementary Table 3). The majority of these genes were downregulated in the absence of Klf1, consistent with prior studies showing that KLF1 acts primarily as a transcriptional activator in vivo15. As expected, this gene list is similar to previously published lists in BALB/c mice15,18,34.

We next identified only 228 genes that are sensitive to the H350R missense mutation by determining DEGs between Klf1+/− and Klf1H350R/− FL RNAs (Fig. 2A); 168 of these overlap with the KLF1-dependent gene list and 60 do not (Fig. 2A). Most of these genes display reduced expression in Klf1H350R/− FL, but some are increased (Fig. 2B). In fact, there is a range of transcriptional sensitivity to the H350R mutation, with many genes displaying slightly reduced expression levels that fail to meet statistical cut-offs (Fig. 2B). Thus, many H350R-sensitive genes have an expression level intermediate between those in Klf1−/− and Klf1+/− FL, as would be expected for a hypomorphic mutation (Supplementary Fig. 3). For the significantly downregulated DEGs, we found enrichment of gene ontology terms for the membrane cellular component and for biological processes relating to the membrane (e.g. transmembrane transport and ion transport), and we found no significant differences in the expression of genes central to the function of KLF1, such as globin genes and cell cycle transcription factors (E2f2, E2f4)19,35. We undertook qRT-PCR to validate H350R-sensitive genes in comparison with known Klf1 target genes such as β-globin (Hbb-b1)11,15. Consistent with the RNA-seq data, Hbb-b1, Gata2, Gata1 and Fli1 are expressed at normal levels in the FLs of Klf1H350R/− mice (Fig. 2B, D). E2f2, E2f4, Tspo2, and Dusp115,18,26 were expressed at intermediate levels in Klf1H350R/− FL RNA compared to Klf1+/− and Klf1−/− FL RNA; i.e. they were only mildly reduced (Fig. 2D). However, many genes that encode transmembrane and cytoskeletal proteins and enzymes, such as Slc2a1, Slc43a1, Dmtn (Epb4.9), Slc1a5, Slc2a4, Fn3k, and Rgcc, were expressed at very low levels in Klf1H350R/− FL compared to Klf1+/− FL (Fig. 2D). Thus, the RNA-seq and qRT-PCR results support the genetic data in showing that the KLF1-H350R protein is more defective at some target gene promoters and enhancers than at others; this would also explain the presence of NSHA rather than thalassemia.
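The idea that many H350R-sensitive genes sit between the Klf1−/− and Klf1+/− levels can be summarized as one number per gene: the fraction of KLF1-dependent expression retained in the mutant. A minimal sketch, assuming mean normalized expression values per genotype are already available; the index, the names `residual_fraction` and `classify`, the bin cut-offs, and the example values are all hypothetical illustrations, not a metric used in this study.

```python
from typing import Optional


def residual_fraction(het: float, mutant: float, null: float) -> Optional[float]:
    """Fraction of the KLF1-dependent expression retained in the mutant.

    het    : mean expression in Klf1+/- (one functional allele)
    mutant : mean expression in Klf1H350R/-
    null   : mean expression in Klf1-/- (no KLF1)
    Returns 1.0 if the mutant matches Klf1+/-, 0.0 if it matches Klf1-/-,
    or None when the gene is not KLF1-dependent (het == null).
    """
    dependent_part = het - null
    if dependent_part == 0:
        return None
    return (mutant - null) / dependent_part


def classify(frac: Optional[float], lo: float = 0.25, hi: float = 0.75) -> str:
    """Bin genes by residual fraction; the cut-offs are arbitrary assumptions."""
    if frac is None:
        return "KLF1-independent"
    if frac >= hi:
        return "insensitive"
    if frac <= lo:
        return "highly sensitive"
    return "intermediate (hypomorphic)"


if __name__ == "__main__":
    # Hypothetical mean expression values: (Klf1+/-, Klf1H350R/-, Klf1-/-).
    genes = {
        "gene_a": (100.0, 95.0, 10.0),  # pattern like that described for Hbb-b1
        "gene_b": (100.0, 55.0, 10.0),  # pattern like that described for E2f2
        "gene_c": (100.0, 12.0, 10.0),  # pattern like that described for Dmtn
    }
    for name, (het, mut, null) in genes.items():
        frac = residual_fraction(het, mut, null)
        print(name, round(frac, 2), classify(frac))
```

Normalizing to the Klf1+/− to Klf1−/− range, rather than using a raw fold change, keeps genes with very different KLF1 dependence on a common 0 to 1 scale.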

Fig. 2: A subset of KLF1 regulated genes are sensitive to the H350R mutation.

A Venn diagram of the 1786 DEGs in the KLF1 regulated transcriptome (Klf1−/− vs Klf1+/−) from E14.5 fetal liver. Of these, 168 are sensitive to the H350R mutation (Klf1H350R/− vs Klf1+/−). B MA plot of DEGs from Klf1H350R/− embryos. Blue dots represent genes that are >2-fold down (left) or up (right) regulated in H350R/− versus +/−. C qRT-PCR validation of DEGs identified from RNA-seq was performed with three biological replicates for each genotype and normalized to Hprt. The mean ± SEM is shown for each genotype and gene. One-way ANOVA with the Tukey post-hoc test was used for statistical analysis.

Adult splenic erythroid cells from Klf1 H350R/− mice display similarly altered gene expression

Klf1H350R/− mice survive into adulthood, so we also undertook RNA-seq on FACS-sorted erythroid cells (CD71+, TER119+) from the spleens of adult mice (8–12 weeks of age) (see Methods). These cells are intermediate to late erythroblasts (S3 cells according to ref. 36). In this way we were able to compare transcriptomes in phenotypically similar populations of purified erythroblasts from five different genotypes (+/+, +/H350R, +/−, H350R/H350R and H350R/−), and thereby generate gene expression profiles across an allelic series. Klf1−/− mice die at E14.5, so we could not examine gene expression in adult erythroid cells of this genotype. There are few CD71+ TER119+ (S3) erythroid cells in the spleens of WT mice, but their numbers increase progressively in H350R/H350R and H350R/− mice, consistent with the degree of splenomegaly and extramedullary erythropoiesis (Figs. 1F, 3A). The level of surface expression (MFI) of TER119 and CD71 is slightly reduced in Klf1H350R/− splenic erythroid cells compared with other genotypes (Figs. 1C, 3A); this contrasts with Klf1−/− FL erythroid cells, which have markedly reduced expression of TER119 and moderately reduced expression of CD7134.
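The sorting strategy (exclude B220+, CD3+ and CD11b+ cells, then gate on CD71 and TER119, as in Fig. 3A) amounts to a sequence of threshold tests per event. A minimal sketch, assuming compensated fluorescence intensities per event and a single fixed positivity threshold per marker; the `Event` class, the threshold values and the function names are hypothetical, and real gates are drawn on the plots rather than set as constants.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One cytometry event with (hypothetical) compensated intensities."""
    b220: float
    cd3: float
    cd11b: float
    cd71: float
    ter119: float


# Hypothetical positivity thresholds in arbitrary fluorescence units.
LINEAGE_CUTOFF = 1_000.0
CD71_CUTOFF = 1_000.0
TER119_CUTOFF = 1_000.0


def is_lineage_negative(e: Event) -> bool:
    """Exclude B cells (B220), T cells (CD3) and myeloid cells (CD11b)."""
    return max(e.b220, e.cd3, e.cd11b) < LINEAGE_CUTOFF


def quadrant(e: Event) -> str:
    """Assign a CD71/TER119 quadrant; double positives are the S3 gate."""
    cd71_pos = e.cd71 >= CD71_CUTOFF
    ter_pos = e.ter119 >= TER119_CUTOFF
    if cd71_pos and ter_pos:
        return "CD71+ TER119+ (S3)"
    if ter_pos:
        return "CD71- TER119+"
    if cd71_pos:
        return "CD71+ TER119-"
    return "CD71- TER119-"


def s3_percentage(events: list[Event]) -> float:
    """Percentage of lineage-negative events that fall in the S3 gate."""
    lin_neg = [e for e in events if is_lineage_negative(e)]
    if not lin_neg:
        return 0.0
    s3 = sum(quadrant(e).startswith("CD71+ TER119+") for e in lin_neg)
    return 100.0 * s3 / len(lin_neg)
```

Computing the percentage over lineage-negative events only is what makes the S3 fraction comparable between spleens whose B and T cell content differs as much as it does across the allelic series.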

Fig. 3: KLF1-H350R sensitive genes in splenic erythroid progenitor cells.

A FACS plots of B220-, CD3-, CD11b- cells in the spleens stained for CD71 and TER119. There is an increasing percentage of CD71+ TER119+ (S3) pro-erythroblasts in the spleens across the allelic series of mice. B Volcano plot of log fold change versus −log10 FDR in Klf1H350R/− (n = 4) versus Klf1+/+ (n = 3) erythroid cells. Dots indicate DEGs with >1.5-fold expression change and an FDR of <0.05. C Expression of KLF1-H350R-sensitive genes for five different genotypes. Each dot is a sample from a different mouse. The legend indicates genotypes and numbers. D Expression of KLF1-H350R-insensitive genes. E Expression of some EPO target genes is increased in Klf1H350R/− mice.

We found 419 DEGs in erythroblasts from Klf1H350R/H350R mice compared with WT mice (1.5-fold; FDR < 0.05) (Supplementary Fig. 4), and 1209 DEGs in Klf1H350R/− mice compared with WT mice (Fig. 3B). In both comparisons, ~65% of genes were downregulated in the mutant mice and ~35% were upregulated. The upregulated genes were enriched for those encoding proteins involved in EPOR signaling (e.g. Clint1, Suv420h2, Podxl, Thbs1, Cish, etc.)37 or stress erythropoiesis (Fig. 3E); none of the top 25 upregulated genes had a KLF1 ChIP-seq peak nearby. We therefore conclude that these genes are upregulated indirectly in response to anemic stress. The downregulated genes overlapped strongly with the H350R-sensitive genes in the FL. For example, Epb4.9 (dematin), Slc1a5 (Asct2), Rgcc, Fn3k, and Slc2a4 (Glut4) were markedly downregulated in Klf1H350R/− erythroid cells from both spleen and FL (Figs. 2, 3C). Slc2a1 (Glut1) is not expressed in adult erythroid cells38, so it was found as a DEG only in FL. In many cases, there was a clear gradient of gene expression level across the allelic series; i.e. relative to +/+ cells, H350R/− cells showed the greatest perturbation of gene expression, followed by H350R/H350R, +/−, and H350R/+ cells (Fig. 3). Interestingly, expression of some well-known KLF1 target genes such as Hbb-b1 (β-globin), E2f2 and E2f426 was minimally perturbed in splenic erythroid cells from Klf1H350R/− mice, as in the FL (Figs. 2, 3D). Embryonic and fetal globin genes (Hbb-bh1, Hbb-y) were expressed at very low levels in adult spleen and were not significantly altered in Klf1H350R/− mice (Fig. 3D); i.e. there was no evidence for upregulation of fetal and embryonic globin genes in Klf1H350R/− mice. This contrasts with studies of mice with dominant NSHA due to a missense mutation in zinc finger two that contacts DNA (the nan mouse)39,40, and with elevated human γ-globin gene expression in Klf1−/− mice that harbor a human β-globin locus YAC41 and in KLF1+/− humans42. In short, RNA-seq in adult splenic erythroid progenitor cells (S3 cells) detected a set of KLF1-H350R-sensitive genes similar to that found in the FL. We suggest that this combination of expression changes underpins the observed NSHA phenotype rather than thalassemia (Fig. 1).
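The DEG thresholds used above (at least 1.5-fold change and FDR < 0.05) combine an effect-size cut-off with a multiple-testing correction. A minimal sketch, assuming per-gene log2 fold changes and raw p-values have already been produced by a differential expression tool, and assuming the Benjamini-Hochberg procedure as the FDR method (the text above does not name the correction used); `call_degs` and its arguments are hypothetical names.

```python
import math


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(genes: list[str], log2fc: list[float], pvalues: list[float],
              min_fold: float = 1.5, max_fdr: float = 0.05
              ) -> tuple[list[str], list[str]]:
    """Split genes into (down, up) lists passing both FDR and fold cut-offs."""
    fdr = benjamini_hochberg(pvalues)
    min_lfc = math.log2(min_fold)  # 1.5-fold is |log2FC| >= ~0.585
    down, up = [], []
    for g, lfc, q in zip(genes, log2fc, fdr):
        if q < max_fdr and abs(lfc) >= min_lfc:
            (down if lfc < 0 else up).append(g)
    return down, up
```

Working on the log2 scale makes the fold cut-off symmetric: 1.5-fold up and 1.5-fold down both correspond to |log2FC| ≥ log2(1.5) ≈ 0.585.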

KLF1-H350R has reduced affinity but unaltered specificity for DNA in vivo

To determine the in vivo binding characteristics of KLF1-H350R, we performed ChIP-seq for KLF1 in Klf1+/+ and Klf1H350R/H350R FLs. We identified a total of 1144 KLF1-occupied sites in wild-type FL using MACS2 (see Methods), an improvement over previously published data16 that reflects improved technique. High-quality ChIP-seq data for murine KLF1 in primary cells have proven difficult to obtain, so this data set (GSE94351) provides a valuable new resource for the field. Only 490 KLF1-occupied sites passed MACS2 peak calling in homozygous Klf1H350R/H350R FL, most likely due to globally reduced DNA-binding affinity (see below). An overlap of 341 sites was observed between the two data sets based on a window of 100 bp either side of the peak summits (Fig. 4A).
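Calling a peak "shared" when summits from the two data sets lie within 100 bp of each other can be sketched as a nearest-neighbour search on sorted summit positions. A minimal sketch, assuming summits are supplied as (chromosome, position) pairs; the function name `shared_summits` is hypothetical, and it counts wild-type summits that have at least one H350R summit within the window, which is only one of several reasonable overlap definitions (one-to-one matching can give slightly different totals).

```python
from bisect import bisect_left
from collections import defaultdict


def shared_summits(wt, mut, window=100):
    """Return WT summits with at least one mutant summit within +/- window bp.

    wt, mut: iterables of (chrom, position) tuples for peak summits.
    """
    # Index mutant summits by chromosome as sorted position lists.
    by_chrom = defaultdict(list)
    for chrom, pos in mut:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    shared = []
    for chrom, pos in wt:
        positions = by_chrom.get(chrom, [])
        # First mutant summit at or beyond pos - window ...
        i = bisect_left(positions, pos - window)
        # ... counts only if it also lies at or before pos + window.
        if i < len(positions) and positions[i] <= pos + window:
            shared.append((chrom, pos))
    return shared
```

Sorting once per chromosome and using binary search keeps the comparison fast even for tens of thousands of summits, instead of comparing every pair of peaks.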

Fig. 4: The H350R mutation does not alter DNA-sequence binding specificity in vivo.

A 1144 KLF1-occupied sites and 490 KLF1-H350R-occupied sites were called by MACS2 analysis. An overlap of 372 peaks (common peaks) was identified using a 100 bp window between peak summits. B De novo motif discovery found enrichment of the classical 9 bp CACCC motif (C-rich strand) in both WT and H350R datasets using 100 bp surrounding the peak summit. C Stacked histogram illustrating the proportion of central motif sequence variants. Central motifs (the CACCC motif nearest to the peak summit) were identified in 939 of the wild-type peaks and 415 of the H350R peaks. Peaks with more than one identified binding site were omitted from this analysis. D Read density heat maps showing enrichment by ChIP surrounding the peak summit in both WT and H350R. Read density heat maps of common peaks display the shared regions (summits within 100 bp) only.

Both wild-type and H350R-mutant KLF1 proteins preferred an identical DNA consensus motif in vivo, according to de novo motif discovery (Fig. 4B) and to counting of the specific motifs closest to the center of each peak, as described27 (Fig. 4C). Together, this shows that the DNA-binding specificity of KLF1-H350R is unaltered in vivo, in contrast with ChIP-seq data for KLF1 mutations in DNA-contacting residues, such as those that cause dominant hemolytic anemia in the Nan mouse27,29 and CDA-IV in humans30. The motif is consistent with the extended CACCC-box motif (CCM-CRC-CCN on the C-rich strand, or NGG-GYG-KGG on the G-rich strand) bound in vivo by all SP/KLF family members studied to date4,16,43,44,45,46,47. To further compare the in vivo DNA-binding ability of wild-type KLF1 and KLF1-H350R, we counted reads under each peak (±100 bp from the peak center). Counts over most shared peaks were consistently greater for wild-type KLF1 than for KLF1-H350R (Fig. 4D). Most H350R-sensitive genes identified by RNA-seq have a KLF1 ChIP-seq peak at the promoter, in an intron or upstream of the TSS, so most are likely direct targets of KLF1 (Table 1). Moreover, the motifs within the cis-regulatory modules of KLF1-H350R-sensitive genes are indistinguishable from those of less sensitive genes globally (Fig. 4C). For example, dematin (Epb4.9) encodes a key linker protein between the cytoskeleton and the actin junctional transmembrane complex that includes GLUT1 and stomatin48,49. There are three independent Epb4.9 gene promoters34. The second promoter and a putative enhancer in intron 2 are robustly bound by KLF1 in the FL (Fig. 5A). This binding is reduced in Klf1H350R/H350R cells, and the mRNA is markedly downregulated in Klf1H350R/− cells. A previous report showed that Glut1 is regulated by SP3 in muscle cells via a proximal promoter CACC box element50, but in FL there is a strong KLF1-binding site ~1 kb upstream of the promoter (Fig. 5B). Binding is markedly reduced in Klf1H350R/H350R FLs, and expression is markedly reduced in Klf1H350R/− FLs. The same applies to the promoters and putative intronic enhancers of Slc2a4 (Glut4) and Slc1a5 (Asct2) (Fig. 5C, D), and to many other KLF1-H350R-sensitive genes.
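Identifying the "central motif" (the CACCC-box closest to each peak summit) and classifying it as type 1 or type 2 can be sketched with regular expressions for the extended consensus CCM-CRC-CCN and its G-rich-strand form NGG-GYG-KGG. A minimal sketch, assuming IUPAC codes (M = A/C, R = A/G, Y = C/T, K = G/T, N = any base) and an uppercase sequence window around the summit; `central_motif` and the example sequences are hypothetical, exact consensus matching is a simplification (motif scanning is often done with position weight matrices instead), and this is not necessarily the counting procedure of ref. 27.

```python
import re

# Extended CACCC-box on the C-rich strand, CCM-CRC-CCN; the lookahead
# lets overlapping occurrences be reported.
C_RICH = re.compile(r"(?=(CC[AC]C[AG]CCC[ACGT]))")
# The same motif seen on the G-rich strand, NGG-GYG-KGG.
G_RICH = re.compile(r"(?=([ACGT]GGG[CT]G[GT]GG))")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def central_motif(seq: str, summit: int):
    """Return (motif on C-rich strand, type, distance) for the hit nearest summit.

    seq    : uppercase DNA sequence around the peak
    summit : 0-based index of the peak summit within seq
    Returns None when no consensus match is found.
    """
    hits = []
    for m in C_RICH.finditer(seq):
        hits.append((m.start(), m.group(1)))
    for m in G_RICH.finditer(seq):
        # Reverse-complement so every motif is reported on the C-rich strand.
        hits.append((m.start(), m.group(1).translate(COMPLEMENT)[::-1]))
    if not hits:
        return None
    # The 9-mer is centred 4 bp from its start on either strand.
    start, motif = min(hits, key=lambda h: abs(h[0] + 4 - summit))
    # Central R: G gives a type 1 site, A gives a type 2 site.
    motif_type = "type 1" if motif[4] == "G" else "type 2"
    return motif, motif_type, abs(start + 4 - summit)
```

For example, a window containing CCC-CGC-CCT centred on the summit is reported as a type 1 site at distance 0, while AGG-GTG-TGG on the G-rich strand is reported as CCA-CAC-CCT, the classical type 2 CACCC box.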

Table 1 KLF Linker 2 (H350R)-sensitive genes (DEGs from mRNA-seq in fetal liver)
Fig. 5: Dmtn (Epb4.9), Slc2a1 (Glut1), Slc1a5 (Asct2) and Slc2a4 (Glut4) are all direct targets of KLF1 and are sensitive to the KLF1-H350R mutation.

UCSC Genome Browser views of wiggle tracks of KLF1 ChIP-seq (red and blue) and mRNA-seq from the fetal liver (black and grey). ChIP-seq peak summits called by MACS2 are indicated with solid bars above the wiggle tracks of ChIP signal, which was generated by subtracting the background input DNA signal from the ChIP signal (Klf1+/+: red, Klf1H350R/H350R: blue). The mRNA-seq from the three fetal liver genotypes, Klf1+/−, Klf1H350R/− and Klf1−/−, is shown below; forward strand reads are coloured black and reverse strand reads grey. KLF1 binds A the second promoter and putative enhancer for Dmtn, B a putative promoter for Slc2a1, C the promoter and putative intronic enhancer for Slc1a5, and D the promoter and putative intronic enhancer for Slc2a4. There is reduced ChIP-seq signal in Klf1H350R/H350R mice, and reduced levels of RNA expression in Klf1H350R/- mice.

Recombinant linker 2-mutant KLF1 zinc finger domains have reduced DNA-binding affinity in vitro

It can be difficult to determine the relative in vivo binding specificities of mutant transcription factors (TFs) from ChIP-seq data alone, owing to variables that are difficult to control51. We therefore produced and purified GST-zinc finger domain fusion proteins in E. coli as previously described27 (see Methods) for in vitro binding assays (Supplementary Fig. 5). The KLF1 binding sequence (CCM-CRC-CCN) is categorized by the presence of a guanine (type 1) or an adenine (type 2) at the central position (R). The importance of a cytosine versus an adenine at position 3 (M) for overall affinity has not been studied in detail. Using electrophoretic mobility shift assays, we found that both GST-KLF1-zf and GST-H350R-KLF1-zf bind to canonical type 1 motifs, such as those found in the Alas2 intronic enhancer and the putative promoter and intronic enhancer of Slc2a4 (Fig. 6A), and to type 2 motifs, such as those present in the E2f2 intronic enhancer26 and the Slc1a5 and Slc43a1 putative intronic enhancers (Fig. 6B). However, the affinity of GST-H350R-KLF1-zf for both the type 1 probe, CCC-CGC-CCT, and the type 2 probe, CCC-CAC-CCT27,40, is ~4-fold reduced (Fig. 6). Similar results were found for the Hbb promoter (Supplementary Fig. 5B). This reduced in vitro binding affinity is consistent with the ChIP-seq data. It shows that substitution of arginine for histidine at position +3 in linker 2 of KLF1 reduces the overall affinity of the zinc finger domain for all variants of the recognition motif tested. We made two additional mutations in linker 2, T348A and G349R, which are equivalent to human disease-associated mutations20,21,22,52. These also result in ~4-fold reduced binding affinity for both type 1 and type 2 KLF1-binding sites in vitro (Fig. 6).
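A ~4-fold affinity difference is a ratio of apparent dissociation constants estimated from a dilution series. A minimal sketch, assuming a single-site binding model with the probe far below the protein concentration (so the fraction of probe bound is approximately [P]/([P] + KD)), a two-fold (1:1) dilution series as described for Fig. 6, and a simple log-spaced grid-search least-squares fit; the concentrations, the noise-free synthetic data and the function names are hypothetical, and this is not necessarily the fitting procedure used for Fig. 6.

```python
import math


def fraction_bound(protein: float, kd: float) -> float:
    """Single-site binding model, ignoring depletion of free protein."""
    return protein / (protein + kd)


def dilution_series(top: float, n: int) -> list[float]:
    """Protein concentrations for serial 1:1 (two-fold) dilutions, highest first."""
    return [top / 2 ** i for i in range(n)]


def fit_kd(concs: list[float], fractions: list[float]) -> float:
    """Least-squares KD found by scanning a log-spaced grid (no SciPy needed)."""
    best_kd, best_sse = 0.0, math.inf
    for step in range(-300, 301):
        kd = 10 ** (step / 100)  # 0.001 to 1000, in the units of concs
        sse = sum((f - fraction_bound(c, kd)) ** 2
                  for c, f in zip(concs, fractions))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


if __name__ == "__main__":
    concs = dilution_series(top=320.0, n=8)  # hypothetical nM values
    # Noise-free synthetic "quantified band" fractions for two proteins.
    wt = [fraction_bound(c, 10.0) for c in concs]
    mutant = [fraction_bound(c, 40.0) for c in concs]
    kd_wt, kd_mut = fit_kd(concs, wt), fit_kd(concs, mutant)
    print(f"KD WT {kd_wt:.1f}, mutant {kd_mut:.1f}, ratio {kd_mut / kd_wt:.1f}")
```

A two-fold series spanning roughly 0.25× to 30× the expected KD, as here, samples both the rising and the saturating part of the binding curve, which is what constrains the fit.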

Fig. 6: Mutations in the second linker of KLF1 cause reduced DNA-binding affinity.

A Electrophoretic mobility shift assay comparing the binding affinity of three KLF1 linker mutations (H350R, G349R, T348A) for a known KLF1 binding site in the first intron of E2f2. Protein concentrations were first equalized and then serially loaded using a 1:1 dilution with binding buffer, for each lane from left to right. The probe concentration was constant across all four panels. B KD values are calculated from quantification of each blot shown (C).

To seek an explanation for the reduced binding affinity, we revisited the structure of KLF4 bound to DNA53 and other solved zinc finger structures8,10,54. In the KLF4 structure, the threonine at +1 in linker 2 is part of the extended α-helix, far from the DNA substrate (Supplementary Fig. 6, yellow box). The glycine at position +2 plays an important role in α-helix termination, or C-capping. Similarly, in TFIIIA the glycine at position +2 in both linkers 1 and 2 makes an i, i + 4 hydrogen bond with a carbonyl group on serine or leucine in the preceding α-helix (red line, Supplementary Fig. 6). In Zif268 and TFIIIA, the +4 lysine contacts the DNA backbone, and this could also increase binding affinity10. It is difficult to predict the consequences of an arginine substitution at +3 in linker 2 (or of G>R at +2 in the case of human patients), although it is worth noting that arginine has a slightly stronger propensity to adopt α-helical secondary structure than either histidine or glycine55. One possible explanation for the observed phenotype is therefore that these mutations modify the length of the α-helix of finger 2, in both cases reducing DNA-binding affinity. Alternatively, the close positioning of arginine side chains at both linker positions +2/3 and +4 could result in repulsive forces between the side chains that destabilize the important interaction between the +4 arginine and the DNA backbone.

Supplementation with L-leucine results in partial rescue of NSHA

One of the KLF1-H350R-sensitive target genes is Slc43a1 (LAT3). LAT3-deficient zebrafish have severe anemia that can be rescued by L-leucine supplementation in the water56. We therefore asked whether we could rescue the NSHA in Klf1H350R/− mice by supplementing the drinking water with L-leucine (1.5%). Indeed, there was a significant increase in Hb of ~40 g/L (corrected p value < 0.005) and in red blood cell count (RCC) of ~1.5 × 10⁹/L (corrected p value < 0.01) in Klf1H350R/− mice, which was maximal after four weeks of supplementation, partially maintained at 6 weeks (18 g/L, p value < 0.01 for Hb), but mostly lost by 8 weeks. There was no change in Hb or RCC in Klf1+/− mice (Fig. 7) (see Discussion).
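A 1.5% (w/v) solution, as used here, converts directly into an approximate daily intake, which is useful when comparing supplementation regimens. A minimal sketch; the daily water consumption (5 mL per mouse) and body weight (25 g) are assumptions for illustration only, not values reported in this study, and actual intake varies with strain, age, sex and housing.

```python
def leucine_g_per_l(percent_w_v: float) -> float:
    """Convert % (w/v), i.e. g per 100 mL, into g per litre."""
    return percent_w_v * 10.0


def daily_dose_mg_per_kg(percent_w_v: float, water_ml_per_day: float,
                         body_weight_g: float) -> float:
    """Approximate L-leucine intake in mg per kg body weight per day."""
    mg_per_ml = leucine_g_per_l(percent_w_v)  # g/L is numerically mg/mL
    mg_per_day = mg_per_ml * water_ml_per_day
    return mg_per_day / (body_weight_g / 1000.0)


if __name__ == "__main__":
    # Assumed intake of 5 mL/day and a 25 g mouse (illustrative only).
    print(leucine_g_per_l(1.5))                          # 15.0
    print(round(daily_dose_mg_per_kg(1.5, 5.0, 25.0)))   # 3000
```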

Fig. 7: Partial transient rescue of anemia in Klf1H350R/− mice with L-leucine supplementation.

Hemoglobin (Hb) levels and RBC counts (RCC) in mice at baseline and every second week following supplementation with L-leucine (1.5% weight/vol) in the drinking water for Klf1+/− (red; n = 8) and Klf1H350R/− (blue; n = 9) mice. There is no significant change in Hb levels or RCC in Klf1+/− mice, but a significant and variable increase in Hb and RCC in Klf1H350R/− mice at 4 weeks (**p < 0.005) and at 6 weeks (*p < 0.01) after L-leucine supplementation. Two-way ANOVA with a Tukey post-hoc test was used for statistical analysis.

Discussion

Mutations in KLF1 are common in certain ethnic groups57,58. Homozygous loss-of-function (LoF) mutations result in hydrops fetalis, which is fatal without intrauterine or perinatal blood transfusion support13, whereas heterozygous LoF mutations result in a mild phenotype that resembles β-thalassemia trait, with mildly raised HbF and HbA2 and raised ZPP, plus blood group serological abnormalities23,59. Missense mutations in linker 2 result in very mild or absent phenotypes, but when co-inherited with loss-of-function mutations on the other allele, they lead to severe NSHA21. Lastly, dominant mutations at p.Glu325 in zinc finger 2 cause CDA-IV60. This phenotypic variability is not fully understood and leads to diagnostic challenges.

We generated a murine model of NSHA by crossing loss-of-function12 and linker 2 mutations in Klf124. The mice have severe fragmentary hemolysis rather than β-thalassemia, as do human patients with NSHA and KLF1 linker mutations22,23. Using RNA-seq we show that this is due to selective loss of a small subset of KLF1-dependent genes. Many of these encode transmembrane and cytoskeletal proteins, which is consistent with the NSHA phenotype. One of the most downregulated genes is dematin (Epb4.9), a critical regulator of the cytoskeleton. Dematin knockout mice have severe fragmentary hemolysis and red blood cell dysmorphology similar to Klf1H350R/− mice61. We also found >10-fold downregulation of Slc2a1 (Glut1), Slc2a4 (Glut4), and Slc1a5 (ASCT2) in the fetal liver (Fig. 2), and of Slc2a4 and Slc1a5 in the spleen (Fig. 3). There is a developmental switch in expression from Slc2a1 to Slc2a4 between fetal and adult red blood cells that is not altered by the Klf1 mutations38. Glut1 deficiency results in hemolytic anemia62, but the Glut4 knockout mouse does not have a major erythroid phenotype. ASCT2 is a glutamine transporter that is important for energy production and for generation of erythroid cells from HSCs63. GLUT1/4 and ASCT2 play critical roles in the uptake of glucose and glutamine, respectively. Glutamine is harnessed for protein biosynthesis, but is also a substrate for nucleotide biosynthesis via the TCA cycle63 (Fig. 8).

Fig. 8: Model of metabolic defects in erythrocytes of Klf1H350R/− mice.

The expression of three membrane transporters, Slc2a1 (Glut1), Slc1a5 (Asct2) and Slc43a1 (Lat3) were significantly downregulated in Klf1H350R/− fetal livers when compared to Klf1+/− litter mates. These three transporters are important in the maturing erythrocyte and are involved in metabolic processes such as glucose (GLUT1) and glutamine (ASCT2) uptake for nucleotide biosynthesis, as well as leucine and other neutral amino acid transport (LAT3) for hemoglobin synthesis. G6P is generated from glucose (imported by GLUT1) and is used for generation of NADPH from NADP+ by G6PD. This is critical for protection against oxidant stress.

Interestingly, exogenous delivery of nucleotides can rescue human erythroid differentiation from CD34+ cells in which nucleotide synthesis from glutamine is blocked with 6-diazo-5-oxo-L-norleucine (DON)63. Reduced expression of both transporters in Klf1H350R/− mice is therefore likely to result in a shortage of nucleotides and to contribute substantially to the anemia (Fig. 8). This raises the possibility that exogenous nucleotides might partially rescue the anemia in patients with linker mutations in KLF1, if they could be delivered efficiently to erythroid progenitors in the bone marrow. Nucleotides are often added to infant formula preparations, and they can be purchased as supplements in the belief that they may enhance the health of the gut and immune system. Nucleotides are degraded to nucleosides by phosphatases in the gut and efficiently taken up by intestinal epithelial cells, but it is not clear how much enters the circulation to short-cut the purine and pyrimidine biosynthetic pathways. Hematopoietic stem and progenitor cells express members of the ENT and CNT nucleoside transporter families, so uptake of dietary nucleosides is possible. Although this is not likely to be of major importance for normal nucleotide biosynthesis, it might provide an alternative source of nucleosides for KLF1-mutant erythroid progenitors, which are likely to be starved of nucleotides because of deficiency in the ASCT2 and GLUT1 transporters. Also, supplementation of the diet with L-glutamine might benefit patients with NSHA due to KLF1 mutations, just as it does patients with sickle cell disease64. In this case the L-glutamine would need to enter erythroid progenitors despite low levels of expression of its transporter, ASCT2.

Slc43a1 (LAT3) is also markedly downregulated in Klf1H350R/− FL (Fig. 2). LAT3 is a neutral amino acid transporter with particular avidity for L-leucine, which is an essential building block for hemoglobin synthesis in maturing marrow erythroblasts56. Expression at <10% of endogenous levels is therefore also likely to contribute to the anemia in Klf1H350R/− mice (Fig. 8). Interestingly, supplementation of L-leucine in the drinking water led to a significant increase in hemoglobin and red cell count (RCC) in Klf1H350R/− mice (Fig. 7), as it does in LAT3-deficient zebrafish56. The timing of improvement is consistent with improved translation of globins and other important genes in erythroid progenitor cells, but it is not clear why this benefit is not maintained. This suggests there could be a secondary translational problem in Klf1H350R/− erythroid progenitor cells in addition to the transcriptional defects. Other transmembrane transporters are also sensitive to the H350R mutation, so the NSHA phenotype is likely due to the combined loss of many or all of these targets.

We employed ChIP-seq in primary FL cells to determine whether the DEGs are direct KLF1 target genes and whether the H350R mutation alters DNA-binding affinity or specificity. We found strong evidence for direct activation of most of the DEGs by KLF1. In particular, KLF1 binds the promoters and enhancers of Slc1a5 (ASCT2), Slc2a1 (GLUT1), Slc2a4 (GLUT4), Dmtn (EPB4.9) and many other genes (Fig. 5), so these are direct KLF1 target genes (Table 1). We found a globally reduced ChIP signal in Klf1H350R/H350R erythroid cells compared with wild type cells (Fig. 4), and confirmed a ~4-fold reduction in affinity at all KLF1 binding sequences tested, using recombinant purified zinc finger domains in EMSA (Fig. 6). This is consistent with previous work showing a similar loss of DNA-binding affinity in vitro as a result of linker mutations in other transcription factors9,65. Our work is the first to show that this is also true in vivo. The DNA-binding specificity of KLF1-H350R was unchanged in vivo and in vitro (Figs. 4, 6), as one might predict for a mutation located away from the canonical DNA-contacting residues (Supplementary Fig. 6). This contrasts with the change in DNA-binding specificity that accompanies mutations in the DNA-contacting amino acids of zinc finger 2, which cause dominant congenital dyserythropoietic anemia with marked erythroblastosis in mouse and man23,27,40,60.
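To illustrate what a ~4-fold loss of affinity can mean for site occupancy, the minimal sketch below assumes simple 1:1 equilibrium binding with protein in large excess, so that the fraction of sites bound is [P]/([P] + KD). The wild type KD of 10 nM is a hypothetical placeholder; only the 4-fold ratio is taken from the EMSA results, and the model ignores cooperativity, chromatin context and competing factors. It is offered as an illustration of the arithmetic only, not as an explanation of the gene-selective effects discussed here.

```python
def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Equilibrium fraction of DNA sites occupied under simple 1:1 binding.

    Assumes protein is in large excess over DNA, so free protein ~ total protein.
    """
    return protein_nM / (protein_nM + kd_nM)


# Hypothetical wild type KD; only the ~4-fold ratio comes from the EMSA data.
KD_WT_NM = 10.0
KD_H350R_NM = 4 * KD_WT_NM

for conc in (1.0, 10.0, 100.0, 1000.0):
    wt = fraction_bound(conc, KD_WT_NM)
    mutant = fraction_bound(conc, KD_H350R_NM)
    print(f"[P] = {conc:6.0f} nM: WT {wt:.2f}, H350R {mutant:.2f}, ratio {mutant / wt:.2f}")
```

Under these assumptions, at a protein concentration equal to the hypothetical wild type KD, occupancy falls from 0.50 to 0.20, whereas at a concentration 100-fold above that KD the mutant site remains about 96% occupied.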

Given that DNA-binding affinity is globally reduced, it is unclear why some genes are particularly sensitive to the linker 2 mutation and others are not. There were no obvious differences in promoter/enhancer context (e.g. additional DNA motifs such as GATA1-binding sites) between H350R-sensitive and insensitive genes. Some KLF1 target genes are also dose-sensitive in humans who are simple carriers of KLF1 loss-of-function (LoF) mutations20,21,23,66,67. The mechanism underlying the selective dependency of some genes therefore remains uncertain.

Nevertheless, this study sheds light on the general mechanisms by which mutations in the linkers of zinc finger TFs cause disease. Zinc finger TFs constitute one of the largest gene families in the mammalian genome. Linker mutations in ZBTB20 cause Primrose syndrome33, and similar mutations in ZNF407 cause cognitive impairment32. One important lesson from this work is that missense mutations in TFs can result in phenotypes that differ from those of loss-of-function mutations, because only a subset of target genes is affected. Consequently, clinical findings alone may not point to mutations in well-known genes as likely disease-causing candidates. Such genes might not be considered for re-sequencing, and the genetic etiology might remain a mystery.

Methods

Mouse breeding and tissue collection

All animal procedures were conducted with approval from the University of Queensland and The Alfred Research Alliance Animal Ethics Committees. Animals were housed in Tecniplast GreenLine IVC cages with corn cob bedding, on a 12 h/12 h light/dark cycle at 23 °C ambient temperature and 70% humidity. Animals were fed standard chow ad libitum. Heterozygous mommeD45 (D45) mice (Klf1H350R/+)24 were interbred with Klf1 heterozygous knockout (Klf1+/−) mice12. Both male and female offspring were used in this study and were sacrificed by cervical dislocation. Pregnant females were sacrificed at E14.5, and single-cell suspensions of fetal liver (FL) cells were lysed in TRIzol. For rescue experiments, L-leucine was added at 1.5% to the drinking water from 6 weeks of age for 4 weeks.

Flow cytometry

Flow cytometry analyses were performed on blood, spleen and bone marrow collected from mice at 6–8 weeks of age. FACS of splenic erythroblasts was performed on spleens from animals aged 8–12 weeks. Antibodies used in this study are listed in Supplementary Table 1. All samples were analyzed on a BD LSRII flow cytometer, and data were analyzed and cytometry plots generated with FlowJo (v9) software.

Histology and hematological evaluation

Spleens were fixed in 4% PFA overnight at 4 °C, dehydrated in 70% ethanol and embedded in paraffin. Sections (4 μm) were stained with hematoxylin and eosin. Blood smears were prepared from peripheral blood of two-week-old mice and stained with May-Grünwald-Giemsa. Peripheral blood from six-week-old mice was collected and analyzed using a CELL-DYN Sapphire or CELL-DYN Emerald (Abbott Diagnostics) automated hematology analyzer with mouse software.

Transcriptome profiling and gene expression analysis

RNA was extracted from TRIzol suspensions (ThermoFisher) of female Klf1+/−, Klf1H350R/− and Klf1−/− fetal livers (FLs) according to the manufacturer's protocol. Whole fetal liver samples were used because CD71 and TER119 surface expression cannot be used to sort erythroid cells by maturation stage in Klf1−/− fetal livers18,34. Samples with a Bioanalyzer RIN score >7 (Agilent Technologies) were filtered with the GLOBINclear™ kit (ThermoFisher), and mRNA was selected using the Dynabeads mRNA DIRECT kit (ThermoFisher). cDNA libraries were prepared from the filtered RNA using the Ion Total RNA-Seq Kit v2 (ThermoFisher) and sequenced on the Ion Proton as described27. Reads were mapped to the mouse genome (mm9) using a combination of TopHat and TMAP27. Cuffdiff68 was used to identify differentially expressed genes between the three genotypes: Klf1+/−, Klf1H350R/− and Klf1−/−. Gene expression levels were validated by qRT-PCR using SYBR Green PCR Master Mix (ThermoFisher). Primer sequences are available in Supplementary Table 2. Graphs were generated using RStudio (v0.98.1103) with the ggplot2 package and GraphPad Prism (v5.01).
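The Methods do not specify how qRT-PCR data were normalized. As one common option, the sketch below assumes the Livak 2^−ΔΔCt method with a hypothetical reference gene, roughly 100% amplification efficiency for both amplicons, and Klf1+/− as the calibrator genotype. The function name and Ct values are illustrative only.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^-ddCt (Livak) method.

    Each argument is a list of technical-replicate Ct values. Assumes ~100%
    PCR efficiency for both the target and the (hypothetical) reference gene.
    """
    # Normalize the target to the reference gene within each sample.
    d_ct_sample = mean(ct_target) - mean(ct_reference)
    d_ct_calibrator = mean(ct_target_cal) - mean(ct_reference_cal)
    # Compare the test sample with the calibrator (assumed here to be Klf1+/-).
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2 ** (-dd_ct)


# Illustrative Ct values only: the target crosses threshold 3 cycles later
# in the test sample, with identical reference-gene Ct values.
fold = relative_expression([27.0, 27.1], [18.0, 18.1], [24.0, 24.1], [18.0, 18.1])
print(f"relative expression: {fold:.3f}")
```

If amplification efficiencies differ appreciably between amplicons, an efficiency-corrected calculation would be more appropriate than this simple form.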

RNA from FACS-sorted CD71+TER119+ splenic erythroblasts of five viable mouse genotypes was extracted from TRIzol. mRNA was isolated from 1 μg total RNA using the NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB; #E7490). RNA-seq libraries were prepared using the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (NEB; #E7760) according to the manufacturer's instructions. Libraries were multiplexed and sequenced on an Illumina NovaSeq 6000 in paired-end mode with 50 bp reads. Reads were aligned to the mouse genome (mm9) with HISAT2 (v2.1.0), and optical duplicates were removed with Picard (v2.9.2). Gene-level counts were obtained with featureCounts (Subread v1.6.4). Differential expression analysis was performed with voom/limma through the Degust web portal (v4.2-dev).
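For orientation, the first step of voom converts gene-level counts to log2 counts per million, adding 0.5 to each count and 1 to the library size, before a mean–variance trend is estimated to derive precision weights for limma. The sketch below reproduces only that transformation in plain Python, assuming unnormalized library sizes (the total count per sample); it is not the Degust or limma implementation, and the sample and gene names are hypothetical.

```python
import math


def log_cpm(counts_by_sample):
    """voom-style log2 counts per million.

    counts_by_sample maps sample name -> {gene: read count}. Library size is
    taken as the total count per sample (no TMM or other normalization factors).
    """
    result = {}
    for sample, counts in counts_by_sample.items():
        lib_size = sum(counts.values())
        result[sample] = {
            # The 0.5 and 1 offsets avoid log(0) for genes with zero counts.
            gene: math.log2((count + 0.5) / (lib_size + 1) * 1e6)
            for gene, count in counts.items()
        }
    return result


# Hypothetical counts for two genes in one sample.
example = log_cpm({"sample_1": {"gene_a": 0, "gene_b": 999_999}})
print(example)
```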

Chromatin Immunoprecipitation with sequencing

ChIP was performed on FL cells from wild type and homozygous Klf1H350R/H350R embryos using a rabbit polyclonal antibody raised against the N-terminus of KLF116. Two immunoprecipitations from FLs of each genotype were pooled to increase library complexity. Sequencing libraries were generated from ChIP and input DNA using the NEBNext ChIP-seq Library Prep Reagent Set (NEB #E6200) and NEBNext Multiplex Oligos for Illumina (NEB #E7335), and sequenced on a NextSeq 500 (Illumina). Reads were mapped to the mouse genome (mm9 and mm10) using Bowtie2 and filtered against a blacklist, with a whitelist used to re-introduce the multimapping α- and β-globin regions. PCR duplicates were removed using Picard tools (https://broadinstitute.github.io/picard/). MACS2 was used to call peaks corresponding to regions bound by wild type KLF1 and KLF1-H350R. DNA enrichment was validated by qPCR using primers spanning the peak and genomic regions ~1 kb upstream and downstream of the peak (see Supplementary Table 2 for primer sequences). Graphs were generated using RStudio (v0.98.1103) with the ggplot2 package. Heat maps of enriched sequencing read depth were created using EaSeq69.
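The qPCR validation compares enrichment at each peak with flanking regions ~1 kb away. The Methods do not state how enrichment was quantified; the sketch below assumes the common percent-input calculation (input Ct adjusted for the fraction of chromatin kept as input, perfect doubling per cycle), followed by the ratio of the peak amplicon to the mean of the two flanking amplicons. The 1% input fraction, the function names and the Ct values are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percent of input recovered by ChIP for one amplicon.

    input_fraction is the share of chromatin kept as input (hypothetical 1%).
    Assumes perfect doubling per PCR cycle.
    """
    # Adjust the input Ct so it represents 100% of the chromatin.
    ct_input_total = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (ct_input_total - ct_ip)


def peak_over_flanks(peak_pct, upstream_pct, downstream_pct):
    """Enrichment at the peak relative to the mean of the flanking regions."""
    return peak_pct / ((upstream_pct + downstream_pct) / 2)


# Illustrative Ct values for one locus (not data from this study).
peak = percent_input(ct_ip=24.0, ct_input=25.0)
up = percent_input(ct_ip=29.0, ct_input=25.0)
down = percent_input(ct_ip=29.0, ct_input=25.0)
print(f"peak {peak:.2f}% input, {peak_over_flanks(peak, up, down):.1f}-fold over flanks")
```

In this example the peak amplicon crosses threshold five cycles earlier than both flanking amplicons, which corresponds to a 2^5 = 32-fold enrichment.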

Recombinant GST-ZF protein purification and in vitro binding affinity assays

The ZF domain of murine KLF1 (amino acids 261–376) was amplified by RT-PCR from Klf1H350R/H350R FL cDNA and cloned into pGEX-6p-1 in frame with GST to generate KLF1-H350R-zf. The T348A and G349R mutations were introduced by PCR mutagenesis, and the products were cloned into the same site of pGEX-6p-1 to generate KLF1-T348A-zf and KLF1-G349R-zf. Generation of the wild type GST-KLF-zf vector was previously reported27. Rosetta (DE3) cells were transformed and selected with ampicillin and chloramphenicol. Expression of GST-fusion proteins was induced overnight at 16 °C with 1 mM IPTG. Fusion proteins were extracted, purified on Glutathione-Sepharose 4B beads (GE Healthcare #17075601) and eluted with reduced glutathione in elution buffer as previously described27. Protein purity and yield were monitored by SDS-PAGE with Coomassie Blue staining (Supplementary Fig. 5), and function was verified by electrophoretic mobility shift assay (EMSA). Oligos (20-mers) were end-labeled with [γ-32P]ATP and polynucleotide kinase, annealed with excess reverse-strand oligo and purified using Sephadex G-25 spin columns (Roche). The forward-strand oligo sequences are provided in Supplementary Table 2. EMSA was performed as described27. Free and bound probes were quantified using a Typhoon phosphorimager and ImageQuant (IQ) software. The KD was determined by plotting the fraction of probe bound (shifted probe relative to total, bound plus free) against protein concentration. A third-order polynomial (n = 3) was fitted in GraphPad Prism, and the protein concentration at which 50% of the probe is bound (i.e. KD) was interpolated from this curve.
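The KD interpolation described above can be sketched with NumPy as follows. This is a minimal sketch, not the GraphPad Prism procedure used in the study: it assumes fraction bound has already been computed as bound/(bound + free) for each lane, fits a cubic with numpy.polyfit, and solves for the concentration at which the fitted curve equals 0.5. The function name and the titration values in the usage example are hypothetical.

```python
import numpy as np


def kd_from_cubic_fit(conc_nM, fraction_bound):
    """Interpolate the protein concentration giving 50% probe binding.

    Fits a third-order polynomial to fraction bound vs protein concentration and
    returns the lowest real crossing of 0.5 within the tested concentration range.
    """
    conc = np.asarray(conc_nM, dtype=float)
    frac = np.asarray(fraction_bound, dtype=float)
    coeffs = np.polyfit(conc, frac, deg=3)  # highest power first
    shifted = coeffs.copy()
    shifted[-1] -= 0.5  # solve p(x) - 0.5 = 0
    roots = np.roots(shifted)
    real = roots[np.isclose(roots.imag, 0.0)].real
    in_range = real[(real >= conc.min()) & (real <= conc.max())]
    if in_range.size == 0:
        raise ValueError("fitted curve does not cross 50% bound in the tested range")
    return float(in_range.min())


# Hypothetical titration: protein concentrations (nM) and fraction of probe shifted.
conc = [0, 5, 10, 20, 40, 80]
frac = [0.0, 0.2, 0.35, 0.55, 0.75, 0.9]
print(f"apparent KD ~ {kd_from_cubic_fit(conc, frac):.1f} nM")
```

Because a cubic is not a binding isotherm, the interpolated value is only meaningful inside the tested concentration range, which is why roots outside that range are discarded.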

Structural analysis

The structure of KLF4 bound to DNA (PDB ID 5KE6)53 was visualized and Supplementary Fig. 6 was prepared using the UCSF Chimera Software Package70.

Statistics and reproducibility

Statistical analysis was performed in GraphPad Prism. One-way ANOVA with Tukey's post-hoc test was used to compare hematological parameters and qRT-PCR results.
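For readers reproducing such comparisons outside Prism, an equivalent analysis can be sketched with SciPy, using scipy.stats.f_oneway for the one-way ANOVA and scipy.stats.tukey_hsd (available from SciPy 1.8) for the post-hoc comparisons. The function name, group labels and values below are made up for illustration and are not data from this study.

```python
from scipy import stats


def anova_with_tukey(groups):
    """One-way ANOVA across all groups, then Tukey's HSD for every pair.

    groups maps a label (e.g. a genotype) to a list of measurements.
    """
    samples = list(groups.values())
    anova = stats.f_oneway(*samples)
    tukey = stats.tukey_hsd(*samples)
    return anova, tukey


# Hypothetical measurements for three groups.
example_groups = {
    "group_a": [1.0, 2.0, 3.0],
    "group_b": [4.0, 5.0, 6.0],
    "group_c": [7.0, 8.0, 9.0],
}
anova, tukey = anova_with_tukey(example_groups)
print(f"F = {anova.statistic:.2f}, p = {anova.pvalue:.2g}")
print(tukey)  # pairwise differences, confidence intervals and adjusted p-values
```

The rows and columns of tukey.pvalue follow the insertion order of the dictionary, so index 0 corresponds to the first group listed.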

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.