Main

Recently, genetic discoveries have generally focused on common variants of small effect and rare coding variants identified through genome‐wide association studies (GWAS) and whole‐exome sequencing initiatives, respectively11,12. The effect of low‐frequency and rare non‐coding variants on common diseases and their underlying traits has recently been explored in an isolated population13,14, but has not been well studied to date in the general population. The UK10K project has generated a large whole‐genome sequence‐based resource to address this question in a general European‐ancestry population10, which is tenfold larger than the European subset of the 1000 Genomes project reference15.

Osteoporosis, diagnosed mainly through measurement of bone mineral density (BMD), is a common systemic skeletal disease characterized by an increased propensity to fracture. The narrow‐sense heritability of BMD has been estimated to be ∼85%, and GWAS have successfully identified numerous loci associated with BMD which in total explain ∼5% of the genetic variance for this trait16. However, these studies have been mainly unable to assess the role of low frequency (MAF 1–5%) and rare (MAF ≤ 1%) genetic variation, as these methods rely on testing common variants (MAF ≥ 5%). A recent sequencing‐based study identified a rare nonsense variant associated with BMD using 4,931 Icelandic subjects with low BMD and 69,034 population‐based controls9. This coding variant, which disrupts the function of LGR4, appears to be confined to the Icelandic population.

To investigate the role of rare and low‐frequency genetic variation on BMD in the general population of European descent, we first undertook whole‐genome sequencing in 2,882 subjects from two cohorts in the UK10K project and whole‐exome sequencing in 3,549 subjects from five cohorts (Supplementary Table 1) with BMD phenotypes. We then used a novel imputation reference panel generated by the UK10K and 1000 Genomes consortia to impute variants that were missing, or poorly captured, from previous GWAS studies in 26,534 subjects (Supplementary Table 1 and Extended Data Fig. 1a). The combined UK10K and 1000 Genomes reference panel, which contained 3,781 and 379 European individuals with whole‐genome sequences from UK10K and 1000 Genomes projects, respectively, enabled improved imputation, particularly of low‐frequency variants, when compared to the 1000 Genomes reference panel alone17. We then undertook de novo replication genotyping of lead variants in 13 cohorts for BMD, comprising 20,271 individuals of European descent.

We meta‐analyzed association results from all discovery cohorts (ntotal = 32,965, Supplementary Table 1) for BMD measured at the forearm, femoral neck and lumbar spine, the sites where osteoporotic fractures are most prevalent. We tested bi‐allelic single nucleotide variants (SNVs) with MAF ≥ 0.5% for association, declaring genome‐wide statistical significance at P ≤ 1.2 × 10−8 (accounting for all independent SNVs above this MAF threshold; Supplementary Methods)18. The sequence kernel association test (SKAT) was used to assess association of regions containing SNVs with MAF ≤ 5% and ≤1% (Supplementary Methods). All summary‐level meta‐analytic results are available for unrestricted download (http://www.gefos.org). Novel genome‐wide significant loci were then tested for their relationship with fracture in up to 508,253 individuals. Finally, functional genomics as well as cellular and animal models were used to investigate the relevance of these novel genetic associations to bone physiology.
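As a rough illustration of the significance threshold above, the following sketch assumes that P ≤ 1.2 × 10−8 reflects a Bonferroni‐style correction at a family‐wise α of 0.05. Both that assumption and the function names are illustrative and are not taken from the Supplementary Methods, which describe the actual derivation.

```python
def bonferroni_threshold(alpha, n_independent_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_independent_tests


def implied_independent_tests(alpha, threshold):
    """Number of independent tests implied by a given per-test threshold."""
    return alpha / threshold


# Assumption: family-wise alpha of 0.05 (not stated in the main text).
n_tests = implied_independent_tests(0.05, 1.2e-8)
print(f"Implied independent SNVs: {n_tests:,.0f}")
```

Under this assumption, the threshold would correspond to roughly four million independent SNVs with MAF ≥ 0.5%.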

Through meta‐analysis of sequenced and imputed single‐SNV association tests from the discovery cohorts (Supplementary Table 1), we identified a novel locus at 2q14.2 harbouring variants associated with lumbar spine BMD (lead low‐frequency SNV rs11692564(T), MAF = 1.7%, effect size = +0.24 s.d., P = 4 × 10−9, Fig. 1 and Table 1). The direction of effect was consistent across all discovery cohorts (Extended Data Fig. 2) and the mean imputation information score for the imputed cohorts was 0.71 (Supplementary Table 2). This variant is located 53 kilobase pairs (kb) downstream from engrailed homeobox‐1 (EN1), which, to our knowledge, has not previously been associated with any osteoporosis‐related traits in humans. The rs11692564 variant was not present on HapMap imputation panels, nor on genotyping chips, underlining the importance of developing more comprehensive imputation reference panels.

Figure 1: Association signals near engrailed homeobox-1 for lumbar spine BMD.

a, A topological domain includes associated variants and EN1, and chromatin interaction analysis with paired‐end tag sequencing (ChIA‐PET for CTCF in MCF‐7 cell line) suggests a smaller interacting region containing EN1, and three genome‐wide significant variants for lumbar spine BMD (in red). hES cell, human embryonic stem cell. b, Association signals at the EN1 locus (green line at P = 1.2 × 10−8) for lumbar spine BMD. Red circles and triangles represent results from discovery and combined discovery and replication using fixed‐effects meta‐analysis (see Supplementary Information), respectively. c, Allele frequency versus absolute effect size for lumbar spine BMD for previously identified variants (blue)8 and the three EN1 novel variants (red). The red line denotes the mean of previously reported effect sizes.


Table 1 Novel variants from single SNV association tests

To validate whole‐genome sequencing genotypes at rs11692564, we genotyped 1,853 whole‐genome sequenced subjects, and found all genotypes to be perfectly concordant (Supplementary Table 3). We validated imputation of rs11692564 in 3,601 imputed subjects through direct genotyping and observed that the association strengthened, and its statistical significance improved, as compared to imputed results (lumbar spine: imputed effect size = 0.22 s.d.; P = 0.05, genotyped effect size = 0.31 s.d.; P = 0.004) (Supplementary Table 4). We next sought additional evidence for the association at rs11692564 by performing additional de novo genotyping in 16,233 independent individuals and found a similarly large effect size in this population (effect size = +0.20 s.d.; P = 3 × 10−6). Meta‐analysis of the discovery and replication cohorts provided strong evidence for association (Pcombined‐meta = 2 × 10−14) (Table 1).
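The combination of discovery and replication evidence can be sketched with an inverse‐variance weighted fixed‐effects meta‐analysis. This is a simplified illustration only. It treats discovery and replication as two studies, and it backs out standard errors from the rounded effect sizes and P values quoted in the text (an assumption, because the published analysis combined cohort‐level summary statistics). It therefore approximates the reported combined P value but is not expected to reproduce it exactly. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def se_from_beta_p(beta, p_two_sided):
    """Back out a standard error from an effect size and a two-sided P value."""
    z = -NormalDist().inv_cdf(p_two_sided / 2.0)  # |z| for a two-sided test
    return abs(beta) / z


def fixed_effects_meta(estimates):
    """Inverse-variance weighted fixed-effects meta-analysis.

    estimates: list of (beta, se) tuples.
    Returns (combined beta, combined se, z, two-sided P value).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Rounded values quoted in the text for rs11692564 (lumbar spine BMD).
discovery = (0.24, se_from_beta_p(0.24, 4e-9))
replication = (0.20, se_from_beta_p(0.20, 3e-6))

beta, se, z, p = fixed_effects_meta([discovery, replication])
print(f"combined beta = {beta:.3f} s.d., se = {se:.3f}, P = {p:.1e}")
```

Because each study is weighted by the inverse of its variance, the combined estimate lies between the two inputs and its P value is smaller than either input P value when the effects agree in direction.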

We also identified an additional association signal at rs55983207 (MAF = 4%), located 17 kb downstream of rs11692564 (r2 = 0.001), which was associated with femoral neck BMD in the combined meta‐analysis (Pmeta = 7.2 × 10−15, Table 1). No haplotype containing both effect alleles was observed in the UK10K whole‐genome sequenced cohort (total number of haplotypes = 7,562).
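The low r2 between these two variants follows from the absence of any haplotype carrying both alleles. A minimal sketch of the standard pairwise LD calculation is given below. It assumes that the allele frequencies quoted in the text (1.7% and 4%) also apply to the UK10K sequenced sample, which is an approximation, and the function name is illustrative.

```python
def ld_r_squared(p_a, p_b, p_ab):
    """Pairwise LD r^2 from two allele frequencies and their joint haplotype frequency."""
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


# Assumed inputs: MAFs quoted in the text, and no haplotype carrying both alleles.
r2 = ld_r_squared(p_a=0.017, p_b=0.04, p_ab=0.0)
print(f"r^2 = {r2:.4f}")
```

With these inputs the result is on the order of 0.001, in line with the value reported for this pair of variants.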

In addition to rs11692564, we observed two further novel genome‐wide significant variants for lumbar spine BMD near EN1, rs6542457 (MAF = 5.8%) and rs188303909 (MAF = 1.6%), which are 391 kb downstream and 67 kb upstream from rs11692564, respectively (Fig. 1b and Table 1). Variant rs188303909 was in moderate linkage disequilibrium (LD) with rs11692564 (r2 = 0.47), and conditional analysis demonstrated that these two association signals were not independent (Supplementary Table 5). However, rs6542457 was in low LD with rs11692564 (r2 = 0.002), and remained independent in conditional analyses (Supplementary Table 5). Overall, the EN1 locus harbours multiple non‐coding variants associated with lumbar spine BMD and a single variant associated with femoral neck BMD. All three genome‐wide significant variants for lumbar spine BMD (Table 1) co‐localize solely with EN1 in a sub‐region of high interaction frequency within a single topologically associated domain19 (Fig. 1a).

The mean effect size of previously reported genome‐wide significant single nucleotide polymorphisms (SNPs) (MAF ≥ 5%) from the largest GWAS meta‐analysis to date for lumbar spine and femoral neck BMD was 0.048 s.d. and the largest effect size was 0.1 s.d.8. Hence, the observed effect size at rs11692564 is fourfold larger than this mean and twice that of the largest previously reported effect (Fig. 1c)8. For all genome‐wide significant variants, we observed larger effect sizes across decreasing MAF bins (Fig. 2a).

Figure 2: Genome‐wide features of association signals.

a, Box plots of the effect sizes of genome‐wide significant SNVs (P < 1.2 × 10−8), pruned for LD (r2 < 0.2) by MAF bin for discovery cohorts. Grey bars represent the values of β that were not observed and that we lack statistical power to observe (at α ≤ 1.2 × 10−8 and power ≥ 0.8). P values per phenotype are from the non‐parametric trend test across MAF bins (see Supplementary Information). b, Proportion of single nucleotide variants (SNVs) passing a false discovery rate (FDR) q‐value of 0.05 across different annotation features in discovery cohorts (green) versus matched control variants (red). The three panels on the right‐hand side show enrichment across a range of evolutionary constraint scores (GERP++ score), in which green denotes SNVs above the threshold and red denotes variants below the threshold. Bars represent standard error (for Methods refer to the Supplementary Information). FA, forearm; FN, femoral neck; LS, lumbar spine.


An increase in BMD is associated with a decrease in risk of bone fracture. We therefore tested the association of rs11692564(T) (the low‐frequency allele at EN1 associated with the largest increase in BMD) in 18 cohorts comprising 508,253 individuals (98,742 cases and 409,511 controls, Supplementary Table 6). rs11692564(T) was strongly associated with a decreased risk of fracture (odds ratio (OR) = 0.85 (95% confidence interval (CI): 0.80–0.89); P = 2.0 × 10−11; I2 = 0.00) (Table 2 and Supplementary Table 7). Table 2 also shows clear associations between other variants near EN1 and risk of fracture. The fracture association at rs11692564 was 2.9‐fold larger than the mean of fracture associations detected in the largest GWAS to date, and 2.0‐fold larger than the largest previously identified fracture association8.

Table 2 Fracture meta‐analysis of EN1 variants

EN1 is a homeobox gene central to mouse limb development20, which has been shown to be involved in Wnt signalling interaction with Dkk1 (ref. 21). Studies of calvarial bone development and fracture healing of long bones in mice have shown that perinatal En1−/− mutants display osteopenia and enhanced skull bone resorption22, whereas in normal adult mice En1 is upregulated in the bone callus post‐fracture22. Investigating the functional role of EN1, we detected En1 expression during osteoblastogenesis in developing and mature cultured murine calvarial osteoblasts, but not in marrow‐derived osteoclasts, or in human primary osteoclast cultures (Fig. 3a and Extended Data Fig. 3). To determine where En1 is active in adult bones, we analysed vertebrae from En1lacZ/+ knock‐in mice23 and detected LacZ expression in proliferative and hypertrophic chondrocytes, osteogenic cells in the periosteum and trabecular bone surface, and in osteocytes of cortical and trabecular bone (Fig. 3b and Extended Data Fig. 4).

Figure 3: Mouse En1 functional experiments.

a, Left, quantitative expression of En1 and its temporal pattern (RNA‐seq) in cultured calvarial murine osteoblasts (n = 3 per time point). Right, confirmation of the expression of En1 in a separate RT–PCR experiment of cultured calvarial murine osteoblasts and lack of expression in osteoclasts matured from bone-marrow-derived precursor cells (positive controls for osteoblasts (osteocalcin) and osteoclasts (RANK) are also shown). TPM, transcripts per million. b, Representative sections from lumbar vertebra 2 show the growth plate and bone marrow (GP and BM, left), cortical bone (CB, middle), and trabecular bone (TB, right) at ×40 magnification from En1lacZ/+ adult mice (n = 2) stained for β‐gal activity (LacZ blue, En1+ cells) and alkaline phosphatase (AP, red late chondrocytes and actively calcifying tissues). In the periosteum (PO), all the LacZ+ cells were AP+; some AP BM cells expressed LacZ. Some AP proliferative chondrocytes in the GP expressed LacZ+, whereas most AP+ hypertrophic chondrocytes expressed LacZ. Some AP osteocytes (Ocy) in CB and TB were LacZ+. c, Left, histomorphometry images of lumbar vertebra 5 show decreased trabecular bone volume and increased bone surface area occupied by osteoclast cells when comparing En1cre/flox (self‐deleted En1, sdEn1) mutants and En1flox/+ control mice. Right, reconstructed micro‐CT images show the mineral density in a control and an sdEn1 animal. d, Micro‐CT and histomorphometry measures within sdEn1 (n = 5) and controls (En1flox/+, n = 6). By micro‐CT, sdEn1 mutants exhibit decreased L5 trabecular number (Tb.N) and thickness (Tb.Th), as well as decreased bone volume fraction (BV/TV). Using histomorphometry, sdEn1 mutants exhibit increased osteoclastic area (TRAP/BS). BS, bone surface; TRAP, tartrate‐resistant acid phosphatase staining. Average for each measure denoted by the solid horizontal line. For each group, P value between control and sdEn1 is noted below label and was computed using a paired t‐test.


Using En1cre/+; R26lox‐STOP‐lox–EYFP reporter mice to genetically tag cells for which the En1 promoter was active at any point within a cell lineage, we confirmed that En1 expression was only observed in osteogenic lineages (Extended Data Fig. 4). As most En1−/− animals die soon after birth, we generated En1cre/flox self‐deleted En1 (sdEn1) conditional mutants24 (n = 5) and demonstrated by X‐ray micro‐computed tomography (micro‐CT) that mutants have lower trabecular bone volume fraction (BV/TV), trabecular number, and trabecular thickness in both the lumbar L5 vertebrae (Fig. 3c, d and Extended Data Fig. 5) and the femur (Extended Data Fig. 5) as compared to littermate controls (n = 6). A decrease in femoral cortical thickness was also observed (Extended Data Fig. 5). By histomorphometry (Fig. 3c), we observed that the sdEn1 mice had a statistically significantly higher proportion of osteogenic and osteoclastic cells compared to littermate controls (Fig. 3d and Supplementary Table 8). The driving force for the low bone mass would appear to be an increase in osteoclastic activity induced by En1 null osteogenic cells. This in turn initiates the expected coupled increase in mineralizing bone formation (Fig. 3b, d) mediated by an increased number of osteogenic cells and thus conforms to a high turnover osteoporosis‐like phenotype, although dynamic histomorphometry and evidence from bone turnover markers would be required to confirm an increased rate of bone formation (Extended Data Fig. 4). Genetic evidence from homologous regions in mice also supported a role for En1 in bone, as the homologous region contained a quantitative trait locus (QTL) peak for femur BMD (Supplementary Table 9)25. These findings, together with an earlier study focusing on En1 function in calvarial bone development22, implicate this gene as an important mediator in skeletal biology.

Together, these findings suggest that EN1 plays an important role in bone physiology and that low‐frequency non‐coding variants mapping near EN1 have large effects on BMD and risk of fracture in the general European population.

We also identified a novel SNV at 7q31.31 within the intron of CPED1 (rs148771817(T), MAF = 1.2%, effect size = +0.47 s.d., Pdiscovery = 9.31 × 10−9) associated with forearm BMD (Table 1, Supplementary Table 10 and Extended Data Fig. 6). We replicated the association at rs148771817 in 2,539 independent individuals and found a similar effect size (effect size = +0.41 s.d., P = 6 × 10−4), and combined meta‐analysis of the discovery and replication cohorts further improved the statistical evidence for association (+0.46 s.d., P = 1 × 10−11) (Table 1). This variant had an effect size 2.2‐fold larger than the mean of previously reported effects for common variants associated with forearm BMD (Extended Data Fig. 6)26.

We previously identified rs7776725 to be associated with BMD at WNT16, a gene neighbouring CPED1 (Extended Data Fig. 6), and demonstrated that knockout of Wnt16 in mice confers a 50% decrease in bone strength (P = 7 × 10−13)26,27. We have recently shown that osteoblast‐derived Wnt16 represses osteoclastogenesis28. As a result, we undertook conditional analysis of rs148771817 upon rs7776725. The rs148771817 variant remained associated after conditioning, albeit with lower statistical significance (effect size = 0.35 s.d.; Pmeta = 1 × 10−7; Extended Data Fig. 6d). Similarly, conditional analysis of the common variant upon rs148771817 revealed little change in the effect size or the statistical significance (Supplementary Table 5). Although we acknowledge that both variants may be causal, our data do not permit us to distinguish whether one or both of these variants have distinct biologic effects.

While rs148771817 is intronic in CPED1, we found that DNA accessibility at this region, as measured by DNase I hypersensitivity data from ENCODE studies, was moderately correlated with DNA accessibility at the WNT16 promoter in 305 cell types (maximum r2 = 0.4, P = 2.2 × 10−15, Supplementary Table 11), whereas correlation to the promoter of CPED1 was lower (maximum r2 = 0.1, P = 0.06). Moreover, analysis of chromosome conformation capture Hi‐C interaction frequencies from human H1 embryonic stem cells shows elevated interaction frequency between rs148771817 and WNT16 (Extended Data Fig. 6), though we also observed stronger interactions between these loci and their immediate neighbouring regions.

We assessed whether association signals were enriched for deleterious coding SNVs or SNVs with increased evolutionary constraint (see Supplementary Methods). These two groups of SNVs were matched to control SNVs by MAF and distance to gene (Supplementary Methods and Supplementary Table 12), followed by LD pruning (r2 < 0.2). We observed enrichment of association signal across the spectrum of positive evolutionary constraint thresholds, which was comparable to deleterious coding variants (Fig. 2b).

In total, we have identified multiple variants associated with BMD, including 3 genome‐wide significant loci for forearm BMD, 14 for femoral neck and 19 for lumbar spine (Supplementary Tables 10, 13, 14, 15, and Extended Data Figs 7 and 8). A common variant near the SOX6 gene that was not on previous HapMap imputation panels was also identified (rs11024028, MAF = 20%) (Table 1), and was found to be an independent signal from a previously reported signal at this locus (rs7108738, r2 = 0.002)8. Consistent with recent experiments29,30, region‐based collapsing methods did not identify any convincing novel associations that were not already identified as genome‐wide significant through single SNV associations. This included collapsing variants below 1% and 5% MAF thresholds, including all variants, only variants with increased GERP++ scores or those from protein‐coding regions (Supplementary Table 16 and Extended Data Figs 9 and 10).

We have identified low‐frequency, non‐coding genetic variants of large effect that are present in the general population and associate with BMD and fracture. These variants have effect sizes up to fourfold larger than the mean effect described for common variants associated with BMD and approximately threefold larger than those for fracture. Our study illustrates that larger reference panels, covering relevant ethnicities, will facilitate the discovery of low frequency and rare variants. This was enabled here by a large imputation reference panel (UK10K and 1000 Genomes) which offered tenfold more European samples than the 1000 Genomes reference panel available at the time of analysis (phase I version 3). Although we did not identify coding low‐frequency or rare variants associated with BMD at a genome‐wide significant level, we did observe that deleterious coding variants were enriched for association as a group. This suggests the existence of as yet undiscovered coding variants influencing BMD. Importantly, we have also generated new functional evidence for a central role of the homeobox protein engrailed-1 gene in regulation of BMD and identified EN1 as a critical protein in bone biology. Our findings demonstrate the utility of whole‐genome sequencing‐based discovery and deep imputation to enable the identification of novel genetic associations. These discoveries provide an improved understanding of the pathophysiology of osteoporosis and suggest that more comprehensive sets of whole‐genome sequenced individuals, covering relevant ethnicities, will enable accurate imputation and thus facilitate discovery of low frequency and rare variants influencing complex traits and common disease.

Methods

More details for the Methods are in the Supplementary Information. All human studies were approved by their institutional ethics review committees, and all participants provided written informed consent.

Data reporting

No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment, except the teams undertaking micro‐CT and histomorphometry experiments were blinded to each other’s results.

Whole‐genome sequencing

ALSPAC and TwinsUK cohorts were sequenced at an average read depth of 6.7× through the UK10K program (http://www.UK10K.org) using the Illumina HiSeq platform, and aligned to the GRCh37 human reference using BWA31. SNVs were called using samtools/bcftools, and the calls were then recalled using VQSR and GATK.

Whole‐exome sequencing

The AOGC, FHS, RS‐I, ESP and ERF cohorts were whole‐exome sequenced as described in the Supplementary Information.

Whole‐genome genotyping

All remaining discovery cohorts were genome‐wide genotyped and imputed to the UK10K/1000 Genomes reference panel, as described in the Supplementary Information.

Association testing for BMD

Single variants with a MAF > 0.5% were tested for an additive effect on lumbar spine, femoral neck and forearm BMD. BMD was adjusted for sex, age, age2 and weight, and standardized to have a mean of zero and a standard deviation of one. Meta‐analysis of cohort‐level summary statistics was undertaken using GWAMA32. Conditional analyses for significant SNVs were performed using GCTA33. Region‐based collapsing tests were performed using skatMeta34, an implementation of the SKAT method35 that enables the meta‐analysis of multiple cohorts. For each cohort, variants with MAF ≤ 5% or ≤1% were collected and meta‐analysis using skatMeta was conducted for windows of 30 SNVs within each region, overlapping by 10 SNVs.
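The windowing scheme for the region‐based tests (30 SNVs per window, overlapping by 10) can be sketched as follows. This is an illustrative helper rather than the skatMeta implementation. In particular, truncating the final window when fewer than 30 SNVs remain is an assumption, and the function name is hypothetical.

```python
def sliding_windows(n_snvs, size=30, overlap=10):
    """Return (start, end) index pairs, end exclusive, for overlapping SNV windows."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # consecutive windows start 20 SNVs apart by default
    windows = []
    start = 0
    while start < n_snvs:
        end = min(start + size, n_snvs)
        windows.append((start, end))
        if end == n_snvs:  # last window reached; it may be shorter than size
            break
        start += step
    return windows


# A region with 70 qualifying SNVs yields three windows of 30 SNVs each.
print(sliding_windows(70))
```

Each window would then be passed to the region‐based test as a set of variant indices, so that a signal spanning a window boundary is still captured by the neighbouring window.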

Replication genotyping

Lead SNVs were selected for replication genotyping, which was performed at LGC Genomics, Erasmus MC and deCODE Genetics using KASP genotyping. Association testing for replication genotyping was undertaken using the same additive model, using the same covariates for BMD, as above.

Fracture association testing

Fractures were defined as those occurring at any site, except fingers, toes and skull, after age 18. Both incident and prevalent fractures were included and were verified by either radiographic, casting, physician, or subject reporting. Fractures resulting from any type of trauma were considered. Covariates included in the additive model were age, age2, sex, height, weight, oestrogen/menopause status (when available), ancestral genetic background and cohort‐specific covariates (such as clinical centre). Association testing was done in two phases. The first involved all 1,482 genome‐wide significant SNVs for BMD. In the second phase of fracture association testing, variants at EN1 were assessed in 18 cohorts, comprising 98,467 cases and 409,736 controls. Meta‐analysis of cohort‐level summary statistics was performed using GWAMA32.

Functional genomics

We tested whether variants with increasing GERP++ scores36 were more strongly associated with BMD than SNVs matched for distance to gene and MAF, after LD pruning using PLINK37 at an r2 of <0.2, using windows of 100 kb and a step of 20 kb. Coding variants were classified as deleterious using Variant Effect Predictor38 and LD pruned (r2 < 0.2). The proportion of variants passing an FDR q‐value of ≤0.05 was reported.
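The proportion of variants passing an FDR q‐value threshold can be illustrated with the Benjamini–Hochberg procedure. The choice of Benjamini–Hochberg is an assumption made for this sketch, since the text specifies only an FDR q‐value of ≤0.05 and the Supplementary Information gives the actual procedure. The function names and the example P values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def proportion_passing(pvalues, threshold=0.05):
    """Fraction of variants whose q-value is at or below the threshold."""
    if not pvalues:
        return 0.0
    return sum(q <= threshold for q in bh_qvalues(pvalues)) / len(pvalues)


# Hypothetical association P values for a small set of annotated variants.
example = [0.001, 0.01, 0.03, 0.5]
print(bh_qvalues(example), proportion_passing(example))
```

Computing this proportion separately for annotated variants and for their matched controls gives the pair of values compared in the enrichment analysis.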

En1 murine expression experiments

Pre‐osteoblast‐like cells from calvaria of C57BL/6J mice were differentiated to osteoblasts, and expression levels of each gene were quantified using RNA‐seq. The temporal expression of En1 in cell culture experiments of these osteoblasts and bone-marrow-derived osteoclasts (isolated from long bones of six‐week‐old mice) was measured by PCR, with Bglap (osteocalcin) and Tnfrsf11a (RANK) serving as controls. Total mRNA for En1 in osteoblasts was quantified using real‐time PCR.

Micro‐CT and histomorphometry

Mouse husbandry and all experiments were performed in accordance with Memorial Sloan‐Kettering Cancer Center Institutional Animal Care and Use Committee‐approved protocols. Bone characteristics of self‐deleted conditional En1 (sdEn1) mutants were compared to En1+/flox littermates using micro‐CT. The same animals were assessed for histomorphometry (and laboratories performing micro‐CT and histomorphometry were blinded to each other’s results). After tissue sectioning, samples were stained for calcification (calcein blue), tartrate‐resistant acid phosphatase (TRAP) to assess for osteoclasts and alkaline phosphatase to assess for osteoblasts.

Murine histology

Two‐month‐old En1lacZ/+ mice39 were sectioned at bone sites and stained for X‐gal and/or alkaline phosphatase and imaged at ×400.