Introduction

Nonsyndromic orofacial clefts (OFCs) are among the most common human birth defects, occurring in 1 in 700 live births worldwide (Leslie and Marazita 2013). Nonsyndromic OFCs occur in the absence of other major cognitive or structural abnormalities and have a complex etiology reflecting the combined actions of multiple genetic and environmental risk factors. The focus of much of the OFC genetics research has been on the most common forms: cleft lip with or without cleft palate (CL/P) and cleft palate alone (CP) (Dixon et al. 2011; Leslie and Marazita 2013). Multiple successful genome-wide linkage and association studies have contributed to the substantial progress in identifying potentially causal genes for OFCs over the past 10 years. To date, there have been eight CL/P GWASs (Beaty et al. 2010; Birnbaum et al. 2009; Camargo et al. 2012; Grant et al. 2009; Leslie et al. 2016a; Mangold et al. 2010; Sun et al. 2015; Wolf et al. 2015), a genome-wide meta-analysis of two CL/P GWASs (Ludwig et al. 2012), and two GWASs of CP (Beaty et al. 2011; Leslie et al. 2016b).

Collectively, these studies have demonstrated that OFCs exhibit significant genetic heterogeneity. For CL/P, at least 20 different genetic loci have been identified with compelling statistical and biological support. In contrast, only two GWASs for CP have been published with mixed results. The first study, despite interrogating 400 CP case-parent trios, did not identify any statistically significant SNP main effects (Beaty et al. 2011). The second study identified a single locus associated with CP, but this association signal was limited to European populations because of very low frequencies of the risk allele in other populations (Leslie et al. 2016b; Mangold et al. 2016). For both CL/P and CP, the identified risk loci only account for a modest portion of the genetic variance of OFCs, suggesting that additional genetic risk factors may be involved. CL/P and CP have historically been considered distinct disorders due to the different developmental origins of the lip and palate (Jiang et al. 2006), different prevalence rates among males and females (Mossey and Little 2002), and different proportions of syndromic cases (50% CP vs. 30% for CL/P) (Leslie and Marazita 2013). In the current study, we sought to identify additional genetic risk variants for OFCs, considering the historical groupings of CL/P and CP, but also exploring the possibility of shared etiology. Therefore, we conducted genome-wide meta-analyses for CL/P, CP, and all OFCs, drawing from the two largest CL/P studies published to date and the two published CP studies.

Methods

Contributing GWAS studies

Two consortia contributed to this study (Table 1). The first, hereafter called GENEVA OFC, used a family-based design and included 1604 case-parent trios with CL/P and 475 case-parent trios with CP from populations in Europe (Denmark and Norway), the United States, and Asia (Singapore, Taiwan, Philippines, Korea, and China). The specifics of this study were previously described in Beaty et al. (2010, 2011). Briefly, samples were genotyped for 589,945 SNPs on the Illumina Human610-Quadv.1_B BeadChip, genetic data were phased using SHAPEIT, and imputation was performed with IMPUTE2 software to the 1000 Genomes Phase 1 release (June 2011) reference panel. Genotype probabilities were converted to most-likely genotype calls with the GTOOL software (http://www.well.ox.ac.uk/~cfreeman/software/gwas/gtool.html), using a genotype probability threshold of 0.9.
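As an illustration only, the conversion of imputed genotype probabilities to most-likely genotype calls can be sketched in Python. This is not the GTOOL implementation; the function name `most_likely_genotype` and the choice to treat a probability exactly equal to the threshold as callable are assumptions made for this sketch.

```python
from typing import Optional, Sequence


def most_likely_genotype(probs: Sequence[float],
                         threshold: float = 0.9) -> Optional[int]:
    """Return the most probable genotype (0, 1, or 2 copies of the
    second allele), or None when no genotype reaches the threshold.

    Assumption: a probability equal to the threshold is accepted; the
    exact boundary rule of the original software is not stated here.
    """
    if len(probs) != 3:
        raise ValueError("expected three genotype probabilities")
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


# Hypothetical probability triples for three imputed genotypes.
example_probs = [(0.97, 0.02, 0.01), (0.05, 0.93, 0.02), (0.50, 0.45, 0.05)]
calls = [most_likely_genotype(p) for p in example_probs]
```

In this sketch the third genotype is set to missing because its highest probability (0.50) falls below the 0.9 threshold.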

Table 1 Counts of cases, controls, and trios included in the study

The second consortium included samples contributing to the Pittsburgh Orofacial Cleft (POFC) study, comprising 823 cases and 1319 case-parent trios with CL/P, 78 cases and 165 case-parent trios with CP, plus 1700 unaffected controls. Participants were recruited from 13 countries in North America (United States), Central or South America (Guatemala, Argentina, Colombia, Puerto Rico), Asia (China, Philippines), Europe (Denmark, Turkey, Spain), and Africa (Ethiopia, Nigeria). Additional details on recruitment, genotyping, and quality controls are described in Leslie et al. (2016a, b). Briefly, samples were genotyped for 539,473 SNPs on the Illumina HumanCore + Exome array. Data were phased with SHAPEIT2 and imputed using IMPUTE2 to the 1000 Genomes Phase 3 release (September 2014) reference panel and converted to most-likely genotypes for statistical analysis.

A total of 412 individuals were in both the GENEVA OFC and POFC studies, so we excluded these participants from the GENEVA OFC study for this analysis. Informed consent was obtained for all participants, and all sites had both local IRB approval and approval at the University of Pittsburgh, the University of Iowa, or Johns Hopkins University.

SNP selection

Quality control procedures were completed in each contributing study and have been described extensively in the original publications (Leslie et al. 2016a, b; Beaty et al. 2010, 2011). In the POFC study, SNPs with minor allele frequencies (MAF) less than 1% or those deviating from Hardy–Weinberg Equilibrium (HWE p < 0.0001) in genetically defined, unrelated European controls were excluded. Similarly, in the GENEVA OFC study, SNPs with MAF <1% or those deviating from HWE were excluded. To account for different marker sets and identifiers between the two imputed datasets, the final analysis included only those overlapping SNPs that were matched on chromosome, nucleotide position, and alleles. A total of 6,090,031 SNPs were included in the meta-analysis.
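The filtering and matching steps can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the `Snp` record, the function names, and the example markers are hypothetical, the same HWE cutoff is assumed for both datasets, and the exact-match rule below does not handle strand flips or swapped allele order.

```python
from typing import List, NamedTuple, Tuple


class Snp(NamedTuple):
    snp_id: str   # study-specific identifier (may differ between studies)
    chrom: str
    pos: int
    ref: str
    alt: str
    maf: float    # minor allele frequency
    hwe_p: float  # Hardy-Weinberg equilibrium test p-value


def passes_qc(s: Snp, min_maf: float = 0.01, min_hwe_p: float = 1e-4) -> bool:
    """Keep SNPs with MAF >= 1% and HWE p >= 0.0001 (assumed cutoffs)."""
    return s.maf >= min_maf and s.hwe_p >= min_hwe_p


def overlapping_snps(a: List[Snp], b: List[Snp]) -> List[Tuple[Snp, Snp]]:
    """Pair QC-passing SNPs that agree on chromosome, position, and alleles."""
    def key(s: Snp) -> Tuple[str, int, str, str]:
        return (s.chrom, s.pos, s.ref, s.alt)

    index_b = {key(s): s for s in b if passes_qc(s)}
    return [(s, index_b[key(s)]) for s in a
            if passes_qc(s) and key(s) in index_b]


# Hypothetical markers: only the first pair survives QC and matching.
study_a = [Snp("rs1", "1", 100, "A", "G", 0.20, 0.5),
           Snp("rs2", "1", 200, "C", "T", 0.005, 0.5),  # fails MAF
           Snp("rs3", "2", 300, "G", "A", 0.30, 0.5)]
study_b = [Snp("x1", "1", 100, "A", "G", 0.22, 0.4),
           Snp("x2", "1", 200, "C", "T", 0.20, 0.4),
           Snp("x3", "2", 300, "G", "C", 0.30, 0.4)]    # alleles differ
matched = overlapping_snps(study_a, study_b)
```

Matching on position and alleles rather than on identifiers sidesteps the differing marker names between the two imputed datasets.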

Statistical analysis

We identified three analysis groups from the contributing studies: a case–control subgroup from POFC, an unrelated case-parent trio group from POFC, and an unrelated case-parent trio group from GENEVA OFC. In the case–control subgroup, logistic regression was used to test for association under the additive genetic model while including 18 principal components of ancestry (generated via principal component analysis [PCA] of 67,000 SNPs in low linkage disequilibrium across all ancestry groups) to adjust for population structure (Leslie et al. 2016a). The two case-parent trio subgroups from POFC and GENEVA OFC were analyzed separately using the transmission disequilibrium test (TDT). The resulting effect estimates for the three analysis groups were combined in an inverse variance-weighted fixed-effects meta-analysis. The combined estimate, a weighted log odds ratio, follows a chi-squared distribution with two degrees of freedom under the null hypothesis of no association. Heterogeneity of effects was examined using confidence intervals of the effect estimates. GWAS was performed for all cleft types combined and for the CL/P and CP groups separately.
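To make the combination step concrete, the sketch below shows one common way to derive a log odds ratio from TDT transmission counts and to pool per-group estimates with inverse variance weights. It is an illustration under stated assumptions, not the authors' code: the function names and all counts are hypothetical, the TDT odds ratio is taken as the ratio of transmitted to untransmitted alleles from heterozygous parents, and the pooled estimate is tested with a standard one-degree-of-freedom Wald z-test, which may differ from the test statistic described above.

```python
import math
from typing import List, Tuple


def tdt_chi2(transmitted: int, untransmitted: int) -> float:
    """McNemar-type TDT statistic from heterozygous-parent transmissions."""
    b, c = transmitted, untransmitted
    return (b - c) ** 2 / (b + c)


def tdt_log_or(transmitted: int, untransmitted: int) -> Tuple[float, float]:
    """Log odds ratio and its standard error from TDT counts."""
    b, c = transmitted, untransmitted
    if b <= 0 or c <= 0:
        raise ValueError("both counts must be positive")
    return math.log(b / c), math.sqrt(1.0 / b + 1.0 / c)


def ivw_fixed_effects(
        estimates: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Pool (log OR, SE) pairs; return pooled beta, SE, z, two-sided p."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p


# Hypothetical inputs: one case-control estimate and two trio groups.
case_control = (0.25, 0.10)       # log OR and SE from logistic regression
trios_one = tdt_log_or(60, 40)    # hypothetical transmission counts
trios_two = tdt_log_or(75, 55)
pooled_beta, pooled_se, pooled_z, pooled_p = ivw_fixed_effects(
    [case_control, trios_one, trios_two])
```

Because each group is weighted by the inverse of its variance, the group with the smallest standard error contributes most to the pooled log odds ratio.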

Subpopulation analyses

Because the contributing studies contained individuals from diverse populations, we also performed stratified analyses of Asian and European ancestry groups defined by PCA (Table 1). We only considered these subpopulations because they were the only ancestry groups represented in both POFC and GENEVA OFC. In these analyses, 5 and 3 principal components of ancestry were included in European and Asian case–control analyses, respectively (Leslie et al. 2016a).

Bioinformatic analysis of top hits

We performed functional annotation enrichment analysis on genes using ToppFun from the ToppGene Suite (Chen et al. 2009) and significance was assessed using Bonferroni adjusted p-values. Enrichment of SNPs in regulatory regions was performed using FORGE v1.2 (Dunham et al. 2014). Individual SNPs were annotated for potential regulatory function using HaploReg v4.1 (Ward and Kellis 2012, 2016).
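For readers unfamiliar with the correction, a Bonferroni adjustment multiplies each p-value by the number of tests and caps the result at 1. The sketch below is generic and does not reproduce how ToppFun counts the number of tests; the function name and input values are hypothetical.

```python
from typing import List


def bonferroni_adjust(p_values: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three annotation categories.
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
```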

Results

CL/P

In the CL/P meta-analysis of 823 cases, 1700 controls, and 2811 trios, 1248 SNPs from 13 loci reached genome-wide significance (Table 2; Fig. 1a). Of these, 11 loci have been reported previously, including 8q24, 1q32 (IRF6), and 17p13 (NTN1). The 15q24 (ARID3B) locus, which reached genome-wide significance, was reported as a suggestive signal in the POFC study (Leslie et al. 2016a). We detected a novel association on 3q28 (lead SNP rs76479869, p = 1.16 × 10−8) within the third intron of TP63 (Fig. 2). Six additional loci approached genome-wide significance with p-values less than 5 × 10−7. Three of these loci were suggestive in the contributing studies: 3q12.1 (COL8A1) in the GENEVA OFC study and 5q13.1 (PIK3R1) and 17q21.32 (GOSR2/WNT9B) in the POFC study. The remaining three loci have not been associated with CL/P previously and include 4q21.1 (SHROOM3), 12q13.13 (KRT18), and 8p12 (NRG1) (supplemental Figs. 1–3).

Table 2 Top hits from meta-analysis of CL/P, CP, or all OFCs in all populations
Fig. 1

Manhattan plots for genome-wide meta-analyses: a cleft lip with or without cleft palate (CL/P), b cleft palate (CP), c all orofacial clefts (CL/P plus CP). The red line denotes a Bonferroni-corrected genome-wide significant p value (p < 5 × 10−8). Peaks are labeled with the candidate gene or closest gene in the region; colored labels indicate the locus was identified in a previous study, black labels indicate new loci

Fig. 2

SNPs in a TP63 enhancer are associated with CL/P. a LocusZoom plot of CL/P meta-analysis results. Points are color-coded based on linkage disequilibrium (r²) in Europeans. b Annotations for the depicted regions: chromatin state segmentation from ENCODE data in selected cell types, p63 ChIP-Seq and binding motifs from McDade et al. (2012), H3K27Ac and H3K4Me1 from ENCODE. c Results from HaploReg analysis of SNPs in high linkage disequilibrium with lead SNP rs76479869. Filled boxes indicate the presence of the annotation

We also stratified our analyses by ancestral group to determine if there were stronger signals in these subgroups. Although we did not detect any new signals in these subgroup analyses, we did identify multiple genome-wide significant signals in each subgroup. In Europeans, we detected five genome-wide significant signals: 1p36 (PAX7), 8q21.3, 8q24, 17q23.2 (TANC2), and 17p13 (NTN1) (supplemental Fig. 4A; supplemental Table 1). In the subset with Asian ancestry, we detected three group-specific signals: 1p22 (ARHGAP29), 1q32 (IRF6), and 17p13 (NTN1) (supplemental Fig. 4B; supplemental Table 2). In addition, SNPs on 10q25 (VAX1) and 20q12 (MAFB) approached genome-wide significance. Overall, the stratified CL/P results are in agreement with previous findings (Beaty et al. 2010, 2013; Leslie et al. 2015; Ludwig et al. 2012).

CP

The meta-analysis of CP included a total of 78 cases, 1700 controls, and 616 trios. We observed a single genome-wide significant hit previously identified on 1p36 in GRHL3 (Fig. 1b). The only other hit with a p value less than 1 × 10−5 was on 5p13.2 within UGT3A2 (lead SNP rs604328, p = 5.85 × 10−6; supplemental Fig. 5). In the European subgroup, we identified two suggestive signals (see supplemental Table 1): GRHL3, which we previously identified in Europeans (Leslie et al. 2016b), and a new hit on 11q22.2 (lead SNP rs2260433, p = 8.70 × 10−6; supplemental Fig. 6A). The Asian subgroup was limited to trios from the POFC and GENEVA OFC studies, so we performed a meta-analysis of just these two groups (272 total trios) using a one degree of freedom test. Although there were no genome-wide significant hits, three loci achieved p values <5 × 10−5, and these were driven by the GENEVA OFC trios (supplemental Fig. 6B). Specifically, markers on 8q21.3, 8q24.3 (n.b. this is not the 8q24 peak in Europeans at 8q24.21), and 16p12.1 yielded suggestive evidence (supplemental Table 2; Figs. 7–9).

All OFCs

Historically, CL/P and CP have been analyzed separately. Nevertheless, we hypothesize that there may be genetic risk variants common to both sub-types, and, therefore, analysis of all OFCs (i.e., CL/P plus CP) may yield greater statistical power to identify such shared variants. Therefore, we used the above analytical approach to determine if combined analysis would identify new loci conferring risk to all OFCs. We identified 11 genome-wide significant loci (Fig. 1c). Not surprisingly, all but one of these were genome-wide significant in the CL/P group, which was the largest contributing sample for this analysis. The remaining genome-wide significant signal was on 9q22, immediately downstream of FOXE1 (lead SNP rs12347191, p = 1.33 × 10−9; Fig. 3). This locus was not genome-wide significant in either the CL/P (p = 7.75 × 10−7) or CP analyses (p = 5.42 × 10−4) alone, nor was it significant in either of the contributing studies.

Fig. 3

FOXE1 is associated with all OFCs. a LocusZoom plot for OFC meta-analysis results. Points are color-coded based on linkage disequilibrium (r²) in Europeans. Labeled SNPs are denoted by diamonds: lead SNP rs12347191 and rs7850258, a functional SNP from Lidral et al. (2015). b Craniofacial enhancers in the region. In blue, enhancers tested in zebrafish in the Lidral et al. (2015) study. In purple, enhancers from the VISTA enhancer browser tested in mouse. c Results from HaploReg analysis of SNPs located in craniofacial enhancers. Filled boxes indicate the presence of an annotation

In the all OFCs ancestry-specific subgroups, several previously reported associations were recapitulated. For example, in the European subgroup, the 17p13 (NTN1) and 8q24 loci reached genome-wide significance with 1p36 (PAX7) and 8q21.3 approaching genome-wide significance (supplemental Table 1; Fig. 10A). In contrast, 1p22 (ARHGAP29), 1q32 (IRF6), 10q25 (VAX1), 17q22 (NOG), and 20q12 (MAFB) showed genome-wide significant or suggestive p-values among Asians (supplemental Table 2; Fig. 10B). The strengths of these signals in European and Asian populations are consistent with the results of previous studies. Novel signals with suggestive evidence of association were 6p24.3 (lead SNP rs1333657, p = 8.34 × 10−6; supplemental Fig. 11) in Europeans, and 9q33.3 (lead SNP rs78427461, p = 8.42 × 10−6; supplemental Fig. 12) in Asians.

Bioinformatics analysis of top hits

To further explore the biological relevance of the associated loci, we subjected all genes in proximity to the top SNPs (±100 kb, N = 72 genes) to functional annotation enrichment analysis with ToppFun from the ToppGene Suite (Chen et al. 2009). These loci were enriched for genes expressed in the olfactory pit, olfactory placode, and non-floor plate epithelium identified by RNA-Seq in embryonic mouse tissues (E8.5–E10.5). Of the 17 genes with this expression pattern, several are known OFC risk genes based on recognized Mendelian syndromes in humans and/or mouse models with craniofacial anomalies (e.g., IRF6, TP63, VAX1, and PAX7). However, this analysis can help prioritize candidate genes from associated loci otherwise lacking functional supporting evidence: NRG1, ZFHX4, KRT8, SHTN1, and FILIP1L.
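The ±100 kb gene selection can be illustrated with a simple interval-overlap check. This is a sketch only: the `Gene` record, the function name, and all coordinates are hypothetical, and the rule that any overlap with the window counts as "in proximity" is an assumption rather than the documented criterion.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # gene start coordinate (bp)
    end: int    # gene end coordinate (bp)


def genes_near_snp(chrom: str, pos: int, genes: List[Gene],
                   window: int = 100_000) -> List[str]:
    """Names of genes overlapping [pos - window, pos + window]."""
    lo, hi = pos - window, pos + window
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical gene models; coordinates are made up for illustration.
gene_models = [Gene("GENE_A", "1", 1_000_000, 1_050_000),
               Gene("GENE_B", "1", 1_200_000, 1_250_000),
               Gene("GENE_C", "2", 1_000_000, 1_050_000)]
nearby = genes_near_snp("1", 1_120_000, gene_models)
```

Here a SNP at position 1,120,000 on chromosome 1 picks up both flanking genes, while the gene on chromosome 2 is ignored.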

Recognizing that OFC GWAS signals generally occur in non-coding parts of the genome, we also performed a FORGE analysis to look for tissue-specific signals using the Roadmap Epigenomics project data (Dunham et al. 2014). We did not detect any enrichment of signals, which likely reflects the multiple tissue types involved in craniofacial development, and the relative inaccessibility of the key tissue types. However, inspection of individual regions revealed several intriguing findings.

The 3q28 association signal resides within the largest intron of TP63, specifically within an expanse of H3K27Ac and H3K4Me1 histone modifications in NHEK cells (normal human epidermal keratinocytes) (Fig. 2a, b). Recently, Antonini et al. (2015) showed that the orthologous region in mouse acts as a cis-regulatory element that recapitulates p63 expression and is positively regulated by p63 (Fig. 2b). We annotated all variants in strong linkage disequilibrium with rs76479869 for regulatory functions using HaploReg v4.1 (Fig. 2c). Among the altered transcription factor motifs were multiple annotations for the Fox family of transcription factors, Pou transcription factors, and CEBPB. Most notable among these is SNP rs55660938, as the risk allele creates a binding site for CEBPB, a protein previously demonstrated to negatively regulate this enhancer (Antonini et al. 2015).

The 9q22 locus near FOXE1 was previously interrogated for craniofacial regulatory elements in zebrafish (Lidral et al. 2015) and mouse (Attanasio et al. 2013) (Fig. 3b). Lidral et al. identified three regulatory elements at −82.4, −67.7, and +22.6 kb from the FOXE1 transcription start site that largely recapitulated endogenous FOXE1 expression in the oral epithelium, heart, and thyroid (Lidral et al. 2015). Independently, three additional elements downstream of FOXE1 showed activity in forebrain and facial mesenchyme of mouse embryos. Because these regulatory elements were already identified, we selected SNPs located within them for bioinformatics analysis (Fig. 3c). Among the annotations, the risk allele at rs925487 destroys a PLAG1 binding site while the risk allele at rs10119853 creates IRF binding sites.

Discussion

We performed meta-analyses of two large GWAS to identify novel loci associated with risk to OFC. We identified new genome-wide significant loci for CL/P (3q28, TP63) and all OFCs (9q22, FOXE1), and recapitulated prior results for multiple loci. Overall, our results are in agreement with previous findings (Beaty et al. 2010; Leslie et al. 2016a, b; Ludwig et al. 2016). Our stratified analyses in Europeans and Asians are consistent with our previous observations that stronger signals are found within the subpopulations with the most statistical power, reflecting the minor allele frequency and information content of SNPs in different populations (Murray et al. 2012).

TP63 is an essential regulator of epidermal morphogenesis. Dominant mutations clustered in TP63 cause six syndromes with overlapping phenotypic features: ectrodactyly-ectodermal dysplasia clefting syndrome (Celli et al. 1999), Hay–Wells syndrome (McGrath et al. 2001), Rapp–Hodgkin syndrome (Kantaputra et al. 2003), split-hand/foot malformation (Ianakiev et al. 2000), limb-mammary syndrome (van Bokhoven and Brunner 2002), and ADULT syndrome (Amiel et al. 2001). Individuals are variably affected with ectodermal dysplasia, orofacial clefting, and split-hand/foot malformation, among other features. The phenotypic spectrum may also include nonsyndromic clefts, as a de novo mutation was previously reported in an individual with apparently nonsyndromic cleft lip and palate (Leoyklang et al. 2007). Deletion of p63 in mouse results in a similar constellation of defects (Yang et al. 1999).

The complex regulation of TP63 expression occurs in a tissue- and layer-specific manner and depends on two conserved modules within its intronic enhancer (Antonini et al. 2015). These modules positively regulate TP63 when bound by p63 protein but are negatively regulated by Cebpa, Cebpb, and Pou3f1 transcription factors. Although there are no polymorphisms in the described p63 binding sites, our bioinformatics analysis showed that the risk allele for rs55660938 creates a Cebpb binding site, suggesting that misregulation of p63 expression via this enhancer element contributes to development of orofacial clefts.

We identified an association between SNPs on 9q22 and OFCs by combining the CL/P and CP subtypes. This result follows several previous studies on this locus beginning with a genome-wide linkage scan with fine-mapping in CL/P that originally implicated FOXE1 (Marazita et al. 2009), followed by additional fine-mapping in CL/P and CP that narrowed the critical region further to a region near the FOXE1 gene (Moreno et al. 2009). More recently, an independent replication found that two of the top SNPs from previous studies were associated with CL/P, but were more strongly associated when CL/P and CP were analyzed together (Ludwig et al. 2014). Recessive FOXE1 mutations cause Bamforth–Lazarus syndrome, a rare Mendelian disorder characterized by cleft palate and congenital hypothyroidism (Bamforth et al. 1989). Similarly, mice lacking Foxe1 have cleft palate and thyroid dysgenesis (De Felice et al. 1998). Despite the clear connection between OFCs and the FOXE1 gene, this locus has not been identified in any previous GWAS. Ludwig et al. (2014) speculated that this may be because the top SNPs from Moreno et al. (2009) are not well-represented on commercial SNP panels and previous candidate studies included only small numbers of SNPs. Therefore, our success may be due to dense genotyping in the region through custom SNP content and imputation of untyped SNPs. In addition, previous studies have not performed GWAS of all OFCs together. There is a growing emphasis on identifying subtype-specific association signals (e.g., CL or CLP), and we and others have contributed to that endeavor (Jia et al. 2015; Ludwig et al. 2012, 2016; Rahimov et al. 2008). However, this study also demonstrates that some signals reflect a shared etiology among the various cleft subtypes that will only be identified when all OFCs are considered together.

There are no common missense polymorphisms in FOXE1 and rare variants identified by sequencing do not account for the association signal, leading to a hypothesis that the functional variants are regulatory. Recently, multiple craniofacial enhancers were identified in a zebrafish screen of multi-species conserved elements or by ChIP-Seq of p300 in mouse craniofacial tissue. In the zebrafish study by Lidral et al. (2015), differential activity was observed for the −67.7 kb element with alleles at rs7850258. The OFC risk allele creates MYC and ARNT binding sites that increase activity of the enhancer (Lidral et al. 2015). In our study, rs7850258 was not among the top SNPs (p = 1.45 × 10−5) and is not in strong linkage disequilibrium with our lead SNP, rs12347191 (r² = 0.33, D′ = 0.65 in Europeans; see Fig. 3). We were unable to perform conditional analyses because of the large number of trios contributing to this study, so it remains possible SNP rs7850258 is an independent association signal, which would be consistent with the risk haplotypes described in Moreno et al. (2009). Our bioinformatic analysis of the other craniofacial enhancers identified several motifs altered by OFC risk alleles that are candidates for the molecular validation needed to identify specific functional variants regulating FOXE1. A major contribution of this study is the dense genotyping of this region that could allow comprehensive interrogation of risk alleles for OFCs.

A number of loci approached genome-wide significance in the full CL/P GWAS. Most of these were suggestive in one of the contributing studies, but some were not observed previously and included biologically relevant genes: SHROOM3 (4q21.1), keratins (12q13.13), and NRG1 (8p12). SHROOM3 is an actin-binding protein required for neurulation. The 12q13.13 locus contains a cluster of type II keratins, which heteropolymerize to form intermediate filaments in epithelial cells. The GWAS approach has now pointed to several genes involved in the cytoskeleton (Leslie et al. 2012, 2016a). Additional molecular evidence on other OFC-related genes further supports regulation of cytoskeletal dynamics in the pathogenesis of OFCs (Biggs et al. 2014; Caddy et al. 2010; De Groote et al. 2015; Leslie et al. 2012). Similarly, SHROOM3 and NRG1 join a list of genes, including GRHL3, IRF6, and TFAP2A, required for neurulation that are also implicated in OFCs (Copp and Greene 2013; Kousa et al. 2013; Wang et al. 2011).

In conclusion, we have performed a multi-ethnic genome-wide meta-analysis of CL/P, CP, and all OFCs combined which revealed two novel, biologically relevant genes, TP63 (for CL/P) and FOXE1 (for all OFCs). Previously reported associations were recapitulated, and several new suggestive loci were implicated. Overall, this study reinforces the notion that OFCs exhibit a high level of genetic heterogeneity and illustrates the utility of combining studies via meta-analysis to yield new discoveries. These findings contribute to our growing understanding of the genetic architecture of OFCs and may one day benefit recurrence prediction and prognosis.