Introduction

Asthma is a chronic inflammatory disorder of the airways that affects at least 334 million people around the world (Global Asthma Network 2014). In the USA, racial and ethnic disparities in asthma are among the most significant among chronic diseases (Oh et al. 2016). Specifically, African American children are twice as likely as European Americans to suffer from asthma (Akinbami et al. 2014; Burchard 2014) and to die from complications of the disease (Gorina 2012).

Asthma is a multi-factorial disease likely affected by genetic and environmental factors. The presence of a strong genetic influence on disease susceptibility is evident, with heritability estimates from twin studies ranging from 63 to 92 % (Fagnani et al. 2008; McGeachie et al. 2013; Nieminen et al. 1991). However, to date, genome-wide association studies (GWAS) of asthma have uncovered only a small number of loci with small to modest effects (Ober and Yao 2011). Together, these loci explain a small portion of asthma heritability (McGeachie et al. 2013). Despite the attention and resources allocated to genetic research in the past decade, globally diverse populations are severely understudied. To date, 96 % of all GWAS have been performed in populations of European ancestry; specifically, only 1.9 % of pulmonary studies have included members of racial/ethnic minority groups (Burchard et al. 2015; Bustamante et al. 2011). Notably, the few studies that have been performed in non-European populations have revealed the existence of ethnic-specific loci not identified in other populations (Torgerson et al. 2011).

The disparity in asthma prevalence is mirrored in obesity, in which African American children have higher prevalence of obesity than European Americans (Guerrero et al. 2016). Previous studies report overlapping risk factors between asthma and obesity. Although the co-occurrence of asthma and obesity is common, the exact causal relationship between the two phenotypes remains unclear (Thomsen et al. 2007). Moreover, among African Americans, the association of obesity with asthma is stronger than in other racial/ethnic groups (Joseph et al. 2016). Genetic associations with asthma may potentially be mediated by obesity status. Pleiotropic effects between obesity and asthma at the single nucleotide polymorphism (SNP) or gene level may partly explain the association between these two conditions (Gonzalez et al. 2014; Melen et al. 2010, 2013; Wang et al. 2015). The majority of asthma genetic studies have not explored whether obesity mediates the association between identified genetic variants and asthma status (Akhabir and Sandford 2011; Hoffjan et al. 2003). Identifying loci related to asthma and obesity in minority populations has the potential of revealing genetic mechanisms underlying asthma severity and the response to therapy (McGarry et al. 2015; Sheehan and Phipatanakul 2015).

To address these gaps in clinical and biomedical research, we performed a genome-wide association study of asthma in African American children. We then evaluated the impact of obesity as a mediator on significant associations between genetic variants and asthma. We also investigated whether genetic risk factors for asthma, previously identified in European and Asian populations, were generalizable to African Americans.

Material and methods

Study population

The Study of African Americans, Asthma, Genes, and Environments (SAGE II) is the largest ongoing gene-environment interaction study of asthma in African American children in the USA. SAGE II includes detailed clinical, social, and environmental data on asthma and asthma-related conditions with corresponding environmental exposure, demographic, and psychosocial information. The SAGE II study protocols have been described in detail elsewhere (Borrell et al. 2013; Nishimura et al. 2013; Thakur et al. 2013). Briefly, SAGE II was initiated in 2006 and recruited participants with and without asthma through a combination of clinic- and community-based recruitment centers in the San Francisco Bay Area. Institutional review boards of participating centers approved the study, and all participants or, for participants 17 or younger, their parents provided written informed consent. Participants 17 or younger also provided age-appropriate assent. Asthma cases were defined as individuals with a history of physician-diagnosed asthma, asthma controller or rescue medication use within the last 2 years, and report of symptoms. Participants were eligible if they were 8–21 years of age, self-identified as African American, and had four African American grandparents. Study exclusion criteria included the following: (1) any smoking within 1 year of the recruitment date, (2) 10 or more pack-years of smoking, (3) pregnancy in the third trimester, and (4) history of lung diseases other than asthma (cases) or chronic illness (cases and controls). The SAGE II study enrolled 1556 participants (920 cases and 636 controls) from 2006 to August 2013.

Genotyping and quality control

DNA was isolated from whole blood collected from 1381 SAGE II participants (817 cases and 564 controls) at the time of study enrollment using the Wizard® Genomic DNA Purification kits (Promega, Fitchburg, WI). Samples were genotyped with the Affymetrix Axiom® LAT1 array (World Array 4, Affymetrix, Santa Clara, CA), which includes 817,810 SNPs. This array was designed to capture genetic variation in African-descent populations such as African Americans and Latinos (Hoffmann et al. 2011). Genotype call accuracy and Axiom array-specific quality control metrics were assessed and applied according to the protocol described in further detail in Online Resource 1. Briefly, quality control inclusion criteria consisted of genotyping efficiency > 95 %, Hardy-Weinberg equilibrium (HWE) p > 10⁻⁶, and minor allele frequency (MAF) > 5 %. Cryptic relatedness was also assessed to ensure that samples were effectively unrelated (no closer than third-degree relatives). After quality control procedures, 797,128 SNPs were available for analysis in a total of 1227 individuals (812 asthma cases, 415 controls) with complete measurements of global African ancestry, age, and sex.
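The three SNP-level inclusion criteria can be illustrated with a minimal Python sketch. It is not the study's pipeline (the actual protocol is in Online Resource 1). The function names are hypothetical, and the HWE p value is approximated here with a 1-degree-of-freedom chi-squared test, which is an assumption; the text does not say which HWE test was applied.

```python
import math

def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from genotype counts (AA, AB, BB) among called samples."""
    n = n_aa + n_ab + n_bb
    freq_b = (n_ab + 2 * n_bb) / (2 * n)
    return min(freq_b, 1.0 - freq_b)

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Approximate HWE p value (chi-squared, 1 df); an assumed stand-in test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

def passes_qc(n_aa, n_ab, n_bb, n_missing):
    """Apply the thresholds from the text: call rate > 95 %, HWE p > 1e-6, MAF > 5 %."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    return (call_rate > 0.95
            and hwe_chi2_p(n_aa, n_ab, n_bb) > 1e-6
            and minor_allele_frequency(n_aa, n_ab, n_bb) > 0.05)
```

For example, a SNP with genotype counts 490/420/90 and 10 missing calls sits exactly at Hardy-Weinberg proportions with a MAF of 0.3, so it would be retained under these assumed checks.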

Calculation of BMI categories

Body mass index (BMI) was calculated using height and weight according to the standardized formula \( \mathrm{BMI} = \frac{\text{weight (kg)}}{\text{height}^2\ (\text{m}^2)} \) (Center for Disease Control and Prevention 2014). BMI categories were defined as follows: obese (BMI ≥ 30) and non-obese (BMI < 30) for participants 20 and over. For individuals under 20 years of age, BMI measurements were first converted to age- and gender-specific BMI percentiles (BMI-pct) before assignment to BMI categories using the following criteria: obese (BMI-pct ≥ 95) and non-obese (BMI-pct < 95) (Center for Disease Control and Prevention 2014; McGarry et al. 2015). BMI information was available for only 1080 study participants (88 %).
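The categorization rule above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the age- and gender-specific percentile is assumed to be supplied by the caller (for example, from the CDC growth charts) rather than computed here.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    return weight_kg / height_m ** 2

def is_obese(age_years, bmi_value=None, bmi_percentile=None):
    """Obesity status using the cut-offs stated in the text.

    Participants aged 20 and over: obese if BMI >= 30.
    Participants under 20: obese if the BMI percentile is >= 95.
    The percentile lookup itself is assumed to happen elsewhere.
    """
    if age_years >= 20:
        if bmi_value is None:
            raise ValueError("bmi_value is required for participants aged 20 and over")
        return bmi_value >= 30
    if bmi_percentile is None:
        raise ValueError("bmi_percentile is required for participants under 20")
    return bmi_percentile >= 95
```

Keeping the adult and pediatric branches in one function makes it explicit that the same binary outcome (obese versus non-obese) is derived from two different scales.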

Genetic ancestry estimation

The genetic ancestry of each participant was determined using the ADMIXTURE software package (Alexander et al. 2009) and modeled assuming two ancestral populations (African and European) using the HapMap phase III data from the YRI and CEU populations as reference (International HapMap et al. 2010) (Fig. 1).

Fig. 1 Admixture proportions assessed with ADMIXTURE for the SAGE II participants included in this study: (a) cases and (b) controls. Each vertical bar represents one individual, and colors display the proportions of European (blue) and African (red) ancestry

Calculation of population-specific genome-wide significance threshold

The standard, and most commonly accepted, GWAS threshold for statistical significance is 5 × 10⁻⁸. This number was derived by applying the Bonferroni correction for multiple testing to a dataset of one million independent markers/SNPs. However, this threshold is overly conservative when a smaller number of markers are genotyped or when the assumption of independence of tests is violated. The assumption of independence is routinely violated in genetic studies due to the presence of linkage disequilibrium (LD) between markers/SNPs.

In order to correctly apply the Bonferroni correction to our dataset, we calculated the number of effectively independent tests among our 797,128 genotyped SNPs using the protocol published by Sobota et al. (2015). This method estimates the effective number of independent tests in a genetic dataset after accounting for LD between SNPs using the LD pruning function in the PLINK 1.9 software package (Chang et al. 2015; Purcell and Chang 2015). The following parameters were used in PLINK 1.9 as advised by the authors: 100-SNP sliding window, step size of 5 SNPs, and variance inflation factor of 1.25. The number of independent tests was determined to be 116,105. This number was then used to calculate the genome-wide significance threshold (Bonferroni correction: 0.05/116,105 = 4.3 × 10⁻⁷). A suggestive threshold was set at p < 10⁻⁶ for association results.
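The arithmetic behind both thresholds is a single division, shown here as a short Python sketch (the function name is hypothetical). It reproduces the conventional threshold for one million independent tests and the population-specific threshold for the 116,105 effectively independent tests reported above.

```python
def bonferroni_threshold(alpha, n_independent_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    if n_independent_tests <= 0:
        raise ValueError("n_independent_tests must be positive")
    return alpha / n_independent_tests

# Conventional GWAS threshold: one million independent markers.
conventional = bonferroni_threshold(0.05, 1_000_000)

# Threshold for the effective number of independent tests in this dataset.
population_specific = bonferroni_threshold(0.05, 116_105)

print(f"{conventional:.1e}")
print(f"{population_specific:.1e}")
```

Because the effective number of tests is roughly one ninth of one million, the resulting threshold is roughly nine times less stringent than the conventional one.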

Statistical analysis

Comparison of demographic variables

Demographic characteristics were compared between asthma cases and controls (Table 1). If normally distributed, continuous variables were compared using the Student’s t test; if variables were non-normally distributed, the Wilcoxon rank sum test was performed. Dichotomous variables were assessed using the chi-squared test. Differences in global African ancestry proportions between cases and controls were assessed using the difference in proportion test (Acock 2014; Wang 2000). All tests were performed using STATA version 12 (StataCorp. 2011).

Table 1 Distribution of demographic characteristics for SAGE II participants

Association testing

Logistic regression was performed to assess the relationship between SNP genotype and asthma status. Genetic variants were coded to assess the additive effect of the minor allele (genotype coding, 0 = homozygous major, 1 = heterozygous, 2 = homozygous minor). All regression models were adjusted for age, sex, and global African ancestry. All testing was performed using the PLINK1.9 software package (Chang et al. 2015; Purcell and Chang 2015). Quantile-quantile (QQ) (Fig. 2) and Manhattan (Fig. 3) plots were generated using the “qqman” package (Turner 2014) in the R statistical software environment (R Development Core Team 2010). Significant and suggestive associations were further evaluated to determine if SNP effects were mediated by obesity status. Mediation analysis was performed using the “mediation” package in the R statistical software suite (Imai et al. 2010; Tingley et al. 2014) to determine the proportion of the total association due to an indirect effect of SNP genotype on asthma status via obesity (Table 4). Non-parametric bootstrap resampling (n = 1000) was used to generate 95 % confidence intervals and p values for predicted mediation estimates.
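To make the additive coding and the resulting odds ratio concrete, the following Python sketch fits a single-SNP logistic model by Newton-Raphson. It is a simplified illustration, not the PLINK analysis: the covariates used in the study (age, sex, and global African ancestry) are omitted, and all function names are hypothetical.

```python
import math

def additive_code(genotype, minor_allele):
    """Number of minor-allele copies (0, 1, or 2) in a genotype such as 'AG'."""
    return sum(1 for allele in genotype if allele == minor_allele)

def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient with respect to b0
            g1 += (yi - p) * xi     # gradient with respect to b1
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("genotype has no variation; model is not identifiable")
        # Newton step: solve the 2 x 2 system (information matrix) * step = gradient.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def per_allele_odds_ratio(x, y):
    """Odds ratio for each additional copy of the minor allele."""
    _, b1 = fit_logistic(x, y)
    return math.exp(b1)
```

Under this coding, an odds ratio above 1 means each extra copy of the minor allele is associated with higher odds of asthma, and an odds ratio below 1 means the minor allele is protective.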

Fig. 2 Quantile-quantile plot of the GWAS of asthma

Fig. 3 Manhattan plot of the GWAS of asthma. The x-axis shows the chromosomal position of each SNP, and the y-axis shows the −log10 transformation of the association p value for each SNP

Replication of previous asthma associations

We identified 53 previously reported SNP associations with asthma from 15 separate studies by assessing the NHGRI-EBI Catalog of published GWAS (Burdett et al. 2014; Welter et al. 2014) using the search term “asthma” and a significance threshold of 5 × 10⁻⁸ (Online Resource 2, Table S1). It is important to note that this number includes only studies focused specifically on asthma and not distinct, but related, phenotypes such as drug response, airflow obstruction, baseline lung function, etc. Of the 53 variants, 30 were genotyped in our study population (Online Resource 2, Table S1). The remaining 23 variants were imputed using the IMPUTE2 (Howie et al. 2009, 2011) software package and haplotypes from all available populations in the 1000 Genomes Project phase 1 as a reference panel (Genomes Project et al. 2010), obtained with SHAPEIT (Delaneau et al. 2012). SNPs with an IMPUTE2 information score < 0.3 were excluded from analysis. A total of 53 variants remained for replication analysis following quality control procedures (Online Resource 2, Tables S1 and S2).

Allelic dosage data from the imputed SNPs were analyzed for association with asthma status using logistic regression under an additive model; genotypes were coded to measure the effect of the minor allele as previously described. Regression covariates included age, sex, and global African ancestry. The significance threshold for replication was p ≤ 0.05. Regression analyses were performed using the PLINK1.9 software package, as previously mentioned.
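For readers unfamiliar with allelic dosage, the sketch below shows how an expected minor-allele count can be derived from the three imputed genotype probabilities, together with the information-score filter described above. The function names are hypothetical, and the 0.01 tolerance on the probability sum is an assumption made to allow for rounding in imputation output.

```python
def expected_dosage(p_hom_major, p_het, p_hom_minor):
    """Expected number of minor-allele copies given imputed genotype probabilities."""
    total = p_hom_major + p_het + p_hom_minor
    if abs(total - 1.0) > 0.01:  # assumed tolerance for rounded probabilities
        raise ValueError("genotype probabilities should sum to 1")
    return p_het + 2.0 * p_hom_minor

def keep_imputed_snp(info_score, min_info=0.3):
    """SNPs with an information score below 0.3 were excluded in the text."""
    return info_score >= min_info
```

A dosage is a continuous value between 0 and 2, so it enters the additive logistic model in the same way as the 0/1/2 coding used for directly genotyped SNPs while carrying the imputation uncertainty with it.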

Functional annotation of associated variants

For variants with association p values that were genome-wide significant or suggestive, we performed a scan of functionality using HaploReg v4.1 (Ward and Kellis 2012) to query the empirical data from the ENCODE project and several expression quantitative trait loci (eQTL) databases. We also used the online bioinformatics data mining tool, SNPInfo, to determine if associated SNPs were predicted to affect biological functions (Xu and Taylor 2009).

Results

Study population

Demographic information for the SAGE II individuals included in this study (n = 1227) is presented in Table 1. The median age in control individuals (median = 16.4) was significantly higher than among asthma cases (median = 13.6) (p < 0.001). The percentage of females was also higher in controls (56.1 %) relative to cases (47.2 %) (p = 0.003). Obesity was significantly associated with asthma status (p = 0.002). However, no significant differences in global African ancestry were observed between asthma cases and controls (p = 0.81, Fig. 1).

Asthma GWAS

Our GWAS results did not show evidence of inflation due to population stratification (Fig. 2). We identified two variants on chromosome 10 in the patched domain-containing protein 3 (PTCHD3) gene region that were significantly (rs660498, OR = 1.62; p = 2.20 × 10⁻⁷) and suggestively (rs2484180, OR = 1.54; p = 2.84 × 10⁻⁶) associated with increased asthma susceptibility (Table 2, Fig. 3). Two additional SNPs, rs17446324 (OR = 0.41; p = 5.00 × 10⁻⁶) located in the class 3 semaphorin (SEMA3E) gene on chromosome 7 and rs67731056 (OR = 1.88; p = 6.70 × 10⁻⁶) in the insulin receptor (INSR) gene region on chromosome 19, were suggestively associated with asthma risk. Of note, of the four identified associations, the minor allele for only rs17446324 was protective; in the other three loci, the minor allele conferred increased risk for asthma.

Table 2 Significant and suggestive allelic associations with asthma in SAGE II

These four SNPs were then evaluated using mediation analysis to determine if the association signal detected was a result of a direct effect of SNP genotype on asthma status or reflective of an indirect effect on asthma status mediated through obesity. Mediation analysis did not uncover any significant mediation effects via obesity status and provided strong evidence that SNP effects were truly indicative of “direct” asthma associations (Table 4).

Replication of previously identified asthma variants

To determine what proportion of asthma genetic associations discovered in other populations were generalizable to a pediatric African American population, we attempted to replicate previous findings using our study population (Table 3, Online Resource 2 (Table S3)). Of the 53 SNPs evaluated, only 3 nominally replicated in our study (p < 0.05) (Table 3). Importantly, only a single SNP, rs204993, on chromosome 6 displayed the same direction of effect in published studies as in our study population.

Table 3 Replication of previously identified GWAS associations with asthma/asthma-related phenotypes in SAGE II

Discussion

We performed a genome-wide association analysis of asthma susceptibility in African American children and identified novel, and possibly ethnic-specific, genetic associations. Our two most significant associations were between increased asthma susceptibility and genetic variation in the PTCHD3 gene region; specifically, rs660498, a variant located within 40 kb of the PTCHD3 gene boundary, and rs2484180, a missense SNP responsible for a cysteine to glycine change, located in the first exon of PTCHD3. Located less than 50 kb apart, these two variants are in moderate LD (r² = 0.44, p < 0.001). Notably, rs2484180 is located proximally to an exon splice region, is predicted to affect protein stability (Xu and Taylor 2009), and acts as an eQTL in adipose tissues (Ward and Kellis 2012). SNP rs660498 is located in an enhancer histone mark in monocytes and CD14+ cells, acting as an eQTL in whole blood (Ward and Kellis 2012). A recent study identified SNPs within patched domain-containing protein 1, PTCHD1, an important paralog of PTCHD3, as predictive of asthma exacerbations (Xu et al. 2011). PTCHD3 and PTCHD1 are both predicted to play a role in the Hedgehog signaling pathway, although PTCHD3 is not as well characterized as PTCHD1 (Furmanski et al. 2013; Ghahramani Seno et al. 2011). Functional studies highlight the major role that PTCHD1 plays in modulating T cell differentiation via regulation of the Hedgehog signaling pathway (Furmanski et al. 2013). PTCHD1 upregulation was shown to induce a Th2 phenotype in peripheral CD4+ T cells, and several studies have reported the importance of Th2 cells in the pathophysiology of asthma (Barnes 2001; Fahy 2015; Lloyd and Hessel 2010). To our knowledge, our report is the first instance of genetic variation in the PTCHD3 gene that is associated with increased asthma susceptibility. Interestingly, this gene has been previously associated with BMI in the Family Heart Study (FamHS) (National Center for Biotechnology Information 2009) and with an interaction of fasting glucose-related traits with BMI (Manning et al. 2012).

We also discovered two suggestively associated non-synonymous SNPs located in the SEMA3E and INSR genes. Genetic variation in SEMA3E has been previously associated with adipose tissue inflammation; it has been hypothesized that inflammatory cells in adipose tissue may promote an inflammatory state in the lungs and encourage the development of asthma (Mohanan et al. 2014; Shimizu et al. 2013). Lastly, INSR is the main regulator of insulin, which is known to affect smooth muscle contraction in the lungs and therefore may play a plausible biological role in asthma susceptibility (Schaafsma et al. 2007; Singh et al. 2013). Similar to both PTCHD3 and SEMA3E, INSR has also been linked with several BMI-related phenotypes (Pankov Iu 2013; Parekh et al. 2015).

The discovery that all of our associated variants were located within gene regions previously associated with obesity or BMI-related phenotypes motivated us to investigate whether our identified associations between SNP genotype and asthma status were mediated by obesity status. We performed mediation analysis to determine whether our significant associations were due to a direct relationship between SNP genotype and asthma susceptibility or were in fact the result of an indirect association with asthma susceptibility mediated by obesity/BMI. Results from our mediation analysis provided strong evidence that the effects of all four of our SNPs of interest, either significantly or suggestively associated with asthma, were independent of obesity status (Table 4).

Table 4 Mediation analysis results for top asthma susceptibility associations in SAGE II

We have previously demonstrated that genetic and pharmacogenetic associations for asthma vary among different racial and ethnic groups (Drake et al. 2014; Galanter et al. 2014; Pino-Yanes et al. 2015; Torgerson et al. 2011). Although there are many potential explanations for these observations, one possibility is that there are unique racial and ethnic-specific genetic risk factors for asthma. In this study, we were unable to replicate the majority of prior findings of genetic associations with asthma (Online Resource 2 (Tables S1 and S3)). There are several possible reasons for our inability to replicate previous findings. One possible explanation is that previously discovered associations were population-specific and not generalizable to an African American population. Another possible explanation is the differing LD patterns displayed by genetic variants between racial/ethnic groups (Shifman et al. 2003; Xu et al. 2007). It is also possible that a portion of the previous associations identified in adult studies of asthma susceptibility were age-dependent and may not have the same impact in a pediatric cohort (Castro-Giner et al. 2010; Pino-Yanes et al. 2013). We must also consider the possibility that the original findings reported in published GWA studies were false positives or spurious associations. Finally, we acknowledge that while SAGE II is the largest and most comprehensive pediatric gene-environment study of African American children with and without asthma to date, our sample size left us underpowered to detect smaller effect sizes, although we were adequately powered to detect the stronger effects previously published (Fig. S1). We note, however, that it is uncertain whether the effect sizes of the previously associated SNPs are similar in non-African American and African American populations, which makes it difficult to discern whether the lack of replication was due to a lack of power.

The identification of novel ethnic-specific genetic variation directly associated with asthma in genes that have been previously associated with obesity and/or adiposity is consistent with the observation of a phenotypic link between these two complex common disorders. The significant racial/ethnic disparity in prevalence and severity of asthma and obesity not only suggests the presence of ethnic-specific factors influencing both diseases but also indicates that there may be ethnicity-specific pleiotropic effects as well. Such findings underscore the importance of studying multiple racial and ethnic groups in genetic association studies. To our knowledge, studies of pleiotropy between asthma and obesity genetics have only been performed in pediatric populations of European descent (Gonzalez et al. 2014; Melen et al. 2010, 2013; Wang et al. 2015). This presents a gap in our scientific knowledge that limits the standard of care available for populations, such as African Americans, who are among the most severely affected by the co-occurrence of these conditions. Our findings are in tune with President Obama’s announcement of the Precision Medicine Initiative, which will include enrollment of at least one million Americans (National Institutes of Health 2015). The goal of this initiative is to begin a new era of biomedical research that centers on the inclusion of study populations for common disease reflecting the demographic and genetic diversity of US patient populations. As citizens and scientists, we must leverage our skills to ensure that our research furthers this goal and moves us closer to the development of therapies that will improve care in all patients.