Introduction

In the USA, asthma prevalence is highest in Puerto Ricans (26%) and lowest in Mexicans (10%) (Carter-Pokras and Gergen 1993; Homa et al. 2000). This is paradoxical since both groups are considered “Hispanic” or “Latino”. Although there are many potential explanations for this observation, including environmental and socioeconomic factors, one likely explanation is that the genetic predisposition to asthma differs among subgroups within the Latino population. Latinos are admixed and share varying proportions of West African, Native American and European ancestry (Choudhry et al. 2006; Hanis et al. 1991). The mixed ancestry of Latinos provides unique opportunities in epidemiological and genetic studies and may be useful in untangling complex gene–gene and gene–environment interactions in disease susceptibility (Burchard et al. 2003; Choudhry et al. 2007).

Several recent advances in statistical methods and genotyping techniques have resulted in a paradigm shift in genetic association studies, making it possible to perform a genome-wide association analysis, which does not require a priori knowledge of disease associated genes (Hirschhorn and Daly 2005; Kennedy et al. 2003; Matsuzaki et al. 2004; Risch 2000). An alternative yet complementary approach to genome-wide association analysis is admixture mapping (Smith and O’Brien 2005). Admixture mapping is a method for localizing disease causing genetic variants that differ in frequency across populations. It is most advantageous to apply this approach to admixed populations such as Latinos, which descended from a recent mix of three ancestral groups that have been geographically isolated for thousands of years. The approach assumes that near a disease causing gene there will be enhanced ancestry from the population that has greater risk of getting the disease. Thus if one can calculate the ancestry along the genome for an admixed sample set, one could use that to identify disease causing gene variants (Chakraborty and Weiss 1988; McKeigue 1997; Pfaff et al. 2001; Stephens et al. 1994). Admixture mapping requires the genotyping of several thousand markers (Smith et al. 2004) while genome-wide association requires genotyping of hundreds of thousands of markers (Hinds et al. 2005). The Affymetrix GeneChip Human Mapping 100K array set can genotype 116,204 SNPs in a given individual with a single genotyping assay (Kennedy et al. 2003). The ease and cost effectiveness of the GeneChip arrays for large-scale genotyping make them far more attractive than conventional genotyping platforms for genome-wide studies.

We used the Affymetrix 100K arrays to perform genome-wide association and admixture mapping analyses to identify loci associated with asthma in Puerto Ricans. First, we selected ancestry informative markers (AIMs) from the 100K arrays by screening Puerto Rican ancestral populations. We then performed three different analyses: (1) genome-wide association analysis testing association between individual SNPs and asthma disease status, (2) admixture mapping comparing cases and controls using the program Admixmap and (3) admixture mapping using likelihood ratio test on locus-specific ancestry estimates determined using the program Structure. In all three analyses, we incorporated adjustments to correct for confounding due to population stratification. By combining results from these three different analytical methods, we determined a set of most promising candidate regions for asthma in Puerto Ricans. We then selected the top SNPs from these candidate regions and genotyped them in a second sample of Puerto Rican families with asthma for validation analysis.

Materials and methods

Study participants

A total of 380 Puerto Rican subjects with asthma and 88 ethnically matched controls were included in this study. The genome-wide analyses included 96 subjects with moderate to severe asthma and 88 healthy controls. The moderate to severe asthma was defined based on baseline lung function (Pre-FEV1) of the asthmatic subject. Subjects with Pre-FEV1 less than 80% of predicted were categorized as having “moderate-severe” asthma. The validation analysis included 284 Puerto Rican asthma trios (father, mother and affected child). All subjects were recruited as part of the Genetics of Asthma in Latino Americans (GALA) study. Recruitment and patient characteristics were described in detail elsewhere (Burchard et al. 2004; Choudhry et al. 2005; Lind et al. 2003) but will be briefly described here. Ethnicity and national origin were self-reported and were ascertained using standardized questions. Puerto Rican subjects were enrolled only if both biological parents and all four biological grandparents were reported to be of Puerto Rican ethnicity. Interviews with children were conducted in the presence of parents. Eligible subjects with asthma had physician-diagnosed asthma and had experienced two or more asthma symptoms (among wheezing, coughing, and shortness of breath) in the previous two years. All control subjects were screened and considered to be eligible to participate if they did not have clinical evidence of asthma, allergies, atopy or any other allergic or pulmonary disease. All subjects (asthmatics and healthy controls) were between the ages of 8 and 40 years and were interviewed by bilingual and bicultural field workers and physicians specialized in asthma.

Genotyping using Affymetrix 100K arrays

We genotyped 37 West African, 42 European and 30 Native American samples using the Affymetrix GeneChip Human Mapping 100K array set to find AIMs relevant to Puerto Rico’s founding populations. The 37 West African samples are from individuals living in London, UK and South Carolina, USA, who are either non-admixed or have very low levels of admixture. The 42 European samples are from Coriell’s North American Caucasian panel. The Native American samples (Mayan, n = 15 and Nahua, n = 15) were recruited from villages in remote areas of Mexico. Genotyping of Puerto Rican asthma cases and controls was also performed using the Affymetrix 100K arrays. The genotyping was done following standard Affymetrix protocols and the data was processed using the Affymetrix-provided Genotyping Console Software (GCOS) and GeneChip DNA Analysis Software (GDAS) (Affymetrix Inc., Santa Clara, CA, USA). Ten samples were run in duplicate to assess for concordance between runs. The concordance rate was >99.9% and the overall genotyping success rate was >98.5%. Markers were assessed for Hardy–Weinberg equilibrium using a χ2 goodness-of-fit test. After excluding markers on the X chromosome, markers with minor allele frequency (MAF) <5%, extreme deviation from Hardy–Weinberg equilibrium (χ2 > 10) or study-wide genotyping call rates <95%, we retained 97,112 markers for further association analysis.
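The marker filter described above can be illustrated with a short Python sketch. It is not the authors' pipeline: the function names and the input format (per-marker genotype counts) are assumptions made for illustration, and only the thresholds (MAF <5%, χ2 > 10, call rate <95%) come from the text.

```python
def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit statistic for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 0.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip genotype classes with zero expected count (monomorphic markers).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_maf=0.05, max_chi2=10.0, min_call_rate=0.95):
    """Return True if a marker survives the MAF, HWE and call-rate filters."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * called)
    maf = min(p, 1.0 - p)
    return (maf >= min_maf
            and hwe_chi2(n_aa, n_ab, n_bb) <= max_chi2
            and call_rate >= min_call_rate)
```

A marker with genotype counts 25/50/25 and no missing calls is in exact Hardy–Weinberg proportions and passes, whereas a marker with counts 50/0/50 (no heterozygotes) gives χ2 = 100 and is excluded.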

Selection of ancestry informative markers (AIMs)

The genotype data from the parental population samples was used to identify AIMs, which were then used to perform admixture mapping and to adjust the analyses for population stratification. Since the contemporary Puerto Rican population is a mixture of three parental populations, West Africans, Europeans and Native Americans, we used an iterative process for selecting our AIMs. For each of the three possible pairs of ancestral populations, we identified markers where the difference in allele frequency (δ) between the two populations was at least 0.5. Once we identified such markers, we selected a subset that was adequately distributed across the genome, with the markers being far enough apart that they were in linkage equilibrium in the ancestral populations. These markers formed our set of 2,730 AIMs, which were then used to estimate individual and locus specific ancestry.
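A minimal sketch of this kind of selection is given below in Python. The δ threshold of 0.5 is taken from the text; the greedy thinning rule, the `min_gap_cm` spacing parameter and the marker record layout are hypothetical choices for illustration, since the text does not specify how spacing was enforced.

```python
from itertools import combinations


def max_pairwise_delta(freqs):
    """Largest absolute allele-frequency difference over all population pairs.

    freqs maps a population label to the frequency of the reference allele.
    """
    return max(abs(freqs[a] - freqs[b]) for a, b in combinations(sorted(freqs), 2))


def select_aims(markers, min_delta=0.5, min_gap_cm=0.5):
    """Keep informative markers, then thin them greedily along each chromosome.

    markers: list of dicts with keys 'id', 'chrom', 'cm' and 'freqs'.
    min_gap_cm is an assumed spacing rule, not a value from the study.
    """
    informative = [m for m in markers if max_pairwise_delta(m["freqs"]) >= min_delta]
    informative.sort(key=lambda m: (m["chrom"], m["cm"]))
    selected = []
    last_pos = {}  # chromosome -> genetic position of the last selected marker
    for m in informative:
        prev = last_pos.get(m["chrom"])
        if prev is None or m["cm"] - prev >= min_gap_cm:
            selected.append(m["id"])
            last_pos[m["chrom"]] = m["cm"]
    return selected
```

A marker qualifies here if it is informative for at least one pair of ancestral populations, which matches the observation in the Results that most markers were informative for only one pair.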

Estimation of individual ancestry

The individual ancestry estimates (IAE) were calculated using two different programs, Admixmap (Hoggart et al. 2003, 2004) and Structure (Falush et al. 2003; Pritchard et al. 2000), and the 2,730 AIMs. The IAE from these two programs were highly correlated (r > 0.9) (results not shown). The IAE from the Structure program were used to adjust for population stratification in the 100K association analysis.

Estimation of locus specific ancestry and admixture mapping analysis

The admixture mapping analyses were performed using the panel of 2,730 AIMs as described above and using two different programs, Admixmap and Structure. Admixmap uses a Bayesian probability model fit using Markov chain Monte Carlo to estimate locus specific ancestry (Hoggart et al. 2004). To run Admixmap, we provided the program with the genotypes of the 2,730 AIMs for our case and control subjects, as well as for the ancestral representatives. Admixmap performs a test for association between ancestral status and disease status and returns a p value for each marker. We also estimated the ancestral proportion at each of the 2,730 AIMs using the program Structure (Montana and Pritchard 2004). At each marker, a likelihood ratio statistic was computed, testing the null hypothesis of no association between ancestry and disease status while taking each individual’s overall ancestry estimates into account. Under the null hypothesis, the statistic follows a χ2-distribution and indicates the strength of deviation in ancestry between cases and controls at each locus. Statistical significance was assessed using a permutation test.
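The permutation step can be sketched in Python as follows. This is only an illustration of the general idea: the statistic used here (absolute difference in mean locus-specific ancestry between cases and controls) is a simple stand-in for the likelihood ratio statistic, it does not adjust for overall individual ancestry, and the function and parameter names are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(case_vals, control_vals, n_perm=1000, seed=0):
    """Empirical p value for a case-control difference in locus ancestry.

    case_vals / control_vals: locus-specific ancestry proportions (0 to 1)
    for one ancestral population at one marker.
    """
    rng = random.Random(seed)
    observed = abs(_mean(case_vals) - _mean(control_vals))
    pooled = list(case_vals) + list(control_vals)
    n_cases = len(case_vals)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        stat = abs(_mean(pooled[:n_cases]) - _mean(pooled[n_cases:]))
        if stat >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

With this construction the smallest attainable p value is 1/(n_perm + 1), so the number of permutations limits the resolution of the test.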

Genome-wide association analysis: regression method

We used a logistic regression model to test for association between genotype and disease status for 97,112 markers on the 100K arrays assuming an additive model for the disease. In addition to age and gender, IAE estimates using the program Structure were included as covariates in all the regression models to control the inflation of type I error rate due to population stratification.
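To make the additive model concrete, the Python sketch below fits a logistic regression of disease status on genotype coded as the number of copies of one allele (0, 1 or 2), using Newton-Raphson. It is deliberately stripped down: it omits the age, gender and ancestry covariates that the study included, and the function name and data layout are assumptions for illustration only.

```python
import math


def fit_logistic_additive(genotypes, status, n_iter=25):
    """Fit logit P(case) = b0 + b1 * genotype by Newton-Raphson.

    genotypes: allele counts (0, 1 or 2); status: 1 for case, 0 for control.
    Returns (b0, b1); exp(b1) is the per-allele odds ratio.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the information matrix X'WX
        for x, y in zip(genotypes, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break  # genotype has no variation; slope is not identifiable
        # Newton step: beta += inverse(information) * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

Under the additive coding, each extra copy of the allele multiplies the odds of disease by the same factor exp(b1), which is what distinguishes this model from dominant or recessive codings.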

Correction for multiple testing

The admixture mapping and genome-wide association analyses were corrected for multiple testing using the multtest package in the statistical language R. We used adjusted p values as calculated by multtest under the Benjamini & Yekutieli step-up false discovery rate (FDR) controlling procedure (Benjamini and Yekutieli 2001).
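For readers unfamiliar with the procedure, the Benjamini–Yekutieli adjustment can be written in a few lines of Python. This is a sketch of the standard step-up formula, not a reproduction of the multtest code, and the function name is hypothetical.

```python
def by_adjust(pvalues):
    """Benjamini-Yekutieli adjusted p values, returned in the input order."""
    m = len(pvalues)
    if m == 0:
        return []
    # Harmonic-sum penalty that makes the procedure valid under dependence.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted
```

The factor c_m grows roughly with the logarithm of the number of tests, so with tens of thousands of markers this correction is noticeably more conservative than the Benjamini–Hochberg procedure.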

Identification of follow-up regions by overlap analysis

To identify areas of potential interest we used the combined strength of the different analytical methods: (1) individual SNP association, (2) admixture mapping using a case–control approach with the program Admixmap and (3) admixture mapping using a likelihood ratio test on locus specific ancestry estimates from the program Structure. We identified regions for subsequent analysis by selecting markers that were highly ranked (based on unadjusted p value or likelihood ratio test score) in at least two of the three analytical methods.
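One way to express this overlap rule is sketched below in Python, using the 50 kb proximity criterion reported in the Results. The data layout, method labels and function name are hypothetical, and the brute-force pairwise comparison is chosen for clarity rather than speed.

```python
def find_overlaps(hits_by_method, max_dist_bp=50_000):
    """Pairs of top-ranked SNPs from different methods lying close together.

    hits_by_method: dict mapping a method name to a list of
    (snp_id, chromosome, position_bp) tuples for its top-ranked SNPs.
    Returns tuples (method1, snp1, method2, snp2, chromosome).
    """
    methods = sorted(hits_by_method)
    overlaps = []
    for i, m1 in enumerate(methods):
        for m2 in methods[i + 1:]:
            for snp1, chrom1, pos1 in hits_by_method[m1]:
                for snp2, chrom2, pos2 in hits_by_method[m2]:
                    # Same chromosome and within the distance threshold.
                    if chrom1 == chrom2 and abs(pos1 - pos2) <= max_dist_bp:
                        overlaps.append((m1, snp1, m2, snp2, chrom1))
    return overlaps
```

A SNP that is highly ranked by two methods is reported as overlapping with itself at distance zero, which is consistent with the requirement of support from at least two of the three methods.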

Validation

To test the validity of the regions identified through the overlap analysis, we performed validation analysis on the most promising markers in these regions on a sample of 284 Puerto Rican asthma trios. The genotyping was performed using the fluorescent polarization (FP) method as directed by the manufacturer (PerkinElmer, Waltham, MA, USA) (Chen et al. 1999). The association of the markers with asthma disease status was tested using the transmission-disequilibrium test as implemented in the program FBAT (Laird et al. 2000).
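The classic form of the transmission-disequilibrium test for a biallelic marker can be illustrated with the Python sketch below. It shows only the basic McNemar-type statistic computed from heterozygous parents; FBAT implements a more general family-based framework, and the function name and inputs here are assumptions for illustration.

```python
import math


def tdt(transmitted, untransmitted):
    """Classic TDT for one allele at a biallelic marker.

    transmitted: number of heterozygous parents who passed the allele
    to the affected child; untransmitted: number who passed the other allele.
    Returns (chi-square statistic with 1 df, p value).
    """
    total = transmitted + untransmitted
    if total == 0:
        return 0.0, 1.0  # no informative transmissions
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Because only transmissions from heterozygous parents are counted, the untransmitted parental alleles act as matched controls, which is why family-based tests of this kind are robust to population stratification.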

Results

Identification of AIMs

Table 1 shows the percent of markers on the Affymetrix 100K arrays informative for West African-European, West African-Native American and European-Native American ancestries at different levels of ancestry informativeness as measured by difference in allele frequency (delta, δ). Most markers were only informative for one pair of ancestral populations, while some were informative for more than one pair. Among the markers on the 100K arrays, there were more than 10,000 markers with δ > 0.5 for West African-European, West African-Native American or European-Native American ancestry (Table 1). We selected a panel of 2,730 AIMs which had a δ value of >0.5 and were evenly spaced across the genome (inter-marker distance mean: 1 cM, median: 0.79 cM, first quartile: 0.67 cM and third quartile: 1.05 cM and the overall inter-marker range: 0.01–26.07 cM) for estimation of individual ancestry estimates and admixture mapping analysis for asthma (see supporting material, Table 1S).

Table 1 Ancestry informative markers (AIMs) identified using the Affymetrix GeneChip Human Mapping 100K array set

Evidence of population stratification in Puerto Rican asthma cases and controls

The estimated average European and West African ancestry was different between our Puerto Rican cases and controls. Figure 1 shows box plots demonstrating differences in estimated European and West African ancestral proportions between cases (with medians of 62.0 and 22.3%, respectively) and controls (medians of 58.1 and 26.0%, respectively). The Native American ancestry was similar between the two groups (medians of 15.7% in cases and 15.9% in controls) (Fig. 1). The summary χ2 test using 2,730 AIMs also gave significant results (p = 0.01) suggesting that there are systematic differences in ancestry between our asthma cases and controls that could cause spurious genetic associations if measures of ancestry were not included in our analyses.

Fig. 1 Boxplots showing distribution of ancestry estimates in Puerto Rican asthma cases and controls

Admixture mapping analysis

Neither of the admixture mapping analyses (Admixmap or likelihood ratio test based on Structure output) gave any significant results after correction for multiple testing using the FDR approach (Figs. 2, 3). There were 56 markers that had an unadjusted p value of <0.01 in the Admixmap analysis (Table 2) and 21 markers that gave a score of ≥10 in the likelihood ratio tests before correction for multiple testing (Table 3).

Fig. 2 Distribution of p values for 2,730 AIMs across the chromosomes from admixture mapping analysis using the program Admixmap

Fig. 3 Distribution of likelihood ratio test scores for 2,730 AIMs across the chromosomes based on locus-specific ancestry estimates from the program Structure

Table 2 Top markers from the admixture mapping analysis using the program Admixmap. All markers with p values ≤0.01
Table 3 Top markers from the admixture mapping analysis using the program Structure and likelihood ratio test statistic

Individual SNP analyses

Eight SNPs had p values of less than 10−4 in the regression analysis and were located on chromosomal regions 1p22.3, 1p32.2, 4q31.1, 10q23.31, 11q14.1, 13q13.3 and 13q22.3 (Table 4). The smallest unadjusted p value was 1.3 × 10−5 for a marker in chromosomal region 4q31.1. Taking into account the fact that 97,112 simultaneous hypothesis tests were conducted, none of the markers showed statistically significant association with disease status after correction for multiple testing (Fig. 4). However, some of the top-ranked markers may have failed to reach significance simply because of the low power of the study. To explore this possibility, we ranked the markers based on the relative strength of their association from the regression and admixture mapping analyses, and used these rankings for the overlap analysis.

Table 4 Top markers from the regression analysis
Fig. 4 Distribution of p values for 97,112 markers across the chromosomes from asthma case–control regression analysis

Overlap analysis

For the overlap analysis, we selected the highest ranked SNPs from each of the three analytical methods. We selected 81 SNPs with p values ≤0.001 from the regression analysis, 56 SNPs with p values ≤0.01 from the Admixmap analysis and 21 SNPs with a combined score of ≥10 from the likelihood ratio test. We then performed an overlap analysis on these SNPs and selected regions that were highly ranked by at least two of the three methods employed. We considered loci to be overlapping if at least two methods had highly ranked SNPs within 50 kilobases (kb) of each other. From these analyses, we identified five “overlap regions” where the markers showed higher significance or were closer together than in the other overlap regions (Table 5).

Table 5 The top five regions from the overlap analysis

To test for further significance within these candidate regions, we selected windows on either side of the overlap SNPs, and tested for enhanced significance in these windows. Windows consisted of sets of 500 markers from the 97,112 marker set for each of the five overlap regions, with markers chosen so that the overlap SNP was at the center of the window in terms of physical distance. Therefore, the window sizes varied (9–12 Mb) according to the density of the markers on the 100K arrays. We plotted the histograms of the p values from the regression analysis for the markers in these windows, as well as quantile–quantile (QQ) plots to test for deviation from the empirical p value distribution of all 97,112 markers tested. These plots indicated that the largest deviations from the empirical distribution were in the 5q23.3 and 13q13.3 regions (Fig. 5). These deviations could either be due to enhanced association between SNPs and disease status in these regions or could be due to local differences in ancestry since our regression and admixture mapping analyses were adjusted only for ancestry on a genome-wide level.

Fig. 5 Plots showing histograms of p values for the markers from the top five regions from the overlap analysis and quantile-quantile (QQ) plots of these p values against all p values from the 100K regression analysis

Validation

To validate our findings from the overlap analysis, we selected one marker each from the 500-marker windows in the 5q23.3 and 13q13.3 overlap regions. The markers, rs1496348 and rs817737, had the best p value in the regression analysis in the 500-marker window in the 5q23.3 and 13q13.3 overlap regions, respectively, and therefore were selected for validation. The two markers were genotyped in a sample of 284 Puerto Rican family trios with asthma. Since the initial genome-wide association analysis was performed on moderate to severe asthmatics, the transmission-disequilibrium test (TDT) was performed on all trios and also on trios with moderate to severe asthma. The TDT analysis suggested positive association between the SNP rs1496348 (5q23.3 region) and moderate to severe asthma (p value = 0.02, Table 6). Allele A of this marker was over-transmitted among trios in which the proband had moderate to severe asthma. The same allele was associated with moderate to severe asthma in the initial genome-wide association analysis with a p value of 0.0007 (Table 6). The result of the TDT analysis for SNP rs817737 in the 13q13.3 region was not significant for either asthma or moderate to severe asthma (Table 6).

Table 6 Validation analysis

Discussion

Genome-wide association studies are desirable since they do not require a priori knowledge of the genes involved in a disease or disease-related phenotypes and may also provide greater power than linkage-based methods for identifying common variants conferring modest risk (Hirschhorn and Daly 2005; Risch 2000). However, a potential downside to performing genetic association studies in recently admixed populations, such as Puerto Ricans, is the possibility for spurious associations (confounding) due to population stratification (Cardon and Palmer 2003; Devlin and Roeder 1999; Marchini et al. 2004; Ziv and Burchard 2003). To date, there have been several successful attempts at gene mapping for complex diseases, including asthma, by genome-wide association analysis, but none has done so in an admixed population (WTCCC 2007; Buch et al. 2007; Gudmundsson et al. 2007; Hafler et al. 2007; Hakonarson et al. 2007; McPherson et al. 2007; Moffatt et al. 2007; Samani et al. 2007; Saxena et al. 2007; Scott et al. 2007; Steinthorsdottir et al. 2007; Tomlinson et al. 2007; Winkelmann et al. 2007; Yeager et al. 2007; Zanke et al. 2007; Zeggini et al. 2007).

This is the first report of genome-wide association and admixture mapping analysis for asthma in Puerto Ricans. Although our initial efforts at identifying disease associated loci based on marker-by-marker test statistics were limited by our small sample size, our validation analysis indicates that we could identify and validate regions that have been previously associated with asthma disease status by combining the strength from distinct but complementary analytical methods: direct genome-wide association and admixture mapping. While the direct genome-wide association analysis can be performed on any population to identify genetic variants (both ethnic-specific and cosmopolitan) associated with a disease, admixture-mapping methodology detects genetic variants in recently admixed populations that are responsible for racial differences in disease risk. Compared to direct genome-wide association study designs, admixture mapping exploits long-range linkage disequilibrium that exists in recently admixed populations and therefore requires fewer markers (McKeigue 1998; Montana and Pritchard 2004), and is more robust to allelic heterogeneity (Terwilliger and Weiss 1998). Although none of our admixture mapping or genome-wide association analyses showed significant association after correction for multiple testing, it is likely that some of the top scoring markers from different analyses are true associations that did not reach statistical significance due to the low power of our study. The overlap analysis combining the results of admixture mapping and genome-wide association analysis did identify two regions, 5q23.3 and 13q13.3, as the most promising candidates for asthma in Puerto Ricans. However, this analysis may have missed other regions that are better captured by one analytical method than by the others.

For a complex disease like asthma successful identification of causative factors will require large and well characterized samples, as have been used in other successful genome-wide association studies for complex diseases. In addition, it should be noted that many different factors (genetic or environmental) may exist that contribute to the development and severity of asthma. While every effort was made to ensure homogeneity of the phenotype, the possibility that different individuals in our study may have different genetic factors contributing to disease still exists.

One lesson our analysis demonstrates is that association studies in admixed populations should include adjustments for ancestry. We have previously shown that there is evidence of substantial confounding from population stratification in association studies of asthma in Latino populations (Choudhry et al. 2006). Here, we confirm our previous findings using a much larger set of AIMs (n = 2,730) suggesting that any association study of asthma in Latino populations should be tested and corrected for population stratification. In addition, we have identified AIMs on the Affymetrix 100K arrays for Latino populations, which may be helpful for future investigators intending to perform admixture mapping or genome-wide association studies in Latino populations, complementing three recent reports of Latino Admixture Mapping panels (Mao et al. 2007; Price et al. 2007; Tian et al. 2007).

The excess of lower p values than expected by chance in the 5q23.3 and 13q13.3 chromosomal regions may suggest the existence of asthma susceptibility loci in these regions in the Puerto Rican population (Fig. 5). Multiple genome-wide linkage studies for asthma and related atopic phenotypes have been performed in ethnically diverse populations. Although the results of these linkage mappings varied across studies and across racial and ethnic groups, regions 5q and 13q are two of the most frequently reproduced regions in these studies (Koppelman et al. 2002; Ober et al. 1998, 2000; Wiltshire et al. 1998; Xu et al. 2000). It is interesting to note that these regions contain the asthma candidate genes IL4, IL13, and TGFβ, which have been studied extensively (Bartram and Speer 2004; Basehore et al. 2004; Battle et al. 2007; Camoretti-Mercado and Solway 2005; Howard et al. 2002; Li et al. 2007; Marsh et al. 1994). In addition, these regions contain several new potential asthma candidate genes including IL3, IL5, IL9, SMAD5, IRF1, FBN2 and SMAD9 that warrant further investigation. Our genome-wide association analysis also suggests some other regions with low p values that may be associated with asthma, including 1p22.3, 1p32.2, 4q31.1, 10q23.31, 11q14.1 and 13q22.3 (Table 4). Some of these regions have been shown to be linked with asthma in previous studies (Colilla et al. 2003; Haagerup et al. 2004; Hoffjan and Ober 2002; Huang et al. 2003; Mathias et al. 2006; Pillai et al. 2006; Postma et al. 2005). The detection of causative variants in these regions will require fine mapping and functional validation.

Our results provide strong evidence that the previously linked 5q23 region is associated with asthma in Puerto Ricans. By finding an association in a previously identified region we provide a “proof-of-principle” for genome-wide association studies in admixed populations. Our study underscores the value of applying complementary analytical techniques to admixed populations to uncover genetic risk factors for complex traits.