Asthma is a complex multifactorial disorder caused by a variety of different mechanisms which result in multiple clinical phenotypes (Pavord et al. 2018; Saglani and Custovic 2019). Although asthma does not show classical Mendelian inheritance (Jenkins et al. 1997), it has a strong genetic component, and a familial aggregation has been described in numerous studies. Twin studies have confirmed higher concordance of asthma in monozygotic than in dizygotic twins and estimated the heritability to be in the region of 60–70% (Duffy et al. 1990). The evidence to date suggests that the genetic component of asthma likely derives from numerous genes with small to moderate effects (Ober and Yao 2011).

1 Genetics of Asthma: From Candidate Genes to Genome-Wide Association Studies

“Asthma genes” have been identified through a range of approaches, from candidate gene association studies (Simpson et al. 2012) and family-based genome-wide linkage analyses (Daniels et al. 1996) to genome-wide association studies (GWAS) (Moffatt et al. 2007, 2010; Demenais et al. 2018). Results of the first genome-wide linkage study for asthma were published in 1996 (Daniels et al. 1996), suggesting six potential loci, one of which (chromosome 11q13) had been reported previously in a candidate gene association study (Lympany et al. 1992). The first lung-specific asthma candidate gene (ADAM33) was identified by positional cloning in 2002 (Van Eerdewegh et al. 2002), and the first GWAS of asthma was reported in 2007 (Moffatt et al. 2007), identifying multiple markers on chromosome 17q21 as associates of childhood-onset asthma. Since then, a number of other loci have been identified using these techniques, including (but not limited to) GPRA (Laitinen et al. 2004), HLA-G (Nicolae et al. 2005), PHF11 (Zhang et al. 2003), DPP10 (Allen et al. 2003) and CYFIP2 (Noguchi et al. 2005). Genome-wide association studies have identified many risk alleles and loci (some of which have been replicated in worldwide populations (Kim and Ober 2019)), including novel candidate genes such as CHI3L1 (Ober et al. 2008) and DENND1B (Sleiman et al. 2010). Overall, the evidence suggests that there may be multiple genes underlying the linkage peaks (e.g. in the region 5q31–33 lie ADRB2, IL4, IL13, SPINK5, CD14, LTC4S, CYFIP2 and TIM1, and the 17q21 locus includes genes ORMDL3, GSDMB, CDHR3, GSDMA and GSDML (Kim and Ober 2019; Zhang et al. 2019; Das et al. 2017; Ober 2016)). Each of these has a relatively small effect or may be associated with different asthma endotypes.
A comprehensive review which summarised the results of the 42 GWASs to date of asthma, different asthma phenotypes and asthma-related traits has been published recently (Kim and Ober 2019) and provides an excellent summary of the many risk alleles and loci which were replicated in different populations. The most widely replicated asthma locus in GWASs is 17q12–21, followed by 6p21 (HLA region), 2q12 (IL1RL1/IL18R1), 5q22 (TSLP) and 9p24 (IL33) (Kim and Ober 2019). However, it is important to highlight the under-representation of ethnically diverse populations in most GWASs (Kim and Ober 2019). To mitigate this, large consortia have been formed, which combine the results of multiple ethnically diverse GWASs in order to increase the overall power to identify asthma-susceptibility loci. Examples include the GABRIEL (Moffatt et al. 2010), EVE (Torgerson et al. 2011) and TAGC (Demenais et al. 2018) consortia, and the value of diverse, multi-ethnic participants in large-scale genomic studies has recently been shown in the Population Architecture using Genomics and Epidemiology (PAGE) study (Wojcik et al. 2019). The largest GWAS to date in asthma was performed by the Trans-National Asthma Genetic Consortium (TAGC) in 23,948 cases and 118,538 controls, revealing 18 loci that reached genome-wide significance (Demenais et al. 2018).

2 Genetics of Asthma and Allergic Diseases: Inconsistent Findings

However, despite undeniable successes, it is of note that genetic studies have produced relatively heterogeneous results with limited replication (Ober and Yao 2011; Ober and Hoffjan 2006). Furthermore, a “precise replication”, namely the identical association of the specific single nucleotide polymorphism (SNP) with the same phenotype, is rare (Hirschhorn et al. 2002). In the asthma literature, the whole gene, rather than a specific SNP, is often used as a unit of replication, and the finding of any association between the genetic variant of interest and any asthma-related phenotype is sometimes considered as evidence of replication (even if the association is in the opposite direction in different populations (Ober and Hoffjan 2006)). Even for the best replicated asthma genes (e.g. those on chromosome 17q21), there is at least one negative finding, the risk alleles/SNPs are occasionally not consistent, and the effect size is relatively small (with odds ratios often in the region of 1.1–1.2) (Ober and Hoffjan 2006). Finally, only a relatively small proportion of the estimated heritability of asthma (and other atopic phenotypes such as IgE (Granada et al. 2012) and eczema (Paternoster et al. 2012)) has been explained. For example, although the estimated heritability for total IgE is ~60% (Strachan et al. 2001), a relatively large study of its genetic determinants explained less than 1% of the variance (Maier et al. 2006). Similarly, two meta-analyses of GWASs for lung function identified several novel genome-wide significant loci (Hancock et al. 2010; Repapi et al. 2010), but these accounted for only 3% of the variation in lung function, with most of the variance remaining unexplained (Weiss 2010; Artigas et al. 2011). Overall, despite considerable promise, genetic studies of asthma and allergic diseases have so far had limited impact on our understanding of disease mechanisms and development of novel therapeutic targets, or on patient care.

One part of the explanation for the paucity of precise replication of genetic studies of asthma discussed in this chapter is the existence of numerous gene–environment interactions (Custovic et al. 2012; Ober and Vercelli 2011). Another important consideration is the heterogeneity of asthma (including considerable differences in the response to treatment, severity, age of onset and factors triggering symptoms), which has long been recognised and is the subject of considerable debate. This culminated in the notion that asthma is not a single disease underpinned by similar pathophysiological mechanisms and genetic architecture leading to different clinical manifestations (“phenotypes”), but rather an “umbrella” diagnosis for a complex conglomerate of several different diseases (“endotypes”) (Anderson 2008; Lotvall et al. 2011), each caused by both distinct and overlapping mechanisms and with different genetic associates, but with similar symptoms and clinical presentation (Custovic et al. 2019). One of the consequences of the use of an aggregated definition (such as “doctor-diagnosed asthma”) in genetic studies is that important signals may be diluted by phenotypic heterogeneity (Custovic et al. 2015, 2019).

3 Gene–Environment Interactions

Asthma and allergic diseases are rarely purely genetically or environmentally driven and usually develop through complex interactions whereby environmental factors modulate the risk in genetically susceptible individuals (Ober and Yao 2011; Custovic et al. 2012; Vercelli and Martinez 2006). The concept that the same environmental exposure may have different effects among individuals with different genetic predisposition has been tested in studies which assessed the interaction between genes and the susceptibility to environmental factors (reviewed in Custovic et al. 2012). One of the most replicated gene–environment interactions, which we will use as an exemplar, is the relationship between endotoxin (lipopolysaccharide or LPS) and a polymorphism in the CD14 gene (Simpson et al. 2006). Endotoxin is a component of the exterior cell wall of gram-negative bacteria, and in the broad context of the “hygiene hypothesis” (Schaub et al. 2006; von Mutius 2007), exposure to endotoxin and other microbial products which are sensed by “danger signal” innate immunity receptors (e.g. the toll-like receptors, TLRs) serves to mould and calibrate the maturation of immune competence that is typically developmentally compromised during early life (Stein et al. 2016; von Mutius and Vercelli 2010). However, similar to many other environmental factors, contradictory findings have been reported on the relationship between endotoxin exposure and atopic phenotypes, with endotoxin in some studies conferring a protective effect (Braun-Fahrlander et al. 2002; Gereda et al. 2000), in others an increase in risk (Nicolaou et al. 2006), and in others having no effect (Bottcher et al. 2003). Endotoxin (LPS) is recognised by a cascade of receptors and accessory proteins, which include LPS binding protein, CD14 and the toll-like receptor 4 (TLR4)–MD-2 complex (Park and Lee 2013). An association between a functional variant in the promoter region of the CD14 gene (−159C/T, rs2569190) (Baldini et al. 1999) and allergic phenotypes has been reported in several populations (reviewed in Simpson and Martinez 2010). Initial studies from Tucson reported that the T allele was protective (Baldini et al. 1999), but subsequent studies in Germany found no association (Kabesch et al. 2004), and a study among Hutterites reported the opposite finding (i.e. that the T allele confers increased risk (Ober et al. 2000)). Given that CD14 is a part of the pattern recognition receptor complex for endotoxin, we tested the hypothesis that the effect of endotoxin exposure on allergic sensitisation may differ in individuals with different variants in CD14 SNP rs2569190 (−159C/T) (Simpson et al. 2006). The results indicated that high endotoxin exposure is protective, but only in children who are C-allele homozygotes (Simpson et al. 2006). In contrast, there was no association between endotoxin exposure and sensitisation among T-allele homozygotes. These findings may explain the disparities in genetic association studies of this SNP and atopic phenotypes in different settings around the world (Simpson and Martinez 2010). For example, if this genetic variant is studied in isolation, in populations exposed to low levels of endotoxin (such as Tucson), the T allele would confer protection (Baldini et al. 1999), while in those with high endotoxin exposure (such as a farming community of Hutterites) (Ober et al. 2000), the same allele (T) would confer an increase in risk. Finally, in populations with a range of exposures, there would be no clear association between genotype and outcome (Kabesch et al. 2004). Similar results which confirmed the interaction between this polymorphism in CD14 and endotoxin exposure have been reported in several independent populations (Eder et al. 2005; Williams et al. 2006; Zambelli-Weiner et al. 2005).
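
The reasoning about the CD14 variant and endotoxin can be made concrete with a small numerical sketch. All risk values below are hypothetical, chosen only to mirror the qualitative pattern described (high endotoxin exposure lowers risk in CC homozygotes and has no effect in TT homozygotes); they are not estimates from any of the cited studies, and the function names are illustrative.

```python
# Hypothetical risks of sensitisation by CD14 genotype and endotoxin exposure.
# Illustrative values only; not taken from any study.
RISK = {
    ("CC", "low"): 0.40,
    ("CC", "high"): 0.20,  # high exposure is protective in CC homozygotes
    ("TT", "low"): 0.30,
    ("TT", "high"): 0.30,  # exposure has no effect in TT homozygotes
}


def marginal_risk(genotype, prop_high):
    """Risk for a genotype in a population where a fraction `prop_high` is highly exposed."""
    return ((1 - prop_high) * RISK[(genotype, "low")]
            + prop_high * RISK[(genotype, "high")])


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def odds_ratio_tt_vs_cc(prop_high):
    """Odds ratio for TT versus CC when genotype is studied in isolation (exposure ignored)."""
    return odds(marginal_risk("TT", prop_high)) / odds(marginal_risk("CC", prop_high))


if __name__ == "__main__":
    for label, prop_high in [("all low exposure", 0.0),
                             ("mixed exposure", 0.5),
                             ("all high exposure", 1.0)]:
        print(f"{label}: OR(TT vs CC) = {odds_ratio_tt_vs_cc(prop_high):.2f}")
```

Under these assumed values, the same allele appears protective in a low-exposure population, harmful in a high-exposure population, and neutral in a mixed one, which is the pattern the Tucson, Hutterite and German findings would be expected to show if the interaction holds.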

A further example of the opposite effect of environmental exposure(s) among individuals with different genetic predisposition is the finding that the association between early-life day-care attendance and development of asthma during childhood may depend on a variant in the TLR2 gene (Custovic et al. 2011). Day-care (likely a marker of high exposure to microbial agents) was found to be protective among T-allele carriers for TLR2 SNP rs4696480, whilst among AA homozygotes day-care attendance appeared to increase the risk. Thus, the relevance of the TLR2 gene was uncovered only when the relevant environmental exposure (day-care attendance) was identified and taken into account. Day-care appeared protective in the analysis of the whole population (Nicolaou et al. 2008), but the apparent protective effect is a consequence of a much larger number of children with the T allele in the population, who outnumbered AA homozygotes by a factor of almost 4:1, thereby concealing the fact that in a subgroup (AA homozygotes), day-care attendance actually increased the risk of disease (Custovic et al. 2011). These two examples indicate that if genotypes which interact with environmental exposures are studied in isolation, irrespective of the size of the population studied, associations can be missed.
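
The masking of a subgroup effect by the more common genotype group can be illustrated with a short calculation. The risks below are hypothetical and were picked only to reproduce the direction of the effects described for TLR2 and day-care; the 4:1 genotype ratio comes from the text, while the assumption that day-care attendance is independent of genotype is ours.

```python
# Hypothetical asthma risks by TLR2 genotype group and day-care attendance.
# Illustrative values only; not taken from the cited studies.
RISK = {
    ("T_carrier", False): 0.20,
    ("T_carrier", True): 0.12,  # day-care protective in T-allele carriers
    ("AA", False): 0.15,
    ("AA", True): 0.25,         # day-care increases risk in AA homozygotes
}

# T-allele carriers outnumber AA homozygotes by roughly 4:1.
GENOTYPE_SHARE = {"T_carrier": 0.8, "AA": 0.2}


def risk_ratio(genotype):
    """Risk ratio for day-care attendance within one genotype group."""
    return RISK[(genotype, True)] / RISK[(genotype, False)]


def pooled_risk(day_care):
    """Population risk, assuming day-care attendance is independent of genotype."""
    return sum(share * RISK[(genotype, day_care)]
               for genotype, share in GENOTYPE_SHARE.items())


def pooled_risk_ratio():
    """Risk ratio for day-care attendance when genotype is ignored."""
    return pooled_risk(True) / pooled_risk(False)


if __name__ == "__main__":
    for genotype in GENOTYPE_SHARE:
        print(f"{genotype}: risk ratio = {risk_ratio(genotype):.2f}")
    print(f"whole population: risk ratio = {pooled_risk_ratio():.2f}")
```

With these assumed numbers the pooled risk ratio falls below 1 even though the ratio in AA homozygotes is above 1, so an analysis of the whole population would report day-care as protective and miss the subgroup at increased risk.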

Context dependency in terms of interactions with environmental exposure has also been shown for genes with an important and consistent main effect. For example, as outlined above, the chromosome 17q21 locus is recognised as a major genetic risk locus for childhood-onset asthma (Moffatt et al. 2007; Stein et al. 2018; Hernandez-Pacheco et al. 2019) (in particular among children experiencing virus-induced wheezing in early infancy (Caliskan et al. 2013)). However, recent studies have shown that the genetic polymorphism in ORMDL3 which confers the risk of asthma (GG genotype in SNP rs8076131) is amenable to environmental protection through exposure to farm animal sheds and presence of older siblings, whilst no protective effect of these exposures was observed for the insusceptible genotype (rs8076131 AA/GA) (Loss et al. 2016). Similarly, 17q21 variants have been shown to interact with environmental tobacco smoke (ETS) exposure (Marinho et al. 2012; Blekic et al. 2013) and early-life pet ownership (Blekic et al. 2013). These findings suggest that stratified prevention strategies informed by individual genetic predisposition may be possible.

Another example of a susceptibility genotype which may interact with environmental exposures is filaggrin (FLG). FLG loss-of-function mutations cause an impaired skin barrier and are associated with atopic dermatitis (AD) (Sandilands et al. 2007) and a range of other allergic conditions (Marenholz et al. 2006; Weidinger et al. 2008; Henderson et al. 2008a) and allergic sensitisation (Henderson et al. 2008a). However, certain environmental exposures may modify this association. For example, cat exposure in early life was found to increase the risk of AD in children with FLG loss-of-function mutations, with no effect of cat ownership among those without FLG mutations (Bisgaard et al. 2008). In a study on peanut allergy, we have shown that early-life environmental exposure to peanut allergens measured in dust samples from homes increases the risk of peanut sensitisation and peanut allergy in children who carry FLG mutations, with no significant effect of exposure in children without FLG mutations (Brough et al. 2014). Our subsequent study has shown that FLG loss-of-function mutations also modify the impact of exposure to inhalant allergens such as house dust mite (HDM) and cat on the development of allergen-specific sensitisation, in that the impact of Der p 1 and Fel d 1 exposure on allergen-specific sensitisation was considerably higher among children with FLG loss-of-function mutations compared to those without (Simpson et al. 2020). In contrast to mite and cat exposure, the risk of sensitisation to any allergen was lower among children with FLG mutations who were exposed to a dog in their home in infancy (Simpson et al. 2020). These findings may partly explain the differences in the effects of cat and dog ownership on allergic diseases and suggest that cat ownership may be a marker of high cat allergen exposure, while the protective effect of dog ownership (which extends to asthma and sensitisation to other allergens (Ownby et al. 2002)) may be mediated via changes in external or host microbiome (Sitarik et al. 2018). This study raises another important issue which is often overlooked in genetic studies of allergic diseases: that of time, and the potentially crucial importance of longitudinal analyses. It confirmed previous observations that the association between early-life environmental exposures and allergic sensitisation changes with time (Ihuoma et al. 2018) and raised questions about the current approach to replication in genetic and gene–environment studies, suggesting that the timing of the assessment of outcomes may have a critically important impact on the results (Simpson et al. 2020).

It is likely that the impact of many (if not most) environmental exposures which influence the risk of allergic diseases is modified by genetic predisposition. For example, ETS exposure increases the risk of wheezing (Murray et al. 2004), accelerated decline of lung function and increased asthma severity and morbidity (Strachan and Cook 1998), as well as reduced responsiveness to inhaled corticosteroids (ICS) (Chalmers et al. 2002). However, not all exposed individuals develop symptoms, suggesting that some may be more susceptible to the effects of tobacco smoke (possibly due to genetic variation). Interaction has been demonstrated between tobacco smoke exposure and several asthma candidate genes, such as glutathione S-transferases (Panasevich et al. 2010; Palmer et al. 2006; Rogers et al. 2009), TNF-α (Wu et al. 2007) and the β2-adrenoreceptor (ADRB2) (Wang et al. 2001), as well as other genomic loci (5q (Colilla et al. 2003), 1p, 9q (Colilla et al. 2003) and 1q43-q44, 4q34, 17p11, 5p15, 14q32 and 17q21 (Dizier et al. 2007; Meyers et al. 2005)). Similarly, variability in response to environmental HDM exposure in relation to HDM sensitisation has been attributed to the IL4 gene promoter polymorphism C-590T (Liu et al. 2004). However, it should be noted that there are conflicting data on specific gene–environment interactions. For example, Reijmerink et al. (2009) found a significant effect modification by in utero ETS exposure of ADAM33 polymorphisms on lung function and the development of AHR in children, whereas Schedel et al. (2006) were unable to detect any interactions between passive smoke exposure (in utero or during childhood) and ADAM33 variants.

4 Heterogeneity of Asthma and Allergic Diseases

The clinical presentation of wheezing and asthma varies widely over the life course, and the extent to which phenotypic variation signals differences in disease aetiology remains unclear. Most large GWASs use the broadest possible definition of asthma (e.g. parentally or patient-reported “doctor-diagnosed asthma”). As there is no universally accepted operational definition, this may lead to the under- or over-estimation of cases in genetic studies. For example, a study by Van Wonderen et al. (2010) found that the choice of “case” definition has a large impact on the estimate of asthma prevalence in early life. The authors identified 60 different “case” definitions for diagnosing childhood asthma in 122 published articles from cohort studies. They then chose four common definitions and applied them to a single cohort, to find that prevalence estimates varied considerably from 15.1% to 51.1%.

Another factor which may impact upon the power to detect associations is misclassification of controls. For example, we have recently shown that the choice of the definition of a “control” has major implications for detecting an association between AD and FLG genotype (Nakamura et al. 2019). By using different definitions of controls (“strict” and “moderate”) (Nakamura et al. 2019), we have demonstrated that although the sample size was reduced by approximately one-fifth when moving from the moderate to the strict definition (as fewer children fulfilled the more stringent criteria), the power to detect genetic association increased by ~50%, from 0.58 to 0.85, by having a “purer” control as a comparator for AD cases. These findings confirm that bigger is not necessarily always better in genetic studies of common complex heterogeneous phenotypes (Schoettler et al. 2019). A further consideration for defining cases and controls is the co-morbidity of allergic diseases. Twin studies have estimated pairwise genetic correlations of asthma, AD and allergic rhinitis to be greater than 0.5, suggesting that there are specific risk variants for each disease, but also shared genetic risk variants between diseases (Ferreira et al. 2017). Consequently, if for example in genetic studies of asthma one includes in the control group non-asthmatic individuals who have other allergic diseases (e.g. allergic rhinitis or AD), genetic signals may be weakened. Ferreira has demonstrated that using a strict definition of controls, where individuals do not experience any allergic disease, has the potential to increase the power, confirming that a tighter definition of both cases and controls could strengthen findings on the genetic architecture of diseases (Ferreira 2014). To account for this, we suggest that genetic studies of allergic diseases should investigate and publish the sensitivity of their findings based on different definitions of cases and controls.
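
The trade-off between sample size and control “purity” can be sketched with an approximate power calculation. This is a minimal illustration under stated assumptions, not the method of the cited studies: it uses a normal approximation for a two-proportion comparison of variant carrier frequency, and every input (carrier frequencies, sample sizes, the 30% contamination of the moderate control group) is hypothetical. It does not attempt to reproduce the reported values of 0.58 and 0.85.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_power(p_cases, p_controls, n_cases, n_controls, z_alpha=1.96):
    """Approximate power of a two-sided two-proportion z-test (unpooled variance).

    The very small probability of rejecting in the opposite tail is ignored.
    """
    se = math.sqrt(p_cases * (1 - p_cases) / n_cases
                   + p_controls * (1 - p_controls) / n_controls)
    return normal_cdf(abs(p_cases - p_controls) / se - z_alpha)


# Hypothetical inputs (not taken from any study).
CARRIER_FREQ_AFFECTED = 0.20    # variant carrier frequency among truly affected children
CARRIER_FREQ_UNAFFECTED = 0.08  # carrier frequency among truly unaffected children
N_CASES = 150
N_MODERATE_CONTROLS = 500
N_STRICT_CONTROLS = 400         # about one-fifth fewer under the strict definition
CONTAMINATION = 0.30            # assumed share of "moderate" controls who are in fact affected


def control_carrier_freq(contamination):
    """Carrier frequency in a control group containing a fraction of affected children."""
    return (contamination * CARRIER_FREQ_AFFECTED
            + (1 - contamination) * CARRIER_FREQ_UNAFFECTED)


power_moderate = approx_power(CARRIER_FREQ_AFFECTED, control_carrier_freq(CONTAMINATION),
                              N_CASES, N_MODERATE_CONTROLS)
power_strict = approx_power(CARRIER_FREQ_AFFECTED, control_carrier_freq(0.0),
                            N_CASES, N_STRICT_CONTROLS)

if __name__ == "__main__":
    print(f"moderate controls: power = {power_moderate:.2f}")
    print(f"strict controls:   power = {power_strict:.2f}")
```

In this sketch the loss of controls barely changes the standard error, whereas removing misclassified controls widens the difference in carrier frequency, so the smaller but purer comparison has the higher power.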

5 Can Focus on Disease Subtypes Improve Genetic Studies?

Deep phenotyping has the potential to identify new risk loci. A recent comparatively small GWAS which used a specific subtype of early-onset childhood asthma with recurrent, severe exacerbations as an outcome identified a novel gene, Cadherin Related Family Member 3 (CDHR3) as an associate of this specific subtype, but not of doctor-diagnosed asthma (Bønnelykke et al. 2014). This important discovery was made in a considerably smaller sample size (1,173 cases and 2,522 controls), but using a much more precise asthma subtype than was available in large international consortia which did not detect this association. For example, the previously mentioned largest asthma GWAS to date (TAGC) had almost 40-fold higher sample size (Demenais et al. 2018), but reported no significant association between CDHR3 and “asthma”, likely due to phenotypic heterogeneity inherent in the definition of asthma. Mechanistic studies that followed have suggested that CDHR3 may be a receptor for Rhinovirus C, identifying this as a potential therapeutic target (Bochkov et al. 2015). A major implication of this study is that with careful phenotyping, smaller sample sizes may be adequately powered to identify larger effect sizes than those in large GWASs with “looser” outcome definitions.
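
The dilution argument can be illustrated numerically. The sketch below assumes a variant that is associated only with one disease subtype and asks what odds ratio would be observed when that subtype makes up only a fraction of a broadly defined case group. The subtype odds ratio of 1.8 and the control carrier frequency of 0.20 are hypothetical, as are the function names.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def prob(o):
    """Convert odds to a probability."""
    return o / (1 + o)


def observed_odds_ratio(subtype_fraction, subtype_or=1.8, control_freq=0.20):
    """Odds ratio seen for a broad case definition.

    Only a fraction `subtype_fraction` of cases have the subtype linked to the
    variant; the remaining cases carry it at the control frequency.
    All default values are hypothetical.
    """
    subtype_freq = prob(subtype_or * odds(control_freq))
    mixed_case_freq = (subtype_fraction * subtype_freq
                       + (1 - subtype_fraction) * control_freq)
    return odds(mixed_case_freq) / odds(control_freq)


if __name__ == "__main__":
    for fraction in (1.0, 0.5, 0.25, 0.1):
        print(f"subtype share {fraction:.2f}: observed OR = "
              f"{observed_odds_ratio(fraction):.2f}")
```

Under these assumptions, an odds ratio of 1.8 in the precisely defined subtype shrinks towards 1 as the subtype becomes a smaller share of the cases, which is one way a very large study with a loose outcome definition could fail to detect an association that a small, carefully phenotyped study finds.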

6 Gene Discovery Using Phenotypes of Allergic Disease Derived by Data-Driven Techniques

As highlighted previously, an unwanted consequence of increasing sample size in large GWASs is increased phenotypic heterogeneity, which may dilute effect sizes. One way of addressing this is to derive subtypes of asthma and allergic diseases, which should ideally be homogenous. Whilst there are abundant studies using longitudinal data to characterise allergic diseases into data-driven phenotypes, a fertile yet under-investigated area of research is the investigation of genetic markers of these disease subtypes (Belgrave et al. 2017).

Over the past two decades, there has been a growth in the number of studies using data-driven classification techniques to identify subtypes of asthma. One such approach is latent trajectory modelling, in which repeated measurements of observed symptoms are modelled to identify more homogeneous sub-populations within the larger heterogeneous population (Oksel et al. 2018a, b; Prosperi et al. 2014). By incorporating the temporal evolution of observed symptoms across the life course, investigators have been able to capture the phenotypic heterogeneity of asthma based on the timing and persistence of symptoms in a hypothesis-neutral way, and to identify sub-groups with consistent patterns of disease that were not known a priori (reviewed in Oksel et al. 2018b; Deliu et al. 2017; Deliu et al. 2016; Howard et al. 2015). Latent modelling approaches have been extensively used to identify and validate longitudinal trajectories of childhood wheeze (Henderson et al. 2008b; Granell et al. 2016; Belgrave et al. 2013; Savenije et al. 2011), severe exacerbations (Deliu et al. 2019), atopy (Lee et al. 2017; Havstad et al. 2014; Garden et al. 2013; Lazic et al. 2013), asthma (Deliu et al. 2017; Howard et al. 2015; Weinmayr et al. 2013) and lung function (Belgrave et al. 2014a, 2018) and to evaluate their associations with environmental risk factors. Even though these studies have been instrumental in elucidating the heterogenous nature of allergic diseases, there has been a paucity of research into the genetic associations of such phenotypes. ALSPAC investigators explored associations between genes in the 17q21 locus and symptom-based longitudinal wheezing phenotypes derived by latent class analysis (Henderson et al. 2008b). 
Their findings suggested that SNPs near ORMDL3, GSDML and IKZF3 were associated with persistent and intermediate-onset wheeze (characterised by onset before age 30 months), but not with early wheezing that resolved or with late-onset wheezing, indicating that the timing and pattern of symptoms in early life may have differential genetic associations (Granell et al. 2013). A further study related the same wheezing phenotypes to the genetic prediction scores based on 10–200,000 SNPs ranked according to their associations with physician-diagnosed asthma and found that the 46 highest ranked SNPs (which included SNPs in ORMDL3/GSDMB, IL1RL1, IL18R1 and IL33) predicted persistent and intermediate-onset wheezing phenotypes more strongly than doctor-diagnosed asthma (Spycher et al. 2012). Furthermore, SNPs below the stringent genome-wide significance threshold were associated with bronchial hyper-responsiveness and atopy. Combining data from ALSPAC and PIAMA cohorts, Savenije et al. (2014) found that intermediate-onset and late-onset wheeze (both highly associated with allergic sensitisation) were associated with several IL1RL1 and IL33 polymorphisms. The study suggested that allergic sensitisation, through the IL33-IL1RL1 pathway, may be a risk factor for the development of wheeze and subsequent asthma.

A recent study took this approach one step further by investigating the relationship between the polygenic risk score comprising 135 SNPs which were found to be associated with allergic diseases in a previous GWAS (Ferreira et al. 2017), and eight latent classes of allergic diseases which were derived using machine learning techniques applied to the longitudinal patterns of the development of eczema, wheeze and rhinitis (Belgrave et al. 2014b). The authors found strong evidence for differential genetic associations across the different developmental profiles of eczema, wheeze and rhinitis, with a pooled polygenic risk score heterogeneity P-value of 3.3 × 10⁻¹⁴. SNP rs61816761 (a protein truncating variant in the FLG gene) and SNP rs921650 (within an intron of GSDMB on chromosome 17q21.1), which have previously been identified as having disease-specific effects (Ferreira et al. 2017), were differentially associated with distinct disease profiles. The FLG locus was associated with all profiles that included eczema, but the association was much stronger for the classes with co-morbid wheeze and rhinitis. In contrast, the GSDMB locus was associated with all profiles which included wheeze (including transient wheezing), but with no additional risk of co-morbid conditions. These studies suggest that the investigation of genetic associates of more holistic phenotypes which take into account temporality and co-occurrence of different symptoms may provide evidence of heterogeneous aetiological pathways which are masked by the umbrella case definitions in genetic studies. Ultimately, the case is strengthening for reforming the taxonomy of atopic diseases away from traditional symptom-based diagnostic criteria to incorporate advances in data analysis techniques, as well as molecular and genetic medicine.
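
For readers unfamiliar with polygenic risk scores, the sketch below shows one common construction: a sum of risk-allele counts weighted by the log odds ratio of each SNP. This is an assumption for illustration; the cited study may have weighted or scaled its 135-SNP score differently. The two rsIDs are the variants named in the text, but the effect sizes and the function name are invented.

```python
import math

# Hypothetical per-allele odds ratios. The rsIDs are the two variants named in
# the text; the effect sizes are invented for illustration only.
HYPOTHETICAL_ODDS_RATIOS = {
    "rs61816761": 1.50,
    "rs921650": 1.10,
}


def polygenic_risk_score(dosages, odds_ratios=HYPOTHETICAL_ODDS_RATIOS):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP).

    Weights are log odds ratios; SNPs missing from `dosages` count as 0.
    """
    return sum(math.log(odds_ratio) * dosages.get(snp, 0)
               for snp, odds_ratio in odds_ratios.items())


if __name__ == "__main__":
    child_a = {"rs61816761": 0, "rs921650": 1}
    child_b = {"rs61816761": 1, "rs921650": 2}
    print(f"child A score: {polygenic_risk_score(child_a):.3f}")
    print(f"child B score: {polygenic_risk_score(child_b):.3f}")
```

A score built this way can then be compared across latent classes rather than between a single case group and controls, which is the step that allows heterogeneity of genetic association across disease profiles to be tested.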

7 Integration of Data

There is a paucity of studies which have collected measurable data which dynamically influence the susceptibility to asthma, which would allow large-scale hypothesis-neutral GWAS using derived multinomial probabilistic phenotypes as an outcome. Gathering such data at a sufficient scale may be impractical due to the cost and complexity of externally validating findings of clinical importance in different populations. There may also be a risk of diluting genetic effects due to differences in exposure to environmental factors, which are difficult to control for, and variations in genetic backgrounds.

We propose that an integrated approach to understanding the mechanisms of asthma and allergic diseases (including genetics) through harnessing the power of different data sources may translate into a better understanding of causal mechanisms, more accurate diagnoses and more personalised treatment (Haider and Custovic 2019). The proliferation of new types of data, namely biomarkers from “omics” technologies and systems biology, coupled with advances in computational power, introduces new opportunities for the integration of different data sources to understand complex diseases more holistically (Canonica et al. 2018). More specifically, triangulation from different sources may help to elucidate the directionality of relationships between variables at a very individual level by modelling the complex interdependencies between multiple dimensions (e.g. genome, transcriptome, epigenome, microbiome and metabolome), thereby moving away from associative to a more causal analysis. Pecak et al. recently developed a catalogue of 190 potential asthma biomarkers from 73 studies covering 13 omics platforms (including genomics, epigenomics, transcriptomics, proteomics) (Pecak et al. 2018). They identified 10 candidate genes linked to asthma (for example, IL3, IL13, GATA3) that were present in at least two omics levels, thus demonstrating the potential for prioritising specific biomarker research and the development of targeted therapeutics.

8 Conclusions

The existence of numerous gene–environment interactions (many of which remain to be described (Vercelli and Martinez 2006)) and the use of aggregated phenotypes which probably comprise a number of different diseases with distinct pathophysiological mechanisms make reproducible studies aiming to understand the mechanisms of allergic disorders difficult, if not impossible. Understanding the potential causes of heterogeneity in phenotyping may enable the identification of genetic associations in a more consistent way. We propose that one of the ways forward is to precisely define disease subtypes (for example, by applying novel mathematical approaches to rich phenotypic data) and use these latent subtypes in genetic association studies. Understanding of gene–environment interactions, which typically use simple binary definitions of outcomes, could also be enriched by taking a multinomial approach and exploring whether such interactions vary by disease subtype membership. Triangulation of genetic data with deep phenotyping, environmental and omics data may help to identify the underlying pathophysiology of allergic diseases more holistically and inform the development of personally tailored therapeutic targets, as well as genotype-specific strategies for prevention (Custovic and Simpson 2004).