Introduction

For the time being, a positive family history is the clinician’s only tool for estimating the impact of genetic predisposition on the development of atherosclerotic disease. A doubling of risk can be assumed for individuals whose male first-degree relative was diagnosed with myocardial infarction (MI) before the age of 55 years, or whose female first-degree relative was diagnosed before the age of 65 years [1]. A particularly high recurrence rate was found in identical twins of MI patients. For example, the risk of dying from MI before the age of 55 years was increased eightfold when an identical twin had been affected at an early age [2]. The inherited risk of coronary artery disease (CAD) is particularly evident in (rare) families with multiple affected members [3]. However, with the exception of the LDL-receptor gene, the molecular causes underlying such risk remained elusive until very recently [3, 4]. Indeed, it was the emergence of genome-wide association studies (GWAS) that led to the discovery of multiple common variants that reproducibly affect CAD risk [5].

The GWAS era began in 2007 with the discovery of the 9p21 risk locus [6]. Subsequently, steadily growing GWAS consortia were formed, which have by now identified 164 chromosomal loci reaching genome-wide significance (P < 5 × 10⁻⁸) (Fig. 1) [7,8,9,10,11]. The number of loci has grown linearly with sample size (Fig. 2), implying that larger sample sizes will lead to new discoveries. However, the statistical power to detect associations between genetic variants and a trait depends not only on sample size but also on other factors, such as the effect size and allele frequency at those loci [12]. For example, identifying associations with so-called ultra-rare variants (i.e., those with a frequency of 1/100,000) would require whole-genome sequencing and a sample size of more than one million, and would succeed only if the effect sizes of these variants were very large [12]. Hence, saturation (i.e., identification of nearly monogenic variants significantly associated with CAD) is not expected soon, at least not within the next 5 years. In fact, exome-wide studies have only just begun to identify rare variants in coding regions that additionally contribute to CAD risk [13]. Moreover, discoveries of monogenic causes have so far been limited to the genes causing familial hypercholesterolemia and to disorders of the NO–cGMP signaling pathway [3, 4, 16]. This is expected to change as the scientific community increasingly moves to whole-genome sequencing, which should increase the number of CAD risk variants that can be identified and may even revolutionize the discovery and diagnosis of monogenic disorders [17].
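To illustrate how power depends jointly on sample size, effect size and allele frequency, the following Python sketch uses a standard normal approximation for a case-control allelic test, based on the Woolf standard error of the log odds ratio. It is an illustrative assumption, not the calculation used in [12]; the function name and all input values are hypothetical.

```python
"""Approximate power of a case-control allelic association test.

Illustrative sketch only: uses the Woolf (log odds ratio) variance for a
2x2 table of expected allele counts and a normal approximation.
"""
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def allelic_test_power(p_control: float, odds_ratio: float,
                       n_cases: int, n_controls: int,
                       alpha: float = 5e-8) -> float:
    """Return the approximate two-sided power to detect a per-allele odds ratio.

    p_control  -- risk allele frequency in controls (assumed)
    odds_ratio -- per-allele odds ratio of the risk allele (assumed)
    n_cases, n_controls -- numbers of individuals (two alleles each)
    alpha      -- significance threshold (default: genome-wide 5e-8)
    """
    # Risk allele frequency in cases implied by the allelic odds ratio.
    odds_case = odds_ratio * p_control / (1.0 - p_control)
    p_case = odds_case / (1.0 + odds_case)

    # Expected allele counts in the 2x2 table (cases/controls x risk/other allele).
    a = 2 * n_cases * p_case
    b = 2 * n_cases * (1.0 - p_case)
    c = 2 * n_controls * p_control
    d = 2 * n_controls * (1.0 - p_control)

    # Woolf standard error of the log odds ratio and the expected z statistic.
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_effect = abs(log(odds_ratio)) / se
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)

    # Probability that the test statistic falls beyond either critical value.
    return _STD_NORMAL.cdf(z_effect - z_crit) + _STD_NORMAL.cdf(-z_effect - z_crit)
```

For example, with 50,000 cases and 50,000 controls and a per-allele odds ratio of 1.25, `allelic_test_power(0.5, 1.25, 50_000, 50_000)` is close to 1, whereas `allelic_test_power(1e-5, 1.25, 50_000, 50_000)` is essentially zero: the same effect size is easily detected for a common variant but not for an ultra-rare one.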

Fig. 1

The figure displays all genes with genome-wide significant association signals for CAD in GWAS to date [10, 11]. Genes at the 164 loci were grouped into functional classes using gene ontology and canonical pathway maps from resources such as ConsensusPathDB (http://cpdb.molgen.mpg.de), which includes the Kyoto Encyclopedia of Genes and Genomes. Some genes have been assigned to multiple pathways. The figure indicates that most genetically influenced mechanisms leading to an increased risk of coronary artery disease are poorly defined and not addressed by current treatments.

Fig. 2

The number of individuals studied (x-axis) vs. the number of CAD loci reaching genome-wide significance (y-axis), from the first CAD GWAS in 2007 to date (taken from [11])

The genetic architecture of coronary disease

The chromosomal loci associated with atherosclerosis risk that were identified by GWAS are remarkable for several reasons:

  1. Only a few of the associated variants alter protein structure; most risk alleles instead appear to affect gene regulation.

  2. Only 30% of the chromosomal loci conferring CAD risk do so by modulating traditional risk factors such as LDL cholesterol and blood pressure (Fig. 1). For the majority of loci, the mechanisms that increase atherosclerosis risk therefore remain unclear [14].

  3. Almost all currently identified risk alleles are relatively frequent, in part by design of GWAS. For example, in Europeans, the probabilities of carrying one or two risk alleles at the most prominent CAD risk locus, on chromosome 9p21.3, are 50% and 25%, respectively [6]. Thus, only 25% of the population is free of this specific genetic risk factor for MI, and there are at least 160 further risk loci in the genome. Given that each individual carries two alleles at each locus, most Europeans carry between 130 and 190 of the currently known risk alleles.

  4. Each risk allele increases the probability of atherosclerosis only modestly, i.e., by 5–25 relative percent per allele. For example, individuals who are homozygous for the 9p21.3 risk allele (i.e., who carry it twice) have an approximately 50% higher relative risk of MI than the 25% of the population who carry no copy of this allele. However, even a person who carries no 9p21.3 risk allele is likely to carry many risk alleles at other loci.

  5. The large number of risk loci, the high frequency of most risk alleles, and the even distribution of risk alleles across the genomes of individuals in a given population explain why the recently identified genetic factors have substantial implications at the population level, even though each individual risk allele confers only a modest effect.

  6. The genetic risk conferred by the newly discovered common risk variants is largely independent of the risk signaled by a positive family history [15]. In heavily affected families, by contrast, cascade screening for familial hypercholesterolemia should be initiated to provide early preventive treatment [16].

  7. Of note, each individual also carries a number of alleles that may decrease CAD risk, the so-called “protective alleles” (e.g., variants that disrupt certain protein functions, typically loss-of-function variants in genes whose activity is associated with increased risk, such as APOC3 [18] and ANGPTL4 [13]). Other alleles may specifically neutralize or diminish risk arising from environmental or endogenous factors.

  8. Finally, the overall genomic architecture has to be considered: genetic variants associated with increased risk of complex diseases such as CAD are also found in the genomes of long-lived individuals and do not seem to compromise their longevity [19].
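The frequency and effect-size arithmetic in points 3 and 4 can be reproduced with a small Python sketch. It assumes Hardy-Weinberg equilibrium, independent loci, a multiplicative (log-additive) risk model and, for the genome-wide count, a hypothetical risk-allele frequency of 0.5 at every one of the 164 loci; these are simplifying assumptions, not estimates from the cited studies, and the function names are illustrative.

```python
"""Genotype frequencies and risk-allele counts under simplifying assumptions.

Assumes Hardy-Weinberg equilibrium, independent loci and a multiplicative
per-allele risk model; allele frequencies used below are hypothetical.
"""
from math import sqrt


def hwe_genotype_freqs(p: float) -> tuple[float, float, float]:
    """Return (non-carrier, heterozygous, homozygous) frequencies for risk allele frequency p."""
    q = 1.0 - p
    return q * q, 2.0 * p * q, p * p


def risk_allele_count_range(freqs: list[float], n_sd: float = 3.0) -> tuple[float, float, float]:
    """Return (mean, lower, upper) for the number of risk alleles an individual carries.

    Each locus contributes 0, 1 or 2 risk alleles (binomial with 2 trials), so the
    mean is sum(2p) and the variance is sum(2p(1-p)); bounds are mean -/+ n_sd SDs.
    """
    mean = sum(2.0 * p for p in freqs)
    sd = sqrt(sum(2.0 * p * (1.0 - p) for p in freqs))
    return mean, mean - n_sd * sd, mean + n_sd * sd


def genotype_relative_risk(per_allele_rr: float, n_alleles: int) -> float:
    """Relative risk vs. non-carriers under a multiplicative (log-additive) model."""
    return per_allele_rr ** n_alleles
```

With a risk-allele frequency of 0.5, `hwe_genotype_freqs(0.5)` returns `(0.25, 0.5, 0.25)`, matching the 25% non-carriers, 50% heterozygotes and 25% homozygotes at 9p21.3. Under the hypothetical uniform frequency of 0.5, `risk_allele_count_range([0.5] * 164)` gives a mean of 164 and a ±3 SD range of roughly 137 to 191 alleles, in the same range as the figure quoted in point 3. For point 4, `genotype_relative_risk(1.25, 2)` gives 1.5625, i.e., a homozygous relative risk of the same order as the approximately 50% increase described above.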

Clinical utilization of GWAS findings

From a clinician’s perspective, one may ask how these discoveries can improve the prevention and treatment of coronary artery disease. A first step is to conduct Mendelian randomization studies, which aim to predict the beneficial effects of medications [20]. The principle is that a genetic variant that exclusively affects a biomarker (e.g., a lipid or an inflammatory molecule) can be related to an outcome (e.g., coronary disease) only if this biomarker plays a causal role in that condition [20]. In this respect, GWAS have provided compelling evidence that pharmacological interventions to raise HDL cholesterol are unlikely to lower coronary risk, since there is little evidence that genetic variants that increase HDL cholesterol levels decrease CAD risk [21]. By contrast, medications that lower LDL cholesterol or triglycerides may be good candidates, since multiple genetic variants that lower LDL cholesterol or triglyceride levels also lower CAD risk [13, 22].
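The Mendelian randomization principle can be made concrete with the two most common summary-statistic estimators, the single-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) combination. The Python sketch below assumes that all variants are valid instruments (no pleiotropy) and uses first-order standard errors that ignore uncertainty in the variant-biomarker effects; the class and function names and any numbers are hypothetical.

```python
"""Two-sample Mendelian randomization estimates from summary statistics.

Sketch of the Wald ratio and the fixed-effect inverse-variance weighted (IVW)
estimator; assumes valid instruments and first-order standard errors.
"""
from dataclasses import dataclass
from math import sqrt


@dataclass
class Instrument:
    beta_exposure: float  # variant effect on the biomarker (e.g., LDL cholesterol)
    beta_outcome: float   # variant effect on the outcome (e.g., log odds of CAD)
    se_outcome: float     # standard error of beta_outcome


def wald_ratio(v: Instrument) -> tuple[float, float]:
    """Causal effect estimate from a single variant, with first-order SE."""
    estimate = v.beta_outcome / v.beta_exposure
    se = v.se_outcome / abs(v.beta_exposure)
    return estimate, se


def ivw_estimate(variants: list[Instrument]) -> tuple[float, float]:
    """Fixed-effect IVW combination: weighted mean of Wald ratios, weights bx^2 / se_y^2."""
    weights = [v.beta_exposure ** 2 / v.se_outcome ** 2 for v in variants]
    numerator = sum(v.beta_exposure * v.beta_outcome / v.se_outcome ** 2 for v in variants)
    total_weight = sum(weights)
    return numerator / total_weight, sqrt(1.0 / total_weight)
```

If the variants that raise a biomarker consistently raise CAD risk in proportion, the IVW estimate is clearly non-zero (evidence for a causal role); if the outcome effects scatter around zero regardless of the biomarker effects, as reported for HDL cholesterol, the estimate is compatible with no causal effect.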

Nowadays, pharmaceutical companies increasingly consider the predictive value of such GWAS results when selecting novel agents for clinical evaluation. Indeed, genetic variants may mimic drug effects and thereby help predict the outcome of clinical trials [13, 23]. Moreover, it has become increasingly clear that genetic variation can affect drug responses in individual patients, including susceptibility to adverse drug reactions.

Finally, the prediction of premature atherosclerosis may be improved by a genetic risk score built on the hundreds, if not thousands, of genetic variants that modulate disease risk [24]. In addition to the 164 CAD risk loci, GWAS have identified a large number of genetic variants associated with traditional CAD risk factors such as hypertension [25], type 2 diabetes [26] and hypercholesterolemia [27]. Indeed, genetic risk scores based on risk factor SNPs, e.g., for hypertension, have been shown to be associated with CAD as well [25]. These additional variants may therefore be considered when constructing genetic risk scores, as they may allow more precise CAD risk estimation and, in some cases, specific lifestyle recommendations.

Genotyping arrays can provide such information at low cost (e.g., 40 € in a research setting) and assign each individual a percentile rank of a genetic risk score within a given population. The higher the rank, the higher the risk of developing CAD, particularly if the score lies above the 80th or 90th percentile [24]. An advantage of such testing is that the predictive value can be obtained at a young age, before any atherosclerotic lesions manifest. However, future studies first need to establish the clinical utility of this information before genetic testing can be recommended as a diagnostic tool. Finally, as the run times and costs of whole-genome sequencing drop rapidly, we may soon be entering a new era of next-generation diagnostics.
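A weighted genetic risk score and its percentile rank can be sketched in a few lines of Python. The sketch assumes per-allele odds ratios from GWAS summary statistics as weights (on the log scale), risk-allele dosages of 0, 1 or 2 for every variant in the score, and a reference sample of scores from the same population; the function names and SNP identifiers are hypothetical, and real scores would also handle missing genotypes, strand alignment and ancestry.

```python
"""Weighted genetic risk score (GRS) and percentile rank in a reference population.

Assumes weights are per-allele log odds ratios and genotypes are risk-allele
dosages (0, 1 or 2); SNP identifiers and reference scores are hypothetical.
"""
from bisect import bisect_right
from math import log


def genetic_risk_score(dosages: dict[str, int], odds_ratios: dict[str, float]) -> float:
    """Sum of risk-allele dosage times log(OR); every scored SNP must be genotyped."""
    return sum(dosages[snp] * log(odds_ratio) for snp, odds_ratio in odds_ratios.items())


def percentile_rank(score: float, reference_scores: list[float]) -> float:
    """Percentage of reference individuals whose score is <= the given score."""
    ranked = sorted(reference_scores)
    return 100.0 * bisect_right(ranked, score) / len(ranked)
```

An individual whose `percentile_rank` exceeds 80 or 90 would fall into the high-risk range mentioned above; because the score depends only on germline genotypes, it can be computed at any age.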

The growing spectrum of causal pathways

From a clinical point of view, it may be helpful to condense the many loci (and genes) into a manageable number of functional groups and pathways that may need therapeutic attention [5]. Figure 1 offers such a grouping of genes. All genes listed are significantly associated with CAD risk at genome-wide level [11]. The genes were allocated to downstream effects using gene ontology and canonical pathway maps including, among others, the Kyoto Encyclopedia of Genes and Genomes. Since a gene can play a role in multiple biological processes, some genes appear multiple times in the figure, so that the overall number of entries is much larger than the 164 loci that harbor these genes. As can be seen, only a few functional groups and pathways (or genes) are currently addressed by therapeutic interventions. Indeed, only genetic variants affecting LDL cholesterol, triglycerides, platelet function, blood pressure or inflammation can be addressed by pharmacological or lifestyle measures that may neutralize an unfavorable predisposition.
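The grouping logic, including the fact that one gene can be counted in several groups, can be illustrated with a small Python sketch. The gene names (GENE_A, GENE_B, ...) and their annotations are hypothetical placeholders rather than real Gene Ontology or ConsensusPathDB content; only the list of therapeutically addressable groups is taken from the text above.

```python
"""Group genes into functional classes when a gene may belong to several classes.

Sketch with hypothetical gene annotations; real annotations would come from
resources such as Gene Ontology or ConsensusPathDB.
"""
from collections import defaultdict

# Functional groups that, per the text, can currently be addressed by
# pharmacological or lifestyle measures.
ADDRESSABLE = {"LDL cholesterol", "triglycerides", "platelet function",
               "blood pressure", "inflammation"}


def group_by_function(annotations: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert a gene -> functional groups map into functional group -> genes."""
    groups: dict[str, set[str]] = defaultdict(set)
    for gene, functions in annotations.items():
        for function in functions:
            groups[function].add(gene)
    return dict(groups)


def count_entries(groups: dict[str, set[str]]) -> int:
    """Total gene entries across all groups (a gene may be counted more than once)."""
    return sum(len(genes) for genes in groups.values())


def groups_without_treatment(groups: dict[str, set[str]]) -> list[str]:
    """Functional groups not covered by any currently addressable mechanism."""
    return sorted(group for group in groups if group not in ADDRESSABLE)
```

With three hypothetical genes annotated to two, one and three groups, respectively, `count_entries` returns 6 entries for only 3 genes, which is the same effect that makes the number of entries in Fig. 1 exceed the 164 loci.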

Figure 3 exemplifies a hypothetical sub-network from one such functional group: cell migration and adhesion. All genes illustrated in the sub-network are significantly associated with CAD at genome-wide level [11]. Endothelin-1, its receptor type A and other downstream genes in the figure are likely to play a role in the development of atherosclerosis by modulating cell migration and adhesion, most probably through their impact on integrin activation [31]. Protein–protein interaction databases curated from the scientific literature suggest that the respective gene products may interact; moreover, these genes are either annotated to the respective categories in databases or were found by manual curation (e.g., EDNRA) to affect cell migration and adhesion. Thus, these genetic variants, identified through their genome-wide significant association with CAD, may disturb this cellular function (e.g., in monocytes or endothelial cells) and thereby increase MI risk. Future studies need to combine such in silico predictions with experimental validation to broaden our understanding of the mechanisms leading to coronary disease [28].

Fig. 3

The figure displays a hypothetical sub-network affecting CAD risk. All genes shown in the figure were significantly associated with CAD at genome-wide level in GWAS [10, 11]. Interestingly, all these CAD GWAS hits are related to the term cell migration and adhesion in functional annotations retrieved from the Gene Ontology (http://www.geneontology.org/) and ConsensusPathDB (http://cpdb.molgen.mpg.de/database) databases. The latter database integrates 32 public resources, including biochemical pathway data and protein–protein interactions (PPI) curated from the literature. The sub-network was constructed by querying the PPI data for direct interactions among the CAD GWAS hits previously annotated to the cell migration and adhesion functional group. It illustrates a hypothetical cascade by which endothelin-1 (EDN1), via its receptor A (EDNRA) and activation of insulin receptor substrate 1 (IRS1), could potentially influence these processes. EDN1 endothelin-1, EDNRA endothelin receptor type A, RHOA Ras homolog gene family member A, PRKCE protein kinase C epsilon type, ITGB5 integrin subunit beta 5, PLCG1 phospholipase C gamma 1, NCK1 NCK adaptor protein 1, IRS1 insulin receptor substrate 1
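The sub-network construction described in the Fig. 3 legend amounts to a simple graph filter: keep only the GWAS hits annotated to the functional term, then keep only the interactions whose two partners both belong to that set. The Python sketch below assumes placeholder gene names and a placeholder interaction list; it does not reproduce the actual ConsensusPathDB data or the edges shown in Fig. 3.

```python
"""Build a sub-network of direct interactions among annotated GWAS hits.

Sketch with hypothetical inputs: placeholder gene names, annotation set and
protein-protein interaction (PPI) list.
"""


def build_subnetwork(gwas_hits: set[str],
                     annotated_to_term: set[str],
                     ppi_edges: list[tuple[str, str]]) -> set[frozenset[str]]:
    """Return undirected edges whose two endpoints are both GWAS hits annotated to the term."""
    nodes = gwas_hits & annotated_to_term
    # frozenset makes (A, B) and (B, A) the same undirected edge; self-loops are dropped.
    return {frozenset((a, b)) for a, b in ppi_edges
            if a != b and a in nodes and b in nodes}


def subnetwork_genes(edges: set[frozenset[str]]) -> list[str]:
    """Genes that participate in at least one retained interaction."""
    return sorted(set().union(*edges)) if edges else []
```

Genes that are GWAS hits but lack the annotation, and annotated genes that are not GWAS hits, are excluded, so every retained edge links two CAD-associated genes sharing the functional term.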

The successful discovery of multiple risk alleles by GWAS makes it possible to explain an increasing proportion of overall CAD heritability (currently about 25%) [11]. However, a substantial proportion of “missing heritability” remains. This is particularly evident in subjects with a positive family history, suggesting that either specific gene–gene interactions (epistasis) or rare (private) variants have a profound effect in such families or in individuals with otherwise unexplained risk. Finally, as for most complex multifactorial diseases, it is the interplay between genetic predisposition and lifestyle and environmental factors that modulates each individual’s risk of developing CAD, suggesting that more effort should be devoted to documenting and integrating the latter.

Moreover, there is a substantial need to explain disease mechanisms both at the chromosomal level and at the level of the subsequently affected functional groups and pathways. Large efforts are currently addressing the systems biology affected by genome-wide significant risk alleles [28].

The first step in elucidating the pathophysiological pathway is to identify the causal variant at each locus, followed by the challenge of identifying the target gene affected by that variant, which is ultimately responsible for the GWAS signal [29]. Next, the downstream mechanisms disturbed by changes in the causal gene need to be determined [28]. In most cases unraveled so far, this happens primarily via altered gene expression and, consequently, altered protein abundance [30]. However, despite valid hypotheses regarding many genes and pathways, the exact mechanisms underlying the identified loci often remain unknown. Even the assignment of loci to genes is mainly based on proximity.
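Proximity-based assignment of a locus to a gene can be sketched as a nearest-gene lookup. The Python sketch below assumes that distance is measured from the lead variant to each gene’s transcription start site (TSS) on the same chromosome; gene names and coordinates are hypothetical. As noted above, the nearest gene is not necessarily the causal target gene, which is why such assignments need functional follow-up.

```python
"""Assign a GWAS lead variant to the nearest gene (proximity-based mapping).

Sketch only: gene names and coordinates are hypothetical, and distance is
measured to the transcription start site (TSS) on the same chromosome.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gene:
    name: str
    chromosome: str
    tss: int  # transcription start site, base-pair position


def nearest_gene(chromosome: str, position: int, genes: list[Gene]) -> Optional[Gene]:
    """Return the gene whose TSS is closest to the variant, or None if none is on that chromosome."""
    candidates = [g for g in genes if g.chromosome == chromosome]
    if not candidates:
        return None
    return min(candidates, key=lambda g: abs(g.tss - position))
```

A variant at position 1,030,000 on a chromosome carrying hypothetical genes with TSSs at 1,000,000 and 1,050,000 would be assigned to the second gene, regardless of which gene its regulatory effect actually targets.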

Conclusion

The last decade of genomic research has led to the identification of 164 common genetic loci, each conferring modest risk for CAD and MI [10, 11]. More variants will foreseeably be identified as GWAS sample sizes increase. In addition, whole-exome and whole-genome sequencing studies have identified rare risk variants with stronger effects in families and large patient cohorts. GWAS in particular have expanded our understanding of genetic disease etiology, such that we now have a much better picture of the underlying biology. Functional studies investigating the mechanistic link between genetic variation and disease onset currently aim to identify novel treatment targets. Enormous progress has been made in this respect, as exemplified by GUCY1A3, PCSK9, ANGPTL4, and ANGPTL3, i.e., genes with genome-wide association with CAD and potential druggability. These recent findings are excellent starting points for individualized treatment strategies in the future. Finally, despite all these advances, only part of the heritable risk for CAD can be explained to date.