Introduction

Prior to the introduction of randomized controlled trials (RCTs), physicians relied on personal experience, case reports, case series, and epidemiological associations to guide clinical practice. With the introduction of the RCT, researchers established a “gold standard” method for addressing important clinical questions, and the answers have changed the practice of medicine. In addition to limiting various biases associated with observational study design, RCTs can go beyond epidemiological associations to establish a causal relationship between a risk factor and a disease. While RCTs have improved our understanding of risk factors and treatments for cardiovascular disease (CVD), this method has several limitations. RCTs can be expensive and time consuming to conduct. They also require rigorous trial design for their results to be interpreted accurately. From an ethical standpoint, clinical equipoise must be established in order to carry out an RCT. By this principle, there must be genuine uncertainty as to which arm's exposure or treatment will benefit patients. This requirement can make RCTs infeasible to conduct or can limit patient recruitment based upon patient or physician perceptions of benefit and risk. For example, testing the causal effect of a risk factor such as smoking or excessive alcohol use on CVD through an RCT would not be possible, as it is not ethical to randomize patients to a smoking or alcohol use arm. Another important limitation of RCTs is that in many cases there are no available therapeutic agents that can selectively modify the risk factor of interest. This limits the ability of RCTs to assess whether a risk factor is truly causal.

The limitations of RCTs in establishing causality have led to the development of alternative techniques to investigate the relationship between a modifiable risk factor and a disease outcome. Mendelian randomization is one such technique that has gained increasing interest. The approach was initially proposed by Martin Katan to understand the relationship between cholesterol levels and cancer morbidity/mortality [1]. Since an individual’s cholesterol levels are inherently confounded by comorbidities that accompany cancer, Katan proposed using genetic variation in the apolipoprotein E (ApoE) gene that correlates with low-density lipoprotein cholesterol (LDL-C) and total cholesterol levels, rather than the cholesterol levels themselves. He hypothesized that, if low cholesterol were indeed causally related to development of cancer, individuals with genetic variation linked to low LDL-C levels would have the highest incidence of cancer. Since Katan’s initial description, the number of Mendelian randomization studies has grown exponentially (Fig. 1), driven in part by advances in DNA sequencing and genome-wide association studies (GWAS) that have led to the identification of many single nucleotide polymorphisms (SNPs) that are linked to risk factors. These variants can be used to answer questions that are not feasible to address with an RCT. They can also limit the biases of observational studies and bring investigators closer to establishing causality. In this review, we will discuss Mendelian randomization as an epidemiologic tool to evaluate emerging cardiovascular risk factors and describe its strengths and weaknesses. We will also summarize the current literature using Mendelian randomization in cardiovascular disease research and address future directions and applications of this methodology.

Fig. 1 Mendelian randomization studies in PubMed. Graphical representation of the frequency of Mendelian randomization studies per year published in PubMed (accessed July 26, 2017)

Understanding Mendelian Randomization

Mendelian randomization uses the random distribution of naturally occurring SNPs that are strongly linked to a risk factor as the instrument for randomization (Fig. 2). This method is founded on the principle that an individual’s genotype is determined randomly at conception from the parental genotypes and that genetic variants governing variation in one trait are (often) inherited independently of those influencing another trait. Therefore, genotype will in most cases be independent of behavioral, dietary, and other factors that could confound the association between risk factor and outcome [2, 3]. Furthermore, unlike in most observational studies, individuals with and without the specific genetic variant are expected to be balanced with regard to both measured and unmeasured confounders. Take, for example, the use of Mendelian randomization to assess lipoprotein (a) (Lp(a)) as a risk factor for CVD. Observational studies have shown inconsistent associations between Lp(a) and CVD [4,5,6]. Lp(a) concentrations can vary 1000-fold with a skewed distribution in most populations and may be influenced by several confounding factors, such as age, sex, lifestyle factors, fasting state, and inflammation. Since the variation in Lp(a) levels is partially controlled by polymorphisms in the LPA gene locus, Mendelian randomization was employed in the Copenhagen City Heart Study using variants in the LPA gene characterized by the number of KIV-2 repeats. The investigators found a modestly increased risk for CVD in carriers of genotypes associated with increased Lp(a) levels, supporting a causal role of elevated Lp(a) in the development of CVD [7].

Fig. 2 Using Mendelian randomization to understand the relationship between risk factors and disease. This image depicts the relationship between a genetic variant (single nucleotide polymorphism or SNP), modifiable risk factor (exposure), and disease, accounting for confounders. Mendelian randomization relies on the assumption that genetic variants explain variation in the exposure but do not affect the disease outcome except potentially through the exposure, making them valid instrumental variables. Mendelian randomization is not (presumably) affected by confounders that can be associated with both the exposure and outcome, thereby providing an advantage over traditional observational study design methodology

An important contribution of Mendelian randomization is its ability to circumvent the problem of confounding and assess the causal relationship between an exposure and a disease. However, several assumptions must be met to infer causality. First, causal inferences require a true association between a genetic variant and an exposure, such that the genetic variant can serve as a reliable proxy for the exposure. This assumption is most likely to be true when the variant occurs in a gene that is directly related to the exposure, such as the sickle cell variant in the HBB gene causing hemoglobin S production and the classic “sickling” changes in the red blood cell. Second, causal inferences require that the relationship between a genetic variant and an outcome is mediated through the exposure. The genetic variant should not affect the outcome directly or through pathways other than the one mediated by the exposure. For example, for Lp(a) above, variants in the LPA gene would not be appropriate instruments if they also influenced blood pressure or glucose levels. Similarly, the genetic variant should not be associated with confounders of the exposure-outcome relationship. If these assumptions are met and the limitations of Mendelian randomization are considered, the results provide evidence of a causal relationship between an exposure and an outcome.
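The logic of these assumptions can be illustrated with a small simulation. The sketch below is not taken from any study cited here: the allele frequency, effect sizes, and function names are hypothetical, and the outcome is treated as a continuous trait for simplicity. It compares a confounded observational slope with the Wald ratio, a standard single-variant instrumental variable estimator that divides the variant-outcome association by the variant-exposure association.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def slope(x, y):
    """Least-squares slope from regressing y on x."""
    return covariance(x, y) / covariance(x, x)


def simulate_cohort(n=50000, true_effect=0.3, seed=1):
    """Simulate genotype, exposure, and outcome with an unmeasured confounder.

    All numeric values are illustrative assumptions, not published estimates.
    """
    rng = random.Random(seed)
    genotype, exposure, outcome = [], [], []
    for _ in range(n):
        g = sum(rng.random() < 0.3 for _ in range(2))  # risk-allele count: 0, 1, or 2
        u = rng.gauss(0, 1)                            # confounder, independent of genotype
        x = 0.5 * g + u + rng.gauss(0, 1)              # the variant shifts the exposure
        y = true_effect * x + u + rng.gauss(0, 1)      # the variant acts only through x
        genotype.append(g)
        exposure.append(x)
        outcome.append(y)
    return genotype, exposure, outcome


def wald_ratio(genotype, exposure, outcome):
    """Variant-outcome slope divided by variant-exposure slope."""
    return slope(genotype, outcome) / slope(genotype, exposure)


genotype, exposure, outcome = simulate_cohort()
observational = slope(exposure, outcome)             # biased upward by the confounder
mendelian = wald_ratio(genotype, exposure, outcome)  # should sit near true_effect
print(f"observational estimate: {observational:.2f}")
print(f"Wald ratio estimate:    {mendelian:.2f}")
```

In this toy setting the observational slope absorbs the confounder's contribution, whereas the Wald ratio recovers a value close to the simulated causal effect because genotype is generated independently of the confounder.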

Strengths of Implementing Mendelian Randomization

While observational studies have been a useful tool to assess associations between risk factors and disease, there are significant limitations to their use. Observational studies are prone to confounding (a second exposure is related to both the first exposure and the outcome making it seem as if the first exposure and outcome were directly linked), reverse causation (the outcome may actually cause the putative exposure), or selection bias (the exposure is only associated with the outcome in a segment of the population due to alternative characteristics). Mendelian randomization is a way to overcome these limitations since the central principle of this technique is that genetic variants randomly segregate during meiosis. As a result, genetic variants are randomly distributed within a population limiting the presence of confounding factors when participants are stratified by genotype rather than the level of risk factor.

Additionally, Mendelian randomization eliminates the possibility of reverse causation influencing the relationship between a risk factor and an outcome. In observational studies, the direction of the relationship between a risk factor and an outcome often cannot be determined. In Mendelian randomization, genetic variants are inherited prior to the outcome, and genotype stays constant throughout life. As a result, the risk factor or biomarker of interest clearly precedes the development of disease. By this principle, Mendelian randomization can also assess the cumulative lifetime effect of a risk factor, such as lifetime exposure to LDL-C, on CVD risk. This addresses the limitations of evaluating the effect of longitudinal exposures on an outcome in observational studies, which have limited exposure/follow-up time.

Mendelian randomization has other advantages over RCTs as well. Although traditional RCTs are the gold standard for assessing the relationship between an exposure and an outcome, they can be difficult to implement or may not be feasible. Publicly available GWAS data can be used to determine the effect of variants associated with a risk factor on outcomes. These variants can then be tested in other populations or in publicly available datasets to assess the causal relationship between a risk factor and a disease, which is often much less costly than designing an RCT.

Limitations of Implementing Mendelian Randomization

While Mendelian randomization can have significant advantages when implemented properly, this study design has certain challenges that can hamper the interpretation of the results. It is important to be aware of these limitations when assessing the causal relationship between an exposure and an outcome. One of the most significant limitations is pleiotropy. Pleiotropy is the production of multiple effects by a single variant, all of which can influence the outcome. Vertical pleiotropy occurs when the variant associates with multiple biomarkers on the same pathway. Horizontal pleiotropy occurs when the variant associates with multiple biomarkers on different pathways. An example of horizontal pleiotropy arose in the study of a variant in the APOE gene. This variant was found to be associated with both C-reactive protein (CRP) levels and risk of CVD, an association that, taken alone, could lead to the erroneous conclusion that CRP is causal for CVD [8••]. However, the true causal factor was LDL-C, which is also influenced by the APOE genotype and is causally related to CVD. Pleiotropy is least likely to occur when the variant lies in the gene coding for the biomarker associated with the variant. Pleiotropy becomes more likely if the relationship between the variant and the outcome is complex or indirect. Absence of horizontal pleiotropy is an assumption of Mendelian randomization; if a SNP has such pleiotropic effects, it is not a valid instrument and cannot be used. However, an alternative SNP could still be chosen as a proxy for the exposure, allowing a Mendelian randomization approach.
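The APOE example can be mimicked with a toy simulation. In the sketch below, which uses hypothetical effect sizes and a continuous stand-in for CVD risk, one variant raises both a CRP-like biomarker and an LDL-C-like biomarker, but only the latter affects the outcome. Treating the variant as an instrument for the CRP-like biomarker still yields a clearly non-zero estimate, which is the signature of horizontal pleiotropy.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def wald_ratio(genotype, exposure, outcome):
    """Variant-outcome covariance divided by variant-exposure covariance."""
    return covariance(genotype, outcome) / covariance(genotype, exposure)


def simulate_pleiotropic_variant(n=50000, seed=7):
    """One variant, two biomarkers; only the LDL-like biomarker is causal.

    Effect sizes are illustrative assumptions, not published estimates.
    """
    rng = random.Random(seed)
    genotype, crp, ldl, risk = [], [], [], []
    for _ in range(n):
        g = sum(rng.random() < 0.3 for _ in range(2))  # risk-allele count
        crp_level = 0.4 * g + rng.gauss(0, 1)          # variant raises the CRP-like marker
        ldl_level = 0.4 * g + rng.gauss(0, 1)          # same variant raises the LDL-like marker
        outcome = 0.5 * ldl_level + rng.gauss(0, 1)    # CRP has no effect on the outcome
        genotype.append(g)
        crp.append(crp_level)
        ldl.append(ldl_level)
        risk.append(outcome)
    return genotype, crp, ldl, risk


genotype, crp, ldl, risk = simulate_pleiotropic_variant()
wald_crp = wald_ratio(genotype, crp, risk)  # spurious: true CRP effect is zero
wald_ldl = wald_ratio(genotype, ldl, risk)  # should sit near the simulated 0.5
print(f"apparent CRP effect: {wald_crp:.2f}")
print(f"LDL effect:          {wald_ldl:.2f}")
```

The spurious CRP estimate arises entirely from the variant's second pathway, which is why a variant in the gene encoding the biomarker itself is generally a safer instrument.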

A second limitation of Mendelian randomization is linkage disequilibrium (LD). LD is the non-random association of variants at different loci, induced by the tendency of alleles that are close together on a chromosome to be inherited together. By this principle, two SNPs are in LD if they are observed to be inherited together more often than expected by chance. The likelihood of LD increases if the SNPs are located close to one another on the chromosome. For example, a SNP affecting the expression of gene A may be in linkage disequilibrium with a SNP that affects expression of gene B. If the product of gene B is causally related to the disease outcome, it would be wrong to conclude that gene A (or the dependent biomarker) is responsible for the phenotype, although such an association could be found. As a result, to limit the influence of LD, ideal gene variants for Mendelian randomization are those not located in close proximity to other genes that also influence the outcome through alternate pathways.
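The degree of LD between two biallelic SNPs is commonly summarized by the coefficient D and the squared correlation r². A minimal sketch of the standard calculation from phased haplotype counts follows; the counts and the function name are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic loci from haplotype counts.

    n_AB counts haplotypes carrying allele A at locus 1 and B at locus 2,
    and so on for the other three combinations.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B          # excess of AB haplotypes over independence
    r_squared = d * d / denominator
    return d, r_squared


# Hypothetical counts: alleles A and B travel together more often than chance.
d_linked, r2_linked = linkage_disequilibrium(40, 10, 10, 40)
# Hypothetical counts under independence: every haplotype equally common.
d_free, r2_free = linkage_disequilibrium(25, 25, 25, 25)
print(f"linked loci:      D = {d_linked:.2f}, r^2 = {r2_linked:.2f}")
print(f"independent loci: D = {d_free:.2f}, r^2 = {r2_free:.2f}")
```

An r² near zero indicates that the two SNPs segregate independently, whereas a value near one means that either SNP could be driving an observed association.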

A third limitation of Mendelian randomization is that estimates of the causal effect can be biased or difficult to generalize. For example, causal effect estimates from Mendelian randomization studies can be thought of as a population-average effect (i.e., as if the intervention were applied to the entire population) and could differ from the effect of interventions applied to specific subgroups. In addition, weak genetic instruments, which explain too little variation in the exposure, could bias causal estimates or result in failure to establish causal relationships due to a lack of power. Use of large sample sizes or a genetic score combining multiple SNPs with additive or robust associations with the exposure of interest partially alleviates this concern. An additional caveat to interpretation of this technique is canalization. Canalization refers to the development of counter-regulatory mechanisms in response to a genetic variant. These mechanisms can take the form of genetic redundancy or alternative pathways that alter the relationship between a genetic variant and an outcome. For example, canalization can potentially be seen with variants in the CYP1A1 gene, which encodes a cytochrome P450 enzyme. The highly inducible form of this enzyme is associated with an increased risk of lung cancer. In light smokers with this variant, the risk of lung cancer is increased by 7-fold. In heavy smokers with this variant, however, the risk of lung cancer is increased by only 2-fold [9]. The differential impact of this variant may be the result of canalization, through which the body’s response to heavy vs. light smoking complicates the interpretation of adverse effects of smoking on lung cancer development.
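The weak-instrument concern described above is often quantified with the F-statistic from the regression of the exposure on the genetic instrument(s). The sketch below uses the usual relation between F, the proportion of exposure variance explained (R²), the sample size, and the number of instruments. The R² of 1% and the sample sizes are hypothetical, and the threshold of F greater than 10 mentioned in the comment is a widely used rule of thumb rather than a recommendation of this review.

```python
def first_stage_f(r_squared, n_samples, n_instruments=1):
    """F-statistic for regressing the exposure on the genetic instrument(s)."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    explained = r_squared / n_instruments
    unexplained = (1 - r_squared) / (n_samples - n_instruments - 1)
    return explained / unexplained


# A variant explaining 1% of exposure variance (hypothetical) at three sample sizes.
# A common convention treats F below about 10 as a sign of a weak instrument.
for n_samples in (500, 5000, 50000):
    f_stat = first_stage_f(0.01, n_samples)
    print(f"n = {n_samples:>6}: F = {f_stat:.1f}")
```

Because F grows roughly in proportion to sample size for a fixed R², the same variant can be a weak instrument in a small cohort and an adequate one in a large consortium.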

Mendelian Randomization in Cardiovascular Disease

Many studies have assessed the relationship between genetic variants, risk factors, and cardiovascular disease (CVD) (Table 1). One important risk factor that has been investigated using this methodology is obesity. Although obesity is frequently associated with CVD in observational studies, it is unclear if obesity causes CVD or if it is a confounder for other CVD risk factors. Using a composite of genetic variants associated with BMI, three studies have used Mendelian randomization to establish a potential causal link between BMI and CVD [13•, 23, 24]. Interestingly, a different study by Holmes et al. established a potential causal relationship between BMI and CVD risk factors, such as blood pressure, fasting glucose, inflammation, and type 2 diabetes mellitus (T2DM), but failed to show a similar relationship between obesity and CVD [25]. All of these studies have limitations, including lack of specificity of the obesity phenotype and uncertainty regarding the presence of canalization. Abdominal, or central, obesity may be a more CVD-specific obesity phenotype. Observational studies have evaluated the association between abdominal adiposity (using waist-hip ratio [WHR] as a surrogate measure) and CVD with varying results [35,36,37]. Emdin et al. recently developed a central obesity genetic risk score comprising 48 SNPs associated with WHR adjusted for BMI. The genetic risk score was positively associated with both T2DM and CVD outcomes. However, the score was also associated with alternative CVD risk factors such as triglyceride-rich lipoproteins, insulin, glucose, and systolic blood pressure. This alone does not invalidate the use of obesity variants as genetic instruments for testing the causal relationship between BMI or WHR and CVD. On the contrary, this suggests that alternative CVD risk factors could be direct consequences of obesity, rather than simply correlative factors. 
Similar findings were seen in a more recent Mendelian randomization study conducted by Dale et al. [13•]. Obesity has also been associated with atrial fibrillation in multiple observational studies, and longitudinal exposure to obesity is associated with increased atrial fibrillation [38,39,40,41,42,43]. Chatterjee et al. used Mendelian randomization to evaluate the causal relationship between obesity and atrial fibrillation. Their study assessed the association between atrial fibrillation and FTO, the genetic locus that has been shown to have the strongest association with obesity [44]. Additionally, they evaluated the association between atrial fibrillation and a weighted composite of 39 SNPs associated with obesity. Their results showed a positive association between obesity and atrial fibrillation using both methods, strengthening the evidence for a causal relationship [45•]. However, there were several caveats to this study; for example, the association between the genetic variants and BMI was stronger at younger ages. Additionally, SNPs that were recently found to be associated with BMI were not included in this study.
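A weighted composite of SNPs, such as the obesity scores described above, is typically computed as the sum of each individual's risk-allele counts multiplied by per-SNP weights, usually the reported effect of each allele on the exposure. The following sketch shows only the arithmetic: the three weights and the genotypes are hypothetical and do not correspond to the 39 or 48 SNPs used in the cited studies.

```python
def weighted_genetic_score(allele_counts, weights):
    """Sum of risk-allele counts (0, 1, or 2 per SNP) times per-SNP weights."""
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is required per SNP")
    if any(count not in (0, 1, 2) for count in allele_counts):
        raise ValueError("allele counts must be 0, 1, or 2")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


# Hypothetical per-allele effects on BMI for three SNPs (not published values).
snp_weights = [0.08, 0.03, 0.05]

# Hypothetical genotypes for two individuals, in the same SNP order.
score_a = weighted_genetic_score([2, 1, 0], snp_weights)
score_b = weighted_genetic_score([0, 0, 1], snp_weights)
print(f"individual A score: {score_a:.2f}")
print(f"individual B score: {score_b:.2f}")
```

Combining many variants in this way increases the proportion of exposure variance explained by the instrument, at the cost of a greater chance that at least one included variant is pleiotropic.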

Table 1 Recently published studies using Mendelian randomization to determine the relationship between emerging risk factors and cardiovascular disease

Blood cholesterol levels are an established risk factor for CVD. While it has been accepted that LDL-C plays a causal role in the development of CVD, there are a number of unanswered questions regarding this relationship [10•]. Despite reductions in LDL-C, patients on statin therapy can continue to have CVD events. In one Mendelian randomization study, genetic variants associated with lower LDL-C were used to assess the effect of long-term LDL-C reduction on CVD. The investigators found that exposure to lower LDL-C earlier in life was protective against the development of CVD [11], suggesting early initiation of statin therapy may be beneficial. Mendelian randomization was also utilized to study the relationship between T2DM and LDL-C. Statin therapy has been associated with an increased risk of developing T2DM [46, 47]. However, it is unclear if the risk of T2DM is due to LDL-C levels or statin use. Using variants associated with LDL-C, White et al. found reduced risk of T2DM with higher LDL-C. This finding contributes to the evidence that lipid-lowering therapy may have unintended cardiometabolic consequences [12]. Mendelian randomization studies have also been employed to understand the relationship between triglycerides and CVD. Observational studies have shown a positive association between triglycerides and risk of CVD, but RCTs of triglyceride-lowering medications have been negative [48, 49]. Two Mendelian randomization studies have been performed to better understand this discordance. Holmes et al. used a genetic risk score consisting of variants associated with triglyceride levels. They demonstrated that genetic predisposition for higher triglyceride levels is associated with a higher risk of CVD [10•]. Similar findings were demonstrated in a study performed by White et al. [12]. Mendelian randomization has also been used to assess the relationship between high-density lipoprotein cholesterol (HDL-C), cholesteryl ester transfer protein (CETP), and CVD. 
Inhibition of CETP is known to result in higher HDL-C levels. GWAS have also shown a correlation between polymorphisms in the CETP locus and variation in lipid levels. Mendelian randomization studies of the CETP polymorphism and coronary artery disease (CAD) risk have shown mixed results. One study showed no risk reduction with higher genetically determined HDL-C levels [21, 50,51,52,53]. In contrast, a second study utilizing this polymorphism showed a reduced risk of CVD with lower genetically determined circulating CETP [22]. Additional testing evaluating the relationship between HDL-C and risk of CVD has been inconclusive. Using genetic variants associated with HDL-C, two studies failed to show a causal association between HDL-C and CVD using Mendelian randomization [10•, 12].

Plasma levels of C-reactive protein (CRP) are independently associated with risk of coronary heart disease, but whether CRP is causally associated with coronary heart disease or merely a marker of underlying atherosclerosis is uncertain. As briefly mentioned above, Elliott et al. carried out a genome-wide association (n = 17,967) and replication study (n = 13,615) to identify genetic loci associated with plasma CRP concentrations. They then carried out a Mendelian randomization study of the most closely associated SNPs in the CRP locus and published data on other CRP variants involving a total of 28,112 cases and 100,823 controls, to investigate the association of CRP variants with coronary heart disease. Polymorphisms in five genetic loci were strongly associated with CRP levels. However, genetic variants in the CRP locus showed no association with coronary heart disease: OR, 1.00; 95% CI, 0.97 to 1.02. The lack of concordance between the effect on coronary heart disease risk of CRP genotypes and CRP levels therefore argues against a causal association of CRP with coronary heart disease [8••].

Many other biomarkers linked to CVD risk have been investigated using Mendelian randomization, including markers of metabolic regulation, hemodynamic stress, inflammation, hormones, and vitamin levels. For example, adiponectin is a protein secreted by mature adipocytes that is downregulated in obese individuals [54]. In several observational studies, low adiponectin was associated with insulin resistance, T2DM, and dyslipidemia [55,56,57]. However, the association between adiponectin and CVD has been variable [58,59,60,61]. Utilizing Mendelian randomization, Borges et al. evaluated the effect of SNPs associated with adiponectin levels on CVD. Their results did not show a relationship between genetically determined adiponectin levels and CVD [26]. Another example is N-terminal pro-B-type natriuretic peptide (NT-proBNP), a marker of hemodynamic stress and a prognostic biomarker in a variety of cardiac diseases [62,63,64,65,66]. GWAS have identified a SNP in the promoter of the natriuretic peptide precursor B gene associated with NT-proBNP levels. Using data from the PLATelet inhibition and patient Outcomes (PLATO) trial, no association was seen between genetically determined NT-proBNP levels and the primary composite outcome of cardiovascular death, myocardial infarction, and stroke [27]. This provides evidence that NT-proBNP may be a marker, rather than a mediator, in the etiological pathway to CVD. Many other biomarkers have been studied with mixed results (Table 1), although the clinical implications of many of these biomarkers for risk screening, prevention, and treatment remain unclear.

Conclusions

As the use of Mendelian randomization becomes increasingly common, understanding this method of clinical research has become ever more important. This technique has been used to evaluate the potential etiological role of emerging risk factors for cardiovascular disease. It has also been used in creative and interesting ways to address questions that would be difficult to answer in clinical trials, such as whether genetic risk for coronary disease is modifiable by lifestyle [67]. However, accurate interpretation of the results of these and future studies will require a keen understanding of the advantages and disadvantages of this technique. As our understanding of genetics and genetic variants improves, the ability to identify additional risk loci for cardiovascular disease will increase, opening new avenues for investigation using Mendelian randomization. It is important to note that this technique is still inherently observational in nature and that the best method to answer many clinical questions, when feasible, is still a randomized controlled trial. While Mendelian randomization will remain an important tool in evaluating causality in epidemiologic research and provide further insight into risk factors for CVD, understanding the strengths and limitations of the technique is important for appropriate interpretation of results and drawing clinical conclusions.