Keywords

Introduction

As outlined in some of the preceding chapters of this book, a number of genes responsible for dyslipidemias have been discovered because they harbor rare mutations that cause a very small number of individuals—typically in one or a few families—to have profoundly dysregulated blood lipid concentrations. In other words, these few individuals have monogenic or “Mendelian” dyslipidemias caused by the aberrant action of single genes. Although researchers have learned much about cholesterol metabolism by studying these genes over the past few decades, this work has left unanswered the question of what causes a significant proportion of the general population to have what might be thought of as garden variety dyslipidemias—high but not exceptionally high blood low-density lipoprotein cholesterol (LDL-C) concentrations, low but not exceptionally low blood high-density lipoprotein cholesterol (HDL-C) concentrations, and/or high but not exceptionally high blood triglyceride (TG) concentrations. It has been a popular belief that in these patients, in whom there do not seem to be single aberrant genes driving the abnormal lipid levels, the dyslipidemias are polygenic in nature, i.e., caused by the actions of multiple genes in tandem. By this reasoning, if the functions of 5, 10, 20, or 50 genes were slightly dysregulated, the combined effect would result in abnormal lipid levels.

Although the notion of polygenic dyslipidemias is attractive to many researchers, it had not been possible to test this model by detecting and measuring the slight dysregulation of each of the numerous genes that would be involved—indeed, it was not even clear which genes out of the roughly 20,000 genes in the human genome might be involved. In the past few years, the completion of the Human Genome Project and advances in genotyping and sequencing technologies have made it possible for the first time to do unbiased searches for genes that make small contributions to blood lipid levels in dyslipidemia patients. This chapter focuses on the methodology known as the genome-wide association study (GWAS) and summarizes the advances in knowledge regarding cholesterol metabolism that have emerged from the application of this methodology to tens of thousands of people and subsequent work to identify and characterize novel genes involved in dyslipidemias.

A Primer on Genome-Wide Association Studies

The human genome is roughly three billion DNA bases in size, spanning 23 chromosome pairs; the vast majority of the sequence is identical across the human species . What makes each individual unique is a large number of DNA variants distributed throughout the genome. Some of these DNA variants are extremely rare and have large effects on gene function; as described above, these variants can be responsible for monogenic disorders. Other DNA variants are quite common, occurring in >  1 % of the general population and, in some cases, the majority of the people in a population. Most of these common DNA variants are of no functional consequence, but some lie either within or near genes and have small effects on gene function. These variants do not alter gene activity enough to cause disease by themselves, but instead need to be combined with other gene variants or with environmental factors in order for disease to occur.

All of these common DNA variants are termed polymorphisms, of which there are several varieties. The most relevant to the use of GWAS to study polygenic disorders is the single nucleotide polymorphism (SNP) , in which a single base pair in the DNA differs from the usual base pair at that position. There are an estimated 11 million SNPs across the human genome, occurring on an average every few hundred base pairs. A local area on a chromosome around an SNP is termed a locus. Each person has two copies of each locus because of the pairing of chromosomes (the exceptions are loci on the X or Y chromosome in men, who have only one of each). A person’s genotype at an SNP is the identity of the base pair position for each of the two copies—also termed alleles—of the SNP on paired chromosomes; thus, a genotype is typically a combination of two alleles. These two alleles may be identical (termed homozygosity) or different (termed heterozygosity).

Groups of SNP alleles near genes tend to stay together with the genes as they are passed along from parents to children for generation after generation, over thousands of years. Thus, even if it is not known which gene contributes to a disease, one can use an SNP that is not in the gene—but is near to and therefore linked to the gene—as a “tag” for the gene. In the past decade, the technology has become available to determine the genotypes at hundreds of thousands of “tag” SNPs in a person’s DNA in a single experiment using a “gene chip.” By applying the gene chips to thousands of individuals, some with a disease and some without a disease, researchers are able to identify tag SNPs that are associated with disease. This is the principle underlying GWAS.

As an example of how a GWAS might be performed, imagine a study in which DNA samples are collected from several thousand individuals with high blood LDL-C levels (e.g., >  200 mg/dL) and several thousand individuals with low blood LDL-C levels (e.g., <  60 mg/dL in the absence of lipid-lowering medications). For each study participant, a gene chip would be used to determine the genotypes of more than 1 million SNPs across the genome. Although the use of gene chips on thousands of people would yield billions of pieces of data, the statistical methods to analyze these data are conceptually straightforward. Computer software is used to analyze each of the 1 million SNPs separately; for each SNP, the question is asked whether allele “A” and allele “B” of the SNP occur in equal proportions in the high LDL-C cohort and in the low LDL-C cohort. For the vast majority of the 1 million SNPs, no difference in the allele proportions would be observed. For a particular SNP, however, there might be a statistically significant difference in the allele proportions such that allele “A” occurs more commonly in the high LDL-C cohort than in the low LDL-C cohort. Because the SNP tags any nearby genes, the conclusion would be that there is a DNA variant in one of the local genes that influences that gene’s function in such a way as to influence blood LDL-C levels. From a researcher’s perspective, the SNP acts as a “signpost” indicating that somewhere in the locus lies the key to a biological mechanism that contributes to dyslipidemia in the general population. (In fact, some of the million tested SNPs are very close to one another and effectively tag the same locus, so it is the locus rather than any individual SNP that is considered to be a GWAS discovery.)

In practice, it would be difficult to recruit thousands of study participants with either very high or very low LDL-C levels due to their relative scarcity in the population. In an alternative study design, thousands of people from the general population at large would be recruited, and computer analysis would be performed using the blood LDL-C levels as a continuous rather than a categorical variable. The question would be framed in a different way: For a given SNP, what are the average LDL-C levels for individuals who are homozygous for allele “A” versus individuals who are heterozygous for alleles “A” and “B” versus individuals who are homozygous for allele “B”? If there are statistically significant differences among the three genotype groups, the SNP/locus would be considered to be associated with blood LDL-C levels.

It is worth noting that in either of these GWAS study designs, there are effectively a million experiments being performed, one for each individual SNP. As such, the traditional statistical significance threshold of P <  0.05 is inappropriate, since by that threshold 50,000 SNPs (5 % of 1,000,000) would be associated with LDL-C levels by chance alone, with most if not all being false positives. Accordingly, GWAS researchers insist on the statistical significance threshold being much more stringent, e.g., by adjusting for the number of experiments (known as the Bonferroni correction) such that P should be less than 0.05 ÷ 1,000,000, or P <  5 × 10−8, for the SNP/locus to be regarded as having a true association.

GWAS on Blood Lipid Traits

One consequence of the GWAS study design is that it becomes increasingly powered to detect associations as the number of study participants grows. Accordingly, the past several years have seen successive reports of increasingly larger GWAS studies of blood lipid concentrations, beginning with a few thousand individuals and culminating in more than 100,000 individuals. As such, the list of reported lipid-associated GWAS loci has substantially grown over that time period.

The first published high-density GWAS on blood lipid concentrations was performed with data from about 3000 individuals of European descent in the Diabetes Genetics Initiative. This study identified one statistically significant locus each for three lipid traits—LDL-C, HDL-C, and TG [1]. The LDL-C locus contained the apolipoprotein E (APOE) gene, and the HDL-C locus contained the cholesteryl ester transfer protein (CETP) gene—both well-established regulators of lipoprotein metabolism. The TG locus contained no previously identified lipid regulators; the single gene in the locus is glucokinase regulatory protein (GCKR). Subsequent functional experiments pointed to a coding missense variant in the GCKR gene as being responsible for the TG association [2, 3]. The fact that this first GWAS identified two known lipid genes provided strong validation of the study design as well as giving confidence that the GCKR locus was a true positive (and novel) finding.

A second set of published GWAS studies for blood lipid concentrations added the Finland–US investigation of NIDDM (noninsulin-dependent diabetes mellitus) genetics study (FUSION) and SardiNIA cohorts to the Diabetes Genetics Initiative for a cohort of about 9000 individuals of European descent [4, 5]. As a method of increasing the studies’ power to detect statistically significant associations, the researchers used a staged approach: They selected the SNPs with the best P values from the analysis of data from the initial 9000 individuals and genotyped just those SNPs in an additional 18,000 individuals of European descent from several other cohorts. This approach yielded a total of 19 statistically significant lipid-associated loci. In addition to the three loci identified by the first GWAS (APOE, CETP, GCKR), the list now included many more well-established lipid regulators, including apolipoprotein A-I (APOA1), apolipoprotein B (APOB), LDL receptor (LDLR), lipoprotein lipase (LPL), proprotein convertase subtilisin/kexin type 9 (PCSK9), and 3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGCR). The last is noteworthy because it encodes the enzyme that is targeted by the statin class of LDL-C-lowering drugs. These GWAS studies also identified six novel loci, two of which were also discovered in simultaneously published GWAS studies that focused solely on LDL-C (a locus on chromosome 1p13) or TG (a locus on chromosome 7q11) [68].

A third set of GWAS studies on blood lipid concentrations, analyzing data from up to 40,000 individuals of European descent, identified more than 30 lipid-associated loci, half of them harboring well-established lipid regulators, the other half novel, continuing the trend observed from the first two sets of studies [911].

Finally, a definitive GWAS combining data from all of the cohorts with which the previous three sets of GWAS studies had been performed—collectively termed the Global Lipids Genetics Consortium (GLGC)—used more than 100,000 individuals of European descent to identify a total of 95 loci associated with one or more blood lipid concentrations—LDL-C, HDL-C, TG, and/or total cholesterol (Table 17.1) [12]. Many of these loci were also shown to be associated with lipids in other ethnic groups, including East Asians, South Asians, and African Americans. About two thirds of these loci are novel. Among the remaining one third of the loci are 16 genes that have been implicated in familial lipid disorders (Table 17.2), highlighting that the same gene can contribute to both monogenic and polygenic dyslipidemias, with a rare variant in the gene greatly perturbing its function and causing disease, and with a common variant in the gene mildly perturbing its function and combining with common variants in other genes to collectively produce disease.

Table 17.1 GWAS loci associated with blood lipid traits
Table 17.2 Monogenic dyslipidemias caused by genes with nearby common DNA variants identified in lipid GWAS

In support of the polygenic model of disease, the GLGC study calculated SNP “risk scores” that summarized the number of LDL-C-raising, HDL-C-raising, or TG-raising SNP alleles in each individual who had high or low LDL-C levels (mean 219 mg/dL vs. mean 110 mg/dL), high or low HDL-C levels (mean 90 mg/dL vs. mean 36.2 mg/dL), or high or low TG levels (mean 1,079 mg/dL vs. mean 106 mg/dL). Individuals with LDL-C risk scores in the top quartile of the combined GLGC cohort were 13 times as likely to have a high-LDL-C level than individuals with scores in the bottom quartile; individuals with HDL-C risk scores in the top quartile of the combined GLGC cohort were four times as likely to have a high-HDL-C level than individuals with scores in the bottom quartile; and individuals with TG risk scores in the top quartile of the combined GLGC cohort were 44 times as likely to have a high-TG level than individuals with scores in the bottom quartile [12]. These results confirm that the additive effects of multiple common variants do indeed contribute to dyslipidemias in many individuals.

In spite of these data, there have continued to be criticisms that the common variants discovered by GWAS have little clinical relevance, since the effects on gene function are small. Besides disregarding the polygenic model of disease, such arguments ignore the possibility that a GWAS gene can turn out to be clinically important if its activity is modulated by a large degree with pharmacological intervention, with HMGCR being the prototypic example. If the discovery of statins had not predated the GWAS era, the finding that HMGCR is in a locus associated with LDL-C levels would have suggested to researchers that pharmacological inhibition of HMGCR might be a viable therapeutic strategy. By this reasoning, some of the novel lipid GWAS genes may emerge as clinically useful drug targets, as described below.

Novel Genes That Have Emerged from GWAS

Although the 95 lipid-associated loci identified by the GLGC study potentially point to dozens of novel genes, to date only a few such candidates have been studied with functional experimentation. We focus on three genes from which novel insights into lipoprotein metabolism have started to be gleaned.

SORT1. Perhaps the most compelling of the novel lipid GWAS loci lies on chromosome 1p13. SNPs in the 1p13 locus have among the strongest associations with LDL-C of any loci in the genome. Individuals who are homozygous for the more common allele of one of these SNPs have an average 16 mg/dL higher blood LDL-C concentration than individuals who are homozygous for the less common allele [4]. The former also have about a 40 % increase in risk of myocardial infarction compared to the latter [13]. Thus, the 1p13 locus is strongly associated with both blood lipids and the most serious clinical phenotype resulting from dyslipidemias.

The 1p13 locus harbors several genes including CELSR2, PSRC1, and SORT1, none of which had previously been linked to lipid metabolism in the pre-GWAS era. Functional experimentation with these genes in mouse models revealed that SORT1 (which encodes the sortilin protein) modulated blood lipid levels when its expression was either increased or decreased in mouse liver, or when it was deleted in mice altogether [14, 15]. Cell-based experiments have discovered two roles for sortilin in lipoprotein metabolism. First, it regulates blood LDL-C levels by reducing the secretion of VLDL particles from the liver into the bloodstream, where the VLDL particles are ultimately converted to LDL particles [14]. It appears to do this in hepatocytes by directly binding apolipoprotein B (apoB)—the core protein of VLDL/LDL particles—in the endoplasmic reticulum/Golgi apparatus and trafficking apoB to the endolysosomal compartment for degradation, thereby reducing the number of VLDL particles produced and, ultimately, secreted [14, 16]. Second, sortilin appears to be able to function as an alternative LDL receptor by binding to apoB-carrying LDL particles in the bloodstream and facilitating the endocytosis of the particles into the cell, followed by their degradation in the endolysosmal compartment [16]. Both mechanisms should have the effect of lowering blood LDL-C concentrations (Fig. 17.1), consistent with the association of the 1p13 locus with LDL-C in human populations.

Fig. 17.1
figure 1

Model of sortilin actions in hepatocytes a Lipoprotein production, secretion, uptake, and degradation in hepatocytes. b Sortilin decreases very low-density lipoprotein (VLDL) particle secretion by trafficking nascent particles to the endolysosomal compartment. It also acts as an alternative LDL receptor to facilitate endocytosis of LDL particles into the cell and degradation in the endolysosomal compartment. The consequence of both actions is to reduce LDL cholesterol levels in the blood and thereby reduce the risk of myocardial infarction (MI)

The strong association of the 1p13 locus with myocardial infarction suggests SORT1 as a plausible clinical drug target, as modulation of the gene would be expected to not only reduce blood LDL-C levels but also the risk of myocardial infarction. However, the biological evidence indicates that increasing sortilin activity in liver would be a therapeutically useful intervention, which may be difficult to achieve with traditional therapies (in contrast to inhibiting an enzyme’s activity, as is the case with statin drugs and HMGCR).

TRIB1. Another compelling novel lipid GWAS locus lies on chromosome 8q24. SNPs in this locus have a distinctive pattern of association with multiple lipid traits, with the minor allele conferring decreased LDL-C levels, increased HDL-C levels, and decreased TG levels—all changes that are epidemiologically associated with lower risk of coronary artery disease (CAD) [4, 5]. Perhaps not surprisingly, then, the same SNPs are themselves associated with CAD risk [12].

There is a single gene in the locus, TRIB1 (tribbles homolog 1), which had not previously been linked to lipid metabolism in the pre-GWAS era. Two types of functional experimentation have been undertaken in mice [17]. First, Trib1 knockout mice (in which the gene has been deleted) were observed to have increased blood levels of cholesterol and TG. Second, mice in which Trib1 was overexpressed in liver showed the opposite effect—decreased blood levels of cholesterol and TG. The mechanism appears in part to be related to hepatic production of VLDL particles; Trib1 overexpression resulted in decreased particle production and secretion, whereas the knockout mice displayed the opposite effect [17]. This was reproduced in cell-based experiments in which TRIB1 was overexpressed in cultured human hepatoma cells, resulting in decreased apoB particle secretion. Furthermore, Trib1 appears to reduce the hepatic expression of genes involved in lipogenesis, including Acc1, Fasn, and Scd1 [17]. The mechanism(s) through which the protein produces these effects remains to be defined.

As with SORT1, it would appear that increasing TRIB1 expression in the liver would be a therapeutically useful intervention. Indeed, a TRIB1-targeting strategy might be of even greater clinical benefit than a SORT1-targeting strategy since GWAS data suggest that TRIB1 is linked to decreased LDL-C, increased HDL-C, and decreased TG in the blood, whereas SORT1 appears to be linked solely to decreased LDL-C. Moreover, the genetic association of SNPs in the TRIB1 locus with CAD risk offers reassurance that targeting TRIB1 would have the desired clinical outcome—a reduction in CAD rather than just modulation of blood lipid levels.

GALNT2. A novel lipid GWAS locus on chromosome 1q42 that is associated with both HDL-C and TG levels in the blood harbors a single gene, GALNT2, which encodes UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase-2, a member of a family of proteins that are involved in the initiation of mucin-type O-linked glycosylation on various proteins. As with SORT1 and TRIB1, no prior role of GALNT2 in lipoprotein metabolism was known. Unlike SORT1 and TRIB1, SNPs in the locus do not have any significant association with CAD.

Functional experimentation with GALNT2 in mice in which the gene was either overexpressed or knocked down in the liver indicated that the gene negatively regulates blood HDL-C levels [12]. The connection of GALNT2 with blood HDL-C and TG concentrations in humans was confirmed by the identification of two families with individuals with dyslipidemia—very high HDL-C and low TG levels—who were heterozygous for a loss-of-function missense mutation in GALNT2 [18]. Physiological studies of these individuals suggested that they had improved postprandial TG clearance due to impaired glycosylation of apolipoprotein C-III (apoC-III), which normally inhibits LPL, which itself hydrolyzes and thus decreases TG in the blood. Thus, GALNT2 represents another example of a gene for which common DNA variants produce a small effect on blood lipid levels and rare DNA variants can single-handedly produce dyslipidemias. However, to date there is no evidence that GALNT2 is associated with a change in CAD risk, and thus its relevance as a therapeutic target remains in question.

Conclusion

The past few years have witnessed remarkable progress in human genetics towards the understanding of polygenic disorders. Blood lipid concentrations represent some of the most successfully studied clinical traits with the use of GWAS. Whereas our prior knowledge of the genetics of dyslipidemias was limited to rare variants causing monogenic disorders, GWAS studies have now identified dozens of novel loci that appear to contribute to polygenetic dyslipidemias. A key challenge will be to determine the molecular mechanisms by which these loci influence blood lipid concentrations. Although functional studies of the novel loci are starting to yield new insights into lipoprotein metabolism, as demonstrated by the examples of the genes SORT1, TRIB1, and GALNT2, there are undoubtedly as many new discoveries to be made with respect to lipoprotein metabolism as have been made in the past few decades of investigation.