7.1 Introduction

Similar to other abused substances, genetic factors contribute substantially to cannabinoid use and dependence. Heritability for abuse of cannabinoids, primarily cannabis and its derivatives, ranges from 30% to 80% [1, 2]. Heritability measures the contribution of segregating gene variants to the total variation in a trait or phenotype of interest. For substance use disorders heritability has often been estimated from twin studies in which the concordance rate for cannabis-related traits is compared in monozygotic (genetically identical) and dizygotic (fraternal) twins. Higher concordance of a trait in monozygotic versus dizygotic twin pairs indicates a substantial contribution of genetic factors relative to environmental factors. Thus, high heritability is an indication that genetic factors contribute significantly to cannabinoid use disorders (CUDs). However, few of these factors have been elucidated. In this chapter we review the growing list of genes associated with CUDs based on recent candidate gene, linkage, and genome-wide association studies in humans. Finally, we discuss reasons for and possible solutions to address the paucity of known genetic factors contributing to CUDs.

7.1.1 What Is Cannabis Use Disorder?

The term “substance use disorder” is derived from the Diagnostic and Statistical Manual of Mental Disorders: DSM-5 [3]. This manual defines a set of criteria used to diagnose problematic and recurrent use of drugs or alcohol that can impact health and social well-being. Symptoms of CUD include disruption in normal function caused by cannabis use and development of tolerance, craving, and/or withdrawal symptoms associated with increased or continued use. A cluster of withdrawal symptoms, including sleep disruptions, anxiety, anger, and depression, has been associated with early abstinence from cannabis and may contribute to continued use. As of 2018, there are no FDA-approved treatments for CUD.

Cannabis is one of the most widely consumed psychoactive substances worldwide [4]. In the United States, policy changes in individual state legislatures, beginning in 1996 for medical use and continuing in 2012 for recreational use, are likely to increase cannabis use in this country. As the number of users increases, so does the risk of CUDs. According to recent epidemiological studies in the United States, CUD impacts ~6% of the population [5]. A better understanding of the genetic factors contributing to the disease is key both to identifying individuals at risk for CUD and to developing pharmacological interventions for CUD.

7.1.2 Introduction to Human Genetic Association Studies

Substance use disorders, like CUDs and other psychiatric diseases, are complex traits that are driven by the actions and interactions of multiple genetic and environmental factors. In the case of genetic factors, variation in heritable traits is caused by inheritance of different gene alleles (e.g. polymorphisms or sequence variants) that confer differential gene regulation or function. Human genes often contain multiple polymorphisms which can impact a single base (single nucleotide polymorphism or SNP) or several (insertion or deletion). These polymorphisms are typically biallelic and exist in one of two forms that represent the ancestral sequence or the altered sequence. The frequency of each allele in the population being sampled is generally a major consideration. Alleles with a minor allele frequency less than 1% will be difficult to study because few individuals with the minor allele exist in the population. In this case, detection of significant associations between inheritance of alleles and trait variation will be difficult even with moderate sample sizes. For this reason, the majority of associations identified between human disease and allelic variation involve common variants with relatively abundant allele frequencies in most human populations. Although common, these variants typically have low penetrance (probability that inheritance of the variant will cause the phenotype being measured). Genetic studies of human disease over the past decade have revealed that most human diseases are complex polygenic traits that result from the inheritance of many small effect risk alleles acting in an additive fashion [6]. These associations typically involve common variants, but rare alleles can also impact disease. Rare alleles are present at low frequency or only in specific human populations, and can only be identified using specialized study designs and populations. 
It is important to note that for CUDs, and most other complex diseases, the absolute number of risk alleles, their frequency in the population, and their individual contribution to disease risk (together, the genetic architecture) in humans are unknown. Thus, the goal of human genetic association studies is to evaluate the genetic architecture of disease and identify risk genes and alleles in order to identify vulnerable individuals and design effective intervention or treatment strategies.
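As a minimal sketch of the allele-frequency consideration described above, the following Python snippet computes the minor allele frequency of a biallelic variant from genotype counts and checks it against the 1% cutoff mentioned in the text. The function name and the genotype counts are hypothetical and are not taken from any study discussed in this chapter.

```python
def minor_allele_frequency(n_hom_a, n_het, n_hom_b):
    """Return the minor allele frequency of a biallelic variant.

    Each individual carries two alleles: a homozygote contributes two
    copies of one allele, a heterozygote one copy of each allele.
    """
    n_alleles = 2 * (n_hom_a + n_het + n_hom_b)
    if n_alleles == 0:
        raise ValueError("no genotyped individuals")
    freq_b = (2 * n_hom_b + n_het) / n_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq_b, 1.0 - freq_b)


# Hypothetical sample: 9000 A/A, 950 A/B, and 50 B/B individuals.
maf = minor_allele_frequency(9000, 950, 50)
is_common = maf >= 0.01  # 1% cutoff used in the text
```

With these illustrative counts the B allele accounts for 1050 of 20,000 allele copies, so the variant would be treated as common.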

Three main approaches have been used towards the goal of identifying the genes and gene variants associated with CUDs in humans. These are candidate gene association studies (CGAS), family-based linkage studies (FLS), and genome-wide association studies (GWAS). In the first approach (CGAS), known allelic variants within a candidate gene are tested for an association with the disease. Often the candidate gene is selected based on a priori evidence regarding involvement in a disease-related pathway and/or the presence of known functional variants. In a typical CGAS, inheritance of candidate gene alleles is tested for association with disease risk using statistical models and a case-control or family-based study design. A benefit of CGAS is that only a few associations are tested at a time, resulting in less correction for multiple testing and more significant association scores. A caveat of CGAS is the biased and limited experimental design, which may lead to inflation of the contribution of the candidate gene to the disease phenotype. In contrast, FLSs and GWASs represent unbiased methods to identify gene variants contributing to phenotypic variation or disease risk.
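To make the case-control logic of a CGAS concrete, the sketch below tests a single biallelic variant by comparing allele counts in cases and controls with a Pearson chi-square test on one degree of freedom. This is only one simple statistical model among many that such studies use. The function name and all counts are hypothetical.

```python
import math


def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns the chi-square statistic, its p-value, and the odds ratio
    for carrying the minor allele in cases relative to controls.
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts: 500 cases and 500 controls (1000 alleles each).
chi2, p, odds = allelic_association(240, 760, 300, 700)
```

An odds ratio below 1 in this illustration would correspond to a minor allele that is less frequent in cases, that is, a putatively protective allele. Because only one test is performed, no multiple-testing penalty applies here.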

FLSs represent a genome-wide approach to identify loci that are associated with a trait or disease risk. FLSs compare pedigrees among families to assess the likelihood that affected individuals share the same allele at a polymorphic marker more often than would be expected by chance when compared to unaffected relatives. Markers found to be significantly linked to the disease or trait by FLS are postulated to be near the causal gene variant. However, the region of linkage in FLS is often quite large. Historically this has been due to smaller sample sizes and marker panels in the size range of hundreds to thousands. For these reasons, most FLS of CUDs typically result in the identification of large (~10 Mb) linked regions that contain hundreds of potential candidate genes. Resolution to a single candidate gene in the larger linked region is not possible.

In contrast, a typical GWAS tests the association between a phenotype and allele frequency at hundreds of thousands or millions of individual polymorphisms (typically SNPs). Most mammalian genomes have been sequenced and there are many high-throughput sequencing and genotyping platforms available to identify the allelic variation (genotype) at each locus on a global scale for individuals in a population. Currently, the most cost-effective and high-throughput strategies include the use of genotyping microarrays that profile inheritance at millions of common variants. Due to linkage disequilibrium (LD), not all variants need to be genotyped. LD occurs because regions of the genome are inherited as small blocks of DNA from either parent. Polymorphic genetic markers contained within each LD block will be highly correlated with one another because they are inherited as a unit. Polymorphisms in adjacent blocks will be less well correlated. Therefore, representative tag SNPs can be used as a proxy for all variants within a region of high LD (regions with high LD that are inherited together are referred to as haplotype blocks). Genotypes for adjacent polymorphisms can be imputed later for individuals based on population haplotypes. Although it is assumed that genetic polymorphisms modulate gene function or expression, the causal gene variant is generally not known following a GWAS. It is important to remember that GWAS can identify candidate gene loci, but cannot generally identify causal variants, the impact of variants on gene function, or the biological mechanism by which the gene contributes to disease.
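The correlation between markers that makes tag SNPs useful is commonly summarized by the squared correlation coefficient r² between two loci. The sketch below computes r² from phased haplotypes coded 0/1 at two biallelic SNPs. The function name and the toy haplotype sets are hypothetical, and real analyses would estimate LD from reference panels rather than hand-built lists.

```python
def ld_r_squared(haplotypes):
    """Compute r^2 between two biallelic SNPs from phased haplotypes.

    `haplotypes` is a list of (allele_at_snp1, allele_at_snp2) tuples,
    with each allele coded as 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n          # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n          # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    d = p_ab - p_a * p_b                             # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Two SNPs inherited as a unit: one is a perfect proxy (tag) for the other.
same_block = [(1, 1)] * 30 + [(0, 0)] * 70
# Two SNPs in different blocks: alleles are nearly independent.
different_blocks = [(1, 1)] * 10 + [(1, 0)] * 20 + [(0, 1)] * 20 + [(0, 0)] * 50

r2_tagged = ld_r_squared(same_block)
r2_untagged = ld_r_squared(different_blocks)
```

In the first toy set r² is essentially 1, so genotyping either SNP captures the other. In the second it is close to 0, so neither SNP would serve as a tag for the other.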

Relative to CGAS, FLS and GWAS are unbiased approaches that can lead to the detection of multiple loci containing genes and alleles that contribute to risk. Each locus exerts a small effect on disease risk (i.e. CUDs) and the sum of all risk alleles, referred to as polygenic risk, is a better overall predictor of disease risk that captures more of the genetic variability or heritability of the disease. However, the large number of tests performed in FLS and GWAS requires correction for multiple testing and results in severe statistical penalties. To account for the many linkage or association tests performed between each marker and the phenotype of interest, empirical P-values adjusted for multiple testing are computed. Usually the adjustment is based on hundreds to thousands of permutations of genotypes for individuals in a genetic study. The adjusted P-value is the number of times a permuted logarithm of the odds (LOD) score for association is greater than or equal to the maximum observed LOD score, divided by the number of permutations plus one. The empirical adjusted P-value is also referred to as the genome-wide corrected P-value. For most GWAS the significance threshold is set very low (P < E-09). For these reasons, the sample sizes and association scores required to reach statistical significance are much higher compared to CGAS. Increased sample sizes in recent GWAS have led to the identification of more candidate genes and better models of polygenic risk (for a review see [6]). In addition, genotyping a larger (or infinite) number of markers using microarray or next-generation DNA sequencing has the potential to resolve linkage regions or loci down to a single gene.
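The permutation-based adjustment described above can be sketched as follows. The function takes the maximum observed LOD score and the maximum LOD score from each permuted data set, and applies the ratio given in the text. The optional flag is an assumption added here to show a commonly used variant that also adds one to the numerator so that the estimate can never be exactly zero. The function name and all scores are hypothetical.

```python
def empirical_p_value(observed_max_lod, permuted_max_lods,
                      add_one_to_numerator=False):
    """Empirical genome-wide P-value from permutation results.

    Counts how often a permuted maximum LOD score is at least as large as
    the observed maximum, then divides by (number of permutations + 1).
    """
    exceed = sum(1 for lod in permuted_max_lods if lod >= observed_max_lod)
    if add_one_to_numerator:
        # Variant that counts the observed data as one of the permutations.
        exceed += 1
    return exceed / (len(permuted_max_lods) + 1)


# Hypothetical maximum LOD scores from nine permuted data sets.
permuted = [1.1, 0.7, 3.4, 2.0, 1.6, 0.9, 2.8, 3.2, 1.3]
p_text = empirical_p_value(3.2, permuted)
p_variant = empirical_p_value(3.2, permuted, add_one_to_numerator=True)
```

Real studies use far more than nine permutations. With N permutations the smallest nonzero value this estimator can return is 1/(N + 1), which is one reason thousands of permutations are typically run.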

7.2 Candidate Genes Identified Through Human Association Studies

Relative to alcohol and other drugs of abuse, the number of association studies performed for CUDs and related traits remains small. In this section, candidate genes and the evidence supporting them are reviewed.

7.2.1 Candidate Gene Association Studies

For CUD, most CGAS have focused on gene variants within the endocannabinoid system. Endocannabinoid signaling is critical for modulation of numerous biological processes including response to natural rewards, learning and memory, emotional processing, motor coordination, pain, energy metabolism, fertility, development, and immune response. Major endogenous lipid ligands of the endocannabinoid system include N-arachidonylethanolamide (AEA or anandamide) and 2-arachidonoylglycerol (2-AG). Both are synthesized from membrane precursors: AEA by N-acylphosphatidylethanolamine-specific phospholipase D (NAPE-PLD), and 2-AG by the 1,2-diacylglycerol (DAG) lipases DAGLα and DAGLβ. Each endocannabinoid is catabolized primarily by one of two enzymes: fatty acid amide hydrolase (FAAH) for AEA and monoacylglycerol lipase (MGLL) for 2-AG. Both ligands (AEA and 2-AG) act as agonists primarily at two G-protein coupled receptors (GPCRs), cannabinoid receptor type 1 (CB1) and type 2 (CB2), although they can also activate other receptors including GPCRs 55 and 119, and peroxisome proliferator-activated receptors (PPARs). AEA also acts as an agonist at transient receptor potential ion channels (TRPs). Note that the members of the endocannabinoid system are encoded by genes located in distinct genomic regions rather than being grouped into a few gene clusters.

Of the known endocannabinoid signaling genes, variants in two, cannabinoid receptor type 1 (CNR1) and FAAH, have repeatedly been tested for associations with CUDs and related traits (Table 7.1). It is unknown whether variants in CNR1 influence CB1 expression or function; however, several variants have been associated with cannabis use or dependence [7, 8]. In particular, the minor G allele of the rs806380 SNP is thought to exert a protective effect. A common missense variant (rs324420; C/A) that results in substitution of the amino acid threonine for proline in the FAAH enzyme has also been associated with CUDs and related traits.

Table 7.1 Results from candidate gene association studies

In contrast to variants in CNR1, the missense mutation in FAAH has an impact on enzyme stability and function. Inheritance of both copies of the minor A allele (A/A homozygous genotype) results in lower expression and activity of the enzyme due to decreased stability and increased proteolysis [9, 10]. Of interest, the major allele associated with normal enzymatic activity is more frequently associated with risk or problematic cannabis use [11]. For example, inheritance of the major allele (C/C genotype) has been associated with CUD [12] and high cannabis withdrawal symptoms and craving [13, 14]. Although associations between traits related to CUD and variants in CNR1 and FAAH have been reported, there are several studies for which these associations were not replicated [15] and overall there is no consensus regarding the involvement of these mutations in cannabis intake, withdrawal, and dependence.

Gene variants that impact the function of key enzymes involved in drug metabolism can also influence drug use and risk of developing use disorders. Examples are functional variants in genes involved in alcohol metabolism (alcohol dehydrogenase and aldehyde dehydrogenase), which are among the strongest protective factors against development of alcohol dependence (reviewed in [16]). Functional variants in cannabinoid metabolizing genes exist but have not yet been associated with CUD. In the liver, the cytochrome P450 family of enzymes plays a role in processing cannabinoids. There are several functional variants that modulate expression or enzymatic activity of family members. In particular, polymorphisms in the P450 family member CYP2C9 were found to influence metabolism of the synthetic cannabinoid JWH-018, a high affinity agonist at cannabinoid receptors [17]. Two mutations, CYP2C9∗2 (cysteine substitution for arginine at amino acid residue 144) and CYP2C9∗3 (leucine substitution for isoleucine at amino acid residue 359), were found to increase and decrease metabolism of the synthetic cannabinoid, respectively. However, the impact of these variants on synthetic cannabinoid or cannabis use and dependence is not known.

7.2.2 Family-Based Linkage Studies

The first FLS was performed for adolescent cannabis use and dependence by Hopfer and colleagues [18]. The population used in this study included adolescents (ethnic distribution roughly 8% African American or AA, 37% Hispanic, 52% European American or EA, and 4% other) in a substance abuse treatment program in Denver and their genetically related siblings. Participants were part of a larger Colorado Center on Antisocial Drug Dependence study [19]. In total, 324 adolescent sibling pairs from 192 families were included. Cannabis use was also measured in an age-matched control sample drawn from the same population (community sample) consisting of 4843 individuals. The community sample was used to standardize cannabis dependence (CD) scores in the treatment samples. Repeated cannabis use was defined as using cannabis at least six times and CD was measured as the number of lifetime symptoms based on DSM-IV criteria. Cannabis use and dependence were much higher (99% and 59%, respectively) in the adolescent treatment probands relative to their siblings (55% and 14%, respectively) and compared to the community sample (5% prevalence of CD in age-matched controls). Parents and sibling pairs were genotyped for 374 markers covering the 22 autosomal chromosomes (the X and Y sex chromosomes were excluded). Two linkage regions were identified that met the criterion for suggestive linkage (P = 0.0004, LOD > 2.5) between inheritance of parental alleles at a marker and CD. Suggestive linkage regions were located on Chrs 3 (3q21 near marker D3S1267) and 9 (9q34 near marker D9S1826). No significant loci were found. The interval for linkage on Chr 3 was located roughly between markers D3S1271 and D3S1292 (101 to 132 Mb using the GRCh38/hg38 human genome assembly) and the interval for Chr 9 extended from marker D9S290 to the end of the chromosome (128.6 to 138 Mb). 
Because of the small number of markers used in the analysis, many genes (376 for Chr 3 and 305 for Chr 9) were located in each linkage interval. Although the Chr 3 suggestive linkage region includes MGLL, the gene encoding the major enzyme responsible for catabolism of the endogenous cannabinoid 2-AG, the precise genes contributing to trait variation in this first study of CD remain elusive.

Following closely behind the first linkage analysis for CD were several larger FLS. Agrawal and colleagues [20] leveraged data from the Collaborative Study on the Genetics of Alcoholism (COGA; [21]) to perform a linkage analysis based on DSM-IV criteria for CD. The COGA population was unique because it consisted of many generations of families (~90% EA and ~10% AA) at high risk for alcoholism. Genotyping was performed for 1364 individuals at high genetic risk for alcoholism using a microarray platform consisting of 1717 SNPs. A community sample of 984 individuals was not genotyped but was used to address and correct for possible confounds associated with linkage analysis (e.g. gender, race, age). A suggestive locus (adjusted p = 0.71, LOD = 1.9) on Chr 14 spanning ~14 Mb from markers rs759364 to rs872945 (89.3 to 103 Mb and containing 311 gene models) was associated with CD in the mostly EA COGA cohort carrying risk alleles for alcohol dependence.

Agrawal and colleagues [22] performed linkage analysis for CD based on DSM-IV requirements on 3431 individuals from 289 Australian families comprising the Nicotine Addiction Genetics Program (NAG) [23]. These families (>90% Anglo-Celtic or Northern European ethnic origin) included siblings and parents with a lifetime history of heavy smoking (40 cigarettes in a 24 h period or 20 cigarettes per day during periods of heavy smoking). A community-based control sample of 5776 individuals was used to standardize phenotypes and correct for possible confounds associated with linkage analysis in the NAG cohort. The NAG cohort was genotyped for a panel of 381 autosomal markers. Factor analysis was performed on the abuse and dependence criteria to create a single cannabis problems factor score which accounted for the majority of the variance (>60%) among measures. Suggestive linkage regions for the cannabis problems factor score were identified on Chr 1 (~10 cM interval centered on marker D1S2841 located at 78.9 Mb) and Chr 4 (~25 cM interval centered on marker D4S419 located at 18.7 Mb). Note that the cM is a unit of genetic distance measured in map units. This unit of measure has historically been used in association studies where 1 cM corresponds to a recombination frequency of 1%. In humans, 1 cM is roughly equivalent to 1200 kb, but this varies between sexes and physical location on the chromosome. Again, linkage intervals identified in this study were too large to nominate single candidate genes.
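The genetic-to-physical conversion described above can be illustrated with the two intervals from this study. The sketch assumes the rough average of 1.2 Mb per cM given in the text and treats each interval as symmetric around its marker; the function name is hypothetical, and the results are only rough estimates because the true ratio varies along the chromosome and between sexes.

```python
MB_PER_CM = 1.2  # rough average from the text (1 cM ~ 1200 kb); varies by region and sex


def approximate_interval_mb(marker_mb, interval_cm, mb_per_cm=MB_PER_CM):
    """Convert a cM interval centered on a marker into a physical interval (Mb)."""
    half_width_mb = (interval_cm / 2.0) * mb_per_cm
    start = max(0.0, marker_mb - half_width_mb)  # do not run past the chromosome start
    end = marker_mb + half_width_mb              # not clipped at the chromosome end
    return start, end


# ~10 cM interval centered on D1S2841 (78.9 Mb): roughly 72.9 to 84.9 Mb.
chr1_start, chr1_end = approximate_interval_mb(78.9, 10)
# ~25 cM interval centered on D4S419 (18.7 Mb): roughly 3.7 to 33.7 Mb.
chr4_start, chr4_end = approximate_interval_mb(18.7, 25)
```

Intervals of 12 and 30 Mb easily contain dozens to hundreds of genes, which is why single candidate genes cannot be nominated from them.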

In a separate and larger family study Agrawal and colleagues [24] performed linkage analysis for lifetime cannabis use, early-onset cannabis use, and frequency of cannabis use. The Australian cohort consisted of 5600 adult Australian twins, parents, and siblings from 2352 families genotyped at 1461 markers per individual. No markers passed the threshold for genome-wide significance. A suggestive linkage region (P ≈ 0.65) on distal Chr 18 near marker D18S1360/GATA129F05 was identified for cannabis initiation (LOD = 1.97) and frequency of use (LOD = 2.14). A suggestive region was also located on proximal Chr 19 for early-onset cannabis use (LOD = 1.92). Marker position was not provided for all traits in the study so approximate linkage regions for this study are included in Table 7.2. Similar to previous FLS, relatively small sample sizes and marker panels provided low statistical power to detect linkage regions as well as poor resolution within suggestive linkage regions (hundreds of candidate genes located within large regions of linkage).

Table 7.2 Results from family-based linkage analysis

Ehlers and colleagues [25] analyzed a separate cohort of 1647 adults (92% Caucasian) from families with a history of alcoholism in order to identify loci associated with CD, craving, and withdrawal (feeling nervous, tense, restless, or irritable during abstinence from cannabis use). The probands were genotyped for 811 markers and a control sample of 147 individuals was used to assess baseline phenotype rates. For CD, two suggestive loci were identified on Chrs 1 (LOD = 2.1, 17 cM interval near marker D1S498) and 2 (LOD = 2.6, 22 cM interval near marker D2S2361). Five loci were identified for craving on Chrs 7 (LOD = 5.7, 13 cM interval near D7S502), 3 (LOD = 4.4, 12 cM interval near D3S1279), 1 (LOD = 3.6, 12 cM interval near D1S199), and 6 (LOD = 3.2, 7 cM interval near D6S281). An additional two suggestive loci for craving were identified on Chrs 9 (LOD = 2.6, 19 cM interval near D9S157) and 15 (LOD = 2.3, 9 cM interval near D15S127). For withdrawal, the strongest linkage region was identified on Chr 9 (LOD = 3.6, 10 cM interval near D9S1838). Additional suggestive loci for withdrawal were identified on Chrs 3 (LOD = 2.5, 13 cM interval near D3S1566) and 7 (LOD = 2.2, 25 cM interval near D7S506). The withdrawal loci on Chrs 9 and 3 also demonstrated evidence of linkage for a phenotype related to sleep disruptions (“sleeplessness”). Because the population under study was recruited based on a family history of alcoholism, Ehlers and colleagues examined whether linkage regions for alcohol-related traits overlapped with those for CD and associated traits. None of the linkage regions identified for CD or for the craving and withdrawal phenotypes had previously been associated with alcohol-related traits measured in the same cohort.

Finally, Han and colleagues [26] used a multistage design to identify gene variants associated with CD. Linkage analysis was first performed in two different ethnic study cohorts: AA (1022 individuals from 384 families) and EA (874 individuals from 355 families). Both cohorts were ascertained for cocaine and opioid dependence, and selected families included at least two siblings affected by opioid or cocaine dependence based on DSM-IV criteria. Linkage was performed in each ethnic sample separately. The strongest linkage peak was identified on Chr 8 (8p2.11, LOD = 2.9) for the AA samples and another suggestive peak for these samples was detected on Chr 14 (LOD = 2.26). In the EA samples, a suggestive linkage peak was detected on Chr 7 (LOD = 1.85). In the next stage of the analysis the authors used an independent data set from the Study of Addiction: Genetics and Environment (SAGE) that included 4036 unrelated individuals (275 AA cases and 401 controls and 422 EA cases and 1049 controls). GWAS was performed separately by ethnic background for the strongest suggestive linkage peak identified on Chr 8 in the FLS. A SNP (rs17664708) located in a candidate gene for schizophrenia risk, NRG1, was modestly associated with CD in both AAs and EAs. The association between CD and genotype at rs17664708 was replicated in an independent sample of AAs (758 dependent cases and 280 controls) but could not be replicated in an independent sample of EAs (568 dependent cases and 318 controls). Of interest, the NRG1 variant is common in EA samples but rare in AA samples and was primarily associated with CD in AA samples. The variant in NRG1 may also be associated with drug dependence in general, as it was also associated with opioid dependence in the original AA study cohort ascertained for cocaine and opioid dependence.

Some of the first genetic studies for CUD and related traits involved FLS that included fewer than 10,000 individuals, fewer than 3000 families, and fewer than 2000 markers. These studies were not well-powered to identify individual candidate genes. However, they were able to identify genomic regions that might harbor genes related to initial cannabis use and dependence. The one exception was the multistage analysis performed by Han and colleagues [26], which used FLS, GWAS, and independent replication cohorts to nominate variants in NRG1 as a possible genetic risk factor for cannabis and opioid dependence, particularly in AAs. Taken together, these FLS were able to demonstrate that the high heritability for CUDs ascertained from twin studies translated into the detection of large linkage regions possibly harboring gene variants mediating CUD and cannabis use traits. Few of these regions passed stringent genome-wide correction and most were not replicated in separate cohorts. Thus, we cannot exclude the possibility that some of these suggestive loci represent false positives. Of note, some of these suggestive regions overlap among studies. Genetic maps cannot be translated directly into physical maps, but in general 1 cM is roughly equivalent to 1.2 Mb. Approximate linkage regions were determined for studies that provided marker information and/or the cM interval for linkage regions by using the physical marker locations as an anchor on the physical map (human genome assembly GRCh38/hg38). Genetic distances in cM were then converted into physical map distances. Approximate linkage regions are provided in Table 7.2. Although these intervals are a rough estimate, they provide support for possible overlapping linkage regions containing gene variants that may modulate CUDs. This includes a region on Chr 9 from 128 to 144 Mb that is associated with both dependence and withdrawal in two different cohorts [18, 25]. 
A region on Chr 7 from ~38 to 75 Mb was associated with cannabis withdrawal and craving. However, both traits were collected from the same cohort [25]. It is important to note that comparing overlapping linkage intervals is not a robust comparison method across studies. The appropriate comparison between studies would be a meta-analysis using summary scores associated with linkage between markers and traits in both studies. However, this information was not provided in some of the FLSs. Another possible reason for the general lack of replication across studies could arise from differences in how each cohort was ascertained (e.g. adolescent cannabis dependence versus lifetime use or genetic risk for alcohol or nicotine dependence) and these loci may confer risk of CUD or related traits only during specific developmental periods or populations (e.g. NRG1).

7.2.3 Genome-Wide Association Studies

Although the number of cannabis use and related GWAS is still relatively small, several large studies have identified genome-wide significant and suggestive loci containing candidate genes (Tables 7.3 and 7.4). These studies provide evidence that CUD is a polygenic disease and that increasing the GWAS sample sizes should increase the number of candidate genes and generate better genetic predictors to evaluate risk, the relationship to other diseases or behavior traits, and the role of environmental factors.

Table 7.3 Results from genome-wide association studies
Table 7.4 Results from genome-wide association studies (top marker shown for each region)

The first GWAS for CD based on DSM-IV criteria was published by Agrawal and colleagues [27]. A panel of 948,142 SNP markers was genotyped in a case-control study design with 708 dependent cases and 2346 non-dependent controls (66% EA and 34% AA). Similar to many of the FLS, the population of cases and controls for which CD was assessed was originally ascertained for alcohol dependence. A caveat of measuring cannabis or other drug-related traits in these populations is highly comorbid polydrug dependence. However, at the time no populations recruited exclusively for cannabis-related traits existed and populations ascertained for alcohol dependence were readily available. As expected based on the small sample size (<10,000 individuals), no markers met genome-wide significance. However, the large number of genotyped markers resulted in suggestive associations for markers tagging individual genes or intergenic regions. These suggestive associations were located on Chrs 1 (UCHL5), 2 (LINC01122, KYNU), 3 (intergenic), 6 (CRYBG1, RPS6KA2), 9 (intergenic), 10 (STAM), 11 (MICAL2), 12 (CHST11, LGR5, CCDC91, ACADS), 13 (LINC00362, KATNAL1), 14 (intergenic), 16 (FTO), 17 (ANKFN1), 19 (intergenic), and 22 (intergenic) (Table 7.4). The first GWAS for CUDs lacked statistical power, and its suggestive loci were not followed up in an independent replication cohort. However, it identified the first putative candidate genes for CD.

Later GWAS [28,29,30,31,32,33] were able to increase the number of candidate gene associations for CUDs and related traits by increasing the sample size of the discovery cohort. This was achieved primarily by combining results from smaller GWAS using meta-analysis (metaGWAS). Individually, each study may be underpowered to detect small effect alleles due to small sample sizes. However, when the studies are combined, the detection of small effect loci becomes possible due to the increased sample size. In metaGWAS, summary statistics (effect size, standard error, and/or p-values) for associations between SNPs and phenotypes from multiple population studies composed of unique individuals are combined to generate new association scores and effect estimates and to evaluate data set heterogeneity (differences in methodology between studies that could impact results). For a review of the metaGWAS approach, see [34]. As a note of caution, metaGWAS can increase sample size and power, but inclusion of samples ascertained for substance dependence other than cannabis can introduce heterogeneity and has the potential to confound results or limit reproducibility.
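To illustrate how summary statistics from several cohorts can be combined for a single SNP, the sketch below uses fixed-effect inverse-variance weighting and Cochran's Q as a heterogeneity statistic. This is one widely used scheme and is shown here as an assumption; it is not necessarily the method used in the studies cited above. The function name, effect sizes, and standard errors are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis for one SNP.

    Returns the pooled effect, its standard error, the z-score, the
    two-sided p-value, and Cochran's Q (heterogeneity across studies).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]  # precise studies count more
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))          # two-sided normal tail
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, z, p_value, q


# Hypothetical per-cohort effect sizes and standard errors for the same SNP.
study_betas = [0.10, 0.06, 0.08]
study_ses = [0.05, 0.04, 0.03]
beta, se, z, p, q = fixed_effect_meta(study_betas, study_ses)
```

In this illustration the pooled standard error is smaller than that of any single cohort, so the combined association is stronger than any individual one. A large Q relative to the number of studies minus one would flag the kind of heterogeneity cautioned against above.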

Several groups, starting with Verweij and colleagues [33], were able to increase the number of subjects beyond 10,000 through the use of metaGWAS and by selecting a dichotomous cannabis-related trait (yes or no to cannabis use) that could easily be assessed on a large scale. However, few markers passed the criterion for genome-wide significance at sample sizes of 20,000 to 30,000 individuals. For example, Verweij and colleagues densely genotyped over two million SNPs from families in Australia and the United Kingdom (10,091 related individuals) that were part of the Australia and UK twin registries (Spector & Williams 2006). Associations between SNPs and initiation of cannabis use were assessed in the Australian and UK cohorts separately using family-based association tests followed by meta-analysis. Suggestive associations were observed for markers on Chrs 6, 13, 11, and 17, but no SNPs reached genome-wide significance (Table 7.4). Likewise, Stringer and colleagues [32] examined 32,330 subjects (European ancestry) for lifetime cannabis use and failed to identify any SNP associations reaching genome-wide significance. This was despite tripling the sample size used by Verweij and colleagues [33] by combining 13 discovery samples collected from around the world (International Cannabis Consortium data sets) and performing meta-analysis. Nevertheless, suggestive associations were identified on Chrs 1, 2, 3, 5, 11, 12, and 15 (Table 7.4) and a less stringent gene-based analysis of 24,576 genes/genetic regions identified significant associations for the genes NCAM1 (neural cell adhesion molecule 1, Chr11), CADM2 (cell adhesion molecule 2, Chr3), SCOC-AS1 (short coiled-coil protein anti-sense RNA 1, Chr4), SCOC (short coiled-coil protein, Chr4), and KCNT (Chr1) following multiple test correction (Table 7.4). 
The top SNP and gene associations identified in the discovery cohorts failed to replicate in independent samples consisting of 5627 individuals (53% European and 47% AA), with the exception of suggestive associations for SCOC-AS1 and SCOC in one of the four replication samples (AA). SNP heritability based on common SNPs in the Stringer study was estimated at 13–20% for lifetime cannabis use, an improvement over the 6% SNP heritability computed by Verweij and colleagues [33]. Both studies included discovery cohorts with different recruitment strategies, and the resulting wide variation in the prevalence of lifetime cannabis use among cohorts may have deflated heritability estimates in both studies. Nevertheless, the improvement in SNP heritability with the larger sample size of the Stringer study confirmed that lifetime use of cannabis is a heritable trait to which many loci of small effect contribute. Thus, further increases in sample size should result in identification of more significant loci.
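The difference in stringency between SNP-level and gene-based tests comes down to the number of tests being corrected for. The short sketch below assumes a Bonferroni correction, which is one common choice but is not necessarily the correction applied in the studies above; the function name is hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction (assumed here)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Gene-based analysis of 24,576 genes/genetic regions, as in the text
gene_level = bonferroni_threshold(24_576)

# The conventional genome-wide threshold of 5e-8 corresponds to roughly
# one million independent SNP tests
snp_level = bonferroni_threshold(1_000_000)

print(f"gene-based threshold: {gene_level:.3g}")
print(f"SNP-level threshold:  {snp_level:.3g}")
```

Because far fewer genes than SNPs are tested, the gene-level threshold is about 40 times less stringent, which helps explain how gene-based associations can be significant when no single SNP reaches genome-wide significance.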

As proof of this concept, Pasman and colleagues [29] published the largest metaGWAS of lifetime cannabis use (184,765 individuals) and identified eight independent genome-wide significant SNPs in six regions (Chrs 3, 7, 8, 11, 16, and 17). Altogether, all measured genetic variants accounted for 11% of the variance in lifetime use of cannabis. Using gene-based tests, they identified 35 genes significantly associated with lifetime cannabis use (Table 7.4). Replication in an independent cohort was not performed, likely because any replication cohort would have been much smaller and less well-powered than the discovery cohort. There was also substantial heterogeneity among the cohorts used in the meta-analysis, which might have limited power in some analyses and/or limited reproducibility or generalizability. Despite these limitations, Pasman and colleagues were able to identify multiple significant loci and genes for lifetime cannabis use using a massive cohort of nearly 200,000 individuals. This study provides more evidence that the genetic architecture of lifetime cannabis use is complex and involves many small effect genes. Importantly, most of the loci identified in the study were novel.

The GWAS discussed thus far took advantage of samples recruited based on different criteria to identify loci associated with lifetime cannabis use. However, there is some debate over how lifetime use is related to development of problematic use and dependence. Early use has been associated with progression to problematic cannabis use and susceptibility to other substance use disorders [1, 35,36,37]. Early use may also interact with environmental and social factors. For example, the age at which individuals begin to use cannabis may depend on the overall prevalence of use within a country. Higher prevalence has been related to younger ages of initiation [38]. To begin to address this issue, Minică and colleagues [30] used GWAS to identify loci associated with early cannabis use. The authors performed metaGWAS on a discovery cohort of 24,953 individuals with replication in a sample of 3735 individuals. This study also estimated heritability for age of initiation at 39% based on three cohorts consisting of 8055 twins (European descent). SNPs in the ATP2C2 gene reached genome-wide significance (Table 7.4). However, they failed to replicate in the independent cohort, and SNP-based heritability for age of initiation was not significant. Note that in both metaGWAS with replication examined thus far [30, 32], the replication cohort was much smaller than the discovery cohort, which may have limited the power to replicate associations found in the discovery cohort.

Only three metaGWAS [28, 31, 39] examined CUD directly. In the first study, Sherva and colleagues [31] identified loci associated with CD severity based on DSM-IV criteria using metaGWAS and replication across three independent cohorts consisting of 14,754 individuals (AA and EA). Each cohort was ascertained separately for drug dependence as part of the Yale-Penn Study on the genetics of substance use [40], the SAGE Study on the genetics of alcohol, nicotine, and cocaine use [41], and the International Consortium on the Genetics of Heroin Dependence [42]. SNPs tagging several independent loci met the criteria for genome-wide significance (P < E−7) in the AA samples alone or in the combined metaGWAS (Table 7.4). These SNPs were upstream of the gene for S100 calcium binding protein B (S100B) and within the gene for CUB and Sushi multiple domains 1 (CSMD1). Secondary analysis using a replication cohort found additional support for the association between dependence severity and SNPs in CSMD1 and in the drug/metabolite transporter superfamily gene solute carrier family 35 member G1 (SLC35G1). Potential limitations of the study were that CD severity was significantly correlated with dependence on other drugs of abuse (alcohol, nicotine, opioids, and cocaine) and that there was high heterogeneity among the sample cohorts included in the metaGWAS and used for replication.

The second metaGWAS for CD relied on 8515 individuals of European descent drawn from five different cohorts, four of which were ascertained for substance use, including COGA, SAGE, and the Comorbidity and Trauma Study [43]. Agrawal and colleagues [28] analyzed 2080 dependent cases and 6435 non-dependent cannabis-using controls from this cohort using metaGWAS. The selection of non-dependent controls (based on DSM-IV criteria) with at least one reported use of cannabis was a unique aspect of the study. SNPs on Chr 10 were identified as genome-wide significant, and there was modest evidence for replication of this association in AA (but not EA) individuals in a small independent replication sample (896 AA cases and 1591 controls). These SNPs were not associated with genes, but the authors provided some evidence that one SNP in the Chr 10 region (rs1409568) may be located within an active enhancer. A suggestive association between dependence severity (cannabis dependence symptom counts based on DSM-IV criteria) and SNPs on Chr 2 around marker rs2287641 was also identified but did not replicate in the independent cohort.

Finally, Demontis and colleagues [39] identified an association between SNPs located in a cluster on Chr 8 (rs56372821 index SNP) and CD using a data set consisting of 2387 dependent cases and 48,985 controls. The cohort used in this analysis differed from most of the previous studies in that it was ascertained for major mental illness (schizophrenia, bipolar disorder, attention deficit hyperactivity disorder, anorexia nervosa, and autism spectrum disorder) and not drug use or dependence. All individuals were part of the Lundbeck Foundation Initiative for Integrative Psychiatric Research (iPSYCH) Danish nation-wide cohort [44]. Significant replication was observed in a replication sample of 5501 cases and 301,041 controls. Of note, expression of the neuronal acetylcholine receptor (nAChR) alpha-2 subunit gene, CHRNA2, in human cerebellum was found to be controlled by the variants at the Chr 8 locus using the Genotype-Tissue Expression (GTEx) dataset [45]. Taken together, these results suggest a possible causal mechanism by which variants at the Chr 8 locus regulate CHRNA2 expression in the brain and may thereby influence risk of CD.

7.3 Limitations and Future Directions

Increasing the sample size of GWAS for cannabinoid use has increased the number of markers that pass the criteria for genome-wide significance. This was clearly demonstrated by Pasman and colleagues [29], who identified eight independent genome-wide significant markers on Chrs 3, 7, 8, 11, 16, and 17 for lifetime cannabis use in a cohort of 184,765 individuals. Another large case-control study of 51,372 individuals identified one genome-wide significant locus on Chr 8 that replicated in a larger replication cohort [39]. However, these successes are modest compared with recent large GWAS for other diseases. For example, over 100 risk loci for schizophrenia have been identified in GWAS combining ~50,000 individuals [6].

There are several possible reasons for the paucity of strong candidates in human association studies of cannabis use and dependence. The first is that the genetic architecture of cannabis use and dependence may be different from that of other substance use disorders and psychiatric diseases. Most GWAS for substance use disorders and psychiatric diseases demonstrate a large genomic target associated with disease risk. In other words, many biological pathways and genes each contribute a small amount to overall risk, and these genes of small effect combine in an additive fashion to influence disease risk. This type of genetic architecture typically results in a positive linear relationship between sample size and the number of genome-wide significant associations identified. It is possible that the genetic architecture of cannabis use and dependence is different, and that rare variants, non-additive interactions, or environmental interactions drive the disease more than the combined actions of many small effect loci. However, this hypothesis seems at odds with recent large GWAS for cannabis use and dependence [29, 39] and with GWAS results for other drugs of abuse, which likely share some common underlying biological and genetic mechanisms.
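The dependence of locus discovery on sample size under a polygenic, additive architecture can be illustrated with an approximate power calculation. The sketch below makes several simplifying assumptions that are not taken from the studies discussed: a standardized quantitative trait (rather than a dichotomous trait such as lifetime use), a single additive SNP, a normal approximation to the test statistic, and hypothetical values for allele frequency and effect size.

```python
from statistics import NormalDist


def gwas_power(n, maf, beta, alpha=5e-8):
    """Approximate power to detect one additive SNP (illustrative only).

    n:     number of unrelated individuals
    maf:   minor allele frequency
    beta:  per-allele effect in phenotypic standard deviation units
    alpha: two-sided significance threshold (genome-wide default)
    """
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must be in (0, 0.5]")
    # Fraction of trait variance explained by the SNP under additivity
    var_explained = 2.0 * maf * (1.0 - maf) * beta ** 2
    # Expected magnitude of the z statistic grows with the square root of n
    expected_z = (n * var_explained) ** 0.5
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    return norm.cdf(expected_z - z_crit) + norm.cdf(-expected_z - z_crit)


# Hypothetical small-effect SNP (maf = 0.3, beta = 0.03 SD per allele)
for n in (10_000, 30_000, 50_000, 200_000):
    print(f"N = {n:>7,}: power = {gwas_power(n, 0.3, 0.03):.4f}")
```

Under these assumptions, a locus of this size is essentially undetectable at 10,000 individuals and very likely to be detected at 200,000, consistent with the pattern described for lifetime cannabis use. Sample heterogeneity, which is not modeled here, would be expected to lower the effective power for a given sample size.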

Perhaps a more likely explanation for the small number of high-confidence candidates is the type of data sets used in recent GWAS. For cannabis research, there is a lack of large population-based data sets in which individuals were ascertained primarily for cannabinoid or cannabis-related traits. Combining data sets ascertained for psychiatric disease or for dependence on other drugs of abuse can lead to heterogeneity among samples that can actually decrease power. This may explain why so few loci reached genome-wide significance for association with lifetime cannabis use in a massive data set of nearly 200,000 individuals [29].

Yet another issue with association studies over the past two decades is the lack of replication among data sets. Only a single linkage region overlapped among FLS (Table 7.2). Of all the suggestive and significant associations identified in GWAS (Table 7.4), only two genes (CADM2 and NCAM1) were identified by different studies [29, 32]. However, it is important to note that both studies that identified NCAM1 and CADM2 included the same set of ~30,000 individuals from the International Cannabis Consortium.

Association studies in humans have the capability to identify genes and risk alleles. As the sample sizes for GWAS increase, so does the number of associations. If these alleles can be identified directly in humans, why is there a need for testing in preclinical animal models? The answer lies in biological systems and causality. Preclinical genetic animal models (specifically rodents) offer the ability to directly test the role of genes in CUDs and explore the underlying biology in ways that would be impossible in humans. Environmental factors can also be controlled and manipulated in preclinical studies in ways that are not possible in studies involving human subjects. Bi-directional translation between association studies in humans and preclinical models is essential for identifying the environmental, genetic, and molecular mechanisms contributing to disease and for the design of effective therapeutics.

One of the simplest ways in which genetic preclinical models support association studies is through reverse genetic engineering. In this case, candidate genes are manipulated in the preclinical model to evaluate their role in disease. It is even possible to introduce the precise human genetic variant into a preclinical model to evaluate its impact. Such humanized mice have been used to evaluate the role of common functional variants in the catechol-O-methyltransferase gene [46] and to model the role of alleles involved in risk of familial Alzheimer’s disease [47]. Thus, genetic engineering approaches in preclinical rodent models can be used to directly evaluate the role of candidate genes identified in human association studies. However, relatively few genes have been evaluated for a role in cannabis or cannabinoid-related traits in rodent models. The only gene identified from human association studies that has also been independently evaluated for cannabinoid-related traits is the NRG1 gene. Of interest, mice heterozygous for deletion of murine Nrg1 show enhanced sensitivity to the main psychoactive cannabinoid in cannabis, Δ9-tetrahydrocannabinol or THC [48,49,50,51]. Recent advances in genetic engineering, including CRISPR/Cas9-mediated genetic engineering [52], should facilitate functional evaluation of genes and gene variants, such as CADM2 and the schizophrenia susceptibility gene, NCAM1, that are associated with cannabis use, dependence, and/or withdrawal in humans.

Preclinical genetic models can also be used for unbiased genome-wide linkage or association studies to identify genes and gene variants that contribute to disease variation. Examples include rodent genetic models in which two or more inbred progenitor strains are crossed repeatedly and then inbred (recombinant inbred lines such as the BXD panel or Collaborative Cross panel in mice) or outcrossed repeatedly (Diversity Outbred and heterogeneous stock mice) to create genetic panels segregating millions of variants [53,54,55,56]. For a review see [57]. As of 2018, traits related to CUD in humans (initial sensitivity, tolerance and dependence, withdrawal severity, and/or self-administration) have not been profiled in genetic rodent populations in order to identify candidate genes. A single study attempted a short-term selection in an F2 cross between C57BL/6J and DBA/2J inbred strains to determine whether locomotor sensitivity to THC was heritable and could be selected for, in order to produce progeny that carry sensitive or resistant alleles for later genetic dissection [58]. As with all systems, preclinical models have advantages and disadvantages. The clear advantage is the ability to manipulate all aspects of preclinical studies and derive causality from these controlled manipulations. The main disadvantage is that preclinical models are not identical to humans at all levels of behavior and physiology and, as a result, there will always be some controversy regarding translatability.

7.4 Conclusions

Association studies for cannabis use and dependence over the past two decades have identified candidate linkage regions (FLS, Table 7.2) and genes (primarily through GWAS, Table 7.4). In contrast, CGAS have yielded inconsistent results. Over the next two decades, it is likely that more GWAS containing 50,000 to 1,000,000 individuals will be performed for cannabis use, dependence, and withdrawal. Recruiting samples directly for these traits, along with other methods to reduce heterogeneity among cohorts, can be expected to increase the number of genome-wide significant associations. This should lead to a larger and more reproducible list of candidates and a better assessment of polygenic risk and genetic architecture. It is also important to remember that, despite some of the current issues with power and reproducibility, human association studies have identified candidate genes and mechanisms that should be evaluated to determine how, and how much, they contribute to disease risk. The stage is already set for this type of translational research in preclinical animal models.