1 Introduction

Endometriosis is a common complex condition caused by the interplay of multiple genetic and environmental factors. Genetic risk variants account for only part of the disease risk; environmental factors also play an important role in disease pathogenesis, either independently or through interaction with genetic factors [1]. The heritability of endometriosis, that is, the proportion of disease risk due to genetic factors, has been estimated in two large twin studies [2, 3] that arrived at very similar estimates (49–51%). A separate study estimated the proportion due to common genetic variation (DNA variants with a frequency >1% in the population) to be 26% [4]. As the underlying pathology of endometriosis is not well understood, one way to explore the underlying mechanisms is to investigate the genetic factors that are causal for the disease, together with their functions. For complex diseases such as endometriosis, the most powerful and appropriate study design for detecting genetic risk factors is the genetic association study, in which the frequencies of variants are compared between cases and controls, much as an epidemiological case-control study compares the frequency of risk-factor exposures. When a disease shows a very strong pattern of familial inheritance (e.g., “monogenic” familial breast or ovarian cancer), family-based approaches are more appropriate; these are not covered here.

2 Discovery of Endometriosis Genetic Susceptibility Variants

In population-based study designs, genetic variants can be investigated using hypothesis-driven or hypothesis-free association methods. The hypothesis-driven approach, the candidate gene association study, relies on prior biological understanding of the condition and tests for association in genomic regions prioritized on the basis of that knowledge. As with other complex diseases, candidate gene association studies have generally not produced robust results for endometriosis [5]. For results to be considered robust, identified associations need to be replicated in an independent study of individuals of similar ancestral background. The reasons for the general failure of candidate gene association studies are manifold: (1) the prior biological knowledge about the tested regions may not be relevant to the disease in question; (2) the coverage of common genetic variation in candidate gene regions is often limited and does not allow all potential common genetic risk variants in these regions to be tested (either directly, or indirectly through linkage disequilibrium with other variants); (3) the number of genes included in a study is often limited to a few that make up only a small part of a potentially causal underlying pathway; and (4) the sample sizes of candidate gene studies have often been insufficient to detect common genetic variants for common complex conditions. The standard approach for identifying common genetic variants for common complex conditions is now a hypothesis-free method, namely the genome-wide association study (GWAS).

3 Genome-Wide Association Studies

GWAS have been very successful in identifying common genetic variants underlying complex conditions. In a GWAS, typically at least 2000 cases and 2000 controls are genotyped at a genome-wide level using an “off the shelf” microarray containing probes that capture 100,000s of single nucleotide polymorphisms (SNPs), that is, single base-pair DNA variants. After extensive quality control, the genotypes of nearby SNPs that were not directly genotyped can be imputed using a reference panel that includes a comprehensive catalogue of common genetic variants in the relevant ancestry population. Subsequently, the allele frequency of each common SNP is tested for differences between the case and control groups. Owing to the millions of statistical tests conducted across the genome, a stringent significance threshold needs to be adopted to reduce the number of false positive findings. The standard threshold used for genome-wide significance is p < 5 × 10⁻⁸. A detailed overview of GWAS design is given in Zondervan and Cardon [6]. All common genome-wide significant variants identified for common complex diseases and traits through GWAS are documented in the National Human Genome Research Institute (NHGRI) GWA Catalogue (www.genome.gov/GWAStudies). This catalogue demonstrates how successful the GWAS approach has been in identifying common variants underlying complex diseases and traits: as of 25 April 2021, it included data on 255,015 SNP-disease associations.
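The single-SNP comparison described above can be sketched as a simple allelic chi-square test. This is a minimal illustration under stated assumptions, not the analysis pipeline of any study cited here: the allele counts are invented toy values, the function names are hypothetical, and real GWAS typically use regression models that adjust for covariates such as ancestry. The sketch also shows the common rationale for the 5 × 10⁻⁸ threshold as a Bonferroni correction of 0.05 for roughly one million independent tests.

```python
import math

# Conventional genome-wide significance threshold quoted in the text.
GENOME_WIDE_ALPHA = 5e-8
# Common rationale (assumption): 0.05 corrected for ~1 million independent tests.
BONFERRONI_ALPHA = 0.05 / 1_000_000


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic odds ratio for the alternative allele."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


# Toy data (hypothetical): 2000 cases and 2000 controls, i.e. 4000 alleles each,
# with alternative-allele frequencies of 0.30 in cases and 0.25 in controls.
cases = (1200, 2800)
controls = (1000, 3000)

chi2, p_value = allelic_chi_square(*cases, *controls)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}, OR = {odds_ratio(*cases, *controls):.2f}")
print("genome-wide significant:", p_value < GENOME_WIDE_ALPHA)
```

In this toy example, a five percentage point difference in allele frequency between 2000 cases and 2000 controls does not pass the genome-wide threshold, which illustrates why large sample sizes and meta-analyses are needed to detect variants of modest effect.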

To date, 10 GWAS in women of European and East Asian ancestry have been published for endometriosis, ranging from 171 to 58,115 included cases (Table 6.1). The largest is a meta-analysis led by the International Endogene Genomics Consortium (IEGC), for which interim results were released in 2018, comprising 15 GWAS and a replication analysis with a total of 58,115 cases and 733,480 controls [16]. An early GWAS analyzed the effect of all SNPs combined by rASRM stage, showing a significantly higher genetic contribution to rASRM stage III/IV than to stage I/II disease (proportion of endometriosis variation explained by common SNPs = 0.34, SD = 0.04 vs. 0.15, SD = 0.15) [9]. Subsequent GWAS meta-analyses were therefore conducted separately for stage III/IV disease; the largest IEGC-led GWAS meta-analysis (2018) investigated association with rASRM stage III/IV disease, rASRM stage I/II disease (for the first time), and infertility-associated endometriosis subphenotypes, in addition to overall endometriosis. This study revealed 27 loci associated with endometriosis at genome-wide significance, 13 of which were novel (Table 6.2). Positionally, the lead SNPs for the identified genetic loci reside near genes that are involved in sex-steroid hormone, WNT signaling, cell adhesion/migration, cell growth/carcinogenesis, and inflammation-related pathways.

Table 6.1 Summary of 10 GWAS investigating associations with endometriosis
Table 6.2 Twenty-seven genome-wide significant loci from the GWAS meta-analysis for endometriosis, stage III/IV, stage I/II, and infertile endometriosis [16]

In subphenotype genome-wide association analyses, eight genome-wide significant signals were associated with stage III/IV disease and one genome-wide significant signal with infertility-associated endometriosis. Moreover, 21 of the 27 loci had larger effect sizes for stage III/IV than for stage I/II disease (Table 6.2), suggesting that specific variants may confer risk for different subtypes of endometriosis through distinct pathways. Further studies with more detailed phenotypic data on endometriosis are needed to decipher the genetic variants that may be associated with different subtypes of the disease, and the identity of these subtypes beyond rASRM staging.

4 Conclusions and Future Work

The variance explained by the 27 loci together is 2.15% for overall endometriosis and 3.83% for rASRM stage III/IV disease [16], indicating that many more genetic susceptibility loci remain to be uncovered for endometriosis in larger, deeply phenotyped datasets. The most up-to-date findings show that genetic mechanisms underlying endometriosis implicate metabolic, reproductive, inflammatory, and pain-related pathways, although these are based on “nearest gene” assumptions (the notion that the gene nearest the risk variant is the one whose expression is affected by that variant). Furthermore, the stronger associations observed with infertile endometriosis or stage III/IV endometriosis support the hypothesis that specific variants may confer risk for different subtypes of endometriosis through distinct pathways. Fine-mapping analyses are needed to identify the causal variants at each of the 27 loci. In particular, functional follow-up of identified variants is vitally important, examining their effects on transcriptomic, proteomic, metabolomic, and epigenomic data in tissues and cells relevant to endometriosis, i.e., endometrium and its cellular components.
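To put these percentages in context, the short calculation below relates the variance explained by the 27 loci to the heritability estimates quoted in the Introduction. It is only a rough sketch: taking 50% as the midpoint of the 49–51% twin estimates is an assumption, the variable and function names are illustrative, and the three estimates come from different studies that may not be on strictly comparable scales.

```python
# Estimates quoted in this chapter, expressed as proportions.
TWIN_HERITABILITY = 0.50      # assumption: midpoint of the 49-51% twin estimates [2, 3]
SNP_HERITABILITY = 0.26       # proportion due to common variation [4]
LOCI_VARIANCE_OVERALL = 0.0215  # variance explained by the 27 loci, overall endometriosis [16]


def share(part, whole):
    """Return `part` as a fraction of `whole`."""
    return part / whole


# Fraction of twin-based heritability captured by common variants.
common_of_twin = share(SNP_HERITABILITY, TWIN_HERITABILITY)
# Fraction of common-variant heritability captured by the 27 loci.
loci_of_common = share(LOCI_VARIANCE_OVERALL, SNP_HERITABILITY)

print(f"Common variants / twin heritability: {common_of_twin:.0%}")
print(f"27 loci / common-variant heritability: {loci_of_common:.1%}")
```

Under these assumptions, common variants account for roughly half of the twin-based heritability, while the 27 loci account for less than a tenth of the common-variant component, consistent with the expectation that many loci remain undiscovered.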

As an example, WNT4/1p36.12 is a well-established locus associated with endometriosis, and the gene nearest to the identified genome-wide significant variant is WNT4. However, this positional evidence is not enough to determine whether WNT4 is the gene that is functionally involved in endometriosis pathology. Powell et al. investigated the gene expression profile around the 1p36.12 cytoband and found that the endometriosis-associated variant is a significant eQTL in whole blood, decreasing expression of LINC00339 and increasing expression of CDC42. The eQTL for LINC00339 was also observed in endometrium tissue, with the same direction of effect. However, no evidence for eQTL effects on WNT4 was identified, highlighting the importance of, and need for, such functional studies to understand the disease-relevant mechanisms of the identified genetic risk variants [17].
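The idea behind an eQTL test can be sketched as a regression of gene expression on allele dosage (0, 1, or 2 copies of the risk allele), where the sign of the slope gives the direction of effect. The example below is a minimal sketch with invented toy values and hypothetical names; it is not the method or data of Powell et al. [17], and real eQTL analyses additionally adjust for covariates and assess statistical significance.

```python
def eqtl_slope(dosages, expression):
    """Ordinary least-squares slope and intercept of expression on allele dosage."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (hypothetical): risk-allele dosage for six individuals and
# normalized expression values for two illustrative genes.
dosages = [0, 0, 1, 1, 2, 2]
gene_down = [10.0, 10.4, 9.1, 8.9, 8.0, 8.2]  # expression falls with dosage
gene_up = [5.0, 5.2, 6.1, 5.9, 7.0, 6.8]      # expression rises with dosage

slope_down, _ = eqtl_slope(dosages, gene_down)
slope_up, _ = eqtl_slope(dosages, gene_up)

print(f"gene_down: slope per risk allele = {slope_down:.2f}")
print(f"gene_up:   slope per risk allele = {slope_up:.2f}")
```

A negative slope corresponds to a variant that decreases expression and a positive slope to one that increases it; observing the same sign in a second tissue is what is meant by the same direction of effect.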

Tissue-based molecular phenotyping data (transcriptomics, proteomics, and metabolomics) are not available for endometrium or its relevant cellular components in sufficiently large sample sizes from publicly available databases (e.g., the Genotype-Tissue Expression (GTEx) project [18, 19]). Two recent studies investigated whole-transcriptome profiles in endometrium tissue using RNA-sequencing (N = 206) and microarray-based gene expression (N = 123), and generated expression quantitative trait loci (eQTL) maps to determine the genetic variants that regulate gene expression in this tissue [20, 21]. The microarray-based and RNA-sequencing-based eQTL maps identified variants that regulate expression of 198 and 327 unique genes, respectively. Such studies are very important for better understanding the effect that genetic risk variants have on gene expression in endometrium; however, similar profiling studies need to be conducted using other “omics” data (epigenomics, proteomics, and metabolomics). There is also a need to collect these tissue and cell types using standardized protocols that allow collaboration between study centers to reach the sample sizes needed for these functional investigations. The Endometriosis Phenome and Biobanking Harmonisation Project of the World Endometriosis Research Foundation has provided globally standardized protocols for data and sample collection in studies of endometriosis [22–25]. At the time of writing, 47 centers are using the standards for data and/or sample collection, with many 10,000s of samples already stored for research purposes in local study repositories. More large-scale integrated omics studies in deeply phenotyped patients are needed to understand the underlying causal mechanisms of endometriosis and to dissect subtypes of this complex condition, leading to the discovery of novel, better targeted treatments.