Introduction

Multiple sclerosis (MS) is a chronic demyelinating disease of the central nervous system (CNS) that involves progressive neurodegeneration, driven mainly by autoimmune inflammation (Nylander and Hafler 2012). About 2.3 million people worldwide have been diagnosed with MS (Browne et al. 2014). The disease mostly affects young adults, leads to early disability, and shortens the lifespan by approximately 7 years compared with the general population (Marrie et al. 2015; Kaufman et al. 2014; Leray et al. 2015). MS is a complex disease: multiple factors contribute to its development, and they remain incompletely understood (Marian 2012; Kilpinen and Barrett 2013).

MS shows familial aggregation: the risk of developing MS is higher in patients’ relatives than in the general population, and it decreases with increasing genetic distance from the proband (Oksenberg 2013). Its transmission in families, however, does not follow Mendelian inheritance. Concordance rates are about 24–30 % in monozygotic twins and 3–5 % in dizygotic twins (Lin et al. 2012). This pattern is typical of polygenic diseases. It results from the combined action of many independent or interacting polymorphic genes, each making only a minor contribution to the disease (Bomprezzi et al. 2003; IMSGC et al. 2010). Other risk factors act against this background of genetic susceptibility; they are currently thought to include epigenetic regulation and environmental factors (Oksenberg 2013; Gourraud et al. 2012).

Clinically, MS is a heterogeneous disease with several courses, different manifestations, and different rates of progression. The most frequent form is bout-onset MS, which comprises relapsing–remitting MS (RRMS) (85 % of cases) and secondary progressive MS (SPMS). SPMS develops in most RRMS patients after a disease period of variable length. Primary progressive MS (PPMS), characterized by steadily increasing disability, is less frequent (10–15 % of cases) (Nylander and Hafler 2012). There is growing evidence that different pathogenic mechanisms prevail in RRMS and PPMS (Bradl and Lassmann 2009; Iwanowski and Losy 2015). The genetic basis of the clinical heterogeneity of MS is also complex (Baranzini et al. 2009) and remains an important unresolved problem.

Studies of the molecular basis of MS pathogenesis are an important part of research in the field. They identify the genetic, epigenetic, and environmental risk factors of MS. In doing so, they help design new strategies for prevention and treatment and might provide a prognostic tool for estimating individual predisposition to MS. This review discusses only studies of the genetic architecture of MS.

Genes involved in MS have long been sought, and several approaches have been applied with varying success. For several decades, numerous studies used the candidate gene approach, in which genes potentially associated with MS were chosen on the basis of presumed MS pathogenesis. Several MS-associated loci were identified in this way, including certain HLA class I (Naito et al. 1972) and class II (Compston et al. 1976) genes, IL7RA (Lundmark et al. 2007), CIITA (Rasmussen et al. 2001), and SOCS1 (Zuvich et al. 2011; Vanderbroeck et al. 2012). Two groups of findings were summarized by Ramagopalan and Ebers (2009) and Bahreini et al. (2010): the association of common allele groups of the HLA class II DRB1 gene with MS in various populations worldwide, and the non-HLA genes that showed a significant association with MS in more than two independent studies. At the same time, many association studies yielded negative results. This is not surprising: because MS pathogenesis is complex and still incompletely understood, the wrong genes can easily be chosen as candidates. Several studies focused on possible interactions between HLA-DRB1 alleles and non-HLA genes or among non-HLA genes. For example, associations of CCR5 Δ32 (Favorova et al. 2002) and of a VNTR polymorphism in MBP (Guerini et al. 2003) were found only in DR-stratified individuals. This strategy of searching for allele combinations led to the development of the APSampler algorithm (Favorov et al. 2005). The algorithm was successfully applied to MS cohorts and identified two three-allele combinations associated with MS, both including HLA-DRB1 alleles (Favorova et al. 2006).

Genome-wide linkage analysis was used in the 1990s and 2000s to investigate MS inheritance in families. Its aim was to identify genome regions that deviate from independent segregation and cosegregate with the disease. More than 30 linkage studies of families with several MS patients used several hundred highly polymorphic microsatellite markers distributed relatively uniformly across the genome (Ebers et al. 1996; Sawcer et al. 1996; Ban et al. 2002). Genome-wide linkage analyses in different ethnic groups, as well as meta-analyses of pooled data (Hermanowski et al. 2007; GAMES and Transatlantic Multiple Sclerosis Genetics Cooperative 2003), mostly yielded inconsistent results, pointing to potential genetic heterogeneity of MS among populations. The only exception was the HLA class II locus in the major histocompatibility complex (MHC) region on chromosome 6p21, which had already been identified as an MS risk factor in earlier association studies. Later, linkage to MS was tested with single nucleotide polymorphisms (SNPs) (Sawcer et al. 2005). SNPs are usually biallelic and therefore less informative than microsatellites, but they are more abundant in the genome. Despite the denser genome coverage, this study did not identify any region with statistically significant linkage to MS outside the HLA locus, pointing to the low sensitivity of the method (Cree 2014).

In the last decade, the genome-wide association study (GWAS) has become the most common method of searching for new genes involved in predisposition to MS. GWAS is a powerful tool for investigating the genetic architecture of human polygenic diseases. It compares the allele frequencies of SNPs distributed throughout the genome between samples of unrelated patients and control individuals. Genotyping is performed with microarrays or with more advanced techniques that allow simultaneous genotyping of several tens of thousands to several millions of SNPs per genome (Cree 2014). Such experiments are comparatively inexpensive, allowing hundreds or thousands of patients to be genotyped in one study. This review considers the main achievements and challenges of using GWAS and complementary hypothesis-driven studies to identify the genes involved in MS. We outline the biological functions of the loci replicated across GWASs and discuss some aspects of “missing heritability.”
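The sketch below makes the allele-frequency comparison concrete for a single SNP. It applies a Pearson chi-square test (1 degree of freedom) to the 2 × 2 table of allele counts, which is the simplest form of the case–control test. It illustrates the idea under stated assumptions and is not the pipeline of any study cited here. The function name and allele counts are hypothetical, and published GWASs usually fit logistic regression with covariates such as ancestry.

```python
import math


def allelic_chi2_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the alternate and reference
    alleles. Each diploid individual contributes two alleles.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical SNP: 1000 cases (2000 alleles) with alternate-allele frequency
# 0.30, and 1000 controls with frequency 0.25.
chi2, p = allelic_chi2_test(600, 1400, 500, 1500)
print(f"chi2 = {chi2:.2f}, p = {p:.1e}")
```

With these hypothetical counts, the test gives a chi-square of about 12.5 and p of roughly 4 × 10−4. That is nominally significant for a single SNP but far from the genome-wide threshold used in GWASs.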

GWAS data for multiple sclerosis

The GWAS data for MS, as for other complex diseases, can be found in the regularly updated NHGRI-EBI GWAS catalog (Hindorff et al. 2009). The catalog was established in 2008 on the website of the National Human Genome Research Institute (NHGRI, United States). In March 2015 it moved to the European Bioinformatics Institute (EMBL-EBI), and it is now available at http://www.ebi.ac.uk/gwas (Burdett et al. 2015). It includes all published GWASs assaying at least 100,000 SNPs and all SNP–trait associations with p < 1 × 10−5 (Hindorff et al. 2009). Besides studies of genetic susceptibility to MS, the catalog includes GWASs of important clinical features of MS: oligoclonal band status, IgG levels, response to interferon beta therapy, MS severity, brain lesion load, age of onset, brain glutamate levels, and normalized brain volume.

Table 1 summarizes the data from all independent GWASs that investigated MS susceptibility in various (mostly Caucasian) populations by comparing allele frequencies between MS patients and control subjects. Detailed information, including a description of the individual MS-associated loci at a significance level of p < 1 × 10−5, is presented in Table S1.

Table 1 Summary of genome-wide association studies of susceptibility to multiple sclerosis published up to June 2015

International consortia, which pool DNA samples from clinics around the world, play the leading role in these studies. The most prominent are the International Multiple Sclerosis Genetics Consortium (IMSGC), the Wellcome Trust Case Control Consortium 2 (WTCCC2), and the Australia and New Zealand Multiple Sclerosis Genetics Consortium (ANZgene). IMSGC was formed in 2003 to take a collaborative approach to identifying all the inherited risk factors for MS. It combines the resources of 25 research groups from 14 countries and holds DNA collected from more than 20,000 subjects. WTCCC2 performs GWASs of 13 common diseases, including MS. Together with IMSGC and the Genetic Analysis of Multiple sclerosis in EuropeanS (GAMES) collaborative group, WTCCC2 recruited about 10,000 independent MS cases. The ANZgene Consortium, a collaborative project formed in 2007, comprises eight centers holding over 3500 DNA samples for MS research.

Most GWASs used Affymetrix or Illumina genome-wide platforms. The number of target SNPs, distributed more or less evenly throughout the genome, increased as the technology progressed. The number of SNPs on the arrays varies widely among studies, from 262 K to approximately 600 K (Table 1). Importantly, the GWAS panels included only SNPs whose minor alleles are common (minor allele frequency, MAF, > 1 %), and some studies restricted the analysis to SNPs with MAF > 5 %.
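The MAF filter can be sketched as follows. The sketch assumes genotypes coded as 0, 1, or 2 copies of one allele per individual. The function names and example genotype vectors are hypothetical, and real pipelines apply such filters together with other quality-control steps.

```python
def minor_allele_frequency(genotypes):
    """MAF from per-individual genotypes coded 0, 1, or 2 (copies of one allele)."""
    genotypes = list(genotypes)
    allele_freq = sum(genotypes) / (2 * len(genotypes))
    return min(allele_freq, 1 - allele_freq)


def passes_maf_filter(genotypes, min_maf=0.01):
    """Keep a SNP only if its minor allele is common (MAF above min_maf)."""
    return minor_allele_frequency(genotypes) > min_maf


common_snp = [2] * 90 + [0] * 10   # allele frequency 0.9, so MAF is about 0.1
rare_snp = [1] + [0] * 99          # one heterozygote in 100 people, MAF = 0.005
print(passes_maf_filter(common_snp), passes_maf_filter(rare_snp))  # True False
```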

In addition to increasing the average density of genome coverage, several studies (Aulchenko et al. 2008; ANZgene et al. 2009; Nischwitz et al. 2010; Sanna et al. 2010; IMSGC et al. 2011, 2013; Patsopoulos et al. 2011; Matesanz et al. 2012; Isobe et al. 2015) used genotype imputation to improve the efficiency of their analyses. Genotype imputation is the statistical inference of unobserved genotypes, and it makes it possible to test for association at genetic markers that were not directly genotyped. The method relies on linkage disequilibrium (LD) between polymorphic sites within a haplotype block, which allows the genotype at an untyped SNP to be predicted with high probability (Li et al. 2009). Imputation serves three purposes:

- harmonizing the sets of polymorphic sites assessed in different samples of patients and controls, which are often genotyped on different platforms;
- improving the statistical power of a study, so that it detects more significant associations than an analysis restricted to the SNPs of a single platform;
- increasing coverage at putatively associated loci, which allows fine mapping of these regions.

The reviewed GWASs applied genotype imputation for all of these purposes. Matesanz et al. (2012) used it to analyze jointly datasets genotyped on different platforms. Sanna et al. (2010) used it to increase the number of tested SNPs from 555,335 to 6,607,266. Nischwitz et al. (2010) used it to replicate several previously identified loci that were not represented on the microarray they used.
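A deliberately simplified rule can illustrate the core idea of imputation: alleles at untyped SNPs are borrowed from reference haplotypes that match the typed SNPs. This is not the algorithm used in the cited studies. Production imputation software typically models haplotypes with hidden Markov models and accounts for recombination and genotyping error (see Li et al. 2009). The sketch assumes phased haplotypes coded 0/1 and a small hypothetical reference panel. It reports the imputed value as the fraction of best-matching reference haplotypes that carry allele 1. The function name is hypothetical.

```python
def impute_haplotype(target, reference_panel):
    """Fill untyped positions (None) in a phased 0/1 haplotype.

    Toy nearest-haplotype rule: find the reference haplotypes with the fewest
    mismatches at the typed SNPs, then, for each untyped SNP, return the
    fraction of those best-matching haplotypes that carry allele 1.
    """
    typed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(ref):
        return sum(ref[i] != target[i] for i in typed)

    best = min(mismatches(ref) for ref in reference_panel)
    matches = [ref for ref in reference_panel if mismatches(ref) == best]
    return [
        sum(ref[i] for ref in matches) / len(matches) if allele is None else allele
        for i, allele in enumerate(target)
    ]


# Hypothetical reference panel of four fully typed haplotypes (4 SNPs each).
panel = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]
print(impute_haplotype([1, None, 1, 1], panel))  # [1, 1.0, 1, 1]
print(impute_haplotype([0, None, 0, 0], panel))  # [0, 0.5, 0, 0]
```

In the second call, two reference haplotypes match the typed SNPs equally well but disagree at the untyped one, so the imputed value is a dosage of 0.5 rather than a hard call. Real imputation output similarly carries this uncertainty forward into the association test.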

Associations revealed in genome-wide studies need validation. For this purpose, a GWAS is nearly always performed in two phases, or stages. The first (discovery) phase aims at detecting associations. The second (replication) phase verifies them in new independent samples. As a rule, independent genotyping methods, such as real-time PCR or mass spectrometry, are used in the replication phase. As Tables 1 and S1 show, all studies included both phases except those of Baranzini et al. (2009), Nischwitz et al. (2010), and Patsopoulos et al. (2011).

Early MS GWASs included fewer than 1000 patients in the discovery phase (Table 1), whereas up to 10,000 subjects were enrolled in the GWAS conducted in 2011 (IMSGC et al. 2011). On the whole, the number of detected associations increased with sample size, in line with data for other complex traits (Visscher et al. 2012). According to simulation studies, GWASs with fewer than about 2000 cases and 2000 controls in the discovery phase have low power to detect associations with the modest effect sizes typical of common variants (Spencer et al. 2009). Ideally, to maximize the number of genuine associations, such studies should exceed the effective population size of ~10,000 individuals per group (Pe’er et al. 2008; Sawcer et al. 2014). Of the 13 GWASs performed for MS, only four (De Jager et al. 2009a, b, c; ANZgene et al. 2009; Patsopoulos et al. 2011; IMSGC et al. 2011) meet the first requirement, and only one (IMSGC et al. 2011) meets the second. This is not surprising, given the median global MS prevalence of 33 per 100,000 (Browne et al. 2014).
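The dependence of power on sample size can also be approximated analytically. The sketch below uses a normal approximation to the allelic test at the genome-wide threshold. The function name, the control risk-allele frequency of 0.25, and the odds ratio of 1.2 are illustrative assumptions. This is not the simulation approach of Spencer et al. (2009).

```python
import math
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    p0 = control_freq
    # Risk-allele frequency in cases implied by the allelic odds ratio.
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    a1, a0 = 2 * n_cases, 2 * n_controls  # diploid: two alleles per person
    p_bar = (a1 * p1 + a0 * p0) / (a1 + a0)
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / a1 + 1 / a0))
    se_alt = math.sqrt(p1 * (1 - p1) / a1 + p0 * (1 - p0) / a0)
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in the direction of the
    # effect; the opposite tail is negligible at this threshold.
    return std_normal.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


# Hypothetical common variant: risk-allele frequency 0.25 in controls, OR = 1.2.
for n in (1000, 2000, 10000):
    print(n, round(allelic_test_power(n, n, 0.25, 1.2), 3))
```

Under these assumptions, power stays well below 5 % with 1000 or 2000 subjects per group and exceeds 90 % with 10,000 per group. This pattern matches the sample-size requirements quoted above.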

In practice, most GWASs increased their sample size by including patients with different MS forms. Some even included patients with clinically isolated syndrome, a distinct form that may, but does not always, progress to clinically definite MS (Kuhle et al. 2015). Only two GWASs were restricted to a single MS form: one included patients with bout-onset MS (RRMS or SPMS) (Comabella et al. 2008), and the other included patients with PPMS (Martinelli-Boneschi et al. 2012). This design allowed a search for SNPs specifically involved in the development of the given MS form. It also restricted the potential sample size and thereby reduced the power to identify associations. As a result, only HLA loci were identified in these studies.

Another way to increase sample size, used in most GWASs (see Table S1), is to enroll individuals from different ethnic groups. This raises the problem of population stratification, that is, the risk of false-positive or false-negative results due to differences in allele frequencies among ethnic groups. Indeed, enrichment of specific disease-susceptibility alleles in more genetically homogeneous populations has been noted in several studies of complex diseases (Lu et al. 2014). The associations of some SNPs have been observed to be more consistent across populations than those of other SNPs. This may depend on several factors (Marigorta et al. 2011):

- SNP allele frequencies;
- the location of SNPs within genes and chromosomes;
- differences in LD patterns among populations;
- the evolutionary history of the genes affecting a complex disease.

One way to partially avoid the problem is to include subjects from closely related ethnic groups (Baranzini et al. 2009). Statistical adjustment, in which data are analyzed separately for each ethnic group, is also used. However, it introduces additional parameters into the analysis and lowers the attainable significance (Bush and Moore 2012). More sophisticated statistical methods are also used to cope with population stratification (IMSGC et al. 2011). These methods rely on comparing allele frequencies with HapMap data and on including derived ancestry measures as covariates (Bush and Moore 2012).
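A toy example shows why combining populations can mislead and how stratified analysis helps. The sketch below pools hypothetical allele counts from two populations. Within each population, the SNP is not associated with disease, yet the naive pooled odds ratio is far from 1. The Mantel–Haenszel estimate, one classical way of analyzing data stratified by ethnic group, recovers 1. The counts and function names are hypothetical, and the GWASs cited above mainly used other adjustments, such as ancestry covariates.

```python
def crude_odds_ratio(strata):
    """Odds ratio after naively pooling all strata into one 2x2 table."""
    a = sum(s[0] for s in strata)
    b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    d = sum(s[3] for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across strata.

    Each stratum is (a, b, c, d): case alternate alleles, case reference
    alleles, control alternate alleles, control reference alleles.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Two hypothetical populations with no association inside either one, but
# with different allele frequencies and different case/control proportions.
strata = [
    (500, 500, 100, 100),  # population 1: allele frequency 0.5, mostly cases
    (20, 180, 100, 900),   # population 2: allele frequency 0.1, mostly controls
]
print(round(crude_odds_ratio(strata), 2))   # 3.82
print(mantel_haenszel_odds_ratio(strata))   # 1.0
```

The spurious pooled association arises because population 1 contributes most of the cases and also carries the alternate allele more often. Stratification removes that confounding.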

The ethnic specificity of some associations may explain the limited reproducibility of GWASs. For example, the association with KIF1B found in a Dutch genetic isolate (Aulchenko et al. 2008) was not replicated in datasets from other GWASs. Moreover, an equivalence analysis showed that KIF1B is not associated with MS in IMSGC samples (Gourraud and IMSGC 2011). Where results are replicated in several ethnic groups (e.g., IL2RA, IL7RA, CLEC16A), the associations are likely to be universal.

Gender-specific effects were also analyzed in two GWASs (Baranzini et al. 2009; IMSGC et al. 2011). The MS associations identified in these studies differ between men and women, although most gender-specific associations are not significant at the genome-wide level. In the best-powered GWAS to date (IMSGC et al. 2011), three SNPs were associated with MS at genome-wide significance in only one sex. In women, this was rs1800693 in TNFRSF1A; in men, it was rs2293370 in TIMMDC1 and rs13333054 downstream of IRF8. Men and women differ in MS prevalence and clinical phenotype. Given these differences and the existing GWAS data, gender-specific associations deserve more attention, because they may shed light on potentially divergent mechanisms of MS pathogenesis in the two sexes.

Alongside GWAS data obtained on relatively or totally independent samples, Table 1 also includes the results of the meta-analysis reported by Patsopoulos et al. (2011). This meta-analysis integrated data from seven earlier genome-wide studies, one of them with a substantially extended sample. These studies were carried out on different platforms and covered approximately 750,000 SNPs, and genotype imputation combined them into a single panel of 2.5 million SNPs. Various genotype imputation methods are widely used in genome-wide studies, in both their first and second phases (Matesanz et al. 2012). Against this background, the study of Patsopoulos et al. (2011) did more than statistically analyze all the GWAS data available for MS at that time. It also demonstrated a strategy for the further development of such studies.

As Table 1 shows, virtually all GWASs found a highly significant association of MS with the MHC region, whose most important role is the genetic control of the immune response. The most significant signals mapped to the HLA class II gene HLA-DRB1 (IMSGC et al. 2011; Patsopoulos et al. 2011; De Jager et al. 2009a, b, c). The associations reach p < 4 × 10−225 for rs3135388 (De Jager et al. 2009a, b, c) and p < 1 × 10−320 for an unnamed SNP tagging the same DRB1 gene (IMSGC et al. 2011). As a rule, the risk alleles at these SNPs tag HLA-DRB1*15:01. In some studies, HLA class II was the only MS-associated locus (Comabella et al. 2008; Martinelli-Boneschi et al. 2012). In others, the number of significantly associated loci ranged from 2 (Jakkula et al. 2010; Sanna et al. 2010) to 60 (IMSGC et al. 2011).

Eighty-six loci associated with MS at p < 1 × 10−5 were identified in GWASs conducted from 2007 to 2014 (Table S1). Most of these loci were not consistently identified across GWASs. This inconsistency may reflect the clinical and ethnic heterogeneity of the samples as well as false-positive and/or false-negative results.

As genome coverage increases, more hypotheses are tested simultaneously, which makes false-positive associations more likely. Stringent significance requirements are therefore applied. The genome-wide significance threshold is set at p < 5 × 10−8, which corresponds to a Bonferroni correction for 1,000,000 comparisons at α = 0.05 (Pe’er et al. 2008; Stranger et al. 2011). Figure 1 (modified from http://www.ebi.ac.uk/fgpt/gwas/) shows the chromosomal positions of all loci associated with MS at this significance level (black circles). The background shows loci that had been associated with any complex disease (CD) at the same significance level by the time the diagram was constructed. MS risk loci were found on many, but not all, chromosomes and are unevenly distributed across them. MS-associated SNPs were most numerous on chromosomes 1 and 6 (seven and six loci, respectively) and were absent from autosomes 4, 13, 15, and 21 and from the sex chromosomes. In total, 47 loci were associated with MS at p < 5 × 10−8.
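The arithmetic behind the genome-wide threshold is a one-line Bonferroni correction. The sketch assumes about 1,000,000 independent tests and a family-wise error rate of 0.05. It also shows how many SNPs would pass a nominal p < 0.05 by chance alone if none were truly associated. The function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_tests = 1_000_000  # assumed number of independent common-variant tests
threshold = bonferroni_threshold(0.05, n_tests)
# Expected chance hits if every SNP were tested at nominal p < 0.05 and no SNP
# were truly associated.
expected_false_positives = 0.05 * n_tests
print(f"{threshold:.0e}")                # 5e-08
print(round(expected_false_positives))   # 50000
```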

Fig. 1

Chromosomal distribution of the loci associated with MS susceptibility at p < 5 × 10−8 in any GWAS included in the NHGRI-EBI GWAS Catalog as of April 17, 2015 (black circles). Background circles, loci associated with other complex diseases (CDs) at the same significance level by the time of figure preparation [modified from the EBI website (http://www.ebi.ac.uk/fgpt/gwas/)]. One locus on chromosome 1 shown in the original GWAS diagram is omitted here because it is irrelevant to this review (rs533259, associated with MS lesion distribution in the study of Gourraud et al. 2013)

Hypothesis-driven genome-wide association studies for multiple sclerosis

In the hypothesis-driven studies described below, which are complementary to GWASs, the genome coverage density is increased for the regions most likely involved in the disease.

This approach was first applied to MS in a study by the Wellcome Trust Case Control Consortium, the Australo-Anglo-American Spondylitis Consortium, et al. (WTCCC et al. 2007). The study also covered ankylosing spondylitis, autoimmune thyroid disease (Graves’ disease), and breast cancer, with 1500 shared controls and 1000 patients per disease. Genotyping was performed with a custom Infinium array (Illumina) covering 14,436 non-synonymous SNPs (nsSNPs) in protein-coding regions. In addition, because three of the diseases have an autoimmune etiology, a dense set of 897 SNPs throughout the MHC region was genotyped. The strongest associations in the study were between SNPs in the MHC region and the three autoimmune diseases, with p < 1 × 10−20 for each disease. For MS, the strongest signal was centered near HLA-DRB1, a well-known MS-associated gene.

To exploit the hypothesis-driven approach, available data on genetic susceptibility to 11 autoimmune and inflammatory disorders were used to design a custom Illumina microarray, the ImmunoChip (Parkes et al. 2013). The ImmunoChip includes more than 195,000 SNPs, with 2000 SNPs showing the most significant known associations for each autoimmune disorder, and provides dense coverage of 186 loci. In effect, this array allows deep replication of previously identified associations with MS and other immune-mediated diseases. Analysis with the ImmunoChip is both more efficient and less expensive.

IMSGC et al. (2013) applied the ImmunoChip to MS as a typical autoimmune inflammatory disorder. The first stage included more than 14,000 MS patients and more than 24,000 healthy individuals. The second stage added independent samples from earlier genome-wide studies, for a total sample size exceeding 80,000 (all subjects were Caucasian). This large study found a broad panel of highly significant associations. In particular, it confirmed association with MS at p < 5 × 10−8 for 49 polymorphic loci found earlier (IMSGC et al. 2011). It also found 48 new loci (Table S2) that had not been detected in any earlier GWAS and still need replication.

Another ImmunoChip study of MS was published recently (Isobe et al. 2015), and all of its subjects were African Americans. The first stage was performed on 803 MS patients and 1516 controls and replicated 21 of the 110 non-MHC MS risk loci established earlier in Europeans (Oksenberg 2013). The second stage, conducted on 620 MS cases and 1565 controls, aimed to replicate new associations, but none of the SNPs reached p < 1 × 10−5. The power of the study was relatively low because MS is rare in persons of African ancestry. Another problem is that it is not clear to what extent the ImmunoChip is applicable to investigating MS susceptibility in African Americans, because haplotype blocks are shorter in populations of more ancient ancestry (Lambert and Tishkoff 2009).

MS-associated loci replicated in independent genome-wide studies

Table 2 summarizes the results of GWASs and hypothesis-driven genome-wide studies. Given the inconsistency of genome-wide results, we present only those MS-associated loci that were identified in at least two independent studies. Because different microarrays include different SNP panels, several linked SNPs have been identified as conferring MS risk at the same locus. The following thresholds were set. In at least one study, the combined discovery and replication datasets had to reach the genome-wide significance level (p < 5 × 10−8); in the other study or studies, p had to be less than 1 × 10−5. Different thresholds are used because initial and replication studies test different numbers of hypotheses simultaneously.
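The replication rule used for Table 2 can be expressed as a small predicate. In this sketch, the locus names, study labels, and p values are hypothetical, and the studies are assumed to be independent. Sorting the p values is sufficient, because the criterion holds exactly when the smallest p value is below 5 × 10−8 and the second smallest is below 1 × 10−5.

```python
GENOME_WIDE = 5e-8   # required in at least one study
SUGGESTIVE = 1e-5    # required in at least one other, independent study


def meets_replication_criteria(p_by_study):
    """True if one study reaches p < 5e-8 and another independent study p < 1e-5.

    p_by_study maps an (independent) study label to the locus p value.
    """
    ps = sorted(p_by_study.values())
    return len(ps) >= 2 and ps[0] < GENOME_WIDE and ps[1] < SUGGESTIVE


# Hypothetical loci and p values, for illustration only.
loci = {
    "locus_A": {"study_1": 3e-12, "study_2": 4e-6},
    "locus_B": {"study_1": 2e-9},
    "locus_C": {"study_1": 7e-7, "study_2": 2e-6},
}
print([name for name, ps in loci.items() if meets_replication_criteria(ps)])  # ['locus_A']
```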

Table 2 Multiple sclerosis-associated alleles of polymorphic loci identified in at least two independent studies among the genome-wide studies and the complementary hypothesis-driven studies (WTCCC et al. 2007; IMSGC et al. 2013), with p < 5 × 10−8 in at least one of them

The loci meeting these criteria are summarized in Table 2. As of May 11, 2015, 40 loci, with 52 genes mapped to them, met these criteria. The association of their alleles with MS can therefore be considered validated, at least in Caucasians. These loci are located on all autosomes except 4, 7, 9, 13, 15, 18, and 21.

As Table 2 shows, most associated SNPs are located in introns or intergenic regions, and associations in exons or UTRs are considerably less frequent. A GWAS identifies MS-associated genomic regions rather than causal variants. Each MS-associated SNP may therefore be either a functional (causative) SNP or a tag SNP for a causative polymorphism that was not tested directly. A causative polymorphism may affect the functional activity, level, timing, or location of the gene product. So far, functional significance has been demonstrated for several polymorphisms. For example, rs6897932, rs2104286, and rs1800693 in the cytokine receptor genes IL7RA, IL2RA, and TNFRSF1A, respectively, influence the proportions of the membrane-bound and soluble forms of these receptors (Gregory et al. 2007, 2012; Maier et al. 2009).

For a tag SNP, the strength of the association signal increases with the LD between the tag SNP and the causative SNP. To identify causative SNPs, researchers either analyze additional SNPs in the associated loci that were not tested in the original GWAS or resequence the regions of interest. This approach is known as fine mapping. Fine-mapping analyses often use logistic regression to find the most strongly associated SNP and to detect independent associations within a locus. For example, a fine-mapping study (De Jager et al. 2009b) identified rs2300747 as the primary association in CD58. This SNP was therefore tested in the replication phase and in the meta-analysis of the concurrent GWAS (De Jager et al. 2009a, b, c). Independent effects were suggested for two SNPs of TNFRSF1A (De Jager et al. 2009a, b, c).
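The squared correlation r² between alleles at two SNPs quantifies how LD links a tag SNP to the strength of its signal. The sketch computes r² from hypothetical phased haplotype counts. It then applies a commonly used approximation: the non-centrality of the association test at a tag SNP is roughly r² times that at the causal SNP. Equivalently, a tag SNP needs about 1/r² times as many subjects for the same power. The function names are hypothetical, and the approximation ignores sampling noise in r² itself.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic SNPs from phased haplotype counts.

    A/a are the alleles at the causal SNP and B/b at the tag SNP. Both SNPs
    must be polymorphic, otherwise the denominator is zero.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B  # linkage disequilibrium coefficient D
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))


def approx_tag_chi2(causal_chi2, r2):
    """Approximate chi-square at a tag SNP: about r^2 times the causal signal."""
    return r2 * causal_chi2


r2 = ld_r_squared(40, 10, 10, 40)               # hypothetical haplotype counts
print(round(r2, 2))                              # 0.36
print(round(approx_tag_chi2(40.0, r2), 1))       # 14.4
```

In this example, a causal signal with a chi-square of 40 would appear at the tag SNP with a chi-square of only about 14. This is why fine mapping with denser SNP sets can sharpen an association peak.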

Table 2 shows substantial differences in p values for a given locus. These differences may reflect the statistical power of the study (the lowest p values were obtained in the best-powered studies), the ethnic specificity of associations with MS, the SNP panels used, and other factors. Notably, most loci that meet the replication criteria used here also satisfy a more stringent requirement: they were identified at the genome-wide significance level in at least two studies. This applies in particular to loci identified in the IMSGC GWAS (IMSGC et al. 2011) and replicated in the ImmunoChip study (IMSGC et al. 2013). Several loci, including the HLA, IL2RA, and IL7RA genes, were identified in almost all studies and reached p < 5 × 10−8 in many or all of them. Overall, p values as low as p < 1 × 10−320 for HLA and 2.3 × 10−47 for non-HLA loci (IL2RA gene) were obtained (IMSGC et al. 2011, 2013).

The odds ratio (OR) is a commonly used measure of effect size. For the HLA locus, ORs range from 2.05 to 3.3. In contrast, the MS risk alleles of non-HLA genes exert only modest effects, with ORs of 1.1 to 1.3, with few exceptions (Table 2).
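For a single SNP, an allelic odds ratio and its confidence interval can be computed directly from a 2 × 2 table of allele counts. The sketch below uses Woolf's log-scale interval. The counts and function name are hypothetical. The ORs in Table 2 come from the original studies, which may have estimated them with logistic regression and covariate adjustment rather than from raw counts.

```python
import math


def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.959963984540054):
    """Allelic odds ratio with a Woolf (log-scale) 95% confidence interval."""
    or_ = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical allele counts: 1000 cases and 1000 controls (2000 alleles each).
or_, lo, hi = allelic_odds_ratio(600, 1400, 500, 1500)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The resulting OR of about 1.3 falls within the range reported for non-HLA loci. Its interval excludes 1 only because the hypothetical sample is fairly large, which shows why modest effects require large studies.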

Genes carrying GWAS-identified polymorphisms: possible biological roles in pathogenesis of multiple sclerosis

Numerous studies report that MS development is mediated by autoimmune inflammation in the CNS. This inflammation results in demyelination, oligodendrocyte destruction, axonal breakdown, gliosis, and neurodegeneration, which lead to irreversible neurological dysfunction (Goverman 2009).

Figure 2 represents the main stages of MS pathogenesis. Peripheral anergic autoreactive lymphocytes (mainly T cells) are activated by microbial superantigens or by self-antigens with enhanced immunogenicity, in particular as a result of chronic inflammation (Tauber et al. 2007). The next step is penetration of these cells through the blood–brain barrier (BBB). The BBB is a complex structure of cerebral endothelial cells, pericytes, and their basal lamina, surrounded and supported by astrocytes and perivascular macrophages (Ortiz et al. 2014). A cytokine imbalance toward increased production of pro-inflammatory cytokines by T-helper 1 (Th1) and T-helper 17 (Th17) cells (Hedegaard et al. 2008) promotes the expression of adhesion molecules and HLA class II molecules on BBB endothelial cells (Dore-Duffy et al. 1993). In addition, chemokines and pro-inflammatory cytokines, assisted by matrix metalloproteinases, may impair the integrity of tight junctions between endothelial cells, facilitating the migration of leukocytes into the CNS (Holman et al. 2011). In the CNS, resident or recruited antigen-presenting cells (APCs) reactivate the activated myelin-specific T and B cells. Activation of microglia and macrophages increases phagocytosis and the production of cytotoxic agents such as oxidative radicals, NO, tumor necrosis factor (TNF), and glutamate, which contribute to myelin sheath damage (Lovett-Racke et al. 2011; Cunningham 2013). Activated B cells produce antibodies to myelin proteins and lipids, thereby activating the complement system. Complement activation forms membrane attack complexes that directly damage the myelin sheath (von Büdingen et al. 2011). Activated CD8+ T lymphocytes participate in oligodendrocyte destruction and axonal breakdown via FAS/FAS-ligand-mediated cytolysis (Denic et al. 2013). Conversely, FoxP3-expressing natural regulatory T cells (Tregs) and IL-10-producing T regulatory type 1 (Tr1) cells play a crucial role in restricting autoimmune neuroinflammation and establishing immunological tolerance (Kleinewietfeld and Hafler 2014).

Fig. 2

The main stages of MS pathogenesis (see description in text). APC antigen-presenting cell, BDNF brain-derived neurotrophic factor, HLA/MHC human leukocyte antigen/major histocompatibility complex, IFNg interferon gamma, GA glatiramer acetate, IL interleukin, NGF nerve growth factor, TNF tumor necrosis factor, TGFβ transforming growth factor beta, Th T-helper cell, Treg T regulatory cell, FasL Fas-ligand

Autoimmune mechanisms are most intense during disease exacerbations in RRMS. The chronic forms of MS are characterized by more extensive myelin degeneration and oligodendrocyte and axonal loss (Lassmann et al. 2007). However, axonal loss has been shown in the brain tissue of MS patients, both in acute and chronically active lesions and in normal-appearing white matter (Nylander and Hafler 2012). Many different immunological mechanisms may lead to axon injury. They include destruction by specific T cells, activated microglia, invading macrophages, natural killer cells, and autoantibodies against specific antigens (Iwanowski and Losy 2015). In addition, several non-immunological mechanisms can contribute to axon injury and neurodegeneration, such as an imbalance of neurotrophic factors (Hohlfeld 2008) and glutamate excitotoxicity (Stys 2005).

SNPs identified and replicated in GWASs of MS (Table 2) are located mostly in or near protein-coding genes and relatively rarely in known RNA-coding genes. Most MS-associated protein-coding genes are directly involved in immune-related functions. Relevant references are available in the GeneCards human gene database (http://www.genecards.org), in the Gene database of the US National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/gene), and in other sources. Here, we performed a brief Gene Ontology (GO) search to survey the functions enriched in this gene set. We searched for the biological processes of the genes from Table 2 in the AmiGO database (http://amigo.geneontology.org/amigo/landing), using the Bonferroni correction provided by the website. Fifty-two GO terms were significantly associated with this gene set (p < 0.05). The search returned no results for MIR1204, MIR1205, MIR1208, MIR3686, and PVT1. To remove redundant GO terms, we entered all significant GO terms and their p values into the REViGO web server (http://revigo.irb.hr/; Supek et al. 2011). Fig. 3 shows one of the convenient graphical representations available on the server. Most of the genes are involved in lymphocyte activation, cytokine response, cell–cell adhesion, regulation of the immune response, and other immune-related functions, as well as in regulation of development.
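Term enrichment of this kind is commonly assessed with a one-sided hypergeometric (Fisher's exact) test followed by a multiple-testing correction. The sketch below shows that calculation for a single GO term with a Bonferroni adjustment. It may not reproduce the exact test or background gene set used by AmiGO. All numbers in the example are hypothetical: annotation counts, genome size, and number of terms tested.

```python
import math


def hypergeometric_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = math.comb(population, draws)
    hits = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, min(successes, draws) + 1)
    )
    return hits / total


def go_term_enrichment_p(k, set_size, term_size, genome_size, n_terms_tested):
    """One-sided enrichment p value for one GO term, Bonferroni-adjusted.

    k genes of a set of set_size genes carry a term annotated to term_size of
    genome_size background genes; n_terms_tested terms were examined.
    """
    p = hypergeometric_upper_tail(k, genome_size, term_size, set_size)
    return min(1.0, p * n_terms_tested)


# Hypothetical numbers: 10 of 52 genes carry a GO term annotated to 200 of
# 20,000 genes, and 5,000 terms are tested.
print(go_term_enrichment_p(10, 52, 200, 20_000, 5_000))
```

Only about 0.5 annotated genes would be expected by chance in this hypothetical setting. Observing 10 therefore remains highly significant even after multiplying by the number of terms tested.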

Fig. 3

The main biological processes in which the genes from Table 2 are involved. The genes were analyzed on the AmiGO website (http://amigo.geneontology.org/amigo/landing); all GO terms significantly (p < 0.05) associated with this gene set, together with their p values, were then submitted to the REViGO web server (http://revigo.irb.hr/; Supek et al. 2011). The results are presented as a TreeMap; the size of each square reflects the log p value of the association between the process and the selected genes

The HLA locus is an essential component of the immune response and of immune system development. After 10 years of GWASs, the MHC region still accounts for roughly half of the MS genetic risk. The highly significant association of the HLA locus with MS is consistent with the autoimmune nature of the disease. A genotypic risk gradient, ascending from DRB1*11:01/DRB1*11:01 to DRB1*15:01/DRB1*15:01, has been described (Oksenberg et al. 2008). Multiple epistatic interactions occur in the class II region, where certain genes nullify the risk conferred by DRB1 genotypes and other genes act as modifiers of disease severity. Independent signals in the telomeric class I region of the HLA locus are likely to confer protection against MS (Oksenberg 2013). Further studies with denser coverage of the class I and class III HLA regions are needed to identify other MS-associated genes in the locus.

Among the non-HLA MS-associated genes presented in Table 2, a substantial proportion are directly involved in T cell function, which may indicate a leading role of T cell immunity in MS development. We briefly consider the known functions of the protein products of these genes. T cell differentiation and the survival of immature CD4+CD8+ thymocytes depend on the transcription factor TCF7, whose activation is mediated by the prostaglandin E receptor 4 (PTGER4). Cytokines and cytokine receptors (IL12, IL7R, and IL2R), as well as the costimulatory molecules CD58 and CD86, also play a substantial role in T cell proliferation and differentiation. T cell activation depends on the regulatory protein encoded by TAGAP, and the maintenance of activation depends on expression of the CD6 receptor on the T cell surface. In turn, the product of NDFIP1 prevents T-helper-mediated inflammation from progressing to a chronic phase. TYK2 kinase is involved in T-helper polarization; for instance, reduced TYK2 activity is associated with a Th2 bias of T-helper differentiation.

The products of CD40, RGS1, and CXCR5 are directly involved in B cell function. A large group of genes encodes proteins involved in the NF-κB signaling pathway and the inflammatory response. This group includes genes for a member of the tumor necrosis factor (TNF) superfamily (TNFSF14), a TNF-induced protein (TNFAIP3), a TNF receptor (TNFRSF1A), a cytokine receptor (IL22RA2), and an adhesion molecule (VCAM1). The protein products of STAT3, IRF8, TYK2, and ZMIZ1 are involved in the JAK/STAT signaling pathways, which regulate cytokine expression and shape the development of the immune response.

Some of the genes presented in Table 2 (EOMES, OLIG3, GALC, AHI1, RPL5, and MMEL1) are expressed, to varying degrees, in the nervous system. The transcriptional activator eomesodermin, encoded by EOMES, plays an essential role in early embryonic and brain development and is also involved in CD8+ T cell differentiation. GALC encodes galactosylceramidase, which catalyzes the lysosomal catabolism of galactosylceramide, a major glycolipid of the myelin sheath.

The role that the other GWAS-identified genes play in MS susceptibility is less clear. These genes encode transcription factors (e.g., the repressor HHEX, which is involved in hematopoietic cell differentiation, and the proto-oncogene MYC, an important regulator of the cell cycle), cytochrome P450 family proteins (CYP24A1 and CYP27B1), important components of various signaling pathways (e.g., the protein kinase MAPK1, the regulator of G protein signaling RGS14, and the transcription factor BACH2), and many enzymes (among them the methyltransferase METTL1, which is involved in tRNA methylation, and dimethylarginine dimethylaminohydrolase DDAH1, which plays a role in nitric oxide generation). The function of some genes remains unknown (e.g., CLEC16A, although there is evidence that it is expressed almost exclusively in cells of the immune system).

Several microRNA genes (MIR1204, MIR1205, MIR1208, and MIR3686) map to the MS-associated locus 8q24.21. This locus also harbors PVT1, which encodes a regulatory RNA of the long noncoding RNA class. Because most of the identified SNPs are located in introns of protein-coding genes or in intergenic regions (Table 2), the list of MS-associated RNA-coding genes is likely to be extended. These findings and considerations may indicate that epigenetic regulation plays an important role in MS pathogenesis.

Overall, the results of both classical and hypothesis-driven GWASs, considered in the context of the biological functions of the identified genes, are in line with the current concept of MS as an autoimmune disease of the CNS and can further expand our knowledge of MS pathogenesis. Notably, about one-third of MS-associated genes were also shown to be associated with other autoimmune disorders in the ImmunoChip study (Oksenberg 2013).

Main conclusions and future perspectives

GWASs have identified multiple loci and genes involved in MS development, leading, in the figurative expression of Oksenberg (2013), to the creation of a new genetic atlas of MS. The currently available list of genes has already provided a better understanding of the potential molecular mechanisms involved in MS pathogenesis. However, we are apparently still at an early stage of this process. Recent estimates based on sibling studies (Sadee et al. 2014) suggest that the MS risk loci identified to date explain only about 27 % of its total heritability (Lill 2014), and HLA-DRB1*15:01 alone explains about 20 % of heritability (Lill 2014). Several explanations have been proposed for this “missing heritability” (Manolio et al. 2009; Sadee et al. 2014) in complex diseases, of which MS is a typical example.

One of them is the poor reproducibility of GWAS-identified MS loci in other GWASs or in subsequent candidate gene studies. Indeed, stringent significance thresholds do not completely prevent false-positive findings, yet they can prevent the detection of some true associations when studies have insufficient statistical power.
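For orientation, the conventional genome-wide significance threshold can be read as a Bonferroni correction for multiple testing, assuming roughly one million independent common-variant tests across the genome (a widely used approximation, not a figure taken from the studies cited here):

$$
\alpha_{\mathrm{GW}} = \frac{\alpha}{m} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
$$

Under this threshold, a true association with, for example, p = 10^{-6} in a given study is nominally significant but still fails genome-wide significance; detecting it would require a larger sample, which is why insufficient statistical power can hide real risk loci.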

Another possible reason for the “missing heritability” is that the GWAS design is based on the common disease–common variant (CDCV) hypothesis, which holds that common alleles (minor allele frequency, MAF, above 1 %) with weak effects provide the genetic basis for common diseases. The CDCV hypothesis is applicable to polygenic diseases, including MS (Bush and Moore 2012), but it ignores the possible effect of rare alleles (with a frequency of 0.01–0.1 %), which may contribute substantially to the disease (Visscher et al. 2012). Discovering rare polymorphisms that may confer high risk in particular individuals requires whole-genome sequencing, which is still too expensive to perform for every individual.

“Missing heritability” may also result from the traditional locus-by-locus association analysis (Zuk et al. 2012), whereas epistasis between loci or pathways may contribute to genetic susceptibility to MS (Lvovs et al. 2012; Cotsapas and Hafler 2013). The bioinformatics analysis currently used in GWASs does not capture risk factors determined by nonlinear (epistatic) interactions between alleles (both rare and common) within an individual allele set (Lvovs et al. 2012), nor interactions between genotype and nongenetic factors.
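Why locus-by-locus analysis can miss epistasis is easiest to see in a toy two-locus model. In the sketch below, the penetrance values and the 50 % carrier frequency at each locus are hypothetical, and the two loci are assumed to be independent; disease risk depends only on the combination of carrier states, so each locus tested alone shows no difference in risk, although the joint genotype changes risk threefold.

```python
from itertools import product

# Hypothetical two-locus penetrance table (illustrative only):
# P(disease | carrier status at locus A, carrier status at locus B), 1 = carries risk allele.
PENETRANCE = {
    (0, 0): 0.01,
    (0, 1): 0.03,
    (1, 0): 0.03,
    (1, 1): 0.01,
}
CARRIER_FREQ = 0.5  # hypothetical carrier frequency at each locus; loci assumed independent


def status_prob(status: int) -> float:
    """Probability of a given carrier status at one locus."""
    return CARRIER_FREQ if status == 1 else 1.0 - CARRIER_FREQ


def marginal_risk(locus: int, status: int) -> float:
    """Risk for `status` at one locus, averaged over the other locus (single-locus view)."""
    total = 0.0
    for genotype in product((0, 1), repeat=2):
        if genotype[locus] != status:
            continue
        other_status = genotype[1 - locus]
        total += status_prob(other_status) * PENETRANCE[genotype]
    return total


for locus in (0, 1):
    ratio = marginal_risk(locus, 1) / marginal_risk(locus, 0)
    print(f"locus {'AB'[locus]}: single-locus risk ratio = {ratio:.2f}")  # 1.00 for both loci

joint_ratio = PENETRANCE[(0, 1)] / PENETRANCE[(0, 0)]
print(f"joint genotype risk ratio = {joint_ratio:.2f}")  # 3.00
```

Real epistasis in MS is unlikely to be this extreme; the point of the purely interactive model is only that a single-locus test can report no signal even when the joint genotype has a strong effect.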

When interpreting GWAS results, it is useful to keep in mind that many common adaptive variants, which remain in the noise of GWAS results, participate in gene–gene interactions and can become deleterious under unfavorable conditions (Sadee et al. 2014). Recently, novel approaches to GWAS analysis have been proposed. These strategies focus on the additive effects of multiple loci, acknowledging that each may make a small contribution to the overall phenotype, and can potentially provide valuable insights into the genetic basis of common diseases (IMSGC et al. 2007; ISC et al. 2009). Rather than focusing on individual markers, network-based analysis methods consider multiple loci in the context of molecular networks. Because of this feature, these methods can use sub-genome-wide (nominal) statistical significance and increase the power to detect new associations and functional relationships between genes in complex traits (Wang et al. 2015). By applying network-based analysis to the results of Patsopoulos et al. (2011) and IMSGC et al. (2011), the IMSGC (2013) study suggested several plausible candidate genes. Of these, BCL10 and TRAF3 reached genome-wide significance in the ImmunoChip study (IMSGC et al. 2013). Analyzing nominal gene-level significance and studying genes in the context of biological networks seem a reasonable approach for future GWAS analysis (Matsushita et al. 2015).
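The idea that several nominally significant genes can jointly yield a stronger signal once they are grouped by a network can be sketched with Stouffer's Z method for combining p values. This is a swapped-in illustrative technique, not the specific algorithm used in the IMSGC (2013) network analysis; the gene labels and p values below are hypothetical, and the method assumes the gene-level tests are independent.

```python
from math import sqrt
from statistics import NormalDist
from typing import Iterable

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def stouffer_combined_p(p_values: Iterable[float]) -> float:
    """Combine one-sided p values with Stouffer's Z method (equal weights, independent tests)."""
    z_scores = [STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - STD_NORMAL.cdf(z_combined)


# Hypothetical network module: gene-level p values for interacting genes (illustrative only).
# None of them would pass a genome-wide threshold on its own.
module = {"GENE_A": 0.01, "GENE_B": 0.02, "GENE_C": 0.03, "GENE_D": 0.01}

combined = stouffer_combined_p(module.values())
print(f"smallest single-gene p = {min(module.values())}, module-level p = {combined:.1e}")
```

In real data, genes in the same module may share linkage disequilibrium or a locus, which violates the independence assumption and inflates the combined signal; published network methods handle this with permutation or other corrections.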

Another approach considers association signals from haplotypes (SNP strings) instead of single SNPs (Khankhanian et al. 2015). In this approach, one searches for clusters of SNPs that are jointly associated with the disease and presumably belong to a particular disease-associated haplotype. In this study, 32 regions were more significantly associated under the SNP-haplotype model than under the single-SNP model in the individuals studied by IMSGC et al. (2011, 2013). With this kind of analysis, the area under the curve characterizing the overall heritability explained by haplotypes was higher than that for individual SNPs (Khankhanian et al. 2015). Notably, the odds ratios (ORs) of haplotype associations are considerably higher than the ORs of the individual SNPs (Goodin and Khankhanian 2014).
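A toy example shows how a haplotype can carry a stronger association than either of its SNPs taken separately. The sketch below assumes phased two-SNP haplotypes (alleles A/a and B/b) with hypothetical chromosome counts in cases and controls; it computes simple allelic odds ratios and is not the statistical model of Khankhanian et al. (2015).

```python
from typing import Callable, Dict

# Hypothetical phased two-SNP haplotype counts (chromosomes) in cases and controls;
# illustrative only, not data from any MS study.
CASES: Dict[str, int] = {"AB": 200, "Ab": 150, "aB": 150, "ab": 500}
CONTROLS: Dict[str, int] = {"AB": 100, "Ab": 200, "aB": 200, "ab": 500}


def odds_ratio(carries: Callable[[str], bool]) -> float:
    """Allelic odds ratio for chromosomes where carries(haplotype) is True vs. all others."""
    case_pos = sum(n for h, n in CASES.items() if carries(h))
    case_neg = sum(CASES.values()) - case_pos
    ctrl_pos = sum(n for h, n in CONTROLS.items() if carries(h))
    ctrl_neg = sum(CONTROLS.values()) - ctrl_pos
    return (case_pos * ctrl_neg) / (case_neg * ctrl_pos)


or_snp1 = odds_ratio(lambda h: h[0] == "A")  # single-SNP model, SNP 1 (allele A)
or_snp2 = odds_ratio(lambda h: h[1] == "B")  # single-SNP model, SNP 2 (allele B)
or_hap = odds_ratio(lambda h: h == "AB")     # haplotype model (A and B on the same chromosome)

print(f"SNP1 OR = {or_snp1:.2f}, SNP2 OR = {or_snp2:.2f}, haplotype AB OR = {or_hap:.2f}")
# prints SNP1 OR = 1.26, SNP2 OR = 1.26, haplotype AB OR = 2.25
```

Here the risk is concentrated on the AB combination, so each single-SNP test dilutes it with the Ab and aB chromosomes. In genotype data, haplotype phase is usually inferred statistically rather than observed, which adds uncertainty not modeled in this sketch.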

Another aspect of the problem became clear in 2012 after the completion of the ENCODE project, which aimed to identify the functional elements of the human genome. The report on its completion (ENCODE Project Consortium et al. 2012) stated, in particular, that 80.4 % of the genome is functional, being involved in at least one biochemical event in at least one cell type. This conclusion indicates that, to understand the nature of MS, it may be insufficient to analyze associations with allelic variants located in or near protein-coding regions, which account for approximately 1.5 % of the human genome. Other genome regions need to be deciphered as well. Indeed, most of the SNPs associated with MS in GWASs are located in introns and intergenic regions (see, e.g., Table S1 and Table 2). There is growing evidence that noncoding DNA regions are extensively involved in the regulation of gene expression via expression quantitative trait loci (eQTLs), noncoding regulatory RNAs, and other mechanisms, and some data suggest that eQTLs are abundant among MS risk loci (Sawcer et al. 2014). It is therefore reasonable to expect new GWAS SNP panels with more in-depth coverage of RNA-coding regions. In addition, copy number variations remain underexplored in the current GWAS design (Lin et al. 2012).

These considerations suggest that next-generation sequencing (NGS) techniques, based on high-throughput parallel DNA sequencing, will be employed in GWASs of genome variants associated with particular phenotypes, especially as NGS techniques continue to improve and become cheaper (Pavlopoulos et al. 2013). The use of these techniques does not alter the basic rationale of GWASs but rather develops it further.

New studies may identify new targets for MS therapy. For instance, GWAS data suggest searching for target proteins among components of the NF-κB signaling pathway. Apart from shedding light on the molecular pathogenesis of MS, which is required for developing new MS therapies, GWASs aim to identify, via genotyping, genetic risk factors suitable for predicting individual susceptibility to MS. In other words, the most important result of studying the genetic architecture of MS would be a rule that predicts, with high probability, the disease in a particular individual from the carriage of certain alleles and allelic combinations. Studies estimating individual MS risk on the basis of GWAS data alone (Wang et al. 2011) or in combination with findings from candidate gene studies (Jafari et al. 2011; Gourraud et al. 2011) have yielded some encouraging results, although these are not yet clinically applicable. For example, individuals carrying more MS risk variants (estimated using the weighted Genetic Risk Score) have increased odds of developing MS (De Jager et al. 2009c). Patients with severe disease, more oligoclonal bands, or an earlier age at onset typically have a higher genetic load (i.e., the number of MS genetic markers carried, assessed using the Multiple Sclerosis Genetic Burden score) (Gourraud et al. 2011). Patients with clinically isolated syndrome who have a higher genetic load tend to develop MS more rapidly (Gourraud et al. 2012).
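A weighted genetic risk score is commonly computed as the sum, over risk variants, of the number of risk alleles carried (0, 1, or 2) multiplied by the natural logarithm of the per-allele odds ratio. The sketch below follows that common formulation; it is not necessarily the exact procedure of De Jager et al. (2009c), and the variant labels and odds ratios are hypothetical. It also contrasts the weighted score with a simple unweighted count of risk alleles, which is not the Multiple Sclerosis Genetic Burden score itself.

```python
import math
from typing import Dict

# Hypothetical risk variants with hypothetical per-allele odds ratios
# (illustrative only, not taken from any MS study).
RISK_ALLELE_OR: Dict[str, float] = {
    "variant_1": 3.0,
    "variant_2": 1.2,
    "variant_3": 1.1,
}


def _checked_count(genotype: Dict[str, int], variant: str) -> int:
    count = genotype.get(variant, 0)
    if count not in (0, 1, 2):
        raise ValueError(f"{variant}: risk allele count must be 0, 1 or 2, got {count}")
    return count


def weighted_grs(genotype: Dict[str, int]) -> float:
    """Sum over variants of (risk alleles carried) x ln(per-allele OR)."""
    return sum(_checked_count(genotype, v) * math.log(or_) for v, or_ in RISK_ALLELE_OR.items())


def risk_allele_count(genotype: Dict[str, int]) -> int:
    """Unweighted total number of risk alleles carried."""
    return sum(_checked_count(genotype, v) for v in RISK_ALLELE_OR)


person_1 = {"variant_1": 0, "variant_2": 2, "variant_3": 2}  # many weak risk alleles
person_2 = {"variant_1": 1, "variant_2": 0, "variant_3": 0}  # one strong risk allele

for name, genotype in (("person_1", person_1), ("person_2", person_2)):
    print(name, risk_allele_count(genotype), round(weighted_grs(genotype), 3))
```

With these hypothetical values, person_1 carries more risk alleles (4 vs. 1) but has the lower weighted score, because the weighting lets one high-OR allele outweigh several low-OR alleles, as the HLA-DRB1*15:01 example in this section would suggest.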

The genetic atlas of MS will undoubtedly be refined in the future by adding new findings and excluding data that lack support. Given the heterogeneity of the MS patient samples examined so far, several partly different atlases can be expected for different ethnic groups and different MS forms once replication studies are performed in homogeneous samples.