Introduction

Parkinson’s disease (PD) is the second most common neurodegenerative disorder, affecting more than 1% of the elderly population. Clinical manifestations include bradykinesia, resting tremor, muscular rigidity, and postural instability. Essential neuropathological features are loss of dopaminergic neurons in the substantia nigra and the presence of eosinophilic intracytoplasmic inclusions known as Lewy bodies and Lewy neurites. Despite early reports of familial parkinsonism, the disorder was long believed to be environmental with no genetic component [1].

However, since the 1990s, an increasing amount of evidence has established the importance of genetic factors in the etiology of PD. Mutations causing Mendelian forms of the disease have been identified on the basis of linkage studies in large families, and association studies have implicated several loci in sporadic disease. Knowledge about the genetic architecture of PD has provided important clues about pathogenic pathways and generated hypotheses for further research [2, 3]. Characterizing the role of SNCA, encoding the α-synuclein protein, has been a particularly important aspect of this development. Mutations in SNCA were the first to be identified in monogenic PD, and variants in this gene show the strongest association with sporadic disease. α-Synuclein is a main component of Lewy bodies. The aim of this paper is to review our current understanding of genetic variability at the SNCA locus in relation to PD.

Gene and protein

α-Synuclein is part of a highly conserved protein family called synucleins. The name derives from the fact that it was first identified in the synapses and nucleus in the torpedo ray [4]. In humans, the protein was originally isolated from purified preparations of amyloid from the cortex of patients with Alzheimer’s disease (AD) and named NACP (non-Aβ component of AD amyloid precursor) [5]. Based on homology with similar proteins in other species, the now more common name α-synuclein was introduced [6].

α-Synuclein is encoded by SNCA, located on chromosome 4q22. This gene comprises 6 exons, where the latter 5 are translated into a protein of 140 amino acids. Shorter isoforms are generated by alternative splicing (Fig. 1). It is worth noting that the current designation of exons and introns differs from early publications [7]. Graphical diagrams of SNCA also vary depending on whether the gene is displayed in the direction from the centromere, 3′–5′ on the reverse strand, or in the direction of transcription 5′–3′.

Fig. 1

a Alternative splicing of mRNA gives rise to α-synuclein isoforms of three different sizes. b Immunohistochemistry for α-synuclein showing positive staining of an intraneural Lewy body in the substantia nigra in Parkinson's disease [94]

The normal function of α-synuclein is incompletely understood. It is predominantly expressed in neurons and localized to presynaptic terminals. A number of functions have been proposed [8], with recent evidence supporting a role in the cycle of repeated neurotransmitter release from presynaptic vesicles [9]. The toxic potential of α-synuclein is thought to depend on its capacity to aggregate and accumulate in the nerve tissue, although the causal relationships between different pathological cellular events are unclear. The proposed presence of α-synuclein in AD plaques has subsequently been questioned [10].

However, the protein was histopathologically linked to neurodegeneration when a study using immunohistochemical staining identified α-synuclein as the major component of Lewy bodies, the hallmark lesion of PD [11] (Fig. 1). Notably, this discovery was prompted by the recognition of SNCA as the first gene implicated in PD [12]. The development of α-synuclein antibodies has been of importance for pathological examinations in PD, leading to an increased understanding of the widespread pathologies in different parts of the brain as the disease progresses. Based on staging of PD lesions, it has been proposed that the pathology spreads throughout the nervous system in a stereotypic pattern, possibly originating in the gut or nose [13]. The idea of a spreading disease mechanism is also supported by reports of Lewy bodies observed in fetal neurons transplanted into the striatum of PD patients more than a decade earlier [14, 15]. A currently debated hypothesis states that misfolded α-synuclein may transfer between cells and induce further misfolding and aggregation in a prion-like manner [16].

Different patterns of α-synuclein pathology, both with and without Lewy bodies, have been characterized in other neurodegenerative diseases. These discoveries have suggested a molecular link between disorders that were previously conceived as unrelated, yielding the new concept of “synucleinopathies” [17, 18]. This term commonly refers to PD, dementia with Lewy bodies (DLB) and some rare Lewy body disorders, as well as multiple system atrophy, where α-synuclein-positive inclusions are found primarily in glial cells. However, lesions containing α-synuclein have also been shown in a variety of other neurodegenerative disorders, such as familial forms of AD, motor neuron disease, ataxia-telangiectasia and others.

SNCA in autosomal dominant PD

Missense mutations

While early reports of hereditary PD described small pedigrees lacking autopsy confirmation, they served to inspire later researchers studying familial disease [19, 20]. A 1990 publication reported two large kindreds with autosomal dominantly inherited PD affecting 41 individuals in four generations [19]. Both kindreds originated from Contursi, a village in southern Italy, where, tracing the generations back to 1700 A.D., they were found to share common ancestors. The clinical appearance resembled sporadic PD, but with earlier onset, and autopsy of two patients revealed typical PD pathology with Lewy bodies. The PD phenotype in the Contursi kindred was later mapped to chromosome 4q21-23 by linkage analysis [21]. The penetrance in the family was estimated at 85%. Positional cloning and sequence analysis identified a missense mutation in SNCA (c.157G>A) resulting in a p.Ala53Thr substitution in the amino acid sequence [12]. The mutation segregated with PD in the Italian kindred and in three Greek families with a similar PD phenotype and pattern of inheritance.
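The correspondence between the nucleotide notation (c.157G>A) and the protein notation (p.Ala53Thr) follows from simple codon arithmetic. The sketch below illustrates it under stated assumptions: the function names are hypothetical, only the two codons needed are tabulated, and the reference codon GCA is assumed purely for illustration (any GCN alanine codon would give the same result, because the substitution hits the first codon position).

```python
# Minimal sketch: map a coding-DNA substitution to its codon and amino acid change.
# Assumption: the reference codon is supplied by hand, not read from a sequence file.

# Partial codon table covering only the codons used in this example.
CODON_TABLE = {
    "GCA": "Ala",
    "ACA": "Thr",
}


def codon_position(cdna_pos):
    """Return (codon number, 0-based offset in the codon) for a 1-based c. position."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3


def apply_substitution(ref_codon, cdna_pos, ref_base, alt_base):
    """Return a p.-style description of the change caused by one base substitution."""
    codon_number, offset = codon_position(cdna_pos)
    if ref_codon[offset] != ref_base:
        raise ValueError("reference base does not match the codon")
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    return f"p.{CODON_TABLE[ref_codon]}{codon_number}{CODON_TABLE[alt_codon]}"


# Position 157 is the first base of codon 53; G>A turns GCA (Ala) into ACA (Thr).
print(apply_substitution("GCA", 157, "G", "A"))
```

The same arithmetic applies to any single-base substitution in the coding sequence, provided the codon table is extended to the codons involved.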

The SNCA p.Ala53Thr mutation was the first genetic cause of PD to be recognized and provided the initial link between α-synuclein and PD pathology. Demonstrating how a single gene defect could be sufficient to determine the disease phenotype, the discovery also suggested that genetic factors could be more important for PD etiology in general than previously thought. Two further missense mutations have subsequently been identified, p.Ala30Pro (c.88G>C) in a German family with typical PD [22] and p.Glu46Lys (c.136G>A) in a Spanish family with a phenotype more resembling DLB [23].

Genomic multiplications

Another landmark discovery in the study of SNCA was the identification of genomic multiplications. A large family with dominantly inherited parkinsonism, often referred to as the Iowa kindred, was reported as early as 1962 [24]. Later studies of the same kindred described the phenotype as Dopa-responsive early onset parkinsonism with rapid progression, followed later by dementia [20]. Neuropathological examination showed widespread Lewy body pathology, and abundant α-synuclein inclusions were revealed with immunostaining [25]. Although SNCA mutations were initially thought to be excluded [26], further studies revealed linkage to this chromosomal region where a triplication of the entire locus was discovered [27]. A triplication of independent genetic origin, but with corresponding phenotype, was later identified in a Swedish–American family [26]. Subsequent screening for SNCA multiplications in patients with autosomal dominant PD also identified families with duplications [28, 29]. Interestingly, these patients had a more benign phenotype, resembling sporadic PD with later onset, slow progression, and less prominent cognitive decline compared to individuals with triplication. A study characterizing the multiplication breakpoints in different kindreds found that SNCA was the only gene showing copy number variation in all families [30]. The mechanisms behind multiplications appear to be both segmental gene duplication and recombination events with unequal crossing over, arising in relation to transposable repeat elements.

The recognition of SNCA multiplications has yielded several important insights. Overexpression of normal, wild-type α-synuclein is sufficient to cause parkinsonism [26]. Furthermore, the relationship seems to be dose dependent, with individuals carrying four gene copies being more severely affected than those carrying three copies. Finally, the discovery has contributed to a better understanding of the pathological interconnection between PD and DLB [31], including the controversy over whether these are distinct entities or represent a continuum of the same disease.

Both missense mutations and genomic multiplications of SNCA are extremely rare causes of PD. Mendelian forms of PD account for less than 10% in most populations. From the proportion of these being autosomal dominant, SNCA multiplications have been estimated to be the cause of about 2% [32]. Missense mutations seem even rarer. However, insights about α-synuclein and its role in PD pathogenesis also made SNCA an ideal candidate gene for association studies in sporadic disease.

SNCA in association studies

The detection of Mendelian disease loci through linkage analysis requires large families with many affected individuals for sufficient statistical power. However, with the exception of LRRK2 mutations in certain populations, monogenic forms of PD are rare. In later years, much of the attention has shifted towards association studies, seeking to identify risk variants for sporadic disease. Sporadic PD, like other complex diseases, is assumed to be caused by interactions of multiple genetic and environmental factors. Stochastic events may also contribute to disease if unfortunate molecular states occurring at random, such as protein misfolding, can initiate pathological processes [2]. Whereas the first PD association studies in the late 1990s analyzed a limited number of markers in few subjects, recent years have seen large consortia undertaking genome-wide association studies (GWAS) in thousands of samples. The PDGene website provides an updated synopsis of genetic association studies in PD, including meta-analysis of identified variants [33]. SNCA steadily ranks as the site’s “top result”, and the cited studies of SNCA now correspond to more than 60 publications (Table 1).

Table 1 Examples of genetic association studies of SNCA in Parkinson’s disease

Rep1

Early technology favored microsatellites as the most convenient class of polymorphism for association studies. The first published SNCA association study analyzed a mixed dinucleotide repeat in relation to AD [34]. This polymorphism, located in the promoter region and commonly referred to as Rep1 (D4S3481), came to be one of the most extensively studied susceptibility markers for PD. Early studies typically included small numbers of subjects, and results were contradictory. While several researchers reported an association between Rep1 repeat length polymorphisms and PD [35–39], others were unable to replicate this finding [40–42]. A meta-analysis of two Japanese studies even showed an inverse relationship between proposed risk alleles and PD risk [43]. In an attempt to settle this issue, a large collaborative analysis was undertaken, bringing the number of samples up to 2,692 patients and 2,652 controls [44]. This study found positive evidence of an association between Rep1 allele length variability and PD risk. Subsequently, these results have been replicated in an independent large series [45].

Thus, the status of Rep1 as a risk-associated polymorphism in PD has been well established. However, population differences, as well as Rep1 being a multiallelic marker, may still somewhat complicate the picture. Repeat length is commonly employed as a marker, although studies have demonstrated variability in sequence within the same size of the microsatellite [37, 38]. At least six different allele lengths have been reported [43], and allele designations differ between publications, PDGene using 259, 261, and 263 bp, respectively, for the three most common lengths. Distribution of alleles varies between Japanese and Caucasian populations [43]. The largest studies in Caucasians have demonstrated a trend of increasing PD risk with increasing allele size [45, 46]. Studies in Asian populations have included smaller numbers of subjects. However, their results seem to indicate that both 259- and 263-bp alleles are associated with increased risk compared with the middle length 261-bp allele [33, 43, 47, 48].

Multimarker mapping and linkage disequilibrium analysis

Association studies detect signals from common markers believed to be in linkage disequilibrium (LD) with one or more functional variants, and a set of such markers constitute a risk haplotype. As a consequence, the potential to identify a given polymorphism as significantly associated with disease depends crucially on the pattern of LD in the population under study. To further assess the relative importance of different variants at a susceptibility locus, a common strategy has been to undertake fine mapping studies with multiple markers spanning the gene. In this effort, however, extensive LD will limit the ability to narrow down regions of functional relevance.
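For readers less familiar with LD statistics, the two measures used in this review, D′ and r², can be computed from haplotype and allele frequencies for a pair of biallelic markers. The following is a minimal sketch under stated assumptions: the function name is hypothetical, phased haplotype frequencies are assumed to be known, and the example frequencies are invented for illustration rather than taken from any SNCA study.

```python
# Minimal sketch: pairwise LD statistics for two biallelic markers.
# Assumption: phased haplotype frequencies are available; example values are made up.


def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) from the A-B haplotype frequency and the A and B allele frequencies."""
    d = p_ab - p_a * p_b
    # D' scales D by its largest attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two allelic states.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# A rare allele B (10%) that always occurs on haplotypes carrying the common allele A (40%).
d, d_prime, r2 = ld_stats(p_ab=0.10, p_a=0.40, p_b=0.10)
print(round(d, 3), round(d_prime, 3), round(r2, 3))
```

In this invented example D′ equals 1 while r² stays below 0.2, which illustrates why a region can show extensive LD by D′ and still appear as separate blocks when LD is defined by an r² threshold.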

From the turn of the century, a multimarker design highlighting haplotypes became more common in SNCA association studies. Technical advances facilitated this development, as modern methods for genotyping single nucleotide polymorphisms (SNPs) became more available. Whereas all larger studies have been able to confirm an association between common SNCA variants and PD risk, the relevant substructure of marker polymorphisms and functionally important variation has been difficult to elucidate.

The diverging results from Rep1 association studies motivated examinations genotyping several markers and performing haplotype analyses [37, 49]. This approach revealed that SNPs in the 5′ region also contribute to the association signal [50]. A 2005 case–control study characterized the LD structure of SNCA in the German population with a dense set of SNP markers spanning the whole gene [51]. Two larger LD blocks were identified, separated by the border between exon 4 and intron 4. SNPs in the 3′ block, including rs356219, showed the clearest evidence of association with disease, nominating this region as more likely to harbor functional variants. A similar study in Japanese subjects highlighted the same group of 3′ SNPs as most strongly associated, although the LD pattern differed [48]. In this population, the entire SNCA gene fell within the same LD block, but with Rep1 on the boundary (Fig. 2). A strong association with PD risk for SNPs in the 3′ region has subsequently been replicated in several studies [52–57].

Fig. 2

LD plots from HapMap #27 [95, 96], comparing Utah residents with northern and western European ancestry in the upper plot with Japanese in the lower plot. Green color indicates an LD of r² > 0.60. SNCA with its six exons and the location of the neighboring gene MMRN1 are displayed above. Arrows show the positions of Rep1 and rs356219. Blue lines are drawn to indicate the major LD blocks discussed by Mueller et al. [51] and Mizuta et al. [48], respectively. Although two blocks have been described in Caucasians, there is essentially extensive LD across SNCA in both populations, especially as defined by D′ (not shown in the figure). Note that the larger LD block in the Caucasian population extends further into the region 5′ of the gene than in the Japanese, where Rep1 lies near the border between blocks

The evidence for association signals in several regions of SNCA could be interpreted in different ways. Some authors have pointed out that although two LD blocks can be identified, there is still evidence of substantial LD (as measured by D′) between markers at the 5′ and the 3′ end of the gene. In some studies, this seems to apply also to Rep1 [53, 54, 58], whereas others find evidence of LD only with SNP markers in the promoter region [52, 56]. It should be noted that LD is formally defined for biallelic markers, so that some form of conversion has to be made to include Rep1 in an analysis of LD structure. Furthermore, microsatellite markers are unstable and could be expected to show a high degree of variance in LD properties across populations. Assuming that the Rep1 association signal is dependent on LD with SNPs outside the promoter region could help explain why analysis of this marker shows population differences, as this particular LD feature is lacking in Asian populations [48, 59].
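One simple form such a conversion can take is to collapse the microsatellite to "target allele versus all others". The sketch below is illustrative only: the function names are hypothetical, the allele list is invented, and the cited studies may have used other recoding schemes.

```python
# Minimal sketch: recode a multiallelic microsatellite as a biallelic marker.
# Assumption: "target length vs. all other lengths" is only one possible conversion.


def collapse_to_biallelic(allele_lengths_bp, target_bp):
    """Recode each allele as 1 if it has the target length, otherwise 0."""
    return [1 if length == target_bp else 0 for length in allele_lengths_bp]


def allele_frequency(coded_alleles):
    """Return the frequency of the allele coded as 1."""
    return sum(coded_alleles) / len(coded_alleles)


# Invented sample of eight chromosomes typed for repeat length (bp).
lengths = [259, 261, 261, 263, 261, 259, 263, 261]
coded = collapse_to_biallelic(lengths, target_bp=263)
print(coded, allele_frequency(coded))
```

Any such recoding discards information about the remaining alleles, which is one reason LD estimates involving a multiallelic marker depend on the scheme chosen.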

Another attractive hypothesis is the possible existence of several functional variants. It is expected that the recognition of multiple risk alleles at the same locus will be an important step towards a more complete understanding of the genetic architecture of both PD [2] and complex disease in general [60, 61]. Such allelic heterogeneity represents a plausible source of confusing association signals that differ across populations. Several studies have presented statistical evidence for the presence of independent association signals at the SNCA locus, both considering Rep1 [52, 56] and SNPs in different regions of the gene [57, 62].

Genome-wide association studies

In recent years, the feasibility of performing large GWAS has provided a powerful tool to identify susceptibility loci in complex diseases, without the need to hypothesize candidate genes. The earliest GWAS in PD were underpowered to detect genome-wide significant associations [46, 63]. However, when sufficiently large studies were published, it was hardly surprising that an association signal at SNCA was among the first statistically robust findings [58, 64]. Further GWAS efforts in recent years have nominated an increasing number of PD susceptibility loci. A meta-analysis and replication study highlighted a total of 11 significant associations [65]. The combined population attributable risk of all 11 loci was estimated at 60%, of which SNCA contributed 9.7%.
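To give a sense of how population attributable risk (PAR) figures of this kind arise, the sketch below uses Levin's formula with the odds ratio standing in for relative risk, and combines loci under an assumption of independent effects. This is a generic illustration, not necessarily the method of the cited meta-analysis; the function names and all input values are hypothetical.

```python
# Minimal sketch: population attributable risk (PAR) for one locus and for several loci.
# Assumptions: Levin's formula, odds ratio used as a proxy for relative risk,
# and independent effects across loci. Input values are invented.


def attributable_risk(exposure_prevalence, odds_ratio):
    """Levin's formula: PAR = p(OR - 1) / (1 + p(OR - 1))."""
    excess = exposure_prevalence * (odds_ratio - 1)
    return excess / (1 + excess)


def combined_attributable_risk(pars):
    """Combine per-locus PARs as 1 - product(1 - PAR_i)."""
    remaining = 1.0
    for par in pars:
        remaining *= 1 - par
    return 1 - remaining


# A risk variant carried by 40% of the population with an odds ratio of 1.25.
print(round(attributable_risk(0.40, 1.25), 4))
# Two hypothetical loci with PARs of 10% and 20%.
print(round(combined_attributable_risk([0.10, 0.20]), 4))
```

Because the combination is multiplicative rather than additive, the combined PAR of several loci is smaller than the sum of the individual values.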

The experience with GWAS in PD has shown that SNCA harbors the most reliably reproducible association [58, 62, 64–70], corroborating the status of this locus as central in the pathogenesis of the disease. GWAS provide little information about the probable location of functional variants at each locus, because the marker density is relatively sparse. However, it may be worth noting that the most significant SNPs from GWAS in PD have been located near the 3′ region, in LD with associated markers from previous studies, highlighting this region as most likely to be functionally relevant [56].

Additional parameters examined in association studies

Association studies may be designed to assess different variables of interest in addition to the disease phenotype itself. Since disease onset correlates with the number of gene copies in patients with genomic multiplications, it has been argued that promoter polymorphisms believed to modulate expression should also lower age at onset [44, 45]. Whereas one large [45] and some smaller [55, 71] studies report that Rep1 genotype modulates age at onset, a 2006 multicenter study did not find evidence for such an effect [44]. Two SNCA SNPs associated with PD risk also seemed to correlate with earlier onset in a recent GWAS [62]. In contrast, an association study genotyping rs356219 found a larger and more significant effect in the late onset group when the data were stratified by age at onset [72]. This might indicate that the results from the majority of genetic association studies are most representative of the more common, late onset form of PD, whereas patients with early onset may have distinct, and probably rarer, predisposing variants.

Gene–gene and gene–environment interactions are often highlighted as important unresolved issues in our understanding of heritability in complex diseases [60, 73]. In the case of SNCA, a possible interaction with the risk haplotype at the MAPT locus has been the most extensively studied. Although one study from the UK reported data supporting an interaction [74], a larger amount of evidence has subsequently gone in favor of independent effects at the SNCA and MAPT loci [57, 75–77]. Risk variants at SNCA have also been analyzed in relation to smoking and pesticide exposure in smaller studies with yet inconclusive results [78–80].

Correlations between SNCA genotypes and different parameters of expression have been investigated. A German group quantified messenger RNA (mRNA) and protein levels in peripheral blood mononuclear cells and post mortem in different brain regions from patients and controls. They reported evidence of association with expression both of Rep1 variants and SNPs in the 3′ region [81]. Subsequently, the risk-associated allele of rs356219 has been shown to correlate with transformed plasma α-synuclein levels [56]. A large meta-analysis of GWAS included an analysis of quantitative trait loci influencing DNA methylation and expression [65]. The authors report evidence suggesting an association between risk alleles and SNCA expression, but the results did not reach the significance threshold corrected for multiple testing. Such studies are themselves associative, as they detect the effects of a whole haplotype without isolating putative causal variants. However, they may point to a functionally relevant mechanism and provide a framework for testing individual variants in model systems.

Functional aspects of SNCA variability

Understanding the functional significance of genetic variants and the mechanisms contributing to pathogenesis of complex diseases has proved a challenging task, yet this insight could be crucially important for translational research aiming towards better drugs, diagnostics, and biomarkers. It could be argued that the examples of genomic multiplications leave us in a somewhat privileged situation with respect to SNCA. At least there are strong indications of one definitive mechanism. Overexpression of α-synuclein causes parkinsonism in a dose-dependent manner. Additional pathogenic mechanisms have also been proposed.

The process of α-synuclein aggregation from monomers via oligomeric intermediates, or protofibrils, into amyloid fibrils is considered central in PD pathogenesis, and there is evidence indicating that the oligomer may represent the most toxic aggregation state [82, 83]. The pathogenic missense mutations have been thought to modulate the protein’s aggregational properties, and increased toxicity of mutant α-synuclein has been demonstrated in cell culture [82, 83]. Interestingly, however, the p.Ala53Thr mutation corresponds to the wild-type amino acid in the rat homologue to human α-synuclein, a fact that has somewhat puzzled researchers since the mutation was first discovered [12].

Two papers have reported absent or significantly reduced expression of the mutant allele in affected patients with the p.Ala53Thr and p.Ala30Pro mutations, suggesting that the mechanism behind the phenotype could be haploinsufficiency [84, 85]. In contrast, a recent study presents a p.Ala53Thr patient with monoallelic expression where the normal allele is in fact overexpressed to an extent that mRNA levels exceed those of two alleles of control individuals [86]. This may indicate that even in the case of missense mutations, the pathogenic mechanism is essentially quantitative.

Variants with a small influence on disease risk typically lie outside of coding regions and do not affect the protein sequence. The functional effects of these susceptibility alleles are generally assumed to depend largely on changes in expression levels, which in the case of SNCA seems intuitively appealing. The only putatively functional polymorphism investigated to date has been Rep1. It has been reported that various Rep1 alleles have different effects on expression when the SNCA promoter region is cloned into a reporter construct in cell culture [87], where the highest level was observed with the 259-bp allele. Subsequently, the same group has demonstrated upregulation of human α-synuclein in transgenic mouse brain comparing the 261-bp with the 259-bp allele [88]. Although both studies seem to support a functional role for Rep1, the results are contradictory concerning the effects of different allele sizes. Furthermore, if a variant is truly causal, one should expect similar findings across populations in association studies, unless there is population-specific LD with protective variants skewing the association signal. Further studies are needed to clarify whether Rep1 is indeed functionally relevant.

Important regulatory elements may also be located outside the promoter region. Understanding the regulatory mechanisms that determine SNCA expression can help nominate regions likely to harbor functional variants. Studies of transcription factors [89] and methylation patterns [90, 91] have highlighted intron 1 as an interesting region, but these findings have not yet been linked to specific polymorphisms. Non-coding variants may also affect disease risk through an effect on splicing. The SNCA-112 splicing variant is an isoform lacking exon 5, which has been proposed to enhance the aggregation of α-synuclein [92]. An association between SNPs in the 3′ region and a higher SNCA-112 mRNA ratio has been reported, suggesting alternative splicing as a possible causative mechanism of functional polymorphisms in LD with these variants [93].

Discussion

Despite affecting few families, with new kindreds very rarely discovered, studies of autosomal dominant PD with SNCA mutations can still yield important insights. Of recent significance, we believe that the hypothesis of allelic imbalance as the pathogenic mechanism behind SNCA point mutations warrants further investigations [86]. Although SNCA has been thoroughly corroborated as a susceptibility locus in sporadic PD, hardly anything is known about the causally relevant variation. This paradoxical situation has numerous analogies in other disorders, representing a general trend in genetic studies of complex diseases in the GWAS era.

From a theoretical perspective, it seems justified to assume that a gene at the heart of PD pathogenesis, shown to cause dominant disease by the mechanism of overexpression, could also harbor multiple susceptibility variants, probably conferring graded risk along the whole spectrum of allele frequencies and odds ratios. In this respect, it may not be surprising that an independent additional association was detected in a recent GWAS [62], considering the large sample size in such studies. Nevertheless, when association with disease has been reproduced across many smaller studies, they have tended to highlight the same markers, reporting fairly similar odds ratios. These findings suggest that, although not the only one of functional relevance, one single causal variant is at least likely to stand out as the quantitatively important driver of the association signal observed in these studies. A Norwegian study reproduced the associations of both Rep1 263 bp and 3′ SNP rs356219 [53]. However, both markers occurred on a risk haplotype with a frequency of 7% in patients and 3% in controls, responsible for the majority of the association signal. We suggest that a causal variant could plausibly contribute to PD in a subgroup of patients carrying such a haplotype. When common markers show significant association with disease, the functionally relevant variants in LD with the marker may turn out to be considerably less frequent.
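As a rough guide to the effect size implied by the haplotype frequencies quoted above (7% in patients, 3% in controls), a crude odds ratio can be derived directly from the two frequencies. The sketch below is simple arithmetic on those published percentages; the function name is hypothetical, and the result is an unadjusted figure, not an estimate reported by the cited study.

```python
# Minimal sketch: crude odds ratio from haplotype frequencies in cases and controls.
# Assumption: unadjusted calculation from frequencies alone, without confidence intervals.


def odds_ratio(freq_cases, freq_controls):
    """Return the ratio of the odds of carrying the haplotype in cases vs. controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


# Haplotype frequency of 7% in patients and 3% in controls.
print(round(odds_ratio(0.07, 0.03), 2))  # prints 2.43
```

An odds ratio of this size is considerably larger than those typically reported for single common markers at the locus, consistent with the argument that a less frequent variant on such a haplotype may drive the signal.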

In our view, further or larger multimarker association studies employing a conventional case–control design will probably not bring us closer to pinpointing any causal polymorphism. Instead, a new approach is needed. Targeted resequencing of the whole SNCA locus in selected patients carrying a risk haplotype could provide the first step in a strategy towards identifying these variants. Interpretation of sequencing data may not be straightforward. Nevertheless, if based on careful patient selection, a sequencing approach would have large advantages over a multimarker design in screening for variants showing statistical and biological plausibility. Nominated polymorphisms will have to be investigated in additional studies to clarify their possible functional role. From a translational viewpoint, understanding causal relations and mechanisms is crucial to future advancement. Ultimately, SNCA could become a target of novel therapeutic approaches directed at the underlying pathogenesis of PD.