Introduction

The synuclein family consists of three genes, synuclein alpha (SNCA, chr4q22.1, MIM#163890), synuclein beta (SNCB, chr5q35.2, MIM#602569), and synuclein gamma (SNCG, chr10q23.2, MIM#602998) encoding small, soluble proteins (ɑ-, β-, and ɣ-synuclein) that are abundantly expressed in neural tissue [19]. Synucleins gained significant attention when a short peptide (non-amyloid component, NAC), derived from purified amyloid plaques from brain tissue of people with Alzheimer’s disease (AD), was identified as ɑ-synuclein [136]. A few years later, a missense mutation (p.Ala53Thr) in SNCA was identified as the cause of Parkinson’s disease (PD) in an Italian family presenting early onset autosomal dominant disease [102], and, in that same year, ɑ-synuclein was recognized as a constituent of Lewy bodies, strengthening its central role in the pathogenesis of neurodegeneration [125]. Since then, the role of synucleins, and in particular of ɑ-synuclein, in neurodegenerative and other diseases has been the center of several studies. In this review, we aim to critically summarize the genetic evidence implicating these proteins in synucleinopathies and briefly discuss genetic associations with other diseases.

‘Synucleinopathies’ is an umbrella term grouping together different diseases in which pathological aggregates of ɑ-synuclein are a defining characteristic. These include primary synucleinopathies such as PD, Parkinson’s disease with dementia (PDD), dementia with Lewy bodies (DLB), and pure autonomic failure (PAF). In Lewy body diseases, ɑ-synuclein predominantly aggregates in Lewy neurites and Lewy bodies in the form of neuronal cytoplasmic and neuritic deposits. The categorization of these disorders is based on two main aspects: the timeline of development of ɑ-synuclein pathology across different brain regions and the clinical presentation. Another primary synucleinopathy is multiple system atrophy (MSA), which is characterized by glial cytoplasmic inclusions of ɑ-synuclein mainly in oligodendrocytes [46]. Deposition of aggregated forms of ɑ-synuclein is also a common occurrence in other diseases such as AD and REM sleep behaviour disorder (RBD). Accumulation of ɑ-synuclein aggregates is frequently observed in brains as a concomitant pathology to abnormal deposition of Tau, TAR DNA-binding protein 43 (TDP-43), amyloid-beta, or prion protein [65].

In terms of gene structure, SNCA has seven exons, five of which are protein coding, SNCB is organized in six exons (five protein coding) and SNCG has five exons, all protein coding. Phylogenetic analyses show that ɑ- and β-synucleins are more closely related to each other than to ɣ-synuclein [69]. All three proteins have highly conserved amino-terminal domains that include variable numbers of 11-residue repeats and a less conserved carboxy-terminal domain. The 11-mer repeats compose a conserved apolipoprotein-like class-A2 helix that mediates binding to phospholipid vesicles [30]. Binding to lipids leads to a significant shift in protein secondary structure.

Alpha-synuclein is heterogeneously expressed in both the central and peripheral nervous systems, with higher expression in neocortical regions and with protein immunoreactivity enriched at presynaptic terminals, where it often co-localizes with β- and ɣ-synucleins [54]. This location reflects the several roles ɑ-synuclein plays in synaptic activity, including regulation of synaptic vesicle trafficking and subsequent neurotransmitter release.

It is interesting to note that SNCB expression is more confined to the brain than that of SNCA or SNCG, with the latter showing a much more ubiquitous expression pattern across the tissues evaluated in GTEx (Supplementary Fig. 1). When using Braineac to compare patterns of expression between brain regions, the highest expression of SNCA and SNCB occurs in the same three brain regions (OCTX, FCTX and TCTX, blue panels in Fig. 1), which is not the case for SNCG. The lowest expression of SNCA in this dataset occurs in the thalamus, which is the brain region with the highest expression of SNCG (green panels in Fig. 1). It is also interesting that, of the three genes, SNCG is the one presenting the highest relative expression in substantia nigra across brain regions (orange panels in Fig. 1), yet it is the gene with the least evidence of involvement in PD, a disease typically characterized by degeneration of the substantia nigra.

Fig. 1

Visualization of gene expression across the brain for SNCA, SNCB, and SNCG obtained from Braineac (https://www.braineac.org/). Gene expression is shown for ten brain tissues: CRBL (cerebellar cortex), FCTX (frontal cortex), HIPP (hippocampus), MEDU (medulla), OCTX (occipital cortex), PUTM (putamen), SNIG (substantia nigra), TCTX (temporal cortex), THAL (thalamus) and WHMT (intralobular white matter) ordered from the highest to the lowest expressed. The dataset was obtained from 134 brains from individuals free of neurodegenerative disorders

Mendelian disease caused by mutations in the synuclein genes

SNCA mutations

SNCA mutations have been reported as causative for PD and/or DLB and include missense variants and multiplications.

The typical PD clinical phenotype associated with SNCA mutations includes early onset, rapid motor progression and, often, the presence of dementia. The 146 SNCA mutation carriers from 84 families reviewed by Trinh and colleagues presented a median age at onset (AAO) of 46 years, with 0.7% of cases showing a juvenile onset (AAO < 20 years), 33% showing an AAO between 20 and 40 years, and the majority (66%) showing later onsets of disease (> 40 years) [134]. These cases had a median disease duration at evaluation of 7 years and most frequently presented the typical PD features of bradykinesia, rigidity, tremor and postural instability, and a good response to dopaminergic drugs. Atypical signs, such as alien limb syndrome and dyskinesia, were also reported in subsets of cases.

Alpha-synuclein pathology in the form of Lewy bodies or Lewy neurites is typically found in all cases carrying SNCA mutations. Most cases, however, are not pure synucleinopathies with many presenting significant concomitant pathologies such as neurofibrillary tangles and TDP-43 [88].

Both missense mutations and multiplications of SNCA have been associated with different degrees of autonomic dysfunction, which is also present in asymptomatic mutation carriers and typically precedes the onset of motor symptoms [18]. However, no SNCA mutations have been shown to cause PAF, making this a rare, sporadic disorder with no genetic forms described so far [27]. A similar situation is true for MSA: even though the inheritance of SNCA variants has been investigated in families with MSA, and some cases with known SNCA mutations, such as p.Gly51Asp [59, 60] and p.Ala53Glu [92], have neuropathological features of both MSA and PD, no causal SNCA mutations have been identified, so far, for pure MSA [57].

SNCA missense mutations

Currently, four SNCA missense mutations are classified in the MDSgene database [61] as definitely pathogenic (p.Ala30Pro [67], p.Gly51Asp [59], p.Ala53Glu [92] and p.Ala53Thr [102]) and have been shown to cause PD or related phenotypes (Table 1). In the same database, pathogenicity is reported as unclear for two variants initially reported as causative for PD: p.Glu46Lys [143] and p.His50Gln [4, 105]. SNCA p.Glu46Lys is currently classified as probably pathogenic, but the specific factor(s) that led to this classification are not completely apparent. This mutation has been found in two families (from Spain and Bolivia, making a common ancestry possible) [99], is absent from population databases, and has a high CADD score for in silico prediction of pathogenicity (26.1). In addition, a transgenic rat model expressing the mutation replicated several preclinical features of PD [15]. Together, these observations can be considered sufficient evidence of definite pathogenicity.

Table 1 Features of SNCA missense variants reported as causative or potentially causative for synucleinopathies and associated phenotypic characteristics

SNCA p.His50Gln was also found in two families, but it is present in population databases of genetic data with an appreciable frequency and has a low CADD score for in silico prediction of pathogenicity (10.25). Overall, the genetic evidence is insufficient to establish its pathogenicity, as summarized by Blauwendraat et al. [10]. Nonetheless, in comparison to wild-type SNCA, p.His50Gln enhanced SNCA aggregation by reducing the solubility of monomers, leading to a decrease in the lag time of fibril formation and an increase in the amount of fibrils formed [103, 112]. It was also found to be secreted at higher levels from SH-SY5Y cells and to be more cytotoxic to primary hippocampal neurons [58]. After determining the structures of wild-type full-length SNCA fibrils using cryo-EM [72], the same group chose p.His50Gln to examine the effects of mutations on the structure of SNCA fibrils. They did this by determining atomic structures of p.His50Gln SNCA fibrils and by comparing aggregation kinetics, seeding capacity, and cytotoxicity to wild-type SNCA. The authors identified two previously unobserved polymorphs of SNCA (narrow and wide fibrils) that possibly relate to the faster aggregation kinetics, higher seeding capacity in biosensor cells and greater cytotoxicity observed for mutant fibrils compared to the wild-type ones [13].
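To put the two CADD scores quoted above on a common footing, a scaled score can be converted back to an approximate rank among all possible single-nucleotide variants. The minimal Python sketch below assumes the usual PHRED-like scaling of CADD (score = −10 × log10 of the rank fraction, so that a score of 20 corresponds to the top 1%); the function name is illustrative, and the result is only a rough guide rather than a pathogenicity classification.

```python
def cadd_phred_to_top_fraction(phred: float) -> float:
    """Approximate fraction of possible SNVs ranked at least as deleterious.

    Assumes the PHRED-like scaling: phred = -10 * log10(rank / total).
    """
    return 10 ** (-phred / 10)


# CADD scores quoted in the text for the two SNCA variants of unclear status.
scores = {"p.Glu46Lys": 26.1, "p.His50Gln": 10.25}

for variant, phred in scores.items():
    pct = 100 * cadd_phred_to_top_fraction(phred)
    print(f"{variant}: CADD {phred} -> roughly the top {pct:.2f}% of variants")
```

Under this assumption, p.Glu46Lys falls within roughly the top 0.25% of predicted deleterious variants, whereas p.His50Gln falls only within about the top 9%, which is in line with the contrast drawn in the text.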

In addition to the mutations described above, there are three other variants reported in the literature as causative for PD that are not represented in the MDSgene database. One of these is p.Ala53Val, originally identified in a Japanese family. It also affects residue 53 of SNCA, but changes the reference alanine to valine. Interestingly, it was identified in the homozygous state, but the family presented an apparent autosomal dominant pattern of inheritance of disease. The proband had a slowly progressive Parkinsonism with onset at 55 years, later associated with hallucinations and dementia. A similar clinical picture was observed in several family members, with heterozygous carriers presenting different psychiatric manifestations [141]. Even though this variant has a minor allele frequency (MAF) of 0.01% in the gnomAD East Asian population, it was found once in a cohort of 300 Japanese controls (MAF = 0.2%), indicating the need for more extensive investigation into its pathogenicity. Additionally, two rare SNCA variants (p.Ala18Thr and p.Ala29Ser) have been identified in sporadic PD patients of Polish origin [52]. Their absence from gnomAD and the in silico pathogenicity predictions indicate a potential deleterious effect of both variants. The phenotype exhibited by carriers of these variants was similar to the one typically seen in carriers of the p.Ala30Pro mutation, all being associated with a late-onset form of PD.
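The control-cohort frequency quoted above for p.Ala53Val follows from simple allele counting. The short Python sketch below reproduces it, assuming that the single observation was one heterozygous carrier among 300 diploid controls (an assumption; the zygosity of the control is not stated here). The function and variable names are illustrative.

```python
def minor_allele_frequency(minor_allele_count: int, n_individuals: int,
                           ploidy: int = 2) -> float:
    """Return the minor allele frequency as a fraction of all sampled alleles."""
    total_alleles = ploidy * n_individuals
    if total_alleles <= 0:
        raise ValueError("n_individuals and ploidy must be positive")
    return minor_allele_count / total_alleles


# Assumption: one heterozygous carrier among 300 Japanese controls.
control_maf = minor_allele_frequency(minor_allele_count=1, n_individuals=300)

# gnomAD East Asian MAF as quoted in the text (0.01%).
gnomad_eas_maf = 0.0001

print(f"Control MAF: {control_maf:.2%}")
print(f"Fold difference vs gnomAD East Asian: {control_maf / gnomad_eas_maf:.1f}x")
```

One allele in 600 chromosomes gives about 0.17%, which rounds to the 0.2% reported. An estimate based on a single observation is very imprecise, which is one reason the text calls for more extensive investigation.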

All missense variants identified in SNCA and confirmed to be causative of PD and related disorders are located in the N-terminal amphipathic region of the protein (Fig. 6). More specifically, these are located within the two alpha-helical regions (residues 3–37 and 45–92) [52].

Even though the numbers of patients with missense mutations other than p.Ala53Thr are low, limiting the ability to reach solid conclusions on phenotypic differences across missense mutations, patients with p.Ala53Thr tend to have an earlier onset (mean AAO: 43.2 years) compared to p.Ala30Pro (mean AAO: 61.7 years) and p.Glu46Lys (mean AAO: 51.8 years). The initial motor symptoms can also be different: p.Ala53Thr carriers report bradykinesia more often, while p.Ala30Pro carriers have rigidity [134].

SNCA multiplications

SNCA multiplications are more frequent than missense mutations, being present in around 0.05% of European PD populations [107, 130]. Duplications (three SNCA copies) are more common in familial PD than triplications (four SNCA copies) or missense mutations [134]. Complex copy number variants (CNVs) have also been reported, for example a 351 kb triplication including SNCA and flanked by a duplication [36]. In addition to triplications, four copies of SNCA have also been reported in cases with homozygous duplications [62]. Several mechanisms have been proposed for the occurrence of CNVs in this evolutionarily fragile region of the genome. A Swedish family with SNCA duplication and a Swedish-American family with triplication were shown to descend from a common ancestor. Genetic analyses of this large pedigree indicated that an initial duplication event occurred through recombination, with a subsequent triplication happening by unequal crossing-over [39]. This finding suggested the possibility of conversion to triplication in future generations of families carrying SNCA duplications [63]. Non-allelic homologous recombination has been the most commonly suggested mechanism for SNCA duplication, based on the different types of repetitive elements found in the region spanning the breakpoints [111]. More recently, Seo and colleagues used genome sequencing to analyse the breakpoint sequences of SNCA duplications causing PD in six patients. They concluded that homologous recombination mechanisms involving repetitive elements are probably not the main cause of SNCA duplication and that the presence of microhomology at the junctions, and their position within stem-loop structures, suggests that replication-based rearrangements may be the most common mechanism for the occurrence of SNCA multiplications [118].

Patients with multiplications may present clinical and pathological characteristics of PD, PDD, DLB, or MSA [63]. In general, SNCA triplications are fully penetrant, presenting with early-onset and rapidly progressive Parkinsonism associated with dementia, autonomic dysfunction and psychiatric features [123]. In cases with SNCA duplications, the disease is clinically similar to idiopathic PD [17]. Even though penetrance rates are not yet fully established, there are reports of SNCA duplication carriers without the manifestation of disease at old ages [86].

There is a clear dosage effect in relation to AAO, with triplications being associated with the earliest onset (median of 31 years) and duplications with a median onset of 48 years [134]. In a meta-analysis of SNCA multiplications in familial Parkinsonism, including a total of 59 CNVs, an earlier AAO in triplications when compared to duplications was also seen both for motor symptoms (34.5 ± 7.9 years for triplications vs. 47.2 ± 10.6 years for duplications) and for cognitive decline (39.6 ± 5.5 years for triplications vs. 56.5 ± 9.6 years for duplications) [11]. Dystonia was found to be much more frequent in carriers of duplications when compared to carriers of triplications. On the other hand, depression was found in all triplication carriers and only in about half of duplication carriers [134].

SNCA multiplications associated with Parkinsonism have different breakpoints and present great variability in size. Multiplication size or number of genes did not correlate with onset of motor symptoms or dementia [11]. The largest multiplication reported defines a partial 4q trisomy, extending through 41.2 Mb and containing 150 genes. The patient carrying this heterozygous duplication presented early onset Parkinsonism but also other clinical features such as delayed developmental psychomotor milestones during infancy and musculoskeletal abnormalities, which are atypical given the known phenotypic spectrum of SNCA multiplications. These may be related to the altered expression of other genes included in the large duplicated genomic region [44]. The smallest multiplication reported in the literature was identified in a Japanese family and spans about 0.2 Mb [86].

Even though other genes are typically included in the duplicated/triplicated regions, amplification of the number of copies of SNCA seems to be the only event necessary for the occurrence of parkinsonism, as this Japanese family only carried a duplicated region that included the full-length SNCA and the 5′ end of multimerin 1 (MMRN1). An increase in the number of copies of MMRN1 is observed in all the other cases with SNCA multiplications. It is interesting to note that both SNCG and multimerin 2 (MMRN2) genes are located in the same region of human chromosome 10 and lie in the same orientation to each other, suggesting that these paralogs may have arisen through an evolutionary duplication event [111]. Figure 2 summarizes multiplications at SNCA for which breakpoints have been described and highlights how large CNVs (ranging from 26 Mb to the entire chromosome) have been identified in cohorts of children presenting with autism, intellectual disability/developmental delay, or multiple congenital anomalies. Given that parkinsonism has an age-dependent penetrance, it is possible that individuals presenting with non-neurodegenerative phenotypes early in life will come to develop parkinsonian phenotypes if they survive to older ages. It would be important to follow up on these cases and families to assess this possibility.

Fig. 2

Duplications and triplications reported in the literature spanning the SNCA locus and for which the CNV breakpoints have been described. These multiplications were obtained from a literature search and from the MDSgene and ClinVar databases. While the majority of these multiplications were reported in patients with a neurodegenerative phenotype (typically involving parkinsonism and dementia), a few cases were linked to intellectual and developmental disorders, such as autism. The orange vertical bar indicates the position of SNCA. The top panel displays larger multiplications, including whole chromosome duplications and triplications. The bottom panel includes smaller multiplications at the SNCA locus. The raw data and corresponding references used to create this figure are available in Supplementary Table 1

So far, detailed clinical presentations of SNCA/4q21 deletions and microdeletions are not available [101].

SNCA somatic mutations

Two PD cases were described with negative results for SNCA exon dosage alterations in peripheral blood, but positive CNV changes in oral mucosa cells. In these cases, it was interesting to note that the patient with the highest percentage of oral mucosa cells positive for SNCA multiplication [75% compared to 42.6% in the second patient, evaluated by fluorescence in situ hybridization (FISH)] had a positive family history of PD, which was negative in the second patient. Both cases presented parkinsonian clinical phenotypes that fit within the described phenotypes for SNCA multiplications [94]. These cases of SNCA CNV mosaicism add insight into SNCA rearrangements and highlight the importance of considering low-grade mosaicism when genetically assessing neurodegenerative diseases, as well as the importance of expanding analyses to include other tissues in addition to peripheral blood. The second patient described above also underwent globus pallidus internus deep brain stimulation (DBS) with positive results. This indicated that DBS can be successful in cases of SNCA mosaic duplications and further supports the hypothesis that the success of such interventions relates more to the phenotype of the patients than to their genotypes [93].

Other studies assessing the role of SNCA somatic mutations in disease have been performed in brain tissue. One of these studies used high-resolution melting analysis to test SNCA coding exons for somatic point mutations in cerebellar DNA samples of 539 PD and DLB cases, as well as DNA from frontal cortex and substantia nigra from 20 cases. The authors did not detect any single-nucleotide variants, although, as the authors noted, the assay had a detection limit of ~ 5–10% allele frequency [106]. The same group investigated SNCA CNVs in synucleinopathies, focusing on PD and MSA, mostly in the substantia nigra and cingulate cortex, using FISH. PD cases were significantly more likely than controls to have any somatic SNCA gains both overall and in dopaminergic neurons. The authors also found a negative correlation between the proportion of dopaminergic neurons with copy number gains and AAO of PD. All MSA cases presented somatic gains with the highest levels identified in dopaminergic neurons being present in two of these MSA cases. Interestingly, the differences in neurons were significant for each disease when compared to controls, but for non-neurons, which are more relevant to MSA, the difference was significant only in MSA [82]. More recently, to determine if CNV mosaicism in synucleinopathies is specific to SNCA, or if there are gains throughout the genome, an analysis of single cell whole genome sequencing in 169 cells from two MSA cases was performed. The results revealed CNVs (> 1 Mb) throughout the genome in ~ 30% of cells with a mix of gains and losses in neurons, and almost exclusively gains in non-neuronal cells being observed across the genome. The authors proposed that somatic SNCA CNVs may contribute to the aetiology and pathogenesis of synucleinopathies. These can be risk factors for sporadic disease or can alternatively result from the disease process [95].

To confirm these findings, it will be critical to use larger sample sizes, include both cases and controls in the analyses and perform comprehensive assessments using unbiased single-cell approaches. It is clear that somatic CNVs have distinct profiles based on cell type making it essential to expand these single cell studies. Another critical point when considering disease relevance in these studies is sampling, both in terms of assessing relevant tissues and cells, but also in terms of testing sufficient numbers of cells and/or samples.

SNCB and SNCG mutations

Shortly after the identification of SNCB and SNCG, it was hypothesised that these genes could also have an impact in synucleinopathies. This is a valid hypothesis given the high degrees of homology between the three paralogues and their patterns of expression. As shown in Fig. 1, both SNCB and SNCG are expressed at high levels in various areas of the brain, including the substantia nigra, which is the main area of neuronal degeneration in PD. Other factors also substantiated this premise: SNCB was shown to modulate SNCA aggregation and toxicity both in vitro [50] and in vivo [35], and pathological accumulations of SNCG were found to be present in brains of PD and DLB patients and absent from the brains of controls and patients with other neurodegenerative disorders [41]. Additionally, overexpression of mouse SNCG led to a neurodegenerative phenotype in mice with SNCG aggregation, neuronal loss, motor deficits and premature death, reminiscent of ALS pathology in humans [85, 96]. This has been confirmed by the identification of a novel histopathological feature of ALS characterized by the presence of ɣ-synuclein positive structures in the spinal cord of a portion of ALS patients, suggesting that the pathological aggregation of ɣ-synuclein may contribute to the pathogenesis of ALS. In addition to its involvement in cancer [1], changes in SNCG expression and function have also been linked to other neurodegenerative-associated diseases such as glaucoma [128] and traumatic brain injury [127].

To test if SNCB variants could be involved in DLB, Ohtake et al. studied the genetic variability of the gene in 33 sporadic cases and 10 DLB families. The authors identified two variants (p.Val70Met and p.Pro123His) in unrelated cases and concluded that mutations in SNCB may predispose to DLB. One of the cases was apparently sporadic but the other was part of an extended pedigree. The analysis of segregation of the mutation with the disease in that family was, however, inconclusive: the mutated allele did not produce a disease phenotype in all individuals who carried it, with only three of four individuals who were heterozygous being definitely or potentially affected by DLB. The authors suggested the possibility of an autosomal dominant trait with reduced penetrance in these cases [89]. Subsequent studies screened SNCB in small numbers of PD patients (11 and 71 families) [70, 73] and of diffuse Lewy body dementia (n = 89) [87] and found no pathogenic mutations. More recently, a study using exome sequencing in a large cohort of DLB cases (n = 1004) did not identify these two variants, or any coding variants in SNCB in DLB cases [91].

In addition to co-segregation of a variant with disease in families, several other factors are typically considered when assessing the potential pathogenicity of variants, including case–control enrichment; presence/absence in population databases; and bioinformatic analyses using deleteriousness prediction software. These features for SNCB p.Val70Met and p.Pro123His are detailed in Table 2. The relatively high frequency of the variants in the general population, when considering the prevalence of DLB, argues against a causative role, particularly for p.Pro123His. For p.Val70Met it is also important to note that the cohort where the original finding was made most probably included Asian cases, although the exact number was not specified in the manuscript [86]. This would be in line with the minor allele being found exclusively in Asian populations in gnomAD. The tools used to predict the impact of both variants on the protein show inconclusive results. The CADD score for p.Val70Met puts this variant within the top 1% of deleterious variants in the genome. In contrast, SIFT and PolyPhen predict the variant to be benign. All prediction scores for p.Pro123His point towards pathogenicity, but of the two variants this is the one with the highest MAF in the general population. Taking this evidence together, and given that genetic evidence should be the main criterion for the establishment of pathogenicity, neither of these variants should be classified as pathogenic. These should be considered variants of unknown significance and not assumed to be causative for DLB or other synucleinopathies.

Table 2 Features of SNCB missense variants reported to be linked with DLB

Even without confirmation of the role of SNCB mutations in DLB, this finding led to both the inclusion of SNCB in the list of genes to be tested when genetically evaluating DLB families, and to molecular functional studies of SNCB and the specific variants identified. The genetic studies in DLB families have, so far, identified no additional disease causative variants [25, 129] substantiating the possibility that the pathogenicity of these variants has been misclassified, including in ClinVar. On the other hand, both in vitro and in vivo studies of these variants have pointed to potential pathogenic effects. Both variants seem to be involved in lysosomal pathology with the loss of the proline residue in p.Pro123His leading to marked structural changes of the protein which are sufficient to abolish the non-amyloidogenic characteristics of SNCB, to convert it into a neurotoxic species, and/or to induce the formation of neuritic pathology [55, 139]. The two main possibilities suggested are the loss of protective functions associated with these variants or the toxic gain of function by SNCB. The latter is supported by the increased propensity for aggregation in vitro conferred by p.Pro123His and p.Val70Met-recombinant SNCB proteins and by the fact that expression of these mutant SNCB proteins in neuroblastoma cells led to lysosomal pathology associated with protein aggregation [138, 139]. Transgenic mice expressing this SNCB p.Pro123His variant developed progressive neurodegeneration characterized by axonal swelling, astrogliosis, and behavioural abnormalities. Interestingly, the memory impairments were more marked than the motor deficits, in this model [40]. In the same study, the authors also demonstrated that neuropathology of p.Pro123His SNCB transgenic mice was not significantly affected by cross-breeding with SNCA knockout mice, but cross-breeding with SNCA transgenic mice resulted in enhanced phenotypes, including neuronal cell loss and dopaminergic dysfunction.

In SNCG, no pathogenic variants were identified in a cohort of 89 diffuse LBD patients [87]. The screening of the gene in 12 PD families and 10 sporadic cases did not reveal any pathogenic variants either [74]. A relatively larger study sequencing SNCG in 71 PD families also did not identify any pathogenic variants [70].

The results from these small studies suggest that SNCG rare variability does not have a causative role in Lewy body pathologies but larger studies are needed to confirm these findings.

There are no reports of multiplications in either SNCB or SNCG associated with synucleinopathies, but SNCB is part of the region duplicated in a novel syndrome characterized by short stature, microcephaly, delayed bone development and speech, with mild or no dysmorphism that seems to be reciprocal to the common Sotos syndrome deletion [38].

Variants in synuclein genes associated with disease risk

SNCA locus

Further highlighting the pivotal role of SNCA in PD, variants in this gene were among the first to be identified as modulating risk for that disease. A signal at the 3′-end of the gene was identified in two independent GWAS, one performed in Caucasians and another in an Asian cohort [114, 122] (Fig. 3). Although the leading variant has been hypothesized to act by affecting post-transcriptional RNA processing or RNA stability, no confirmatory studies have been performed. Additionally, since the initial publication, other studies have identified secondary independent signals at the locus, showing that SNCA has substantial allelic heterogeneity for PD. The relationship between these additional signals and the leading variant is still unclear, with no direct evidence establishing whether they represent independent functional mechanistic associations or simply tag the same effect. In a large study comprising over 12,000 PD cases and 12,000 controls, Pihlstrom et al. reported the replication of the top PD GWAS result, as well as a secondary hit at the 3′-end of the gene [98]. Additionally, the authors identified a third independent and genome-wide significant variant associated with PD in their data. After performing in silico eQTL and CAGE analyses, they concluded that the leading variant from the GWAS is the likely functional variant at this locus in PD. Shortly after the initial association of SNCA variants with PD, similar findings were made for MSA, with a strong association with a variant in LD with the one reported for PD, suggesting that the same mechanisms might play a role in these two disorders [117]. These findings were, however, not replicated in all subsequent studies, leaving the role of SNCA common variability in MSA unclear [2, 20, 26, 113, 126].

Fig. 3

Locus view of the most recent PD GWAS results at the SNCA locus with the top associated SNPs from GWAS of other synucleinopathies overlaid. P values shown here are plotted as reported in the original manuscripts (Supplementary Table 2). Labels highlight the top SNPs for each disease. The faded background SNPs were obtained from the summary statistics reported in the Nalls et al. 2019 GWAS. Recombination rates were derived from the 1000Genomes CEU population, and LD scores (R2 value) were obtained using LD link (https://ldlink.nci.nih.gov). Positions shown in genome assembly hg19

Given the common occurrence of comorbid pathologies across neurodegenerative diseases and the significant role of SNCA in the pathophysiology of AD, reviewed in [135], variants in SNCA have also been tested for association with AD. The study of three variants in a very small cohort (98 AD and 105 controls) suggested the possible association of rs10516846 with disease and with increased CSF levels of the protein in AD [137]. These genetic findings have not been replicated, but several studies have reported, in contrast to the typical profiles of low CSF SNCA levels in synucleinopathies [132], unaltered or slightly increased CSF SNCA levels in patients with MCI and AD [131]. Linnertz et al. tested the association of SNCA with Lewy body pathology in 400 AD cases without Lewy body pathology against a cohort of 107 cases with Lewy body variant AD. They identified nominally significant associations at two of the six variants tested at the SNCA locus, although none of them was the variant implicated in PD [75]. Recently, a study focused on a large cohort of neuropathologically diagnosed AD cases tested Lewy body comorbid pathology under a genome-wide association framework [7]. Although they found associations with Lewy body pathology in AD, SNCA showed no evidence of association in that dataset. Interestingly, in DLB—a disease that has been considered to be in a continuum from PD to AD—the first GWAS showed a strong signal at the SNCA locus [49]. This signal, however, was shown to be distinct and independent from the strongest PD signal that is located at the 3′-end of the gene (Fig. 3). The DLB signal is detected as a secondary, weaker signal in PD and could represent the inclusion of DLB cases as misdiagnosed PD in the large GWAS. 
It is not clear what these different associations represent in terms of biological variation, but it is plausible that they differently affect regulatory mechanisms, such as gene expression, in a specific biological context or in distinct populations of vulnerable cells. These findings were independently replicated by Guella and colleagues, who, additionally, showed a haplotype in intron 4 of SNCA that was specifically associated with their PD-dementia cases [48]. Taken together, these results suggest that even within a locus, different diseases can have disparate associations, perhaps leading to different biological impacts. Rapid eye movement sleep behavior disorder can be considered a prodrome of synucleinopathy, as over 80% of RBD patients will eventually convert to an overt synucleinopathy. Because of this close relationship, RBD has also been tested for genetic risk association at the SNCA locus. Krohn et al. tested a cohort of 1000 idiopathic RBD patients and about 6000 controls for association at SNCA, using a targeted sequencing approach. They found a significant association with variants at the 5′-end of the gene, which were independent of the original PD leading variant [66]. Interestingly, the RBD variant showing the most significant association with disease is in high LD with the variant identified in DLB and with the secondary PD signal, suggesting that these may all target the same signal. DLB is also a more common conversion diagnosis than PD in RBD cases [104], which agrees with a higher correlation at this strong genetic risk locus. In addition to modulating risk for disease, SNCA variants have also been associated with age at onset of PD, in sporadic cases [9], in carriers of LRRK2 mutations [12], and in various populations [53, 140]. The SNCA variant rs356182 has also been associated with a tremor-predominant endophenotype of PD and predicted a slower rate of motor progression [28].

SNCB and SNCG loci

Studies testing the association of genetic variability in SNCB and SNCG with the risk of developing synucleinopathies are few and typically include small sample sizes when compared to the literature focused on SNCA. None of these have identified consistent significant associations with risk of disease. Brighina and colleagues analysed two SNCB SNPs (rs35035889 and rs1352303) in 370 PD case-unaffected sibling pairs and 168 case-unrelated control pairs (538 pairs total) and found no association with PD overall or in strata, nor when performing haplotype analyses. One of the SNPs (rs1352303) was, however, associated with age at onset of PD in women [14]. The same group performed a larger analysis that included ten variants in SNCB and ten variants in SNCG in more than 1000 PD cases. The same association of SNCB variants with AAO of disease in women was identified, although it did not survive correction for multiple testing [24]. Polymorphisms in SNCG were also analysed in UK (25 PD and 55 controls) and German (262 PD and 170 controls) populations. Both studies failed to find significant differences in the allelic or genotypic distributions between cases and controls [37, 68]. In addition to sequencing SNCB and SNCG in 89 LBD patients, Nishioka and colleagues also genotyped polymorphisms in a larger cohort (172 patients and 447 controls). After correction for multiple testing, one variant (rs3750823) in the 5′ flanking region of SNCG remained significant in all the analyses [87].

Small case–control association studies have also been conducted to assess the association of genetic variability in SNCG with AD, again, with no significant associations identified [77].

In a GWAS of neocortical Lewy-related pathology performed in a population-based sample of individuals aged 85 or over, from Southern Finland and including 218 subjects, the authors specifically assessed the SNCB locus and found no significant results [97]. Neither the SNCB nor the SNCG loci have been found to be genome wide significant in PD [84] or DLB [49] GWAS. No variants present p-values for association, at either locus, that are close to significance in the PD and DLB GWAS (Supplementary Fig. 2).

Other genetic variability in synuclein genes

The total number of variants reported in gnomAD for SNCA (n = 8482), SNCB (n = 1109) and SNCG (n = 625) relates to the size of each gene. SNCA is by far the largest gene (114,216 bp), followed by SNCB (10,472 bp) and SNCG (4729 bp). This difference in gene sizes mainly relates to the very large intron 4 in SNCA. While in both SNCB and SNCG the proportion of variants found in intron 4 is about 40% of the total number of variants in each gene, in SNCA this proportion doubles to 80%. Once gene size is taken into account, the frequency of variants per base pair is lowest in SNCA (0.08) when compared to the other two genes (SNCB = 0.11 and SNCG = 0.13) (Fig. 4). The same is true for the rate of coding variants, whether synonymous variants are included (SNCA = 0.01, SNCB = 0.09 and SNCG = 0.22) or excluded (SNCA = 0.001, SNCB = 0.06 and SNCG = 0.18).
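The per-base-pair comparison above can be reproduced approximately from the counts and gene sizes quoted in the text. The following is a minimal sketch, assuming the density is simply the total number of reported variants divided by gene length; the dictionary layout and function name are illustrative and not taken from the original analysis. Under this assumption, the ratio for SNCA comes out at roughly 0.074 rather than the reported 0.08, which may reflect different region boundaries or variant filtering in the original calculation, while the ordering of the three genes is the same.

```python
# Gene sizes (bp) and gnomAD variant counts as quoted in the text.
GENES = {
    "SNCA": {"size_bp": 114_216, "variants": 8_482},
    "SNCB": {"size_bp": 10_472, "variants": 1_109},
    "SNCG": {"size_bp": 4_729, "variants": 625},
}


def variants_per_bp(variants: int, size_bp: int) -> float:
    """Crude variant density: reported variants divided by gene length."""
    return variants / size_bp


# Assumption: no filtering by variant class or quality is applied here.
densities = {
    gene: variants_per_bp(info["variants"], info["size_bp"])
    for gene, info in GENES.items()
}

for gene, density in densities.items():
    print(f"{gene}: {density:.3f} variants per bp")
```

The same function could be applied to coding variants only, given the coding-sequence length and coding variant counts, which are not listed in the text and are therefore not included here.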

Fig. 4

Whole-gene allele frequencies as reported in gnomAD (v. 2.1.1) including all populations. Allele frequencies (represented by vertical green bars) directly correspond to locations on the respective gene above. Genes structures displayed are the canonical Ensembl v75 (hg19) transcripts, as reported in gnomAD. The yellow boxes represent the length of the corresponding region of SNCA intron 4 in each gene

While SNCB and SNCG have very low probabilities of being loss-of-function intolerant (pLI = 0.07 and pLI = 0, respectively), for SNCA this metric is very high (pLI = 0.9, Supplementary Table 3). This reflects the absence of coding loss-of-function variants in SNCA in gnomAD, where one would expect to see seven such variants in this cohort of the general population (Fig. 5).
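To give a sense of how unusual it is to observe no loss-of-function variants when seven are expected, the short sketch below computes the probability of zero events under a simple Poisson model. This is an illustrative assumption only: the published pLI metric is derived from a more elaborate model, and the function and variable names here are hypothetical.

```python
import math


def poisson_pmf(k: int, expected: float) -> float:
    """Probability of observing exactly k events when `expected` are expected."""
    return math.exp(-expected) * expected**k / math.factorial(k)


# Values quoted in the text for SNCA in gnomAD.
EXPECTED_LOF = 7.0
OBSERVED_LOF = 0

# Assumption: loss-of-function variant counts follow a Poisson distribution.
p_zero = poisson_pmf(OBSERVED_LOF, EXPECTED_LOF)
oe_ratio = OBSERVED_LOF / EXPECTED_LOF

print(f"P(0 observed | 7 expected) = {p_zero:.5f}")
print(f"observed/expected ratio = {oe_ratio:.2f}")
```

Under this simplified model the probability of seeing no such variants by chance is below one in a thousand, which is consistent with the interpretation that SNCA is depleted of loss-of-function variation.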

Fig. 5

Variants within protein coding regions of the synuclein genes. Each bar represents the variant type and position within the exon, and the height of the bar along the y-axis displays the allele frequency. Variant positions, annotations, and frequencies were obtained from gnomAD (v2.1.1). Exon regions shown as defined by NCBI RefSeq (listed as protein coding exons on UCSC). Regions shown in light gray mark constrained coding regions as described by [51]. The most constrained regions on the synuclein genes are found on SNCB in exons 2 and 3 (95.044% constrained) followed by SNCA (92.212% constrained) and SNCG (90.830% constrained). Regions highlighted by red asterisks mark pathogenic variants as described on NCBI ClinVar. SNCG does not have any described pathogenic variants. Due to dramatically different frequencies between variants within exons, a maximum allele frequency threshold was set to 2e−4. Any variants with frequencies higher than this threshold are labelled with their reported frequency

Of all the ɑ-synuclein residues mutated in synucleinopathies, amino acid 53 shows the highest variability of alternative residues: in addition to the changes from alanine to glutamic acid, threonine and valine described in patients, a fourth change to glycine is also reported in gnomAD in one individual. Beta-synuclein also has an alanine at position 53, but human ɣ-synuclein normally has a threonine. The ɑ-synuclein sequences for humans and rodents are 95% identical, with only seven amino acids differing between species. One of these seven amino acids is at position 53, which is normally a threonine in rodents (Supplementary Fig. 3).

Analyses of evolutionary patterns and structural dynamics indicate strong purifying selection on the whole synuclein family. Alpha-synuclein has a critical region spanning residues 32 to 58 in the N-terminal lipid-binding alpha-helix domain. This region is essential both from an evolutionary perspective and for proper protein stability and conformation. It also harbours critical interaction sites, making it important for disease pathogenesis [121].

Other non-coding variability in synuclein genes associated with disease

Both SNCA and SNCB undergo complex splicing events. These include in-frame splicing of coding exons (leading to at least three shorter transcripts and corresponding protein isoforms with different functional properties); alternative inclusion of at least four initial exons; and different lengths of 3′ untranslated regions. All SNCB transcripts are expressed only in the brain, while only some SNCA transcripts are brain specific [42]. The main factors currently described to regulate SNCA expression include GATA transcription factors predominantly binding to motifs located in SNCA intron 1 [116], a CpG island (also located in intron 1) [81], and a complex microsatellite repeat located 9.8 kb upstream of the transcriptional start of SNCA, called Rep1 (Fig. 6). This dinucleotide repeat is characterized by five alleles with different sizes, and these alleles have been shown to regulate SNCA expression levels in different model systems. The repeat was shown to act as a modulator of SNCA transcription, with a fourfold increase in promoter activity [133], and to increase SNCA mRNA and protein levels in a transgenic mouse model [29] and in a neuroblastoma cell system [21]. More recently, Soldner and colleagues used CRISPR/Cas9 in human induced pluripotent stem cells (iPSCs) to analyse allele-specific expression of the repeat. In contrast to what had previously been shown, the authors found that neither the deletion of the microsatellite repeat element nor its exchange for the shorter or longer repeat-length risk alleles affected the cis-regulated expression of the linked SNCA allele. These results, based on the assessment of early events in in vitro differentiated cells, suggested that Rep1 has no clear role in SNCA regulation [124].

Fig. 6

Representation of the structures of SNCA gene (top) and protein (bottom) depicting the main variants and genetic elements associated with disease. The figure is not to scale and positions are approximate. The main PD GWAS-associated variant (PD-MAX in Fig. 3) is located outside of the gene and is not represented here. Missense variants identified as possibly, probably, and definitely pathogenic (Table 1) are also represented

Given the initial reports of a potential role in the modulation of SNCA expression by Rep1, several studies have attempted to associate Rep1 alleles with the risk of developing PD. The results obtained have been contradictory, likely due to the study of small sample sizes. One larger analysis including over 2600 cases and 2600 controls identified a significant association of the long Rep1 allele with PD [79]. In brain tissue of 228 PD cases and 144 controls, Linnertz and colleagues found the protective allele (259 bp, homozygous) to present 40–50% SNCA mRNA reduction in the temporal cortex and substantia nigra [76]. The longest repeat (263 bp allele, risk allele for PD) has also been shown to result in a 2.5-fold increase in luciferase activity over the shortest repeat. The 261 bp allele (major allele) showed only a 1.5-fold increase over the shortest repeat, whereas the 259 bp allele increased expression by threefold [22]. Simon-Sanchez and colleagues tested if Rep1 represented an independent signal to the GWAS hit at the SNCA locus and showed that the signals are not independent [122]. The authors suggested that the associations identified at the Rep1 locus and at the SNPs identified in their GWAS were the result of residual LD between both loci. Given the more recent functional findings suggesting that Rep1 does not clearly contribute to SNCA regulation, one can infer that the association reported between Rep1 and PD risk is driven by the GWAS signal. This might explain some of the contradictory results, which may be linked to varying levels of LD across populations.
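The notion of residual LD between two loci, and of LD levels that differ across populations, can be made concrete with the standard r² statistic for two biallelic sites. The sketch below is illustrative only: the haplotype and allele frequencies are hypothetical values chosen for the example, not estimates for Rep1 or any GWAS SNP, and a multi-allelic repeat such as Rep1 would first have to be collapsed into a two-allele coding (for example, risk allele versus all others).

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical frequencies for two populations with identical allele
# frequencies but different haplotype structure.
r2_population_1 = ld_r2(p_ab=0.36, p_a=0.4, p_b=0.4)
r2_population_2 = ld_r2(p_ab=0.20, p_a=0.4, p_b=0.4)

print(f"population 1: r2 = {r2_population_1:.3f}")
print(f"population 2: r2 = {r2_population_2:.3f}")
```

With these made-up inputs, the first population shows substantial LD and the second very little, which illustrates how an association driven by a linked variant could appear in one population and not in another.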

In addition to demonstrating the absence of a role of Rep1 in SNCA regulation, Soldner and colleagues also found that an enhancer element in intron 4 harboring two risk variants (rs356168 and rs375654) had a significant effect on allele-specific expression of SNCA [124].

Other SNCA intronic regions associated with disease include a CT-rich locus (chr4: 90,742,421-90,742,492) and a poly-T polymorphism (chr4: 90,749,444-90,749,566). The first has four different haplotypes, and the risk haplotype was initially shown to be associated with Lewy body pathology in AD. It was also suggested to act as an enhancer, with the risk haplotype leading to higher mRNA expression of SNCA in the human brain [78]. More recently, a genome-wide screen for short structural variants focused on GWAS regions (using a bioinformatic tool developed to identify candidate causal variants associated with GWAS hits) provided a high score, implying a higher probability of a role in disease, for this highly polymorphic low-complexity cytosine–thymine (CT)-rich region in the context of synucleinopathies [115]. The second variant (rs149886412) consists of three alleles with different lengths that modulate the efficiency of SNCA exon 3 splicing, resulting in different levels of expression of the SNCA 126 splice isoform [8]. The authors established that the longest poly-T (12T) was associated with the highest SNCA 126 expression levels and the shortest poly-T (5T) with the lowest expression, when compared to the most frequent genotype (7T/7T). The alleles were also found to be differentially distributed among age groups, suggesting a potential effect of this specific splice isoform in healthy aging.

The complex splicing events occurring in both SNCA and SNCB lead to different protein isoforms that are known to play various roles in the pathogenesis of synucleinopathies. One of the most important events seems to be a shift in isoform expression ratios favoring the formation and accumulation of altered synuclein species [42].

Different 3′UTRs also contribute to isoform diversity, and at least five SNCA transcripts differing in their 3′UTR length, ranging between 290 and 2520 bp, have been reported [110]. Additionally, five miRNAs (miR-7, miR-153, miR-34b, miR-34c, and miR-214) that directly regulate SNCA expression have been described [108].

For all these variants, the identified associations have, so far, not been independently replicated and studies in larger, well-characterized cohorts are needed to fully assess the role of these non-coding variants in synucleinopathies.

SNCA genetic variability in the context of structure, function and the environment

It is currently thought that both SNCA multiplications and at least some non-coding variants exert their pathogenic effect through increased protein expression. Even though a complete understanding of the pathological processes associated with these variants is still lacking, it is important to note that overlaps occur at the clinical and neuropathological levels between multiplications and other types of variants, as this can indicate at least a partial commonality of pathobiological mechanisms between different types of SNCA variants. Missense mutations, on the other hand, can have a variety of effects on the protein, which can range from the loss of membrane binding to formation of oligomeric aggregates or changes in the ratio of tetrameric to monomeric ɑ-synuclein species [109]. The study of these rare mutations can, however, be extremely informative, particularly when paired with biochemical and high-resolution structural biology approaches. These integrated approaches may lead to new insights associated with the specific mutations being studied but may also shed some light on general pathogenic processes. For example, ɑ-synuclein fibrils formed by p.Ala53Thr variants were recently found to rapidly nucleate competent species, to continuously elongate fibrils in the presence of increasing amounts of seeds and to overcome wild-type surface requirements for growth. These features may explain the typical early AAO associated with this mutation. The same study was also able to characterize the early stages of ɑ-synuclein polymerization based on the study of wild-type and familial ɑ-synuclein variants [90]. Such studies are also essential to understand the commonalities and differences between synucleinopathies.
The association of particular ɑ-synuclein strains with distinct synucleinopathies has important implications to the understanding of the mechanisms of ɑ-synuclein misfolding and aggregation and how these relate to the different clinical and neuropathological features of these disorders [119]. To fully understand these differences, it will be essential to dissect the role of the genetic background in the formation of the different strains by systematically characterizing the genomic context in association with disease.

Given that most cases of synucleinopathies occur late in life, it is important to consider the concept of antagonistic pleiotropy in these diseases. In this hypothesis, it is postulated that one gene influences more than one trait where at least one of these traits is beneficial early in life and another is detrimental later in life. One example is Huntington disease which is caused by CAG trinucleotide repeat expansions in the HTT gene. The same genetic alteration is also associated with decreased risk of certain cancers and increased fertility [16]. Another example that is relevant to synucleinopathies and neurodegenerative diseases is APOE: the APOE ε4 allele may be beneficial in earlier ages and may only confer risk of cognitive decline later in life. Some of the benefits provided by the APOE ε4 allele early in life include higher IQs [142] and higher verbal fluency scores [3]. At the same time, the APOE ε4 allele is known to be the strongest genetic risk factor for Alzheimer's disease. A model of antagonistic pleiotropy can reconcile these apparent contradictory cognitive patterns occurring in young and old ages.

This is a difficult concept to study, particularly in humans, where experimental studies are not possible. Genetic and genomic data can nonetheless be used to this end, although they mainly provide correlational results. Cause-and-effect evidence can be established in model organisms.

In addition to the known ɑ-synuclein synaptic functions, other relevant roles have also been proposed for this protein, including suppression of apoptosis, regulation of glucose levels, antioxidation and neuronal differentiation, and regulation of dopamine biosynthesis [34]. More recently, a link between ɑ-synuclein and immunity has been established with the demonstration that ɑ-synuclein expression is up-regulated following viral infection in neurons, that knockout of the ɑ-synuclein gene in mice leads to increased viral growth in the brain and increased mortality, and that H1N1 influenza virus induces aggregation of ɑ-synuclein by blocking protein degradation pathways [5, 6, 80]. These findings suggest that ɑ-synuclein may play a role in the host immune response to viral infections. In the context of antagonistic pleiotropy, one can speculate that genetic variability in SNCA may be associated with a better response to viral infections early in life and contribute to the development of synucleinopathies later in life.

In addition to the functions previously described, ɑ-synuclein has also been shown to potentially bind to DNA and directly regulate gene expression under specific conditions [120]. It may also interfere with epigenetic processes regulating gene expression [32]. These processes can occur through the direct presence of ɑ-synuclein in the nucleus [100] or through cellular mediators [56]. Alpha-synuclein has also been shown to interact with histones [47], an interaction that has been proposed to be toxic to neurons through reduced histone acetylation [64].

Understanding the impact of environmental factors in synucleinopathies and their interactions with the genetic profiles in humans is challenging. Often, complex study designs including large sample sizes and longitudinal follow-ups are required. Nonetheless, several associations of environmental factors with variants in SNCA have been reported. These include associations between SNCA rs356219, rs356220 and Rep1 and smoking, where carriers of the Rep1 263 bp allele who never smoked presented a significantly increased risk of PD. No variants in SNCA have been associated with coffee intake or head injury, and only a suggestive association was found between rs3775423 and pesticide exposure [23, 43]. Similarly, it is also not straightforward to establish gene–gene interactions (epistasis). Even though epistasis is likely a ubiquitous component of the genetic architecture of common human diseases, it is difficult to detect and characterize, mainly due to challenges with statistical modelling, computational power and interpretation of results [83]. Interactions at the protein level between ɑ-synuclein and several other neurodegeneration-relevant proteins like APOE, GBA, MAPT, ATP13A2, VPS35 and TDP-43 have been shown in cell and animal models [45], as well as in neuropathology studies [31, 33]. However, gene–gene interactions between SNCA and other genes have not yet been systematically characterized at the whole-genome level in humans.

Concluding remarks

It is clear that genetic variability in the SNCA locus plays an important role across the spectrum of synucleinopathies. However, this role varies with the type of variants and with the specific disease. Mendelian SNCA mutations are well documented in PD, PDD and DLB cases and families. The same is not true for MSA and PAF where the analysis of mutations in disease has so far occurred in the background of another phenotype. The association of risk variants in the locus is also very clear for PD and DLB, but still unclear for PDD and MSA, and absent for PAF.

The evidence for the involvement of other synuclein genes in synucleinopathies is much scarcer, indicating that SNCB and SNCG most probably do not have a genetic role in these diseases. Large GWAS have so far not identified risk factors for synucleinopathies at the SNCB or SNCG loci, and only two rare variants in SNCB have been described as possibly causative for DLB. Although the functional assessment of these variants seems to point to an effect on protein function in disease, the genetic data are not entirely supportive.

The impending results of large-scale whole-genome sequencing in PD and other synucleinopathies will allow the identification of additional disease-related variants. The interpretation of such findings will remain challenging, and the availability of sequencing data from large cohorts of diverse populations, from controls, and from additional disease cohorts will be essential. Similarly, the integration of genetic approaches with biochemical, biophysical and structural analyses in relevant cellular contexts will be essential to fully understand the shared and specific pathobiological processes of synucleinopathies.