Introduction

Neurodegenerative disorders is a collective term for a number of diseases, including Alzheimer’s disease, Parkinson’s disease, multiple system atrophy, progressive supranuclear palsy, dementia with Lewy bodies, amyotrophic lateral sclerosis, Huntington disease, prion disease, spinocerebellar ataxia, spinal muscular atrophy, neuronal ceroid lipofuscinoses, Pick’s disease and frontotemporal dementia. All these conditions are characterized by neuronal degeneration and consequently altered function in different regions of the central and peripheral nervous system, with symptoms manifesting in the respective target tissues. Different parts of the brain comprise specialized neurons (dopaminergic, cholinergic, serotoninergic and glutamatergic, among others), alterations in which lead to different disease phenotypes (figure 1). Loss of specific neurons due to either intrinsic (genetic) or extrinsic (injury, infection etc.) factors is generally an irreversible process and contributes to a range of pathological features. These disorders share a common set of clinical features, such as involuntary movement and relentless disease progression (Bertram and Tanzi 2005). Yet another shared feature is that they are generally late onset and sporadic in occurrence, with only a small but varying proportion being familial. A strong genetic component is undisputed in the aetiology of the latter group, which is also generally characterized by a comparatively earlier age of disease onset. However, based on classical twin/adoption studies and epidemiological surveys, nongenetic factors together with genetic vulnerability are believed to contribute to the more abundant sporadic forms, thus justifying their inclusion in the complex disease category.

Figure 1. Schematic presentation showing parts of the brain affected in different neurodegenerative disorders.

Conventional genetic analysis tools for linkage and contemporary approaches of next-generation sequencing performed using familial forms of most of these disorders have contributed notably to the discovery of several putative disease causal genes. These have provided novel insights into the pathogenesis of neurodegeneration and the field is still evolving at a dramatic pace and is expected to greatly influence the diagnosis and treatment in future (Pihlstrøm et al. 2017). Conversely, findings from genetic dissection of the common sporadic, complex disease forms have been limited, warranting use of additional strategies. Thus, uncovering additional risk conferring genes/loci may provide newer insights into disease biology with implications for improved/novel therapeutics. This review aims to provide a concise account of the genetics that we know, the pathways that they implicate, the challenges that are faced and the prospects that are envisaged for the sporadic, complex forms of neurodegenerative diseases, taking four most common conditions namely Alzheimer’s disease (AD), Parkinson’s disease (PD), amyotrophic lateral sclerosis (ALS) and Huntington disease (HD) as examples. It may be relevant to mention here that HD is a monogenic disorder, but due to its exceptional inheritance pattern, caused by as yet unclear modifier gene effects, it has been included under complex disease forms. A brief account of the disease per se and the available genetic data are provided separately for each of these four pathologies in the earlier part of this review and the challenges and prospects being more or less similar across these conditions are discussed together in the subsequent section. Major similarities and unique features of these four neurodegenerative disorders are also tabulated in table 1 in electronic supplementary material at http://www.ias.ac.in/jgenet/.

Table 1. Susceptibility genes/loci in AD identified in GWAS.

Alzheimer’s disease

AD is a chronic progressive neurodegenerative disease that is responsible for around 60 to 70% of cases of dementia. AD is generally a disease of the elderly, with a prevalence of 10–30% in the population >65 years of age and an incidence of 1–3%, affecting 20–30 million people worldwide; the incidence rises from 0.5% per year in the 65–75 age group to 6–8% per year in the age group 85 and above (Masters et al. 2015). Late onset sporadic cases comprise >95% of the disease burden and <1% are familial, showing an autosomal dominant mode of inheritance and notably a much earlier age of onset. Cognitive deficits in AD are insidious in onset and progressive in course (Scheltens et al. 2016). Memory impairment, specifically the loss of memory of recent events, is the most frequent feature of AD and is usually its first manifestation, followed by deficits in other cognitive domains. Executive dysfunction and impaired visuospatial skills appear relatively earlier than language deficits and behavioural symptoms. Noncognitive neurologic deficits (pyramidal and extrapyramidal motor signs and seizures) can occur in late stages of AD. The speed of progression is variable but the average life expectancy after diagnosis is three to nine years.

In AD, there is neuronal cell death in wide areas of the cerebral cortex and hippocampus, starting with the neurons of the frontal and temporal lobes and gradually progressing to other areas of the neocortex at rates that vary considerably between individuals. Pathologically, it is marked by accumulation of insoluble forms of amyloid-\(\upbeta \) (A\(\upbeta \)) as plaques in extracellular spaces as well as in the walls of blood vessels, and by aggregation of the microtubule protein tau in neurofibrillary tangles in neurons (Masters et al. 2015). The clinical diagnosis of AD is challenging, as comorbidities such as cerebrovascular disease and hippocampal sclerosis are seen in patients. The development of Pittsburgh compound B (PiB), a C11 radioactive analogue of the fluorescent amyloid dye thioflavin T, has helped in A\(\upbeta \) imaging in patients by positron emission tomography (PET) brain scans. Greater accuracy in diagnosis has been possible with the development of an advanced tracer, florbetaben F18. Another method for diagnosis is collection of cerebrospinal fluid (CSF) by lumbar puncture from AD patients and profiling for the protein biomarkers A\(\upbeta \)42, total tau (T-tau) and phosphorylated tau (P-tau); the most commonly used assay is specific for P-tau at Thr181 (Olsson et al. 2016).

Familial occurrence, together with a greater risk of developing AD among individuals with affected first degree relatives, was suggestive of a genetic component in disease aetiology, which was confirmed by heritability estimates for AD ranging between 57 and 78% (Meyer and Breitner 1998). Twin studies showing 32.2% concordance between monozygotic and 8.7% between dizygotic twins have revealed gene–environment interactions in the aetiology of AD, with genetic determinants probably being lesser contributors (Pedersen et al. 2004). Average life span after disease onset is 8–10 years and there is no cure for AD. Cholinesterase inhibitors such as donepezil, rivastigmine and galantamine are used to treat the cognitive symptoms (memory loss, confusion and hampered thought processes) associated with AD. Memantine (a glutamate antagonist) provides some benefit for patients with moderate to severe dementia. New treatment modalities are urgently needed to prevent, delay or treat the symptoms of AD. Researchers are currently focussing on anti-amyloid approaches (including active and passive immunization strategies), \(\upgamma \)-secretase and \(\upbeta \)-secretase inhibitors and anti-aggregation drugs.

Genetic risk factors

The discovery of A\(\upbeta \) peptides in brains of both AD patients and persons with Down’s syndrome was one of the earliest clues that mutation in a gene on chromosome 21 might also cause AD in persons without Down’s syndrome (Glenner and Wong 1984a, b). As in PD, causal gene discovery was more plausible using the familial forms. Linkage analysis in families with AD identified the amyloid-\(\upbeta \) A4 protein precursor (APP) gene at 21q21.3, coding for the A\(\upbeta \) peptide precursor, as a promising candidate (Hardy 2006). Specific mutations in APP seen in early onset AD (EOAD) patients confirmed its role in disease aetiology (Chartier-Harlin et al. 1991), but mutations in APP could not explain all forms of familial EOAD and thus the search for additional genes/loci continued. Linkage analysis performed in more families discovered two more genes, namely presenilin-1 (PSEN1) (Sherrington et al. 1995) and presenilin-2 (PSEN2) (Levy-Lahad et al. 1995). To date, a total of 52, 241 and 45 mutations have been reported in APP, PSEN1 and PSEN2, respectively (listed at http://www.alzforum.org/mutations), accounting for \(\sim \)5–10% of EOAD families. Identification of these mutations has not only provided important insights into the molecular mechanisms and pathways involved in AD pathogenesis but has also led to the identification of druggable targets (Van Cauwenberghe et al. 2016). In contrast, no mutations in these genes have been identified in sporadic late onset AD (LOAD) cases. Environmental attributes have been notably elusive and the genetic component, even if small, is probably complex and heterogeneous.

APOE (19q13.32) is a well-documented risk conferring gene strongly associated with AD. The presence of APOE immunoreactivity in A\(\upbeta \) amyloid deposits and neurofibrillary tangles linked APOE to AD pathology for the first time (Namba et al. 1991; Wisniewski and Frangione 1992). APOE encodes a glycoprotein involved in catabolism of triglyceride-rich lipoproteins, transport of cholesterol and other lipids, and also in neuronal growth, repair response to tissue injury, nerve regeneration, immune regulation and activation of lipolytic enzymes. APOE has three major alleles, \(\upvarepsilon 2, \upvarepsilon 3\) and \(\upvarepsilon 4\), with frequencies that vary across ethnic groups. For example, the allele frequencies are 0.07 for APOE \(\upvarepsilon \)2, 0.78 for APOE \(\upvarepsilon \)3 and 0.15 for APOE \(\upvarepsilon \)4 in Americans of European descent (Saunders et al. 1993) versus 0.04 for APOE \(\upvarepsilon \)2, 0.89 for APOE \(\upvarepsilon \)3 and 0.07 for APOE \(\upvarepsilon \)4 in a north Indian cohort (Thelma et al. 2001). The APOE2 (Cys112 and Cys158), APOE3 (Cys112 and Arg158) and APOE4 (Arg112 and Arg158) isoforms differ in total charge and structure, which alters their binding to both cellular receptors and lipoprotein particles and possibly changes their stability and rates of production and clearance. APOE binds to A\(\upbeta \) and effects the clearance of soluble A\(\upbeta \) and A\(\upbeta \) aggregates; APOE \(\upvarepsilon \)4 is thought to be less efficient in mediating A\(\upbeta \) clearance, thereby favouring deposition and ultimately contributing to plaque formation (Strittmatter et al. 1993).
APOE \(\upvarepsilon \)4, which increases the risk of developing AD in familial and sporadic EOAD and LOAD (with one allele imparting a threefold increase in risk and two alleles a 12-fold increase), is also associated with an earlier age of onset of AD and is reported to contribute to \(\sim \)50% of sporadic AD (Strittmatter et al. 1993). On the other hand, APOE \(\upvarepsilon \)2 decreases the risk of developing AD and also delays the age of onset. Human APOE isoforms have been shown to cause isoform dependent decreases (APOE2>APOE3>APOE4) in neuritic plaque load and to delay the onset of A\(\upbeta \) deposition in several mouse models of AD (Holtzman et al. 1999, 2000). Collectively, these studies indicate that APOE is a major causative or contributing factor in AD pathogenesis. Studies have also shown that T-tau and P-tau levels are increased in both the brains and CSF of patients with AD, which indicates a role for tau protein in AD pathology. A total of 107 mutations in MAPT (tau) have been reported in LOAD to date (http://www.alzforum.org/mutations).
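
As a worked illustration of what the APOE allele frequencies quoted above imply at the genotype level, the following minimal Python sketch computes expected genotype frequencies. It assumes Hardy-Weinberg equilibrium, which the cited studies are not stated to have tested here; the function name `hardy_weinberg_genotypes` and the allele labels `e2`, `e3`, `e4` are hypothetical choices made for this example.

```python
from itertools import combinations_with_replacement


def hardy_weinberg_genotypes(allele_freqs):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    allele_freqs: dict mapping allele name -> frequency (must sum to 1).
    Returns a dict mapping (allele_a, allele_b) -> expected frequency.
    """
    total = sum(allele_freqs.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"allele frequencies sum to {total}, expected 1")
    genotypes = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        p, q = allele_freqs[a], allele_freqs[b]
        # Homozygotes occur at p^2, heterozygotes at 2pq.
        genotypes[(a, b)] = p * p if a == b else 2 * p * q
    return genotypes


# Allele frequencies quoted in the text for Americans of European descent.
european = {"e2": 0.07, "e3": 0.78, "e4": 0.15}
expected = hardy_weinberg_genotypes(european)
for genotype, freq in expected.items():
    print("/".join(genotype), round(freq, 4))
```

Under this equilibrium assumption, about 2% of individuals (0.15 squared, i.e. 0.0225) would be expected to carry two \(\upvarepsilon \)4 alleles, and roughly a quarter would carry a single \(\upvarepsilon \)4 allele.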

Table 2. Rare variants in genes identified in AD using NGS technology.

In addition to APOE and MAPT as major susceptibility genes, >20 risk genes/loci for AD have been reported in the last decade, mostly based on GWASs performed on cohorts of Caucasian ancestry. Of note, the APOE locus has been the most consistently associated across population groups in these studies, of which all but three were case–control analyses (table 1). With limited insights from these studies, as with most other studies of complex traits, emphasis shifted from the common disease common variant (CDCV) to the common disease rare variant (CDRV) hypothesis, the latter facilitated by the emergence of next-generation sequencing (NGS) technologies. Rare variants are expected to have larger effect sizes than GWAS loci and to be easier to characterize functionally and to model in cells and animals. Rare variants thus identified in sporadic AD patients by whole exome sequencing (WES) are seen in favoured candidate genes, namely APOE and APP, as well as in five other genes of biological relevance (table 2). However, of these, only APOE, APP and SORL1 have also been identified in GWAS. Meta-analysis of GWASs identified genes that are mostly involved in endosomal vesicle cycling, immune response and cytoskeletal functions. Importantly, several of them are also involved in A\(\upbeta \) clearance or tau mediated toxicity (Van Cauwenberghe et al. 2016). Taken together, all these findings are still insufficient to explain the complete pathological spectrum of disease or to facilitate effective disease risk prediction/prevention, thus encouraging continuing efforts.

Parkinson’s disease

The second most common neurodegenerative disorder is PD, which results from specific loss of dopaminergic neurons in the substantia nigra pars compacta in the midbrain (Fearnley and Lees 1991), leading to dopamine deficiency. Intraneuronal protein aggregates composed largely of \(\upalpha \)-synuclein, commonly referred to as Lewy bodies, are the pathological signature of PD (Vekrellis et al. 2004). The symptoms generally appear slowly over time. Early in the disease, the most obvious are tremors, rigidity, slowness of movement and difficulty with walking. Dementia becomes common in the advanced stages of the disease. Depression and anxiety are also common in patients with PD. Other nonmotor symptoms include sensory, sleep and emotional problems. The diagnosis is made using the UK Brain Bank criteria (Hughes et al. 1992) and clinical assessment is done using the Movement Disorder Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS). Exclusion criteria for PD are elaborate and include a history of repeated strokes with stepwise progression of Parkinsonian features, a history of repeated head injury, a history of definite encephalitis, 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP) exposure, negative response to large doses of levodopa, supranuclear gaze palsy and presence of a cerebral tumour. Assessment of the extent of neuronal damage is done by neuroimaging techniques including PET, single-photon emission CT (SPECT) and novel MRI techniques. Unfortunately, there is no cure for PD and only symptomatic relief is generally achieved by levodopa-based therapies. Motor features, which are initially manageable with these therapies, worsen with disease progression, together with the emergence of complications including motor and nonmotor fluctuations, dyskinesia and psychosis (Hely et al. 1999; Pandey and Srivanitchapoom 2017).
Surgery involving placement of microelectrodes for deep brain stimulation has been used to reduce motor symptoms in severe cases where drugs are ineffective (Martinez-Ramirez et al. 2015), but its precise mechanism is unknown (Pandey and Sarma 2015). PD is largely an age dependent disability with a global prevalence of \(\sim \)1% in population >65 years of age and 4–5% at or after 85 years of age (Twelves et al. 2003; Savica et al. 2016) with approximately double the number of males affected (Van Den Eeden et al. 2003). Most PD cases are sporadic in occurrence but \(\sim \)10% of PD cases also exist as Mendelian forms. A small proportion of cases develop PD before the age of 40 years and thus are grouped under EOPD (Van Den Eeden et al. 2003) and another smaller category is comprised of juvenile PD with disease onset at <20 years of age. The prevalence of these two categories varies across ethnic groups (Pringsheim et al. 2014).

Genetic risk factors

As mentioned above, the majority of PD cases are sporadic, and early evidence of MPTP damaging the nigrostriatal dopaminergic neurons through glial activation was suggestive of a significant environmental attribute (Watanabe et al. 2005). Twin studies in an American cohort showing a concordance of 15.5% in monozygotic twins compared to 11.1% in dizygotic twins substantiated a genetic component as well (Tanner et al. 1999), and this was more or less replicated in a Swedish study (Wirdefeldt et al. 2011). As a consequence of the prolonged lifespan of neurons, it is believed that complex gene–environment interactions generate stress and ultimately result in cell death (Gu et al. 2005). But as in all other common complex traits, elucidating the genetic contribution to the aetiology of complex forms of PD has been challenging. To start with, genetic analysis was limited to a few candidate gene based association studies. Genes from the dopaminergic pathway and those coding for phase 1 and phase 2 drug metabolising enzymes were the obvious choices, based on pharmacological evidence from successful levodopa treatment and on the classical MPTP induced animal model of PD, which suggested a role for environmental toxicants (Kopin et al. 1986). However, most of the observed associations were not replicated across studies and ethnic groups, and were far from being of any predictive value. On the other hand, genetic analysis of large PD kindreds and smaller independent PD families, using classical linkage analysis and contemporary NGS approaches respectively, has identified >18 PARK genes/loci, namely SNCA, PARKIN, SPR, UCHL1, PINK1, DJ-1, LRRK2, ATP13A2, GIGYF2, HTRA2, PLA2G6, FBXO7, ADORA1, VPS35, EIF4G1, DNAJC13, CHCHD2, TMEM230, VPS13C, PTRHD1, PODXL and RIC3, with the last two genes discovered recently in our laboratory (Sudhaman et al. 2016a, b).
These PARK genes identified to date have provided significant insights into cellular and molecular pathways involved in PD aetiology thus providing likely leads for identification of novel drug targets/improved therapeutics. Collectively these findings implicate aberrant protein folding, protein degradation, neuro-inflammation, neuro-developmental defects and mitochondrial homeostasis with the involvement of oxidative stress for disease pathology. At this time, it is relevant to note that although recessive forms of PD have a few clinical features that are distinct from typical sporadic PD, such as early onset, severe dystonia, stable cognition and early motor fluctuations (Khan et al. 2005), they are indistinguishable at the pathological level which may suggest the involvement of similar pathways malfunctioning and leading to disease progression. In fact, it was this knowledge that encouraged the extension of association studies of Mendelian genes such as SNCA, LRRK2, PARKIN and PINK1 etc. with sporadic PD cases (Oliveira et al. 2003; Groen et al. 2004; Mellick et al. 2005; Paisán-Ruíz et al. 2005). This marked the beginning of the marginal progress that has been witnessed in genetics of complex forms of PD, with significant associations of common SNPs in these four genes being consistently replicated across studies (Lesage and Brice 2012). This however, could not explain total disease heritability. It was only then that the GWAS approach was employed to revisit sporadic complex forms of PD to identify genomewide susceptibility genes/loci. More than 33 GWAS, mostly in Caucasian populations have been conducted to date. These studies have unravelled significantly associated common variants in >24 independent genes/loci and recent meta-analyses have revealed a few additional genes, namely SIPA1L2, INPP5F, MIR4697, GCH1, VPS13C, DDRGK1, ITPKB, SCN3A, SATB1, IL1R2 etc. (Nalls et al. 2014; Chang et al. 2017) as shown in table 3. 
However, it is notable that associations have been consistently replicated at only four of these genes, namely GBA, SNCA, MAPT and LRRK2, in independent case–control (Trotta et al. 2012; Mao et al. 2013; Campêlo et al. 2017) and genomewide association studies (GWAS) (Nalls et al. 2014). It is therefore relevant to briefly describe the genes most consistently replicated in linkage and/or GWAS analyses, as this replication lends support to their true contribution to complex PD forms.

Table 3. Susceptibility genes/loci in PD identified in GWAS.

Mutations in \(\upalpha \)-synuclein (SNCA) and leucine-rich repeat kinase 2 (LRRK2) cause familial autosomal dominant forms of the disease. SNCA was the first PD gene identified by linkage studies (Polymeropoulos et al. 1997), but mutations/single-nucleotide variants and copy number variations in this gene contribute a very minor proportion of familial PD cases (Petrucci et al. 2016) compared to their robust association with sporadic cases. A dinucleotide repeat polymorphism in the promoter of SNCA, which increases its expression in vitro, has been shown to be associated with increased PD risk (Maraganore et al. 2006). A strong association between specific haplotypes in the SNCA locus and sporadic PD has been demonstrated (Chai and Lim 2013). Even a common variant (rs356168) in the SNCA 3\(^\prime \) UTR is reported to be associated with increased PD risk by increasing the translation and accumulation of SNCA protein (Toffoli et al. 2017). Both these types of variants essentially result in the aggregation of the \(\upalpha \)-synuclein protein, leading to formation of Lewy bodies and presumably to dopaminergic neuronal cell death. On the other hand, mutations in LRRK2, which codes for a kinase, are the most common genetic cause of sporadic PD, with a mutation frequency ranging from 2 to 40% in different populations (Klein and Westenberger 2012). More than 100 LRRK2 mutations have so far been described in PD families and sporadic cases (Li et al. 2014). PARKIN (Shimizu et al. 1998) and PINK1 (Valente et al. 2004) were identified in autosomal recessive PD families. Unlike in familial cases, mutations in these genes reported in sporadic forms of PD are heterozygous, and several lines of evidence suggest that these mutations are indeed susceptibility factors for PD (Khan et al. 2005; Eggers et al. 2010). PARKIN is a ubiquitin E3 ligase and is involved in mitochondrial maintenance, mitochondrial cytochrome c release and autophagy of dysfunctional mitochondria (Pickrell and Youle 2015).
PINK1 is a kinase that provides protection against mitochondrial dysfunction, regulates mitochondrial morphology via the fission/fusion machinery and helps in autophagy (Pickrell and Youle 2015). Glucocerebrosidase (GBA) is another of the most common known genetic risk factors for sporadic PD (Ran et al. 2016). Mutations in GBA are linked to Gaucher disease (GD), in which deficiency of the enzyme glucocerebrosidase causes a lysosomal storage disorder. Interestingly, a few GD patients also develop Parkinsonism (Neudorfer et al. 1996). In an Ashkenazi Jewish population with PD, 31.3% carried GBA mutations compared with 6.2% of control individuals (Aharon-Peretz et al. 2004), with the Asn370Ser mutation accounting for \(\sim \)70% of the disease burden (Sidransky and Lopez 2012). Another locus consistently associated with sporadic PD is MAPT, where the H1 haplotype was shown to confer risk of PD in three Caucasian populations from Ireland, Norway and the US (Skipper et al. 2004). Exon 1 of MAPT harbours a gene, SAITOHIN (STH), wherein a coding variant Gln7Arg (rs62063857) was shown to associate with risk of progressive supranuclear palsy and PD, and is in complete linkage disequilibrium (LD) with the MAPT 238-bp intron 9 deletion that discriminates H1/H2 (Tobin et al. 2008). Besides pathways such as the ubiquitin–proteasome system, mitochondrial dysfunction and ROS mediated degeneration, which lead to neuronal toxicity and cell death and were identified through familial studies (described above), genes from association studies have enabled identification of additional pathways in the pathogenesis of complex forms of PD. These include the immune system, autophagy/lysosomal degradation, microtubule stabilization, axonal transport, synaptic function and endocytosis. Yet the genetic basis of the majority of cases in this category is unexplained, indicating that more genes/pathways remain to be identified.
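
To put the GBA carrier proportions quoted above (31.3% of Ashkenazi Jewish PD patients versus 6.2% of controls) on a more familiar scale, the short Python sketch below converts them into a crude odds ratio. This is an unadjusted back-of-the-envelope calculation from the two percentages only, not the estimate reported by the cited study, and the function name `odds_ratio` is a hypothetical helper introduced for illustration.

```python
def odds_ratio(p_cases, p_controls):
    """Unadjusted odds ratio from carrier proportions in cases and controls."""
    for p in (p_cases, p_controls):
        if not 0.0 < p < 1.0:
            raise ValueError("proportions must lie strictly between 0 and 1")
    odds_cases = p_cases / (1.0 - p_cases)          # odds of carrying a mutation among cases
    odds_controls = p_controls / (1.0 - p_controls)  # same odds among controls
    return odds_cases / odds_controls


# Carrier proportions quoted in the text (Aharon-Peretz et al. 2004).
gba_or = odds_ratio(0.313, 0.062)
print(f"Crude odds ratio for GBA mutation carriers: {gba_or:.1f}")
```

With these inputs the crude odds ratio is a little under 7, which gives a sense of why GBA is regarded as a strong risk factor; a formal estimate would need the underlying counts and a confidence interval.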

Amyotrophic lateral sclerosis

ALS, commonly known as motor neuron disease, is a progressive degenerative condition of the brain and spinal cord. Motor neurons are grouped into upper populations in the motor cortex and lower populations in the brain stem and spinal cord, with lower motor neurons innervating the muscles (Rowland and Shneider 2001). The core pathological feature in ALS is motor neuron death in the motor cortex and spinal cord, while in ALS with frontotemporal dementia (\(\sim \)10–15% of ALS cases), neuronal degeneration is more widespread, occurring throughout the frontal and temporal lobes. ALS and frontotemporal dementia (FTD) are considered to be part of a spectrum. FTD patients characteristically present with behavioural and speech problems, and ALS patients present with alteration of voluntary movements caused by degeneration of motor neurons (Guerreiro et al. 2015). Loss of corticospinal motor neurons causes thinning and scarring (sclerosis), leading to muscle stiffness and spasticity, while loss of neurons in the brain stem and spinal cord causes electrical irritability and spontaneous muscle twitching. With disease progression, the ventral roots thin and lose synaptic connectivity to their target muscles, leading to denervational atrophy (amyotrophy) of the tongue, oropharynx and limb muscles (Brown and Al-Chalabi 2017). Diaphragm muscles are also affected, which generally leads to death due to respiratory paralysis within 3 to 5 years of diagnosis (Shoesmith et al. 2007). About one-third of ALS cases are bulbar, with difficulty in chewing, speaking or swallowing. Cognitive and behavioural changes are also observed in >30% of ALS patients (Hobson and McDermott 2016). The diagnosis of ALS is primarily based on clinical examination, with electromyography, MRI, blood/urine testing and muscle biopsy used to rule out other neurological disorders that mimic ALS.
The specific criteria for the diagnosis of ALS are known as the El Escorial criteria, which provide a structured tool to define the level of confidence of a diagnosis in individual patients (Brooks et al. 2000). They classify patients into different degrees of diagnostic certainty (definite, probable, probable with laboratory confirmation and possible) according to the presence of lower and upper motor neuron signs and their distribution in four regions, i.e. bulbar, upper limb, thoracic and lower limb. The first line of treatment is riluzole, a neuroprotective drug that blocks glutamatergic neurotransmission, which has been shown to reduce the mortality rate by 23% and 15% at 6 and 12 months, respectively (Traynor et al. 2003). In 2015, edaravone, an antioxidant compound, was approved in Japan for treatment of ALS. On 5 May 2017, the FDA approved edaravone to extend the survival period of people with ALS (Rothstein 2017). Arimoclomol is another promising compound, which is a coinducer of heat shock protein expression under conditions of cellular stress. The therapeutic potential of this drug is currently being tested in a phase II clinical trial in ALS patients with SOD1 mutations (Kalmar et al. 2014).

ALS, with a prevalence of 3–5 per 100,000, is an age dependent disorder with an increased incidence and higher prevalence in older populations (Pasinelli and Brown 2006). Monozygotic twin concordance is estimated to be between 60 and 80%, suggesting a major genetic contribution to disease aetiology (Al-Chalabi et al. 2010). About 90% of ALS cases are sporadic, with the rest being familial and generally inherited as an autosomal dominant condition (Byrne et al. 2011). Except in a few families with younger age at onset, the age at onset is \(\sim \)55 years in both these groups. As in AD and PD, there is genetic heterogeneity in both familial and sporadic ALS forms, but pathological and clinical features are similar, indicating involvement of common cellular and molecular events leading to motor neuron degeneration. Based on the genes known to date, three processes, namely (i) proteostasis and protein quality control; (ii) cytoskeletal dynamics; and (iii) RNA stability, function and metabolism, seem to be generally affected (Weishaupt et al. 2016), which may provide leads for new drug discoveries.

Genetic risk factors

A number of genes relevant to ALS pathogenesis identified recently explain \(\sim \)70% of the familial cases and \(\sim \)30% of sporadic cases. This is notably higher than the heritability explained in PD and AD. Mutations found in familial ALS are also reported in sporadic forms. More than 50 genes, including TARDBP, FUS and VCP, have been identified in ALS using approaches ranging from conventional linkage analysis and candidate gene association/GWAS to recent NGS (Li and Wu 2016). Of these, disease causal variants/risk alleles in a few genes such as DCTN1, FIG4 and DAO have been limited to the study families or populations, respectively. SOD1 was the first gene identified to be causal for ALS using linkage analysis in affected families (Siddique et al. 1991; Rosen et al. 1993), with mutations accounting for \(\sim \)20% of familial cases and \(\sim \)2–4% of sporadic cases (Chen et al. 2013). Both sporadic and familial ALS cases with different SOD1 mutations show variation in phenotype, age of onset, severity, rate of disease progression and duration of illness. For example, the Asp90Ala mutation in exon 4 of SOD1 causes a less severe form of the disease in which average survival exceeds 10 years after diagnosis (Andersen et al. 1996), whereas the Ala4Val mutation in exon 1 has 91% penetrance and patients with this mutation survive for <18 months after diagnosis (Cudkowicz et al. 1997). SOD1 is a ubiquitous cytosolic homodimeric protein, each subunit containing copper and zinc ions in the active site, its primary function being conversion of the superoxide radical to \(\hbox {H}_{2}\hbox {O}_{2}\) (Fukai and Ushio-Fukai 2011). Studies have led to widespread acceptance of the hypothesis that SOD1 mutants acquire a novel toxic property independent of their enzymatic function (Redler and Dokholyan 2012). Both sporadic and familial ALS cases show aggregation of cytoplasmic proteins, prominently in motor neurons.
The hallmark proteins of the pathogenic inclusions are SOD-1, TDP-43 or FUS. Almost all cases of ALS and tau-negative FTD have abnormal TDP-43 protein, also known as TDP-43 proteinopathy (Liscic 2017). The genetic findings of ALS and FTD also support a common link, as mutations in the same genes have been found in ALS, FTD, or FTD/ALS patients. Mutations in the TANK-binding kinase 1 gene (TBK1) has presented with features of ALS and FTD worldwide, however additional neurological features have also been described in TBK1 subjects. In a recent publication progressive supranuclear palsy (PSP) like phenotypes and progressive cerebellar syndromes in TBK1 subjects have been reported (Wilke et al. 2017). C9orf72 is another recently discovered gene in ALS linked to 9p chromosomal region. This gene is recognized as the most common form of ALS and FTD, additionally some patients may present with Parkinsonism, essential tremor and restless leg syndrome (Nuytemans et al. 2013b). Notably, an expansion of a hexa-nucleotide repeat GGGGCC (2–23 repeats in healthy individuals) in the noncoding region of this gene has been reported to be most common cause of familial ALS explaining >30% cases and >10% of the sporadic forms (DeJesus-Hernandez et al. 2011; Majounie et al. 2012). The C9orf72 protein has a role in nuclear and endosomal membrane trafficking and autophagy. There is evidence that RNA foci containing this repeat accumulate in the brains and spinal cords of affected people, and this suggested a second possible disease mechanism, involving toxic gain of function by repeat-containing RNA (Zu et al. 2013). Another \(\sim \)5% of sporadic and \(\sim \)3% of familial cases are caused by mutations in the gene coding for TDP-43 (Beleza-Meireles and Al-Chalabi 2009). In addition, modifiers such as 27–33 CAG repeats in ATXN2, increasing the risk of developing ALS (Daoud et al. 
2011), and variants in EPHA4 that reduce expression of this axonal guidance gene and improve the overall survival of ALS patients, have also been reported (Elden et al. 2010; Van Hoecke et al. 2012). Besides linkage based causal gene discoveries, more than 14 GWASs have been carried out on ALS, which have identified common variants associated with the disease (table 4). Genes such as UNC13A and C9orf72 are robustly replicated, but many others including FGGY and ITPR2 have failed to show association in large cohorts (Daoud et al. 2010). These findings from sporadic forms have implicated regulation of chemotaxis and cellular communication/differentiation, intracellular signal transduction, oxidative stress and cytoskeleton organization pathways, in addition to validating the involvement of SOD1 and C9orf72 mediated protein/RNA toxicity reported earlier in familial forms. Further, all common variants known to date explain <30% of sporadic cases. However, since most of these studies have been performed in European populations, there is scope for identification of additional risk loci using populations with different genome architectures.

Table 4. Susceptibility genes/loci for ALS identified in GWASs.

Huntington disease

Huntington disease (HD) is also a progressive neurodegenerative disease but, unlike the sporadic complex forms of AD, PD and ALS described above, it is the most common monogenic condition in this group of disorders and largely follows an autosomal dominant mode of inheritance. However, the variable age of disease onset witnessed among HD patients implies a significant role of modifier genes and warrants its consideration under complex disease forms. HD is clinically diagnosed by assessment of motor dysfunction, cognitive impairment and neuropsychiatric features, with family history being a major feature. Loss of self and spatial awareness, depression, dementia and anxiety, which are devastating to both patients and their families (Labbadia and Morimoto 2013), are associated features observed in HD patients. The earliest symptoms are often subtle changes in mood or mental abilities. Incoordination and an unsteady gait often follow. As the disease advances, uncoordinated, jerky body movements also appear. Mental abilities generally decline into dementia (Ross and Tabrizi 2011). Degeneration of medium spiny neurons of the striatum and of other regions of the brain such as the cerebral cortex, thalamus and cerebellum results in the manifestation of the characteristic disease symptoms (Bano et al. 2011); this degeneration can be assessed by MRI or CT scans. No treatment that alters the course of HD is available. Tetrabenazine reduced chorea in a randomized controlled trial (Huntington Study Group 2006). Other drugs that help to reduce chorea include neuroleptics and benzodiazepines. Compounds such as amantadine or remacemide are still under investigation but have shown preliminary positive results (Mestre et al. 2009). Hypokinesia and rigidity, especially in juvenile cases, can be treated with antiparkinsonian drugs, and myoclonic hyperkinesia can be treated with valproic acid. Genetic counselling is advised if there is a family history of disease.
HD has a global prevalence of 5–10 individuals per 100,000 (Roos 2010). The onset of disease is in the prime of adult life, i.e. the fourth and fifth decades, with a disease duration of 15–20 years (Hardiman et al. 2016).

Genetic risk factors

Conventional restriction fragment length polymorphism based analysis in a large Venezuelan family with HD established its linkage at 4p16 (Gusella et al. 1983). However, it was only after the success stories of causal gene identification in Duchenne muscular dystrophy, cystic fibrosis and the trinucleotide repeat expansion disorder fragile-X syndrome that the pathological CAG repeats in HTT were identified (MacDonald et al. 2017). The repeat expansion is a gain-of-function mutation leading to protein aggregation and toxicity, with neuronal cell death in the striatum. Impairment of the proteostasis network leading to synaptic dysfunction, mitochondrial toxicity and axonal transport defects seems to be the underlying pathological mechanism. One of the strongest genetic risk factors associated with HD is the length of the CAG trinucleotide repeats in exon 1 of HTT. Normal populations have 9–35 of these repeats, with a median between 17 and 20, and >40 repeats lead to definite disease manifestation (Andrew et al. 1993; Snell et al. 1993). Repeats >60 result in juvenile Huntington’s disease (JHD), with onset at <20 years of age, accounting for \(\sim \)7% of all HD cases (Nance and Myers 2001). Intermediate repeat lengths of 36–40 carry a lower disease predisposition and may lead to incomplete penetrance (McNeil et al. 1997). These repeat sequences are prone to slippage during DNA replication, resulting in their expansion; this is witnessed more often on transmission through males, together with greater variability in repeat lengths (Wheeler et al. 2007). Age at onset is inversely correlated, and disease severity positively correlated, with the number of repeats. Further, HD is also characterized by genetic anticipation, which means increased disease severity and earlier age at onset in subsequent generations.
Unlike other classical Mendelian disorders, where a clear genotype–phenotype correlation is observed, in HD such a prediction of disease phenotype and age at onset is not possible despite the presence of expanded CAG repeats. Two individuals with the same number of repeats in the pathological range can differ by as many as 20 years in age of disease onset (Gusella and MacDonald 2009). Such a striking variation is currently attributed to the interaction of additional genetic factors, i.e. modifier genes present in the genome, which has been substantiated by the evidence discussed below. CAG repeat expansion in HTT is detected in 99% of cases with typical HD. In the remaining 1%, the disease has been linked to mutations in PRNP and JPH3, causing HD-like disease 1 and 2, respectively (Pihlstrøm et al. 2017).
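
The repeat-length ranges quoted above can be summarised as a simple lookup. The following is a minimal illustrative sketch in Python that only restates the thresholds given in this section (9–35 normal, 36–40 intermediate, >40 definite disease, >60 juvenile-onset range). The function name `classify_cag_repeats` and the category labels are hypothetical, repeat counts below 9 are assumed here to fall in the normal range, and the sketch is not a diagnostic tool. As discussed above, repeat length alone does not predict the age at onset of an individual.

```python
def classify_cag_repeats(n_repeats: int) -> str:
    """Map an HTT exon 1 CAG repeat count to the ranges quoted in the text.

    Thresholds restate the review's figures; labels are illustrative only.
    """
    if n_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if n_repeats <= 35:
        # 9-35 repeats are reported in normal populations
        # (assumption: counts below 9 are also treated as normal here).
        return "normal"
    if n_repeats <= 40:
        # 36-40 repeats: lower predisposition, incomplete penetrance.
        return "intermediate"
    if n_repeats <= 60:
        # >40 repeats: definite disease manifestation.
        return "pathological"
    # >60 repeats: associated with juvenile HD (onset < 20 years).
    return "pathological (juvenile-onset range)"


if __name__ == "__main__":
    for n in (17, 38, 45, 70):
        print(n, classify_cag_repeats(n))
```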

Since the discovery of pathogenic trinucleotide repeat expansion mutation in HTT and the observed clinical heterogeneity in HD, there have been a number of studies which investigated the role of genes that modify the disease pathogenicity. The length of CAG repeat in an affected individual was speculated to have a disease modifying effect and therefore, earlier studies focussed on variants at HTT locus itself that could alter the structure, function or expression of Huntingtin protein (Gusella et al. 2014). A recent study conducted on 4068 HD individuals however showed no significant impact. Interestingly, the same group showed that individuals with (CAG)\(_{n}\) expansion in both chromosomes (although a very rare event) have the age of disease onset consistent with the length of longer allele of the two expanded CAG repeat alleles (Lee et al. 2012). The other variants present in HTT haplotypes, like variable CCG repeats next to the CAG repeat, a deletion polymorphism at codon 2642, a deletion of one of the four consecutive GAG codons in exon 58, untranslated sequence of the transcript, intron sequences, and sequences flanking the centromeric and telomeric ends of the gene were also investigated for their modifying effect on age of disease onset but no correlation was observed (Gusella and MacDonald 2009). This was followed by the hypothesis testing approach where genes for establishing association were selected from transcriptome and proteome data obtained from peripheral blood of HD patients (Gusella and MacDonald 2009; Arning 2016). The genes were first sequenced to identify variants, if any, and then tested for association with age of onset. A number of replication studies using these variants have also been conducted lately in larger sample sizes and using appropriate statistical tools but with negligible replication (Arning and Epplen 2012). 
In addition, a few other candidate genes prioritized based on their role in energy metabolism, neurotransmission, HTT protein interactions and regulation of gene expression in the target tissue were also assessed for their association but results were not promising. These candidate gene-based association findings are summarised in table 5.

Table 5. Candidate genes associated with HD.

These limited insights from early association studies gave an impetus to perform hypothesis free GWASs. A combined analysis of two sequential GWASs (GWAS1+GWAS2) using 1951 HD mutation positive samples of European ancestry collected at MaHDC (Massachusetts HD Center Without Walls) identified SNP rs146353869 on chromosome 15 to be significantly associated (\(P = 4.36 \times 10^{-9}\)) with the age of disease onset. Association of this SNP was notably replicated in another GWAS on an independent cohort of 2131 individuals (GWAS3) of the same ancestry (\(P=1.35\times 10^{-12}\)). Meta-analysis of GWAS1+GWAS2+GWAS3 detected an enhanced association of the index SNP (\(P=4.3\times 10^{-20}\)) and, interestingly, the analysis also identified two more associated SNPs, one on chromosome 15 (rs2140734; \(P=7.1\times 10^{-14}\)) and another on chromosome 8 (rs1037699; \(P= 2.7\times 10^{-8}\)). Of note, of the two SNPs on chromosome 15 (not in LD), rs146353869 seems to be the risk allele, which accelerated the age of disease onset by 6.1 years, while rs2140734 delayed the onset by 1.4 years; the risk allele rs1037699 on chromosome 8 hastened the onset by 1.6 years (Lee et al. 2015). Beyond the primary (CAG)\(_{n}\) expansion mediated neurodegeneration, these association findings are informative, but the actual mechanism(s) by which these modifiers influence the age of onset remain unclear. The major limitations in unravelling the modifiers that explain reduced penetrance or variable age of onset in HD could be clinical and genetic heterogeneity, which current diagnostic/analytical tools have failed to address, an idea which has also been proposed by a few others (Genin et al. 2008). In depth analysis of the GWAS findings to identify potential protein-coding variants in LD with the GWAS SNPs, and rare variants, if any, in yet to be identified relevant genes, may provide some leads.
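
As a worked illustration of the meta-analysis figures quoted above, the following Python sketch tabulates the three modifier SNPs and compares each P value with the conventional genome-wide significance threshold of \(5\times 10^{-8}\). That threshold is an assumption here, since it is not stated in this review. The names `MODIFIER_SNPS` and `summarise` are hypothetical, and the sign convention for the onset shift (negative for earlier onset) is chosen only for this example.

```python
# Conventional genome-wide significance threshold
# (assumption: not stated in the review itself).
GENOME_WIDE_ALPHA = 5e-8

# Meta-analysis values as quoted in the text (Lee et al. 2015).
# shift_years < 0: onset hastened; shift_years > 0: onset delayed.
MODIFIER_SNPS = {
    "rs146353869": {"chrom": 15, "p": 4.3e-20, "shift_years": -6.1},
    "rs2140734": {"chrom": 15, "p": 7.1e-14, "shift_years": 1.4},
    "rs1037699": {"chrom": 8, "p": 2.7e-8, "shift_years": -1.6},
}


def summarise(snps, alpha=GENOME_WIDE_ALPHA):
    """Return (rsid, passes_threshold, direction, years) sorted by P value."""
    rows = []
    for rsid, info in sorted(snps.items(), key=lambda item: item[1]["p"]):
        direction = "hastens" if info["shift_years"] < 0 else "delays"
        rows.append((rsid, info["p"] < alpha, direction, abs(info["shift_years"])))
    return rows


if __name__ == "__main__":
    for rsid, significant, direction, years in summarise(MODIFIER_SNPS):
        print(f"{rsid}: {direction} onset by {years} years; "
              f"below threshold: {significant}")
```

Under this assumed threshold, all three SNPs pass, with rs1037699 on chromosome 8 the closest to the cut-off.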

Limitations

A detailed account of the limited insights into disease genetics, poor replication of association findings across studies and populations, an overall inability to establish genotype–phenotype correlations and/or predict risk in the predominant group of complex, sporadic forms of AD, PD, ALS and HD has been presented in the preceding paragraphs. The contrast between these observations and the powerful causal genes discoveries made in familial forms of the same pathologies is striking. Common and rare variants identified by several GWASs and their meta-analyses (Ramanan and Saykin 2013; Chang et al. 2017), emerging WES data and the limited WGS studies (Nicolas et al. 2016) have failed to explain total disease heritability in the sporadic forms of these disorders. Further, rare variants detected in these conditions such as PD (Nuytemans et al. 2013a), AD (Bertram 2016) and ALS (Morgan et al. 2015) are mostly private and confirming their inherited nature (and not a de novo origin) is also rather difficult due to the generally late age of onset of these disorders. Another aspect to be noted is that all these neurodegenerative diseases are tissue specific and sporadic cases may have postzygotic de novo mutations in the target tissue during early development leading to disease manifestation, which will never be detected in blood DNA. A few reports showing such postzygotic brain specific mutations in brain disorders support such a possibility (Beck et al. 2004; Rivière et al. 2012). Thus, a small percentage of disease may always remain unexplained. What these big data obtained from GWASs and NGS imply is then obvious. 
Even if only protein-coding variants and not regulatory changes were to be considered as predominant contributors to suboptimal functioning of the putative causal gene(s), we would neither be able to catalogue all the possible genomewide and tissue specific disease causing variants (DCVs) nor would we be successful in establishing replicable/reliable genotype–phenotype correlations. To put it in perspective, can even whole genome sequencing of blood DNA from sporadic forms of AD, PD and ALS identify oligogenic/polygenic variants and facilitate risk prediction and disease prevention? Would we be able to identify all the modifiers and their interactions to predict the clinical outcome of trinucleotide expansion present in the pathological range in individuals from HD families? Cohorts with differences in the genetic background may also segregate independent modifiers and may show poor replication. Further, clinical and genetic heterogeneity which characterize these diseases are unlikely to be addressed by attempts to document common susceptibility variants by increasing sample sizes from a few thousands to hundreds of thousands for GWASs and rare variants by advocating WGS. At best what the former could provide would be a few more risk conferring genes and a few more putative contributory pathways, if not already identified by WES based gene discoveries in familial forms. To reiterate, insights from GWAS findings for understanding the disease pathology per se have been negligible compared to those from findings in familial forms. Several loci of minor effects have been identified but very few of them have been functionally validated. Most of these are in the intronic regions and do not result in structural changes, but they may alter the expression levels of the adjacent genes as shown by rare study on functional validation of GWAS identified variant in SNCA (Campêlo et al. 2017). 
Therefore, a discovery approach combined with functional validation of the identified variants using appropriate in vitro and cellular/animal models may be rewarding.

With all these constraints, advanced computational and systems biology approaches to analyse the big data are emerging. How the study of molecular, cellular and brain networks provides additional information on the effects of late onset AD-associated genetic variants has been discussed in a recent review, which highlighted the immune/microglia module as strongly associated with AD pathophysiology (Gaiteri et al. 2016). At this juncture, it may be important to follow these limited leads in hand and possibly explore alternate paradigms for unravelling the genetics of these complex neurodegenerative disorders in this era of predictive, preventive, personalized and participatory (P4) medicine. It is now evident that clinical heterogeneity in complex disease forms is the primary obstacle for effective discovery genomics; this applies not only to the four disorders under discussion in this review but also to almost all of the \(\sim \)60% common complex disorders in humans. Developing a strategy to address this concern and obtain homogeneous sample sets may be the most rewarding. One such emerging strategy is ‘Ayurgenomics’, a combination of deep phenotyping principles practised in Ayurveda (the Indian traditional system of medicine) and contemporary genome analysis tools (Thelma 2008; Mukerji and Bhavana 2011; Juyal et al. 2012; Govindaraj et al. 2015; Prasher et al. 2016), which may fulfil this need. In this approach, every individual is categorized into one of the specific constitution (prakriti) groups of vata, pitta and kapha predominant or mixed prakriti using the Ayurveda criteria. Further, each of these prakriti groups is also documented for its susceptibility to different common complex traits, e.g. vata prakriti individuals are more prone to neurological disorders (Sharma 2003; Dey and Pahwa 2014).
Accordingly, it is hypothesized that applying Ayurveda principles to predict prior risk/treatment outcome in genetic association studies may explain much more disease variance and thus potentially open up more predictive health, the major goal of P4 medicine. A few recent studies have validated these principles using rheumatoid arthritis (RA) as an example. An unpublished GWAS from our lab identified (i) different sets of functionally relevant risk genes with impressive effect sizes in each of the RA subgroups, and (ii) a very good correlation between genetic findings and clinical knowledge of Amavata (RA). Similar findings were also observed in a previous study (Juyal et al. 2012) suggesting that it may indeed be a very promising approach to overcome the current limitation of phenotypic heterogeneity in complex traits.

Prospects

It is clear from the above discussion that advances in discovery genomics and functional analysis of putative determinants with translational potential for complex forms of neurodegenerative diseases have been negligible. Levodopa administration in PD, cholinesterase inhibitors in AD, tetrabenazine in HD and riluzole in ALS are the oldest pharmacotherapies. However, the efficacy of these first lines of treatment has been limited. This may be due to pharmacogenetic differences, drug noncompliance or unknown causes, contributing singly or in combination. In PD, mutations in PINK1 and PARKIN cause neuronal death due to mitochondrial dysfunction. Study of the gene networks that orchestrate this process has singled out a transcription regulator, ATF4, which plays a central role in mitochondrial metabolism and which may act as a neuroprotector and emerge as a new therapeutic strategy (Sun et al. 2013), but such examples of genome based medicine/novel therapeutics are rather few. Given that the central dogma in medicine is diagnosis-prevention-cure-treatment, should wisdom and solutions be sought from alternate technologies/experimental advances?

One of the alternate therapies attempted has been the replacement of dead dopaminergic neurons with stem cell-derived neurons. Stem cell therapies have been attempted over the last two decades and hold promise for the treatment of neurodegenerative disorders, particularly PD and ALS. The first clinical trial of tissue replacement, using autologous adrenal medullary tissue (Backlund et al. 1985), showed significant clinical improvement. Subsequently, xenografts of porcine mesencephalic tissue were also performed in PD patients but without much success (Fink et al. 2000). In another attempt, neural stem cells taken from cortical and subcortical tissue were isolated, expanded and injected into the striatum, resulting in improvement of clinical phenotypes (Neuman et al. 2009). Bone marrow derived stem cells have also been effectively used in clinical trials of patients with neurodegeneration (Lee et al. 2008). In the next phase, foetal mesencephalic transplantation was used successfully in three open label trials, but the results were not replicated in National Institutes of Health double blind trials (http://www.medicalnewstoday.com/articles/113908.php). However, these approaches suffered from limitations such as the small number of embryonic stem (ES) cells that can be obtained from each abortus, and thereby the need for more material, which in turn brings the associated problem of quality deterioration of already collected neurons. Thus, of the four stem cell populations being explored for clinical applications, namely ES cells, neural stem cells, haematopoietic stem cells and induced pluripotent stem cells (iPSCs), iPSCs seem to be the most powerful, permitting multiple experimentations.
The combination of iPSCs and genetic information is remarkable: it offers the possibility, firstly, of generating iPSCs from patient-derived fibroblasts and differentiating them into the neurons of choice for the neurodegenerative disease under consideration and, secondly, of correcting the disease causing mutation in a gene using the clustered regularly interspaced short palindromic repeats (CRISPR) and associated protein Cas9 strategy (Hockemeyer and Jaenisch 2016). This technology therefore seems to be emerging as the most promising therapeutic intervention for a range of genetic disorders. The added advantages that iPSCs offer range from functional characterization of genes of therapeutic relevance, using gene specific assays or hypothesis free approaches (such as RNA sequencing or methylome sequencing), and gene/genetic variant editing or genome editing using CRISPR/Cas9 in the same genetic background, to the screening of known or novel therapeutic molecules, facilitating personalized medicine. The success story of iPSC-derived cells being transplanted in a Japanese woman in her 70s to treat her macular degeneration provided a proof of concept of their use in humans (Cyranoski 2014). Recent reports of iPSC-based dopaminergic neuron transplants in monkeys with sustained improvement after two years of follow up (Callaway 2017), and results from the first group of patients who received cell transplants in a phase 1 clinical trial (NCT02452723) launched in March 2016, further support the potential application of iPSCs in clinical medicine. A recent study showed the correction of the heterozygous MYBPC3 mutation, which causes hypertrophic cardiomyopathy, in human preimplantation embryos using this technology (Ma et al. 2017). Although a few challenges similar to those of earlier stem cell transplantation attempts, including infections, tumourigenesis and graft induced dyskinesia, have to be addressed completely, this therapeutic option holds promise.
If iPSC-CRISPR/Cas9 based therapy becomes a reality in the near future, whole genome sequencing, identification of one or more rare causal variants therein, patient derived iPSC generation, gene/genome editing and repopulation of the specific brain regions with the corrected neurons may be expected to provide the much needed effective therapeutic intervention for the debilitating neurodegenerative disorders and fulfil the dreams of personalized medicine. Even as efforts in this direction are ongoing, the concept of moderate sized stem cell banks holding a quality checked set of lines carrying different combinations of commonly present HLA alleles is emerging (International Stem Cell Corporation, Australia). In this clinical strategy, the patient only needs to be matched for immune compatibility, and the whole process of generating individual patient-derived stem cells is not required. Simultaneously, pathway information for the gene(s) identified as putatively causal in the patient would enable use of an already known drug molecule, in line with the current interest in repurposed drugs, which obviates the need for new and elaborate FDA approvals. Pathway knowledge would also facilitate identification of novel druggable targets therein and subsequent development of new lead molecules, which would always be a preferred treatment option.