Main

Alzheimer’s disease (AD) is a common, progressive and fatal age-associated neurodegenerative disorder that is characterized by neuron loss and stereotypic deposition of misfolded proteins2. The formation of oligomers of amyloid-β may initiate disease pathogenesis, triggering a cascade of events that include the development of tau neurofibrillary tangles and oxidative stress1. Tau deposition, which correlates most closely with clinical features, progresses topographically over the course of illness from medial temporal lobe structures to the neocortex, as delineated in the Braak staging system3. Despite substantial mechanistic knowledge of the formation of misfolded proteins, the core basis of cellular dysfunction in AD is not well understood.

Somatic mutations occur in healthy human tissues12,13,14, including post-mitotic neurons15,16, in which they accumulate during ageing in a process known as genosenium5,17. Analysis of somatic mutational signatures can identify the mutagenic forces responsible, including ultraviolet irradiation in sun-exposed cancers and tobacco-associated polycyclic aromatic hydrocarbons in lung cancers8,18. In human neurons, mutational signature analysis has revealed that somatic single-nucleotide variants (sSNVs) result from multiple mutagenic forces, potentially including the oxidation of DNA nucleotides5. AD shows increased oxidative stress and damaged nucleotides4, but the extent to which these damaged nucleotides are eliminated by manifold DNA repair processes, and whether they result in persistent DNA mutations, producing permanent effects on genome structure or transcription, are not known. Bulk methods, including targeted gene sequencing19 and single-molecule sequencing20, have profiled aspects of AD somatic genetics, but AD has not to our knowledge been examined at the level of individual cellular genomes. Here, to test the hypothesis that specific mechanisms of genomic damage affect AD neurons, we applied single-cell whole-genome sequencing (scWGS) to single neurons from the brains of individuals with AD and neurotypical control individuals to compare the number, genomic locations and classes of somatic mutations that are associated with AD.

Somatic mutations in neurons during ageing

We performed scWGS on pyramidal neurons isolated from the brains of individuals with AD and neurotypical control individuals (Fig. 1a, Supplementary Tables 1, 2). We stained for the pan-neuronal marker NeuN to mark neurons, and further gated only the largest NeuN-positive nuclei (Fig. 1b). This separates, to a purity greater than 99%, the nuclei of pyramidal, excitatory neurons—which are preferentially vulnerable to both neurofibrillary tangle formation21 and cell death in AD22—from those of glia and smaller, inhibitory neurons (Fig. 1c). Here, scWGS involves single-cell alkaline lysis on ice, whole-genome amplification using multiple displacement amplification (MDA) and then several screening and quality control steps, so that only genomes that are well amplified are finally sequenced. In total, using MDA, we analysed 91 neurons from 8 cases of AD and 159 neurons from 18 neurotypical control individuals (Table 1). We identified sSNVs using the LiRA pipeline23, which uses linkage to germline haplotypes to increase specificity and estimates the genome-wide somatic mutation rate by accounting for the cell-specific proportion of phaseable linked sites and false positive rate. For these MDA-amplified single-cell genomes, we performed additional filtration steps based on previously reported patterns of nucleotide substitution attributed to artefacts of genome amplification by MDA24 (see Methods, Extended Data Fig. 1). This set of filtered sSNV calls showed a variant allele fraction distribution that was very similar to that of germline heterozygous SNVs in single-cell data (Extended Data Fig. 2), which allowed us to confirm that, in neurotypical individuals, neuronal sSNVs increased with age at a rate of 16–21 sSNVs per year (Fig. 1d, Extended Data Fig. 3a–d)—consistent with previous work on neurons5,20,25. 
Studies using clonally expanded cells from other human tissues have shown comparable yearly increases in sSNVs, ranging from 13 to 55 sSNVs per year, with higher rates in more rapidly dividing cell types (Extended Data Table 1).

Fig. 1: Somatic mutations in single neurons in control individuals and individuals with AD.

a, Experimental outline for scWGS. From human brain, large neurons were isolated and their genomes were amplified, sequenced, and analysed for sSNV. FANS, fluorescence-activated nuclear sorting. b, FANS using AF488-conjugated anti-NeuN antibodies to label candidate neurons for separation from glia and other cell types. Boxes show the full population of DAPI+ diploid cellular nuclei (blue dashed box); the overall population of NeuN+ nuclei (pink dashed box); and the large NeuN+ subset (black box; the subject of this study). c, Single-nucleus transcriptomic profiling of each population. Individual cells are plotted according to t-distributed stochastic neighbour embedding (t-SNE) coordinates, and clusters of 50 cells or more are annotated50 and labelled by colour, with a pie chart of the relative abundance of excitatory neurons, inhibitory neurons and glia in each population. OPC, oligodendrocyte precursor cell. d–l, sSNVs identified using MDA genome amplification. d–f, sSNVs in neurotypical control neurons. Data points represent single neurons; trend lines show linear mixed models (PFC: P = 3.3 × 10−7, R2 = 0.63; hippocampus (HC): P = 0.16, R2 = 0.18). g, Contribution of ageing signature A to sSNVs (P = 1.67 × 10−10, R2 = 0.68, linear mixed model). h, sSNVs as a function of age in neurotypical control individuals and individuals with AD (linear mixed model trend lines: blue, control: P = 6.8 × 10−7, R2 = 0.51; red, AD: P = 0.46, R2 = 0.01). AD contributes a significant excess of sSNVs in neurons relative to normal ageing (P = 6.5 × 10−5, linear mixed model). i, AD neurons show increased sSNVs compared with age-matched (over 50 years old) control neurons (874 sSNVs per neuron, P = 7.1 × 10−5, two-tailed Wilcoxon test). j, k, Excess sSNVs attributable to AD in the PFC (j) and the hippocampus (k). The dashed blue line shows sSNVs attributable to age (zero excess). For i–k, black bars show mean ± s.e.m. l, Circos plot showing the wide distribution of sSNVs across the genome in AD neurons.

Table 1 Case information and number of neurons analysed in this study

We next examined the accumulation of sSNVs in pyramidal neurons located in the CA1 subfield of Ammon’s horn of the normal hippocampus, as this is a critical region in AD and other diseases. Hippocampal CA1 neurons from individuals who died with no neurological diagnosis showed a trend towards the accumulation of sSNVs with age (Fig. 1e), which was not significantly different from the increase in sSNVs seen in prefrontal cortex (PFC) neurons from neurotypical control individuals (P = 0.72, linear mixed-effects regression model (linear mixed model); overlay in Fig. 1f). When considering the PFC and the hippocampus together (Extended Data Fig. 3a–d), this set of single cells highlights a common pattern of sSNV accumulation in the pyramidal neurons of neurotypical individuals.

Large-scale DNA sequencing studies in cancer have identified patterns and contexts of nucleotide substitution, termed ‘signatures’8, which often reveal mutagenic forces. In normal PFC neurons, the age-related increase in mutations is driven mainly by certain C>T and T>C changes, termed signature A5. This signature resembles the age-related ‘clock-like’ signature that is observed in other normal cells as well as in essentially all cancer cells9, designated as signature SBS5 in the COSMIC mutational signature database (https://cancer.sanger.ac.uk/cosmic/signatures). Signature decomposition analysis of sSNVs from the composite dataset of PFC and hippocampal pyramidal neurons showed that the contribution of signature A in each neuron increased with age, at a rate of 15.0 ± 1.2 sSNVs gained per year (Fig. 1g). This age-related increase in signature A mutations is similar for PFC and hippocampal pyramidal neurons (P = 0.18, linear mixed model), and is the major driver of age-related sSNV accumulation in normal neurons. Despite their universal presence in many cell types, and their accumulation in nondividing cells, the cellular mechanism of such clock-like mutations is not clear. Signature SBS5 exhibits a transcriptional strand bias9, which suggests that events leading to these mutations are associated with RNA transcription. During transcription, the double helix is unwound, exposing single DNA strands to cytosine and thymine deamination17, which are subject to transcription-coupled nucleotide excision repair (TC-NER). Transcription may therefore sensitize expressed loci to somatic mutagenesis through transcription-associated damage or ineffective repair.
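The "patterns and contexts of nucleotide substitution" that define a signature are usually tabulated as 96 channels: six pyrimidine-centred substitution types, each in 16 flanking-base contexts. The following is a minimal sketch of that bookkeeping, not the pipeline used in this study; the function names and the three example calls are hypothetical and serve only to illustrate the convention.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mutation_channel(trinucleotide, alt):
    """Label one SNV as a pyrimidine-centred channel, e.g. 'T[C>A]A'.

    trinucleotide: reference base with its 5' and 3' neighbours.
    alt: the mutant base observed at the middle position.
    """
    ref = trinucleotide[1]
    if ref == alt:
        raise ValueError("alt must differ from the reference base")
    if ref in "AG":
        # Purine reference: describe the same event on the opposite strand.
        trinucleotide = reverse_complement(trinucleotide)
        alt = COMPLEMENT[alt]
        ref = trinucleotide[1]
    return f"{trinucleotide[0]}[{ref}>{alt}]{trinucleotide[2]}"


def all_channels():
    """Enumerate the 96 channels in a fixed order."""
    channels = []
    for ref, alts in (("C", "AGT"), ("T", "ACG")):
        for alt in alts:
            for five, three in product("ACGT", repeat=2):
                channels.append(f"{five}[{ref}>{alt}]{three}")
    return channels


def spectrum(calls):
    """Count calls per channel; calls is an iterable of (trinucleotide, alt)."""
    counts = Counter(mutation_channel(tri, alt) for tri, alt in calls)
    return [counts[channel] for channel in all_channels()]


# Hypothetical calls: the first two describe the same channel from opposite strands.
example_calls = [("TCA", "A"), ("TGA", "T"), ("ACG", "T")]
example_spectrum = spectrum(example_calls)
```

A per-neuron vector of this form is the kind of input on which signature decomposition operates; the decomposition step itself is not shown here.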

Somatic mutations in AD

We next assessed the burden of sSNVs in neurons from the brains of eight individuals with AD and found that AD neurons showed significantly more called sSNVs than expected on the basis of age (P = 6.5 × 10−5, linear mixed model; Fig. 1h). This excess was variable between neurons, mirroring the variable presence of AD pathology within neurons of a given brain region. AD neurons also showed a significant increase in called sSNVs in MDA experiments when directly compared to age-matched neurotypical control neurons (P = 7.1 × 10−5, two-tailed Wilcoxon test; Fig. 1i). This increase remained after controlling for potential covariates including post-mortem interval, sample storage time, sample DNA quality, sequencing depth, sequencing quality score, library insert size and number of heterozygous germline SNVs, as well as technical metrics of scWGS evenness (see Methods, Extended Data Fig. 3e–h). In the PFC, we observed significant gains in sSNVs in AD relative to normal ageing in seven out of eight individual cases of AD (Fig. 1j). Several of the genomes with the highest sSNV counts in AD came from the hippocampus, in which five of eight cases also showed significant increases in sSNVs compared with normal ageing (Fig. 1k). However, in three cases, the assayed hippocampal neurons did not show a detectable increase in the handful of cells assayed. On the basis of tau (Braak) and amyloid-β (Consortium to Establish a Registry for Alzheimer’s Disease; CERAD) neuropathological staging, hippocampal pathology appears to precede PFC damage, and the hippocampus of these late-stage cases invariably showed widespread neuronal loss as well (not shown). Thus, it is possible that highly mutated neurons are lost before death and therefore not possible to assay here, so our results may reflect resilient neurons that have survived despite advanced AD22. 
These results show that neurons in AD contain hundreds of additional sSNVs beyond that expected for their age, indicating that the disease process produces a level of genomic damage that is on par with more than a decade of normal accumulation of sSNVs.
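The comparison with "more than a decade" of normal accumulation is a simple division of the excess burden by the yearly rate in neurotypical neurons. The sketch below makes that arithmetic explicit. The rate range of 16 to 21 sSNVs per year is taken from the text above; the excess of 300 sSNVs is a hypothetical round number standing in for "hundreds", not a value reported in this study.

```python
def years_equivalent(excess_ssnvs, ssnvs_per_year):
    """Convert an excess sSNV burden into years of normal accumulation."""
    if ssnvs_per_year <= 0:
        raise ValueError("ssnvs_per_year must be positive")
    return excess_ssnvs / ssnvs_per_year


# Yearly sSNV gain in neurotypical neurons, as reported in the text.
AGEING_RATE_RANGE = (16.0, 21.0)

# Hypothetical excess burden, chosen only to illustrate the calculation.
ILLUSTRATIVE_EXCESS = 300.0

# The faster rate gives the smaller equivalent age, and vice versa.
low_years = years_equivalent(ILLUSTRATIVE_EXCESS, AGEING_RATE_RANGE[1])
high_years = years_equivalent(ILLUSTRATIVE_EXCESS, AGEING_RATE_RANGE[0])
print(f"{low_years:.1f} to {high_years:.1f} years of normal accumulation")
```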

The somatic mutations identified in AD neurons are pervasively distributed across the genome (Fig. 1l), with a trend towards an excess in regions at least 1 kb upstream from the transcription start site—where DNA damage has been implicated during neuronal gene transcription26—that does not survive Bonferroni correction (P = 0.045, two-tailed t-test; Extended Data Fig. 4). The broad genomic distribution of variants suggests that, rather than constituting a specific initial event in disease pathogenesis, somatic mutations are more likely to be secondary, resulting from other events that initiate AD and instigate mutagenic processes. Specifically, we did not observe somatic instances of known pathogenic mutations in classic germline AD risk genes (APP, PSEN1, PSEN2 and APOE), concordant with a recent report27, nor did we observe somatic increases in copy number of the APP gene, contrary to a previous study28 and as we reported in detail separately29. We also observed no consistent effect of an individual’s ApoE status or sex on the accumulation of sSNVs.

Mutational signature analysis in AD neurons

We next performed mutational signature analysis to identify whether specific processes cause somatic alterations in AD neurons. De novo signature decomposition revealed mutational signatures concordant with those previously reported in human neurons5 (Extended Data Fig. 5). We focused our analysis on neuronal signatures A and C (Fig. 2a), as signature B contains clonal developmental mutations, but is also where artefactual C>T mutations created by MDA amplification aggregate24. Signature A mutations increase with age in all samples, which suggests that this clock-like signature (that is most similar to the clock-like signature SBS5 from cancer5) constitutes an inherent feature of genome ageing. Signature A also shows a marginal increase in AD relative to age-matched controls (Fig. 2b, c), which does not reach statistical significance in these MDA experiments, but suggests that these mutational mechanisms could be accentuated in the setting of disease. On the other hand, AD neurons show a pronounced increase in signature C compared to controls (Fig. 2d, e), which accounts for most of the observed excess in alterations. The signature C burden shows more variation between neurons than that for signature A (Extended Data Fig. 5d), which suggests that signature C could result from irregular ‘calamitous’ events, in contrast to the uniform ageing represented in signature A.

Fig. 2: Somatic mutational signatures and patterns in AD neurons by MDA.

a, Somatic mutational signatures identified by NMF5. b, c, Signature A contribution by age (b; AD excess 418, P = 3.1 × 10−4, linear mixed model) and in individuals with AD versus age-matched control individuals (c; 27% increase in AD, P = 0.10, two-tailed Wilcoxon test). d, e, Signature C contribution by age (d; AD excess 549, P = 1.4 × 10−3, linear mixed model) and in individuals with AD versus control individuals (e; 104% increase in AD, P = 8.7 × 10−8, two-tailed Wilcoxon test). f, Oxidative damage in AD neurons, using 8-oxoG immunofluorescence. Data points represent mean absorbance units (AU) ± s.e.m. of n = 100 neurons per case in PFC (full data in Extended Data Fig. 7). Trend lines show linear mixed-effects regression (AD versus control: P = 1.2 × 10−6). Inset shows representative immunofluorescence images; neurons (NeuN; green) and oxidized guanine (8-oxoG; magenta). Scale bars, 60 μm. g, Genomic sSNV density as a function of gene expression in the brain. Diamonds represent mean relative sSNV density in single neurons (black vertical lines show s.d., n = 1,000 permutations). Overall trend line is shown in black (R2 and P value, Pearson correlation); 95% confidence interval (CI) in grey; and AD and control trend lines in colours. h, GO analysis of genes mutated in single neurons. i, sSNVs by DNA strand template status. sSNVs in transcribed regions exhibit a strand bias in the excess mutations in AD neurons, which is most pronounced in C>A variants (*P = 0.017, two-tailed Poisson test). j, Coding mutation subtypes, in which increased nonsynonymous mutations in AD (P = 1.6 × 10−5, two-tailed t-test) increase the propensity for presentation of neoantigen peptides. k, sSNVs that result in gene knockout cells. Model for the abundance of neurons with gene inactivation, affecting function. Circles represent the mean for each individual (n > 3 neurons each, see Source Data), with 95% CI. c, e, j, Data are mean ± s.e.m.

Signature C includes C>A substitutions, which have previously been associated with oxidative damage to guanine nucleotides18. Signature C also has a significant contribution from the cancer-associated signature SBS8 (ref. 5) (Extended Data Fig. 6a). This signature is increased in stem cells with disrupted TC-NER10,30, and we have observed an increase in signature C in single human neurons deficient in TC-NER owing to ERCC6 mutations, and in neurons deficient for global NER owing to XPA or XPD mutations5. Overlap between AD sSNVs and other cancer-derived signatures also suggests a potential role for NER in T>A, T>C and C>T mutations (Extended Data Fig. 6b). Signature C has been reported in normal neurons at low but highly variable levels5, with some accumulation with age in the normal PFC, and a similar signature has also been reported in ageing stem cells from the liver and intestine6. Given that increased reactive oxygen species (ROS) and oxidative nucleic acid lesions have been reported in AD4,31,32,33, a plausible mechanism for the accumulation of signature C in AD is that increased oxidative damage overwhelms NER, which could also be attenuated in AD. The set of excess mutations in individuals with AD, represented as the trinucleotide spectrum of residual mutations when subtracting those present in control individuals, also includes contributions from the cancer signature SBS6 (Extended Data Fig. 6b), which is associated with defective DNA mismatch repair, raising the possibility that other repair mechanisms may further contribute to the generation of somatic mutations in AD neurons.

Oxidative damage in AD neurons

Because our mutational signature analysis suggested that DNA oxidation—previously observed in bulk analyses of brains from individuals with AD4,11—might contribute to the excess sSNVs in AD, we directly examined nucleotide oxidative damage in individual neurons. The most frequent oxidized nucleotide lesion due to oxidative stress is 8-oxoguanine (8-oxoG), and this is therefore used as a biomarker for cellular oxidative status and DNA damage. Immunofluorescence microscopy using an antibody targeting 8-oxoG showed that there were significantly higher levels of 8-oxoG in AD neurons than in neurotypical control neurons (P = 1.2 × 10−6, linear mixed model; Fig. 2f, Extended Data Fig. 7), indicating that increased levels of oxidative nucleotide damage contribute to C>A changes and to the increase in signature C in AD neurons.

Transcriptional influence on somatic SNVs

Mutations in genes that are critical for neuronal function and survival could directly affect cellular fitness. Despite the preferential repair of transcribed genes in human neurons34, the burden of sSNVs in transcribed regions of the genome correlated with gene expression levels in the brain (P = 3.1 × 10−3, Pearson correlation; Fig. 2g). When this observation was separated by signature, with increased expression we observed increased signature A mutations (P = 5.0 × 10−5, Pearson correlation), but decreased signature C mutations (P = 6.5 × 10−3, Pearson correlation). These findings provide further support for the hypothesis that ageing-associated signature A and AD-associated signature C arise from different mechanisms. For signature A, events during transcription appear to have a role in generating mutations, whereas signature C correlates inversely with expression and therefore may be more effectively repaired during transcription, including by TC-NER35.

Gene Ontology (GO) analysis of loci mutated in AD and control neurons revealed that genes involved in neuronal function were enriched for sSNVs (Fig. 2h). When considered together with the expression–sSNV findings, AD neurons show an influence of transcriptional processes on mutation generation. Such a transcriptional influence can produce an asymmetric pattern of mutations on the paired DNA strands. We therefore distinguished the sSNV sites by template status, between transcribed template strands and untranscribed strands (Fig. 2i). We found a significant strand bias for C>A mutations on the transcribed strand, along with a modest strand bias for C>T and T>C, providing further evidence that errors in transcription-related mechanisms have a role in the generation of sSNVs in AD neurons. As one example, an unrepaired oxidized guanine nucleotide, 8-oxoG, on an untranscribed strand could become a G>T mutation, which would be classified as a C>A mutation on the transcribed strand. In addition to the apparent protective role of NER processes against somatic mutation, the involvement of NER in signature C mutations also presents a potential mechanism for the accumulation of mutations in non-cycling cells, as NER involves the removal of an approximately 29-bp sequence by an exonuclease, followed by the replication of those 29 bp from the remaining DNA strand36; this allows for replication errors during repair if the template strand is also damaged.
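The strand assignment described above can be sketched in a few lines. Each sSNV is first expressed with a pyrimidine (C or T) as the reference base; it is then labelled by whether that pyrimidine lies on the transcribed (template) strand or the untranscribed (coding) strand of the overlapping gene. This is a minimal illustration of a common convention in mutational signature analysis and is an assumption about the bookkeeping, not the exact procedure used here; the function name and the three example calls are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_by_strand(ref, alt, gene_strand):
    """Return (pyrimidine-based change, template status) for one SNV.

    ref, alt: bases as reported on the reference ('+') strand.
    gene_strand: '+' or '-', the strand on which the gene is annotated.
                 That strand is the coding (untranscribed) strand; its
                 complement is the template read by RNA polymerase.
    """
    if gene_strand not in "+-" or len(gene_strand) != 1:
        raise ValueError("gene_strand must be '+' or '-'")
    if ref in "CT":
        pyrimidine_strand = "+"
        change = f"{ref}>{alt}"
    else:
        # Purine reference: the pyrimidine of the pair sits on the '-' strand.
        pyrimidine_strand = "-"
        change = f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    status = "untranscribed" if pyrimidine_strand == gene_strand else "transcribed"
    return change, status


# Hypothetical calls. The first mirrors the example in the text: a G>T change
# on the untranscribed strand of a '+' gene is counted as C>A, transcribed.
example_calls = [("G", "T", "+"), ("C", "A", "+"), ("G", "T", "-")]
tally = Counter(classify_by_strand(*call) for call in example_calls)
```

Comparing the transcribed and untranscribed counts for each change, here held in `tally`, is what a strand bias test then evaluates.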

Potential consequences of somatic mutations in AD

Somatic mutation or single-stranded damage that alters amino acids can contribute to neuronal dysfunction or loss by many mechanisms, including direct impairment of transcription, alterations in protein stability or creation of neoantigens. In protein-coding genes, AD neurons show more nonsynonymous mutations than age-matched control neurons (Fig. 2j), which has the potential to impair dosage-sensitive genes, or to create neoantigen peptides that could elicit T lymphocyte activation, immune attack and consequent cellular damage. Observations of clonal CD8+ T cells in cerebrospinal fluid and brain tissue in AD37 suggest that such autoactivation could be relevant in AD. Moreover, as somatic alterations accumulate in a genome, the likelihood of two deleterious exonic alterations in the same gene, producing a knockout cell, increases exponentially. We modelled the rate of sSNV-caused knockout neurons (Fig. 2k), and found a substantial projected increase in AD over controls (P = 0.022, generalized estimating equation model). This model suggests that dysfunctional neurons would be markedly more abundant in AD, which may be compounded by the length of certain AD-relevant genes38; compromising neuronal function may therefore be one way in which sSNVs affect cellular physiology39. The pronounced effect of genomic damage, even in non-dividing cells, is underscored by the observation that multiple defects in DNA repair result in neuronal dysfunction and degeneration5,40.
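To illustrate why knockout cells become disproportionately more likely as sSNVs accumulate, the sketch below computes the chance that both alleles of one gene carry at least one damaging sSNV. It is a deliberately simplified toy model and not the generalized estimating equation model used for Fig. 2k. It assumes that damaging hits arrive independently (Poisson), that they are split evenly between the two alleles, and that a fixed fraction of all sSNVs damages the gene; that fraction (`GENE_FRACTION`) and the burdens in the loop are hypothetical.

```python
import math


def p_biallelic_hit(total_ssnvs, gene_fraction):
    """Probability that each allele of a gene carries >= 1 damaging sSNV.

    total_ssnvs: genome-wide sSNV burden of the neuron.
    gene_fraction: assumed share of sSNVs that damage this particular gene.
    """
    if total_ssnvs < 0 or gene_fraction < 0:
        raise ValueError("inputs must be non-negative")
    # Expected damaging hits per allele under an even split.
    per_allele_mean = total_ssnvs * gene_fraction / 2.0
    # Poisson probability of at least one hit on a single allele.
    p_allele_hit = 1.0 - math.exp(-per_allele_mean)
    # Both alleles must be hit, and they are treated as independent.
    return p_allele_hit ** 2


# Hypothetical per-gene target fraction, for illustration only.
GENE_FRACTION = 1e-6

for burden in (1000, 2000, 4000):
    print(burden, p_biallelic_hit(burden, GENE_FRACTION))
```

Under these assumptions the probability grows faster than linearly with burden, so a given excess of sSNVs raises the expected number of knockout genes by more than its proportional share.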

Interrogation of AD neuron genomes by PTA

The experiments discussed thus far, which used MDA to amplify the genomes of single neurons, used LiRA variant calling to counteract allele dropout23 and signature-based filtering of amplification artefacts (Extended Data Fig. 1), which are features of MDA-based methods. To corroborate our findings from MDA-amplified single neuron genomes, we applied a second single-cell amplification method that removes most or all amplification artefacts41,42 as an orthogonal approach. Primary template-directed amplification (PTA)41 achieves highly uniform genome amplification by using chain-terminating nucleotides to disfavour long amplification products that can be re-primed. PTA thus allows the identification of sSNVs in single human neurons while mitigating known single-cell artefacts that can be seen from MDA42, obviating the need for signature-based variant filtering. PTA-based scWGS of human neurons has confirmed that somatic mutations increase with age42. We performed PTA-based scWGS on a small sample of neurons from most brains profiled by MDA (29 neurons from 7 cases of AD and 40 neurons from 13 neurotypical control individuals; Table 1) and confirmed that AD neurons contain increased somatic alterations compared to controls (P = 3.9 × 10−4, linear mixed model; Fig. 3a). This effect remained after controlling for technical metrics (Methods, Extended Data Fig. 8c–f). The magnitude of the PTA-detected AD increase is somewhat lower than what was observed by MDA, which is likely to reflect in part residual amplification artefacts in MDA material. sSNVs detected by PTA show trinucleotide spectra (Extended Data Fig. 8a) and COSMIC signature contributions (Extended Data Fig. 8b) that are highly similar to those seen in multiplexed end-tagging amplification of complementary strands (META-CS), a recently reported duplex sequencing method that explicitly distinguishes double-stranded mutations and single-stranded DNA lesions25. 
PTA-identified mutational spectra closely cluster with META-CS-identified double-stranded mutations and are distinct from META-CS single-stranded lesions, which strongly suggests that PTA-detected sSNVs represent double-stranded somatic mutations.

Fig. 3: Profile of somatic mutations in single AD neurons by PTA.

Single-neuronal nuclei were isolated from control and AD prefrontal cortex and subjected to PTA whole-genome amplification for scWGS. a, sSNVs as a function of age in neurotypical control individuals (blue) and individuals with AD (red). Blue and red lines show linear mixed model trend lines for each group (control: P = 2.0 × 10−16, R2 = 0.90; AD: P = 6.57 × 10−7, R2 = 0.59). By PTA, AD contributes a significant excess of sSNVs (196 per genome) in neurons compared to the normal ageing trend line (P = 3.9 × 10−4, linear mixed model). b, c, PTA-called sSNVs by mutational signature in each individual neuron. sSNV contributions are shown as a function of age for signature A (b; AD versus control P = 0.04, linear mixed model) and signature C (c; AD versus control P = 5.3 × 10−3, linear mixed model). d, Transcriptional influence on somatic mutation in neurons profiled by PTA. Genes with higher expression levels show increased overall and signature A density and decreased signature C density. Data points represent mean sSNV density relative to expected density based on the mutation trinucleotide context, with black vertical lines showing s.e.m. Controls represent age-matched (over 50 years old) neurotypical neurons. Overall trend line is shown in black; 95% CI in grey; and separate AD and control trend lines in colours. R2 and P values are shown for Pearson correlation. e, sSNVs by DNA strand template status. sSNVs in transcribed regions exhibit a strand bias in the excess mutations in AD neurons. For each nucleotide change, the proportional contributions of the transcribed and the untranscribed strand are shown. The strand bias ratio data in PTA-amplified neuron data showed a similar trend to that seen in MDA-amplified neurons.

We also examined PTA-detected mutations by signature decomposition, which again confirmed that signature A mutations increase with age in a clock-like manner (Fig. 3b), with a marginally significant increase in signature A in AD neurons (P = 0.04, linear mixed model). The AD-associated increase in mutations is most pronounced for signature C (P = 5.3 × 10−3, linear mixed model; Fig. 3c). As with the increase in total mutations in AD neurons, the PTA mutational signature findings mirrored the trends seen in MDA-amplified neuron genomes. The residual PTA-detected mutations in AD neurons show a distinct trinucleotide spectrum (Extended Data Fig. 8a), with an excess of C>A and C>T mutations that is also seen in MDA-amplified neurons. When analysed for contributions of COSMIC cancer mutation signatures, the residual mutations in AD neurons show a distinct pattern from that of control neurons (Extended Data Fig. 8b), including many signatures seen with MDA-detected AD residual mutations. Among these are SBS8 as well as SBS30, which is associated with the DNA repair enzyme NTHL1 that is involved in oxidative lesion repair. The PTA-detected burden of sSNVs in transcribed regions correlated with levels of gene expression in the brain (P = 2.8 × 10−3, Pearson correlation; Fig. 3d), whereas signature A and C mutations showed similar patterns to those seen with MDA-detected sSNVs, pointing to specific effects of transcriptional activity on mutation occurrence. We also noted a C>A strand bias in PTA-amplified AD neurons (Fig. 3e), further implicating transcription-related events in the generation of sSNVs in AD neurons. Thus, both scWGS approaches identified similar patterns, and suggest that the pathogenic mutational mechanisms in AD include DNA oxidation, NER DNA repair and transcriptional activity.

Although several studies have confirmed that neurons accumulate sSNVs with age5,20,25, one recent study using a single-molecule technique called NanoSeq did not find greater genome-wide mutation rates in AD-affected brains compared to aged brains of neurotypical control individuals, and actually reported a small but significant decrease in somatic mutations in AD20. There are a few potential reasons for this discrepancy with our findings in single AD neurons. One possibility is that single-stranded lesions or variants contribute to our signal, although we have gone to some lengths to exclude this, including custom computational removal of known MDA artefacts and application of the PTA scWGS method. The NanoSeq study may also reflect analysis of cell populations that differ from the individual cells that we studied here. The NanoSeq analysis studied bulk DNA from 15,000 pooled cells sorted using NeuN without size gating20, but we observed that sorting by NeuN alone includes excitatory and inhibitory neurons, as well as some glial cells (Fig. 1b, c). Therefore, the NanoSeq study does not enrich for the excitatory pyramidal neurons that are selectively vulnerable to AD21,22, which is likely to obscure the modest but consistent difference that we find when pyramidal neurons are enriched. The bulk NanoSeq method on all NeuN-expressing cells would also be susceptible to differences in cell-type abundance, which could account for the slightly decreased mutation count that was observed. Thus, increased somatic mutation burden in the AD brain may be limited to precisely the neuron subtypes that are most affected by the disease, potentially sparing some cell types.

Discussion

Our analysis reveals that excitatory neurons in the brains of individuals with AD accumulate genomic damage, and likely permanent mutations, beyond the levels that occur as a result of ageing alone. The pattern of genomic SNV accumulation in AD neurons appears to be distinct from an accentuation of normal ageing, as suggested by (1) the abundance of signature C, which is present but limited in the brain of neurotypical control individuals; and (2) signature-specific transcriptional influences. These genomic changes may include a spectrum of manifestations, including single-stranded DNA lesions and double-stranded mutations. Notably, putative mutations identified by PTA-based scWGS were molecularly similar to bona fide double-stranded mutations identified by duplex sequencing, but dissimilar to single-stranded lesions. These correlations, combined with the evenness of PTA genome coverage, suggest that the AD-specific somatic alterations are predominantly double-stranded mutations. Future studies that are specifically designed to compare DNA lesions with permanent mutations may shed further light on the differential effects these related phenomena have in AD. Other types of somatic alterations, such as short insertions and deletions, structural variants and retrotransposition events, can also be explored in greater depth as technologies improve.

Beyond abundance, the specific patterns of somatic alterations in AD neurons provide clues as to their causes and potential effects in AD pathogenesis (Fig. 4), and identify potential therapeutic targets. Signature C is notable for the presence of C>A variants, associated with oxidative damage, which has been observed previously in AD4 and which we found to be increased in AD neurons. This suggests that sSNVs occur downstream of ROS during disease pathogenesis. Signature C has a notable similarity to COSMIC signature SBS8, which is associated with the transcription-coupled repair of damaged guanine10, strongly suggesting that it accumulates either through disease-related defects in NER, or, more likely, from an accelerated accumulation of oxidized nucleotides that overwhelms the repair pathway. Oxidized nucleotides reflect the presence of increased ROS, which have previously been reported in the brain of individuals with AD, and which can be generated by a variety of processes—including inflammation and mitochondrial dysfunction, which have also been reported in AD43. Our data show how these oxidative lesions may impair genomic function by interacting with mutations that occur as a part of ageing.

Fig. 4: Model of the role of somatic mutations in AD pathogenesis.

Amyloid-β (Aβ) oligomers initiate a cascade of events, including the conversion of tau to neurofibrillary tangles and the accumulation of ROS. After DNA damage by ROS or other mutagens, somatic mutations develop with characteristic features of signature C. NER affects the strand and gene distribution of somatic mutations, and rare base misincorporation during repair may also have a role in the progression from DNA damage to mutation. These somatic mutations stand to increase cellular vulnerability by mechanisms including gene inactivation and neoantigen presentation.

A major question that remains concerns how the buildup of AD-related genomic damage relates to the well-established accumulation of amyloid-β and tau proteins1,2. Indeed, both of these AD-associated misfolded proteins can induce ROS44,45, with the tau effect being mediated by mitochondrial dysfunction45. Furthermore, tau can trigger double-stranded DNA breaks46, thus further compounding the effects of sSNVs and potentially inducing more47. Many aspects of the oxidative stress induced by AD proteins are not clear, but this process may also include the amyloid-β-stimulated activation of microglia, which can produce ROS directly and can also indirectly initiate the generation of ROS through the release of pro-inflammatory cytokines48. Binding of amyloid-β to redox-active iron may also add oxidative stress49. It will be important to identify how protein misfolding and other known events in AD relate to the accumulation of somatic mutations in the pathogenesis of disease.

Methods

Data reporting

No statistical methods were used to predetermine sample size. The experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment.

Human tissue samples and selection of cases of AD

Post-mortem frozen human tissues were obtained from the Massachusetts Alzheimer’s Disease Research Center (MADRC) at Massachusetts General Hospital and the NIH Neurobiobank at the University of Maryland Brain and Tissue Bank (UMBTB). Tissue collection and distribution for research and publication was conducted according to protocols approved by the Partners Human Research Committee (for MADRC: 1999P009556/MGH, expedited waiver category 5) and the University of Maryland Institutional Review Board (for UMBTB: 00042077), and after provision of written authorization and informed consent. Research on these de-identified specimens and data was performed at Boston Children’s Hospital with approval from the Committee on Clinical Investigation (S07-02-0087 with waiver of authorization, exempt category 4). Many neurotypical control tissues and datasets were obtained as part of a previous study5. Neurotypical control cases had no clinical history of dementia or other neurological disease. AD cases had a clinical history of dementia consistent with AD, pathologically confirmed AD pathological change (Braak stage V–VI) and no other notable neurodegenerative pathology. Age-matched cohorts included individuals who were over 50 years old (Table 1).

Isolation of individual pyramidal neurons for single-cell studies

The isolation of single neuronal nuclei using fluorescence-activated nuclear sorting (FANS) for the neuronal nuclear transcription factor NeuN and whole-genome amplification (WGA) using MDA51 have been described previously5,52. In brief, nuclei were prepared from unfixed frozen human brain tissue, previously stored at −80 °C, in a dounce homogenizer using a chilled tissue lysis buffer (10 mM Tris-HCl, 0.32 M sucrose, 3 mM Mg(OAc)2, 5 mM CaCl2, 0.1 mM EDTA, 1 mM DTT, 0.1% Triton X-100, pH 8) on ice. Tissue lysates were layered on top of a sucrose cushion buffer (1.8 M sucrose, 3 mM Mg(OAc)2, 10 mM Tris-HCl, 1 mM DTT, pH 8) and ultra-centrifuged for 1 h at 30,000g. Nuclear pellets were resuspended in ice-cold PBS supplemented with 3 mM MgCl2, filtered, then stained with anti-NeuN antibody directly conjugated to Alexa Fluor 488 (AF488) (Millipore MAB377X, clone A60, 1:1,250). NeuN staining produced a bimodal signal distribution (Fig. 1b, bottom), distinguishing NeuN+ and NeuN− nuclei. Large neuronal nuclei, representing excitatory pyramidal neurons, were then identified by flow cytometry (using software BD FACSDiva v.8.0.2) by targeting the nuclei with the highest NeuN signal among the NeuN+ neuronal fraction, while also gating for the population with the highest forward scatter area (FSC-A) signal, designated by the black box in Fig. 1b. This high-FSC-A, high-NeuN population is intended to represent large neurons, comprising 2–5% of the total population of nuclei in each sample.

The composition of the targeted population of large neurons was assessed using single-nucleus RNA transcriptomic sequencing (snRNA-seq), along with two control populations: all cells and all NeuN+ cells (each shown with respective gating boxes in Fig. 1b). snRNA-seq of these three populations of cellular nuclei was performed on a representative tissue sample (control individual 1465, prefrontal cortex). Nuclei were isolated as described above, with the following modifications: 0.2 U μl−1 Protector RNAse inhibitor (Roche RNAINH-RO) and 0.2 U μl−1 SuPERase-IN RNAse inhibitor (Invitrogen) were both added to the tissue lysis buffer and to the immunostaining buffer, and MgCl2 was omitted from the immunostaining buffer. For each of the 3 populations, 16,000 nuclei were sorted into one well of a 96-well plate, then subjected to snRNA-seq using the 10X Genomics Next GEM Single Cell 3′ GEM Kit v3.1 and Chromium Controller. From these three populations, three libraries were prepared, each with dual indexes using the 10X Genomics Dual Index Plate. Each library was then sequenced on Illumina NovaSeq S4. The raw snRNA-seq data of three 10X libraries were analysed separately and then aggregated by Cell Ranger (v.6.0.0)53, followed by variance normalization, t-SNE clustering and visualization processed by Pagoda2 (v.0.1.0)54. Clusters with 50 or more cells were manually annotated as different neuronal and glial subtypes on the basis of the expression of marker genes using a similar protocol to that described in a previous study50. These snRNA-seq data (Fig. 1c) enabled the assessment of various sorting populations shown in Fig. 1b. The full population of cells (DAPI+) contained a mixture of excitatory neurons, inhibitory neurons and glia. The overall NeuN+ population was highly enriched for neurons, but contained many inhibitory neurons and some glia. The population of cells targeted in this study, large NeuN+ nuclei, was highly enriched in pyramidal neurons, consisting of 100% neurons, of which 99.3% were excitatory neurons (Fig. 1c), with minimal inhibitory neurons and glia.

scWGS of pyramidal neurons using MDA

Single nuclei, prepared as described above, were sorted one nucleus per well into 96-well plates, with each well containing 2.8 μl alkaline lysis buffer (200 mM KOH, 5 mM EDTA, 40 mM DTT) pre-chilled on ice. Nuclei were lysed on ice for 15–30 min, then neutralized on ice in 1.4 μl neutralization buffer (400 mM HCl, 600 mM Tris-HCl, pH 7.5). These cold temperatures appear to be important to limit artefacts55. MDA was then performed in a 20 μl total reaction volume by addition of an MDA master mix (12.18 μl QIAGEN REPLI-g reaction buffer, 2.675 μl H2O, 0.105 μl DTT, 0.84 μl REPLI-g Phi29 polymerase enzyme). MDA was performed at 30 °C for 2 h. This protocol was applied to all new MDA samples in this study, and was confirmed to yield results equivalent to those of a prior protocol using Phi29 polymerase from a different distributor (repliPHI, Epicentre).
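As a quick arithmetic check, the component volumes listed above can be summed to confirm the stated 20 μl total reaction volume. The following is a minimal sketch only; the dictionary keys are labels chosen here for illustration and are not identifiers from the kit or the protocol.

```python
# Component volumes (in microlitres) as listed in the MDA protocol above.
# The keys are illustrative labels, not names taken from the kit.
MDA_VOLUMES_UL = {
    "alkaline_lysis_buffer": 2.8,
    "neutralization_buffer": 1.4,
    "repli_g_reaction_buffer": 12.18,
    "water": 2.675,
    "dtt": 0.105,
    "phi29_polymerase": 0.84,
}


def total_volume_ul(volumes):
    """Sum component volumes, rounding to 3 decimals to absorb float error."""
    return round(sum(volumes.values()), 3)


mda_total_ul = total_volume_ul(MDA_VOLUMES_UL)
```

The lysis and neutralization volumes are included in the sum because the master mix is added directly to the lysed, neutralized nucleus in the same well.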

Samples were subjected to quality control by DNA quantification (PicoGreen, 3 μg yield required) and multiplex PCR for four random genomic loci. For an additional quality control step, we performed low coverage (0.5×) WGS, and cells with sufficiently even genome coverage (median absolute pairwise difference, MAPD; and coefficient of variation, CoV) were processed for deep sequencing. For germline reference, bulk DNA was purified using phenol:chloroform:isoamyl alcohol extraction and isopropanol precipitation, without RNAse A treatment.

Amplified single-neuron genomes were prepared for sequencing by DNA shearing and libraries generated by Psomagen (Macrogen) and Novogene using Illumina Tru-Seq kits and Illumina HiSeq X10 paired end sequencing (150 bp × 2) (Supplementary Table 1), as described previously5.

scWGS of pyramidal neurons using PTA

Single neurons, prepared as described above, were sorted one nucleus per well into 96-well plates and their genomes were amplified by PTA41,42, a method that pairs an isothermal DNA polymerase with a termination base to induce quasi-linear amplification. PTA reactions were performed using the ResolveDNA Whole Genome Amplification Kit (previously known as SkrybAmp EA WGA Kit) (BioSkryb Genomics). Nuclei were sorted into 3 μl Cell Buffer pre-chilled on ice. Nuclei were then lysed by addition of 3 μl MS Mix, with mixing at 1,400 rpm performed after each step. Lysed nuclei were then neutralized with 3 μl SN1 buffer. Three microlitres of SDX reagent was then added, followed by a 10-min incubation at room temperature. Eight microlitres of reaction mix (containing enzyme) was then added, for a total reaction volume of 20 μl. Amplification was carried out for 10 h at 30 °C, followed by enzyme inactivation at 65 °C for 3 min. Amplified DNA was then cleaned up using AMPure, and the yield was determined using PicoGreen binding (Quant-iT dsDNA Assay Kit, Thermo Fisher Scientific). Samples were then subjected to quality control by multiplex PCR for four random genomic loci as previously described5, and also by Bioanalyzer for DNA fragment size distribution. Amplified genomes showing positive amplification for all four multiplex PCR loci were prepared for Illumina sequencing. In contrast to MDA, a low-coverage WGS screening step was not performed.

Libraries were prepared following a modified KAPA HyperPlus Library Preparation protocol described in the ResolveDNA EA Whole Genome Amplification protocol. In brief, end repair and A-tailing were performed for 500 ng amplified DNA input. Adapter ligation was then performed using the SeqCap Adapter Kit (Roche, 07141548001). Ligated DNA was cleaned up using AMPure and amplified through an on-bead PCR amplification. Amplified libraries were selected for a size of 300–600 bp using AMPure. Libraries were subjected to quality control using PicoGreen and TapeStation HS DS100 Screen Tape (Agilent PN 5067-5584) before sequencing. Single-cell genome libraries were sequenced on the Illumina NovaSeq platform (150 bp × 2) at 30× coverage (Supplementary Table 1). Data from PTA-amplified neuronal genomes in AD were analysed alongside data from control neurons that are reported elsewhere42.

Read-mapping and generation of BAM files

Reads generated from WGS were mapped onto the human reference genome (GRCh37 with decoy) by BWA (v.0.7.15)56 with default parameters. Duplicate reads were marked by MarkDuplicates of Picard tools (v.2.8) and post-processed with local realignment around indels and base quality score recalibration using Genome Analysis Toolkit (GATK) (v.3.5)57.

Calling of sSNVs from scWGS data

We used phasing-based linked read analysis (LiRA, v.2018Feb)23 to identify sSNVs against individual-specific bulk germline reference genomes, as described previously5. The initial somatic and germline variants were called using GATK’s HaplotypeCaller and germline variants were further phased by Shapeit 2 (v.904). sSNVs were called by LiRA and distinguished from technical artefacts when showing strong evidence for only two haplotypes with paired-end, read-backed linkage between the sSNV candidate and the adjacent germline heterozygous site. The autosomal genome-wide burden of sSNVs was then calculated by accounting for the proportion of phaseable sites and estimated false positive rate. We should emphasize that the raw LiRA calls are an intermediate step that requires scaling by a power ratio to calculate genome-wide somatic mutation rates that are comparable between cells (for example from MDA data, see Extended Data Fig. 1b). Of note, LiRA is only designed to call phased somatic variants in diploid genome regions, so we only considered sSNVs in autosomes for subsequent analyses to avoid potential detection bias in sex chromosomes between male and female individuals.

Because LiRA calling requires linked heterozygous germline sites to achieve optimal specificity and a low false positive rate, its detection sensitivity may be limited in regions lacking phaseable germline variants. Therefore, to more comprehensively assess sSNVs in known AD risk genes (APP, PSEN1, PSEN2 and APOE) and the tau-encoding gene MAPT, we considered both the LiRA-called variants and the larger group of GATK calls that includes non-phaseable parts of these genes. In both LiRA-called variants and GATK calls, we identified no known pathogenic sSNVs in any of these AD-related genes. The question of clonal somatic mutations in these and other AD risk genes has also been examined in other studies by bulk gene sequencing19,58,59.

Given the more even genome coverage and potentially fewer artefacts that are produced by PTA42, we used Single Cell ANalysis of SNVs (SCAN-SNV, v.2019Oct)60, which does not require phasing information from adjacent germline variants and thus has more detection power in non-phaseable regions, to identify specific genomic sites of sSNVs for mutational signature and other downstream analyses.

Determining the evenness of single-cell genome amplification

The evenness of single-cell genome amplification was quantified using two different methods (Supplementary Table 4). First, the MAPD metric was calculated as reported previously61, which is the median value across all absolute differences between log2-transformed copy number ratio of neighbouring genome bins, and a higher MAPD score represents greater unevenness of amplification. Binning, GC normalization, segmentation and copy number estimation were performed to obtain copy number ratio per bin following a previous single-cell copy number analysis protocol62, and MAPD was then calculated by taking a median of absolute difference between neighbouring bins. Second, considering that MAPD cannot reflect the variance of the copy number ratio distribution within each neuron, the CoV was also calculated by normalizing the standard deviation of absolute difference between neighbouring bins by their mean. We also calculated a ‘power ratio’ metric, which is defined as the ratio between the LiRA-estimated genome-wide sSNV burden and the LiRA-called phaseable sSNV count, reflecting the proportion of the genome that has been adequately amplified for each single cell. Using mixed-effects modelling, we measured the effect of these three metrics of genome evenness on sSNV burden in well-characterized neurotypical PFC neurons. We then normalized the mutation burden in each cell and estimated the age and disease effects on sSNV burden, as described in the section ‘Mixed-effects modelling of somatic SNV burden’.
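The three evenness metrics described above can be sketched as follows. This is an illustrative sketch, not the pipeline used in the study: it assumes that per-bin copy number ratios have already been obtained after binning, GC normalization and segmentation, that the CoV is computed on the same log2-based absolute differences as MAPD, and that the population standard deviation is used. All function names are hypothetical.

```python
import math
import statistics


def neighbour_abs_diffs(copy_ratios):
    """Absolute differences of log2 copy number ratios between adjacent bins."""
    logs = [math.log2(r) for r in copy_ratios]
    return [abs(b - a) for a, b in zip(logs, logs[1:])]


def mapd(copy_ratios):
    """Median absolute pairwise difference; higher means less even coverage."""
    return statistics.median(neighbour_abs_diffs(copy_ratios))


def cov(copy_ratios):
    """Standard deviation of the neighbour differences divided by their mean.

    Assumption: population standard deviation; undefined if all bins are equal.
    """
    diffs = neighbour_abs_diffs(copy_ratios)
    return statistics.pstdev(diffs) / statistics.mean(diffs)


def power_ratio(estimated_genome_wide_burden, phaseable_ssnv_count):
    """Ratio of the estimated genome-wide sSNV burden to the phaseable call count."""
    return estimated_genome_wide_burden / phaseable_ssnv_count


# Toy copy number ratios for five neighbouring bins (made-up values).
toy_ratios = [1.0, 2.0, 1.0, 4.0, 1.0]
toy_mapd = mapd(toy_ratios)
toy_cov = cov(toy_ratios)
```

In this toy example the log2 ratios are 0, 1, 0, 2, 0, so the neighbour differences are 1, 1, 2, 2 and the MAPD is 1.5.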

Mutational signature analysis

To discover mutational signatures of sSNVs, we calculated the frequency of mutations in the 96-trinucleotide contexts for all control and AD neurons from the identified single-neuron sSNVs (synthesized in Extended Data Fig. 5a for MDA, and in Extended Data Fig. 8a for PTA). Mutation signatures in MDA-amplified neurons were detected by fitting a non-negative matrix factorization (NMF)-based mutational signature framework63 using MutationalPatterns (v.1.8.0)64 (Extended Data Fig. 5b). As we increased the number of signatures, we estimated the signature stability and reconstruction error of each signature and identified four signatures (N1, N2, N3 and N4) (Extended Data Fig. 5c) that maximize the number of signatures while minimizing error (Extended Data Fig. 5b). We also used a second signature derivation method, SignatureAnalyzer (v.1.1)10,65, which can infer the optimal number of signatures from data by considering both model complexity and fitting accuracy. Under default parameters with half-normal distribution for priors and reducing effect of ultramutated samples, SignatureAnalyzer produced four signatures (W1–W4) with the greatest likelihood, which are nearly identical to signatures N1–N4 that were identified by MutationalPatterns (Extended Data Fig. 5c).

We observed a marked similarity between the de novo single-neuron signatures and previously published single-neuron signatures5 (Extended Data Fig. 5c), particularly when taking into account recently identified signatures of potential single-cell artefacts24. Each newly derived signature closely resembled a previously derived one: N4 with neuron signature A, N2 with neuron signature C, N1 with neuron signature B and potential artefact signature SBS scF, and N3 with SBS scE. To understand the underlying mechanisms for the identified mutational signatures, we further performed NMF analysis to decompose our signatures into the reported COSMIC v3 signatures (https://cancer.sanger.ac.uk/cosmic/signatures/; Extended Data Fig. 6a). We also performed NMF analysis to fit the COSMIC signatures to our composite disease and control single-neuron mutational profiles, which is shown in Extended Data Fig. 6b.

Given the near identity between the de novo and prior neuron signatures, we used the prior signatures for our subsequent analyses. On the basis of the evidence that SBS scE and SBS scF (the latter highly similar to signature B) represent potential single-cell artefacts24, we excluded the contributions from these signatures in our assessment of genome-wide sSNV burden for each single neuron.

Similarly, we used MutationalPatterns to determine mutational signature contributions in PTA-amplified neurons using the signatures we identified in MDA-amplified neurons. For PTA-amplified single-neuron genomes, we did not identify significant contributions from potential artefact signatures SBS scE and SBS scF, which prompted the filtering steps for data from MDA-amplified genomes. Therefore, for PTA-amplified genomes, we report unfiltered variant calling data.

Filtering of LiRA-called somatic SNVs from MDA-amplified genomes of single neurons

Previous studies and our observations have suggested additional measures beyond LiRA to further minimize experimental artefacts that may occur during MDA amplification of single-cell genomes24. Beginning with total LiRA-called sSNVs (Extended Data Fig. 1a), we undertook a series of analyses on our human neuron MDA scWGS data, examining the influence of uneven genome amplification and the value of identification of specific mutational signatures proposed as potential artefacts of single-cell genome amplification24. We found that cells with highly uneven genome amplification (MAPD > 2.0) show increased LiRA-called sSNV counts (Extended Data Fig. 1c), including sSNVs attributable to the potential artefact signature SBS scE, largely comprising GC>GT changes (Extended Data Fig. 1d). We also observed that a small subset of neurons, only seen in AD, show an ‘ultramutated’ profile (more than 20,000 LiRA-called sSNVs; Extended Data Fig. 1a), which is dominated by SBS scE (Extended Data Fig. 1d), suggesting that these amplified genomes may show LiRA sSNV calls that do not represent biological double-stranded fixed somatic mutations. The observed variants in these outlier cells may represent experimental artefacts, including false calls due to errors occurring early in genome amplification. Alternatively, the observed scE variants may also represent non-mutation biological events, such as unrepaired single-strand damaged nucleotides, which could be misread as sSNVs owing to strand dropout during genome amplification (Extended Data Fig. 1f). Although examination of the potential biological component of this phenomenon may provide important insights, we developed a computational filtering pipeline to generate a set of filtered sSNV calls, focusing our analysis on bona fide somatic mutations (Extended Data Fig. 1g).

Mixed-effects modelling of somatic SNV burden

To evaluate the relationships between somatic mutation and factors including age and disease status, we performed linear mixed-effects regression modelling using the lme4 (v.1.1-23) R package66, in a similar manner to our previous study5. Both genome-wide sSNV burden and signature-specific sSNV burden were considered as continuous outcomes in modelling. Disease status and other covariates of interest (for example, age and measurement of amplification evenness) were modelled as fixed effects, and donor–tissue groups were modelled as random effects, because neurons from a donor and each tissue type may be correlated owing to shared biological environment. Linear mixed-effects models were fitted using the maximum likelihood method, and P values from a t-test with the Satterthwaite approximation were calculated for each fixed effect as implemented in the lmerTest (v.3.1-2) R package67. Of note, we also used the marginal generalized least-squares method to fit the mixed-effects model, using the nlme (v.3.1-137) R package, which produced substantially similar results.

To test the age effect on sSNV burden in PFC and hippocampus from neurotypical individuals, we fitted the model \({y}_{{ijk}}=\left(\beta +{\gamma }_{j}\right)\times {\rho }_{i}+\mu +{\theta }_{{ij}}+{\varepsilon }_{{ijk}}\), where yijk is the sSNV burden in neuron k from brain region j of donor i, β is the fixed effect of age, γj is the fixed effect of brain region j on age (that is, the interaction term of age and brain region), ρi is the age of donor i, μ is the number of sSNVs at birth, θij is the random effect of the donor–tissue pair following a normal distribution with mean 0 and variance τ, and εijk is the measurement error of each neuron also following a normal distribution with mean 0 and variance σ (Fig. 1d–f). To control for the potential confounding factor of genome amplification evenness, we further introduced another covariate, δijk, which represents the neuron-specific measurement of amplification evenness (for example, MAPD, CoV and power ratio) into the previous model, and re-estimated the age effect by subtracting the neuron-specific contribution of the amplification unevenness coefficient from yijk (Extended Data Fig. 3a–d). We found that PFC and hippocampus show no significant difference in the age effect before and after controlling for amplification evenness (all P > 0.25); therefore, we did not consider the brain region covariate in downstream modelling. In addition to the genome-wide sSNV burden, we also analysed signature-specific sSNVs with similar models (Fig. 1g).

To test the difference of sSNV burden between AD and control neurons in an age-controlled manner, we fitted the model \({y}_{{ijk}}=\beta \times {\rho }_{i}+{\alpha }_{i}+\mu +{\theta }_{{ij}}+{\varepsilon }_{{ijk}}\), where αi is the fixed-effect of disease status (AD versus control), whereas yijk, β, ρi, μ, θij and εijk are defined as previously (Fig. 1h). We further adjusted the sSNV burden by considering the contribution of amplification evenness δijk as we estimated above, and the difference of sSNV burden between AD and control neurons remained significant in both MDA- and PTA-amplified neurons (Extended Data Figs. 3e–h, 8c–f).

To exclude the possibility that the observed sSNV burden increase in AD can be driven by systemic differences in sample or sequencing quality metrics, we further introduced ωijk into the linear mixed-effects model: \({y}_{{ijk}}=\beta \times {\rho }_{i}+{\alpha }_{i}+\mu +{\theta }_{{ij}}+{\varepsilon }_{{ijk}}+{\omega }_{{ijk}}\), where ωijk denotes one of the potential confounding factors including sex, post-mortem interval, DNA quality (DIN), sample storage time, sequencing depth, library insert size, proportion of read bases with base quality at least 20, and number of heterozygous germline SNVs (an indicator of genomic size of phaseable region). We confirmed that, in both MDA- and PTA-amplified neurons, the increased sSNV burden in AD remained significant after controlling for each (all P < 0.01). For Fig. 1j, k, we also calculated AD-attributable excess somatic mutations as the residual value for each single neuron after subtracting the age effect (\(\beta \times {\rho }_{i}+\mu \)) estimated from neurotypical control neurons in prefrontal cortex.
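The calculation of AD-attributable excess mutations described above amounts to subtracting the age-predicted control burden from each neuron's observed burden. A minimal sketch follows; the function names are hypothetical, and the numerical values in the example are invented for illustration and are not estimates from this study.

```python
def expected_control_burden(age_years, beta, mu):
    """Age-predicted sSNV burden from the control model: beta * age + mu."""
    return beta * age_years + mu


def ad_excess_burden(observed_burden, age_years, beta, mu):
    """Residual sSNV burden after removing the age effect fitted in controls."""
    return observed_burden - expected_control_burden(age_years, beta, mu)


# Made-up example: a neuron with 2,500 sSNVs from an 80-year-old donor,
# with placeholder coefficients beta = 20 sSNVs per year and mu = 100 sSNVs.
example_excess = ad_excess_burden(2500.0, 80.0, beta=20.0, mu=100.0)
```

Here β and μ would come from the mixed-effects fit on neurotypical PFC neurons, so the residual retains both the disease effect and the donor-level and neuron-level noise terms.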

To test whether sSNV burden is associated with ApoE genotype in patients with AD, we fit the model \({y{\prime} }_{{ijk}}={\omega }_{i}+{\theta }_{{ij}}+{\varepsilon }_{{ijk}}\), where \({y{\prime} }_{{ijk}}\) is the age-corrected sSNV burden (\({y}_{{ijk}}-\beta \times {\rho }_{i}\)) for each neuron, and \({\omega }_{i}\) is the ApoE genotype of risk allele ε4 under dominant, recessive and additive genetic models. No significant association was observed in any of the three genetic models in MDA- or PTA-amplified neurons (all P > 0.21).

Gene expression analysis

To test whether somatic mutation is associated with gene expression level, we extracted the brain PFC expression data from GTEx68. The per-gene expression value was normalized for each individual after controlling for age and gender using DESeq2 (v.1.24.0)69 and averaged across all the individuals. Genes were then assigned to 10 deciles on the basis of their PFC expression levels, and all sSNV density was calculated for each decile of genes after normalizing by per-neuron sSNV detection power ratio and total gene length. To control for potential bias due to trinucleotide context and the distribution of phaseable regions (areas with sufficient sequencing coverage and an adjacent heterozygous germline SNP), we permuted the per-neuron sSNV list for 1,000 rounds by randomly shuffling the sSNVs within the phaseable regions while keeping the trinucleotide context distribution the same. We calculated the mean and standard deviation of the per-decile density in the permuted dataset, and then measured the difference between observed and expected sSNV density for each decile of AD or age-matched control group. This analysis included all brain regions in each experiment (PFC and hippocampus for MDA-based scWGS; PFC for PTA-based scWGS).

We further performed an NMF-based mutational signature analysis for sSNVs located in each decile of genes, to estimate the relative contributions of signature A, signature C, SBS scE and SBS scF for each decile. The sSNV density for each signature was calculated by multiplying the overall sSNV density by each signature contribution.
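The decile assignment and the per-signature density calculation can be illustrated with a short sketch. It assumes that genes are ranked by mean expression and split into ten equally sized groups, and that signature contributions are rescaled to sum to 1 before multiplication; both are assumptions made here, and the function names are hypothetical.

```python
def assign_expression_deciles(expression_by_gene):
    """Map each gene to a decile (1 = lowest expression, 10 = highest)."""
    ranked = sorted(expression_by_gene, key=expression_by_gene.get)
    n_genes = len(ranked)
    return {gene: (rank * 10) // n_genes + 1 for rank, gene in enumerate(ranked)}


def signature_specific_density(overall_density, contributions):
    """Multiply the overall sSNV density by each relative signature contribution."""
    total = sum(contributions.values())
    return {sig: overall_density * value / total for sig, value in contributions.items()}


# Toy data: 20 hypothetical genes with increasing expression values.
toy_expression = {f"gene{i}": float(i) for i in range(20)}
toy_deciles = assign_expression_deciles(toy_expression)

# Made-up relative contributions for one decile.
toy_density = signature_specific_density(
    10.0, {"A": 0.5, "C": 0.3, "scE": 0.1, "scF": 0.1}
)
```

Because the contributions are rescaled, the per-signature densities always sum to the overall density for the decile.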

Functional enrichment analysis

Analysis for functional enrichment of GO terms was performed using GOseq (v.1.34.1)70. For each RefSeq gene, we assigned a binary value ‘0’ or ‘1’ according to whether any sSNVs are located in the corresponding gene. Of note, this analysis is based on the LiRA output of sSNVs (signature-based filtering cannot be applied to individual genes or variants), and therefore this list may contain a small proportion of artefactual sSNVs. A probability weighting function in GOseq was applied to control for potential gene length bias. The Wallenius approximation method was used to test the enrichment of sSNVs, and the false discovery rate (FDR) method was further applied for the correction of multiple hypothesis testing. Genes without any GO annotation were ignored when calculating the total gene count. GO terms with fewer than 10 hits were excluded to avoid ascertainment bias. Very large GO terms with more than 1,000 genes were also ignored. All the GO terms with P < 0.01 in either AD or control neurons are listed in Supplementary Table 6.

Strand bias analysis

Mutations in transcribed regions of the genome may show a different density between transcribed and untranscribed strands (so-called strand bias)71,72, resulting from asymmetric mutagenesis and/or repair activity between strands. The transcriptional strands of genic sSNVs were assigned on the basis of the UCSC TxDb annotations by MutationalPatterns64. Mutated bases (‘C’ or ‘T’) on the same strand as the gene direction were categorized as ‘untranscribed’ and on the opposite strand as ‘transcribed’. Strand bias analysis was performed on the set of mutations identified in PFC and hippocampal neurons together, on the net increase (residual) of mutations in AD neurons over control neurons. Statistical significance was determined by the Poisson test.
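The strand assignment rule above can be expressed compactly. The sketch below assumes that each variant is reported with its reference base on the plus strand of the reference genome (as in a VCF file), so that a purine reference base implies the mutated pyrimidine lies on the minus strand. The function name is hypothetical, and this is an illustration of the rule rather than the MutationalPatterns implementation.

```python
PYRIMIDINES = {"C", "T"}
PURINES = {"G", "A"}


def transcription_strand(ref_base, gene_strand):
    """Classify a genic sSNV as 'transcribed' or 'untranscribed'.

    ref_base: reference base on the plus strand of the reference genome.
    gene_strand: '+' or '-', the direction of the overlapping gene.
    """
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    base = ref_base.upper()
    if base in PYRIMIDINES:
        mutated_base_strand = "+"   # the C or T is on the plus strand
    elif base in PURINES:
        mutated_base_strand = "-"   # the complementary C or T is on the minus strand
    else:
        raise ValueError("ref_base must be one of A, C, G, T")
    # Same strand as the gene direction means the untranscribed (coding) strand.
    return "untranscribed" if mutated_base_strand == gene_strand else "transcribed"
```

For example, a G>T call in a plus-strand gene corresponds to a C>A change on the minus strand and is therefore counted as transcribed.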

Location of sSNVs relative to genomic features

Annotations from ANNOVAR73 were used to identify sSNVs falling in the following positions: intergenic, upstream (within 1 kb region upstream of transcription start site), 5′ UTR, exonic (coding sequence, not including untranslated regions), 3′ UTR, downstream (within 1 kb region downstream of transcription end site), splicing (within intronic 2 bp of a splicing junction), intronic. The functional interpretation was classified using four categories of SNV annotation: synonymous (SNV that does not cause an amino acid change), nonsynonymous (SNV that causes an amino acid change, excluding stop-gain and stop-loss SNVs), stop-loss (nonsynonymous SNV that eliminates a stop codon), and stop-gain (nonsynonymous SNV that creates a stop codon). For exonic and UTR sSNVs, we further grouped them into 10 deciles according to their position relative to the transcript length. Similar to the gene expression analysis, we used 1,000 rounds of permutation within phaseable regions, controlling for trinucleotide context distribution, and then calculated the normalized difference (D) between observed (Nobs) and expected (Nexp) sSNV counts as below:

$$D=\frac{{N}_{{\rm{obs}}}-{N}_{{\rm{exp}}}}{{N}_{{\rm{exp}}}}$$
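The normalized difference can be computed directly once an expected count is available. In the sketch below, the expected count is taken as the mean across permutation rounds, which is an assumption made here for illustration; the function names and the toy counts are hypothetical.

```python
import statistics


def expected_count(permuted_counts):
    """Expected sSNV count, taken here as the mean over permutation rounds."""
    return statistics.mean(permuted_counts)


def normalized_difference(n_obs, n_exp):
    """D = (N_obs - N_exp) / N_exp; undefined when the expected count is zero."""
    if n_exp == 0:
        raise ValueError("expected count must be non-zero")
    return (n_obs - n_exp) / n_exp


# Toy example: three permutation rounds instead of 1,000 (made-up counts).
toy_expected = expected_count([90, 100, 110])
toy_d = normalized_difference(120, toy_expected)
```

A positive D indicates more sSNVs in the feature than expected under permutation, and a negative D indicates depletion.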

Modelling the accumulation of gene knockouts in neurons

Many specific heterozygous mutations could damage neuronal function39. Biallelic, exonic, deleterious ‘gene knockout’ (KO) mutations in essential genes would be especially damaging, such that there may be a threshold for the accumulation of such KO mutations above which neuronal function would deteriorate. On the basis of the number of sSNVs we identified in this report, we estimated the accumulation of gene KOs in cortical neurons, using a method described previously5. In brief, we estimated the probability of a mutation causing a gene knockout in a cell. In a diploid genome this corresponds to calculating the probability that two or more damaging mutations fall on the same gene, given the number of damaging mutations observed in a sample. This probabilistic problem can be modelled by an approximation of the birthday problem:

$$\begin{array}{c}\Pr ({\rm{KO}}|n)=1-{{\rm{e}}}^{-\frac{{n}^{2}}{{\rm{no.\; of\; genes}}}},\;{\rm{where}}\\ n={\rm{no.\; of\; sSNVs}}\times \frac{{\rm{total\; deleterious\; variants}}}{{\rm{total\; variants}}}\times 0.5,\end{array}$$

where n is the expected number of deleterious mutations for a given neuron. The approximation used here is different from the one published previously5 to allow for more robust approximation when 0 < n < 1. This model was further expanded to include information about genes that are intolerant to heterozygous mutations, resulting in haploinsufficiency and functional knockout. This is captured by the probability of loss-of-function intolerance (pLI) metric, with genes with a high pLI score (pLI ≥ 0.90) being less tolerant74. ExAC reported that 17% of all genes have such high pLI scores. We then used this information for the final model, written as follows:

$$n=\text{number of deleterious mutations}$$
$$d_{i}=\{\text{event that gene } i \text{ has at least one mutation}\}$$
$$\pi_{i}=\{\text{event that gene } i \text{ has a high pLI score}\}$$
$$D=\{\text{probability of a gene having a deleterious mutation}\}$$
$$\Pr\left(\mathrm{KO}\mid \pi ,D,n\right)=\pi \times \left(1-{\left(1-D\right)}^{n}\right)+(1-\pi )\left(1-\mathrm{e}^{-nD}\right)$$
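
As a minimal sketch of how the two formulas above can be evaluated, the following Python functions transcribe them term by term. The function and argument names are illustrative and not taken from the original analysis code, and the numerical inputs in the usage lines (gene count, sSNV count, deleterious fraction and D) are hypothetical placeholders; only the value π = 0.17 comes from the text.

```python
import math


def expected_deleterious(n_ssnvs: float, deleterious_fraction: float) -> float:
    """n = no. of sSNVs x (total deleterious variants / total variants) x 0.5."""
    return n_ssnvs * deleterious_fraction * 0.5


def ko_probability_birthday(n: float, n_genes: int) -> float:
    """Birthday-problem approximation: Pr(KO | n) = 1 - exp(-n^2 / no. of genes)."""
    return 1.0 - math.exp(-(n ** 2) / n_genes)


def ko_probability_pli(n: float, pi: float, d: float) -> float:
    """Final model: pi * (1 - (1 - D)^n) + (1 - pi) * (1 - exp(-n * D))."""
    term_high_pli = 1.0 - (1.0 - d) ** n   # term weighted by pi
    term_other = 1.0 - math.exp(-n * d)    # term weighted by (1 - pi)
    return pi * term_high_pli + (1.0 - pi) * term_other


# Hypothetical inputs, for illustration only (not values from the study).
n = expected_deleterious(n_ssnvs=2000, deleterious_fraction=0.01)
p_simple = ko_probability_birthday(n, n_genes=20000)
p_final = ko_probability_pli(n, pi=0.17, d=1 / 20000)

# A scale factor of 100 converts probabilities into percentages.
print(f"n = {n:.1f}, simple model = {100 * p_simple:.3f}%, final model = {100 * p_final:.3f}%")
```

Both functions return 0 when n = 0 and increase monotonically with n, which is a quick sanity check on any implementation of the model.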

The average was taken across all cells per individual (n > 3 cells each, with specific n shown in the Source Data for Fig. 2k), and 95% CIs on those point estimates were calculated for illustration purposes. A scale factor of 100 was used to convert probabilities into percentages. To test whether the probability of obtaining a KO was higher in AD than in controls, we modelled the probabilities using generalized estimating equations with an exchangeable working correlation structure and a probit link function, as implemented in the geepack (v.1.3-1) R package. Namely, we fitted the model for each donor–tissue pairing k and neuron i as follows:

$$g\left({\kappa }_{k,i}\right)={\beta }_{\mathrm{age}}{X}_{\mathrm{age},ki}+{\beta }_{\mathrm{diagnosis}}{X}_{\mathrm{diagnosis},ki}+{\beta }_{\mathrm{diagnosis:age}}{X}_{\mathrm{age},ki}{X}_{\mathrm{diagnosis},ki}$$

with the correlation between two neurons in a donor–tissue pair defined as \({\rm{Corr}}\left({\kappa }_{k,i},{\kappa }_{k,{i}^{{\prime} }}\right)=\rho \), where \({\kappa }_{k,i}\) is the probability of a neuron having a KO mutation and g() is the probit link function.
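
To make the role of the probit link concrete, the sketch below shows how a linear predictor built from age, diagnosis and their interaction maps to a KO probability through the standard normal cumulative distribution function. It is not a re-implementation of the geepack fit: it does not estimate coefficients or the working correlation ρ, and the coefficient values in the usage lines are hypothetical placeholders chosen only for illustration. The function names are likewise assumptions.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, s.d. 1)


def probit(p: float) -> float:
    """Probit link g(p): the standard normal quantile of p, for 0 < p < 1."""
    return _STD_NORMAL.inv_cdf(p)


def inverse_probit(eta: float) -> float:
    """Inverse link: maps a linear predictor back to a probability."""
    return _STD_NORMAL.cdf(eta)


def linear_predictor(age: float, diagnosis: int,
                     beta_age: float, beta_diagnosis: float,
                     beta_interaction: float) -> float:
    """Age, diagnosis and diagnosis:age terms, as in the model equation above."""
    return (beta_age * age
            + beta_diagnosis * diagnosis
            + beta_interaction * age * diagnosis)


# Hypothetical coefficients, for illustration only (not fitted values).
betas = dict(beta_age=-0.02, beta_diagnosis=0.1, beta_interaction=0.005)
eta_control = linear_predictor(age=80, diagnosis=0, **betas)
eta_ad = linear_predictor(age=80, diagnosis=1, **betas)
print(inverse_probit(eta_control), inverse_probit(eta_ad))
```

With diagnosis coded as 0 or 1, the interaction term contributes only for AD neurons, so it allows the age slope to differ between the two groups.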

Immunofluorescence microscopy for 8-oxoG as a biomarker for neuron oxidative damage

To examine whole-cell oxidation status in individual neurons in post-mortem human brain, we performed immunofluorescence staining and quantification for cellular 8-oxoG, the most frequent oxidized nucleotide product generated by reactive oxygen species (ROS) under conditions of oxidative stress. Formation of 8-oxoG is an important biomarker for oxidative status and oxidative DNA damage lesions in the cell75.

Fresh-frozen human brain PFC tissue was embedded in OCT medium and then cryo-sectioned (20 µm), with sections applied to uncharged glass slides and fixed for 10 min using 4 °C Carnoy’s fixative (60% ethanol, 30% chloroform and 10% acetic acid). Slides were washed in cold 1× PBS three times for 10 min each. A circle was drawn around the tissue section using a grease pen and slides were placed into a humidifying chamber. Primary antibody solution consisted of 0.2% Tween-20, rabbit anti-NeuN (1:1,000, Abcam ab177487) and mouse anti-8-oxoG (1:500, Abcam ab206461, clone 2Q2311) in blocking solution (10 mg ml−1 bovine serum albumin, 0.02% sterile normal donkey serum, 2 mg ml−1 glycine, 2 mg ml−1 lysine in 1× PBS). Primary antibody solution was applied, and slides were sealed in a humidifying chamber and incubated at 4 °C overnight. Slides were then washed with cold 1× PBS and secondary antibody solution was applied to each slide. Secondary antibody solution consisted of 0.2% Tween-20, donkey anti-rabbit Alexa Fluor 488 (1:250, Thermo Fisher Scientific A32790) and donkey anti-mouse Alexa Fluor 555 (1:250, Thermo Fisher Scientific A32773) in 1× PBS. Slides were sealed in a humidifying chamber and incubated at 4 °C overnight. Slides were washed in 1× PBS and then passed through a dehydration series consisting of 50% ethanol (5 min), 70% ethanol (3 min × 2), 95% ethanol (3 min × 2), 100% ethanol (3 min × 2) and xylenes (5 min × 2). After the xylene step, tissue was permanently mounted using DPX and a glass coverslip. Slides were allowed to dry overnight before microscopy.

Two staining batches were performed for all cases, using an antibody master mix to reduce staining differences between slides. A middle-aged individual (46-year-old woman; case 5773) was used to establish the fluorescence exposure settings for 8-oxoG and NeuN, and these settings were then used for the imaging of all cases. Tissue was visualized using a Zeiss Axio Observer 7 fluorescence microscope equipped with an X-cite Exacte 120 LEDboost lamp, Zeiss Axiocam 506 mono camera, Zen Blue 2.5 pro software and a 20× objective lens. AF488 (499ex/520em) was paired with a 530/30 nm bandpass filter and AF555 (553ex/568em) was paired with a 582/15 nm bandpass filter channel. The top and bottom of intracellular NeuN immunoreactivity were used to establish z-stack bounds using 0.24-µm steps at 2,752 × 2,208 resolution, pixel size 4.54 µm × 4.54 µm and 1 × 1 binning. Neuron cell body 8-oxoG immunofluorescence was quantified using Fiji (ImageJ) software. For each case, n = 100 total neurons were examined and quantified for 8-oxoG (50 neurons from each of two independent staining batches per case). For each cell, a single z-section was chosen representing the centre of the neuron in the z-plane. A line was drawn around the perimeter of the neuron cell body, as visualized in the NeuN 488 channel. The mean grey value (absorbance units, AU) was measured within the perimeter area in the 8-oxoG 555 channel and considered the ‘intracellular signal’. The neuron perimeter object was then moved to an area adjacent to the neuron with no intracellular NeuN or 8-oxoG immunoreactivity and the mean grey value was measured again. This value was considered the ‘background signal’ and was subtracted from the intracellular signal value. The final value was used to represent the mean 8-oxoG immunofluorescence signal for the cell.
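
The per-neuron background subtraction described above amounts to a single arithmetic step, sketched below in Python under stated assumptions. The measurements in the study were taken in Fiji (ImageJ); this sketch assumes that the two mean grey values per neuron have already been exported as numbers, and the function names and the example values are hypothetical. Averaging across the neurons of a case is shown only as one plausible summary and is an assumption, not a documented step of the original analysis.

```python
from statistics import mean
from typing import Iterable, List, Tuple


def background_subtracted_signal(intracellular: float, background: float) -> float:
    """8-oxoG signal for one neuron: intracellular mean grey value minus the
    mean grey value of the same perimeter object placed on adjacent background."""
    return intracellular - background


def per_neuron_signals(measurements: Iterable[Tuple[float, float]]) -> List[float]:
    """Apply the subtraction to (intracellular, background) pairs, one per neuron."""
    return [background_subtracted_signal(i, b) for i, b in measurements]


# Hypothetical mean grey values for three neurons, for illustration only.
example = [(120.0, 20.0), (90.0, 10.0), (75.0, 15.0)]
signals = per_neuron_signals(example)
print(signals)        # per-neuron background-subtracted values
print(mean(signals))  # assumed per-case summary: mean across neurons
```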

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.