Somatic genomic changes in single Alzheimer’s disease neurons

Miller, Michael B.; Huang, August Yue; Kim, Junho; Zhou, Zinan; Kirkham, Samantha L.; Maury, Eduardo A.; Ziegenfuss, Jennifer S.; Reed, Hannah C.; Neil, Jennifer E.; Rento, Lariza; Ryu, Steven C.; Ma, Chanthia C.; Luquette, Lovelace J.; Ames, Heather M.; Oakley, Derek H.; Frosch, Matthew P.; Hyman, Bradley T.; Lodato, Michael A.; Lee, Eunjung Alice; Walsh, Christopher A.

doi:10.1038/s41586-022-04640-1

Article
Published: 20 April 2022

Volume 604, pages 714–722 (2022)

Abstract

Dementia in Alzheimer’s disease progresses alongside neurodegeneration^1,2,3,4, but the specific events that cause neuronal dysfunction and death remain poorly understood. During normal ageing, neurons progressively accumulate somatic mutations⁵ at rates similar to those of dividing cells^6,7, which suggests that genetic factors, environmental exposures or disease states might influence this accumulation⁵. Here we analysed single-cell whole-genome sequencing data from 319 neurons from the prefrontal cortex and hippocampus of individuals with Alzheimer’s disease and neurotypical control individuals. We found that somatic DNA alterations increase in individuals with Alzheimer’s disease, with distinct molecular patterns. Normal neurons accumulate mutations primarily in an age-related pattern (signature A), which closely resembles ‘clock-like’ mutational signatures that have been previously described in healthy and cancerous cells^6,7,8,9,10. In neurons affected by Alzheimer’s disease, additional DNA alterations are driven by distinct processes (signature C) that highlight C>A and other specific nucleotide changes. These changes potentially implicate nucleotide oxidation^4,11, which we show is increased in Alzheimer’s-disease-affected neurons in situ. Expressed genes exhibit signature-specific damage, and mutations show a transcriptional strand bias, which suggests that transcription-coupled nucleotide excision repair has a role in the generation of mutations. The alterations in Alzheimer’s disease affect coding exons and are predicted to create dysfunctional genetic knockout cells and proteostatic stress. Our results suggest that known pathogenic mechanisms in Alzheimer’s disease may lead to genomic damage to neurons that can progressively impair function. The aberrant accumulation of DNA alterations in neurodegeneration provides insight into the cascade of molecular and cellular events that occurs in the development of Alzheimer’s disease.

Main

Alzheimer’s disease (AD) is a common, progressive and fatal age-associated neurodegenerative disorder that is characterized by neuron loss and stereotypic deposition of misfolded proteins². The formation of oligomers of amyloid-β may initiate disease pathogenesis, triggering a cascade of events that include the development of tau neurofibrillary tangles and oxidative stress¹. Tau deposition, which correlates most closely with clinical features, progresses topographically over the course of illness from medial temporal lobe structures to the neocortex, as delineated in the Braak staging system³. Despite substantial mechanistic knowledge of the formation of misfolded proteins, the core basis of cellular dysfunction in AD is not well understood.

Somatic mutations occur in healthy human tissues^12,13,14, including post-mitotic neurons^15,16, in which they accumulate during ageing in a process known as genosenium^5,17. Analysis of somatic mutational signatures can identify the mutagenic forces responsible, including ultraviolet irradiation in sun-exposed cancers and tobacco-associated polycyclic aromatic hydrocarbons in lung cancers^8,18. In human neurons, mutational signature analysis has revealed that somatic single-nucleotide variants (sSNVs) result from multiple mutagenic forces, potentially including the oxidation of DNA nucleotides⁵. AD shows increased oxidative stress and damaged nucleotides⁴, but the extent to which these damaged nucleotides are eliminated by manifold DNA repair processes, and whether they result in persistent DNA mutations, producing permanent effects on genome structure or transcription, are not known. Bulk methods, including targeted gene sequencing¹⁹ and single-molecule sequencing²⁰, have profiled aspects of AD somatic genetics, but AD has not to our knowledge been examined at the level of individual cellular genomes. Here, to test the hypothesis that specific mechanisms of genomic damage affect AD neurons, we applied single-cell whole-genome sequencing (scWGS) to single neurons from the brains of individuals with AD and neurotypical control individuals to compare the number, genomic locations and classes of somatic mutations that are associated with AD.

Somatic mutations in neurons during ageing

We performed scWGS on pyramidal neurons isolated from the brains of individuals with AD and neurotypical control individuals (Fig. 1a, Supplementary Tables 1, 2). We stained for the pan-neuronal marker NeuN to mark neurons, and further gated only the largest NeuN-positive nuclei (Fig. 1b). This separates, to a purity greater than 99%, the nuclei of pyramidal, excitatory neurons—which are preferentially vulnerable to both neurofibrillary tangle formation²¹ and cell death in AD²²—from those of glia and smaller, inhibitory neurons (Fig. 1c). Here, scWGS involves single-cell alkaline lysis on ice, whole-genome amplification using multiple displacement amplification (MDA) and then several screening and quality control steps, so that only genomes that are well amplified are finally sequenced. In total, using MDA, we analysed 91 neurons from 8 cases of AD and 159 neurons from 18 neurotypical control individuals (Table 1). We identified sSNVs using the LiRA pipeline²³, which uses linkage to germline haplotypes to increase specificity and estimates the genome-wide somatic mutation rate by accounting for the cell-specific proportion of phaseable linked sites and false positive rate. For these MDA-amplified single-cell genomes, we performed additional filtration steps based on previously reported patterns of nucleotide substitution attributed to artefacts of genome amplification by MDA²⁴ (see Methods, Extended Data Fig. 1). This set of filtered sSNV calls showed a variant allele fraction distribution that was very similar to that of germline heterozygous SNVs in single-cell data (Extended Data Fig. 2), which allowed us to confirm that, in neurotypical individuals, neuronal sSNVs increased with age at a rate of 16–21 sSNVs per year (Fig. 1d, Extended Data Fig. 3a–d)—consistent with previous work on neurons^5,20,25. 
Studies using clonally expanded cells from other human tissues have shown comparable yearly increases in sSNVs, ranging from 13 to 55 sSNVs per year, with higher rates in more rapidly dividing cell types (Extended Data Table 1).
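
As a rough illustration of how a per-year accumulation rate can be read off per-neuron counts, the sketch below fits an ordinary least-squares line to sSNV counts against donor age. This is a simplification: the analysis described above uses linear mixed-effects models, which account for several neurons coming from the same donor. The function name `ols_slope` is hypothetical, and the ages and counts are placeholder values constructed to lie on a known line, not data from this study.

```python
def ols_slope(ages, counts):
    """Return (slope, intercept) of the least-squares line counts ~ ages."""
    n = len(ages)
    if n != len(counts) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(ages) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    if sxx == 0:
        raise ValueError("ages must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, counts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder data (assumption): counts lie exactly on 100 + 18 * age.
toy_ages = [20, 40, 60, 80]
toy_counts = [100 + 18 * age for age in toy_ages]
slope, intercept = ols_slope(toy_ages, toy_counts)
```

The slope plays the role of the yearly sSNV gain. A mixed model would additionally estimate a donor-level random effect, which this sketch omits.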

**Fig. 1: Somatic mutations in single neurons in control individuals and individuals with AD.**

Table 1 Case information and number of neurons analysed in this study

We next examined the accumulation of sSNVs in pyramidal neurons located in the CA1 subfield of Ammon’s horn of the normal hippocampus, as this is a critical region in AD and other diseases. Hippocampal CA1 neurons from individuals who died with no neurological diagnosis showed a trend towards the accumulation of sSNVs with age (Fig. 1e), which was not significantly different from the increase in sSNVs seen in prefrontal cortex (PFC) neurons from neurotypical control individuals (P = 0.72, linear mixed-effects regression model (linear mixed model); overlay in Fig. 1f). When considering the PFC and the hippocampus together (Extended Data Fig. 3a–d), this set of single cells highlights a common pattern of sSNV accumulation in the pyramidal neurons of neurotypical individuals.

Large-scale DNA sequencing studies in cancer have identified patterns and contexts of nucleotide substitution, termed ‘signatures’⁸, which often reveal mutagenic forces. In normal PFC neurons, the age-related increase in mutations is driven mainly by certain C>T and T>C changes, termed signature A⁵. This signature resembles the age-related ‘clock-like’ signature that is observed in other normal cells as well as in essentially all cancer cells⁹, designated as signature SBS5 in the COSMIC mutational signature database (https://cancer.sanger.ac.uk/cosmic/signatures). Signature decomposition analysis of sSNVs from the composite dataset of PFC and hippocampal pyramidal neurons showed that the contribution of signature A in each neuron increased with age, at a rate of 15.0 ± 1.2 sSNVs gained per year (Fig. 1g). This age-related increase in signature A mutations is similar for PFC and hippocampal pyramidal neurons (P = 0.18, linear mixed model), and is the major driver of age-related sSNV accumulation in normal neurons. Despite their universal presence in many cell types, and their accumulation in nondividing cells, the cellular mechanism of such clock-like mutations is not clear. Signature SBS5 exhibits a transcriptional strand bias⁹, which suggests that events leading to these mutations are associated with RNA transcription. During transcription, the double helix is unwound, exposing single DNA strands to cytosine and thymine deamination¹⁷, which are subject to transcription-coupled nucleotide excision repair (TC-NER). Transcription may therefore sensitize expressed loci to somatic mutagenesis through transcription-associated damage or ineffective repair.
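
To make the idea of signature decomposition concrete, the sketch below estimates how many mutations in one neuron are attributable to each of a set of fixed signatures. It uses multiplicative updates for non-negative least squares, which is an assumption chosen for brevity and not necessarily the procedure used in this study. The two signatures are hypothetical shapes over six substitution classes; real signatures, including signatures A and C and COSMIC SBS5, are defined over 96 trinucleotide contexts.

```python
def fit_exposures(spectrum, signatures, iterations=500):
    """Estimate non-negative exposures h with sum_k h[k] * signatures[k] ~ spectrum.

    Signatures are held fixed; only the exposures are fitted, using the
    multiplicative update h <- h * (W^T v) / (W^T W h).
    """
    n_sig = len(signatures)
    n_cls = len(spectrum)
    h = [1.0] * n_sig
    # W^T v does not change between iterations, so compute it once.
    wt_v = [sum(sig[i] * spectrum[i] for i in range(n_cls)) for sig in signatures]
    for _ in range(iterations):
        recon = [sum(h[k] * signatures[k][i] for k in range(n_sig))
                 for i in range(n_cls)]
        wt_recon = [sum(sig[i] * recon[i] for i in range(n_cls))
                    for sig in signatures]
        h = [h[k] * wt_v[k] / max(wt_recon[k], 1e-12) for k in range(n_sig)]
    return h


# Hypothetical toy signatures over the classes C>A, C>G, C>T, T>A, T>C, T>G.
TOY_CLOCKLIKE = [0.05, 0.05, 0.45, 0.05, 0.35, 0.05]  # rich in C>T and T>C
TOY_CA_RICH = [0.45, 0.10, 0.15, 0.15, 0.10, 0.05]    # rich in C>A

# Synthetic spectrum built from 300 and 100 mutations of each toy signature.
toy_spectrum = [300 * a + 100 * c for a, c in zip(TOY_CLOCKLIKE, TOY_CA_RICH)]
exposures = fit_exposures(toy_spectrum, [TOY_CLOCKLIKE, TOY_CA_RICH])
```

Regressing such per-neuron exposures against age is what yields a per-signature yearly rate of the kind quoted above.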

Somatic mutations in AD

We next assessed the burden of sSNVs in neurons from the brains of eight individuals with AD and found that AD neurons showed significantly more called sSNVs than expected on the basis of age (P = 6.5 × 10⁻⁵, linear mixed model; Fig. 1h). This excess was variable between neurons, mirroring the variable presence of AD pathology within neurons of a given brain region. AD neurons also showed a significant increase in called sSNVs in MDA experiments when directly compared to age-matched neurotypical control neurons (P = 7.1 × 10⁻⁵, two-tailed Wilcoxon test; Fig. 1i). This increase remained after controlling for potential covariates including post-mortem interval, sample storage time, sample DNA quality, sequencing depth, sequencing quality score, library insert size and number of heterozygous germline SNVs, as well as technical metrics of scWGS evenness (see Methods, Extended Data Fig. 3e–h). In the PFC, we observed significant gains in sSNVs in AD relative to normal ageing in seven out of eight individual cases of AD (Fig. 1j). Several of the genomes with the highest sSNV counts in AD came from the hippocampus, in which five of eight cases also showed significant increases in sSNVs compared with normal ageing (Fig. 1k). However, in three cases, the assayed hippocampal neurons did not show a detectable increase in the handful of cells assayed. On the basis of tau (Braak) and amyloid-β (Consortium to Establish a Registry for Alzheimer’s Disease; CERAD) neuropathological staging, hippocampal pathology appears to precede PFC damage, and the hippocampus of these late-stage cases invariably showed widespread neuronal loss as well (not shown). Thus, it is possible that highly mutated neurons are lost before death and therefore not possible to assay here, so our results may reflect resilient neurons that have survived despite advanced AD²². 
These results show that neurons in AD contain hundreds of additional sSNVs beyond that expected for their age, indicating that the disease process produces a level of genomic damage that is on par with more than a decade of normal accumulation of sSNVs.

The somatic mutations identified in AD neurons are pervasively distributed across the genome (Fig. 1l), with a trend towards an excess in regions at least 1 kb upstream from the transcription start site—where DNA damage has been implicated during neuronal gene transcription²⁶—that does not survive Bonferroni correction (P = 0.045, two-tailed t-test; Extended Data Fig. 4). The broad genomic distribution of variants suggests that, rather than constituting a specific initial event in disease pathogenesis, somatic mutations are more likely to be secondary, resulting from other events that initiate AD and instigate mutagenic processes. Specifically, we did not observe somatic instances of known pathogenic mutations in classic germline AD risk genes (APP, PSEN1, PSEN2 and APOE), concordant with a recent report²⁷, nor did we observe somatic increases in copy number of the APP gene, contrary to a previous study²⁸ and as we reported in detail separately²⁹. We also observed no consistent effect of an individual’s ApoE status or sex on the accumulation of sSNVs.

Mutational signature analysis in AD neurons

We next performed mutational signature analysis to identify whether specific processes cause somatic alterations in AD neurons. De novo signature decomposition revealed mutational signatures concordant with those previously reported in human neurons⁵ (Extended Data Fig. 5). We focused our analysis on neuronal signatures A and C (Fig. 2a), as signature B contains clonal developmental mutations, but is also where artefactual C>T mutations created by MDA amplification aggregate²⁴. Signature A mutations increase with age in all samples, which suggests that this clock-like signature (that is most similar to the clock-like signature SBS5 from cancer⁵) constitutes an inherent feature of genome ageing. Signature A also shows a marginal increase in AD relative to age-matched controls (Fig. 2b, c), which does not reach statistical significance in these MDA experiments, but suggests that these mutational mechanisms could be accentuated in the setting of disease. On the other hand, AD neurons show a pronounced increase in signature C compared to controls (Fig. 2d, e), which accounts for most of the observed excess in alterations. The signature C burden shows more variation between neurons than that for signature A (Extended Data Fig. 5d), which suggests that signature C could result from irregular ‘calamitous’ events, in contrast to the uniform ageing represented in signature A.

**Fig. 2: Somatic mutational signatures and patterns in AD neurons by MDA.**

Signature C includes C>A substitutions, which have previously been associated with oxidative damage to guanine nucleotides¹⁸. Signature C also has a significant contribution from the cancer-associated signature SBS8 (ref. ⁵) (Extended Data Fig. 6a). This signature is increased in stem cells with disrupted TC-NER^10,30, and we have observed an increase in signature C in single human neurons deficient in TC-NER owing to ERCC6 mutations, and in neurons deficient for global NER owing to XPA or XPD mutations⁵. Overlap between AD sSNVs and other cancer-derived signatures also suggests a potential role for NER in T>A, T>C and C>T mutations (Extended Data Fig. 6b). Signature C has been reported in normal neurons at low but highly variable levels⁵, with some accumulation with age in the normal PFC, and a similar signature has also been reported in ageing stem cells from the liver and intestine⁶. Given that increased reactive oxygen species (ROS) and oxidative nucleic acid lesions have been reported in AD^4,31,32,33, a plausible mechanism for the accumulation of signature C in AD is that increased oxidative damage overwhelms NER, which could also be attenuated in AD. The set of excess mutations in individuals with AD, represented as the trinucleotide spectrum of residual mutations when subtracting those present in control individuals, also includes contributions from the cancer signature SBS6 (Extended Data Fig. 6b), which is associated with defective DNA mismatch repair, raising the possibility that other repair mechanisms may further contribute to the generation of somatic mutations in AD neurons.

Oxidative damage in AD neurons

Because our mutational signature analysis suggested that DNA oxidation—previously observed in bulk analyses of brains from individuals with AD^4,11—might contribute to the excess sSNVs in AD, we directly examined nucleotide oxidative damage in individual neurons. The most frequent oxidized nucleotide lesion due to oxidative stress is 8-oxoguanine (8-oxoG), and this is therefore used as a biomarker for cellular oxidative status and DNA damage. Immunofluorescence microscopy using an antibody targeting 8-oxoG showed that there were significantly higher levels of 8-oxoG in AD neurons than in neurotypical control neurons (P = 1.2 × 10⁻⁶, linear mixed model; Fig. 2f, Extended Data Fig. 7), indicating that increased levels of oxidative nucleotide damage contribute to C>A changes and to the increase in signature C in AD neurons.

Transcriptional influence on somatic SNVs

Mutations in genes that are critical for neuronal function and survival could directly affect cellular fitness. Despite the preferential repair of transcribed genes in human neurons³⁴, the burden of sSNVs in transcribed regions of the genome correlated with gene expression levels in the brain (P = 3.1 × 10⁻³, Pearson correlation; Fig. 2g). When this observation was separated by signature, with increased expression we observed increased signature A mutations (P = 5.0 × 10⁻⁵, Pearson correlation), but decreased signature C mutations (P = 6.5 × 10⁻³, Pearson correlation). These findings provide further support for the hypothesis that ageing-associated signature A and AD-associated signature C arise from different mechanisms. For signature A, events during transcription appear to have a role in generating mutations, whereas signature C correlates inversely with expression and therefore may be more effectively repaired during transcription, including by TC-NER³⁵.

Gene Ontology (GO) analysis of loci mutated in AD and control neurons revealed that genes involved in neuronal function were enriched for sSNVs (Fig. 2h). When considered together with the expression–sSNV findings, AD neurons show an influence of transcriptional processes on mutation generation. Such a transcriptional influence can produce an asymmetric pattern of mutations on the paired DNA strands. We therefore distinguished the sSNV sites by template status, between transcribed template strands and untranscribed strands (Fig. 2i). We found a significant strand bias for C>A mutations on the transcribed strand, along with a modest strand bias for C>T and T>C, providing further evidence that errors in transcription-related mechanisms have a role in the generation of sSNVs in AD neurons. As one example, an unrepaired oxidized guanine nucleotide, 8-oxoG, on an untranscribed strand could become a G>T mutation, which would be classified as a C>A mutation on the transcribed strand. In addition to the apparent protective role of NER processes against somatic mutation, the involvement of NER in signature C mutations also presents a potential mechanism for the accumulation of mutations in non-cycling cells, as NER involves the removal of an approximately 29-bp sequence by an exonuclease, followed by the replication of those 29 bp from the remaining DNA strand³⁶; this allows for replication errors during repair if the template strand is also damaged.
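
The strand assignment described above can be sketched as follows. Each variant, given on the reference plus strand, is folded into its pyrimidine-based class (so that G>T is counted as C>A) and labelled by whether the pyrimidine lies on the transcribed (template) or untranscribed (coding) strand of the overlapping gene. The function names are hypothetical, and the exact two-sided binomial test against equal strand probability is an assumption for illustration; this section does not state which test was used.

```python
from math import comb

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify(ref, alt, gene_strand):
    """Map a plus-strand variant to (pyrimidine-based class, strand label).

    gene_strand is '+' when the gene's coding (untranscribed) strand is the
    reference plus strand, and '-' when it is the minus strand.
    """
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError("ref and alt must be different bases from A, C, G, T")
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    pyrimidine_on_plus = ref in "CT"
    if not pyrimidine_on_plus:
        # Report the change as seen from the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    # The pyrimidine is on the coding strand when its strand matches the gene's.
    on_coding = pyrimidine_on_plus == (gene_strand == "+")
    return f"{ref}>{alt}", "untranscribed" if on_coding else "transcribed"


def two_sided_binomial_p(k, n):
    """Exact two-sided binomial p-value for k of n events under p = 0.5."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    observed = comb(n, k)
    # Sum the outcomes that are no more likely than the observed one.
    extreme = sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= observed)
    return extreme / 2 ** n


# A G>T change on the coding strand of a plus-strand gene is a C>A change
# on the transcribed strand, matching the 8-oxoG example in the text.
example = classify("G", "T", "+")
```

Counting, for one substitution class, how many variants fall on the transcribed strand out of the total, and passing those counts to `two_sided_binomial_p`, gives a simple measure of asymmetry.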

Potential consequences of somatic mutations in AD

Somatic mutation or single-stranded damage that alters amino acids can contribute to neuronal dysfunction or loss by many mechanisms, including direct impairment of transcription, alterations in protein stability or creation of neoantigens. In protein-coding genes, AD neurons show more nonsynonymous mutations than age-matched control neurons (Fig. 2j), which has the potential to impair dosage-sensitive genes, or to create neoantigen peptides that could elicit T lymphocyte activation, immune attack and consequent cellular damage. Observations of clonal CD8⁺ T cells in cerebrospinal fluid and brain tissue in AD³⁷ suggest that such autoactivation could be relevant in AD. Moreover, as somatic alterations accumulate in a genome, the likelihood of two deleterious exonic alterations in the same gene, producing a knockout cell, increases exponentially. We modelled the rate of sSNV-caused knockout neurons (Fig. 2k), and found a substantial projected increase in AD over controls (P = 0.022, generalized estimating equation model). This model suggests that dysfunctional neurons would be markedly more abundant in AD, which may be compounded by the length of certain AD-relevant genes³⁸; compromising neuronal function may therefore be one way in which sSNVs affect cellular physiology³⁹. The pronounced effect of genomic damage, even in non-dividing cells, is underscored by the observation that multiple defects in DNA repair result in neuronal dysfunction and degeneration^5,40.
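
The reasoning behind the knockout projection can be illustrated with a minimal sketch. It is not the model behind Fig. 2k: it assumes, for illustration only, that deleterious hits land independently on each allele of a gene following a Poisson distribution, and that a gene is knocked out when both alleles carry at least one hit. The function names and the per-allele rates are hypothetical.

```python
import math


def biallelic_hit_probability(hits_per_allele):
    """P(both alleles carry >= 1 deleterious hit) under independent Poisson hits."""
    if hits_per_allele < 0:
        raise ValueError("expected hits per allele must be non-negative")
    # 1 - exp(-x), computed with expm1 for accuracy at very small x.
    p_one_allele = -math.expm1(-hits_per_allele)
    return p_one_allele ** 2


def expected_knockout_genes(hits_per_allele_by_gene):
    """Expected number of knocked-out genes in one cell, summed over genes."""
    return sum(biallelic_hit_probability(x) for x in hits_per_allele_by_gene)


# Hypothetical rates: doubling a small per-allele rate roughly quadruples
# the chance that both alleles of a gene are hit.
ratio = biallelic_hit_probability(2e-4) / biallelic_hit_probability(1e-4)
```

Under these assumptions the knockout probability grows much faster than the mutation burden itself, and longer genes, which present a larger target and hence a higher per-allele rate, contribute disproportionately.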

Interrogation of AD neuron genomes by PTA

The experiments discussed thus far, which used MDA to amplify the genomes of single neurons, used LiRA variant calling to counteract allele dropout²³ and signature-based filtering of amplification artefacts (Extended Data Fig. 1), which are features of MDA-based methods. To corroborate our findings from MDA-amplified single neuron genomes, we applied a second single-cell amplification method that removes most or all amplification artefacts^41,42 as an orthogonal approach. Primary template-directed amplification (PTA)⁴¹ achieves highly uniform genome amplification by using chain-terminating nucleotides to disfavour long amplification products that can be re-primed. PTA thus allows the identification of sSNVs in single human neurons while mitigating known single-cell artefacts that can be seen from MDA⁴², obviating the need for signature-based variant filtering. PTA-based scWGS of human neurons has confirmed that somatic mutations increase with age⁴². We performed PTA-based scWGS on a small sample of neurons from most brains profiled by MDA (29 neurons from 7 cases of AD and 40 neurons from 13 neurotypical control individuals; Table 1) and confirmed that AD neurons contain increased somatic alterations compared to controls (P = 3.9 × 10⁻⁴, linear mixed model; Fig. 3a). This effect remained after controlling for technical metrics (Methods, Extended Data Fig. 8c–f). The magnitude of the PTA-detected AD increase is somewhat lower than what was observed by MDA, which is likely to reflect in part residual amplification artefacts in MDA material. sSNVs detected by PTA show trinucleotide spectra (Extended Data Fig. 8a) and COSMIC signature contributions (Extended Data Fig. 8b) that are highly similar to those seen in multiplexed end-tagging amplification of complementary strands (META-CS), a recently reported duplex sequencing method that explicitly distinguishes double-stranded mutations and single-stranded DNA lesions²⁵. 
PTA-identified mutational spectra closely cluster with META-CS-identified double-stranded mutations and are distinct from META-CS single-stranded lesions, which strongly suggests that PTA-detected sSNVs represent double-stranded somatic mutations.

**Fig. 3: Profile of somatic mutations in single AD neurons by PTA.**

We also examined PTA-detected mutations by signature decomposition, which again confirmed that signature A mutations increase with age in a clock-like manner (Fig. 3b), with a marginally significant increase in signature A in AD neurons (P = 0.04, linear mixed model). The AD-associated increase in mutations is most pronounced for signature C (P = 5.3 × 10⁻³, linear mixed model; Fig. 3c). As with the increase in total mutations in AD neurons, the PTA mutational signature findings mirrored the trends seen in MDA-amplified neuron genomes. The residual PTA-detected mutations in AD neurons show a distinct trinucleotide spectrum (Extended Data Fig. 8a), with an excess of C>A and C>T mutations that is also seen in MDA-amplified neurons. When analysed for contributions of COSMIC cancer mutation signatures, the residual mutations in AD neurons show a distinct pattern from that of control neurons (Extended Data Fig. 8b), including many signatures seen with MDA-detected AD residual mutations. Among these are SBS8 as well as SBS30, which is associated with the DNA repair enzyme NTHL1 that is involved in oxidative lesion repair. The PTA-detected burden of sSNVs in transcribed regions correlated with levels of gene expression in the brain (P = 2.8 × 10⁻³, Pearson correlation; Fig. 3d), whereas signature A and C mutations showed similar patterns to those seen with MDA-detected sSNVs, pointing to specific effects of transcriptional activity on mutation occurrence. We also noted a C>A strand bias in PTA-amplified AD neurons (Fig. 3e), further implicating transcription-related events in the generation of sSNVs in AD neurons. Thus, both scWGS approaches identified similar patterns, and suggest that the pathogenic mutational mechanisms in AD include DNA oxidation, NER DNA repair and transcriptional activity.

Although several studies have confirmed that neurons accumulate sSNVs with age^5,20,25, one recent study using a single-molecule technique called NanoSeq did not find greater genome-wide mutation rates in AD-affected brains compared to aged brains of neurotypical control individuals, and actually reported a small but significant decrease in somatic mutations in AD²⁰. There are a few potential reasons for this discrepancy as compared to our findings in single AD neurons. One possibility is that single-stranded lesions or variants contribute to our signal, although we have gone to lengths to exclude this, including custom computational removal of known MDA artefacts and application of the PTA scWGS method. The NanoSeq study may also reflect an analysis of different cell populations from the individual cells that we studied here. The NanoSeq analysis studied bulk DNA from 15,000 pooled cells sorted using NeuN without size gating²⁰, but we observed that sorting by NeuN alone includes excitatory and inhibitory neurons, as well as some glial cells (Fig. 1b, c). Therefore, the NanoSeq study does not enrich for the excitatory pyramidal neurons that are selectively vulnerable to AD^21,22, which is likely to obscure the modest but consistent difference that we find when pyramidal neurons are enriched. The bulk NanoSeq method on all NeuN-expressing cells would also be susceptible to differences in cell-type abundance, which could account for the slightly decreased mutation count that was observed. Thus, increased somatic mutation burden in the AD brain may be limited to precisely the neuron subtypes that are most affected by the disease, potentially sparing some cell types.
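
The cell-type abundance argument can be made concrete with a small mixture calculation. In the sketch below, a bulk measurement is modelled as the abundance-weighted mean of per-cell-type burdens. All fractions and burdens are hypothetical placeholders, and the premise that the other NeuN-positive cells carry a lower burden than pyramidal neurons is an assumption made only for this example.

```python
def bulk_burden(fractions, burdens):
    """Abundance-weighted mean mutation burden of a pooled cell population."""
    if len(fractions) != len(burdens):
        raise ValueError("one burden is needed per cell-type fraction")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("cell-type fractions must sum to 1")
    return sum(f * b for f, b in zip(fractions, burdens))


# Hypothetical control pool: 30% pyramidal neurons at 1,500 sSNVs per cell,
# 70% other NeuN-positive cells at 1,000 sSNVs per cell.
control_bulk = bulk_burden([0.3, 0.7], [1500, 1000])

# Hypothetical AD pool: pyramidal neurons carry more sSNVs per cell (1,800)
# but make up only 10% of the pool after neuron loss.
ad_bulk = bulk_burden([0.1, 0.9], [1800, 1000])
```

With these placeholder numbers the bulk value is lower in the AD pool even though each surviving pyramidal neuron carries more mutations, which is the kind of masking that size-gated single-cell analysis avoids.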

Discussion

Our analysis reveals that excitatory neurons in the brains of individuals with AD accumulate genomic damage, and likely permanent mutations, beyond the levels that occur as a result of ageing alone. The pattern of genomic SNV accumulation in AD neurons appears to be distinct from an accentuation of normal ageing, as suggested by (1) the abundance of signature C, which is present but limited in the brain of neurotypical control individuals; and (2) signature-specific transcriptional influences. These genomic changes may include a spectrum of manifestations, including single-stranded DNA lesions and double-stranded mutations. Notably, putative mutations identified by PTA-based scWGS were molecularly similar to bona fide double-stranded mutations identified by duplex sequencing, but dissimilar to single-stranded lesions. These correlations, combined with the evenness of PTA genome coverage, suggest that the AD-specific somatic alterations are predominantly double-stranded mutations. Future studies that are specifically designed to compare DNA lesions with permanent mutations may shed further light on the differential effects these related phenomena have in AD. Other types of somatic alterations, such as short insertions and deletions, structural variants and retrotransposition events, can also be explored in greater depth as technologies improve.

Beyond abundance, the specific patterns of somatic alterations in AD neurons provide clues as to their causes and potential effects in AD pathogenesis (Fig. 4), and identify potential therapeutic targets. Signature C is notable for the presence of C>A variants, associated with oxidative damage, which has been observed previously in AD⁴ and which we found to be increased in AD neurons. This suggests that sSNVs occur downstream of ROS during disease pathogenesis. Signature C has a notable similarity to COSMIC signature SBS8, which is associated with the transcription-coupled repair of damaged guanine¹⁰, strongly suggesting that it accumulates either through disease-related defects in NER, or, more likely, from an accelerated accumulation of oxidized nucleotides that overwhelms the repair pathway. Oxidized nucleotides reflect the presence of increased ROS, which have previously been reported in the brain of individuals with AD, and which can be generated by a variety of processes—including inflammation and mitochondrial dysfunction, which have also been reported in AD⁴³. Our data show how these oxidative lesions may impair genomic function by interacting with mutations that occur as a part of ageing.

**Fig. 4: Model of the role of somatic mutations in AD pathogenesis.**

A major question that remains concerns how the buildup of AD-related genomic damage relates to the well-established accumulation of amyloid-β and tau proteins^1,2. Indeed, both of these AD-associated misfolded proteins can induce ROS^44,45, with the tau effect being mediated by mitochondrial dysfunction⁴⁵. Furthermore, tau can trigger double-stranded DNA breaks⁴⁶, thus further compounding the effects of sSNVs and potentially inducing more⁴⁷. Many aspects of the oxidative stress induced by AD proteins are not clear, but this process may also include the amyloid-β-stimulated activation of microglia, which can produce ROS directly and can also indirectly initiate the generation of ROS through the release of pro-inflammatory cytokines⁴⁸. Binding of amyloid-β to redox-active iron may also add oxidative stress⁴⁹. It will be important to identify how protein misfolding and other known events in AD relate to the accumulation of somatic mutations in the pathogenesis of disease.

Methods

Data reporting

No statistical methods were used to predetermine sample size. The experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment.

Human tissue samples and selection of cases of AD

Post-mortem frozen human tissues were obtained from the Massachusetts Alzheimer’s Disease Research Center (MADRC) at Massachusetts General Hospital and the NIH Neurobiobank at the University of Maryland Brain and Tissue Bank (UMBTB). Tissue collection and distribution for research and publication was conducted according to protocols approved by the Partners Human Research Committee (for MADRC: 1999P009556/MGH, expedited waiver category 5) and the University of Maryland Institutional Review Board (for UMBTB: 00042077), and after provision of written authorization and informed consent. Research on these de-identified specimens and data was performed at Boston Children’s Hospital with approval from the Committee on Clinical Investigation (S07-02-0087 with waiver of authorization, exempt category 4). Many neurotypical control tissues and datasets were obtained as part of a previous study⁵. Neurotypical control cases had no clinical history of dementia or other neurological disease. AD cases had a clinical history of dementia consistent with AD, pathologically confirmed AD pathological change (Braak stage V–VI) and no other notable neurodegenerative pathology. Age-matched cohorts included individuals who were over 50 years old (Table 1).

Isolation of individual pyramidal neurons for single-cell studies

The isolation of single neuronal nuclei using fluorescence-activated nuclear sorting (FANS) for the neuronal nuclear transcription factor NeuN and whole-genome amplification (WGA) using MDA⁵¹ have been described previously^5,52. In brief, nuclei were prepared from unfixed frozen human brain tissue, previously stored at −80 °C, in a dounce homogenizer using a chilled tissue lysis buffer (10 mM Tris-HCl, 0.32 M sucrose, 3 mM Mg(OAc)₂, 5 mM CaCl₂, 0.1 mM EDTA, 1 mM DTT, 0.1% Triton X-100, pH 8) on ice. Tissue lysates were layered on top of a sucrose cushion buffer (1.8 M sucrose, 3 mM Mg(OAc)₂, 10 mM Tris-HCl, 1 mM DTT, pH 8) and ultra-centrifuged for 1 h at 30,000g. Nuclear pellets were resuspended in ice-cold PBS supplemented with 3 mM MgCl₂, filtered, then stained with anti-NeuN antibody directly conjugated to Alexa Fluor 488 (AF488) (Millipore MAB377X, clone A60, 1:1,250). NeuN staining produced a bimodal signal distribution (Fig. 1b, bottom), distinguishing NeuN⁺ and NeuN⁻ nuclei. Large neuronal nuclei, representing excitatory pyramidal neurons, were then identified by flow cytometry (using software BD FACSDiva v.8.0.2) by targeting the nuclei with highest NeuN signal among the NeuN⁺ neuronal fraction, while also gating for the population with the highest forward scatter area (FSC-A) signal, designated by the black box in Fig. 1b. This high-FSC-A, high-NeuN population is intended to represent large neurons, comprising 2–5% of the total population of nuclei in each sample.

The composition of the targeted population of large neurons was assessed using single-nucleus RNA transcriptomic sequencing (snRNA-seq), along with two control populations: all cells and all NeuN⁺ cells (each shown with respective gating boxes in Fig. 1b). snRNA-seq of these three populations of cellular nuclei was performed on a representative tissue sample (control individual 1465, prefrontal cortex). Nuclei were isolated as described above, with the following modifications: 0.2 U μl⁻¹ Protector RNAse inhibitor (Roche RNAINH-RO) and 0.2 U μl⁻¹ SuPERase-IN RNAse inhibitor (Invitrogen) were both added to the tissue lysis buffer and to the immunostaining buffer, and MgCl₂ was omitted from the immunostaining buffer. For each of the 3 populations, 16,000 nuclei were sorted into one well of a 96-well plate, then subjected to snRNA-seq using the 10X Genomics Next GEM Single Cell 3′ GEM Kit v3.1 and Chromium Controller. From these three populations, three libraries were prepared, each with dual indexes using the 10X Genomics Dual Index Plate. Each library was then sequenced on Illumina NovaSeq S4. The raw snRNA-seq data of three 10X libraries were analysed separately and then aggregated by Cell Ranger (v.6.0.0)⁵³, followed by variance normalization, t-SNE clustering and visualization processed by Pagoda2 (v.0.1.0)⁵⁴. Clusters with 50 or more cells were manually annotated as different neuronal and glial subtypes on the basis of the expression of marker genes using a similar protocol to that described in a previous study⁵⁰. These snRNA-seq data (Fig. 1c) enabled the assessment of various sorting populations shown in Fig. 1b. The full population of cells (DAPI⁺) contained a mixture of excitatory neurons, inhibitory neurons and glia. The overall NeuN⁺ population was highly enriched for neurons, but contained many inhibitory neurons and some glia. The population of cells targeted in this study, large NeuN⁺ nuclei, was highly enriched in pyramidal neurons, consisting of 100% neurons, of which 99.3% were excitatory neurons (Fig. 1c), with minimal inhibitory neurons and glia.

scWGS of pyramidal neurons using MDA

Single nuclei, prepared as described above, were sorted one nucleus per well into 96-well plates, with each well containing 2.8 μl alkaline lysis buffer (200 mM KOH, 5 mM EDTA, 40 mM DTT) pre-chilled on ice. Nuclei were lysed on ice for 15–30 min, then neutralized on ice in 1.4 μl neutralization buffer (400 mM HCl, 600 mM Tris-HCl, pH 7.5). These cold temperatures appear to be important to limit artefacts⁵⁵. MDA was then performed in a 20 μl total reaction volume by addition of an MDA master mix (12.18 μl QIAGEN REPLI-g reaction buffer, 2.675 μl H₂O, 0.105 μl DTT, 0.84 μl REPLI-g Phi29 polymerase enzyme). MDA was performed at 30 °C for 2 h. This protocol was applied to all new MDA samples in this study, and was confirmed to yield results equivalent to those of a prior protocol using Phi29 polymerase from a different distributor (repliPHI, Epicentre).
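As a quick consistency check on the volumes listed above, the following minimal Python sketch sums the lysis buffer, neutralization buffer and master mix components and compares the result with the stated 20 μl reaction volume. The variable names are illustrative and not part of the protocol.

```python
# Volumes in microlitres, taken from the protocol text above.
lysis_buffer = 2.8
neutralization_buffer = 1.4
master_mix = {
    "REPLI-g reaction buffer": 12.18,
    "H2O": 2.675,
    "DTT": 0.105,
    "Phi29 polymerase": 0.84,
}

# Sum the master mix, then add the lysate and neutralization volumes.
master_mix_total = sum(master_mix.values())
reaction_total = lysis_buffer + neutralization_buffer + master_mix_total

print(f"master mix: {master_mix_total:.3f} ul")
print(f"total reaction: {reaction_total:.3f} ul")
```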

Samples were subjected to quality control by DNA quantification (PicoGreen, 3 μg yield required) and multiplex PCR for four random genomic loci. For an additional quality control step, we performed low coverage (0.5×) WGS, and cells with sufficiently even genome coverage (median absolute pairwise difference, MAPD; and coefficient of variation, CoV) were processed for deep sequencing. For germline reference, bulk DNA was purified using phenol:chloroform:isoamyl alcohol extraction and isopropanol precipitation, without RNAse A treatment.

Amplified single-neuron genomes were prepared for sequencing by DNA shearing and libraries generated by Psomagen (Macrogen) and Novogene using Illumina Tru-Seq kits and Illumina HiSeq X10 paired end sequencing (150 bp × 2) (Supplementary Table 1), as described previously⁵.

scWGS of pyramidal neurons using PTA

Single neurons, prepared as described above, were sorted one nucleus per well into 96-well plates and their genomes were amplified by PTA^41,42, a method that pairs an isothermal DNA polymerase with a termination base to induce quasi-linear amplification. PTA reactions were performed using the ResolveDNA Whole Genome Amplification Kit (previously known as SkrybAmp EA WGA Kit) (BioSkryb Genomics). Nuclei were sorted into 3 μl Cell Buffer pre-chilled on ice. Nuclei were then lysed by addition of 3 μl MS Mix, with mixing at 1,400 rpm performed after each step. Lysed nuclei were then neutralized with 3 μl SN1 buffer. Three microlitres of SDX reagent was then added, followed by a 10-min incubation at room temperature. Eight microlitres of reaction mix (containing enzyme) was then added, for a total reaction volume of 20 μl. Amplification was carried out for 10 h at 30 °C, followed by enzyme inactivation at 65 °C for 3 min. Amplified DNA was then cleaned up using AMPure, and the yield was determined using PicoGreen binding (Quant-iT dsDNA Assay Kit, Thermo Fisher Scientific). Samples were then subjected to quality control by multiplex PCR for four random genomic loci as previously described⁵, and also by Bioanalyzer for DNA fragment size distribution. Amplified genomes showing positive amplification for all four multiplex PCR loci were prepared for Illumina sequencing. In contrast to MDA, a low-coverage WGS screening step was not performed.

Libraries were prepared following a modified KAPA HyperPlus Library Preparation protocol described in the ResolveDNA EA Whole Genome Amplification protocol. In brief, end repair and A-tailing were performed for 500 ng amplified DNA input. Adapter ligation was then performed using the SeqCap Adapter Kit (Roche, 07141548001). Ligated DNA was cleaned up using AMPure and amplified through an on-bead PCR amplification. Amplified libraries were selected for a size of 300–600 bp using AMPure. Libraries were subjected to quality control using PicoGreen and TapeStation HS DS100 Screen Tape (Agilent PN 5067-5584) before sequencing. Single-cell genome libraries were sequenced on the Illumina NovaSeq platform (150 bp × 2) at 30× coverage (Supplementary Table 1). Data from PTA-amplified neuronal genomes in AD were analysed alongside data from control neurons that are reported elsewhere⁴².

Read-mapping and generation of BAM files

Reads generated from WGS were mapped onto the human reference genome (GRCh37 with decoy) by BWA (v.0.7.15)⁵⁶ with default parameters. Duplicate reads were marked by MarkDuplicates of Picard tools (v.2.8) and post-processed with local realignment around indels and base quality score recalibration using Genome Analysis Toolkit (GATK) (v.3.5)⁵⁷.

Calling of sSNVs from scWGS data

We used phasing-based linked read analysis (LiRA, v.2018Feb)²³ to identify sSNVs against individual-specific bulk germline reference genomes, as described previously⁵. The initial somatic and germline variants were called using GATK’s HaplotypeCaller and germline variants were further phased by Shapeit 2 (v.904). sSNVs were called by LiRA and distinguished from technical artefacts when showing strong evidence for only two haplotypes with paired-end, read-backed linkage between the sSNV candidate and the adjacent germline heterozygous site. The autosomal genome-wide burden of sSNVs was then calculated by accounting for the proportion of phaseable sites and estimated false positive rate. We should emphasize that the raw LiRA calls are an intermediate step that requires scaling by a power ratio to calculate genome-wide somatic mutation rates that are comparable between cells (for example from MDA data, see Extended Data Fig. 1b). Of note, LiRA is only designed to call phased somatic variants in diploid genome regions, so we only considered sSNVs in autosomes for subsequent analyses to avoid potential detection bias in sex chromosomes between male and female individuals.

Because LiRA calling requires linked heterozygous germline sites to achieve optimal specificity and a low false positive rate, its detection sensitivity may be limited in regions lacking phaseable germline variants. Therefore, to more comprehensively assess sSNVs in known AD risk genes (APP, PSEN1, PSEN2 and APOE) and the tau-encoding gene MAPT, we considered both the LiRA-called variants and the larger group of GATK calls that includes non-phaseable parts of these genes. In both LiRA-called variants and GATK calls, we identified no known pathogenic sSNVs in any of these AD-related genes. The question of clonal somatic mutations in these and other AD risk genes has also been examined in other studies by bulk gene sequencing^19,58,59.

Given the more even genome coverage and potentially fewer artefacts that are produced by PTA⁴², we used Single Cell ANalysis of SNVs (SCAN-SNV, v.2019Oct)⁶⁰, which does not require phasing information from adjacent germline variants and thus has more detection power in non-phaseable regions, to identify specific genomic sites of sSNVs for mutational signature and other downstream analyses.

Determining the evenness of single-cell genome amplification

The evenness of single-cell genome amplification was quantified using two different methods (Supplementary Table 4). First, the MAPD metric was calculated as reported previously⁶¹, which is the median value across all absolute differences between log₂-transformed copy number ratio of neighbouring genome bins, and a higher MAPD score represents greater unevenness of amplification. Binning, GC normalization, segmentation and copy number estimation were performed to obtain copy number ratio per bin following a previous single-cell copy number analysis protocol⁶², and MAPD was then calculated by taking a median of absolute difference between neighbouring bins. Second, considering that MAPD cannot reflect the variance of the copy number ratio distribution within each neuron, the CoV was also calculated by normalizing the standard deviation of absolute difference between neighbouring bins by their mean. We also calculated a ‘power ratio’ metric, which is defined as the ratio between the LiRA-estimated genome-wide sSNV burden and the LiRA-called phaseable sSNV count, reflecting the proportion of the genome that has been adequately amplified for each single cell. Using mixed-effects modelling, we measured the effect of these three metrics of genome evenness on sSNV burden in well-characterized neurotypical PFC neurons. We then normalized the mutation burden in each cell and estimated the age and disease effects on sSNV burden, as described in the section ‘Mixed-effects modelling of somatic SNV burden’.
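The two evenness metrics defined above can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in the study: it assumes that per-bin copy number ratios are already GC-normalized and strictly positive, and it uses the population standard deviation for CoV (the text does not specify population versus sample). The function names and example values are hypothetical.

```python
import math
import statistics

def neighbour_abs_diffs(copy_ratios):
    """Absolute differences of log2 copy number ratios between adjacent bins."""
    logs = [math.log2(r) for r in copy_ratios]
    return [abs(b - a) for a, b in zip(logs, logs[1:])]

def mapd(copy_ratios):
    """Median absolute pairwise difference; higher means less even amplification."""
    return statistics.median(neighbour_abs_diffs(copy_ratios))

def cov(copy_ratios):
    """Standard deviation of the neighbouring-bin differences divided by their mean."""
    diffs = neighbour_abs_diffs(copy_ratios)
    # Assumption: population standard deviation (pstdev) rather than sample.
    return statistics.pstdev(diffs) / statistics.mean(diffs)

# Hypothetical copy number ratios for five consecutive bins (not real data).
example_ratios = [1.0, 2.0, 2.0, 1.0, 0.5]
print("MAPD:", mapd(example_ratios))
print("CoV:", cov(example_ratios))
```

MAPD is robust to a few extreme bins because it is a median, whereas CoV responds to the spread of the differences, which is why the two are reported together.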

Mutational signature analysis

To discover mutational signatures of sSNVs, we calculated the frequency of mutations in the 96-trinucleotide contexts for all control and AD neurons from the identified single-neuron sSNVs (synthesized in Extended Data Fig. 5a for MDA, and in Extended Data Fig. 8a for PTA). Mutation signatures in MDA-amplified neurons were detected by fitting a non-negative matrix factorization (NMF)-based mutational signature framework⁶³ using MutationalPatterns (v.1.8.0)⁶⁴ (Extended Data Fig. 5b). As we increased the number of signatures, we estimated the signature stability and reconstruction error of each signature and identified four signatures (N1, N2, N3 and N4) (Extended Data Fig. 5c) that maximize stability while minimizing error (Extended Data Fig. 5b). We also used a second signature derivation method, SignatureAnalyzer (v.1.1)^10,65, which can infer the optimal number of signatures from data by considering both model complexity and fitting accuracy. Under default parameters with half-normal distribution for priors and reducing effect of ultramutated samples, SignatureAnalyzer produced four signatures (W1–W4) with the greatest likelihood, which are nearly identical to signatures N1–N4 that were identified by MutationalPatterns (Extended Data Fig. 5c).
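To make the 96-trinucleotide representation concrete, the sketch below shows one way to map each sSNV to its channel and to tally a per-cell spectrum. It assumes the widely used convention of reporting each substitution relative to the pyrimidine (C or T) of the mutated base pair. The function names and input format are illustrative and are not taken from MutationalPatterns.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

def all_channels():
    """The 96 channels: 6 pyrimidine-referenced substitutions x 16 flanking-base pairs."""
    return [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product("ACGT", repeat=2)]

def channel(trinucleotide, alt):
    """Map a reference trinucleotide (mutated base in the middle) and alternate base to a channel."""
    five, ref, three = trinucleotide.upper()
    alt = alt.upper()
    if ref in "AG":
        # Purine reference: report the event on the opposite strand (reverse complement).
        five, ref, three = COMPLEMENT[three], COMPLEMENT[ref], COMPLEMENT[five]
        alt = COMPLEMENT[alt]
    return f"{five}[{ref}>{alt}]{three}"

def count_spectrum(snvs):
    """Tally (trinucleotide, alt) pairs into a 96-channel count dictionary."""
    counts = dict.fromkeys(all_channels(), 0)
    for trinucleotide, alt in snvs:
        counts[channel(trinucleotide, alt)] += 1
    return counts

# Hypothetical calls: a C>T in an ACG context, and the same event seen from the other strand.
spectrum = count_spectrum([("ACG", "T"), ("CGT", "A")])
print(spectrum["A[C>T]G"])
```

A matrix of such spectra (channels by cells or groups) is the usual input to NMF-based signature extraction.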

We observed a marked similarity between the de novo single-neuron signatures and previously published single-neuron signatures⁵ (Extended Data Fig. 5c), particularly when taking into account recently identified signatures of potential single-cell artefacts²⁴. Each newly derived signature closely resembled a previously derived one: N4 with neuron signature A, N2 with neuron signature C, N1 with neuron signature B and potential artefact signature SBS scF, and N3 with SBS scE. To understand the underlying mechanisms for the identified mutational signatures, we further performed NMF analysis to decompose our signatures into the reported COSMIC v3 signatures (https://cancer.sanger.ac.uk/cosmic/signatures/; Extended Data Fig. 6a). We also performed NMF analysis to fit the COSMIC signatures to our composite disease and control single-neuron mutational profiles, which is shown in Extended Data Fig. 6b.

Given the near identity between the de novo and prior neuron signatures, we used the prior signatures for our subsequent analyses. On the basis of the evidence that SBS scE and SBS scF (the latter highly similar to signature B) represent potential single-cell artefacts²⁴, we excluded the contributions from these signatures in our assessment of genome-wide sSNV burden for each single neuron.

Similarly, we used MutationalPatterns to determine mutational signature contributions in PTA-amplified neurons using the signatures we identified in MDA-amplified neurons. For PTA-amplified single-neuron genomes, we did not identify significant contributions from potential artefact signatures SBS scE and SBS scF, which prompted the filtering steps for data from MDA-amplified genomes. Therefore, for PTA-amplified genomes, we report unfiltered variant calling data.

Filtering of LiRA-called somatic SNVs from MDA-amplified genomes of single neurons

Previous studies and our observations have suggested additional measures beyond LiRA to further minimize experimental artefacts that may occur during MDA amplification of single-cell genomes²⁴. Beginning with total LiRA-called sSNVs (Extended Data Fig. 1a), we undertook a series of analyses on our human neuron MDA scWGS data, examining the influence of uneven genome amplification and the value of identification of specific mutational signatures proposed as potential artefacts of single-cell genome amplification²⁴. We found that cells with highly uneven genome amplification (MAPD > 2.0) show increased LiRA-called sSNV counts (Extended Data Fig. 1c), including sSNVs attributable to the potential artefact signature SBS scE, largely comprising GC>GT changes (Extended Data Fig. 1d). We also observed that a small subset of neurons, only seen in AD, show an ‘ultramutated’ profile (more than 20,000 LiRA-called sSNVs; Extended Data Fig. 1a), which is dominated by SBS scE (Extended Data Fig. 1d), suggesting that these amplified genomes may show LiRA sSNV calls that do not represent biological double-stranded fixed somatic mutations. The observed variants in these outlier cells may represent experimental artefacts, including false calls due to errors occurring early in genome amplification. Alternatively, the observed scE variants may also represent non-mutation biological events, such as unrepaired single-strand damaged nucleotides, which could be misread as sSNVs owing to strand dropout during genome amplification (Extended Data Fig. 1f). Although examination of the potential biological component of this phenomenon may provide important insights, we developed a computational filtering pipeline to generate a set of filtered sSNV calls, focusing our analysis on bona fide somatic mutations (Extended Data Fig. 1g).

Mixed-effects modelling of somatic SNV burden

To evaluate the relationships between somatic mutation and factors including age and disease status, we performed linear mixed-effects regression modelling using the lme4 (v.1.1-23) R package⁶⁶, in a similar manner to our previous study⁵. Both genome-wide sSNV burden and signature-specific sSNV burden were considered as continuous outcomes in modelling. Disease status and other covariates of interest (for example, age and measurement of amplification evenness) were modelled as fixed effects, and donor–tissue groups were modelled as random effects, because neurons from a donor and each tissue type may be correlated owing to shared biological environment. Linear mixed-effects models were fitted using the maximum likelihood method, and P values from a t-test with the Satterthwaite approximation were calculated for each fixed effect as implemented in the lmerTest (v.3.1-2) R package⁶⁷. Of note, we also used the marginal generalized least-squares method to fit the mixed-effects model, using the nlme (v.3.1-137) R package, which produced substantially similar results.

To test the age effect of sSNV burden in PFC and hippocampus from neurotypical individuals, we fitted the model ${y}_{{ijk}}=\left(\beta +{\gamma }_{j}\right)\times {\rho }_{i}+\mu +{\theta }_{{ij}}+{\varepsilon }_{{ijk}}$, where y_ijk is the sSNV burden in neuron k from brain region j of donor i, β is the fixed-effect of age, γ_j is the fixed-effect of brain region j on age indicating interaction terms of age and brain region, ρ_i is the age of donor i, μ is the number of sSNVs at birth, θ_ij is the random effect of the donor–tissue pair following a normal distribution with mean 0 and variance τ, and ε_ijk is the measurement error of each neuron also following a normal distribution with mean 0 and variance σ_ikj (Fig. 1d–f). To control for the potential confounding factor of genome amplification evenness, we further introduced another covariate, δ_ijk, which represents the neuron-specific measurement of amplification evenness (for example, MAPD, CoV and power ratio) into the previous model, and re-estimated the age effect by subtracting the neuron-specific contribution of the amplification unevenness coefficient from y_ijk (Extended Data Fig. 3a–d). We found that PFC and hippocampus show no significant difference in the age effect before and after controlling for amplification evenness (all P > 0.25); therefore, we did not consider the brain region covariate in downstream modelling. In addition to the genome-wide sSNV burden, we also analysed signature-specific sSNVs with similar models (Fig. 1g).

To test the difference of sSNV burden between AD and control neurons in an age-controlled manner, we fitted the model ${y}_{{ijk}}=\beta \times {\rho }_{i}+{\alpha }_{i}+\mu +{\theta }_{{ij}}+{\varepsilon }_{{ijk}}$, where α_i is the fixed-effect of disease status (AD versus control), whereas y_ijk, β, ρ_i, μ, θ_ij and ε_ijk are defined as previously (Fig. 1h). We further adjusted the sSNV burden by considering the contribution of amplification evenness δ_ijk as we estimated above, and the difference of sSNV burden between AD and control neurons remained significant in both MDA- and PTA-amplified neurons (Extended Data Figs. 3e–h, 8c–f).

To exclude the possibility that the observed sSNV burden increase in AD can be driven by systemic differences in sample or sequencing quality metrics, we further introduced ω_ijk into the linear mixed-effects model: ${y}_{{ijk}}=\beta \times {\rho }_{i}+{\alpha }_{i}+\mu +{\theta }_{{ij}}+{\varepsilon }_{{ijk}}+{\omega }_{{ijk}}$, where ω_ijk denotes one of the potential confounding factors including sex, post-mortem interval, DNA quality (DIN), sample storage time, sequencing depth, library insert size, proportion of read bases with base quality at least 20, and number of heterozygous germline SNVs (an indicator of genomic size of phaseable region). We confirmed that, in both MDA- and PTA-amplified neurons, the increased sSNV burden in AD remained significant after controlling for each (all P < 0.01). For Fig. 1j, k, we also calculated AD-attributable excess somatic mutations as the residual value for each single neuron after subtracting the age effect ($\beta \times {\rho }_{i}+\mu $) estimated from neurotypical control neurons in prefrontal cortex.
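The AD-attributable excess described above is a per-neuron residual: the observed burden minus the age effect (β × ρ_i + μ) estimated from control neurons. The short Python sketch below illustrates the arithmetic only. The values of β and μ and the neuron records are hypothetical placeholders, not estimates from this study, and the fitting of the mixed-effects model itself is not shown.

```python
def excess_burden(burden, age, beta, mu):
    """Residual sSNV burden after subtracting the control-derived age effect (beta * age + mu)."""
    return burden - (beta * age + mu)

# Hypothetical coefficients for illustration only; not values from the study.
beta_control = 20.0   # assumed sSNVs gained per year in control neurons
mu_control = 100.0    # assumed intercept (sSNVs at birth)

# Hypothetical neurons with donor age (years) and genome-wide sSNV burden.
neurons = [
    {"id": "neuron_1", "age": 80, "burden": 2500.0},
    {"id": "neuron_2", "age": 60, "burden": 1300.0},
]

excess = {
    n["id"]: excess_burden(n["burden"], n["age"], beta_control, mu_control)
    for n in neurons
}
print(excess)
```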

To test whether sSNV burden is associated with ApoE genotype in patients with AD, we fit the model ${y{\prime} }_{{ijk}}={\omega }_{i}+{\theta }_{{ij}}+{\varepsilon }_{{ijk}}$, where ${y{\prime} }_{{ijk}}$ is the age-corrected sSNV burden (${y}_{{ijk}}-\beta \times {\rho }_{i}$) for each neuron, and ${\omega }_{i}$ is the ApoE genotype of risk allele ε4 under dominant, recessive and additive genetic models. No significant association was observed in any of the three genetic models in MDA- or PTA-amplified neurons (all P > 0.21).

Gene expression analysis

To test whether somatic mutation is associated with gene expression level, we extracted the brain PFC expression data from GTEx⁶⁸. The per-gene expression value was normalized for each individual after controlling for age and gender using DESeq2 (v.1.24.0)⁶⁹ and averaged across all the individuals. Genes were then divided into deciles on the basis of their PFC expression levels, and the sSNV density was calculated for each decile of genes after normalizing by per-neuron sSNV detection power ratio and total gene length. To control for potential bias due to trinucleotide context and the distribution of phaseable regions (areas with sufficient sequencing coverage and an adjacent heterozygous germline SNP), we permuted the per-neuron sSNV list for 1,000 rounds by randomly shuffling the sSNVs within the phaseable regions while keeping the trinucleotide context distribution the same. We calculated the mean and standard deviation of the per-decile density in the permuted dataset, and then measured the difference between observed and expected sSNV density for each decile of AD or age-matched control group. This analysis included all brain regions in each experiment (PFC and hippocampus for MDA-based scWGS; PFC for PTA-based scWGS).
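The decile assignment and per-decile density calculation can be sketched as follows. This simplified Python illustration makes several assumptions: ties in expression are broken by sort order, density is the total sSNV count divided by the total gene length in each decile, and the per-neuron power ratio normalization and the permutation step are omitted. All names and values are hypothetical.

```python
def assign_deciles(expression):
    """Rank genes by expression and split them into 10 near-equal groups (1 = lowest)."""
    ranked = sorted(expression, key=expression.get)
    n = len(ranked)
    return {gene: (rank * 10) // n + 1 for rank, gene in enumerate(ranked)}

def decile_density(deciles, snv_counts, gene_lengths):
    """sSNVs per base pair for each decile: summed counts over summed gene lengths."""
    counts, lengths = {}, {}
    for gene, decile in deciles.items():
        counts[decile] = counts.get(decile, 0) + snv_counts.get(gene, 0)
        lengths[decile] = lengths.get(decile, 0) + gene_lengths[gene]
    return {decile: counts[decile] / lengths[decile] for decile in counts}

# Hypothetical toy data: 20 genes of 1 kb each with increasing expression.
expression = {f"g{i}": float(i) for i in range(20)}
gene_lengths = {gene: 1000 for gene in expression}
snv_counts = {"g0": 2, "g1": 1, "g19": 4}

deciles = assign_deciles(expression)
density = decile_density(deciles, snv_counts, gene_lengths)
print(density[1], density[10])
```

With fewer than 10 genes some deciles would be empty, so the sketch is only meaningful for gene sets much larger than the number of bins.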

We further performed an NMF-based mutational signature analysis for sSNVs located in each decile of genes, to estimate the relative contributions of signature A, signature C, SBS scE and SBS scF for each decile. The sSNV density for each signature was calculated by multiplying the overall sSNV density by each signature contribution.

Functional enrichment analysis

Analysis for functional enrichment of GO terms was performed using GOseq (v.1.34.1)⁷⁰. For each RefSeq gene, we assigned a binary value ‘0’ or ‘1’ according to whether any sSNVs are located in the corresponding gene. Of note, this analysis is based on the LiRA output of sSNVs (signature-based filtering cannot be applied to individual genes or variants), and therefore this list may contain a small proportion of artefactual sSNVs. A probability weighting function in GOseq was applied to control for potential gene length bias. The Wallenius approximation method was used to test the enrichment of sSNVs, and the false discovery rate (FDR) method was further applied for the correction of multiple hypothesis testing. Genes without any GO annotation were ignored when calculating the total gene count. GO terms with fewer than 10 hits were excluded to avoid ascertainment bias. Very large GO terms with more than 1,000 genes were also ignored. All the GO terms with P < 0.01 in either AD or control neurons are listed in Supplementary Table 6.

Strand bias analysis

Mutations in transcribed regions of the genome may show a different density between transcribed and untranscribed strands (so-called strand bias)^71,72, resulting from asymmetric mutagenesis and/or repair activity between strands. The transcriptional strands of genic sSNVs were assigned on the basis of the UCSC TxDb annotations by MutationalPatterns⁶⁴. Mutated bases (‘C’ or ‘T’) on the same strand as the gene direction were categorized as ‘untranscribed’ and on the opposite strand as ‘transcribed’. Strand bias analysis was performed on the set of mutations identified in PFC and hippocampal neurons together, on the net increase (residual) of mutations in AD neurons over control neurons. Statistical significance was determined by the Poisson test.
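The strand categorization rule described above can be expressed compactly. The sketch below is an illustration of the rule as stated in the text and is not the MutationalPatterns implementation. It assumes that each sSNV is given by its reference base on the forward (+) genome strand together with the strand of the overlapping gene; the function names are hypothetical.

```python
from collections import Counter

def strand_category(ref_base, gene_strand):
    """Classify an sSNV as 'transcribed' or 'untranscribed'.

    ref_base: reference base on the '+' genome strand.
    gene_strand: '+' or '-', the strand on which the gene is annotated.
    """
    # The mutated pyrimidine (C or T) lies on '+' if the '+' reference base is C or T,
    # otherwise it lies on '-'.
    pyrimidine_strand = "+" if ref_base.upper() in "CT" else "-"
    # Same strand as the gene direction -> untranscribed; opposite -> transcribed.
    return "untranscribed" if pyrimidine_strand == gene_strand else "transcribed"

def tally_strands(snvs):
    """Count categories for an iterable of (ref_base, gene_strand) pairs."""
    return Counter(strand_category(ref, strand) for ref, strand in snvs)

# Hypothetical genic sSNVs.
counts = tally_strands([("C", "+"), ("G", "+"), ("G", "-"), ("T", "-")])
print(counts)
```

The resulting transcribed and untranscribed counts per substitution type are the quantities that a test such as the Poisson test would then compare.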

Location of sSNVs relative to genomic features

Annotations from ANNOVAR⁷³ were used to identify sSNVs falling in the following positions: intergenic, upstream (within 1 kb region upstream of transcription start site), 5′ UTR, exonic (coding sequence, not including untranslated regions), 3′ UTR, downstream (within 1 kb region downstream of transcription end site), splicing (within intronic 2 bp of a splicing junction), intronic. The functional interpretation was classified using four categories of SNV annotation: synonymous (SNV that does not cause an amino acid change), nonsynonymous (SNV that causes an amino acid change, excluding stop-gain and stop-loss SNVs), stop-loss (nonsynonymous SNV that eliminates a stop codon), and stop-gain (nonsynonymous SNV that creates a stop codon). For exonic and UTR sSNVs, we further grouped them into 10 deciles according to their position relative to the transcript length. Similar to gene expression analysis, we used the 1,000 rounds of permutation within phaseable regions by controlling for trinucleotide context distribution, and then calculated the normalized difference (D) between observed (N_obs) and expected (N_exp) sSNV counts as below:

$$D=\frac{{N}_{{\rm{obs}}}-{N}_{{\rm{exp}}}}{{N}_{{\rm{exp}}}}$$
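As a small worked illustration of the normalized difference, the Python sketch below takes N_exp to be the mean count across permutation rounds. This is an assumption, because the text does not state how the expected count is summarized. The example counts are hypothetical.

```python
import statistics

def normalized_difference(n_obs, permuted_counts):
    """D = (N_obs - N_exp) / N_exp, with N_exp assumed to be the permutation mean."""
    n_exp = statistics.mean(permuted_counts)
    return (n_obs - n_exp) / n_exp

# Hypothetical example: 12 observed sSNVs against three permutation rounds.
d_value = normalized_difference(12, [8, 10, 12])
print(d_value)
```

A positive D indicates more sSNVs in the feature than expected from the permuted background, and a negative D indicates fewer.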

Modelling the accumulation of gene knockouts in neurons

Many specific heterozygous mutations could damage neuronal function³⁹. Biallelic, exonic, deleterious ‘gene knockout’ (KO) mutations in essential genes would be especially damaging, such that there may be a threshold for the accumulation of such KO mutations above which neuronal function would deteriorate. On the basis of the number of sSNVs we identified in this report, we estimated the accumulation of gene KOs in cortical neurons, using a method described previously⁵. In brief, we estimated the probability of a mutation causing a gene knockout in a cell. In a diploid genome this corresponds to calculating the probability that two or more damaging mutations fall on the same gene, given the number of damaging mutations observed in a sample. This probabilistic problem can be modelled by an approximation of the birthday problem:

$$\Pr ({\rm{KO}}|n)=1-{{\rm{e}}}^{-\frac{{n}^{2}}{{\rm{no.\; of\; genes}}}},\quad {\rm{where}}\quad n={\rm{no.\; of\; sSNVs}}\times \frac{{\rm{total\; deleterious\; variants}}}{{\rm{total\; variants}}}\times 0.5,$$

where n is the expected number of deleterious mutations for a given neuron. The approximation used here is different from the one published previously⁵ to allow for more robust approximation when 0 < n < 1. This model was further expanded to include information about genes that are intolerant to heterozygous mutations, resulting in haploinsufficiency and functional knockout. This is captured by the probability of loss-of-function intolerance (pLI) metric, with genes with a high pLI score (pLI ≥ 0.90) being less tolerant⁷⁴. ExAC reported that 17% of all genes have such high pLI scores. We then used this information for the final model, written as follows:

$$n={\rm{number\; of\; deleterious\; mutations}}$$

$${d}_{i}=\{{\rm{event\; that\; gene}}\,i\,{\rm{has\; at\; least\; one\; mutation}}\}$$

$${\pi }_{i}=\{{\rm{event\; that\; gene}}\,i\,{\rm{has\; a\; high\; pLI\; score}}\}$$

$$D=\{{\rm{probability\; of\; a\; gene\; having\; a\; deleterious\; mutation}}\}$$

$${\rm{\Pr }}\left({KO}|\pi ,D,n\right)=\pi \times \left(1-{\left(1-D\right)}^{n}\right)+(1-\pi )(1-{{\rm{e}}}^{-{nD}})$$
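The two KO probability expressions above translate directly into code. The following Python sketch transcribes the formulas as written and is not the authors' implementation. The input values in the example (number of sSNVs, deleterious fraction, gene count, D) are hypothetical placeholders chosen only to exercise the functions.

```python
import math

def expected_deleterious(n_ssnvs, deleterious_fraction):
    """n = no. of sSNVs x (total deleterious variants / total variants) x 0.5."""
    return n_ssnvs * deleterious_fraction * 0.5

def pr_ko_birthday(n, n_genes):
    """Birthday-problem approximation: 1 - exp(-n^2 / no. of genes)."""
    return 1.0 - math.exp(-(n ** 2) / n_genes)

def pr_ko_with_pli(pi, d, n):
    """Final model: pi * (1 - (1 - D)^n) + (1 - pi) * (1 - exp(-n * D))."""
    return pi * (1.0 - (1.0 - d) ** n) + (1.0 - pi) * (1.0 - math.exp(-n * d))

# Hypothetical inputs for illustration only.
n = expected_deleterious(n_ssnvs=2000, deleterious_fraction=0.01)
p_simple = pr_ko_birthday(n, n_genes=20000)
p_final = pr_ko_with_pli(pi=0.17, d=1e-4, n=n)
print(n, p_simple, p_final)
```

Both functions return 0 when n is 0 and increase monotonically with n, which matches the idea of a burden-dependent risk of knockout.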

The average was taken across all cells per individual (n > 3 cells each, with specific n shown in the Source Data for Fig. 2k) and 95% CI on those point estimates were calculated for illustration purposes. A scale factor of 100 was used to convert probabilities into percentages. To test whether there was a higher probability of obtaining a KO in AD versus controls, we used generalized estimating equations with an exchangeable working correlation structure to model the probabilities using a probit link function using the geepack (v.1.3-1) R package. Namely, we fitted the model for each donor–tissue pairing k and neuron i as follows:

$$g\left({\kappa }_{k,i}\right)={\beta }_{{\rm{age}},k}{X}_{{\rm{age}},{ki}}+{\beta }_{{\rm{diagnosis}}}{X}_{{\rm{diagnosis}}}+{\beta }_{{\rm{diagnosis}}:{\rm{age}},{ki}}{X}_{{\rm{age}},{ki}}{X}_{{\rm{diagnosis}},{ki}}$$

with the correlation between two neurons in a donor–tissue pair defined as ${\rm{Corr}}\left({\kappa }_{k,i},{\kappa }_{k,{i}^{{\prime} }}\right)=\rho $, where ${\kappa }_{k,i}$ is the probability of neuron i in donor–tissue pair k having a KO mutation and g() is the probit link function.

Immunofluorescence microscopy for 8-oxoG as a biomarker for neuron oxidative damage

To examine whole-cell oxidation status in individual neurons in post-mortem human brain, we performed immunofluorescence staining and quantification for cellular 8-oxoG, the most frequent oxidative nucleotide product caused by ROS, under conditions known as oxidative stress. Formation of 8-oxoG is an important biomarker for oxidative status and oxidative DNA damage lesions in the cell⁷⁵.

Fresh-frozen human brain PFC tissue was embedded in OCT medium and then cryo-sectioned (20 µm), with sections applied to uncharged glass slides and fixed for 10 min using 4 °C Carnoy’s fixative (60% ethanol, 30% chloroform and 10% acetic acid). Slides were washed in cold 1× PBS 3 times for 10 min each. A circle was drawn around the tissue section using a grease pen and slides were placed into a humidifying chamber. Primary antibody solution consisted of: 0.2% Tween-20, rabbit anti-NeuN (1:1,000, Abcam ab177487) and mouse anti-8-oxoG (1:500, Abcam ab206461, clone 2Q2311) in blocking solution (10 mg ml⁻¹ bovine serum albumin, 0.02% sterile normal donkey serum, 2 mg ml⁻¹ glycine, 2 mg ml⁻¹ lysine in 1× PBS). Primary antibody solution was applied, and slides were sealed in a humidifying chamber and incubated at 4 °C overnight. Slides were then washed with cold 1× PBS and secondary antibody solution was applied to each slide. Secondary antibody solution consisted of: 0.2% Tween-20, donkey anti-rabbit Alexa Fluor 488 (1:250, Thermo Fisher Scientific A32790) and donkey anti-mouse Alexa Fluor 555 (1:250, Thermo Fisher Scientific A32773) in 1× PBS. Slides were sealed in a humidifying chamber and incubated at 4 °C overnight. Slides were washed in 1× PBS then put in a dehydration series consisting of 50% ethanol (5 min), 70% ethanol (3 min × 2), 95% ethanol (3 min × 2), 100% ethanol (3 min × 2), and xylenes (5 min × 2). After the xylene step, tissue was permanently mounted using DPX and a glass coverslip. Slides were allowed to dry overnight before microscopy.

Two staining batches were performed for all cases, using an antibody master mix to reduce staining differences between slides. A middle-aged individual (46-year-old woman; case 5773) was used to establish the fluorescence exposure settings for 8-oxoG and NeuN, and these settings were then used for the imaging of all cases. Tissue was visualized using a Zeiss Axio Observer 7 fluorescence microscope equipped with an X-cite Exacte 120 LEDboost lamp, Zeiss Axiocam 506 mono camera, Zen Blue 2.5 pro software and a 20× objective lens. AF488 (499 ex/520 em) was paired with a 530/30 nm bandpass filter and AF555 (553 ex/568 em) was paired with a 582/15 nm bandpass filter. The top and bottom of intracellular NeuN immunoreactivity were used to establish z-stack bounds using 0.24-µm steps at 2,752 × 2,208 resolution, pixel size 4.54 µm × 4.54 µm and 1 × 1 binning. Neuron cell body 8-oxoG immunofluorescence was quantified using Fiji (ImageJ) software. For each case, n = 100 total neurons were examined and quantified for 8-oxoG (50 neurons from each of two independent staining batches per case). For each cell, a single z-section was chosen representing the centre of the neuron in the z-plane. A line was drawn around the perimeter of the neuron cell body, as visualized in the NeuN 488 channel. The mean grey value (absorbance units, AU) was measured within the perimeter area in the 8-oxoG 555 channel and considered the ‘intracellular signal’. The neuron perimeter object was then moved to an area adjacent to the neuron with no intracellular NeuN or 8-oxoG immunoreactivity and the mean grey value was measured again. This value was considered the ‘background signal’ and was subtracted from the intracellular signal value. The final value was used to represent the mean 8-oxoG immunofluorescence signal for the cell.
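The background subtraction described above can be illustrated with a minimal sketch. This is not the Fiji workflow used in the study: the function names (`mean_in_mask`, `shift_mask`, `corrected_signal`), the nested-list image representation and the toy pixel values are all hypothetical, chosen only to show how an intracellular mean grey value and an adjacent background mean combine into the final per-cell value.

```python
from statistics import fmean


def mean_in_mask(image, mask):
    """Mean grey value of `image` over the pixels where `mask` is True."""
    values = [
        image[r][c]
        for r in range(len(image))
        for c in range(len(image[0]))
        if mask[r][c]
    ]
    return fmean(values)


def shift_mask(mask, d_row, d_col):
    """Translate a boolean mask; pixels moved outside the frame are dropped."""
    rows, cols = len(mask), len(mask[0])
    shifted = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and 0 <= r + d_row < rows and 0 <= c + d_col < cols:
                shifted[r + d_row][c + d_col] = True
    return shifted


def corrected_signal(image, neuron_mask, d_row, d_col):
    """Intracellular mean minus the mean of the same outline moved off the cell."""
    intracellular = mean_in_mask(image, neuron_mask)
    background = mean_in_mask(image, shift_mask(neuron_mask, d_row, d_col))
    return intracellular - background


# Toy 8-oxoG channel (hypothetical values): a 2 x 2 "cell body" on a flat background.
image = [
    [10, 10, 10, 10, 10, 10],
    [10, 50, 50, 10, 10, 10],
    [10, 50, 50, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
]
# Outline that would be traced in the NeuN channel (hypothetical).
neuron_mask = [[1 <= r <= 2 and 1 <= c <= 2 for c in range(6)] for r in range(4)]

signal = corrected_signal(image, neuron_mask, d_row=0, d_col=3)
print(signal)  # 40.0
```

Reusing the same outline for the background measurement, as the text describes, keeps the measured area identical for both readings.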

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability

scWGS data have been deposited in the NIH Alzheimer’s disease genomic data repository, NIAGADS, under accession number NG00121. The data are available under controlled-use conditions established by the tissue banks and institutional review boards (see Methods), and can be obtained by qualified investigators at https://www.niagads.org/. Gene transcripts per million (TPM) data (V8) of GTEx samples were downloaded from https://www.gtexportal.org/home/datasets. Source data are provided with this paper.

Code availability

Custom Bash and R scripts used in this study are publicly available at https://gitlab.aleelab.net/august/ad-single-cell.git.

References

Selkoe, D. J. & Hardy, J. The amyloid hypothesis of Alzheimer’s disease at 25 years. EMBO Mol. Med. 8, 595–608 (2016).
Hyman, B. T. et al. National Institute on Aging–Alzheimer’s Association guidelines for the neuropathologic assessment of Alzheimer’s disease. Alzheimers Dement. 8, 1–13 (2012).
Braak, H. & Braak, E. Staging of Alzheimer’s disease-related neurofibrillary changes. Neurobiol. Aging 16, 271–278 (1995).
Gabbita, S. P., Lovell, M. A. & Markesbery, W. R. Increased nuclear DNA oxidation in the brain in Alzheimer’s disease. J. Neurochem. 71, 2034–2040 (1998).
Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).
Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).
Osorio, F. G. et al. Somatic mutations reveal lineage relationships and age-related mutagenesis in human hematopoiesis. Cell Rep. 25, 2308–2316 (2018).
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Alexandrov, L. B. et al. Clock-like mutational processes in human somatic cells. Nat. Genet. 47, 1402–1407 (2015).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Lu, T. et al. REST and stress resistance in ageing and Alzheimer’s disease. Nature 507, 448–454 (2014).
Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).
Martincorena, I. et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
Lodato, M. A. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015).
Hazen, J. L. et al. The complete genome sequences, unique mutational spectra, and developmental potency of adult neurons revealed by cloning. Neuron 89, 1223–1236 (2016).
Bhagwat, A. S. et al. Strand-biased cytosine deamination at the replication fork causes cytosine to thymine mutations in Escherichia coli. Proc. Natl Acad. Sci. USA 113, 2176–2181 (2016).
Kucab, J. E. et al. A compendium of mutational signatures of environmental agents. Cell 177, 821–836 (2019).
Sala Frigerio, C. et al. On the identification of low allele frequency mosaic mutations in the brains of Alzheimer’s disease patients. Alzheimers Dement. 11, 1265–1276 (2015).
Abascal, F. et al. Somatic mutation landscapes at single-molecule resolution. Nature 593, 405–410 (2021).
Fu, H. et al. A tau homeostasis signature is linked with the cellular and regional vulnerability of excitatory neurons to tau pathology. Nat. Neurosci. 22, 47–56 (2019).
Leng, K. et al. Molecular characterization of selectively vulnerable neurons in Alzheimer’s disease. Nat. Neurosci. 24, 276–287 (2021).
Bohrson, C. L. et al. Linked-read analysis identifies mutations in single-cell DNA-sequencing data. Nat. Genet. 51, 749–754 (2019).
Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. Cell 176, 1282–1294 (2019).
Xing, D., Tan, L., Chang, C.-H., Li, H. & Xie, X. S. Accurate SNV detection in single cells by transposon-based whole-genome amplification of complementary strands. Proc. Natl Acad. Sci. USA 118, e2013106118 (2021).
Madabhushi, R. et al. Activity-induced DNA breaks govern the expression of neuronal early-response genes. Cell 161, 1592–1605 (2015).
Min, S. et al. Absence of coding somatic single nucleotide variants within well-known candidate genes in late-onset sporadic Alzheimer’s disease based on the analysis of multi-omics data. Neurobiol. Aging 108, 207–209 (2021).
Lee, M. H. et al. Somatic APP gene recombination in Alzheimer’s disease and normal neurons. Nature 563, 639–645 (2018).
Kim, J. et al. APP gene copy number changes reflect exogenous contamination. Nature 584, E20–E28 (2020).
Jager, M. et al. Deficiency of nucleotide excision repair is associated with mutational signature observed in cancer. Genome Res. 29, 1067–1077 (2019).
Mecocci, P., MacGarvey, U. & Beal, M. F. Oxidative damage to mitochondrial DNA is increased in Alzheimer’s disease. Ann. Neurol. 36, 747–751 (1994).
Chun, H. et al. Severe reactive astrocytes precipitate pathological hallmarks of Alzheimer’s disease via H₂O₂⁻ production. Nat. Neurosci. 23, 1555–1566 (2020).
Pao, P. C. et al. HDAC1 modulates OGG1-initiated oxidative DNA damage repair in the aging brain and Alzheimer’s disease. Nat. Commun. 11, 2484 (2020).
Nouspikel, T. & Hanawalt, P. C. Terminally differentiated human neurons repair transcribed genes but display attenuated global DNA repair and modulation of repair gene expression. Mol. Cell. Biol. 20, 1562–1570 (2000).
Seplyarskiy, V. B. et al. Error-prone bypass of DNA lesions during lagging-strand replication is a common source of germline and cancer mutations. Nat. Genet. 51, 36–41 (2019).
Huang, J. C., Svoboda, D. L., Reardon, J. T. & Sancar, A. Human nucleotide excision nuclease removes thymine dimers from DNA by incising the 22nd phosphodiester bond 5′ and the 6th phosphodiester bond 3′ to the photodimer. Proc. Natl Acad. Sci. USA 89, 3664–3668 (1992).
Gate, D. et al. Clonally expanded CD8 T cells patrol the cerebrospinal fluid in Alzheimer’s disease. Nature 577, 399–404 (2020).
Soheili-Nezhad, S., van der Linden, R. J., Olde Rikkert, M., Sprooten, E. & Poelmans, G. Long genes are more frequently affected by somatic mutations and show reduced expression in Alzheimer’s disease: Implications for disease etiology. Alzheimers Dement. 17, 489–499 (2020).
Crabtree, G. R. Our fragile intellect. Part I. Trends Genet. 29, 1–3 (2013).
Fragola, G. et al. Deletion of topoisomerase 1 in excitatory neurons causes genomic instability and early onset neurodegeneration. Nat. Commun. 11, 1962 (2020).
Gonzalez-Pena, V. et al. Accurate genomic variant detection in single cells with primary template-directed amplification. Proc. Natl Acad. Sci. USA 118, e2024176118 (2021).
Luquette, L. J. et al. Ultraspecific somatic SNV and indel detection in single neurons using primary template-directed amplification. Preprint at bioRxiv https://doi.org/10.1101/2021.04.30.442032 (2021).
Kaur, U. et al. Reactive oxygen species, redox signaling and neuroinflammation in Alzheimer’s disease: the NF-κB connection. Curr. Top. Med. Chem. 15, 446–457 (2015).
Butterfield, D. A., Castegna, A., Lauderback, C. M. & Drake, J. Evidence that amyloid beta-peptide-induced lipid peroxidation and its sequelae in Alzheimer’s disease brain contribute to neuronal death. Neurobiol. Aging 23, 655–664 (2002).
David, D. C. et al. Proteomic and functional analyses reveal a mitochondrial dysfunction in P301L tau transgenic mice. J. Biol. Chem. 280, 23802–23814 (2005).
Khurana, V. et al. A neuroprotective role for the DNA damage checkpoint in tauopathy. Aging Cell 11, 360–362 (2012).
Sakofsky, C. J. et al. Repair of multiple simultaneous double-strand breaks causes bursts of genome-wide clustered hypermutation. PLoS Biol. 17, e3000464 (2019).
Mandrekar-Colucci, S. & Landreth, G. E. Microglia and inflammation in Alzheimer’s disease. CNS Neurol. Disord. Drug Targets 9, 156–167 (2010).
Rottkamp, C. A. et al. Redox-active iron mediates amyloid-beta toxicity. Free Radic. Biol. Med. 30, 447–450 (2001).
Huang, A. Y. et al. Parallel RNA and DNA analysis after deep sequencing (PRDD-seq) reveals cell type-specific lineage patterns in human brain. Proc. Natl Acad. Sci. USA 117, 13886–13895 (2020).
Dean, F. B., Nelson, J. R., Giesler, T. L. & Lasken, R. S. Rapid amplification of plasmid and phage DNA using Phi 29 DNA polymerase and multiply-primed rolling circle amplification. Genome Res. 11, 1095–1099 (2001).
Evrony, G. D. et al. Single-neuron sequencing analysis of L1 retrotransposition and somatic mutation in the human brain. Cell 151, 483–496 (2012).
Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
Fan, J. et al. Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis. Nat. Methods 13, 241–244 (2016).
Dong, X. et al. Accurate identification of single-nucleotide variants in whole-genome-amplified single cells. Nat. Methods 14, 491–493 (2017).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Keogh, M. J. et al. High prevalence of focal and multi-focal somatic genetic variants in the human brain. Nat. Commun. 9, 4257 (2018).
Park, J. S. et al. Brain somatic mutations observed in Alzheimer’s disease associated with aging and dysregulation of tau phosphorylation. Nat. Commun. 10, 3090 (2019).
Luquette, L. J., Bohrson, C. L., Sherman, M. A. & Park, P. J. Identification of somatic mutations in single cell DNA-seq using a spatial model of allelic imbalance. Nat. Commun. 10, 3908 (2019).
Cai, X. et al. Single-cell, genome-wide sequencing identifies clonal somatic copy-number variation in the human brain. Cell Rep. 8, 1280–1289 (2014).
Baslan, T. et al. Genome-wide copy number analysis of single cells. Nat. Protoc. 7, 1024–1041 (2012).
Alexandrov, L. B., Nik-Zainal, S., Wedge, D. C., Campbell, P. J. & Stratton, M. R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 3, 246–259 (2013).
Blokzijl, F., Janssen, R., van Boxtel, R. & Cuppen, E. MutationalPatterns: comprehensive genome-wide analysis of mutational processes. Genome Med. 10, 33 (2018).
Kim, J. et al. Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors. Nat. Genet. 48, 600–606 (2016).
Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48 (2015).
Kuznetsova, A., Brockhoff, P. B. & Christensen, R. H. B. lmerTest Package: tests in linear mixed effects models. J. Stat. Softw. 82, 1–26 (2017).
Consortium, G. T. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Young, M. D., Wakefield, M. J., Smyth, G. K. & Oshlack, A. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biol. 11, R14 (2010).
Green, P. et al. Transcription-associated mutational asymmetry in mammalian evolution. Nat. Genet. 33, 514–517 (2003).
Polak, P. & Arndt, P. F. Transcription induces strand-specific mutations at the 5′ end of human genes. Genome Res. 18, 1216–1223 (2008).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Coppede, F. & Migliore, L. DNA damage and repair in Alzheimer’s disease. Curr. Alzheimer Res. 6, 36–47 (2009).
Hoang, M. L. et al. Genome-wide quantification of rare somatic mutations in normal human tissues using massively parallel sequencing. Proc. Natl Acad. Sci. USA 113, 9846–9851 (2016).
Franco, I. et al. Somatic mutagenesis in satellite cells associates with human skeletal muscle aging. Nat. Commun. 9, 800 (2018).
Zhang, L. et al. Single-cell whole-genome sequencing reveals the functional landscape of somatic mutations in B lymphocytes across the human lifespan. Proc. Natl Acad. Sci. USA 116, 9014–9019 (2019).
Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
Franco, I. et al. Whole genome DNA sequencing provides an atlas of somatic mutagenesis in healthy human cells and identifies a tumor-prone cell type. Genome Biol. 20, 285 (2019).
Brunet, J. P., Tamayo, P., Golub, T. R. & Mesirov, J. P. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl Acad. Sci. USA 101, 4164–4169 (2004).


Acknowledgements

We thank R. Mathieu and L. Cheemalamarri at the Boston Children’s Hospital and Harvard Stem Cell Institute Flow Cytometry Research Facility, R. S. Hill, the Research Computing group at Harvard Medical School and the Boston Children’s Hospital Intellectual and Developmental Disabilities Research Center (IDDRC) Molecular Genetics Core for assistance. We thank C. L. Bohrson for mutational signature discussions. The brain and nuclei in Fig. 1 were illustrated by A. Lai with input from the authors, and Fig. 4 was illustrated by K. Probst (Xavier Studio) with input from the authors. Human tissue was obtained from the Massachusetts Alzheimer’s Disease Research Center (1P30AG062421-01) and the NIH Neurobiobank at the University of Maryland, and we thank the donors and families for their contributions, and J. Gonzalez and P. Dooley for assistance with tissue procurement. This work was supported by K08 AG065502 (M.B.M.); T32 HL007627 (M.B.M.); the Brigham and Women’s Hospital Program for Interdisciplinary Neuroscience through a gift from L. and T. Rand (M.B.M.); the donors of the Alzheimer’s Disease Research program of the BrightFocus Foundation A20201292F (M.B.M.); the Doris Duke Charitable Foundation Clinical Scientist Development Award 2021183 (M.B.M.); T32 GM007753 (E.A.M.); T15 LM007098 (E.A.M.); R00 AG054748 (M.A.L.); K01 AG051791 (E.A.L.); the Suh Kyungbae Foundation (E.A.L.), DP2 AG072437 (E.A.L.); R01 NS032457-20S1 (C.A.W.); R01 AG070921 (C.A.W. and E.A.L.); the F-Prime Foundation (C.A.W.); and the Allen Discovery Center program, a Paul G. Allen Frontiers Group advised program of the Paul G. Allen Family Foundation (C.A.W. and E.A.L.). C.A.W. is an Investigator of the Howard Hughes Medical Institute.

Author information

These authors contributed equally: Michael B. Miller, August Yue Huang
These authors jointly supervised this work: Michael A. Lodato, Eunjung Alice Lee, Christopher A. Walsh

Authors and Affiliations

Division of Neuropathology, Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
Michael B. Miller
Division of Genetics and Genomics, Manton Center for Orphan Diseases, Boston Children’s Hospital, Boston, MA, USA
Michael B. Miller, August Yue Huang, Junho Kim, Zinan Zhou, Samantha L. Kirkham, Eduardo A. Maury, Hannah C. Reed, Jennifer E. Neil, Lariza Rento, Steven C. Ryu, Chanthia C. Ma, Michael A. Lodato, Eunjung Alice Lee & Christopher A. Walsh
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Michael B. Miller, August Yue Huang, Junho Kim, Eduardo A. Maury, Eunjung Alice Lee & Christopher A. Walsh
Department of Pediatrics, Harvard Medical School, Boston, MA, USA
Michael B. Miller, August Yue Huang, Junho Kim, Zinan Zhou, Samantha L. Kirkham, Eduardo A. Maury, Hannah C. Reed, Jennifer E. Neil, Lariza Rento, Steven C. Ryu, Chanthia C. Ma, Michael A. Lodato, Eunjung Alice Lee & Christopher A. Walsh
Department of Biological Sciences, Sungkyunkwan University, Suwon, South Korea
Junho Kim
Bioinformatics and Integrative Genomics Program, Harvard–MIT MD–PhD Program, Harvard Medical School, Boston, MA, USA
Eduardo A. Maury
Department of Molecular, Cell and Cancer Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
Jennifer S. Ziegenfuss & Michael A. Lodato
Allegheny College, Meadville, PA, USA
Hannah C. Reed
Howard Hughes Medical Institute, Boston, MA, USA
Jennifer E. Neil, Lariza Rento & Christopher A. Walsh
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Lovelace J. Luquette
Department of Pathology, University of Maryland School of Medicine, Baltimore, MD, USA
Heather M. Ames
Department of Pathology, Harvard Medical School, Massachusetts General Hospital, Boston, MA, USA
Derek H. Oakley & Matthew P. Frosch
Department of Neurology, Harvard Medical School, Massachusetts General Hospital, Boston, MA, USA
Matthew P. Frosch & Bradley T. Hyman
Department of Neurology, Harvard Medical School, Boston, MA, USA
Christopher A. Walsh

Authors

Michael B. Miller
August Yue Huang
Junho Kim
Zinan Zhou
Samantha L. Kirkham
Eduardo A. Maury
Jennifer S. Ziegenfuss
Hannah C. Reed
Jennifer E. Neil
Lariza Rento
Steven C. Ryu
Chanthia C. Ma
Lovelace J. Luquette
Heather M. Ames
Derek H. Oakley
Matthew P. Frosch
Bradley T. Hyman
Michael A. Lodato
Eunjung Alice Lee
Christopher A. Walsh

Contributions

E.A.L., M.A.L., M.B.M. and C.A.W. conceived and designed the study. M.B.M., M.A.L., Z.Z. and S.L.K. performed single-neuron sorting and sequencing. A.Y.H. performed bioinformatic analysis with assistance from J.K. and E.A.M. L.R., S.C.R., S.L.K. and C.C.M. performed quality control experiments. B.T.H., M.P.F., D.H.O., M.B.M. and H.M.A. provided clinico-pathological analysis and selection of disease cases. J.S.Z. optimized and performed immunofluorescent imaging and quantification, and generated data shown in this manuscript. H.C.R. independently performed exploratory immunofluorescent staining. L.J.L. provided expertise in variant analysis and SCAN-SNV calling. J.E.N. contributed tissue procurement and ethics expertise. E.A.L., C.A.W. and M.A.L. supervised the study. M.B.M., A.Y.H., M.A.L., C.A.W. and E.A.L. wrote the manuscript.

Corresponding authors

Correspondence to Michael A. Lodato, Eunjung Alice Lee or Christopher A. Walsh.

Ethics declarations

Competing interests

C.A.W. is a paid consultant (cash, no equity) to Third Rock Ventures and Flagship Pioneering (cash, no equity) and is on the Clinical Advisory Board (cash and equity) of Maze Therapeutics. No research support is received. These companies did not fund and had no role in the conception or performance of this research project. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature thanks Young Seok Ju and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Filtering of LiRA-called sSNVs to minimize single-cell artefacts from MDA amplification.

a, Total pre-filtering LiRA-called sSNVs per genome for control and AD single neurons. Single neuronal nuclei from prefrontal cortex (PFC) and hippocampal CA1 (HC) underwent scWGS (45X targeted average coverage). Genome-wide counts of sSNVs were determined using linked-read analysis (LiRA). Per-genome sSNV counts for all control and AD neurons are shown here, prior to signature-based filtering. b, Total pre-filtering LiRA-called sSNVs per genome plotted against raw LiRA-called sSNVs, an intermediate metric in the LiRA calling pipeline prior to power ratio adjustment for genome coverage and false positive rate. c, Single neuron sSNV counts in relation to coverage evenness of genome sequencing. Total pre-filtering LiRA-called sSNV counts from single neuronal nuclei are shown in relation to median absolute pairwise difference (MAPD) scores for the coverage evenness of each cell. At very high MAPD scores (>2.0), sSNV counts increase with MAPD, raising concern for artefactual sSNV calls in these cells owing to uneven genome coverage. d, e, Using NMF mutational signature analysis, the sSNV contribution was determined for two signatures potentially representing single-cell amplification artefacts: SBS scE and SBS scF²⁴. For each signature, the mutation type frequency for each trinucleotide context is shown above the sSNV plot. SBS scF is composed of C>T changes, while SBS scE is characterized by a particular subset of C>T, GC>GT. Signature SBS scE was elevated in cells with MAPD >2.0. The contribution of signature SBS scF was related to uneven amplification (high MAPD), perhaps owing to allele dropout causing single-strand lesions to be read as somatic mutations. A subset of AD neurons showed LiRA-called pre-filtering sSNV counts >20,000/neuron and a substantial component of potential artefact signature SBS scE. 
These neurons may represent an agonal ‘ultramutated’ state, but were not included in subsequent analyses owing to the abundance of potential artefact signature SBS scE (see g). f, Schematic for potential generation of artefactual sSNVs in scWGS owing to uneven coverage. The scWGS LiRA platform calls sSNVs that are linked by sequencing reads to heterozygous germline single nucleotide polymorphisms (SNPs) (left). A single-stranded lesion of DNA damage, such as oxidation or alkylation, is paired with an unmodified base on the opposite genomic strand, such that LiRA would not call an sSNV under conditions of sufficiently even sequencing coverage (middle). However, if severe non-uniformity in strand-specific amplification (strand dropout) occurred, the single-stranded DNA lesion (or a polymerase error on one strand) could be erroneously called as an sSNV (right). For this reason, severely uneven single-cell genome amplification could produce artefactual LiRA sSNV calls. g, Analysis pipeline for minimization of potential artefacts of single-cell genome amplification and sequencing. Using our observations and advances reported in Petljak et al.²⁴, we developed a computational pipeline to generate a set of higher-confidence filtered sSNV calls. This pipeline uses SNP-phased SNVs called by linked-read analysis (LiRA), and applies 3 additional specific steps to the initial variant call set: 1) Removal of single neurons that display widely uneven genome amplification, as indicated by MAPD score >2.0, above which the number of sSNVs increases (see c), raising concern for false-positive variant calls due to uneven genome coverage; 2) Removal of single neurons whose mutational profile is dominated by the potential artefact mutational signature SBS scE (see d); and 3) Removal from each neuron of the contribution of variants from the potentially artefactual signatures SBS scE and SBS scF. These steps produce counts of higher-confidence filtered sSNVs from single neurons. 
Although mutational signatures SBS scE and SBS scF have been previously reported as potential artefacts of single-cell genome amplification, the signal may nonetheless carry biological information. However, in this study we exclude these variants so as to minimize the influence of potential artefactual sSNV calls and to focus our analysis on the higher-confidence filtered sSNVs.
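The three filtering steps in g can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the `Neuron` record, the function `filtered_ssnv_counts` and the toy counts are hypothetical, and because the caption does not state what fraction counts as a profile "dominated" by SBS scE, the 0.5 threshold below is an arbitrary placeholder. Only the MAPD cutoff of 2.0 comes from the caption.

```python
from dataclasses import dataclass

MAPD_CUTOFF = 2.0  # stated in the caption
SCE_DOMINANCE_FRACTION = 0.5  # ASSUMPTION: placeholder, not given in the caption
ARTEFACT_SIGNATURES = ("SBS scE", "SBS scF")


@dataclass
class Neuron:
    name: str
    mapd: float
    signature_counts: dict  # signature name -> sSNVs attributed to it


def filtered_ssnv_counts(neurons, sce_fraction=SCE_DOMINANCE_FRACTION):
    """Return {neuron name: higher-confidence sSNV count} for retained neurons."""
    results = {}
    for neuron in neurons:
        total = sum(neuron.signature_counts.values())
        # Step 1: drop neurons with widely uneven amplification.
        if neuron.mapd > MAPD_CUTOFF:
            continue
        # Step 2: drop neurons whose profile is dominated by SBS scE.
        sce = neuron.signature_counts.get("SBS scE", 0)
        if total > 0 and sce / total > sce_fraction:
            continue
        # Step 3: subtract the contribution of both candidate artefact signatures.
        artefact = sum(neuron.signature_counts.get(s, 0) for s in ARTEFACT_SIGNATURES)
        results[neuron.name] = total - artefact
    return results


# Toy input (hypothetical numbers, not data from the study).
neurons = [
    Neuron("n1", 1.2, {"A": 900, "C": 300, "SBS scE": 100, "SBS scF": 200}),
    Neuron("n2", 2.6, {"A": 800, "SBS scE": 400}),    # fails step 1
    Neuron("n3", 1.5, {"A": 500, "SBS scE": 20000}),  # fails step 2
]
print(filtered_ssnv_counts(neurons))  # {'n1': 1200}
```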

Source data

Extended Data Fig. 2 Single-cell variant calling identifies high-confidence sSNVs.

To assess the quality of the sSNVs identified from single-cell MDA-amplified WGS data, we compared their variant allele fractions in control and AD neurons to those of phaseable high-confidence heterozygous germline SNVs from the same neurons, shown for each base change type. The distributions between somatic and germline SNVs are comparable, indicating the validity of the somatic mutation calling method, as has been previously reported for the LiRA calling method^5,23.

Source data

Extended Data Fig. 3 sSNVs in neurotypical control and AD neurons, normalized by evenness of genome amplification or LiRA caller power ratio.

To assess sSNV burden, as determined by the variant calling approach used in this study, we plotted sSNV counts from MDA-amplified single neurons against age, including sSNV counts normalized for two distinct measures of evenness of genome coverage: median absolute pairwise difference (MAPD) and coefficient of variation (CoV). We also normalized by the power ratio used in LiRA phasing-based sSNV detection (see Methods). a–d, sSNVs per genome for neurotypical control neurons, with mixed-effects modelling trend lines for ageing. We observed a significant age-dependent increase of sSNV burden in each analysis, with the slope for human pyramidal neurons ranging from 16.4 sSNV/yr to 21.1 sSNV/yr, depending on the method of adjustment for genome coverage evenness. For analysis of PFC region cells alone, we observed a similar range of slopes: 16.8 sSNV/yr to 21.3 sSNV/yr. e–h, sSNVs in AD compared to neurotypical control neurons. e, Unadjusted for evenness (reproduced from Fig. 1h), AD neurons show a mean of 2672 (range 783–8990) sSNVs, an excess of 971 over controls (P = 6.5 × 10⁻⁵, linear mixed model). f, Normalized for MAPD, AD neurons show a mean of 1582 (range 33–8366) sSNVs, an excess of 480 over controls (P = 0.01, linear mixed model). g, Normalized for CoV, AD neurons show a mean of 2264 (range 68–8861) sSNVs, an excess of 831 over controls (P = 6.7 × 10⁻⁵, linear mixed model). h, Normalized for power ratio, AD neurons show a mean of 2015 (range 162–7892) sSNVs, an excess of 511 over controls (P = 7.2 × 10⁻³, linear mixed model). In each analysis, AD neurons showed a significantly greater number of sSNVs than control neurons. Although some normalizations may reduce the detection of biological differences in AD specimens, sSNV differences are retained even after normalization, supporting a difference in sSNV burden between AD and control neurons.
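For readers unfamiliar with the two evenness measures named above, the following sketch shows how they are commonly computed from binned coverage. The definitions used here are assumptions rather than the study's exact implementation: MAPD is taken as the median absolute difference of log2 copy-number ratios between neighbouring bins, and CoV as the population standard deviation of bin depth divided by its mean. Function names and the toy bins are hypothetical.

```python
from math import log2
from statistics import mean, median, pstdev


def mapd(bin_ratios):
    """Median absolute pairwise difference of log2 ratios in adjacent bins."""
    diffs = [abs(log2(a) - log2(b)) for a, b in zip(bin_ratios, bin_ratios[1:])]
    return median(diffs)


def coefficient_of_variation(bin_depths):
    """Population standard deviation of bin depth divided by mean bin depth."""
    return pstdev(bin_depths) / mean(bin_depths)


# Toy coverage profiles (hypothetical): perfectly even versus strongly uneven.
even = [1.0, 1.0, 1.0, 1.0, 1.0]
uneven = [1.0, 4.0, 1.0, 4.0, 1.0]

print(mapd(even), mapd(uneven))  # 0.0 2.0
print(coefficient_of_variation(even))  # 0.0
print(coefficient_of_variation(uneven) > coefficient_of_variation(even))  # True
```

Both metrics rise as amplification becomes less uniform, which is why either can serve as a covariate when comparing sSNV counts between cells.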

Source data

Extended Data Fig. 4 Distribution of sSNVs in relation to gene position comparing AD and age-matched control neurons.

a, sSNVs per neuron across different categories of genomic regions, based on position relative to gene structure. b, Proportional distribution of sSNVs in AD and control cases across different categories of genomic regions. Upstream and downstream regions were defined as the genomic regions within 1 kb of the transcription start and end sites, respectively. Each proportion is normalized by the expected proportion after controlling for trinucleotide context of phaseable regions. c, Proportional distribution of sSNVs relative to gene transcript length. The proportions for control or AD sSNVs were normalized by the expected proportion after controlling for trinucleotide context of phaseable regions. For each set, mean ± SEM is shown. For b, c, a P value is shown only where the difference between AD and control is statistically significant (two-tailed t-test). AD neurons show a trend towards an excess of sSNVs in upstream positions relative to controls (not surviving Bonferroni correction). Data in this figure were obtained by MDA amplification of single genomes of neurons.
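The normalization described for panels b and c amounts to an observed-over-expected ratio. The sketch below assumes the expected proportions have already been computed elsewhere (in the study they are derived by controlling for the trinucleotide context of phaseable regions, which is not reproduced here); the region names and numbers are hypothetical.

```python
def normalized_proportions(observed_counts, expected_proportions):
    """Divide each region's observed share of sSNVs by its expected share.

    A value of 1.0 means the region carries as many sSNVs as expected;
    values above 1.0 indicate enrichment and values below 1.0 depletion.
    """
    total = sum(observed_counts.values())
    if total == 0:
        raise ValueError("no sSNVs observed")
    return {
        region: (count / total) / expected_proportions[region]
        for region, count in observed_counts.items()
    }


# Hypothetical counts and expected proportions, for illustration only.
observed = {"exonic": 10, "intronic": 40, "intergenic": 50}
expected = {"exonic": 0.05, "intronic": 0.45, "intergenic": 0.50}
ratios = normalized_proportions(observed, expected)
```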

Source data

Extended Data Fig. 5 Somatic mutation trinucleotide context profiles and signature derivation in MDA-amplified single-neuron genomes.

a, Trinucleotide context somatic mutation profiles in AD and control neurons. Mutations called by LiRA are shown by base substitution change (bar colour), separated for each of the 16 possible trinucleotide contexts for each substitution (96 total trinucleotide contexts). For each brain region profiled, the aggregate is shown for AD cases, neurotypical controls, and the difference (residual of case mutations minus control mutations). b, Signature metrics for de novo mutational signature derivation from neurons in this study. Using the frequency of sSNVs in their trinucleotide context for all control and AD neurons, we fitted mutational signatures with an NMF-based framework. We identified four signatures, N1–N4, that maximize the cophenetic correlation of the decomposition⁸¹. c, sSNV mutational signatures evaluated in this study. We performed de novo mutational signature generation using NMF (MutationalPatterns and SignatureAnalyzer) on the set of scWGS data from single neurons from AD and neurotypical controls; each tool produced 4 highly similar signatures by best fit. Previously published analysis of single neurons (Lodato et al.)⁵ during ageing produced 3 signatures: A, B, and C. A recently published study of cultured cells (Petljak et al.)²⁴ identified signatures thought to represent artefacts of scWGS, including SBS scE and SBS scF. d, Variation between neurons in mutational signature contributions. We performed linear regression for signature contribution with respect to age and disease status. The residual signature contribution of each neuron for signature A and signature C is shown here, for each disease group. Also shown are the mean (bar) ± standard deviation (boxes), with the range (whisker lines). In addition to the neurotypical control and AD neurons reported in this manuscript, we also performed this analysis on previously reported single human neuron data for two NER-deficiency diseases: Cockayne syndrome (CS) and xeroderma pigmentosum (XP)⁵. Because only PFC was studied for CS and XP, only the control and AD neurons from PFC were used for this analysis. For each disease group, signature C showed a greater standard deviation than signature A; standard deviation ratios between signatures C and A are as follows: 1.2 (control), 1.2 (AD), 3.2 (CS), and 1.1 (XP). Data were obtained from MDA amplification of single neuron genomes. Boxplots show mean ± SD, with whiskers denoting minima and maxima.
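The count of 96 trinucleotide contexts in panel a follows from 6 pyrimidine-centred substitution classes, each flanked by one of 4 bases on the 5′ side and one of 4 on the 3′ side (6 × 16 = 96). A minimal sketch of that enumeration is given below; the label format `A[C>A]A` is a common convention assumed here for illustration, not necessarily the one used in the figure.

```python
from itertools import product

BASES = "ACGT"
# The six substitution classes, expressed with the pyrimidine as reference base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def trinucleotide_contexts():
    """Return the 96 substitution-in-context labels, grouped by substitution."""
    contexts = []
    for substitution in SUBSTITUTIONS:
        for five_prime, three_prime in product(BASES, repeat=2):
            contexts.append(f"{five_prime}[{substitution}]{three_prime}")
    return contexts


contexts = trinucleotide_contexts()
print(len(contexts))  # 6 substitutions x 16 flanking-base pairs
```

A per-neuron mutation profile is then a vector of 96 counts in this order, which is the input that NMF-based signature tools decompose.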

Source data

Extended Data Fig. 6 COSMIC mutational signature contributions to single-neuron signatures and disease-related mutational patterns.

a, The set of trinucleotide contexts in single neuron signatures derived in the prior study (signatures A and C)⁵, along with single neuron signatures derived de novo from single AD and control neurons (signatures N4 and N2 derived using MutationalPatterns, and signatures W3 and W2 derived using SignatureAnalyzer) were analysed for contributions by COSMIC v3 single base substitution mutational signatures by NMF. The matching prior and de novo signatures show highly similar COSMIC signature contributions. b, The set of mutation trinucleotide contexts present in AD and control neuron genomes amplified by MDA, as well as the matrix of mutations obtained by subtracting control from AD (AD residual), were analysed for contributions by COSMIC signatures. Multiple COSMIC signatures identified here, many of which also contribute to signature C⁵, are associated with transcription-coupled nucleotide excision repair at particular damaged nucleotides with specific resultant base changes, including: SBS8 (guanine damage, C>A mutations), SBS22 (adenine damage, T>A mutations), SBS12 (adenine damage, T>C mutations), and SBS19 (guanine damage, C>T mutations). Other signatures have been associated with deficiencies of separate DNA repair processes: SBS6 (mismatch repair) and SBS30 (base excision repair). SBS5, associated with ageing, contributes significantly to the control and AD samples, but not to the AD residual mutations.

Source data

Extended Data Fig. 7 Immunofluorescent detection of nucleotide oxidation in neurons.

Immunofluorescence was performed on post-mortem human brain prefrontal cortex. NeuN (AF488) was used to label neurons and 8-oxoG (AF555) was used to label oxidized guanine nucleotides. a, For each case sample, in a full microscopic field of up to 100 NeuN+ neurons, 8-oxoG signal was quantified per neuron. Here, each data point represents the 8-oxoG signal from one neuron, with mean and SEM shown in black for each case. Figure 2f shows mean 8-oxoG values of each case in relation to age and disease status. b, Representative microscopy images (turquoise or purple boxes) are shown for neurotypical control and AD samples from a. n = 100 total neurons examined per case (50 neurons from each of two independent staining experiment batches). NeuN+ neurons are shown in green and 8-oxoG in greyscale or magenta. Scale bars represent 60 µm.
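The per-case summary plotted in panel a (mean and SEM of the per-neuron 8-oxoG signal) can be sketched as follows. This assumes SEM is the sample standard deviation divided by the square root of the number of neurons; the intensity values are hypothetical and the image quantification itself is not shown.

```python
import math
import statistics


def mean_and_sem(signals):
    """Return the mean and standard error of the mean for per-neuron signals."""
    n = len(signals)
    if n < 2:
        raise ValueError("at least two neurons are needed to estimate the SEM")
    mean = statistics.fmean(signals)
    sem = statistics.stdev(signals) / math.sqrt(n)  # sample SD (n - 1 denominator)
    return mean, sem


# Hypothetical 8-oxoG intensities for five neurons from one case.
case_mean, case_sem = mean_and_sem([1.0, 2.0, 3.0, 4.0, 5.0])
```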

Source data

Extended Data Fig. 8 Features of somatic mutations in single neurons assessed by PTA.

a, Trinucleotide somatic mutation spectra of cells or bulk samples studied by various methods were compared. For PTA-amplified single neurons, the aggregate of mutations is shown for AD cases, age-matched neurotypical controls, and the residual (net increase of case mutations over control mutations). Mutational spectra from other methods include NanoSeq-studied bulk samples from AD or controls and META-CS single neuron data for double-stranded mutations or single-stranded DNA lesions. Mutations are shown by base substitution change (bar colour). Of note, single-stranded DNA lesions show a distinct profile from mutations detected by PTA, NanoSeq, and META-CS. b, The spectra of mutations detected in PTA-amplified neurons (AD, control, and AD residual) and from other published methods were analysed for contributions by COSMIC cancer signatures. Elements of COSMIC signatures identified in the AD residual mutation set, including SBS8, also contribute to signature C⁵. Of note, single-stranded DNA lesions show a distinct profile from mutations detected by PTA, NanoSeq, and META-CS. c–f, sSNVs detected using PTA in AD and neurotypical control neurons, normalized by evenness of genome amplification or LiRA caller power ratio. c, Total sSNVs per genome plotted against age (uncorrected, reproduced here from Fig. 3a for comparison). AD neurons show a mean of 1419 (range 514–2157) sSNVs, an excess of 196 over controls (P = 3.9 × 10⁻⁴, linear mixed model). d, MAPD-normalized sSNVs per genome, from which AD neurons show a mean of 1703 (range 814–2748) sSNVs, an excess of 453 over controls (P = 2.7 × 10⁻⁶, linear mixed model). e, CoV-normalized sSNVs per genome, from which AD neurons show a mean of 1440 (range 527–2255) sSNVs, an excess of 189 over controls (P = 5.3 × 10⁻⁴, linear mixed model). f, Power-normalized sSNVs per genome, from which AD neurons show a mean of 1423 (range 517–2166) sSNVs, an excess of 198 over controls (P = 3.8 × 10⁻³, linear mixed model). In each analysis, AD neurons showed a significantly greater number of sSNVs than control neurons.

Source data

Extended Data Table 1 Studies of sSNV rates and signatures during ageing in various human cell types^5,6,7,76,77,78,79,80

Supplementary information

Reporting Summary

Peer Review File

Supplementary Table 1

Sample information. The Sample Information tab contains detailed information for the 26 individuals in the present study. PMI = Post-Mortem Interval; SIDS = Sudden Infant Death Syndrome; MVA = Motor Vehicle Accident; HASCVD = Hypertensive Atherosclerotic Cardiovascular Disease; COPD = Chronic Obstructive Pulmonary Disease; RIN = RNA Integrity Number. The Library and Sequencing tabs contain information on each cell sequenced, for single-neuron genomes amplified with multiple displacement amplification (MDA) or primary template-directed amplification (PTA).

Supplementary Table 2

Sequencing statistics for WGS datasets. Tabs show the respective sequencing statistics for single-neuron genomes amplified with MDA or PTA.

Supplementary Table 3

sSNV candidates identified in each neuron. CC denotes Composite Coverage, an integer coverage-based quality metric for each putative sSNV²². Linked 1K Genomes SNP refers to the linked germline anchor site used to phase mutation calls. Orientation refers to whether the somatic alternate allele was on the same haplotype as the germline alternate allele (cis) or whether the two alternate alleles were on opposite haplotypes (trans). The two tabs show the respective characteristics for neuron genomes amplified with MDA or PTA.

Supplementary Table 4

sSNV counts per neuron. Mean estimates of sSNV count per gigabase pair (Gbp) are provided with lower and upper bounds. ‘Phaseable Mutations Identified’ reflects the number of sSNV candidates passing the listed CC threshold. The estimated number of autosomal sSNVs was determined by multiplying the sSNV rate per Gbp by the size of the autosomal genome. Differences between the number of identified phaseable mutations and the estimated rates reflect the ‘Power ratio’ extrapolation based on power analysis (see Methods). Filtering of estimated SNVs reflects removal of potential artefact signatures (see Methods). Median absolute pairwise difference (MAPD) and coefficient of variation (CoV) are measures of the unevenness of genome amplification.
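The genome-wide extrapolation stated above (rate per Gbp multiplied by autosomal genome size) can be sketched as below, applied to the mean estimate and both bounds. The autosomal size of 2.9 Gbp and the rates are placeholder values for illustration, not figures taken from the study or from Supplementary Table 4; the power-ratio step that yields the per-Gbp rate is described in Methods and is not reproduced.

```python
def estimate_autosomal_ssnvs(rate_per_gbp, autosomal_size_gbp):
    """Scale a per-Gbp sSNV rate to the whole autosomal genome."""
    if rate_per_gbp < 0 or autosomal_size_gbp <= 0:
        raise ValueError("rate must be non-negative and genome size positive")
    return rate_per_gbp * autosomal_size_gbp


# Placeholder autosomal genome size in Gbp (assumption; depends on the reference build).
AUTOSOMAL_SIZE_GBP = 2.9

# Hypothetical lower bound, mean and upper bound of the sSNV rate per Gbp.
lower, mean, upper = (
    estimate_autosomal_ssnvs(rate, AUTOSOMAL_SIZE_GBP)
    for rate in (400.0, 500.0, 600.0)
)
```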

Supplementary Table 5

Exonic sSNVs identified across datasets. Predicted functional effects are annotated.

Supplementary Table 6

Gene Ontology terms enriched for sSNVs.

About this article

Cite this article

Miller, M.B., Huang, A.Y., Kim, J. et al. Somatic genomic changes in single Alzheimer’s disease neurons. Nature 604, 714–722 (2022). https://doi.org/10.1038/s41586-022-04640-1

Received: 12 June 2020
Accepted: 14 March 2022
Published: 20 April 2022
Issue Date: 28 April 2022
DOI: https://doi.org/10.1038/s41586-022-04640-1
Springer Nature Limited
