Abstract
Understanding the molecular mechanisms underlying frontotemporal dementia (FTD) is essential for the development of successful therapies. Systematic studies on human post-mortem brain tissue of patients with genetic subtypes of FTD are currently lacking. The Risk and Modyfing Factors of Frontotemporal Dementia (RiMod-FTD) consortium therefore has generated a multi-omics dataset for genetic subtypes of FTD to identify common and distinct molecular mechanisms disturbed in disease. Here, we present multi-omics datasets generated from the frontal lobe of post-mortem human brain tissue from patients with mutations in MAPT, GRN and C9orf72 and healthy controls. This data resource consists of four datasets generated with different technologies to capture the transcriptome by RNA-seq, small RNA-seq, CAGE-seq, and methylation profiling. We show concrete examples on how to use the resulting data and confirm current knowledge about FTD and identify new processes for further investigation. This extensive multi-omics dataset holds great value to reveal new research avenues for this devastating disease.
Similar content being viewed by others
Background & Summary
Frontotemporal Dementia (FTD) is a devastating pre-senile dementia characterized by progressive deterioration of the frontal and anterior temporal lobes1. The most common symptoms include severe changes in social and personal behaviour as well as a general blunting of emotions. Clinically, genetically, and pathologically there is considerable overlap with other neurodegenerative diseases including Amyotrophic Lateral Sclerosis (ALS), Progressive Supranuclear Palsy (PSP) and Cortical Basal Degeneration (CBD)2. Research into FTD has made major advances over the past decades. Up to 40% of cases3 have a positive family history and up to 60% of familial cases can be explained by mutations in the genes Microtubule Associated Protein Tau (MAPT), Granulin (GRN) and C9orf724 which has been key to the progress in our understanding of its molecular basis. Several other disease-causing genes have been identified that account for a much smaller fraction of cases5. Mutations in MAPT lead to accumulation of the Tau protein in neurofibrillary tangles in the brain of patients while mutations in GRN and C9orf72 lead to the accumulation of TDP-436, as well as dipeptide repeat proteins (DPRs) and RNA foci in the case of C9orf727.
As of today, no therapy exists that halts or slows the neurodegenerative process of FTD and to develop successful therapies there is an urgent need to determine whether a common target and therapy can be identified that can be exploited for all patients, or whether the distinct genetic, clinical, and pathological subgroups need tailored treatments. Therefore, the development of remedies relies heavily on a better understanding of the molecular and cellular pathways that drive FTD pathogenesis in all FTD subtypes.
Although our knowledge of FTD pathogenesis using molecular and cellular biology approaches has significantly advanced during recent years, a deep mechanistic understanding of the pathological pathways requires simultaneous profiling of multiple regulatory mechanisms.
Post-mortem human brain tissue is an important source for studying the disturbance of molecular processes in patients with FTD. However, systematic studies on human post-mortem brain tissue with genetic subtypes of FTD are currently lacking. Therefore, the Risk and modifying factors in Frontotemporal Dementia (RiMod-FTD) consortium has generated a multi-omics data resource with the focus on mutations in the three most common causal genes for FTD: MAPT, GRN and C9orf72.
Here, we report four datasets from the frontal lobe of post-mortem human brain from RiMod-FTD samples. We present RNA-seq, CAGE-seq (Cap analysis gene expression sequencing), small RNA-seq (smRNA-seq), and methylation datasets from matched samples, which enable precise profiling of transcriptional dysregulations in genetic FTD subtypes (Fig. 1, Table 1). The RNA-seq dataset can be used to identify general transcriptional differences in the FTD subtypes as well as potential alternative splicing events. The regulation of the transcriptome can be studied using the other three datasets. CAGE-seq enables the detection of active or inactive promoters and thereby helps to pinpoint potentially disease-relevant transcription factors. Methylation and smRNA-seq datasets allow researchers to identify regulatory mechanisms that lead to down-regulation of certain genes. These different aspects of the transcriptome can be studied in detail due to matched samples for all these datasets.
We show that known differences and commonalities of the investigated FTD subtypes can be recapitulated with initial analyses of the datasets, such as for instance neuronal loss due to neurodegeneration. More importantly, interesting new aspects about these FTD subtypes can be easily obtained from this dataset. We therefore believe that a thorough analysis of these data from the RiMod-FTD project can reveal helpful new insights about FTD and believe that researchers will find these datasets helpful for their own studies. To make the data easily accessible, we provide a gene browser which allows to compare expression and methylation information between disease groups for specific genomic locations at www.rimod-ftd.org.
Methods
Donor samples employed in this study
The brain samples and/or bio samples were obtained from The Netherlands Brain Bank, Netherlands Institute for Neuroscience, Amsterdam (open access: www.brainbank.nl). All Material has been collected from donors for or from whom a written informed consent for a brain autopsy and the use of the material and clinical information for research purposes had been obtained by the NBB. Additional samples were provided by the Queen Square Brain Bank of Neurological Disorders and MRC, King College London. We complied to all relevant regulations of the mentioned brain banks for the work with these samples.
Gyrus frontalis medialis (GFM) tissue from each subject was divided into three pieces for transcriptomic, proteomic, and epigenetic experiments in a dry-ice bath using precooled scalpels and plasticware.
Genetic analysis
Genomic DNA was isolated from 50 mg of GFM frozen brain tissue by using the Qiamp DNA mini kit (Qiagen) following the manufacturer protocol. DNA concentration and purity were assessed by nanodrop measurement. DNA integrity was evaluated by loading 100 nanogram per sample on a 0,8% agarose gel and comparing size distribution to a size standard.
Presence of C9orf72 hexanucleotide repeat expansion (HRF) in post-mortem brain tissues was confirmed by primed repeat PCR according to established protocols. Reported mutations for MAPT and GRN were verified by sanger sequencing.
Transcriptomic procedures
RNA isolation from human brain tissue
Total RNA for CAGE-seq and RNA-seq was isolated from ±100 mg of frozen brain tissue with TRIzol reagent (Thermo Fischer Scientific) according to the manufacturer recommendation, followed by purification with the RNeasy mini columns (Qiagen) after DNAse treatment.
Total RNA for smRNA-seq was isolated from frozen tissue using the TRIzol reagent (ThermoFischer Scientific). After isopropanol precipitation and 80% ethanol rinsing RNA pellet was resuspended in RNAse free water and up to 10 micrograms of RNA was incubated with 2U of Ambion DNAse I (ThermoFischer) at 37 °C for 20 minutes. DNA-free RNA samples were then further purified by phenol-chloroform-isoamyl-alchol extraction followed by ethanol precipitation.
RNA QC
For each RNA sample, RNA concentration (A260) and purity (A260/280 and A260/230) were determined by Nanodrop measurement and RNA integrity (RIN) was assessed on a Bioanalyser 2100 system and/or Tape station 41200 (Agilent Technologies Inc.)
RNA-seq libraries
Total RNA-seq libraries were prepared from 1 microgram of total RNA from frozen brain tissue using the TruSeq Stranded Total RNA with Ribo-Zero Gold kit (Illumina) according to the protocol specifications. RNA-seq libraries were sequenced on a Hiseq2500 and HISeq4000 on a 2 × 100 bp paired end (PE) flow cell (Illumina) at an average of 100 M PE/sample.
CAGE-seq libraries
CAGE-seq libraries were prepared from 5 micrograms of RNA from frozen brain tissues according to a published protocol8. Libraries were sequenced on a HiSeq2000 and/or HiSeq2500 on a 1 × 50 bp single read (SR) flow cell (Illumina) at an average of 20 M reads/sample. On average, 21,212,923 reads were generated per sample.
smRNA-seq libraries
The small RNA-seq libraries were prepared in two different batches. They were prepared from frozen tissue starting from 2 micrograms of total RNA using the Nextflex Small RNA-seq kit v3 (Bioo Scientific) and the NEBNext Small RNA library prep set for Illumina (New England Biolabs), respectively. Libraries were sequenced on a NextSeq550 on a 75 cycles flow cell.
Methylation assay
To assess the methylation status of over 850000 CpG sites in promoter, gene body and enhancer regions we have used the MethylationEPIC bead chip arrays (Illumina).
Bisulfite conversion of genomic DNA, genome amplification, hybridization to the beadchips, washing, staining, and scanning procedure was performed by Atlas Biolabs (Atlas Biolabs, Berlin, Germany). Cases and controls DNAs were distributed randomly across each array.
RNA-seq processing and analysis
Raw FastQ files were processed using the RNA-seq pipeline from nf-core (nf-core/rnaseq v1.3)9, with trimming enabled. Gene quantification was subsequently done using Salmon (v0.14.1)10 on the trimmed FastQ files. Alignment and mapping were performed against the human genome hg38. On average, 82,165,192 reads could be uniquely mapped, which relates to on average 88.6% uniquely mapped reads per sample (Supplementary Table 2). In total, 59,270 transcripts could be identified. DESeq2 (v.1.26.0)11 was used to perform differential expression analysis. We corrected for the covariates sex and PH-value. For visualization and clustering, the data was transformed using the variance stabilization transformation from the DESeq2 package.
Cell type deconvolution
We performed cell type deconvolution on the RNA-seq data using Scaden12. For training we used the human brain training dataset used in the Scaden publication. Each ensembl model was trained for 5000 steps. Cell type deconvolution was then performed with the trained Scaden model on the RNA-seq count data. Relative changes in cell type composition were quantified by first calculating the average fractions of a cell type for all groups and then calculating the percentual change of cell fractions compared to the average control fractions. This allows to detect relative changes in cell type compositions. To test for statistical differences between disease groups, we have performed an ANOVA test with a post-hoc Tukey HSD test using R.
CAGE-seq processing and analysis
Sequencing adapters and barcodes in CAGE-seq FastQ files were trimmed using Skewer (v.0.1.126)13. Sequencing artefacts were removed using TagDust (v1.0)14. Processed reads were then aligned against the human genome hg38 using STAR (v.2.4.1)15. On average, 16,306,077 could be uniquely mapped per sample (76% uniquely mapped on average reads per sample, Supplementary Table 2). CAGE detected TSS (CTSS) files were created using CAGEr (v1.10.0)16. With CAGEr, we removed the first G nucleotide if it was a mismatch. CTSS were clustered using the ‘distclu’ method with a maximum distance of 20 bp. For exact commands used we refer to the reader to the scripts used in this pipeline: https://github.com/dznetubingen/cageseq-pipeline-mf. In total, we could identify 47,298 different peaks. Data was normalized to counts per million (CPM) for visualization on the website.
smRNA-seq processing and analysis
After removing sequencing adapters, all FastQ files were uploaded to OASIS217 for analysis. On average, 3,430,613 (+/− 1,365,407) reads could be uniquely mapped per sample (Supplementary Table 2). Subsequent differential expression analysis was performed on the counts yielded from OASIS2, using DESeq2 and correcting for sex and PH-value, as was done for the RNA-seq data. Additionally, we added a batch variable to the design matrix to correct for the two different batches of this dataset. In total, 2904 human smRNA genes could be detected. For visualization and clustering, the data was transformed using the variance stabilization transformation from the DESeq2 package.
Methylation data processing and analysis
The Infinium MethylationEPIC BeadChip, which consists of 866,091 CpG locations, data was analyzed using the minfi R package18. We removed all sites with a detection P-value above 0.01, on sex chromosomes and with single nucleotide polymorphisms (SNPs), leaving 810,290 loci for analysis. Data normalization was done using stratified quantile normalization. Sites with a standard deviation below 0.1 were considered uninformative and filtered out, leaving 170,595 sites that were used to perform a principal component analysis.
Additional sequencing quality control
To gather additional sequencing quality control metrics, all sequencing assays (RNA-seq, CAGE-seq, smRNA-seq) were aligned against the hg38 reference genome (RNA-seq, CAGE-seq) or miRbase (smRNA-seq) using bowtie219. Subsequently, the CollectAlignmentSummaryMetrics and CollectGcBiasMetrics from the Picard toolkit (https://broadinstitute.github.io/picard/) were run on the aligned data. Sample swap analysis on sequencing assays was performed with reads aligned against the hg38 reference genome using SMaSH20. We verified sample identify between assays by checking the sample with the most significant p-Value between assays up to a p-value of 0.2. This was not possible for all combinations of samples and assays as insufficient variant overlap led to bad p-values, especially between CAGE-seq and smRNA-seq. However, no sample swaps could be detected and for the majority of sample comparisons the correct identity could be verified. We furthermore verified the sex of all samples by comparing the ratio of reads mapping to the X and Y chromoses.
Data Records
All datasets have been submitted to ArrayExpress with the following accessions: E-MTAB-1264721 (RNA-seq), E-MTAB-1264622 (CAGE-seq), E-MTAB-1267423 (Methylation) and E-MTAB-1273124 (smRNA-seq). The sequencing datasets (RNA-seq, smRNA-seq and CAGE-seq) contain raw FastQ files and metadata information. The methylation dataset contains raw intensity values and metadata information. Processed data in the form of count tables have been uploaded to FigShare for ease of access: https://doi.org/10.6084/m9.figshare.23825595.v125. The cell type fractions from the deconvolution analysis have been uploaded to FigShare as well.
Technical Validation
RNA integrity
We only included samples with an RNA integrity value above 5. The mean RIN value is 6.9 with a standard deviation of 1.0.
Sample identity
We confirmed the sex of all samples by comparing the ratio of reads mapping to the X and Y chromosomes. All sequencing assays (RNA-seq, CAGE-seq, smRNA-seq) were checked for potential sample swaps using the SMaSH software20.
Sample statistics
To gain an overview of the dataset, general sample statistics are visualized in Fig. 2. Samples from the control group have a higher age compared to FTD samples, as FTD patients die at a younger age because of the disease. Percentage of different sexes are similar in all groups, with slightly higher percentages of female samples. FTD samples furthermore have, on average, a lower RNA integrity value and a lower pH value compared to healthy controls. The postmortem duration (PMD) is very similar across all groups. All metadata is provided in the supplements (Supplementary Table 1). Additional QC metrics were generated with the Picard toolkit and are provided as supplementary data (supplementary Tables 3 and 4).
Methylation data quality control and normalization
The detection P-value has been calculated for all positions and samples. This value is calculated by comparing the total DNA signal to the background level, which is estimated using negative control positions. All samples had very low detection P-values, with a maximum value of 0.0005 (Fig. 3a). We used stratified quantile normalization to normalize the methylation signal for every position. From Fig. 3a,b it is visible that this normalization can be used to even out the signal between samples in this dataset.
Cell type composition of samples
To evaluate how the predicted cell type composition of the RiMod-FTD samples reflects the disease, we performed cell type deconvolution with the RNA-seq data using the Scaden algorithm (see Methods). As expected, neuronal cells make up the largest fractions for all samples, although neuronal fractions are smaller in samples with FTD (Fig. 4c). Comparison of percentage changes of average cell type fractions compared to controls (Fig. 4a) reveals that microglial fractions are particularly large in samples with FTD-GRN. Furthermore, all disease subtypes show a strong increase in endothelial cell fractions compared to controls. The decrease in neuronal cells is largely caused by a smaller number of excitatory neurons in all disease subtypes, while inhibitory neuron fractions are not decreasing. A statistical comparison of cellular fractions between groups using an ANOVA with post-hoc Tukey test showed that many of these composition changes are significant for FTD-MAPT and FTD-GRN (Fig. 4b). In the FTD-C9orf72 group, no cell composition change is significant, indicating a less strong change in cellular composition in this group. Generally, these results are in accordance with recent findings about FTD biology, such as the involvement of excitatory neurons26,27, the increase microglial inflammatory response in FTD-GRN28, or very recently, neurovascular dysfunctions in FTD-GRN29.
Usage Notes
Inspection of assay results in RiMod-FTD browser
The data from the different assays of the RiMod-FTD project can be easily inspected using the RiMod-FTD Browser at https://www.rimod-ftd.org. The website allows to query the RNA-seq expression levels of single genes. It is easily visible, for instance, that GRN RNA levels are lower in brains from patients with GRN mutations, compared to the other groups (Fig. 5a). When selecting a gene, the surrounding genomic location is additionally shown with RNA-seq expression levels for genes in the region, CAGE-seq peaks and methylation levels. All data can be grouped by disease, pathology, sex, mutated gene and specific mutation (Fig. 5b,c). The browser does allow for quick inspection of FTD-related genes.
Code availability
The code for all analyses and figures of this study has been deposited at https://github.com/dznetubingen/rimod-ftd-dataset.
References
Bang, J., Spina, S. & Miller, B. L. Frontotemporal dementia. The Lancet 386, 1672–1682 (2015).
Panza, F. et al. Development of disease-modifying drugs for frontotemporal dementia spectrum disorders. Nat. Rev. Neurol. 16, 213–228 (2020).
Seelaar, H., Rohrer, J. D., Pijnenburg, Y. A. L., Fox, N. C. & van Swieten, J. C. Clinical, genetic and pathological heterogeneity of frontotemporal dementia: a review. J. Neurol. Neurosurg. Psychiatry 82, 476–486 (2011).
Olszewska, D. A., Lonergan, R., Fallon, E. M. & Lynch, T. Genetics of Frontotemporal Dementia. Current Neurology and Neuroscience Reports vol. 16 (2016).
Sirkis, D. W., Geier, E. G., Bonham, L. W., Karch, C. M. & Yokoyama, J. S. Recent Advances in the Genetics of Frontotemporal Dementia. Curr. Genet. Med. Rep. 7, 41–52 (2019).
Sieben, A. et al. The genetics and neuropathology of frontotemporal lobar degeneration. Acta Neuropathologica 124, 353–372 (2012).
Balendra, R. & Isaacs, A. M. C9orf72-mediated ALS and FTD: multiple pathways to disease. Nat. Rev. Neurol. 1, https://doi.org/10.1038/s41582-018-0047-2 (2018).
Takahashi, H., Lassmann, T., Murata, M. & Carninci, P. 5′ end–centered expression profiling using cap-analysis gene expression and next-generation sequencing. Nat. Protoc. 7, 542–561 (2012).
Ewels, P. A. et al. nf-core: Community curated bioinformatics pipelines. bioRxiv 610741 https://doi.org/10.1101/610741 (2019).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Publ. Gr. 14, (2017).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15 (2014).
Menden, K. et al. Deep learning–based cell composition analysis from tissue expression profiles. Sci. Adv. 6, eaba2619 (2020).
Jiang, H., Lei, R., Ding, S. W. & Zhu, S. Skewer: A fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinformatics 15, 182 (2014).
Lassmann, T. TagDust2: A generic method to extract reads from sequencing data. BMC Bioinformatics 16, 24 (2015).
Dobin, A. & Gingeras, T. R. Mapping RNA-seq Reads with STAR. in Current Protocols in Bioinformatics https://doi.org/10.1002/0471250953.bi1114s51 (2015).
Haberle, V., Forrest, A. R. R., Hayashizaki, Y., Carninci, P. & Lenhard, B. CAGEr: Precise TSS data retrieval and high-resolution promoterome mining for integrative analyses. Nucleic Acids Res. 43, e51–e51 (2015).
Rahman, R. U. et al. Oasis 2: Improved online analysis of small RNA-seq data. BMC Bioinformatics 19, 54 (2018).
Aryee, M. J. et al. Minfi: A flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363–1369 (2014).
Langmead, B. Aligning short sequencing reads with Bowtie. Curr. Protoc. Bioinforma. Chapter 11, Unit 11.7 (2010).
Westphal, M. et al. SMaSH: Sample matching using SNPs in humans. BMC Genomics 20, 1–10 (2019).
Menden, K. et al. H. P. E-MTAB-12647., ArrayExpress, https://identifiers.org/arrayexpress:E-MTAB-12647 (2023).
Menden, K. et al. H. P. E-MTAB-12646., ArrayExpress, https://identifiers.org/arrayexpress:E-MTAB-12646 (2023).
Menden, K. et al. H. P. E-MTAB-12674., ArrayExpress, https://identifiers.org/arrayexpress:E-MTAB-12674 (2023).
Menden, K. et al. H. P. E-MTAB-12731., ArrayExpress, https://identifiers.org/arrayexpress:E-MTAB-12731 (2023).
Menden, K. et al. H. P. RiMod-FTD Supplementary Data., figshare, https://doi.org/10.6084/m9.figshare.23825595.v1 (2023).
Benussi, A. et al. Toward a glutamate hypothesis of frontotemporal dementia. Front. Neurosci. 13 (2019).
Palese, F. et al. Anti-GluA3 antibodies in frontotemporal dementia: effects on glutamatergic neurotransmission and synaptic failure. Neurobiol. Aging 86, 143–155 (2020).
Martens, L. H. et al. Progranulin deficiency promotes neuroinflammation and neuron loss following toxin-induced injury. J. Clin. Invest. 122, 3955–3959 (2012).
Gerrits, E. et al. Neurovascular dysfunction in GRN-associated frontotemporal dementia identified by single-nucleus RNA sequencing of human cerebral cortex. Nat. Neurosci. 2022 258 25, 1034–1048 (2022).
Acknowledgements
Post-mortem brain tissue was obtained from the Dutch Brain Bank, Netherlands Institute for Neuroscience, Amsterdam, and from the London Neurodegenerative Disease Brain Bank, King’s College London, London, UK. The London Neurodegenerative Disease Brain Bank is part of the Brains for Dementia Research Initiative.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Authors and Affiliations
Contributions
P.H. initiated and designed the project, planned, and interpreted the experiments and wrote the manuscript. K.M. was involved in all bioinformatic data analyses and wrote the manuscript. A.F., L.K., P.R. and N.F. performed the small RNA-seq experiments. A.D. performed experiments involving smNPC neurons and microglia. C.B., P.R., N.F. and M.C. performed RNA-seq and CAGE-seq experiments. M.F. and T.N. analysed the CAGE-seq data. B.A. analysed the RNA-seq data. P.R. and S.B. planned and interpreted analyses and wrote the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Menden, K., Francescatto, M., Nyima, T. et al. A multi-omics dataset for the analysis of frontotemporal dementia genetic subtypes. Sci Data 10, 849 (2023). https://doi.org/10.1038/s41597-023-02598-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-023-02598-x
- Springer Nature Limited
This article is cited by
-
PRDM16-DT is a novel lncRNA that regulates astrocyte function in Alzheimer’s disease
Acta Neuropathologica (2024)