The KIAA1549:BRAF fusion is the most common alteration in pilocytic astrocytoma (PA). It is generated by a focal tandem duplication at 7q34 and acts as an oncogene by driving the mitogen-activated protein kinase (MAPK) pathway [4]. Detecting this characteristic genetic event is of high clinical relevance, both for its diagnostic and prognostic value and because the fusion is a therapeutic target. RNA sequencing (RNA-Seq) of fresh-frozen or formalin-fixed paraffin-embedded (FFPE) tissue has recently gained popularity in the diagnostic setting [1]. By identifying split reads that map to two different genomic loci, RNA-Seq data can be used to detect expressed fusion genes. Several tools have been developed for this purpose, including Arriba (https://github.com/suhrig/arriba), FusionCatcher (https://github.com/ndaniel/fusioncatcher) and STAR-Fusion (https://github.com/STAR-Fusion/STAR-Fusion). Previous studies have suggested that the KIAA1549:BRAF fusion is expressed at a low level [5,6,7], but how reliably different RNA-Seq analysis pipelines detect this important fusion has not been examined so far.
To this end, we generated RNA-Seq data (polyA-enriched, TruSeq Stranded, 2 × 100 bp paired-end reads) from 22 fresh-frozen pediatric PA tumor samples, in which a KIAA1549:BRAF fusion had previously been identified by whole-genome sequencing (WGS) [3]. The raw reads were subsequently aligned with STAR [2] (v2.7.3a), and gene fusions were identified using Arriba (v1.1.0). Despite a total read count of about 200 million reads per sample (Fig. 1a), the KIAA1549:BRAF fusion was correctly identified in only 14/22 samples (Fig. 1b). In three additional samples, Arriba had identified but then discarded the fusion, as it was supported by just one sequencing read. In five samples, the fusion was not detected at all with this workflow. To investigate the influence of sequencing depth, we re-sequenced these five samples, substantially increasing their total read count to more than 500 million reads per sample (Fig. 1a). Surprisingly, however, we were still unable to detect the fusion in four of these five samples, so the overall result changed only slightly (Fig. 1b). The detection rate was not significantly different between samples with the KIAA1549 exon 16–BRAF exon 9 (16:9) fusion variant and those with the 15:9 variant (Online Resource Fig. 1a).
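The detection figures from the initial run can be restated as simple arithmetic. The following is a minimal sketch in Python; the category names and the `detection_rate` helper are illustrative assumptions, and only the counts (14 reported, 3 discarded, 5 not detected, out of 22) come from the text.

```python
from collections import Counter

# Outcome counts of the initial STAR/Arriba run, as stated in the text.
# The category labels are hypothetical names chosen for this sketch.
initial_outcomes = Counter(reported=14, discarded=3, missed=5)


def detection_rate(outcomes, include_discarded=False):
    """Return the fraction of samples in which the fusion counts as detected."""
    detected = outcomes["reported"]
    if include_discarded:
        # Fusions that were found but filtered out for having one supporting read.
        detected += outcomes["discarded"]
    return detected / sum(outcomes.values())


strict_rate = detection_rate(initial_outcomes)
lenient_rate = detection_rate(initial_outcomes, include_discarded=True)
print(f"reported only:         {strict_rate:.1%}")
print(f"reported or discarded: {lenient_rate:.1%}")
```

Counting discarded calls as detections raises the rate from 14/22 to 17/22, which illustrates how much of the shortfall is attributable to filtering rather than to missing evidence.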
Next, we investigated different factors that could influence detectability. The immune cell content, evaluated by ESTIMATE [8], was significantly lower in those samples in which the fusion was detected (reported or discarded) than in those in which the fusion was missed (Fig. 1c), suggesting that a higher tumor cell content facilitates fusion detection. The amplitude of the genomic 7q34 gain as a further measure of tumor purity pointed in a similar direction (Online Resource Fig. 1b and 2). The expression level of the fusion partner genes, KIAA1549 (Fig. 1d) and BRAF (Online Resource Fig. 1c), was also significantly higher in cases where the fusion was detected. Interestingly, BRAF fusions with alternative fusion partners (FAM131B, GNAI1, MKRN1 or RNF130) were detected without problems, and the expression of these alternative 5′ genes was consistently higher than that of KIAA1549 (Fig. 1d). The estimated library size (a measure of the complexity captured by the RNA-Seq library) showed a trend towards correlating with detectability (Online Resource Fig. 1d), but in both groups it was above the level typically considered to cause general problems in fusion detection (< 30 million; authors’ unpublished observations). Furthermore, we could not exclude an influence of the library preparation protocol. Fusion analysis of an older RNA-Seq cohort was significantly more sensitive than that of the cohort presented here (Online Resource Fig. 1e), with the only obvious difference being the library preparation protocol (ribosome-depleted total RNA vs. polyA capture). A combination of all of these factors likely determines the overall detectability for a given sample. In particular, however, the samples in which the KIAA1549:BRAF fusion was missed ranked significantly worse for KIAA1549 expression and tumor cell content (Online Resource Fig. 1f–g).
Analyzing the data using FusionCatcher (v1.20) did not improve the overall result (Online Resource Table 1). FusionCatcher missed some fusions that were detected by Arriba but also reported one that was missed by Arriba. Therefore, we hypothesized that the raw sequencing data might contain fusion-relevant information that is differently processed by the algorithms. Indeed, scanning the raw FASTQ files for sequences spanning the breakpoint of KIAA1549 and BRAF (16:9 and 15:9) using the UNIX utility grep revealed matching reads in all samples. Further analysis showed that these split reads were not always properly aligned by STAR, which has known issues with overlapping paired-end reads and split reads with a short overhang, and were thus not visible to downstream processing by Arriba.
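The grep-style scan of raw reads described above can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the input is assumed to be a standard four-line-per-record FASTQ file (optionally gzip-compressed), and the junction sequence must be supplied by the user, since the actual breakpoint-spanning sequences are not given in the text.

```python
import gzip
from pathlib import Path


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_junction_reads(fastq_path, junction):
    """Count reads that contain the junction sequence on either strand.

    Assumes four lines per FASTQ record, so every sequence sits on the
    second line of its record (zero-based line index 1, 5, 9, ...).
    """
    fastq_path = Path(fastq_path)
    opener = gzip.open if fastq_path.suffix == ".gz" else open
    junction = junction.upper()
    targets = (junction, reverse_complement(junction))
    hits = 0
    with opener(fastq_path, "rt") as handle:
        for line_index, line in enumerate(handle):
            if line_index % 4 != 1:
                continue  # skip header, separator and quality lines
            read = line.strip().upper()
            if any(target in read for target in targets):
                hits += 1
    return hits
```

A call such as `count_junction_reads("sample_R1.fastq.gz", junction)` (file name hypothetical) would be repeated for each mate file and each fusion variant (16:9 and 15:9). Like grep, this exact-match scan tolerates no sequencing errors in the junction, so it gives a lower bound on the fusion-supporting reads present in the raw data.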
To overcome these limitations, we tested different parameters that have recently been incorporated into STAR. We found the settings --peOverlapNbasesMin 10 and --alignSplicedMateMapLminOverLmate 0.5 to improve the alignment of split reads from our paired-end sequencing data. In addition, we developed a new version of Arriba (v1.2.0) that is able to detect fusions with only one supporting read if they are included in a curated list of known fusions. This should reduce the number of false negatives observed with earlier versions of Arriba. These modifications substantially improved overall detection of the KIAA1549:BRAF fusion (Fig. 1e) and increased the confidence of identified fusions (Online Resource Fig. 1h). We further validated this optimized workflow in an independent diagnostic cohort, and found it to significantly outperform the previous standard analysis tools FusionCatcher and STAR-Fusion (Fig. 1f).
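The two changes described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline or Arriba's implementation: `star_command` and `keep_candidate` are hypothetical helpers, the two-read default threshold is inferred from the statement that single-read fusions were previously discarded, and every STAR option other than --peOverlapNbasesMin and --alignSplicedMateMapLminOverLmate is an assumed setting for chimeric alignment that is not taken from the text.

```python
def star_command(genome_dir, fastq_r1, fastq_r2, threads=8):
    """Build a STAR argument list that includes the two settings from the text."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_r1, fastq_r2,
        "--readFilesCommand", "zcat",        # assumption: gzip-compressed FASTQ
        "--outSAMtype", "BAM", "Unsorted",   # assumption: output format
        "--outSAMunmapped", "Within",        # assumption: keep unmapped reads
        "--chimSegmentMin", "10",            # assumption: enables chimeric detection
        "--chimOutType", "WithinBAM",        # assumption: chimeric reads in main BAM
        "--peOverlapNbasesMin", "10",                 # from the text
        "--alignSplicedMateMapLminOverLmate", "0.5",  # from the text
    ]


# Stand-in for a curated list of known fusions (5' gene, 3' gene).
KNOWN_FUSIONS = {("KIAA1549", "BRAF")}


def keep_candidate(gene5, gene3, supporting_reads, min_reads=2):
    """Keep a candidate if it has enough reads or is a known fusion with >= 1 read."""
    if supporting_reads >= min_reads:
        return True
    return supporting_reads >= 1 and (gene5, gene3) in KNOWN_FUSIONS
```

The argument list could be passed to `subprocess.run(star_command(...), check=True)` on a system where STAR is installed. The filter shows only the idea of relaxing the read-support threshold for known fusions; Arriba applies many further filters that are not modeled here.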
Finally, we analyzed RNA-Seq data from a set of > 1000 FFPE tissue samples processed in a diagnostic setting [6]. Importantly, the more sensitive detection parameters did not result in any false positive calls in non-KIAA1549:BRAF PA or other tumor types (100% specificity).
The presented modifications to STAR and Arriba considerably improved the detection rate of KIAA1549:BRAF fusions from RNA-Seq data in research and diagnostic settings. We expect these improvements to increase fusion detection sensitivity in other contexts as well. It should be noted, however, that not all fusion-supporting evidence contained in the raw read data was picked up by our approach, even after optimization. Therefore, additional enhancements of STAR, Arriba and related tools will be needed to further improve the detection rate.
References
1. Byron SA, Van Keuren-Jensen KR, Engelthaler DM, Carpten JD, Craig DW (2016) Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat Rev Genet 17:257–271. https://doi.org/10.1038/nrg.2016.10
2. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S et al (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21. https://doi.org/10.1093/bioinformatics/bts635
3. Jones DTW, Hutter B, Jäger N, Korshunov A, Kool M, Warnatz H-J et al (2013) Recurrent somatic alterations of FGFR1 and NTRK2 in pilocytic astrocytoma. Nat Genet 45:927–932. https://doi.org/10.1038/ng.2682
4. Jones DTW, Kocialkowski S, Liu L, Pearson DM, Bäcklund LM, Ichimura K et al (2008) Tandem duplication producing a novel oncogenic BRAF fusion gene defines the majority of pilocytic astrocytomas. Cancer Res 68:8673–8677. https://doi.org/10.1158/0008-5472.CAN-08-2097
5. Lin A, Rodriguez FJ, Karajannis MA, Williams SC, Legault G, Zagzag D et al (2012) BRAF alterations in primary glial and glioneuronal neoplasms of the central nervous system with identification of 2 novel KIAA1549:BRAF fusion variants. J Neuropathol Exp Neurol 71:66–72. https://doi.org/10.1097/NEN.0b013e31823f2cb0
6. Stichel D, Schrimpf D, Casalini B, Meyer J, Wefers AK, Sievers P et al (2019) Routine RNA sequencing of formalin-fixed paraffin-embedded specimens in neuropathology diagnostics identifies diagnostically and therapeutically relevant gene fusions. Acta Neuropathol 138:827–835. https://doi.org/10.1007/s00401-019-02039-3
7. Tomić TT, Olausson J, Wilzén A, Sabel M, Truvé K, Sjögren H et al (2017) A new GTF2I-BRAF fusion mediating MAPK pathway activation in pilocytic astrocytoma. PLoS ONE 12:e0175638. https://doi.org/10.1371/journal.pone.0175638
8. Yoshihara K, Shahmoradgoli M, Martínez E, Vegesna R, Kim H, Torres-Garcia W et al (2013) Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun 4:2612–2711. https://doi.org/10.1038/ncomms3612
Acknowledgements
Open Access funding provided by Projekt DEAL. We thank Andrea Wittmann for excellent technical assistance, the Genomics and Proteomics Core Facility (GPCF) at the DKFZ for RNA sequencing services and the Omics IT and Data Management Core Facility (ODCF) at the DKFZ for data management and analysis services. This work was supported by the Everest Centre for Low-grade Paediatric Brain Tumours (The Brain Tumour Charity, UK), the Pediatric Low Grade Astrocytoma Fund (PLGA Fund) at the Pediatric Brain Tumor Foundation (PBTF), the German Academic Scholarship Foundation, the Molecular Diagnostics Program at the NCT Heidelberg and the Fondation Charles-Bruneau.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sommerkamp, A.C., Uhrig, S., Stichel, D. et al. An optimized workflow to improve reliability of detection of KIAA1549:BRAF fusions from RNA sequencing data. Acta Neuropathol 140, 237–239 (2020). https://doi.org/10.1007/s00401-020-02167-1
DOI: https://doi.org/10.1007/s00401-020-02167-1