Abstract
Direct sequencing of complementary DNA (cDNA) using high-throughput sequencing technologies (RNA-seq) is widely used and allows a more comprehensive understanding of the transcriptome than microarrays do. In theory, RNA-seq should be able to precisely identify and quantify all RNA species, small or large, at low or high abundance. However, RNA-seq is a complicated, multistep process involving reverse transcription, amplification, fragmentation, purification, adaptor ligation, and sequencing. Improper handling at any of these steps can produce biased or even unusable data. In addition, biases intrinsic to RNA-seq (such as GC bias and nucleotide composition bias) and the complexity of the transcriptome can degrade data quality. Comprehensive quality assessment is therefore the first and most critical step before any downstream analysis or interpretation of results. This chapter discusses the most widely used quality control metrics, including sequence quality, sequencing depth, read duplication rates (clonal reads), alignment quality, nucleotide composition bias, PCR bias, GC bias, rRNA and mitochondrial contamination, and coverage uniformity.
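To make a few of these metrics concrete, the following is a minimal sketch, not the chapter's own method, of how sequence quality, GC content, and read duplication might be computed from raw FASTQ records. It assumes Sanger-style quality encoding (ASCII offset 33) and the standard Phred relation Q = -10 log10(P). All function names and the toy reads are hypothetical and chosen only for illustration.

```python
def parse_fastq(lines):
    """Yield (name, sequence, quality) tuples from 4-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def phred_scores(quality_string, offset=33):
    """Decode a quality string into Phred scores (assumes ASCII offset 33)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Invert Q = -10 * log10(P) to get the base-call error probability."""
    return 10 ** (-q / 10)


def gc_fraction(seq):
    """Fraction of bases in the read that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def duplication_rate(seqs):
    """Fraction of reads that are exact copies of another read in the set."""
    if not seqs:
        return 0.0
    return 1 - len(set(seqs)) / len(seqs)


# Hypothetical toy input: three reads, two of them identical.
toy_fastq = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "ACGT", "+", "II5I",
    "@r3", "GGCC", "+", "!!!!",
]

records = list(parse_fastq(toy_fastq))
for name, seq, qual in records:
    scores = phred_scores(qual)
    mean_q = sum(scores) / len(scores)
    print(name, "mean Q =", mean_q, "GC =", gc_fraction(seq))

print("duplication rate =", duplication_rate([seq for _, seq, _ in records]))
```

Exact-sequence matching is only a rough proxy for clonal reads. Dedicated tools such as FastQC and RSeQC (both cited by this chapter) also assess duplication and many other metrics, and in RNA-seq a high duplication rate can reflect highly expressed transcripts rather than PCR artifacts alone.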
Copyright information
© 2015 Springer Science+Business Media New York
About this protocol
Cite this protocol
Li, X., Nair, A., Wang, S., Wang, L. (2015). Quality Control of RNA-Seq Experiments. In: Picardi, E. (eds) RNA Bioinformatics. Methods in Molecular Biology, vol 1269. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-2291-8_8
DOI: https://doi.org/10.1007/978-1-4939-2291-8_8
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-2290-1
Online ISBN: 978-1-4939-2291-8
eBook Packages: Springer Protocols