Abstract
Data derived from microarray technologies are generally subject to various sources of noise, and accordingly the raw data are pre-processed before being formally analysed. Data normalization is a key pre-processing step in microarray experiments, such as circadian gene-expression studies, since it removes systematic variation across arrays. A wide variety of normalization methods are available in the literature. However, in our experience studying rhythmic expression patterns in oscillatory systems (e.g. cell cycle, circadian clock), the choice of normalization method may substantially impair the identification of rhythmic genes. Hence, the identification of a gene as rhythmic could be merely an artefact of how the data were normalized. Yet gene rhythmicity detection is crucial in modern toxicological and pharmacological studies, so a procedure is required that identifies rhythmic genes robustly with respect to the choice of normalization method.
To this end, we propose a rhythmicity measure based on bootstrap methodology to robustly identify rhythmic genes in oscillatory systems. Although our methodology can be extended to any high-throughput experiment, in this chapter we illustrate how to apply it to a publicly available circadian clock microarray gene-expression data set and give full details (both statistical and computational) so that the methodology can be used easily. We show that the choice of normalization method has very little effect on the proposed methodology, since the results derived from the bootstrap-based rhythmicity measure are highly rank correlated for every pair of normalization methods considered. This suggests, on the one hand, that the proposed rhythmicity measure is robust to the choice of normalization method and, on the other hand, that gene rhythmicity detected using this measure is unlikely to be a mere artefact of the normalization method used. A researcher using this methodology is therefore protected against the possible effect of different normalizations, as the conclusions obtained depend much less strongly on them. Additionally, the described bootstrap methodology can be employed as a tool to simulate gene-expression data from an oscillatory system, starting from a reference data set.
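The robustness check described above (a bootstrap-based score per gene, then rank correlation of the scores across normalization methods) can be sketched on toy data. The code below is only an illustration under stated assumptions and is not the chapter's procedure: the rhythmicity test is a hypothetical stand-in (R² of a 24 h cosine fit above an arbitrary threshold), the bootstrap resamples residuals of that fit, the two "normalizations" (`median_center`, `mean_scale`) are simple placeholders, and the data are simulated. All function names and parameter values are assumptions introduced for this example.

```python
import math
import random
import statistics

def cosinor_r2(y, times, period=24.0):
    """R^2 and fitted values of y = m + a*cos(wt) + b*sin(wt).
    Closed form; assumes equally spaced times covering whole periods."""
    n, w = len(y), 2 * math.pi / period
    m = sum(y) / n
    a = 2 / n * sum(v * math.cos(w * t) for v, t in zip(y, times))
    b = 2 / n * sum(v * math.sin(w * t) for v, t in zip(y, times))
    fit = [m + a * math.cos(w * t) + b * math.sin(w * t) for t in times]
    ss_tot = sum((v - m) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fit))
    return (1 - ss_res / ss_tot if ss_tot > 0 else 0.0), fit

def bootstrap_score(y, times, rng, n_boot=200, threshold=0.6):
    """Fraction of residual-bootstrap replicates flagged as rhythmic
    by the stand-in test (threshold is an arbitrary assumption)."""
    _, fit = cosinor_r2(y, times)
    resid = [v - f for v, f in zip(y, fit)]
    hits = 0
    for _ in range(n_boot):
        y_star = [f + rng.choice(resid) for f in fit]
        hits += cosinor_r2(y_star, times)[0] >= threshold
    return hits / n_boot

def median_center(matrix):
    """Toy normalization A: subtract each array's (column's) median."""
    med = [statistics.median(c) for c in zip(*matrix)]
    return [[v - m for v, m in zip(row, med)] for row in matrix]

def mean_scale(matrix):
    """Toy normalization B: rescale each array to the grand mean."""
    means = [statistics.mean(c) for c in zip(*matrix)]
    grand = statistics.mean(means)
    return [[v * grand / m for v, m in zip(row, means)] for row in matrix]

def ranks(x):
    """1-based ranks, with ties sharing their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r, i = [0.0] * len(x), 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = math.sqrt(sum((p - mx) ** 2 for p in rx))
    sy = math.sqrt(sum((q - my) ** 2 for q in ry))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def run_demo(seed=0, n_genes=30, n_rhythmic=10):
    """Simulate genes x arrays data, score under both toy normalizations."""
    rng = random.Random(seed)
    times = [4.0 * k for k in range(12)]           # 48 h sampled every 4 h
    w = 2 * math.pi / 24.0
    shift = [rng.gauss(0, 0.3) for _ in times]     # systematic array effects
    data = []
    for g in range(n_genes):
        amp = 1.0 if g < n_rhythmic else 0.0       # first genes are rhythmic
        phase = rng.uniform(0, 2 * math.pi)
        data.append([8 + amp * math.cos(w * t - phase) + s + rng.gauss(0, 0.3)
                     for t, s in zip(times, shift)])
    scores_a = [bootstrap_score(y, times, rng) for y in median_center(data)]
    scores_b = [bootstrap_score(y, times, rng) for y in mean_scale(data)]
    return scores_a, scores_b, spearman(scores_a, scores_b)

if __name__ == "__main__":
    a, b, rho = run_demo()
    print(f"Spearman correlation between normalizations: {rho:.3f}")
```

A rank correlation close to 1 between `scores_a` and `scores_b` would indicate that the ordering of genes by the score changes little between the two toy normalizations, which is the kind of agreement the abstract reports for the actual measure.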
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Larriba, Y., Rueda, C., Fernández, M.A., Peddada, S.D. (2019). Microarray Data Normalization and Robust Detection of Rhythmic Features. In: Bolón-Canedo, V., Alonso-Betanzos, A. (eds) Microarray Bioinformatics. Methods in Molecular Biology, vol 1986. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9442-7_9
DOI: https://doi.org/10.1007/978-1-4939-9442-7_9
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-4939-9441-0
Online ISBN: 978-1-4939-9442-7
eBook Packages: Springer Protocols