Abstract
Microarray technology allows scientists to analyze thousands of gene expression profiles simultaneously. Owing to its wide use, several research issues have been studied, such as missing value imputation and gene-gene regulation prediction. Microarray gene expression data often contain multiple missing expression values, for a variety of reasons. Effective methods for missing value imputation are needed because many gene analysis algorithms require a complete matrix of gene array values. In addition, selecting informative genes is essential when analyzing such large amounts of data. A number of methods have been proposed to meet this need from various points of view. However, most existing methods have limitations.
To estimate the similarity between gene pairs effectively, we propose a novel distance measure based on a well-defined ontology structure for genes and proteins: the gene ontology (GO). GO provides definitions and annotations that describe the biological meaning of genes. Its structure can be described as a directed acyclic graph (DAG) in which each GO term is a node and the relationships between term pairs are arcs. With GO annotations, we can therefore obtain the relations among the genes involved in an experiment. The semantic similarity of two genes, in the biological sense, can be identified by quantitatively assessing the gene pair through its GO annotations.
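As a rough illustration of how a DAG of terms can yield a quantitative similarity between two annotated genes, the sketch below compares the ancestor closures of their annotations with a Jaccard overlap. This is not the distance measure proposed in the chapter, which the abstract does not specify; the term identifiers, the `PARENTS` table and the function names are all hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Toy DAG (hypothetical term identifiers): each term maps to its parent terms.
PARENTS: Dict[str, List[str]] = {
    "T_root": [],
    "T_a": ["T_root"],
    "T_b": ["T_root"],
    "T_c": ["T_a", "T_b"],  # a term may have several parents in a DAG
    "T_d": ["T_a"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term itself plus every term reachable along parent arcs."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def closure(terms: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Union of the ancestor sets of all terms annotated to one gene."""
    result: Set[str] = set()
    for term in terms:
        result |= ancestors(term, parents)
    return result


def gene_similarity(terms_a: Iterable[str], terms_b: Iterable[str],
                    parents: Dict[str, List[str]]) -> float:
    """Jaccard overlap of two genes' annotation closures (assumed measure)."""
    a = closure(terms_a, parents)
    b = closure(terms_b, parents)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Under this assumed measure, two genes annotated with terms that share many ancestors score close to 1, while genes whose annotations meet only at the root score close to 0.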
In this chapter, Section 1 first provides the reader with fundamental knowledge about microarray technology, including a brief introduction to microarray experiments. Section 2 discusses and analyzes essential research issues concerning microarrays. Section 3 presents a novel method based on k-nearest neighbor (KNN), dynamic time warping (DTW) and gene ontology (GO) for the analysis of microarray time series data. With our approach, missing value imputation and gene regulation prediction can be achieved efficiently. Section 4 introduces a real microarray time-series dataset. Section 5 demonstrates the effectiveness of our method with various experimental results. Section 6 gives a brief conclusion.
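To make the combination of KNN and DTW concrete, the following minimal sketch computes the classic dynamic-programming DTW distance and uses it to pick the k nearest expression profiles when filling a missing time point. It is only an illustration under stated assumptions, not the chapter's algorithm: the function names are hypothetical, the local cost is an absolute difference, the GO component is omitted, and every candidate profile is assumed to be complete and the same length as the target.

```python
import math
from typing import List, Optional, Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic DTW distance with absolute difference as the local cost."""
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def impute_knn_dtw(target: Sequence[Optional[float]],
                   candidates: Sequence[Sequence[float]],
                   k: int = 2) -> List[float]:
    """Fill None entries with the mean of the k DTW-nearest candidates.

    Assumes at least one candidate, and that candidates are complete
    profiles of the same length as the target.
    """
    observed = [v for v in target if v is not None]
    ranked = sorted(candidates, key=lambda c: dtw_distance(observed, c))
    neighbours = ranked[:k]
    return [
        v if v is not None
        else sum(c[i] for c in neighbours) / len(neighbours)
        for i, v in enumerate(target)
    ]
```

Because DTW tolerates sequences of different lengths, the target can be compared on its observed points only, which is one simple way to rank neighbours before the missing value is known.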
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Yang, A.C., Hsu, HH. (2011). DTW-GO Based Microarray Time Series Data Analysis for Gene-Gene Regulation Prediction. In: Biba, M., Xhafa, F. (eds) Learning Structure and Schemas from Documents. Studies in Computational Intelligence, vol 375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22913-8_12
DOI: https://doi.org/10.1007/978-3-642-22913-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22912-1
Online ISBN: 978-3-642-22913-8
eBook Packages: Engineering, Engineering (R0)