Abstract
The recent explosion in the availability of gene and protein expression data for cancer detection has necessitated the development of sophisticated machine learning tools for high-dimensional data analysis. Previous attempts at gene expression analysis have typically used a linear dimensionality reduction method such as Principal Component Analysis (PCA). Linear dimensionality reduction methods, however, do not account for the inherent nonlinearity within the data. The motivation behind this work is to demonstrate that nonlinear dimensionality reduction methods capture this nonlinearity better than linear methods, and hence should yield better classification and potentially aid in the visualization and identification of new data classes. Consequently, in this paper, we empirically compare the performance of three commonly used linear and three nonlinear dimensionality reduction techniques from the perspective of (a) distinguishing objects belonging to cancer and non-cancer classes and (b) new class discovery in high-dimensional gene and protein expression studies for different types of cancer. Quantitative evaluation using a support vector machine and a decision tree classifier revealed a statistically significant improvement in classification accuracy when using nonlinear rather than linear dimensionality reduction methods.
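The kind of comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn, a synthetic matrix in place of real expression data, Isomap as one arbitrary choice of nonlinear method, and illustrative values for the embedding dimension, neighbourhood size, and number of cross-validation folds. The resulting accuracies say nothing about the results reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: a synthetic stand-in for an expression matrix
# (100 samples x 500 features, two classes), not real cancer data.
X, y = make_classification(
    n_samples=100, n_features=500, n_informative=10,
    n_redundant=0, random_state=0,
)

# One linear and one nonlinear reducer; the target dimension (5) and
# neighbourhood size (10) are illustrative choices only.
reducers = {
    "PCA (linear)": PCA(n_components=5),
    "Isomap (nonlinear)": Isomap(n_neighbors=10, n_components=5),
}

# The two classifier families named in the abstract.
classifiers = {
    "SVM": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for reducer_name, reducer in reducers.items():
    for clf_name, clf in classifiers.items():
        # Scaling and embedding are fitted inside each training fold,
        # so held-out samples do not influence the reduced space.
        pipeline = make_pipeline(StandardScaler(), reducer, clf)
        scores = cross_val_score(pipeline, X, y, cv=5)
        results[(reducer_name, clf_name)] = scores.mean()
        print(f"{reducer_name} + {clf_name}: mean accuracy {scores.mean():.3f}")
```

Wrapping the reducer and the classifier in a single pipeline keeps the embedding from being fitted on test samples, which matters when the number of features far exceeds the number of samples.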
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, G., Rodriguez, C., Madabhushi, A. (2007). An Empirical Comparison of Dimensionality Reduction Methods for Classifying Gene and Protein Expression Datasets. In: Măndoiu, I., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2007. Lecture Notes in Computer Science, vol 4463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72031-7_16
DOI: https://doi.org/10.1007/978-3-540-72031-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72030-0
Online ISBN: 978-3-540-72031-7
eBook Packages: Computer Science, Computer Science (R0)