Abstract
Analysis of microarray data, starting from raw gene expression intensities, typically proceeds in two main steps. First, the data are pre-processed by rescaling and standardizing so that overall intensities are equivalent across arrays. Second, statistical methodologies are applied to answer the scientific questions of interest. In this paper, for the pre-processing step, we introduce a thresholding algorithm for rescaling each array. The second step involves statistical classification and dimension reduction methodologies. For this we introduce the method of partial least squares (PLS) and apply it to the leukemia microarray data set of Golub et al. (1999). We also discuss the use of principal components analysis (PCA), quadratic discriminant analysis (QDA) and logistic discrimination (LD). Finally, we discuss other potential applications of PLS in analyzing gene expression data: prediction of a target gene, prediction of the reaction in cell lines, assessment of patient survival, and generalisation to the prediction of multiple classes.
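As a rough illustration of the dimension reduction step, the sketch below extracts PLS components for a single response using the standard NIPALS-style PLS1 recursion with deflation. It is not taken from the chapter: the function names (`center_columns`, `pls1_scores`), the toy expression matrix and the 0/1 class labels are all hypothetical, and the chapter's thresholding and rescaling step is not reproduced here. The resulting score vectors are the kind of low-dimensional summaries that could then be passed to a classifier such as LD or QDA.

```python
import math


def center_columns(X):
    """Subtract each column (gene) mean from a samples-by-genes matrix."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]


def pls1_scores(X, y, n_components):
    """Return PLS1 score vectors (one list per component) via NIPALS deflation.

    X is a list of samples, each a list of gene expression values;
    y is a list of numeric responses (here, hypothetical 0/1 class labels).
    """
    n, p = len(X), len(X[0])
    E = center_columns(X)                 # residual predictor matrix
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]           # residual response
    scores = []
    for _ in range(n_components):
        # Weight vector: proportional to the covariance of each gene with f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break                         # no covariance left to explain
        w = [v / norm for v in w]
        # Score vector: linear combination of genes for each sample.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        if tt == 0.0:
            break
        # Loadings for X and regression coefficient for y on this component.
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove the part of X and y explained by this component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        scores.append(t)
    return scores


# Toy data (made up for illustration): 6 samples, 4 genes, two classes.
X_toy = [
    [2.0, 1.0, 0.5, 3.0],
    [2.2, 0.9, 0.4, 3.1],
    [1.9, 1.2, 0.6, 2.8],
    [0.5, 2.0, 1.5, 1.0],
    [0.4, 2.1, 1.6, 1.2],
    [0.6, 1.9, 1.4, 0.9],
]
y_toy = [1, 1, 1, 0, 0, 0]

scores = pls1_scores(X_toy, y_toy, n_components=2)
```

Unlike principal components, which are chosen only to capture variance in the expression matrix, each PLS weight vector here is driven by covariance with the response, which is why the first score vector tends to separate the two toy classes.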
References
Alon et al. (1999), “Patterns of Gene Expression Revealed by Clustering Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays,” Proceedings of the National Academy of Sciences, 96, 6745–6750.
Alizadeh et al. (2000), “Distinct Types of Diffuse Large B-Cell Lymphoma Identified by Gene Expression Profiling,” Nature, 403, 503–511.
Bittner et al. (2000), “Molecular Classification of Cutaneous Malignant Melanoma by Gene Expression Profiling,” Nature, 406, 536–540.
de Jong, S. (1993), “SIMPLS: An Alternative Approach to Partial Least Squares Regression,” Chemometrics and Intelligent Laboratory Systems, 18, 251–263.
Dudoit, S., Fridlyand, J., Speed, T.P. (2000), “Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data,” Technical Report #576, Department of Statistics, U. C. Berkeley.
Flury, B. (1997), A First Course in Multivariate Analysis. Springer-Verlag, New York.
Frank, I.E., and Friedman, J.H. (1993), “A Statistical View of Some Chemometric Regression Tools” (with discussion), Technometrics, 35, 109–148.
Garthwaite, P.H. (1994), “An Interpretation of Partial Least Squares,” Journal of the American Statistical Association, 89, 122–127.
Geladi, P., and Kowalski, B.R. (1986), “Partial Least Squares Regression: Tutorial,” Analytica Chimica Acta, 185, 1–17.
Golub et al. (1999), “Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring,” Science, 286, 531–537.
Hand, D.J. (1981), Discrimination and Classification. John Wiley & Sons, Chichester, England.
Hand, D.J. (1997), Construction and Assessment of Classification Rules. John Wiley & Sons, Chichester, England.
Helland, I.S. (1988), “On the Structure of Partial Least Squares,” Communications in Statistics-Simulation and Computation, 17, 581–607.
Helland, I.S., and Almøy, T. (1994), “Comparison of Prediction Methods When Only a Few Components are Relevant,” Journal of the American Statistical Association, 89, 583–591.
Hoskuldsson, A. (1988), “PLS Regression Methods,” Journal of Chemometrics, 2, 211–228.
Johnson, R.A. and Wichern, D.W. (1992), Applied Multivariate Analysis. Prentice-Hall, New Jersey, 4th edition.
Jolliffe, I.T. (1986), Principal Component Analysis. Springer-Verlag, New York.
Lorber, A., Wangen, L.E., and Kowalski, B.R. (1987), “A Theoretical Foundation for the PLS Algorithm,” Journal of Chemometrics, 1, 19–31.
Mardia, K.V., Kent, J.T., and Bibby, J.M. (1979), Multivariate Analysis. Academic Press, London.
Martens, H. and Naes, T. (1989), Multivariate Calibration. John Wiley & Sons, New York.
Massey, W.F. (1965), “Principal Components Regression in Exploratory Statistical Research,” Journal of the American Statistical Association, 60, 234–246.
Nguyen, D.V. and Rocke, D.M. (2000), “Classification in High Dimension with Application to DNA Microarray Data,” manuscript.
Nguyen, D.V. and Rocke, D.M. (2001), “Tumor Classification by Partial Least Squares Using Microarray Gene Expression Data,” to appear in Bioinformatics.
Nguyen, D.V. and Rocke, D.M. (2001b), “Partial Least Squares Proportional Hazard Regression for Application to DNA Microarray Data,” manuscript.
Nguyen, D.V. and Rocke, D.M. (2001c), “Multi-Class Cancer Classification Via Partial Least Squares Using Gene Expression Profiles,” manuscript.
Perou N et al. (2000), “Molecular Portrait of Human Breast Tumors,” Nature, 406, 747–752.
Perou N et al. (1999), “Distinctive Gene Expression Patterns in Human Mammary Epithelial Cells and Breast Cancer,” Proceedings of the National Academy of Sciences, USA, 96, 9212–9217.
Phatak, A., Reilly, P.M., and Penlidis, A. (1992), “The Geometry of 2-Block Partial Least Squares,” Communications in Statistics-Theory and Methods, 21, 1517–1553.
Press, S.J. (1982), Applied Multivariate Analysis: Using Bayesian and Frequentist Methods of Inference. Robert E. Krieger Publishing Company Inc., Malabar, Florida, 2nd edition.
Rocke, D.M. and Durbin, B. (2000), “A Model for Measurement Error for Gene Expression Arrays,” to appear in Journal of Computational Biology.
Ross et al. (2000), “Systematic Variation in Gene Expression Patterns in Human Cancer Cell Lines,” Nature Genetics, 24, 227–235.
Scherf et al. (2000), “A Gene Expression Database for the Molecular Pharmacology of Cancer,” Nature Genetics, 24, 236–244.
Stone, M., and Brooks, R. J. (1990), “Continuum Regression: Cross-validated Sequentially Constructed Prediction Embracing Ordinary Least Squares, Partial Least Squares, and Principal Components Regression” (with discussion), Journal of the Royal Statistical Society, Series B, 52, 237–269.
Copyright information
© 2002 Springer Science+Business Media New York
About this chapter
Cite this chapter
Nguyen, D.V., Rocke, D.M. (2002). Classification of Acute Leukemia Based on DNA Microarray Gene Expressions Using Partial Least Squares. In: Lin, S.M., Johnson, K.F. (eds) Methods of Microarray Data Analysis. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0873-1_9
DOI: https://doi.org/10.1007/978-1-4615-0873-1_9
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5281-5
Online ISBN: 978-1-4615-0873-1
eBook Packages: Springer Book Archive