Abstract
The effectiveness of cancer treatment depends strongly on an accurate diagnosis. In this paper we propose a system for the automatic and precise diagnosis of a tumor's origin from genetic data. The system combines coding theory techniques with machine learning algorithms. Tumor classification is cast as a multiclass learning problem in which gene expression values are the features used to distinguish between tumor types. Since multiclass learning is intrinsically complex, the problem is decomposed into several binary (two-class) problems whose results are combined by means of an error-correcting linear block code. The linear code corrects errors made by the base binary classifiers, which makes the prediction more robust. Tested on real data from cancer patients, the system achieved promising results, with a best-case precision of 72%.
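To make the decomposition concrete, the following minimal sketch illustrates the general idea of combining binary decisions through a linear block code. It is not the system described in the paper: the generator matrix `G`, the four-class setup, and the hard-decision nearest-codeword decoder are all assumptions chosen for illustration, and the binary classifier outputs are simulated as a plain bit vector.

```python
from itertools import product

# Hypothetical generator matrix of a toy [6, 2] linear block code.
# Each column defines one binary problem; each class gets one codeword.
G = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1],
]


def encode(message):
    """Multiply a message bit vector by G over GF(2)."""
    return tuple(
        sum(m * g for m, g in zip(message, column)) % 2
        for column in zip(*G)
    )


# Class index -> message bits -> codeword (4 classes for 2 message bits).
CODEBOOK = {
    cls: encode(bits)
    for cls, bits in enumerate(product((0, 1), repeat=len(G)))
}


def hamming(a, b):
    """Number of positions in which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance():
    """Smallest Hamming distance between any two distinct codewords."""
    words = list(CODEBOOK.values())
    return min(
        hamming(a, b)
        for i, a in enumerate(words)
        for b in words[i + 1:]
    )


def decode(received):
    """Return the class whose codeword is nearest to the received bits."""
    return min(CODEBOOK, key=lambda cls: hamming(CODEBOOK[cls], received))


# Simulated outputs of the six binary classifiers for a class-2 sample,
# with the fourth classifier making a mistake (assumed data).
noisy_outputs = (1, 1, 1, 0, 0, 0)
predicted_class = decode(noisy_outputs)
```

In this toy code the minimum distance between codewords is 4, so any single binary-classifier error is corrected by the decoder. A code over only two message bits necessarily repeats columns, which a practical design would want to avoid by using a longer or different code.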
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hüntemann, A., González, J.C., Tapia, E. (2005). Tumor Classification from Gene Expression Data: A Coding-Based Multiclass Learning Approach. In: Oliveira, J.L., Maojo, V., Martín-Sánchez, F., Pereira, A.S. (eds) Biological and Medical Data Analysis. ISBMDA 2005. Lecture Notes in Computer Science, vol 3745. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573067_22
DOI: https://doi.org/10.1007/11573067_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29674-4
Online ISBN: 978-3-540-31658-9
eBook Packages: Computer Science (R0)