Abstract
This work is motivated by the need for consensus clustering methods that draw on multiple datasets and are applicable to microarray data. It introduces a new method for unsupervised clustering of samples with similar genetic profiles, using information from two or more datasets. The method was tested on two breast cancer gene expression microarray datasets, with 295 and 249 samples and 12,325 genes in common. Four subtypes with similar genetic profiles were identified in both datasets. Analysis of the clinical information for these subtypes confirmed that they differ in tumour aggressiveness, measured by time to metastasis, indicating a connection between genetic profile and prognosis. Finally, the subtypes identified were compared to already established subtypes of breast cancer. The comparison indicates that the new approach detected similar gene expression patterns across the two datasets without any a priori knowledge. The two datasets used in this work, as well as all the figures, are available for download from http://www.cs.newcastle.edu.au/~mendes/BreastCancer.html.
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Mendes, A. (2011). Identification of Breast Cancer Subtypes Using Multiple Gene Expression Microarray Datasets. In: Wang, D., Reynolds, M. (eds) AI 2011: Advances in Artificial Intelligence. AI 2011. Lecture Notes in Computer Science, vol 7106. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25832-9_10
DOI: https://doi.org/10.1007/978-3-642-25832-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25831-2
Online ISBN: 978-3-642-25832-9
eBook Packages: Computer Science (R0)