Abstract
Biclustering evolved from traditional clustering techniques, which have proved inadequate for discovering local patterns in gene microarray data, in particular shifting and scaling patterns. In this work we compare the similarity measures applied in different biclustering algorithms and review the validation methodologies described in the literature. To the best of our knowledge, this is the first in-depth comparative analysis of proximity measures and validation techniques for biclustering. We present current trends in the design of similarity measures, together with a rich collection of state-of-the-art benchmark datasets, to support algorithm designers in classifying the comparison and quality assessment criteria for emerging biclustering algorithms.
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Orzechowski, P. (2013). Proximity Measures and Results Validation in Biclustering – A Survey. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2013. Lecture Notes in Computer Science, vol 7895. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38610-7_20
DOI: https://doi.org/10.1007/978-3-642-38610-7_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38609-1
Online ISBN: 978-3-642-38610-7
eBook Packages: Computer Science, Computer Science (R0)