Abstract
With the rapid development of information technology and the huge amount of information now available, computers have become a fundamental tool for organizing and classifying electronic texts. Existing text mining methodologies apply standard clustering algorithms to group similar texts. However, these algorithms generally consider only global similarities between texts and assign each text to a single cluster, which limits the amount of information that can be extracted. Biclustering is an alternative that addresses these drawbacks: it clusters rows and columns simultaneously, allowing a more comprehensive analysis of the texts. The main contribution of this paper is an immune-inspired biclustering algorithm for text mining, denoted BIC-aiNet. BIC-aiNet interprets the biclustering problem as several two-way bipartition problems rather than as a single two-way permutation framework. The experimental results indicate that the proposal groups similar texts efficiently and extracts useful implicit information from groups of texts.
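To make the idea of clustering rows and columns simultaneously concrete, the following is a minimal sketch on a toy document-term matrix. It is not BIC-aiNet: the data, the helper names `submatrix` and `density`, and the two hand-picked biclusters are all assumptions made for illustration. It only shows that a bicluster is a pair (subset of texts, subset of terms) and that one text may belong to more than one bicluster.

```python
# Toy document-term matrix: rows are texts, columns are terms (1 = term occurs).
# The data and all names here are hypothetical, chosen only for illustration.
TERMS = ["immune", "antibody", "cluster", "matrix", "goal"]
DOCS = [
    [1, 1, 0, 0, 0],  # text 0: immunology
    [1, 1, 1, 1, 0],  # text 1: immunology and clustering
    [0, 0, 1, 1, 0],  # text 2: clustering
    [0, 0, 0, 0, 1],  # text 3: sports
]


def submatrix(matrix, rows, cols):
    """Return the entries of `matrix` restricted to the given rows and columns."""
    return [[matrix[r][c] for c in cols] for r in rows]


def density(block):
    """Fraction of nonzero entries in a block (1.0 = every text has every term)."""
    cells = [v for row in block for v in row]
    return sum(1 for v in cells if v) / len(cells)


# A bicluster is a pair (subset of rows, subset of columns), picked by hand here.
bicluster_a = ([0, 1], [0, 1])  # texts 0-1 on the terms "immune", "antibody"
bicluster_b = ([1, 2], [2, 3])  # texts 1-2 on the terms "cluster", "matrix"

density_a = density(submatrix(DOCS, *bicluster_a))
density_b = density(submatrix(DOCS, *bicluster_b))

# Text 1 appears in both biclusters, which a one-cluster-per-text
# assignment cannot express.
shared_docs = set(bicluster_a[0]) & set(bicluster_b[0])

print(density_a, density_b, sorted(shared_docs))
```

Both hand-picked blocks are fully dense even though the matrix as a whole is sparse, which illustrates the kind of local, term-specific similarity that a comparison over all columns would dilute.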
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de Castro, P.A.D., de França, F.O., Ferreira, H.M., Von Zuben, F.J. (2007). Applying Biclustering to Text Mining: An Immune-Inspired Approach. In: de Castro, L.N., Von Zuben, F.J., Knidel, H. (eds) Artificial Immune Systems. ICARIS 2007. Lecture Notes in Computer Science, vol 4628. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73922-7_8
DOI: https://doi.org/10.1007/978-3-540-73922-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73921-0
Online ISBN: 978-3-540-73922-7
eBook Packages: Computer Science, Computer Science (R0)