Abstract
The task of automatic text summarization consists of generating a summary of an original text that lets the user obtain the main pieces of information in that text with a much shorter reading time. This is an increasingly important task in the current era of information overload, given the huge amount of text available in documents. In this paper, automatic text summarization is cast as a classification (supervised learning) problem, so that machine learning-oriented classification methods are used to produce summaries for documents based on a set of attributes describing those documents. The goal of the paper is to investigate the effectiveness of Genetic Algorithm (GA)-based attribute selection in improving the performance of classification algorithms solving the automatic text summarization task. Computational results are reported for experiments with a document base formed by news extracted from The Wall Street Journal of the TIPSTER collection, which is often used as a benchmark in the text summarization literature.
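The idea of GA-based attribute selection for a sentence classifier can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`nearest_centroid_accuracy`, `ga_select`), the GA settings, the nearest-centroid classifier used as a stand-in for the classification algorithms, and the synthetic data, in which each row plays the role of a sentence described by four numeric attributes and the label marks whether the sentence belongs in the summary.

```python
import random


def nearest_centroid_accuracy(rows, labels, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the attributes whose mask bit is 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty attribute subset cannot classify anything
    correct = 0
    for i in range(len(rows)):
        sums, counts = {}, {}
        for k, (row, lab) in enumerate(zip(rows, labels)):
            if k == i:
                continue  # hold out sentence i
            acc = sums.setdefault(lab, [0.0] * len(cols))
            for c, j in enumerate(cols):
                acc[c] += row[j]
            counts[lab] = counts.get(lab, 0) + 1

        def dist(lab):
            # squared distance from the held-out row to the class centroid
            return sum((rows[i][j] - sums[lab][c] / counts[lab]) ** 2
                       for c, j in enumerate(cols))

        correct += min(sums, key=dist) == labels[i]
    return correct / len(rows)


def ga_select(rows, labels, generations=20, pop_size=12, p_mut=0.1, seed=0):
    """Toy GA over attribute bit masks: binary tournament selection,
    one-point crossover, bit-flip mutation, and elitism."""
    rng = random.Random(seed)
    n = len(rows[0])
    cache = {}

    def fitness(mask):
        if mask not in cache:
            cache[mask] = nearest_centroid_accuracy(rows, labels, mask)
        return cache[mask]

    # Start from the full attribute set plus random subsets.
    pop = [tuple([1] * n)]
    pop += [tuple(rng.randint(0, 1) for _ in range(n))
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)]  # elitism: keep the best mask
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child = tuple(bit ^ (rng.random() < p_mut) for bit in child)
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)


# Synthetic data (assumption): attributes 0 and 1 carry signal, 2 and 3 are noise.
data_rng = random.Random(1)
rows, labels = [], []
for i in range(30):
    lab = i % 2
    rows.append([lab + data_rng.gauss(0, 0.3), lab + data_rng.gauss(0, 0.3),
                 data_rng.gauss(0, 1), data_rng.gauss(0, 1)])
    labels.append(lab)

best_mask, best_acc = ga_select(rows, labels)
baseline = nearest_centroid_accuracy(rows, labels, (1, 1, 1, 1))
print("selected mask:", best_mask, "accuracy:", best_acc, "all attributes:", baseline)
```

Because the initial population contains the full attribute set and the best mask is always carried over, the selected subset in this sketch can never score worse than using all attributes on the same evaluation. A realistic study would evaluate the selected subset on held-out documents rather than on the data used to guide the search.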
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Silla, C.N., Pappa, G.L., Freitas, A.A., Kaestner, C.A.A. (2004). Automatic Text Summarization with Genetic Algorithm-Based Attribute Selection. In: Lemaître, C., Reyes, C.A., González, J.A. (eds) Advances in Artificial Intelligence – IBERAMIA 2004. IBERAMIA 2004. Lecture Notes in Computer Science, vol 3315. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30498-2_31
DOI: https://doi.org/10.1007/978-3-540-30498-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23806-5
Online ISBN: 978-3-540-30498-2
eBook Packages: Springer Book Archive