Abstract
In this work, we introduce a software testbed for the temporal processing of Portuguese texts. It is composed of several building blocks: identification, classification, and resolution of temporal expressions, and temporal text segmentation. Starting from a simple document, the testbed produces a set of temporally annotated segments, which makes it possible to establish relationships between words and time. This temporally enriched information is then fed into an Information Retrieval system. This work represents a step forward for Portuguese language processing, an area with a notorious lack of tools. Its main novelty is the temporal segmentation of texts. Although the target application is temporally aware Information Retrieval, the described software tools can be used in other application scenarios.
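To make the building blocks concrete, here is a minimal sketch of such a pipeline. It is not the authors' implementation: the regular expression, the month lexicon, the `Segment` class, and the rule "start a new segment whenever a sentence mentions a different date" are all assumptions chosen for illustration. The classification step is omitted.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical month lexicon; a real system would need far broader coverage.
MONTHS = {
    "janeiro": 1, "fevereiro": 2, "março": 3, "abril": 4,
    "maio": 5, "junho": 6, "julho": 7, "agosto": 8,
    "setembro": 9, "outubro": 10, "novembro": 11, "dezembro": 12,
}

# Identification (assumed pattern): matches "25 de abril de 1974"
# or a bare four-digit year such as "1986".
DATE_RE = re.compile(
    r"\b(?:(\d{1,2}) de (" + "|".join(MONTHS) + r") de )?(\d{4})\b",
    re.IGNORECASE,
)


@dataclass
class Segment:
    date: Optional[str]  # normalized date, or None if no date was seen yet
    sentences: List[str] = field(default_factory=list)


def resolve(match: "re.Match[str]") -> str:
    """Resolution: normalize a matched expression to YYYY-MM-DD or YYYY."""
    day, month, year = match.groups()
    if day and month:
        return f"{int(year):04d}-{MONTHS[month.lower()]:02d}-{int(day):02d}"
    return year


def segment(text: str) -> List[Segment]:
    """Segmentation (assumed rule): open a new segment at each new date."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    segments: List[Segment] = []
    for sentence in sentences:
        match = DATE_RE.search(sentence)
        date = resolve(match) if match else None
        if not segments or (date is not None and date != segments[-1].date):
            segments.append(Segment(date))
        # Sentences without a date inherit the date of the current segment.
        segments[-1].sentences.append(sentence)
    return segments


if __name__ == "__main__":
    text = (
        "A revolução ocorreu em 25 de abril de 1974. "
        "O regime caiu nesse dia. "
        "Portugal aderiu à CEE em 1986."
    )
    for seg in segment(text):
        print(seg.date, seg.sentences)
```

Each resulting segment pairs a normalized date with the sentences it covers, which is the kind of word-to-time relationship an Information Retrieval index could then store.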
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Craveiro, O., Macedo, J., Madeira, H. (2012). It Is the Time for Portuguese Texts!. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28885-2_12
DOI: https://doi.org/10.1007/978-3-642-28885-2_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28884-5
Online ISBN: 978-3-642-28885-2
eBook Packages: Computer Science, Computer Science (R0)