Abstract
Over the past few years, the amount of information generated, consumed and stored on the Web has grown exponentially, making it impossible for users to keep up to date. Temporal data representation can help in this process by giving documents a sense of organization. Timelines are a natural way to showcase this data, giving users the chance to get familiar with a topic in a shorter amount of time. Despite their importance, little is known about their use in the context of single documents. In this paper, we present Time-Matters, a novel system for automatically and interactively exploring arbitrary texts through temporal narratives. It allows users to get insights into the relevant temporal happenings of a story through multiple components, including temporal annotation, storylines and temporal clustering. In contrast to classical multi-document timeline summarization tasks, we focus on summarizing single documents through a temporal lens. This approach may be of interest to a number of providers such as media outlets, for which automatically building a condensed overview of a text is an important issue.
1 Introduction
The recent abundance of textual content creates new challenges for those who want to quickly get insights without having to read entire documents. Much of this text is in free form, and extracting information from it requires computational resources capable of understanding natural language. Presenting text using temporal structures can help reduce the effort of the reader [4, 15]. For example, such structures can define the time period of events in news articles [18, 21], play an important role in communication platforms such as Twitter [1,2,3] or Wikipedia [13], and help contextualize historical texts [14] or legal documents [12]. Advances in these domains are partially due to the existence of temporal taggers, such as Heideltime [19] or SUTime [9]. Timelines appear in this context as a common approach that leverages the detected temporal signals to summarize, in temporal order, the information spread over multiple documents. However, little is known about their use in the scope of single documents [16, 20]. An optimal summary should cover all the important temporal aspects of a text while disregarding unimportant or irrelevant dates. Manually building these timelines is, however, a laborious and time-consuming task, and an impossible effort for average users or professionals interested in making sense of an increasing volume of textual data. This slows down the process of text analytics and data understanding. In this paper, we present Time-Matters, a novel system that gives users an automatic overview of the most important time-periods and associated text stories in a short amount of time, without having to read text-heavy documents. This can be very useful in several scenarios and domains and fits within the recent trend of automatically generating narratives from texts [8]. For instance, it may be of importance for media outlets [17] interested in telling stories and in reaching new audiences with alternative and appealing forms, but also for those interested in quickly extracting temporal information from long documents such as Wikipedia articles.
To accomplish this objective, we adapted a previously introduced version of Time-Matters [5], which worked over queries and multiple documents, to single texts. In particular, we aim to estimate the importance of the temporal expressions detected in a text and hence disregard the non-relevant ones. The goal is not only to provide a temporal annotation of the text with the corresponding scores given by the Time-Matters algorithm, but also to let users interact with the system through a temporal storyline component that shows the most important stories of a text. We do this in an interactive fashion that includes a timeline and graphical elements likely related to parts of the story. Further possibilities include exploring the most relevant stories of the text through temporal clustering. Another key aspect of our approach is that it is unsupervised and domain- and corpus-independent, as it does not require any training stage and builds upon local statistical text features extracted from single documents. Hence, it can readily be applied to any text. The core of Time-Matters is also mostly language-independent. While it relies on Heideltime [19] to detect temporal expressions, it can also use a simple rule-based approach (focused on year detection), which, while not as effective as Heideltime, may be a good solution when performance and language are an issue. As a contribution to the research community, we make available an online demo [http://time-matters.inesctec.pt], an API [http://time-matters.inesctec.pt/api], a Python package [https://github.com/LIAAD/Time-Matters] and a Docker image [https://hub.docker.com/r/liaad/time-matters] of Time-Matters. In addition, we make public a Python package wrapper for Heideltime [https://github.com/JMendes1995/py_heideltime], which aims to facilitate the use of this well-known temporal tagger.
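To make the idea of a rule-based approach focused on year detection more concrete, the following is a minimal sketch and not the rule set actually shipped with Time-Matters. The pattern `YEAR_PATTERN`, the helper `find_years` and the accepted range of years (1000 to 2099) are hypothetical choices made only for illustration.

```python
import re

# Hypothetical rule: a four-digit year between 1000 and 2099 that is not
# part of a longer digit sequence (e.g. a phone number or an identifier).
YEAR_PATTERN = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")


def find_years(text):
    """Return (year, start_offset) pairs for every candidate year in the text."""
    return [(match.group(1), match.start()) for match in YEAR_PATTERN.finditer(text)]


sample = "The 2010 Haiti earthquake was compared with the one recorded in 1564."
years = find_years(sample)
print([year for year, _ in years])
```

Such a rule is language-independent and fast, but it cannot resolve expressions such as "last Tuesday" or "the afternoon of February 11, 1975", which is where a full temporal tagger is needed.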
2 Time-Matters Algorithm
Our assumption is that the relevance of a candidate date \({d}_{j}\) may be determined with regard to the relevant terms \({W}_{j}^{*}\) that it co-occurs with in a given context (defined as a window of n terms in a sentence or the sentence itself). That is, the more a given candidate date is correlated with the most relevant keywords of a text \({t}_{i}\), the more relevant the candidate date is for the text at hand. To model this temporal relevance, we rely on the Generic Temporal Similarity measure (GTE) [5], which uses co-occurrences of keywords and temporal expressions to identify relevant dates within a text. In this work, relevant keyphrases and temporal expressions are detected by the YAKE! keyword extractor [6, 7] and the Heideltime temporal tagger [19], respectively. GTE is formalized in Eq. 1 and ranges between 0 (irrelevant) and 1 (relevant), where IS is the InfoSimba similarity measure [10].
A fully detailed description of the underlying scientific approach and the evaluation methodology for the study of queries and multiple documents can be found in Campos et al. [5]. Readers are also encouraged to consult our wiki documentation [https://github.com/LIAAD/Time-Matters/wiki] for an in-depth understanding of the single-document version explored in this demo.
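The co-occurrence assumption behind this section can be sketched with a deliberately simplified stand-in. The code below is not GTE and does not implement InfoSimba: it replaces them with a first-order Dice coefficient computed over sentence-level co-occurrences and aggregates the values with a median, both of which are assumptions made for illustration. The function names `dice` and `date_relevance` and the toy sentences are hypothetical.

```python
from statistics import median


def dice(term_a, term_b, sentences):
    """Dice coefficient of two terms, counting the sentences each one appears in."""
    with_a = sum(1 for s in sentences if term_a in s)
    with_b = sum(1 for s in sentences if term_b in s)
    both = sum(1 for s in sentences if term_a in s and term_b in s)
    return 2 * both / (with_a + with_b) if with_a + with_b else 0.0


def date_relevance(date, keywords, sentences):
    """Median Dice value between a date and the keywords it co-occurs with.

    Simplified stand-in for a relevance score in [0, 1]; a date that shares
    no sentence with any relevant keyword receives a score of 0.
    """
    co_occurring = [
        k for k in keywords if any(date in s and k in s for s in sentences)
    ]
    if not co_occurring:
        return 0.0
    return median(dice(date, k, sentences) for k in co_occurring)


# Toy document: each sentence is the set of keywords and dates already detected in it.
sentences = [
    {"2010", "earthquake", "haiti"},
    {"2010", "earthquake", "victims"},
    {"1975", "afternoon"},
    {"haiti", "earthquake", "anniversary"},
]
keywords = ["earthquake", "haiti", "victims"]

scores = {d: date_relevance(d, keywords, sentences) for d in ["2010", "1975"]}
print(scores)
```

In this toy example the date that repeatedly shares sentences with the relevant keywords receives a high score, whereas the date that appears only next to non-relevant terms is scored 0, which mirrors the intended behaviour of filtering out dates unrelated to the main story.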
3 Time-Matters Demonstration
We demonstrate our approach using an arbitrary text related to the 1st anniversary of the Haiti earthquake, held on January 12, 2011. Texts can be given as input in the homepage or as a URL, in which case we make use of the well-known Newspaper3k library [https://newspaper.readthedocs.io] to extract contents. The resulting interface is divided into five major components: “Annotated Text”, “Storyline”, “Temporal Clustering”, “Timeline” and “Scores”. For reasons of space, this paper emphasizes the first two, “Annotated Text” and “Storyline”.
Annotated Text.
Figure 1 shows the “Annotated Text” component. At the top, we can observe the time spent to obtain the results, the number of relevant annotated temporal expression instances and the text language. Time performance is highly dependent on the Heideltime component, as computing GTE scores is a quick process. Each date is tagged with a 5-color Likert relevance scale, from least relevant dates (bold red) to most relevant ones (bold green). To get a sense of the relevance of the dates, users can also mouse over a given temporal expression. By default, only relevant temporal expressions, those with GTE scores equal to or above 0.35 (according to the experiments conducted in [5]), are shown to the user. Scores close to 1 are considered highly relevant in the particular part of the text being analyzed. Equal date instances in different sentences can also result in different scores (one such approach can be explored in the advanced options section in the homepage). In addition to relevant dates, users can also ask for the least relevant ones (scores < 0.35), as exemplified in Fig. 1 for the temporal expression “the afternoon of February 11, 1975” (marked in bold red), which is shown with a score of 0. By doing this, we give users the opportunity to understand the effectiveness of the Time-Matters algorithm in filtering out non-relevant dates initially marked by the temporal tagger. One can also observe, marked in bold, the relevant keyphrases that co-occur with the date and contribute most to the results of Time-Matters. By default, n-grams are set to 1, meaning that keywords are formed by a single token only, though other options can be defined in the advanced options setting.
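The default filtering and colour tagging described above can be sketched as follows. Only the 0.35 threshold and the number of levels (five) come from the text; the equal-width binning used in `likert_level`, as well as the function names, are assumptions, since the exact colour boundaries used by the interface are not stated here. The two example scores are the ones reported in this paper.

```python
RELEVANCE_THRESHOLD = 0.35  # default cut-off stated in the text


def is_relevant(score, threshold=RELEVANCE_THRESHOLD):
    """A temporal expression is shown by default when its score reaches the threshold."""
    return score >= threshold


def likert_level(score, levels=5):
    """Map a score in [0, 1] to a level from 1 (least relevant) to `levels`.

    Equal-width bins are an assumption made for illustration only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return min(int(score * levels) + 1, levels)


# Scores reported in the paper for two temporal expressions.
example_scores = {
    "1564": 0.799,
    "the afternoon of February 11, 1975": 0.0,
}

shown_by_default = {d: s for d, s in example_scores.items() if is_relevant(s)}
levels = {d: likert_level(s) for d, s in example_scores.items()}
print(shown_by_default)
print(levels)
```

Under these assumptions the expression scored 0 falls into the lowest level and is hidden by default, and it becomes visible only when the user asks for the least relevant dates.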
Storyline Visualization.
The storyline interface (see Fig. 2) explores the different stories of a text through a temporal lens. The component at the top highlights the relevant date (“1564”), its score (“0.799”), the sentence where the date occurs and a summary of that particular part of the story (“great earthquake mentioned”) given by YAKE! [6]. The story is also illustrated automatically with images, retrieved through the image search API v1 [https://github.com/arquivo/pwa-technologies/wiki] of the Portuguese web archive Arquivo.pt [11]. While this API can obtain results for any language, it naturally works better for its native language, Portuguese. Users can then navigate between the different time-periods by clicking either on the right arrow (labelled in this figure example as “Recorded in Haiti, 2010”) or on the bottom timeline component, which gives, per se, a temporal overview of the story.
In this paper, we propose a simple yet effective approach for summarizing a text from a temporal perspective, highlighting its most important temporal aspects. As future research, we plan to investigate more elaborate solutions that study the correlation between the detected relevant dates and the relevant events found in the surroundings of each date. This can be used to improve not only the story description but also the retrieval of images.
References
Alonso, O., Shiells, K.: Timelines as summaries of popular scheduled events. In: Proceedings of the 22nd International Conference on World Wide Web (WWW 2013), Rio de Janeiro, Brazil, 13–17 May 2013, pp. 1037–1044 (2013)
Alonso, O., Tremblay, S.-E., Diaz, F.: Automatic generation of event timelines from social data. In: Proceedings of the 2017 ACM on Web Science Conference (WebSci 2017), New York, USA, 25–28 June 2017, pp. 207–211 (2017)
Alonso, O., Kandylas, V., Tremblay, S.-E.: How it happened: discovering and archiving the evolution of a story using social signals. In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2018), Texas, USA, 3–7 June 2018, pp. 193–202 (2018)
Campos, R., Dias, G., Jorge, A., Jatowt, A.: Survey of temporal information retrieval and related applications. ACM Comput. Surv. 47(2), Article 15 (2014)
Campos, R., Dias, G., Jorge, A.M., Nunes, C.: Identifying top relevant dates for implicit time sensitive queries. Inf. Retrieval J. 20(4), 363–398 (2017). https://doi.org/10.1007/s10791-017-9302-1
Campos, R., Mangaravite, V., Pasquali, A., Jorge, A.M., Nunes, C., Jatowt, A.: A text feature based automatic keyword extraction method for single documents. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds.) ECIR 2018. LNCS, vol. 10772, pp. 684–691. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-76941-7_63
Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., Jatowt, A.: YAKE! keyword extraction from single documents using multiple local features. Inf. Sci. J. 509, 257–289 (2020)
Campos, R., Jorge, A., Jatowt, A., Sumit, B.: Third International workshop on narrative extraction from texts (Text2Story’20). In: Jose, J., et al. (eds.) Proceedings of the 42nd European Conference on Information Retrieval (ECIR’20), pp. 648–653. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45442-5_86
Chang, A.X., Manning, C.D.: SUTIME: a library for recognizing and normalizing time expressions. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey, 23–25 May 2012, pp. 3735–3740 (2012)
Dias, G., Alves, E., Lopes, J.: Topic segmentation algorithms for text summarization and passage retrieval: an exhaustive evaluation. In: Proceedings of the 22nd Conference on Artificial Intelligence (AAAI 2007), Vancouver, Canada, 22–26 July 2007, pp. 1334–1340. AAAI Press (2007)
Gomes, D., Cruz, D., Miranda, J., Costa, M., Fontes, S.: Search the past with the Portuguese web archive. In: Proceedings of the 22nd International Conference on World Wide Web (WWW 2013), Rio de Janeiro, Brazil, 13–17 May 2013, pp. 321–324 (2013)
Hausner, P., Aumiller, D., Gertz, M.: Time-centric exploration of court documents. In: Proceedings of the 3rd International Workshop on Narrative Extraction from Texts (Text2Story20@ECIR 2020), Lisbon, Portugal, 14 April 2020, pp. 31–37 (2020)
Hausner, P., Aumiller, D., Gertz, M.: TiCCo: time-centric content exploration. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM 2020), Virtual Event, Ireland, 19–23 October 2020, pp. 3413–3416. ACM Press (2020)
Jatowt, A., Campos, R., Bhowmick, S., Doucet, A.: Document in context of time (DICT): system that provides temporal context for analyzing old documents. In: Proceedings of the 28th ACM International Conference on Information & Knowledge Management (CIKM 2019), Beijing, China, 03–07 November 2019, pp. 2869–2872. ACM Press (2019)
Kanhabua, N., Blanco, R., Nørvåg, K.: Temporal information retrieval. Found. Trends Inf. Retrieval 9(2), 91–208 (2015)
Kanhabua, N., Romano, S., Stewart, A.: Identifying relevant temporal expressions for real-world events. In: Proceedings of the Workshop on Time-aware Information Access (TAIA’12@SIGIR’12), Portland, USA, 12–16 August 2012 (2012)
Martinez-Alvarez, M., et al.: First international workshop on recent trends in news information retrieval (NewsIR’16). In: Ferro, N., et al. (eds.) ECIR 2016. LNCS, vol. 9626, pp. 878–882. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30671-1_85
Pasquali, A., Mangaravite, V., Campos, R., Jorge, A.M., Jatowt, A.: Interactive system for automatically generating temporal narratives. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds.) ECIR 2019. LNCS, vol. 11438, pp. 251–255. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-15719-7_34
Strötgen, J., Gertz, M.: Multilingual and cross-domain temporal tagging. Lang. Resour. Eval. 47(2), 269–298 (2013)
Strötgen, J., Alonso, O., Gertz, M.: Identification of top relevant temporal expressions in documents. In: Proceedings of the 2nd Temporal Web Analytics Workshop (TempWeb12@WWW’12), Lyon, France, 17 April 2012, pp. 33–40 (2012)
Tran, G., Alrifai, M., Herder, E.: Timeline summarization from relevant headlines. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds.) ECIR 2015. LNCS, vol. 9022, pp. 245–256. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16354-3_26
Acknowledgements
Ricardo Campos and Alípio Jorge were financed by the ERDF – European Regional Development Fund through the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 and by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia within project PTDC/CCI-COM/31857/2017 (NORTE-01-0145-FEDER-03185). This funding fits under the research line of the Text2Story project. Célia Nunes was financed by the Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) through projects UIDB/00212/2020.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Campos, R. et al. (2021). Time-Matters: Temporal Unfolding of Texts. In: Hiemstra, D., Moens, MF., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science(), vol 12657. Springer, Cham. https://doi.org/10.1007/978-3-030-72240-1_53
DOI: https://doi.org/10.1007/978-3-030-72240-1_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72239-5
Online ISBN: 978-3-030-72240-1
eBook Packages: Computer Science, Computer Science (R0)