When Was It Written? Automatically Determining Publication Dates

Garcia-Fernandez, Anne; Ligozat, Anne-Laure; Dinarelli, Marco; Bernhard, Delphine

doi:10.1007/978-3-642-24583-1_22

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7024)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

Automatically determining the publication date of a document is a complex task, since a document may contain only a few intra-textual hints about its publication date. Yet, it has many important applications. Indeed, the number of digitized historical documents is constantly increasing, but their publication dates are not always properly identified via OCR acquisition. Accurate knowledge about publication dates is crucial for many applications, e.g. studying the evolution of document topics over a certain period of time.

In this article, we present a method for automatically determining the publication dates of documents, which was evaluated on a French newspaper corpus in the context of the DEFT 2011 evaluation campaign. Our system is based on a combination of different individual systems, relying both on supervised and unsupervised learning, and uses several external resources, e.g. Wikipedia, Google Books Ngrams, and etymological background knowledge about the French language. Our system detects the correct year of publication in 10% of the cases for 300-word excerpts and in 14% of the cases for 500-word excerpts, which is very promising given the complexity of the task.
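
As a rough illustration of what combining individual systems and scoring by exact year can look like, the following minimal Python sketch merges per-year scores from several component systems and measures the share of excerpts whose predicted year is exactly right. It is not the authors' implementation: the abstract does not say how the systems are combined, so the weighted sum of normalized scores, the function names, the system names ("lm", "etym"), the weights and the toy scores are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize(scores):
    """Rescale one system's per-year scores so that they sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        return {year: 0.0 for year in scores}
    return {year: s / total for year, s in scores.items()}


def combine(system_scores, weights):
    """Weighted sum of normalized per-year scores (assumed combination rule)."""
    combined = defaultdict(float)
    for name, scores in system_scores.items():
        for year, s in normalize(scores).items():
            combined[year] += weights.get(name, 1.0) * s
    return dict(combined)


def predict_year(system_scores, weights):
    """Return the best-scoring year; ties go to the earliest year."""
    combined = combine(system_scores, weights)
    return min(combined, key=lambda year: (-combined[year], year))


def exact_year_accuracy(predictions, gold):
    """Fraction of documents whose predicted year equals the true year."""
    correct = sum(1 for p, g in zip(predictions, gold) if p == g)
    return correct / len(gold)


# Toy scores for one excerpt from two hypothetical component systems.
toy_scores = {
    "lm": {1900: 2.0, 1901: 1.0, 1902: 1.0},
    "etym": {1900: 1.0, 1901: 3.0, 1902: 0.0},
}
toy_weights = {"lm": 1.0, "etym": 1.0}
predicted = predict_year(toy_scores, toy_weights)
```

Normalizing each system's scores before summing keeps a system with a larger raw score range from dominating the others. In practice the weights would be tuned on held-out data.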

Author information

Authors and Affiliations

LIMSI-CNRS, Orsay, France
Anne Garcia-Fernandez, Anne-Laure Ligozat, Marco Dinarelli & Delphine Bernhard
ENSIIE, Evry, France
Anne-Laure Ligozat

Editor information

Editors and Affiliations

Università di Pisa, Italy
Roberto Grossi
Consiglio Nazionale delle Ricerche, Area della Ricerca di Pisa, Istituto di Scienza e Tecnologia dell’Informazione “Alessandro Faedo”, Via Giuseppe Moruzzi 1, 56124, Pisa, Italy
Fabrizio Sebastiani & Fabrizio Silvestri

About this paper

Cite this paper

Garcia-Fernandez, A., Ligozat, AL., Dinarelli, M., Bernhard, D. (2011). When Was It Written? Automatically Determining Publication Dates. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds) String Processing and Information Retrieval. SPIRE 2011. Lecture Notes in Computer Science, vol 7024. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24583-1_22

DOI: https://doi.org/10.1007/978-3-642-24583-1_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24582-4
Online ISBN: 978-3-642-24583-1
eBook Packages: Computer Science, Computer Science (R0)
