
1 Introduction

Digital libraries (DLs) are information systems that offer services over large sets of digital objects [23]. The traditional search functionalities of DLs, such as Europeana (Footnote 1), assume that users express their information need through a natural language query, to which the digital library responds with a ranked list of digital objects. This approach works well on the Web, which may be regarded as a very large DL whose objects are pages rich in textual content, images and videos, and interlinked with each other. By contrast, this type of search functionality performs poorly on most DLs. The reason is that the digital objects they contain (e.g. representations of books, manuscripts, photographs, videos) are not meant to be read and navigated on the fly like Web pages, and search relies only on the metadata associated with the objects, which are semantically poor. As a result, the response to a Web-like query on a digital library is typically a ranked list of metadata descriptors. In our study, we aim to overcome the limitations of the search functionality of current DLs by introducing a new first-class search functionality: the narrative. The vision is that a user searching, e.g., for Dante Alighieri, the major Italian poet of the late Middle Ages, in Europeana would obtain in response not a ranked list of objects concerning Dante Alighieri but rather a narrative about Dante, made up of the events that compose his biography, linked to the relevant objects of the digital library that contextualize them.

First of all, in order to introduce this new search functionality and to develop an ontology for representing narratives, we studied the Artificial Intelligence (AI) literature, in particular the Event Calculus theory, to identify the logical components of narratives (e.g. events, actions, fluents, physical objects, agents) and to give their formal definitions. We then mapped these logical components onto the terms of the CIDOC CRM ontology [8] to evaluate whether it could be adopted as a reference vocabulary. In this paper we report the results of the study of the AI literature and of the mapping activity.

The paper is structured as follows: Sect. 2 gives an overview of related work; Sect. 3 presents the analysis of the AI literature aimed at identifying the formal components of narratives; Sect. 4 reports a mapping between the formal components of narratives and the CIDOC CRM, in order to evaluate whether it could serve as a reference ontology for representing narratives; Sect. 5 briefly discusses the results of the mapping activity. Finally, Sect. 6 presents our conclusions and future work.

2 Related Works

In literary theory, narratology is the discipline devoted to the study of narrative structure and of the logic, principles and practices of its representation. Computational Narratology studies narratives from a computational perspective; in particular, it focuses on “the algorithmic processes involved in creating and interpreting narratives, modelling narrative structure in terms of formal computable representations” [10]. The term “Computational Narratology” (CN) can assume different meanings according to the research context. In the Humanities, computational narratology is defined as a methodological instrument for constructing narratological theories, extending narratological models to larger bodies of text, and providing precise and consistent explication of concepts [24]. From a cognitive computing point of view, the term refers to narrative texts, computer games and, more generally, software developed using semiotic, sociolinguistic and cognitive linguistic theories [12]. In the Artificial Intelligence field, computational narratology refers to story generation systems, i.e. any computer application that creates a written, spoken or visual presentation of a story. Storytelling systems aim at reproducing human-like narrative behaviour or at creating interfaces or game environments that use narrative as an interaction method. Some of the early storytelling systems are TALE-SPIN [22], UNIVERSE [17], GESTER [29] and JOSEPH [15], which modifies story grammars to create new stories. Other storytelling systems are MINSTREL [40], MEXICA [30] and BRUTUS [3], hybrid systems that implement a computer model of creativity in writing. More recently, ontologies have been used to generate narratives. For example, MAKEBELIEVE [18] uses common-sense knowledge, selected from the ontology of the Open Mind Commonsense Knowledge Base [39], to generate short stories from an initial one given by the user. ProtoPropp [11] uses an ontology of explicitly relevant knowledge and applies Case-Based Reasoning over a defined set of tales. In FABULIST [33] the user supplies a description of an initial state of the world and a specific goal, and the system identifies the best sequence of actions to reach the goal.

The concept of event is a core element of narratology theory and of narratives. People conventionally refer to an event as an occurrence taking place at a certain time in a specific location. Various models have been developed for representing events on the Semantic Web, e.g. the Event Ontology [32], Linking Open Descriptions of Events (LODE) [37] and the F-Model [36]. More general models for semantic data organization are the CIDOC CRM [8] and the Europeana Data Model [7]. Narratives have recently been proposed to enhance the information content and functionalities of DLs, with special emphasis on information discovery and exploration. For example, the CultureSampo project [13] developed an application, based on Semantic Web technologies, to explore Finnish cultural heritage content on the Web. This system uses an event-based model and links events to digital objects. However, it does not allow users to visualize the events and the related digital objects as a semantic network endowed with the semantic relations that connect events and objects. Another example is Bletchley Park Text [28], a semantic application that helps users explore museum collections. Visitors express their interest in specific topics by sending SMS messages containing keywords, and the semantic description of the resources is used to organize a collection into a personalized website based on the chosen topics. The PATHS project [9] created a system that acts as an interactive, personalized tour guide through existing digital library collections; in this system, events are linked by inherence relations. Similarly to the PATHS project, the CULTURA project [1] developed a tool to enrich cultural heritage collections with guided paths in the form of short lessons called narratives. The Storyspace system [42] allows users to describe stories based on events that span museum objects. The system focuses on the creation of curatorial narratives from an exhibition: each digital object has a linked creation event in its associated heritage object story.

The OntoMedia ontology [16] allows the narrative content of heterogeneous media (e.g. literary texts, TV programmes) to be annotated by describing the semantic content of those media. The representation may be limited to the description of some or all of the elements contained within the source, or may include information regarding the narrative relationships that these elements have both to the media and to each other. Another example is the tool developed within the Cadmos project [19], which supports computer-assisted semantic annotation of narrative media objects (video, text, audio, etc.) and integrates a large commonsense ontological knowledge base. A narrative ontological model has also been developed by the Labyrinth Project [5]: the Labyrinth system allows users to explore digital cultural heritage archives and is based on narrative relations among knowledge resources.

3 Logic Definitions of the Components of Narratives

In this section we report the formal logical definitions of the components of narratives as given in the Event Calculus theory, with a brief mention of the Situation Calculus as related background.

The Situation Calculus (SC) is a logic language for representing and reasoning about dynamical domains [20, 21]. In dynamical domains the scenarios change because of the actions performed by the agents. A dynamic world is modelled as a series of situations resulting from actions performed in the world. SC represents changing scenarios as a set of first-order formulae, sometimes enriched with some second-order features [41]. The basic elements of the calculus are:

  • Situations. A situation represents a sequence of actions and can be seen as the state resulting from these actions. Sequences of actions are represented using the function symbol do, so that do(a, s) denotes the new situation that results from performing action a in situation s.

  • Fluents. Fluents are functions and predicates that vary over situations (e.g. the location of the agent). Fluents are situation-dependent components used to describe the effects of actions. They are either relational or functional: the former take only two values, true or false, while the latter can take a range of values. By convention, the situation is the last argument of a fluent [21], e.g. Holding(G1, S0), where S0 is the situation.

  • Actions. Actions are changes, performed by agents, that take a dynamic world from one situation to another. In the simplest version of the Situation Calculus, each action can be described using two axioms: (i) the Possibility Axiom, which specifies when an action can be performed; (ii) the Effect Axiom, which defines the consequences of an executed action.
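
The Situation Calculus elements above can be made concrete with a small Python sketch. It is an illustration under our own assumptions, not part of the formal theory: a situation is encoded as the tuple of actions performed since S0, so `do(a, s)` simply extends that history; the actions `pickup` and `drop`, the objects `G1`/`G2` and the single relational fluent `holding` are hypothetical, and the effect axioms are combined with a simple frame assumption (unaffected fluents keep their value).

```python
from typing import Tuple

# An action is a (name, object) pair, e.g. ("pickup", "G1"); names are hypothetical.
Action = Tuple[str, str]
# A situation is identified with the history of actions that produced it.
Situation = Tuple[Action, ...]

S0: Situation = ()  # initial situation: no action performed yet


def do(a: Action, s: Situation) -> Situation:
    """do(a, s): the situation resulting from performing action a in situation s."""
    return s + (a,)


def holding(obj: str, s: Situation) -> bool:
    """Relational fluent Holding(obj, s); by convention the situation comes last."""
    if s == S0:
        return False  # nothing is held in the initial situation
    *earlier, (name, target) = s
    # Effect axioms: pickup(obj) makes Holding true, drop(obj) makes it false.
    if target == obj and name == "pickup":
        return True
    if target == obj and name == "drop":
        return False
    # Frame assumption: any other action leaves the fluent unchanged.
    return holding(obj, tuple(earlier))


def possible(a: Action, s: Situation) -> bool:
    """Possibility axioms: drop only what is held; pick up only with empty hands."""
    name, target = a
    if name == "drop":
        return holding(target, s)
    if name == "pickup":
        return not any(holding(o, s) for _, o in s)
    return False


s1 = do(("pickup", "G1"), S0)
s2 = do(("drop", "G1"), s1)
print(holding("G1", S0), holding("G1", s1), holding("G1", s2))  # False True False
```

Encoding a situation as its action history mirrors the reading of a situation as a sequence of actions; a state-based encoding would instead need explicit update rules for every fluent.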

SC works well when there is a single agent performing instantaneous, discrete actions. When actions have duration and can overlap with each other, an alternative formalism is the Event Calculus (EC) [14, 26, 27], which is used for reasoning about actions and change and is based on time points rather than on situations. EC allows reasoning over intervals of time, and fluents are time-dependent rather than situation-dependent. EC axioms define a fluent as true at a point in time if “the fluent was initiated by an event at some time in the past and was not terminated by an intervening event” [34].
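
The Event Calculus axiom quoted above can be sketched in Python over discrete time points. This is a simplified illustration, not a full EC implementation: the narrative is a list of `(event, time)` occurrences playing the role of the Happens predicate, the `INITIATES` and `TERMINATES` tables are hypothetical domain knowledge, and refinements such as fluents that hold initially are left out. The event names are hypothetical identifiers attached, for illustration only, to well-known dates of Dante's life (birth 1265, exile 1302, death 1321).

```python
from typing import Dict, List, Set, Tuple

# Narrative: Happens(event, time) facts; the specification may be incomplete.
NARRATIVE: List[Tuple[str, int]] = [("birth", 1265), ("exile", 1302), ("death", 1321)]

# Hypothetical domain knowledge: fluents initiated/terminated by each event.
INITIATES: Dict[str, Set[str]] = {"birth": {"alive"}, "exile": {"in_exile"}}
TERMINATES: Dict[str, Set[str]] = {"death": {"alive", "in_exile"}}


def clipped(t1: int, fluent: str, t2: int, narrative: List[Tuple[str, int]]) -> bool:
    """Clipped(t1, f, t2): some event terminates f strictly between t1 and t2."""
    return any(
        fluent in TERMINATES.get(e, set()) and t1 < t < t2
        for e, t in narrative
    )


def holds_at(fluent: str, t: int, narrative: List[Tuple[str, int]]) -> bool:
    """HoldsAt(f, t): f was initiated at some t1 < t and not terminated since."""
    return any(
        fluent in INITIATES.get(e, set())
        and t1 < t
        and not clipped(t1, fluent, t, narrative)
        for e, t1 in narrative
    )


print(holds_at("in_exile", 1310, NARRATIVE))  # True
print(holds_at("alive", 1330, NARRATIVE))     # False
```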

Davidson [6] defines actions as a subclass of events. In Davidson’s view, what distinguishes actions from events in general is their intentionality, i.e. an action is performed by an agent for a reason.

Like SC, the Event Calculus has actions. However, Davidson’s distinction between events and actions is absent: in EC, actions are events. The following list reports the logical definitions of the components of narratives that are of interest for our representation.

  • Generalized events. In a setting where actions and objects are aspects of a physical universe with spatial and temporal dimensions, a generalized event is a space-time chunk. This abstraction allows concepts such as actions, locations, times, fluents and physical objects to be treated as generalized events.

  • Mental events and mental objects. Relations between an agent and “mental objects”, such as Believes, Knows and Wants, are called propositional attitudes, because they express attitudes that an agent can have towards a proposition [34]. Using reification, a proposition can be turned into an object that can become an argument of a predicate (because only terms, not sentences, can be arguments of predicates).

  • Narrative. As reported in [41], a narrative is a possibly incomplete specification of a set of actual event occurrences [25, 35]. The EC is narrative-based, unlike the standard SC, which represents an exact sequence of hypothetical actions.
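
The reification step described under mental events can be illustrated in Python under our own simplifying assumptions: a proposition is a frozen (hence hashable and value-compared) dataclass, so it behaves as a term that can be stored and passed as an argument to a propositional attitude such as `believes`. The `Proposition` and `Agent` classes, the agent and the example belief are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass(frozen=True)
class Proposition:
    """A reified proposition: a sentence turned into a term, so it can be an argument."""
    predicate: str
    args: Tuple[str, ...]


@dataclass
class Agent:
    name: str
    beliefs: Set[Proposition] = field(default_factory=set)


def believes(agent: Agent, p: Proposition) -> bool:
    """Propositional attitude Believes(agent, p): p is an object, not a sentence."""
    return p in agent.beliefs


scholar = Agent("scholar")  # hypothetical agent
born_in_florence = Proposition("BornIn", ("Dante", "Florence"))
scholar.beliefs.add(born_in_florence)
print(believes(scholar, born_in_florence))                             # True
print(believes(scholar, Proposition("BornIn", ("Dante", "Ravenna"))))  # False
```

Because propositions compare by value, two separately built `Proposition` objects with the same predicate and arguments denote the same reified proposition.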

Following narratological theory [31, 38], we envisage a narrative as consisting of two main elements: the fabula and one or more narrations of the fabula. The fabula is built on top of basic events (including actions), endowed with:

  • a mereological relation, relating events to the sub-events that compose them.

  • a temporal occurrence relation, associating each event with a time interval during which the event occurs; an event occurs before (or during, or after) another event just in case the period of occurrence of the former event is before (or during, or after) the period of occurrence of the latter event.

  • a causality relation, relating events that in normal discourse are said to stand in a cause-effect relation.
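
A fabula with these three relations can be sketched as a small Python data structure. This is a hypothetical illustration, not the model proposed in this study: each event stores its period of occurrence as a `(start, end)` pair of years, the names of its sub-events and the names of the events it causes; before/during/after are derived from the periods as stated in the temporal occurrence relation, with "during" read as inclusive containment (a simplification). Event names and values are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Event:
    name: str
    interval: Tuple[int, int]  # period of occurrence as (start, end)
    parts: List[str] = field(default_factory=list)   # mereology: names of sub-events
    causes: List[str] = field(default_factory=list)  # causality: names of caused events
    is_action: bool = False  # actions are the intentional subset of events


def before(a: Event, b: Event) -> bool:
    """a occurs before b iff a's period ends before b's period starts."""
    return a.interval[1] < b.interval[0]


def during(a: Event, b: Event) -> bool:
    """a occurs during b iff a's period is contained in b's (inclusive bounds)."""
    return b.interval[0] <= a.interval[0] and a.interval[1] <= b.interval[1]


def after(a: Event, b: Event) -> bool:
    """a occurs after b iff b occurs before a."""
    return before(b, a)


@dataclass
class Fabula:
    events: Dict[str, Event] = field(default_factory=dict)

    def add(self, e: Event) -> None:
        self.events[e.name] = e

    def all_parts(self, name: str) -> List[str]:
        """Transitive closure of the mereological relation (assumed acyclic)."""
        result: List[str] = []
        for p in self.events[name].parts:
            result.append(p)
            result.extend(self.all_parts(p))
        return result


fabula = Fabula()
fabula.add(Event("life", (1265, 1321), parts=["birth", "exile"]))
fabula.add(Event("birth", (1265, 1265)))
fabula.add(Event("exile", (1302, 1321)))
print(fabula.all_parts("life"))  # ['birth', 'exile']
```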

4 CRM Ontology Mapping

In order to develop a semantic model for representing narratives, on top of which the new search functionality for DLs can be built, we evaluated the CIDOC Conceptual Reference Model (CRM) as a reference ontology. The CIDOC CRM (CRM for short) is a high-level ontology and an ISO standard (Footnote 2) that supports the integration of cultural heritage data and their correlation with the knowledge stored in libraries and archives [8]. The CRM promotes a shared understanding of cultural heritage information through a semantic framework that any cultural heritage organization can use to map its cultural objects. The evaluation was based on a mapping between the logical components of narratives and the definitions of the terms included in the CRM. The result of the mapping is reported below; the definitions are taken from the official CRM documentation (Footnote 3).

  • Event. In the CRM, the class E5 Event corresponds to the definition of event in the EC theory. This class “comprises changes of states in cultural, social or physical systems, regardless of scale, brought about by a series or group of coherent physical, cultural, technological or legal phenomena. Such changes of state will affect instances of E77 Persistent Item or its subclasses”.

  • Action. Actions identified by Davidson correspond to the class E7 Activity in the CRM. “This class comprises actions intentionally carried out by an actor that result in changes of state in the cultural, social, or physical systems documented. This notion includes complex, composite and long-lasting actions such as the building of a settlement or a war, as well as simple, short-lived actions such as the opening of a door”.

In order to refine our mapping, we analysed the individual types of generalized events that are useful for representing the “factual” components of events, and mapped them onto CRM classes.

  • Agent. The CRM uses the class E39 Actor to represent people, either individually or in groups, who have the potential to perform intentional actions.

  • Location. This concept is represented in the CRM through the class E53 Place. “This class comprises extents in space, in particular on the surface of the earth, in the pure sense of physics: independent from temporal phenomena and matter”.

  • Time. CRM uses the class E52 Time-Span to represent this concept. “This class comprises abstract temporal extents, in the sense of Galilean physics, having a beginning, an end and a duration. Time Span has no other semantic connotations”.

  • Physical Objects. In the CRM the class E18 Physical Thing describes “all persistent physical items with a relatively stable form, man-made or natural. Depending on the existence of natural boundaries of such things, the CRM distinguishes the instances of Physical Object from instances of Physical Feature, such as holes, rivers, pieces of land etc”.

  • Mental Objects. In the CRM the class E28 Conceptual Object comprises “non-material products of our minds and other human produced data that have become objects of a discourse about their identity, circumstances of creation or historical implication. The production of such information may have been supported by the use of technical devices such as cameras or computers”.
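
The class correspondences listed above can be kept as a lookup table when typing the entities of a narrative. The Python sketch below is illustrative only: it emits N-Triples-style lines using the namespace `http://www.cidoc-crm.org/cidoc-crm/` and underscore-joined local names such as `E5_Event`, following the convention of the CIDOC CRM RDFS encoding (to be checked against the CRM release in use); the `type_triple` helper and the `http://example.org/` resource URI are hypothetical.

```python
from typing import Dict

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Narrative component -> CRM class, following the mapping reported in this section.
COMPONENT_TO_CRM: Dict[str, str] = {
    "event": "E5_Event",
    "action": "E7_Activity",
    "agent": "E39_Actor",
    "location": "E53_Place",
    "time": "E52_Time-Span",
    "physical_object": "E18_Physical_Thing",
    "mental_object": "E28_Conceptual_Object",
}


def type_triple(resource_uri: str, component: str) -> str:
    """Return an N-Triples line typing resource_uri with the CRM class for component."""
    cls = COMPONENT_TO_CRM[component]  # raises KeyError for unmapped components
    return f"<{resource_uri}> <{RDF_TYPE}> <{CRM}{cls}> ."


print(type_triple("http://example.org/dante", "agent"))
```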

The relations defined on the events (and actions) of the fabula are expressed by the following CRM properties:

  • Mereological Relation. The mereological relation is represented using the property P9 consists of (forms part of), which associates an instance of E4 Period with another instance of E4 Period that is defined by a subset of the phenomena that define the former. Since E5 Event is a subclass of E4 Period, P9 can also be used to express event mereology.

  • Event Occurrence Relation and Temporal Relation. The event occurrence relation is represented by the CRM property P4 has time-span (is time-span of), which describes the temporal confinement of an instance of E2 Temporal Entity, and therefore of an event. Because the period of occurrence of an event may not be known, the CRM allows events to be related directly on the basis of their occurrence time. To this end, it introduces seven properties (P114 to P120) mirroring the temporal relations formalized in Allen’s temporal logic [2].

  • Causality Relation. The causality relation is represented by the property O13 triggers (is triggered by), which is part of CRMSci (Footnote 4), an extension of the CRM. O13 associates an instance of E5 Event that triggers another instance of E5 Event with the latter (. . .); in that sense, it is interpreted as the cause.
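
When the time-spans of two events are known, the matching CRM temporal property can be chosen mechanically. The Python sketch below is our own illustration: it compares proper intervals (start strictly before end), tests the seven primary Allen relations [2] that P114 to P120 mirror, and, for the six inverse relations, swaps the arguments so that the returned property always reads in its primary direction. The returned strings are RDFS-style local names to be checked against the CRM release in use; the function names are hypothetical.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start, end), assumed start < end


def _primary(a: Interval, b: Interval) -> Optional[str]:
    """Return the CRM property if a stands to b in one of Allen's primary relations."""
    (a1, a2), (b1, b2) = a, b
    if (a1, a2) == (b1, b2):
        return "P114_is_equal_in_time_to"
    if a2 < b1:
        return "P120_occurs_before"
    if a2 == b1:
        return "P119_meets_in_time_with"
    if a1 < b1 < a2 < b2:
        return "P118_overlaps_in_time_with"
    if a1 == b1 and a2 < b2:
        return "P116_starts"
    if b1 < a1 and a2 < b2:
        return "P117_occurs_during"
    if b1 < a1 and a2 == b2:
        return "P115_finishes"
    return None  # a stands to b in an inverse relation


def crm_temporal_property(a: Interval, b: Interval) -> Tuple[str, bool]:
    """Pick one of P114-P120 relating a to b.

    Returns (property, swapped); swapped=True means the property holds from b
    to a, i.e. a stands to b in the inverse reading (e.g. 'occurs after').
    """
    for x in (a, b):
        if x[0] >= x[1]:
            raise ValueError(f"not a proper interval: {x}")
    p = _primary(a, b)
    if p is not None:
        return p, False
    p = _primary(b, a)
    assert p is not None  # Allen's 13 relations are jointly exhaustive
    return p, True


print(crm_temporal_property((1265, 1300), (1302, 1321)))  # ('P120_occurs_before', False)
print(crm_temporal_property((1265, 1321), (1302, 1321)))  # ('P115_finishes', True)
```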

5 Discussion

As a result of the mapping, the identified logical components of narratives can be defined using classes and properties of the CRM. Moreover, the CRM provides several subclasses of E5 Event that identify types of event (e.g. E63 Beginning of Existence, E64 End of Existence, E65 Creation); these subclasses are useful to establish a first categorization of events. Another advantage of using the CRM is the existence of CRMinf, an extension of the CRM, which we are evaluating for describing the inference processes of the narrator. Indeed, we are also considering representing the inferential process of a narrator who reconstructs the events composing a narrative from the study of primary sources: e.g. a scholar studying Dante Alighieri’s biography analyses primary sources and, on their basis, identifies and justifies the inclusion of a particular event in Dante’s life. Our model aims at describing knowledge provenance, i.e. the process of tracing the origins of knowledge [4]. Reconstructing the inference process is important for evaluating the trustworthiness of the knowledge: using this information, users can determine the quality of the knowledge based on its derivation. CRMinf is a formal ontology supporting the explicit representation of contextual information about data. In particular, it aims at representing data attribution, scientific concepts of observation, inferences and beliefs. Generally speaking, CRMinf represents “integrating metadata about argumentation and inference making in descriptive and empirical sciences (Footnote 5), such as biodiversity, geology, geography, archaeology, cultural heritage, conservation, research IT environments and research data libraries” (Footnote 6).

The components of narratives defined in the previous sections can be regarded as a first conceptualization of an ontology for representing narratives. For this reason, we have started validating these components by partially expressing them in the CRM and using them to formally represent some of the main events in the biography of Dante Alighieri, selected as a case study. Our representation of the events of Dante’s life is derived from a biography of the poet written by an authoritative Italian biographer of Dante.

6 Conclusions and Future Work

In this paper we have described a study of the Artificial Intelligence literature, especially of the Event Calculus theory, aimed at identifying the formal components of narratives. We then mapped these components onto the classes and properties of the standard ontology CIDOC CRM, to evaluate whether it could be taken as a reference vocabulary for constructing an ontology for representing narratives. On top of this ontology for narratives, we aim to develop a new search functionality for DLs. Indeed, one of the main problems of current DLs is the limited range of informative services offered to users: DLs provide simple search functionalities that return a list of information objects, but usually report no semantic relations among them. Our aim is to allow DLs to return a narrative instead of a simple list of objects. Such a narrative is made up of events linked to the related objects of the DL and endowed with a set of semantic relations that connect these events into a semantic network meaningful to the user. Having completed this first study to identify the formal components of narratives and their mapping onto the CIDOC CRM, we are currently working on creating an ontology for narratives.