1 Introduction

Evolution, innovation, and history are typically documented in a chronological manner, with their associated processes set sequentially through a series of highlighted events filtered from a myriad of interactions that ultimately lead to the culminating event. But the reality of history and culture exists within a broader context of those relationships across multiple dimensions that are filtered out because they appear to be lesser influencers in the historical universe. Depending upon perspective and focus, the accepted seminal events in a narrative of history and culture represent a distillation of other smaller tangential narratives. Yet it is the role of archivists to “employ as broad a definition as possible of what records are and of what events and phenomena are worth documenting” [7].

Rebuilding these inter-relationships and piecing together the various interactions in history from archival collections of primary sources is typically a labor-intensive endeavor. In many cases, archival library collections comprise various donations or acquisitions, some with questionable provenance and completeness. One may consider that portions of such collections are analogous to fossils or ancient artifacts [6, 8, 14]. Like incomplete buried skeletons, some sections of an archival collection are distributed across various archives and processed with varying degrees of descriptive detail. The usual archival approach to such disjunctions and to the variety of information formats (e.g., photographs, drawings, manuscripts, and physical artifacts) is to sort and catalog materials using a combination of chronological, donor-specific, or subject-affinity methodologies. As a consequence, research with primary archival sources is a time-consuming manual process in which the various relationships among archival collections are uncovered through happenstance.

This work aims to help people explore, understand, and rediscover the many-to-many relationships of content within library archives using multi-modal narrative agents. Over the last several decades, the scale of accessible information has grown, both in the volume of information being gathered and in the openness with which it has been shared and made available. This has led to incredible opportunities for knowledge and discovery. For example, the English Wikipedia alone has more than five million entries. As the oldest technological university in the US, Rensselaer Polytechnic Institute's archival collections document the evolution and impact of technology and engineering from the Industrial Revolution through the Space Age. Our digital collections alone consume 1.5 TB of storage.

As the scale and complexity of available information grows, the ease with which individuals may reasonably be expected to traverse and explore that information unassisted wanes. Furthermore, one's ability to ascertain the veracity of information sources becomes challenged. The amount of information can easily overwhelm a person and prevent one from seeing the underlying relationships among facts and data. We believe that with new digital media, people can benefit from machine-assisted methods of data exploration.

In this project, we describe and define digital objects, curated within library archives via a semantic web, with a focus group composed of librarians and graduate and undergraduate students. The project uses a narrative agent to help people explore the semantic web by actively constructing narratives using information from the semantic web, and to help trigger new information discoveries by posing questions based upon previously undiscovered relationships. In the next sections, we discuss related work, present our system, and report preliminary findings.

2 Narrative as the Basis for Making Sense of the History

Storytelling is ubiquitous in the human cultural experience. To see how prevalent narrative is and has been, one has to look not only at the widespread existence of oral storytelling traditions in many of the world's oldest cultures, but also at the success of modern forms of entertainment with underlying narrative, such as cinema and video games. However, besides its apparent utility as a method of entertainment, narrative is also one of the oldest methods by which humans have traditionally exchanged information.

Narrative is concerned with how information is structured in a story: what information is included, how it is ordered, and how it is connected. We use narrative to share the happenings in our lives, to sway each other's opinions, and to pass knowledge between one another. Narrative may also have a more basic connection with human knowledge.

Narrative is thought to be intuitive to how humans think about and organize information. Narrative has been posited as one of the general fundamental ways that humans organize knowledge [9]. More specifically, experiences and memories are said to be organized in a narrative fashion, with the various facts of our personal experiences being cast as a series of events and their narrative connections [5]. So too is our understanding of time cast in a narrative light, as a temporal sequence of linked events [2]. Through narrative storytelling, there exist methods for ordering and presenting information that are related to how humans intuitively organize knowledge.

American cognitive scientist Jerome Bruner pointed out that people make sense of their environment in two modes [4]: the paradigmatic or logico-scientific mode and the narrative mode. The paradigmatic or logico-scientific mode collects facts from one's experience, and the narrative mode tries to make sense of the experience. In other words, the narrative process aims at endowing experience with meaning, which is often composed of causal and temporal relationships of events, the core components of narrative. People use both modes to convince one another: facts convince through their truth, while stories convince through their lifelikeness. Our multi-modal visualization and narrative system works in a similar way, by making the relationships among information more visible and thus inspiring people to discover new relationships. To our knowledge, there has not been an interactive storytelling system that is specifically designed to help people discover new information centered and based on library archives.

3 System Architecture

In our previous work, we have applied narrative and storytelling strategies to qualitative information presentation, developing a system to automatically generate narratives from topic-relationship information networks [10, 11], as well as techniques for using multiple interweaving story lines [3], topic anchor points, and analogies [12].

Fig. 1. An example of a subset of a knowledge graph.

We have also created an automated narration system that takes structured open domain information and tailors the presentation to a user using storytelling techniques [3, 10, 11]. It aimed at presenting the information as an interesting and meaningful story by taking into consideration a combination of factors, ranging from topic consistency and novelty to learned user interests.

Starting from any point in a knowledge graph, such as the subset shown in Fig. 1 (left) with part of its XML representation (right), the agent can talk about the knowledge graph by introducing the topics one by one. Note that, while not shown, each directed edge in Fig. 1 has an edge in the other direction with a reciprocal relationship. A diagram of the system's architecture is shown in Fig. 2. When deciding what to talk about next, the agent strives to form a piece of narrative rather than simply listing the facts. It does so in two steps: sequencing and connection.

First, the Topic Sequencer creates an initial sequence of topics to present. Starting from an initial topic, it adds topics to the end of the sequence iteratively. The next topic of the sequence is chosen by balancing multiple objectives related to narrative and user experience, such as suggesting novel content or maintaining spatial and temporal consistency. These objectives form a set of constraints, which are used to score each potential next topic. The best-scoring topic is added to the end of the sequence. The Topic Sequencer is derived from previous work on automatic narrative generation from information networks [10, 11].
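The greedy selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Topic` fields, the two objective functions, and their weights are hypothetical stand-ins for the narrative and user-experience constraints the Topic Sequencer balances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    # Hypothetical fields; the real system reads topics from a knowledge graph.
    name: str
    year: int
    category: str


def novelty(candidate, sequence):
    """1.0 if the candidate's category has not been presented yet, else 0.0."""
    seen = {t.category for t in sequence}
    return 0.0 if candidate.category in seen else 1.0


def temporal_consistency(candidate, sequence):
    """Closer in time to the most recent topic gives a higher score."""
    gap = abs(candidate.year - sequence[-1].year)
    return 1.0 / (1.0 + gap)


# Assumed objectives and weights, chosen only for illustration.
OBJECTIVES = [(novelty, 1.0), (temporal_consistency, 1.0)]


def score(candidate, sequence):
    """Weighted sum of all objectives for one potential next topic."""
    return sum(weight * fn(candidate, sequence) for fn, weight in OBJECTIVES)


def sequence_topics(start, topics, length):
    """Iteratively append the best-scoring unused topic to the sequence."""
    sequence = [start]
    while len(sequence) < length:
        candidates = [t for t in topics if t not in sequence]
        if not candidates:
            break
        sequence.append(max(candidates, key=lambda t: score(t, sequence)))
    return sequence
```

Because each step scores candidates against the sequence built so far, adding a new objective (for example, spatial consistency) would only require another entry in `OBJECTIVES`.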

Next, the Topic Connector creates connections between topics in the sequence based on pairwise relationships between topics in the knowledge graph. The Topic Connector marks when in the sequence to allude to a future topic in the sequence and when in the sequence to refer back to past topics in the sequence. Examples of both can be seen in Fig. 4. In this example, relationships with the topic directly prior to the current one (blue text), reminders of topics visited several topics ago (yellow text), and allusions to future topics (purple text) can be seen.

The Topic Connector also marks the most appropriate places in the sequence to pause narration of the sequence and explicitly give the user a turn to interact. Places in the sequence to pause narration are selected based on the connectivity between topics in the sequence that have already been presented and topics in the sequence that have yet to be presented. The Topic Connector is derived from previous work on analogies [12] and using multiple interleaving storylines in narrative generation [3].
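As a rough sketch of the two roles of the Topic Connector, the code below marks back-references and allusions for each position in a sequence and then picks a pause point. The function names are hypothetical, and the pause rule (pause where the most relationships cross from presented to unpresented topics) is an assumption; the text states only that pauses depend on this connectivity.

```python
def connect(sequence, related):
    """Mark connections for each topic in the sequence.

    `related` is a set of frozenset pairs naming topics that share a
    relationship in the knowledge graph.
    """
    def linked(a, b):
        return frozenset((a, b)) in related

    marks = []
    for i, topic in enumerate(sequence):
        marks.append({
            "topic": topic,
            # Relationship with the topic directly prior to this one.
            "previous": i > 0 and linked(topic, sequence[i - 1]),
            # Reminders of related topics presented earlier than that.
            "reminders": [t for t in sequence[:max(i - 1, 0)] if linked(topic, t)],
            # Allusions to related topics still to come.
            "allusions": [t for t in sequence[i + 1:] if linked(topic, t)],
        })
    return marks


def crossing_links(sequence, related, i):
    """Count relationships between topics up to index i and those after it."""
    return sum(
        frozenset((a, b)) in related
        for a in sequence[:i + 1]
        for b in sequence[i + 1:]
    )


def best_pause(sequence, related):
    """Assumed rule: pause after the index with the most crossing links."""
    return max(
        range(len(sequence) - 1),
        key=lambda i: crossing_links(sequence, related, i),
    )
```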

Fig. 2. System diagram.

Once a connected topic sequence is created, the Narration Manager presents the sequence one topic at a time through the interactive visualization. The interactive visualization consists of three panels, as can be seen in Fig. 3. The left panel shows a map displaying the location of each topic. The center panel shows a timeline with topics represented as nodes, with category lanes and lines connecting topics that are related in the knowledge base. The right panel shows a set of images and the text narration for the topic the Narration Manager is currently presenting. The agent gives an audio narration of the text as well.

At any point in the narration, the user can select a topic in the center panel that they wish the agent to discuss. The Narration Manager also explicitly asks the user to select a topic at points deemed most appropriate by the Topic Connector. When the user selects a topic, the current sequence is interrupted by a short new subsequence calculated from the selected topic.

Fig. 3. Screen shots from existing system.

The Topic Sequencer creates a new subsequence, taking into consideration the topics that have been presented in the initial sequence and the new topic selected by the user. The subsequence is inserted after the point where the initial sequence was paused. Then, the rest of the initial sequence is placed after it. Connections between the subsequence and the initial sequence are formed by the Topic Connector. The Narration Manager then continues narration with the new combined sequence, starting with the user’s selected topic. Thus, the storytelling agent is able to both react to the user and maintain a consistent narrative plan.
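The splicing step can be sketched as a simple list operation. The function name `splice` is hypothetical, and dropping topics from the remainder that the subsequence already covers is an assumption made here to avoid repeating a topic; the text does not say how such overlaps are handled.

```python
def splice(initial, paused_at, subsequence):
    """Insert a user-driven subsequence after the pause point.

    `paused_at` is the index of the last topic presented before the pause.
    """
    presented = initial[:paused_at + 1]
    # Assumption: topics covered by the subsequence are not repeated later.
    remainder = [t for t in initial[paused_at + 1:] if t not in subsequence]
    return presented + subsequence + remainder
```

For instance, with an initial sequence `["a", "b", "c", "d", "e"]` paused after `"b"` and a subsequence `["d", "x"]`, narration would continue with `"d"` and `"x"` before returning to `"c"` and `"e"`.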

In the top text in Fig. 4, which shows the text from the right panel in Fig. 3, the system can be seen pausing narration of the sequence and alluding to future topics in the sequence which have yet to be presented and which are related to topics that have been presented. In the bottom text in Fig. 4, the system can be seen describing the user-selected topic later in the same narration.

Fig. 4. Example text from two topics in the same narration. (Color figure online)

4 New Knowledge Discovery While Constructing a Knowledge Graph

We performed a preliminary study of using this interactive narrative system to organize and describe Rensselaer Polytechnic Institute’s library archives. The details of the study and our main findings are included below.

We experimented with creating new knowledge graphs based on archival library content with a focus group composed of four librarians, one graduate student, and one undergraduate student. When working with the interactive storytelling system, we need to represent the network of archival information as topics and their relationships. More specifically, each topic is treated as a node in a knowledge graph, which has a description and links to other nodes. For example, the node “Alumni Building” and the node “Rensselaer Polytechnic Institute” are linked by the “is part of” relationship.
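A minimal in-memory sketch of this representation follows. The class and method names are hypothetical, and the inverse label "has part" is an assumption: the system stores its graph as XML and gives every edge a reciprocal relationship, but the paper does not list the reciprocal labels.

```python
from collections import defaultdict

# Hypothetical inverse labels; the source names only "is part of".
RECIPROCAL = {"is part of": "has part", "has part": "is part of"}


class KnowledgeGraph:
    def __init__(self):
        self.descriptions = {}          # topic name -> description text
        self.edges = defaultdict(list)  # topic name -> [(relation, other topic)]

    def add_topic(self, name, description=""):
        self.descriptions[name] = description

    def link(self, source, relation, target):
        """Add a directed edge and, when known, its reciprocal edge."""
        for name in (source, target):
            if name not in self.descriptions:
                raise KeyError(f"unknown topic: {name}")
        self.edges[source].append((relation, target))
        inverse = RECIPROCAL.get(relation)
        if inverse is not None:
            self.edges[target].append((inverse, source))


graph = KnowledgeGraph()
graph.add_topic("Alumni Building", "placeholder description")
graph.add_topic("Rensselaer Polytechnic Institute", "placeholder description")
graph.link("Alumni Building", "is part of", "Rensselaer Polytechnic Institute")
```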

Prototyping of this project revealed a need for librarians and archivists to move beyond sets of simple format conversion tasks towards the development of new methodologies in controlled vocabularies, metadata, and cataloging. As librarians worked to migrate the institutional archives, which are stored in a hierarchical data structure (e.g., Dublin Core), to the new graph structure used by the multi-modal storytelling system, we found that the difference between the two representations of data often inspired the librarians to form new hypotheses and discover new information. The new information, in turn, becomes part of the knowledge graph and may inspire new discoveries.

For example, a previously known relationship between two topics (a campus president and an architect) had only one type of connection (contractual): President Ricketts contracted with architect Joseph Lawlor on several building construction projects. While working through the structural migration, librarians began to question whether other types of connection existed. One building contracted to Joseph Lawlor is the fraternity house for Theta Xi. Librarians began to wonder if both Lawlor and Ricketts were members of the fraternity. After researching the Theta Xi yearbooks, librarians confirmed that both Lawlor and Ricketts were members of the same fraternity, and that Ricketts had pre-dated Lawlor as a fraternity brother. The newly uncovered relationship helped to add greater context to the facts and the narrative constructed by the system.

Thus, the process of creating the knowledge graph from existing archives becomes an iterative hypothesis testing process. In this work, we observed how the same focus group of librarians and archivists went through these iterations multiple times in order to establish appropriate metadata needed to document relationships between data points.
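One way such hypotheses might be surfaced mechanically is to look for "open triads": two topics that share a neighbor but are not themselves linked. This heuristic is an illustration only and was not part of the study, where hypotheses were formed by the librarians themselves; the function name and edge format are hypothetical.

```python
def open_triads(edges):
    """Return (a, via, c) triples where a-via and via-c are linked but a-c is not.

    `edges` is a list of (topic, topic) pairs, treated as undirected.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    found = set()
    for via, around in neighbors.items():
        for a in around:
            for c in around:
                # a < c reports each unordered pair once.
                if a < c and c not in neighbors[a]:
                    found.add((a, via, c))
    return sorted(found)


# Edges taken from the example above: Ricketts contracted with Lawlor,
# and Lawlor was contracted for the Theta Xi fraternity house.
candidates = open_triads([
    ("Ricketts", "Lawlor"),
    ("Lawlor", "Theta Xi house"),
])
```

Each triple is only a prompt for further research, such as asking whether Ricketts also had a connection to Theta Xi; confirmation still depends on consulting the archival sources.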

5 Discussion and Future Work

While the Rensselaer Polytechnic Institute librarians searched for information to encode into a knowledge graph for the system, the process of exploration was valuable and revealing. However, despite the value of the process, building the knowledge graph was performed with minimal machine assistance: by browsing online resources, reading physical archives, and writing the XML file for the knowledge graph by hand. As the librarians’ discoveries were made from the connections they found while building the knowledge graph, and the narrative generation system reveals connections between topics while exploring the knowledge graph, a natural enhancement to the process would be to integrate knowledge base authoring with the narrative system. While exploring the existing knowledge base using the system, new incidental connections, like the relationship between Joseph Lawlor and President Ricketts, can be hypothesized, confirmed with additional external resources, and incorporated into the knowledge base. There are two main directions in pursuit of this. The first is the integration of a knowledge graph authoring tool with the system, such as Jambalaya for Protégé [13] or Lucidchart for the creation of mind maps [1], to alter knowledge graphs in real time without directly handling an XML representation. The second is the integration of external resource access and exploration to directly browse and display digital archives.