Fig. 1 Physical and digital resources to aid a literary traveller

1 Introduction

Many physical spaces are (mostly invisibly) inhabited by literary figures: Joyce’s Mr Bloom walks through Dublin, Ireland [33]; Tremain’s Harriet goes shopping in Christchurch, New Zealand [46]; Jane Austen’s characters travel through Bath, England [2]. Sometimes their presence is marked by physical references, such as quotations on the pavement (e.g., O’Connell Street in Dublin) or commemorative plaques on houses (e.g., The Assembly Rooms in Bath). Book lovers can go on ‘literary tours’ to follow their heroes’ footsteps [45].

Other spaces carry invisible layers of myths, poetry, or history. For example, the village Grasmere—as well as mountains and other features in the English Lake District—has been written about in poems by Wordsworth and others.Footnote 1 Some locations are even physical reminders of events and stories that originally belonged to other places, such as Tolkien’s Middle Earth which originated in England now being mapped onto the New Zealand landscape.Footnote 2

In preparation for a literary tour, a traveller today might first “look up” their travel destination in Wikipedia, as a springboard to access related information. They may decide to download the works of an author they admire, related to the place they are visiting (say, Jane AustenFootnote 3). They might even start exploring the texts for references to locations to visit and peruse other resources to organize their literary tour. Figure 1 shows example resources used to identify places: (a) the text itself may be read carefully to identify locations, such as the Pump Room in BathFootnote 4 which could then be located on a map, (b) a physical literary map may be used to identify relevant locations and their respective places in a book,Footnote 5 (c) online lists link location descriptions (e.g., “Outside front entrance of the Town Hall”) to literary textsFootnote 6 or (d) provide GPS map markers for literary texts.Footnote 7 For some locations, physical markers have been put in place, such as literary coalhole covers in London,Footnote 8 as shown on the map in Fig. 1e. Other resources provide location information for selected books, such as in Fig. 1f, in which novels by Jane Austen are analysed for locations and quotes are listed next to a small map dedicated to the location and its surroundings.Footnote 9 Finally, Fig. 1g is a screenshot of Placing Literature,Footnote 10 an online service that allows users to annotate GPS coordinates on a digital map with comments such as relevant book scenes. Our literary tourist may combine the information gleaned from these diverse resources and create their own literary tour.

Given their travel plans, they might further choose to download audio book versions onto a mobile device before leaving on their trip.Footnote 11

All that has been described so far is possible using existing software applications—albeit, in a piecemeal approach. The traveller has to collect information about locations from physical or digital maps and lists of location descriptions. The link to the textual resource is often imprecise, and they might have to read disconnected quotes to resolve where a location is. An audio book is not synchronized with the traveller’s current location. Finally, the traveller needs to prepare these mashups between location and book text in advance or at least be aware of these links ahead of travelling.

The aim of the work presented here was to bring these elements seamlessly together in one software application. Furthermore, we desired a system where the loading of relevant information does not need to be done explicitly. Rather, a profile of the user’s interests is maintained and the software makes decisions as to what information to alert the user to, based on their location. Electronic navigation and geographic positioning are becoming increasingly common as travel support: many new mobile apps on smartphones offer location-aware information. We explore here the use of audio and text-to-speech functions to make digital books audible in their appropriate location. In particular, we describe our work in developing this idea from conception to realization in our prototypes of Tipple, which in turn draws upon research into alerting services as well as digital libraries [10].

The structure of the article is as follows: we start with a brief usage scenario and a summary of the 18 functional requirements that were established to meet the aims of the work. In this there is a strong focus on audio, as this medium is well suited to a mobile environment. We then review related work. We discuss concepts of literary tourism and evaluate six research-based travel guide systems with respect to the functional requirements. This article also includes overviews of TIP [28] and Greenstone [48], respectively the tourist information system and the digital library software that the work builds upon. In Sect. 4 the design is presented, detailing new features and where existing software components in TIP needed to be extended; no changes were needed to Greenstone. In Sect. 5 implementation details are given. Section 6 describes the setup and results of our first field study, in which participants visited the Hamilton Gardens using Tipple with a reference-style book.Footnote 12 Section 7 reports on two further studies executed with books with fictional content. We discuss the insights gained from our three field studies in comparison with related results and projects in Sect. 8, before concluding the article in Sect. 9.

2 Functional requirements

2.1 Usage scenario

Imagine a traveller’s phone installed with the Tipple software and loaded with a profile of their interests. When their location matches some literary works of an author in whom they are interested, a “chirp chirp” sound is played through their headphones, alerting them to an area of interest. Glancing down at their smartphone, they see that a location nearby has been highlighted and marked with a book icon. For instance, this may indicate a chapter from Austen’s Persuasion near the Assembly Rooms in Bath, a poem referring to a street in Auckland, or Wordsworth’s poem Grasmere about the Lake District. They choose to deviate from their planned route, select the audio to play, and then listen to the chapter or poem referring to the location they are in. On concluding the audio, information about further available chapters is displayed, in the case of a book, or else about other places nearby with literary connotations. From here the user can decide to resume their journey as planned or else to travel to one of the nearby literary locations to hear another text spoken in the environment in which it was set.

A functional requirements analysis was undertaken to establish the features and characteristics of the software to be developed. We distinguish requirements for location-awareness (L1–L9) and requirements for presenting audio access (A1–A9).

2.2 Requirements for location awareness

Location-based access requires an awareness of the locations of both literary items and (potentially mobile) users.

  1. (L1)

    Generic service We are interested in finding a generic solution or service for location-based access to books. This means that the software should be able to process location information of any books being made available in a predefined input format.

  2. (L2)

    Identification of relevant book parts Based on the user’s current location, the parts of a book that refer to this location need to be identified. This goes beyond creating separate text chunks for each location; rather the literary work needs to receive a location markup or location annotation. The locations that are relevant to a given book are called points of interest (POI) or landmarks.

  3. (L3)

    Presentation of relevant book parts The parts of a book that are relevant to a given location need to be made available to the user for consumption.
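
The difference between location annotations (L2) and separate text chunks can be illustrated with a small sketch. It is not taken from the Tipple implementation: the `LocationAnnotation` class, the `passages_for` function and the sample sentences are hypothetical and serve only to show that annotations may overlap while the book text itself stays in one piece.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationAnnotation:
    """Marks the character span [start, end) of a text as referring to a place."""
    place: str
    start: int
    end: int


def passages_for(text: str, annotations: list[LocationAnnotation], place: str) -> list[str]:
    """Return every passage annotated with `place`, in the order it occurs in the text."""
    hits = sorted((a for a in annotations if a.place == place), key=lambda a: a.start)
    return [text[a.start:a.end] for a in hits]


# Made-up sample text (not a quotation from any book).
first = "They met in the Pump Room."
second = "Later they walked to the Assembly Rooms."
text = first + " " + second

annotations = [
    LocationAnnotation("Pump Room", 0, len(first)),
    LocationAnnotation("Assembly Rooms", len(first) + 1, len(text)),
    # A wider annotation that overlaps both of the spans above.
    LocationAnnotation("Bath", 0, len(text)),
]

print(passages_for(text, annotations, "Pump Room"))
print(passages_for(text, annotations, "Bath"))
```

Because the annotations refer to spans rather than copies of the text, the same passage can be offered for a specific landmark and for the surrounding town without duplicating content.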

2.3 Landmarks presentation

The locations relevant to a book need to be indicated to a user, preferably allowing access via different context filters.

  1. (L4)

    Landmarks by distance from current user location The presentation of landmarks in relationship to the user’s current location allows the user to choose a direction and order of landmarks for easy spatial exploration. This is advantageous when only selected places in a book are visited, potentially in conjunction with other travels, or independently of a particular book.

  2. (L5)

    Landmarks by book order Presenting landmarks in the order in which they appear in the book allows the user to read the book concurrently with their travels, following the story through different locations.

  3. (L6)

    Landmarks on a single map The service needs to support user travel decisions in relation to book landmarks. For this, the relevant landmarks should be presented on a single map such that their relative distances can be explored. Additional cues might be given about the order in which landmarks appear in a book (e.g., by POI numbering or travel path indication).
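
The two orderings required by L4 and L5 can be sketched as two sort keys over the same set of landmarks. The sketch below is an illustration only: the `Landmark` fields, the function names and the coordinates are assumptions, and the haversine formula is used here as one common way of estimating distance from GPS coordinates, not necessarily the method used in Tipple.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass(frozen=True)
class Landmark:
    name: str
    lat: float
    lon: float
    book_position: int  # hypothetical field: order of first mention in the book


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def by_distance(landmarks: list[Landmark], user_lat: float, user_lon: float) -> list[Landmark]:
    """L4: nearest landmark first, relative to the user's current position."""
    return sorted(landmarks, key=lambda lm: haversine_m(user_lat, user_lon, lm.lat, lm.lon))


def by_book_order(landmarks: list[Landmark]) -> list[Landmark]:
    """L5: landmarks in the order in which they appear in the book."""
    return sorted(landmarks, key=lambda lm: lm.book_position)


# Made-up coordinates, chosen only to make the two orderings differ.
sample = [
    Landmark("A", 0.0, 0.000, book_position=2),
    Landmark("B", 0.0, 0.010, book_position=1),
    Landmark("C", 0.0, 0.003, book_position=3),
]
print([lm.name for lm in by_distance(sample, 0.0, 0.002)])
print([lm.name for lm in by_book_order(sample)])
```

Keeping both orderings as views over one landmark list also suits L6, since the same list can be drawn on a single map and numbered by `book_position`.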

2.4 Mobility support

To allow for book access in relation to the location of mobile users, the system itself also needs to provide features of mobility.

  1. (L7)

    Offline use It cannot be assumed that all locations relevant to a book will be provided by web access. The service needs to be able to cope with phases of disconnectedness.

  2. (L8)

    Design for mobile interface Mobile services are typically available on hand-held devices which have small screens and are used outdoors. The system’s interface needs to be designed for use in such conditions.

  3. (L9)

    Location-based alerts As users are mobile, it cannot be assumed that they are able to continually interact with the service. They may need to concentrate on traffic, for instance. The service, therefore, should offer some type of low-key alert as the user moves into the proximity of a book’s landmark.
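
A location-based alert (L9) can be approximated by a proximity check that fires once when the user enters a radius around a landmark. The following is a minimal sketch under stated assumptions: the `ProximityAlerter` class, the 50 m default radius and the policy of alerting again after the user has left and re-entered are all hypothetical choices, not details reported for Tipple.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def distance_m(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Haversine distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


class ProximityAlerter:
    """Reports landmarks the user has newly come close to since the last update."""

    def __init__(self, landmarks: dict[str, tuple[float, float]], radius_m: float = 50.0):
        self.landmarks = landmarks
        self.radius_m = radius_m      # assumed alert radius
        self._inside: set[str] = set()

    def update(self, position: tuple[float, float]) -> list[str]:
        """Feed a new GPS fix; return names of landmarks entered with this fix."""
        now_inside = {
            name for name, pos in self.landmarks.items()
            if distance_m(position, pos) <= self.radius_m
        }
        entered = sorted(now_inside - self._inside)
        self._inside = now_inside
        return entered


alerter = ProximityAlerter({"book landmark": (0.0, 0.0)})
for fix in [(0.0, 0.01), (0.0, 0.0003), (0.0, 0.0002)]:
    for name in alerter.update(fix):
        print(f"chirp chirp: {name}")  # stand-in for playing the audio alert
```

Tracking which landmarks the user is currently inside keeps the alert low-key: it sounds on entry only, rather than on every GPS fix within the radius.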

2.5 Requirements for audio-based access

Digital books include both audio books via MP3 files and books in text form loaded from the Digital Library.

  1. (A1)

    Playing chapter-based books (MP3) Books that have a chapter structure typically follow a narrative and should be available for reading in chronological order. The service should also let the user know about the order and location of other chapters in the book.

  2. (A2)

    Playing reference books (MP3) Reference books are collections of articles; only parts of the book may be related to the user’s current position and profile. Users should receive appropriate information sourced from this material relating to their current location and be able to choose whether or not to play it.

  3. (A3)

    Text-to-speech Books from the digital library need to be read to a user. Books are loaded based on a user’s current location and their profile information. When the texts are loaded from the digital library, text-to-speech software needs to convert the text to voice.

  4. (A4)

    Text-to-speech and text display In addition to having the text read out, a user may want to follow the textual form. They should be given the option to read the text, listen to the spoken text, or to do both at the same time.

  5. (A5)

    Audio for this book Additional MP3 files may be available and are independent of the text. When accessing a chapter in the digital library, the corresponding audio of this chapter should be loaded and played if available. Hyperlinks should connect MP3 books and those from the digital library so that a user can switch between the two modes.

  6. (A6)

    Audio books control The system should alert the user (e.g., by a sound) to let them know that some literary text is available to be played/displayed at a location. A user dialogue window should give the user the option to play the book and also indicate other relevant chapters or items that are available.

2.6 Mobility of audio content

With digital libraries being transported onto mobile devices [4], displaying library content requires adjustments to accommodate the limitations of screen size and limited input options. Furthermore, the context of the usage situations is dominated by multi-tasking and distracted attention [29]. We, therefore, identified general requirements that need to be fulfilled to support easy reception of audio content.

  1. (A7)

    Anti-overlap When a new object presents itself or the user wants a book to be read, this should not automatically play over the top of an existing audio item. The user should be able to choose whether to interrupt any current audio.

  2. (A8)

    Interaction audio control This feature is similar to standard audio players using play lists and play control functions. Users can use the controls to start or stop the audio, as well as insert into and delete audio from the play list.

  3. (A9)

    User preferences Users should be able to set their preferences, e.g., for repeat-gap and anti-overlap. The repeat-gap is used to identify the time between each loop of an audio alert. The anti-overlap parameter allows the user to select whether they want the alerts for multiple sites to be mixed together, or played one by one, and if they are to interrupt book readings.
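
The interplay of anti-overlap (A7), play list control (A8) and user preferences (A9) can be sketched as a small state machine. This is an illustrative reading of the requirements, not the Tipple player: the class and method names, the return values and the default settings are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AudioPreferences:
    anti_overlap: bool = True    # True: new items wait instead of playing over the current one
    repeat_gap_s: float = 10.0   # time between loops of an audio alert


def alert_loop_starts(alert_length_s: float, loops: int, prefs: AudioPreferences) -> list[float]:
    """Start time (in seconds) of each loop of an alert, separated by the repeat-gap."""
    return [i * (alert_length_s + prefs.repeat_gap_s) for i in range(loops)]


@dataclass
class Player:
    prefs: AudioPreferences = field(default_factory=AudioPreferences)
    current: Optional[str] = None
    playlist: deque = field(default_factory=deque)

    def offer(self, item: str, interrupt: bool = False) -> str:
        """Handle a newly available audio item and report what happened to it."""
        if self.current is None:
            self.current = item
            return "playing"
        if interrupt:                       # the user chose to interrupt (A7)
            self.playlist.appendleft(self.current)  # keep the old item to resume later
            self.current = item
            return "interrupted"
        if self.prefs.anti_overlap:         # wait for the current item to finish
            self.playlist.append(item)
            return "queued"
        return "mixed"                      # played over the current item

    def finished(self) -> Optional[str]:
        """Called when the current item ends; advance to the next play list entry."""
        self.current = self.playlist.popleft() if self.playlist else None
        return self.current


player = Player()
print(player.offer("chapter 1"))
print(player.offer("alert: landmark nearby"))
print(player.offer("chapter 2", interrupt=True))
print(list(player.playlist))
```

In this reading, interrupting never discards audio: the interrupted item is placed at the front of the play list so the user can return to it.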

This list of requirements is used both for the comparison of related work and as an implementation guide.

3 Related work

We first describe the two systems this research builds upon: TIP and Greenstone (Sect. 3.1). Then we discuss related approaches and studies. As there are no comparable studies for location-based audio books in digital libraries, we evaluated web resources and apps for literary tourism (Sect. 3.2), mobile guide systems that use audio support (Sect. 3.3), and audio books in digital libraries (Sect. 3.4). Later in the article (Sect. 8), we provide a broader discussion and comparison of our study results to related studies on audio presentation for mobile users.

3.1 Foundations

The location-based access to audio books is built on the combined foundation of an electronic travel guide and a digital library.

3.1.1 Tourist guide system TIP

The mobile tourist information provider (TIP) combines an event notification service and a location-based service to alert its users to interesting tourist attractions in their vicinity [28]. TIP was selected as the basis for Tipple as it provides the location-awareness part of the project. The information is tailored to the user’s context: their personal interests (as defined in a profile) and their current and past location (as measured by GPS). Users will not receive the same information twice, except when they explicitly ask for it [27]. TIP was an ideal choice as it was developed around the concept of overlapping text annotations instead of distinct text elements, to allow for re-use of the text elements in varying combinations depending on user interest [30].

3.1.2 Digital library: Greenstone

The open source Greenstone software was chosen for the digital library component of this project [6]. It can organize many types of electronic documents such as text, HTML, pictures, music and audio or video. Of particular significance to this project is Greenstone’s ability to configure the structure of information, such as links to documents and sections within documents.

We use the digital library as a document server being accessed via the TIP system. This differs from the normal pattern of usage in Greenstone, where it is the user who formulates and initiates the query. Greenstone itself will not be modified in this project; instead we use the Greenstone/TIP bridge (see Sect. 4).

3.2 Literary tourism

Literary tourism is a type of cultural tourism that focuses on real-world settings associated with works of fiction and the authors’ lives [15]. It has a surprisingly long tradition in the English-speaking world, including visits to the settings of classical literature, and places associated with their authors. In recent times, media-related tourism (i.e., places associated with films or music) has gained similar significance [11].

The traditional frameworks of describing tourism [37, 38] refer to tourists and the sites they visit, and to the markers that identify the sites to the tourists, possibly giving further information. Cunningham and Hinze [15] extend the model by adding as mediators those who provide access to the sites for the tourists (e.g., councils, tour operators, or providers of apps and software). They identify five tourist personas, reflecting on their motivations for engaging in literary tourism: literary Pilgrims (intense personal and intellectual connection to tour), Heritage Tourists (general interest in tour), Dabblers (minor addition to other tourism), Romantics (excited by ‘untouched’ history and places), and Nostalgics (tour links with personal life history). Pilgrims and Heritage Tourists are assumed to be willing to engage with material before or after a tour, whereas Dabblers, Romantics, and Nostalgics are less likely to do so. Similar characteristics hold for information presented in a mobile app: Pilgrims and Heritage Tourists might appreciate receiving significant amounts of information as they travel on location, whereas the other three are more likely to prefer a more lighthearted presentation. For example, Dabblers might enjoy ‘edutainment’ interactions, while Romantics and Nostalgics might view any collaborative offers as an intrusion. Comparing our requirements to the persona requirements, we can see that our software would be suitable for Pilgrim and Heritage travellers in particular and might appeal to some Dabblers. The preferences of Romantics and Nostalgics are not addressed by our requirements.

We will use both the framework from [15] and our requirements from Sect. 2 to explore some websites and apps available for literary tourism.

3.2.1 Specific book resources

The first group is a selection of typical web resources tailored to specific books. These are typically one-off solutions that offer location information in connection with a book but without the actual book content. We selected some typical resources to discuss here as examples. There are private blogs (e.g., Lost Symbol Footnote 13), single webpages (e.g., da Vinci Code Footnote 14), and complex sites (e.g., Tale of Genji Footnote 15) referring to locations and books, lists compiled by communities (e.g., Sherlock Holmes’ London Footnote 16), location lists for books compiled by travel agencies (e.g., Eat Pray Love Footnote 17), and online travel booking systems (e.g., Harry Potter Footnote 18). Typically these resources provide images of locations and reference the relevant part of the book. The structure is typically clustered around locations (e.g., London, Paris, Milano for the da Vinci Code) and does not follow the structure of the book’s narrative. The book’s content is reproduced using small quotes, if at all. Mobile use does not seem to be intended and no reference is given to the current location of the reader.

In their presentation and the material provided, the resources seem to be aimed at a mix of travellers. For example, the Tale of Genji and the London of Sherlock Holmes provide detailed information for Pilgrims and Heritage travellers, whereas the da Vinci Code and Harry Potter seem to target Nostalgics and Dabblers. Eat Pray Love was the only resource that was found to be directed at Dabblers. The specific book resources are often created by travellers and interested laypeople themselves. They are specific (not generic as in L1) and usually do not have any awareness of mobile users (L2, L4, L7–L9). There is no access to full-text or audio (L3, A1–A9). The top row in Table 1 summarises our observations with regard to the requirements.

Four of the example sources introduced in Sect. 1, Fig. 1a, b, e, f, belong to this category of book-specific resources and show at least the limitations as outlined above; some are more restrictive.

3.2.2 Tailorable resources

We now describe generic approaches to tours or services that can be tailored to different literary works. There are mobile apps (such as Novel Navi Footnote 19 and the Southern Literary Trail Footnote 20), online aggregations and maps (e.g., Bookfriend,Footnote 21 CodexMap Footnote 22 and Atlas of Fiction Footnote 23), and online resources focussing on books (Google Books Footnote 24 and Google Lit Trips Footnote 25).

Table 1 Evaluation results for analysis of related work

Novel Navi is a mobile app for Kyoto, which contains a collection of short novels written specially for the app by professional and amateur authors. The Southern Literary Trail is an app that guides users through sites associated with literary figures of the southern USA. For example, users can visit the New Orleans apartment where Tennessee Williams began writing “A Streetcar Named Desire.” Bookfriend is an Android mash-up app that aggregates information available online to provide a comprehensive view of a book. Users can discover facts about their favourite books, such as author information and film adaptations, as well as read reviews by other people. CodexMap is an online map in which an automatic harvester and online users can place books graphically. The Atlas of Fiction is a privately maintained map of locations that appear in fiction books. No further information about books or authors is given. Google Books digitizes paper books and makes them available online for searching. Search and cross-referencing allows one to find further books. Google Lit Trips provides downloadable files that mark the journeys of characters from literature on the surface of Google Earth. At each location along the journey there are place markers with pop-up windows containing a variety of resources including relevant media, discussion starters and links to supplementary information about “real world” references made in that particular portion of the story.

These generic approaches cover a broader range of travellers than the individual book resources. They all cater for Pilgrims and Heritage travellers (with their many details of information), and most also address Dabblers with easy interfaces and potential inclusion of social networks. CodexMap and Atlas of Fiction make low demands on travellers and, therefore, might even cater for Romantics and Nostalgics. We note that support for Dabblers is more widespread among these systems (see also [15]).

These approaches are apps, maps or resources that aim to provide information for several books (see L1 in Table 1). None of these systems are truly location-based as they present the data independent of the user’s location (see L2 in Table 1), predominantly in unordered lists from which the user then has to select the location or topic of interest. Consequently, none of these systems provide location-based alerts to help direct the user’s attention while travelling (L9) or assist in navigation in relation to the user’s location (L4). Full texts of books are only (to some extent) available in Google Books (L3), and the location information is not ordered by book content (L5). Presentation of landmarks on a single map (L6) is only available in the generic services, which may be due to the more ad-hoc way of compilation and presentation in many other sources. Similarly, support for mobility (L7 and L8) is only available in some dedicated apps. None of the required audio features (A1–A9) are supported.

Two examples from the introduction, shown in Fig. 1c, d, are location-specific but cover several literary works, while the Placing Literature system shown in Fig. 1g is more generic but also does not include any reference to the user’s location or any full-text elements of the literary works.

3.3 Travel guide research

We evaluated six tourist guide systems, which partially match the requirements described in Sect. 2. The lower part of Table 1 compares these systems. We do not focus on Audio Navigation that directs users by audio to their target, which has been described elsewhere [29, 47]. Note also that the term audio icon has more than one meaning in the literature. In some publications it refers to a graphic representation in the user interface indicating that an audio signal is available once the icon is selected (e.g., pronunciation help in online dictionaries). In this article the term audio icon means that a user is actively notified, by the playing of a sound, that information is available, without the user needing to select anything; it is a form of ‘audio alert’.

The mobile tourist system Access Sight [35] is designed for both sighted and blind users. It uses ‘Hearcons’ (equivalent to our audio icons). Sighted users are guided by both visual information and audio, whereas blind users rely solely on auditory information. To navigate successfully, blind users need to build up a mental model of the area from the audio information. Different groups of sounds are related to different types of objects. Conversion of text to audio has been used to identify places of interest for blind people; it can also be an additional service for sighted users. The spoken text is not based on chapters, and the system does not need Audio Books Control as it does not link to any audio. It has anti-overlap (A7), a function that has been implemented to let blind people decide when to hear the audio information. Playback of audio cannot be controlled as there is no provision to set any parameters for audio.

Hear&There [22, 32] is an augmented audio reality system that provides basic mechanisms for navigation and orientation in the form of a digitally recorded voice. The system does not use text-to-speech. The auditory information is based, for example, on the museum artifacts that are in the vicinity of a user. Audio is controlled by user movement, such as stop and slow down. It is not based on chapters, nor does it support Audio Books Control. The sound radius and volume are increased for those artifacts that relate to the current interest of the user. To minimize overlap of audio, the number of audio items played is controlled by the walk speed. For example, if a user walks through a room quickly the audio related to that room will not be played.

The Tour Guide for Travelers [9] delivers multimedia information and other services to a user’s mobile device. Tour Guide is suitable for a museum visit, taking a walk in a city, or a car trip, for example. When a user gets close to a place of interest the system will alert the user with an ‘earcon’ (audio icon). Different types of earcons are associated with different types of information. The system will ask a user whether they want to visit the signalled place. This is similar to Audio Books Control, which asks a user whether they want the book. Next, the user can push a button on the screen to view more information about the signalled place. Bellotti et al. do not indicate what the system does when new information arrives. This system does not use text-to-speech or digital recording, so it does not support the requirements related to books via MP3, books via a digital library and most of the required interactions.

The GUIDE system [12] for tourists offers a button with which users can ask for information pertaining to their new location. To help a user remember that a button on the mobile device needs to be pressed, the system plays a sound that prompts users to trigger the application. It presents the same sound for different types of object. GUIDE employs two steps for anti-overlap: one is that users simply press a back button, and the second is that the system forces the current information to remain on the screen and alerts the user that some new information is coming (e.g., a waving flag on the screen). The authors do not mention whether there is any need to set any user parameters. The information presented by GUIDE is text and graphics, and the stages for pushing in new information and the anti-overlap are similar to our project. This system does not use text-to-speech or digital recordings, i.e., it does not support the requirements related to Books via MP3 and Books via a digital library. No information is available on whether the new information is based on a user’s interests.

The Automated Tour Guide [8] is a museum guide system that can play descriptions of museum artifacts. The audio is triggered by walking close to an artifact and stops when the user walks away. The audio is not based on a user’s interest. An infrared transmitter placed in the ceiling provides accurate positions such that only the audio for the nearest artifact is played, to avoid any overlap. The anti-overlap mechanism has no explicit interaction with the user. There is a button to trigger the audio that directs users to other parts of the museum. In this system, each user carries their own digital audio source, so that the descriptions can be heard on each user’s own time schedule. The system does not use text-to-speech and does not support the requirements related to Books via MP3, Books via a digital library or most of the required interactions.

3.3.1 Limitations

Given the sphere of operation the systems target, unsurprisingly, none of these systems refer to literary works. Further, none of them allow the handling of location annotations to existing text: they all require the text elements to be provided as distinct chunks of text that are stored with the POI. None of the systems have any concept of order or relation between text elements other than by location. This significantly limits their potential use for the location-based presentation of books from a digital library.

3.4 Digital library systems

A number of digital library (DL) projects have employed components that are also used in our work, such as text-to-speech synthesis, interactive map visualizations and access from mobile devices. Since these projects focussed on different issues, and use a different mixture of these components, we restrict ourselves to a brief overview of the differences and similarities. As with Access Sight, a primary use of text-to-speech technology in DL systems has been to provide capabilities for print-disabled users, such as the visually impaired [3, 35, 36]. While the “hands-free” aspect is of relevance to this work, our requirements are not so strict. For example, near field communication (NFC), to simplify user certification [1], is not required. Indeed, as we are exploring the inter-relationship between text and audio it is fundamental to our design to allow tactile interaction with the user interface as well as the option of going hands-free.

A gazetteer to link the text in the digital library with geographical information is at the heart of most map-based DL systems, e.g. [14, 44], together with interfaces designed with access from a desktop PC or laptop in mind, as our empirical testing of these sites demonstrated. In comparison, our work investigates how users access such geo-spatially enhanced DL content from a GPS-enabled mobile device. The findings reported below, therefore, are directly applicable to such DL projects.

Some projects take a more mobile-centric DL view, e.g. [18, 20, 39]. Here the focus is on access through mobile phones (as with our project), but with more emphasis on content creation and/or enrichment (significantly different to our project). In the case of the projects in India and South Africa [18, 39] the target user groups were strong in oral tradition but low in literacy rates, which is not the case with our project.

4 Early design: server-based system

In this section we present the design of Tipple, which is based on the functional requirements reported in Sect. 2.

The Tipple system has been through three principal design iterations: (1) server-based system, (2) initial mobile system and (3) the revised mobile implementation. We will briefly review the first two iterations before giving a more detailed account of the revised mobile implementation. In this section, we describe the first iteration (server-based system), with the second and third iterations (both mobile systems) being covered in the next section (Sect. 5).

Tipple has been built on concepts from both TIP and Greenstone (introduced in Sect. 3). We extended two existing TIP services: the travel plan service and the Greenstone bridge. The TIP travel plan service [31] was a client-side application that provided users with a zoomable map to help them create their travel plan. The users would draw out their intended travel route on the map and the TIP system then provided suggestions for POI along the way. The travel plan service shared a central database with other TIP services. We extended this service to identify book locations that are close to the current travel path of a user.

In a previous project [24], we implemented a Greenstone/TIP bridge to provide location-based access to a digital library. The Greenstone Digital Library was used as a server to search and deliver the information based on the user’s location and TIP profile. This service takes the user’s location and profile information and creates a query, which is then sent to the digital library to search for possible documents. In advance (when the digital library is formed), Greenstone documents were prepared for location-based search using a place name recognition package that added location mark-up to the documents. All place names in documents were linked to place names in both TIP and Greenstone.

For the server-based system, we required a Greenstone location-search at the chapter level for both audio books and audio icons, and so we changed its indexing setting from document level to section level. The system selects audio books that are relevant to the user’s location. Their relevance is determined at a chapter level: if any chapter of a book is related to a given location, the book will be selected. All relevant audio books will be listed for the user, with highlighted chapters referring to the current location. The user can choose to listen to the audio content or to view the DL text. We also offer the option to play these chapters by travel route sequence. In [24] the user selected a region for their current location, and place names within this area were used in the query. In the server-based system this works directly with a user’s location. We extended the interface of the TIP/Greenstone Bridge: instead of repeatedly selecting a region of interest, the user is immediately directed to the selected collection based on their current location.
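The chapter-level relevance rule can be illustrated with a minimal sketch. It is not the Greenstone implementation: the `Chapter` and `Book` classes, the `places` field and the `select_books` function are hypothetical names introduced here, and the placeholder book is invented for contrast. Only the pairing of Persuasion, Chapter 23 and the Assembly Rooms is taken from the walk-through in Sect. 4.2.

```python
from dataclasses import dataclass, field


@dataclass
class Chapter:
    number: int
    places: set  # place names tagged in this chapter (hypothetical field)


@dataclass
class Book:
    title: str
    chapters: list = field(default_factory=list)


def select_books(books, place):
    """Return (title, matching chapter numbers) for each book that has
    at least one chapter tagged with the given place."""
    selected = []
    for book in books:
        hits = [c.number for c in book.chapters if place in c.places]
        if hits:  # a single matching chapter selects the whole book
            selected.append((book.title, hits))
    return selected


# Illustrative data only; the second book is a placeholder.
library = [
    Book("Persuasion", [Chapter(1, set()), Chapter(23, {"Assembly Rooms"})]),
    Book("Placeholder Title", [Chapter(1, {"Some Other Place"})]),
]
print(select_books(library, "Assembly Rooms"))
```

The returned chapter numbers correspond to the chapters that would be highlighted in the book listing.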

In this server-based version of our software, users were expected to pre-define their travel routes using the travel planning component. The system would then list all the routes related to the current user, the locations on each route and the chapters related to the locations. When a user selected a Travel Plan, the system added all relevant chapters based on location within the route.
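As a rough illustration of how chapters might be collected along a pre-defined route, the following sketch walks the route in order and gathers the chapters linked to each stop. The function name, the data layout and the rule of queueing a chapter only once are assumptions made for this example; the stop, book and chapter values are placeholders.

```python
def chapters_for_route(route, chapters_by_location):
    """Collect chapters in travel-route order.

    route: ordered list of location names on the travel plan.
    chapters_by_location: dict mapping a location name to a list of
        (book title, chapter number) pairs.
    """
    playlist = []
    seen = set()
    for location in route:
        for item in chapters_by_location.get(location, []):
            if item not in seen:  # assumed rule: queue each chapter once
                seen.add(item)
                playlist.append(item)
    return playlist


# Placeholder data for illustration.
route = ["Stop A", "Stop B", "Stop C"]
chapters_by_location = {
    "Stop B": [("Book X", 2)],
    "Stop A": [("Book X", 5), ("Book Y", 1)],
    "Stop C": [("Book X", 2)],
}
print(chapters_for_route(route, chapters_by_location))
```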

The text-to-speech function was built into the server side and thus enabled the transfer of the texts between the TIP/Greenstone Bridge and our main service. For the audio control panel, we decided simply to use hyperlinks to control the audio. Integration of these services with Greenstone through the TIP/Greenstone bridge gave us our first incarnation of Tipple. Details of this design are available in [19, 24].

4.1 Implementation

The server-based TIP system is implemented as a Java Servlet. A PostgreSQL database stores all relevant data. The user interface was implemented using Java Server Pages (JSP). Greenstone is similarly implemented as a servlet but without the use of JSP. For further details see [28]. Figure 2 gives an overview of the interplay between the new and existing components; existing components of Greenstone and TIP used for Tipple are shown in grey.

Fig. 2 Components of the Audio Service (with TIP and Greenstone servers)

As the user moves, the GPS component measures the current location, and the recommender component identifies locations nearby. The recommendations contain information about the recommended locations: a description, place name, and GPS coordinate. This information is then used by the notifier to trigger the Audio Manager [25]. The Audio Manager provides a list of available audio books (from the TIP database or from Greenstone via the TIP/Greenstone bridge) to the user. If there is no audio specifically for the location, the search is continued on the next higher level (e.g., region or location group). The Audio Manager receives the user’s book and chapter selection and sends a list of chapters that have been ordered by the Travel Plan component to the Audio Server. The Audio Control Panel provides control over the currently playing audio books. The Audio Manager may send text to the text-to-speech component to be read out. Text from the digital library can also be sent to the interface to be shown directly.
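The fallback to the next higher level can be sketched as a walk up a place hierarchy. This is a hedged illustration rather than the Audio Manager's code: `find_audio`, the `parent_of` mapping, the intermediate level "Central Bath" and the file name are all hypothetical.

```python
def find_audio(location, parent_of, audio_by_place):
    """Return (place, audio list) for the most specific level that has
    audio, walking from the location up to its enclosing levels."""
    current = location
    visited = set()  # guards against accidental cycles in the hierarchy
    while current is not None and current not in visited:
        visited.add(current)
        audio = audio_by_place.get(current)
        if audio:
            return current, audio
        current = parent_of.get(current)  # move to the next higher level
    return None


# Hypothetical hierarchy: location -> location group -> region.
parent_of = {"Gravel Walk": "Central Bath", "Central Bath": "Bath"}
audio_by_place = {"Bath": ["region-intro.mp3"]}
print(find_audio("Gravel Walk", parent_of, audio_by_place))
```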

The system provides two main interfaces for the user: the first one is the audio-based client to give a user information about the place via text-to-speech or via MP3. The second interface is a text-based Tipple web page that provides additional recommendations pertinent to the current location of the user. Additional interfaces allow manipulation of play-lists and settings (not shown in the figure).

Traditional JSP pages, as used in TIP, were not suitable for the display of changing information as was now needed; we, therefore, switched to using AJAX. For example, the system has to synchronize the audio with visual elements to indicate which audio is playing. This ‘animated’ web page, therefore, needs to regularly check with its server for new information (polling). The location information is updated when the user’s position has changed and the audio reference is updated when the audio ends.

The Audio Server Component is a Java application. It uses a TCP socket pair to communicate with clients. The Audio Server is responsible for mixing available audio streams into a single stream (e.g., for location notification while the audio is playing).
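One simple way to mix a notification into playing audio is to sum the samples of both streams and clip the result. The sketch below assumes signed 16-bit PCM samples held in plain lists and lowers the volume of the main stream while the notification plays. Both choices are assumptions for illustration; the actual Audio Server is a Java application whose mixing method is not described here.

```python
def mix_pcm(main, overlay, main_gain=0.5, overlay_gain=1.0):
    """Mix two lists of signed 16-bit PCM samples into one list.

    The shorter stream is padded with silence; sums are clipped to the
    16-bit range so that loud passages do not wrap around.
    """
    length = max(len(main), len(overlay))
    mixed = []
    for i in range(length):
        a = main[i] if i < len(main) else 0
        b = overlay[i] if i < len(overlay) else 0
        sample = int(a * main_gain + b * overlay_gain)
        mixed.append(max(-32768, min(32767, sample)))
    return mixed


# Placeholder sample values: a book stream and a shorter alert.
print(mix_pcm([1000, 2000, 3000], [30000, 30000]))
```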

Fig. 3 The role of audio in server-based Tipple. a Travel plan with audio. b Audio Book

4.2 Walk-through

We illustrate the implemented system working in a given example setting. The landmarks used are sourced from the kind of databases and online resources as shown in Fig. 1. We assume our intrepid explorer—let us call her Anne—is a regular user of the system and that she has already developed an interest profile around the writer Jane Austen. We use the example of Anne visiting Bath, UK.

As Anne sets out from the YMCA in central Bath, she is notified that she is near the Assembly Rooms. Anne takes out her mobile phone; Fig. 3a shows the interface presented to her. In the time it took to retrieve the phone as she walked, the system determined that another location of interest (the Gravel Walk) has moved into range. Thumbnails for each location are displayed in the upper portion of the screen. A speaker icon is shown next to each location as they are both currently available for playing.

Several lines of hyperlinked text accompany each thumbnail: the location’s title, the word “Stop” and (optionally) the word “Book(s).” Touching the title for the Assembly Rooms, for example, brings up specific information about this site. If an item also has the “Book(s)” hyperlink displayed, then this signifies there is content in the digital library related to this location. Accessing content through this hyperlink, there is in fact only one book displayed, Persuasion, and selecting this brings up the screen shown in Fig. 3b, where Tipple has highlighted in bold the chapter(s) associated with this location—in this case the penultimate chapter, Chapter 23. A short excerpt from each chapter is shown (the start of the chapter) followed by two icons: selecting the first plays the audio book version of the chapter; the second plays the version from the digital library using text-to-speech synthesis. The chapter excerpt is itself hyperlinked to the text in the digital library.

Reference books are essentially handled in the same way as regular books, although different visuals are used in the interface to help differentiate them. It would be easy to imagine a selection of bibliographic textbooks, for example, being associated with the Jane Austen Centre in Bath, which was also near Anne’s location but is not part of any narrative book. Internal metadata in the digital library is used to mark them as reference books so Tipple can detect them accordingly.

5 Design and implementation: mobile and self-contained Tipple

Here we present the design of the improved mobile version of the software. We ported the system from a server-based core with web-based interfaces to a native mobile app that accesses a local database or digital library. Thus the system can function independently of network access, but inclusion of external digital libraries is also supported. We engineered this transformation in two stages: first we migrated the TIP-level services to operate self-contained on the mobile device (but left the digital library as a networked resource); then we migrated the digital library software to also operate self-contained on the mobile device. Android was chosen as the mobile platform for development, primarily because of its alignment with the Java programming language. Another factor was its open-source base and the fact that there are no registration or license fee barriers to installing and running software on devices that use this operating system.

5.1 Migrating TIP-level services

We initially ported the server-based systems onto a mobile device such that all user-related data was now kept locally on the device. The map system was redesigned, moving from the travel plan service to Open Street Map, whose data can be stored locally. Book content still came from a networked digital library server; however, we added a small document cache on the device so that Tipple could be configured to operate in an area without network connectivity, albeit a geographically confined one.

As the original Greenstone/TIP bridge was supplying location-based cross-links between TIP and Greenstone (e.g., to be used for location-based browsing and “virtual” travellers), the service of automatically tagging single words or phrases did not match our needs in the mobile Tipple implementation (Sect. 5). Instead, we manually location-tagged paragraphs and chapters, as described in Sects. 6 and 7. We also provided additional audio information to the user (without screen-based reading). However, as not all books in the digital library may be available with pre-recorded audio content, we added a text-to-speech function.

The interface was redesigned from using hyperlinks and a web-based user interface to a mobile-app style interaction. More specifically, targeting the Android operating system meant moving away from a generic approach that would operate on desktops or any sort of mobile device, as long as it had a web browser, to one requiring specific implementations for specific mobile operating systems. This was a decision we consciously made, as we wanted to experiment with an interface where there was greater control over the level of user interactivity supported. Consequently the client side was completely redesigned and newly implemented.

This first fully mobile prototype was used for our initial user studies (reported in Sect. 6). Following the initial evaluation, we implemented additional features and adapted existing ones according to the feedback we received. This second, improved mobile software was then used in our further studies.

5.2 Finalized Design

Greenstone 3 has been ported to run, self-contained, on mobile smartphones such as the Apple iPhone [5] and, more recently, Android devices. Figure 4 provides a schematic overview of the software architecture for the mobile implementation of Tipple, which was informed by the previous design. The main division in the software is between the interactive application that runs on the phone and the geo-tagged documents that reside in a digital library. We deliberately chose to leave the digital library software loosely coupled to the TIP services through a network connection (HTTP) as this gave us the greatest flexibility. With the digital library server installed on the phone and its configuration file set to “localhost”, Tipple could operate self-sufficiently in an area with no network connectivity (such as the Hamilton Gardens, one of our test sites). Equally simple to set up was receiving the information from a networked central server. This latter configuration has the fringe benefit that the same collated tourist information is available to “virtual” travellers too, who can access the information from their current location through their web browser using the standard search and browse access methods provided by the DL.

In terms of communication between the two, the Tipple application on the phone is the initiator of the exchange, and the services of the DL software are accessed through a servlet. More specifically, the digital library component is built using Greenstone 3 [16] with a Solr-derived Search Service to support numeric and location-based searching in addition to the regular metadata and full-text retrieval.

A significant difference of note in this design, over our previous ones, is the pattern of communication between the application and the DL. Given that HTTP is client-side initiated, much of the complexity of these former designs was taken up dealing with ways that the server could asynchronously “push” data to the client (a kind of reverse-AJAX technique), requiring the use of components such as a socket server running on the client’s device. In this implementation we introduced a document cache on the client side so we could provide for all our event-triggered needs solely with traditional client-side initiated requests (i.e., only needing client-side “pull” operations).

Fig. 4 Components of the Audio Service on native Tipple

Instead of waiting for the server to notify the client the moment the user’s GPS location matches a specific location—as we did previously, requiring everyone’s GPS location to be continually sent to it—the client is responsible for monitoring where the user is and what potential places of interest might be nearby. It does this by retrieving, through the DL search service, the set of documents that fall within a catchment area centred on the user’s current GPS location. These documents are then stored in the Document Cache and accessed locally as needed when the user moves around. The new approach does mean that a Tipple application typically goes through periods where it places a high demand on the server to deliver content, followed by quiet spells; however, overall we have found that (because the client is no longer trying to provide real-time information to the user sourced from a remote server) this new configuration delivers a more responsive experience to the user.
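The client-side "pull" pattern with a catchment area can be sketched as follows. This is a minimal illustration under stated assumptions, not the Tipple code: the `DocumentCache` class, the `fetch` callable standing in for the DL search service, and the catchment, refresh and trigger distances are all hypothetical. Great-circle distance is computed with the standard haversine formula.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS coordinates."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


class DocumentCache:
    """Keeps the documents of the last catchment area on the client."""

    def __init__(self, fetch, catchment_m=500.0, refresh_m=250.0):
        self.fetch = fetch              # callable(lat, lon, radius_m) -> list of dicts
        self.catchment_m = catchment_m  # assumed radius of the catchment area
        self.refresh_m = refresh_m      # assumed movement that triggers a new pull
        self.centre = None
        self.docs = []

    def update(self, lat, lon):
        """Pull a fresh catchment area only after the user has moved
        far enough from the centre of the previous one."""
        if self.centre is None or haversine_m(lat, lon, *self.centre) > self.refresh_m:
            self.docs = self.fetch(lat, lon, self.catchment_m)
            self.centre = (lat, lon)
        return self.docs

    def nearby(self, lat, lon, trigger_m=30.0):
        """Answer 'what is close to me?' from the local cache only."""
        return [d for d in self.docs
                if haversine_m(lat, lon, d["lat"], d["lon"]) <= trigger_m]


# Stand-in for the DL search service, with placeholder coordinates.
LANDMARKS = [{"id": "doc-1", "lat": 0.0, "lon": 0.0},
             {"id": "doc-2", "lat": 0.05, "lon": 0.0}]
calls = []


def fake_search(lat, lon, radius_m):
    calls.append((lat, lon))
    return [d for d in LANDMARKS
            if haversine_m(lat, lon, d["lat"], d["lon"]) <= radius_m]
```

In this sketch, server requests occur only in `update`, which matches the bursty demand pattern described above, while `nearby` never touches the network.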

For the map component of Tipple we used the open source library MapsForge,Footnote 26 which offers a Google Maps-inspired API that works with Open Street Map (OSM). In our version we chose to implement this, by default, as static files on the phone (again motivated by our requirement that the entire application work without network connectivity). Prior to installation, an OSM protocol-buffer binary format (PBF) file for the area the subject will be travelling in is prepared using a desktop computer and subsequently transferred onto the phone as part of the installation procedure. MapsForge also provides support for retrieving and displaying map information live from the OSM web server. The flexibility this gives allows Tipple to operate more as a “mash-up” in spirit than as a traditional client-server design.

Android phones are equipped with a text-to-speech API (TTS), and so there was not much implementation work involved in getting Tipple to speak out loud the textual information it has about a location. One complication specific to our pilot study was that, while the information provided about the gardens was predominantly in English, there was also a significant number of Māori words intermingled. The pronunciation of these words by the TTS system was less than ideal. Even though the TTS API is tied in with Android’s support for locales, there was no version for Māori available, and even if there were, it is not obvious how it could be utilized in the mixed-language environment we had. We, therefore, chose to augment our implementation with a dictionary lookup method that mapped Māori words to phonetically expressed versions in the locale the phone was set for (English). Text displayed on the phone’s screen was presented “as is”; however, any text sent to the TTS API first had occurrences of words in our bespoke dictionary replaced by their phonetic versions.
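The dictionary lookup step can be illustrated with a short sketch that rewrites only the text destined for speech synthesis. The function name `to_speech_text` is hypothetical, and the respelling in the example is an invented placeholder that has not been checked linguistically; it is not an entry from the dictionary used in the study.

```python
import re


def to_speech_text(text, phonetic):
    """Replace whole-word occurrences of dictionary entries with their
    phonetic respelling. The displayed text is left untouched by the
    caller; only the copy sent to the TTS engine passes through here."""
    if not phonetic:
        return text
    # Longest entries first so multi-word entries win over their parts.
    words = sorted(phonetic, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(w) for w in words) + r")(?!\w)",
        re.IGNORECASE,
    )
    lookup = {w.lower(): p for w, p in phonetic.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


# Placeholder respelling for illustration only.
phonetic = {"whare": "fah-reh"}
display_text = "The whare stands beside the path."
print(to_speech_text(display_text, phonetic))
```

Matching on word boundaries avoids rewriting a longer word that merely contains a dictionary entry.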

5.3 Multi-platform version

As remarked in the previous section, native mobile implementations have to be developed within the bounds of the chosen operating system (i.e. Android, iOS or Windows Mobile), limiting the applicability to multiple platforms. With the system design finalized and in order to explore a more generic cross-platform approach for Tipple, we returned to our web-based approach. If the desired functionality can be achieved in a web browser, cross-platform support follows naturally. Our assessment was that these technologies have now matured to the point where they can deliver our functional requirements without resorting to complicated server-side “push” operations, such as those described in Sect. 4. For Tipple, the key benefit of making such a move is that it retains the simplified design described in the previous section, whilst allowing the implementation to be deployed on a wider range of mobile operating systems, not just Android.

Content from the digital library server (running on the phone or else over a network connection) is retrieved through AJAX calls from the phone’s web browser and displayed as landmarks on the map. The accompanying text is synthesized using text-to-speech and/or displayed on screen in a popup window.

To achieve this, we have made use of the PhoneGap platformFootnote 27—also known as Apache CordovaFootnote 28—for multiple mobile operating system software development. This is an approach to mobile app development strongly focused around HTML5 capabilities and related W3C standards and recommendations. When a particular mobile operating system’s web browser does not support a particular web technology, then PhoneGap provides a mechanism through ‘plugins’ for that capability to be implemented natively, but done in a way that makes it look no different at the Web API level. This means the software developer only needs to implement interface functionality once, with the specific details of what natively gets installed on a particular mobile device to make this work sorted out by the PhoneGap compilation and installation process.

Many existing plugins are provided, with the ability for a developer to write their own if desired. For example, Apple’s Mobile Safari directly supports the W3C text-to-speech API; however, the same is not true on Android (at the time of writing) but a PhoneGap plugin for it exists. We draw upon this in our multi-platform implementation. For map rendering functionality, we make use of the open source Leaflet APIFootnote 29 accessing Open Street Map data, paired up with PhoneGap’s local filesystem API so it can be stored and accessed directly on the phone, if desired. Location-aware GPS, letting the app find out where the user is located, is another plugin provided by PhoneGap.

This multi-platform version of the Tipple software only changes the underlying software scaffolding. The principal components (see Fig. 4) remain the same. Equally, the Tipple frontend interface design did not have to undergo any changes.

5.4 Walk-through

Figure 5 shows screenshots of the audio book service running on an Android phone. The screenshots are taken from an example of travelling around the Hamilton Public Gardens, of which more details can be found in the next section.

Fig. 5 Interface of native mobile Tipple app. a Audio at location. b Text at location

In Fig. 5a, which captures the user’s tour after their first few minutes in Audio-Only mode, the user has passed through the first location on the map (triggering a “chirp chirp” alert) but chosen not to play the audio commentary just yet, and has progressed to the second location (where a second “chirp chirp” rings out). This is the point at which the snapshot was taken. From here the user can start listening to the first location (Play), which will automatically continue on to play the audio for the second location (with a ‘bing’ to separate the two audio descriptions). Alternatively, they can skip to the next location in the queue (press the Skip-to-Next button once). From there they can start the audio for the second location (Play) or press the “Skip” button a second time (now labelled “End”) to clear the queue of locations.

Text-Only mode, shown in Fig. 5b, is similar to the Audio-Only mode. Here the Play button is replaced with a Show/Hide text button, for displaying a text rendition of the commentary superimposed over the map. The figure captures the same moment of the visitor’s tour as before. Despite the name of this mode, audio alerts are still played (as well as a vibration alert) to notify the user of a new location. Having heard the second “chirp chirp” sound the user has elected to press the Show text button in this case to read about the first location they visited. The same choices to view the next location’s commentary or clear the queue are also available in this mode using the skip button. There is also a mixed text-and-audio mode, which is not shown here.
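The queue behaviour described in this walk-through can be summarised in a small sketch. The class and method names are hypothetical, and the rule that the skip button reads “End” once at most one location remains is our reading of the walk-through rather than a documented specification.

```python
from collections import deque


class LocationQueue:
    """Models the pending-location queue behind the Play and Skip buttons."""

    def __init__(self):
        self.pending = deque()
        self.events = []  # audio cues emitted, in order

    def arrive(self, location):
        """The user enters a tagged location: queue it and alert them."""
        self.pending.append(location)
        self.events.append("chirp")

    def skip_label(self):
        """Label shown on the skip button (assumed rule, see lead-in)."""
        return "End" if len(self.pending) <= 1 else "Skip"

    def skip(self):
        """Drop the location at the head of the queue."""
        if self.pending:
            self.pending.popleft()

    def play(self):
        """Play all queued commentaries in order, separated by a 'bing'."""
        played = []
        while self.pending:
            played.append(self.pending.popleft())
            if self.pending:
                self.events.append("bing")
        return played


queue = LocationQueue()
queue.arrive("first location")
queue.arrive("second location")
print(queue.skip_label(), list(queue.pending))
```

In Text-Only mode the same queue would apply, with showing the text taking the place of audio playback.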

6 Study methodology and user study with reference book

In this section, we describe the general methodology of our three user studies (Sect. 6.1). We then describe the first study, which used a reference book. The two subsequent studies, which focus on works of fiction, are described in Sect. 7.

6.1 User study methodology

The three studies used different types of literary works. The first one used a reference book, while the other two used works of fiction. A reference book does not require the contents to be shown in any particular order (no chapter structure). It has, in our case, some overlapping locations. The second work is a fictional work, in which a sequence of events is described (chapter-like structure). The locations were not coded with any overlapping elements. The third work is a collection of poems referring to Auckland’s cityscape. The poems do not have to be visited in order and no overlap occurs between location annotations. For reasons of space, we do not discuss the works and annotation examples in great detail here.

We focus in this article on explorative studies to show the interplay between the software elements and to discuss the general concept of our Tipple system. The studies were semi-structured in nature. Before using the software, the participants were asked for demographic information and prior knowledge of relevant information systems and specific locations used in the study. After using the software, participants were encouraged to talk about their experience. We guided them via a number of open-ended questions that aimed to encourage participants to discuss their experience and to share any further ideas they might have for the software. None of the subjects received incentives for participating in a study.

We describe the preparations, setup and questions used for the reference book study in detail in the next subsection, in addition to the study results. We then present the results for the two studies using works of fiction (Sect. 7). For these, the study setup will be described more briefly, focussing on any differences to the initial study.

6.2 User study with reference book

We conducted a first user study using a book about the Hamilton Gardens, the text of which was semantically enriched with location data. This user study tested the viability of the concept and the usability of the interaction design as implemented.

6.2.1 Setup data

One of the difficulties of studying location-based audio books is the need to provide users with sufficient example data. We decided to first explore the use of a reference book, as at that point in time no suitable works of fiction were available for the region we are based in. We used a customized book for the Hamilton Gardens—specifically the Paradise Gardens, which is composed of a set of gardens in microcosm, each garden style from a different country—that was created by extending an existing work with information gained from a transcribed guided tour. Explicit references to relative locations were avoided (e.g., “to the left”). The data were prepared in book form and split into separate sections based on the different gardens in the Paradise Garden collection. Each of the sections was manually analysed for location information, and location mark-up was inserted using GPS coordinates as the centre of a circular area. Some of the location annotation is overlapping, as some garden features are part of a larger garden (e.g., the Rocks in the Japanese garden).
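Location mark-up with circular areas, including overlapping ones, can be checked with a simple containment test. The sketch below is illustrative only: the coordinates and radii are invented placeholders, the function names are hypothetical, and returning the smallest matching area first is an assumption about how overlaps might be ordered. Distances use an equirectangular approximation, which is adequate over garden-sized areas.

```python
import math


def distance_m(lat1, lon1, lat2, lon2):
    """Approximate ground distance in metres (equirectangular projection)."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371000.0 * math.hypot(x, y)


def areas_containing(lat, lon, areas):
    """Return every circular area that contains the position, with the
    smallest (most specific) area first."""
    hits = [a for a in areas
            if distance_m(lat, lon, a["lat"], a["lon"]) <= a["radius_m"]]
    return sorted(hits, key=lambda a: a["radius_m"])


# Placeholder coordinates: a feature annotated inside a larger garden.
areas = [
    {"name": "Japanese garden", "lat": 0.0, "lon": 0.0, "radius_m": 40.0},
    {"name": "Rocks", "lat": 0.0001, "lon": 0.0, "radius_m": 10.0},
]
print([a["name"] for a in areas_containing(0.0001, 0.0, areas)])
```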

The text content used for this study (as well as for the other two studies) was produced independently of this research. For our study, the text was professionally transcribed and minimally manipulated to be inserted into the digital library. No content was edited beyond removal of relative location references. Similarly, the location markup used for the later studies was produced independently and merely recorded and transformed to be suitable for encoding. Thus neither text nor location markup introduced or expressed any bias by the researchers.

No electronic map was readily available to be used for the study. All available online maps provided satellite views that were either not available at the desired resolution or (if available) too complex to be used (with many trees, flower beds and lawns that were hard to distinguish). None of the available raster maps distinguished any features for the gardens. This made it necessary to create a raster map with sufficient resolution so that it could be used in the study. Figure 6 shows the electronic map that was created for this study using the mapping tools developed to upload data to Open Street Map.Footnote 30

Fig. 6 Paradise Gardens

Fig. 7 Previous visits to Hamilton Gardens

Fig. 8 Use of (audio) books

The participants were invited to visit the Hamilton Gardens. After a brief introduction and a general questionnaire, they were given an Android phone with the Tipple software pre-installed. Participants were then encouraged to visit the Paradise Gardens deciding on their own route and timing. The gardens provide a number of options for the route taken; the book did not make any references to particular routes. Participants toured the site individually. The researchers met the participants at the exit of the gardens for a follow-up interview. No particular tasks were set for the visit. The motivation was to simulate an ordinary visit to the gardens without any unnecessary interference.

6.2.2 Questions

The following topics were explored in the questionnaire and the follow-up interview:

  1. Pre Tipple use: general questions about the participant and their usage of books and audio books;

  2. Post Tipple use: usability and affordance of the Hamilton Gardens audio guide; and

  3. Post study: usage analysis of the Hamilton Gardens audio book.

A summary of the participants’ answers and a discussion of our analysis are presented in the next two sections. We now present the results of the introductory questionnaire, system usage, and the follow-up interviews.

6.2.3 Study results

Participant background. The 16 participants who took part in the study were students from the department of Computer Science at the University of Waikato (14 participants aged 20–35, two aged 35–50; 3 females and 13 males). Most of the study participants had previously visited the Gardens, while six had visited rarely or never (see Fig. 7).

Figure 8 gives an overview of the participants’ familiarity with, and use of, books and audio books (presented as a boxplot) using a five-point Likert scale. Participants fall into two groups: half of them read books often or always (50 %, 8 of 16), six read rarely or never, and two read sometimes. The majority would not take works of fiction on travels (median of ‘rarely’), even fewer would take travel guides (median at 1.5; mode is ‘never’). Audio books, audio guides and electronic guides are almost completely unused.

Several participants use mobile electronic maps on their phone (10 of 16 ‘sometimes’ or more often); one of them typically uses digital photographs of maps to save bandwidth. The reason most often cited for low usage of electronic information is the cost of internet connections.

Audio-books at Hamilton Gardens. Figure 9 shows the overall length of time participants spent using the service. Twelve of the 16 participants predominantly used the Audio Service (blue/light grey), and four participants used the text-based display (red/dark grey). No relationship could be found between each participant’s preference for books and the time spent using the service; however, there was evidence that lack of familiarity with the general location led to impatience. For the two shortest visits in our trial—20 min (P9, text) and 25 min (P5, audio)—neither participant had been to the gardens before. Both found it difficult to relate the audio/text to their surroundings; P5 never stopped to listen to the audio, and P9 expected the book text to be matched by physical signs in each garden. It seemed that these two participants found the audio book concept difficult and had expected something more akin to an audio travel guide.

Fig. 9 Usage times

Fig. 10 Navigation

Fig. 11 Information

Figures 10, 11, and 12 summarise the participants’ feedback about the usability of the system. Figure 10 shows the participants’ feedback about the navigation (summarised from a number of open-ended questions in the semi-structured interview with the participants after using the Tipple system). Most participants (10 of 16) were satisfied with the localisation (map and location identifiers). Problems identified were the need to look at the map for navigation (P3) and the lack of feedback about direction of movement (P5). Participants who were satisfied with the map also observed that additional explicit information about the direction of their current movements would have helped. They suggested the use of additional indications, such as arrows.

Figure 11 shows a summary of the participant feedback about the information access. The audio icon (chirp) signalling the availability of text and audio information was generally described as being helpful. Two participants confused it with their phone ringtone and found repetitive signals annoying (the active areas in the gardens were relatively close to each other). The majority of participants (15 of 16) understood and enjoyed the book content. Suggestions for improvements were to create smaller items with an option to continue instead of offering all available information for each location at once.

Fig. 12 Content reception

All four readers of the digital text were satisfied with the interaction and readability of the text (see Fig. 12); 8 of the 11 participants with audio were satisfied, with one not answering. Both P1 and P11 observed that the text-to-speech audio became boring after some time, as it was rather monotonous.

Fig. 13

Presentation preferences

All 16 participants were given the opportunity to explore the other presentation form (i.e. text for audio users and audio for text users) after they finished their walk through the gardens. They were then asked for their preference of presentation (audio or text). Figure 13 summarises the presentation preferences of the participants for our study. All audio participants preferred the audio presentation over text; none of them asked for text representation. The reason most often given was the advantage of being able to look at the surroundings and not having to read from a small screen. As a possible additional use of text, they mentioned the advantage of skimming and quick decision making. Two participants using the text display preferred text over audio; P13 felt it would have ‘looked crazy’ if they had been walking around with an audio guide and they preferred the option of quick skimming. P14 was a non-native speaker and preferred to read the text in their own time.

Figure 14 shows the suggestions made by participants for further information. None of these were specifically prompted (i.e., no list of options was provided); rather participants were asked for any additional information they would wish to have displayed or any functionality they would have liked to have.Footnote 31 During our analysis we summarised their feedback into the categories given in Fig. 14. The label ‘layered information’ refers to breaking the text into smaller units; ‘further information (open)’ refers to any additional information that may be available, e.g., online or in a DL. Participants liked the idea of displaying further information (text or photographs) of other locations mentioned in the text, which a link to a digital library service would be well placed to support.

Additionally, they suggested the use of photographs to identify any artefacts the text mentions. In summary, the feedback of the participants indicates that increasing the exposure of the underlying DL services in the mobile application would enrich its functionality in line with users’ expectations.

Changes made after the first user study

Following the results of the first study, we improved the interface and implemented further features to make the system more usable, especially for chapter-based books.

Fig. 14

Additional information: suggestions (Hamilton Gardens)

Figure 15 shows screenshots of some of the changed interfaces: (a) we introduced new chapter location icons and a travel path feature; (b) users can now add their own photographs, which are added to the text shown; (c) users can follow several books in the same location (indicated here by differently-coloured icons); and (d) each of these features can be managed via an options menu. Further details about the software extensions can be found in [40]. Note that Fig. 15a and c are both shown zoomed out, to highlight the path feature and the places belonging to two books, respectively. Books can be turned on and off as the user prefers. For the map, a simple pinch gesture allows the user to zoom in and out, comparable with other mobile map systems. This feature was used competently by all our participants to get the level of zoom with which they felt most comfortable.

7 Two user studies using works of fiction

In addition to the Hamilton Gardens study that tested the viability and usability of the software, we executed two further studies in which participants used Tipple for location-based access to works of fiction: one on poetry and one with an ongoing narrative.

7.1 Stations at the gardens

The Stations at the Gardens is a yearly arts event in Hamilton that uses a well-known story separated into 14 chapters. Each part of the story is further illustrated by pieces of art that are installed in the Paradise Gardens every evening for one week. We used Tipple to make location-based audio recordings of the story available to visitors of the event. As the event happened during the evenings, any alternative paper-based text distribution suffered poor readability.

The event organizers provided the story text as well as a map. We manually created a digital book and assigned the location annotations. For fine-tuning, we visited the event location and adjusted the location information if we found that the actual location differed from those that had been planned originally.

Fig. 15

Additional and changed features. a Travel path indication. b Adding user-defined photographs. c Map with several books. d Options menu

The software was installed on 11 Android phones belonging to visitors to the event. It was also available from the event’s website and in an Audio-Only version for other phones. Additionally, we distributed Tipple-installed phones to selected visitors. As the event happened in the dark, visitors were given torches by the organizers to explore the garden locations and stations that related to particular elements of the story. Almost 700 people visited the event; we were present with our devices and software on four evenings. Most visitors were adults of various nationalities, most of whom had not brought their phone. Very few had Android phones but all of those who did were happy to have our app installed. However, most visitors were unable to wait for the complete installation before they entered as this required copying and installing the APK (details can be found in [40]). All participants were handed questionnaires and also encouraged to be available for interviews; not all took up the offer.

We observed and briefly interviewed eight participants between 8 and 10 p.m. on several evenings; another eight filled in a short questionnaire. Eight of these 16 participants were female and eight were male; all were from the general public (i.e., non-IT participants). None of the participants were previously familiar with the Tipple software nor had they taken part in our first user study. Because we observed users “in the wild” during a public event, the results are of somewhat mixed quality. Visiting times using Tipple are shown in Fig. 16, where it can be seen that tours lasted longer for people who used the audio feature. Visits took 40 min on average and most users stayed between 30 and 60 min; two phones had low battery power (cutting out after 10 and 20 min, respectively).

Six of the eight participants using the questionnaire were satisfied with the audio sound. The audio was generally described as being “helpful” and “interesting.” One participant said “It was exciting to hear (the story) when looking at (the artists’) works.”

A problem was created by the close proximity between some stations (leading to the wrong chapter being offered). The majority of participants read and played the audio in order. One participant preferred to play the chapters by selecting them from the map as they “would not have to be close to the location but [are] still able to read the story.” Two other participants also suggested that the audio not be queued by locations visited (the latest one always played first) but to allow audio to be played in their own time independent of location. This feature is already available in Tipple but as it was not the focus of our study the participants were not made familiar with it.
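The wrong-chapter problem arises when the trigger areas of neighbouring stations overlap. The following is a minimal sketch of one possible mitigation, not a description of how Tipple resolves locations: among all stations whose trigger radius contains the visitor, offer the closest one. The `Station` type, the function names and the trigger radius are hypothetical and introduced only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass(frozen=True)
class Station:
    chapter: int      # chapter attached to this station
    lat: float        # latitude in decimal degrees
    lon: float        # longitude in decimal degrees
    radius_m: float   # hypothetical trigger radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_triggered(stations: Sequence[Station],
                      lat: float, lon: float) -> Optional[Station]:
    """Return the closest station whose trigger area contains the visitor."""
    best: Optional[Station] = None
    best_distance = math.inf
    for station in stations:
        d = haversine_m(lat, lon, station.lat, station.lon)
        # Only stations in range compete; the nearest one wins.
        if d <= station.radius_m and d < best_distance:
            best, best_distance = station, d
    return best
```

With two stations roughly 11 m apart and a 15 m radius each, a visitor standing between them is inside both trigger areas; this rule offers the chapter of whichever station is closer rather than whichever happens to be checked first.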

Eight users tried both audio and text during the study; eight used audio only. The majority of audio participants (7 of 8) preferred the audio presentation (one preferred text); see Fig. 17. Five of the participants who tried both options preferred audio, two the text, while for one participant no data are available. Two participants who preferred text were non-native speakers who wanted to read the text in their own time. One participant with preference for audio said that it was “not comfortable to read in the dark and on a small screen.” Some participants felt the chirping noise the system made to alert them to another chapter was disturbing in quiet areas (when used without earphones).

Participants suggested integration into social networking services to record their own photographs (see Fig. 18) or receive more information (four participants). Two participants wished to take notes and one wanted to share the pictures provided by the app.

Overall we received positive feedback from the participants but technical issues and the experience of Tipple users “in the wild” made running the trial somewhat haphazard.

7.2 Poetry on the pavement

Poetry on the Pavement was an event in Auckland, New Zealand’s largest city. Throughout the city, a selection of poems was placed on the pavement, which together told the story of the city. For the purposes of our study, each poem was treated as a separate chapter of a book. We created the digital book in the library and manually added the location annotations based on the information that was made available from the event organisers.

Learning from the experiences of the Stations event, for the Poetry on the Pavement study we did not distribute the software to participants’ phones but used pre-installed Tipple phones. We used a SWOT analysis to identify strengths, weaknesses, opportunities and threats of the Tipple app, and a usability questionnaire based on ISO 9241 [7].

The study had 20 participants: 15 were recruited through invitations issued to passers-by on Auckland’s streets (with a wide variety of ages and nationalities), and five had computer science backgrounds.

Fig. 16

Usage times (Stations event)

Fig. 17

Presentation preferences (Stations event)

Fig. 18

Additional information: suggestions (Stations event)

Fig. 19

Use of (audio) books (Poetry-on-Pavement event)

Half of the participants were female and half male. Three were just under 20, seven were aged 20–35, eight 36–50, one 51–65, and one was over 65. None of the participants were previously familiar with the Tipple software nor had they taken part in our first two user studies.

We observe that prior experience with digital content in this group differs considerably from that of the group in the original study in Hamilton Gardens: 18 of 20 participants used digital maps while travelling, 9 used audio guides in museums, 9 read novels while travelling and 8 used electronic travel applications (for details see Fig. 19).

Figures 20 and 21 summarise the participants’ feedback about the usability of the system. Figure 20 shows the participants’ feedback about the navigation, gathered from the participants after they had used the Tipple system and following a similar semi-structured format to the second study. Most participants were satisfied with the localisation (map and location identifiers), though some wished for additional explicit information about the direction of their current movements. When asked about their satisfaction with the speed at which the system operated (e.g., for redrawing the map and loading information), 17 of 20 were very satisfied, and three found it acceptable.

When asked about the controllability of different app features (see Fig. 21), most participants liked the way in which they could switch between the presentation forms (i.e. audio, text or audio and text) and access the picture-taking function. However, 12 of 20 people found the selection of books not ideal (11 found it ‘okay’ and one did not like it). One participant expressed that it was “too many steps to change the book preferences,” while another observed that it took “at least three steps.” One participant without experience in mobile app usage said they would not have known that they could change the preferences without help.

Fig. 20

Navigation (Poetry-on-Pavement event)

Fig. 21

Information (Poetry-on-Pavement event)

We received positive feedback about the system features such as showing chapter locations, chapter numbers and travel path (showing connections between chapters on the map), as shown in Fig. 22. Several participants mentioned particularly that they enjoyed the location-triggered system, the presentation of chapter order and the clear design of the travel paths. They observed that the chapter order becomes very clear through the use of the travel path on the map. One participant suggested that the travel path should stand out more on the map.

Fig. 22

Helpfulness of systems features

Fig. 23

Controllability of app

Participants found the system easy to control (see Fig. 23) but mentioned possible travel interruptions. They would have liked to have more information about the books available and overall wished that there were more books available that they could listen to (that is, the lower mark on book selection seemed more of a reflection on the range of books available than the control mechanism). They also suggested support for taking several pictures in one location.

Nineteen of 20 participants were satisfied with the overall experience of Tipple. They observed that the chirping alert is occasionally intrusive during travel. The advantage of also being able to read the chapters in their own time was explicitly mentioned. One non-native English speaker said they did not like the application because it was mainly using English, which they found challenging to read or listen to while travelling.

Participants recommended additional features (when asked an open question for additional information or features): they wished for options to personalise the system with their own images and for close integration with social networking services such as Twitter or Facebook (see Fig. 24). Two users would have liked the option of taking notes, and two others wished for the app to support different languages.

Fig. 24

Additional information: suggestions

8 Discussion

We first compare our findings from the three studies to those of earlier studies. It needs to be stressed again that no directly comparable study has been previously executed. The available studies are mainly for audio output in tourist guide systems or navigation systems. Spoken audio in digital libraries has so far only been considered for desktop-based systems, e.g., for reading learning tools or for text-to-speech to increase accessibility (e.g., systems for visually impaired or dyslexic people [3, 21, 43]).

8.1 Studies of related work

User studies have never systematically explored large numbers of audio guides. Related work in HCI has typically focused on technological innovation (such as context-awareness). Available studies predominantly focus on system design and evaluation. The general need for field trials in particular is still under discussion as they are difficult to conduct and time consuming [34]. However, we believe that mobile and adaptive systems that explore new interaction pathways provide richer results in field studies and cannot be adequately tested in a lab-based study. Most studies on audio guides in museums focus on (re-)evaluation of the user requirements (e.g. [41, 49]). Audio museum guides follow similar design strategies to travel guides. Because they are restricted to enclosed spaces, they typically need different means to identify locations or exhibition items. Audio information is more easily distributed (dedicated hardware available in the museum) and is typically pre-recorded. These characteristics do not hold for a system like Tipple—a digital library that is accessed while being at the location referred to in the book.

8.2 Sharing audio books

Each of our study participants used Tipple by themselves. However, in an informal pilot to our first study, two participants went together and expressed the desire to listen to the audio book together. This would constitute a new DL access pattern as users do not typically search and read documents in digital libraries in groups. Audio interaction and collaboration between users have been evaluated by Aoki et al. [1], who developed an electronic guide book that allowed museum visitors to explicitly share audio information (pre-recorded and spoken). They found that most couples used shared audio as a conversational resource. Evjemo et al. [17] also report that nearly 40 % of their study participants explicitly asked for the opportunity to share audio content in a museum and this included nearly 90 % of those who usually had company when visiting museums. Most of our participants reported that they typically went travelling with friends or family. However, none asked for an opportunity to share the information, which may be due to the participants being on their own during our studies.

8.3 Directions and object identification

Some participants wished to receive information not only about their location but also about the direction of their movements. Additionally, several participants reported difficulties in identifying items that were mentioned in the book in their immediate surroundings. Combined audio and directional signage were previously evaluated [13] (note that our system does not provide audio directions). That study found that a map with photographs or directional arrows and photographs allows users to find their way more quickly than when using the map alone. We believe that audio messages in addition to audio book readings would lead to confusion as our participants already complained about the spoken feedback provided (for instance, when a site was previously visited). Directing users to items based on sound (similar to [20]) may be worth exploring as an alternative. Previous evaluation of audio tourism guides used short pre-recorded audio (around 30 s) [29]. In contrast, audio from our DL books is typically longer and relies on text-to-speech. Holland et al. observed that participants in their studies preferred audio over video and more extensive use of still images was requested. This is similar to our study, where participants asked for additional photographs to identify and relate to the objects mentioned in the book. This request finds a direct parallel in their study, where approximately 60 % of the participants reported that it was occasionally difficult to link the information given to the surrounding buildings.

8.4 Interaction model with the book

It is typically observed that a visual user interface takes attention away from the physical surroundings, whilst listening to audio clips seems to balance the attention much better [1]. Nevertheless, we had four participants overall who preferred to read text and a number of participants were undecided. A strong reason for this preference was the self-paced access (easier understanding for non-native speakers, faster skimming or relevance determination for proficient readers).

8.5 Tourism system versus digital library

Earlier in this article we made the point that previously evaluated systems belong to the categories of tourism and/or museum guides. Digital libraries have been implemented typically as desktop-based applications and only recently in mobile applications [5]. We did not find any studies for location-aware audio books in digital libraries or location-aware access to digital libraries in general. We briefly discuss here the differences between the two approaches to highlight that, even though studies in tourism guides allow some comparison with our studies, they are still limited. The differences between guide systems and audio books are matters of recording, length, purpose and the need for pre-processing.

Tourist audio guides typically provide relatively short snippets of pre-recorded audio (for example, [17] describes 3 min as a particularly long recording). An exception was AccesSights [35] with text-to-speech for guiding visually impaired users. In contrast, book chapters or even parts of book chapters can easily span several minutes or more. Readings that are only a few seconds long would be almost unheard of in audio books. Typically guides are purpose-built for a particular location whereas audio readings of books are usually not intended for automatic reading on location. One of the consequences is the necessary pre-processing, which is inherent in building a guide system but constitutes an additional step for books. Locations need to be identified and attached to (parts of) book chapters. Automatic location mark-up of text is still an open research issue. Two of the 16 participants in the Hamilton Gardens study had difficulties interacting with the system as they expected it to more clearly refer to significant locations and give routing guidance. We believe that their expectation of the app’s interaction model was that of a tourist guide and not an audio book. This aspect needs further exploration. This problem did not occur for either of the two studies on fiction/sequential books. We believe that participants in these latter cases formed mental models that were more strongly influenced by the book concept.

8.6 Language issues

Several participants noted the limited quality of the text-to-speech function. They reported that listening to the monotonous voice used became “boring” and thus hard to follow. However, as better sounding text-to-speech programmes are regularly becoming available, we are of the opinion that this problem should be alleviated in the near future. Already many people are now used to using text-to-speech, e.g., in mobile navigation systems. Furthermore, none of the participants in our three user studies was wearing headphones—these might further influence the audio experience.

Even though Tipple supports pre-recorded audio, this feature was not tested in our studies. It would be worthwhile to test the acceptance of the audio feature again with newer software and more options available. Finally, the limitations of text-to-speech can be avoided using Tipple’s feature of playing text as MP3 audio; this, however, would require pre-recording the reading of the books.

The researchers observed an additional issue when developing the chapters referring to the Māori garden (beyond the south west corner, Fig. 6). No text-to-speech implementation is available for Māori; moreover, the spoken text was to be delivered in English and a mixing of languages is not supported. To mitigate this issue we developed a bespoke phonetic dictionary containing a fixed list of Māori words appearing in the text. Culturally this is not a viable option nor does it scale well for larger bodies of text.

8.7 Limitations

There are certain limitations to note about the user evaluation undertaken to date. Naturally, the most appropriate way to evaluate software for audio presentation of books (or audio-books) is to evaluate it using a complete set of books. Executing such an evaluation is clearly a goal that we wish to achieve; however, we are currently limited by the difficulty of sourcing suitable content that is freely available and located in the region in which we are based. We are in discussion with local authors, but to date such an opportunity has not arisen. Another limitation is that our user studies only sparsely sample the intended audience (i.e., people of all ages and backgrounds interested in literary tourism). This is a common problem in research projects, which we remedied in part by involving the general public in the studies using works of fiction. However, these then encountered issues related to running user studies in the wild, such as limited hardware support for the software, as outlined in Sect. 7.

Our studies represent an initial exploration of a new software concept and, despite the shortcomings discussed above, we believe that a number of useful insights were gained from our studies, as outlined in this article.

8.8 Semantic annotations

The semantics of location annotations needs further exploration. Variations of precision and certainty are something that could be encoded in a location’s annotation. Piatti et al. [42] discuss options of distinguishing between actual location, remembered location and dream location. Furthermore, overlapping annotations have been used only sparingly in our examples so far, and the wider implications of hierarchical annotation need to be explored. When a story is set in New Zealand, the scene in Hamilton and the protagonists are at home, for instance, then the level of data provided to the user should most likely depend on the user’s location (e.g., presenting more general information if a user is further away). These considerations do not so much concern the annotation itself. Rather it is a question of user experience in presenting the annotation.
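The idea of making the presented level of detail depend on the user’s distance can be sketched as follows. This is only an illustration under stated assumptions, not part of the Tipple implementation: the `AnnotationLevel` type, the `select_level` function and all distance thresholds are hypothetical, and the labels simply mirror the New Zealand / Hamilton / home example above.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AnnotationLevel:
    label: str             # e.g. a house, a city, a country
    max_distance_m: float  # hypothetical: present this level up to this distance


def select_level(levels: Sequence[AnnotationLevel],
                 distance_m: float) -> AnnotationLevel:
    """Pick the most specific level whose range still covers the user."""
    if not levels:
        raise ValueError("at least one annotation level is required")
    # Most specific (smallest range) first.
    ordered = sorted(levels, key=lambda lv: lv.max_distance_m)
    for level in ordered:
        if distance_m <= level.max_distance_m:
            return level
    # The user is beyond every range: fall back to the most general level.
    return ordered[-1]


# Nested annotations for one scene; the thresholds are made up.
LEVELS = [
    AnnotationLevel("the protagonists' home", 200.0),
    AnnotationLevel("Hamilton", 20_000.0),
    AnnotationLevel("New Zealand", math.inf),
]
```

Under these assumed thresholds, a user 50 m from the scene would be shown the home-level annotation, a user 5 km away the city-level one, and a user 500 km away only the country-level one. The annotations themselves stay unchanged; only the presentation rule varies.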

We believe that automatic annotations of locations have limited applicability, and for Tipple we currently follow a crowd-sourcing approach of manual location annotations [26]. In this work we have begun to distinguish between POI, areas, paths and gateways. We also explore the option of offering alternative locations and selecting specific locations for the traveller to visit even though the literary work remains unspecific. These options may be more viable for works of fiction, while historic or factual books might wish to rely on precise locations. A detailed discussion goes beyond the scope of this article.

The implementation discussed in this article is agnostic to how the annotations were created.

9 Conclusion

In this paper, we have presented work on an audio-based tourist information system that sources a large portion of its information from documents annotated with geo-locations contained in a digital library. Having established the functional requirements and reviewed related work, we detailed our system’s design and implementation followed by a series of user studies that investigated the roles text and audio play in delivering location-aware information to the user.

Tipple was implemented as a combination of travel software and digital library, opening up opportunities that neither of these two types of systems possess independently.

The system’s potential was explored through three field studies of literary texts with varying characteristics: a reference book, a narrative and a set of poems. Each of those highlighted different possibilities for the system to be used and developed further. The location annotations used in our three field studies were done manually, though (as noted previously) we are now exploring crowd-sourcing of location annotations for books.

Tipple is agnostic to the types of books being used. Indeed, there is no particular reason to even limit the scope of the works to writers. Subjects like history, geography or religion would work equally well.

9.1 Current work

We have also started exploring the use of Augmented Reality (AR) in conjunction with Tipple to provide an alternative spatial context in which the user can access content from the digital library. The existing 2D map view of Tipple has been enhanced to include an AR View button, which can be selected at any time. Choosing this option switches the display to a live video camera mode which shows an equivalent view of the map-based location markers. Using the smartphone’s digital compass in addition to GPS information, the software augments a live video stream with interactive markers, in our case representing the geographical locations of the content retrieved from the digital library.

Figure 25 shows a snapshot of Tipple in this new AR mode, taken from an example of the app in use on our local university campus. When the phone or tablet is held vertically and the user pans their device, round markers (white circles in this case) come in and out of view. The larger the circle, the closer the user is to that location; distance information is also displayed. Walking to that location, or else touching one of these markers on screen, triggers access to that location as before in Tipple. A radar view overlaid in the top-left corner aids orientation and shows the location of other markers nearby.
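The geometry behind such a view can be sketched in a few lines. The code below is an illustrative assumption about how compass heading and GPS position might be combined, not the method used by Tipple or Mixare: it computes the bearing to a marker, its angle relative to the direction the device faces, a horizontal screen position, and a circle radius that grows as the user gets closer. The field of view, screen width and pixel sizes are hypothetical defaults.

```python
import math
from typing import Optional


def initial_bearing_deg(lat1: float, lon1: float,
                        lat2: float, lon2: float) -> float:
    """Compass bearing (0 = north, 90 = east) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    return math.degrees(math.atan2(y, x)) % 360.0


def relative_angle_deg(bearing_deg: float, heading_deg: float) -> float:
    """Signed angle of the marker relative to the device heading, in [-180, 180)."""
    return (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0


def marker_x(relative_deg: float, fov_deg: float = 60.0,
             screen_width_px: int = 1080) -> Optional[float]:
    """Horizontal pixel position, or None if the marker is out of view."""
    if abs(relative_deg) > fov_deg / 2:
        return None
    return (relative_deg / fov_deg + 0.5) * screen_width_px


def marker_radius_px(distance_m: float, max_distance_m: float = 500.0,
                     min_px: float = 12.0, max_px: float = 60.0) -> float:
    """Closer locations get larger circles (linear, clamped)."""
    t = min(max(distance_m / max_distance_m, 0.0), 1.0)
    return max_px - t * (max_px - min_px)
```

As the user pans, only `heading_deg` changes, so markers slide across the screen and drop out of view once their relative angle exceeds half the field of view.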

Fig. 25

The latest Tipple interface using Augmented Reality

In developing this extension to Tipple only a modest amount of implementation work was needed. Much of the essential work was accomplished using the general purpose open source Augmented Reality project Mixare,Footnote 32 available natively for both Android and iOS. The key to combining Mixare’s API with our software was mapping our geo-location markup into the JSON format used by Mixare, and then attaching callback methods that returned control to Tipple once a marker in the augmented view was either activated (when the user walks into the actual location) or selected (when the user touches the marker on the screen).
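The two steps described here, converting location annotations to a JSON marker list and routing marker events back to the application, might look roughly like the sketch below. The article does not give the schema Mixare expects or the shape of its callbacks, so every field name (`id`, `lat`, `lng`, `title`, `num_results`, `results`) and every function and class name is an assumption made for illustration and should be checked against the Mixare documentation.

```python
import json
from typing import Callable, Dict, List


def to_marker_json(annotations: List[Dict]) -> str:
    """Serialise location annotations as a JSON marker list.

    The output layout is hypothetical, not Mixare's documented format.
    """
    results = [
        {
            "id": str(a["id"]),
            "lat": a["lat"],
            "lng": a["lon"],
            "title": a["title"],
        }
        for a in annotations
    ]
    return json.dumps({"num_results": len(results), "results": results})


class MarkerDispatcher:
    """Routes marker events from the AR view back to the host application."""

    def __init__(self, open_location: Callable[[str, str], None]) -> None:
        # open_location(marker_id, trigger) is supplied by the host app.
        self._open_location = open_location

    def on_activated(self, marker_id: str) -> None:
        # The user walked into the marker's actual location.
        self._open_location(marker_id, "proximity")

    def on_selected(self, marker_id: str) -> None:
        # The user touched the marker on screen.
        self._open_location(marker_id, "touch")
```

Keeping both triggers behind a single `open_location` callback reflects the behaviour described above, where walking to a location and touching its marker lead to the same content.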

9.2 Future work

Possible future avenues of research for this project are manifold:

  1. We have begun to explore the semantics of location annotations in more detail, for example, to address the potential implications of hierarchical annotations, overlap of location annotations and options for situations that have more than one location (e.g., telephone calls) [26].

  2. We are exploring options of mobile location-based information that goes beyond text, such as music (e.g., according to the countries that the Paradise Gardens represent), maps (particularly historic maps), and educational content, where users would be more involved instead of merely being passive recipients of information.

  3. We are planning to execute a user study with a longer chapter-based book, preferably with members of the public. Similarly, a longitudinal study with a small number of participants could provide many insights and study aspects that go beyond our current evaluation, which focussed on the general concept, system architecture and interaction issues.

There are a number of aspects that are already supported in Tipple but that were not yet studied in detail, such as remote access to audio books. Furthermore, the aspect of newly visiting versus revisiting or even reading/listening to the book for the first time or again, or after reading the “real” book is a question which we would be interested to explore in future. Many other aspects come to mind, which could not be included. These have to be left to further specialized research that goes beyond an exploration of the Tipple concept and architecture. Examples are explorations of suitable audio/tactile alerts for public spaces, social aspects of group activities and the impact of professionally-produced audio books.