1 Introduction

The unceasing advances in Information and Communication Technologies (ICT) have profoundly impacted our society and daily lives, providing relevant benefits in a wide set of sectors. In the particular case of the cultural industry, technological advances not only make it possible to capture, disseminate and (persistently and more effectively) provide access to cultural assets and events, but also to adapt and transform each of these aspects as desired. In addition, the recent explosion of immersive media technologies, like virtual reality (VR) and extended reality (XR), further expands the possibilities in this sector, opening the door to fascinating new opportunities and revenue models. VR not only enables a hyper-realistic digitalisation of cultural assets and events—and thus their digital preservation—but also becomes a powerful medium for interactive storytelling (e.g. [1, 2]) and knowledge acquisition, and further enables the reconstruction of both tangible and intangible cultural heritage (e.g. [3, 4]), even from the past (e.g. [5]).

VR has the power to overcome geographical and temporal barriers to enjoying cultural events, performances, guides and heritage, making virtual travel through time and space a reality. The agents involved in the cultural sector are becoming increasingly aware of the opportunities that VR can provide in terms of communication, entertainment and learning. It even makes it possible to expand audience reach, providing interactive, hyper-personalised and engaging experiences, anytime and anywhere.

A simple and cheap, yet effective and hyper-realistic, way to provide VR experiences is through 360° videos, also known as immersive or VR360 videos. In VR360 videos, a view in every direction is recorded at the same time using an omnidirectional camera or a camera rig that captures overlapping angles simultaneously. The multiple views are then stitched together into a single, high-resolution and seamless panoramic video. VR360 videos do not allow free navigation around 3D VR environments (commonly known as 6 degrees of freedom (6DoF)) or highly interactive experiences combined with motion and position tracking, as they are flat formats captured from a single point of view. However, VR360 videos have the potential to provide immersive experiences and a high degree of realism with less effort than virtual modelling and reconstruction techniques in 3D VR, as real scenarios and characters can be directly captured with a VR360 camera.

VR360 videos can be enjoyed via traditional devices (like PCs, laptops and smartphones) or VR devices (like head-mounted displays (HMDs)). Regardless of the device, users can freely choose the viewing direction within the 360° space, but each device type provides different interaction mechanisms, such as the mouse or keyboard on PCs, and internal movement sensors, like the gyroscope, on smartphones and HMDs.

Especially when watched on VR devices, VR360 videos can provide unique experiences, placing viewers at the centre of the scene and allowing them to look freely in any direction. This gives viewers the illusion of actually ‘being there’ in the captured environment, which greatly improves both immersion and engagement compared with traditional media. In essence, VR360 video production has changed the way audiences perceive and interpret video content, opening new possibilities in terms of digital storytelling. Within the cultural sector, VR360 videos have the potential to provide worldwide visitors with unique experiences, such as virtual tours of cultural spaces and events. They enable the exploration of the most culturally and spiritually significant spaces and events in human history, like some of civilisation's greatest wonders and creations, fostering a rich cultural understanding. What is more, these immersive pieces can give viewers access to corners of the world they never imagined they would experience. They can additionally provide features not possible when visiting these places in person, like floating viewpoints, or choosing a seat or even bouncing from seat to seat, to the wings and to backstage while an orchestra or an opera is playing. Due to this great potential, major platforms, like Facebook, Google and YouTube, and many cultural institutions have started creating VR360 videos to provide more innovative and immersive experiences to their audiences.

Despite the growing interest in, and adoption of, VR360 experiences, research studies on, and thus solutions for, accessibility in immersive media are so far limited. This hinders the interaction of a significant percentage of the population with VR experiences. Proper technological solutions, contents, interfaces and recommendations need to be sought in order to ensure a proper narrative interpretation of information and usability, regardless of the capacities of the users, their age, language and/or other specific impairments. This will contribute to global e-inclusion, offering equal opportunities of access to the whole spectrum of consumers, while ensuring compliance with regulatory guidelines.

Accessibility in immersive media, and in VR360 videos in particular, is the research topic addressed in this paper. As proof of evidence, Section 2 will provide some context, outlining statistics about people with accessibility needs and the increasing demographic ageing, and highlighting both the interest of senior citizens in the cultural sector and their value therein. Likewise, existing regulatory frameworks and conventions to secure democratic and equal access to cultural goods by all citizens will be introduced in that section. Together, these issues reflect the high relevance of the research topics addressed in this paper. Then, an overview of current initiatives and solutions by major platforms and cultural institutions to provide VR360 experiences will be given in Section 3. Section 3 will also discuss the needs for making immersive experiences accessible, highlighting the key associated challenges and the lack of appropriate solutions. Therefore, Section 3 supports the potential of this novel medium on the one hand, while reflecting on the need for research on efficiently integrating accessibility features into VR360 videos on the other.

After that, Section 4 will provide an overview of a developed end-to-end platform to integrate access services (like subtitling, audio description and sign language) and appropriate interaction modalities into immersive media services, like VR360 videos and spatial audio. The platform includes the necessary components from media production to media consumption, but special attention will be given to the content consumption part, as it is the interface with which end users will interact. In particular, Section 4 will present the key features of an open-source VR360 player that enables a personalised presentation of immersive and accessible content. The player serves as a demonstrator of potential solutions to address the identified challenges and limitations, but also as a real tool to test and validate them. As proof of concept, some examples of cultural VR360 pieces with access services, created using the developed platform and integrated in the VR360 player, will be introduced. After this, Section 5 will discuss the contributions of this work and their potential impact on the cultural sector, providing standard-compliant, open-source and validated tools to third-party entities for adding access services and appropriate interaction interfaces to existing and future VR360 content. That section will also reflect on the need to extend the contributions of this work to the wider 3D VR and XR domains.

2 Societal impact

Technological advances typically provide significant social and financial benefits, but may also raise barriers for some citizens.

While user interaction and engagement are at the core of VR technologies, no homogeneous consumer profiles exist. Common classifications by gender and age impact the interaction with ICT and the participation in cultural content [6]. Not all consumers behave similarly or have similar levels of enjoyment when interacting with VR experiences [7]. A significant percentage of the population has accessibility needs, and this is augmented by the increasing ageing of the population. In addition, novel technologies raise a further issue: efficient usage and interaction are influenced not only by disabilities but also by capabilities—e.g. skills—which also become a key aspect to be considered.

This section provides statistics about the population with accessibility needs, including the aged, highlighting their relevance in the cultural sector. Then, some regulatory frameworks in this domain are introduced. Finally, it argues why taking into account the whole spectrum of the population is essential to contribute to e-inclusion and universal access to the cultural sector.

2.1 Population with accessibility needs

According to the World Health Organization (WHO), around 466 million people—i.e. over 5% of the world’s population—have disabling hearing loss, and this number is expected to rise to over 900 million by 2050 [8]. It is also estimated that approximately 1.3 billion people live with some form of distance or near vision impairment [9]. Similarly, statistics from the Institute of Hearing Research estimate 81.5 million adults with hearing loss in Europe, which is around one in seven adults.

In addition, population ageing has become a clear demographic trend. Continuous population growth and ageing will further increase the number of people acquiring hearing loss, vision impairment and other disabilities. This becomes a challenge to be overcome in our society. For example, although the total population in the European Union (EU) is projected to increase from 511 million in 2016 to 520 million in 2070 (an increase of 1.8%), the old-age dependency ratio (i.e. people aged 65 and above relative to those aged 15 to 64) in the EU is projected to increase by 21.6 percentage points, from 29.6% in 2016 to 51.2% in 2070 [10]. It is expected that in 2020 approximately 120 million persons in the EU will have multiple and/or minor disabilities. According to statistics from the UN, between 2015 and 2030, the number of people in the world aged 60 years or over is projected to grow by 56%, a tendency which is expected to continue. Similar statistics reflect that in the next decades the number of older people will double, and the number of the ‘oldest old’ might almost triple [11]. According to Eurostat [12], in 2017, nearly one-fifth (19%) of the EU population was aged 65 or over.

This ageing tendency is closely related to disability rates, as older people may develop age-related impairments, such as visual, hearing, physical and cognitive ones. According to EPRS [13], over a third of people aged 75 and over have physical, mental or sensory impairments that pose an accessibility issue, and over 20% are considerably challenged. Likewise, the World Wide Web Consortium (W3C) presented a study in 2008 (Footnote 1) that relates ageing not only to hearing loss but also to visual, physical and cognitive challenges. That study states that hearing loss is experienced by 47% of people aged 61 to 80 and by 93% of people over 81. Moderate or severe hearing loss or profound deafness is experienced by 20% of people aged 61–80 and by 75% of people over 80. Furthermore, hearing loss may make it difficult to discern speech from background sounds, and higher-pitched sounds can be missed. In relation to visual decline, the prevalence is 16% among people aged 65–74, 10% among people aged 75–84 and 46% among people aged 85 and over. Visual decline mainly affects the ability to focus, contrast sensitivity and the perception of colour changes. Likewise, around 20% of people aged 70 are estimated to suffer cognitive decline, mainly affecting short-term memory and concentration and causing distraction issues.

As proof of evidence, the United Nations Centre for Regional Development (UNCRD) (Footnote 2) provides a framework for action in a number of articles highlighting the interrelations between ageing and disability, especially in Article 9 (accessibility).

In addition to all these accessibility- and ageing-related figures and trends, linguistic and cultural capabilities also need to be taken into account. Society has become global, and people commonly travel or move to different countries for professional and/or leisure reasons, so language and culture may also become a barrier.

All these facts have a clear impact on the media accessibility field, which has traditionally focused mostly on audiences with disabilities, but now needs to take into account other audiences, such as the elderly, non-native speakers and people with low literacy, to enable their full democratic participation in a society where ICT and novel technologies play a key role.

In line with these trends and facts, the average age of audiences in the culture and arts sectors is progressively increasing. Eurostat has conducted studies on older people's cultural participation patterns and preferences [14]. As shown in Fig. 1, although the results vary significantly between EU countries, they are noteworthy in general. Visiting cultural sites appeared to be the most attractive cultural activity for people aged 65–74 (43%), followed by live performances (38%) and cinema (27%).

Fig. 1 Cultural consumption variations in older people's preferences [14]

Similarly, recent studies about the age of audiences in the scenic arts agree on the increasing average age of the audience. According to a research study carried out by the Audience Agency (Footnote 3) in the UK in 2017, audiences for classical music artforms, such as opera, are much more likely to be in middle and older age groups, with 37% aged over 61. In addition, older audiences are not a phenomenon restricted to classical music artforms, but are also very interested in other artforms, like theatre.

These statistics show that senior citizens are big consumers of culture and heritage, which is supported by two key factors. First, they are no longer active workers and have more free time. Second, cultural activities and out-of-season tourism are frequently promoted for them.

From an economic point of view, the cultural and creative industries are among the fastest growing sectors. According to UNESCO [15], the culture sector accounts for 6.1% of the global economy, with an estimated global worth of 4.3 trillion USD per year. Beyond these large annual revenues, the cultural and creative industries generate nearly 30 million jobs worldwide. Therefore, the income produced by the consumption of cultural goods by the elderly should not be underestimated. Beyond the economic impact, however, universal access to cultural goods must be guaranteed [16], so cultural spaces and events should be adapted to everyone. Beyond architectural adaptation, the presentation of content and information should be adapted to comprehensively reach all citizens. Traditionally, museums have attempted to increase their attendance by improving the quality of their content, not by reducing the price of admission [17,18,19].

These actions will contribute to diminishing the digital divide for senior citizens and people with accessibility needs, which will in turn contribute to an inclusive society and to an increased well-being. Different regulatory frameworks and recommendations have been elaborated to protect this principle. These are reviewed in the next sub-section.

2.2 Accessibility legislation

Disabilities, sensorial impairments and ageing have functional, social, emotional and economic impacts. They involve a barrier to properly accessing and interpreting information, which is a right of all citizens, without exception, in all sectors and services. E-inclusion thus becomes a priority for governments worldwide. This is also the case in the EU, as proposed by the Single Digital Market (SDM) strategy, which aims to guarantee a seamless exchange of media assets across countries and communities, and by two important UN conventions that have legal repercussions within the scope of accessibility to cultural goods.

The Convention on the Protection and Promotion of the Diversity of Cultural Expressions [20] identified the elderly as an audience sector to be protected against exclusion. This is further expanded in the Convention on the Rights of Persons with Disabilities [21], since the elderly may not be classified as disabled, but represent the largest group of citizens with hearing and/or sight loss. These two UN conventions have been developed into two Directives and one Act affecting all EU countries. In order to comply with the new legislation, public sector organisations will need to monitor the accessibility of their websites, apps and media content, and report to a central authority identified for each country. The three pieces of legislation are:

1) The EU Directive on the Accessibility of Websites and Mobile Applications [22]. It requires all EU member states to meet common accessibility standards for public bodies' websites and mobile apps. It is based on the Web Content Accessibility Guidelines (WCAG) 2.0 [23] and references EN 301 549 [24] as the standard that will enable websites and apps to comply with the law. This Directive was transposed into the laws of each EU member state by September 2018.

2) The Audiovisual Media Services Directive (AVMSD) [25]. It governs the coordination of national legislation on audiovisual media, addressing key issues like:

  • Rules to shape technological developments

  • Preserving cultural diversity

  • Protecting children and consumers

  • Safeguarding media pluralism

The Directive was approved in 2018, and member states have 21 months to transpose it into national legislation.

3) The European Accessibility Act [26]. It is a law that aims to make many products and services in the EU more accessible for persons with disabilities. Some examples include:

  • Smartphones, tablets and computers

  • TV programmes

  • E-books

  • Online websites and mobile apps

It takes the form of a Directive, which is legally binding for all Member States.

Therefore, these three pieces of EU legislation demand accessibility services in all cultural venues and events, not only for their websites, services and spaces but also for the content and information they offer. As reviewed in the next section, many major platforms and cultural institutions are increasingly providing VR360 videos to their audiences, so these pieces of content also need to be accessible.

2.3 Discussion

The number of citizens with disabilities and/or functional limitations is increasing significantly with the ageing of the population, and this will also increase the demand for accessible products and services, including those in the cultural sector.

Accessibility has a considerable economic impact, but above all it is a human right, required to guarantee universal access to services and information. To achieve this, the whole population, regardless of their capabilities, disabilities, age and languages, must be considered. A number of pieces of EU legislation have been developed to regulate and monitor the adoption of appropriate accessibility solutions and practices.

Typically, accessibility has only been considered after a technology has matured, to meet the demand of the mass market and to fulfil legal requirements. However, when it comes to novel and not-yet-mainstream services, such as VR, the development of accessibility solutions becomes more challenging. This is because accessibility becomes an issue not only of sensorial disabilities but also of capabilities, technological skills and usability aspects [27]. Nevertheless, considering accessibility from the early development of novel technologies will contribute to more effective solutions, wider adoption and a smaller financial burden. In the particular case of immersive technologies, there is still a gap in making VR experiences accessible to everyone. Immersive experiences need to be inclusive across different languages, addressing the needs not only of those with hearing and low vision problems but also of people with cognitive or learning difficulties, newcomers, people with low literacy and the aged, and these requirements also apply to the cultural sector.

3 Related work

This section first provides an overview of relevant companies and cultural institutions that provide VR360 experiences to their audiences. Then, the key aspects to be taken into account for providing accessibility in immersive VR360 content are reviewed, highlighting the challenges and current limitations. Finally, a review of existing VR360 players is provided, analysing their support for accessibility features and the interaction modalities they provide.

3.1 Companies and cultural institutions providing VR360 experiences

Several media companies and institutions have invested significant effort in the production of VR360 immersive content, in a variety of scopes. This sub-section provides some examples, ranging from journalism to films and documentaries, and to cultural venues and events.

The New York Times (NYT) is a clear example of a major media company, with worldwide influence and audience, that decided to make use of VR360 videos not only as a new reporting tool for journalism but also as an innovative storytelling form. As proof of evidence, NYT VR (Footnote 4) was launched in November 2015, and The Daily 360 (Footnote 5), an initiative in which one VR360 video is published every day, was launched 1 year later. The VR360 videos created so far span news reports, documentaries, series and films, and cultural events and guides (Footnote 6). Although NYT developed an ad hoc player, their VR360 videos are also published on their YouTube channel (Footnote 7), which has over 2 million subscribers. In a 1-year period, the created VR360 videos reached over 2 million views on YouTube and over 94 million on Facebook. The VR Gorilla – Virtual Reality Productions company has created many VR360 videos to provide immersive experiences about, e.g. cultural documentaries, events and guided tours of places with high cultural value. The videos are available on its YouTube channel (Footnote 8), which has over 20,000 subscribers and includes videos with over 200,000 views.

RYOT is another immersive media company, founded in 2012 in Los Angeles, that very rapidly became known for the production of VR360 videos. The company was acquired by The Huffington Post/AOL, in turn owned by Verizon, in April 2016. Films produced by the studio have won several awards, and even an Oscar nomination. In a joint project with Jaunt Studios, RYOT started the creation of an innovative VR360-based documentary series in 2015, entitled Holy Land. Holy Land is a five-part series that virtually brings viewers to the Middle East to observe the confluence of Judaism, Christianity and Islam. The episodes transport viewers to the Holy Land to experience cultural events and locations, while also showing a region that has been stricken by decades of cultural conflict and political tensions.

VR360 videos have also been provided in the scope of museums for exhibitions, entertainment or educational experiences, becoming powerful examples of how immersive formats offer new pathways for audiences to engage with culture and arts. The State Hermitage Museum, located in Saint Petersburg (Russia), offers its worldwide visitors a VR360 video that allows them to get immersed in historical facts and cultural events related to the museum. Likewise, since 2016, the Dali Museum in Saint Petersburg (Florida, USA) provides a permanent VR360 experience (Footnote 9) that allows the audience to enter Salvador Dali’s surrealist painting Archaeological Reminiscence of Millet’s Angelus, venturing into the towers and discovering surprises around every corner. The video has more than 2 million views on YouTube (Footnote 10). This VR360 experience has gained visitor acclaim and international awards. It is also being exhibited in other venues, like at Second Century Studios (Florida, USA) during 2019.

Finally, in 2012 Google started a project called Google Arts & Culture, which aims at virtually reconstructing different art collections, cultural events and sites. This includes a collection of VR360 videos (Footnote 11) captured from different places around the globe. These videos allow viewers to enjoy arts and cultural performances, like ballet, orchestra and opera productions, seeing and hearing the action from different positions.

It is also becoming increasingly popular for museums and cultural institutions to provide other forms of VR experiences to their audiences [2,3,4], like 3D models of famous exhibits, virtual 6DoF tours, gamification tools and augmented reality (AR) apps. Some examples are the State Hermitage Museum, the Google Arts & Culture project, the National Museum of Natural History in Paris and the British Museum in London. Acknowledging the high relevance of these VR services and experiences, and the fact that they also require accessibility-related features, this paper focuses on VR360 content, which has proven to be an effective and widely adopted medium for providing immersive and engaging experiences, with high cultural and educational value.

3.2 Research on accessible VR360 experiences: limitations and challenges

As with any media content, VR360 videos need to be accessible. VR360 videos include audio and visual elements which, at the same time, may contain linguistic and non-linguistic components. For example, a VR360 video may include an off-screen narrator and on-screen characters speaking in an environment where other sounds and music can be heard. The visuals provide information about the spatio-temporal settings, the speakers and their actions, and can also include text on screen, be it added in the post-production process (e.g. informative captions) or as part of the diegetic world (e.g. posters on a wall) [28].

The user's meaning-making process is possible thanks to the intersection and interaction between all these diverse components, which contribute to a comprehensive understanding of the content. Therefore, alternatives must be provided to cater for the needs of those users who cannot access all audiovisual components, which is the aim of the access services described next.

3.2.1 Subtitling (for the deaf and hard-of-hearing)

When the audience cannot make use of the audio, subtitles for the deaf and hard-of-hearing (SDH) [29] can transfer not only the words of the characters but also other relevant sound information. This includes music and non-speech information, such as sound effects and paralinguistic sounds (e.g. laughing, crying). Another feature that needs to be transferred is who is speaking, which becomes more relevant and complex when the speaker is not visible on screen. SDH provide an account of all relevant acoustic elements and differ from interlingual subtitles, addressed to a hearing audience whose barrier is linguistic and not sensorial.

To date, many subtitling solutions for traditional 2D video content have been proposed, including support for personalisation options (e.g. [30,31,32]) and advanced presentation methods for better speaker identification (e.g. [33]). However, subtitling solutions for immersive media are still in their infancy. Taking into account the specifics of immersive media, the following key research questions and challenges can be highlighted:

  • Where to position the subtitles in the 360° sphere? The EBU R-95 recommendation [34] provides guidelines for presentation of subtitles in safe areas of traditional 16:9 TV screens. However, immersive displays, like HMDs, have a different aspect ratio (around 1:1) and a very limited field of view (FoV), typically around 90–110°. Accordingly, the format and size of safe areas that provide comfortable viewing experiences when using HMDs need to be determined. Likewise, specific subtitle presentation modes (e.g. subtitles attached to the speaker) and personalisation features would need to be explored for a better immersion/engagement.

  • What are the appropriate text fonts and sizes? There are several studies and recommendations on that subject for traditional media, but not yet for immersive media, where both accessibility and immersion are crucial.

  • How to deal with non-speech information? This is similar to traditional media, but VR environments open the door to new possibilities (e.g. the use of 3D elements, spatial presentation).

  • How to guide users towards the active speaker(s)? Unlike in traditional media, users have the freedom to explore the omnidirectional area in VR360 content. Therefore, it could happen that the active speaker is outside the current user’s FoV. Although spatial audio could support users in perceiving where the speaker is, deaf and hard-of-hearing users cannot hear it, and the audio may not be audible in noisy/public environments. Therefore, appropriate (visual) guiding methods need to be designed in order to effortlessly find the speaker in the 360° space, thus contributing to more effective content comprehension (a simple sketch of such a method is given after this list).

  • Is the reading speed the same in immersive media as in traditional media? There are several studies and recommendations on that subject, and on the number of characters to be presented, for traditional media, but not yet for immersive media, where a higher amount of visual data is presented to users.
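
To make the guiding challenge concrete, the following is a minimal TypeScript sketch (not taken from any of the cited works) of how a player could decide whether the active speaker is inside the viewer's FoV and, if not, which arrow indicator to display. The function name, the yaw convention and the default 100° FoV are illustrative assumptions.

```typescript
// Illustrative sketch: decide whether the active speaker is inside the viewer's
// field of view and, if not, which arrow to show. Angles are in degrees.

type Guidance = { speakerVisible: boolean; arrow: 'left' | 'right' | null };

function guideToSpeaker(userYaw: number, speakerYaw: number, fov = 100): Guidance {
  // Signed shortest angular distance from the user's gaze to the speaker, in (-180, 180].
  const delta = ((speakerYaw - userYaw + 540) % 360) - 180;
  const speakerVisible = Math.abs(delta) <= fov / 2;
  // Point the arrow along the shorter rotation towards the speaker.
  const arrow = speakerVisible ? null : delta > 0 ? 'right' : 'left';
  return { speakerVisible, arrow };
}

// Example: user looks at 10°, speaker is anchored at 150° -> show the right arrow.
console.log(guideToSpeaker(10, 150)); // { speakerVisible: false, arrow: 'right' }
```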

As far as the authors know, only two recent studies have partially investigated this topic, excluding works that have added burned-in subtitles to VR360 pieces at post-processing stages. First, a study carried out by the BBC [35] compared four solutions for subtitle presentation: (i) Evenly spaced: subtitles are equally spaced with a separation of 120° in a fixed position below the eye line; (ii) Follow head immediately: subtitles are always displayed in front of the viewer, and follow him/her when looking around; (iii) Follow with lag: subtitles appear directly in front of the viewer and remain there until the viewer looks somewhere else; then, the subtitles rotate smoothly to the new position in front of the viewer; and (iv) Appear in front, then fixed: subtitles appear in front of the viewer, and are then kept fixed until they disappear. Preliminary studies on short clips concluded that the Follow head immediately method was preferred by users, mainly because (i) the subtitles were easy to locate and (ii) viewers did not miss the subtitles when exploring the 360° area. However, the particular implementation in [35] resulted in blocking effects (i.e. subtitles were blocking important parts of the image and were considered obstructive), which could be optimised. Second, the study in [36] also compared the first two presentation methods, but no clear differences in their appropriateness or in users’ preferences were identified. The results from these two studies encourage further research in the field, e.g. considering other content types and longer pieces of content, but especially addressing the other mentioned and unexplored challenges that are essential for both accessibility and immersion.

3.2.2 Sign language interpreting

Another access service that makes audio content accessible is sign language interpreting, which translates speech into sign language. Many service providers have included sign language videos in their content as a burned-in overlay or picture-in-picture (PiP) window. In addition, the work in [31] leveraged the interactivity features enabled by the Hybrid Broadcast Broadband TV (HbbTV) standard [37] to dynamically activate and deactivate the sign language service and to provide some personalisation features, like adjusting the position and size of the PiP window. However, as far as the authors know, dynamic and personalisable solutions for sign language interpreting for VR360 content are nonexistent, and challenges and open questions similar to those of subtitling apply to this service.

3.2.3 Audio description

When the audience cannot access the visuals, audio description becomes the solution. Audio description is an access service in which the visuals are translated into words and interspersed in the gaps in which there is no dialogue. In this service, the audio describer aims at transforming the Who, Where, When, What and How elements of each scene into auditory information. In certain productions, like opera, in which few gaps for audio description are available, audio introductions are offered to the audience, sometimes in combination with a touch tour of the props on stage [38]. Existing solutions and research approaches for audio describing traditional media are reviewed in [39, 40].

Audio describing VR360 content is more challenging though. In this medium, content selection is not based on the visuals provided on a 2D screen, but on an omnidirectional or even 3D world where actions can happen all around and users can wander at their leisure [41]. Therefore, it could happen that audio description refers to a region of the 360° space that is outside the current user’s FoV. Likewise, there is typically more information to audio describe than in traditional media. Spatial audio could come to the rescue, by providing cues about where the action is taking place. Different audio placement and presentation modes, and even scripting approaches, together with personalisation features, could be explored with the goal of maximising accessibility and immersion. No research on audio description for VR360 has been conducted yet, but professionals have recently shown interest and highlighted the need for novel solutions and approaches in this topic [41].

3.2.4 Audio subtitling

Visuals may also contain text, such as subtitles, which provide a translation of the original. Audiences who cannot see the subtitles, but also audiences who cannot adequately read them (e.g. users with reading difficulties or dyslexia), are in need of what is called audio subtitling, also known as spoken subtitles. Audio subtitling is an ecological access service, since it reuses and recycles existing subtitling assets. Previous studies on audio subtitling have mostly focused on descriptive theoretical perspectives and on current practices, and less on technological solutions and experimental studies. The existing solutions and research approaches for audio subtitling traditional media, together with the deployment status of this access service, are reviewed in [42]. Most of the time, audio description and audio subtitles are integrated into a single product. In other cases, an independent track with audio subtitles is offered, very often with a text-to-speech voice. In any case, audio subtitling should be made available with audio description, as a complement, never as a substitution.

As for subtitling and audio description, no research has been conducted on effectively providing audio subtitling for VR360 content. However, challenges and opportunities similar to those of the first two access services apply to this one.

3.2.5 Easy to read subtitling

A more innovative access service mixes easy-to-read language with subtitles to create easy-to-read subtitles [43]. Easy-to-read subtitles offer a simplified version of the subtitles and are suitable for situations in which users want to focus more on the visuals and get a shorter and simpler account of the linguistic content. They can also provide benefits to users with reading difficulties. This becomes especially relevant in VR360 content, where (i) the FoV is more limited (thus leaving less space for subtitles) and (ii) users will have more time to explore the 360° area. Research on this novel access service is very scarce, even for traditional media, but its application to VR360 content is promising, and also requires an exploration of the research challenges and questions identified for subtitling.

3.2.6 Accessible user interfaces

Research on the design and adoption of appropriate user interfaces (UIs) and interaction modalities for traditional media services and consumption devices has been conducted in recent years. These issues become more complex when many options need to be provided to users, e.g. in terms of settings for the consumption experience and for the particular access services being provided, and this is especially true and relevant in the VR360 domain. Further research needs to be conducted on designing accessible, yet immersive, UIs for VR360 environments, integrating advanced interaction modalities and access services, while enabling hyper-personalised consumption experiences. Examples of the challenges to be addressed, and open questions to be explored, in this context include: the limited FoV and unknown safe areas in VR360 environments; the availability of 3D omnidirectional environments; the freedom to spatially explore the 360° area; the adoption of interaction modalities used in the VR world (e.g. the use of movement sensors for exploration, and of timers, gestures and/or VR controllers for selection purposes) and of modern interaction modalities for better accessibility (e.g. voice control); and finding a trade-off between immersion and accessibility that maximises usability.

3.3 Existing VR360 players and their accessibility features

To date, many VR360 players have been developed by the research community and industry. Table 1 provides a brief, but comprehensive, comparison of most of the known VR360 players in terms of: the technology used for development; the licensing model (commercial or free, including whether they are released as open-source); the supported platforms and devices; the supported access services; and the supported assistive technologies.

Table 1 Comparison of VR360 players

As can be seen in Table 1, not all VR360 players have a free licence, and just a few of them have been released as open-source. Multi-platform support is a typical feature of the available VR360 players, but this is not entirely true when it comes to TVs and HMDs. Their support for access services is mostly limited to subtitling. However, the existing subtitling solutions are based on burned-in subtitles (and thus provide no personalisation or interaction features), on traditional television rendering modes, or on the presentation modes previously introduced in this section for the BBC player. No VR360 players supporting the dynamic presentation of a sign language video have been found, and audio description as an additional stream is only supported by the JW Player and the YouTube player, without taking advantage of the previously discussed opportunities that spatial audio can offer in VR360 environments, and without providing any personalisation features. Support for audio subtitles is unavailable in the existing players. Regarding the envisioned assistive technologies and methods, none of them is supported by the existing VR360 players. The Facebook and NYT players provide a radar for supporting the user in getting oriented in the 360° environment (represented by the ‘*’ symbol in the table), but this is used as a guiding method for the VR360 video, not linked to any access service. As indicated in the last row of Table 1, and further detailed in the next section, the accessibility-enabled VR360 player developed within the umbrella of this work provides support for all the listed features. Therefore, this comparison confirms the advantages and benefits provided by the developed player, and by all its associated components for media authoring, processing and delivery, not only in terms of technological aspects but also in terms of the supported interaction modalities, access services, assistive technologies and personalisation features. The ‘+’ symbol in specific cells of Table 1 indicates that the developed VR360 player provides enhanced performance and/or more appropriate options for the specific features being reviewed (in particular, the presentation of subtitles and audio description) than the other existing VR360 players.

4 End-to-end platform for immersive accessibility

As a response to the previously identified limitations and needs, the EU H2020 Immersive Accessibility (ImAc) project (www.imac-project.eu) is exploring how access services (like subtitling, audio description and sign language) and different forms of interaction modalities (like voice control, use of VR controllers) can be efficiently integrated within immersive media (like VR360 video and spatial audio), while enabling hyper-personalised experiences adapted to the preferences and/or needs of the audience.

The ImAc project is carried out by a cross-disciplinary consortium composed of international experts in the fields of content creation and service provisioning (e.g. broadcasters), multimedia systems and accessibility. The premise of ImAc is to consider accessibility not as an afterthought, but as an essential aspect in the specification and deployment of end-to-end immersive services. Likewise, ImAc attempts to avoid the proliferation of isolated and closed add-on solutions by designing a standard-compliant end-to-end system that encompasses all stages from media authoring to media consumption.

The project has adopted a user-centric methodology in which end users, professionals and stakeholders are involved at every stage of the project through the organisation of workshops, focus groups and tests, and through attendance at events. This elicits a bottom-up approach to determining the accessibility, interaction and personalisation requirements. The insights from the user-centric activities in turn determine the technological solutions necessary to meet these requirements, and the specific services and scenarios to be provided. By combining the consortium expertise, the adopted premises and the scientific methodology followed, the project aims to ensure that immersive VR360 experiences are inclusive across different languages, addressing the needs not only of those with hearing and low vision problems but also of people with cognitive or learning difficulties, newcomers, people with low literacy and the aged.

The next sub-sections present the ImAc platform components, paying special attention to the accessibility-enabled VR360 player, and provide some examples of VR360 content newly created and/or adapted with accessibility layers within the umbrella of the project.

4.1 Overview of the end-to-end platform

To achieve the targeted goals, an end-to-end platform has been developed. The platform includes all the necessary parts, components and steps from media ingest and authoring to media consumption (see Fig. 2). It has been designed with the premise of keeping backward compatibility with current technologies, formats, infrastructures and practices in media systems. Likewise, the platform components have been developed using web technologies. This guarantees cross-device, cross-platform and even cross-browser support, eliminating the need for any installation and/or updates at the client side [44]. These two design choices will maximise reusability, interoperability and the chances of successful deployment and exploitation.

Fig. 2 Overview of the end-to-end platform

Next, an overview of each one of the platform parts is provided to better understand the context—and potential impact—of this work.

4.1.1 Content production/authoring

The platform includes components that allow the ingestion of immersive VR360 content and the production of the related accessibility assets. Accordingly, the content authoring part of the platform includes a set of (web-based) tools for editing access service content (including subtitles, audio description and sign language) and for integrating it with the available VR360 content. The specific features and signalling metadata provided by the editing tools are not detailed here, but can be inferred from the features provided by the VR360 player (described later).

4.1.2 Service provider

This part of the platform includes different components for the management and cataloguing of content, and for triggering its publication, either by associating it with scheduled (broadcast) TV programmes or by posting it on specific websites. One key component in this part is the Accessibility Content Manager (ACM), developed to enable the upload of VR360 content (and optionally its related assets, like personalised covers/thumbnails and existing access service content), to catalogue it, to link it with the editing tools and finally to trigger the publication of the immersive and accessibility content for final consumption.

4.1.3 Content preparation and distribution

This part of the platform includes components for preparing the available content for distribution. These components are mainly in charge of content encoding, segmentation and signalling. The project focuses on the distribution of content via broadband web servers or content delivery networks (CDNs), making use of Dynamic Adaptive Streaming over HTTP (DASH) technology [45]. The platform is also prepared to deliver broadband content via DASH as an enrichment of conventional broadcast Digital Video Broadcasting (DVB) services, by leveraging the features provided by the HbbTV standard [37].
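
As an illustration of how access-service tracks signalled in a manifest could be selected on the client side, the following TypeScript sketch uses hypothetical types that loosely mirror DASH Role/Accessibility descriptors; it is not the actual ImAc signalling scheme, nor the API of any specific DASH library.

```typescript
// Minimal, illustrative access-service track selection over a parsed manifest.
interface AdaptationSetInfo {
  id: string;
  mediaType: 'video' | 'audio' | 'text';
  lang: string;
  role: 'main' | 'audio-description' | 'sign-language' | 'subtitles';
}

function pickTrack(
  sets: AdaptationSetInfo[],
  mediaType: AdaptationSetInfo['mediaType'],
  role: AdaptationSetInfo['role'],
  lang: string
): AdaptationSetInfo | undefined {
  // Prefer an exact language match; otherwise fall back to any track with the requested role.
  const candidates = sets.filter(s => s.mediaType === mediaType && s.role === role);
  return candidates.find(s => s.lang === lang) ?? candidates[0];
}

// Example: choose a Catalan audio description track if one is signalled.
const sets: AdaptationSetInfo[] = [
  { id: 'a0', mediaType: 'audio', lang: 'en', role: 'main' },
  { id: 'a1', mediaType: 'audio', lang: 'ca', role: 'audio-description' },
];
console.log(pickTrack(sets, 'audio', 'audio-description', 'ca')?.id); // 'a1'
```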

4.1.4 Content consumption

This part of the platform includes a web-based portal (i.e. landing page) for language selection, initial settings, content listing and selection (see Fig. 3), and a web-based player for the presentation of the VR360 content (see Fig. 4).

Fig. 3 ImAc portal

Fig. 4 UI of the ImAc player

The portal and player have been developed by relying exclusively on web-based technologies. Therefore, the VR360 content can be consumed via a range of traditional (e.g. connected TVs, PCs, laptops, tablets and smartphones) and VR devices (e.g. HMDs). This is also an advantage compared with other existing VR360 players, whose use is restricted to specific platforms and/or device types (see Table 1).

The different accessibility, interaction and personalisation features provided by the player, together with its UI, are presented in the next sub-section.

4.2 Accessibility-enabled VR360 player

This sub-section describes the main features of the developed portal and accessibility-enabled VR360 player. A demo video outlining and showcasing these features can be watched at https://bit.ly/2Wqd336. Their current version and the available VR360 and accessibility content can be accessed via this URL: http://imac.i2cat.net/playertest/. Their source code can be downloaded from https://github.com/ua-i2cat/ImAc.

4.2.1 Access services

The VR360 player provides support for different presentation modes and personalisation features for each of the previously discussed access services. These presentation modes and features have been preliminarily selected based on the conducted user-centric activities and insights from related works, so the player can serve as a prototype in future subjective evaluations to decide on their final adoption and/or refinement. All these implemented presentation modes and features are targeted at providing efficient solutions to the challenges and limitations identified in Section 3.

Subtitling

Three main presentation modes for subtitles have been considered and developed. The first two are the ones previously proposed and tested by the research community, reviewed in Section 3: always-visible subtitles (see Fig. 5) and subtitles evenly spaced every 120°. Tests conducted in [46, 47] have shown that always-visible subtitles are clearly preferred, mainly because they were easier to find and to read, less distracting, and gave users a higher perceived freedom to explore the 360° environment without missing the subtitles. This confirms the preliminary insights from [35], while using different types and longer VR360 clips, and without the annoying blocking effects reported in [35].

Fig. 5 Always-visible subtitles at different positions (left figure: top; right figure: bottom) and with different visual indicators (left figure: arrows; right figure: radar)

In these two presentation modes, different visual indicators to guide users towards the target speaker(s) have been considered (see Fig. 5): arrows and a radar. In the case of the radar, the user’s FoV is also indicated, and the position of the speaker is marked in the same colour as the subtitles for better identification (see Fig. 5, right screenshot).

In addition, a third, hybrid presentation mode has been developed. It consists of attaching the subtitles to the speaker for better identification, complementing them with always-visible guiding methods (e.g. arrows) that indicate where the speaker is whenever he/she is outside the user’s FoV. As long as the speaker is within the user’s FoV, the visual indicator is automatically hidden. This presentation mode is outlined in Fig. 6.

Fig. 6 Subtitles attached to the speaker with always-visible visual indicators

Regardless of the presentation mode and guiding method, the presentation of subtitles can also be dynamically personalised via the player menu in terms of: size (three size levels); background (outlined text or a semi-transparent background box); position (top or bottom); and language.
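
For illustration, the sketch below shows how the always-visible and attached-to-the-speaker placements could be realised in a three.js scene graph; the geometry, distances and the use of a textured plane as a stand-in for the rendered subtitle are assumptions, not the actual ImAc player code.

```typescript
// Hedged sketch of two subtitle placement strategies in a three.js scene.
import * as THREE from 'three';

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(100, 1, 0.1, 100);
scene.add(camera); // required so that children of the camera are rendered

function makeSubtitlePlane(): THREE.Mesh {
  const geometry = new THREE.PlaneGeometry(1.6, 0.25);
  const material = new THREE.MeshBasicMaterial({ color: 0x000000, transparent: true, opacity: 0.6 });
  return new THREE.Mesh(geometry, material);
}

// 'Always-visible' mode: parent the subtitle to the camera so it follows head movement.
function addAlwaysVisibleSubtitle(atBottom = true): THREE.Mesh {
  const sub = makeSubtitlePlane();
  sub.position.set(0, atBottom ? -0.8 : 0.8, -2); // 2 m in front, above or below the eye line
  camera.add(sub);
  return sub;
}

// 'Attached to speaker' mode: place the subtitle in the scene at the speaker's yaw angle.
function addSpeakerAttachedSubtitle(speakerYawDeg: number, radius = 3): THREE.Mesh {
  const sub = makeSubtitlePlane();
  const yaw = THREE.MathUtils.degToRad(speakerYawDeg);
  sub.position.set(radius * Math.sin(yaw), 0, -radius * Math.cos(yaw));
  sub.lookAt(0, 0, 0); // face the viewer at the sphere centre
  scene.add(sub);
  return sub;
}

// Example usage: a bottom always-visible subtitle plus one anchored at a speaker at 150°.
addAlwaysVisibleSubtitle(true);
addSpeakerAttachedSubtitle(150);
```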

Likewise, the presentation of easy-to-read subtitles is supported. Results from conducted tests have preliminarily shown that easy-to-read subtitles are preferred over traditional subtitles by elderly participants when watching VR360 clips of an opera performance [48]. That recent study has also confirmed the positive attitude of the elderly towards the consumption of VR360 content for enjoying culture and arts.

Sign language

Sign language interpreting is provided via an overlaid PiP video window positioned at the bottom right of the FoV, as in typical 2D services. Following the same approach as for subtitles, the sign language window is kept always visible regardless of where the user is looking. Based on previous findings in related works (e.g. [31]), dynamic personalisation features are supported as well, such as three size levels and two positions (bottom right and bottom left). Two visual indicators (arrows and radar) can also be enabled, as for subtitles (see Fig. 7).
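
As an illustration (not the actual ImAc code), such an always-visible sign language window could be implemented as a video-textured plane parented to the camera, as sketched below; the size levels and corner offsets are assumed values.

```typescript
// Hedged sketch of an always-visible sign language PiP window in three.js.
import * as THREE from 'three';

type PipSize = 'small' | 'medium' | 'large';
type PipPosition = 'bottom-right' | 'bottom-left';

const PIP_SCALE: Record<PipSize, number> = { small: 0.6, medium: 0.8, large: 1.0 };

function addSignLanguageWindow(
  camera: THREE.PerspectiveCamera,
  video: HTMLVideoElement,
  size: PipSize = 'medium',
  position: PipPosition = 'bottom-right'
): THREE.Mesh {
  const texture = new THREE.VideoTexture(video); // updates every frame from the video element
  const scale = PIP_SCALE[size];
  const plane = new THREE.Mesh(
    new THREE.PlaneGeometry(0.9 * scale, 0.7 * scale),
    new THREE.MeshBasicMaterial({ map: texture })
  );
  const x = position === 'bottom-right' ? 0.8 : -0.8;
  plane.position.set(x, -0.6, -2); // lower corner of the FoV, 2 m in front of the viewer
  camera.add(plane); // parented to the camera, so it stays visible while looking around
  return plane;
}
```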

Fig. 7 Presentation of a sign language video together with a visual indicator and text-based information for better speaker identification

In order to provide better identification of the target speaker, his/her name (or even a descriptive info text) can also be added below the video window (see Fig. 7).

Finally, unlike other existing VR360 players, the developed player supports the simultaneous presentation of both subtitles and sign language (see Fig. 8). This feature was suggested in a focus group with German users.

Fig. 8 Simultaneous presentation of sign language and subtitles

Audio description

For audio description, the player leverages the potential of spatial audio formats (ambisonics) to provide support for different presentation or audio placement modes, which in turn can be complemented with different narrative or scripting modes. This can contribute to both better accessibility and better immersion, as users can better interpret the 360° space and story via auditory cues. Based on the insights from conducted focus groups [41], the three implemented audio presentation modes are:

  1. Classic mode: no positioning

  2. Static mode: from a fixed point in the scene

  3. Dynamic mode: coming from the direction of the action

Likewise, the player provides support for extended audio description tracks (for specific scenes, actions or objects), which can be optionally activated to get extra information (e.g. via clicks or voice commands). Independent volume settings for the audio description and main audio tracks are supported as well.
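
The following is a minimal Web Audio API sketch of the three placement modes listed above; the real player relies on ambisonics rendering, so this PannerNode-based version, its function name and its coordinate values are only illustrative.

```typescript
// Illustrative sketch of the classic/static/dynamic audio description placement modes.
const ctx = new AudioContext();

function playDescription(
  buffer: AudioBuffer,
  mode: 'classic' | 'static' | 'dynamic',
  getActionDirection?: () => { x: number; y: number; z: number }
): AudioBufferSourceNode {
  const source = ctx.createBufferSource();
  source.buffer = buffer;

  if (mode === 'classic') {
    source.connect(ctx.destination); // no positioning
  } else {
    const panner = ctx.createPanner();
    panner.panningModel = 'HRTF';
    if (mode === 'static') {
      panner.positionX.value = 2; // fixed point in the scene, e.g. slightly to the right
    } else if (getActionDirection) {
      // Dynamic mode: keep updating the source position to follow the action.
      const update = () => {
        const dir = getActionDirection();
        panner.positionX.value = dir.x;
        panner.positionY.value = dir.y;
        panner.positionZ.value = dir.z;
        requestAnimationFrame(update);
      };
      update();
    }
    source.connect(panner).connect(ctx.destination);
  }
  source.start();
  return source;
}
```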

Audio subtitling

As with audio description, the player supports different spatial presentation modes for audio subtitles, using either human or synthetic (e.g. text-to-speech) voices. Likewise, audio subtitling can be presented simultaneously with the other access services, like subtitling and audio description, and independent volume levels can be set.
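
As an illustration of the synthetic-voice option, the sketch below uses the browser's Web Speech API; the cue structure and the independent volume value are assumptions rather than the ImAc implementation (which also supports pre-recorded human voices).

```typescript
// Illustrative sketch of text-to-speech audio subtitles with an independent volume level.
interface SubtitleCue {
  text: string;
  lang: string; // e.g. 'en-GB', 'ca-ES'
}

let audioSubtitleVolume = 0.8; // set independently from the main audio track

function speakSubtitle(cue: SubtitleCue): void {
  const utterance = new SpeechSynthesisUtterance(cue.text);
  utterance.lang = cue.lang;
  utterance.volume = audioSubtitleVolume; // 0.0 to 1.0
  speechSynthesis.cancel(); // drop any cue still being spoken to stay in sync
  speechSynthesis.speak(utterance);
}

// Example: triggered when a subtitle cue becomes active in the player timeline.
speakSubtitle({ text: 'Welcome to the opera house.', lang: 'en-GB' });
```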

Audio subtitles are a less developed access service, but have been proven to provide benefits [42], so their support can also be considered an outstanding feature of the developed VR360 player.

4.2.2 Interaction modalities

The portal and player enable different interaction modalities, which can be seamlessly activated depending on the consumption device being used. Examples include the use of the mouse or keyboard on PCs/laptops, the touchscreen or gyroscope on smartphones/tablets and head movement sensors and VR controllers on HMDs.

Additionally, voice control is supported, including both voice recognition for the execution of commands and spoken feedback on their execution. For this purpose, a modular and extensible solution has been developed to connect the portal and player with external voice controllers, like Amazon Echo (Alexa) or Google Home. This makes voice control available even on devices that do not include any audio input/output connections.
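
A minimal sketch of such a modular command dispatcher is shown below; the command names, the PlayerControls interface and the use of the browser's speech synthesis for spoken feedback are illustrative assumptions, and an external controller (e.g. an Alexa skill) would feed recognised utterances into handleUtterance in place of a local recogniser.

```typescript
// Illustrative voice-command dispatcher with spoken feedback.
interface PlayerControls {
  play(): void;
  pause(): void;
  toggleService(service: 'subtitles' | 'signLanguage' | 'audioDescription'): void;
}

type CommandHandler = (player: PlayerControls) => string; // returns the spoken feedback

const commands = new Map<string, CommandHandler>([
  ['play', p => { p.play(); return 'Playing'; }],
  ['pause', p => { p.pause(); return 'Paused'; }],
  ['subtitles', p => { p.toggleService('subtitles'); return 'Subtitles toggled'; }],
  ['audio description', p => { p.toggleService('audioDescription'); return 'Audio description toggled'; }],
]);

function handleUtterance(utterance: string, player: PlayerControls): void {
  const handler = commands.get(utterance.trim().toLowerCase());
  const feedback = handler ? handler(player) : 'Command not recognised';
  // Spoken feedback via the browser's speech synthesis, if available.
  if (typeof speechSynthesis !== 'undefined') {
    speechSynthesis.speak(new SpeechSynthesisUtterance(feedback));
  }
}

// Example with a stub player implementation.
const stub: PlayerControls = { play() {}, pause() {}, toggleService() {} };
handleUtterance('audio description', stub);
```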

4.2.3 User interface

The design of responsive, attractive, intuitive and accessible UIs becomes challenging, especially when many options and personalisation features need to be provided, and when using VR environments involving 3D elements and VR screens with limited FoV (e.g. HMDs).

On the one hand, a web interface for the portal has been designed. It includes the list of available VR360 videos and information about them, such as their title, cover, duration, the available access services and the available languages. Users can select the desired language for the UIs, as well as some initial settings that can be changed later during media consumption. Figure 9 provides a screen capture of the initial screen of the portal and of its General Settings menu. Not all the screens for the available settings in the portal are shown, but they can be checked via the demo video and the player URL provided at the beginning of this section.

Fig. 9 Examples of the UI of the ImAc portal

As can be observed in Figs. 9 and 10, text-based icons for the access services have been adopted. These icons have been proposed by DR (the Danish broadcaster) (Footnote 12), and the European Broadcasting Union (EBU) is supporting their standardised usage.

Fig. 10 Accessibility icons adopted in the ImAc portal and player

The portal UI and its different screens have been proven to be WCAG-compliant [23], thus following the W3C guidelines for developing accessible web interfaces.

On the other hand, a menu for the player has been designed. It is automatically adapted to the device being used, and to the available FoV and the 3D environment in which the VR360 content is presented. The menu includes the typical playout control commands and the icons to activate/deactivate the available access services (see Fig. 11). It also includes a settings control through which settings for the player and for the access services can be dynamically selected and (de-)activated.

Fig. 11 Designed player UI with the Settings menu, showing how visual feedback on the execution of commands and on the selected options is provided

The UI provides visual feedback on the execution of commands and on the selected/activated options (see Fig. 11). For a better comprehension of the controls, hover features are provided for the access service icon controls (showing their most commonly known acronyms, like ST, SL, AST and AD, in English). The hover feature has also been enabled for the control at the top left of the menu, which is targeted at activating an enhanced-accessibility variant of the menu (see Fig. 12). This menu variant has an enlarged size, high contrast and a bigger pointer (whose size can also be personalised) for interaction with the visual elements. The available controls, menus and sub-menus are the same in both menu variants. As the menu occupies a significant part of the FoV, a preview feature has been added to let the user instantaneously check how the current settings for the visual presentation modes look, without having to close the menu and open it again. This feature can be accessed by clicking on the eye icon, available in the main menu, but also in the sub-menus of the enhanced-accessibility menu to which this feature applies.

Fig. 12 Enhanced-accessibility variant of the player menu/UI

Finally, the player will be used as a proof of concept to investigate another unexplored, but relevant, UI aspect in the VR arena: determining the most appropriate size and layout of the safe area (i.e. the region of the screen/FoV within which visual elements are presented). This is key for providing comfortable and immersive viewing experiences.
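As a simple starting point for such an investigation, a centred safe area could be derived as a fraction of the device FoV, as sketched below; the 70%/60% defaults are placeholder assumptions that would need to be tuned through the planned user testing.

```typescript
// Sketch: compute a centred safe area as a fraction of the device FoV.
// The default fractions are placeholder assumptions, not validated values.
interface FieldOfView { horizontalDeg: number; verticalDeg: number; }
interface SafeArea { widthDeg: number; heightDeg: number; offsetXDeg: number; offsetYDeg: number; }

function computeSafeArea(
  fov: FieldOfView,
  horizontalFraction = 0.7,
  verticalFraction = 0.6,
): SafeArea {
  const widthDeg = fov.horizontalDeg * horizontalFraction;
  const heightDeg = fov.verticalDeg * verticalFraction;
  return {
    widthDeg,
    heightDeg,
    offsetXDeg: (fov.horizontalDeg - widthDeg) / 2, // centred horizontally
    offsetYDeg: (fov.verticalDeg - heightDeg) / 2,  // centred vertically
  };
}

// Example: for an HMD with a roughly 90° x 90° FoV.
const hmdSafeArea = computeSafeArea({ horizontalDeg: 90, verticalDeg: 90 });
```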

4.3 Examples of accessibility-enabled VR360 cultural content

Apart from developing the necessary technological components, key goals of ImAc consist of creating appropriate VR360 and access service content, and of planning pilot actions to determine the benefits of such contributions. Although presenting results from the conducted objective and subjective evaluations is out of the scope of this paper, Table 2 lists some examples of cultural VR360 clips that have been either newly created or enriched with an accessibility layer within the scope of the project, including a brief description of each clip, the available access services and the available languages.

Table 2 Examples of created/adapted accessible VR360 content

This list of accessible VR360 videos is continuously being extended, via the production/adaptation of new VR360 content by both project members and third-party entities, using the developed end-to-end tools.

Readers are referred to the VR360 clip of the opera performance for a fully accessible VR360 experience, including all the described access services, presentation modes and personalisation features. The other available videos include access services and presentation modes targeted at specific audiences, e.g. to evaluate their appropriateness or the users' preferences. However, their extension towards full accessibility is planned.

What is more, the next steps consist of not only creating more culture-, arts- and education-related VR360 content but also allowing third-party entities to make their VR360 content fully accessible by using the developed end-to-end tools. This evidences the real impact of the contributions of this work within the culture and heritage sectors.

5 Discussion and future work

The use of VR technologies and, in particular, of VR360 content can provide great benefits to the culture and heritage sectors. However, as in any (multimedia) service, accessibility is an essential requirement for contributing to global e-inclusion, guaranteeing universal access to information and effective content comprehension by everyone. The specification, development and evaluation of accessibility solutions for novel technologies, like VR, are more challenging than for traditional technologies, because no comparable accessibility solutions exist yet and consumers are not yet used to these novel technologies. However, considering accessibility from the early stages of development of novel technologies contributes to more effective solutions and to a wider and earlier adoption, without leaving any citizen behind. Likewise, accessibility in media services must address the needs not only of users with specific sensorial disabilities, such as hearing or vision impairments, but also of users with cognitive or learning difficulties, newcomers, people with low literacy and older people.

This article has reviewed the opportunities and benefits that VR360 content can provide within the fields of culture, arts and heritage. Then, it has discussed the relevance of these sectors and, especially, the necessity of addressing the needs of all citizens in them, in terms of ethical concerns, fairness and equal access, as well as in terms of economic impact. Different international regulatory frameworks to support and monitor the accessibility of a variety of services have been reviewed as well.

Given this context, the services and interaction modalities necessary to enable accessible VR360 experiences have been identified, highlighting the existing challenges and gaps in state-of-the-art solutions. As a response to these limitations, an end-to-end platform to efficiently integrate accessibility services within VR360 services has been designed. Rather than addressing the identified limitations only at the consumption side, the whole end-to-end media chain has been considered, in order to provide full-fledged solutions that can be used by all involved agents (i.e. content creators, accessibility experts, content providers and consumers). The key components of the end-to-end platform have been briefly presented, paying special attention to the accessibility-enabled VR360 player, as it is the interface for user interaction and thus the most critical piece for accessibility. All presentation modes and personalisation features supported by the player have been described; all of them were determined via a thorough analysis of the state of the art and via user-centric activities.

It is important to emphasise that the technological contributions have been developed with the premises of adhering to current practices and existing resources, and of keeping backward compatibility with existing (standard) technologies and formats in the media sector. This maximises re-usability, interoperability and the chances of successful deployment and exploitation. Indeed, many of the newly developed solutions are being proposed for standardisation in different international bodies, like the W3C, the Moving Picture Experts Group (MPEG) and the International Organization for Standardization (ISO).

The available VR360 and access service content has been listed. This includes not only newly created clips but also already existing VR360 pieces created by third-party entities that have been enriched with access service content. This reinforces the potential impact of this work, giving the chance to adapt any VR360 content for full accessibility and hyper-personalisation. Furthermore, given the standard compliance of the developed solutions, third-party content providers can also integrate accessible VR360 content in their service offering, either by augmenting traditional TV services with a synchronised presentation of such content on companion screens, thanks to the features enabled by the emerging HbbTV standard, or by embedding the developed player in their own websites.

Therefore, the presented contributions serve as a demonstrator of how a novel medium with high relevance, such as VR360 content, can be made accessible. This can contribute to overcoming the accessibility limitations and needs discussed previously, which are highly relevant not only in the culture, arts and heritage sectors but also in others, like education and entertainment.

This research is not yet closed; several activities are planned for the future. As short-term efforts, it is planned to extend the corpus of accessible VR360 content and to conduct objective and subjective evaluations. In addition, dissemination initiatives will be launched to contribute to the adoption of the developed solutions by third-party entities, like content providers. Apart from these tasks, two related research lines are also of interest. First, this work has focused on VR360 experiences, which have been proven to be very relevant in the discussed sectors. However, exploring accessibility in 3D VR experiences with 6DoF, and even in XR scenarios, also becomes necessary and poses new research challenges. Second, recent studies have revealed improvements in the perceived quality of experience (QoE) when adding multi-sensory stimuli, like haptic feedback and smells, to traditional media formats. Given these promising results, adding such stimuli to immersive media formats and determining their impact on accessibility are two very relevant research topics worth exploring in the future.