
1 Introduction

Eye-tracking is a technique used to determine the gaze direction of the eyes and the sequence in which the eyes move over a given period of time. It is very useful for researchers who try to understand eye-movements while a person is involved in different activities. Over the years, several techniques have been developed for measuring eye-movements. These usually involve a device commonly known as an eye-tracker, which generally consists of a camera and associated electronic components that produce data representing gaze positions (or “gaze data”). Gaze data can be used to determine where the individual is looking, described in terms of fixations (a steady gaze held for a predetermined period of time) and saccades (movements of the gaze between two points).
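As an illustration of how raw gaze samples can be split into fixations and saccades, the following minimal Python sketch applies a velocity-threshold rule (commonly called I-VT). It is not necessarily the algorithm used by commercial eye-tracking software. The 60 Hz sampling rate, the 30°/s threshold and the names `classify_ivt` and `Event` are illustrative assumptions, and gaze coordinates are assumed to be already expressed in degrees of visual angle.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Event:
    kind: str   # "fixation" or "saccade"
    start: int  # index of the first sample in the event
    end: int    # index of the last sample in the event (inclusive)


def classify_ivt(xs, ys, hz=60.0, threshold_deg_s=30.0):
    """Hypothetical I-VT helper: label gaze samples (degrees of visual angle)
    as fixations or saccades using a point-to-point velocity threshold."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired gaze samples")
    dt = 1.0 / hz
    # Angular velocity between consecutive samples (deg/s).
    vel = [hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]) / dt for i in range(1, len(xs))]
    vel.insert(0, vel[0])  # the first sample reuses the first velocity estimate
    labels = ["saccade" if v > threshold_deg_s else "fixation" for v in vel]
    # Merge runs of identical labels into events.
    events, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            events.append(Event(labels[start], start, i - 1))
            start = i
    return events
```

Real pipelines usually also enforce a minimum fixation duration, merge nearby fixations and discard blinks or lost samples; those steps are omitted here for brevity.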

The relation between eye-movements and internal brain processes (thinking, cognition, etc.) has been known for a long time. For this reason, eye gaze data have been used in many fields, such as neurology, ophthalmology, psychology and related areas, to study oculomotor characteristics and abnormalities and their relation to mental states and cognition. Some applications, still confined to controlled environments, exist in advertising and marketing research, as well as in human factors engineering, to evaluate web sites and computer interfaces.

This paper is concerned with the use of eye-tracking for a better understanding of a work of art. In fact, the design and implementation of digital experiences cannot be separated from the core component of the system: humans. This is true for almost every domain, and even more so for museums. Understanding visitor behaviour during the visit can provide valuable data for improving the museum exhibition [24].

This might positively affect the visitor experience and, hence, the museum's appeal. A modern approach to the enjoyment of art is based on a wide and targeted use of technologies [1]. For this reason, a growing number of museums are adopting digital tools as an integral part of the exhibition, providing users with new instruments to study art in depth [9, 17].

One of these cutting-edge technologies is Augmented Reality (AR). It allows digital contents to be visualized through a display, from the same point of view as the user, by superimposing virtual objects on the real scene. It also permits the visualization of virtual objects (e.g. 3D models, audio, text, images) without relying on artefacts (e.g. QR codes) to retrieve contents, in addition to an automatic and interactive visualization of Points of Interest (POIs).

In this study, we describe an experiment that used eye-tracking techniques to define a criterion for selecting focal points for AR applications, based on a better understanding of human behaviour. The main contribution of this paper is to provide useful information, obtained by analysing eye-movements, in order to define the baseline towards a fully User Centered Design for AR applications that forecasts the display of art contents based on masterpiece features, technological constraints and user typology.

The number of AR applications is increasing significantly, also for Digital Cultural Heritage (DCH) purposes. The reasons for this increase are manifold. First of all, AR is simple and intuitive to use. The technology is mature and, to date, several tools are available for cross-platform development. Markerless tracking systems (predominantly based on image matching algorithms) are robust and stable, even when lighting conditions are not optimal. Moreover, thanks to the diffusion of smartphones and tablets with high computational capabilities, such applications are increasingly widely adopted. Applied to artworks, markerless AR therefore retrieves virtual contents (e.g. 3D models, audio, text, images) without artefacts such as QR codes, and displays Points of Interest (POIs) automatically and interactively.

Several studies show the advantages of Mobile AR (MAR) in conveying CH contents [11]. In [18], for example, an application to enhance the experience of visitors in a real museum scenario was presented. An interactive touchscreen provides the user with a detailed, brushstroke-level visualization of the painting. The mobile applications are available in the app stores and offer museum visitors a handy tool to deepen their knowledge of paintings such as “The Ideal City”, presented in this paper. Thanks to the Augmented Reality section, users can frame the artwork and interact with superimposed contents that form a continuum between the painting and the user's point of view.

Markerless AR is well established, but the following research questions, which triggered this work, have not been completely addressed: (i) what is the best criterion for selecting contents? (ii) can we identify user characteristics in order to personalize contents according to the user's background? and (iii) is it possible to analyse the most observed areas of the painting to define attention points for AR applications? Previous work [3] delivered an optimized AR solution, finalized through a classical user test and survey. This first stage allowed us to identify the strengths and weak points of the application, leading to the final solution, already published, which was mainly based on users' suggestions. In an attempt to answer the questions above, and building on the results of our previous research, we used eye-tracking to define a criterion for selecting focal points for AR applications. The preliminary results show that the proposed methodology for defining AR contents, displayed as POIs in artwork applications, is promising.

The paper is organized as follows: Sect. 2 gives an overview of existing approaches in the context of eye-tracking; Sect. 3 details the proposed method, which is the core of our work; Sect. 4 describes the results; Sect. 5 discusses the potential of this system; and Sect. 6 presents conclusions and our future work in this direction.

2 Related Works

Eye-movements express the relation between what is observed and its relevance to the viewer's interest, and they are an important index of selection. It is well known that seeing is not simply the reproduction of external stimuli in our brain, but the result of sensory input combined with the individual's own internal representation [14, 20, 23]. Several factors contribute to visual resolution: the density of photoreceptors in the retina, the nature of the neural connections between them, and the contrast of the stimulus. Eye-trackers make it possible to record eye-movements, and recent advancements have made them smaller, less expensive and less invasive. In [5], eye-trackers are used for human-computer interaction; they can be mounted directly on the user's head or positioned in the work environment. In such applications, eye-trackers are used to select objects on the screen or in a virtual environment, and to provide information about the user's attention.

In relation to art, the contribution of eye-tracking is quite recent.

For example, in [10] the authors describe statistical regularities in artworks that appear related to low-level processing strategies of human vision, especially those used for processing natural scenes and their statistical variations.

In [20], the authors describe the fixation patterns of subjects looking at figurative and abstract paintings by different artists, as well as at versions of these paintings in which several aspects had been altered through digital manipulation. They found that the way people look at art depends on their interests and their artistic appreciation.

Eye-tracking is also used in [16] to investigate the influence of bottom-up and top-down processes on the visual behaviour of subjects observing representational paintings.

Analysing the viewer's exploratory pattern and selecting salient visual aspects can help optimize AR applications.

AR was recognised as an emerging technology [12], and the diffusion of smartphones and tablets with high computational capabilities has made a more widespread adoption of such applications easier. Recently, the evaluation of AR applications has received growing attention [6]. One of the earliest AR software libraries is ARToolKit [13], which overlays contents using video tracking and a virtual camera sharing the observer's point of view. Examples of mobile AR applications are ARAC Maps [8] (for archaeological content) and AR-TagBrowse [7] (for tagging and browsing virtual 3D objects).

A new concept of Adaptive Augmented Reality (\(A^{2}R\)), employed to create an AR guide for museum visits, is proposed in [4]. The main goal is to provide an augmented, unique museum visiting experience and to monitor the physiological reactions of the museum visitor. It is an important tool for better understanding the impact of learning environments, such as museums and galleries, on human cognition.

3 Methods

3.1 Participants

Forty Italian students and employees at Università Politecnica delle Marche took part in the study. All participants were adults aged between 18 and 56 (\(\text{mean age} = 27.9\), \(\text{S.D.} = 7.8\)), equally distributed across gender, and had normal vision. An additional group of 12 students was recruited for the pre-test.

3.2 Procedure

The survey consisted of two steps. In the first step, the visual behaviour of subjects was analysed using eye-tracking; in the second step, respondents were asked to complete a questionnaire about their interest in, and attitudes towards, art.

Eye-Tracking. A Tobii X2-60 eye-tracker and the iMotions® Attention Tool software (vers. 5.7) were used to record the eye-tracking data. All measurements were performed in a quiet room under standard illumination conditions. Each participant was seated 60 cm from the eye-tracker and monitor. The digital versions of the paintings were shown on a \(23^{\prime \prime }\) monitor at a resolution of \(1920 \times 1080\) pixels, preserving the original aspect ratio. Participants were informed that their eye-movements were being recorded. The analysis is based on two eye-movement indicators: fixations and saccades. Fixations are pauses of the eye over a detail of interest, lasting about 300 ms on average; saccades are rapid eye-movements between fixations. Each trial started with a 9-point calibration pattern. After calibration of the eye-tracker, the study started with an experimental task (Task 1). The first slide contained the instructions for Task 1, and participants were free to read them and proceed to the next slide. Three famous paintings, all preserved at the National Gallery of the Marche (Urbino, Marche Region, Italy), were shown in randomised order: “The Ideal City” (unknown author, 1480–1490), the “Portrait of a Gentlewoman” (Raffaello, 1507) and the “Flagellation” (Piero della Francesca, 1463–1464) (Fig. 1).
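Since fixation and saccade metrics are often expressed in degrees of visual angle, it can help to relate the setup above (a 23-inch display at 1920 × 1080 pixels viewed from 60 cm) to angular units. The sketch below assumes square pixels, a panel whose physical aspect ratio matches its 16:9 resolution, and a viewing distance measured to the screen centre; these assumptions are ours, and the helper names are hypothetical.

```python
import math


def pixel_pitch_cm(diagonal_in, width_px, height_px):
    """Physical size of one pixel (cm), assuming square pixels."""
    diag_px = math.hypot(width_px, height_px)
    return diagonal_in * 2.54 / diag_px


def px_to_deg(size_px, pitch_cm, distance_cm):
    """Visual angle (degrees) subtended by size_px pixels centred on the line of sight."""
    size_cm = size_px * pitch_cm
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


# Setup reported in the text: 23-inch monitor, 1920 x 1080 pixels, 60 cm viewing distance.
pitch = pixel_pitch_cm(23, 1920, 1080)
print(f"pixel pitch: {pitch * 10:.3f} mm")
print(f"full screen width: {px_to_deg(1920, pitch, 60):.1f} deg")
print(f"pixels per degree near the centre: {1 / px_to_deg(1, pitch, 60):.1f}")
```

Near the screen centre the relation is almost linear, which is why a single pixels-per-degree figure is often quoted; towards the edges the exact arctangent form above is more accurate.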

Fig. 1. Paintings used for the test.

During the exposure, the proportions of each painting were preserved. Participants were asked to look freely at the pictures presented. Since view durations reported in previous literature vary considerably, the exposure time of the stimuli was determined in a pre-test. Twelve students at Università Politecnica delle Marche, with no particular knowledge of art, were asked to observe a faithful reproduction of “The Ideal City” as if they were at the museum. For this test we used the Eye-Glasses mobile eye-tracker. The average observation time recorded was 64 s. Based on the pre-test results, each painting was shown to the participants of the main study for a maximum of 1 min. The pre-test was also useful to compare the outcomes obtained with the digital image and with the real-size artwork. Although looking at the real painting would better simulate real conditions, we noticed the following issue: due to the testers' head movements and the low resolution of the glasses, we could not achieve sufficient precision to properly identify the areas of interest. Consequently, the test was carried out using the digital versions of the artworks. After the sequence of three pictures, respondents were asked to look again at “The Ideal City” (Task 2). This time, six details of the painting were framed in red, as shown in Fig. 2.

Fig. 2. “The Ideal City” painting (Task 2) (Color figure online)

The six framed details were defined according to the existing AR application for this painting. They include details, mostly architectural, that experts considered relevant in this painting: the doves, the vanishing point, the capitals, the landscape in the background, the floor, and the geometry that characterizes the whole painting.

Survey. An online survey followed the previous tasks. Participants were asked to focus again on “The Ideal City” and to rank the six details from the most to the least interesting. Then, a set of twenty items was used to measure participants' interest in art (Table 1). Subjects were asked to express their level of agreement on a 5-point Likert scale (from “Strongly Disagree” to “Strongly Agree”). This interest-in-art scale included items adapted from three existing scales (by [2, 15]). Six questions were added to assess participants' art experience [2]. Socio-demographic questions (gender, age, level of education) concluded the survey.
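Reverse-scored items (marked R in Table 1) must be flipped before item scores are aggregated. A minimal sketch, assuming the scale score is the mean of the item scores (the text does not state how items were aggregated) and using hypothetical item identifiers:

```python
from statistics import mean

LIKERT_MIN, LIKERT_MAX = 1, 5  # 5-point scale: Strongly Disagree .. Strongly Agree


def score_item(response, reversed_item):
    """Return the item score, flipping it (1<->5, 2<->4) for reverse-scored items."""
    if not LIKERT_MIN <= response <= LIKERT_MAX:
        raise ValueError(f"response out of range: {response}")
    return LIKERT_MIN + LIKERT_MAX - response if reversed_item else response


def interest_in_art(responses, reversed_items):
    """Mean scale score for one participant.

    responses: dict mapping (hypothetical) item ids to answers in 1..5
    reversed_items: set of item ids marked R
    """
    return mean(score_item(r, item in reversed_items) for item, r in responses.items())


# Hypothetical participant answering four of the twenty items; "q3" is reverse-scored.
answers = {"q1": 5, "q2": 4, "q3": 1, "q4": 2}
print(interest_in_art(answers, reversed_items={"q3"}))
```

Flipping with `LIKERT_MIN + LIKERT_MAX - response` keeps the helper valid if a scale with a different range is used.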

Table 1. Interest-in-art scale. R means “reversed scale”

4 Results

The collected data were extracted using the iMotions® Attention Tool software and analysed using STATA vers. 13. iMotions® provides different metrics for each Area Of Interest (AOI), including TTFF-F and Time spent-F. TTFF-F is the time to first fixation, i.e. it identifies which areas of interest the participants looked at first; Time spent-F is the time spent in a specific AOI. In general, a short TTFF-F indicates that the participant's fixation on that AOI started immediately after the image appeared on the screen, while a long TTFF-F indicates that the fixation started late or not at all. When no fixation occurs, TTFF-F is equal to the entire exposure time of the image.
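The two AOI metrics can be computed from a list of detected fixations as sketched below. This is an illustration under our own assumptions, not the iMotions implementation: AOIs are taken to be axis-aligned rectangles in screen pixels, Time spent-F is taken to be the summed duration of fixations inside the AOI, and the class and function names are hypothetical. The fallback of TTFF-F to the full exposure time follows the convention stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    start: float     # seconds from stimulus onset
    duration: float  # seconds
    x: float         # screen pixels
    y: float


@dataclass(frozen=True)
class AOI:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, f):
        return self.left <= f.x <= self.right and self.top <= f.y <= self.bottom


def aoi_metrics(fixations, aois, exposure_s):
    """Return {aoi name: (ttff, time_spent, n_fixations)} for one participant.

    TTFF falls back to the full exposure time when the AOI is never fixated.
    """
    out = {}
    for aoi in aois:
        hits = [f for f in fixations if aoi.contains(f)]
        ttff = min((f.start for f in hits), default=exposure_s)
        spent = sum(f.duration for f in hits)
        out[aoi.name] = (ttff, spent, len(hits))
    return out


# Hypothetical data: two fixations in a central AOI, none in a corner AOI.
fixes = [Fixation(0.5, 0.3, 960, 540), Fixation(1.0, 0.2, 100, 100), Fixation(2.0, 0.4, 970, 530)]
areas = [AOI("centre", 860, 440, 1060, 640), AOI("corner", 0, 0, 50, 50)]
print(aoi_metrics(fixes, areas, exposure_s=60.0))
```

Group-level values such as those reported below would then be aggregates (e.g. means) of these per-participant metrics.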

Fig. 3. Heat map of “The Ideal City” during free screening (Color figure online)

Figure 3 shows the heat map for all 40 participants when they were asked to observe the painting as if they were at the museum (Task 1). The red colour in the central part of the painting shows that this is the most attractive area. Participants' fixations formed a “vertical line” in the middle of the painting, where the Baptistery is located. This vertical line divides the painting into two equal parts. The heat map shows that, during free screening, the number of fixations in the central area (1398 fixations) was larger than in the rest of the painting. The TTFF-F in this area is 0.5 s, meaning that participants looked at the middle as soon as the picture appeared on the screen, and kept looking at it for quite a long time: the Time spent-F (11.00 s) was higher than in other areas. The TTFF-F values also indicate that, once participants had looked at the central part of the painting, they moved their attention to the nearest areas. The subjects' saccade pattern is indicated in Fig. 3 by the sequence of numbers from 1 to 7. The number of fixations and the Time spent-F decrease from the centre towards the outer areas of the painting, while the TTFF-F increases. After the central area, subjects were attracted by zone 2 (TTFF-F = 6.5 s, Time spent-F = 6.1 s, 763 fixations) and then by zone 3 (TTFF-F = 7.7 s, Time spent-F = 7.2 s, 846 fixations), both very close to the dominant area. The outer areas, zones 4, 5, 6 and 7, are characterized by a small number of fixations (zone 4: 63, zone 5: 92, zone 6: 44, zone 7: 53). To deepen the investigation, we related the AOI analysis of the free screening (Task 1) to the results of Task 2, in which participants observed the painting with the six framed details of the AR application.
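A heat map such as Fig. 3 is essentially a smoothed density of fixations. The sketch below builds one on a coarse grid by adding a Gaussian kernel weighted by fixation duration. The paper does not describe how its heat maps were rendered, so the kernel, the 40-pixel cell size, the 50-pixel sigma and the function name are arbitrary assumptions chosen for illustration.

```python
import math


def fixation_heatmap(fixations, width, height, cell=40, sigma=50.0):
    """Duration-weighted Gaussian heat map on a coarse grid (hypothetical helper).

    fixations: iterable of (x, y, duration_s) in screen pixels.
    Returns a list of rows (top to bottom), normalised so the maximum is 1.0.
    """
    cols, rows = math.ceil(width / cell), math.ceil(height / cell)
    grid = [[0.0] * cols for _ in range(rows)]
    for fx, fy, dur in fixations:
        for r in range(rows):
            cy = (r + 0.5) * cell  # y of the cell centre
            for c in range(cols):
                cx = (c + 0.5) * cell  # x of the cell centre
                d2 = (cx - fx) ** 2 + (cy - fy) ** 2
                grid[r][c] += dur * math.exp(-d2 / (2 * sigma ** 2))
    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[v / peak for v in row] for row in grid]
    return grid


# Hypothetical fixations: a long one at the screen centre, a short one near a corner.
hm = fixation_heatmap([(960, 540, 1.0), (100, 100, 0.2)], 1920, 1080)
```

Weighting by duration makes the map reflect dwell time rather than fixation counts; dropping the `dur` factor gives a count-based map instead.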
Table 2 reports the relevant data for each AOI in Task 1 and Task 2.

Table 2. Metrics of the six AOIs for Task 1 and Task 2

The results in Table 2 show that AOI 2 collected the highest number of fixations in both tasks, and participants also spent most of their time in this AOI. The number of fixations for AOI 2 is significantly higher than for AOIs 4, 5 and 6 (\(p < 0.001\)). Results for all the other AOIs differed between the two tasks. During the free screening phase (Task 1), participants essentially focused their attention on the central areas of the painting, where they spent most of their time. During Task 2, when instructed to look at the six red rectangles (Fig. 4), participants made more fixations, and these were more evenly distributed among the AOIs.

Fig. 4. Heat map of “The Ideal City” with 6 areas of interest (Color figure online)

The results indicate that participants' attention was higher during Task 2: both the number of fixations and the time spent increased for each AOI. The number of fixations in Task 2 is significantly higher than in Task 1 for all AOIs (AOI 1: \(p<0.05\); AOIs 2, 3, 4, 5 and 6: \(p<0.001\)). Participants also spent more time inside each AOI; the increase in time spent is significant for AOI 2 (\(p<0.05\)) and for AOIs 4, 5 and 6 (\(p<0.001\)). Participants' visual route changed when they were asked to follow the instructions (Task 2). In other words, during free vision of the painting (Task 1) the heat map shows a wider, more open exploration of the painting, whereas when the view included the six framed details (Task 2), participants explored only the six framed zones. Comparing the two heat maps, the vertical line in the central part observed during free vision disappears in Task 2, and the time spent on the painting is more evenly distributed across the six framed details. The presence of the vertical line suggests that, for the respondents, there are other attractive areas that should be included in the AR application. Two such areas generated a high number of fixations: the window above the central door of the Baptistery (337 fixations) and the cross of the Baptistery (136 fixations). Notably, the area around the cross of the Baptistery, although not looked at immediately (TTFF-F = 28.8 s), attracted considerable attention, as shown by its relatively high number of fixations. The temporal order of the AOIs (TTFF) also differed between the two tasks. On the other hand, the marginal zones were less interesting for respondents, who essentially focused on the details along the central vertical line and on the areas close to it. This analysis reduces the areas of interest to five zones, all around the Baptistery: its central door, window and cross, plus AOI 2 and AOI 3 (see Fig. 3 for AOI codes).
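The text reports per-AOI p-values for Task 1 versus Task 2 but does not name the test that produced them (the analysis was run in STATA). Because every participant performed both tasks, the comparison is paired. As an illustration only, and not necessarily the authors' test, the sketch below implements a two-sided paired sign-flip permutation test on the per-participant differences (for example, the number of fixations in one AOI). It enumerates every sign pattern for small samples and falls back to Monte Carlo resampling otherwise; the function name is hypothetical.

```python
import itertools
import random


def paired_sign_flip_test(task1, task2, n_resamples=10000, seed=0):
    """Two-sided paired permutation (sign-flip) test on the mean difference.

    Returns (mean_difference, p_value). Exact for n <= 12, Monte Carlo otherwise.
    """
    if len(task1) != len(task2) or not task1:
        raise ValueError("need equal-length, non-empty paired samples")
    diffs = [b - a for a, b in zip(task1, task2)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n

    def stat(signs):
        # |mean| of the differences after randomly swapping task labels per participant
        return abs(sum(s * d for s, d in zip(signs, diffs))) / n

    tol = 1e-12  # guard against floating-point ties
    if n <= 12:
        flips = list(itertools.product((1, -1), repeat=n))
        extreme = sum(stat(s) >= observed - tol for s in flips)
        p = extreme / len(flips)
    else:
        rng = random.Random(seed)
        extreme = sum(
            stat([rng.choice((1, -1)) for _ in range(n)]) >= observed - tol
            for _ in range(n_resamples)
        )
        p = (extreme + 1) / (n_resamples + 1)
    return sum(diffs) / n, p
```

With 40 participants, as in this study, the Monte Carlo branch would be used; a paired t-test or a Wilcoxon signed-rank test are common alternatives.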

5 Discussion

In this study we used eye-tracking to investigate subjects' viewing behaviour in AR applied to a single painting (“The Ideal City”). The preliminary results confirm some previous findings on art viewing behaviour investigated with eye-tracking. We found that participants' attention falls on the centre of the painting [20]. In the absence of a human figure, the Baptistery area acts as an attractor [16]. In this area, the number of fixations, both during free screening and with the six framed areas, was significantly larger than in all the other areas of the painting. As in [21], eye-movements differ between free vision and a task-driven condition: participants' visual patterns changed as a function of the task requirements, focusing on the six framed AR regions instead of following the global search strategy of free screening. Furthermore, the task influenced the temporal order of viewing.

The experimental tests depicted in Figs. 3 and 4 show that our eye-tracking system provides useful input for designing a personalised user experience in AR applications. For this reason, based on the heat map of Fig. 3, we divided the painting into six equal areas (Fig. 5).
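Mapping gaze coordinates to the six equal areas is a simple grid lookup. The text does not state how the painting was partitioned, so the sketch below assumes a grid of 3 columns by 2 rows, labelled A to F row by row from the top-left corner. Both the layout and the helper name `area_label` are hypothetical; other layouts (for example, six vertical strips) can be obtained through the `cols` and `rows` parameters.

```python
import string


def area_label(x, y, width, height, cols=3, rows=2):
    """Return the label ('A', 'B', ...) of the equal grid cell containing (x, y).

    Cells are labelled row by row from the top-left corner; returns None
    when the gaze point falls outside the painting.
    """
    if not (0 <= x < width and 0 <= y < height):
        return None
    col = int(x * cols / width)
    row = int(y * rows / height)
    return string.ascii_uppercase[row * cols + col]


# Example: the centre of a 1920 x 1080 image lies in the middle cell of the bottom row.
print(area_label(960, 540, 1920, 1080))
```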

Fig. 5. Heat maps of paths. Darker colours represent the areas with higher fixation time. (Color figure online)

In a first stage, we analysed all eye tracks, calculating a posteriori probabilities for the presence of the gaze in each area of interest and for transitions between adjacent areas (see Fig. 6).
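Once each fixation has been mapped to an area label, the a posteriori probabilities can be estimated by counting over the participants' scanpaths. The sketch below rests on our own assumptions: the presence probability is the share of fixations in each area, the transition probability is the conditional frequency of moving from one area to another (self-transitions ignored), and the area labels are hypothetical. It counts every observed move without checking that the two areas are adjacent.

```python
from collections import Counter, defaultdict


def estimate_attention_chain(scanpaths):
    """Estimate presence and transition probabilities from area-label scanpaths.

    scanpaths: list of per-participant sequences, e.g. ["A", "B", "B", "C"].
    Returns (presence, transitions) where
      presence[a]       = share of all fixations falling in area a
      transitions[a][b] = P(next area is b | gaze leaves a), from counts.
    """
    presence_counts = Counter()
    move_counts = defaultdict(Counter)
    for path in scanpaths:
        presence_counts.update(path)
        for src, dst in zip(path, path[1:]):
            if src != dst:  # only count actual moves between areas
                move_counts[src][dst] += 1
    total = sum(presence_counts.values())
    if total == 0:
        raise ValueError("no fixations to estimate from")
    presence = {a: c / total for a, c in presence_counts.items()}
    transitions = {
        src: {dst: c / sum(row.values()) for dst, c in row.items()}
        for src, row in move_counts.items()
    }
    return presence, transitions


# Hypothetical scanpaths from two participants.
presence, transitions = estimate_attention_chain([["A", "B", "B", "C"], ["A", "B", "A"]])
```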

Fig. 6. Markov chain with a posteriori probabilities modelling the attention scheme over the painting.

This representation is a model of the attention scheme that can be incorporated into AR applications to provide transition probabilities or to guide the user through a novel AR interaction scheme. Nodes represent areas of interest with their presence probability; darker colours indicate higher probabilities. Links represent passages between adjacent areas with their transition probability. For example, node B has the highest presence probability (30 %), while link A–B has the highest transition probability (80 %). These probabilities can be used to analyse movement trends in front of the painting and, when compared with the a priori expected behaviour, can provide insights for improving our application with multimedia content.
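Two simple ways of using such a model are sketched below as hypothetical helpers, not as part of the authors' application: suggesting the most probable next area from the user's current one (for example, to highlight the corresponding AR content), and quantifying the gap between observed presence probabilities and an a priori expected distribution with the total variation distance, which is our own choice of measure.

```python
def next_area(transitions, current):
    """Most probable next area from `current`, or None if no moves were observed."""
    row = transitions.get(current)
    if not row:
        return None
    return max(row, key=row.get)


def total_variation(observed, expected):
    """Half the L1 distance between two distributions over area labels.

    0.0 means identical distributions; 1.0 means they share no probability mass.
    """
    areas = set(observed) | set(expected)
    return 0.5 * sum(abs(observed.get(a, 0.0) - expected.get(a, 0.0)) for a in areas)


# Hypothetical transition row: only the 80 % on A->B comes from the text.
example = {"A": {"B": 0.8, "C": 0.2}}
print(next_area(example, "A"))
```

If the 80 % on the A–B link is read as the probability of moving to B when leaving A, a user currently looking at area A would be directed towards B.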

The proposed solution opens up new possibilities, especially for improving the design and user experience of AR applications. First, by knowing in advance how visitors look at a painting, it is possible to know where their attention is focused. In this way, AR functions can guide the user's attention through virtual contents, including towards areas of lower interest. Moreover, a visual sensing approach makes it possible to recognize predefined patterns based on the visitor's gaze. This gives museum managers a useful tool for knowing which details a visitor is interested in. The second important contribution will be the possibility of customizing contents according to the user typology. Content adapted to predefined visitor profiles can make the experience exciting and rewarding, promoting more tailored learning. By monitoring visitors' interest and engagement, the organization of multimedia exhibitions could be greatly improved. Finally, this methodology represents a reliable way to carry out a truly User-Centered Design (UCD) approach and to identify system requirements, using an interdisciplinary approach that places CH professionals at the center of the design process.

6 Conclusions and Future Works

We presented a novel approach to evaluating AR usability. Eye-tracking, as a potential measure of people's attitudes during visual processing, offers a deeper understanding of possible common fixation patterns. In the painting “The Ideal City”, attention falls on the central area: subjects were inevitably attracted to the area of the painting containing the most relevant detail in front of them. The manipulation implemented by the researchers (adding AR squares in specific areas) acted as a visual cue, partially modifying the natural, common pattern of exploration of the painting. Further analyses are needed to investigate possible influences of participants' experience and interest in art on viewing behaviour. Additionally, physiological data, combined with statistical analysis of the main indicators (gaze, TTFF, fixations, etc.), should be analysed to identify possible dynamic patterns.

Our future research is directed towards using eye-tracking techniques to investigate human behaviour in front of works of art, in order to provide Augmented Reality to users based on gaze information. Our group has several ongoing projects in the field of AR and Smart Environments [19, 22], and the results obtained in this study are promising for making museum spaces more interactive and aware of human presence. Moreover, we intend to improve our eye-tracking system. We will conduct a user evaluation to identify the type of person observing the painting: art “expert” or “non-expert”. For this purpose we will use machine learning techniques, testing and comparing several classifiers, to investigate how the application can adapt to the visitor based on the user profile and on the anticipation of the user's interest. It is well known that one of the most common problems in developing marker-based AR applications is displaying different contents at different locations at the same time. By making the system aware of the user, for instance by letting the application know, through gaze-tracking devices, which area the user is interested in at a given moment, contents can be provided in a more reliable and appropriate way.