Introduction

“‘Seeing’ happens effortlessly and yet is endlessly complex.” (Hatfield and Allred 2012) The influence of the visual properties of a built space or landscape on the behaviors of people within it, and the manipulation of these visual properties to cue or constrain behaviors, are subjects of long-standing archaeological interest. Within the archaeological discourse, questions about the relationships between visual-spatial properties, individual behavior, and intention in design abound. Are two megaliths inter-visible in all seasons and weathers? How close do you need to be to a relief to see the details in the carving? Is a privileged view of a stone accessible from one viewpoint or from several, and if several, how many? How sensitive to variations in lighting is the visibility of a rock art group? Where is your attention drawn when you walk through a doorway, absent any prior knowledge or cultural constraints? The subject has been treated by numerous scholars, among them Llobera, Lock, Paliou, Eve, Hamilakis, Chrysanthi, and Ashley (e.g., Chrysanthi et al. 2012; Hamilakis et al. 2002; Llobera 2007; Llobera 2011; Llobera 2012; Lock 2003; Lock et al. 2014; Paliou et al. 2011; Paliou 2011; Paliou 2013; Eve 2012; Eve 2014; Ashley 2012), to name only a few of those whose work has been broadly influential.

As this line of questioning has developed, the emphasis has shifted from visibility or inter-visibility to more subtle questions of the visual structure of a built space or landscape, the visual affordances created, and the possible visual experiences of past individuals or groups. Llobera (2005) provides an overview of the state of the art at the time and reviews the main questions and critiques of archaeologies of visual perception and visualscapes. GIS-based approaches have dominated calculation-driven research in this area to date, with most gains over the past decade coming in the size of the area that can be modeled and/or the detail in which it can be analyzed.

Now, advances in the cognitive neurosciences and a suite of improved computational modeling tools (Pashler 2016; Petersen and Posner 2012; Parks et al. 2015; Borji and Itti 2015; Borji et al. 2013; Julien and Riche 2016; Li et al. 2013; Dickinson and Pizlo 2015), combined with the proliferation of detailed 3D models of archaeological complexes and landscapes, offer an opportunity for new approaches to this topic based on models of low-level perceptual cues and visual attention (see Chum and Wolfe 2001 for a useful overview of the principles and models of visual attention and a thorough bibliography). The approach described here takes aim at the question of where people will look, rather than simply what is visible. The goal is to investigate the intentions of the designers of spaces and to interrogate the visual aspects of the experience of a place. Specifically, this paper explores how measurements of visual saliency might help us interpret the choices past people made with respect to the visually interesting features of a place.

In simple terms, our approach involves placing detailed 3D models of a built space or landscape into a digital environment where realistic lighting and movement are simulated. An individual then walks through that space or landscape using a first-person controller, a standard element of most gaming environments (Steuer 1992). What is visible at each moment is recorded as a video stream, which may be broken down into a sequence of connected images or scenes. This set of scenes is then analyzed using software that calculates the visual saliency, that is, the basic power to attract attention, of each area in the scene. The areas of high and low visual saliency are mapped across the scenes, allowing a ready assessment of which features or areas are particularly visually attractive and which are less likely to draw attention (following the kind of approach used in, e.g., McNamara et al. 2014; Baluch and Itti 2015; Asteriadis et al. 2014). This set of saliency maps, together with the corresponding images of what is visible at each moment, forms the dataset from which further inferences are made. Using this dataset, we may consider whether decoration or objects are placed in spots of high or low visual salience, or how objects or decoration might increase or decrease the visual salience of a particular spot.
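To make the mapping step concrete, the following Python sketch compares how much of each frame's saliency falls on a region of interest (for instance, the image area occupied by a carved spiral) with how much of the frame that region occupies. It is a hypothetical illustration, not the study's pipeline: it assumes saliency maps have already been exported as 2D arrays of non-negative values, one per video frame, and that the region of interest is a rectangle in pixel coordinates. The function names are invented for this example.

```python
from typing import List, Tuple

SaliencyMap = List[List[float]]
# Hypothetical ROI format: (row_start, row_end, col_start, col_end), half-open.
Box = Tuple[int, int, int, int]


def roi_saliency_share(smap: SaliencyMap, box: Box) -> float:
    """Fraction of the frame's total saliency that falls inside the ROI."""
    r0, r1, c0, c1 = box
    total = sum(sum(row) for row in smap)
    if total == 0:
        return 0.0
    inside = sum(sum(row[c0:c1]) for row in smap[r0:r1])
    return inside / total


def roi_area_share(smap: SaliencyMap, box: Box) -> float:
    """Fraction of the frame's pixels covered by the ROI (the chance baseline)."""
    r0, r1, c0, c1 = box
    return ((r1 - r0) * (c1 - c0)) / (len(smap) * len(smap[0]))


def salience_ratio_per_frame(frames: List[SaliencyMap], boxes: List[Box]) -> List[float]:
    """Per frame: saliency share divided by area share (>1 means 'draws more than its size')."""
    ratios = []
    for smap, box in zip(frames, boxes):
        area = roi_area_share(smap, box)
        ratios.append(roi_saliency_share(smap, box) / area if area else 0.0)
    return ratios


if __name__ == "__main__":
    # Toy 4x4 frames: in frame_a all saliency sits on the ROI; frame_b is uniform.
    frame_a = [[4, 4, 0, 0], [4, 4, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    frame_b = [[1] * 4 for _ in range(4)]
    print(salience_ratio_per_frame([frame_a, frame_b], [(0, 2, 0, 2)] * 2))
```

A ratio above 1 flags frames in which the region attracts more saliency than its size alone would predict; a ratio near 1 means it is no more salient than an arbitrary patch of the same size. In practice the ROI would move from frame to frame as the viewer walks, which is why one box is supplied per frame.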

The advantage of the approach described here is that it allows us to identify spots (combinations of features, scenes) as visually salient without the influence of culturally contextual knowledge that there is something present that “should be” attractive or important. It allows us, to a greater extent, to separate what we think is interesting in a specific cultural context from what simply attracts the eye. The tension between these two things is the starting point, from which we can ask questions about what might have been important or interesting to someone experiencing the world with a very different set of references and cultural cues.

The low-hanging fruit for this kind of study, places evidently designed to cue or constrain behavior, includes sanctuaries and churches, large public markets, Roman villae or elite houses, and the many regionally and chronologically specific variants on these categories of places (e.g., Brück 2005; Bjur and Frizell 2009; Renfrew 2007; Bertemes and Biehl 2001; Wallace-Hadrill 1988; Fredrick 1995; Wallace-Hadrill 2008; Pollini 2012; Molyneaux 2013). A second requirement is a reasonably good level of preservation, such that the place may be reconstructed and/or documented in good detail.

We carried out our initial experiments at Knowth (Fig. 1), a multi-period but primarily Neolithic sanctuary that is part of the Brú na Bóinne complex in Ireland (Eogan 1986; Eogan 1998; Eogan et al. 2008). By grounding our work firmly in the context of how people experience spaces visually, as understood on the basis of low-level perceptual cues, we hope to reassert the broad anthropological relevance of studies of visual properties of places like Knowth.

Fig. 1

The main mound at Knowth and the eastern passage

A Brief Overview of Studies on Visibility and Visual Experience in Archaeology and Their Critics

Studies on the visual properties of landscapes and built environments began in archaeology with the simple idea that it would be useful to know which places were inter-visible, the motivating assumption being that inter-visible places might well be more inter-connected. This is a variant on the geographic principle that places closer together, where “closer” may be defined in a variety of ways, are more likely to be related and to interact with one another (Legendre 1993; Tobler 2004; Sui 2004; Miller 2004). Analyses along these lines include explicit point-to-point inter-visibility calculations; viewshed calculations from a site or series of sites, in which the area visible from each location is assessed; and cumulative viewshed calculations, in which the visual prominence of areas of a landscape or structure is assessed (Tschan et al. 2000). These calculations have been used as proxy data to assess the relative importance of sites (very visible or very hidden sites are assumed to be most important), areas of control (sites are thought to have influence over the areas that can be seen from them), and connectivity (inter-visibility is part of the assessment of social connection). They have also been used to model travel across an area, with visual access treated as a parameter for wayfinding or as an attractor when modeling best or most likely paths (e.g., Fry et al. 2004; Wheatley and Gillings 2000; Bernardini et al. 2013; Gillings 2009; Lake and Woodman 2003). These analyses, largely carried out within quantitative and processual research traditions, brought to light valuable insights about the landscapes and settlement dynamics of the areas studied. In spite of their utility and popularity, critiques emerged from within the communities carrying out these studies. The literature pointed to insufficient accounting for changes in topography, neglect of the impact of vegetation (or our inability to model past vegetation in sufficient spatial detail), the relatively coarse spatial resolution of the models, failure to account for variations in the height of observers, and countless other metric sins (e.g., Gillings and Wheatley 2001; Ervin and Steinitz 2003; Llobera 2006). These critiques, largely aimed at the “how” of the analyses, were followed by a new volley of criticism from an emerging post-processual (primarily landscape) archaeology (e.g., Tilley 2008; Brück 2005; Van Dyke and Alcock 2003; reviewed in Thomas 2001; Johnson 2012).
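The point-to-point inter-visibility and cumulative viewshed calculations mentioned above can be sketched in a few lines. The Python below is a deliberately simplified illustration, not the algorithm of any particular GIS package: it assumes a small digital elevation model (DEM) stored as a grid of elevations in metres, samples the sight line cell by cell, ignores earth curvature, refraction, and vegetation, and uses a hypothetical default observer eye height of 1.6 m. Its crudeness is itself a reminder of the metric critiques just described.

```python
from typing import List, Tuple

Grid = List[List[float]]
Cell = Tuple[int, int]  # (row, col)


def line_of_sight(dem: Grid, obs: Cell, tgt: Cell,
                  observer_height: float = 1.6, target_height: float = 0.0) -> bool:
    """True if no intermediate DEM cell rises above the straight sight line."""
    (r0, c0), (r1, c1) = obs, tgt
    z0 = dem[r0][c0] + observer_height  # eye level (assumed eye height)
    z1 = dem[r1][c1] + target_height
    steps = max(abs(r1 - r0), abs(c1 - c0))
    for i in range(1, steps):  # intermediate samples only
        t = i / steps
        r = round(r0 + t * (r1 - r0))  # nearest cell along the line
        c = round(c0 + t * (c1 - c0))
        sight_z = z0 + t * (z1 - z0)  # height of the sight line at this sample
        if dem[r][c] > sight_z:
            return False
    return True


def viewshed(dem: Grid, obs: Cell, observer_height: float = 1.6) -> List[List[int]]:
    """Binary viewshed: 1 where a cell is visible from `obs`, else 0."""
    return [[1 if line_of_sight(dem, obs, (r, c), observer_height) else 0
             for c in range(len(dem[0]))]
            for r in range(len(dem))]


def cumulative_viewshed(dem: Grid, observers: List[Cell],
                        observer_height: float = 1.6) -> List[List[int]]:
    """Sum of binary viewsheds: how many observer locations see each cell."""
    rows, cols = len(dem), len(dem[0])
    total = [[0] * cols for _ in range(rows)]
    for obs in observers:
        vs = viewshed(dem, obs, observer_height)
        for r in range(rows):
            for c in range(cols):
                total[r][c] += vs[r][c]
    return total


if __name__ == "__main__":
    # A one-row "valley" with a 5 m ridge in the middle.
    dem = [[0.0, 0.0, 5.0, 0.0, 0.0]]
    print(viewshed(dem, (0, 0)))
    print(cumulative_viewshed(dem, [(0, 0), (0, 4)]))
```

In the toy example each observer sees up to and including the ridge but not beyond it, so the ridge cell, visible from both ends, receives the highest cumulative count. That count is the kind of "visual prominence" value that cumulative viewshed studies interpret.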

Indeed, the next major phase in archaeological analyses based on the visual properties of place was tied up with the emergence of post-processualism, and in particular with post-processual landscape archaeology. Scholars such as Bender (Bender 1993; Bender 1999; Bender 2001) linked the experience of place, including the visual experience, to people’s roles as co-creators of landscapes and the built environment, and argued that a holistic, multi-sensory (including visual) approach to past experience of places is necessary to understand human-place interactions. With the emergence of post-processual approaches to the visual properties of past places, and an attendant desire to achieve veridical models of human experience of places, came an interest in realistic and hyperrealistic visualizations, through which a modern person might attempt to approximate past experiences of a place. Phenomenologically oriented archaeologists such as Bender, Hamilton, Tilley, Edmonds, and Hodder have led work on embodied engagement with landscape in archaeology (e.g., Bender et al. 2007; Tilley 1994; Edmonds 1999).

The interest in embodiment and experience led to a new line of criticism: that archaeological studies of the experience of space and landscape are often blatantly ocularcentric and neglect the importance of the other senses. This idea lies behind many of the critiques leveled against the GIS-based, classificatory analyses that create visualscapes and viewsheds. For example, Ingold (2000: 281–285) argues strongly that ordering and prioritizing modalities frequently becomes a research objective in itself, rather than a tool for engaging with multi-faceted sensory experiences. Countering this tendency, an explicit sensory archaeology has emerged (Frieman and Gillings 2007) that draws heavily upon anthropological and ethnographic studies, emphasizing how multi-vocality varies depending on the priority placed on different sensory modalities (e.g., Howes 2005). In spite of these efforts, the problem of escaping ocularcentrism in the practice of archaeology remains. As Gosden notes, “We live in a 21st century sensorium which (unconsciously) influences the ways we can understand the past.” Gosden has suggested that to appreciate the sensory worlds of others, we need to unlearn our sensory education with its prejudice toward vision (Gosden 2001: 166), which is not an easy undertaking.

Extending this, the argument for reintegrating other senses and unlearning our concentration on the visual has been well made in the context of landscape archaeology, in particular by Thomas (2008), who has argued strongly against computer analysis taking place away from the landscape. He treats computer or virtual approaches as incompatible with embodiment, in which “an experience is not limited to what can simply be seen from a point in the landscape, but includes what can be felt, heard, smelt, tasted, and touched; and moreover, how our sensory reactions change as we move through and encounter landscapes from our situated body” (Eve 2012, p. 583).

As was the case with what we may characterize as processual approaches, scholars attempting post-processual engagements with the visual experience of place were critiqued from within their own community. The emphasis on hyperrealist visual and spatial models in immersive environments (computing, GIS, or gaming) as the primary means of engaging with past experience of landscapes and built spaces has been broadly critiqued as overly ocularcentric, falling into the same trap as GIS-based analyses of the visual properties of places. The discourse that emerged in the 1990s and early 2000s, around both attempts to use visual aspects of place as archaeological information and critiques of such attempts, has waxed and waned, taking in more sophisticated viewshed calculations (e.g., discussions in Llobera 2003; Llobera 2007) and the incorporation of other senses into calculations of scapes, such as Mlekuz’s work on the area within which one can hear church bells ringing and the role of sound in defining territory (Mlekuz 2004).

We can broadly characterize this as the state of play in the late 1990s and early 2000s. Since then, the trend has been toward compromise: balancing the desire to be grounded in data and systematic in quantification and modeling against the recognition that the experiential cannot be ignored. Indeed, many of the scholars cited above work across the general divide described, combining quantitative methods with questions and interpretations informed by an interest in past experience. Scholars including Llobera, Gillings, Bernardini, Ashley, and Mlekuz all actively take up the challenge of balancing the quantitative and the experiential, and the approach described here likewise attempts to balance these prerogatives.

This experiment is a starting point for the use of visual saliency calculations in archaeological research. We may also design more sophisticated, multi-user experiments that track where participants walk and look (recording their movement and eye movement in the digital environment) and compare these with the areas identified as most visually attractive by software that predicts where people should look based on models of visual saliency. We may introduce tasks and prior experience of a digital space, factors known to influence reactions to low-level perceptual cues. Much as calculations of visual access and routeways in 2.5D GIS environments have many diverse applications, we can expect that calculations of visual saliency in 3D digital environments will be adapted and applied in numerous ways. The set of approaches suggested here is open to the same critiques as many quantitative studies: the method described focuses explicitly on vision and movement, which cannot fully describe experience. Following archaeology’s interest in multi-sensory experience, we recognize that incorporating other sensory properties and experiences is intriguing and a clear direction for future work. However, given the current state of technology, we limit this experiment to visual experience.

Principles of Modeling Visual Perception and Attention in Reconstructed Virtual Places

How can we establish an approach to visual experience of objects and places that is experiential but not culturally specific, generalizable rather than individual, and grounded in data rather than anecdotal? Research in the neurosciences and in visual cognition may fortuitously provide a framework linking physical models of places with functional models of behavior and experience within them. This work builds on the substantial advances made over the past 5 to 10 years in several fields related to the neurosciences and visual cognition. Further, basic improvements in computer hardware and computer graphics have increased the detail at which data may be efficiently captured and, more importantly, have improved our ability to manage very large datasets on a desktop computer. New compressed formats, and algorithms that take advantage of octree or other level-of-detail (LOD) loading schemes, help make larger datasets manageable without investment in non-consumer-grade hardware.

At the same time, substantial advances have been made in gaming and immersive environments, with both Unity3D and Unreal providing platforms at little or no cost for academic users. Both platforms have active developer communities that are building a broad knowledge base and lowering the barrier to entry for non-professional developers. Digital virtual environments, in particular those based on gaming or virtual reality, provide a convenient platform for new experiments in modeling how visual attention might have operated in past places. These gaming and VR platforms provide essential “out-of-the-box” features, including reasonably realistic simulation of bodily movement, improving models of how lighting interacts with different materials and surfaces, an intuitive, embodied sense of distance through movement, and naturalistic visual interaction with the surrounding digital environment. Textured 3D models of the landscapes under study, and of the structures within them, are readily imported into these gaming and VR environments.

The models themselves may be created from scan data (as in Fig. 2), structure from motion (SFM), manual reconstruction, or procedural modeling. The expertise and effort invested in creating the 3D models on which an experiment is based have an obvious and direct impact on the likelihood of achieving interesting and convincing results from a study of their visual properties. Here, we are open to criticism, in that substantial investment is necessary to create 3D models that accurately represent, to the best of our knowledge, the past situation. However, we argue that this concern is mitigated by the number of sites and landscapes already documented and reconstructed for various cultural heritage and tourism projects, which may be re-used in experiments on visual perception and attention. Re-using extant models of what are often unusually well-preserved locales, many of which also meet the criterion of being places meant to cue behaviors or reactions, both advances the experiments and gives a second life to 3D models created for other purposes. Indeed, the growing popularity of 3D models in archaeology, particularly as SFM lowers the resources needed to create them, is a motivating factor for a project like this: the resulting proliferation of models motivates us to seek new means of using this growing collection to address archaeological and anthropological questions.

Fig. 2

The Eastern Passage at Knowth, based on terrestrial laser scan data, seen in MeshLab

Closely tied to the growth in gaming and immersive environments and the increase in content produced for them (e.g., detailed 3D worlds, imaginary or modeled from reality) is the recent explosion of devices designed to be worn to interact with these environments, including head-mounted devices like the Oculus Rift and the HoloLens, omnidirectional treadmills, the Leap Motion controller, gesture and motion tracking gloves, and no doubt others that will make this list outdated by the time of publication. These devices broadly incorporate motion, gesture, and eye tracking.

In concert with these technological advances are the aforementioned developments over the past decade in the neurosciences and visual cognition, fields taking advantage of the technological improvements to pursue their own long-standing interests in visual perception and attention for a variety of applications. Of particular interest to us are computational models of visual attention, both bottom-up and top-down, based on data drawn from eye-tracking studies and from theoretical models of low-level perceptual cues.

Models of Visual Attention Based on Low-Level Perceptual Cues

Humans are constantly immersed in environments filled with an overwhelming quantity of visual information. Consequently, a process has evolved that allows us to rapidly select what is important and to reduce the information present so as to operate within the limited capacity of our visual system (Chum and Wolfe 2001). This selective process, which prioritizes visual information according to its relevance to the situation, is what we term attention. In the attention process, the brain continually repositions the center of gaze on regions of interest, referred to in much of the scientific literature as focuses of attention (FOAs). FOAs appear to be defined by two complementary mechanisms: a rapid, task-independent “bottom-up” process, in which scans through a scene identify the features of greatest interest (salient features) on the basis of low-level perceptual cues (see below); and a slower “top-down” process, guided by task-dependent priorities and prior experience (Fig. 3). Researchers are still far from consensus on how visual attention works, but recent progress has led to the development of comprehensive theoretical models of the mechanics of attention that link perception and cognition.

Fig. 3

Visual attention and FOA mapped based on different tasks, as presented in the classic case study by Yarbus (1967)

Key to these models of attention is the concept of visual saliency. Visual saliency is the perceptual quality that causes some areas in visual scenes to appear different and visually interesting in comparison to their immediate contexts; in layman’s terms, things that are visually salient stand out. Visual saliency is a driving force in biologically motivated models of natural vision, in which it directs eye movements, defines FOAs, and participates in object detection and scene understanding tasks.

The recent progress in research on visual saliency draws upon several areas of vision research developed over the last 40 years, including studies of contextual effects on neuronal responses (Albright and Stoner 2002), texture perception (Julesz 1981), guided search theory (Wolfe 2007), and saliency-based attention (Itti and Koch 2001). These research areas all touch on two basic questions: which visual features are identified in the visual cortex, and how these features create visual saliency. Many computational models of visual saliency have been developed. In Itti et al.’s model (Itti et al. 1998; Itti and Koch 2000; Itti and Koch 2001), which is the basis for the C++ Neuromorphic toolkit used in one of the case studies presented here, saliency is computed from the relative difference between a target and its surroundings (object and background) across a set of feature dimensions, including color, intensity, orientation, and flicker, obtained by filtering across several scales (Fig. 4). These feature dimensions correspond to the low-level visual cues mentioned above and are essentially measures of the presence and properties of the basic visual features of any object. Statistical models provide an alternative means of computing visual saliency (e.g., Zhang et al. 2008). In these models, statistical descriptions are computed on either a single scene or a set of scenes, and a variety of measures of visual saliency are calculated from them. Xu et al. (2010), on the other hand, argue that for visual saliency to have any biological utility for natural vision, it must be tied to the statistics of natural variations of visual features and their contexts. They therefore model visual saliency based on the probability of observing individual visual variables, or combinations of them, in specific contexts. In their model, saliency is high when a visual variable appears in an unlikely context and low when it appears in a likely one.
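The center-surround principle at the heart of Itti et al.’s model can be illustrated with a toy version. The sketch below is neither the iLab toolkit nor a faithful reimplementation of Itti et al. (1998): it uses a single intensity channel, replaces Gaussian pyramids with box averages of different radii, and sums absolute center-minus-surround differences over a few hypothetical (center radius, surround radius) pairs. It shows only the core idea that a location is salient when it differs from its surroundings.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def box_mean(img: Image, r: int, c: int, radius: int) -> float:
    """Mean intensity in a (2*radius+1)^2 window centred on (r, c), clipped at the border."""
    rows, cols = len(img), len(img[0])
    vals = [img[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]
    return sum(vals) / len(vals)


def center_surround_saliency(img: Image,
                             scale_pairs: Sequence[Tuple[int, int]] = ((0, 2), (1, 3))) -> Image:
    """Sum over scale pairs of |center mean - surround mean| at every pixel.

    The (center, surround) radii are illustrative assumptions, not toolkit defaults.
    """
    rows, cols = len(img), len(img[0])
    sal = [[0.0] * cols for _ in range(rows)]
    for center_r, surround_r in scale_pairs:
        for r in range(rows):
            for c in range(cols):
                center = box_mean(img, r, c, center_r)
                surround = box_mean(img, r, c, surround_r)
                sal[r][c] += abs(center - surround)
    return sal


if __name__ == "__main__":
    # A dark 7x7 scene with one bright point: the point should be the saliency peak.
    img = [[0.0] * 7 for _ in range(7)]
    img[3][3] = 1.0
    sal = center_surround_saliency(img)
    peak = max((v, r, c) for r, row in enumerate(sal) for c, v in enumerate(row))
    print("peak at", peak[1:], "value", round(peak[0], 3))
```

A uniform image yields zero saliency everywhere, because center and surround means coincide; only local contrast produces a response. The full model repeats this across color-opponent, orientation, and temporal channels before combining them.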

Fig. 4

Conceptual model of visual attention (image by Siagian and Itti 2007)

The number and variety of computational visual attention models intended to automatically predict human FOAs in images or videos (image sequences) have increased exponentially over the past decade. Several families of methods have been proposed, defined by principles including center-surround difference, contrast, rarity, novelty, redundancy, irregularity, surprise, or compressibility, but all draw on the idea of local variation within a given context. The main computational methods are compared on standard benchmark datasets by the MIT Saliency Lab, and performance metrics are available at http://saliency.mit.edu/.
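Of the families listed above, rarity is the simplest to illustrate. The sketch below scores each pixel by its self-information, -log2 p(v), where p(v) is the frequency of the pixel’s quantized intensity across the whole scene, so rare values score high. It is a hypothetical example only: the number of quantization levels is an arbitrary assumption, it uses a single global context, and it does not implement Xu et al.’s context-conditional model or any method on the MIT benchmark, although it shares their logic that what is unlikely in context is salient.

```python
import math
from collections import Counter
from typing import List

Image = List[List[float]]


def quantize(v: float, levels: int = 8) -> int:
    """Map an intensity assumed to lie in [0, 1] to one of `levels` bins."""
    return min(int(v * levels), levels - 1)


def rarity_saliency(img: Image, levels: int = 8) -> Image:
    """Self-information -log2 p(bin) of each pixel, with p estimated over the whole image."""
    bins = [[quantize(v, levels) for v in row] for row in img]
    counts = Counter(b for row in bins for b in row)
    n = sum(counts.values())
    return [[-math.log2(counts[b] / n) for b in row] for row in bins]


if __name__ == "__main__":
    # A dark 4x4 patch with one bright pixel: the bright pixel occurs 1 time in 16.
    img = [[0.0] * 4 for _ in range(4)]
    img[1][2] = 0.9
    for row in rarity_saliency(img):
        print([round(v, 3) for v in row])
```

Here the odd pixel scores -log2(1/16) = 4 bits, while each of the fifteen common pixels scores under 0.1 bits. Conditioning p on a local neighborhood rather than the whole image would move this sketch closer to the context-dependent models described in the text.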

Part of visual saliency is the perception of shape, which we argue is particularly interesting and important for archaeologists. Color and surface texture are often poorly preserved, and motion through a space is highly variable; of the features of past environments, the 3D shape of structures, spaces, and objects is the most likely to be preserved. Foundational work by Pizlo on the importance of 3D shape in visual attention and on the veridical perception of 3D shape is therefore relevant (Pizlo 2010; Dickinson and Pizlo 2015).

Shape, Shape Constancy, and How Humans Recognize and Identify Things

Archaeologists have long been implicitly interested in the shapes of objects and structures, as attested by numerous architectural studies of the layouts of buildings and typologies of forms of ceramics or lithic tools. But how does one define shape, and what role does the shape of something play in how we as humans interact with it? Many definitions of shape have been given over the years (see Pizlo 2010 for an overview). Pizlo et al. provide a new definition, using the term “shape” to refer to the spatial regularity (self-similarity) of an object. They argue that humans use shape to identify objects, remember and recognize objects, infer an object’s functions, and identify the permanence of objects in changing scenes. They argue that this definition, based on self-similarity and internal spatial regularity, is the only useful one, because our recognition of a thing must not depend on comparing it with other objects (Li et al. 2013, p. 26). This is an important point for archaeologists thinking about new experiences and first encounters, because it allows people to recognize a new object as something interesting, or at least non-random, without having something else to which to compare it.

Pizlo and Li propose a scale of self-similarity, from the highly symmetrical to the “shapeless,” without internal symmetry. Some inanimate objects, such as unworked rocks and crumpled paper, have no trace of symmetry and are shapeless by their definition; functional objects (the “important” objects, e.g., an Acheulean hand-axe) have varying degrees of symmetry and shape, as do some natural objects and biological forms, e.g., a cow. The implications of self-similarity and shape constancy for communication and recognition have been taken up in the archaeological context by scholars seeking to understand early human cognition, in particular with reference to the development and use of stone tools (see the reviews in Wynn 2002 and Hodgson 2011).

For the purposes of this study, we rely on Pizlo and Li’s argument that a definition of 3D shape “based on self-similarity and internal spatial regularity [is the only useful definition] because our recognition of a thing must not depend on comparing it with other objects,” and from it we argue that 3D shapes with high degrees of internal symmetry and regularity are recognizable as important things. The degree of internal symmetry and spatial regularity can be calculated in various ways, e.g., from patterns of curvature concentration or distributions of local roughness. By following their definition of shape and its place in visual perception, we can use a variety of morphological metrics to describe objects and evaluate their visual saliency, as well as to understand properties that affect their recognition and role in other visual-cognitive tasks.
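As one concrete, and admittedly simple, example of such a metric, the sketch below scores the mirror symmetry of a 2D outline. It reflects the points across an axis through their centroid and measures how far each reflected point lies from the nearest original point, normalized by the shape’s mean radius. This is an assumption-laden illustration, not Pizlo and Li’s formulation or the curvature and roughness metrics mentioned above: it presumes the outline has been sampled as (x, y) points, for example from the profile of a stone or a hand-axe, and the angular search step is arbitrary.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def reflection_asymmetry(points: List[Point], angle_deg: float = 90.0) -> float:
    """Mirror the outline across an axis through its centroid (at `angle_deg`) and
    return the mean nearest-neighbour mismatch divided by the mean radius.
    0.0 means perfectly mirror-symmetric about that axis."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)  # unit vector along the mirror axis
    mirrored = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        d = dx * ux + dy * uy  # projection onto the axis
        # Reflection of v across the axis: 2 (v . u) u - v
        mirrored.append((cx + 2 * d * ux - dx, cy + 2 * d * uy - dy))
    mismatch = sum(min(math.dist(m, p) for p in points) for m in mirrored) / n
    mean_radius = sum(math.dist((cx, cy), p) for p in points) / n
    return mismatch / mean_radius if mean_radius else 0.0


def best_symmetry(points: List[Point], step_deg: float = 5.0) -> Tuple[float, float]:
    """Try axis orientations in [0, 180) and return (lowest asymmetry, its angle in degrees)."""
    angles = [i * step_deg for i in range(int(180 / step_deg))]
    return min((reflection_asymmetry(points, a), a) for a in angles)


if __name__ == "__main__":
    # An isosceles triangle is mirror-symmetric only about its vertical axis.
    tri = [(-1.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    print(best_symmetry(tri))
```

Because the score needs no comparison with other objects, it fits the spirit of a definition based on internal regularity; a real study would apply something similar to dense 3D meshes rather than a handful of 2D points.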

Pizlo and Li further argue that in shape recovery (the process of recognition), this symmetry is important and is responsible for the special and unique status of shape in visual perception and for the fact that shapes are perceived veridically (see Pizlo 2010 on the uniqueness of shape in visual perception). Essentially, Pizlo argues that the uniqueness of shape as a perceptual property lies in the fact that it is both complex and structured, and he demonstrates that shape alone can be the basis for recognition. He states that “3D shape is unique in perception, because 3D shape is the only visual property that has sufficient complexity to guarantee accurate identification of objects. Furthermore, 3D shape perception, when shape is properly defined, is almost always veridical. In our everyday life, we can easily recognize important objects such as a car, chair, or dog on the basis of their shapes alone” (Pizlo 2010). This argument is essential for our experiments, because it justifies prioritizing 3D shape as a visual property over fidelity in rendering lighting or color. While the 3D shape of many structures may be reasonably reconstructed, correctly rendering the color and lighting of a place as it would have appeared in the past presents a host of other challenges.

Experiments in Low-Level Perceptual Cues and an Archaeological Understanding of the Experience of Past Places at Knowth

Building on the principles described above, we take two experimental tacks. The first is to use computational models of visual saliency to predict what should be visually attractive, and where. In the case studies presented here, we use Itti’s metric as implemented in the C++ Neuromorphic toolkit. The second is to calculate various metrics of 3D morphology that, according to Pizlo et al.’s approach, are essential to visual experience, and to use these as proxies for the likely visual saliency and “interest value” of objects and places.

The Chamber Tomb

The first attempt to apply computational models and principles of visual saliency and shape perception to investigate past spatial-visual experience was carried out using models of the Knowth passage tomb. Knowth is one of the three main burial mounds that make up the prehistoric complex of Brú na Bóinne. Knowth, Newgrange, and Dowth, the three main mounds, had their main period of use in the Neolithic. More than 90 further monuments have been recorded in the area. The earth mound at Knowth is ringed with a continuous stretch of megaliths, most of which are decorated, as are the megaliths that line the passage and chamber. The Brú na Bóinne tombs, in particular Knowth, feature the largest and densest assemblage of megalithic art in Western Europe. The tomb contains two long passages, the eastern of which has a tripartite chamber with a corbelled ceiling at its terminus. The chamber at the back contains a basin stone in the northern niche and a semi-closed southern niche. The excavations of the mound led by G. Eogan and his colleagues revealed a large amount of cremated remains on and adjacent to the basin stone. The study of these remains suggests multiple depositions over an extended period. We may therefore imagine many visitors to the tomb, both to deposit remains and perhaps, separately, to relocate the remains of those buried previously (Eogan et al. 2008).

The location is one where we can readily argue for intentionality in its design, and for an expectation of meaningful individual and communal experiences within and linked to the composition of the place and to movement around and through it. The legibility of, and interactions between, the decoration, the physicality and shape of the stones themselves, and the experience of the place have been discussed in various contexts, for example by J. Thomas, who notes the emphasis placed on portals and doorways, and on points of demarcation between areas of space within and outside the tomb at Knowth, where the kerbstones surrounding the mound become larger and more ornately decorated around the entrances to the passages (Thomas 1990: 175, citing Eogan 1986). Thomas points out that “While the earliest passage tombs simply add a boulder kerbed mound and a short passage to the irregularly-shaped orthostatic chamber, developments in the earlier third millennium (the mid-fourth cal. BC) [e.g., Knowth] clearly elaborated the spatial progress to the chamber” (Thomas 1990: 174) and further suggests for passage tombs in general that “The journey from the outside world into the chamber space is thus a highly orchestrated one, in which the individual is constantly being made aware that he or she is passing between radically different spaces, by being presented with symbols and by being forced to bend down or squeeze through particular parts of the passage.” The eastern passage at Knowth is a prime example, intermittently decorated with spirals and several larger designs on individual stones. The low passage, less than 1.5 m in height, admits light from its eastern end at certain times of day and seasons, and one imagines that anyone entering would have carried a torch or another source of light.

A number of questions emerge when considering the visual experience of walking down the passage, many raised before by scholars like Thomas. Are the decorations meant to be seen, and by whom? How do they relate to the stones and the space? And for our purposes, are they “obviously” attractive, or do you have to know to look for them? What dominates the “untutored” visual experience of a journey down the passage?

The Models and Calculations

The exterior and interior of the passage tomb at Knowth were extensively reconstructed following excavation (Cleere 2015), and the structure has been thoroughly documented by laser scanning (on at least three separate occasions, including once by the author). The resulting point clouds were interpolated into colored 3D meshes that form a complete model of the space and its immediate surroundings. The 3D model of the Knowth tomb and its landscape context was imported into a game engine (Unity 3D in this instance), where lighting was simulated and a first-person controller (FPC) was inserted to mimic naturalistic movement through the space. The FPC was guided down the eastern passage (Fig. 3) at a normal walking speed, with gaze generally directed forward. What was in view as the FPC traveled down the passage was recorded as a video stream. From this stream, individual images and sequences of images can be extracted, and the visual saliency of the features in the field of view at each moment calculated. Several moments at which decorative features, specifically relatively small spirals, were coming into view or passing out of view were selected for initial assessment using the iLab C++ Neuromorphic toolkit.
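The text does not say how individual frames were pulled from the recorded stream. As a minimal sketch, assuming a constant frame rate (the rate actually used is not given) and the 100-ms sampling interval used in the analyses of spiral A, the frame on screen at each sampling time can be located as follows. `sample_frame_indices` is a hypothetical helper, not part of Unity or the iLab toolkit.

```python
import math


def sample_frame_indices(fps, interval_ms, start_ms, end_ms):
    """Indices of the video frames on screen at each sampling time in [start_ms, end_ms].

    Assumes a constant frame rate: frame k is displayed from k/fps to (k+1)/fps seconds,
    so the frame visible at time t (in ms) is floor(t * fps / 1000).
    """
    if fps <= 0 or interval_ms <= 0 or end_ms < start_ms:
        raise ValueError("fps and interval_ms must be positive and end_ms >= start_ms")
    # Integer step count avoids accumulating floating-point error over long windows.
    n_samples = int((end_ms - start_ms) // interval_ms) + 1
    return [math.floor((start_ms + k * interval_ms) * fps / 1000) for k in range(n_samples)]


# Hypothetical example: a 30 fps recording sampled every 100 ms over a 300-ms window.
print(sample_frame_indices(30, 100, 0, 300))  # [0, 3, 6, 9]
```

Taking the frame on screen (floor) rather than the nearest frame keeps each sample causal: it never uses an image that had not yet been rendered at the sampling time.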

The iLab toolkit is a biologically inspired system (see the iLab toolkit documentation at ilab.usc.edu/tookit/documentation for a thorough discussion of the parameters used in the model). In this system, an input image is decomposed into a set of multiscale neural “feature maps,” each of which describes local discontinuities along one of the axes of color, intensity, orientation, motion, and flicker. Within its calculation of bottom-up models of visual attention, the toolkit allows a degree of flexibility in the weighting of various parameters: the importance of color, intensity, orientation, motion, and flicker may be weighted, and the size of the center-surround area and the degree of normalization may be adjusted. A feature map is constructed for each axis using non-linear, spatially competitive dynamics. Essentially, the response of a simulated neuron at any location in a map is affected by the responses of neighboring neurons. This contextual effect is likewise rooted in neurobiological research. The feature maps are combined into a single scalar “saliency map,” which encodes the salience of each location in the scene. The calculation of the saliency map is the essential step in the process, as it directly reflects inputs from the pre-attentive, parallel feature-extraction stage. Because the analysis is based primarily on pre-attentive visual cues, it can cut across cultures. It provides a baseline that can be helpful as we explore how peoples from different cultural contexts, including past cultures, seem to create and engage with visual cues.
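To make the center-surround and normalization steps more concrete, the following is a deliberately reduced sketch of a single intensity channel in the style of Itti et al. (1998), written with NumPy. It is not the iLab toolkit's code and omits the color, orientation, motion, and flicker channels. Block averaging stands in for Gaussian pyramid filtering, the normalization operator is a simplified version of the one described by Itti et al., and the scale choices (`centers`, `deltas`) are illustrative assumptions.

```python
import numpy as np


def downsample(img):
    """Halve resolution by 2x2 block averaging (a crude stand-in for Gaussian filtering)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def upsample_to(img, shape):
    """Nearest-neighbour upsampling to a target shape."""
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(rows, cols)]


def normalize(fmap):
    """Simplified N(.): rescale to [0, 1], then favour maps with one dominant peak.

    The map is multiplied by (1 - m)**2, where m is the mean of its other local maxima,
    so a map with many similar peaks is suppressed relative to one with a single peak.
    """
    fmap = fmap - fmap.min()
    peak = fmap.max()
    if peak == 0:
        return fmap
    fmap = fmap / peak
    padded = np.pad(fmap, 1, mode="edge")
    h, w = fmap.shape
    neighbours = [padded[1 + dy:h + 1 + dy, 1 + dx:w + 1 + dx]
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    is_local_max = np.all([fmap >= n for n in neighbours], axis=0)
    others = fmap[is_local_max & (fmap < 1.0)]  # exclude the global maximum itself
    m = others.mean() if others.size else 0.0
    return fmap * (1.0 - m) ** 2


def intensity_saliency(img, centers=(1, 2), deltas=(2, 3)):
    """Center-surround intensity contrast, normalized and summed across scales."""
    levels = max(centers) + max(deltas)
    img = np.asarray(img, dtype=float)
    if min(img.shape) < 2 ** levels:
        raise ValueError("image too small for the requested pyramid depth")
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1]))
    total = np.zeros(img.shape)
    for c in centers:
        for d in deltas:
            surround = upsample_to(pyramid[c + d], pyramid[c].shape)
            contrast = np.abs(pyramid[c] - surround)  # center minus surround
            total += upsample_to(normalize(contrast), img.shape)
    return normalize(total)
```

On a dark 64 x 64 image containing one small bright square, the maximum of `intensity_saliency` falls on the square. On a uniform image every center-surround difference is zero, so the map is empty.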

The point of highest salience in the map at any given time is then detected by a winner-take-all neural network, and the focus of attention (FOA) is shifted toward this location. To keep the FOA moving, the currently attended location is temporarily weighted as less important within the saliency map, an effect referred to as inhibition of return. The combination of winner-take-all and inhibition of return produces a visual scan of the saliency map that begins with the most attractive features and proceeds down the scale of visual saliency.
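A minimal sketch of the winner-take-all and inhibition-of-return scan, operating on any non-negative 2D saliency map, follows. It is a simplification and should not be read as the toolkit's implementation. In the toolkit, winner-take-all is a neural network with its own dynamics and inhibition of return is temporary. Here, winner-take-all is reduced to picking the global maximum, and inhibition of return permanently masks a disk of hypothetical radius `ior_radius` around each attended point.

```python
import numpy as np


def scan_foas(saliency, n_foas=5, ior_radius=3):
    """Return foci of attention (row, col) in the order they win the competition."""
    smap = np.array(saliency, dtype=float)  # copy: the caller's map is not modified
    rows, cols = np.indices(smap.shape)
    foas = []
    for _ in range(n_foas):
        if not np.isfinite(smap).any():
            break  # every location has already been attended
        # Winner-take-all, reduced here to selecting the global maximum.
        r, c = np.unravel_index(np.argmax(smap), smap.shape)
        foas.append((int(r), int(c)))
        # Inhibition of return: mask the attended neighbourhood so attention moves on.
        # Unlike the biological model, this suppression never decays.
        attended = (rows - r) ** 2 + (cols - c) ** 2 <= ior_radius ** 2
        smap[attended] = -np.inf
    return foas
```

The number of FOAs returned before a given image region is first selected corresponds to the “prior FOAs” counted for the spirals discussed below.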

The set of algorithms implemented in the toolkit has been expanded since its initial development to include top-down models of visual attention, including task-dependent visual attention and models that account for prior knowledge. These models are not discussed in this article but are certainly relevant for archaeologists. In this set of experiments, equal weights were assigned to all the variables in use in each run. Calculations were run variously using all the variables, intensity only, intensity and orientation, and intensity, orientation, and flicker. While specific FOAs varied depending on the combination of low-level descriptors used in each run, the broad trends discussed below, in terms of what is visually interesting in the space and the time taken for attention to reach the decorative features, remained consistent.
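The runs described above differ only in which low-level channels contribute to the saliency map, with all active channels weighted equally. The sketch below shows that combination step and assumes the per-channel conspicuity maps have already been computed and normalized. The channel names, the `RUNS` table, and the helper functions are hypothetical illustrations, not the iLab toolkit's command-line options.

```python
import numpy as np

CHANNELS = ("intensity", "color", "orientation", "motion", "flicker")

# The channel subsets used in the runs reported in the text.
RUNS = {
    "all": CHANNELS,
    "I": ("intensity",),
    "IO": ("intensity", "orientation"),
    "IOF": ("intensity", "orientation", "flicker"),
}


def combine_channels(conspicuity, active, weights=None):
    """Weighted sum of per-channel conspicuity maps over the active channels only."""
    if not active:
        raise ValueError("at least one channel must be active")
    missing = [name for name in active if name not in conspicuity]
    if missing:
        raise KeyError(f"no conspicuity map for: {missing}")
    if weights is None:
        weights = {name: 1.0 / len(active) for name in active}  # equal weighting
    return sum(weights[name] * np.asarray(conspicuity[name], dtype=float) for name in active)


def winners_by_run(conspicuity, runs=RUNS):
    """Location of the most salient point under each channel subset."""
    result = {}
    for label, active in runs.items():
        smap = combine_channels(conspicuity, active)
        result[label] = tuple(int(i) for i in np.unravel_index(np.argmax(smap), smap.shape))
    return result
```

Comparing `winners_by_run` across subsets is one simple way to check whether the first FOA depends on the channel selection, as the specific FOAs did in these experiments, even when the broad trends held.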

A Conceptual Exercise in the Visual Saliency of the Spiral Reliefs in Knowth’s Eastern Passage

The visual experience in the Knowth eastern passage has no intuitively obvious “most salient” points. Although each rock face is relatively smooth, the arrangement of the stones along the tunnel means that, seen from the center of the corridor, their edges cluster together, so the visual field of someone traveling down the passage contains many edges and contours. Putting aside for a moment problems of lighting and shadow, head tilt, and the desire to walk quickly, consider the following question. In terms of contour (curvature) concentration or roughness, and of objects standing out from their immediate surroundings as different (cf. Pizlo 2010), how much do the spiral reliefs in the passage differ spatially from the overall background created by the megaliths? Curvature concentration and roughness are both basic measurements of shape. Therefore, following the logic that shape is fundamental to visual saliency, these metrics are good proxy measures for the visual saliency of different features in a scene. Both are calculated over a specific spatial scale determined by the kernel size. Setting the correct kernel size thus becomes a question of the scale at which we consider local variation and global rarity. If we consider the normal human field of view and the distance between a person walking down the passage and its sides and ceiling, we can select a kernel size, or a series of kernel sizes, that represents the area “in view” at any given time. In human vision research, this is sometimes referred to as the useful field of view and defined as the visual area within which we can identify salient points and extract information rapidly, without eye or head movement. We can then calculate curvature concentration and related morphological measures such as roughness at that scale (as seen in Fig. 5, top).
When we do so, the curvature concentration metrics suggest that the overall pattern created by the arrangement of the megaliths is the dominant visual feature in the passage, and that the decorative spirals are not natural visual attractors, whether judged by their 3D shape alone or by their shape and position within the passage.
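The text specifies neither the software nor the exact definitions used for curvature concentration and roughness, so the sketch below should not be read as the procedure behind Fig. 5. It assumes one common definition of roughness: the distance from each point to the least-squares plane fitted to its neighbors within a sphere of radius r, similar to the roughness tool in CloudCompare. Whether the 100-mm kernel in Fig. 5 denotes a radius or a diameter is not stated. The sketch also relates a useful field of view to a kernel size with the geometric relation w = 2d·tan(θ/2), the width of surface subtended by visual angle θ at distance d. Any distance or angle passed to it is illustrative, not a value from the study.

```python
import math

import numpy as np


def patch_width(distance_m, view_angle_deg):
    """Width of surface subtended by a visual angle at a given viewing distance."""
    return 2.0 * distance_m * math.tan(math.radians(view_angle_deg) / 2.0)


def roughness(points, radius):
    """Distance from each point to the least-squares plane through its neighbours.

    Neighbours are the other points within `radius`. Points with fewer than three
    neighbours get NaN. Brute-force O(n^2), so only suitable for small samples.
    """
    pts = np.asarray(points, dtype=float)
    out = np.full(len(pts), np.nan)
    for i, p in enumerate(pts):
        d2 = np.sum((pts - p) ** 2, axis=1)
        nbrs = pts[(d2 <= radius ** 2) & (d2 > 0)]  # exclude the point itself
        if len(nbrs) < 3:
            continue
        centroid = nbrs.mean(axis=0)
        # The right-singular vector with the smallest singular value is the plane normal.
        _, _, vt = np.linalg.svd(nbrs - centroid)
        out[i] = abs(np.dot(p - centroid, vt[-1]))
    return out
```

On a flat grid with a single raised point, roughness is near zero away from the bump and equals the bump's height at the raised point. A kernel that is small relative to the stones picks out joins and edges in this way, which is why the choice of scale matters.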

Fig. 5

Roughness values with a 100-mm kernel size, mapped across the 3D model of the chamber located at the end of the east passage at Knowth. Higher values are concentrated at joins and the edges of stones (above), notably here on the corbelled ceiling (shown in normal color below)

Are the spiral reliefs, then, meant to be visually prominent, meant to be noticed? Like the curvature concentration metrics, the results of analyses using the iLab toolkit suggest that the reliefs are not strongly visually attractive in their context within the passage.

Spiral A (Fig. 6) is located on the ceiling near the junction of two roof slabs, on the right-hand side of the passage as one walks toward the tripartite chamber. Walking toward the chamber, it takes between roughly 500 and 1000 ms for attention to arrive at the spiral, with 8–12 FOAs before it gets there, depending on the weighting of the intensity, color, motion, and orientation parameters (Fig. 7). These figures are based on an analysis of the sequence of images sampled at 100-ms intervals within the 300-ms/1-m/two-step area in which objects are generally identified while walking (see below).
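Figures such as “500 to 1000 ms, with 8–12 prior FOAs” can be read off a time-ordered FOA sequence once the spiral's position in each sampled image is known. The sketch below assumes that FOAs have been exported as (time, x, y) records and that the spiral's bounding box is available for each moment from the rendered sequence. The `Foa` record and the `region_at` callback are hypothetical and do not correspond to iLab toolkit output formats.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open, in image pixels


@dataclass(frozen=True)
class Foa:
    t_ms: float  # time since the spiral entered the field of view
    x: int
    y: int


def first_arrival(foas: Iterable[Foa], region_at: Callable[[float], Optional[Box]]):
    """Return (time_ms, n_prior_foas) for the first FOA landing on the target, else None."""
    for n_prior, foa in enumerate(sorted(foas, key=lambda f: f.t_ms)):
        box = region_at(foa.t_ms)
        if box is None:  # target not visible at this moment
            continue
        x0, y0, x1, y1 = box
        if x0 <= foa.x < x1 and y0 <= foa.y < y1:
            return foa.t_ms, n_prior
    return None
```

Because the spiral moves across the image as the viewer walks, the target region is supplied as a function of time rather than as a fixed box.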

Fig. 6

Spiral A

Fig. 7

Visual attention and FOA over time, based on intensity only (top), intensity and orientation (middle), and all variables (bottom). In the per-variable saliency maps, the white areas are more salient. FOAs are represented as yellow circles, and red arrows indicate the path of the FOAs as attention shifts across the scene

Walking away from the chamber, and so approaching the same spiral in a different visual context, we again find that it is not among the initial FOAs. Rather, the cracks between the megalithic slabs are the most visually salient features and attract the initial FOAs (Fig. 8). The structure of the stones dominates all the basic modalities used in the saliency calculation.

Fig. 8

Spiral A, seen from the other direction as one walks out of the passage. Again, attention is dominated by the structure of the stones and not the decorative spiral

Spiral B, further along the passage, is again located on the ceiling stones, but this time more centrally on a slab rather than near a join between slabs. At this slightly wider point in the passage, visual attention is dominated by the complex pattern of smaller stones used to fill chinks in the sides of the passage, and again by the joins between megaliths, both between the sides and ceiling of the passage and along each face. It takes more than 1500 ms for attention to arrive at the spiral, as it is drawn back and forth across the passage, with over 20 prior FOAs (Fig. 9).

Fig. 9

Visual attention and FOA at spiral B, over time

People walk on average at 1 m per second. How long does a spiral remain in the relatively central part of the field of view, and how many FOAs does that equate to? Or, more colloquially, how long do we have to notice the spirals? Walking complicates focus and reduces the time available to look at something like a spiral. All object fixations, or FOAs, must be of sufficient duration to allow people to navigate safely while moving. During normal locomotion, an object is identified at least two steps, or about 300 ms, before it is reached (Hollands et al. 2002; Patla and Vickers 1997). Further, Patla and Vickers (1997) found that travel fixations were the most common gaze behavior, accounting for 60% of all fixations during locomotion. Taken together, we can safely say that items that take 500–1300 ms to win the visual saliency race for the first time will be missed consistently by an observer walking at an average pace. Such an observer has only 300–400 ms to notice the object, even if we generously allow 300 ms as it comes into view over two steps and another 0–100 ms as it passes out of view (Figs. 7, 8, and 9). Unless a person intentionally stops and looks, they will sweep past the spirals without seeing them. We have highlighted two examples here, but the pattern persists along the passage. Of course, different modes of walking might well produce different results. A person walking slowly or stopping every few paces, or a tall person who must duck their head, would have a different experience. The modeling exercise undertaken here is, then, a point of departure for future investigations into the influence of different modes of walking, or the basis for an argument that a walk down the passage was intended to be slow, giving enough time to engage with the decoration.
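The comparison in this paragraph is simple arithmetic, but writing it out makes its assumptions explicit. The sketch below uses the figures quoted in the text: a 300 ms approach plus 0–100 ms on exit, a walking speed of 1 m/s, roughly 500–1000 ms for attention to reach spiral A, and more than 1500 ms for spiral B. The inverse scaling of the window with walking speed is an added simplifying assumption, not a result of the study. It assumes the feature stays centrally in view over a fixed stretch of passage and is included only to show how a slower walk might be explored.

```python
def viewing_window_ms(approach_ms=300.0, exit_ms=(0.0, 100.0),
                      speed_m_s=1.0, reference_speed_m_s=1.0):
    """(min, max) time a feature is available to be noticed, following the text's estimate.

    Hypothetical extension: the window is scaled by reference_speed / speed, i.e. a
    slower walker keeps the feature in view proportionally longer.
    """
    scale = reference_speed_m_s / speed_m_s
    return (approach_ms + exit_ms[0]) * scale, (approach_ms + exit_ms[1]) * scale


def distance_walked_m(time_ms, speed_m_s=1.0):
    """Distance covered while attention is still elsewhere."""
    return speed_m_s * time_ms / 1000.0


def likely_noticed(time_to_attention_ms, window_ms):
    """True if attention can reach the feature before the window closes."""
    return time_to_attention_ms <= window_ms[1]


window = viewing_window_ms()  # (300.0, 400.0) at 1 m/s
for label, t in [("spiral A, fastest run", 500), ("spiral A, slowest run", 1000),
                 ("spiral B, lower bound", 1500)]:
    print(label, likely_noticed(t, window), distance_walked_m(t))
```

Every case prints `False`, with the observer having walked 0.5 to 1.5 m before attention first reaches the spiral.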

More generally, in this highly ritualized space, the decorative choices made provide clues to the intentions of the space’s builders and users. We must consider the possibility that the spiral reliefs constitute only a minor part of the decoration of the interior of the space and are not intended to be central drivers of the visual experience. We may also consider the possibility that they are intended for an audience that knows to look for them, as they are effectively hidden in plain sight. We might also consider the role lighting could have played in highlighting features that a group or individual wished to emphasize. The experiments here assume that the interior of the passage is lit at an even, low level. Torchlight carried along the passage should, if anything, exaggerate the saliency of the joins between the stones, as they cast strong shadows. A journey down the passage in the dark, dominated by sound and touch, would require a different kind of experiment altogether. Strategically placed light sources might draw the eye to various carvings. Regardless of our favored interpretation of the results, the experiment encourages us to reconsider our archaeologist’s gaze, our modern academic visual experience of the passage, and how we map that onto our interpretations of visual experience in the past (cf. Wickstead 2009).

Discussion and Future Directions

The idea that the spirals were not important to the visual experience of the eastern passage at Knowth for many who traveled down it may feel counterintuitive to an archaeologist, particularly one well versed in megalithic art and its importance in this ritual landscape and structure. We are attuned to pay attention to the spirals, to look for them as we go. Our experience and culturally specific situation as western archaeologists, or as modern observers aware of the importance of the spirals, point our interpretation in a different direction from the one suggested both by the basic analysis of 3D shape and symmetry provided through the curvature concentration metric and by the iLab toolkit calculations. The analyses based on low-level perceptual cues give us a chance to try to escape our own preconceptions of what is important, meaningful, or likely to interest someone in an imagined past. They further suggest different possibilities for the features that create the visual structure and experience of the space. In this case, the physicality of the stones themselves is the “first-level” experience available to everyone who travels down the passage, while the decoration provides another, perhaps insider, experience.

Using the eastern passage at Knowth as an example, we have presented a proof of concept for a new approach to studying visual and spatial properties of structures, building on tools emerging from research in cognitive neurosciences and allied fields. Essentially, by relying on the principles of low-level perceptual cues and the importance of 3D shape in visual attention and recognition, we are attempting to ally a long-standing archaeological interest in the experiential aspects of past places with the knowledge that we cannot take our individual personal experiences as representative of a universal experience. Knowth provides a textbook case study, in that the interior of the passage tomb is well reconstructed, and we may readily argue that the space is intentionally designed and meant to evoke meaningful experience. Variants on the approach taken here might be tried on a variety of built environments and landscapes with reasonable levels of preservation or reconstruction.

The experiments undertaken and described here are far from unproblematic. There are important differences between the experience of movement and vision in a virtual space and those experiences in the real world. These differences are the subject of active research in neuroscience (see a useful review by Taube et al. 2013) and in the gaming industry, where establishing a sense of presence in a virtual environment (e.g., Bosch 2016; Scoresby and Shelton 2011; Thalmann et al. 2016) is seen as integral to the effectiveness of games. Repeating the experiment in more immersive virtual environments, including CAVEs and VR treadmills, which would allow greater freedom of movement, might well produce different results in terms of the specific features appearing as most salient in this location. The effects of learning and prior knowledge on visual experience, and “task-dependent visual search,” have not been studied here (Cabeza et al. 2004; Itti et al. 1998). Further open issues include alternative models of visual attention, the question of task-driven versus undirected visual experience, the possible variations in parameters within the chosen visual attention model, the vagaries of interpreting the meaning of the attention maps, and possible problems in modeling the space or the likely basic visual intake of an actor moving through it. Specialists in human visual attention are far from agreement on how low-level perceptual cues work, and 3D shape perception and recognition are likewise still actively debated by domain specialists. In spite of these reservations, the direction is a productive one. The approach taken here and its many possible variants open new possibilities for archaeologists interested in the visual structure, design, and experience of past places.