1 Introduction

An inherent challenge of future mixed traffic environments is that manually driven and automated vehicles (AVs) will need to interact with non-automated road users, such as pedestrians and cyclists. These interactions may occur in ambiguous scenarios, where the rules of the road are unclear, either because clear environmental or infrastructural guidance is lacking, or because of local or national cultural and behavioural “norms”. Here, there is typically a need for cooperation, and for constructive communication and interaction between these different actors, so that they may agree on safe future motion plans, especially when they share the same road space. Such interactions are currently quite frequent in urban environments, and with the introduction of “driverless” AVs, it is important for all road users to have a good understanding of the intentions of these vehicles, especially in the absence of an accountable human operator. There is therefore a need to understand how the right cooperation strategy between all road users can be developed, to ensure successful deployment and acceptance of AVs by all road users, and to promote a smooth and cooperative flow of traffic.

The vision of the interACT project (https://interact-roadautomation.eu), funded by the European Commission, is to develop novel, holistic interaction concepts for AVs that will enable the future integration of these vehicles into mixed traffic environments, in a safe and intuitive manner. Currently, as road users, humans use multiple implicit cues, such as approach speed, and explicit communication, such as eye contact, gestures and vehicle signals, to anticipate the intentions of other traffic participants on the road. Although the exact means of communication differ across regions and cultures, these acts allow effective coordination of future motion plans between different road users. AVs, however, currently lack such coordination capabilities, and their interaction with other road users is often limited to, and mostly dominated by, the rational principle of collision avoidance. Therefore, to safely integrate AVs into complex, mixed traffic environments in the future, we must ensure that the AV can interact with other road users in an intuitive, expectation-conforming manner. This will allow other road users, as well as those on board the AV (who may still be required to resume control in case of emergencies), to correctly interpret the intentions of the AV, and coordinate their planned actions accordingly. Results from a previous study, conducted during the CityMobil2 project [1], showed that when interviewed after interacting with low-speed AVs (which operated in a shared space setting, without a driver), pedestrians and cyclists highlighted the importance of some kind of external communication message from these AVs, to compensate for the absence of an accountable operator. A message acknowledging that they had been detected by the AV was rated highest by this group of 664 respondents, interviewed across Greece, France and Switzerland.
As a follow-on to some of the human factors questions addressed in the CityMobil2 project, interACT is conducting further work in this context, to enhance knowledge in the field, and to improve the interaction of AVs with both the on-board user and pedestrians by:

  • Developing psychological models of interaction as the basis for the development of a “Cooperation and Communication Planning Unit”, a central software unit for the integrated planning of intuitive AV interaction, based on the AV behaviour, and explicit communications with its on-board user and other traffic participants;

  • Enhancing methodologies for intention recognition and behaviour prediction of other road users, to allow shared situation awareness, and coordinated and safe vehicle behaviour planning;

  • Establishing a safety layer for all situations, in which interaction is not possible/not safe enough, or in case of interaction failures, i.e. due to misinterpretations;

  • Developing novel fail-safe trajectory planning methods, with a special focus on complex mixed traffic scenarios;

  • Establishing new evaluation methods for studying interaction of road users with AVs, and ensuring user acceptance.

Efforts are currently underway by project partners in Germany, the United Kingdom, Italy and Greece to achieve the above goals. This chapter reports on the work achieved in the first year of the project, which has included: (i) extensive observation and interview studies conducted in current urban environments, noting the types of interactions and communications taking place between pedestrians and drivers, focusing particularly on low-speed environments and un-signalised junctions; (ii) LiDAR- and video-based analysis to obtain kinematic data on road users’ interactions in a complex setting; (iii) human-in-the-loop virtual reality studies, to understand pedestrians’ crossing behaviour in response to vehicles travelling at various speeds, investigating whether different types, positions and colours of externally presented messages from AVs affect crossing behaviour; and (iv) mathematical modelling techniques, used to inform AV developers of the types of interactions expected by other road users, and how these can be managed by the AV, to create better traffic flow, and a fairer, yet more cooperative, relationship between the different road users sharing the same road space.

The next sections provide a short overview of each of the above investigations, summarising our current understanding of the state of the art, and briefly comparing these to related studies in this context.

2 Human Interactions and Negotiations in Current Urban Settings

2.1 Pedestrian-Driver Interaction at Un-Signalised Junctions

To understand how drivers and pedestrians currently interact with each other at un-signalised junctions, where negotiations are necessary in the absence of clear infrastructure-based guidelines, such as traffic lights and zebra crossings, we started our investigations by observing current behaviour in urban settings across three European cities. A series of on-road observations were conducted in: Leeds, UK; Athens, Greece; and Munich, Germany; which were also accompanied by bird’s-eye view video recordings of the junctions (see Fig. 1).

Fig. 1. An aerial view of the intersections used in Leeds (left), Athens (middle) and Munich (right). Yellow arrows represent the location and direction of pedestrians’ crossings. The blue and green lines represent the direction of travel for vehicles. The red stars represent the location of the pair of observers who used the mobile app to record observed behaviour (for further details see [2]).

Following a project workshop amongst the partners, effort was made to find a similar setting across the three cities, although practicalities regarding ease of data collection, erection of video cameras, and geographical differences made an exact match challenging. The main criterion was that observations should be based at un-signalised locations that encourage “jaywalking”, in order to assess any negotiation tactics used in the absence of formal traffic rules. Extensive effort was then invested by partners to create an easy-to-administer, HTML-based observation app (see [2, 4]), which was comprehensively piloted before data collection, to ensure researcher familiarity. Two observers were then positioned at designated locations in each city, and recorded any observable behaviour by the pedestrians, drivers and their vehicles, using the app. Communication between the observers took place throughout data collection, and the types of data recorded included: body signals from the pedestrians and drivers (hand/looking behaviour), observable messages from the vehicle (such as flashing lights or honking horns), and any “negotiation tactics” by either actor with regard to the crossing manoeuvre, such as stopping, decelerating, or crossing the road. The app also allowed recording of road user demographics (gender and age category), road details (exact location) and weather. For a more comprehensive overview of the observation protocol, see [2, 5].

Data from 989 pedestrian interactions were collected in these observations. Overall, results from these studies showed quite similar behaviour by all road users, regardless of the country studied. An interesting observation, also confirmed by the work of others in this context [7, 9, 10], was the distinct lack of explicitly observable gestures by the pedestrians and drivers, with fewer than 4% of pedestrians and 3% of drivers using hand or head gestures during the negotiations. Honking and flashing lights were only seen in 1% of the interactions. Instead, results suggest that pedestrians may use the vehicle’s behaviour to determine their crossing decision, crossing once they ascertained that the vehicle was yielding. Indeed, a follow-on questionnaire study, administered to a subset of the pedestrians (~20%) after they crossed the junction, confirmed this prediction [2]. Another interesting observation across all sites was that, overall, only 72% of pedestrians looked towards the vehicle as they crossed the road. This could pose an interesting challenge for an AV attempting to establish, before approaching a shared location, whether pedestrians have noticed its presence, when their body language provides no obvious clue.

2.2 Results from Video- and Lidar-Based Data Analysis

In addition to the observation studies outlined above, video recordings of the interactions were made by placing cameras in an elevated position overlooking the junctions, as shown in Fig. 1. Computer vision was used to develop detection, classification and tracking algorithms, combined with camera calibration and homography, to extract kinematic data from the observed traffic participants (see also Sect. 4.2). A ground-based LiDAR was used to record the positions of traffic participants over time, reducing the reliance on kinematic data extracted from the videos, and removing any personal data. However, the ground-based LiDAR was occasionally obstructed, so the link between these recordings and the manual observations was key to providing a more holistic overview of the interactions.
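As an illustration of this pipeline step, the sketch below maps tracked pixel positions onto the ground plane via a homography and differentiates them to estimate speeds. The 3×3 matrix used here is purely illustrative (a plain pixel-to-metre scaling); the project's actual calibration is not reproduced.

```python
import numpy as np

def pixel_to_ground(H, pts_px):
    """Map an Nx2 array of pixel coordinates onto the ground plane
    using a 3x3 homography H obtained from camera calibration."""
    pts = np.hstack([pts_px, np.ones((len(pts_px), 1))])  # homogeneous coordinates
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                 # divide by the w component

def track_speeds(H, track_px, dt):
    """Estimate ground-plane speeds (m/s) for a pixel track sampled every dt seconds."""
    xy = pixel_to_ground(H, np.asarray(track_px, dtype=float))
    v = np.diff(xy, axis=0) / dt                          # finite-difference velocities
    return np.linalg.norm(v, axis=1)

# Illustrative homography: a pure scale of 0.05 m per pixel (a real H would
# also encode rotation and perspective distortion).
H = np.diag([0.05, 0.05, 1.0])
speeds = track_speeds(H, [[0, 0], [20, 0], [40, 0]], dt=0.1)  # ~10 m/s per step
```

In practice the homography would be estimated from surveyed reference points in the camera image, and the raw finite-difference velocities would be smoothed before analysis.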

Analyses of the LiDAR and video data are currently ongoing, although preliminary results suggest a velocity threshold for interactions: drivers mostly granted pedestrians right of way when their travelling speed was already well below the speed limit. This more cooperative behaviour from drivers was mostly observed in congested traffic, and during tailbacks at signalised intersections, because drivers had already reduced their travelling speed, and it was therefore easier to offer a clear path to the jaywalking pedestrians. As discussed further in Sect. 4.2, such interactions will be quite problematic for current AVs, as an approach by jaywalking pedestrians will likely trigger a yielding action by the AV, to avoid collision, which in turn will likely cause a more erratic flow of traffic, especially for other, human-controlled, non-automated vehicles.

3 Using Virtual Reality to Study Human Interaction with Future AVs

Recently, a number of vehicle manufacturers, keen to deploy driverless “robotaxi”-style AVs without a responsible human controller, have begun discussing the benefit of implementing some type of external message on an AV to inform pedestrians about its behaviour, replacing any human-based communication [3]. Our human factors work from the CityMobil2 project provides rather mixed results about the best type of message to use in this context. That study showed that, although pedestrians in Greece, Switzerland and France were all keen to receive some sort of information from the AV, preferences for visual versus auditory messages were rather mixed across the different groups. The type of message preferred was also linked to the behaviour depicted by the AV, and varied with cultural norms, as well as the different infrastructures available to the AV [8].

Overall, however, respondents from this project preferred the use of conventional signals (lights and beeps) to text and spoken words, and wished to receive either visual or auditory signals announcing whether or not the vehicle was turning/yielding/beginning to move. Other work in this area has begun to investigate the matter further, testing a variety of driving conditions to establish the efficacy of such external messages (e.g. [6]). In addition to the above examples, studies have investigated the value of messages used to express: whether or not it is safe for the pedestrian to cross, whether AVs that look like conventional vehicles should signify their automation status, and whether particular types, colours, and locations of lighting are better than others [9,10,11]. Results have been mixed, with some studies showing major changes in crossing behaviour, such that pedestrians’ receptivity towards AVs significantly increased in the presence of external HMIs [12], and others, for example [10], suggesting that pedestrians rely on the behaviour of the vehicle rather than the information on the external HMI.

In the absence of easily accessible (fully) driverless vehicles that can be used to portray different types of external interfaces for communication with pedestrians, tools such as Wizard-of-Oz techniques [13], human-in-the-loop pedestrian simulators [13], and immersive Virtual Reality (VR) Head-Mounted Displays [14] provide a suitable alternative for cost-effective, controlled and repeatable research studies in this context. In the interACT project, VR has been very effective for such research, evaluating and improving potential interaction strategies between humans and future AVs. Here, design-focused workshops with expert and naïve participants have been used to visualise potential solutions, with relatively minimal effort spent on defining and refining external HMIs (eHMIs), before deploying them in actual user studies on prototype vehicles [15].

VR offers the opportunity to study human-AV interaction, for assessing the speed and quality of comprehension of AV behaviour, or for assessing traffic participants’ behaviour or emotions in response to the AV. For example, Head-Mounted Displays (HMDs) have been used in the project to assess pedestrians’ actual crossing behaviour in VR, in response to vehicles with different kinematic features [2]. This type of manipulation is useful for evaluating participants’ feelings of safety when interacting with an AV, and for assessing the efficiency/receptivity/learning effect of different eHMI designs. Such studies also provide knowledge for choosing the most appropriate time gaps and conditions for testing future eHMIs.

For example, in a study by [16], participants saw a pair of vehicles approaching from the right, and were asked to cross the road after the first vehicle had passed. The approach speed of the second vehicle was manipulated (25 mph, 30 mph or 35 mph), and the time gap between the two vehicles ranged between 1–8 s (in 1 s increments). In addition, the second vehicle either decelerated as it approached the pedestrian, or did not. Data from the decelerating trials showed that 51% of crossings happened before the second vehicle decelerated, and 31% after the approaching vehicle had stopped, with only 18% of crossings happening during the deceleration. These results are also confirmed by the modelling work of [31] in this context.

Previous Wizard-of-Oz studies investigating pedestrians’ responses to “fake AVs” have shown that pedestrians did not feel comfortable, or safe, crossing the road in front of a specially customised vehicle in which the driver (sitting behind a fake steering wheel in the passenger seat of the modified vehicle) was seen to be asleep or deeply engaged in reading a newspaper [17]. In the interACT project, we investigated this matter further, using an HMD VR-based study in which participants were asked to cross the road in a set-up similar to that of [16] described above (see [19]). To establish whether driver presence and attention affected crossing behaviour, the second vehicle in this study (which always travelled at 30 mph) was presented in three different conditions (no driver, a distracted driver looking down, and an attentive driver looking straight ahead). To confirm that the VR setup was realistic, and that the drivers were actually visible, all participants completed a short set of trials at the end of the experiment, pressing a button on the controller to confirm the presence or absence of a driver in the vehicle. Although pedestrian crossing behaviour was not affected by the three conditions, follow-on questionnaires on perceived behavioural control and perceived risk [18] showed that the “driver present” conditions were rated higher than the driver-distracted/driver-absent trials [19].

Finally, the interACT project’s VR studies of explicit communication by AVs, which have utilised various visual message strategies such as ground projections, directed signal lamps and LED bands, have thus far shown that such messages reduce pedestrians’ crossing initiation times. Although objective measures showed little difference in crossing initiation time between the different concepts, participants reportedly preferred animations and symbols to static images [20] (Fig. 2).

Fig. 2. Depiction of one of the VR studies, showing a participant with the HMD and the road environment used for studying crossing behaviour.

4 Computational Models

4.1 Neurobiologically-Informed Mathematical Models

Another, even more concrete, way of describing how road users behave when they interact with each other in shared space, is to develop mathematical models that permit computer simulations of the interaction behaviour. Traffic microsimulation is a well-established field of research, and commercial software products exist that permit traffic simulations that are accurate on the scale of a large junction or a city centre, for example, to predict how a range of alternative road infrastructure designs will affect traffic throughput [21, 22]. The road user behaviour models in these traffic simulations are, however, not designed to capture the details of local interactions, and this underdeveloped area is now garnering increasing attention, with some modellers approaching it from a traffic microsimulation starting point [24, 25], and others addressing it as a data-driven machine learning challenge [26,27,28].

In the interACT project, a third type of approach has been taken, partly because the aim has been very specific: to generate useful insights and tools for the AV-human interaction design work in the project. To this end, a novel modelling framework has been proposed [29], building on psychological and neuroscientific models of decision-making, such as evidence accumulation [23, 29, 30]. The benefit of this type of framework is that it allows the model to integrate sensory evidence, both from AV movement and eHMI messages, in a manner that is both mathematically straightforward and neurobiologically plausible. This model has been applied to a pedestrian crossing decision, qualitatively reproducing the empirically observed tendency of pedestrians to either cross early in front of a yielding vehicle, or otherwise wait until the vehicle has come to a complete stop [16, 31]. This type of model can also be used to study the efficiency of AV-human interactions [31]. Figure 3 shows how this tentative model predicts a considerable traffic flow benefit, both from the AV providing an eHMI message signalling yielding (panel b compared to panel a), and from the AV slightly exaggerating its yielding deceleration (moving upward along the y axis). Currently ongoing, but not yet published, work has shown that this type of model can be successfully fitted to observed human behaviour, in both vehicle-pedestrian and vehicle-vehicle scenarios.
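To make the evidence-accumulation idea concrete, here is a minimal drift-diffusion-style sketch. It is a simplification for illustration, not the actual model of [29, 31], and all parameter names and values below are assumptions: momentary evidence that the vehicle is yielding, from its deceleration and optionally an eHMI message, is integrated noisily until a decision threshold is reached.

```python
import numpy as np

def crossing_decision_time(dec_cue, ehmi_on, threshold=1.0, dt=0.05,
                           gain_dec=1.0, gain_ehmi=0.5, noise_sd=0.0, seed=None):
    """Accumulate noisy 'vehicle is yielding' evidence over time.
    dec_cue: sequence of momentary deceleration cues (one value per time step).
    Returns the time (s) at which the accumulated evidence first reaches the
    decision threshold (the pedestrian decides to cross), or None if it never does."""
    rng = np.random.default_rng(seed)
    A = 0.0
    for i, e in enumerate(dec_cue):
        drift = gain_dec * e + (gain_ehmi if ehmi_on else 0.0)   # movement + eHMI evidence
        A += drift * dt + noise_sd * np.sqrt(dt) * rng.standard_normal()
        A = max(A, 0.0)            # evidence bounded below at zero
        if A >= threshold:
            return (i + 1) * dt
    return None                    # no decision within the simulated window

# With a constant deceleration cue, adding an eHMI yielding message
# speeds up the (noise-free) decision:
t_no_ehmi = crossing_decision_time([0.5] * 100, ehmi_on=False)
t_ehmi = crossing_decision_time([0.5] * 100, ehmi_on=True)
```

With `noise_sd > 0` the decision time becomes a random variable, which is what lets this class of model reproduce the bimodal cross-early-or-wait pattern described above when the cue trace follows a realistic yielding profile.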

Fig. 3. Results of simulations with the pedestrian crossing model proposed by [31], showing the 80th percentile of the time lost by the AV due to the interaction (i.e., how much earlier the AV would have arrived at its destination had the pedestrian not been present), as a function of the time left to the pedestrian crossing when the vehicle initiates yielding (TTC), and the magnitude of the yielding deceleration. Panels (a) and (b) show results without and with an eHMI indication of yielding, respectively. Figure from [31]. Copyright © 2018 National Academy of Sciences. Reprinted by permission of SAGE Publications, Inc.

4.2 Using Game Theory to Understand the Interaction Between Pedestrians and AVs

Controlling autonomous vehicles in the presence of pedestrians, when they are competing for the same space, requires an understanding of the processes of interaction and negotiation between them. Game theory provides a formal basis for modelling such multi-agent competitive interactions. For instance, [32] constructed a mathematical model of interactions between two such agents approaching an unmarked intersection, a technique which can be adapted to a range of scenarios, such as pedestrians crossing the road, vehicle-vehicle interactions, and pedestrian-pedestrian interactions. The model is based on the “game of chicken” of game theory, extended into a temporal model, and is deliberately simplified as much as possible, to illustrate the core idea, using a coarse discretisation of both space and time. At each “tick”, a game is played in which the two agents can choose to yield, by moving forward slowly, or to assert themselves, by moving forward quickly. They approach each other in this way, and will collide unless one of them yields. The mathematics of the model shows that when both players have the same utilities, the optimal strategy is the same for both players, and is probabilistic: at each time step, they should flip a biased coin and yield if they get heads, with the bias of the coin moving towards certain yielding as time runs out. The incentive to cooperate and avoid disaster thus grows with time. The model further shows that if the utilities are modified so that one player survives a collision better (such as a vehicle versus a pedestrian, or an SUV versus a smaller car), then even a small change in these utilities will break the symmetry of the model, and give that player a relatively high chance of winning every interaction.
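As a toy illustration of the probabilistic strategy described above, the sketch below computes the mixed-strategy equilibrium of a one-shot, symmetric game of chicken and simulates a single tick. This is not the temporal model of [32]; the payoff structure (a win payoff for asserting alone, a crash cost if both assert, zero for yielding) and all values are illustrative.

```python
import random

def assert_prob(win, crash_cost):
    """Mixed-strategy equilibrium probability of asserting in one-shot chicken.
    Derived from the opponent's indifference condition between asserting and
    yielding: p*(-crash_cost) + (1 - p)*win = 0  =>  p = win / (win + crash_cost)."""
    return win / (win + crash_cost)

def play_tick(p1, p2, rng=random):
    """One 'tick': each player independently asserts with its own probability.
    Returns 'crash', 'p1' (player 1 wins), 'p2', or 'both_yield'."""
    a1 = rng.random() < p1
    a2 = rng.random() < p2
    if a1 and a2:
        return "crash"
    if a1:
        return "p1"
    if a2:
        return "p2"
    return "both_yield"

# A higher crash cost implies a lower equilibrium probability of asserting,
# i.e. a more cautious strategy:
p_low_cost = assert_prob(win=1.0, crash_cost=9.0)    # 0.1
p_high_cost = assert_prob(win=1.0, crash_cost=99.0)  # 0.01
```

The temporal model in the chapter effectively replays a game of this kind at every tick, with the yielding bias growing as the agents close in on the conflict point.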

The above model has been tested in experimental laboratory settings (e.g. [33]), where a board game was played by seated participants to model the same collision scenario. This study showed that the players’ behaviour can be fitted by the model, via a Gaussian Process over its parameters. The authors then extended this setup to a more realistic, but still heavily constrained, physical laboratory experiment, with participants walking towards each other in discrete time and space, using the same methods to fit the model parameters.

Overall, results from this model suggest that AVs must retain an ability to deliberately cause some sort of harm to other road users, in order to make any progress at all. This model has been included in a consultation paper on autonomous vehicles, currently in circulation by the UK Law Commission [34], and may contribute to changing UK law, to ensure a fairer relationship between AVs and other road users.

It can be argued that the “chicken model” described above is quite crude in its assumptions, especially as it does not yet include any ability for the players to signal information to each other other than via their speeds and positions. However, real-world interactions may include many such signals, such as eye contact, head direction, and body language, which, as shown in our own observations, are particularly useful for resolving conflicts. To begin to form an understanding of these signals, for later use in the chicken model, we used the video recordings from Leeds outlined in Sect. 2.2 to investigate which of a bank of such signals are useful for predicting the final outcome of pedestrian-car interactions [35].

This basic model was later extended to a temporal filtration model, which shows how the probability of each player winning the game evolves over time as a function of the signals provided [36]. Future work will aim to integrate these models, using signalling behaviour as an additional input to the game-theoretic mathematics to refine its solutions, and then to test them in virtual reality and in physical AV-human experiments.

5 Summary and Conclusions

This chapter provides an overview of the complex relationship that exists between different road users in a mixed traffic environment, and summarises the preliminary results of a set of behavioural studies conducted in this context, as part of the European interACT project. It highlights the value of using different methodologies to understand the behaviour of road users in current traffic settings, illustrating the complexity of road user behaviour, and the influence of infrastructural, social, and cultural norms. Knowledge is currently building on how new methods can support the innovation of new forms of communication and interaction for future AVs, such as the use of external Human Machine Interfaces to replace the communication currently provided by human drivers. The challenge here is to ensure that AVs’ manoeuvres in mixed settings are safe, and therefore acceptable to all road users. However, it is also important to ensure that AVs’ behaviour and progress are not restricted by their currently limited obstacle detection rules, which would reduce their ability to achieve a smooth and uninterrupted journey. As more AVs are introduced for testing in such settings, human interactions with them are also likely to change, through some level of Behavioural Adaptation [37]. This increases the likelihood that more knowledge will be gained by both AV developers and human road users, ensuring that these two actors can cooperate more efficiently in the future urban environment.