1 Introduction

The increasing level of automation in vehicles strongly influences the role of the human in the car. With the introduction of conditionally automated driving (L3), the driving task is completely handed over to the automated driving system (ADS), while the human operator remains ready to intervene in response to ADS-issued requests or system failures [25]. The associated paradigm change from the human as operator, e.g. in partially automated systems (L2 [25]), to the human as passenger affects the design of the human-machine interface (HMI) in the car [19]. Transitions between higher and lower levels of automation require an HMI that facilitates the interaction between the human in the car and the ADS, with a strong focus on communicating who is currently responsible for the driving task.

Automotive HMIs generally comprise output channels (e.g. displays, auditory signals), input channels (e.g. buttons, pedals), and a dialogue logic to ensure the appropriate interaction between drivers and their vehicles [2]. While the field of automated driving has seen developments into speech-based HMIs or haptic feedback, the following paper and study design focus on the development and evaluation of a mainly visual/manual HMI.

In a literature review, [1] identify common research methods for usability assessments of ADS HMIs. The authors critically discuss the findings along the structure of study characteristics and derive best practice advice for planning a usability evaluation. Among other methods, [1] propose conducting usability assessments in driving simulators, pointing out that driving simulators are an efficient and risk-free alternative to real-driving environments [4]. However, [1] note that results of usability assessments of ADS HMIs obtained in driving simulators have not yet been reviewed for their validity, i.e. their transferability to the real world.

Another important aspect is the potential impact of culture on the usability assessment. A growing body of research stresses the importance of culture when designing or evaluating user interfaces in general [12] and when assessing the usability of automotive HMIs in particular [16]. Therefore, cultural effects should be kept in mind when developing methodological recommendations for usability testing in the context of ADS HMIs.

This paper develops a study design based on the advice on user studies for usability evaluations provided by [1]. The study design presented will be applied to a series of four experiments within an ongoing project. The project pursues three objectives: (1) to evaluate the best practice advice by [1] in practical use; (2) to assess the validity of driving simulators; and (3) to investigate the influence of cultural factors on usability assessments. To conclude, the project will propose a practical approach to usability testing of ADS HMIs that covers different constructs of usability and appropriate dependent variables within their application areas. This paper presents the first step in this project. It comprises the development of two HMI concepts serving as research subjects and outlines a study design, describing the challenges of applying it to four experiments with varying test settings.

2 Design of Experiment

This chapter covers the study design that will be applied in a series of four experiments. The design practically applies the best practice advice provided by [1]. Therefore, the structure of this chapter takes up the structure of their paper and comprises the subsections Definition of Usability, Sample Characteristics, Test Cases, Dependent Variables, Conditions of Use, and Testing Environment. Additionally, the decision for two HMI concepts, their development and design are presented (Human-machine interfaces). The procedure of the experiment is outlined in subsection Procedure.

The study is developed to be suitable for the application in four different experiment settings. These cover one driving simulator experiment and three test track experiments in different countries. The study design considers constraints due to safety aspects or resources available at the different testing sites, e.g. length of test tracks or surrounding traffic. This results in four highly comparable experiments that feature only necessary and minor differences, e.g. language adaptations. The repetition of the study design will allow conclusions on the impact of the testing environment and potential cultural effects to be drawn.

2.1 Definition of Usability

This study design applies ISO 9241, which defines usability as the “extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” [14] (p. 2). Furthermore, the NHTSA minimum requirements for an HMI for automated driving are considered in the study design. The requirements state that an HMI must be designed in such a way that the user understands that the ADS is “(1) functioning properly; (2) currently engaged in ADS mode; (3) currently “unavailable” for use; (4) experiencing a malfunction; and/or (5) requesting control transition from the ADS to the operator” [23] (p. 10). The definitions will be applied to the selection of test cases [23] and dependent variables [14, 23]. The resulting usability assessment is limited to the basic functions provided by an ADS. The results explore the participants’ understanding of the ADS and their interaction with it.

2.2 Sample Characteristics

The target sample for this study design represents the potential user population, i.e. the car-driving population. For other target populations, different sample distributions might be appropriate. As recommended by [20], affiliations with study-related organisations or the tested HMI are avoided. The majority of participants shall have little or no experience with automated driving. By testing naïve participants, the intuitive usability of the ADS HMIs can be assessed. The age range is between 18 and 75. The goal is to ensure an even distribution that covers different age groups such as the age groups proposed by the NHTSA visual-manual distraction protocol (18–24, 25–39, 40–54, > 54) [22]. Gender distribution is balanced. A sociodemographic survey inquires about further aspects such as visual impairments. Data on driving experience in general, prior experience with driver assistance systems, and familiar vehicle brands are recorded. The samples should closely resemble each other across the different experiments to ensure comparability. Because the target sample varies widely in its characteristics, the size of subgroups, e.g. age groups, is not sufficient for inferential statistical analysis. Nevertheless, important trends could be uncovered that motivate future research.
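The check for an even age distribution can be sketched as a simple binning step. The following is a minimal illustration only: the group boundaries follow the NHTSA age groups and the 18 to 75 age range stated above, while the function names `age_group` and `group_counts` are hypothetical and not part of the study design.

```python
# Age groups as listed in the text (NHTSA protocol [22]); the upper bound of
# the last group is taken from the study's stated age range (18 to 75).
AGE_GROUPS = [
    (18, 24, "18-24"),
    (25, 39, "25-39"),
    (40, 54, "40-54"),
    (55, 75, ">54"),
]


def age_group(age):
    """Return the label of the age group that contains `age`."""
    for lower, upper, label in AGE_GROUPS:
        if lower <= age <= upper:
            return label
    raise ValueError(f"age {age} is outside the target range 18-75")


def group_counts(ages):
    """Count participants per age group, including empty groups."""
    counts = {label: 0 for _, _, label in AGE_GROUPS}
    for age in ages:
        counts[age_group(age)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical recruited sample, for illustration only.
    print(group_counts([19, 30, 41, 60, 62]))
```

Comparing the resulting counts across the four experiments gives a quick view of how closely the samples resemble each other.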

2.3 Test Cases

The selection of test cases comprises mostly non-critical situations due to safety aspects of the test track experiments. Critical situations, e.g. with a limited time budget for take-overs, are important for safety-related assessments of ADS, such as controllability assessments [11], and have a low probability of occurrence. For evaluating the usability, especially the constructs efficiency and satisfaction [14], frequently recurring situations are of greater importance. The test cases cover standard situations, i.e. transitions between different automation modes and changes in the availability of automation modes as recommended by [1]. This allows an assessment of the basic functions provided by an ADS. Additionally, one critical situation requiring immediate intervention by the participant (TC12) is included. The selection of test cases allows conclusions related to the NHTSA minimum requirements [23]. Table 1 shows the assignment of the NHTSA minimum requirements to the specific test cases based on the information provided by the HMI concepts.

Table 1. Description of the twelve test cases and their linkage to the NHTSA minimum requirements [23] (p. 10): “(1) functioning properly; (2) currently engaged in ADS mode; (3) currently “unavailable” for use; (4) experiencing a malfunction; (5) requesting control transition from the ADS to the operator.”

In the HMIs to be tested, information on the active automation mode and the availability of the different automation modes is constantly displayed. Therefore, in all test cases the participant receives information on the first three requirements “functioning properly”, “currently engaged in ADS mode”, and “currently unavailable for use”. The requirements “experiencing a malfunction” and “requesting control transition from the ADS to the operator” are addressed by two test cases each. No permutation of the test cases is planned because specific test cases build on preceding test cases, e.g. a take-over request requires the prior activation of L3 automated driving.

2.4 Human-Machine Interfaces

The two HMI concepts serve as the research subjects. The concepts have the purpose of increasing the variance of results within each experiment. This provides insights into relative validity and identifies metrics sensitive to differences in HMI design. In two previous within-subject studies [9, 10], two HMI concepts were tested that varied in their compliance with several items (Items 2, 3, 7, 8, 9, and 14) of [21]. The study results confirmed differences between the two concepts in both behavioural and self-reported measures of usability and acceptance. Therefore, a similar procedure is applied here.

The concepts are based on the HMI of [5] and adjusted for the twelve test cases and the three automation modes L0, L2, and L3 [25]. Both concepts provide information on the active automation mode, the availability of automation modes and possibly malfunctions and transition requests. Infotainment is displayed on the right side of the HMIs, though it is not functionally implemented. One HMI concept was designed following recommendations of the NHTSA minimum guidelines [23] and the HMI guidelines listed by [21], therefore called high-compliance HMI. The HMI is limited to a mainly visual HMI, comprising the instrument cluster, LED-strips on the steering wheel and warning sounds. The other concept comprises only the instrument cluster. It deviates from the high compliance HMI by deliberately violating eight items of the guidelines of [21], therefore called low-compliance HMI. Figure 1 shows snapshots of the English HMI concepts visualising the differences. The violations concern the effective communication of transitions (Item 3), the functional grouping of icons and notifications (Item 5), the colour contrast (Item 7) and the general colour selection of symbols (Items 14, 15), the size and style of texts and icons (Item 8), the supplement of non-standard symbols with text explanations (Item 9), and the multimodality of high-priority notifications (Item 18) [21].

The HMI is controlled by two buttons on the steering wheel. The left button allows the transitions L0 → L2, L2 → L0, and L3 → L0. The right button toggles L2 ↔ L3. When the right button is pressed in L0, the high-compliance HMI provides textual feedback on its function, while the low-compliance HMI does not show any reaction. Additionally, the participant can deactivate L2 and L3 by steering or braking.
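The control logic described above can be read as a small state machine. The following sketch is one possible reading, not the implementation used in the vehicles: it assumes that deactivating L2 or L3 by steering or braking returns the system to L0, it ignores the availability of modes, and the names `Mode`, `TRANSITIONS`, and `next_mode` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    L0 = "manual driving"
    L2 = "assisted driving"
    L3 = "automated driving"


# (current mode, input) -> next mode. Button transitions follow the text;
# the target mode for steering/braking (L0) is an assumption.
TRANSITIONS = {
    (Mode.L0, "left"): Mode.L2,
    (Mode.L2, "left"): Mode.L0,
    (Mode.L3, "left"): Mode.L0,
    (Mode.L2, "right"): Mode.L3,
    (Mode.L3, "right"): Mode.L2,
    (Mode.L2, "steer"): Mode.L0,
    (Mode.L2, "brake"): Mode.L0,
    (Mode.L3, "steer"): Mode.L0,
    (Mode.L3, "brake"): Mode.L0,
}


def next_mode(mode, event):
    """Return the mode after `event`; undefined inputs leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)


if __name__ == "__main__":
    mode = Mode.L0
    for event in ["right", "left", "right", "brake"]:
        mode = next_mode(mode, event)
        print(event, "->", mode.name)
```

In this reading, the right button has no effect on the mode in L0, which is the case where the two HMI concepts differ in the feedback they give.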

An expert assessment was conducted with six researchers who had worked in the field of HMIs for three to seven years (M = 4.5). First, the experts assessed the two HMI concepts using ten heuristics collated from [24] and [21] and rated the severity of violated heuristics. Afterwards, the experts were interviewed on the colours, icons, and the icons’ positioning. The experts were able to give further feedback and comments in the final interview. The results confirmed the different degrees of compliance of the two HMI concepts. Improvement suggestions were implemented to further increase the difference in compliance between the concepts. The control logic (toggle) of both HMI concepts was criticised by two experts. However, it was not changed due to technical constraints and because both HMI concepts use the same control elements and logic.

Each participant experiences one HMI concept and provides data on its usability. A between-subject design is chosen to avoid learning effects, which are expected to be considerable due to the similarity of the concepts’ basic structure.

Fig. 1. Snapshots from high-compliance HMI (left) and low-compliance HMI (right) just after a transition to L2 (top) and in the middle of a planned take-over request by the ADS (bottom). Items violated in the low-compliance HMI are indicated with their respective number [21].

2.5 Dependent Variables

The experiment collects both self-reported and observational data. Table 2 provides an overview of the dependent variables and connects them to the items of [21] violated in the low-compliance HMI that potentially affect the dependent variables. Furthermore, the dependent variables are associated with the NHTSA minimum requirements [23] and the constructs of effectiveness, efficiency, and satisfaction of the ISO 9241 [14]. This allows a more in-depth assessment of the usability of the HMIs.

Observational Measures.

Eye-tracking data is collected to calculate the attention ratios (percentage of time spent on an area of interest) for the street, the instrument cluster, the control buttons on the steering wheel, and the tablet. The Surrogate Reference Task [13] on the tablet serves as a non-driving related activity that is only permitted when driving in L3 automation. In automated driving research, attention ratios are used to assess trust [17] or mode awareness [6]. In this study design, attention ratios are applied to reveal whether the HMI is effectively communicating the active automation mode to the participant. Furthermore, gaze paths, gaze attention times, glance numbers and glance durations are analysed for test cases containing notifications by the HMI, indicating how efficiently users receive the information.
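The attention ratio defined above, i.e. the percentage of time spent on each area of interest, can be sketched as follows. The input format (a list of glances with an area label and a duration in seconds), the area labels, and the function name `attention_ratios` are assumptions made for illustration; actual eye-tracking exports will differ.

```python
from collections import defaultdict


def attention_ratios(glances):
    """Return the share of total glance time per area of interest, in percent.

    `glances` is an iterable of (area_of_interest, duration_in_seconds) pairs.
    """
    totals = defaultdict(float)
    for area, duration in glances:
        totals[area] += duration
    total_time = sum(totals.values())
    if total_time == 0:
        return {}
    return {area: 100.0 * t / total_time for area, t in totals.items()}


if __name__ == "__main__":
    # Hypothetical glance data for one test case, for illustration only.
    sample = [
        ("street", 20.0),
        ("tablet", 24.0),
        ("instrument_cluster", 6.0),
        ("street", 10.0),
    ]
    for area, ratio in attention_ratios(sample).items():
        print(f"{area}: {ratio:.1f} %")
```

Here the ratios are normalised to the total glance time in the data; normalising to the full recording window instead would also account for time spent outside all areas of interest.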

Table 2. List of the dependent variables and their linkage to the items of [21] violated in the low-compliance HMI, the linkage to the three constructs of usability (a) effectiveness, (b) efficiency, and (c) satisfaction of the ISO 9241 [14] and the linkage to the NHTSA minimum requirements [23] (p. 10): “(1) functioning properly; (2) currently engaged in ADS mode; (3) currently “unavailable” for use; (4) experiencing a malfunction; (5) requesting control transition from the ADS to the operator.”

Button presses for transitions, braking and steering behaviour are recorded. Takeover times and hands-off detections during L2 are analysed. The data show whether participants reach the intended goals and if they do so efficiently. The driving behaviour mostly covers the constructs of effectiveness and efficiency of usability, but also provides information on the fulfilment of the NHTSA minimum requirements [23].

After each test case, the experimenter rates the quality of the participants’ interaction with the ADS on a 5-point Likert scale ranging from “no problem” to “help of experimenter” [8].

Self-reported Measures.

Participants are requested to indicate the last active automation mode, and the availabilities of different modes. In order to investigate the mental model of the allocation of the driving task, participants are asked whether they were permitted to take their hands off the steering wheel or answer e-mails. The short interviews provide valuable information on the effectiveness of the HMI concept and whether it comprehensively communicates the currently active automation mode and availability of other automation modes. When changing between automation modes, the experimenter asks about problems and encourages the participant to express feedback and thoughts.

After completion of the test drive, the participants fill out the system usability scale [3], the usability metric of user experience [7], the user experience questionnaire [18], and single-item questions on trust and acceptance. A short interview gathers further insights into the participants’ experience with the HMI.
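As an illustration of how one of these questionnaires is typically condensed into a single value, the following sketch applies the commonly reported scoring rule of the system usability scale: ten items rated from 1 to 5, odd-numbered items contributing the rating minus 1, even-numbered items contributing 5 minus the rating, and the sum multiplied by 2.5. The study design itself does not specify a scoring procedure, and the function name `sus_score` is hypothetical.

```python
def sus_score(responses):
    """Compute a system usability scale score (0 to 100) from ten ratings.

    `responses` holds the ratings for items 1 to 10 in order, each from 1 to 5.
    """
    if len(responses) != 10:
        raise ValueError("the scale has exactly ten items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each rating must be an integer from 1 to 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += rating - 1  # odd-numbered (positively worded) items
        else:
            total += 5 - rating  # even-numbered (negatively worded) items
    return total * 2.5


if __name__ == "__main__":
    # Hypothetical answers of one participant, for illustration only.
    print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 3]))
```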

2.6 Procedure

This subsection describes the overall study procedure. The procedure follows that of typical usability studies and is intended to enable the systematic collection of multifaceted data on human-machine interaction in this complex and dynamic context.

Prior to the test drive, participants consent to the experiment and fill out a sociodemographic questionnaire followed by a familiarisation drive. Participants are informed that their simulated car is equipped with an ADS providing the three automation modes called “manual driving”, i.e. L0; “assisted driving”, i.e. L2; and “automated driving”, i.e. L3 [25]. Participants are instructed to engage in the Surrogate Reference Task [13] when L3 is active. Participants are instructed to initiate transitions only if explicitly requested by the ADS or the experimenter. The test drive comprises twelve test cases in a fixed order. Each test case starts at the beginning of the straight and ends in standstill at the turn-around for a short interview. The experiment ends with the questionnaires and the final interview.

Due to safety constraints and technical constraints of the underlying driver assistance system in the test track vehicles, participants must manually accelerate and decelerate in between the test cases. Thus, the participants are required to pre-set the automation mode that is needed for the respective test case themselves. Consequently, data collection on observational measures is limited to the centre of a straight (route metres 200 m–700 m), which excludes the participants’ manual acceleration and deceleration. Test case events such as system notifications and transition requests are triggered at three different locations along the route (325 m, 450 m, 575 m), permuted across the test cases. Neither the range for data collection nor the trigger locations are visible to the participant. The speed limit for the automation and the driver is set to 30 km/h. This results in about 60 s of data recorded for each test case.
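The stated recording time follows directly from the window length and the speed limit: 500 m at 30 km/h (about 8.3 m/s) takes about 60 s, and the three trigger locations are reached roughly 15 s, 30 s, and 45 s after entering the window. The sketch below reproduces this arithmetic. It assumes a constant speed of 30 km/h throughout the window, and the rotation used in `trigger_location_m` is only one hypothetical way to vary the locations across test cases, since the scheme is not specified here.

```python
WINDOW_START_M = 200
WINDOW_END_M = 700
SPEED_KMH = 30
TRIGGER_LOCATIONS_M = (325, 450, 575)


def travel_time_s(distance_m, speed_kmh=SPEED_KMH):
    """Time in seconds to cover `distance_m` at a constant `speed_kmh`."""
    speed_ms = speed_kmh / 3.6  # convert km/h to m/s
    return distance_m / speed_ms


def trigger_location_m(test_case_number, locations=TRIGGER_LOCATIONS_M):
    """Hypothetical rotation of trigger locations over test cases 1, 2, 3, ..."""
    return locations[(test_case_number - 1) % len(locations)]


if __name__ == "__main__":
    window_s = travel_time_s(WINDOW_END_M - WINDOW_START_M)
    print(f"recording window: {window_s:.0f} s")
    for location in TRIGGER_LOCATIONS_M:
        offset_s = travel_time_s(location - WINDOW_START_M)
        print(f"trigger at {location} m: {offset_s:.0f} s into the window")
```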

2.7 Conditions of Use

As described in the subsection Sample Characteristics, the study design is intended to cover the intuitive usability of the HMI concepts. Participants receive written information about the three automation modes of the ADS and their respective allocation of the driving task. The experimenter verbally repeats this information and answers questions. The experimenter points out the two buttons on the steering wheel needed for changing the automation modes but does not give any operating instructions. A familiarisation drive is conducted prior to the test drive. However, it does not cover handling the ADS. Consequently, the test drive collects data on the first contact with the ADS.

2.8 Testing Environment

The experiment is conducted repeatedly in different testing environments. The first experiment is conducted in a static driving simulator consisting of a BMW 6 Series convertible with front and back view projectors, providing a front field of view of about 180°. The simulation software is SILAB. The simulated track consists of a three-lane straight about 900 m in length with opportunities to turn around at both ends. Lane changes or surrounding traffic are not involved.

The simulated test track replicates the real test track used in the second experiment, which is conducted at the Universität der Bundeswehr München in Neubiberg, Germany. In the three test track experiments, the instrumented vehicle is a BMW 3 Series model equipped with Driving Assistant Professional. The vehicle is modified to enable L3 automation and the free programming of the HMI. The other two test track experiments are planned to be conducted on test tracks with similar features in the USA and Japan.

3 Limitations

The study design presented is limited to the usability assessment in terms of evaluating the users’ intuitive understanding and interaction with the ADS. Only basic functions of the ADS are covered. Furthermore, the study design is subject to several constraints that arise from the goal of maximum comparability between experiments and overall project goals. The proposed study design is applicable for all four test sites to cover intercultural aspects. However, due to safety considerations and limits in the local conditions of the different test tracks, the overall setting is rather simple, e.g. speed of 30 km/h, no surrounding traffic. To meet the claim of developing a recommendation for usability testing in the context of ADS HMIs, a large number of dependent variables is applied. This imbalance between experiencing a system and assessing it might increase the effort of the participants and negatively impact the quality of results. The HMI concepts that serve as the research subject are mainly visual. The concepts differ from each other only in the visual design and the usage of auditory warnings. Control elements and the handling are kept constant. The design of HMIs regarding their modalities and options for interaction should be subject to future considerations.

4 Summary and Outlook

This paper outlines a study design that builds on the best practice advice by [1] and adapts it to practical application in a series of four experiments in different locations. Additionally, the development process and the design of two HMI concepts are described. This paper gives an insight into the challenges of designing comparable driving experiments across different test settings. It proposes different measurements and metrics to quantify the various aspects of usability. The development of an appropriate study design is the first step in proposing a practical approach to usability testing of ADS HMIs that encompasses different constructs of usability and appropriate dependent variables within their application areas.