
1 Introduction

Military personnel often have only seconds to decide whether or not to fire [7]. Cognitive aids and training can improve firearm safety for these personnel [7]. As the military adopts new display technologies [3], there is an opportunity to develop compensatory aids that can reduce casualties. These aids may also reduce the amount of gear that must be carried or switched between in the field. For example, a map and a notepad can both be shown on a head-up display, reducing the homing time needed for users to move their attention from one device to the other.

This paper outlines a user study evaluating one such compensatory aid: a virtual reality (VR) system centered on the usability of splash-zone representations, which indicate the impact area of a weapon. The aid was developed with military use in mind; accordingly, the study tasks emulate a portion of the workflow performed by joint terminal attack controllers (JTACs). JTACs are tasked with integrating information about targets, personnel movements, aircraft coordination, and attack coordination [11]. Currently these tasks are supported by a variety of digital (e.g., tablet, phone, radio, digital map) and analog (e.g., paper pad, binoculars) technologies [9]. This system uses a VR head-mounted display (HMD) to render synthetic battlefield information in the participant's field of view, taking the place of a digital map and a communication device. That said, the system can be extended to provide the functionality of other tools used in this workflow (e.g., map annotation, notepad).

The portion of the JTAC workflow most closely implemented in and tested with this system comprised the tasks of identifying a target area, identifying the locations of objects, and identifying those objects' positions relative to the target area. The identification of a target zone, and of the objects near that zone, is often relayed to aircraft pilots whose view of the world may differ from the JTAC's. The terms "left" and "right" are too ambiguous for this scenario, especially given the intense time pressure under which JTAC operators function. For that reason, this system is designed around a north-up bird's-eye view (Fig. 1).

As technology continues to advance, the military's adoption of augmented reality (AR) and VR displays is becoming more prevalent [3]. By leveraging this emerging technology to facilitate a JTAC-style workflow, this system can reduce the need for military personnel to carry additional devices (e.g., digital map, rangefinder) by incorporating them into a single system. While optical see-through solutions such as AR HMDs can provide more contextual awareness, their limited field of view and difficulty rendering objects in bright areas make them sub-optimal for testing this system. For that reason, the system was developed on a VR-HMD, which allows a wider field of view and more exact control over the visibility of rendered objects. Most importantly, this system offers a salient and accessible means of showing target zones, a style of compensatory aid that could substantially benefit end-users in the field.

This study's objectives were:

  • To determine how effectively participants could use the visualization to assess whether entities or objects (e.g., vehicles, troops) were inside the target zone (Fig. 1)

  • To assess the extent to which the visualization features might inhibit other important judgments, such as recognition of the entity and its relative bearing from the center of the zone

  • To examine the learning of these judgments across the experimental session

Fig. 1. Example of experimental stimulus.

2 Related Work

The military has pursued VR-HMDs as training aids for technical job requirements. In support of that, the 2018 "Joint Fire Support Executive Steering Committee" report specifies that some portion of JTAC live training can be replaced by virtual simulations [4]. VR-HMD training for military applications has seen some success: Sui et al. (2016) found that VR training can effectively retrain military medical professionals on operational skill-sets that have degraded through infrequent use [10].

Within the context of JTAC workflows, prior work has shown that a combination of human-factors design and AR-HMDs can be used successfully in JTAC workflow-related tasks [11]. However, that work primarily tested the feasibility of various information displays, as measured by human-factors evaluations. Where that work focused on the feasibility of information displays, this work focuses on their usability.

The study of VR-HMD compensatory aid design has been somewhat limited. Work on battlefield aids usually focuses on AR-HMDs and mobile AR (i.e., a cell phone display with video pass-through) [1, 5]. One such aid, using mobile AR on commercially available phones, was shown to be usable for in-field display of sensor-fusion information and to be a militarily viable option [5]. Other work has shown that AR-HMDs can be beneficial as navigation aids [1], as well as for identifying the locations of troops obscured by buildings [6, 8].

This work is inspired by the positive results seen when using VR-HMDs for military skill-set training and by the use of similar technologies as in-field compensatory aids. This paper represents a push forward in evaluating the real-time use of a virtual battlefield aid by establishing the acceptance and effectiveness of the aid in a controlled environment. It differentiates itself from prior work by examining learning rates and visual information assessment, as opposed to the feasibility of information display on these technologies.

3 Methods

A user study was conducted with 24 unpaid volunteers. All 24 were active students (age 18 or older) at Colorado State University, of whom 18 were enrolled in the United States Army Reserve Officers' Training Corps (ROTC). Of the 24 volunteers, 5 were female and 19 were male; the average age was 19.96 years (SD = 2.99). At the time the experiment was conducted it was under export control; therefore, all volunteers had to be US citizens. A second requirement was that volunteers had 20/20 vision, or vision corrected to approximately 20/20 via glasses, contacts, or surgery.

All volunteers who fulfilled the requirements were asked to sign an informed consent form, an attestation that they were US citizens, and an attestation that they had acceptable vision to continue as participants in the experiment. After completing those forms, each participant was briefed on the tasks that would be presented. The briefing consisted of written instructions that each participant read; the experimenter then asked questions to confirm that the instructions had been understood, and a few additional minutes were reserved for any remaining participant questions.

Next, the head-mounted display was shown to each participant, along with the function of each knob and button and how to properly wear it. Participants were given as long as necessary to adjust the display so that their view was not impaired. A set of menus and images was shown to verify that the participant's vision was in focus; when it was not, additional adjustment time was provided.

On each trial, participants viewed a virtual environment containing a splash-zone and an object, and were asked to respond to three questions about the scene:

  • Is the object inside or outside the splash-zone? (In/out).

  • What is the azimuth position of the object? (Clock position of the object).

  • Please describe the object. (Object description).

Participants were trained to respond to these scenes as quickly as possible. When shown the scene in Fig. 1, participants were expected to respond to question 1 ("in"), question 2 ("4 o'clock"), and question 3 ("bus"). There were 6 possible objects (Fig. 2), 12 possible clock positions (1–12), and 2 possible splash-zone locations (in/out). During the experimental trials, these three variables were combined orthogonally in a \(2 \times 12 \times 6\) experimental design. Trials lasted approximately 5–10 s; the duration was determined by how long it took a participant to make all three judgments.

A block consisted of six scenes. Participants were first given a chance to train on the system with two blocks (Blocks 1–2). The next four blocks (Blocks 3–6) were the experimental blocks considered for the analysis. Therefore, each question was presented a total of 24 times (excluding the training blocks) to each of the 24 participants.
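To make the experimental design concrete, the sketch below enumerates the \(2 \times 12 \times 6\) design space and draws six-scene blocks as described above. It is a minimal illustration in Python: the object names beyond "bus" and the random sampling scheme are assumptions, not the study's actual trial-generation code.

    import itertools
    import random

    # Full factorial design space: splash-zone membership x clock position x object.
    locations = ["in", "out"]                     # 2 levels
    clock_positions = list(range(1, 13))          # 12 levels (1-12 o'clock)
    objects = ["bus", "tank", "jeep",             # 6 objects; all names except
               "truck", "car", "helicopter"]      # "bus" are illustrative guesses
    design_space = list(itertools.product(locations, clock_positions, objects))
    assert len(design_space) == 2 * 12 * 6        # 144 possible scenes

    # Draw 6 blocks of 6 scenes each (2 training + 4 experimental blocks),
    # i.e., 36 trials per participant.
    rng = random.Random(42)
    trials = rng.sample(design_space, 6 * 6)
    blocks = [trials[i:i + 6] for i in range(0, len(trials), 6)]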

It was anticipated that some participants would not have experienced a virtual reality system before, which could cause eye strain. Therefore, an optional short break was strongly encouraged between blocks.

At the end of the experiment, a system usability score (SUS) questionnaire was presented so that each participant could rate the system's usability [2].

Fig. 2. The six objects and their names.

3.1 Apparatus

The computer used to run the experiment was an Alienware Aurora R8 with a 1 TB SSD, a 2 TB SATA HDD, an NVIDIA GeForce RTX 2070 with 8 GB of GDDR6 RAM, an Intel i7-9700K CPU, and 64 GB of RAM. The head-mounted display was an HTC Vive Pro Eye. Two generic tripod stands were used to set up the Vive's base stations, and a Yeti USB microphone was used to record each participant's responses. An external 2 TB hard drive was used to keep a backup of the data under lock and key.

4 Results

There was a significant effect of block on response time, manifesting as a decrease in response time over blocks, i.e., with practice (Fig. 3). This was supported by an analysis of variance (ANOVA) \((F(3,1717) = 6.68, p < .001)\). Total response time was then subdivided into the time used to make each of the three consecutive judgments.
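For illustration, a one-way ANOVA of this form can be computed as sketched below, assuming trial-level response times in a long-format table with block and rt_total columns; the file and column names are hypothetical and this is not the authors' analysis code.

    import pandas as pd
    from scipy.stats import f_oneway

    # Hypothetical long-format data: one row per trial, with the experimental
    # block (3-6) and the total response time in seconds.
    df = pd.read_csv("response_times.csv")

    # One-way ANOVA of response time across the four experimental blocks,
    # analogous to the reported F(3, 1717) test.
    groups = [g["rt_total"].values for _, g in df.groupby("block")]
    f_stat, p_value = f_oneway(*groups)
    print(f"F = {f_stat:.2f}, p = {p_value:.3f}")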

Fig. 3. Total response time (in seconds).

Fig. 4. In/Out response time (in seconds).

4.1 Response Time

Figure 4 shows the effect of block on the in/out judgment time. Although Fig. 4 suggests faster times during the last block, an ANOVA showed that this effect was not statistically significant \((F(3,567) = 0.82, p = .48)\). There was a continuous decrease in azimuth judgment time with practice \((F(3,567) = 6.37, p < .001)\) (Fig. 5). There was also a significant effect of block on object recognition time \((F(3, 567) = 3.43, p = 0.017)\) (Fig. 6).

Fig. 5. Azimuth position/clock judgment time.

Fig. 6. Object recognition time (in seconds).

4.2 Accuracy

The accuracy of the first judgment (in/out of the splash-zone) was near ceiling from the start (97%) and, as such, was not affected by practice. The accuracy of the azimuth judgment as a function of block is shown in Fig. 7. Clock position accuracy was high overall (approximately 88%), and block did not have a significant effect on this judgment \((F(3,92) = 0.42, p = 0.74)\). A closer inspection of the data revealed that participants made near perfect judgments of clock position when the azimuth lay along the line of sight (12:00 or 6:00) or orthogonal to it (9:00 or 3:00), but had more difficulty judging the off-angle azimuths of 1, 2, 4, 5, 7, 8, 10, and 11 o'clock, where accuracy was only 80%.
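The grouping of clock positions used in this analysis can be expressed as in the sketch below; the per-trial data layout, file name, and column names are assumptions for illustration.

    import pandas as pd

    # Hypothetical per-trial data: the clock position shown and whether the
    # azimuth judgment was correct (1) or not (0).
    df = pd.read_csv("azimuth_judgments.csv")

    def azimuth_category(clock: int) -> str:
        """Group clock positions: along the line of sight, orthogonal, off-angle."""
        if clock in (12, 6):
            return "line of sight"
        if clock in (3, 9):
            return "orthogonal"
        return "off-angle"

    df["category"] = df["clock_position"].map(azimuth_category)
    print(df.groupby("category")["correct"].mean())  # accuracy per category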

Fig. 7. Clock judgment accuracy.

Fig. 8. Object recognition accuracy.

Figure 8 shows the effect of block on the accuracy of object recognition. While the ANOVA revealed a marginally significant effect \((F(3,92) = 2.67, p = .052)\), the absence of a monotonic trend across blocks suggests that practice did not improve object recognition accuracy. Closer analysis indicated near perfect (99%) accuracy when the object was located at an azimuth perpendicular to the line of sight, reduced accuracy (93%) when it was located along the line of sight, and still further reduced accuracy (81%) when the object was located at the off-angle azimuths.

Fig. 9. SUS satisfaction per question: the ratio of the sum of all participant ratings for each question to the maximum possible points.

Fig. 10. The Question column lists the questions presented in the SUS, with the mean response to each question and each participant's individual responses alongside.

4.3 System Usability Score

The questionnaire given to each participant at the end of the experiment was meant to provide insight into what worked and what needed improvement. It consisted of 17 questions, each rated from one to seven (1 being the lowest satisfaction rating and 7 the highest). Figure 10 lists the questions, the mean across all participants for each question, and each participant's individual responses. A total of 10 questions were left unanswered by participants; these responses were replaced with a rating of 4, the midpoint of the 1–7 scale, which can act as a neutral value. Although the rating instructions (1 lowest, 7 highest) were provided at the beginning of the SUS and repeated at intervals, some participants confessed after finishing that they were confused and had thought a rating of 1 was better. The maximum number of points across all questions was 119 (\(7 \times 17 = 119\), with 7 corresponding to the highest satisfaction rating and 17 to the number of questions). Figure 9 illustrates, for each question, the ratio of the sum of ratings to the maximum possible points; it shows that satisfaction was high overall, with most questions rated well above 75%.
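One plausible computation of the per-question ratios shown in Fig. 9 is sketched below, assuming a 24 × 17 ratings matrix with blank entries for the 10 unanswered items; the file name and layout are hypothetical.

    import numpy as np

    # Hypothetical ratings matrix: 24 participants x 17 questions, values 1-7,
    # with blank (NaN) cells for unanswered items.
    ratings = np.genfromtxt("sus_ratings.csv", delimiter=",")

    # Replace missing responses with 4, the neutral midpoint of the 1-7 scale.
    ratings = np.where(np.isnan(ratings), 4.0, ratings)

    # Per-question satisfaction ratio: sum of ratings across participants
    # divided by the maximum possible for that question (7 per participant).
    max_per_question = 7 * ratings.shape[0]           # 7 x 24 = 168
    ratios = ratings.sum(axis=0) / max_per_question   # one ratio per question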

5 Discussion

The primary objective of the experiment was to assess whether participants could determine if entities were inside or outside of the splash-zone. As participants' overall correct response rate was over 99%, they clearly had no trouble judging the object's location. Another objective was to assess whether the visualization features might inhibit other judgments. Object recognition accuracy fluctuated between 80% and 90%. One concern participants expressed after the experiment was that the color of the object and the texture of the background closely resembled each other, which at times made it difficult to recognize the type of object presented. Object identification was also occasionally inhibited by the display itself. When objects were located perpendicular to the line of sight they were never occluded, and their identity was judged with near perfect accuracy. When objects were along the line of sight, or at off angles, roughly half of the time they were partially obscured by the fence, which degraded performance.

The final objective was to assess the learning effect across the experimental session. Clock judgment accuracy (Fig. 7) stayed between 85% and 90%, and object recognition accuracy (Fig. 8) remained between 80% and 90%. Given that these judgments were made from a long distance and, for most participants, for the first time, this level of accuracy was to be expected. However, the response times for the clock judgment (Fig. 5) and object recognition (Fig. 6) decreased over blocks. The decreasing response times (Figs. 5 and 6), together with accuracy consistently above 80% (Figs. 7 and 8), indicate that a learning effect was taking place.

This learning effect indicates that the system is easy to learn in a short amount of time, as seen in users' improved performance over each block. It also indicates that the system can improve splash-zone awareness for minimally trained participants, whose first exposure to the system was the same day as the experiment (only 6 of the 24 participants had prior VR experience). It is possible that these trends would hold in augmented reality HMD systems as well. If so, this system may be able to aid military personnel as a field-use compensatory aid; the low training overhead and quick target identification times could both contribute to its success.

The SUS results are important because they will help the team reshape the future of this project. While an overall satisfaction ratio above 75% is high, some areas need improvement, such as "The interface of this system was pleasant" and "I was able to complete the tasks and scenarios quickly using this system", both of which had ratios in the low 70s. Since some of the lower ratings were due to participants mistaking the high end of the scale for the low end, it would be beneficial to explain the SUS scale verbally at intervals and to place a reminder of the scale, in a larger font, at the beginning of every other question. Overall, the SUS results show that this system is usable and well received by minimally trained participants. We believe this is a positive indication that the system would similarly be usable by military personnel, who share some training with the participant pool used in this study owing to the participants' ROTC background.

6 Future Work

One deficiency mentioned previously is that the foreground object needs to be more distinct from the background texture. One approach would be to change the shape of the object, as the light rendered on it would make it look different. Alternatively, choosing a different color for the object or the background would separate one from the other, making the interface easier to follow. This experiment showed that a learning effect takes place while participants perform the task. A prospective follow-up experiment could measure participants' accuracy when identifying a specific object among several objects scattered in the splash-zone. In the field, this type of awareness would need to be near-instantaneous: JTACs are required to make instant judgment calls on any terrain and at any bearing, and understanding their reaction times and requirements will help support that goal.

7 Conclusion

Participants could determine whether objects were in or out of the splash-zone with near perfect accuracy. Judgments of object identity appeared to be slightly inhibited. When objects were located perpendicular to the line of sight they were recognized with near perfect accuracy, whether inside or outside the fence (Fig. 1). Occasional misidentification of targets was due to the fence obscuring the object, or to the similarity between the texture of the object and the texture of the background. Mistakes caused by the fence obscuring objects were most common in trials where the objects lay along, or at an off angle to, the line of sight. Participants became quicker with practice, shortening their overall response time by nearly one second. Only the in/out judgments did not improve with practice, but these were so simple that their response times would be difficult to shorten. These results demonstrate that this system can improve splash-zone understanding.