1 Introduction

Fig. 1. The different modalities used in the study included slider (1(a), 1(d)), egocentric video (1(b), 1(e)), exocentric video (1(c), 1(f)), and sound (not in figure). The user remained stationary as the robot approached. Here all robots are at a height of 5 ft.

With the onset of the COVID-19 global pandemic and the influx of public-facing robots performing tasks in the world (delivering food and medicine, and greeting or guiding passersby), the social robotics community faces many questions related to human-robot proxemics but has relatively limited access to in-person participants. Even in an otherwise normal world, in-person studies are generally localised to the geographic area where the experimenters are located, which restricts the demographics of the sample population. This work describes an investigation of human interaction methods. It leverages a relatively small set of in-person distancing results, the ability to compare to a previously published study, and a set of methods previously used in human-human distancing to answer fundamental questions regarding methodology for assessing human distancing. This work will inform future researchers about the utility of these methods and the lessons learned in applying them, improving our ability to target limited in-person experimental resources at problems that are likely to produce interesting results.

This work explores the following research questions:

  • What are the different modalities we can use to prototype human-robot proxemics studies?

  • How do the results compare to studies run in-person?

This work indicates that the in-person test results are relatively consistent with the trends observed in each of the online techniques, with the exocentric video condition producing the most similar results (albeit with a 2.3x magnitude increase). This was observed through the analysis of the projective distances reported, the reported participant affect, and the qualitative comments from the participants. Given these results and the confidence of the participants that they were able to imagine themselves interacting using these techniques, recommendations are made for when to leverage each technique in future studies.

2 Related Work

In this section we cover prior work on the interaction modalities used in human-human and human-robot interaction studies, and literature related to the impact of height on human-robot proxemics, in order to situate the current work.

2.1 Modalities in Interaction Studies

Human-Human Interaction. Prior studies in human-human proxemics used various methods to understand the personal space that users wanted to maintain, including unobtrusive observations, stop distance [21], video [25], sound [21], adjustable size of a stimulus image, chair placement or choice, the felt board technique, paper-and-pencil procedures [21, 25], positioning of miniature figures, and preference judgements for photographs showing differing spacing and size of projected faces. In surveying the human-human proxemics methodology landscape, [8] found that in-person stop-distance measurement is the most reliable and preferred technique for experimental evaluations, while paper-and-pencil and felt board methods are the least reliable. The video (exocentric) [25] and sound [21] modalities were found to be more reliable proxies for in-person interaction than other techniques such as paper-and-pencil procedures.

Human-Robot Interaction. User perception of a robot is affected by the medium used to present the human-robot interaction [30]. Prior studies have used various methods to evaluate HRI hypotheses, including text [30], slider [11], 3D figurine [11], virtual agent/animated character [12, 15, 18, 22, 29], virtual reality [4, 16], telepresence (live video) [12, 18], and pre-recorded video [9, 14, 20, 28, 29, 30]. Some went a step further and also provided a comparison to in-person studies [9, 12, 15, 16, 18, 22, 28, 29, 30].

On the one hand, findings such as those of [15] show that people have stronger behavioral and attitudinal responses to co-present robots than to telepresent or virtual agents. On the other hand, studies have found that modalities like video compare favorably with in-person interaction. In-person interaction with a robot can be useful for evaluating its social aspects but can lead to higher anxiety and lower trust [30], whereas videos can be particularly effective in enhancing users’ perceptions of how well a robot performs its intended function, without the elevated anxiety. [22] used videos of animated sUAVs to understand how to communicate intent effectively and to improve the flight design that informed a follow-up in-person study. Similarly, [9] conducted online studies with exocentric video clips and later ran a confirmatory in-person study with a ground robot.

In fact, [14, 20] piloted both egocentric and exocentric videos but opted for egocentric videos to provide better focus on the movements of the robot, without contextual distractions like the age, gender, and ethnic background of the actor in the video. Studies by [12] (egocentric video) and [28, 29] (switch from exocentric to egocentric view) that compared real-world evaluations of interactive prototypes with web-based video prototypes found that results from the video modality tend to be consistent with in-person studies, although the former may not contain all the salient factors present in a real-world setting. Proxemics interactions have not been explored in the context of online modalities with an sUAV as one of the interactants. Our study will test slider, video (egocentric and exocentric, to observe differences in distancing based on viewpoint), and sound modalities.

2.2 Impact of Robot Height on Proxemics

Height has been found to affect how people react to robots in many studies [17, 19, 27]. For example, [19] discovered that the size of people’s proxemics zones is directly proportional to the height of a ground robot, but [17] found that as ground robot height increases, the distance people prefer between themselves and a robot decreases. [7] specifically researched how the operational height of sUAVs may affect people’s comfortable approach distance and did not find any significant effect. They note that possible reasons for the lack of difference in preference may be the lack of a realistic setting (the UAV was tethered) and participants’ feeling of security. [27] varied the altitude of the sUAV in their study and found that a constant-altitude trajectory (at 1.75 m \(\approx \) 5.74 ft) is preferred over increasing- or decreasing-altitude trajectories. Our in-person and online studies will test the impact of straight trajectories in which an untethered sUAV maintains its altitude as it approaches the user.

3 Experiment

This paper presents a study to address the research questions: What are the different modalities we can use to prototype human-robot proxemics studies? and How do the results compare to studies run in-person? These research questions are answered using various projective and definitive measures for stop-distance adapted from prior interaction studies. To answer the first question, an online study was conducted by varying the interaction modality: 2D-distancing slider, egocentric video, exocentric video, and sound clips. To answer the second question, data from the online studies was then compared to data from in-person studies: one previously conducted by [3], and the other conducted in our lab.

3.1 Materials

An Asctec Hummingbird sUAV and a Double telepresence robot were used in our studies, similar to [3]. The ground robot operated at a height (measured to the top of the robot) of 5 ft (1.52 m). The operational height of the aerial robot was set to 3 ft, 5 ft, or 7 ft. The robots’ approach speed was set to 0.2 m/s. To track the robot and the user, Vicon markers were placed on the robots, and the user was asked to wear a pre-made marker object around their neck.

3.2 Testbed

The overall study setup for recording the videos and conducting the in-person study replicated the baseline study by [3] including the study space (testbed figure attached in appendix).

The participant interacted with the robot in the enclosed section of the room (4.88 m by 3.53 m). The participant stood at the location marked (S) while the robot approached from its start location, marked (R). The experimenter controlled the robots (UAV and ground robot) from the outside section (4.88 m by 1.03 m). A backup human pilot observed the experiments via a live video feed (through two Sony CX440 video cameras), ready to take control of the robots if necessary.

While this system was followed for the in-person study, the same setup with a male actor portrayed as the user (similar to [28]) was used to capture the exocentric video (as shown in Figs. 2(a) and 2(b)), and lastly the camera was placed roughly at the height of 1.5 m for the egocentric video (as shown in Figs. 2(c) and 2(d)) and sound clips.

Fig. 2. The videos were captured with the sUAV flying at a height of 3 ft, 5 ft, or 7 ft, and with the Double ground robot (all marked with a yellow box), from egocentric (2(c), 2(d)) and exocentric (2(a), 2(b)) points of view. Similar conditions were faced by users in the in-person study. (Color figure online)

3.3 Studies

The following online and in-person studies were conducted:

Online Studies. Amazon’s Mechanical Turk (MTurk) [2] was used to recruit participants for the online study. Following recommended practices [1, 10], we pre-screened participants by requiring more than 5000 approved HITs and a HIT approval rate above 97% across all requesters’ HITs in their MTurk history. Participant anonymity was maintained, as required by our Institutional Review Board, by tracking only the MTurk worker ID.

The online studies were conducted with the Double ground robot and the sUAV flying at heights of 3 ft, 5 ft, and 7 ft. The participants were randomly assigned to conditions, and the interaction order was counterbalanced between participants. Once participants accessed the study via MTurk, they first entered background information and answered a pre-interaction questionnaire, then positioned the [5 ft Double, and 3 ft, 5 ft, 7 ft sUAV] in a random order in their online modality, and finished with an exit questionnaire. Post-interaction questionnaires were administered after the first interaction with the Double and with the sUAV. The surveys were administered using Google Forms, and the web pages containing the slider and video/sound clips were hosted on university servers.
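The paper does not state which counterbalancing scheme was used. As a minimal sketch, assuming full counterbalancing over the four stimuli, presentation orders could be generated as below; the stimulus labels, the function name, and the seed are illustrative and not taken from the study.

```python
import itertools
import random

# Hypothetical labels for the four stimuli named in the text.
STIMULI = ("Double-5ft", "sUAV-3ft", "sUAV-5ft", "sUAV-7ft")


def counterbalanced_orders(n_participants, stimuli=STIMULI, seed=0):
    """Return one presentation order per participant.

    Assumption: full counterbalancing. Every permutation of the stimuli is
    used equally often (up to a remainder), and the mapping from participant
    to order is then shuffled with a fixed seed for reproducibility.
    """
    perms = list(itertools.permutations(stimuli))
    orders = [perms[i % len(perms)] for i in range(n_participants)]
    random.Random(seed).shuffle(orders)
    return orders


if __name__ == "__main__":
    for order in counterbalanced_orders(4):
        print(order)
```

With four stimuli there are 24 possible orders, so any participant count that is a multiple of 24 uses each order equally often.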

2D-Distancing Using Sliders. A UI comprised of a slider was used with the human’s image on the left (static) and the robot’s image on the slider handle (movable), with the scene presented to the user from an exocentric point of view. The user was provided the following instructions:

“Imagine that you are the figure on the left. How far apart would you place the following two figures by dragging the figure on the right?”
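The Results section notes that slider responses were converted to meters using the proportions of the slider scene and an assumed human height of 1.5 m. A minimal sketch of one such conversion follows, assuming the gap between the two figures and the height of the human image are both measured in pixels; the function and parameter names are hypothetical, and the exact procedure used in the study is not given.

```python
HUMAN_HEIGHT_M = 1.5  # human height assumed in the paper's Results section


def slider_gap_to_meters(gap_px, human_image_height_px):
    """Convert an on-screen gap between the figures to meters.

    Assumption: the human image is drawn to scale, so its pixel height
    corresponds to HUMAN_HEIGHT_M and sets the meters-per-pixel ratio.
    """
    if human_image_height_px <= 0:
        raise ValueError("human_image_height_px must be positive")
    meters_per_pixel = HUMAN_HEIGHT_M / human_image_height_px
    return gap_px * meters_per_pixel


if __name__ == "__main__":
    # Illustrative numbers only: a 300 px gap next to a 200 px tall figure.
    print(round(slider_gap_to_meters(300, 200), 2))
```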

Video Stop-Distancing. Each user was shown a video of the robot approaching from either egocentric or exocentric point-of-view, and provided the following instructions:

“Start the following video with sound on. Once the approach distance of the robot in the video begins to make you feel uncomfortable, stop the video. Finally, click submit.”

Sound Stop-Distancing. Each user was provided a sound clip of the robot approaching (recorded from an egocentric point of view) and given the following instructions:

“Start the following video with sound on. Imagine a robot is approaching you. Once the approach sound of the robot in the video begins to make you feel uncomfortable, stop the video. Finally, click submit.”
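For the video and sound conditions, the Results section states that stop times were converted to distances using flight paths logged in ROS bag files and aligned to the clip timestamps. The sketch below shows one way such a lookup could work, assuming the log has already been reduced to (time, distance) samples on the clip's time axis. The function name and the sample trajectory are hypothetical; the trajectory is generated from the 0.2 m/s approach speed and 3.65 m start distance reported in the paper and is not recorded data.

```python
from bisect import bisect_left


def distance_at_stop(stop_time_s, trajectory):
    """Linearly interpolate the robot-user distance at the stop time.

    trajectory: list of (time_s, distance_m) pairs sorted by strictly
    increasing time, already aligned to the clip. Times outside the
    logged range are clamped to the first or last sample.
    """
    times = [t for t, _ in trajectory]
    if stop_time_s <= times[0]:
        return trajectory[0][1]
    if stop_time_s >= times[-1]:
        return trajectory[-1][1]
    i = bisect_left(times, stop_time_s)
    (t0, d0), (t1, d1) = trajectory[i - 1], trajectory[i]
    frac = (stop_time_s - t0) / (t1 - t0)
    return d0 + frac * (d1 - d0)


if __name__ == "__main__":
    # Illustrative trajectory: 3.65 m start, closing at 0.2 m/s.
    samples = [(0.0, 3.65), (5.0, 2.65), (10.0, 1.65)]
    print(round(distance_at_stop(7.5, samples), 2))
```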

Attention Check. We asked the participants to watch for random ‘attention checks’ to increase performance (described in 3.3). Instead of distancing the interactants using the slider or the video/sound player, participants were asked to name the interactants in the slider condition and to report a word (“robot”) inserted into the video/audio clip. These checks were inserted to verify that the participants were carefully reading instructions instead of mindlessly clicking through tasks.

In-person Study. An in-person study was conducted with the sUAV flying at different altitudes: 3 ft, 5 ft, or 7 ft. The participants were randomly assigned to conditions, and the interaction order was counterbalanced between participants. Once a participant arrived at the experiment location in our university lab, their consent was obtained and they answered a pre-questionnaire to record background information and pre-interaction measures. Next, they were asked to wear the fiducial marker object, and participants not wearing eyeglasses were also asked to wear safety glasses for all interactions. Once the robot started approaching, the participant was asked to say “stop” when the robot’s closeness began to make them feel uncomfortable. The stop-distancing technique is similar to the one in [3, 6] and follows the recommendations for use by [8]. On completion of each of the three interaction sessions, participants were asked to fill out a post-interaction questionnaire to collect their feedback and post-interaction measures.

3.4 Participants

Online Study. In the online study conducted on MTurk, participants were paid a fixed compensation ($3 USD) for a task that took 34 min on average to complete. We controlled for the quality of our data by excluding data from 65 participants who failed the attention check task (described in Sect. 3.3) and from 13 participants who, we discovered, had answered the study multiple times despite clear instructions not to do so; this was possible because of how the studies were published on MTurk. Ultimately, the study had 288 participants (187 male and 101 female) between the ages of 19 and 69 (\(\mu \) = 36.93, \(\sigma \) = 10.55).

In-person Study. The in-person study, conducted at a university research lab, had 36 participants (19 male and 17 female) between the ages of 19 and 67 (\(\mu \) = 33.36, \(\sigma \) = 16.69). These participants were recruited through on-campus advertisements and emails to campus mailing lists. Participants were compensated $15 for participating in the 1 h study. For two participants in Study 2 the sUAV crashed before interaction. Since this may have impacted their approach distances, their data were not used, and two new participants were run with the same treatment conditions to obtain data for all 36.

Prior Interactions with Robot(s). In the online (taken together) and in-person studies that we conducted, 50.35% and 52.77% of the participants, respectively, reported having interacted with a robot. It is important to note that the robot interaction question was phrased broadly to include single interactions and those in museums or with robot vacuums.

4 Results

Results will be presented from the online and in-person studies to compare results on distancing, user comments, and participant affect.

The data from all the online studies were converted to distances (in meters). For the slider study, this used the proportions of the slider scene, assuming a human of average height (1.5 m) and a distance of 3.65 m between the user and robot start positions. For the video and sound studies, it used the ROS bag files that recorded the flight paths, converted to correspond to the video/audio timestamps. All results are reported using the final submitted value from the online form, unless reported otherwise. Normality of the data was tested using the Shapiro-Wilk test. None of the data were found to be normally distributed, and hence non-parametric tests were chosen for all further analysis. The Mann-Whitney test was used to compare gender data. The Wilcoxon Signed-Rank test (for two measurements in the Double and UAV-5 study) and the Friedman test (for three measurements in the UAV at multiple heights study) were used to compare the comfortable approach distances (measured in meters). Finally, Nemenyi’s test was used for post hoc analysis [24] wherever the Friedman test was used. All tests were corrected for false discovery using the Benjamini-Hochberg Procedure (BHP).
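For readers unfamiliar with the Benjamini-Hochberg Procedure, the following is a minimal, standard-library sketch of the step-up rule. It is not the authors' analysis code; the false discovery rate of 0.05 and the example p-values are assumptions chosen only for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null is rejected.

    Step-up rule: sort the m p-values, find the largest rank k (1-based)
    such that p_(k) <= (k / m) * alpha, and reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values: only the smallest survives the correction.
    print(benjamini_hochberg([0.001, 0.04, 0.03, 0.2]))
```

This illustrates how a result with an uncorrected p below 0.05 can stop being significant once it is evaluated alongside other tests, which is the situation the PANAS section reports for the per-modality comparisons.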

4.1 Interaction Observations from Online Study

Fig. 3. Summary of online study interactions for all modalities.

During the in-person interaction, users were not able to send the robot backwards, so the approach distance recorded was the distance at which the user stopped the robot. In contrast, in the online studies, and owing to the nature of the online interaction, we found instances where users provided multiple answers and calibrated the distance until it was comfortable. These interactions are summarized in Fig. 3, which shows the number of interactions in which users changed the distance, increased the distance after first choosing a lower value, experimented by setting the lowest distance, and set a higher distance after first trying the lowest distance.
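The four interaction patterns listed above can be read as flags computed over the sequence of values a participant set before submitting. The sketch below is one possible operationalization; the paper does not give exact definitions, so the function name, the flag names, and the tolerance are all assumptions.

```python
def summarize_interaction(distances, lowest, tol=1e-6):
    """Flag interaction patterns in one participant's sequence of answers.

    distances: values (in meters) set in order; the last one is the final
    answer. lowest: the smallest distance the modality allows.
    """
    if not distances:
        raise ValueError("distances must not be empty")
    first, final = distances[0], distances[-1]
    earlier = distances[:-1]

    def at_lowest(d):
        return abs(d - lowest) <= tol

    return {
        # Any later value differs from the first one.
        "changed": any(abs(d - first) > tol for d in distances[1:]),
        # Final answer is farther than some earlier, lower choice.
        "increased_after_lower": any(final > d + tol for d in earlier),
        # The lowest possible distance was set at some point.
        "tried_lowest": any(at_lowest(d) for d in distances),
        # Lowest distance was tried first, then a higher final answer given.
        "higher_after_lowest": any(at_lowest(d) for d in earlier)
        and final > lowest + tol,
    }


if __name__ == "__main__":
    print(summarize_interaction([1.2, 0.0, 0.9], lowest=0.0))
```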

While 97.88% of users in the online study reported being able to effectively visualize themselves as the user in the study, 76.76% felt that an in-person interaction might change how close they allowed the robot to approach. Of these, 61% indicated that they would choose to interact with the Double at a closer distance than their own indicated placement, while only 27.06% felt that they would prefer to have the UAV closer in real interactions. Given the already large projected distances in the different methods, this is quite interesting, and it is contrary to the finding that users in the in-person study let the robots approach to close distances.

4.2 In-person Study: sUAV Flying at Different Heights

In the in-person study we conducted with the sUAV flying at three different altitudes, the comfortable approach distances at 3 ft (\(\mu = 0.67\), \(\sigma = 0.26\)), 5 ft (\(\mu = 0.65\), \(\sigma = 0.23\)) and 7 ft (\(\mu = 0.65\), \(\sigma = 0.18\)) were not statistically significantly different (Table 1). The distance values are however closer in magnitude and in the same zone (personal) as results reported by [3, 7].

In the online study, participants tried out the closest approach distance by either moving the slider to the closest point or watching the video/sound clips until the robot reached its closest point (described in Sect. 4.1). Interestingly, even during the in-person interaction some users allowed the robot to approach very close: they did not stop the approaching robot 38.88% of the time and stopped it only at the last moment 5.55% of the time. In the in-person interaction, however, they could not send the robot backwards once it had approached; instead, the robot was halted by the autonomous code maintaining the safety distance around the user.

Table 1. Approach distances (in meters) measured for in-person studies. p-values correspond to the conditions of each study.

4.3 Online Studies

Table 2 summarizes the results for this section.

Table 2. Projective comfortable approach distance (in meters) calculated for all online methods. Statistical significance is indicated with compared pairs (the conditions being compared) marked with a, b, and c.

Robot Types: Ground Robot (the Double) and sUAV. The comfortable approach distances for the Double and the sUAV were found to be statistically significantly different in each of the four modalities (p < 0.05), where users allowed the Double to approach at a closer distance compared to the sUAV. These results are consistent with those of [3] summarized in Table 1, though the approach distance magnitudes differ. The video (exo) condition was the closest, but still roughly 2.3x the measured distance for both approaches.

sUAV Flying at Different Heights: 3 Ft, 5 Ft, and 7 Ft. The results differ across the modalities. In the slider and video (exo) modalities, the comfortable approach distances for the sUAV flying at 3 ft, 5 ft, and 7 ft were not statistically significantly different, which is consistent with our findings in the in-person study (Sect. 4.2). In the video (ego) and sound modalities, however, the 3 ft and 5 ft pair and the 5 ft and 7 ft pair, respectively, were found to produce different distances.

4.4 User Comments

In the exit questionnaire the users were asked “Do you have any other comments about this experiment?” and “Is there anything that has not been addressed that you find important?”. Participants, in general, expressed curiosity and engagement; other common feelings are summarized briefly in this section.

Participants expressed a preference for the ground robot compared to the aerial vehicle. For the UAV, the participants commented on the noise generated by the vehicle and expressed negative feelings towards the propeller blades. A few users commented on allowing the robot to approach. Finally, the general comments pointed to overall feelings of interest in the study.

Slider:

  • “Well, if I were to meet up with the robot for real, I’d probably let it get closer than what I’m imagining. I know for sure I would not make it be farther away from me.”

  • “It is hard visualize interaction with a robot via computer screen, in person interaction could present a totally different experience.”

  • “I think if the robots have propellers or fly, I would want them a little farther away than a robot that was on wheels.”

Video (ego):

  • “The last robot that wasn’t flying was a lot easier to not be scared of.”

  • “None of the robots spoke. That would have an impact.”

Video (exo):

  • “I liked the last robot better, don’t like that flying thing.”

  • “The sound of the UAV’s is what makes me dislike them, I think.”

Sound:

  • “Sound of the robots makes a difference in how we perceive them, I would have liked to know what these robots looked like too.”

  • “Videos are all playing a black screen and sound only with little to no variation in the volume/intensity. The one with the word ‘double’ at the bottom sounded like a nice day out in the park so I don’t know how I can judge an imagined robot from that.”

  • “This experiment, especially the sounds, completely stressed me out!!”

  • “Were the sounds actually real robots? I didn’t think UAVs were that loud.”

4.5 Human-Human Distance

To baseline the collected numbers, and out of interest due to the ongoing pandemic, we asked participants to indicate (using a slider) how close they would allow another human being to approach them. On average, participants placed the human figure 0.85 m (\(\sigma = 1.02\)) away. One participant commented, “Closeness of human preference depends on Covid.” These results are relatively similar to those observed in human-human distancing (M=0.73 m) [5].

4.6 PANAS

The Wilcoxon Signed-Rank test was used to compare the differences in affect, PA (sociability) and NA (stress) [26], after interaction with the ground and aerial robots relative to the initial pre-interaction “Today” measurement. Participants reported higher distress after interacting with the aerial robot than with the ground robot (\(W(271) = 6387.0\), \(p < 0.001\)). The average “Today” value for NA was 17, the average NA after interaction with the sUAV was 17.76, and the average after the Double was 16.6. These results for NA, computed for all online studies together, are consistent with the findings of [3].

This effect was also observed separately in the egocentric video (\(W(67) = 501\), \(p < 0.05\)) and exocentric video (\(W(69) = 394\), \(p < 0.05\)) modalities, but it was not significant after applying the BHP correction.

5 Discussion

5.1 Limitations

The most prominent limitations of this study are due to the testing modalities, where each sacrificed fidelity in different ways. The slider, where the interactants were images, was missing the sound and visuals; the sound modality was missing the visuals; and all online studies were missing the in-person experience. In the current implementation, the slider restricts testing of variables like speed and variable paths that require 3D perceptions and automated movement of the robot. To investigate these factors, one of the other prototyping mediums should take precedence. Despite these differences, 97.88% of users reported that they were able to effectively visualize themselves as the user in the study.

The lack of a significant difference for the UAV flying at 3 and 5 ft was similar in the online and in-person studies, but this did not hold for the UAV at 7 ft. One possibility is that in-person user responses were impacted by other factors, like the perception of room size or ceiling height, which were less obvious in the egocentric video and absent in the sound modality. Another explanation is that the physical presence of interactants during in-person interaction afforded viewers better depth perception and motion parallax [15], which were lacking in the online modalities.

The projective measurement results from online studies are similar to the results of in-person studies, but the distances differ in magnitude. We argue that it is fair to sacrifice the precision in favor of ease of deployment and ability to detect patterns in interaction which can then be refined through smaller in-person tests.

5.2 Implications

Our findings suggest that choosing among the slider, video (ego or exo), or sound for specific purposes requires a consideration of the social and informational dimensions of the task at hand. Based on the study being conducted, understanding the contextual information conveyed by each method is important in eliciting the most effective response for each method from the user. However, given the information here, we could have tested other potential studies to find one likely to elicit differences (such as increased speed, variable flight paths, etc.).

5.3 Recommendations

Many users reported multiple answers for each method and, as opposed to in-person interaction, they could refine their answer by moving the robot closer and further to find their preferred distance. Prior studies in ground robots have found this to not impact the distancing [13, 23], whether the robot approached the user or user approached the robot. But the patterns in our data indicate that the same may not hold true for aerial vehicles. In future studies, researchers might confirm the impact of this by allowing the user to refine their answer after stopping the robot as an iterative process.

With respect to applying the different methods, we would recommend the following. Sound seems to be an effective modality if you are testing the size of very different vehicles, or acoustics for a deployment space, to understand which design might be preferred. Ego video is useful for systems that can be fully observed from this relatively limited view, to understand the expected perceived size of interaction. Exo video would be effective for testing most use cases and deployment details due to the wider view and the standoff, but it is limited in its application to exceptionally loud systems, so it might be well complemented by a sound or ego video study.

6 Conclusion

Through the use of crowd-sourcing platforms, we were able to complete, in less than one month, a set of studies that would generally have taken many months (in the absence of a global pandemic). The similarity of the observed trends to those observed in person may have encouraged a different selection of in-person study (such as one that would impact sound). We hope that the demonstration of these techniques and, in particular, the relative consistency of the exo video condition (though at a different magnitude) with in-person trends will encourage researchers to leverage these methods for future exploratory work. The discussion includes recommendations for when to use the different modalities.