
1 Introduction

Playing interactive games is considered entertainment and sometimes even an educational tool. Due to recent advances in the field of social robotics, robots can also take part in some interactive games. Given the importance of interactive games, different computer games and robotic platforms have been developed, invented, or specialized for purposes such as education [1], therapy [2, 3], and the fashion industry [4]. The common assumption is that human players prefer to interact with robots in the same way that they interact with other human players during an interactive game [5]. In this regard, intention detection and the ability to predict other players’ decisions during gameplay can be valuable [6,7,8,9]. In [6], a Variable-Order Markov model is used to build a probabilistic model that exploits the historical behavior of gamers to predict their next actions. In another recent study [7], a teaching-learning-prediction model is presented for a robot to predict human intentions using wearable sensing information. Dermy et al. [8] described their software for predicting the intention of a user physically interacting with the humanoid robot iCub.

In recent years, the Social and Cognitive Robotics Laboratory at Sharif University of Technology, Iran, has focused on robotic platforms for individuals with special needs [10,11,12]. Currently, we are working on the development of a social robot called RASA for educating children with hearing problems [13]. One of our goals is to develop a human-robot interactive gaming platform for RASA with the ability to predict a human player’s next move. To this end, the well-known children’s game “Rock-Paper-Scissors” (RPS) was chosen for RASA, since it is easy to play (human players do not need any tool or major training) and can be played by a wide range of ages, even by children with hearing problems. Due to its sequential nature, Rock-Paper-Scissors is not a completely random game, and even for amateur players the results of previous rounds are important in the decision-making process [14]. Some studies on intention detection in Rock-Paper-Scissors are based on the analysis of physical or biometric data from the player [15, 16]; in these methods, prediction is not based on the history of the game. In contrast, we would like to predict the player’s intention based on the results of previous rounds. This might lead to a more pleasant game, since there is always a chance for the player to defeat the robot, while the game is not purely random.

This paper presents an initial attempt to build an interactive human-robot gaming platform for intention prediction studies on RASA. The platform used in this research consists of a Leap Motion sensor that captures the hand gesture of the player; the gesture is then recognized by a Multilayer Perceptron neural network. Next, a Markov Chain model is used to predict the player’s next hand gesture, and finally, RASA reacts by performing the appropriate hand gesture. Our hypothesis is: “using intention prediction during our gameplay with RASA would lead to a highly socially acceptable human-robot interaction”. In this regard, the acceptance and attractiveness of the setup, the interaction quality during the game, and the legitimacy of our hypothesis were assessed in a pilot study.

2 Robotic System

2.1 Leap Motion Controller

In this study, we have used the Leap Motion Controller as an input device to capture human hand gestures. The effective range of the Leap Motion Controller is approximately a spherical cap with a radius of 600 mm above the center of the device, with a field of view of about 150° [17]. The Leap Motion Controller tracks hands and arms and distinguishes the right hand from the left. In addition, for each frame it measures and computes the position and orientation of the palm, the distances between finger bone joints, the direction of each finger bone, and the positions of the fingertips. The sensor’s software development kit models each finger with 4 bones, and the start and end positions of each bone are available.
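For illustration, the sketch below shows how per-bone direction vectors could be read with the legacy Leap Motion Python SDK; the SDK version (v2.x) and the exact calls are our assumptions, since the paper does not specify its acquisition code.

```python
import Leap  # legacy Leap Motion Python SDK (assumed v2.x); newer SDKs differ

controller = Leap.Controller()

def read_bone_directions(controller):
    """Return the unit direction vectors of all 20 bones of the first tracked hand."""
    frame = controller.frame()
    if frame.hands.is_empty:
        return None  # no hand in the field of view
    hand = frame.hands[0]
    directions = []
    for finger in hand.fingers:
        # Each finger is modeled with 4 bones: metacarpal, proximal,
        # intermediate, and distal (the thumb's metacarpal has zero length).
        for bone_type in (Leap.Bone.TYPE_METACARPAL, Leap.Bone.TYPE_PROXIMAL,
                          Leap.Bone.TYPE_INTERMEDIATE, Leap.Bone.TYPE_DISTAL):
            directions.append(finger.bone(bone_type).direction)
    return directions
```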

2.2 RASA, the Humanoid Robot

We have used one of our robotic platforms, called RASA, as the humanoid robot in the HRI procedure. RASA was designed and manufactured at the Center of Excellence in Design, Robotics, and Automation (CEDRA) at Sharif University of Technology [13]. RASA is 110 cm tall and has 32 degrees of freedom (DOFs), with 13 DOFs in each arm and 3 DOFs in the neck. Using a display screen as the face for facial expressions makes the interactions richer. Figure 1 shows RASA and its hand gestures.

Fig. 1. (a) RASA humanoid robot; (b) rock sign, (c) paper sign, and (d) scissors sign represented by RASA, respectively.

3 Methodology

3.1 Hand Gesture Recognition Using Artificial Neural Network

A Multi-Layer Perceptron (MLP) neural network is used to classify the rock, paper, and scissors gestures from the raw data captured by the Leap Motion Controller. The MLP takes a 14-element input vector, fully connected to 5 neurons with ReLU activation functions in the first layer, and has 3 neurons with sigmoid activation functions in the last layer. To train the MLP, an inclusive training dataset of 450 feature vectors was gathered. Each feature vector contains the 14 inner products of the normalized directions of successive bones of each finger.
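A minimal sketch of this stage is given below (our reading of the description, not the authors’ released code). Since five fingers with four bones each yield 15 successive-bone products and the paper uses 14 features, we assume the thumb’s zero-length metacarpal pair is the one dropped; Keras is an assumed library choice.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def feature_vector(finger_bone_dirs):
    """finger_bone_dirs: list of 5 lists (thumb first), each holding the 4
    normalized bone direction vectors of one finger as numpy arrays.
    Returns the 14 inner products of successive bone directions."""
    feats = []
    for f, bones in enumerate(finger_bone_dirs):
        for j in range(len(bones) - 1):
            if f == 0 and j == 0:
                # Assumption: skip the thumb's metacarpal/proximal pair (the
                # SDK gives the thumb a zero-length metacarpal), reducing
                # 5 fingers x 3 pairs = 15 products to the paper's 14 features.
                continue
            feats.append(float(np.dot(bones[j], bones[j + 1])))
    return np.asarray(feats)  # shape (14,)

# 14 inputs -> 5 ReLU neurons -> 3 sigmoid outputs (rock, paper, scissors)
model = keras.Sequential([
    layers.Dense(5, activation="relu", input_shape=(14,)),
    layers.Dense(3, activation="sigmoid"),
])
```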

3.2 Intention Prediction Algorithm

For a robot to win a sequential game with arbitrary choices like Rock-Paper-Scissors fairly, without cheating, it needs to analyze the history of the human’s actions to predict his/her next action. To this end, we used a Markov Chain model [18] as a stochastic tool to predict the human’s next action. If the probabilities of all actions are equal, the next action is predicted using a uniform random distribution; for example, this occurs in the first round, when the robot has no history of the opponent’s behavior.

The state of the game at each round is a tuple of the robot’s action and the user’s action. Since each action can take three values, the total number of possible states is 9. Therefore, 9 discrete probability distribution functions (PDFs) should be considered to anticipate the opponent’s action. For instance, an arbitrary distribution function of this type is shown in Eq. (1), where R, P, and S stand for the rock, paper, and scissors gestures, respectively.

The PDFs are updated based on the number of observations of each instance and the total number of observations. In order to make the algorithm adaptable to changes in the user’s strategy, we apply a decay ratio (0.9) to all previous observation counts after each round, so the algorithm relies more on its recent observations than on old ones. The update rule is presented in Eq. (2), in which N is the number of observations and e is the decay ratio.

$$ P\left( X \right) = \frac{N_{X}}{N_{R} + N_{P} + N_{S}}, \qquad X \in \left\{ R, P, S \right\} $$
(1)
$$ N_{i}^{new} = e \times N_{i}^{old} + N_{i}^{last} $$
(2)

Finally, the decision is made based on the prediction, and RASA plays one of the pre-recorded hand gestures: rock, paper, or scissors. Since the mechanical response of the robot is slower than the user’s, the chosen gesture is also shown on a screen to assure the opponent that the action is generated spontaneously and that RASA is not cheating.
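The full predict-update-act loop of Sect. 3.2 can be sketched as follows (illustrative code, not the deployed implementation; the assumption that the robot plays the counter of the predicted gesture follows from the goal of winning the round).

```python
import random

GESTURES = ("R", "P", "S")
COUNTER = {"R": "P", "P": "S", "S": "R"}  # gesture -> the gesture that beats it
DECAY = 0.9                               # decay ratio e of Eq. (2)

class MarkovPredictor:
    def __init__(self):
        # One observation-count table per state; 9 states x 3 possible user moves.
        self.counts = {(r, u): dict.fromkeys(GESTURES, 0.0)
                       for r in GESTURES for u in GESTURES}

    def predict(self, last_state):
        """Most likely next user gesture; uniform random when the counts give
        no preference (e.g. the first round, where last_state is None)."""
        if last_state is None:
            return random.choice(GESTURES)
        table = self.counts[last_state]
        best = max(table.values())
        return random.choice([g for g, n in table.items() if n == best])

    def update(self, last_state, user_gesture):
        """Apply Eq. (2): decay every stored count, then add the new observation."""
        for table in self.counts.values():
            for g in GESTURES:
                table[g] *= DECAY                         # e * N_old ...
        if last_state is not None:
            self.counts[last_state][user_gesture] += 1.0  # ... + N_last

    def robot_move(self, last_state):
        return COUNTER[self.predict(last_state)]
```

In each round, the robot plays `robot_move(last_state)`, observes the user’s recognized gesture, calls `update(last_state, user_gesture)`, and stores the new (robot, user) pair as the next `last_state`.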

3.3 Experimental Setup and Participants

After a short introduction presented by the RASA robot about the game session and its rules, each participant was asked to play the Rock-Paper-Scissors game with the robot in two modes. The two modes (“mode a” and “mode b”) refer to the robot’s strategies for choosing its replies to the participants. In “mode a”, the robot selects its actions randomly, while in “mode b”, RASA makes its decisions based on the presented Markov Chain model. Half of the subjects were randomly selected to play “mode a” first and then “mode b”, and the other half played in the reverse order (i.e., a counterbalanced condition). It should be noted that the participants were not informed of the robot’s exact decision-making strategy in each mode; they were only told when each mode finished. Instead, a questionnaire asked the subjects to rate and compare the possible differences between the robot’s two strategies in playing the game. Each mode consisted of 20 rounds, and the whole process took about 10 min per participant. The game sessions were held at the Social and Cognitive Robotics Lab., Sharif University of Technology, Iran.

We invited 32 participants (22 males and 10 females) to take part in playing the Rock-Paper-Scissors game with the RASA robot. The mean age and standard deviation of the subjects were 27 and 5 years, respectively.

3.4 Assessment Tool

We provided the following questionnaire (Table 1) and asked the subjects to fill it in after the game session. Based on the results observed during the games (e.g., the number of the robot’s wins, draws, and losses) in modes a and b, and the participants’ viewpoints, we studied whether there is a significant difference between the robot’s performance under the random and Markov Chain-based strategies, and whether there is a correlation between the questionnaire results and the robot’s performance.

Table 1. The questionnaire provided in this study to collect the participants’ viewpoints regarding the robot’s performance.

4 Results and Discussion

4.1 Hand Gesture Recognition Results

In the training phase of the hand gesture recognition network, we used 80% of our dataset for training and the remainder as test data, and applied 10-fold cross-validation to the training set. The training method was the Adam algorithm, which is based on Stochastic Gradient Descent (SGD). The network’s mean recognition accuracy on the test data was 93% with a standard deviation of 11%, which is acceptable for our experimental conditions. The trained MLP is then used as the gesture recognition module in the RPS game, and its outputs are sent to the next stage (i.e., the prediction algorithm) as input. It should be noted that, although the presented gesture recognition algorithm could handle the situations of this study, real-life applications may require more robust, real-time hand gesture detection, as investigated in [19].
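The training setup can be sketched as below. The paper only specifies the 80/20 split, 10-fold cross-validation, and the Adam optimizer; the loss function, epoch count, library choices, and placeholder data are our assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold
from tensorflow import keras

# Placeholder data: in the study these are the 450 collected feature vectors
# and their one-hot gesture labels.
X = np.random.rand(450, 14).astype("float32")
y = keras.utils.to_categorical(np.random.randint(0, 3, size=450), 3)

model = keras.Sequential([keras.layers.Dense(5, activation="relu", input_shape=(14,)),
                          keras.layers.Dense(3, activation="sigmoid")])

# 80% training / 20% test split, as described in the paper.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
accuracies = []
for tr, va in kf.split(X_train, y_train.argmax(axis=1)):
    fold = keras.models.clone_model(model)          # fresh copy of the MLP
    fold.compile(optimizer="adam",                  # Adam, as in the paper
                 loss="categorical_crossentropy",   # assumed loss function
                 metrics=["accuracy"])
    fold.fit(X_train[tr], y_train[tr], epochs=100, verbose=0)  # assumed epochs
    accuracies.append(fold.evaluate(X_train[va], y_train[va], verbose=0)[1])

print(f"CV accuracy: {np.mean(accuracies):.2f} +/- {np.std(accuracies):.2f}")
```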

4.2 Human-Robot Interaction Results

A sample snapshot of the designed HRI game is shown in Fig. 2a. We assign the points +1, 0, and −1 to a win, draw, and loss in each round, respectively. Therefore, we define the “score” of each set as the number of the robot’s wins minus the number of its losses; positive, zero, and negative scores correspond to an overall win, draw, and loss for the robot. One may be interested in further details of each round’s results for both algorithms; however, since there is not enough room to present all of the figures in this paper, we selected three graphs, described in the following.

Fig. 2. (a) The RASA robot during the RPS game. (b) The histogram of the robot’s scores in the Markov Chain mode and the random mode (positive, zero, and negative game scores indicate the robot’s wins, draws, and losses, respectively).

As the first graph of the robot’s performance during the game sessions, Fig. 2b shows the histogram of the robot’s scores in both modes. Paired T-tests (Table 2) on the observed results in the two modes indicate that, under the conditions of this study (i.e., the number of participants and the small sample set of games), the robot’s mean scores as well as its overall results in the Markov Chain mode are significantly higher than in the random mode (p-values < 0.05); the robot became a smarter opponent in mode b, which makes the HRI more appealing. As can be seen in Fig. 2b, in 16 out of 32 sets the participants beat the robot playing the random strategy. The most frequent result in the random mode is a score of zero (i.e., an overall draw), and the random mode’s bar graph roughly resembles a normal distribution around zero, which is the most likely outcome given the random nature of the algorithm. On the other hand, the participants could not easily beat the robot in the Markov Chain mode, and in 21 out of 32 games (66%) they lost to RASA. An interesting observation is that in only 5 of the 32 paired games did the random strategy achieve a better overall result than the smart strategy (e.g., for participant #6, the random strategy drew while the smart strategy lost to the human opponent).

Table 2. The results of the paired T-tests on the robot’s scores (i.e., mode a versus mode b).
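For reference, a paired T-test of this kind can be run with SciPy as sketched below; the score arrays are placeholders standing in for the 32 per-participant scores recorded in each mode, paired by participant.

```python
import numpy as np
from scipy import stats

# Placeholders for the study's data: one score (wins minus losses over 20
# rounds) per participant in each mode, in the same participant order.
rng = np.random.default_rng(0)
scores_markov = rng.integers(-5, 11, size=32)
scores_random = rng.integers(-9, 7, size=32)

t_stat, p_value = stats.ttest_rel(scores_markov, scores_random)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```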

In Fig. 3a, the average number of wins for the robot in each round is presented for both modes. Qualitatively, we can observe that in the second half of the games (i.e., rounds 11–20), the difference between the robot’s average scores in the smart mode and in the random mode is larger than in the first half. We believe this is probably due to the Markov Chain model’s ability to predict the users’ intentions from the history of their moves: the model performed better as the rounds continued and its database grew richer. Figure 3b presents another perspective on this observation. During our experiments, the average cumulative score of the Markov Chain model approached +1 (a robot win), while the equivalent value for the random mode tended toward −1 (a robot loss). While this may not be a large difference between the strategies’ performance, it shows that the robot is a stronger/smarter opponent when it uses the Markov Chain strategy.

Fig. 3. (a) The average number of the robot’s wins in each round for both modes. (b) The average cumulative score of the robot in both modes.

Regarding the participants’ viewpoints on the robot-assisted interactive game, the results of the questionnaire are presented in Table 3. As a preliminary estimate of the developed HRI’s acceptability, the mean of Questions #1 to #4 is 4.19 out of 5, which indicates that the developed HRI was highly acceptable and enjoyable for the users, while they took the game’s tasks quite seriously (Question #5). The overall Cronbach’s alpha (Q1–Q4) is an acceptable 0.74, while the Cronbach’s alpha values regarding Q1 to Q4 are 0.60, 0.62, 0.67, and 0.79, respectively.

Table 3. The mean and standard deviation (SD) of Questions #1 to #5 of the assessment tool.
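Cronbach’s alpha can be computed directly from the per-item response matrix; the sketch below applies the standard formula and is not the authors’ analysis script.

```python
import numpy as np

def cronbach_alpha(responses):
    """responses: (n_participants, k_items) array of questionnaire ratings.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    responses = np.asarray(responses, dtype=float)
    k = responses.shape[1]
    item_variances = responses.var(axis=0, ddof=1)
    total_variance = responses.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# e.g., alpha over Q1-Q4: cronbach_alpha(ratings[:, 0:4])
```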

In order to investigate whether there is a significant difference between the participants’ views of the robot’s intelligence in each game mode, a paired T-test was conducted on the results of Question #6 (in “mode a” and “mode b”). As shown in Table 4, the p-value is less than 0.05, which means that, in our experiments, the subjects rated the robot’s intelligence significantly higher in the Markov Chain-based mode than in the random mode. Without prior knowledge of the strategies used, the participants sensed that the robot’s performance in one mode was more intelligent than in the other. This finding, alongside the other questions’ scores, supports the hypothesis of this study that intention prediction in playing Rock-Paper-Scissors makes the robot more attractive, intelligent, and a stronger opponent for the users, which would lead to a more socially acceptable HRI. Finally, after calculating the correlation between the robot’s scores and the results of Question #6, we observed a Pearson coefficient (r) of 0.34 (p-value = 0.006), which indicates a positive correlation between the participants’ perception of the robot’s intelligence and the scores the robot achieved in the games. As mentioned, the real aim of using intention prediction is to make human-robot game playing more natural and pleasurable; we believe this is achievable when the robot also tries to predict the opponent’s next move, as people do. It should be noted that intention prediction does not produce an opponent who always wins. As shown in [20], losing on purpose at times increases children’s engagement and makes educational tasks more fun. Due to the combined random-strategic nature of the RPS game, both players have a fair chance to win, and the robot’s use of prediction strategies does not guarantee its victory.

Table 4. The results of the paired T-test of the subjects’ viewpoints about the robot’s intelligence in the games (i.e., mode a versus mode b).
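The reported correlation can likewise be reproduced with SciPy; the arrays below are illustrative placeholders for the paired per-participant values, not the study’s data.

```python
from scipy import stats

# Placeholders: the robot's score against each participant, and that
# participant's Question #6 rating of the robot's intelligence (1-5).
robot_scores = [3, -1, 5, 0, 2, 4, -2, 1, 6, 3]
q6_ratings   = [4,  2, 5, 3, 4, 5,  2, 3, 5, 4]

r, p = stats.pearsonr(robot_scores, q6_ratings)
print(f"r = {r:.2f}, p = {p:.3f}")  # the study reports r = 0.34, p = 0.006
```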

5 Conclusion

In this paper, we presented a social robotic setup that plays Rock-Paper-Scissors while trying to predict its opponent’s intention during the gameplay. Although RPS is not considered a solved game, analyzing the results of previous rounds can improve the chance of winning. We had two strategies: one was playing randomly, and the other used an algorithm for predicting the player’s intention. Among the different intention prediction methods, we used a probabilistic model (a Markov Chain) that draws on the gameplay history; this still leaves the opponent an acceptable chance of beating the robot. Our assumption was that playing with a platform equipped with an intention prediction system can be highly enjoyable. In the prediction module, a Markov Chain model memorizes the transitions of game states, where each game state contains the actions of both the user and the robot in a round. After conducting experimental tests with 32 participants, the results from both the gameplay scores and the surveys show that the Markov Chain-based strategy reaches a reasonably high rate of wins compared to the random strategy. The results also revealed that participants attribute high scores in smartness, attractiveness, and acceptance to a robot equipped with a smart algorithm.

As the main limitation of this study, the combined random-strategic nature of the RPS game makes it difficult to make strong claims about the findings, and generalizing them to other situations requires recruiting more participants and performing more rounds in future studies. In addition, other prediction approaches, such as learning-based or ensemble methods, could be implemented on the robot to evaluate their effectiveness. Moreover, in ordinary communication the task is not always so well-defined; i.e., intention prediction cannot generally be reduced to reading real-life sequences of the 3 gestures used in RPS. Various adversarial games could be selected as case studies to investigate other dimensions of intention detection in human-robot games.