1 Introduction

As interactions between humans and robots become more complex, there is increasing interest in building robots that can interact with humans in more intuitive and meaningful ways. Robots such as the Fish-Bird wheelchairs [54] have demonstrated that people naturally seek interaction through touch and expect even inanimate-looking robots to respond to tactile stimulation. In robotics it is therefore important to design a method for touch identification that can be active over all or most of a robot’s surface area; this could be achieved using an artificial “sensitive skin.”

The functional requirements for an artificial sensitive skin—such as spatial resolution and sampling rate—remain open to debate and depend to some extent on the intended application of the skin. Our previous work [60] discussed approaches to the creation of artificial sensitive skin. Extensive descriptions of various sensor types can be found in Dario et al. [10], De Rossi and Scilingo [11] and Cutkosky et al. [8]; a thorough review of the state of the art in robot tactile sensing is given by Dahiya et al. [9].

The interpretation of touch in robotics is a vast, unresolved research area that will play a crucial role in the further development of human-robot interaction (HRI). A robot that is able to “feel,” “understand,” and respond to touch in accord with human expectations could lead to more meaningful and intuitive HRI.

Research in the area of tactile HRI [5, 28, 29, 37, 44, 61, 62, 65, 66] has typically focused on methods for identifying touch modalities—for example, “tap,” “pat,” and “stroke.” The location of touch on the body of a robot has also been used to differentiate between touch modalities and what could be called socially-loaded touch [7, 36, 46, 67]. Socially-loaded touch is defined here as touch that contains some hidden social implication which cannot be interpreted unambiguously without additional information, such as the location of the touch on the body or the social context of the interaction. The interpretation of emotions and other social messages has mainly focused on facial expression [3, 16], acoustic information [47], audiovisual data [4] and physiological signals [19, 41]. As far as we are aware, the interpretation of affective touch for robotics—and in particular through an artificial sensitive skin—has not been widely studied. For extensive surveys of HRI, long-term social HRI and tactile HRI the reader is referred to [18, 39] and [2].

In our previous work [59, 60] we developed a large-scale, flexible and stretchable touch-sensitive artificial skin for robotics based on the principle of electrical impedance tomography (EIT). This skin, which can be used to extract information such as location, duration and intensity of touch, was used to cover flat [61] and three-dimensional structures such as a full-sized artificial arm [62]. Furthermore, in [61, 62] we demonstrated that tactile information extracted from the EIT-based artificial skin, together with a ‘LogitBoost’ classification algorithm, can be successfully used to differentiate between multiple touch modalities commonly used by humans during tactile interaction.

This paper extends our previous findings and concentrates on the interpretation of social touch through the classification, using a supervised LogitBoost algorithm, of a set of messages and emotions communicated via touch from humans to the artificial arm used in [62]. The experimental results reported here demonstrate that an artificial arm covered with an EIT-based sensitive skin, together with attributes such as location, duration, and intensity of touch, can be used to classify social touch at better-than-chance levels and with accuracies comparable to those achieved by human touch recipients. A correlation between different touch modalities, attributes extracted from tactile stimuli, and the social messages they communicate is presented. Differences in the classification accuracies of participants depending on their gender and cultural background are discussed.

The remainder of the paper is organised as follows. An overview of tactile communication is presented in Sect. 2. Section 3 then introduces the field of touch in social HRI. Section 4 briefly describes EIT and how it was used to create a touch-sensitive skin that covers an artificial arm. Section 5 describes data preprocessing and classification. The experiments are then described in Sect. 6, followed by discussion and conclusions in Sects. 7 and 8.

2 Tactile Communication

The interpretation of touch between humans is highly complex. Early work on social interaction [24, 25] demonstrated that humans extract important information from tactile stimuli that helps them to understand the interaction. Influencing the interpretation of touch are factors such as the modality of the touch (e.g. pat, push, scratch, etc.), the location of the touch on the body, the gender [26, 33] and cultural backgrounds [43] of the two people touching, and the content and prosody of any concurrent speech [31].

It is almost impossible not to respond to touch, and communication by touch is so powerful that misinterpretation of intentions is potentially harmful. The location of touch, for example, can be divided into two classes: “non-vulnerable” body parts such as the hands, arms, shoulders and upper back, and “vulnerable” body parts such as the head, neck, torso, lower back, buttocks, legs, thighs and feet [33]. In general, the more a touch is seen as an invasion of privacy, the less positive—loving, pleasant and friendly—it is rated to be [26].

Additionally, the congruence between a touch, the context of the interaction in which the touch occurs and the social intimacy of the people involved in the interaction are all significant factors in the interpretation of the touch and the touch recipient’s psychological receptivity to the touch [26]. For example, in some cultures (e.g. Europe, North and South America), a pat on the buttocks is acceptable between members of a sporting team after a good play, but it could be considered sexual in more intimate interactions [75]. A pat on the head is often interpreted as condescending, whereas a pat on the back is typically used to signify congratulations or condolence [27]. The effect of gender is also important. According to Heslin et al. [26], women derive the primary meaning of a touch from their relationship to the other person, while men most significantly define the meaning of a touch by the other person’s gender.

Recent studies have demonstrated that touch also communicates emotions, and humans have the ability to distinguish between different emotions transmitted through touch alone [22, 23]. Anger, for example, can be characterised by a touch of short duration and moderate-to-strong intensity, such as pushing and shaking, whilst sadness is associated with a light touch of moderate duration, such as nuzzling or hugging.

Although the aim of touching during human interaction is to communicate messages rather than to transmit touch modalities, human descriptors of touch commonly invoke touch modalities. In other words, it is easier to understand that anger is transmitted by “pushing and shaking” than by a “short duration touch of moderate-to-strong intensity.” After all, what does moderate-to-strong intensity really mean?

The term “touch modalities” as it is used here refers to the basic form of touch, in which a tactile gesture is differentiated only by the underlying characteristics of the touch itself. Consequently, touch modalities are characterised by attributes such as intensity, movement and duration. The interpretation of social touch, however, is a more complex process than the identification of a touch modality. Like modality identification, it begins with the basic characteristics of the tactile gesture, but it instead aims to “understand” the intended meaning behind the touch; that is, it considers attributes such as body location and context to provide a more accurate and practical interpretation of the touch. This process would, for example, allow one to discriminate between a condescending pat and a congratulatory pat by considering which body part the pat is applied to [27].

Although different touch modalities are commonly related to some specific social messages (e.g. stroke with affection, push with rejection), the authors believe that the interpretation of social touch does not strictly require a prior or partial interpretation of touch modality, as suggested in Fig. 1. After all, the true aim is to understand the message, not the modality used to transmit it.

Fig. 1 Conceptual diagram of the interpretation of social touch by a robot, as envisaged by the authors. Discontinuous arrows indicate the subordinate role of touch modality interpretation in the interpretation of social touch. This figure suggests that the interpretation of social touch does not necessarily require a prior interpretation of touch by modality

It is evident that humans extract information from tactile stimuli that helps them to interpret social touch. In robotics, it is important to design a method for touch identification and interpretation that allows for both natural and intuitive interactions with humans.

3 Touch in Social HRI

Even in its early stages of development, sensing and interpretation of intuitive touch have been shown to play an important role in HRI, where robots such as Paro the seal [56–58, 70, 71] and the child-sized robot KASPAR [49–51] have provided significant physical and mental improvements to child and adult patients. Furthermore, interactive humanoid robots with tactile sensing capabilities illustrate the possibility of using robots to improve daily life; from Robovie [34] working as a tutor in classrooms, to Robonaut 2 [12] collaborating side-by-side with humans in the International Space Station. Although these examples demonstrate that interpretation of tactile stimuli would be a useful tool in HRI, methods for tactile sensing and touch interpretation—generally based on machine learning algorithms—are still far from perfect.

Iwata and Sugano [28, 29], for example, used a modified counter-propagation algorithm to classify ten touch modalities while Stiehl and Breazeal [65] and Stiehl et al. [66] used artificial neural networks to classify eight touch modalities. In both experiments touch was transmitted only by a single individual. The k-nearest neighbour algorithm was used by Naya et al. [44] to classify five touch modalities transmitted by eleven subjects; temporal decision trees were used by Koo et al. [37] to classify four touch modalities from 12 subjects while the present authors used a LogitBoost algorithm to classify nine modalities transmitted by 40 individuals [62].

Knight et al. [36], on the other hand, used touch information from local sensors combined with the location of touch on the robot’s body to differentiate between socially-loaded touch (e.g. hug, head pat, foot rub, slap cheek etc.) and touch modalities. Although the work approached the interpretation of social touch, only socially-loaded touches that were clearly not related to any social message were considered. Similar work, where touch modality and body location were used to classify socially-loaded touch, was presented by Taichi et al. [67].

Furthermore, Cooney et al. [7] extended the classification of socially-loaded touch to a full-bodied robot by using vision, in addition to touch, to augment the classification. Classification of touch was done using a support vector regression algorithm. In addition, and through participants’ descriptions, the cited work evaluated different levels of affection typically conveyed by 20 different socially-loaded tactile gestures.

Noda et al. [45, 46] proposed a method for the classification of touch based on scenarios such as “let’s shake hands;” “give me a hug;” “I wish you’d pat me on the head;” “hello” and “what’s your name?” The robot in this research was controlled using the “Wizard of Oz” methodology [35] while it interacted with humans. Touch features were based on the cross-correlation of data from discrete sensors distributed over the robot’s body. Although this work closely relates to the interpretation of social touch (e.g. “hello”) it is intermixed with socially-loaded touch (e.g. “I wish you’d pat me on the head”). Furthermore, class labels were assigned on the basis of how the robot approached—as the touch initiator—and spoke to the participants. It is believed that this behaviour could have strongly influenced the way that participants transmitted touch.

Yohanan and MacLean [72–74] studied the interpretation and display of affective touch through the artificial Haptic Creature. In [74] Yohanan and MacLean surveyed people to evaluate the likelihood that each of thirty tactile gestures would be used by the participants to communicate nine specific emotions to the robotic creature. Each participant then transmitted their most likely touch gestures to the Haptic Creature whilst imagining feeling the emotion that they associated with the touch gesture. The location and pressure of touch were manually encoded second-by-second. This study provided extensive information about the tactile gestures that were used to communicate emotions to a pet-like robot.

4 EIT-based Artificial Sensitive Skin

EIT [40, 69] is a non-invasive imaging technique used to estimate the internal conductivity distribution of an electrically conductive body by taking measurements from electrodes attached only at the boundary of the body. If the conductivity in a region of the body changes, the current distribution also changes, and EIT can be used to quantify these changes. In this method, electrodes are typically located at the border of a thin conductive material that changes its conductivity when pressure is applied. Conductive rubbers, foams, fabrics, etc. can be used as the medium. EIT then allows changes in resistance—and therefore pressure—across the sheet to be determined. The essence of EIT is to inject a known current into the electrically conductive medium using a pair of boundary electrodes, while taking potential measurements at the remaining electrodes. By rapidly scanning this pattern of current injection and potential measurement around the electrodes, it is possible to calculate the approximate conductivity distribution inside the medium through inverse solution of Maxwell’s equations.
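For reference, the forward problem underlying EIT is the low-frequency reduction of Maxwell’s equations to a generalised Laplace equation; in the notation used here purely for illustration (it is not drawn from the cited works),

\[ \nabla \cdot \big( \sigma(\mathbf{x})\, \nabla u(\mathbf{x}) \big) = 0 \quad \text{in } \Omega , \]

where \(\Omega\) is the conductive sheet, \(\sigma\) its conductivity distribution and \(u\) the electric potential. The currents injected and potentials measured at the boundary electrodes provide the data from which changes in \(\sigma\) are estimated.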

Since most of the sensing area is made of thin materials without any internal wiring, it is possible to create large, flexible and stretchable ‘skins’ of arbitrary shapes. Furthermore, as the response of the system depends only on the variable-conductance material used, materials sensitive to different types of excitation, such as pressure or temperature, could be used. An EIT-based sensitive skin has the potential to be a low-cost and easy-to-manufacture solution to the problem of large-scale sensing. For the current application only pressure sensing was considered.

A major disadvantage of EIT-based sensors, as compared with other pressure/force sensing technologies, is their poor spatial resolution. However, considering that during social interaction humans have the ability to understand messages transmitted via touch regardless of the sensory limitations in some areas of human skin—for example, lower spatial resolution in arms, back and stomach [38]—it was assumed by the authors that high spatial resolution is not an absolute requirement.

For this work, our previous configuration of an artificial arm covered with an EIT-based skin was used [60–62]. The reader is referred to the cited work for more detailed information about EIT forward and inverse solutions, hardware and software implementations, and how they were used to realise a touch-sensitive skin.

To provide a more natural and intuitive touch environment, the sensitive skin was shaped and mounted on to the forearm and upper arm of a full-size fibreglass mannequin, as shown in Fig. 2. The use of a thin layer of polyurethane foam under the EIT-based sensitive skin provides a degree of mechanical compliance and damping that serves to attenuate transients when the artificial skin is touched. A soft suede fabric was placed on top. An arm was selected for these experiments as it provides a “moderate” or “very pleasant” zone [26] when touched by participants of either gender.

Fig. 2 a Dimensions of the artificial skin with 19 circular electrodes. Step-by-step assembly of the skin layers on top of a mannequin arm: b polyurethane foam, c EIT-based sensitive skin (black) and soft suede fabric and d finished arm

The mannequin arm was then elastically mounted with freedom to rotate about an axis normal to the plane of Fig. 3 and to translate sideways in that figure. The mounting incorporated rubber as an elastic and damping element. A Tekscan, Inc. FlexiForce \(^{\circledR }\) sensor was placed inside the shoulder joint to measure the magnitude and direction of whole-arm movement, principally parallel to the translatory degree of freedom. This sensor emulates, in a very simple way, the proprioceptive sensing of muscular effort that occurs in a human arm.

Fig. 3 Shoulder joint sensor and artificial arm’s shoulder joint. The FlexiForce sensor is located flat under the black rubber and cannot be seen

A user interface for experiment control was written in LabVIEW \(^{\circledR }\). Data acquisition was achieved via an ADLINK Technology Inc. board, FEM meshes were generated using DistMesh [48], and inverse solution and image reconstruction were done in MATLAB\(^{\circledR }\).

5 Preprocessing and Classification Methods

This section introduces the steps followed to transform the raw data obtained during experimentation (Sect. 6) into an appropriate format that could be used for touch classification. The classification algorithm is also introduced.

5.1 Data Filtering

Since this application is required to work in real time, data preprocessing began at the data collection stage, where potential measurements were low-pass filtered and amplified in hardware before acquisition. Noise was further reduced by over-sampling the signals by a factor of 10, a factor that achieved a good compromise between noise reduction and sampling rate. Data collection was achieved at approximately 40 Hz after over-sampling.
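One common way to exploit an over-sampling factor of 10 is to average each block of ten consecutive raw readings per channel. The sketch below illustrates this interpretation only; array shapes and names are chosen for illustration and do not reproduce our acquisition software.

import numpy as np

OVERSAMPLE = 10  # over-sampling factor described in the text

def downsample_by_averaging(raw, factor=OVERSAMPLE):
    """Average consecutive blocks of `factor` samples to reduce noise.

    raw: array of shape (n_samples, n_channels) of filtered, amplified
         potential measurements. Returns (n_samples // factor, n_channels).
    """
    n_blocks = raw.shape[0] // factor
    trimmed = raw[:n_blocks * factor]
    return trimmed.reshape(n_blocks, factor, raw.shape[1]).mean(axis=1)

# Example: 400 raw readings on 285 channels -> 40 averaged frames (about 1 s at 40 Hz)
frames = downsample_by_averaging(np.random.randn(400, 285))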

5.2 EIT Inverse Solution

The EIT inverse solution is needed to find the distribution of conductivity changes inside the conductive domain that corresponds to measurements of electrode potentials. Difference imaging [1] was used for the inverse solution.

Unlike our previous work [59–62], where the simplified point electrode model was used to solve the EIT problem, here the complete electrode model with generalised Tikhonov regularisation, as described in [6, 69], was used. This model considers the existence of a discrete number of electrodes of finite size, the shunting effect of a conductive electrode, and the potential drop due to the electrode’s contact impedance. A total of 729 elements connected by 448 nodes were used for the FEM mesh required for the forward solution. Since all experiments were performed with the artificial skin fixed onto the rigid arm, skin deformation and electrode movement were insignificant and any consequential conductivity changes [64] were ignored. This configuration results in a spatial resolution of approximately 10–15 % of the characteristic dimension of the unrolled artificial skin (Fig. 2a).
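For reference, the linearised difference-imaging step with generalised Tikhonov regularisation can be written as follows; the notation is ours, and choices such as the prior matrix and regularisation parameter follow [6, 69] rather than being restated here:

\[ \delta\hat{\boldsymbol{\sigma}} = \arg\min_{\delta\boldsymbol{\sigma}} \left\{ \lVert \mathbf{J}\,\delta\boldsymbol{\sigma} - \delta\mathbf{v} \rVert^{2} + \lambda^{2} \lVert \mathbf{R}\,\delta\boldsymbol{\sigma} \rVert^{2} \right\} = \left( \mathbf{J}^{\mathsf{T}}\mathbf{J} + \lambda^{2}\mathbf{R}^{\mathsf{T}}\mathbf{R} \right)^{-1} \mathbf{J}^{\mathsf{T}}\,\delta\mathbf{v} , \]

where \(\delta\mathbf{v}\) is the change in boundary potentials relative to a reference (untouched) frame, \(\mathbf{J}\) the Jacobian of the complete electrode forward model evaluated on the FEM mesh, \(\mathbf{R}\) the regularisation matrix and \(\lambda\) the regularisation parameter.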

5.3 Data Segmentation

The procedure followed was to consider a full touch interaction from the beginning of a tactile stimulus until its end. The beginning and end of each touch were defined by the intensity of a signal from either the artificial skin or the shoulder joint sensor rising above (or falling below) a pre-defined threshold. A variable-size window was generated to mark the beginning and end of each touch.
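A minimal sketch of this thresholding is given below, assuming per-frame intensity signals from the skin and the joint sensor; the names and the simple hysteresis-free logic are illustrative rather than the exact implementation used in this work.

import numpy as np

def segment_touches(skin_intensity, joint_intensity, threshold):
    """Return (start, end) frame indices of individual touches.

    A touch begins when either signal rises above `threshold` and ends
    when both fall back below it, giving a variable-size window.
    """
    active = np.maximum(np.abs(skin_intensity), np.abs(joint_intensity)) > threshold
    windows, start = [], None
    for i, on in enumerate(active):
        if on and start is None:
            start = i                      # beginning of a touch
        elif not on and start is not None:
            windows.append((start, i))     # end of the touch
            start = None
    if start is not None:                  # touch still active at the end of the record
        windows.append((start, len(active)))
    return windows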

5.4 Feature Extraction

In a sensitive skin, it is sensible to use features analogous to those used when evaluating the human sense of touch. Four attributes are proposed as a foundation for the features needed to classify different types of touch.

1. Pressure intensity is the most obvious manifestation of touch. The maximum and minimum pressure intensity values over the surface of the skin were used as features; these values correspond to the maximum and minimum conductivity changes that occur over all finite elements in the mesh. In addition to the conductivity changes in the artificial skin, two independent features were used to encode the magnitude and direction in which the artificial arm—as a whole—is moved. These features were taken as the maximum and minimum potential changes in the joint sensor over the duration of the touch.

2. Touch location is equivalent to the system’s ability to locate accurately the centroid of a touch. Only those elements in the reconstructed image containing at least 75 % of the maximum amplitude were considered. Touch location was evaluated by computing the coordinates of the centroid of an individual stimulus. Location was encoded by the \(x\) and \(y\) axis values with their origin at the centre of the elbow, between electrodes 6 and 15 in Fig. 2. The total distance from the initial to final locations of a touch was also used as a feature.

3. Area of contact refers to the fraction of the area in contact between two objects and provides information about pressure distributions on the surface of the skin. Contact area was computed by evaluating two features. The first feature is the spatial resolution at the instant at which the maximum pressure intensity occurs; here, the spatial resolution was evaluated as the fraction of elements in the reconstructed image containing at least 50 % of the maximum amplitude when a single stimulus is applied. The second feature is the mean of the spatial resolutions over the duration of the touch.

4. Temporal information refers to changes in the touch stimulus applied to the skin over the time of contact. Four features based on temporal information were used: touch duration, maximum rate of positive intensity change, maximum rate of negative intensity change and touch count. Touch duration was evaluated as the time taken from the beginning to the end of the touch. The maxima of positive and negative intensity changes were computed by evaluating the difference in intensity between the elements in the reconstructed image at successive time steps. The positive intensity rate is thus the rate of intensity change as pressure is applied, while the negative intensity rate is the rate of change as pressure is released. Finally, touch count refers to the number of sequential touches that constitute a particular message.

Altogether, 13 features based on the four attributes presented above were used for classification; they are listed in Table 1. These features were selected from a list of approximately 20 candidate features using the information gain attribute evaluation technique as implemented in WEKA, the Waikato environment for knowledge analysis [21]. This technique measures the relative contributions of various features to a classification problem and so ranks them in order of importance. The same approach was used successfully, with similar results, in [62] during the classification of touch modalities. The thirteen features in Table 1 were selected using aggregate data from the experiments in [62] and the current experiments. If multiple touches were used to communicate a message, the average of their feature values was used for classification.
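To make the attribute-to-feature mapping concrete, the sketch below assembles a subset of the Table 1 features from a sequence of reconstructed conductivity-change images (one vector of element values per frame of a segmented touch). The 75 % and 50 % amplitude cut-offs follow the text; the element coordinates, frame rate and function names are illustrative assumptions rather than our exact implementation.

import numpy as np

FRAME_RATE = 40.0  # Hz, approximate acquisition rate after over-sampling

def extract_features(frames, element_xy):
    """Compute a subset of the touch features from one segmented touch.

    frames:     (n_frames, n_elements) conductivity changes per mesh element
    element_xy: (n_elements, 2) centroid coordinates of each element,
                with the origin at the centre of the elbow
    """
    max_intensity = frames.max()
    min_intensity = frames.min()

    # Centroid of the touch, using elements above 75 % of the peak amplitude
    peak_frame = frames[frames.max(axis=1).argmax()]
    mask75 = peak_frame >= 0.75 * peak_frame.max()
    centroid = element_xy[mask75].mean(axis=0)

    # Contact-area feature: fraction of elements above 50 % of the peak amplitude
    area_ratio = (peak_frame >= 0.5 * peak_frame.max()).mean()

    # Temporal features: duration and rates of intensity change
    duration = frames.shape[0] / FRAME_RATE
    rates = np.diff(frames.max(axis=1)) * FRAME_RATE
    max_pos_rate = rates.max() if rates.size else 0.0
    max_neg_rate = rates.min() if rates.size else 0.0

    return np.array([max_intensity, min_intensity, centroid[0], centroid[1],
                     area_ratio, duration, max_pos_rate, max_neg_rate])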

Table 1 List of attributes and corresponding features used for touch classification with the artificial arm

5.5 Classification with LogitBoost

In the present work, the “LogitBoost” classifier [17] with decision stumps as a base (weak) learner was used. This classifier has previously been used by the authors for classification of touch modality on flat [61] and three-dimensional surfaces [62]. Additional information about the algorithm can be found in the cited publications.
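For readers unfamiliar with the algorithm, the sketch below implements two-class LogitBoost with regression stumps in the spirit of [17]. It is a simplified illustration only: the WEKA implementation used in this work handles multiple classes directly, and the class and parameter names here are ours.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

class BinaryLogitBoost:
    """Two-class LogitBoost with decision stumps (after Friedman et al. [17])."""

    def __init__(self, n_rounds=100):
        self.n_rounds = n_rounds
        self.stumps = []

    def fit(self, X, y):
        # y must contain 0/1 class labels
        F = np.zeros(len(y), dtype=float)
        for _ in range(self.n_rounds):
            p = 1.0 / (1.0 + np.exp(-2.0 * F))
            w = np.clip(p * (1.0 - p), 1e-10, None)     # working weights
            z = np.clip((y - p) / w, -4.0, 4.0)         # working response, clipped for stability
            stump = DecisionTreeRegressor(max_depth=1)  # decision stump as weak learner
            stump.fit(X, z, sample_weight=w)
            F += 0.5 * stump.predict(X)
            self.stumps.append(stump)
        return self

    def predict_proba(self, X):
        F = sum(0.5 * s.predict(X) for s in self.stumps)
        return 1.0 / (1.0 + np.exp(-2.0 * F))

    def predict(self, X):
        return (self.predict_proba(X) >= 0.5).astype(int)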

WEKA was used for classification. All steps from the inverse solution to classification were carried out off-line. If the off-line preprocessing and classification steps were performed on-line, the complete process of data collection, preprocessing and classification would have executed at approximately 36 Hz.

6 Experiments

This section describes the experiments that were performed to evaluate the ability of the system to distinguish between a number of emotions and social messages that are commonly transmitted by humans during social interactions. All experiments were conducted in the same setting as described in our previous work on the interpretation of touch modalities [62]. That is, a private experimental room was divided in two by an opaque curtain, with the artificial arm protruding through it (Fig. 4). A table was placed such that half of it was on each side of the curtain.

Fig. 4 Human and artificial arms showing a male participant interacting with the arm of a human touch recipient. Both arms project through holes in an opaque curtain located on the far right of the image. Models were used to protect the confidentiality of experiment participants

Potential measurements from the EIT-based artificial skin were taken from 17 boundary electrodes referenced to two internal electrodes using the 4th injection pattern [60]. The electrical potential was not measured at electrodes carrying injected current, giving a full set of 285 independent voltage measurements at each acquisition time step. Potential measurements from the shoulder joint sensor were taken via a voltage divider connected directly to the data acquisition hardware. Data from the artificial skin and the shoulder joint sensor were acquired in parallel and at the same sampling rate.

In addition to a “no touch” gesture, the six basic emotions proposed by Ekman and Friesen [14] and described to the participants as “anger,” “fear,” “happiness,” “sadness,” “disgust” and “surprise” were studied. These emotions were selected because they have previously been used to classify emotions transmitted by facial expressions [3], speech [47], physiological signals [19] and even touch between humans [22, 23]. Furthermore, six social messages described to the participants as “attention-getting,” “greeting,” “acceptance,” “rejection,” “affection” and “animosity” were incorporated. These messages were centred on Heslin’s functional-to-friendship categories [24]; Jones’ positive, control and ritual touches [31]; and Guerrero’s negative touch [20]. More playful, intimate, and sexual categories of touch, as defined in [31, p. 298 et seq.], were not considered because they require a higher level of intimacy than could be expected during experiments such as those reported here. The “no touch” gesture was necessary to allow for very soft touches that the system could not detect.

To minimise any biasing effects, all instructions were provided to each participant—using the same experimenter script—at the beginning of the session. To reduce the potentially confounding variation between participants of different cultural backgrounds, a definition of each emotion and message was given to participants in addition to the word descriptor. None of the participants involved in these experiments had taken part in our previous work or interacted with the artificial arm before.

It was hypothesised that the gestures used to communicate emotions and social messages would be consistent across repetitions by a single individual but quite variable across a range of individuals. Two experiments were designed to evaluate these two hypothesised dependencies. All experimental sessions were video recorded for post hoc verification.

6.1 Experiment One: Touch from a Single Individual

The objective of this experiment was to determine the accuracy of the classifier for social touch transmitted to the artificial arm by a single individual. Two male participants, both students from Latin America, were recruited from the Faculty of Engineering and Information Technologies at the University of Sydney.

Participants entered the room one by one and were assigned the role of touch transmitter, while the artificial arm was used as the touch recipient. After a brief introduction to the experiment, participants were given 10 min to think about ways that they might communicate each social message and emotion solely by touching an arm, from the shoulder to the wrist. After taking a seat in front of the artificial arm, a practice session with five randomly-selected emotions and/or messages was conducted to confirm the participant’s clear understanding of the experiment instructions.

Participants then had no further contact with the experimenter until the end of the experiment. Prompted by a simple user interface on a computer screen, participants were instructed to convey the message or emotion displayed as a word on the screen. Each word—both emotions and social messages—was displayed 50 times in a random order, giving a total of 650 touch samples per participant. The entire session took each participant approximately 50 min to complete. To aid the classification process, all data samples were labelled automatically with their correct class by the software immediately after data acquisition.

To reduce possible variability in the results, a 10-fold cross-validation technique [68] was used to assess the classification accuracy. This technique randomly divides the full data set into 10 mutually exclusive subsets of the same size. The classifier is trained with nine subsets and tested with the one remaining subset. The training and testing process is then repeated until all ten subsets have been used for classification. The average accuracy across all subsets provides an estimate of the accuracy rate of the classifier. To provide replicability when comparing classifiers, ten 10-fold cross validations were performed using different randomly created subsets of the data.
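This validation scheme can be expressed compactly with off-the-shelf tools. The sketch below uses scikit-learn’s repeated stratified 10-fold splitter with a gradient-boosting classifier standing in for LogitBoost (which scikit-learn does not provide), so it mirrors the procedure rather than the exact WEKA setup; the data arrays are placeholders, not the experimental data.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# X: (n_samples, 13) feature vectors, y: class labels (placeholders shown here)
X = np.random.randn(650, 13)
y = np.random.randint(0, 13, 650)

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
clf = GradientBoostingClassifier()  # stand-in for LogitBoost with decision stumps
scores = cross_val_score(clf, X, y, cv=cv)
print(f"mean accuracy over ten 10-fold cross-validations: {scores.mean():.2%}")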

To evaluate the accuracy of the classification, results are presented as averaged confusion matrices in Table 2. The first column in each matrix lists the actual class while the first row gives the predicted class. As shown in this table, the LogitBoost algorithm performed better for touch from Participant A, with all classes correctly classified and an overall accuracy of approximately 95 %. These results confirm the ability of the system, and the chosen features, to discriminate between different gestures transmitted by humans via touch. Furthermore, they suggest that relative consistency and a well-defined pattern—formed from a combination of the extracted features—exist in the tactile gestures transmitted by this participant. On the other hand, Table 2b shows the misclassification of up to 51 % of the message acceptance as greeting, which suggests that similar gestures were used by Participant B to transmit these two different messages. A summary of the classification accuracies for the two touch transmitters is presented in Table 3.

Table 2 Averaged confusion matrices showing percentage accuracy of LogitBoost classification of touch from a single touch transmitter, either Participant A or B. The first column in each matrix lists the actual class while the first row gives the predicted class
Table 3 Mean and standard deviations (in parenthesis) of percentage accuracy of a LogitBoost classification of touch from a single touch transmitter, either Participant A or B, averaged over all emotions and social messages respectively

Finally, classifying data from Participants A and B against each other provides additional information about associations that may exist between the tactile gestures used by the two participants during the communication of emotions and social messages. These associations were obtained by using the LogitBoost algorithm trained with data from Participant A to classify data from Participant B, and vice versa. Results showing these two-way associations, in the form of confusion matrices of average classification accuracies, are given in Table 4. Average accuracies for the classification of emotions and social messages are 32 % and 51 % respectively.
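A sketch of this two-way association is given below, assuming that feature matrices and labels for each participant are already available; the function names are ours, the gradient-boosting classifier again stands in for LogitBoost, and the averaging of the two directions follows the description above.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix

def cross_participant_confusion(X_a, y_a, X_b, y_b, labels):
    """Train on one participant, test on the other, in both directions,
    and average the row-normalised confusion matrices."""
    def one_way(X_train, y_train, X_test, y_test):
        clf = GradientBoostingClassifier().fit(X_train, y_train)  # LogitBoost stand-in
        cm = confusion_matrix(y_test, clf.predict(X_test), labels=labels)
        return cm / np.maximum(cm.sum(axis=1, keepdims=True), 1)
    return 0.5 * (one_way(X_a, y_a, X_b, y_b) + one_way(X_b, y_b, X_a, y_a))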

Table 4 Averaged confusion matrices showing percentage accuracy of LogitBoost classification of touch from one participant, when the algorithm is trained with data from Participant A and tested by classifying touch from Participant B, and vice versa. Accuracies are averaged over both participants. The first column in each matrix lists the actual class while the first row gives the predicted class

With the exception of the three social messages attention, rejection and animosity, Table 4 shows a significant decrease in the classification accuracies of both participants relative to those shown in Table 2. These three messages were communicated in similar ways by both participants, whereas the remaining emotions and messages were transmitted in significantly different ways by the two participants. For example, Participant A used higher pressure intensities to convey anger, while Participant B used higher intensities during the communication of fear.

The information gain attribute evaluation as implemented in WEKA was used to rank the features in order of importance. The evaluation indicates that the features that contributed the most during all classifications were location of the touch on the \(x\) axis, touch duration, maximum rate of positive intensity change and maximum pressure intensity value. The least contributors were minimum potential value in the joint sensor and touch count.

The following experiment extends these results by analysing touch from a larger number of individuals.

6.2 Experiment Two: Touch from Multiple Individuals

This experiment aimed to determine the accuracy of the classifier for the same social messages and emotions transmitted by a range of individuals. To provide comparative data for evaluating the classification system against the classification accuracy of individual human touch recipients, a set of control experiments was also conducted in which participants touched a human arm, with the touch recipient acting as a human classifier of social touch. This experiment was therefore divided into two parts: one used the artificial arm as the touch recipient while the other used a human arm to receive touch.

A total of 42 individuals (twenty-one pairs) took part in the experiment: 29 males and 13 females aged between 20 and 50 years. Volunteers originated from nine different countries within Australasia (38 %), Europe (5 %), Latin America (52 %) and the Middle East (5 %), and identified with six different religious beliefs. Participants were recruited through email lists and word of mouth, and were randomly paired into twenty unacquainted same-gender pairs and one mixed-gender couple. All participants were well-educated, with most having a university degree. The sexual orientation of participants was not recorded. Same-gender, unacquainted pairs were preferred in an attempt to reduce possible biasing effects due to participant discomfort and/or personal relationships during experimentation. Consequently, the data obtained from the mixed-gender couple were removed from the control (human classifier) experiments. All data were considered for the LogitBoost classification.

Following a similar procedure as in the first experiment (Sect. 6.1), two participants entered the room and took seats on opposite sides of the curtain. After a brief introduction to the experiment, one participant was randomly assigned the role of touch transmitter while the other became the touch recipient. From that moment, participants were not allowed to see or talk with each other until the end of the experiment. Participants were encouraged to act naturally and to perform as similarly as possible when touching the human and artificial arms.

First, the individual assigned the role of touch transmitter was instructed to convey each message or emotion displayed on the computer screen to the artificial arm. Each message and emotion was displayed five times in random order, giving a total of 2,730 touch samples over 42 participants.

In the second part of the experiment, immediately after the sequence of touches to the artificial arm, the individual assigned the role of touch recipient introduced their uncovered left arm through the hole in the dividing curtain (Fig. 4). Following the same procedure as with the artificial arm, the touch transmitter was again instructed to convey each message and emotion displayed on the computer screen five times to the human recipient’s uncovered arm, giving a total of 2,520 touch samples over 42 participants. The touch recipient performed touch classification immediately after each touch. The “no touch” gesture was removed from this part of the experiment, as it was clear to the touch recipient whether or not their arm had been touched. For this part of the experiment, both the user interface and the response sheet used by the touch recipient were arranged to first display all emotions followed by all social messages. This configuration was intended to simplify the classification by “forcing” the human touch recipient to select only emotions (or messages) when an emotion (or message) had been transmitted. To reduce forced-choice effects [53], the modified forced-choice scale presented in [15] was adopted; that is, a “none” option, described as “none of the six options presented,” was incorporated in the response sheet.

Finally, the pair of participants exchanged roles with the touch recipient becoming the touch transmitter and vice versa. The experiment was repeated with the same conditions, although the touch was transmitted first to the human arm and then to the artificial arm. This change was made to minimise any effects caused by the touch transmitter adapting to the experimental setup. As a result, one half of the participants started the experiment touching a human arm. The session took approximately 45 minutes to complete.

Unlike Experiment One, in which a 10-fold cross-validation technique was used to assess the LogitBoost classification accuracy, in Experiment Two cross-validation was performed across all 42 touch transmitters so that variations in the classification could be assessed by touch transmitter. The LogitBoost classifier was trained with the data from 41 participants and tested with touch data from the one remaining participant, and this training and testing process was repeated until touch from all 42 participants had been classified. Consequently, neither the algorithm nor the human touch recipient was trained with data from the touch transmitter whose touch they were attempting to classify: the human’s “training” is through life experience of touch, yet the human had no experience of touch from the particular touch transmitter whose touch they were classifying, while the LogitBoost algorithm was trained only with data from the 41 other touch transmitters. The average accuracy across the ensemble of all touch transmitters provides an estimate of the accuracy rate of each classifier.
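This leave-one-transmitter-out scheme corresponds to grouped cross-validation; a minimal sketch using scikit-learn’s LeaveOneGroupOut, again with a gradient-boosting stand-in for LogitBoost and placeholder arrays rather than the experimental data, is:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# X: features, y: labels, groups: ID of the touch transmitter for each sample
X = np.random.randn(2730, 13)           # placeholder feature vectors
y = np.random.randint(0, 13, 2730)      # placeholder class labels
groups = np.repeat(np.arange(42), 65)   # 42 transmitters x 65 samples each

scores = cross_val_score(GradientBoostingClassifier(), X, y,
                         cv=LeaveOneGroupOut(), groups=groups)
print(f"mean accuracy across held-out transmitters: {scores.mean():.2%}")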

Averaged confusion matrices for the LogitBoost algorithm and the human classifier (Table 5) show that, with the exception of the human classification of greeting, all emotions and messages were successfully classified at above-chance levels; a random guess would be correct in one of seven trials, or 14 %. The lowest results were obtained during the classification of acceptance by the LogitBoost algorithm, with only 27 % of samples correctly classified. These results also compare favourably with the accuracies reported for humans by Hertenstein et al. [22], where participants correctly identified anger (57 %), fear (51 %), happiness (30 %), disgust (63 %) and surprise (24 %); in those experiments sadness was not accurately identified. The 100 % accuracy in human classification of the “no touch” gesture reflects our assumption that each touch recipient could reliably tell whether their arm had been touched; this gesture was included to maintain consistency when comparing the human and LogitBoost classifiers.

Table 5 Averaged confusion matrices showing percentage accuracy of LogitBoost and human classification of touch from multiple touch transmitters. Human classification is averaged over multiple touch recipients. The first column in each matrix lists the actual class while the first row gives the predicted class

A summary of the classification accuracies for touch transmitted by all 42 individuals is presented in Table 6. Note that human classification accuracy for the mixed-gender couple (not considered in the overall human classification accuracies) was 66 % for emotions and 69 % for social messages, slightly higher than the averaged result of all participants. Furthermore, the results presented in Table 6 show that, in general, classification of social messages provided better results than classification of emotions for both LogitBoost and human classifiers.

Table 6 Mean and standard deviations (in parenthesis) of percentage accuracy of touch from multiple touch transmitters averaged over all emotions and social messages respectively. Standard deviations represent the dispersion of accuracies between touch transmitters

An evaluation of features based on their information gain indicates that the features that contributed the most during the classification of both messages and emotions were all based on pressure intensities: maximum rate of negative intensity change, maximum pressure intensity value, minimum pressure intensity value, and maximum rate of positive intensity change. The least contributors towards classification of emotions were touch count and displacement of touch, while the least contributors for the classification of social messages were spatial resolution and mean of spatial resolutions over the duration of the touch.

7 Discussion

Robotics researchers are working with several technologies to improve the sensing capabilities of artificial skins. It is unclear, however, whether a high-performance artificial skin—with, for example, sub-millimetre spatial resolution and sampling rates over 100 Hz—is necessarily a key factor in improving HRI through touch. The results presented in this paper demonstrate that a relatively low-resolution EIT-based sensitive skin, together with a shoulder joint sensor, can be used to discriminate between different social messages and emotions communicated to an artificial arm. The accuracy of classifying touch transmitted by one individual was as high as 94 % when classified by the LogitBoost algorithm. This accuracy reduced to approximately 50 % for touch transmitted by multiple (42) individuals. In the context of these results, it is interesting to note that the accuracy of human touch localisation on the forearm and upper arm is typically 8–10 mm, and the two-point discrimination threshold is approximately 40 mm.

If we scrutinise the results obtained with multiple individuals, it becomes clear that participants’ individual interpretations of tactile communication and their unique understandings of emotions and social messages come into play. That is, an individual’s understanding is important in determining how they transmit and interpret emotions and messages encoded in touch. At first, participants appeared to be unable to decide how to transmit each message or emotion and changed their tactile gestures over repeated transmissions of the “same” touch. This suggests that participants had to adjust to the idea of transmitting specific messages solely by touching an arm, either real or artificial. Variation in how an individual uses touch to convey an emotion or social message undoubtedly influences the accuracy with which the emotion or social message is classified. Future work should endeavour to quantify this variation within and between individuals, and assess how it influences the interpretation of touch.

Moreover, since emotions were not induced in the participants, participants could have been communicating intentions rather than emotions. Although the communication of emotional intention does not capture the full emotion, it has been previously accepted as a fair approximation in studies of the communication of “emotions” via facial expressions [16], haptic links [63], and touch between humans [22, 23]. Future work should investigate whether the same decision-making process applies, for example, after an emotion is induced in a participant.

In reviewing the experimental results for the human classification of touch it was observed that, similar to our previous work with the classification of touch modalities [62], humans tend to compare different tactile gestures administered by the same person to aid their classification. As a result, the classification accuracies improved as more touch repetitions from the same person were experienced (Fig. 5). A chi-squared test of independence was performed to evaluate the difference in net classification accuracy between the first ten samples and the last ten samples over all participants. Results show that classification accuracy was significantly higher during the participants’ last ten samples both for emotions (\(\chi ^2\) (1,N=800) = 8.096, p \(<\) 0.005) and social messages (\(\chi ^2\) (1,N=800) = 4.62, p \(<\) 0.05). This observation suggests that higher classification accuracies could be expected when touch from a single person is evaluated, as confirmed by the highly accurate results shown in Experiment One (Table 2). Future work should investigate if a similar “learning” process can be realised in robotics by using semi-supervised machine learning instead of the fully-supervised technique used here.
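The chi-squared test of independence reported above compares correct/incorrect counts for the first and last ten samples; a sketch using scipy, with illustrative placeholder counts rather than the study data, is:

from scipy.stats import chi2_contingency

# Rows: first ten samples, last ten samples; columns: correct, incorrect.
# The counts below are placeholders, not the values from the experiments.
table = [[180, 220],
         [220, 180]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}, N={sum(map(sum, table))}) = {chi2:.3f}, p = {p:.4f}")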

Fig. 5 Classification and polynomial trend line of 30 touch gestures (five repetitions of six emotions and six social messages) classified by 40 individuals. Each point represents the number of individuals who performed a correct classification

If we consider the classification results in Experiment Two we notice that the classification of social messages resulted in generally higher accuracies than the classification of emotions for both the human and LogitBoost algorithms. Mistakes in emotion classification were made particularly between emotions such as anger, fear and disgust, which are similar in valence and arousal [52, 55], and messages such as rejection and animosity, which are both negative forms of touch. Considered together, these results suggest that both social messages and emotions occur along a continuum of various dimensions (such as valence), and more concrete interpretation of touch is only possible when the context of the interaction is considered. It is important to remember that in these experiments touch was transmitted only to an arm, and more information could be obtained if touch was transmitted to the whole body, as demonstrated by Hertenstein et al. [23] in their investigation of the communication of emotions between humans.

Furthermore, if the number of samples that were classified as “none” by the human touch recipients (Table 5) is reviewed, we notice a greater number of non-classified emotions than non-classified social messages. This finding is attributed to two possible factors. First, the classification accuracy was higher during the communication of social messages for both the LogitBoost and the human classifiers. Second, emotions were communicated first, followed by social messages. Future research should confirm (or refute) these influences. Note that participants were asked to classify messages and emotions conveyed in random order, and were not informed that messages would be transmitted only after emotions.

7.1 Statistical Analysis

To assess potential differences in the interpretation accuracy between groups of participants, a one-way analysis of variance (ANOVA) was conducted using different groups as the independent variables and the accuracy per participant as the dependent variable. A total of three a priori hypotheses were tested for each data set (human and LogitBoost) using Bonferroni adjusted alpha levels of 0.017 per test (.05/3). For the first test, the independent variables consisted of the following groups: males, females, participant starting as touch transmitter and participant starting as touch recipient (Table 7). No significant differences were found in either human classification of emotions [F(3,76) = 0.01, p = 0.99, \(\eta ^2 <\) 0.001] or social messages [F(3,76) = 0.67, p = 0.57, \(\eta ^2\) = 0.026], nor LogitBoost classification of emotions [F(3,80) = 1.77, p = 0.16, \(\eta ^2\) = 0.062] or social messages [F(3,80) = 0.56, p = 0.65, \(\eta ^2\) = 0.02].
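A sketch of this comparison using scipy’s one-way ANOVA and a Bonferroni-adjusted alpha is given below; the per-group accuracy arrays are random placeholders standing in for the per-participant accuracies, and the group names simply mirror the groups compared in the text.

import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# Placeholder per-participant accuracies for the four groups compared in the text
groups = {
    "male": rng.normal(0.50, 0.08, 29),
    "female": rng.normal(0.50, 0.08, 13),
    "started_as_transmitter": rng.normal(0.50, 0.08, 21),
    "started_as_recipient": rng.normal(0.50, 0.08, 21),
}

alpha = 0.05 / 3   # Bonferroni adjustment for three a priori tests
F, p = f_oneway(*groups.values())
print(f"F = {F:.2f}, p = {p:.3f}, significant at alpha = {alpha:.3f}: {p < alpha}")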

Table 7 Means and standard deviations (in parenthesis) of percentage accuracies of LogitBoost and human classification for different groups of participants. “Count” is the number of participants in each group

Similar tests were conducted to assess potential differences between countries of origin (four regions: Australasia, Europe, Latin America, and Middle East; Table 7) and religions (Catholic, Christian, Hindu, Jewish, Protestant, and no religion identified; Table 7). Country of origin and religion were self-reported by the participants. No significant differences were found by country of origin in human classification of emotions [F(3,36) = 0.64, p = 0.59, \(\eta ^2 = 0.051\)] or social messages [F(3,36) = 0.08, p = 0.97, \(\eta ^2 = 0.007\)], nor in LogitBoost classification of emotions [F(3,36) = 0.64, p = 0.59, \(\eta ^2 = 0.015\)] or social messages [F(3,38) = 0.22, p = 0.88, \(\eta ^2 = 0.017\)]. The comparison between religious beliefs also yielded non-significant results for human classification of emotions [F(5,34) = 0.81, p = 0.55, \(\eta ^2\) = 0.106] and social messages [F(5,34) = 0.45, p = 0.81, \(\eta ^2\) = 0.063], and for LogitBoost classification of social messages [F(5,34) = 0.19, p = 0.94, \(\eta ^2\) = 0.036]. There was, however, a significant effect at the p \(<\) 0.017 level in the comparison by religion for the LogitBoost classification of emotions [F(5,36) = 4.00, p = 0.006, \(\eta ^2\) = 0.357].

Post hoc comparisons using the Tukey honestly significant difference test indicated that the mean score for the Christian group (M = 69 %, SD = 9 %) was significantly higher than the Catholic (M = 46 %, SD = 9 %), Jewish (M = 46 %, SD = 8 %) and no religion reported (M = 43 %, SD = 11 %) groups. The differences with the Hindu (M = 40 %, SD = 0 %) and Protestant (M = 47 %, SD = 0 %) groups were not significant. Taken together, these results suggest that religious beliefs may play a role during the communication of emotions via touch. Since this effect was found only when the touch was transmitted to the artificial arm, it is believed that transmitter-receiver pairings may play an important role during the classification of touch. The artificial arm, however, may have acted as a neutral agent that permitted more accurate comparisons. Future work should investigate this effect across a larger population.

7.2 A Closer Inspection of Tactile Gestures

If the feature values extracted from all touch samples in Experiments One and Two are closely examined, a number of relevant characteristics are seen:

1. In general, feature values from Participants A and B were significantly better clustered than were features from multiple participants. This result provides further evidence of the range of variability between gestures transmitted by multiple individuals.

2. The highest pressure intensities during the communication of emotions were found for anger and fear; both negative emotions of high arousal.

3. Maximum displacements of touch were observed during the communication of sadness and fear; both negative emotions.

4. All social messages and emotions were communicated to the outside of the arm. This suggests that in a “full-bodied” robotics application, such as a humanoid robot, more emphasis should be given to artificial skin located on the outer side of the arm. This possibility should be experimentally tested.

5. The shortest touch durations were observed during the communication of surprise, disgust and anger; all emotions of high arousal. A similar observation was made during the communication of the message attention.

6. Social messages such as greetings and rejection were conveyed more often by touching areas near the hand. We speculate that the hand would have been touched if this were allowed. Attention was often communicated by touching the shoulder.

7. More touch repetitions were observed during the communication of social messages. In most cases when multiple sequential touches were used these were composed of sequential repetitions of similar touches.

If social touch and touch modalities are considered together it is possible to discern relationships between them; that is, to reveal which touch modalities were more commonly used to communicate different social messages and emotions. In our previous work [62], an approach similar to the one followed in this paper was used to classify touch modalities transmitted to the artificial arm described in Sect. 4. Associations between social touch and touch modalities can therefore be identified by using a LogitBoost algorithm trained on data from our earlier touch modality experiments [62] to classify emotions and social messages from the current experiment data, and vice versa.

Table 8 shows the average of such two-way associations in the form of confusion matrices of averaged classification accuracies. High accuracies indicate strong correlations between an emotion or social message and the touch modality or modalities that were used to transmit it. The table shows that anger and fear—both emotions of high arousal—were typically communicated by squeezing, while stroking and scratching were used to communicate emotions of neutral arousal such as happiness and sadness. Furthermore, Table 8 shows that a social message such as attention is communicated by tapping and patting; greeting by pulling and squeezing; and affection by stroking and scratching. These results are consistent with the work by Yohanan and MacLean [74] which demonstrated that a touch modality such as squeeze can be used to convey “excitement”—also an emotion of high arousal—while a stroke was used to communicate “pleased,” which—like happiness—is a positive emotion of neutral arousal. No other similarities with the results reported in [74] were found; this is attributable to the dissimilar sets of touch gestures and emotions used, and to the quite different experimental settings.

Table 8 Averaged confusion matrix of LogitBoost classification showing the relation between touch modalities (from [62]) and social touch as a percentage of correctly classified samples. The first column in each matrix lists either emotions or social messages while the first row gives touch modalities

Altogether, these results demonstrate that similar gestures can be used to communicate different messages and emotions, and suggest that better interpretation of social touch can be achieved without considering a prior interpretation of touch modality. For example, emotions such as anger and disgust—both communicated by pushing—could be differentiated from each other by using the fact that anger often involves tactile gestures of longer duration and higher pressure intensity. Similarly with anger and fear; both emotions were communicated by squeezing and can be better discriminated by touch duration and the location where they are applied. In this regard, it should be noted that the classification results presented in [62] showed that touch modalities such as pat and tap were often confused with each other.

7.3 Considerations for Future Experiment Design

As a result of the work described here, the following suggestions are provided for consideration in any future experimental work.

All experiments reported here were performed with pairs of participants acting as touch transmitters and touch recipients. Although no significant differences were found between participants starting as touch transmitters and those starting as touch recipients, future experiments should consider using different participants as transmitters and recipients of touch to reduce any biasing effects.

It is important to recognise that human interpretation of touch begins almost immediately following the start of a touch: interpretation does not wait until the touch has ended. Touch interpretation in HRI should therefore also consider temporal classification methods in which the classification can begin before the end of the touch, as proposed by Koo et al. [37].

During the current research, all attributes used for the interpretation of touch ignored the effects of multi-touch that may have occurred when participants used both their hands simultaneously. For example, touching two body parts simultaneously may be characteristic of more intimate forms of touch [23].

During the control experiments reported in Sect. 6.2 it was assumed that the touch recipient could unambiguously know when they were touched. As a result, the “no touch” gesture was removed. To add consistency to the experimentation process, future work could consider a method in which all experiments consistently include (or exclude) the “no touch” gesture.

Finally, a single pressure sensor was used to measure the magnitude and direction in which the artificial arm was moved. Since force is not (strictly) applied orthogonally to the direction of arm movement, future work should consider a more comprehensive set of sensors to allow the force and direction of the arm movement to be determined accurately.

8 Conclusions

This paper presented experiments on the classification of social touch using a thin, flexible and stretchable sensitive skin based on the principle of EIT. The sensitive skin was used to cover a full-sized mannequin arm, and has the ability to extract information such as location, duration and intensity of touch. The extracted information was successfully used to classify six social messages and six emotions using a LogitBoost algorithm.

The experimental results presented in this paper show that interpretation of social touch through a sensitive-skin-covered artificial arm is possible. LogitBoost classification of touch from a single participant was performed with up to 90 % accuracy for both emotions and social messages. These results reduced to approximately 50 % for touch from multiple participants. This accuracy was similar to the averaged accuracy of multiple humans classifying social touch to their own uncovered arm.

No significant dependence of touch classification accuracy on gender or country of origin of the touch transmitter was found. A statistically significant difference was found in the classification of emotions transmitted by participants of different self-reported religious beliefs. Future research should investigate these cultural factors across a larger population.

Finally, a correlation between social touch and touch modalities identified the modalities of touch that are typically used for the communication of different messages. This result shows that a single touch modality, such as “push,” can communicate a number of different messages and emotions, whilst the communicated message may be better defined by the underlying characteristics of the touch itself.