1 Introduction

By 2030, about 10.8 million older adults in the USA will be living with disability due to a stroke [77]. Providing a good quality of life for these older adults requires maximizing independent functioning after a stroke. This implies that in the future, more stroke rehabilitation may need to occur outside the traditional clinical setting and in community-based settings such as adult daycare centers, independent living centers and assisted living centers [43]. Luker and colleagues [60] indicated that stroke survivors are already asking for more exercise opportunities in the community. Their study shows that many stroke survivors value physical activity, are willing to engage in more physical activity outside of formal therapy sessions, actively ask to participate in setting their rehabilitation goals, and want their patient-therapist interactions during rehabilitation to empower them to reclaim autonomy through the recovery of functional abilities.

Some have responded to the increasing need to provide more rehabilitation and healthcare in the community by calling for increases in the healthcare labor force in several ways: (1) by training local healthcare advocates and caregivers to provide community-based rehabilitation to their own families and communities, (2) by creating lower-education entry pathways into the healthcare workforce and (3) by providing pathways for persons wishing to change careers mid-stream, or students of other disciplines, to become health professionals [106, 107]. Others have responded by considering the increasing use of mobile health and robotics technology [35, 61].

Evidence supports the use of active, repetitive practice of functional and purposeful activities to restore motor control and regain the capacity to complete important daily life roles [4, 52, 81]. Task-oriented stroke rehabilitation training is not without challenges. For example, functional activities, often referred to as Activities of Daily Living (ADL) tasks, can be quite complex and diverse. The large variety of ADL tasks often makes it challenging for clinicians to appropriately grade them to match patients' upper limb capacity, goals, interests and cognitive level [52]. Lang and colleagues suggest that although this is true, most ADL tasks require the same key action skills to complete, such as reaching, grasping, manipulation, and release, and thus strategies to match clients' motor capabilities, goals, and interests to specific, challenging tasks can be taught [52, 53, 81].

Robots can play a unique role in supporting stroke rehabilitation and independent living in non-traditional settings [14, 22, 54, 56, 59, 64, 65, 71, 76, 101]. Robots can demonstrate a task, invite patients to engage in therapeutic exercise, guide the exercise activity with behaviors designed to make exercise more enjoyable, monitor the patients' movements, and act as social agents [9, 20, 25, 30, 67]. However, enabling effective robot-assisted task-oriented therapy can be challenging. Typically, robotic therapy systems are not able to support complex real-world ADL tasks, which often include reaching, grasping, release and intricate manipulation [71]. The diversity of ADL tasks requires the robot to observe cues and adjust roles appropriately. For example, the robot may need to be capable of not only identifying the ADL tasks being performed, how they are being performed and with what objects [12], but also identifying when to provide assistance to the client.

Our long-term goal is to develop advanced robotic systems that are used under the therapist's direction as tools that implement repetitive and labor-intensive therapies [32, 71, 101]. In one scenario, clinical decisions could be managed by the rehabilitation team and, when appropriate, planned and executed on the robot by the therapist. Ideally, we envision scenarios where the therapist demonstrates training tasks to the robot and teaches the robot how to function in a session. In subsequent encounters with a patient, the robot learns how to best perform the task(s) with the patient and provides autonomous or semi-autonomous therapy while the therapist provides supervisory oversight of the robot-patient interaction.

In this paper, we present our process of developing a model of patient-therapist interactions during task-oriented stroke therapy that can guide robot-patient interactions. From qualitative analyses of videos illustrating a therapist and a stroke patient interacting in a therapy session focused on upper limb training [38], we suggest that a stimulus-response paradigm can model aspects of observed patient-therapist interactions. In this model, the therapist and the patient take on a set of acting states or roles and are motivated to move from one role to another when certain physical or verbal stimuli or cues are sensed and received. We develop this model and examine how it applies across 8 different activities of daily living tasks in task-oriented stroke therapy sessions captured in 8 videos, determining how often these roles and cues occurred and which were used most often. We discuss how observed roles and cues may be mapped to current and future examples of robot-patient interactions. We also discuss the limitations and implications of designing a robot able to fulfill the roles modeled.

2 Human-Human Interactions

Developing a physical, social and therapy agent for rehabilitation requires an in-depth understanding of human-human interactions as seen in therapist-patient dyads during stroke therapy. There are three major components of human-human interactions that need to be modeled to realize human-robot interactions in therapy: the kinematics, the haptics and the intent of the interaction from the therapist's perspective [45, 57, 73, 85]. Figure 1 illustrates these critical components that define an interaction. Specifically, the movements of the therapist and patient during a therapy session can be captured in order to model the kinematics of the motion. The forces involved in the physical contact between the therapist and patient during a therapy session define the haptics of the interaction. Lastly, the intentions of the therapist, conveyed through physical and verbal cuing actions, communicate the reason an action was performed.

Fig. 1

The kinematic, haptic and intent components are needed to describe human-human physical interactions. The patient-therapist interaction is just one specific type of interaction

Studies have been conducted to understand the kinematics of human-human physical interactions and interactions with robot agents. Figures 2 and 3 illustrate some kinematic capture methods we have used for close human-human interactions. Marker-based and markerless motion capture methods as well as wearable inertial sensors can be used to quantify therapist and patient movements during a therapy session [42, 51, 73, 87]. Methods using visual motion capture can be limited by frequent occlusion that results from close human-human interactions, either due to marker drop out or failure of the kinematic algorithm to separate kinematics when one subject touches another [42]. More recent uses of machine learning and deep neural networks to characterize multi-body interactions are promising solutions to these problems [87, 109].

Studies have also been conducted to understand the haptics of human-human physical interactions and interactions with robot agents. Inertial and force sensors are often used to detect haptic interactions of touch and assistance. For example, Galvez et al. attached a sensorized orthosis to the legs of patients with spinal cord injury and measured shank kinematics and forces exerted by different trainers during several training sessions [28]. Fitter et al. used inertial data to quantify hand-clapping movements which can then be mapped onto a robot [24]. They further created a social robot that can dynamically adjust tempo while playing hand-clapping games with a human user [23]. Sawers et al. used custom force sensors to quantify and investigate the interactive forces during a gait training task [84]. They measured the direction and magnitude of the interaction forces between 2 human partner dancers to determine how those forces are used to communicate movement goals [84, 85]. They observed that these interaction forces were small and seemed to act primarily as guiding cues about movement goals rather than providing physical assistance [84]. It is important to note that the therapist's contact with the patient during a human-human or human-robot task does not have to be quantified via forces; it could also be conveyed or quantified via sound, electromyography, vibration, position, velocity, etc. For example, Wallis and colleagues demonstrate that sonification of movement can impart important information about the therapist's movement to the user [103]. Losey and colleagues reviewed the variety of ways a human's physical intention can be measured and interpreted while he or she is coupled to a robot [57].

Fig. 2

Example of collecting kinematic data with wearable inertial sensors

Fig. 3

Example of collecting kinematic data with markerless motion capture using Microsoft Kinect

To our knowledge, there are few studies examining therapeutic intent during training of functional tasks for upper limb stroke therapy. Experiments to understand the intent behind therapists' physical and verbal behaviors are not as common. Stanton and colleagues [90] observed forty unique patient-therapist dyads during 30 min of actual practice of everyday activities during a stroke rehabilitation session. This study focused on examining the feedback received by patients during rehabilitation and not on the roles themselves. During therapy, therapists often use physical and verbal behaviors to cue and direct patients. It is not always clear how these cues relate to therapist and patient behaviors. Some studies indicate that physical cuing behaviors usually precede or follow verbal cuing behaviors. It is suggested that combinations of cuing behaviors form the basis of implicit and explicit motor learning strategies used by therapists to elicit motor re-learning after a stroke [50, 90, 91].

Effective motor learning often involves giving patients goal-oriented exercises that effectively balance challenge, problem solving, and functional ability, supplemented by appropriate physical and verbal prompts from the therapist [6, 78]. Some studies suggest that feedback addressing motor impairments (e.g., quality of arm movement) may be more beneficial for stroke patients than feedback addressing movement outcome (e.g., how the task was done) [13, 48]. Verbal and auditory cuing during therapy have been found to improve functional rehabilitation outcomes; additionally, physical cues such as guidance, assistance and resistance have been shown to be beneficial in improving motor learning and task-specific outcomes during rehabilitation [49, 63, 90].

Ideally, any robot assistant should maximize motor learning during stroke therapy and perform both physical and verbal cuing during close human-robot interactions. Given this requirement, a better understanding of therapists' behaviors and the intent behind their behaviors during task-oriented therapy would shed light on how we can better tune existing and future patient-robot interactions. This increased understanding will allow more accurate modelling and mapping of therapist behaviors onto a therapy robot.

3 Human-Robot Interactions

Modern technology has created a myriad of novel techniques for stroke rehabilitation ranging from interactive media [103] to therapy robots [105]. It has been observed that users tend to prefer embodied agents for both physical exercise [18] and cognitive therapy [96]. Embodied agents can also provide physical assistance to users with higher impairment levels, whereas interactive media is more suitable for those with lower impairment. Robots are primarily used to interact with patients during upper limb stroke rehabilitation in two main ways: as a therapy robot [6] or as a socially assistive robot [21].

Therapy robots are typically connected to the impaired limb of the human across the joint or at the end-effector and are designed to provide haptic assistance that directly aids the limb to move. They provide therapeutic exercise for the impaired limb in a variety of ways and are generally thought to improve muscle strength, motor control, and reduce spasticity [71, 101]. For example, the user's limb is always physically connected to the Inmotion robot [10, 56, 62], which provides active or assist-as-needed guidance to the upper limb during planar tasks. Lo et al. showed that the Inmotion robot system can train the upper limb of stroke survivors and significantly improve motor control in the limb [56]. Others show that therapy robots can provide assistance in a reaching and grasping task such as drinking [11, 58, 71]. For most therapy robots, we expect the human-robot interaction to mimic patient-therapist interactions where a therapist takes on the role of a helper that physically guides the patient's limb movements.

The framework introduced by Jarrassé et al. [40] examines interactions between two human agents who are always physically coupled, where the relationship during joint tasks can be defined as cooperative, collaborative or antagonistic. They argue that during therapy, the therapist-patient interaction should be modelled as a cooperative one in which the patient is learning from the therapist to build their own capacity while the therapist assists in the process. They use a cost function to define the interaction goal of each human or robot agent in the dyad, which is to minimize error and effort during an interaction. In this relationship, they hypothesize that the cost function of the teacher should minimize the student's error and its own effort.

Socially assistive robots, whether mobile or non-mobile, humanoid or animaloid, are designed to engage the patient in primarily non-contact social and exercise interactions. They often act as social agents that can demonstrate a task, invite patients to engage in therapeutic exercise, guide the exercise activity with behaviors designed to make exercise more enjoyable and monitor the patients' movements [20, 67]. For example, Fasola and Matarić successfully created a non-contact social and therapy agent for older adults that provided not only active guidance, feedback and task monitoring, but also instruction and steering of the task [20, 65]. A social robot can also be taught to demonstrate a task for a patient [23, 47] as well as to monitor a patient's whole-body or limb movements [104]. For social robots, we expect the human-robot interaction to mimic patient-therapist interactions where a therapist takes on the role of demonstrator and/or observer and provides non-contact guidance and/or feedback for the patient's whole body or limb movements. We would expect these robots to work to minimize the patient's error and maximize the patient's effort.

It is important to define what is meant by the terms “error” and “effort” [57]. These terms can take on many meanings and often depend on the task being performed. The term “error” is not meant to judge the patient and ascribe fault but is intended to delineate the deviations from a target or movement pattern, and its measurement is intended to quantify progress towards the most effective movement pattern suggested for the patient by the therapist. For example, “error” may refer to the difference between the patient’s position and a desired target position defined by the therapist, and “effort” may be the difference between a patient’s muscle activity level and a desired muscle target level. We assume that error and effort are individually defined for patients by their therapists and are always metered with respect to what the patient can reasonably do to maximize success and minimize frustration.

Clearly, the evidence shows that a variety of human-robot interactions exist in neurorehabilitation. However, very few of these interactions involve therapeutic and/or socially assistive robots that can dynamically and independently move from being a physically coupled robot helper to one that can end the contact and assume the role of demonstrator and/or observer. The ability to transition freely between helping, demonstrating and observing roles often characterizes real patient-therapist interactions during a typical occupational therapy session.

As noted above, few studies have examined therapeutic intent during training of functional tasks for upper limb stroke therapy. This paper describes the proposed stimulus-response model [45] we used to examine patient-therapist interactions during task-based stroke therapy and thus supplements the limited research on observing and analyzing the intent behind such interactions. In doing so, we provide insights into the patient and therapist roles, physical and verbal cuing behaviors, and how those cues are used with respect to the roles.

4 Modelling Patient-Therapist Interactions Using A Stimulus-Response Model

Human-human interactions often involve humans taking on different roles during the interaction. For example, Reed and Peshkin [80] reported that when two humans worked together on a coupled 1-DOF task, rotating the same crank arm, they assumed different roles to accomplish the task, with one taking the leader role and the other the follower role. Different roles are also observed in patient-therapist interactions on therapy tasks [90]. However, the patient and therapist may not remain in constant contact with each other. How best to model complex human-human interactions to guide human-robot interactions is still an open problem. A typical method in artificial intelligence is to make robots model human actions using stimulus-response methods implemented as state-based control. In a stimulus-response paradigm [2], a cue or stimulus sensed by the system produces a response that may entail changing to a new state or remaining in the current one. Behavior-based robotics, a complex solution for modeling social robots [66], is a form of “functional modeling which attempts to synthesize biologically inspired behavior.” This type of perception-action model is commonly used in psychology to model how animal or human organisms make decisions and act on the environment in response to stimuli from the environment. How the stimuli are processed cognitively to provide appropriate physical and verbal responses forms the basis for developing verbal or non-verbal interactive robots that are effective, more sociable and acceptable to humans [2, 7, 69, 89, 97].

Other engineering human-robot models are based on the physical interaction model and assume constant contact and some level of shared control. Losey et al. [57] note that the division of roles in shared control during these situations, as well as in those reviewed by Jarrassé and colleagues [40, 41], can be seen as an act of “arbitration,” which can also be viewed as two agents negotiating their level of autonomy. Their framework proposes the modeling of user intent, feedback from robot to human, and arbitration. Jarrassé and colleagues [41] describe a general type of physical interaction, termed “motor interaction,” as one involving a sensorimotor exchange with the environment, a robot or a human. Their interaction model considers the energy needed for both systems to physically perform the task and the information used by both systems to inform each other about the ongoing action.

We overlay a simplified stimulus-response behavior model on upper limb therapy sessions for patients with stroke, where contact and non-contact task guidance is provided by a therapist. Figure 4 shows the stimulus-response model that we have developed to describe observed interactions during an occupational therapy session. Here, the therapist can take on three roles: demonstrator, helper and observer. The corresponding roles for the patient are observer, performer with assistance and performer (Fig. 4). A scenario may flow as follows: the therapist is in a demonstrator role when explaining the task or clarifying any task-related queries that the patient may have. The patient remains in an observer role during that period. Once the demonstration is completed, the patient begins to perform the task while the therapist moves into an observer role. If the patient (1) is observed to have difficulties in performing the task, such as the impaired arm deviating from a desired motion during task execution or the impaired hand's grasp slipping, or (2) asks for help, the therapist moves into a helper role and enables the patient to perform the task with assistance. The term “helper” is used because it preserves the action of helping when it is the therapist only, but acknowledges that when a robot is the helper, the robot's actions provide assistance to both the patient and the therapist.
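To make the model concrete, the sketch below shows one way the role dyad and its cue-triggered transitions could be encoded as a simple state machine. The role names follow Fig. 4; the cue strings and the specific transition rules are illustrative assumptions, not a complete encoding of the cues in Table 1.

```python
from enum import Enum

class TherapistRole(Enum):
    DEMONSTRATOR = "demonstrator"
    OBSERVER = "observer"
    HELPER = "helper"

# Complementary patient roles from Fig. 4: the dyad is treated as a unit,
# so the patient's role is derived from the therapist's role.
PATIENT_ROLE = {
    TherapistRole.DEMONSTRATOR: "observer",
    TherapistRole.OBSERVER: "performer",
    TherapistRole.HELPER: "performer with assistance",
}

# Illustrative cue-to-transition rules (assumed, not exhaustive): a sensed
# physical or verbal cue acts as the stimulus for a role change.
TRANSITIONS = {
    (TherapistRole.DEMONSTRATOR, "demonstration_complete"): TherapistRole.OBSERVER,
    (TherapistRole.OBSERVER, "patient_requests_help"): TherapistRole.HELPER,
    (TherapistRole.OBSERVER, "patient_error_detected"): TherapistRole.HELPER,
    (TherapistRole.HELPER, "patient_performs_independently"): TherapistRole.OBSERVER,
    (TherapistRole.HELPER, "clarification_needed"): TherapistRole.DEMONSTRATOR,
}

def next_role(current: TherapistRole, cue: str) -> TherapistRole:
    """Return the next therapist role; absent a matching cue, remain in place."""
    return TRANSITIONS.get((current, cue), current)

# Example: an observed grasp slip cues a switch from observer to helper.
role = TherapistRole.OBSERVER
role = next_role(role, "patient_error_detected")
print(role, "/", PATIENT_ROLE[role])
```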

Fig. 4

Stimulus-response model for patient-therapist interactions. The therapist and patient engage in three complementary roles, and role changes are triggered by physical or verbal cues that act as stimuli to cause behavior or role changes [45]

Table 1 Physical and Verbal cues [45]

A change from a role may occur due to one or more of the physical or verbal cues shown in Table 1. We defined a set of physical and verbal cues commonly used by patients and therapists. We assume that during a session these cues will be provided either by the therapist or by the patient. The chosen codes for the cues are based on the Occupational Therapy Roter Interaction Analysis System (OT-RIAS) [102], a method that quantifies patient-therapist interactions from a behavioral perspective rather than a robotics perspective like ours, and on the Occupational Therapy Practice Framework, 3rd edition [1].

We acknowledge that the proposed model will not capture the full complexity and richness of patient-therapist interactions during stroke therapy sessions, but we believe this model is a starting framework that captures the varying roles found within a patient-therapist dyad. We also acknowledge that other engineering models may be needed to describe the physical/motor interactions [40, 57] within the helper role when contact is made.

5 Methods

Eight video examples of occupational therapy sessions for various activities of daily living (ADLs) were used (Fig. 5). These videos, obtained with permission from the International Clinical Educators Inc. (ICE) video library [38], depicted an expert occupational therapist working with a stroke patient on various ADLs. ICE videos are used in occupational therapy education programs internationally to demonstrate and teach rehabilitation techniques to occupational therapy students. Videos included shoe shining, cleaning dishes, making iced tea, making a sandwich, arranging flowers, washing a car, sweeping a sidewalk and shaving.

There is an absence of literature that explicitly describes how therapists and patients take on roles during therapeutic interactions. The key roles and patient-therapist dyads were developed based on observations of occupational therapy sessions, particularly sessions supporting task performance involving objects, and discussions with expert therapists. During a therapeutic interaction, the distinctive roles a therapist can take on include helper, demonstrator or observer, while the patient can be an observer, performer, or performer with assistance. Once the roles were developed and we attempted to identify them during therapy observations, we noticed that the roles occurred in patient-therapist role dyads, which we attempt to validate in this study.

The cues were developed using two tools: the OT-RIAS (Occupational Therapy Roter Interaction Analysis System) [102] and the Occupational Therapy Practice Framework (OTPF-3), 3rd edition [1]. The OT-RIAS system is a quantitative approach to studying occupational therapy verbal interaction with 45 categories. We analyzed these categories to obtain a condensed list of 7 codes that could be reliably coded. For example, in the OT-RIAS system “asks” is categorized into twelve different categories; however, for the purposes of our model, we had only one “ask” category, since our purpose was to determine the type of interaction and not the specific type of information requested. For our physical cues, we used the OTPF-3, specifically the motor skills section of the document. We analyzed this section to determine a condensed set of physical cues that could be reliably coded. The OTPF-3 includes 16 motor skills, and we condensed them into 10 physical cues. The cues that were not included are aligns, positions, bends, coordinates, walks, calibrates, flows, and endures. In addition, we added two cues to the therapist physical cue list, guides and touches, which are typical physical interactions when a therapist assists a patient with task completion. Using the model presented in Fig. 4 and the cues identified in Table 1, two therapists independently coded the set of 8 videos using Multimedia Video Task Analysis (MVTA) software [27]. The coder assigned a role to the patient and therapist and identified the timing and type of each cue used, as well as the cue that acted as a stimulus for a role change. Figure 6 shows a sample video coded using MVTA.

Fig. 5

Analyzed videos [38]. The videos shown from left to right: (top) shoe shining, cleaning dishes, making iced tea, making a sandwich; (bottom) arranging flowers, washing a car, sweeping a sidewalk, and shaving

Fig. 6

The code categories and time lines determine when an event happened. The therapist and the patient are always in one of the three roles

Fig. 7

Structure of an Interaction Sequence Diagram. The actors in our diagram are the therapist and the patient (client). Each actor has a lifeline. The three roles for the therapist and the patient become the object behaviors that have lifelines of their own. The rectangles on the lifelines represent the time span for which a specific role is active. The transitions between the roles are represented by cues

The MVTA software generated multiple reports based on the codes for each video. The breakpoint report gave the sequential start and stop times for every code. The duration report provided the time spent in each role. Thus, there were 6 breakpoint reports and 6 duration reports per video for therapist roles, physical cues, and verbal cues; and patient roles, physical cues and verbal cues. These reports were processed by a custom MATLAB script which extracted coded roles and cues and identified the frequency of occurrence of each cue and role along with duration of occurrence within each video and then across all videos. We also examined role-role and role-cue relationships to determine how one role related to another and which cues resulted in role changes. We examined if the therapist spent time in all roles across all videos and if role changes in all videos were caused by a physical or verbal cue initiated by either the patient or therapist.
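For illustration, the sketch below mirrors, in Python, the kind of aggregation our custom MATLAB script performed; the tuple layout of the parsed MVTA breakpoint report is an assumption made for the example, not the actual MVTA file format.

```python
from collections import defaultdict

def summarize_roles(breakpoint_report):
    """Compute frequency, total duration (s) and percentage time of each coded role.

    `breakpoint_report` is assumed to be a list of (role, start_s, stop_s)
    tuples taken from an MVTA breakpoint report; field names are illustrative.
    """
    frequency = defaultdict(int)
    duration = defaultdict(float)
    for role, start_s, stop_s in breakpoint_report:
        frequency[role] += 1
        duration[role] += stop_s - start_s
    total = sum(duration.values())
    percent_time = {role: 100.0 * t / total for role, t in duration.items()}
    return dict(frequency), dict(duration), percent_time

# Example with made-up times for one short video segment.
report = [("demonstrator", 0.0, 20.0), ("observer", 20.0, 95.0),
          ("helper", 95.0, 180.0), ("observer", 180.0, 210.0)]
freq, dur, pct = summarize_roles(report)
print(freq)  # {'demonstrator': 1, 'observer': 2, 'helper': 1}
```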

Table 2 Inter-rater agreement through Cronbach’s Alpha

Role-changing cues were considered to be those cues which occurred within 3 s of a role change, or within half of the total duration of the role if the role lasted less than 3 s. These 3-second buffers were chosen based on observation to account for minor errors in coding. When implemented in our automated analysis procedure, these buffers minimized the chance that a cue that caused a role change would be classified incorrectly.
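A minimal sketch of this attribution rule, assuming cue onsets and role-change times are available in seconds:

```python
def is_role_changing_cue(cue_time_s, change_time_s, role_duration_s, buffer_s=3.0):
    """Classify a cue as role-changing if it falls within the buffer window.

    The window is 3 s by default, or half the role's duration when the role
    lasted less than 3 s, matching the rule described above.
    """
    window = buffer_s if role_duration_s >= buffer_s else role_duration_s / 2.0
    return abs(cue_time_s - change_time_s) <= window

# A cue 2.2 s before a role change within a 10 s role counts as role-changing.
print(is_role_changing_cue(cue_time_s=47.8, change_time_s=50.0, role_duration_s=10.0))  # True
```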

Fig. 8

Frequency of occurrence of roles and time spent in each role by therapist and patient

Using the results of the analysis, we developed a pictorial representation of the cues that caused a role change, known as the interaction sequence diagram, which can be seen in Fig. 7. This representation is derived from the software engineering concept of sequence diagrams [26]. Sequence diagrams detail when and how the objects of a system interact with each other. The therapist and the patient are the “actors” who go through a sequence of roles or behaviors, which are akin to the “objects of the classes”. The parallel vertical lines represent the “lifelines” which retain the temporal information of the video data. The cues are the “messages” that cause a change in role.

Coder agreement for roles and cues was determined using Cronbach's Alpha (\(\alpha \)) [15] (Eq. 1). There are K components in test set \(X = Y_1 +Y_2 + ...+ Y_K\):

$$\begin{aligned} \alpha = \frac{K}{K-1} \Big ( 1- \frac{\sum _{i=1}^{K}\sigma ^2_{Y_i}}{\sigma ^2_X}\Big ) \end{aligned}$$
(1)

We calculated Cronbach's alpha values for the duration and frequency of physical cues, verbal cues and roles.
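For reference, the sketch below computes Eq. 1 from a matrix of ratings; the per-video cue counts used in the example are hypothetical and only illustrate the calculation.

```python
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """Cronbach's alpha for an items-by-components matrix (Eq. 1).

    Rows are items (e.g., videos), columns are the K components (e.g., coders).
    """
    K = ratings.shape[1]
    component_variances = ratings.var(axis=0, ddof=1)   # sigma^2_{Y_i}
    total_variance = ratings.sum(axis=1).var(ddof=1)    # sigma^2_X
    return (K / (K - 1)) * (1 - component_variances.sum() / total_variance)

# Hypothetical frequency counts of one cue across 8 videos for two coders.
counts = np.array([[4, 5], [7, 7], [2, 2], [6, 5], [3, 3], [8, 9], [5, 5], [1, 1]])
print(round(cronbach_alpha(counts), 3))
```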

6 Results

The coders were consistent in identifying roles and cues. Table 2 reports the frequency and duration correlation results for roles, cues, and role changes. Cronbach's alpha values for roles, physical cues, and verbal cues were consistently greater than \(\alpha = 0.96\), indicating robust agreement across coders.

Table 3 Percentage time and frequency of physical cues. (*) represents most frequent physical cues
Table 4 Percentage time and frequency of verbal cues. (*) represent most frequent verbal cues

6.1 Therapist and Patient Roles

Figure 8 summarizes the percentage duration and frequency of the roles for therapists and patients across all videos. Both the therapist and the patient spent time in all three roles: 6 times as demonstrator/patient observer, 34 times as observer/patient performer, and 34 times as helper/patient performer with assistance. The frequency and duration of the therapist and patient roles correlated, suggesting that treating the dyad as a unit is accurate. Therapists spent the most time in the helper role (53.41%); correspondingly, the patient spent 51.75% of the time being helped. When the patient was able to complete tasks more autonomously (41.63%), the therapist was in the observer role (40.18%). The therapist spent the least amount of time in the demonstrator role (6.41%) and the patient spent the least amount of time in the observer role (6.63%). The therapist demonstrated the task at the beginning of the session or when a clarification was required. The demonstrator role was the least used, and this may be because some of the videos were taken after the therapist had already explained the task.

6.2 Therapist and Patient Cues

Tables 3 and 4 show the duration and frequency of physical and verbal cues across all videos. The therapist performed a total of 195 physical cues across all sessions. Most of these cues provided physical assistance to the patient. There were a total of 25 patient physical cues, which were triggered when the patient made an error in the task or was unable to perform the task satisfactorily. The therapist performed a total of 199 verbal cues across all sessions. Most of these verbal cues provided indirect instructions for guidance or encouragement. The patient performed a total of 15 verbal cues across all sessions, which were frequently a request for assistance or clarification from the therapist.

Out of the 10 possible therapist physical cues, the reaches, lifts, and stabilizes cues were the ones that caused the therapist to change roles. Reaches, lifts and stabilizes had mean frequencies of 36, 24 and 41, respectively. The stabilizes cue was used when patients required physical support to perform the task. The remaining physical cues are those that can be considered patient errors that required therapist intervention and led to role changes. The supports/expresses agreement understanding or willingness, requests/asks, commands and states verbal cues had high frequencies of 54, 39, 30 and 56, respectively. The states cue typically comprised statements (e.g., “try another way”) that told the patient to initiate, continue or complete a task without giving specific instructions. The supports cue was used for encouragement. Of the 4 possible verbal cues by the patient, describes/explains/states occurred 78.06% of the time. These cues occurred when patients were clarifying the task or explaining their actions and understanding of the task. In general, a patient's verbal requests or physical inability to perform an action completely, correctly, or accurately led the therapist to switch into the helper role.

Table 5 lists the frequency of the physical and verbal cues seen within each patient-therapist dyad as outlined in Fig. 4. The next sections report the cues within each role along with how the therapist role changes were triggered and their implications for a robot agent.

Table 5 Frequency of physical and verbal cues used within each therapist/patient dyad (*) represent most frequent cues

6.3 Therapist Demonstrator to Robot Demonstrator

For the demonstrator role, the therapist was often seen describing the task to be done as well as providing verbal and physical instructions for performing the task. This role typically ended when the demonstration of the task was completed and an invitation was given to the patient to begin. Correspondingly, the patient listened and observed the therapist's actions. In the demonstrator role, the reaches cue (frequency = 3) was the most common physical cue and states (frequency = 5) was the most common verbal cue used by the therapist. This observation confirms that the therapist spent most of the time providing instructions to the patient. When switching from the demonstrator to the observer role, the therapist performed the supports/expresses agreement understanding or willingness cue most frequently. The patient did not trigger any cues when switching from the observer role to a performer role, which implies that a therapist cue caused the role change in the patient. When switching from the demonstrator to the helper role, commands and reaches were the frequent cues. The patient in the observer role triggered only 2 cues across all videos, the does not initiate cue and the supports cue. This finding shows that a change from the demonstrator role into any other role often happens at the discretion of the therapist. From these observations, we identify some abilities that a demonstrator robot should have:

  • Performs a set of tasks and exercises for the patient.

  • Able to reach, grip, stabilize, guide and manipulate the patient’s limb.

  • Provides clear instructions and directions for the patient.

  • Communicates with subject to clarify actions demonstrated.

  • Transitions to either the helper role or the observer role at the end of the demonstration.

  • Learns new tasks autonomously or provides an interface through which it can easily be taught new tasks

Robots have been used in a demonstrator role quite frequently [16, 19, 33, 36, 68, 70, 75]. For example, the robot developed by Fasola et al. [19] instructs users (demonstrator role) to perform simple physical exercise through a series of personalized exercise games. This robot is capable of monitoring user performance (an observer role) and switching through various behaviour modules to avoid user boredom. Further, their users preferred the physical robot over a virtual agent. Another example of a robot demonstrator is TAIZO [68], a small humanoid that demonstrates physical exercises along with a human partner. This robot is able to verbally interact with the human demonstrator and is used as a means to capture the attention of inattentive people in crowds. A Nao robot is used by Görer et al. [33] to demonstrate exercises and can provide verbal instructions to users. The robot learns these movements from motion retargeted from a human, and the authors further provide a taxonomy of exercises that can be used with such a robot. Robots such as the Nao have also been used widely in pediatric therapy. Nguyen et al. [75] use a Poppy robot which also learns its movements from a human user. The Poppy robot has also been used in cases where the exercise movements are pre-programmed [16]. A Nao robot is used to demonstrate lower limb exercises in [70] and upper limb exercises in [36]. Most of the above robots are able to transition from a demonstrator to an observer role. In addition, they are able to provide verbal assistance but not physical assistance.

6.4 Therapist Observer to Robot Observer

The therapist within the observer role was often seen monitoring the patient's performance on the task. In general, observations are made without direct contact with the patient, and help with body movements and limb performance is provided verbally. The therapist often focuses on encouraging the patient as well. Specifically, stabilizes (frequency = 11) and reaches (frequency = 7) were the most frequent physical cues. The most common verbal cues were supports/expresses agreement understanding or willingness (frequency = 30) followed by states (frequency = 21) and requests/asks (frequency = 20). The patient was correspondingly in the performer role and frequently triggered the describes/explains/states and does not lift cues.

In an observer role, the therapist provides encouragement and support to the patient. The three verbal cues supports/expresses agreement understanding or willingness, states and corrects triggered a change from the observer to the demonstrator role. No physical cues were triggered by the therapist during this role change, and the patient did not trigger any cues in this scenario. This could imply that this change is a result of the end of the task being performed or the therapist stopping the task to provide further instructions or clarifications. The therapist frequently triggered the supports/expresses agreement understanding or willingness (frequency = 20), requests/asks (frequency = 20) and states (frequency = 21) verbal cues when changing from the observer role to the helper role. Reaches (frequency = 7) and lifts (frequency = 4) were the common therapist physical cues that led to this role change. The patient often triggered the describes/explains/states (frequency = 6), does not lift (frequency = 5), does not coordinate (frequency = 4) and does not reach (frequency = 5) cues during this role change. This change seems to be frequently triggered by the client being unable to perform a particular action or the therapist attempting to correct an incorrect action performed by the client. From this we identify some abilities that an observer robot should have:

  • Monitors users’ body and/or limb movements

  • Assesses users’ performance with respect to known performance criteria

  • Adapts the therapy behaviours to match users’ performance.

  • Able to reach, grip, lift, stabilize, move and point in order to transition.

  • Communicates with patient by providing corrective feedback, suggestions for better performance, clarifications on tasks.

  • Transitions to either the helper role or the demonstrator role when appropriately cued.

  • Learns new tasks autonomously, along with the error tolerance permitted for each movement for each patient.

Studies have shown that outfitting patients with a motion capture system (either inertial or vision based) such as Xsens MVN [83] or a wearable exoskeleton [39] gives the robot real-time information about the patient's movement. Fasola et al. [19] achieve this goal using a vision-based method that recognizes user pose by segmenting the image and then determining the position of the arm relative to the face. McCarthy et al. [70] use a Kinect to monitor their subjects, while Guneysu et al. [36] track users using inertial measurement units. The system in [16] uses Gaussian Mixture Models to estimate and assess user movement, and Tanguy et al. [94] is another example where a similar approach is used to assess user movement.

In addition, other studies have done one or more of the following: (1) outfitted the robot with a low-cost Kinect camera to monitor information about the environment and the patient's interaction with objects; (2) outfitted the robot with tactile, accelerometry or force sensors to enable sensing of the force of the interaction [24, 85]; (3) outfitted the robot with a touch screen interface, a natural language processor or a voice recognition system to enable interpretation of verbal responses or receipt of commands from the patient that could be mapped onto actions [69]; and (4) outfitted the patient with sensors, such as heart rate, breathing rate and galvanic skin response sensors, to enable the robot to monitor the patient's emotional state in terms of valence (happy, unhappy) and arousal (excited, bored) [39, 55, 92, 98]. Most of the systems mentioned are able to observe user movements, provide verbal feedback and communicate encouragingly, but most are not prepared to transition to a physically helping role.

6.5 Therapist Helper to Robot Helper

The therapist in the helper role typically provides assistance according to the patient's difficulty during the observer role, a priori knowledge of the patient's motor and cognitive impairments, or the patient's requests. The patient is guided in their movements by the therapist. The most common physical cues were lifts (frequency = 33), stabilizes (frequency = 20), reaches (frequency = 25), guides (frequency = 20), moves (frequency = 11) and manipulates (frequency = 12). The therapist reached in to directly touch the patient's limb, lift the arm, stabilize the patient's limb movement or guide the limb. Although forces were not measured, most of the physical encounters appeared to involve small guiding forces. Supports/expresses agreement understanding or willingness (frequency = 24), states (frequency = 30), commands (frequency = 25), and requests/asks (frequency = 19) were the common verbal cues used. As a performer with assistance, the patient frequently triggered the does not lift, does not grip and describes/explains/states cues. States was the only therapist verbal cue during a role change from helper to demonstrator, but the physical cues reaches, moves, stabilizes and manipulates were also triggered. Supports was the only verbal cue while does not grip and does not lift were the physical cues triggered by the patient during this change. This finding implies that changing from this role often occurred when the task was completed. When moving from helper to observer, supports/expresses agreement understanding or willingness was the most frequent verbal cue and reaches and stabilizes were the most common physical cues performed by the therapist. Requests/asks and describes/explains/states were the verbal cues triggered by the patient during this role change; no physical cues were triggered by the patient in this case. The therapist returns to an observer role once the client is able to perform the task by themselves. From these observations we identify some abilities that a helper robot should have:

  • Adapts the therapy helping behaviours to match users’ performance.

  • Able to transition into another role by ending a physical cue such as reaching, gripping, moving, lifting, transporting, stabilizing, guiding, pointing, touching or manipulating the patient’s arm.

  • Performs safe physical assistance in the form of touches, lifting, guiding etc.

  • Assesses user’s performance based on a given criteria.

  • Provides either physical or verbal feedback based on user performance.

  • Communicates with users by providing motivational and understanding statements.

  • Transitions to either the observer role or the demonstrator role at the end of the assistance.

Most robots that are capable of monitoring the patient provide some form of feedback to the user. The exercise coach built by Fasola and Matarić [19] does this by using visual motion capture data and adapting its behaviour to be more interactive. Tanguy et al. [94] provide verbal feedback so that the user can correct their motion. Liu et al. [55] demonstrated how online affect detection could enable a robot to adapt its behaviors to improve the therapeutic interaction with children with autism. The robot was programmed to have three possible states/behaviors. Using a learning algorithm driven by support vector machines (SVM) to learn whether the child was liking a particular behavior and a QV reinforcement learning algorithm for rewarding one behavior over another, the study showed that the robot was able to adapt based on the child's “liking” preference and settle on the behavior that the child liked best. The Inmotion robot [56] and other hands-on robots that are physically attached to the patient [101] can be considered helper robots. Some are more transparent than others in terms of whether they are still “felt” when they turn off the helping mode and allow the patient to perform without assistance. The majority of existing robotic systems are unable to fully transition from a coupled hands-on state to an uncoupled hands-off state.

7 Discussion

This study demonstrated that aspects of the patient-therapist interactions in task-oriented stroke therapy can be overlaid on a stimulus-response paradigm where the therapist and the patient take on a set of acting states or roles and are motivated to move from one role to another when certain physical or verbal stimuli or cues are sensed and received. We examined how the model applies across 8 activities of daily living tasks and observed that the therapist spent time in all roles. The most time was spent in the helper role, and roughly equal numbers of verbal and physical cues were given by the therapist. Role changes were triggered by physical and/or verbal cues, mainly from the therapist. We observed that the therapist reacted most often to physical cues from the patient that indicated the patient was making an error, and then physically intervened to minimize that error. The therapist typically entered the helper role on request from the patient or on observation of an error. Examples of such role-changing cues include does not lift, does not grip, and does not reach. Although the patient's role change was often driven by the therapist, there were some instances when the patient requested clarification or help.

Two other factors could have impacted the change of roles and use of cues. The first is the complexity of the task, which is defined as the number of steps required to complete the sequence of actions within the task [74]. Occupational therapists typically layer actions within a task in a pedagogic sequence to allow the patient to perform the entire task, with the sequence usually advancing from simple to complex in order to approximate real-world conditions [74, 82]. This is especially true for patients with both cognitive and motor impairments. The second is the patient's level of cognitive functioning: Mullick and colleagues [74] showed that the ability to learn a simple or a complex task may be mediated by the cognitive function of a person. Given this finding, we suggest that the helper role may be used even more for patients with low cognitive functioning, while the physical and verbal cues would still be used in equal amounts. However, since we could not obtain the levels of cognitive functioning of the patients from the videos, the relationships among cognitive function, task complexity, and the time and frequency of roles and cues cannot be determined.

7.1 Implications for Clinical Effectiveness

Schweighofer et al. [86] have identified three tenets of robot-assisted task-oriented therapy. The first tenet states that a robot must be able to enable users to train on actual functional tasks. Such robot systems would need to aid in identifying ADL components, grading task complexity, and adapting ADL instructions to the stroke patient. This behavior would require the robot to utilize the demonstrator role to instruct the client on how to perform a task. Like therapists, the robot in this role could utilize object affordances to determine the task to be performed [44]. The robot would need to identify the affordances quickly using computer vision techniques and perform a demonstration of the desired task [12]. Social affordances have previously been used in robotics to enable multi-step task planning, where an iCub utilizes learnt affordances to perform a pick and place task in collaboration with a human partner [99]. Awaad et al. [3] leverage the functional affordances of objects to generate socially acceptable behaviour for domestic robots. Here, the robot plans and executes a tea-making task by first generating a domain-specific planning problem which uses affordances during plan execution. Though affordances have been studied in the existing literature to some extent, they are yet to be used for enabling social-physical interactions in the context of therapy [72, 108].

The second tenet states that the robot should facilitate active participation in training. Repetition and active participation from the patient are vital aspects of therapy [17]. To encourage active participation from the client, the robot can provide supportive cues as needed to motivate patients to persist and give effort. In addition, the robot can switch between the observer and helper roles to provide assistance during a functional task only when it detects the appropriate physical or verbal cue; the robot thus encourages the patient to complete the task on their own. Additionally, the robot in the helper role must be able to safely provide the patient with the support that they require. Motion tracking can be used to determine patient pose and identify how and when to provide the required assistance. Finally, the robot should be able to personalize the training that it provides, dynamically adjusting the level of support offered to patients of various functional levels. The observer role will thus need to quantify patient progress and skill levels, and the helper role must be able to quantify client progress as well. The patient skill levels determined by the observer role can also enable the demonstrator role to customize the task complexity.

Overall, the benefit of developing robotic systems that can aid in task-oriented therapy comes from expanding the utility of robots to assist clinicians in stroke rehabilitation by supplementing therapy sessions where repetitions and self-practice on ADL tasks are required. Such a system can reduce the number of sessions that require a human therapist, enabling therapists to meet with more patients and thus helping to alleviate the shortage of clinicians.

7.2 Implications for Robot Therapists

Kinematic actions of the therapist during a state can be mimicked by the robot by providing the robot with information about therapist kinematics before, during or after a state. The kinematics of the therapist's actions can be captured using motion capture systems such as an exo-skeletal robot [39], a vision-based system such as Vicon [73], or wearable inertial sensors [34]. In Mohan et al. [73], we used the Vicon motion capture system to monitor and quantify the kinematics of patient-therapist interactions during collaborative therapy tasks. Kinematic signatures from motion capture systems can then be analyzed, with the corresponding movement primitives used to support online activity recognition. Guerra et al. showed that inertial sensors can be used to capture, learn and classify upper extremity movement primitives in healthy and stroke patients during functional tasks [34]. A Hidden Markov Model along with logistic linear regression was used to predict rest, reach, grasp and other key components of each activity.

One requirement for implementation would be to provide a robot the ability to identify cues and transition accurately between states. If the robot starts in demonstrator mode, how does it know when to help? The robot could receive kinematics of user motion through a motion capture system and learn to interpret them in order to detect a cue. Further, it would also require a versatile end-effector with haptic feedback to provide the required assistance. A large quantity of data about therapist intent is also needed and can be used in conjunction with a learning algorithm to give the robot these abilities. Recently, we experimented with a possible solution that allowed a therapist to teach a Baxter robot the reaching phase of a drinking motion [104]. The robot then used this kinematic information to build a Gaussian error tolerance around the desired movement kinematics. Using data from an inertial sensor attached to the arm of a user, the robot was able to monitor the user's arm motion and move from observer to helper when excessive deviation from the desired movement was observed. New sensors are needed to aid in transitioning from observer to helper modes. Beckerle and colleagues [5] reviewed several new advances that can improve the ability of the robot to perceive. Tactile sensing and robot skins can extend robot sensing capabilities by acquiring contact information from large-scale surfaces. These can be conformable, cheap, and easy to manufacture. This technology could allow the robot to make and break contact with the user with more than just its end-effector.
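The sketch below illustrates, under simplified assumptions, how deviation from a taught trajectory could trigger the observer-to-helper transition described above; the fixed-sigma tolerance, frame names and numeric values are illustrative and are not the exact Gaussian decision tunnel used in [104].

```python
import numpy as np

def should_assist(observed_pos, desired_pos, sigma, n_sigma=2.0):
    """Trigger a switch from observer to helper when the user's arm position
    deviates from the taught trajectory by more than n_sigma standard
    deviations (a simple stand-in for a Gaussian decision tunnel)."""
    deviation = np.linalg.norm(np.asarray(observed_pos) - np.asarray(desired_pos))
    return deviation > n_sigma * sigma

# Example: desired waypoint learned from the therapist's demonstration,
# observed position streamed from a wrist-worn inertial sensor (assumed).
desired = [0.40, 0.10, 0.25]   # metres, in a hypothetical robot base frame
observed = [0.46, 0.02, 0.25]
if should_assist(observed, desired, sigma=0.03):
    print("switch to helper role and provide lifting assistance")
```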

Leveraging the interaction framework developed by Jarrassé et al. [40, 41] along with the proposed stimulus-response model, some of the simpler aspects of the therapist-patient interaction could be defined by cost functions that change as the therapist/patient changes state. In the helper role, the Jarrassé framework suggests that the robot could be governed by a cost function where the robot as teacher seeks to constantly minimize its own effort and the patient's error, i.e., deviation from a defined movement target or pattern. Takagi et al. show how this model could be applied to control one degree of freedom wrist flexion/extension robots used for point-to-point reaching movements by healthy dyads [93]. In the Jarrassé framework, the human-human dyad and subsequently the human-robot dyad is in constant contact, but this is not the case in the demonstrator and observer roles seen within therapist-patient dyads.
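As an illustration only (the notation is ours and is not taken verbatim from [40, 41, 93]), such a teacher cost function for the helper role might take the form

$$\begin{aligned} J_{teacher} = \int _{0}^{T} \big (\alpha \Vert e(t)\Vert ^2 + \beta \Vert u_r(t)\Vert ^2\big )\, dt, \end{aligned}$$

where \(e(t)\) is the patient's deviation from the therapist-defined target or movement pattern, \(u_r(t)\) is the robot's assistive effort, and \(\alpha \) and \(\beta \) are weights that could be chosen by the therapist for each patient.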

We observe that the contact between the patient and therapist is made and broken frequently as transitions between roles occur. This finding raises the question of how best to automatically determine the states of coupled and uncoupled robot actions across an entire therapy task. In our model, the human-human or human-robot dyad may move between physically coupled and uncoupled states, with the goal of each agent changing when the state changes. One possibility is to first assume that the cost function articulated above remains intact during contact and non-contact states in the therapist-patient dyad and, secondly, to assume a weak or strong elastic band connecting the therapist to the patient [29]. Making these assumptions, we could then determine a cost function for how physical or verbal cues would change the strength of the spring gain on the elastic band. We acknowledge that while this would be an oversimplification of the “teacher” or “therapist” role, it may serve as a first attempt to capture at least the salient need for diminishing “helping” actions as the patient progresses from more impaired to less impaired. Another technique was demonstrated by Shu et al. [88], where social affordances were used to facilitate interactions such as handshakes and high-fives with a Baxter robot. Here, motion tracking data extracted from RGB-D videos collected via a Kinect were used as input to a social affordance grammar learning algorithm. This algorithm utilized a weakly supervised method to represent the grammar as a spatiotemporal AND-OR graph, which was then used in a real-time motion inference algorithm to enable a Baxter robot to interact with a human agent. An algorithm such as the one presented by Shu et al. [88] could be used to enable the robot to switch between the observer and helper roles, i.e., to switch between physically coupled and uncoupled states.
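A minimal sketch of the elastic band idea, assuming a virtual spring coupling the robot to the patient whose gain is raised or lowered by role-changing cues (the cue names and gain schedule are illustrative, not taken from our coded data):

```python
class VirtualCoupling:
    """Virtual elastic band between robot and patient whose stiffness is
    modulated by physical/verbal cues; zero stiffness means 'uncoupled'."""

    def __init__(self, k_max=120.0):
        self.k = 0.0          # current spring gain (N/m), 0 while observing
        self.k_max = k_max    # stiffness ceiling chosen by the therapist (assumed)

    def on_cue(self, cue: str):
        if cue in ("does_not_lift", "does_not_grip", "requests_help"):
            self.k = min(self.k + 0.5 * self.k_max, self.k_max)   # tighten band
        elif cue in ("performs_independently", "supports"):
            self.k = max(self.k - 0.5 * self.k_max, 0.0)          # relax band

    def assistive_force(self, error_m):
        return self.k * error_m   # simple proportional guidance force

band = VirtualCoupling()
band.on_cue("does_not_lift")
print(band.assistive_force(0.05))  # small guiding force once coupled
```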

Regardless of the context for defining the physical and social therapy robot, it is important to consider how best to make the interaction with the robot acceptable to patients. The ability of the therapist to adapt their behaviors, respond to patients' verbal and non-verbal cues, and then provide a situated and related response to the patient is a critical aspect of the human-human interaction. Mavridis et al. indicate that a critical component of user acceptance of human-robot interaction is the robot's ability to communicate verbally or non-verbally in a way that is situated in the here and now. One method for doing so is to establish a clear connection between what is heard by the robot and said to the robot and the resulting robot actions [69]. Other studies show that parameters such as robot facial expression, robot personality, proximity to the user, situated language, behavior, and non-verbal expressions such as eye gaze are important for developing an acceptable human-robot interaction [89]. Tapus and Matarić showed the importance of embodying the robot with affective qualities such as changing personality, facial expression, and behavior in response to patients' actions [95]. Fitter et al. showed that when the robot's arm compliance as well as its facial expression was changed during an interactive clapping task, the users' perception of the robot as friendly and sociable also changed [25]. Once a social and therapy agent is developed, appropriate assessment and evaluation methods should be used to critically examine the acceptability of the human-robot interaction. Sim et al., in a recent review, suggest that evaluation and assessment methodologies must be used to examine short-term and long-term impacts of design choices as well as perceived and actual acceptance and usability of the robot as a social and therapy agent [89].

7.3 Implications of Shared Control

Losey and colleagues argue that human-robot interactions are about shared control and that shared control can be arbitrated [57]. Arbitration is the process by which each partner negotiates its level of autonomy with respect to the other. A shared control scheme is presented in Fig. 9. In this example, the therapist could program the robot to be more or less autonomous in its support of activities with the patient. This implies that the therapist may program the robot to be more autonomous in one dyadic state and less autonomous in another.
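One common way to realize such arbitration, in the spirit of the blending schemes discussed in shared-control work such as [57], is to mix the human's and robot's commands with an arbitration weight that the therapist assigns per dyadic state. The sketch below is a minimal illustration with made-up parameter names and weights; it is not the specific scheme of Fig. 9.

import numpy as np

def arbitrate(u_human, u_robot, alpha):
    """Blend human and robot commands.

    alpha = 0.0 -> fully human-controlled; alpha = 1.0 -> fully autonomous robot.
    A therapist could assign a different alpha to each dyadic state,
    e.g. low alpha while observing, high alpha while helping.
    """
    alpha = float(np.clip(alpha, 0.0, 1.0))
    return alpha * np.asarray(u_robot) + (1.0 - alpha) * np.asarray(u_human)

# Example: more robot autonomy in the 'helper' state than in the 'observer' state
alpha_per_state = {"observer": 0.2, "helper": 0.8}
u = arbitrate(u_human=[0.1, 0.0], u_robot=[0.3, 0.1],
              alpha=alpha_per_state["helper"])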

The shared control scheme (Fig. 9) is common in rehabilitation robotics and is often used to leverage therapists’ expertise by allowing one therapist to oversee more than one patient [46, 100]. In studies with the Inmotion robot, the rehabilitation robot was programmed to be autonomous in the helper role only and applied assist-as-needed forces to the impaired arm of the patient [56]. In this scenario, the therapist could choose to oversee another patient. In another case, the therapist may program the robot to be autonomous in both the observer and helper roles and allow the robot to decide when to transition between these roles. For example, in Wang et al. [104], the Baxter robot was taught to autonomously observe the user’s attempts to duplicate a desired kinematic movement and to intervene by lifting its right arm and applying a lifting force with its end-effector to the patient’s forearm when slow progress or excessive deviation was detected with respect to a Gaussian decision tunnel implemented around the desired trajectory. Of course, when we ask robots to take on more of the control, we increase the need for safety. Jarrassé and colleagues [41] point out that in any close human-robot interaction the exchange of energy must be monitored, which echoes the need for safety. Such a robot should provide a safe common workspace, enable a human to predict the behavior of the robot, and ensure that collisions do not result in serious injury [37]. The robot would also need sufficient information about the subject’s motion to provide safe assistance [57]. For example, in Wang and colleagues [104], an inertial measurement unit was used to measure the user’s movement and provide feedback to the robot, and the robot was programmed to move its arm at low speeds.
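The decision-tunnel idea described for [104] can be approximated as a check of how far the measured motion deviates from the desired trajectory relative to a Gaussian band, together with a check on progress speed. The sketch below is our reading of that idea, not the authors' implementation; the threshold values are assumptions.

import numpy as np

def needs_assistance(x_measured, x_desired, sigma, progress_rate,
                     n_sigma=2.0, min_rate=0.05):
    """Decide whether the robot should intervene with a lifting force.

    x_measured, x_desired: current and reference positions along the trajectory.
    sigma: standard deviation defining the width of the Gaussian tunnel.
    progress_rate: estimated speed of progress along the trajectory.
    n_sigma, min_rate: illustrative thresholds for 'excessive deviation'
    and 'slow progress'.
    """
    deviation = np.linalg.norm(np.asarray(x_measured) - np.asarray(x_desired))
    outside_tunnel = deviation > n_sigma * sigma
    too_slow = progress_rate < min_rate
    return outside_tunnel or too_slow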

Fig. 9 Stimulus-response model for patient-therapist interactions where the robot shares control with the therapist

7.4 Sources of Error and Limitations

The stimulus-response model presented above and its shared control extension with the robot are simplified representations of a complex and rich interaction between patient and therapist during a therapy session. While all the nuances of this interaction cannot be fully represented in a state-based model, the model serves as an idealized account of how the patient-therapist dyad may operate during the session and shift between states. The human-robot interaction may also need to be modeled within each dyadic state [8, 41, 57].

Our stimulus-response model also assumes that the therapist and patient remain in a state until cued or stimulated to leave it. There may be scenarios where the roles are more interdependent, such as when the therapist provides assistance or supports the patient while demonstrating the next task. This behavior could be implemented using an evolutionary algorithm such as the one in [31], where existing behaviors or roles are enhanced by combining behaviors from the initial repertoire. Our model was also heavily dependent on patient errors as the source of physical cues for the patient. This interdependence could be overcome by combining the stimulus-response model with a goal-directed model, as in [79], which shows the use of such a model to anticipate user emotion. Another limitation was in cue definition. The cues that we defined became insufficient when role changes were not preceded by an explicit physical or verbal cue, for example when the therapist finished demonstrating a task and the patient began the task soon after. For future studies, we recommend introducing a set of administrative cues that can handle role changes that are not caused by either a physical or verbal cue and can indicate activities such as the beginning or the end of a task (Table 6).

Table 6 Proposed administrative cues for the patient and therapist
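To make the proposed administrative cues concrete, the sketch below encodes roles and cues as a small state machine. The role names and the transition table are illustrative assumptions drawn from the discussion above; they are not a definitive specification of Table 6.

from enum import Enum, auto

class Role(Enum):
    DEMONSTRATOR = auto()  # therapist shows the task (assumed role name)
    OBSERVER = auto()      # therapist/robot watches the patient attempt the task
    HELPER = auto()        # therapist/robot physically assists

class Cue(Enum):
    PHYSICAL = auto()          # e.g. patient error requiring hands-on correction
    VERBAL = auto()            # e.g. spoken prompt or question
    ADMIN_TASK_START = auto()  # proposed administrative cue: task begins
    ADMIN_TASK_END = auto()    # proposed administrative cue: task ends

# Illustrative transition table: (current role, cue) -> next role
TRANSITIONS = {
    (Role.DEMONSTRATOR, Cue.ADMIN_TASK_START): Role.OBSERVER,
    (Role.OBSERVER, Cue.PHYSICAL): Role.HELPER,
    (Role.HELPER, Cue.VERBAL): Role.OBSERVER,
    (Role.OBSERVER, Cue.ADMIN_TASK_END): Role.DEMONSTRATOR,
    (Role.HELPER, Cue.ADMIN_TASK_END): Role.DEMONSTRATOR,
}

def next_role(role, cue):
    """Return the next role, staying in the current role if no rule applies."""
    return TRANSITIONS.get((role, cue), role)

The administrative cues handle exactly the cases noted above: a role change at the start or end of a task that is not triggered by any physical or verbal cue from the patient.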

Another limitation of our study is the number of videos we examined to extract the roles and cue frequencies. We examined only 8 simple and complex ADLs, which are a subset of the large variety of tasks used to drive stroke therapy. We determined that the patient-therapist interactions could be generalizable, but it was unclear how task complexity affected the roles and cues [74, 82]. For example, one measure of complexity could be the length of a task; a longer task could lead to more time spent in the helper role by the therapist as the patient becomes fatigued. Analysis of a larger and more diverse data set may give a clearer picture of the effects of task complexity and of how a robot should modify its behavior depending on the complexity of functional tasks. Another limitation is the type of subjects represented in the 8 videos. The videos, which were mainly of low-functioning subjects, show that we need to better understand how the identified roles and cues would change for stroke survivors across a wide range of physical function (low, medium, and high). Hints from the data suggest that the helper role may be extended when the subject is lower functioning and that the robot would spend more time in a physically coupled mode. A final limitation is the lack of explicit kinematic and haptic information about the interactions as they occurred. In the future, it will be important to collect a larger variety of therapy videos across patient function and tasks and, if possible, simultaneous kinematic and haptic information for both patient and therapist.

8 Conclusion and Future Work

The stimulus-response model appears able to capture some of the relationships observed between patient and therapist across a variety of daily living tasks and presents a reasonable model of robot-patient interactions that may closely approach real therapy. However, there are other interactions in therapy that this model does not capture, such as empathy and caring or the influence of cognition and depression on behavior. Although the data on cues and roles presented were specific to the tasks evaluated and the patients involved in this study, we anticipate that, given new tasks and patients, the overall interaction scheme proposed would remain the same, while the percentage of time spent in each role would change depending on the level of impairment of the patient or the specific task. The robot would still need to dynamically switch between the three roles based on the cues and feedback from its sensors. We believe that this work is an initial attempt at modeling such complex interactions, and our future research will build upon this foundation. We are in the process of building a large database of patient-therapist interactions. We plan to examine the generalizability of this model and determine the situations under which it does not hold, with the goal of extending the model as needed. Our other goals are to implement a computational model that is valid for both coupled and uncoupled, contact and non-contact interactions, and a stimulus-response-based controller that can switch the robot between roles in response to cues.