
1 Introduction

Stroke is one of the leading causes of the loss of motor functions in patients [1]. Stroke patients often require intensive rehabilitation sessions to partially or completely regain their lost motor abilities [2]. Robot-aided rehabilitation leverages robotic devices to deliver such therapy [3,4,5,6]. It has been shown to be beneficial in upper-limb rehabilitation [7]; however, more work needs to be done to make robot-aided rehabilitation more accessible. Virtual reality (VR) systems, on the other hand, provide further possibilities, as demonstrated in [8], where Sveistrup assessed several settings. According to that assessment, combining VR with haptic interfaces improved patient engagement by offering greater diversity. Such haptic interfaces are frequently used as assistive devices to support motor learning/relearning of activities of daily living (ADLs) [9]. The recovery goal of rehabilitation for stroke patients is to regain the motor functions they had before the stroke. Motor relearning is assumed to govern motor recovery if the type of loss to be addressed by rehabilitation is known, the type of motor learning to be targeted is known, and the patient has an undamaged learning ability [10]. Motor relearning is thought to involve the same elements as motor learning in healthy people. Therefore, understanding and modeling motor learning in healthy individuals helps clarify the process of motor relearning that leads the patient toward motor recovery [11].

Various models for studying motor learning exist in the literature. A recently developed model of motor learning is based on the free energy principle. The Free Energy Principle (FEP) is a mathematical theory that attempts to explain brain structure and function by drawing on developments in statistics, physics, theoretical biology, and machine learning [12]. In general, the FEP states that all biological agents strive to maintain their equilibrium in the face of adverse external influences and to actively regulate internal equilibrium under changing environmental conditions [13]. The FEP, therefore, proposes to conceptualize human learning as a process of entropy minimization through “active inference,” in which the brain encodes a Bayesian network whose neural dynamics are governed by a generative model that predicts sensory data [14]. In addition, model-based and model-free approaches to learning motor control policies are discussed in [15]. Ueyama applied system identification techniques to model motor learning and recovery, revealing adaptation and generalization functions with linear state-space models [16]. Similarly, Casadio et al. proposed a linear model to predict the performance of impaired subjects during robot-assisted exercise and argue that computational models are promising tools for predicting the outcomes of robotic rehabilitation [17]. In [18, 19], a feedforward artificial neural network is used to model the use-dependent recovery of locomotor force, and learning is simulated by a biologically plausible reinforcement learning algorithm; the model makes predictions that are consistent with clinical and brain imaging data. In [20], the effects of control systems on motor relearning in a robotic hand exoskeleton are simulated. Reinforcement learning is used to model voluntary torque generation by the subject during the rehabilitation process, and it is shown that a kinematic control scheme that does not interact with the patient results in slacking. Finally, the mirror paradigm is used as the control task in this work. ADLs can also include activities performed in conjunction with others [21]. Thus, a commonly used approach in the literature to promote imitation and social coordination is the mirror game, in which players attempt to imitate each other's movements in one of two modalities: Leader-Follower (LF) or Joint Improvisation [22]. Indeed, several works have shown that haptic interactions can yield better overall motor performance for both the leader and the follower [23, 24].

Accordingly, a new experimental setup is proposed in this paper, consisting of a pinching manipulandum and a VR-based mirror game in LF mode as a motor control task. The virtual leader performs a compound movement consisting of three sine waves. The follower avatar is controlled by the unidirectional force generated when the user pinches the manipulandum. To move the avatar in the opposite direction, the user must learn to exploit the gravity simulated in the virtual environment. The data collected during this compound learning task is expected to support models of motor learning. In this proof-of-concept study, the manipulandum is used by a healthy individual for five consecutive days, and the collected data is analyzed to model and assess the participant's motor learning.

2 Materials and Methods

In this section, the physical setup containing a mirror game driven by a finger manipulandum is presented. In this human-robot interaction setup, the user learns to gauge the force applied through the manipulandum in order to achieve acceptable tracking performance.

2.1 Virtual Mirror Game and Pinching Manipulandum

Virtual Mirror Game. A leader-follower type mirror game is implemented in Simulink® Desktop Real-Time 2021b, where the player controls the vertical motion of a box in free fall. The game environment is designed using V-Realm Editor and is shown in Fig. 1. Given a predefined leader motion [23, 25], the user attempts to track the leader by applying a unidirectional force that moves the box upwards; downward motion is achieved through simulated gravity (~0.5 g). The unidirectional nature of the force creates a demanding task in which the participant must partially oppose the simulated gravity during the descending phase in order to stay synchronized with the leader. The tracking performance is calculated in real time using a position error-based score metric and displayed to the user on the screen [26]. The score is calculated as,

$${T}_{f}^{-1}{\int }_{0}^{{T}_{f}}\left({e}_{max}-\left|{y}_{F}\left(t\right)-{y}_{L}\left(t\right)\right|\right) dt$$
(1)

In Eq. 1, \(t\) is the elapsed time, \({e}_{max}\) is the maximum tolerable error, \({y}_{F}\) and \({y}_{L}\) are the follower’s and leader’s positions, respectively, and \({T}_{f}\) is the round duration.
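For illustration, the sketch below computes this score offline from logged leader and follower positions. The three-sine leader trajectory, all numerical values (amplitudes, frequencies, e_max), and the 2 kHz sampling are hypothetical stand-ins for the actual game parameters, which are not reproduced here.

```python
import numpy as np

def round_score(y_leader, y_follower, t, e_max):
    """Time-averaged score: mean of (e_max - |tracking error|) over the round (Eq. 1)."""
    err = np.abs(y_follower - y_leader)
    return np.trapz(e_max - err, t) / (t[-1] - t[0])

# Example with a made-up three-sine leader motion and a noisy stand-in follower.
t = np.linspace(0.0, 30.0, 30 * 2000)             # one 30-s round sampled at 2 kHz
y_L = (0.05 * np.sin(2 * np.pi * 0.10 * t)
       + 0.03 * np.sin(2 * np.pi * 0.25 * t)
       + 0.02 * np.sin(2 * np.pi * 0.40 * t))      # hypothetical compound leader motion
y_F = y_L + 0.005 * np.random.randn(t.size)        # hypothetical follower response
print(round_score(y_L, y_F, t, e_max=0.1))
```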

Fig. 1. Hand pinching the manipulandum (left); the tracking mirror game (right).

Pinching Manipulandum.

End-effector-based rehabilitation tools are widely used in robot-assisted rehabilitation applications. Planar manipulanda [27, 28] have been used in various studies to assess motor learning behavior in healthy individuals and patients. They are commonly driven by interaction-type controllers and provide haptic interaction between the users and a virtual environment [24]. A pinching manipulandum refers to an apparatus designed to facilitate the pinching action applied with the index finger and thumb [29, 30]. In this paper, a pinching manipulandum is developed to provide haptic interaction between the user and the mirror game. To achieve the pinching action, two ergonomic finger pads are constrained by horizontal sliders. These sliders are controlled by a centered double slider-crank mechanism actuated by a single motor (Maxon EC-max 60 W). A force sensor (Honeywell FSG15N1A) is placed against the finger pad such that the fingertip force is directed onto it through lever action. The measured force drives an admittance controller that renders a virtual spring-damper system at the fingertips, creating the human-robot interaction scheme shown in Fig. 2. The generated pinching force is transmitted through a data acquisition board (National Instruments PCI-6229 M Series) at 2 kHz to a real-time control system that drives the manipulandum motor and runs the aforementioned mirror game.
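To make the admittance scheme concrete, the following minimal sketch integrates a virtual mass-spring-damper driven by the measured pinch force and outputs a position setpoint for the motor loop. The parameter values (m, b, k, time step) and the low-level motor interface are assumptions for illustration, not the values used on the actual device.

```python
# Sketch of an admittance law rendering a virtual spring-damper at the fingertips.
class SpringDamperAdmittance:
    def __init__(self, m=0.2, b=5.0, k=200.0, dt=1.0 / 2000.0):
        self.m, self.b, self.k, self.dt = m, b, k, dt
        self.x = 0.0   # virtual displacement of the finger pads [m]
        self.v = 0.0   # virtual velocity [m/s]

    def step(self, f_measured):
        # m*x'' + b*x' + k*x = f  ->  semi-implicit Euler integration at 2 kHz
        a = (f_measured - self.b * self.v - self.k * self.x) / self.m
        self.v += a * self.dt
        self.x += self.v * self.dt
        return self.x   # position setpoint for the manipulandum motor

# At each control tick (hypothetical interfaces):
#   x_cmd = admittance.step(force_sensor_reading)
#   motor_position_controller.track(x_cmd)
```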

Fig. 2. The system architecture for sensing, using, and logging data. Dashed lines indicate systems implemented in software.

2.2 Models for Motor Learning

As stated in [31], the FEP incorporates the Bayesian brain hypothesis, which explores how well the nervous system can operate under uncertain conditions and how closely it approximates the ideal prescribed by Bayesian statistics: the idea that the brain is an inference machine [32]. In this work, an exponential model is used to describe the complex nature of motor learning [14]. The model is given as,

$$y=a{e}^{bx}$$
(2)

where the coefficients \(a\) and \(b\) can be interpreted as the initial magnitude of variability and the decay rate, respectively, \(x\) is the round number, and \(y\) is a performance error metric.
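A minimal sketch of fitting Eq. 2 to per-round error data is given below using SciPy; the synthetic data and the initial guess are made up for illustration, and the actual fitting tool used in this work is not specified here.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_model(x, a, b):
    return a * np.exp(b * x)

rounds = np.arange(1, 101)                                    # training rounds 1..100
rmse = 0.23 * np.exp(-0.009 * rounds) + 0.01 * np.random.randn(rounds.size)  # synthetic

(a_fit, b_fit), _ = curve_fit(exp_model, rounds, rmse, p0=(0.2, -0.01))
print(f"a = {a_fit:.3f}, b = {b_fit:.5f}")   # a: initial variability, b: decay rate
```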

In addition, two machine learning approaches are utilized to predict the performance of the participant. The first approach trains a Long Short-Term Memory (LSTM) [33, 34] network on the training dataset to predict the last day’s performance. The second approach uses tree-based machine learning models to evaluate which rounds were the most prominent for the motor learning of the participant.

2.3 Experimental Design

In this proof-of-concept study, one participant (female, age 31) played the mirror tracking game using the manipulandum. Over five consecutive days, the participant played a total of 104 rounds. A round is defined as the 30- or 60-s-long tracking task described in Sect. 2.1. At the beginning, the baseline skill level was assessed with two 60-s rounds (BL); afterwards, over 5 consecutive days, the participant played 20 30-s rounds per day for a total of 100 training rounds (T). On the fifth day, the skill level was assessed again with two 60-s rounds (R) identical to the baseline. The experimental protocol is shown in Fig. 3. Importantly, the BL and R rounds had identical leader motion patterns, whereas the T rounds had a reversed leader motion pattern [14, 26].

Fig. 3. The experimental protocol used to collect the data. The training data is collected as batches of 20 rounds over 5 consecutive days.

The RMSE provides a measure of the error between the leader's and follower's positions. As a result, this metric can be thought of as a measure of synchrony between the leader and the follower. It is defined as [35],

$$RMSE=\frac{1}{L}\sqrt{\sum\nolimits_{k=1}^{n}\frac{1}{n}{\left({x}_{L,k}-{x}_{F,k}\right)}^{2}}$$
(3)

where \(L\) is the position range, \(n\) is the number of samples, and the \(x\)’s refer to the positions of the leader and follower at the \(k\)th sample step. This metric is expected to decrease as the participant plays more rounds of the game, which can be correlated with motor learning performance. Indeed, this metric is used to fit the free energy model.
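Assuming the position range \(L\) is taken as the peak-to-peak range of the leader trajectory (an interpretation, since the normalization is not spelled out above), a per-round implementation of Eq. 3 could look as follows.

```python
import numpy as np

def normalized_rmse(x_leader, x_follower):
    """Range-normalized RMSE between leader and follower positions (Eq. 3)."""
    L = np.ptp(x_leader)   # position range taken as peak-to-peak of the leader (assumption)
    return np.sqrt(np.mean((x_leader - x_follower) ** 2)) / L

# rmse_per_round = [normalized_rmse(x_L, x_F) for x_L, x_F in logged_rounds]  # hypothetical
```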

3 Results and Discussion

3.1 Free Energy Model

The free energy model is fitted to the data collected over 5 days. The position data of the leader and the follower for the 100 training rounds were used to compute the RMSE of each round separately. The resulting scatter plot is shown in Fig. 4; each data point corresponds to the RMSE of a single round. The free energy model (Eq. 2) was then fitted to these data. The exponential fitting coefficients \(a = 0.234\) and \(b = -0.00914\) were determined with 95% confidence bounds and are consistent with the expectations of the model in [14]. According to the FEP, the fit characterizes the rate and variance of the subject's motor learning during the training period. To evaluate the error densities, Bayesian regression modeling is used [36,37,38]. The distribution in Fig. 4 (right) represents the density of the participant's error. The black line (y) shows the density of the observed error, while the thin lines show the posterior predictive distribution (yrep), i.e., the error predicted by the Bayesian model. The density plot demonstrates that the posterior predictive data closely match the observed data. This validation of the Bayesian model further supports the free energy model of motor learning.
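The exact Bayesian regression tooling, priors, and likelihood are not detailed above, so the following PyMC/ArviZ sketch of the posterior predictive check in Fig. 4 (right) should be read as an illustrative assumption: the data loader, the linear predictor over day and round, and all priors are hypothetical.

```python
import numpy as np
import pymc as pm
import arviz as az

round_in_day = np.tile(np.arange(1, 21), 5)     # round index within each day
day = np.repeat(np.arange(1, 6), 20)            # day index, 5 days x 20 rounds
rmse = load_per_round_rmse()                    # hypothetical loader for the 100 RMSE values

with pm.Model():
    intercept = pm.Normal("intercept", 0.0, 1.0)
    b_day = pm.Normal("b_day", 0.0, 1.0)
    b_round = pm.Normal("b_round", 0.0, 1.0)
    sigma = pm.HalfNormal("sigma", 0.1)
    mu = intercept + b_day * day + b_round * round_in_day
    pm.Normal("y", mu=mu, sigma=sigma, observed=rmse)
    idata = pm.sample(draws=1000, tune=1000)
    idata.extend(pm.sample_posterior_predictive(idata))

az.plot_ppc(idata)   # overlays the observed error density (y) with y_rep draws
```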

Fig. 4. (Left) The fitted free energy model \(y=a{e}^{bx}\) for the obtained RMSE values. (Right) The position error density of the Bayesian model with multiple predictors (day and round).

3.2 Use of Long Short-Term Memory (LSTM) and Tree-Based Predictive Models

Using the data from the training sessions, an LSTM type of artificial neural network [34] is trained to predict the subject's position response (i.e., the location of the follower avatar) on the fifth day. To narrow down the data, only the 20th training round of each day is fed to the network, Fig. 5 (left). In this study, the past 4 lags are given to the LSTM architecture as a time-delay embedding to capture the auto-correlation [39]. The architecture uses 1 recurrent layer with 2 features in the hidden state; the nonlinearity is supplied by the tanh function. The LSTM layer is followed by a single linear layer that applies a linear transformation to the incoming data. The network is trained with the Adam optimizer at a learning rate of 0.01, with the mean squared error as the loss function; 2000 epochs are needed for training. Figure 5 (right) shows the performance of the predictor. In addition to the visual representation, the forecasting performance of the LSTM is evaluated by calculating the RMSE on the test data, which is 0.053, indicating that the model has considerable prediction capability.
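A minimal PyTorch sketch of the network described above is given below; the time-delay embedding helper, tensor shapes, and data loading are illustrative assumptions rather than the exact implementation used in this study.

```python
import torch
import torch.nn as nn

class FollowerLSTM(nn.Module):
    def __init__(self, hidden=2):
        super().__init__()
        # 1 recurrent layer, 2 features in the hidden state, tanh nonlinearity (nn.LSTM default)
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden, 1)        # final linear transformation

    def forward(self, x):                        # x: (batch, lags, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])          # predict the next position sample

def embed(series, lags=4):
    """Time-delay embedding: windows of the past `lags` samples -> next sample."""
    X = torch.stack([series[i:i + lags] for i in range(len(series) - lags)])
    y = series[lags:]
    return X.unsqueeze(-1), y.unsqueeze(-1)

# series = torch.tensor(follower_positions_days_1_to_4, dtype=torch.float32)  # hypothetical data
# X, y = embed(series, lags=4)
# model, loss_fn = FollowerLSTM(), nn.MSELoss()
# opt = torch.optim.Adam(model.parameters(), lr=0.01)
# for _ in range(2000):
#     opt.zero_grad()
#     loss = loss_fn(model(X), y)
#     loss.backward()
#     opt.step()
```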

Furthermore, a tree-based machine learning algorithm for predicting the user’s R1 and R2 responses is developed with the aim of understanding which training round is most effective in predicting R1 and R2; this is achieved through the variable importance property of tree algorithms. All processes are conducted in Jupyter Notebook using Python libraries including Pandas, Sklearn, Xgboost, and Shap.

Fig. 5. (Left) Serially combined position error data for the 20th training round (T20) of each of the five consecutive days. (Right) The prediction of the LSTM.

Before modeling, the dataset is divided randomly into two parts: train (80%) and test (20%). The training data is then further divided randomly into train (80%) and validation (20%) sets. The parameters of the applied models are tuned using a random search approach. Two tree-based algorithms, Random Forest and Xgboost, are applied and compared in terms of \({R}^{2}\) and \(RMSE\). The prediction performance of both models on the validation and test sets is given in Table 1. The Xgboost model outperforms Random Forest in predicting the position of the user in R1 and R2. Next, which round of the game is the most effective in predicting the position of the user can be explored using a beeswarm plot of the model created by SHAP. SHAP is a game-theoretic, Shapley value-based method for explaining machine learning models. Figure 6 shows that the most effective training round for R1 is the 6th round, followed by the 14th and 7th rounds. Therefore, when the performance of the user in the 6th round increases, the performance of the user in the R1 round also increases. On the other hand, when the score of the user in the 14th round increases, the score of the user in R1 decreases. Figure 6 also shows that the most effective training round for R2 is the 14th round, followed by the 9th and 7th rounds. When the position score of the user in the 14th round increases, the position score of the user in R2 decreases; a similar interpretation can be made for round 9. In contrast, the R2 score decreases when the 7th round score decreases.
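The sketch below illustrates this pipeline with Sklearn, Xgboost, and SHAP. The feature/target construction (one feature per training round predicting the R1 or R2 score), the data loader, and the random-search grids are assumptions made for illustration; they are not the exact settings used in this study.

```python
import numpy as np
import shap
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# X: one feature per training round (e.g., columns T1..T20), y: the R1 (or R2) score.
X, y = load_round_features()                     # hypothetical loader
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

rf = RandomForestRegressor(random_state=0).fit(X_train, y_train)   # baseline comparison

search = RandomizedSearchCV(
    xgb.XGBRegressor(objective="reg:squarederror"),
    param_distributions={"n_estimators": [100, 300, 500],
                         "max_depth": [2, 3, 4],
                         "learning_rate": [0.01, 0.05, 0.1]},
    n_iter=10, cv=3, random_state=0)
search.fit(X_train, y_train)
best = search.best_estimator_

pred = best.predict(X_test)
print("R2:", r2_score(y_test, pred),
      "RMSE:", np.sqrt(mean_squared_error(y_test, pred)))

explainer = shap.TreeExplainer(best)             # SHAP values for the tuned Xgboost model
shap.plots.beeswarm(explainer(X_test))           # round-wise importance, as in Fig. 6
```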

Table 1. The performance of Random Forest and Xgboost in the prediction of R1 and R2
Fig. 6. The SHAP beeswarm plots for R1 (left) and R2 (right).

4 Conclusion

In this paper, we briefly described the pinching manipulandum system, which was developed to measure motor learning, initially in healthy people. Continuous implicit learning in the leader-follower modality is combined with learning to generate a unidirectional force and to exploit the gravitational field to move the follower. Such a challenge must be tailored to the skill level of the participants. The primary goal of our study is to develop subject-specific motor learning models. The initial phase of model construction is guided by simple exponential functions derived from the free energy principle and by predictors derived from machine learning methods. These models will later be combined with more structured learning models to replicate more complex motor control policies. They will also serve as prior knowledge for subject-specific reinforcement learning models. Evaluating the most efficient rounds or sessions for motor learning is critical to developing optimal rehabilitation programs. The number of participants must be increased to demonstrate that data from the proposed experimental platform can be used to construct subject-specific and distinguishable models. The manipulandum will be used as an end-effector type of rehabilitation robot for stroke patients, as well as in conjunction with an exoskeleton-type robotic system to serve as an interaction environment for the VR mirror therapy protocol. Finally, we propose to construct patient-specific optimal robotic hand rehabilitation systems with control systems that impose personalized entropic sources in order to maximize motor relearning.