1 Introduction

Several neuroscientific studies focus on eye-head behaviour during locomotion. Results about head and eye movements during walking mostly come from two-dimensional studies on linear overground, turning, and treadmill locomotion, running, and walking on compliant surfaces [1–6]. These studies have shown that the body, head, and eyes rotate in response to the up-down and side-to-side motion of locomotion in order to maintain stable head pointing and gaze in space. This is achieved through the joint effect of two main classes of reflexes, which rely on the output of the vestibular system: 1. the vestibulo-ocular reflex (VOR), which stabilizes the visual axis to minimize retinal image motion; 2. the vestibulocollic reflex (VCR), which stabilizes the head in space through the activation of the neck musculature in response to vestibular inputs. The VOR compensates for head movements that would perturb vision by turning the eye in the orbit in the direction opposite to the head movement [7]. Several approaches have been used to model the VOR, depending on the aim of the study, and the robotics literature includes some controllers inspired by it [8–11]. The VCR stabilizes the head based on inertial input by generating a command that moves the head in the direction opposite to the current head-in-space displacement. When the head rotates in the plane of a semicircular canal, the canal is stimulated and the corresponding neck muscles are activated, producing a compensatory rotation of the head in the same plane. If more than one canal is activated, an appropriate combined reflex response is produced. Unlike the VOR, the VCR controls a complex musculature: the VOR involves six extraocular muscles, each pair acting around a single rotation axis, whereas the neck has more than 30 muscles controlling pitch, roll and yaw rotations.

In robotics, some head stabilization models have already been implemented on humanoid robots. Gay et al. [12] proposed a head stabilization system for a bipedal robot during locomotion, controlled by optical flow information. It is based on Adaptive Frequency Oscillators, which learn the frequency and phase shift of the optical flow. Although the system can successfully stabilize the head of the robot during locomotion, it does not take the vestibular inputs into consideration. The works closest to the neuroscientific findings on the VCR are those proposed by Kryczka et al. [13–15]. They proposed an inverse Jacobian controller [13, 14] based on neuroscientific results [16] and an adaptive model based on feedback error learning (FEL) [15], both able to compensate for the disturbance introduced by trunk rotations. All the presented models try to reproduce specific aspects of the gaze stabilization behaviour, but none of them provides a comprehensive model of gaze stabilization integrating eye stabilization (VOR) with head stabilization (VCR).

From the analysis of these neuroscientific findings, we can conclude that replicating the eye-head stabilization behaviours found in humans requires reproducing the joint effect of the VCR for the head and the VOR for the eyes. This work goes in this direction by presenting a model that replicates the coordination of VCR and VOR and is suitable for implementation on a robotic platform. As disturbance motion, we used inertial data acquired on a human subject performing various locomotion tasks (straight walking, running, and walking a curved path on normal and soft ground), replayed by the torso of a humanoid robot. The purpose of these tests is to assess how effectively the proposed model rejects the torso disturbance measured in real walking tasks through the joint stabilizing action of the head and eyes of the simulated iCub robot.

2 Eye-Head Stabilization Model

In order to implement the VOR-VCR system, a bio-inspired feedforward control architecture was used. The model uses classic feedforward controllers that generate motor commands based purely on the current error. Each controller is coupled with a learning network implementing an internal model, whose predictions are used to fine-tune the motor commands. An overview of the model is shown in Fig. 1.
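As a rough illustration of this architecture, the sketch below (in Python, with hypothetical names such as internal_model.predict and internal_model.update; not the actual implementation) shows how each stabilization channel sums the PD output with the internal model prediction and reuses the resulting command as the training signal, as depicted in Fig. 1.

```python
class StabilizationChannel:
    """Schematic of one channel in Fig. 1: a PD feedforward controller whose
    output is summed with the prediction of a learning network (internal
    model); the resulting motor command also trains the internal model."""

    def __init__(self, kp, kd, internal_model):
        self.kp = kp
        self.kd = kd
        self.internal_model = internal_model  # object exposing predict/update

    def step(self, error, error_dot, model_input):
        e = self.kp * error + self.kd * error_dot         # PD contribution
        u = self.internal_model.predict(model_input)      # internal model prediction
        command = e + u
        self.internal_model.update(model_input, command)  # learn from the issued command
        return command
```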

Fig. 1. The proposed model of eye-head stabilization. Dashed lines represent encoder readings, dotted lines show inertial values.

2.1 Head Stabilization System

Within the head stabilization system, the output of the VCR internal model (\(u_{vcr}\)) is added to the output of the feedforward controller (\(e_{vcr}\)) in order to generate motor commands that stabilize the head against the disturbance originating from the torso movements. The VCR feedforward controller is implemented as a PD controller, whose output is computed as a function of the inertial readings (In, \(\dot{In}\)):

$$\begin{aligned} e_{vcr} = k_p \cdot In + k_d \cdot \dot{In}. \end{aligned}$$
(1)

The inputs to the learning network are the current and desired position and velocity of the robotic head, and the network is trained with the newly generated motor commands. In order to provide a proper reference to the VCR internal model, the current value of the external disturbance must be estimated. Using the readings coming from the inertial measurement unit together with the encoder values, the disturbance vector (d) can be estimated using only direct kinematics, by computing \(d = In - \tilde{In}\), i.e. by subtracting the expected angular rotations given by the encoder values (\(\tilde{In}\)) from the inertial readings (In). Here \(\tilde{In} = [\varphi , \vartheta , \psi ]\) are the Euler angles of the rigid roto-translation matrix \(K(\theta _h)\) from the root reference frame to the inertial frame, computed as:

$$\begin{aligned} \varphi&=atan2(-K(\theta _h)_{2,1},K(\theta _h)_{2,2}),\end{aligned}$$
(2)
$$\begin{aligned} \vartheta&=asin(K(\theta _h)_{2,0}),\end{aligned}$$
(3)
$$\begin{aligned} \psi&=atan2(-K(\theta _h)_{1,0},K(\theta _h)_{0,0}). \end{aligned}$$
(4)

The same procedure can be followed to estimate the velocity of the disturbance:

$$\begin{aligned} \dot{d} = \dot{In} - \dot{\tilde{In}} = \dot{In} - J(\theta _h) \cdot \dot{\theta _h}, \end{aligned}$$
(5)

where J is the geometric Jacobian from the root reference frame to the inertial frame.
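A minimal sketch of this disturbance estimation (Eqs. 2-5) is given below, assuming that the rotation part of \(K(\theta _h)\) and the geometric Jacobian \(J(\theta _h)\) are available from the robot's kinematic model; variable names are illustrative only.

```python
import numpy as np

def expected_inertial_angles(K):
    """Eqs. 2-4: Euler angles expected from the neck encoders, extracted from
    the root-to-inertial rotation matrix K(theta_h) (zero-based indices,
    following the convention used in the text)."""
    phi   = np.arctan2(-K[2, 1], K[2, 2])
    theta = np.arcsin(K[2, 0])
    psi   = np.arctan2(-K[1, 0], K[0, 0])
    return np.array([phi, theta, psi])

def estimate_disturbance(In, In_dot, K, J, theta_h_dot):
    """Disturbance position and velocity: d = In - In_tilde, and Eq. 5 for
    the velocity, via the geometric Jacobian J(theta_h)."""
    In_tilde = expected_inertial_angles(K)
    d = np.asarray(In) - In_tilde
    d_dot = np.asarray(In_dot) - J @ np.asarray(theta_h_dot)  # Eq. 5
    return d, d_dot
```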

2.2 Eye Stabilization System

The eye stabilization system implements the VOR and, similarly to the head stabilization system, produces a motor command for the eyes that is the sum of the feedforward controller output (\(e_{vor}\)) and that of the VOR internal model (\(u_{vor}\)). Given that the eyes must stabilize the image against the rotation of the head, the error is computed as the difference between the inertial measurements and the current eye encoder values (\(\theta _e\), \(\dot{\theta _e}\)). Thus, the output of the VOR feedforward controller is computed as

$$\begin{aligned} e_{vor} = k_p \cdot (-In) + k_d \cdot (-\dot{In}). \end{aligned}$$
(6)

The VOR internal model receives as inputs the head position and velocity signals acquired through the vestibular system, used as references, along with the proprioceptive feedback, and uses the generated motor command as a training signal.
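A possible rendering of Eq. 6 is sketched below, using the PD gains reported in Sect. 5; the composition with the internal model prediction, indicated in the trailing comment, mirrors the VCR channel and again relies on a hypothetical predict/update interface.

```python
import numpy as np

def vor_feedforward(In, In_dot, kp=1.0, kd=0.1):
    """VOR feedforward term (Eq. 6): the eyes counter-rotate with respect to
    the head, so the inertial readings enter with a negative sign
    (gain values taken from Sect. 5)."""
    return kp * (-np.asarray(In)) + kd * (-np.asarray(In_dot))

# The total eye command adds the prediction of the VOR internal model and
# reuses the command as its training signal, analogously to the VCR channel:
#   e_vor = vor_feedforward(In, In_dot)
#   u_vor = internal_model.predict(head_state)   # hypothetical interface
#   command = e_vor + u_vor
#   internal_model.update(head_state, command)
```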

2.3 Learning Network

The prediction of the internal model is provided by a learning network implemented with a machine learning technique, Locally Weighted Projection Regression (LWPR) [17]. This algorithm has been shown to provide a representation of the cerebellar layers that, in humans, are responsible for generating the predictive motor signals that produce more accurate movements [18, 19]. LWPR exploits spatially localized linear models at a low computational cost through online incremental learning; the prediction process is therefore fast, allowing real-time learning. LWPR incrementally divides the input space into a set of receptive fields, each defined by a centre \(c_{k}\) and a Gaussian area characterized by a positive definite distance matrix \(D_{k}\). The activation of each receptive field k in response to an input x is expressed by

$$\begin{aligned} p_{k}(x) = exp\left( -\frac{1}{2}(x-c_{k})^{T} D_{k}(x-c_{k})\right) , \end{aligned}$$
(7)

while the output is \(y_k(x) = w_k \cdot x + \epsilon _k\), where \(w_k\) and \(\epsilon _k\) are the weight vector and bias associated with the k-th linear model. At each iteration, the new input x is assigned to the closest receptive field based on its activation, and the centre, the weights and the kernel width of that field are updated proportionally to a training signal. Moreover, the number of local models increases with the complexity of the input space.

The global output of LWPR is given by the weighted mean of the predictions \(y_{k}\) of all the N local linear models created:

$$\begin{aligned} u(x) = \frac{\sum ^{N}_{k=1} p_{k}(x) y_{k}(x)}{\sum ^{N}_{k=1} p_{k}(x)}. \end{aligned}$$
(8)
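The following fragment illustrates the prediction step of Eqs. 7 and 8 only, given a fixed set of receptive fields; the incremental update of centres, distance metrics and number of local models performed by the actual LWPR algorithm is omitted.

```python
import numpy as np

def lwpr_predict(x, centres, distance_matrices, weights, biases):
    """Weighted-mean prediction of LWPR (Eqs. 7-8) over fixed local models."""
    num, den = 0.0, 0.0
    for c_k, D_k, w_k, eps_k in zip(centres, distance_matrices, weights, biases):
        diff = x - c_k
        p_k = np.exp(-0.5 * diff @ D_k @ diff)   # Eq. 7: receptive field activation
        y_k = w_k @ x + eps_k                    # local linear model output
        num += p_k * y_k
        den += p_k
    return num / den                             # Eq. 8: activation-weighted mean
```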

3 Experimental Procedure

In order to collect human inertial data relative to locomotion tasks, experiments were conducted on a human subject with no visual or vestibular impairments. An inertial measurement unit (IMU) was placed on the back of the subject, near T10 (the tenth vertebra of the thoracic spine), as depicted in Fig. 2.

Fig. 2. Placement of the inertial measurement unit on the subject.

The IMU used was an Xsens MTi orientation sensor, which incorporates an on-board sensor fusion algorithm and Kalman filtering. The unit provides the current orientation of the torso at a frequency of 100 Hz.

Three different tasks were performed by the subject: straight walking (25 m), circular walking and straight running (25 m). The circular walking was carried out by asking the subject to walk in a circular pattern, without any indication of the path marked on the ground. This task was executed both on normal and on soft ground, the latter obtained by placing a foam rubber sheet on the floor. The foam had a density of 40 kg/m\(^3\) and the sheet measured 103\(\,\times \,\)160\(\,\times \,\)15 cm. All tasks were performed barefoot.

Since inertial readings on the yaw rotational axis (rotation around z) can often be inaccurate because of drifting, we decided not to use them. Moreover, in order to prevent drifts of the sensor measurements on the other two rotational axes (pitch and roll, rotations around y and x respectively), each trial lasted less than one minute and the rotational angles were reset at the beginning of the trial [20].
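A simple preprocessing sketch consistent with this procedure is shown below, assuming the recorded orientation is stored as an N x 3 array of roll, pitch and yaw angles sampled at 100 Hz (the column order is an assumption, not specified above).

```python
import numpy as np

FS = 100.0            # IMU sampling frequency (Hz)
MAX_TRIAL_S = 60.0    # trials are kept under one minute to limit drift

def preprocess_trial(euler_deg):
    """euler_deg: (N, 3) array of [roll, pitch, yaw] torso angles in degrees.
    Drops the drift-prone yaw channel, removes the initial offset (the reset
    at the beginning of the trial) and truncates the trial to one minute."""
    n_max = int(MAX_TRIAL_S * FS)
    trial = euler_deg[:n_max, :2].copy()   # keep roll and pitch only
    trial -= trial[0]                      # re-zero at the start of the trial
    return trial
```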

4 Robotic Platform

The proposed model was implemented for the iCub robot simulator [21], a software tool included with the iCub libraries. The iCub head has a total of 6 degrees of freedom: 3 for the neck (pan, tilt and swing) and 3 for the eyes (a common tilt, version and vergence), while the torso has 3 degrees of freedom (pan, tilt and swing). The stereo vision system consists of 2 cameras with a resolution of 320\(\,\times \,\)240 pixels.

In order to assess the repeatability of the experiments on the iCub simulator, a preliminary test was conducted to evaluate whether the measured inertial rotations of the simulated robot were compatible with the collected data. To this end, the collected torso rotations were given as motor commands to the robot torso. A graphical comparison can be seen in Fig. 3, where the actual IMU data is shown alongside the data measured on the robot. It can be observed that the simulation reproduces the data accurately, albeit with a delay of 50 ms. The error between the two signals was computed after a temporal alignment; its root mean square value is 0.21 deg for the pitch rotational axis and 0.12 deg for the roll axis.
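The alignment and error computation can be reproduced with a sketch such as the following, which scans a range of candidate delays and reports the one minimizing the root mean square error; the 0.2 s search window is an arbitrary choice.

```python
import numpy as np

def aligned_rms(reference, measured, fs=100.0, max_lag_s=0.2):
    """RMS error between the commanded (IMU) and simulated torso rotation
    after compensating a constant delay, found by a simple lag scan."""
    best = None
    for lag in range(int(max_lag_s * fs) + 1):
        ref = reference[: len(reference) - lag] if lag else reference
        mea = measured[lag: lag + len(ref)]
        rmse = np.sqrt(np.mean((ref - mea) ** 2))
        if best is None or rmse < best[1]:
            best = (lag, rmse)
    return best[0] / fs, best[1]   # estimated delay (s), RMS error
```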

Fig. 3. Comparison between collected IMU readings and simulated ones, on the pitch rotational axis (top) and roll axis (bottom). (Color figure online)

5 Results

The stabilization model was tested on the data coming from the three different locomotion tasks (straight walking, circular walking and straight running). Since the collected inertial data related to the yaw rotational axis was discarded, the eye-head stabilization model was simplified so that no stabilization on the yaw axis was performed. Moreover, given that the robot eyes cannot contribute to stabilization on the roll rotational axis, since only tilt and pan motors are present, only the disturbance on the pitch axis was compensated by the VOR model.

The main measure of error during a stabilization task is the movement of the camera image. In particular, human vision is considered stable if the retinal slip (the speed of the image on the retina) is below 4 deg/s [22]. In order to compute the error from the camera image, a target was placed in front of the simulated robot and its position was tracked in the camera images via a colour filtering algorithm during the execution of the task. Another measure of performance is the inertial orientation and velocity of the head. As stated above, no movement on the yaw rotational axis was considered; thus only the camera error on the vertical axis is relevant for the evaluation.
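A possible implementation of this measurement is sketched below, using a simple HSV colour filter with OpenCV as a stand-in for the tracking algorithm; the camera field of view and frame rate are assumptions, since they are not specified here.

```python
import cv2
import numpy as np

IMG_W, IMG_H = 320, 240          # simulated iCub camera resolution
FOV_H_DEG = 60.0                 # assumed horizontal field of view (not from the paper)
DEG_PER_PX = FOV_H_DEG / IMG_W
FS_CAM = 30.0                    # assumed camera frame rate (Hz)

def target_offset_deg(bgr, hsv_low, hsv_high):
    """Angular offset of the coloured target from the image centre, obtained
    with a basic HSV colour filter (illustrative only)."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, hsv_low, hsv_high)
    m = cv2.moments(mask)
    if m["m00"] == 0:
        return None                       # target not visible in this frame
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    return np.array([cx - IMG_W / 2, cy - IMG_H / 2]) * DEG_PER_PX

def retinal_slip(offsets_deg, fs=FS_CAM):
    """Retinal slip (deg/s) as the frame-to-frame derivative of the target
    position; stable vision requires values below about 4 deg/s."""
    return np.diff(offsets_deg, axis=0) * fs
```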

For each task, a comparison between the same task performed with and without the stabilization model is presented. The gains of the PD controllers were set to \(k_p = 5.0, k_d = 0.1\) for the VCR model and to \(k_p = 1.0, k_d = 0.1\) for the VOR model, for all trials.

5.1 Straight Walking

Results for the compensation of the straight walking disturbance can be found in Table 1, which reports the Root Mean Square (RMS) values of the inertial readings and of the target position and speed. In this and the following tables, \(In_p, \dot{In_p}\) are the inertial readings for rotation (deg) and rotation speed (deg/s) on the pitch axis, \(In_r, \dot{In_r}\) are the corresponding readings on the roll axis, and \(v, \dot{v}\) are the position of the target on the camera image (deg) and its speed (retinal slip, deg/s).

Table 1. Results for straight walking data.

Figures 4, 5 and 6 show the behaviour during the task, reporting the target position and retinal slip, the inertial data for the pitch rotational axis and the inertial data for the roll axis, respectively. From these results it can be noticed that while the roll disturbance is almost completely compensated by the VCR model, the magnitude of the rotational velocity on the pitch axis is too high to be fully compensated by that model, which only provides an improvement in the position space. Nevertheless, the VOR subsystem is still able to keep the camera image stable, with a mean vertical retinal slip lower than 4 deg/s. Moreover, Fig. 4 also shows a comparison between the full stabilization model and a simplified model with only the PD controllers. While the PD-only implementation is able to reduce the error on the camera, it is outperformed by the complete model, thus proving the effectiveness of the latter.

Fig. 4. Stabilization task with data from a straight walking task, target position (top) and retinal slip (bottom). (Color figure online)

Fig. 5. Stabilization task with data from a straight walking task, inertial position data (top) and inertial velocity (bottom), pitch axis. (Color figure online)

Fig. 6. Stabilization task with data from a straight walking task, inertial position data (top) and inertial velocity (bottom), roll axis. (Color figure online)

5.2 Circular Walking

Two sets of data were collected for the circular walking task: one on normal ground and one on soft ground. Results for both cases are presented in Table 2, where it can be observed that walking on soft ground produces a greater disturbance, especially in the velocity space. Despite the higher disturbance, the model is still able to stabilize the head and the camera image, achieving stable vision in both cases. As in the straight walking case, the disturbance on the pitch axis cannot be fully compensated by the VCR alone, but thanks to the VOR module vision remains stable. The behaviour on the soft ground task can be observed in Fig. 7.

Table 2. Results for circular walking data on normal and soft ground.
Fig. 7. Stabilization task with data from a circular walking task on soft ground, target position (top) and retinal slip (bottom). (Color figure online)

5.3 Straight Running

In the last experiment, data from the straight running task was used to move the robot torso. From Table 3 it can be observed that the model is not able to achieve complete compensation of the disturbance, due to the high rotational velocities on both axes. Nevertheless, the mean retinal slip is reduced to a quarter of that measured in the trial with no stabilization. Thus, the model provides a viable solution even for disturbances of this magnitude, as also shown in Fig. 8.

Table 3. Results for straight running data.
Fig. 8. Stabilization task with data from a straight running task, target position (top) and retinal slip (bottom). (Color figure online)

6 Conclusions

In this work we presented the first complete model of gaze stabilization based on the coordination of VCR and VOR, and we validated it through an implementation on a simulated humanoid robotic platform. We tested the model using, as disturbance motion, inertial data acquired on a human subject performing various locomotion tasks, which we replayed with the torso of the simulated iCub robot. Results show that the model performs well in almost all trials, reducing the retinal slip below 4 deg/s and thus achieving stable vision, with the exception of the straight running task. In that task, the model was still able to improve the stabilization by reducing the retinal slip to a quarter of that measured when no stabilization was present. As such, this model has proven suitable for use on humanoid robotic platforms, where it could help during visually guided locomotion tasks by stabilizing the camera view against the disturbance produced by walking.