1 Introduction

Posture control and balance are required to maintain equilibrium when walking or standing and to provide a buttress when performing a motor task. Losing balance is one of the typical reasons for failure in humanoids, often damaging the hardware, as reported for example for the DARPA challenges [1,2,3]. During such a challenge the robot is evaluated in terms of goal achievement, without (directly) going into the details of the reason for the failure. An evaluation system focused on the details of posture control is expected to help improve the components of the control system. The study of human posture control can provide inspiration for the control of humanoids [4,5,6,7] and, conversely, humanoids represent a potential testbed for theories in human neurology [8]. Studies involving human-inspired posture control systems usually include an ad hoc test of performance, while neurological studies that use the robot as a simulation device compare human and robot behavior on some quantitative basis (e.g. body-sway frequency response to external disturbances). In this work we specify a set of tests and performance indicators (PIs) that are meant to make such evaluations repeatable and comparable between different robots. This fits into the more general effort of producing benchmarking tools for humanoid robots [6,9,10,11,12].

2 Tests and Performance Indicators

Sinusoidal disturbance. Providing an external disturbance with a sinusoidal profile allows performance to be evaluated in terms of disturbance rejection. Different kinds of stimuli can be used, e.g. surface tilt or translation. The response, i.e. the induced body sway, is used to compute the gain at a specific frequency as the ratio between response and stimulus [13]. The periodic nature of the stimulus can also test the ability of the robot to exploit prediction [14]. Since the response of the robot is in general not linear, several frequencies and amplitudes can be tested, yielding several scores. In general, a smaller gain is considered a better performance. Nevertheless, a more “relaxed” compensation of the disturbances may be more efficient, and hence the gain may be evaluated together with the energy consumption or the mechanical work produced by the actuators [13].

For tests in which the support surface moves with a sinusoidal profile, the PI is the gain, i.e. the ratio of body sway to stimulus. The smaller the gain, the better the performance.
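The gain at the stimulus frequency can be sketched as follows. This is a minimal illustration, not the benchmark's reference implementation: the function name `gain_at_frequency`, the sampling rate, and the synthetic signals are assumptions made for the example, and the gain is taken here as the ratio of the Fourier components of response and stimulus at the stimulus frequency.

```python
import numpy as np

def gain_at_frequency(stimulus, response, fs, f0):
    """Ratio of response to stimulus amplitude at frequency f0 (Hz).

    Assumes both signals are sampled at fs (Hz) and that the record
    covers a whole number of stimulus cycles.
    """
    stimulus = np.asarray(stimulus, dtype=float)
    response = np.asarray(response, dtype=float)
    t = np.arange(len(stimulus)) / fs
    # Project each signal onto a complex exponential at f0 (single-bin DFT).
    basis = np.exp(-2j * np.pi * f0 * t)
    stim_component = np.dot(stimulus - stimulus.mean(), basis)
    resp_component = np.dot(response - response.mean(), basis)
    return abs(resp_component) / abs(stim_component)

# Synthetic example (hypothetical values): 10 full cycles of a 0.5 Hz stimulus.
fs, f0 = 100.0, 0.5
t = np.arange(2000) / fs
stimulus = 1.0 * np.sin(2 * np.pi * f0 * t)         # e.g. platform tilt
response = 0.4 * np.sin(2 * np.pi * f0 * t - 0.6)   # e.g. induced body sway
gain = gain_at_frequency(stimulus, response, fs, f0)
```

Because the robot response is generally nonlinear, such a function would be called once per tested frequency and amplitude, giving one score per condition.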

Raised cosine. A support surface movement, e.g. translation or tilt, with a raised-cosine velocity profile represents a smooth version of a step function that can be used safely with humanoids and human subjects [15]. In this way the transient response to external stimuli can be evaluated in terms of characteristics such as rise time, overshoot, settling time, peak time and delay time.

The raised cosine is a “safe” version of the step function that can be used to evaluate PIs reflecting the transient response characteristics: rise time, overshoot, settling time, peak time and delay time.
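The following sketch shows one way the stimulus and the transient characteristics could be computed. All names and thresholds are assumptions for illustration: the text does not fix the definitions, so the sketch uses common control-engineering conventions (10% to 90% rise time, 50% delay time, 2% settling band), and a synthetic second-order response stands in for measured body sway.

```python
import numpy as np

def raised_cosine_displacement(t, amplitude, duration):
    """Displacement whose velocity is a raised cosine lasting `duration` seconds."""
    tau = np.clip(np.asarray(t, dtype=float) / duration, 0.0, 1.0)
    # Integral of v(t) = (A/T) * (1 - cos(2*pi*t/T)): a smooth step from 0 to A.
    return amplitude * (tau - np.sin(2.0 * np.pi * tau) / (2.0 * np.pi))

def transient_metrics(t, y, y_final, band=0.02):
    """Transient characteristics of a response that settles at y_final.

    Assumes the response reaches 90% of y_final within the record.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float) / y_final  # normalise so the final value is 1

    def first_time_at(level):
        return float(t[np.argmax(y >= level)])  # first sample at or above level

    i_peak = int(np.argmax(y))
    outside = np.flatnonzero(np.abs(y - 1.0) > band)
    if outside.size == 0:
        settling = float(t[0])
    elif outside[-1] + 1 < len(t):
        settling = float(t[outside[-1] + 1])  # first sample after the last excursion
    else:
        settling = float("nan")  # never settles within the record
    return {
        "delay_time": first_time_at(0.5),
        "rise_time": first_time_at(0.9) - first_time_at(0.1),
        "peak_time": float(t[i_peak]),
        "overshoot_percent": max(0.0, float(y[i_peak]) - 1.0) * 100.0,
        "settling_time": settling,
    }

# Synthetic stand-in for a measured response: underdamped second-order system.
t = np.arange(10000) * 0.001
zeta, wn = 0.5, 4.0
wd = wn * np.sqrt(1.0 - zeta**2)
y = 1.0 - np.exp(-zeta * wn * t) * (np.cos(wd * t) + zeta * wn / wd * np.sin(wd * t))
metrics = transient_metrics(t, y, y_final=1.0)
```

With a raised-cosine stimulus the times would be measured from stimulus onset, and the final value would be the steady-state sway reached after the platform stops.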

Model parameters. Parametric models of human posture control can be fitted to experimental data. This transforms a series of body sway measures and input stimuli into a set of parameters. In particular, we developed a system based on convolutional neural networks [16] to fit the nonlinear DEC (disturbance estimation and compensation) model [4]. The parameters are not a PI by themselves, but they can be used to assess some properties of the humanoid such as joint stiffness and total loop delay. The set of parameters represents a feature set that can be used in the development of machine learning solutions and to define a similarity between two different robots.

Parameters for posture control models are a concise and meaningful representation of robot behavior that can be used for performance evaluation.

Human likeness. A dataset of results from human experiments is provided as a reference for the benchmarking. The set includes healthy subjects and subjects with specific health conditions affecting sensorimotor control, such as spasticity or vestibular loss. In the experiments the subject was presented with a stimulus consisting of a tilt or a translation of the support surface in the sagittal plane, while body sway was recorded as output. The profile used for the stimulus is a pseudorandom ternary signal, PRTS [17]. The comparison between different behaviors is defined in terms of the norm of the difference between frequency response functions (FRFs) on a set of relevant frequencies (specifically \({\varvec{f}}_{peak}\) = [0.0165, 0.0496, 0.0992, 0.1322, 0.1818, 0.2314, 0.2975, 0.3636, 0.4463, 0.5785, 0.7273, 0.9256, 1.1736, 1.4545, 1.7686, 2.1983] Hz). These frequencies are defined by the structure of the PRTS power spectrum \(P\left( f \right)\), which has a “comb” profile with peaks at those frequencies separated by ranges of frequencies with virtually no signal. Furthermore, the peaks of the PRTS power spectrum have larger values at lower frequencies [18]. This implies a better signal-to-noise ratio for the first components. A weighting proportional to \(P\left( {{\varvec{f}}_{peak} } \right)\) is therefore applied in the comparison. The distance between two FRFs is defined as the norm of the difference weighted by the precision matrix, i.e. the inverse of the covariance matrix \(\Sigma\) computed on the dataset of healthy subjects. Together with the aforementioned weighting, this leads to the definition of the norm:

$$D = \sqrt {{\varvec{d}}^{T} {\varvec{S}}\Sigma^{ - 1} {\varvec{S}}{\varvec{d}}}$$
(1)

where \({\varvec{S}} = \mathrm{diag}\left( {P\left( {{\varvec{f}}_{peak} } \right)} \right)\) is the diagonal matrix representing the reweighting due to the power spectrum, and \({\varvec{d}}\) is the difference between the two FRFs.

This approach does not require model identification because it operates directly on the data. The comparison can be performed between the tested robot and the average of a group (healthy subjects or subjects with a specific deficit), or between two single samples in order to quantify how much two robots differ from each other.
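A minimal sketch of the distance in (1) is given below. The function name `frf_distance` and the toy numbers are hypothetical. Two choices are assumptions of this sketch rather than statements from the text: the FRFs are treated as complex vectors, so the conjugate transpose is used in place of the plain transpose (the two coincide for real-valued data), and the covariance matrix is passed in precomputed.

```python
import numpy as np

def frf_distance(frf_a, frf_b, peak_power, covariance):
    """Weighted distance D between two FRFs sampled at the PRTS peak frequencies.

    frf_a, frf_b : FRF values at the peak frequencies (real or complex)
    peak_power   : stimulus power spectrum P evaluated at the same frequencies
    covariance   : covariance matrix Sigma estimated on the reference dataset
    """
    d = np.asarray(frf_a, dtype=complex) - np.asarray(frf_b, dtype=complex)
    S = np.diag(np.asarray(peak_power, dtype=float))
    Sd = S @ d
    # Solve Sigma x = S d rather than forming the inverse explicitly.
    x = np.linalg.solve(np.asarray(covariance, dtype=complex), Sd)
    # (S d)^H Sigma^{-1} (S d) is real for a Hermitian positive-definite Sigma.
    return float(np.sqrt(np.real(np.conj(Sd) @ x)))

# Toy example with two frequencies (illustrative numbers only).
frf_robot = [1.0 + 1.0j, 0.5]
frf_reference = [1.0, 0.5]           # e.g. a group average
peak_power = [2.0, 1.0]
covariance = np.eye(2)
D = frf_distance(frf_robot, frf_reference, peak_power, covariance)
```

The same function covers both uses described above: passing a group average as the second argument compares a robot with a reference group, while passing a second trial compares two individual samples.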

Human likeness can be estimated on the basis of a comparison with a dataset from human experiments. Different groups of subjects can provide a reference to ‘diagnose’ a specific behavior. The measure in (1) defines a norm that can also be used to compare two specific trials.

3 Conclusion

In this contribution we presented a set of PIs for posture control and an overview of our humanoid performance benchmarking principles. The human experimental dataset, the software implementing the proposed analysis and the hardware required to perform the proposed tests will be available through the EUROBENCH initiative (https://eurobench2020.eu/). The moving platform has been designed for humanoids but, provided that user safety is properly ensured, the PIs described here can also be applied to the study of wearable robots.