1 Introduction

Biped robots closely mimic human motion. Legs allow a robot to navigate uneven surfaces and cluttered environments, such as stairs and rocky terrain, with ease. Furthermore, this type of movement is less energy intensive [1, 2]. In the models commonly used for bipedal robots, the movement cycle consists of different phases, such as a single support phase (SSP) and a double support phase (DSP) [3]. Since there is no actuator between the foot and the ground, the robot is commonly modeled as underactuated, especially for biped robots with point feet. In addition to the challenges of under-actuation and complex nonlinear dynamics, the impact of the foot with the ground significantly affects the stability of motion and gives the system a hybrid form when the impact is viewed as instantaneous [4,5,6]. Movement in legged creatures is typically rhythmic and repetitive [7]. A closed orbit, known as a limit cycle, is therefore formed in the phase space [8,9,10]. Certain characteristic attributes are encoded in the limit cycle, such as the average speed or the typical energy level. Various methods have been proposed for this application that control the energy [11,12,13], the angular momentum [14,15,16], and other quantities [17,18,19]. However, most of these algorithms rely on tracking precomputed reference paths. The reference paths may be determined by analogy with biological research [20,21,22], from a simple dynamic model [23,24,25], or from the shape of passive gaits [26,27,28]. One of the main strategies for generating these paths is optimization based on various cost criteria, such as the minimum power expended over a walking cycle [29,30,31]. Only a few works have produced a closed-loop system with a definite stability property.
Virtual holonomic constraints (VHCs) are a key concept: they are imposed asymptotically on the robot motion via feedback control. This approach removes timed signals entirely from the control loop and replaces them with state-dependent virtual constraints [32,33,34]. To demonstrate motion stability, some researchers applied a Poincare return map to an underactuated system with instantaneous impact (USWII) [19, 35, 36]. It is a numerical method that places a line as a transversal section across the closed orbit and investigates stability by checking whether the successive intersection points on that line converge over time [37,38,39]. Stability is therefore not known in advance. Although the VHC approach is harder to control than methods based on precomputed reference paths, it enhances the robustness of the feedback loop by eliminating exogenous reference signals. Moreover, it decreases the cost of the analysis by reducing the dimension and complexity of the dynamic system. Another advantage of this method in analysis and control design is that it takes both kinematics and dynamics into account. As yet, these works have not evolved to be more compatible with human movement in the environment, nor can they smoothly modulate their speed based on human intent. Various approaches have been used to regulate the speed of robot motion. For example, the approach in [40] used time-varying rules and considered the robot as fully actuated. The work in [41] focused mainly on passive robots and found that changing physical parameters, such as the stiffness of hip springs or other joint flexibilities, affected the robot speed. The authors of [42] used a two-level control structure that did not rely on model dynamics or reference paths to control the robot speed, although they required a library of data to determine how the control inputs affect the robot speed.
In [43], an inverted pendulum model was used to adjust the robot speed by injecting energy, and the Poincare return map was used to analyze stability. In [44], the energy equilibrium point was shifted based on a variable-length inverted pendulum to achieve the desired speed. Some of these works simplified the biped model to an inverted pendulum, and almost all of them neither analyzed the under-actuation or impact effects nor provided guarantees of stability under switching conditions. Ensuring stable switching, on the other hand, requires estimating the basin of attraction of each individual gait and a proper transition between cycles so that no instability occurs. VHCs were used in [32] to regulate walking speed. For some speed values, a library of different sets of VHCs is developed, and smooth maps between each pair of sets are then generated. Stability there was also based on the Poincare return map. A similar perspective is employed in [45, 46]. Generating each gait in this way can be a prohibitively challenging task. We have attempted to present a brief yet comprehensive comparison of this issue. Providing a library of data [32, 42] and storing it on board requires extensive numerical integration of the high-dimensional nonlinear dynamics of the system at high processing cost, as well as a search for the stability of each gait.

Once a library of these motion gaits has been built, it also needs a large amount of memory in practical applications, and most of these works cannot be transferred to other models. Another drawback is that the resulting cycles are calculated only for specific values of the motion speed; for other speeds, the obtained cycles cannot be used directly. Therefore, there is no effective algorithm that can adjust the walking speed of a biped robot in real time. Using a simple model such as an inverted pendulum has two faults, since it differs from the full dynamic model [23,24,25]. First, the speed cannot be adjusted precisely, especially in accelerating maneuvers, because the under-actuation and impact effects are ignored in the analysis. Second, it does not provide guarantees of stability under switching conditions and cannot yield a stability rule. In summary, the impact gives the state space hybrid dynamics in which the states are switched, whereas the under-actuation gives it zero dynamics. Aside from the impact and under-actuation issues, there are several challenges in achieving this goal. The first is to stabilize a suitable VHC in the state space and obtain a repetitive motion, i.e., a closed orbit lying on the constraint manifold (more information can be found in [47]). The second is orbital stabilization using a suitable control algorithm. The third is online adjustment of the VHC parameters according to the demanded gait speed. We propose a solution to this shortcoming, which has not been properly addressed to date. Specifically, we use a novel method that searches online for new manifolds to reach any arbitrary, non-predetermined biped robot speed, which eliminates the need for a library, unlike [32].

1.1 Contributions

The aim of this article is to present a practical approach for regulating the average speed of a five-link biped robot starting from an initial stable movement. Our work relies on the idea of adaptively deforming the VHCs, which makes the closed-loop system autonomous by eliminating time. Our method uses the original model of the robot to determine the average speed, so the response is both accurate and robust. This article makes four contributions. First, we develop a methodology that analytically determines the stability obtained from stabilizing the VHCs, avoiding numerical and time-consuming methods such as Poincare return maps. Second, the average speed of the robot is regulated online. This is achieved using the concept of adaptive deformation of constraints, which transforms the obtained periodic orbit into one with the desired characteristics. Third, a mathematical formula is derived to determine the average speed of the created orbit in USWII systems, an orbit that is proven to be unique in the phase diagram. Finally, we design a robust hierarchical control scheme with two levels. At the low level, a controller stabilizes the VHCs in the system, producing an orbit on a two-dimensional constraint manifold. At the high level, the controller adaptively deforms the constructed orbit to adapt its characteristics to the demand. The shaping process occurs in the basin of attraction of the next gait, following the proper stability theorem. The high-level controller searches for suitable characterization parameters of the VHCs while the low-level control acts. We therefore define a critic agent to observe the stability and quality of convergence of the low-level control. The critic results are used to tune the high-level control actions that change the VHC parameters.

1.2 Organization

The article is organized as follows. In Sect. 2, the dynamic model of the biped robot in both the SSP and the DSP is presented. In Sect. 3, we outline our strategy for solving the problem. In Sect. 4, we describe the motion planning and present a stability theorem. In Sect. 5, we design the two-level hierarchical control method with a critic agent. In Sect. 6, simulations are presented and the results are briefly discussed.

2 Model of biped robot with instantaneous impact

This section reviews the kinematics and kinetics of the biped robot presented in [32, 33]. Since most of the locomotion takes place in the sagittal plane and on a level surface, the robot is considered planar and is therefore examined from a two-dimensional perspective, as in [19, 45, 48]. We consider a biped robot with five rigid links and four joints, consisting of a torso, hips, and two equal-length legs with point feet, as shown in Fig. 1. Each link is further assumed to have a uniformly distributed mass, as in [32,33,34]. The links are connected by actuated revolute joints, each of which acts as an external source of energy that generates torque. Therefore, there are four actuators. The walking gait is assumed to alternate between two successive phases: an SSP (with only one leg touching the ground) and an instantaneous DSP (where the legs alternate). The model from [33], also known as RABBIT, is used here.

Fig. 1

The schematic of the five-link underactuated biped robot

2.1 Modeling of SSP

During the SSP, the stance leg is placed on the ground and acts as an inverted pendulum. This rests on the physical assumption that friction with the ground prevents the foot from sliding, which is typical in the literature [19, 32, 35]. Inspired by the biology of human walking, the model has no actuator between the foot and the ground. Thus, the system has five degrees of freedom (DOF) with four actuators, as outlined above, and the biped robot has one degree of under-actuation. We take the absolute angles of the links as the generalized coordinates of the system. The dynamic equation of the SSP can be derived using the Lagrange method [33, 34]. The result is a standard second-order system.

$$ D\left( q \right)\ddot{q} + C\left( {q,\dot{q}} \right)\dot{q} + G\left( q \right) = B\left( q \right)u $$
(1)

Here, \(q=({q}_{1}, ..., {q}_{5})\in Q\) is the vector of configuration coordinates, where each \({q}_{i}\), \(i\in \{1,\dots ,5\}\), is an angular displacement variable in \({\left[\mathcal{R}\right]}_{2\pi }\). The configuration manifold \(Q={\prod }_{i=1}^{5}{q}_{i}\) is then a generalized cylinder. \(\dot{q}\in {\mathcal{R}}^{5}\) is the vector of generalized velocities, so that \({\left({q}^{T},{\dot{q}}^{T}\right)}^{T}\in \mathrm{TQ}\), where \(\mathrm{TQ}=Q\times {\mathcal{R}}^{5}\) is a Cartesian product. In Eq. (1), \( B\left( q \right):Q \to {\mathcal{R}}^{{5 \times 4}} \) is a smooth map whose rank is four for all \(q\in Q\). Further, \(D(q)\), the mass matrix, is symmetric and positive definite for all \( q\in Q\); \(C\left(q,\dot{q}\right)\) is the matrix of Coriolis and centrifugal terms; \(G(q)\), the gradient of the potential energy function, is smooth; and \(u\) is the vector of control inputs. The explicit expressions for Eq. (1) can be found in [32] section E page 465.

2.2 Modeling of DSP

The impact between the swing leg and the ground is modeled as a contact between two rigid bodies, and the transition takes place in an infinitesimal amount of time [32, 35, 46]. Since this phase occurs when the tip of the swing foot touches the ground, we define the impact surface \(I\) as

$$ I := \left\{ \left( q^{T} ,\dot{q}^{T} \right)^{T} \in TQ \;\middle|\; Y_{\text{swing foot}} = 0 ,\; X_{\text{swing foot}} > 0 \right\} $$
(2)

The contact model requires the full seven DOF. Accordingly, one can add two extra coordinates given by the position of the swing foot, \({p}_{2}(q)\). External forces act as impulses since the impact occurs instantaneously. It can be shown that the configuration states do not change, but the velocity states undergo a jump (see [32] and the references therein for more details). As a result, the governing differential equation of this phase reduces to the following algebraic equation,

$$ D\left( {\theta_{d} } \right)\left( {\dot{\theta }_{d}^{ + } - \dot{\theta }_{d}^{ - } } \right) = \;F_{ext} $$
(3)

where \(\theta_{d} := \left( q^{T} ,X_{\text{swing foot}} \left( q^{-} \right),Y_{\text{swing foot}} \left( q^{-} \right) \right)^{T}\) describes the configuration of the robot in space with seven DOF, \(F_{ext} := \int_{t^{-}}^{t^{+}} \delta F_{ext} \left( \alpha \right)d\alpha\) is the impulse of the external forces, \(\dot{\theta }_{d}^{+}\) is the vector of velocity coordinates just after the impact (index \(+\)), and \(\dot{\theta }_{d}^{-}\) is the vector of velocity coordinates just before it (index \(-\)).

To solve Eq. (3) for all unknowns, we need two additional equations that describe what happens when the swing foot touches the ground. These equations are derived under the assumption of neither rebound nor slip [32,33,34], as in Eq. (4).

$$ \frac{{\partial p_{2} \left( {\theta_{d} } \right)}}{{\partial \theta_{d} }}\dot{\theta }_{d}^{ + } = 0 $$
(4)

As a result, the impact map is obtained using Eq. (4) and Eq. (3).

$$ \left( {q^{ + } ,\dot{q}^{ + } } \right) = \Delta \left( {q^{ - } ,\dot{q}^{ - } } \right) $$
(5)
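As a rough illustration of how Eqs. (3) and (4) combine into an impact map, the sketch below stacks them into one linear system. It assumes, as is usual for rigid impact models but not stated explicitly above, that the impulse enters through the constraint Jacobian, \(F_{ext}=E^{T}F\) with \(E=\partial p_{2}/\partial \theta_{d}\). The names `solve_linear` and `impact_velocities` and the point-mass example are hypothetical and are not the robot model.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def impact_velocities(D, E, v_minus):
    """Return (v_plus, F) from D (v+ - v-) = E^T F and E v+ = 0.

    Assumption: the impulse acts through the constraint Jacobian E.
    """
    n, k = len(D), len(E)
    # Block system [[D, -E^T], [E, 0]] [v+; F] = [D v-; 0].
    top = [D[i] + [-E[j][i] for j in range(k)] for i in range(n)]
    bottom = [E[j] + [0.0] * k for j in range(k)]
    rhs = [sum(D[i][c] * v_minus[c] for c in range(n)) for i in range(n)]
    rhs += [0.0] * k
    sol = solve_linear(top + bottom, rhs)
    return sol[:n], sol[n:]


# Toy example (not the robot): a 1.5 kg point mass landing on flat ground.
D = [[1.5, 0.0], [0.0, 1.5]]
E = [[0.0, 1.0]]  # only the vertical velocity is constrained after impact
v_plus, F = impact_velocities(D, E, [1.0, -2.0])
```

In this toy case the horizontal velocity is preserved, the vertical velocity is removed, and the configuration is untouched, which mirrors the velocity jump described for the DSP.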

2.3 Dynamics equation in its overall form

The overall form of the governing equation for the dynamics is expressed with Eq. (6).

$$ \Sigma := \left\{ \begin{array}{ll} D\left( q \right)\ddot{q} + C\left( q,\dot{q} \right)\dot{q} + G\left( q \right) = B\left( q \right)u, & \left( q^{-} ,\dot{q}^{-} \right) \notin I \\ \left( q^{+} ,\dot{q}^{+} \right) = \Delta \left( q^{-} ,\dot{q}^{-} \right), & \left( q^{-} ,\dot{q}^{-} \right) \in I \\ \end{array} \right. $$
(6)

Let \(x: = \left( {q^{T} ,\dot{q}^{T} } \right)^{T} \in TQ\). The state space for the system is then taken as Eq. (7).

$$ \Sigma_{s} := \left\{ \begin{array}{ll} \dot{x} = f\left( x \right) + g\left( x \right)u, & x^{-} \notin I \\ x^{+} = \Delta \left( x^{-} \right), & x^{-} \in I \\ \end{array} \right. $$
(7)

The dynamics of the system is governed by a differential equation and a discrete map. The differential equation defines the evolution of the trajectories in time, and the time when the trajectories meet the impact surface. The discrete map results in a very rapid change in the velocity components of the state vector. The differential equation is therefore reinitialized after the impact event to be solved again in the next step of walking.
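The interplay of the differential equation and the discrete map can be sketched with a toy hybrid system. The example below is only an illustration under simplifying assumptions: it uses a bouncing ball rather than the biped model, explicit Euler integration, and hypothetical names (`simulate_hybrid`, `ball_flow`, `ball_hits_ground`, `ball_reset`). As in Eq. (7), the reset changes the velocity and leaves the configuration unchanged.

```python
def simulate_hybrid(x0, flow, in_impact_set, reset, dt=1e-3, t_end=3.0):
    """Integrate x' = flow(x) with explicit Euler; apply reset on impact."""
    x, t, impacts = list(x0), 0.0, 0
    while t < t_end:
        if in_impact_set(x):
            x = reset(x)  # discrete map: x+ = Delta(x-)
            impacts += 1
        dx = flow(x)      # continuous flow, reinitialized after each impact
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        t += dt
    return x, impacts


GRAVITY = 9.81


def ball_flow(x):
    """State x = (height, velocity); free fall between impacts."""
    return [x[1], -GRAVITY]


def ball_hits_ground(x):
    """Impact surface: at or below the ground while moving downward."""
    return x[0] <= 0.0 and x[1] < 0.0


def ball_reset(x):
    """Velocity jumps (restitution 0.8, an arbitrary choice); height is kept."""
    return [x[0], -0.8 * x[1]]


x_final, n_impacts = simulate_hybrid([1.0, 0.0], ball_flow,
                                     ball_hits_ground, ball_reset)
```

A fixed-step scheme only detects the impact at the first step after the surface is crossed; an event-locating integrator would be the more careful choice for the full model.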

3 Problem formulation

This article addresses the problem of steadily regulating the average speed of a five-link bipedal robot with instantaneous impact according to demand. First, the model dynamics must be defined. Because of our control method, this work is dynamics-based, which increases its accuracy and robustness. Our model of a biped robot with instantaneous impact is a hybrid system with one degree of under-actuation [32], as shown by Eq. (6). Unlike [23, 32, 42], however, we neither build a library nor use a simple model. The hybrid property has a discontinuous effect on the state-space trajectories at the impact event: the discontinuity in Eq. (6) shifts the state variables to a different location in the state space according to the impact map \(\Delta \left( . \right)\). We therefore design a control method for this purpose that overcomes both the under-actuation and the hybrid challenges. To this end, we first define a stability law. With it, we express three sets that should be stabilized relative to each other by the control method. One of these sets represents the walking gait, defined using four VHCs that form a manifold known as the constraint manifold. Two properties of the constraint manifold are highlighted here. The first is that it contains the trajectory of the zero dynamics. The second is that its governing equation is a second-order differential equation. Based on these properties, we formulate a relation for the average speed of a biped robot. The other sets describe how the goal is reached. We therefore design a two-level control scheme with an asymptotic stability property at each level. Speed regulation is achieved by deforming the VHCs through changes to their parameters, which we call characterization parameters. The low-level control scheme stabilizes the VHCs to build a periodic motion gait.
The high-level control scheme dynamically shapes the VHCs using the obtained formula for the average speed of the biped robot. The VHCs are changed once the low-level controller has driven them close to zero. To this end, the control scheme includes a critic agent that assesses the system in order to schedule new characterization parameters from the high-level control. If the motion trend is stable, the critic increases the rate at which new characterization parameters are generated to accelerate the average speed regulation; if the system tends toward instability, it reduces the rate as much as needed to allow the low-level control to stabilize the system. The feedback linearization backstepping sliding mode (FLBS) control method [49, 50] is employed at the low level to enhance robustness against disturbances. The overall scheme of the proposed method is shown in Fig. 2.

Fig. 2

The schematic of the control method
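The role of the critic agent described in this section can be sketched as a simple rate scheduler. This is only a minimal illustration and not the critic designed in the article: the function name `critic_update`, the thresholds, and the growth and shrink factors are all hypothetical, and the VHC error is summarized by a single norm.

```python
def critic_update(rate, err_norm, prev_err_norm, tol=1e-2,
                  grow=1.5, shrink=0.5, rate_min=1e-3, rate_max=1.0):
    """Return (new_rate, allow_update) for the high-level parameter change.

    rate          -- current rate of generating new characterization parameters
    err_norm      -- current norm of the VHC outputs y
    prev_err_norm -- norm of y at the previous check
    All numeric defaults are illustrative assumptions.
    """
    converging = err_norm <= prev_err_norm
    if converging and err_norm < tol:
        # Stable trend and VHCs close to zero: speed up and allow an update.
        return min(rate * grow, rate_max), True
    if not converging:
        # Error is growing: slow down so the low-level control can recover.
        return max(rate * shrink, rate_min), False
    # Converging but not yet close to zero: wait with the current rate.
    return rate, False


fast_rate, fast_ok = critic_update(0.1, 0.005, 0.02)
slow_rate, slow_ok = critic_update(0.1, 0.5, 0.2)
```

The bounds `rate_min` and `rate_max` keep the scheduler from stalling or from outrunning the low-level convergence; how these limits should be chosen for the biped is not specified here.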

4 Motion planning and stability analysis

In this section, we describe an appropriate stability theorem and explain how one can use it for our purposes. To this end, we use the VHCs to make a stable gait that can be utilized by a biped robot. In contrast to numerical methods commonly used in references [32, 37, 51], we describe the sufficient and necessary conditions for the analytical stability analysis. We then introduce the notion of adaptive deforming by characterizing these constraints. Moreover, we develop a mathematical formula to determine the exact average speed of the biped robot with the original model.

4.1 Characterizing the VHCs

Throughout this work, the VHCs are used to achieve our aims. Since the system has four independent control inputs, we use a vector function \(H(q)\) with four rows to define the VHCs, with relative degree \(\left\{2,\dots ,2\right\}\). The relative degree is determined by the appearance of the input in the time derivatives of \(H(q)\) along the dynamic system \(\Sigma \). Bezier curves are used for the components of \(H(q)\). We select them because small changes in their coefficients do not produce sudden or large oscillations [32, 52]. Let \(1\le i\le 4\) and let \(\sigma \) be a regular parameterization. A Bezier polynomial [16] of degree \(M\) is a polynomial \({Be}_{i}\left(\sigma \right):\left[0,1\right]\to {\mathbb{R}}\) with \(M+1\) coefficients \(\beta_{k}^{i}\), \(0 \le k \le M\):

$$ Be_{i} \left( \sigma \right): = \mathop \sum \limits_{k = 0}^{M} \frac{M!}{{k!\left( {M - k} \right)!}}\beta_{k}^{i} \left( {1 - \sigma } \right)^{M - k} \sigma^{k} $$
(8)

A regular parameterization should be defined based on the configuration state of zero dynamic, \(\theta \). In [32, 45, 53], it is shown that the walking gait is monotonic; thus, a regular parameterization can be dimensionless, like Eq. (9).

$$ \sigma = \frac{{\theta - \theta^{ + } }}{{\theta^{ - } - \theta^{ + } }} $$
(9)

In Eq. (9), \({\theta }^{+}\) and \({\theta }^{-}\) are the maximum and minimum values of \(\theta \). We call the coefficients \(\beta \) the characterization parameters, i.e., by changing them, we can modify the characteristics of the dynamic system.
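A minimal sketch of Eqs. (8) and (9) is given below to make the role of the coefficients concrete. The function names `phase` and `bezier` and the sample coefficients are hypothetical; the coefficients are indexed from 0 to \(M\), matching the summation in Eq. (8).

```python
from math import comb


def phase(theta, theta_plus, theta_minus):
    """Normalized parameterization sigma of Eq. (9)."""
    return (theta - theta_plus) / (theta_minus - theta_plus)


def bezier(beta, sigma):
    """Evaluate the Bezier polynomial of Eq. (8).

    beta  -- list of M + 1 coefficients beta[0], ..., beta[M]
    sigma -- parameter in [0, 1]
    """
    M = len(beta) - 1
    return sum(comb(M, k) * beta[k] * (1.0 - sigma) ** (M - k) * sigma ** k
               for k in range(M + 1))


# Illustrative coefficients only, not taken from the article.
beta = [0.0, 0.2, 0.8, 1.0]
start = bezier(beta, 0.0)   # equals beta[0]
end = bezier(beta, 1.0)     # equals beta[-1]
middle = bezier(beta, phase(0.5, 0.0, 1.0))
```

The curve interpolates its first and last coefficients, so changing interior coefficients reshapes the constraint between the two ends of a step without moving its end points.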

When the dynamic system \(\Sigma \) with the output \(y=H\left(q,\beta \right)\) is in rhythmic motion, it is time-invariant. In this case, the zero dynamics equation is derived to represent the maximal internal dynamics of the system, in which the control input does not appear in any form. It is obtained by left-multiplying the continuous part of Eq. (6) by the left annihilator of \(B\left( q \right)\), \(B^{ \bot } \left( q \right):Q \to {\mathbb{R}}^{1 \times 5} \backslash \left\{ 0 \right\}\). As a result, we get the zero dynamics equation described as follows,

$$ \left\{ \begin{gathered} \alpha \left( \theta \right)\ddot{\theta } + \rho \left( \theta \right)\dot{\theta }^{2} + \gamma \left( \theta \right) = 0\quad \quad \left( {\theta ^{ - } ,\dot{\theta }^{ - } } \right) \notin I \hfill \\ \left( {\theta ^{ + } ,\dot{\theta }^{ + } } \right) = \Delta \left( {\theta ^{ - } ,\dot{\theta }^{ - } } \right)\quad \quad \quad \quad \quad \left( {\theta ^{ - } ,\dot{\theta }^{ - } } \right) \in I \hfill \\ \end{gathered} \right. $$
(10)
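Reading the continuous part of Eq. (10) as the second-order equation \(\alpha (\theta )\ddot{\theta }+\rho (\theta )\dot{\theta }^{2}+\gamma (\theta )=0\), the substitution \(Y={\dot{\theta }}^{2}\) gives \(dY/d\theta =2\ddot{\theta }\) wherever \(\dot{\theta }\ne 0\), and hence the first-order linear equation \(dY/d\theta =-2\left(\rho (\theta )Y+\gamma (\theta )\right)/\alpha (\theta )\). The sketch below integrates this relation between impacts with a classical Runge-Kutta scheme. The function name and the constant toy coefficients are hypothetical, and \(\alpha (\theta )\) is assumed to be nonzero on the interval.

```python
def integrate_speed_squared(alpha, rho, gamma, theta0, theta1, y0, steps=1000):
    """RK4 integration of dY/dtheta = -2 (rho Y + gamma) / alpha, Y = theta_dot^2.

    alpha, rho, gamma -- callables of theta (alpha must not vanish)
    theta0, theta1    -- integration interval in theta
    y0                -- value of theta_dot^2 at theta0
    """
    def rhs(theta, y):
        return -2.0 * (rho(theta) * y + gamma(theta)) / alpha(theta)

    h = (theta1 - theta0) / steps
    theta, y = theta0, y0
    for _ in range(steps):
        k1 = rhs(theta, y)
        k2 = rhs(theta + 0.5 * h, y + 0.5 * h * k1)
        k3 = rhs(theta + 0.5 * h, y + 0.5 * h * k2)
        k4 = rhs(theta + h, y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        theta += h
    return y


# Toy coefficients (not the robot): alpha = 1, rho = 0, gamma = -2,
# so dY/dtheta = 4 and Y grows linearly in theta.
y_end = integrate_speed_squared(lambda th: 1.0, lambda th: 0.0,
                                lambda th: -2.0, 0.0, 0.5, 1.0)
```

Working in \(\theta \) rather than in time is convenient here because the VHCs are themselves parameterized by \(\theta \), so the velocity profile over one step follows from a single scalar integration.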

4.2 Overall stability analysis

The control approach followed in this paper has a hierarchical structure with two levels. As we describe shortly, each level corresponds to stabilizing a closed set of the state space, which we introduce below. These two levels must then be stable together, which leads to the following stability theorem (Theorem 1).

Definition 1

Let \({\Gamma }_{1}\) and \({\Gamma }_{2}\) be closed positively invariant sets with \({\Gamma }_{1}\subset {\Gamma }_{2}\subset \mathcal{X}\). The set \({\Gamma }_{1}\) is stable relative to \({\Gamma }_{2}\) for \({\Sigma }_{s}\) if, for any \(\epsilon >0\), a neighborhood \(\mathcal{B}\left({\Gamma }_{1}\right)\) exists such that \(\phi \left({\mathcal{R}}^{+},\mathcal{B}\left({\Gamma }_{1}\right)\cap {\Gamma }_{2}\right)\subset {\mathcal{D}}_{\epsilon }\left({\Gamma }_{1}\right)\), where \(\phi \left({\mathcal{R}}^{+},\mathcal{B}\left({\Gamma }_{1}\right)\cap {\Gamma }_{2}\right)\) denotes the set \(\left\{\phi \left(t,{x}_{0}\right):t\in {\mathcal{R}}^{+},{x}_{0}\in \mathcal{B}\left({\Gamma }_{1}\right)\cap {\Gamma }_{2}\right\}\) and \({\mathcal{D}}_{\epsilon }\left({\Gamma }_{1}\right)\) denotes the \(\epsilon \)-ball given by the set \({\mathcal{D}}_{\epsilon }\left({\Gamma }_{1}\right)=\left\{x\in \mathcal{X}:{\Vert x\Vert }_{{\Gamma }_{1}}<\epsilon \right\}\) [54]. Here, \({\Vert x\Vert }_{{\Gamma }_{1}}\) is the point-to-set distance from \(x\) to \({\Gamma }_{1}\).

For all \(x_{0} \in \Gamma_{i} , i = 1,2\) and all \(t \ge 0,\) the invariance property implies that \( \phi \left( {t,x_{0} } \right) \in \Gamma_{i} , i = 1,2\). Moreover, the set \(\Gamma_{1}\) is asymptotically stable relative to \({\Gamma }_{2}\) for \({\Sigma }_{s}\) if \({\Gamma }_{1}\) is asymptotically stable whenever \({x}_{0}\in {\Gamma }_{2}\). With the help of suitable sets and the following stability theorem, we can therefore shape the strategy for reaching our goal. For the first set, some characterization parameters are defined; they are then modified to form the other sets. \({\Gamma }_{3}\) represents the manifold on which the trajectories are initially placed. It is assumed that the average speed of the biped robot, \(\overline{v }\), initially has the value \({\overline{v} }_{0}\), which can be varied as desired. The set \({\Gamma }_{1}\) defines the target manifold on which the dynamic system has the demanded characteristic, \({\overline{v} }_{des}\). Unlike in methods where the desired trajectories must be completely predefined, this manifold is not known from the beginning and is determined by a search. Therefore, transition manifolds \({\Gamma }_{2}\) are constructed through which the target manifold \({\Gamma }_{1}\) can be reached. These manifolds are set by the high-level controller. Together, the manifolds \({\Gamma }_{2}\) span a region of the space from the manifold \({\Gamma }_{3}\) to the manifold \({\Gamma }_{1}\). We denote the successive transition manifolds \({\Gamma }_{2}\) by \({\Gamma }_{2,t}\), \(0<t<\infty \). Since the biped robot has 5 DOF in the SSP with 4 control inputs, 4 outputs must be considered as VHC Bezier curves. For simplicity, the number of characterization parameters is assumed to be one or two. Equation (11) gives the manifolds \({\Gamma }_{1},{\Gamma }_{2},{\Gamma }_{3}\) for regulating the average speed of a biped robot using two characterization parameters.

$$ \begin{gathered} \Gamma_{1} = \left\{ \left( q,\dot{q},\beta_{1} \right) \in {\mathbb{R}}^{12} : y - H_{\beta_{1}} \left( q \right) = 0 \;\&\; \overline{v} - \overline{v}_{\text{des}} = 0 \right\} \\ \Gamma_{2,t} = \left\{ \left( q,\dot{q},\beta \left( t \right) \right) \in {\mathbb{R}}^{12} : H_{\beta \left( t \right)} \left( q \right) - H_{\beta_{1}} \left( q \right) \to 0\;{\text{as}}\;t \to \infty ,\; \beta \left( 0 \right) - \beta_{0} < \epsilon_{1} \right\} \\ \Gamma_{3} = \left\{ \left( q,\dot{q},\beta_{0} \right) \in {\mathbb{R}}^{12} : y - H_{\beta_{0}} \left( q \right) < \epsilon_{2} \right\} \\ \end{gathered} $$
(11)

where \(\epsilon_{1}\) and \(\epsilon_{2}\) are sufficiently small positive values, and \(\beta_{0}\) and \(\beta_{1}\) denote the different sets of \(\beta\) assigned to the constraint manifolds.

Theorem 1

Suppose a five-link biped robot is a USWII dynamic system (7) with locally Lipschitz feedback \({u}_{i}(x), i=1..4\). Suppose its initial movement lies on the set \({\Gamma }_{3}\), which is positively invariant. As a target for the robot motion, the desired average speed is specified as a set \({\Gamma }_{1}\). Then \({\Gamma }_{1}\) is positively invariant and asymptotically stable for the closed-loop system (7) if the following conditions (a)-(b) hold.

a. The sequence of manifolds \({\Gamma }_{2,t}\) must be made asymptotically convergent to \({\Gamma }_{1}\) as \(t\to \infty \).

b. For any two successive manifolds \({\Gamma }_{2,{t}_{1}}\) and \({\Gamma }_{2,{t}_{2}}\), there exists \(\epsilon \) with \(0<{t}_{2}-{t}_{1}<\epsilon \) such that the locally Lipschitz feedback \({u}_{i}(x), i=1..4\), renders \({\Gamma }_{2,{t}_{2}}\) asymptotically stable starting from \({\Gamma }_{2,{t}_{1}}\).

Proof

The proof follows directly by applying Theorem 10 in [55] to each pair of successive sets \({\Gamma }_{2,t}\) and considering their convergence to \({\Gamma }_{1}\).

From Theorem 1 and the characterized VHCs, it follows that the biped robot has a stable motion with Bezier VHC curves under the adaptive deforming strategy. Since the general form of the VHCs is not altered (only the values of the characterization parameters change), the low-level control should preferably be robust and exponentially stabilizing. If the changes are small enough, the transition manifolds are close enough to lie in each other's domains of attraction. As a result, the low-level control makes \({\Gamma }_{3}\) positively invariant and satisfies condition (b). The high-level controller is responsible for producing transition manifolds that converge to \({\Gamma }_{1}\), i.e., for satisfying condition (a). Additionally, a critic agent is designed to check that the transition manifolds are sufficiently close and to prevent the system from becoming unstable.
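The idea of reaching the target through closely spaced transition manifolds can be illustrated with a sequence of characterization parameters whose successive changes are bounded. The sketch below is purely illustrative: it assumes that the target parameters are already known, whereas the method described here searches for them online, and the name `transition_parameters` and the step bound are hypothetical.

```python
def transition_parameters(beta_start, beta_target, max_step):
    """Yield parameter vectors from beta_start to beta_target.

    Each component changes by at most max_step (assumed positive) per
    transition, so successive VHCs stay close to one another.
    """
    beta = list(beta_start)
    yield list(beta)
    while any(abs(t - b) > 1e-12 for b, t in zip(beta, beta_target)):
        beta = [b + max(-max_step, min(max_step, t - b))
                for b, t in zip(beta, beta_target)]
        yield list(beta)


# Two characterization parameters, illustrative values only.
sequence = list(transition_parameters([0.0, 1.0], [0.25, 0.9], 0.1))
```

A smaller `max_step` gives more transition manifolds and a slower approach to the target, which is the trade-off that the critic agent is meant to manage.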

5 Proposing the control approach

In this section, we introduce an innovative control approach with two levels. At the high level, new characterization parameters are sought to meet the demand. At the low level, the new VHCs produced by the high-level control are stabilized. In addition, this approach has a critic agent that assesses the system dynamics and the convergence rate of the low-level control and tunes the rate of change of the VHC parameters.

5.1 Designing the low-level control

Several types of control method can stabilize the VHCs in a dynamic system. In the literature, it is common to examine stability with numerical methods such as Poincare maps [39, 54], and some works modify them to become more robust [9, 51, 55]. The drawbacks inherent in numerical methods motivate the use of an analytical method. We demonstrate the analytic stability theorem using the concept of impact invariance. The impact invariance property specifies that if the states are on a particular path, they remain on it even after the impact event. Equation (12) shows the impact relation; under impact invariance, the states both before and after the impact map \(\Delta \left( . \right)\) satisfy the VHC functions. Even though this concept does not eliminate jumps in the trajectories, it ensures that the trend of the control is preserved.

$$ \left( {q^{ + } ,\dot{q}^{ + } } \right) = \Delta \left( {q^{ - } ,\dot{q}^{ - } } \right) $$
(12)

The FLBS method is used in this study. It is robust and stabilizes the VHCs so that the system trajectories are exponentially attracted to them. Here, we present only the final control form and its Lyapunov relation; further details can be found in [49, 50]. Consider \(y\) as the output function of the Bezier curve VHCs associated with Eq. (7). Following the FLBS method, we define the Lyapunov function of the low-level controller, \({V}_{L}\), called the inner Lyapunov function, as follows:

$$ V_{L} = \frac{1}{2}y^{2} + \frac{1}{2}\left( {\dot{y} + \lambda_{1} y} \right)^{2} + \frac{1}{2}S_{L}^{2} $$
(13)

where \({S}_{L}\) is the inner sliding surface given by Eq. (14) and \({\lambda }_{1}>0\).

$$ S_{L} = \lambda_{2} \left( {L_{f} y\left( z \right) + \lambda_{1} y} \right) + \left( {L_{f}^{2} y\left( z \right) + L_{g} L_{f} y\left( z \right)u + \lambda_{1} L_{f} y\left( z \right)} \right) + y $$
(14)

\(L_{f} y\left( z \right)\) is the Lie derivative of \(y\left( z \right)\) with respect to \(f\left( z \right)\). In this method, the control law is given by the dynamic equation (15) with \(\lambda_{2} ,\lambda_{3} ,k > 0\).

$$ \begin{aligned} \dot{u} = & - \left( L_{g} L_{f} y\left( z \right) \right)^{-1} \Big( 2L_{f} y\left( z \right) + \lambda_{1} \left( L_{f}^{2} y\left( z \right) + L_{g} L_{f} y\left( z \right)u \right) + \frac{\text{d}}{\text{d}t}L_{f}^{2} y\left( z \right) + \frac{\text{d}}{\text{d}t}L_{g} L_{f} y\left( z \right)u \\ & + \lambda_{2} \left( L_{f}^{2} y\left( z \right) + L_{g} L_{f} y\left( z \right)u + \lambda_{1} L_{f} y\left( z \right) \right) + \lambda_{1} y + k\,\text{sign}\left( S_{L} \right) + \lambda_{3} S_{L} \Big) \\ \end{aligned} $$
(15)

The time derivative of the inner Lyapunov function, \({V}_{L}\), is negative definite, as shown in Eq. (16); it was also shown in [49, 50] that the system is exponentially stable.

$$ \dot{V}_{L} = - \lambda_{1} y^{2} - \lambda_{2} \left( {\dot{y} + \lambda_{1} y} \right)^{2} - \lambda_{3} S_{L}^{2} - kS_{L} {\text{sign}}\left( {S_{L} } \right) $$
(16)
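As a brief check of how Eq. (16) follows from Eqs. (13)-(15), the following sketch uses the shorthand \(w = \dot{y} + \lambda_{1} y\) (our notation, not taken from [49, 50]) together with \(\dot{y} = L_{f} y\) and \(\ddot{y} = L_{f}^{2} y + L_{g} L_{f} y\,u\). With this shorthand, Eq. (14) reads \(S_{L} = \lambda_{2} w + \dot{w} + y\), so that \(\dot{y} = w - \lambda_{1} y\) and \(\dot{w} = S_{L} - \lambda_{2} w - y\):

$$ \begin{aligned} \dot{V}_{L} & = y\dot{y} + w\dot{w} + S_{L} \dot{S}_{L} \\ & = y\left( {w - \lambda_{1} y} \right) + w\left( {S_{L} - \lambda_{2} w - y} \right) + S_{L} \dot{S}_{L} \\ & = - \lambda_{1} y^{2} - \lambda_{2} w^{2} + S_{L} \left( {w + \dot{S}_{L} } \right) \\ \end{aligned} $$

Substituting the control law (15) into \(\dot{S}_{L} = \lambda_{2} \dot{w} + \ddot{w} + \dot{y}\) gives \(\dot{S}_{L} = - w - \lambda_{3} S_{L} - k\,{\text{sign}}\left( {S_{L} } \right)\), and inserting this into the last line reproduces Eq. (16).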

Consequently, the development of a low-level controller with exponential stability is complete, and it is robust enough that the characterization parameters can be updated without concern. Hence, we have shown that \({\Gamma }_{3}\) is at least asymptotically stable for the dynamic system (7) in the sense of Theorem 1.
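To make the structure of Eqs. (13)-(16) concrete, the following minimal sketch applies the dynamic control law to a toy plant \(\ddot{y} = u\), for which \(L_{f}^{2} y = 0\) and \(L_{g} L_{f} y = 1\). The toy plant, the gains, the initial condition, and the function name `simulate_flbs` are illustrative assumptions and are not the biped model or the parameters of Table 1.

```python
def simulate_flbs(lam1=2.0, lam2=2.0, lam3=2.0, k=0.1, dt=1e-3, t_end=10.0):
    """Explicit Euler simulation of the control law (15) on the toy plant y'' = u.

    Returns the inner Lyapunov values V_L of Eq. (13) along the run.
    All numerical values are hypothetical.
    """
    y, y_dot, u = 1.0, 0.0, 0.0           # assumed initial condition
    v_history = []
    for _ in range(int(round(t_end / dt))):
        # Lie derivatives of the toy plant (assumption: double integrator)
        lf_y = y_dot                       # L_f y
        lf2_y = 0.0                        # L_f^2 y
        lglf_y = 1.0                       # L_g L_f y
        d_lf2_y = 0.0                      # d/dt L_f^2 y
        d_lglf_y = 0.0                     # d/dt L_g L_f y
        y_ddot = lf2_y + lglf_y * u

        # Inner sliding surface, Eq. (14)
        s_l = lam2 * (lf_y + lam1 * y) + (y_ddot + lam1 * lf_y) + y
        # Inner Lyapunov function, Eq. (13)
        v_history.append(0.5 * y**2 + 0.5 * (y_dot + lam1 * y)**2 + 0.5 * s_l**2)

        sign_s = (s_l > 0) - (s_l < 0)
        # Dynamic control law, Eq. (15)
        u_dot = -(1.0 / lglf_y) * (
            2.0 * lf_y + lam1 * y_ddot + d_lf2_y + d_lglf_y * u
            + lam2 * (y_ddot + lam1 * lf_y) + lam1 * y
            + k * sign_s + lam3 * s_l
        )
        # Euler update of the plant states and of the control input
        y, y_dot, u = y + dt * y_dot, y_dot + dt * y_ddot, u + dt * u_dot
    return v_history


v_l = simulate_flbs()
print(v_l[0], v_l[-1])
```

In this toy setting, \(V_{L}\) starts at 15 and decays toward zero, up to the small chattering that the discretized sign term introduces.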

Theorem 2

Consider the USWII in the form of Eq. (7) with 5 DOF and 4 independent actuators. Suppose that a set of 4 smooth functions \(H\left(q\right)\) has a well-defined relative degree \(\left\{2,\dots ,2\right\}\) and is chosen as the VHCs. Let \(\Gamma \) denote its constraint manifold, i.e., \(\Gamma =\left\{\left.\left(q,\dot{q}\right)\in TQ\right|H\left(q\right)=0,{L}_{f}H\left(q\right)=0\right\}\), where \({L}_{f}H(q)\) is the Lie derivative of \(H(q)\) along \(f\). The dynamic system (7) can be asymptotically stabilized if the following conditions are satisfied.

\(\left(\mathrm{a}\right)\) \({L}_{g}{L}_{f}H(q)\) is invertible at every point of \(\Gamma \).

(b) \(\Gamma_{1}\) is impact invariant.

(c) The convergence time of the controller is strictly less than the time of a single step of biped robot.

Proof

By stabilizing a set of 4 smooth functions \(H\left(q\right)\) as VHCs, the biped robot attains a rhythmic motion [33, 34]. For rhythmic motions, suppose there is a nominal solution \({\phi }_{0}\left(t\right)\) of Eq. (7) such that \({\phi }_{0}\left(t+T\right)={\phi }_{0}\left(t\right)\) for all \(t\in {\mathbb{R}}^{+}\) and some finite \(T>0\). Condition (a) ensures that, if the states of the biped robot lie on the constraint manifold \(\Gamma \), the control method can stabilize them. Moreover, the term \({{L}_{g}{L}_{f}H(q) }^{-1}\) in a feedback linearization method such as Eq. (15) is bounded [33, 34]. As a result, the control method ensures that the VHCs remain stable when deviations drive the states away from \(\Gamma \). The USWII has a hybrid property at the impact event [32, 39]. In general, the impact drives the states away from \(\Gamma \), i.e., \({\phi }_{0}\left({T}^{+}\right)\not\subset \Gamma \). The impact-invariance property, shown in Eq. (12), ensures that the states remain on \(\Gamma \) after the impact event. Condition (c) ensures that the biped robot reaches its required movement before it encounters the impact mapping \(\Delta \left( . \right)\), since the impact mapping may drive the states away from \(\Gamma \). \(\square \)

Remark 1

Suppose \({\phi }_{0}\left(t\right)\) is the desired periodic orbit. A suitable control should be chosen such that the contraction taking place during the SSP dominates the expansion occurring at the impact event. The FLBS method can be used as the low-level controller.

Following Theorem 2, if the rank of \({dH}_{q}\) equals 4 for all \(q\in {H}^{-1}(0)\), then condition (a) is satisfied [32, 33]. The output of the dynamic system is defined as follows:

$$ y = H\left( {q,\beta } \right) = h_{0} \left( q \right) - h_{{{\text{des}}}} \left( {\sigma ,\beta } \right) $$
(17)

Eq. (17) is split into two parts, \({h}_{0}(q)\) and \({h}_{\mathrm{des}}(\sigma ,\beta )\). Because \({h}_{\mathrm{des}}(\sigma ,\beta )\) is independent of the states, if the rank of \({h}_{0}(q)\) is 4, then the rank of \({\mathrm{d}H}_{q}\) equals 4. For simplicity, we define \({h}_{0}\left(q\right)={\left[{q}_{2},{\dots ,q}_{5}\right]}^{T}\) and \({q}_{1}=\theta \), where \(\theta \) represents the configuration state of the zero dynamic equation.

Since there are four VHCs, we use \(M=3\) to avoid unnecessary complexity. It is also the minimum value of \(M\) for which the impact-invariance property can be added to the Bezier curves. When \({\theta }^{+}\) and \({\theta }^{-}\) are the maximum and minimum values of the zero dynamic state in Eq. (9), the initial and final conditions of each step are commonly written as in Eq. (18).

$$ \begin{gathered} \left( {q^{ - } ,\dot{q}^{ - } } \right) = \left( {\beta_{M} ,\frac{{M\dot{\theta }^{ - } }}{{\theta^{ - } - \theta^{ + } }}\left( {\beta_{M} - \beta_{M - 1} } \right)} \right) \hfill \\ \left( {q^{ + } ,\dot{q}^{ + } } \right) = \left( {\beta_{0} ,\frac{{M\dot{\theta }^{ + } }}{{\theta^{ - } - \theta^{ + } }}\left( {\beta_{1} - \beta_{0} } \right)} \right) \hfill \\ \end{gathered} $$
(18)
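The boundary values in Eq. (18) follow from the end-point properties of a Bezier polynomial. The sketch below checks them numerically for a single scalar VHC, assuming the standard Bernstein form \(h_{\mathrm{des}}(s)=\sum_{k=0}^{M}\beta_{k}\binom{M}{k}s^{k}(1-s)^{M-k}\) with \(s=(\theta -\theta^{+})/(\theta^{-}-\theta^{+})\). The function names and the coefficient values are hypothetical.

```python
from math import comb


def bezier(beta, s):
    """Value of the degree-M Bezier polynomial with coefficients beta at s in [0, 1]."""
    m = len(beta) - 1
    return sum(b * comb(m, k) * s**k * (1 - s)**(m - k) for k, b in enumerate(beta))


def bezier_ds(beta, s):
    """Derivative of the Bezier polynomial with respect to s."""
    m = len(beta) - 1
    return sum(
        m * (beta[k + 1] - beta[k]) * comb(m - 1, k) * s**k * (1 - s)**(m - 1 - k)
        for k in range(m)
    )


def boundary_conditions(beta, theta_plus, theta_minus, theta_dot_plus, theta_dot_minus):
    """Start and end values (q, q_dot) of one step, in the form of Eq. (18)."""
    span = theta_minus - theta_plus        # ds/dt = theta_dot / span
    q_plus = (bezier(beta, 0.0), bezier_ds(beta, 0.0) * theta_dot_plus / span)
    q_minus = (bezier(beta, 1.0), bezier_ds(beta, 1.0) * theta_dot_minus / span)
    return q_plus, q_minus


beta = [0.1, 0.3, 0.2, 0.5]                # hypothetical coefficients, M = 3
start, end = boundary_conditions(beta, 0.0, 2.0, 1.0, 1.5)
print(start, end)
```

The start of the step depends only on \((\beta_{0},\beta_{1})\) and the end only on \((\beta_{M-1},\beta_{M})\), which is why the impact map can tie the two pairs together.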

Based on Eq. (12), the VHCs are impact-invariant if and only if \(\Delta \left( {S \cap \Gamma } \right) \subset \Gamma\). Therefore, \((\beta_{2} ,\beta_{3} )\) can be calculated in terms of \((\beta_{0} ,\beta_{1} )\) as follows:

$$ \left( {\beta_{0} ,\frac{{3\dot{\theta }^{ + } }}{{\theta^{ - } - \theta^{ + } }}\left( {\beta_{1} - \beta_{0} } \right)} \right) = \Delta \left( {\beta_{3} ,\frac{{3\dot{\theta }^{ - } }}{{\theta^{ - } - \theta^{ + } }}\left( {\beta_{3} - \beta_{2} } \right)} \right) $$
(19)

Characterization parameters can be expressed as \({(\beta }_{0},{\beta }_{1})\) or their functions.

5.2 Designing the high-level control

The high-level controller is responsible for generating new characterization parameters to produce a new VHC set for dynamic adaptation. Its design must meet two requirements. First, the method must produce sets \({\Gamma }_{2,t}\) that converge to our objective, \({\Gamma }_{1}\). Second, it must have parameters for adjusting the convergence rate. To this end, a mathematical relation for the desired characteristics must be formulated first, and the zero dynamic equation plays a key role in this regard. It is common to transform the zero dynamics into the normal form through a suitable state mapping \(\left({\zeta }_{1},{\zeta }_{2}\right)\), as demonstrated in Theorem 5.1 in [32].

$$ \left\{ {\begin{array}{*{20}c} {\dot{\zeta }_{1} = \psi_{1} \left( {\zeta_{1} ,\beta } \right)\zeta_{2} } \\ {\dot{\zeta }_{2} = \psi_{2} \left( {\zeta_{1} ,\beta } \right) } \\ \end{array} } \right. $$
(20)

where \(\left( {\zeta_{1} ,\zeta_{2} } \right)\) and \(\left( {\psi_{1} ,\psi_{2} } \right)\) are defined as follows:

$$ \left\{ {\begin{array}{*{20}c} {\zeta_{1} = \left. \theta \right|_{{\Gamma_{3} }} } \\ {\zeta_{2} = \left. {\frac{\partial K}{{\partial \dot{\theta }}}} \right|_{{\Gamma_{3} }} } \\ {\psi_{1} \left( {\zeta_{1} ,\beta } \right) = \left. {\frac{\partial \theta }{{\partial q}}\left[ {\begin{array}{*{20}c} {\frac{\partial H}{{\partial q}}} \\ \gamma \\ \end{array} } \right]^{ - 1} \left[ {\begin{array}{*{20}c} 0 \\ 1 \\ \end{array} } \right]} \right|_{{\Gamma_{3} }} } \\ {\psi_{2} \left( {\zeta_{1} ,\beta } \right) = - \left. {\frac{\partial U}{{\partial \theta }}} \right|_{{\Gamma_{3} }} } \\ \end{array} } \right. $$
(21)

where \(K\) and \(U\) are the kinetic and potential energy functions and \(\gamma \) is obtained from the last entry of \(D(q)\). We then derive a formula for the average speed of the robot during each step. The average walking speed is calculated by dividing the step length by the time elapsed during the step.

Definition 2

The time impact per step (TIS) \(T_{I} :\,TQ \to \mathbb{R} \cup \left\{ \infty \right\}\) is defined as the time at which the impact condition occurs, as follows:

$$ T_{I} \left( {x_{0} ,\beta } \right): = \left\{ {\begin{array}{*{20}l} {\inf \left\{ {t_{f} \ge 0 \mid \varphi \left( {t_{f} ,x_{0} } \right) \in I} \right\}} \hfill & {{\text{if such a }}t_{f} {\text{ exists,}}} \hfill \\ \infty \hfill & {{\text{otherwise.}}} \hfill \\ \end{array} } \right. $$
(22)

Here, \({x}_{0}\) is the initial state and \({t}_{f}\) is the time at which the trajectory reaches the impact surface \(I\). Taking the step length as the position of the swing foot when it is placed on the ground, \({p}_{2}({\theta }^{-},\beta )\), the average speed can be computed as follows:

$$ \overline{\upsilon } = \frac{{p_{2} \left( {\theta^{ - } ,\beta } \right)}}{{T_{I} \left( {x_{0} ,\beta } \right) \circ \Delta }} = {\mathcal{F}}\left( \beta \right) $$
(23)

where \(\circ\) denotes the composition of maps. Both \(p_{2} \left( {\theta^{ - } ,\beta } \right)\) and \(T_{I} \left( {x_{0} ,\beta } \right) \circ \Delta\) are well-defined on the open subset \(\tilde{I} \subset I\) where \(0 < T_{I} \left( {x_{0} ,\beta } \right) \circ \Delta < \infty\) and the associated impacts are transversal to \(I\). In Theorem 3, we prove that the zero dynamic trajectories are fixed once the impact-invariance property is added to the 4 VHCs. Consequently, the dynamics of the whole system are affected only by the parameters of the VHCs, and not by the initial conditions or the control parameters. We can therefore formulate \(T_{I}\) by using \(\zeta_{2} {\text{d}}\zeta_{2} = \frac{{\psi_{2} \left( {\zeta_{1} } \right)}}{{\psi_{1} \left( {\zeta_{1} } \right)}}{\text{d}}\zeta_{1}\), which follows from Eqs. (20) and (21), as follows:

$$ T_{I} = \int_{{\theta^{ + } }}^{{\theta^{ - } }} {\frac{{{\text{d}}\zeta_{1} }}{{\psi_{1} \left( {\zeta_{1} ,\beta } \right)\zeta_{2} \left( {\zeta_{1} ,\beta ,\zeta_{2}^{ - } } \right)}}} $$
(24)

where \(\zeta_{2} \left( {\zeta_{1} ,\beta ,\zeta_{2}^{ - } } \right)\) is obtained from the following relation:

$$ \frac{{\left( {\zeta_{2}^{ - } } \right)^{2} - \left( {\zeta_{2}^{ + } } \right)^{2} }}{2} = \int_{{\zeta_{1}^{ + } }}^{{\zeta_{1}^{ - } }} {\frac{{\psi_{2} \left( {\zeta_{1} } \right)}}{{\psi_{1} \left( {\zeta_{1} } \right)}}{\text{d}}\zeta_{1} } $$
(25)
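Equations (23)-(25) can be evaluated numerically once \(\psi_{1}\) and \(\psi_{2}\) are known. The following sketch uses trapezoidal quadrature and toy choices of \(\psi_{1}\) and \(\psi_{2}\); it assumes that \(\zeta_{1}\) increases from \(\theta^{+}\) to \(\theta^{-}\) with \(\zeta_{2}>0\), and it parameterizes \(\zeta_{2}\) by its post-impact value \(\zeta_{2}^{+}\) rather than \(\zeta_{2}^{-}\). The function names and all numbers are hypothetical.

```python
import math


def time_to_impact(psi1, psi2, theta_plus, theta_minus, zeta2_plus, n=2000):
    """Approximate T_I of Eq. (24), with zeta_2 recovered from the Eq. (25) relation."""
    h = (theta_minus - theta_plus) / n
    grid = [theta_plus + i * h for i in range(n + 1)]
    ratio = [psi2(z) / psi1(z) for z in grid]
    cumulative = 0.0                       # running integral of psi2 / psi1
    integrand = []
    for i, z in enumerate(grid):
        if i > 0:
            cumulative += 0.5 * h * (ratio[i - 1] + ratio[i])
        zeta2_sq = zeta2_plus**2 + 2.0 * cumulative
        if zeta2_sq <= 0.0:
            raise ValueError("zeta_2 vanishes before impact; the step is not completed")
        integrand.append(1.0 / (psi1(z) * math.sqrt(zeta2_sq)))
    return sum(0.5 * h * (integrand[i - 1] + integrand[i]) for i in range(1, n + 1))


def average_speed(step_length, t_impact):
    """Average speed in the sense of Eq. (23): step length over time to impact."""
    return step_length / t_impact


# Toy case 1: psi1 = 1, psi2 = 0, so zeta_2 stays constant.
t_const = time_to_impact(lambda z: 1.0, lambda z: 0.0, 0.0, 1.0, 2.0)
# Toy case 2: psi1 = 1, psi2 = 1, i.e. constant acceleration along the orbit.
t_accel = time_to_impact(lambda z: 1.0, lambda z: 1.0, 0.0, 1.5, 1.0)
print(t_const, t_accel, average_speed(0.7, t_const))
```

For the biped, \(\psi_{1}\) and \(\psi_{2}\) would come from Eq. (21) and depend on \(\beta\), which is what makes the average speed a function \(\mathcal{F}(\beta)\).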

Theorem 3

Consider the USWII in the form of Eq. (7) with 5 DOF, restricted by 4 impact-invariant VHCs. Then, the dynamics of the system are unique regardless of the control parameters and initial conditions.

Proof

When the dynamic system is restricted to four impact-invariant VHCs, the output, Eq. (17), is zero. Therefore, there is a homeomorphic map that relates the states of the system to the zero dynamic state. To show the uniqueness of the orbit, it is sufficient to show that the zero dynamic evolution is fixed and predetermined, and does not depend on the controller stabilizing the VHCs. The zero dynamics are described by a differential equation with a discrete mapping in Eq. (10). We solve Eq. (10) in its general form by assuming B is a trajectory of the zero dynamic states and C is the initial state. Since the differential equation is of second order, two constants, \(A_{1}\) and \(A_{2}\), identify B.

$$ \theta \left( t \right) = F\left( {A_{1} ,A_{2} ,t} \right) $$
(26)

If we evaluate the trajectory at the initial and final times of each step (without loss of generality, we take \(T_{I}\) as its motion period), we find:

$$ \left\{ {\begin{array}{*{20}c} {\theta \left( 0 \right) = F\left( {A_{1} ,A_{2} ,0} \right) = \theta^{ + } } \\ {\theta \left( {T_{I} } \right) = F\left( {A_{1} ,A_{2} ,T_{I} } \right) = \theta^{ - } } \\ {\dot{\theta }\left( 0 \right) = \dot{F}\left( {A_{1} ,A_{2} ,0} \right) = \dot{\theta }^{ + } } \\ {\dot{\theta }\left( {T_{I} } \right) = \dot{F}\left( {A_{1} ,A_{2} ,T_{I} } \right) = \dot{\theta }^{ - } } \\ \end{array} } \right. $$
(27)

Note that the mathematical form of \(F\left(.\right)\) is determined by the VHC shapes. We have four relations and seven unknown parameters, \(T_{I} , A_{1} ,A_{2} , \theta^{ + } , \theta^{ - } , \dot{\theta }^{ + } , \dot{\theta }^{ - }\). Two further equations are derived from the discrete mapping \(\Delta \left( {\theta^{ - } ,\dot{\theta }^{ - } } \right)\). The last equation is derived from the impact surface \(I\) at the time of impact. After the seven unknown parameters are determined from these seven equations, the zero dynamics have a fixed trajectory. Therefore, the orbit of the USWII restricted by 4 distinct VHCs is independent of the initial conditions and of the control parameters, which affect only how long the constraints take to be stabilized.

Theorem 3 shows that, for each set of VHCs, one orbit with specific characteristics is shaped in the phase diagram. Therefore, every characteristic depends only on the shape of the VHCs. To obtain a desired average speed, we first define \(e=\overline{\upupsilon }-{\overline{\upupsilon } }_{des}\) and the outer sliding surface \({S}_{H}\), as follows:

$$ S_{H} : = e + \alpha_{1} \int {e\,{\text{d}}t} $$
(28)

For the design of the stabilizing high-level control method, the outer Lyapunov function is chosen as \(V_{H} = \frac{1}{2}S_{H}^{2}\). According to [57,58,59], the time derivative of \(V_{H}\) should be negative definite to obtain asymptotic stability in the high-level control. Imposing \(\dot{S}_{H} = - \alpha_{2} S_{H}\), the time derivative of \(V_{H}\) is \(\dot{V}_{H} = - \alpha_{2} S_{H}^{2}\). Expanding \(\dot{S}_{H} = - \alpha_{2} S_{H}\) using Eq. (28), we have

$$ \dot{e} + \alpha_{1} e = \dot{\overline{\upsilon }} - \dot{\overline{\upsilon }}_{{{\text{des}}}} + \alpha_{1} e = \frac{{\partial \overline{\upsilon }}}{\partial \beta }\dot{\beta } - \dot{\overline{\upsilon }}_{{{\text{des}}}} + \alpha_{1} e = - \alpha_{2} S_{H} $$
(29)

Thus, \(\dot{\beta }\) should satisfy Eq. (30).

$$ \dot{\beta } = \left( {\frac{{\partial \overline{\upsilon }}}{\partial \beta }} \right)^{ - 1} \left( {\dot{\overline{\upsilon }}_{{{\text{des}}}} - \alpha_{1} e - \alpha_{2} S_{H} } \right) $$
(30)

where \(\overline{\upupsilon }\) is obtained from Eq. (23). The high-level control law first searches for new parameters in the domain of attraction of the previous constraint manifold points, and then finds which of the new parameters gives the system the desired average speed. To satisfy condition (b) of Theorem 1 in the closed-loop system, \({\Gamma }_{1}\) should become asymptotically stable under the proposed control method and, for any two successive sets \({{\Gamma }_{2,t}}_{1}\) and \({{\Gamma }_{2,t}}_{2}\) (as mentioned in condition (b) of Theorem 1), \({{\Gamma }_{2,t}}_{2}\) should be asymptotically stable starting from \({{\Gamma }_{2,t}}_{1}\). Changes in \({\alpha }_{1}\) and \({\alpha }_{2}\) directly affect the convergence rate of the produced sets. This rate can be tuned by the critic agent through those parameters; we therefore call them the adjusting parameters.
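The behavior of the update law can be illustrated on a scalar toy problem. In the sketch below, the map \(\mathcal{F}(\beta)\) of Eq. (23) is replaced by a hypothetical smooth function `toy_speed`, the demand is constant so that its time derivative is zero, and the adjusting parameters are fixed. None of these choices comes from the biped model; they serve only to show that \(S_{H}\) and \(e\) decay under Eq. (30).

```python
def toy_speed(beta):
    """Hypothetical stand-in for the average-speed map F(beta) of Eq. (23)."""
    return 1.0 + 0.5 * beta + 0.1 * beta**2


def toy_speed_gradient(beta):
    """Derivative of the hypothetical map with respect to beta."""
    return 0.5 + 0.2 * beta


def run_high_level(v_des=1.8, alpha1=1.0, alpha2=1.0, dt=0.01, t_end=20.0):
    """Explicit Euler integration of the update law, Eq. (30), for a constant demand."""
    beta, integral_e = 0.0, 0.0
    errors = []
    for _ in range(int(round(t_end / dt))):
        e = toy_speed(beta) - v_des              # speed error
        s_h = e + alpha1 * integral_e            # outer sliding surface, Eq. (28)
        v_des_dot = 0.0                          # constant demand (assumption)
        beta_dot = (v_des_dot - alpha1 * e - alpha2 * s_h) / toy_speed_gradient(beta)
        errors.append(e)
        beta += dt * beta_dot
        integral_e += dt * e
    return beta, errors


beta_final, errors = run_high_level()
print(beta_final, toy_speed(beta_final), errors[-1])
```

Larger \(\alpha_{1}\) and \(\alpha_{2}\) make the error decay faster in this sketch, which mirrors their role as adjusting parameters.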

5.3 Designing the critic agent

Ensuring that the control process does not lead to instability in the dynamic system is the task of the critic agent. Two points need to be considered. First, a criterion should be identified that indicates whether the system has departed from the path determined by the control method. Second, it must be decided how, and by how much, the production rate of new sets, \({\Gamma }_{2,t}\), varies over time. The best way to provide the criterion is to check the convergence trend of the output functions. The rate of change of the VHCs should be proportional to the rates of convergence of \(y\) and \(\dot{y}\) to zero under the low-level controller. In other words, the generation of new \({\Gamma }_{2,t}\) should wait until the dynamic system becomes stable. By decreasing or increasing the adjusting parameters, the critic agent adjusts the rate of generating \({\Gamma }_{2,t}\). A few rules about these changes are also introduced to address the second point. With \({u}_{\psi }\), known as the assessment value, calculated from \({e}_{\psi }=\mathrm{col}(y,\dot{y})\), we design the critic agent following [60, 61].

$$ u_{\psi } = \left\{ {\begin{array}{*{20}l} 1 \hfill & {{\text{if}}\;\dot{e}_{\psi } > 0\;{\text{or}}\;e_{\psi } > E_{\max } } \hfill \\ 0 \hfill & {{\text{else}}} \hfill \\ \end{array} } \right. $$
(31)

where \({E}_{\mathrm{max}}\) is a specified positive limit. The critic agent, \(\psi \), then evolves according to the dynamics in Eq. (32).

$$ \dot{\psi } = - \frac{1}{{\delta_{l} }}\left( {1 - u_{\psi } } \right)\psi + \delta_{m} u_{\psi } \left( {1 - \psi } \right) $$
(32)

where \({\delta }_{l}\) and \({\delta }_{m}\) are two positive parameters that specify the speed at which \(\psi \) converges or diverges. The following rules are applied to obtain a smooth adaptive deforming law for the average speed of the biped robot.

(1) If the critic agent exceeds an upper-bound value, the generation of new characterization parameters should stop, e.g., for \(\psi >0.6\), \({\alpha }_{1}={\alpha }_{2}=0.01\).

(2) If the critic agent is between two limit values, the adaptation should wait until the critic agent settles its trend. If its derivative is negative, e.g., for \(0.35<\psi <0.55\), \({\alpha }_{1}\) and \({\alpha }_{2}\) are not changed. If its derivative is positive, e.g., for \(0.35<\psi <0.55\), \({\alpha }_{1}\) and \({\alpha }_{2}\) should decrease.

(3) If the critic agent is below a lower-bound value, the rate of generating new characterization parameters can be increased, e.g., for \(\psi <0.2\), \({\alpha }_{1}\) and \({\alpha }_{2}\) should increase.
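Equations (31) and (32) and the three rules can be sketched as follows. The thresholds 0.6, 0.35, 0.55, and 0.2 and the value 0.01 are the examples quoted in the rules; everything else is an assumption made for illustration: \(e_{\psi}\) is reduced to a scalar measure, the gains are scaled by the hypothetical factors `shrink` and `grow`, and the gains are left unchanged in the ranges of \(\psi\) that the quoted examples do not cover.

```python
def assessment(e_measure, e_measure_rate, e_max):
    """Assessment value u_psi of Eq. (31), using a scalar measure of e_psi (assumption)."""
    return 1 if (e_measure_rate > 0 or e_measure > e_max) else 0


def critic_step(psi, u_psi, delta_l, delta_m, dt):
    """One explicit Euler step of the critic dynamics, Eq. (32)."""
    psi_dot = -(1.0 / delta_l) * (1 - u_psi) * psi + delta_m * u_psi * (1 - psi)
    return psi + dt * psi_dot, psi_dot


def adjust_gains(psi, psi_dot, alpha1, alpha2, shrink=0.9, grow=1.1):
    """Rules (1)-(3); shrink and grow are hypothetical scaling factors."""
    if psi > 0.6:                                 # rule (1): stop generating new parameters
        return 0.01, 0.01
    if 0.35 < psi < 0.55:                         # rule (2): wait for the trend
        if psi_dot > 0:
            return alpha1 * shrink, alpha2 * shrink
        return alpha1, alpha2
    if psi < 0.2:                                 # rule (3): speed up the generation
        return alpha1 * grow, alpha2 * grow
    return alpha1, alpha2                         # uncovered ranges: unchanged (assumption)


psi, psi_dot = critic_step(0.5, assessment(0.1, 0.5, 1.0), delta_l=2.0, delta_m=1.0, dt=0.1)
print(psi, psi_dot, adjust_gains(psi, psi_dot, 1.0, 2.0))
```

With \(u_{\psi}=1\) the critic value is driven toward 1 at a rate set by \(\delta_{m}\), and with \(u_{\psi}=0\) it relaxes toward 0 with time constant \(\delta_{l}\).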

6 Results and discussion

In this section, we present the simulation results along with a brief discussion that illustrates the performance of the proposed adaptive deforming strategy. With this strategy, the average speed of a five-link biped robot adaptively converges to the demanded value. The model and its parameters, recall Eq. (6), are taken from the well-known bipedal robot RABBIT [33]. Under-actuation and hybridity are two characteristics of this model. To overcome these issues, VHCs are used to make the robot dynamics follow the desired motion. We add the impact-invariance property, as in Eq. (19), so that the analytical stability results can be used. The VHCs are regarded as the outputs of the dynamic system. They are parameterized by specific characterization parameters such as the torso angle and the step length. The low-level controller, Eq. (15), stabilizes them at all times. Once they are stabilized, Eq. (24) provides the characteristics of the dynamics. The high-level controller then produces the new characterization parameters based on Eq. (30). The critic agent assesses stability by evaluating the convergence rate of the VHCs and their derivatives, as in Eq. (32). Some rules are required for the critic agent to regulate the generation rate of new characterization parameters by determining the adjusting parameters; they are discussed in Sect. 5.3. The control and critic agent parameters are gathered in Table 1.

Table 1 The schedule of parameter values

Simulations demonstrate how smoothly our strategy adapts to the demand. To begin, we initialize the robot with an average speed of 1 (m/s), a torso angle of 1.65 (rad), and a step size of 0.7 (m). In the first demand, the biped robot initially maintains its average speed. Then, after 15, 50, and 80 steps, it alters its average speed to attain 1.8 (m/s), 1.2 (m/s), and 2 (m/s), respectively. With this demand, we aim to cover both increasing and decreasing trends. For the first demand, we pick the torso angle as the characterization parameter. Figure 3 depicts the closed orbits of certain states in the phase portraits; \({q}_{1}\), \({q}_{2}\), and \({q}_{3}\) are shown with their derivatives. As expected, each constant part of the first demand leads to a closed orbit. Figure 3 also depicts the trend of the critic agent in regulating the adjusting parameters. Despite the high nonlinearity of Eq. (24) and the fluctuations in the critic agent trend, the method produces smooth adaptations. As time passes, the critic agent converges to zero in the constant parts of the average speed of the biped robot. Figure 4 depicts the regulation of the average speed of the biped robot and the selected characterization parameter during the adaptation process. It shows that the adaptive deformation occurs smoothly and effectively. It is possible to change the parameters to increase the rate of response. However, this may add overshoot to the trend or demand stronger control signals than the actuators can generate. Figure 5 depicts the control signals for this demand. It shows that a higher average speed requires greater control signals, although their values can be optimized by tuning the low-level control parameters.

The second demand is a sinusoidal desired average speed in the range of 0.8 (m/s) to 1.6 (m/s) with a period of 30 steps per cycle. For this demand, we pick both the torso angle and the step size as the characterization parameters. Figure 6 depicts the regulation of the average speed of the biped robot and the selected characterization parameters during the adaptive deforming when two characterization parameters are adjusted. After a few steps, the motion is deformed to meet the demand. Because the low-level controller stabilizes the VHCs of the characterization parameters only after the high-level controller produces them, there is a lag in the response. For this demand, we find that the dynamic system has a delay of about four steps. For this case, we also show how the trend would change if the delay were rejected.

To show robustness against disturbances, we suppose an unknown force acting on the robot that produces a moment of 100 (Nm) at all joints for about 20 percent of the period of each step during 20 steps. In addition, we choose another demand with a sinusoidal desired average speed in the range of 0.7 (m/s) to 1.3 (m/s) with a period of 30 steps per cycle. In this case, the delay is about 2 steps, which shows that the delay is smaller for lower average speeds. Figure 7 depicts the results for the adaptive deforming of the motion. The control method effectively rejects the disturbance and continues the adaptive deforming. Figure 8 depicts the low-level control signals. Although the disturbances are large, the control system performs well.
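For readers who wish to reproduce the demand signals, the two profiles described above can be written as functions of the number of completed steps. The speed levels, switching steps, ranges, and period are taken from the text; the phase of the sinusoid (starting at its mean value) and the function names are assumptions.

```python
import math


def piecewise_demand(step):
    """First demand: 1 m/s, then 1.8, 1.2, and 2 m/s after 15, 50, and 80 steps."""
    if step < 15:
        return 1.0
    if step < 50:
        return 1.8
    if step < 80:
        return 1.2
    return 2.0


def sinusoidal_demand(step, v_min=0.8, v_max=1.6, period=30):
    """Sinusoidal demand between v_min and v_max with the given period in steps.

    The phase is an assumption: the signal starts at its mean value.
    """
    mean = 0.5 * (v_min + v_max)
    amplitude = 0.5 * (v_max - v_min)
    return mean + amplitude * math.sin(2.0 * math.pi * step / period)


first = [piecewise_demand(n) for n in range(100)]
second = [sinusoidal_demand(n) for n in range(60)]
disturbed = [sinusoidal_demand(n, v_min=0.7, v_max=1.3) for n in range(60)]
print(first[14], first[15], second[0], disturbed[0])
```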

Fig. 3

The left figure depicts the phase portrait of configuration states vs. their velocities. The right figure depicts the critic agent trend for the adaptive deforming method when one parameter of the VHCs, the torso angle, is adjusted. The colors indicate the behavior of the states and the critic agent in the last step before the desired velocity changes. The brown color indicates the average speed of 1 (m/s), the green color indicates the average speed of 1.8 (m/s), the blue color indicates the average speed of 1.2 (m/s), and the red color indicates the average speed of 2 (m/s)

Fig. 4

The left figure depicts the desired, recommended, and actual average speed of the biped robot. The black line indicates three different constant demands occurring just a few steps after one another. The blue line indicates the recommended ones from the response of the high-level controller by adjusting one characterization parameter, the torso angle. The red line indicates the actual ones. The right figure depicts the variation of the torso angle for the adaptive deforming method

Fig. 5

The low-level control signals over time for the adaptive deforming method when one parameter of the VHCs, the torso angle, is adjusted

Fig. 6

The left figure depicts the desired, recommended, and actual average speed of the biped robot for a sinusoidal desired average speed. The right figure depicts the variation of the torso angle and step size for the adaptive deforming method

Fig. 7

The left figure depicts the desired, recommended, and actual average speed of the biped robot for a sinusoidal desired average speed. The result is affected by disturbances; the circle on the figure marks the time of their occurrence. The right figure depicts the trend of the critic agent in dealing with the disturbances

Fig. 8

The low-level control signals over time for the adaptive deforming method when two characterization parameters of the VHCs are adjusted in response to large disturbances. The circle marks the control signals in the presence of the large disturbances

7 Conclusion

In this paper, we have investigated the problem of regulating the average walking speed of five-link biped robots. The system has one degree of under-actuation and an instantaneous impact. The motion is obtained by defining 4 VHCs and adding an impact-invariance property. To achieve the goal, we have introduced the concept of adaptive deforming, which varies the initial constraint manifold into ones that fulfill the demand. A novel two-level hierarchical control has been designed to synchronize the enforcement of the VHCs with the modification of their parameters in order to achieve the desired outcome. To our knowledge, it is the first approach that presents formal stability criteria for the controlled system while simultaneously adapting the biped robot's walking characteristics to the demand. The simulation results have verified the effectiveness of the control method in regulating new maneuvers and rejecting disturbances. In future work, this approach will be enhanced by eliminating the delay between the command and the actual maneuver, and it will be extended to running motion.