Control systems need to react to the environment in a predictable and repeatable fashion. They take measurements and use them to control a process. For example, a ship measures its heading and changes its rudder angle to attain a desired heading.

Typically, control systems are designed and implemented with all of the parameters hard-coded into the software. This works very well in most circumstances, particularly when the system is well understood during the design process. When the system is not well defined, or is expected to change significantly during operation, it may be necessary to implement learning control. For example, the batteries in an electric car degrade over time, reducing its range. An autonomous driving system would need to learn that the range was decreasing by comparing the distance traveled with the battery’s state of charge. More drastic, and sudden, changes can also alter a system. For example, in an aircraft, the air data system might fail due to a sensor malfunction. If GPS were still operating, the plane would want to switch to GPS-only navigation. In a multi-input-multi-output control system, a branch may fail due to a failed actuator or sensor. The control system might have to be modified to operate without that branch.

The terms learning control and adaptive control are often used interchangeably. In this chapter, you will learn a variety of adaptive control techniques for different systems. Each technique is applied to a particular system, but all are generally applicable to any control system.

Figure 5.1 provides a taxonomy of adaptive and learning control. The paths depend on the nature of the dynamical system. The rightmost branch is tuning. This is something a designer would do during testing, but it could also be done automatically, as described in Recipe 5.1 on self-tuning. The next path is for systems that vary with time. Our first example of a system with time-varying parameters applies Model Reference Adaptive Control (MRAC) to a spinning wheel. This is discussed in Section 5.2.

Figure 5.1 Taxonomy of adaptive or learning control. Constant parameters split into known and unknown values, both leading to constant control system parameters; time-varying parameters split into known and unknown variations, leading to gain-scheduled and automatically updated control system parameters.

The next example is ship control. Your goal is to control the heading angle. The dynamics of the ship are a function of the forward speed. While it isn’t learning from experience, it is adapting based on information about its environment.

The last example is a spacecraft with variable inertia. This shows very simple parameter estimation.

5.1 Self-Tuning: Tuning an Oscillator

We want to tune a damper so that we critically damp a spring system for which the spring constant changes. Our system works by perturbing the undamped spring with a step and measuring the frequency using a Fast Fourier Transform (FFT). We then compute the damping using the measured frequency and add a damper to the simulation. We then measure the undamped natural frequency again to verify that it is the correct value. Finally, we set the damping ratio to 1 and observe the response. The frequency is measured during operation, so this is an example of online learning. The system is shown in Figure 5.2.

Figure 5.2 Spring-mass-damper system. A fixed spring k and a damper c connect to the mass m, which moves under the external force F.

In Chapter 4, we introduced parameter identification in the context of Kalman Filters, which is another way of finding the frequency. The approach here is to collect a large sample of data and process it in batch to find the natural frequency. The equations for the system are

$$\displaystyle \begin{aligned} \begin{array}{rcl} \dot{r} & =&\displaystyle v \end{array} \end{aligned} $$
(5.1)
$$\displaystyle \begin{aligned} \begin{array}{rcl} m\dot{v} & =&\displaystyle -cv -kr \end{array} \end{aligned} $$
(5.2)

c is the damping and k is the stiffness. The damping term drives the velocity to zero (unless the damping is negative), and the stiffness term bounds the range of motion. The dot above a symbol denotes the first derivative with respect to time. That is

$$\displaystyle \begin{aligned} \dot{r} = \frac{dr}{dt} \end{aligned} $$
(5.3)

The equations state that the change in position with respect to time is the velocity, and the mass times the change in velocity with respect to time is equal to a force proportional to its velocity and position. The second equation is Newton’s law:

$$\displaystyle \begin{aligned} F = ma \end{aligned} $$
(5.4)

where F is force, m is mass, and a is acceleration.

\(\blacksquare \) TIP Weight is the mass times the acceleration of gravity.

$$\displaystyle \begin{aligned} \begin{array}{rcl} F & =&\displaystyle -cv - kr \end{array} \end{aligned} $$
(5.5)
$$\displaystyle \begin{aligned} \begin{array}{rcl} a & =&\displaystyle \frac{dv}{dt} \end{array} \end{aligned} $$
(5.6)
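These equations translate directly into a right-hand-side function for numerical integration. The following is a minimal sketch in the style of the book’s RHSOscillator; the function name, signature, and data-structure fields (m, k, c, f) are assumptions, not the book’s listing.

function xDot = RHSSpringMassDamper( x, ~, d )
% Sketch of the spring-mass-damper dynamics, Equations 5.1 and 5.2.
% x = [r; v]; d has fields m (mass), k (stiffness), c (damping), f (force).
xDot = [ x(2); (d.f - d.c*x(2) - d.k*x(1))/d.m ];

With d = struct('m',1,'k',0.01,'c',0,'f',0), the undamped natural frequency is \(\sqrt{k/m} = 0.1\) rad/s, matching the simulations that follow.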

5.1.1 Problem

We want to identify the frequency of an oscillator and tune a control system to that frequency.

5.1.2 Solution

The solution is to have the control system measure the frequency of the spring. We will use an FFT to identify the frequency of the oscillation.

5.1.3 How It Works

The following script shows how an FFT identifies the oscillation frequency for a damped oscillator.

The function is shown in the following code. We use the RHSOscillator dynamical model for the system. We start with a small initial position to get it to oscillate, and a small damping ratio so that the motion damps out. The resolution of the spectrum depends on the number of samples:

$$\displaystyle \begin{aligned} r = \frac{2\pi}{nT} \end{aligned} $$
(5.7)

where n is the number of samples and T is the sampling period. The maximum frequency is

$$\displaystyle \begin{aligned} \omega = \frac{nr}{2} \end{aligned} $$
(5.8)
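For concreteness, here is a quick numerical check of these two formulas; the values of n and T are examples, not the book’s settings.

n    = 2^14;          % number of samples (a power of two suits the FFT)
T    = 0.1;           % sampling period (s)
r    = 2*pi/(n*T);    % spectral resolution (rad/s), Equation 5.7
wMax = n*r/2;         % maximum frequency (rad/s), Equation 5.8, equals pi/T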

The following shows the simulation loop and FFTEnergy call.

FFTSim.m

FFTEnergy is shown as follows.

FFTEnergy.m

The Fast Fourier Transform takes the sampled time sequence and computes the frequency spectrum. We compute the FFT using MATLAB’s fft function and multiply the result by its conjugate to get the energy. The first half of the result contains the frequency information. aPeak is a threshold used to flag peaks in the output; the code simply looks for values greater than the threshold.
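The core computation can be sketched in a few lines; the variable names and the test signal below are illustrative, not the book’s listing.

T  = 0.1;                          % sampling period (s)
n  = 2^12;                         % number of samples
t  = (0:n-1)*T;
x  = sin(0.1*t) + 0.1*randn(1,n);  % test signal with a 0.1 rad/s component
e  = fft(x - mean(x));
e  = real(e.*conj(e))/n;           % energy spectrum
hN = n/2;
e  = e(1:hN);                      % first half has the frequency content
w  = (2*pi/(n*T))*(0:hN-1);        % frequency axis (rad/s)
[~,iMax] = max(e(2:end));          % skip the DC bin when finding the peak
fprintf('Peak at %.3g rad/s\n', w(iMax+1));
semilogx(w(2:end), log10(e(2:end))), grid on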

Figure 5.3 shows the damped oscillation. Figure 5.4 shows the spectrum. We find the peak by searching for the maximum value. The noise in the signal is seen at the higher frequencies. A noise-free simulation is shown in Figure 5.5.

The tuning approach is to

  1. Excite the oscillator with a pulse.

  2. Run it for 2n steps.

  3. Do an FFT.

  4. If there is only one peak, compute the damping gain (see the sketch following this list).
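Once the peak frequency is identified, the damping follows from the standard second-order form \(c = 2\zeta m \omega\), with ζ = 1 for critical damping. A sketch of that final step (the names and the unit mass are assumptions):

w    = 0.1;          % peak frequency identified by the FFT (rad/s)
zeta = 1;            % desired damping ratio (critical damping)
m    = 1;            % mass (kg)
c    = 2*zeta*m*w;   % damping gain applied to the simulated damper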

Figure 5.3 Simulation of the damped oscillator. The damping ratio ζ is 0.5, and the undamped natural frequency ω is 0.1 rad/s.

Figure 5.4 The frequency spectrum (resolution 9.59e-4 rad/s). The peak is at the oscillation frequency of 0.1 rad/s.

Figure 5.5 The frequency spectrum without noise (resolution 9.59e-4 rad/s). The peak is at 0.1 rad/s, in agreement with the simulation.

The script TuningSim calls FFTEnergy.m with aPeak set to 0.7. The value for aPeak is found by looking at a plot and picking a suitable number. The disturbances are Gaussian-distributed accelerations, and there is noise in the measurement. Note that this simulation uses a different right-hand-side function RHSOscillatorControl. The measurement with noise is implemented as

TuningSim.m

The disturbances are implemented with a step perturbation, which ends at a given step, and random noise:

TuningSim.m

The tuning code using FFTEnergy is shown in the following snippet.

TuningSim.m

The entire loop is run four times: the first pass is undamped, and the second, third, and fourth passes update the tuned gain. The results appear in the command window.

If the random noise is large enough, the loop may tune more than once. Running it a few times or increasing the noise will show this behavior.

As you can see from the FFT plots in Figure 5.6, the spectra are "noisy" due to the sensor noise and Gaussian disturbance. The criterion for determining that the system is underdamped is a distinctive peak. If the noise is large enough, we have to set lower thresholds to trigger the tuning. The top-left FFT plot shows the 0.1 rad/s peak. After tuning, we damp the oscillator sufficiently so that the peak is diminished. The time plot in Figure 5.6 (the bottom plot) shows that the system is initially lightly damped; after tuning, it oscillates very little. There is a slight transient each time the tuning is adjusted, at 1.9, 3.6, and 5.5 seconds. The FFT plots (the top right and middle two) show the data used in the tuning.

An important point is that we must stimulate the system to identify the peak. All system identification, parameter estimation, and tuning algorithms have this requirement. An alternative to a pulse (which has a broad frequency spectrum) would be to use a sinusoidal sweep. That would excite any resonances and make it easier to identify the peak. However, care must be taken when exciting a physical system at different frequencies to ensure it does not have an unsafe or unstable response at natural frequencies.

Figure 5.6 Tuning simulation results. The first four plots are the frequency spectra taken at the end of each sampling interval; the last shows the states over time. Upper left, before tuning, the peak is visible.

Figure 5.7 Speed control of a rotor for the Model Reference Adaptive Control demo. The wheel, mounted on a base, rotates about the vertical axis.

5.2 Implement MRAC

Our next example is to control a rotor with an unknown load so that it behaves in a desired manner. We will use Model Reference Adaptive Control (MRAC). The dynamical model of the rotary joint is from [3]; the system is shown in Figure 5.7.

$$\displaystyle \begin{aligned} \frac{d\omega}{dt} = -a\omega + bu_c + u_d \end{aligned} $$
(5.9)

where the damping constant a and/or the input constant b is unknown. ω is the angular rate, uc is the input voltage, and ud is a disturbance angular acceleration. This is a first-order system modeled by one first-order differential equation. We would like the system to behave like the reference model:

$$\displaystyle \begin{aligned} \frac{d\omega}{dt} = -a_m\omega + b_mu_c + u_d \end{aligned} $$
(5.10)

5.2.1 Problem

We want to control a system to behave like a particular model. Our example is a simple rotor.

5.2.2 Solution

The solution is to implement a Model Reference Adaptive Control (MRAC) function.

5.2.3 How It Works

The idea is to have a dynamic model that defines the behavior of your system. You want your system to have the same dynamics. This desired model is the reference, hence the name Model Reference Adaptive Control (MRAC). We will use the MIT rule [3] to design the adaptation system. The MIT rule was first developed at the MIT Instrumentation Laboratory (now Draper Laboratory), which developed the NASA Apollo and Space Shuttle guidance and control systems.

Consider a closed-loop system with one adjustable parameter, θ. θ is a parameter, not an angle. The desired output is ym. The error is

$$\displaystyle \begin{aligned} e = y - y_m \end{aligned} $$
(5.11)

Define a loss function (or cost) as

$$\displaystyle \begin{aligned} J(\theta) = \frac{1}{2}e^2 \end{aligned} $$
(5.12)

The square removes the sign. If the error is zero, the cost is zero. We would like to minimize J(θ). To make J small, we change the parameters in the direction of the negative gradient of J or

$$\displaystyle \begin{aligned} \frac{d\theta}{dt} = -\gamma \frac{\partial J}{\partial \theta} = -\gamma e \frac{\partial e}{\partial \theta} \end{aligned} $$
(5.13)

This is the MIT rule. If the system is changing slowly, then we can assume that θ is constant as the system adapts. γ is the adaptation gain. Our dynamic model is

$$\displaystyle \begin{aligned} \frac{d\omega}{dt} = -a\omega + bu \end{aligned} $$
(5.14)

We would like it to be the model:

$$\displaystyle \begin{aligned} \frac{d\omega_m}{dt} = -a_m\omega_m + b_mu_c \end{aligned} $$
(5.15)

a and b are the actual unknown parameters. am and bm are the model parameters. We would like a and b to be am and bm. Let the controller for our rotor be

$$\displaystyle \begin{aligned} u = \theta_1u_c - \theta_2 \omega \end{aligned} $$
(5.16)

The second term provides the damping. The controller has two adaptation parameters. If they are chosen to be

$$\displaystyle \begin{aligned} \begin{array}{rcl} \theta_1 & =&\displaystyle \frac{b_m}{b} \end{array} \end{aligned} $$
(5.17)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \theta_2 & =&\displaystyle \frac{a_m-a}{b} \end{array} \end{aligned} $$
(5.18)

the input-output relations of the system and model are the same. This is called perfect model following. Perfect model following is not required, however. To apply the MIT rule, write the error as

$$\displaystyle \begin{aligned} e = \omega - \omega_m \end{aligned} $$
(5.19)

With the parameters θ1 and θ2, the system is

$$\displaystyle \begin{aligned} \frac{d\omega}{dt} = -(a+b\theta_2)\omega + b\theta_1u _c \end{aligned} $$
(5.20)

To continue with the implementation, we introduce the differential operator \(p = \frac {d}{dt}\). We then write

$$\displaystyle \begin{aligned} p\omega = -(a+b\theta_2)\omega + b\theta_1u_c \end{aligned} $$
(5.21)

or

$$\displaystyle \begin{aligned} \omega = \frac{b\theta_1}{p + a + b\theta_2}u_c \end{aligned} $$
(5.22)

We need to get the partial derivatives of the error with respect to θ1 and θ2. These are

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{\partial e}{\partial \theta_1} & =&\displaystyle \frac{b}{p + a + b\theta_2}u_c \end{array} \end{aligned} $$
(5.23)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{\partial e}{\partial \theta_2} & =&\displaystyle -\frac{b^2\theta_1}{\left(p + a + b\theta_2\right)^2}u_c \end{array} \end{aligned} $$
(5.24)

from the chain rule for differentiation. Noting that

$$\displaystyle \begin{aligned} u_c = \frac{p + a + b\theta_2}{b\theta_1}\omega \end{aligned} $$
(5.25)

the second equation becomes

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{\partial e}{\partial \theta_2} & =&\displaystyle -\frac{b}{p + a + b\theta_2}\omega \end{array} \end{aligned} $$
(5.26)

Since we don’t know a, let’s assume that we are pretty close to it. Then let

$$\displaystyle \begin{aligned} p + a_m \approx p + a + b\theta_2 \end{aligned} $$
(5.27)

Applying the MIT rule with this approximation, and absorbing the constant factor b∕am into the adaptation gain γ, our adaptation laws are now

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{d\theta_1}{dt} & =&\displaystyle -\gamma\left(\frac{a_m}{p + a_m}u_c\right)e \end{array} \end{aligned} $$
(5.28)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{d\theta_2}{dt} & =&\displaystyle \gamma\left(\frac{a_m}{p + a_m}\omega\right)e \end{array} \end{aligned} $$
(5.29)

Let

$$\displaystyle \begin{aligned} \begin{array}{rcl} x_1 & =&\displaystyle \frac{a_m}{p + a_m}u_c \end{array} \end{aligned} $$
(5.30)
$$\displaystyle \begin{aligned} \begin{array}{rcl} x_2 & =&\displaystyle \frac{a_m}{p + a_m}\omega \end{array} \end{aligned} $$
(5.31)

which are differential equations that must be integrated. The complete set is

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{dx_1}{dt} & =&\displaystyle - a_m x_1+a_m u_c \end{array} \end{aligned} $$
(5.32)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{dx_2}{dt} & =&\displaystyle - a_m x_ 2+ a_m \omega \end{array} \end{aligned} $$
(5.33)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{d\theta_1}{dt} & =&\displaystyle -\gamma x_1 e \end{array} \end{aligned} $$
(5.34)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{d\theta_2}{dt} & =&\displaystyle \gamma x_2 e \end{array} \end{aligned} $$
(5.35)

Our only measurement would be ω, which would come from a tachometer. As noted before, the controller is

$$\displaystyle \begin{aligned} \begin{array}{rcl} u & =&\displaystyle \theta_1 u_c - \theta_2 \omega \end{array} \end{aligned} $$
(5.36)
$$\displaystyle \begin{aligned} \begin{array}{rcl} e & =&\displaystyle \omega - \omega_m \end{array} \end{aligned} $$
(5.37)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{d\omega_m}{dt} & =&\displaystyle - a_m \omega_m + b_m u_c \end{array} \end{aligned} $$
(5.38)

The MRAC is implemented in the function MRAC, shown in its entirety in the following listing. The controller propagates five differential equations; the states are [x1, x2, θ1, θ2, ωm]. RungeKutta is used for the propagation, but a less computationally intensive lower-order integrator, such as Euler, could be used instead. If called with no inputs and one output, the function returns its default data structure, which has reasonable values; this makes the function easier for a user to adopt. Each call propagates only one step.

MRAC.m
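Since the full listing is in the companion code, the following sketch shows just the propagation step of Equations 5.32 through 5.38; the function name, field names, and the embedded Runge-Kutta step are assumptions, not the book’s implementation.

function [u, d] = MRACStep( omega, d )
% Sketch of one MRAC step. d has fields x = [x1;x2;theta1;theta2;omegaM],
% aM, bM, gamma (adaptation gain), uC (command), and dT (time step).
rhs = @(x) [-d.aM*x(1) + d.aM*d.uC;...             % Equation 5.32
            -d.aM*x(2) + d.aM*omega;...            % Equation 5.33
            -d.gamma*x(1)*(omega - x(5));...       % Equation 5.34
             d.gamma*x(2)*(omega - x(5));...       % Equation 5.35
            -d.aM*x(5) + d.bM*d.uC];               % Equation 5.38
k1  = rhs(d.x);                                    % 4th-order Runge-Kutta
k2  = rhs(d.x + 0.5*d.dT*k1);
k3  = rhs(d.x + 0.5*d.dT*k2);
k4  = rhs(d.x + d.dT*k3);
d.x = d.x + (d.dT/6)*(k1 + 2*k2 + 2*k3 + k4);
u   = d.x(3)*d.uC - d.x(4)*omega;                  % Equation 5.36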

Now that we have the MRAC controller done, we’ll write some supporting functions and then test it all out in RotorSim.

5.3 Generating a Square Wave Input

5.3.1 Problem

We need to generate a square wave to stimulate the rotor in the previous recipe.

5.3.2 Solution

For simulation and testing our controller, we will generate a square wave with a function.

5.3.3 How It Works

SquareWave generates a square wave. The first few lines are our standard code for running a demo or returning the data structure.

SquareWave.m

This function uses d.state to determine if it is in the high or low part of a square wave. The width of the low part of the wave is set in d.tLow. The width of the high part of the square wave is set in d.tHigh. It stores the time of the last switch in d.tSwitch.

A square wave is shown in Figure 5.8. There are many ways to specify a square wave. This function produces a square wave with a minimum of zero and a maximum of one. You specify the time at zero and the time at one to create the square wave.
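A minimal sketch of the switching logic follows; the field names match the description above, but the function name and signature are assumptions.

function [v, d] = SquareWaveStep( t, d )
% Sketch of a 0/1 square wave. d.state is true during the high segment;
% d.tHigh and d.tLow are the segment widths; d.tSwitch is the last switch time.
if d.state
  v = 1;
  if t - d.tSwitch >= d.tHigh
    d.state   = false;
    d.tSwitch = t;
  end
else
  v = 0;
  if t - d.tSwitch >= d.tLow
    d.state   = true;
    d.tSwitch = t;
  end
end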

We adjusted the y-axis limit and line width using the following code.

SquareWave.m

Figure 5.8 Square wave. The wave alternates between 0 and 1, with 10 seconds at each level.

\(\blacksquare \) TIP h = get(gca,'children') gives you access to the line data structure in a plot for the most recent axes.

5.4 Demonstrate MRAC for a Rotor

5.4.1 Problem

We want to create a recipe to control our rotor using MRAC.

5.4.2 Solution

The solution is to use the Model Reference Adaptive Control (MRAC) function from Recipe 5.2 in a MATLAB script.

5.4.3 How It Works

MRAC is implemented in the script RotorSim. It calls MRAC to control the rotor. As in our other scripts, we use PlotSet for our 2D plots. Notice that we use two new options. One, 'plot set', allows you to put more than one line on a subplot. The other, 'legend', adds a legend to each plot. The cell array argument to 'legend' has a cell array for each plot. In this case, we have two plots, each with two lines, so the cell array is
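lg = {{'true','estimated'},{'command','control'}};

(The labels here are illustrative, matching the lines plotted in Figure 5.9, not necessarily the book’s exact strings.)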

Each plot legend is a cell entry within the overall cell array.

The rotor simulation script with MRAC is shown in the following listing. The square wave function generates the command that ω should track. RHSRotor, SquareWave, and MRAC all return default data structures. MRAC and SquareWave are called once per pass through the loop. The simulation right-hand side, that is, the dynamics of the rotor in RHSRotor, is then propagated using RungeKutta. Note that we pass a function handle for RHSRotor to RungeKutta.

RotorSim.m

\(\blacksquare \) TIP Pass function handles @fun instead of strings 'fun' to functions whenever possible.

RHSRotor is shown as follows.

RHSRotor.m

The dynamics are just one line of code. The remaining code returns the default data structure.

The results are shown in Figure 5.9. We set the adaptation gain, γ, to 1. am and bm are set equal to 2. a is set equal to 1 and b to \(\frac {1}{2}\).

The first plot shows the rotor’s estimated and true angular rates on top, and the commanded and actual control sent to the wheel on the bottom. The command is a square wave (generated by SquareWave). Notice the transient in the applied control at the transitions of the square wave; the control amplitude is greater than the commanded control. Notice also that the angular rate approaches the commanded square wave shape.

Figure 5.10 shows the convergence of the adaptive gains, θ1 and θ2. They have converged by the end of the simulation.

MRAC learns the gains of the system by observing the response to the control excitation. It requires excitation to converge. This is the nature of all learning systems. If there is insufficient stimulation, it isn’t possible to observe the behavior of the system, so there is not enough information for learning. It is easy to find an excitation for a first-order system. For higher-order systems or nonlinear systems, this can be more difficult.

Figure 5.9 MRAC control of a rotor. Top: the true and estimated angular rates overlap as square waves. Bottom: the commanded and applied control; the applied control spikes at the transitions.

Figure 5.10 Gain convergence in the MRAC controller, plotted as θ2 versus θ1.

Figure 5.11 Ship heading control for gain scheduling control. The ship moves at speed u along its body x-axis at heading angle ψ; v is the transverse velocity, and δ is the rudder angle at the stern.

5.5 Ship Steering: Implement Gain Scheduling for Steering Control of a Ship

5.5.1 Problem

We want to steer a ship at all speeds. The problem is that the dynamics are speed dependent, making this a nonlinear problem. The model is shown in Figure 5.11.

5.5.2 Solution

The solution is to use gain scheduling to set the gains based on speeds. The gain schedule is learned by automatically computing gains from the dynamical equations of the ship. This is similar to the self-tuning example except that we are seeking a set of gains for all speeds, not just one. In addition, we assume that we know the model of the system.

5.5.3 How It Works

The dynamical equations for the heading of a ship are in state space form [3]:

$$\displaystyle \begin{aligned} \left[ \begin{array}{l} \dot{v}\\ \dot{r}\\ \dot{\psi} \end{array} \right] = \left[ \begin{array}{rrr} \left(\frac{u}{l}\right)a_{11}&ua_{12}&0\\ \left(\frac{u}{l^2}\right)a_{21}&\left(\frac{u}{l}\right)a_{22}&0\\ 0&1&0 \end{array} \right] \left[ \begin{array}{l} v\\ r\\ \psi \end{array} \right] + \left[ \begin{array}{r} \left(\frac{u^2}{l}\right)b_1\\ \left(\frac{u^2}{l^2}\right)b_2\\ 0 \end{array} \right]\delta + \left[ \begin{array}{r} \alpha_v\\ \alpha_r\\ 0 \end{array} \right] \end{aligned} $$
(5.39)

v is the transverse speed, u is the ship’s forward speed, l is the ship length, r is the turning rate, and ψ is the heading angle. αv and αr are disturbances. The ship is assumed to be moving at speed u, which is maintained by the propeller (not modeled). The control is the rudder angle δ. Notice that if u = 0, the ship cannot be steered. All of the coefficients in the state matrix are functions of u, except those in the heading angle row. Our goal is to control the heading given the disturbance acceleration in the first equation and the disturbance angular rate in the second.

The disturbances only affect the dynamic states, v and r. The last state, ψ, is a kinematic state and does not have a disturbance.

Table 5.1 Ship parameters [3]

The ship model is shown in the following code, RHSShip. The second and third outputs are for use in the controller. Notice that the differential equations are linear in the state and the control, and that both matrices are functions of the forward velocity. We are not trying to control the forward velocity; it is an input to the system. The default parameters for the minesweeper are given in Table 5.1 and are the same numbers as in the default data structure.

RHSShip.m
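As a sketch of what the model computes, the matrices of Equation 5.39 can be assembled as follows; the function name RHSShipSketch, the signature, and the field names are assumptions, not the book’s listing.

function [xDot, a, b] = RHSShipSketch( x, d, delta )
% Sketch of the ship dynamics, Equation 5.39. x = [v; r; psi].
% d has fields u (speed), l (length), a11..a22, b1, b2, alphaV, alphaR.
u = d.u; l = d.l;
a = [(u/l)*d.a11    u*d.a12      0;...
     (u/l^2)*d.a21  (u/l)*d.a22  0;...
     0              1            0];
b = [(u^2/l)*d.b1; (u^2/l^2)*d.b2; 0];
xDot = a*x + b*delta + [d.alphaV; d.alphaR; 0];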

In the ship simulation, ShipSim, we linearly increase the forward speed while commanding a series of heading (ψ) changes. At each time step, the controller takes the state space model and computes new gains, which are used to steer the ship. The controller is a linear quadratic regulator. We can use full-state feedback because the states are easily modeled. Such controllers work perfectly in this case but are harder to implement when you need to estimate some of the states or have unmodeled dynamics.

ShipSim.m

The quadratic regulator generator code is shown in the following listing. It generates the gain from the matrix Riccati equation. A Riccati equation is an ordinary differential equation that is quadratic in the unknown function. In steady state, this reduces to the algebraic Riccati equation that is solved in this function.

QCR.m

a is the state transition matrix, b is the input matrix, q is the state cost matrix, and r is the control cost matrix. The bigger the elements of q, the more cost we place on deviations of the states from zero; that leads to tighter control at the expense of more control effort. The bigger the elements of r, the more cost we place on control; bigger r means less control. Quadratic regulators guarantee stability if all states are measured. They are a very handy controller to get something working. The results are given in Figure 5.12. Note how the gains evolve.
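As a cross-check on QCR, MATLAB’s Control System Toolbox function lqr solves the same steady-state Riccati problem. A sketch of the gain-scheduling step, reusing the hypothetical RHSShipSketch above; the numeric weights are illustrative, not the book’s.

% Recompute the gain whenever the speed u changes (gain scheduling)
[~, a, b] = RHSShipSketch( zeros(3,1), d, 0 );  % matrices at the current speed
q = diag([0.1 0.1 1]);     % state cost: weight the heading angle most
r = 1;                     % control cost on the rudder
k = lqr( a, b, q, r );     % solves the algebraic Riccati equation for the gain
delta = -k*x;              % full-state feedback rudder command, x = [v; r; psi]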

The gain on the angular rate r is nearly constant. Notice that the range of the ψ gain is very small; normally, you would rescale the plot. The other two gains increase with speed. This is an example of gain scheduling; the difference is that we autonomously compute the gains from perfect measurements of the ship’s forward speed.

ShipSimDisturbance is a modified version of ShipSim with a shorter duration, only one course change, and disturbances in both angular rate and lateral velocity. The results are given in Figure 5.13.

5.6 Spacecraft Pointing

5.6.1 Problem

We want to control the orientation of a spacecraft with thrusters for control. We do not know the inertia, which has a major impact on control.

5.6.2 Solution

The solution is to use a parameter estimator to estimate the inertia and feed it into the control system.

5.6.3 How It Works

The spacecraft model is shown in Figure 5.14.

The dynamical equations are

$$\displaystyle \begin{aligned} \begin{array}{rcl} I & =&\displaystyle I_0 + m_f r_f^2 \end{array} \end{aligned} $$
(5.40)
$$\displaystyle \begin{aligned} \begin{array}{rcl} T_c + T_d& =&\displaystyle I\ddot{\theta} + \dot{m}_f r_f^2 \dot{\theta} \end{array} \end{aligned} $$
(5.41)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \dot{m_f} & =&\displaystyle -\frac{T_c}{r u_e} \end{array} \end{aligned} $$
(5.42)
Figure 5.12 Ship steering simulation. The states are shown on the top with the forward velocity; the gains and rudder angle are shown on the bottom. Notice the "pulses" in the rudder that make the maneuvers.

Figure 5.13 Ship steering simulation with disturbances. The states are shown with the rudder angle; the disturbances are Gaussian white noise.

Figure 5.14 Spacecraft model. The body has inertia I0; the two fuel tanks of mass mf are at moment arm rf, and the thrusters are at radius r.

where I is the total inertia, I0 is the constant inertia for everything except the fuel mass, Tc is the thruster control torque, Td is the disturbance torque, mf is the total fuel mass, rf is the distance to the fuel tank center (moment arm), r is the vector to the thrusters, ue is the thruster exhaust velocity, and θ is the angle of the spacecraft axis. Fuel consumption is balanced between the two tanks, so the center of mass remains at (0,0). The second term in the second equation is the inertia derivative term, which adds damping to the system.

Our controller is a PD (proportional derivative) controller of the form

$$\displaystyle \begin{aligned} \begin{array}{rcl} T_c & =&\displaystyle Ia \end{array} \end{aligned} $$
(5.43)
$$\displaystyle \begin{aligned} \begin{array}{rcl} a & =&\displaystyle -K(\theta + \tau\dot{\theta}) \end{array} \end{aligned} $$
(5.44)

K is the forward gain and τ the rate constant. We design the controller for unit inertia and then estimate the inertia so that our dynamic response is always the same. We will estimate the inertia using a very simple algorithm:

$$\displaystyle \begin{aligned} I_k = K_I I_{k-1} + (1-K_I) \frac{T_{c_{k}}}{\ddot{\theta}_{k}} \end{aligned} $$
(5.45)

KI is less than or equal to one. We will do this only when the control torque is not zero and the change in rate is not zero. This is a first difference approximation and should be good if we don’t have a lot of noise. The following code snippet shows the simulation loop with the control system. The dynamics are in RHSSpacecraft.m.

SpacecraftSim.m

We only estimate inertia when the control torque is above a threshold. This prevents us from responding to noise. We also incorporate the inertia estimator in a simple low-pass filter. The results are shown in Figure 5.15. The threshold means the algorithm only estimates inertia at the very beginning of the simulation when it is reducing the attitude error.
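A sketch of the in-loop logic (Equations 5.43 through 5.45) follows; the names, thresholds, and numeric values are assumptions, not the book’s listing.

% Inside the simulation loop; theta, omega, and the measured angular
% acceleration wDot are updated by the dynamics each step.
kI = 0.9;                                 % estimator filter gain, <= 1
if abs(tC) > 1e-3 && abs(wDot) > 1e-4     % act only above the thresholds
  iEst = kI*iEst + (1 - kI)*tC/wDot;      % Equation 5.45, low-pass filtered
end
accel = -kForward*( theta + tau*omega );  % Equation 5.44, PD law
tC    = iEst*accel;                       % Equation 5.43, thruster torque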

Figure 5.15 States and control outputs from the spacecraft simulation: angle θ, rate ω, fuel mass mf, estimated inertia I, disturbance torque Td, control torque Tc, and true inertia.

Figure 5.16 Estimated and actual inertia from the spacecraft simulation. The estimate converges to the true value of about 1.01 kg-m².

The dynamics function computes the true inertia from the fuel mass state and the dry mass inertia. This allows the script to compare the estimate against the truth value in Figure 5.16.

This algorithm appears crude, but it is fundamentally all we can do in this situation given just angular rate measurements. Note that the inertia estimate happens while the control is operating, making this a nonlinear controller. More sophisticated filters or estimators could improve the performance.

5.7 Direct Adaptive Control

5.7.1 Problem

We want to control a system for which the plant is unknown. This is one in which the order and parameters for the model are unknown.

5.7.2 Solution

The solution is to use direct adaptation based on Lyapunov control.

5.7.3 How It Works

Assume the dynamics equation is

$$\displaystyle \begin{aligned} \dot{y} = ay + bu \end{aligned} $$
(5.46)

u is the control. If a < 0, the system will always converge. If we use feedback control of the form u = −ky, then

$$\displaystyle \begin{aligned} \dot{y} = (a-bk)y + bu_d \end{aligned} $$
(5.47)

where ud is an external disturbance. If a − bk is positive, the system is unstable. If we don’t know a or b, we can’t guarantee stability with a fixed-gain control. We could try to estimate a and b and then design the controller in real time. A simple approach [18] is an adaptive controller. Assuming b > 0, the gain is given by

$$\displaystyle \begin{aligned} \dot{k} = y^2 \end{aligned} $$
(5.48)

This is known as a universal regulator. To show that it is stable, pick the Lyapunov function:

$$\displaystyle \begin{aligned} V = \frac{y^2}{2} \end{aligned} $$
(5.49)

Its derivative is

$$\displaystyle \begin{aligned} \dot{V} = (a-bk)y^2 = (a-bk)\dot{k} \end{aligned} $$
(5.50)

Integrating

$$\displaystyle \begin{aligned} \frac{y^2}{2} = ak - \frac{bk^2}{2} + C \end{aligned} $$
(5.51)

Since \(\dot {k} \geq 0\), k can only increase. k has to be bounded: otherwise, the right-hand side would eventually become negative, which is impossible because the left-hand side is always nonnegative. The following script implements the controller with a > 0. Notice how the controller drives the error to zero.

DirectAdaptiveControl.m
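A compact, self-contained version of the same idea is sketched below, using Euler integration; the parameter values are assumptions, not the book’s script.

a = 0.1; b = 0.5;            % plant parameters, unknown to the controller
dT = 0.01; n = 20000;        % time step (s) and number of steps
y = 0.1; k = 0;              % initial output and adaptive gain
yH = zeros(1,n); kH = yH;    % histories for plotting
for j = 1:n
  u = -k*y;                  % feedback control
  y = y + dT*(a*y + b*u);    % propagate yDot = a*y + b*u
  k = k + dT*y^2;            % universal regulator: kDot = y^2
  yH(j) = y; kH(j) = k;
end
plot((1:n)*dT, yH), xlabel('Time (s)'), ylabel('y'), grid on

The gain k grows until bk exceeds a, after which y decays to zero and k levels off at its bound.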

The results are shown in Figure 5.17. Note the rapid convergence. No knowledge of a or b is required. a and b are never estimated.

Figure 5.17 Direct adaptive control. The state x converges to zero, the control u decays, and the gain k increases toward its bound.

Table 5.2 Chapter Code Listing

5.8 Summary

This chapter has demonstrated adaptive or learning control. You learned about model tuning, model reference adaptive control, adaptive control, and gain scheduling. Table 5.2 lists the functions and scripts included in the companion code.