1 Introduction

Controlling systems with constraints, in particular nonholonomic constraints, is an important topic of modern mechanics (Bloch 2003). Nonholonomic constraints tend to appear in mechanics as idealized representations of rolling without slipping, or of some other physical process in which a nonpotential force such as friction prevents the motion of the body in certain directions determined by the generalized velocities (Arnold et al. 1997). Because of the importance of these systems in mechanics, control of nonholonomic systems has been studied extensively. We refer the reader to the recent papers on the motion planning of nonholonomic systems (Li and Canny 1992; Jean 2014), controllability (Lewis and Murray 1997; Lewis 2000; Cortés and Martínez 2004), controlled Lagrangians (Zenkov 2000; Zenkov et al. 2002; Bloch et al. 2015b), and the symmetry reduction approach to the optimal control of holonomic and nonholonomic systems (Bloch 1996; Koon and Marsden 1997; Bloch 2003; Agrachev and Sachkov 2004; Bullo and Lewis 2005). Recent progress in the area of optimal control of nonholonomic systems from the geometric point of view with extensive examples has been summarized in Bloch et al. (2015a), to which we refer the reader interested in the historical background and recent developments.

Physically, the methods of introducing control into a nonholonomic mechanical system can be roughly divided into two categories based on the actual realization. One way is to apply an external force while requiring the system to respect the nonholonomic constraints for all times. This is akin to the control of the motion of the Chaplygin sleigh using internal point masses (Osborne and Zenkov 2005) and the control of the continuously variable transmission (CVT) studied in Bloch et al. (2015a). The second way is to control the direction of the motion by steering the enforced direction of motion, as is done, for example, in the snakeboard (Koon and Marsden 1997). On some level, one can understand the physical validity of the latter control approach, since one would expect that a reasonably constructed mechanical system with an adequate steering mechanism should be controllable. The goal of this paper can be viewed as “orthogonal” to the latter approach. More precisely, we consider the control of the direction normal to the allowed motion. Physically, it is not obvious that such a control mechanism is viable, since it provides quite a “weak” control of the system: there can be many directions normal to a given vector, and the system is free to move in a high-dimensional hyperplane. As it turns out, however, this type of control has the advantage of preserving appropriate integrals of motion, which yield additional restrictions on the motion of the system. While this discussion is by no means rigorous, it shows that there is a possibility of the system being controllable.

Moreover, the approach of control using the nonholonomic constraint itself has additional advantages. As was discussed recently in Gay-Balmaz and Putkaradze (2016), allowing nonholonomic constraints to vary in time can preserve integrals of motion of the system. A general theory was derived for energy conservation in the case of nonholonomically constrained systems with the configuration manifold being a semidirect product group (for example, the rolling unbalanced ball on a plane). It was also shown that additional integrals of motion can persist under perturbations of the constraints. These ideas, we believe, are also useful for control theory applications. Indeed, we shall show that using the nonholonomic constraints themselves as controls preserves energy and thus puts additional constraints on the possible trajectories in phase space. On the one hand, this makes points with different energies unreachable; on the other hand, for tracking a desired trajectory on the fixed energy surface, the control is more efficient because it reduces the number of effective dimensions in the dynamics.

The paper is structured as follows. Section 2 outlines the general concepts and notation behind the mechanics of a rigid body with nonholonomic constraints, in order to make the discussion self-contained. We discuss the concepts of symmetry reduction, variational principles, and both the Lagrange–d’Alembert and vakonomic approaches to nonholonomic mechanics. Section 2.3 outlines the general principles of Pontryagin optimal control for dynamical systems defined in \({\mathbb {R}}^n\). Sections 3 and 4 discuss the control of a specific problem posed by Suslov, which describes the motion of a rigid body under the influence of a nonholonomic constraint stating that the projection of the body angular velocity onto a fixed axis (the nullifier axis) vanishes. While this problem is quite artificial in its mechanical realization, it has been quite popular in the mechanics literature because of its relative simplicity. The idea of this paper is to control the motion of the rigid body by changing the nullifier axis in time. A possible mechanical realization of Suslov’s problem when the nullifier axis is fixed is given in Borisov et al. (2011); however, it is unclear how to realize Suslov’s problem when the nullifier axis is permitted to change in time. We have chosen to focus on Suslov’s problem, as it is one of the (deceptively) simplest examples of a mechanical system with nonholonomic constraints. Thus, this paper is concerned with the general theory applied to Suslov’s problem rather than with its physical realization. Section 3 derives the pure and controlled equations of motion for an arbitrary group, while Sect. 4 derives the pure and controlled equations of motion for \(\textit{SO}(3)\). In Sect. 3.2, particular attention is paid to the derivation of the boundary conditions needed to correctly apply the principles of Pontryagin optimal control while obtaining the controlled equations of motion. In Sect. 4.2, we show that this problem is controllable for \(\textit{SO}(3)\).
In Sect. 5, we derive an optimal control procedure for this problem and present numerical simulations that illustrate the possibility of solving quite complex optimal control and trajectory tracking problems. Section 6 provides a conclusion and summary of results, while Appendix A gives a brief survey of numerical methods for solving optimal control problems.

2 Background: Symmetry Reduction, Nonholonomic Constraints, and Optimal Control in Classical Mechanics

2.1 Symmetry Reduction and the Euler–Poincaré Equation

A mechanical system consists of a configuration space, which is a manifold M, and a Lagrangian \(L(q,{\dot{q}}): \textit{TM} \rightarrow {\mathbb {R}}\), \((q,{\dot{q}}) \in \textit{TM}\). The equations of motion are given by Hamilton’s variational principle of stationary action, which states that

$$\begin{aligned} \delta \int _a^b L(q, {\dot{q}}) \mathrm {d} t=0 \, , \quad \delta q(a)=\delta q(b)=0, \end{aligned}$$
(2.1)

for all smooth variations \(\delta q(t)\) of the curve q(t) that are defined for \(a\le t \le b\) and that vanish at the endpoints (i.e., \(\delta q(a)=\delta q(b)=0\)). Application of Hamilton’s variational principle yields the Euler–Lagrange equations of motion:

$$\begin{aligned} \frac{\partial L}{\partial q} - \frac{\mathrm {d} }{\mathrm {d} t} \frac{\partial L}{\partial {\dot{q}}} =0. \end{aligned}$$
(2.2)
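
To make (2.2) concrete, the following minimal symbolic check, which is an illustration not taken from the paper, applies the Euler–Lagrange operator to the Lagrangian of a planar pendulum; the symbols m, l, g and the Lagrangian itself are illustrative assumptions.

```python
import sympy as sp

# Planar pendulum (hypothetical example): L = (1/2) m l^2 thetadot^2 + m g l cos(theta)
t = sp.symbols('t')
m, l, g = sp.symbols('m l g', positive=True)
theta = sp.Function('theta')

q = theta(t)
qdot = sp.diff(q, t)
L = sp.Rational(1, 2) * m * l**2 * qdot**2 + m * g * l * sp.cos(q)

# Euler-Lagrange expression dL/dq - d/dt (dL/dqdot), cf. Eq. (2.2)
EL = sp.diff(L, q) - sp.diff(sp.diff(L, qdot), t)
print(sp.simplify(EL))
```

Setting the printed expression to zero recovers the familiar pendulum equation \(\ddot{\theta} = -(g/l)\sin \theta \).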

In the case when there is an intrinsic symmetry in the equations, in particular when \(M=G\), a Lie group, and when there is an appropriate invariance of the Lagrangian with respect to G, these Euler–Lagrange equations, defined on the tangent bundle of the group TG, can be substantially simplified, which is the topic of the Euler–Poincaré description of motion (Holm 2011; Poincaré 1901). More precisely, if the Lagrangian L is left-invariant, i.e., \(L(hg,h {\dot{g}})=L(g,{\dot{g}})\) \(\forall h \in G\), we can define the symmetry-reduced Lagrangian through the symmetry reduction \(\ell =\ell (g^{-1} {\dot{g}})\). Then, the equations of motion (2.2) are equivalent to the Euler–Poincaré equations of motion obtained from the variational principle

$$\begin{aligned} \delta \int _a^b \ell (\xi ) \text{ d } t =0 \, , \quad \text{ for } \text{ variations } \quad \delta \xi = {\dot{\eta }} + \mathrm{ad}_\xi \eta \, , \forall \eta (t):\, \eta (a)=\eta (b)=0. \end{aligned}$$
(2.3)

The variations \(\eta (t)\), assumed to be sufficiently smooth, are sometimes called free variations. Application of the variational principle (2.3) yields the Euler–Poincaré equations of motion:

$$\begin{aligned} \frac{\mathrm {d} }{\mathrm {d} t} \frac{\delta \ell }{\delta \xi } - \mathrm{ad}^*_ \xi \frac{\delta \ell }{\delta \xi } =0. \end{aligned}$$
(2.4)

For right-invariant Lagrangians, i.e., \(L(gh,{\dot{g}}h)=L(g,{\dot{g}})\) \(\forall h \in G\), the Euler–Poincaré equations of motion (2.4) change by altering the sign in front of \(\mathrm{ad}^*_\xi \) from minus to plus. In what follows, we shall only consider left-invariant systems for simplicity of exposition.

As an illustrative example, let us consider the motion of a rigid body rotating about its center of mass, fixed in space, with the unreduced Lagrangian defined as \(L=L(\Lambda , {\dot{\Lambda }})\), \(\Lambda \in \textit{SO}(3)\). The fact that the Lagrangian is left-invariant comes from the physical fact that the Lagrangian of a rigid body is invariant under rotations. The Lagrangian is then just the kinetic energy, \(L=L(\varvec{\Omega })=\frac{1}{2} {\mathbb {I}} \varvec{\Omega }\cdot \varvec{\Omega }\), with \({\mathbb {I}}\) being the inertia tensor and \(\varvec{\Omega }=\left( \Lambda ^{-1} {\dot{\Lambda }}\right) ^\vee \). With respect to application of the group on the left, which corresponds to the description of the equations of motion in the body frame, the symmetry-reduced Lagrangian should be of the form \(\ell \left( \Lambda ^{-1} {\dot{\Lambda }} \right) \). Here, we have defined the hat map \({\textvisiblespace }^\wedge : {\mathbb {R}}^3 \rightarrow \mathfrak {so}(3)\) and its inverse \({\textvisiblespace }^\vee : \mathfrak {so}(3) \rightarrow {\mathbb {R}}^3\) to be isomorphisms between \(\mathfrak {so}(3)\) (antisymmetric matrices) and \({\mathbb {R}}^3\) (vectors), computed as \(\widehat{a}_{ij}=- \epsilon _{i j k} a_k\). Then, \(\mathrm{ad}^*_{\varvec{\Omega }} \varvec{\Pi }= - \varvec{\Omega }\times \varvec{\Pi }\) and the Euler–Poincaré equations of motion for the rigid body become

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d} t} \varvec{\Pi }+ \varvec{\Omega }\times \varvec{\Pi }=\mathbf {0}, \quad \varvec{\Pi }:= \frac{\partial \ell }{\partial \varvec{\Omega }} = {\mathbb {I}} \varvec{\Omega }\, , \end{aligned}$$
(2.5)

which are the well-known Euler equations of motion for a rigid body rotating about its center of mass, fixed in space.
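
As a numerical sanity check on (2.5), the following sketch, an illustration with hypothetical values not taken from the paper, integrates the Euler equations for a triaxial body with \({\mathbb {I}}=\mathrm {diag}(1,2,3)\) and verifies the two classical invariants of the free rigid body: the kinetic energy and the magnitude of \(\varvec{\Pi }\).

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical principal moments of inertia
I = np.array([1.0, 2.0, 3.0])

def euler_rhs(t, Pi):
    Omega = Pi / I               # Omega = I^{-1} Pi
    return -np.cross(Omega, Pi)  # Eq. (2.5): Pidot = -Omega x Pi

Pi0 = np.array([0.5, 1.0, -0.7])
sol = solve_ivp(euler_rhs, (0.0, 10.0), Pi0, rtol=1e-10, atol=1e-12)
Pi_end = sol.y[:, -1]

# Both invariants should be preserved to integrator tolerance
energy = lambda Pi: 0.5 * np.dot(Pi / I, Pi)
print(energy(Pi_end) - energy(Pi0))               # ~ 0
print(np.dot(Pi_end, Pi_end) - np.dot(Pi0, Pi0))  # ~ 0
```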

2.2 Nonholonomic Constraints and Lagrange–d’Alembert’s Principle

Suppose a mechanical system having configuration space M, a manifold of dimension n, must satisfy \(m < n\) constraints that are linear in velocity. To express these velocity constraints formally, the notion of a distribution is needed. Given the manifold M, a distribution \({\mathcal {D}}\) on M is a subset of the tangent bundle \(\textit{TM} = \bigcup _{q \in M} T_q M\): \({\mathcal {D}} = \bigcup _{q \in M} {\mathcal {D}}_q\), where \({\mathcal {D}}_q \subset T_q M\) and \(m = \mathrm {dim} \, {\mathcal {D}}_q < \mathrm {dim} \, T_q M = n\) for each \(q \in M\). A curve \(q(t) \in M\) satisfies the constraints if \({\dot{q}}(t) \in {\mathcal {D}}_{q(t)}\). Lagrange–d’Alembert’s principle states that the equations of motion are determined by

$$\begin{aligned} \delta \int _a^b L(q, {\dot{q}}) \mathrm {d} t=0 \Leftrightarrow \int _a^b \left[ \frac{\mathrm {d} }{\mathrm {d} t} \frac{\partial L}{\partial {\dot{q}}}- \frac{\partial L}{\partial q} \right] \delta q \, \mathrm {d} t = 0 \Leftrightarrow \frac{\mathrm {d} }{\mathrm {d} t} \frac{\partial L}{\partial {\dot{q}}}- \frac{\partial L}{\partial q} \in {\mathcal {D}}_q^\circ \end{aligned}$$
(2.6)

for all smooth variations \(\delta q(t)\) of the curve q(t) such that \(\delta q(t) \in {\mathcal {D}}_{q(t)}\) for all \(a\le t \le b\) and such that \(\delta q(a)=\delta q(b)=0\), and for which \({\dot{q}}(t) \in {\mathcal {D}}_{q(t)}\) for all \(a\le t \le b\). If one writes the nonholonomic constraint in local coordinates as \(\sum _{i=1}^n A(q)^j_i {\dot{q}}^i=0\), \(j=1, \ldots , m < n\), then (2.6) is written in local coordinates as

$$\begin{aligned} \frac{\mathrm {d} }{\mathrm {d} t} \frac{\partial L}{\partial {{\dot{q}}}^i}- \frac{\partial L}{\partial q^i} = \sum _{j=1}^m \lambda _j A(q)^j_i \, , \quad i=1,\ldots ,n \, , \quad \sum _{i=1}^n A(q)^j_i {{\dot{q}}}^i=0 , \end{aligned}$$
(2.7)

where the \(\lambda _j\) are Lagrange multipliers enforcing \(\sum _{i=1}^n A(q)^j_i {\delta q}^i=0\), \(j=1, \ldots , m\). Aside from Lagrange–d’Alembert’s approach, there is also an alternative vakonomic approach to derive the equations of motion for nonholonomic mechanical systems. Simply speaking, that approach relies on substituting the constraint into the Lagrangian before taking variations or, equivalently, enforcing the constraints using the appropriate Lagrange multiplier method. In general, it is an experimental fact that all known nonholonomic mechanical systems obey the equations of motion resulting from Lagrange–d’Alembert’s principle (Lewis and Murray 1995).
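
The multiplier elimination implicit in (2.7) can be illustrated on a standard textbook example not discussed in this paper: a knife edge, i.e., a point mass with a rotational degree of freedom \(\varphi \) whose velocity must stay along the blade, so \(A(q) = (\sin \varphi , -\cos \varphi , 0)\) with \(m=1\) constraint. Differentiating the constraint in time and substituting the dynamics yields \(\lambda \) explicitly; the mass and moment of inertia below are hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp

mass, J = 1.0, 0.5  # hypothetical mass and moment of inertia

def rhs(t, s):
    x, y, phi, xd, yd, phid = s
    # Differentiate the constraint xd*sin(phi) - yd*cos(phi) = 0 in time
    # and substitute the dynamics of (2.7) to solve for the multiplier:
    lam = -mass * phid * (xd * np.cos(phi) + yd * np.sin(phi))
    # mass*xddot = lam*sin(phi), mass*yddot = -lam*cos(phi), J*phiddot = 0
    return [xd, yd, phid,
            lam * np.sin(phi) / mass, -lam * np.cos(phi) / mass, 0.0]

# Initial velocity chosen along the blade so the constraint holds at t = 0
phi0, v0, w0 = 0.3, 1.0, 0.7
s0 = [0.0, 0.0, phi0, v0 * np.cos(phi0), v0 * np.sin(phi0), w0]
sol = solve_ivp(rhs, (0.0, 10.0), s0, rtol=1e-10, atol=1e-12)

x, y, phi, xd, yd, phid = sol.y[:, -1]
print(xd * np.sin(phi) - yd * np.cos(phi))  # constraint residual ~ 0
```

Since the constraint force \(\lambda A(q)\) does no work along admissible velocities, the kinetic energy of this example is also conserved.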

2.3 Optimal Control and Pontryagin’s Minimum Principle

Given a dynamical system with a state \({\mathbf {x} }\) in \({\mathbb {R}}^n\), a fixed initial time a, and a fixed or free terminal time \(b>a\), suppose it is desired to find a control \(\mathbf {u}\) in \({\mathbb {R}}^k\) that minimizes

$$\begin{aligned} \int _a^b L({\mathbf {x} },\mathbf {u},t) \mathrm {d} t, \end{aligned}$$
(2.8)

subject to satisfying the equations of motion \({\dot{{\mathbf {x} }}} = \mathbf {f}( {\mathbf {x} }, \mathbf {u},t)\), \(m_1\) initial conditions \(\varvec{\phi }({\mathbf {x} }(a))=\mathbf {0}\), and \(m_2\) terminal conditions \(\varvec{\psi }({\mathbf {x} }(b),b)=\mathbf {0}\). Following Bryson (1975), this problem may be solved by the method of Lagrange multipliers: find \(\mathbf {u}\), \({\mathbf {x} }(a)\), and b that minimize

$$\begin{aligned} S = \left\langle \varvec{\rho }, \varvec{\phi }({\mathbf {x} }(a)) \right\rangle + \left\langle \varvec{\nu },\varvec{\psi }({\mathbf {x} }(b),b) \right\rangle + \int _a^b \left[ L({\mathbf {x} }, \mathbf {u},t) +\left\langle \varvec{\pi },\mathbf {f}({\mathbf {x} }, \mathbf {u},t) - \dot{\mathbf {x}} \right\rangle \right] \mathrm {d} t, \end{aligned}$$
(2.9)

for an \(m_1\)-dimensional constant Lagrange multiplier vector \(\varvec{\rho }\), an \(m_2\)-dimensional constant Lagrange multiplier vector \(\varvec{\nu }\), and an n-dimensional time-varying Lagrange multiplier vector \(\varvec{\pi }\). Defining the Hamiltonian H as \(H({\mathbf {x} }, \mathbf {u},\varvec{\pi },t) = L({\mathbf {x} }, \mathbf {u},t)+\left\langle \varvec{\pi },\mathbf {f}({\mathbf {x} }, \mathbf {u},t)\right\rangle \) and by integrating by parts, S becomes

$$\begin{aligned} S = \left\langle \varvec{\rho }, \varvec{\phi }({\mathbf {x} }(a)) \right\rangle + \left\langle \varvec{\nu },\varvec{\psi }({\mathbf {x} }(b),b) \right\rangle + \int _a^b \left[ H({\mathbf {x} }, \mathbf {u},\varvec{\pi },t)+\left\langle {\dot{\varvec{\pi }}}, {\mathbf {x} }\right\rangle \right] \mathrm {d} t - \left. \left\langle \varvec{\pi },{\mathbf {x} }\right\rangle \right| _a^b.\nonumber \\ \end{aligned}$$
(2.10)

Before proceeding further with Pontryagin’s minimum principle, some terminology from the calculus of variations is briefly reviewed. Suppose that y is a time-dependent function, w is a time-independent variable, and Q is a scalar-valued functional that depends on y and w. The variation of y is \(\delta y \equiv \left. \frac{\partial y}{\partial \epsilon } \right| _{\epsilon =0}\), the differential of y is \(\mathrm {d} y \equiv \delta y + {\dot{y}} \mathrm {d}t = \left. \frac{\partial y}{\partial \epsilon } \right| _{\epsilon =0}+\left. \frac{\partial y}{\partial t} \right| _{\epsilon =0} \mathrm {d}t\), and the differential of w is \(\mathrm {d} w \equiv \left. \frac{\mathrm {d} w}{\mathrm {d} \epsilon } \right| _{\epsilon =0}\), where \(\epsilon \) represents an independent “variational” variable. The variation of Q with respect to y is \(\delta _y Q \equiv \frac{\partial Q}{\partial y} \delta y\), while the differential of Q with respect to w is \(\mathrm {d}_{w} Q \equiv \frac{\partial Q}{\partial w} \mathrm {d} w\). The total differential (or for brevity “the differential”) of Q is \(\mathrm {d} Q \equiv \delta _y Q + \mathrm {d}_{w} Q =\frac{\partial Q}{\partial y} \delta y + \frac{\partial Q}{\partial w} \mathrm {d} w\). Colloquially, the variation of Q with respect to y means the change in Q due to a small change in y, the differential of Q with respect to w means the change in Q due to a small change in w, and the total differential of Q means the change in Q due to small changes in y and w. The extension to vectors of time-dependent functions and time-independent variables is componentwise: if \(\mathbf {y}\) is a vector of time-dependent functions and \(\mathbf {w}\) is a vector of time-independent variables, each of the definitions above applies verbatim with y and w replaced by \(\mathbf {y}\) and \(\mathbf {w}\), e.g., \(\mathrm {d} Q \equiv \delta _\mathbf {y} Q + \mathrm {d}_{\mathbf {w}} Q =\frac{\partial Q}{\partial \mathbf {y}} \delta \mathbf {y} + \frac{\partial Q}{\partial \mathbf {w}} \mathrm {d} \mathbf {w}\).

Returning to (2.10), demanding that \(\mathrm {d} S=0\) for all variations \(\delta {\mathbf {x} }\), \(\delta \mathbf {u}\), and \(\delta \varvec{\pi }\) and for all differentials \(\mathrm {d} \varvec{\rho }\), \(\mathrm {d} \varvec{\nu }\), \(\mathrm {d} {\mathbf {x} }(b)\), and \(\mathrm {d} b\) gives the optimally controlled equations of motion, which are canonical in the variables \({\mathbf {x} }\) and \(\varvec{\pi }\),

$$\begin{aligned}&{\dot{{\mathbf {x} }}} =\left( \frac{\partial H}{\partial \varvec{\pi }} \right) ^\mathsf {T} = \mathbf {f}( {\mathbf {x} }, \mathbf {u},t) , \end{aligned}$$
(2.11)
$$\begin{aligned}&{\dot{\varvec{\pi }}}=-\left( \frac{\partial H}{\partial {\mathbf {x} }} \right) ^\mathsf {T}= - \left( \frac{\partial \mathbf {f}}{\partial {\mathbf {x} }} \right) ^\mathsf {T} \varvec{\pi }- \left( \frac{\partial L}{\partial {\mathbf {x} }} \right) ^\mathsf {T}. \end{aligned}$$
(2.12)

In addition to these equations, the solution must satisfy the optimality condition

$$\begin{aligned} \left( \frac{\partial H}{\partial \mathbf {u}} \right) ^\mathsf {T} = \left( \frac{\partial \mathbf {f}}{\partial \mathbf {u}} \right) ^\mathsf {T} \varvec{\pi }+ \left( \frac{\partial L}{\partial \mathbf {u}} \right) ^\mathsf {T}= \mathbf {0}, \end{aligned}$$
(2.13)

and the boundary conditions which are obtained by equating the appropriate boundary terms involving variations or differentials to zero. The left boundary conditions are

$$\begin{aligned} \varvec{\phi }({\mathbf {x} }(a))= & {} \mathbf {0}, \end{aligned}$$
(2.14)
$$\begin{aligned} \left[ \left( \frac{\partial \varvec{\phi }}{\partial {\mathbf {x} }} \right) ^\mathsf {T}\varvec{\rho }+\varvec{\pi }\right] _{t=a}= & {} \mathbf {0}, \end{aligned}$$
(2.15)

and the right boundary conditions are

$$\begin{aligned} \varvec{\psi }({\mathbf {x} }(b),b)= & {} \mathbf {0}, \end{aligned}$$
(2.16)
$$\begin{aligned} \left[ \left( \frac{\partial \varvec{\psi }}{\partial {\mathbf {x} }} \right) ^\mathsf {T}\varvec{\nu }-\varvec{\pi }\right] _{t=b}= & {} \mathbf {0}, \end{aligned}$$
(2.17)
$$\begin{aligned} \left[ \left( \frac{\partial \varvec{\psi }}{\partial t} \right) ^\mathsf {T} \varvec{\nu }+H \right] _{t=b}= & {} \left[ \left( \frac{\partial \varvec{\psi }}{\partial t} \right) ^\mathsf {T} \varvec{\nu }+L+ \varvec{\pi }^\mathsf {T} \mathbf {f} \right] _{t=b} = 0, \end{aligned}$$
(2.18)

where the final right boundary condition (2.18) is only needed if the terminal time b is free. If \(H_{\mathbf {u}\mathbf {u}}\) is nonsingular, the implicit function theorem guarantees that the optimality condition (2.13) determines the k-vector \(\mathbf {u}\). Equations (2.11) and (2.12), together with the boundary conditions (2.14)–(2.18), define a two-point boundary value problem (TPBVP). If the terminal time b is fixed, then the solution of the 2n differential equations (2.11) and (2.12) and the choice of the \(m_1+m_2\) free parameters \(\varvec{\rho }\) and \(\varvec{\nu }\) are determined by the \(2n+m_1+m_2\) boundary conditions (2.14)–(2.17). If the terminal time b is free, then the solution of the 2n ordinary differential equations (2.11) and (2.12) and the choice of the \(m_1+m_2+1\) free parameters \(\varvec{\rho }\), \(\varvec{\nu }\), and b are determined by the \(2n+m_1+m_2+1\) boundary conditions (2.14)–(2.18).
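
The structure of this TPBVP can be illustrated on a minimal textbook example not taken from this paper: a double integrator with quadratic cost and fixed terminal time, with \(\varvec{\phi }\) and \(\varvec{\psi }\) simply pinning down the states at both ends, so (2.18) is not needed. The sketch below solves the resulting boundary value problem with SciPy’s collocation solver; the exact optimum is \(x_1(t)=3t^2-2t^3\) with \(u(t)=6-12t\).

```python
import numpy as np
from scipy.integrate import solve_bvp

# Minimize (1/2) int_0^1 u^2 dt for x1dot = x2, x2dot = u,
# with x(0) = (0, 0) and x(1) = (1, 0).
# H = u^2/2 + pi1*x2 + pi2*u; (2.13) gives u = -pi2;
# (2.12) gives pi1dot = 0, pi2dot = -pi1.
def odes(t, s):
    x1, x2, p1, p2 = s
    u = -p2
    return np.vstack([x2, u, np.zeros_like(p1), -p1])

def bc(sa, sb):
    # (2.14) and (2.16): two conditions at each end
    return np.array([sa[0], sa[1], sb[0] - 1.0, sb[1]])

t = np.linspace(0.0, 1.0, 21)
sol = solve_bvp(odes, bc, t, np.zeros((4, t.size)))
print(sol.sol(0.5)[0])  # x1(1/2); exact optimum gives 0.5
```

For this linear-quadratic case the TPBVP is linear and the collocation iteration converges from the zero initial guess; in general, a good initial guess for the state and costate is essential.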

3 Suslov’s Optimal Control Problem for an Arbitrary Group

3.1 Derivation of Suslov’s Pure Equations of Motion

Suppose G is a Lie group having all appropriate properties for the application of the Euler–Poincaré theory (Marsden and Ratiu 2013; Holm 2011). As we mentioned above, if the Lagrangian \(L=L(g,{\dot{g}})\) is left G-invariant, then the problem can be reduced to the consideration of the symmetry-reduced Lagrangian \(\ell (\Omega )\) with \(\Omega =g^{-1} {\dot{g}}\). Here, we concentrate on the left-invariant Lagrangians as being pertinent to the dynamics of a rigid body. A parallel theory of right-invariant Lagrangians can be developed as well in a completely equivalent fashion (Holm 2011). We also assume that there is a suitable pairing between the Lie algebra \({\mathfrak {g}}\) and its dual \({\mathfrak {g}}^*\), which leads to the co-adjoint operator

$$\begin{aligned} \left\langle \mathrm{ad}^*_a\alpha , b \right\rangle := \left\langle \alpha , \mathrm{ad}_a b \right\rangle \quad \forall a, b \in {\mathfrak {g}}\, , \alpha \in {\mathfrak {g}}^*. \end{aligned}$$

Then, the equations of motion are obtained by Euler–Poincaré’s variational principle

$$\begin{aligned} \delta \int _a^b \ell (\Omega ) \text{ d } t =0 \quad \end{aligned}$$
(3.1)

with the variations \(\delta \Omega \) satisfying

$$\begin{aligned} \delta \Omega = {\dot{\eta }} + \mathrm{ad}_\Omega \eta \, , \end{aligned}$$
(3.2)

where \(\eta (t)\) is an arbitrary \({\mathfrak {g}}\)-valued function satisfying \(\eta (a)=\eta (b)=0\). Then, the equations of motion are the Euler–Poincaré equations of motion

$$\begin{aligned} {\dot{\Pi }} - \mathrm{ad}^*_\Omega \Pi =0 \, , \quad \Pi := \frac{\delta \ell }{\delta \Omega } . \end{aligned}$$
(3.3)

Let \(\xi (t) \in {\mathfrak {g}}^*\), with \(\xi (t) \ne 0\) for all t, and introduce the constraint

$$\begin{aligned} \left\langle \xi ,\Omega \right\rangle =\gamma \left( \xi ,t \right) . \end{aligned}$$
(3.4)

Due to the constraint (3.4), Lagrange–d’Alembert’s principle states that the variations \(\eta \in {\mathfrak {g}}\) have to satisfy

$$\begin{aligned} \left\langle \xi , \eta \right\rangle =0. \end{aligned}$$
(3.5)

Using (3.5), Suslov’s pure equations of motion are obtained:

$$\begin{aligned} {\dot{\Pi }} - \mathrm{ad}^*_{\Omega } \Pi =\lambda \xi , \quad \Pi := \frac{\delta \ell }{\delta \Omega }, \end{aligned}$$
(3.6)

where \(\lambda \) is the Lagrange multiplier enforcing (3.5). In order to explicitly solve (3.6) for \(\lambda \), we will need to further assume a linear relationship between the angular momentum \(\Pi \) and the angular velocity \(\Omega \). Thus, we assume that \(\Pi = {\mathbb {I}} \Omega \), where \({\mathbb {I}} :{\mathfrak {g}} \rightarrow {\mathfrak {g}}^*\) is an invertible linear operator with an adjoint \({\mathbb {I}}^* :{\mathfrak {g}}^* \rightarrow {\mathfrak {g}}\); \({\mathbb {I}}\) has the physical meaning of the inertia operator when the Lie group G under consideration is the rotation group \(\textit{SO}(3)\). Under this assumption, we pair both sides of (3.6) with \({\mathbb {I}}^{-1*} \xi \) and obtain the following expression for the Lagrange multiplier \(\lambda \):

$$\begin{aligned} \lambda \left( \Omega , \xi \right) =\frac{ \displaystyle \frac{\mathrm {d} }{\mathrm {d} t} \left[ \gamma (\xi ,t) \right] - \left\langle {\mathbb {I}} \Omega ,\displaystyle \frac{\mathrm {d} }{\mathrm {d} t} \left[ {\mathbb {I}}^{-1*} \xi \right] + \mathrm{ad}_\Omega {\mathbb {I}}^{-1*} \xi \right\rangle }{\left\langle \xi , {\mathbb {I}}^{-1*} \xi \right\rangle }. \end{aligned}$$
(3.7)

If we, moreover, assume that \(\gamma (\xi ,t)\) is a constant, e.g., \(\gamma (\xi ,t)=0\) as in the standard formulation of Suslov’s problem, and that \({\mathbb {I}} :{\mathfrak {g}} \rightarrow {\mathfrak {g}}^*\) is a time-independent, invertible linear operator that is also self-adjoint (i.e., \({\mathbb {I}}={\mathbb {I}}^*\)), then (3.7) simplifies to

$$\begin{aligned} \lambda \left( \Omega , \xi \right) =-\frac{ \left\langle {\mathbb {I}} \Omega , \mathrm{ad}_\Omega {\mathbb {I}}^{-1} \xi \right\rangle +\left\langle \Omega , {\dot{\xi }} \right\rangle }{\left\langle \xi , {\mathbb {I}}^{-1} \xi \right\rangle }, \end{aligned}$$
(3.8)

the kinetic energy is

$$\begin{aligned} T(t) = \frac{1}{2} \left\langle \Omega , \Pi \right\rangle = \frac{1}{2} \left\langle \Omega , {\mathbb {I}} \Omega \right\rangle , \end{aligned}$$
(3.9)

the time derivative of the kinetic energy is

$$\begin{aligned} \begin{aligned} {\dot{T}}(t)&= \frac{1}{2} \left[ \left\langle {\dot{\Omega }}, {\mathbb {I}} \Omega \right\rangle +\left\langle \Omega , {\mathbb {I}} {\dot{\Omega }} \right\rangle \right] = \left\langle \Omega , \frac{1}{2} \left[ {\mathbb {I}} + {\mathbb {I}}^* \right] {\dot{\Omega }} \right\rangle = \left\langle \Omega , {\mathbb {I}} {\dot{\Omega }} \right\rangle = \left\langle \Omega , {\dot{\Pi }} \right\rangle \\&= \left\langle \Omega , \mathrm{ad}^*_{\Omega } \Pi +\lambda \xi \right\rangle = \left\langle \mathrm{ad}_{\Omega } \Omega , \Pi \right\rangle + \lambda \left\langle \Omega , \xi \right\rangle = \lambda \gamma \left( \xi ,t \right) , \end{aligned} \end{aligned}$$
(3.10)

and kinetic energy is conserved if \(\gamma (\xi ,t)=0\).
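
For \(G=\textit{SO}(3)\), treated in detail in Sect. 4, the above formulas can be checked numerically. The following sketch assumes a diagonal inertia operator, a fixed nullifier \(\xi \) (so \({\dot{\xi }}=0\)), and \(\gamma =0\); all numerical values are illustrative. With \(\mathrm{ad}_\Omega \eta = \varvec{\Omega }\times \varvec{\eta }\) and \(\mathrm{ad}^*_\Omega \Pi = -\varvec{\Omega }\times \varvec{\Pi }\), equations (3.6) and (3.8) become the system integrated below, and both the constraint (3.4) and the kinetic energy (3.9) should be preserved along the flow.

```python
import numpy as np
from scipy.integrate import solve_ivp

I = np.array([1.0, 2.0, 3.0])   # hypothetical principal inertia tensor
xi = np.array([0.0, 0.0, 1.0])  # fixed nullifier axis, so xidot = 0

def suslov_rhs(t, Omega):
    Pi = I * Omega
    # Eq. (3.8): lam = -<I Omega, Omega x I^{-1} xi> / <xi, I^{-1} xi>
    lam = -np.dot(Pi, np.cross(Omega, xi / I)) / np.dot(xi, xi / I)
    # Eq. (3.6) on so(3): Pidot + Omega x Pi = lam * xi
    Pidot = -np.cross(Omega, Pi) + lam * xi
    return Pidot / I

Omega0 = np.array([0.8, -0.4, 0.0])  # satisfies <xi, Omega> = 0
sol = solve_ivp(suslov_rhs, (0.0, 10.0), Omega0, rtol=1e-10, atol=1e-12)
Om = sol.y[:, -1]

print(np.dot(xi, Om))  # constraint (3.4) with gamma = 0: stays ~ 0
print(0.5 * np.dot(Om, I * Om) - 0.5 * np.dot(Omega0, I * Omega0))  # energy ~ 0
```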

3.2 Derivation of Suslov’s Optimally Controlled Equations of Motion

Consider the problem (3.6) and assume that \(\Pi = {\mathbb {I}} \Omega \) so that the explicit equation for the Lagrange multiplier (3.7) holds. We now turn to the central question of the paper, namely optimal control of the system by varying the nullifier (or annihilator) \(\xi (t)\). The optimal control problem is defined as follows. Consider a fixed initial time a, a fixed or free terminal time \(b > a\), the cost function \(C\left( \Omega ,\dot{\Omega },\xi ,{\dot{\xi }},t\right) \), and the following optimal control problem

$$\begin{aligned} \min _{\xi } \int _a^b C\left( \Omega ,{\dot{\Omega }},\xi ,{\dot{\xi }},t\right) \mathrm {d} t \, , \quad \text{ subject } \text{ to } \quad \left( {\mathbb {I}} \Omega \right) ^\cdot - \mathrm{ad}^*_\Omega {\mathbb {I}} \Omega = \lambda \xi \, , \end{aligned}$$
(3.11)

and subject to the left and right boundary conditions \(\Omega (a)=\Omega _a\) and \(\Omega (b)=\Omega _b\). Construct the performance index

$$\begin{aligned} S= & {} \left\langle \rho ,\Omega (a)-\Omega _a \right\rangle + \left\langle \nu ,\Omega (b)-\Omega _b \right\rangle + \int _a^b \left[ C + \left\langle \kappa , \left( {\mathbb {I}} \Omega \right) ^\cdot - \mathrm{ad}^*_\Omega {\mathbb {I}} \Omega - \lambda \xi \right\rangle \right] \mathrm {d}t \nonumber \\= & {} \left\langle \rho ,\Omega (a)-\Omega _a \right\rangle + \left\langle \nu ,\Omega (b)-\Omega _b \right\rangle + \left. \left\langle \kappa , {\mathbb {I}} \Omega \right\rangle \right| _a^b \nonumber \\&+ \int _a^b \left[ C - \left\langle {\dot{\kappa }} + \mathrm{ad}_\Omega \kappa \, , {\mathbb {I}} \Omega \right\rangle - \lambda \left\langle \kappa , \xi \right\rangle \right] \mathrm {d}t, \end{aligned}$$
(3.12)

where the additional unknowns are a \({\mathfrak {g}}\)-valued function of time \(\kappa (t)\) and the constants \(\rho , \, \nu \in {\mathfrak {g}}^*\) enforcing the boundary conditions.

Remark 3.1

(On the nature of the pairing in (3.12)) For simplicity of calculation and notation, we assume that the pairing in (3.12) between vectors in \({\mathfrak {g}}\) and \({\mathfrak {g}}^*\) is the same as the one used in the derivation of Suslov’s problem in Sect. 3.1. In principle, one could use a different pairing, which would necessitate a different notation for the \(\mathrm{ad}\) operator. We believe that while such a generalization is rather straightforward, it introduces cumbersome and nonintuitive notation. For the case \(G=\textit{SO}(3)\) considered later in Sect. 4, we will take the simplest possible pairing, the scalar product of vectors in \({\mathbb {R}}^3\). In that case, the \(\mathrm{ad}\) and \(\mathrm{ad}^*\) operators are simply the vector cross product with an appropriate sign.

Pontryagin’s minimum principle gives necessary conditions that a minimum solution of (3.11) must satisfy, if it exists. These necessary conditions are obtained by equating the differential of S to 0, resulting in appropriately coupled equations for the state and control variables. While this calculation is well established (Bryson 1975; Hull 2013), we present it here for completeness of the exposition as it is relevant to our further discussion.

Following Bryson (1975), we denote all variations of S coming from the time-dependent variables \(\kappa \), \(\Omega \), and \(\xi \) as \(\delta S\) and write \(\delta S = \delta _{\kappa } S+\delta _{\Omega } S+\delta _{\xi } S\). By using partial differentiation, the variation of S with respect to each time-independent variable \(\rho \), \(\nu \), and b is \(\left\langle \frac{\partial S}{\partial \rho },\mathrm {d} \rho \right\rangle \), \(\left\langle \frac{\partial S}{\partial \nu }, \mathrm {d} \nu \right\rangle \), and \(\frac{\partial S}{\partial b} \mathrm {d} b\), respectively. Thus, the differential of S is given by

$$\begin{aligned} \begin{aligned} \mathrm d S&= \delta S + \left\langle \frac{\partial S}{\partial \rho } , \mathrm {d} \rho \right\rangle + \left\langle \frac{\partial S}{\partial \nu } , \mathrm {d} \nu \right\rangle + \frac{\partial S}{\partial b} \mathrm {d} b \\&= \delta _{\kappa } S+\delta _{\Omega } S+\delta _{\xi } S+ \left\langle \frac{\partial S}{\partial \rho } , \mathrm {d} \rho \right\rangle + \left\langle \frac{\partial S}{\partial \nu } , \mathrm {d} \nu \right\rangle + \frac{\partial S}{\partial b} \mathrm {d} b. \end{aligned} \end{aligned}$$
(3.13)

Each term in \(\mathrm {d} S\) is computed below. It is important to present this calculation in some detail, in particular, because of the contribution of the boundary conditions. The variation of S with respect to \(\kappa \) is

$$\begin{aligned} \delta _{\kappa } S = \int _a^b \left\langle \left( {\mathbb {I}} \Omega \right) ^\cdot - \mathrm{ad}^*_\Omega {\mathbb {I}} \Omega - \lambda \xi , \delta \kappa \right\rangle \mathrm {d}t. \end{aligned}$$
(3.14)

Since \(\delta \Omega (a)=0\), \(\mathrm {d} \Omega (b) = \delta \Omega (b) + {\dot{\Omega }}(b) \mathrm {d} b\), and

$$\begin{aligned} \begin{aligned} \delta _\Omega \left\langle \kappa , \mathrm{ad}^* _\Omega {\mathbb {I}} \Omega \right\rangle&= \left\langle \kappa , \mathrm{ad}^* _{\delta \Omega } {\mathbb {I}} \Omega \right\rangle + \left\langle \kappa , \mathrm{ad}^* _\Omega {\mathbb {I}} \delta \Omega \right\rangle \\&= \left\langle \mathrm{ad}_{\delta \Omega }\kappa , {\mathbb {I}} \Omega \right\rangle + \left\langle \mathrm{ad}_\Omega \kappa , {\mathbb {I}} \delta \Omega \right\rangle \\&= \left\langle -\mathrm{ad}_{\kappa } \delta \Omega , {\mathbb {I}} \Omega \right\rangle + \left\langle {\mathbb {I}}^* \mathrm{ad}_\Omega \kappa , \delta \Omega \right\rangle \\&= \left\langle -\mathrm{ad}^*_{\kappa } {\mathbb {I}} \Omega +{\mathbb {I}}^* \mathrm{ad}_\Omega \kappa , \delta \Omega \right\rangle , \end{aligned} \end{aligned}$$
(3.15)

the variation of S with respect to \(\Omega \) is

$$\begin{aligned} \delta _{\Omega } S= & {} \left\langle \rho ,\delta \Omega (a) \right\rangle + \left\langle \nu ,\delta \Omega (b) \right\rangle +\left. \left\langle \kappa , {\mathbb {I}} \delta \Omega \right\rangle \right| _a^b+\left. \left\langle \frac{\partial C}{\partial {\dot{\Omega }}}, \delta \Omega \right\rangle \right| _a^b\nonumber \\&+ \int _a^b \left\langle \frac{\partial C}{\partial \Omega }-\frac{\mathrm {d} }{\mathrm {d} t}\frac{\partial C}{\partial {\dot{\Omega }}} - {\mathbb {I}}^* \left( {\dot{\kappa }} + \mathrm{ad}_\Omega \kappa \right) + \mathrm{ad}^* _\kappa {\mathbb {I}} \Omega - \frac{\partial \lambda }{\partial \Omega } \left\langle \kappa \, , \, \xi \right\rangle , \delta \Omega \right\rangle \mathrm {d}t \nonumber \\= & {} \left. \left\langle \nu + {\mathbb {I}}^* \kappa +\frac{\partial C}{\partial {\dot{\Omega }}}, \delta \Omega \right\rangle \right| _{t=b} \nonumber \\&+ \int _a^b \left\langle \frac{\partial C}{\partial \Omega }-\frac{\mathrm {d} }{\mathrm {d} t}\frac{\partial C}{\partial {\dot{\Omega }}} - {\mathbb {I}}^* \left( {\dot{\kappa }} + \mathrm{ad}_\Omega \kappa \right) + \mathrm{ad}^* _\kappa {\mathbb {I}} \Omega - \frac{\partial \lambda }{\partial \Omega } \left\langle \kappa \, , \, \xi \right\rangle , \delta \Omega \right\rangle \mathrm {d}t \nonumber \\= & {} \left. \left\langle \nu + {\mathbb {I}}^* \kappa +\frac{\partial C}{\partial {\dot{\Omega }}}, \mathrm {d} \Omega \right\rangle \right| _{t=b}-\left. 
\left\langle \nu + {\mathbb {I}}^* \kappa +\frac{\partial C}{\partial {\dot{\Omega }}}, {\dot{\Omega }} \right\rangle \right| _{t=b} \mathrm {d} b\nonumber \\&+ \int _a^b \left\langle \frac{\partial C}{\partial \Omega }-\frac{\mathrm {d} }{\mathrm {d} t}\frac{\partial C}{\partial {\dot{\Omega }}} - {\mathbb {I}}^* \left( {\dot{\kappa }} + \mathrm{ad}_\Omega \kappa \right) + \mathrm{ad}^* _\kappa {\mathbb {I}} \Omega - \frac{\partial \lambda }{\partial \Omega } \left\langle \kappa \, , \, \xi \right\rangle , \delta \Omega \right\rangle \mathrm {d}t.\nonumber \\ \end{aligned}$$
(3.16)

Since \(\mathrm {d} \xi (b) = \delta \xi (b) + {\dot{\xi }}(b) \mathrm {d} b \), the variation of S with respect to \(\xi \) is

$$\begin{aligned} \begin{aligned} \delta _{\xi } S&= \left. \left\langle \frac{\partial C}{\partial {\dot{\xi }}} - \left\langle \kappa \, , \, \xi \right\rangle \frac{\partial \lambda }{\partial {\dot{\xi }}}, \delta \xi \right\rangle \right| _a^b \\&\qquad + \int _a^b \left\langle -\frac{\mathrm d}{\mathrm d t} \left( \frac{\partial C}{\partial {\dot{\xi }}} - \left\langle \kappa \, , \, \xi \right\rangle \frac{\partial \lambda }{\partial {\dot{\xi }}} \right) + \left( \frac{\partial C}{\partial \xi } - \left\langle \kappa \, , \, \xi \right\rangle \frac{\partial \lambda }{\partial \xi } \right) - \lambda \kappa , \delta \xi \right\rangle \mathrm {d}t \\&= \left. \left\langle \frac{\partial C}{\partial {\dot{\xi }}} - \left\langle \kappa \, , \, \xi \right\rangle \frac{\partial \lambda }{\partial {\dot{\xi }}}, \mathrm {d} \xi \right\rangle \right| _{t=b} \\&\qquad - \left. \left\langle \frac{\partial C}{\partial {\dot{\xi }}} - \left\langle \kappa \, , \, \xi \right\rangle \frac{\partial \lambda }{\partial {\dot{\xi }}}, {\dot{\xi }} \right\rangle \right| _{t=b} \mathrm {d} b - \left. \left\langle \frac{\partial C}{\partial {\dot{\xi }}} - \left\langle \kappa \, , \, \xi \right\rangle \frac{\partial \lambda }{\partial {\dot{\xi }}}, \delta \xi \right\rangle \right| _{t=a} \\&\qquad + \int _a^b \left\langle -\frac{\mathrm d}{\mathrm d t} \left( \frac{\partial C}{\partial {\dot{\xi }}} - \left\langle \kappa \, , \, \xi \right\rangle \frac{\partial \lambda }{\partial {\dot{\xi }}} \right) + \left( \frac{\partial C}{\partial \xi } - \left\langle \kappa \, , \, \xi \right\rangle \frac{\partial \lambda }{\partial \xi } \right) - \lambda \kappa , \delta \xi \right\rangle \mathrm {d}t. \end{aligned} \end{aligned}$$
(3.17)

The remaining terms in \(\mathrm {d}S\), due to variations of S with respect to the time-independent variables, are

$$\begin{aligned} \left\langle \frac{\partial S}{\partial \rho } , \mathrm {d} \rho \right\rangle = \left\langle \Omega (a)-\Omega _a , \mathrm {d} \rho \right\rangle , \end{aligned}$$
(3.18)
$$\begin{aligned} \left\langle \frac{\partial S}{\partial \nu } , \mathrm {d} \nu \right\rangle = \left\langle \Omega (b)-\Omega _b , \mathrm {d} \nu \right\rangle , \end{aligned}$$
(3.19)

and

$$\begin{aligned} \frac{\partial S}{\partial b} \mathrm {d} b = \left[ \left\langle \nu ,{\dot{\Omega }} \right\rangle + C + \left\langle \kappa , \left( {\mathbb {I}} \Omega \right) ^\cdot - \mathrm{ad}^*_\Omega {\mathbb {I}} \Omega - \lambda \xi \right\rangle \right] _{t=b} \mathrm {d} b. \end{aligned}$$
(3.20)

Adding all the terms in \(\mathrm {d} S\) together and demanding that \(\mathrm {d} S=0\) for all \(\delta \kappa \), \(\delta \Omega \), \(\delta \xi \), \(\mathrm {d} \Omega (b)\), \(\mathrm {d} \xi (b)\), \(\mathrm {d} \rho \), \(\mathrm {d} \nu \), and \(\mathrm {d} b\) (note here that \(\delta \kappa \), \(\delta \Omega \), and \(\delta \xi \) are variations defined for \(a \le t \le b\)) gives the two-point boundary value problem defined by the following equations of motion on \(a\le t \le b\)

$$\begin{aligned} \delta \kappa :&\quad \left( {\mathbb {I}} \Omega \right) ^\cdot - \mathrm{ad}^*_\Omega {\mathbb {I}} \Omega - \lambda \xi = 0 \end{aligned}$$
(3.21)
$$\begin{aligned} \delta \Omega :&\quad \frac{\partial C}{\partial \Omega }-\frac{\mathrm {d} }{\mathrm {d} t}\frac{\partial C}{\partial {\dot{\Omega }}} - {\mathbb {I}}^* \left( {\dot{\kappa }} + \mathrm{ad}_\Omega \kappa \right) + \mathrm{ad}^* _\kappa {\mathbb {I}} \Omega - \frac{\partial \lambda }{\partial \Omega } \left\langle \kappa \, , \, \xi \right\rangle =0 \quad \end{aligned}$$
(3.22)
$$\begin{aligned} \delta \xi :&\quad -\frac{\mathrm d}{\mathrm d t} \left( \frac{\partial C}{\partial {\dot{\xi }}} - \left\langle \kappa \, , \, \xi \right\rangle \frac{\partial \lambda }{\partial {\dot{\xi }}} \right) + \left( \frac{\partial C}{\partial \xi } - \left\langle \kappa \, , \, \xi \right\rangle \frac{\partial \lambda }{\partial \xi } \right) - \lambda \kappa =0 \end{aligned}$$
(3.23)

the left boundary conditions at \(t=a\)

$$\begin{aligned} \mathrm {d} \rho :&\quad \Omega (a) = \Omega _a \end{aligned}$$
(3.24)
$$\begin{aligned} \delta \xi (a):&\quad \quad \left[ \frac{\partial C}{\partial {\dot{\xi }}} - \left\langle \kappa \, , \, \xi \right\rangle \frac{\partial \lambda }{\partial {\dot{\xi }}} \right] _{t=a} = 0 \end{aligned}$$
(3.25)

and the right boundary conditions at \(t=b\)

$$\begin{aligned} \mathrm {d} \nu :&\quad \Omega (b) = \Omega _b \end{aligned}$$
(3.26)
$$\begin{aligned} \mathrm {d} \xi (b):&\quad \left[ \frac{\partial C}{\partial {\dot{\xi }}} - \left\langle \kappa \, , \, \xi \right\rangle \frac{\partial \lambda }{\partial {\dot{\xi }}} \right] _{t=b} = 0 \end{aligned}$$
(3.27)
$$\begin{aligned} \mathrm {d} b:&\quad \left[ C-\left\langle \frac{\partial C}{\partial {\dot{\Omega }}}, {\dot{\Omega }} \right\rangle - \left\langle \kappa ,- {\dot{{\mathbb {I}}}} \Omega + \mathrm{ad}^*_\Omega {\mathbb {I}} \Omega + \lambda \xi \right\rangle \right] _{t=b} = 0 \end{aligned}$$
(3.28)

where \(\lambda \) is given by (3.7) and the final right boundary condition (3.28) is only needed if the terminal time b is free. Equations (3.21), (3.22), and (3.23) together with the left boundary conditions (3.24)–(3.25) and the right boundary conditions (3.26)–(3.27) and, if needed, (3.28), constitute the optimally controlled equations of motion for Suslov’s problem using change in the nonholonomic constraint direction as the control.

4 Suslov’s Optimal Control Problem for Rigid Body Motion

4.1 Derivation of Suslov’s Pure Equations of Motion

Having discussed the formulation of Suslov’s problem in the general case for an arbitrary group, let us now turn our attention to the case of the particular Lie group \(G=\textit{SO}(3)\), which represents Suslov’s problem in its original formulation and where the unreduced Lagrangian is \(L=L(\Lambda , {\dot{\Lambda }})\), with \(\Lambda \in \textit{SO}(3)\). Suslov’s problem studies the behavior of the body angular velocity \(\varvec{\Omega }\equiv \left[ \Lambda ^{-1} {\dot{\Lambda }} \right] ^\vee \in {\mathbb {R}}^3\) subject to the nonholonomic constraint

$$\begin{aligned} \left\langle \varvec{\Omega }, \varvec{\xi }\right\rangle =0 \, \end{aligned}$$
(4.1)

for some prescribed, possibly time-varying vector \(\varvec{\xi }\in {\mathbb {R}}^3\) expressed in the body frame. Physically, such a system corresponds to a rigid body rotating about a fixed point, with the body angular velocity required to be normal to the prescribed vector \(\varvec{\xi }(t)\in {\mathbb {R}}^3\). The fact that the vector \(\varvec{\xi }\) defining the nonholonomic constraint is given in the body frame makes direct physical interpretation and realization of Suslov’s problem somewhat challenging. Still, Suslov’s problem is perhaps one of the simplest and, at the same time, most insightful and pedagogical problems in the field of nonholonomic mechanics, and it has attracted considerable attention in the literature. The original formulation of this problem is due to Suslov in 1902 (Suslov 1946), a work still available only in Russian, where \(\varvec{\xi }\) was assumed to be constant. The present paper considers the more general case where \(\varvec{\xi }\) varies with time. To match the standard state-space notation of control theory, the control is taken to be \(\varvec{u}= {\dot{\varvec{\xi }}}\). We shall also note that the control-theoretic treatment of unconstrained rigid body motion from the geometric point of view is discussed in detail in Agrachev and Sachkov (2004), Chapters 19 (for general compact Lie groups) and 22.

For conciseness, the time dependence of \(\varvec{\xi }\) is often suppressed in what follows. We shall note that there is a more general formulation of Suslov’s problem when \(G=\textit{SO}(3)\) which includes a potential energy in the Lagrangian,

$$\begin{aligned} \ell (\varvec{\Omega },\varvec{\Gamma })=\frac{1}{2} \left\langle {\mathbb {I}} \varvec{\Omega }, \varvec{\Omega }\right\rangle -U(\varvec{\Gamma }), \quad \varvec{\Gamma }= \Lambda ^{-1} \mathbf {e}_3. \end{aligned}$$
(4.2)

Depending on the type of potential energy, there are up to three additional integrals of motion. For a review of Suslov’s problem and a summary of results in this area, the reader is referred to the article by Kozlov (2002).

Let us choose a body frame coordinate system with an orthonormal basis \(\left( \mathbf {E}_1,\mathbf {E}_2,\mathbf {E}_3 \right) \) in which the rigid body’s inertia matrix \({\mathbb {I}}\) is diagonal (i.e., \({\mathbb {I}} = \mathrm {diag}\left( {\mathbb {I}}_1,{\mathbb {I}}_2,{\mathbb {I}}_3\right) \)) and suppose henceforth that all body frame tensors are expressed with respect to this particular choice of coordinate system. Let \(\left( \mathbf {e}_1,\mathbf {e}_2,\mathbf {e}_3 \right) \) denote the orthonormal basis for the spatial frame coordinate system, and denote the transformation from the body to the spatial frame by the rotation matrix \(\Lambda (t) \in \textit{SO}(3)\). The rigid body’s Lagrangian is its kinetic energy, \(\ell = \frac{1}{2} \left\langle {\mathbb {I}} \varvec{\Omega }, \varvec{\Omega }\right\rangle \). Applying the Lagrange–d’Alembert principle to the nonholonomic constraint (4.1) yields the equations of motion

$$\begin{aligned} {\mathbb {I}} {\dot{\varvec{\Omega }}} = \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }+ \lambda \varvec{\xi }\, , \end{aligned}$$
(4.3)

where the Lagrange multiplier \(\lambda \) is given as

$$\begin{aligned} \lambda = - \frac{ \left\langle \varvec{\Omega }, {\dot{\varvec{\xi }}} \right\rangle + \left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle }{ \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle }, \end{aligned}$$
(4.4)

thereby incorporating the constraint equation. For \(\lambda \) in (4.4) to be well defined, it is implicitly assumed that \(\varvec{\xi }(t) \ne 0\) for all t. As is easy to verify, equations (4.3) and (4.4) are a particular case of the equations of motion (3.6) and the Lagrange multiplier (3.8). Also, equations (4.3) and (4.4) generalize the well-known equations of motion for Suslov’s problem (Bloch 2003) to the case of time-varying \(\varvec{\xi }(t)\). For the purposes of optimal control theory, we combine (4.3) and (4.4) into the single equation

$$\begin{aligned} \mathbf {q}\left( \varvec{\Omega },\varvec{\xi }\right) := \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \left[ {\mathbb {I}} {\dot{\varvec{\Omega }}} - \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }\right] + \left[ \left\langle \varvec{\Omega },{\dot{\varvec{\xi }}} \right\rangle +\left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \right] \varvec{\xi }=\mathbf {0}.\nonumber \\ \end{aligned}$$
(4.5)

Before proceeding to the optimal control problem, we state several useful observations about the dynamics of the free Suslov problem, i.e., results valid for an arbitrary \(\varvec{\xi }(t)\).

On the nature of constraint preservation. Suppose that \(\varvec{\Omega }(t)\) is a solution to (4.5) (equivalently, (4.3)) for a given \(\varvec{\xi }(t)\), with \(\lambda \) given by (4.4). We can rewrite the equation for the Lagrange multiplier as

$$\begin{aligned} \begin{aligned} \lambda&= - \frac{1}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \displaystyle {\frac{\mathrm {d} }{\mathrm {d} t}} \left\langle \varvec{\Omega }, \varvec{\xi }\right\rangle + \frac{ \left\langle {\mathbb {I}} {\dot{\varvec{\Omega }}}-\left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle }{ \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle }. \end{aligned} \end{aligned}$$
(4.6)

On the other hand, multiplying both sides of (4.3) by \({\mathbb {I}}^{-1} \varvec{\xi }\) and solving for \(\lambda \) gives

$$\begin{aligned} \lambda =\frac{ \left\langle {\mathbb {I}} {\dot{\varvec{\Omega }}}-\left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle }{ \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle }. \end{aligned}$$
(4.7)

Thus, from (4.6) and (4.7) it follows that the equations of motion (4.5) with \(\lambda \) given by (4.4) lead to \(\displaystyle {\frac{\mathrm {d} }{\mathrm {d} t}} \left\langle \varvec{\Omega }, \varvec{\xi }\right\rangle = 0\), so that \(\left\langle \varvec{\Omega }, \varvec{\xi }\right\rangle = c\), a constant that is not necessarily equal to 0. In other words, equations (4.5) and (4.4) must be supplemented by an additional condition enforcing \(\left\langle \varvec{\Omega }, \varvec{\xi }\right\rangle =0\). Therefore, a solution \(\left( \varvec{\Omega },\varvec{\xi }\right) \) to Suslov’s problem requires that \(\mathbf {q}\left( \varvec{\Omega },\varvec{\xi }\right) =\mathbf {0}\) and \(\left\langle \varvec{\Omega }(a), \varvec{\xi }(a) \right\rangle =0\), where \(t=a\) is the initial time.
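The conservation of \(\left\langle \varvec{\Omega }, \varvec{\xi }\right\rangle \) along (4.3)–(4.4) is easy to confirm numerically. The sketch below (Python with NumPy; the inertia \({\mathbb {I}} = \mathrm {diag}(1,2,3)\) and the curve \(\varvec{\xi }(t) = (\cos t, \sin t, 1)\) are illustrative choices only, not data from the text) integrates the system with a standard RK4 step from an initial condition with \(c = 0.8 \ne 0\):

```python
import numpy as np

# Illustrative inertia tensor I = diag(1, 2, 3) and made-up xi(t).
I = np.diag([1.0, 2.0, 3.0])
Iinv = np.diag(1.0 / np.diag(I))

def xi(t):    return np.array([np.cos(t), np.sin(t), 1.0])
def xidot(t): return np.array([-np.sin(t), np.cos(t), 0.0])

def omega_dot(t, W):
    """Right-hand side of (4.3), with the multiplier lambda from (4.4)."""
    x, xd = xi(t), xidot(t)
    gyro = np.cross(I @ W, W)                          # (I W) x W
    lam = -(W @ xd + gyro @ (Iinv @ x)) / (x @ (Iinv @ x))
    return Iinv @ (gyro + lam * x)

def rk4(W, t0, t1, n=2000):
    """Classical fourth-order Runge-Kutta integration of omega_dot."""
    h = (t1 - t0) / n
    for k in range(n):
        t = t0 + k * h
        k1 = omega_dot(t, W)
        k2 = omega_dot(t + h/2, W + h/2 * k1)
        k3 = omega_dot(t + h/2, W + h/2 * k2)
        k4 = omega_dot(t + h, W + h * k3)
        W = W + h/6 * (k1 + 2*k2 + 2*k3 + k4)
    return W

W0 = np.array([0.3, -0.2, 0.5])      # note <W0, xi(0)> = 0.8, deliberately nonzero
c0 = W0 @ xi(0.0)
W1 = rk4(W0, 0.0, 1.0)
print(abs(W1 @ xi(1.0) - c0))        # residual at the level of integrator error
```

The pairing \(\left\langle \varvec{\Omega }, \varvec{\xi }\right\rangle \) stays at its initial value 0.8 rather than being driven to zero, which is exactly why the condition \(\left\langle \varvec{\Omega }(a), \varvec{\xi }(a)\right\rangle =0\) must be imposed separately.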

On the invariance of solutions with respect to scaling of \(\varvec{\xi }\). In the classical formulation of Suslov’s problem, it is usually assumed that \(|\varvec{\xi }|=1\). When \(\varvec{\xi }(t)\) is allowed to change, the normalization of \(\varvec{\xi }\) becomes an issue that needs to be clarified. Indeed, suppose that \(\varvec{\Omega }(t)\) is a solution to (4.5) for a given \(\varvec{\xi }(t)\), so that \(\mathbf {q}\left( \varvec{\Omega },\varvec{\xi }\right) =\mathbf {0}\), and further assume that \(\left\langle \varvec{\Omega },\varvec{\xi }\right\rangle = 0\). Next, consider a smooth, scalar-valued function \(\pi (t)\) with \(\pi (t) \ne 0\) on the interval \(t \in [a,b]\), and the pair \(\left( \varvec{\Omega },\pi \varvec{\xi }\right) \). Then,

$$\begin{aligned} \mathbf {q}\left( \varvec{\Omega },\pi \varvec{\xi }\right)&= \left\langle \pi \varvec{\xi }, {\mathbb {I}}^{-1} \left( \pi \varvec{\xi }\right) \right\rangle \left[ {\mathbb {I}} {\dot{\varvec{\Omega }}} - \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }\right] \nonumber \\&\qquad + \left[ \left\langle \varvec{\Omega },\left( \pi \varvec{\xi }\right) ^\cdot \right\rangle +\left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \left( \pi \varvec{\xi }\right) \right\rangle \right] \pi \varvec{\xi }\nonumber \\&= \pi ^2 \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \left[ {\mathbb {I}} {\dot{\varvec{\Omega }}} - \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }\right] \nonumber \\&\qquad + \left[ \left\langle \varvec{\Omega },{\dot{\pi }} \varvec{\xi }+ \pi {\dot{\varvec{\xi }}} \right\rangle + \pi \left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \right] \pi \varvec{\xi }\nonumber \\&= \pi ^2 \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \left[ {\mathbb {I}} {\dot{\varvec{\Omega }}} - \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }\right] + \left[ \pi \left\langle \varvec{\Omega }, {\dot{\varvec{\xi }}} \right\rangle + \pi \left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \right] \pi \varvec{\xi }\nonumber \\&= \pi ^2 \mathbf {q}\left( \varvec{\Omega },\varvec{\xi }\right) = \mathbf {0}. \end{aligned}$$
(4.8)

Hence, a solution \( \varvec{\Omega }(t)\) to (4.5) with \(\left\langle \varvec{\Omega },\varvec{\xi }\right\rangle =c = 0\) does not depend on the magnitude of \(\varvec{\xi }(t)\). As it turns out, this creates a degeneracy in the optimal control problem that has to be treated with care.
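A minimal numerical sketch of this invariance (NumPy; the inertia \(\mathrm{diag}(1,2,3)\), the curve \(\varvec{\xi }(t)\), and the scaling \(\pi (t) = 1 + \tfrac{1}{2}\sin t\) are all illustrative choices) integrates the \(\varvec{\Omega }\) equation of (4.5) once with \(\varvec{\xi }(t)\) and once with \(\pi (t)\varvec{\xi }(t)\), starting from \(\left\langle \varvec{\Omega }(a),\varvec{\xi }(a)\right\rangle = 0\):

```python
import numpy as np

d = np.array([1.0, 2.0, 3.0])                 # diagonal of I (illustrative)
I, Iinv = np.diag(d), np.diag(1.0 / d)

def omega_dot(W, x, xd):
    """Omega equation of (4.5) for a given xi value x and derivative xd."""
    gyro = np.cross(I @ W, W)
    lam = -(W @ xd + gyro @ (Iinv @ x)) / (x @ (Iinv @ x))
    return Iinv @ (gyro + lam * x)

def integrate(W, xi, xidot, t1=1.0, n=2000):
    """RK4 integration of Omega over [0, t1]."""
    h = t1 / n
    for k in range(n):
        t = k * h
        k1 = omega_dot(W, xi(t), xidot(t))
        k2 = omega_dot(W + h/2*k1, xi(t + h/2), xidot(t + h/2))
        k3 = omega_dot(W + h/2*k2, xi(t + h/2), xidot(t + h/2))
        k4 = omega_dot(W + h*k3, xi(t + h), xidot(t + h))
        W = W + h/6 * (k1 + 2*k2 + 2*k3 + k4)
    return W

xi  = lambda t: np.array([np.cos(t), np.sin(t), 1.0])
xid = lambda t: np.array([-np.sin(t), np.cos(t), 0.0])
pi_, pid = (lambda t: 1 + 0.5*np.sin(t)), (lambda t: 0.5*np.cos(t))
xi_s  = lambda t: pi_(t) * xi(t)              # rescaled constraint direction
xid_s = lambda t: pid(t) * xi(t) + pi_(t) * xid(t)

W0 = np.array([1.0, 1.0, -1.0])               # <W0, xi(0)> = 0, so c = 0
W_a = integrate(W0, xi, xid)
W_b = integrate(W0, xi_s, xid_s)
print(np.abs(W_a - W_b).max())                # agreement up to integration error
```

Repeating the experiment with \(\left\langle \varvec{\Omega }(a),\varvec{\xi }(a)\right\rangle \ne 0\) breaks the agreement, consistent with the role of \(c=0\) in (4.8).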

Energy conservation. Taking the inner product of both sides of (4.3) with \(\varvec{\Omega }\) gives the time derivative of the kinetic energy:

$$\begin{aligned} {\dot{T}}(t) = \frac{\mathrm {d}}{\mathrm {d}t} \left\{ \frac{1}{2} \left\langle {\mathbb {I}} \varvec{\Omega }, \varvec{\Omega }\right\rangle \right\} = \left\langle {\mathbb {I}} \varvec{\Omega }, {\dot{\varvec{\Omega }}} \right\rangle = \lambda \left\langle \varvec{\Omega }, \varvec{\xi }\right\rangle = \lambda c, \end{aligned}$$
(4.9)

where we have denoted \(\left\langle \varvec{\Omega },\varvec{\xi }\right\rangle =c=\) const. Thus, if \(c=0\) (as is the case for Suslov’s problem), kinetic energy is conserved:

$$\begin{aligned} T(t) = \frac{1}{2} \left\langle {\mathbb {I}} \varvec{\Omega }, \varvec{\Omega }\right\rangle = \frac{1}{2} \sum _{i=1}^3 {\mathbb {I}}_i \varvec{\Omega }_i^2=e_S, \end{aligned}$$
(4.10)

for some positive constant \(e_S\), so that \(\varvec{\Omega }\) lies on the surface of the constant kinetic energy ellipsoid E determined by the rigid body’s inertia matrix \({\mathbb {I}}\) and the initial body angular velocity \(\varvec{\Omega }(a)=\varvec{\Omega }_a\):

$$\begin{aligned} E = E({\mathbb {I}},\varvec{\Omega }_a) = \left\{ \mathbf {v}\in {\mathbb {R}}^3 : \left\langle \mathbf {v},{\mathbb {I}} \mathbf {v}\right\rangle = \left\langle \varvec{\Omega }_a,{\mathbb {I}} \varvec{\Omega }_a\right\rangle \right\} . \end{aligned}$$
(4.11)

Integrating (4.9) with respect to time from a to b gives the change in kinetic energy:

$$\begin{aligned} T(b)-T(a) = c \int _a^b \lambda \, \mathrm {d} t. \end{aligned}$$
(4.12)

Thus, \(\varvec{\Omega }(a)\) and \(\varvec{\Omega }(b)\) lie on the surface of the same ellipsoid if and only if \(c=0\) or \(\int _a^b \lambda \, \mathrm {d} t=0\). If \(c=0\), as is the case for Suslov’s problem, the conservation of kinetic energy holds for all choices of \(\varvec{\xi }\), constant or time-dependent. We shall note that if the vector \(\varvec{\xi }\) is constant in time and is an eigenvector of the inertia matrix \({\mathbb {I}}\), then there is an additional integral \(\frac{1}{2} \left\langle {\mathbb {I}} \varvec{\Omega }, {\mathbb {I}} \varvec{\Omega }\right\rangle \). However, for \(\varvec{\xi }(t)\) varying in time, which is the case studied here, this quantity is no longer an integral of motion.
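These conservation statements can be probed with the same kind of numerical experiment (NumPy; the inertia and \(\varvec{\xi }(t)\) below are again illustrative): with \(c=0\) the kinetic energy (4.10) is constant up to integrator error, while for \(c \ne 0\) it generally drifts, in accordance with (4.12).

```python
import numpy as np

d = np.array([1.0, 2.0, 3.0])                 # diagonal of I (illustrative)
I, Iinv = np.diag(d), np.diag(1.0 / d)
T = lambda W: 0.5 * W @ (I @ W)               # kinetic energy, as in (4.10)

xi  = lambda t: np.array([np.cos(t), np.sin(t), 1.0])   # made-up xi(t)
xid = lambda t: np.array([-np.sin(t), np.cos(t), 0.0])

def omega_dot(t, W):
    """Right-hand side of (4.3)-(4.4)."""
    x, xd, gyro = xi(t), xid(t), np.cross(I @ W, W)
    lam = -(W @ xd + gyro @ (Iinv @ x)) / (x @ (Iinv @ x))
    return Iinv @ (gyro + lam * x)

def integrate(W, t1=1.0, n=2000):
    h = t1 / n
    for k in range(n):
        t = k * h
        k1 = omega_dot(t, W); k2 = omega_dot(t + h/2, W + h/2*k1)
        k3 = omega_dot(t + h/2, W + h/2*k2); k4 = omega_dot(t + h, W + h*k3)
        W = W + h/6*(k1 + 2*k2 + 2*k3 + k4)
    return W

W0_suslov = np.array([1.0, 1.0, -1.0])        # <W0, xi(0)> = 0: the Suslov case
dT_suslov = abs(T(integrate(W0_suslov)) - T(W0_suslov))
print(dT_suslov)                              # conserved up to integrator error

W0_gen = np.array([0.3, -0.2, 0.5])           # <W0, xi(0)> = 0.8 != 0
dT_gen = abs(T(integrate(W0_gen)) - T(W0_gen))
print(dT_gen)                                 # generally nonzero, per (4.12)
```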

4.2 Controllability and Accessibility of Suslov’s Pure Equations of Motion

We shall now turn our attention to the problem of controlling Suslov’s problem by changing the vector \(\varvec{\xi }(t)\) in time. Before posing the optimal control problem, let us first consider the general question of controllability and accessibility using the Lie group approach to controllability as derived in Brockett (1972), Nijmeijer and Van der Schaft (2013), and Isidori (2013). Since the constraint \(\left\langle \varvec{\Omega }, \varvec{\xi }\right\rangle =0\) forces all trajectories to lie on the energy ellipsoid (4.11), the initial and terminal points of any trajectory must lie on the ellipsoid corresponding to the same energy. We shall therefore assume that the initial and terminal points, as well as the trajectory itself, lie on the ellipsoid (4.11). Before we proceed, let us remind the reader of the relevant definitions and theorems concerning controllability and accessibility, following Bloch (2003).

Definition 4.1

An affine nonlinear control system is a differential equation having the form

$$\begin{aligned} {\dot{x}} = f(x)+\sum _{i=1}^k g_i (x) u_i, \end{aligned}$$
(4.13)

where M is a smooth n-dimensional manifold, \(x \in M\), \(u=\left( u_1,\ldots ,u_k \right) \) is a time-dependent, vector-valued map from \({\mathbb {R}}\) to a constraint set \(\Phi \subset {\mathbb {R}}^k\), and f and \(g_i\), \(i=1,\ldots ,k\), are smooth vector fields on M. The manifold M is said to be the state-space of the system, u is said to be the control, f is said to be the drift vector field, and \(g_i\), \(i=1,\ldots ,k\), are said to be the control vector fields. u is assumed to be piecewise smooth or piecewise analytic, and such a u is said to be admissible. If \(f \equiv 0\), the system (4.13) is said to be driftless; otherwise, the system (4.13) is said to have drift.

Definition 4.2

Let a be a fixed initial time. The system (4.13) is said to be controllable if for any pair of states \(x_a, x_b \in M\) there exist a terminal time \(b \ge a\) and an admissible control u defined on the time interval \([a,b]\) such that there is a trajectory of (4.13) with \(x(a) = x_a\) and \(x(b) = x_b\).

Definition 4.3

Given \(x_a \in M\) and a time \(t \ge a\), \(R(x_a,t)\) is defined to be the set of all \(y \in M\) for which there exists an admissible control u defined on the time interval \([a,t]\) such that there is a trajectory of (4.13) with \(x(a) = x_a\) and \(x(t) = y\). The reachable set from \(x_a\) at time \(b \ge a\) is defined to be

$$\begin{aligned} R_b(x_a) = \bigcup _{a \le t \le b} R(x_a,t). \end{aligned}$$
(4.14)

Definition 4.4

The accessibility algebra \({\mathcal {C}}\) of the system (4.13) is the smallest Lie algebra of vector fields on M that contains the vector fields f and \(g_i\), \(i=1,\ldots ,k\); that is, \({\mathcal {C}}=\text{ Lie } \, \{\mathbf {f},\mathbf {g}_1,\ldots ,\mathbf {g}_k \}\) is the span of all possible Lie brackets of f and \(g_i\), \(i=1,\ldots ,k\).

Definition 4.5

The accessibility distribution C of the system (4.13) is the distribution generated by the vector fields in \({\mathcal {C}}\); that is, given \(x_a \in M\), \(C(x_a) =\text{ Lie }_{x_a} \, \{\mathbf {f},\mathbf {g}_1,\ldots ,\mathbf {g}_k \}\) is the span of the vector fields in \({\mathcal {C}}\) evaluated at \(x_a\).

Definition 4.6

The system (4.13) is said to be accessible from \(x_a \in M\) if for every \(b > a\), \(R_b(x_a)\) contains a nonempty open set.

Theorem 4.7

If \(\mathrm {dim} \; C(x_a)=n\) for some \(x_a \in M\), then the system (4.13) is accessible from \(x_a\).

Theorem 4.8

Suppose the system (4.13) is analytic. If \(\mathrm {dim} \; C(x_a)=n \; \forall x_a \in M\) and \(f=0\), then the system (4.13) is controllable.

To apply the theory of controllability and accessibility to Suslov’s problem, we first need to rewrite the equations of motion for Suslov’s problem in the “affine nonlinear control” form

$$\begin{aligned} {\dot{{\mathbf {x} }}} = \mathbf {f}(\mathbf x)+\sum _{i=1}^3 \mathbf {g}_i (\mathbf x) u_i, \end{aligned}$$
(4.15)

where \({\mathbf {x} }\) is the state variable and \(u_i\) are the controls. We denote the state of the system by \(\mathbf x \equiv \left[ \begin{array}{c} \varvec{\Omega }\\ \varvec{\xi }\\ \end{array} \right] \) and the control by \(\mathbf {u} \equiv {\dot{\varvec{\xi }}}\). Thus, the individual components of the state and control are \(x_1 = \Omega _1\), \(x_2 = \Omega _2\), \(x_3 = \Omega _3\), \(x_4 = \xi _1\), \(x_5 = \xi _2\), \(x_6 = \xi _3\), \(u_1 = {{\dot{\xi }}}_1\), \(u_2 = {{\dot{\xi }}}_2\), and \(u_3 = {{\dot{\xi }}}_3\). The equations of motion (4.5) can be expressed as

$$\begin{aligned} {\dot{\varvec{\Omega }}} = \frac{{{\mathbb {I}}}^{-1}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \left\{ \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }- \left[ \left\langle \varvec{\Omega },{\dot{\varvec{\xi }}} \right\rangle +\left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \right] \varvec{\xi }\right\} \,, \quad {\dot{\varvec{\xi }}} = \mathbf {u}. \end{aligned}$$
(4.16)

To cast (4.16) in the form (4.15), the functions \(\mathbf {f}\) and \(\mathbf {g}_i\) in (4.15) are defined as

$$\begin{aligned} \mathbf {f}(\mathbf x) \equiv \left[ \begin{array}{c} \frac{{{\mathbb {I}}}^{-1}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \left\{ \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }- \left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \varvec{\xi }\right\} \\ \mathbf {0}_{3 \times 1}\\ \end{array} \right] \end{aligned}$$
(4.17)

and

$$\begin{aligned} \mathbf {g}_i(\mathbf x) \equiv \left[ \begin{array}{c} -\frac{{{\mathbb {I}}}^{-1}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \Omega _i \varvec{\xi }\\ \mathbf {e}_i\\ \end{array} \right] \quad \text{ for } \quad 1 \le i \le 3. \end{aligned}$$
(4.18)

Here, \(\mathbf {f}(\mathbf {x}) \) is the drift vector field and \(\mathbf {g}_i(\mathbf {x})\), \(1 \le i \le 3\), are the control vector fields; \(\mathbf {0}_{3 \times 1} = \left( 0,0,0\right) ^T\) denotes the \(3 \times 1\) column vector of zeros and \(\mathbf {e}_i\), \(i=1,2,3\), denote the standard orthonormal basis vectors for \({\mathbb {R}}^3\). An alternative way to express each control vector field \(\mathbf {g}_i\), \(1 \le i \le 3\), is through the differential geometric notation (here and below, \(d_m \equiv {\mathbb {I}}_m\) denotes the m-th diagonal entry of \({\mathbb {I}}\), and summation over repeated indices is implied)

$$\begin{aligned} \mathbf {g}_i = \frac{-\Omega _i \xi _m }{d_m \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \frac{\partial }{\partial \Omega _m}+\frac{\partial }{\partial \xi _i}. \end{aligned}$$
(4.19)
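As a quick algebraic consistency check, one can verify numerically that the drift and control fields (4.17)–(4.18) reassemble the right-hand side of (4.16). The sketch below (NumPy; the inertia entries and the randomly drawn test values for \(\varvec{\Omega }\), \(\varvec{\xi }\), \(\mathbf {u}\) are illustrative) compares \(\mathbf {f} + \sum _i u_i \mathbf {g}_i\) with (4.16) at one point:

```python
import numpy as np

rng = np.random.default_rng(1)
d = np.array([1.0, 2.0, 3.0])            # diagonal of I (illustrative)
I, Iinv = np.diag(d), np.diag(1.0 / d)

Om, xi, u = rng.standard_normal(3), rng.standard_normal(3), rng.standard_normal(3)
s = xi @ (Iinv @ xi)                      # <xi, I^{-1} xi>
gyro = np.cross(I @ Om, Om)               # (I Omega) x Omega

# Right-hand side of (4.16) for Omega, with xidot replaced by the control u
rhs = Iinv @ (s * gyro - ((Om @ u) + gyro @ (Iinv @ xi)) * xi) / s

# Omega-components of the drift (4.17) and control fields (4.18)
f_Om = Iinv @ (s * gyro - (gyro @ (Iinv @ xi)) * xi) / s
g_Om = [-(Iinv @ xi) * Om[i] / s for i in range(3)]
recon = f_Om + sum(u[i] * g_Om[i] for i in range(3))

print(np.abs(recon - rhs).max())          # agrees to machine precision
```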

As noted in the previous section, the first three components, \(\varvec{\Omega }\), of the state \(\mathbf {x}\) solving (4.15) must lie on the ellipsoid E given in (4.11), under the assumption that

$$\begin{aligned} \left\langle \varvec{\Omega }(a),\varvec{\xi }(a)\right\rangle =0 \end{aligned}$$
(4.20)

for some time a. As shown in the previous section, (4.20) implies that a solution of (4.15) satisfies \(\left\langle \varvec{\Omega }(t),\varvec{\xi }(t)\right\rangle =0\) for all t. Also, it is assumed that \(\varvec{\xi }(t) \ne 0\) for all t. Hence, the state-space manifold is \(M = \left\{ \mathbf {x} \in {\mathbb {R}}^6 | \, \frac{1}{2} \left\langle {\mathbb {I}} \varvec{\Omega }, \, \varvec{\Omega }\right\rangle =e_S, \left\langle \varvec{\Omega },\varvec{\xi }\right\rangle =0,\right. \left. \varvec{\xi }\ne 0 \right\} \). Let \(K = {\mathbb {R}}^6 \setminus \{\mathbf {0}\}\), a 6-dimensional submanifold of \({\mathbb {R}}^6\). Note that \(M = \Phi ^{-1}(\mathbf {0}_{2 \times 1})\), where \(\Phi : K \rightarrow {\mathbb {R}}^2\) is defined by

$$\begin{aligned} \Phi (\mathbf {x}) = \left[ \begin{array}{c} \frac{1}{2} \left\langle {\mathbb {I}} \varvec{\Omega }, \, \varvec{\Omega }\right\rangle - e_S \\ \left\langle \varvec{\Omega },\varvec{\xi }\right\rangle \end{array} \right] . \end{aligned}$$
(4.21)

The derivative of \(\Phi \) at \(\mathbf {x} \in K\), \(\left( \Phi _*\right) _\mathbf {x} : T_\mathbf {x} K \rightarrow T_{\Phi (\mathbf {x})} {\mathbb {R}}^2\), is

$$\begin{aligned} \left( \Phi _*\right) _\mathbf {x} = \left[ \begin{array}{cccccc} d_1 \Omega _1 &{}\quad d_2 \Omega _2 &{}\quad d_3 \Omega _3 &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ \xi _1 &{}\quad \xi _2 &{}\quad \xi _3 &{}\quad \Omega _1 &{}\quad \Omega _2 &{}\quad \Omega _3 \end{array} \right] = \left[ \begin{array}{cc} {\mathbb {I}} \varvec{\Omega }&{}\quad \varvec{\xi }\\ \mathbf {0}_{3 \times 1} &{}\quad \varvec{\Omega }\end{array} \right] ^{\mathsf {T}}. \end{aligned}$$
(4.22)

Since \(\left( \Phi _*\right) _\mathbf {x}\) has rank 2 for each \(\mathbf {x} \in K\), \(\Phi \) is by definition a submersion and \(M = \Phi ^{-1}(\mathbf {0}_{2 \times 1})\) is a closed embedded submanifold of K of dimension 4 by Corollary 8.9 of Lee (2003). Being an embedded submanifold of K, M is also an immersed submanifold of K (Lee 2003).

The tangent space to M at \(\mathbf {x} \in M\) is

$$\begin{aligned} T_{\mathbf {x}} M = \left\{ \mathbf {v} \in T_{\mathbf {x}} K = {\mathbb {R}}^6 | \left( \Phi _*\right) _\mathbf {x} \left( \mathbf {v} \right) = \mathbf {0}_{2 \times 1} \right\} . \end{aligned}$$
(4.23)

Using (4.17), (4.18), and (4.22), it is easy to check that \( \left( \Phi _*\right) _\mathbf {x} \left( \mathbf {f}(\mathbf x) \right) = \mathbf {0}_{2 \times 1}\) and \( \left( \Phi _*\right) _\mathbf {x} \left( \mathbf {g}_i(\mathbf x) \right) = \mathbf {0}_{2 \times 1}\) for \(1 \le i \le 3\). Hence, \(\mathbf {f}(\mathbf x) \in T_{\mathbf {x}} M\) and \(\mathbf {g}_i(\mathbf x) \in T_{\mathbf {x}} M\) for \(1 \le i \le 3\) by Lemma 8.15 of Lee (2003). So \(\mathbf {f}, \mathbf {g}_1, \mathbf {g}_2,\) and \(\mathbf {g}_3\) are smooth vector fields on K which are also tangent to M. Since M is an immersed submanifold of K, \(\left[ \mathbf {X},\mathbf {Y}\right] \) is tangent to M if \(\mathbf {X}\) and \(\mathbf {Y}\) are smooth vector fields on K that are tangent to M, by Corollary 8.28 of Lee (2003). Hence, \(\text{ Lie }_{\mathbf {x}} \, \{\mathbf {f},\mathbf {g}_1,\mathbf {g}_2,\mathbf {g}_3 \} \subset T_{\mathbf {x}} M\) and therefore \(\text{ rank } \, \text{ Lie }_{\mathbf {x}} \, \{\mathbf {f},\mathbf {g}_1,\mathbf {g}_2,\mathbf {g}_3 \} \le \dim \, T_{\mathbf {x}} M = 4\).
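The tangency claims \(\left( \Phi _*\right) _\mathbf {x} \left( \mathbf {f} \right) = \mathbf {0}\) and \(\left( \Phi _*\right) _\mathbf {x} \left( \mathbf {g}_i \right) = \mathbf {0}\) admit a quick numerical spot check. The sketch below (NumPy; the inertia entries and the randomly generated point of M are illustrative) evaluates (4.17), (4.18), and (4.22) at a point with \(\left\langle \varvec{\Omega },\varvec{\xi }\right\rangle =0\):

```python
import numpy as np

rng = np.random.default_rng(0)
d = np.array([1.0, 2.0, 3.0])             # diagonal of I (illustrative)
I, Iinv = np.diag(d), np.diag(1.0 / d)

# A point x = (Omega, xi) of M: xi != 0 and <Omega, xi> = 0.
xi = rng.standard_normal(3)
Om = rng.standard_normal(3)
Om -= (Om @ xi) / (xi @ xi) * xi          # project out the xi-component

s = xi @ (Iinv @ xi)
gyro = np.cross(I @ Om, Om)

# Drift and control vector fields from (4.17)-(4.18), as 6-vectors
f = np.concatenate([Iinv @ (s * gyro - (gyro @ (Iinv @ xi)) * xi) / s,
                    np.zeros(3)])
g = [np.concatenate([-(Iinv @ xi) * Om[i] / s, np.eye(3)[i]]) for i in range(3)]

# Derivative of Phi from (4.22), a 2 x 6 matrix
Phi_star = np.vstack([np.concatenate([I @ Om, np.zeros(3)]),
                      np.concatenate([xi, Om])])

print(np.abs(Phi_star @ f).max(),
      max(np.abs(Phi_star @ gi).max() for gi in g))   # both vanish
```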

For \(1 \le i,j \le 3\) and \(i \ne j\), the Lie bracket of the control vector field \(\mathbf {g}_i\) with the control vector field \(\mathbf {g}_j\) is computed as

$$\begin{aligned} \begin{aligned} \left[ \mathbf {g}_i,\mathbf {g}_j \right]&= \left[ \frac{-\Omega _i \xi _m }{d_m \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \frac{\partial }{\partial \Omega _m}+\frac{\partial }{\partial \xi _i},\frac{-\Omega _j \xi _l }{d_l \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \frac{\partial }{\partial \Omega _l}+\frac{\partial }{\partial \xi _j} \right] \\&= \frac{\Omega _i \xi _m \xi _l \delta _{mj}}{d_m d_l \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle ^2} \frac{\partial }{\partial \Omega _l}-\frac{\Omega _j }{d_l} \left\{ \frac{\partial }{\partial \xi _i} \left( \frac{\xi _l}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \right) \right\} \frac{\partial }{\partial \Omega _l} \\&\qquad -\frac{\Omega _j \xi _l \xi _m \delta _{il}}{d_l d_m \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle ^2} \frac{\partial }{\partial \Omega _m} +\frac{\Omega _i }{d_m} \left\{ \frac{\partial }{\partial \xi _j} \left( \frac{\xi _m}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \right) \right\} \frac{\partial }{\partial \Omega _m}\\&= \frac{\Omega _i \frac{\xi _j}{d_j} - \Omega _j \frac{\xi _i}{d_i}}{d_l \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle ^2} \xi _l \frac{\partial }{\partial \Omega _l}-\frac{\Omega _j \delta _{il} }{d_l \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \frac{\partial }{\partial \Omega _l}+ \frac{2 \Omega _j \xi _i \xi _l}{d_i d_l \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle ^2} \frac{\partial }{\partial \Omega _l} \\&\qquad +\frac{\Omega _i \delta _{jm} }{d_m \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \frac{\partial }{\partial \Omega _m}-\frac{2 \Omega _i \xi _j \xi _m}{d_j d_m \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle ^2} \frac{\partial }{\partial \Omega _m} \\&= \frac{\Omega _j \frac{\xi _i}{d_i}-\Omega _i \frac{\xi _j}{d_j} }{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle ^2} \frac{\xi _l}{d_l} \frac{\partial }{\partial \Omega _l}+\frac{\Omega _i }{d_j \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \frac{\partial }{\partial \Omega _j}-\frac{\Omega _j }{d_i \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \frac{\partial }{\partial \Omega _i}, \end{aligned} \end{aligned}$$
(4.24)

recalling that \(\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle = \sum _{i=1}^3 \frac{\xi _i^2}{d_i} \).
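As a check on this computation, the bracket formula can be verified symbolically. The sketch below (assuming sympy; the helper names `g` and `bracket` are ours) builds \(\mathbf {g}_i\) as a 6-component field in the coordinates \((\varvec{\Omega },\varvec{\xi })\), computes \([\mathbf {g}_i,\mathbf {g}_j]\) from the Jacobian definition, and compares it with the closed form on the last line of (4.24):

```python
import sympy as sp

d = sp.symbols('d1:4', positive=True)        # diagonal entries of the inertia tensor I
Om = sp.Matrix(sp.symbols('Omega1:4'))
xi = sp.Matrix(sp.symbols('xi1:4'))
Iinv = sp.diag(*[1 / di for di in d])
Q = (xi.T * Iinv * xi)[0]                    # <xi, I^{-1} xi> = sum_i xi_i^2 / d_i
x = sp.Matrix.vstack(Om, xi)                 # state (Omega, xi)

def g(i):
    # control field g_i: Omega-part -Omega_i I^{-1} xi / Q, xi-part e_i
    return sp.Matrix.vstack(-Om[i] * Iinv * xi / Q, sp.eye(3).col(i))

def bracket(u, v):
    # Lie bracket of vector fields: [u, v] = (Dv) u - (Du) v
    return v.jacobian(x) * u - u.jacobian(x) * v

i, j = 0, 1                                  # one representative pair (i, j)
lhs = bracket(g(i), g(j))
# closed form from the last line of (4.24); the xi-part vanishes
top = (Om[j] * xi[i] / d[i] - Om[i] * xi[j] / d[j]) / Q**2 * (Iinv * xi)
top[j] += Om[i] / (d[j] * Q)
top[i] -= Om[j] / (d[i] * Q)
rhs = sp.Matrix.vstack(top, sp.zeros(3, 1))
assert (lhs - rhs).applyfunc(sp.cancel) == sp.zeros(6, 1)
```

The remaining pairs \((i,j)\) follow by the same computation, or from the antisymmetry of the bracket.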

Next, to prove controllability and compute the appropriate prolongation, consider the \(6 \times 6\) matrix whose columns are the vector fields \(\mathbf {g}_i(\mathbf {x})\) and their commutators \([\mathbf {g}_i(\mathbf {x}),\mathbf {g}_j(\mathbf {x})]\), expressed in the basis \((\partial _{\varvec{\Omega }}, \partial _{\varvec{\xi }})\):

$$\begin{aligned} V = \left[ \mathbf {g}_1,\mathbf {g}_2,\mathbf {g}_3,\left[ \mathbf {g}_1,\mathbf {g}_2 \right] ,\left[ \mathbf {g}_1,\mathbf {g}_3 \right] ,\left[ \mathbf {g}_2,\mathbf {g}_3 \right] \right] = \left[ \begin{array}{cc} -\frac{{{\mathbb {I}}}^{-1}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \varvec{\xi }\otimes \varvec{\Omega }^\mathsf {T} &{} \frac{{{\mathbb {I}}}^{-1}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } A \\ I_{3 \times 3} &{} \mathbf {0}_{3 \times 3} \end{array} \right] ,\nonumber \\ \end{aligned}$$
(4.25)

where we have defined

$$\begin{aligned}&A = \left[ \begin{array}{ccc} -\Omega _2 &{}\quad -\Omega _3 &{}\quad 0 \\ \Omega _1 &{}\quad 0 &{}\quad -\Omega _3 \\ 0 &{}\quad \Omega _1 &{}\quad \Omega _2 \end{array} \right] + \frac{1}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \varvec{\xi }\otimes \left[ \left( \varvec{\Omega }\times {\mathbb {I}}^{-1} \varvec{\xi }\right) ^\mathsf {T} D \right] \, , \nonumber \\&\quad D = \left[ \begin{array}{ccc} 0 &{}\quad 0 &{}\quad -1 \\ 0 &{}\quad 1 &{}\quad 0 \\ -1 &{}\quad 0 &{}\quad 0 \end{array} \right] . \end{aligned}$$
(4.26)

In (4.25), \(I_{3 \times 3}\) denotes the \(3 \times 3\) identity matrix and \(\mathbf {0}_{3 \times 3}\) denotes the \(3 \times 3\) zero matrix. Since the columns of V belong to \(\text{ Lie }_{\mathbf {x}} \, \{\mathbf {f},\mathbf {g}_1,\mathbf {g}_2,\mathbf {g}_3 \}\), \( \text{ rank } \, V \le \text{ rank } \, \text{ Lie }_{\mathbf {x}} \, \{\mathbf {f},\mathbf {g}_1,\mathbf {g}_2,\mathbf {g}_3 \} \le \dim \, T_{\mathbf {x}} M = 4\). It will be shown that \( \text{ rank } \, V = 4\), so that \(\text{ rank } \, \text{ Lie }_{\mathbf {x}} \, \{\mathbf {f},\mathbf {g}_1,\mathbf {g}_2,\mathbf {g}_3 \} = \dim \, T_{\mathbf {x}} M = 4\).

Since the bottom 3 rows of the first 3 columns of V form \(I_{3 \times 3}\), the first 3 columns of V are linearly independent. Note that since \({\mathbb {I}}^{-1}\) is a diagonal matrix with positive diagonal entries, \(\text{ rank } \, \frac{{{\mathbb {I}}}^{-1}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } A = \text{ rank } \, A\). Moreover, since the bottom 3 rows of the last 3 columns of V are \(\mathbf {0}_{3 \times 3}\), any nonzero column among the last 3 is linearly independent of the first 3 columns. Hence, \(\text{ rank } \, V = 3+\text{ rank } \, A\). Since \(\text{ rank } \, V \le 4\), \(\text{ rank } \, A\) is 0 or 1.

The first matrix in the sum composing A in (4.26) has rank 2, since \(\varvec{\Omega }\ne \mathbf {0}\) (i.e., at least one component of \(\varvec{\Omega }\) is nonzero). Each of its 3 columns is orthogonal to \(\varvec{\Omega }\); since the matrix has rank 2, its columns span the 2-dimensional plane in \({\mathbb {R}}^3\) orthogonal to \(\varvec{\Omega }\). Since \(\left\langle \varvec{\Omega },\varvec{\xi }\right\rangle =0\), \(\varvec{\xi }\) lies in this plane, and \(\varvec{\xi }\) and \(\varvec{\Omega }\times \varvec{\xi }\) form an orthogonal basis for it. Because the columns of the first matrix span this plane, at least one column, say the \(j^\mathrm {th}~(1 \le j \le 3)\), has a nonzero component parallel to \(\varvec{\Omega }\times \varvec{\xi }\). The second matrix in the sum composing A in (4.26) consists of 3 column vectors, each of which is a scalar multiple of \(\varvec{\xi }\). Hence, the \(j^\mathrm {th}\) column of A has a nonzero component parallel to \(\varvec{\Omega }\times \varvec{\xi }\), so \(\text{ rank } \, A \ge 1\). Combined with \(\text{ rank } \, A \le 1\), A has rank 1, V has rank 4, and \(\text{ rank } \, \text{ Lie }_{\mathbf {x}} \, \{\mathbf {f},\mathbf {g}_1,\mathbf {g}_2,\mathbf {g}_3 \} = \dim \, T_{\mathbf {x}} M = 4\). By Theorems 4.7 and 4.8, this implies that (4.15) is controllable if \(\mathbf {f}\) vanishes and accessible otherwise. Thus, we have proved

Theorem 4.9

(On the controllability and accessibility of Suslov’s problem) Suppose we have Suslov’s problem \(\mathbf {q}\left( \varvec{\Omega },\varvec{\xi }\right) =\mathbf {0}\) with the control variable \({\dot{\varvec{\xi }}}(t)\). Then,

  1.

    If \({\mathbb {I}} = c I_{3 \times 3}\) for a positive constant c, then \(\varvec{\Omega }\) lies on a sphere of radius \(\sqrt{2T/c}\), where T is the conserved kinetic energy, \(\mathbf {f}=\mathbf {0}\) at all points in M, and (4.15) is driftless and controllable.

  2.

    If \({\mathbb {I}} \ne c I_{3 \times 3}\) for all positive constants c (i.e., at least two of the diagonal entries of \({\mathbb {I}}\) are unequal), then \(\varvec{\Omega }\) lies on a nonspherical ellipsoid, \(\mathbf {f} \ne \mathbf {0}\) at most points in M, and (4.15) has drift and is accessible.
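The rank argument underlying Theorem 4.9 lends itself to a quick numerical spot check. The sketch below (assuming numpy, an arbitrary inertia tensor \({\mathbb {I}} = \mathrm {diag}(1,2,3)\), and helper names of our choosing) assembles the columns of V from the fields \(\mathbf {g}_i\) and the closed-form brackets (4.24) at a random point satisfying \(\left\langle \varvec{\Omega },\varvec{\xi }\right\rangle =0\):

```python
import numpy as np

rng = np.random.default_rng(0)
d = np.array([1.0, 2.0, 3.0])              # assumed principal moments of inertia
Iinv = np.diag(1.0 / d)

Om = rng.standard_normal(3)
xi = rng.standard_normal(3)
xi -= (xi @ Om) / (Om @ Om) * Om           # enforce the constraint <Omega, xi> = 0
Q = xi @ Iinv @ xi                         # <xi, I^{-1} xi>

def g(i):
    """Column of g_i in the basis (d/dOmega, d/dxi)."""
    return np.concatenate([-Om[i] * Iinv @ xi / Q, np.eye(3)[i]])

def bracket(i, j):
    """Closed-form [g_i, g_j] from (4.24); the xi-part vanishes."""
    top = (Om[j] * xi[i] / d[i] - Om[i] * xi[j] / d[j]) / Q**2 * (Iinv @ xi)
    top[j] += Om[i] / (d[j] * Q)
    top[i] -= Om[j] / (d[i] * Q)
    return np.concatenate([top, np.zeros(3)])

V = np.column_stack([g(0), g(1), g(2), bracket(0, 1), bracket(0, 2), bracket(1, 2)])
print(np.linalg.matrix_rank(V))            # 4
```

Since the bottom rows of the bracket columns vanish, the output reflects exactly the decomposition \(\text{ rank } \, V = 3 + \text{ rank } \, A\) used in the proof.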

4.3 Suslov’s Optimal Control Problem

Let us now turn our attention to the optimal control of Suslov’s problem by varying the direction \(\varvec{\xi }(t)\). The general theory was outlined in Sect. 3.2, so we will go through the computations briefly, while at the same time trying to make this section as self-contained as possible. Suppose it is desired to maneuver Suslov’s rigid body from a prescribed initial body angular velocity \(\varvec{\Omega }_a \in E\) at a prescribed initial time \(t=a\) to another prescribed terminal body angular velocity \(\varvec{\Omega }_b \in E\) at a fixed or free terminal time \(t=b,\) where \(b \ge a\), subject to minimizing some time-dependent cost function C over the duration of the maneuver (such as minimizing the energy of the control vector \(\varvec{\xi }\) or minimizing the duration \(b-a\) of the maneuver). Note that since a solution to Suslov’s problem conserves kinetic energy, it is always assumed that \(\varvec{\Omega }_a,\varvec{\Omega }_b \in E\). Thus, a time-varying control vector \(\varvec{\xi }\) and terminal time b are sought that generate a time-varying body angular velocity \(\varvec{\Omega }\), such that \(\varvec{\Omega }(a)=\varvec{\Omega }_a \in E\), \(\varvec{\Omega }(b)=\varvec{\Omega }_b \in E\), \(\left\langle \varvec{\Omega }(a),\varvec{\xi }(a) \right\rangle =0\), the pure equations of motion \(\mathbf {q}=\mathbf {0}\) are satisfied for \(a \le t \le b\), and \(\int _a^b C \mathrm {d}t\) is minimized.

The natural way to formulate this optimal control problem is:

$$\begin{aligned} \min _{\varvec{\xi },\,b} \int _a^b C \, \mathrm {d} t \, \text{ s.t. } \, \left\{ \begin{array}{ll} \mathbf {q}=0, \\ \varvec{\Omega }(a)=\varvec{\Omega }_a \in E,\\ \left\langle \varvec{\Omega }(a),\varvec{\xi }(a)\right\rangle =0,\\ \varvec{\Omega }(b)=\varvec{\Omega }_b \in E. \end{array} \right. \end{aligned}$$
(4.27)

The collection of constraints in (4.27) is actually overdetermined. To see this, suppose \(\left( \varvec{\Omega },\varvec{\xi }\right) \) satisfies \(\mathbf {q}=\mathbf {0}\), \(\varvec{\Omega }(a)=\varvec{\Omega }_a \in E\), and \(\left\langle \varvec{\Omega }(a),\varvec{\xi }(a)\right\rangle =0\). Then the solution remains on the constant kinetic energy ellipsoid E, so that \(\varvec{\Omega }(b) \in E\), a 2-d manifold. Thus, only two rather than three parameters of \(\varvec{\Omega }(b)\) need to be prescribed, and the constraint \(\varvec{\Omega }(b)=\varvec{\Omega }_b\) in (4.27) is overprescribed and can lead to singular Jacobians when trying to solve (4.27) numerically, especially via the indirect method. A numerically more stable formulation of the optimal control problem is:

$$\begin{aligned} \min _{\varvec{\xi },\,b} \int _a^b C \, \mathrm {d} t \, \text{ s.t. } \, \left\{ \begin{array}{ll} \mathbf {q}=0, \\ \varvec{\Omega }(a)=\varvec{\Omega }_a \in E,\\ \left\langle \varvec{\Omega }(a),\varvec{\xi }(a)\right\rangle =0,\\ \varvec{\phi }(\varvec{\Omega }(b))=\varvec{\phi }( \varvec{\Omega }_b), \, \mathrm {where} \; \varvec{\Omega }_b \in E, \end{array} \right. \end{aligned}$$
(4.28)

and where \(\varvec{\phi }: E \rightarrow {\mathbb {R}}^2 \) is some parameterization of the 2-d manifold E. For example, \(\varvec{\phi }\) might map a point on E expressed in Cartesian coordinates to its azimuth and elevation in spherical coordinates. Using the properties of the dynamics, the problem (4.27) can be simplified further to read

$$\begin{aligned} \min _{\varvec{\xi },\,b} \int _a^b C \, \mathrm {d} t \, \text{ s.t. } \, \left\{ \begin{array}{ll} \mathbf {q}=0, \\ \varvec{\Omega }(a)=\varvec{\Omega }_a \in E,\\ \varvec{\Omega }(b)=\varvec{\Omega }_b \in E, \end{array} \right. \end{aligned}$$
(4.29)

which omits the constraint \(\left\langle \varvec{\Omega }(a),\varvec{\xi }(a)\right\rangle =0\). One can see that (4.27) and (4.29) are equivalent as follows. Suppose \(\left( \varvec{\Omega },\varvec{\xi }\right) \) satisfies \(\mathbf {q}=\mathbf {0}\), \(\varvec{\Omega }(a)=\varvec{\Omega }_a \in E\), and \(\varvec{\Omega }(b)=\varvec{\Omega }_b \in E\). Since \(\varvec{\Omega }_a,\varvec{\Omega }_b \in E\) have the same kinetic energy, i.e., \(T(a)=\frac{1}{2}\left\langle \varvec{\Omega }_a,{\mathbb {I}} \varvec{\Omega }_a \right\rangle =\frac{1}{2}\left\langle \varvec{\Omega }_b,{\mathbb {I}} \varvec{\Omega }_b \right\rangle =T(b)\), equation (4.12) shows that \(\left\langle \varvec{\Omega }(a),\varvec{\xi }(a)\right\rangle =0\) or \(\int _a^b \lambda \, \mathrm {d} t=0\). The latter possibility, \(\int _a^b \lambda \, \mathrm {d} t=0\), represents an additional constraint and thus is unlikely to occur. Thus, a solution of (4.29) should be expected to satisfy the omitted constraint \(\left\langle \varvec{\Omega }(a),\varvec{\xi }(a)\right\rangle =0\).

In what follows, we assume the following form of the cost function C in (4.29):

$$\begin{aligned} C:= C_{\alpha ,\beta ,\gamma ,\eta ,\delta } \equiv \frac{\alpha }{4} \left[ \left| \varvec{\xi }\right| ^2 -1 \right] ^2+\frac{\beta }{2} \left| {\dot{\varvec{\xi }}} \right| ^2+\frac{\gamma }{2} \left| \varvec{\Omega }- \varvec{\Omega }_d \right| ^2+\frac{\eta }{2} \left| {\dot{\varvec{\Omega }}} \right| ^2+\delta , \end{aligned}$$
(4.30)

where \(\alpha \), \(\beta \), \(\gamma \), \(\eta \), and \(\delta \) are nonnegative constant scalars. The first term in (4.30), \(\frac{\alpha }{4} \left[ \left| \varvec{\xi }\right| ^2 -1 \right] ^2\), encourages the control vector \(\varvec{\xi }\) to have near unit magnitude. The second term, \(\frac{\beta }{2} \left| {\dot{\varvec{\xi }}} \right| ^2\), encourages the control vector \(\varvec{\xi }\) to follow a minimum energy trajectory. The first term is needed because the magnitude of \(\varvec{\xi }\) does not affect a solution of \(\mathbf {q}=0\); in its absence, the second term will try to shrink \(\varvec{\xi }\) to \(\mathbf {0}\), causing numerical instability. An alternative to including the first term is to revise the formulation of the optimal control problem to include the path constraint \(\left| \varvec{\xi }\right| =1\). The third term, \(\frac{\gamma }{2} \left| \varvec{\Omega }- \varvec{\Omega }_d \right| ^2\), encourages the body angular velocity \(\varvec{\Omega }\) to follow a prescribed, time-varying trajectory \(\varvec{\Omega }_d\). The fourth term, \(\frac{\eta }{2} \left| {\dot{\varvec{\Omega }}} \right| ^2\), encourages the body angular velocity vector \(\varvec{\Omega }\) to follow a minimum energy trajectory. The final term, \(\delta \), encourages a minimum time solution.
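For concreteness, the running cost (4.30) translates directly into code. The following sketch (assuming numpy; the function name is ours) evaluates C for given states, rates, and weights:

```python
import numpy as np

def running_cost(xi, xi_dot, Om, Om_dot, Om_d, alpha, beta, gamma, eta, delta):
    """Running cost C_{alpha,beta,gamma,eta,delta} from (4.30)."""
    return (alpha / 4 * (xi @ xi - 1.0) ** 2       # keep |xi| near 1
            + beta / 2 * (xi_dot @ xi_dot)         # minimum-energy control trajectory
            + gamma / 2 * np.sum((Om - Om_d) ** 2) # track a desired angular velocity
            + eta / 2 * (Om_dot @ Om_dot)          # minimum-energy Omega trajectory
            + delta)                               # constant term: minimum time

# with a unit control vector, zero rates, and Omega on target,
# only the minimum-time term delta remains
c = running_cost(np.array([1.0, 0.0, 0.0]), np.zeros(3),
                 np.ones(3), np.zeros(3), np.ones(3),
                 alpha=1.0, beta=1.0, gamma=1.0, eta=1.0, delta=0.5)
print(c)  # 0.5
```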

As in Sect. 4.2, using state-space terminology, the state is \(\mathbf x \equiv \left[ \begin{array}{c} \varvec{\Omega }\\ \varvec{\xi }\\ \end{array} \right] \) and the control is \(\mathbf {u} \equiv {\dot{\varvec{\xi }}}\) for the optimal control problem (4.29). It is always assumed that the control \(\mathbf {u} = {\dot{\varvec{\xi }}}\) is differentiable, and therefore continuous, or equivalently that \(\varvec{\xi }\) is twice differentiable.

4.4 Derivation of Suslov’s Optimally Controlled Equations of Motion

Following the method of Bryson (1975) and Hull (2013), to construct a control vector \(\varvec{\xi }\) and terminal time b solving (4.29), the pure equations of motion are added to the cost function through a time-varying Lagrange multiplier vector, and the initial and terminal constraints are added using constant Lagrange multiplier vectors. A control vector \(\varvec{\xi }\) and terminal time b are sought that minimize the performance index

$$\begin{aligned} S = \left\langle \varvec{\rho }, \varvec{\Omega }(a)-\varvec{\Omega }_a \right\rangle +\left\langle \varvec{\nu }, \varvec{\Omega }(b)-\varvec{\Omega }_b \right\rangle +\int _a^b \left[ C+\left\langle \varvec{\kappa },\mathbf {q} \right\rangle \right] \mathrm {d}t, \end{aligned}$$
(4.31)

where \(\varvec{\rho }\) and \(\varvec{\nu }\) are constant Lagrange multiplier vectors enforcing the initial and terminal constraints \(\varvec{\Omega }(a)=\varvec{\Omega }_a\) and \(\varvec{\Omega }(b)=\varvec{\Omega }_b\) and \(\varvec{\kappa }\) is a time-varying Lagrange multiplier vector enforcing the pure equations of motion defined by \(\mathbf {q}=\mathbf {0}\) as given in (4.5). In the literature, the time-varying Lagrange multiplier vector used to adjoin the equations of motion to the cost function is often called the adjoint variable or the costate. Henceforth, the time-varying Lagrange multiplier vector is referred to as the costate.

The control vector \(\varvec{\xi }\) and terminal time b minimizing S are found by determining conditions under which the differential of S, \(\mathrm {d}S\), equals 0. The differential of S is defined as the first-order change in S with respect to changes in \(\varvec{\kappa }\), \(\varvec{\Omega }\), \(\varvec{\xi }\), \(\varvec{\rho }\), \(\varvec{\nu }\), and b. Assuming that the cost function is of the form \(C\left( \varvec{\Omega },{\dot{\varvec{\Omega }}},\varvec{\xi },{\dot{\varvec{\xi }}},t\right) \), equating the differential of S to zero gives, after either some rather tedious direct calculations or an application of the results of Sect. 3.2, Suslov’s optimally controlled equations of motion:

$$\begin{aligned} {\dot{\varvec{\Omega }}}= & {} \frac{{{\mathbb {I}}}^{-1}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \left\{ \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }- \left[ \left\langle \varvec{\Omega },{\dot{\varvec{\xi }}} \right\rangle +\left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \right] \varvec{\xi }\right\} \nonumber \\ {\dot{\varvec{\kappa }}}= & {} \frac{{{\mathbb {I}}}^{-1}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \left\{ \left( \frac{\partial C}{\partial \varvec{\Omega }}-\frac{\mathrm {d} }{\mathrm {d} t}\frac{\partial C}{\partial {\dot{\varvec{\Omega }}}} \right) ^\mathsf {T}-\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \left[ {\mathbb {I}} \left( \varvec{\Omega }\times \varvec{\kappa }\right) + \varvec{\kappa }\times \left( {\mathbb {I}} \varvec{\Omega }\right) \right] \right. \nonumber \\&\left. 
+ \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle \left[ {\dot{\varvec{\xi }}} +{\mathbb {I}}\left( \varvec{\Omega }\times \left( {\mathbb {I}}^{-1} \varvec{\xi }\right) \right) +\left( {\mathbb {I}}^{-1} \varvec{\xi }\right) \times \left( {\mathbb {I}} \varvec{\Omega }\right) \right] -2\left\langle {\dot{\varvec{\xi }}}, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle {\mathbb {I}} \varvec{\kappa }\right\} \nonumber \\ \left( \frac{\mathrm {d}}{\mathrm {d}t}\frac{\partial C}{\partial {\dot{\varvec{\xi }}}} \right) ^\mathsf {T}= & {} \left( \frac{\partial C}{\partial \varvec{\xi }} \right) ^\mathsf {T} - 2 \left\langle \varvec{\kappa },\left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }\right\rangle {\mathbb {I}}^{-1} \varvec{\xi }+ \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle {\mathbb {I}}^{-1} \left( \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }\right) \nonumber \\&+ \left[ \left\langle \varvec{\Omega },{\dot{\varvec{\xi }}} \right\rangle + \left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \right] \varvec{\kappa }- \left\langle \varvec{\kappa },{\dot{\varvec{\xi }}}\right\rangle \varvec{\Omega }\nonumber \\&-\left[ \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle I - 2 {\mathbb {I}}^{-1} \varvec{\xi }\left( {\mathbb {I}} \varvec{\kappa }\right) ^\mathsf {T} \right] {\dot{\varvec{\Omega }}} -\varvec{\Omega }\varvec{\xi }^\mathsf {T} {\dot{\varvec{\kappa }}}, \end{aligned}$$
(4.32)

for \(a \le t \le b\), the left boundary conditions

$$\begin{aligned} \begin{aligned} \varvec{\Omega }(a) -\varvec{\Omega }_a&= 0 \\ \left[ \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle \varvec{\Omega }+ \left( \frac{\partial C}{\partial {\dot{\varvec{\xi }}}} \right) ^\mathsf {T} \right] _{t=a}&= 0, \end{aligned} \end{aligned}$$
(4.33)

and the right boundary conditions

$$\begin{aligned} \begin{aligned} \varvec{\Omega }(b) -\varvec{\Omega }_b&= 0 \\ \left[ \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle \varvec{\Omega }+\left( \frac{\partial C}{\partial {\dot{\varvec{\xi }}}} \right) ^\mathsf {T} \right] _{t=b}&= 0 \\ \left[ C+\left\langle \varvec{\kappa },\mathbf {q} \right\rangle - \left\langle \left\langle \varvec{\xi },{\mathbb {I}}^{-1}\varvec{\xi }\right\rangle {\mathbb {I}} \varvec{\kappa }+\left( \frac{\partial C}{\partial {\dot{\varvec{\Omega }}}} \right) ^\mathsf {T}, {\dot{\varvec{\Omega }}} \right\rangle -\left\langle \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle \varvec{\Omega }+ \left( \frac{\partial C}{\partial {\dot{\varvec{\xi }}}} \right) ^\mathsf {T},{\dot{\varvec{\xi }}} \right\rangle \right] _{t=b}&= 0. \end{aligned} \end{aligned}$$
(4.34)

Using the first equation in (4.32), which is equivalent to (4.5), and the second equation in (4.34), the third equation in (4.34), corresponding to free terminal time, can be simplified, so that the right boundary conditions simplify to

$$\begin{aligned} \begin{aligned} \varvec{\Omega }(b) -\varvec{\Omega }_b&= 0 \\ \left[ \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle \varvec{\Omega }+\left( \frac{\partial C}{\partial {\dot{\varvec{\xi }}}} \right) ^\mathsf {T} \right] _{t=b}&= 0 \\ \left[ C - \left\langle \left\langle \varvec{\xi },{\mathbb {I}}^{-1}\varvec{\xi }\right\rangle {\mathbb {I}} \varvec{\kappa }+\left( \frac{\partial C}{\partial {\dot{\varvec{\Omega }}}} \right) ^\mathsf {T}, {\dot{\varvec{\Omega }}} \right\rangle \right] _{t=b}&= 0. \end{aligned} \end{aligned}$$
(4.35)

Equations (4.32), (4.33), and (4.35) form a two-point boundary value problem (TPBVP). Observe that the unknowns in this TPBVP are \(\varvec{\kappa }\), \(\varvec{\Omega }\), \(\varvec{\xi }\), and b, while the constant Lagrange multiplier vectors \(\varvec{\rho }\) and \(\varvec{\nu }\) are irrelevant.

This application of Pontryagin’s minimum principle differs slightly from the classical treatment of optimal control theory reviewed in Sect. 2.3. Let us connect our derivation to that section. In the classical application, the Hamiltonian involves 6 costates \(\varvec{\pi }\in {\mathbb {R}}^6\) and is given by

$$\begin{aligned} H= & {} L\left( \varvec{\Omega },\varvec{\xi },\varvec{u},t\right) \nonumber \\&+\left\langle \varvec{\pi }, \left[ \begin{array}{c} \frac{{{\mathbb {I}}}^{-1}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \left\{ \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \left( {\mathbb {I}} \varvec{\Omega }\right) {\times } \varvec{\Omega }{-} \left[ \left\langle \varvec{\Omega },\varvec{u}\right\rangle +\left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \right] \varvec{\xi }\right\} \\ \varvec{u}\end{array} \right] \right\rangle ,\nonumber \\ \end{aligned}$$
(4.36)

whereas in our derivation above, the Hamiltonian involves only 3 costates \(\varvec{\kappa }\in {\mathbb {R}}^3\) and is given by

$$\begin{aligned} H_r {=} C\left( \varvec{\Omega },{\dot{\varvec{\Omega }}},\varvec{\xi },{\dot{\varvec{\xi }}},t\right) {-}\left\langle \varvec{\kappa },\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }{-} \left[ \left\langle \varvec{\Omega },{\dot{\varvec{\xi }}} \right\rangle +\left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \right] \varvec{\xi }\right\rangle ,\nonumber \\ \end{aligned}$$
(4.37)

with \(L\left( \varvec{\Omega },\varvec{\xi },\varvec{u},t\right) =C\left( \varvec{\Omega },{\dot{\varvec{\Omega }}},\varvec{\xi },{\dot{\varvec{\xi }}},t\right) \), since \({\dot{\varvec{\Omega }}}\) is a function of \(\varvec{\Omega }\), \(\varvec{\xi }\), and \({\dot{\varvec{\xi }}}\) and since \(\varvec{u}= {\dot{\varvec{\xi }}}\).

It can be shown that the classical costates \(\varvec{\pi }\) can be obtained from the reduced costates \(\varvec{\kappa }\), derived here, via

$$\begin{aligned} \varvec{\pi }= - \left[ \begin{array}{c} \left\langle \varvec{\xi },{\mathbb {I}}^{-1}\varvec{\xi }\right\rangle {\mathbb {I}} \varvec{\kappa }+\left( \frac{\partial C}{\partial {\dot{\varvec{\Omega }}}} \right) ^\mathsf {T} \\ \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle \varvec{\Omega }+\left( \frac{\partial C}{\partial {\dot{\varvec{\xi }}}} \right) ^\mathsf {T} \end{array} \right] . \end{aligned}$$
(4.38)

Now consider the particular cost function (4.30) corresponding to the optimal control problem (4.29). For this cost function, the partial derivative of the Hamiltonian (4.36) with respect to the control \(\varvec{u}={\dot{\varvec{\xi }}}\) is

$$\begin{aligned} \begin{aligned} H_{\varvec{u}} = H_{{\dot{\varvec{\xi }}}} =\frac{\partial L}{\partial \varvec{u}}+\varvec{\pi }_d^\mathsf {T}\frac{\partial {\dot{\varvec{\Omega }}}}{\partial \varvec{u}}+\varvec{\pi }_e^\mathsf {T}&=\frac{\partial C_{\alpha ,\beta ,\gamma ,\eta ,\delta }}{\partial {\dot{\varvec{\Omega }}}}\frac{\partial {\dot{\varvec{\Omega }}}}{\partial {\dot{\varvec{\xi }}}}+\frac{\partial C_{\alpha ,\beta ,\gamma ,\eta ,\delta }}{\partial {\dot{\varvec{\xi }}}}+\varvec{\pi }_d^\mathsf {T}\frac{\partial {\dot{\varvec{\Omega }}}}{\partial {\dot{\varvec{\xi }}}}+\varvec{\pi }_e^\mathsf {T} \\&=\eta {{\dot{\varvec{\Omega }}}}^\mathsf {T} \frac{\partial {\dot{\varvec{\Omega }}}}{\partial {\dot{\varvec{\xi }}}}+\beta { {\dot{\varvec{\xi }}}}^\mathsf {T}+\varvec{\pi }_d^\mathsf {T}\frac{\partial {\dot{\varvec{\Omega }}}}{\partial {\dot{\varvec{\xi }}}}+\varvec{\pi }_e^\mathsf {T}, \end{aligned} \end{aligned}$$
(4.39)

where we have defined for brevity

$$\begin{aligned} \varvec{\pi }_d \equiv \begin{bmatrix} \pi _1 \\ \pi _2 \\ \pi _3 \end{bmatrix} \qquad \text{ and } \qquad \varvec{\pi }_e \equiv \begin{bmatrix} \pi _4 \\ \pi _5 \\ \pi _6 \end{bmatrix} \qquad \text{ and } \text{ where } \qquad \frac{\partial {\dot{\varvec{\Omega }}}}{\partial {\dot{\varvec{\xi }}}} = -\frac{{{\mathbb {I}}}^{-1} \varvec{\xi }\varvec{\Omega }^\mathsf {T}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } . \end{aligned}$$
(4.40)

The second partial derivative of the Hamiltonian (4.36) with respect to the control \(\varvec{u}={\dot{\varvec{\xi }}}\) is

$$\begin{aligned} \begin{aligned} H_{\varvec{u}\varvec{u}}&= H_{{\dot{\varvec{\xi }}} {\dot{\varvec{\xi }}}}=\eta \left( \frac{\partial {\dot{\varvec{\Omega }}}}{\partial {\dot{\varvec{\xi }}}} \right) ^\mathsf {T} \frac{\partial {\dot{\varvec{\Omega }}}}{\partial {\dot{\varvec{\xi }}}}+\beta I \\&=\frac{\eta }{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle ^2} \left[ {{\mathbb {I}}}^{-1} \varvec{\xi }\varvec{\Omega }^\mathsf {T} \right] ^\mathsf {T} \left[ {{\mathbb {I}}}^{-1} \varvec{\xi }\varvec{\Omega }^\mathsf {T} \right] +\beta I ={\tilde{c}} \varvec{\Omega }\varvec{\Omega }^\mathsf {T}+\beta I, \end{aligned} \end{aligned}$$
(4.41)

where \({\tilde{c}} =\frac{\eta }{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle ^2} \varvec{\xi }^\mathsf {T} {\mathbb {I}}^{-2} \varvec{\xi }\) is a nonnegative scalar. Recall that it is assumed that \(\beta \ge 0\). If \(\beta =0\), then \(H_{\varvec{u}\varvec{u}}={\tilde{c}} \varvec{\Omega }\varvec{\Omega }^\mathsf {T}\) is singular since \(\varvec{\Omega }\varvec{\Omega }^\mathsf {T}\) is a rank 1 matrix. Hence, if \(H_{\varvec{u}\varvec{u}}\) is nonsingular, then \(\beta > 0\). Now suppose that \(\beta >0\). Part of the Sherman–Morrison formula (Dahlquist and Björck 1974) says that given an invertible matrix \(A \in {\mathbb {R}}^{n \times n}\) and \(\varvec{w},\varvec{v}\in {\mathbb {R}}^{n \times 1}\), \(A+\varvec{w}\varvec{v}^\mathsf {T}\) is invertible if and only if \(1+\varvec{v}^\mathsf {T}A^{-1} \varvec{w}\ne 0\). Letting \(A = \beta I\) and \(\varvec{w}=\varvec{v}=\sqrt{{\tilde{c}}}\varvec{\Omega }\), the Sherman–Morrison formula guarantees that \(H_{\varvec{u}\varvec{u}}\) is nonsingular if \(1+\frac{{\tilde{c}}}{\beta } \varvec{\Omega }^\mathsf {T} \varvec{\Omega }\ne 0\). But \(\frac{{\tilde{c}}}{\beta } \varvec{\Omega }^\mathsf {T} \varvec{\Omega }\ge 0\), so \(1+\frac{{\tilde{c}}}{\beta } \varvec{\Omega }^\mathsf {T} \varvec{\Omega }\ge 1\) and \(H_{\varvec{u}\varvec{u}}\) is nonsingular. Therefore, \(H_{\varvec{u}\varvec{u}}\), and hence the optimal control problem (4.29), is nonsingular if and only if \(\beta >0\). Since singular optimal control problems require careful analysis and solution methods, it is assumed for the remainder of this paper, except in Sect. 5.1, that \(\beta >0\) when considering the optimal control problem (4.29); as explained in the paragraph after (4.30), \(\beta >0\) in turn requires that \(\alpha >0\).
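This nonsingularity argument is easy to confirm numerically. The sketch below (assuming numpy and arbitrarily chosen values of \(\varvec{\Omega }\), \(\varvec{\xi }\), \(\eta \), \(\beta \), and \({\mathbb {I}}\)) checks that \(H_{\varvec{u}\varvec{u}}\) is singular when \(\beta =0\) and that the Sherman–Morrison inverse, which reappears as the bracketed matrix in the explicit system (4.45) below, matches a direct inversion when \(\beta >0\):

```python
import numpy as np

d = np.array([1.0, 2.0, 3.0])                    # assumed principal moments of inertia
Iinv = np.diag(1.0 / d)
Om = np.array([0.4, -0.9, 0.2])
xi = np.array([0.9, 0.4, -0.3])
beta, eta = 0.7, 1.3
Q = xi @ Iinv @ xi                               # <xi, I^{-1} xi>
s = xi @ Iinv @ Iinv @ xi                        # <xi, I^{-2} xi>
c_tilde = eta / Q**2 * s                         # as defined after (4.41)

H_uu = lambda b: c_tilde * np.outer(Om, Om) + b * np.eye(3)
print(np.linalg.matrix_rank(H_uu(0.0)))          # 1: singular when beta = 0

# Sherman-Morrison: (beta I + c ww^T)^{-1} = (1/beta)(I - c/(beta + c|w|^2) ww^T)
M = (1 / beta) * (np.eye(3)
                  - eta * s / (beta * Q**2 + eta * s * (Om @ Om)) * np.outer(Om, Om))
assert np.allclose(M, np.linalg.inv(H_uu(beta)))
```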

For the particular cost function (4.30), with \(\alpha >0\), \(\beta >0\), \(\gamma \ge 0\), \(\eta \ge 0\), and \(\delta \ge 0\), the optimally controlled equations of motion (4.32) defined on \(a\le t\le b\) become

$$\begin{aligned} {\dot{\varvec{\Omega }}}&= \frac{{{\mathbb {I}}}^{-1}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \left\{ \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }- \left[ \left\langle \varvec{\Omega },{\dot{\varvec{\xi }}} \right\rangle +\left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \right] \varvec{\xi }\right\} \nonumber \\ {\dot{\varvec{\kappa }}}&= \frac{{{\mathbb {I}}}^{-1}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \left\{ \gamma \left( \varvec{\Omega }- \varvec{\Omega }_d \right) -\eta {\ddot{\varvec{\Omega }}}-\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \left[ {\mathbb {I}} \left( \varvec{\Omega }\times \varvec{\kappa }\right) + \varvec{\kappa }\times \left( {\mathbb {I}} \varvec{\Omega }\right) \right] \right. \nonumber \\&\qquad \left. + \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle \left[ {\dot{\varvec{\xi }}} +{\mathbb {I}}\left( \varvec{\Omega }\times \left( {\mathbb {I}}^{-1} \varvec{\xi }\right) \right) +\left( {\mathbb {I}}^{-1} \varvec{\xi }\right) \times \left( {\mathbb {I}} \varvec{\Omega }\right) \right] -2\left\langle {\dot{\varvec{\xi }}}, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle {\mathbb {I}} \varvec{\kappa }\right\} \\ {\ddot{\varvec{\xi }}}&= \frac{1}{\beta } \left\{ \alpha \left( \left| \varvec{\xi }\right| ^2 -1 \right) \varvec{\xi }- 2 \left\langle \varvec{\kappa },\left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }\right\rangle {\mathbb {I}}^{-1} \varvec{\xi }+ \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle {\mathbb {I}}^{-1} \left( \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }\right) \right. 
\nonumber \\&\qquad + \left[ \left\langle \varvec{\Omega },{\dot{\varvec{\xi }}} \right\rangle + \left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \right] \varvec{\kappa }- \left\langle \varvec{\kappa },{\dot{\varvec{\xi }}}\right\rangle \varvec{\Omega }\nonumber \\&\qquad \left. - \left[ \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle I - 2 {\mathbb {I}}^{-1} \varvec{\xi }\left( {\mathbb {I}} \varvec{\kappa }\right) ^\mathsf {T} \right] {\dot{\varvec{\Omega }}} - \varvec{\Omega }\left\langle \varvec{\xi }, {\dot{\varvec{\kappa }}} \right\rangle \right\} ,\nonumber \end{aligned}$$
(4.42)

the left boundary conditions (4.33) become

$$\begin{aligned} \begin{aligned} \varvec{\Omega }(a) -\varvec{\Omega }_a&= 0 \\ \left[ \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle \varvec{\Omega }+\beta {\dot{\varvec{\xi }}} \right] _{t=a}&= 0, \end{aligned} \end{aligned}$$
(4.43)

and the right boundary conditions (4.35) become

$$\begin{aligned} \begin{aligned} \varvec{\Omega }(b) -\varvec{\Omega }_b&= 0 \\ \left[ \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle \varvec{\Omega }+\beta {\dot{\varvec{\xi }}} \right] _{t=b}&= 0 \\ \left[ \frac{\alpha }{4} \left[ \left| \varvec{\xi }\right| ^2-1 \right] ^2+\frac{\beta }{2} \left| {\dot{\varvec{\xi }}} \right| ^2+\frac{\gamma }{2} \left| \varvec{\Omega }- \varvec{\Omega }_d \right| ^2+\delta -\frac{\eta }{2} \left| {\dot{\varvec{\Omega }}} \right| ^2 - \left\langle \left\langle \varvec{\xi },{\mathbb {I}}^{-1}\varvec{\xi }\right\rangle {\mathbb {I}} \varvec{\kappa }, {\dot{\varvec{\Omega }}} \right\rangle \right] _{t=b}&= 0. \end{aligned} \end{aligned}$$
(4.44)

System (4.42) is implicit, since \({\dot{\varvec{\kappa }}}\) depends on \({\ddot{\varvec{\Omega }}}\), which in turn depends on \({\ddot{\varvec{\xi }}}\), while \({\ddot{\varvec{\xi }}}\) depends on \({\dot{\varvec{\kappa }}}\). While one can in principle solve these equations as an implicit system of ODEs, an explicit expression for the highest derivatives can be found, which also reveals possible singularities in the system. The system can be written explicitly as

$$\begin{aligned} \begin{aligned} {\dot{\varvec{\Omega }}}&= \frac{{{\mathbb {I}}}^{-1}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \left\{ \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }- \left[ \left\langle \varvec{\Omega },{\dot{\varvec{\xi }}} \right\rangle +\left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \right] \varvec{\xi }\right\} \\ {\ddot{\varvec{\xi }}}&= \frac{1}{\beta } \left[ I - \frac{ \eta \left\langle \varvec{\xi },{\mathbb {I}}^{-2} \varvec{\xi }\right\rangle }{\beta \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle ^2+\eta \left\langle \varvec{\xi },{\mathbb {I}}^{-2} \varvec{\xi }\right\rangle \left\langle \varvec{\Omega },\varvec{\Omega }\right\rangle }\varvec{\Omega }\varvec{\Omega }^\mathsf {T} \right] \left\{ \mathbf{h} - \varvec{\Omega }\left\langle \varvec{\xi },\mathbf{g} \right\rangle \right\} \\ {\dot{\varvec{\kappa }}}&= \mathbf{g} + \frac{\eta \left\langle \varvec{\Omega }, {\ddot{\varvec{\xi }}} \right\rangle }{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle ^2} {{\mathbb {I}}}^{-2} \varvec{\xi }, \end{aligned} \end{aligned}$$
(4.45)

where we have defined \(\dot{\mathbf{n}}_1\) as

$$\begin{aligned} \dot{\mathbf{n}}_1 = \left\langle {\dot{\varvec{\Omega }}} ,{\dot{\varvec{\xi }}} \right\rangle +\left\langle \left( {\mathbb {I}} {\dot{\varvec{\Omega }}} \right) \times \varvec{\Omega }+\left( {\mathbb {I}} \varvec{\Omega }\right) \times {\dot{\varvec{\Omega }}},{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle +\left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} {\dot{\varvec{\xi }}} \right\rangle , \end{aligned}$$
(4.46)

\(\mathbf{g}\) as

$$\begin{aligned} \begin{aligned} \mathbf{g}&= \frac{{{\mathbb {I}}}^{-1}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \left\{ \gamma \left( \varvec{\Omega }- \varvec{\Omega }_d \right) \right. \\&\qquad -\eta {\mathbb {I}}^{-1} \left\{ \left( {\mathbb {I}} {\dot{\varvec{\Omega }}} \right) \times \varvec{\Omega }+ \left( {\mathbb {I}} \varvec{\Omega }\right) \times {\dot{\varvec{\Omega }}}-\left[ \frac{\left\langle \varvec{\Omega },{\dot{\varvec{\xi }}} \right\rangle +\left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle }{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \right] {\dot{\varvec{\xi }}} \right. \\&\left. -\left[ \frac{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \dot{\mathbf{n}}_1-2\left[ \left\langle \varvec{\Omega },{\dot{\varvec{\xi }}} \right\rangle +\left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \right] \left\langle {\dot{\varvec{\xi }}},{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle }{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle ^2} \right] \varvec{\xi }\right\} \\&\qquad -\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \left[ {\mathbb {I}} \left( \varvec{\Omega }\times \varvec{\kappa }\right) + \varvec{\kappa }\times \left( {\mathbb {I}} \varvec{\Omega }\right) \right] \\&\qquad \left. + \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle \left[ {\dot{\varvec{\xi }}} +{\mathbb {I}}\left( \varvec{\Omega }\times \left( {\mathbb {I}}^{-1} \varvec{\xi }\right) \right) +\left( {\mathbb {I}}^{-1} \varvec{\xi }\right) \times \left( {\mathbb {I}} \varvec{\Omega }\right) \right] -2\left\langle {\dot{\varvec{\xi }}}, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle {\mathbb {I}} \varvec{\kappa }\right\} , \\ \end{aligned} \end{aligned}$$
(4.47)

and \(\mathbf{h}\) as

$$\begin{aligned} \begin{aligned} \mathbf{h}&= \alpha \left( \left| \varvec{\xi }\right| ^2 -1 \right) \varvec{\xi }- 2 \left\langle \varvec{\kappa },\left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }\right\rangle {\mathbb {I}}^{-1} \varvec{\xi }+ \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle {\mathbb {I}}^{-1} \left( \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }\right) \\&\qquad + \left[ \left\langle \varvec{\Omega },{\dot{\varvec{\xi }}} \right\rangle + \left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \right] \varvec{\kappa }- \left\langle \varvec{\kappa },{\dot{\varvec{\xi }}}\right\rangle \varvec{\Omega }\\&\qquad - \left[ \left\langle \varvec{\kappa },\varvec{\xi }\right\rangle I - 2 {\mathbb {I}}^{-1} \varvec{\xi }\left( {\mathbb {I}} \varvec{\kappa }\right) ^{\mathsf {T}} \right] {\dot{\varvec{\Omega }}}. \end{aligned} \end{aligned}$$
(4.48)

The ODEs (4.45) and the left and right boundary conditions (4.43)–(4.44) define a TPBVP for the solution of Suslov’s optimal control problem (4.29) with the cost function (4.30). We also note that while casting the optimal control problem as an explicit system of ODEs such as (4.45) brings it to a standard form amenable to numerical solution, this formulation obscures the geometric structure of the optimal control problem derived earlier in Sect. 3.2.

Remark 4.10

(On optimal solutions with switching structure and bang-bang control) It is worth noting that in this paper we allow the control \({\dot{\varvec{\xi }}}\) to be unbounded, so that it may take arbitrary values in \({\mathbb {R}}^3\). In addition, note that at the end of the previous section, the control \({\dot{\varvec{\xi }}}\) is assumed to be differentiable and therefore continuous. However, if we were to impose a restriction on the control such as \(|{\dot{\varvec{\xi }}}|\le M\) for a fixed \(|\varvec{\xi }|\), say \(|\varvec{\xi }|=1\), and permit \({\dot{\varvec{\xi }}}\) to be piecewise continuous, then the solutions of the optimal control problem would tend to exhibit bang-bang control, obtained by piecing together solutions with \(|{\dot{\varvec{\xi }}}|=M\). The constraint \(|\varvec{\xi }|=1\) is equivalent to the constraint \(\left\langle \varvec{\xi },{\dot{\varvec{\xi }}} \right\rangle =0\) together with the initial condition \(\left| \varvec{\xi }(a)\right| =1\). The constraint \(|{\dot{\varvec{\xi }}}|\le M\) is equivalent to the equality constraint \(|{\dot{\varvec{\xi }}}|^2-M^2+\theta ^2 = 0\), where \(\theta \) is a so-called slack variable. To incorporate these constraints, the Hamiltonian given in (4.36) must be amended to

$$\begin{aligned} \begin{aligned} H&= L\left( \varvec{\Omega },\varvec{\xi },\varvec{u},t\right) \\&\qquad +\left\langle \varvec{\pi }, \left[ \begin{array}{c} \frac{{{\mathbb {I}}}^{-1}}{\left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle } \left\{ \left\langle \varvec{\xi }, {\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega }- \left[ \left\langle \varvec{\Omega },\varvec{u}\right\rangle +\left\langle \left( {\mathbb {I}} \varvec{\Omega }\right) \times \varvec{\Omega },{\mathbb {I}}^{-1} \varvec{\xi }\right\rangle \right] \varvec{\xi }\right\} \\ \varvec{u}\end{array} \right] \right\rangle \\&\qquad +\left\langle \varvec{\mu }, \begin{array}{c} \left\langle \varvec{\xi },{\dot{\varvec{\xi }}} \right\rangle \\ \left| {\dot{\varvec{\xi }}} \right| ^2-M^2+\theta ^2 \end{array} \right\rangle , \end{aligned} \end{aligned}$$
(4.49)

where \(\varvec{\mu }= \begin{bmatrix} \mu _1 \\ \mu _2 \end{bmatrix} \in {\mathbb {R}}^2\) are new costates enforcing the new constraints, and the control now consists of \(\varvec{u}= {\dot{\varvec{\xi }}}\) and \(\theta \). A solution of the optimal control problem with Hamiltonian (4.49) is determined from the necessary optimality conditions \(H_{\varvec{u}} = \mathbf {0}\) and \(H_\theta = 2 \mu _2 \theta = 0\). The latter condition implies that \(\mu _2=0\) or \(\theta =0\). If \(\mu _2 = 0\), the control \(\varvec{u}= {\dot{\varvec{\xi }}}\) is determined from \(H_{\varvec{u}} = \mathbf {0}\) and \(\theta \) is determined from \(\theta ^2 = M^2 - | {\dot{\varvec{\xi }}} |^2\). If \(\theta = 0\), the control \(\varvec{u}= {\dot{\varvec{\xi }}}\) satisfies \(| {\dot{\varvec{\xi }}} | = M\) and \(\mu _2\) is determined from \(H_{\varvec{u}} = \mathbf {0}\). The difficulty lies in determining the intervals on which \(\mu _2=0\) or \(\theta =0\); this is the so-called optimal switching structure. In this paper, this difficulty is avoided by assuming that the control \(\varvec{u}= {\dot{\varvec{\xi }}}\) is unbounded and differentiable rather than bounded and piecewise continuous. Instead of bounding the control \(\varvec{u}= {\dot{\varvec{\xi }}}\) through hard constraints, large-magnitude controls are penalized by the term \(\frac{\beta }{2} \left| {\dot{\varvec{\xi }}} \right| ^2\) in the cost function (4.30).
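The branch structure above (either the multiplier or the slack vanishes) can be illustrated on a static analogue. The sketch below is purely hypothetical: a scalar cost rewarding a large control \(u\) under the hard bound \(|u|\le M\), chosen so that the bound becomes active and the solution sits on the \(\theta =0\) ("bang") branch, with the slack recovered from \(\theta ^2 = M^2-u^2\).

```python
import numpy as np
from scipy.optimize import minimize

# Static analogue (hypothetical, for illustration only): minimize a cost that
# rewards a large control u under the hard bound |u| <= M.
M = 2.0
cost = lambda u: -u[0] + 0.05 * u[0] ** 2      # unconstrained minimum at u = 10

res = minimize(cost, x0=[0.0],
               constraints=[{"type": "ineq", "fun": lambda u: M**2 - u[0]**2}])
u_opt = res.x[0]

# The bound is active, so the slack theta^2 = M^2 - u_opt^2 vanishes:
# this is the theta = 0 ("bang") branch, with the control pinned at |u| = M.
theta_sq = M**2 - u_opt**2
print(u_opt, theta_sq)
```

Since the unconstrained minimizer lies outside the bound, the solver pushes \(u\) to the boundary \(u=M\), mirroring the bang-bang behavior described above.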

5 Numerical Solution of Suslov’s Optimal Control Problem

5.1 Analytical Solution of a Singular Version of Suslov’s Optimal Control Problem

In what follows, we focus on the numerical solution of the optimal control problem (4.29) by solving (4.45), (4.43), and (4.44), with \(\alpha >0\), \(\beta >0\), \(\gamma \ge 0\), \(\eta \ge 0\), and \(\delta \ge 0\). As these equations represent a nonlinear TPBVP, a good initial approximate solution is crucial for the convergence of numerical methods. Because of the complexity of the problem, numerical methods fail to converge unless the case considered is exceedingly simple. Instead, we employ a continuation procedure: we first solve a problem whose parameter values are chosen so that an analytical solution of (4.29) can be found, and, starting from this analytical solution, we continue the solution to the desired values of the parameters. As it turns out, this procedure enables the computation of rather complex trajectories, as illustrated by the numerical examples in Sect. 5.3.
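The continuation idea can be sketched on a scalar toy problem (purely illustrative; here fsolve stands in for the TPBVP solver, and the family \(F(x,\lambda )=x^3+x-10\lambda \) for the parameter-dependent optimality system): the parameter is stepped from an easy problem with a known solution to the target problem, warm-starting each solve with the previous solution.

```python
import numpy as np
from scipy.optimize import fsolve

# Natural-parameter continuation (illustrative toy problem, not the Suslov
# TPBVP): solve F(x, lam) = 0 at lam = 1 by stepping lam from an easy
# problem (lam = 0, with known solution x = 0).
F = lambda x, lam: x**3 + x - 10.0 * lam

x = np.array([0.0])                      # exact solution of the lam = 0 problem
for lam in np.linspace(0.0, 1.0, 11)[1:]:
    x = fsolve(F, x, args=(lam,))        # previous solution is the initial guess

print(x[0])   # root of x^3 + x = 10, i.e. x = 2
```

Each intermediate solve is cheap because the initial guess is already close to the root; the same warm-starting principle underlies the staged continuation in \(\gamma \), \(\beta \), \(\eta \), and \(\delta \) described in Sect. 5.2.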

To begin, let us consider a simplification of the optimal control problem (4.29). Suppose the terminal time is fixed to \(b=b_p\), \(\beta =0\), \(\eta =0\), and \(\delta =0\). In addition, suppose \(\varvec{\Omega }_d\) is replaced by \(\varvec{\Omega }_p\), where \(\varvec{\Omega }_p\) satisfies the following properties:

Property 5.1

\(\varvec{\Omega }_p\) is a differentiable function such that \(\varvec{\Omega }_p(a)=\varvec{\Omega }_a\) and \(\varvec{\Omega }_p(b_p)=\varvec{\Omega }_b\).

Property 5.2

\(\varvec{\Omega }_p\) lies on the constant kinetic energy manifold E, i.e., \(\left\langle {\mathbb {I}} \varvec{\Omega }_p, \varvec{\Omega }_p \right\rangle =\left\langle {\mathbb {I}} \varvec{\Omega }_a , \varvec{\Omega }_a \right\rangle \) or, equivalently (given Property 5.1), \(\left\langle {\mathbb {I}} \varvec{\Omega }_p, {\dot{\varvec{\Omega }}}_p \right\rangle =0\).

Property 5.3

\(\varvec{\Omega }_p\) does not satisfy Euler’s equations at any time, i.e., \({\mathbb {I}} {\dot{\varvec{\Omega }}}_p(t) - \left[ {\mathbb {I}} \varvec{\Omega }_p(t) \right] \times \varvec{\Omega }_p(t) \ne \mathbf {0} \; \forall t \in [a,b_p]\).

Under these assumptions, (4.29) simplifies to

$$\begin{aligned} \min _{\varvec{\xi }} \int _a^{b_p} \left[ \frac{\alpha }{4} \left[ \left| \varvec{\xi }\right| ^2 -1 \right] ^2+\frac{\gamma }{2} \left| \varvec{\Omega }- \varvec{\Omega }_p \right| ^2 \right] \, \mathrm {d} t \, \text{ s.t. } \, \left\{ \begin{array}{ll} \mathbf {q}=0, \\ \varvec{\Omega }(a)=\varvec{\Omega }_a \in E,\\ \varvec{\Omega }(b_p)=\varvec{\Omega }_b \in E. \end{array} \right. \end{aligned}$$
(5.1)

As discussed immediately after (4.41), (5.1) is a singular optimal control problem since \(\beta =0\). If there exists \(\varvec{\xi }_p\) such that \(\left| \varvec{\xi }_p \right| =1\) and \(\mathbf {q}\left( \varvec{\Omega }_p, \varvec{\xi }_p \right) =\mathbf {0}\), then \(\varvec{\xi }_p\) is a solution to the singular optimal control problem (5.1) provided Property 5.1 is satisfied. To wit, for such a \(\varvec{\xi }_p\) and given Property 5.1, take \(\varvec{\Omega }=\varvec{\Omega }_p\) and \(\varvec{\xi }=\varvec{\xi }_p\). Then, \(\mathbf {q}\left( \varvec{\Omega }, \varvec{\xi }\right) =\mathbf {q}\left( \varvec{\Omega }_p, \varvec{\xi }_p \right) =\mathbf {0}\), \(\varvec{\Omega }(a)=\varvec{\Omega }_p(a)=\varvec{\Omega }_a\), \(\varvec{\Omega }(b_p)=\varvec{\Omega }_p(b_p)=\varvec{\Omega }_b\), and \(\displaystyle \int _a^{b_p} \left[ \frac{\alpha }{4} \left[ \left| \varvec{\xi }\right| ^2 -1 \right] ^2+\frac{\gamma }{2} \left| \varvec{\Omega }- \varvec{\Omega }_p \right| ^2 \right] \, \mathrm {d} t =0\).

Now to construct such a \(\varvec{\xi }_p\), assume \(\varvec{\Omega }_p\) satisfies Properties 5.1–5.3. To motivate the construction of \(\varvec{\xi }_p\), also assume that \({\hat{\varvec{\xi }}}\) exists for which \(\mathbf {q}\left( \varvec{\Omega }_p,{\hat{\varvec{\xi }}} \right) =\mathbf {0}\), \({\hat{\varvec{\xi }}}(t) \ne \mathbf {0} \; \forall t \in \left[ a,b_p\right] \), and \(\left\langle \varvec{\Omega }_p,{\hat{\varvec{\xi }}} \right\rangle =c=0\). Since \(\left\langle \varvec{\Omega }_p,{\hat{\varvec{\xi }}} \right\rangle =c=0\), \(\mathbf {q}\left( \varvec{\Omega }_p,\pi {\hat{\varvec{\xi }}} \right) =\mathbf {0}\) for any rescaling \(\pi \) of \({\hat{\varvec{\xi }}}\). Letting \({\tilde{\varvec{\xi }}} \equiv \lambda \left( \varvec{\Omega }_p,{\hat{\varvec{\xi }}} \right) {\hat{\varvec{\xi }}} = {\mathbb {I}} {\dot{\varvec{\Omega }}}_p - \left( {\mathbb {I}} \varvec{\Omega }_p \right) \times \varvec{\Omega }_p\), we have \(\mathbf {q}\left( \varvec{\Omega }_p,{{\tilde{\varvec{\xi }}}} \right) =\mathbf {q}\left( \varvec{\Omega }_p,\lambda \left( \varvec{\Omega }_p,{\hat{\varvec{\xi }}} \right) {\hat{\varvec{\xi }}} \right) =\mathbf {0}\). Next, by Property 5.3 (i.e., \({\mathbb {I}} {\dot{\varvec{\Omega }}}_p(t) - \left[ {\mathbb {I}} \varvec{\Omega }_p(t) \right] \times \varvec{\Omega }_p(t) \ne \mathbf {0} \; \forall t \in \left[ a,b_p\right] \)), normalize \({{\tilde{\varvec{\xi }}}}\) to produce a unit-magnitude control vector \(\varvec{\xi }_p\):

$$\begin{aligned} \varvec{\xi }_p \equiv \frac{{\tilde{\varvec{\xi }}}}{\left| {\tilde{\varvec{\xi }}} \right| } = \frac{ {\mathbb {I}} {\dot{\varvec{\Omega }}}_p - \left( {\mathbb {I}} \varvec{\Omega }_p \right) \times \varvec{\Omega }_p}{\left| {\mathbb {I}} {\dot{\varvec{\Omega }}}_p - \left( {\mathbb {I}} \varvec{\Omega }_p \right) \times \varvec{\Omega }_p \right| }. \end{aligned}$$
(5.2)

Again due to scale invariance of the control vector, \(\mathbf {q}\left( \varvec{\Omega }_p,\varvec{\xi }_p \right) =\mathbf {0}\).

One can note that this derivation of \(\varvec{\xi }_p\) possessing the special properties \(\mathbf {q}\left( \varvec{\Omega }_p,\varvec{\xi }_p \right) =\mathbf {0}\) and \(\left| \varvec{\xi }_p \right| =1\) relied on the existence of some \({\hat{\varvec{\xi }}}\) for which \(\mathbf {q}\left( \varvec{\Omega }_p,{\hat{\varvec{\xi }}} \right) =\mathbf {0}\), \({\hat{\varvec{\xi }}}(t) \ne \mathbf {0} \; \forall t \in \left[ a,b_p\right] \), and \(\left\langle \varvec{\Omega }_p,{\hat{\varvec{\xi }}} \right\rangle =c=0\). Given \(\varvec{\xi }_p\) defined by (5.2) and by Property 5.2 (i.e., \(\left\langle {\mathbb {I}} \varvec{\Omega }_p, {\dot{\varvec{\Omega }}}_p \right\rangle =0\)), it is trivial to check that \(\left\langle \varvec{\Omega }_p,\varvec{\xi }_p \right\rangle =0\), so that indeed \(\mathbf {q}\left( \varvec{\Omega }_p,\varvec{\xi }_p \right) =\mathbf {0}\) with

$$\begin{aligned} \lambda \left( \varvec{\Omega }_p,\varvec{\xi }_p\right)\equiv & {} - \frac{ \left\langle \varvec{\Omega }_p, {{\dot{\varvec{\xi }}}}_p \right\rangle + \left\langle \left( {\mathbb {I}} \varvec{\Omega }_p \right) \times \varvec{\Omega }_p, {\mathbb {I}}^{-1} \varvec{\xi }_p \right\rangle }{ \left\langle \varvec{\xi }_p, {\mathbb {I}}^{-1} \varvec{\xi }_p \right\rangle } \\= & {} \frac{ \left\langle {\mathbb {I}} {\dot{\varvec{\Omega }}}_p-\left( {\mathbb {I}} \varvec{\Omega }_p \right) \times \varvec{\Omega }_p, {\mathbb {I}}^{-1} \varvec{\xi }_p \right\rangle }{ \left\langle \varvec{\xi }_p, {\mathbb {I}}^{-1} \varvec{\xi }_p \right\rangle } = \left| {\mathbb {I}} {\dot{\varvec{\Omega }}}_p - \left( {\mathbb {I}} \varvec{\Omega }_p \right) \times \varvec{\Omega }_p \right| . \end{aligned}$$

Thus, \(\varvec{\xi }_p\) defined by (5.2) is a solution of the singular optimal control problem (5.1). Moreover, \(\varvec{\xi }_p\) has the desirable property \(\left\langle \varvec{\Omega }_p,\varvec{\xi }_p \right\rangle =0\). The costate \(\varvec{\kappa }= \mathbf {0}\) satisfies the ODE TPBVP (4.45), (4.43)–(4.44) corresponding to the analytic solution pair \((\varvec{\Omega }_p,\varvec{\xi }_p)\).
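The construction (5.2) can be verified numerically. The sketch below uses the inertia matrix and projected spiral of Sect. 5.3 as a concrete choice of \(\varvec{\Omega }_p\) satisfying Properties 5.1–5.3, and checks \(\left| \varvec{\xi }_p \right| =1\) and \(\left\langle \varvec{\Omega }_p,\varvec{\xi }_p \right\rangle =0\) at a sample time; the central finite difference for \({\dot{\varvec{\Omega }}}_p\) is an implementation convenience.

```python
import numpy as np

# Example data (from Sect. 5.3): inertia matrix and the projection of the
# spiral phi onto the constant kinetic energy ellipsoid, so Property 5.2 holds.
I = np.diag([1.0, 2.0, 3.0])
Omega_a = np.array([5.0, 0.0, 0.0])
E0 = Omega_a @ I @ Omega_a                      # conserved kinetic-energy level

phi = lambda t: np.array([10.0, t * np.cos(t), t * np.sin(t)])
proj = lambda v: np.sqrt(E0 / (v @ I @ v)) * v  # projection (5.6) onto E
Omega_p = lambda t: proj(phi(t))

t, h = 4.0, 1e-6
dOmega_p = (Omega_p(t + h) - Omega_p(t - h)) / (2 * h)   # central difference

# xi_p from (5.2): normalized residual of Euler's equations along Omega_p.
resid = I @ dOmega_p - np.cross(I @ Omega_p(t), Omega_p(t))
xi_p = resid / np.linalg.norm(resid)

print(np.linalg.norm(xi_p), Omega_p(t) @ xi_p)   # 1 and (approximately) 0
```

The orthogonality holds because \(\left\langle \varvec{\Omega }_p, {\mathbb {I}} {\dot{\varvec{\Omega }}}_p \right\rangle =0\) by Property 5.2 and the triple product \(\left\langle \varvec{\Omega }_p, \left( {\mathbb {I}} \varvec{\Omega }_p \right) \times \varvec{\Omega }_p \right\rangle \) vanishes identically.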

5.2 Numerical Solution of Suslov’s Optimal Control Problem via Continuation

Starting from the analytic solution pair \((\varvec{\Omega }_p,\varvec{\xi }_p)\) solving (5.1), the full optimal control problem can then be solved by continuation in \(\gamma \), \(\beta \), \(\eta \), and \(\delta \) using the following algorithm. We refer the reader to Allgower and Georg (2003) as a comprehensive reference on numerical continuation methods, as well as our discussion in Appendix. Consider the continuation cost function \(C_{\alpha ,\beta _c,\gamma _c,\eta _c,\delta _c}\), where \(\beta _c\), \(\gamma _c\), \(\eta _c\), and \(\delta _c\) are variables. If \(\gamma =0\), choose \(\beta _m\) such that \(0<\beta _m \ll \min \{\alpha ,\beta ,1\}\); otherwise if \(\gamma >0\), choose \(\beta _m\) such that \(0<\beta _m \ll \min \{\alpha ,\beta ,\gamma \}\). If the terminal time b is fixed, choose \(b_p=b\); otherwise, if the terminal time is free, choose \(b_p\) as explained below.

If \(\gamma =0\), choose \(\varvec{\Omega }_p\) to be some nominal function satisfying Properties 5.1–5.3, such as the projection of the line segment connecting \(\varvec{\Omega }_a\) to \(\varvec{\Omega }_b\) onto E and let \(b_p\) be the time such that \(\varvec{\Omega }_p(b_p)=\varvec{\Omega }_b\). For fixed terminal time \(b_p\), solve (4.29) with cost function \(C_{\alpha ,\beta _m,\gamma _c,0,0}\) by continuation in \(\gamma _c\), starting from \(\gamma _c = 1\) with the initial solution guess \((\varvec{\Omega }_p,\varvec{\xi }_p)\) and ending at \(\gamma _c = \gamma =0\) with the final solution pair \((\varvec{\Omega }_1,\varvec{\xi }_1)\).

If \(\gamma >0\) and \(\varvec{\Omega }_d\) does not satisfy Properties 5.1–5.3, choose \(\varvec{\Omega }_p\) to be some function “near” \(\varvec{\Omega }_d\) that does satisfy Properties 5.1–5.3 and let \(b_p\) be the time such that \(\varvec{\Omega }_p(b_p)=\varvec{\Omega }_b\). For fixed terminal time \(b_p\), solve (4.29) with cost function \(C_{\alpha ,\beta _m,\gamma _c,0,0}\) by continuation in \(\gamma _c\), starting from \(\gamma _c = 0\) with the initial solution guess \((\varvec{\Omega }_p,\varvec{\xi }_p)\) and ending at \(\gamma _c = \gamma \) with the final solution pair \((\varvec{\Omega }_1,\varvec{\xi }_1)\).

If \(\gamma >0\) and \(\varvec{\Omega }_d\) satisfies Properties 5.1–5.3, choose \(\varvec{\Omega }_p=\varvec{\Omega }_d\), let \(b_p\) be the time such that \(\varvec{\Omega }_d(b_p)=\varvec{\Omega }_b\), and construct the solution pair \((\varvec{\Omega }_1,\varvec{\xi }_1)\) with \(\varvec{\Omega }_1=\varvec{\Omega }_p\) and \(\varvec{\xi }_1 = \varvec{\xi }_p\).

For fixed terminal time \(b_p\), solve (4.29) with cost function \(C_{\alpha ,\beta _c,\gamma ,0,0}\) by continuation in \(\beta _c\), starting from \(\beta _c = \beta _m\) with the initial solution guess \((\varvec{\Omega }_1,\varvec{\xi }_1)\) and ending at \(\beta _c = \beta \) with the final solution pair \((\varvec{\Omega }_2,\varvec{\xi }_2)\). Next, for fixed terminal time \(b_p\), solve (4.29) with cost function \(C_{\alpha ,\beta ,\gamma ,\eta _c,0}\) by continuation in \(\eta _c\), starting from \(\eta _c =0\) with the initial solution guess \((\varvec{\Omega }_2,\varvec{\xi }_2)\) and ending at \(\eta _c=\eta \) with the final solution pair \((\varvec{\Omega }_3,\varvec{\xi }_3)\). If the terminal time is fixed, then this is the final solution. If the terminal time is free, solve (4.29) with cost function \(C_{\alpha ,\beta ,\gamma ,\eta ,\delta _c}\), letting the terminal time vary, by continuation in \(\delta _c\), starting from

$$\begin{aligned} \delta _c = -\left[ \frac{\alpha }{4} \left[ \left| \varvec{\xi }\right| ^2{-}1 \right] ^2{+}\frac{\beta }{2} \left| {\dot{\varvec{\xi }}} \right| ^2+\frac{\gamma }{2} \left| \varvec{\Omega }- \varvec{\Omega }_d \right| ^2 -\frac{\eta }{2} \left| {\dot{\varvec{\Omega }}} \right| ^2-\left\langle \left\langle \varvec{\xi },{\mathbb {I}}^{-1}\varvec{\xi }\right\rangle {\mathbb {I}} \varvec{\kappa }, {\dot{\varvec{\Omega }}} \right\rangle \right] _{t=b}\nonumber \\ \end{aligned}$$
(5.3)

with the initial solution guess \((\varvec{\Omega }_3,\varvec{\xi }_3,b_p)\) and ending at \(\delta _c=\delta \) with final solution triple \((\varvec{\Omega }_4,\varvec{\xi }_4,b_4)\). If the terminal time is free, then this is the final solution.

5.3 Numerical Solution of Suslov’s Optimal Control Problem via the Indirect Method and Continuation

Suslov’s optimal control problem was solved numerically using the following inputs and setup. The rigid body’s inertia matrix is

$$\begin{aligned} {\mathbb {I}} = \left[ \begin{array}{ccc} 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 2 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 3 \end{array} \right] . \end{aligned}$$
(5.4)

The initial time is \(a=0\), and the terminal time b is free. The initial and terminal body angular velocities are \(\varvec{\Omega }_a=\varvec{\phi }(a)/2=[5, \, 0, \, 0]^\mathsf {T}\) and \(\varvec{\Omega }_b=\varvec{\phi }_\parallel (b_d) \approx [2.7541, \, -2.3109, \, -1.4983]^\mathsf {T}\), respectively, where \(\varvec{\phi }\) and \(\varvec{\phi }_\parallel \) are defined below in (5.5)–(5.7) and \(b_d=10\).

The desired body angular velocity \(\varvec{\Omega }_d\) (see Fig. 1) is approximately the projection of a spiral onto the constant kinetic energy ellipsoid E determined by the rigid body’s inertia matrix \({\mathbb {I}}\) and initial body angular velocity \(\varvec{\Omega }_a\) and defined in (4.11). Concretely, we aim to track a spiral-like trajectory \(\varvec{\Omega }_d\) on the constant kinetic energy ellipsoid E:

$$\begin{aligned} \varvec{\phi }(t)= & {} \left[ 10,\,t \cos {t},\, t \sin {t} \right] ^\mathsf {T}, \end{aligned}$$
(5.5)
$$\begin{aligned} \mathbf {v}_\parallel= & {} \sqrt{\frac{\left\langle \varvec{\Omega }_a,{\mathbb {I}} \varvec{\Omega }_a\right\rangle }{\left\langle \mathbf {v},{\mathbb {I}} \mathbf {v}\right\rangle }}\mathbf {v} \quad \mathrm {for} \, \mathbf {v} \in {\mathbb {R}}^3\backslash \mathbf {0}, \end{aligned}$$
(5.6)
$$\begin{aligned} \varvec{\phi }_\parallel (t)= & {} \left[ \varvec{\phi }(t) \right] _\parallel , \end{aligned}$$
(5.7)
$$\begin{aligned} \sigma (t)= & {} \frac{1}{2} \left[ 1+\tanh {\left( \frac{t}{.01}\right) } \right] , \end{aligned}$$
(5.8)
$$\begin{aligned} s(t)= & {} \sigma (t-b_d), \end{aligned}$$
(5.9)
$$\begin{aligned} \varvec{\Omega }_d(t)= & {} \varvec{\phi }_\parallel (t)\left( 1-s(t)\right) +\varvec{\Omega }_b s(t). \end{aligned}$$
(5.10)

The setup for \(\varvec{\Omega }_d\) is to be understood as follows. The graph of \(\varvec{\phi }\) (5.5) defines a spiral in the plane \(x=10\). Given a nonzero vector \(\mathbf {v} \in {\mathbb {R}}^3\), the parallel projection operator \(\parallel \) (5.6) constructs the vector \(\mathbf {v}_\parallel \) that lies at the intersection between the ray \(R_{\mathbf {v}} = \left\{ t \mathbf {v} : t>0 \right\} \) and the ellipsoid E. The spiral \(\varvec{\phi }_\parallel \) defined by (5.7) is the projection of the spiral \(\varvec{\phi }\) onto the ellipsoid E, which begins at \(\varvec{\Omega }_a\) at time a, and terminates at \(\varvec{\Omega }_b\) at time \(b_d\). Also, \(\sigma \) (5.8) is a sigmoid function, i.e., a smooth approximation of the unit step function, and s (5.9) is the time translation of \(\sigma \) to time \(b_d\). \(\varvec{\Omega }_d\) (5.10) utilizes the translated sigmoid function s to compute a weighted average of the projected spiral \(\varvec{\phi }_\parallel \) and \(\varvec{\Omega }_b\) so that \(\varvec{\Omega }_d\) follows the projected spiral \(\varvec{\phi }_\parallel \) for \(0\le t < b_d\), holds steady at \(\varvec{\Omega }_b\) for \(t>b_d\), and smoothly transitions between \(\varvec{\phi }_\parallel \) and \(\varvec{\Omega }_b\) at time \(b_d\). The coefficients of the cost function (4.30) are chosen to be \(\alpha = 1\), \(\beta = .1\), \(\gamma = 1\), \(\eta = 1 \, \mathrm {or} \, .01\), and \(\delta = .2\).
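The definitions (5.5)–(5.10) can be transcribed directly (a sketch in Python rather than the MATLAB used later; all data are taken from this section, with \(b_d=10\)):

```python
import numpy as np

# Construction of the desired trajectory Omega_d from (5.5)-(5.10).
I = np.diag([1.0, 2.0, 3.0])
Omega_a = np.array([5.0, 0.0, 0.0])
E0 = Omega_a @ I @ Omega_a
b_d = 10.0

phi = lambda t: np.array([10.0, t * np.cos(t), t * np.sin(t)])          # (5.5)
par = lambda v: np.sqrt(E0 / (v @ I @ v)) * v                           # (5.6)
phi_par = lambda t: par(phi(t))                                         # (5.7)
sigma = lambda t: 0.5 * (1.0 + np.tanh(t / 0.01))                       # (5.8)
s = lambda t: sigma(t - b_d)                                            # (5.9)
Omega_b = phi_par(b_d)
Omega_d = lambda t: phi_par(t) * (1.0 - s(t)) + Omega_b * s(t)          # (5.10)

print(Omega_d(0.0))     # equals Omega_a = [5, 0, 0]
print(Omega_b)          # approximately [2.7541, -2.3109, -1.4983]
print(Omega_d(12.0))    # held at Omega_b after the transition
```

This reproduces the stated boundary data: \(\varvec{\Omega }_d(0)=\varvec{\phi }_\parallel (0)=\varvec{\phi }(0)/2=\varvec{\Omega }_a\), and \(\varvec{\Omega }_b=\varvec{\phi }_\parallel (b_d)\) matches the value quoted above.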

Fig. 1

The desired body angular velocity is approximately the projection of a spiral onto the constant kinetic energy ellipsoid

The optimal control problem (4.29) was solved numerically via the indirect method, i.e., by numerically solving the ODE TPBVP (4.45), (4.43)–(4.44) through continuation in \(\beta \), \(\eta \), and \(\delta \) starting from the analytic solution to the singular optimal control problem (5.1), as outlined in Sect. 5.2. Because most ODE BVP solvers only solve problems defined on a fixed time interval, the ODE TPBVP (4.45), (4.43)–(4.44) was reformulated on the normalized time interval \(\left[ 0,1\right] \) through a change of variables by defining \(T \equiv b-a\) and by defining normalized time \(s \equiv \frac{t-a}{T}\); if the terminal time b is fixed, then T is a known constant, whereas if the terminal time b is free, then T is an unknown parameter that must be solved for in the ODE TPBVP. The finite difference automatic continuation solver acdc from the MATLAB package bvptwp was used to solve the ODE TPBVP by performing continuation in \(\beta \), \(\eta \), and \(\delta \), with the relative error tolerance set to 1e-8. The result of acdc was then passed through the MATLAB collocation solver sbvp using Gauss (rather than equidistant) collocation points with the absolute and relative error tolerances set to 1e-8. sbvp was used to clean up the solution provided by acdc because collocation exhibits superconvergence when solving regular (as opposed to singular) ODE TPBVP using Gauss collocation points. To make acdc and sbvp execute efficiently, the ODEs were implemented in MATLAB in vectorized fashion. For accuracy and efficiency, the MATLAB software ADiGator was used to supply vectorized, automatic ODE Jacobians to acdc and sbvp. For accuracy, the MATLAB Symbolic Math Toolbox was used to supply symbolically computed BC Jacobians to acdc and sbvp. ADiGator constructs Jacobians through automatic differentiation, while the MATLAB Symbolic Math Toolbox constructs Jacobians through symbolic differentiation.
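The normalized-time reformulation can be illustrated on a toy free-terminal-time problem. The projectile example below is hypothetical and merely stands in for the Suslov TPBVP, with SciPy's solve_bvp playing the role of acdc/sbvp; the unknown interval length T enters the ODE as a multiplicative factor and the BVP as an unknown parameter, exactly as described above.

```python
import numpy as np
from scipy.integrate import solve_bvp

# Toy free-terminal-time BVP: vertical motion x'' = -g with x(0) = 0,
# x'(0) = v0, and unknown flight time T such that x(T) = 0.  On normalized
# time s = t / T in [0, 1], the ODE picks up a factor T and T becomes an
# unknown parameter of the BVP.
g, v0 = 9.8, 5.0

def rhs(s, y, p):
    T = p[0]
    return T * np.vstack((y[1], -g * np.ones_like(y[0])))

def bc(ya, yb, p):
    # three conditions: two states plus one unknown parameter
    return np.array([ya[0], ya[1] - v0, yb[0]])

s = np.linspace(0.0, 1.0, 11)
y0 = np.vstack((v0 * s - 0.5 * g * s**2, v0 - g * s))   # rough initial guess
sol = solve_bvp(rhs, bc, s, y0, p=[1.0])

print(sol.p[0])   # flight time, analytically 2 * v0 / g, about 1.0204
```

Note that the number of boundary conditions exceeds the state dimension by one, which is what determines the extra unknown T.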

Fig. 2

Numerical solution of the optimal control problem for \(\alpha =1, \, \beta =.1,\, \gamma =1,\, \eta =1,\, \delta =.2\), and free terminal time. The optimal terminal time is \(b=11.36\). a The optimal body angular velocity roughly tracks the desired body angular velocity with \(\eta =1\). b Preservation of the nonholonomic orthogonality constraint. c Evolution of the magnitude of the control vector, which stays near unity. d Evolution of the costates \(\varvec{\kappa }(t)\)

Fig. 3

Numerical solution of the optimal control problem for \(\alpha =1, \, \beta =.1,\, \gamma =1,\, \eta =.01,\, \delta =.2\), and free terminal time. The optimal terminal time is \(b=9.84\). a The optimal body angular velocity accurately tracks the desired body angular velocity with \(\eta =.01\). b Preservation of the nonholonomic orthogonality constraint. c Evolution of the magnitude of the control vector, which stays near unity. d Evolution of the costates \(\varvec{\kappa }(t)\)

Figures 2 and 3 show the results for \(\eta =1\) and \(\eta =.01\), respectively. The optimal terminal time is \(b=11.36\) for \(\eta =1\) and is \(b=9.84\) for \(\eta =.01\). Figures 2a and 3a show the optimal body angular velocity \(\varvec{\Omega }\), the desired body angular velocity \(\varvec{\Omega }_d\), and the projection \(\varvec{\xi }_\parallel \) of the control vector \(\varvec{\xi }\) onto the ellipsoid E. Recall that \(\gamma \), through the cost function term \(\frac{\gamma }{2} \left| \varvec{\Omega }- \varvec{\Omega }_d \right| ^2\), influences how closely the optimal body angular velocity \(\varvec{\Omega }\) tracks the desired body angular velocity \(\varvec{\Omega }_d\), while \(\eta \), through the cost function term \(\frac{\eta }{2} \left| {\dot{\varvec{\Omega }}} \right| ^2\), influences how closely the optimal body angular velocity \(\varvec{\Omega }\) tracks a minimum energy trajectory. For \(\gamma =1\), \(\frac{\gamma }{\eta }=1\) when \(\eta =1\) and \(\frac{\gamma }{\eta }=100\) when \(\eta =.01\). As expected, comparing Figs. 2a and 3a, the optimal body angular velocity \(\varvec{\Omega }\) tracks the desired body angular velocity \(\varvec{\Omega }_d\) much more accurately for \(\eta =.01\) than for \(\eta =1\). Figures 2b and 3b demonstrate that the numerical solutions preserve the nonholonomic orthogonality constraint \(\left\langle \varvec{\Omega }, \varvec{\xi }\right\rangle =0\) to machine precision. Figures 2c and 3c show that the magnitude \(\left| \varvec{\xi }\right| \) of the control vector \(\varvec{\xi }\) remains close to 1, as encouraged by the cost function term \(\frac{\alpha }{4} \left[ \left| \varvec{\xi }\right| ^2 -1 \right] ^2\). Figures 2d and 3d show the costates \(\varvec{\kappa }\). In Figs. 2a, 3a, 2d, and 3d, a green marker indicates the beginning of a trajectory, while a red marker indicates the end of a trajectory. In Fig. 3a, the yellow marker on the desired body angular velocity indicates \(\varvec{\Omega }_d(b)\), where \(b=9.84\) is the optimal terminal time for \(\eta =.01\).

To investigate the stability of the controlled system, we perturbed the control \({\dot{\varvec{\xi }}}\) obtained from solving the optimal control ODE TPBVP and then solved the pure equations of motion (4.5) as an ODE IVP using this perturbed control. The resulting perturbed solution \(\varvec{\Omega }\) remains close to the \(\varvec{\Omega }\) corresponding to the unperturbed control. While more detailed studies of stability are needed and will be undertaken in the future, this is an indication that the controlled system is stable, at least in terms of the state variables \(\varvec{\Omega }\) and \(\varvec{\xi }\), under perturbations of the control \({\dot{\varvec{\xi }}}\).
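The kind of IVP experiment described above can be sketched as follows. The rotating control \(\varvec{\xi }(t)\) below is a hypothetical stand-in for a (perturbed) optimal control; integrating the first equation of (4.45) as an IVP then confirms that the equations of motion preserve both the constraint \(\left\langle \varvec{\Omega }, \varvec{\xi }\right\rangle =0\) and the kinetic energy \(\left\langle {\mathbb {I}}\varvec{\Omega }, \varvec{\Omega }\right\rangle \).

```python
import numpy as np
from scipy.integrate import solve_ivp

# Integrate the controlled Suslov equations (first equation of (4.45)) for a
# prescribed, hypothetical control xi(t): a unit vector rotating in the
# (e2, e3)-plane.
I = np.diag([1.0, 2.0, 3.0])
I_inv = np.linalg.inv(I)
w = 0.3
xi = lambda t: np.array([0.0, np.cos(w * t), np.sin(w * t)])
xi_dot = lambda t: w * np.array([0.0, -np.sin(w * t), np.cos(w * t)])

def rhs(t, Om):
    x, xd = xi(t), xi_dot(t)
    gyro = np.cross(I @ Om, Om)                       # (I Omega) x Omega
    lam = (Om @ xd + gyro @ (I_inv @ x)) / (x @ I_inv @ x)
    return I_inv @ (gyro - lam * x)

Om0 = np.array([5.0, 0.0, 0.0])                       # satisfies <Om0, xi(0)> = 0
sol = solve_ivp(rhs, (0.0, 10.0), Om0, rtol=1e-10, atol=1e-12)

Om_T = sol.y[:, -1]
print(Om_T @ xi(10.0))                                # constraint, ~0
print(Om_T @ I @ Om_T - Om0 @ I @ Om0)                # energy drift, ~0
```

The two printed quantities stay near zero because the multiplier \(\lambda \) enforces the constraint exactly, and the constraint force \(\lambda \varvec{\xi }\) does no work along admissible motions; this is the conservation property discussed in the Introduction.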

Verification of a local minimum solution It is also desirable to verify that the numerical solutions obtained by our continuation indirect method do indeed provide a local minimum of the optimal control problem. Chapter 21 of Agrachev and Sachkov (2004) and Bonnard et al. (2007) provide sufficient conditions for a solution satisfying Pontryagin’s minimum principle to be a local minimum; however, the details are quite technical and may be investigated in future work. These sufficient conditions must be checked numerically rather than analytically; COTCOT and HamPath, also mentioned in Appendix, are numerical software packages that check these sufficient conditions numerically.

Due to the technicality of the sufficient conditions discussed in Agrachev and Sachkov (2004), Bonnard et al. (2007), we have resorted to a different numerical justification. More precisely, to validate that the solutions obtained by our optimal control procedure, or the so-called indirect method solutions, indeed correspond to local minima, we have fed the solutions obtained by our method into several different MATLAB direct method solvers as initial solution guesses. We provide a survey of the current state of direct method solvers for optimal control problems in Appendix.

Note that the indirect method produces a solution satisfying only the necessary conditions for a local minimum of (4.29), while a direct method solution satisfies the necessary and sufficient conditions for a local minimum of a finite-dimensional approximation of (4.29). Thus, if the direct method solution is close to the indirect method solution, it may be concluded that the indirect method solution is indeed a local minimum of (4.29). The indirect method solutions were validated against the MATLAB direct method solvers GPOPS-II and FALCON.m. GPOPS-II uses pseudospectral collocation with hp-adaptive mesh refinement, relies on the IPOPT NLP solver, and can use ADiGator to supply vectorized, automatically generated Jacobians and Hessians. FALCON.m uses trapezoidal or backward Euler local collocation, also relies on the IPOPT NLP solver, and can use the MATLAB Symbolic Math Toolbox to supply symbolically computed Jacobians and Hessians. Both direct method solvers converged to solutions close to those provided by the indirect method; small discrepancies are to be expected, since the direct method solvers solve only a finite-dimensional approximation of (4.29). We are therefore confident that the solutions found in this section indeed correspond to local minima of the optimal control problems.
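The closeness check underlying this validation can be sketched as follows. The snippet below is a minimal discrepancy measure, assuming (as is typical) that the two solvers return state histories on different time meshes; the tolerance and the toy trajectories are illustrative, not taken from the actual computations.

```python
import numpy as np

def relative_discrepancy(t_a, y_a, t_b, y_b):
    """Relative L2 distance between two state histories, with the second
    trajectory linearly interpolated onto the first one's time grid."""
    y_b_on_a = np.vstack([np.interp(t_a, t_b, comp) for comp in y_b])
    return np.linalg.norm(y_a - y_b_on_a) / np.linalg.norm(y_a)

# Toy illustration: two samplings of the same underlying trajectory,
# standing in for the indirect and direct method solutions.
t1 = np.linspace(0.0, 1.0, 200)
t2 = np.linspace(0.0, 1.0, 157)   # e.g. a different, adaptive mesh
y1 = np.vstack([np.sin(2 * np.pi * t1), np.cos(2 * np.pi * t1)])
y2 = np.vstack([np.sin(2 * np.pi * t2), np.cos(2 * np.pi * t2)])
val = relative_discrepancy(t1, y1, t2, y2)
print(val)  # a small value indicates the two solutions agree
```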

6 Conclusions

We have derived the equations of motion for the optimal control of Suslov’s problem and demonstrated its controllability, in the classical control theory sense, by varying the nonholonomic constraint vector \(\varvec{\xi }\) in time. We have also demonstrated that an optimal control procedure, using continuation from an analytical solution, not only reaches the desired final state, but can also force the system to follow a rather complex trajectory, such as a spiral on the constant kinetic energy ellipsoid \(E\). Finally, we have investigated the sufficient conditions for a local minimum; while we did not implement them, all the numerical evidence we have indicates that the solutions found are indeed local minima.

The procedure outlined here opens up the possibility of controlling nonholonomic systems by continuous time variation of the constraint. We have carried out the analysis only for Suslov’s problem, which we consider one of the most fundamental problems in nonholonomic mechanics. It would be interesting to generalize the theory of optimal control derived here to the case of an arbitrary manifold. Of particular importance for controllability will be the dimensionality and geometry of the constraint relative to those of the manifold. This will be addressed in future work.