
1 Introduction

Simultaneous Localization and Mapping (SLAM) is the process of determining the position and orientation of a moving platform within an environment, while also building a map of the environment, primarily using observations of environmental features measured from and with respect to the moving platform. Since the seminal work by Smith et al. (1990), there have been several demonstrated implementations of SLAM using land (Bosse et al. 2003; Gutmann and Konolige 1999; Dissanayake et al. 2001; Guivant and Nebot 2001; Thrun and Lui 2003b) and underwater vehicles (Williams et al. 2001; Olson et al. 2004) in which two-dimensional, horizontal localization and mapping is performed. In each of these implementations, SLAM involves the fusion of two particular sets of information, usually coming from sensors onboard the vehicle: first, the vehicle uses exteroceptive sensors to sense the terrain/landmarks relative to the vehicle (such as vision, laser, and radar sensors), and second, the vehicle uses proprioceptive sensors and information to sense its own motion (such as wheel encoders, inertial sensors, and vehicle model information such as holonomic/nonholonomic constraints).

In the context of Unmanned Aerial Vehicles (UAVs), where the platform undergoes rapid six degree-of-freedom (6-DoF) motion, inertial sensors are a logical choice for the proprioceptive core of a SLAM implementation, due to their high frequency of information and ability to track platform motion in 6-DoF without relying on external information such as a vehicle dynamic model. Exteroceptive sensors (i.e., terrain sensors) come in a variety of forms depending on the application, including RAdio Detection And Ranging (RADAR), LIght Detection And Ranging (LIDAR), and electro-optical sensors such as cameras. Such sensors provide a variety of different observations of the terrain including range and bearing to features and bearing-only observations of features (i.e., camera images), necessitating SLAM algorithms that account for this variability.

The utility of SLAM in the context of UAV operations can be considered from both a localization and a mapping perspective. Traditionally, airborne localization has relied on external navigation aids such as satellite/radio positioning systems (i.e., the Global Positioning System (GPS)) or terrain/landmark maps (i.e., in terrain-aided navigation systems (TANS)). SLAM allows for intermittent localization when external navigation aids fail due to, for example, satellite signal occlusion/jamming or when operating over unknown terrain, thus improving localization system robustness. During mapping and surveying tasks, a UAV uses terrain observations (e.g., aerial photography) along with information about the UAV’s own position and orientation at the time of the observation to construct a map. Consistent and accurate mapping relies on accounting for the correlation between the errors in multiple features in the map, induced by a common error in the estimate of the platform’s position and orientation; SLAM maintains these relationships, computing a joint estimate of the localization and mapping states, improving accuracy in the platform’s localization estimates and thus in the final map. Large-scale mapping tasks may also necessitate the use of multiple aerial platforms; algorithms for multivehicle inertial SLAM are a natural extension of the single-vehicle case where shared map information can be used to assist in localization across platforms and to build more accurate maps than achievable with a single UAV.

This chapter provides an overview of algorithms for inertial sensor-based SLAM within the context of UAVs. The presentation in this chapter is based on the use of the Extended Kalman Filter (EKF) and the Extended Information Filter (EIF) due to their ease of understanding, applicability to online implementation, and prevalence in airborne localization applications outside of SLAM (i.e., aided inertial localization, see Giovanni 1979; Bar-Itzhack et al. 1982; Meyer-Hilberg and Jacob 1994). The discussion here includes an examination of SLAM for both small- and large-scale operation over the surface of the Earth, inertial SLAM using both range/bearing and bearing-only observations of the terrain, and a look at several different centralized and decentralized architectures for performing multi-vehicle SLAM. Section 21.2 provides an overview of the equations of motion and sensor models used in inertial SLAM, with an examination of the issue of SLAM in both global and local coordinate systems and how this affects the way in which inertial SLAM is implemented. Section 21.3 examines the structure of the inertial SLAM algorithm when terrain observations are made using a range/bearing sensor. Section 21.4 examines a special case of inertial SLAM when only bearing observations of the terrain are available, such as in the case of electro-optical sensing. Section 21.5 examines the inertial SLAM algorithm when applied to multiple vehicles where vehicles share map information with each other.

2 Inertial SLAM Sensor Equations

The inertial SLAM algorithm is formulated using an EKF and uses information from inertial sensors and from feature observations from onboard terrain sensors in order to estimate the vehicle’s position, velocity, attitude, inertial sensor biases, and map feature locations. The algorithm works on a two-stage process of state prediction (where equations for predicting the vehicle states from inertial sensor data are used) and state observation/update (where equations describing the geometry of terrain feature observations are used).

This section describes the background equations behind inertial sensor-based SLAM algorithms, including equations of motion for inertial localization and sensor equations for terrain-observing sensors. Figure 21.1 illustrates the relevant relationships between the frames of reference and vectors used in the SLAM algorithm. The structure of the process and observation model equations for both global and local SLAM is discussed in the following subsections.

Fig. 21.1

Vectors and frames of reference in the inertial SLAM algorithm. The Earth-Centered, Earth-Fixed (ECEF) frame (e), local-level navigation frame (n), body-fixed frame (b), and sensor-fixed frame (s) and the relationship between the vehicle position (p n,p e), feature position (m n,m e), and sensor observation are shown. Also shown is the observation of a terrain feature coming from an example terrain sensor (in this case a vision camera)

2.1 Global Vs. Local SLAM

Localization and mapping tasks vary in environmental scale depending on the application. Some tasks require knowledge of the location of each feature in the map with respect to global co-ordinates (such as an ECEF coordinate system (see Fig. 21.1)). This form of localization and mapping requires external information either in the form of global position information (such as from GPS) or global map information (i.e., the global position of one or more features in the map). Other tasks may only require a map of the environment in which the positions of features are known only with respect to one another or with respect to the starting location of the vehicle, a task which when using SLAM does not require external map or navigation aids.

These two forms of SLAM are referred to as “global SLAM” or “local SLAM” depending on whether map and location information must be referenced to a global coordinate system or merely to an arbitrary local coordinate system (such as the starting position of the UAV). This issue is particularly important when using inertial sensors, as inertial localization relies to some degree on global information about the motion of the local coordinate system due to the rotation of the Earth. Local SLAM using inertial sensors is therefore only possible when certain assumptions are made which treat the local map coordinates as an inertial frame of reference by ignoring the effect of the Earth’s rotation.

2.2 Inertial Localization and SLAM Process Model Equations

The presentation here begins by considering the process model equations from a global SLAM perspective. In this case, the location of the vehicle and the terrain is estimated with respect to the center of the Earth. The estimated state vector \({\bf{\hat x}}\left( k \right)\) at time step k in this case contains the three-dimensional vehicle position (p e), velocity (v e) and Euler angles (Ψ n = [ϕ,θ, ψ]T), IMU sensor biases \(\left( {{\rm{\delta }}{\bf{f}}^b {\rm{and}}\,{\rm{\delta \omega }}_{ib}^b } \right)\), and the N three-dimensional feature locations \(\left( {{\bf{m}}_i^e } \right)\) in the environment:

$${\bf{\hat x}}\left( k \right) = \left[ {{\bf{p}}^e \left( k \right),{\bf{v}}^e \left( k \right),{\bf{\Psi }}^n \left( k \right),{\rm{\delta }}{\bf{f}}^b \left( k \right),{\rm{\delta \omega }}_{ib}^b \left( k \right),{\bf{m}}_{\rm{1}}^e \left( k \right),{\bf{m}}_{\rm{2}}^e \left( k \right), \ldots,{\bf{m}}_N^e \left( k \right)} \right]^T$$
(21.1)

where i = 1,…, N, the superscript e indicates Earth-Centered, Earth-Fixed (ECEF) frame referenced vectors, the superscript b indicates body-fixed frame referenced vectors and the superscript n indicates the Euler angles Ψ n parameterize the body-fixed to local navigation frame Direction Cosine Matrix (DCM) transformation \({\bf{C}}_b^n\) (see Chapter 3 of this book for details on frames of reference). Figure 21.1 illustrates the relevant relationships between the frames of reference and vectors used in the SLAM algorithm.

Euler angles are used to represent the attitude of the platform rather than quaternions in order to reduce the number of parameters in the estimator. It is assumed that the vehicle’s pitch angle is constrained so as to avoid the Euler angle singularity at θ = 90°; however, the process model could be adapted to implement quaternions for the parameterization of \({\bf{C}}_b^n\) in the event that the platform motion is not constrained in such a way.

The state estimate \({\bf{\hat x}}\) is predicted forward in time by integrating the first-order nonlinear dynamic equation:

$${\bf{\dot \hat x}}\left( t \right) = {\bf{F}}_c \left[ {{\bf{\hat x}}\left( t \right),{\bf{u}}\left( t \right)} \right] + {\bf{G}}_c \left[ {{\bf{w}}\left( t \right)} \right]$$
(21.2)

where F c [.,.] is the continuous-time process model function, G c [.] is the continuous-time input function, u(t) is the system input (which is comprised of inertial sensor readings), and w(t) is an uncorrelated, zero-mean vehicle process noise vector of covariance Q (which is comprised of inertial sensor noise errors). The process model equations for the vehicle position, velocity, and attitude are based on the 6-DoF inertial localization equations in which an Earth-frame mechanization (Titterton and Weston 1997) is applied:

$${\bf{\dot p}}^e = {\bf{v}}^e$$
(21.3)
$${\bf{\dot v}}^e = {\bf{C}}_b^e {\bf{\hat f}}^b - {\bf{C}}_b^e {\rm{\delta }}{\bf{f}}^b - {\bf{C}}_b^e {\bf{w}}_{{\rm{accel}}} - 2\left( {{\rm{\omega }}_{ie}^e \times {\bf{v}}^e } \right) + {\bf{g}}_l^e$$
(21.4)
$${\bf{\dot \Psi }}^n = {\bf{E}}_b^n \left( {\hat \omega _{ib}^b - {\rm{\delta }}\omega _{ib}^b - {\bf{w}}_{{\rm{gyro}}} - {\bf{C}}_n^b \omega _{ie}^n } \right)$$
(21.5)

where \({\omega _{ie}^e }\) and \({\omega _{ie}^n }\) are the Earth’s rotation rate vectors measured in the ECEF and local navigation frames, respectively:

$$\omega _{ie}^e = \left[ {\matrix{ 0 \cr 0 \cr {\omega _{{\rm{Earth}}} } \cr } } \right]$$
(21.6)
$$\omega _{ie}^n = \left[ {\matrix{ {\omega _{{\rm{Earth}}} \cos (\lambda )} \hfill \cr 0 \hfill \cr { - \omega _{{\rm{Earth}}} \sin (\lambda )} \hfill \cr } } \right]$$
(21.7)

where λ is the latitude angle of the vehicle and ω Earth = 7.292115 × 10−5 rad/s is the rotation rate of the Earth. \({\bf{E}}_b^n\) is the body to navigation frame rotation rate transformation matrix:

$${\bf{E}}_b^n = \left[ {\matrix{ 1 & {\sin \phi \tan \theta } & {\cos \phi \tan \theta } \cr 0 & {\cos \phi } & { - \sin \phi } \cr 0 & {\sin \phi \sec \theta } & {\cos \phi \sec \theta } \cr } } \right]$$
(21.8)

\({\bf{g}}_l^e\) is the local gravity term:

$${\bf{g}}_l^e = {\bf{g}}^e - \omega _{ie}^e \times \left( {\omega _{ie}^e \times {\bf{p}}^e } \right)$$
(21.9)
$$= - g\left[ {\matrix{ {{{{\bf{p}}_x^e } \over {\left| {{\bf{p}}^e } \right|}}} \cr {{{{\bf{p}}_y^e } \over {\left| {{\bf{p}}^e } \right|}}} \cr {{{{\bf{p}}_z^e } \over {\left| {{\bf{p}}^e } \right|}}} \cr } } \right] - \left[ {\omega _{ie}^e \times } \right]^2 {\bf{p}}^e$$
(21.10)

where g is the magnitude of the acceleration due to gravity acting on the vehicle and \(\left[ {\omega _{ie}^e \times } \right]\) is the skew-symmetric matrix of the Earth’s rotation rate vector, referenced in ECEF coordinates. \({\bf{C}}_b^n\) is the DCM transformation from the body to local navigation frame, and \({\bf{C}}_b^e\) is the DCM transformation from the body to ECEF frame, which are related via

$${\bf{C}}_b^e = {\bf{C}}_n^e {\bf{C}}_b^n$$
(21.11)
$$C_n^e = \left[ {\matrix{ { - \sin \left( \lambda \right)\cos \left( l \right)} & { - \sin \left( l \right)} & { - \cos \left( \lambda \right)\cos \left( l \right)} \cr { - \sin \left( \lambda \right)\sin \left( l \right)} & {\cos \left( l \right)} & { - \cos \left( \lambda \right)\sin \left( l \right)} \cr {\cos \left( \lambda \right)} & 0 & { - \sin \left( \lambda \right)} \cr } } \right]$$
(21.12)

where l is the longitude angle of the vehicle’s position. The vectors \({\bf{\hat f}}^b\) and \(\hat \omega _{ib}^b\)are the accelerometer specific force vector reading and gyroscope rotation rate reading, respectively (where \({\bf{u}}\left( t \right) = \left[ {{\bf{\hat f}}^b \left( t \right),\hat \omega _{ib}^b \left( t \right)} \right]^T\)), δf b and \(\delta \omega _{ib}^b\) are the accelerometer and gyro biases, respectively, and w accel and w gyro are the accelerometer and gyro noise values. The true acceleration and rotation rates, f b and \(\omega _{ib}^b\), are defined:

$${\bf{f}}^b = {\bf{\hat f}}^b - \delta {\bf{f}}^b - {\bf{w}}_{{\rm{accel}}}$$
(21.13)
$$\omega _{ib}^b = \hat \omega _{ib}^b - \delta \omega _{ib}^b - {\bf{w}}_{{\rm{gyro}}}$$
(21.14)

Accelerometer and gyro biases are known to fluctuate by very small amounts due to temperature changes and bias wander, particularly in low-cost IMUs. The accelerometer and gyro bias process models are given by

$$\delta {\bf{\dot f}}^b = {\bf{w}}_{b,{\rm{accel}}}$$
(21.15)
$$\delta \dot \omega _{ib}^b = {\bf{w}}_{b{\rm{,gyro}}}$$
(21.16)

where w b,accel and w b,gyro are bias drift noises for the accelerometers and gyros, respectively. The complete noise vector w(t) for the process model based on errors from the IMU is composed:

$${\bf{w}}\left( t \right) = \left[ {{\bf{w}}_{{\rm{accel}}} \left( t \right),{\bf{w}}_{{\rm{gyro}}} \left( t \right),{\bf{w}}_{b{\rm{,accel}}} \left( t \right),{\bf{w}}_{b{\rm{,gyro}}} \left( t \right)} \right]^T$$
(21.17)

Finally, map feature locations are estimated in the ECEF frame and are assumed to be constant, since the terrain is stationary; the process model of the i th feature is given by

$${\bf{\dot m}}_i^e = {\bf{0}}$$
(21.18)

Equations 21.3–21.5, 21.15, 21.16, and 21.18 can also be expressed in discrete-time, recursive form by assuming a first-order Euler integration step:

$${\bf{\hat x}}\left( k \right) = {\bf{F}}\left[ {{\bf{\hat x}}\left( {k - 1} \right),{\bf{u}}\left( k \right),k} \right] + {\bf{G}}\left[ {{\bf{w}}\left( k \right)} \right]$$
(21.19)

where F[.,., k] is the nonlinear state transition function at time k and G[., k] is the input model transition function at time k. The discrete process model thus becomes

$${\bf{p}}^e \left( k \right) = {\bf{p}}^e \left( {k - 1} \right) + {\bf{v}}^e \Delta t$$
(21.20)
$${\bf{v}}^e \left( k \right) = {\bf{v}}^e \left( {k - 1} \right) + \left[ {{\bf{C}}_b^e {\bf{\hat f}}^b - {\bf{C}}_b^e \delta {\bf{f}}^b - {\bf{C}}_b^e {\bf{w}}_{{\rm{accel}}} - 2\left( {\omega _{ie}^e \times {\bf{v}}^e } \right) + {\bf{g}}_l^e } \right]\Delta t$$
(21.21)
$${\bf{\Psi }}^n \left( k \right) = {\bf{\Psi }}^n \left( {k - 1} \right) + \left[ {{\bf{E}}_b^n \left( {\hat \omega _{ib}^b - \delta \omega _{ib}^b - {\bf{w}}_{{\rm{gyro}}} - {\bf{C}}_n^b \omega _{ie}^n } \right)} \right]\Delta t$$
(21.22)
$$\delta {\bf{f}}^b \left( k \right) = \delta {\bf{f}}^b \left( {k - 1} \right) + {\bf{w}}_{b,{\rm{accel}}} \left( k \right)$$
(21.23)
$$\delta \omega _{ib}^b \left( k \right) = \delta \omega _{ib}^b \left( {k - 1} \right) + {\bf{w}}_{b,{\rm{gyro}}} \left( k \right)$$
(21.24)
$${\bf{m}}_i^e \left( k \right) = {\bf{m}}_i^e \left( {k - 1} \right)$$
(21.25)

where Δt is the time difference between the k and k−1 discrete-time segments.

2.2.1 Process Model Equation Approximations for Local SLAM

When performing SLAM with respect to a local coordinate system where no global information is available, several approximations must be made in the inertial SLAM equations. In local SLAM, the position and orientation of the vehicle is maintained with respect to a local navigation frame which is fixed to an arbitrary and unknown location on the Earth’s surface rather than the ECEF frame. The local SLAM equations are thus derived under the assumption that the local navigation frame is an inertial frame of reference, ignoring the small Coriolis and centripetal accelerations which are incurred by the Earth’s rotation.

In local SLAM, the estimated state vector is \({\bf{\hat x}}_{{\bf{local}}} \left( k \right)\), in which the vehicle position and velocity and the position of map features are now referenced in local navigation frame coordinates:

$${\bf{\hat x}}_{{\bf{local}}} \left( k \right) = \left[ {{\bf{p}}^n \left( k \right),{\bf{v}}^n \left( k \right),{\bf{\Psi }}^n \left( k \right),\delta {\bf{f}}^b \left( k \right),\delta \omega _{ib}^b \left( k \right),{\bf{m}}_1^n \left( k \right),{\bf{m}}_2^n \left( k \right), \ldots,{\bf{m}}_N^n \left( k \right)} \right]^T$$
(21.26)

The continuous-time and discrete-time process models for local SLAM are thus given by

$$\dot{\hat {\bf x}}_{{\rm{local}} } \left( t \right) = {\bf{F}}_{c,{\rm{local}}} \left[ {{\hat {\bf x}}_{{\rm{local}}} \left( t \right),{\bf{u}}\left( t \right)} \right] + {\bf{G}}_{c,{\rm{local}}} \left[ {{\bf{w}}\left( t \right)} \right]$$
(21.27)
$${\bf{\hat x}}_{{\rm{local}}} \left( k \right) = {\bf{F}}_{{\rm{local}}} \left[ {{\bf{\hat x}}_{{\rm{local}}} \left( {k - 1} \right),{\bf{u}}\left( k \right),k} \right] + {\bf{G}}_{{\rm{local}}} \left[ {{\bf{w}}\left( k \right)} \right]$$
(21.28)

The vehicle process model equations in continuous-time form are the following:

$${\bf{\dot p}}^n = {\bf{v}}^n$$
(21.29)
$${\bf{\dot v}}^n = {\bf{C}}_b^n {\bf{\hat f}}^b - {\bf{C}}_b^n \delta {\bf{f}}^b - {\bf{C}}_b^n {\bf{w}}_{{\rm{accel}}} + {\bf{g}}^n$$
(21.30)
$${\bf{\dot \Psi }}^n = {\bf{E}}_b^n \left( {\hat \omega _{ib}^b - \delta \omega _{ib}^b - {\bf{w}}_{{\rm{gyro}}} } \right)$$
(21.31)

where g n = [0,0, g]T is the vector of acceleration due to gravity in the local navigation frame and g = 9.81 m/s2. Terrain map features are now referenced in local navigation frame coordinates, and their process model is given by

$${\bf{\dot m}}_i^n = {\bf{0}}$$
(21.32)

The complete form of the discrete-time process model equations in local SLAM are thus

$${\bf{p}}^n \left( k \right) = {\bf{p}}^n \left( {k - 1} \right) + {\bf{v}}^n \Delta t$$
(21.33)
$${\bf{v}}^n \left( k \right) = {\bf{v}}^n \left( {k - 1} \right) + \left[ {{\bf{C}}_b^n {\bf{\hat f}}^b - {\bf{C}}_b^n \delta {\bf{f}}^b - {\bf{C}}_b^n {\bf{w}}_{{\rm{accel}}} + {\bf{g}}^n } \right]\Delta t$$
(21.34)
$${\bf{\Psi }}^n \left( k \right) = {\bf{\Psi }}^n \left( {k - 1} \right) + \left[ {{\bf{E}}_b^n \left( {\hat \omega _{ib}^b - \delta \omega _{ib}^b - {\bf{w}}_{{\rm{gyro}}} } \right)} \right]\Delta t$$
(21.35)
$$\delta {\bf{f}}^b \left( k \right) = \delta {\bf{f}}^b \left( {k - 1} \right) + {\bf{w}}_{b{\rm{,accel}}} \left( k \right)$$
(21.36)
$$\delta \omega _{ib}^b \left( k \right) = \delta \omega _{ib}^b \left( {k - 1} \right) + {\bf{w}}_{b{\rm{,gyro}}} \left( k \right)$$
(21.37)
$${\bf{m}}_i^n \left( k \right) = {\bf{m}}_i^n \left( {k - 1} \right)$$
(21.38)
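To illustrate how the discrete-time process model is applied in practice, the following Python/NumPy sketch performs a single Euler prediction step for the local SLAM vehicle states of Eqs. 21.33–21.37 (map feature states are simply carried forward per Eq. 21.38). The function and variable names are illustrative assumptions, and the noise terms are omitted here since the filter accounts for them through the covariance prediction (Sect. 21.3.1); the global-frame model of Eqs. 21.20–21.25 follows the same pattern with the additional Coriolis, Earth-rate, and ECEF gravity terms.

```python
import numpy as np

def C_b_n(phi, theta, psi):
    """Body-to-navigation DCM from Euler angles (roll, pitch, yaw), ZYX convention."""
    cph, sph = np.cos(phi), np.sin(phi)
    cth, sth = np.cos(theta), np.sin(theta)
    cps, sps = np.cos(psi), np.sin(psi)
    return np.array([
        [cps * cth, cps * sth * sph - sps * cph, cps * sth * cph + sps * sph],
        [sps * cth, sps * sth * sph + cps * cph, sps * sth * cph - cps * sph],
        [-sth,      cth * sph,                   cth * cph]])

def E_b_n(phi, theta):
    """Body rotation rate to Euler angle rate transformation, Eq. 21.8.
    Singular at theta = 90 deg, hence the pitch constraint noted earlier."""
    return np.array([
        [1.0, np.sin(phi) * np.tan(theta), np.cos(phi) * np.tan(theta)],
        [0.0, np.cos(phi),                -np.sin(phi)],
        [0.0, np.sin(phi) / np.cos(theta), np.cos(phi) / np.cos(theta)]])

def predict_local(p_n, v_n, psi_n, df_b, dw_b, f_b_meas, w_b_meas, dt):
    """One Euler integration step of the local SLAM vehicle process model (Eqs. 21.33-21.37)."""
    g_n = np.array([0.0, 0.0, 9.81])          # gravity in the local (NED) navigation frame
    Cbn = C_b_n(*psi_n)
    p_next = p_n + v_n * dt                                                # Eq. 21.33
    v_next = v_n + (Cbn @ (f_b_meas - df_b) + g_n) * dt                    # Eq. 21.34 (noise omitted)
    psi_next = psi_n + E_b_n(psi_n[0], psi_n[1]) @ (w_b_meas - dw_b) * dt  # Eq. 21.35
    return p_next, v_next, psi_next, df_b, dw_b   # biases modeled as constant, Eqs. 21.36-21.37
```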

2.3 Landmark/Terrain Sensor Equations

The observation model equations describe the relationship between the sensor observation of a map feature, made in the coordinates of the terrain sensor, and the estimated states in SLAM. Figure 21.1 illustrates the relevant relationships between the sensor frame, sensor observation, and map and vehicle positions. The observation z i (k) is related to the estimated states using Eq. 21.39 for global SLAM and Eq. 21.40 for local SLAM:

$${\bf{z}}_i \left( k \right) = {\bf{H}}_i \left( {{\bf{p}}^e \left( k \right),{\bf{\Psi }}^n \left( k \right),{\bf{m}}_i^e \left( k \right),k} \right) + {\bf{v}}\left( k \right)$$
(21.39)
$${\bf{z}}_i \left( k \right) = {\bf{H}}_{i,{\rm{local}}} \left( {{\bf{p}}^n \left( k \right),{\bf{\Psi }}^n \left( k \right),{\bf{m}}_i^n \left( k \right),k} \right) + {\bf{v}}\left( k \right)$$
(21.40)

where H i (.,.,., k) and H i ,local(.,.,., k) are functions of the feature location, vehicle position, and Euler angles and v(k) is an uncorrelated, zero-mean observation noise vector of covariance R.

Terrain observations could come from a variety of different sensors such as RADAR, LIDAR, or an electro-optical camera. The SLAM algorithm requires that point features can be extracted from the observation sensor data, which then become mapped terrain features. In the case of large objects that do not show up as a “point” in sensor data, the centroid of the object is found, or else several points can be used to represent a single object. Example feature extraction algorithms for vision include Scale-Invariant Feature Transform (SIFT) features (Lowe 2004) or model-based feature matching (Nixon and Aguado 2001). Features in this sense are points in the sensor data that are distinct and easily recognizable or else points in the sensor data that appear to correlate well with a given feature model or template that is specified offline.

Sensor observations can be broken up into two types: range/bearing observations (as might be available from RADAR or LIDAR) and bearing-only observations (as might be available from an electro-optical sensor such as a camera). When both range and bearing observations are made, the observation model is given by

$${\bf{z}}_i \left( k \right) = \left[ {\matrix{ {\rho _i } \cr {\varphi _i } \cr {\vartheta _i } \cr } } \right] = \left[ {\matrix{ {\sqrt {\left( {x_i^s } \right)^2 + \left( {y_i^s } \right)^2 + \left( {z_i^s } \right)^2 } } \cr {\tan ^{ - 1} \left( {{{y_i^s } \over {x_i^s }}} \right)} \cr {\tan ^{ - 1} \left( {{{z_i^s } \over {\sqrt {\left( {x_i^s } \right)^2 + \left( {y_i^s } \right)^2 } }}} \right)} \cr } } \right]$$
(21.41)

where ρ i, φ i, and ϑ i are the observed range, azimuth, and elevation angles to the feature and \(x_i^s,y_i^s\), and \(z_i^s\) are the Cartesian coordinates of \({\bf{p}}_{ms,i}^s\), the relative position of the i th feature with respect to the sensor, measured in the sensor frame.

There are two forms that can be used to represent a bearing-only observation. The observation z i (k) can be represented by azimuth (φ i ) and elevation angles (ϑ i ):

$${\bf{z}}_{i,{\rm{ang}}} \left( k \right) = \left[ {\matrix{ {\varphi _i } \cr {\vartheta _i } \cr } } \right] = \left[ {\matrix{ {\tan ^{ - 1} \left( {{{y_i^s } \over {x_i^s }}} \right)} \cr {\tan ^{ - 1} \left( {{{z_i^s } \over {\sqrt {\left( {x_i^s } \right)^2 + \left( {y_i^s } \right)^2 } }}} \right)} \cr } } \right]$$
(21.42)

where R ang is the angular noise covariance. For a vision camera, the observation is better represented as pixels in the image of the camera, using a pinhole camera model:

$${\bf{z}}_{i,{\rm{pix}}} \left( k \right) = \left[ {\matrix{ u \cr v \cr } } \right] = \left[ {\matrix{ {f_u \left( {{{y_i^s } \over {x_i^s }}} \right) + u_0 } \cr {f_v \left( {{{z_i^s } \over {x_i^s }}} \right) + v_0 } \cr } } \right]$$
(21.43)

where u 0, v 0, f u, and f v are calibration parameters for the camera and the pixel noise covariance is R pix. The relationship between the pixel coordinates and the azimuth and elevation angles is given by Eq. 21.44

$$\left[ {\matrix{ {\varphi _i } \cr {\vartheta _i } \cr } } \right] = \left[ {\matrix{ {\tan ^{ - 1} \left( {{{u - u_0 } \over {f_u }}} \right)} \cr {\tan ^{ - 1} \left( {{{\left( {v - v_0 } \right)\cos \left( {\varphi _i } \right)} \over {f_v }}} \right)} \cr } } \right]$$
(21.44)

\({\bf{p}}_{ms,i}^s\), the relative position of the i th feature with respect to the sensor, measured in the sensor frame, is given by:

$$\matrix{ {{\bf{p}}_{ms,i}^s = {\bf{C}}_b^s {\bf{C}}_n^b \left[ {{\bf{m}}_i^n - {\bf{p}}^n - {\bf{C}}_b^n {\bf{p}}_{sb}^b } \right]} \cr { = {\bf{C}}_b^s {\bf{C}}_e^b \left[ {{\bf{m}}_i^e - {\bf{p}}^e - {\bf{C}}_b^e {\bf{p}}_{sb}^b } \right]} \cr }$$
(21.45)

where \({\bf{C}}_b^s\) is the transformation matrix from the body frame to the sensor frame, \({\bf{p}}_{sb}^b\) is the sensor offset from the vehicle center of mass, measured in the body frame (otherwise known as the “lever arm”), and \({\bf{C}}_e^b = \left( {{\bf{C}}_b^e } \right)^T\).
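As a concrete illustration of the observation models above, the short sketch below evaluates Eq. 21.45 (local-frame form) and then forms either a range/bearing observation (Eq. 21.41) or a pixel observation (Eq. 21.43). The variable names and the use of arctan2 for numerical robustness are implementation assumptions rather than details from the text.

```python
import numpy as np

def feature_in_sensor_frame(m_n, p_n, C_bn, C_bs, p_sb_b):
    """Relative feature position in the sensor frame, Eq. 21.45 (local navigation frame form).
    C_bn: body-to-navigation DCM, C_bs: body-to-sensor DCM, p_sb_b: lever arm in the body frame."""
    return C_bs @ C_bn.T @ (m_n - p_n - C_bn @ p_sb_b)

def range_bearing_obs(p_ms_s):
    """Range, azimuth, and elevation observation model, Eq. 21.41."""
    x, y, z = p_ms_s
    return np.array([np.linalg.norm(p_ms_s),
                     np.arctan2(y, x),
                     np.arctan2(z, np.hypot(x, y))])

def pixel_obs(p_ms_s, f_u, f_v, u0, v0):
    """Pinhole camera observation model, Eq. 21.43."""
    x, y, z = p_ms_s
    return np.array([f_u * y / x + u0, f_v * z / x + v0])
```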

3 Inertial SLAM with Range and Bearing Sensors

When both range and bearing observations to features are available, the inertial SLAM algorithm is broken up into the following processes:

1. State Prediction. The process model equations in Sect. 21.2.2 are used to “predict” forward in time the estimated state vector and state covariance matrix in an EKF prediction step.

2. Data Association. Every time a new feature observation is made, the data association step is used to determine whether the observation is of a new feature or a known feature, and which feature it is of.

3. Feature Initialization. When a feature is observed by the terrain sensor for the first time, its position is computed and augmented into the estimated state vector and its initial position uncertainty augmented into the state covariance matrix.

4. State Update. Each time an observation is made of a previously seen feature, it is used in an EKF update step to correct the value of the estimated state and update the state covariance matrix.

Figure 21.2 provides an overview of the range/bearing inertial SLAM algorithm. At each time step, inertial sensor data is used for the prediction stage. When features are extracted from the sensor data, they are run through a data association stage. New features are augmented into the state vector and state covariance matrix, and observations of existing features are used to update the estimated state vector and state covariance matrix. The following subsections summarize the equations and methods in each step of the algorithm. For further details of inertial SLAM with range and bearing sensors and an implementation example, the reader is referred to Kim and Sukkarieh (2003).
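The four processes above can be arranged into a single recursive loop, as in Fig. 21.2. The skeleton below is only a structural sketch: the step functions are supplied by the caller and stand in for the procedures detailed in Sects. 21.3.1–21.3.4; the function and variable names are illustrative assumptions.

```python
def run_range_bearing_slam(measurements, x, P, predict, extract, associate, initialize, update):
    """Skeleton of the recursive range/bearing inertial SLAM loop (Fig. 21.2).
    measurements: time-ordered (kind, data) pairs, with kind in {"imu", "terrain"}."""
    for kind, data in measurements:
        if kind == "imu":
            x, P = predict(x, P, data)              # 1. EKF state prediction
        else:
            for z in extract(data):                 # point features from the terrain sensor data
                j = associate(x, P, z)              # 2. data association (None => unseen feature)
                if j is None:
                    x, P = initialize(x, P, z)      # 3. feature initialization
                else:
                    x, P = update(x, P, z, j)       # 4. EKF state update
    return x, P
```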

Fig. 21.2

Overview of the range/bearing inertial SLAM algorithm. The algorithm is recursive; EKF state prediction (1) is performed when inertial sensor data is available. Data association (2), feature initialization (3), and EKF state update (4) are performed when terrain sensor observations are made

3.1 State Prediction

The state prediction stage is run recursively each time a new reading is taken from the inertial sensors. In the case of global SLAM, the estimated state vector \({\bf{\hat x}}\left( {k + 1} \right)\) is predicted from the previous time step state estimate using Eqs. 21.20–21.25. The state covariance \({\bf{P}}^ - \left( {k + 1} \right)\), which is the state covariance at time step k + 1 after prediction, is computed as

$${\bf{P}}^ - \left( {k + 1} \right) = \nabla {\bf{FP}}\left( k \right)\nabla {\bf{F}}^T + \nabla {\bf{GQ}}\left( k \right)\nabla {\bf{G}}^T$$
(21.46)

where ∇F and ∇G are the Jacobians of the state transition function F[.,., k] with respect to the state vector \({\bf{\hat x}}\left( {k + 1} \right)\) and input model transition function G[., k] with respect to the noise input w(k + 1), respectively. In the case of local SLAM, the estimated state vector \({\bf{\hat x}}_{{\rm{local}}} \left( {k + 1} \right)\) is predicted forward using Eqs. 21.33–21.38. The state covariance \({\bf{P}}^ - \left( {k + 1} \right)\) is computed as

$${\bf{P}}^ - \left( {k + 1} \right) = \nabla {\bf{F}}_{{\rm{local}}} {\bf{P}}\left( k \right)\nabla {\bf{F}}_{{\rm{local}}}^T + \nabla {\bf{G}}_{{\rm{local}}} {\bf{Q}}\left( k \right)\nabla {\bf{G}}_{{\rm{local}}}^T$$
(21.47)

where ∇Flocal and ∇Glocal are the Jacobians of the state transition function F local [.,., k] with respect to the state vector \({\bf{\hat x}}_{{\rm{local}}} \left( {k + 1} \right)\) and input model transition function G local [., k] with respect to the noise input w(k + 1), respectively.
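In code, the covariance prediction of Eqs. 21.46 and 21.47 is a single matrix expression once the Jacobians have been evaluated at the current estimate; a finite-difference approximation is one way to obtain them when analytical Jacobians are not derived by hand. The function names below are illustrative assumptions.

```python
import numpy as np

def predict_covariance(P, F_jac, G_jac, Q):
    """State covariance prediction, Eqs. 21.46 / 21.47."""
    return F_jac @ P @ F_jac.T + G_jac @ Q @ G_jac.T

def numerical_jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian of f with respect to x, one way to evaluate the
    nabla-F and nabla-G terms at the current state estimate."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx) - fx) / eps
    return J
```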

3.2 Feature Initialization

When the first range/bearing observation of a particular feature is obtained, its position is calculated using the initialization function \({\bf{J}}_1 \left[ {{\bf{\hat x}}\left( k \right),{\bf{J}}_2 \left( {{\bf{z}}_i \left( k \right)} \right)} \right]\) which is given as

$${\bf{J}}_1 \left[ {{\bf{\hat x}}\left( k \right),{\bf{J}}_2 \left( {{\bf{z}}_i \left( k \right)} \right)} \right] \to {\bf{m}}_i^e = {\bf{p}}^e + {\bf{C}}_n^e {\bf{C}}_b^n {\bf{p}}_{sb}^b + {\bf{C}}_n^e {\bf{C}}_b^n {\bf{C}}_s^b {\bf{p}}_{ms}^s$$
(21.48)

for the case of global SLAM and

$${\bf{J}}_{1,{\rm{local}}} \left[ {{\bf{\hat x}}_{{\rm{local}}} \left( k \right),{\bf{J}}_2 \left( {{\bf{z}}_i \left( k \right)} \right)} \right] \to {\bf{m}}_i^n = {\bf{p}}^n + {\bf{C}}_b^n {\bf{p}}_{sb}^b + {\bf{C}}_b^n {\bf{C}}_s^b {\bf{p}}_{ms}^s$$
(21.49)

for the case of local SLAM, where

$${\bf{J}}_2 \left( {{\bf{z}}_i \left( k \right)} \right) \to {\bf{p}}_{ms,i}^s = \left[ {\matrix{ {\rho _i \cos \left( {\varphi _i } \right)\cos \left( {\vartheta _i } \right)} \cr {\rho _i \sin \left( {\varphi _i } \right)\cos \left( {\vartheta _i } \right)} \cr {\rho _i \sin \left( {\vartheta _i } \right)} \cr } } \right]$$
(21.50)

for both the global and local SLAM cases. The state vector and covariance are then augmented to include the new feature position:

$${\bf{\hat x}}_{{\rm{aug}}} \left( k \right) = \left[ {\matrix{ {{\bf{\hat x}}\left( k \right)} \cr {{\bf{m}}_i^e \left( k \right)} \cr } } \right]$$
(21.51)
$${\bf{P}}_{{\rm{aug}}} \left( k \right) = \left[ {\matrix{ {\bf{I}} & {\bf{0}} \cr {\nabla {\bf{J}}_x } & {\nabla {\bf{J}}_z } \cr } } \right]\left[ {\matrix{ {{\bf{P}}\left( k \right)} & {\bf{0}} \cr {\bf{0}} & {{\bf{R}}_k } \cr } } \right]\left[ {\matrix{ {\bf{I}} & {\bf{0}} \cr {\nabla {\bf{J}}_x } & {\nabla {\bf{J}}_z } \cr } } \right]^T$$
(21.52)

for the case of global SLAM and

$${\bf{\hat x}}_{{\rm{local,aug}}} \left( k \right) = \left[ {\matrix{ {{\bf{\hat x}}_{{\rm{local}}} \left( k \right)} \cr {{\bf{m}}_i^n \left( k \right)} \cr } } \right]$$
(21.53)
$${\bf{P}}_{{\rm{local,aug}}} \left( k \right) = \left[ {\matrix{ {\bf{I}} & {\bf{0}} \cr {\nabla {\bf{J}}_x } & {\nabla {\bf{J}}_z } \cr } } \right]\left[ {\matrix{ {{\bf{P}}_{{\rm{local}}} \left( k \right)} & {\bf{0}} \cr {\bf{0}} & {{\bf{R}}_k } \cr } } \right]\left[ {\matrix{ {\bf{I}} & {\bf{0}} \cr {\nabla {\bf{J}}_x } & {\nabla {\bf{J}}_z } \cr } } \right]^T$$
(21.54)

for the case of local SLAM where ∇J x and ∇J z are the Jacobians of the initialization function J 1 with respect to the state estimate \({\bf{\hat x}}\left( k \right)\) and the observation z i (k), respectively. The position of this feature becomes correlated to all of the vehicle states including position, velocity, and attitude along with inertial sensor biases and also to the position of other features in the map.
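A minimal sketch of the initialization and augmentation steps for the local-frame, range/bearing case (Eqs. 21.49–21.54), assuming the Jacobians of the initialization function have already been evaluated at the current estimate; all names are illustrative.

```python
import numpy as np

def initialize_feature(x, P, z, R, p_n, C_bn, C_sb, p_sb_b, J_x, J_z):
    """Augment the state vector and covariance with a newly observed feature
    (local-frame form of Eqs. 21.49-21.54). z = (range, azimuth, elevation);
    J_x, J_z: Jacobians of the initialization function w.r.t. the state and observation."""
    rng, az, el = z
    # Eq. 21.50: observation converted to Cartesian coordinates in the sensor frame
    p_ms_s = rng * np.array([np.cos(az) * np.cos(el),
                             np.sin(az) * np.cos(el),
                             np.sin(el)])
    m_n = p_n + C_bn @ p_sb_b + C_bn @ C_sb @ p_ms_s      # Eq. 21.49 (C_sb: sensor-to-body DCM)
    x_aug = np.concatenate([x, m_n])                      # Eq. 21.51 / 21.53
    n = x.size
    T = np.block([[np.eye(n), np.zeros((n, 3))],
                  [J_x,       J_z]])
    P_aug = T @ np.block([[P,                np.zeros((n, 3))],
                          [np.zeros((3, n)), R]]) @ T.T   # Eq. 21.52 / 21.54
    return x_aug, P_aug
```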

3.3 Data Association

Data association is the process of matching observations of features from the terrain sensor with the estimated 3D position of the feature within the map. The validity of potential associations between observations and features is assessed using the Mahalanobis distance (γ) (Neira and Tardos 2001) in the sensor space (range, azimuth, and elevation):

$$\gamma _i = v_i \left( k \right)^T {\bf{S}}_i \left( k \right)^{ - 1} v_i \left( k \right)$$
(21.55)

where v i (k) and S i (k) are the innovation and innovation covariance (see Sect. 21.3.4) for an observation of the i th feature in the map at time segment k.

When checking the association of a given observation with the initialized features in the map, γ i is calculated for each initialized feature. Matchings that fall within a defined threshold of γ i corresponding to a 95 % level of confidence are considered acceptable, in which case the observation is associated to a known feature in the map and an EKF state update is performed (see Sect. 21.3.4 below). If the observation does not fall within the threshold of an existing map feature, it is assumed that the feature has not been seen before, and thus the observation data is used to initialize the feature into the map as is shown in Sect. 21.3.2.

This method of data association works well when feature observations are well spaced in the sensor coordinates with respect to the uncertainty of the vehicle position and orientation. In the case of very dense feature observations, it may be necessary to consider not only matches of individual features to individual sensor observations but also the joint compatibility of several features to several observations simultaneously so as to overcome association ambiguity. For efficient techniques for joint compatibility data association based on the innovation gate in Eq. 21.55, the reader is referred to Neira and Tardos (2001).
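The gate itself is a one-line computation: the Mahalanobis distance of Eq. 21.55 is compared against a chi-square quantile for the dimension of the observation. The use of scipy for the quantile and the default values below are implementation assumptions.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_gate(nu, S, dof=3, confidence=0.95):
    """Validation gate of Eq. 21.55. nu: innovation, S: innovation covariance,
    dof: observation dimension (3 for range/bearing, 2 for bearing-only)."""
    gamma = float(nu @ np.linalg.solve(S, nu))
    return gamma, gamma < chi2.ppf(confidence, dof)
```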

3.4 State Update

Once a feature has been initialized into the state vector, subsequent observations of this feature are used to update the entire state vector consisting of the vehicle pose, velocity, inertial sensor biases, and the position of this feature and other features in the environment. The state estimate is updated in an EKF update step where the updated state estimate and state covariance matrix after the update are as follows:

$${\bf{\hat x}}^ + \left( {k + 1} \right) = {\bf{\hat x}}^ - \left( {k + 1} \right) + {\bf{W}}\left( {k + 1} \right)v\left( {k + 1} \right)$$
(21.56)
$${\bf{P}}^ + \left( {k + 1} \right) = {\bf{P}}^ - \left( {k + 1} \right) - {\bf{W}}\left( {k + 1} \right){\bf{S}}_i \left( {k + 1} \right){\bf{W}}\left( {k + 1} \right)^T$$
(21.57)
$$v_i \left( {k + 1} \right) = {\bf{z}}_i - {\bf{H}}_i \left( {{\bf{\hat x}}^ - \left( {k + 1} \right)} \right)$$
(21.58)
$${\bf{S}}_i \left( {k + 1} \right) = \nabla {\bf{H}}_i {\bf{P}}^ - \left( {k + 1} \right)\nabla {\bf{H}}_i^T + {\bf{R}}\left( {k + 1} \right)$$
(21.59)
$${\bf{W}}\left( {k + 1} \right) = {\bf{P}}^ - \left( {k + 1} \right)\nabla {\bf{H}}_i^T {\bf{S}}_i^{ - 1} \left( {k + 1} \right)$$
(21.60)

for the case of global SLAM and

$${\bf{\hat x}}_{{\rm{local}}}^{\rm{ + }} \left( {k + 1} \right) = {\bf{\hat x}}_{{\rm{local}}}^ - \left( {k + 1} \right) + {\bf{W}}\left( {k + 1} \right)v\left( {k + 1} \right)$$
(21.61)
$${\bf{P}}_{{\rm{local}}}^ + \left( {k + 1} \right) = {\bf{P}}_{^{{\rm{local}}} }^ - \left( {k + 1} \right) - {\bf{W}}\left( {k + 1} \right){\bf{S}}_i \left( {k + 1} \right){\bf{W}}\left( {k + 1} \right)^T$$
(21.62)
$$v_i \left( {k + 1} \right) = {\bf{z}}_i - {\bf{H}}_{i,{\rm{local}}} \left( {{\bf{\hat x}}_{{\rm{local}}}^ - \left( {k + 1} \right)} \right)$$
(21.63)
$${\bf{S}}_i \left( {k + 1} \right) = \nabla {\bf{H}}_{i,{\rm{local}}} {\bf{P}}_{{\rm{local}}}^ - \left( {k + 1} \right)\nabla {\bf{H}}_{i,{\rm{local}}}^T + {\bf{R}}\left( {k + 1} \right)$$
(21.64)
$${\bf{W}}\left( {k + 1} \right) = {\bf{P}}_{{\rm{local}}}^ - \left( {k + 1} \right)\nabla {\bf{H}}_{i,{\rm{local}}}^T {\bf{S}}_i^{ - 1} \left( {k + 1} \right)$$
(21.65)

for the case of local SLAM, where \({\bf{H}}_i \left( {{\bf{\hat x}}^ - \left( {k + 1} \right)} \right)\) or \({\bf{H}}_{i,{\rm{local}}} \left( {{\bf{\hat x}}_{{\bf{local}}}^ - \left( {k + 1} \right)} \right)\) is the predicted feature observation, which is computed from the estimated vehicle position and attitude and the estimated map feature location using Eqs. 21.41 and 21.45. ∇H i and ∇H i ,local are the Jacobians of the observation function with respect to the predicted state vector. Once a feature leaves the field of view of the sensor, its position remains in the state vector and continues to be updated via its correlations to other visible features in the state vector.
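For completeness, the update of Eqs. 21.56–21.60 in compact NumPy form, assuming the predicted observation and the observation Jacobian have already been computed from Eqs. 21.41 and 21.45; the function and variable names are illustrative.

```python
import numpy as np

def ekf_update(x, P, z, z_pred, H_jac, R):
    """EKF state update for one feature observation (Eqs. 21.56-21.60)."""
    nu = z - z_pred                               # innovation, Eq. 21.58
    S = H_jac @ P @ H_jac.T + R                   # innovation covariance, Eq. 21.59
    W = P @ H_jac.T @ np.linalg.inv(S)            # gain, Eq. 21.60
    x_new = x + W @ nu                            # Eq. 21.56
    P_new = P - W @ S @ W.T                       # Eq. 21.57
    return x_new, P_new
```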

4 Inertial SLAM with Bearing-Only Sensors

When bearing-only observations are made using a terrain sensor such as in the case of a vision system, additional elements must be added to the SLAM algorithm. Performing SLAM with bearing-only observations poses two main additional challenges to the range and bearing case. Firstly, a single bearing-only observation provides insufficient information alone to localize a feature in 3D. Instead observations from two sufficiently different poses are required. Secondly, data association is complicated by bearing-only observations. Since the 3D position of the feature is not known from a single observation, the Mahalanobis distance (Neira and Tardos 2001), commonly used for validation gating in tracking tasks, and as shown for the range-bearing case in Sect. 21.3.3, cannot be calculated in the standard way.

This section details variations to the inertial SLAM algorithms that account for complications arising from the use of bearing-only terrain sensors. The bearing-only inertial SLAM algorithm is broken up into the following processes:

1. State Prediction. The prediction step in the bearing-only case is performed in a similar manner as in the range/bearing case.

2. Data Association. Data association for initialized features follows the same methods as shown in the range/bearing case. Data association of uninitialized features is tackled by creating multi-hypothesis distributions of the possible feature locations in 3D (i.e., along the line of sight of an observation). Subsequent observations of the same feature can be associated by matching the most likely hypotheses and culling the hypotheses that do not match.

3. Feature Initialization. A delayed initialization technique is used to store information from bearing-only observations until there exist two observations with a sufficient baseline from which to initialize the 3D position of the feature. Once this is available, all of the information contained in stored observations is recovered.

4. State Update. The update step in the bearing-only case is performed in a similar manner as in the range/bearing case, where bearing-only observations are used to update the estimated state which includes vehicle states, stored pose data, and features whose 3D positions have been initialized into the map.

Figure 21.3 provides an overview of the bearing-only inertial SLAM algorithm. The following subsections summarize the equations and methods in each step of the algorithm. For further details of inertial SLAM with bearing-only sensors and an implementation example, the reader is referred to Bryson and Sukkarieh (2007).

Fig. 21.3

Overview of the bearing-only inertial SLAM algorithm. The algorithm is recursive; EKF state prediction (1) is performed when inertial sensor data is available. Data association (3) and EKF state update (4) are performed when terrain sensor observations are made of features that have been already initialized into the map. Observations of new features are stored until enough observations exist to initialize the position of the feature into the state vector (2)

4.1 State Prediction and Update

The bearing-only SLAM equations rely on the storing of vehicle pose data into the estimated state vector. The method for storing pose data is shown below in Sect. 21.4.2. The state prediction for the bearing-only case follows the same equations and methods as shown in the range/bearing observation case in Sect. 21.3.1, except that stored pose data is predicted forward in the predict step in the same way as stored map features. The only difference in the state update between the bearing-only case and the range/bearing case (shown in Sect. 21.3.4) is that now no range information is available in the observation. Equations 21.56–21.65 are used for the update, where Eq. 21.42 or 21.43 is used to compute \({\bf{H}}_i \left( {{\bf{\hat x}}^ - \left( {k + 1} \right)} \right)\) or \({\bf{H}}_{i,{\rm{local}}} \left( {{\bf{\hat x}}_{{\bf{local}}}^ - \left( {k + 1} \right)} \right)\), the predicted feature observation, depending on the exact form of bearing observation from the terrain sensor (i.e., bearing angles or pixels).

4.2 Feature Initialization

A single bearing-only observation is insufficient to initialize the 3D position of a feature into the SLAM filter with Gaussian uncertainty. The following subsections outline a method for delayed initialization of a feature into the filter by using stored observations and vehicle pose information. For simplification, the feature position is always initialized into the local navigation frame first before being transformed into the ECEF frame if global SLAM is performed. Otherwise, the feature position remains in the local navigation frame.

4.2.1 Storing Feature Observations and Vehicle Pose Information

When an observation of an uninitialized feature is made, the current bearing-only observation is stored, and the SLAM state vector is augmented to include the current vehicle pose (three position states and three Euler angle states):

$${\bf{\hat x}}_v = \left[ {\matrix{ {{\bf{p}}^e \left( k \right)} \cr {{\bf{v}}^e \left( k \right)} \cr {\Psi ^n \left( k \right)} \cr {\delta {\bf{f}}^b \left( k \right)} \cr {\delta \omega _{ib}^b \left( k \right)} \cr } } \right],\quad {\bf{\hat x}}_p = \left[ {\matrix{ {{\bf{C}}_e^n {\bf{p}}^e \left( k \right)} \cr {\Psi ^n \left( k \right)} \cr } } \right] = \left[ {\matrix{ {{\bf{p}}^n \left( k \right)} \cr {\Psi ^n \left( k \right)} \cr } } \right]$$
(21.66)
$${\bf{\hat x}}_{{\rm{aug}}} = \left[ {\matrix{ {{\bf{\hat x}}_v \left( k \right)} \cr {{\bf{m}}^n \left( k \right)} \cr {{\bf{\hat x}}_p \left( k \right)} \cr } } \right]$$
(21.67)

The state covariance matrix is then augmented with the stored vehicle pose:

$${\bf{P}}_{{\rm{aug}}} \left( k \right) = \left[ {\matrix{ {{\bf{P}}_{vv} } & {{\bf{P}}_{vm} } & {{\bf{P}}_{vp} } \cr {{\bf{P}}_{mv} } & {{\bf{P}}_{mm} } & {{\bf{P}}_{mp} } \cr {{\bf{P}}_{pv} } & {{\bf{P}}_{pm} } & {{\bf{P}}_{pp} } \cr } } \right]$$
(21.68)

\({\bf{\hat x}}_v\) is the concatenation of the vehicle position, velocity, attitude, and inertial sensor bias states where \({\bf{\hat x}}_p\) is the concatenation of the vehicle position and attitude (i.e., the vehicle pose states) at the time of the observation. \({\bf{\hat x}}_{{\rm{aug}}}\) is the augmented state vector which is comprised of the vehicle states \(\left( {{\bf{\hat x}}_v } \right)\), the 3D positions of all of the map features (m n), and the added vehicle pose states \(\left( {{\bf{\hat x}}_p } \right)\). The observation (coordinates of the feature in the image plane) is stored separately from the EKF state vector.

In local SLAM, the covariance term P pp (covariance of the pose states) is derived by taking the position and attitude covariance matrix subblocks from within P vv (since these states have the same value and the same covariance as the current vehicle position and attitude). Similarly, the covariance subblock P pm is taken from the existing cross correlations between the current vehicle states and map states (i.e., subblocks of P vm corresponding just to position and attitude). P pv is the cross correlation between all of the current vehicle states (i.e., position, attitude, velocity, and inertial sensor biases) and the current pose states. This is taken from subblocks of P vv itself since there are states in common (position and attitude), and thus parts of the added vehicle pose states \(\left( {{\bf{\hat x}}_p } \right)\) are completely correlated to the current vehicle state \(\left( {{\bf{\hat x}}_v } \right)\). In the case of global SLAM, a \({\bf{C}}_e^n\) transformation is applied to the vehicle position covariances in order to represent the stored pose in the local navigation frame.

As the process model of the vehicle comes into play and the filter moves forward in time, the correlations between the stored pose and the new time vehicle states (i.e., \({\bf{\hat x}}_v\)(k + 1)) will decrease (due to the process noise on \({\bf{\hat x}}_v\)(k + 1)).

4.2.2 Deciding When to Initialize a Feature

Eventually enough feature observations will be made from varying vehicle poses to initialize the position of the feature. Initializing a feature too early (i.e., using too few observations or observations with an insufficient separating angle) can result in inconsistency, as the true probability distribution of the feature location is not well represented by a Gaussian. The disadvantage of overdelaying the initialization is that the uncertainty in the vehicle states continues to grow before the initialization; as the uncertainty in the current vehicle state grows, linearization errors can affect the consistency of the filter, so the stored information should be initialized and recovered as quickly as possible to limit this effect. Additionally, it may be desirable to quickly recover the stored information, which contributes toward the accuracy of the vehicle localization estimates that may be used as feedback for the control of the vehicle. A further limit on how long to delay the initialization is the increased computational burden imposed by adding stored observations to the state vector.

In Bailey (2003), the author discusses a method for testing the conditioning of the initialization by using a Kullback-Leibler distance measure between the linearized update and an approximation of the update using particles to represent the final probability distribution of the feature position. This method however is very computationally intensive; instead, a more practical approach is to set a conservative threshold for the minimum angle between observations necessary to initialize a feature.
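A practical realization of such a conservative threshold is simply the angle between the two stored line-of-sight unit vectors (Eq. 21.70); the minimum-angle value below is an illustrative assumption, not a figure from the text.

```python
import numpy as np

def sufficient_baseline(u1_n, u2_n, min_angle_deg=5.0):
    """Return True if two line-of-sight unit vectors are separated by enough angle
    to initialize the feature. The 5 degree default is illustrative only."""
    cos_angle = np.clip(np.dot(u1_n, u2_n), -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle)) >= min_angle_deg
```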

4.2.3 Initializing a 3D Feature Position Estimate

When it is decided to initialize the 3D position of a feature into the map, the two stored observations of the feature which are separated by the largest angle are used to create an initial estimate of the feature position. Each bearing-only observation can be represented by a 3D point in space y n from where the observation was made (at the origin of the sensor) along with a unit vector ū n pointing along the line of sight of the observation; thus,

$${\bf{y}}^n = {\bf{p}}^n + C_b^n {\bf{p}}_{sb}^b$$
(21.69)
$${\bf{\bar u}}^n = C_b^n C_s^b {\bf{\bar p}}_{ms}^s $$
(21.70)

where \({\bf{\bar x}}\) indicates the unit vector of a vector x. p n and \(C_b^n \) are determined from the stored pose data associated to each observation, and \({\bf{\bar p}}_{ms}^s \) is determined from the observation data itself using Eq. 21.44 to convert the pixel observation to azimuth and elevation angles and Eq. 21.71 to convert to a unit vector:

$${\bf{\bar p}}_{ms}^s = \left[ {\matrix{ {\cos \left( {\varphi _i } \right)\cos \left( {\vartheta _i } \right)} \cr {\sin \left( {\varphi _i } \right)\cos \left( {\vartheta _i } \right)} \cr {\sin \left( {\vartheta _i } \right)} \cr } } \right]$$
(21.71)

The lines of sight generated by each observation should intersect at one point in 3D space corresponding to the feature location. Since the observations and stored vehicle pose information are noisy, the lines of sight will generally not intersect perfectly. Instead, the initial feature position is computed as the midpoint of the closest points between the two lines of sight:

$$\matrix{ {{\bf{m}}_i^n } \hfill & { = {\bf{G}}\left( {{\bf{p}}_1^n,{\bf{p}}_2^n,\Psi _1^n,\Psi _2^n,{\bf{z}}_1,{\bf{z}}_2 } \right)} \hfill \cr {} \hfill & { = {1 \over 2}\left( {{\bf{y}}_1^n + {\bf{y}}_2^n + p_1.{\bf{\bar u}}_1^n + p_2.{\bf{\bar u}}_2^n } \right)} \hfill \cr } $$
(21.72)
$$p_1 = {{\left( {\left( {{\bf{y}}_2^n - {\bf{y}}_1^n } \right) \times {\bf{\bar u}}_2^n } \right) \cdot \left( {{\bf{\bar u}}_1^n \times {\bf{\bar u}}_2^n } \right)} \over {|{\bf{\bar u}}_1^n \times {\bf{\bar u}}_2^n |^2 }}$$
(21.73)
$$p_2 = {{\left( {\left( {{\bf{y}}_1^n - {\bf{y}}_2^n } \right) \times {\bf{\bar u}}_1^n } \right) \cdot \left( {{\bf{\bar u}}_2^n \times {\bf{\bar u}}_1^n } \right)} \over {|{\bf{\bar u}}_2^n \times {\bf{\bar u}}_1^n |^2 }}$$
(21.74)

In the event that there is a large discrepancy between the two lines (i.e., the minimum distance between the closest two points, one on each line, is larger than a threshold), the observations may be incorrect. This could be due to a misassociation of one of the observations or because the feature is moving for some reason. In this case, the observations are discarded, and the feature is not initialized. Provided there is no large discrepancy between the lines, the state vector and covariance matrix in the SLAM filter are then augmented to include the estimate of the new feature:

$${\bf{\hat x}}_{{\rm{aug}}} \left( k \right) = \left[ {\matrix{ {{\bf{\hat x}}\left( k \right)} \hfill \cr {{\bf{m}}_i^n \left( k \right)} \hfill \cr } } \right]$$
(21.75)
$$ {\bf{P}}_{\rm{aug}} \left( k \right) = \left[ \begin{matrix}{\bf{I}} & 0 \cr {\nabla {\bf{G}}_p } & {\nabla {\bf{G}}_z } \end{matrix} \right] \left[ \begin{matrix}{{\bf{P}}\left( k \right)} & 0 \cr 0 & {{\bf{R}}_{2 \times 2} } \end{matrix} \right] \left[ \begin{matrix}{\bf{I}} & 0 \cr {\nabla {\bf{G}}_p } & {\nabla {\bf{G}}_z } \end{matrix} \right]^T $$
(21.76)

where ∇G p and ∇G z are the Jacobians of the initialization function G(.) with respect to the pose states \(\left( {{\bf{p}}_1^n,\,{\bf{p}}_2^n,\Psi _1^n,\Psi _2^n } \right)\) and the observations (z 1, z 2), respectively, and R 2×2 is

$${\bf{R}}_{2 \times 2} = \left[ {\matrix{ {\bf{R}} \hfill & {\rm{0}} \hfill \cr {\rm{0}} \hfill & {\bf{R}} \hfill \cr } } \right]$$
(21.77)
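The triangulation of Eqs. 21.72–21.74, together with the discrepancy check described above, can be sketched as follows; the gap threshold is an assumed parameter for illustration.

```python
import numpy as np

def triangulate(y1, u1, y2, u2, max_gap=None):
    """Initial feature position from two bearing-only observations (Eqs. 21.72-21.74).
    y1, y2: sensor origins (Eq. 21.69); u1, u2: unit line-of-sight vectors (Eq. 21.70).
    Returns None if the lines are (near) parallel or their closest points are too far apart."""
    cross12 = np.cross(u1, u2)
    denom = float(np.dot(cross12, cross12))
    if denom < 1e-12:
        return None                                               # degenerate geometry
    p1 = np.dot(np.cross(y2 - y1, u2), cross12) / denom           # Eq. 21.73
    p2 = np.dot(np.cross(y1 - y2, u1), np.cross(u2, u1)) / denom  # Eq. 21.74
    c1 = y1 + p1 * u1                                             # closest point on line 1
    c2 = y2 + p2 * u2                                             # closest point on line 2
    if max_gap is not None and np.linalg.norm(c1 - c2) > max_gap:
        return None                                               # observations inconsistent; discard
    return 0.5 * (c1 + c2)                                        # Eq. 21.72
```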

4.2.4 Recovering the Information from Remaining Stored Observations

Once two observations have been used to initialize the 3D position of the feature into the filter, the remaining stored observations (z 1, z 2,…, z j ) are run through a batch EKF update. The update corrects not only the current feature being initialized but also the other features in the map and the current vehicle state estimates. The updated state vector and state covariance is calculated using EKF update equations described above for the range-bearing algorithm (i.e., Eqs. 21.56 and 21.57) where the innovation v(k) is composed using all of the stored observations for the feature:

$$v\left( k \right) = \left[ {\matrix{ {{\bf{z}}_{1,{\rm{pix}}} - {\bf{H}}\left( {{\bf{p}}_1^n,\Psi _1^n,{\bf{m}}_i^n } \right)} \cr {{\bf{z}}_{2,{\rm{pix}}} - {\bf{H}}\left( {{\bf{p}}_2^n,\Psi _2^n,{\bf{m}}_i^n } \right)} \cr \vdots \cr {{\bf{z}}_{j,{\rm{pix}}} - {\bf{H}}\left( {{\bf{p}}_j^n,\Psi _j^n,{\bf{m}}_i^n } \right)} \cr } } \right]$$
(21.78)

Once the update has been completed, pose states that no longer have any associated stored observations are removed from the state vector, and their corresponding rows and columns are removed from the covariance matrix. Finally, in the case of global SLAM, the newly initialized feature position is transformed from the local navigation frame into the ECEF frame. If local SLAM is performed, the feature position remains in the local navigation frame.

4.3 Data Association

When performing data association for features that have already been initialized into the map, the same methods are used as in the range/bearing SLAM case shown in Sect. 21.3.3. Issues arise when attempting to find a data association test that can be performed for uninitialized features. Since the exact 3D position of the feature is not known, one cannot consistently calculate the innovation or innovation covariance of the feature. Instead, from only one or a small number of observations with a small baseline, the feature could lie anywhere in 3D space along the line of sight of the observation.

In order to associate observations of features that have not yet been initialized into the 3D map, a multi-hypothesis set of Gaussian distributions is created for the possible 3D locations of the feature along the line-of-sight vector of the first observation of the feature.

4.3.1 Starting a New Feature

When an observation is made in the image that cannot be associated to any other previously seen feature, initialized or uninitialized, it is assumed that this observation has come from a new feature that has not been seen before. The process begins by storing the observation and augmenting the EKF state vector with the current vehicle pose (see Sect. 21.4.2.1).

From the single observation, a set of equally weighted hypotheses is created for where the feature could lie in 3D space along the line of sight. The mean \(\left( {{\bf{\hat x}}_j } \right)\) and covariance (P j ) for each hypothesis are calculated for several different values of range (r j ) in equal increments from an expected minimum and maximum sensor range, as shown in the top of Fig. 21.4, using Eqs. 21.79 and 21.80:

$$\matrix{ {{\bf{\hat x}}_j } \hfill & { = {\bf{G}}\left( {{\bf{p}}^n \left( k \right),\Psi ^n \left( k \right),{\bf{z}}_i \left( k \right),r_j } \right)} \hfill \cr {} \hfill & { = {\bf{p}}^n + C_b^n {\bf{p}}_{sb}^b + r_j.\left( {C_b^n C_s^b {\bf{p}}_{ms}^s } \right)} \hfill \cr {} \hfill & {} \hfill \cr } $$
(21.79)
$${\bf{P}}_j = \nabla {\bf{G}}_v {\bf{P}}_{vv} \nabla {\bf{G}}_v^T + \nabla {\bf{G}}_z {\bf{R}}_{{\rm{ang}}} \nabla {\bf{G}}_z^T $$
(21.80)
Fig. 21.4

Data association of observations to uninitialized features. When a feature is seen for the first time, a set of hypotheses for the 3D position of the feature is generated at equal range increments along the line of sight (top). Future observations are checked for matches to any of the hypotheses. When a match is made to one of the hypotheses, the remaining hypotheses that do not match are culled (bottom)

where \({\bf{p}}_{ms}^s \) is calculated from the observation data using Eq. 21.71 and ∇G v and ∇G z are the Jacobians of the function G(.) with respect to the vehicle states and the observation and range data, respectively. The number of hypotheses used and the maximum and minimum range (and thus the spacing between the hypotheses) depend on the desired accuracy in the initial feature position, with more hypotheses resulting in a better initialization. A record of the multi-hypothesis distribution is maintained separately from the state vector and is used only to assist in associating future observations of the feature.
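The hypothesis generation of Eq. 21.79 amounts to placing candidate positions at equally spaced ranges along the observation's line of sight; the range bounds and hypothesis count below are illustrative assumptions (the corresponding covariances P j would follow from Eq. 21.80).

```python
import numpy as np

def range_hypotheses(p_n, C_bn, C_sb, p_sb_b, u_s, r_min=20.0, r_max=200.0, n_hyp=10):
    """Equally spaced 3D position hypotheses along the line of sight, Eq. 21.79.
    u_s: unit line-of-sight vector in the sensor frame (Eq. 21.71).
    The range bounds and number of hypotheses are illustrative values only."""
    origin = p_n + C_bn @ p_sb_b               # sensor origin in the navigation frame
    u_n = C_bn @ C_sb @ u_s                    # line of sight rotated into the navigation frame
    return [origin + r * u_n for r in np.linspace(r_min, r_max, n_hyp)]
```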

4.3.2 Associating Future Observations and Maintaining Feature Hypotheses

Since each hypothesis is Gaussian with a mean defined in 3D space, the innovation and innovation covariance can be calculated for each hypothesis of each uninitialized feature using

$$v\left( k \right) = {\bf{z}}_{i,{\rm{ang}}} \left( k \right) - {\bf{H}}\left( {{\bf{p}}^n \left( k \right),\Psi ^n \left( k \right),{\bf{\hat x}}_j } \right)$$
(21.81)
$${\bf{S}}\left( k \right) = \nabla {\bf{H}}_x \left( k \right){\bf{P}}_{{\rm{hyp}}} \left( k \right)\nabla {\bf{H}}_x^T \left( k \right) + {\bf{R}}_{{\rm{ang}}} $$
(21.82)
$${\bf{P}}_{{\rm{hyp}}} \left( k \right) = \left[ {\matrix{ {{\bf{P}}_{pp} } \hfill & 0 \hfill \cr 0 \hfill & {{\bf{P}}_j } \hfill \cr } } \right]$$
(21.83)

where P pp is the covariance subblock of the current vehicle position and attitude states. An approximation is made that the current vehicle state and the hypothesis are uncorrelated in order to simplify the data association process; to account for these correlations in the data association process would require a large computational resource. The result of this approximation is that during large SLAM loop closures, the EKF may not associate observations of a new feature and some observations may be discarded. It may thus take longer for new features at this time to be integrated into the map. For a given observation and for each hypothesis for a given uninitialized feature, γ is calculated using Eqs. 21.55, 21.81, and 21.82. If the value of γ is below the threshold corresponding to a 95% confidence for at least one of the hypotheses, then the observation is matched to this uninitialized feature.

In order to simplify the association of uninitialized features, when an association is made between an observation and one of the hypotheses for a given feature, all other hypotheses of this feature for which the observation does not match are culled from the set of hypotheses from which to associate future observations. As the vehicle moves around an uninitialized feature, the number of hypotheses gradually drops until only one hypothesis matches, the one that is closest to the true 3D feature location. The bottom sub-figure of Fig. 21.4 illustrates this process.

3.3.3 Data Association Procedure

Each time observations are received from the feature extraction process, the procedure begins by using Eq. 21.55 to evaluate the potential match between each observation and each of the 3D initialized features. Observations that match 3D initialized features are associated and sent on to the SLAM filter for the update. In the event of multiple features matching a single observation, the match with the lowest value of γ is accepted.

The remaining observations are tested for matches with each of the hypotheses of each uninitialized feature. Observations that match at least one hypothesis of an uninitialized feature are associated with that feature. The observation itself is stored, and the vehicle pose at the current time is added to the state vector (see Sect. 21.4.2.1). In the event of multiple uninitialized features matching a single observation, all matches to this observation are rejected.

For each remaining observation not matched to an initialized or uninitialized feature, a new set of hypotheses is created (see Sect. 21.4.3.1).
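A high-level sketch of this three-stage association procedure is shown below. The callables passed in (gamma_gate, matches_hypotheses, new_hypotheses) are assumed wrappers around the gating and initialization sketches above, not functions defined in this chapter.

```python
def associate_observations(observations, init_features, uninit_features,
                           gamma_gate, matches_hypotheses, new_hypotheses):
    """Association procedure of this section, as a sketch. Assumed interfaces:
    gamma_gate(z, feat) -> (gamma, passed), matches_hypotheses(z, hyps) -> bool,
    new_hypotheses(z) -> list of (mean, covariance) hypotheses."""
    slam_updates, stored_obs = [], []
    for z in observations:
        # 1. Initialized 3D features: accept the match with the lowest gamma.
        gated = [(g, fid) for fid, feat in init_features.items()
                 for g, ok in [gamma_gate(z, feat)] if ok]
        if gated:
            slam_updates.append((min(gated)[1], z))
            continue
        # 2. Uninitialized features: test against each feature's hypothesis set.
        matched = [fid for fid, hyps in uninit_features.items()
                   if matches_hypotheses(z, hyps)]
        if len(matched) == 1:
            stored_obs.append((matched[0], z))   # store obs; vehicle pose joins the state
        elif not matched:
            # 3. No match anywhere: create a new hypothesis set for a new feature.
            uninit_features[len(uninit_features)] = new_hypotheses(z)
        # If more than one uninitialized feature matched, the observation is rejected.
    return slam_updates, stored_obs
```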

The proposed multi-hypothesis method for data association could also potentially be used for initializing the feature position, as demonstrated in Kwok and Dissanayake (2004), where hypotheses are pruned until only one is left, which then becomes the initialized feature. In the approach presented here, the line-of-sight intersection method is used for calculating the initial feature position; achieving the same accuracy with the multi-hypothesis method would require a prohibitively large number of hypotheses (e.g., 1-m resolution for a feature at a range of 200 m would require 200 hypotheses).

4 Multi-vehicle Inertial SLAM Algorithm

In multi-vehicle SLAM, several vehicles move over a common section of terrain, and the task is to build a common map of the environment while providing localization estimates to each platform. The use of multiple cooperating vehicles has many advantages over SLAM on a single vehicle. Multiple vehicles provide wider sensor coverage and thus can build more extensive terrain maps in less time. The accuracy of the constructed terrain map is greater than in the single-vehicle case, as multiple vehicles contribute information toward a given feature location. This increased map accuracy in turn improves the accuracy of the localization estimates for each vehicle.

The work by Fenwick et al. (2002), Mourikis and Roumeliotis (2005), Walter and Leonard (2004), and Thrun and Lui (2003a) provides examples of multi-vehicle SLAM in which all vehicles send their sensor data to a central Kalman filter. These approaches are fully centralized, involving the communication of raw sensor data from each vehicle to a central source, and thus require a large amount of communication bandwidth. In Nettleton et al. (2003) and Ong et al. (2003), the authors demonstrate a decentralized architecture for multi-vehicle SLAM using a mathematically equivalent form of the EKF known as the Extended Information Filter (EIF). The use of the EIF avoids the need to communicate raw sensor data: instead, each vehicle uses its raw data in a local SLAM filter and then communicates the resulting map information.

In this section, data fusion architectures are considered for sharing map information built by each vehicle using the inertial SLAM algorithms discussed in Sects. 21.3 and 21.4. The core principle behind the data fusion schemes is the use of the EIF, which allows the map contributions from each platform to be summed together while the processing of the combined estimation task is distributed among the vehicles. Centralized and decentralized architectures are discussed in Sects. 21.5.2 and 21.5.3, respectively.

4.1 Global Vs. Local SLAM for Multiple Vehicles

When global SLAM is performed by each of the vehicles in the data fusion network, each vehicle uses the ECEF frame for referencing the position of terrain features in the environment. In the case of local SLAM, however, each vehicle may use its own independent local navigation frame in which it builds its local map. In order to fuse information from multiple maps, the relative transformations between each local navigation frame must be known. In the case where no localization or prior terrain reference information is available, the vehicles can compute a relative transformation by matching at least two features from each local map (for an example of this process, the reader is referred to Fox et al. (2006)). Once the transformation is known, a common representation of the environment can be built with respect to each vehicle’s local navigation frame.
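As an illustration of computing such a relative transformation, the sketch below assumes that both local navigation frames are gravity-aligned (local-level), so that only a heading offset and a translation are unknown and two or more matched features suffice; the general case of unknown relative attitude requires additional matched features and is treated in references such as Fox et al. (2006). The function name and interface are assumptions of this sketch.

```python
import numpy as np

def align_local_frames(feats_a, feats_b):
    """Estimate the transform taking frame-B coordinates into frame A from
    matched feature pairs, assuming both local navigation frames are
    gravity-aligned so that only a heading offset and a translation are
    unknown (an assumption of this sketch).

    feats_a, feats_b : (N, 3) arrays of the same N >= 2 features, expressed
                       in frame A and frame B, respectively.
    Returns (yaw, R, t) such that p_a ~= R @ p_b + t.
    """
    feats_a = np.asarray(feats_a, dtype=float)
    feats_b = np.asarray(feats_b, dtype=float)
    ca, cb = feats_a.mean(axis=0), feats_b.mean(axis=0)
    da, db = feats_a - ca, feats_b - cb
    # Least-squares heading offset from the horizontal components of the centred features.
    num = np.sum(da[:, 1] * db[:, 0] - da[:, 0] * db[:, 1])
    den = np.sum(da[:, 0] * db[:, 0] + da[:, 1] * db[:, 1])
    yaw = np.arctan2(num, den)
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    t = ca - R @ cb
    return yaw, R, t
```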

4.2 Centralized Architectures for Multi-vehicle Inertial SLAM

This section presents a centralized, distributed data fusion architecture for multi-vehicle inertial SLAM. The architecture is centralized in that part of the data fusion is performed at a central node. It is also distributed: rather than communicating all of the raw inertial and terrain sensor data to a central fusion source, each vehicle first performs single-vehicle inertial SLAM as shown in Sects. 21.3 and 21.4. The local terrain map built on each vehicle is then communicated in information form to a central source at regular intervals, where data fusion is performed. Finally, the central data fusion node communicates the fused map information back to each vehicle, where it is fused into the local map. The following subsections describe the process in more detail.

4.2.1 Centralized, Distributed Data Fusion

The centralized, distributed data fusion is based on the independent opinion pool architecture shown in Manyika and Durrant-Whyte (1994). At regular intervals, each vehicle takes its current state estimate relating to the map estimates only, that is, \({\bf{x}}_m \left( k \right)\) and \({\bf{P}}_{mm} \left( k \right)\), where

$${\bf{x}}_m \left( k \right) = \left[ {\matrix{ {{\bf{m}}_1^e \left( k \right)} \cr {{\bf{m}}_2^e \left( k \right)} \cr \vdots \cr {{\bf{m}}_N^e \left( k \right)} \cr } } \right]$$
(21.84)

for the case of global SLAM and

$${\bf{x}}_m \left( k \right) = \left[ {\matrix{ {{\bf{m}}_1^n \left( k \right)} \cr {{\bf{m}}_2^n \left( k \right)} \cr \vdots \cr {{\bf{m}}_N^n \left( k \right)} \cr } } \right]$$
(21.85)

for the case of local SLAM. \({\bf{P}}_{mm} \left( k \right)\) is the 3N × 3N submatrix of \({\bf{P}}\left( k \right)\) containing the elements relating to the map feature estimates. Each vehicle then calculates its posterior information:

$${\bf{Y}}_j \left( k \right) = {\bf{P}}_{mm}^{ - 1} \left( k \right)$$
(21.86)
$${\bf{y}}_j \left( k \right) = {\bf{Y}}_j \left( k \right){\bf{x}}_m \left( k \right)$$
(21.87)

for the jth vehicle, j = 1, …, M, where M is the number of vehicles, and communicates this to the central map filter. The information that is sent will obviously be correlated with the information sent in the previous communication (since each vehicle’s posterior information is based on the entire history of observations it has made). To overcome this, the central data filter maintains a record of the information sent to it by each vehicle at the previous communication, \(\left( {{\bf{Y}}_j \left( {k - 1} \right),{\bf{y}}_j \left( {k - 1} \right)} \right)\). When new information arrives, the old information is subtracted from it before it is added to the central map information, so that the correlations are removed and only new information is counted. The central map information update at the central data filter is thus

$${\bf{Y}}_{{\rm{central}}} \left( k \right) = {\bf{Y}}_{{\rm{central}}} \left( {k - 1} \right) + \sum\limits_{j = 1}^M {\left( {{\bf{Y}}_j \left( k \right) - {\bf{Y}}_j \left( {k - 1} \right)} \right)} $$
(21.88)
$${\bf{y}}_{{\rm{central}}} \left( k \right) = {\bf{y}}_{{\rm{central}}} \left( {k - 1} \right) + \sum\limits_{j = 1}^M {\left( {{\bf{y}}_j \left( k \right) - {\bf{y}}_j \left( {k - 1} \right)} \right)} $$
(21.89)

Once the information is combined in the central filter, a state-space estimate of the map feature locations and covariance can be recovered using Eqs. 21.90 and 21.91:

$${\bf{P}}_{mm,{\rm{central}}} \left( k \right) = {\bf{Y}}_{{\rm{central}}}^{ - 1} \left( k \right)$$
(21.90)
$${\bf{x}}_{m,{\rm{central}}} \left( k \right) = {\bf{P}}_{mm,{\rm{central}}} \left( k \right){\bf{y}}_{{\rm{central}}} \left( k \right)$$
(21.91)
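A minimal sketch of the central map filter implementing Eqs. 21.86–21.91 is given below. It assumes, for simplicity, that the map dimension is fixed and common to all vehicles and that feature indices are already aligned across vehicles; the class and method names are illustrative only.

```python
import numpy as np

class CentralMapFilter:
    """Independent opinion pool central filter (a sketch of Eqs. 21.86-21.91).
    Each vehicle sends its posterior map information; the filter adds only the
    increment since that vehicle's previous transmission to avoid double counting."""

    def __init__(self, dim):
        self.Y = np.zeros((dim, dim))   # central map information matrix
        self.y = np.zeros(dim)          # central map information vector
        self.last = {}                  # vehicle id -> (Y_j, y_j) received previously

    @staticmethod
    def to_information(x_m, P_mm):
        """Vehicle-side conversion to information form (Eqs. 21.86-21.87)."""
        Y = np.linalg.inv(P_mm)
        return Y, Y @ x_m

    def fuse(self, vehicle_id, Y_j, y_j):
        """Central update (Eqs. 21.88-21.89): add only the new information."""
        Y_prev, y_prev = self.last.get(
            vehicle_id, (np.zeros_like(Y_j), np.zeros_like(y_j)))
        self.Y += Y_j - Y_prev
        self.y += y_j - y_prev
        self.last[vehicle_id] = (Y_j.copy(), y_j.copy())

    def recover_map(self):
        """Recover the state-space map estimate and covariance (Eqs. 21.90-21.91)."""
        P_mm_central = np.linalg.inv(self.Y)
        return P_mm_central @ self.y, P_mm_central
```

The fused map information would then be communicated back to each vehicle, as described in the next subsection.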

4.2.2 Applying Local Node Feedback to the Independent Opinion Pool

So that each vehicle’s localization estimates can benefit from the observations of features made by other vehicles, information about the central map should be fed back to each of the local nodes. In the same way as at the central data filter, each vehicle must store the last information update it received from the central filter, \(\left( {{\bf{Y}}_{{\rm{central}}} \left( {k - 1} \right),{\bf{y}}_{{\rm{central}}} \left( {k - 1} \right)} \right)\), so as not to double count the information sent to it. Thus, when each vehicle receives the communicated central information, it first computes its posterior information over the entire state space, consisting of the local vehicle estimate and map features, using Eqs. 21.86 and 21.87, and updates this information using Eqs. 21.92 and 21.93:

$${\bf{Y}}_{{\rm{local}}} \left( k \right) = {\bf{Y}}_{{\rm{local}}} \left( k \right) + \left( {{\bf{Y}}_{{\rm{central}}} \left( k \right) - {\bf{Y}}_{{\rm{central}}} \left( {k - 1} \right)} \right)$$
(21.92)
$${\bf{y}}_{{\rm{local}}} \left( k \right) = {\bf{y}}_{{\rm{local}}} \left( k \right) + \left( {{\bf{y}}_{{\rm{central}}} \left( k \right) - {\bf{y}}_{{\rm{central}}} \left( {k - 1} \right)} \right)$$
(21.93)

The local information is then transformed back into state-space and covariance form to provide the updated estimate of the vehicle localization and map features, which is substituted back into the EKF in the single-vehicle SLAM architecture. The operation of the central filter with local node feedback is illustrated in Fig. 21.5.
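The vehicle-side feedback step of Eqs. 21.92 and 21.93 can be sketched as follows, assuming the map states occupy known indices within the local state vector; the function name and interface are assumptions of the sketch.

```python
import numpy as np

def apply_central_feedback(x_local, P_local, Y_c, y_c, Y_c_prev, y_c_prev, map_idx):
    """Local-node feedback (a sketch of Eqs. 21.92-21.93). The central map
    increment is added to the map block of the vehicle's full-state
    information, which is then converted back to state space for the EKF.

    x_local, P_local   : full local state estimate (vehicle + map) and covariance.
    Y_c, y_c           : central map information just received.
    Y_c_prev, y_c_prev : central map information received at the previous feedback.
    map_idx            : indices of the map states within the local state vector.
    """
    # Local posterior in information form over the full state (Eqs. 21.86-21.87).
    Y_local = np.linalg.inv(P_local)
    y_local = Y_local @ x_local
    # Add only the new central information so it is not double counted.
    ix = np.ix_(map_idx, map_idx)
    Y_local[ix] += Y_c - Y_c_prev
    y_local[map_idx] += y_c - y_c_prev
    # Transform back into state-space and covariance form.
    P_new = np.linalg.inv(Y_local)
    return P_new @ y_local, P_new
```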

Fig. 21.5

Distributed centralized multi-vehicle SLAM: independent opinion pool architecture with local node feedback. Each vehicle communicates its posterior map estimates in information form which are added together at a central data filter. The central data filter then feeds back the information to each of the vehicles in order to update their local maps

This centralized, distributed architecture has several advantages over a completely centralized filter such as a reduction in the required communication bandwidth (as only local estimates must be communicated, not observations and process model inputs) and the ability to deal with intermittent communications and delays as the information is maintained on the local vehicle.

4.3 Decentralized Architectures for Multi-vehicle Inertial SLAM

The multi-vehicle inertial SLAM algorithm can be decentralized by removing the central filter, so that each vehicle communicates directly with every other vehicle in the network. This type of architecture was demonstrated for feature tracking tasks and SLAM in Nettleton (2003). At regular intervals, each UAV takes its current state estimate relating to the map estimates only, that is, \({\bf{x}}_m \left( k \right)\) and \({\bf{P}}_{mm} \left( k \right)\), and calculates its posterior information using Eqs. 21.86 and 21.87. Each UAV maintains a record of the information sent during the last communication, \(\left( {{\bf{Y}}_j \left( {k - 1} \right),{\bf{y}}_j \left( {k - 1} \right)} \right)\), which is subtracted from the current information to form the new information the UAV has about the feature map:

$${\bf{Y}}_{j,{\rm{new}}} \left( k \right) = {\bf{Y}}_j \left( k \right) - {\bf{Y}}_j \left( {k - 1} \right)$$
(21.94)
$${\bf{y}}_{j,{\rm{new}}} \left( k \right) = {\bf{y}}_j \left( k \right) - {\bf{y}}_j \left( {k - 1} \right)$$
(21.95)

This new information is then communicated to each of the other UAVs. When each UAV receives all of the information updates from each of the other UAVs, this information is summed together along with the current UAV information to form the updated estimate of the map features in information form:

$${\bf{Y}}_{j,{\rm{update}}} \left( k \right) = {\bf{Y}}_j \left( k \right) + \sum\limits_{i = 1}^M {{\bf{Y}}_{i,{\rm{new}}} \left( k \right)} $$
(21.96)
$${\bf{y}}_{j,{\rm{update}}} \left( k \right) = {\bf{y}}_j \left( k \right) + \sum\limits_{i = 1}^M {{\bf{y}}_{i,{\rm{new}}} \left( k \right)} $$
(21.97)

Once all of the information from the other UAVs is combined in the update, a state-space estimate of the map feature locations and covariance can be recovered and substituted back into the EKF using Eqs. 21.98 and 21.99:

$${\bf{P}}_{j,mm,{\rm{update}}} \left( k \right) = {\bf{Y}}_{j,{\rm{update}}}^{ - 1} \left( k \right)$$
(21.98)
$${\bf{x}}_{j,m,{\rm{update}}} \left( k \right) = {\bf{P}}_{j,mm,{\rm{update}}} \left( k \right){\bf{y}}_{j,{\rm{update}}} \left( k \right)$$
(21.99)
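A sketch of a single node in this decentralized scheme is given below, covering the outgoing increment of Eqs. 21.94–21.95 and the update of Eqs. 21.96–21.99. As with the centralized sketch, a fixed, commonly indexed map is assumed, and the class and method names are illustrative.

```python
import numpy as np

class DecentralizedNode:
    """One UAV in the fully decentralized scheme (a sketch of Eqs. 21.94-21.99)."""

    def __init__(self, dim):
        self.Y_sent = np.zeros((dim, dim))   # map information sent at the last communication
        self.y_sent = np.zeros(dim)

    def outgoing(self, x_m, P_mm):
        """New map information since the last transmission (Eqs. 21.94-21.95)."""
        Y = np.linalg.inv(P_mm)              # Eqs. 21.86-21.87 on the local map estimate
        y = Y @ x_m
        dY, dy = Y - self.Y_sent, y - self.y_sent
        self.Y_sent, self.y_sent = Y, y
        return dY, dy

    def incorporate(self, x_m, P_mm, increments):
        """Add the increments received from the other UAVs to the local map
        information and recover the updated estimate for the EKF (Eqs. 21.96-21.99)."""
        Y = np.linalg.inv(P_mm)
        y = Y @ x_m
        for dY, dy in increments:
            Y = Y + dY
            y = y + dy
        P_update = np.linalg.inv(Y)
        return P_update @ y, P_update
```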

The operation of the decentralized SLAM filter is illustrated in Fig. 21.6.

Fig. 21.6

Decentralized SLAM data fusion architecture. Each UAV communicates to every other UAV in the team its latest map state estimate in inverse covariance (information) form. The received information is added at the receiving end and converted from information space back into state space for the EKF

4.4 Delayed Observations, Network Outages, and Communication Bandwidth Constraints

Realistic communication networks between the vehicles will not be able to provide continuous and instantaneous communication of information. Instead, real networks will contain significant delays and outages between different nodes when vehicles move out of range of one another and will not always be able to communicate all of the map information when the map becomes very large.

Delayed observations are not a significant issue in multi-vehicle SLAM: features are considered stationary, so information about a feature’s location is independent of time and can be added in a delayed fashion and out of order. Outages in the network communications, however, can cause the vehicles to lose track of the common information they possess; this can be overcome by constraining the communications network to a tree structure (Nettleton 2003). When communication bandwidth constraints apply across the network, the vehicles may only be able to communicate information representing a subset of the features contained in their maps. In this case, information must be fused using the covariance intersection algorithm (Julier and Uhlmann 2001), because of correlations between the communicated submap and the remaining map features that are not accounted for in subsequent communications. These concepts are all discussed further in Nettleton (2003).
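For reference, a minimal information-form sketch of covariance intersection is shown below; the weight ω is chosen here by a simple grid search minimizing the trace of the fused covariance, which is one common but not the only choice of criterion.

```python
import numpy as np

def covariance_intersect(Y_a, y_a, Y_b, y_b, n_grid=99):
    """Covariance intersection in information form (after Julier and Uhlmann 2001):
    a conservative fusion of two estimates whose cross-correlation is unknown."""
    def fused(omega):
        return omega * Y_a + (1.0 - omega) * Y_b
    # Grid search for the weight that minimizes the trace of the fused covariance.
    omegas = np.linspace(0.01, 0.99, n_grid)
    costs = [np.trace(np.linalg.inv(fused(w))) for w in omegas]
    omega = omegas[int(np.argmin(costs))]
    Y = fused(omega)
    y = omega * y_a + (1.0 - omega) * y_b
    return Y, y, omega
```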

5 Conclusion

This chapter has developed the basic equations and methods for inertial sensor-based SLAM. The fundamental equations modeling inertial sensors and inertial localization were analyzed, and two different applications were examined, in which localization and mapping are performed in either a global frame of reference or an arbitrary local frame of reference. The inertial SLAM algorithms were examined for the case of range and bearing observations from a terrain sensor and also for the case of bearing-only terrain observations. Finally, the problem of multi-vehicle inertial SLAM was examined. Two types of data fusion architecture were considered: firstly, centralized architectures in which map information is shared among vehicles via a central communications source and, secondly, decentralized architectures in which the vehicles communicate and share data with each other directly.