1 Introduction

It is well known that, before attempting a mission involving a ground vehicle in off-road conditions, a reliable and comprehensive analysis of the mobility capabilities of that vehicle is desirable. This analysis can be carried out by means of computer simulation, where the terrain profile plays a key role. Traditionally, such an analysis considers nominal values for the key variables involved in the simulation (e.g. elevation). This leads to unreliable and limited results because of the uncertainty present in those variables (Lessem et al. 1996; Peynot et al. 2014).

Terrain geometry information typically comes from remote sensing sources (e.g. radar technology, imagery methods). These techniques produce terrain models whose data points carry uncertainty in their spatial position; thus, any elevation model of the terrain is corrupted by uncertainty. The Digital Elevation Models (DEMs) produced by the US Geological Survey are a good example of this issue (Zhou and Liu 2004).

Here, we describe a stochastic approach for vehicle mobility prediction over large regions [>\(5 \times 5\) (km\(^2\))]. In this framework, a model of the terrain is created using geostatistical methods, and the performance of a vehicle is then evaluated over that terrain profile. To account for uncertainty, Monte Carlo simulations are performed, leading to a statistical analysis. Uncertainty in elevation arises because the terrain model is interpolated to a higher spatial resolution than the original DEM, through a geostatistical method called ordinary Kriging (Chiles and Delfiner 2012). The ultimate goal is to compare different routes using different cost functions in the path-planning algorithm and various performance indices.

Notice that this work is framed in the context of the next-generation NRMM military software (the current software is described in Haley et al. (1979)). One requirement is that such software can be executed on the general-purpose computers available in current military vehicles or military camps in battle fields. In this regard, the proposed approach offers an efficient solution in terms of computational burden, and its suitability is demonstrated on a standard-performance computer.

The rest of the paper is organized as follows. Related work is reviewed in Sect. 2. Section 3 gives a general view of the proposed methodology. Section 4 reviews some geostatistical tools used in this research and a subsampling approach is also proposed to reduce the complexity of a large DEM. The mobility prediction problem is explained in Sect. 5. Illustrative examples are given in Sect. 6. Finally, conclusions and future work are discussed in Sect. 7.

2 Related work

This paper addresses the problem of mobility prediction for manned and unmanned vehicles in off-road conditions. In this context, a comprehensive survey of the state of the art is found in Papadakis (2013). As in that survey, many publications cope with 3D path planning in the vicinity of the robot (Goldberg et al. 2002; Thrun et al. 2006). Those approaches are generally not appropriate for planning longer routes over large environments (e.g. the Curiosity rover over a large Martian region, or a military vehicle over a battle field) because they are based on sensors that perceive only the surrounding environment (e.g. stereovision, LIDAR).

Regarding the issue of path planning over longer routes, an important research effort has been made in the field of combining remote data with ground data. The focus of that work has been mainly on obstacle avoidance (Stentz et al. 2003) and robot localization (Vandapel et al. 2006). Some ideas proposed in this paper were inspired by the high-fidelity traversability analysis proposed in Helmick et al. (2009) and the research related to combining remote and ground data in Stentz et al. (2002) and Vandapel et al. (2006). In Helmick et al. (2009) mobility prediction is carried out online (every 20–50 m) in order to find an optimal path between the current robot position and a desired target. To that end, many paths are randomly generated between the robot position and the next waypoint; the robot motion is analyzed for each path, and eventually the route with the lowest cost is followed.

In our case, mobility prediction is performed for the entire route before the vehicle moves (planning stage). In Vandapel et al. (2006) a 3D elevation map is used to test the performance of a vehicle traversing between two given points. In that case, a kinematic-model-based analysis is performed, but only a binary answer is given for each cell in the map: traversable or non-traversable. After that, a path planner is used to determine the path of least cost in terms of a performance index. In Stentz et al. (2002) an unmanned air vehicle that flies ahead of the ground vehicle is used to detect holes and other hazards before the onboard vehicle sensors are able to detect them.

The research in Lessem et al. (1996) addressed the problem of stochastic mobility prediction for military vehicles. However, uncertainty is there selected arbitrarily, based on "expert opinions". The output is a speed map based on the maximum and minimum speeds that a military vehicle can achieve in each map cell. In contrast, in our methodology uncertainty comes from a rigorous mathematical formulation (ordinary Kriging). Furthermore, our methodology results in an optimal route between two desired points in terms of several performance indices.

Another work similar to that described here is found in Peynot et al. (2014). There, the authors define mobility prediction as the problem of estimating the likely behavior of a planetary exploration rover in response to given control actions on a given terrain. In that sense, the uncertainty is in the control rather than in the DEM/terrain map.

Mobility prediction has also been considered in terms of the vehicle-terrain interactions (Ishigami et al. 2009; Willoughby et al. 2006). In Ishigami et al. (2009), a statistical method for mobility prediction considering uncertainty in terrain physical properties (cohesion and internal friction angle) is proposed. In this case, both a vehicle dynamic model and a wheel-terrain contact model are employed for simulating the robot motion considering uncertainty in terrain parameters.

In Karumanchi et al. (2010) a non-parametric learning technique is presented to generate a mobility map based on terrain elevation and wheel slip. Path planning takes advantage of such a map, improving vehicle heading and velocity on off-road slopes.

On the other hand, some research projects focus on the reconstruction of a 3D surface (digital elevation map) from sparse data obtained from a remote sensor (Hadsell et al. 2009; Kweon and Kanade 1992). However, these works do not consider the second goal of our research, that is, stochastic mobility prediction.

3 General framework

The first problem addressed in this paper is framed within the context of estimating the values of a regionalized variable (elevation) at places where it has not been measured and, after that, analyzing the performance of a vehicle over such terrain.

Figure 1 shows the methodology followed in this research. Initially, a DEM related to the environment where the vehicle is going to operate is selected. After that, a reduced-order representation of the DEM points is obtained via a subsampling approach. This is required in order to enable an affordable computation of the variogram and the Kriging method. Once a set of representative points, in terms of the variogram and elevation profile, is selected, the ordinary Kriging method is applied. This procedure yields a model of the terrain at a finer resolution (see Assumption 1). This model is suitable for statistical simulation because each interpolated point has an associated uncertainty (Kriging variance). Next, the D* algorithm is applied in order to obtain multiple routes between two points (i.e. a starting point and a desired goal or waypoint). Finally, the optimal route is selected in terms of several performance indices (e.g. the shortest path, the path with the lowest uncertainty, the flattest route, etc.) (see Assumption 2).

Fig. 1 Schematic view of the steps carried out in this research for predicting the mobility of a ground vehicle over a large region [>\(5 \times 5\) (km\(^2\))]

Assumption 1

We assume that there is no uncertainty or error associated with the original DEM of the terrain (positional errors, data precision, etc.). In this paper, uncertainty is only considered after obtaining a model of the terrain at a finer spatial resolution (new interpolated points), that is, after applying Kriging.

Assumption 2

It is assumed that the method proposed here will be used for the mobility of car-like vehicles. In particular, the testbed considered in this research is the well-known Humvee military vehicle (AM General 2015). The reasoning behind the spatial resolution of the terrain model and the use of a car-like kinematic model in the simulations is based on the dimensions and features of that vehicle.

The main features and limitations of this approach can be summarized as follows:

  • Global path planning is considered rather than local path planning (planning in the vicinity of the vehicle). The latter problem is mainly related to the detection and avoidance of local obstacles using onboard sensors. In this sense, this research does not ensure local obstacle avoidance; it is assumed that a layer below the one proposed here will be available in the real vehicle.

  • This solution does not result in a binary answer (i.e. the path is traversable or not); instead, statistical data supporting each decision are given.

4 Reconstruction of a 3D terrain surface from remote data: spatial prediction problem

This section introduces the geostatistical tools used in this research (Sect. 4.1). The use of geostatistical tools is motivated by the fact that the original DEM has a resolution that is too coarse for vehicle mobility prediction (30 m or coarser). Thus, an interpolation strategy is required in order to obtain a model of the terrain at a finer resolution. In our case, the desired resolution is double the vehicle length, that is, 10 m (the Humvee's length is 5 m; see Sect. 6 for details and Remark 1 for a discussion of this reasoning). The most common approach is ordinary Kriging, which estimates the elevation of new points (at a finer resolution) based on the elevation and spatial arrangement of measured points (variogram). However, before attempting Kriging, a subsampling step is required in order to reduce the number of points in the original DEM (Sect. 4.2). This step is necessary to perform efficient geostatistics-related computations on a standard-performance computer (as remarked in the introductory section).

It bears mentioning that DEMs of the terrain are employed in this research. Since the dimensions of the grid and the number of observations in each row are known, the implicit spatial relationships between elevation values can be determined. There are mainly two types of DEM, depending on coverage and resolution. The US Geological Survey (USGS) produces DEMs covering the US with a resolution of 30 m. Worldwide coverage is offered by the SRTM of the NASA and NGA agencies; SRTM DEMs have a resolution of 90 m. More information about these formats can be found in Webgis (2015).

Remark 1

There is no clear answer as to the most appropriate spatial resolution for performing a reliable stochastic mobility prediction analysis, and we are not aware of any detailed study on this issue. It appears that the spatial resolution of data for 3D terrain models should depend on the size of the vehicle, the variability of the terrain, and the nature of any natural or man-made obstacles that the vehicle must negotiate. In this research, a resolution of 10 m has been considered for the interpolated model of the terrain (the original models had 30- or 90-m resolution) because the intended vehicle (the Humvee military vehicle) has a length of 5 m. Resolutions finer than 10 m led to infeasible solutions (impractical computation times). This key point will be considered in future research.

4.1 Fundamentals of geostatistics

Geostatistics aims at providing quantitative descriptions of natural variables distributed in space and time (Bivand et al. 2013; Chiles and Delfiner 2012; Isaaks and Srivastava 1989; van der Meer 2012; Webster and Oliver 2007). Furthermore, it provides a methodology to quantify spatial uncertainty. Geostatistical modeling is based on the statistical relationships among the measured points: it produces an interpolation function based on a covariance or variogram model derived from the data, rather than on an a priori model of the interpolation function (as is the case for deterministic interpolation) (Bohling 2005). For that reason, geostatistics is data-driven rather than model-driven.

4.1.1 Variogram

A variogram describes the statistical relationship between points as a function of point separation. That is, it shows how the dissimilarity between z(s) and \(z(s+h)\) evolves with the separation, or lag distance, h (Chiles and Delfiner 2012; Isaaks and Srivastava 1989; Lakhankar et al. 2010). In this sense, the variogram represents the spatial correlation between available samples, and it is calculated according to:

$$\begin{aligned} \gamma (h) = \frac{1}{2 N} \sum _{i=1}^{N}\big ( z(s_i) - z(s_i + h)\big )^2, \end{aligned}$$
(1)

where N is the number of data points, \(z(s_i)\) is the elevation at point \(s_i\), and h represents the lag, that is, the vector separating the locations of two variables (ASTM Standards 1996). For instance, \(h = 1 \times d, 2 \times d, \ldots \), d being the distance between points. If points are sampled every 10 m, then \(d = 10\), and the variogram is calculated for the lags [10, 20, 30, \(\ldots \)].
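
To make Eq. (1) concrete, the following sketch computes an empirical isotropic variogram from scattered elevation samples. It is illustrative only: the experiments in this paper rely on the Matlab mGstat toolbox (see Sect. 6.1), and the function name and binning scheme below are our own assumptions.

```python
import numpy as np

def empirical_variogram(coords, z, lags, tol):
    """Empirical isotropic variogram following Eq. (1).

    coords: (N, 2) array of sample locations (m)
    z:      (N,)   array of elevations (m)
    lags:   1D array of lag distances h (m), e.g. [10, 20, 30, ...]
    tol:    half-width of the distance bin around each lag (m)
    """
    # All pairwise separations and squared elevation differences
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq_diff = (z[:, None] - z[None, :]) ** 2
    gamma = np.full(len(lags), np.nan)
    for k, h in enumerate(lags):
        pairs = (d > h - tol) & (d <= h + tol) & (d > 0)
        if pairs.any():
            # Each unordered pair appears twice in `pairs`; the mean is
            # unaffected, and the 1/2 yields the semivariance of Eq. (1).
            gamma[k] = 0.5 * sq_diff[pairs].mean()
    return gamma
```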

An empirical variogram provides information on the spatial autocorrelation of the raw data, but it does not provide information for all possible directions and distances. For that reason, it is necessary to fit a model (i.e. a continuous function) to the empirical variogram. There are several variogram models, the most common being spherical, exponential, and Gaussian. More information about theoretical variogram models can be found in Chiles and Delfiner (2012).

Some important features of a variogram model are:

  • It starts at zero (for \(h = 0\), \(z(s_i+h)-z(s_i) = 0\)), except when there is a "nugget effect"; see Chiles and Delfiner (2012) for further detail.

  • It generally increases with h.

  • The range reflects the degree of dissimilarity of increasingly distant samples. If the variogram stabilizes at a certain level, that level is called the sill. In other words, the range is the value on the x axis at which the variogram reaches steady state, and the sill is the corresponding value on the y axis.

  • A variogram model can be fitted in various ways (Chiles and Delfiner 2012; Gorsich and Genton 2000): manual fitting, least squares (automatic fitting), and maximum likelihood. In this research, an automatic least-squares approach has been used (a fitting sketch is given after this list).
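
As an illustration of the automatic least-squares fitting mentioned in the last item, the sketch below fits a Gaussian variogram model to the empirical values. The use of `scipy.optimize.curve_fit` and this particular parameterization are our choices, not necessarily those of the mGstat toolbox used in the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_variogram(h, nugget, sill, rng):
    """Gaussian variogram model: rises from the nugget and levels off
    at the sill; `rng` controls the (practical) range."""
    return nugget + (sill - nugget) * (1.0 - np.exp(-(h / rng) ** 2))

def fit_gaussian_model(lags, gamma):
    """Automatic least-squares fit of the Gaussian model to an
    empirical variogram (NaN bins are ignored). Inputs are arrays."""
    ok = ~np.isnan(gamma)
    p0 = [0.0, float(np.nanmax(gamma)), float(lags[ok][-1]) / 2.0]
    bounds = ([0.0, 0.0, 1e-9], [np.inf, np.inf, np.inf])  # keep params >= 0
    params, _ = curve_fit(gaussian_variogram, lags[ok], gamma[ok],
                          p0=p0, bounds=bounds)
    return params  # (nugget, sill, range)
```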

Notice that points that are close together are expected to take similar values, because those values were generated under similar physical conditions ("the same geological environment"). In contrast, at long distances the geological conditions differ, and greater variations are to be expected.

4.1.2 Ordinary Kriging formulation

Once a variogram model has been fitted to the sparse data, predictions can be made using the Kriging formulation. There are mainly three formulations of Kriging, depending on whether the mean of the data is known (simple Kriging) or not (ordinary Kriging and universal Kriging). In geostatistics, the most common approach is ordinary Kriging (OK) (Li and Heap 2011; Srivastava 2013; Tsui et al. 2013; Webster and Oliver 2007), because the mean is not known a priori and because only the spatial variability is meaningful. Following the OK formulation, at every point where the elevation is unknown, it is interpolated as a weighted linear combination of the available samples (Isaaks and Srivastava 1989)

$$\begin{aligned} \hat{Z}(s_0) = \sum _{i=1}^{N} w_i Z(s_i), \end{aligned}$$
(2)

where \(\hat{Z}(s_0)\) is the estimated elevation of the sample located at \(s_0\) (new interpolated point), N is the number of samples, and \(w_i\) are the weights for the linear combination of known samples \(Z(s_i)\). The index i represents the samples available in the dataset.

Ordinary Kriging minimizes the mean squared error of prediction

$$\begin{aligned} \min _{w_1, \ldots , w_N} \; \sigma _e^2 = \mathbb {E}\left[ Z(s_0) - \sum _{i=1}^N w_i Z(s_i)\right] ^2 \quad \text {s.t.} \quad \sum _{i=1}^N w_i = 1. \end{aligned}$$
(3)

As shown in Isaaks and Srivastava (1989), the mean squared error can be written as

$$\begin{aligned} \sigma _e^2 = 2 \sum _{i = 1}^N w_i \gamma (s_0 - s_i) - \sum _{i=1}^N \sum _{j=1}^N w_i w_j \gamma (s_i - s_j), \end{aligned}$$
(4)

where \(\gamma (h)\) is the variogram model.

Differentiating the previous expression with respect to the weights and setting the derivatives equal to zero leads to

$$\begin{aligned} \sum _{j= 1}^{N} w_j \gamma (s_i - s_j) + \lambda = \gamma (s_0 - s_i), \end{aligned}$$
(5)

where \(\lambda \) is the Lagrange parameter used for ensuring \(\sum _{i=1}^N w_i = 1\) (Isaaks and Srivastava 1989). Using matrix notation the previous system of equations can be written as

$$\begin{aligned} W &= C^{-1} D, \qquad W = \left[ w_1, w_2, \ldots , w_N, \lambda \right] ^T, \\ C_{ij} &= \begin{cases} \gamma (s_i - s_j), & i = 1, \ldots , N; \; j = 1, \ldots , N, \\ 1, & i = N+1; \; j = 1, \ldots , N, \\ 1, & i = 1, \ldots , N; \; j = N+1, \\ 0, & i = N+1; \; j = N+1, \end{cases} \\ D &= \left[ \gamma (s_0 - s_1), \gamma (s_0 - s_2), \ldots , \gamma (s_0 - s_N), 1\right] ^T. \end{aligned}$$
(6)

Once the weights are calculated, the elevations at the unknown locations can be estimated using Eq. (2). Additionally, the error variance (or Kriging variance) of each predicted point is obtained using Eq. (4). This Kriging variance represents the uncertainty associated with each new point, so a stochastic analysis of the possible realizations of those points can be performed.
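
The OK system of Eq. (6), the estimate of Eq. (2), and the variance of Eq. (4) can be assembled directly. The sketch below does so for a single target location, as a naive dense solve under the assumption that the fitted variogram model is passed in as a callable; the paper's actual computations use mGstat (Sect. 6.1).

```python
import numpy as np

def ordinary_kriging_point(coords, z, s0, variogram):
    """Ordinary Kriging at one unsampled location s0 (Eqs. 2, 4 and 6).

    coords:    (N, 2) sample locations; z: (N,) elevations
    s0:        (2,) target location
    variogram: callable gamma(h) returning the fitted model
    """
    n = len(z)
    # Left-hand matrix C of Eq. (6): variogram block plus Lagrange row/column
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = np.ones((n + 1, n + 1))
    C[:n, :n] = variogram(d)
    C[n, n] = 0.0
    # Right-hand vector D of Eq. (6)
    D = np.ones(n + 1)
    D[:n] = variogram(np.linalg.norm(coords - s0, axis=-1))
    # Solve for W = [w_1, ..., w_N, lambda]
    W = np.linalg.solve(C, D)
    z_hat = W[:n] @ z              # Kriging estimate, Eq. (2)
    kriging_var = float(W @ D)     # = sum_i w_i gamma(s0 - s_i) + lambda
    return z_hat, kriging_var
```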

4.2 Subsampling a DEM

The reliability of an experimental variogram is affected by the number of samples or, equivalently, by the density of the data. Evidently, when all the points of the DEM are considered, the variogram is obtained precisely. However, increasing the number of samples also increases the computation time required to calculate the variogram (and to perform the subsequent Kriging-based interpolation). This section therefore proposes a methodology that dramatically reduces the computation time for calculating a variogram while maintaining a high correlation with the variogram that would be obtained if all the points of the DEM were employed (see Remark 2).

Remark 2

As remarked in the introductory section, one key requirement of the proposed approach is efficient computation. Efficiency is understood as being able to run the proposed methodology on a general-purpose computer in a reasonable computation time (around 1–2 h, depending on the size of the environment considered). The subsampling step was added to this methodology because, when the variogram and ordinary Kriging functions were run over the entire datasets (e.g. the two scenarios introduced in Sect. 6), the computation time increased dramatically (several hours), and the computer even ran out of RAM (Intel Core i7 3 GHz, 16 GB RAM). After adding this subsampling step before running Kriging, we were able to apply Kriging successfully in a reasonable computation time.

4.2.1 Geostatistical subsampling: literature review

The standard way to estimate the variogram is through the method of moments (MoM) (Chiles and Delfiner 2012), Eq. (1). In Webster and Oliver (2007), the authors point out that estimating a variogram with this method requires more than 100–150 samples. The MoM is followed in numerous papers where a set of discrete samples (100–300 samples) is used to estimate the variogram (Basaran et al. 2011; Bechler et al. 2013; Hosseini et al. 2014).

The work in Brus and Gruijter (1994) suggests a procedure based on sampling theory for determining the samples used to calculate the variogram. This procedure is based on repeatedly selecting pairs of points: the first point is randomly chosen, and the second point is chosen at distance h from the first one. Notice that many publications follow a similar approach (simple or systematic random sampling); see for example Anderson et al. (2006) and Lakhankar et al. (2010).

As shown in this paper, systematic sampling is effective for variogram estimation, but it performs poorly in terms of fitting the elevation profile of the entire DEM. Recall that this issue is fundamental for subsequent Kriging (i.e. obtaining a new DEM at a finer spatial resolution). Additionally, we demonstrate that the precision of a variogram depends not only on the number of samples but also on the sampling method. Moreover, those papers present neither an analysis of the computation time required nor statistical evidence of how well the samples fit the variogram of the entire dataset.

On the other hand, the residual maximum likelihood (REML) method constitutes another methodology for obtaining a variogram (Kerry et al. 2010; Webster and Oliver 2007). In this case, the three parameters that define a variogram (range, sill, and nugget) are solved for by means of an optimization problem (a maximum-likelihood problem). Specifically, the model parameters are obtained directly from the generalized increments of a covariance matrix of the full data. However, as explained in Webster and Oliver (2007), it is computationally intensive, it is only valid for three variogram functions, and it converges too readily to a local optimum. For those reasons, we focus on the well-known method of moments to estimate the variogram. Additionally, it bears mentioning that REML has been applied to identify the optimal sampling interval for obtaining a map or set of samples before attempting Kriging (Kerry et al. 2010); this is not exactly the subsampling problem that we address in this research.

4.2.2 Systematic sampling

The systematic sampling method arranges the population according to some ordering scheme and then selects elements at regular intervals (Thompson 2012). This strategy matches the structure of a DEM, and the research addressed here, well. Recall that in a DEM information is recorded at regularly spaced intervals (raster form). For example, for a USGS DEM with a resolution of 30 m, a sample would be selected at a fixed multiple of 30 m (e.g. every 90 m, as in Sect. 6.2.1).
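
A minimal sketch of this regular-interval selection on a raster DEM follows (NumPy; the helper name is ours):

```python
import numpy as np

def systematic_sample(dem, step):
    """Systematic sampling: keep one cell every `step` rows and columns
    of the raster DEM (regular-interval selection)."""
    rows, cols = np.meshgrid(np.arange(0, dem.shape[0], step),
                             np.arange(0, dem.shape[1], step),
                             indexing="ij")
    return rows.ravel(), cols.ravel(), dem[rows, cols].ravel()
```

For a 30-m DEM, `step=3` corresponds to one sample every 90 m, as used in Sect. 6.2.1.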

4.2.3 Stratified sampling via K-means algorithm

The stratified sampling approach is based on partitioning a given population into homogeneous subgroups or “strata”. Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected. The principle of stratification is to partition the population in such a way that the units within a stratum are as similar as possible (Pengelly 2002; Rubinstein and Kroese 2007).

The major concern regarding the stratified sampling approach is how to separate the original DEM into different strata. In our context, one reasonable approach is to define the strata in terms of elevation. However, this is not simple because of the continuous nature of elevation, especially in uneven terrain. To solve this problem, K-means clustering has been considered (Wu 2012). Notice that K-means has already been applied in geostatistics; see for example Arieira et al. (2011) and Kumar et al. (2011), where K-means is mainly used to stratify a large region into similar areas in terms of habitat type and soil type. Here, the K-means algorithm is used to split the DEM in terms of elevation; after that, one or two points are randomly selected within each stratum. Figure 2 shows the application of the proposed stratified sampling method to the two scenarios considered in this research (those scenarios are introduced in Sect. 6), and a sketch of the procedure is given after the figure.

Fig. 2 Stratification of two types of DEM: the first deals with a 30-m resolution DEM and the second with a 90-m resolution DEM. Colors represent different layers or strata. a Death Valley (CA, USA), no. strata = 15. b Sahara desert (Chad, Africa), no. strata = 20
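
The following sketch illustrates the elevation-based stratification; scikit-learn's `KMeans` stands in here for the VL-Feat implementation actually used in the paper (Sect. 6.1), and the helper name and random-draw details are our assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def stratified_sample(dem, n_strata, per_stratum=1, seed=0):
    """Stratified sampling: cluster the elevations into `n_strata`
    strata with K-means, then draw `per_stratum` random cells from
    each stratum."""
    rng = np.random.default_rng(seed)
    z = dem.ravel()
    labels = KMeans(n_clusters=n_strata, n_init=10,
                    random_state=seed).fit_predict(z.reshape(-1, 1))
    picks = []
    for k in range(n_strata):
        members = np.flatnonzero(labels == k)
        picks.extend(rng.choice(members,
                                size=min(per_stratum, members.size),
                                replace=False))
    picks = np.asarray(picks)
    rows, cols = np.unravel_index(picks, dem.shape)
    return rows, cols, z[picks]
```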

The key issue of this approach is the number of centers used by the K-means algorithm; this is a major issue for any algorithm based on K-means (Wu 2012). In this paper, the optimal value is obtained according to the theoretical variogram fitted to the DEM (explained subsequently).

Notice that a method called DP-means has recently been proposed to address this issue (Kulis and Jordan 2012). Specifically, DP-means uses a scale/penalty parameter to control the creation of new clusters during clustering, rather than requiring a fixed K as in K-means: a new cluster is only formed when a point is sufficiently far away from all existing cluster centroids. It bears mentioning that this solution is not applicable to our problem because we do not need to cluster the elevation profile into a fixed number of strata. In fact, we are interested in testing different strata/samples in order to reduce the error in fitting a given variogram model (i.e. the variogram model of the entire DEM).

4.2.4 Proposed methodology: combining systematic and stratified sampling

Recall that the goal of the subsampling approach is to find the minimum number of samples required such that the variogram of the reduced DEM fits the theoretical variogram of the raw DEM. To achieve this, the properties of the variogram and of the sampling methods must be taken into account.

First, the variogram is calculated by Eq. (1). To obtain a meaningful set of samples, the reduced dataset should be as spatially spread as possible in order to cover the whole range of distances in the original DEM; if this is not ensured, the variogram will not fit the sill properly. On the other hand, the set of samples should cover the range of elevations present in the original DEM as much as possible; this affects the range of the variogram. Additionally, we are interested in fitting the elevation profile in order to obtain a meaningful model after applying the Kriging process.

Our proposal of combining the systematic and stratified sampling approaches is motivated by the following observations. The systematic sampling approach leads to a sparse set of samples that covers the entire dataset (Fig. 3a). This feature is not found in the stratified sampling approach (Fig. 3b); in fact, many samples are clustered at the top right corner. This is explained by the hill in the environment (this experiment relates to the Airport Lake region, introduced in detail in Sect. 6): the stratified sampling approach draws samples from the different strata (elevations) in the environment, and those strata are mainly present on the hill. On the other hand, the stratified sampling approach properly captures the elevations in the original DEM, as observed in Fig. 2a, whereas the systematic sampling approach performs worse in terms of representing the terrain profile (i.e. elevations); see Sect. 6.2.3 for a quantitative comparison of these methods. Finally, a filter has been implemented to remove samples that are very close to each other and hence to avoid redundancy (see the sketch below).
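
Putting the two samplers together with the redundancy filter mentioned above gives the combined method. The greedy minimum-distance filter below is our assumption (the paper does not specify the filter's exact form), and the sketch reuses the `systematic_sample` and `stratified_sample` helpers from Sects. 4.2.2 and 4.2.3.

```python
import numpy as np

def combined_sample(dem, step, n_strata, per_stratum, cellsize, min_dist, seed=0):
    """Combined sampling: union of systematic and stratified samples,
    then drop points closer than `min_dist` (m) to an already kept one."""
    r1, c1, _ = systematic_sample(dem, step)
    r2, c2, _ = stratified_sample(dem, n_strata, per_stratum, seed)
    rows = np.concatenate([r1, r2])
    cols = np.concatenate([c1, c2])
    xy = np.stack([rows, cols], axis=1) * cellsize   # grid indices -> meters
    keep = []
    for i in range(len(xy)):
        if all(np.linalg.norm(xy[i] - xy[j]) >= min_dist for j in keep):
            keep.append(i)
    keep = np.asarray(keep)
    return rows[keep], cols[keep], dem[rows[keep], cols[keep]]
```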

Fig. 3 Samples obtained after applying the systematic and stratified sampling approaches to one of the scenarios considered in this work (Death Valley, CA, USA); see Sect. 6 for further details. a Systematic sampling, 127 samples. b Stratified sampling, 300 samples

5 Mobility prediction based on elevation uncertainty

As already mentioned, the Kriging procedure produces not only a model of the environment at a finer resolution than the original one but also, and most importantly, the uncertainty associated with each new point (error estimate); see Sect. 4.1.2.

The next step is to analyze the performance of a vehicle moving between two given points (a starting point and a goal point) over this new model of the terrain. The D* algorithm has been employed to obtain the optimal route between the starting and goal points (Corke 2011; Stentz 1995). In this research, three different cost functions have been considered. The first finds the shortest route between the starting point and the goal point; for that purpose, the D* cost function considers the Euclidean distance between points (x–y plane). Notice that, in this case, uncertainty is not considered. This route is labeled "min-distance". The second route is the shortest route between the starting and goal points that also minimizes the uncertainty; that is, the variance associated with each point is also considered in the D* cost function. From now on, this route is called "min-uncertainty". Finally, the route labeled "min-slope" represents the shortest route between the starting and goal points that also minimizes the slope between points. Such a slope is obtained by estimating the elevation change (i.e. the angle) toward the 8 neighbors around each point; the maximum angle is then stored in an array paired with the sampled point. When the D* algorithm is employed, those angles are minimized, which results in the flattest route. A sketch of these per-cell costs is given below.
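
The per-cell costs behind the three criteria can be sketched as follows. How D* weights these terms internally is not detailed in the paper, so the additive `1 + cost` form is our assumption; also, `np.roll` wraps at the raster borders, which a full implementation would mask out.

```python
import numpy as np

def cost_maps(z_hat, kriging_var, cellsize):
    """Per-cell cost layers for the three D* criteria (a sketch)."""
    # min-distance: uniform cost, so D* minimizes traversed distance only
    c_dist = np.ones_like(z_hat)
    # min-uncertainty: penalize the Kriging variance of each cell
    c_unc = 1.0 + kriging_var
    # min-slope: worst absolute slope angle toward the 8 neighbors
    worst = np.zeros_like(z_hat)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            dz = np.abs(z_hat - np.roll(np.roll(z_hat, dr, axis=0), dc, axis=1))
            ang = np.degrees(np.arctan2(dz, np.hypot(dr, dc) * cellsize))
            worst = np.maximum(worst, ang)
    c_slope = 1.0 + worst
    return c_dist, c_unc, c_slope
```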

The following step checks the performance of the vehicle over such routes. The kinematic model of a car-like vehicle is considered (LaValle 2006), and a motion controller is required; in this case, the well-known pure-pursuit path follower has been used (Amidi 1990). This step allows us to estimate the energy required to actually traverse each route.

Finally, n Monte Carlo simulations are performed to analyze the effect of uncertainty on the energy required to traverse each route. This idea is inspired by the following references in the context of soil and Earth sciences: Davis and Keller (1997), Fisher (1991), and Hunter and Goodchild (1997). Specifically, a realization of each sample within the desired route is generated according to its associated uncertainty (Kriging estimate and Kriging variance), as sketched below.
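
A sketch of the realization step follows. Drawing independent Gaussian samples per route point is our reading of the procedure; spatial correlation between realizations is not modeled here.

```python
import numpy as np

def monte_carlo_profiles(z_hat, kriging_var, route, n_runs=150, seed=0):
    """Draw `n_runs` elevation realizations along a route, sampling each
    route cell from N(Kriging estimate, Kriging variance).

    route: sequence of (row, col) grid indices
    """
    rng = np.random.default_rng(seed)
    idx = tuple(np.asarray(route).T)      # fancy index into the grids
    mu = z_hat[idx]                       # Kriging estimates along the route
    sd = np.sqrt(kriging_var[idx])        # standard deviations along the route
    return rng.normal(mu, sd, size=(n_runs, len(route)))
```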

5.1 Performance indices

One key aspect of the stochastic mobility analysis of each route is the set of performance indices considered. Five indices have been selected: (i) length of the route, (ii) average value of the uncertainty, (iii) variation of the elevation, (iv) energy required to traverse each route, and (v) average value of the slope considering the Kriging variance. The last index is obtained by simulating different realizations of the elevation of each point (according to its associated uncertainty); it thus represents the average slope between points after n Monte Carlo simulations (the absolute value of the difference is used to avoid problems with the sign).

The energy is calculated according to the following equation

$$\begin{aligned} E_j = \sum _{m=1}^{2} \sum _{k=1}^{N_j} \tau _{m}(k) \, v_{m}(k), \end{aligned}$$
(7)

where \(j = 1, 2, 3\) denotes the min-distance, min-uncertainty, and min-slope routes, respectively; m represents the steering or the traction motor; \(N_j\) is the number of samples of each route (it is related to the length of the route); \(\tau \) is the torque; and v is the linear velocity. Notice that the torque required for each motor has been obtained from the following equation

$$\begin{aligned} \tau _m = k_t \, \frac{V_s - k_i \, \omega _m}{R}, \quad \forall \, m = 1, 2, \end{aligned}$$
(8)

where \(k_t\) is the torque constant of the motor, \(V_s\) is the nominal voltage, \(k_i\) is the inverse of the speed constant, \(\omega _m\) is the angular velocity, and R is the terminal resistance. Practical values of those constants are given in Table 2 (see Sect. 6.4).
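
Equations (7) and (8) translate directly into code; a minimal sketch follows. The motor constants would come from Table 2, and since the sampling period is 1 s (Sect. 6.4), summing torque times velocity over the samples yields an energy index.

```python
import numpy as np

def motor_torque(omega, k_t, V_s, k_i, R):
    """DC-motor torque, Eq. (8): tau = k_t * (V_s - k_i * omega) / R."""
    return k_t * (V_s - k_i * omega) / R

def route_energy(tau, v):
    """Energy index of one route, Eq. (7): sum over the two motors
    (rows) and the N_j route samples (columns) of torque * velocity.

    tau, v: arrays of shape (2, N_j)
    """
    return float(np.sum(tau * v))
```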

6 Illustrative examples

6.1 Environments

In order to validate the methods proposed in this research, two different scenarios have been analyzed. First, a region of Death Valley called Airport Lake (LAT: 35 52 30 N; LONG: \(-\)117 37 30 W); see Fig. 4. The data were obtained in the 7.5-min USGS format. This region covers an area of \(11.34 \times 13.65\) (km\(^2\)). It is important to mention that this DEM is represented in projected coordinates (i.e. the UTM system). On the other hand, a region of the Sahara desert in Chad (Africa) has also been considered (LAT: 15 01 67 N; LONG: 21 16 50 E); see Fig. 5. The data were obtained in the SRTM format. This region covers an area of \(6.55 \times 8.18\) (km\(^2\)). As in the previous case, this DEM is represented in projected coordinates (i.e. the UTM system).

Fig. 4 Airport Lake, Death Valley (CA, USA). USGS DEM (30-m resolution). a Overhead view (Google Earth). b Detail of this environment. c DEM, raw samples (color represents elevation)

Fig. 5 Sahara desert (Chad, Africa). SRTM DEM (90-m resolution). a Overhead view (Google Earth). b Detail of this environment. c DEM, raw samples (color represents elevation)

Figures 4c and 5c show the raw DEMs of the Airport Lake and Sahara desert regions. The difference between them is that in the former the points are sampled every 30 m, whereas in the latter they are sampled every 90 m. For that reason, the first DEM is denser than the second one.

To calculate the variograms and apply ordinary Kriging, the Matlab mGstat toolbox has been employed (mGstat 2015). Additionally, the SRTM DEM was first processed in the ArcGIS suite (Esri) before using it in Matlab (spherical coordinates to UTM). The stratified sampling approach has been implemented taking advantage of the VL-Feat toolbox (Vedaldi and Fulkerson 2015). The D* algorithm has been implemented using the Robotics Toolbox (Corke 2011).

6.2 Subsampling algorithms

This section examines the performance of the subsampling approaches prior to Kriging and the mobility prediction analysis. In particular, the following sampling approaches are compared: systematic sampling, stratified sampling, and the method that combines the systematic and stratified approaches. These methods have been applied to the Airport Lake and Sahara desert regions.

It bears mentioning that the tuning parameter (i.e. the number of points or strata) of each subsampling approach has been selected in order to optimize the following metrics: minimum number of samples, minimum computation time, minimum error in range and sill, maximum correlation with the variogram model (Pearson's correlation coefficient), and maximum correlation with the elevation profile (Pearson's correlation coefficient).
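
For reference, the variogram-similarity metrics can be computed as below. Note that the paper first fits 5th-order polynomials before correlating (Sect. 6.2.3); this sketch correlates the binned variogram values directly, which is a simplification.

```python
import numpy as np
from scipy.stats import pearsonr

def variogram_similarity(gamma_full, gamma_sub):
    """Pearson's correlation and mean squared residual between the
    variogram of the full DEM and that of a subsampled set (arrays
    evaluated at the same lags; NaN bins are ignored)."""
    ok = ~(np.isnan(gamma_full) | np.isnan(gamma_sub))
    r, _ = pearsonr(gamma_full[ok], gamma_sub[ok])
    msr = float(np.mean((gamma_full[ok] - gamma_sub[ok]) ** 2))
    return r, msr
```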

6.2.1 Experiment 1: Airport Lake (Death Valley, CA, USA)

This section details the performance of the subsampling approaches for the Airport Lake region. The stratified sampling has been configured with 300 strata, with one random sample selected within each stratum (K-means with K = 300). The systematic approach has been configured to obtain a random sample every 90 m, leading to 127 samples. Additionally, two configurations of the proposed method are tested. The first considers 15 strata with two random points per stratum plus the 127 samples from systematic sampling (156 points; 1 point was filtered). The second configuration uses 15 strata with one random sample per stratum plus the 127 samples from systematic sampling (142 points).

First, we estimate the theoretical variogram of the entire DEM. In this case, the raw DEM is composed of 34,741 samples, leading to a computation time of 44 min (Intel Core i7 3 GHz, 16 GB RAM). This variogram is automatically fitted, through the least-squares method, by a Gaussian model (5th-order polynomial).

Figure 6a, b shows the first experiments, carried out to check the performance of the systematic sampling approach (127 samples, 0.1 s) and the stratified sampling approach (300 samples, 0.2 s), respectively. The empirical variograms do not fit the Gaussian model that represents the entire DEM (black curve). An interesting conclusion from these experiments is that the stratified sampling approach fits the range of the Gaussian model (5167 vs 5273) but not the sill (81164 vs 29814). In contrast, the systematic sampling method has a smaller error in the sill (21371 vs 29814), but there is a deviation in the range (4841 vs 5273). Notice that the units of the sill and range are (m\(^2\)) and (m), respectively.

Fig. 6 Variograms (Airport Lake). The continuous line is the theoretical variogram considering all the raw samples; the blue circles show the empirical variogram of each method. a Systematic sampling, 127 points. b Stratified sampling, 300 points. c Combined sampling, 156 points. d Combined sampling, 142 points

Figure 6c, d displays the results obtained with the proposed subsampling approach (156 and 142 samples). Observe that these empirical variograms fit the variogram model obtained with the entire DEM better, especially in the case where two random samples are selected within each stratum (Fig. 6c): here, both the range and the sill fit the theoretical variogram model (5089 vs 5273; 28529 vs 29814). A deeper quantitative analysis is given in Sect. 6.2.3. As shown in Fig. 7, the points selected by the proposed approach fit the elevation profile properly, although the best result in this regard is obtained using the stratified sampling method.

Fig. 7 Elevations (Airport Lake). The circles represent the subsampled points and the red profile the entire dataset (raw samples). a Systematic sampling, 127 points. b Stratified sampling, 300 points. c Combined sampling, 156 points. d Combined sampling, 142 points

6.2.2 Experiment 2: Sahara Desert (Chad, Africa)

This section focuses on the performance of the subsampling approaches for the Sahara desert region. The stratified sampling has been experimentally configured with 65 strata and three random samples per stratum, leading to 195 samples. The systematic approach has been configured to obtain 73 samples. The two configurations of the proposed subsampling approach are: 23 strata with one random point per stratum plus the 73 samples from systematic sampling, and 23 strata with two random samples per stratum plus the 73 samples from systematic sampling.

As in the previous experiments, the theoretical variogram of the entire DEM is first calculated. In this case, the raw DEM is composed of 6660 samples, leading to a computation time of 1 min and 20 s (Intel Core i7 3 GHz, 16 GB RAM). Figure 8a, b shows the experiments carried out with the systematic sampling and stratified sampling approaches, respectively. The main result is that the empirical variograms do not fit the Gaussian model that represents the entire DEM. A behavior similar to the previous scenario is observed: the stratified sampling approach fits the range of the Gaussian model but not the sill, whereas the systematic sampling approach has a smaller error in the sill but deviates in the range.

Fig. 8 Variograms (Sahara desert). The continuous line is the theoretical variogram considering all the raw samples; the blue circles show the empirical variogram of each method. a Systematic sampling, 73 points. b Stratified sampling, 195 points. c Combined sampling, 96 points. d Combined sampling, 117 points

Figure 8c, d displays the results obtained with the proposed subsampling approach (96 and 117 samples, respectively). Observe that these empirical variograms fit the variogram model obtained with the entire DEM; the best case is the first configuration, one random sample per stratum plus the systematic samples (Fig. 8c), where both the range and the sill are close to those of the theoretical variogram model. As in the previous experiment, the proposed approach fits the elevation profile properly; see Fig. 9.

Fig. 9 Elevations (Sahara desert). The circles represent the subsampled points and the red profile the entire dataset (raw samples). a Systematic sampling, 73 points. b Stratified sampling, 195 points. c Combined sampling, 96 points. d Combined sampling, 117 points

6.2.3 Discussion about the subsampling experiments

In this section we analyze the results obtained with the subsampling approaches in terms of the following metrics: number of samples; computation time; range and sill of the variograms; Pearson's correlation coefficient between the original variogram and the variogram obtained with each sampling approach; Pearson's correlation coefficient between the elevation profiles; and the mean of the squared residuals between the experimental values and the fitted variogram model. In order to calculate the Pearson's correlation coefficients, a 5th-order polynomial has been fitted to the empirical variogram and to the elevation profile in each case. Recall that these experiments were run on a general-purpose computer (Intel Core i7 3 GHz, 16 GB RAM).

Table 1 shows that the proposed subsampling methodology is worthwhile in terms of computation time, without reducing the precision of the empirical variogram. Notice that the best results are obtained with the two configurations highlighted in the table (156 samples for the Airport Lake and 96 samples for the Sahara desert): their deviation from the original DEM is the smallest one considering all the metrics. It bears mentioning that the maximum similarity in the elevation profile is usually achieved by the stratified sampling approach; however, this approach yields a poor correlation in terms of the variogram (Pearson's correlation coefficient). This research also draws the following conclusions:

  • The number of samples is not directly related to the confidence of the empirical variogram. We demonstrate that 156 samples (Airport Lake) and 96 samples (Sahara desert) give better results than 300 and 195 samples, respectively (combined subsampling approach vs stratified subsampling).

  • Regarding the sampling step, it is important to take into account not only the elevation of the samples (stratified sampling) but also the distance between points (systematic sampling). The solution that achieves the best metrics is in fact a combination of both approaches (in terms of Pearson's correlation coefficient).

  • The computation time can be dramatically reduced by selecting appropriate samples instead of considering the entire DEM (44 min vs 0.1 s in the case of the Airport Lake environment).

  • The suitability of the combined sampling approach is demonstrated in terms of the range and sill of the variogram and the elevation profile.

It is important to remark that a slight variation in the shape of the variogram (a small deviation in the range or sill) has no effect on the ordinary Kriging weights or the Kriging estimates (elevations); it only affects the Kriging variance (Isaaks and Srivastava 1989). For that reason, considering the reduced-order datasets will not affect the Kriging performance.

Table 1 Summary of the performance of the subsampling approaches

6.3 Ordinary Kriging and global path planning

Once a set of samples representing the raw DEM has been selected, the next step is to apply the ordinary Kriging algorithm to obtain a new model of the terrain at a finer resolution. In this case, 10-m resolution models are desired, which leads to an affordable computational burden (see Remark 2 and the discussion in Sect. 7). After that, the D* algorithm is applied to obtain the optimal route between two desired points. Recall that three cost functions are evaluated: min-distance (the shortest route), min-uncertainty (the route with the minimum uncertainty), and min-slope (the flattest route).

Figure 10a details the result of applying ordinary Kriging to the Airport Lake environment, and Fig. 10b to the Sahara desert. Notice that the generated models fit the original DEMs properly (see Figs. 4, 5). Furthermore, it is important to highlight the small standard deviation (uncertainty) obtained: in both environments it is smaller than 0.2 m (blue color). Brighter colors mean larger uncertainty; notice that they appear near the boundaries of the surfaces because no samples are available in those regions (after the subsampling step). The standard deviation associated with the Death Valley model lies in the range (0, 15.67) m, with a mean value of 0.2 m and standard deviation of 0.81 m. For the Sahara scenario the range is (0, 4.48) m, with a mean value of 0.05 m and standard deviation of 0.18 m.

Fig. 10 Surface reconstruction using ordinary Kriging (stochastic model of the terrain). The original resolutions were 30 and 90 m, respectively; the new DEMs have a resolution of 10 m. Notice that blue means low uncertainty, \(<\)0.2 (m) for Death Valley and \(<\)0.05 (m) for the Sahara desert. a Airport Lake (Death Valley, CA, USA). b Sahara desert (Chad, Africa)

After obtaining the new terrain models with Kriging, the D* path-planning algorithm is used to obtain the optimal route according to each cost function. Figure 11 shows the min-distance and min-uncertainty routes for the Airport Lake. As expected, the shortest route (a straight line) corresponds to the min-distance route (red line). The min-uncertainty route considers the variance of the elevation (uncertainty obtained from Kriging); for that reason, this route passes as close as possible to the original sampled points (black dots).

Fig. 11 Routes obtained using the D* path planner. Notice that the min-uncertainty path goes closer to the actual sampled points. In contrast, the min-distance path follows a straight line between the starting point and the goal. The mesh represents the interpolated model considering the nominal elevations (Kriging estimates)

Figure 12 displays a deterministic terrain profile illustrating the maximum slope between points (among the 8 neighbors of each point). In this representation, a path going through a brighter region (yellow) means a flatter route (small variation in elevation between one point and its neighbors). On the other hand, hazards such as steep slopes are represented in blue or red; that is, the difference in elevation between one point and its neighbors is larger than in a brighter region. Notice that positive values (red) mean positive slopes (the vehicle would pitch up), and negative values (blue) represent negative slopes (the vehicle would pitch down).

Fig. 12 Deterministic terrain profile: representation of the 3D map in terms of the maximum slope between the 8 neighbors around each point, and the D* optimal path minimizing the slope between points. Notice that, in order to estimate this map, the nominal value of the elevation of each point has been considered

Figure 13 shows the three routes obtained with the D* algorithm in the x–y plane. Observing these plots makes it even easier to understand the difference between the three cost functions. For example, the min-distance route follows a straight line between the starting and goal points, as expected. However, the min-uncertainty route follows a different pattern: although a longer path is generated, the ground vehicle would move over a safer route where more "certain" points are traversed (black points). The min-slope route also takes a different path.

Fig. 13 Airport Lake (USA). Comparison of the three routes obtained with the D* algorithm, considering as the cost function the variance (uncertainty) of each point (min-uncertainty path), the distance (min-distance path), or the terrain profile (min-slope path). a Min-distance route. b Min-uncertainty route. c Min-slope route

Figure 14 shows the min-distance and min-uncertainty routes for the Sahara desert scenario. Again, the shortest route (a straight line) corresponds to the min-distance route. As expected, the min-uncertainty route passes fairly close to the sampled points, although this means a longer path.

Fig. 14 Routes obtained using the D* path planner. The mesh represents the interpolated model considering the nominal elevations (Kriging estimates)

Figure 15 displays the deterministic terrain profile dealing with the maximum slope between points. Notice that, compared to the previous scenario, the Sahara desert is flatter: the elevation range is smaller than 100 m. For that reason, the slope angles are smaller than in the previous environment (a range of \(-\)2.5\({^\circ }\) to 1.5\({^\circ }\)). This fact explains why the min-slope route is quite similar to the min-distance route.

Fig. 15 Deterministic terrain profile: representation of the 3D map in terms of the maximum slope between the 8 neighbors around each point, and the D* optimal path minimizing the slope between points

Finally, Fig. 16 shows the three routes obtained with the D* algorithm in the x–y plane. Notice the similarity between the min-distance and min-slope routes. The main difference again lies in the min-uncertainty route: as in the previous scenario, a longer path is generated, but with the benefit of following a safer route (smallest uncertainty regarding the true elevation of the terrain).

Fig. 16 Sahara desert (Chad). Comparison of the three routes obtained with the D* algorithm, considering as the cost function the variance (uncertainty) of each point (min-uncertainty path), the distance (min-distance path), or the terrain profile (min-slope path). a Min-distance route. b Min-uncertainty route. c Min-slope route

6.4 Mobility prediction

This section analyzes the mobility of a vehicle over the routes discussed in the previous section. A maximum linear velocity of 1 m/s and a maximum steering angle of 75\({^\circ }\) have been assumed for the testbed considered in these experiments. The lookahead distance of the pure-pursuit algorithm is 4 m, and the sampling period is 1 s. More technical data about the vehicle configuration are summarized in Table 2. Furthermore, the vehicle is equipped with two Maxon DC motors, model 253298 (one motor for traction and another for steering).

Table 2 Features of the vehicle considered in this research

It bears mentioning that 150 Monte Carlo simulations have been carried out for each reference route and each scenario. This means that a random value for the elevation of each reference point has been generated according to the mean value of the elevation and the Kriging variance.

Table 3 summarizes the performance of each route in terms of the cost functions previously introduced (see Sect. 5.1). As expected, the shortest route is obtained employing the min-distance cost in the D* algorithm. The longest route is obtained using the min-uncertainty cost because, as shown earlier, it pursues as many "true" samples as possible.

Table 3 Summary of the three cost functions considered in this research

The route with the lowest uncertainty results from the min-uncertainty cost. Recall that this value has been obtained as the mean value of the Kriging variance. For that reason, there is a clear difference between the values obtained in the first scenario and those obtained in the second environment: uncertainty is larger in the Sahara desert because the original DEM had a resolution of 90 m and the new model has a resolution of 10 m, whereas the resolutions of the two models of the Death Valley region are much more similar (30 vs 10 m). These results show the importance of uncertainty in elevation and its relation to the spatial resolution.

The trajectory with the lowest mean value of the elevation is given by the min-slope cost. Again, this cost function is given in terms of the average value of the mean elevation. For that reason, when the deterministic terrain profile related to the maximum slope angles between points is considered by the D* algorithm, it is not surprising that it leads to the flattest route.

One interesting conclusion is drawn from the last two columns. First, notice that the energy spent by the two motors of the vehicle (steering and traction) is fairly similar for the min-distance and min-slope costs because the lengths of the trajectories are similar in both cases. In the first scenario, the energy required for the min-slope route is even smaller than for the min-distance route, despite the former being slightly longer; this is explained because, under the current configuration of the vehicle (maximum steering angle, maximum linear velocity, etc.), the min-distance route requires more energy to follow. On the other hand, although the min-slope route achieves the minimum value in terms of mean elevation, it does not ensure the smallest value in the slope column. This is not altogether unexpected because the min-slope route has a larger mean uncertainty value than the min-uncertainty route: after 150 runs, the average value of the slope is smaller when the uncertainty in elevation is smaller. Recall that the elevation index is obtained as the average of the mean elevation of each route (Kriging mean), whereas the slope column results from simulating the Kriging variance associated with each point over 150 runs.

The main conclusion of these experiments is that, from a stochastic point of view, the min-uncertainty route is more appropriate than the min-distance and min-slope routes when uncertainty in elevation is considered. A minimum value of the slope (5th column in Table 3) would eventually lead to a lower consumption. Recall that the fourth column (Energy) is only calculated for the x–y plane; it does not account for all the energy required to traverse the 3D world. This issue is shown through a realistic animation discussed in the following section.

6.5 Realistic animation using ANVEL

The results obtained in this section validate the main conclusion drawn from the previous experiments. In particular, the DEM of the Sahara desert environment has been loaded into the robotic simulator ANVEL (Quantum Signal, http://anvelsim.com); see Fig. 17a. After that, two different routes have been defined: the first route constitutes a flat path, whereas in the second the vehicle moves gradually up a slope until the goal; see Fig. 17a, b. This scenario validates the importance of the slope variable analyzed in the previous section and explains why the min-uncertainty trajectory results in the most efficient route in terms of energy saving.

Fig. 17 Comparing the performance of the proposed methodology through the realistic simulator ANVEL. Notice that the same DEM as used before has been loaded into ANVEL. a ANVEL configured with the Sahara desert DEM. b Routes followed by the vehicle (dist. = \(\sim \)55 m). c Elevations. d Power required by the motor during the traverses. e Speed during the traverses

It bears mentioning that the original routes have been shortened in order to produce an illustrative animation (\(\sim \)55 m instead of the original \(\sim \)10 km routes). A vehicle similar to the one used previously has been employed in ANVEL; in particular, a Jeep-type vehicle with a mass of 850 kg and a maximum velocity of 1 m/s.

As observed in Fig. 17d, the power required by the flattest trajectory ("route 1") is smaller than that required by the route where the vehicle has to face a slope ("route 2"), despite the fact that both routes traverse the same distance, 55 m. In particular, the power of the first trajectory is 7300 (HP), and the power required to traverse the second route is 10480 (HP). This result confirms the conclusions drawn in the previous section, that is, a flatter route is preferred to a steeper one even when the traversed distance is the same. Finally, Fig. 17e shows the forward speed of the vehicle during the traverses. Notice that the speed achieved in the first experiment is slightly higher than in the second case; this is not altogether unexpected because higher slip occurs on the second route.

7 Conclusions and future work

This paper presents a mobility prediction strategy for manned and unmanned ground vehicles intended to operate over large regions (>\(5\times 5\) (km\(^2\))). An important contribution of this work is the demonstration of the importance of considering uncertainty in elevation in the cost function of the path-planning algorithm: it not only leads to a safer route but can also mean a lower consumption. However, depending on the criterion or objective selected, it might not be the optimal solution (e.g. when the shortest route is desired). Specifically, two different environments with different elevation profiles have been tested. The first deals with a \({\sim }13 \times 13\) (km\(^2\)) region; the second has dimensions of \({\sim }8 \times 8\) (km\(^2\)). These two environments were selected because they have two different DEM formats and two different resolutions: the Airport Lake DEM has the USGS format with 30-m resolution, and the Sahara desert DEM has the SRTM format with 90-m resolution, which means the points are sparser in the second DEM than in the first. Notice that 30-m resolution DEMs are so far only available for US territory; worldwide coverage is only available through the SRTM format at 90-m resolution.

The computational burden is a major issue when working with actual digital elevation models of regions larger than \(5 \times 5\) (km\(^2\)) because large datasets have to be handled (e.g. the Airport Lake DEM has almost 35,000 samples, and the model obtained with Kriging has almost 2 million points). For this reason, identifying methods for obtaining a reduced-order representation of the DEM constitutes a key point in this field. In this research, a subsampling approach is proposed. It substantially reduces the computation time of the variogram calculation and of the Kriging step without reducing the precision of the geostatistics-related metrics. Thanks to this contribution, we have been able to perform Kriging over a reduced-order representation of the DEMs; when we tried Kriging over those scenarios without the subsampling algorithm, the computation time increased dramatically (Intel Core i7 3 GHz, 16 GB RAM).

Future efforts will focus on combining uncertainty in elevation with information about the terrain itself (i.e. terrain trafficability). This will lead to more reliable and safer routes: taking into account uncertainty in elevation leads to more efficient routes, and considering terrain information will reduce the risk of vehicle entrapment.