
1 Introduction

In many real applications, multiple conflicting objectives must be optimized, but no mathematical or simulation model of the objectives is available. Instead, there is only data, e.g., obtained via physical experiments. In such cases, surrogate models can be built from the given data, and optimization is then performed on the surrogate models. In the literature, surrogate models such as Kriging [8], neural networks [18] and support vector regression [16] have typically been used for solving computationally expensive optimization problems [6, 10]. If new (expensive) function evaluations can be conducted when needed, the process is called online data-driven optimization [20]. When no additional data is accessible during the optimization, we call it offline data-driven optimization [11].

In using surrogate models, the main challenge is to manage the models so that convergence and diversity improve without sacrificing too much model accuracy. In online data-driven optimization problems, an infill criterion [6] is maximized or minimized to update the models iteratively during the optimization process. However, this is not applicable to offline data-driven optimization, where no further data is available during the optimization process. So far, little research has been conducted on solving optimization problems where no new data is available for managing the surrogates [4, 11, 20]. In such a case, the quality of the solutions obtained with the surrogate models depends entirely on the accuracy of the models and the optimizer used.

When solving an offline data-driven problem with multiple conflicting objectives, one can fit models using all the data available for each objective function. Then an evolutionary multiobjective optimization (EMO) algorithm can be used on these models to find a set of approximated nondominated solutions. Essentially, in that case, an offline data-driven multiobjective optimization problem (MOP) can be divided into two major parts: model building and using an EMO algorithm.

Some surrogate models, like Kriging, provide uncertainty information (a standard deviation) for the predicted values. A low standard deviation implies that the actual objective function value has a higher chance of being close to the predicted value (although the actual function remains unknown and the only information available is the data). Therefore, one possible way to improve the accuracy of the obtained solutions is to utilize the uncertainty in the fitted model as an additional objective to be optimized.

In this article, we study different ways of dealing with the uncertainty information provided by Kriging models in offline data-driven multiobjective optimization. Moreover, we consider the effect of using different initial sampling techniques on some benchmark test problems. We simulate offline problems by generating data for problems with known optimal solutions, so that the results can be analyzed. The results show the effect of utilizing uncertainty information on the quality of the solutions.

The rest of this article is organized as follows. We summarize the basic concepts of data-driven optimization and the Kriging model in Sect. 2. In Sect. 3, we present different approaches to incorporating uncertainty information in the optimization problem, and we present and analyze the results in Sect. 4. Finally, we draw conclusions in Sect. 5.

2 Background

2.1 Generic Offline Data-Driven EMO

We consider MOPs of the following form:

$$\begin{aligned} \begin{gathered} \text {minimize } {\left\{ f_1(\mathbf {x}),\ldots ,f_k(\mathbf {x})\right\} }, \\ \text {subject to} \ \mathbf {x} \in S, \end{gathered} \end{aligned}$$
(1)

with \(k\) \((\ge 2)\) objective functions, where the feasible set S is a subset of the decision space \(\mathbb {R}^n\). For any feasible decision vector \(\mathbf {x}\), we have a corresponding objective vector \(f(\mathbf {x})=(f_1(\mathbf {x}),\ldots ,f_k(\mathbf {x}))\).

MOPs that are offline in nature can generally be solved by the approach given in Fig. 1; in what follows, we refer to it as the generic approach. As described in [11, 21], the solution process can be split into three major components: (1) data collection, (2) model building and management, and (3) the EMO method utilized. The collection of data may also incorporate data pre-processing, if required. Once the data has been obtained, the objectives and constraints of the MOP are formulated. The next stage is to build surrogate models (also known as metamodels), e.g., one for each objective function, using the available data. Finally, an EMO method is used to find nondominated solutions utilizing the surrogates as objective functions. As the objectives to be optimized in (1), we use the predicted means \(\hat{f}_i\), \(i=1,\ldots ,k\), of the surrogates of the objectives \(f_i\), and the objective vector is denoted by:

$$\begin{aligned} \mathbf {\hat{f}} = ( \hat{f_1}(\mathbf {x}),\ldots ,\hat{f_k}(\mathbf {x}) ). \end{aligned}$$
(2)
Fig. 1. Flowchart of a generic offline data-driven evolutionary multiobjective optimization approach.
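The generic approach can be sketched as follows, assuming the surrogate means \(\hat{f}_i\) are available as plain callables. Random search over the feasible box stands in for a real EMO method, and all helper names and parameter values are illustrative, not from the paper:

```python
import numpy as np

def nondominated(F):
    """Boolean mask of the nondominated rows of objective matrix F
    (all objectives are minimized)."""
    n = F.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        if not mask[i]:
            continue
        # mark every row that row i dominates
        dominated = np.all(F[i] <= F, axis=1) & np.any(F[i] < F, axis=1)
        mask[dominated] = False
    return mask

def generic_offline_emo(surrogates, lower, upper, n_eval=2000, seed=0):
    """Evaluate the surrogate means on candidate decision vectors and
    keep the nondominated candidates (random search stands in for EMO)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lower, upper, size=(n_eval, len(lower)))
    F = np.column_stack([f(X) for f in surrogates])
    keep = nondominated(F)
    return X[keep], F[keep]
```

With, say, two toy surrogate means \(\hat{f}_1(\mathbf{x})=\sum x_i^2\) and \(\hat{f}_2(\mathbf{x})=\sum (x_i-1)^2\), the returned set approximates the trade-off between the two predicted objectives.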

Selecting proper surrogate models is a challenging task in model management. In online data-driven EMO, the quality of the surrogate models can be assessed, and the models updated, as new data becomes available during the optimization process. For offline data-driven EMO this is not possible. The task becomes even more challenging when the data is noisy [22], skewed [23], time-varying [2] or heterogeneous [3]. Thus, it is crucial to build, before optimization, surrogates that approximate the “true” objective functions as well as possible. One way to improve the accuracy of the surrogates is to enhance the quality of the data. In this research, our considerations remain on a general level and we do not go into the characteristics of the data.

In offline data-driven EMO, possible ways to improve the accuracy of the surrogate models include effective data pre-processing for noise removal [4], creating synthetic data [23], transferring knowledge [15] and applying advanced machine learning techniques [19, 20]. However, it is quite possible that the surrogate models are not good representations of the true objectives. It may even happen that the solutions obtained are actually worse than the data used for fitting the models.

2.2 Kriging

Kriging, or Gaussian process regression, has been widely used as a surrogate model for solving expensive optimization problems [6]. The main advantage of using Kriging is its ability to provide uncertainty information for the predicted values. Given a Kriging model, the predicted mean value \(y^*\) and its variance \(s^2\) for a sample (decision vector) \(\mathbf {x}^*\) are as follows:

$$\begin{aligned} y^*= \mathbf {k}(\mathbf {x}^*,\mathbf {X})K(\mathbf {X},\mathbf {X})^{-1} \mathbf {y}, \end{aligned}$$
(3)
$$\begin{aligned} s^2 = k(\mathbf {x}^*,\mathbf {x}^*) - \mathbf {k}(\mathbf {x}^*,\mathbf {X})K(\mathbf {X},\mathbf {X})^{-1} \mathbf {k}(\mathbf {X},\mathbf {x}^*), \end{aligned}$$
(4)

where \(\mathbf {X}\in \mathbb {R}^{N_I \times n}\) is the matrix of the given data with \(N_I\) samples of n decision variables each, \(\mathbf {y}\in \mathbb {R}^{N_I}\) is the vector of given objective function values corresponding to the decision vectors in \(\mathbf {X}\), \(K(\mathbf {X},\mathbf {X})\) is the covariance matrix of \(\mathbf {X}\), \(k(\mathbf {x}^*,\mathbf {x}^*)\) is the prior variance at \(\mathbf {x}^*\) and \(\mathbf {k}(\mathbf {x}^*,\mathbf {X})\) is the vector of covariances between \(\mathbf {x}^*\) and \(\mathbf {X}\). For more details about Kriging, see [17].
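Equations (3) and (4) can be sketched in a few lines of NumPy with a squared-exponential (Gaussian) covariance of unit prior variance; the kernel parameter `theta` and the small nugget term added for numerical stability are illustrative choices, not taken from the paper:

```python
import numpy as np

def gauss_cov(A, B, theta=1.0):
    """Gaussian covariance between the rows of A and the rows of B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-theta * d2)

def kriging_predict(x_star, X, y, theta=1.0, nugget=1e-10):
    """Predicted mean (Eq. (3)) and variance (Eq. (4)) at x_star,
    given data (X, y); the nugget stabilizes the matrix inversion."""
    K = gauss_cov(X, X, theta) + nugget * np.eye(len(X))  # K(X, X)
    k_star = gauss_cov(x_star[None, :], X, theta)[0]      # k(x*, X)
    mean = k_star @ np.linalg.solve(K, y)                 # Eq. (3)
    var = 1.0 - k_star @ np.linalg.solve(K, k_star)       # Eq. (4); k(x*, x*) = 1
    return mean, max(var, 0.0)
```

At a data point the predicted mean reproduces the observed value and the variance (nearly) vanishes; far from the data the variance approaches the prior variance of one.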

3 Approaches to Incorporate Uncertainty

As new data cannot be obtained in offline data-driven optimization, it is difficult to update the surrogates and enhance their accuracy. One approach is to build a very accurate surrogate model before the optimization process. Another possible approach is to provide, in addition to the final solutions, a suitable metric that can be used to assess their accuracy. This approach can be beneficial when the surrogate models cannot represent the true objective functions exactly. One such instance is when the data consists of optimal solutions. In such a case, the surrogate might not be a good representation of the actual objectives, which might lead to degraded final solutions. Providing a set of solutions together with the uncertainty information of the predictions can then be helpful in the decision making process.

As previously discussed, the two major components in offline data-driven optimization are building surrogate models and using an EMO algorithm. In this research, we limit ourselves to a few variations of the optimization problem that try to reduce the uncertainty in the final solutions. As shown in Fig. 2, the uncertainties in the predicted values of the Kriging models are utilized as additional objective functions. By considering uncertainties in this way, the EMO method minimizes the predicted mean values from the fitted Kriging models while simultaneously minimizing the standard deviations of the predictions. Thus, the final set of nondominated solutions will consist of solutions with different levels of uncertainty.

Fig. 2. Flowchart of offline data-driven optimization with uncertainty.

We have tested three different approaches for utilizing uncertainties in the optimization. Approach 1 uses all the standard deviations given by each surrogate model as additional objectives. The resulting objective vector in Approach 1 is:

$$\begin{aligned} \mathbf {\hat{f}} = (\hat{f_1}(\mathbf {x}),\ldots ,\hat{f_k}(\mathbf {x}),s_1(\mathbf {x}),\ldots ,s_k(\mathbf {x}) ), \end{aligned}$$
(5)

where \(\hat{f_i}(\mathbf {x})\) and \(s_i(\mathbf {x})\) are the predicted mean and the standard deviation for the \(i^{th}\) objective. Final solutions are obtained by performing a nondominated sort on the archive of predicted solutions (predicted mean values and standard deviations) stored during the optimization. Note that a solution may have different uncertainties for different objectives. This approach doubles the number of objectives, which may increase the complexity of solving the resulting optimization problem.

Approach 2 utilizes the average of the standard deviations given by each of the surrogate models as an additional objective and the resulting objective vector is:

$$\begin{aligned} \mathbf {\hat{f}} = (\hat{f_1}(\mathbf {x}),\ldots ,\hat{f_k}(\mathbf {x}),\bar{s}(\mathbf {x}) ), \end{aligned}$$
(6)

where \(\bar{s}(\mathbf {x})\) is the average of the standard deviations of the Kriging models built for each objective function. This approach has fewer objectives than Approach 1; however, both approaches provide solutions with a range of uncertainty values, and both can be used to filter solutions based on the uncertainty information.

Approach 3 utilizes the expected improvement (EI) [12] of every surrogate model as the objectives to be optimized by the EMO algorithm, see, e.g., [9]. For a single surrogate, expected improvement can be expressed as \(\text {EI}(\mathbf {x})=(f_{min}-\hat{f}(\mathbf {x}))\varPhi \left( \frac{f_{min}-\hat{f}(\mathbf {x})}{s(\mathbf {x})} \right) + s(\mathbf {x})\phi \left( \frac{f_{min}-\hat{f}(\mathbf {x})}{s(\mathbf {x})} \right) \), where \(\phi (\cdot )\) and \(\varPhi (\cdot )\) are the standard normal density and distribution functions, respectively, and, for the \(i^{th}\) objective, \(f_{min}\) is the best value of that objective in the given data. The objective vector in this case is:

$$\begin{aligned} \mathbf {\hat{f}} = \left( \text {EI}_1(\mathbf {x}),\ldots ,\text {EI}_k(\mathbf {x})\right) , \end{aligned}$$
(7)

where \(\text {EI}_i(\mathbf {x})\) is the expected improvement value for the \(i^{th}\) objective. The EI criterion takes the predicted mean value and the standard deviation into account.
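The objective vectors (5)–(7) can be assembled from per-solution predicted means and standard deviations as sketched below. The array shapes, helper names and the use of NumPy are illustrative assumptions; note that EI is a quantity to be maximized, so an EMO method that minimizes all objectives would use its negative:

```python
import math
import numpy as np

def approach1(means, stds):
    """Eq. (5): predicted means plus per-objective standard
    deviations, giving 2k objectives (all minimized)."""
    return np.hstack([means, stds])

def approach2(means, stds):
    """Eq. (6): predicted means plus the average standard
    deviation, giving k + 1 objectives."""
    return np.hstack([means, stds.mean(axis=1, keepdims=True)])

def expected_improvement(mean, std, f_min):
    """Scalar EI for minimization; larger is better."""
    if std <= 0.0:
        return max(f_min - mean, 0.0)
    z = (f_min - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (f_min - mean) * cdf + std * pdf

def approach3(means, stds, f_min):
    """Eq. (7): component-wise EI against the best observed
    value f_min[i] of each objective in the data."""
    return np.vectorize(expected_improvement)(means, stds, f_min[None, :])
```

Here `means` and `stds` are \(N \times k\) arrays of Kriging predictions for N candidate solutions; `np.vectorize` is merely a convenience loop over the entries.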

We have now introduced three approaches for incorporating uncertainty information. Algorithm 1 shows the process of applying any of them in the offline optimization process, where k is the number of objectives and the maximum number of evaluations of the surrogate models can be used as the stopping criterion.

Algorithm 1.

4 Experimental Results

We compare the three approaches to each other and to the generic approach (with objective vector (2) in Subsect. 2.1) using the test problems DTLZ2 and DTLZ4–DTLZ7 with 2, 3 and 5 objectives. As mentioned, we generate data for these problems and fit Kriging models to it. The dimension of the decision space n is fixed at 10.

The size of the data set used is 109 (corresponding to \(11n-1\) [5, 13, 24]). The sampling techniques for creating the data sets were Latin hypercube sampling (LHS), uniform random sampling and a special sampling which we call optimal-random sampling. In the latter, 50% of the data are nondominated solutions and the remaining 50% are uniform random samples. This hypothetical sampling resembles a special case where most of the samples in the given data set are close to optimal, so that the optimization process can no longer improve the solutions much further. However, in such a scenario the offline optimization technique should at least not produce final solutions that are worse than the provided samples. A total of 31 independent runs were performed for each sampling technique and each case.
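The two structured samplings can be sketched as follows. The `optimal_sampler` callable standing in for a source of (near-)optimal decision vectors is an assumption for illustration, as is the unit hypercube domain:

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """n points in [0, 1]^d with exactly one point in each of the n
    equal-width slices along every dimension."""
    H = np.empty((n, d))
    for j in range(d):
        H[:, j] = (rng.permutation(n) + rng.random(n)) / n
    return H

def optimal_random(n, d, optimal_sampler, rng):
    """The 'optimal-random' scheme: 50% (near-)optimal decision
    vectors from a user-supplied sampler, 50% uniform random points."""
    n_opt = n // 2
    return np.vstack([optimal_sampler(n_opt), rng.random((n - n_opt, d))])
```

For example, for DTLZ-type problems `optimal_sampler` could draw decision vectors whose distance-related variables are set to their known optimal values; here it is left abstract.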

We used the indicator-based evolutionary algorithm (IBEA) [25] as the EMO method, as it has been demonstrated in [1] to perform well even for problems with a higher number of objectives. The selection criterion was \(I_{\epsilon +}\) (Step 6 in Algorithm 1) with \(\kappa \) parameter values 0.51, 0.87 and 0.48 for \(k= 2, 3 \) and 5, respectively, and a \(\kappa \) value of 0.5 for any other number of objectives. The population size was 100 and the maximum number of function evaluations was 40 000, following [1]. We used a Matlab implementation of Kriging models with first-order polynomial regression functions and a Gaussian kernel.

For measuring the performance of the different approaches, we first performed a nondominated sort on the archive (including the additional objective(s)). The resulting nondominated solutions were then evaluated with the real objective functions. After obtaining their true objective function values, dominated solutions were removed, producing the final nondominated set. For comparing the quality of the solutions of all the approaches, the inverted generational distance (IGD) metric was utilized with 5000 points in the reference set for all problems.

Table 1 shows the means and standard deviations of the IGD values for the three approaches and the generic approach. Approaches 1 and 2 performed better than the generic approach for LHS and uniform random sampling for all the problems with various numbers of objectives, with the exception of DTLZ6 and DTLZ7. With optimal-random sampling, Approaches 1 and 2 performed better than the generic approach for DTLZ2, DTLZ4 and DTLZ5, and better for DTLZ6 and DTLZ7 only for some numbers of objectives. Approach 3 did not produce good results for any of the problems, numbers of objectives or sampling techniques.

Adding uncertainties as additional objectives poses a major problem in explaining the effect of the optimization, as the fitness landscape of the uncertainties is mostly unknown. A possible explanation for the lack of a noticeable performance improvement in DTLZ6 when using Approaches 1 and 2 is that the problem has a non-uniform (or biased) [7] degenerate Pareto front. Adding uncertainty objectives makes the problem even harder to solve, and fewer nondominated solutions are obtained. For DTLZ7, a possible explanation for the worse performance of Approaches 1 and 2 is that the objective functions are completely separable [14]. Thus, the objectives added by Approaches 1 and 2 only make the problem more difficult to solve than with the generic approach.

For optimal-random sampling, the advantage of Approaches 1 and 2 was clearly visible. Although the initial sampling included nondominated solutions, the generic approach failed to provide good solutions because the surrogate models do not represent the true objectives perfectly. When utilizing EIs as objectives in Approach 3, the solutions were actually worse (comparing mean IGD values) in most cases. This is because EI balances exploiting the predicted mean against exploring uncertain regions, and it can therefore select solutions with a high uncertainty.

Figure 3 shows the root mean square error (RMSE) of the final solutions obtained by the different approaches with LHS sampling on problems with two objectives. The solutions obtained by Approaches 1 and 2 are more accurate in most cases. This means that using uncertainty as additional objective(s) helps to find solutions with a low approximation error. Therefore, using uncertainty in the optimization process can be considered an advantage in solving an offline data-driven EMO problem, where there is no possibility of updating the surrogate models. An illustration of the solutions obtained after evaluating them with the real objectives for the DTLZ2 problem with LHS and optimal-random sampling is shown in Fig. 4. Due to space limitations, further analysis is available as additional material at http://www.mit.jyu.fi/optgroup/extramaterial.html. The performance of the proposed approaches on other test problems (i.e., DTLZ1, DTLZ3, WFG1–WFG3, WFG5 and WFG9) can also be found at the above-mentioned website.

Fig. 3. RMSE of the final solutions for bi-objective problems. Here f1 and f2 are the objectives, “Gen”, “Appr1”, “Appr2” and “Appr3” denote the generic approach and Approaches 1, 2 and 3, respectively, and Opt.Rand is optimal-random sampling.

Table 1. Means and standard deviations of IGD values of the final archive, evaluated on the true objective functions, obtained by each approach, for various problems and sampling techniques. (Best values are in bold)
Fig. 4. Final solutions of the run with the median IGD value using the different approaches for LHS sampling (top three rows) and optimal-random sampling (bottom three rows) on the DTLZ2 problem.

5 Conclusions

We have considered offline data-driven evolutionary multiobjective optimization. We used Kriging to fit surrogate models to data and proposed and tested three approaches for utilizing the uncertainty information of the Kriging models in the optimization. A comparison was made on several benchmark problems with different sampling techniques and numbers of objectives. Adding uncertainty as one or more additional objectives improved the final solutions for certain problems in our benchmark tests. However, utilizing expected improvements as objectives (Approach 3) did not prove effective for this kind of problem. The analysis also revealed that the solutions obtained with Approaches 1 and 2 are more accurate than those obtained with the generic approach (without uncertainty information).

Future work will include comparing the performance of the proposed approaches with larger initial sample sizes and larger numbers of decision variables and objectives. Supporting the decision making process by giving a decision maker the option to select a final solution using the uncertainty information is another direction to pursue. Moreover, filtering techniques can be applied to remove solutions with high uncertainties. Testing on real-world data sets and exploring different ways of dealing with uncertainties using other surrogate models will also be future research topics.