1 Introduction

According to Cua et al. [1], quality management principles emphasize the importance of cross-functional product development and systematic management processes, as well as the involvement of customers, suppliers, and employees, to ensure the quality of products and processes.

Kano and Nakagawa [2] argued that, to improve product quality, a system with at least the following functions is necessary: (1) predicting product quality from the operating conditions; (2) detecting faults and malfunctions to prevent undesirable operation; and (3) determining the best operating conditions to improve product quality. The first function is performed through the development of a software model that mathematically relates the operating conditions to product quality. The second function is performed via multivariate statistical process control. The third function is performed by formulating and solving optimization problems.

In most industrial processes, the relationships between the responses and the decision variables are unknown. To obtain this information, it is necessary to design and execute experiments and to collect and analyze the data. In a planned experiment, purposeful variations are made in the controllable process variables, and the resulting output data are observed to infer which variables are responsible for the observed changes.

According to Montgomery [3], when the objective is to optimize a given problem, the response surface methodology (RSM) should be chosen to define the experimental design. As one of the objectives of RSM is to optimize the responses, it is recommended, whenever possible, to represent them through second-order models, since their curvature defines the location of an optimal point.

Although the estimated model is considered an adequate approximation of the responses of interest, the values it generates always deviate from the real values. The magnitude of these errors is measured using the prediction variance of the model. Thus, the quality of the prediction of a response depends on the prediction variance, and small prediction variance values are desirable for reliable predictions [4].

From an analysis of manufacturing processes, it can be concluded that the simultaneous optimization of several controllable characteristics, such as quality, cost, and productivity, leads to multi-objective mathematical models. In industrial processes where the joint optimization of multiple characteristics is desired, the problem can be defined by the following mathematical formulation:

$$\begin{aligned} &{\text{Min}}\;\left\{ f_{1} (x),f_{2} (x), \ldots ,f_{k} (x) \right\} \\ &{\text{s}}.{\text{t}}.:\;h_{i} (x) = 0,\quad i = 1,2, \ldots ,l \\ &\qquad\; g_{j} (x) \le 0,\quad j = 1,2, \ldots ,m, \\ \end{aligned}$$
(1)

where \(f_{1}(x), f_{2}(x), \ldots, f_{k}(x)\) are the objective functions to be optimized; \(h_{i}(x)\) represents the \(l\) equality constraints; and \(g_{j}(x)\) represents the \(m\) inequality constraints.

In multi-objective problems, it is very unlikely that all the functions are minimized simultaneously by one optimal solution x*. Indeed, these objectives depend on the same set of decision variables and are conflicting [5]. The Pareto optimal solution concept, also called the compromise solution, has become considerably relevant to these problems. A feasible solution x* is Pareto optimal if no other feasible solution z exists such that \(f_{i} \left( z \right) \le f_{i} (x^{*} ),\quad i = 1,2, \ldots ,k\), with \(f_{j} \left( z \right) < f_{j} \left( {x^{*} } \right)\) for at least one objective j.
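For readers who wish to experiment with this definition, the following Python sketch filters the non-dominated points from a set of candidate solutions; the candidate values are illustrative and not taken from the cases studied here.

```python
import numpy as np

def pareto_mask(F):
    """Boolean mask of non-dominated rows of F (all objectives to be minimized).

    Row x* is Pareto optimal if no row z satisfies f_i(z) <= f_i(x*) for all i
    and f_j(z) < f_j(x*) for at least one j, mirroring the definition above.
    """
    F = np.asarray(F, dtype=float)
    keep = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        dominated = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        if dominated.any():
            keep[i] = False
    return keep

# Four candidate solutions evaluated on two objectives (illustrative values):
F = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
print(pareto_mask(F))   # [ True  True False  True]; (3, 3) is dominated by (2, 2)
```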

The purpose of multi-objective optimization processes (MOPs) is to offer support and ways to find the best compromise solution, in which the decision maker and his/her preference information play an important role, as he/she is typically responsible for the final solution of the problem. As it is difficult to know the degree of importance to be assigned to each objective [6], the weights for each function end up being defined subjectively, influenced by the analyst's preferences.

However, Zeleny [7], when proposing his entropy-based weighting method for linear multi-objective optimization, raised several points against this practice, among which the following are cited: (1) the human capacity to reach an overall assessment by weighting and combining different attributes is limited, and such a weight allocation process is unstable, suboptimal, and often arbitrary; (2) the total number of possible and identifiable criteria and attributes can be very large, and it is not plausible to expect any human being to assign weights to hundreds of attributes with any reliability; and (3) weight changes reflect their dependence on a particular problem, i.e., any particular weighting structure should be understood as a result of the analysis rather than as its input. Indeed, eliciting direct preference information from the analyst can be counterproductive in real-world decision-making because of the high cognitive effort required [8].

The question of weighting has been discussed since the publication of Zeleny's works in the 1970s. Since then, many works on the subject have been published, but without an apparent consensus. In general, the literature on the subject is divided into four categories: equally distributed weighting; random weighting; subjective weighting methods; and objective weighting methods. Subjective weighting is supported by methods based on personal or collective judgments, usually produced by direct assignment [9], ANP [10], AHP [11, 12], and/or fuzzy methods [13, 14]. Objective weighting methods set priorities according to quantitative values. The main representatives of this category are methods based on entropic parameters [15,16,17].

In the literature reviewed, how the weighting of MOPs affects the prediction variance in RSM experimental designs has not been studied; this gap provides scope for theoretical contributions on the topic. Thus, the main objective of this study is to develop a method to identify the optimal weights in MOPs, based on the weighting diversification obtained through the maximization of entropy and diversity functions, and to study how the weighting affects the prediction variance in multi-objective optimization using RSM. This paper proposes that the use of entropic metrics in choosing optimal weights in MOPs can reduce the prediction variance. Hence, the present proposal is called robust optimal point selection (ROPS). The metrics proposed in ROPS constitute a useful tool in the multiple-criteria decision-making process, because they lead to robust responses without the need to include the variance term in the mathematical formulation of the problem, making it simpler.

2 Theoretical fundamentals

2.1 Weighting methods applied to multi-objective optimization

As previously mentioned in Sect. 1, during the MOP, numerous efficient solutions may be generated to form the Pareto frontier. Due to the complexity of formulating and solving mathematical problems, choosing the best point to implement becomes a non-trivial task.

By assigning different weights to the objective functions representing the process characteristics to be optimized, we express the relative importance of each parameter within the analyzed process. In other words, weights should be assigned to the functions to indicate their relative importance, thus establishing priorities during the optimization process [18].

According to Taboada et al. [19], Gaudreault et al. [20], and Pilavachi et al. [21], the priority given to the criteria is essential to achieve results and should be applied with caution, as the final result can vary significantly depending on the importance assigned to each of its objectives. This may lead to a problem because of the uncertainty of decision makers about the exact weight of objective functions and utility functions [19].

The Pareto set includes all the rational choices, among which the decision maker must identify the solution by comparing their various objectives [19]. Several techniques have been presented to search the solution space for a set of Pareto optimal solutions. However, the major drawback of such methods is that the decision maker must then choose among many solutions. Thus, according to Taboada et al. [19], it is necessary to bridge the gap between single solutions and Pareto optimal sets.

The lack of consensus to stipulate an acceptable weighting method makes the process even more difficult. This is due to the large number of methods that can be applied and the considerable differences among them [18].

The question of weighting has been discussed since the publication of Zeleny's works [7, 22]. Melachrinoudis [23] determined an optimum location for an undesirable facility in a workroom environment. The author defined the problem as the selection of a location within the convex region that maximizes the minimum weighted Euclidean distance with respect to all existing facilities, where the degree of undesirability between an existing facility and the new undesirable entity is reflected through a weighting factor.

Saaty [24] presented a multi-criteria decision-making approach, named the analytic hierarchy process (AHP), in which selected factors are arranged in a hierarchic structure descending from an overall goal to criteria, subcriteria, and alternatives in successive levels. Despite its popularity, this method has been criticized by decision analysts. Some authors have pointed out that Saaty’s procedure does not optimize any performance criterion [25]. However, according to Promentilla et al. [26], the analytic network process (ANP), which is a generalized form of AHP, is an attractive tool for understanding the complex decision problem better, as this approach overcomes the limitation of the linear hierarchical structure of the AHP.

Figueira et al. [8] presented a method for ranking a finite set of actions evaluated on a finite set of criteria. The generalized regression with intensities of preference (GRIP) is based on indirect preference information and the ordinal regression paradigm. It can be compared to the AHP, as the decision maker is requested to express the intensity of preference in qualitative-ordinal terms in both approaches. However, in contrast to AHP, in GRIP, the marginal value functions are just a numerical representation of the original qualitative-ordinal information. The pairwise comparison principle has also been used in more recent models as a set of dominance decision rules induced from rough approximations of comprehensive preference relations [6].

Taboada et al. [19] proposed a different approach. In their work, the authors presented two alternatives for reducing the Pareto optimal set to be used in the decision-making stage. The first is to rank the objective functions without, however, assigning numerical values to them; the second is to apply cluster analysis to the Pareto optimal points. According to the authors, reducing the Pareto optimal set makes the decision-making process easier.

Over time, other methods for deriving priority weights have been proposed, such as methods using simulated annealing [27, 28], the geometric mean procedure [29, 30], methods based on constrained optimization models [31], trial and error methods [32], methods using fuzzy logic [27, 29, 30, 33, 34], and methods using grey decision [35,36,37].

Recently, Monghasemi et al. [38], dealing with the multi-objective optimization of time–cost-quality trade-off problems in construction projects, have used Shannon’s entropy [39] to define the weights involved in the optimization process. According to the authors, Shannon’s entropy can provide a more reliable assessment of the relative weights for the objectives in the absence of the decision maker’s preferences.

Rocha et al. [40] and Rocha et al. [41] used Shannon’s [39] entropy index associated with an error measure to determine the most preferred Pareto optimal point in a vertical turning MOP.

Wang et al. [42], when reviewing multi-criteria decision-making methods, classified the weighting methods into two main groups: subjective and objective. Subjective weighting is supported by methods based on personal or collective judgments, usually produced by expert panels, the Delphi method, or paired comparisons, whether in their original form or incorporated into the AHP or ANP [18]. In contrast, objective weighting methods set priorities according to quantitative values obtained mainly by applying statistical models or procedures that implicitly calculate the criteria weights. The main representative of this category is the entropy method presented by Zeleny [7, 22]. Ibáñes-Forés et al. [18] presented two other categories: equally distributed weighting and random weighting. The latter involves analyzing the results under all possible combinations of weights that can be assigned to each criterion, typically using a simulation technique. The theoretical review performed using these categories is summarized in Table 1.

Table 1 Review of weighting methods in the literature

Based on Table 1, one can perceive the breadth of the subject. Even after more than 40 years of research, it remains relevant. Diverse applications can be found: the energy sector, sustainability, the chemical industry, machining processes, teachers' evaluation, etc. Notably, despite these efforts, this review does not intend to exhaust the theme, mainly due to the many different applications of weighting. Many works were included because they explicitly used some of the aforementioned methods, despite not presenting a discussion on the weighting itself. Nevertheless, several other works could also have been included in this literature review.

Among the papers presented, only Shahraki and Noorossana [93] proposed to evaluate any variability parameter when selecting the best Pareto optimal solution. The authors used two criteria to make this selection: the sensitivity to reliability levels and the process capability index.

This work aims to study how the weighting of functions in multi-objective optimization affects the prediction variance.

2.2 Entropy

In 1865, when the German physicist Rudolf Clausius attempted to give a new name to irreversible heat loss, the word “entropy” was introduced. Since then, entropy has played an important role in thermodynamics. This concept also helps measure the amount of order and disorder [99]. The word entropy had belonged to the domain of physics until 1948 when Claude Shannon, while developing his theory of communication [39], used the term to represent a measure of information [100].

Entropy can be defined as a measure of probabilistic uncertainty. Its use is indicated in situations where the probability distributions are unknown and diversification is sought. Among the desirable properties of Shannon's entropy index, the following are highlighted: the measure is nonnegative, and it is concave. The former is desirable because the entropy index then ensures non-null solutions. The latter is desirable because it is much easier to maximize a concave function than a non-concave one [100]. Higher entropy values indicate more randomness and, hence, less expressed information.

Shannon’s entropy index is one of several diversity indices used to measure diversity in categorical data. It is simply the information entropy of the distribution, treating species as symbols and their relative population sizes as the probability [101]. The information can simply be defined as the values of the objectives. The underlying assumption is that an event that has a lower probability of occurrence is more likely to provide more information by its occurrence [92].

The maximum entropy principle determines the least informative probability distribution for a random variable x given any prior information about x. If the mean and variance of x are available, the continuous probability distribution that maximizes the differential Shannon entropy is the normal distribution. According to Zhou et al. [99], when dealing with continuous probability distributions, the density function is evaluated for all values of the argument. Thus, given a continuous probability distribution with a density function f(x), its entropy can be defined as

$$S(x) = - \int\limits_{ - \infty }^{ + \infty } {f(x)\ln f(x){\text{d}}x} ,$$
(2)

where \(\int_{ - \infty }^{ + \infty } {f(x){\text{d}}x} = 1\) and \(f(x) \ge 0\).

As the weights used in the weighting of functions in multi-objective optimization are proportions that sum to one, they can be treated as a discrete probability distribution. Thus, Eq. (2) becomes

$$S(w) = - \sum\limits_{i = 1}^{m} {w_{i} \ln w_{i} } ,$$
(3)

where wi are the weights assigned to the objectives to be optimized.

The index shown in Eq. (3) is also known as the Shannon–Wiener entropy index [102].
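As a minimal illustration of Eq. (3), the following Python sketch computes Shannon's entropy of a weight vector, using the convention that \(0\ln 0 = 0\); the weight vectors shown are arbitrary examples.

```python
import numpy as np

def shannon_entropy(w):
    """Eq. (3): S(w) = -sum(w_i ln w_i), with the convention 0 * ln 0 = 0."""
    w = np.asarray(w, dtype=float)
    w = w[w > 0]                      # zero weights contribute nothing in the limit
    return float(-np.sum(w * np.log(w)))

print(shannon_entropy([0.5, 0.5]))    # ln 2 ~ 0.693: maximum diversification for m = 2
print(shannon_entropy([0.9, 0.1]))    # ~0.325: a less diversified weighting
print(shannon_entropy([1.0, 0.0]))    # 0.0: no diversification at all
```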

2.3 Diversity

According to Stirling [102], our actions are permeated with a lack of certainty arising from various sources such as incomplete knowledge, contradictory information, data variability, conceptual imprecision, different reference points, and the inherent indeterminacy of several natural and social processes.

The theory of probability attempts to address this issue. A probability can be assigned to each possible set of future events. It can be considered to reflect the established frequency of occurrence of similar past events under comparable conditions and is thus, in some sense, objective. This "frequentist" interpretation of probability is vulnerable to doubts about the comparability of past and future circumstances and results. In a more subjective way, from a Bayesian perspective, probability can be considered simply to reflect the perceived likelihood of different eventualities, given the best available information and prior expert opinion. However, due to the deficiency of information, these procedures tend to be vulnerable to error, unconscious bias, or manipulation [102].

Recognizing these difficulties, a distinction is made between risk (where the probability density function can significantly be set for a range of possible outcomes) and uncertainty (where there is no basis for assigning probabilities). In situations where there is no basis for assigning probabilities to outcomes or knowledge about several possible outcomes, another state of the absence of certainty has been distinguished, i.e., ignorance. In several fields, ignorance, rather than risk or uncertainty, dominates the real decision-making process [102].

Of all the strategies developed to deal with the absence of certainty, the best one is diversification. The concepts of diversity employed in several fields of science combine only three properties—variety, balance, and disparity—each of which is a necessary but insufficient feature of diversity [103].

Stirling [103] stated that variety is the number of categories into which the elements of the system are divided. The larger the variety, the greater is the diversity. Balance is a function of the pattern of division of elements across categories. The greater the balance, the greater is the diversity. Disparity indicates how different the elements are from one another. The greater the disparity between the elements, the greater is the diversity.

According to Stirling [103], Shannon’s entropy index, as presented in Eq. (3), only includes the variety and balance dimensions. Thus, the author proposed a formulation that considered variety, balance, and disparity as follows:

$$\Delta = \sum\limits_{ij(i \ne j)} {d_{ij}^{\alpha } (w_{i} w_{j} )^{\beta } } ,$$
(4)

where dij is the disparity between two elements; wi and wj are the weights representing the proportions of elements i and j; α and β are terms quantifying the relative importance of disparity and balance; in the reference case, α = β = 1.

The disparity (dij) is a measure of the difference between the objects. For this purpose, two types of measures are most widely used: correlation measures and distance measures.

The measure most widely used to quantify the correlation between two variables is the Pearson linear coefficient, which can be calculated as

$$\rho = \frac{{\sigma_{XY} }}{{\sigma_{X} \sigma_{Y} }} = \frac{{\sum\limits_{i = 1}^{n} {(X_{i} - \bar{X})(Y_{i} - \bar{Y})} }}{{\left[ {\sum\limits_{i = 1}^{n} {(X_{i} - \bar{X})^{2} } \sum\limits_{i = 1}^{n} {(Y_{i} - \bar{Y})^{2} } } \right]^{1/2} }},$$
(5)

where \(\sigma_{XY}\) corresponds to the covariance between X and Y; \(\sigma_{X}\) corresponds to the standard deviation of X; and \(\sigma_{Y}\) corresponds to the standard deviation of Y.

High positive correlations indicate similarity, and high negative correlations indicate disparity. Thus, it is defined that \(d_{ij} = 1 - \rho_{ij}\).

The most widely recognized distance measure is the Euclidean distance, i.e., the length of a straight line drawn between two objects when they are represented graphically. Thus, the greater the distance between two objects, the greater is their disparity. In the context of multi-objective optimization, a distance measure can be calculated as the Euclidean distance between the anchor points, that is, the points that optimize each response individually, as follows [15]:

$$d_{ij} = \sqrt {\left[ {x_{{1,f_{i} (x)}}^{*} - x_{{1,f_{j} (x)}}^{*} } \right]^{2} + \left[ {x_{{2,f_{i} (x)}}^{*} - x_{{2,f_{j} (x)}}^{*} } \right]^{2} + \cdots + \left[ {x_{{n,f_{i} (x)}}^{*} - x_{{n,f_{j} (x)}}^{*} } \right]^{2} } ,\quad i \ne j,$$
(6)

where \(x_{1} , \, x_{2} \ldots x_{n}\) are the decision variables of the problem; \(f_{i} (x)\) and \(f_{j} (x)\) are the objective functions.
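The following Python sketch illustrates Eq. (4) with both disparity choices discussed above, i.e., \(d_{ij} = 1 - \rho_{ij}\) and the Euclidean distance between anchor points of Eq. (6); the correlation matrix and anchor points are assumed values for illustration only.

```python
import numpy as np

def diversity(w, d, alpha=1.0, beta=1.0):
    """Eq. (4): sum over i != j of d_ij^alpha * (w_i * w_j)^beta."""
    w, d = np.asarray(w, float), np.asarray(d, float)
    m = len(w)
    return float(sum(d[i, j] ** alpha * (w[i] * w[j]) ** beta
                     for i in range(m) for j in range(m) if i != j))

# DC disparity: d_ij = 1 - rho_ij, with an assumed correlation of -0.8 between f1, f2.
rho = np.array([[1.0, -0.8], [-0.8, 1.0]])
d_dc = 1.0 - rho

# DO disparity: Euclidean distances between assumed anchor points, Eq. (6).
x_star = np.array([[1.0, 0.5],      # x* that optimizes f1 (assumed)
                   [-0.5, 1.0]])    # x* that optimizes f2 (assumed)
d_do = np.linalg.norm(x_star[:, None, :] - x_star[None, :, :], axis=2)

w = [0.5, 0.5]
print(diversity(w, d_dc))   # 0.9 = 2 * 1.8 * 0.25
print(diversity(w, d_do))   # ~0.79 with the assumed anchors
```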

3 Robust optimal point selection

As discussed earlier, several of the weighting strategies employed during optimization and decision-making include, in at least one of their stages, imprecise and subjective elements. Many of these strategies still rely on error-prone inputs, which can significantly affect the final result. Moreover, considering that, among all the consulted sources, only Shahraki and Noorossana [93] proposed to evaluate any variability parameter when selecting the best Pareto optimal solution, another theoretical gap that the current study intends to explore is the behavior of the prediction variance in relation to the weighting strategies.

ROPS is an alternative approach for identifying optimal weights for MOPs. To this end, Rocha et al. [40] and Rocha et al. [41] proposed a weighting method that combines Shannon’s entropy and an error measure. The entropy-based weighting presented in the aforementioned studies was useful in identifying the optimal weights used in multi-objective optimization. Nevertheless, the authors did not discuss the forecast variance.

Therefore, addressing this gap in the work of Rocha et al. [40] and Rocha et al. [41] and in the literature in general, this paper presents different weighting strategies, showing how these strategies affect the prediction variance. The diversity index [103], the entropy index [39], and entropy-based weighting [40, 41] are used as parameters for selecting the most preferred Pareto optimal point, and their results are compared. For all these possibilities, the behavior of the prediction variance was evaluated.

The optimization algorithms are included in the step of identifying the optimal solutions, after the responses have been modeled using RSM (for the mathematical formulation of RSM, see [15, 41, 98]). The generalized reduced gradient (GRG) algorithm is used through the Excel® Solver function. The normal boundary intersection (NBI) approach is used to identify the Pareto optimal solutions and construct the Pareto frontier (for the mathematical formulation of NBI, see [104]). This approach was chosen because it defines a Pareto frontier with evenly distributed solutions, regardless of the convexity of the functions, overcoming the drawbacks of the weighted sum method.

To demonstrate the proposition of the present study mathematically, consider the following MOP:

$$\begin{aligned} &\mathop {\text{Min}}\limits_{x} \;\sum\limits_{i = 1}^{n} {w_{i} f_{i} (x)} \\ &{\text{s}}.{\text{t}}.:\;\sum\limits_{i = 1}^{n} {w_{i} = 1} \\ &\qquad\; w_{i} \ge 0,\quad i = 1, \ldots ,n, \\ \end{aligned}$$
(7)

where fi (x) represents the objective functions to be optimized, and wi represents the weights assigned to each objective function.

To calculate the variance for the function under analysis, the following process is considered:

$$\begin{aligned} {\text{Var}}\left[ {\sum\limits_{i = 1}^{n} {w_{i} f_{i} (x)} } \right] & = \sum\limits_{i = 1}^{n} {\left[ {\frac{{\partial \,w_{i} f_{i} (x)}}{{\partial f_{i} (x)}}} \right]}^{2} \sigma_{{f_{i} }}^{2} + 2\sum\limits_{i = 1}^{n} {\sum\limits_{j > i}^{n} {\left[ {\frac{{\partial \,w_{i} f_{i} (x)}}{{\partial f_{i} (x)}}} \right]} } \left[ {\frac{{\partial \,w_{j} f_{j} (x)}}{{\partial f_{j} (x)}}} \right]\sigma_{{f_{i} f_{j} }} \\ & = \sum\limits_{i = 1}^{n} {w_{i}^{2} } \sigma_{{f_{i} (x)}}^{2} + 2\sum\limits_{i = 1}^{n} {\sum\limits_{j > i}^{n} {w_{i} w_{j} } } \sigma_{{f_{i} f_{j} }} \\ & = \sum\limits_{i = 1}^{n} {w_{i}^{2} } {\text{Var}}\left[ {f_{i} (x)} \right] + 2\sum\limits_{i = 1}^{n} {\sum\limits_{j > i}^{n} {w_{i} w_{j} } } \rho_{{f_{i} f_{j} }} \sqrt {{\text{Var}}\left[ {f_{i} (x)} \right] \times {\text{Var}}\left[ {f_{j} (x)} \right]} , \\ \end{aligned}$$
(8)

where \(\rho_{{f_{i} f_{j} }}\) is the correlation between the functions fi and fj.

Considering that the variance of fi(x) at a given point \({\mathbf{X}}_{0}^{T} = \left[ {1\;x_{01} \;x_{02} \; \ldots \;x_{0k} } \right]\) can be calculated as \({\text{Var}}[f_{i} ({\mathbf{X}}_{0} )] = \hat{\sigma }_{{f_{i} }}^{2} {\mathbf{X}}_{0}^{T} ({\mathbf{X}}^{T} {\mathbf{X}})^{ - 1} {\mathbf{X}}_{0}\), we can modify Eq. (8) to

$$\begin{aligned} & {\text{Var}}\left[ {\sum\limits_{i = 1}^{n} {w_{i} f_{i} ({\mathbf{X}}_{0} )} } \right] = \sum\limits_{i = 1}^{n} {w_{i}^{2} } \left[ {\hat{\sigma }_{{f_{i} }}^{2} {\mathbf{X}}_{0}^{T} ({\mathbf{X}}^{T} {\mathbf{X}})^{ - 1} {\mathbf{X}}_{0} } \right] \\ & \quad + \,2\sum\limits_{i = 1}^{n} {\sum\limits_{j > i}^{n} {w_{i} w_{j} } } \rho_{{f_{i} f_{j} }} \sqrt {\left[ {\hat{\sigma }_{{f_{i} }}^{2} {\mathbf{X}}_{0}^{T} ({\mathbf{X}}^{T} {\mathbf{X}})^{ - 1} {\mathbf{X}}_{0} } \right] \times \left[ {\hat{\sigma }_{{f_{j} }}^{2} {\mathbf{X}}_{0}^{T} ({\mathbf{X}}^{T} {\mathbf{X}})^{ - 1} {\mathbf{X}}_{0} } \right]} . \\ \end{aligned}$$
(9)

Now, let \(\rho_{{f_{i} f_{j} }}\) equal zero. In this case, Eq. (9) becomes

$${\text{Var}}\left[ {\sum\limits_{i = 1}^{n} {w_{i} f_{i} ({\mathbf{X}}_{0} )} } \right] = \sum\limits_{i = 1}^{n} {w_{i}^{2} } \left[ {\hat{\sigma }_{{f_{i} }}^{2} {\mathbf{X}}_{0}^{T} ({\mathbf{X}}^{T} {\mathbf{X}})^{ - 1} {\mathbf{X}}_{0} } \right].$$
(10)

As the variance of the estimated responses depends on the square of the weight assigned to each response, one way of minimizing its value is through diversification, i.e., through a uniform distribution of the weights among the functions involved in the MOP.
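A small numerical check of Eq. (10) illustrates this point: under the assumption of zero correlation and equal individual prediction variances, the variance of the weighted sum decreases as the weights become more uniform.

```python
import numpy as np

def weighted_sum_variance(w, v):
    """Eq. (10): Var[sum w_i f_i(X0)] = sum w_i^2 v_i when the f_i are uncorrelated."""
    w, v = np.asarray(w, float), np.asarray(v, float)
    return float(np.sum(w ** 2 * v))

v = np.array([0.3, 0.3, 0.3])                 # assumed equal prediction variances
for w in ([1.0, 0.0, 0.0], [0.5, 0.3, 0.2], [1 / 3, 1 / 3, 1 / 3]):
    print(np.round(w, 3), weighted_sum_variance(w, v))
# 0.3, 0.114, 0.1: the uniform (fully diversified) weights give the smallest variance.
```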

Figure 1 shows the step-by-step proposal.

Fig. 1 Step-by-step proposal

The NBI approach is used to solve the MOP, using the following equation [104]:

$$\begin{aligned} &\mathop {\text{Max}}\limits_{{\left( {x,\,D} \right)}} \;D \\ &{\text{s}}.{\text{t}}.:\;\bar{\varPhi }w - D\bar{\varPhi }e = \bar{F}({\mathbf{x}}) \\ &\qquad\; {\mathbf{x}}^{T} {\mathbf{x}} \le \alpha^{2} \\ &\qquad\; {\mathbf{x}} \in \varOmega , \\ \end{aligned}$$
(11)

where w is the vector of convex weights; D is the distance between the utopia line and the Pareto frontier; \(\bar{F}({\mathbf{x}})\) is the vector containing the individual values of the normalized objectives in each run; \(e\) is a column vector of ones; α is the value of the axial point of the experimental design; and \(\varPhi\) and \(\bar{\varPhi }\) are the payoff and normalized payoff matrices, respectively, which can be written as

$$\varPhi = \left[ {\begin{array}{*{20}c} {f_{1}^{*} \left( {x_{1}^{*} } \right)} & \cdots & {f_{1}^{{}} \left( {x_{m}^{*} } \right)} \\ \vdots & \ddots & \vdots \\ {f_{m}^{{}} \left( {x_{1}^{*} } \right)} & \cdots & {f_{m}^{*} \left( {x_{m}^{*} } \right)} \\ \end{array} } \right] \Rightarrow \bar{\varPhi } = \left[ {\begin{array}{*{20}c} {\frac{{f_{1}^{*} \left( {x_{1}^{*} } \right) - f_{1}^{*} \left( {x_{1}^{*} } \right)}}{{f_{1}^{{}} \left( {x_{m}^{*} } \right) - f_{1}^{*} \left( {x_{1}^{*} } \right)}}} & \cdots & {\frac{{f_{1} \left( {x_{m}^{*} } \right) - f_{1}^{*} \left( {x_{1}^{*} } \right)}}{{f_{1}^{{}} \left( {x_{m}^{*} } \right) - f_{1}^{*} \left( {x_{1}^{*} } \right)}}} \\ \vdots & \ddots & \vdots \\ {\frac{{f_{m} \left( {x_{1}^{*} } \right) - f_{m}^{*} \left( {x_{m}^{*} } \right)}}{{f_{m}^{{}} \left( {x_{1}^{*} } \right) - f_{m}^{*} \left( {x_{m}^{*} } \right)}}} & \cdots & {\frac{{f_{m}^{*} \left( {x_{m}^{*} } \right) - f_{m}^{*} \left( {x_{m}^{*} } \right)}}{{f_{m}^{{}} \left( {x_{1}^{*} } \right) - f_{m}^{*} \left( {x_{m}^{*} } \right)}}} \\ \end{array} } \right].$$
(12)
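To make Eqs. (11) and (12) concrete, the following Python sketch solves the NBI subproblem for two assumed quadratic objectives, with SciPy's SLSQP standing in for the GRG routine used in this work; the objective functions, the axial distance, and the coarse five-point frontier (instead of the degree-10 lattice used later) are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed second-order models (illustrative, not fitted models from this work).
f1 = lambda x: (x[0] - 1) ** 2 + (x[1] - 1) ** 2
f2 = lambda x: (x[0] + 1) ** 2 + (x[1] + 1) ** 2
funcs = [f1, f2]
alpha = np.sqrt(2)                       # axial distance of an assumed 2^2 CCD

# Anchor points: minimize each objective inside the spherical region x'x <= alpha^2.
sphere = {'type': 'ineq', 'fun': lambda x: alpha ** 2 - x @ x}
anchors = [minimize(f, x0=np.zeros(2), constraints=[sphere]).x for f in funcs]

# Payoff matrix Phi and its normalization Phi_bar, as in Eq. (12).
phi = np.array([[f(a) for a in anchors] for f in funcs])
utopia, nadir = phi.min(axis=1), phi.max(axis=1)
phi_bar = (phi - utopia[:, None]) / (nadir - utopia)[:, None]

def f_bar(x):
    """Vector of normalized objective values at x."""
    return np.array([(f(x) - u) / (n - u) for f, u, n in zip(funcs, utopia, nadir)])

def nbi_point(w):
    """Solve Eq. (11) for one convex weight vector w; variables are (x1, x2, D)."""
    e = np.ones(len(funcs))
    cons = [{'type': 'eq',
             'fun': lambda z: phi_bar @ w - z[2] * (phi_bar @ e) - f_bar(z[:2])},
            {'type': 'ineq', 'fun': lambda z: alpha ** 2 - z[:2] @ z[:2]}]
    res = minimize(lambda z: -z[2], x0=np.zeros(3), constraints=cons)
    return res.x[:2]

for w1 in np.linspace(0, 1, 5):          # coarse, evenly spread frontier
    x = nbi_point(np.array([w1, 1 - w1]))
    print(np.round(x, 3), round(f1(x), 3), round(f2(x), 3))
```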

In mixture design of experiments, the factors are the ingredients or components of a mixture, and consequently, their levels are not independent. With two components, the experimental region for the mixture experiments considers all values along one line. In the case of three components, this region is the area bounded by one triangle, where the vertices correspond to the neat blends, the sides to the binary mixtures, and the triangular region to the complete mixtures (for the mathematical formulation of Mixture Design of Experiments, see [41]).
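A sketch of the degree-10 simplex-lattice enumeration used here to generate weight combinations is shown below; it lists every weight vector whose components are non-negative multiples of 1/10 summing to one.

```python
from itertools import product

def simplex_lattice(m, degree=10):
    """All weight vectors (k1/degree, ..., km/degree), ki >= 0 integers summing to degree."""
    points = []
    for ks in product(range(degree + 1), repeat=m - 1):
        if sum(ks) <= degree:
            points.append(tuple(k / degree for k in ks) + ((degree - sum(ks)) / degree,))
    return points

print(len(simplex_lattice(2)))   # 11 weight combinations for two objectives
print(len(simplex_lattice(3)))   # 66 weight combinations for three objectives
```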

With regard to the metrics used as weighting criteria (presented in Step 6 of the flowchart), the ratios Shannon's entropy/error and diversity/error are calculated to compare how different weighting metrics affect the prediction variance. The error term reduces the distance between the determined Pareto optimal solution and its ideal value, which justifies its use in the denominator. The original entropy/error ratio (ξ) is obtained using the following equation [40, 41]:

$$\begin{aligned} &{\text{Max}}\;\xi = \frac{\text{Entropy}}{\text{GPE}} \\ &{\text{s.t.}}:\sum\limits_{i = 1}^{n} {w_{i} } = 1 \\ &\qquad\; 0 \le w_{i} \le 1. \\ \end{aligned}$$
(13)

The global percentage error (GPE) in Eq. (13) is calculated as [105]

$${\text{GPE}} = \sum\limits_{i = 1}^{m} {\left| {\frac{{y_{i}^{*} }}{{T_{i} }} - 1} \right|} ,$$
(14)

where \(y_{i}^{*}\) is the value of the ith Pareto optimal response; \(T_{i}\) is the defined target; and \(m\) is the number of objectives.

By dividing the GPE by the number of objectives, m, we derive the mean absolute percentage error (MAPE), as presented by Montgomery et al. [106]:

$${\text{MAPE}} = \frac{1}{m}\sum\limits_{i = 1}^{m} {\left| {\frac{{y_{i}^{*} - T_{i} }}{{T_{i} }}} \right|} .$$
(15)

In the present study, the GPE will be replaced by the MAPE, yielding the equation:

$$\begin{aligned} &{\text{Max}}\;\vartheta = \frac{\text{Entropy}}{\text{MAPE}} \\ &{\text{s.t.}}:\sum\limits_{i = 1}^{n} {w_{i} } = 1 \\ &\qquad\; 0 \le w_{i} \le 1. \\ \end{aligned}$$
(16)

Two strategies are used to define the parameter dij when calculating the diversity. First, we generate the diversity correlation (DC) using \(d_{ij} = 1 - \rho_{ij}\). Second, we create the diversity optimum (DO) using the Euclidean distance between the anchor points, i.e., the points that optimize each response individually, as presented in Eq. (6). The strategy presented in Eq. (16) is used for both diversity metrics.
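For one Pareto optimal point, the three weighting metrics can then be computed as sketched below; the weights, responses, targets, and disparity matrices are placeholder values, not results from the cases presented in Sect. 4.

```python
import numpy as np

def mape(y, t):
    """Eq. (15): mean absolute percentage error of the responses y against targets t."""
    y, t = np.asarray(y, float), np.asarray(t, float)
    return float(np.mean(np.abs((y - t) / t)))

def entropy(w):
    """Eq. (3), with zero weights dropped under the 0 * ln 0 = 0 convention."""
    w = np.asarray(w, float)
    w = w[w > 0]
    return float(-np.sum(w * np.log(w)))

def diversity(w, d):
    """Eq. (4) with alpha = beta = 1."""
    m = len(w)
    return float(sum(d[i, j] * w[i] * w[j] for i in range(m) for j in range(m) if i != j))

# Placeholder values for a single Pareto optimal point:
w = np.array([0.4, 0.6])                     # weights of the corresponding NBI run
y = np.array([10.3, 5.1])                    # Pareto optimal responses (assumed)
t = np.array([10.0, 5.0])                    # targets: the individual optima (assumed)
d_dc = np.array([[0.0, 1.8], [1.8, 0.0]])    # 1 - rho (assumed)
d_do = np.array([[0.0, 1.6], [1.6, 0.0]])    # anchor-point distances (assumed)

e = mape(y, t)
print(entropy(w) / e, diversity(w, d_dc) / e, diversity(w, d_do) / e)
# Each ratio is evaluated over the whole weight lattice and then maximized, Eq. (16).
```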

In this work, the unscaled prediction variance (UPV) is used as the measure of the variance of the model. According to Zahran et al. [107], several measures of prediction performance exist for comparing experimental designs, the most commonly considered being the scaled prediction variance (SPV). The SPV is defined as \(N{\text{Var}}\left[ {\hat{y}({\mathbf{X}}_{0} )} \right]/\sigma^{2} = N{\mathbf{X}}_{0}^{T} ({\mathbf{X}}^{T} {\mathbf{X}})^{ - 1} {\mathbf{X}}_{0}\), where N is the total sample size. However, if direct comparisons of the expected estimation variance are desired, the UPV can be used directly: the variance of the estimated mean response divided by \(\sigma^{2}\), i.e., \({\text{Var}}\left[ {\hat{y}({\mathbf{X}}_{0} )} \right]/\sigma^{2} = {\mathbf{X}}_{0}^{T} ({\mathbf{X}}^{T} {\mathbf{X}})^{ - 1} {\mathbf{X}}_{0}\). This quantity is equivalent to the corresponding element of the hat matrix [108].
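The following sketch computes the UPV for a full second-order model on a \(2^{2}\) CCD with five center points, the design used in the first two illustrative cases of Sect. 4; it also checks the rotatability property discussed there.

```python
import numpy as np

def model_terms(x1, x2):
    """Model matrix row for the full second-order model: 1, x1, x2, x1^2, x2^2, x1*x2."""
    return np.array([1.0, x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

a = 2.0 ** 0.5                                    # axial distance: alpha = (2^2)^(1/4)
runs = ([(-1, -1), (1, -1), (-1, 1), (1, 1)]      # 2^2 factorial points
        + [(-a, 0), (a, 0), (0, -a), (0, a)]      # 4 axial points
        + [(0, 0)] * 5)                           # 5 center points
X = np.array([model_terms(*pt) for pt in runs])   # 13 x 6 model matrix

XtX_inv = np.linalg.inv(X.T @ X)

def upv(x1, x2):
    """Unscaled prediction variance: UPV(x0) = x0' (X'X)^{-1} x0."""
    x0 = model_terms(x1, x2)
    return float(x0 @ XtX_inv @ x0)

print(round(upv(0, 0), 3))      # design center
print(round(upv(1, 1), 3))      # factorial point, radius sqrt(2)
print(round(upv(a, 0), 3))      # axial point, same radius: equal UPV by rotatability
```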

4 Illustrative examples

Three cases are used to demonstrate the applicability of the proposed method. The first two consider simulated experimental matrices for a hypothetical process: the first involves two convex objective functions and two decision variables, and the second involves three objective functions with different convexities and two decision variables. The third case refers to a machining process for hardened steel using a tool with wiper geometry, considering three objective functions and three decision variables. These experimental matrices were constructed using the central composite design (CCD) because, according to Montgomery [3], it is the experimental design most often used for fitting response surface models. In all the cases, five center points (cp) were used because, as Myers et al. [108] argued, five center points provide reasonable stability of the prediction variance throughout the experimental region.

4.1 Case 1

For the analysis of the first case, consider that a certain process has characteristics that depend on two variables. To analyze two of its characteristics that are to be minimized, a sequential set of experiments was established using a CCD, constructed according to a \(2^{2}\) response surface design with 4 axial points and 5 center points, generating 13 experiments. Table 2 presents the CCD for this process (Step 1).

Table 2 CCD simulated for two responses—Case 1

The experimental matrix described in Table 2 presents desirable properties for second-order response surface models, namely axial points defined as \(\alpha = \sqrt[4]{{2^{k} }}\) and a cp number equal to 5. According to Myers et al. [108], this ensures rotatability and a good dispersion of the prediction variance throughout the experimental region.

The analysis of the experimental data generates the mathematical modeling presented in Table 3, and Fig. 2 presents the response surface for the generated models (Step 2):

Table 3 Mathematical models for objective functions—Case 1
Fig. 2 Response surfaces—Case 1

Once the equations were defined, a simplex lattice arrangement of degree 10 (Step 4) was implemented, generating the combination of weights to be used in multi-objective optimization using the NBI (Step 3).

The data in Table 4 correspond to the Pareto optimal points of the optimization of the responses y1 and y2. This set of points forms the Pareto frontier for the problem under analysis (Step 5). Figure 3 graphically shows the Pareto frontier obtained.

Table 4 Arrangement of mixtures and metric calculations—Case 1
Fig. 3 Pareto frontier—Case 1

It can be observed in Fig. 3 that the multi-objective optimization method employed, i.e., the NBI, constructed a Pareto frontier with uniformly distributed points. This is an advantage in the decision-making process, as it allows the decision maker to evaluate the trade-off behavior more easily and to determine how prioritizing one response affects the other. This would not be possible if the solutions agglomerated at some point, generating a discontinuous frontier. The mixture arrangement, by providing a uniform combination of weights, favors the construction of the frontier and the fitting of canonical mixture polynomials to the responses.

Figure 4 is presented to visualize the solution space corresponding to the Pareto optimal points. As the prediction variance is measured in the solution space, i.e., \({\text{UPV}} = {\mathbf{X}}_{0}^{T} ({\mathbf{X}}^{T} {\mathbf{X}})^{ - 1} {\mathbf{X}}_{0}\), visualizing how the points are distributed in this space is essential, as it indicates how the variance behaves in the analyzed problem.

Fig. 4 Solution space—Case 1

An important aspect is that the weights assigned to the responses during the optimization influence the location of the points in the solution space, which indicates that the weighting influences the prediction variance.

Based on the data presented in Table 4 (Step 6), a Pearson correlation analysis was performed between the weighting metrics and the variance measure, UPV. Thus, Table 5 presents the results of the correlation analysis, together with their respective p values, with values lower than 5% indicating statistically significant correlations.

Table 5 Pearson correlation between metrics and variance—Case 1

The ratios of the diversification metrics, i.e., entropy, DC, and DO, to the MAPE were analyzed, presenting correlations with the UPV of − 0.687, − 0.672, and − 0.672, respectively. The negative and statistically significant correlations presented by these metrics indicate that they are good parameters for defining the optimal weights of the MOP at hand, leading to a reduction in the variance and, consequently, to a response that is robust from the point of view of variability while maintaining the diversification among the responses.
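The correlation analysis itself can be reproduced as sketched below; the two columns are placeholder values standing in for a metric column and the UPV column of Table 4, chosen only to produce a clearly negative correlation.

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder columns standing in for a metric and the UPV over the 11 lattice points:
metric = np.array([0.0, 3.5, 5.8, 7.2, 7.9, 8.1, 7.9, 7.2, 5.8, 3.5, 0.0])
upv = np.array([0.40, 0.34, 0.29, 0.25, 0.23, 0.22, 0.23, 0.25, 0.29, 0.34, 0.40])

r, p = pearsonr(metric, upv)
print(f"r = {r:.3f}, p = {p:.4f}")   # a negative r with p < 0.05 is significant
```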

The weighting metrics are modeled using the mixture arrangement (Step 7) from the data presented in Table 4. The resulting canonical mixture polynomials are:

$$\begin{aligned} {{\text{Entropy}}\mathord{\left/ {\vphantom {{\text{Entropy}} {\text{MAPE}}}} \right. \kern-0pt} {\text{MAPE}}} & = - 0.064w_{1} - 0.0408w_{2} + 32.154w_{1} w_{2} - 50.437w_{1} w_{2} (w_{1} - w_{2} ) \\ & \quad + \,37.192w_{1} w_{2} (w_{1} - w_{2} )^{2} , \\ \end{aligned}$$
(17)
$${{\text{DC}}\mathord{\left/ {\vphantom {{\text{DC}} {\text{MAPE}}}} \right. \kern-0pt} {\text{MAPE}}} = 0.008w_{1} - 0.032w_{2} + 3.501w_{1} w_{2} - 4.957w_{1} w_{2} (w_{1} - w_{2} ) + 2.202w_{1} w_{2} (w_{1} - w_{2} )^{2} ,$$
(18)
$${{\text{DO}}\mathord{\left/ {\vphantom {{\text{DO}} {\text{MAPE}}}} \right. \kern-0pt} {\text{MAPE}}} = 0.030w_{1} - 0.118w_{2} + 13.103w_{1} w_{2} - 18.554w_{1} w_{2} (w_{1} - w_{2} ) + 8.243w_{1} w_{2} (w_{1} - w_{2} )^{2} ,$$
(19)
$${\text{UPV}} = 0.4025w_{1} + 0.2072w_{2} - 0.2497w_{1} w_{2} - 0.1270w_{1} w_{2} (w_{1} - w_{2} ) - 0.0491w_{1} w_{2} (w_{1} - w_{2} )^{2} .$$
(20)

All the canonical mixture polynomials presented a good fit, with adjusted R2 values close to 100%. Notably, it was possible to model the UPV as a function of the weights; therefore, the weights interfere in the solution space, as shown in Fig. 4.

Finally, it is possible to maximize the functions for the Entropy/MAPE, DC/MAPE, and DO/MAPE metrics, thereby maximizing diversification while reducing the error. Executed for each metric, this process generates a vector of optimal weights (Step 8) to be used in the original optimization problem, implemented using the NBI, generating different optimal responses and allowing their comparison. Table 6 summarizes the results obtained.
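As an illustration of Step 8, the fitted Entropy/MAPE polynomial of Eq. (17) can be maximized over the weight simplex as sketched below, with SciPy standing in for the GRG solver.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def entropy_mape(w1):
    """Canonical mixture polynomial of Eq. (17), with w2 = 1 - w1."""
    w2 = 1.0 - w1
    return (-0.064 * w1 - 0.0408 * w2 + 32.154 * w1 * w2
            - 50.437 * w1 * w2 * (w1 - w2) + 37.192 * w1 * w2 * (w1 - w2) ** 2)

res = minimize_scalar(lambda w1: -entropy_mape(w1), bounds=(0.0, 1.0), method='bounded')
w_opt = np.array([res.x, 1.0 - res.x])
print(np.round(w_opt, 3), round(entropy_mape(res.x), 3))
# The resulting weight vector is then fed back into the NBI problem, Eq. (11).
```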

Table 6 Summary of results—Case 1

All the metrics performed well, especially considering that the maximum UPV value for the analyzed problem was 0.403. The goal of diversification was achieved by avoiding zero weights.

4.2 Case 2

For the analysis of the second case, consider three characteristics, y1, y2, and y3, of a process that depend on two variables. To maximize y1 and minimize y2 and y3, a sequential set of experiments was established using a CCD, constructed according to a \(2^{2}\) response surface design with 4 axial points and 5 center points, generating 13 experiments. Table 7 presents the CCD for this process.

Table 7 Simulated CCD for three responses—Case 2

The analysis of the experimental data generates the mathematical modeling presented in Table 8, and Fig. 5 presents the response surface for the generated models.

Table 8 Mathematical models for objective functions—Case 2
Fig. 5 Response surfaces—Case 2

Once the equations were defined, a simplex lattice arrangement of degree 10 was implemented, generating the combination of weights to be used in multi-objective optimization using the NBI.

The data in Table 9 correspond to the Pareto optimal points of the optimization of the responses y1, y2, and y3. This set of points forms the Pareto frontier for the problem under analysis. Figure 6 graphically shows the Pareto frontier obtained.

Table 9 Mixture arrangement and metric calculations—Case 2
Fig. 6 Pareto frontier from different perspectives—Case 2

Figure 7 is presented to visualize the solution space corresponding to the Pareto optimal points.

Fig. 7 Solution space—Case 2

As in the previous case, it can be observed that the points move in the solution space as the weights are changed in the optimization process, which directly influences the UPV values (Figs. 8, 9, 10, 11).

Fig. 8 Response surface and contour plot for Entropy/MAPE—Case 2

Fig. 9 Response surface and contour plot for DC/MAPE—Case 2

Fig. 10 Response surface and contour plot for DO/MAPE—Case 2

Fig. 11 Response surface and contour plot for UPV—Case 2

Based on the data presented in Table 9, a Pearson correlation analysis was performed between the weighting metrics and the variance measure, UPV. Thus, Table 10 presents the results of the correlation analysis, together with their respective p values.

Table 10 Pearson correlation between metrics and variance—Case 2

It can be observed that all the diversification/error metrics presented a negative and statistically significant correlation with the UPV, which indicates that the maximization of these metrics reduces the measurement of the UPV.

From the data presented in Table 9, the canonical mixture polynomials, with their respective response surfaces and contour plots, are as follows:

$$\begin{aligned} {{\text{Entropy}}\mathord{\left/ {\vphantom {{\text{Entropy}} {\text{MAPE}}}} \right. \kern-0pt} {\text{MAPE}}} & = - 0.513w_{1} + 0.086w_{2} - 0.095w_{3} + 9.814w_{1} w_{2} + 7.834w_{1} w_{3} \\ & \quad + \,19.588w_{2} w_{3} - 16.483w_{1} w_{3} (w_{1} - w_{3} ) - 22.203w_{2} w_{3} (w_{2} - w_{3} ) \\ & \quad - \,86.415w_{1}^{2} w_{2} w_{3} + 93.372w_{1} w_{2}^{2} w_{3} + 221.784w_{1} w_{2} w_{3}^{2} \\ & \quad + \,36.169w_{1} w_{3} (w_{1} - w_{3} )^{2} + 14.674w_{2} w_{3} (w_{2} - w_{3} )^{2} , \\ \end{aligned}$$
(21)
$$\begin{aligned} {{\text{DC}}\mathord{\left/ {\vphantom {{\text{DC}} {\text{MAPE}}}} \right. \kern-0pt} {\text{MAPE}}} & = - 0.013w_{1} + 0.051w_{2} + 0.006w_{3} + 3.835w_{1} w_{2} + 2.782w_{1} w_{3} \\ & \quad + \,1.945w_{2} w_{3} - 1.132w_{1} w_{2} (w_{1} - w_{2} ) - 5.330w_{1} w_{3} (w_{1} - w_{3} ) \\ & \quad - \,2.321w_{2} w_{3} (w_{2} - w_{3} ) - 7.635w_{1}^{2} w_{2} w_{3} + 14.440w_{1} w_{2}^{2} w_{3} \\ & \quad + \,48.342w_{1} w_{2} w_{3}^{2} - 2.662w_{1} w_{2} (w_{1} - w_{2} )^{2} + 6.798w_{1} w_{3} (w_{1} - w_{3} )^{2} , \\ \end{aligned}$$
(22)
$$\begin{aligned} {{\text{DO}}\mathord{\left/ {\vphantom {{\text{DO}} {\text{MAPE}}}} \right. \kern-0pt} {\text{MAPE}}} & = - 0.12w_{1} + 0.16w_{2} - 0.03w_{3} + 4.51w_{1} w_{2} + 3.53w_{1} w_{3} \\ & \quad + \,7.78w_{2} w_{3} - 6.23w_{1} w_{3} (w_{1} - w_{3} ) - 8.00w_{2} w_{3} (w_{2} - w_{3} ) \\ & \quad - \,31.32w_{1}^{2} w_{2} w_{3} + 29.07w_{1} w_{2}^{2} w_{3} + 62.66w_{1} w_{2} w_{3}^{2} \\ & \quad - \,3.06w_{1} w_{2} (w_{1} - w_{2} )^{2} + 10.08w_{1} w_{3} (w_{1} - w_{3} )^{2} , \\ \end{aligned}$$
(23)
$$\begin{aligned} {\text{UPV}} & = 0.193w_{1} + 0.403w_{2} + 0.207w_{3} - 0.431w_{1} w_{2} - 0.012w_{1} w_{3} \\ & \quad - \,0.253w_{2} w_{3} + 0.379w_{1} w_{2} (w_{1} - w_{2} ) + 0.058w_{1} w_{3} (w_{1} - w_{3} ) \\ & \quad - \,0.124w_{2} w_{3} (w_{2} - w_{3} ) + 0.997w_{1}^{2} w_{2} w_{3} - 0.580w_{1} w_{2}^{2} w_{3} \\ & \quad - \,0.405w_{1} w_{2} w_{3}^{2} - 0.132w_{1} w_{2} (w_{1} - w_{2} )^{2} - 0.123w_{1} w_{3} (w_{1} - w_{3} )^{2} \\ & \quad - \,0.051w_{2} w_{3} (w_{2} - w_{3} )^{2} . \\ \end{aligned}$$
(24)

Notably, all the canonical mixture polynomials presented a good fit, with adjusted R2 values close to 100%. Once again, the variance could be modeled as a function of the weights.

As in the first case, the functions of the metrics were maximized, generating the result presented in Table 11.

Table 11 Summary of results—Case 2

Considering the range of variation of the UPV for this problem (0.190–0.403), it can be affirmed that all the analyzed metrics performed well, as they led to the choice of Pareto optimal points located in the region of minimum variance. Furthermore, the weights are well distributed among the responses, and no weight is zeroed, owing to the diversification.

4.3 Case 3—Real case analysis

For this real case analysis, the method proposed in this work was used to optimize the machining process for hardened steel AISI H13 using a polycrystalline cubic boron nitride (PCBN) tool with wiper geometry, based on Campos [109]. For this study, we considered the material removal rate (MRR), surface roughness parameter (Ra), and cutting force (Fc), using cutting speed (Vc), feed rate (f), and the depth of cut (d) as the decision variables. The workpieces were machined using the range of parameters defined in Table 12. The decision variables were analyzed in a coded way.

Table 12 Parameters used in the experiments

A sequential set of experimental runs was established using a CCD built according to a \(2^{3}\) response surface design with 6 axial points and 5 center points, generating 19 experiments (Table 13).

Table 13 CCD for MRR, Ra, and Fc.

The analysis of the experimental data generates the mathematical modeling presented in Table 14, and Fig. 12 presents the response surface for the generated models.

Table 14 Mathematical models of objective functions
Fig. 12 Response surface for MRR, Ra, and Fc (hold value: d = 0)

Once the equations were defined, a simplex lattice arrangement of degree 10 was implemented, generating the combination of weights to be used in multi-objective optimization using the NBI.

The data in Table 15 correspond to the Pareto optimal points of the optimization of the responses MRR, Ra, and Fc. This set of points forms the Pareto frontier for the problem under analysis. Figure 13 graphically shows the Pareto frontier obtained.

Table 15 Mixture arrangement and metric calculations—Case 3
Fig. 13 Pareto frontier from different perspectives—Case 3

Figure 14 is presented to visualize the solution space corresponding to the Pareto optimal points.

Fig. 14 Solution space—Case 3

As in the previous cases, it can be observed that the points move in the solution space as the weights are changed in the optimization process, which directly influences the UPV values (Figs. 15, 16, 17, 18).

Fig. 15 Response surface and contour plot for Entropy/MAPE—Case 3

Fig. 16 Response surface and contour plot for DC/MAPE—Case 3

Fig. 17 Response surface and contour plot for DO/MAPE—Case 3

Fig. 18 Response surface and contour plot for UPV—Case 3

Based on the data presented in Table 15, a Pearson correlation analysis was performed between the weighting metrics and the variance measure, UPV. Thus, Table 16 presents the results of the correlation analysis, together with their respective p values.

Table 16 Pearson correlation between metrics and variance—Case 3

Notably, all the diversification/error metrics presented a negative and statistically significant correlation with the UPV, which indicates that the maximization of these metrics reduces the measurement of the UPV.

From the data presented in Table 15, the canonical mixture polynomials, with their respective response surfaces and contour plots, are as follows:

$$\begin{aligned} {{\text{Entropy}}\mathord{\left/ {\vphantom {{\text{Entropy}} {\text{MAPE}}}} \right. \kern-0pt} {\text{MAPE}}} & = - 0.044w_{1} - 0.008w_{2} + 0.002w_{3} + 7.430w_{1} w_{2} + 8.170w_{1} w_{3} \\ & \quad + \,8.663w_{2} w_{3} - 2.030w_{1} w_{3} (w_{1} - w_{3} ) - 1.245w_{2} w_{3} (w_{2} - w_{3} ) \\ & \quad + \,16.809w_{1}^{2} w_{2} w_{3} + 10.736w_{1} w_{2}^{2} w_{3} + 21.903w_{1} w_{2} w_{3}^{2} \\ & \quad + \,2.772w_{1} w_{2} (w_{1} - w_{2} )^{2} + 4.666w_{1} w_{3} (w_{1} - w_{3} )^{2} + 5.090w_{2} w_{3} (w_{2} - w_{3} )^{2} , \\ \end{aligned}$$
(25)
$$\begin{aligned} {{\text{DC}}\mathord{\left/ {\vphantom {{\text{DC}} {\text{MAPE}}}} \right. \kern-0pt} {\text{MAPE}}} & = - 0.0028w_{1} + 0.0058w_{2} + 0.0013w_{3} + 4.3612w_{1} w_{2} + 1.5282w_{1} w_{3} \\ & \quad + \,2.4715w_{2} w_{3} - 0.0533w_{1} w_{2} (w_{1} - w_{2} ) - 0.3433w_{1} w_{3} (w_{1} - w_{3} ) \\ & \quad - \,0.3297w_{2} w_{3} (w_{2} - w_{3} ) + 1.2760w_{1}^{2} w_{2} w_{3} + 1.5377w_{1} w_{2} w_{3}^{2} \\ & \quad - \,0.4891w_{1} w_{2} (w_{1} - w_{2} )^{2} + 0.0696w_{1} w_{3} (w_{1} - w_{3} )^{2} + 0.1254w_{2} w_{3} (w_{2} - w_{3} )^{2} , \\ \end{aligned}$$
(26)
$$\begin{aligned} {{\text{DO}}\mathord{\left/ {\vphantom {{\text{DO}} {\text{MAPE}}}} \right. \kern-0pt} {\text{MAPE}}} & = - 0.006w_{1} + 0.010w_{2} + 0.004w_{3} + 9.099w_{1} w_{2} + 5.776w_{1} w_{3} \\ & \quad + \,6.050w_{2} w_{3} - 0.123w_{1} w_{2} (w_{1} - w_{2} ) - 1.311w_{1} w_{3} (w_{1} - w_{3} ) \\ & \quad - \,0.780w_{2} w_{3} (w_{2} - w_{3} ) + 3.326w_{1}^{2} w_{2} w_{3} - 1.385w_{1} w_{2}^{2} w_{3} \\ & \quad + \,3.190w_{1} w_{2} w_{3}^{2} - 1.037w_{1} w_{2} (w_{1} - w_{2} )^{2} + 0.200w_{1} w_{3} (w_{1} - w_{3} )^{2} \\ & \quad + \,0.324w_{2} w_{3} (w_{2} - w_{3} )^{2} , \\ \end{aligned}$$
(27)
$$\begin{aligned} {\text{UPV}} & = 0.583w_{1} + 0.443w_{2} + 0.349w_{3} - 0.307w_{1} w_{3} + 0.343w_{2} w_{3} \\ & \quad + \,1.143w_{1} w_{2} (w_{1} - w_{2} ) + 0.822w_{1} w_{3} (w_{1} - w_{3} ) + 0.803w_{2} w_{3} (w_{2} - w_{3} ) \\ & \quad - \,10.662w_{1} w_{2}^{2} w_{3} - 1.260w_{1} w_{2} (w_{1} - w_{2} )^{2} + 0.810w_{1} w_{3} (w_{1} - w_{3} )^{2} \\ & \quad + \,0.644w_{2} w_{3} (w_{2} - w_{3} )^{2} . \\ \end{aligned}$$
(28)

Notably, all the canonical mixture polynomials presented a good fit, with adjusted R2 values close to 100%. Once again, the variance could be modeled as a function of the weights.

As in the other presented cases, the functions of the metrics were maximized, generating the result presented in Table 17.

Table 17 Summary of results—Case 3

Considering the range of variation of the UPV for this problem (0.279–0.607), it can be affirmed that all the analyzed metrics performed well, as they led to the choice of Pareto optimal points located in the region of minimum variance. Furthermore, the weights are well distributed among the responses, and no weight is zeroed, owing to the diversification.

4.4 Comparative analysis between the cases

The results of the three cases are now compared. Table 18 presents the UPV results for each metric in each case analyzed.

Table 18 Summary of UPV results for all cases

In general, the strategy of using diversification and error as parameters for selecting the most preferred Pareto optimal point was efficient, as it led to the choice of points with low prediction variance without zeroing any of the weights associated with the objective functions. For Case 1, the proposed weighting leads to a 45.66% reduction in the prediction variance relative to the maximum UPV value of the problem in question. For Case 2, the reduction in the UPV is 52.85%. In Case 3, which involved the optimization of the turning process of hardened steel using a tool with wiper geometry, the reduction is 49.75%. For real industrial problems, information on the reliability of the prediction is very important, because the analyst does not initially know where the optimum is located in the experimental space.

Notably, in all the cases analyzed, it was possible to model the variance in terms of the weights, because the weights interfere in the solution space. Nevertheless, the points of the solution space that optimize the individual objectives do not necessarily coincide with the minimum variance points of the experimental space. This makes choosing robust Pareto optimal solutions non-trivial. In this context, the ROPS proposal becomes relevant by inducing the choice of Pareto optimal points with a lower prediction variance.

Finally, the behavior of the variance and the choice of the Pareto optimal point with less variability were affected neither by the convexity of the functions, nor by the number of functions to be optimized, nor by the number of variables involved in each process. This allows ROPS to be used to solve problems of different dimensions.

5 Conclusions

As previously mentioned in Sect. 2.1, weighting methods for selecting an optimal point on the Pareto frontier, as an aid to decision-making, are still being studied after several decades of research. The present study aimed to discuss the variability of the Pareto optimal responses, which is not extensively discussed in the literature despite the extensive discussion of the behavior of the variance in experimental designs. Therefore, this paper introduced ROPS, developed to choose the most preferred Pareto optimal point in MOPs using RSM.

The study demonstrated that the weights used in the MOP influence the prediction variance of the obtained response. Furthermore, the use of diversification measures, such as entropy and diversity, associated with error measures, such as the MAPE, was useful in mapping regions of minimum variance within the Pareto optimal responses obtained in the optimization process. Thus, the results show that the proposed method is efficient and applicable for choosing the vector of weights that produces Pareto optimal results with less variability and greater reliability.

Finally, the metrics proposed in ROPS constitute a useful tool in the multiple-criteria decision-making process, because they lead to robust responses without the need to include the variance term in the mathematical formulation of the problem, making it simpler. As a proposal for future studies, we recommend applying ROPS to other design of experiments models, to evaluate its behavior under different experimental conditions.