Introduction

Metaheuristic approaches such as genetic algorithms, simulated annealing and ant colony optimization have been widely used to deal with various mining optimization problems (Kumral 2004; Kumral and Dowd 2005; Leite and Dimitrakopoulos 2007; Manchuk and Deutsch 2008; Lamghari and Dimitrakopoulos 2012; Shishvan and Sattarvand 2015; Goodfellow and Dimitrakopoulos 2016; Ruiseco et al. 2016; Sauvageau and Kumral 2016; Villalba and Kumral 2018a, b). Metaheuristic techniques are popular because they are easy to formulate, are highly adaptable to the problem structure, require less rigorous mathematical background on the part of the user, and require shorter computational times compared to exact approaches (Rayward-Smith 1996). However, it is almost impossible to know how near a solution is to optimal. As an intelligent and iterative process, metaheuristic generation adapts global (exploration) and local (exploitation) search concepts to find near-optimal solutions based on learning strategies (Osman and Laporte 1996). Given that mines are planned and designed under many uncertainties, near-optimal solutions are often acceptable. “Nearness” depends on the trade-off between exploration and exploitation, which is governed by the selection of parameter values. Thus, parameter calibration is a key issue when implementing a metaheuristic approach. However, it is usually overlooked and parameters are selected arbitrarily instead.

Evolutionary algorithms like genetic algorithms (GA) have been used in finance and mine planning optimization with positive results, for example: increasing profit by controlling dilution, accounting for orebody uncertainty, and balancing mining directions and orebody orientation in stope definition for underground mining (Villalba and Kumral 2018a, b); determining ore and waste dig limits of daily production for open-pit mining (Ruiseco et al. 2016; Ruiseco and Kumral 2017); finding the optimal location for a mine facility; and optimizing parameters of the Schwartz–Smith two-factor model to analyze the information from transactions in futures markets and reduce the financial risk in futures contracts on commodity prices (Sauvageau and Kumral 2016). However, these GA-based methodologies did not cover in detail the importance of defining their GA control parameters.

A generation in GA starts with an initial solution set or population of chromosomes. Using crossover and mutation operators, new offspring are formed. Solutions with high fitness values are retained to generate offspring. In other words, the higher the fitness, the higher the chance that solution will pass its genotype to the next generation. This process is repeated over many iterations until convergence. The performance of a GA depends mainly on the parameters (Mitchell 1999; Reeves 2003; Pandey et al. 2014), whose values should be determined carefully; otherwise, the search may end with a solution that is trapped in a local optimum. In practice, parameter values are usually calibrated one at a time using a simple sensitivity analysis. Although this method can potentially find sensitive parameters, it is time-consuming and inefficient in mine planning optimization where there are many parameters.

In the past, various approaches have been proposed to calibrate GA parameters. Grefenstette (1986) calibrated GA parameters through an adaptive search and tuning strategy that used two performance metrics: (1) online performance, the average performance of all structures examined during the search (t = 0, 1, …, T), where each structure is evaluated at each time t; and (2) offline performance, the average of the best performances over the time intervals [0, t]. Global robustness requires performance measures for the entire response surface. For example, six control parameters needed 1000 evaluations using metalevel GAs, and each GA was tested against five functions. Since the first experiment selected samples from a performance distribution, the best 20 samples of this distribution required additional testing. Even though the tests succeeded in finding control parameters that optimized the GA performance, this approach may require intractable CPU time to select control parameters in relevant mining problems. Eiben et al. (1999) proposed a taxonomy to eliminate vagueness in terminology and reviewed previous research on control parameters in evolutionary algorithms. In that paper, three drawbacks regarding calibration were emphasized: (i) trying all combinations becomes nearly impossible in some problems; (ii) even if interactions are ignored, the evaluations can be time-consuming; and (iii) despite running many configurations, there is still a chance that the selected parameters are not optimal. The epistatic interaction between control parameters, which includes the mutual impact of parameters on each other and the complexity of their joint influence on GA behavior, challenges any optimization of the GA setting. A self-adaptive method could assist GAs in setting their parameters themselves; otherwise, a skeptical approach could use heuristic tools to adjust parameters in addition to adaptive parameter control.

Nannen and Eiben (2007) presented an approach called Relevance Estimation and Value Calibration of GA parameters (REVAC), which systematically explores the range of possible parameter setting combinations. In this approach, a distribution for each parameter (a marginal density function) over the parameter’s range assigns high probability to values that lead to excellent GA performance. Distributions with a narrow peak or a broad plateau correspond to highly or moderately relevant parameters, respectively. Thus, values with high probability in such distributions define the GA setting. With this approach, however, the use of 1000 evaluations demands unmanageable computing time; therefore, it is hard to implement in mining problems.
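
The two performance metrics attributed to Grefenstette (1986) above can be illustrated with a short sketch. It assumes a maximization problem and a plain list of fitness evaluations in the order they were made; the function names and the sample values are hypothetical and are not taken from the cited work.

```python
def online_performance(evals):
    """Average of every evaluation made during the search."""
    return sum(evals) / len(evals)


def offline_performance(evals, maximize=True):
    """Average of the best-so-far value after each evaluation."""
    pick = max if maximize else min
    best = evals[0]
    total = 0.0
    for value in evals:
        best = pick(best, value)  # best value seen in [0, t]
        total += best
    return total / len(evals)


# Made-up fitness values, in evaluation order.
evals = [3, 5, 4, 8, 6]
online = online_performance(evals)    # (3 + 5 + 4 + 8 + 6) / 5
offline = offline_performance(evals)  # (3 + 5 + 5 + 8 + 8) / 5
print(online, offline)
```

Online performance penalizes every poor evaluation, whereas offline performance only tracks how quickly the best solution improves.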

This paper proposes an approach to calibrate the population size and crossover and mutation rates of a GA using (1) full factorial design (FFD) to estimate main effects and parameter interactions and (2) response surface methodology (RSM) to find optimum values of design parameters. These two methods share the following characteristics (Bezerra et al. 2008): (a) the experimental design includes different levels and combinations of factors; (b) the factors are independent variables; (c) “levels” refer to values that these factors can take and are used during coding to replace design factors with an indicator set (e.g., in a three-level experiment design, the low, middle and high values are replaced by −1, 0, +1, respectively); (d) responses are dependent variables; and (e) residual error helps to measure how well a model fits the experimental data where low residual error is desired.

Full Factorial Design

Ronald A. Fisher introduced FFD in the 1920s through the early 1930s in collaboration with researchers from many fields and proposed the three principles of experimental design: randomization, replication and blocking (Montgomery 1997). Randomization prevents unknown bias, which can modify the result of an experiment. Replication of an experiment under the same conditions is performed to estimate experimental error and increase the precision of the experiment. Blocking helps to increase precision by eliminating the effect of nuisance factors on experimental error (Clifton Young 1996; Telford 2007).

An FFD consists of two or more factors, each with two or more levels. It differs from other designs because experimental units take all combinations of the factor levels (Fig. 1). An FFD is also called a “fully crossed design” because it permits analysis of the effect of all factors on the response variable, all levels of the factors and interactions between factors. This design is geometrically assembled because values are taken from the edge midpoints, axial points and vertices of a cube in the case of three factors (Fig. 1, left); that is, n factors imply an n-dimensional shape. An FFD requires \( 3^{n} \) runs (n = number of factors, each studied at three levels) to cover all experimental points. To overcome the exponential growth in the number of factor combinations, a variant called fractional factorial design runs only a fraction of the total number of experimental points.
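
As a minimal sketch of the run count described above, the following enumerates a three-level full factorial design in coded units. The factor labels P, C and M anticipate the GA parameters calibrated later in the paper; the function name is hypothetical.

```python
from itertools import product

# Coded levels of a three-level design: low, middle, high.
LEVELS = (-1, 0, 1)
FACTORS = ("P", "C", "M")  # population size, crossover rate, mutation rate


def full_factorial(n_factors, levels=LEVELS):
    """Return every combination of the coded levels: len(levels) ** n_factors runs."""
    return list(product(levels, repeat=n_factors))


runs = full_factorial(len(FACTORS))
print(len(runs))  # 3 ** 3 = 27 runs
print(runs[0])    # (-1, -1, -1)
```

Adding a fourth three-level factor would already require 81 runs, which is the exponential growth that fractional designs try to avoid.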

Figure 1. Three-level factorial (left), Box–Behnken (middle) and Box–Wilson (right) designs

Response Surface Methodology

The RSM was introduced by George E. P. Box and Kenneth B. Wilson in the 1950s for use in chemical industries (Box and Wilson 1951). The main strengths of RSM are the ability to refine models and to optimize a response that depends on several variables (Montgomery 1997). The methodology consists of: (1) choosing the independent variables; (2) delimiting the experimental region; (3) determining the experimental design; (4) fitting the experimental data; (5) evaluating the model fit; and (6) evaluating the displacement in the direction of the optimal region, which may lead to finding the optimal values of the experimental variables (Bezerra et al. 2008). Polynomial functions describe the relationships between response and independent variables. Since a single polynomial model may not represent the functional relationship over the entire domain, the domain can be divided to yield a reasonable approximation per portion. Interactions between independent variables can be modeled by a low-order (e.g., first-order or second-order) polynomial model to describe the system and explore experimental conditions leading to its optimization (Bezerra et al. 2008).

The Box–Behnken (known as three-level) and Box–Wilson (known as central composite) designs are two common designs in RSM. The Box–Behnken design performs well with few factors. It is suitable for fitting quadratic models that require three levels of each factor; the treatment combinations are at the edge midpoints and center of the process space (Ferreira et al. 2007). For instance, the geometry of a Box–Behnken design for three factors may be viewed as a sphere partially within a cube whose edge midpoints are tangent to the sphere at 12 locations (Fig. 1, middle). The number of experiments is 2n(n − 1) + cp = 13, where cp is the number of central points and n is the number of factors. The Box–Wilson design is considered a fractional design that adds points to estimate curvature. For three factors, the circumscribed central composite has points that describe a sphere around the factorial cube. The design may consider \( 2^{n} + 2n + cp = 16 \) experiments, which include fractional factorial points, axial points at distance alpha from the center, and center points (Fig. 1, right). However, if the data lack curvature, FFD or an experimental design for the first-order model can be explored.
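
The two run counts above can be checked by generating the design points in coded units. This is only a sketch: the function names are hypothetical, the axial distance alpha = 2^(n/4) is a common rotatable choice assumed here because the text does not state the value used, and the number of center points is chosen solely to reproduce the counts quoted above.

```python
from itertools import combinations, product


def box_behnken(n, center_points=1):
    """Edge midpoints (each pair of factors at +/-1, the rest at 0) plus center points."""
    runs = []
    for i, j in combinations(range(n), 2):
        for a, b in product((-1, 1), repeat=2):
            point = [0] * n
            point[i], point[j] = a, b
            runs.append(tuple(point))
    runs += [(0,) * n] * center_points
    return runs


def central_composite(n, alpha, center_points=1):
    """Factorial corners, 2n axial points at distance alpha, plus center points."""
    corners = [tuple(float(v) for v in c) for c in product((-1, 1), repeat=n)]
    axial = []
    for i in range(n):
        for sign in (-1.0, 1.0):
            point = [0.0] * n
            point[i] = sign * alpha
            axial.append(tuple(point))
    return corners + axial + [(0.0,) * n] * center_points


n = 3
bb = box_behnken(n, center_points=1)                    # 2n(n - 1) + 1
alpha = 2 ** (n / 4)                                    # assumed rotatable value
ccd = central_composite(n, alpha, center_points=2)      # 2^n + 2n + 2
print(len(bb), len(ccd))
```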

Model Formulation

During development of a GA-based stope optimization method, Villalba and Kumral (2018a, 2018b) noted that the GA parameter values significantly affected the quality of the solution where profits vary ± 11 and ± 7%, respectively. Furthermore, the quality of the solution could not be assessed because the parameter selection was based on sensitivity analysis, which randomizes one variable at a time and cannot quantify interaction effects of multiple parameters on profit or how well the solution space was searched. In the current research, multiple runs of the stope optimizer for each parameter combination were executed with the main goals of (1) finding the parameter configuration that maximizes profit in the stope optimizer using RSM and (2) understanding the relative importance of each parameter and quantifying parameter interactions using FFD.

Each experiment provides a response (y), and the change in response produced by a change in the level of a factor defines the effect of that factor. An interaction between factors is present when the difference in response between the levels of one factor is not the same at all levels of the other factors (Montgomery 1997). Thus, the interaction between factors is described by a regression model (Eq. 1), which describes the set of responses (Eq. 2) given by metaheuristic optimization.

$$ \hat{y}(\text{z} ) = \beta_{0} + \beta_{1} z_{1} + \beta_{2} z_{2} + \beta_{3} z_{3} + \beta_{4} z_{4} + \cdots + \beta_{p} z_{p} + \varepsilon $$
(1)

Factors or p predictor variables z1, z2,…,zp in Eq. 1 are associated with the metaheuristic optimization parameters to be calibrated. For instance, if the model requires three parameters to be calibrated, predictor variables z1, z2 and z3 will be linked with these parameters, and their interaction terms are represented by predictor variables z4 to zp. The regression model also contains a variable ε, which accounts for the measurement error and the effect of other variables not explicitly considered in the model. The linear regression theory of a single response \( \hat{y}(\text{z} ) \) defines a mean that depends continuously on z1, z2,…,zp and a random error ε (Johnson and Wichern 2007). The unknown parameters β0, β1,…,βp in Eq. 1 are estimated from the solutions of n experiments, or y responses. The model is then represented by n independent observations of y and their linked values of z, thus:

$$ \left[ {\begin{array}{*{20}c} {y_{1} } \\ {y_{2} } \\ \vdots \\ {y_{n} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 1 & {z_{11} } & {z_{12} } & \cdots & {z_{1p} } \\ 1 & {z_{21} } & {z_{22} } & \cdots & {z_{2p} } \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & {z_{n1} } & {z_{n2} } & \cdots & {z_{np} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {\beta_{0} } \\ {\beta_{1} } \\ \vdots \\ {\beta_{p} } \\ \end{array} } \right] + \left[ {\begin{array}{*{20}c} {\varepsilon_{1} } \\ {\varepsilon_{2} } \\ \vdots \\ {\varepsilon_{n} } \\ \end{array} } \right] $$
(2)
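
The matrix form in Eq. 2 can be fitted by ordinary least squares. The sketch below uses NumPy on synthetic data: the 27 coded runs, the coefficient values and the function names are all made up for illustration and are not the case study data.

```python
from itertools import product

import numpy as np


def design_matrix(z):
    """Columns: 1, z1, z2, z3, z1*z2, z1*z3, z2*z3 (main effects plus interactions)."""
    z = np.asarray(z, dtype=float)
    z1, z2, z3 = z.T
    return np.column_stack([np.ones(len(z)), z1, z2, z3, z1 * z2, z1 * z3, z2 * z3])


def fit_least_squares(z, y):
    """Estimate beta in y = Z beta + eps and return (beta, residuals)."""
    y = np.asarray(y, dtype=float)
    Z = design_matrix(z)
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta, y - Z @ beta


# Synthetic, noise-free responses on a three-level full factorial (27 runs).
runs = list(product((-1, 0, 1), repeat=3))
true_beta = np.array([100.0, 5.0, 3.0, 1.0, 2.0, -1.0, 0.5])  # hypothetical values
y = design_matrix(runs) @ true_beta

beta, residuals = fit_least_squares(runs, y)
print(np.round(beta, 3))
```

With noise-free data the fit recovers the generating coefficients and the residuals vanish; with real responses the residuals measure the lack of fit.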

The polynomial model permits examination of the set of predictor variables to determine how well they predict a response and which variables are significant predictors of a response. The regression equation is obtained by fitting a model to the observed y1, y2,…,yn and the corresponding known values z11,…,z1p; z21,…,z2p;…; zn1,…,znp, from which the regression coefficients β0, β1,…,βp are calculated using least squares estimation. However, since variables have different units, it is challenging to compare coefficients directly because lower coefficients do not necessarily represent less important predictors. A standardized coefficient (std-beta) is needed to compare the coefficients and find the predictors that have more (or less) impact on the response. The coefficients of a predictor with negligible influence on the response, or of predictors that are linearly related to other predictors, can be dropped from the selected model (Helland 2000). The z values that represent the interaction of the control parameters generate a high-dimensional surface called the response surface. This response surface can be portrayed as a contour plot. In the absence of interactions, the lines of this contour plot will be parallel and straight and will correspond to a flat surface. The response surface plot facilitates finding the best setting for the control parameters graphically.
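
To illustrate why standardized coefficients are needed, the sketch below rescales each raw coefficient by the ratio of the predictor's standard deviation to the response's standard deviation, which is the usual textbook definition of a standardized beta. The response y = 10 P + 1000 C is made up, and whether a given statistical package computes std-beta in exactly this way is an assumption.

```python
from itertools import product

import numpy as np


def standardized_coefficients(Z, y):
    """Return (raw, std_beta) slopes; Z is n x p without the intercept column."""
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(Z)), Z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    raw = beta[1:]
    std_beta = raw * Z.std(axis=0, ddof=1) / y.std(ddof=1)
    return raw, std_beta


# Two predictors in very different units on a 3 x 3 grid (hypothetical data).
grid = np.array(list(product((10, 25, 40), (0.1, 0.25, 0.4))), dtype=float)
y = 10.0 * grid[:, 0] + 1000.0 * grid[:, 1]

raw, std_beta = standardized_coefficients(grid, y)
print(np.round(raw, 3))       # raw slopes differ by a factor of 100
print(np.round(std_beta, 3))  # standardized slopes are equal
```

Here the raw coefficient of C is 100 times that of P, yet both predictors move the response by the same amount over their ranges, which the equal standardized coefficients make visible.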

The regression model fits the set of observations with values derived from metaheuristic optimization. Once the best-fit regression model \( \hat{y}_{r} \left( z \right) \) is found for response r, the values that provide the best response in the response surface are found by maximizing a desirability function \( d_{r} \left( {\hat{y}_{r} (z)} \right) \) (Eqs. 3 and 4) using a gradient descent algorithm. In addition, the procedure gives the option to have R responses, r = 1, …, R, given the same independent variables. Maximizing the desirability function provides the values of the independent variables that generate the maximum or minimum response given the regression model. This function requires three parameters to calculate desirability values: the upper limit Ur and the lower limit Lr per response r, and the exponent s (weight), which defines how important it is to reach Ur in response maximization (Eq. 3) and to reach Lr in response minimization (Eq. 4).

$$ d_{r} \left( {\hat{y}_{r} (z)} \right) = \left\{ {\begin{array}{ll} 0 & {\text{if }}\hat{y}_{r} (z) < L_{r} \\ \left( {\frac{{\hat{y}_{r} (z) - L_{r} }}{{U_{r} - L_{r} }}} \right)^{s} & {\text{if }}L_{r} \le \hat{y}_{r} (z) \le U_{r} \\ 1 & {\text{if }}\hat{y}_{r} (z) > U_{r} \\ \end{array} } \right.\quad \forall r = 1, \ldots ,R $$
(3)
$$ d_{r} \left( {\hat{y}_{r} (z)} \right) = \left\{ {\begin{array}{ll} 1 & {\text{if }}\hat{y}_{r} (z) < L_{r} \\ \left( {\frac{{\hat{y}_{r} (z) - U_{r} }}{{L_{r} - U_{r} }}} \right)^{s} & {\text{if }}L_{r} \le \hat{y}_{r} (z) \le U_{r} \\ 0 & {\text{if }}\hat{y}_{r} (z) > U_{r} \\ \end{array} } \right.\quad \forall r = 1, \ldots ,R $$
(4)
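
Equations 3 and 4 translate directly into two small functions. This is a minimal sketch with hypothetical function names; the limits in the usage lines are the profit limits quoted later in the case study (in thousands of US$), and the predicted profit of 20,250 is a made-up input.

```python
def desirability_max(y_hat, lower, upper, s=1.0):
    """Eq. 3: desirability when the response should be maximized."""
    if y_hat < lower:
        return 0.0
    if y_hat > upper:
        return 1.0
    return ((y_hat - lower) / (upper - lower)) ** s


def desirability_min(y_hat, lower, upper, s=1.0):
    """Eq. 4: desirability when the response should be minimized."""
    if y_hat < lower:
        return 1.0
    if y_hat > upper:
        return 0.0
    return ((y_hat - upper) / (lower - upper)) ** s


L1, U1 = 18500.0, 22000.0
d_mid = desirability_max(20250.0, L1, U1)           # halfway between L1 and U1
d_strict = desirability_max(20250.0, L1, U1, s=2.0)  # s > 1 penalizes distance from U1
print(d_mid, d_strict)
```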

The weight is s = 1 when the desirability function is linear, s > 1 when the points close to Ur or Lr need to be treated with high importance, and s < 1 when they need to be treated with lower importance (Bezerra et al. 2008). Ur must denote the larger desired value in maximization (Eq. 3) and Lr the lower desired value in minimization (Eq. 4). The desirability function, \( d_{r} \left( {\hat{y}_{r} (z)} \right) \), is used to maximize (and/or minimize) R responses simultaneously (Derringer and Suich 1980) through the gradient ascent (or descent) iterative optimization algorithm for finding their maximum (and/or minimum) values. The search starts at any location in the function; then a small step proportional to the positive (or negative) gradient vector is taken at each iteration to approach a local maximum (or minimum) of the function. To move along (or against) the gradient, this step is added to (or subtracted from) the location, which is updated during the iterations. The goal is to find a sequence of updates that converges to a local maximum (or minimum) of the function, which is also global if the function is concave (or convex, in minimization) (Snyman 2005). An overall desirability D combines the individual desirabilities \( d_{r} \left( {\hat{y}_{r} (z)} \right) \), r = 1, …, R, using a geometric mean (Eq. 5). Desirability ranges from 0 to 1, where 0 corresponds to undesirable responses and 1 corresponds to the most desirable response to reach Ur and/or Lr.

$$ D = \left( {d_{1} \left( {\hat{y}_{1} (z)} \right) \times d_{2} \left( {\hat{y}_{2} (z)} \right) \ldots \times d_{R} \left( {\hat{y}_{R} (z)} \right)} \right)^{{\frac{1}{R}}} $$
(5)
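
The geometric mean in Eq. 5 and the gradient ascent search described above can be sketched together. Everything below is illustrative: the concave toy response, its limits, the step size and the finite-difference gradient are assumptions, and this is not the optimizer used by any particular statistical package.

```python
import math


def overall_desirability(ds):
    """Eq. 5: geometric mean of the individual desirabilities."""
    return math.prod(ds) ** (1.0 / len(ds))


def gradient_ascent(f, z0, step=0.5, iters=200, h=1e-6, bounds=(-1.0, 1.0)):
    """Maximize f by stepping along a central-difference gradient, clipped to bounds."""
    z = list(z0)
    for _ in range(iters):
        grad = []
        for i in range(len(z)):
            up = z[:i] + [z[i] + h] + z[i + 1:]
            down = z[:i] + [z[i] - h] + z[i + 1:]
            grad.append((f(up) - f(down)) / (2 * h))
        z = [min(bounds[1], max(bounds[0], zi + step * gi)) for zi, gi in zip(z, grad)]
    return z


def toy_response(z):
    """Hypothetical concave response with its maximum at (0.5, -0.25)."""
    return 20.0 - (z[0] - 0.5) ** 2 - (z[1] + 0.25) ** 2


def toy_desirability(z, lower=18.0, upper=21.0):
    """Linear (s = 1) maximization desirability of the toy response."""
    d = (toy_response(z) - lower) / (upper - lower)
    return overall_desirability([min(1.0, max(0.0, d))])


z_best = gradient_ascent(toy_desirability, [0.0, 0.0])
print([round(v, 4) for v in z_best])
print(round(overall_desirability([0.5, 0.8]), 3))
```

Because the geometric mean multiplies the individual values, a single response with zero desirability drives the overall desirability to zero regardless of the others.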

The methodology described here is used to calibrate GA parameters, but can be used with any metaheuristic algorithm. Evolutionary algorithms like GA have been used in mine planning; however, these GA-based methodologies did not cover in detail the importance of defining their GA control parameters, such as population size, crossover rate, and mutation rate (Kumral 2004; Ruiseco et al. 2016; Sauvageau and Kumral 2016; Verhoeff 2017; Villalba and Kumral 2018a).

The population size (P) affects GA performance: a small P will have limited access to the search space, resulting in poor performance. A large P with more access to the search space prevents premature convergence to suboptimal solutions; however, a low convergence rate results since a large P requires more evaluations per generation.

The crossover rate (C) influences the number of times crossover is used, where C × P solutions or (C × solution space) locations in the solutions undergo crossover. Crossover exploits the current solutions, which leads the population to converge on a good solution that may be the global optimum. The higher the C, the more quickly a population accounts for new solutions, but too high a C carries the risk of discarding high-performance solutions before the selection process produces any improvement (Grefenstette 1986). Crossover rates that are too low will not permit enough new solutions; changes in the population will be null, resulting in a lower exploration rate.

The mutation rate (M) assists in increasing variability of a population as a secondary search operator. After the crossover operation, the block position of new population solutions undergoes random changes, which increase the probability of exploring the search space. Mutation rates that are too low lead to premature convergence to local optima instead of the global optimum. High mutation rates lead to random searches, which diminishes the GA search ability and prevents converging on the optimum solution (Reeves 2003).
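
The roles of P, C and M described above can be seen in a toy GA. The sketch below maximizes the number of 1-bits in a binary string (the OneMax problem); it is not the stope layout optimizer, and the tournament selection, one-point crossover, per-pair crossover probability and per-gene bit-flip mutation are all assumptions chosen for brevity.

```python
import random


def fitness(chromosome):
    """OneMax: the number of 1-bits."""
    return sum(chromosome)


def tournament(pop, rng):
    """Pick the fitter of two randomly drawn solutions."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def run_ga(pop_size, crossover_rate, mutation_rate, n_genes=30, generations=50, seed=0):
    """Return the best fitness found for the given control parameters P, C and M."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            if rng.random() < crossover_rate:      # crossover with probability C
                cut = rng.randint(1, n_genes - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Flip each gene with probability M.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return fitness(best)


for p, c, m in [(10, 0.1, 0.05), (40, 0.4, 0.10), (40, 0.4, 0.50)]:
    print(p, c, m, run_ga(p, c, m))
```

Running the loop with different settings gives a feel for how the three parameters trade exploration against exploitation, although the behavior on this toy problem says nothing about the best setting for a mining problem.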

The control parameters in GA can vary among mining deposits: the optimal control parameters at deposit A will differ from those at deposit B because mining deposits are complex and their spatial grade variability influences the parameter settings of their mine planning optimization. The control parameters P, C and M can lead to profit variation of ± 11% (Villalba and Kumral 2018a) and affect the performance and efficiency of GA. Thus, each mine planning optimization requires its own control parameter values. The methodology herein balances exploitation and exploration ability by finding the optimal control parameters. The case study demonstrates the calibration of these parameters by maximizing the desirability function, which measures how closely the desired profit is approached, using a gradient descent algorithm. Three regression models obtained from three experiments assist in defining their respective desirability functions.

Case Study

To illustrate the merits of the proposed approach, the control parameters of GA-based stope optimization are calibrated. The optimization determines the orebody portion that maximizes the profit of an underground mining operation (Villalba and Kumral 2017). Input data are from a narrow vein gold deposit sector located in a volume of 140 m in the east, 188 m in the north and 150 m in the vertical direction. A total of 109 composites of gold data and a three-dimensional geological solid were used to calculate vein grades, with gold values ranging from 0.017 to 28.26 g/t. These data correspond to the case study data and stope layout algorithm of Villalba and Kumral (2018b). Multiple realizations or equally probable orebody scenarios, simulated using sequential Gaussian simulation (Deutsch and Journel 1998; Shi et al. 2000), were scaled into the blocks to provide input to the stochastic stope layout optimizer. To account for orebody uncertainty and find the most profitable mining direction of stopes following the orebody directions, the stope layout based on GA was performed in three stages: (1) measure stope layout uncertainty based on equally probable orebody realizations; (2) create an average design whereby feasibility evaluation breeds an initial population; and (3) use GA operators to improve this initial population over generations. The proposed methodology mainly affects the third stage, where GA is used. The control parameters of the GA were calibrated using experimental design tools in JMP statistical software (v. 13.0.0), in which desirability is maximized using a gradient descent algorithm and prediction profiler tools.

The FFD, Box–Behnken, and Box–Wilson designs define the stope layout profit as a single response variable. Of the desirability options available (maximization, minimization and target), maximization of the profit (Eq. 3) is used, with profit expressed in thousands (K). L1 = US$18,500 K is the lower limit, U1 = US$22,000 K is the upper limit and the importance is s = 1. Desirability is 0 when a design provides profits < US$18,500 K, and is 1 when a design has profit > US$22,000 K. Each design evaluation (Table 1) considers P from 10 to 40, C from 0.1 to 0.4 and M from 0.05 to 0.15. Ranges of these control parameters were taken from Villalba and Kumral (2018b). The sensitivity analysis showed that values outside these ranges did not generate higher profit. The lower and upper values of these ranges correspond to the minimum and maximum values, respectively, in the definition of factors in response surface methodology and two-level FFD. Also, the FFD uses an additional level (P = 25, C = 0.25, M = 0.1) to make a fair comparison with the other two designs, which also consider central points in their experimental designs. The first, second and third designs require 27, 13 and 15 experiments, respectively. These experiments use three levels referring to the values that P, C, and M can take and are used during coding (the low, middle and high values are replaced by –, 0, +, respectively, in JMP software), e.g., the experiment 000 in Table 1 means that P, C, and M take the middle values 25, 0.25, and 0.1, respectively. The computing time was stored only for the Box–Wilson experiment because a further analysis considers both profit and time as responses (Fig. 8).
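
The coding of levels described above is a linear map between each parameter range and the interval [−1, +1]. The following sketch uses the ranges quoted in this section; the function names are hypothetical.

```python
# Parameter ranges from the case study: (low, high).
RANGES = {"P": (10, 40), "C": (0.1, 0.4), "M": (0.05, 0.15)}


def to_actual(name, coded):
    """Map a coded level in [-1, 1] to the actual parameter value."""
    low, high = RANGES[name]
    return (low + high) / 2 + coded * (high - low) / 2


def to_coded(name, actual):
    """Map an actual parameter value to its coded level."""
    low, high = RANGES[name]
    return (actual - (low + high) / 2) / ((high - low) / 2)


# Experiment "000": all three factors at their middle level.
middle = {name: to_actual(name, 0) for name in RANGES}
print(middle)                           # P = 25, C = 0.25, M = 0.1 (up to rounding)
print(round(to_coded("M", 0.116), 2))   # a mutation rate of 0.116 in coded units
```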

Table 1 Population size (P), crossover rate (C), mutation rate (M), profit and solution time for full factorial design (FFD), Box–Behnken and Box–Wilson experimental designs

Sensitivity analysis considers the effect of one parameter at each iteration and ignores the interdependencies among the parameters. Therefore, computational time increases due to the large number of tests, unrealistic scenarios may be examined unnecessarily, and less information is gathered. On the other hand, experimental design requires a reduced number of experiments to describe the relationship between the factors and the response by a polynomial model, which assists in finding optimal control parameters while maximizing profit. To calibrate parameters in this case study, sensitivity analysis required 60 experiments, FFD required 45% of the sensitivity analysis experiments, and Box–Wilson and Box–Behnken required 44–52% of the FFD experiments (Table 1). Thus, the proposed approach based on experimental design simplifies calibration and performs better than sensitivity analysis.

The initial attempt at model selection considers only the first-order model, which gives the basic model representation and helps to understand the initial contribution of P, C, and M to the profit independently, before accounting for their interactions. As expected, the fit of this preliminary RSM without cross product terms is low: R² = 0.35, 0.42 and 0.17 for the FFD, Box–Behnken and Box–Wilson designs, respectively. However, if standardized coefficients (std-beta) are compared, P has the strongest influence in the FFD, Box–Behnken and Box–Wilson designs (std-beta = 0.35, 0.46 and 0.39, respectively) followed by C (std-beta = 0.25, 0.43 and 0.12, respectively) and M (std-beta = 0.09, 0.16 and −0.07, respectively). The variance inflation factor (VIF) across the three designs varies between 1 and 1.3, where VIF < 10 means there is no multicollinearity between the factors or the predictor variables. Ideally, the predictor variables of regression models are weakly correlated with each other but highly correlated with the dependent variable.
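
The VIF check mentioned above can be reproduced for any design matrix. The sketch below uses the identity that the VIF of predictor j equals the j-th diagonal element of the inverse correlation matrix of the predictors, which is equivalent to 1 / (1 − R²) from regressing predictor j on the others. The design used is a generic coded three-level full factorial, not the Table 1 data.

```python
from itertools import product

import numpy as np


def vif(Z):
    """Variance inflation factor of each column of Z (n runs x p predictors)."""
    Z = np.asarray(Z, dtype=float)
    corr = np.corrcoef(Z, rowvar=False)
    return np.diag(np.linalg.inv(corr))


# A coded three-level full factorial is orthogonal, so every VIF is 1.
runs = np.array(list(product((-1, 0, 1), repeat=3)), dtype=float)
print(np.round(vif(runs), 3))

# Making one predictor nearly a copy of another inflates its VIF.
collinear = runs.copy()
collinear[:, 2] = runs[:, 0] + 0.1 * runs[:, 2]
print(np.round(vif(collinear), 1))
```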

The lack of fit of the preliminary model suggests that more terms are needed. Terms may be added in the following order: (1) first-order, (2) cross product (interaction), (3) second-order (quadratic) and (4) higher-order terms. Some scenarios will require transforming either the responses or the factors (Melvin 2000). The three preliminary first-order models were expanded to cross products in FFD (R² = 0.56) and to second order in RSM for both the Box–Behnken (R² = 0.88) and Box–Wilson (R² = 0.74) designs. Adjusted R² values increased (from 0.23 to 0.32 in FFD, from 0.23 to 0.37 in Box–Behnken and from −0.03 to 0.34 in Box–Wilson) because the adjusted R² accounts for the number of predictors in its denominator. R² is an intuitive measure for model selection that indicates how well a model fits a set of observations; however, residual plots, knowledge about the nature of the problem, and other model statistics also assist in evaluating the fit. Since a low R² value can still indicate a valid relationship between predictors and response, the low R² values obtained are tolerated in the calibration of GA control parameters because the proposed approach focuses on modeling the relationship between factors rather than on predicting profit itself. When the GA control parameters are defined, the profit is maximized again using metaheuristic stope layout optimization.
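
For reference, the standard textbook definition of the adjusted coefficient of determination is given below, where n is the number of experiments and p is the number of predictors excluding the intercept. It is stated here as general background; the exact variant computed by the software used in the case study is not specified in the text.

$$ R_{\text{adj}}^{2} = 1 - \left( {1 - R^{2} } \right)\frac{n - 1}{n - p - 1} $$

Because the correction factor grows with p, adding a term raises the adjusted value only when the gain in R² outweighs the loss of a degree of freedom, and the adjusted value can become negative when R² is small.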

In the three designs, the least squares algorithm was used to fit the multiple regression models, whose intercept terms ranged from US$18,941 K to US$19,643 K. The interactions between independent variables given by FFD, during a screening stage, were modeled by a low-order polynomial model (Eq. 6) to explain the system and explore experimental conditions, whereas RSM required a second-order polynomial to define the nature of the response surface in the optimal region, which assists in finding the optimal setting while maximizing the response (Eqs. 7 and 8).

As observed in the preliminary model and Table 2, P had the strongest effect on the profit (std-beta = 0.35–0.46) and M had the least effect (std-beta = 0.08–0.16) among the designs.

Table 2 Standardized coefficients in regression models for FFD, Box–Behnken and Box–Wilson experimental designs

Results of Full Factorial Design

The regression model for the three-level FFD is shown in Eq. 6. Profit depends on seven terms: the intercept, the three control parameters and their interaction terms.

$$ \begin{aligned} {\text{Profit}} & = 18941.3 - 10.9P + 644.81C + 680M + 70(P - 25)(C - 0.25) - 55.9(P - 25)(M - 0.1) \\ & \quad + 18022.2(C - 0.25)(M - 0.1) \\ \end{aligned} $$
(6)

As expected, the FFD surface did not perform as well as the RSM designs to fit the case study data (black dots in Fig. 2), but suggests graphically a critical point to be explored that is congruent with the RSM design results. This corresponds to P, C and M of 40, > 0.40 and > 0.10, respectively.

Figure 2. The surface of the regression model using full factorial design

Exploration of the response surface model (two variables at a time) must focus on the vicinity of these critical points, which are graphical approximations and may represent a local or global maximum. However, maximization of the desirability function, which seeks values close to the specified U1 = US$22,000 K, better facilitates defining the optimum setting for all contributing variables (the GA control parameters). The prediction profiler in Figure 3 shows how the predicted stope profit changes when the variable settings are modified.

Figure 3. Prediction profiler and maximum desirability using the full factorial design

The maximum combined desirability obtained by FFD is approximately 0.43 because the profits obtained by the prediction model are primarily < U1. This desirability is obtained when P = 40, C = 0.40 and M = 0.15; however, the prediction profiler graphs and surface response suggest exploring places with higher C and M values.

Results of Box–Behnken Design

Profit depends on 10 terms, which are combinations of the three control parameters (as factors) up to second order. The regression model term related to C in the Box–Behnken design (Eq. 7) yielded a std-beta of 0.43, close to the maximum std-beta (0.46) obtained by P. The interaction of P and C also yielded a high std-beta (0.43); however, the quadratic term of P maintains a high std-beta in absolute value (−0.39), whereas the quadratic term of C yielded a very low std-beta (0.09).

$$ {\text{Profit}} = 19,602 + 173.6P + 160.8C + 60.6M + 228P \times C + 107.8P \times M + 37C \times M - 237.4P^{2} - 57.6C^{2} + 75.6M^{2} $$
(7)
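
As a quick arithmetic check, Eq. 7 can be evaluated at the highest level of all three factors and converted to a desirability with the limits L1 = US$18,500 K and U1 = US$22,000 K from the case study setup. The sketch assumes that P, C and M in Eq. 7 are expressed in coded units (−1 to +1), which the text does not state explicitly; the function name is hypothetical.

```python
def profit_box_behnken(P, C, M):
    """Eq. 7 (profit in US$ K), with P, C, M taken as coded values (assumption)."""
    return (19602 + 173.6 * P + 160.8 * C + 60.6 * M
            + 228 * P * C + 107.8 * P * M + 37 * C * M
            - 237.4 * P ** 2 - 57.6 * C ** 2 + 75.6 * M ** 2)


L1, U1 = 18500.0, 22000.0

# Coded (+1, +1, +1) corresponds to P = 40, C = 0.40 and M = 0.15.
profit = profit_box_behnken(1, 1, 1)
desirability = (profit - L1) / (U1 - L1)  # Eq. 3 with s = 1, inside [L1, U1]
print(round(profit, 1), round(desirability, 2))
```

Under this assumption the predicted profit at that corner is about US$20,150 K, which corresponds to a desirability of about 0.47.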

Additional second-order terms in the model generated surfaces that fit the data and determined the best possible combinations of variables (Fig. 4).

Figure 4. The surface of regression model using Box–Behnken design in RSM

The desirability value using the Box–Behnken design was higher (0.47) than that using FFD because the prediction model generates profits closer to U1 = US$22,000 K (Fig. 5).

Figure 5

Prediction profiler and maximum desirability using the Box–Behnken design

The optimal setting provided by this desirability maximization is P = 40, C = 0.40 and M = 0.15. At a P of approximately 40, profit steadily increases as C increases. However, the profile of the mutation rate does not show convergence at the optimal setting.

Results of Box–Wilson Design

The regression model (Eq. 8) for the Box–Wilson design and the influence of the parameters on the response follow the findings for the previous designs, with a high std-beta (0.40) obtained for P; however, the interaction terms involving C and M also yielded high std-beta values (0.53 and − 0.32, i.e., 0.32 in absolute value, respectively). Thus, C and M have a synergistic effect on the response.

$$ {\text{Profit}} = 19,656 + 155.6P + 45.5C - 26.4M + 122.1P \times C - 67.9P \times M + 233.1C \times M - 59P^{2} - 60.5C^{2} + 215M^{2} $$
(8)

Similar to the Box–Behnken design, the Box–Wilson design (Fig. 6) suggested a critical point associated with potentially optimal GA control parameters that reach the maximum profit on the response surface (P, C and M of approximately 40, > 0.4 and > 0.10, respectively). This location is also consistent with the critical point found by FFD.

Figure 6

Surface of the regression model using the Box–Wilson design in RSM

The Box–Wilson design improves the prediction profiler of the mutation rate (Fig. 7), but its maximized desirability value (0.40) is lower than those of the two previous designs. The prediction profiler found optimal settings for two parameters (P and C) similar to those of the two previous designs; P, C and M reached their best profit at 40, 0.40 and 0.116, respectively.

Figure 7

Prediction profiler and maximum desirability using the Box–Wilson design

In summary, the surface profiler and the contours of two factors against the profit response were compared for each design. Their directions of movement relative to the fitted surface lead toward the optimum response, and the directions with the steepest slopes move most rapidly toward maximization (Melvin 2000). The population size (P = 40) and the crossover rate (C = 0.40) are defined and ratified by the three designs, but the mutation rate needs to be confirmed within the reduced interval identified by the prediction profiler and desirability maximization in the Box–Wilson design. Thus, the mutation rate was tested at 0.10, 0.116 and 0.12 using GA optimization, and the profits obtained were US$19,927, 19,887 and 18,810 K, respectively. The optimal setting from the first two designs, with M = 0.15 and a better desirability value (which implied that the regression model predicted a higher profit), was also tested using GA optimization. This last test provided a profit of US$19,675 K, lower than the best profit obtained from the reduced interval (US$19,927 K at M = 0.10). Thus, the parameters were calibrated as follows: P = 40, C = 0.40 and M = 0.10. This decision was assisted by maximizing the desirability value and verifying the prediction profiler. Since the response values of the experiments correspond to metaheuristic solutions, a critical point or optimal setting here may represent a value close to the global optimum in stope layout optimization.
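
The idea that the steepest slopes lead most rapidly toward maximization can be illustrated with the gradient of a fitted surface. The sketch below differentiates the Box–Wilson model of Eq. 8 analytically and normalizes the result to obtain a steepest-ascent direction. It assumes the factors are in coded units, which the text does not confirm, and the function names are hypothetical; it is an illustration of the concept, not the procedure used in the case study.

```python
import math

# Coefficients of Eq. 8 (Box-Wilson model); profit in US$ K.
# ASSUMPTION: P, C and M are given in coded units.
LINEAR = {"P": 155.6, "C": 45.5, "M": -26.4}
INTERACTION = {"PC": 122.1, "PM": -67.9, "CM": 233.1}
QUADRATIC = {"PP": -59.0, "CC": -60.5, "MM": 215.0}


def gradient(p, c, m):
    """Partial derivatives of the Eq. 8 surface at the coded point (p, c, m)."""
    dp = (LINEAR["P"] + INTERACTION["PC"] * c + INTERACTION["PM"] * m
          + 2.0 * QUADRATIC["PP"] * p)
    dc = (LINEAR["C"] + INTERACTION["PC"] * p + INTERACTION["CM"] * m
          + 2.0 * QUADRATIC["CC"] * c)
    dm = (LINEAR["M"] + INTERACTION["PM"] * p + INTERACTION["CM"] * c
          + 2.0 * QUADRATIC["MM"] * m)
    return dp, dc, dm


def steepest_ascent_direction(p, c, m):
    """Unit vector along which the predicted profit rises fastest."""
    g = gradient(p, c, m)
    norm = math.sqrt(sum(v * v for v in g))
    if norm == 0.0:
        # Stationary point: no preferred direction.
        return (0.0, 0.0, 0.0)
    return tuple(v / norm for v in g)


# Direction of fastest predicted improvement starting from the design center.
direction_at_center = steepest_ascent_direction(0.0, 0.0, 0.0)
```

At the design center the gradient reduces to the three linear coefficients, so under the coded-units assumption the first step would be dominated by the population-size component.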

GA Solution Time as a Second Response

Moreover, the case study considered the GA solution time for a specified number of generations as a second response variable in the analysis. This post-optimization value can assist in calibrating the parameters and balancing exploration and exploitation in GA because reasonable convergence time is relevant in decision-making. The goal is to find the minimum solution time while simultaneously maximizing the profit. The desirability of this second response (r = 2) is set as minimization of the solution time where L2 = 1 h as the lower limit, U2 = 30 h as the upper limit and importance s = 1. Desirability is 0 when solution time > U2 and is 1 when solution time is < L2.
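
The solution-time desirability described above can be sketched as a one-sided function. The snippet assumes a Derringer–Suich-style ramp between the two limits and treats s as the exponent that shapes that ramp; both choices are assumptions, although with s = 1 the ramp is linear under either reading. The function name is hypothetical.

```python
def desirability_minimize(y, lower, upper, s=1.0):
    """One-sided desirability for a response that should be minimized.

    Returns 1 at or below `lower`, 0 at or above `upper`, and a ramp
    ((upper - y) / (upper - lower)) ** s in between.
    ASSUMPTION: ramp shape and the role of `s` follow the Derringer-Suich form.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    if y <= lower:
        return 1.0
    if y >= upper:
        return 0.0
    return ((upper - y) / (upper - lower)) ** s


# Limits from the case study: L2 = 1 h, U2 = 30 h, s = 1.
L2_HOURS, U2_HOURS = 1.0, 30.0
d_midpoint = desirability_minimize(15.5, L2_HOURS, U2_HOURS)
```

Under these assumptions, a run that finishes halfway between the two limits (15.5 h) receives a desirability of 0.5, while any run longer than 30 h is rejected outright.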

The maximized combined desirability was 0.57, and the prediction profiler shows a maximum profit aligned with a minimum solution time (Fig. 8). As expected, the profit is lower than with a single response where only the profit is maximized; however, evaluating profit and solution time simultaneously confirmed that P = 40 is the optimal setting for this case study. Further, a larger population size does not always imply a longer solution time; rather, crossover and mutation rates steadily increase the solution time, though the effect is less evident for the mutation rate. These parameters are case-study-dependent because the variability within each solution may affect their selection. In other words, the stope layout pattern, which depends on orebody spatial variability, may affect the choice of GA parameters. This profit and solution time analysis using experimental design may also assist in further decisions about exploring and exploiting the solution space with higher or lower parameter settings.
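
One common way to merge the profit and solution-time desirabilities into a single score is a weighted geometric mean. The sketch below shows that combination; whether the case study used exactly this form is an assumption, and the function name and `weights` argument are hypothetical.

```python
import math


def combined_desirability(d_values, weights=None):
    """Weighted geometric mean of individual desirabilities in [0, 1].

    ASSUMPTION: responses are combined by a geometric mean, so a single
    unacceptable response (d = 0) makes the whole setting unacceptable.
    """
    if not d_values:
        raise ValueError("at least one desirability is required")
    if weights is None:
        weights = [1.0] * len(d_values)
    if len(weights) != len(d_values):
        raise ValueError("weights and d_values must have the same length")
    if any(d < 0.0 or d > 1.0 for d in d_values):
        raise ValueError("desirabilities must lie in [0, 1]")
    if any(d == 0.0 for d in d_values):
        return 0.0
    total_weight = sum(weights)
    log_mean = sum(w * math.log(d) for d, w in zip(d_values, weights)) / total_weight
    return math.exp(log_mean)


# Illustrative inputs only; these are not values reported in the case study.
example = combined_desirability([0.25, 1.0])
```

With equal weights, a setting that is fully desirable in one response but only 0.25 in the other scores 0.5 overall, which shows how the second response pulls the combined value away from the profit-only optimum.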

Figure 8

Maximization of desirability where profit is maximized and solution time is minimized

Conclusions

Genetic algorithms (GA) provide a powerful stochastic search strategy that can be used in various optimization problems, including mine planning. Control parameter selection is often either overlooked or based on sensitivity analysis. As a result, the impact of each parameter on the optimization, the relative importance of each parameter and the interactions among parameters remain unknown. In the case study, these parameters could cause a profit variation of ± 7% and affect the performance and efficiency of GAs. It should be noted that these parameters are not universal and differ among optimization problems. Therefore, the proposed approach should be repeated for different instances.

In seeking a balance between GA exploitation and the ability to explore the solution space via parameter calibration, a multiple regression model assisted in maximizing a desirability function, through a gradient ascent/descent algorithm, to obtain a profit close to a target value. This defines the optimum setting for all GA control parameters.

The case study applied three classical experimental design techniques. The 3-level full factorial design determined the impact of each parameter on the response variable, as well as the relative importance of the parameters and their interactions, whereas the Box–Behnken and Box–Wilson designs, as part of the response surface methodology (RSM), found the parameter configuration that maximizes profit during stope optimization. Population size had the strongest effect on profit (std-beta 0.35–0.61), crossover rate had a moderate effect (std-beta 0.12–0.43) and mutation rate had the weakest effect (std-beta 0.08–0.16) across the designs. These three designs produced \( \hat{y}_{r} \left( z \right) \) regression models, and their surface and prediction profilers illustrated the location of critical points or potentially optimal parameters; however, maximizing the \( d_{r} \left( {\hat{y}_{r} (z)} \right) \) desirability function defines the optimum setting for all the parameters simultaneously. Thus, the desirability function used to obtain a profit close to the desired profit defined the population size (P = 40) and crossover rate (C = 0.4), but the mutation rate (M = 0.10) required an additional evaluation within a reduced interval (0.10–0.12), identified by the prediction profiler in RSM, of the original interval (0.05–0.15). These parameters were tested in the stope layout metaheuristic optimization and ratified as optimal because they provided the highest profit. Since the response values correspond to metaheuristic solutions, the optimal setting of parameters may be linked to the global optimum in the stope layout optimization.

Furthermore, minimization of solution time was considered as a second response in the analysis. As expected, the profit was lower than with a single response where only the profit is maximized; however, evaluating profit and solution time simultaneously showed that (1) population size (P = 40) is still the optimal setting for this case study, (2) a larger population size does not always imply a longer solution time, (3) the solution time increases steadily with crossover and mutation rates, (4) simultaneous profit and solution time analysis may assist in further exploration and exploitation decisions with higher or lower parameter settings, and (5) the trade-off between acceptable computing time and profit desirability can be illustrated. As the case study showed, experimental design can be used to determine the values of GA control parameters.