1 Introduction

An algorithm is a formal description of a series of steps that provides a solution for a given problem [1]. The output of an algorithm clearly depends on the input data and on its parameters; an algorithm can thus be viewed as a black-box procedure that receives a set of data as input and returns an output representing the results. The possibility of selecting different sets of parameters potentially allows an algorithm to achieve satisfactory results for different instances of a problem, thus increasing its generality.

Parameter tuning represents a crucial part of any experimental protocol in global optimization; indeed, the parameter setting heavily affects both the quality of the solutions and the speed of convergence of an optimization algorithm [2, 3]. Finding good parameter values is pivotal when using stochastic algorithms [4]. Typically, parameters are set to commonly used values or by a trial-and-error approach; interestingly, very few approaches have been proposed to systematically find optimal parameter settings [5–7]. Recently, Bartz-Beielstein introduced the Sequential Parameter Optimization (SPO) algorithm, a general framework for experimental analysis that accounts for iterative refinement of the parameters [8]. Subsequently, Hutter et al. proposed SPO\(^+\), an improved variant of SPO that uses log-transformations and an intensification criterion [9]. Hutter et al. also proposed ParamILS, a method that finds optimal parameter settings through local search in the parameter space [10]. An alternative approach consists of using a racing scheme where, starting from an initial set of techniques, an iterative selection is performed [11].

We address the problem of tuning an algorithm by introducing the Sensitive Algorithmic Tuning (SAT) method; our approach finds optimal parameter settings by optimizing the worst-case performance of the most sensitive parameters. We evaluate this method by tuning the Differential Evolution (DE) optimization algorithm [12], although, in principle, our method can be applied to any algorithm.

The paper is organized as follows: in Sect. 2 we introduce the problem of tuning algorithms and the notion of maximum success region. Section 3 presents the concepts of robustness and sensitivity, and describes the Morris sensitivity analysis method and the Differential Evolution algorithm. In Sect. 4, we describe the SAT algorithm. Section 5 presents the experimental results. Finally, in Sect. 6 we discuss the conclusions and future work.

2 Algorithmic Tuning

Algorithmic tuning refers to the process of finding a set of values for the parameters of an algorithm that ensures a satisfactory solution to the problem in the average case.

A trial-and-error approach is computationally expensive and can lead to poor performance if biased by a priori knowledge. Without loss of generality, we assume that an algorithm is defined as follows:

$$\begin{aligned} Y=A(P,X) \end{aligned}$$
(1)

where A is a procedure that takes as input a problem instance P and a vector of parameters X, and returns an output Y. W.l.o.g., we assume that each parameter \(x_i\in X \subset \mathrm {I}\!\mathrm {R}\) is constrained to an interval \([x_-,x^+]\).

Finding an optimal parameter setting is an intractable problem. Let X be a parameter setting for an algorithm A, where each parameter can take k different discrete values within the interval \([x_-,x^+]\); it follows that the number of feasible parameter settings is \(k^{\vert X\vert }\). This makes an exhaustive search intractable for large instances.
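As a concrete illustration of this growth (a minimal sketch; the values of k and \(\vert X\vert \) below are arbitrary, not taken from the paper), consider discretizing each parameter into a grid of candidate values:

```python
# Number of feasible parameter settings: k candidate values for each of the
# |X| parameters. The values are illustrative only.
k, num_params = 10, 4
print(k ** num_params)   # 10000 full algorithm runs for an exhaustive search
# With a finer grid of k = 100 values, the count grows to 100**4 = 10^8.
```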

Parameter tuning generally consists of running the algorithm on a testbed problem \(\bar{P}\) that shares the same characteristics as the original problem P (e.g. unimodality). In this context, it becomes crucial to identify a region of the parameter space that maximizes the probability of success, which we denote as the maximum success region.

Definition 1

(Maximum Success Region). Let \(X=[x_-,x^+] \subset \mathrm {I}\!\mathrm {R}\) be the range for the parameters of an algorithm A. X is called a maximum success region if the following condition holds:

$$ \forall x \in X: P_r(A(P,x)=S(P)) \approx 1 $$

where \(P_r\) denotes the probability of obtaining the exact solution S(P) of P.

The concept of maximum success region fits particularly well with the task of tuning the parameters of optimization algorithms; in this case, the maximum success region of an optimizer is the subspace of parameters that ensures near-optimal solutions.

It should be noted that optimization methods are subject to eager tuning, which typically demands a large number of objective function evaluations to ensure convergence to an optimum. This limitation can be overcome by systematically choosing the smallest parameter values that ensure a maximum success rate; the SAT algorithm exploits this general principle.

3 Methods

In this section, we describe a framework for robustness and sensitivity analysis that represents the basis of the Sensitive Algorithmic Tuning (SAT) algorithm.

3.1 Robustness and Sensitivity Estimation

Algorithmic robustness is the probability of finding a satisfactory solution to a problem, even when the parameters are not optimally tuned. In general, for each parameter there is a range of values for which near-optimality is guaranteed. W.l.o.g., we assume that an algorithm is correct if there exists a parameter setting ensuring an optimal or suboptimal solution. We define the robustness condition \(\rho \) and the accuracy yield \(\varGamma \) as follows:

Definition 2

(Robustness Condition). Let \(X \in {\mathrm {I}\!\mathrm {R}}^n\) be a parameter setting for an algorithm A. Given a parameter setting \({X_*}\) obtained by perturbing X, the robustness condition \(\rho \) is defined as follows:

$$\begin{aligned} \rho (X, X_*, A, P, \epsilon ) = \begin{cases} 1 &{} \text {if } \vert A(P,X) - A(P,X_*) \vert \le \epsilon \\ 0 &{} \text {otherwise} \end{cases} \end{aligned}$$
(2)

where the robustness threshold \(\epsilon \) denotes the precision in the objective function value.

Definition 3

(Yield). Let \(X \in {\mathrm {I}\!\mathrm {R}}^n\) be a parameter setting characterizing the behavior of an algorithm A. Given an ensemble T of parameter settings obtained by sampling the parameter space of X, the yield \(\varGamma \) is defined as follows:

$$\begin{aligned} \varGamma (X, A, P, \epsilon , \rho , T) = \frac{\sum _{X_* \in T} \rho (X, X_*, A, P, \epsilon )}{\vert T \vert } \end{aligned}$$
(3)

Since we consider subsets of the parameter space, we perform Monte-Carlo sampling to generate trial settings in a specific parameter subspace. In our study, we set the robustness threshold \(\epsilon =10^{-5}\) and the number of trials \(|T|=100\).
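As a minimal illustration of Definitions 2 and 3, the following Python sketch estimates \(\rho \) and \(\varGamma \) by Monte-Carlo sampling, using the values \(\epsilon =10^{-5}\) and \(|T|=100\) adopted in our study; the function names and the representation of the parameter subspace as a list of (low, high) intervals are illustrative choices, not part of the original formulation.

```python
import random

def robustness(A, P, X, X_star, eps=1e-5):
    """Robustness condition rho of Eq. (2): 1 if the outputs of A under the
    nominal setting X and the perturbed setting X_star differ by at most eps."""
    return 1 if abs(A(P, X) - A(P, X_star)) <= eps else 0

def yield_gamma(A, P, X, bounds, eps=1e-5, trials=100):
    """Monte-Carlo estimate of the yield Gamma of Eq. (3): the fraction of
    settings sampled in the parameter subspace that satisfy rho.
    `bounds` is a list of (low, high) intervals, one per parameter."""
    T = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(trials)]
    return sum(robustness(A, P, X, X_star, eps) for X_star in T) / len(T)
```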

Ad-hoc algorithmic tuning requires knowledge of the effects of the parameters on the output. In this context, the Morris sensitivity analysis technique [13] represents an interesting approach: it ranks the parameters based on their effect on the output, and does not require information about the system being analyzed [14]. In particular, a parameter is considered sensitive if variations of its value significantly affect the performance of the algorithm. The Morris technique can thus be used as an automated method for analyzing algorithms. We hypothesize that the identification of regions of low sensitivity can lead to effective parameter tuning; this idea represents the basic principle of SAT.

3.2 Morris Method

Sensitivity analysis studies the influence of the input parameters on the output of a function (e.g. an algorithm), and identifies possible relations between parameters, e.g. linear or nonlinear ones.

The Morris method is a one-at-a-time (OAT) global sensitivity analysis technique. Given a set of parameter values, one parameter at a time is modified and the variation of the output is recorded. This information is used to calculate the mean value \(\mu \) and the standard deviation \(\sigma \) associated with each parameter. Parameters with a high \(\mu \) have an important impact on the output; large values of \(\sigma \) indicate nonlinear relations with other parameters, whereas small mean values are associated with a negligible effect.
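A minimal sketch of an OAT screening in the spirit of the Morris method follows; it omits the trajectory design of the original technique [13] and uses a fixed relative step, so it should be read as an illustration of how \(\mu \) and \(\sigma \) are obtained, not as a full implementation. All names and default values are our own.

```python
import random
import statistics

def morris_effects(f, bounds, r=10, delta_frac=0.1):
    """Simplified OAT screening: for r random base points, perturb each
    parameter by a step delta and record the elementary effect on the output
    of f. Returns one (mu, sigma) pair per parameter."""
    k = len(bounds)
    effects = [[] for _ in range(k)]
    for _ in range(r):
        base = [random.uniform(lo, hi) for lo, hi in bounds]
        y0 = f(base)
        for i, (lo, hi) in enumerate(bounds):
            delta = delta_frac * (hi - lo)
            x = list(base)
            # one-at-a-time perturbation, clipped at the upper bound
            x[i] = min(x[i] + delta, hi)
            effects[i].append((f(x) - y0) / delta)
    mu = [statistics.mean(e) for e in effects]       # average effect per parameter
    sigma = [statistics.stdev(e) for e in effects]   # spread: nonlinearity/interactions
    return mu, sigma
```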

3.3 Differential Evolution Algorithm

Differential Evolution (DE) is a stochastic population-based algorithm developed for global optimization in continuous spaces [12]. DE is used to solve multimodal, multi-objective, dynamic, or constrained optimization problems; it finds application in several real-world problems, such as digital filter design, fermentation processes, and dispatch optimization [15–18].

DE exploits a vector of differences as a perturbation operator. Given an objective function \(f:\mathrm {I}\!\mathrm {R}^n \rightarrow \mathrm {I}\!\mathrm {R}\), DE starts by generating NP individuals at random, with the values of the variables constrained within their respective lower and upper bounds. At each generation, each individual is modified according to a crossover probability \(C_r\), using the following scheme:

$$\begin{aligned} x_{g+1}^i = x_{g}^i+F_w \times (y_{g}^i-z_{g}^i) \end{aligned}$$
(4)

where \(x_{g+1}^i\) is the \(i\)-th variable of the new individual at generation \(g+1\); \(y_{g}\) and \(z_{g}\) are two individuals of the population such that x, y, and z are mutually distinct; and \(F_w\) is a weighting factor.

If \(f(x_{g+1})<f(x_{g})\), \(x_{g+1}\) replaces \(x_{g}\) in the population. Typically, the algorithm stops when a predetermined number of generations (G) is reached.

The algorithm has few parameters: \(C_r\) controls the exploring ability of DE, \(F_w\) controls its exploiting ability, and NP determines the population size. In particular, for large-scale problems, DE requires large values of NP and a sufficient number of generations to obtain satisfactory results.
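For reference, a compact Python sketch of the DE variant described above follows. The binomial crossover with a guaranteed mutated component (j_rand) is a common implementation detail that we assume here but that is not spelled out in the text, and the default parameter values are illustrative.

```python
import random

def differential_evolution(f, bounds, NP=30, F_w=0.8, C_r=0.9, G=200):
    """Minimal DE sketch following Eq. (4): with crossover probability C_r,
    each component of an individual is perturbed by the weighted difference
    of two distinct population members; greedy selection keeps the better of
    parent and trial vector. `bounds` is a list of (low, high) pairs."""
    n = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(NP)]
    fit = [f(x) for x in pop]
    for _ in range(G):
        for i in range(NP):
            # two partners distinct from each other and from individual i
            y, z = random.sample([j for j in range(NP) if j != i], 2)
            j_rand = random.randrange(n)  # guarantees at least one mutated gene
            trial = list(pop[i])
            for j in range(n):
                if random.random() < C_r or j == j_rand:
                    v = pop[i][j] + F_w * (pop[y][j] - pop[z][j])
                    lo, hi = bounds[j]
                    trial[j] = min(max(v, lo), hi)  # keep within bounds
            f_trial = f(trial)
            if f_trial < fit[i]:  # greedy replacement
                pop[i], fit[i] = trial, f_trial
    best = min(range(NP), key=fit.__getitem__)
    return pop[best], fit[best]
```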

4 Sensitive Algorithmic Tuning

The Sensitive Algorithmic Tuning (SAT) algorithm is a deterministic method that relies on sensitivity analysis and worst-case screening to identify maximum success regions within the parameter space.

Sensitive parameters are those that typically determine the success or failure of an algorithm. The sensitivity of a parameter is strictly related to its uncertainty region; in general, a large parameter range makes it difficult to find an optimal setting. When the value of a parameter lies outside its maximum success region, we observe an increase in sensitivity. Sensitivity minimization is a key principle in system design; it is necessary for obtaining a robust parameter setting, but not sufficient to guarantee near-optimal solutions. To overcome this limitation, we adopt a worst-case screening method: given two parameter settings, SAT chooses the one providing the best solution in the worst case.

The SAT algorithm is depicted in Algorithm 1. At each step, SAT splits the range of each parameter and performs a Morris analysis in both regions; it then selects the subspace with the highest mean value, aiming to tune the most sensitive parameter first. The splitting procedure can be any interval-cutting strategy [19]; in our experiments, we generate two regions by halving the range of the sensitive parameters.

Algorithm 1. The Sensitive Algorithmic Tuning (SAT) algorithm.

An example of the application of the SAT algorithm to a technique with two parameters is depicted in Fig. 1. The objective function values obtained by each parameter setting sampled during the sensitivity analysis are also used to evaluate the worst-case performance of the algorithm. We use two halting conditions: the attainment of an optimal solution in the worst case, or the impossibility of further splitting the parameter space. This strategy prevents wasting computational effort when the parameter region is sufficiently small.
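One possible reading of this loop is sketched below, reusing the morris_effects function from the Sect. 3.2 sketch. The halting thresholds, the sampling budget, and the use of \(|\mu |\) to pick the parameter to split are our own assumptions, not a verbatim transcription of Algorithm 1.

```python
import random

def sat_tune(A, P, bounds, target=1e-1, min_width=1e-2, samples=20):
    """Hedged sketch of the SAT loop: halve the range of the most sensitive
    parameter (largest Morris mu), estimate the worst-case objective value
    over settings sampled in each half, and keep the half with the better
    worst case. `bounds` is a list of (low, high) pairs."""

    def worst_case(bs):
        trials = [[random.uniform(lo, hi) for lo, hi in bs] for _ in range(samples)]
        return max(A(P, x) for x in trials)

    while True:
        mu, _ = morris_effects(lambda x: A(P, x), bounds)  # sketch from Sect. 3.2
        splittable = [i for i, (lo, hi) in enumerate(bounds) if hi - lo > min_width]
        if not splittable:
            return bounds  # cannot split further: second halting condition
        i = max(splittable, key=lambda j: abs(mu[j]))  # most sensitive parameter
        lo, hi = bounds[i]
        mid = (lo + hi) / 2.0
        left = bounds[:i] + [(lo, mid)] + bounds[i + 1:]
        right = bounds[:i] + [(mid, hi)] + bounds[i + 1:]
        # worst-case screening: keep the half with the better worst case
        bounds = left if worst_case(left) <= worst_case(right) else right
        if worst_case(bounds) <= target:
            return bounds  # optimal worst case reached: first halting condition
```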

Fig. 1. Iterations of the SAT algorithm on a bi-dimensional parameter space. \(\emptyset \) denotes a region that will not be split further.

5 Experimental Results

We use the SAT algorithm to tune DE, with the parameters reported in Table 1.

Table 1. Parameters of the Differential Evolution algorithm. n denotes the dimension of the problem.

Since parameter settings are problem-dependent, we consider unimodal and multimodal numerical functions of dimensions \(d=10\) and \(d=20\) (see Table 2). Three unimodal functions are used, characterized by multiple quadratic terms. The multimodal functions involve noise (see function \(f_4\)) and quadratic and quartic terms. It should be noted that the number of local minima of \(f_5\) increases exponentially with its dimension [20].

Table 2. Test problems. \(f^*\) represents the global minimum; \(X_-\) and \(X^+\) are the lower and upper bounds, respectively; n denotes the dimension of the problem.

The metric adopted for evaluating DE performance is the average value of the best solution over 10 independent runs; moreover, we apply the SAT algorithm to find a setting that ensures a worst-case result of \(10^{-1}\).
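For clarity, this metric can be written as a small sketch (the helper name is ours; A is assumed to return the best objective value found in one stochastic run):

```python
import statistics

def average_best(A, P, X, runs=10):
    """Evaluation metric of Sect. 5: the mean of the best objective values
    returned by independent executions of the stochastic algorithm A with
    parameter setting X (10 runs in our experiments)."""
    return statistics.mean(A(P, X) for _ in range(runs))
```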

An extensive set of simulations is performed to obtain an optimal parameter setting for DE. Table 3 reports the results of SAT on the five testbed problems for \(d=10\). The experimental results show that SAT is able to find satisfactory settings for all problems.

Table 3. Results for \(d=10\). \(f_w^*\) is the worst-case objective function value; v(f) represents the variance of the output over all sampled settings; the lower and upper bounds of each parameter found by SAT are reported within brackets.

On the unimodal functions \(f_1\) and \(f_2\), we obtain an upper bound on the algorithm performance that is close to the global optimum. The \(f_3\) function is an exception within the unimodal set: it is not surprising that its worst-case result is many orders of magnitude worse than the others, due to the presence of several plateau regions that can trap the algorithm. The variance values show that DE is highly influenced by its parameters, and that the initial bounds contain many suboptimal parameter settings.

The multimodal functions are more complex, as the presence of noise and of several local minima makes tuning difficult. Despite this complex scenario, SAT is able to find a setting that ensures near-optimal solutions in the worst case. The variance is several orders of magnitude lower than for the unimodal functions; however, this is probably due to the smaller intervals on which the algorithm operates.

Fig. 2. Sensitivity landscape of \(f_3\) for \(d=10\). \(\mu \) and \(\sigma \) are reported on the x and y axes, respectively; the objective function values are reported on the z axis.

Fig. 3. Sensitivity landscape of \(f_5\) for \(d=10\). \(\mu \) and \(\sigma \) are reported on the x and y axes, respectively; the objective function values are reported on the z axis.

Figures 2 and 3 show the sensitivity landscapes of \(f_3\) and \(f_5\). The lowest function value is achieved when the sensitivity of each parameter is close to zero; this result supports the strategy underlying SAT. The sensitivity landscapes of the parameters differ between the two problems: for \(f_3\), the population size, the number of generations, and the weighting factor show a convex shape, while the crossover probability has a spike near the maximum success region. Inspecting the sensitivity landscape of \(f_5\), we note that the best results are again obtained when the sensitivity is close to zero, but the shape of the landscape is more rugged. For all parameters, there is a high number of peaks at different points, which highlights the complexity of tuning DE in this case. Despite this rugged parameter space, SAT is able to identify the best setting.

The results of tuning DE for \(d=20\) are presented in Table 4.

Table 4. Results for \(d=20\). \(f_w^*\) is the worst-case objective function value; v(f) represents the variance of the output over all sampled settings; the lower and upper bounds of each parameter found by SAT are reported within brackets.

On the unimodal functions \(f_1\) and \(f_2\), the algorithm finds a setting with a satisfactory upper bound; however, this result is not as good as the one found for the smaller-dimension instances. \(f_3\) proves to be the most difficult unimodal problem of the entire set; in this case, a lower accuracy than expected is obtained. The variance values increase dramatically, and the tuning becomes more difficult as the problem dimension grows. Although SAT was not able to find a parameter setting that matched the desired upper bound, the results required only limited computational effort.

In Figs. 4 and 5, we report the sensitivity landscapes of \(f_3\) and \(f_5\) for \(d=20\).

Fig. 4. Sensitivity landscape of \(f_3\) for \(d=20\). \(\mu \) and \(\sigma \) are reported on the x and y axes, respectively; the objective function values are reported on the z axis.

Fig. 5. Sensitivity landscape of \(f_5\) for \(d=20\). \(\mu \) and \(\sigma \) are reported on the x and y axes, respectively; the objective function values are reported on the z axis.

The plots confirm the results obtained for the lower-dimensional problems: the lowest function value is always achieved when the parameter sensitivity is close to zero. The landscapes of the two problems are comparable with the previous ones; however, we observe an increase in both \(\mu \) and \(\sigma \). For example, for \(f_3\), the crossover probability shows a peak in the central region, far from the maximum success region. For the \(f_5\) function, a rugged landscape is again obtained, and we observe increased sensitivity values for all parameters. This landscape seems to indicate the presence of sensitivity barriers, which make the tuning process difficult.

Finally, we perform a robustness analysis. Since parameter ranges are considered instead of single values, we evaluate the probability of finding satisfactory results using any setting belonging to a given parameter subspace. Table 5 shows the yield values of the settings for each problem. High yield values are obtained on all testbeds, except for the \(f_3\) and \(f_4\) functions. We conclude that the proposed settings are robust and effective, and guarantee near-optimal performance of the DE algorithm.

Table 5. Robustness analysis of SAT parameters. d represents the dimension, \(\varGamma \) is the yield.

6 Conclusions

Algorithm design is a complex process that should pursue three general properties: efficiency, effectiveness, and generality. These properties strictly depend on how an algorithm's parameters are configured. Parameter tuning is therefore required to ensure efficiency and effectiveness when solving a problem.

In this work, we propose Sensitive Algorithmic Tuning (SAT), a new method for the automatic tuning of algorithms. SAT uses the Morris technique to determine which parameters to optimize in order to obtain satisfactory upper bounds on the expected performance. We evaluate SAT on the problem of tuning the Differential Evolution (DE) algorithm. The experimental results show the effectiveness of our approach; moreover, the discovered settings exhibit a high degree of robustness. Interestingly, these results confirm that optimal settings are associated with low parameter sensitivity.

We believe the parameter tuning problem should be explored more deeply by extending the cutting strategy; for instance, a branch-and-bound [21] approach could provide an efficient sampling strategy, where sensitivity information is used to expand and prune the exploration tree. We suggest that such an enhanced SAT algorithm could be a valid candidate for tuning one of the most important algorithms of the Internet era: PageRank [22].