1 Introduction

An algorithm is a formal description of a series of steps that provides a solution for a given problem [1]. The output of an algorithm clearly depends on the input data and on its parameters; an algorithm can thus be viewed as a black-box procedure that receives a set of data as input and returns an output representing the results. The possibility of selecting different sets of parameters potentially allows an algorithm to achieve satisfactory results for different instances of a problem, thus increasing its generality.

Parameter tuning represents a crucial part of any experimental protocol in global optimization; indeed, the parameter setting heavily affects both the quality of the solutions and the speed of convergence of an optimization algorithm [2, 3]. Finding good parameter values is pivotal when using stochastic algorithms [4]. Typically, parameters are set to commonly used values or by a trial-and-error approach; interestingly, very few approaches have been proposed to systematically find optimal parameter settings [5–7]. Recently, Bartz-Beielstein introduced the Sequential Parameter Optimization (SPO) algorithm, a general framework for experimental analysis that accounts for iterative refinement of the parameters [8]. Subsequently, Hutter et al. proposed SPO\(^+\), an improved variant of SPO that uses log-transformations and an intensification criterion [9]. Hutter et al. also proposed ParamILS, a method that finds optimal parameter settings through local search in the parameter space [10]. An alternative approach consists of using a racing scheme where, starting from an initial set of techniques, an iterative selection is performed [11].

We address the problem of tuning an algorithm by introducing the Sensitive Algorithmic Tuning (SAT) method; our approach finds optimal parameter settings by optimizing the worst-case performance of the most sensitive parameters. We evaluate this method by tuning the Differential Evolution (DE) optimization algorithm [12], although, in principle, our method can be applied to any algorithm.

The paper is organized as follows: in Sect. 2 we introduce the problem of tuning algorithms and the notion of maximum success region. Section 3 presents the concepts of robustness and sensitivity, and describes the Morris sensitivity analysis method and the Differential Evolution algorithm. In Sect. 4, we describe the SAT algorithm. Section 5 presents the experimental results. Finally, in Sect. 6 we discuss the conclusions and future work.

2 Algorithmic Tuning

Algorithmic tuning refers to the process of finding a set of values for the parameters of an algorithm that ensures a satisfactory solution to the problem in the average case.

A trial-and-error approach is computationally expensive and can lead to poor performance if biased by a priori knowledge. Without loss of generality, we assume that an algorithm is defined as follows:

$$\begin{aligned} Y=A(P,X) \end{aligned}$$
(1)

where A is a procedure that takes as input a problem instance P and a vector of parameters X, and returns an output Y. W.l.o.g., we assume that each parameter \(x_i\in X \subset \mathrm {I}\!\mathrm {R}\) is constrained to an interval \([x_-,x^+]\).

Finding an optimal parameter setting is an intractable problem. Let X be a parameter setting for an algorithm A, where each parameter can take k different discrete values within the interval \([x_-,x^+]\); it follows that the number of feasible parameter settings is \(k^{\vert X\vert }\). This makes an exhaustive search intractable for large instances.
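As a concrete illustration of this growth (a minimal sketch; the values of k and \(\vert X\vert \) below are arbitrary, not taken from the paper), consider discretizing each parameter into a grid of candidate values:

```python
# Number of feasible parameter settings: k candidate values for each of the
# |X| parameters. The values are illustrative only.
k, num_params = 10, 4
print(k ** num_params)   # 10000 full algorithm runs for an exhaustive search
# With a finer grid of k = 100 values, the count grows to 100**4 = 10^8.
```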

Parameter tuning generally consists of running the algorithm on a testbed problem \(\bar{P}\) that shares the same characteristics as the original problem P (e.g. unimodality). In this context, it becomes crucial to identify a region of the parameter space that maximizes the probability of success, which we denote as the maximum success region.

Definition 1

(Maximum Success Region). Let \(X=[x_-,x^+] \subset \mathrm {I}\!\mathrm {R}\) be the range for the parameters of an algorithm A. X is called a maximum success region if the following condition holds:

$$ \forall x \in X: P_r(A(P,x)=S(P)) \approx 1 $$

where \(P_r\) denotes the probability of obtaining the exact solution S(P) of P.

The concept of maximum success region fits particularly well with the task of tuning the parameters of optimization algorithms; in this case, the maximum success region of an optimizer is the subspace of parameters that ensures near-optimal solutions.

It should be noted that optimization methods are subject to eager tuning, which typically demands a large number of objective function evaluations to ensure convergence to an optimum. This limitation can be overcome by systematically choosing the smallest parameter values that ensure a maximum success rate; the SAT algorithm exploits this general principle.

3 Methods

In this section, we describe a framework for robustness and sensitivity analysis that represents the basis of the Sensitive Algorithmic Tuning (SAT) algorithm.

3.1 Robustness and Sensitivity Estimation

Algorithmic robustness is the probability of finding a satisfactory solution to a problem, even when the parameters are not optimally tuned. In general, for each parameter there is a range of values for which near-optimality is guaranteed. W.l.o.g., we assume that an algorithm is correct if there exists a parameter setting ensuring an optimal or suboptimal solution. We define the robustness condition \(\rho \) and the accuracy yield \(\varGamma \) as follows:

Definition 2

(Robustness Condition). Let \(X \in {\mathrm {I}\!\mathrm {R}}^n\) be a parameter setting for an algorithm A. Given a parameter setting \({X_*}\) obtained by perturbing X, the robustness condition \(\rho \) is defined as follows:

$$\begin{aligned} \rho (X, X_*, A, P, \epsilon ) = \begin{cases} 1 &{} \text {if } \vert A(P,X) - A(P,X_*) \vert \le \epsilon \\ 0 &{} \text {otherwise} \end{cases} \end{aligned}$$
(2)

where the robustness threshold \(\epsilon \) denotes the precision in the objective function value.

Definition 3

(Yield). Let \(X \in {\mathrm {I}\!\mathrm {R}}^n\) be a parameter setting characterizing the behavior of an algorithm A. Given an ensemble T of parameter settings obtained by sampling the parameter space of X, the yield \(\varGamma \) is defined as follows:

$$\begin{aligned} \varGamma (X, A, P, \epsilon , \rho , T) = \frac{\sum _{X_* \in T} \rho (X, X_*, A, P, \epsilon )}{\vert T \vert } \end{aligned}$$
(3)

Since we consider subsets of the parameter space, we perform Monte-Carlo sampling to generate trial settings in a specific parameter subspace. In our study, we set the robustness threshold \(\epsilon =10^{-5}\) and the number of trials \(|T|=100\).
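As a minimal illustration of Definitions 2 and 3, the following Python sketch estimates \(\rho \) and \(\varGamma \) by Monte-Carlo sampling, using the values \(\epsilon =10^{-5}\) and \(|T|=100\) adopted in our study; the function names and the representation of the parameter subspace as a list of (low, high) intervals are illustrative choices, not part of the original formulation.

```python
import random

def robustness(A, P, X, X_star, eps=1e-5):
    """Robustness condition rho of Eq. (2): 1 if the outputs of A under the
    nominal setting X and the perturbed setting X_star differ by at most eps."""
    return 1 if abs(A(P, X) - A(P, X_star)) <= eps else 0

def yield_gamma(A, P, X, bounds, eps=1e-5, trials=100):
    """Monte-Carlo estimate of the yield Gamma of Eq. (3): the fraction of
    settings sampled in the parameter subspace that satisfy rho.
    `bounds` is a list of (low, high) intervals, one per parameter."""
    T = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(trials)]
    return sum(robustness(A, P, X, X_star, eps) for X_star in T) / len(T)
```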

Ad-hoc algorithmic tuning requires knowledge of the effects of the parameters on the output. In this context, the Morris sensitivity analysis technique [13] represents an interesting approach: it ranks the parameters based on their effect on the output, and does not require information about the system being analyzed [14]. In particular, a parameter is considered sensitive if variations of its value significantly affect the performance of the algorithm. The Morris technique can thus be used as an automated method for analyzing algorithms. We hypothesize that the identification of regions of low sensitivity can lead to effective parameter tuning; this idea represents the basic principle of SAT.

3.2 Morris Method

Sensitivity analysis studies the influence of the input parameters on the output of a function (e.g. an algorithm), and identifies possible relations between parameters, e.g. linear or nonlinear ones.

The Morris method is a one-at-a-time (OAT) global sensitivity analysis technique. Given a set of parameter values, one parameter at a time is modified and the variation of the output is recorded. This information is used to calculate the mean value \(\mu \) and the standard deviation \(\sigma \) associated with each parameter. Parameters with a high \(\mu \) have an important impact on the output; large values of \(\sigma \) indicate nonlinear relations with other parameters, whereas small mean values are associated with a negligible effect.
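A minimal sketch of an OAT screening in the spirit of the Morris method follows; it omits the trajectory design of the original technique [13] and uses a fixed relative step, so it should be read as an illustration of how \(\mu \) and \(\sigma \) are obtained, not as a full implementation. All names and default values are our own.

```python
import random
import statistics

def morris_effects(f, bounds, r=10, delta_frac=0.1):
    """Simplified OAT screening: for r random base points, perturb each
    parameter by a step delta and record the elementary effect on the output
    of f. Returns one (mu, sigma) pair per parameter."""
    k = len(bounds)
    effects = [[] for _ in range(k)]
    for _ in range(r):
        base = [random.uniform(lo, hi) for lo, hi in bounds]
        y0 = f(base)
        for i, (lo, hi) in enumerate(bounds):
            delta = delta_frac * (hi - lo)
            x = list(base)
            # one-at-a-time perturbation, clipped at the upper bound
            x[i] = min(x[i] + delta, hi)
            effects[i].append((f(x) - y0) / delta)
    mu = [statistics.mean(e) for e in effects]       # average effect per parameter
    sigma = [statistics.stdev(e) for e in effects]   # spread: nonlinearity/interactions
    return mu, sigma
```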

3.3 Differential Evolution Algorithm

Differential Evolution (DE) is a stochastic population-based algorithm developed for global optimization in continuous spaces [12]. DE is used to solve multimodal, multi-objective, dynamic, or constrained optimization problems; it finds application in several real-world problems, such as digital filter design, fermentation processes, and dispatch optimization [15–18].

DE exploits a vector of differences as a perturbation operator. Given an objective function \(f:\mathrm {I}\!\mathrm {R}^n \rightarrow \mathrm {I}\!\mathrm {R}\), DE starts by generating NP individuals at random, with the values of the variables constrained within their respective lower and upper bounds. At each generation, each individual is modified according to a crossover probability \(C_r\), using the following scheme:

$$\begin{aligned} x_{g+1}^i = x_{g}^i+F_w \times (y_{g}^i-z_{g}^i) \end{aligned}$$
(4)

where \(x_{g+1}^i\) is the \(i\)-th variable of the new individual at generation \(g+1\); \(y_{g}\) and \(z_{g}\) are two individuals of the population such that x, y, and z are mutually distinct; and \(F_w\) is a weighting factor.

If \(f(x_{g+1})<f(x_{g})\), \(x_{g+1}\) replaces \(x_{g}\) in the population. Typically, the algorithm stops when a predetermined number of generations (G) is reached.

The algorithm has few parameters: \(C_r\) controls the exploring ability of DE, \(F_w\) controls its exploiting ability, and NP determines the population size. In particular, for large-scale problems, DE requires large values of NP and a sufficient number of generations to obtain satisfactory results.
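For reference, a compact Python sketch of the DE variant described above follows. The binomial crossover with a guaranteed mutated component (j_rand) is a common implementation detail that we assume here but that is not spelled out in the text, and the default parameter values are illustrative.

```python
import random

def differential_evolution(f, bounds, NP=30, F_w=0.8, C_r=0.9, G=200):
    """Minimal DE sketch following Eq. (4): with crossover probability C_r,
    each component of an individual is perturbed by the weighted difference
    of two distinct population members; greedy selection keeps the better of
    parent and trial vector. `bounds` is a list of (low, high) pairs."""
    n = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(NP)]
    fit = [f(x) for x in pop]
    for _ in range(G):
        for i in range(NP):
            # two partners distinct from each other and from individual i
            y, z = random.sample([j for j in range(NP) if j != i], 2)
            j_rand = random.randrange(n)  # guarantees at least one mutated gene
            trial = list(pop[i])
            for j in range(n):
                if random.random() < C_r or j == j_rand:
                    v = pop[i][j] + F_w * (pop[y][j] - pop[z][j])
                    lo, hi = bounds[j]
                    trial[j] = min(max(v, lo), hi)  # keep within bounds
            f_trial = f(trial)
            if f_trial < fit[i]:  # greedy replacement
                pop[i], fit[i] = trial, f_trial
    best = min(range(NP), key=fit.__getitem__)
    return pop[best], fit[best]
```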

4 Sensitive Algorithmic Tuning

The Sensitive Algorithmic Tuning (SAT) algorithm is a deterministic method that relies on sensitivity analysis and worst-case screening to identify maximum success regions within the parameter space.

Sensitive parameters are those that typically determine the success or failure of an algorithm. The sensitivity of a parameter is strictly related to its uncertainty region; in general, a large parameter range makes it difficult to find an optimal setting. When the value of a parameter lies outside its maximum success region, we observe an increase in sensitivity. Sensitivity minimization is a key principle in system design; it is necessary for obtaining a robust parameter setting, but not sufficient to guarantee near-optimal solutions. To overcome this limitation, we adopt a worst-case screening method: given two parameter settings, SAT chooses the one providing the best solution in the worst case.

The SAT algorithm is depicted in Algorithm 1. At each step, SAT splits the range of each parameter and performs a Morris analysis in both regions; it then selects the subspace with the highest mean value, aiming to tune the most sensitive parameter first. The splitting procedure can be any interval-cutting strategy [19]; in our experiments, we generate two regions by halving the range of the sensitive parameters.

Algorithm 1. The Sensitive Algorithmic Tuning (SAT) algorithm.

An example of the application of the SAT algorithm to a technique with two parameters is depicted in Fig. 1. The objective function values obtained by each parameter setting sampled during the sensitivity analysis are also used to evaluate the worst-case performance of the algorithm. We use two halting conditions: the attainment of an optimal solution in the worst case, or the impossibility of further splitting the parameter space. This strategy prevents wasting computational effort when the parameter region is sufficiently small.
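One possible reading of this loop is sketched below, reusing the morris_effects function from the Sect. 3.2 sketch. The halting thresholds, the sampling budget, and the use of \(|\mu |\) to pick the parameter to split are our own assumptions, not a verbatim transcription of Algorithm 1.

```python
import random

def sat_tune(A, P, bounds, target=1e-1, min_width=1e-2, samples=20):
    """Hedged sketch of the SAT loop: halve the range of the most sensitive
    parameter (largest Morris mu), estimate the worst-case objective value
    over settings sampled in each half, and keep the half with the better
    worst case. `bounds` is a list of (low, high) pairs."""

    def worst_case(bs):
        trials = [[random.uniform(lo, hi) for lo, hi in bs] for _ in range(samples)]
        return max(A(P, x) for x in trials)

    while True:
        mu, _ = morris_effects(lambda x: A(P, x), bounds)  # sketch from Sect. 3.2
        splittable = [i for i, (lo, hi) in enumerate(bounds) if hi - lo > min_width]
        if not splittable:
            return bounds  # cannot split further: second halting condition
        i = max(splittable, key=lambda j: abs(mu[j]))  # most sensitive parameter
        lo, hi = bounds[i]
        mid = (lo + hi) / 2.0
        left = bounds[:i] + [(lo, mid)] + bounds[i + 1:]
        right = bounds[:i] + [(mid, hi)] + bounds[i + 1:]
        # worst-case screening: keep the half with the better worst case
        bounds = left if worst_case(left) <= worst_case(right) else right
        if worst_case(bounds) <= target:
            return bounds  # optimal worst case reached: first halting condition
```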

Fig. 1. Iterations of the SAT algorithm on a bi-dimensional parameter space. \(\emptyset \) denotes a region that will not be split further.

5 Experimental Results

We use the SAT algorithm to tune DE, with the parameters reported in Table 1.

Table 1. Parameters of the Differential Evolution algorithm. n denotes the dimension of the problem.

Since parameter settings are problem-dependent, we consider unimodal and multimodal numerical functions of dimensions \(d=10\) and \(d=20\) (see Table 2). Three unimodal functions are used, characterized by multiple quadratic terms. The multimodal functions involve noise (see function \(f_4\)) and quadratic and quartic terms. It should be noted that the number of local minima of \(f_5\) increases exponentially with its dimension [20].

Table 2. Test problems. \(f^*\) represents the global minimum; \(X_-\) and \(X^+\) are the lower and upper bounds, respectively; n denotes the dimension of the problem.

The metric adopted for evaluating DE performance is the average value of the best solution over 10 independent runs; moreover, we apply the SAT algorithm to find a setting that ensures a worst-case result of \(10^{-1}\).
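For clarity, this metric can be written as a small sketch (the helper name is ours; A is assumed to return the best objective value found in one stochastic run):

```python
import statistics

def average_best(A, P, X, runs=10):
    """Evaluation metric of Sect. 5: the mean of the best objective values
    returned by independent executions of the stochastic algorithm A with
    parameter setting X (10 runs in our experiments)."""
    return statistics.mean(A(P, X) for _ in range(runs))
```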

An extensive set of simulations is performed to obtain an optimal parameter setting for DE. Table 3 reports the results of SAT on the five testbed problems for \(d=10\). The experimental results show that SAT is able to find satisfactory settings for all problems.

Table 3. Results for \(d=10\). \(f_w^*\) is the worst-case objective function value; v(f) represents the variance of the output over all sampled settings; the lower and upper bounds of each parameter found by SAT are reported within brackets.

On the unimodal functions \(f_1\) and \(f_2\), we obtain an upper bound on the algorithm performance that is close to the global optimum. The \(f_3\) function is an exception within the unimodal set: it is not surprising that its worst-case result is many orders of magnitude worse than the others, due to the presence of several plateau regions that can trap the algorithm. The variance values show that DE is highly influenced by its parameters, and that the initial bounds contain many suboptimal parameter settings.

The multimodal functions are more complex, as the presence of noise and of several local minima makes tuning difficult. Despite this complex scenario, SAT is able to find a setting that ensures near-optimal solutions in the worst case. The variance is several orders of magnitude lower than for the unimodal functions; however, this is probably due to the smaller intervals on which the algorithm operates.

Fig. 2. Sensitivity landscape of \(f_3\) for \(d=10\). \(\mu \) and \(\sigma \) are reported on the x and y axes, respectively; the objective function values are reported on the z axis.

Fig. 3. Sensitivity landscape of \(f_5\) for \(d=10\). \(\mu \) and \(\sigma \) are reported on the x and y axes, respectively; the objective function values are reported on the z axis.

Figures 2 and 3 show the sensitivity landscapes of \(f_3\) and \(f_5\). The lowest function value is achieved when the sensitivity of each parameter is close to zero; this result supports the strategy underlying SAT. The sensitivity landscapes of the parameters differ between the two problems: for \(f_3\), the population size, the number of generations, and the weighting factor show a convex shape, while the crossover probability has a spike near the maximum success region. Inspecting the sensitivity landscape of \(f_5\), we note that the best results are again obtained when the sensitivity is close to zero, but the shape of the landscape is more rugged. For all parameters, there is a high number of peaks at different points, which highlights the complexity of tuning DE in this case. Despite this rugged parameter space, SAT is able to identify the best setting.

The results of tuning DE for \(d=20\) are presented in Table 4.

Table 4. Results for \(d=20\). \(f_w^*\) is the worst-case objective function value; v(f) represents the variance of the output over all sampled settings; the lower and upper bounds of each parameter found by SAT are reported within brackets.

On the unimodal functions \(f_1\) and \(f_2\), the algorithm finds a setting with a satisfactory upper bound; however, this result is not as good as the one found for the smaller-dimension instances. \(f_3\) proves to be the most difficult unimodal problem of the entire set; in this case, a lower accuracy than expected is obtained. The variance values increase dramatically, and the tuning becomes more difficult as the problem dimension grows. Although SAT was not able to find a parameter setting that matched the desired upper bound, the results required only limited computational effort.

In Figs. 4 and 5, we report the sensitivity landscapes of \(f_3\) and \(f_5\) for \(d=20\).

Fig. 4. Sensitivity landscape of \(f_3\) for \(d=20\). \(\mu \) and \(\sigma \) are reported on the x and y axes, respectively; the objective function values are reported on the z axis.

Fig. 5. Sensitivity landscape of \(f_5\) for \(d=20\). \(\mu \) and \(\sigma \) are reported on the x and y axes, respectively; the objective function values are reported on the z axis.

The plots confirm the results obtained for the lower-dimensional problems: the lowest function value is always achieved when the parameter sensitivity is close to zero. The landscapes of the two problems are comparable with the previous ones; however, we observe an increase in both \(\mu \) and \(\sigma \). For example, for \(f_3\), the crossover probability shows a peak in the central region, far from the maximum success region. For the \(f_5\) function, a rugged landscape is again obtained, and we observe increased sensitivity values for all parameters. This landscape seems to indicate the presence of sensitivity barriers, which make the tuning process difficult.

Finally, we perform a robustness analysis. Since parameter ranges are considered instead of single values, we evaluate the probability of finding satisfactory results using any setting belonging to a given parameter subspace. Table 5 shows the yield values of the settings for each problem. High yield values are obtained on all testbeds, except for the \(f_3\) and \(f_4\) functions. We conclude that the proposed settings are robust and effective, and guarantee near-optimal performance of the DE algorithm.

Table 5. Robustness analysis of SAT parameters. d represents the dimension, \(\varGamma \) is the yield.

6 Conclusions

Algorithm design is a complex process that should pursue three general properties: efficiency, effectiveness, and generality. These properties strictly depend on how an algorithm's parameters are configured. Parameter tuning is therefore required to ensure efficiency and effectiveness when solving a problem.

In this work, we propose Sensitive Algorithmic Tuning (SAT), a new method for the automatic tuning of algorithms. SAT uses the Morris technique to determine which parameters to optimize in order to obtain satisfactory upper bounds on the expected performance. We evaluate SAT on the problem of tuning the Differential Evolution (DE) algorithm. The experimental results show the effectiveness of our approach; moreover, the discovered settings exhibit a high degree of robustness. Interestingly, these results confirm that optimal settings are associated with low parameter sensitivity.

We believe the parameter tuning problem should be explored more deeply by extending the cutting strategy; for instance, a branch-and-bound [21] approach could provide an efficient sampling strategy, where sensitivity information is used to expand and prune the exploration tree. We suggest that such an enhanced SAT algorithm could be a valid candidate for tuning one of the most important algorithms of the Internet era: PageRank [22].