1 Introduction

A penalty technique transforms a constrained optimization problem into a sequence of unconstrained subproblems, in such a way that the sequence of solutions of the unconstrained subproblems converges to the optimal solution of the original constrained problem [1]. The technique is simple to implement and takes advantage of existing and powerful unconstrained optimization methods. However, defining a strategy to initialize and update the penalty parameter is not an easy task. To address this issue of setting the penalty parameter values within a penalty-based algorithm, a new self-adaptive penalty function is derived.

This paper illustrates the behavior of a penalty technique that relies on a self-adaptive penalty function to solve constrained global optimization (CGO) problems. To promote convergence to a global optimal solution, the resulting bound constrained global optimization (BCGO) problems are solved by well-known population-based meta-heuristics. Although these meta-heuristics have been implemented with different constraint handling techniques for solving CGO problems, mainly penalty-based methods [2,3,4,5,6], this study shows that the proposed self-adaptive penalty technique, when combined with them, is also very effective in solving CGO problems. In particular, we analyze the performance of the firefly algorithm (FA) [7] when combined with the self-adaptive penalty technique. FA is a swarm intelligence-based algorithm that has become very popular over the last decade. Several variants of the FA [8,9,10,11], including hybrid approaches [12, 13], and applications have recently been reported in the literature [14,15,16]. The effect of the control parameters on the performance of the FA has been studied in [17,18,19]. The main motivation for using the FA, besides being one of the most recent meta-heuristics, is its success in solving practical and complex problems [2, 20,21,22,23,24,25]. Although other adaptive penalty functions have recently been combined with stochastic population-based global optimizers [3, 4, 26,27,28], our proposal is simpler to implement, and the convergence of the algorithm is supported by theoretical results. The authors in [3] construct a parameter-free penalty function. Their adaptive penalty returns the objective function value alone if the point is feasible, and combines the sum of the constraint violations with either the objective value or an upper bound of the global minimum if the point is infeasible. They prove that the CGO and the BCGO problems, based on their adaptive penalty function, have the same global minimizers, and present further theoretical results based on the structure of the population-based differential evolution (DE) algorithm [29]. In [4], the adaptive penalty method (APM) investigated in [26] is extended and applied with the DE. The authors in [26] use information from the population, such as the average of the objective function values and the level of violation of each constraint, at each iteration, to define the penalty parameter. In [27, 28], the normalized objective function value and a sum of the normalized constraint violations are combined to define a modified fitness value. In both papers, a real-coded genetic algorithm (GA) is used in the adaptive penalty algorithm. No theoretical convergence results are supplied in the last-mentioned papers [4, 26,27,28].

Our contribution goes beyond the self-adaptive penalty function proposal. First, we prove that the CGO and the BCGO problems, based on the proposed self-adaptive penalty function, are equivalent in the sense that they have the same global minimizers. A selected set of meta-heuristics, namely the FA, a DE strategy with self-adaptive control parameters (jDE) [30], the particle swarm optimization (PSO) algorithm [31, 32], an evolution strategy with covariance matrix adaptation (CMA-ES) [33] and the artificial bee colony (ABC) algorithm [34], are used to solve the BCGO problem. We also address how to compute, in a population environment, the parameters required to construct the self-adaptive penalty function. Second, in the context of the FA, we provide a hybrid variant by using a local intensification procedure. A convergence analysis of the algorithm, taking into account the structure of the FA and the properties of the proposed self-adaptive penalty function, is established.

The paper is organized as follows. Section 2 presents the new self-adaptive penalty function, Sect. 3 elaborates on the computation of the penalty in a population-based environment and Sect. 4 details the new hybrid self-adaptive penalty FA. Then, the numerical experiments are shown in Sect. 5, and we conclude the paper in Sect. 6.

2 Self-Adaptive Penalty Function

This study aims to propose a self-adaptive penalty framework for solving a CGO problem in the following form

$$\begin{aligned} \underset{x \in X \subset {\mathbb {R}}^n}{\min }\,\, f(x) \quad \text{ subject } \text{ to } \quad g(x) \le 0, \end{aligned}$$
(1)

where \(f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) and \(g: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^p\) are continuous, possibly nonlinear, functions in \(X := \{x \in {\mathbb {R}}^n: -\infty < l_s \le x_s \le u_s < \infty , s=1,\ldots ,n\}\) (a compact set), and the feasible set is defined by \(S:=\left\{ x \in X: g_j(x) \le 0, j=1,\ldots ,p\right\} \). Let \(x^{*}\) be a global minimizer to the problem (1) and let \(f^{*} = f(x^{*})\) be the global minimum. The feasible set \(S\subseteq X\) is assumed to be nonempty with a positive measure. Problems with equality constraints \(h(x)=0\) can be reformulated into the above form using \(h(x)-\delta \le 0\) and \(-h(x)-\delta \le 0\), where \(\delta \) is a small positive tolerance. Since we do not assume that the functions \(f, g_j, j=1,\ldots ,p\) are differentiable, a derivative-free technique that does not assume convexity or differentiability is required for solving the problem (1).
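As a small illustration of this reformulation, the following Python sketch (the function name is illustrative, not from the paper) converts an equality constraint into the two inequalities above; Sect. 5 uses the tolerance \(\delta = 10^{-4}\) for the benchmark problems with equality constraints.

```python
def split_equality(h, delta=1e-4):
    """Reformulate h(x) = 0 as the pair h(x) - delta <= 0 and
    -h(x) - delta <= 0, both in the g_j(x) <= 0 convention of (1)."""
    return (lambda x: h(x) - delta,
            lambda x: -h(x) - delta)
```

The two returned functions can simply be appended to the list of inequality constraints \(g_j\).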

The CGO problem (1) can be formulated as a BCGO problem with an objective penalty function that is related to both f and the constraint violation. Thus, the problem (1) is equivalent to

$$\begin{aligned} \underset{x \in X \subset {\mathbb {R}}^n}{\min }\,\, \phi (x) \end{aligned}$$
(2)

in the sense that they have the same solutions, provided that the objective penalty function \(\phi \) satisfies some properties [3].

In this study, the main goal is to derive a penalty function that is self-adaptive in the sense that the constraint violation weights, which play the role of penalty parameter values, are not provided by the user but rather are computed using information gathered from the violated constraints at the current point. Furthermore, the objective function and the constraint violation values are normalized, taking into consideration reference values of the objective function and constraints achieved in the search space of the problem. The description of the self-adaptive penalty function follows. The objective function value f at each point x is normalized using the two parameters \(f^{\min } := \mathop {\min }\nolimits _{x \in X} f(x)\) and \(f^{\max } := \mathop {\max }\nolimits _{x \in X} f(x)\), so that the new fitness F is computed by:

$$\begin{aligned} F(x) = \dfrac{f(x) - f^{\min }}{f^{\max } - f^{\min }}. \end{aligned}$$
(3)

The violation of each constraint j, at each point x of the search space X, is given by \(\max \{g_j(x),0\}\), and the total violation is the sum of the p violations:

$$\begin{aligned} \Sigma (x) = \displaystyle \sum _{j=1}^p \max \{g_j(x),0\}, \end{aligned}$$
(4)

which is zero if \(x \in S\) (a feasible point) and positive if \(x \notin S\). To scale the constraint violations to the same order of magnitude as the new fitness F, each one is normalized using the following expression:

$$\begin{aligned} V_j(x) = \dfrac{\max \{g_j(x),0\}}{g_j^{\max }}, \quad \text{ where } \quad g_j^{\max } := \max _{x \in X{\setminus }{S}}\left\{ \max \{g_j(x),0\}\right\} \end{aligned}$$
(5)

is the largest value of the violation of the constraint j over all \(x \in X{\setminus }{S}\), where the subset \(X{\setminus }{S}\) is the relative complement of S in X. Finally, the penalty function to be minimized is as follows:

$$\begin{aligned} \phi (x) = \left\{ \begin{array}{ll} F(x),&{} \text{ if } x \in S,\\ F(z) + \dfrac{1}{p} \displaystyle \sum _{j=1}^p V_j(x) r_j, &{} \text{ if } x \in X{\setminus }{S} \text{ and } f(x) \le f(z), \\ F(x) + \dfrac{1}{p} \displaystyle \sum _{j=1}^p V_j(x) r_j, &{} \text{ if } x \in X{\setminus }{S} \text{ and } f(x) > f(z),\\ \end{array} \right. \end{aligned}$$
(6)

where \(z \in S\) is a fixed point such that \(f(z) \ge f^*\) and each weight \(r_j\) is defined by the proportion of the search space X that violates the constraint \(g_j\):

$$\begin{aligned} r_j := \frac{\left| {x\in X: g_j(x) > 0} \right| }{\left| X\right| }, \quad j=1,\ldots ,p. \end{aligned}$$
(7)
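To make the construction concrete, the following Python sketch evaluates \(\phi \) according to (3)-(7), assuming the reference values \(f^{\min }\), \(f^{\max }\), \(g_j^{\max }\), f(z) and the weights \(r_j\) have already been estimated (Sect. 3 describes how this is done over a population); all function and variable names are illustrative, not from the paper.

```python
import numpy as np

def phi(x, f, g, f_z, f_min, f_max, g_max, r):
    """Self-adaptive penalty of Eq. (6); an illustrative sketch.

    f     : objective function, f(x) -> float
    g     : constraint function, g(x) -> array of p values (g_j(x) <= 0 is feasible)
    f_z   : objective value f(z) at the reference feasible point z
    f_min, f_max : reference values for the normalization (3); assumes f_max > f_min
    g_max : array with the largest observed violation of each constraint, Eq. (5)
    r     : array with the weights r_j of Eq. (7)
    """
    F = lambda fx: (fx - f_min) / (f_max - f_min)       # normalized fitness, Eq. (3)
    viol = np.maximum(g(x), 0.0)                        # per-constraint violations
    if viol.sum() == 0.0:                               # x feasible: phi(x) = F(x)
        return F(f(x))
    # normalized violations, Eq. (5); guard the division for never-violated constraints
    V = np.divide(viol, g_max, out=np.zeros_like(viol), where=g_max > 0)
    pen = (V * r).mean()                                # (1/p) * sum_j V_j(x) r_j
    base = F(f_z) if f(x) <= f_z else F(f(x))           # infeasible branches of Eq. (6)
    return base + pen
```

In a population setting, f_min, f_max, g_max and r change from iteration to iteration; the sketch given in Sect. 3 shows one way to update them.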

The next results show that problems (1) and (2) are equivalent, i.e., they have the same global minimizers.

Theorem 2.1

Let \(x^*\in S\) be a global solution to the problem (1) and let \(z \in S\) be such that \(f(z) \ge f(x^*)\). Then, \(x^*\) is a global solution to the problem (2), where \(\phi \) is the penalty function defined in (6).

Proof

Let \(x^*\in S\) be a global solution to the problem (1). By definition, we have \(f(x^*)\le f(x)\) for all \(x\in S\). Hence, for all \(x\in S\), we get:

$$\begin{aligned} \phi (x^*) = \frac{f(x^*)-f^{\min }}{f^{\max }-f^{\min }} \le \frac{f(x)-f^{\min }}{f^{\max }-f^{\min }} = \phi (x). \end{aligned}$$

We now consider the case when \(x \in {X{\setminus }{S}}\). Assuming that (a) \(f(x) \le f(z)\), we have \(\phi (x^*) = F(x^*) \le F(z) < F(z) + \dfrac{1}{p} \sum _{j=1}^p V_j(x) r_j = \phi (x)\), by the definition (6), since \(f(x^*) \le f(z)\) and, x being infeasible, at least one \(V_j(x)\) is positive with a positive weight \(r_j\), so the penalty term is positive. Now, assuming that (b) \(f(x) > f(z)\), we get

$$\begin{aligned} \phi (x^*) = F(x^*) \le F(z)< F(x) < F(x) + \dfrac{1}{p} \sum _{j=1}^p V_j(x) r_j = \phi (x), \end{aligned}$$

and therefore, \(\phi (x^*) \le \phi (x)\) for all \(x \in X\), i.e., \(x^*\) is a global solution to the problem (2). \(\square \)

Lemma 2.1

If \(x^*\) is a global solution to the problem (2), where \(\phi \) is the penalty function defined in (6), then \(x^*\) is a feasible point for the problem (1).

Proof

By contradiction, we assume that \(x^*\in X{\setminus }{S}\). When \(f(x^*) \le f(z)\) and \(z \in S\) we get, from (6), \(\phi (x^*) = F(z) + \dfrac{1}{p} \sum _{j=1}^p V_j(x^*) r_j > F(z) = \phi (z)\); on the other hand, when \(f(x^*) > f(z)\), we obtain [using (6)] \(\phi (x^*) = F(x^*) + \dfrac{1}{p} \sum _{j=1}^p V_j(x^*) r_j> F(x^*) > F(z) = \phi (z)\). Both cases contradict the definition of a global solution to the problem (2). Therefore, \(x^* \in S\). \(\square \)

We are now able to establish the converse of Theorem 2.1.

Theorem 2.2

Let \(x^*\in X\) be a global solution to the problem (2), where \(\phi \) is the penalty function defined by (6). Then, \(x^*\) is a global solution to the problem (1).

Proof

By Lemma 2.1, \(x^*\in S \subseteq X\). We have \(F(x^*) = \phi (x^*) \le \phi (x)\) for all \(x \in X\), and in particular, for all \(x \in S\), we have \(F(x^*) \le F(x)\), which implies \(f(x^*) \le f(x)\). Therefore, \(x^*\) is a global solution to the problem (1). \(\square \)

3 Solving the BCGO Problem

The present penalty method penalizes the violation of the inequality constraints of the problem (1), while the bound constraints are always satisfied when solving (2). According to Theorems 2.1 and 2.2, it is sufficient to find a global solution to the problem (2), that is, a global minimizer of \(\phi (x)\) in X. To solve the BCGO problem, the meta-heuristics FA [7, 9, 15], jDE [30], PSO [31, 32], CMA-ES [33] and ABC [34] have been selected. Since they are population-based algorithms, we now show how to adapt the computation of the parameters \(f^{\min }\), \(f^{\max }\), f(z), \(g_j^{\max }\) and \(r_j\), \(j=1,\ldots ,p\), shown in (3), (5), (6) and (7), to a technique that handles a population of solutions at each iteration.

Let \(X_k := \{x^1_k,\ldots ,x^m_k\}\) represent the population of the \(m < + \infty \) current points at iteration k, where \(x^i_k \in {\mathbb {R}}^n, i=1,\ldots ,m\). To compute the normalized fitness F, as defined in (3), at each point x of the population, the parameters \(f^{\min } := \mathop {\min }\nolimits _{x \in X_k} f(x)\) and \(f^{\max } := \mathop {\max }\nolimits _{x \in X_k} f(x)\) are required, where we note that the point with the lowest function value will have \(F(x) = 0\) and the point with the largest objective function value will have \(F(x)=1\). To compute the normalized violation of the constraint j, the parameter \(g_j^{\max } := \mathop {\max }\nolimits _{x \in X_k} \left\{ \max \{g_j(x),0\}\right\} \) is defined as the largest value of the violation of the constraint j attained over all points in \(X_k\). The reference point z is the feasible point with the lowest objective function value found so far. If the population has no feasible points, f(z) is initially and temporarily set to \(f^{\max }\), so that \(f(x) \le f(z)\) for all \(x \in X_k\) and \(F(z) = 1\). The value of f(z) is updated only when the first feasible point is encountered. Noting that, at each iteration k, the set of m generated trial points is represented by \(T_k:=\{t^1_k,\ldots ,t^m_k\}\), if \(T_k\) contains feasible points, the one with the least function value, say \(f(t^l_k)\), is compared with f(z), and we set \(f(z) = f(t^l_k)\) if \(f(t^l_k) < f(z)\); otherwise f(z) is not updated. Similarly, f(z) is carried over to the next iteration if there are no feasible points in the trial population. Finally, each weight/penalty parameter \(r_j\) is iteratively computed as \(r_j :=\left( \left| {x\in X_k: g_j(x) > 0} \right| \right) /m\) \((j=1,\ldots ,p)\) and represents the proportion of points in the population that violate the constraint \(g_j\). Thus, a constraint that is violated by a larger subset of the population than any other will have a larger weight. A sketch of this bookkeeping is given below.
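The following Python sketch illustrates one iteration of this bookkeeping, assuming the objective and constraint values of the current population have already been evaluated and stored in arrays; the names and the array layout are our own choices, not the paper's.

```python
import numpy as np

def reference_values(pop_f, pop_g, f_z=np.inf, tol=1e-8):
    """Iteration-k parameters of Sect. 3 from a population of m points.

    pop_f : array of shape (m,)   objective values f(x^i_k)
    pop_g : array of shape (m, p) constraint values g_j(x^i_k)
    f_z   : best feasible objective value found so far (np.inf if none yet)
    tol   : feasibility tolerance on the total violation (Sect. 5 uses 1e-8)
    """
    viol = np.maximum(pop_g, 0.0)                # per-point, per-constraint violations
    f_min, f_max = pop_f.min(), pop_f.max()      # normalization bounds for F
    g_max = viol.max(axis=0)                     # largest violation per constraint, Eq. (5)
    r = (pop_g > 0).mean(axis=0)                 # weights r_j: fraction of violating points
    feasible = viol.sum(axis=1) <= tol
    if feasible.any():
        f_z = min(f_z, pop_f[feasible].min())    # update the reference value f(z)
    elif not np.isfinite(f_z):
        f_z = f_max                              # temporary f(z): no feasible point yet
    return f_min, f_max, g_max, r, f_z
```

With these values, the penalty \(\phi \) of Sect. 2 can be evaluated at every current and trial point of the iteration.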

4 Hybrid Self-Adaptive Penalty FA for CGO

This section details the algorithm that implements the self-adaptive penalty concept, while using the meta-heuristic FA to compute the solution of the BCGO problem (2) (see Algorithm 1). This is a hybrid FA in the sense that a local intensification procedure, based on a typical DE mutation operator [29], is implemented aiming to exploit the region around the points of the population. The intensification procedure starts by applying a mutation strategy to the position of the best firefly, \(x^1\), where \(\phi (x^1) < \phi (x^i), i=2,\ldots ,m\), componentwise with probability \(p_m\), to create the mutant best point, \(v^1 = x^1 + F_b \left( x^{i_1} - x^{i_2}\right) \), where \(i_1\) and \(i_2\) are two different indices randomly selected from the set \(\{2,\ldots ,m\}\) and \(F_b > 0\) is a real parameter. A projection onto X is carried out if necessary, \(v^1\) and \(x^1\) are compared, and the preferred point is selected as the new \(x^1\). Here, the preferred point is the one with the smallest f value if both are feasible; otherwise, it is the point with the smallest violation. The DE/best/1 mutation is then applied to the remaining points of the population, \(v^i = x^1 + F_o \left( x^{i_1} - x^{i_2}\right) \), \(i=2,\ldots ,m\), componentwise with probability \(p_m\), where \(F_o >0\) is a real parameter, and \(i_1\) and \(i_2\) are two different indices randomly chosen from the set \(\{1,\ldots ,i-1,i+1,\ldots ,m\}\). The mutant \(v^i\) and \(x^i\) are compared, and the preferred point is maintained for the next iteration. A sketch of this procedure follows Algorithm 1 below.

[Algorithm 1: Hybrid self-adaptive penalty FA]
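A minimal Python sketch of the intensification step is given below. The prefer argument encodes the comparison described above (smallest f value between two feasible points, smallest violation otherwise); the parameter names follow the text, while the implementation details, in particular keeping the non-mutated components of the base point, are our own reading of it.

```python
import numpy as np

def intensification(X, prefer, lower, upper, p_m, F_b, F_o, rng=None):
    """DE-based local intensification of the hybrid FA (a sketch).

    X        : array (m, n) of positions; row 0 holds the best firefly x^1
    prefer   : prefer(v, x) -> the preferred of the two points
    p_m      : componentwise mutation probability
    F_b, F_o : positive mutation factors for the best and remaining points
    """
    rng = rng or np.random.default_rng()
    m, n = X.shape
    # mutate the best firefly using two distinct members from {2,...,m}
    i1, i2 = rng.choice(np.arange(1, m), size=2, replace=False)
    mask = rng.random(n) < p_m                        # componentwise with prob. p_m
    v = np.where(mask, X[0] + F_b * (X[i1] - X[i2]), X[0])
    X[0] = prefer(np.clip(v, lower, upper), X[0])     # project onto X, then compare
    # DE/best/1 mutation for the remaining points
    for i in range(1, m):
        choices = np.delete(np.arange(m), i)          # indices {1,...,m} \ {i}
        i1, i2 = rng.choice(choices, size=2, replace=False)
        mask = rng.random(n) < p_m
        v = np.where(mask, X[0] + F_o * (X[i1] - X[i2]), X[i])
        X[i] = prefer(np.clip(v, lower, upper), X[i])
    return X
```

For instance, prefer can return the point with the smaller f value when both total violations are below the tolerance, and the point with the smaller total violation otherwise, matching the preference rule stated above.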

For the convergence analysis of Algorithm 1, we follow the methodology presented in [3]. Given the properties of the FA and the way the penalty function \(\phi \) is defined, we can establish the following results.

Theorem 4.1

Let \(X_k\) be the current population of m points at iteration k, \(T_k\) be the set of trial points at iteration k, and \(X_{k+1}\) be the population with the points selected for the next iteration \(k+1\). Then \(f(z_k) \ge f(z_{k+1})\), where \(z_k\) is the feasible point with the lowest function value in the set \(X_k\) and \(z_{k+1}\) is the feasible point with the lowest function value after the trial set \(T_k\) has been considered. Furthermore, \(\phi (z_k) \le \phi (t^i_k)\) for all infeasible \(t^i_k \in T_k\).

Proof

Let \(z_{k}\) be the best feasible solution of \(X_k\). Obviously, \(z_k\) will never be replaced by any infeasible point of \(T_k\). We assume now that there exists a feasible point \(t^i_k \in T_k\) such that \(\phi (t^i_k) < \phi (z_k)\). Then, \(\phi (t^i_k) = F(t^i_k) < F(z_k) = \phi (z_k)\) implies \(f(t^i_k) < f(z_k)\), where \(f^{\min }\) and \(f^{\max }\) (in the definition of the fitness F) are selected from the set \(X_k \cup T_k\). We conclude that \(f(z_k) > f(t^i_k) \ge f(z_{k+1})\). However, if the feasible point \(t^i_k \in T_k\) does not satisfy \(\phi (t^i_k) < \phi (z_k)\), then \(\phi (t^i_k) = F(t^i_k) \ge F(z_k) = \phi (z_k)\), which implies \(f(t^i_k) \ge f(z_k)\) and \(f(z_{k+1}) = f(z_k)\). In both cases, \(f(z_k) \ge f(z_{k+1})\). We consider now the case where \(t^i_k \in T_k\) is infeasible. We analyze both cases: (a) \(f(t^i_k) \le f(z_k)\) and (b) \(f(t^i_k) > f(z_k)\). In case (a), assume that \(\phi (t^i_k) < \phi (z_k)\), which implies

$$\begin{aligned} F(z_k) + \dfrac{1}{p} \displaystyle \sum _{j=1}^p V_j\left( t^i_k\right) r_j < F(z_k) \end{aligned}$$
(8)

since \(t^i_k\) is infeasible and \(f(t^i_k) \le f(z_k)\) [see (6)]. However, (8) is a contradiction because the second term on its left-hand side is positive. In case (b), assuming that \(\phi (t^i_k) < \phi (z_k)\), we get \(F(t^i_k) + \dfrac{1}{p} \sum _{j=1}^p V_j(t^i_k) r_j < F(z_k)\), and therefore,

$$\begin{aligned} \dfrac{1}{p} \sum _{j=1}^p V_j(t^i_k) r_j< F(z_k) - F\left( t^i_k\right) = \dfrac{f(z_k) - f\left( t^i_k\right) }{f^{\max } - f^{\min }} < 0 \end{aligned}$$

which is a contradiction. Hence, we must have \(\phi (z_k) \le \phi (t^i_k)\) for all infeasible points \(t^i_k \in T_k\). \(\square \)

In the next theorem, we prove that the sequence \(\{f(z_k)\}\) converges and that its limit is the greatest lower bound, or infimum, \(f^{*}\).

Theorem 4.2

Let \(z_k\) be the feasible point with the lowest objective function value obtained at iteration k. Then, \(\underset{k \rightarrow \infty }{\lim } f(z_k) = f^{*}\).

Proof

By Theorem 4.1, \(\{f(z_k)\}\) is a monotonically decreasing sequence. Since \(f^*\) is the infimum of the sequence, for all \(\delta >0\), \(f^*+ \delta \) is not a lower bound of the sequence. Hence, there exists \(K=K(\delta ) \in \mathbb {N}\) such that

$$\begin{aligned} f^*- \delta< f^*\le f(z_k) \le f(z_K) < f^*+\delta \end{aligned}$$

for all \(k \ge K\), meaning that \(f(z_k) \rightarrow f^*\) as \(k \rightarrow \infty \). \(\square \)

In Algorithm 1, to select between the current and the trial positions, the penalty function values \(\phi (x^i_k)\) and \(\phi (t^i_k)\) are compared. When both \(x^i_k\) and \(t^i_k\) are feasible, the point with the lowest f wins [recall (6) and that the parameters \(f^{\min }\) and \(f^{\max }\) are computed based on the set \(X_k\cup T_k\)]. When both \(x^i_k\) and \(t^i_k\) are infeasible, the selection is determined by their constraint violation and F values combined in the penalty \(\phi \). However, when \(x^i_k\) is feasible and \(t^i_k\) is infeasible, the probability that the trial \(t^i_k\) is selected over \(x^i_k\) as the current point for the next iteration \(k+1\) can be quantified.

Theorem 4.3

Let \(x^i_k \in X_k\), where \(X_k\) is the current population at iteration k, and \(t^i_k \in T_k\), where \(T_k\) is the set of trial points at iteration k, be such that \(x^i_k\) is feasible and \(t^i_k\) is infeasible. Assume that there exists \(0 < \bar{r} \le 1\) such that eventually \(r_{j} \ge \bar{r}\) for \(j=1,\ldots ,p\). Then, the probability of selecting \(t^i_k\) over \(x^i_k\) is zero, i.e., \(Pr\left[ \dfrac{1}{p} \sum _{j=1}^p V_j(t^i_k) r_j < F(x^i_k) - F(z_k) \right] = 0\) for \(r_{j}, j=1,\ldots ,p\), that satisfy \(r_j \ge \bar{r}.\)

Proof

Assume that \(t^i_k \in T_k\) is almost always selected when compared with a feasible \(x^i_k \in X_k\), i.e., \(\phi (t^i_k) < \phi (x^i_k)\). Hence, (a) if \(f(t^i_k) \le f(z_k)\), we have \(\phi (t^i_k) = F(z_k) + \Sigma ^n(t^i_k) < \phi (x^i_k) = F(x^i_k)\) which implies

$$\begin{aligned} 0< \Sigma ^n\left( t^i_k\right) < F\left( x^i_k\right) - F(z_k) \le 1, \end{aligned}$$
(9)

where for simplicity \(\Sigma ^n(t^i_k) = \dfrac{1}{p} \sum _{j=1}^p V_j(t^i_k) r_j > 0\). On the other hand, (b) if \(f(t^i_k) > f(z_k)\), we get \(\phi (t^i_k) = F(t^i_k) + \Sigma ^n(t^i_k) < \phi (x^i_k) = F(x^i_k) \) yielding

$$\begin{aligned} 0< \Sigma ^n\left( t^i_k\right)< F\left( x^i_k\right) - F\left( t^i_k\right) < F\left( x^i_k\right) - F(z_k) \le 1. \end{aligned}$$
(10)

We note that in both (9) and (10), \(f(x^i_k) - f(z_k)>0\), provided that \(x^i_k \ne z_k\). We now study the probability of \(\Sigma ^n(t^i_k) < F(x^i_k) - F(z_k)\) holding. We assume that the trial point \(T^i_k\) is a random variable with realizations \(t^i_k\) and that \(\Sigma ^n(T^i_k)\) increases uniformly away from feasibility. Since \(F(x^i_k) - F(z_k)\) is a fixed number in the range (0, 1], \(Pr\left[ \Sigma ^n(T^i_k)< F(x^i_k) - F(z_k) \right] >0\) holds for (9) and (10). The larger \(F(x^i_k) - F(z_k)\) is, the larger this probability is. However, the probability also depends on \(\Sigma ^n(T^i_k)\). By contradiction, we assume that there exists \(0 < \bar{r} \le 1\) such that \( Pr\left[ \dfrac{1}{p} \sum _{j=1}^p V_j(T^i_k) r_j < F(x^i_k) - F(z_k) \right] > 0\) when \(r_{j}\), \(j=1,\ldots ,p\), satisfy \(r_j \ge \bar{r}\). This means that

$$\begin{aligned} \frac{1}{p} \displaystyle \sum _{j=1}^p V_j\left( t^i_k\right) r_j < F\left( x^i_k\right) - F(z_k) \, \text{ for } \, r_{j} \ge \bar{r}, j=1,\ldots ,p. \end{aligned}$$
(11)

However, there certainly exists a value \(T^i_k = t^i_k\), such that for \(r_{j}\), \(j=1,\ldots ,p\) satisfying \(r_j \ge \bar{r}\), \(\dfrac{1}{p} \displaystyle \sum _{j=1}^p V_j(t^i_k) r_j \ge F(x^i_k) - F(z_k)\), which contradicts (11). \(\square \)

We now consider the situation when \(x^i_k\) is infeasible and the trial \(t^i_k\) is feasible and analyze the probability that the current \(x^i_k\) is selected over \(t^i_k\) as the current point for the next iteration \(k+1\).

Theorem 4.4

Let \(x^i_k \in X_k\), where \(X_k\) is the current population of m points at iteration k, and \(t^i_k \in T_k\), where \(T_k\) is the set of trial points at iteration k, be such that \(x^i_k\) is infeasible and \(t^i_k\) is feasible. Then, there exists \(0 < \bar{r} \le 1\) such that the probability of selecting \(x^i_k\) over \(t^i_k\) is zero, i.e., \(Pr\left[ \dfrac{1}{p} \sum _{j=1}^p V_j(x^i_k) r_j < F(t^i_k) - F(z_k) \right] = 0\) when \(r_{j}, j=1,\ldots ,p\) satisfy \(r_j \ge ~\bar{r}\).

Proof

Assume that \(x^i_k \in X_k\) is almost always selected when compared with a feasible \(t^i_k \in T_k\), which means that \(\phi (x^i_k) < \phi (t^i_k)\). When (a) \(f(x^i_k) \le f(z_k)\), \(\phi (x^i_k) = F(z_k) + \Sigma ^n(x^i_k) < \phi (t^i_k) = F(t^i_k)\) and \(\Sigma ^n(x^i_k) < F(t^i_k) - F(z_k)\) is obtained. When (b) \(f(x^i_k) > f(z_k)\), \(\phi (x^i_k) = F(x^i_k) + \Sigma ^n(x^i_k) < \phi (t^i_k) = F(t^i_k)\) implies \(\Sigma ^n(x^i_k) < F(t^i_k) - F(x^i_k)\) or \(\Sigma ^n(x^i_k) < F(t^i_k) - F(z_k)\).

Assuming that the trial point \(T^i_k\) and \(f(T^i_k)\) are random variables with realizations \(t^i_k\) and \(f(t^i_k)\), respectively, we have that \(F(T^i_k) - F(z_k)\) is bounded (since \(t^i_k\) is feasible, \(f(t^i_k)\) is bounded and f(z) is fixed). Thus, there exists a set of values \(T^i_k = t^i_k\) such that \(\Sigma ^n(x^i_k) < F(t^i_k) - F(z_k)\) holds, which means that \( Pr\left[ \Sigma ^n(x^i_k) < F(T^i_k) - F(z_k) \right] > 0\). However, there certainly exists \(0 < \bar{r} \le 1\) such that \(\dfrac{1}{p} \displaystyle \sum _{j=1}^p V_j(x^i_k) r_j > F(t^i_k) - F(z_k)\) holds for \(r_{j}\), \(j=1,\ldots ,p\) that satisfy \(r_j \ge \bar{r}\), implying that \(Pr\left[ \dfrac{1}{p} \sum _{j=1}^p V_j(x^i_k) r_j < F(T^i_k) - F(z_k) \right] = 0\) for \(r_j \ge \bar{r}, j=1,\ldots ,p\). \(\square \)

5 Numerical Experiments

In this section, the performance of the self-adaptive penalty technique when solving a benchmark set of CGO problems is investigated. Unless otherwise stated, we set \(m = 50\). In the context of defining the reference point z and the value of the penalty in (6), a point x is considered feasible if \(\Sigma (x) \le 10^{-8}\).

First, we aim to analyze the effectiveness of the technique when using a meta-heuristic to compute a global minimizer of the penalty \(\phi (x)\) in X, as defined by the BCGO problem (2). The FA, jDE, PSO, CMA-ES and ABC meta-heuristics are tested, using the parameter values suggested in the papers [21, 30, 32,33,34]. For this experiment, the set g01–g13 of the g-collection is used, noting that problems g03, g05, g11 and g13 have equality constraints, for which the tolerance \(\delta = 10^{-4}\) is used. In these comparisons, we stop the algorithms after 200,000 function evaluations. The results are summarized in Table 1, where 'average' and 'SD' represent the average and the standard deviation of the function values obtained by the algorithms over 20 runs. The best known optimal solutions, '\(f^{*}\)', are displayed in Table 2. Best results (wins) are shown in bold, and ties in italics. From the table, it is possible to see that the FA has a larger number of wins than the others in both criteria. Overall, the self-adaptive penalty technique, with simple and easy-to-code meta-heuristics for solving the BCGO problem, is effective in finding global optimal solutions to CGO problems.

Table 1 ‘Average’ and ‘SD’ produced by the self-adaptive penalty algorithm
Table 2 Results from our study and from [3]
Table 3 Results from our study and from [4]
Table 4 Results from our study and from [27]
Table 5 Results from our study and from [36]
Table 6 Comparing our results with those in [38, 39]

Second, we aim to compare the hybrid self-adaptive penalty FA with other algorithms available in the literature. Three recently proposed adaptive penalty-based stochastic global optimizers [3, 4, 27] are used for the comparison. When invoking the local intensification search in the FA, some parameters, namely \(p_m\), \(F_b\) and \(F_o\), have been chosen to be problem dependent, with the goal of achieving the best performance. To compare our results with those reported in [3] (an adaptive penalty-based DE algorithm), we stop the algorithm after 50,000 function evaluations (as indicated in [3]). The results are summarized in Table 2, where 'best\(_E\)', 'worst\(_E\)' and 'SD\(_E\)' represent the best error value, \(f_{\text {best}} - f^*\), the worst error and the standard deviation of the error values, based on 100 runs, respectively. Although the results produced by our algorithm are satisfactory, they are not superior to those reported in [3], except for problems g01 and g13, with g12 being a tie. A larger number of function evaluations would certainly be required for some problems. While the local search has provided good quality solutions, it has increased the computational effort.

When comparing our results with those produced by DUVDE + APM in [4] (the APM with dynamic use of DE variants), the subset g01–g11 is used. The results are summarized in Table 3, where the 'best', the 'average' and the 'SD' of the solutions obtained in 20 independent runs are shown. The algorithms terminate after 350,000 function evaluations. The conclusion is that our algorithm is able to produce comparable, high-quality solutions when a larger number of evaluations is allowed.

Table 4 shows the results obtained after 50 runs, produced by our algorithm when solving the set g01–g13 with \(m=100\) and a maximum of 500,000 function evaluations (as in [27], where a self-adaptive penalty-based GA is used). The results of our study are in general superior to those reported in [27], and we reiterate the previous conclusions.

Now, we compare our algorithm with a modified ABC algorithm that uses Deb's rules, consisting of three simple heuristic rules for constraint handling [36]. The following conditions are considered: \(m=40\), 30 runs and a maximum of 240,000 function evaluations (as in [36]). From the results in Table 5, it is possible to conclude that the hybrid self-adaptive penalty FA performs similarly to the modified ABC on nine problems, is better on g06 and g13, and is worse on g05 and g10.

Finally, a set of 20 problems available at http://www.ime.usp.br/~egbirgin/ is used. We aim to compare the herein proposed hybrid self-adaptive penalty FA with other penalty-type approaches. The comparison involves the results presented in [38], where an augmented Lagrangian framework is combined with a meta-heuristic known as the artificial fish swarm algorithm, and those reported in [39], where a nondifferentiable exact penalty function framework is implemented with the deterministic DIRECT algorithm. The results are summarized in Table 6, where 'best' is the best solution found among the 30 runs, 'median' is the median of the 30 solutions and 'n.f.e.(b)' is the number of function evaluations needed to reach the value 'best'. The solution, 'sol.', and the number of function evaluations, 'n.f.e.', reported in [39], as well as the best known solution available in the literature, '\(f^*\)', are also shown in the table. When we compare our results with those in [38], we conclude that the quality of the obtained solutions is comparable, although a larger number of function evaluations is needed to reach those solutions. On the other hand, the quality of our solutions is superior to that displayed by the penalty-based DIRECT algorithm [39].

6 Conclusions

We present a new self-adaptive penalty function that penalizes solutions violating the constraints of the problem and is user-independent in the sense that the penalty parameter values are set automatically using information gathered from the violated constraints at each iteration. We establish an equivalence between the CGO problem and the BCGO problem with the self-adaptive penalty objective. The paper also shows the practical performance of a set of well-known meta-heuristics when solving the BCGO problem, demonstrating that they are effective in converging to global solutions. Due to the superior performance of the recent FA meta-heuristic, the paper proposes a hybrid FA aiming to enhance the quality of the solutions. The convergence analysis of the algorithm has also been established. With the numerical experiments carried out on the two sets of benchmark problems, we demonstrate that the proposed self-adaptive penalty method is effective in solving CGO problems. Future developments will focus on solving higher-dimensional optimization problems and reducing the computational effort in terms of function evaluations.