1 Introduction

Many resource allocation or planning problems require compromises among the objectives of several interacting individuals or agencies. These are, most of the time, arranged in a hierarchical administrative structure and can have independent, sometimes conflicting, objectives. A decision maker (DM) at one level of the hierarchy may have its objective function determined partly by variables controlled at other levels. Assuming that the decision process has a preemptive nature and the hierarchy has \(k\) levels, we consider the decision maker at the first (top) level to be the leader and those at the lower levels to be followers.

Such decision making problems are often modeled as Stackelberg games with a finite number of hierarchical levels. We refer to these problems as decentralized multilevel systems or multilevel Stackelberg problems (MSPs). They are also referred to as multilevel programming problems in some of the literature and have attracted the attention of researchers over the last four decades.

The MSP was first proposed by Bracken and McGill (1973) to model decentralized non-cooperative decision making problems. However, the formulation of the problem in the form of bilevel and multilevel programs was first used in Candler and Townsley (1982) (see also Colson et al. (2007) for details of the historical development of the problem). MSPs involving random variables were first proposed in Patriksson and Wynter (1999). These kinds of problems are common in various areas of application.

The mathematical form of a general \(k\)-level Stackelberg problem can be expressed as:

$$\begin{aligned} \begin{array}{l} \displaystyle \min _{x_{1}\in X_{1}} f_{1}(x_{1},x_{2},\ldots ,x_{k}) \\ \quad \text {s.t. } (x_{1},x_{2},\ldots ,x_{k})\in S_{1}, \text { where } (x_{2},\ldots ,x_{k}) \text { solves} \\ \qquad \displaystyle \min _{x_{2}\in X_{2}} f_{2}(x_{1},x_{2},\ldots ,x_{k}) \\ \qquad \quad \text {s.t. } (x_{1},x_{2},\ldots ,x_{k})\in S_{2}, \text { where } (x_{3},\ldots ,x_{k}) \text { solves} \\ \qquad \qquad \vdots \\ \qquad \qquad \displaystyle \min _{x_{k}\in X_{k}} f_{k}(x_{1},x_{2},\ldots ,x_{k}) \\ \qquad \qquad \quad \text {s.t. } (x_{1},x_{2},\ldots ,x_{k})\in S_{k} \end{array} \end{aligned}$$

(1)

For a mathematical formulation of MSPs, consider the problem in (1), composed of \(k\) levels each characterized by an individual objective function \(f_{i}\), \(i= 1,2,\ldots ,k\), which is to be minimized by the respective DM. Let the decision variable space \(\mathbb {R}^{n}\) be partitioned among the \(k\) levels, such that \((x_{1},\ldots ,x_{k})\in X_{1}\times \cdots \times X_{k}=S\subseteq \mathbb {R}^{n}\), where \(X_{i} \subseteq \mathbb {R}^{n_{i}}\), \(\sum _{i=1}^{k}n_{i}=n\) and \(S\) is a nonempty set. Assume that decisions are made sequentially, beginning with the leader DM who controls a vector \(x_{1} \in X_{1}\), followed by the \(2^{nd}\) DM who controls a vector \(x_{2}\in X_{2}\), down through the \(k^{th}\) DM who controls a vector \(x_{k}\in X_{k}\). If there are two DMs, where the leader DM makes a decision first and the follower DM, knowing the decision of the opponent, makes a decision next, then the problem can be formulated as a bilevel Stackelberg problem.

Some of the basic components in MSPs are defined below.

  1. 1.

    Constraint region of the MSP is

    $$\begin{aligned} \varOmega = \{ (x_{1},\ldots ,x_{k}) \in S: (x_{1},\ldots ,x_{k}) \in S_{1}\cap S_{2}\cap \cdots \cap S_{k}\} \end{aligned}$$
  2. 2.

    For each given vector \((x_{1}^{*},\ldots ,x_{p}^{*})\in X_{1}\times \cdots \times X_{p}\), \(1\le p<k,\) the feasible region of the \((p+1)^{th}\) and lower levels is given by:

    $$\begin{aligned} \varOmega (x_{1}^{*},\ldots ,x_{p}^{*}) = \{(x_{p+1},\ldots ,x_{k})\in X_{p+1}\times \cdots \times X_{k}: (x_{1}^{*},\ldots ,x_{p}^{*},x_{p+1},\ldots ,x_{k}) \in S_{p+1}\cap \cdots \cap S_{k}\} \end{aligned}$$

  3. 3.

    Projection of \( \varOmega \) onto the \(1^{st}, \ldots ,p^{th}\), \(1\le p<k\), levels decision space is given by:

    $$\begin{aligned} \varOmega (X_{1},\ldots ,X_{p}) = \{(x_{1},\ldots ,x_{p}) : \exists (x_{p+1},\ldots ,x_{k}), (x_{1},\ldots ,x_{k})\in \varOmega \} \end{aligned}$$
  4. 4.

    For each \((x_{1}^{*},\ldots ,x_{p}^{*}) \in \varOmega (X_{1},\ldots ,X_{p})\), \(1\le p< k\), the rational reaction set of the \((p+1)^{th}\) and lower levels is given recursively by:

    $$\begin{aligned} M(x_{1}^{*},\ldots ,x_{p}^{*}) = \mathop {\mathrm{argmin}}\limits _{(x_{p+1},\ldots ,x_{k})}\{f_{p+1}(x_{1}^{*},\ldots ,x_{p}^{*},x_{p+1},\ldots ,x_{k}) : (x_{p+1},\ldots ,x_{k})\in \varOmega (x_{1}^{*},\ldots ,x_{p}^{*}),\ (x_{p+2},\ldots ,x_{k})\in M(x_{1}^{*},\ldots ,x_{p}^{*},x_{p+1})\}, \end{aligned}$$

    with \(M(x_{1}^{*},\ldots ,x_{k-1}^{*}) = \mathop {\mathrm{argmin}}\limits _{x_{k}}\{f_{k}(x_{1}^{*},\ldots ,x_{k-1}^{*},x_{k}) : x_{k}\in \varOmega (x_{1}^{*},\ldots ,x_{k-1}^{*})\}\) for \(p=k-1\).

  5. 5.

    The Induced Region \((\mathcal {I\!R})\) at level one is:

    $$\begin{aligned} \mathcal {I\!R}=\{(x_{1},\dots ,x_{k})\in S_{1}: x_{1}\in X_{1}, (x_{2},\dots ,x_{k}) \in M(x_{1})\} \end{aligned}$$
  6. 6.

    Using the set \(\mathcal {I\!R}\), one can describe the MSP at level one as:

    $$\begin{aligned} \min _{ (x_{1},\dots ,x_{k})\in \mathcal {I\!R}} f_{1}(x_{1},\dots ,x_{k}) \end{aligned}$$
    (2)

We refer to any optimal solution of problem (2) as a Stackelberg solution of problem (1).

Because of the hierarchical structure of the rational reaction sets, MSPs are difficult to solve. Optimization algorithms for such problems are generally limited to convex and regular (smooth) functions, and the solution methods proposed for MSPs are mainly for bilevel Stackelberg problems with linear or convex structure. Moreover, the linear bilevel Stackelberg problem has been shown to be NP-hard in Ben-Ayed and Blair (1990) and strongly NP-hard in Hansen et al. (1999). However, many functions that can model real life problems are multi-modal, discontinuous, and non-differentiable.

To solve MSPs, many numerical algorithms for linear bilevel programming problems have been designed in the last two decades, such as implicit enumeration scheme in Candler and Townsley (1982), k-th best algorithm in Bialas and Karwan (1984), branch and bound algorithm in Hansen et al. (1999), grid search algorithm in Bard (1983), descent algorithm in Savard and Gauvin (1994), penalty functions method in White and Anandalingam (1993), fuzzy approach in Shih et al. (1996) and genetic algorithms in Liu (1998), Hejazi et al. (2002) and Wang et al. (2011).

Calvete et al. (2009) have proposed a genetic algorithm to solve linear fractional bilevel problems over a polyhedral constraint region. Since the population generated for the algorithm consists of extreme points of the polyhedron, it cannot be extended to a general bilevel problem with a non-polyhedral constraint region. On the other hand, the algorithm for nonlinear bilevel Stackelberg problems proposed in Li and Wang (2011) assumes a unique reaction \(y\) for each \(x\) selected by the leader DM. Under this assumption, the follower’s problem is considered to be equivalent to its stationary system based on the Karush–Kuhn–Tucker (KKT) conditions. Then \(y\) is replaced by \(y(x)\), the solution of the KKT system, and as a result the bilevel Stackelberg problem is converted to a single-level optimization problem involving only \(x\), to which an evolutionary algorithm is applied to find an optimal value of the leader’s objective function. If the inner problem is non-convex, or convex but non-differentiable, this procedure fails to work.

In Wang et al. (2005), another algorithm for nonlinear bilevel Stackelberg problems is proposed by constructing an optimization problem with two objective functions, such that a Pareto optimal solution of the two-objective problem is an optimal solution of a single-level problem obtained from the bilevel Stackelberg problem using the KKT conditions. To find the Pareto optimal solution of the equivalent two-objective optimization problem, an evolutionary algorithm is applied. However, this method works only for special classes of bilevel Stackelberg problems, as Pareto optimal points need not be optimal for the bilevel problem and vice versa, as described in Dempe (2002).

In Hejazi et al. (2002), a genetic algorithm for linear bilevel Stackelberg problems is proposed. However, the genetic algorithm is applied only to a single-level problem obtained from the bilevel Stackelberg problem by imposing the Kuhn–Tucker conditions on the second-level problem.

The algorithms proposed in Liu (1998) and Tilahun et al. (2012) are considerably more general, for bilevel and multilevel programming problems respectively.

In Liu (1998), a genetic algorithm is proposed for finding Stackelberg–Nash equilibria of general bilevel programming models with multiple followers, among whom there may be information exchange. The algorithm does not assume convexity, continuity, or differentiability of the models. However, neither the extension of the algorithm to general bilevel Stackelberg problems nor its convergence to the desired solution is studied for any formulation of the problem.

The algorithm proposed in Tilahun et al. (2012) tries to solve a multilevel problem by calling a (1 \(+\) 1)-evolutionary strategy at each level while fixing the decision variables of the other levels. However, running a (1 \(+\) 1)-evolutionary strategy at every step of movement decreases the efficiency of the algorithm. Moreover, the convergence of the algorithm to the required solution is not studied, and the algorithm may converge to solutions that are far from a global optimal solution.

Recently, a solution strategy for MSPs with a special class of non-convex objective functions in the inner-level problems has been proposed in Kassa and Kassa (2013). It convexifies the inner-level problems and uses a multi-parametric programming approach within a branch-and-bound algorithm to find a global approximate solution for MSPs with non-convexity at their inner levels.

In this paper, we propose a new evolutionary-type algorithm for general MSPs with bounded decision variables. A unique feature of the algorithm is that it is not affected by the behavior of the functions involved in a problem, and its convergence has been studied for a class of problems containing non-convex and non-differentiable functions at any level.

Most existing research on MSPs describes the quality of approximate solutions by just looking at the objective function values at each level, which in general may not be correct. Therefore, new definitions concerning \((\varepsilon ,\delta )\)-approximate Stackelberg solutions are presented in this paper. These definitions are used to select the best solution from a given set of approximate Stackelberg solutions.

The paper is organized as follows. In Sect. 2, the proposed algorithm is presented. Section 3 presents new results on \((\varepsilon ,\delta )\)-approximate Stackelberg solutions and a convergence proof of the proposed algorithm. Numerical results for illustrative examples are given in Sect. 4. Section 5 presents concluding remarks.

2 Systematic evolutionary algorithm for general MSPs with bounded decision variables (SEAMSP)

2.1 Description of the algorithm

SEAMSP searches for Stackelberg solutions of a given MSP in a hierarchical way. Within an iteration, the leader DM observes the reaction of the lower-level DMs by passing different values of the variables under its control. We call those values “virtual control parameters (VCPs)” of the leader DM. Similarly, for each hierarchical VCP passed by the DMs above their level, the other DMs observe the reactions of the DMs below their level by passing their own VCPs separately. The reactions are made sequentially, beginning from the lowest (\(k\)th) DM up through the \(2\)nd DM. The leader DM records the feedback corresponding to all of its VCPs and selects the best of all based on its objective function and constraint region. The best VCP of the leader DM, together with all the hierarchical reactions of the lower-level DMs, is placed temporarily in a stock and taken as the best hierarchical VCP of the iteration. In subsequent iterations the above steps are repeated until a termination requirement is fulfilled. The best of the best hierarchical VCPs over all iterations, with respect to the leader’s objective function, is proposed as an approximate Stackelberg solution of the MSP.

As the number of VCPs for each level increases, the probability of getting better results increases, but the number of function evaluations also increases, so that the computational time of the algorithm would not be acceptable. Instead, in SEAMSP each DM sends a few but “intelligent” VCPs, coordinated with the VCPs of the other DMs, so that they represent the whole decision spaces in a random, unique, diverse and systematic way. The decision space of a variable is divided into as many equal regions as there are VCPs, where each VCP represents a unique region. Once the decision spaces are represented using systematic sampling, the DMs select candidates for a decision/reaction from the representatives of their spaces in a hierarchical way. Among other statistical sampling techniques we use systematic sampling to: (1) assure representation of the whole decision spaces in a random, unique and diverse way; (2) protect against a clustered selection of VCPs; (3) manage the selection of VCPs at each iteration, so that each iteration comes with new hierarchical VCPs from a region which did not have representatives before.

If a VCP is selected as a candidate for a reaction by a DM of any level below the leader, a mutation operator is applied to search for a better local representative in the region where the VCP was selected. The best among a candidate (parent) and its offspring is selected by the DM according to its own objective values. Every iteration comes with new VCPs of the leader DM from a region that does not yet have a representative in the first decision space and tours the same hierarchical path for a decision.
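
As an illustration of this hierarchical passing of VCPs, the following minimal MATLAB sketch performs one SEAMSP-style selection sweep for a bilevel instance (\(k=2\)). The handles f1, f2, the feasibility tests inS1, inS2 and the VCP grids X1cand, X2cand are placeholder names introduced only for this sketch; the mutation step and the systematic generation of the grids (described in Sect. 2.2) are omitted.

% One selection sweep of SEAMSP for a bilevel problem (k = 2).
% f1, f2: objective handles; inS1, inS2: feasibility tests for S1 and S2;
% X1cand, X2cand: matrices whose rows are the VCPs of the leader and follower.
function [best, bestVal] = bilevel_sweep(f1, f2, inS1, inS2, X1cand, X2cand)
best = []; bestVal = Inf;
for i = 1:size(X1cand,1)
    x1 = X1cand(i,:);
    % follower's reaction: its best feasible VCP for the fixed x1
    react = []; reactVal = Inf;
    for j = 1:size(X2cand,1)
        x2 = X2cand(j,:);
        if inS2(x1,x2) && f2(x1,x2) < reactVal
            reactVal = f2(x1,x2); react = x2;
        end
    end
    % the leader keeps the pair only if it also lies in its own constraint region
    if ~isempty(react) && inS1(x1,react) && f1(x1,react) < bestVal
        bestVal = f1(x1,react); best = [x1, react];
    end
end
end

The pair returned by such a sweep is what is stored as the iteration’s best hierarchical VCP; in the full algorithm the follower’s reaction is refined by mutation before the leader’s comparison.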

The main idea of the proposed algorithm, more or less, depends on:

  1. 1.

    Systematic and Fair distribution of VCPs on \(S\),

  2. 2.

    Definition of Mutation operator,

  3. 3.

    Selection of initial solution set and adaptation to the environment,

  4. 4.

    Termination criteria of the algorithm

The flowchart in Fig. 1 shows the procedures followed by the proposed algorithm to solve a k-level Stackelberg problem (1).

Fig. 1 SEAMSP flowchart for MSPs

2.2 Detailed procedures of the proposed algorithm for the k-level Stackelberg problem, SEAMSP

For the algorithm we require the following Input values:    \(n_{j}\), \(m_{j}\), \(M_{j}\) \((\)where \(X_{j}=[m_{j},M_{j}]^{n_{j}})\), \(S_{j}\), and \(f_{j},\) \(j=1,2,\dots ,k\).

Notations:

  • \(d_{j} := M_{j} - m_{j}\), for \(j=1,2,\dots ,k\)

  • \(X_{ji}:=\{x^{*}: x^{*}\) is a VCP on \(X_{j}\), at iteration \(i\}\subseteq X_{j}\), \(j=1,2,\dots ,k\)

  • \(2b_{j}:=\) number of VCPs on a single dimension space of \(X_{j}\), \(j=1,2,\dots ,k\)

  • \(maxiter :=\) maximum number of iterations before termination

  • \(\pi _{j} := 1+|S_{j}|\), where \(|S_{j}|\) denotes the number of constraint functions in \(S_{j}\), \(j=1,2,\dots ,k\)

  • \(a \lesssim b:=\) greatest natural number \(a\) which is less than or equal to \(b\)

  • \(a-b \approx 0:=\) possible numbers \(a\) and \(b\) with minimum value of \(|a-b|\)

  • \(d := \prod _{j=1}^{k}{d_{j}^{n_{j}}}\), \(d' := \prod _{j=2}^{k}{d_{j}^{n_{j}}}\), \(n := \sum _{j=1}^{k}{n_{j}}\)

Parameters of the algorithm:

  • \(\alpha \) := a maximum number of hierarchical VCPs on \(S=X_{1}\times \dots \times X_{k}\), in one iteration

  • \(\gamma \) := a minimum number of iterations, if not terminated by \(maxiter\)

  • \(t(<\gamma )\) := number of nonempty initial solutions required before the tolerance of the iterations’ initial solutions, based on the leader’s objective function, starts to be computed

  • \(\epsilon _{1}\) := tolerance of a reaction, from VCPs on \(X_{j}\), \(j=2,3,\dots k\)

  • \(\epsilon \) := maximum tolerance, with respect to the leader’s problem, of \((\gamma - t)\) consecutive initial solutions of iterations

  • \(\beta _{1}\) := minimum number of VCPs on \(X_{1}\) for \(d_{1}=1\), if not terminated by tolerance

  • \(\beta _{2}\) := minimum number of iterations, if not terminated by tolerance

In the algorithm, initial populations are systematically selected and processed using the notations and parameter values given above. The procedure is described in the following steps.

  1. 1)

    The maximum number of VCPs on a single-dimensional space of \(X_{j}\), denoted \(a_{j}\), is selected as follows. To distribute the VCPs fairly, we consider the following \(k\) equations:

    $$\begin{aligned}&\displaystyle \prod _{j=1}^{k}a_{j}^{n_{j}} \lesssim \alpha \end{aligned}$$
    (3)
    $$\begin{aligned}&\displaystyle \gamma \left( \frac{a_{1}}{d_{1}}\right) ^{n_{1}} - \left( \frac{a_{j}}{d_{j}}\right) ^{n_{j}} \approx 0 \, , \text { where } j=2,3,\ldots ,k \end{aligned}$$
    (4)

    Equation (3) indicates that the total number of hierarchical VCPs is less than or equal to \(\alpha \), and we recommend \(\alpha \) to be directly proportional to \(d\). Equation (4) indicates that \(|X_{ji}|\) is directly proportional to \(d_{j}\), for \(j=1,\dots ,k\). From Eqs. (3) and (4), we arrive at the following approximations:

    $$\begin{aligned} a_{1} \lesssim \left[ \frac{\alpha d_{1}^{n_{1}(k-1)}}{\gamma ^{k-1} d'}\right] ^{\frac{1}{n_{1}k}}, \text {~} a_{j} \lesssim d_{j}\left[ \gamma \left( \frac{a_{1}}{d_{1}}\right) ^{n_{1}}\right] ^\frac{1}{n_{j}}, \end{aligned}$$
    (5)

    where \(j=2,3,\ldots ,k\). Natural numbers \(a_{j}>1\), \(j=1,\dots ,k\), are chosen from (5), for sufficiently large \(\alpha \). Set \(b_{j}=floor\left( \frac{a_{j}}{2}\right) \), \(j=1,2,\dots ,k\), and step-size \(\delta _{j}=\frac{d_{j}}{b_{j}},\) where the range \([m_{j},M_{j}]\) is divided into \(b_{j}\) intervals of step-size \(\delta _{j}\), each containing two subintervals of step-size \(\frac{\delta _{j}}{2}\). For \(j>1\), two VCPs are selected randomly from the two subintervals of the first interval, and the rest are distributed with a step-size of \(\delta _{j}\) from these two randomly selected VCPs, using systematic sampling. To avoid repetition of VCPs on \(X_{1}\), the two subintervals of the first interval are divided into \(maxiter\) distinct sub-subintervals of step size \(\delta _{11}\), where \(\delta _{11}=\frac{\delta _{1}}{2maxiter}\). At each iteration, new VCPs are randomly selected from two unused sub-subintervals of the two subintervals of the first interval and distributed with the step size \(\delta _{1}\) using the systematic sampling technique (Figs. 2, 3, 4 and 5; a sketch of this sampling is given after this list).

  2. 2)

    After the representatives of each decision space are chosen, every VCP on \(X_{1}\) is sent to search for good reactions from the VCPs on \(X_{j}\), \(j>1\). While fixing \((x_{1}^{*},\ldots ,x_{j-1}^{*})\), if \(f_{j}(x_{1}^{*},\ldots ,x_{k}^{*})=v_{j}\) denotes the minimum \(f_{j}\) value at the hierarchical reaction from the \(VCPs^{*}\) on \(X_{j}\times \cdots \times X_{k}\), then any point \((x_{j}',\ldots ,x_{k}')\) from the \(VCPs^{*}\) on \(X_{j}\times \dots \times X_{k}\) such that \((x_{1}^{*},\ldots ,x_{i-1}^{*},x_{i}',\ldots ,x_{k}')\in S_{i}\) and \(f_{i}(x_{1}^{*},\ldots ,x_{i-1}^{*},x_{i}',x_{i+1}^{*},\ldots ,x_{k}^{*}) \le v_{i} + \epsilon _{1}(1+|f_{i}(x_{1}^{*},\ldots ,x_{i-1}^{*},x_{i}',x_{i+1}^{*},\ldots ,x_{k}^{*})|) \), \(\forall i\) with \(j\le i \le k\), is considered to be an approximate reaction to \((x_{1}^{*},x_{2}^{*},\ldots ,x_{j-1}^{*})\). The values of \(v_{j}\) are determined by evaluating the objective function \(f_{j}\) over the constraint region \(S_{j}\cap \cdots \cap S_{k}\) at the \(VCPs^{*}\) on \(X_{j}\) paired with the hierarchical reaction from the \(VCPs^{*}\) on \(X_{j}\times \cdots \times X_{k}\). The best among all the representatives of the decision space \(X_{j}\) is taken as a parent and mutated to find better representatives within a region bounded by the movement scale of the mutation operator. The best of the parent and its offspring is selected by the respective DM. The mutation operator is applied to all parents subject to:

    1. (a)

      random movements of parents within restricted region

    2. (b)

      against any repetition of offsprings

    For instance, if we consider a parent \(\uplambda _{j}=[x_{1},\ldots , x_{n_{j}}]\in \mathbb {R}^{n_{j}}, 2\le j\le k\), then the mutation operator is defined by:

    $$\begin{aligned} {{\uplambda }_{j}^{{mut}_{m}}}= \left( \begin{matrix} x_{1}+(A(m,1) \delta _{j} rand) \\ \vdots \\ x_{n_{j}}+(A(m,n_{j}) \delta _{j} rand) \end{matrix} \right) , \end{aligned}$$

    where \(m \in \{1,2,\ldots ,2^{n_{j}}\}\), \(A\) is a \(2^{n_{j}}\times n_{j}\) matrix whose rows are the possible sign patterns with entries \(1\) and \(-1\) (in MATLAB, \(A=rowexch(n_{j},2^{n_{j}})\) can give the required matrix), and \(rand\) denotes a random number between \(0\) and \(1\). In general, the offspring of a parent are reproduced from distinct, equal rectangular neighborhoods (line segments, rectangles, rectangular boxes, etc.) of their parent (a sketch of this operator is given after this list).

  3. 3)

    For any fixed leader’s VCP \(x_{1}^{*}\), if \((x_{2}^{*},\ldots ,x_{k}^{*})\) denotes the reaction from the \(VCPs^{*}\) on \(X_{2}\times \cdots \times X_{k}\) and \((x_{1}^{*},\ldots ,x_{k}^{*})\in S_{1}\), then the point \((x_{1}^{*},\ldots ,x_{k}^{*})\) is selected as an approximate point of the induced region, \(\mathcal {I\!R}\). An initial solution is selected from all approximate points of \(\mathcal {I\!R}\) with the minimum value of the leader’s objective function \(f_{1}\). For a better adaptation to the environment, the initial solution is updated at any iteration having a better \(f_{1}\) value (Table 1). In the situation where every representative \((x_{1}^{*},\ldots ,x_{j-1}^{*})\), \(1<j\le k\), has a unique reaction \((x_{j}^{*},\ldots ,x_{k}^{*})\) from the \(VCPs^{*}\) in \(X_{j}\times \cdots \times X_{k}\) with \((x_{1}^{*},\ldots ,x_{k}^{*})\in S_{1}\), the total number of function evaluations in one iteration is:

    $$\begin{aligned} \sum _{j=1}^{k}\pi _{j}\left( \prod _{i=1}^{j}(2b_{i})^{n_{i}}\right) + \sum _{j=2}^{k}\left( \pi _{j}2^{n_{j}}\prod _{i=1}^{j-1}(2b_{i})^{n_{i}}\right) \end{aligned}$$
    (6)
  4. 4)

    The algorithm terminates whenever the number of iterations reaches \(maxiter\) or \(tol^{(p)}<\epsilon \) for some \(p\in \{1,2,3,\ldots \}\) (a sketch of this test is given at the end of this subsection), where

    $$\begin{aligned} maxiter= & {} \max \left( \beta _{1}\left( \frac{d_{1}}{2b_{1}}\right) ^{n_{1}},\beta _{2}\right) , ~~~~~~\\ tol^{(p)}= & {} \sum _{i=t+(\gamma -t)(p-1)+1}^{t+(\gamma -t)p} \frac{f_{1}(x_{1}^{i},\ldots ,x_{k}^{i})-f_{1}(x_{1}^{i+1},\ldots ,x_{k}^{i+1})}{1+|f_{1}(x_{1}^{i},\ldots ,x_{k}^{i})|}, \end{aligned}$$

    \((x_{1}^{i},\ldots ,x_{k}^{i})\) is an initial solution at iteration \(i\).
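
The VCP generation of step 1) and the mutation operator of step 2) can be sketched in base MATLAB as follows; the helper names are ours, the deterministic choice of the \(i\)th sub-subinterval for the leader replaces the paper’s random choice of unused sub-subintervals, and the explicit enumeration of sign patterns stands in for the Statistics Toolbox call \(rowexch\) mentioned above.

% Step 1, j > 1: systematic sampling of 2*bj VCPs on one dimension of Xj = [mj, Mj].
% Two random seeds are drawn from the two halves of the first interval and then
% replicated with step size deltaj over the remaining bj - 1 intervals.
function vcp = vcps_lower_level(mj, Mj, bj)
deltaj = (Mj - mj) / bj;
seeds  = [mj + (deltaj/2)*rand, mj + deltaj/2 + (deltaj/2)*rand];
vcp = zeros(1, 2*bj);
for i = 0:bj-1
    vcp(2*i+1) = seeds(1) + i*deltaj;
    vcp(2*i+2) = seeds(2) + i*deltaj;
end
end

% Step 1, leader: at iteration iter the seeds come from the iter-th sub-subinterval
% (width delta11 = delta1/(2*maxiter)) of each half, so no region of X1 is
% represented twice over the whole run.
function vcp = vcps_leader(m1, M1, b1, maxiter, iter)
delta1  = (M1 - m1) / b1;
delta11 = delta1 / (2*maxiter);
seeds   = [m1 + (iter-1)*delta11 + delta11*rand, ...
           m1 + delta1/2 + (iter-1)*delta11 + delta11*rand];
vcp = zeros(1, 2*b1);
for i = 0:b1-1
    vcp(2*i+1) = seeds(1) + i*delta1;
    vcp(2*i+2) = seeds(2) + i*delta1;
end
end

% Step 2: mutation of a parent in R^{nj}; one offspring per sign pattern, each
% displaced inside its own box of side deltaj around the parent.  A shared random
% scale is used here; a per-component rand is an equally plausible reading.
function offspring = mutate(parent, deltaj)
nj = numel(parent);
A  = 1 - 2*(dec2bin(0:2^nj-1, nj) - '0');      % all 2^nj rows of +1/-1 entries
offspring = repmat(parent(:)', 2^nj, 1) + A .* (deltaj * rand);
end

For instance, vcps_lower_level(0,5,5) reproduces the pattern of Fig. 2, and mutate with a two-dimensional parent returns the four offspring from the rectangular neighborhoods described above.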

Fig. 2 Possible VCPs on \(X_{j}=[0,5]\), \(b_{j}=5\), \(j>1\), in iterations 1 and 2, respectively

Fig. 3 Aggregate VCPs on \(X_{1}=[0,5]\), \(b_{1}=1\), \(maxiter=5\), in iterations 1–5, respectively

Fig. 4 Possible VCPs on \(X_{j}=[0,8]^{2}\), \(b_{j}=8\), \(j>1\), in iterations 1–3, respectively

Fig. 5 Aggregate VCPs on \(X_{1}=[0,8]^{2}\), \(b_{1}=4\), \(maxiter=4\), in iterations 1–4, respectively

Table 1 The proposed algorithm is summarized in the following table

Since \((x_{1}^{i},\ldots ,x_{k}^{i})\) is taken as a seed (initial) point in the next iteration, the result of the \((i+1)\)th iteration, \((x_{1}^{i+1},\ldots ,x_{k}^{i+1})\), is at least as good a solution in terms of the leader’s objective function value.
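
Because the initial solutions are non-deteriorating in \(f_{1}\), every summand of \(tol^{(p)}\) is nonnegative. A minimal MATLAB sketch of the termination test of step 4), assuming the leader values of the iterations’ initial solutions are stored in a vector f1hist with at least \(t+(\gamma -t)p+1\) entries (the vector name and the explicit index \(p\) are ours), is:

% Step 4: tolerance-based termination after block p of (gamma - t) iterations,
% using the relative improvements of the leader's value at the initial solutions.
function stop = seamsp_terminate(f1hist, t, gam, epsilon, p)
idx  = (t + (gam - t)*(p-1) + 1) : (t + (gam - t)*p);   % iterations of block p
tolp = sum((f1hist(idx) - f1hist(idx+1)) ./ (1 + abs(f1hist(idx))));
stop = tolp < epsilon;
end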

3 Approximation of Stackelberg solutions

When using probabilistic methods to solve optimization problems, we do not expect to obtain exact solutions; instead, the algorithms try to approach a solution within an acceptable time. To describe the quality of an approximate Stackelberg solution of an MSP and to select the best result from a set of approximate Stackelberg solutions, one needs to consider the following three basic properties of approximate results:

  1. 1.

    The feasibility of the results to the given MSP

  2. 2.

    The approximation of the results to the reaction sets

  3. 3.

    The fitness of the results, on evaluation of the leader’s objective function

Most existing algorithms compare their results by just looking at the numerical values of the objective functions and the feasibility of their results at each level, which is generally not correct for approximate Stackelberg solutions. All the objective function values at a non-Stackelberg point of the constraint region of a given MSP may be better than the same function values at a known Stackelberg solution. In other words, a Stackelberg solution need not be Pareto optimal.

Even though there is no common agreement on what an approximate solution to an MSP means, in the next subsection we define the relevant terms in a manner consistent with the definition of a Stackelberg solution.

3.1 \((\varepsilon ,\delta )\)-Approximation and comparison methods

Definition 1

A given point \((x_{1}^{*},\ldots ,x_{k}^{*}) \in \varOmega \) is said to be an \((\varepsilon ,\delta )\)-approximate Stackelberg solution to the MSP in (1), where \(\delta , \varepsilon \ge 0\), if:

  1. 1.

    \(f_{1}(x_{1}^{*},\ldots ,x_{k}^{*}) \le f_{1}(x_{1}',\ldots ,x_{k}') + \varepsilon (1+|f_{1}(x_{1}^{*},\ldots ,x_{k}^{*})|)~, ~ \forall (x_{1}',\ldots ,x_{k}')\in \mathcal {I\!R}\)

  2. 2.

    \(f_{j}(x_{1}^{*},\ldots ,x_{k}^{*}) \le f_{j}(x_{1}^{*},\ldots ,x_{j-1}^{*},x_{j},\ldots ,x_{k}) +\delta (1+|f_{j}(x_{1}^{*},\ldots ,x_{k}^{*})|)~,~\) \(~~~~ \forall j\in \{2,3,\ldots ,k\},~ ~ (x_{j},\ldots ,x_{k}) \in M(x_{1}^{*},\ldots ,x_{j-1}^{*})\)

A point \((x_{1}^{*},\ldots ,x_{k}^{*})\) satisfying the second condition is said to be in a \(\delta \)-reaction to \(x_{1}^{*}\).
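
For a bilevel problem in which the induced region and the reaction set can be sampled, Definition 1 can be checked numerically. The MATLAB sketch below estimates the smallest \(\varepsilon \) and \(\delta \) for which a candidate qualifies, over finite samples IRsample and Msample; these sample sets, the scalar decision variables and the function names are assumptions of this illustration, not objects produced by the algorithm.

% Estimate (epsilon, delta) of Definition 1 for a bilevel candidate (x1s, x2s),
% using finite samples of the induced region (rows [x1 x2]) and of the
% follower's rational reaction set M(x1s) (rows x2).
function [eps_hat, delta_hat] = eps_delta_estimate(f1, f2, x1s, x2s, IRsample, Msample)
f1s = f1(x1s, x2s);  f2s = f2(x1s, x2s);
eps_hat = 0;  delta_hat = 0;
for i = 1:size(IRsample,1)                 % condition 1 of Definition 1
    gap = (f1s - f1(IRsample(i,1), IRsample(i,2))) / (1 + abs(f1s));
    eps_hat = max(eps_hat, gap);
end
for i = 1:size(Msample,1)                  % condition 2 of Definition 1
    gap = (f2s - f2(x1s, Msample(i,:))) / (1 + abs(f2s));
    delta_hat = max(delta_hat, gap);
end
end

The candidate is then an \((\varepsilon ,\delta )\)-approximate Stackelberg solution (over the samples) for any \(\varepsilon \ge \) eps_hat and \(\delta \ge \) delta_hat, which gives a concrete way to compare candidates in the sense of Definitions 2 and 3 below.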

Definition 2

A point \((x_{1}^{*},\ldots ,x_{k}^{*}) \in \varOmega \) is said to be first-rank better (or simply better) than \((x_{1}',\dots ,x_{k}')\) if

  1. 1.

    \((x_{1}',\ldots ,x_{k}') \notin \varOmega \), or

  2. 2.

    \(\forall (\varepsilon _{1},\delta _{1})\) such that \((x_{1}',\ldots ,x_{k}')\) is an \((\varepsilon _{1},\delta _{1})\)-approximate Stackelberg solution, \(\exists (\varepsilon ,\delta )\) such that \(\varepsilon \le \varepsilon _{1}\), \(\delta <\delta _{1}\) or \(\varepsilon < \varepsilon _{1}\), \(\delta \le \delta _{1}\), and \((x_{1}^{*},\ldots ,x_{k}^{*})\) is an \((\varepsilon ,\delta )\)-approximate Stackelberg solution.

Definition 3

A point \((x_{1}^{*},\ldots ,x_{k}^{*}) \in \varOmega \) is said to be second-rank better than \((x_{1}',\dots ,x_{k}')\) if

  1. 1.

    \((x_{1}',\ldots ,x_{k}') \notin \varOmega \), or

  2. 2.

    \(\forall (\varepsilon _{1},\delta _{1})\) such that \((x_{1}',\ldots ,x_{k}')\) is an \((\varepsilon _{1},\delta _{1})\)-approximate Stackelberg solution, \(\exists (\varepsilon ,\delta )\) such that \(\varepsilon +\delta <\varepsilon _{1}+\delta _{1}\) and \((x_{1}^{*},\dots ,x_{k}^{*})\) is an \((\varepsilon ,\delta )\)-approximate Stackelberg solution.

Proposition 1

  1. (a)

    Any Stackelberg solution is first-rank better than a non-Stackelberg point, and no point can be second-rank better than a Stackelberg solution.

  2. (b)

    (Transitivity) If a point \((x_{1}^{*},\dots ,x_{k}^{*})\) is first-rank \((\)or second-rank\()\) better than \((x_{1}',\dots ,x_{k}')\) and \((x_{1}',\ldots ,x_{k}')\) is first-rank (or second-rank) better than \((x_{1}'',\ldots ,x_{k}'')\), then \((x_{1}^{*},\dots ,x_{k}^{*})\) is first-rank (or second-rank) better than \((x_{1}'',\ldots ,x_{k}'')\).

  3. (c)

    A first-rank better solution is second-rank better, but not the converse.

Proof

  1. (a)

    Let \((x_{1}^{*},\ldots ,x_{k}^{*})\) be a Stackelberg solution and \((x_{1*},\ldots ,x_{k*})\) be a non-Stackelberg point. Then \((x_{1}^{*},\ldots ,x_{k}^{*})\) is a \((0,0)\)-approximate Stackelberg solution, and \(\forall \varepsilon ,\delta \ge 0\), if \((x_{1*},\ldots ,x_{k*})\) is an \((\varepsilon ,\delta )\)-approximate Stackelberg solution then \(\varepsilon >0\) or \(\delta >0\) (or both); hence the result.

  2. (b)

    If \((x_{1}'',\dots ,x_{k}'')\notin \varOmega \), the statement holds trivially. Now let \((x_{1}'',\ldots ,x_{k}'')\in \varOmega \), and let \((\varepsilon _{1},\delta _{1})\), \((\varepsilon _{2},\delta _{2})\) and \((\varepsilon _{3},\delta _{3})\) denote the minimum possible values such that \((x_{1}^{*},\ldots ,x_{k}^{*})\), \((x_{1}',\ldots ,x_{k}')\) and \((x_{1}'',\ldots ,x_{k}'')\), respectively, are \((\varepsilon ,\delta )\)-approximate Stackelberg solutions. Then \(\varepsilon _{1}<\varepsilon _{2}\le \varepsilon _{3}\), \(\delta _{1}\le \delta _{2}<\delta _{3}\) or \(\varepsilon _{1}\le \varepsilon _{2}< \varepsilon _{3}\), \(\delta _{1}< \delta _{2}\le \delta _{3}\) (or \(\varepsilon _{1}+\delta _{1}<\varepsilon _{2}+\delta _{2}<\varepsilon _{3}+\delta _{3}\)). This implies that \(\varepsilon _{1}<\varepsilon _{3}\), \(\delta _{1}<\delta _{3}\) (or \(\varepsilon _{1}+\delta _{1}<\varepsilon _{3}+\delta _{3}\)); hence the result.

  3. (c)

    Let \((x_{1}^{*},\ldots ,x_{k}^{*})\) be first-rank better than \((x_{1}',\ldots ,x_{k}')\), where \((\varepsilon _{1},\delta _{1})\) and \((\varepsilon _{2},\delta _{2})\) denote the minimum possible values such that \((x_{1}^{*},\ldots , x_{k}^{*})\) and \((x_{1}',\ldots ,x_{k}')\), respectively, are \((\varepsilon _i,\delta _i)\)-approximate Stackelberg solutions. Then either \(\varepsilon _{1}<\varepsilon _{2}\), \(\delta _{1}\le \delta _{2}\) or \(\varepsilon _{1}\le \varepsilon _{2}\), \(\delta _{1}< \delta _{2}\). In both cases \(\varepsilon _{1}+\delta _{1}<\varepsilon _{2}+\delta _{2}\), which implies that \((x_{1}^{*},\dots ,x_{k}^{*})\) is second-rank better than \((x_{1}',\ldots ,x_{k}')\). To show that the converse is not necessarily true, consider the case \(\varepsilon _{1}=0.01\), \(\varepsilon _{2}=0.02\), \(\delta _{1}=0.001\), \(\delta _{2}=0.0005\), where \((\varepsilon _{1},\delta _{1})\) and \((\varepsilon _{2},\delta _{2})\) denote the minimum possible values such that \((x_{1}^{*},\dots ,x_{k}^{*})\) and \((x_{1}',\ldots ,x_{k}')\), respectively, are \((\varepsilon _i,\delta _i)\)-approximate Stackelberg solutions. Then \((x_{1}^{*},\dots ,x_{k}^{*})\) is second-rank (but not first-rank) better than \((x_{1}',\ldots ,x_{k}')\). \(\square \)

3.2 \((\varepsilon , \delta )\)-convergence of SEAMSP

Convergence of evolutionary algorithms for single-level optimization problems has been studied by different authors, such as Greenwood and Zhu (1999) and Rudolph (1994, 1996, 1998). However, the conflicting multiparametric behavior of the lower-level problems in MSPs makes it hard to apply the results of these papers. In this subsection, the convergence of SEAMSP to a Stackelberg solution of (1), in the sense of Definition 5, is studied for problems satisfying the assumptions of Theorem 1 below. The result in the theorem shows the performance of SEAMSP in searching the entire feasible region of a given MSP.

Definition 4

An evolutionary algorithm for an MSP is said to be an \((\varepsilon ,\delta )\)-convergent algorithm if there exist natural numbers \(N\) and \(\tau \) such that, for any iteration \(i \ge N\) and initial population size \(\tau \), the algorithm results in an \((\varepsilon ,\delta )\)-approximate Stackelberg solution.

Definition 5

An evolutionary algorithm for an MSP is said to be approximately convergent if, \(\forall \varepsilon ,\delta > 0\), the algorithm is \((\varepsilon ,\delta )\)-convergent.

Theorem 1

Consider the k-level Stackelberg problem (1), defined in Sect. 1, where \(X_{i}=[m_{i},M_{i}]^{n_{i}}\) and \(S^{j}=X_{j}\times \dots \times X_{k}\). Let, \(\forall j\in \{1,\ldots ,k\}\), \(f_{j}\) be a continuous function and \(S_{j}\) be a convex set. Let, \(\forall j\in \{2,\ldots ,k\}\), \(f_{j}\) be a strictly convex function with respect to \(x_{j}\) and \(M(x_{1}^{*},\ldots ,x_{j-1}^{*})\ne \emptyset \), where \((x_{1}^{*},\ldots ,x_{j-1}^{*})\in X_{1}\times \cdots \times X_{j-1}\). Let \(\forall x \in \mathcal {I\!R}\), \(\exists \mu _{1}>0\) such that \(\beta (x,\mu _{1})\cap S^{1}\subseteq \varOmega \). In addition, \(\forall j\in \{2,\ldots ,k\}\), \(\exists \mu _{j}>0\) such that \((x_{1},\ldots ,x_{j-1}, x_{{j}_{*}},\ldots ,x_{{k}_{*}}) \in \varOmega \), where \((x_{{j}_{*}},\ldots ,x_{{k}_{*}})\in \beta ((x_{j}',\ldots ,x_{k}'),\mu _{j})\cap S^{j}\) and \((x_{j}',\dots ,x_{k}') \in M(x_{1},\dots ,x_{j-1})\).

Then \(\forall \varepsilon , \delta >0\), \(\exists \alpha <\infty \) such that each iteration of SEAMSP results in an \((\varepsilon , \delta )\)-approximate Stackelberg solution (or equivalently, SEAMSP is convergent to a Stackelberg solution of the given MSP).

Proof

Step-1 (existence of solution)

By definition, the induced region, \(\mathcal {I\!R}= \{(x_{1},x_{2},\ldots ,x_{k})\) \(\in S_{1}:x_{1}\) \(\in \) \( X_{1}\), \((x_{2},\ldots ,x_{k})\) \(\in M(x_{1})\}\). Then using \(\mathcal {I\!R}\) the MSP can be equivalently written as:

$$\begin{aligned} \min _{ (x_{1},\ldots ,x_{k})\in \mathcal {I\!R}} f_{1}(x_{1},\ldots ,x_{k}) \end{aligned}$$
(7)

But from the assumption, \(\forall x_{1} \in X_{1}\), \(\exists \mu _{2}>0\) such that \((x_{1},x_{2}',\dots ,x_{k}')\in \varOmega \) for any \((x_{2}',\ldots ,x_{k}') \in \beta ((x_{2},\dots ,x_{k}),\mu _{2})\cap S^{2}\), where \((x_{2},\ldots ,x_{k})\in M(x_{1})\ne \emptyset \).

Then \((x_{1},x_{2},\dots ,x_{k})\in \varOmega \), where \(x_{1}\in X_{1}\) and \((x_{2},\dots ,x_{k})\in M(x_{1})\). This implies that \((x_{1},x_{2},\dots ,x_{k})\in S_{1}\), where \(x_{1}\in X_{1}\) and \((x_{2},\dots ,x_{k})\in M(x_{1})\).

So, problem (7) is equivalent to:

$$\begin{aligned} \begin{array}{l r} \min f_{1}(x_{1},\dots ,x_{k}) \\ s.t ~~~~~~ x_{1}\in X_{1} \\ ~~~~~~~~(x_{2},\dots ,x_{k})\in M(x_{1}) \end{array} \end{aligned}$$
(8)

From the assumption, for all \(j>1\), \(f_{j}(x_{1}^{*},\ldots ,x_{j-1}^{*},x_{j},x_{j+1}',\ldots ,x_{k}')\) is a strictly convex function, where \((x_{j}',x_{j+1}',\ldots ,x_{k}')\in M(x_{1}^{*},\ldots ,x_{j-1}^{*})\ne \emptyset \), \((x_{1}^{*},\ldots ,x_{j-1}^{*},x_{j},x_{j+1}',\ldots , x_{k}')\in S\cap S_{j}\cap \cdots \cap S_{k}\), and \((x_{j},x_{j+1}',\ldots ,x_{k}')\in \beta ((x_{j}',\ldots ,x_{k}'),\mu _{j}) \cap S\).

This implies that the \(j^{th}\)-level problem, \(j\in \{2,\ldots ,k\}\), has at least one reaction \(x_{j}'\) for any reactions of the levels below \(j\). But since, for all \(j>1\), \(f_{j}\) is strictly convex w.r.t. \(x_{j}\), the \(j^{th}\)-level problem, \(j\in \{2,\ldots ,k\}\), reacts uniquely, regardless of the other reactions. This implies that, \(\forall x_{1}\in X_{1}\), \(M(x_{1})\) contains a unique point.

Therefore, problem (8) is equivalent to:

$$\begin{aligned} \begin{array}{rl} \min f_{1}(x_{1},M(x_{1})) \\ ~~~~ s.t ~~~ x_{1}\in X_{1}, \end{array} \end{aligned}$$
(9)

which is an optimization problem of a continuous function over a compact set. Hence, the existence of a solution follows from the well-known Weierstrass Theorem.

Step-2 (selection of step-sizes and \(\alpha \) )

Let \(M(x_{1},\ldots ,x_{j-1})=(x_{j}',\ldots ,x_{k}')\), for arbitrary \(j>1\) and \((x_{1},\ldots ,x_{j-1})\) \(\in \) \(X_{1}\times \cdots \times X_{j-1}\). From the assumption that, \(f_{j}\) is continuous on \((x_{1},\ldots ,x_{j-1}\), \(x_{j}',\ldots ,x_{k}')\) we have, \(\forall \delta >0, \exists \delta '_{j}>0\) such that \(f_{j}(x_{1},\ldots ,x_{k})\) \(\in \) \(\beta (f_{j}(x_{1},\ldots ,x_{j-1}\), \(x_{j}',\ldots ,x_{k}'),\delta )\)   whenever,    \((x_{1},\ldots ,x_{k})\in \beta ((x_{1},\ldots ,x_{j-1},x_{j}',\ldots ,x_{k}'),\delta '_{j})\).

Let \(\delta ^{*}=min\{\delta '_{i}:i\in \{j,\dots ,k\}\}\) then, \((x_{1},\ldots ,x_{k})\) \(\in \) \(\beta ((x_{1},\ldots ,x_{j-1}\), \(x_{j}',\ldots ,x_{k}'),\delta ^{*})\) implies that,

$$\begin{aligned} |f_{j}(x_{1},\ldots ,x_{j-1},x_{j}',\ldots ,x_{k}')-f_{j}(x_{1},\ldots ,x_{k})|<\delta . \end{aligned}$$

But, \((x_{1},\dots ,x_{k})\in \beta ((x_{1},\dots ,x_{j-1},x_{j}',\dots ,x_{k}'),\delta ^{*})\) if and only if,

$$\begin{aligned} \sum _{i=j}^{k}\Vert x_{i}-x_{i}'\Vert ^{2} < \delta ^{*2}. \end{aligned}$$

Choose \(x_{i'}\) and \(x'_{i'}\) such that \(\Vert x_{i}-x_{i}'\Vert =\Vert x_{i'}-x_{i'}'\Vert \), \(\forall i,i'\in \{j,\ldots ,k\}\). Then

$$\begin{aligned} (k-j+1)\Vert x_{j}-x_{j}'\Vert ^{2}< \delta ^{*2}. \end{aligned}$$

This implies that

$$\begin{aligned} \sum _{i=1}^{n_{j}} |x_{ji}-x_{ji}'|^{2} < \frac{\delta ^{*2}}{k-j+1}, \end{aligned}$$

where \(x_{j}=(x_{j1},\dots ,x_{jn_{j}})\) and \(x_{j}'=(x_{j1}',\dots ,x_{jn_{j}}')\).

Now choose \(|x_{ji}-x_{ji}'|=|x_{ji'}-x_{ji'}'|\), \(\forall i,i'\in \{1,\ldots ,n_{j}\}\). Then

$$\begin{aligned} |x_{ji}-x_{ji}'| < \frac{\delta ^{*}}{\left( n_{j}(k-j+1)\right) ^{\frac{1}{2}}} \end{aligned}$$
(10)

Hence, for all \(j'\in \{j,\dots ,k\}\), any feasible movement of the sample points \((x_{j'},\ldots ,x_{k})\) from their optimal reaction \((x_{j'}',\dots ,x_{k}')\) with a step size of \(\frac{\delta ^{*}}{\left( n_{j}(k-j+1)\right) ^{\frac{1}{2}}}\) results in a \(\delta \)-reaction.

But from the assumption, \(\forall j\in \{2,\ldots ,k\}\), \(\exists \mu _{j}>0\) such that \((x_{1},\ldots ,x_{j-1}, x_{j},\ldots ,x_{k})\in \varOmega \) whenever \( (x_{j},\dots ,x_{k})\in \beta ((x_{j}',\ldots ,x_{k}'),\mu _{j})\cap S^{j}\).

Choose \(\mu ^{*}=\min \{\mu _{i}:i\in \{j,\ldots ,k\}\}\) then, \((x_{1},\ldots ,x_{j-1},x_{j},\ldots ,x_{k})\in \varOmega \) whenever \((x_{j},\ldots ,x_{k})\in \beta ((x'_{j},\dots ,x'_{k}),\mu ^{*})\cap S^{j}\).

Define \(r\) as follows,

$$\begin{aligned} r=\min \left\{ \frac{\delta ^{*}}{\left( n_{j}(k-j+1)\right) ^{\frac{1}{2}}}, \frac{\mu ^{*}}{\left( n_{j}(k-j+1)\right) ^{\frac{1}{2}}}:\ j\in \{2,\ldots ,k\}\right\} . \end{aligned}$$

Then any choice of step sizes \(\delta _{j}\) with \(0<\delta _{j}\le r\), \(j\in \{2,\ldots ,k\}\), results in a \(\delta \)-reaction.

Let \((x_{1}^{*},\dots ,x_{k}^{*})\) be a Stackelberg solution. Since \(f_{1}\) is continuous, \(\forall \varepsilon >0, \exists \varepsilon ^{*}>0\) such that \(f_{1}(x_{1},\dots ,x_{k})\in \beta (f_{1}(x_{1}^{*},\ldots ,x_{k}^{*}),\varepsilon )\) whenever \((x_{1},\ldots ,x_{k})\in \beta ((x_{1}^{*},\ldots ,x_{k}^{*}),\varepsilon ^{*})\).

Using a similar procedure as in (10), we can choose \(x_{ji}\) with \(|x_{ji}-x_{ji}^{*}| < \frac{\varepsilon ^{*}}{\left( n_{j}k\right) ^{\frac{1}{2}}}\), where \(j\in \{1,\ldots ,k\}\).

Thus, any feasible movement of a VCP \((x_{1},\dots ,x_{k})\) from the Stackelberg solution \((x_{1}^{*},\ldots ,x_{k}^{*})\), with component-wise step sizes of \(\frac{\varepsilon ^{*}}{\left( n_{j}k\right) ^{\frac{1}{2}}}\), results in less than \(\varepsilon \) difference from \(f_{1}(x_{1}^{*},\ldots ,x_{k}^{*})\).

But from the assumption \(\exists \mu _{1}>0\) such that, \((x_{1},\ldots ,x_{k})\in \varOmega \) whenever \((x_{1},\ldots ,x_{k})\in \beta ((x_{1}^{*},\ldots ,x_{k}^{*}),\mu _{1})\cap S^{1}\).

Set \(r^{*}=\min \left\{ \frac{\mu _{1}}{\left( n_{j}k\right) ^{\frac{1}{2}}}, \frac{\varepsilon ^{*}}{\left( n_{j}k\right) ^{\frac{1}{2}}}:\ j\in \{1,\ldots ,k\}\right\} \) and \(p= \min \{r,r^{*}\}\). Then any movement of VCPs \((x_{1},\ldots ,x_{k})\) from the Stackelberg solution \((x_{1}^{*},\ldots ,x_{k}^{*})\), with a step size of \(r^{*}\), results in less than \(\varepsilon \) difference from \(f_{1}(x_{1}^{*},\ldots ,x_{k}^{*})\).

Hence, any choice of step-sizes \(\delta _{j}\) where \(0<\delta _{j}\le p\) and \(j\in \{1,\dots ,k\}\), results in an \((\varepsilon ,\delta )\)-approximate Stackelberg solution. So the number of VCPs on a single dimensional space of \(x_{j}\) is \(b_{j}=\frac{d_{j}}{\delta _{j}}\), for \(j\in \{1,\dots ,k\}\).

Now one can easily select a sufficiently large \(\alpha \) which results in \(a_{j}\ge 2b_{j}, \forall j\in \{1,\ldots ,k\}\). \(\square \)

Note that the above proof shows that the result can be achieved within a single iteration of the algorithm. As the iterations continue, the points chosen improve in terms of \(\varepsilon \). Therefore, each iteration of SEAMSP may result in a non-Stackelberg but approximate Stackelberg solution of the problem. The initial solution of an iteration is updated by the best VCP of the leader DM together with its hierarchical reaction from the VCPs* of the lower-level DMs. This kind of selection guarantees that any non-Stackelberg but first-rank better approximate Stackelberg solution in an iteration, if it exists, will always be set as an initial solution of the iteration.

Fig. 6 Feasible sets for problem (11)

Remark 1

The assumptions in Theorem 1 can be satisfied by various MSPs containing non-convex and non-differentiable functions at all levels.

For instance, consider a bilevel Stackelberg problem of the form MSP-(1) where,

$$\begin{aligned} f_{1}(x,y)= & {} 5\sin (x^{2}+1)-|\cos (y-1)|+|x-y|,\nonumber \\ f_{2}(x,y)= & {} |-x^3+2x^{2}|+y^{2},\nonumber \\ x\in X_{1}= & {} [-5,5], y\in X_{2} = [-5,5], \end{aligned}$$
(11)

\(S_{1}=\{(x,y):x+y\le 6, x-y\le 6\}\) and \(S_{2}=\{(x,y):-x+y\le 6\}\).

The feasible regions of the leader and follower problems are shown in Fig. 6.

Then \(f_{1}\) and \(f_{2}\) are continuous functions, and \(S_{1}\) and \(S_{2}\) are convex sets. \(\forall x^{*}\in X_{1}\), \(M(x^{*})=\{0\}\) \((\)i.e., \(\mathcal {I\!R}=\{(x^{*},0):x^{*}\in X_{1}\})\) and \(f_{2}(x^{*},y)=|-x^{*3}+2x^{*2}|+y^{2}\) is a strictly convex function of \(y\).

We can easily observe from Fig. 6 that, \(\forall (x^{*},y^{*})\in \mathcal {I\!R}=\{(x^{*},0):x^{*}\in X_{1}\}, \exists \mu _{1}>0\) (any \(\mu _{1}\), where \(0<\mu _{1}<1\)) such that \((x,y)\in \varOmega \), \(\forall (x,y) \in \beta ((x^{*},0),\mu _{1})\cap S\) and \(\exists \mu _{2}>0\) (any \(\mu _{2}\), where \(0<\mu _{2}<1\)) such that \((x^{*},y)\in \varOmega \), \(\forall y\in \beta (0,\mu _{2})\cap X_{2}\), \(x^{*}\in X_{1}\).

Hence, the problem satisfies the assumptions of Theorem 1.
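
A brute-force check of these observations for problem (11) can be done in a few lines of MATLAB; the grid sizes and the script itself are illustrative and are not part of the algorithm.

% Problem (11): for each x on a grid of X1 = [-5,5], minimize the follower's
% objective over X2 = [-5,5] intersected with S2 = {(x,y): -x + y <= 6}.
f1 = @(x,y) 5*sin(x.^2 + 1) - abs(cos(y - 1)) + abs(x - y);
f2 = @(x,y) abs(-x.^3 + 2*x.^2) + y.^2;
xgrid = linspace(-5, 5, 201);
ygrid = linspace(-5, 5, 201);
ybest = zeros(size(xgrid));
for i = 1:numel(xgrid)
    feas = (-xgrid(i) + ygrid <= 6);                 % follower's set S2
    vals = f2(xgrid(i), ygrid);  vals(~feas) = Inf;
    [~, j] = min(vals);  ybest(i) = ygrid(j);
end
disp(max(abs(ybest)))                                % 0: the reaction is y = 0 for every x
inS1 = (xgrid + ybest <= 6) & (xgrid - ybest <= 6);  % leader's set S1
lead = f1(xgrid, ybest);  lead(~inS1) = Inf;
[fbest, ib] = min(lead);
fprintf('best sampled point of the induced region: x = %.3f, f1 = %.4f\n', xgrid(ib), fbest);

The first displayed value confirms numerically that \(M(x^{*})=\{0\}\) on the grid, i.e., that the induced region is \(\{(x^{*},0):x^{*}\in X_{1}\}\), as stated above.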

Remark 2

The convergence proof of SEAMSP is given only for MSPs satisfying the assumptions of Theorem 1; however, the algorithm works for general MSPs with bounded decision variables. This can be observed from the numerical examples in the next section, in which the problems may contain non-convex, non-differentiable and even discontinuous functions.

4 Numerical results

To show the efficiency and effectiveness of the proposed algorithm in solving MSPs with various types of objective functions, we conducted some numerical experiments. For each test problem, we executed the MATLAB code of the proposed algorithm for 10 independent runs and recorded the following data:

  1. 1)

    The best solution by the algorithm;

  2. 2)

    The objective function values at each solution;

  3. 3)

    The CPU time (in seconds) required.

In addition, we recorded the \((\varepsilon ,\delta )\)-approximation for those problems that have a known exact solution and easily determined reactions. \((\varepsilon ,\delta )\)-approximations are also recorded for those problems having known bounds on the objective function values.

In all the test problems the algorithm parameters are selected as follows:

\(\gamma =10\), \(t=5\), \(\beta _{1}=5\), \(\beta _{2}=50\), \(\epsilon _{1}=0.005\), with \(\alpha =50,000\) for Examples 2 and 5, \(\alpha =10,000\) for Examples 1, 3 and 4, and \(\alpha =70,000\) for Example 6. Moreover, for the trilevel problems in Examples 2 and 5, we used the parameter values \(\gamma =6\), \(t=3\), \(\beta _{1}=10\), \(\beta _{2}=50\), \(\epsilon _{1}=0.01\). In Examples 7 and 8 we considered 4-level problems with non-convex constraint sets in some of the levels. For these two problems we used the parameter values \(\gamma =4\), \(t=2\), \(\beta _{1}=4\), \(\beta _{2}=10\), \(\epsilon _{1}=0.05\), and \(\alpha =170,000\), while the rest remain the same as in the above cases.
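
For reference, the settings quoted above for the bilevel test problems can be collected in a parameter structure such as the one below; the struct layout and field names are our own illustration, not taken from the authors' code.

% Parameter settings quoted above for the bilevel Examples 1, 3 and 4.
params = struct( ...
    'gamma',   10, ...      % minimum number of iterations
    't',        5, ...      % iterations before the tolerance tracking starts
    'beta1',    5, ...      % minimum number of VCPs on X1 when d1 = 1
    'beta2',   50, ...      % minimum number of iterations if not stopped by tolerance
    'eps1', 0.005, ...      % reaction tolerance epsilon_1
    'alpha', 10000);        % maximum number of hierarchical VCPs per iteration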

The first two examples below are relatively simple; we present them here only for comparison with what has been reported in the literature. The results are summarized in Tables 2 and 3.

Example 1

Wang et al. (2005)

(12)
Table 2 Best solutions by SEAMSP
Table 3 \((\varepsilon ,\delta )\)-approximate Stackelberg solution of the best solutions by SEAMSP

Example 2

Tilahun et al. (2012)

(13)

Example 3

(Nonconvex bilevel problem)

(14)

Example 4

(Nonconvex bilevel problem)

(15)

Example 5

(Trilevel problem with nonconvex inner levels)

(16)

Example 6

(Nonconvex bilevel problem)

(17)

Example 7

(4-level problem with nonconvex constraint sets)Footnote 8

(18)

Example 8

(Nonconvex 4-level problem with nonconvex constraint sets)

(19)

5 Conclusion

In this paper, a natural way of passing decision values consecutively from the upper levels to the lower-level decision makers has been used to develop a heuristic algorithm that can give an approximate global solution for multilevel nonlinear problems. The proposed algorithm can give approximate solutions for multilevel Stackelberg problems with non-differentiable objective functions, as well as problems with non-convex constraints in their inner levels. It was implemented on a personal computer using MATLAB and tested on 8 problems that are selected partly from the existing literature (one bilevel problem and one trilevel problem) and partly constructed for testing purposes.

In all the test problems, SEAMSP results in an \((\varepsilon , \delta )\)-approximate Stackelberg solution with sufficiently small values of \(\varepsilon \) and \(\delta \). SEAMSP has been tested on MSPs involving functions, such as discontinuous, non-convex and non-differentiable ones, that existing algorithms are not designed to handle. The proposed algorithm performs well in all the test problems, including the difficult ones, and all the results show that it can find near-global optimal solutions in a relatively short period of time. This indicates that it is efficient and effective. The definitions of \((\varepsilon , \delta )\)-approximation and the new comparison methods might be used in the future as a reference to select the best approximate Stackelberg solution of an MSP from a given set of approximate Stackelberg solutions. The newly constructed problems could be used as benchmarks for comparing similar algorithms for MSPs.

We believe that the theoretical and numerical contributions of our proposed algorithm constitute a new and useful addition to the topic. A strength of the algorithm is that it can be extended to more complex optimization problems. Our future research will extend the topics of \((\varepsilon ,\delta )\)-approximation and \((\varepsilon ,\delta )\)-convergence of algorithms to general multilevel optimization problems.