1 Introduction

A bilevel programming problem (BLPP) is a hierarchical optimization problem consisting of two levels, in which the first level is dominant over the second level. Consequently, the bilevel programming problem can be divided into two optimization problems located at different levels, the leader’s and the follower’s problems. Unlike other mathematical programs, the constraints of the bilevel programming problem always involve the optimality to the follower’s problem. In other words, any feasible solution must satisfy the requirement that the follower’s variable values are optimal to the follower’s problem when the leader’s variables are fixed. The problem can be formulated as follows

$$\begin{aligned} \left\{ \begin{aligned}&\min _{x\in X} F(x,y)\\&s.t.\ G(x,y)\le 0\\&\min _{y\in Y} f(x,y)\\&s.t. \ g(x,y)\le 0 \end{aligned} \right. \end{aligned}$$
(1)

where \(x\in R^n, y\in R^m\). In this problem,

$$\begin{aligned} \left\{ \begin{aligned}&\min _{x\in X} F(x,y)\\&s.t.\ G(x,y)\le 0. \end{aligned} \right. \end{aligned}$$
(2)

and

$$\begin{aligned} \left\{ \begin{aligned}&\min _{y\in Y} f(x,y)\\&s.t. \ g(x,y)\le 0 \end{aligned} \right. \end{aligned}$$
(3)

are the leader’s and the follower’s problems, respectively. The variables of problem (1) are divided into the leader’s variables \(x=(x_1,\ldots , x_n)^T\) and the follower’s variables \(y=(y_1,\ldots , y_m)^T\). Similarly, \(F\) and \(f\) are known as the leader’s and the follower’s objective functions respectively, whereas the vector-valued functions \(G{:}\,R^n\times R^m \rightarrow R^p\) and \(g{:}\,R^n\times R^m \rightarrow R^q\) are called the leader’s and the follower’s constraints, respectively. The sets \(X\) and \(Y\) place additional constraints on the variables, such as upper and lower bounds or integrality requirements etc.

The decision-making process of (1) can be stated as follows: The leader first selects a strategy \(x\) to optimize his/her objective, then the follower observes the leader’s selection and finds a strategy \(y\) to optimize his/her own objective. When such pair \((x,y)\) satisfies the leader’s constraints, it is called feasible in a bilevel decision-making procedure. The purpose of solving (1) is to optimize \(F(x,y)\) on the set of all feasible points.

In the optimization procedure, if the follower has more than one optimal solution for some \(x\), decision-makers have to select one in the optimal solution set to match \(x\). Two extreme cases are optimistic and pessimistic models (Dempe 2002, 2003). In the paper, we assume that the follower has a unique solution for any \(x\), which can make problem (1) stable (Bard 1998).

Over the past 20 years or so, the field of bilevel optimization has received a lot of attention in developing efficient algorithms and applying bilevel models to deal with hard real-world problems.

When all functions involved in (1) are linear, it is called a linear bilevel programming problem (Bard 1998; Calvete et al. 2008; Glackin et al. 2009). Some classical optimization techniques, such as “k-th best” algorithms, branch-and bound approaches, base-enumerating methods and penalty methods, etc, have been proposed for this kind of problems (Dempe 1987; Bard 1998; Colson et al. 2007). For nonlinear BLPPs, most of researches so far focus on convex and differential BLPPs in which all functions involved are convex and twice continuously differential (Colson et al. 2007, 2005; Andreani et al. 2009; Dempe 2002; Mersha and Dempe 2011; Wang et al. 2010), especially on convex quadratic BLPPs (Etoa 2010, 2011; Muu and Quy 2003). In addition, other types of BLPPs have been dealt with, e.g., bilevel multiobjective programming problems in which \(F\) and/or \(f\) are vector-valued functions (Deb and Sinha 2009), and bilevel (mixed-) integer programs where some variables are restricted into the integer set (Gümüs and Floudas 2005; Li and Wang 2008b; Shim et al. 2013), etc.

Applications have been stimulating factors for the development of BLPPs. Some interesting results have been presented for dealing with a variety of intractable problems in real world, such as the investigation of network of oligopolies (Dempe 2003; Abdou-Kandil and Bertrand 1987), the traffic planning (Migdalas 1995; Calvete et al. 2011), the road pricing problem (Dempe and Zemkoho 2012), the tax credits problem for biofuel production (Bard et al. 2000), the network design problem (Ben-Ayed et al. 1988), and the terrorist threat problem (Scaparra and Church 2008; Arroyo and Galiana 2009). Other applications of BLPPs can be found in Dempe (2003), Bard (1998), Colson et al. (2007).

In spite of the fact that there are lots of theoretic results and efficient methods for BLPPs, it does not mean the problem can be solved easily. From a computational point of view, the computational complexity of the problem can be analyzed by the following three items (Dempe 2003; Bard 1998; Colson et al. 2007).

  • BLPP is strongly NP-hard;

  • Feasible region is non-convex, making the problem non-convex;

  • Solution functions of the follower’s problem may be non-differential, it implies that BLPP is non-differential.

Item 1 shows this class of problems are intrinsically hard to solve. Items 2–3 mean the algorithmic approaches based on single-point search can’t find out the globally optimal solutions very well. The fact leads researchers to adopt some intelligent algorithms using population-search which don’t put any requirements for the differentiability and convexity of functions, e.g. evolutionary algorithms(EAs)/genetic algorithms(GAs) (Calvete et al. 2008; Wang et al. 2005; Li and Wang 2008a, 2011; Calvete et al. 2009; Wang et al. 2011), and artificial neural network(ANN) (Lan et al. 2007). etc. These algorithms can be divided into three classes. The first always begins with leader’s variables, and for each selected value of \(x\), to solve the follower’s problem for \(y\) (Li and Wang 2008a; Wang et al. 2011). The second class of approaches uses some techniques to transform BLPP into a single-level program, such as penalty functions or K-K-T conditions (Wang et al. 2005). The third is to design the algorithm using problem-specific optimality results (Dempe 1987; Calvete and Gale 1998; Calvete et al. 2009, 2008; Li and Wang 2011). When a BLPP is large-scale, the first and the second classes of algorithms are time-consuming since there are too large search spaces to explore. Using the optimality results of linear/linear fractional BLPPs, Calvete proposed base-based GAs for these two classes of problems (Calvete et al. 2008, 2009), and the experiment results show that it is efficient in solving large-scale problems. It implies the third seems to be promising for solving large-scale problems. Unfortunately, the methods can’t deal with other BLPPs.

In the present paper, we deal with a special class of BLPPs in which the follower is a fractional program, whereas the leader’s problem is simply solvable. Obviously, the model is more general than the linear and fractional BLPPs discussed by Calvete et al. (2008, (2009). We present a GA-based global optimization algorithm for the bilevel programming problem. First, the proposed algorithm begins with the bases of the follower’s problem, and these bases are taken as individuals of GA. Given any individual (base), we consider the set of values of \(x\) on which the base is always feasible and optimal. For any \(x\) in the set, the follower’s solution \(y(x)\) can be represented as a linear function of \(x\). Then, an optimization method is adopted to find out the best \(\bar{x}\) in the set such that \(F\) is minimal, and the objective value \(F(\bar{x},y(\bar{x}))\) is taken as the fitness value of the individual. After all bases are checked, the bilevel programming problem is solved. Since the number of bases will increase fast as the scale of the problem become larger, the algorithms using completely enumerating schemes are far from being efficient. In our algorithm, an efficient GA is designed and used to explore these bases.

The proposed algorithm is different from the existing algorithms mainly in two ways: (1) our algorithm begins with the follower’s problem and the search space is the set of feasible bases of the follower’s program. It means that the search space of the algorithm is finite and smaller than those of other algorithms (Calvete et al. 2008, 2009; Deb and Sinha 2009; Wang et al. 2005, 2011); (2) when the fitness is evaluated, a sub-procedure of optimization is implicitly executed by which the leader’s objective value can be improved.

This paper was organized as follows. Some notations and discussed problem were stated in Sects. 2 and  3 gave a profile of algorithmic approach. Genetic operators were designed in Sect. 4, and then based on these operators, Sect. 5 presented our algorithm. In Sect. 6, the convergence was analyzed, and some computational examples were given and solved in Sect. 7. We finally concluded our paper in Sect. 8.

2 Discussed problem and basic notations

In this paper a specific BLPP is considered, that is, the follower is a linear fractional program and the leader’s problem is solvable. Let us denote the problems by

$$\begin{aligned} \left\{ \begin{aligned}&\min _{x\in X} F(x,y)\\&s.t.\ G(x,y)\le 0\\&\min _{y} f(x,y)=\frac{c_1x+d_1y+e_1}{c_2x+d_2y+e_2}\\&s.t. \ Ax+By \le b, \quad y \ge 0 \end{aligned} \right. \end{aligned}$$
(4)

where \(A\) is a \(q\times n\)-matrix, \(B\) is a \(q\times m\)-matrix, and \(b\in R^q\). \(X\) is a box set as follows:

$$\begin{aligned} X= \{(x_1,x_2,\ldots , x_n)^T \in R^n | x_i \in [l_i,u_i], \quad i=1, \ldots , n \} \end{aligned}$$

where \(l_i, u_i\) are real constants.

Now we introduce some related definitions and notations as follows Bard (1998).

  1. 1)

    Constraint region: \(S=\{(x,y)| x\in X, G(x,y) \le 0, Ax+By \le b, \quad y \ge 0\}\).

  2. 2)

    Feasible region of follower’s problem for \(x\) fixed: \(S(x)=\{y | Ax+By \le b, \quad y \ge 0 \}\).

  3. 3)

    Projection of \(S\) onto the leader’s decision space: \(S(X)=\{x\in X | \exists y, (x,y)\in S\}\).

  4. 4)

    Follower’s rational reaction set for each \(x\in S(X)\): \(M(x)=\{y | y\in argmin\{f(x,y), v\in S(x)\}\}\).

  5. 5)

    Inducible region: \(\textit{IR}=\{(x,y)\in S | y\in M(x)\}\).

In terms of aforementioned definitions, problem (4) can also be written as:

$$\begin{aligned} \min \{F(x,y) | (x,y)\in { IR}\} \end{aligned}$$

Definition 1

(Dempe 1987) (Feasible solution) \((x, y)\) is said to be a feasible solution to (4) if and only if \((x, y)\in { IR}\).

Definition 2

(Dempe 1987) (Optimal solution) \((x^*, y^*)\) is said to be an optimal solution to (4) if \((x^*, y^*) \in { IR}\) and satisfies

$$\begin{aligned} F(x^*, y^*)\le F(x, y), \quad \forall (x,y)\in { IR}. \end{aligned}$$

In the remainder, we always assume that

A1.:

For all decisions taken by the leader, each follower has some room to react, that is, \(S(x)\ne \phi \).

A2.:

The rank of matrix \(B\) is q.

A3.:

\(c_2x+d_2y+e_2\ne 0\) for \(\forall x\in X\).

In these assumptions, A1 is a common assumption for bilevel programming problems, which makes the bilevel programming problem well posed. Assumptions A2 and A3 are often presented in algorithmic approaches to linear fractional programs (Calvete et al. 2009; Swarup 1965), which can simplify the description of algorithms.

Also, in order to solve problem (4) easily, we further assume the leader’s problem can be easily solved, that is to say, there exists at least one deterministic approach by which one can obtain the optima of the leader’s problem. There are lots of problems of this type, such as linear programs, convex quadratic programs, and linear fractional programs, etc. In fact, the assumption is necessary to almost all bilevel program solvers, otherwise, problem (1) can not be easily solved.

3 Optimality conditions and profile of the proposed approach

First, we take advantage of the characteristics of the follower’s problem to present some optimality results. Then, based on these results, we provide an algorithmic profile for solving BLPP (4).

3.1 Transformation of BLPP

Since linear inequalities can be converted to equalities by adding some slack variables at left-hand side. Without any loss of generality, (4) can be rewritten as follows

$$\begin{aligned} \left\{ \begin{aligned}&\min _{x\in X} F(x,y)\\&s.t.\ G(x,y)\le 0\\&\min _{y} \frac{c_1x+d_1y+e_1}{c_2x+d_2y+e_2}\\&s.t. \ Ax+By=b, \quad y \ge 0 \end{aligned} \right. \end{aligned}$$
(5)

in which the follower is a linear fractional programming problem with parameter vector \(x\) as follows

$$\begin{aligned} \left\{ \begin{aligned}&\min _{y} \frac{c_1x+d_1y+e_1}{c_2x+d_2y+e_2}\\&s.t. \ By=b-Ax, \quad y \ge 0. \end{aligned} \right. \end{aligned}$$
(6)

Also, \(B\) is still a \(q\times m\)-matrix.

3.2 Optimality results

In the subsection, at first, the optimality results of the linear fractional problems are discussed. Then, these optimality results are used to deal with BLPP (5).

Theorem 1

(Swarup 1965) For each \(x\in S(X)\), the optimal value of (6) can occur at an extreme point of \(S(x)\).

Theorem 1 implies that one can find the optima of (6) for each \(x\) by enumerating all bases associated with extreme points. Next, we present an optimality criterion by which one can judge whether a basic feasible solution is optimal.

Let \(B=(\bar{B}, N)\), and without loss of generality, \(\bar{B}\), as a base, is composed of the first \(q\) columns of \(B\). Then \(y_{\bar{B}}=\bar{B}^{-1}(b-Ax)-\bar{B}^{-1}Ny_N\). Further,

$$\begin{aligned}&c_ix+d_iy+e_i\nonumber \\&\quad = d_{i\bar{B}}y_{\bar{B}}+d_{iN}y_N+c_ix+e_i\nonumber \\&\quad = d_{i \bar{B}}\bar{B}^{-1}(b-Ax)+c_ix+e_i+(d_{iN}-d_{i \bar{B}}\bar{B}^{-1}N)y_N \end{aligned}$$
(7)

Here, \(i=1,2\). Set \(u_0=d_{1 \bar{B}}\bar{B}^{-1}(b-Ax)+c_1x+e_1\), \(v_0=d_{2 \bar{B}}\bar{B}^{-1}(b-Ax)+c_2x+e_2\), \(\mu =d_{1N}-d_{1 \bar{B}}\bar{B}^{-1}N\) and \(\nu =d_{2N}-d_{2 \bar{B}}\bar{B}^{-1}N\). Then the following theorem can be inferred.

Theorem 2

For the base \(\bar{B}\) and any \(x\) fixed , if inequalities \(\pi =v_0\mu -u_0\nu \ge 0\) and \(\bar{B}^{-1}(b-Ax)\ge 0\) hold, then the base is optimal, and the basic components of the optimal solution are \(\bar{B}^{-1}(b-Ax)\), whereas other components are 0.

Proof

When the denominator of the objective function is positive in (6), Theorem 2 is true Swarup (1965). For the case that the denominator is negative, both the denominator and the numerator of \(f(x,y)\) are multiplied by \(-1\). Set \(c'_i=-c_i, d'_i=-d_i\), and \(e'_i=-e_i\), then \(\pi '=v'_0\mu '-u'_0\nu '=-v_0(-\mu )-(-u_0)(-\nu )=v_0\mu -u_0\nu =\pi \). This completes the proof. \(\square \)

3.3 Sub-procedure of optimization

As discussed above, for any base \(\bar{B}\) and some \(x\)’s in \(X\), if the following inequalities

$$\begin{aligned} \bar{B}^{-1}(b-Ax)\ge 0 \end{aligned}$$
(8)

and

$$\begin{aligned} \pi \ge 0 \end{aligned}$$
(9)

are satisfied, then a subregion of \(X\) is determined in which for each \(x\), \(y(x)=(\bar{B}^{-1}(b-Ax),0)^T\) is the optimal solution to the follower’s problem. In order to obtain the optima of \(F(x,y)\), we consider the following nonlinear problem

$$\begin{aligned} \left\{ \begin{aligned}&\min _{x\in X} F(x,y(x))\\&s.t. \, G(x,y(x))\le 0,\\&\bar{B}^{-1}(b-Ax)\ge 0, \\&\pi \ge 0. \end{aligned} \right. \end{aligned}$$
(10)

If (10) is further solved, and an optimal solution \(x_0\) is obtained, this means that \(x_0\) is the best one in all \(x\)’s to which \(\bar{B}\) is feasible and optimal. Such point \((x_0, y(x_0))\), in fact, is a “locally optimal” solution to (5) (quotation mark means it isn’t really a locally optimal point in mathematical meaning of neighborhood), and called a base-optimal point. Figure 1 gives an example for illustrating the relationships, where the elliptic region represents \(S(X)\) and the total number of bases is 3. First of all, all bases of the follower’s problem are denoted by \(B_i, i=1,2,3\). Then, the value region of \(x\) is divided into three subregions according to these bases, it implies that for any point in Subregion \(i\) (\(i\) = I, II, III), the optimal base of the follower’s problem is \(B_i\), \(i= 1,2,3\), respectively. Further, \(x_i\) is the “best” one in each subregion, which means the points \((x_i, y(x_i)), i=1,2,3,\) are base-optimal points. As a result, the optimal solutions to (5) must be one of these basis-optimal points.

Fig. 1
figure 1

Relationships between the bases and the basis-optimal points of BLPP

3.4 Profile of algorithmic approach

For any base \(\bar{B}\) of the follower’s problem, (10) is first solved. If there exists no solution, then the base is removed from the set of all bases. Otherwise, the objective value is used to evaluate the base. When all bases are evaluated, the “best” base can be found, and then (5) is solved. In fact, the base-optimal point corresponding to the “best” base is an optimal solution to (5). In the profile, a genetic algorithm is designed to explore all bases.

4 Design of genetic algorithm

In this section, we begin with chromosomes encoding, present the fitness evaluation scheme, and then design crossover and mutation operators.

4.1 Chromosome encoding and initial population

Since GA is used to search all bases of the follower’s problem, we encode each base of (6) as individual of population. Let \(V=\{1,2,\ldots ,m\}\) be the set of all column indices of \(B\). \(q\) elements are selected from \(V\) and denoted by \(l=\{i_1,i_2,\ldots ,i_q\}\). Furthermore, if these columns are linearly independent, then \(l\) is taken as an individual. An initial population with the size of \(N_p\) can be generated by lexicographically selecting \(N_p\) individuals.

But it is computationally expensive if the determinant method is used for each \(l\) to judge whether the selected columns are linearly independent. We present a simplified approach by the following example. Let

$$\begin{aligned} B=\left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} b_{11} &{} b_{12} &{} b_{13} &{} b_{14} \\ b_{21}&{} b_{22} &{} b_{23} &{} b_{24} \\ \end{array} \right) \end{aligned}$$
(11)

Here, \(q=2, m=4\). For convenience, we denote by \(B\{i,j\}\) the matrix consisting of the \(i\)-th and \(j\)-th columns of \(B\). Without loss of generality, set

$$\begin{aligned} B\{1,2\}=\left( \begin{array}{c@{\quad }c} b_{11} &{} b_{12} \\ b_{21}&{} b_{22} \\ \end{array} \right) \end{aligned}$$
(12)

be nonsingular, then we get an individual \(l=\{1,2\}\). Further, we let

$$\begin{aligned} B\{1,2\}^{-1}B=\left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} 1 &{} 0 &{} \bar{b}_{13} &{}\bar{b}_{14} \\ 0 &{} 1 &{} \bar{b}_{23} &{}\bar{b}_{24} \\ \end{array} \right) \end{aligned}$$
(13)

If \(\bar{b}_{23}\ne 0\), then \(B\{1,3\}\) is nonsingular, that is, \(l'=\{1,3\}\) is also an individual, otherwise, \(l'\) is ignored. The same procedure can be applied to judge whether \(l''=\{1, 4\}\) is an individual.

If \(B\{1,3\}\) is nonsingular, based on the \(B\{1,2\}^{-1}\), we can utilize the pivoting algorithm to obtain \(B\{1,3\}^{-1}\) as in the simplex method.

4.2 Fitness function

For any individual \(l=\{l_1,l_2,\ldots ,l_q\}\), without loss of generality, the basic matrix associated with \(l\) is also denoted by \(\bar{B}\). Recall that the leader’ problem is solvable, \(y(x)\) is linear and the added constraints, \(\bar{B}^{-1}(b-Ax)\ge 0\) and \(\pi \ge 0\), are also linear, it follows that (10) is solvable with respect to \(x\). Hence, some deterministic methods can be selected to solve the problem (10) and the optimal objective value is taken as the fitness of the individual.

4.3 Crossover and mutation operators

Crossover operator Let \(l=\{l_1,l_2,\ldots ,l_q\}\) and \(l'=\{l'_1,l'_2,\ldots ,l'_q\}\) be parents selected for crossover. First, we give a cross-position in \(l'\) as in one-point crossover operator. Then the components at left-hand side of the cross position are, one by one, taken as entering elements for l and some components in l are removed as leaving elements, which follows the same procedure as in the simplex method except for the minimum ratio rule, and makes the total of elements in l keep constant. For any entering element which is also in l, the replacement is ignored. When all entering procedures are finished, a crossover offspring is generated. When a cross-position is given in l, some elements in \(l'\) will be replaced and the other offspring can be generated.

As an example, we let \(l=\{1,2\}\) and \(l'=\{3,4\}\) be parents for crossover in the above example, and the cross-position is selected at random as follows.

$$\begin{aligned} l=\{1\mid 2\},\,\, l'=\{3\mid 4\} \end{aligned}$$

It means that 3 should be put into l, and 1 or 2 will be removed from l. Let us re-check the matrix (13). Any nonzero element is chosen in the 3-th column of the matrix. If \(\bar{b}_{i3}\) is chosen, then the \(i\)-th element in l should be replaced, \(i=1,2\). after doing so, a crossover offspring is generated.

As discussed in the above subsection, when the replacements are executed one by one, the inverse matrices associated with individuals can be gotten by the pivot algorithm.

In order to generate offspring as well as possible, we always take the best individual as one of parents in each crossover process.

Mutation operator Let \(\bar{l}=\{l_1,l_2,\ldots ,l_q\}\) be a parent individual selected for mutation, and \(\tilde{B}=B\{l_1,l_2,\ldots ,l_q\}\). First, generate randomly a integer \(z(1\le z\le q)\) and select \(z\) indices from the set \(V\backslash \bar{l}\), and then put these indices, one by one, into \(\bar{l}\) as done in the crossover operation. As a consequence, \(z\) indices are replaced, and a mutation offspring is obtained.

5 Proposed algorithm

In this section, we present a genetic algorithm using a finite search space(GA-FSS), which is described as follows.

Step 0 :

Some parameters are given, population size \(N_p\), crossover probability \(p_c\), mutation probability \(p_m\), \(\mathfrak {R}=\phi \) and an integer \(K\);

Step 1 :

\(N_p\) individuals are generated, these individuals form an initial population denoted by \(pop(0)\). Let \(k=0\);

Step 2 :

The fitness is evaluated for each individual, and the best individual \(l_{best}\) in \(pop(k)\) is recorded with its fitness value \(F_{best}\). These individuals with fitness values are put into \(\mathfrak {R}\);

Step 3 :

Crossover is applied to each individual in \(pop(k)\) according to crossover probability \(p_c\), and crossover offspring set is denoted by \(O_c\);

Step 4 :

Mutation is executed to each each individual in \(pop(k)\) according to mutation probability \(p_m\), and mutation offspring set is represented as \(O_m\);

Step 5 :

Evaluate offspring generated by crossover and mutation operators. If some offspring belong to \(\mathfrak {R}\), the evaluation can be ignored. Select the best \(N_p\) individuals from \(pop(k)\cup O_c\cup O_m\) as next population \(pop(k+1)\), and update \(F_{best}\) as well as \(l_{best}\). Also, select some offspring for which the fitness evolution is computation-complex, then put these offspring into the set \(\mathfrak {R}\) until there are \(K\) points.

Step 6 :

If the stopping criterion is satisfied, then output \(l_{best}\) and \(F_{best}\); otherwise, let \(k=k+1\), return to Step 3.

The procedure shows at least two advantages: one is that the search space has at most \(C^q_{m}\) points, which is far smaller than those of most existing algorithms; the other is that there exists a local searching process in the optimization of (10), which, as a sub-procedure of optimization, is helpful for GA-FSS to improve the value of \(F\).

Since the design of GA-FSS depends mainly on the follower problem, it can be used to deal with more general BLPPs than the approaches in Calvete et al. (2008, (2009). When the denominator of \(f(x,y)\) is 1 and the leader’s problem is linear, the problem becomes a linear BLPP, which is the simplest case for GA-FSS.

6 Convergence analysis

In order to analyze the convergent of the proposed algorithm, some preliminaries are first given Wang (2011), Bäck (1996):

Definition 3

(Monotonic) If \(F(l_{best}(k+1))\le F(l_{best}(k))\), then population sequence \(\{pop(k), k=0,1,2,\ldots , \}\) is said to be monotonic.

Here, \(l_{best}(k)\) stands for the best individual in \(pop(k)\), and \(F(l)\) is the fitness of individual l.

Lemma 1

(Wang 2011; Bäck 1996) If a genetic algorithm satisfies: (i) the search space is finite, (ii) population sequence is monotonic, and (iii) for \(\forall l, l'\in \bar{\varOmega }\), \(\exists p_0>0\) such that \(prob\{l'=mut(l)\}\ge p_0\), then it converges to a globally optimal solution with probability one.

Here, \(\bar{\varOmega }\) stands for the search space, \(prob\{A\}\) is the probability of \(A\) and \(mut(l)\) represents the mutation offspring of \(l\).

In the proposed GA-FSS, the search process, in fact, is executed at two levels. The first-level search is to explore the set of potential bases of the follower, whereas the second-level search is to find the best point \((x, y)\) related to each basis. Recall that we always assume that the leader problem is solvable, it implies that one can obtain the optima of (4) once he finds out the ’best’ basis in the first-level search. Applying Lemma 1, we have

Theorem 3

The proposed GA-FSS converges to global optima with probability one.

Proof

Notice that the search space of GA-FSS only includes at most \(C^q_{m}\) points, it leads to that (i) is satisfied. According to the selection operator, the best individuals found so far are always selected for the next generation of population, hence, (ii) is also satisfied. Next, we need to verify that (iii) holds.

Without any loss of generality, let

$$\begin{aligned} l=(i_1,\ldots , i_s,i_{s+1},\ldots ,i_q), l'=(i_1, \cdots , i_s,j_{s+1},\ldots ,j_q), \end{aligned}$$

that is, there are \(q-s\) different entries. Now we compute \(prob\{l'=mut(l)\}\). First, the probability of selecting \((j_{s+1},\ldots ,j_q)\) from \(V\setminus l\) is \(\frac{1}{q}\frac{1}{C_{m-q}^{q-s}}\). Here, \(\frac{1}{q}\) means the probability of \(z\) being taken as \(q-s\), i.e., the total \(q-s\) indices will be put into \(l\), whereas \(\frac{1}{C_{m-q}^{q-s}}\) is the probability of \((j_{s+1},\ldots ,j_q)\) being chosen for entering basis. Next, when \(j_v(v\in \{s+1,\ldots ,q\})\) is put into \(l\), we denote by \(\hat{p}_v\) the probability of the replaced element belonging to \((i_{s+1},\ldots ,i_q)\), it is obvious that \(\hat{p}_v\ge \frac{1}{q}\). It follows that when \(j_{s+1},\ldots ,j_q\), one by one, are put into \(l\), the probability of \((i_{s+1},\ldots ,i_q)\) being replaced is not less than \(\frac{1}{q^{q-s}}\). As a result, we have

$$\begin{aligned} prob\{l'=mut(l)\}\ge p_m\times \frac{1}{q}\frac{1}{C_{m-q}^{q-s}}\times \frac{1}{q^{q-s}}>0 \end{aligned}$$

This completes the proof. \(\square \)

7 Computational experiments

The computational experiments were carried out on two groups of bilevel programming problems with different scales, small and larger-sized problems. First, we executed GA-FSS on 4 small-sized problems which were frequently solved to illustrate the performance of algorithmic methods in literatures. In addition, we randomly generated four moderate-sized problems on which GA-FSS with different parameter configurations was tested. By comparing the computational results, we confirmed the reasonable parameter setting. Finally, GA-FSS with confirmed configuration was applied to solve some larger-sized bilevel programs and computational costs were compared. All computations were executed on a Pentium IV 2.66 processor with 256M RAM running Windows XP.

First, four small scale problems are given as follows.

Example 1

(Wang et al. 2005)

$$\begin{aligned} \left\{ \begin{aligned}&\min _{x\ge 0} -8x_1-4x_2+4y_1-40y_2-4y_3\\&\min _{y\ge 0} \frac{1+x_1+x_2+2y_1-y_2+y_3}{6+2x_1+y_1+y_2-3y_3}\\&s.t. \ -y_1+y_2+y_3+y_4=1,\\&2x_1+y_1+2y_2-0.5y_3+y_5=1,\\&2x_2+2y_1-y_2-0.5y_3+y_6=1. \end{aligned} \right. \end{aligned}$$
(14)

Example 2

$$\begin{aligned} \left\{ \begin{aligned}&\min _{x\ge 0} (-8x_1-4x_2+4y_1-40y_2-4y_3+29.2)^2\\&\min _{y\ge 0} \frac{1+x_1+x_2+2y_1-y_2+y_3}{6+2x_1+y_1+y_2-3y_3}\\&s.t. \ -y_1+y_2+y_3+y_4=1,\\&2x_1+y_1+2y_2-0.5y_3+y_5=1,\\&2x_2+2y_1-y_2-0.5y_3+y_6=1. \end{aligned} \right. \end{aligned}$$
(15)

Example 3

(Lan et al. 2007)

$$\begin{aligned} \left\{ \begin{aligned}&\max _{x\ge 0} -2x+11y\\&\max _{y\ge 0} -x-3y\\&s.t. \ x-2y\le 4, 2x-y\le 24, 3x+4y\le 96,\\&x+7y\le 126, -4x+5y\le 65, x+4y\ge 8.\\ \end{aligned} \right. \end{aligned}$$
(16)

Example 4

(Toll 1 in Colson et al. (2005), a toll-setting problem) The network and the costs are shown in Figure 2. Nodes 1 and 5 constitute its unique origin-destination pair. In this problem, \(n=3, m=8, p=3,\) and \(q=13\). For the more detailed description of the problem, refer to Colson et al. (2005).

Fig. 2
figure 2

Network for Toll 1

For Example 1, we enumerated all 20 potential bases, in which only 4 bases are available. The optimal objective value is \(-29.2\). Example 2 is as same as Example 1 except that \(F\) is nonlinear. Obviously, the problem has the same optimal solution as Example 1, and the optimal value is 0. Example 3 is a linear programming problem, the optimal objective value reported in the literature is \(85.0855\). As a real-world example, the reported maximal profit of Example 4 is 7.

Table 1 Computational results by GA-FSS and comparison

For these small-scale problems, the parameters were given as follows: \(p_c=0.8, p_m=0.1\), \(N_p=5\) and \(K=N_p*3\). The maximum number of generations(MaxG) was taken as 10. For each example, 10 independent runs were executed, the computational results were presented in Table 1. In this Table, \(\bar{F}\) and \(std\) stand for the mean of objective values and standard deviation in 10 runs, respectively. Also, in order to measure the convergent speed of GA-FSS, we considered the number of generations(G) and CPU time(T), and recorded the means of G and T. These two means are represented by \(\bar{G}\) and \(\bar{T}\), respectively. In column \(\bar{T}\), the numbers in square brackets are CPU time needed by the compared algorithms(Despite fact that CPU time was obtained on different hardware equipments, we listed them in the table as a reference). “\(-\)” means the item does not exist.

In addition, in order to obtain a reasonable parameter configuration, we randomly generated \(4\) moderate-sized bilevel programs with the following type.

$$\begin{aligned} \left\{ \begin{aligned}&\min _{x\ge 0} cx+dy\\&\min \frac{c_1x+d_1y+e_1}{c_2x+d_2y+e_2}\\&s.t. \ Ax+By=b, \quad y \ge 0 \end{aligned} \right. \end{aligned}$$
(17)

All coefficients were generated from uniform distributions. \(c, d\) were randomly generated in \([-10, 10]\). The coefficients of fractional objectives were taken as follows: \(c_1,d_1\) and \(e_1\) were randomly generated in [30,40] and \(c_2,d_2, e_2\) were taken randomly in \([10, 20]\). \(A\) and \(B\) were generated in [\(-\)10, 10] except for the first row generated in \([0,10]\), which can guarantee the constraint region is bounded. Also, \(b\) was obtained by taking the sum of the absolute values of the coefficients of each constraint. Considering that the search space is composed of potential bases of the follower’s problem and the leader’s problems can be solved determinately, we uniformly took \(n=10\). \(m\) was taken as 10, 20, 30 and 40 respectively, and \(q\) is 50 % of \(m\), then 4 problems with different scales\((p1-p4)\) were generated. In GA-FSS, three main parameters are population size \(N_p\), crossover probability \(p_c\) and mutation probability \(p_m\). In order to find a reasonable configuration of these parameters, we consulted the procedure in Calvete et al. (2008) and took two levels for each parameter as follows: \(N_p=50\) or 100, \(p_c=0.5\) or 0.8 and \(p_m=0.05\) or 0.1. Each of parameters has 2 levels, total \(2^3\) configurations\((c1-c8)\) should be considered, see Table 2. For each configuration, we executed GA-FSS 10 independent runs on each generated problem and when a fixed number of individual evaluations was satisfied, the algorithm was stopped. It follows that total \(4\times 10\times 8\) runs of GA-FSS need to be executed. The mean objective values of all problems \((p1-p4)\) are shown in Table 2. In order to show clearly which is the best one among all configurations, we normalized the data on each of columns \(p2-p4\) by

$$\begin{aligned} \varLambda _i^*=\frac{\varLambda _{max}-\varLambda _i}{\varLambda _{max}-\varLambda _{min}} \end{aligned}$$

Here, \(\varLambda =(\varLambda _1,\ldots ,\varLambda _8)\) is a column vector, and \(\varLambda ^*\) is a normalized vector. According to the normalization procedure, the minimum in each column is normalized as 1. For column \(p1\), we directly take 1 as normalization result. The process can make all computational results plotted clearly in one figure, see Fig. 3, where \(NF\) stands for the normalized values of ojectives. From the figure, one can see that \(c6\) is the best one in that all \(4\) problems achieve 1. As a result, we select the following parameter setting: \(N_p=100, p_c=0.5\) and \(p_m=0.1\).

Fig. 3
figure 3

Computational results at all parameter configurations

Table 2 Mean objective values at different parameter configurations

The third part of the computational experiment is to test the performance of GA-FSS on moderate- and larger-scale problems. In these examples, the maximum number of variables is up to 150, few algorithms were tested on the scale. Using the same procedure as in Calvete et al. (2009), we generated seven types of linear fractional bilevel programming problems, and compared the scales of the search spaces used by GA-FSS and by EPHS(proposed in Calvete et al. (2009), and executed on a PC Pentium 4 at 3.0 GHz having 3.5 GB of RAM), see Table 3, where Num-P represents the number of points in the search space. Obviously, for each problem the search space of GA-FSS is much smaller than that of EPHS. We executed GA-FSS with the selected parameter values 10 independent runs on each problem, and the termination criterion was taken as follows: when GA-FSS can’t improve the objective value in successive 50 iterations, the algorithm stops. For each problem there exists no optimal solution as a reference, we took the smallest value in all 10 runs as the best solution to the problem.

Table 3 Scales of problems and comparison of the search spaces used by GA-FSS and EPHS
Table 4 Mean CPU time(CPU), mean of the individual numbers(Num-Ind) and the number of runs(NR) giving the best solutions

In the experiment, for each problem, when the best result appears for the first time, we recorded: (1) the mean of the individual numbers(Num-Ind) evaluated by GA-FSS; and (2) the mean of the CPU time(CPU) invested by the algorithm. All results are shown in Table 4. In order to compare our results with those provided by EPHS, in accordance with EPHS, we calculated the expectancy of the individual numbers required for finding the best solutions. Besides, we listed the numbers(NR) of runs giving the best solutions for each problem.

In Table 4, one can see that the individual number (Num-Ind) required by GA-FSS is far less than that by EPHS, which is mainly due to GA-FSS using a sub-procedure of optimization for the leader objective and having smaller search space than EPHS. For the number of runs(NR) giving the best solutions, based on the analysis of variances, we conclude that the proposed algorithm is almost the same stable and effective as EPHS. In the table “\(-\)” means that the values are not provided in the corresponding reference.

We also executed GA-FSS on two kinds of BLPPs with \(F(x,y)=cx\,{+}\,dy\,{+}\,e, x\in \{0,1\}\) and \(F(x,y)=(cx\,{+}\,dy\,{+}\,e)^2\) respectively. The computational results show there is no evident difference from the linear fractional case expect for CPU time, which can be easily explained in that the CPU time can be affected by selecting different optimization methods to solve the leader’s problems.

Besides, it should be noted that GA-FSS can deal with more general BLPPs than EPHS, for example, when the leader’s problem is convex, EPHS can’t be used to solve this kind of problems.

8 Conclusion

It is very difficult for us to design an efficient algorithm for nonlinear BLPPs with global convergence, especially when the scale of problem is very large. As a consequence, some theoretical results of optimality or the features of problems should be considered in the design of approaches. In this paper we presented an efficient genetic algorithm by making use of the optimality results of linear fractional programs, and in this algorithmic approach, there are no any restrictions on the leader except that it is solvable. In future work, some BLPPs with other follower will be considered, such as quadratic programming problem, etc.