1 Introduction

This paper is dedicated to the comparison of solution methods for the numerical treatment of the or-constrained optimization problem

$$\begin{aligned} \begin{aligned} f(x)&\,\rightarrow \,\min&\\ g_i(x)&\,\le \,0&\qquad&i\in \mathcal {M}&\\ h_j(x)&\,=\,0&\qquad&j\in \mathcal {P}&\\ G_l(x)\,\le \, 0\,\vee \,H_l(x)&\,\le \,0&\qquad&l\in \mathcal {Q}.&\end{aligned} \end{aligned}$$
(MPOC)

Here, the functions \(f,g_i,h_j,G_l,H_l:\mathbb {R}^n\rightarrow \mathbb {R}\) are assumed to be continuously differentiable for all \(i\in \mathcal {M}:=\{1,\ldots ,m\}\), \(j\in \mathcal {P}:=\{1,\ldots ,p\}\), and \(l\in \mathcal {Q}:=\{1,\ldots ,q\}\). For brevity, \(g:\mathbb {R}^n\rightarrow \mathbb {R}^m\), \(h:\mathbb {R}^n\rightarrow \mathbb {R}^p\), \(G:\mathbb {R}^n\rightarrow \mathbb {R}^q\), and \(H:\mathbb {R}^n\rightarrow \mathbb {R}^q\) are the mappings which possess the component functions \(g_i\) (\(i\in \mathcal {M}\)), \(h_j\) (\(j\in \mathcal {P}\)), \(G_l\) (\(l\in \mathcal {Q}\)), and \(H_l\) (\(l\in \mathcal {Q}\)), respectively. Throughout, the feasible set of (MPOC) will be denoted by \(X\subset \mathbb {R}^n\). Emphasizing that \(\vee\) denotes the logical ‘or’, the last q constraints in (MPOC) force \(G_l(x)\) or \(H_l(x)\) to be less than or equal to zero for all \(l\in \mathcal {Q}\) whenever \(x\in \mathbb {R}^n\) is feasible to (MPOC). Thus, we will refer to (MPOC) as a mathematical program with or-constraints.

Clearly, (MPOC) is an instance of logical mathematical programming, see e.g. [22] for an overview, and covers several interesting applications e.g. from process engineering and scheduling, see [19] and references therein. Particularly, or-constraints can be used to avoid the formulation of so-called Big-M-constraints of type

$$\begin{aligned} \begin{aligned} G_l(x)&\,\le \, My_l&&l\in \mathcal {Q}&\\ H_l(x)&\,\le \,M(1-y_l)&&l\in \mathcal {Q}&\\ y_l&\,\in \,\{0,1\}&&l\in \mathcal {Q}& \end{aligned} \end{aligned}$$

which would induce a mixed-integer regime and the need for an a priori calculation of the constant \(M>0\), see [30]. Let us note that or-constrained programming with affine data functions is closely related to disjunctive programming in the sense of Balas, see [1], which means that a linear function is minimized over the union of convex polyhedral sets. The situation where an arbitrary convex function is minimized over a union of sets, each characterized via convex inequality constraints, is discussed in [20]. Using

$$\begin{aligned} O:=\{(a,b)\in \mathbb {R}^2\,|\,a\le 0\,\vee \,b\le 0\}, \end{aligned}$$
(1)

which can be written as a union of two convex polyhedral sets, (MPOC) can be represented equivalently by

$$\begin{aligned} \begin{aligned} f(x)&\,\rightarrow \,\min&\\ g_i(x)&\,\in \,\mathbb {R}^0_-&\qquad&i\in \mathcal {M}&\\ h_j(x)&\,\in \,\{0\}&\qquad&j\in \mathcal {P}&\\ (G_l(x),H_l(x))&\,\in \,O&\qquad&l\in \mathcal {Q}.&\end{aligned} \end{aligned}$$

This is a disjunctive program in the sense of [3, 14, 29], where \(\mathbb {R}^0_-\) denotes the set of all nonpositive real numbers. As we will see later in Sect. 4, the model (MPOC) is closely related to so-called mathematical programs with switching constraints (MPSCs), see [24, 30], and mathematical programs with complementarity constraints (MPCCs), see e.g. [21, 28, 32, 40]. First theoretical investigations which address the model (MPOC) from the viewpoint of stationarity conditions and constraint qualifications can be found in [30, Section 7].

This paper is devoted to the numerical treatment of (MPOC). Here, we want to exploit three different ideas from disjunctive programming in order to develop numerical strategies for the computational solution of or-constrained programs. We will focus our attention on the subsequently stated approaches:

  • reformulation of the or-constraints as nonlinear standard inequality constraints using so-called NCP-functions,

  • reformulation of (MPOC) as an MPSC or MPCC which can be tackled with the aid of relaxation methods from the literature, see e.g. [21, 24], and

  • direct relaxation of the or-constraints using a Scholtes-type method.

Here, we first study the individual qualitative properties of these methods before we provide a quantitative numerical comparison based on different model problems from or-constrained optimization.

Originally, NCP-functions, where NCP abbreviates nonlinear complementarity problem, were introduced in order to replace systems of complementarity constraints by nonlinear and possibly nonsmooth systems of equalities which can be solved e.g. by suitable Newton-type methods, see e.g. [7, 8, 11, 12]. A comprehensive overview of NCP-functions and their properties can be found in [17, 26, 36]. Here, we exploit the fact that some NCP-functions can be used to replace or-constraints by nonlinear and possibly nonsmooth systems of inequalities.

As it turns out, one can transfer (MPOC) into an MPSC or MPCC for the respective price of 2q slack variables. It has been reported in [30] that this transformation generally comes along with additional local minimizers of the surrogate MPSC, and a similar behavior occurs when the transformation into an MPCC is considered. Here, we additionally study the relationship between the stationary points associated with (MPOC) and the stationary points of the surrogate problem. As we will show, both transformations may induce additional stationary points, which has to be taken into account when this solution approach is used since first-order methods for the numerical solution of MPSCs or MPCCs generally compute points satisfying certain problem-tailored stationarity conditions.

Noting that the variational geometry of X is directly related to the variational geometry of the set O from (1), the kink in O induces irregularities in X which may cause serious difficulties when (MPOC) is solved numerically (e.g. via a direct treatment using suitable NCP-functions as suggested above). In order to overcome this potential issue, one could try to regularize this kink using a suitable relaxation approach. In the past, Scholtes’ relaxation method has turned out to be a robust approach for the numerical solution of MPCCs, MPSCs, and other models from disjunctive programming, see e.g. [21, 24, 34]. That is why we want to adapt it to the setting at hand. For this purpose, we suggest two different approaches based on the smoothing of the popular Fischer–Burmeister function, see [12, 23], and a shifted version of the Kanzow–Schwartz function, see [25].

The remaining parts of the paper are organized as follows: In Sect. 2, we comment on the notation used in the manuscript. Furthermore, we recall some basics from nonlinear programming as well as essential stationarity notions and constraint qualifications for or-, switching-, and complementarity-constrained programming. We study the reformulation of or-constraints with the aid of NCP-functions in Sect. 3. Additionally, we investigate how the stationary points of (MPOC) and its reformulation are related. As we will see, this heavily depends on the choice of the underlying NCP-function. In Sect. 4, we present two reasonable reformulations of (MPOC) as an MPSC as well as an MPCC and study the relationship of the original problem and its surrogate w.r.t. minimizers and stationary points, respectively. Direct Scholtes-type relaxation techniques associated with (MPOC) which are based on the smoothed Fischer–Burmeister function as well as the Kanzow–Schwartz function are the topic of Sect. 5. For both approaches, we study the underlying convergence properties as well as the regularity of the appearing subproblems. In Sect. 6, we present a quantitative comparison of all these methods based on three different models from or-constrained programming, namely a nonlinear disjunctive program in the sense of Balas, see [1], an optimization problem whose variables possess so-called gap domains, and an optimal control problem whose controls have to satisfy a pointwise or-constraint. Some concluding remarks close the paper in Sect. 7.

2 Notation and preliminaries

2.1 Basic notation

The subsequently stated tools from variational analysis can be found in [4, 31, 33].

For a vector \(x\in \mathbb {R}^n\), \(\left\| x\right\| _{2}:=\sqrt{x\cdot x}\) is used to denote its Euclidean norm where \(\cdot\) represents the Euclidean inner product. Choosing \(\bar{x}\in \mathbb {R}^n\) and \(\varepsilon >0\) arbitrarily, \(\mathbb {B}^\varepsilon (\bar{x})\) and \(\mathbb {S}^\varepsilon (\bar{x})\) denote the closed \(\varepsilon\)-ball and the \(\varepsilon\)-sphere around \(\bar{x}\), respectively, w.r.t. the Euclidean norm. Let \(\mathtt {e}^n\in \mathbb {R}^n\) be the all-ones vector, while for each \(i\in \{1,\ldots ,n\}\), \(\mathtt {e}^n_i\in \mathbb {R}^n\) represents the i-th unit vector in \(\mathbb {R}^n\). Whenever \(A\subset \mathbb {R}^n\) is a nonempty set and \(\bar{x}\in A\) is chosen arbitrarily, then the closed cone

$$\begin{aligned} \mathcal {T}_A(\bar{x}):=\left\{ d\in \mathbb {R}^n\,\bigg |\, \begin{aligned}&\exists \{x_k\}_{k\in \mathbb {N}}\subset A\,\exists \{\tau _k\}_{k\in \mathbb {N}}\subset \mathbb {R}_+:\\&\qquad x_k\rightarrow \bar{x},\,\tau _k\downarrow 0,\,(x_k-\bar{x})/\tau _k\rightarrow d \end{aligned} \right\} \end{aligned}$$

is called tangent or Bouligand cone to A at \(\bar{x}\) where \(\mathbb {R}_+\) denotes the set of all positive real numbers. Furthermore, the nonempty, closed, convex cone

$$\begin{aligned} A^\circ :=\{y\in \mathbb {R}^n\,|\,\forall x\in A:\, x\cdot y\le 0\} \end{aligned}$$

is referred to as the polar cone of A. It is well known that for any two sets \(B_1,B_2\subset \mathbb {R}^n\), the polarization rule \((B_1\cup B_2)^\circ =B_1^\circ \cap B_2^\circ\) is valid.

Let \(\{v^i\}_{i=1}^r,\{w^j\}_{j=1}^s\subset \mathbb {R}^n\) be two families of vectors. Then, \(\{v^i\}_{i=1}^r\cup \{w^j\}_{j=1}^s\) is said to be positive-linearly dependent if there exist scalars \(\alpha _i\ge 0\) (\(i=1,\ldots ,r\)) and \(\beta _j\) (\(j=1,\ldots ,s\)) which satisfy

$$\begin{aligned} 0=\sum \nolimits _{i=1}^r\alpha _iv^i+\sum \nolimits _{j=1}^s\beta _jw^j \end{aligned}$$

and do not all vanish simultaneously. Whenever such scalars do not exist, we call \(\{v^i\}_{i=1}^r\cup \{w^j\}_{j=1}^s\) positive-linearly independent. Note that positive-linear independence is stable under small perturbations, see [24, Lemma 2.2].
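Positive-linear (in)dependence can also be checked computationally: it reduces to a rank test on the vectors \(w^j\) and a small linear program over the nonnegative coefficients. The following sketch is purely illustrative; the use of scipy.optimize.linprog, the tolerance, and the encoding are assumptions of this example and not part of the cited references.

```python
# Illustrative test for positive-linear dependence of {v^i} u {w^j}: the rows of
# V carry the sign-constrained coefficients alpha >= 0, the rows of W the free
# coefficients beta.  The tolerance tol is an assumption of this sketch.
import numpy as np
from scipy.optimize import linprog

def positive_linearly_dependent(V, W, tol=1e-9):
    """True iff some (alpha, beta) != 0 with alpha >= 0 solves
    V.T @ alpha + W.T @ beta = 0."""
    r, n = V.shape
    s = W.shape[0]
    # Case 1: the w-vectors alone are linearly dependent.
    if s > 0 and np.linalg.matrix_rank(W) < s:
        return True
    # Case 2: search for a combination with some alpha_i > 0 by maximizing
    # sum(alpha) subject to V.T alpha + W.T beta = 0, 0 <= alpha <= 1, beta free.
    c = np.concatenate([-np.ones(r), np.zeros(s)])
    A_eq = np.hstack([V.T, W.T]) if s > 0 else V.T
    res = linprog(c, A_eq=A_eq, b_eq=np.zeros(n),
                  bounds=[(0.0, 1.0)] * r + [(None, None)] * s, method="highs")
    return res.status == 0 and -res.fun > tol

# {(1,0)} together with w = (0,1) is positive-linearly independent; adding the
# vector (-1,0) to the sign-constrained part creates a dependence.
V, W = np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])
print(positive_linearly_dependent(V, W))                               # False
print(positive_linearly_dependent(np.vstack([V, [-1.0, 0.0]]), W))     # True
```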

For a locally Lipschitz continuous functional \(\psi :\mathbb {R}^n\rightarrow \mathbb {R}\) and some point \(\bar{x}\in \mathbb {R}^n\), the (possibly empty) set

$$\begin{aligned} \partial ^{\mathrm{F}}\psi (\bar{x}) := \left\{ y\in \mathbb {R}^n\,\bigg |\, \liminf \limits _{x\rightarrow \bar{x}} \frac{\psi (x)-\psi (\bar{x})-y\cdot (x-\bar{x})}{\left\| x-\bar{x}\right\| _{2}}\ge 0 \right\} \end{aligned}$$

is called Fréchet (or regular) subdifferential of \(\psi\) at \(\bar{x}\). Based on that, one can define the Mordukhovich (or basic) subdifferential of \(\psi\) at \(\bar{x}\) by

$$\begin{aligned} \partial ^{\mathrm{M}}\psi (\bar{x}) := \left\{ y\in \mathbb {R}^n\,\bigg |\, \begin{aligned}&\exists \{x_k\}_{k\in \mathbb {N}}\subset \mathbb {R}^n\,\exists \{y_k\}_{k\in \mathbb {N}}\subset \mathbb {R}^n:\\&\qquad x_k\rightarrow \bar{x},\,y_k\rightarrow y,\,y_k\in \partial ^{\mathrm{F}}\psi (x_k)\,\forall k\in \mathbb {N}\end{aligned} \right\} . \end{aligned}$$

Finally, the Clarke (or convexified) subdifferential of \(\psi\) at \(\bar{x}\) is given by

$$\begin{aligned} \partial ^{\mathrm{C}}\psi (\bar{x}):={\overline{{\text {conv}}}}\partial ^{\mathrm{M}}\psi (\bar{x}) \end{aligned}$$

where \({\overline{{\text {conv}}}}A\) denotes the closed, convex hull of \(A\subset \mathbb {R}^n\). By construction, we have \(\partial ^{\mathrm{F}}\psi (\bar{x})\subset \partial ^{\mathrm{M}}\psi (\bar{x})\subset \partial ^{\mathrm{C}}\psi (\bar{x})\) and all these sets coincide with the singleton comprising only the gradient of \(\psi\) at \(\bar{x}\) whenever \(\psi\) is continuously differentiable at \(\bar{x}\).
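For a classical one-dimensional illustration of how these three constructions may differ (this example is standard and not specific to the present paper), consider \(\psi (x):=-|x|\) at \(\bar{x}=0\). A direct computation based on the above definitions yields

$$\begin{aligned} \partial ^{\mathrm{F}}\psi (0)=\varnothing ,\qquad \partial ^{\mathrm{M}}\psi (0)=\{-1,1\},\qquad \partial ^{\mathrm{C}}\psi (0)=[-1,1], \end{aligned}$$

i.e. both of the above inclusions may be strict.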

2.2 Preliminaries from nonlinear programming

Here, we briefly recall basic constraint qualifications from nonlinear programming which can be found in [2]. Therefore, let us consider the nonlinear program

$$\begin{aligned} \begin{aligned} f(x)&\,\rightarrow \,\min&\\ g_i(x)&\,\le \,0&&i\in \mathcal {M}&\\ h_j(x)&\,=\,0&&j\in \mathcal {P},&\end{aligned} \end{aligned}$$
(NLP)

i.e. we leave the or-constraints in (MPOC) out of our consideration for a moment. Let \(\tilde{X}\subset \mathbb {R}^n\) be the feasible set of (NLP) and fix some point \(\bar{x}\in \tilde{X}\). Frequently, we will use the index set of active inequality constraints given by

$$\begin{aligned} I^g(\bar{x}):=\{i\in \mathcal {M}\,|\,g_i(\bar{x})=0\}. \end{aligned}$$

The linearization cone to \(\tilde{X}\) at \(\bar{x}\) is given by

$$\begin{aligned} \mathcal {L}_{\tilde{X}}(\bar{x}):= \left\{ d\in \mathbb {R}^n\,\bigg |\, \begin{aligned} \nabla g_i(\bar{x})\cdot d&\,\le \,0&&i\in I^g(\bar{x})\\ \nabla h_j(\bar{x})\cdot d&\,=\,0&&j\in \mathcal {P} \end{aligned} \right\} \end{aligned}$$

and it is well known that \(\mathcal {T}_{\tilde{X}}(\bar{x})\subset \mathcal {L}_{\tilde{X}}(\bar{x})\) is valid while the converse inclusion only holds true under validity of a constraint qualification in general. Let us note that the polar cone of \(\mathcal {L}_{\tilde{X}}(\bar{x})\) is given by

$$\begin{aligned} \mathcal {L}_{\tilde{X}}(\bar{x})^\circ = \left\{ \sum \nolimits _{i\in I^g(\bar{x})}\lambda _i\nabla g_i(\bar{x}) +\sum \nolimits _{j\in \mathcal {P}}\rho _j\nabla h_j(\bar{x})\,\bigg |\, \forall i\in I^g(\bar{x}):\,\lambda _i\ge 0 \right\} . \end{aligned}$$

Now, recall that LICQ (MFCQ), the linear independence constraint qualification (the Mangasarian–Fromovitz constraint qualification), holds true for (NLP) at \(\bar{x}\) whenever the vectors from

$$\begin{aligned} \{\nabla g_i(\bar{x})\,|\,i\in I^g(\bar{x})\}\cup \{\nabla h_j(\bar{x})\,|\,j\in \mathcal {P}\} \end{aligned}$$

are linearly independent (positive-linearly independent). Furthermore, GCQ, the Guignard constraint qualification, is valid at \(\bar{x}\) whenever \(\mathcal {T}_{\tilde{X}}(\bar{x})^\circ =\mathcal {L}_{\tilde{X}}(\bar{x})^\circ\) holds true. Clearly, we have

$$\begin{aligned} \hbox {LICQ}\quad \Longrightarrow \quad \text {MFCQ}\quad \Longrightarrow \quad \text {GCQ} \end{aligned}$$

and the validity of any of these constraint qualifications at a local minimizer \(\bar{x}\) of (NLP) implies that the latter is a Karush–Kuhn–Tucker (KKT) point of (NLP), i.e. there are multipliers \(\lambda _i\ge 0\) (\(i\in I^g(\bar{x})\)) and \(\rho _j\) (\(j\in \mathcal {P}\)) which satisfy

$$\begin{aligned} 0=\nabla f(\bar{x})+\sum \limits _{i\in I^g(\bar{x})}\lambda _i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\rho _j\nabla h_j(\bar{x}). \end{aligned}$$

Let us briefly mention that nonsmooth multiplier rules of KKT-type for the problem (NLP) with locally Lipschitz continuous but not necessarily differentiable data functions which are stated in terms of Mordukhovich’s or Clarke’s subdifferential can be found in [31] and [38], respectively.

2.3 Preliminaries from disjunctive programming

In this section, we briefly recall some stationarity conditions and constraint qualifications for three classes of disjunctive programs, namely MPOCs, MPSCs, and MPCCs.

2.3.1 Mathematical programs with or-constraints

Problems of type (MPOC) were first considered from the viewpoint of disjunctive programming in [30, Section 7]. In the latter paper, the author introduced reasonable stationarity notions and constraint qualifications for (MPOC) which were motivated by the close relationship of switching- and or-constrained optimization problems. This relationship has been used in [24, Section 6.2.1] in order to solve a problem of type (MPOC) with the aid of relaxation schemes from switching-constrained programming. This approach will be discussed intensively in Sect. 4.1.

Let us fix a feasible point \(\bar{x}\in X\) of (MPOC). Frequently, we will make use of the index sets defined below:

$$\begin{aligned} I^{-0}(\bar{x})&:=\{l\in \mathcal {Q}\,|\,G_l(\bar{x})<0\wedge H_l(\bar{x})=0\},\\ I^{0-}(\bar{x})&:=\{l\in \mathcal {Q}\,|\,G_l(\bar{x})=0 \wedge H_l(\bar{x})<0\},\\ I^{-+}(\bar{x})&:=\{l\in \mathcal {Q}\,|\,G_l(\bar{x})<0 \wedge H_l(\bar{x})>0\},\\ I^{+-}(\bar{x})&:=\{l\in \mathcal {Q}\,|\,G_l(\bar{x})>0 \wedge H_l(\bar{x})<0\},\\ I^{0+}(\bar{x})&:=\{l\in \mathcal {Q}\,|\,G_l(\bar{x})=0 \wedge H_l(\bar{x})>0\},\\ I^{+0}(\bar{x})&:=\{l\in \mathcal {Q}\,|\,G_l(\bar{x})>0 \wedge H_l(\bar{x})=0\},\\ I^{--}(\bar{x})&:=\{l\in \mathcal {Q}\,|\,G_l(\bar{x})<0 \wedge H_l(\bar{x})<0\},\\ I^{00}(\bar{x})&:=\{l\in \mathcal {Q}\,|\,G_l(\bar{x})=0 \wedge H_l(\bar{x})=0\}. \end{aligned}$$

Clearly, these index sets provide a disjoint partition of \(\mathcal {Q}\). Furthermore, we set

$$\begin{aligned} \mathcal {I}(\bar{x}):=I^{0+}(\bar{x})\cup I^{+0}(\bar{x})\cup I^{00}(\bar{x}) \end{aligned}$$
(2)

for brevity. This set can be considered as the set of all active or-constraints. Now, we are in a position to state the definitions of MPOC-tailored stationarity concepts, see [30, Definition 7.1].

Definition 2.1

Let \(\bar{x}\in X\) be a feasible point of (MPOC). Then, \(\bar{x}\) is said to be

  1.

    weakly stationary (W-stationary) for (MPOC) whenever there exist multipliers \(\lambda _i\ge 0\) (\(i\in I^g(\bar{x})\)), \(\rho _j\) (\(j\in \mathcal {P}\)), \(\mu _l\ge 0\) (\(l\in I^{0+}(\bar{x})\cup I^{00}(\bar{x})\)), and \(\nu _l\ge 0\) (\(l\in I^{+0}(\bar{x})\cup I^{00}(\bar{x})\)) which satisfy

    $$\begin{aligned} 0&=\nabla f(\bar{x}) +\sum \limits _{i\in I^g(\bar{x})}\lambda _i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\rho _j\nabla h_j(\bar{x})\nonumber \\&\quad +\sum \limits _{l\in I^{0+}(\bar{x})\cup I^{00}(\bar{x})}\mu _l\nabla G_l(\bar{x}) +\sum \limits _{l\in I^{+0}(\bar{x})\cup I^{00}(\bar{x})}\nu _l\nabla H_l(\bar{x}), \end{aligned}$$
    (3)
  2.

    Mordukhovich-stationary (M-stationary) for (MPOC) whenever it is W-stationary while the associated multipliers additionally satisfy

    $$\begin{aligned} \forall l\in I^{00}(\bar{x}):\quad \mu _l\nu _l=0, \end{aligned}$$
  3.

    strongly stationary (S-stationary) for (MPOC) whenever it is W-stationary while the associated multipliers additionally satisfy

    $$\begin{aligned} \forall l\in I^{00}(\bar{x}):\quad \mu _l=0\,\wedge \,\nu _l=0. \end{aligned}$$

By definition, we obtain the relations

$$\begin{aligned} \text {S-stationarity}\quad \Longrightarrow \quad \text {M-stationarity}\quad \Longrightarrow \quad \text {W-stationarity} \end{aligned}$$

between these stationarity notions which are visualized in Fig. 1. In this paper, we will make use of the MPOC-tailored constraint qualifications defined below.

Fig. 1: Geometric visualizations of W-, M-, and S-stationarity for the program (MPOC) w.r.t. an index \(l\in I^{00}(\bar{x})\)

Definition 2.2

We say that MPOC-LICQ (MPOC-MFCQ) is valid at a feasible point \(\bar{x}\in X\) of (MPOC) whenever the gradients from

$$\begin{aligned} \Bigl [\{\nabla g_i(\bar{x})\,|\,i\in I^g(\bar{x})\}&\cup \{\nabla G_l(\bar{x})\,|\,l\in I^{0+}(\bar{x})\cup I^{00}(\bar{x})\}\\&\cup \{\nabla H_l(\bar{x})\,|\,l\in I^{+0}(\bar{x})\cup I^{00}(\bar{x})\} \Bigr ]\cup \{\nabla h_j(\bar{x})\,|\,j\in \mathcal {P}\} \end{aligned}$$

are linearly independent (positive-linearly independent).

Let \(\bar{x}\in X\) be a feasible point of (MPOC) and consider the associated tightened nonlinear problem

$$\begin{aligned} \begin{aligned} f(x)&\,\rightarrow \,\min&\\ g_i(x)&\,\le \,0&\qquad&i\in \mathcal {M}&\\ h_j(x)&\,=\,0&\qquad&j\in \mathcal {P}&\\ G_l(x)&\,\le \,0&\qquad&l\in I^-_G(\bar{x})\cup I^{0+}(\bar{x})\cup I^{00}(\bar{x})&\\ H_l(x)&\,\le \,0&\qquad&l\in I^-_H(\bar{x})\cup I^{+0}(\bar{x})\cup I^{00}(\bar{x}) \end{aligned} \end{aligned}$$
(TNLP)

where we used

$$\begin{aligned} I^-_G(\bar{x})&:= I^{-0}(\bar{x})\cup I^{-+}(\bar{x})\cup I^{--}(\bar{x}),\nonumber \\ I^-_H(\bar{x})&:= I^{0-}(\bar{x})\cup I^{+-}(\bar{x})\cup I^{--}(\bar{x}). \end{aligned}$$
(4)

Clearly, the feasible set of (TNLP) is a subset of X. It is easy to check that validity of MPOC-LICQ (MPOC-MFCQ) for (MPOC) at \(\bar{x}\) is equivalent to validity of standard LICQ (MFCQ) for (TNLP) at \(\bar{x}\). Furthermore, we note that the W-stationarity conditions for (MPOC) at \(\bar{x}\) coincide with the KKT conditions of (TNLP) at \(\bar{x}\).
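Since all of the above index sets are determined by the signs of \(G_l(\bar{x})\) and \(H_l(\bar{x})\) only, they are straightforward to evaluate at a given point. The following sketch is an illustration only; the numerical tolerance used to decide activity is an assumption of this example. It classifies the or-constraints and also returns the sets from (2) and (4).

```python
# Illustrative sketch: classify the or-constraints at a point based on the signs
# of G_l(x) and H_l(x).  The tolerance tol (treating |value| <= tol as zero) is
# an assumption of this example.
import numpy as np

def mpoc_index_sets(Gx, Hx, tol=1e-8):
    Gx, Hx = np.asarray(Gx, dtype=float), np.asarray(Hx, dtype=float)
    sgn = lambda v: np.where(np.abs(v) <= tol, 0, np.sign(v)).astype(int)
    sG, sH = sgn(Gx), sgn(Hx)
    labels = {(-1, 0): "I-0", (0, -1): "I0-", (-1, 1): "I-+", (1, -1): "I+-",
              (0, 1): "I0+", (1, 0): "I+0", (-1, -1): "I--", (0, 0): "I00"}
    sets = {name: [l for l in range(len(Gx)) if (sG[l], sH[l]) == key]
            for key, name in labels.items()}
    sets["I_active"] = sorted(sets["I0+"] + sets["I+0"] + sets["I00"])   # cf. (2)
    sets["IG-"] = sorted(sets["I-0"] + sets["I-+"] + sets["I--"])        # cf. (4)
    sets["IH-"] = sorted(sets["I0-"] + sets["I+-"] + sets["I--"])        # cf. (4)
    return sets

# Example with q = 3: constraint 0 is biactive, 1 satisfies G < 0 < H,
# and 2 satisfies G > 0 = H.
print(mpoc_index_sets([0.0, -1.0, 2.0], [0.0, 3.0, 0.0]))
```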

Below, we present necessary optimality conditions for (MPOC). These results can be easily distilled from [30, Theorems 7.1, 7.3].

Proposition 2.1

Let\(\bar{x}\in X\)be a local minimizer of (MPOC). Then, the following statements hold.

  1.

    If MPOC-LICQ holds at\(\bar{x}\), then it is an S-stationary point of (MPOC).

  2.

    If MPOC-MFCQ holds at\(\bar{x}\), then it is an M-stationary point of (MPOC).

In light of the available theory for other classes of disjunctive programs, it is reasonable to expect that there exist numerous other MPOC-tailored constraint qualifications weaker than MPOC-MFCQ whose validity guarantees M-stationarity of local minimizers. For instance, let us note that whenever the data functions g, h, G, and H are affine, then a local minimizer of the associated problem (MPOC) is always M-stationary, see [30, Theorem 7.2]. Second-order necessary and sufficient optimality conditions for (MPOC) can be easily obtained by applying the more general theory from [18, 29] for disjunctive programs to the problem at hand. In this paper, however, we do not rely on these results, which is why we refrain from a detailed presentation.

2.3.2 Mathematical programs with switching constraints

Let us consider the mathematical program

$$\begin{aligned} \begin{aligned} f(x)&\,\rightarrow \,\min&\\ g_i(x)&\,\le \,0&&i\in \mathcal {M}&\\ h_j(x)&\,=\,0&&j\in \mathcal {P}&\\ \tilde{G}_l(x)\tilde{H}_l(x)&\,=\,0&&l\in \mathcal {Q}&\end{aligned} \end{aligned}$$
(MPSC)

where \(\tilde{G}_l,\tilde{H}_l:\mathbb {R}^n\rightarrow \mathbb {R}\) (\(l\in \mathcal {Q}\)) are continuously differentiable functions. For later use, let \(\tilde{G},\tilde{H}:\mathbb {R}^n\rightarrow \mathbb {R}^q\) be the maps possessing the component functions \(\tilde{G}_l\) (\(l\in \mathcal {Q}\)) and \(\tilde{H}_l\) (\(l\in \mathcal {Q}\)), respectively. Note that (MPSC) results from (MPOC) by replacing the q or-constraints by q so-called switching constraints. That is why we refer to (MPSC) as a mathematical program with switching constraints. Theoretical and numerical investigations which address this problem class as well as an overview of underlying applications can be found in the recent papers [24, 30].

Let \(X_{\mathrm{SC}}\subset \mathbb {R}^n\) be the feasible set of (MPSC) and fix some point \(\bar{x}\in X_{\mathrm{SC}}\). We define

$$\begin{aligned} I^{\tilde{G}}(\bar{x})&:=\{l\in \mathcal {Q}\,|\,\tilde{G}_l(\bar{x})=0\,\wedge \,\tilde{H}_l(\bar{x})\ne 0\},\\ I^{\tilde{H}}(\bar{x})&:=\{l\in \mathcal {Q}\,|\,\tilde{G}_l(\bar{x})\ne 0\,\wedge \,\tilde{H}_l(\bar{x})= 0\},\\ I^{\tilde{G}\tilde{H}}(\bar{x})&:=\{l\in \mathcal {Q}\,|\,\tilde{G}_l(\bar{x})=0\,\wedge \,\tilde{H}_l(\bar{x})= 0\}. \end{aligned}$$

Clearly, these sets provide a disjoint partition of \(\mathcal {Q}\) and allow us to state suitable problem-tailored stationarity notions for (MPSC).

Definition 2.3

Let \(\bar{x}\in X_{\mathrm{SC}}\) be a feasible point of (MPSC). Then, \(\bar{x}\) is said to be

  1.

    weakly stationary (\(\hbox {W}_{\mathrm{SC}}\)-stationary) for (MPSC) whenever there exist multipliers \(\lambda _i\ge 0\) (\(i\in I^g(\bar{x})\)), \(\rho _j\) (\(j\in \mathcal {P}\)), \(\tilde{\mu }_l\) (\(l\in I^{\tilde{G}}(\bar{x})\cup I^{\tilde{G}\tilde{H}}(\bar{x})\)), and \(\tilde{\nu }_l\) (\(l\in I^{\tilde{H}}(\bar{x})\cup I^{\tilde{G}\tilde{H}}(\bar{x})\)) which satisfy

    $$\begin{aligned} 0&=\nabla f(\bar{x}) +\sum \limits _{i\in I^g(\bar{x})}\lambda _i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\rho _j\nabla h_j(\bar{x})\\&\quad +\sum \limits _{l\in I^{\tilde{G}}(\bar{x})\cup I^{\tilde{G}\tilde{H}}(\bar{x})} \tilde{\mu }_l\nabla \tilde{G}_l(\bar{x}) +\sum \limits _{l\in I^{\tilde{H}}(\bar{x})\cup I^{\tilde{G}\tilde{H}}(\bar{x})} \tilde{\nu }_l\nabla \tilde{H}_l(\bar{x}), \end{aligned}$$
  2.

    Mordukhovich-stationary (\(\hbox {M}_{\mathrm{SC}}\)-stationary) for (MPSC) whenever it is \(\hbox {W}_{\mathrm{SC}}\)-stationary while the associated multipliers additionally satisfy

    $$\begin{aligned} \forall l\in I^{\tilde{G}\tilde{H}}(\bar{x}):\quad \tilde{\mu }_l\tilde{\nu }_l=0, \end{aligned}$$
  3.

    strongly stationary (\(\hbox {S}_{\mathrm{SC}}\)-stationary) for (MPSC) whenever it is \(\hbox {W}_{\mathrm{SC}}\)-stationary while the associated multipliers additionally satisfy

    $$\begin{aligned} \forall l\in I^{\tilde{G}\tilde{H}}(\bar{x}):\quad \tilde{\mu }_l=0\,\wedge \,\tilde{\nu }_l=0. \end{aligned}$$

Again, we obtain the relations

$$\begin{aligned} \hbox {S}_{\mathrm{SC}}\text {-stationary}\quad \Longrightarrow \quad \hbox {M}_{\mathrm{SC}}\text {-stationary}\quad \Longrightarrow \quad \hbox {W}_{\mathrm{SC}}\text {-stationary} \end{aligned}$$

by definition of these stationarity concepts. In order to avoid confusion, we used the index SC to emphasize that the above stationarity notions address (MPSC) although this is also quite clear from the context. It should be noted that \(\hbox {S}_{\mathrm{SC}}\)-stationarity is equivalent to the KKT conditions of (MPSC) where the switching constraints are interpreted as simple equality constraints. A visualization of these stationarity concepts is provided in Fig. 2.

Fig. 2: Geometric visualizations of \(\hbox {W}_{\mathrm{SC}}\)-, \(\hbox {M}_{\mathrm{SC}}\)-, and \(\hbox {S}_{\mathrm{SC}}\)-stationarity for the program (MPSC) w.r.t. an index \(l\in I^{\tilde{G}\tilde{H}}(\bar{x})\)

2.3.3 Mathematical programs with complementarity constraints

Finally, we would like to mention so-called mathematical programs with complementarity constraints which are optimization problems of the following type:

$$\begin{aligned} \begin{aligned} f(x)&\,\rightarrow \,\min&\\ g_i(x)&\,\le \,0&&i\in \mathcal {M}&\\ h_j(x)&\,=\,0&&j\in \mathcal {P}&\\ 0\,\le \,\bar{G}_l(x)\,\perp \,\bar{H}_l(x)&\,\ge \,0&&l\in \mathcal {Q}.&\end{aligned} \end{aligned}$$
(MPCC)

Therein, the last q constraints, which are stated w.r.t. continuously differentiable functions \(\bar{G},\bar{H}:\mathbb {R}^n\rightarrow \mathbb {R}^q\) whose components will be addressed by \(\bar{G}_1,\ldots ,\bar{G}_q:\mathbb {R}^n\rightarrow \mathbb {R}\) and \(\bar{H}_1,\ldots ,\bar{H}_q:\mathbb {R}^n\rightarrow \mathbb {R}\), respectively, induce a complementarity regime since they demand that, for each \(l\in \mathcal {Q}\), \(\bar{G}_l(x)\) and \(\bar{H}_l(x)\) are nonnegative while at least one of these numbers needs to vanish for each feasible point \(x\in \mathbb {R}^n\) of (MPCC). During the last decades, complementarity-constrained optimization has been studied intensively from a theoretical and numerical point of view due to numerous underlying applications, see e.g. [28, 32].

We use \(X_{\mathrm{CC}}\subset \mathbb {R}^n\) to denote the feasible set of (MPCC). Let us fix a feasible point \(\bar{x}\in X_{\mathrm{CC}}\). Then, the index sets

$$\begin{aligned} I^{0+}_{\mathrm{CC}}(\bar{x})&:= \{l\in \mathcal {Q}\,|\,\bar{G}_l(\bar{x})=0\,\wedge \,\bar{H}_l(\bar{x})>0\},\\ I^{+0}_{\mathrm{CC}}(\bar{x})&:= \{l\in \mathcal {Q}\,|\,\bar{G}_l(\bar{x})>0\,\wedge \,\bar{H}_l(\bar{x})=0\},\\ I^{00}_{\mathrm{CC}}(\bar{x})&:= \{l\in \mathcal {Q}\,|\,\bar{G}_l(\bar{x})=0\,\wedge \,\bar{H}_l(\bar{x})=0\} \end{aligned}$$

provide a disjoint partition of \(\mathcal {Q}\). Note that we used the index CC in order to distinguish the above index sets from their respective counterparts which are related to (MPOC). Next, we recall some stationarity notions from complementarity-constrained programming, see e.g. [40]. Again, we use the index CC in order to emphasize that the stationarity notions of interest are related to (MPCC).

Definition 2.4

Let \(\bar{x}\in X_{\mathrm{CC}}\) be a feasible point of (MPCC). Then, \(\bar{x}\) is said to be

  1.

    weakly stationary (\(\hbox {W}_{\mathrm{CC}}\)-stationary) for (MPCC) whenever there exist multipliers \(\lambda _i\ge 0\) (\(i\in I^g(\bar{x})\)), \(\rho _j\) (\(j\in \mathcal {P}\)), \(\bar{\mu }_l\) (\(l\in I^{0+}_{\mathrm{CC}}(\bar{x})\cup I^{00}_{\mathrm{CC}}(\bar{x})\)), and \(\bar{\nu }_l\) (\(l\in I^{+0}_{\mathrm{CC}}(\bar{x})\cup I^{00}_{\mathrm{CC}}(\bar{x})\)) which satisfy

    $$\begin{aligned} 0&=\nabla f(\bar{x}) +\sum \limits _{i\in I^g(\bar{x})}\lambda _i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\rho _j\nabla h_j(\bar{x})\\&\quad -\sum \limits _{l\in I^{0+}_{\mathrm{CC}}(\bar{x})\cup I^{00}_{\mathrm{CC}}(\bar{x})} \bar{\mu }_l\nabla \bar{G}_l(\bar{x}) -\sum \limits _{l\in I^{+0}_{\mathrm{CC}}(\bar{x})\cup I^{00}_{\mathrm{CC}}(\bar{x})} \bar{\nu }_l\nabla \bar{H}_l(\bar{x}), \end{aligned}$$
  2.

    Clarke-stationary (\(\hbox {C}_{\mathrm{CC}}\)-stationary) for (MPCC) whenever it is \(\hbox {W}_{\mathrm{CC}}\)-stationary while the associated multipliers additionally satisfy

    $$\begin{aligned} \forall l\in I^{00}_{\mathrm{CC}}(\bar{x}):\quad \bar{\mu }_l\bar{\nu }_l\ge 0, \end{aligned}$$
  3.

    Mordukhovich-stationary (\(\hbox {M}_{\mathrm{CC}}\)-stationary) for (MPCC) whenever it is \(\hbox {W}_{\mathrm{CC}}\)-stationary while the associated multipliers additionally satisfy

    $$\begin{aligned} \forall l\in I^{00}_{\mathrm{CC}}(\bar{x}):\quad \bar{\mu }_l\bar{\nu }_l=0\,\vee \,(\bar{\mu }_l>0\,\wedge \,\bar{\nu }_l>0), \end{aligned}$$
  4.

    strongly stationary (\(\hbox {S}_{\mathrm{CC}}\)-stationary) for (MPCC) whenever it is \(\hbox {W}_{\mathrm{CC}}\)-stationary while the associated multipliers additionally satisfy

    $$\begin{aligned} \forall l\in I^{00}_{\mathrm{CC}}(\bar{x}):\quad \bar{\mu }_l\ge 0\,\wedge \,\bar{\nu }_l\ge 0. \end{aligned}$$

By definition, we have

$$\begin{aligned} \hbox {S}_{\mathrm{CC}}\text {-stationary}\quad&\Longrightarrow \quad \hbox {M}_{\mathrm{CC}}\text {-stationary}\\&\Longrightarrow \quad \hbox {C}_{\mathrm{CC}}\text {-stationary}\quad \Longrightarrow \quad \hbox {W}_{\mathrm{CC}}\text {-stationary}. \end{aligned}$$

Furthermore, one can check that the \(\hbox {S}_{\mathrm{CC}}\)-stationarity conditions of (MPCC) are equivalent to the KKT conditions of the equivalent NLP-model associated with (MPCC) where the complementarity constraints are restated as

$$\begin{aligned} \begin{aligned} \bar{G}_l(x)&\,\ge \, 0&&l\in \mathcal {Q}&\\ \bar{H}_l(x)&\,\ge \, 0&&l\in \mathcal {Q}&\\ \bar{G}(x)\cdot \bar{H}(x)&\,=\,0.&\end{aligned} \end{aligned}$$

All introduced stationarity notions are visualized in Fig. 3.

Fig. 3: Geometric visualizations of \(\hbox {W}_{\mathrm{CC}}\)-, \(\hbox {C}_{\mathrm{CC}}\)-, \(\hbox {M}_{\mathrm{CC}}\)-, and \(\hbox {S}_{\mathrm{CC}}\)-stationarity for the program (MPCC) w.r.t. an index \(l\in I^{00}_{\mathrm{CC}}(\bar{x})\)

3 Reformulation of or-constraints using NCP-functions

A continuous function \(\varphi :\mathbb {R}^2\rightarrow \mathbb {R}\) which satisfies

$$\begin{aligned} \forall (a,b)\in \mathbb {R}^2:\quad \varphi (a,b)=0\,\Longleftrightarrow \,a,b\ge 0\,\wedge \,ab=0 \end{aligned}$$

is referred to as NCP-function. By definition, NCP-functions can be used to reformulate complementarity systems as (possibly nonsmooth) equalities which is beneficial since the transformed system can be tackled numerically with the aid of (semismooth) Newton or SQP methods, see e.g. [7, 8, 11, 12].

Clearly, the zero level set of an NCP-function precisely equals the complementarity set

$$\begin{aligned} C:=\{(a,b)\in \mathbb {R}^2\,|\,a,b\ge 0\,\wedge \,ab=0\}. \end{aligned}$$
(5)

Defining the sets

$$\begin{aligned} A:=\{(a,b)\in \mathbb {R}^2\,|\,a>0\,\wedge \,b>0\},\qquad B:=\{(a,b)\in \mathbb {R}^2\,|\,a<0\,\vee \,b<0\}, \end{aligned}$$

Bolzano’s theorem yields that each NCP-function \(\varphi :\mathbb {R}^2\rightarrow \mathbb {R}\) has precisely one of the following properties:

  • NCP1: \(\forall (a,b)\in A\cup B:\,\varphi (a,b)>0\),

  • NCP2: \(\forall (a,b)\in A\cup B:\,\varphi (a,b)<0\),

  • NCP3: \(\forall (a,b)\in A:\,\varphi (a,b)<0\,\wedge \,\forall (a,b)\in B:\,\varphi (a,b)>0\),

  • NCP4: \(\forall (a,b)\in A:\,\varphi (a,b)>0\,\wedge \,\forall (a,b)\in B:\,\varphi (a,b)<0\).

This has already been observed in [17, Corollary 1]. Noting that \(B\cup C=O\) holds for the set O defined in (1), any NCP-function of type NCP4 possesses the zero sublevel set O. Particularly, for any such NCP-function \(\varphi :\mathbb {R}^2\rightarrow \mathbb {R}\), we have the relation

$$\begin{aligned} \forall (a,b)\in \mathbb {R}^2:\quad \varphi (a,b)\le 0\,\Longleftrightarrow \,a\le 0\,\vee \,b\le 0\,\Longleftrightarrow \,(a,b)\in O. \end{aligned}$$

This observation yields the following definition.

Definition 3.1

An NCP-function \(\varphi :\mathbb {R}^2\rightarrow \mathbb {R}\) is said to be or-compatible if it possesses property NCP4.

Clearly, if \(\varphi :\mathbb {R}^2\rightarrow \mathbb {R}\) is an NCP-function possessing property NCP3, then \(-\varphi\) is an or-compatible NCP-function. Below, we list three popular NCP-functions which are or-compatible while noting that there exist many more examples:

  • the minimum function \(\varphi _{\mathrm{min}}:\mathbb {R}^2\rightarrow \mathbb {R}\) given by

    $$\begin{aligned} \forall (a,b)\in \mathbb {R}^2:\quad \varphi _{\mathrm{min}}(a,b):=\min \{a;b\}, \end{aligned}$$
  • the Fischer–Burmeister-type function \(\varphi _{\mathrm{FB}}:\mathbb {R}^2\rightarrow \mathbb {R}\) defined via

    $$\begin{aligned} \forall (a,b)\in \mathbb {R}^2:\quad \varphi _{\mathrm{FB}}(a,b):=a+b-\sqrt{a^2+b^2} \end{aligned}$$

    which dates back to [12], and

  • the Kanzow–Schwartz function \(\varphi _{\mathrm{KS}}:\mathbb {R}^2\rightarrow \mathbb {R}\) from [25] which is defined by

    $$\begin{aligned} \forall (a,b)\in \mathbb {R}^2:\quad \varphi _{\mathrm{KS}}(a,b):= {\left\{ \begin{array}{ll} ab &{}\text {if }a+b\ge 0,\\ -\tfrac{1}{2}(a^2+b^2) &{}\text {if }a+b<0. \end{array}\right. } \end{aligned}$$

Obviously, \(\varphi _{\mathrm{min}}\) is nonsmooth at all points from \(\{(a,a)\in \mathbb {R}^2\,|\,a\in \mathbb {R}\}\) while \(\varphi _{\mathrm{FB}}\) is nonsmooth only at the origin. By construction, the function \(\varphi _{\mathrm{KS}}\) is continuously differentiable, see [25, Lemma 3.1], which makes it rather attractive in comparison to other NCP-functions.
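For illustration purposes (this snippet is not part of the original presentation), the three or-compatible NCP-functions listed above can be implemented and tested against the characterization of the set O from (1) as follows.

```python
# Sketch: the three or-compatible NCP-functions from the list above.  A point
# (a, b) belongs to the set O from (1) if and only if each of them is <= 0 there.
import numpy as np

def phi_min(a, b):
    return min(a, b)

def phi_fb(a, b):  # Fischer-Burmeister-type function
    return a + b - np.sqrt(a * a + b * b)

def phi_ks(a, b):  # Kanzow-Schwartz function, continuously differentiable
    return a * b if a + b >= 0 else -0.5 * (a * a + b * b)

# Sample points: the printed booleans agree with the membership test (a, b) in O.
for (a, b) in [(-1.0, 2.0), (2.0, -1.0), (0.0, 3.0), (1.0, 1.0), (-1.0, -1.0)]:
    in_O = (a <= 0) or (b <= 0)
    print((a, b), in_O, [phi(a, b) <= 0 for phi in (phi_min, phi_fb, phi_ks)])
```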

For an arbitrary or-compatible NCP-function \(\varphi :\mathbb {R}^2\rightarrow \mathbb {R}\), we now consider the surrogate

$$\begin{aligned}f(x)&\,\rightarrow\,\min&&&\\g_i(x)&\,\leq\,0&\qquad&i\in\mathcal M&\\h_j(x)&\,=\,0&\qquad&j\in\mathcal P&\\ \varphi(G_l(x),H_l(x))&\,\leq\,0&\qquad&l\in\mathcal Q&\end{aligned}\quad {\text{MPOC}(\varphi)}$$

which is equivalent to (MPOC). Let \(\bar{x}\in \mathbb {R}^n\) be an arbitrary feasible point of (MPOC) and, thus, of (\(\hbox {MPOC}(\varphi )\)). We set

$$\begin{aligned} I^\varphi (\bar{x}):=\{l\in \mathcal {Q}\,|\,\varphi (G_l(\bar{x}),H_l(\bar{x}))=0\}. \end{aligned}$$

Since \(\varphi\) is or-compatible, we have \(I^\varphi (\bar{x})=\mathcal {I}(\bar{x})\) for the set \(\mathcal {I}(\bar{x})\) defined in (2). We now study the relationship between programs (MPOC) and (\(\hbox {MPOC}(\varphi )\)) w.r.t. stationary points.

Noting that (\(\hbox {MPOC}(\varphi _{\mathrm{KS}})\)) is a smooth program, we first investigate this particular model.

Proposition 3.1

A feasible point\(\bar{x}\in X\)of (MPOC) is S-stationary if and only if it is a KKT point of (\(\hbox {MPOC}(\varphi _{\mathrm{KS}})\)).

Proof

\([\Longrightarrow ]\) Let \(\bar{x}\) be an S-stationary point of (MPOC). Then, we find multipliers \(\lambda _i\ge 0\) (\(i\in I^g(\bar{x})\)), \(\rho _j\) (\(j\in \mathcal {P}\)), \(\mu _l\ge 0\) (\(l\in I^{0+}(\bar{x})\)), and \(\nu _l\ge 0\) (\(l\in I^{+0}(\bar{x})\)) such that

$$\begin{aligned} 0&=\nabla f(\bar{x}) +\sum \limits _{i\in I^g(\bar{x})}\lambda _i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\rho _j\nabla h_j(\bar{x})\\&\quad +\sum \limits _{l\in I^{0+}(\bar{x})}\mu _l\nabla G_l(\bar{x}) +\sum \limits _{l\in I^{+0}(\bar{x})}\nu _l\nabla H_l(\bar{x}) \end{aligned}$$

holds. Next, we set

$$\begin{aligned} \forall l\in I^{\varphi _{\mathrm{KS}}}(\bar{x}):\quad \xi_l:= \begin{cases} \mu_l/H_l(\bar x)&\text{if }l\in I^{0+}(\bar x),\\ \nu_l/G_l(\bar x)&\text{if }l\in I^{+0}(\bar x),\\ 0&\text{if }l\in I^{00}(\bar x), \end{cases} \end{aligned}$$

which allows us to rewrite the above equation as

$$\begin{aligned} 0&=\nabla f(\bar{x}) +\sum \limits _{i\in I^g(\bar{x})}\lambda _i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\rho _j\nabla h_j(\bar{x})\\&\quad +\sum \limits _{l\in I^{\varphi _{\mathrm{KS}}}(\bar{x})} \xi _l\bigl (H_l(\bar{x})\nabla G_l(\bar{x})+G_l(\bar{x})\nabla H_l(\bar{x})\bigr ). \end{aligned}$$

By the definition of \(\varphi _{\mathrm{KS}}\) and since \(\xi _l\ge 0\) holds for all \(l\in I^{\varphi _{\mathrm{KS}}}(\bar{x})\), \(\bar{x}\) is a KKT point of (\(\hbox {MPOC}(\varphi _{\mathrm{KS}})\)).

\([\Longleftarrow ]\) If \(\bar{x}\) is a KKT point of (\(\hbox {MPOC}(\varphi _{\mathrm{KS}})\)), we find multipliers \(\bar{\lambda }_i\ge 0\) (\(i\in I^g(\bar{x})\)), \(\bar{\rho }_j\) (\(j\in \mathcal {P}\)), and \(\bar{\xi }_l\ge 0\) (\(l\in I^{\varphi _{\mathrm{KS}}}(\bar{x})\)) such that

$$\begin{aligned} 0&=\nabla f(\bar{x}) +\sum \limits _{i\in I^g(\bar{x})}\bar{\lambda }_i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\bar{\rho }_j\nabla h_j(\bar{x})\\&\quad +\sum \limits _{l\in I^{\varphi _{\mathrm{KS}}}(\bar{x})} \bar{\xi }_l\bigl (H_l(\bar{x})\nabla G_l(\bar{x})+G_l(\bar{x})\nabla H_l(\bar{x})\bigr ) \end{aligned}$$

is valid. Now, we define \(\bar{\mu }_l:=\bar{\xi }_l H_l(\bar{x})\) (\(l\in I^{0+}(\bar{x})\cup I^{00}(\bar{x})\)) as well as \(\bar{\nu }_l:=\bar{\xi }_l G_l(\bar{x})\) (\(l\in I^{+0}(\bar{x})\cup I^{00}(\bar{x})\)) in order to see that \(\bar{x}\) is S-stationary for (MPOC). \(\square\)

The above result justifies solving the smooth standard nonlinear problem (\(\hbox {MPOC}(\varphi _{\mathrm{KS}})\)) instead of the disjunctive program (MPOC) in order to find S-stationary points of the latter. However, it needs to be noted that (\(\hbox {MPOC}(\varphi _{\mathrm{KS}})\)) is still a challenging problem due to the combinatorial structure of its feasible set. Additionally, if \(I^{00}(\bar{x})\ne \varnothing\) holds true for some feasible point \(\bar{x}\in \mathbb {R}^n\) of (MPOC), then for each \(l\in I^{00}(\bar{x})\), the gradient of the map \(x\mapsto \varphi _{\mathrm{KS}}(G_l(x),H_l(x))\) vanishes at \(\bar{x}\). This particularly means that popular constraint qualifications like MFCQ or LICQ do not hold at \(\bar{x}\) for (\(\hbox {MPOC}(\varphi _{\mathrm{KS}})\)). However, it is possible to obtain the following result.

Lemma 3.1

Let\(\bar{x}\in \mathbb {R}^n\)be a feasible point of (MPOC) where MPOC-LICQ is valid. Then, GCQ holds for (\(\hbox {MPOC}(\varphi _{\mathrm{KS}})\)) at\(\bar{x}\).

Proof

Let \(\mathcal {L}_X^{\mathrm{KS}}(\bar{x})\) be the linearization cone associated with program (\(\hbox {MPOC}(\varphi _{\mathrm{KS}})\)) at \(\bar{x}\). By standard arguments, \(\mathcal {T}_X(\bar{x})\subset \mathcal {L}_X^{\mathrm{KS}}(\bar{x})\) holds true, and this yields the inclusion \(\mathcal {L}_X^{\mathrm{KS}}(\bar{x})^\circ \subset \mathcal {T}_X(\bar{x})^\circ\). In order to verify that GCQ holds for (\(\hbox {MPOC}(\varphi _{\mathrm{KS}})\)) at \(\bar{x}\), we only need to show the opposite inclusion.

Let \(I\subset I^{00}(\bar{x})\) be arbitrarily chosen. We consider the program

$$\begin{aligned}f(x)&\,\to\,\min&&&\\g_i(x)&\,\leq\,0&\qquad&i\in\mathcal M\\h_j(x)&\,=\,0&\qquad&j\in\mathcal P\\G_l(x)&\,\leq\,0&\qquad&l\in I^-_G(\bar x)\cup I^{0+}(\bar x)\cup I\\H_l(x)&\,\leq\,0&\qquad&l\in I^-_H(\bar x)\cup I^{+0}(\bar x)\cup(I^{00}(\bar x)\setminus I)\end{aligned}\quad{(\text{MPOC}(\bar x,I))}$$

whose feasible set will be denoted by \(X(\bar{x},I)\) in the subsequent considerations. The index sets \(I^-_G(\bar{x})\) and \(I^-_H(\bar{x})\) have been defined in (4). Locally around \(\bar{x}\), the family \(\{X(\bar{x},I)\}_{I\subset I^{00}(\bar{x})}\) provides a decomposition of X which is why the relation

$$\begin{aligned} \mathcal {T}_X(\bar{x})=\bigcup \limits _{I\subset I^{00}(\bar{x})}\mathcal {T}_{X(\bar{x},I)}(\bar{x}) \end{aligned}$$

holds, cf. [13, Lemma 3.1] or [30, Lemma 5.1] for related results associated with MPCCs or MPSCs, respectively. Noting that the validity of MPOC-LICQ implies that LICQ holds for (\(\hbox {MPOC}(\bar{x},I)\)) at \(\bar{x}\) for each \(I\subset I^{00}(\bar{x})\), we can infer

$$\begin{aligned} \mathcal {T}_X(\bar{x})=\bigcup \limits _{I\subset I^{00}(\bar{x})}\mathcal {L}_{X(\bar{x},I)}(\bar{x}), \end{aligned}$$

and polarizing this formula shows

$$\begin{aligned} \mathcal {T}_X(\bar{x})^\circ =\bigcap \limits _{I\subset I^{00}(\bar{x})}\mathcal {L}_{X(\bar{x},I)}(\bar{x})^\circ \end{aligned}$$

where \(\mathcal {L}_{X(\bar{x},I)}(\bar{x})\) denotes the linearization cone of (\(\hbox {MPOC}(\bar{x},I)\)) at \(\bar{x}\).

Pick \(\eta \in \mathcal {T}_X(\bar{x})^\circ\) arbitrarily. The above considerations show

$$\begin{aligned} \eta \in \mathcal {L}_{X(\bar{x},\varnothing )}(\bar{x})^\circ \cap \mathcal {L}_{X(\bar{x},I^{00}(\bar{x}))}(\bar{x})^\circ . \end{aligned}$$

Hence, we find multipliers \(\lambda ^s_i\ge 0\) (\(s=1,2\) and \(i\in I^g(\bar{x})\)), \(\rho ^s_j\) (\(s=1,2\) and \(j\in \mathcal {P}\)), \(\mu ^s_l\ge 0\) (\(s=1,2\) and \(l\in I^{0+}(\bar{x})\)), \(\nu ^s_l\ge 0\) (\(s=1,2\) and \(l\in I^{+0}(\bar{x})\)), \(\mu ^2_l\ge 0\) (\(l\in I^{00}(\bar{x})\)), and \(\nu ^1_l\ge 0\) (\(l\in I^{00}(\bar{x})\)) such that \(\eta\) possesses the following representations

$$\begin{aligned} \eta&= \sum \limits _{i\in I^g(\bar{x})}\lambda ^1_i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\rho ^1_j\nabla h_j(\bar{x})\\&\quad +\sum \limits _{l\in I^{0+}(\bar{x})}\mu ^1_l\nabla G_l(\bar{x}) +\sum \limits _{l\in I^{+0}(\bar{x})\cup I^{00}(\bar{x})}\nu ^1_l\nabla H_l(\bar{x})\\&= \sum \limits _{i\in I^g(\bar{x})}\lambda ^2_i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\rho ^2_j\nabla h_j(\bar{x})\\&\quad +\sum \limits _{l\in I^{0+}(\bar{x})\cup I^{00}(\bar{x})}\mu ^2_l\nabla G_l(\bar{x}) +\sum \limits _{l\in I^{+0}(\bar{x})}\nu ^2_l\nabla H_l(\bar{x}). \end{aligned}$$

This particularly shows

$$\begin{aligned} 0&= \sum \limits _{i\in I^g(\bar{x})}(\lambda ^1_i-\lambda ^2_i)\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}(\rho ^1_j-\rho ^2_j)\nabla h_j(\bar{x})\\&\quad +\sum \limits _{l\in I^{0+}(\bar{x})}(\mu ^1_l-\mu ^2_l)\nabla G_l(\bar{x}) +\sum \limits _{l\in I^{+0}(\bar{x})}(\nu ^1_l-\nu ^2_l)\nabla H_l(\bar{x})\\&\quad -\sum \limits _{l\in I^{00}(\bar{x})}\mu ^2_l\nabla G_l(\bar{x}) +\sum \limits _{l\in I^{00}(\bar{x})}\nu ^1_l\nabla H_l(\bar{x}). \end{aligned}$$

Exploiting the validity of MPOC-LICQ, this leads to \(\mu ^2_l=\nu ^1_l=0\) for all \(l\in I^{00}(\bar{x})\), i.e.

$$\begin{aligned} \eta&= \sum \limits _{i\in I^g(\bar{x})}\lambda ^1_i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\rho ^1_j\nabla h_j(\bar{x})\\&\quad +\sum \limits _{l\in I^{0+}(\bar{x})}\mu ^1_l\nabla G_l(\bar{x}) +\sum \limits _{l\in I^{+0}(\bar{x})}\nu ^1_l\nabla H_l(\bar{x}) \end{aligned}$$

holds true. A simple calculation shows that this means \(\eta \in \mathcal {L}^{\mathrm{KS}}_X(\bar{x})^\circ\). \(\square\)

In contrast to (\(\hbox {MPOC}(\varphi _{\mathrm{KS}})\)), the programs (\(\hbox {MPOC}(\varphi _{\mathrm{min}})\)) and (\(\hbox {MPOC}(\varphi _{\mathrm{FB}})\)) are nonsmooth. However, using suitable subdifferential constructions, it is possible to state KKT-type systems associated with these optimization problems as well. Noting that the active set \(I^\varphi (\bar{x})\) is directly related to the complementarity set C defined in (5), only subdifferential information of \(\varphi\) on C is relevant for the characterization of the associated KKT systems. Using Clarke’s constructions, see [4], we obtain

$$\begin{aligned} \partial ^{\mathrm{C}}\varphi _{\mathrm{min}}(a,b)&= {\left\{ \begin{array}{ll} \{\mathtt {e}^2_1\} &{}\text {if } a=0\,\wedge \, b>0,\\ \{\mathtt {e}^2_2\} &{}\text {if }a>0\,\wedge \,b=0,\\ {\text {conv}}\left\{ \mathtt {e}^2_1,\mathtt {e}^2_2\right\} &{}\text {if }a=b=0, \end{array}\right. }\\ \partial ^{\mathrm{C}}\varphi _{\mathrm{FB}}(a,b)&= {\left\{ \begin{array}{ll} \{\mathtt {e}^2_1\} &{}\text {if } a=0\,\wedge \, b>0,\\ \{\mathtt {e}^2_2\} &{}\text {if }a>0\,\wedge \,b=0,\\ \mathbb {B}^1(\mathtt {e}^2) &{}\text {if }a=b=0 \end{array}\right. } \end{aligned}$$

for all \((a,b)\in C\). Sharper results can be obtained with Mordukhovich’s subdifferential, see [31], which is computed as

$$\begin{aligned} \partial ^{\mathrm{M}}\varphi _{\mathrm{min}}(a,b)&= {\left\{ \begin{array}{ll} \{\mathtt {e}^2_1\} &{}\text {if } a=0\,\wedge \, b>0,\\ \{\mathtt {e}^2_2\} &{}\text {if }a>0\,\wedge \,b=0,\\ \left\{ \mathtt {e}^2_1,\mathtt {e}^2_2\right\} &{}\text {if }a=b=0, \end{array}\right. }\\ \partial ^{\mathrm{M}}\varphi _{\mathrm{FB}}(a,b)&= {\left\{ \begin{array}{ll} \{\mathtt {e}^2_1\} &{}\text {if } a=0\,\wedge \, b>0,\\ \{\mathtt {e}^2_2\} &{}\text {if }a>0\,\wedge \,b=0,\\ \mathbb {S}^1(\mathtt {e}^2) &{}\text {if }a=b=0. \end{array}\right. } \end{aligned}$$

Using these formulas and suitable chain rules for the underlying subdifferentials, respective KKT-type systems associated with (\(\hbox {MPOC}(\varphi _{\mathrm{min}})\)) and (\(\hbox {MPOC}(\varphi _{\mathrm{FB}})\)) can be derived, see [38, Theorem 5.6.2] and [31, Theorem 5.21], respectively. For simplicity, we refer to these first-order systems as KKT systems again and specify the underlying subdifferential construction.

The upcoming result, which addresses (\(\hbox {MPOC}(\varphi _{\mathrm{min}})\)), can be validated by a strategy similar to the one used for the derivation of Proposition 3.1 after some obvious modifications. That is why its proof is omitted here.

Proposition 3.2

  1.

    A feasible point\(\bar{x}\in \mathbb {R}^n\)of (MPOC) is W-stationary if and only if it is a KKT point of (\(\hbox {MPOC}(\varphi _{\mathrm{min}})\)) w.r.t. Clarke’s subdifferential.

  2.

    A feasible point\(\bar{x}\in \mathbb {R}^n\)of (MPOC) is M-stationary if and only if it is a KKT point of (\(\hbox {MPOC}(\varphi _{\mathrm{min}})\)) w.r.t. Mordukhovich’s subdifferential.

Finally, we consider the KKT system of (\(\hbox {MPOC}(\varphi _{\mathrm{FB}})\)).

Proposition 3.3

For a feasible point\(\bar{x}\in \mathbb {R}^n\)of (MPOC), the following statements are equivalent:

  (a)

    \(\bar{x}\)is W-stationary,

  (b)

    \(\bar{x}\)is a KKT point of (\(\hbox {MPOC}(\varphi _{\mathrm{FB}})\)) w.r.t. Clarke’s subdifferential, and

  (c)

    \(\bar{x}\)is a KKT point of (\(\hbox {MPOC}(\varphi _{\mathrm{FB}})\)) w.r.t. Mordukhovich’s subdifferential.

Proof

The implication (c)\(\Longrightarrow\)(b) is trivial due to \(\partial ^{\mathrm{M}}\varphi _{\mathrm{FB}}(a,b)\subset \partial ^{\mathrm{C}}\varphi _{\mathrm{FB}}(a,b)\) for all \((a,b)\in \mathbb {R}^2\). Furthermore, (b)\(\Longrightarrow\)(a) follows easily from the fact that \(\mathbb {B}^1(\mathtt {e}^2)\subset \mathbb {R}^2_+\) holds. It remains to show (a)\(\Longrightarrow\)(c).

Thus, let \(\bar{x}\) be a W-stationary point of (MPOC). Then, we find multipliers \(\lambda _i\ge 0\) (\(i\in I^g(\bar{x})\)), \(\rho _j\) (\(j\in \mathcal {P}\)), \(\mu _l\ge 0\) (\(l\in I^{0+}(\bar{x})\cup I^{00}(\bar{x})\)), and \(\nu _l\ge 0\) (\(l\in I^{+0}(\bar{x})\cup I^{00}(\bar{x})\)) such that (3) holds. Let us assume \(I^{00}(\bar{x})\ne \varnothing\) (otherwise, the proof is straightforward). Pick an index \(l\in I^{00}(\bar{x})\) and define \(\xi _l:=\mu _l+\nu _l+\sqrt{2\mu _l\nu _l}\ge 0\). In case \(\xi _l=0\), we set \(\alpha _l=\beta _l:=1-\tfrac{\sqrt{2}}{2}\). Otherwise, we define \(\alpha _l:=\mu _l/\xi _l\) and \(\beta _l:=\nu _l/\xi _l\). By construction, the relation \((\alpha _l,\beta _l)\in \mathbb {S}^1(\mathtt {e}^2)=\partial ^{\mathrm{M}}\varphi _{\mathrm{FB}}(G_l(\bar{x}),H_l(\bar{x}))\) follows. Setting \(\xi _l:=\mu _l\), \(\alpha _l:=1\), and \(\beta _l:=0\) for all \(l\in I^{0+}(\bar{x})\) as well as \(\xi _l:=\nu _l\), \(\alpha _l:=0\), and \(\beta _l:=1\) for all \(l\in I^{+0}(\bar{x})\), we have

$$\begin{aligned} 0&=\nabla f(\bar{x}) +\sum \limits _{i\in I^g(\bar{x})}\lambda _i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\rho _j\nabla h_j(\bar{x})\\&\quad +\sum \limits _{l\in I^{\varphi _{\mathrm{FB}}}(\bar{x})} \xi _l\bigl (\alpha _l\nabla G_l(\bar{x})+\beta _l\nabla H_l(\bar{x})\bigr )\\&\qquad \forall l\in I^{\varphi _{\mathrm{FB}}}(\bar{x}):\,\xi _l\ge 0,\\&\qquad \forall l\in I^{\varphi _{\mathrm{FB}}}(\bar{x}):\,(\alpha _l,\beta _l)\in \partial ^{\mathrm{M}}\varphi _{\mathrm{FB}}(G_l(\bar{x}),H_l(\bar{x})), \end{aligned}$$

i.e. \(\bar{x}\) is a KKT point of (\(\hbox {MPOC}(\varphi _{\mathrm{FB}})\)) w.r.t. Mordukhovich’s subdifferential. \(\square\)

Let us briefly draw the reader’s attention to the fact that the use of Mordukhovich’s subdifferential construction w.r.t. the function \(\varphi\) in the KKT system associated with (\(\hbox {MPOC}(\varphi )\)) does not automatically lead to the identification of M-stationary points of (MPOC), as Proposition 3.3 demonstrates.

The above Propositions 3.1 to 3.3 suggest solving (\(\hbox {MPOC}(\varphi )\)) instead of (MPOC) in order to identify stationary points of the latter. Noting that at least (\(\hbox {MPOC}(\varphi _{\mathrm{KS}})\)) is a smooth problem, this can be done using standard solvers from nonlinear programming. Suitable methods from nonsmooth optimization can be used to tackle (\(\hbox {MPOC}(\varphi _{\mathrm{min}})\)) and (\(\hbox {MPOC}(\varphi _{\mathrm{FB}})\)) numerically.
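As a minimal illustration of the smooth case, the following sketch feeds the \(\varphi _{\mathrm{KS}}\)-reformulation of the simple or-constrained problem \(\min \,(x_1-1)^2\) s.t. \(x_1\le 0\,\vee \,x_2\le 0\) (i.e. \(G(x)=x_1\), \(H(x)=x_2\)) to a standard NLP solver. The choice of SciPy's SLSQP and the starting points are assumptions of this illustration and do not reflect the experimental setup of Sect. 6.

```python
# Minimal sketch: solve MPOC(phi_KS) for the toy problem
#     min (x1 - 1)^2   s.t.   x1 <= 0  or  x2 <= 0
# with a standard NLP solver (here: SLSQP from SciPy).
import numpy as np
from scipy.optimize import minimize

def phi_ks(a, b):
    return a * b if a + b >= 0 else -0.5 * (a * a + b * b)

f = lambda x: (x[0] - 1.0) ** 2
# SLSQP expects inequality constraints of the form c(x) >= 0, hence the sign flip.
cons = [{"type": "ineq", "fun": lambda x: -phi_ks(x[0], x[1])}]

for x0 in ([2.0, -3.0], [-2.0, 3.0]):
    res = minimize(f, x0, constraints=cons, method="SLSQP")
    print(np.round(res.x, 4), round(float(res.fun), 6))
# One would expect the first starting point to lead toward a global minimizer
# near (1, -3) and the second toward the local solution (0, 3); no general claim
# about the behavior of SLSQP is intended here.
```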

4 Transformation into other disjunctive programs

4.1 Relations to switching-constrained programming

Let us consider the switching-constrained optimization problem

$$\begin{aligned} \begin{aligned} f(x)&\,\rightarrow \,\min \limits _{x,y,z}&\\ g_i(x)&\,\le \,0&&i\in \mathcal {M}&\\ h_j(x)&\,=\,0&&j\in \mathcal {P}&\\ y_l,z_l&\,\le \,0&&l\in \mathcal {Q}&\\ (G_l(x)-y_l)(H_l(x)-z_l)&\,=\,0&&l\in \mathcal {Q}&\end{aligned} \end{aligned}$$
(SC-MPOC)

associated with (MPOC). One can easily check that for each feasible point \(\bar{x}\in X\) of (MPOC), we find \(\bar{y},\bar{z}\in \mathbb {R}^q\) such that \((\bar{x},\bar{y},\bar{z})\) is feasible to (SC-MPOC). On the other hand, if \((\tilde{x},\tilde{y},\tilde{z})\in \mathbb {R}^n\times \mathbb {R}^q\times \mathbb {R}^q\) is feasible to (SC-MPOC), then \(\tilde{x}\) is feasible to (MPOC). This observation has been used in [30, Section 7.1] in order to show that (MPOC) and (SC-MPOC) are essentially equivalent w.r.t. global minimizers while the local minimizers of (MPOC) can be found among the local minimizers of (SC-MPOC). Moreover, it has been shown that whenever \((\tilde{x},\tilde{y},\tilde{z})\) is a local minimizer of (SC-MPOC) where \(I^{0-}(\tilde{x})\cup I^{-0}(\tilde{x})\cup I^{00}(\tilde{x})=\varnothing\) holds, then \(\tilde{x}\) is a local minimizer of (MPOC). These results justify considering the switching model (SC-MPOC) instead of (MPOC). However, one has to notice that this transformation comes at the price of 2q additional slack variables and potential artificial local minimizers.
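One possible way to construct such slacks (stated here only as an illustration; any \(\bar{y},\bar{z}\le 0\) satisfying the switching constraints would do) is sketched below: whenever \(G_l(\bar{x})\le 0\), the first factor of the switching constraint is annihilated, otherwise feasibility of \(\bar{x}\) guarantees \(H_l(\bar{x})\le 0\) and the second factor is annihilated.

```python
# Sketch of one possible slack construction lifting a feasible point of (MPOC)
# to a feasible point of (SC-MPOC).  This particular choice is an assumption of
# the illustration; any y, z <= 0 with (G_l(x) - y_l)(H_l(x) - z_l) = 0 works.
import numpy as np

def lift_to_sc_mpoc(Gx, Hx):
    Gx, Hx = np.asarray(Gx, dtype=float), np.asarray(Hx, dtype=float)
    y = np.where(Gx <= 0, Gx, np.minimum(Gx, 0.0))   # G_l - y_l = 0 whenever G_l <= 0
    z = np.where(Gx <= 0, np.minimum(Hx, 0.0), Hx)   # otherwise H_l - z_l = 0
    return y, z

# Example with q = 2 and (G(x), H(x)) = ((-1, 2), (3, -4)) at a feasible point x.
Gx, Hx = np.array([-1.0, 2.0]), np.array([3.0, -4.0])
y, z = lift_to_sc_mpoc(Gx, Hx)
print(y, z)                      # y = [-1.  0.], z = [ 0. -4.]
print((Gx - y) * (Hx - z))       # switching constraints: [0. 0.]
```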

Let us compare (MPOC) and (SC-MPOC) w.r.t. stationary points since local minimizers of (MPOC) correspond to local minimizers of (SC-MPOC) which satisfy certain stationarity conditions under validity of constraint qualifications. It follows from [30, Section 7.2] that the W-, M-, and S-stationary points of (MPOC) can be found among the \(\hbox {W}_{\mathrm{SC}}\)-, \(\hbox {M}_{\mathrm{SC}}\)-, and \(\hbox {S}_{\mathrm{SC}}\)-stationary points of (SC-MPOC). As we will see below, the converse statement is also true in certain situations.

Proposition 4.1

Let\((\bar{x},\bar{y},\bar{z})\in \mathbb {R}^n\times \mathbb {R}^q\times \mathbb {R}^q\)be feasible to (SC-MPOC) and assume that the index sets\(I^{0-}(\bar{x})\)and\(I^{-0}(\bar{x})\)are empty. If\((\bar{x},\bar{y},\bar{z})\)is\(\hbox {W}_{\mathrm{SC}}\)-stationary (\(\hbox {M}_{\mathrm{SC}}\)-stationary, \(\hbox {S}_{\mathrm{SC}}\)-stationary) for (SC-MPOC), then it is W-stationary (M-stationary, S-stationary) for (MPOC).

Proof

First, we set

$$\begin{aligned} I^{G}(\bar{x},\bar{y},\bar{z})&:=\{l\in \mathcal {Q}\,|\,G_l(\bar{x})=\bar{y}_l\,\wedge \,H_l(\bar{x})\ne \bar{z}_l\},\\ I^{H}(\bar{x},\bar{y},\bar{z})&:=\{l\in \mathcal {Q}\,|\,G_l(\bar{x})\ne \bar{y}_l\,\wedge \,H_l(\bar{x})=\bar{z}_l\},\\ I^{GH}(\bar{x},\bar{y},\bar{z})&:=\{l\in \mathcal {Q}\,|\,G_l(\bar{x})=\bar{y}_l\,\wedge \,H_l(\bar{x})=\bar{z}_l\}. \end{aligned}$$

Let \((\bar{x},\bar{y},\bar{z})\) be \(\hbox {W}_{\mathrm{SC}}\)-stationary for (SC-MPOC). Then, after elimination of the multipliers corresponding to the inequality constraints on the variables \(\bar{y}\) and \(\bar{z}\), there are multipliers \(\lambda _i\ge 0\) (\(i\in I^g(\bar{x})\)), \(\rho _j\) (\(j\in \mathcal {P}\)), \(\mu _l\ge 0\) (\(l\in I^G(\bar{x},\bar{y},\bar{z})\cup I^{GH}(\bar{x},\bar{y},\bar{z})\)), and \(\nu _l\ge 0\) (\(l\in I^H(\bar{x},\bar{y},\bar{z})\cup I^{GH}(\bar{x},\bar{y},\bar{z})\)) which satisfy

$$\begin{aligned} 0&=\nabla f(\bar{x})+\sum \limits _{i\in I^g(\bar{x})}\lambda _i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\rho _j\nabla h_j(\bar{x})\nonumber \\&\quad +\sum \limits _{l\in I^G(\bar{x},\bar{y},\bar{z})\cup I^{GH}(\bar{x},\bar{y},\bar{z})}\mu _l\nabla G_l(\bar{x}) +\sum \limits _{l\in I^H(\bar{x},\bar{y},\bar{z})\cup I^{GH}(\bar{x},\bar{y},\bar{z})}\nu _l\nabla H_l(\bar{x})\nonumber \\&\qquad \forall l\in I^G(\bar{x},\bar{y},\bar{z})\cup I^{GH}(\bar{x},\bar{y},\bar{z}):\,\mu _l\bar{y}_l=0,\nonumber \\&\qquad \forall l\in I^H(\bar{x},\bar{y},\bar{z})\cup I^{GH}(\bar{x},\bar{y},\bar{z}):\,\nu _l\bar{z}_l=0. \end{aligned}$$
(6)

Obviously, we have

$$\begin{aligned} \{l\in I^G(\bar{x},\bar{y},\bar{z})\cup I^{GH}(\bar{x},\bar{y},\bar{z})\,|\,\bar{y}_l=0\} = \{l\in \mathcal {Q}\,|\,G_l(\bar{x})=\bar{y}_l=0\} = I^{0+}(\bar{x})\cup I^{00}(\bar{x}) \end{aligned}$$

and

$$\begin{aligned} \{l\in I^H(\bar{x},\bar{y},\bar{z})\cup I^{GH}(\bar{x},\bar{y},\bar{z})\,|\,\bar{z}_l=0\} = \{l\in \mathcal {Q}\,|\,H_l(\bar{x})=\bar{z}_l=0\} = I^{+0}(\bar{x})\cup I^{00}(\bar{x}) \end{aligned}$$

from \(I^{0-}(\bar{x})=I^{-0}(\bar{x})=\varnothing\). Thus, the multiplier \(\mu _l\) can be positive only for indices \(l\in I^{0+}(\bar{x})\cup I^{00}(\bar{x})\) while \(\nu _l\) can be positive only for \(l\in I^{+0}(\bar{x})\cup I^{00}(\bar{x})\). Consequently, (6) shows that \(\bar{x}\) is W-stationary for (MPOC).

Next, we suppose that \((\bar{x},\bar{y},\bar{z})\) is \(\hbox {M}_{\mathrm{SC}}\)-stationary for (SC-MPOC). Then, the above multipliers additionally need to satisfy

$$\begin{aligned} \forall l\in I^{GH}(\bar{x},\bar{y},\bar{z}):\quad \mu _l\nu _l=0. \end{aligned}$$

Clearly, the assumption \(I^{0-}(\bar{x})=I^{-0}(\bar{x})=\varnothing\) as well as the above considerations show

$$\begin{aligned} I^{00}(\bar{x}) \subset \{l\in I^{GH}(\bar{x},\bar{y},\bar{z})\,|\,\bar{y}_l=\bar{z}_l=0\}. \end{aligned}$$
(7)

Thus, \(\mu _l\nu _l=0\) holds for all \(l\in I^{00}(\bar{x})\), i.e. \(\bar{x}\) is M-stationary for (MPOC).

Finally, suppose that \((\bar{x},\bar{y},\bar{z})\) is \(\hbox {S}_{\mathrm{SC}}\)-stationary for (SC-MPOC). In this case, the above multipliers additionally satisfy the condition

$$\begin{aligned} \forall l\in I^{GH}(\bar{x},\bar{y},\bar{z}):\quad \mu _l=0\,\wedge \,\nu _l=0. \end{aligned}$$

Then, (7) yields that \(\mu _l=0\) and \(\nu _l=0\) hold for all \(l\in I^{00}(\bar{x})\) which means that \(\bar{x}\) is already S-stationary for (MPOC). \(\square\)

Let us visualize the assertion of Proposition 4.1 by means of the following toy program taken from [30].

Example 4.1

Consider the simple or-constrained program

$$\begin{aligned} (x_1-1)^2\,&\rightarrow \,\min \nonumber \\ x_1\,\le \,0\,\vee \,x_2\,&\le \,0. \end{aligned}$$
(8)

The set of its global minimizers is given by \(G:=\{(1,x_2)\,|\,x_2\le 0\}\) while there are additional local minimizers at all points from \(L:=\{(0,x_2)\,|\,x_2>0\}\). One can easily check that all points from G and L are S-stationary for (8). Note that there is an additional M-stationary point at (0, 0) which is not a local minimizer of (8).

Now, we consider the switching-constrained surrogate problem

$$\begin{aligned} (x_1-1)^2\,&\rightarrow \,\min \limits _{x,y,z}\nonumber \\ y,z\,&\le \,0\nonumber \\ (x_1-y)(x_2-z)\,&=\,0 \end{aligned}$$
(9)

associated with (8). By construction, all local minimizers and stationary points of (8) can be found among the local minimizers and stationary points of (9). It has been mentioned in [30, Example 7.1] that (9) possesses local minimizers whose x-components do not correspond to local minimizers of (8) e.g. at the points \((\bar{x},\bar{y},\bar{z}):=(0,0,0,-2)\) and \((\tilde{x},\tilde{y},\tilde{z}):=(0,-1,0,-2)\). Due to [30, Theorem 7.2], these points are \(\hbox {M}_{\mathrm{SC}}\)-stationary for (9) since the latter is a switching-constrained program whose feasible region is defined via affine data functions only. As mentioned above, \(\bar{x}\) is an M-stationary point of (8) while \(\tilde{x}\) is not. Finally, observe that \(I^{00}(\bar{x})=\{1\}\) and \(I^{0-}(\tilde{x})=\{1\}\) hold.

Due to the facts discussed above, it is reasonable to focus on the computation of stationary points of (SC-MPOC) in order to solve (MPOC). However, one has to keep in mind that there are stationary solutions of (SC-MPOC) that are not stationary for (MPOC), see Proposition 4.1 and Example 4.1.

In [24], the authors suggest modifying relaxation techniques for the numerical handling of MPCCs in order to tackle switching-constrained optimization problems. The computational results presented there indicate that adapted global relaxation schemes due to Scholtes, see [34], as well as Kanzow and Schwartz, see [25], are suitable for that purpose. The adapted method due to Scholtes turned out to be the more robust one, which is why we briefly comment on this approach below. For details, we refer the interested reader to [24].

For some parameter \(t\ge 0\), let us investigate the relaxed nonlinear program

$$\begin{aligned}f(x)&\,\to\,\min\limits_{x,y,z}&&&\\g_i(x)&\,\leq\,0&\qquad&i\in\mathcal M&\\h_j(x)&\,=\,0&&j\in\mathcal P&\\y_l,z_l&\,\leq\,0&&l\in\mathcal Q&\\-t\,\leq\,(G_l(x)-y_l)(H_l(x)-z_l)&\,\leq\,t&&l\in\mathcal Q.&\end{aligned}\quad{(\text{SC-MPOC}_\text{S}(t))}$$

It possesses \(m+4q\) inequality and p equality constraints. Clearly, for positive t, the feasible set of (\(\hbox {SC-MPOC}_{\mathrm{S}}(t)\)) is a superset of the feasible set associated with (SC-MPOC). Moreover, the family of feasible sets associated with (\(\hbox {SC-MPOC}_{\mathrm{S}}(t)\)) is nested w.r.t. t in such a way that for \(t=0\), the feasible set of (SC-MPOC) is restored. Thus, for the numerical solution of (SC-MPOC), one can choose a sequence \(\{t_k\}_{k\in \mathbb {N}}\) of positive relaxation parameters converging to zero and solve the associated relaxed nonlinear problems (\(\hbox {SC-MPOC}_{\mathrm{S}}(t_k)\)) using standard solvers from nonlinear programming. Supposing that the computed sequence converges, its limit point is feasible to (SC-MPOC). Furthermore, suitable assumptions can be imposed to guarantee that this limit point is at least \(\hbox {W}_{\mathrm{SC}}\)-stationary, i.e. this approach is likely to produce W-stationary points of (MPOC), see [24, Theorem 3.2].
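To make this procedure concrete, the following minimal sketch applies such a Scholtes-type relaxation to the toy surrogate (9) from Example 4.1. It is an illustration only and not the implementation used later in Sect. 6; the solver (SLSQP from SciPy), the starting point, and the parameter schedule are assumptions made for the sake of the example.

```python
# Illustrative sketch: Scholtes-type relaxation (SC-MPOC_S(t)) of the toy
# surrogate (9), with variables w = (x1, x2, y, z); not the authors' code.
import numpy as np
from scipy.optimize import minimize

def solve_sc_relaxed(t, w0):
    obj = lambda w: (w[0] - 1.0) ** 2
    cons = [
        {"type": "ineq", "fun": lambda w: -w[2]},                              # y <= 0
        {"type": "ineq", "fun": lambda w: -w[3]},                              # z <= 0
        {"type": "ineq", "fun": lambda w: t - (w[0] - w[2]) * (w[1] - w[3])},  # product <=  t
        {"type": "ineq", "fun": lambda w: t + (w[0] - w[2]) * (w[1] - w[3])},  # product >= -t
    ]
    return minimize(obj, w0, constraints=cons, method="SLSQP").x

w = np.array([0.5, 0.5, -0.5, -0.5])  # illustrative starting point
for k in range(1, 6):                 # t_k -> 0
    w = solve_sc_relaxed(0.01 ** k, w)
print(w[:2])                          # x-part of the limit is feasible to (8)
```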

4.2 Relations to complementarity-constrained programming

Let us consider the complementarity-constrained optimization problem

$$\begin{aligned} \begin{aligned} f(x)&\,\rightarrow \,\min \limits _{x,y,z}&\\ g_i(x)&\,\le \,0&& i\in \mathcal {M}&\\ h_j(x)&\,=\,0&&j\in \mathcal {P}&\\ G_l(x)-y_l&\,\le \,0&&l\in \mathcal {Q}&\\ H_l(x)-z_l&\,\le \,0&&l\in \mathcal {Q}&\\ 0\,\le \,y_l\,\perp \,z_l&\,\ge \,0&&l\in \mathcal {Q}&\end{aligned} \end{aligned}$$
(CC-MPOC)

associated with (MPOC). Fix an arbitrary feasible point \(\bar{x}\in X\) of (MPOC) and define \(\bar{y},\bar{z}\in \mathbb {R}^q\) as stated below:

$$\begin{aligned} \forall l\in \mathcal {Q}:\quad \bar y_l&:= \begin{cases} 0&l\in I^{-0}(\bar x)\cup I^{-+}(\bar x)\cup I^{0+}(\bar x)\cup I^{--}(\bar x)\cup I^{00}(\bar x),\\ 1&l\in I^{0-}(\bar x),\\ 2G_l(\bar x)&l\in I^{+-}(\bar x)\cup I^{+0}(\bar x), \end{cases}\\ \bar z_l&:= \begin{cases} 0&l\in I^{0-}(\bar x)\cup I^{+-}(\bar x)\cup I^{+0}(\bar x)\cup I^{--}(\bar x)\cup I^{00}(\bar x),\\ 1&l\in I^{-0}(\bar x),\\ 2H_l(\bar x)&l\in I^{-+}(\bar x)\cup I^{0+}(\bar x). \end{cases} \end{aligned}$$
(10)

Clearly, \((\bar{x},\bar{y},\bar{z})\) is feasible to (CC-MPOC). On the other hand, one can easily check that for each feasible point \((\tilde{x},\tilde{y},\tilde{z})\in \mathbb {R}^n\times \mathbb {R}^q\times \mathbb {R}^q\) of (CC-MPOC), \(\tilde{x}\) is feasible to (MPOC).
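For implementation purposes, the case distinction in (10) can be coded directly from the values \(G_l(\bar{x})\) and \(H_l(\bar{x})\); the following small Python sketch is illustrative only (it uses exact zero comparisons and assumes that \(\bar{x}\) is feasible to (MPOC)).

```python
# Illustrative sketch of the case distinction (10): given G_l(xbar), H_l(xbar)
# at a point xbar feasible to (MPOC), construct slacks (ybar, zbar) so that
# (xbar, ybar, zbar) is feasible to (CC-MPOC). Not taken from the paper's code.
import numpy as np

def slacks_from_or_point(G_val, H_val):
    y = np.zeros_like(G_val, dtype=float)
    z = np.zeros_like(H_val, dtype=float)
    for l, (a, b) in enumerate(zip(G_val, H_val)):
        # y_l according to (10)
        if a == 0.0 and b < 0.0:        # l in I^{0-}(xbar)
            y[l] = 1.0
        elif a > 0.0 and b <= 0.0:      # l in I^{+-}(xbar) or I^{+0}(xbar)
            y[l] = 2.0 * a
        # z_l according to (10)
        if b == 0.0 and a < 0.0:        # l in I^{-0}(xbar)
            z[l] = 1.0
        elif b > 0.0 and a <= 0.0:      # l in I^{-+}(xbar) or I^{0+}(xbar)
            z[l] = 2.0 * b
    return y, z

# e.g. G(xbar) = (-1, 0, 2), H(xbar) = (3, -1, 0) yields y = (0, 1, 4), z = (6, 0, 0);
# one checks G - y <= 0, H - z <= 0, y, z >= 0, and y_l * z_l = 0 componentwise.
```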

Based on this observation, we obtain the following result by standard arguments.

Proposition 4.2

  1. Let \(\bar{x}\in X\) be a locally (globally) optimal solution of (MPOC). Furthermore, let \(\bar{y},\bar{z}\in \mathbb {R}^q\) be the vectors defined in (10). Then, \((\bar{x},\bar{y},\bar{z})\) is a locally (globally) optimal solution of (CC-MPOC).

  2. Let \((\tilde{x},\tilde{y},\tilde{z})\in \mathbb {R}^n\times \mathbb {R}^q\times \mathbb {R}^q\) be a globally optimal solution of (CC-MPOC). Then, \(\tilde{x}\) is a globally optimal solution of (MPOC).

The upcoming example shows that the second statement of Proposition 4.2 cannot be extended to local minimizers. This observation parallels the one for the switching-constrained reformulation of (MPOC) discussed in Sect. 4.1.

Example 4.2

Let us consider (8) as well as its complementarity-constrained reformulation

$$\begin{aligned} (x_1-1)^2\,&\rightarrow \,\min \limits _{x,y,z}\nonumber \\ x_1-y\,&\le \,0\nonumber \\ x_2-z\,&\le \,0\nonumber \\ 0\,\le y\,\perp \,z\,&\ge \,0. \end{aligned}$$
(11)

Using similar arguments as in [30, Example 7.1], one can check that the points \((0,-1,0,1)\) and (0, 0, 0, 1) are local minimizers of (11) which do not correspond to the minimizers of (8) characterized in Example 4.1.

Similar to [30, Lemma 7.2], we obtain the following result which is not surprising in light of Example 4.2.

Proposition 4.3

Let \((\bar{x},\bar{y},\bar{z})\in \mathbb {R}^n\times \mathbb {R}^q\times \mathbb {R}^q\) be a locally optimal solution of (CC-MPOC) and assume that the index sets \(I^{-0}(\bar{x})\), \(I^{0-}(\bar{x})\), and \(I^{00}(\bar{x})\) are empty. Then, \(\bar{x}\) is a local minimizer of (MPOC).

Summarizing the above facts, the transformation (CC-MPOC) comes at the price of 2q slack variables and potentially additional local minimizers. These are precisely the disadvantages we already faced when using the switching-constrained surrogate (SC-MPOC).

Finally, we want to compare (MPOC) and (CC-MPOC) w.r.t. stationary points. The upcoming result shows that we can find the W-, M-, and S-stationary points of (MPOC) among the \(\hbox {C}_{\mathrm{CC}}\)-, \(\hbox {M}_{\mathrm{CC}}\)-, and \(\hbox {S}_{\mathrm{CC}}\)-stationary points of (CC-MPOC).

Proposition 4.4

Let \(\bar{x}\in X\) be a W-stationary (M-stationary, S-stationary) point of (MPOC). Furthermore, let \(\bar{y},\bar{z}\in \mathbb {R}^q\) be the vectors defined in (10). Then, \((\bar{x},\bar{y},\bar{z})\) is \(\hbox {C}_{\mathrm{CC}}\)-stationary (\(\hbox {M}_{\mathrm{CC}}\)-stationary, \(\hbox {S}_{\mathrm{CC}}\)-stationary) for (CC-MPOC).

Proof

By definition of \(\bar{y}\) and \(\bar{z}\), we obtain

$$\begin{aligned} I^{0+}_{\mathrm{CC}}(\bar{x},\bar{y},\bar{z})&=I^{-0}(\bar{x})\cup I^{-+}(\bar{x})\cup I^{0+}(\bar{x}),\\ I^{+0}_{\mathrm{CC}}(\bar{x},\bar{y},\bar{z})&=I^{0-}(\bar{x})\cup I^{+-}(\bar{x})\cup I^{+0}(\bar{x}),\\ I^{00}_{\mathrm{CC}}(\bar{x},\bar{y},\bar{z})&=I^{--}(\bar{x})\cup I^{00}(\bar{x}). \end{aligned}$$

Since \(\bar{x}\) is W-stationary for (MPOC), we find multipliers \(\lambda _i\ge 0\) (\(i\in I^g(\bar{x})\)), \(\rho _j\) (\(j\in \mathcal {P}\)), \(\mu _l\ge 0\) (\(l\in I^{0+}(\bar{x})\cup I^{00}(\bar{x})\)), and \(\nu _l\ge 0\) (\(l\in I^{+0}(\bar{x})\cup I^{00}(\bar{x})\)) which satisfy (3). Now set \(\mu _l:=0\) for all \(l\in \mathcal {Q}\setminus (I^{0+}(\bar{x})\cup I^{00}(\bar{x}))\) as well as \(\nu _l:=0\) for all \(l\in \mathcal {Q}\setminus (I^{+0}(\bar{x})\cup I^{00}(\bar{x}))\). Furthermore, fix \(\bar{\mu }:=-\mu\) and \(\bar{\nu }:=-\nu\). Then, these multipliers solve

$$\begin{aligned} 0&=\nabla f(\bar{x}) +\sum \limits _{i\in I^g(\bar{x})}\lambda _i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\rho _j\nabla h_j(\bar{x})\\&\quad +\sum \limits _{l\in \mathcal {Q}}\mu _l\nabla G_l(\bar{x}) +\sum \limits _{l\in \mathcal {Q}}\nu _l\nabla H_l(\bar{x}),\\ 0&=-\mu -\bar{\mu },\qquad 0=-\nu -\bar{\nu },\\&\qquad \forall i\in I^g(\bar{x}):\,\lambda _i\ge 0,\\ 0&=\mu \cdot (G(\bar{x})-\bar{y}),\qquad 0=\nu \cdot (H(\bar{x})-\bar{z}),\\&\qquad \forall l\in I^{+0}_{\mathrm{CC}}(\bar{x},\bar{y},\bar{z}):\,\bar{\mu }_l=0,\\&\qquad \forall l\in I^{0+}_{\mathrm{CC}}(\bar{x},\bar{y},\bar{z}):\,\bar{\nu }_l=0,\\&\qquad \forall l\in I^{00}_{\mathrm{CC}}(\bar{x},\bar{y},\bar{z}):\,\bar{\mu }_l\bar{\nu }_l\ge 0 \end{aligned}$$

which is the \(\hbox {C}_{\mathrm{CC}}\)-stationarity system of (CC-MPOC) at \((\bar{x},\bar{y},\bar{z})\), i.e. the latter point is \(\hbox {C}_{\mathrm{CC}}\)-stationary for (CC-MPOC).

If \(\bar{x}\) is M-stationary (S-stationary) for (MPOC), then the multipliers \(\mu\) and \(\nu\) from above additionally satisfy \(\mu _l\nu _l=0\) (\(\mu _l=0\) and \(\nu _l=0\)) for all \(l\in I^{00}(\bar{x})\). This means that the new multipliers \(\bar{\mu }\) and \(\bar{\nu }\) particularly satisfy \(\bar{\mu }_l\bar{\nu }_l=0\) (\(\bar{\mu }_l= 0\) and \(\bar{\nu }_l= 0\)) for all \(l\in I^{00}_{\mathrm{CC}}(\bar{x},\bar{y},\bar{z})\) which implies that \((\bar{x},\bar{y},\bar{z})\) is \(\hbox {M}_{\mathrm{CC}}\)-stationary (\(\hbox {S}_{\mathrm{CC}}\)-stationary) for (CC-MPOC). \(\square\)

Proceeding in a similar way as used for the proof of Proposition 4.1, we can validate the following result.

Proposition 4.5

Let \((\bar{x},\bar{y},\bar{z})\in \mathbb {R}^n\times \mathbb {R}^q\times \mathbb {R}^q\) be feasible to (CC-MPOC) and assume that the index sets \(I^{0-}(\bar{x})\) and \(I^{-0}(\bar{x})\) are empty. If \((\bar{x},\bar{y},\bar{z})\) is \(\hbox {C}_{\mathrm{CC}}\)-stationary (\(\hbox {M}_{\mathrm{CC}}\)-stationary, \(\hbox {S}_{\mathrm{CC}}\)-stationary) for (CC-MPOC), then \(\bar{x}\) is W-stationary (M-stationary, S-stationary) for (MPOC).

In light of Propositions 4.4 and 4.5, it seems promising to focus on the computation of stationary points associated with the complementarity-constrained program (CC-MPOC) in order to find stationary points of (MPOC). As with the switching-constrained approach described in Sect. 4.1, we face the difficulty that the stationary points of the surrogate program (CC-MPOC) do not always correspond to stationary points of (MPOC). Thus, both approaches share the same qualitative properties.

In order to solve (CC-MPOC) computationally, it is possible to exploit e.g. problem-tailored SQP-methods, cf. [15, 27], or relaxation schemes, see [21] for an overview. Here, we focus on the well-known global relaxation approach of Scholtes, see [34], which turned out to be numerically efficient and robust in comparison with other relaxation methods, see [21]. For some parameter \(t\ge 0\), we consider the nonlinear surrogate problem

$$\begin{aligned}f(x)&\,\to\,\min\limits_{x,y,z}&&&\\g_i(x)&\,\leq\,0&\qquad&i\in\mathcal M&\\h_j(x)&\,=\,0&&j\in\mathcal P&\\G_l(x)-y_l&\,\leq\,0&&l\in\mathcal Q&\\H_l(x)-z_l&\,\leq\,0&&l\in\mathcal Q&\\y_l,z_l&\,\geq \,0&&l\in\mathcal Q&\\y_lz_l&\,\leq\,t&&l\in\mathcal Q&\end{aligned}\quad{(\text{CC-MPOC}_\text{S}(t))}$$

which possesses \(m+5q\) inequality and p equality constraints. Noting that the feasible sets of (\(\hbox {CC-MPOC}_{\mathrm{S}}(t)\)) form a nested family whose limit as \(t\downarrow 0\) is the feasible set of (CC-MPOC), we can exploit the following strategy for the numerical solution of (MPOC). First, we choose a sequence \(\{t_k\}_{k\in \mathbb {N}}\) of positive relaxation parameters converging to 0. Afterwards, we use standard solvers from nonlinear programming to compute solutions associated with (\(\hbox {CC-MPOC}_{\mathrm{S}}(t_k)\)). The potential limit of this sequence is feasible to (CC-MPOC) and, under some reasonable assumptions, a \(\hbox {C}_{\mathrm{CC}}\)-stationary point of this program, see [21, Section 3.1]. Due to Proposition 4.5, this strategy is likely to produce W-stationary points of (MPOC).
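For completeness, the relaxed surrogate (\(\hbox {CC-MPOC}_{\mathrm{S}}(t)\)) associated with the toy reformulation (11) can be sketched analogously to the switching-constrained sketch in Sect. 4.1; it can be driven by the same loop of decreasing relaxation parameters, and all concrete choices below are again purely illustrative.

```python
# Illustrative sketch: constraints of (CC-MPOC_S(t)) for the toy problem (11),
# with variables w = (x1, x2, y, z); not the authors' code.
from scipy.optimize import minimize

def solve_cc_relaxed(t, w0):
    obj = lambda w: (w[0] - 1.0) ** 2
    cons = [
        {"type": "ineq", "fun": lambda w: w[2] - w[0]},      # x1 - y <= 0
        {"type": "ineq", "fun": lambda w: w[3] - w[1]},      # x2 - z <= 0
        {"type": "ineq", "fun": lambda w: w[2]},             # y >= 0
        {"type": "ineq", "fun": lambda w: w[3]},             # z >= 0
        {"type": "ineq", "fun": lambda w: t - w[2] * w[3]},  # y * z <= t
    ]
    return minimize(obj, w0, constraints=cons, method="SLSQP").x
```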

At this point, we want to remark that the Scholtes-type relaxation approach from Sect. 4.1 seems to be numerically cheaper since the resulting relaxed surrogate program (\(\hbox {SC-MPOC}_{\mathrm{S}}(t)\)) generally possesses fewer constraints than (\(\hbox {CC-MPOC}_{\mathrm{S}}(t)\)). On the other hand, due to the different role of the slack variables, the nonlinearities in (\(\hbox {CC-MPOC}_{\mathrm{S}}(t)\)) seem to be more balanced than in (\(\hbox {SC-MPOC}_{\mathrm{S}}(t)\)). A quantitative comparison of both methods is provided in Sect. 6.

5 Relaxation of or-constraints

In contrast to complementarity-, vanishing-, switching-, or cardinality-constrained programming where essential difficulties arise from the fact that the feasible set is almost disconnected, or-constrained programs may behave geometrically well in this regard (apart from pathological cases comprising e.g. optimization problems with gap domains, see Sect. 6.2.2). However, we still need to deal with the combinatorial structure of the feasible set and the irregularity at the or-kink. As we will see later, a direct treatment as described in Sect. 3 struggles with this issue. Thus, a natural idea is to relax this kink. Motivated by the computational results from [21, 24], we perform a global relaxation and smoothing of the kink using a Scholtes-type approach which is visualized in Fig. 4. Note that the popular relaxation approach due to Kanzow and Schwartz, see [24, 25], would only shift the kink while preserving its difficult variational structure. Thus, this idea does not reflect the general intention of this section, which is why we do not consider it here.

Fig. 4 Geometric illustration of the Scholtes-type global relaxation approach

Let \(t\ge 0\) be a relaxation parameter. In order to perform the relaxation of our interest, we focus on two modified NCP-functions characterized below. Note that any other (smoothed) or-compatible NCP-function can be used for this approach at the price of a potentially different underlying convergence analysis.

  • First, we will deal with the smoothed Fischer–Burmeister function \(\varphi _{\mathrm{FB}}^t:\mathbb {R}^2\rightarrow \mathbb {R}\) given by

    $$\begin{aligned} \forall (a,b)\in \mathbb {R}^2:\quad \varphi ^t_{\mathrm{FB}}(a,b):=a+b-\sqrt{a^2+b^2+2t}. \end{aligned}$$

    The smoothing of the Fischer–Burmeister function has been suggested by Kanzow in [23] where \(\varphi ^t_{\mathrm{FB}}\) is used for the numerical treatment of linear complementarity problems, see [16] as well. The smoothing of NCP-functions in nonlinear complementarity-constrained programming is the subject of interest in [10].

  • For our second approach, we make use of \(\varphi ^t_{\mathrm{KS}}:\mathbb {R}^2\rightarrow \mathbb {R}\) given by

    $$\begin{aligned} \forall (a,b)\in \mathbb {R}^2:\quad \varphi ^t_{\mathrm{KS}}(a,b):= \begin{cases} ab-t&\text{if }a+b\geq 0\\ -\tfrac12(a^2+b^2+2t)&\text{if }a+b<0. \end{cases} \end{aligned}$$

    Clearly, this function is related to the NCP-function \(\varphi _{\mathrm{KS}}\) from Sect. 3. However, since \(\varphi _{\mathrm{KS}}\) is already smooth, \(\varphi ^t_{\mathrm{KS}}\) cannot be referred to as a smoothed NCP-function. Instead, \(\varphi ^t_{\mathrm{KS}}\) results from \(\varphi _{\mathrm{KS}}\) by subtracting the offset t. In this way, the boundary of the associated zero sublevel set becomes smooth. That is why we will refer to \(\varphi ^t_{\mathrm{KS}}\) as the offset Kanzow–Schwartz function. Clearly, \(\varphi ^t_{\mathrm{KS}}\) is continuously differentiable for each \(t\ge 0\) since \(\varphi _{\mathrm{KS}}\) possesses this property.

For some relaxation parameter \(t\ge 0\) and a function \(\varphi ^t\in \{\varphi ^t_{\mathrm{FB}},\varphi ^t_{\mathrm{KS}}\}\), we now consider the relaxed surrogate

$$\begin{aligned}f(x)&\,\rightarrow\,\min&&&\\g_i(x)&\,\leq\,0&\qquad&i\in\mathcal M&\\h_j(x)&\,=\,0&\qquad&j\in\mathcal P&\\ \varphi^t(G_l(x),H_l(x))&\,\leq\,0&\qquad&l\in\mathcal Q&\end{aligned}\quad{(P(\varphi^t))}$$

whose feasible set will be denoted by \(X(\varphi ^t)\). Noting that for each \(t\ge 0\), one has

$$\begin{aligned} \forall (a,b)\in \mathbb {R}^2:\quad \varphi ^t_{\mathrm{FB}}(a,b)\le 0\,\Longleftrightarrow \,\varphi ^t_{\mathrm{KS}}(a,b)\le 0, \end{aligned}$$

the sets \(X(\varphi ^t_{\mathrm{FB}})\) and \(X(\varphi ^t_{\mathrm{KS}})\) are the same. However, their particular nonlinear description differs significantly. In the lemma below, we summarize the geometrical properties of the family \(\{X(\varphi ^t)\}_{t\ge 0}\). The proof of this result is rather standard and, thus, omitted.
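As a simple sanity check of this equivalence of zero sublevel sets, one may evaluate both functions on random samples; the following snippet is purely illustrative and not part of the implementation discussed in Sect. 6.

```python
# Illustrative numerical check that phi^t_FB and phi^t_KS possess the same
# zero sublevel set (up to floating-point rounding near the boundary).
import numpy as np

def phi_fb(a, b, t):
    """Smoothed Fischer-Burmeister function."""
    return a + b - np.sqrt(a * a + b * b + 2.0 * t)

def phi_ks(a, b, t):
    """Offset Kanzow-Schwartz function."""
    return np.where(a + b >= 0.0, a * b - t, -0.5 * (a * a + b * b + 2.0 * t))

rng = np.random.default_rng(0)
a, b = rng.uniform(-5.0, 5.0, size=(2, 100_000))
for t in (0.0, 1e-3, 1e-1, 1.0):
    agree = np.mean((phi_fb(a, b, t) <= 0.0) == (phi_ks(a, b, t) <= 0.0))
    print(f"t = {t:g}: sublevel sets agree on {agree:.2%} of the samples")
```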

Lemma 5.1

For \(\varphi ^t\in \{\varphi ^t_{\mathrm{FB}},\varphi ^t_{\mathrm{KS}}\}\), the family \(\{X(\varphi ^t)\}_{t\ge 0}\) possesses the following properties:

  1. \(X(\varphi ^0)=X\),

  2. \(0\le s\le t\,\Longrightarrow \,X(\varphi ^s)\subset X(\varphi ^t)\), and

  3. \(\bigcap _{t>0}X(\varphi ^t)=X\).

Due to the above lemma, the following general strategy for the numerical treatment of (MPOC) is reasonable. For a sequence \(\{t_k\}_{k\in \mathbb {N}}\) of positive relaxation parameters converging to zero, we solve the relaxed surrogate (\(\hbox {P}(\varphi ^{t_k}_{\mathrm{FB}})\)) or (\(\hbox {P}(\varphi ^{t_k}_{\mathrm{KS}})\)). Noting that these problems are standard nonlinear programs, it is reasonable to assume that we are able to compute associated KKT points. If the obtained sequence of points possesses an accumulation point, then the latter is feasible to (MPOC). In the following, we will discuss whether this accumulation point is stationary for (MPOC) as well. For that purpose, let us introduce the set

$$\begin{aligned} I^{\varphi ^t}(x):=\{l\in \mathcal {Q}\,|\,\varphi ^t(G_l(x),H_l(x))=0\} \end{aligned}$$

for a feasible point \(x\in X(\varphi ^t)\) of (\(\hbox {P}(\varphi ^{t})\)). Clearly, \(I^{\varphi ^t}(x)\) comprises all indices corresponding to smoothed or-constraints active at x.
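For illustration, the relaxed surrogate (\(\hbox {P}(\varphi ^{t})\)) can be assembled as a standard NLP and handed to an off-the-shelf solver as sketched below. The sketch assumes that f, g, h, G, and H are available as Python callables (the latter four returning NumPy arrays), that the problem possesses at least one constraint of each type, and that phi_t is one of the vectorized functions from the snippet above; the choice of SciPy's SLSQP is an assumption made for this example, whereas the experiments in Sect. 6 rely on IPOPT.

```python
# Illustrative assembly of the relaxed surrogate (P(phi^t)) as a standard NLP;
# f, g, h, G, H are assumed to be callables (g, h, G, H returning numpy arrays).
from scipy.optimize import minimize

def solve_relaxed_surrogate(f, g, h, G, H, phi_t, t, x0):
    cons = [
        {"type": "ineq", "fun": lambda x: -g(x)},                  # g_i(x) <= 0
        {"type": "eq",   "fun": lambda x: h(x)},                   # h_j(x)  = 0
        {"type": "ineq", "fun": lambda x: -phi_t(G(x), H(x), t)},  # phi^t(G_l(x), H_l(x)) <= 0
    ]
    return minimize(f, x0, constraints=cons, method="SLSQP").x
```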

5.1 The smoothed Fischer–Burmeister function

Here, we analyze the proposed relaxation scheme w.r.t. the smoothed Fischer–Burmeister function \(\varphi ^t_{\mathrm{FB}}\). First, we characterize its inherent convergence properties. Afterwards, the regularity of the associated nonlinear subproblems (\(\hbox {P}(\varphi ^{t}_{\mathrm{FB}})\)) is discussed in more detail.

Theorem 5.1

Let \(\{t_k\}_{k\in \mathbb {N}}\) be a sequence of positive relaxation parameters converging to zero. For each \(k\in \mathbb {N}\), let \(x_k\in X(\varphi ^{t_k}_{\mathrm{FB}})\) be a KKT point of (\(\hbox {P}(\varphi ^{t_k}_{\mathrm{FB}})\)). Suppose that \(\{x_k\}_{k\in \mathbb {N}}\) converges to some point \(\bar{x}\in X\) where MPOC-MFCQ holds. Then, \(\bar{x}\) is a W-stationary point of (MPOC).

Proof

Since \(x_k\) is a KKT point of (\(\hbox {P}(\varphi ^{t_k}_{\mathrm{FB}})\)) for each \(k\in \mathbb {N}\), we find multipliers \(\lambda ^k_i\ge 0\) (\(i\in I^g(x_k)\)), \(\rho ^k_j\) (\(j\in \mathcal {P}\)), and \(\xi ^k_l\ge 0\) (\(l\in I^{\varphi ^{t_k}_{\mathrm{FB}}}(x_k)\)) such that

$$\begin{aligned} 0&=\nabla f(x_k) +\sum \limits _{i\in I^g(x_k)}\lambda _i^k\nabla g_i(x_k) +\sum \limits _{j\in \mathcal {P}}\rho _j^k\nabla h_j(x_k)\\&\quad +\sum \limits _{l\in I^{\varphi ^{t_k}_{\mathrm{FB}}}(x_k)} \xi ^k_l\left( \alpha ^k_l\nabla G_l(x_k) +\beta ^k_l\nabla H_l(x_k)\right) \end{aligned}$$

holds where we used

$$\begin{aligned} \alpha ^k_l:=1-\frac{G_l(x_k)}{\sqrt{G_l^2(x_k)+H_l^2(x_k)+2t_k}}\qquad \beta ^k_l:=1-\frac{H_l(x_k)}{\sqrt{G_l^2(x_k)+H_l^2(x_k)+2t_k}} \end{aligned}$$

for all \(k\in \mathbb {N}\) and all \(l\in \mathcal {Q}\). Noting that \(x_k\rightarrow \bar{x}\) holds while all involved mappings are continuous, we may assume

$$\begin{aligned} \forall k\in \mathbb {N}:\quad I^g(x_k)\subset I^g(\bar{x}),\qquad I^{\varphi ^{t_k}_{\mathrm{FB}}}(x_k)\subset \mathcal {I}(\bar{x}). \end{aligned}$$

For each \(k\in \mathbb {N}\), we formally set \(\lambda ^k_l:=0\) for all \(l\in I^g(\bar{x})\setminus I^g(x_k)\) as well as \(\xi ^k_l:=0\) for all \(l\in \mathcal {I}(\bar{x})\setminus I^{\varphi ^{t_k}_{\mathrm{FB}}}(x_k)\). This yields

$$\begin{aligned} 0&=\nabla f(x_k) +\sum \limits _{i\in I^g(\bar{x})}\lambda _i^k\nabla g_i(x_k) +\sum \limits _{j\in \mathcal {P}}\rho _j^k\nabla h_j(x_k)\nonumber \\&\quad +\sum \limits _{l\in \mathcal {I}(\bar{x})} \xi ^k_l\left( \alpha ^k_l\nabla G_l(x_k) +\beta ^k_l\nabla H_l(x_k)\right) . \end{aligned}$$
(12)

Note that \(\alpha ^k_l,\beta ^k_l\in (0,2)\) holds for all \(k\in \mathbb {N}\) and \(l\in \mathcal {Q}\). This means that the sequences \(\{\alpha ^k_l\}_{k\in \mathbb {N}}\) and \(\{\beta ^k_l\}_{k\in \mathbb {N}}\) converge w.l.o.g. to \(\alpha _l\in [0,2]\) and \(\beta _l\in [0,2]\) for each \(l\in \mathcal {Q}\), respectively. By construction, we have \(\alpha _l=1\) and \(\beta _l=0\) for all \(l\in I^{0+}(\bar{x})\) while \(\alpha _l=0\) and \(\beta _l=1\) hold true for all \(l\in I^{+0}(\bar{x})\). For each \(l\in \mathcal {Q}\), we have

$$\begin{aligned} \alpha ^k_l+\beta ^k_l = 1+1-\frac{G_l(x_k)+H_l(x_k)}{\sqrt{G_l^2(x_k)+H_l^2(x_k)+2t_k}} = 1-\frac{\varphi ^{t_k}_{\mathrm{FB}}(G_l(x_k),H_l(x_k))}{\sqrt{G_l^2(x_k)+H_l^2(x_k)+2t_k}} \ge 1 \end{aligned}$$

by feasibility of \(x_k\) for (\(\hbox {P}(\varphi ^{t_k}_{\mathrm{FB}})\)). Taking the limit, we particularly have \(\alpha _l+\beta _l\ge 1\) for all \(l\in I^{00}(\bar{x})\), which yields that \(\alpha _l\) or \(\beta _l\) is positive for \(l\in I^{00}(\bar{x})\).

Let us assume that the sequence \(\{(\lambda ^k_{I^g(\bar{x})},\rho ^k,\xi _{\mathcal {I}(\bar{x})}^k)\}_{k\in \mathbb {N}}\) is unbounded. We set

$$\begin{aligned} \forall k\in \mathbb {N}:\quad (\tilde{\lambda }^k_{I^g(\bar{x})},\tilde{\rho }^k,\tilde{\xi }^k_{\mathcal {I}(\bar{x})}) := \frac{(\lambda ^k_{I^g(\bar{x})},\rho ^k,\xi _{\mathcal {I}(\bar{x})}^k)}{\Vert (\lambda ^k_{I^g(\bar{x})},\rho ^k,\xi _{\mathcal {I}(\bar{x})}^k)\Vert _2}. \end{aligned}$$

Thus, \(\{(\tilde{\lambda }^k_{I^g(\bar{x})},\tilde{\rho }^k,\tilde{\xi }^k_{\mathcal {I}(\bar{x})})\}_{k\in \mathbb {N}}\) is bounded and converges w.l.o.g. to some nonvanishing \((\tilde{\lambda }_{I^g(\bar{x})},\tilde{\rho },\tilde{\xi }_{\mathcal {I}(\bar{x})})\). Dividing (12) by \(\Vert (\lambda ^k_{I^g(\bar{x})},\rho ^k,\xi _{\mathcal {I}(\bar{x})}^k)\Vert _2\) and taking the limit \(k\rightarrow \infty\) while respecting the properties of the limits \(\alpha ,\beta \in \mathbb {R}^q\) as well as the continuous differentiability of all involved mappings, we come up with

$$\begin{aligned} 0&= \sum \limits _{i\in I^g(\bar{x})}\tilde{\lambda }_i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\tilde{\rho }_j\nabla h_j(\bar{x})\\&\quad +\sum \limits _{l\in I^{0+}(\bar{x})}\tilde{\xi }_l\nabla G_l(\bar{x}) +\sum \limits _{l\in I^{+0}(\bar{x})}\tilde{\xi }_l\nabla H_l(\bar{x}) +\sum \limits _{l\in I^{00}(\bar{x})}\tilde{\xi }_l(\alpha _l\nabla G_l(\bar{x})+\beta _l\nabla H_l(\bar{x})). \end{aligned}$$

Noting that \(\tilde{\lambda }_i\ge 0\) (\(i\in I^g(\bar{x})\)) and \(\tilde{\xi }_l\ge 0\) (\(l\in \mathcal {I}(\bar{x})\)) holds true, the validity of MPOC-MFCQ yields \(\tilde{\lambda }_i=0\) (\(i\in I^g(\bar{x})\)), \(\tilde{\rho }_j=0\) (\(j\in \mathcal {P}\)), \(\tilde{\xi }_l=0\) (\(l\in I^{0+}(\bar{x})\cup I^{+0}(\bar{x})\)), and \(\tilde{\xi }_l\alpha _l=\tilde{\xi }_l\beta _l=0\) (\(l\in I^{00}(\bar{x})\)). Since \(\alpha _l\) or \(\beta _l\) is positive for each \(l\in I^{00}(\bar{x})\), we already have \(\tilde{\xi }_l=0\) (\(l\in \mathcal {I}(\bar{x})\)). Summarizing these observations, the multiplier \((\tilde{\lambda }_{I^g(\bar{x})},\tilde{\rho },\tilde{\xi }_{\mathcal {I}(\bar{x})})\) vanishes which is a contradiction.

Thus, \(\{(\lambda ^k_{I^g(\bar{x})},\rho ^k,\xi _{\mathcal {I}(\bar{x})}^k)\}_{k\in \mathbb {N}}\) is bounded and converges w.l.o.g. to some multiplier \((\lambda _{I^g(\bar{x})},\rho ,\xi _{\mathcal {I}(\bar{x})})\). Therefore, taking the limit in (12) yields

$$\begin{aligned} 0&= \nabla f(\bar{x})+ \sum \limits _{i\in I^g(\bar{x})}\lambda _i\nabla g_i(\bar{x}) +\sum \limits _{j\in \mathcal {P}}\rho _j\nabla h_j(\bar{x})\\&\quad +\sum \limits _{l\in I^{0+}(\bar{x})}\xi _l\nabla G_l(\bar{x}) +\sum \limits _{l\in I^{+0}(\bar{x})}\xi _l\nabla H_l(\bar{x}) +\sum \limits _{l\in I^{00}(\bar{x})}\xi _l(\alpha _l\nabla G_l(\bar{x})+\beta _l\nabla H_l(\bar{x})). \end{aligned}$$

Here, \(\lambda _i\ge 0\) (\(i\in I^g(\bar{x})\)) and \(\xi _l\ge 0\) (\(l\in \mathcal {I}(\bar{x})\)) hold. Finally, we set

$$\begin{aligned} \begin{aligned}&\forall l\in I^{0+}(\bar{x})\cup I^{00}(\bar{x}):&\mu _l:= {\left\{ \begin{array}{ll} \xi _l &{}l\in I^{0+}(\bar{x}),\\ \xi _l\alpha _l &{}l\in I^{00}(\bar{x}) \end{array}\right. }&\\&\forall l\in I^{+0}(\bar{x})\cup I^{00}(\bar{x}):&\nu _l:= {\left\{ \begin{array}{ll} \xi _l &{}l\in I^{+0}(\bar{x}),\\ \xi _l\beta _l &{}l\in I^{00}(\bar{x}) \end{array}\right. }&\end{aligned} \end{aligned}$$

in order to see that \(\bar{x}\) is a W-stationary point of (MPOC). \(\square\)

At first glance, the result from Theorem 5.1 seems to be comparatively weak when taking into account similar investigations for other classes of disjunctive programs. On the other hand, the fact that the proposed method produces W-stationary points actually means that local minimizers of (MPOC) which are only W-stationary can be found by this approach. Apart from that, the variational geometry of (MPOC) suggests that the biactive situation is rather artificial at local minimizers of (MPOC), and whenever the biactive set is empty, all introduced stationarity notions for (MPOC) coincide.

The following simple example confirms that the result of Theorem 5.1 cannot be strengthened.

Example 5.1

Let us consider the simple or-constrained program

$$\begin{aligned} \tfrac{1}{2}(x_1-1)^2+\tfrac{1}{2}(x_2-1)^2\,&\rightarrow \,\min \nonumber \\ x_1\,\le \,0\,\vee \,x_2\,&\le \,0. \end{aligned}$$
(13)

Its globally optimal solutions are given by (1, 0) and (0, 1) and these points are S-stationary. Furthermore, the point \(\bar{x}:=(0,0)\) is W-stationary but not a local minimizer of (13).

Let us consider the associated program (\(\hbox {P}(\varphi ^{t}_{\mathrm{FB}})\)) for some \(t\in (0,1]\). One can easily check that \(x(t):=(\sqrt{t},\sqrt{t})\) is a KKT point of the latter. Taking the limit \(t\downarrow 0\), we have \(x(t)\rightarrow \bar{x}\). Note that MPOC-LICQ is valid at \(\bar{x}\). This means that we cannot strengthen the assertion of Theorem 5.1.
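For the reader's convenience, the computation behind this claim is short: the single constraint of (\(\hbox {P}(\varphi ^{t}_{\mathrm{FB}})\)) associated with (13) is \(\varphi ^t_{\mathrm{FB}}(x_1,x_2)\le 0\), and at \(x(t)\) we find

$$\begin{aligned} \varphi ^t_{\mathrm{FB}}(\sqrt{t},\sqrt{t})&=2\sqrt{t}-\sqrt{4t}=0,\qquad \nabla \varphi ^t_{\mathrm{FB}}(\sqrt{t},\sqrt{t})=\left( 1-\tfrac{\sqrt{t}}{2\sqrt{t}},\,1-\tfrac{\sqrt{t}}{2\sqrt{t}}\right) =\left( \tfrac{1}{2},\tfrac{1}{2}\right) ,\\ \nabla f(x(t))&=(\sqrt{t}-1,\sqrt{t}-1),\qquad \nabla f(x(t))+\xi \,\nabla \varphi ^t_{\mathrm{FB}}(\sqrt{t},\sqrt{t})=0\quad \text {for }\xi :=2(1-\sqrt{t})\ge 0, \end{aligned}$$

i.e. \(x(t)\) satisfies the KKT conditions of (\(\hbox {P}(\varphi ^{t}_{\mathrm{FB}})\)) with the multiplier \(\xi =2(1-\sqrt{t})\) whenever \(t\in (0,1]\).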

In order to guarantee that the local minimizers associated with the nonlinear program (\(\hbox {P}(\varphi ^{t}_{\mathrm{FB}})\)) are KKT points, a constraint qualification needs to be imposed on the latter problem. As we will show below, the validity of MPOC-MFCQ at some feasible point of (MPOC) implies that standard MFCQ is valid in a neighborhood of this point w.r.t. (\(\hbox {P}(\varphi ^{t}_{\mathrm{FB}})\)). This way, the assumptions of Theorem 5.1 turn out to be quite natural. Particularly, the need for KKT points associated with (\(\hbox {P}(\varphi ^{t}_{\mathrm{FB}})\)) is not restrictive since MPOC-MFCQ is demanded to hold at the associated limit point.

Proposition 5.1

Let \(\bar{x}\in X\) be a feasible point of (MPOC) where MPOC-MFCQ is valid. Then, there exists a neighborhood \(U\subset \mathbb {R}^n\) of \(\bar{x}\) such that MFCQ holds for (\(\hbox {P}(\varphi ^t_{\mathrm{FB}})\)) at all points from \(X(\varphi ^t_{\mathrm{FB}})\cap U\) for all \(t>0\).

Proof

Invoking [24, Lemma 2.2], we find a neighborhood U of \(\bar{x}\) such that the union

$$\begin{aligned} \Bigl [\{\nabla g_i(x)\,|\,i\in I^g(\bar{x})\}&\cup \{\nabla G_l(x)\,|\,l\in I^{0+}(\bar{x})\cup I^{00}(\bar{x})\}\\&\cup \{\nabla H_l(x)\,|\,l\in I^{+0}(\bar{x})\cup I^{00}(\bar{x})\}\Bigr ] \cup \{\nabla h_j(x)\,|\,j\in \mathcal {P}\} \end{aligned}$$

is positive-linearly independent for each \(x\in U\) since MPOC-MFCQ is valid and all appearing functions are continuously differentiable.

Now, fix \(t>0\) as well as \(x\in X(\varphi ^t_{\mathrm{FB}})\cap U\). If U is small enough, we have \(I^g(x)\subset I^g(\bar{x})\) and \(I^{\varphi ^t_{\mathrm{FB}}}(x)\subset \mathcal {I}(\bar{x})\) by continuity of g, G, H, and \(\varphi ^t_{\mathrm{FB}}\). Let us set

$$\begin{aligned} \alpha _l:=1-\frac{G_l(x)}{\sqrt{G_l^2(x)+H_l^2(x)+2t}}\qquad \beta _l:=1-\frac{H_l(x)}{\sqrt{G_l^2(x)+H_l^2(x)+2t}} \end{aligned}$$

for all \(l\in \mathcal {Q}\). By construction, it holds \(\alpha _l,\beta _l\in (0,2)\). Furthermore, \(\alpha _l+\beta _l\ge 1\) holds for all \(l\in \mathcal {Q}\) since x is feasible to (\(\hbox {P}(\varphi ^t_{\mathrm{FB}})\)). Hence, \(\alpha _l\) or \(\beta _l\) is positive for each \(l\in \mathcal {Q}\). Clearly, we have

$$\begin{aligned} \begin{aligned}&\forall l\in I^{0+}(\bar{x})\cap I^{\varphi ^t_{\mathrm{FB}}}(x):\quad&\alpha _l\ne 0\quad&\beta _l\approx 0&\\&\forall l\in I^{+0}(\bar{x})\cap I^{\varphi ^t_{\mathrm{FB}}}(x):\quad&\alpha _l\approx 0\quad&\beta _l\ne 0 \end{aligned} \end{aligned}$$

if U is chosen sufficiently small. Thus, we may assume that the union

$$\begin{aligned} \begin{aligned}&\Bigl [ \{\nabla g_i(x)\,|\,i\in I^g(\bar{x})\}\\&\qquad \cup \{\alpha _l\nabla G_l(x)+\beta _l\nabla H_l(x)\,|\,l\in (I^{0+}(\bar{x})\cup I^{+0}(\bar{x}))\cap I^{\varphi ^t_{\mathrm{FB}}}(x)\}\\&\qquad \cup \{\nabla G_l(x)\,|\,l\in I^{00}(\bar{x})\cap I^{\varphi ^t_{\mathrm{FB}}}(x)\}\\&\qquad \cup \{\nabla H_l(x)\,|\,l\in I^{00}(\bar{x})\cap I^{\varphi ^t_{\mathrm{FB}}}(x)\}\Bigr ] \cup \{\nabla h_j(x)\,|\,j\in \mathcal {P}\} \end{aligned} \end{aligned}$$
(14)

is positive-linearly independent.

Now, suppose that there are multipliers \(\lambda _i\ge 0\) (\(i\in I^g(x)\)), \(\rho _j\) (\(j\in \mathcal {P}\)), and \(\xi _l\ge 0\) (\(l\in I^{\varphi ^t_{\mathrm{FB}}}(x)\)) such that

$$\begin{aligned} 0 = \sum \limits _{i\in I^g(x)}\lambda _i\nabla g_i( x) +\sum \limits _{j\in \mathcal {P}}\rho _j\nabla h_j(x) +\sum \limits _{l\in I^{\varphi ^t_{\mathrm{FB}}}(x)} \xi _l\left( \alpha _l\nabla G_l(x)+\beta _l\nabla H_l(x)\right) \end{aligned}$$

is valid. This is equivalent to

$$\begin{aligned} 0&= \sum \limits _{i\in I^g(x)}\lambda _i\nabla g_i( x) +\sum \limits _{j\in \mathcal {P}}\rho _j\nabla h_j(x)\\&\quad +\sum \limits _{l\in (I^{0+}(\bar{x})\cup I^{+0}(\bar{x}))\cap I^{\varphi ^t_{\mathrm{FB}}}(x)} \xi _l\left( \alpha _l\nabla G_l(x)+\beta _l\nabla H_l(x)\right) \\&\quad +\sum \limits _{l\in I^{00}(\bar{x})\cap I^{\varphi ^t_{\mathrm{FB}}}(x)} \xi _l\alpha _l\nabla G_l(x) +\sum \limits _{l\in I^{00}(\bar{x})\cap I^{\varphi ^t_{\mathrm{FB}}}(x)}\xi _l\beta _l\nabla H_l(x) \end{aligned}$$

since we have \(I^{\varphi ^t_{\mathrm{FB}}}(x)\subset \mathcal {I}(\bar{x})\) by choice of U. The positive-linear independence of the union in (14) and \(I^g(x)\subset I^g(\bar{x})\) yield \(\lambda _i=0\) (\(i\in I^g(x)\)), \(\rho _j=0\) (\(j\in \mathcal {P}\)), \(\xi _l=0\) (\(l\in (I^{0+}(\bar{x})\cup I^{+0}(\bar{x}))\cap I^{\varphi ^t_{\mathrm{FB}}}(x)\)), \(\xi _l\alpha _l=0\) (\(l\in I^{00}(\bar{x})\cap I^{\varphi ^t_{\mathrm{FB}}}(x)\)), as well as \(\xi _l\beta _l=0\) (\(l\in I^{00}(\bar{x})\cap I^{\varphi ^t_{\mathrm{FB}}}(x)\)). Noting that \(\alpha _l\) or \(\beta _l\) is positive, we can infer \(\xi _l=0\) for all indices \(l\in I^{00}(\bar{x})\cap I^{\varphi ^t_{\mathrm{FB}}}(x)\), i.e. \(\xi _l=0\) holds for all \(l\in I^{\varphi ^t_{\mathrm{FB}}}(x)\). Consequently, MFCQ holds for (\(\hbox {P}(\varphi ^t_{\mathrm{FB}})\)) at x. \(\square\)

5.2 The offset Kanzow–Schwartz function

Now, we investigate the proposed relaxation scheme in terms of the function \(\varphi ^t_{\mathrm{KS}}\). Recalling that the problems (\(\hbox {P}(\varphi ^{t}_{\mathrm{FB}})\)) and (\(\hbox {P}(\varphi ^{t}_{\mathrm{KS}})\)) possess the same feasible sets, it is reasonable to believe that the qualitative properties of this method do not significantly differ from the relaxation approach involving the smoothed Fischer–Burmeister function \(\varphi ^t_{\mathrm{FB}}\). In order to check this, let us review Example 5.1 first. Indeed, it is not difficult to see that \(x(t):=(\sqrt{t},\sqrt{t})\) is a KKT point of the program (\(\hbox {P}(\varphi ^{t}_{\mathrm{KS}})\)) associated with (13) for each \(t\in (0,1]\) again. Since \(x(t)\rightarrow \bar{x}\) holds as \(t\downarrow 0\), and \(\bar{x}:=(0,0)\) is a W-stationary point of (13) at which MPOC-LICQ holds, the above conjecture seems to be confirmed.

Fix \(t>0\) and some \(\tilde{x}\in X(\varphi ^t_{\mathrm{KS}})\). Assume that \(I^{\varphi ^t_{\mathrm{KS}}}(\tilde{x})\) is nonempty. Then, for each \(l\in I^{\varphi ^t_{\mathrm{KS}}}(\tilde{x})\), the mapping \(x\mapsto \varphi ^t_{\mathrm{KS}}(G_l(x),H_l(x))\) behaves like a bilinear function of \(G_l(x)\) and \(H_l(x)\) for arguments from a neighborhood of \(\tilde{x}\) since \(G_l\) and \(H_l\) are continuous functions. Particularly, the relaxed subproblem (\(\hbox {P}(\varphi ^{t}_{\mathrm{KS}})\)) corresponds to a classical Scholtes-type relaxation locally around \(\tilde{x}\) in the sense of the particular underlying nonlinear description of the relaxed feasible set. That is why the proofs of the upcoming results, which characterize the convergence behavior of the suggested relaxation scheme as well as the regularity of the associated nonlinear subproblems, directly follow by reprising the arguments used in [21, Section 3.1] and [24, Section 3] in the context of MPCCs and MPSCs, respectively, up to some minor problem-tailored adjustments.

Theorem 5.2

Let \(\{t_k\}_{k\in \mathbb {N}}\) be a sequence of positive relaxation parameters converging to zero. For each \(k\in \mathbb {N}\), let \(x_k\in X(\varphi ^{t_k}_{\mathrm{KS}})\) be a KKT point of (\(\hbox {P}(\varphi ^{t_k}_{\mathrm{KS}})\)). Suppose that \(\{x_k\}_{k\in \mathbb {N}}\) converges to some point \(\bar{x}\in X\) where MPOC-MFCQ holds. Then, \(\bar{x}\) is a W-stationary point of (MPOC).

Proposition 5.2

Let \(\bar{x}\in X\) be a feasible point of (MPOC) where MPOC-MFCQ is valid. Then, there exists a neighborhood \(U\subset \mathbb {R}^n\) of \(\bar{x}\) such that MFCQ holds for (\(\hbox {P}(\varphi ^t_{\mathrm{KS}})\)) at all points from \(X(\varphi ^t_{\mathrm{KS}})\cap U\) for all \(t>0\).

Due to the above results, the qualitative properties of the proposed relaxation scheme do not depend on the actual choice of the underlying function from \(\{\varphi ^t_{\mathrm{FB}},\varphi ^t_{\mathrm{KS}}\}\). However, we note that the nonlinearities hidden within these two functions are essentially different, which is why we investigate the quantitative behavior of the two resulting relaxation methods in numerical practice, see Sect. 6.

6 Numerical results

In this section, we are going to compare the solution approaches discussed in Sects. 3–5 by means of different instances of or-constrained programming. Particularly, we are going to investigate the direct replacement of the or-constraints by means of nonlinear inequalities induced by the Kanzow–Schwartz function \(\varphi _{\mathrm{KS}}\), see Sect. 3, the reformulation of the or-constrained program as an MPSC or MPCC which then is treated with the aid of suitable Scholtes-type relaxation methods, see Sect. 4, and the direct Scholtes-type relaxation approach based on the smoothed Fischer–Burmeister function \(\varphi _{\mathrm{FB}}^t\) and the offset Kanzow–Schwartz function \(\varphi ^t_{\mathrm{KS}}\) discussed in Sect. 5. The following problems, which are chosen from model classes with significant practical relevance, will serve as the benchmark for our numerical comparison:

  1. a nonlinear disjunctive program in the sense of Balas, see Sect. 6.2.1,

  2. an optimization problem where the domains of the underlying variables possess gaps, see Sect. 6.2.2, and

  3. an or-constrained optimal control problem of the non-stationary heat equation in two spatial dimensions, see Sect. 6.2.3.

For each of these examples, we first discuss the underlying problem structure. Afterwards, the numerical results are presented. In order to provide a reasonable quantitative comparison of the five discussed computational methods, we make use of performance profiles, see [9], based on computed function values. Note that we do not use time as a performance index here since the transformation of the or-constrained program into an MPSC or MPCC comes at the price of several slack variables and additional constraints whose respective number depends linearly on the number of original or-constraints. Thus, we can expect that the other approaches would clearly outrun these two methods w.r.t. computation time. In order to guarantee that the nonlinear surrogate programs which arise from the different solution methods we want to compare can be tackled with the same NLP solver, we decided to use only the smooth Kanzow–Schwartz NCP-function for the direct reformulation of the or-constraints, cf. Sect. 3.

We would like to mention that the use of other relaxation methods for MPSCs and MPCCs, see [21, 24], is possible when using the approach from Sect. 4. However, in order to guarantee comparability of numerical results, one should try to focus on relaxation ideas which apply to all the problem classes (MPOC), (MPSC), and (MPCC) at the same time. Naturally, this seems to restrict us to generalizations of so-called one-sided relaxation schemes (MPCC-terminology) like Scholtes-type methods, the relaxation approach due to Kanzow and Schwartz, see [25], and the local relaxation approach due to Steffensen and Ulbrich, see [35]. We already noted at the beginning of Sect. 5 that the relaxation approach of Kanzow and Schwartz only shifts the kink in the set O from (1) and, thus, does not solve the combinatorial issues connected with the feasible set of (MPOC). On the other hand, it has been reported in [24, Section 6.2.3] that local relaxation might not be enough to solve particular instances of switching-constrained programming successfully, and this rules out the relaxation scheme of Steffensen and Ulbrich. We, thus, only focus on Scholtes-type relaxation methods which were shown to be very robust in terms of disjunctive programming, see e.g. [21, 24].

6.1 Implementation

The subsequently described numerical experiments were carried out using MATLAB R2018a. For our comparison, we exploited the five algorithms stated below:

IPOPT: the IPOPT interior-point algorithm from [39] is applied to the NLP (\(\hbox {MPOC}(\varphi _{\mathrm{KS}})\)) which results from (MPOC) by reformulating all or-constraints with the aid of the smooth Kanzow–Schwartz function, see Sect. 3,

ScholtesSC: the relaxation method of Scholtes is applied to the switching-constrained reformulation (SC-MPOC) of (MPOC), see Sect. 4.1,

ScholtesCC: the relaxation method of Scholtes is applied to the complementarity-constrained reformulation (CC-MPOC) of (MPOC), see Sect. 4.2,

smoothedFB: the direct relaxation method from Sect. 5 using the smoothed Fischer–Burmeister function, and

offsetKS: the direct relaxation method from Sect. 5 which exploits the offset Kanzow–Schwartz function.

Each of these algorithms is called with user-supplied gradients of the objective and constraint functions. We use a global stopping tolerance of \(10^{-4}\), which serves as IPOPT's stopping tolerance in case of algorithm IPOPT and bounds the maximum or-constraint violation

$$\begin{aligned} \max \{\max \{0,\min \{G_l(x),H_l(x)\}\}\,|\,l\in \mathcal {Q}\} \end{aligned}$$

in case of the other four methods. In order to allow a comparison of the computational results, the relaxed subproblems arising in the methods ScholtesSC, ScholtesCC, smoothedFB, and offsetKS are solved with IPOPT as well. Here, the internal stopping tolerance of IPOPT is set to \(10^{-6}\). For all these relaxation approaches, the relaxation parameter is chosen to be \(t_k:=0.01^k\) for each \(k\in \mathbb {N}\), and the algorithm is automatically terminated whenever \(t_k\) drops below \(10^{-8}\).
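For illustration, the outer loop common to the latter four methods can be sketched as follows. Here, solve_relaxed is a placeholder for whichever routine returns an approximate KKT point of the respective relaxed subproblem (e.g. an IPOPT call), and G and H are assumed to evaluate the or-constraint functions componentwise; the sketch is not the actual implementation.

```python
# Illustrative driver for the relaxation methods: t_k = 0.01^k, stop once the
# maximum or-constraint violation drops below 1e-4 or t_k drops below 1e-8.
import numpy as np

def or_violation(x, G, H):
    """max_l max{0, min{G_l(x), H_l(x)}}, the stopping quantity from above."""
    return float(np.max(np.maximum(0.0, np.minimum(G(x), H(x)))))

def relaxation_driver(x0, G, H, solve_relaxed, tol=1e-4, t_min=1e-8):
    x, k = x0, 1
    while or_violation(x, G, H) > tol:
        t = 0.01 ** k
        if t < t_min:            # give up once t_k drops below the threshold
            break
        x = solve_relaxed(t, x)  # warm start with the previous iterate
        k += 1
    return x
```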

Since we aim for a fair quantitative comparison of these five methods, we cannot rely on computation time because, by construction, the numerical effort of these approaches is essentially different. Instead, we focus our attention on the comparison of computed function values (w.r.t. different starting points) with the globally optimal function value in order to classify the robustness of the suggested methods. In light of the fact that or-constrained programs are likely to possess a substantial number of local minimizers which are not globally optimal, this is a reasonable approach. Here, we make use of the quantity

$$\begin{aligned} Q_\delta (x^a_s):= {\left\{ \begin{array}{ll} f(x^a_s)-f_{\mathrm{min}}+\delta &{}\text {if }x^a_s\text { is feasible within tolerance},\\ +\infty &{}\text {otherwise} \end{array}\right. } \end{aligned}$$
(15)

as the underlying metric for the resulting performance profiles. Above, we used \(x^a_s\) in order to denote the final iterate of a run of algorithm \(a\in \mathcal {A}\) with

$$\begin{aligned} \mathcal {A}:=\{\mathbf{IPOPT},\mathbf{ScholtesSC},\mathbf{ScholtesCC},\mathbf{smoothedFB},\mathbf{offsetKS}\} \end{aligned}$$

for the starting point associated with the index \(s\in \mathcal {S}\). If unknown, a reasonable approximation of the globally minimal function value \(f_{\mathrm{min}}\) needs to be determined. Finally, \(\delta \ge 0\) is an additional parameter which reduces sensitivity to numerical accuracy. Using the metric \(Q_\delta\) defined above, the resulting performance ratio is given by

$$\begin{aligned} \forall s\in \mathcal {S}\,\forall a\in \mathcal {A}:\quad r_{s,a}:=\frac{Q_\delta (x^a_s)}{\min \{Q_\delta (x^\alpha _s)\,|\,\alpha \in \mathcal {A}\}}. \end{aligned}$$

In our performance profiles, we plot the illustrative parts of the curves \(\rho _a:[1,\infty )\rightarrow [0,1]\) given by

$$\begin{aligned} \forall \tau \in [1,\infty ):\quad \rho _a(\tau ):=\frac{\left| \{s\in \mathcal {S}\,|\,r_{s,a}\le \tau \}\right| }{\left| \mathcal {S}\right| } \end{aligned}$$

for each algorithm \(a\in \mathcal {A}\) where \(\left| \cdot \right|\) denotes the cardinality of a set. Thus, \(\rho _a(\tau )\) may be interpreted as the probability that the final iterate produced by algorithm a has a function value which is not worse than \(\tau\)-times the best computed function value w.r.t. all algorithms from \(\mathcal {A}\).
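The following snippet sketches this computation. It assumes that fvals contains the final objective values \(f(x^a_s)\) and feas the corresponding feasibility flags, with one row per starting point and one column per algorithm; all names and the evaluation grid for \(\tau\) are illustrative.

```python
# Illustrative computation of the performance profiles rho_a based on (15).
import numpy as np

def performance_profiles(fvals, feas, f_min, delta, taus):
    Q = np.where(feas, fvals - f_min + delta, np.inf)   # metric Q_delta from (15)
    # ratios r_{s,a}; a positive delta also guards against division by (near) zero,
    # and rows where no algorithm terminated feasibly would yield NaN ratios
    r = Q / np.min(Q, axis=1, keepdims=True)
    # rho_a(tau): fraction of starting points s with r_{s,a} <= tau
    return np.array([[np.mean(r[:, a] <= tau) for tau in taus]
                     for a in range(fvals.shape[1])])

# usage sketch:
# rho = performance_profiles(fvals, feas, f_min, 1.0, np.linspace(1.0, 5.0, 200))
```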

6.2 Numerical experiments

In this section, we present the numerical results associated with three prominent instances of or-constrained programming.

6.2.1 Disjunctive programming

Let us define sets \(X_1,X_2\subset \mathbb {R}^3\) as stated below:

$$\begin{aligned} X_1&:=\{x\in \mathbb {R}^3\,|\,x_1\ge 4,\,x_1+(x_2-2)^2+(x_3+2)^2\ge 5\},\\ X_2&:=\{x\in \mathbb {R}^3\,|\,x_1^2+x_2^2\le x_3,\,(x_1-1)^2+x_2^2+x_3\ge 1,\,x_2\le 0\}. \end{aligned}$$

Now, we consider the nonlinear program

$$\begin{aligned} (x_1-1)^2+(x_2-2)^2+(x_3+2)^2\,&\rightarrow \,\min \\ x\,&\in \,X_1\cup X_2 \end{aligned}$$

which can be interpreted as an instance of disjunctive programming in the sense of Balas, see [1]. One can easily check that its global minimizer is given by \(\bar{x}:=(0,0,0)\) which possesses the minimal function value \(f_{\mathrm{min}}=9\). Note that this program possesses additional local minimizers, which are not globally optimal, at all points from the set

$$\begin{aligned} \{(1,0,1)\}\cup \{(4,x_2,x_3)\in \mathbb {R}^3\,|\,(x_2-2)^2+(x_3+2)^2=1\}. \end{aligned}$$

Introducing two slack variables \(u,v\in \mathbb {R}\), we can equivalently restate the program of interest as the or-constrained problem

$$\begin{aligned} (x_1-1)^2+(x_2-2)^2+(x_3+2)^2\,&\rightarrow \,\min \limits _{x,u,v}\nonumber \\ 4-x_1-u\,&\le \,0\nonumber \\ 5-x_1-(x_2-2)^2-(x_3+2)^2-u\,&\le \,0\nonumber \\ x_1^2+x_2^2-x_3-v\,&\le \,0\nonumber \\ 1-(x_1-1)^2-x_2^2-x_3-v\,&\le \,0\nonumber \\ x_2-v\,&\le \,0\nonumber \\ u\,\le \,0\,\vee \,v\,&\le \,0 \end{aligned}$$
(16)

which can be processed by our five algorithms. We use 500 starting points whose x-components are randomly chosen from [0, 4] while u and v are random scalars from \([-1,0]\). The resulting performance profile for \(\delta :=1\) can be found in Fig. 5. As we can see, the relaxation methods reliably compute the best function value and identify the actual global minimizer in most of the cases. There is no significant difference between the direct relaxation methods and those which are applied to surrogate reformulations of (16). None of these algorithms outruns IPOPT, whose performance is also quite good since it finds the best function value in more than \(85\%\) of the cases. This is, however, not surprising since problem (16) possesses just one or-constraint.

Fig. 5 Performance profile for the disjunctive program from Sect. 6.2.1

6.2.2 Optimization problems with gap domains

In contrast to standard box-constrained programming, it may happen that variables need to be chosen such that they do not belong to a critical interval. One may think of physical quantities needing to stay away from given critical values or situations in production planning where a certain amount of products has to be bought or sold. In order to model such constraints, we fix vectors \(\ell ,u\in \mathbb {R}^n\) satisfying \(\ell <u\) and consider the system

$$\begin{aligned} x_l\le \ell _l\,\vee \,x_l\ge u_l\qquad l=1,\ldots ,n. \end{aligned}$$
(17)

These constraints induce so-called gap domains which are heavily disconnected. Here, the feasible set crumbles into \(2^n\) branches. Consequently, the underlying optimization problem is likely to possess several local minimizers which are not globally optimal. We note that due to \(\ell <u\), the biactive set \(I^{00}(x)\) is empty for all feasible points x of the underlying or-constrained optimization problems. This means that all the introduced stationarity notions, see Definition 2.1, coincide for programs with or-constraints of type (17).
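In the format of (MPOC), the constraints (17) can be encoded, for instance, via the affine functions

$$\begin{aligned} G_l(x):=x_l-\ell _l,\qquad H_l(x):=u_l-x_l,\qquad l=1,\ldots ,n, \end{aligned}$$

so that \(G_l(x)\le 0\,\vee \,H_l(x)\le 0\) recovers (17). Since \(\ell _l<u_l\), the functions \(G_l\) and \(H_l\) cannot vanish simultaneously, which explains the empty biactive set mentioned above.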

For a random vector \(a\in [0,1]^{50}\) sorted in ascending order with at least 15 entries which are greater than 0.5, we consider the optimization problem

$$\begin{aligned} \begin{aligned} \sum \nolimits _{l=1}^{50}(x_l-a_l)^2&\,\rightarrow \,\min&\\ \sum \nolimits _{l=1}^{50}x_l&\,\le \,15&\\ x_l\,\le \,0\,\vee \,x_l&\,\ge \,1&\qquad&l=1,\ldots ,50 \end{aligned} \end{aligned}$$
(18)

whose variables possess gap domains. By construction, its globally minimal function value is given by

$$\begin{aligned} f_{\mathrm{min}}=\sum \nolimits _{l=1}^{35}a_l^2+\sum \nolimits _{l=36}^{50}(1-a_l)^2. \end{aligned}$$

For our experiments, we challenged our algorithms with 500 randomly chosen starting points from \([-1,2]^{50}\). Two resulting performance profiles for \(\delta :=1\) with differently scaled \(\tau\)-axes can be found in Fig. 6.

Fig. 6 Performance profiles for the optimization problem with gap domains from Sect. 6.2.2

Noting that the feasible set of (18) is disconnected, it is not surprising that the relaxation methods clearly outrun IPOPT which generally gets stuck in the branch of (18) associated with the respective starting point. A direct relaxation of the program by means of smoothedFB or offsetKS does not really solve this issue. For example, one can easily check that the method offsetKS relaxes the gap-constraints to

$$\begin{aligned} x_l(1-x_l)\le t\qquad l=1,\ldots ,n \end{aligned}$$

which, since \(x_l(1-x_l)\le t\) can be rewritten as \((x_l-\tfrac{1}{2})^2\ge \tfrac{1}{4}-t\), is equivalent to the or-constrained system

$$\begin{aligned} x_l\le \tfrac{1}{2}-\sqrt{\tfrac{1}{4}-t}\,\vee \,x_l\ge \tfrac{1}{2}+\sqrt{\tfrac{1}{4}-t}\qquad l=1,\ldots ,n \end{aligned}$$

for each \(t\in [0,\tfrac{1}{4}]\). That means that the method already needs to handle highly disconnected feasible sets for comparatively large relaxation parameters. Similar effects can be observed for smoothedFB. In most situations, both direct relaxation methods compute a particular locally optimal solution which is not the global minimizer of (18), and the performance profiles underline this observation. It is not difficult to see that the MPSC- or MPCC-reformulations of (18) considered in Sect. 4 still possess disconnected feasible sets. However, the associated Scholtes-type relaxation methods ScholtesSC and ScholtesCC seem to be much more stable in numerical practice since they compute the actual global minimizer of (18) in most situations. One reason for this behavior might be the presence of slack variables, which allow some freedom when the nonlinear subproblems are solved.

6.2.3 Or-constrained optimal control

Motivated by the considerations in [24, Section 6.2.2], we want to study the optimal control of the non-stationary heat equation with the aid of two control functions u and v which influence distinct parts \(\varOmega _u\) and \(\varOmega _v\) of the underlying domain \(\varOmega\) over time. Here, we additionally assume that at least one of the controls needs to be nonnegative at each time instance. Such a constraint arises when due to technical restrictions, it is not possible to cool \(\varOmega _u\) and \(\varOmega _v\) at the same time. In terms of this paper, this means that the control functions need to satisfy an or-constraint in pointwise fashion. In order to guarantee that the associated optimal control problem possesses an optimal solution, standard \(L^2\)-regularity of controls is generally not enough since this conservative regularity assumption does not guarantee the weak sequential closedness of the underlying set of feasible controls. Following ideas from [5, 6] where pointwise switching or complementarity constraints are considered, this issue can be solved by considering controls from a first-order Sobolev space.

Fix \(I:=(0,6)\) as well as \(\varOmega :=(-1,1)^2\) and let \(\varGamma\) be the boundary of \(\varOmega\). Furthermore, we set \(\varOmega _u:=(-1,0]\times (-1,1)\) and \(\varOmega _v:=(0,1)\times (-1,1)\). The non-stationary heat equation of our interest is given by

$$\begin{aligned} \begin{aligned} \partial _ty(t,\omega )-\varDelta _\omega y(t,\omega ) -\tfrac{1}{10}\chi _{\varOmega _u}(\omega )u(t)-\tfrac{1}{10}\chi _{\varOmega _v}(\omega )v(t)&\,=\,0&&\hbox {a.e. on }I\times \varOmega&\\ \vec {\mathbf {n}}(\omega )\cdot \nabla _\omega y(t,\omega )&\,=\,0&&\hbox {a.e. on }I\times \varGamma&\\ y(0,\omega )&\,=\,0&&\hbox {a.e. on }\varOmega \end{aligned} \end{aligned}$$
(19)

where \(\chi _A:\varOmega \rightarrow \mathbb {R}\) denotes the characteristic function of the measurable set \(A\subset \varOmega\) which equals 1 on A and vanishes on \(\varOmega \setminus A\). Following classical arguments, see [37] where the Lebesgue and Sobolev spaces of interest are characterized as well, there exists a continuous linear mapping \(S:H^1(I)\times H^1(I)\rightarrow L^2(I;H^1(\varOmega ))\) which assigns to each pair (uv) of controls the uniquely determined (weak) solution y of (19). Let us define the desired state \(y_{\mathrm{d}}:=S(u_{\mathrm{d}},v_{\mathrm{d}})\) where \(u_{\mathrm{d}},v_{\mathrm{d}}\in H^1(I)\) are given by

$$\begin{aligned} \forall t\in I:\quad u_{\mathrm{d}}(t):=-20\,\sin (\pi t/3)\qquad v_{\mathrm{d}}(t):= 10\,\cos (\pi t/2). \end{aligned}$$
(20)

Now, we are in position to state the optimal control problem of our interest below:

$$\begin{aligned} \begin{aligned} \frac{1}{2}\left\| S(u,v)-y_{\mathrm{d}}\right\| _{L^2(I;L^2(\varOmega ))}^2 +\frac{\alpha }{2}\left( \left\| u\right\| _{L^2(I)}^2+\left\| v\right\| _{L^2(I)}^2\right)&\\ +\frac{\beta }{2}\left( \left\| \partial _tu\right\| _{L^2(I)}^2+\left\| \partial _tv\right\| _{L^2(I)}^2\right)&\,\rightarrow \,\min \limits _{u,v}&\\ u(t)\,\ge \,0\,\vee \,v(t)&\,\ge \,0&\qquad&\hbox {a.e. on }I. \end{aligned} \end{aligned}$$
(21)

For our experiments, we choose \(\alpha :=10^{-6}\) and \(\beta :=10^{-5}\). Observe that the pair \((u_{\mathrm{d}},v_{\mathrm{d}})\) is not feasible to (21) since these functions violate the pointwise or-constraint precisely for all those \(t\in I\) satisfying \(1< t< 3\).

In order to tackle (21) with the suggested algorithms, we first need to perform a suitable discretization. Therefore, we tessellate the domain \(\varOmega\) with the aid of the function generateMesh from MATLAB’s PDE toolbox using the tolerance \(h:=10^{-1}\). The time interval I is subdivided into equidistant intervals of width \(\vartheta :=5\cdot 10^{-2}\). Noting that state and control need to possess first-order Sobolev regularity, we use standard piecewise affine and continuous finite elements for spatial and temporal discretization. This leads to a conforming approximation of the \(H^1\)-norm in the objective functional of (21).

This discretization results in a finite-dimensional program of type (MPOC) which possesses 121 simple or-constraints on the discretized control functions and a convex, quadratic objective functional. Thus, this program can be decomposed into \(2^{121}\) convex subproblems which indicates that the overall discretized program possesses a huge number of local minimizers. For our comparison of the suggested numerical methods, we need to identify a reasonable candidate for a global minimizer of the optimal control problem. To this end, we use the following heuristic procedure adapted from [24, Section 6.2.2] to find a coarse upper bound for the globally minimal function value. First, we solve the program exactly for a rough time discretization (we used \(\vartheta :=0.375\)) by computing the (global) minimizers of all \(2^{17}\) resulting convex subproblems and comparing the obtained solutions. Afterwards, we lift the obtained global minimizer to the finer time grid using linear interpolation. The obtained point is used as a starting point for our five algorithms. The best obtained outcome possesses a function value of \(f_{\mathrm{min}}=0.2156\). The resulting controls are depicted in Fig. 7. They are closely related to \(u_{\mathrm{d}}\) and \(v_{\mathrm{d}}\) from (20) except for the time interval (1, 3) where the pointwise or-constraint from (21) leads to significant changes.

Fig. 7 Potential global minimizer (left) and performance profile (right) for the or-constrained optimal control problem from Sect. 6.2.3

For our numerical experiment, we performed algorithmic runs for 500 starting points which were randomly chosen elementwise from \([-10,10]\). The resulting performance profile for \(\delta :=0\) can be found in Fig. 7. As it turns out, the Scholtes-type direct relaxation methods smoothedFB and offsetKS perform much better than the other two relaxation methods ScholtesSC and ScholtesCC. A reason for that might be that due to the transformation to a switching- or complementarity-constrained program, the surrogate problems under consideration in ScholtesSC and ScholtesCC possess many additional slack variables, namely 242, as well as additional inequality constraints, which makes them uncomfortably large. Due to the fact that the feasible set of the discretized or-constrained optimal control problem is strongly connected, the direct method IPOPT keeps up at least with the latter relaxation methods. Another reason for that behavior might be the fact that MPOC-LICQ is valid at all feasible points of the discretized optimal control problem which implies that GCQ holds at all feasible points of the associated surrogate (\(\hbox {MPOC}(\varphi _{\mathrm{KS}})\)), see Lemma 3.1. However, IPOPT cannot challenge the direct relaxation methods smoothedFB and offsetKS which produce points with the best objective value much more frequently. Finally, it should be noted that smoothedFB performs slightly better than offsetKS. This might be caused by the fact that the relaxation via the smoothed Fischer–Burmeister function avoids bilinearities which appear when the Kanzow–Schwartz function is used for that purpose.

6.3 Summary

Our examples indicate that the correct choice of a numerical method for or-constrained optimization problems heavily depends on the underlying problem structure. In situations where only a few or-constraints need to be considered while the resulting feasible set is still connected, there is no significant difference between all the suggested algorithms, see Sect. 6.2.1. On the other hand, optimization problems with gap domains should be transferred into surrogate MPSCs or MPCCs which then should be solved by classical relaxation methods. This procedure turned out to successfully overcome the disconnectedness of the underlying feasible set, see Sect. 6.2.2. Finally, whenever a huge number of simple or-constraints needs to be considered such that the underlying feasible set is still connected, then a direct relaxation of the program seems to be the correct approach since this method regularizes the feasible set while not blowing up the number of variables and constraints, see Sect. 6.2.3.

7 Concluding remarks

In this paper, we discussed three different approaches for the numerical handling of or-constrained optimization problems with the aid of different methods from continuous optimization. First, we investigated the reformulation of or-constraints as (smooth or nonsmooth) inequality constraints using suitable NCP-functions. Second, we transferred the or-constrained optimization problem into a switching- or complementarity-constrained surrogate problem which can be solved numerically with the aid of relaxation methods. The qualitative properties of these transformations were discussed in detail. Third, a direct Scholtes-type relaxation of optimization problems with or-constraints based on the smoothed Fischer–Burmeister function or the offset Kanzow–Schwartz function was suggested and the convergence properties of this approach were investigated. A numerical comparison of all these methods based on different models from or-constrained optimization has been carried out. It turned out that the precise choice of the method heavily depends on the structural properties of the underlying problem’s feasible set. Generally, relaxation methods perform much better than algorithms based on a simple replacement of the or-constraints using NCP-functions.