1 Introduction

Many practical optimization problems are modelled as mixed-integer nonlinear programming problems (MINLPs) involving continuous and discrete variables, and the study of solution algorithms for such problems has been an active area of research over the past decades (cf. [3, 10, 12–16, 21, 22, 25] and the references therein). Suppose that \(f,g_i: {\mathbb {R}}^n\times {\mathbb {R}}^p\rightarrow {\mathbb {R}}\, (i=1,\ldots ,m)\) are nonlinear functions, \(X\) is a nonempty compact convex set in \({\mathbb {R}}^n\) and \(Y\) is a set of discrete variables in \({\mathbb {R}}^p\). The general form of an MINLP is as follows:

$$\begin{aligned} (\hbox {P}) \left\{ \begin{array}{ll} \mathop {\hbox {minimize}}\limits _{x,\,y} &{} f(x, y)\\ \hbox {subject to} &{} g_i(x, y)\le 0, i=1,\ldots ,m,\\ &{} x\in X, y\in Y\ \hbox {discrete variable}. \end{array}\right. \end{aligned}$$
(1.1)

This paper is devoted to a class of convex MINLPs in which the objective and constraint functions \(f,g_i\ (i=1,\ldots ,m)\) are convex but not necessarily differentiable.

The class of convex MINLPs has been extensively studied, and several methods for these problems have been developed over the past decades, including branch-and-bound, generalized Benders decomposition, the extended cutting-plane method, NLP/LP-based branch-and-bound and the outer approximation method (cf. [2, 5–9, 11, 12, 15, 19, 23–25] and the references therein). The extended cutting-plane method was proposed by Westerlund and Pettersson [24] for solving differentiable convex MINLPs. Subsequently, Westerlund and Pettersson [25] extended this method to deal with a more general class of MINLPs involving pseudo-convex functions. It was shown in [25] that an MINLP with a pseudo-convex objective and pseudo-convex constraints can be solved to global optimality by cutting-plane techniques. In 2014, Eronen et al. [6] generalized the extended cutting-plane method to convex nonsmooth MINLPs and provided an ECP algorithm which was proved to converge to a global optimum. Recently they [7] further developed this extended cutting-plane method to handle nonsmooth MINLPs under pseudo-convexity assumptions.

Duran and Grossmann [5] introduced the outer approximation method for a particular class of MINLPs restricted to separable convex differentiable functions rather than general convex differentiable functions of all variables; these separable functions are sums of convex differentiable functions of the continuous variables and linear functions of the discrete variables. Fletcher and Leyffer [8] subsequently extended the outer approximation method to convex MINLPs with convex and continuously differentiable objective and constraint functions, and provided a linear outer approximation algorithm that attains an optimal solution of such an MINLP by solving a finite sequence of relaxed subproblems. This extension is the pioneering work on the outer approximation method in the sense that the discrete variables may enter the problem nonlinearly. In 2008, Bonami et al. [2] also studied outer approximation algorithms for convex and continuously differentiable MINLPs. Recently the authors of [6] and [23] used the outer approximation method to study convex nonsmooth MINLPs and established the resulting algorithms. Differentiability of the functions plays an important role in the construction of the relaxation and is essential for solving the relaxed subproblems efficiently. Since nonsmooth optimization problems defined by non-differentiable functions arise in practice, it is interesting and significant, from both the theoretical and the applied viewpoint, to consider convex non-differentiable MINLPs. Motivated by this, and inspired by [2, 5, 6, 8, 23], we continue the study of convex MINLPs by dropping the differentiability assumption and aim to construct an outer approximation algorithm for solving such problems. The outer approximation method used here follows the lines of [8, 23] and uses the KKT conditions to linearize the objective and constraint functions at different points so as to construct an equivalent MILP reformulation of the problem.

The paper is organized as follows. In Sect. 2, we give some definitions and preliminaries used in this paper. Section 3 contains the equivalent reformulation of the convex MINLP by the outer approximation method and an outer approximation algorithm for finding optimal solutions of this MINLP. The reformulation relies mainly on KKT conditions and projection techniques. The algorithm requires solving a finite sequence of nonlinear programs (feasible and infeasible subproblems) and relaxations of the mixed-integer linear master program. The convergence theorem for the established algorithm is also presented therein. The conclusion of this paper is presented in Sect. 4. Section 5 is an “Appendix” containing the proofs of the main results used to construct the algorithm.

2 Preliminaries

Let \(\Vert \cdot \Vert \) be the norm on \({\mathbb {R}}^n\) and denote the inner product of two elements of \({\mathbb {R}}^n\) by \(\langle \cdot , \cdot \rangle \). Let \({\varOmega }\) be a closed convex set in \({\mathbb {R}}^n\) and \(x\in {\varOmega }\). We denote by \(T({\varOmega }, x)\) the contingent cone of \({\varOmega }\) at \(x\); that is, \(v\in T({\varOmega }, x)\) if and only if there exist a sequence \(\{v_k\}\) in \({\mathbb {R}}^n\) converging to \(v\) and a sequence \(\{t_k\}\) in \((0, +\infty )\) decreasing to \(0\) such that \(x+t_kv_k\in {\varOmega }\) for all \(k\in \mathbb {N}\), where \(\mathbb {N}\) denotes the set of all natural numbers. It is known from [1] that

$$\begin{aligned} T({\varOmega }, x)=cl({\mathbb {R}}_+({\varOmega }-x)) \end{aligned}$$

where \(cl\) denotes the closure.

Let \(N({\varOmega }, x)\) denote the normal cone of \({\varOmega }\) at \(x\), that is

$$\begin{aligned} N({\varOmega }, x):=\{\gamma \in {\mathbb {R}}^n: \langle \gamma , z-x\rangle \le 0\quad \hbox {for all} \ z\in {\varOmega }\}. \end{aligned}$$
(2.1)

It is easy to verify that the normal cone \(N({\varOmega }, x)\) and the contingent cone \(T({\varOmega }, x)\) are mutually polar; that is

$$\begin{aligned} N({\varOmega }, x)=\big (T({\varOmega }, x)\big )^{\circ }:=\big \{\gamma \in {\mathbb {R}}^n: \langle \gamma , v\rangle \le 0 \quad \hbox {for all} \ v\in T({\varOmega }, x)\big \}. \end{aligned}$$

Readers are invited to consult the book [1] for more details on contingent and normal cones.
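As a simple illustration (a standard example, not taken from [1]), take \({\varOmega }=[0,2]\subset {\mathbb {R}}\), the interval that will also serve as the continuous feasible set in Example 3.1 below. Then

$$\begin{aligned} T([0,2], 0)=[0,+\infty ),\quad N([0,2], 0)=(-\infty , 0],\qquad T([0,2], 2)=(-\infty , 0],\quad N([0,2], 2)=[0,+\infty ), \end{aligned}$$

while at any interior point \(x\in (0,2)\) one has \(T([0,2], x)={\mathbb {R}}\) and \(N([0,2], x)=\{0\}\); in each case the two cones are polar to each other.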

Let \(\varphi : {\mathbb {R}}^n\rightarrow {\mathbb {R}}\) be a continuous convex function, \(\bar{x}\in {\mathbb {R}}^n\) and \(h\in {\mathbb {R}}^n\). Recall (cf. [20]) that \(d^+\varphi (\bar{x})(h)\) denotes the directional derivative of \(\varphi \) at \(\bar{x}\) along the direction \(h\) and is defined by

$$\begin{aligned} d^+\varphi (\bar{x})(h):=\lim \limits _{t\rightarrow 0^+}\frac{\varphi (\bar{x}+th)-\varphi (\bar{x})}{t}. \end{aligned}$$

We denote by \(\partial \varphi (\bar{x})\) the subdifferential of \(\varphi \) at \(\bar{x}\), which is defined by

$$\begin{aligned} \partial \varphi (\bar{x}):=\{\alpha \in {\mathbb {R}}^n:\; \langle \alpha , x-\bar{x}\rangle \le \varphi (x)- \varphi (\bar{x})\ \hbox {for all} \ x\in {\mathbb {R}}^n\}. \end{aligned}$$

Each vector in \(\partial \varphi (\bar{x})\) is called a subgradient of \(\varphi \) at \(\bar{x}\). It is known from [20] that \(\alpha \in \partial \varphi (\bar{x})\) if and only if

$$\begin{aligned} \langle \alpha , h\rangle \le d^+\varphi (\bar{x})(h)\quad \hbox {for all} \ h\in {\mathbb {R}}^n. \end{aligned}$$
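A standard one-dimensional illustration of these notions (classical, not taken from [20]) is the absolute value function: for \(\varphi (x)=|x|\) one has

$$\begin{aligned} \partial \varphi (0)=[-1,1] \quad \hbox {and} \quad d^+\varphi (0)(h)=|h|, \end{aligned}$$

and indeed \(\alpha \in [-1,1]\) if and only if \(\alpha h\le |h|\) for all \(h\in {\mathbb {R}}\), in accordance with the characterization above.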

Recall that \(\varphi \) is said to be Gâteaux differentiable at \(\bar{x}\) if there exists \(d\varphi (\bar{x})\in {\mathbb {R}}^n\) such that

$$\begin{aligned} \lim \limits _{t\rightarrow 0^+}\frac{\varphi (\bar{x}+th)-\varphi (\bar{x})}{t}=\langle d\varphi (\bar{x}), h\rangle \quad \hbox {for all} \, h\in {\mathbb {R}}^n \end{aligned}$$
(2.2)

and \(\varphi \) is said to be Fréchet differentiable at \(\bar{x}\) if \(\varphi \) is Gâteaux differentiable there and the limit in (2.2) exists uniformly for \(\Vert h\Vert \le 1\) as \(t\rightarrow 0^+\).

It is known from [20] that \(\varphi \) is Gâteaux differentiable at \(\bar{x}\) if and only if \(\partial \varphi (\bar{x})\) is a singleton. Moreover, Gâteaux differentiability of \(\varphi \) is equivalent to Fréchet differentiability of \(\varphi \), owing to the local Lipschitz property of \(\varphi \) and the compactness of the closed unit ball in \({\mathbb {R}}^n\).

Given a continuous convex function \(\phi : {\mathbb {R}}^n\times {\mathbb {R}}^p\rightarrow {\mathbb {R}}\) and \((\bar{x}, \bar{y})\in {\mathbb {R}}^n\times {\mathbb {R}}^p\), a vector \((\alpha , \beta )\in {\mathbb {R}}^n\times {\mathbb {R}}^p\) is a subgradient of \(\phi \) at \((\bar{x}, \bar{y})\) if and only if

$$\begin{aligned} \phi (x, y)\ge \phi (\bar{x}, \bar{y})+(\alpha , \beta )^T\begin{pmatrix}x-\bar{x}\\ y-\bar{y}\end{pmatrix}\quad \hbox {for all} \ (x, y)\in {\mathbb {R}}^n\times {\mathbb {R}}^p, \end{aligned}$$
(2.3)

where \((\alpha , \beta )^T\) is the transpose of matrix \((\alpha , \beta )\). When \(\bar{y}\) is fixed (resp. \(\bar{x}\) is fixed), the subdifferential of \(\phi (\cdot ,\bar{y})\) (resp. \(\phi (\bar{x},\cdot )\)) at \(\bar{x}\) (resp. \(\bar{y}\)) is the set defined by

$$\begin{aligned}&\partial \phi (\cdot ,\bar{y})(\bar{x}):=\big \{\alpha \in {\mathbb {R}}^n: \phi (x, \bar{y})\ge \phi (\bar{x}, \bar{y})+\langle \alpha , x-\bar{x}\rangle \quad \hbox {for all} \ x\in {\mathbb {R}}^n\big \} \\&\Big ({ resp.}\ \partial \phi (\bar{x},\cdot )(\bar{y}):=\big \{\beta \in {\mathbb {R}}^p: \phi (\bar{x}, y)\ge \phi (\bar{x}, \bar{y})+\langle \beta , y-\bar{y}\rangle \quad \hbox {for all} \ y\in {\mathbb {R}}^p\big \}\Big ). \end{aligned}$$

The following proposition on the subdifferential of convex functions is easy to verify from the definition.

Proposition 2.1

Let \(\phi : {\mathbb {R}}^n\times {\mathbb {R}}^p\rightarrow {\mathbb {R}}\) be a continuous convex function and \((\bar{x}, \bar{y})\in {\mathbb {R}}^n\times {\mathbb {R}}^p\). Then for any \((\alpha , \beta )\in \partial \phi (\bar{x}, \bar{y})\), one has \(\alpha \in \partial \phi (\cdot ,\bar{y})(\bar{x})\) and \(\beta \in \partial \phi (\bar{x},\cdot )(\bar{y})\).

It is natural to ask whether the converse of Proposition 2.1 holds. This question is of interest even for smooth convex functions. Explicitly, it reads as follows:

Given a continuous convex function \(\phi : {\mathbb {R}}^n\times {\mathbb {R}}^p\rightarrow {\mathbb {R}}\) and a vector \(\bar{\alpha }\) in \(\partial \phi (\cdot ,\bar{y})(\bar{x})\), does there exist some vector \(\bar{\beta }\in {\mathbb {R}}^p\) such that \((\bar{\alpha }, \bar{\beta })\in \partial \phi (\bar{x}, \bar{y})\)?

The following two propositions provide an affirmative answer to this question. They will play a key role in the construction of the outer approximation algorithm in the sequel. The first proposition concerns convex and Fréchet differentiable functions.

Proposition 2.2

Let \(\phi :{\mathbb {R}}^n\times {\mathbb {R}}^p\rightarrow {\mathbb {R}}\) be a continuous convex function and \((\bar{x}, \bar{y})\in {\mathbb {R}}^n\times {\mathbb {R}}^p\). Suppose that \(\phi (\cdot ,\bar{y})\) is Fréchet differentiable at \(\bar{x}\) and \(\phi (\bar{x}, \cdot )\) is Fréchet differentiable at \(\bar{y}\). Then \(\phi \) is Fréchet differentiable at \((\bar{x}, \bar{y})\).

Proof

By the Fréchet differentiability of \(\phi (\cdot ,\bar{y})\) and \(\phi (\bar{x}, \cdot )\), one has

$$\begin{aligned} \partial \phi (\cdot ,\bar{y})(\bar{x})=\{\triangledown _x\phi (\bar{x},\bar{y})\}\quad \hbox {and} \quad \partial \phi (\bar{x}, \cdot )(\bar{y})=\{\triangledown _y\phi (\bar{x},\bar{y})\}. \end{aligned}$$

This and Proposition 2.1 imply that \(\partial \phi (\bar{x}, \bar{y})\) is a singleton with

$$\begin{aligned} \partial \phi (\bar{x}, \bar{y})=\{(\triangledown _x \phi (\bar{x},\bar{y}), \triangledown _y\phi (\bar{x},\bar{y}))\}. \end{aligned}$$

Hence \(\phi \) is Gâteaux differentiable at \((\bar{x}, \bar{y})\) and consequently Fréchet differentiable at \((\bar{x}, \bar{y})\). The proof is complete. \(\square \)

Proposition 2.2 may fail for non-convex functions. Consider the function \(\phi \) on \({\mathbb {R}}\times {\mathbb {R}}\) defined by \(\phi (x,y)=\frac{x^2y^2}{(x^2+y^2)^{3/2}}\) if \(x^2+y^2\not =0\) and \(\phi (x,y)=0\) otherwise. Then \(\phi \) is continuous on \({\mathbb {R}}\times {\mathbb {R}}\) and the partial derivatives \(\triangledown _x\phi (0,0)\) and \(\triangledown _y\phi (0,0)\) exist (both equal \(0\)). However, one can verify that \(\phi \) is not differentiable at \((0,0)\).
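This failure of differentiability can be checked numerically. The following small NumPy script (our own sketch; the name phi is ours) confirms that the partial difference quotients at the origin vanish while the difference quotient along the diagonal direction tends to \(1/4\):

import numpy as np

def phi(x, y):
    # phi(x, y) = x^2 y^2 / (x^2 + y^2)^(3/2), with phi(0, 0) = 0
    r2 = x * x + y * y
    return 0.0 if r2 == 0.0 else (x * x * y * y) / r2 ** 1.5

# The partial difference quotients at the origin vanish (phi is zero on both axes) ...
for t in [1e-2, 1e-4, 1e-6]:
    print(phi(t, 0.0) / t, phi(0.0, t) / t)      # both print 0.0

# ... yet along the unit diagonal direction h = (1, 1)/sqrt(2) the quotient
# (phi(t h) - phi(0, 0)) / t tends to 1/4 instead of <(0, 0), h> = 0,
# so phi is not (Gateaux or Frechet) differentiable at (0, 0).
h = np.array([1.0, 1.0]) / np.sqrt(2.0)
for t in [1e-2, 1e-4, 1e-6]:
    print(phi(t * h[0], t * h[1]) / t)           # approaches 0.25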

Proposition 2.3

Let \(\phi :{\mathbb {R}}^n\times {\mathbb {R}}^p\rightarrow {\mathbb {R}}\) be a continuous convex function and \((\bar{x}, \bar{y})\in {\mathbb {R}}^n\times {\mathbb {R}}^p\). Then for any \(\bar{\alpha }\in \partial \phi (\cdot ,\bar{y})(\bar{x})\), there exists \(\bar{\beta }\in {\mathbb {R}}^p\) such that \((\bar{\alpha }, \bar{\beta })\in \partial \phi (\bar{x}, \bar{y})\).

Proof

Let \(F_{\bar{y}}:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\times {\mathbb {R}}^p\) be defined by \(F_{\bar{y}}(x):=(x, \bar{y})\). Then \(\phi (\cdot ,\bar{y})=\phi \circ F_{\bar{y}}\), and it is easy to verify that \(F_{\bar{y}}\) is differentiable at \(\bar{x}\) and

$$\begin{aligned} \triangledown F_{\bar{y}}(\bar{x})(h)=(h, 0)\in {\mathbb {R}}^n\times {\mathbb {R}}^p \end{aligned}$$
(2.4)

holds for all \(h\in {\mathbb {R}}^n\). Let \(\bar{\alpha }\in \partial \phi (\cdot ,\bar{y})(\bar{x})\). We first prove that

$$\begin{aligned} \bar{\alpha }\in \triangledown F_{\bar{y}}(\bar{x})^*(\partial \phi (\bar{x}, \bar{y})) \end{aligned}$$
(2.5)

where \(\triangledown F_{\bar{y}}(\bar{x})^*\) is the conjugate operator of \(\triangledown F_{\bar{y}}(\bar{x})\).

Since \(\phi \) is continuous at \((\bar{x}, \bar{y})\), the subdifferential \(\partial \phi (\bar{x}, \bar{y})\) is a nonempty, convex and compact set by [20, Proposition 1.11], and therefore \(\triangledown F_{\bar{y}}(\bar{x})^*(\partial \phi (\bar{x}, \bar{y}))\) is convex and compact since \(\triangledown F_{\bar{y}}(\bar{x})^*\) is continuous.

Suppose to the contrary that \(\bar{\alpha }\not \in \triangledown F_{\bar{y}}(\bar{x})^*(\partial \phi (\bar{x}, \bar{y}))\). By the separation theorem, there exists \(\bar{u}\in {\mathbb {R}}^n\) with \(\Vert \bar{u}\Vert =1\) such that

$$\begin{aligned} \langle \bar{\alpha }, \bar{u}\rangle&> \max \{\langle \triangledown F_{\bar{y}}(\bar{x})^*(\alpha , \beta ), \bar{u}\rangle : (\alpha , \beta )\in \partial \phi (\bar{x}, \bar{y})\}\\&= \max \{\langle (\alpha , \beta ), \triangledown F_{\bar{y}}(\bar{x})(\bar{u})\rangle : (\alpha , \beta )\in \partial \phi (\bar{x}, \bar{y})\}. \end{aligned}$$

This and (2.4) imply that

$$\begin{aligned} \langle \bar{\alpha }, \bar{u}\rangle >\max \{\langle (\alpha , \beta ), (\bar{u}, 0)\rangle : (\alpha , \beta )\in \partial \phi (\bar{x}, \bar{y})\}. \end{aligned}$$
(2.6)

Noting that \(\bar{\alpha }\in \partial \phi (\cdot ,\bar{y})(\bar{x})\) and \(\phi \) is a continuous convex function on \({\mathbb {R}}^n\times {\mathbb {R}}^p \), it follows from [20, Proposition 2.24] and (2.6) that

$$\begin{aligned} d^+\phi (\cdot ,\bar{y})(\bar{x})(\bar{u})\ge \langle \bar{\alpha }, \bar{u}\rangle >d^+\phi (\bar{x}, \bar{y})(\bar{u}, 0)=d^+\phi (\cdot ,\bar{y})(\bar{x})(\bar{u}), \end{aligned}$$

which is a contradiction. Thus (2.5) holds.

By virtue of (2.5), there exists \((\hat{\alpha }, \bar{\beta })\in \partial \phi (\bar{x},\bar{y})\) such that \(\bar{\alpha }=\triangledown F_{\bar{y}}(\bar{x})^*(\hat{\alpha }, \bar{\beta })\). It suffices to prove that \(\bar{\alpha }=\hat{\alpha }\).

For any \(h\in {\mathbb {R}}^n\), by using (2.4), one has

$$\begin{aligned} \langle \bar{\alpha }, h\rangle =\langle \triangledown F_{\bar{y}}(\bar{x})^*(\hat{\alpha }, \bar{\beta }), h\rangle =\langle (\hat{\alpha }, \bar{\beta }), \triangledown F_{\bar{y}}(\bar{x})(h)\rangle =\langle (\hat{\alpha }, \bar{\beta }), (h, 0)\rangle =\langle \hat{\alpha }, h\rangle . \end{aligned}$$

This means that \(\bar{\alpha }=\hat{\alpha }\). The proof is complete. \(\square \)
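To illustrate Proposition 2.3, take \(\phi (x,y)=|x|+|y|\) on \({\mathbb {R}}\times {\mathbb {R}}\) and \((\bar{x},\bar{y})=(0,0)\) (a simple example of our own choosing). Then

$$\begin{aligned} \partial \phi (\cdot ,0)(0)=[-1,1]\quad \hbox {and} \quad \partial \phi (0, 0)=[-1,1]\times [-1,1], \end{aligned}$$

so every \(\bar{\alpha }\in \partial \phi (\cdot ,0)(0)\) can indeed be completed to a subgradient \((\bar{\alpha }, \bar{\beta })\in \partial \phi (0,0)\), here with any \(\bar{\beta }\in [-1,1]\).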

The following proposition concerns the subdifferential of the maximum of a convex function and zero; it is taken from [26, Theorem 2.4.18] and will be used later in our analysis.

Proposition 2.4

Let \(\varphi : {\mathbb {R}}^n\rightarrow {\mathbb {R}}\) be a convex and continuous function. Define \(\varphi _+(x):=\max \{\varphi (x), 0\}\) for all \(x\in {\mathbb {R}}^n\). Then \(\varphi _+\) is a convex continuous function and

$$\begin{aligned} \partial \varphi _+(x)=[0, 1]\partial \varphi (x) \end{aligned}$$
(2.7)

holds for all \(x\in {\mathbb {R}}^n\) with \(\varphi (x)=0\), where \([0, 1]\partial \varphi (x):=\{t\gamma : t\in [0, 1]\ \hbox {and}\ \gamma \in \partial \varphi (x)\}\) for any \(x\in {\mathbb {R}}^n\).
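For instance, for \(\varphi (x)=x\) on \({\mathbb {R}}\) one has \(\varphi (0)=0\), \(\partial \varphi (0)=\{1\}\) and \(\varphi _+(x)=\max \{x,0\}\), so

$$\begin{aligned} \partial \varphi _+(0)=[0, 1]=[0, 1]\cdot \{1\}=[0, 1]\partial \varphi (0), \end{aligned}$$

in agreement with (2.7).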

3 Main results

In this section, we study the convex MINLP problem (1.1) without the differentiability assumption and aim to establish an outer approximation algorithm for solving it.

Let the convex MINLP be defined as in (1.1) and set \(g:=(g_1,\ldots ,g_m)\). When \(f,g_i\ (i=1,\ldots ,m)\) in (1.1) are convex and smooth, it is known from [2, 6, 8] that the main idea of the outer approximation algorithm for convex smooth MINLPs is to linearize the objective function and the constraints at different points so as to build a mixed-integer linear programming (MILP) relaxation of the problem; that is, given some set \(K\) of optimal solutions of suitable subproblems, one can build the following relaxation of problem (P) in (1.1):

$$\begin{aligned} \left\{ \begin{array}{ll} \mathop {\hbox {minimize}} &{} \theta \\ \hbox {subject to} &{} f(x_j, y_j)+ \triangledown f(x_j,y_j)^T\begin{pmatrix}x-x_j\\ y-y_j\end{pmatrix}\le \theta , \\ &{} g(x_j, y_j)+ \triangledown g(x_j,y_j)^T\begin{pmatrix}x-x_j\\ y-y_j\end{pmatrix}\le 0,\\ &{} x\in X, y\in Y\ \hbox {discrete variable}. \end{array} \right. \quad \forall (x_j,y_j)\in K \end{aligned}$$
(3.1)

When dealing with problem (P) in (1.1), subgradients take the place of gradients in the relaxation of (P). Note, however, that replacing the gradients in (3.1) by arbitrary subgradients is not sufficient to obtain an equivalent reformulation of problem (P) (see Example 3.1 below). As in [3, 6], with the help of the KKT conditions we select certain special subgradients, which are then used to reformulate problem (P) as an equivalent MILP master program of the form (3.1).
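The basic convexity fact behind (3.1) is that every linearization built from a (sub)gradient is a global underestimator, so adding such cuts can only relax the feasible region. The following minimal NumPy sketch (our own toy function \(f(x,y)=|x|+(y-1)^2\), not data from the paper) checks this property numerically:

import numpy as np

def f(z):
    # toy nonsmooth convex function f(x, y) = |x| + (y - 1)^2
    x, y = z
    return abs(x) + (y - 1.0) ** 2

def subgradient(z):
    # one valid subgradient of f at z; at x = 0 any value in [-1, 1] works
    x, y = z
    return np.array([np.sign(x) if x != 0.0 else 0.3, 2.0 * (y - 1.0)])

def cut(z_j):
    # affine minorant z -> f(z_j) + <s_j, z - z_j> used in (3.1)
    f_j, s_j = f(z_j), subgradient(z_j)
    return lambda z: f_j + s_j @ (np.asarray(z) - np.asarray(z_j))

ell = cut(np.array([0.0, 2.0]))
rng = np.random.default_rng(0)
samples = rng.uniform(-3.0, 3.0, size=(1000, 2))
assert all(ell(z) <= f(z) + 1e-12 for z in samples)   # the cut never overestimates f

Exactly this underestimation property guarantees that every feasible point of (P) satisfies the linearized constraints, so problems of the form (3.1) are genuine relaxations.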

3.1 An overview of the method

For the equivalent reformulation of problem (P) in (1.1), following the construction in (3.1), we use projection to express problem (P) in terms of the \(y\) variables. For any fixed \(y\in Y\), consider the following subproblem \(P^y\):

$$\begin{aligned} P^y \left\{ \begin{array}{ll} \mathop {\hbox {minimize}}\limits _{x} &{} f(x, y)\\ \hbox {subject to} &{} g(x, y)\le 0, \\ &{} x\in X. \end{array} \right. \end{aligned}$$
(3.2)

If there exists some \(x\in X\) such that \(g(x, y)\le 0\), the subproblem \(P^y\) is said to be feasible; otherwise, \(P^y\) is said to be infeasible.

To ensure the validity of the KKT conditions, we impose the following assumption:

Assumption (A1) For any \(y\in Y\) such that the subproblem \(P^y\) is feasible, the following Slater constraint qualification holds:

$$\begin{aligned} g(\hat{x}, y)<0\ \ { for\ some}\ \hat{x}\in X. \end{aligned}$$
(Slater CQ)

Let

$$\begin{aligned} {\varSigma }:=\{y\in Y: g(x, y)\le 0\quad \hbox {for some}\ x\in X\} \end{aligned}$$
(3.3)

denote the set of all discrete variables \(y\) that produce feasible subproblems. Then the projection of problem (P) onto the \(y\) variables can be written as follows:

$$\begin{aligned} \mathop {\hbox {minimize}}\limits _{y_j\in {\varSigma }}\left\{ \begin{array}{ll} \mathop {\hbox {minimize}}\limits _{x} &{} f(x, y_j)\\ \hbox {subject to} &{} g(x, y_j)\le 0,\\ &{} x\in X. \end{array} \right. \end{aligned}$$
(3.4)

Now let \(y_j\in {\varSigma }\) be fixed. Since \(X\) is compact and \(f,g_i\) are continuous, an optimal solution to subproblem \(P^{y_j}\) exists; let \(x_j\) be such an optimal solution. By assumption (A1) and the KKT conditions, there exists \((\lambda _{j,1},\ldots ,\lambda _{j,m})\in {\mathbb {R}}^m\) such that

$$\begin{aligned} \left\{ \begin{array}{l} 0\in \partial f(\cdot ,y_j)(x_j)+\sum \limits _{i\in I(x_j)}\lambda _{j,i}\partial g_i(\cdot ,y_j)(x_j)+ N(X, x_j),\\ \lambda _{j,i}g_i(x_j,y_j)=0, \ \ i=1,\ldots ,m,\\ \lambda _{j,i}\ge 0, \ \ i=1,\ldots ,m, \end{array} \right. \end{aligned}$$
(3.5)

where

$$\begin{aligned} I(x_j):=\{i\in \{1,\ldots ,m\}: g_i(x_j,y_j)=0\} \end{aligned}$$
(3.6)

is the active constraint set. This means that we can take \(\alpha _j\in \partial f(\cdot ,y_j)(x_j)\) and \(\xi _{j,i}\in \partial g_i(\cdot ,y_j)(x_j) (i=1,\ldots ,m)\) such that

$$\begin{aligned} -\alpha _j-\sum \limits _{i\in I(x_j)}\lambda _{j,i}\xi _{j,i}\in N(X, x_j). \end{aligned}$$
(3.7)

By Proposition 2.3, there exist \(\beta _j\in {\mathbb {R}}^p\) and \(\eta _{j,i}\in {\mathbb {R}}^p (i=1,\ldots ,m)\) such that

$$\begin{aligned} (\alpha _j, \beta _j)\in \partial f(x_j,y_j)\quad \hbox {and} \quad (\xi _{j,i},\eta _{j,i})\in \partial g_i(x_j,y_j), \forall i\in \{1,\ldots ,m\}. \end{aligned}$$
(3.8)

Set \(\xi _j:=(\xi _{j,1},\ldots ,\xi _{j,m})\) and \(\eta _j:=(\eta _{j,1},\ldots ,\eta _{j,m})\). We consider the following linear problem:

$$\begin{aligned} LP(x_j, y_j) \left\{ \begin{array}{ll} \mathop {\hbox {minimize}}\limits _{x} &{} f(x_j, y_j)+(\alpha _j, \beta _j)^T\begin{pmatrix}x-x_j\\ 0\end{pmatrix}\\ \hbox {subject to} &{} g(x_j, y_j)+(\xi _j, \eta _j)^T\begin{pmatrix}x-x_j\\ 0\end{pmatrix}\le 0, \\ &{} x\in X. \end{array} \right. \end{aligned}$$
(3.9)

The following theorem establishes the equivalence between subproblem \(P^{y_j}\) and the linear program \(LP(x_j,y_j)\) of (3.9). The proof of this theorem will be given in Sect. 5.

Theorem 3.1

Let \(LP(x_j, y_j)\) be defined as in (3.9). Then \(x_j\) is an optimal solution of \(LP(x_j, y_j)\) and \(f(x_j, y_j)\) is the optimal value of \(LP(x_j, y_j)\).
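In computations, the multipliers \(\lambda _{j,i}\) in (3.5) are typically available as the dual values reported by the solver of subproblem \(P^{y_j}\). The following minimal CVXPY sketch on a toy one-dimensional instance (our own data: minimize \(|x-2|\) subject to \(x-1\le 0\) and \(X=[0,3]\), with \(y\) fixed and suppressed) illustrates this:

import cvxpy as cp

# Toy subproblem P^{y_j} with y_j fixed and suppressed (our own data):
#   minimize |x - 2|  subject to  x - 1 <= 0,  x in X = [0, 3].
# At the solution x_j = 1 the KKT system (3.5) reads
#   0 in d|.-2|(1) + lambda * d(x - 1)(1) + N([0, 3], 1) = {-1} + lambda + {0},
# so lambda = 1, which the solver reports as the dual value of the constraint.
x = cp.Variable()
con = [x - 1 <= 0, x >= 0, x <= 3]
prob = cp.Problem(cp.Minimize(cp.abs(x - 2)), con)
prob.solve()
print(x.value)            # approximately 1.0, the minimizer x_j
print(con[0].dual_value)  # approximately 1.0, the multiplier lambda_{j,1}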

We denote

$$\begin{aligned} T:=\Big \{j: P^{y_j}\ \hbox {is feasible and } \ x_j\ \hbox {is an optimal solution to}\ P^{y_j}\Big \}. \end{aligned}$$
(3.10)

Let \(j\in T\). By assumption (A1), we can take \((\lambda _{j,1},\ldots ,\lambda _{j,m})\in {\mathbb {R}}^m_+\), \(\alpha _j\in \partial f(\cdot ,y_j)(x_j)\) and \(\xi _{j,i}\in \partial g_i(\cdot ,y_j)(x_j)\) \((i=1,\ldots ,m)\) such that (3.7) holds. Applying Proposition 2.3, there exist \(\beta _j\in {\mathbb {R}}^p\) and \(\eta _{j,i}\in {\mathbb {R}}^p\) \((i=1,\ldots ,m)\) such that (3.8) holds. Then we set

$$\begin{aligned} \xi _j:=(\xi _{j,1},\ldots ,\xi _{j,m}) \quad \hbox {and} \quad \eta _j:=(\eta _{j,1},\ldots ,\eta _{j,m}). \end{aligned}$$

We consider the following MILP:

$$\begin{aligned} (M_{{\varSigma }})\left\{ \begin{array}{ll} \mathop {\hbox {minimize}}\limits _{x,\,y,\,\theta } &{} \theta \\ \hbox {subject to} &{} f(x_j, y_j)+(\alpha _j, \beta _j)^T\begin{pmatrix}x-x_j\\ y-y_j\end{pmatrix}\le \theta \quad \forall j\in T,\ \ \\ &{} g(x_j, y_j)+(\xi _j, \eta _j)^T\begin{pmatrix}x-x_j\\ y-y_j\end{pmatrix}\le 0\quad \forall j\in T,\\ &{} x\in X, y\in {\varSigma }\ \hbox {discrete variable}. \end{array} \right. \end{aligned}$$
(3.11)

By virtue of Theorem 3.1, we obtain the following theorem on the equivalence of problem (P) of (1.1) and MILP \((M_{{\varSigma }})\) of (3.11).

Theorem 3.2

Assume that MINLP problem (P) of (1.1) satisfies assumption (A1). Then MILP \((M_{{\varSigma }})\) of (3.11) is equivalent to problem (P) in the sense that both have the same optimal value and the optimal solution \((\bar{x}, \bar{y})\) of problem (P) corresponds to the optimal solution \(( \bar{x}, \bar{y}, \bar{\theta })\) of \((M_{{\varSigma }})\) with \(\bar{\theta }=f(\bar{x}, \bar{y})\).

To reformulate problem (P) completely, it remains to handle the discrete variables \(y\in Y\backslash {\varSigma }\) by means of supporting hyperplanes. Along the lines of [2, 8], we study infeasible subproblems so as to eliminate those discrete variables that give rise to infeasibility.

Let \(y_l\in Y\backslash {\varSigma }\). Then subproblem \(P^{y_l}\) is infeasible; that is,

$$\begin{aligned} \not \exists x\in X \quad \hbox {satisfying} \ g_i(x, y_l)\le 0 \quad \hbox {for all} \ i=1,\ldots , m. \end{aligned}$$

Let \(J_l\) be a subset of \(\{1,\ldots ,m\}\) such that there is some \(\hat{x}\in X\) satisfying

$$\begin{aligned} g_i(\hat{x}, y_l)< 0,\quad \forall i\in J_l. \end{aligned}$$
(3.12)

Denote by \(J_l^{\bot }:=\{1,\ldots ,m\}\backslash J_l\) the complement of \(J_l\). To detect the infeasibility, we study the following subproblem \(F^{y_l}\):

$$\begin{aligned} F^{y_l}\left\{ \begin{array}{ll} \mathop {\hbox {minimize}}\limits _{x} &{} \sum \limits _{i\in J_l^{\bot }}[g_i(x, y_l)]_+\\ \hbox {subject to} &{} g_i(x, y_l)\le 0 \quad \forall i\in J_l,\\ &{} x\in X, \end{array} \right. \end{aligned}$$
(3.13)

where \([g_{i}(x,y_{l})]_+ := \max \{g_{i}{(x, y_l)},0\}\).

Since \(X\) is compact and \(g_i\ (i=1,\ldots ,m)\) are continuous, an optimal solution to subproblem \(F^{y_l}\) exists; let \(x_l\) be such an optimal solution. To ease the exposition, we divide the set \(J_l^{\bot }\) into three disjoint subsets \(J_l^1, J_l^2\) and \(J_l^3\), defined as

$$\begin{aligned} \left\{ \begin{array}{l} J_l^1:=\{i\in J_l^{\bot }: g_i(x_l,y_l)=0\},\\ J_l^2:=\{i\in J_l^{\bot }: g_i(x_l,y_l)>0\},\\ J_l^3:=\{i\in J_l^{\bot }: g_i(x_l,y_l)<0\}. \end{array} \right. \end{aligned}$$
(3.14)

This means that \(J_l^{\bot }=J_l^1\cup J_l^2\cup J_l^3\) and by using continuity of \(g_i\), one has

$$\begin{aligned} \partial [g_i(\cdot , y_l)]_+(x_l)=\partial g_i(\cdot , y_l)(x_l), \ \forall i\in J_l^2 \quad \hbox {and} \quad \partial [g_i (\cdot , y_l)]_+(x_l)=\{0\},\ \forall i\in J_l^3. \end{aligned}$$

By (3.12) and the KKT conditions, there exist \(\lambda _{l,i}\in {\mathbb {R}}\) for all \(i\in J_l\) such that

$$\begin{aligned} \left\{ \begin{array}{l} 0\in \sum \limits _{i\in J_l^{\bot }}\partial [g_i(\cdot ,y_l)]_+(x_l)+\sum \limits _{i\in J_l}\lambda _{l,i}\partial g_i(\cdot ,y_l)(x_l)+ N(X, x_l),\\ \lambda _{l,i}g_i(x_l,y_l)=0, \ \ \forall i\in J_l,\\ \lambda _{l,i}\ge 0, \ \ \forall i\in J_l. \end{array} \right. \end{aligned}$$
(3.15)

Set \(\lambda _{l,i}:=1\) for all \(i\in J_l^2\) and \(\lambda _{l,i}:=0\) for all \(i\in J_l^3\). Using Proposition 2.4, there exist \(\lambda _{l,i}\in [0,1]\ (\forall i\in J_l^1)\) and \(\xi _{l,i}\in \partial g_i(\cdot ,y_l)(x_l)\ (\forall i\in J_l^{\bot }\cup J_l)\) such that

$$\begin{aligned} -\sum _{i\in J_l^{\bot }\cup J_l}\lambda _{l,i}\xi _{l,i}\in N(X,x_l). \end{aligned}$$
(3.16)

By virtue of Proposition 2.3, there exist \(\eta _{l,i}\in {\mathbb {R}}^p\) such that \((\xi _{l,i}, \eta _{l,i})\in \partial g_i(x_l,y_l)\) for all \(i\in J_l^{\bot }\cup J_l\).

Since subproblem \(P^{y_l}\) is infeasible, the continuity of \(g_i\) and the compactness of \(X\) imply that every optimal solution \(x_l\) of subproblem \(F^{y_l}\) satisfies \(\sum _{i\in J_l^{\bot }}[g_i(x_l, y_l)]_+>0\). This gives the following theorem on subproblem \(F^{y_l}\). The proof is also given in Sect. 5.

Theorem 3.3

The discrete variable \(y_l\in Y\backslash {\varSigma }\) is infeasible for the following constraints:

$$\begin{aligned} \left\{ \begin{array}{l} g_i(x_l, y_l)+(\xi _{l,i}, \eta _{l,i})^T\begin{pmatrix}x-x_l\\ y-y_l\end{pmatrix}\le 0, \ \ \forall i\in J_l^{\bot }\cup J_l,\\ x\in X, y\in Y. \end{array} \right. \end{aligned}$$
(3.17)

It is necessary to ensure that discrete variables that produce infeasible subproblems are also infeasible in the reformulated master program. We denote

$$\begin{aligned} S:=\big \{l: P^{y_l}\ \hbox {is infeasible and}\ x_l\ \hbox {solves}\ F^{y_l} \big \}. \end{aligned}$$
(3.18)

For any \(l\in S\), take \(\lambda _{l,i}\ge 0\) and \(\xi _{l,i}\in \partial g_i(\cdot ,y_l)(x_l) (i=1,\ldots ,m)\) such that (3.16) holds. Take \(\eta _{l,i}\in {\mathbb {R}}^p\) such that \((\xi _{l,i},\eta _{l,i})\in \partial g_i(x_l,y_l)\) for any \(i\in \{1,\ldots ,m\}\) by Proposition 2.3. We set \(\xi _l:=(\xi _{l,1},\ldots ,\xi _{l,m})\) and \(\eta _l:=(\eta _{l,1},\ldots ,\eta _{l,m})\). Then by using Theorem 3.3, we have the following theorem which shows how to eliminate those discrete variables giving rise to infeasible subproblems.

Theorem 3.4

For any \(l\in S\), let \((\xi _l, \eta _l)\) be defined as above. Then the following constraints

$$\begin{aligned} \left\{ \begin{array}{l} g(x_l, y_l)+(\xi _{l}, \eta _{l})^T\begin{pmatrix}x-x_l\\ y-y_l\end{pmatrix}\le 0, \ \ \forall l\in S,\\ x\in X, y\in Y \end{array} \right. \end{aligned}$$
(3.19)

exclude all discrete variables \(y_l\in Y\) for which subproblem \(P^{y_l}\) is infeasible.

It follows from Theorem 3.4 that we can add the linearizations obtained from \(F^{y_l}\) whenever subproblem \(P^{y_l}\) is infeasible, so as to correctly represent the constraint \(y\in {\varSigma }\) of (3.3). This gives rise to the MILP master program (MP), which is equivalent to MINLP problem (P) in (1.1) and is used to reformulate problem (P).

Let \(T\) and \(S\) be defined as (3.10) and (3.18), respectively. For any \(j\in T\), by assumption (A1), we can take \(\lambda _{j,i}\ge 0\) \((i=1,\ldots ,m)\), \(\alpha _j\in \partial f(\cdot ,y_j)(x_j)\) and \(\xi _{j,i}\in \partial g_i(\cdot ,y_j)(x_j) (i=1,\ldots ,m)\) such that (3.7) holds, and by Proposition 2.3, we take \(\beta _j\in {\mathbb {R}}^p\) and \(\eta _{j,i}\in {\mathbb {R}}^p (i=1,\ldots ,m)\) such that (3.8) holds. We set

$$\begin{aligned} \xi _j:=(\xi _{j,1},\ldots ,\xi _{j,m}) \quad \hbox {and} \quad \eta _j:=(\eta _{j,1},\ldots ,\eta _{j,m}). \end{aligned}$$

For any \(l\in S\), we take \(\lambda _{l,i}\ge 0\) and \(\xi _{l,i}\in \partial g_i(\cdot ,y_l)(x_l) (i=1,\ldots ,m)\) such that (3.16) holds and by Proposition 2.3, we take \(\eta _{l,i}\in {\mathbb {R}}^p\) such that \((\xi _{l,i},\eta _{l,i})\in \partial g_i(x_l,y_l)\). Set

$$\begin{aligned} \xi _l:=(\xi _{l,1},\ldots ,\xi _{l,m}) \quad \hbox {and} \quad \eta _l:=(\eta _{l,1},\ldots ,\eta _{l,m}). \end{aligned}$$

The MILP master problem (MP) is given as follows:

$$\begin{aligned} \hbox {(MP)}\left\{ \begin{array}{ll} \mathop {\hbox {minimize}}\limits _{x,\,y,\,\theta } &{} \theta \\ \hbox {subject to} &{} f(x_j, y_j)+(\alpha _j, \beta _j)^T\begin{pmatrix}x-x_j\\ y-y_j\end{pmatrix}\le \theta \quad \forall j\in T,\ \ \\ &{} g(x_j, y_j)+(\xi _j, \eta _j)^T\begin{pmatrix}x-x_j\\ y-y_j\end{pmatrix}\le 0 \quad \forall j\in T,\\ &{} g(x_l, y_l)+(\xi _l, \eta _l)^T\begin{pmatrix}x-x_l\\ y-y_l\end{pmatrix}\le 0 \quad \forall l\in S,\\ &{} x\in X, y\in Y\ \hbox {discrete variable}. \end{array} \right. \end{aligned}$$
(3.20)

The following theorem, immediate from Theorems 3.3 and 3.4, is one of the main results in the procedure of reformulating MINLP problem (P) of (1.1) as the equivalent MILP master program (MP).

Theorem 3.5

Assume that MINLP problem (P) of (1.1) satisfies assumption (A1). Then master program (MP) of (3.20) is equivalent to problem (P) in the sense that both problems have the same optimal value and that the optimal solution \((\bar{x}, \bar{y})\) to problem (P) corresponds to the optimal solution \((\bar{x}, \bar{y}, \bar{\theta })\) to (MP) of (3.20) with \(\bar{\theta }=f(\bar{x}, \bar{y})\).

Remark 3.1

Theorem 3.5 is an extension of the main results given in [8, 23]: it generalizes the outer approximation method, in the sense of equivalently reformulating the convex MINLP problem (P), from the differentiable case to the non-differentiable one. Further, it follows from Theorem 3.5 that all optimal solutions of problem (P) are optimal solutions of the master program (MP). The converse, however, is not necessarily true, since some optimal solutions of (MP) may be infeasible for problem (P). We refer the reader to [2, Example 1] and [23, Remark 3.1] for details.

Theorem 3.5 shows that suitable subgradients obtained from the KKT conditions make it possible to reformulate MINLP problem (P) as an equivalent MILP master program by the outer approximation method. This procedure is no longer valid if arbitrary subgradients are chosen to replace gradients. The following example demonstrates that substituting an arbitrary subgradient for the gradient in the outer approximation method is insufficient for an equivalent reformulation.

Example 3.1

We consider the following convex MINLP problem:

$$\begin{aligned} \left\{ \begin{array}{ll} \mathop {\hbox {minimize}}\limits _{x,\,y} &{} f(x, y):=x+y\\ \hbox {subject to} &{} g_1(x, y):=\max \{-x+y+1, x-y+1\}\le 0,\\ &{} g_2(x,y):=x-y\le 0,\\ &{} x\in [0, 2], \ y\in \{1,2,3\}. \end{array} \right. \end{aligned}$$
(3.21)

One can verify that the convex MINLP (3.21) is infeasible. Now take the initial point \(y_0=1\). Then subproblem \(P^{y_0}\) is infeasible, and we consider the following subproblem \(F^{y_0}\):

$$\begin{aligned} \left\{ \begin{array}{ll} \mathop {\hbox {minimize}}\limits _{x} &{} [g_1(x, y_0)]_+=g_1(x, y_0)\\ \hbox {subject to} &{} g_2(x, y_0)\le 0,\\ &{} x\in [0, 2]. \end{array} \right. \end{aligned}$$
(3.22)

It is easy to verify that \(x_0=1\) is the optimal solution to subproblem \(F^{y_0}\) and

$$\begin{aligned}{}[-1, 1]\times [-1, 1]\subset \partial g_1(x_0, y_0). \end{aligned}$$
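The optimal solution \(x_0=1\) of subproblem \(F^{y_0}\) can also be confirmed numerically; the following minimal CVXPY sketch (our own modelling of (3.22), not part of the original development) returns \(x_0\approx 1\) with optimal value \(1>0\), certifying that \(P^{y_0}\) is infeasible:

import cvxpy as cp

# Feasibility subproblem F^{y_0} of (3.22) with y_0 = 1 (our own CVXPY model).
x = cp.Variable()
y0 = 1.0
g1 = cp.maximum(-x + y0 + 1, x - y0 + 1)    # g_1(x, y_0), convex and nonsmooth
g2 = x - y0                                 # g_2(x, y_0)
prob = cp.Problem(cp.Minimize(cp.pos(g1)),  # J_0 = {2}, so only g_1 enters the objective
                  [g2 <= 0, x >= 0, x <= 2])
prob.solve()
print(x.value, prob.value)   # approximately 1.0 and 1.0: the optimal value is
                             # positive, so P^{y_0} is indeed infeasible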

Now, if we take \((\xi _{0,1},\eta _{0,1})=(1,1)\in \partial g_1(x_0, y_0)\), \((\xi _{0,2},\eta _{0,2})=\triangledown g_2(x_0, y_0)\) and \((\alpha _0, \beta _0)=\triangledown f(x_0, y_0)\), then the MILP is defined as

$$\begin{aligned} \left\{ \begin{array}{ll} \mathop {\hbox {minimize}}\limits _{x,\,y,\,\theta } &{} \theta \\ \hbox {subject to} &{} x+y\le \theta ,\ \ \\ &{} x+y-1\le 0,\\ &{} x-y\le 0,\\ &{} x\in [0, 2], \ y\in \{1,2,3\}. \end{array} \right. \end{aligned}$$
(3.23)

The optimal solution to the MILP (3.23) is \((x,y,\theta )=(0,1,1)\). This means that the outer approximation method for this MINLP may generate an infinite loop between the points \((x_0,y_0)\) and \((0,1)\). Thus the outer approximation method with this choice of subgradient is invalid for the MINLP problem (3.21). Tracking down why the method fails here, one notices that the KKT conditions at \((x_0,y_0)\) do not hold for \((\xi _{0,1},\eta _{0,1})\); that is,

$$\begin{aligned} \not \exists (\lambda _{0,1},\lambda _{0,2})\in {\mathbb {R}}^2_+\ \ \hbox {satisfying} \quad \triangledown f(x_0, y_0)+\lambda _{0,1}(\xi _{0,1},\eta _{0,1})+\lambda _{0,2}\triangledown g_2(x_0, y_0)=0. \end{aligned}$$
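The claimed optimum of the MILP (3.23) can be verified directly by enumerating the three admissible values of \(y\) and solving the remaining linear program in \((x,\theta )\); a small SciPy sketch (our own, purely for checking) is as follows:

from scipy.optimize import linprog

# Check of the MILP (3.23): enumerate y in {1, 2, 3} and solve the remaining
# linear program in the variables z = (x, theta) with SciPy (our own sketch).
best = None
for y in (1, 2, 3):
    c = [0.0, 1.0]                    # minimize theta
    A_ub = [[1.0, -1.0],              # x + y <= theta
            [1.0, 0.0],               # x + y - 1 <= 0
            [1.0, 0.0]]               # x - y <= 0
    b_ub = [-y, 1.0 - y, float(y)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 2.0), (None, None)])
    if res.success and (best is None or res.fun < best[2]):
        best = (res.x[0], y, res.fun)

print(best)   # approximately (0.0, 1, 1.0): the MILP is feasible with optimum
              # theta = 1, although the MINLP (3.21) has no feasible point at all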

3.2 The algorithm

In this subsection, based on solving the MILP master program (MP) in (3.20), we present an outer approximation algorithm for finding an optimal solution of problem (P) in (1.1), along the lines of [5, 8, 23].

At iteration \(k\), the sets \(T\) and \(S\) in the master program (MP) of (3.20) are replaced by the sets \(T^k\) and \(S^k\), respectively, which are defined as

$$\begin{aligned} \left\{ \begin{array}{l} T^k:=\{j\le k: P^{y_j}\ \hbox {is feasible and} \ x_j \ \hbox {solves }\ P^{y_j} \},\\ S^k:=\{l\le k: P^{y_l}\ \hbox {is infeasible and} \ x_l \ \hbox {solves } \ F^{y_l} \}. \end{array} \right. \end{aligned}$$
(3.24)

If \(k\in T^k\) then \(x_k\) solves \(P^{y_k}\) and there exist \((\alpha _k,\beta _k)\in {\mathbb {R}}^n\times {\mathbb {R}}^p\) and \((\xi _{k,i},\eta _{k,i})\in {\mathbb {R}}^n\times {\mathbb {R}}^p\) for all \(i=1,\ldots ,m\) such that

$$\begin{aligned} \left\{ \begin{array}{l} -\alpha _k-\sum \limits _{i=1}^m\lambda _{k,i}\xi _{k,i}\in N(X, x_k)\ \ \hbox {for some} \ (\lambda _{k,1},\ldots ,\lambda _{k,m})\in {\mathbb {R}}^m_+,\\ (\alpha _k,\beta _k)\in \partial f(x_k,y_k),\\ (\xi _{k,i},\eta _{k,i})\in \partial g_i(x_k,y_k),\ \forall i=1,\ldots ,m. \end{array} \right. \end{aligned}$$
(3.25)

If \(k\in S^k\) then \(x_k\) solves \(F^{y_k}\) and there exist \((\xi _{k,i},\eta _{k,i})\in {\mathbb {R}}^n\times {\mathbb {R}}^p\) for all \(i=1,\ldots ,m\) such that

$$\begin{aligned} \left\{ \begin{array}{l} -\sum \limits _{i=1}^m\lambda _{k,i}\xi _{k,i}\in N(X, x_k)\ \ \hbox {for some} \ (\lambda _{k,1},\ldots ,\lambda _{k,m})\in {\mathbb {R}}^m_+,\\ (\xi _{k,i},\eta _{k,i})\in \partial g_i(x_k,y_k),\ \forall i=1,\ldots ,m. \end{array} \right. \end{aligned}$$
(3.26)

Set

$$\begin{aligned} \xi _k:=(\xi _{k,1},\ldots ,\xi _{k,m})\quad \hbox {and} \quad \eta _k:=(\eta _{k,1},\ldots ,\eta _{k,m}). \end{aligned}$$

To prevent a discrete variable assignment \(y_j\) (for any \(j\in T^k\)) from being the solution to the relaxed master program, we define \(UBD^k:=\min \{f(x_j,y_j): j\in T^k\}\) and add the constraint \(\theta <UBD^k\) to the master program. This gives rise to the following relaxed master program \(MP^k\):

$$\begin{aligned} MP^k\left\{ \begin{array}{ll} \mathop {\hbox {minimize}}\limits _{x,\,y,\,\theta } &{} \theta \\ \hbox {subject to} &{} \theta <UBD^k\\ &{} f(x_j, y_j)+(\alpha _j, \beta _j)^T\begin{pmatrix}x-x_j\\ y-y_j\end{pmatrix}\le \theta \quad \forall j\in T^k,\ \ \\ &{} g(x_j, y_j)+(\xi _j, \eta _j)^T\begin{pmatrix}x-x_j\\ y-y_j\end{pmatrix}\le 0\quad \forall j\in T^k,\\ &{} g(x_l, y_l)+(\xi _l, \eta _l)^T\begin{pmatrix}x-x_l\\ y-y_l\end{pmatrix}\le 0\quad \forall l\in S^k,\\ &{} x\in X, y\in Y\ \hbox {discrete variable}. \end{array} \right. \end{aligned}$$
(3.27)

The new discrete variable assignment \(y_{k+1}\) is obtained by solving \(MP^k\), and the whole process is repeated until the relaxed master program becomes infeasible.

We are now in a position to state the outer approximation algorithm for solving problem (P) in detail.

Algorithm 1 The outer approximation algorithm for problem (P) (pseudocode listing not reproduced here).
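Since the pseudocode listing of Algorithm 1 is not reproduced here, the following Python-style schematic (our own reconstruction of the loop structure described in (3.24)–(3.27); the three helper routines are hypothetical placeholders, not part of the paper) indicates how the iterations are organized:

from typing import Optional

# Schematic of the iteration (3.24)-(3.27) described above (our own sketch);
# the three helper routines are hypothetical placeholders, not the paper's code.

def solve_primal_subproblem(y):
    """Solve P^y of (3.2); return (x_j, f(x_j, y), cuts) or None if infeasible."""
    raise NotImplementedError

def solve_feasibility_subproblem(y):
    """Solve F^y of (3.13); return (x_l, cuts) built from the KKT subgradients."""
    raise NotImplementedError

def solve_relaxed_master(cuts, upper_bound) -> Optional[tuple]:
    """Solve MP^k of (3.27) with the constraint theta < upper_bound;
    return (x, y, theta) or None if the relaxed master program is infeasible."""
    raise NotImplementedError

def outer_approximation(y0, max_iter=1000):
    cuts = []                    # accumulated linearizations (cuts from T^k and S^k)
    upper_bound = float("inf")   # UBD^k
    incumbent = None             # best feasible point (x, y) found so far
    y = y0
    for _ in range(max_iter):
        result = solve_primal_subproblem(y)
        if result is not None:                    # current index belongs to T^k
            x, f_val, new_cuts = result
            if f_val < upper_bound:
                upper_bound, incumbent = f_val, (x, y)
        else:                                     # current index belongs to S^k
            x, new_cuts = solve_feasibility_subproblem(y)
        cuts.extend(new_cuts)
        master = solve_relaxed_master(cuts, upper_bound)
        if master is None:       # relaxed master infeasible: stop
            return incumbent     # optimal point of (P), or None if (P) is infeasible
        _, y, _ = master         # new discrete variable assignment y_{k+1}
    return incumbent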

Under the assumption that the discrete set \(Y\) has finite cardinality, the following theorem shows that Algorithm 1 detects the feasibility or infeasibility of problem (P) in (1.1) and that the procedure terminates after a finite number of steps. The proof is also given in Sect. 5.

Theorem 3.6

Suppose that MINLP problem (P) in (1.1) satisfies assumption (A1) and the cardinality of \(Y\) is finite. Then either problem (P) is infeasible or Algorithm 1 terminates in a finite number of steps at an optimal value of problem (P).

4 Conclusions

This paper is mainly devoted to the study of a class of convex MINLPs in which the objective and constraint functions are continuous and non-differentiable. Without the differentiability assumption, subgradients of the objective and constraint functions, which substitute for the gradients used in convex smooth MINLPs, are chosen via the KKT conditions and used to reformulate the MINLP problem as an equivalent mixed-integer linear program. A counterexample shows that subgradients not satisfying the KKT conditions may be invalid for the MILP reformulation, which demonstrates the necessity of the KKT conditions in the equivalent reformulation. By solving a finite sequence of subproblems and relaxed MILP problems, an outer approximation algorithm for this convex MINLP is presented to find an optimal solution of the problem. The finite convergence of the algorithm is also proved. The work of this paper extends that of [5, 8, 23] and generalizes the outer approximation method, in the sense of dealing with convex MINLPs, from the differentiable case to the non-differentiable one.