Piecewise Linearization of Bivariate Nonlinear Functions: Minimizing the Number of Pieces Under a Bounded Approximation Error

Duguet, Aloïs; Ngueveu, Sandra Ulrich

doi:10.1007/978-3-031-18530-4_9

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13526))

Included in the following conference series:

International Symposium on Combinatorial Optimization

443 Accesses
1 Citations

Abstract

This work focuses on the approximation of bivariate functions into piecewise linear ones with a minimal number of pieces and under a bounded approximation error. Applications include the approximation of mixed integer nonlinear optimization problems into mixed integer linear ones that are in general easier to solve. A framework to build dedicated linearization algorithms is introduced, and a comparison to the state of the art heuristics shows their efficiency.

Access provided by Autonomous University of Puebla. Download conference paper PDF

On piecewise linear approximations of bilinear terms: structural comparison of univariate and bivariate mixed-integer programming formulations

Article Open access 03 November 2022

Piecewise Linear Bounding Functions for Univariate Global Optimization

MINLP formulations for continuous piecewise linear function fitting

Article Open access 09 March 2021

Keywords

1 Problem Description and State of the Art

Let ($\mathcal {P}$) be the optimization problem of approximating a nonlinear function f of two variables by a piecewise linear (PWL) function g subject to approximation error constraints on domain $\mathbb {D}$ represented by functions l and u satisfying $l(x,y) \le f(x,y) \le u(x,y)$ for all $(x,y) \in \mathbb {D}$:

Constraints (2) are pointwise approximation constraints, which gives an infinite number of constraints because $\mathbb {D}$ is a continuous domain. Piecewise linear approximation are more commonly minimizing an approximation error with a fixed number of pieces [9], but our objective is to minimize the number of pieces of g so that an MILP formulation of g introduces less binary variables [10]. It can be useful for the approximation of a Mixed Integer Nonlinear Programming problem (MINLP) with a Mixed Integer Linear Programming (MILP) by substituting each nonlinear functions by a PWL one. Moreover, it was shown in [4] that in some cases, MINLP can be solved by applying only techniques from MILP.

Heuristics and exact methods exist to approximate univariate nonlinear functions $(\mathbb {D} \subset \mathbb {R})$ [1, 6, 8]. We are interested in bivariate functions ($\mathbb {D} \subset \mathbb {R}^2$) and to the best of our knowledge, only two papers address this case, with heuristics only:

The authors of [7] propose two heuristics to solve problem $(\mathcal {P})$ with continuous PWL functions. The first heuristic is based on an iterative subdivision of the domain $\mathbb {D}$ into triangles (2-simplexes) until for each subdomain a linear function satisfying the approximation error has been found. The verification that a given linear function fits the subdomain is made by solving a nonlinear programming problem (NLP). The second heuristic can be used if the contribution of the two variables in the function can be separated in two univariate functions (linearly or nonlinearly). In this case, an algorithm finds the two optimal continuous univariate PWL functions and combine them to build a single two-variable PWL function.
In [5] an iterative process attempts to find a continuous PWL function written as a Difference of Convex Continuous PWL functions (DC CPWL) that satisfies the approximation error. The idea is to iteratively solve an MILP relaxation of $(\mathcal {P})$ and then to find lazy constraints to add to the relaxation until a solution found is feasible for $(\mathcal {P})$. The relaxation consists in replacing the infinite number of constraints (2) with a finite number of them.

After the introduction of definitions used throughout the paper in Sect. 2, the three key ideas of a framework for piecewise linearization are detailed in Sect. 3. It is followed by explanations on the instantiation of crucial parts of this framework to create different heuristics in Sect. 4, and finally, numerical experiments comparing the state of the art to our best heuristics are shown in Sect. 5.

2 Definitions

The vocabulary used throughout this work is presented below. They are in part extensions to $\mathbb {D} \subset \mathbb {R}^2$ of definitions from [1] for $\mathbb {D} \subset \mathbb {R}$.

Definition 1

(Polytope). A polytope $\mathcal {P}$ is the convex hull of some points $X_i \in \mathbb {R}^m$: $\mathcal {P} = \{x \in \mathbb {R}^m, x = \sum _i \lambda _i X_i, \sum _i \lambda _i = 1, \lambda _i \ge 0 \; \forall i \}$.

Throughout the paper, a piece will refer to a polytope that composes the graph of a PWL function, and $\llbracket 1,n \rrbracket := \{1,...,n\}$.

Definition 2

(PWL function). Let $\mathbb {D}$ be a compact set of $\mathbb {R}^2$. A function $g: \mathbb {D} \mapsto \mathbb {R}$ is a PWL function with n pieces if and only if there exists $\{a_i\}_{i \in \llbracket 1,n \rrbracket } \subset \mathbb {R}^2$, $\{b_i\}_{i \in \llbracket 1,n \rrbracket } \subset \mathbb {R}$ and a family of polytopes $\{D_i\}_{i \in \llbracket 1,n \rrbracket } \subset \mathbb {R}^2$ such that $\mathbb {D} = \cup _{i \in \llbracket 1,n \rrbracket } \; \mathbb {D}_i$, and for $i \ne j$ the polytopes $\mathbb {D}_i$ and $\mathbb {D}_j$ can only intersect on their boundary and g is defined by $g(x,y) = \min \{ a_i \cdot (x,y)^T + b_i | (x,y) \in \mathbb {D}_i \; \forall i \in \llbracket 1,n \rrbracket \} $, with $\cdot $ denoting the standard scalar product.

Precautions were taken to allow g to be not necessarily continuous at the boundary of a polytope $\mathbb {D}_i$. To prevent g(x, y) to have multiple definitions because $\{D_i\}_{i \in \llbracket 1,n \rrbracket }$ can intersect on their boundary, g(x, y) is chosen as the minimum over all possible definitions, so that it is a lower semicontinuous function.

Definition 3

(Corridor). Let $\mathbb {D}$ be a compact set of $\mathbb {R}^2$. Let $u,l : \mathbb {D} \mapsto \mathbb {R}$ be two continuous functions verifying $u(x,y) > l(x,y), \forall (x,y) \in \mathbb {D}$. Define the set $\mathcal {C}= \{(x,y,z) \in \mathbb {R}^3 | (x,y) \in \mathbb {D}, l(x,y) \le z \le u(x,y)\}$ as the corridor between u and l. If $\mathbb {D} \subset \mathbb {R}^2$, we call the area of $\mathbb {D}$ the domain area of $\mathcal {C}$.

A similar definition can be made for $\mathbb {D}$ an interval [a, b], in which case we call $b-a$ the domain length of $\mathcal {C}$.

Definition 4

(Corridor domain). Let $\mathcal {C}$ be a corridor, $\mathcal {C}\subset \mathbb {R}^3$. The domain of corridor $\mathcal {C}$ noted $D(\mathcal {C})$ is the projection of $\mathcal {C}$ on its two first coordinates, which is also the domain on which u and l need to be defined.

Definition 5

(Piece within a corridor). A polytope $\mathcal {P} \subset \mathbb {R}^3$ is within a corridor $\mathcal {C}$ if and only if there exists a linear function $g:\mathbb {D} \subset D(\mathcal {C})$, such that $\mathcal {P} = \{(x,y,g(x,y)), (x,y) \in \mathbb {D}\}$ and $\mathcal {P} \subset \mathcal {C}$.

Definition 6

(Fitting). A PWL function g fits a corridor $\mathcal {C}$ if and only if the pieces of g (polytopes $\{P\}_{i \in \llbracket 1,n \rrbracket }$ of the graph of g) are within $\mathcal {C}$ and g is defined on the entire domain $D(\mathcal {C})$.

Definition 7

(PWL corridor). A corridor $\mathcal {C}$ is called a PWL corridor if and only if u and l defining $\mathcal {C}$ are both PWL functions.

Definition 8

(Inner corridor). Let $\mathcal {C}_0$ be a corridor between $u_0$ and $l_0$. Let $\mathcal {C}$ be a corridor between u and l. We call $\mathcal {C}$ an inner corridor of $\mathcal {C}_0$ if and only if $D(\mathcal {C}) = D(\mathcal {C}_0)$ and $l_0(x,y) \le l(x,y) < u(x,y) \le u_0(x,y)$.

Definition 9

( $\mathbb {R}^2$-corridor fitting problem). The $\mathbb {R}^2$-corridor fitting problem consists in finding a PWL function g of two variables fitting a corridor $\mathcal {C}$ such that its number of pieces is minimized.

$(\mathcal {P})$ is equivalent to an $\mathbb {R}^2$-corridor fitting problem with corridor $\mathcal {C}$ between u and l. Moreover, by extending the previous definitions to dimension $m \ge 1$, it is possible to define the $\mathbb {R}^m$-corridor fitting problem that refers to the problem with $D(\mathcal {C}) \subset \mathbb {R}^m$.

Definition 10

(Truncated-corridor). Let $\mathcal {C}_1$, $\mathcal {C}_2 \in \mathbb {R}^2$ be two corridors both defined by functions u, $l: \mathbb {R} \mapsto \mathbb {R}$ on the intervals $[a,b_1]$ and $[a,b_2]$. We call $\mathcal {C}_2$ a truncated-corridor of $\mathcal {C}_1$ if and only if $[a,b_2] \subset [a,b_1]$ ($b_2 \le b_1$).

Definition 11

(Maximal linear segment). A maximal linear segment in a corridor $\mathcal {C}$ is a linear segment within $\mathcal {C}$ that induces a truncated-corridor of maximal domain length.

Definition 12

(Truncated-corridor in direction d). Let $\mathcal {C}_1$ and $\mathcal {C}_2$ be two corridors defined by the same functions u and l with compact corridor domains in $\mathbb {R}^2$. Let $d \in \mathbb {R}^2 \setminus \{0\}$. We call $\mathcal {C}_2$ a truncated-corridor of $\mathcal {C}_1$ in direction d if and only if there exists $\sigma \in \mathbb {R}$ for which $D(\mathcal {C}_2) = D(\mathcal {C}_1) \cap \{(x,y) \in \mathbb {R}^2, (x,y) \cdot d \le \sigma \}$, i.e. $D(\mathcal {C}_2)$ is the intersection of $D(\mathcal {C}_1)$ with a half-plane.

Definition 13

(Maximal piece in direction d). A maximal piece in direction $d \in \mathbb {R}^2 \setminus \{0\}$ of a corridor $\mathcal {C}$ is a polytope within $\mathcal {C}$ that induces a truncated-corridor of $\mathcal {C}$ in direction d that is of maximal domain area.

3 A Framework for Solving the $\mathbb {R}^2$-Corridor Fitting Problem

We present in this section a framework to create efficient algorithms for the $\mathbb {R}^2$-corridor fitting problem. The instantiation chosen for different parts of the framework are described in Sect. 4 as well as some details on the implementation.

Three key ideas are followed in the framework. The first two are used to avoid drawbacks encountered in [7], whereas the third one is meant to render a subproblem more tractable:

Key idea 1: Pieces should be chosen among general convex polygons instead of triangles to constrain less the pieces chosen, possibly decreasing the number of pieces
Key idea 2: Choose pieces that are good (ideally optimal) solutions of a maximal piece in direction d problem, to aim for a domain covered by a piece that is “as large as possible” because in [7] the size of triangles is fixed which increases the number of pieces necessary
Key idea 3: Compute a good feasible solution of the maximal piece in direction d problem with a series of LP problems obtained after substituting $\mathcal {C}$ with a PWL inner corridor of $\mathcal {C}$

The remainder of this section builds upon these principles.

3.1 Key Idea 1: Management of the Corridor Domain

The corridor domain should be tiled with shapes as general as possible provided that they can be formulated in an MILP. Such shapes are polygons, but we further restrict those shapes to convex polygons because formulating a non-convex polygon in an MILP introduces additional binary variables. It is expected that allowing convex polygons instead of only triangles as done in [7] will lead to a lower number of pieces.

The procedure that manages pieces and the remaining corridor domain is described in Algorithm 1. At each iteration, one piece is computed for each vertex of $D(\mathcal {C})$ by function $compute\_piece$, and a function score selects the “most” suitable to obtain a PWL function with few pieces, see Sect. 4.1. Function $update\_domain(\mathcal {C}, p)$ removes the part of $D(\mathcal {C})$ on which p is defined. This function also divides the new reduced corridor $\mathcal {C}$ in two corridors if polygon $D(\mathcal {C})$ only has angles at the vertices greater than 90 to avoid a bad behaviour.

3.2 Key Idea 2: The Maximal Piece in Direction d Problem

We chose to find a new piece by covering an area starting from point v and extending as far as possible in direction d. This direction d points to the interior of $D(\mathcal {C})$ when starting from v and is computed via function $choose\_progress\_direction$ of Algorithm 1. The hypothesis of starting from a vertex instead of any point of the border of the polygon $D(\mathcal {C})$ is made. Computing the piece consists in solving a maximal piece in direction d problem $(MP_d)$:

$$\begin{aligned} (MP_d) \left\{ \begin{array}{ll} \max &{} \sigma \\ s.t. &{} \alpha x + \beta y + \gamma \in \mathcal {C}^d_{\sigma } \quad \forall (x,y) \in D(\mathcal {C}^d_{\sigma }) \\ &{}\alpha ,\beta ,\gamma ,\sigma \in \mathbb {R} \end{array}\right. \end{aligned}$$

(4)

where $D(\mathcal {C}^d_{\sigma }) = D(\mathcal {C}) \cap \{(x,y) \in \mathbb {R}^2 | (x,y)^T \cdot d \le \sigma \}$ is the domain of $\mathcal {C}^d_{\sigma }$, the truncated-corridor of $\mathcal {C}$ in direction d. $(MP_d)$ is a generalized semi-infinite programming problem (GSIP). Indeed, the number of pointwise constraints is infinite and depends on variable $\sigma $, while the number of variables is 4: a real variable $\sigma $ for the half-plane intersection as well as 3 real variables $(\alpha ,\beta ,\gamma )$ to describe the linear function $g(x,y) = \alpha x + \beta y + \gamma $.

3.3 Key Idea 3: Computing a Feasible Solution of a Maximal Piece in Direction d Problem

As we want a computationally cheap solution to the GSIP $(MP_d)$ because of the high number of times such a problem has to be solved, a feasible solution of $(MP_d)$ is computed via a series of LP problems, as described below, using a PWL inner corridor of the original corridor $\mathcal {C}$, because it allows to replace the infinite number of nonlinear constraints by a finite number of linear constraints while ensuring feasibility.

Let corridor ${\mathcal {C}_{\textit{PWL}}}_{\sigma }^d$ be a PWL inner corridor of $\mathcal {C}^d_{\sigma }$ with associated functions $\tilde{u}$ and $\tilde{l}$ for readability; note $(D_{\tilde{u}}^i)_{i \in I}$ and $(D_{\tilde{l}}^j)_{i \in J}$ the subdomains of corridor ${\mathcal {C}_{\textit{PWL}}}_{\sigma }^d$ on which $\tilde{u}$ and $\tilde{l}$ are linear, indexed by I and J respectively. Then, ($MP_d$) is a relaxation of the following problem because $\mathcal {C}$ is replaced by an inner corridor.

$$\begin{aligned} \small (MP_d') \left\{ \begin{array}{ll} \max \; \sigma \\ s.t. \\ \alpha x + \beta y + \gamma - \tilde{u}_i(x,y) \le 0 \quad &{} \forall (x,y) \in D(\mathcal {C}^d_{\sigma }) \cap D_{\tilde{u}}^i, \forall i \in I \\ \alpha x + \beta y + \gamma - \tilde{l}_j(x,y)\ge 0 \quad &{} \forall (x,y) \in D(\mathcal {C}^d_{\sigma }) \cap D_{\tilde{l}}^j, \forall j \in J \\ \;\alpha ,\beta ,\gamma ,\sigma \in \mathbb {R} \end{array}\right. \end{aligned}$$

(5)

Note that on each $D_{\tilde{u}}^i$, $g(x,y) - \tilde{u}(x,y)$ is a linear function, thus it suffices to check $g(X)-\tilde{u}(x,y) \le 0$ for each vertex of convex polygonal domain $D_{\tilde{u}}^i$ to ensure constraint $g(x,y) - \tilde{u}(x,y) \le 0$ on $D_{\tilde{u}}^i$. A similar reasoning leads to the same result for constraints involving $\tilde{l}$. Thus, the following problem is equivalent to ($MP_d'$) but has the advantage of having only a finite number of linear constraints.

$$\begin{aligned} \small (MP_d'') \left\{ \begin{array}{ll} \max \; \sigma \\ s.t. \\ \alpha x + \beta y + \gamma - \tilde{u}_i(x,y) \le 0 \quad &{} \text { for each vertex }(x,y)\text { of } D(\mathcal {C}^d_{\sigma }) \cap D_{\tilde{u}}^i, \forall i \in I \\ \alpha x + \beta y + \gamma - \tilde{l}_j(x,y)\ge 0 \quad &{} \text { for each vertex }(x,y)\text { of } D(\mathcal {C}^d_{\sigma }) \cap D_{\tilde{l}}^j, \forall j \in J \\ \;\alpha ,\beta ,\gamma ,\sigma \in \mathbb {R} \end{array}\right. \end{aligned}$$

(6)

Finally, problem $(MP_d'')$ has constraints involving polygon intersections depending nonlinearly on variable $\sigma $, thus it is not an LP problem. Parameterizing $(MP_d'')$ with $\sigma $, the following LP feasibility problem is obtained which can be repeatedly solved until a satisfactory $\sigma $ value has been found.

$$\begin{aligned} \small (MP_{d,\sigma }'') \left\{ \begin{array}{ll} \alpha x + \beta y + \gamma - \tilde{u}_i(x,y) \le 0 \quad &{} \text { for each vertex }(x,y)\text { of } D(\mathcal {C}^d_{\sigma }) \cap D_{\tilde{u}}^i, \forall i \in I \\ \alpha x + \beta y + \gamma - \tilde{l}_j(x,y)\ge 0 \quad &{} \text { for each vertex }(x,y)\text { of } D(\mathcal {C}^d_{\sigma }) \cap D_{\tilde{l}}^j, \forall j \in J \\ \;\alpha ,\beta ,\gamma \in \mathbb {R} \end{array}\right. \end{aligned}$$

(7)

4 Framework Key Points Instantiation

In this section, choices made on key points of the framework of Sect. 3 are described: the function score of Algorithm 1 in Sect. 4.1, the choice of a direction d in Sect. 4.2 and the computation of a PWL inner corridor in Sect. 4.3. $\mathcal {C}$ refer to a corridor between u and l in the remainder of the section.

4.1 Scoring the Quality of Pieces

In Algorithm 1, a function score ranks the quality of candidate pieces and the piece with highest score is kept. Two scoring functions have been implemented:

Area measures the area of the domain covered by piece p
Partial Derivatives Total Variation (PaD) approximates the sum of the total variation of each partial derivative of u and l on the domain covered by p

The total variation of a function g is a measure of how much that function varies on its domain $\mathcal {D}$. The total variation of g on $\mathcal {D}$ is equal to $\int _{\mathcal {D}} {||\nabla g(x)||}_2 dx$. The interest of PaD needs the introduction of the pointwise height of a corridor.

Definition 14

(pointwise height). We call $\mathcal {C}_{PH}(x,y) = u(x,y)-l(x,y)$ the pointwise height of corridor $\mathcal {C}$ at point (x, y).

Remark 1

The most commonly used type of approximation error for a function f is the absolute error. It induces a corridor $\mathcal {C}$ such that $l(x,y) = f(x,y)-\delta $ and $u(x,y) = f(x,y)+\delta $ with $\delta > 0$, that has constant pointwise height.

Area is a straightforward and simple idea to evaluate the piece quality, it will serve as a reference to evaluate other scoring functions. PaD is thought to be an adaptation for two-variable functions of Theorem 1 of [3]. Indeed, it states that for a corridor $\mathcal {C}$ of constant pointwise height $2 \delta $ with $D(\mathcal {C}) = [a,b]$, when $\delta \rightarrow 0$, the minimum number of pieces $s(\delta )$ satisfies the asymptotic approximation $s(\delta ) \sim \frac{1}{4 \delta } \int _a^b \sqrt{\vert u''(x,y)\vert }dx$. Note first that $u''(x,y)=l''(x,y)$ because it is a corridor with constant pointwise height, and second that the integral computed is the total variation of function $u'$ (and $l'$). It is thus expected that PaD performs better than Area for small values of pointwise height. For large values, the total variation should be less relevant to estimate the difficulty of fitting a piece, thus PaD could be less efficient.

4.2 Choose a Progress Direction

Line 8 of Algorithm 1 selects progress direction d knowing starting vertex v. Two options were tested for this choice.

the direction going along the bisector of the two edges of $D(\mathcal {C})$ having v as starting point, denoted bd for Bisector Direction
Compute two maximal linear segments starting from v and following each edge of $D(\mathcal {C})$ having v as endpoint. The direction orthogonal to the line joining the two ends of the maximal linear segments is chosen as the progress direction d. It is denoted med for Mean progress along Edges Direction

The first is a naive option, while the second is meant to take into account the “difficulty” of progressing along the two extremal directions given by the two edges starting at v.

4.3 Inner Approximation of a Corridor

In the function $compute\_piece$, the computation of a PWL inner corridor $\mathcal {C}_{\textit{PWL}}$ of $\mathcal {C}$ boils down to computing PWL functions $\tilde{u}$ and $\tilde{l}$ verifying $l(x,y) \le \tilde{l}(x,y) \le \tilde{u}(x,y) \le u(x,y)$, for all $(x,y) \in D(\mathcal {C})$.

To compute $\tilde{u}$ (a similar method works for $\tilde{l}$), a basic idea is to divide $D(\mathcal {C})$ into rectangular pieces of same sizes, and then to compute a third coordinate $\tilde{u}(x,y)$ to each vertex $v=(x,y)$ of each rectangular piece such that $\tilde{u}(x,y) \le u(x,y)$. Interval analysis on the gradient of u suffices to compute the values of $\tilde{u}(x,y)$ such that $\tilde{u}$ is an underestimation of u, as explained in Proposition 1.

Proposition 1

Let $u \in C^1$ defined on $\mathcal {D}= [a,b]\times [c,d] \subset \mathbb {R}^2$. Let $l_x = b-a$ and $l_y = d-c$. Let $\nabla u$ be the gradient of u. Let $[D_x^{low},D_x^{high}] \times [D_y^{low},D_y^{high}]$ be such that $\nabla u(x,y) \in [D_x^{low},D_x^{high}] \times [D_y^{low},D_y^{high}]$ for all $(x,y) \in \mathcal {D}$. Let $(M_x,M_y) = (\frac{a+b}{2},\frac{c+d}{2})$. Define:

$$\begin{aligned}&u_{(a,c)}^- := u(M_x,M_y)-D_x^{high} \cdot l_x-D_y^{high} \cdot l_y \end{aligned}$$

(8)

$$\begin{aligned}&u_{(b,c)}^- := u(M_x,M_y)+D_x^{low} \cdot l_x-D_y^{high} \cdot l_y \end{aligned}$$

(9)

$$\begin{aligned}&u_{(b,d)}^- := u(M_x,M_y)+D_x^{low} \cdot l_x+D_y^{low} \cdot l_y \end{aligned}$$

(10)

$$\begin{aligned}&u_{(a,d)}^- := u(M_x,M_y)-D_x^{high} \cdot l_x+D_y^{low} \cdot l_y \end{aligned}$$

(11)

If a linear function f satisfies:

$$\begin{aligned} f(a,c) \le u_{(a,c)}^- \text {, } f(b,c) \le u_{(b,c)}^- \text {, } f(b,d) \le u_{(b,d)}^- \text { and } f(a,d) \le u_{(a,d)}^- \end{aligned}$$

(12)

Then $f(x,y) \le u(x,y)$ for all $(x,y) \in \mathcal {D}$.

Proof

Let f be a linear function satisfying the 4 inequalities (12). Let $M = (M_x,M_y,f(M_x,M_y)$) be the point on the surface defined by u corresponding to the middle of $\mathcal {D}$. Let $(x_0,y_0) \in \mathcal {D}$. We have:

$$\begin{aligned}&u(M_x,M_y)-D_x^{high} \cdot (M_x-x_0)-D_y^{high} \cdot (M_y-y_0)&\le u(x_0,y_0) \quad&\text {if } x_0 \le M_x, y_0 \le M_y \nonumber \\&u(M_x,M_y)+D_x^{low} \cdot (x_0-M_x)-D_y^{high} \cdot (M_y-y_0)&\le u(x_0,y_0) \quad&\text {if } x_0 \ge M_x, y_0 \le M_y \nonumber \\&u(M_x,M_y)+D_x^{low} \cdot (x_0-M_x)+D_y^{low} \cdot (y_0-M_y)&\le u(x_0,y_0) \quad&\text {if } x_0 \ge M_x, y_0 \ge M_y \nonumber \\&u(M_x,M_y)-D_x^{high} \cdot (M_x-x_0)+D_y^{low} \cdot (y_0-M_y)&\le u(x_0,y_0) \quad&\text {if } x_0 \le M_x, y_0 \ge M_y \nonumber \end{aligned}$$

because $[D_x^{low},D_x^{high}] \times [D_y^{low},D_y^{high}]$ are bounds of $\nabla u$ on domain $\mathcal {D}$. Now, define a PWL function $f_{\textit{PWL}}$ with four rectangle pieces with vertices positionned at $\{(a,c),(b,c),(b,d),(a,d),(\frac{a+b}{2},\frac{c+d}{2})\}$ and height the left-hand sides of (12) as well as $u(M_x,M_y)$ respectively. In particular, $f_{\textit{PWL}} \le u$ on $\mathcal {D}$. In addition, direct computations show that $f(x,y) \le f_{\textit{PWL}}(x,y)$ for $(x,y) \in \{(a,c),(b,c),(b,d),(a,d),(\frac{a+b}{2},\frac{c+d}{2})\}$. Finally, as f and $f_{\textit{PWL}}$ are linear on the four pieces domain of $f_{\textit{PWL}}$, we have $f \le f_{\textit{PWL}} \le u$ on $\mathcal {D}$. $\square $

To build a PWL inner corridor exploiting Proposition 1 in our algorithm, a method called efficiency refinement is used. It is described after the introduction of the bounding efficiency $\eta $.

Definition 15

(bounding efficiency $\eta $). Let $\mathcal {C}$ be a corridor with $D(\mathcal {C})$ a polygonal domain of $\mathbb {R}^2$. Let ${\mathcal {C}_{\textit{PWL}}}$ be a PWL inner corridor of $\mathcal {C}$. We say that ${\mathcal {C}_{\textit{PWL}}}$ achieves a bounding efficiency for $\mathcal {C}$ of $\eta \in [0,1]$ if the pointwise height (PH) ratio $\frac{({\mathcal {C}_{\textit{PWL}}})_{PH}(x,y)}{\mathcal {C}_{PH}(x,y)}$ is greater or equal to $\eta $ for each $(x,y) \in D(\mathcal {C})$.

Proposition 2

Let $\mathcal {C}$ be a corridor with $D(\mathcal {C})$ a polygonal domain of $\mathbb {R}^2$ and let $\eta \in [0,1]$. If $\mathcal {C}$ has constant pointwise height, then a PWL inner corridor ${\mathcal {C}_{\textit{PWL}}}$ has a bounding efficiency for $\mathcal {C}$ of $\eta $ if the pointwise height ratio $\frac{({\mathcal {C}_{\textit{PWL}}})_{PH}(x,y)}{\mathcal {C}_{PH}(x,y)}$ is greater or equal to $\eta $ for each (x, y) vertex of a piece domain of $\tilde{u}$ or $\tilde{l}$.

Proof

$\mathcal {C}$ has constant pointwise height. Thus for each piece of ${\mathcal {C}_{\textit{PWL}}}$, the minimum pointwise height ratio is on an extreme point of the piece domain, that is to say on a vertex of the piece domain, which is lower bounded by $\eta $ by hypothesis. $\square $

The efficiency refinement procedure builds a PWL inner corridor ${\mathcal {C}_{\textit{PWL}}}$ of $\mathcal {C}$ achieving a bounding efficiency of $\eta $ if it has a constant pointwise height, but without this property, it only checks that the pointwise height ratio at the vertices of each piece is $\eta $.

After having found a rectangle $\mathcal {D}$ containing $D(\mathcal {C})$, it creates an initial PWL corridor likely unvalid ($l {|} \le u$) with only one rectangular piece on $\mathcal {D}$, and then iteratively refines the pieces that do not satisfy a bounding efficiency of $\eta $ at the 4 vertices into 4 new pieces until each piece satisfies the efficiency, which implies the validity of the PWL inner corridor as well.

Parameter $\eta $ needs to be adjusted depending on the quality of PWL inner corridor ${\mathcal {C}_{\textit{PWL}}}$ wanted. To produce a really good approximation of corridor $\mathcal {C}$, $\eta $ near 1 shall be used, but the number of pieces forming ${\mathcal {C}_{\textit{PWL}}}$ increases consequently, thus increasing the computation time of $(MP_{d,\beta }'')$ to be solved later on.

5 Numerical Experiments

The performance of our framework is compared to the state of the art. Our best heuristic (found with experiments in [2]) called DN99 uses scoring function PaD, starting vertex set to med and $\eta =0.99$. Also, to illustrate the effects of parameter $\eta $, results of a heuristic which uses the same parameters as DN99 but with $\eta =0.95$, are shown. This second heuristic is called DN95.

Heuristics from the state of the art are described in articles [1, 5, 7]. Recall that [7] proposes two heuristics: one based on triangulation (RK2D), and one based on a decomposition of the bivariate function into a sum of one variable functions to approximate separately (RK1D). Heuristic LinA2D is based on the same decomposition as RK1D, but uses library LinA from [1] to compute the approximation of univariate functions. Obviously LinA2D outperforms RK1D since LinA computes optimal non necessarily continuous univariate PWL functions instead of optimal continuous univariate PWL functions for RK1D. The heuristic of [5] is denoted KL2D.

Those 5 heuristics are compared on the benchmark instances of [7], which consist of 45 instances obtained from 9 functions and 5 different absolute approximation errors for each function. An absolute approximation error of $\delta $ for function f means that the corresponding corridor $\mathcal {C}$ is between functions u and l satisfying $l(x,y) = f(x,y) - \delta $ and $u(x,y) = f(x,y) + \delta $. The first two functions of the benchmark are linearly separable and refered to as L1,L2. The remaining seven are not linearly separable and thus refered to as N1,...,N7. The expression and domain of each function are described in Table 1.

For our heuristics as well as LinA2D, JuMP (v. 0.21.8) is used as modeling language, Gurobi (v. 9.1) is the (MI)LP solver and a CPU of 4.4 GHz using a single core and 32 GB RAM. Moreover, heuristic RK2D uses the modeling language GAMS (v. 23.6), the global optimization solver LindoGlobal (v. 23.6.5), a CPU intel i7 with a single core, 2.93 GHz and 12 GB RAM. Finally, KL2D uses Pyomo (v. 5.6.4) as modeling language, CPLEX (v. 12.8) and 10 threads to solve MILPs, COUENNE (v.0.5.8) and 1 thread to solve NLPs, a CPU with 3.6 GHz and 32 GB RAM. As for the time limit, LinA2D and RK2D do not have one, KL2D allows 3600 s seconds for each MILP problem, while our heuristics allow 3600 s seconds for each LP problem. The differences in computation time between heuristics are in several orders of magnitude, therefore these differences are mainly due to the algorithms differences and are only marginally impacted by the differences in hardware.

Table 1. Expression and domain of benchmark functions

Full size table

Results are shown in Table 2. For each of the five heuristics we present the number of pieces n obtained by the heuristic as well as its computation time in seconds. Bold integer highlight the best solution found for each instance, i.e. the ones with the minimum number of pieces. “TO” means that the heuristic has stopped because of a time out, thus without any valid solution. In this case, “-” means that no solutions were found.

In terms of minimum number of pieces, DN99 performs the best with 28 best solutions out of 45 instances, followed by KL2D with 19 out of 45, DN95 with 15 out of 45, LinA2D with 8 out of 45 and finally RK2D with 1 out of 45. As for the computation time, the hardware and software used for the different heuristics are of different quality, thus it will not be a precise tool to compare the time needed for each heuristics. However, it can be said that LinA2D takes less than 1 s to execute, RK2D and DN95 take seconds to minutes, while KL2D and DN99 take seconds to hours.

Table 2. Comparison with state of the art heuristics

Full size table

A more in-depth analysis shows that LinA2D is the best only for instances with linearly separable functions (L1 and L2), but it does really poorly on the other functions. KL2D is the best 19 times out of the 24 instances where it terminates with a solution, which shows a clear limit to the size of the instance it can tackle, due to the increasing size of MILPs it solves. The effect of the change of parameter $\eta $ from 95% to 99% in heuristics DN95 and DN99 is on average a decrease of 9.1% of the number of pieces of the solutions, at the cost of an increase of 301% of the computation time.

6 Conclusion

We introduced a framework to create linearization algorithms based on the solution of the $\mathbb {R}^2$-corridor fitting problem. Convex polygons were used to tile the domain instead of triangles, a scoring function ranking candidate pieces was developed and a good feasible solution of a GSIP was computed via a series of LP feasibility problems. Finally, numerical experiments showed that our heuristics outperforms the state of the art on not linearly separable functions. Further work could attempt to diminish the relatively high computation time of the heuristics, search for better instanciation of the different parts of the framework or apply those heuristics to approximate real-world MINLPs to show their practical usage.

References

Codsi, J., Ngueveu, S.U., Gendron, B.: Lina: a faster approach to piecewise linear approximations using corridors and its application to mixed-integer optimization. Technical report (2021)
Google Scholar
Duguet, A.: Appendix of ”piecewise linearization of bivariate nonlinear functions: minimizing the number of pieces under a bounded approximation error”: determining the best heuristic (2022). https://homepages.laas.fr/sungueve/2dpwl.html
Frenzen, C., Sasao, T., Butler, J.T.: On the number of segments needed in a piecewise linear approximation. J. Comput. Appl. Math. 234(2), 437–446 (2010). https://doi.org/10.1016/j.cam.2009.12.035
Article MathSciNet MATH Google Scholar
Geißler, B., Martin, A., Morsi, A., Schewe, L.: Using piecewise linear functions for solving MINLPs. In: Mixed Integer Nonlinear Programming. The IMA Volumes in Mathematics and its Applications, vol. 154, pp. 287–314. Springer, Heidelberg (2012). https://doi.org/10.1007/978-1-4614-1927-3_10
Kazda, K., Li, X.: Nonconvex multivariate piecewise-linear fitting using the difference-of-convex representation. Comput. Chem. Eng. 150, 107310 (2021). https://doi.org/10.1016/j.compchemeng.2021.107310
Article Google Scholar
Ngueveu, S.U.: Piecewise linear bounding of univariate nonlinear functions and resulting mixed integer linear programming-based solution methods. Eur. J. Oper. Res. 275(3), 1058–1071 (2019). https://doi.org/10.1016/j.ejor.2018.11.021
Article MathSciNet MATH Google Scholar
Rebennack, S., Kallrath, J.: Continuous piecewise linear delta-approximations for bivariate and multivariate functions. J. Optim. Theory Appl. 167(1), 102–117 (2014). https://doi.org/10.1007/s10957-014-0688-2
Article MathSciNet MATH Google Scholar
Rebennack, S., Krasko, V.: Piecewise linear function fitting via mixed-integer linear programming. INFORMS J. Comput. 32(2), 507–530 (2020). https://doi.org/10.1287/ijoc.2019.0890
Article MathSciNet MATH Google Scholar
Toriello, A., Vielma, J.P.: Fitting piecewise linear continuous functions. Eur. J. Oper. Res. 219(1), 86–95 (2012). https://doi.org/10.1016/j.ejor.2011.12.030
Article MathSciNet MATH Google Scholar
Vielma, J.P., Ahmed, S., Nemhauser, G.: Mixed-integer models for nonseparable piecewise-linear optimization: Unifying framework and extensions. Oper. Res. 58(2), 303–315 (2010). https://doi.org/10.1287/opre.1090.0721
Article MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

LAAS-CNRS, Université de Toulouse, CNRS, INP, Toulouse, France
Aloïs Duguet & Sandra Ulrich Ngueveu

Authors

Aloïs Duguet
View author publications
You can also search for this author in PubMed Google Scholar
Sandra Ulrich Ngueveu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Aloïs Duguet .

Editor information

Editors and Affiliations

ESSEC Business School of Paris, Cergy Pontoise Cedex, France
Ivana Ljubić
IBM TJ Watson Research Center, Yorktown Heights, NY, USA
Francisco Barahona
Georgia Institute of Technology, Atlanta, GA, USA
Santanu S. Dey
Université Paris-Dauphine, Paris, France
A. Ridha Mahjoub

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Duguet, A., Ngueveu, S.U. (2022). Piecewise Linearization of Bivariate Nonlinear Functions: Minimizing the Number of Pieces Under a Bounded Approximation Error. In: Ljubić, I., Barahona, F., Dey, S.S., Mahjoub, A.R. (eds) Combinatorial Optimization. ISCO 2022. Lecture Notes in Computer Science, vol 13526. Springer, Cham. https://doi.org/10.1007/978-3-031-18530-4_9

Download citation

DOI: https://doi.org/10.1007/978-3-031-18530-4_9
Published: 21 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18529-8
Online ISBN: 978-3-031-18530-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Piecewise Linearization of Bivariate Nonlinear Functions: Minimizing the Number of Pieces Under a Bounded Approximation Error

Abstract

Similar content being viewed by others

On piecewise linear approximations of bilinear terms: structural comparison of univariate and bivariate mixed-integer programming formulations

Piecewise Linear Bounding Functions for Univariate Global Optimization

MINLP formulations for continuous piecewise linear function fitting

Keywords

1 Problem Description and State of the Art

2 Definitions

Definition 1

Definition 2

Definition 3

Definition 4

Definition 5

Definition 6

Definition 7

Definition 8

Definition 9

Definition 10

Definition 11

Definition 12

Definition 13

3 A Framework for Solving the \(\mathbb {R}^2\)-Corridor Fitting Problem

3.1 Key Idea 1: Management of the Corridor Domain

3.2 Key Idea 2: The Maximal Piece in Direction d Problem

3.3 Key Idea 3: Computing a Feasible Solution of a Maximal Piece in Direction d Problem

4 Framework Key Points Instantiation

4.1 Scoring the Quality of Pieces

Definition 14

Remark 1

4.2 Choose a Progress Direction

4.3 Inner Approximation of a Corridor

Proposition 1

Proof

Definition 15

Proposition 2

Proof

5 Numerical Experiments

6 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation