
1 Introduction

While artificial intelligence was previously mainly used for soft applications such as movie recommendations [9], facial recognition [23], or chess computers [11], it is now increasingly applied in safety-critical applications, such as autonomous driving [32], human-robot collaboration [27], or power system control [5]. In contrast to soft applications, where failures usually have only minor consequences, failures in safety-critical applications can, in the worst case, result in the loss of human lives. Consequently, in order to prevent such failures, there is an urgent need for efficient methods that can verify that the neural networks used for artificial intelligence function correctly. Verification problems involving neural networks can be grouped into two main categories:

  • Open-loop verification: Here the task is to check if the output of the neural network for a given input set satisfies certain properties. With this setup one can for example prove that a neural network used for image classification is robust against a certain amount of noise on the image.

  • Closed-loop verification: In this case the neural network is used as a controller for a dynamical system, e.g., to steer the system to a given goal set while avoiding unsafe regions. The safety of the controlled system can be verified using reachability analysis.

For both of the above verification problems, the most challenging step is to compute a tight enclosure of the image through the neural network for a given input set. Due to the high expressiveness of neural networks, their images usually have complex shapes, so that convex enclosures are often too conservative for verification. In this work, we show how to overcome this limitation with our novel approach for computing tight non-convex enclosures of images through neural networks using polynomial zonotopes.

Fig. 1. Triangle relaxation (left), zonotope abstraction (middle), and polynomial zonotope abstraction (right) of the ReLU activation function.

1.1 State of the Art

We first summarize the state of the art for open-loop neural network verification, followed by reachability analysis for neural network controlled systems. Many different set representations have been proposed for computing enclosures of the image through a neural network, including intervals [43], polytopes [38], zonotopes [34], star sets [40], and Taylor models [21]. For neural networks with ReLU activation functions, it is possible to compute the exact image. This can be achieved either by recursively partitioning the input set into piecewise affine regions [42], or by propagating the initial set through the network using polytopes [38, 48] or star sets [40], where the set is split at all neurons that can be either active or inactive. In either case the exact image is in the worst case given as a union of \(2^v\) convex sets, with \(v\) being the number of neurons in the network. To avoid this high computational complexity of exact image computation, most approaches instead compute a tight enclosure using an abstraction of the neural network. For ReLU activation functions one commonly used abstraction is the triangle relaxation [15] (see Fig. 1), which can be conveniently integrated into set propagation using star sets [40]. Another possibility is to abstract the input-output relation by a zonotope (see Fig. 1), which is possible for ReLU, sigmoid, and hyperbolic tangent activation functions [34]. One can also apply Taylor model arithmetic [26] to compute the image through networks with sigmoid and hyperbolic tangent activations [21], which corresponds to an abstraction of the input-output relation by a Taylor series expansion. In order to better capture dependencies between different neurons, some approaches also abstract the input-output relation of multiple neurons at once [28, 36].

While computation of the exact image is infeasible for large networks, the enclosures obtained by abstractions are often too conservative for verification. To obtain complete verifiers, many approaches therefore use branch and bound strategies [7] that split the input set and/or single neurons until the specification can either be proven or a counterexample is found. For computational reasons branch and bound strategies are usually combined with approaches that are able to compute rough interval bounds for the neural network output very fast. Those bounds can for example be obtained using symbolic intervals [43] that store linear constraints on the variables in addition to the interval bounds to preserve dependencies. The DeepPoly approach [35] uses a similar concept, but applies a back-substitution scheme to obtain tighter bounds. With the FastLin method [45] linear bounds for the overall network can be computed from linear bounds for the single neurons. The CROWN approach [49] extends this concept to linear bounds with different slopes as well as quadratic bounds. Several additional improvements for the CROWN approach have been proposed, including slope optimization using gradient descent [47] and efficient ReLU splitting [44]. Instead of explicitly computing the image, many approaches also aim to verify the specification directly using SMT solvers [22, 30], mixed-integer linear programming [8, 37], semidefinite programming [31], and convex optimization [24].

For reachability analysis of neural network controlled systems one has to compute the set of control inputs in each control cycle, which is the image of the current reachable set through the neural network controller. Early approaches compute the image for ReLU networks exactly using polytopes [46] or star sets [39]. Since in this case the number of coexisting sets grows rapidly over time, these approaches have to unite sets using convex hulls [46] or interval enclosures [39], which often results in large over-approximations. If template polyhedra are used as a set representation, reachability analysis for neural network controlled systems with discrete-time plants reduces to the task of computing the maximum output along the template directions [12], which can be done efficiently. Neural network controllers with sigmoid and hyperbolic tangent activation functions can be converted to an equivalent hybrid automaton [20], which can be combined with the dynamics of the plant using the automaton product. However, since each neuron is represented by an additional state, the resulting hybrid automaton is very high-dimensional, which makes reachability analysis challenging. Some approaches approximate the overall network with a polynomial function [14, 18] using polynomial regression based on samples [14] and Bernstein polynomials [18]. Yet another class of methods [10, 21, 33, 41] employs abstractions of the input-output relation for the neurons to compute the set of control inputs using intervals [10], star sets [41], Taylor models [21], and a combination of zonotopes and Taylor models [33]. Common tools for reachability analysis of neural network controlled systems are JuliaReach [6], NNV [41], POLAR [19], ReachNN* [16], RINO [17], Sherlock [13], Verisig [20], and Verisig 2.0 [21], where JuliaReach uses zonotopes for neural network abstraction [33], NNV supports multiple set representations, ReachNN* applies the Bernstein polynomial method [18], POLAR approximates single neurons by Bernstein polynomials [19], RINO computes interval inner- and outer-approximations [17], Sherlock uses the polynomial regression approach [14], Verisig performs the conversion to a hybrid automaton [20], and Verisig 2.0 uses the Taylor model based neural network abstraction method [21].

1.2 Overview

In this work, we present a novel approach for computing tight non-convex enclosures of images through neural networks with ReLU, sigmoid, or hyperbolic tangent activation functions. The high-level idea is to approximate the input-output relation of each neuron by a polynomial function, which results in the abstraction visualized in Fig. 1. Since polynomial zonotopes are closed under polynomial maps, the image through this function can be computed exactly, yielding a tight enclosure of the image through the overall neural network. The remainder of this paper is structured as follows: After introducing some preliminaries in Sect. 2, we present our approach for computing tight enclosures of images through neural networks in Sect. 3. Next, we show how to utilize this result for reachability analysis of neural network controlled systems in Sect. 4. Afterwards, in Sect. 5, we introduce some special operations on polynomial zonotopes that we require for image and reachable set computation, before we finally demonstrate the performance of our approach on numerical examples in Sect. 6.

1.3 Notation

Sets are denoted by calligraphic letters, matrices by uppercase letters, and vectors by lowercase letters. Given a vector \(b \in \mathbb {R}^n\), \(b_{(i)}\) refers to the i-th entry. Given a matrix \(A \in \mathbb {R}^{o\times n}\), \(A_{(i,\cdot )}\) represents the i-th matrix row, \(A_{(\cdot ,j)}\) the j-th column, and \(A_{(i,j)}\) the j-th entry of matrix row i. The concatenation of two matrices C and D is denoted by [C D], and \(I_n \in \mathbb {R}^{n \times n}\) is the identity matrix. The symbols \(\textbf{0}\) and \(\textbf{1}\) represent matrices of zeros and ones of proper dimension, the empty matrix is denoted by [ ], and \(\textrm{diag}(a)\) returns a diagonal matrix with \(a \in \mathbb {R}^n\) on the diagonal. Given a function f(x) defined as \(f:~\mathbb {R}\rightarrow \mathbb {R}\), \(f'(x)\) and \(f''(x)\) denote the first and second derivative with respect to x. The left multiplication of a matrix \(A \in \mathbb {R}^{o\times n}\) with a set \(\mathcal {S} \subset \mathbb {R}^n\) is defined as \(A \, \mathcal {S} := \{ A\, s ~ | ~ s \in \mathcal {S} \}\), the Minkowski addition of two sets \(\mathcal {S}_1 \subset \mathbb {R}^n\) and \(\mathcal {S}_2 \subset \mathbb {R}^n\) is defined as \(\mathcal {S}_1 \oplus \mathcal {S}_2 := \{ s_1 + s_2 ~|~ s_1 \in \mathcal {S}_1, s_2 \in \mathcal {S}_2 \}\), and the Cartesian product of two sets \(\mathcal {S}_1 \subset \mathbb {R}^n\) and \(\mathcal {S}_2 \subset \mathbb {R}^o\) is defined as \(\mathcal {S}_1 \times \mathcal {S}_2 := \big \{ [s_1^T~s_2^T]^T ~|~ s_1 \in \mathcal {S}_1, s_2 \in \mathcal {S}_2 \big \}\). We further introduce an n-dimensional interval as \(\mathcal {I} := [l,u],~ \forall i ~ l_{(i)} \le u_{(i)},~ l,u \in \mathbb {R}^n\).

2 Preliminaries

Let us first introduce some preliminaries required throughout the paper. While the concepts presented in this work can equally be applied to more complex network architectures, we focus on feed-forward neural networks for simplicity:

Definition 1

(Feed-forward neural network) A feed-forward neural network with \(\kappa \) hidden layers consists of weight matrices \(W_i \in \mathbb {R}^{v_i \times v_{i-1}}\) and bias vectors \(b_i \in \mathbb {R}^{v_i}\) with \(i \in \{1,\dots ,\kappa +1\}\) and \(v_i\) denoting the number of neurons in layer i. The output \(y \in \mathbb {R}^{v_{\kappa +1}}\) of the neural network for the input \(x \in \mathbb {R}^{v_0}\) is

$$\begin{aligned} y := y_{\kappa +1} ~~ \text {with} ~~ y_0 = x,~~ y_{i(j)} = \mu \bigg (\sum _{k=1}^{v_{i-1}} W_{i(j,k)} \, y_{i-1(k)} + b_{i(j)} \bigg ), ~ i = 1,\dots ,\kappa +1, \end{aligned}$$

where \(\mu :~\mathbb {R}\rightarrow \mathbb {R}\) is the activation function.

In this paper we consider ReLU activations \(\mu (x) = \max (0,x)\), sigmoid activations \(\mu (x) = \sigma (x) = 1/(1+e^{-x})\), and hyperbolic tangent activations \(\mu (x) = \tanh (x) = (e^x - e^{-x})/(e^x + e^{-x})\). Moreover, neural networks often do not apply activation functions on the output neurons, which corresponds to using the identity map \(\mu (x) = x\) for the last layer. The image \(\mathcal {Y}\) through a neural network is defined as the set of outputs for a given set of inputs \(\mathcal {X}_0\), which is according to Def. 1 given as

$$\begin{aligned} \mathcal {Y} = \bigg \{ y_{\kappa +1} ~\bigg |~ y_0 \in \mathcal {X}_0,~\forall i \in \{1,\dots ,\kappa +1\}:~ y_{i(j)} = \mu \bigg (\sum _{k=1}^{v_{i-1}} W_{i(j,k)} \, y_{i-1(k)} + b_{i(j)} \bigg )\bigg \}. \end{aligned}$$
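To make Def. 1 concrete, the following minimal sketch evaluates such a network for a single input point (the set-based image computation is the subject of Sect. 3). It uses NumPy; the function names, the random toy weights, and the treatment of the output layer as an identity map are illustrative choices and not part of the paper's implementation.

    import numpy as np

    def relu(x):    return np.maximum(0.0, x)
    def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))

    def forward(x, weights, biases, mu=relu):
        """Evaluate y_{kappa+1} from Def. 1: y_0 = x, y_i = mu(W_i y_{i-1} + b_i)."""
        y = np.asarray(x, dtype=float)
        for i, (W, b) in enumerate(zip(weights, biases)):
            z = W @ y + b
            # identity map on the output layer, as discussed above
            y = z if i == len(weights) - 1 else mu(z)
        return y

    # toy network: 2 inputs, one hidden layer with 3 neurons, 2 outputs
    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((3, 2)), rng.standard_normal(3)
    W2, b2 = rng.standard_normal((2, 3)), rng.standard_normal(2)
    print(forward([0.5, -1.0], [W1, W2], [b1, b2]))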

We present a novel approach for tightly enclosing the image through a neural network by a polynomial zonotope [2], where we use the sparse representation of polynomial zonotopes [25]:

Fig. 2. Step-by-step construction of the polynomial zonotope from Example 1.

Definition 2

(Polynomial zonotope) Given a constant offset \(c \in \mathbb {R}^n\), a generator matrix of dependent generators \(G \in \mathbb {R}^{n \times h}\), a generator matrix of independent generators \(G_I \in \mathbb {R}^{n \times q}\), and an exponent matrix \(E \in \mathbb {N}_{0}^{p \times h}\), a polynomial zonotope \(\mathcal{P}\mathcal{Z} \subset \mathbb {R}^n\) is defined as

$$\begin{aligned} \mathcal{P}\mathcal{Z} := \bigg \{ c + \sum _{i=1}^h \bigg ( \prod _{k=1}^p \alpha _k ^{E_{(k,i)}} \bigg ) G_{(\cdot ,i)} + \sum _{j=1}^{q} \beta _j G_{I(\cdot ,j)} ~ \bigg | ~ \alpha _k, \beta _j \in [-1,1] \bigg \}. \end{aligned}$$

The scalars \(\alpha _k\) are called dependent factors since a change in their value affects multiplication with multiple generators. Analogously, the scalars \(\beta _j\) are called independent factors because they only affect the multiplication with one generator. For a concise notation we use the shorthand \(\mathcal{P}\mathcal{Z} = \langle c,G, G_I, E \rangle _{PZ}\).

Let us demonstrate polynomial zonotopes by an example:

Example 1

The polynomial zonotope

$$\begin{aligned} \mathcal{P}\mathcal{Z} = \left\langle \begin{bmatrix} 4 \\ 4 \end{bmatrix}, \begin{bmatrix} 2 &{} ~1~ &{} 2 \\ 0 &{} ~2~ &{} 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 &{} ~0~ &{} 3 \\ 0 &{} 1 &{} 1 \end{bmatrix} \right\rangle _{PZ} \end{aligned}$$

defines the set

$$\begin{aligned} \mathcal{P}\mathcal{Z} = \bigg \{ \begin{bmatrix} 4 \\ 4 \end{bmatrix} + \begin{bmatrix} 2 \\ 0 \end{bmatrix} \alpha _1 + \begin{bmatrix} 1 \\ 2 \end{bmatrix} \alpha _2 + \begin{bmatrix} 2 \\ 2 \end{bmatrix} \alpha _1^3 \alpha _2 + \begin{bmatrix} 1 \\ 0 \end{bmatrix} \beta _1 ~ \bigg | ~ \alpha _1, \alpha _2, \beta _1 \in [-1,1] \bigg \}. \end{aligned}$$

The construction of this polynomial zonotope is visualized in Fig. 2.
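As a quick illustration of Def. 2, the short script below draws random dependent and independent factors from \([-1,1]\) and evaluates the polynomial zonotope of Example 1 point-wise. This only produces sample points of the set (useful for plots and sanity checks), not an enclosure; the variable names are illustrative.

    import numpy as np

    c  = np.array([4.0, 4.0])
    G  = np.array([[2.0, 1.0, 2.0],
                   [0.0, 2.0, 2.0]])
    GI = np.array([[1.0],
                   [0.0]])
    E  = np.array([[1, 0, 3],
                   [0, 1, 1]])

    rng = np.random.default_rng(1)
    alpha = rng.uniform(-1, 1, size=(1000, E.shape[0]))   # dependent factors
    beta  = rng.uniform(-1, 1, size=(1000, GI.shape[1]))  # independent factors

    # prod_k alpha_k^{E_(k,i)} for every sample and every dependent generator i
    dep = np.prod(alpha[:, :, None] ** E[None, :, :], axis=1)
    points = c + dep @ G.T + beta @ GI.T                  # points of PZ
    print(points.min(axis=0), points.max(axis=0))         # bounding box of the samples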

3 Image Enclosure

We now present our novel approach for computing tight non-convex enclosures of images through neural networks. The general concept is to approximate the input-output relation of each neuron by a polynomial function, the image through which can be computed exactly since polynomial zonotopes are closed under polynomial maps. For simplicity, we focus on quadratic approximations here, but the extension to polynomials of higher degree is straightforward.

The overall procedure for computing the image is summarized in Algorithm 1, where the computation proceeds layer by layer. For each neuron in the current layer i we first calculate the corresponding input set in Line 5. Next, in Line 6, we compute a lower and an upper bound for the input to the neuron. Using these bounds we then calculate a quadratic approximation for the neuron’s input-output relation in Line 7. This approximation is evaluated in a set-based manner in Line 8. The resulting polynomial zonotope \(\langle c_q,G_q,G_{I,q},E_q\rangle _{PZ}\) forms the j-th dimension of the set \(\mathcal{P}\mathcal{Z}\) representing the output of the whole layer (see Line 9 and Line 12). To obtain a formally correct enclosure, we have to account for the error made by the approximation. We therefore compute the difference between the activation function and the quadratic approximation in Line 10 and add the result to the output set in Line 12. By repeating this procedure for all layers, we finally obtain a tight enclosure of the image through the neural network. An illustrative example of Algorithm 1 is shown in Fig. 3.

For ReLU activations the quadratic approximation only needs to be calculated if \(l < 0 \wedge u > 0\) since we can use the exact input-output relations \(g(x) = x\) and \(g(x) = 0\) if \(l \ge 0\) or \(u \le 0\) holds. Due to the evaluation of the quadratic map defined by \(g(x)\), the representation size of the polynomial zonotope \(\mathcal{P}\mathcal{Z}\) increases in each layer. For deep neural networks it is therefore advisable to reduce the representation size after each layer using order reduction [25, Prop. 16]. Moreover, one can also apply the \(\texttt {compact}\) operation described in [25, Prop. 2] after each layer to remove potential redundancies from \(\mathcal{P}\mathcal{Z}\). Next, we explain the approximation of the input-output relation as well as the computation of the approximation error in detail.

Algorithm 1
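To convey the structure of Algorithm 1 without the polynomial zonotope machinery, the following rough sketch propagates plain interval bounds through a ReLU network: for every neuron it computes input bounds, fits a quadratic by regression, evaluates it, and adds approximation-error bounds. All function names are ours, the error bounds are obtained by crude dense sampling rather than the exact ReLU bounds of Sect. 3.2, and, unlike the paper's method, intervals discard all dependencies between neurons; the sketch only mirrors the loop structure.

    import numpy as np

    def relu(x): return np.maximum(0.0, x)

    def enclose_image_intervals(l, u, weights, biases):
        """Layer-by-layer loop in the spirit of Algorithm 1, but on intervals."""
        l, u = np.asarray(l, float), np.asarray(u, float)
        for W, b in zip(weights, biases):
            # input bounds for every neuron of the layer (cf. Lines 5-6)
            zl = W.clip(min=0) @ l + W.clip(max=0) @ u + b
            zu = W.clip(min=0) @ u + W.clip(max=0) @ l + b
            nl, nu = zl.copy(), zu.copy()
            for j in range(len(zl)):
                if zu[j] <= 0:                    # exact relation g(x) = 0
                    nl[j] = nu[j] = 0.0
                elif zl[j] >= 0:                  # exact relation g(x) = x
                    continue
                else:                             # quadratic approx. + error (Lines 7-12)
                    xs = np.linspace(zl[j], zu[j], 10)
                    a1, a2, a3 = np.polyfit(xs, relu(xs), 2)
                    xf = np.linspace(zl[j], zu[j], 1001)
                    gx = a1 * xf**2 + a2 * xf + a3
                    d = relu(xf) - gx             # sampled approximation error
                    nl[j], nu[j] = gx.min() + d.min(), gx.max() + d.max()
            l, u = nl, nu
        return l, u

    # usage on a random toy network with inputs in [-1,1]^2
    rng = np.random.default_rng(0)
    Ws = [rng.standard_normal((3, 2)), rng.standard_normal((2, 3))]
    bs = [rng.standard_normal(3), rng.standard_normal(2)]
    print(enclose_image_intervals([-1, -1], [1, 1], Ws, bs))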

3.1 Activation Function Approximation

The centerpiece of our algorithm for computing the image of a neural network is the approximation of the input-output relation defined by the activation function \(\mu (x)\) with a quadratic expression \(g(x)= a_1\,x^2 + a_2\,x + a_3\) (see Line 7 of Algorithm 1). In this section we present multiple possibilities to obtain good approximations.

Polynomial Regression

For polynomial regression we uniformly select N samples \(x_i\) from the interval \([l,u]\) and then determine the polynomial coefficients \(a_1,a_2,a_3\) by minimizing the average squared distance between the activation function and the quadratic approximation:

$$\begin{aligned} \min _{a_1,a_2,a_3} \frac{1}{N} \sum _{i=1}^N \big (\mu (x_i) - a_1\,x_i^2 - a_2\,x_i - a_3\big )^2. \end{aligned}$$
(1)

It is well known that the optimal solution to (1) is

$$\begin{aligned} \begin{bmatrix} a_1\\ a_2\\ a_3\end{bmatrix} = A^\dagger b ~~ \text {with} ~~ A = \begin{bmatrix} x_1^2 &{} x_1 &{} 1 \\ \vdots &{} \vdots &{} \vdots \\ x_N^2 &{} x_N &{} 1 \end{bmatrix}, ~~ b = \begin{bmatrix} \mu (x_1) \\ \vdots \\ \mu (x_N) \end{bmatrix}, \end{aligned}$$

where \(A^\dagger = (A^T A)^{-1} A^T\) is the Moore-Penrose inverse of matrix A. For the numerical experiments in this paper we use \(N=10\) samples.
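A minimal sketch of this least-squares fit, mirroring the pseudoinverse formula above (the helper name and the example interval are arbitrary):

    import numpy as np

    def quad_regression(mu, l, u, N=10):
        """Quadratic least-squares fit of mu on [l, u], i.e., the minimizer of (1)."""
        xs = np.linspace(l, u, N)                       # N uniformly spaced samples
        A = np.column_stack([xs**2, xs, np.ones(N)])
        a1, a2, a3 = np.linalg.pinv(A) @ mu(xs)         # A^dagger b
        return a1, a2, a3

    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    print(quad_regression(sigmoid, -1.0, 2.0))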

Closed-Form Expression

For ReLU activations a closed-form expression for a quadratic approximation can be obtained by enforcing the conditions \(g(l) = 0\), \(g'(l) = 0\), and \(g(u) = u\). The solution to the corresponding equation system \(a_1\,l^2 + a_2\,l + a_3= 0\), \(2 a_1l + a_2= 0\), \(a_1\,u^2 + a_2\,u + a_3= u\) is

$$\begin{aligned} a_1= \frac{u}{(u-l)^2}, ~~ a_2= \frac{-2l u}{(u-l)^2}, ~~ a_3= \frac{u^2(2l-u)}{(u-l)^2} + u, \end{aligned}$$

which results in the enclosure visualized in Fig. 1. This closed-form expression is very precise if the interval \([l,u]\) is close to being symmetric with respect to the origin (\(|l| \approx |u|\)), but becomes less accurate if one bound is significantly larger than the other (\(|u| \gg |l|\) or \(|l| \gg |u|\)).
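The coefficients can be checked directly against the three defining conditions; a small sketch (function name and test interval are arbitrary):

    def relu_quad_coeffs(l, u):
        """Closed-form quadratic approximation of ReLU on [l, u] with l < 0 < u."""
        a1 = u / (u - l)**2
        a2 = -2 * l * u / (u - l)**2
        a3 = u**2 * (2 * l - u) / (u - l)**2 + u
        return a1, a2, a3

    l, u = -1.0, 2.0
    a1, a2, a3 = relu_quad_coeffs(l, u)
    g  = lambda x: a1 * x**2 + a2 * x + a3
    dg = lambda x: 2 * a1 * x + a2
    # defining conditions g(l) = 0, g'(l) = 0, g(u) = u
    assert abs(g(l)) < 1e-12 and abs(dg(l)) < 1e-12 and abs(g(u) - u) < 1e-12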

Taylor Series Expansion

For sigmoid and hyperbolic tangent activation functions a quadratic fit can be obtained using a second-order Taylor series expansion of the activation function \(\mu (x)\) at the expansion point \(x^* = 0.5(l+u)\):

$$\begin{aligned} \begin{aligned}&\mu (x) \approx \mu (x^*) + \mu '(x^*)(x - x^*) + 0.5 \, \mu ''(x^*)(x-x^*)^2 = \\&~\underbrace{0.5 \, \mu ''(x^*)}_{a_1} x^2 + \big ( \underbrace{\mu '(x^*) - \mu ''(x^*)\, x^*}_{a_2}\big ) x + \underbrace{\mu (x^*) - \mu '(x^*)x^* + 0.5 \, \mu ''(x^*) \, {x^*}^2}_{ a_3}, \end{aligned} \end{aligned}$$

where the derivatives for sigmoid activations are \(\mu '(x) = \sigma (x)(1-\sigma (x))\) and \(\mu ''(x) = \sigma (x)(1-\sigma (x))(1-2\sigma (x))\), and the derivatives for hyperbolic tangent activations are \(\mu '(x) = 1-\tanh (x)^2\) and \(\mu ''(x) = -2 \tanh (x)(1-\tanh (x)^2)\). The Taylor series expansion method is identical to the concept used in [21].
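For the sigmoid, the resulting coefficients can be computed in a few lines (a sketch; the helper name and interval are arbitrary, and the hyperbolic tangent case only differs in the derivative formulas):

    import numpy as np

    def sigmoid_taylor_coeffs(l, u):
        """Coefficients a1, a2, a3 of the second-order Taylor expansion at x* = 0.5(l+u)."""
        s  = lambda x: 1.0 / (1.0 + np.exp(-x))
        xs = 0.5 * (l + u)
        m0 = s(xs)
        m1 = m0 * (1 - m0)                  # mu'(x*)
        m2 = m0 * (1 - m0) * (1 - 2 * m0)   # mu''(x*)
        return 0.5 * m2, m1 - m2 * xs, m0 - m1 * xs + 0.5 * m2 * xs**2

    print(sigmoid_taylor_coeffs(-1.0, 2.0))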

Linear Approximation

Since a linear function represents a special case of a quadratic function, Algorithm 1 can also be used in combination with linear approximations. Such approximations are provided by the zonotope abstraction in [34]. Since closed-form expressions for the bounds \(\underline{d}\) and \(\overline{d}\) of the approximation error are already specified in [34], we can omit the error bound computation described in Sect. 3.2 in this case. For ReLU activations we obtain according to [34, Theorem 3.1]

$$\begin{aligned} a_1= 0, ~~ a_2= \frac{u}{u-l}, ~~ a_3= \frac{-u \, l}{2(u-l)}, ~~ \underline{d} = \frac{u \, l}{2(u-l)}, ~~ \overline{d} = \frac{-u \, l}{2(u-l)}, \end{aligned}$$

which results in the zonotope enclosure visualized in Fig. 1. For sigmoid and hyperbolic tangent activations we obtain according to [34, Theorem 3.2]

$$\begin{aligned} \begin{aligned}&a_1= 0,~~a_2= \min (\mu '(l),\mu '(u)),~~a_3= 0.5 (\mu (u) + \mu (l) - a_2(u+l)), \\&\underline{d} = -0.5 (\mu (u) - \mu (l) - a_2(u-l)),~~ \overline{d} = 0.5( \mu (u) - \mu (l) - a_2(u-l)), \end{aligned} \end{aligned}$$

where the derivatives of the sigmoid function and the hyperbolic tangent are specified in the paragraph above.
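The following sketch evaluates the ReLU case and checks by sampling that the approximation error indeed stays within \([\underline{d},\overline{d}]\) (function name and interval are arbitrary; the sampling check is a sanity test, not a proof):

    import numpy as np

    def relu_linear_relaxation(l, u):
        """Linear (zonotope-style) ReLU relaxation on [l, u] with l < 0 < u."""
        a2 = u / (u - l)
        a3 = -u * l / (2 * (u - l))
        d_lo = u * l / (2 * (u - l))        # note u*l < 0, so d_lo <= 0 <= d_hi
        d_hi = -u * l / (2 * (u - l))
        return a2, a3, d_lo, d_hi

    l, u = -1.0, 2.0
    a2, a3, d_lo, d_hi = relu_linear_relaxation(l, u)
    xs = np.linspace(l, u, 1001)
    d = np.maximum(xs, 0.0) - (a2 * xs + a3)
    assert d.min() >= d_lo - 1e-12 and d.max() <= d_hi + 1e-12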

Fig. 3. Exemplary neural network with ReLU activations (left) and the corresponding image enclosure computed with polynomial zonotopes (right), where we use the approximation \(g(x) = 0.25\,x^2 + 0.5\,x + 0.25\) for the red neuron and the approximation \(g(x) = x\) for the blue neuron. (Color figure online)

We observed from experiments that for ReLU activations the closed-form expression usually results in a tighter enclosure of the image than polynomial regression. For sigmoid and hyperbolic tangent activations, on the other hand, polynomial regression usually performs better than the Taylor series expansion. It is also possible to combine several of the methods described above by executing them in parallel and selecting the one that results in the smallest approximation error \([\underline{d},\overline{d}]\). Since the linear approximation does not increase the number of generators, it represents an alternative to order reduction when dealing with deep neural networks. Here, developing a method that automatically decides for which layers to use a linear and for which a quadratic approximation is a promising direction for future research.

3.2 Bounding the Approximation Error

To obtain a sound enclosure we need to compute the difference between the activation function \(\mu (x)\) and the quadratic approximation \(g(x) = a_1\,x^2 + a_2\,x + a_3\) from Sect. 3.1 on the interval \([l,u]\). In particular, this corresponds to determining

$$\begin{aligned} \underline{d} = \min _{x \in [l,u]} \underbrace{\mu (x) - a_1\,x^2 - a_2\,x - a_3}_{d(x)} ~~\text {and}~~ \overline{d} = \max _{x \in [l,u]} \underbrace{\mu (x) - a_1\,x^2 - a_2\,x - a_3}_{d(x)}. \end{aligned}$$

Depending on the type of activation function, we use different methods for this.

Rectified Linear Unit (ReLU)

For ReLU activation functions we split the interval \([l,u]\) into the two intervals [l, 0] and [0, u] on which the activation function is constant and linear, respectively. On the interval [l, 0] we have \(d(x) = - a_1\,x^2 - a_2\,x - a_3\), and on the interval [0, u] we have \(d(x) = - a_1\,x^2 + (1-a_2) \,x - a_3\). In both cases d(x) is a quadratic function whose maximum and minimum values are either located on the interval boundary or at the point \(x^*\) where the derivative of d(x) is equal to zero. The lower bound on [l, 0] is therefore given as \(\underline{d} = \min (d(l),d(x^*),d(0))\) if \(x^* \in [l,0]\) and \(\underline{d} = \min (d(l),d(0))\) if \(x^* \not \in [l,0]\), where \(x^* = -0.5\, a_2/a_1\). The upper bound as well as the bounds for [0, u] are computed in a similar manner. Finally, the overall bounds are obtained by taking the minimum and maximum of the bounds for the intervals [l, 0] and [0, u].
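A compact sketch of this case distinction (helper names are arbitrary; the usage example reuses the closed-form coefficients for l = -1, u = 2 from Sect. 3.1):

    def relu_error_bounds(a1, a2, a3, l, u):
        """Exact bounds of d(x) = relu(x) - g(x) on [l, u] with l < 0 < u."""
        def quad_bounds(p2, p1, p0, lo, hi):      # extrema of p2 x^2 + p1 x + p0 on [lo, hi]
            xs = [lo, hi]
            if p2 != 0.0 and lo <= -0.5 * p1 / p2 <= hi:
                xs.append(-0.5 * p1 / p2)         # vertex of the parabola
            vals = [p2 * x**2 + p1 * x + p0 for x in xs]
            return min(vals), max(vals)
        lo1, hi1 = quad_bounds(-a1, -a2, -a3, l, 0.0)        # relu(x) = 0 on [l, 0]
        lo2, hi2 = quad_bounds(-a1, 1.0 - a2, -a3, 0.0, u)   # relu(x) = x on [0, u]
        return min(lo1, lo2), max(hi1, hi2)

    print(relu_error_bounds(2/9, 4/9, 2/9, -1.0, 2.0))       # coefficients for l = -1, u = 2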

Sigmoid and Hyperbolic Tangent

Here our high-level idea is to sample the function d(x) at points \(x_i\) with distance \(\varDelta x\) distributed uniformly over the interval \([l,u]\). From rough bounds for the derivative \(d'(x)\) we can then deduce how much the function value can change at most between two sample points, which yields tight bounds \(\overline{d}_b \ge \overline{d}\) and \(\underline{d}_b \le \underline{d}\). In particular, we want to choose the sampling rate \(\varDelta x\) such that the bounds \(\overline{d}_b,\underline{d}_b\) comply with a user-defined precision \(\delta > 0\):

$$\begin{aligned} \overline{d} + \delta \ge \overline{d}_b \ge \overline{d} ~~ \text {and} ~~ \underline{d} -\delta \le \underline{d}_b \le \underline{d}. \end{aligned}$$
(2)

We observe that for both the sigmoid and the hyperbolic tangent the derivative is globally bounded by \(\mu '(x) \in [0,\overline{\mu }]\), where \(\overline{\mu } = 0.25\) for the sigmoid and \(\overline{\mu } = 1\) for the hyperbolic tangent. In addition, it holds that the derivative of the quadratic approximation \(g(x) = a_1\,x^2 + a_2\,x + a_3\) is bounded by \(g'(x) \in [\underline{g},\overline{g}]\) on the interval \([l,u]\), where \(\underline{g} = \min (2a_1l + a_2,2a_1u + a_2)\) and \(\overline{g} = \max (2a_1l + a_2,2a_1u + a_2)\). As a consequence, the derivative of the difference \(d(x) = \mu (x) - g(x)\) is bounded by \(d'(x) \in [-\overline{g},\overline{\mu } - \underline{g}]\). The value of d(x) can therefore change by at most \(\pm \varDelta d\) between two samples \(x_i\) and \(x_{i+1}\), where \(\varDelta d = \varDelta x \max (|-\overline{g}|,|\overline{\mu } -\underline{g}|)\). To satisfy (2) we require \(\varDelta d \le \delta \), so that we have to choose the sampling rate as \(\varDelta x \le \delta / \max (|-\overline{g}|,|\overline{\mu } -\underline{g}|)\). Finally, the bounds are computed by taking the maximum and minimum of all samples: \(\overline{d}_b = \max _{i} d(x_i) + \delta \) and \(\underline{d}_b = \min _{i} d(x_i) - \delta \). For our experiments we use a precision of \(\delta = 0.001\).
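A direct sketch of this sampling scheme (names are ours; mu_bar must be 0.25 for the sigmoid and 1 for the hyperbolic tangent, and the quadratic coefficients are assumed to come from one of the fits in Sect. 3.1; the example values are purely illustrative):

    import numpy as np

    def smooth_error_bounds(mu, a1, a2, a3, l, u, mu_bar, delta=0.001):
        """Bounds on d(x) = mu(x) - g(x) over [l, u] via sampling, per (2)."""
        g_lo = min(2 * a1 * l + a2, 2 * a1 * u + a2)     # bounds of g'(x) on [l, u]
        g_hi = max(2 * a1 * l + a2, 2 * a1 * u + a2)
        slope = max(abs(-g_hi), abs(mu_bar - g_lo))      # bound on |d'(x)|
        n = max(2, int(np.ceil((u - l) * slope / delta)) + 1)
        xs = np.linspace(l, u, n)                        # spacing Delta x <= delta / slope
        d = mu(xs) - (a1 * xs**2 + a2 * xs + a3)
        return d.min() - delta, d.max() + delta          # [lower bound, upper bound]

    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    a1, a2, a3 = 0.04, 0.22, 0.5      # some quadratic fit on [-1, 2] (values illustrative)
    print(smooth_error_bounds(sigmoid, a1, a2, a3, -1.0, 2.0, mu_bar=0.25))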

4 Neural Network Controlled Systems

Reachable sets for neural network controlled systems can be computed efficiently by combining our novel image enclosure approach for neural networks with a reachability algorithm for nonlinear systems. We consider general nonlinear systems

$$\begin{aligned} \dot{x} (t) = f\big (x(t),u_c(x(t),t),w(t)\big ), \end{aligned}$$
(3)

where \(x \in \mathbb {R}^n\) is the system state, \(u_c:~\mathbb {R}^n\times \mathbb {R}\rightarrow \mathbb {R}^m\) is a control law, \(w(t) \in \mathcal {W} \subset \mathbb {R}^r\) is a vector of uncertain disturbances, and \(f:~\mathbb {R}^n \times \mathbb {R}^m \times \mathbb {R}^r\rightarrow \mathbb {R}^n\) is a Lipschitz continuous function. For neural network controlled systems the control law \(u_c(x(t),t)\) is given by a neural network. Since neural network controllers are usually realized as digital controllers, we consider the sampled-data case where the control input is only updated at discrete times \(t_0,t_0+\varDelta t,t_0 + 2\varDelta t, \dots , t_F\) and kept constant in between. Here, \(t_0\) is the initial time, \(t_F\) is the final time, and \(\varDelta t\) is the sampling rate. Without loss of generality, we assume from now on that \(t_0 = 0\) and \(t_F\) is a multiple of \(\varDelta t\). The reachable set is defined as follows:

Definition 3

(Reachable set) Let \(\xi (t,x_0,u_c(\cdot ),w(\cdot ))\) denote the solution to (3) for initial state \(x_0 = x(0)\), control law \(u_c(\cdot )\), and the disturbance trajectory \(w(\cdot )\). The reachable set for an initial set \(\mathcal {X}_0 \subset \mathbb {R}^n\) and a disturbance set \(\mathcal {W} \subset \mathbb {R}^r\) is

$$\begin{aligned} \mathcal {R}(t) := \big \{ \xi (t,x_0,u_c(\cdot ),w(\cdot )) ~\big |~ x_0 \in \mathcal {X}_0, \forall t^* \in [0,t]:~ w(t^*) \in \mathcal {W} \big \}. \end{aligned}$$

Since the exact reachable set cannot be computed for general nonlinear systems, we compute a tight enclosure instead. We exploit that the control input is piecewise constant, so that the reachable set for each control cycle can be computed using the extended system

$$\begin{aligned} \begin{bmatrix} \dot{x}(t) \\ \dot{u}(t) \end{bmatrix} = \begin{bmatrix} f(x(t),u(t),w(t)) \\ \textbf{0} \end{bmatrix} \end{aligned}$$
(4)

together with the initial set \(\mathcal {X}_0 \times \mathcal {Y}\), where \(\mathcal {Y}\) is the image of \(\mathcal {X}_0\) through the neural network controller. The overall algorithm is specified in Algorithm 2. Its high-level concept is to loop over all control cycles, where in each cycle we first compute the image of the current reachable set through the neural network controller in Line 3. Next, the image is combined with the reachable set using the Cartesian product in Line 4. This yields the initial set for the extended system in (4), for which we compute the reachable set \(\widehat{\mathcal {R}}(t_{i+1})\) at time \(t_{i+1}\) as well as the reachable set \(\widehat{\mathcal {R}}(\tau _i)\) for the time interval \(\tau _i\) in Line 6. While it is possible to use arbitrary reachability algorithms for nonlinear systems, we apply the conservative polynomialization algorithm [2] since it performs especially well in combination with polynomial zonotopes. Finally, in Line 7, we project the reachable set back to the original system dimensions.

Algorithm 2
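For intuition only, the sketch below simulates single trajectories of the sampled-data closed loop in (3): the controller is evaluated once per control cycle and its output is held constant until the next sampling instant, which is exactly the structure Algorithm 2 exploits. It uses Euler integration and a hypothetical linear controller as a stand-in for the neural network; it does not perform the set-based reachability analysis with polynomial zonotopes.

    import numpy as np

    def simulate(f, controller, x0, t_final, dt_ctrl, dt_sim=1e-3, w=lambda t: 0.0):
        """Sampled-data closed loop: piecewise-constant control input, Euler steps."""
        x, t, traj = np.asarray(x0, float), 0.0, [np.asarray(x0, float)]
        while t < t_final - 1e-12:
            u = controller(x)                        # control input for one cycle
            t_next = min(t + dt_ctrl, t_final)
            while t < t_next - 1e-12:                # zero-order hold between updates
                x = x + dt_sim * f(x, u, w(t))
                t += dt_sim
            traj.append(x.copy())
        return np.array(traj)

    # toy example: scalar plant x' = u + w with a linear stand-in controller
    f = lambda x, u, w: u + w
    controller = lambda x: -0.5 * x
    print(simulate(f, controller, [1.0], t_final=2.0, dt_ctrl=0.5)[-1])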

5 Operations on Polynomial Zonotopes

Algorithm 1 and Algorithm 2 both require some special operations on polynomial zonotopes, the implementation of which we present now. Given a polynomial zonotope \(\mathcal{P}\mathcal{Z} = \langle c,G,G_I,E \rangle _{PZ} \subset \mathbb {R}^n\), a matrix \(A \in \mathbb {R}^{o \times n}\), a vector \(b \in \mathbb {R}^{o}\), and an interval \(\mathcal {I} = [l,u] \subset \mathbb {R}^n\), the affine map and the Minkowski sum with an interval are given as

$$\begin{aligned}&A \, \mathcal{P}\mathcal{Z} \oplus b = \langle Ac + b, AG, AG_I, E\rangle _{PZ} \end{aligned}$$
(5)
$$\begin{aligned}&\mathcal{P}\mathcal{Z} \oplus \mathcal {I} = \langle c + 0.5(u+l), G, [G_I~0.5\,\text {diag}(u-l)],E\rangle _{PZ}, \end{aligned}$$
(6)

which follows directly from [25, Prop. 8], [25, Prop. 9], and [1, Prop. 2.1]. For the Cartesian product used in Line 4 of Algorithm 2 we can exploit the special structure of the sets to calculate the Cartesian product of two polynomial zonotopes \(\mathcal{P}\mathcal{Z}_1 = \langle c_1,G_1, G_{I,1},E_1 \rangle _{PZ} \subset \mathbb {R}^n\) and \(\mathcal{P}\mathcal{Z}_2 = \langle c_2,[G_2~\widehat{G}_2], [G_{I,2}~\widehat{G}_{I,2}],[E_1 ~E_2] \rangle _{PZ} \subset \mathbb {R}^{o}\) as

$$\begin{aligned} \mathcal{P}\mathcal{Z}_1 \times \mathcal{P}\mathcal{Z}_2 = \bigg \langle \begin{bmatrix} c_1 \\ c_2 \end{bmatrix},\begin{bmatrix} G_1 &{} \textbf{0} \\ G_2 &{} \widehat{G}_2 \end{bmatrix},\begin{bmatrix} G_{I,1} &{} \textbf{0} \\ G_{I,2} &{} \widehat{G}_{I,2} \end{bmatrix},[E_1 ~ E_2] \bigg \rangle _{PZ}. \end{aligned}$$
(7)

In contrast to [25, Prop. 11], this implementation of the Cartesian product explicitly preserves dependencies between the two sets, which is possible since both polynomial zonotopes have identical dependent factors. Computing the exact bounds of a polynomial zonotope in Line 6 of Algorithm 1 would be computationally infeasible, especially since this has to be done for each neuron in the network. We therefore compute a tight enclosure of the bounds instead, which can be done very efficiently:

Proposition 1

(Interval enclosure) Given a polynomial zonotope \(\mathcal{P}\mathcal{Z}= \langle c,G, G_I,E \rangle _{PZ} \subset \mathbb {R}^n\), an enclosing interval can be computed as

$$\begin{aligned} \mathcal {I} = [c + g_1 - g_2 - g_3 - g_4, c + g_1 + g_2 + g_3 + g_4] \supseteq \mathcal{P}\mathcal{Z}\end{aligned}$$

with

$$\begin{aligned} \begin{aligned}&g_1 = 0.5 \sum _{i \in \mathcal {H}} G_{(\cdot ,i)}, ~g_2 = 0.5 \sum _{i \in \mathcal {H}} |G_{(\cdot ,i)}|,~ g_3 = \sum _{i \in \mathcal {K}} |G_{(\cdot ,i)}|,~ g_4 = \sum _{i=1}^q |G_{I(\cdot ,i)}| \\&~~~~~~~~ \mathcal {H} = \bigg \{ i~ \bigg | ~ \prod _{j=1}^p \big (1-(E_{(j,i)} \, \textrm{mod}~2)\big ) = 1 \bigg \} , ~~ \mathcal {K} = \{1,\dots ,h \} \setminus \mathcal {H}, \end{aligned} \end{aligned}$$

where \(x \, \textrm{mod} \,y\), \(x,y \in \mathbb {N}_0\) is the modulo operation and \(\setminus \) denotes the set difference.

Proof 1

We first enclose the polynomial zonotope by a zonotope \(\mathcal {Z} \supseteq \mathcal{P}\mathcal{Z}\) according to [25, Prop. 5], and then compute an interval enclosure \(\mathcal {I} \supseteq \mathcal {Z}\) of this zonotope according to [1, Prop. 2.2]. \(\square \)
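Proposition 1 amounts to a few vectorized sums; a sketch (names are ours), checked on the polynomial zonotope of Example 1:

    import numpy as np

    def interval_enclosure(c, G, GI, E):
        """Interval enclosure of <c, G, GI, E>_PZ according to Prop. 1."""
        even = np.all(E % 2 == 0, axis=0)            # set H: all exponents even
        g1 = 0.5 * G[:, even].sum(axis=1)
        g2 = 0.5 * np.abs(G[:, even]).sum(axis=1)
        g3 = np.abs(G[:, ~even]).sum(axis=1)         # set K
        g4 = np.abs(GI).sum(axis=1)
        return c + g1 - g2 - g3 - g4, c + g1 + g2 + g3 + g4

    c  = np.array([4.0, 4.0])                        # polynomial zonotope from Example 1
    G  = np.array([[2.0, 1.0, 2.0], [0.0, 2.0, 2.0]])
    GI = np.array([[1.0], [0.0]])
    E  = np.array([[1, 0, 3], [0, 1, 1]])
    print(interval_enclosure(c, G, GI, E))           # ([-2., 0.], [10., 8.])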

The core operation for Algorithm 1 is the computation of the image through a quadratic function. While it is possible to obtain the exact image by introducing new dependent factors, we compute a tight enclosure for computational reasons:

Proposition 2

(Image quadratic function) Given a polynomial zonotope \(\mathcal{P}\mathcal{Z} = \langle c,G, G_I,E\rangle _{PZ} \subset \mathbb {R}\) and a quadratic function \(g(x) = a_1\,x^2 + a_2\,x + a_3\) with \(a_1,a_2,a_3,x \in \mathbb {R}\), the image of \(\mathcal{P}\mathcal{Z}\) through \(g(x)\) can be tightly enclosed by

$$\begin{aligned} \big \{ g(x)~\big |~ x \in \mathcal{P}\mathcal{Z}\big \} \subseteq \langle c_q,G_q,G_{I,q},E_q \rangle _{PZ} \end{aligned}$$

with

$$\begin{aligned} \begin{aligned}&c_q = a_1c^2 + a_2c + a_3+ 0.5 \, a_1\sum _{i=1}^q G_{I(\cdot ,i)}^2, ~~ G_q = \big [(2 a_1c + a_2)G ~~ a_1\widehat{G} \big ], \\&E_q = \big [E ~~ \widehat{E} \big ], ~~ G_{I,q} = \big [ (2 a_1c + a_2)G_I ~~ 2a_1\overline{G} ~~a_1\check{G} \big ], \end{aligned} \end{aligned}$$
(8)

where

$$\begin{aligned} \begin{aligned}&\widehat{G} = \big [G^2 ~~ 2\,\widehat{G}_1 ~~ \dots ~~ 2\,\widehat{G}_{h-1}\big ], ~~ \widehat{E} = \big [2 \, E ~~ \widehat{E}_1 ~~ \dots ~~ \widehat{E}_{h-1}\big ], \\&\widehat{G}_i = \big [G_{(i)} G_{(i+1)} ~~ \dots ~~ G_{(i)} G_{(h)}\big ], ~~ i = 1,\dots , h-1, \\&\widehat{E}_i = \big [E_{(\cdot ,i)} + E_{(\cdot , i+1)} ~~ \dots ~~ E_{(\cdot , i)} + E_{(\cdot ,h)}\big ], ~~ i = 1,\dots , h-1, \\&\overline{G} = \big [G_{(1)} G_I ~~ \dots ~~ G_{(h)} G_I \big ], ~~ \check{G} = \big [0.5 \, G_I^2 ~~ 2\,\check{G}_1 ~~ \dots ~~ 2\,\check{G}_{q-1} \big ], \\&\check{G}_i = \big [ G_{I(i)} G_{I(i+1)} ~~ \dots ~~ G_{I(i)} G_{I(q)} \big ], ~~ i = 1,\dots ,q-1, \end{aligned} \end{aligned}$$
(9)

and the squares in \(G^2\) as well as \(G_I^2\) are interpreted elementwise.

Proof 2

The proof is provided in Appendix A.
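For the one-dimensional case, the construction in (8)-(9) translates almost literally into code. The sketch below (names are ours) builds \(c_q, G_q, G_{I,q}, E_q\) for a given quadratic; as a usage example it applies the quadratic \(g(x) = 0.25\,x^2 + 0.5\,x + 0.25\) from Fig. 3 to the first dimension of Example 1.

    import numpy as np

    def quad_image_pz(c, G, GI, E, a1, a2, a3):
        """Enclosure (8)-(9) of the image of a 1-D PZ <c, G, GI, E> through g(x)."""
        h, q = G.shape[0], GI.shape[0]
        c_q = a1 * c**2 + a2 * c + a3 + 0.5 * a1 * np.sum(GI**2)
        # squares and pairwise products of the dependent generators (G_hat, E_hat)
        Ghat, Ehat = [G**2], [2 * E]
        for i in range(h - 1):
            Ghat.append(2 * G[i] * G[i + 1:])
            Ehat.append(E[:, [i]] + E[:, i + 1:])
        Ghat, Ehat = np.concatenate(Ghat), np.concatenate(Ehat, axis=1)
        G_q = np.concatenate([(2 * a1 * c + a2) * G, a1 * Ghat])
        E_q = np.concatenate([E, Ehat], axis=1)
        # mixed and purely independent products (G_bar, G_check)
        Gbar = np.concatenate([G[i] * GI for i in range(h)])
        Gcheck = [0.5 * GI**2]
        for i in range(q - 1):
            Gcheck.append(2 * GI[i] * GI[i + 1:])
        Gcheck = np.concatenate(Gcheck)
        G_Iq = np.concatenate([(2 * a1 * c + a2) * GI, 2 * a1 * Gbar, a1 * Gcheck])
        return c_q, G_q, G_Iq, E_q

    c, G, GI = 4.0, np.array([2.0, 1.0, 2.0]), np.array([1.0])
    E = np.array([[1, 0, 3], [0, 1, 1]])
    c_q, G_q, G_Iq, E_q = quad_image_pz(c, G, GI, E, a1=0.25, a2=0.5, a3=0.25)
    print(c_q, G_q, G_Iq, E_q.shape)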

Fig. 4. Image enclosures computed with zonotopes (red), star sets (green), Taylor models (purple), and polynomial zonotopes (blue) for randomly generated neural networks with ReLU activations (left), sigmoid activations (middle), and hyperbolic tangent activations (right). The exact image is shown in gray. (Color figure online)

6 Numerical Examples

We now demonstrate the performance of our approach for image computation, open-loop neural network verification, and reachability analysis of neural network controlled systems. If not stated otherwise, computations are carried out in MATLAB on a 2.9GHz quad-core i7 processor with 32GB memory. We integrated our implementation into CORA [3] and published a repeatability package.

Image Enclosure

First, we demonstrate how our approach captures the non-convexity of the image through a neural network. For visualization purposes we use the deliberately simple example of randomly generated neural networks with two inputs, two outputs, and one hidden layer consisting of 50 neurons. The initial set is \(\mathcal {X}_0 = [-1,1] \times [-1,1]\). We compare our polynomial-zonotope-based approach with the zonotope abstraction in [34], the star set approach in [40] using the triangle relaxation, and the Taylor model abstraction in [21]. While our approach and the zonotope abstraction are applicable to all types of activation functions, the star set approach is restricted to ReLU activations and the Taylor model abstraction is limited to sigmoid and hyperbolic tangent activations. The resulting image enclosures are visualized in Fig. 4. While using zonotopes or star sets only yields a convex over-approximation, polynomial zonotopes are able to capture the non-convexity of the image and therefore provide a tighter enclosure. While Taylor models also capture the non-convexity of the image to some extent, they are less precise than polynomial zonotopes, which can be explained as follows: 1) The zonotopic remainder of polynomial zonotopes prevents the rapid remainder growth observed for Taylor models, and 2) the quadratic approximation obtained with polynomial regression used for polynomial zonotopes is usually more precise than the Taylor series expansion used for Taylor models.

Open-Loop Neural Network Verification

For open-loop neural network verification the task is to verify that the image of the neural network satisfies certain specifications that are typically given by linear inequality constraints. We examine the ACAS Xu benchmark from the 2021 and 2022 VNN competition [4, 29] originally proposed in [22, Sec. 5], which features neural networks that provide turn advisories for an aircraft to avoid collisions. All networks consist of 6 hidden layers with 50 ReLU neurons per layer. For a fair comparison we performed the evaluation on the same machine that was used for the VNN competition. To compute the image through the neural networks with polynomial zonotopes, we apply a quadratic approximation obtained by polynomial regression for the first two layers, and a linear approximation in the remaining layers. Moreover, we recursively split the initial set to obtain a complete verifier. The comparison with the other tools that participated in the VNN competition shown in Table 1 demonstrates that for some verification problems polynomial zonotopes are about as fast as the best tool in the competition.

Table 1. Computation times in seconds for different verification tools on a small but representative excerpt of network-specification combinations of the ACAS Xu benchmark. The symbol - indicates that the tool failed to verify the specification.
Table 2. Computation times in seconds for reachability analysis of neural network controlled systems considering different tools and benchmarks. The dimension, the number of hidden layers, and the number of neurons in each layer are specified in parentheses for each benchmark, where \(a = 100\), \(b = 5\) for ReLU activation functions, and \(a = 20\), \(b = 3\) otherwise. The symbol - indicates that the tool failed to verify the specification.

Neural Network Controlled Systems

The main application of our approach is reachability analysis of neural network controlled systems, for which we now compare the performance to other state-of-the-art tools. For a fair comparison we focus on published results for which the authors of the tools tuned the algorithm settings by themselves. In particular, we examine the benchmarks from [33] featuring ReLU neural network controllers, and the benchmarks from [21] containing sigmoid and hyperbolic tangent neural network controllers. The goal for all benchmarks is to verify that the system reaches a goal set or avoids an unsafe region. As the computation times shown in Table 2 demonstrate, our polynomial-zonotope-based approach is for all but two benchmarks significantly faster than all other state-of-the-art tools, mainly since it avoids all major bottlenecks observed for the other tools: The polynomial approximations of the overall network used by Sherlock and ReachNN* are often imprecise, JuliaReach loses dependencies when enclosing Taylor models by zonotopes, Verisig is quite slow since the nonlinear system used to represent the neural network is high-dimensional, and Verisig 2.0 and POLAR suffer from the rapid remainder growth observed for Taylor models.

7 Conclusion

We introduced a novel approach for computing tight non-convex enclosures of images through neural networks with ReLU, sigmoid, and hyperbolic tangent activation functions. Since we represent sets with polynomial zonotopes, all required calculations can be realized using simple matrix operations only, which makes our algorithm very efficient. While our proposed approach can also be applied to open-loop neural network verification, its main application is reachability analysis of neural network controlled systems. There, polynomial zonotopes enable the preservation of dependencies between the reachable set and the set of control inputs, which results in very tight enclosures of the reachable set. As we demonstrated on various numerical examples, our polynomial-zonotope-based approach consequently outperforms all other state-of-the-art methods for reachability analysis of neural network controlled systems.