1 Introduction

Deep neural networks (DNNs) have become one of the most powerful techniques for tackling challenging and complex problems such as image processing [15] and natural language translation [9, 16] due to their ability to learn from large data sets. Recently, the power of DNNs has inspired a new generation of intelligent autonomy that makes use of DNN-based learning-enabled components, such as autonomous vehicles [5] and air traffic collision avoidance systems [11]. Although utilizing DNNs is a promising approach, assuring the safety of autonomous applications containing neural network components is difficult because DNNs usually have complex characteristics and behavior that are generally unpredictable. Notably, it has been shown that well-trained DNNs may not be robust and are easily fooled by slight changes in the input [18]. Several recent incidents in autonomous driving (e.g., Tesla and Uber) raise an urgent need for techniques and tools that can formally verify the safety and robustness of DNNs before they are deployed in safety-critical applications.

Safety verification and robustness certification of DNNs have attracted significant attention from different communities such as machine learning [1, 2, 13, 17, 20, 25, 26, 31], formal methods [6, 10, 12, 19, 23, 28,29,30], and security [7, 24, 25]; a recent survey of the area is available [27]. Approaches to analyzing the behavior of a DNN can broadly be categorized into exact and over-approximate analyses. For the exact analysis, the SMT-based [12] and polyhedron-based [23, 28] approaches are notable representatives. For the over-approximate analysis, the mixed-integer linear program (MILP) [6], interval arithmetic [24, 25], zonotope [20], input partition [30], linearization [26], and abstract-domain [21] based approaches are fast and efficient. While the over-approximate analysis is usually faster and more scalable than the exact analysis, it guarantees only the soundness of the result. In contrast, the exact analysis is usually more time-consuming and less scalable, but it guarantees both soundness and completeness [12]. Although the over-approximate analysis is fast and scalable, it is unclear how conservative the over-approximation is, since the exact result is not available for comparison. Importantly, if an over-approximation approach is too conservative for neural networks of small or medium size, it will potentially produce extremely conservative results for DNNs with many layers and thousands of neurons, since the over-approximation error accumulates quickly over the layers. Therefore, a scalable, exact reachability analysis is crucial not only for formal verification of DNNs, but also for estimating the conservativeness of current and upcoming over-approximation approaches.

In this paper, we propose a fast and scalable approach for the exact and over-approximate reachability analysis of DNNs with ReLU activation functions using the concept of star sets [3], or shortly “stars”. The star representation fits the reachability analysis of DNNs well due to the following essential characteristics: (1) it is an efficient (exact) representation of large input sets; (2) affine mapping operations are fast and cheap; (3) intersections with half-spaces and emptiness checks are inexpensive. By utilizing stars, we avoid the expensive affine mapping operation of the polyhedron-based approach [23] and thus reduce the verification time significantly. Our approach performs reachability analysis for feedforward DNNs layer-by-layer. In the case of exact analysis, the output reachable set of each layer is a union of stars. Based on this observation, the star-based exact reachability algorithm can naturally be designed for efficient execution on multi-core platforms, where each layer can handle multiple input sets at the same time. In the case of over-approximate analysis, the output reachable set of each layer is a single star, which is constructed by a point-wise over-approximation of the reachable set at all neurons of the layer.

We evaluate the proposed algorithms in comparison with the polyhedron approach [23], Reluplex [12], and the zonotope [20] and abstract-domain [21] approaches on safety verification of the ACAS Xu neural networks [11] and robustness certification of an image classification DNN. The experimental results show that our exact reachability algorithm is on average 19 times faster than Reluplex when running on a multi-core platform and \({>}70\) times faster than the polyhedron approach. Notably, our exact algorithm can visualize the precise behavior of the ACAS Xu networks and can construct the complete set of counterexample inputs when a safety property is violated. Our over-approximate reachability algorithm is on average 118 times faster than Reluplex. It successfully verifies many safety properties of the ACAS Xu networks for which the zonotope and abstract-domain approaches fail due to their large over-approximation errors. Our over-approximate reachability algorithm also provides a better robustness certification for an image classification DNN in comparison with the zonotope and abstract-domain approaches. In summary, the main contributions of this paper are: (1) novel, fast and scalable methods for the exact and over-approximate reachability analysis of DNNs; (2) an implementation of the proposed methods in the NNV toolbox, which is available online for evaluation and comparison; (3) a thorough evaluation of the new methods via real-world case studies.

2 Preliminaries

2.1 Machine Learning Models and Symbolic Verification Problem

A feed-forward neural network (FNN) consists of an input layer, an output layer, and multiple hidden layers, in which each layer comprises neurons that are connected to the neurons of the preceding layer by weighted edges. Given an input vector, the output of an FNN is determined by three components: the weight matrices \(W_k\), representing the weighted connections between neurons of two consecutive layers \(k-1\) and k; the bias vectors \(b_k\) of each layer; and the activation function f applied at each layer. Mathematically, the output of neuron i is defined by:

$$\begin{aligned} y_i = f(\varSigma _{j=1}^n\omega _{ij}x_j + b_i), \end{aligned}$$

where \(x_{j}\) is the \(j^{th}\) input of the \(i^{th}\) neuron, \(\omega _{ij}\) is the weight from the \(j^{th}\) input to the \(i^{th}\) neuron, and \(b_i\) is the bias of the \(i^{th}\) neuron. In this paper, we are interested in FNNs with the ReLU activation function defined by \(ReLU(x) = max(0, x)\).

Definition 1

(Reachable Set of FNN). Given a bounded convex polyhedron input set defined as \(\mathcal {I} \triangleq \{x~|~Ax \le b, x \in \mathbb {R}^n \}\), and a k-layer feed-forward neural network \(F \triangleq \{L_1, \cdots , L_k\}\), the reachable set \(F(\mathcal {I}) = \mathcal {R}_{L_k}\) of the neural network F corresponding to the input set \(\mathcal {I}\) is defined incrementally by:

$$\begin{aligned} \begin{aligned} \mathcal {R}_{L_{1}}&\triangleq \{y_1~|~y_1 = ReLU(W_1 x + b_1),~ x \in \mathcal {I}\}, \\ \mathcal {R}_{L_{2}}&\triangleq \{y_2~|~y_2 = ReLU(W_2 y_1 + b_2),~ y_1 \in \mathcal {R}_{L_1}\}, \\&\vdots \\ \mathcal {R}_{L_k}&\triangleq \{y_k ~|~ y_k = ReLU(W_k y_{k-1} + b_k),~y_{k-1} \in \mathcal {R}_{L_{k-1}} \}, \\ \end{aligned} \end{aligned}$$

where \(W_k\) and \(b_k\) are the weight matrix and bias vector of the \(k^{th}\) layer \(L_k\), respectively. The reachable set \(\mathcal {R}_{L_k}\) contains all outputs of the neural network corresponding to all input vectors x in the input set \(\mathcal {I}\).
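To fix notation, the following minimal Python sketch computes the point-wise output of Definition 1 for a single input vector; it is purely illustrative (the function name and the toy weights are ours, and this is not the NNV implementation).

```python
import numpy as np

def fnn_output(weights, biases, x):
    """Point-wise forward pass of Definition 1: y_k = ReLU(W_k y_{k-1} + b_k)."""
    y = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):          # ReLU applied at every layer
        y = np.maximum(0.0, W @ y + b)
    return y

# Toy 2-layer network on a single input vector (weights are made up).
W1, b1 = np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([0.0, -1.0])
W2, b2 = np.array([[1.0, 1.0]]), np.array([0.5])
print(fnn_output([W1, W2], [b1, b2], [1.0, 2.0]))   # -> [4.]
```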

Definition 2

(Safety Verification of FNN). Given a k-layer feed-forward neural network F and a safety specification \(\mathcal {S}\) defined as a set of linear constraints on the neural network outputs, \(\mathcal {S} \triangleq \{y_k~|~Cy_k \le d \}\), the neural network F is said to be safe with respect to the input set \(\mathcal {I}\), written \(F(\mathcal {I}) \vDash \mathcal {S}\), if and only if \(\mathcal {R}_{L_k} \cap \lnot \mathcal {S} = \emptyset \), where \(\mathcal {R}_{L_k}\) is the reachable set of the neural network with the input set \(\mathcal {I}\) and \(\lnot \) denotes logical negation. Otherwise, the neural network is said to be unsafe, written \(F(\mathcal {I}) \nvDash \mathcal {S}\).

2.2 Generalized Star Sets

Definition 3

(Generalized Star Set [3]). A generalized star set (or simply star) \(\varTheta \) is a tuple \(\langle c, V, P \rangle \) where \(c \in \mathbb {R}^n\) is the center, \(V = \{v_1, v_2, \cdots , v_m\}\) is a set of m vectors in \(\mathbb {R}^n\) called basis vectors, and \(P: \mathbb {R}^m \rightarrow \{ \top , \bot \}\) is a predicate. The basis vectors are arranged to form the star’s \(n \times m\) basis matrix. The set of states represented by the star is given as:

$$\begin{aligned} \llbracket \varTheta \rrbracket = \{x~|~x = c + \varSigma _{i=1}^m(\alpha _iv_i)~\text {such that}~P(\alpha _1, \cdots , \alpha _m) = \top \}. \end{aligned}$$
(1)

Sometimes we will refer to both the tuple \(\varTheta \) and the set of states \(\llbracket \varTheta \rrbracket \) as \(\varTheta \). In this work, we restrict the predicates to be a conjunction of linear constraints, \(P(\alpha ) \triangleq C\alpha \le d\) where, for p linear constraints, \(C \in \mathbb {R}^{p \times m}\), \(\alpha \) is the vector of m variables, i.e., \(\alpha = [\alpha _1, \cdots , \alpha _m]^T\), and \(d \in \mathbb {R}^{p \times 1}\). A star is an empty set if and only if the predicate \(P(\alpha )\) is infeasible.
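For concreteness, a star with a conjunctive linear predicate can be stored as a center, a basis matrix, and the predicate pair \((C, d)\). The following Python container is a minimal sketch of such a representation (field names are ours; it is not the data structure of any particular tool).

```python
import numpy as np

class Star:
    """Star set <c, V, P> with P(alpha) = (C alpha <= d):
    [[Theta]] = { c + V @ alpha | C @ alpha <= d }."""
    def __init__(self, c, V, C, d):
        self.c = np.asarray(c, dtype=float)   # center, shape (n,)
        self.V = np.asarray(V, dtype=float)   # basis matrix, shape (n, m)
        self.C = np.asarray(C, dtype=float)   # predicate matrix, shape (p, m)
        self.d = np.asarray(d, dtype=float)   # predicate bound, shape (p,)

# Example: the box [-1, 1]^2 as a star (c = 0, V = I, |alpha_i| <= 1).
box = Star(c=np.zeros(2), V=np.eye(2),
           C=np.vstack([np.eye(2), -np.eye(2)]), d=np.ones(4))
```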

Proposition 1

Any bounded convex polyhedron \(\mathcal {P} \triangleq \{x~|~Cx \le d, x \in \mathbb {R}^n \}\) can be represented as a star.

Proposition 2

(Affine Mapping of a Star). Given a star set \(\varTheta = \langle c, V, P\rangle \), the affine mapping of the star \(\varTheta \) with mapping matrix W and offset vector b, defined by \(\bar{\varTheta } = \{y~|~y=Wx + b,~x \in \varTheta \}\), is another star with the following characteristics.

$$\begin{aligned} \bar{\varTheta } = \langle \bar{c}, \bar{V}, \bar{P} \rangle , ~ \bar{c} = Wc + b, ~\bar{V} = \{Wv_1, Wv_2, \cdots , Wv_m \},~ \bar{P} \equiv P. \end{aligned}$$

Proposition 3

(Star and Half-space Intersection). The intersection of a star \(\varTheta \triangleq \langle c, V, P \rangle \) and a half-space \(\mathcal {H} \triangleq \{x~|~Hx \le g \}\) is another star with the following characteristics.

$$\begin{aligned} \begin{aligned}&\bar{\varTheta } = \varTheta \cap \mathcal {H} = \langle \bar{c}, \bar{V}, \bar{P} \rangle , ~\bar{c} = c,~\bar{V} = V, ~\bar{P} = P \wedge P^{\prime }, \\&P^{\prime }(\alpha ) \triangleq (H\times V_m) \alpha \le g - H \times c, V_m = [v_1~v_2 \cdots v_m]. \end{aligned} \end{aligned}$$
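Both propositions translate directly into cheap matrix operations on the predicate-based representation. The sketch below (reusing the Star container sketched above) is an illustrative implementation of the affine mapping and half-space intersection rules.

```python
import numpy as np

def affine_map(star, W, b):
    """Proposition 2: map the center and basis, keep the predicate."""
    W, b = np.asarray(W, dtype=float), np.asarray(b, dtype=float)
    return Star(W @ star.c + b, W @ star.V, star.C, star.d)

def intersect_halfspace(star, H, g):
    """Proposition 3: add the constraint (H V) alpha <= g - H c to the predicate."""
    H = np.atleast_2d(np.asarray(H, dtype=float))
    g = np.atleast_1d(np.asarray(g, dtype=float))
    C_new = np.vstack([star.C, H @ star.V])
    d_new = np.concatenate([star.d, g - H @ star.c])
    return Star(star.c, star.V, C_new, d_new)
```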

3 Star-Based Reachability Analysis of FNNs

3.1 Exact and Complete Analysis

Since any bounded convex polyhedron can be represented as a star (Proposition 1), we assume the input set \(\mathcal {I}\) of an FNN is a star set. From Definition 1, one can see that the reachable set of an FNN is derived layer-by-layer. Since the affine mapping of a star is also a star (Proposition 2), the core step in computing the exact reachable set of a layer with a star input set is applying the ReLU activation function to the star input set, i.e., computing \(ReLU(\varTheta ),~\varTheta = \langle c, V, P \rangle \). For a layer L with n neurons, the reachable set of the layer can be computed by executing a sequence of n stepReLU operations as follows: \(\mathcal {R}_L = ReLU_n(ReLU_{n-1}(\cdots ReLU_1(\varTheta )))\).

Fig. 1. An example of a stepReLU operation on a layer with two neurons.

The stepReLU operation on the \(i^{th}\) neuron, i.e., \(ReLU_i(\cdot )\), works as follows. First, the input star set \(\varTheta \) is decomposed into two subsets \(\varTheta _1 = \varTheta \wedge x_i \ge 0\) and \(\varTheta _2 = \varTheta \wedge x_i < 0\). Note that from Proposition 3, \(\varTheta _1\) and \(\varTheta _2\) are also stars. Assume that \(\varTheta _1 = \langle c, V, P_1 \rangle \) and \(\varTheta _2 = \langle c, V, P_2 \rangle \). Since the latter set has \(x_i < 0\), applying the ReLU activation function to the element \(x_i\) of a vector \( x = [x_1\cdots x_i~x_{i+1}\cdots x_n]^T \in \varTheta _2\) yields the new vector \(x^{\prime } = [x_1~x_2 \cdots 0~ x_{i+1} \cdots x_n]^T\). This is equivalent to mapping \(\varTheta _2\) by the matrix \(M = [e_1~e_2\cdots e_{i-1}~0~e_{i+1}\cdots e_n]\). Applying the ReLU activation function to the element \(x_i\) of a vector \(x \in \varTheta _1\) does not change the set since \(x_i \ge 0\). Consequently, the result of the stepReLU operation on the input set \(\varTheta \) at the \(i^{th}\) neuron is a union of two stars, \(ReLU_i(\varTheta ) = \langle c, V, P_1 \rangle \cup \langle Mc, MV, P_2\rangle \). A concrete example of the first stepReLU operation on a layer with two neurons is depicted in Fig. 1.

The number of stepReLU operations can be reduced if we know beforehand the ranges of all states in the input set. For example, if we know that \(x_i\) is always larger than zero, then \(ReLU_i(\varTheta ) = \varTheta \); in other words, we do not need to execute the stepReLU operation on the \(i^{th}\) neuron. Therefore, to minimize the number of stepReLU operations and the computation time, we first determine the ranges of all states in the input set, which can be done efficiently by solving n linear programming problems.
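For illustration, the range of one state over a star can be obtained from two linear programs over the predicate variables. The sketch below uses scipy.optimize.linprog and the Star container from Sect. 2.2; it assumes the star is bounded and non-empty.

```python
import numpy as np
from scipy.optimize import linprog

def state_range(star, i):
    """Bounds of x_i = c_i + V[i, :] @ alpha over {alpha | C alpha <= d}.
    Assumes the star is bounded and non-empty (two LPs per state)."""
    row = star.V[i, :]
    lo = linprog(row,  A_ub=star.C, b_ub=star.d, bounds=(None, None))
    hi = linprog(-row, A_ub=star.C, b_ub=star.d, bounds=(None, None))
    return star.c[i] + lo.fun, star.c[i] - hi.fun
```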

[Algorithm 3.1: star-based exact reachability analysis for one layer of an FNN]

The star-based exact reachability algorithm, given in Algorithm 3.1, works as follows. The layer takes the star output sets of the preceding layer as input sets \(I = [\varTheta _1,~\cdots ,~\varTheta _N]\). The main procedure in the algorithm is layerReach, which processes the input sets I in parallel. On each input element \(\varTheta _i = \langle c_i, V_i, P_i\rangle \), the main procedure maps the element with the layer weight matrix W and bias vector b, which results in a new star \(I_1 = \langle Wc_i + b, WV_i, P_i \rangle \). The reachable set of the layer corresponding to the element \(\varTheta _i\) is computed by the reachReLU sub-procedure, which executes a minimized sequence of stepReLU operations on the new star \(I_1\), i.e., iteratively calls the stepReLU sub-procedure. Note that the stepReLU sub-procedure is designed to handle multiple star input sets since the number of star sets may increase after each stepReLU operation.
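The following simplified Python sketch illustrates the stepReLU splitting and a sequential version of layerReach. It reuses the helpers sketched earlier, omits the parallelization of Algorithm 3.1, and the function names are ours, not those of the tool.

```python
import numpy as np

def step_relu_exact(star, i):
    """Exact stepReLU at neuron i: keep the x_i >= 0 part, zero out the x_i <= 0 part."""
    l, u = state_range(star, i)
    n = star.c.size
    M = np.eye(n); M[i, i] = 0.0                   # projection that zeroes x_i
    if l >= 0:
        return [star]                              # ReLU is the identity here
    if u <= 0:
        return [affine_map(star, M, np.zeros(n))]  # ReLU maps everything to x_i = 0
    e_i = np.zeros(n); e_i[i] = 1.0
    pos = intersect_halfspace(star, -e_i, 0.0)     # Theta_1: x_i >= 0
    neg = intersect_halfspace(star,  e_i, 0.0)     # Theta_2: x_i <= 0
    return [pos, affine_map(neg, M, np.zeros(n))]

def layer_reach_exact(input_stars, W, b):
    """Exact reachable set of one ReLU layer for a list of input stars (sequential)."""
    out = []
    for s in input_stars:
        work = [affine_map(s, W, b)]
        for i in range(W.shape[0]):                # one stepReLU per neuron
            work = [r for t in work for r in step_relu_exact(t, i)]
        out.extend(work)
    return out
```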

Lemma 1

The worst-case complexity of the number of stars in the reachable set of an N-neuron FNN computed by Algorithm 3.1 is \(\mathcal {O}(2^N)\).

Lemma 2

The worst-case complexity of the number of constraints of a star in the reachable set of an N-neuron FNN computed by Algorithm 3.1 is \(\mathcal {O}(N)\).

Theorem 1

(Verification complexity). Let F be an N-neuron FNN, \(\varTheta \) be a star set with p linear constraints and m variables in the predicate, and \(\mathcal {S}\) be a safety specification with s linear constraints. In the worst case, the safety verification or falsification of the neural network, \(F(\varTheta ) \models \mathcal {S}?\), is equivalent to solving \(2^N\) feasibility problems, each with \(N + p + s\) linear constraints and m variables.

Remark 1

Although in the worst case the number of stars in the reachable set of an FNN is \(2^N\), in practice the actual number of stars is usually much smaller than this worst-case bound, which enhances the applicability of the star-based exact reachability analysis to practical DNNs.

Theorem 2

(Safety and complete counter input set). Let F be an FNN, \(\varTheta = \langle c, V, P \rangle \) be a star input set, \(F(\varTheta ) = \cup _{i=1}^k~\varTheta _i,~\varTheta _i = \langle c_i, V_i, P_i \rangle \) be the reachable set of the neural network, and \(\mathcal {S}\) be a safety specification. Denote \(\bar{\varTheta }_i = \varTheta _i \cap \lnot \mathcal {S} = \langle c_i, V_i, \bar{P}_i \rangle ,~i = 1, \cdots , k\). The neural network is safe if and only if \(\bar{P}_i = \emptyset \) for all i. If the neural network violates its safety property, then the complete counter input set containing all possible inputs in the input set that lead the neural network to unsafe states is \(\mathcal {C}_{\varTheta } = \cup _{i=1}^{k} \langle c, V, \bar{P}_i \rangle ,~\bar{P}_i \ne \emptyset \).
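Under the assumption that the unsafe region \(\lnot \mathcal {S}\) is itself a conjunction of linear constraints \(Hy \le g\) (e.g., the negation of a single-constraint specification), Theorem 2 can be illustrated with the helpers sketched above: the counter input set simply attaches each non-empty unsafe predicate \(\bar{P}_i\) to the input star's center and basis. The sketch below is ours, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog

def is_empty(star):
    """A star is empty iff its predicate C alpha <= d is infeasible."""
    m = star.C.shape[1]
    res = linprog(np.zeros(m), A_ub=star.C, b_ub=star.d, bounds=(None, None))
    return not res.success

def counter_input_set(input_star, output_stars, H_unsafe, g_unsafe):
    """Theorem 2 (sketch): attach each non-empty unsafe predicate to the input star.
    Assumes the unsafe region is the polyhedron H_unsafe y <= g_unsafe and that the
    output stars come from the exact analysis (same predicate variables as the input)."""
    counters = []
    for out in output_stars:
        bad = intersect_halfspace(out, H_unsafe, g_unsafe)   # Theta_i intersect (not S)
        if not is_empty(bad):
            counters.append(Star(input_star.c, input_star.V, bad.C, bad.d))
    return counters          # network is safe iff this list is empty
```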

Fig. 2. Over-approximation of ReLU functions with different approaches.

[Algorithm 3.2: over-approximate star-based reachability analysis for one layer of an FNN]

3.2 Over-Approximate Analysis

Although the exact reachability analysis can compute the exact behavior of an FNN, the number of stars grows exponentially with the number of layers, leading to an increase in computation cost that limits scalability. In this section, we propose an over-approximate reachability algorithm for FNNs. The main benefit of this approach is that the reachable set of each layer is a single star that can be constructed efficiently by using an over-approximation of the ReLU activation function at all neurons in the layer. Importantly, our star-based over-approximate reachability algorithm is much less conservative than the zonotope-based [20] and abstract-domain-based [21] approaches in the way it approximates the ReLU activation function, as shown in Fig. 2. The zonotope-based approach [20] over-approximates the ReLU activation function by a minimal parallelogram, while the abstract-domain approach [21] over-approximates it by a triangle. Our star-based approach also over-approximates the ReLU activation function with a triangle, as in the abstract-domain approach. However, the abstract-domain approach only uses lower-bound and upper-bound constraints on the output \(y_i = ReLU(x_i)\) to avoid state-space explosion [21]; for example, in Fig. 2, these constraints are \(y_i \ge 0,~y_i \le u_i(x_i - l_i)/(u_i - l_i)\). To obtain a tighter over-approximation, our approach uses three constraints for the output \(y_i\) instead. The over-approximation rule for a single neuron is given as follows,

$$\begin{aligned} {\left\{ \begin{array}{ll} y_i = x_i &{} \text {if }l_i \ge 0 \\ y_i = 0 &{} \text {if }u_i \le 0 \\ y_i \ge 0,~y_i \le \frac{u_i(x_i - l_i)}{u_i - l_i},~y_i \ge x_i &{} \text {if }l_i < 0 \text { and }u_i > 0 \end{array}\right. } \end{aligned}$$

where \(l_i\) and \(u_i\) are the lower and upper bounds of \(x_i\), respectively.

Similar to the exact approach, the over-approximate reachable set of a layer with n neurons can be computed by executing a sequence of n approximate-stepReLU operations that apply the above over-approximation rule. The over-approximate reachability algorithm for a single layer of an FNN using star sets, given in Algorithm 3.2, works as follows. Given a star input set \(\varTheta \), the algorithm computes the affine mapping of the input set using the layer's weight matrix and bias vector. The resulting star set is the input of a sequence of n \(approximate\text {-}stepReLU\) operations. An approximate-stepReLU operation first computes the lower and upper bounds of the state variable x[i] w.r.t. the \(i^{th}\) neuron. If the lower bound is not negative (line 11), the approximate-stepReLU operation returns a new intermediate reachable set that is exactly the same as its input set. If the upper bound is not positive (line 12), the approximate-stepReLU operation returns a new intermediate reachable set that is the same as its input set except that the \(i^{th}\) state variable is zero. If the lower bound is negative and the upper bound is positive (line 13), the approximate-stepReLU operation introduces a new variable \(\alpha _{m+1}\) to capture the over-approximation of the ReLU function at the \(i^{th}\) neuron. As a result, the obtained intermediate reachable set has one more variable and three more linear constraints in the predicate compared with the corresponding input set. From this observation, we can see that in the worst case, the over-approximate reachability algorithm obtains a reachable set with \(N + m_0\) variables and \(3N + n_0\) constraints in the predicate, where \(m_0\) and \(n_0\) are the number of variables and linear constraints of the predicate of the input set, respectively, and N is the total number of neurons of the FNN. A sketch of one such operation is given below.
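The following Python sketch illustrates one approximate-stepReLU operation under the above rule, reusing the helpers from Sect. 3.1. It mirrors the three cases but is illustrative only, not the NNV implementation.

```python
import numpy as np

def step_relu_approx(star, i):
    """Approximate-stepReLU at neuron i: one new variable, three new constraints."""
    l, u = state_range(star, i)
    n, m = star.V.shape
    M = np.eye(n); M[i, i] = 0.0
    if l >= 0:                                   # case 1: y_i = x_i
        return star
    if u <= 0:                                   # case 2: y_i = 0
        return affine_map(star, M, np.zeros(n))
    # case 3: introduce alpha_{m+1} standing for y_i = ReLU(x_i)
    row = star.V[i, :]                           # x_i = c_i + row @ alpha
    V_new = np.hstack([M @ star.V, np.zeros((n, 1))]); V_new[i, m] = 1.0
    c_new = M @ star.c
    k = u / (u - l)
    C_extra = np.vstack([
        np.append(np.zeros(m), -1.0),            # y_i >= 0
        np.append(row, -1.0),                    # y_i >= x_i
        np.append(-k * row, 1.0),                # y_i <= u (x_i - l) / (u - l)
    ])
    d_extra = np.array([0.0, -star.c[i], k * (star.c[i] - l)])
    C_new = np.vstack([np.hstack([star.C, np.zeros((star.C.shape[0], 1))]), C_extra])
    return Star(c_new, V_new, C_new, np.concatenate([star.d, d_extra]))
```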

[Algorithm 3.3: reachability analysis of an FNN]

3.3 Reachability Algorithm for FNNs

The reachability analysis of an FNN is done layer-by-layer, in which the output set of the previous layer is the input set of the next layer. The reachability algorithm for an FNN is summarized in Algorithm 3.3.
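Putting the pieces together, the sketch below shows the layer-by-layer loop, reusing the functions sketched earlier: the exact mode propagates a list of stars, the over-approximate mode a single star. As before, the names and structure are ours.

```python
import numpy as np

def reach_fnn(weights, biases, input_star, exact=True):
    """Layer-by-layer reachability: the output set of one layer feeds the next."""
    if exact:
        stars = [input_star]
        for W, b in zip(weights, biases):
            stars = layer_reach_exact(stars, W, b)
        return stars                              # union of stars (exact)
    star = input_star
    for W, b in zip(weights, biases):
        star = affine_map(star, W, b)
        for i in range(W.shape[0]):
            star = step_relu_approx(star, i)
    return [star]                                 # a single over-approximating star
```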

Table 1. Safety verification results of ACAS Xu networks. Notation: TO is ‘Timeout’, Rel stands for ‘Reluplex’, Poly is the polyhedron method, UN stands for ‘Unknown’ (due to over-approximation error), VT is the verification time in seconds, and \(r^1_{ES}, r^2_{ES}\) are the verification time improvements of the exact star-based method compared with Reluplex and the polyhedron-based method; \(r_Z, r_{AD}\) and \(r_{AS}\) respectively are the verification time improvements of the zonotope-, abstract-domain- and over-approximate star-based methods compared with Reluplex. The computation time limit for the polyhedron-based method (run on the Amazon cloud) was set to 1 h, while for Reluplex it was set to 24 h.

4 Evaluation

In this section, we evaluate the proposed star-based reachability algorithms in comparison with existing state-of-the-art approaches, including the exact (sound and complete) SMT-based (Reluplex [12]) and polyhedron-based [23] approaches, as well as over-approximate approaches such as those using zonotopes [20] and abstract domains [21]. To clarify the benefits of our approach intuitively, we re-implement the zonotope- and abstract-domain-based approaches in our tool called NNV. This allows visualization of the over-approximate reachable sets of these approaches. The evaluation and comparison are done by verifying the safety of the ACAS Xu DNNs [11] and the robustness of an image classification DNN against adversarial attacks. All results presented in this section and their corresponding scripts are available online.

4.1 Safety Verification for ACAS Xu DNNs

The ACAS Xu networks are DNN-based advisory controllers that map sensor measurements to advisories in the Airborne Collision Avoidance System X [11]. The system consists of 45 DNNs which are trained to replace the traditional, memory-consuming lookup table. Each DNN, denoted by \(N_{x\_y}\), has 5 inputs, 5 outputs, and 6 hidden layers of 50 neurons each. Details about the ACAS Xu networks and their safety properties are given in the appendix [22]. The experiments in this case study are done using the Amazon Web Services Elastic Compute Cloud (EC2), on a powerful m5a.24xlarge instance with 96 cores and 384 GB of memory. The verification results are presented in Table 1. We used 90 cores for the exact reachability analysis of the ACAS Xu networks using the polyhedron- and star-based approaches, and only 1 core for the over-approximate reachability analysis approaches.

Fig. 3. Verification time reduction with parallel computing and complete counter input set construction. (Color figure online)

Verification Results and Timing Performance. Safety verification using the star-based reachability algorithms consists of two major steps. The first step constructs the whole reachable set of the network. The second step checks the intersection of the constructed reachable set with the unsafe region. The verification time (VT) in our approach is the sum of the reachable set computation time (RT) and the safety checking time (ST). The reachable set computation time dominates the verification time (on average \(95\%\)) in all cases, and the verification time varies for different properties. The details of the reachable set computation time and the safety checking time can be found in the verification results.

Exact Star-Based Method. The experimental results show that the exact star-based approach is on average \({>}70\) times faster than the polyhedron-based approach and 18.9 times faster than Reluplex when using parallel computing. Impressively, it is even 61.8 times faster than Reluplex when verifying property \(\phi _2\) on the \(N_{5\_1}\) network. This improvement comes from the fact that the star set is very efficient in affine mapping and in intersection with half-spaces, which are the crucial operations for reachable set computation and safety checking. Therefore, the exact star-based method is much more efficient than the polyhedron-based approach [23]. In addition, the exact star-based algorithm is designed and optimized (i.e., it minimizes the number of stepReLU operations) to run efficiently on multi-core platforms, while Reluplex does not exploit the power of parallel computing. Figure 3a describes the benefits of parallel computing. The figure shows that when a single core is used for a verification task, our approach takes 790.07 s, which is slightly slower than Reluplex (653 s). However, our verification time drops quickly to 80.45 s, which is 8 times faster than Reluplex, when we use 10 cores for the computation.

Zonotope-Based Method [20]. The experimental results show that the over-approximate, zonotope-based method is significantly faster than the exact methods. In some cases, it can verify the safety of a network in less than 0.1 s; for example, the zonotope-based method successfully verifies property \(\phi _3\) on the \(N_{5\_7}\) network in 0.054 s, and the corresponding reachable set is depicted in Fig. 4a. Although the zonotope-based method is time-efficient (on average 12734 times faster than Reluplex), it is unable to verify the safety of many networks due to its huge over-approximation error, i.e., if the over-approximate reachable set reaches an unsafe region, we do not know whether or not the actual reachable set reaches the unsafe region. For example, Fig. 4b depicts the reachable set obtained by the zonotope-based method for the \(N_{3\_8}\) network w.r.t. property \(\phi _3\). As shown in the figure, the obtained reachable set is too conservative and cannot be used for safety verification of the network. The main reason the zonotope approach is fast is that reachability analysis requires the lower and upper bounds of each state x[i] of all neurons in each layer. This information can be obtained straightforwardly in the zonotope method, while in the other approaches, i.e., the abstract-domain and star-based approaches, obtaining it is equivalent to solving n linear optimization problems, where n is the number of neurons in that layer. The time for solving these optimization problems increases over layers since the number of constraints in the reachable set increases. Therefore, despite its large over-approximation error, the zonotope-based method is time-efficient when dealing with large and deep neural networks.
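To see why the bounds are essentially free for zonotopes, note that a zonotope \(\{c + Ga~|~a \in [-1,1]^m\}\) attains its state bounds at \(c \pm \sum_j |G_{ij}|\), so no optimization is needed. A one-line sketch, assuming this standard generator representation (generator matrix G with factors in \([-1,1]\)):

```python
import numpy as np

def zonotope_bounds(c, G):
    """State bounds of the zonotope {c + G a | a in [-1, 1]^m}: no LP solving needed."""
    r = np.abs(G).sum(axis=1)
    return c - r, c + r
```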

Fig. 4. Conservativeness of the reachable sets obtained by different methods.

Abstract-Domain Based Method [21]. The over-approximation method using abstract domains is on average 150 times faster than Reluplex. It is also much less conservative than the zonotope-based method, as can be seen in Fig. 4. However, the reachable set computed by the abstract-domain-based method is still too conservative, which makes this approach unable to verify the safety properties of many ACAS Xu networks.

Over-Approximate Star-Based Method. The experiments show that our over-approximate star-based approach obtains tight reachable sets for many networks compared with the exact sets. Therefore, our over-approximate approach successfully verifies safety properties of most of the ACAS Xu networks. Notably, it is on average 118.4 times faster than Reluplex. Impressively, it is 434 times faster than Reluplex when verifying property \(\phi _1\) on the \(N_{1\_1}\) network. In comparison with the zonotope and abstract-domain approaches, our method is comparable in timing with the abstract-domain method and slower than the zonotope method. However, our results are much less conservative than those obtained by the zonotope and abstract-domain methods, as shown in Fig. 4. This makes our approach applicable to safety verification of many ACAS Xu networks that the zonotope and abstract-domain methods cannot verify.

Benefits of Computing the Reachable Set. The reachable sets computed by our NNV tool are useful for intuitively checking the safety properties of a network. For example, Fig. 4a describes the behavior of the \(N_{5\_7}\) network corresponding to property \(\phi _3\), which requires that the output COC is not the minimal score. From the figure, one can see that COC is not the minimal score and thus property \(\phi _3\) holds on the \(N_{5\_7}\) network. Importantly, as shown in the figure, via visualization one can intuitively observe the conservativeness of different over-approximation approaches in comparison with the exact one, which is impossible if we use ERAN, a C-Python implementation of the zonotope- and abstract-domain-based methods. Last but not least, the reachable set is useful in the case that we need to verify a set of safety properties corresponding to the same input set. In this case, once the reachable set is obtained, it can be re-used to check different safety properties without rerunning the whole verification procedure, as Reluplex does, and thus helps save a significant amount of time.

Table 2. Maximum robustness values (\(\delta _{max}\)) of image classification networks obtained with different methods, in which k is the number of hidden layers of the network, N is the total number of neurons, and Tol is the tolerance used in the search.

Complete Counter Example Input Set Construction. Another strong advantage of our approach in comparison with other existing approaches is that, in the case that a neural network violates its safety specification, our exact star-based method can construct a complete counter input set that leads the neural network to the unsafe region. The complete counter input set can be used as an adversarial input generator [4, 8] for robust training of the network. We note that finding a single counter input falsifying a safety property of a neural network can be done efficiently using only random simulation. However, constructing a complete counter input set that contains all counter inputs is very challenging because of the non-linearity of the neural network. To the best of our knowledge, our exact star-based approach is the only approach that can solve this problem. For example, assume that we want to check the following property \(\phi _{3}^{\prime } \triangleq \lnot (COC \ge 15.8 \wedge StrongRight \le 15.09)\) on the \(N_{2\_8}\) network with the same input constraints as in property \(\phi _3\). Using the available reachable set of the \(N_{2\_8}\) network, we can verify that property \(\phi _{3}^{\prime }\) is violated, with 60 of the 421 stars in the reachable set reaching the unsafe region. Using Theorem 2, we can construct a complete counter input set, which is a union of 60 stars, in 0.9893 s. This counter input set, depicted in Fig. 3b, is the part of the input set that contains all counter inputs that make the neural network unsafe.

4.2 Maximum Robustness Certification of Image Classification DNNs

Robustness certification of DNNs becomes more and more important as many safety-critical applications using image classification DNNs can be fooled easily by slightly perturbing a correctly classified input. A network is said to be \(\delta \)-locally-robust at input point x if for every \(x^{\prime }\) such that \(\left\| x - x^{\prime }\right\| _\infty \le \delta \), the network assigns the same label to x and \(x^{\prime }\). In this case study, instead of proving the robustness of a network with respect to a given robustness certification \(\delta \), we focus on finding the maximum robustness certification value \(\delta _{max}\) for which a verification method can provide a robustness guarantee for the network. We investigate this interesting problem on a set of image classification DNNs with different architectures trained (with an accuracy of \(98\%\)) on the well-known MNIST data set consisting of 60000 images of handwritten digits with a resolution of \(28 \times 28\) pixels [14]. The trained networks have 784 inputs and a single output with an expected value from 0 to 9. We find the maximum robustness certification value \(\delta _{max}\) for the networks on an image of digit one under the assumption that there is a \(\delta _{max}\)-bounded disturbance modifying the (normalized) values of the input vector x at all pixels of the image, i.e., \(|x[i] - x^{\prime }[i]| \le \delta _{max}\). The results are presented in Table 2. We note that the polyhedron and Reluplex approaches are not applicable to these networks because they cannot deal with the high-dimensional input space. The table shows that our approximate star approach produces larger robustness values (\(\delta _{max}\)) for the networks with many layers. For single-layer networks, our approach gives the same results as the zonotope [20] and abstract-domain [21] methods. The exact star method can prove that the network \(N_1\) is robust with the bounded disturbance \(\delta = 0.0058\). When \(\delta > 0.0058\), we ran into an “out of memory” issue in the parallel computation since the number of reachable sets becomes too large. The exact star method reaches the timeout (set to 1 h) when finding the maximum robustness value for the other networks.
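The reported tolerance (Tol) suggests a search over \(\delta \); a simple bisection is one natural way to obtain \(\delta_{max}\). The sketch below is our assumption about such a search, not the paper's stated procedure, and uses a black-box, sound robustness check.

```python
def max_robustness(is_robust, delta_hi, tol=1e-4):
    """Largest certified delta by bisection; is_robust(d) is assumed to be a sound
    (possibly incomplete) verifier that returns True only when robustness is proven."""
    lo, hi = 0.0, delta_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_robust(mid):
            lo = mid
        else:
            hi = mid
    return lo
```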

5 Conclusion and Future Work

We have proposed two reachability analysis algorithms for DNNs using star sets: one that is exact (sound and complete) but has scalability challenges, and one that over-approximates (sound) with better scalability. The exact algorithm can compute and visualize the exact behaviors of DNNs. The exact method is more efficient than standard polyhedra approaches, and faster than SMT-based approaches when running on multi-core platforms. The over-approximate algorithm is much faster than the exact one, and notably, it is much less conservative than recent zonotope- and abstract-domain-based approaches. Our algorithms are applicable to real-world applications, as shown in the safety verification of the ACAS Xu DNNs and the robustness certification of image classification DNNs. In future work, we will extend the proposed methods to convolutional neural networks (CNNs) and recurrent neural networks (RNNs), as well as improve scalability for other types of activation functions such as tanh and sigmoid.