1 Introduction

Machine Learning technologies such as neural networks (NN) have been used to implement advanced functionality in complex systems, including safety-critical aircraft applications. Before such systems can be deployed outside of an experimental setting, it will be necessary to show that they can meet the verification and certification requirements of the aerospace domain.

In a typical NN, much of the complexity and design information resides in its training data rather than in the actual models or code produced in the training process. One of the key principles of avionics software certification is the use of requirements-based testing along with structural coverage metrics. These activities not only demonstrate compliance with functional requirements, but are intended to expose any unintended functionality by providing a measure of completeness. However, since it is not possible to associate particular neurons or lines of code in a NN with a specific requirement, these activities cannot provide the required level of assurance [1].

Formal methods tools are being developed for NNs and may be able to address this challenge by providing a comprehensive analysis of a system over its entire input space and showing the absence of unintended behaviors. In this case study, we have used the Marabou verification tool [7] to analyze a NN that was trained to compute collision avoidance flight plans for aircraft. The main contribution of the paper is to show the effectiveness of formal methods in identifying potential safety concerns in a real NN application. In fact, this NN was flight tested in a controlled experiment with two general aviation-class airplanes, yet we were able to find a number of conditions that trigger unexpected (and potentially unsafe) actions [2]. Furthermore, we suggest ways in which formal analysis results can be incorporated to improve the training of future systems.

One of the unique aspects of this study is that it is focused on a NN trained using Reinforcement Learning (RL). In earlier work on the ACAS-Xu system for collision avoidance in small unmanned aircraft [5], the NN was trained using supervised learning based on a complete tabular specification of correct behavior, and Reluplex [6] (a precursor of Marabou) was used to verify various safety properties. Another ACAS-Xu study [3] used formal analysis tools to show the equivalence of the NN to the tabular specification. In the current study, RL was used to compute flight plans (rather than just the avoidance maneuvers produced by ACAS-Xu), and our formal analysis exposes areas in which the training process is incomplete.

Marabou is a state-of-the-art framework for verifying deep NNs. It answers queries about NN properties by transforming each query into a satisfiability problem. Currently, it supports only linear constraints on inputs and outputs. Marabou accepts three input formats: NNet, TensorFlow, and ONNX. In this case study, we exported the NN model parameters from the learning environment and encoded them in the NNet format. We used the Marabou Python interface to encode the constraints and perform the verification.
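To make this workflow concrete, the sketch below shows the general shape of a Maraboupy query: load the NNet file, bound the input variables, constrain the output, and solve. The file name is hypothetical, the bounds are illustrative, and the exact return value of solve() differs between Marabou releases; the actual scripts are in the repository linked below.

```python
# Minimal Maraboupy query sketch (file name and bounds are illustrative).
from maraboupy import Marabou

net = Marabou.read_nnet("collision_avoidance.nnet")

# Input/output variable indices. In recent Maraboupy versions these are
# lists of numpy arrays, hence the [0] and flatten().
x = net.inputVars[0].flatten()
y = net.outputVars[0].flatten()

# Example: restrict the first (normalized) input to [0, 1].
net.setLowerBound(x[0], 0.0)
net.setUpperBound(x[0], 1.0)

# Example query: can the raw output reach 0.5 or more?
# Encoded as the linear inequality -1.0 * y0 <= -0.5.
net.addInequality([y[0]], [-1.0], -0.5)

# SAT means a satisfying assignment (a counterexample to the negated
# property) exists; UNSAT means none exists in the constrained region.
# Depending on the Marabou version, solve() returns (vals, stats) or
# (exitCode, vals, stats).
result = net.solve()
print(result)
```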

The collision avoidance NN and the Marabou verification scripts are available at https://github.com/darrencofer/NFM-2023-case-study.

2 Aircraft Collision Avoidance Neural Network

We study the automated aircraft collision avoidance system described in [2]. The core of the system is a NN model pre-trained using RL in a surrogate simulation. The NN model modifies the course of the controlled airplane (ownship) to maintain a safe distance from another aircraft (intruder) and to return to the original course when it is safe to do so. The RL environment simulates various potential collision scenarios with aircraft performance similar to a Cessna 208 Caravan. The 2-D position range is [−10,000, 10,000] m \(\times \) [−10,000, 10,000] m. The heading range is (−180, 180] degrees. The aircraft speed range is [50, 70] m/s. The required minimum separation distance (MSD) is 2,000 m. The initial and goal positions are randomly generated and remain fixed during each training scenario. During an encounter, the intruder maintains a constant heading and speed, while the ownship adjusts its heading and speed. Failing to maintain the MSD results in a penalty, while returning to the original route results in a reward.
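For reference, the environment parameters stated above can be collected in one place; the sketch below is only a summary in Python, with names of our own choosing rather than those used in the training code.

```python
# Summary of the RL training environment described in [2].
# Keys and structure are illustrative; only the values come from the text.
ENV = {
    "position_range_m": (-10_000.0, 10_000.0),   # per axis, 2-D world
    "heading_range_deg": (-180.0, 180.0),        # headings lie in (-180, 180]
    "speed_range_mps": (50.0, 70.0),
    "min_separation_m": 2_000.0,                 # required MSD
    "intruder_behavior": "constant heading and speed",
    "ownship_controls": ("heading", "speed"),
    "reward": "returning to the original route",
    "penalty": "separation below the MSD",
}
```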

In the experiments, a number of policies were developed. We chose a NN that controls only the flight direction (i.e., the speed is fixed). It consists of 8 input nodes, 1 output node, and 2 hidden layers with 64 nodes each. All hidden nodes use the rectified linear unit (ReLU) activation function. The output node uses tanh as its activation function. Marabou does not support the tanh activation function, so for the verification we removed it and, since tanh is strictly monotonic and invertible, mapped any output value or range in a query back to the corresponding pre-activation input.
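For reference, the forward pass of this architecture is small enough to write out directly. The sketch below is a plain NumPy rendering with hypothetical weight and bias names (W1, b1, ...) standing in for the parameters exported from the learning environment; in the Marabou queries the final np.tanh is dropped and constraints are stated on the pre-activation value instead.

```python
import numpy as np

def policy_output(x, W1, b1, W2, b2, W3, b3):
    """Forward pass of the 8-64-64-1 policy network described above.

    x: the 8 normalized inputs; W*/b* are the exported weight matrices and
    bias vectors (hypothetical names). Returns the tanh output in (-1, 1).
    """
    h1 = np.maximum(0.0, W1 @ x + b1)    # hidden layer 1: 64 ReLU nodes
    h2 = np.maximum(0.0, W2 @ h1 + b2)   # hidden layer 2: 64 ReLU nodes
    raw = (W3 @ h2 + b3)[0]              # single pre-activation output
    return np.tanh(raw)                  # tanh output node
```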

The NN inputs are: \(\{d, d_v, d_i, v_r, \beta _i, \psi _r, \beta _g, \beta _v\}\), where d is the distance from ownship to intruder, \(d_v\) is the distance to the vector, \(d_i\) is the distance to the initial position, \(v_r\) is the relative speed, \(\beta _i\) is the angle to the intruder, \(\psi _r\) is the relative heading, \(\beta _g\) is the angle to the goal, and \(\beta _v\) is the angle to the vector. The vector is the line from the initial position to the goal. We define \(v_r = v_i/v_o - 1\), \(\psi _r = \psi _i - \psi _o\), \(\beta _v = \psi _v - \psi _o\), where \(v_i\) is the intruder speed, \(v_o\) is the ownship speed, \(\psi _i\) is the intruder heading, \(\psi _o\) is the ownship heading, and \(\psi _v\) is the vector heading. The system geometry is shown in Fig. 1.

Note that \(d_v = d_i\sin \beta _g\). This means that the NN inputs are not completely independent. We capture this dependence by encoding the relation as a constraint. Since Marabou supports only linear constraints, we fix \(\beta _g\) to a constant in each analysis, which makes the relation a linear equality between \(d_v\) and \(d_i\).
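A minimal sketch of how this could be encoded through the Maraboupy interface (the file name, the assumption that the NNet input ordering matches the list above, the angle value, and the degrees-to-normalized mapping are all illustrative; the actual constraint scripts are in the repository):

```python
import math
from maraboupy import Marabou

net = Marabou.read_nnet("collision_avoidance.nnet")   # hypothetical file name
x = net.inputVars[0].flatten()   # assumed order: d, d_v, d_i, v_r, beta_i, psi_r, beta_g, beta_v

# Fix beta_g for this analysis run (illustrative value). With beta_g constant,
# d_v = d_i * sin(beta_g) becomes the linear equality d_v - k*d_i = 0
# (assuming d_v and d_i share the same normalization constant).
beta_g_deg = 30.0
k = math.sin(math.radians(beta_g_deg))
net.addEquality([x[1], x[2]], [1.0, -k], 0.0)

# Pin the normalized beta_g input to the same constant
# (the degrees-to-[-1, 1] mapping of /180 is an assumption).
beta_g_norm = beta_g_deg / 180.0
net.setLowerBound(x[6], beta_g_norm)
net.setUpperBound(x[6], beta_g_norm)
```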

All NN inputs are normalized: \(d, d_v, d_i \in [0,1]\), \(v_r \in [-0.3, 0.4]\), \(\beta _i, \psi _r, \beta _g, \beta _v \in [-1, 1]\). The NN output range is (−1, 1) due to the tanh function. It is linearly mapped to (−3, 3) to compute the turn rate (\(\omega \)) in degrees per second. A positive value indicates a right turn and a negative value a left turn.
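Putting the two output stages together, a threshold on the commanded turn rate translates into a threshold on the pre-activation output that the Marabou queries use. A minimal sketch (the example threshold is illustrative):

```python
import numpy as np

def raw_output_threshold(omega_deg_per_s):
    """Translate a turn-rate threshold into a pre-tanh output threshold.

    The output chain is raw -> tanh -> scale by 3, giving omega in (-3, 3)
    degrees per second. Both stages are strictly increasing, so
    'omega >= w' is equivalent to 'raw >= arctanh(w / 3)'.
    """
    assert -3.0 < omega_deg_per_s < 3.0
    return np.arctanh(omega_deg_per_s / 3.0)

# Example: "commanded turn rate >= 1 deg/s"  <=>  "raw output >= arctanh(1/3)".
print(raw_output_threshold(1.0))   # ~0.3466
```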

Fig. 1. System geometry for potential collision.

3 Verifying Minimum Separation Distance

The reachability problem for a closed-loop neural network control system with non-linear dynamics is known to be undecidable [4]. Instead, we examine the condition under which the ownship transitions from a safe state (\(d = MSD \)) to an unsafe state (\(d < MSD \)), i.e., the distance function is decreasing at the MSD boundary. We derive the derivative of the distance function mathematically and check when the neural network generates an output action that makes the derivative negative at the boundary. Although the derivative function itself is non-linear due to trigonometric functions, we found that the safety conditions can be characterized by a set of linear constraints, which can be handled by Marabou. Figure 2 illustrates the derivative calculation. The filled and open arrows represent the ownship and the intruder movement during the time interval \(\varDelta t\), respectively, with d and \(d'\) being the distance at time t and \(t+\varDelta t\).

Fig. 2. Distance between ownship and intruder changes during \(\varDelta t\).

The distance \(d'\) satisfies \( d'^2 = a^2 + b^2 \), where:

\( a = v_o\varDelta t\sin \varphi - v_i\varDelta t\sin \theta \),

\( b = d - v_o\varDelta t\cos \varphi - v_i\varDelta t\cos \theta \).

Ignoring the higher order terms, we have \( (d'^2 - d^2)/\varDelta t = - 2d(v_o\cos \varphi + v_i\cos \theta )\). Applying the chain rule \( \varDelta d^2/\varDelta t = 2d\varDelta d/\varDelta t\), we have \(\varDelta d/\varDelta t = - (v_o\cos \varphi + v_i\cos \theta )\). By definition: \(\varphi = \beta _i - \omega \varDelta t\), \(\theta = 180 - \beta _i + \psi _r\). Letting \(\alpha = \beta _i - \psi _r\), we rewrite the derivative as \(\dot{d} = v_i\cos \alpha - v_o\cos \varphi \).
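For completeness, the first step follows from the expansion
\[
\begin{aligned}
d'^2 = a^2 + b^2 &= \big(d - v_o\varDelta t\cos \varphi - v_i\varDelta t\cos \theta \big)^2 + \big(v_o\varDelta t\sin \varphi - v_i\varDelta t\sin \theta \big)^2\\
&= d^2 - 2d\varDelta t\,(v_o\cos \varphi + v_i\cos \theta ) + O(\varDelta t^2),
\end{aligned}
\]
and the final rewrite uses \(\cos \theta = \cos (180 - \beta _i + \psi _r) = -\cos (\beta _i - \psi _r) = -\cos \alpha \).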

Assuming \(v_i = v_o\), \(\dot{d} < 0\) implies \(\cos \alpha < \cos \varphi \). If \(0 \le \alpha \le 180\) and \(0 \le \varphi \le 180\), this in turn implies \(\alpha > \varphi \) (i.e., \(\omega \varDelta t > \psi _r\)). This means that at the MSD boundary, the neural network must generate a turn angle no greater than the relative heading to prevent the distance from decreasing. Note that the turn rate \(\omega \) is limited to the range \((-3, 3)\) degrees per second and \(\varDelta t = 1\) second in simulation. This means that if \(\psi _r \le -3\) or \(\psi _r \ge 3\), the derivative will always be negative or positive, respectively, no matter what the neural network outputs. Thus, in our analysis we restrict attention to scenarios where \(\psi _r \in (-3, 3)\) degrees. In other words, we look only for scenarios in which the MSD violation could be avoided but the neural network does not generate an avoiding output.
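A hedged sketch of one such Marabou query is shown below: for a fixed relative heading and a sampled intruder angle, it asks whether the NN can command a turn large enough to make the distance derivative negative at the MSD boundary. The file name, variable ordering, and normalization constants (how the physical distance and angles map to the normalized inputs) are assumptions here; the actual scripts are in the repository.

```python
import math
from maraboupy import Marabou

net = Marabou.read_nnet("collision_avoidance.nnet")   # hypothetical file name
x = net.inputVars[0].flatten()   # assumed order: d, d_v, d_i, v_r, beta_i, psi_r, beta_g, beta_v
y = net.outputVars[0].flatten()

MSD_NORM = 0.1       # normalized value corresponding to d = MSD (assumed constant)
psi_r_deg = 1.0      # one fixed relative heading in (-3, 3) degrees
beta_i_deg = 45.0    # one sampled intruder angle in [0, 180] degrees

# Ownship exactly on the MSD boundary.
net.setLowerBound(x[0], MSD_NORM)
net.setUpperBound(x[0], MSD_NORM)

# Fix the angles (degrees-to-[-1, 1] normalization of /180 is assumed);
# the remaining inputs keep the bounds recorded in the NNet file.
for idx, deg in ((4, beta_i_deg), (5, psi_r_deg)):
    net.setLowerBound(x[idx], deg / 180.0)
    net.setUpperBound(x[idx], deg / 180.0)

# Violation condition: omega * dt > psi_r with omega = 3 * tanh(raw) and
# dt = 1 s, i.e. raw > arctanh(psi_r / 3). Encoded as -raw <= -threshold.
threshold = math.atanh(psi_r_deg / 3.0)
net.addInequality([y[0]], [-1.0], -threshold)

# SAT: some state on the MSD boundary makes the NN turn towards the intruder
# even though avoidance was possible; UNSAT: no such state in this slice.
print(net.solve())
```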

Results. For the analysis, we sampled the intruder angle between 0 and 180\(^\circ \) and found MSD violations for each sampled angle. In the simulation used for RL training, the critical conditions identified here (at the MSD boundary with a small relative heading) are reached only rarely, so the NN likely received insufficient training in this region to make safe decisions.

4 Robustness Analysis

Robustness analysis helps us understand the stability of the NN controller. In most cases, the output produced by the NN should not change dramatically in response to small input perturbations (such as sensor noise). We can perform a \(\delta \)-local-robustness analysis [6] to quantify the bounded-input/bounded-output stability of the NN.

To perform this analysis, we generated five arbitrary points covering a range of input conditions and NN outputs. For each input point x, we computed a constant \(\delta \) such that for all inputs \(x'\) with \(\Vert x-x' \Vert _\infty \le \delta \), the neural network output does not change sign (e.g., change from turning left to turning right).
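A sketch of one such query is shown below; the point, its nominal output sign, and the value of \(\delta \) are illustrative, while Table 1 lists the actual points and results.

```python
from maraboupy import Marabou

net = Marabou.read_nnet("collision_avoidance.nnet")   # hypothetical file name
x = net.inputVars[0].flatten()
y = net.outputVars[0].flatten()

x0 = [0.5, 0.2, 0.3, 0.0, 0.1, 0.0, 0.2, 0.1]   # illustrative input point
delta = 0.02                                     # perturbation radius under test

# Box of radius delta around x0 in the infinity norm.
for i, v in enumerate(x0):
    net.setLowerBound(x[i], v - delta)
    net.setUpperBound(x[i], v + delta)

# Suppose the nominal output at x0 is negative (a left turn). Ask whether a
# right-turn output (raw output >= 0) is reachable within the box.
net.addInequality([y[0]], [-1.0], 0.0)    # -raw <= 0, i.e. raw >= 0

# SAT: an adversarial perturbation flips the turn direction within delta;
# UNSAT: the output sign is stable for this point and delta.
print(net.solve())
```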

Table 1. Robustness analysis results.

Results. The robustness analysis results are summarized in Table 1. SAT results mean an adversarial input was found, while UNSAT results mean no such input exists. The results show that the neural network may not be robust. In particular, at Point 2 a small input perturbation (\(\delta = 0.02\)) causes the ownship to change from a strong left turn to a right turn. This may lead to unstable behavior in which the aircraft oscillates between left and right turns.

5 Specific Scenarios

We examine six encounter scenarios (system snapshots), similar to [8], each with an expected aircraft maneuver (e.g., staying on course vs. turning left or right). We check whether the action generated by the neural network aligns with these expectations. The following scenarios were considered; a sketch showing how one of them can be encoded as a query follows the list.

Head-on. The ownship is on course and the two airplanes are on a head-on collision course. We expect the ownship to turn to avoid the collision.

Overtake. The ownship is on course while the intruder flies in the same direction and approaches from behind. We expect the ownship to turn to avoid the collision.

Parallel Same Direction. The ownship is on course while the intruder flies alongside in the same direction and is close. We expect the ownship not to fly towards the intruder.

Parallel Opposite Direction. The ownship is on course while the intruder flies alongside in the opposite direction and is dangerously close. We expect the ownship not to fly towards the intruder.

Approach from Right. The ownship is on course while the intruder approaches from the right and is dangerously close. We expect the ownship to turn left.

Far Away. The ownship is on course while the intruder is far away. We expect the ownship to stay on course and not make strong turns.
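As an illustration (referenced above), the sketch below gives one possible encoding of the Head-on scenario: the intruder directly ahead on a reciprocal heading at the same speed, with the expectation negated so that a SAT result exposes a violation. The file name, numeric values, variable ordering, and degree-to-normalized mapping are assumptions; the actual encodings are in the repository scripts and are summarized in Table 2.

```python
import math
from maraboupy import Marabou

net = Marabou.read_nnet("collision_avoidance.nnet")   # hypothetical file name
x = net.inputVars[0].flatten()   # assumed order: d, d_v, d_i, v_r, beta_i, psi_r, beta_g, beta_v
y = net.outputVars[0].flatten()

# Head-on snapshot (illustrative values): ownship on course (d_v = beta_g = beta_v = 0),
# intruder dead ahead (beta_i = 0) on a reciprocal heading (psi_r = 180 deg -> 1.0),
# equal speeds (v_r = 0), moderately close range; d_i keeps its NNet-file bounds.
snapshot = {0: 0.15, 1: 0.0, 3: 0.0, 4: 0.0, 5: 1.0, 6: 0.0, 7: 0.0}
for idx, val in snapshot.items():
    net.setLowerBound(x[idx], val)
    net.setUpperBound(x[idx], val)

# Expected behavior: a clear avoidance turn, say |omega| >= 1 deg/s.
# Negated for the query: |raw| <= arctanh(1/3), i.e. the NN (nearly) stays on course.
t = math.atanh(1.0 / 3.0)
net.setLowerBound(y[0], -t)
net.setUpperBound(y[0], t)

# SAT: the NN can remain (almost) on course in a head-on encounter, violating
# the expectation; UNSAT: the expected avoidance maneuver is guaranteed.
print(net.solve())
```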

Table 2. Verification of mid-air encounter scenarios. All times are in seconds.

Results. The encoding of the scenarios and the verification results are summarized in Table 2. The NN generated outputs violating expectations in all but one of the six scenarios. Based on the analysis, we believe that the RL training method did not provide sufficient training data to cover these critical scenarios. All experiments were performed on a Linux server with Intel Xeon E5-2698 v4 CPU @ 2.20 GHz and approximately 504 GB memory.

6 Conclusion and Future Work

We analyzed an aircraft collision avoidance NN using Marabou. We verified the minimum horizontal separation property, analyzed robustness of the NN, and investigated specific interesting scenarios. The results suggest that the RL NN training approach was insufficient to guarantee safety of the system in many critical scenarios. This shows the value of formal analysis for identifying unintended behaviors that may be present in a NN.

The counterexamples generated during property verification could be used to improve the training of the NN. They often represent hard-to-reach corner cases. Since the expected NN output is usually well defined for these cases, they could be used directly as training data in a supervised learning setting. We could also adjust the RL setup by using these scenarios directly as initial states.

It would be interesting to combine forward reachability analysis of the NN with our MSD property verification so that a simulation trace from an initial state to the violation state is generated. Recall also that under certain conditions, due to the turn rate limit, an MSD violation is unavoidable. We could use backward reachability analysis to compute the preceding system states and actions, back to the point where the NN could still generate an action that deviates from the collision course. These system states and the desired actions could also be added to the training set.