1 Introduction

Flight data analysis has become an important part of each airline’s safety management system (SMS), to the extent that establishing a flight data monitoring (FDM) programme is now mandatory for operators of aircraft over 27 tons as per ICAO regulations [1]. Airlines and regulators collect vast amounts of data, but only a small portion of it is typically analysed in depth [2]. This is because only events which are considered “abnormal” are analysed, leaving most of the data stored without further use. This data about everyday performance can, however, contain useful safety information for airlines, both for trend analyses and for learning about exceptionally good performance [3]. Many studies have already been published about different algorithms and data analysis methods to derive new learning opportunities from existing data [4]. This study focuses on the use of self-organising maps (SOM) to cluster flight data collected during simulated approaches. Different performance metrics were then collected to assess the clustering performance and the possibilities of deriving new knowledge from the existing data.

2 Related Work

This study applies self-organising maps, a machine learning algorithm, to flight data analysis. Many studies on data analysis algorithms and techniques, especially data clustering, have been published in the past, and this study aims to contribute to that body of research.

Airlines typically use FDM to analyse their daily safety performance. Various flight parameters (over 1000 different parameters for modern aircraft such as the Airbus A350 or Boeing 787) are recorded, downloaded, and processed by flight data analysts. Today, flight data analyses mostly focus on capturing deviations from an acceptable range of parameters, such as unusually high or low airspeeds for a specific flight phase, markers of an unstable approach (e.g. the aircraft not being fully configured for landing below a specific height) and dual pilot inputs. The data is usually also classified within a safety matrix to analyse trends in terms of both event frequency and severity [5, 6]. Some recent developments also include performing big data analyses to monitor ongoing trends within an airline, consistent with a safety-II approach [3]. Deviations from an acceptable range of parameters can help airlines to monitor their pilots’ proficiency and take mitigation measures such as specific training exercises if required [7]. As experts may accept minor deviations (that are within the acceptable range) in order to leave cognitive space for other objectives, an understanding of how these deviations interact with safety performance can prove very beneficial for airlines with pilots from diverse backgrounds featuring different piloting techniques, exposure to different equipment types, cultural values, and perspectives on individual safety objectives [8].

However, most analysis methods used in flight operations remain focused on identifying threshold exceedances, which leads to only roughly 3 to 5% of the data being analysed [9]. The remaining 95 to 97% could, however, provide additional information about daily operations and normal occurrences, consistent with safety-II principles [10]. Furthermore, exceedance-based analysis relies heavily on the correct setting of the threshold for exceedance event detection. If the detection threshold is too narrow, many events are flagged, resulting in many false positives. Conversely, a threshold that is too wide flags few or no events, leading to many false negatives. Finally, it also relies on the supposition that a specific incident could occur: potential incidents that nobody has imagined are hard to notice, even in the best case, if no detection method has been designed for them beforehand [11]. Flight data is analysed not only during everyday operation but also during the design stage of a new technology, such as the use of a touchscreen as a means of flight control [12, 13].
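The sensitivity of exceedance detection to the chosen threshold can be illustrated with a minimal Python sketch. The airspeed samples, the target speed and the three tolerances below are hypothetical values chosen for illustration only, and the function name `flag_exceedances` is an assumption rather than part of any FDM tool.

```python
def flag_exceedances(samples, target, tolerance):
    """Return the indices of samples deviating from target by more than tolerance."""
    return [i for i, value in enumerate(samples) if abs(value - target) > tolerance]


# Hypothetical calibrated airspeeds (kt) sampled along one approach
cas_kt = [138.0, 141.5, 139.2, 147.8, 133.1, 140.4, 152.3, 139.9]
target_kt = 140.0  # hypothetical target approach speed

narrow = flag_exceedances(cas_kt, target_kt, tolerance=1.0)    # flags minor scatter
moderate = flag_exceedances(cas_kt, target_kt, tolerance=5.0)  # flags larger deviations only
wide = flag_exceedances(cas_kt, target_kt, tolerance=15.0)     # flags nothing at all

print("narrow:", narrow)
print("moderate:", moderate)
print("wide:", wide)
```

With the narrow tolerance, most samples are flagged even though several deviate by less than 2 kt, whereas the wide tolerance misses the 12.3 kt deviation entirely. In both cases, the samples that are not flagged are never looked at, which is the gap that clustering approaches try to address.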

Much previous research has demonstrated the benefits of adding machine learning to current flight data analyses. Machine learning can be defined as a programme’s ability to increase its performance with experience, i.e. through learning from the data it is fed [14]. Machine learning can be divided into two categories: supervised and unsupervised learning. Supervised learning involves knowing the correct solution for a given dataset, whereas unsupervised learning involves not knowing any solution for a specific dataset [9]. Previous research about machine learning in flight data analysis includes Bayesian networks [15], local outlier probability [11], Multiple Kernel Anomaly Detection (MKAD) [16] and clustering [17, 18], to name a few.

Self-organising maps (SOM), a term often used synonymously with Kohonen’s Self-Organising Map, belong to a broader family of machine learning techniques called artificial neural networks (ANN), which have not been researched extensively in conjunction with flight data analysis. ANNs are designed to simulate the sensory processing of the human brain: they simulate a network of model neurons which can ‘learn’ many different types of problems, especially classification [19]. The SOM algorithm specifically maps the data patterns from an input space (the original data patterns) into an n-dimensional space, known as the output space (Fig. 1). The mapping aims at preserving the topological relations between both spaces. In order to ease visualisation, the output space is usually one- or two-dimensional. To map the output space, the SOM algorithm uses a neighbourhood function, which governs the interactions between the different units. SOM can therefore be used effectively for clustering tasks and perform similarly to k-means clustering [20]. SOM are able to extract stabilised phases of flight as well as transient changes in flight parameters. Furthermore, SOM can handle large datasets, which makes them well-suited to analysing flight data and an interesting alternative to k-means clustering [21, 22].

Fig. 1. SOM structure of size X times Y based on an input vector x. The winner neuron is in red, the neighbouring neurons are in green and the other neurons are in blue [23]
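The interplay between the winner neuron and the neighbourhood function can be made concrete with a from-scratch Python sketch. It is not the implementation used in this study, nor the API of any SOM package: the function names `winner` and `train_som`, the Gaussian neighbourhood, the linear decay of σ and of the learning rate, and the toy data are all assumptions made for illustration.

```python
import math
import random


def winner(weights, x):
    """Return the grid coordinate of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda unit: math.dist(weights[unit], x))


def train_som(data, rows, cols, sigma=1.0, learning_rate=0.5, n_iter=1000, seed=0):
    """Train a rows x cols SOM; returns a dict mapping (row, col) to a weight vector."""
    rng = random.Random(seed)
    # Initialise every unit with a randomly picked input sample
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for t in range(n_iter):
        decay = 1.0 - t / n_iter             # assumed linear decay schedule
        radius = max(sigma * decay, 1e-3)    # shrinking neighbourhood radius
        x = rng.choice(data)
        bmu = winner(weights, x)
        for unit, w in weights.items():
            # Gaussian neighbourhood measured on the output grid, not in input space
            grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * radius ** 2))
            for k in range(len(w)):
                w[k] += learning_rate * decay * h * (x[k] - w[k])
    return weights


# Toy two-dimensional input: one group near (0, 0) and one group near (5, 5)
rng = random.Random(1)
stable = [[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in range(20)]
deviating = [[rng.gauss(5.0, 0.1), rng.gauss(5.0, 0.1)] for _ in range(20)]

som = train_som(stable + deviating, rows=2, cols=2)
unit_stable = winner(som, [0.0, 0.0])
unit_deviating = winner(som, [5.0, 5.0])
print(unit_stable, unit_deviating)
```

Each input vector is assigned to its winner neuron, so the number of neurons bounds the number of clusters. As the neighbourhood radius shrinks towards zero, only the winner is updated, which is where the behaviour approaches that of k-means.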

3 Methodology

3.1 Data Source

The dataset consists of 296 simulated approaches flown in an engineering flight simulator called the Future Systems Simulator at Cranfield University [24].

74 participants (55 males, 18 females, 1 preferred not to say) were asked to fly four approaches and landings. Table 1 displays the participants’ demographic data. The first two approaches were performed with a sidestick, the first one without any turbulence and the second one with simulated turbulence. The following two approaches, again without and with turbulence, were performed with a gamepad. The participants were asked to follow a standard three-degree descent path while tracking the runway, assisted by the instrument landing system (ILS) through the use of the Flight Director (FD). Additionally, participants could follow the cues displayed to them by the precision approach path indicator (PAPI), while the auto throttle (ATHR) controlled the engine thrust to maintain a constant approach speed. For the purpose of this study, the following parameters were considered: the anonymous participant ID, the altitude, the calibrated airspeed (CAS), the deviation in altitude from the three-degree glideslope, the aircraft pitch, the presence of a disturbance (turbulence) and the means of controlling the aircraft (sidestick or gamepad).

Table 1. Participants’ demographics

3.2 Research Procedure

The dataset was first cleaned to keep only the segment from 4.5 NM to 1 NM before the runway threshold. As most participants were not trained pilots, flying the last 1 NM to the runway threshold and landing the aircraft as per standard operating procedures (SOPs) turned out to be challenging and led to high data variability within the last section of the flight. This section was therefore removed, as were 22 approaches whose data featured too many inconsistencies. The data was also discretised and interpolated at every 0.05 NM of distance to the threshold. A new variable called ‘\(\Delta GP\)’ was created, which measures the participants’ ability to track the three-degree glideslope given by the PAPI. It represents the difference between the aircraft’s altitude and the reference altitude corresponding to an ideal 3-degree glide path. It was determined as follows:

$$ \begin{aligned} GP &= \tan \left( 3^{\circ} \right) \times x_{thr} \times 6076.12 + CH_{thr} \\ \Delta GP &= Altitude - GP \end{aligned} $$

The following parameters were used: \(x_{thr}\), the aircraft’s distance to the runway threshold in NM, and \(CH_{thr}\), the crossing height, which is 50 ft above the runway threshold height. The factor 6076.12 converts nautical miles into feet.
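As a worked illustration of this definition, the following Python sketch computes \(\Delta GP\) and the root mean square error of a series of deviations. The function names and the sample altitudes are hypothetical, and the sketch assumes that the altitude is expressed in feet above the runway threshold elevation.

```python
import math

FT_PER_NM = 6076.12          # feet per nautical mile, as in the GP formula
CROSSING_HEIGHT_FT = 50.0    # CH_thr: crossing height above the runway threshold


def glide_path_altitude(x_thr_nm, angle_deg=3.0, ch_thr_ft=CROSSING_HEIGHT_FT):
    """Reference altitude GP (ft) on the ideal glide path at x_thr_nm from the threshold."""
    return math.tan(math.radians(angle_deg)) * x_thr_nm * FT_PER_NM + ch_thr_ft


def delta_gp(altitude_ft, x_thr_nm):
    """Delta GP = Altitude - GP; positive values mean the aircraft is above the glide path."""
    return altitude_ft - glide_path_altitude(x_thr_nm)


def rmse(deviations):
    """Root mean square error of a series of glide path deviations."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


# Hypothetical samples of one approach (altitudes assumed to be above threshold elevation)
distances_nm = [4.5, 3.5, 2.5, 1.5, 1.0]
altitudes_ft = [1500.0, 1180.0, 840.0, 530.0, 365.0]

deviations = [delta_gp(alt, x) for alt, x in zip(altitudes_ft, distances_nm)]
print([round(d, 1) for d in deviations], round(rmse(deviations), 1))
```

Note that the angle must be converted to radians before taking the tangent. With a 3-degree path, the reference altitude rises by roughly 318 ft per nautical mile.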

The data was then divided into two sets: one for the approaches flown with the sidestick and one for the approaches flown with the gamepad. The root mean square error (RMSE) was calculated for each \(\Delta GP\) series, both for the sidestick dataset (\(RMSE_{S}\)) and for the gamepad dataset (\(RMSE_{G}\)). Each dataset was clustered by the corresponding \(\Delta GP\) variable using the self-organising map (SOM) algorithm to compare the participants’ performance. Following the clustering, Welch ANOVAs and subsequent Games-Howell post-hoc tests were conducted on each cluster’s RMSE to assess the clustering performance. Python 3.10.11 was used to conduct the analysis.
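To show what a Welch ANOVA computes on per-cluster RMSE values, the sketch below implements the standard Welch F statistic and its degrees of freedom with the Python standard library only. It is not the code used in this study: the function name `welch_anova` and the RMSE values are hypothetical. The p-value is left out because it needs the F distribution, which could be obtained with, for example, `scipy.stats.f.sf(f_stat, df1, df2)`.

```python
import statistics


def welch_anova(groups):
    """Return (F, df1, df2) of Welch's one-way ANOVA for groups with unequal variances."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [statistics.fmean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]  # sample variances
    # Each group is weighted by the inverse of the variance of its mean
    weights = [n / v for n, v in zip(sizes, variances)]
    w_sum = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / w_sum
    lam = sum((1 - w / w_sum) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    numerator = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    return numerator / denominator, k - 1, (k ** 2 - 1) / (3 * lam)


# Hypothetical RMSE values (ft) of the approaches assigned to three clusters
cluster_rmse = [
    [12.1, 14.8, 13.5, 11.9, 15.2],
    [31.0, 42.5, 28.7, 50.3, 36.9],
    [85.4, 120.6, 97.2, 140.8],
]
f_stat, df1, df2 = welch_anova(cluster_rmse)
print(round(f_stat, 2), df1, round(df2, 2))
```

Unlike the classical one-way ANOVA, the group means are weighted by their precision, so a cluster with widely scattered RMSE values does not dominate the statistic. A Games-Howell post-hoc test follows the same idea for pairwise comparisons.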

3.3 Statistical Tools Used

The SOM algorithm was applied using the Python ‘MiniSom’ package. SOM are a type of artificial neural network (ANN) which converts nonlinear statistical relationships in higher dimensions into a low-dimensional discretised representation map. The map consists of output neurons, usually arranged in a two-dimensional grid, and tries to preserve topological relations. SOM and k-means algorithms are identical when the radius of the neighbourhood function in the SOM is equal to 0 [20]. The maximum number of clusters which can be obtained is equal to the number of output neurons.

To obtain the optimal number of clusters, the Silhouette score was applied. For every sample i, it is equal to \(S = \frac{b_{i} - a_{i}}{\max \left( a_{i}, b_{i} \right)}\), where \(a_{i}\) represents the mean intra-cluster distance and \(b_{i}\) the mean nearest-cluster distance. The Silhouette score is a measure of how similar an object is to its own cluster compared with the other clusters. It returns a value between −1 and 1, 1 being the best clustering. For each clustering, a parameter optimisation algorithm was used, which searched for the SOM σ and learning rate yielding the best Silhouette score over 10,000 iterations. The optimal grid dimension corresponds to \(C = 5\sqrt{N}\), where C corresponds to the number of neurons and N to the number of samples in the dataset [25]. For the size of the dataset used, it corresponds to a 3 × 3 grid.

To analyse the SOM performance, several metrics were used: the normalised quantization error, the topographical error, the trustworthiness, and the neighbourhood preservation. The quantization error represents the mean difference between the input samples and the winning neurons. It is equal to \(QE\left( M \right) = \frac{1}{n}\sum\nolimits_{i = 1}^{n} \left\| x_{i} - w_{c\left( x_{i} \right)} \right\|\), where n represents the number of data points and \(w_{c\left( x_{i} \right)}\) the weight vector of the best matching unit in the map for the data point \(x_{i}\). The quantization error was then normalised by calculating the average quantization error for each node as follows:

$$ NQE = \begin{cases} \dfrac{1}{N}\sum\limits_{j = 1}^{N} \left( \dfrac{\frac{1}{n}\sum\nolimits_{i = 1}^{n} \left\| \phi \left( x_{i} \right) - x_{i} \right\|}{norm\left( w_{j} \right)} \right) \\ 1 \quad \text{if no data point matches the unit} \end{cases} $$

with n here corresponding to the number of vectors mapped to each unit. The topographical error is equal to \(TE = \frac{1}{n}\sum\nolimits_{i = 1}^{n} d_{i}\), where n is the number of input vectors and \(d_{i}\) the distance between the best matching and second-best matching units.

The trustworthiness is equal to

$$ M_{1} \left( k \right) = 1 - \frac{2}{Nk\left( 2N - 3k - 1 \right)}\sum\limits_{i = 1}^{N} \sum\limits_{x_{j} \in U_{k} \left( x_{i} \right)} \left( r\left( x_{i}, x_{j} \right) - k \right) $$

and the neighbourhood preservation is equal to

$$ M_{2} \left( k \right) = 1 - \frac{2}{Nk\left( 2N - 3k - 1 \right)}\sum\limits_{i = 1}^{N} \sum\limits_{x_{j} \in V_{k} \left( x_{i} \right)} \left( \hat{r}\left( x_{i}, x_{j} \right) - k \right) $$

In both formulas, N represents the number of data points in the data set. \(U_{k} \left( x_{i} \right)\) represents the data points which are among the k closest to \(x_{i}\) in the output space but not in the input space, and \(V_{k} \left( x_{i} \right)\) the data points which are among the k closest to \(x_{i}\) in the input space but not in the output space. \(r\left( x_{i}, x_{j} \right)\) represents the rank of \(x_{j}\) when the data points are ordered by distance from \(x_{i}\) in the input space, and \(\hat{r}\left( x_{i}, x_{j} \right)\) represents the rank of \(x_{j}\) when ordered by distance in the projection. The elbow method was used to determine the optimal k value.

As the data does not show variance homogeneity (\(p_{Bartlett} < 0.05\) for both the sidestick and gamepad clustering datasets), a Welch ANOVA and a Games-Howell post-hoc test were used to determine the clustering performance.
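Two of these metrics, the Silhouette score and the quantization error, are simple enough to be sketched in a few lines of Python. The sketch uses the standard library only and toy one-dimensional data; the function names, the sample values and the unit weights are assumptions for illustration and are not the study's data or code.

```python
import math


def silhouette_score(samples, labels):
    """Mean of s_i = (b_i - a_i) / max(a_i, b_i) over all samples (needs >= 2 clusters)."""
    clusters = {}
    for x, label in zip(samples, labels):
        clusters.setdefault(label, []).append(x)
    scores = []
    for x, label in zip(samples, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-member clusters
            continue
        # a_i: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(x, y) for y in own) / (len(own) - 1)
        # b_i: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(x, y) for y in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def quantization_error(samples, unit_weights):
    """Mean distance between each sample and the weight vector of its best matching unit."""
    return sum(min(math.dist(x, w) for w in unit_weights) for x in samples) / len(samples)


# Toy data: two well-separated groups and two units placed at the group centres
samples = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
unit_weights = [[0.5], [10.5]]

score = silhouette_score(samples, labels)
qe = quantization_error(samples, unit_weights)
print(round(score, 3), qe)
```

A parameter search such as the one described above would repeat the clustering for different values of σ and of the learning rate and keep the combination with the highest Silhouette score.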

4 Results and Discussion

4.1 Sample Characteristics

A total of 278 approaches from 4.5 NM to 1 NM before the runway threshold were analysed through the SOM clustering methodology. The focus was on analysing the participants’ vertical performance, i.e. their ability to maintain a stable descent path according to a standard 3° descent. Clustering the participants’ tracking of the 3-degree glideslope makes it possible to assess the participants’ manual flying skills in terms of both good performance and common errors. The flight parameters are displayed in Fig. 2.

Fig. 2. Deviations from the glideslope as a function of the distance to the runway threshold, with the sidestick and gamepad as means of control.

4.2 Clustering of the Deviation from the 3-Degree Glideslope for Approaches Flown with a Gamepad

The variables shown in Table 2 were used to optimise the SOM algorithm to cluster the approaches flown with a gamepad. The Silhouette score indicates a moderately strong clustering performance with an optimised number of four clusters [26]. The results from Table 3 demonstrate that SOM can be a meaningful tool to cluster the gamepad flight data. Although the NQE remains fairly high, indicating that some data points do not match the unit’s weight vector [27], the topographic structure of the original data is well-preserved on the map [28, 29].

Table 2. Sigma, learning rate and corresponding silhouette score used as parameters for the SOM processing.
Table 3. Clustering metrics obtained by the SOM algorithm.

The approaches flown with a gamepad, grouped by cluster, are displayed in Fig. 3, which shows the CAS, the deviation from the three-degree glideslope and the pitch. The trials with disturbance appear to have been more difficult for some participants, as clusters 3, 6 and 7 are composed almost exclusively of landings with disturbances and feature a comparatively higher RMSE (Fig. 3 and Table 4). The clusters are well-defined: the Welch ANOVA indicates significant differences between clusters (F = 3015, p < 0.05). The post-hoc analysis shows significant differences in RMSE (p < 0.05) between the clusters (Table 5). Cluster 5 displays the lowest RMSE, and cluster 6 displays the highest RMSE (Table 4). The difference between experienced pilots and novices is less pronounced than within the sidestick dataset. One possible cause is that the experience gap is smaller for gamepads than for flying with a sidestick.

Fig. 3. Calibrated airspeed, deviation from the three-degree glideslope and corresponding pitch outputs for each cluster.

Table 4. Average RMSE by cluster
Table 5. Summary of post-hoc results for the gamepad data

4.3 Clustering of the Deviation from the 3-Degree Glideslope for Approaches Flown with a Sidestick

The variables shown in Table 6 were used to optimise the SOM algorithm to cluster the approaches flown with a sidestick. The Silhouette score showed a moderately strong clustering performance with an optimised number of four clusters [26], although it is lower than the one corresponding to the gamepad flight data. The results from Table 7 demonstrate that SOM can be a meaningful tool to cluster the sidestick flight data. Although the NQE remains fairly high, indicating that some data points do not match the unit’s weight vector [27], the topographic structure of the original data is well-preserved on the map [28, 29]. Overall, the clustering result metrics are very similar to those of the gamepad flight data, but the flying performance is better when participants flew with the sidestick. This can be due to the difference in experience, as qualified pilots would perform better than non-pilots with the sidestick, whereas the overall flying performance is lower with the gamepad (Table 8).

Table 6. Sigma, learning rate and corresponding silhouette score used as parameters for the SOM processing.
Table 7. Clustering metrics obtained by the SOM algorithm.

The approaches flown with a sidestick, grouped by cluster, are displayed in Fig. 4, which shows the CAS, the deviation from the three-degree glideslope and the pitch. The trials with disturbance appear to have been more difficult for some participants, as clusters 3 and 7 are composed almost exclusively of landings with disturbances and feature a comparatively higher RMSE, similar to the results of the gamepad landings (Fig. 3 and Table 4). The clusters are well-defined: the Welch ANOVA indicates significant differences between clusters (F = 5272, p < 0.05). The post-hoc analysis shows significant differences in RMSE (p < 0.05) between the clusters (Table 9). Cluster 4 displays the lowest RMSE (Table 8) and cluster 2 displays the highest RMSE. The results are interesting with regard to the participants’ experience, as both experienced pilots and novices are present in cluster 4.

Fig. 4. Calibrated airspeed, deviation from the three-degree glideslope and corresponding pitch outputs for each cluster.

Table 8. Average RMSE by cluster
Table 9. Summary of the post-hoc results for the sidestick data

4.4 Limitations

Several limitations are present in this study. Firstly, there is a high variance in the pilots’ experience, ranging from novice to expert, which distorts the data compared to an FDM dataset. Moreover, it can be difficult for non-pilots to follow the precision approach path indicator (PAPI) and ILS guidance for the first time. The control forces on the sidestick are those of a generic aircraft, which might also differ from those of some actual aircraft types. Finally, the SOM algorithm parameters could be optimised further, and a stronger algorithm could be used for the clustering instead of the MiniSom package, which is an introductory package for SOM. This would provide a better overall clustering performance.

5 Conclusion

The clustering method through SOM provides useful information to analyse flight performance beyond exceedance events. For these datasets, it shows that, overall, the flying performance is less susceptible to variability when flying with a sidestick than when flying with a gamepad. This might indicate that for a novice, the use of a gamepad is easier than the sidestick. However, experienced pilots perform better when flying with the sidestick, as the average RMSEs for the sidestick data are lower than for the gamepad. Based on the clusters of the deviation from the glidepath, it is also possible to analyse the pilots’ pitch inputs on the flight controls and determine the effects of the different flying techniques on the flight path. Furthermore, the results show that some participants had more difficulty handling approaches with disturbances, both with the sidestick and with the gamepad. A more detailed study could be accomplished with further tuning of the SOM parameters to increase the clustering performance. The use of SOM and clustering in general could prove beneficial for airlines to perform big data and trend analyses in addition to a purely exceedance-based analysis, and so take a step further towards safety-II by considering the contexts behind detected exceedances, which may be influenced by previous experience levels and familiarity.