Research on Application of ATC Operation Security Based on Data Mining

Zhang, Zhaoyue; Zhang, Jing; Wang, Sen

doi:10.1007/978-3-030-00018-9_54


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11067)

Included in the following conference series:

International Conference on Cloud Computing and Security


Abstract

To study the applicability of data mining to ATC operational safety, six typical factors that may affect ATC safety are taken as the antecedents and the level of unsafe ATC incidents as the consequent. Correlation analysis and the Apriori algorithm are applied, with reasonable thresholds set for rule confidence and rule support, to analyze unsafe air traffic incidents. Taking general ATC operational safety incidents as an example, the results show that data mining is applicable to the problem of ATC operational safety and that the influencing factors are correlated with one another. Each antecedent factor affects the safety of ATC operations, but to a different degree. The factors with the greater impact are mainly control load, airspace environment and control equipment.



1 Introduction

Since McKinsey published the report “Big Data: The Next Frontier for Innovation, Competition, and Productivity” in 2011, big data technology has received extensive attention from various industries. The report describes big data as data sets whose size exceeds the ability of typical database software tools to capture, store, manage, and analyze [1].

In recent years, the civil aviation industry has continued to grow rapidly. It has not only expanded its transportation scale and route network but also significantly enhanced its transport capacity. The multiplying air traffic flow also brings complex data on the operation of air traffic control (ATC). Data related to ATC safety can be mined from these data to identify hazard sources accurately and to rank their importance, which bears on whether civil aviation can continue to develop safely and efficiently.

Air traffic management is a multi-level and dynamic process, and air traffic control involves many types of massive data. The storage, classification, analysis, and application of air traffic control data has therefore become a new research direction. Existing ATC data mining focuses on digital system communication and voice data. It aims to avoid congestion of voice channels and to eliminate situations in which pilots and controllers may misunderstand each other while talking, so as to deepen the use of the data in the future and improve the accuracy of air traffic forecasts. The System Wide Information Management (SWIM) system can integrate flight, performance, geography, weather, and other types of data into three-layer deployments to achieve open, flexible, and secure information management [2]. Inniss [3] and Murca et al. [4] applied cluster analysis to weather and aeronautical data to obtain a possible distribution of aircraft abilities. Xu et al. [5] studied the delay data between three busy airports in the United States by establishing a Bayesian network model and obtained the law of delay propagation between airports. In road transportation, Anderson [6] used the K-means method to study traffic data in London and thereby identified traffic accident hotspots. Wang [7] improved the Apriori algorithm, reduced the traffic accident data from the mobile phone, and then used association rule mining to investigate the causes of road traffic accidents in terms of accident relations and accident attributes.

At present, data mining technology has also been applied to operational risk and safety management in ATC. The emphasis is on post-event analysis of ATC operations, which has achieved certain results [8, 9]. In searching for the causes of unsafe control incidents, it has been found that ATC operational safety incidents tend to occur when several factors coexist. Since the Apriori algorithm was put forward, its application has been widely verified in various fields [10,11,12,13,14]. Therefore, this paper studies the application of Apriori-based data mining to ATC operational safety.

2 Selection of Research Variables

With the growth of air traffic flow in China, the importance of air traffic control operational safety has become increasingly prominent, and it has turned into a research hotspot in this field [15]. Air traffic controllers need to handle all kinds of dynamic information with limited resources and make proper air traffic control decisions. In recent years, there have been many unsafe air traffic incidents. Therefore, it is increasingly important to look for the factors that affect the safety of air traffic management and to analyze the impact of these factors on unsafe incidents [16].

The factors influencing the safety of ATC operations are mainly human factors, environment and equipment (see Fig. 1).

However, different factors influence the operational safety of ATC to different degrees, and the factors also affect one another. For example, deterioration of the airspace environment or of the control equipment will increase the control load, and an increased control load is well known to be an important factor affecting the safety of control operations. Therefore, it is necessary to discuss the relationships among these factors, which will help to find the key points for preventing unsafe air traffic management incidents in the future.

3 ATC Operational Data Mining Analysis

3.1 Analysis of Association Rules

The purpose of association rule mining is to find all the strong association rules in a database, which relies on effectively mining frequent itemsets. A rule is written $ A \Rightarrow B $, where $ A $ represents the "if" part and $ B $ represents the "then" part.

Assume that $ I = \left\{ {I_{1} ,I_{2} ,I_{3} , \cdots ,I_{m} } \right\} $ is the set of all items in the database. Any $ X \subset I $ or $ Y \subset I $ is called an itemset. An itemset that contains $ k $ items is called a $ k $-itemset.

$ D = \left\{ {T_{1} ,T_{2} ,T_{3} , \cdots ,T_{n} } \right\} $ is a database. Each $ T_{i} = \left\{ {I_{i1} ,I_{i2} ,I_{i3} , \cdots ,I_{ik} } \right\} $, whose elements satisfy $ I_{ij} \in I $ for $ j \in \left[ {1,k} \right] $, is called a transaction in the database.

$ A \Rightarrow B $ is an association rule, where $ A $ and $ B $ must satisfy $ A \subset I $, $ B \subset I $ and $ A \cap B = \varnothing $ at the same time. $ A $ is the premise of the rule, and $ B $ is the result.

Many association rules can be obtained from the sample data, but not all of them are valid: some have a low level of association and are not useful. Therefore, it is necessary to judge whether an association rule is valid according to measures of association rules. The most commonly used measures are confidence and support.

(1) Rule Confidence. Confidence measures the accuracy of a simple association rule. It is the probability that itemset $ Y $ appears in a transaction given that itemset $ X $ appears in it. The mathematical expression is as follows:

$$ C_{X \to Y} = \frac{{\left| {T\left( {X \cap Y} \right)} \right|}}{{\left| {T\left( X \right)} \right|}} $$

(1)

$ \left| {T\left( X \right)} \right| $ represents the number of transactions that include itemset $ X $, and $ \left| {T(X \cap Y)} \right| $ represents the number of transactions that contain both $ X $ and $ Y $. The larger the confidence $ C_{X \to Y} $, the more likely $ Y $ is to appear when $ X $ appears.

(2) Rule Support. Support measures the generality of a simple association rule: it is the probability that itemsets $ X $ and $ Y $ occur together in a transaction. The mathematical expression is as follows:

$$ S_{X \to Y} = \frac{{\left| {T\left( {X \cap Y} \right)} \right|}}{\left| T \right|} $$

(2)

$ \left| T \right| $ represents the total number of transactions. If the support $ S_{X \to Y} $ is low, the rule is not general.

Ideal association rules need both high confidence and high support. If the support is high but the confidence is low, the credibility of the rule is low. If the confidence is high but the support is low, the scope of application of the rule is small.

Assume that $ D $ is a database and $ X,Y $ are itemsets. If the support $ s $ and the confidence $ c $ of $ X \Rightarrow Y $ are not less than the minimum support $ \hbox{min} \_s $ and the minimum confidence $ \hbox{min} \_c $, respectively, then $ X \Rightarrow Y $ is a strong association rule.

Therefore, to select rules with a certain degree of confidence and support from the many simple association rules, thresholds for minimum confidence and minimum support must be set, and only rules whose confidence and support reach these thresholds are considered valid. The thresholds should also be reasonable: if they are too small, the generated rules may not be representative; if they are too large, no rules that meet them may be found.

In general, if a mined simple association rule meets the preset thresholds, the rule is considered valid. In practice, however, such a rule may still not be useful: confidence and support measure the validity of an association rule, but they cannot measure whether it is practical or meaningful. Therefore, the rule lift (lifting degree) also needs to be considered. Its mathematical expression is as follows:

$$ L_{X \to Y} = \frac{{C_{X \to Y} }}{{S_{Y} }} = \frac{{\left| {T\left( {X \cap Y} \right)} \right|}}{{\left| {T\left( X \right)} \right|}} \Big/ \frac{{\left| {T\left( Y \right)} \right|}}{\left| T \right|} $$

(3)

Here $ S_{Y} = \left| {T\left( Y \right)} \right| / \left| T \right| $ is the support of $ Y $. The rule lift reflects how the appearance of item $ X $ affects the probability that item $ Y $ appears. The rule is meaningful when $ L_{X \to Y} > 1 $, which shows that $ X $ has a promoting effect on $ Y $.
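The three measures in Eqs. (1) to (3) can be illustrated with a minimal Python sketch. The transactions and the item labels `A`, `B` and `U` below are hypothetical and serve only to exercise the formulas; they are not data from this study. Note that $ \left| {T(X \cap Y)} \right| $ counts transactions containing both $ X $ and $ Y $, which in code is a subset test against the union of the two itemsets.

```python
def count(transactions, itemset):
    """|T(itemset)|: number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, x, y):
    """Eq. (2): share of all transactions that contain both X and Y."""
    return count(transactions, x | y) / len(transactions)


def confidence(transactions, x, y):
    """Eq. (1): share of the transactions containing X that also contain Y."""
    return count(transactions, x | y) / count(transactions, x)


def lift(transactions, x, y):
    """Eq. (3): confidence of X -> Y divided by the support of Y alone."""
    s_y = count(transactions, y) / len(transactions)
    return confidence(transactions, x, y) / s_y


# Hypothetical toy transactions (illustration only).
toy = [
    frozenset({"A", "B", "U"}),
    frozenset({"A", "U"}),
    frozenset({"A", "B"}),
    frozenset({"B"}),
    frozenset({"A", "B", "U"}),
]
X, Y = frozenset({"A"}), frozenset({"U"})

print(f"support={support(toy, X, Y):.2f}",
      f"confidence={confidence(toy, X, Y):.2f}",
      f"lift={lift(toy, X, Y):.2f}")
```

For this toy data the rule $ \{A\} \Rightarrow \{U\} $ has support 3/5, confidence 3/4 and lift 1.25, so $ X $ would count as promoting $ Y $ under the criterion $ L_{X \to Y} > 1 $.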

3.2 Association Data Mining Method

The common association rule algorithms mainly include the Apriori algorithm based on frequent itemset mining, the Decision Tree algorithm based on mutual information computation, and the Rough Set algorithm based on equivalence class partition. Because the Apriori algorithm has outstanding advantages in mining the intrinsic meaning of data and the relationships between unknown data, and has been continuously refined by scholars, it has become the core algorithm for simple association rules in data mining.

The basic idea of the Apriori algorithm is level-wise iteration. Starting from the candidate 1-itemsets, the given support threshold is used to prune them and obtain the frequent 1-itemsets $ L_{1} $. According to the a priori principle, if an itemset is frequent, all of its subsets are frequent. Therefore, the candidate 2-itemsets $ C_{2} $ can be generated directly from $ L_{1} $. $ C_{2} $ is then pruned according to the support threshold to obtain the frequent 2-itemsets $ L_{2} $, and so on, until the largest frequent itemsets $ L_{k} $ are generated. The data mining process of the Apriori algorithm can therefore be divided into two steps:

(1) Generating Frequent Item Sets

(a) From the set $ L_{k - 1} $ of frequent (k-1)-itemsets, generate the set $ C_{k} $ of all candidate k-itemsets. Let p and q be two different itemsets in $ L_{k - 1} $. If the first k-2 items of p and q are the same and the last item of p is greater than the last item of q, add the last item of q to p to form a candidate k-itemset. Find all such k-itemsets in turn; together they make up $ C_{k} $.

(b) Prune $ C_{k} $. For each itemset w in $ C_{k} $, check whether all of its (k-1)-subsets are frequent itemsets. If any of these subsets does not belong to $ L_{k - 1} $, w is removed from $ C_{k} $.

(c) Calculate the support of each itemset w in $ C_{k} $:

$$ Support = \frac{{N_{i} }}{N} $$

(4)

where $ N_{i} $ is the number of transactions that contain the itemset, and $ N $ is the number of all transactions.

(d) Add the itemsets that meet the condition Support ≥ minsup to the set of frequent k-itemsets, which is called $ L_{k} $.

(e) If frequent k-itemsets have been found and k < kmax, repeat the steps above to look for (k + 1)-itemsets.
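Steps (a) to (e) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `frequent_itemsets`, its parameters and the toy transactions are hypothetical, and the join in step (a) pairs itemsets in sorted order so that each candidate is generated once.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup, kmax=None):
    """Return {itemset: support} for all itemsets with support >= minsup."""
    n = len(transactions)

    def sup(itemset):
        # Step (c), Eq. (4): N_i / N
        return sum(1 for t in transactions if itemset <= t) / n

    # Frequent 1-itemsets L_1
    items = sorted({i for t in transactions for i in t})
    current = {}
    for i in items:
        s = sup(frozenset([i]))
        if s >= minsup:
            current[frozenset([i])] = s

    result, k = {}, 1
    while current:
        result.update(current)
        if kmax is not None and k >= kmax:  # step (e): stop at kmax
            break
        k += 1
        # Step (a): join two (k-1)-itemsets that share their first k-2 items
        prev = sorted(tuple(sorted(s)) for s in current)
        candidates = set()
        for p, q in combinations(prev, 2):
            if p[:-1] == q[:-1]:
                candidates.add(frozenset(p) | {q[-1]})
        # Step (b): drop candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Steps (c) and (d): count support and keep the frequent candidates
        current = {}
        for c in candidates:
            s = sup(c)
            if s >= minsup:
                current[c] = s
    return result


# Hypothetical toy transactions (illustration only).
toy = [
    frozenset({"A", "B", "U"}),
    frozenset({"A", "U"}),
    frozenset({"A", "B"}),
    frozenset({"B"}),
    frozenset({"A", "B", "U"}),
]
freq = frequent_itemsets(toy, minsup=0.5)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

With `minsup=0.5` the pair `{B, U}` is infrequent in the toy data, so the candidate `{A, B, U}` is removed in the pruning step without a pass over the transactions.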

(2) Generate Association Rules Based on Frequent Item Sets

From all simple association rules generated from the frequent itemsets, select those whose confidence is not less than the preset minimum confidence min_conf; these are the valid association rules. The steps are as follows:

For every frequent itemset l in L, generate all non-empty proper subsets of l.

For each such subset A of l, check whether the following condition is met:

$$ \frac{Support(l)}{Support(A)} \ge \hbox{min} \_conf $$

(5)

where Support(l) and Support(A) are the supports of the itemset l and of its subset A, respectively. If the condition is met, output the rule $ A \to \bar{A} $, where $ \bar{A} = l - A $.

The flow chart of the process above is shown in Fig. 2.
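The rule generation step in Eq. (5) can be sketched as follows. This is a minimal sketch, assuming the frequent itemsets and their supports are already available as a dictionary; the function name `generate_rules` and the support values are hypothetical and only illustrate the mechanics.

```python
from itertools import combinations


def generate_rules(freq, min_conf):
    """Return (A, l - A, confidence) for every rule that satisfies Eq. (5).

    freq maps each frequent itemset to its support. By the a priori
    principle every subset of a frequent itemset is itself in freq.
    """
    rules = []
    for l, sup_l in freq.items():
        if len(l) < 2:
            continue  # a rule needs a non-empty premise and a non-empty result
        for r in range(1, len(l)):
            for a in combinations(sorted(l), r):
                premise = frozenset(a)
                conf = sup_l / freq[premise]  # Support(l) / Support(A)
                if conf >= min_conf:
                    rules.append((premise, l - premise, conf))
    return rules


# Hypothetical supports (illustration only).
freq = {
    frozenset({"A"}): 0.8,
    frozenset({"B"}): 0.8,
    frozenset({"U"}): 0.6,
    frozenset({"A", "B"}): 0.6,
    frozenset({"A", "U"}): 0.6,
}
rules = generate_rules(freq, min_conf=0.8)
for premise, result, conf in rules:
    print(sorted(premise), "->", sorted(result), round(conf, 2))
```

Because every support is looked up in `freq`, this step needs no further pass over the database once the frequent itemsets are known.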

3.3 Algorithm Performance

If any $ (k - 1) $-dimensional subset of a $ k $-dimensional itemset $ x $ is not a frequent itemset, then $ x $ is not a frequent itemset. Some elements $ c $ in $ C_{k} $ can therefore be eliminated by determining whether the $ k $ $ (k - 1) $-dimensional subsets of $ c $ are all in $ L_{k - 1} $. In the best case, an eliminated $ c $ requires only one scan of $ L_{k - 1} $; in the worst case, scanning continues until the $ k $-th $ (k - 1) $-dimensional subset is found not to be in $ L_{k - 1} $. The average number of inspections per element is therefore $ \left| {L_{k - 1} } \right| \times k/2 $, the average calculation amount for the whole pruning process is $ \left| {C_{k} } \right| \times \left| {L_{k - 1} } \right| \times k/2 $, and the average calculation amount for generating the frequent itemsets is $ \left| {C_{k} } \right| \times \left| {L_{k - 1} } \right| \times k/2 + \left| D \right| $.
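As a worked illustration of this estimate, assume hypothetical sizes $ k = 3 $, $ \left| {C_{3} } \right| = 4 $, $ \left| {L_{2} } \right| = 5 $ and $ \left| D \right| = 100 $ (these numbers are chosen only for the arithmetic and do not come from the study data):

$$ \left| {C_{3} } \right| \times \left| {L_{2} } \right| \times \frac{3}{2} + \left| D \right| = 4 \times 5 \times 1.5 + 100 = 130 $$

Of this total, 30 inspections come from pruning the candidates against $ L_{2} $ and 100 from the pass over the database to count support.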

3.4 An Example of Data Mining for Safe Operation of Air Traffic Management Based on Apriori Algorithm

In this paper, the Apriori model in the SPSS Modeler software is adopted to mine association rules for unsafe events in air traffic management operation, in order to explore which factors, when present at the same time, lead to a higher probability of unsafe events. Because rule support and rule confidence are determined by the nature of the actual problem, a support of 20% is selected as minsup to analyze the implied association rules in the data on unsafe air traffic management events.

According to the selection of research variables above, six antecedent factors are set: A: control equipment; B: workload of the controller; C: psychological quality of the controller; D: physical quality of the controller; E: airspace environment; F: control room environment. The level of the unsafe air traffic management event is taken as the consequent, and general unsafe air traffic management events are used as an example.
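The setup described here (factors A to F as antecedents, the incident level as the only consequent) can be sketched in Python. This is not the SPSS Modeler procedure: with only six factors, the sketch simply enumerates every combination of antecedents by brute force and filters by the thresholds. The records, the label `general` and the function name `mine_rules` are hypothetical, invented only to exercise the code, and the output does not reproduce any result of this study.

```python
from itertools import combinations

FACTORS = "ABCDEF"       # factor codes as defined in the text
CONSEQUENT = "general"   # hypothetical label for a general unsafe event

# Hypothetical records: the factors present in each case, plus the
# consequent label when the case was a general unsafe event.
records = [
    {"A", "B", "E", "general"},
    {"B", "E", "general"},
    {"B", "C", "general"},
    {"A", "B", "E", "general"},
    {"D", "F"},
    {"C", "D"},
    {"B", "E", "general"},
    {"A", "F"},
    {"E", "general"},
    {"C"},
]


def mine_rules(records, minsup=0.2, min_conf=0.8):
    """Return (antecedent, support, confidence, lift) for rules X -> consequent."""
    n = len(records)
    s_y = sum(1 for r in records if CONSEQUENT in r) / n
    rules = []
    for size in range(1, len(FACTORS) + 1):
        for antecedent in combinations(FACTORS, size):
            x = set(antecedent)
            n_x = sum(1 for r in records if x <= r)
            n_xy = sum(1 for r in records if x <= r and CONSEQUENT in r)
            if n_x == 0:
                continue
            sup, conf = n_xy / n, n_xy / n_x
            rule_lift = conf / s_y
            if sup >= minsup and conf >= min_conf and rule_lift > 1:
                rules.append((antecedent, sup, conf, rule_lift))
    # Sort by support, highest first, then by antecedent for a stable order
    return sorted(rules, key=lambda r: (-r[1], r[0]))


rules = mine_rules(records)
for antecedent, sup, conf, rule_lift in rules[:10]:
    print("".join(antecedent), "->", CONSEQUENT,
          f"support={sup:.2f} confidence={conf:.2f} lift={rule_lift:.2f}")
```

Restricting the consequent to the incident level keeps the search space at 63 antecedent combinations, which is why brute-force enumeration is adequate for a sketch of this size.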

With a rule support of 20%, a rule confidence of 80%, and a rule lift > 1, the mined association rules are sorted by support, and the first ten are listed in Table 1:

Table 1. Association rules for unsafe air traffic management events (Support 20%, Confidence 85%, Rule lift > 1)


It can be seen from the table that association analysis based on the Apriori algorithm is suitable for research on the operational safety of air traffic management. Taking general air traffic control incidents as an example, the main unsafe factors are the heavy workload of the controllers, the complexity of the airspace environment, and the relatively old air traffic control equipment. With a rule support of 20%, each of the antecedent factors may influence the operational safety of air traffic management, but different factors and different combinations of factors affect air traffic control incidents to different degrees.

In the future, this research method can be used to conduct correlation analysis on other levels of ATC operational incidents, to identify potential safety hazards and make rectifications in time, so that ATC operations can be carried out efficiently and safely in the long term.

4 Conclusions

On the basis of previous research, this paper gives a detailed interpretation of the Apriori algorithm and extends its field of application. It combines the algorithm with existing air traffic management data, applies data mining technology to the analysis of air traffic control safety management, and identifies the main factors affecting the safety of air traffic control operations and their impact. The following conclusions can be drawn:

(1) Data mining technology is feasible in the field of air traffic control operational safety. The Apriori association analysis algorithm can effectively analyze the factors influencing ATC operational safety, including the importance of each factor and the correlations among the factors.
(2) With a rule support of 20%, each of the antecedent factors may influence the operational safety of air traffic management, but the degree of influence differs from factor to factor.
(3) In general, the main reasons for unsafe incidents are that the controller has a large workload, the airspace environment is complicated, and the control equipment is old. In addition, the physical quality and psychological quality of the controllers also have a certain influence on the operation of air traffic control. This may be because controllers with better physical quality have stronger resistance to fatigue, and controllers with good psychological quality can adapt to greater work pressure, which allows them to handle complex tasks more calmly in control work.
(4) In future research, this algorithm or an improved version of it can be applied to analyze other levels of air traffic control incidents, identify important influencing factors in time, and propose improvement measures, so that air traffic management can run efficiently and safely in the long term.

References

1. McKinsey Global Institute: Big data: the next frontier for innovation, competition, and productivity (May 2011)
2. FAA: SWIM core architecture evolution concepts. MITRE Technical report MTR90193. Mitre, McLean, VA (2009)
3. Inniss, T.R.: Seasonal clustering technique for time series data. Eur. J. Oper. Res. 175(1), 376–384 (2006)
4. Murca, M.C.R., Delaura, R., Hansman, R.J., et al.: Trajectory clustering and classification for characterization of air traffic flows. In: AIAA Aviation Technology, Integration, and Operations Conference, p. 3760 (2015)
5. Xu, N., Donohue, G., Laskey, K.B., et al.: Estimation of delay propagation in aviation system using Bayesian network. USA (2016)
6. Anderson, T.K.: Kernel density estimation and K-means clustering to profile road accident hotspots. Accid. Anal. Prev. 41(3), 359–364 (2009)
7. Wang, H.H.: The application of the mining of association rules in analysis of traffic accidents. Anhui University (2011)
8. Bongiorno, C., Gurtner, G., Lillo, F., et al.: Statistical characterization of deviations from planned flight trajectories in air traffic management. J. Air Transp. Manag. 58, 152–163 (2017)
9. Zhou, J.L.: Risk identification and analysis of air traffic control based on text data and radiotelephony data. Civil Aviation Flight University of China (2017)
10. Habib, A.A., Govindaraju, R.: Success measures evaluation for mobile commerce using text mining based on customer tweets. In: IOP Conference Series: Materials Science and Engineering, vol. 319, no. 1, p. 012009 (2018)
11. Janani, G., Devi, N.R.: Road traffic accidents analysis using data mining techniques. JITA J. Inf. Technol. Appl. 14(2) (2016)
12. Huang, W.C., Jia, L., Peng, D.G.: Apriori-based association rule algorithm and its application in power plant. J. Syst. Simul. 266–271 (2018)
13. Wu, F.D.: Application research of enrollment management based on apriori algorithm. Hebei University (2014)
14. Kumar, B.S., Rukmani, K.V.: Implementation of web usage mining using APRIORI and FP growth algorithms. Int. J. Adv. Netw. Appl. 1(06), 400–404 (2010)
15. Zhang, Y.X., Wang, X.R., Wu, M.G.: Evaluation of the unconventional operational risks for the air traffic control based on the fuzzy hierarchical analysis process and the cloud model. J. Saf. Environ. 16(4), 42–47 (2016)
16. Du, H.B., Wang, X.L.: Risk identification of airport traffic control tower based on flow diagram. China Saf. Sci. J. 20(6), 80–87 (2010)


Author information

Authors and Affiliations

College of Air Traffic Management, Civil Aviation University of China, Tianjin, China
Zhaoyue Zhang
School of Flight Technology, Civil Aviation University of China, Tianjin, China
Jing Zhang
College of Information and Communication Engineering, Harbin Engineering University, Harbin, China
Sen Wang


Corresponding author

Correspondence to Zhaoyue Zhang.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Xingming Sun
Nanjing University of Information Science and Technology, Nanjing, China
Zhaoqing Pan
Department of Computer Science, Purdue University, West Lafayette, IN, USA
Elisa Bertino


About this paper

Cite this paper

Zhang, Z., Zhang, J., Wang, S. (2018). Research on Application of ATC Operation Security Based on Data Mining. In: Sun, X., Pan, Z., Bertino, E. (eds) Cloud Computing and Security. ICCCS 2018. Lecture Notes in Computer Science, vol 11067. Springer, Cham. https://doi.org/10.1007/978-3-030-00018-9_54


DOI: https://doi.org/10.1007/978-3-030-00018-9_54
Published: 26 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00017-2
Online ISBN: 978-3-030-00018-9
eBook Packages: Computer Science, Computer Science (R0)
