1 Introduction

As our society becomes more connected and technologically advanced, the role of security solutions and mitigation strategies will only grow in importance. The challenge of securing our systems, and the society that relies on them, is compounded by the constantly evolving threat landscape (Xiao et al. 2018b; Guan and Ge 2018; Sliti et al. 2018).Footnote 1 Hence, designing more efficient and effective cyber security solutions is a topic of ongoing interest.

Cyber security refers to the use of various measures, methods, and means to ensure that systems are protected from threats and vulnerabilities, and that users are provided with correct services efficiently. Cyber security as discussed in this paper therefore includes threats from outsideFootnote 2 and within systems (known as network security in some studies). Because these threats can severely affect the regular operation of systems, the goal of cyber security is to protect systems from threats as far as possible, and to meet, in a timely and effective manner, the requirements of detection before an incident, handling during the incident, and recovery after the incident.

In recent years, there have been attempts to design artificial intelligence (AI)-based solutions for a broad range of cyber security applications, partly due to organizations' growing recognition of the importance of AI in mitigating cyber threats.Footnote 3 For example, AI-based approaches to modeling nonlinear problems have been shown to perform well in nonlinear classification (Ozsen et al. 2009), which can also be used to facilitate cyber threat classification. Interest in AI-based solutions is also partly driven by advances in computing capabilities. For example, according to Stanford University's AI Index 2019 Report,Footnote 4 the time required to train a large-scale image classification system on cloud infrastructure decreased from approximately three hours in October 2017 to about 88 seconds in July 2019. Computing power for AI-based approaches is also reportedly doubling every three months or so, surpassing Moore's law. Such capabilities can be utilized to improve the performance of AI-based cyber security solutions.3,4 Examples of AI-based solutions include those developed by MIT and PatternEx (Veeramachaneni et al. 2016b), Darktrace (which uses AI to build an enterprise immune system),Footnote 5 DeepArmor (an AI-driven system against adversarial attacks) (Ji et al. 2019a), X by Invincea (which uses deep learning to understand and detect security threats),Footnote 6 and Cognigo's DataSense (which uses machine learning algorithms to distinguish sensitive from non-sensitive data and protect the former).Footnote 7 However, it is also known that machine intelligence cannot totally replace human intelligence, and the next generation of AI will most probably combine both human and machine intelligence (Kowert 2017; Zhang et al. 2020), an approach also referred to as human-in-the-loop.

Therefore, this paper surveys and summarizes key AI-based approaches for cyber security applications in user access authentication, network situation awareness, dangerous behavior monitoring, and abnormal traffic identification. Specifically, we mainly searched the following academic platforms: Google Scholar, ACM Digital Library, IEEE Xplore, SpringerLink, and ScienceDirect, as well as the archival site ResearchGate, using keywords and Boolean operators such as:

  • (“artificial intelligence” OR “AI” OR “machine learning”) AND (“access authentication” OR “mode authentication” OR “biometric authentication”),

  • (“artificial intelligence” OR “AI” OR “machine learning”) AND (“situation awareness” OR “security situation awareness”),

  • (“artificial intelligence” OR “AI” OR “machine learning”) AND (“dangerous monitoring” OR “attacks”),

  • (“artificial intelligence” OR “AI” OR “machine learning”) AND (“traffic identification” OR “traffic analysis”),

  • (“artificial intelligence” OR “AI” OR “machine learning”) AND (“cyber security” OR “network security”).

We located over 150 articles and applied the following inclusion criteria, which resulted in the selection of 54 articles for discussion in this paper.

  • The article has data, comparative experiments, or a detailed feasibility analysis of some proposed framework.

  • The subject of the article aligns with the topic of our survey.

  • The article was published in a peer-reviewed journal or a conference.

  • The article was published within the last five years.

In addition, we also located a number of related literature review and survey articles. Table 1 explains how our paper differs from them (note: the "Number of articles discussed" column counts only the related methods and frameworks).

Table 1 Existing literature review and survey articles on AI-based cyber security solutions

The remainder of this paper is organized as follows. In the next two sections, we briefly review the key advantages and limitations of utilizing AI in the four cyber security applications (i.e., user access authentication, network situation awareness, dangerous behavior monitoring, and abnormal traffic identification). In the fourth section, the conceptual human-in-the-loop cyber security model is presented. Finally, the last section concludes this paper.

2 Potential applications of AI in cyber security applications

This section reviews related literature on AI-based solutions for user access authentication, network situation awareness, dangerous behavior monitoring, and abnormal traffic identification in Sects. 2.1–2.4, prior to summarizing the discussion in Sect. 2.5.

2.1 User access authentication

2.1.1 User access authentication requirements

As the first line of defense in cyber security, the system needs to strengthen the management of user access authentication, accurately identify all kinds of camouflage behaviors, and detect illegal or malicious actors. Before any operation, the system should ensure that users are authenticated. At the same time, user data should be kept confidential to prevent other risk events such as the malicious collection of user information. Figure 1 shows that, in the current authentication process, one research focus is on adding other features to strengthen the uniqueness of the password-matching process, so as to minimize the probability of impostors passing themselves off as legitimate users.

Fig. 1
figure 1

User access authentication research focuses

2.1.2 Cases of mode authentication

Matching passwords while incorporating additional user characteristics to secure dual authentication is a challenge that mode authentication needs to solve. For example, current ATMs only use PIN codes for identity verification, and this single mode does not guarantee the security of authentication (Adekunle et al. 2019). Given the shortcomings of one-time authentication, multi-factor authentication techniques have been considered; e.g., Shoufan (2017a) used Random Forest to achieve this goal. Korkmaz (2016) not only performed password matching in the password authentication system, but also used a neural network to learn the user's keyboard style, including typing speed, typing rhythm, key combinations, and other aspects. Wang and Fang (2019) designed a kernel function with both global and local components and built a mobile communication network security authentication mechanism based on Support Vector Regression (SVR), although relatively little data was used in the simulation. Chang et al. (2016) used a One-Class Support Vector Machine (One-Class SVM) to realize keystroke dynamics pattern recognition, a pattern that has received widespread attention due to AI (Qiu 2017). Lu et al. (2020) used Convolutional Neural Networks (CNNs), reinforcement learning, and transfer learning to construct a physical authentication scheme aimed at mobile edge computing and used to resist rogue edge attacks.
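To illustrate the keystroke-dynamics idea behind such schemes, the sketch below (a simplified stand-in, not the method of any cited paper) enrolls a user's inter-key timing profile and rejects login attempts whose typing rhythm deviates too far from it; all timing values and the z-score threshold are hypothetical:

```python
from statistics import mean, stdev

def build_profile(samples):
    """Per-feature mean and std from enrollment typing samples.
    Each sample is a list of inter-key intervals (seconds)."""
    cols = list(zip(*samples))
    return [(mean(c), stdev(c)) for c in cols]

def is_genuine(profile, attempt, z_threshold=2.5):
    """Accept only if every timing feature stays within z_threshold
    standard deviations of the enrolled profile."""
    for (m, s), x in zip(profile, attempt):
        if s > 0 and abs(x - m) / s > z_threshold:
            return False
    return True

# Hypothetical enrollment data: four typing samples, three intervals each.
enroll = [[0.11, 0.20, 0.15], [0.12, 0.22, 0.14],
          [0.10, 0.21, 0.16], [0.13, 0.19, 0.15]]
profile = build_profile(enroll)
print(is_genuine(profile, [0.12, 0.20, 0.15]))  # similar rhythm -> True
print(is_genuine(profile, [0.40, 0.05, 0.60]))  # very different -> False
```

A One-Class SVM, as in Chang et al. (2016), plays a similar role to this threshold test but learns a nonlinear boundary around the genuine samples instead.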

2.1.3 Cases of biometric authentication

Compared with mode authentication, biometric authentication has attracted widespread attention because of its uniqueness, non-replicability, heredity, and invariance. McIntire et al. (2009) pointed out that, to ensure the security and stability of networked cooperation, it is necessary to determine whether the other party is an AI or a human user, which requires a "reverse Turing test" (a set of problems that humans can solve but computers cannot). After determining whether the party is a machine or a human, the human's identity must then be verified to prevent impersonation. At present, identification is mainly based on the inherent characteristics of the human body (such as fingerprint, iris, etc.) and behavioral characteristics (such as voice, gait, etc.), which the powerful self-learning ability of AI can exploit effectively.

In fingerprint recognition, Singh et al. (2017b) proposed a fingerprint recognition method based on sparse proximity. Hariyanto et al. (2015b) proposed a fingerprint feature-point matching algorithm based on an Artificial Neural Network (ANN) that compared the distances between feature points; the training process was accelerated by hardware, although the paper lacked a performance evaluation. Saeed et al. (2018a) proposed a new fingerprint classification method based on a modified Histograms of Oriented Gradients (HOG) descriptor; this system used an Extreme Learning Machine (ELM) with an RBF kernel. Bakhshi and Veisi (2019a) put forward an end-to-end recognition model based on a CNN without explicit feature extraction. In face recognition, Ding and Tao (2018) proposed a CNN-based framework in which features extracted from clear and blurred pictures were shared and the triplet loss function was improved. Salyut and Kurnaz (2018b) proposed an ANN based on local binary patterns to realize contour face recognition. Verma et al. (2019) used a hybrid genetic feature learning network for facial expression recognition.

In iris recognition, Păvăloi and Niţă (2018a) used several distance measures together with the Scale Invariant Feature Transform (SIFT). Zhang et al. (2019) proposed a new method that uses dilated convolutions to extract additional iris features, and several evaluation methods were used to test the model. Gangwar and Joshi (2016a) used a Deep Convolutional Neural Network (DCNN) for iris recognition. Another technique combining AI with feature extraction, namely genetic and evolutionary feature extraction, was used by Shelton et al. (2016) to extract the most significant features from small-size images. For finger vein recognition, multi-layer ELM (Yang et al. 2019), multi-layer CNNs (Liu et al. 2017; Zhang et al. 2019; Hong et al. 2017), Fully Convolutional Networks (FCNs) (Zeng et al. 2020), transfer learning (Fairuz et al. 2018), and other methods have been used.
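Many of these recognition pipelines ultimately reduce to comparing a probe's feature vector against an enrolled template. A minimal sketch of that matching step, assuming hypothetical embeddings produced by some feature extractor (e.g. a CNN) and an illustrative similarity threshold:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match(template, probe, threshold=0.95):
    """Accept the probe if it is close enough to the enrolled template."""
    return cosine(template, probe) >= threshold

# Hypothetical enrolled embedding and two probe embeddings.
enrolled = [0.1, 0.8, 0.3, 0.5]
print(match(enrolled, [0.12, 0.79, 0.31, 0.48]))  # same subject -> True
print(match(enrolled, [0.9, 0.1, 0.7, 0.2]))      # different subject -> False
```

The threshold trades false acceptances against false rejections, which is why the EER and ROC metrics discussed in Sect. 2.5 are the standard way to evaluate such systems.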

Amberkar et al. (2018b) studied the important role of Recurrent Neural Networks (RNNs) in voice recognition. Some researchers introduced ladder networks to speech recognition (Parthasarathy and Busso 2019b)Footnote 8 and achieved good results. Han and Wang (2019a) proposed a new speech recognition method that used Deep Belief Networks (DBNs) to extract features and a Proximal SVM to achieve recognition. Gait, as an important behavioral characteristic, has also attracted many researchers. For instance, Uddin et al. (2017a) first extracted features from depth silhouettes and then used a CNN for training and recognition. Deng et al. (2019) combined RNNs, CNNs, and Radial Basis Function Neural Networks (RBFNNs) to eliminate the influence of viewing angle on gait recognition and achieved good experimental results. The C4.5 decision tree (Thongsook et al. 2019a), HOG (Sugandhi and Raju 2019), and DCNNs (Nithyakani et al. 2019) have also performed well in gait recognition.

2.2 Network situation awareness

2.2.1 Network situation awareness requirements

During network construction, network designers may not find the vulnerabilities and insecurities in the network topology. During network operation, the non-uniform flow of data can expose weak links in the network; perceiving these weak links in advance and providing a basis for network adjustment requires network situation awareness. In the process of network situation awareness, complex networks need to be modeled, the security situation of the network analyzed, and quantitative situation-awareness results produced. Achieving this requires the situation awareness model to have a strong knowledge base from which it can quickly detect and match network situations. At the same time, the model needs the ability to extract features, so as to handle situations that have never appeared in the network before, and to reason over them to give reliable perception results.

2.2.2 Cases of network situational awareness combined with AI

Multi-entity Bayesian networks (MEBNs) perform well in situational awareness but suffer from problems such as complexity, so a human-aided approach has been adopted (Young Park et al. 2016b). Fuzzy Neural Networks (FNNs) can also play an important role in situation assessment (Li and Li 2017a); machine learning methods combined with fuzzy theory can better reflect changes in network state. Yunhu Jin et al. (2016b) proposed an assessment model based on Random Forest: every tree in the forest used independent samples and participated in the classification jointly, making the final result more objective. Li et al. (2018b) proposed an information fusion model based on a time-space network situation awareness mechanism, which used an RBFNN for situation prediction. Yang et al. (2019) proposed a new CNN-based method for calculating security indexes, which can help assess the network situation.
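Whatever model produces the per-dimension security indexes, the final step of situation assessment is typically to aggregate them into a single quantitative situation value. A minimal sketch of that aggregation, with hypothetical index values, weights, and level boundaries (none of which come from the cited works):

```python
def situation_score(indexes, weights):
    """Aggregate per-dimension security indexes (0 = safe, 1 = critical)
    into a single weighted situation value."""
    total = sum(weights)
    return sum(i * w for i, w in zip(indexes, weights)) / total

def situation_level(score):
    """Map the quantitative score onto a coarse situation level."""
    if score < 0.3:
        return "secure"
    if score < 0.6:
        return "warning"
    return "critical"

# Hypothetical indexes for threat, vulnerability, and asset dimensions.
score = situation_score([0.2, 0.7, 0.4], weights=[0.5, 0.3, 0.2])
print(round(score, 2), situation_level(score))
```

The weights would in practice be learned or assigned by experts, which is one place where the human-in-the-loop idea discussed later naturally enters.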

Shi et al. (2017) proposed a security situational awareness model based on the immune system and grey prediction theory. Dongmei and Jinxing (2018) used a Wavelet Neural Network (WNN) based on the particle swarm algorithm to achieve network situational awareness; they also designed a new algorithm to reduce data attributes, aiming to meet the requirements of situation awareness in big data environments. Naderpour et al. (2014) used a dynamic Bayesian network as the situation assessment component and a fuzzy risk estimation method to generate results; this design also reflected the human-in-the-loop idea well. Bao et al. (2019b), targeting the context of big data and AI, optimized the design of an information security situation awareness system, including optimizing the system hardware configuration, standardizing the synchronous operation mechanism of AI across multiple data security perception tasks, improving the information security situation inference algorithm, designing the system software structure, and adding comparative repair steps based on security characteristic parameters.

2.3 Dangerous behavior monitoring

While new technologies such as big data and cloud computing continue to emerge, hackers' offensive methods are also constantly developing. With the rapid growth of data volume and increasing access to the Internet, hackers are committed to finding "lethal points" of the network and may launch attacks at any time. The original intrusion detection systems have been unable to adapt to the characteristics of such networks. However, the high-speed flow of data is also conducive to finding traces left by hacking activities, which become important evidence for taking security precautions in advance. To achieve cyber security with accurate methods, it is necessary to monitor dangerous behaviors and their types in time. Otherwise, security becomes a matter of "emergency treatment", which may protect the network but wastes a great deal of resources. To this end, researchers have begun to improve and innovate on the original intrusion detection systems, making them as adaptable as possible to the requirements of current networks.

Marir et al. (2018) proposed a new distributed method for detecting abnormal behavior in large-scale networks. It combined deep feature extraction with a multi-layer ensemble Support Vector Machine (SVM), and used a distributed DBN to reduce the dimensionality of large-scale network traffic datasets so as to find abnormal behaviors. Kanimozhi and Jacob (2019b) proposed an AI-based, hyper-parameter-optimized network intrusion detection system. The system used ANN techniques to detect botnet attacks and could be deployed on multiple machines. Aljamal et al. (2019b) proposed a hybrid intrusion detection system using machine learning in a cloud computing environment; the system fused the K-Means clustering algorithm and the SVM classification algorithm. Pandeeswari and Kumar (2016) proposed a hypervisor-based anomaly detection system whose main technique was a neural network based on the fuzzy C-means algorithm. In the cloud computing environment, the system showed good performance under low-frequency attacks.

Some systems focused on monitoring a single dangerous behavior, such as Distributed Denial of Service (DDoS). Jyothi et al. (2016b) proposed a complete detection framework for DDoS, which used K-Means for behavior clustering and an SVM for classification, and achieved good experimental results. Yuan et al. (2017a) proposed a deep-learning-based detection method for DDoS in which the whole system consisted of CNNs, RNNs, and fully-connected layers. Hsieh and Chan (2016a) divided their system into five parts: a data collector, Hadoop-HPFS, a format converter, a data processing device, and a neural network detection module. This system could analyze high-speed, high-traffic network systems, and its neural networks could effectively identify packet characteristics. The advantages of AI can play a significant role in mitigating a variety of specific attacks on the network (Jenab and Moslehpour 2016).
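The clustering half of the K-Means-plus-classifier pipeline used in such DDoS detectors can be illustrated with a toy example. The sketch below runs a minimal k-means over hypothetical flow features (packets per second, mean packet size); the cluster whose center has the far higher packet rate would be flagged as suspected flood traffic. This is a simplified illustration, not the cited systems' implementation:

```python
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cl):
    return [sum(c) / len(cl) for c in zip(*cl)]

def kmeans(points, k=2, iters=10):
    """Minimal k-means; initialized (for k=2) from the first and last points."""
    centers = [list(points[0]), list(points[-1])]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: dist2(p, centers[c]))
            clusters[j].append(p)
        centers = [centroid(cl) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters

# Hypothetical flow features: (packets per second, mean packet size).
flows = [(10, 500), (12, 480), (9, 520),        # benign-looking flows
         (900, 60), (950, 64), (880, 58)]       # flood-like flows
centers, clusters = kmeans(flows, k=2)
suspect = max(range(2), key=lambda i: centers[i][0])
print(sorted(clusters[suspect]))
```

In the actual frameworks, a supervised classifier such as an SVM would then label the suspicious cluster's flows by attack type.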

With the advent of the 5G era, some scholars have started to study anomaly detection for 5G technologies. For example, an adaptive deep-learning-based 5G network anomaly detection system was proposed by Fernández Maimó et al. (2018). In this framework, two layers of deep learning models were used: one focused on network flow aggregation to quickly search for abnormal signs, mainly using a Deep Neural Network (DNN); the other identified network anomalies based on the relationship between the timeline and related symptoms, communicating directly with the monitoring and diagnosis module once anomalies were found. A Long Short-Term Memory (LSTM) network was implemented to handle the time series.

2.4 Abnormal traffic identification

Any network has a certain carrying capacity. Within its normal threshold, a network can function properly and provide users with high-quality services. Hackers may deliberately inject a large amount of illegal data into the network, overloading nodes and links, causing accidents, preventing services from being provided, and even leading to serious problems such as information loss. Providing an important basis for network situational awareness through the analysis of network traffic, detecting high-risk behaviors in cyberspace in time, and taking effective measures are of great significance for enhancing network responsiveness and maintaining overall cyber security.

According to Ahmed et al. (2015a), abnormal traffic detection methods can be divided into four categories: detection methods based on classification, statistics, clustering, and information theory. Aljurayban and Emam (2015a) proposed an intrusion detection system framework for cloud computing that could be integrated at different cloud levels and could capture traffic and send it to an ANN. Zhang et al. (2019) proposed a Parallel Cross Convolutional Neural Network (PCCN) based on deep learning to implement traffic anomaly detection in multi-class imbalanced networks; it was mainly composed of two parallel CNNs and used multiple feature fusion methods. Zeng et al. (2019) considered the current trend toward network traffic encryption and proposed an end-to-end network traffic recognition framework based on deep learning; the framework had a two-layer structure, using a CNN to extract features and an LSTM to capture temporal characteristics. Kong's team is dedicated to combining abnormal traffic identification with AI. They compared the performance of K-Means (unsupervised) and SVM (supervised) methods on abnormal traffic (Kong et al. 2018a), built an SVM-based system to identify and classify multiple types of attack traffic (Kong et al. 2017b), and proposed the use of parallel computing to accelerate model training (Kong et al. 2018b).
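Most of the classifiers above start not from raw packets but from per-flow statistics. A hedged sketch of the kind of fixed-length feature vector typically fed to such models (the feature names and example values are illustrative, not taken from the cited works):

```python
from statistics import mean, pstdev

def flow_features(pkt_sizes, pkt_times):
    """Condense one flow's packets into a fixed-length feature vector,
    as commonly done before feeding a traffic classifier."""
    gaps = [t2 - t1 for t1, t2 in zip(pkt_times, pkt_times[1:])]
    duration = pkt_times[-1] - pkt_times[0]
    return {
        "pkt_count": len(pkt_sizes),
        "bytes": sum(pkt_sizes),
        "mean_size": mean(pkt_sizes),
        "size_std": pstdev(pkt_sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "pkts_per_s": len(pkt_sizes) / duration if duration else 0.0,
    }

# Hypothetical flow: five packets with sizes (bytes) and timestamps (s).
f = flow_features([60, 1500, 1500, 60, 800], [0.0, 0.1, 0.2, 0.35, 0.5])
print(f["pkt_count"], f["bytes"], round(f["pkts_per_s"], 1))
```

Such statistics remain available even when payloads are encrypted, which is why end-to-end frameworks like that of Zeng et al. (2019) can operate on encrypted traffic.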

2.5 Summary

The four preceding subsections introduced AI in cyber security from different aspects. This subsection summarizes the relevant techniques used in each aspect, as shown in Table 2.

Table 2 Basic methods used in references

Summarizing these articles, we find that most of the proposed methods are realized by transforming the basic AI methods shown in Table 2. Among them, 24% of the methods used CNNs, 15% used SVMs, and 12% used ANNs, making these the most frequently used basic methods (refer to Fig. 2a for the detailed usage proportions). These basic methods provide the foundation for, and reflect the feasibility and advantages of, AI applications in cyber security.

At the same time, the field of cyber security has its own characteristics, so these articles improve the basic methods in light of their research direction, mainly through: methods fusion (using two or more basic methods in the model), features selection (selecting new features or representations to improve identification ability), and models optimization (speeding up parameter updates or better locating the optimal solution). From this point of view, we classify the articles in Table 3.

To describe more clearly how the basic methods are used across the four research aspects, Table 3 is visualized as a pie chart in Fig. 2b. For user access authentication, more research focused on features selection. Network situation awareness and dangerous behavior monitoring focused on models optimization and methods fusion. Models optimization was the focus of abnormal traffic identification. For each research aspect, researchers can thus choose how to apply the methods so as to achieve new technical breakthroughs.

Figure 3 shows a model that summarizes most of the research ideas in the field of cyber security. This model deals with security issues through four steps, including data selection and acquisition, data feature extraction, model construction, and specific applications. To this end, the entire model is divided into four levels as follows:

  • Data layer: data selection is the most basic task, and the quality of the selected data directly affects the performance of the model. Across the four research aspects, the data used in the experiments include general datasets and self-collected datasets. In mode authentication and network situation awareness, all the articles mentioned in this paper (except Dongmei and Jinxing (2018)) used self-collected datasets, such as operator behaviors (Shoufan 2017a), cyber security reports (Yunhu Jin et al. 2016b), specific network traffic data (Li and Li 2017a; Li et al. 2018b), etc. Using self-collected data can enrich the diversity of data, but it makes it harder to estimate the accuracy of a single model and to compare different models. By contrast, in the remaining research aspects only a small number of articles collected their own data (e.g. Chang et al. 2016; Lu et al. 2020; Shoufan 2017a; Korkmaz 2016), and most used general datasets. The general datasets covered are given in Table 4.

  • Feature layer: effective feature extraction is an important factor in determining security issues accurately. Unified processing of the data is a necessary step before feature extraction begins, especially when using self-collected datasets [e.g. (Wang and Fang 2019)]. Some methods integrated feature extraction into model construction and representation, while others performed separate feature extraction to enhance the data's expressiveness (refer to Table 3).

  • Intelligent layer: this layer is implemented in two steps, namely modeling and evaluation. The construction of the model is the essential step embodying AI and the core content of the general model (for the basic methods and their usages, refer to Tables 2 and 3, respectively). The effectiveness of the model is judged by evaluation methods. The most commonly used metric was accuracy, followed by the equal error rate (EER). Besides, some studies used evaluation methods specific to their problems, such as response time (e.g. Lu et al. 2020; Hariyanto et al. 2015b; Salyut and Kurnaz 2018b), the receiver operating characteristic (ROC) curve (e.g. Zhang et al. 2019; Ding and Tao 2018; Shelton et al. 2016; Fairuz et al. 2018), the cumulative match characteristic (CMC) curve (e.g. Shelton et al. 2016), etc.

  • Application layer: after construction, these models either provided solutions to problems or were deployed in combination with specific scenarios. The applications shared a consistent theme: using AI to ensure cyber security.
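The equal error rate used in the intelligent layer is the operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR). A small sketch computing an approximate EER by sweeping a threshold over hypothetical match scores:

```python
def equal_error_rate(genuine, impostor):
    """Sweep a decision threshold over match scores (higher = more likely
    genuine) and return the point where FAR and FRR are closest,
    i.e. the approximate EER."""
    best = None
    for t in sorted(genuine + impostor):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if best is None or abs(far - frr) < abs(best[0] - best[1]):
            best = (far, frr)
    return (best[0] + best[1]) / 2

# Hypothetical match scores from a biometric system.
gen = [0.9, 0.8, 0.75, 0.6, 0.55]
imp = [0.5, 0.45, 0.4, 0.65, 0.3]
print(equal_error_rate(gen, imp))  # -> 0.2
```

A lower EER indicates a better-separated system; the ROC curve mentioned above plots the same FAR/FRR trade-off across all thresholds.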

Fig. 2
figure 2

Proportion of basic methods and their usage

Table 3 Classification of basic methods used
Fig. 3
figure 3

A general model for cyber security

In addition, this paper summarizes some of the innovative methods mentioned above in Table 5. These summaries include the datasets, the features and their extraction methods, the classification models, and the maximum accuracy of each method. At the same time, timeliness and complexity are used to compare the various methods; these two indicators reflect the effectiveness of the methods, which must also meet the processing requirements of cyber security issues.

In the field of cyber security, AI can play an important role, but at the same time it needs to be adapted to better suit the requirements of this field. How to achieve fast detection, improve detection accuracy, and mine data characteristics are the focus of current research in this field.

3 Limitations of AI-based approaches

Can AI detect all uncertain events? The answer is no. As a "double-edged sword", this new technology has its own shortcomings alongside its good performance. This section discusses the factors that can make AI models unreliable in the field of cyber security.

3.1 Interference of confusing data

How much interference does it take to cheat AI? Perhaps one pixel is enough. Su et al.'s (2019) experiment showed that changing only one pixel in an image can lead to misclassification by a neural network. Kolosnjaji et al. (2018a) slightly modified a few bytes in malicious software samples, which led to neural network classification errors. Hu and Tan (2017) used a Generative Adversarial Network (GAN) to generate malware samples that could bypass the detection system. As can be seen from these examples, once the data is "infected", there is a chance of cheating the AI system, leaving the network in an unsafe state.
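The common thread in these attacks is that a small, targeted change to the most influential input features can flip a model's decision. A toy illustration on a hypothetical linear detector (the weights, features, and perturbation size are invented for illustration and unrelated to the cited models):

```python
def predict(w, b, x):
    """Hypothetical linear malware detector: score > 0 means 'malicious'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical trained weights over three file features.
w, b = [0.8, -0.5, 1.2], -1.0
x = [1.0, 0.2, 0.4]
print(predict(w, b, x) > 0)          # classified malicious -> True

# Perturb only the single most influential feature, in the direction
# that lowers the score -- the flavour of change these attacks exploit.
i = max(range(len(w)), key=lambda j: abs(w[j]))
x_adv = list(x)
x_adv[i] -= 0.2 * (1 if w[i] > 0 else -1)
print(predict(w, b, x_adv) > 0)      # decision flipped -> False
```

Deep models are nonlinear, but gradient-based attacks follow the same logic: move each input along the direction that most decreases the detection score.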

Table 4 Summary of general datasets
Table 5 Summary of some innovate methods

3.2 Maliciously modified model

The implementation of an AI model is a program, which may have vulnerabilities. These vulnerabilities may be due to unreasonable or careless design of the model's logical structure; they may come from the specific high-level language or hardware used, or from a backdoor embedded in the model. Gu et al. (2017b) implanted a backdoor in a neural network, which made the network perform very poorly on samples crafted by a specific attacker. These shortcomings also indicate that the answers given by the program are not necessarily accurate.

3.3 Lack of transparency in AI decision-making process

In the decision-making process of AI, none of the participants, including the programmers, knows why the AI model gives its final decision; i.e., the decision-making process of AI lacks transparency. The AI model is similar to a black box. In the process of its creation and self-improvement, it can automatically configure and adjust parameters without much staff intervention, thus saving human resources. At the same time, however, its decision-making process is difficult to explain clearly. Although an AI model can achieve high accuracy, that accuracy is measured on the test set; whether the model can achieve similarly high accuracy when facing unknown events remains to be verified. When there are objections to the decisions given by the AI model, it is difficult to explain the decision-making process, so some people will be skeptical of the results. That is not conducive to rapid judgment of the network situation and may even cause irreversible consequences. Some research teams have begun to conduct in-depth research on this issue (Schlegel et al. 2019).

3.4 High data requirements

At present, AI models generally need a large amount of data to complete training. Before the data is used, a series of operations may be required, such as data noise reduction, normalization, and missing-value filling. If a supervised method is used, the data must also be labeled manually.Footnote 9 However, due to the strong heterogeneity of cyberspace, different network structures may produce different high-risk events, and these events are sudden in nature. Therefore, not every possible high-risk event can be anticipated before the models are designed, nor can such events be analyzed and labeled in advance. Because AI models have such a high demand for data, they may consequently be unable to make timely judgments.
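The preprocessing steps mentioned above (missing-value filling and normalization) can be sketched as a single column-wise pass, here over hypothetical raw records with one missing field:

```python
def impute_and_scale(rows):
    """Column-wise: fill missing values (None) with the column mean,
    then min-max scale to [0, 1] -- typical steps before training."""
    cols = list(zip(*rows))
    out = []
    for col in cols:
        present = [v for v in col if v is not None]
        m = sum(present) / len(present)
        filled = [m if v is None else v for v in col]
        lo, hi = min(filled), max(filled)
        span = (hi - lo) or 1.0   # avoid division by zero on constant columns
        out.append([(v - lo) / span for v in filled])
    return [list(r) for r in zip(*out)]

# Hypothetical raw records with one missing field.
data = [[10, 200], [20, None], [30, 400]]
print(impute_and_scale(data))
```

The manual labeling burden, by contrast, cannot be automated away, which is precisely the bottleneck this subsection describes.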

4 Conceptual human-in-the-loop cyber security model

4.1 The development of human-in-the-loop

The design and implementation of AI, especially neural networks, try to emulate the human brain, aiming to achieve human-like thinking through connected neurons. Unfortunately, AI needs a large number of samples to learn, lacks reasoning ability, and yields trained models whose decision-making is complicated for us to understand. Although there have been some attempts to seek explanations of AI (Schlegel et al. 2019; Wang et al. 2020), this work is still at an early stage. Therefore, to use intelligence efficiently, relying on these tools without human participation is far from enough (Nunes et al. 2015).

AI plays an important role in the prevention and detection of high-risk network behaviors, but some adverse factors can interfere with its correct judgment. AI is meant to assist security specialists in this field, not to replace them. Therefore, it is still necessary for relevant specialists to intervene and use their network knowledge to make professional judgments on the current network situation.

At present, a new type of AI is being developed: human-in-the-loop. In 2017, the Defense Advanced Research Projects Agency (DARPA) designed the DARPA Robotics Challenge (DRC), which embodied the idea of human-machine teamwork.Footnote 10 In January 2019, DARPA announced the AI project KAIROS to implement a system that could identify events and draw humans' attention. In May, the agency announced the launch of the ACE project to develop air combat capabilities for human-machine collaborative dogfighting.Footnote 11 In November, the U.S. Department of Defense received a report on mechanical fighters and human-machine integration describing cyborg fighters to be built for future wars.Footnote 12 Research on this new technology has thus begun in the military field, which also reflects its importance.

Human-in-the-loop combines human wisdom with machine intelligence and is an important methodology for realizing the complementary advantages of humans and machines. AI can process large volumes of data quickly and recognizes specific scenarios well, but it may be disturbed and may not judge new situations accurately. Compared with machines, humans are more flexible and can judge new changes in the network more rapidly, though they still need machines in an auxiliary role. Interactive machine learning, which originated in AI research (Holzinger et al. 2016), has also appeared in cyber security. For example, applying human-in-the-loop to the monitoring of software side-channel vulnerabilities can effectively improve detection ability (Santhanam et al. 2017). Adding this idea to situational awareness has enhanced system reliability while also achieving visualization (Tyworth et al. 2013b). Thus, combining AI with human-in-the-loop in cyber security will further enhance models' capabilities.

4.2 Model design of cyber security based on human-in-the-loop

AI technology has significant advantages in cyber security applications, but it also has its own shortcomings. Based on this observation, this paper proposes a new human-in-the-loop model named the Human-in-the-Loop Cyber Security Model (HLCSM), whose design is shown in Fig. 4. HLCSM consists of two main sub-modules: the Machine Detection Module (MDM) and the Manual Intervention Module (MIM). The two sub-modules interact with each other to prevent and detect uncertain cyber events.

Fig. 4 Human-in-the-Loop Cyber Security Model (HLCSM)

4.2.1 Machine detection module (MDM)

In HLCSM, MDM plays the "leading role". When high-risk events arrive, MDM first preprocesses the data, which may include data cleaning, normalization, and other operations. Once the data are in a regular form, feature extraction becomes essential: the data contain both the key information needed to identify the event type and other information with little correlation to it. As the data volume grows, feature selection and dimensionality reduction are needed to complete tasks quickly. The extracted features are then passed to the recognition methods, the key link of MDM; only when the selected methods meet the requirements will the recognition accuracy be high. After a recognition result is produced, it is judged by the Confidence Level Module (CLM), whose value determines whether the final result is based on MDM alone.

MDM employs two recognition methods. Given that backdoors can be implanted in neural networks (Gu et al. 2017b), a single recognition method cannot guarantee a completely reliable result. Therefore, in the MDM design, two recognition methods generate judgment results in parallel, with the knowledge base providing the basis for judgment. The techniques underlying the two methods should be as different as possible to increase their diversity; using two methods makes evasion harder and further improves the accuracy of the result. Both results are then handled by CLM.
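The idea of two deliberately dissimilar recognition methods running in parallel can be sketched as follows. The two "methods" here (a fixed-threshold rule and a nearest-centroid classifier) and all thresholds and centroids are hypothetical stand-ins; a real deployment would use two dissimilar trained models.

```python
# Two diverse recognition methods applied to the same feature vector.
# Each returns a (label, score) pair that would be handed to CLM.

def rule_based(features):
    """Method A: flag as high-risk if any feature exceeds a fixed threshold."""
    score = max(features)
    return ("high-risk" if score > 0.8 else "safe", score)

def nearest_centroid(features):
    """Method B: assign the label of the nearer hypothetical class centroid."""
    centroids = {"safe": [0.2, 0.2], "high-risk": [0.9, 0.9]}
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    d = {label: dist(features, c) for label, c in centroids.items()}
    label = min(d, key=d.get)
    other = max(d, key=d.get)
    # Confidence: how much closer the point is to the winner than the loser.
    score = d[other] / (d[label] + d[other])
    return (label, score)

event = [0.95, 0.88]                      # preprocessed feature vector
results = [rule_based(event), nearest_centroid(event)]
```

Because the two methods rely on entirely different decision mechanisms, an adversary who evades one is less likely to evade both, which is the diversity argument made above.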

4.2.2 Manual intervention module (MIM)

In HLCSM, MIM plays the "auxiliary role". When the result of MDM is unsatisfactory, processing is handed over to MIM. After the security specialists receive the feedback, they handle the event according to their experiential knowledge. The final decision on whether the event is safe is given directly by the specialists and is no longer subject to intervention by MDM.

Because the types of network events are uncertain, the results of MIM processing should be fed back to MDM to further expand its processing capability. After giving the final result, the specialists also perform data calibration: through feature extraction, the new event type is added to the knowledge base, thereby extending MDM.
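This feedback loop can be sketched as a simple knowledge-base update. The dictionary structure, the event-type names, and the feature vectors below are all hypothetical illustrations of the calibration step, not the authors' implementation.

```python
# MIM feedback sketch: after a specialist labels an event that MDM could
# not classify, its extracted features are added to the knowledge base so
# that MDM can recognize similar events in the future.

knowledge_base = {
    # known event type -> example feature vectors (hypothetical)
    "port-scan": [[0.9, 0.1], [0.8, 0.2]],
}

def specialist_calibrate(event_features, specialist_label):
    """Record a specialist-labeled event under its (possibly new) type."""
    knowledge_base.setdefault(specialist_label, []).append(event_features)

# MDM failed on this event; a specialist judged it a new "dns-tunnel" type.
specialist_calibrate([0.3, 0.95], "dns-tunnel")
```

As the text notes, once the knowledge base grows large, this flat structure would need to be reorganized for fast search; that redesign is outside the model's scope.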

4.2.3 Confidence level module (CLM)

To connect MDM and MIM and enable the two modules to cooperate, CLM is introduced into the design. Its main function is to determine whether MIM needs to be called to complete the event processing. When the two recognition methods of MDM each produce a result, CLM integrates them and outputs a confidence level. When the confidence level is high, the final result is given directly by MDM, which saves manpower, reduces identification time, and meets the requirements of cyber event processing. When the confidence level is low, the event is forwarded to MIM and processed by specialists to minimize errors.

There are two cases of low confidence. First, MDM's two methods give completely different judgment results, so it cannot be determined whether the event is safe. Second, the two methods give the same result, but it does not reach the theoretical threshold under indexes such as accuracy; in that case, the credibility of the result is considered low, and it is not regarded as the final judgment.
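The two low-confidence cases just described can be captured in a short decision rule. This is a sketch under assumed conventions: the 0.9 threshold and the conservative `min` combination of the two scores are hypothetical choices, not specified by the model.

```python
# CLM decision sketch: escalate to MIM when (1) the two recognition
# methods disagree, or (2) they agree but the combined score is below
# a (hypothetical) threshold.

THRESHOLD = 0.9

def clm(result_a, result_b):
    """result_a / result_b are (label, score) pairs from the two methods.

    Returns ("MDM", label) when confidence is high, or ("MIM", None)
    when the event must be handed to the specialists."""
    (label_a, score_a), (label_b, score_b) = result_a, result_b
    if label_a != label_b:               # case 1: contradictory results
        return ("MIM", None)
    confidence = min(score_a, score_b)   # conservative combination
    if confidence < THRESHOLD:           # case 2: agreement, but low score
        return ("MIM", None)
    return ("MDM", label_a)              # high confidence: MDM decides
```

A usage example: `clm(("high-risk", 0.95), ("high-risk", 0.93))` finalizes the result at MDM, whereas any disagreement between the two methods escalates to MIM regardless of scores.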

4.3 Flow chart of HLCSM

Fig. 5 Flow chart of HLCSM

To better express the relationships among the parts of HLCSM, Fig. 5 shows the flow chart of the model. As the flow chart shows, follow-up work should be carried out after an event has been judged: if it is a high-risk event, quarantine and other measures should be taken; if it is a safe event, the operation ends. Together, these steps constitute a complete network-event defense model.
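The overall flow (detect, decide via confidence, optionally escalate, then follow up) can be summarized in one driver function. All components below are stubbed placeholders with hypothetical return values, intended only to make the control flow concrete.

```python
# End-to-end sketch of the HLCSM flow: MDM detection, confidence check,
# escalation to MIM when confidence is low, then follow-up action
# (quarantine for high-risk events, end for safe ones).

def mdm_detect(event):
    """Stub for MDM's integrated (label, confidence) result."""
    return "high-risk", 0.95

def mim_decide(event):
    """Stub for the specialist's final judgment."""
    return "safe"

def hlcsm_flow(event, threshold=0.9):
    label, confidence = mdm_detect(event)
    if confidence < threshold:          # low confidence: hand over to MIM
        label = mim_decide(event)
    # follow-up work after the judgment
    return "quarantine" if label == "high-risk" else "end"

action = hlcsm_flow({"src": "10.0.0.5"})
```

Raising the threshold shifts more events to the specialists; lowering it lets MDM decide more often, which is exactly the staffing-versus-reliability trade-off discussed below.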

HLCSM embodies the idea of human-in-the-loop, achieved through the cooperation between MDM and MIM as arbitrated by CLM. Despite the rapid development of AI, it can still be disturbed; AI should assist human beings rather than completely replace them. In this model, a large number of security specialists is not required, because the main work is done by MDM, which reduces staffing and expenditure. However, due to the particularities of the deployment environment, security specialists are still needed so that the model can more effectively resist rapid changes in the current network environment.

By expanding the knowledge base, MDM can handle more event types; this expansion requires the assistance of MIM. When the knowledge base reaches a certain scale, it will need to be redesigned to support rapid search. That redesign is not considered in this model.

Using two recognition methods solves, on the one hand, the unreliability of a single recognition method; on the other hand, it avoids overburdening the CLM module with too many recognition methods. Current recognition methods are complex and demand substantial local computing power, and running many recognition algorithms simultaneously would further load the machine. Although cloud computing and similar technologies can reduce this burden, for security reasons it is not recommended that the methods be trained in a non-local way.

To realize cooperation between human and machine, intermediate "translation" work is also essential. CLM takes on this responsibility and plays the role of a human-machine interface. The key to human-machine integration is realizing the complementary advantages of both, which is also the original intention of our model. For a model to achieve a high degree of coordination between human and machine, it needs full "communication". The confidence level serves as the bridge for this communication, an important link that plain human-machine interaction alone cannot provide; how this link is realized directly affects the mutual cooperation between the human and the machine.

Table 6 Comparison of models

4.4 Comparison

To express more intuitively the differences between the model in this paper and the general model, both are roughly evaluated on 10 aspects in Table 6:

  • Scalability: in MDM, two recognition methods are used to complement each other. Users can choose specific techniques for specific scenarios or expand the number of recognition methods. The knowledge base can also be expanded, giving the model a greater advantage in dealing with new security issues.

  • Maintainability: compared with the general model, HLCSM adds two new modules (MIM and CLM), which must cooperate with each other. In case of failure, maintenance is more difficult than in the general model.

  • Closedness: HLCSM incorporates the concept of a closed loop in its design, taking the high integration of human and machine as the final goal. After a result is produced, it must be further determined whether it can serve as the final result.

  • Integration: the general model divides the processing of cyber security issues into four modules, all of which are reflected in MDM. In addition, the three major modules of HLCSM are highly integrated.

  • Participatory: the main idea of HLCSM is human-in-the-loop, so security specialists can participate in decisions.

  • Interpretability: CLM is an important bridge between AI and security specialists; it allows intervention on low-confidence results and, to some extent, alleviates the poor interpretability of AI.

  • Reliability: results are given by two recognition methods and then processed by security specialists, so the model's reliability is significantly improved compared with the general model.

  • Complexity: the processing of the general model is entirely contained in MDM, which shows that HLCSM is more complex.

  • Security: both models provide security, but HLCSM further reduces interference from data forgery and other factors, so its security is improved.

  • Deployability: owing to HLCSM's larger system size, it is more difficult to deploy. Considering that backdoors can be implanted in AI models (Gu et al. 2017b), it is recommended to deploy the recognition methods separately, in the cloud and on-premises, to increase security.

To sum up, although HLCSM performs worse in maintainability and deployability, it has obvious advantages over the current general model in other respects. Therefore, HLCSM offers a new research direction for addressing the shortcomings of current uses of AI in cyber security.

5 Conclusion

As AI has significant potential in cyber security applications, it is important for the researcher and practitioner community to understand the current state-of-play and the associated challenges. Hence, this paper reviewed articles focusing on the applications of AI in four cyber security domains, namely: user access authentication, network situation awareness, dangerous behavior monitoring, and abnormal traffic identification.

In addition to identifying research challenges and opportunities, the paper posited the importance of human-in-the-loop, proposed a conceptual model, and explained how it can be utilized. A natural follow-up is to implement and evaluate the proposed conceptual model in collaboration with an organization.