1  Introduction

1.1 Background

Detailed investigations of cyber-attacks and threats reveal that attackers, and the organizations behind them, use common attack patterns to trick targets [1]. For this reason, security communities that develop defense strategies against cyber-attacks require intensive sharing and review of Cyber Threat Incident Reports (CTIR) to make these strategies more effective [2]. Because of the large size of CTIRs and the continuous addition of new attacks to the class of Advanced Persistent Threats (APT), it is almost impossible for traditional methods [1] to identify the characteristic signatures of the performed attacks [3]. Traditional methods for threat classification fall into two categories: machine-learning-based methods and lexicon-based methods [4]. The latter make calculations over lexicons to generate classification results [5]; however, the reliability of the results obtained with these methods depends largely on the quality and coverage of the threat reports [4]. Methods in the first category, on the other hand, rely on handcrafted feature engineering to capture statistical features, which are then fed to classifiers such as Support Vector Machines (SVM) to estimate threat characteristics [5]. These methods are difficult to apply in practice and often yield poor classification performance. To improve their performance, deep learning and machine learning methods can be combined to extract features from CTIRs automatically [6].

1.2 Review

The effectiveness and usefulness of machine learning methods in cyber threat intelligence have been demonstrated many times. Bayesian probabilistic machine learning is based on a joint probability function that graphically represents probability-based relationships between attack tactics, techniques, and procedures (TTPs). In addition, it reduces uncertainty and provides prediction reliability in threat intelligence reports [7]. The Bayesian method allows better extraction of threat features at low computational cost [8]. Despite these strengths, the performance of the Bayesian method still needs improvement, mainly in terms of the activation function, the loss function, and the parameters. To improve processing time and prediction accuracy, optimized algorithms can be used in threat characteristic classification [7]. Machine-learning models have improved processing time and classification accuracy with the help of various algorithms and techniques for threat classification [5]. For example, the prediction accuracy in [2] reaches 92%, while the processing time averages 0.043 s, which is better than other models that use conventional clustering methods [7]. In addition, the Naive Bayes algorithm reduces keyword search problems and performs better than other models [2]. However, this algorithm assumes that all predictive threats are independent of each other [7]. This slows down the prediction stage, degrades processing-time performance, and makes it difficult to understand associated attacks.

1.3 Aim

The purpose of this paper is to improve the prediction accuracy and the processing time of security mechanisms against cyber-attacks. This is done by combining two functions: a modified version of the Naïve Bayes posterior probability function and a modified risk assessment function. The proposed modified Bayesian probabilistic graphical function [7] is used to overcome the independent TTP detection problem of the current support function algorithm. Moreover, it represents the probability-based relationships between TTPs, reduces uncertainty, and provides prediction reliability [7]. For the modified risk assessment function, this paper applies the risk assessment framework provided in [3] to the TTP classifications. This framework [3] provides dynamic risk management by focusing on behavioural detection of complex TTPs. By combining these modified functions, the proposed solution addresses the problems of the existing solutions and increases classification and prediction accuracy.

1.4 Paper Structure

The rest of this paper is organized as follows: Sect. 2 reviews the literature on current solutions for cyber threat classification using machine-learning methods. The proposed solution is discussed in Sect. 3. Section 4 discusses the experiments and results of the proposed solution. Finally, Sect. 5 concludes the paper and outlines future work. Table 1 lists the abbreviations used in this paper.

Table 1 Abbreviations used in this paper

2 Literature Review

In this section, some related papers from the literature are summarized to give a better understanding of the studied problem, methods, and techniques.

2.1 New Emerging Techniques, Tactics and Procedures in Cyber Threat Intelligence Reports

The solution proposed in [9] highlights the benefits of using high-level indicators of compromise (IoCs) in attributing cyber threats and provides a machine-learning model that achieves effective accuracy in extracting high-level IoCs from unstructured cyber threat intelligence (CTI) reports. The solution enhances an automated cyber threat attribution framework (FinTech) to minimize unstructured-report errors in machine learning. The authors addressed the problem by using a distributional semantics technique and improved indexing of CTI reports. They conducted their research by integrating natural language processing into machine learning models, and they profiled cyber attackers and attack patterns with the FinTech algorithm. Their solution provides 97% accuracy in extracting high-level IoCs from unstructured cyber-threat intelligence documents. In the FinTech algorithm, semantic matching maps query terms to terms in text corpora [1], but synonyms and polysemous words cause false matches due to inaccurate concept matching [7]. This negatively affects latent semantic analysis when obtaining and connecting high-level attack processes [7]. As a result, the accuracy in terms of cyber threat actors is acceptable, but the false-matching error needs to be addressed to resolve the issue of matching with latent semantic queries.

Kim and Kim [10] enhanced a cyber threat intelligence dataset generated from cyber intelligence reports using CTIMiner, an automated dataset generation system. They aimed to increase the quality of data that can be used in cyber threat intelligence analysis techniques. Their solution uses a data extraction method enhanced by a malware analysis platform, which provides an additional valuable collection of detailed threat data and reveals characteristics of the attackers performing cyber-attacks. The research was conducted by running the CTIMiner system on 612 public reports and categorizing the different types of collected data, yielding a 43% increase in valuable data. In addition, this solution supplies a high-quality, structured dataset obtained from open sources, providing statistical features suitable for CTI analysis [11]. However, the quality of the results obtained during the IoC parsing and extraction stage [12] is critically affected by the parser's performance. Owing to the parsers' functional limitations, some IoCs may remain unextracted from the report database, which affects the threat analysis results and the prediction accuracy [12]. As a result, the accuracy in terms of valuable threat data is acceptable, but parser performance errors need to be addressed to resolve the extraction issue for attack patterns [5].

Subroto and Apriyana [13] proposed a statistical machine learning algorithmic model to minimize errors from unstructured and constantly changing data in cyber-threat intelligence reports. In their proposed solution, they used CVE-details, R software, and the dendrite/pyramid functions of the Plotrix package to improve the learning confusion matrix. They conducted their research by integrating a term-document matrix [14] into CVE databases to automate the statistical machine learning algorithm. Their work provides 96.73% prediction accuracy with an artificial neural network (ANN) algorithm that analyses vulnerability patterns. This solution reaches a good range of prediction accuracy with better positive predictions, providing a high true-positive rate [15] in the analysis of vulnerability patterns. In addition, when analysing cyber-threat big data, machine-learning algorithms are used to organise and clean the data, making the analysis of vulnerability patterns more accurate. However, false-negative errors are not considered [15] in this solution, and the lack of a risk management algorithm leads to short-term information delays [14] during cyber risk analysis in the CVE database, creating an environment for false-negative errors [15]. As a result, the prediction accuracy in terms of threat patterns is acceptable, but a risk management algorithm in which false-negative errors are defined needs to be considered.

Improving risk calculation accuracy in the security management process, so as to minimize security events that become incidents during cyber-threat intelligence risk assessment, is the purpose of the study by Riesco et al. [3]. Their solution minimises emerging-threat errors (also called unknown-threat errors) that occur during the CTI risk management process. The authors enhanced risk management frameworks using a dynamic risk assessment and management (DRA/DRM) algorithm to minimise emerging-threat errors. Web Ontology Language (OWL) and Semantic Web Rule Language (SWRL) are used to improve operational-level triggers. The authors conducted their research by integrating a value-added semantic algorithm format into DRM for further DRA/DRM implementation. This provides 65% risk assessment accuracy for security events. The developed dynamic risk-management framework is compatible with widely used management standards and risk assessments [16]. It also provides the degree of detail and effectiveness required of risk management frameworks and an acceptable range of risk assessment accuracy with tactical and strategic levels of risk relationships, supplying near-real-time dynamic risk assessment [16]. However, not all cyber threats encountered in the virtual environment were included in the risk calculation algorithm, which causes incorrect detection in real-time responses [8]. As a result, the prediction accuracy in terms of near-real-time severity gives good results, but the missing-threat error needs to be considered to resolve the issue of detection in real-time responses.

2.2 Modelling Attacker Activities Based on Close Attacks

Durkota et al. [17] developed an intelligent and rational security framework based on a mathematical (game-theoretic) algorithm to reduce decision-making errors in cyber threat prediction. Their work improves the accuracy of computing optimal defense strategies against complex cyber-attacks in real computer networks (multiple decision-making environments). The offered solution uses Stackelberg equilibrium (SE), a MILP formulation, and a Markov decision process (MDP) to improve the computation of optimal attack policies. The authors conducted their research by integrating attack graphs into the MDP algorithm to compute the optimal attacker policy. This gives good results in terms of decision-making accuracy; the provided framework successfully found the optimal strategy in 88% of cases. The proposed solution improves the accuracy of the decision-making process even when the attack complexity is high. With the provided algorithm, a method that can be computed quickly has been developed, and the attack prediction rates can be high [18]. However, the algorithm needs a large amount of processed data [4] (attacker motivations, attack success percentages, etc.), which increases the margin of error in the sensitivity of decision-making strategy development [18]. As a result, the provided solution gives high action-success probability accuracy, but the sensitivity error needs to be considered to resolve the strategy-development issue.

Noor et al. [2] developed a novel machine-learning-based framework to minimize cyber-threat prediction errors in cyber threat intelligence. They addressed the problem by using Latent Semantic Indexing (LSI) [1], which improved correlated attack detection. Their work improves the threat analysis process in attack prediction mechanisms, helping to identify TTPs based on observed artefacts with the help of appropriate machine learning algorithms. The proposed solution ranks threat incident reports and adversarial tactics, techniques, and common knowledge (ATT&CK) repositories based on historical data that measures maximal detection [1]. It provides an attack pattern prediction accuracy of 92% and a detection time of 0.04 s. This solution supplies high prediction accuracy and quite low detection time compared to the considerable time it typically takes to predict data breach incidents. This improvement supports the threat analysis process through the SIRS system with an effective security analysis mechanism against attacks [5]. However, the algorithm assumes that all predictive TTPs are independent of each other [7], which slows down the prediction stage and makes it difficult to understand associated attacks. The threat support function of the algorithm, which measures the maximal support of the detected TTPs towards a threat occurrence, tends to set all predictor TTPs as independent when the function value approaches 1 or 0 [7]. This affects the model's ability to recognise attacks and reduces the overall threat prediction reliability. As a result, the prediction accuracy in terms of attack prediction is acceptable, but TTP independence needs to be considered to resolve the issue of detecting unknown attacks.

Sun et al. [7] proposed a solution that enhances a machine learning method based on the Hawkes process. The solution models attacker activities with a latent distance model to effectively identify the activity patterns and structure of cyber-attacks. Their work uses only temporal information, without the need for complicated feature engineering, and it filters out dissimilar attacker patterns across clusters. Since the graphical clustering algorithm used in the developed model does not require prior knowledge of the number of clusters or the cluster sizes [2], the method generalizes well. This solution provides acceptable predictive log-likelihood and effectively models and clusters attacker activity using machine learning. The study integrates a Bayesian probabilistic graphical model (BPGM) and a quality-based clustering algorithm in machine learning. It achieves the lowest sparsity obtained by the network prior (−9.5) and effectively detects connections between attackers across a large number of events. However, the Gibbs sampling algorithm used in the BPGM is a time-consuming inference method [19]. Failure to detect the order of attack (the attack pattern) on time emerges as an important security problem for this solution [2]. As a result, the predictive log-likelihood in terms of attacker activity is acceptable, but cluster size needs to be considered to resolve the algorithm's performance issues.

2.3 Advanced Malware Prediction with Regression Models

Husak et al. [6] enhanced attack prediction using attack projection and intention recognition algorithms to minimize prediction mistakes in intrusion detection systems (IDS). In their study, the authors used artificial neural network and support vector machine (SVM) machine learning algorithms to improve attack prediction. They conducted their research by integrating data mining and neural networks into the intrusion detection system to reduce the complexity and learning time of the prediction algorithm [13]. Depending on the length of the applied attack scenario, the proposed work provides a 92.3–99.2% accuracy rate. This solution offers a good range of accuracy with minimal time delay, enabling prediction of even very specific attacks [12]. Attack prediction accuracy increased because data mining was added to the machine-learning algorithm for learning and generating attack models or attack plans [20]. However, the loss function, which causes small changes at the beginning of attack prediction in the SVM algorithm, slows down the prediction of attacks that use different models [20]. As a result, the accuracy in terms of the frequency of mistakes is acceptable, but automated attack-plan library generation needs to be considered to resolve the issue of prediction changes.

Lee et al. [5] aimed to improve the capacity of deep-learning-based methods to transform collected security incidents into individual activities in order to prevent advanced cyber threats. To minimize false-positive errors, they developed an artificial intelligence security information and event management cyber-threat detection technique (AI-SIEM). Their solution uses large-scale event profiles and deep learning detection methods to enhance accuracy. The integration of a term frequency-inverse document frequency (TF-IDF) indexing mechanism for very large-scale data in the AI-SIEM algorithm improves true-positive accuracy. The best overall accuracy was delivered by the proposed event-profile artificial neural network (EP-ANN) models, with accuracy scores of 0.93–0.99 across four experimental datasets in cyber-threat intelligence analysis. This work provides an acceptable improvement in true-positive accuracy with rapid response time, supplying cyber-threat detection ability in large-scale cyber security environments. The AI-SIEM system quickly and effectively compares long-term security analyses [21] and highlights important security alerts, thereby reducing false-positive alerts [6]. The AI-SIEM algorithm yields very good results on benchmark datasets, but accuracy inconsistencies are observed when the system is applied in the EP-ANN algorithm [21]. As a result, the true-positive and false-positive accuracy is acceptable, but a rare-attack data learning algorithm needs to be considered to resolve the issue of applying the system in the EP-ANN algorithm.

Bahtiyar et al. [20] enhanced advanced malware prediction with a multi-dimensional machine learning technique to reduce malware detection errors. The proposed solution uses linear, polynomial, and random forest regression models [22] to improve the correlation value. The study integrates regression algorithms over correlations among features and obtains an extracted correlation value of 0.8203 between advanced malware features and a closeness rate of 0.558 to advanced malware. The study provides improved prediction accuracy and efficiency through the extracted closeness score and correlation value, supplying more certain identification in advanced malware prediction. The machine learning approach uses correlations among five features and four regression algorithms to predict advanced malware [4]. In this study, random forest regression with four features yielded better results in the analysis, but an acceptable threshold value was not achieved for the precise definition of advanced malware [8]. This means that dependencies among advanced malware features are not taken into account [4]. Therefore, machine-learning datasets containing newly discovered advanced malware samples should be added to the multi-dimensional algorithm. Moreover, the threshold value should be entered into the system as a fixed value rather than a random value [8]. As a result, the accuracy in terms of malware features is acceptable, but a fixed threshold value needs to be considered to resolve this issue in the precise definition of advanced malware.

2.4 State of the Art

In this part, the system's features, which are highlighted inside the blue broken line in Fig. 1, and its limitations, which are highlighted inside the red broken line in Fig. 1, are presented. Noor et al. [2] proposed an enhanced novel machine-learning-based framework algorithm to minimize cyber-threat prediction errors. The use of Latent Semantic Indexing (LSI) improved correlated attack detection. The study ranks cyber threat incident reports (CTIR) and adversarial tactics, techniques, and common knowledge (ATT&CK) repositories based on historical data in order to measure maximal detection with the novel machine-learning-based framework [2]. It provides an attack pattern prediction accuracy of 92–100% and a detection time of 0.04 s. The model consists of the three stages shown in Fig. 1: (1) the semantic indexer and retrieval system stage, (2) the TTD semantic network stage, and (3) the cyber threat prediction stage.

Fig. 1

Block diagram of the state-of-the-art system [2]. Good features of the state of the art are shown inside blue broken lines and its limitations inside red broken lines

  • Stage 1 Semantic Indexer and Retrieval System (SIRS)

Cyber threat incident reports and ATT&CK documentation are the inputs of the system. While a CTIR corresponds to a single cyber threat, an ATT&CK document may correspond to several detection mechanisms associated with a TTP. To build the threat TTP detection (TTD) network, TTPs are extracted from cyber-threat incident reports encoded in the structured threat information expression (STIX) format, after which a TTP dictionary is built [2]. The second step of this stage is to semantically correlate every single TTP with malware attacks in the CTIR and with TTPs in ATT&CK documents. To connect TTPs with detection mechanisms, instead of a simple keyword search, a ranking process is applied with the help of LSI to the CTIR and ATT&CK documents for each TTP present in the TTD [2].
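As an illustration only, the following minimal Python sketch shows how such LSI-based ranking can work with scikit-learn; the corpus, query, and component count are invented placeholders, not the actual pipeline of [2].

```python
# Minimal sketch: ranking CTIR/ATT&CK documents against a TTP query with LSI.
# The corpus and query below are illustrative placeholders only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "spearphishing attachment delivers macro dropper",    # CTIR excerpt
    "credential dumping via lsass memory access",         # ATT&CK excerpt
    "phishing email with malicious attachment observed",  # CTIR excerpt
]
query = ["spearphishing attachment"]  # TTP to rank the documents against

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)

# LSI: project the term-document matrix into a low-rank latent semantic space,
# so near-synonymous terms (phishing / spearphishing) can still match.
lsi = TruncatedSVD(n_components=2, random_state=0)
X_lsi = lsi.fit_transform(X)
q_lsi = lsi.transform(tfidf.transform(query))

scores = cosine_similarity(q_lsi, X_lsi)[0]
ranking = sorted(enumerate(scores), key=lambda p: -p[1])
print(ranking)  # document indices ordered by semantic relevance to the TTP
```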

  • Stage 2 TTD Semantic Network Stage

In this stage, threats are linked to their TTPs and detection mechanisms. To represent the semantic relations of TTPs under three specific concepts (the detection mechanism set, the threat set, and the TTP set for cyber threat reports), the ranked cyber threat incident reports and ATT&CK threats are linked to their relative TTPs and detection mechanisms [2]. Then, to predict threats based on the existence of determined artefacts, a network of probable events is trained between threats and TTPs on historical data to measure the maximal support of detected TTPs towards a threat occurrence. Limitation Experimental results illustrate that the state-of-the-art solution achieves an average attack pattern prediction accuracy of 92% even in the worst case, where TTPs overlap highly, and in the ideal situation the threat prediction accuracy reaches 100%. However, in this model, the threat support function in the threat-TTP-detection semantic network algorithm assumes that all predictive TTPs are independent of each other [7]. This slows down the prediction stage [7], making it difficult to understand the associated cyber-attacks [5]. Limitation Justification The threat support function of the algorithm, which measures the maximal support of detected TTPs towards a threat occurrence, tends to set all predictor TTPs as independent when the function value approaches 1 or 0. This affects the model's ability to recognize attacks and reduces the overall threat prediction reliability [5].

  • Stage 3 Cyber Threat Prediction Stage

The first step of this stage is the threat investigation (TD) module, whose responsibility is to produce a predicted threat based on the detected TTPs in order to predict a set of threats. The next step is reliability assessment (RA), in which the reliability of the prediction is measured. In case of high reliability, the threat investigation is considered complete; otherwise, a set of existing TTPs is considered by detection mechanism selection. The RA step thus reduces time and resource consumption by minimizing the likelihood of incorrect predictions caused by low-reliability prediction and by determining the presence of TTPs [2]. The last step of this stage is detection mechanism selection (DMS). This step helps the cyber-security analyst investigate threat artefacts against the most likely attack family by recommending the most efficient and cost-effective detection mechanism; a set of existing TTPs linked with detection mechanisms is calculated based on cost efficiency. If the reliability grows sufficiently, the prediction is terminated; otherwise, the TD receives the set of predicted TTPs [2]. Limitation There is a problem in the prediction sets that give the highest probability-of-occurrence values for the classification of detected TTPs: it is unclear which malware instances should be included in the prediction sets [3]. Limitation Justification To construct prediction sets correctly, it is necessary to determine a threshold for reaching more reliable threat prediction values. Expert opinion is used to determine this threshold, which could be inaccurate and compromise the reliability of the threat prediction [2].
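As a rough illustration of the TD, RA, and DMS loop described above, the following Python sketch wires the three steps together; all stub functions and the reliability threshold are hypothetical placeholders, not the actual modules of [2].

```python
# Hypothetical sketch of the Stage 3 control flow (TD -> RA -> DMS).
def predict_threats(ttps):
    # TD stub: map detected TTPs to a candidate threat set.
    return {"t1"} if "ttp1" in ttps else {"t2"}

def reliability(threats, ttps):
    # RA stub: reliability grows with the number of supporting TTPs.
    return min(1.0, len(ttps) / 3)

def select_detection_mechanism(threats):
    # DMS stub: recommend the cheapest mechanism and the TTPs it can confirm.
    return {"ttp2", "ttp3"}

def cyber_threat_prediction(detected_ttps, threshold=0.9):
    """Iterate TD -> RA -> DMS until the prediction is reliable enough."""
    while True:
        predicted = predict_threats(detected_ttps)       # threat investigation
        if reliability(predicted, detected_ttps) >= threshold:
            return predicted                             # reliable: terminate
        detected_ttps |= select_detection_mechanism(predicted)  # confirm TTPs

print(cyber_threat_prediction({"ttp1"}))  # {'t1'} after confirming extra TTPs
```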

This model achieved an attack pattern prediction accuracy of 92–100%, a prediction reliability of 50–60% in terms of the number of detected TTPs, and an average threat incident prediction time of 0.04 s for the detected TTPs [2].

To link each threat in the dependency table to its respective TTPs and detection mechanism, a normalized table is used, and the normalized probability, or normalized conditional probability, is calculated. The normalized posterior probability defines the objective function, which is computed with the Naïve Bayes technique as shown in Eq. (1) [2]. However, this slows down the prediction stage, making it difficult to understand the associated attacks, and degrades the prediction reliability.

$$\mu\left(t_{i}\mid ttp_{i}\right)=\frac{\omega\left(ttp_{i}\mid t_{i}\right)\,p\left(t_{i}\right)}{\sum_{t_{i}\in T_{ttp_{i}}}\omega\left(ttp_{i}\mid t_{i}\right)\,p\left(t_{i}\right)}$$
(1)

where µ(ti|ttpi): normalized posterior probability using Naïve Bayes; ω(ttpi|ti): normalized conditional probability; p(ti): prior class probability; ti: threat in the dependency table built between threat incidents and TTPs; Tttpi: threat set of the detected TTP; p(ttpi|ti): conditional probability between TTPs and threats; ttpi|ti: event ttpi given event ti.

The threat set associated with the detected TTPs is built from the dependency table, and the historical threat occurrences, threat likelihoods, and threat set prior probabilities are obtained to build the Bayesian probabilistic graphical model for threat probability estimation. The normalized conditional probability is calculated using Eq. (2) [2].

$$\omega\left(ttp_{i}\mid t_{i}\right)=\frac{p\left(ttp_{i}\mid t_{i}\right)}{\sum_{t_{i}\in T_{ttp_{i}}}p\left(ttp_{i}\mid t_{i}\right)}$$
(2)

where ω(ttpi|ti): normalized conditional probability; p(ttpi|ti): conditional probability between TTPs and threats; ttpi: detected TTP; ti: threat in the dependency table; ttpi|ti: event ttpi given event ti; Tttpi: threat set of the detected TTP.

A belief network is implemented in the TTD semantic network stage. A network of probable events is trained between threats and TTPs on historical data to measure the maximal support of the detected TTPs towards a threat occurrence. To quantify this maximal support, the threat support function is calculated as shown in Eq. (3) [2]. However, accuracy and prediction reliability can be increased with better alignment techniques.

$$S\left(t_{i}\right)=\frac{\sum_{ttp_{i}\in TTPD_{i}}\mu\left(t_{i}\mid ttp_{i}\right)}{\sum_{ttp_{i}\in TTP_{t_{i}}}\mu\left(t_{i}\mid ttp_{i}\right)}$$
(3)

where S(ti): threat support function; TTPDi: set of TTPs detected due to threat ti; TTPti: set of TTPs associated with threat ti; μ(ti|ttpi): normalized posterior probability using Naïve Bayes; ti|ttpi: event ti given event ttpi; ti: threat in the dependency table; ttpi: detected TTP (Table 2).
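The following small numerical sketch, written in Python, works through Eqs. (1)–(3) end to end; the conditional probabilities and priors are made-up values chosen only to make the computation concrete.

```python
# Numerical sketch of Eqs. (1)-(3) with illustrative probabilities; the
# conditional probabilities and priors below are invented, not data from [2].
p_ttp_given_t = {("ttp1", "t1"): 0.6, ("ttp1", "t2"): 0.3}  # p(ttp_i | t_i)
p_t = {"t1": 0.5, "t2": 0.5}                                # prior p(t_i)
T_ttp = ["t1", "t2"]                                        # threats sharing ttp1

def omega(ttp, t):
    """Eq. (2): normalized conditional probability."""
    denom = sum(p_ttp_given_t[(ttp, tj)] for tj in T_ttp)
    return p_ttp_given_t[(ttp, t)] / denom

def mu(t, ttp):
    """Eq. (1): normalized Naive Bayes posterior probability."""
    denom = sum(omega(ttp, tj) * p_t[tj] for tj in T_ttp)
    return omega(ttp, t) * p_t[t] / denom

def support(t, detected_ttps, associated_ttps):
    """Eq. (3): maximal support of the detected TTPs towards threat t."""
    num = sum(mu(t, ttp) for ttp in detected_ttps)
    den = sum(mu(t, ttp) for ttp in associated_ttps)
    return num / den

print(support("t1", ["ttp1"], ["ttp1"]))  # 1.0 when all associated TTPs detected
```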

Table 2 Belief network algorithm

3 Proposed System

In the cyber threat intelligence environment, machine learning algorithms that use different feature extraction and classification techniques have been analysed in detail, and the pros and cons of each method have been determined. The analysis shows that accuracy, reliability, detection time, and false detection are the key factors that affect a threat prediction neural network algorithm. Among the collected and analysed methods, Noor et al. [2] was selected as the state of the art for the proposed solution in this paper. The main reason for this selection was its novel machine-learning-based framework for cyber-attack prediction. This technique semantically extracts threats and attack tactics, techniques, and procedures from known threat sources to create a semantic network [2]. The semantic network establishes probability relationships between threats and TTPs using Naïve Bayes machine learning to identify and predict threats. The Naïve Bayes computation normalises the conditional probability of the threat-TTP mapping and thereby finds the best candidate threat prediction set. In addition, to enhance prediction accuracy, the novel machine-learning-based technique is combined with a belief network model [2]. However, this work has several limitations. One limitation is that the threat support function of the algorithm, which measures the maximal support of detected TTPs towards a threat occurrence, tends to set all predictor TTPs as independent when the function value approaches 0 or 1. This affects the model's ability to recognise attacks, reduces the overall threat prediction reliability, and slows down the prediction stage, making it difficult to understand associated cyber-attacks. Another limitation concerns the prediction sets that give the highest probability-of-occurrence values for the classification of detected TTPs: it is unclear which malware instances should be included in these prediction sets. To overcome the independent TTP detection problem of the support function algorithm, a Bayesian probabilistic graphical model based on a joint probability function, inspired by Sun et al. [7], is used. This model graphically represents the probability-based relationships between TTPs, reduces uncertainty, and provides prediction reliability. Another new feature of the proposed solution is the application of the risk assessment framework proposed by Riesco et al. [3] to TTP classifications. This framework provides dynamic risk management by focusing on behavioural detection of complex TTPs. The application of these new features solves the problems of the existing solution, increases classification and prediction accuracy, and reduces processing time.

The proposed system consists of three major stages (Fig. 2): (1) the semantic indexer and retrieval system (SIRS), (2) the TTD semantic network, and (3) cyber threat prediction.

Fig. 2

Block diagram of the proposed threat prediction method using the Bayesian probabilistic graphical algorithm. Green borders refer to the new parts in the proposed system

  • Stage 1 Semantic Indexer and Retrieval System (SIRS)

This stage follows the architecture of the Noor et al. [2] solution, where threat incident reports and ATT&CK documentation are the inputs of the system (see Fig. 2). While a CTIR corresponds to a single cyber threat, an ATT&CK document may correspond to several detection mechanisms associated with a TTP. As shown in Fig. 2, to build the threat TTP detection network, TTPs are extracted from cyber threat incident reports encoded in the structured threat information expression format, after which the TTP dictionary is built. Instead of a simple keyword search, a ranking process is applied with the help of LSI to the CTIR and ATT&CK documents for each TTP present in the TTD. Therefore, every single TTP is semantically correlated with malware attacks in the CTIR and with TTPs in ATT&CK documents, and TTPs are connected with their detection mechanisms (see Fig. 2).

  • Stage 2 TTD Semantic Network

In the second stage of the proposed model, as shown in Fig. 2, the ranked cyber threat incident reports and ATT&CK threats are linked to their relative TTPs in order to represent the semantic relations of TTPs under three specific concepts (the detection mechanism set, the threat set, and the TTP set for cyber threat reports). After that, to predict threats based on the existence of determined artefacts, a Bayesian probabilistic graphical model is used to identify associated attacks [7]. Historical threat occurrences, threat likelihoods, and threat set probabilities are obtained to build the Bayesian probabilistic graphical model [7] (see Fig. 2). In addition, the joint-distribution threat posterior probability is calculated to build a dependency table between TTPs and the threat set. Therefore, the algorithm's threat probability estimation and normalised conditional probability are improved. Compared to the threat support function, the graphical model is simpler and solves the dependency problem of TTPs. Next, a network of probable events is trained between threats and TTPs on historical data to measure the maximal support of the detected TTPs towards a threat occurrence.

  • Stage 3 Cyber Threat Prediction

The responsibility of the first step is to produce a predicted threat based on the detected TTPs in order to predict a set of threats. Next, the reliability of the prediction is measured. In case of high reliability, the threat investigation is completed; otherwise, a set of existing TTPs is considered by detection mechanism selection (see Fig. 2). This step therefore reduces time and resource consumption by minimizing the likelihood of incorrect predictions caused by low-reliability prediction and by determining the presence of TTPs. At this stage, the proposed work applies a dynamic risk management framework (see Fig. 2). This framework assesses risk using the threat impact and the decreasing values of probability due to the implemented and newly proposed measures [3]. Therefore, all threat sets of the detected TTPs can be considered when assessing the maximal support of a set of detected TTPs towards the dependency table [3]. As a result, the threat set with the maximum posterior probability can be considered the predicted threat set. This framework solves the prediction-set threshold problem and helps the algorithm reach a more reliable threat prediction level. To help investigate threat artefacts against the most likely attack family by recommending an efficient detection mechanism, a set of existing TTPs linked with detection mechanisms is calculated (see Fig. 2). If the reliability grows sufficiently, the prediction is terminated; otherwise, the threat diagnosis receives a set of predicted TTPs.

3.1 Proposed Equation

Identifying prediction sets that induce the observed data and capturing the distributions that characterize relationships between hidden states and hidden variables are critical to threat prediction. A Bayesian probabilistic graphical model based on the joint distribution is used to calculate the posterior probability, avoiding the problem of which malware instances should be included in the prediction sets. It increases probability accuracy, thanks to the consideration of associated threats, compared to the posterior Naïve Bayes probability based on the normalised conditional probability. The joint distribution is defined as in Eq. (4) [7].

$$p\left(t_{i},T_{ttp_{i}}\mid ttp_{i}\right)=p\left(ttp_{i}\mid t_{i},T_{ttp_{i}}\right)\,p\left(t_{i}\mid T_{ttp_{i}}\right)\,p\left(T_{ttp_{i}}\right)$$
(4)

where p(ti, Tttpi|ttpi): joint distribution; ti: threat in the dependency table between threat incidents and TTPs; ttpi: detected TTP; Tttpi: threat set of the detected TTP; p(ttpi|ti, Tttpi): likelihood of the detected TTP given the threat and the threat set; p(ti|Tttpi): likelihood of ti given Tttpi; p(Tttpi): prior probability of Tttpi.

Historical artifacts, which show the presence of a cyber-attack, are used to calculate the probability between threats and TTPs. To configure this probability, historical data making up frequency tables are used for the TTP-threat mapping. Accordingly, the frequency table must be normalised to avoid null values; a sketch of one possible normalization is given after Eq. (5). The history probability of a threat for the detected threat set, p(ti|Tttpi), is used to find the threat associated with a certain threat set and the detected TTPs [7]. Therefore, we modify Eq. (4) into Eq. (5).

$$Mp\left(t_{i}\right)=p\left(t_{i}\mid T_{ttp_{i}}\right)\,p\left(T_{ttp_{i}}\right)$$
(5)

where Mp(ti): modified prior class probability; ti: threat in the dependency table; ttpi: detected TTP; Tttpi: threat set of the detected TTP; p(ti|Tttpi): history likelihood for the threat set; p(Tttpi): prior probability of Tttpi.
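As a sketch of how the frequency table could be normalised to avoid null values, the snippet below applies add-one (Laplace) smoothing; the smoothing scheme and the toy history are assumptions, since the exact normalisation is not specified here.

```python
# Sketch of building a normalized TTP-threat frequency table from historical
# artifacts. Add-one (Laplace) smoothing is assumed so no entry is null.
from collections import Counter

history = [("ttp1", "t1"), ("ttp1", "t1"), ("ttp2", "t1"), ("ttp1", "t2")]
counts = Counter(history)
ttps = {ttp for ttp, _ in history}

def p_ttp_given_t(ttp, t):
    """p(ttp_i | t_i) from smoothed frequencies (no zero entries)."""
    total = sum(counts[(x, t)] for x in ttps)
    return (counts[(ttp, t)] + 1) / (total + len(ttps))

print(p_ttp_given_t("ttp2", "t2"))  # nonzero despite no (ttp2, t2) observation
```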

For prediction, the malware-class TTP with the highest posterior probability is considered the predicted output. Similarly, among all TTPs, the TTD network estimates the threat event and the host and network artifacts using symptoms to calculate the highest probability. On this basis, the enhanced equation uses the prior threat class probability to obtain the threat posterior probability. Therefore, we modify Eq. (1) into Eq. (6).

$$M\mu\left(t_{i},T_{ttp_{i}}\mid ttp_{i}\right)=\left(\frac{\omega\left(ttp_{i}\mid t_{i}\right)}{\sum_{t_{i}\in T_{ttp_{i}}}\omega\left(ttp_{i}\mid t_{i}\right)}\right)Mp\left(t_{i}\right)$$
(6)

where Mµ(ti, Tttpi|ttpi): modified version of the Naïve Bayes posterior probability; ti: threat in the dependency table built between threat incidents and TTPs; ttpi: detected TTP; Tttpi: threat set of the detected TTP; ω(ttpi|ti): normalized conditional probability.

The risk assessment approach, used to identify the most relevant threats in the threat set, increases the accuracy of the probability function and reduces the time needed for threat prediction. The posterior-probability threat risk assessment is performed using the threat impact, the decreasing values of probability, and the newly proposed measures, as given in Eq. (7) [3].

$$R_{t_{i}}=P_{t_{i}}+I_{t_{i}}-C1_{t_{i}}-C2_{t_{i}}$$
(7)

where Rti: risk of threat ti; Pti: probability of threat ti; Iti: impact of threat ti; C1ti: decrease in probability Pti due to implemented measures; C2ti: decrease in probability Pti due to newly proposed measures.

To consider the threat probability after risk assessment and to assess the maximum support of a set of detected TTPs (depending on security events and time), the threat support function is used. Since risk assessments may be updated dynamically, risk management treatments and classifications may also be updated automatically. Therefore, we modify Eq. (7) into Eq. (8).

$$MR_{t_{i}}=I_{t_{i}}-C1_{t_{i}}-C2_{t_{i}}$$
(8)

where MRti: modified residual risk assessment; Iti: impact of threat ti once it has materialized; C1ti: decrease in probability Pti due to implemented measures; C2ti: decrease in probability Pti due to newly modified measures.

The threat support function defines the best candidate threat prediction set (the prediction set that gives the highest probability-of-occurrence values for the classification of detected TTPs) with the maximum probability value [21]. Therefore, we enhance Eq. (3) to propose Eq. (9).

$$ES\left(t_{i}\right)=\frac{\sum_{ttp_{i}\in TTPD_{i}}M\mu\left(t_{i},T_{ttp_{i}}\mid ttp_{i}\right)}{\sum_{ttp_{i}\in TTP_{t_{i}}}M\mu\left(t_{i},T_{ttp_{i}}\mid ttp_{i}\right)}+MR_{t_{i}}$$
(9)

where ES(ti): enhanced threat support function; TTPDi: set of TTPs detected due to threat ti, {ttp1, ttp2, ttp3, …, ttpn}; ti: detected threat, ti ∈ {t1, t2, t3, …, tn}, in the detected threat set Tttpi; TTPti: set of TTPs associated with threat ti; Mµ(ti, Tttpi|ttpi): modified version of the Naïve Bayes posterior probability; MRti: modified residual risk assessment.
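The following Python sketch composes the proposed Eqs. (5), (6), (8), and (9) end to end; all probabilities, impacts, and decrement values are illustrative placeholders rather than experimental data.

```python
# Sketch of Eqs. (5), (6), (8), and (9) composed end to end; the numeric
# inputs below are invented values chosen only to make the flow concrete.
def Mp(p_t_given_Tset, p_Tset):
    """Eq. (5): modified prior class probability."""
    return p_t_given_Tset * p_Tset

def M_mu(omega_ttp_t, omega_sum, mp):
    """Eq. (6): modified Naive Bayes posterior probability."""
    return (omega_ttp_t / omega_sum) * mp

def MR(impact, c1, c2):
    """Eq. (8): modified residual risk after implemented and new measures."""
    return impact - c1 - c2

def ES(mmu_detected, mmu_associated, mr):
    """Eq. (9): enhanced threat support function with residual risk term."""
    return sum(mmu_detected) / sum(mmu_associated) + mr

mp = Mp(p_t_given_Tset=0.7, p_Tset=0.4)            # history-based prior, Eq. (5)
mmu = M_mu(omega_ttp_t=0.6, omega_sum=0.9, mp=mp)  # posterior for one TTP, Eq. (6)
mr = MR(impact=0.3, c1=0.1, c2=0.05)               # residual risk, Eq. (8)
print(ES([mmu], [mmu], mr))                        # 1.0 + 0.15 when fully detected
```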

3.2 Area of Improvement

In this solution, two equations were proposed, and the performance of the current method was improved. First, the threat support function (Eq. (3)) is modified to calculate the best candidate threat prediction set with the maximum probability value, as shown in Eq. (9). It uses the Bayesian probabilistic graphical model, based on the joint distribution, to calculate the posterior probability. The purpose of the posterior probability is to create a dependency table between TTPs and the threat set. The dependency table, with the help of the prior threat class-probability calculation, is then used to find the threat associated with a certain threat set, which gives the best threat class probability. This solves the dependency problem of the function and improves threat prediction accuracy. Second, the dynamic risk management function is combined with cyber threat prediction. With the help of Eqs. (8) and (9), the posterior-probability threat risk assessment is performed using the threat impact and the decreasing values of probability. The purpose of the risk management is to assess the maximal support of a set of detected TTPs towards the dependency table and to take the threat set with the maximum posterior probability as the predicted threat set. This helps the algorithm provide more reliable threat prediction and improves processing time.

Why the enhanced Naïve Bayes posterior probability: The Naïve Bayes posterior probability reflects the idea that security incidents can be matched to tactics mapped to artificial objects in such a way that machines can identify these mappings with certain probabilities. Using the modified threat support function (TSF) as the activation function in the threat prediction algorithm effectively avoids the dependency problem of TTPs. It uses the threat set associated with the detected TTPs, the threat likelihood, the threat set prior, and the historical threat occurrences as input values. The proposed work can solve the dependency problem because the Bayesian probabilistic graphical model effectively finds the best threat class probability, and appropriate matching of threat classes enhances prediction performance. In addition, Naive Bayes is simpler and faster than other algorithms, resulting in a more effective training process. Moreover, the proposed study considers risk management during the threat prediction phase. The risk management framework considers the threat probability after risk assessment to assess the maximum support of a set of detected TTPs towards a threat using the enhanced TSF. This improves the overall performance and enhances the prediction accuracy and processing time.

Independent detection of TTPs by the threat-TTP-detection algorithm makes it almost impossible to detect associated threats while also slowing down the prediction stage. This affects the algorithm's ability to recognize attacks and reduces the overall threat prediction reliability. The proposed work therefore provides precise analysis of related threats using the posterior Naive Bayes based on the normalised conditional probability, which increases probability accuracy. In addition, in prediction algorithms, the lack of risk management during the threat prediction phase reduces the overall classification performance. This becomes a problem for the prediction sets that give the highest probability-of-occurrence values for the classification of detected TTPs. The proposed work considers the threat probability after risk assessment to generate the maximum-support results for the detected TTPs. This effectively prevents threshold mistakes in prediction sets and yields more reliable threat prediction values.

The threat support function used as the activation function in the state-of-the-art system faces the dependency problem of TTPs. The proposed study solves this problem with a modified threat support function based on the Bayesian probabilistic graphical model. Moreover, the state-of-the-art system has prediction reliability problems because risk assessment is not performed during the threat prediction stage. The proposed work addresses this problem by including a dynamic risk management framework in the prediction algorithm (Fig. 3, Table 3).

Fig. 3

Flowchart of the proposed modified Bayesian probabilistic graphical model for threat prediction

Table 3 Proposed Bayesian probabilistic graphical model (BPGM) algorithm

4 Results and Discussion

In this research, Python 3.6.9 with the Scikit-learn, Matplotlib, Keras, TensorFlow, and NumPy libraries was used in the application and test stages. Five datasets from NSL-KDD, CICIDS2017 [5], and ATT&CK [2] were used. These datasets are publicly accessible and free. The number of records differs between datasets; the specifications are given in detail in Table 4. The data were divided into five different sets: one was used for testing and the remaining sets for training (Figs. 4 and 5). These procedures were performed by applying holdout cross-validation; a sketch of the split is given below. The system configuration used for the experiments was an Intel® Core™ i7-8550U CPU @ 1.80 GHz with 16 GB of installed memory (RAM). The Keras metrics facilities were used to calculate the prediction accuracy and processing time values for the five datasets, and the average prediction accuracy and average processing time were calculated using the NumPy mean function. Figures 6 and 7 show the results for the different datasets.
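The following sketch illustrates the described five-way split with one part held out for testing; the feature matrix, label vector, and seed are placeholders rather than the actual datasets.

```python
# Sketch of the holdout split described above: the data is divided into five
# equal parts, one held out for testing (20%) and four used for training.
import numpy as np

rng = np.random.default_rng(seed=0)
X = rng.random((1000, 41))         # placeholder feature matrix
y = rng.integers(0, 5, size=1000)  # placeholder TTP class labels

idx = rng.permutation(len(X))
parts = np.array_split(idx, 5)     # five equal folds
test_idx, train_idx = parts[0], np.concatenate(parts[1:])

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
print(len(X_train), len(X_test))   # 800 200
```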

Table 4 Number of records in each dataset

Fig. 4

Dataset 2 classification accuracy for the state-of-the-art and the proposed solution during the training phase. a Orange line indicates the classification accuracy values of the state-of-the-art study [2]. b Blue line indicates the classification accuracy of the proposed solution

Fig. 5

Dataset 2 classification accuracy for the state-of-the-art and the proposed solution during the validation phase. a Orange line indicates the classification accuracy values of the state-of-the-art study [2]. b Blue line indicates the classification accuracy of the proposed solution

Fig. 6

Average prediction accuracy calculated for the five datasets (in percentage). Blue lines show the proposed solution values; orange lines show the state-of-the-art solution [2] values. a First two lines show the average accuracy of Dataset 1. b Second two lines show the average accuracy of Dataset 2. c Third two lines show the average accuracy of Dataset 3. d Fourth two lines show the average accuracy of Dataset 4. e Fifth two lines show the average accuracy of Dataset 5

Fig. 7

Average processing time calculated for the five datasets (in seconds). Blue lines show the proposed solution values; orange lines show the state-of-the-art solution [2] values. a First two lines show the average processing time of Dataset 1. b Second two lines show the average processing time of Dataset 2. c Third two lines show the average processing time of Dataset 3. d Fourth two lines show the average processing time of Dataset 4. e Fifth two lines show the average processing time of Dataset 5

Classification accuracy values for the different TTP types available in the used datasets were obtained using the predict method of the Keras library (Python 3.6.9). Processing time values were calculated with the help of the now method (Python 3.6.9). Microsoft Excel functions were used to calculate the average processing time and accuracy values. All results can be seen in Figs. 8, 9, 10, 11, 12, 13, 14, 15, 16 and 17.

Fig. 8

Average prediction accuracy calculated for true-positive and true-negative results (in percentage) from Dataset 1. Blue lines show the proposed solution values; orange lines show the state-of-the-art solution [2] values. a First two lines show the average accuracy of true positives. b Second two lines show the average accuracy of true negatives

Fig. 9

Average processing time calculated for true-positive and true-negative results (in seconds) from Dataset 1. Blue lines show the proposed solution values; orange lines show the state-of-the-art solution [2] values. a First two lines show the average processing time of true positives. b Second two lines show the average processing time of true negatives

Fig. 10

Average prediction accuracy calculated for true-positive and true-negative results (in percentage) from Dataset 2. Blue lines show the proposed solution values; orange lines show the state-of-the-art solution [2] values. a First two lines show the average accuracy of true positives. b Second two lines show the average accuracy of true negatives

Fig. 11

Average processing time calculated for true-positive and true-negative results (in seconds) from Dataset 2. Blue lines show the proposed solution values; orange lines show the state-of-the-art solution [2] values. a First two lines show the average processing time of true positives. b Second two lines show the average processing time of true negatives

Fig. 12

Average prediction accuracy calculated for true-positive and true-negative results (in percentage) from Dataset 3. Blue lines show the proposed solution values; orange lines show the state-of-the-art solution [2] values. a First two lines show the average accuracy of true positives. b Second two lines show the average accuracy of true negatives

Fig. 13

Average processing time calculated for true-positive and true-negative results (in seconds) from Dataset 3. Blue lines show the proposed solution values; orange lines show the state-of-the-art solution [2] values. a First two lines show the average processing time of true positives. b Second two lines show the average processing time of true negatives

Fig. 14

Average prediction accuracy calculated for true-positive and true-negative results (in percentage) from Dataset 4. Blue lines show the proposed solution values; orange lines show the state-of-the-art solution [2] values. a First two lines show the average accuracy of true positives. b Second two lines show the average accuracy of true negatives

Fig. 15

Average processing time calculated for true-positive and true-negative results (in seconds) from Dataset 4. Blue lines show the proposed solution values; orange lines show the state-of-the-art solution [2] values. a First two lines show the average processing time of true positives. b Second two lines show the average processing time of true negatives

Fig. 16

Average prediction accuracy calculated for true-positive and true-negative results (in percentage) from Dataset 5. Blue lines show the proposed solution values; orange lines show the state-of-the-art solution [2] values. a First two lines show the average accuracy of true positives. b Second two lines show the average accuracy of true negatives

Fig. 17

Average processing time calculated for true-positive and true-negative results (in seconds) from Dataset 5. Blue lines show the proposed solution values; orange lines show the state-of-the-art solution [2] values. a First two lines show the average processing time of true positives. b Second two lines show the average processing time of true negatives

A belief network model was created during the feature extraction and classification stages. Using this model, features obtained from the training data are extracted. The features that make up the feature maps are considered by the belief network in linear time [23]. After this, posterior probabilities are calculated using the support function to generate the classification of TTPs based on incident frequency. After the training process, the model was evaluated using the validation dataset.

The classification accuracy performance on Dataset 2 is shown in Fig. 4. The results compare the values of the state-of-the-art system [2] and the proposed solution during the training phase. The state-of-the-art system and the proposed solution have similar accuracy values. However, the proposed solution needs fewer epochs to reach its optimum accuracy values, which means that as the dataset size increases, the proposed solution will reduce the processing time of model training.

The accuracy performance on Dataset 2 is shown in Fig. 5. The results compare the values of the state of the art [2] and the proposed solution during the validation phase. When Fig. 5 is examined, it can be observed that the proposed solution offers about 4% higher classification accuracy than the state-of-the-art solution. After the training and validation stages, the classification accuracy and processing time results for all datasets can be seen in Table 5.

Table 5 Classification accuracy and processing time results of the proposed solution and the state-of-the-art solution for the five datasets after the training and validation stages

Data export and bar graphs were used to compare the proposed solution's results with the state-of-the-art system [2] in order to generate the presented tables and graphs. The results are based on three scenarios applied over the datasets. Different results were obtained for the small, medium, and large datasets, and these results were used for the comparison.

The results of the different scenarios are evaluated according to the training and validation stages. The results for each dataset provide the accuracy and the processing time. To elaborate: the accuracy values are determined by the ratio of correctly classified TTP samples to the total number of TTPs in the datasets [24]. The processing time values are calculated according to the number of predicted threat models required to reach a reliable prediction level. The test dataset comprises 20% of the samples covered by the five datasets (Dataset 1, Dataset 2, Dataset 3, Dataset 4, and Dataset 5).

Figure 6 shows the average accuracy, calculated by averaging the results obtained after the training and validation phases of the respective five datasets. Figure 7 illustrates the average processing time, calculated in the same way.

In the test phase, the average number of detected threats in each dataset was analysed based on three different scenarios. The results are presented in Tables 6, 7, 8, 9, 10, 11, 12, 13, 14 and 15. The prediction accuracy and processing time values were obtained for TTPs based on the number of records in the datasets, and the probability of correct TTP relations forms the basis of the prediction accuracy measurement.

Table 6 Prediction accuracy and processing time results for the proposed solution and the state-of-the-art solution [2], obtained for true-positive values from Dataset 1
Table 7 Prediction accuracy and processing time results for the proposed solution and the state-of-the-art solution [2], obtained for true-negative values from Dataset 1
Table 8 Prediction accuracy and processing time results for the proposed solution and the state-of-the-art solution [2], obtained for true-positive values from Dataset 2
Table 9 Prediction accuracy and processing time results for the proposed solution and the state-of-the-art solution [2], obtained for true-negative values from Dataset 2
Table 10 Prediction accuracy and processing time results for the proposed solution and the state-of-the-art solution [2], obtained for true-positive values from Dataset 3
Table 11 Prediction accuracy and processing time results for the proposed solution and the state-of-the-art solution [2], obtained for true-negative values from Dataset 3
Table 12 Prediction accuracy and processing time results for the proposed solution and the state-of-the-art solution [2], obtained for true-positive values from Dataset 4
Table 13 Prediction accuracy and processing time results for the proposed solution and the state-of-the-art solution [2], obtained for true-negative values from Dataset 4
Table 14 Prediction accuracy and processing time results for the proposed solution and the state-of-the-art solution [2], obtained for true-positive values from Dataset 5
Table 15 Prediction accuracy and processing time results for the proposed solution and the state-of-the-art solution [2], obtained for true-negative values from Dataset 5

The duration of the TTP classification process is taken into consideration in the processing time measurement. As explained previously, the results were analysed in two settings, and the classification stage of the belief network is where the results were obtained. With the help of Eq. (6), the proposed solution improves prediction accuracy by enhancing the threat posterior probability values. At the same time, according to the Eq. (8) calculations, the threat risk assessment is performed and the processing time required for the prediction probability decreases. The proposed system helps identify threat artifacts against the most likely attack scenarios, suggesting the lowest-cost and most likely mechanisms for real-time security analyses.

When the results are evaluated, it is observed that the proposed model improves the prediction accuracy and processing time values compared to the state-of-the-art model for TTP classification. The proposed solution offers an average classification accuracy of 96% with the Naive Bayes posterior probability and the modified prior class probability using joint distribution functions. This result is 4% higher than the results of the state-of-the-art model. Moreover, the proposed solution achieves an average processing time of 0.028 s with the help of the risk assessment for the maximal support of the detected TTP set. This value is 0.015 s less than that of the state-of-the-art solution.

The accuracy of the datasets for each TTP was evaluated using the predict function of the Keras library (Python 3.6.9). In this step, true positives and true negatives were used to calculate the accuracy of correctly retrieved documents. To calculate the processing time values, Python 3.6.9 functions were used: start-time and end-time intervals were determined with the now method. Moreover, the average accuracy and average time values were calculated with the Microsoft Excel average function. The improvements in accuracy and processing time were investigated for the proposed solution against the state-of-the-art algorithms. In [2], the accuracy is calculated using Eq. (10):

$${\text{Accuracy}}=\frac{{\text{True positive}}}{{\text{True positive}}+{\text{True negative}}}$$
(10)

where True positive: correctly retrieved TTPs from the dataset dictionary; True negative: correctly dropped TTPs from the dataset dictionary.
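The sketch below implements Eq. (10) as defined in [2] together with the start/end-time measurement described above; the counts and the commented-out predict call are illustrative placeholders.

```python
# Sketch of Eq. (10) and the start/end-time measurement; the counts and the
# timed predict() call are illustrative placeholders only.
from datetime import datetime

def accuracy(true_positive, true_negative):
    """Eq. (10) as defined in [2]: correctly retrieved over retrieved + dropped."""
    return true_positive / (true_positive + true_negative)

start = datetime.now()
# predictions = model.predict(X_test)  # placeholder for the Keras predict call
end = datetime.now()

print(accuracy(true_positive=92, true_negative=8))  # 0.92
print((end - start).total_seconds())                # processing time in seconds
```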

In summary, using the modified prior class probability threat support function (TSF) as the activation function in the cyber-threat prediction algorithm effectively avoids the dependency problem of TTPs in the proposed model. The TSF defines the best candidate threat prediction set with the maximum probability value. The Bayesian probabilistic graphical model based on the joint distribution calculates the posterior probability, which increases the probability accuracy because the graphical model effectively finds the best threat classification probability. The risk assessment function is another new feature of the proposed model. This function is used to identify the most relevant threats in the threat set and therefore increases the accuracy of the probability function and reduces the processing time for threat prediction. As a result, the proposed solution provides increased accuracy and decreased processing time in cyber-threat prediction.

Various techniques have been used to detect and predict cyber-attacks. The most important limitation of these techniques has always been the attack prediction accuracy and the processing time during identification of attacks. The proposed solution resolves the limitations encountered in the state-of-the-art model, achieving 96% prediction accuracy, a 4% improvement over the state-of-the-art solution's 92% accuracy. At the same time, the proposed solution is superior to the state of the art in processing time, improving the average processing time to 0.028 s against the current 0.043 s. The threat support function, used as the activation function to solve the TTP dependency problem, and the risk assessment function that improves processing time values were effective in obtaining improved results across the different dataset scenarios. The main comparison between the proposed solution and the state of the art is given in Table 16.

Table 16 Comparison between the proposed solution and the state-of-the-art solution

5 Conclusion and Future Work

The methods and results presented with the proposed solution show that security incidents can be matched with cyber threat tactics in cyber threat intelligence, and that machine learning can link these mappings using specific probabilities and algorithms. In this context, it is worth noting that the prediction accuracy and the processing time are still limited; this study worked on improving these two limitations. The proposed solution was inspired by the study that developed the second-best solution [7], and new features were developed, such as the modified version of the Naive Bayes posterior probability and the modified prior class probability. These functions increase the probability accuracy, thanks to the consideration of associated threats, compared to the posterior Naive Bayes probability based on the normalized conditional probability. Moreover, a new risk management framework feature was developed that improves the processing time limitation by using the third-best solution [3]. With the posterior probability of a threat, the risk assessment approach identifies the most relevant threats in the cyber-threat set (using the threat impact) with increased probability-function accuracy and reduced threat prediction time. Therefore, the proposed solution improves the average prediction accuracy by 4% and reduces the average processing time by 0.015 s. In the future, to enable the developed model to be used in wider domains, multiple-class datasets will be provided during the testing and training stages of machine learning. In addition, studies will be carried out on mitigation integration and the automation of threat incidents detected in cyber threat intelligence. In this regard, development methods will be used to improve the threat classification performance and feature extraction.