1 Introduction

Trade management has become increasingly complex and challenging in the rapidly advancing era of digital technology and online transactions (Praveena et al., 2022; Bai et al., 2021; Vasnani et al., 2019). The widespread use of e-commerce platforms has resulted in a vast network of partners and traders, necessitating a heightened focus on security, integrity, and trust. As businesses engage in transactions across digital platforms, they face cyber threats that can compromise sensitive information and disrupt trade processes. It is crucial to address these challenges to ensure the smooth and secure functioning of trade management systems (Khan et al., 2018; Kozhaya et al., 2021; Chen et al., 2023). This paper highlights the significance of trade management in the digital era and sets the stage for discussing the importance of privacy in secured supply chain management.

Privacy plays a vital role in ensuring the effectiveness of secured supply chain management. In interconnected global trade networks, it is imperative to safeguard sensitive information and prevent unauthorised access to maintain the integrity of the supply chain (Qian et al., 2021; Kilincer et al., 2021; Woods et al., 2022). Protecting the privacy of transactional data, customer details, and intellectual property is crucial for businesses to maintain their competitive advantage and safeguard their interests. However, the digital landscape presents unique challenges to privacy due to the inherent vulnerabilities of data during online transfers and the ever-present risk of data breaches. The increasing reliance on digital platforms for trade management introduces complexities and risks that must be addressed (Paul et al., 2019; Sivaraman et al., 2020; Avrachenkov et al., 2019; Ruan et al., 2016). The potential exposure of sensitive information during online transfers raises concerns about unauthorised access, data leaks, and misuse. The threat of data breaches further emphasises the importance of implementing robust privacy measures in supply chain management. Understanding the significance of privacy sets the foundation for highlighting the limitations of existing methods in adequately addressing this challenge (Cheng et al., 2016; Dai et al., 2023; Gao et al., 2021).

Existing methods often struggle to keep pace with the dynamic nature of digital trade. These methods may rely on outdated or manual processes that lack the agility and responsiveness required in the digital era. Additionally, traditional approaches may lack built-in mechanisms to ensure privacy and data security throughout the supply chain process (Guo et al., 2023). This limitation poses significant risks to trade management systems, as the potential for data leaks or breaches can lead to financial losses, reputational damage, and legal consequences. Through an in-depth analysis of these challenges, this article proposes a novel approach called the Game Theory Based Secured Supply Chain Architecture (GBSSCA), which employs Evolutionary Game Theory (EGT) to address these limitations (Bouhaddi et al., 2018; Jin et al., 2020; Wang et al., 2022a, 2022b).

The proposed GBSSCA approach presents a promising solution for addressing the challenges of secured supply chain management within the digital landscape. Utilising game theory principles, this approach brings fairness and accountability to the supply chain architecture. One key aspect of the EGT approach is the implementation of a subsystem that imposes taxes on supply chain nodes. This taxation mechanism encourages ethical conduct and discipline among participants by incentivising them to act in the best interests of the entire supply chain ecosystem, discouraging malicious actors and promoting responsible behaviour. Inspired by the ultimatum game, the game-theoretic model employed in the EGT approach progressively penalizes those engaging in malicious activities, thereby discouraging fraud and enhancing compensation systems (Adami et al., 2016; Xing et al., 2020; Zheng et al., 2023; Wang et al., 2022a, 2022b). This model also reduces collusion, motivating participants to maintain a fair and transparent supply chain environment. Overall, the proposed EGT approach provides a comprehensive framework that addresses the challenges faced in secured supply chain management and establishes a system of fairness, accountability, and trust among participants (Ren et al., 2020; Douha et al., 2023). By leveraging game theory principles and introducing innovative mechanisms, this approach aims to enhance the integrity and reliability of supply chain processes in the digital landscape.

The main contributions of the paper are as follows.

  • We propose a novel approach called the Game Theory Based Secured Supply Chain Architecture (GBSSCA) for enhancing secured supply chain management and enabling effective trade management.

  • The proposed GBSSCA incorporates the Evolutionary Game Theory (EGT) approach to address security limitations and ensure privacy in the supply chain management process.

  • The effectiveness of the proposed approach is demonstrated through rigorous experiments, validating its efficacy and efficiency.

The paper is structured as follows: Sect. 2 provides a comprehensive literature review, highlighting relevant research and studies in the field. Section 3 presents the proposed methodology, detailing the Evolutionary Game Theory (EGT) based game model and its key mechanisms. Section 4 presents the results of the experiments conducted to assess the effectiveness of the proposed approach. Finally, Sect. 5 concludes the paper by summarising the key findings, discussing their implications, and suggesting potential avenues for future research.

2 Literature Review

This work builds on the RLM-QRD algorithm (Liu et al., 2021a, 2021b), the concept of Nash equilibrium (Li et al., 2015), the MIT attack behavior dataset (Peng et al., 2019), and the China National Vulnerability Database of Information Security (CNNVD), all of which are discussed briefly in Sect. 3. The paper (Zhu et al., 2021) addresses the challenges related to the selfishness of parking vehicles (PVs) and the potential threats from trustless or malicious PVs. The proposed solution introduces a reputation-based cooperative content delivery mechanism, utilising a two-layer auction game to optimise content delivery and rewards for mobile vehicles (MVs), PVs, and roadside units (RSUs). The paper (Xu et al., 2020) aims to overcome the limitations of traditional deterministic game models in accurately capturing the dynamic nature of network attack and defense strategies and external factors. It constructs a stochastic evolutionary game model using stochastic differential equations with Markov property, finding evolutionary equilibrium solutions and proving model stability.

The method in (Zhang et al., 2021) presents the Pareto optimal decision vectors and solutions for the finite horizon indefinite mean-field stochastic cooperative linear-quadratic (LQ) difference game. The article establishes the equivalence between the solvability of coupled generalised difference Riccati equations (GDREs) and the solvability of the multi-objective optimisation problem. The study in (Liu et al., 2021a, 2021b) addresses the limitations of conventional game models that assume ideal systems with unlimited computational resources for decision-making. It develops a new mathematical model of extensive games that incorporates bounded computational resources, allowing players to foresee only a portion of the available alternatives in the future (Nasrin et al., 2022; Liu et al., 2022; Liu et al., 2023).

The article (Tian et al., 2019; Jiang et al., 2023; Jiang et al., 2022) proposes applying evolutionary game theory to model the evolution process of malicious users’ attacking strategies and discusses the methodology for conducting evaluation simulations. The study (Mengibaev et al., 2020; Li et al., 2020) introduces a heterogeneous interaction mode where players can adopt different strategies for opponents. The impact of heterogeneous interaction dependency strength on privacy protection is explored through computer simulations, revealing that heterogeneous decision behavior can promote privacy protection. The study (Liu et al., 2020; Li et al., 2023; Luo et al., 2023) introduces an improved learning mechanism based on the network topology, establishes a learning object set based on the players’ learning range, and incorporates the Fermi function to calculate the transition probability between learning object strategies (Shukla et al., 2022; Mo et al., 2023; Meng et al., 2020).

The paper (Shi et al., 2021; Tutar et al., 2023) introduces dynamic honeypots that adjust defense strategies based on hacker attacks, treating the confrontation between defenders and attackers as a strategic game. The goal is to improve the security of array honeypot systems by deriving evolutionarily stable strategies from the game model and analyzing the stability of strategy evolution based on the number of servers. The paper (Wang et al., 2021) aims to explore the dynamics of social payoffs and average social investments using evolutionary game theory.

The research in (Hu et al., 2020) addresses the issue of overlooking strategy in cyber security defense. While various techniques exist, decision-making and optimal strategies play a significant role in the outcome of cyber-attack defense. The research utilizes a stochastic evolutionary game model with the Logit Quantal Response Dynamics (LQRD) equation, incorporating the parameter λ to quantify cognitive differences among real-world players. The objective of (Guo et al., 2021) is to propose a dynamic defense strategy against dynamic load-altering attacks (D-LAAs) in the power grid. The study focuses on the interplay between the attacker and the defender in a multistage game, employing minimax-Q learning to determine optimal strategies. The paper (Jie et al., 2019) focuses on modeling man-in-the-middle (MITM) attacks using a defender vs. multi-attacker Stackelberg game. The paper proposes an approach to compute the optimal defender strategy using a multi-double oracle algorithm.

3 Methodology

3.1 Proposed game model

3.1.1 Strategies initialization

Building on the concept of (Douha et al., 2023), our e-commerce trade management architecture encompasses three populations: e-commerce platform users (Population 1), manufacturers (Population 2), and attackers (Population 3). Figure 1 illustrates the structure of our e-commerce system. Population 1 comprises e-commerce platform users who engage in trade management activities for buying and selling goods and services. Population 2 represents the manufacturers who produce and supply various products to be traded on the e-commerce platform. Lastly, Population 3 consists of attackers who threaten the security and integrity of the supply chain and trade management processes.

Fig. 1 E-commerce supply chain architecture

In secured supply chain management and effective trade management on the e-commerce platform, Population 1 relies on the products supplied by Population 2. These products may include various items such as electronics, clothing, or consumables. However, the rise of cyberattacks and fraudulent activities in the e-commerce space necessitates a focus on ensuring the security and trustworthiness of the supply chain. To address these challenges, Population 1 may invest in cybersecurity measures and adopt best practices to protect themselves and their customers from potential threats. This may involve implementing secure payment gateways, encrypting sensitive data, and enforcing stringent verification processes. By proactively enhancing cybersecurity measures, Population 1 can mitigate risks and build trust among e-commerce platform users.

In this proposed model, Population 2 is critical as the manufacturers responsible for producing and supplying the products traded on the e-commerce platform. They need to prioritize security and integrity in their manufacturing processes, ensuring that their products are authentic, reliable, and free from vulnerabilities. By adhering to strict quality control and implementing secure supply chain practices, Population 2 can contribute to a more secure and trustworthy e-commerce ecosystem. Meanwhile, Population 3, the attackers, constantly threatens the secured supply chain management and effective trade management in the e-commerce platform. They may attempt malicious activities such as counterfeit product distribution, data breaches, or identity theft. Therefore, Population 1 and Population 2 must collaborate and implement robust security measures to detect and mitigate potential attacks from Population 3.

By employing a system model that emphasizes secured supply chain management, the e-commerce platform can create a more secure and trustworthy environment for trade management. Through the adoption of advanced cybersecurity measures, stringent supply chain practices, and collaborative efforts, Populations 1 and 2 can collectively minimize risks, enhance trust among users, and promote effective trade management on the e-commerce platform.

3.1.2 Proposed game model based on RLM-QRD Framework

In this section, we utilize the multistage evolutionary game model (MEGM) based on (Douha et al., 2023). MEGM is a framework in game theory that considers how multiple populations or players interact strategically across multiple stages or periods. Each population’s decisions and strategies are influenced by the actions of other populations and the outcomes of previous stages. In the context of the E-commerce architecture depicted in Fig. 1, which involves e-commerce platform users (Population 1), manufacturers (Population 2), and attackers (Population 3), a multistage evolutionary game model can effectively capture the dynamic interactions and changes between these populations over time. The model enables an analysis of how the strategies and behaviors of each population evolve and affect the security and integrity of the e-commerce platform’s supply chain and trade management processes.

By studying the system dynamics and assessing the effects of different strategies adopted by each population, the MEGM helps to identify optimal defense strategies, evaluate the system’s vulnerability to attacks, and explore methods to enhance the security and effectiveness of supply chain management and trade practices. Modeling multiple stages or periods accounts for the populations’ adaptability and facilitates the exploration of strategies that yield long-term benefits (Alipour & Bastani, 2023). Furthermore, the model captures the interdependencies between populations and their decision-making processes, which is crucial for understanding the complex dynamics of the system (Fig. 2).
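To make the multistage structure concrete, the following sketch outlines how the three populations and the stage loop could be organized in code. This is a simplified illustration under our own assumptions, not the exact model of (Douha et al., 2023); the names Population, payoff_fn, and update_fn are hypothetical placeholders.

```python
import numpy as np

# Simplified sketch of a multistage evolutionary game with three populations.
# Each population holds a mixed strategy that is revised stage by stage
# according to the payoffs observed in the previous stage.
class Population:
    def __init__(self, name, n_strategies, rng):
        self.name = name
        self.mixed_strategy = np.full(n_strategies, 1.0 / n_strategies)  # start uniform
        self.rng = rng

    def sample_action(self):
        return self.rng.choice(len(self.mixed_strategy), p=self.mixed_strategy)

def run_multistage_game(n_stages, payoff_fn, update_fn, seed=0):
    rng = np.random.default_rng(seed)
    populations = [
        Population("users", 3, rng),          # Population 1: e-commerce platform users
        Population("manufacturers", 3, rng),  # Population 2: manufacturers
        Population("attackers", 3, rng),      # Population 3: attackers
    ]
    history = []
    for stage in range(n_stages):
        actions = {p.name: p.sample_action() for p in populations}
        payoffs = payoff_fn(stage, actions)   # stage-dependent payoff quantification
        for p in populations:
            # Strategy adaptation, e.g. a replicator- or QRD-style update.
            p.mixed_strategy = update_fn(p.mixed_strategy, actions[p.name], payoffs[p.name])
        history.append((actions, payoffs))
    return history
```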

Fig. 2 Proposed EGT based on RLM-QRD

Here, our proposed game model combines Q-learning and the replication dynamic equation into the Q-learning replication dynamic (QRD) equation, together with a Reinforcement Learning Model (RLM).

The evolutionary game model based on QRD involves two main components: payoff quantification and QRD calculations (Yang et al., 2023; Xia et al., 2023). Payoff quantification utilizes the information from the initial stage of the game to compute the revenue generated by both the attack and defense strategies in that stage. By considering the strategy revenues at the current stage, QRD calculations are performed to determine the optimal defense strategy through mathematical calculations. The RLM connects all stages of the game. It plays a crucial role in maintaining continuity and facilitating the evolution of strategies over time. The RLM utilizes the known game information to adjust the incentives or punishments associated with the attack and defense revenues in the subsequent stage. This adaptation is based on the observed outcomes and dynamics of the game. By influencing the behavior of the populations, the RLM promotes the adoption of strategies that yield better defense outcomes in subsequent stages.

3.1.3 Q-Learning

Q-Learning is a reinforcement learning method that functions as an asynchronous dynamic programming approach. It updates the state-action value \({Q}_{t}(S,A)\) using the successor state-action values \({Q}_{t}({S}^{{\prime }},{A}^{{\prime }})\) to obtain the estimate \({Q}_{t+1}(S,A)\) at the next time step \((t + 1)\). The state-action value \({Q}_{t}(S,A)\) represents the expected revenue after taking action \(A\) in state \(S\) at time \(t\).

$${Q}_{t+1}\left(S,A\right)\leftarrow \left(1-\delta \right){Q}_{t}\left(S,A\right)+\delta \left(r+\gamma\, {\max }_{{A}^{{\prime }}}{Q}_{t}\left({S}^{{\prime }},{A}^{{\prime }}\right)\right)$$
(1)

Q-Learning updates the state-action values based on a formula using step size \(\delta\), immediate reinforcement \(r\), and discount factor \(\gamma\). Its principle is to iteratively select actions in discrete states, improving the evaluation of action quality to maximize profit and achieve the game’s goal.
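As a concrete illustration of Eq. (1), a single Q-learning update can be written as below. This is a generic sketch; the table shape, the state/action encoding, and the parameter values are placeholders rather than settings from the paper.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, delta=0.1, gamma=0.9):
    """One application of Eq. (1):
    Q_{t+1}(S, A) <- (1 - delta) * Q_t(S, A) + delta * (r + gamma * max_{A'} Q_t(S', A'))."""
    target = r + gamma * np.max(Q[s_next])        # best estimated revenue from the next state
    Q[s, a] = (1.0 - delta) * Q[s, a] + delta * target
    return Q

# Example usage with a small table of 4 states and 3 actions (arbitrary numbers).
Q = np.zeros((4, 3))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
```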

3.1.4 Replication dynamic equation

The replication dynamic equations for attackers and defenders in network attack and defense can be written as follows.

For attackers

$${X}_{i}^{{\prime }}\left(t\right)=\frac{d{X}_{i}}{dt}={X}_{i}\left[{Q}_{{AS}_{i}}-\bar{Q}_{AS}\right]$$
(2)

For defenders

$${Y}_{j}^{{\prime }}\left(t\right)=\frac{d{Y}_{j}}{dt}={Y}_{j}\left[{Q}_{{DS}_{j}}-\bar{Q}_{DS}\right]$$
(3)

Here, \({X}_{i}\) and \({Y}_{j}\) denote the probabilities of the attacker and defender selecting strategies \({AS}_{i}\) and \({DS}_{j}\), respectively, at a given time \(t\), and \({X}_{i}^{{\prime }}\left(t\right)\) and \({Y}_{j}^{{\prime }}\left(t\right)\) denote their rates of change. \({Q}_{{AS}_{i}}\) and \({Q}_{{DS}_{j}}\) represent the expected revenues of the attacker’s and defender’s strategies, respectively, while \(\bar{Q}_{AS}\) and \(\bar{Q}_{DS}\) represent the average revenues of the attack and defense strategy sets. The equations describe how the probabilities of strategy adoption change over time based on the differences between individual strategy revenues and the average strategy revenues. The replication dynamics ensure the gradual adoption of strategies with better revenue, leading to the most beneficial strategy and to the evolutionarily stable strategy as the Nash equilibrium.
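A minimal numerical sketch of Eqs. (2) and (3) is given below, using a simple Euler discretization; the step size, the number of iterations, and the example revenue values are illustrative assumptions only.

```python
import numpy as np

def replicator_step(x, strategy_revenues, dt=0.01):
    """One Euler step of Eqs. (2)-(3): x_i' = x_i * (Q_i - Q_bar), where Q_bar is the
    average revenue of the strategy set. Applies equally to attacker (X) or defender (Y)
    probability vectors."""
    x = np.asarray(x, dtype=float)
    q = np.asarray(strategy_revenues, dtype=float)
    q_bar = np.dot(x, q)                 # average revenue over the current mixed strategy
    x_new = x + dt * x * (q - q_bar)
    return x_new / x_new.sum()           # renormalize to guard against discretization drift

# Example: three attack strategies with expected revenues 2, 2 and 4.
x = np.array([1/3, 1/3, 1/3])
for _ in range(1000):
    x = replicator_step(x, [2.0, 2.0, 4.0])
# The probability mass gradually concentrates on the highest-revenue strategy.
```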

3.1.5 Payoff quantification of attack and defense strategy

Payoff quantification is crucial for analyzing the effectiveness of attack and defense strategies in achieving optimal network security defense. The attack payoff matrix \(AM\) represents the attacker’s revenue \({A}_{ij}\) generated by a combination of attack \({AS}_{i}\) and defense \({DS}_{j}\) strategies, considering attack revenue \(AR\) and cost \(AC\).

$${A}_{ij}={Q}_{A}\left({AS}_{i},{DS}_{j}\right)=AR-AC$$
(4)

The defense payoff matrix \(DM\) represents the defender’s revenue value \({D}_{ij}\) generated by the same strategy combination \(\left({AS}_{i},{DS}_{j}\right)\), considering defense revenue \(DR\) and cost DC. Both matrices capture the strategic outcomes for each stage, providing essential information for evaluating and selecting defense strategies.

$${D}_{ij}={Q}_{D}\left({AS}_{i},{DS}_{j}\right)=DR-DC$$
(5)

The attack payoff matrix in stage \(k\)

$${AM}^{k}=\left[\begin{array}{cccc}{A}_{11}^{k}& {A}_{12}^{k}& \dots & {A}_{1 m}^{k}\\ {A}_{21}^{k}& {A}_{22}^{k}& \dots & {A}_{2 m}^{k}\\ \vdots& \vdots& \ddots & \vdots\\ {A}_{n1}^{k}& {A}_{n2}^{k}& \dots & {A}_{nm}^{k}\end{array}\right]$$
(6)

and the defense payoff matrix in stage \(k\)

$${DM}^{k}=\left[\begin{array}{cccc}{D}_{11}^{k}& {D}_{12}^{k}& \dots & {D}_{1 m}^{k}\\ {D}_{21}^{k}& {D}_{22}^{k}& \dots & {D}_{2 m}^{k}\\ \vdots& \vdots& \ddots & \vdots\\ {D}_{n1}^{k}& {D}_{n2}^{k}& \dots & {D}_{nm}^{k}\end{array}\right]$$
(7)

are constructed similarly, incorporating the respective revenues and costs specific to that stage. These matrices enable a comprehensive analysis of the revenues generated by different attack and defense strategies, allowing for strategic decision-making in each stage of the game.
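The construction of the stage payoff matrices in Eqs. (4)-(7) can be sketched as follows; the matrix dimensions and the revenue/cost values are placeholders, since in practice they would be derived from the attack and defense behavior databases.

```python
import numpy as np

# Placeholder revenue and cost estimates for each (AS_i, DS_j) strategy pair.
AR = np.array([[5., 4., 3.], [4., 5., 3.], [6., 5., 7.]])  # attack revenue
AC = np.array([[3., 2., 1.], [2., 3., 1.], [2., 1., 3.]])  # attack cost
DR = np.array([[4., 5., 3.], [3., 4., 5.], [2., 3., 6.]])  # defense revenue
DC = np.array([[2., 3., 1.], [1., 2., 3.], [1., 1., 2.]])  # defense cost

AM_k = AR - AC   # A_ij^k = AR - AC, Eq. (4), arranged as the matrix of Eq. (6)
DM_k = DR - DC   # D_ij^k = DR - DC, Eq. (5), arranged as the matrix of Eq. (7)
```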

3.1.6 Q-learning replication dynamic equation

Based on the work of (Douha et al., 2023) and (Liu et al., 2021a, 2021b), the Q-Learning replication dynamic equation is derived from Eqs. (2) and (3). This derivation incorporates the Boltzmann probability distribution to represent attack and defense strategies. The Q-Learning algorithm is then introduced into the replication dynamic equation, resulting in the QRD equations

$${X}_{i}^{{\prime }}\left(t\right)=\frac{d{X}_{i}}{dt}=\underbrace{{X}_{i}\left[{Q}_{{AS}_{i}}-\bar{Q}_{AS}\right]}_{RD}+\underbrace{\frac{1}{\tau }{X}_{i}\sum _{k=1}^{n}{X}_{k}\,\ln \left({X}_{k}/{X}_{i}\right)}_{ME}$$
(8)
$${Y}_{j}^{{\prime }}\left(t\right)=\frac{d{Y}_{j}}{dt}=\underbrace{{Y}_{j}\left[{Q}_{{DS}_{j}}-\bar{Q}_{DS}\right]}_{RD}+\underbrace{\frac{1}{\tau }{Y}_{j}\sum _{l=1}^{m}{Y}_{l}\,\ln \left({Y}_{l}/{Y}_{j}\right)}_{ME}$$
(9)

The QRD equation combines two components: a replication dynamics (RD) term and a mutation (ME) term. The RD term selects the best strategy based on current information, while the ME term explores new strategies in unknown attack and defense scenarios, learning from errors and adjusting strategies. This combination captures the diversity and uncertainty of network attacks and defenses (Douha et al., 2023). In the context of evolutionary equilibrium,

$${X}_{i}^{{\prime }}\left(t\right)=0\quad \text{and}\quad {Y}_{j}^{{\prime }}\left(t\right)=0,\quad \forall i,j$$
(10)

when the strategies of the players reach this state, the rates of change of \({X}_{i}\) and \({Y}_{j}\) with respect to time (denoted as \({X}_{i}^{{\prime }}\left(t\right)\) and \({Y}_{j}^{{\prime }}\left(t\right)\)) are zero. This implies that the strategies of the players have stabilized and are no longer changing significantly over time. In other words, they have reached a state of balance where further changes are minimal. In Eq. (10), the solution \(({X}^{*},{Y}^{*})\) represents an evolutionarily stable equilibrium point. This means that the strategies of the players have reached a stable state where neither player has an incentive to deviate from their current strategy. In order to achieve this stability, the \(\tau\) value in the equation needs to be sufficiently large. This ensures that the selection probability of each strategy is influenced enough to maintain the equilibrium and prevent players from shifting to different strategies.
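A discrete-time sketch of the QRD updates in Eqs. (8) and (9) is shown below; the step size dt, the parameter tau, the clipping constant, and the stopping tolerance are illustrative choices, not values from the paper. Iterating until successive strategy vectors stop changing approximates the equilibrium condition of Eq. (10).

```python
import numpy as np

def qrd_step(x, strategy_revenues, tau=5.0, dt=0.01, eps=1e-12):
    """One Euler step of Eq. (8)/(9): a replication-dynamics (RD) term plus a
    mutation (ME) term weighted by 1/tau."""
    x = np.clip(np.asarray(x, dtype=float), eps, None)
    x = x / x.sum()
    q = np.asarray(strategy_revenues, dtype=float)
    rd = x * (q - np.dot(x, q))                                # RD term
    me = (1.0 / tau) * x * (np.dot(x, np.log(x)) - np.log(x))  # x_i * sum_k x_k ln(x_k/x_i)
    x_new = np.clip(x + dt * (rd + me), eps, None)
    return x_new / x_new.sum()

def reached_equilibrium(x_prev, x_new, tol=1e-8):
    """Approximate check of Eq. (10): the strategy probabilities are stationary."""
    return np.max(np.abs(np.asarray(x_new) - np.asarray(x_prev))) < tol
```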

3.1.7 Reward value learning algorithm

In the RLM, the incentive and punishment factors are calculated based on the reward variable \(Rv\) and the proportion of a certain type of attack strategy used in the previous stage (the ratio of \(An\) to \(Sn\)). These factors determine the adjustment \(\alpha\) to the reward value in order to influence the attack and defense strategies in the next stage. The calculation also takes into account the defense result \(R\) from the last stage. Depending on whether the defense was successful or not, the RLM modifies the reward value associated with the corresponding attack and defense strategies. This modification aims to enhance or diminish the reward value for these strategies in order to encourage or discourage their usage in the subsequent stage. By dynamically adjusting the reward values based on past performance and strategy proportions, the RLM aims to optimize the selection of attack and defense strategies in future stages, ultimately improving the overall performance and effectiveness of the system.

Algorithm 1 Reward value learning algorithm

Algorithm 1 starts by initializing the variables \(Sn\), \(An\), and \(Rv\). It then calculates the value of \(\alpha\), the adjustment factor for the strategy revenues, based on the incentive and punishment factors of the reward value. In the first stage, \(\alpha\) is set to half of the reward value \(Rv\). For subsequent stages, \(\alpha\) is calculated as the ratio of \(An\) to \(Sn\) multiplied by the reward value \(Rv\). Next, the algorithm checks the defense result from the last stage, \(R\). If it is 0, indicating a failure in defense, the algorithm updates the strategy revenues for the current stage by adding \(\alpha\) to the attack revenues and subtracting \(\alpha\) from the defense revenues, i.e., \({A}_{ij}^{k}={A}_{ij}^{k-1}+\alpha\) and \({D}_{ij}^{k}={D}_{ij}^{k-1}-\alpha\). On the other hand, if the defense result is non-zero, the algorithm subtracts \(\alpha\) from the attack revenues and adds \(\alpha\) to the defense revenues, i.e., \({A}_{ij}^{k}={A}_{ij}^{k-1}-\alpha\) and \({D}_{ij}^{k}={D}_{ij}^{k-1}+\alpha\). Finally, the algorithm returns the updated revenue values of the strategy combinations for the current stage, represented as \(({AM}^{k},{DM}^{k})\). This iterative process adjusts the strategy revenues based on previous outcomes and aims to optimize the revenue in each stage.
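The following sketch transcribes the steps of Algorithm 1 as described above; whether the \(\alpha\) adjustment is applied to a single strategy pair (i, j) or to the full matrices is not fully specified here, so the sketch updates a single entry. Variable names follow the text, and the inputs are assumed to be NumPy arrays.

```python
import numpy as np

def reward_value_learning(AM_prev, DM_prev, i, j, R, Rv, An, Sn, first_stage=False):
    """Sketch of Algorithm 1: compute the adjustment factor alpha and shift revenue
    between the attack and defense payoff matrices for strategy pair (i, j),
    depending on the last stage's defense result R (0 = defense failed)."""
    alpha = 0.5 * Rv if first_stage else (An / Sn) * Rv
    AM_k, DM_k = AM_prev.copy(), DM_prev.copy()
    if R == 0:                 # defense failed: reward the attack, penalize the defense
        AM_k[i, j] += alpha
        DM_k[i, j] -= alpha
    else:                      # defense succeeded: penalize the attack, reward the defense
        AM_k[i, j] -= alpha
        DM_k[i, j] += alpha
    return AM_k, DM_k          # revenue values of the strategy combinations for stage k
```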

3.1.8 Optimal defense strategy selection based on RLM-QRD

In this paper, the focus is on finding the Nash equilibrium solution for a multistage evolutionary game. We treat the Nash equilibrium solution as a set of equilibrium solutions, one for each stage of the game. To achieve this, each stage of the game learns from the known game information using the reward value learning mechanism. This mechanism allows the players to update and modify the reward value associated with their defense strategy in the current stage. By incorporating the reward value learning mechanism, the players can adapt and improve their strategies over time based on the feedback they receive. From the optimal defense strategy determined for each stage using the reward value learning mechanism, we construct a multistage optimal defense strategy set, comprising the collection of the optimal defense strategies identified at each stage of the game.

Algorithm 2 Optimal defense strategy selection based on RLM-QRD

The goal of Algorithm 2 is to determine the probability set of the optimal defense strategy in each stage (denoted as \({Pr}_{D}^{k}\)). At the beginning, the algorithm initializes the parameters, including the number of players \(n\), the current stage \(k\), a variable \(s\), a parameter \(\theta\), and the variables \(Q\), \(\tau\), and \(\alpha\). Next, the algorithm calculates the values of \({AM}^{k}\) and \({DM}^{k}\) for each player and strategy combination using Eqs. (6) and (7); this step determines the players’ actions and their corresponding defenses. The algorithm then proceeds to another loop over each stage \(k\) and strategy \(j\), within which it constructs \({Y}^{{\prime }}\left(t\right)\) based on Eq. (9), updating the variables related to the defenders’ strategy probabilities. Similarly, the algorithm constructs \({X}^{{\prime }}\left(t\right)\) within the stage loop for each player based on Eq. (8), updating the variables related to the attackers’ strategy probabilities. After that, it calculates the value of \(\tau\) and the probability set of the optimal defense strategy \({Pr}_{D}^{k}\) using Eq. (10); this calculation takes the updated variables into account and determines the players’ optimal defense strategies. Furthermore, the algorithm calculates the values of \({AM}^{k+1}\) and \({DM}^{k+1}\) using Algorithm 1. Finally, the algorithm outputs the probability set of the optimal defense strategy in the kth stage, \({Pr}_{D}^{k}=({Y}_{1}^{k},{Y}_{2}^{k},\dots ,{Y}_{m}^{k})\).
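The per-stage core of Algorithm 2 can be sketched as follows, reusing the illustrative qrd_step() and reward_value_learning() helpers defined earlier; the iteration count, tolerance, and parameter values are our own assumptions. The defender's converged mixed strategy approximates \({Pr}_{D}^{k}\), after which Algorithm 1 produces \({AM}^{k+1}\) and \({DM}^{k+1}\) for the next stage.

```python
import numpy as np

def optimal_defense_stage(AM_k, DM_k, x, y, tau=5.0, dt=0.01, max_iters=5000, tol=1e-8):
    """Evolve the attacker (x) and defender (y) mixed strategies with the QRD updates
    of Eqs. (8)-(9) until they stabilize (Eq. (10)), then return the defender's
    probability set Pr_D^k = (Y_1^k, ..., Y_m^k) together with the attacker mix."""
    for _ in range(max_iters):
        attack_revenues = AM_k @ y      # expected revenue of each attack strategy vs. y
        defense_revenues = DM_k.T @ x   # expected revenue of each defense strategy vs. x
        x_new = qrd_step(x, attack_revenues, tau=tau, dt=dt)
        y_new = qrd_step(y, defense_revenues, tau=tau, dt=dt)
        converged = max(np.abs(x_new - x).max(), np.abs(y_new - y).max()) < tol
        x, y = x_new, y_new
        if converged:
            break
    return x, y                         # y approximates Pr_D^k for the current stage
```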

4 Results and experiments

4.1 Experimental setup

In evaluating the effectiveness of the proposed GBSSCA, we leverage the attack and defense behavior databases sourced from MIT and the China National Vulnerability Database of Information Security (CNNVD) (Peng et al., 2019). The objective of this evaluation is to analyze the atomic attack and defense strategies within the context of the proposed GBSSCA. This experimental analysis centers on understanding the behaviors associated with cyber attacks and the corresponding defense mechanisms. The attack behavior database from MIT and the defense behavior database from CNNVD offer a comprehensive collection of documented attack and defense strategies, respectively. By utilizing these databases, we aim to assess the effectiveness of the GBSSCA in mitigating the identified attack strategies. Specifically, we analyze how the proposed system aligns with the observed attack behaviors and evaluate the robustness of its defense mechanisms in countering these attacks. This evaluation enables us to gauge the efficiency and reliability of the GBSSCA in securing the supply chain within the trade management system. By examining the alignment between the proposed approach and the documented attack and defense strategies, we can assess the extent to which the GBSSCA enhances the overall security and resilience of the supply chain in the face of cyber threats.

4.2 Evaluation metrics used

$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$
$$Precision=\frac{TP}{TP+FP}$$
$$Recall=\frac{TP}{TP+FN}$$
$$F1\text{-}score=2\times \frac{Precision\times Recall}{Precision+Recall}$$

We also evaluate the proposed GBSSCA against existing baseline models, namely the Bayesian game model, EGT based on attack and defense, and QRD-based EGT.
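For clarity, the metrics above can be computed from the confusion-matrix counts as in the short sketch below; the binary labeling convention (1 for the positive/attack class) is our assumption.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-score from binary labels
    (1 = positive class, 0 = negative class)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```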

4.3 Dataset preparation and integration of attack and defense behavior databases

In this section, we discuss the crucial step of preparing the dataset for evaluation by combining the attack behavior database from MIT and the defense behavior database from CNNVD. By merging these two databases, we create a unified dataset, whose distribution is shown in Fig. 3, that enables us to analyze the effectiveness of the proposed GBSSCA. Each record within the dataset represents a specific behavior observed in cyber attacks and defense mechanisms. This integration allows us to assess the alignment between the proposed approach and the documented attack and defense strategies, ultimately evaluating the GBSSCA’s ability to enhance the overall security and resilience of the supply chain in the face of cyber threats.
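A minimal sketch of this merging step is shown below; the file names, column names, and labels are hypothetical placeholders, since the exact schemas of the MIT and CNNVD records are not detailed here.

```python
import pandas as pd

# Illustrative merge of the two behavior databases into one labeled dataset.
attack_df = pd.read_csv("mit_attack_behaviors.csv")      # hypothetical export of MIT records
defense_df = pd.read_csv("cnnvd_defense_behaviors.csv")  # hypothetical export of CNNVD records

attack_df["behavior_class"] = "attack"
defense_df["behavior_class"] = "defense"

dataset = pd.concat([attack_df, defense_df], ignore_index=True)
dataset = dataset.sample(frac=1.0, random_state=42).reset_index(drop=True)  # shuffle records
dataset.to_csv("combined_behavior_dataset.csv", index=False)
```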

Fig. 3 Distribution of attack and defense behavior databases

4.4 Revenue values for different attack and defense strategies

The revenue outcomes associated with attack and defense strategies, denoted as \({AM}^{k}\) (attack strategy revenues) and \({DM}^{k}\) (defense strategy revenues), are crucial in understanding the financial prospects of various strategy pairs in our game model. By delving into these revenue figures, we obtain a nuanced picture of how profitable different combinations of attack and defense maneuvers could be. Take, for instance, a scenario where the defender opts for Defense Strategy 1 (DS1) and the attacker goes for Attack Strategy 1 (AS1); this pairing is predicted to bring in a revenue of 2. In a different combination, where Defense Strategy 2 (DS2) is employed by the defender against Attack Strategy 2 (AS2) from the attacker, the anticipated revenue remains at 2. However, a more lucrative outcome is observed when Defense Strategy 3 (DS3) and Attack Strategy 3 (AS3) are chosen by the defender and attacker, respectively, leading to an increased expected revenue of 4. Such insights into revenue values are vital for decision-makers, as they provide a financial lens through which to assess the effectiveness of various attack and defense strategies. This analysis not only helps in recognizing which strategy pairs are most financially beneficial but also aids in strategizing for maximum revenue generation, which is crucial for selecting optimal defense strategies that align with network security objectives. By considering the revenue values, decision-makers can make informed choices and improve the effectiveness of their defense strategies to mitigate potential risks and achieve desired financial outcomes (Fig. 4).

Fig. 4 Revenue values for different attack and defense strategies

4.5 Baseline comparison

The proposed GBSSCA model exhibits superior efficiency when compared to the other models. It achieves the highest accuracy of 0.89, surpassing the Bayesian model (accuracy: 0.8), the EGT-Attack and Defense model (accuracy: 0.84), and the QRD-EGT model (accuracy: 0.86). Furthermore, the precision of the proposed GBSSCA model is 0.91, outperforming the Bayesian model (0.82), the EGT-Attack and Defense model (0.86), and the QRD-EGT model (0.88). This implies that the proposed GBSSCA model has a higher proportion of correctly predicted positive samples compared to the other models. Moreover, the recall of the proposed GBSSCA model is 0.96, outshining the Bayesian model (0.86), the EGT-Attack and Defense model (0.89), and the QRD-EGT model (0.91). This indicates a higher proportion of correctly predicted positive samples out of all actual positive samples for the proposed GBSSCA model. Overall, the proposed GBSSCA model demonstrates its superior efficiency by achieving higher accuracy, precision, and recall values, thus effectively identifying both attack and defense classes (Fig. 5).

Fig. 5 Overall percentage achieved by the models

4.6 Mean squared error rate (MSE)

When comparing the performance of the proposed GBSSCA model with the other models based on the Mean Squared Error (MSE) rate, the proposed GBSSCA model outperforms the other models. Its lower MSE rate indicates a smaller average squared difference between its predicted values and the actual values. This suggests that the proposed GBSSCA model provides more accurate predictions and has a better fit to the data compared to the Bayesian, EGT-Attack and Defense, and QRD based EGT models (Fig. 6).
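For reference, the MSE used here follows the standard definition over \(N\) predictions \({\widehat{y}}_{i}\) and ground-truth values \({y}_{i}\) (our addition for completeness):

$$MSE=\frac{1}{N}\sum _{i=1}^{N}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}$$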

Fig. 6 Reducing error rate

4.7 Efficiency of proposed GBSSCA

Figure 7 illustrates the effectiveness of the proposed GBSSCA through the evaluation of revenue improvement, defense effectiveness, and stability analysis.

First, the figure shows the revenue improvement achieved by the proposed architecture compared to baseline or existing approaches. The lines representing “Baseline Revenue” and “Proposed Revenue” show the revenue values obtained from traditional supply chain models or game theory approaches and from the proposed GBSSCA, respectively. If the line representing “Proposed Revenue” consistently stays above the line representing “Baseline Revenue,” it indicates that the proposed GBSSCA leads to higher revenue generation, demonstrating its effectiveness in improving the financial outcomes of the supply chain system.

Second, the figure captures the effectiveness of the defense strategies selected by the Reinforcement Learning Model (RLM) and the QRD calculations. The lines labeled “Reduction in Attack Revenue” and “Increase in Defense Revenue” represent the reduction in attack revenue and the increase in defense revenue, respectively, achieved by the defense strategies employed in the proposed GBSSCA. If these lines consistently show a decreasing trend for attack revenue and an increasing trend for defense revenue, it indicates that the proposed GBSSCA effectively reduces the financial losses caused by attacks and enhances the defensive capabilities of the supply chain system.

Third, the figure assesses the stability of the system through the lines labeled “Stable Stages” and “Unstable Stages.” These lines represent the number of stages in which the strategies reach a stable state (Nash equilibrium) (Li et al., 2015) and the number of stages in which the strategies are still unstable and undergoing significant changes, respectively. If the line for “Stable Stages” consistently shows higher values and the line for “Unstable Stages” remains close to zero, it indicates that the proposed GBSSCA is effective in maintaining a stable equilibrium state, minimizing disruptive changes and promoting consistent performance throughout the supply chain system.

Fig. 7 Efficiency of proposed GBSSCA in terms of revenue improvement, defense effectiveness and stability analysis

5 Conclusion

The paper explores the challenges of trade management in the digital era, focusing on the intricate networks of partners and traders in the e-commerce platform. It emphasizes the significance of security, integrity, and trust in this environment, acknowledging the vulnerability of Industrial Internet of Things (IIoT) systems to cybersecurity threats that compromise sensitive information, industrial controls, and product integrity. To tackle these challenges, the paper proposes a GBSSCA specifically tailored for multi-factory environments, leveraging an Evolutionary Game Theory (EGT) approach. The architecture aims to ensure privacy and material provenance and to enable machine-led maintenance. A key component is the introduction of a fairness concept through a subsystem that imposes taxes on supply chain nodes, fostering ethical conduct and discipline among participants. Drawing inspiration from the ultimatum game, the proposed solution employs a game mechanism that progressively penalizes malicious actors, discourages fraudulent activities, enhances compensation systems, and reduces collusion. By integrating a game-theoretic model, the paper strives to cultivate fairness and accountability among supply chain participants. The proposed tool capitalizes on the incentive structure of the supply chain to establish a secure and trustworthy environment for industrial collaboration, addressing the pressing need for increased trust among partners in industrial applications.