1 Introduction

With the advancement of Internet of Things (IoT) applications in all fields, big data volumes have increased, as has the use of many applications in cloud computing environments (Nguyen et al. 2019; Ghasempour 2019; Fu et al. 2018). At the same time, as the amount of data and information from users and sensors grows exponentially, data is stored in different geographic locations in fog computing (Lin et al. 2016; Liu et al. 2016; Abualigah and Diabat 2020). Cloud computing is one of the best solutions for placing and recalling data across different geographic locations while reducing user waiting time. Cloud computing has developed its resources to enable users to access various services such as servers, storage, and sensors (Wang et al. 2018; Yang et al. 2020). With the development of cloud computing together with the IoT, huge amounts of heterogeneous data are generated that require processing and storage across different geographical locations. User waiting time in cloud computing depends on load balance, bandwidth, and network performance, so the response time experienced by IoT users is affected by delays across nodes and benefits from reduced bandwidth demand (John and Mirnalinee 2020; Pallewatta et al. 2022; Wang and Zhang 2020; Taghizadeh et al. 2021; Torabi et al. 2022; Jin et al. 2022).

Fog computing extends cloud services to the edge of the IoT network, offering benefits such as high performance, reduced response time, lower bandwidth demand, and load balancing (Liu et al. 2022; Li et al. 2022a; Yousif et al. 2022). Fog computing supports the IoT and connects sensors to cloud environments; it also provides users with services such as virtual machines and scalable storage (Peake et al. 2022; Haris and Zubair 2021; Khemili et al. 2022). The cloud computing infrastructure model manages cloud resources efficiently and effectively, enabling users to consume the available resources on a pay-per-use basis. Cloud computing is widely used for big data processing and complex problems in different environments (Maheshwari et al. 2012; Long et al. 2014; Boru et al. 2015). Cloud computing that supports the IoT offers a promising way to identify and establish replicas across nodes in different geographic locations. Data replication is one of the most widely used methods for data availability, reliability, and distribution across nodes; it also allows files and their replicas to be accessed at the lowest cost over the shortest path (Ebadi and Navimipour 2018; Chuang and Hsiang 2022). Data replication reduces data transfer, makes data and files accessible in less time and over shorter distances, and reduces user waiting time (Failed 2013). Replication techniques are either static or dynamic in how they define, establish, and place replicas across nodes. Static replication is configured when the system is set up and is not affected by momentary changes such as modifying, deleting, or storing data in the database, whereas dynamic replication adapts to data access, storage, deletion, and availability (Grami 2022). Three critical issues remain: (1) which data should be replicated? (2) when should the data be replicated? and (3) where should the new replicas be placed? These three main open questions must be tackled for data replication in cloud environments (Salem et al. 2020; Awad et al. 2021a, b).

The major contributions of the article are as follows:

  • A new hybrid swarm intelligence algorithm, AOASSA, is proposed as a strategy for dynamic data replication of IoT application tasks in fog computing. It integrates the Arithmetic Optimization Algorithm (AOA) and the Salp Swarm Algorithm (SSA) with multi-objective optimization (MOO) to solve the least-cost-path problem in fog computing.

  • To the best of our knowledge, few papers handle data transmission and the least-cost path for optimal placement of file replicas with MOO.

  • To the best of our knowledge, our work is the first attempt to study a hybrid of AOA and SSA (AOASSA) to obtain low bandwidth usage, reduced data transmission, and the least-cost path.

  • Applying global optimization with AOASSA gives better results when the experimental results are compared with MOE and MORM.

  • AOASSA minimizes computational complexity and works efficiently for data transmission, distance, and replica placement problems.

  • The experimental results show the superiority of the AOASSA algorithm over other algorithms in terms of cost, time, bandwidth, data transmission, and least-cost path.

The rest of this paper is organized as follows. Section 2 presents related work. Section 3 presents the proposed architecture. Section 4 presents the proposed strategy. Section 5 presents the experimental results. Finally, Sect. 6 presents the conclusion and future work.

2 Related work

Many related studies have researched data replication strategies in the cloud, as follows:

  • Sarwar et al. (2022) proposed two cross-node replication privacy schemes for data protection, authentication, and reliability. The schemes select and place data replicas across nodes while maintaining privacy and confidentiality in fog computing. The proposed algorithm outperformed other algorithms regarding memory, cost, confidentiality, and privacy.

  • Chen et al. (2021) proposed BOSSA, the first decentralized system to demonstrate data retrieval and repeatability, compatible with all parties on blockchain platforms. BOSSA incorporates privacy-enhancing technologies to prevent decentralized peers (including blockchain nodes) from inferring private information about external data. The authors present a security analysis covering integrity, privacy, and reliability, and implement a BOSSA-based prototype on the Ethereum blockchain; their extensive evaluations demonstrate the practicality of the proposal.

  • Li et al. (2022b) proposed a Lagrangian relaxation method that considers load balancing, storage, data dependency, data transmission, time, cost, and bandwidth to minimize data transmission time between nodes. A fault-tolerant task scheduling strategy aimed at cloudlets is also proposed, which optimizes task scheduling by considering task time, cost, and energy. The experiments demonstrated the performance of the proposed strategy in transferring data through cloud computing and choosing the optimal location.

  • Shiet et al. (2022) developed a new approach named Multi-Cloud Application deployment (MCApp). MCApp merges iterative mixed-integer linear programming with a domain-tailored large-scale node search to optimize data replica deployment and user requests. Experiments on real datasets demonstrate that MCApp significantly outperforms other algorithms.

  • Majed et al. (2022) presented a hybrid strategy for peer-to-peer data replication in cloud computing environments. It chooses the most suitable nodes in the network at low cost, selects the data files most commonly accessed by users, and puts them in the most appropriate placement. Experimental results showed improved network performance and reduced user waiting time.

  • Li et al. (2022c) suggested an algorithm based on the Lagrangian relaxation method for optimal data replication across nodes in cloud computing, considering load balancing, bandwidth, and transmission time, and using the Floyd algorithm to reduce cost and bandwidth. The results showed the superiority of the proposed algorithm over other algorithms.

  • Khelifa et al. (2022) suggested a dynamic, periodic data replication strategy for cloud computing. The strategy aims to reduce the time needed to satisfy user requirements, achieve load balance, and reduce waiting time while speeding up data access; it also reduces the time to send and transfer data through cloud computing. A fuzzy-logic algorithm selects and places data replicas across nodes. The proposed algorithm proved superior to other algorithms.

  • Mohammadi et al. (2022) suggested an algorithm for selecting and placing data replicas across nodes in cloud computing. A hybrid of fuzzy logic and ant colony optimization reduces user waiting time and discovers the most suitable nodes for replica placement. The proposed algorithm outperformed the other algorithms.

3 Suggested system and discussion

3.1 Proposed system and structure

This section describes the proposed selection and placement of data replicas across nodes in fog computing. The critical component of the proposed strategy is the Fog Broker, located in the fog nodes layer, which consists of three stages: Task Manager, Resource Monitoring Service, and Task Scheduler. The fog computing system relies on our dynamic, IoT-based data replication strategy. Selecting and placing data replicas across nodes requires a set of configurations to move data through fog computing. We assume that the proposed strategy comprises a certain number of fog nodes, such as data centers (DCs), and IoT services, organized over different geographic locations so that replicas can be selected and placed across nodes. Services can be distributed on any DCs, fog nodes, or IoT sensors. The AOA algorithm is used together with the SSA algorithm to transfer data across nodes over the lowest-cost path, and MOO with the Floyd algorithm is used to reduce bandwidth across the network and balance load over the fog nodes. The system in Fig. 1 consists of a set of geographically distributed nodes (G), and the structure is composed of many different locations.

Fig. 1
figure 1

Proposed strategy for data replication in fog computing

3.2 Arithmetic optimization algorithm (AOA)

3.2.1 Initialization phase

AOA applies its optimization process to a set of randomly generated candidate solutions (X), arranged in the matrix of Eq. (1). The best result obtained in each iteration is saved as the current near-optimal solution (Abualigah et al. 2021; Abdollahzadeh et al. 2021; Abdollahzadeh and Gharehchopogh 2022; Mahajan and Pandit 2021; Mahajan et al. 2022a, b).

$$ X = \left[ {\begin{array}{*{20}l} {x_{1}^{1} } \hfill & {x_{2}^{1} } \hfill & \cdots \hfill & {x_{D}^{1} } \hfill \\ {x_{1}^{2} } \hfill & {x_{2}^{2} } \hfill & \cdots \hfill & {x_{D}^{2} } \hfill \\ \vdots \hfill & \ddots \hfill & \ddots \hfill & \vdots \hfill \\ {x_{1}^{N} } \hfill & {x_{2}^{N} } \hfill & \cdots \hfill & {x_{D}^{N} } \hfill \\ \end{array} } \right] $$
(1)

The search phase (exploration or exploitation) is chosen for each iteration based on the Math Optimizer Accelerated (MOA) function computed with Eq. (2).

$$ {\text{MOA}}\left( {{\text{C\_Iter}}} \right) = {\text{Min}} + {\text{C\_Iter}} \times \left( {\frac{{{\text{Max}} - {\text{Min}}}}{{{\text{M\_Iter}}}}} \right) $$
(2)

where MOA(C_Iter) is the value of MOA at the current iteration, C_Iter is the current iteration, which lies in [1, M_Iter], M_Iter is the maximum number of iterations, and Min and Max are the minimum and maximum values of the accelerated function.
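As a concrete illustration, Eq. (2) is a simple linear schedule. A minimal sketch in Python (the bounds 0.2 and 0.9 below are illustrative defaults, not settings taken from this paper):

```python
def moa(c_iter, m_iter, acc_min=0.2, acc_max=0.9):
    """Math Optimizer Accelerated value at the current iteration, Eq. (2).

    acc_min/acc_max are illustrative bounds of the accelerated function;
    the value grows linearly from acc_min toward acc_max over m_iter steps.
    """
    return acc_min + c_iter * ((acc_max - acc_min) / m_iter)
```

A larger MOA makes exploitation more likely, since in AOA the exploration branch is entered only when a random number r1 exceeds the current MOA value.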

3.2.2 Exploration phase

The simplest rules that can simulate the behaviour of the mathematical operators were employed. The position-updating equation for the exploration phase is:

$$ x_{i,j} \left( {{\text{C\_Iter}} + 1} \right) = \left\{ {\begin{array}{*{20}c} {{\text{best}}\left( {x_{j} } \right) \div \left( {{\text{MOP}} + \varepsilon } \right) \times \left( {\left( {{\text{UB}}_{j} - {\text{LB}}_{j} } \right) \times \mu + {\text{LB}}_{j} } \right),} & {r2 < 0.5} \\ {{\text{best}}\left( {x_{j} } \right) \times {\text{MOP}} \times \left( {\left( {{\text{UB}}_{j} - {\text{LB}}_{j} } \right) \times \mu + {\text{LB}}_{j} } \right),} & {{\text{otherwise}}} \\ \end{array} } \right. $$
(3)

where \(x_{i,j}(\text{C\_Iter} + 1)\) indicates the jth position of the ith solution in the next iteration, \(x_{i,j}(\text{C\_Iter})\) indicates the jth position of the ith solution at the current iteration, and best(\(x_{j}\)) is the jth position of the best solution obtained so far. \(\varepsilon\) is a small positive number that prevents division by zero, and µ is a control parameter used to adjust the search process, which is set equal to 0.5.

$$ {\text{MOP}}\left( {{\text{C\_Iter}}} \right) = 1 - \frac{{{\text{C\_Iter}}^{{\left( {\frac{1}{\alpha }} \right)}} }}{{{\text{M\_Iter}}^{{\left( {\frac{1}{\alpha }} \right)}} }} $$
(4)

where MOP is a coefficient, MOP(C_Iter) indicates its value at the current iteration, C_Iter indicates the current iteration, M_Iter indicates the maximum number of iterations, and α is an adjusting parameter that specifies the exploitation accuracy over the iterations; it is set equal to 5.
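Eq. (4) can be transcribed directly; α = 5 follows the text above, everything else is a plain transcription:

```python
def mop(c_iter, m_iter, alpha=5):
    """Math Optimizer Probability coefficient, Eq. (4).

    Decays from near 1 toward 0 as iterations advance; alpha controls
    the exploitation accuracy (set to 5 in the text).
    """
    return 1 - (c_iter ** (1 / alpha)) / (m_iter ** (1 / alpha))
```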

3.2.3 Exploitation phase

This phase is conditioned on the MOA function value: it is entered when r1 is not greater than the current MOA(C_Iter) value.

$$ x_{i,j} \left( {{\text{C\_Iter}} + 1} \right) = \left\{ {\begin{array}{*{20}c} {{\text{best}}\left( {x_{j} } \right) - {\text{MOP}} \times \left( {\left( {{\text{UB}}_{j} - {\text{LB}}_{j} } \right) \times \mu + {\text{LB}}_{j} } \right),} & {r3 < 0.5} \\ {{\text{best}}\left( {x_{j} } \right) + {\text{MOP}} \times \left( {\left( {{\text{UB}}_{j} - {\text{LB}}_{j} } \right) \times \mu + {\text{LB}}_{j} } \right),} & {{\text{otherwise}}} \\ \end{array} } \right. $$
(5)

3.3 Pseudo-code of AOA

The pseudo-code of AOA is reported in Algorithm 1, and its flowchart is shown in Fig. 2.

Fig. 2
figure 2

Flowchart of the original AOA

Algorithm 1 Pseudo-code of the AOA
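The loop described above can be assembled into a compact, runnable sketch from Eqs. (1)-(5). Population size, iteration count, default parameter values, and the bounds clipping are illustrative choices, not the paper's configuration:

```python
import random

def aoa(cost, dim, lb, ub, n=30, m_iter=200,
        acc_min=0.2, acc_max=0.9, alpha=5, mu=0.5, eps=1e-12):
    """Minimal AOA sketch assembled from Eqs. (1)-(5)."""
    # Eq. (1): random initial population X of n candidate solutions
    pop = [[lb + random.random() * (ub - lb) for _ in range(dim)]
           for _ in range(n)]
    best = min(pop, key=cost)[:]
    for t in range(1, m_iter + 1):
        moa = acc_min + t * (acc_max - acc_min) / m_iter       # Eq. (2)
        mop = 1 - t ** (1 / alpha) / m_iter ** (1 / alpha)     # Eq. (4)
        scale = (ub - lb) * mu + lb
        for sol in pop:
            for j in range(dim):
                r1, r2, r3 = [random.random() for _ in range(3)]
                if r1 > moa:      # exploration, Eq. (3): divide or multiply
                    sol[j] = (best[j] / (mop + eps) * scale if r2 < 0.5
                              else best[j] * mop * scale)
                else:             # exploitation, Eq. (5): subtract or add
                    sol[j] = (best[j] - mop * scale if r3 < 0.5
                              else best[j] + mop * scale)
                sol[j] = max(lb, min(ub, sol[j]))              # keep in bounds
            if cost(sol) < cost(best):
                best = sol[:]
    return best
```

For example, `aoa(lambda v: sum(x * x for x in v), dim=3, lb=-10.0, ub=10.0)` searches for the minimum of a sphere function over [-10, 10]^3.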

4 Proposed swarm intelligence for data replication

This section explains the proposed strategy for selecting and placing data replicas on fog nodes. For the IoT-based strategy in fog computing, the shortest path, cost, bandwidth, time, and distance are calculated. iFogSim is used to implement the proposed model.

4.1 Cost and time of replication

Cost is a primary factor in placing data replicas close to users according to the user's budget. Cost differs from user to user, and in the proposed strategy node costs differ from one fog node to another according to placement near the users. The cost can be expressed as follows:

$$ {\text{cost}}\left( {\text{DT}} \right) = \mathop \sum \limits_{y = 1}^{n} {\text{cost}}\left( {\text{dt}}_{z}^{y} \right) $$
(6)

where

$$ {\text{cost}}\left( {\text{dt}}_{z}^{y} \right) = \mathop \sum \limits_{z = 1}^{m} x_{z}^{y} \left( p_{z}^{y} + \frac{{\text{size}}\left( {\text{dt}}^{y} \right)}{b_{z}^{y}} \times t_{\text{cost}} \right) $$
(7)

Here \({\text{DT}}\) is the data set whose cost is computed, \({\text{dt}}_{z}^{y}\) is data replica \(y\) in region \(z\), \({x}_{z}^{y} \in \{0, 1\}\) is a binary placement decision variable, \({p}_{z}^{y}\) is the price of the replica, \({b}_{z}^{y}\) is the network bandwidth between replicas in the region, and \(t_{\text{cost}}\) is the unit transfer cost.
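Eqs. (6)-(7) can be read in code as follows; the dictionary field names (`placed`, `price`, `size`, `bandwidth`) are illustrative encodings of the notation, not names from the paper:

```python
def replica_cost(replicas, tcost):
    """Total cost of a data set's replicas, after Eqs. (6)-(7).

    Each replica contributes its storage price plus its transfer time
    (size / bandwidth) weighted by the unit transfer cost tcost, counted
    only when the binary decision variable marks it as placed.
    """
    total = 0.0
    for r in replicas:
        if r["placed"]:  # x_z^y = 1
            total += r["price"] + (r["size"] / r["bandwidth"]) * tcost
    return total
```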

4.2 Shortest paths problem (SPP) between nodes based on the Floyd algorithm

The Floyd algorithm obtains the shortest path between nodes in fog computing; it is implemented to obtain the weighted length of the shortest path among the DCs. This paper aims to select and place dynamic data replicas across geographically distributed nodes along the shortest, optimal path with respect to data transmission and bandwidth (Cheng et al. 2014). The first weighted adjacency matrix is:

$$ A = \left[ a_{i,j} \right]_{m \times m} $$
(8)

where \(a_{i,j}\) is the weight of the path from node i to node j in the \(m \times m\) matrix.

The state transition equation is:

$$ {\text{map}}\left[ {i, j} \right] := \min \left\{ {{\text{map}}\left[ {i, k} \right] + {\text{map}}\left[ {k, j} \right],\ {\text{map}}\left[ {i, j} \right]} \right\} $$
(9)

map[i, j] denotes the shortest distance from i to j, and k is the intermediate node over which the paths from i to j are relaxed.
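The relaxation in Eq. (9) is the classic Floyd-Warshall recurrence; a direct sketch:

```python
def floyd(dist):
    """All-pairs shortest paths by the Floyd algorithm.

    dist is an m x m matrix; dist[i][j] is the edge weight from node i
    to node j (float('inf') when there is no direct link). Each step
    applies Eq. (9): map[i][j] = min(map[i][k] + map[k][j], map[i][j]).
    """
    m = len(dist)
    d = [row[:] for row in dist]  # do not mutate the input matrix
    for k in range(m):
        for i in range(m):
            for j in range(m):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

On a three-node chain 0-1-2 with weights 4 and 1, the algorithm finds the indirect path 0→1→2 of length 5.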

4.3 Popularity degree of the data file

The popularity of a file is determined by how frequently users access it, especially recently. The files that have been most popular with users in recent times are the ones identified, replicated, and placed among the DCs. The popularity degree can be expressed as:

$$ {\text{PD}}_{i} = {\text{an}}_{i} \times w_{i} $$
(10)

Each file’s replication factor (RFi) is calculated based on the popularity degree as in Eq. (11).

$$R{F}_{i}=\frac{P{D}_{i}}{R{N}_{i}*F{S}_{i}}$$
(11)

The dynamic threshold (TH) value is calculated as in Eq. (12).

$$\mathrm{TH}=\mathrm{min}\left(\left(1-\mathrm{\alpha }\right) \times {\mathrm{RF}}_{\mathrm{system}},\ \underset{\mathrm{k}\in [1, 2, \dots ,\mathrm{l}]}{\mathrm{max}}{\mathrm{RF}}_{\mathrm{k}}\right),\quad \mathrm{\alpha }\in \left[0, 1\right]$$
(12)

Here \(P{D}_{i}\) is the popularity degree, \({an}_{i}\) the number of accesses, \({w}_{i}\) a time-based forgetting factor, \(R{F}_{i}\) the replication factor, \(R{N}_{i}\) the number of replicas, and \(F{S}_{i}\) the size of the data file.
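Eqs. (10)-(12) can be sketched as below; the threshold in particular follows one plausible reading of Eq. (12) (the damped system-wide factor capped by the largest per-file factor), so treat that function as an assumption:

```python
def popularity(accesses, w):
    """Popularity degree PD_i = an_i * w_i, Eq. (10)."""
    return accesses * w

def replication_factor(pd, replicas, size):
    """Replication factor RF_i = PD_i / (RN_i * FS_i), Eq. (11)."""
    return pd / (replicas * size)

def dynamic_threshold(rf_system, rf_per_file, alpha=0.5):
    """Dynamic threshold TH, one plausible reading of Eq. (12):
    min of the damped system factor and the largest per-file factor."""
    return min((1 - alpha) * rf_system, max(rf_per_file))
```

A file is then a replication candidate when its replication factor exceeds the threshold.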

4.3.1 System-level availability

SBER measures the system's overall availability. Users should be able to access all files via their data replication tasks, and the most popular files, being frequently accessed, must remain available. SBER keeps files accessible and popular throughout the entire system. It can be expressed as:

$$\mathrm{SBER}= \frac{\sum_{k=1}^{s}{\mathrm{an}}_{k} \times \left(\sum_{j=1}^{{n}_{k}}{\mathrm{bs}}_{j}\right) \times P\left({\mathrm{FA}}_{k}\right)}{\sum_{k=1}^{s}{\mathrm{an}}_{k} \times \left(\sum_{j=1}^{{n}_{k}}{\mathrm{bs}}_{j}\right)}$$
(13)
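In code, Eq. (13) is an availability-weighted ratio; the dictionary keys below are illustrative names for \(\mathrm{an}_k\), \(\mathrm{bs}_j\), and \(P(\mathrm{FA}_k)\):

```python
def sber(files):
    """System-level availability, Eq. (13): replica block bytes weighted
    by access count and availability probability, over total weighted bytes."""
    num = sum(f["accesses"] * sum(f["block_sizes"]) * f["availability"]
              for f in files)
    den = sum(f["accesses"] * sum(f["block_sizes"]) for f in files)
    return num / den
```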

4.4 Placement of new replicas

Dynamic data replicas are placed between nodes by optimally selecting the minimum distances. When placing data replicas between DCs, the strategy considers the optimal minimum path and low cost for users. The number of new replicas per DC can be expressed as:

$$ {\text{br}}_{k} \left( {{\text{dc}}_{i} } \right) = \left\lfloor {\frac{{{\text{RF}}_{k} \left( {{\text{dc}}_{i} } \right)}}{{\mathop \sum \nolimits_{i = 1}^{s} {\text{RF}}_{k} \left( {{\text{dc}}_{i} } \right)}} \times {\text{br}}_{k} \left( {{\text{add}}} \right)} \right\rfloor $$
(14)
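Eq. (14) distributes the replicas to be added proportionally to each DC's replication factor, rounding down. A direct sketch (function and argument names are illustrative):

```python
import math

def replicas_per_dc(rf_per_dc, replicas_to_add):
    """Number of new replicas br_k(dc_i) assigned to each data center,
    Eq. (14): proportional share of replicas_to_add, rounded down."""
    total = sum(rf_per_dc)
    return [math.floor(rf / total * replicas_to_add) for rf in rf_per_dc]
```

Because of the floor, at most `replicas_to_add` replicas are assigned in total; any remainder can be handled by a follow-up pass.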

4.5 Salp swarm algorithm

The SSA (Mirjalili et al. 2017) is a recent swarm intelligence algorithm inspired by the swarming behaviour of salps in oceans. The swarm consists of two main groups: a leader and followers. The salp at the front of the chain is the leader, and the rest are followers. The fitness value of each salp is calculated, and the best solution found so far serves as the food source that the leader moves toward. The equations can be represented as follows:

1) Leader Phase: The leader position is updated using the following equation:

$$ X_{j}^{1} = \left\{ {\begin{array}{*{20}c} {X_{bj} + c_{1} \left( {\left( {ub_{j} - lb_{j} } \right)c_{2} + lb_{j} } \right),} & {c_{3} > 0.5} \\ {X_{bj} - c_{1} \left( {\left( {ub_{j} - lb_{j} } \right)c_{2} + lb_{j} } \right),} & {{\text{otherwise}}} \\ \end{array} } \right. $$
(15)

where \({c}_{1}\) decreases through the iterations as

$$ c_{1} = 2e^{-\left(\frac{4t}{T}\right)^{2}} $$
(16)

\({X}_{j}^{1}\) is the new position of the leader in dimension \(j\) and \({X}_{bj}\) is the position of the food source (best solution so far); \({c}_{2}\) and \({c}_{3}\) are random variables in [0, 1]; \(u{b}_{j}\) and \(l{b}_{j}\) are the bounds of the search domain in dimension \(j\).

2) Followers Phase: To update the followers' positions, Newton's law of motion is used, which is defined as

$$ {X}_{j}^{i}=\frac{1}{2}g{t}^{2}+{\omega }_{0}t, \quad i \ge 2 $$
(17)

So, the follower update procedure can be formulated as

$${X}_{j}^{i}=\frac{1}{2} \left({X}_{j}^{i}+ {X}_{j}^{i-1}\right)$$
(18)

Here \(t\) is the iteration, \({\omega }_{0}=0\) is the initial speed, and \(g\) is the acceleration.
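A minimal, runnable sketch of the SSA loop from Eqs. (15)-(18); population size, iteration count, and bounds clipping are illustrative choices, not this paper's configuration:

```python
import math
import random

def ssa(cost, dim, lb, ub, n=30, m_iter=200):
    """Minimal salp swarm sketch following Eqs. (15)-(18)."""
    pop = [[lb + random.random() * (ub - lb) for _ in range(dim)]
           for _ in range(n)]
    food = min(pop, key=cost)[:]              # best solution found so far
    for t in range(1, m_iter + 1):
        c1 = 2 * math.exp(-(4 * t / m_iter) ** 2)        # Eq. (16)
        for i, sol in enumerate(pop):
            if i == 0:                        # leader update, Eq. (15)
                for j in range(dim):
                    c2, c3 = random.random(), random.random()
                    step = c1 * ((ub - lb) * c2 + lb)
                    sol[j] = food[j] + step if c3 > 0.5 else food[j] - step
            else:                             # follower update, Eq. (18)
                for j in range(dim):
                    sol[j] = 0.5 * (sol[j] + pop[i - 1][j])
            for j in range(dim):
                sol[j] = max(lb, min(ub, sol[j]))        # keep in bounds
            if cost(sol) < cost(food):
                food = sol[:]
    return food
```

The follower rule averages each salp with its predecessor, so the chain trails the leader toward the food source.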

4.6 Pseudo-code of SSA

The pseudo-code of the SSA is reported in Algorithm 2.

Algorithm 2 Pseudo-code of the SSA

4.7 Mean service time (MST)

MST describes the system's ability to respond quickly to users. Selecting the most popular files reduces user waiting time, load imbalance, and bandwidth demand. The mean service time of file fi can be calculated by:

$${\mathrm{stf}}_{i}=\sum_{j=1}^{m}\left( \mathrm{stf}\left(i,j\right)* \frac{A(i,j)}{A(i)}\right)$$
(19)

The mean service time of the system can be defined as follows:

$$\mathrm{mst}=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m}\left( \varnothing \left(i,j\right)\times\frac{{s}_{i}}{{tp}_{j}}\times \frac{A\left(i,j\right)}{A\left(i\right)}\right)$$
(20)

Here \(\mathrm{stf}\left(i,j\right)\) is the expected service time of file \(i\) on data node \(j\), \(A\left(i,j\right)\) the access rate of read requests for file \(i\) from data node \(j\), \(A\left(i\right)\) the mean access rate of file \(i\), \({s}_{i}\) the size of the file, \({tp}_{j}\) the transfer rate of data node \(j\), and \(\varnothing \left(i,j\right)\) an indicator equal to 1 when a replica of file \(i\) is placed on node \(j\).
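Eq. (20) in code; the argument layout (lists indexed by file and by node) is an illustrative encoding of the notation:

```python
def mean_service_time(file_sizes, node_rates, access, placed):
    """System mean service time, Eq. (20).

    file_sizes[i] = s_i, node_rates[j] = tp_j, access[i][j] = A(i, j),
    and placed[i][j] = 1 when a replica of file i sits on node j
    (the indicator in Eq. (20)).
    """
    n = len(file_sizes)
    total = 0.0
    for i, s in enumerate(file_sizes):
        a_i = sum(access[i])  # A(i): total access rate of file i
        for j, tp in enumerate(node_rates):
            total += placed[i][j] * (s / tp) * (access[i][j] / a_i)
    return total / n
```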

4.8 Computational complexity

The time complexity of the proposed AOASSA strategy is calculated over the tasks for m data replicas and num_DCs data centers, combining AOA with SSA. Suppose N represents the population size, D the number of objectives, T the number of iterations, and CoF the cost of one fitness evaluation. The SSA component has a complexity of O(T (D * N + CoF * N)). Based on the phases of the AOASSA strategy, each iteration costs O(N); hence, the total time complexity of AOASSA is O(N * T * C), which is linear in the population size N for fixed T and C.

Algorithm 3 Pseudo-code of the proposed AOASSA strategy

5 Experimental evaluation

5.1 Configuration details

AOASSA selects and places dynamic data replicas between nodes. The proposed strategy has been implemented in iFogSim. This section discusses the settings, experimental results, and fog computing configuration of the proposed strategy in detail (Table 1).

Table 1 Parameters and data sets of the system

5.2 Results and discussion

5.2.1 Selecting the optimal data file

In Fig. 3, 100 to 5000 cloudlets are created to access data replicas across nodes. The proposed algorithm achieved a lower-cost path than the other algorithms in terms of accessing nodes over the shortest distance in the fog cloud.

Fig. 3
figure 3

Cost number of tasks

In Fig. 4, 1000 to 5000 tasks are created to reach data replicas over the minimum path at low cost. The proposed strategy responded to users faster than the other algorithms.

Fig. 4
figure 4

Mean service of time for tasks

In Fig. 5, 50 to 300 data replicas are created and placed over the minimum distance and cost. The mean service time for selecting and placing data replicas across nodes is lower under the proposed strategy than under the other algorithms.

Fig. 5
figure 5

Mean service of time for file

In Fig. 6, 50 to 300 data replicas are created to speed up file access across nodes in cloud computing. The proposed strategy accessed files and replicated data across nodes faster than the other algorithms.

Fig. 6
figure 6

Execution time for file

Figure 7 shows the rate of data file transfer across nodes. As shown in panels a, b, and c, the number of nodes is varied while files are transferred over the lowest-cost path. The proposed strategy proved superior to the other algorithms.

Fig. 7
figure 7

Impact of data replication on that transmission nodes

Figure 8 shows the rate of data file transfer across nodes. As shown in panels a, b, and c, the number of tasks is varied while files are transferred over the least-cost path. The proposed strategy proved superior to the other algorithms.

Fig. 8
figure 8

Impact of data replication on that transmission tasks

5.3 Performance evaluation

5.3.1 Degree of balancing

Figure 9 shows the degree of imbalance over the network nodes when executing different numbers of cloudlets at different times. The proposed strategy keeps the degree of imbalance at a low level.

Fig. 9
figure 9

Degree of imbalance

Figure 10 shows the standard deviation of load balancing over the fog nodes when executing different numbers of tasks at different times. The proposed strategy keeps this standard deviation at a low level.

Fig. 10
figure 10

Standard of load balancing

Figure 11 shows the throughput over the task nodes when executing different numbers of tasks at different times. The proposed strategy achieves high throughput.

Fig. 11
figure 11

Throughput

6 Conclusion and future work

This article proposes a novel hybrid metaheuristic algorithm for selecting and placing data replicas in IoT-based fog computing. AOASSA hybridizes AOA with SSA to improve data transmission and the least-cost path between nodes in fog computing. Our proposed strategy was compared with other algorithms in fog computing; it enhances the selection and placement of data replicas as well as the choice of least-cost path, throughput, and load-balancing standard deviation. The results demonstrated the superiority of the AOASSA strategy over other algorithms in terms of data transmission, load balancing, and distance. In future work, we plan to improve our strategy with modern metaheuristic algorithms; to further reduce the least-cost path, bandwidth, and cost of data replication; and to evaluate other priorities for the new algorithm, such as bandwidth, fault tolerance, and QoS enhancement.