
1 Introduction

In recent years, the explosive growth of mobile applications and the growing need for low-latency, high-bandwidth services have placed significant strain on traditional cloud-centric network infrastructure [1, 2]. To tackle these challenges, edge computing has emerged as a promising paradigm that brings computing and storage capabilities closer to end-users. This proximity allows for lower latency, reduced network congestion and enhanced quality of service (QoS) [3,4,5].

Edge service caching is a critical component of edge computing that contributes significantly to the improvement of mobile application performance [3]. It involves the strategic storage of frequently accessed data and services on edge servers, referred to as Fog Access Points (FAPs). The purpose of this caching strategy is to reduce delay and alleviate the workload on central cloud data centers. Edge service caching enables faster access to content and services, especially for latency-sensitive applications such as augmented reality, video streaming and real-time data processing.

However, several challenges in this direction are yet to be properly addressed. Firstly, FAPs have limited computational and storage resources, so only a small number of services can be cached, which keeps the hit rate low. Secondly, in a highly dynamic and volatile edge environment, static caching strategies are often inadequate for meeting changing needs; how to make run-time caching decisions according to time-varying service popularity and user needs remains difficult. Finally, existing methods in this direction usually aim to optimize caching performance in terms of hit rate and delivery latency, while how to guarantee the profit of service providers is less studied.

In this paper, we propose a novel caching method that leverages a federated learning model for popularity prediction and a deep reinforcement learning model for yielding caching decisions (FPDRD). The FPDRD method takes both global service popularity and local user preferences as inputs and achieves reasonable tradeoffs between caching performance and provider profit. Extensive simulations are conducted on a well-known dataset, MovieLens. Numerical results clearly indicate that our proposed method outperforms its peers.

The paper is organized as follows: Section 2 provides a literature review. Section 3 presents the system model and the problem formulation. Section 4 describes the proposed method. Section 5 presents the empirical analysis. Section 6 concludes the paper.

2 Related Work

Task offloading and caching in mobile edge computing (MEC) have gained significant attention in recent years as a means to alleviate the resource constraints faced by FAPs. Gao et al. [7] presented a method that combines task offloading scheduling and resource allocation to minimize task delay and energy consumption. Liu et al. [8] proposed an approach that utilizes online computation offloading and resource scheduling to tackle the challenges arising from user mobility and network dynamics.

Due to resource and energy constraints, FAPs can usually cache only a limited number of services [9]. Caching highly popular services in FAPs has therefore emerged as an effective solution when caching capacity is limited. Zhong et al. [10] proposed the Cocktail Edge Caching method, which utilizes an ensemble learning algorithm to predict the popularity of services. However, this approach only takes into account overall service popularity while neglecting the preferences of local users. In contrast, Li et al. [11] proposed a service caching method that considers hit actions and user perception preferences. However, this approach raises concerns regarding user privacy and security, as it shares all users’ personal information for prediction purposes.

Recently, cooperative service caching mechanisms have also been proposed to exploit multiple cache nodes by making them work together [12,13,14,15]. The cooperative service caching problem can be formulated as a Mixed-Integer Nonlinear Programming (MINLP) problem, which is known to be NP-hard [16, 17]. Wu et al. [18] and Xu et al. [19] proposed coalition formation algorithms that utilize a hedonic game among cooperative service providers to optimize both the overall profit of the coalition and the average profit of each individual participant. Li et al. [20] proposed a DRL-based profit-driven cooperative service placement method in MEC.

3 System Models and Problem Formulation

Here, we consider a service caching system in MEC represented as \(\mathcal {G}=(\mathcal {CCS} \cup \mathcal {N} \cup \mathcal {U} \cup \mathcal {L})\), as illustrated in Fig. 1. This system consists of a central cloud server (CCS), multiple fog access points (FAPs) denoted as \(\mathcal {N}=\{\mathcal {N}_{1},\mathcal {N}_{2},...,\mathcal {N}_{n}\}\), multiple users denoted as \(\mathcal {U}=\{\mathcal {U}_{1},\mathcal {U}_{2},...,\mathcal {U}_{u}\}\) and a set of service types denoted as \(\mathcal {L}=\{\mathcal {L}_{1},\mathcal {L}_{2},...,\mathcal {L}_{l}\}\). Each FAP is operated by a network edge service provider and is equipped with storage, computing and communication capabilities. These FAPs provide services to users under a billing mechanism.

Fig. 1. System model.

3.1 Caching Model

Due to capacity and storage limits, FAPs cache only a subset of application services, and the cached applications require periodic updates and replacement. The caching decision is represented by a binary variable \(x_{n,l}\), which can be expressed as:

$$\begin{aligned} x_{n,l}={\left\{ \begin{array}{ll}1,&\text {if service }\mathcal {L}_{l}\text { is cached in FAP }\mathcal {N}_{n}, \\ 0,&\text {otherwise.}\end{array}\right. } \end{aligned}$$
(1)

The constraint on the storage space at the FAP \(\mathcal {N}_{n}\) is:

$$\begin{aligned} \sum _{l\in \mathcal {L}} x_{n,l}\omega _{l}\le \Omega _{n} \end{aligned}$$
(2)

where \(\omega _{l}\) represents the storage capacity required for service \(\mathcal {L}_{l}\) and \(\Omega _{n}\) the total cache capacity of FAP \(\mathcal {N}_{n}\).
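For concreteness, the following minimal Python sketch (all names hypothetical) checks a candidate caching decision against constraint (2):

```python
def satisfies_capacity(x, omega, Omega_n):
    """Constraint (2): the total size of cached services must not
    exceed the cache capacity of FAP N_n."""
    return sum(omega[l] for l, cached in x.items() if cached) <= Omega_n

# Example: cache services 1 and 3 (sizes 40 and 30) within a capacity of 100.
x = {1: 1, 2: 0, 3: 1}           # binary decisions x_{n,l}
omega = {1: 40, 2: 70, 3: 30}    # storage requirements w_l
assert satisfies_capacity(x, omega, Omega_n=100)   # 40 + 30 <= 100
```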

Users are charged for the use of service \(\mathcal {L}_{l}\), and the charge is proportional to the usage time \(t_{l}\):

$$\begin{aligned} F_{l}=t_{l}\cdot p_{l} \end{aligned}$$
(3)

where \(p_{l}\) represents the price charged by the service provider for executing service \(\mathcal {L}_{l}\) per unit of time.

3.2 Computation and Communication Model

We adopt a collaborative service caching mechanism (CSCM) [21]. When FAP \(\mathcal {N}_{n}\) receives a service request from a user, it first checks its local cache to determine whether service \(\mathcal {L}_{l}\) is cached. If the service is cached locally, FAP \(\mathcal {N}_{n}\) handles the user’s request directly. Otherwise, FAP \(\mathcal {N}_{n}\) seeks the cached service from neighboring FAPs; when multiple FAPs cache service \(\mathcal {L}_{l}\), FAP \(\mathcal {N}_{n}\) turns to the FAP \(\mathcal {N}_{m}\) with the lowest transmission cost. When no FAP caches the required service \(\mathcal {L}_{l}\), the request is forwarded to the CCS. The cost \(C_{n,l}^{x}\) incurred by FAP \(\mathcal {N}_{n}\) when processing service \(\mathcal {L}_{l}\) through server x is:

$$\begin{aligned} C_{n,l}^{x}=\beta _{1}\cdot L_{n,l}^{x}+\beta _{2}\cdot E_{n,l}^{x} \end{aligned}$$
(4)

where \(L_{n,l}^{x}\) and \(E_{n,l}^{x}\) denote the delay and energy consumption when FAP \(\mathcal {N}_{n}\) processes service \(\mathcal {L}_{l}\) through server x, respectively. \(\beta _{1}\) and \(\beta _{2}\) indicate the economic factors associated with the delay and energy consumption, respectively.
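The CSCM routing logic and the cost of Eq. (4) can be sketched in Python as follows; the function and variable names are our own illustration rather than part of the original method:

```python
BETA1, BETA2 = 0.5, 0.5   # economic weights for delay and energy (assumed values)

def cost(delay, energy):
    """Eq. (4): C = beta1 * L + beta2 * E."""
    return BETA1 * delay + BETA2 * energy

def select_server(n, l, cache, neighbor_metrics):
    """CSCM routing: serve locally if cached; otherwise pick the cheapest
    neighboring FAP that caches service l; fall back to the CCS.

    cache[s]            -- set of services cached on server s
    neighbor_metrics[m] -- (delay, energy) of FAP m serving (n, l)
    """
    if l in cache[n]:
        return n                                   # local cache hit
    candidates = [m for m in neighbor_metrics if l in cache[m]]
    if candidates:
        return min(candidates, key=lambda m: cost(*neighbor_metrics[m]))
    return "ccs"                                   # no FAP caches the service

# Example: FAP 0 lacks service 7; FAP 2 serves it more cheaply than FAP 1.
cache = {0: {1, 3}, 1: {7}, 2: {7, 5}}
metrics = {1: (0.9, 0.6), 2: (0.4, 0.3)}
assert select_server(0, 7, cache, metrics) == 2
```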

If service \(\mathcal {L}_{l}\) is cached locally, FAP \(\mathcal {N}_{n}\) processes the user’s request directly by placing the service on the queue thread with the shortest waiting time. In this case, the local delay and energy consumption for service \(\mathcal {L}_{l}\) are:

$$\begin{aligned} &L_{n,l}^{local}=(\sum _{o\in q_{n}}{\zeta _{o}+\zeta _{l}})\cdot t_{c}+L_{base}\end{aligned}$$
(5a)
$$\begin{aligned} &E_{n,l}^{local}=\zeta _{l}f_{n}^{2}\epsilon _{n}+E_{base} \end{aligned}$$
(5b)

where \(q_{n}\) represents the set of tasks in the queue with the shortest waiting time on FAP \(\mathcal {N}_{n}\), \(\zeta _{l}\) the computational workload of service \(\mathcal {L}_{l}\), \(t_{c}\) the unit processing time of a task, \(f_{n}\) the computing capacity of FAP \(\mathcal {N}_{n}\), \(\epsilon _{n}\) the energy coefficient of FAP \(\mathcal {N}_{n}\), and \(L_{base}\) and \(E_{base}\) the baseline delay and energy consumption, respectively.

If the service is not cached locally, FAP \(\mathcal {N}_{n}\) turns to other servers. According to Shannon’s formula, the transmission rate between servers i and j is:

$$\begin{aligned} r_{i,j}=B_i\log _2\left( 1+\frac{P_i|h_i|^2}{\sigma ^{2}}\right) \end{aligned}$$
(6)

where \(B_{i}\) represents the channel bandwidth, \(P_{i}\) the transmission power, \(|h_i|^2\) the channel gain and \(\sigma ^{2}\) the variance of the additive white Gaussian noise (AWGN).

If one or more neighboring FAPs cache service \(\mathcal {L}_{l}\), FAP \(\mathcal {N}_{n}\) chooses the FAP \(\mathcal {N}_{m}\) with the lowest cost to process the user’s request. In this case, the delay and energy consumption for executing service \(\mathcal {L}_{l}\) are:

$$\begin{aligned} &L_{n,l}^{m}=(\sum _{o\in q_{m}}{\zeta _{o}+\zeta _{l}})\cdot t_{c}+\frac{d_{m,n}^{l}}{r_{m,n}}+L_{base}\end{aligned}$$
(7a)
$$\begin{aligned} &E_{n,l}^{m}=\zeta _{l}f_{m}^{2}\epsilon _{m}+P_{m}\frac{d_{m,n}^{l}}{r_{m,n}}+E_{base} \end{aligned}$$
(7b)

where \(d_{m,n}^{l}\) represents the bit size of the computed result of service request l, which is processed by FAP \(\mathcal {N}_{m}\) and forwarded to FAP \(\mathcal {N}_{n}\).

If neither FAP \(\mathcal {N}_{n}\) nor its neighboring FAPs cache service \(\mathcal {L}_{l}\), the request is offloaded to the CCS. In this case, the delay and energy consumption for executing service \(\mathcal {L}_{l}\) are:

$$\begin{aligned} &L_{n,l}^{ccs}=\zeta _{l}\cdot t_{c}+\frac{d_{ccs,n}^{l}}{r_{ccs,n}}+L_{base}\end{aligned}$$
(8a)
$$\begin{aligned} &E_{n,l}^{ccs}=\zeta _{l}f_{ccs}^{2}\epsilon _{ccs}+P_{ccs}\frac{d_{ccs,n}^{l} }{r_{ccs,n}}+E_{base} \end{aligned}$$
(8b)

For each service request, the profit obtained by FAP \(\mathcal {N}_{n}\) is the gap between the request’s fee and the total cost incurred. The overall cost comprises the cost \(C_{x,l}\) of caching the service on server x, the cost of collaboration \(C_{n,l}^{x}\) with other servers and the equipment-related baseline cost \(C_{base}\). Thus, the profit of a service request is:

$$\begin{aligned} &V_{n,l}^{local}=F_{l}-C_{n,l}-C_{n,l}^{\mathcal {N}_{n}}-C_{base}\end{aligned}$$
(9a)
$$\begin{aligned} &V_{n,l}^{m}=F_{l}-C_{m,l}-C_{n,l}^{\mathcal {N}_{m}}-C_{base}\end{aligned}$$
(9b)
$$\begin{aligned} &V_{n,l}^{ccs}=F_{l}-C_{ccs,l}-C_{n,l}^{CCS}-C_{base} \end{aligned}$$
(9c)

Consequently, the total profit earned by FAP \(\mathcal {N}_{n}\) is:

$$\begin{aligned} V_{n}=\sum _{l\in \mathcal {L}}\left( P_{n,l}^{local}\cdot V_{n,l}^{local}+ P_{n,l}^{\mathcal {N}_{m}}\cdot V_{n,l}^{m}+ P_{n,l}^{ccs}\cdot V_{n,l}^{ccs}\right) \end{aligned}$$
(10)

where \(P_{n,l}^{local}\), \(P_{n,l}^{\mathcal {N}_{m}}\) and \(P_{n,l}^{ccs}\) are binary variables indicating whether FAP \(\mathcal {N}_{n}\) handles service request \(\mathcal {L}_{l}\) via local execution, with assistance from FAP \(\mathcal {N}_{m}\) or via the CCS, respectively.
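Under the cost terms defined above, Eqs. (9)–(10) amount to the following simple accounting (a sketch with hypothetical names):

```python
def request_profit(fee, caching_cost, collab_cost, base_cost):
    """Eqs. (9a)-(9c): per-request profit = fee minus caching,
    collaboration and baseline costs."""
    return fee - caching_cost - collab_cost - base_cost

def total_profit(handled_requests):
    """Eq. (10): since exactly one execution-mode indicator P equals 1
    per request, the sum reduces to the realized profits."""
    return sum(request_profit(*r) for r in handled_requests)

# Example: one local and one neighbor-assisted request.
handled = [(5.0, 1.0, 0.8, 0.2),   # fee F_l, C_{x,l}, C_{n,l}^x, C_base
           (5.0, 1.2, 1.5, 0.2)]
print(total_profit(handled))       # (5-1-0.8-0.2) + (5-1.2-1.5-0.2) = 5.1
```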

Additionally, the QoS of users \(\overline{U_{l}}\) is determined by the average delay \(\overline{L_{l}}\) and average fee \(\overline{F_{l}}\) of executing service \(\mathcal {L}_{l}\):

$$\begin{aligned} \overline{U_{l}}=\eta _{l}\frac{\overline{L_{l}}-L_{min}}{L_{max}-L_{min}}+\eta _{f}\frac{\overline{F_{l}}-F_{min}}{F_{max}-F_{min}} \end{aligned}$$
(11)

where \(\eta _{l}\) and \(\eta _{f}\) represent the impact factors of delay and price on the QoS of users, respectively. \(L_{max}\), \(F_{max}\) and \(L_{min}\), \(F_{min}\) represent the maximum and minimum of delay and fee for executing service \(\mathcal {L}_{l}\), respectively.
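As an illustration of Eq. (11), the following sketch computes the normalized QoS score (the parameter values are assumed):

```python
def qos(avg_delay, avg_fee, L_min, L_max, F_min, F_max, eta_l=0.5, eta_f=0.5):
    """Eq. (11): min-max normalized combination of average delay and fee;
    smaller values correspond to lower delay and fee."""
    return (eta_l * (avg_delay - L_min) / (L_max - L_min)
            + eta_f * (avg_fee - F_min) / (F_max - F_min))

# Example: delay halfway through its range, fee at its minimum.
print(qos(avg_delay=50, avg_fee=1.0, L_min=0, L_max=100, F_min=1.0, F_max=3.0))
# 0.5 * 0.5 + 0.5 * 0.0 = 0.25
```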

3.3 Problem Formulation

Based on the system model given above, we are interested in maximizing the profit of service providers under caching capacity constraints. According to (10), the profit of FAP \(\mathcal {N}_{n}\) is determined by the local cache hit rate \(P_{n,l}^{local}\) and the collaborative cache hit rate \(P_{n,l}^{\mathcal {N}_{m}}\). The resulting optimization formulation is:

$$\begin{aligned} \textbf{P}:\ \ \max \sum _{t\in T}\sum _{\mathcal {N}_n\in \mathcal {N}}{{V_{n}}} \end{aligned}$$
(12)
$$\begin{aligned} {\textbf {s.t.}}\quad &\textbf{C1}:\ \overline{U_{l}}<U_{min},\ \forall \mathcal {L}_{l}\in \mathcal {L}\end{aligned}$$
(13a)
$$\begin{aligned} &\textbf{C2}:\ x_{n,l}\in \{0,1\},\ \forall \mathcal {N}_{n}\in \mathcal {N},\forall \mathcal {L}_{l}\in \mathcal {L}\end{aligned}$$
(13b)
$$\begin{aligned} &\textbf{C3}:\ \sum _{l\in \mathcal {L}} x_{n,l}\omega _{l}\le \Omega _{n},\ \forall \mathcal {N}_{n} \in \mathcal {N}\end{aligned}$$
(13c)
$$\begin{aligned} &\textbf{C4}:\ P_{n,l}^{local},P_{n,l}^{\mathcal {N}_{m}},P_{n,l}^{ccs}\in \{0,1\},\ \forall \mathcal {N}_{n},\mathcal {N}_{m} \in \mathcal {N},\forall \mathcal {L}_{l}\in \mathcal {L}\end{aligned}$$
(13d)
$$\begin{aligned} &\textbf{C5}:\ P_{n,l}^{local}+P_{n,l}^{\mathcal {N}_{m}}+P_{n,l}^{ccs}=1\end{aligned}$$
(13e)
$$\begin{aligned} &\textbf{C6}:\ V_{n}\ge 0,\ \forall \mathcal {N}_{n} \in \mathcal {N}\end{aligned}$$
(13f)
$$\begin{aligned} &\textbf{C7}:\ Num_{n,l}\le 1 \end{aligned}$$
(13g)

Constraint (13a) ensures that the QoS of \(\mathcal {L}_{l}\) is bounded. Constraints (13b) and (13c) limit the total storage capacity of services cached on FAP \(\mathcal {N}_{n}\). Constraints (13d) and (13e) ensure that each service request is handled in exactly one execution mode. Constraint (13f) guarantees that the profit of each FAP \(\mathcal {N}_{n}\) is non-negative. Constraint (13g) indicates that each FAP \(\mathcal {N}_{n}\) can cache service \(\mathcal {L}_{l}\) at most once. The above optimization problem is a Mixed-Integer Nonlinear Programming (MINLP) problem and is NP-hard.

4 The Proposed Method

In this section, we present a detailed description of the FPDRD method. Firstly, we employ a federated learning model for accurate prediction of local popularity, taking the global popularity model and user perception preferences as inputs. We maintain a popularity priority queue \(Q_{c}^{\mathcal {N}_{n}}\) for each FAP \(\mathcal {N}_{n}\) and feed these popularity priority queues \(Q_{c}\) as input to a deep reinforcement learning model, which yields collaborative service caching decisions according to the optimization objective and constraints.

Fig. 2. Popularity prediction model.

4.1 Federated Learning for Popularity Prediction

As shown in Fig. 2, we implement the prediction of popular services based on a federated learning (FL) algorithm. The popularity prediction involves the following three steps:

Download Global Model. At the start of each time slot t, every FAP retrieves the global model parameters \(\textbf{W}_{t}\) from the CCS (Line 3 of Algorithm 1). These parameters facilitate the extraction of latent features for predicting popular services, enabling each FAP to determine the overall popularity of services during the current time slot.

Local Model Training. Upon receiving the global model parameters \(\textbf{W}_{t}\), each FAP updates its local model through training iterations (Lines 4–8). The updated local model \(\textbf{H}_{t}^{\mathcal {N}_{n}}\) is then uploaded to the CCS (Line 9). This local model incorporates hidden features of global service popularity as well as hidden features specific to the local users’ service perception preferences. By utilizing FAP \(\mathcal {N}_{n}\)’s local model \(\textbf{H}_{t}^{\mathcal {N}_{n}}\), the local service popularity priority queue \(Q_{c}^{\mathcal {N}_{n}}\) can be predicted (Line 10). The loss function employed is the categorical cross-entropy, a generalization of binary cross-entropy to multi-class prediction, which measures the misclassification between the true label \(\boldsymbol{\pi }\) and the predicted label \(\hat{\boldsymbol{\pi }}\) of service \(\mathcal {L}_{l}\):

$$\begin{aligned} L(\hat{\boldsymbol{\pi }},\boldsymbol{\pi })=-\sum _{i}\boldsymbol{\pi }_{i}\log (\hat{\boldsymbol{\pi }}_{i}) \end{aligned}$$
(14)

In addition, we estimate the prediction error of service request labels by the Mean Squared Error (MSE):

$$\begin{aligned} L(\hat{\phi },\phi )=\mathbb {E}[(\phi _{i}-\hat{\phi }_{i})^{2}] \end{aligned}$$
(15)

where \(\phi _{i}\) is the real service request label and \(\hat{\phi }_{i}\) the predicted one.
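A minimal PyTorch sketch of one local training round (Lines 4–8) is given below; the two-layer classifier and all hyperparameters are assumptions for illustration, as the paper does not fix a network architecture:

```python
import torch
import torch.nn as nn

def make_model(n_features, n_services):
    # Illustrative classifier over request features (assumed architecture).
    return nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                         nn.Linear(64, n_services))

def local_training(model, W_t, requests, labels, epochs=5, lr=0.01):
    """One local FL round on a FAP: start from the global parameters W_t,
    train with categorical cross-entropy (Eq. (14)), return H_t^{N_n}."""
    model.load_state_dict(W_t)                     # download W_t
    criterion = nn.CrossEntropyLoss()              # categorical cross-entropy
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(requests), labels)  # logits vs. true service ids
        loss.backward()
        optimizer.step()
    return model.state_dict()                      # uploaded to the CCS

# Example with synthetic data: 32 requests, 8 features, 10 service types.
model = make_model(8, 10)
W_t = {k: v.clone() for k, v in model.state_dict().items()}
H_t = local_training(model, W_t, torch.randn(32, 8), torch.randint(0, 10, (32,)))
```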

Algorithm 1. Federated learning-based popularity prediction.

Federated Aggregation. After receiving the uploaded local models \(\textbf{H}_{t}\) from the FAPs, the CCS updates the global model \(\textbf{W}_{t+1}\) (Line 12). To address the imbalance of local models across different FAPs, a weighted federated aggregation approach is employed, assigning different aggregation weights to the local models uploaded by different FAPs. The updated global model \(\textbf{W}_{t+1}\) is:

$$\begin{aligned} \textbf{W}_{t+1} = \textbf{W}_{t} + \sum \limits _{\mathcal {N}_{n}\in \mathcal {N}}\nabla _{t}^{\mathcal {N}_n}\frac{M_{t}^{\mathcal {N}_{n}}}{\sum \limits _{\mathcal {N}_{n}\in \mathcal {N}}M_{t}^{\mathcal {N}_{n}}}(\textbf{W}_{t}-\textbf{H}_{t}^{\mathcal {N}_{n}}) \end{aligned}$$
(16)

where \(\nabla _{t}^{\mathcal {N}_{n}}\) represents the gradient step size and \(M_{t}^{\mathcal {N}_{n}}\) the number of service requests received by FAP \(\mathcal {N}_{n}\) at time t.
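A sketch of the weighted aggregation in Eq. (16), applied per parameter tensor (a scalar step size stands in for \(\nabla _{t}^{\mathcal {N}_{n}}\)):

```python
import torch

def federated_aggregate(W_t, local_models, num_requests, step=1.0):
    """Weighted federated aggregation following Eq. (16).

    W_t          -- global state dict (parameter name -> tensor)
    local_models -- FAP id -> local state dict H_t^{N_n}
    num_requests -- FAP id -> M_t^{N_n}, requests served in slot t
    """
    total = sum(num_requests.values())
    W_next = {k: v.clone() for k, v in W_t.items()}
    for n, H in local_models.items():
        weight = num_requests[n] / total          # request-count weight
        for k in W_next:
            W_next[k] += step * weight * (W_t[k] - H[k])
    return W_next

# Toy example: two FAPs, a single 2-element parameter tensor.
W_t = {"w": torch.zeros(2)}
locals_ = {0: {"w": torch.tensor([1.0, 1.0])}, 1: {"w": torch.tensor([-1.0, 0.0])}}
print(federated_aggregate(W_t, locals_, {0: 120, 1: 80})["w"])  # [-0.2, -0.6]
```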

Algorithm 2. Profit calculation.

4.2 Deep Reinforcement Learning for Caching Decisions

Upon receiving the service popularity priority queue of each FAP for the current time slot, we utilize a deep reinforcement learning model to determine optimal cooperative caching decisions. The objective is to maximize the profits of FAPs while maintaining the QoS of users.

State. We consider the services cached by FAP \(\mathcal {N}_{n}\) as the current state \(s^{\mathcal {N}_{n}}(t)\), where the cached services are primarily selected based on the predicted queue \(Q_{c}^{\mathcal {N}_{n}}\) obtained from Algorithm 1. The current state can be represented as \(s^{\mathcal {N}_{n}}(t)=(s_{1}^{\mathcal {N}_{n}},s_{2}^{\mathcal {N}_{n}},...,s_{c}^{\mathcal {N}_{n}})\), where \(s_{i}^{\mathcal {N}_{n}}\) represents the ith popular service cached in FAP \(\mathcal {N}_{n}\).

Algorithm 3. DRL-based collaborative service caching decision.

Action. We define the action \(a=(a^{\mathcal {N}_{1}},a^{\mathcal {N}_{2}},...,a^{\mathcal {N}_{n}})\) as the set of actions for all FAPs, where \(a^{\mathcal {N}_{n}}=(a_{1}^{\mathcal {N}_{n}},a_{2}^{\mathcal {N}_{n}},...,a_{c}^{\mathcal {N}_{n}})\) indicates whether the corresponding services in FAP \(\mathcal {N}_{n}\) should be replaced. Specifically, \(a_{i}^{\mathcal {N}_{n}} = 0\) indicates that the service stored in the ith position of FAP \(\mathcal {N}_{n}\)’s cache is kept, while \(a_{i}^{\mathcal {N}_{n}} = 1\) indicates that it is replaced. Actions are selected using the \(\varepsilon \)-greedy method: with probability \(\varepsilon \) a random action is explored; otherwise the action with the maximum Q-value is chosen:

$$\begin{aligned} a(t)=\mathop {\textrm{argmax}}\limits _{a}\ Q(s(t),a;\theta ) \end{aligned}$$
(17)

Reward. We define the reward function \(r(t)\) to maximize the profit obtained by the FAPs. After taking action a(t), the corresponding reward r(t) is obtained and the system transitions from state s(t) to \(s(t+1)\). The resulting transition \((s\left( t\right) ,a\left( t\right) ,r\left( t\right) ,s\left( t+1\right) )\) is stored in the replay buffer. The action-value function is then updated:

$$\begin{aligned} Q\left( s_i,a_i;\theta \right) \leftarrow Q\left( s_i,a_i;\theta \right) +\alpha \left[ y_{i}-Q\left( s_i,a_i;\theta \right) \right] \end{aligned}$$
(18)

where \(\alpha \) represents the learning rate and \(y_{i}\) the target Q-value of the target network of tuple i:

$$\begin{aligned} y_{i}=r_i+\gamma \max _{a'}\hat{Q}\left( s_{i+1},a';\theta ^{-}\right) \end{aligned}$$
(19)

where \(\gamma \) is the discount factor and \(\theta ^{-}\) denotes the parameters of the target network. The loss function \(L(\theta )\) of the network is:

$$\begin{aligned} L(\theta )=\mathbb {E}\left[ \left( y_{i}-Q\left( s_i,a_i;\theta \right) \right) ^2\right] \end{aligned}$$
(20)

The gradient of the loss function, \(\nabla _{\theta }L(\theta )\), over all sampled tuples is:

$$\begin{aligned} \nabla _\theta L(\theta )=\mathbb {E}\left[ \left( y_i-Q\left( s_i,a_i;\theta \right) \right) \nabla _{\theta }Q\left( s_i,a_i;\theta \right) \right] \end{aligned}$$
(21)

At the end of time slot t, the parameters of the network \(\theta \) are updated as:

$$\begin{aligned} \theta \leftarrow \theta -\eta _\theta \nabla _\theta L(\theta ) \end{aligned}$$
(22)

where \(\eta _{\theta }\) is the learning rate of the Q-network.
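The update rules in Eqs. (17)–(22) correspond to a standard DQN training step; a hedged PyTorch sketch (network shapes and hyperparameters are assumptions) is:

```python
import random
import torch
import torch.nn as nn

def epsilon_greedy(q_net, s, n_actions, eps=0.1):
    """Eq. (17) with epsilon-greedy exploration over the action space."""
    if random.random() < eps:
        return random.randrange(n_actions)            # explore
    with torch.no_grad():
        return int(q_net(s.unsqueeze(0)).argmax())    # exploit: argmax_a Q(s,a)

def dqn_step(q_net, target_net, batch, optimizer, gamma=0.9):
    """One training step over a replay-buffer batch.

    batch -- (s, a, r, s_next) tensors sampled from the replay buffer
    """
    s, a, r, s_next = batch
    with torch.no_grad():                              # Eq. (19): target Q-value
        y = r + gamma * target_net(s_next).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s_i, a_i; theta)
    loss = nn.functional.mse_loss(q, y)                # Eq. (20)
    optimizer.zero_grad()
    loss.backward()                                    # Eq. (21): gradient of loss
    optimizer.step()                                   # Eq. (22): theta update
```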

Firstly, at each time slot t, we take the predicted popularity priority queues \(Q_{c}\) obtained from Algorithm 1 and the cache states s(t) of the FAPs as input (Lines 3–4). Secondly, we train a deep reinforcement learning model, using the profit calculation model of Algorithm 2 as the training indicator, to obtain the optimal caching decision model (Lines 5–12). Finally, each FAP \(\mathcal {N}_{n}\) selects services from the prediction queue \(Q_{c}^{\mathcal {N}_{n}}\) for replacement according to the caching decision model (Lines 14–15).

5 Performance Evaluation

5.1 Simulation Configuration

We developed a simulation environment based on the MovieLens dataset (ml-25m) [22], which consists of 25,000,095 ratings and 1,093,360 tags applied to 62,423 movies by 162,541 users. The dataset also includes information about the movies, such as titles and genres, as well as user attributes including ID number, gender, age and postcode. We assume that user preferences are represented by movie ratings, with the number of ratings corresponding to the strength of user preferences, and the publication time of a rating is treated as the request initiation time. All experiments are conducted on the same computer with an AMD Ryzen 7 4800H 2.90 GHz processor and 16.0 GB of RAM, using PyTorch 2.0.

Fig. 3. The performance of algorithms in different time spans.

5.2 Baselines

We compare our method against four baselines:

1) DRLVCC: This baseline first employs a Convolutional Neural Network (CNN) model to assess the popularity of new requests at different locations, and then applies a path-responsive vertical cooperative caching approach based on a deep reinforcement learning model to formulate caching decisions [23].

2) UPP-CL-CC: This baseline employs an LSTM model to dynamically capture user activities and preferences, extracting local popularity information for the FAPs, which is subsequently clustered. Building on this, the authors propose a greedy approach to address the cache placement problem [24].

3) Random: Each FAP replaces unrequested services with probability \(\epsilon \). In our simulation, \(\epsilon =0.1\).

4) First-In-First-Out (FIFO): FAPs cache services in the order of service requests and discard the oldest cached services when cache space runs out (a minimal sketch follows this list).
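A minimal sketch of the FIFO baseline, assuming each service occupies one cache slot:

```python
from collections import OrderedDict

class FIFOCache:
    """FIFO baseline: evict the oldest cached service when the cache is full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.services = OrderedDict()      # insertion order = caching order

    def request(self, service):
        hit = service in self.services
        if not hit:
            if len(self.services) >= self.capacity:
                self.services.popitem(last=False)   # discard the oldest
            self.services[service] = True
        return hit

cache = FIFOCache(capacity=2)
print([cache.request(s) for s in [1, 2, 1, 3, 1]])  # [False, False, True, False, False]
```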

5.3 Performance Analysis

We perform experiments under three scenarios:

1) We extract different time spans of the dataset, increasing the span by one day at a time, to observe how time spans impact service caching performance.

2) We study the impact of the number of service types on algorithm performance while fixing the time interval at 1 day and setting the FAPs’ cache capacity to 100.

3) We compare how the FAPs’ cache capacity influences algorithm performance while keeping the time interval fixed at 1 day and the number of service types fixed at 1000.

As shown in Fig. 3, the FPDRD method exhibits the best overall performance. Figure 3(c) indicates that, by considering the temporal variation of overall service popularity and the specific preferences of local users, our predictive queue aligns more closely with real-world scenarios, leading to significantly higher cache hit rates than the baseline algorithms. Figures 3(a) and 3(b) demonstrate that the FPDRD method achieves the lowest average delay and the highest average profit across datasets with various time spans. This is attributed to the combination of a more accurate predictive model and a decision-making network trained with DRL algorithms, resulting in a caching decision model that simultaneously safeguards the QoS of users and enhances the profit of network service providers.

Fig. 4. The impact of the number of service types on algorithms.

As shown in Fig. 4, an increase in service types negatively impacts the performance of all algorithms, but to different degrees. In particular, the FPDRD method degrades more slowly while still maintaining the best overall performance, whereas the UPP-CL-CC method degrades more rapidly. These trends reflect differences in the algorithms’ capacity to adapt and optimize caching decisions as the number of service types rises. The FPDRD method’s ability to respond quickly to environmental changes enables it to maintain superior overall performance even when confronted with a progressively diverse range of service types.

Fig. 5. The impact of the FAPs’ cache capacity on algorithms.

As shown in Fig. 5, the FPDRD algorithm achieves the highest performance across different FAP cache capacities. Moreover, as the cache capacity of the FAPs increases, the performance improvement of the FPDRD method becomes more pronounced. The experimental results demonstrate the efficiency of the FPDRD method in popularity prediction and caching decisions, allowing the available cache resources to be utilized efficiently. As the cache capacity of FAPs grows, the algorithm can exploit this extra storage to make more informed and optimized caching decisions, thereby improving the cache hit rate and quality of service, ultimately benefiting both end users and network service providers.

6 Conclusion

This paper investigates the service caching problem in MEC and proposes a novel caching method that leverages a federated learning model for popularity prediction and a deep reinforcement learning model for yielding caching decisions (FPDRD). The experimental results clearly demonstrate the superiority of the FPDRD method in achieving improved cache hit rates in MEC, accomplished by effectively considering the temporal variability of overall service popularity and the specificity of local user preference perception in predictions. Furthermore, the proposed method maximizes the utilization of limited storage and computational resources by promoting collaboration among FAPs. As a result, the FPDRD method ensures the QoS of users while increasing the profits of network service providers. In the future, we aim to address the problem of resource idleness in FAPs caused by the mismatch between the storage and computing resource requirements of services, and plan to optimize fault tolerance in collaborative service caching.