
1 Introduction

In fields such as the metaverse and autonomous driving, massive amounts of data place extremely high demands on hash rate, and the construction of a mobile edge computing (MEC) network platform is expected to cope with this challenge [1, 2]. A network of edge nodes allocates computing resources upon requests to offload tasks to the network. Various scheduling methods have been developed, e.g., workflow-based dynamic scheduling algorithms [3, 4] and self-adaptive learning particle swarm optimization (SLPSO) [5, 6]. Methods based on deep reinforcement learning (DRL) have also been developed for the task scheduling problem in MEC systems [7].

In addition, setting optimization goals is essential in algorithm evaluation. Many scholars use task delay [8] and energy consumption [9, 10] as evaluation goals [11, 12, 13, 14].

To address this challenge, this paper proposes an optimal scheduling strategy suitable for delay constraints and user cost constraints. We propose an improved DRL-based task scheduling algorithm to solve the edge cooperative scheduling problem for multiple tasks. Our main contributions are as follows:

  • An edge collaborative task scheduling framework is proposed for the computational resource allocation and task scheduling problems in edge scenarios. The framework considers the bandwidth resources of the servers, task characteristics, the task queue state of the target server, and the hash rate level.

  • We propose a DRL-based algorithm to solve this scheduling problem, improving algorithm convergence and minimizing the weighted sum of task latency and cost by allocating appropriate bandwidth resources and target servers.

  • Simulation results demonstrate that the proposed method outperforms the baseline algorithms. In addition, the hierarchical setting of server hash rates in the network effectively reduces the system energy consumption.

The rest of this article is organized as follows. Section 2 introduces the system model and describes the problem. Section 3 presents our proposed algorithm including algorithm design and process description. Section 4 presents our experimental results and discussion. Section 5 concludes this paper with a summary.

2 System Model and Problem Description

2.1 Model Overview

As shown in Fig. 1, our model consists of servers with hierarchical hash rates and multiple end-users. Each server maintains two queues: a waiting-for-scheduling queue and a waiting-for-execution queue. The set of edge servers is denoted by \({\mathcal M} = \left\{ {1, 2, \ldots , M} \right\} \), and \({\mathcal N} = \left\{ {1, 2, \ldots , N} \right\} \) denotes the set of tasks generated by the end-users.

Fig. 1. The structure of our system model.

Users first send task requests to the closest server \(m'\), which we define as the local server. The local server receives the task \({A_{m'n}} = \left\{ {{C_{m'n}}, {L_{m'n}}, Cos{t_{m'n}}} \right\} \), where \({C_{m'n}}\) denotes the task size, and \({L_{m'n}}\) and \(Cos{t_{m'n}}\) denote the task's latest response time and the maximum acceptable overhead for that task, respectively.

2.2 Task Scheduling Model

After the user sends a task request to the local server \(m'\), the scheduler in the local server \(m'\) allocates transmission bandwidth and gives it to the specified target server m for execution. The entire scheduling process can be divided into the following four stages.

Task Transmission: After the local server \(m'\) receives the task request, it allocates appropriate bandwidth resources to the task, and the end-user transmits the task to the local server \(m'\) through the wireless channel. Ignoring the propagation delay, the uplink transmission rate can be defined as:

$$\begin{aligned} {o_{m'n}} = Q \mathrm{{log}}\left( {1 + \frac{{{p_{m'n}}{{\left| {{h_{m'n}}} \right| }^2}}}{{{\sigma ^2}}}} \right) \end{aligned}$$
(1)

where \({p_{m'n}}\) is the transmission power between task n and the local server \(m'\), Q is the channel bandwidth, \({h_{m'n}}\) represents the time-varying channel gain, and \({\sigma ^2}\) is the noise power. The transfer time of the task is:

$$\begin{aligned} S{T_{m'n}} = \frac{{{C_n}}}{{{o_{m'n}}}} \end{aligned}$$
(2)
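As a minimal numerical sketch of Eqs. (1) and (2), the following Python snippet computes the uplink rate and the transfer time; the logarithm base and all parameter values are illustrative assumptions, not taken from the paper.

```python
import math

def uplink_rate(Q, p, h, sigma2):
    # Eq. (1): Q is the channel bandwidth, p the transmission power,
    # h the (time-varying) channel gain, and sigma2 the noise power.
    # The paper does not state the log base; base 2 is assumed here.
    return Q * math.log2(1.0 + p * abs(h) ** 2 / sigma2)

def transfer_time(C_n, o):
    # Eq. (2): task size divided by the uplink transmission rate.
    return C_n / o

# Illustrative values only.
o = uplink_rate(Q=10e6, p=0.5, h=0.8 + 0.1j, sigma2=1e-9)
ST = transfer_time(C_n=5e6, o=o)
```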

Task Scheduling: The scheduler of the local server gives the optimal task scheduling policy according to the currently observed network status, such as the bandwidth of the local server and the available computing resources of each server in the network.

Waiting for Execution: We use \(Qu{e_m}\) to indicate the number of tasks in the waiting-for-execution queue of the target server m. The waiting time of the task before execution is therefore:

$$\begin{aligned} W{T_{mn}} = \left\{ \begin{array}{ll} 0, &{} Qu{e_m} = 0\\ free{T_{mn}} - A{T_{mn}}, &{} \text {otherwise} \end{array} \right. \end{aligned}$$
(3)

where \(free{T_{mn}}\) represents the time at which server m becomes idle after task n arrives, and \(A{T_{mn}}\) is the arrival time of task n at the target server. If the server is busy, the task must wait. In this case, the idle time of the target server can be expressed as:

$$\begin{aligned} free{T_{mn}} = \left\{ \begin{array}{ll} free{T_{mn'}} + E{T_{mn'}}, &{} \text {if } free{T_{mn'}} > A{T_{n'}}\\ A{T_{n'}} + E{T_{mn'}}, &{} \text {otherwise} \end{array} \right. \end{aligned}$$
(4)

where \(n'\) denotes the task preceding task n, and \(free{T_{mn'}}, E{T_{mn'}}, A{T_{n'}}\) denote, respectively, the idle time of the target server after task \(n'\) arrives, the execution time of task \(n'\), and the arrival time of task \(n'\) at the target server.

Task Execution: The task size \({C_n}\) and the computing speed \({f_m}\) of the server are known, so the estimated execution time of task n on server m can be expressed as

$$\begin{aligned} E{T_{mn}} = \frac{{{C_n}}}{{{f_m}}}. \end{aligned}$$
(5)

In summary, the estimated completion time of task n is the sum of its transmission time, waiting time, and execution time, which is expressed as follows

$$\begin{aligned} {T_n} = \;S{T_{m'n}} + W{T_{mn}} + E{T_{mn}}. \end{aligned}$$
(6)
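The delay model of Eqs. (3)–(6) can be sketched in Python as follows; the function and variable names (e.g., freeT_prev, AT_prev) are our own illustrative choices.

```python
def execution_time(C_n, f_m):
    # Eq. (5): task size divided by the computing speed of server m.
    return C_n / f_m

def waiting_time(queue_len, freeT_prev, ET_prev, AT_prev, AT_n):
    # Eq. (3): no waiting if the execution queue of server m is empty.
    if queue_len == 0:
        return 0.0
    # Eq. (4): time at which server m becomes idle after the previous task n'.
    if freeT_prev > AT_prev:
        freeT = freeT_prev + ET_prev   # server was still busy when n' arrived
    else:
        freeT = AT_prev + ET_prev      # server was idle, so n' started on arrival
    return freeT - AT_n

def completion_time(ST, WT, ET):
    # Eq. (6): transmission + waiting + execution time.
    return ST + WT + ET
```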

2.3 User Cost Model

The unit-cycle price of server m is \({u_m}\); the higher the hash rate, the greater \({u_m}\). Therefore, the total cost to be paid for the service after the execution of task n is

$$\begin{aligned} {U_{mn}} = \frac{{{C_n}}}{{{f_m}}}*{u_m}. \end{aligned}$$
(7)

2.4 Problem Description

Our optimization goal is to minimize task execution delay and user cost overhead. The optimization problem can be described as follows

$$\begin{aligned} \begin{array}{lll} P1: &{} \mathop {\min }\limits _p \left\{ {\alpha *\mathop \sum \limits _{n = 1}^N {T_n} + \left( {1 - \alpha } \right) *\mathop \sum \limits _{n = 1}^N {U_{mn}}} \right\} &{} \\ s.t. &{} C1: \alpha \in \left( {0,1} \right) &{} \\ &{} C2: 0 < {f_m} \le f_m^{max}, &{} \forall m \in \mathcal{M}\\ &{} C3: 0 < {p_{m'n}} \le p_{m'}^{max}, &{} \forall m' \in \mathcal{M},\;\forall n \in \mathcal{N}\\ &{} C4: {T_n} \le {L_n}, &{} \forall n \in \mathcal{N}\\ &{} C5: {U_{mn}} \le Cos{t_n}, &{} \forall m \in \mathcal{M},\;\forall n \in \mathcal{N} \end{array} \end{aligned}$$
(8)

where \(\alpha \) represents the weight factor, which the scheduler can set according to different task requirements [15]. C2 indicates that each server's hash rate is limited, and C3 indicates that the transmission power between each task and its local server does not exceed the maximum value. Constraints C4 and C5 represent the task's limits on time and cost.
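As a sketch of how a candidate schedule could be scored against P1 (Eqs. (7) and (8)), assuming per-task lists of delays, costs, deadlines, and budgets; all function and variable names here are our own:

```python
def user_cost(C_n, f_m, u_m):
    # Eq. (7): execution time on server m times its unit-cycle price u_m.
    return C_n / f_m * u_m

def weighted_objective(alpha, T, U):
    # Objective of P1: weighted sum of total delay and total user cost (C1: 0 < alpha < 1).
    return alpha * sum(T) + (1 - alpha) * sum(U)

def feasible(T, U, L, Cost):
    # Constraints C4 and C5: every task meets its deadline and its cost budget.
    return all(t <= l for t, l in zip(T, L)) and all(u <= c for u, c in zip(U, Cost))
```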

3 Proposed Method

We transform problem P1 into a Markov decision process (MDP) and design a deep deterministic policy gradient-based online scheduling and allocation (SA-DDPG) algorithm to solve it. The structure of our proposed network is shown in Fig. 2.

Fig. 2. SA-DDPG algorithm structure.

3.1 MDP-Based Task Scheduling Problems

State Space: The system state \(s\left( t \right) \) consists of two components: the channel gain \({{h_{{m'}n}}\left( t \right) }\) between the user and the local server, and the state information \({s_m}\left( t \right) \) of each server in the network. Thus, the state space at time t can be expressed as:

$$\begin{aligned} s\left( t \right) = \left\{ {{h_{m'n}}\left( t \right) ,{s_1}\left( t \right) ,{s_2}\left( t \right) , \ldots ,{s_M}\left( t \right) } \right\} \end{aligned}$$
(9)

Action Space: We define the action space as:

$$\begin{aligned} a\left( t \right) = \left\{ {{p_{m'n}}\left( t \right) ,{a_{1n}}\left( t \right) ,{a_{2n}}\left( t \right) , \ldots ,{a_{Mn}}\left( t \right) } \right\} \end{aligned}$$
(10)

Here \({a_{mn}} \in \left\{ {0,1} \right\} \), where \({a_{mn}} = 1\) indicates that task n is offloaded to server m, and \({a_{1n}} + {a_{2n}} + \ldots + {a_{Mn}} = 1\) ensures that each task is assigned to exactly one server.

Reward Function: Our optimization problem P1 aims to minimize the delay and cost of task completion, so we set the reward to:

(11)
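A minimal sketch of the state, action, and reward encodings follows. The reward shown is our assumption: the negative weighted sum of delay and cost, consistent with the objective of P1. The number of servers M and all encodings are illustrative.

```python
import numpy as np

M = 4  # number of edge servers (illustrative)

def build_state(h_local, server_states):
    # Eq. (9): channel gain to the local server m' plus per-server state info.
    return np.concatenate(([h_local], server_states))

def build_action(p_alloc, target_server):
    # Eq. (10): bandwidth/power allocation p_{m'n} plus a one-hot placement
    # vector with exactly one a_mn = 1.
    a = np.zeros(M)
    a[target_server] = 1.0
    return np.concatenate(([p_alloc], a))

def reward(alpha, T_n, U_mn):
    # Assumed reward: negative weighted sum of delay and cost, mirroring P1.
    return -(alpha * T_n + (1 - alpha) * U_mn)
```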

3.2 SA-DDPG Algorithm Framework

We describe the network structure of the algorithm in Fig. 2. The weight parameters \({\theta ^\mu }\) and \({\theta ^Q }\) of the Actor network and the Critic network are randomly initialized at the beginning of the algorithm. We use target networks \(Q'\) and \(\mu '\) to improve learning stability.

Algorithm 1. The SA-DDPG online scheduling and allocation algorithm.

Algorithm Training: The agent obtains the state \(s\left( t \right) \) from the environment and selects the current best action \({a^*}\left( t \right) \). After taking action \({a^*}\left( t \right) \), the agent receives a reward \({R_t}\) and then observes the next state \(s\left( {t + 1} \right) \). The transition tuple \(\left\{ {s\left( t \right) ,a\left( t \right) ,R\left( t \right) ,s\left( {t + 1} \right) } \right\} \) is stored in the experience replay memory \(\mathcal{B}\). In addition, the loss function in Fig. 2 is:

$$\begin{aligned} L = \textrm{E}\left[ {{{\left( {R\left( t \right) + \gamma {Q'}\left( {s\left( {t + 1} \right) ,a\left( {t + 1} \right) |{\theta ^{{Q'}}}} \right) - Q\left( {s\left( t \right) ,a\left( t \right) |{\theta ^Q}} \right) } \right) }^2}} \right] \end{aligned}$$
(12)

where \(\gamma \) is the discount factor. The policy gradient can be expressed as:

$$\begin{aligned} {\nabla _\theta }\mathcal{T} = E\left[ {{\nabla _a}Q\left( {s,a|{\theta ^Q}} \right) {|_{s = s(t),a = \mu (s(t))}}{\nabla _{{\theta ^\mu }}}\mu \left( {s|{\theta ^\mu }} \right) {|_{s(t)}}} \right] \end{aligned}$$
(13)
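A minimal PyTorch-style sketch of one SA-DDPG learning step is given below. The network classes, optimizers, batch format, and hyper-parameters (gamma, tau) are our assumptions; the paper specifies only the loss of Eq. (12) and the gradient of Eq. (13), and a standard DDPG soft update is assumed for the target networks \(Q'\) and \(\mu '\).

```python
import torch
import torch.nn.functional as F

def sa_ddpg_update(batch, actor, critic, target_actor, target_critic,
                   actor_opt, critic_opt, gamma=0.99, tau=0.005):
    # One learning step on a minibatch sampled from the replay memory B.
    s, a, r, s_next = batch  # tensors: states, actions, rewards, next states

    # Critic loss of Eq. (12): squared TD error against the target networks.
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Deterministic policy gradient of Eq. (13): ascend Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft update of the target networks (assumed; improves stability).
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1.0 - tau).add_(tau * p.data)
```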

We formalize the SA-DDPG algorithm process in Algorithm 1.

4 Experimental Analysis

4.1 Settings

We compare the task success rate and energy consumption of our method with those of the random scheduling algorithm, the round-robin scheduling algorithm [16], the earliest scheduling algorithm, and the DRL-based algorithms DQN [17] and DDQN. The detailed simulation parameters are shown in Table 1.

Table 1. Simulation Parameters

Fig. 3. The task success rate.

As shown in Fig. 3, the task success rate of the DRL-based family of algorithms is 30%–40% higher than that of the conventional algorithms, because DRL-based algorithms learn from historical experience and continuously refine their decisions through training.

Fig. 4. Impact of different weighting factors \(\alpha \) on the task success rate.

We set \(\alpha \) to 0.2, 0.5, and 0.8 for cost-sensitive, balanced, and delay-sensitive tasks, respectively. As shown in Fig. 4, the task success rate of our proposed algorithm consistently outperforms the other baseline algorithms in all three task-arrival scenarios, converging to about 93%.

Fig. 5. Total energy consumption in different hash rate environments.

Furthermore, Fig. 5 explores the impact of hierarchical server hash rates on system energy consumption. With hierarchical hash rates, the total energy consumption of all algorithms is reduced by about 25% compared with the non-hierarchical case.

5 Conclusion

In this paper, we study the task scheduling problem in edge scenarios. To solve the problem of allocating bandwidth resources and scheduling tasks among servers, we propose a DRL-based algorithm that reduces the total task latency and user overhead and thereby maximizes the task success rate. In addition, we verified that our algorithm adapts well to cost-sensitive, balanced, and delay-sensitive tasks. In future work, we will further explore mobility in multi-edge networks.