1 Introduction

With the explosive growth of smart Internet of Things (IoT) devices, intelligent mobile networking applications have become ubiquitous, which triggers high demand for on-device big data analytics. Meanwhile, cloud-based deep learning services [1], including recommendation systems, health monitoring, language translation, and many others [2,3,4], call for more efficient execution of deep neural networks (DNNs) on mobile devices. However, such a centralized deep learning framework requires users to outsource their sensitive data to the remote cloud in order to train the corresponding learning models, which raises significant concerns about privacy as well as on-device computation resources [5]. To this end, mobile edge computing (MEC) [6] has been proposed as a novel distributed computation architecture that provides powerful and real-time data storage and processing capabilities at the edge of the network [7], and it can be seen as a cornerstone bridging cloud-based learning services and mobile devices [8].

As a deep learning paradigm suited to mobile edge computing, federated learning (FL) [9, 10] has gradually attracted attention from academia and industry by addressing data privacy issues in a decentralized learning manner, and it can be widely used in numerous emerging scenarios, such as crowdsourced systems [12]. The main purpose of federated learning is to build a joint machine learning model upon localized datasets while providing a privacy guarantee [11]. Participants in federated learning act as data providers and train local learning models; meanwhile, the server maintains a global model by averaging the local model parameters (i.e., gradients) generated by randomly selected participants until convergence [13]. Privacy is preserved because only model parameters are exchanged between the server and the participants, which prevents the server from directly accessing the private training data.

Although federated learning has several advantages such as privacy preservation and on-device learning, scalability and privacy issues remain fatal for this novel paradigm, especially in the face of heavy local training costs (e.g., DNN models) and possible insider attackers. In this paper, we mainly consider the following two practical issues when applying the federated learning protocol to the mobile edge computing architecture: 1) executing the whole DNN training phase on resource-constrained mobile edge devices introduces prohibitive computation costs, meaning that smart devices cannot afford the heavy computation required by the federated learning protocol; 2) the values shared between the central server and the participants, i.e., local model updates and global model parameters, can directly leak a proportion of the private training datasets, meaning that the standard federated learning protocol cannot provide a strong privacy guarantee against malicious entities, such as the edge and cloud servers.

First of all, a common deep neural network contains a very large number of parameters (sometimes hundreds of millions) [14, 15]; even the forward pass of such a huge deep neural network requires significant processing and storage resources. However, the resource-constrained nature of mobile devices implies that their computation capability will soon reach a bottleneck, rendering deep learning applications impractical [16, 17]. Furthermore, the shared values (i.e., gradients) in federated learning can leak sensitive information about users' training data to untrusted third parties [18, 19]. For example, according to [20], the server in federated learning can easily launch a model inversion attack to obtain parts of the training data distribution, and the gradient backward inference described in [21] also enables an adversary to recover a fraction of private data from the participants' local updates.

To solve the efficiency problem of DNN computation, the model partition technique [22] has been proposed to offload large parts of the loosely coupled hidden layers [23] of a DNN model to a third party, which can be used to enable federated learning applications in mobile edge computing environments. On the other hand, aiming at preventing user-side information leakage to the untrusted server, several works have approached the problem from different perspectives [24]. Abadi et al. [25] proposed a privacy-preserving deep learning method that protects users' data privacy by adding Gaussian perturbation [26] to the clipped gradients. Geyer et al. introduced a user-side differentially private federated learning mechanism [27] to prevent the shared learning model from revealing each participant's updates. Furthermore, Bonawitz et al. proposed a secure sum aggregation protocol for high-dimensional data using a secret-sharing method [28]. However, these privacy solutions rely on the presence of a trusted aggregator to perturb the global model parameters and publish the noised parameters to each participant securely, which means the aggregator is able to see each individual's model parameters. Therefore, it is necessary to design a practical mechanism that protects the privacy of each participant against untrusted third parties in federated learning.

In this paper, we propose an efficient and private federated learning scheme for mobile edge computing, named FedMEC, which integrates the model partition technique into the standard federated learning mechanism. Basically, FedMEC partitions the underlying DNN model between the edge server and the mobile edge devices. On the client-side DNN, a user's private training data is fed into the low-level neural network to extract features. On the edge-side DNN, the local model updates are generated by executing forward and backward propagation on these features. Finally, the cloud server in our FedMEC framework aggregates all the received local model updates to improve the current global model until it converges. To preserve training data privacy, we further present a differentially private data perturbation mechanism on the local side, which adds deliberate perturbations to the features before they are transmitted to the edge server. In other words, the clients, the edge server, and the cloud server each run a different portion of the federated learning protocol, and the values shared among these entities are perturbed with Laplace noise to achieve differential privacy. Our main contributions can be summarized as follows:

  • A framework enabling federated learning in mobile edge computing: Instead of outsourcing users' training data to the cloud as in conventional cloud-centric machine learning systems, we apply federated learning to mobile edge computing to realize localized training. Meanwhile, we design a partitioned learning framework that splits the deep neural network into two parts: a client-side DNN and an edge-side DNN, where the former maintains fixed convolutional layers to extract training data features and the latter hosts the remaining layers to update each participant's parameters.

  • A differentially private data perturbation mechanism: To protect user-side data privacy against untrusted third parties, we present a differentially private data perturbation mechanism that adds Laplacian random noise to the local training data features extracted by the partitioned convolutional layers, so as to achieve differential privacy and provide a rigorous privacy guarantee.

  • Exhaustive experimental evaluation: We evaluate the proposed FedMEC scheme on a standard image classification task under federated learning settings and compare the results with several related methods to demonstrate its effectiveness. Furthermore, we deploy FedMEC on an Android device to measure its overhead.

The remainder of this paper is organized as follows. We briefly introduce the basics of federated learning and differential privacy in Section 2, and the system framework is presented in Section 3. Section 4 provides the detailed construction of the proposed FedMEC scheme, and Section 5 discusses the effectiveness of FedMEC through extensive experimental evaluations. Finally, Section 6 gives the conclusion and future work. The notations used in this paper are listed in Table 1.

Table 1 Notations used in this paper

2 Preliminaries

2.1 Federated learning

Federated learning has been explored to provide a more flexible distributed machine learning framework, whose main purpose is to train a joint machine learning model across multiple mobile devices, each holding its own private training data. All the participants locally train the global model on their private training data and upload only the model parameters instead of the raw data. Such a localized model training method presents significant advantages for privacy preservation because the clients never share their private data with any third party. Figure 1 illustrates the federated learning framework with model averaging.

Fig. 1 Federated learning framework

The standard federated learning protocol [13] assumes that all the clients agree on a common learning objective and global model structure. In a given communication round t, each of the \(m_{t}\) sampled participants first downloads the global model parameters from the central server to initialize its local model, then trains the model locally on its private training dataset to generate the local model update \(\varDelta w_{t+1}^{(i)}\). After that, the central server collects all the local model updates and aggregates them to improve the current global model until it converges. The model averaging step at the central server can be formulated as follows.

$$ w_{t+1}^{(global)}= w_{t}^{(global)}+\frac{1}{m_{t}}\sum\limits_{i=1}^{m_{t}}{\varDelta} w_{t+1}^{(i)}, $$
(1)

where \(w_{t}^{(global)}\) indicates the global model parameters at the t-th communication round, and \({\varDelta } w_{t+1}^{(i)}\) denotes the local model update from the i-th participant generated during communication round t.
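To make the averaging step concrete, the following is a minimal Python sketch of Eq. (1); the layer-wise weight representation and the function name are our own illustrative choices rather than part of the protocol specification.

```python
import numpy as np

def federated_average(global_weights, local_updates):
    """Model averaging of Eq. (1): w_{t+1} = w_t + (1/m_t) * sum of updates.

    global_weights: list of numpy arrays, one per layer.
    local_updates:  one list of per-layer updates per sampled participant.
    """
    m_t = len(local_updates)
    new_weights = []
    for k, w in enumerate(global_weights):
        # Average the k-th layer's update over all m_t participants.
        avg_update = sum(update[k] for update in local_updates) / m_t
        new_weights.append(w + avg_update)
    return new_weights
```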

2.2 Differential privacy

Differential privacy [29] is a rigorous concept that has been widely used in data publication systems; it adds random noise (e.g., Laplace or Gaussian noise) to query results, concealing the real result of any data query operation. It is defined in terms of data queries on two adjacent databases \(\mathcal {D}\) and \(\mathcal {D}^{\prime }\) that differ in a single data item, requiring the query results to be statistically similar. In the context of federated learning, differential privacy can be used as a local-side privacy solution to protect the privacy of users' training datasets. The formal definition of 𝜖-differential privacy is as follows:

Definition 1

(𝜖-differential privacy) A randomized mechanism \({\mathscr{M}}:\mathcal {D}\rightarrow \mathcal {R}\) fulfills 𝜖-differential privacy for a certain non-negative number 𝜖 iff for any adjacent inputs \(d\in \mathcal {D}\) and \(d^{\prime }\in \mathcal {D}^{\prime }\), and any output set \(S\subseteq \mathcal {R}\), it holds that

$$ Pr[\mathcal{M}(d\in\mathcal{D})\in S]\leq e^{\epsilon}\cdot Pr[\mathcal{M}(d^{\prime}\in\mathcal{D}^{\prime})\in S], $$
(2)

where 𝜖 is the privacy budget, which measures the level of privacy guaranteed by the randomized mechanism \({\mathscr{M}}\): the smaller 𝜖, the stronger the privacy guarantee.

Typically, a deterministic query \(f:\mathcal {D}\rightarrow \mathcal {R}\) can achieve 𝜖-differential privacy by adding calibrated perturbation to its result. The added perturbation is calibrated to f's global sensitivity Δf, which is defined as the maximal value of \(||f(d)-f(d^{\prime })||\) over adjacent inputs d and \(d^{\prime }\). In this paper, we use the Laplace mechanism [26] to perturb the output data on the client side as follows:

$$ \mathcal{M}(d)=f(d)+Lap(\lambda), \lambda=\frac{{\varDelta} f}{\epsilon} $$
(3)

where Lap(λ) is a random variable sampled from the Laplace distribution, and the scale parameter λ is set to \(\frac {\varDelta f}{\epsilon }\).
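As a concrete illustration, the snippet below is a minimal sketch of the Laplace mechanism of Eq. (3); the function name and the counting-query example are hypothetical.

```python
import numpy as np

def laplace_mechanism(query_result, sensitivity, epsilon):
    """Perturb f(d) with Laplace noise of scale lambda = Delta_f / epsilon."""
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0.0, scale=scale, size=np.shape(query_result))
    return query_result + noise

# Example: a counting query has global sensitivity 1, so a privacy budget of
# epsilon = 0.5 calls for Laplace noise with scale 2.
noisy_count = laplace_mechanism(42.0, sensitivity=1.0, epsilon=0.5)
```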

3 System framework

3.1 Federated learning with MEC

In this paper, we consider an MEC framework that partitions the federated learning protocol across the cloud central server, the edge servers, and the edge devices. We assume that all users voluntarily participate in the federated learning protocol provided by a cloud central server in order to obtain the desired machine learning services. Meanwhile, these participants seek to prevent their training data privacy from being leaked to the malicious entities involved in the learning procedure, such as the untrusted cloud central and edge servers. Therefore, we present a three-layer mobile edge computing framework that provides a suitable architecture for supporting the federated learning protocol with multiple participants. The overall framework is shown in Fig. 2, and the involved entities are as follows.

  • Edge devices: represent a set of devices (such as smartphones, laptops, smart meters, and so on) owned by the participants. Each edge device is equipped with computation and communication modules, enabling it to execute the local training procedure and transfer its local updates to the edge server.

  • Edge servers: are the core entities of the mobile edge computing architecture. They own more storage and computation resources than the edge devices; they are usually deployed at the edge of the network and serve as the computation units between the cloud center and the edge devices.

  • Cloud server: acts as a control center that collects all users' local model updates and executes the federated averaging algorithm to update the global shared model. After model averaging, it distributes the shared model to the edge devices participating in the federated learning protocol.

Fig. 2 Federated learning with mobile edge computing

Among these entities, the edge devices are assumed to be trusted; their goal is to benefit from intelligent learning services by collaboratively executing the training phases with other participants. However, the third parties, e.g., the edge server and the cloud server, are honest-but-curious [30]. Specifically, they faithfully execute the federated learning process, computing correctly and sending results truthfully, but they are curious about the private information contained in the data and may attempt to disclose it [31]. Besides the privacy issues, directly implementing the federated learning protocol in a mobile edge computing framework also faces a practical concern: resource-constrained edge devices cannot easily afford the full cost of local model updates, because executing the DNN training procedure requires heavy computation. Thus, the main challenge of combining federated learning with mobile edge computing is designing a valid scheme that reduces the computation overhead on edge devices without breaking the federated learning mechanism, while protecting the user-side data privacy contained in the original data.

3.2 Overview of FedMEC

To solve the aforementioned challenges, we propose an efficient and private federated learning framework for mobile edge computing, named FedMEC, which considers both performance and privacy: the local DNN model is divided into a client-side part and an edge-side part so as to reduce the computation cost on the edge devices. The effectiveness of the partition mechanism for DNN architectures lies in the loosely coupled property of the hidden layers. That is, each hidden layer in a DNN can be executed separately, taking the previous layer's output as its input. Owing to this practical property of DNNs, we can simply divide the DNN structure in federated learning between the edge devices and the edge server. In this paper, we partition the neural network at the last convolutional layer, and all the intermediate results generated by the client-side DNN are hidden from the other entities (i.e., the edge server and the cloud server).

Figure 3 gives a high-level description of our proposed FedMEC framework. FedMEC relies on the mobile edge computing architecture and splits the whole federated learning protocol across the involved entities. In particular, the local neural network training phase is divided into two parts: the client-side DNN and the edge-side DNN. According to the partition mechanism, the convolutional layers of the local DNN are deployed on the client side, while the remaining part (i.e., the dense or fully connected layers) is assigned to the edge server. In this setting, edge devices merely undertake simple and lightweight feature extraction and perturbation. Similar to the settings in [14], the network structure and parameters of the client-side DNN model are frozen, while the edge-side DNN can be fine-tuned. Besides, the cloud central server in our FedMEC scheme executes the aggregation and averaging steps according to the standard federated learning protocol.

Fig. 3 Overview of proposed FedMEC framework

Here, the frozen client-side neural network is pre-trained on an auxiliary dataset that shares the same distribution as the localized private training datasets. Accordingly, the global model on the cloud server is initialized as a replica of this well-trained local model. The pre-trained global neural network is then partitioned at the last convolutional layer, and the well-trained convolutional layers are sent to each client for feature extraction. Based on our FedMEC framework, the major parts of the whole DNN training procedure can be offloaded to the edge and cloud servers, while mobile edge devices only need to execute the simple feature extraction part through a frozen network model. Furthermore, to provide a rigorous privacy guarantee, we apply the differential privacy mechanism in our FedMEC scheme, where the extracted features are perturbed by calibrated Laplace noise before being transmitted to the edge server. Note that we do not consider the privacy contained in the labels of the contributed training data, since we assume that each participant willingly contributes labeled data to perform supervised learning and thus has no expectation of privacy for the labels.

4 Construction of proposed FedMEC

4.1 Deep neural network partition

In this paper, we choose the most popular learning method, i.e., the DNN, as the learning framework to construct our federated learning protocol in the mobile edge computing scenario. A complete deep neural network consists of many loosely coupled hidden layers, which can be partitioned into multiple parts and embedded in the mobile edge computing entities, such as the edge devices and the edge server. Thus, partitioning a large DNN across mobile devices and the edge server is one way to reduce complex computations on resource-constrained edge devices. Following this property, we can design an efficient deep learning framework that enables federated learning applications in mobile edge computing environments. Here, we adopt a DNN with convolutional layers, i.e., a convolutional neural network, as our baseline model architecture.

Figure 4 shows a high-level description of our designed DNN partition method. As is well known, the convolutional operations in a DNN training procedure account for most of the computation overhead and consume plenty of a mobile device's resources. To address this, we split a complete DNN model into two parts, a client-side DNN and an edge-side DNN, at the last convolutional layer. In the client-side part, the front portion of the DNN structure (i.e., the convolutional layers) is deployed to extract features from the raw data. To protect the privacy of the sensitive training data, we add calibrated perturbations to the outputs of the client-side DNN so as to guarantee differential privacy. The edge-side DNN consists of the remaining portion of the DNN structure (i.e., the dense layers) and updates the model parameters by executing the forward and backward propagation procedures. With our DNN partition method, the client-side resource-hungry operations required in the standard federated learning protocol can be significantly reduced [32]. Especially under computation resource and energy considerations, the partitioning solution is attractive to many machine learning service providers, paving the way for federated learning applications on mobile edge devices.

Fig. 4 Partition process on the deep neural network
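To illustrate the partition, the following PyTorch sketch builds a small convolutional network matching the 3-convolutional/2-dense structure used in Section 5 and splits it at the last convolutional layer; the channel widths and variable names are illustrative assumptions rather than the exact model of the paper.

```python
import torch.nn as nn

# Full DNN: three 3x3 convolutional layers (stride 2) and two dense layers,
# as in the experimental setup of Section 5 (channel widths assumed).
full_model = nn.Sequential(
    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.LeakyReLU(),   # 28x28 -> 14x14
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(),  # 14x14 -> 7x7
    nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.LeakyReLU(),  # 7x7 -> 4x4
    nn.Flatten(),
    nn.Linear(64 * 4 * 4, 128), nn.LeakyReLU(),
    nn.Linear(128, 10),
)

layers = list(full_model.children())
client_side = nn.Sequential(*layers[:6])  # convolutional feature extractor
edge_side = nn.Sequential(*layers[6:])    # dense layers trained on the edge

# The client-side DNN is frozen: its parameters are never updated locally.
for p in client_side.parameters():
    p.requires_grad = False
client_side.eval()
```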

4.2 Differentially private data perturbation

The federated learning protocol provides a basic privacy guarantee for each participant's raw data due to its local training property. However, a participant's sensitive data can still be leaked to untrusted third parties, such as the edge server and the cloud server, even from a small portion of the updated parameters (i.e., features and gradients). For example, according to [20], the server in federated learning can easily launch a model inversion attack to obtain parts of the training data distribution, and the gradient backward inference described in [21] also enables an adversary to recover a fraction of private data from the participants' local updates. Therefore, it is necessary to design a practical privacy-preserving mechanism to protect each participant against untrusted third parties in federated learning.

Differential privacy [26] is a great solution for providing a rigorous privacy guarantee by adding deliberate perturbation to sensitive datasets. However, adding the perturbation directly to the original data may significantly degrade learning performance [33]. Thus, we instead perturb the features generated by the convolutional layers of the partitioned DNN, so as to preserve the privacy contained in the raw data. In this paper, we address the aforementioned problem with a differentially private data perturbation mechanism that protects the private information contained in the features extracted by the client-side DNN.

Following the work in [14], we regard the deep neural network as a deterministic function \(x_{l}=\mathcal {F}(x_{r})\), where xr represents the private raw data and xl stands for the l-th layer output of the neural network [34]. To address the privacy concern, we apply the differential privacy method to the DNN and further construct our private federated learning protocol in the mobile edge computing paradigm. One efficient way to realize 𝜖-differential privacy is to add controlled Laplace noise sampled from the Laplace distribution with scale \({\varDelta }\mathcal {F}/\epsilon \) to the output xl. According to the definition of differential privacy described in Section 2.2, the global sensitivity of a query \(f:\mathcal {D}\rightarrow \mathcal {R}\) is defined as follows:

$$ {\varDelta} f=\max\limits_{d\in D,d^{\prime}\in D^{\prime}}||f(d)-f(d^{\prime})|| $$
(4)

However, the biggest challenge here is that the global sensitivity \({\varDelta }\mathcal {F}\) is difficult to quantify for a deep neural network. Directly adding Laplace perturbations to the output features would destroy the utility of the representations for future predictions.

To address this problem, we employ the nullification and norm bounding methods to enhance the usability of differential privacy in deep neural networks. Specifically, before a participant starts to extract features from its sensitive raw data xr using the pre-trained client-side DNN, it first performs a nullification operation to mask the highly sensitive data items as \(x_{r}^{\prime }=x_{r}\odot I_{n}\), where ⊙ denotes element-wise multiplication and In is a nullification matrix with the same dimensions as the raw input. The nullification matrix In is a random binary matrix (i.e., consisting of 0s and 1s) whose structure is determined by a nullification rate μ, meaning that the number of zeros is Sup(n⋅μ), where n is the number of elements in the input. Apparently, μ has a significant impact on the prediction accuracy, which will be discussed in Section 5.

After the nullification operation on the sensitive raw data, each participant runs the client-side DNN on \(x_{r}^{\prime }\) to extract the features as \(x_{l}=\mathcal {F}(x_{r}^{\prime })\). Then, we apply the norm bounding method to enforce a certain global sensitivity as follows:

$$ x_{l}^{\prime}=x_{l}/\max\left( 1,\frac{||x_{l}||_{\infty}}{B}\right) $$
(5)

where \(||x_{l}||_{\infty }\) represents the infinity norm of the l-th layer outputs. This formula indicates that \(x_{l}^{\prime }\) is upper bounded by B, meaning that the sensitivity of xl is preserved as long as \(||x_{l}||_{\infty }\leq B\), whereas the features are scaled down when \(||x_{l}||_{\infty }>B\). According to [25], the bound B is usually set to the median of \(||x_{l}||_{\infty }\). The Laplace perturbation (with scale calibrated to B) is then added to the bounded features \(x_{l}^{\prime }\) to further preserve privacy as follows:

$$ \tilde{x}_{l}=x_{l}^{\prime}+Lap(B/\epsilon\cdot I) $$
(6)

Note that the Laplace noise is added to the final output of the convolutional layers, and the noise scale b = B/𝜖 (the diversity of the Laplace mechanism) also has a high impact on the model performance. Since each client-side DNN has the same network structure, we use the same notation \(\tilde {x}_{l}\) to represent the perturbed features of all participants.
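Putting the steps together, the sketch below illustrates the client-side pipeline of nullification, feature extraction, norm bounding (Eq. 5), and Laplace perturbation (Eq. 6) in PyTorch; the function name is hypothetical, and the Bernoulli mask only approximates a nullification matrix with exactly Sup(n⋅μ) zeros.

```python
import torch

def perturb_features(x_r, client_dnn, mu, B, epsilon):
    """Client-side perturbation: returns differentially private features."""
    # 1) Nullification: each entry of I_n is 0 with probability mu, so
    #    roughly a fraction mu of the raw input is masked.
    I_n = (torch.rand_like(x_r) >= mu).float()
    x_r_masked = x_r * I_n
    # 2) Feature extraction with the frozen client-side DNN.
    with torch.no_grad():
        x_l = client_dnn(x_r_masked)
    # 3) Norm bounding (Eq. 5): enforce ||x_l||_inf <= B.
    inf_norm = x_l.abs().max().item()
    x_l = x_l / max(1.0, inf_norm / B)
    # 4) Laplace perturbation (Eq. 6) with scale b = B / epsilon.
    noise = torch.distributions.Laplace(0.0, B / epsilon).sample(x_l.shape)
    return x_l + noise
```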

4.3 FedMEC algorithm

As mentioned above, federated learning allows the clients to locally train their models in a distributed manner and to upload only their local model updates (i.e., gradients) instead of sharing their private data samples with the central server. In our differentially private federated learning system, we assume there are N sampled users who agree on a common learning objective and model structure, and each edge client owns its private dataset. In each iteration, an edge client downloads the partitioned deep neural network model and feeds its local private dataset into the client-side DNN model to generate the features. It then adds calibrated Laplace noise to the features. After that, all the clients send the perturbed features to the edge server, and the edge server trains the edge-side DNN model on those features to generate the local model updates using stochastic gradient descent (SGD) [10]. Finally, the local model updates are aggregated and averaged on the cloud server side to jointly optimize the current global model. The above steps are executed iteratively until the global model converges. The pseudo-code of the overall FedMEC scheme is shown in Algorithm 1.

Algorithm 1 FedMEC

According to the standard federated learning protocol [13], after the Laplace perturbation is added to the features extracted by the client-side DNN, all the perturbed features are fed to the edge-side DNN to generate the local model updates by running the SGD algorithm. For simplicity, we use \(\tilde {x}_{i}\) to represent the i-th participant's update (i.e., participant i's perturbed features), where \(i\in \left [1,n\right ]\). The SGD mechanism is an optimization method that finds the parameters w by minimizing the loss function \({\mathscr{L}}(w,\tilde {x}_{i})\). In a given communication round t, the SGD algorithm first computes the gradient \(g_{t}^{(i)}\) for the input features \(\tilde {x}_{i}\) as follows:

$$ g_{t}^{(i)}=\nabla_{w_{t}}\mathcal{L}(w_{t},\tilde{x}_{i}) $$
(7)

For the gradient descent process during the local training procedure, we utilize the distributed selective SGD (DSSGD) mechanism instead of the conventional SGD algorithm to achieve distributed computation capability. DSSGD splits the weights wt and the gradient gt into n parts, namely \(w_{t}=({w_{t}^{1}},\cdots ,{w_{t}^{n}})\) and \(g_{t}=({g_{t}^{1}},\cdots ,{g_{t}^{n}})\), so the local parameter update rule becomes:

$$ {w}_{t+1}^{(i)}={w}_{t}^{(i)}-\eta\cdot g_{t}^{(i)} $$
(8)

Then the local model update is calculated as the difference between consecutive local parameters:

$$ {\varDelta} {w}_{t+1}^{(i)}={w}_{t+1}^{(i)}-{w}_{t}^{(i)} $$
(9)
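For concreteness, the following is a minimal sketch of the edge-side update of Eqs. (7)-(9) using PyTorch autograd; it assumes `edge_side` is the dense part of the partitioned network from Section 4.1 and, for brevity, omits the selective-parameter splitting of DSSGD.

```python
import copy
import torch
import torch.nn.functional as F

def edge_side_update(edge_side, x_tilde, labels, eta):
    """Compute one participant's local update from its perturbed features."""
    w_t = copy.deepcopy(edge_side.state_dict())         # snapshot of w_t
    loss = F.cross_entropy(edge_side(x_tilde), labels)  # L(w_t, x_tilde)
    edge_side.zero_grad()
    loss.backward()                                     # g_t (Eq. 7)
    with torch.no_grad():
        for p in edge_side.parameters():
            p -= eta * p.grad                # w_{t+1} = w_t - eta * g_t (Eq. 8)
    w_t1 = edge_side.state_dict()
    return {k: w_t1[k] - w_t[k] for k in w_t}           # Delta w_{t+1} (Eq. 9)
```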

Finally, each edge server sends the local model updates \({\varDelta } {w}_{t+1}^{(i)}\) to the cloud server, which executes the federated averaging procedure:

$$ {w}_{t+1}^{(global)}={w}_{t}^{(global)}+\frac{1}{n}\sum\limits_{i=1}^{n}{\varDelta} {w}_{t+1}^{(i)}, $$
(10)

The whole federated learning procedure is executed iteratively until the global model converges.
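Since Algorithm 1 ties the three entities together, we close this section with a minimal sketch of one FedMEC communication round; the client, edge-server, and cloud objects and their method names are hypothetical placeholders for the roles described above.

```python
def fedmec_round(global_weights, clients, edge_server, cloud_server, eta):
    """One communication round of FedMEC (illustrative sketch)."""
    local_updates = []
    for client in clients:
        # Client side: nullify the raw data, extract features with the
        # frozen client-side DNN, and add Laplace noise (Section 4.2).
        x_tilde, labels = client.perturbed_features()
        # Edge side: start from the current global weights and run the
        # forward/backward passes on the edge-side DNN (Eqs. 7-8).
        w_new = edge_server.train(global_weights, x_tilde, labels, lr=eta)
        # Local model update (Eq. 9).
        local_updates.append([wn - wg for wn, wg in zip(w_new, global_weights)])
    # Cloud side: federated averaging over all local updates (Eq. 10).
    return cloud_server.aggregate(global_weights, local_updates)
```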

5 Experimental evaluation

In this section, we conduct a series of experiments on an image classification task and a real mobile application system to evaluate the performance and applicability of our proposed FedMEC. We first examine the effectiveness of our differentially private data perturbation method by applying a convolutional denoising autoencoder to visualize the noise and the reconstruction. Then, we verify the model accuracy under different perturbation strengths (μ, b) on a classification task, and evaluate the mean accuracy trends of the global model by fixing one of the perturbation parameters and varying the other. In addition, comparison experiments are conducted to assess the performance of our scheme against three related works: no-privacy [13], local-DP [25], and central-DP [27]. Finally, to verify the applicability of FedMEC, we implement it on an Android system on a real mobile device and monitor the CPU temperature and CPU frequency.

In our experiments, we run the federated learning protocol on an image classification task to evaluate our FedMEC mechanism. The classification task is conducted on the handwritten digit image dataset MNIST, which contains 60,000 training samples and 10,000 testing samples of the digits 0 through 9 (i.e., 10 classes), each of size 28×28 pixels. For the experimental settings, a general deep neural network is implemented in FedMEC, consisting of 3 convolutional layers and 2 dense layers. The kernel size of all three convolutional layers is 3×3, and their stride is set to 2. The activation function applied throughout the DNN structure is the leaky ReLU (LReLU). As mentioned in Section 4, the perturbation strength (μ, b) comprises the main parameters of our FedMEC scheme, where μ is the nullification rate and b is the diversity of the Laplace mechanism. The evaluation indexes are the model accuracy on the image classification task and the CPU frequency and temperature of the mobile device.

5.1 Effectiveness of data perturbation

According to FedMEC, the perturbation strength (μ, b) comprises two critical parameters of the designed differentially private data perturbation method. Thus, we adopt a convolutional denoising autoencoder [35] under federated learning settings to visualize the noise and the reconstruction, where the perturbation strength is represented by (μ, b). We train our model under two perturbation strengths: (μ = 1%, b = 1) and (μ = 10%, b = 5).

The visualized noise and reconstruction results are shown in Fig. 5. The three rows in this figure, from top to bottom, show the original samples, the perturbed samples, and the reconstructed samples, respectively. According to the perturbation and reconstruction results, the perturbed digits can be reconstructed to a certain degree at a perturbation strength of (μ = 1%, b = 1), as shown in Fig. 5a. However, as shown in Fig. 5b, it is hard to reconstruct the original digits when the perturbation strength reaches (μ = 10%, b = 5), even if the perturbed data is public.

Fig. 5 Visualization of noise and reconstruction

5.2 Performance under different data perturbations

The standard federated learning protocol allows each participant to train on their data locally and upload only the model parameters. However, edge device users may not strictly adopt the perturbation strength specified by the learning service provider; that is, they may change their perturbation strength before sending data to the edge server. Thus, it is necessary to evaluate the impact of our differentially private data perturbation mechanism on the global model accuracy under different perturbation strengths. In our experiments, we consider two scenarios in which the number of edge clients n is 100 and 300, and training stops when the communication round reaches 30 for n = 100 and 50 for n = 300. The goal of this group of experiments is to assess the changes in accuracy under two different perturbation strengths, (μ = 10%, b = 3) and (μ = 20%, b = 2). From the results shown in Fig. 6, we can see that the model attains high accuracy very quickly, within a few communication rounds, in both the 100-client and 300-client settings, meaning that our FedMEC scheme works well under federated learning settings while providing sufficient privacy guarantees. Besides, since the perturbation strengths (μ, b) selected in our experiments are small, the accuracy of the federated model does not differ much across the settings.

Fig. 6 Classification accuracy on different perturbations

5.3 Impact of perturbation strength

We also design a group of experiments to evaluate the mean model accuracy by changing one of the parameters of the perturbation strength (μ, b) while keeping the other fixed. Here, we compute the mean accuracy for each parameter setting by averaging all the results over 30 communication rounds for 100 clients and 50 rounds for 300 clients. As shown in Fig. 7, our FedMEC scheme achieves more than 85% classification accuracy for all the parameter combinations. Besides, as the perturbation strength gradually increases, the model accuracy shows a decreasing trend, because large perturbations on the features negatively impact the prediction stage. Moreover, the influence of parameter b is much stronger than that of μ, since b is the key factor determining the differential privacy level. Despite this, the variation in classification accuracy is less than 5%, which shows the stability and validity of our FedMEC scheme.

Fig. 7 Mean accuracy on μ and b

5.4 Comparison evaluation

To further illustrate its effectiveness, we compare the performance of our proposed FedMEC with three other schemes on an image classification task. The comparison schemes are as follows:

  • No-privacy [13]: a standard federated learning approach where all the participants train a complete neural network on the local side and upload the corresponding model updates to the central server.

  • Central-DP [27]: a randomized mechanism at the server side that provides differential privacy in the federated learning approach, in which a single client's local model updates can be hidden during the aggregation phase.

  • Local-DP [25]: each client adds enough noise during the local SGD training procedure to protect the training data from being disclosed by the untrusted server.

All four schemes are implemented under the same federated learning settings as described previously. Besides, we set the perturbation strength of our FedMEC scheme to (μ = 20%, b = 2) and use the suggested noise sizes for the other three schemes. Figure 8 shows the classification accuracy for 300 clients over 50 communication rounds. Note that the "no-privacy" approach is the baseline standard federated learning approach, and it achieves the highest classification accuracy (approximately 0.95) in our experiment. Our FedMEC scheme achieves a classification score of about 0.85, which is much higher than the "local-DP" approach (0.75) but slightly lower than the "central-DP" approach (0.9). The main reason for this phenomenon is that the "central-DP" method only works in the context of a fully trusted federated learning scenario, while FedMEC provides a local-side differential privacy guarantee even when the edge and cloud servers are not trusted.

Fig. 8 Accuracy comparison over communication rounds

5.5 Implementation on android

Finally, to demonstrate the applicability of our proposed FedMEC framework, we implement the whole scheme as a demo system on a HUAWEI MATE20 PRO and a DELL INSPIRON 15 laptop. The mobile device is equipped with ARM Cortex-A76@2.6GHz, ARM Cortex-A76@1.92GHz, and ARM Cortex-A55@1.8GHz cores, and the laptop is equipped with an Intel Core i5-8250U@3.4GHz and AMD Radeon-530 graphics. For the local training samples, we randomly chose 3000 images from the MNIST dataset to investigate the CPU status of the mobile device during the client-side DNN training procedure, where all the images are processed consecutively. Similar to [14], we use TensorFlow to generate a deployable model for our Android system, and the experimental results are shown in Fig. 9. Specifically, the CPU temperature increases while our demo system is executing, because the local training procedure consumes the computation resources of the mobile device. However, it stabilizes at peak values in the interval of 40°C-45°C, which indicates that our FedMEC scheme has a fully acceptable performance in terms of resource consumption. Besides, the trend of the CPU frequency further illustrates that the client-side DNN training process occupies only slightly more than half of the CPU frequency. Therefore, it is practical to implement FedMEC on resource-constrained mobile devices.

Fig. 9 CPU status during client training procedure

6 Conclusion

In this paper, we propose FedMEC, a federated learning scheme for the mobile edge computing environment that enables a highly efficient federated learning protocol while protecting the privacy of users' training data. The designed FedMEC framework splits the complete deep neural network used in the local training procedure into two parts: a client-side DNN and an edge-side DNN. A large part of the heavy computation is offloaded to the edge and cloud servers, while the mobile devices merely undertake the lightweight feature extraction part, thereby reducing the computation complexity on the mobile edge devices. Furthermore, to minimize client-side privacy leakage during the training phase, we introduce a differentially private data perturbation mechanism that adds Laplacian random noise to the client-side features before they are uploaded to the edge server. Extensive experimental results on a benchmark dataset demonstrate that our proposed FedMEC scheme not only achieves high model accuracy but also provides sufficient privacy guarantees. Finally, we implement FedMEC in a real Android application system to show its applicability and verify its computation overhead. In future work, we plan to explore the optimal perturbation strength for the differentially private data perturbation mechanism.