Abstract
Federated learning has made an important contribution to preserving data privacy. Many previous works are based on the assumption that the data are independent and identically distributed (IID). As a result, model performance on non-identically-distributed (non-IID) data, which is the situation encountered in practice, falls short of expectations. Some existing methods for ensuring model robustness on non-IID data, such as data-sharing strategies or pre-training, may lead to privacy leakage. In addition, some participants try to poison the model with low-quality data. In this paper, a performance-based parameter return method for optimization is introduced, which we term FederatedSmart (FedSmart). It optimizes a different model for each client through sharing global gradients, extracts a portion of the data from each client as a local validation set, and uses the accuracy the model achieves in round t to determine the weights for the next round. The experimental results show that FedSmart enables each participant to allocate greater weights to the ones with similar data distributions.
1 Introduction
Building high-quality machine learning models while working with different data owners poses challenges for user data security and confidentiality [15]. In the past, there have been many attempts to address user privacy issues when exchanging data. For example, Apple recommends using Differential Privacy (DP) to respond to these concerns [3]. The basic idea is to add appropriately calibrated noise to the data in order to hide the identity of any individual while still retaining its statistical characteristics [2]. However, DP can only prevent user information leakage to a certain extent. In addition, it is lossy in a machine learning framework because the model is built with injected noise, which can lower model performance.
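The calibrated-noise idea can be sketched as follows. This is a minimal illustration of the basic Laplace mechanism, not Apple's actual implementation; the function name and parameters are chosen for illustration only.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    # Basic Laplace mechanism: noise scale = sensitivity / epsilon masks any
    # single individual's contribution while roughly preserving the statistic.
    rng = rng or np.random.default_rng()
    return value + rng.laplace(0.0, sensitivity / epsilon)

# Releasing a count query (sensitivity 1) with a privacy budget of epsilon = 1.
true_count = 1000
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=1.0)
```

The smaller the privacy budget epsilon, the larger the noise scale, which is exactly the utility loss the paragraph above refers to.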
Federated Learning (FL) is a cross-distributed data modelling method proposed by [10, 11]. It can establish a global model without exchanging original data among parties. As the volume of participating data grows, the model naturally achieves better global robustness and superiority over individual modelling.
Subsequently, [1] proposed the concept of vertical FL to make it suitable for more realistic scenarios. Since then, many scholars have begun to study the application of FL in real scenarios and have proposed new algorithms and frameworks, such as SplitNN [13].
[4] reveals the problem of multiple distributions across different data islands through joint clustering and FL. Through experiments with five model structures on four different datasets, [10] demonstrated that the iterative averaging model can be robust under both IID and non-IID data distributions. However, the iterative approach is not as perfect as imagined: on non-IID data, it requires more rounds to reach sufficient convergence, and the final model trained with the same optimal parameters is always slightly inferior to that obtained under an IID distribution.
Almost all FL optimization algorithms aim at training a single global model. However, in real scenarios, there exist clients who want to train a personalized model by absorbing useful information from others with similar data properties. In addition, some dishonest participants try to cheat with useless data in order to gain a high-quality model.
Motivated by these real demands, we design FedSmart, a performance-based optimization algorithm that updates automatically. Our main contributions are as follows:

1. Demonstrate the impact and performance of using non-IID data on both FL frameworks and local training.

2. Adopt independent validation sets on each side instead of shared datasets to improve model performance on non-IID data.

3. Propose a new parameter joint method, FedSmart, to make the multi-party joint value of the stochastic gradient descent close to an unbiased estimate of the complete gradient.
2 Related Work
In some cases, thanks to the sophistication of some existing machine learning algorithms, training results on non-IID data are still good. However, for some application scenarios, training with non-IID data under existing frameworks has unexpected negative effects, such as low model accuracy and slow convergence. Because the data on each device are generated independently by the device/user itself, the heterogeneous data of different devices/users have different distribution characteristics, and the training data seen by each device during local learning are non-IID. Therefore, improving learning efficiency on non-IID data is of great significance for FL.
2.1 Average-Based Optimization Algorithm
To improve the performance of FL and reduce its communication cost, [10] proposed FederatedAveraging (FedAvg), a deep network training algorithm based on iterative model averaging that can be applied to non-IID FL in real scenarios. Theoretical analysis and experimental results show that FedAvg is robust to unbalanced and non-IID data and also has a low communication cost. Compared with the baseline algorithm FedSGD, FedAvg has better practicability and effectiveness. [9] theoretically clarifies the convergence of FedAvg on non-IID data. Furthermore, FedMA is aimed at settling the heterogeneity problem [14].
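The iterative model-averaging step at the heart of FedAvg can be sketched as follows; this is a simplified illustration (real implementations average full network weights after several local epochs), with names chosen for illustration.

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    # One FedAvg aggregation step: average client parameter vectors,
    # weighted by each client's local sample count.
    total = sum(client_sizes)
    return sum((n / total) * p for p, n in zip(client_params, client_sizes))

# Two clients with unequal data volumes: the larger client dominates the average,
# which is one source of unfairness on non-IID data.
p1, p2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
global_params = fedavg_aggregate([p1, p2], client_sizes=[300, 100])  # -> [0.75, 0.25]
```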
2.2 Performance-Based Optimization Algorithm
The FedAvg method greatly inspired follow-up research [15]. [16] proposes a data-sharing FL strategy that improves training on non-IID data by creating a small portion of data globally shared among all client devices through a central server.
Local client computational complexity, communication cost, and test accuracy are three important issues addressed by [5]. It proposes a loss-based AdaBoost federated machine learning algorithm (LoAdaBoost), which further optimizes the local model with high cross-entropy loss before averaging the gradients on the central server.
FedProx, proposed by [12], lowers the potential damage to the model caused by non-IID data: it adds a proximal term to the local objective to constrain the local iterations. Similarly, SCAFFOLD introduces a new control variable combined with the gradients, decreasing the variance of the local iterations [8].
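The proximal idea in FedProx can be illustrated with a short sketch; the helper name and the choice of mu are illustrative, not taken from [12].

```python
import numpy as np

def fedprox_local_loss(task_loss, w, w_global, mu=0.01):
    # FedProx-style local objective: the usual task loss plus
    # (mu / 2) * ||w - w_global||^2, which discourages local iterates
    # from drifting far from the global model on heterogeneous data.
    return task_loss + (mu / 2.0) * float(np.sum((w - w_global) ** 2))

w_local = np.array([1.0, 2.0])
w_global = np.array([0.0, 0.0])
penalized = fedprox_local_loss(0.5, w_local, w_global, mu=0.1)  # 0.5 + 0.05 * 5.0
```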
3 Approach
In FL research, scholars usually focus on the algorithmic framework or on improving the accuracy of the global model. However, we generally do not know the data distribution or data quality of the other participants, and their heterogeneous data may result in worse performance when added to the global training.
With these motivations, we propose FedSmart, a new parameter return method. In this mechanism, an FL participant is smart enough to gain information from others who have similar data properties. From another perspective, FedSmart can be used to test whether the models from other clients are useful on every client's side. Furthermore, FedSmart can be treated as a latent incentive mechanism: selfish sides who provide fabricated or low-quality data are naturally filtered out through decreasing weights, and only those who provide valuable data benefit from the group with similar distributions.
3.1 The Information Transfer Framework
The framework of FL is adopted. There typically exists a server, which controls and publishes the model and jointly deals with the parameters provided by participants. The participants who contribute parameters by doing local model training are called clients.
All clients train respectively on their local data. After each local update, every client sends the gradient computed on its local data to the server; the server packs these updates and sends the full tuple back, i.e. \(\Delta \Theta ^t = (\Delta \theta _1^t, \Delta \theta _2^t, ..., \Delta \theta _n^t)\) (see Fig. 1).
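A minimal sketch of this communication round follows. The Client class and the server interface here are hypothetical stand-ins for illustration; the actual system also encrypts the exchanged parameters.

```python
class Client:
    # Minimal stand-in for a participant; the real interface is hypothetical.
    def __init__(self, grad):
        self.grad = grad      # gradient this client would produce locally
        self.inbox = None     # last packed tuple received from the server

    def local_update(self, model):
        # Pretend one round of local training on private data yields this gradient.
        return self.grad

    def receive(self, packed):
        self.inbox = packed


def federated_round(model, clients):
    deltas = [c.local_update(model) for c in clients]  # each client's gradient
    packed = tuple(deltas)                             # the full tuple of all client updates
    for c in clients:                                  # server sends it back to everyone
        c.receive(packed)
    return packed


clients = [Client(0.1), Client(-0.2)]
packed = federated_round(model=None, clients=clients)
```

Because every client receives the full tuple of updates rather than a single averaged model, each side can weight the others' contributions independently, which is what the next subsections exploit.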
3.2 The Local Model Updating Mechanism
The local model updating mechanism considers the mutual predictive ability of non-IID data. If all clients train only one global model, it will inevitably lead to discrimination by distribution or sample size. FedSmart is designed to update each local model through weights, which biases the model toward its own side's data. This approach in effect optimizes the server model with the data from each client.
At initialization, the server initializes the model. When all clients have received the initial model, they conduct one batch of training and then launch the information transfer described above.
3.3 Performance-Based Weight Allocation
The weight for the next round is allocated on the basis of the equation below. The performance of all clients is taken into consideration; the principle, in brief, is that the weight of each model is smartly adjusted according to the accuracy achieved by each client:

\(w_i^{t+1} = w_i^t + \eta \,(acc_i^t - acc^t_{median})\)

where \(acc_i^t\) represents the accuracy of Client i on its local validation set in round t, \(acc^t_{median}\) is the median of those accuracies, and \(\eta \) is the learning rate. The weight in round \(t+1\) is thus allocated according to the weight in the previous round and the accuracy in this round. The validation set is extracted from each client's data with a proportion \(\alpha \in [0,1]\) and serves only that client.
In FedSmart, we update the model according to the performance on validation set, which makes the model adaptive to self-side data. To conclude, FedSmart actually optimizes model of each client with valuable data from others.
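The performance-based weight allocation can be sketched as follows, assuming a median-relative update; the clamp at zero and the renormalization are our illustrative assumptions, and the exact rule in the paper may differ.

```python
import statistics

def fedsmart_weights(prev_weights, accuracies, eta=0.5):
    # Raise the weight of clients whose local validation accuracy beats the
    # median, and lower the others; then renormalize so the weights sum to 1.
    med = statistics.median(accuracies)
    raw = [max(w + eta * (a - med), 0.0) for w, a in zip(prev_weights, accuracies)]
    total = sum(raw) or 1.0
    return [r / total for r in raw]

# Four clients starting from equal weights; Client 0 beats the median accuracy
# and gains weight, Client 2 falls below it and loses weight.
w = fedsmart_weights([0.25, 0.25, 0.25, 0.25], [0.80, 0.70, 0.60, 0.70], eta=0.5)
```

Repeating this round after round is what lets each client concentrate weight on the peers whose data distribution resembles its own.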
4 Experiment
The experimental settings are described step by step, including how the dataset is processed and how FedSmart is configured. We also explain the impact of different parameters on model performance and demonstrate the mechanism of using the validation set.
4.1 Implementation Details
The data concerned in the performance evaluation are simulated datasets derived from the MIMIC-III database [6, 7], which contains health information for critical care patients at a large tertiary care hospital in the U.S. The data cleansing process follows [5].
The experimental data structure is shown in Table 1.
4.2 Experiment Settings
To illustrate the limited performance of FL on non-IID data, the data are constructed as a collection of six heterogeneous datasets. In detail, Client 1 and Client 4, Client 2 and Client 5, and Client 3 and Client 6 share similar data distributions pairwise.
The validation set proportion \(\alpha \) is set to 0.25 by default throughout the experiments.
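Holding out the validation set on each client's side might look like the following sketch; the function name and interface are illustrative.

```python
import random

def split_local_data(samples, alpha=0.25, seed=0):
    # Hold out a fraction alpha of this client's samples as its private
    # validation set; the remainder is used for local training. The
    # validation set never leaves the client.
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = int(len(samples) * alpha)
    val = [samples[i] for i in idx[:cut]]
    train = [samples[i] for i in idx[cut:]]
    return train, val

train, val = split_local_data(list(range(100)), alpha=0.25)
```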
4.3 Results
The essence of centralized training is to aggregate the data of all parties to improve model accuracy by increasing the amount of data, so the results of centralized training are often better than those of each client training alone on its own dataset. However, when the data are non-IID, centralized training struggles to balance the results. The model tends to favor groups with large samples or simple distributions, so the established global model is undoubtedly unfair to the other groups.
FedSmart v.s. Local Training. The model trained with FedSmart outperforms the six locally trained ones (see Fig. 2), which matches the expectation that every FL participant gains a better model within the information-sharing framework than by using only its own data: compared with working alone, every contributor in a team that shares data information tends to gain something.
FedSmart v.s. FedAvg v.s. LoAdaBoost. To illustrate the effectiveness of FedSmart, we compare FedSmart, FedAvg, and LoAdaBoost. In FedAvg, the server only receives model parameters and returns updated model parameters; there is no interactive updating mechanism. LoAdaBoost receives both the loss and the parameters of the model and combines the two to update the weight of the previous iteration [5]. FedSmart adopts a different parameter combination for each client to make the update approximate an unbiased estimate of the complete gradient. The result is shown in Fig. 3.
It can be seen that, regardless of the FL optimization algorithm, performance on IID data always exceeds that on non-IID data. One of the most important goals of an FL optimization algorithm is to decrease the influence of data distribution, i.e., the performance drop on non-IID data. FedSmart uses validation-set accuracy to measure distribution similarity and establishes multiple models, one per client, by adjusting the weights assigned to different client models, using only encrypted parameter exchange. The results show that its performance is significantly better than FedAvg and moderately better than LoAdaBoost.
FedSmart. FedSmart considers a single party's distribution without repeatedly compromising across multiple distributions. To further explain its working mechanism and performance, the evolution of the parameter joint weights during training is shown in Fig. 4. The weights change in pairs: Client 1 and Client 4, Client 2 and Client 5, Client 3 and Client 6, which is in accordance with our experimental settings and indicates that FedSmart identifies clients with useful data.
We observe that in FL there can still be unbalanced performance improvements on some sides due to differences in distribution. When only one global model is established, reducing the global loss and improving overall accuracy inevitably means that one of the distributions is ignored to some extent, decreasing that side's improvement. As long as there is only one global model, such a trade-off must occur. Therefore, for non-IID data, it is necessary to consider how to create multiple models suited to different distributions, making FL more universal.
5 Conclusion
Federated Learning is attracting attention in both academia and industry, as it addresses the isolated-island problem and offers a solution for privacy preservation. We propose a performance-based parameter return method, FedSmart. It differs from the general idea of sharing one global model in FL: instead, FedSmart establishes multiple models by treating each client as a server so that each client's own model performs best. We use the simulated MIMIC-III data, separated into six non-IID datasets, for the FL experiments. The experimental results show that FedSmart can outperform FedAvg and even the centralized training method. FedSmart can be extended to industrial data training scenarios.
In the continuation of our study, to compensate for this shortcoming and minimize the privacy leakage caused by model delivery, FedSmart can adopt a drop-out-like mechanism that makes it difficult for training participants to obtain effective information from the changes of the model. We will also improve and explore the FedSmart algorithm to make it generally stable and adaptable for both IID and non-IID datasets, tackling the root of the problems in FL frameworks.
References
Cheng, K., Fan, T., Jin, Y., et al.: Secureboost: a lossless federated learning framework. arXiv preprint arXiv:1901.08755 (2019)
Dwork, C.: Differential privacy: a survey of results. In: Agrawal, M., Du, D., Duan, Z., Li, A. (eds.) TAMC 2008. LNCS, vol. 4978, pp. 1–19. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79228-4_1
Greenberg, A.: Apple's 'Differential Privacy' is about collecting your data-but not your data. https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/. Accessed 22 May 2020
Huang, L., et al.: Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. J. Biomed. Inform. 99, 103291 (2019). https://doi.org/10.1016/j.jbi.2019.103291
Huang, L., et al.: LoAdaBoost: loss-based AdaBoost federated machine learning with reduced computational complexity on IID and non-IID intensive care data. PLoS ONE 15(4), e0230706 (2020). https://doi.org/10.1371/journal.pone.0230706
Johnson, A.E., Pollard, T.J., Mark, R.G.: The MIMIC-III clinical database. PhysioNet (2016) https://doi.org/10.13026/C2XW26
Johnson, A.E., et al.: MIMIC-III, a freely accessible critical care database. Sci. Data. 3, 1–9 (2016). https://doi.org/10.1038/sdata.2016.35
Karimireddy, S.P., et al.: SCAFFOLD: stochastic controlled averaging for on-device federated learning. arXiv preprint arXiv:1910.06378 (2019)
Li, X., et al.: On the convergence of fedavg on non-iid data. In: 2020 International Conference on Learning Representations (ICLR) (2020)
McMahan, B., et al.: Federated learning of deep networks using model averaging. arXiv preprint arXiv:1602.05629 (2017)
McMahan, B., et al.: Communication-efficient learning of deep networks from decentralized data. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54, pp. 1273–1282 (2017)
Sahu, A.K., et al.: Federated optimization for heterogeneous networks. arXiv preprint arXiv:1812.06127 (2018)
Vepakomma, P., et al.: Split learning for health: distributed deep learning without sharing raw patient data. arXiv preprint arXiv:1812.00564 (2018)
Wang, H., et al.: Federated learning with matched averaging. In: 2020 International Conference on Learning Representations (ICLR) (2020)
Yang, Q., et al.: Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 10(2), 1–19 (2019)
Zhao, Y., et al.: Federated learning with non-iid data. arXiv preprint arXiv:1806.00582 (2018)
Acknowledgements
This paper is supported by National Key Research and Development Program of China under grant No.2018YFB1003500, No.2018YFB0204400 and No.2017YFB1401202.
© 2020 Springer Nature Switzerland AG
He, A., Wang, J., Huang, Z., Xiao, J. (2020). FedSmart: An Auto Updating Federated Learning Optimization Mechanism. In: Wang, X., Zhang, R., Lee, YK., Sun, L., Moon, YS. (eds) Web and Big Data. APWeb-WAIM 2020. Lecture Notes in Computer Science(), vol 12317. Springer, Cham. https://doi.org/10.1007/978-3-030-60259-8_52