
1 Introduction

Building high-quality machine learning models across different data owners is challenging because of user data security and confidentiality requirements [15]. In the past, there have been many attempts to address user privacy when exchanging data. For example, Apple recommends Differential Privacy (DP) to address these concerns [3]. The basic idea is to add appropriately calibrated noise to the data so that the identity of any individual is hidden while the statistical characteristics are retained [2]. However, DP can only prevent user information leakage to a certain extent. In addition, it is lossy in a machine learning setting, because the model is built on noise-injected data, which can lower model performance.
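To make the noise-addition idea concrete, below is a minimal sketch of the Laplace mechanism commonly used for DP; the function name and parameters are our illustration, not Apple's implementation.

```python
# A minimal sketch of the Laplace mechanism behind DP (our illustration).
# Noise scaled to sensitivity/epsilon is added to a query result, hiding any
# individual's contribution while roughly preserving aggregate statistics.
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: np.random.Generator = np.random.default_rng()) -> float:
    # Smaller epsilon means stronger privacy but larger noise (more "loss").
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)
```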

Federated Learning (FL) is a method for modelling across distributed data proposed by [10, 11]. It can establish a global model without exchanging raw data among parties. As the pool of participating data grows, the global model naturally achieves better robustness and outperforms individually trained models.

Subsequently, [1] proposed the concept of vertical FL to adapt FL to more realistic scenarios. Since then, many scholars have studied FL applications in real scenarios and proposed new algorithms and frameworks, such as SplitNN [13].

[4] reveals the multi-distribution problem across different data islands through joint clustering and FL. Through experiments with five model structures on four different datasets, [10] demonstrated that the iterative model-averaging approach can be robust under both IID and non-IID data distributions. However, the iterative approach is not as perfect as it may appear. On non-IID data, it requires more rounds to reach sufficient convergence, and the final model trained with the same optimal parameters is always slightly inferior to that obtained under an IID distribution.

Almost all FL optimization algorithms aim at training a single global model. In real scenarios, however, there exist clients who want to train a personalized model by absorbing useful information from others with similar data properties. In addition, some dishonest participants may try to cheat with useless data in order to obtain a high-quality model.

Motivated by these real demands, we design FedSmart, a performance-based optimization algorithm that updates local models automatically. Our main contributions are as follows:

  1. We demonstrate the impact of non-IID data on the performance of both FL frameworks and local training.

  2. We adopt an independent validation set on each side, instead of a shared dataset, to improve model performance on non-IID data.

  3. We propose a new parameter-joining method, FedSmart, which makes the multi-party joint value of stochastic gradient descent close to an unbiased estimate of the complete gradient.

2 Related Work

In some cases, thanks to the sophistication of existing machine learning algorithms, training results on non-IID data are still good. For some application scenarios, however, training with non-IID data has unexpected negative effects under existing frameworks, such as low model accuracy and slow convergence. Because the data on each device are generated independently by the device or user, the heterogeneous data of different devices/users follow different distributions, and the training data seen by each device during local learning are non-IID. Therefore, improving learning efficiency on non-IID data is of great significance for FL.

2.1 Average-Based Optimization Algorithm

To improve the performance of FL and reduce communication cost, [10] proposed FederatedAveraging (FedAvg), a deep-network training algorithm based on iterative model averaging that can be applied to non-IID FL in real scenarios. Theoretical analysis and experimental results show that FedAvg is robust to unbalanced and non-IID data, and it also has a low communication cost. Compared with the baseline algorithm FedSGD, FedAvg has better practicability and effectiveness. [9] theoretically clarifies the convergence of FedAvg on non-IID data. Furthermore, FedMA aims to settle the heterogeneity problem [14].
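As a point of reference, the FedAvg aggregation step can be sketched as below; this is our paraphrase of the algorithm in [10] with illustrative names, not the original implementation.

```python
# FedAvg server aggregation (a sketch of [10], not the original code):
# average client parameters weighted by local dataset sizes.
from typing import List
import numpy as np

def fedavg_aggregate(client_params: List[np.ndarray],
                     client_sizes: List[int]) -> np.ndarray:
    """Return sum_k (n_k / n) * theta_k over all participating clients."""
    total = float(sum(client_sizes))
    return sum((n / total) * theta
               for theta, n in zip(client_params, client_sizes))
```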

2.2 Performance-Based Optimization Algorithm

The FedAvg method has greatly inspired follow-up research [15]. [16] proposes a data-sharing FL strategy that improves training on non-IID data by creating a small portion of data globally shared among all client devices on a central server.

Local computational complexity, communication cost, and test accuracy are three important issues addressed by [5]. It proposes a loss-based AdaBoost federated machine learning algorithm (LoAdaBoost), which further optimizes local models with high cross-entropy loss before averaging the gradients on the central server.

FedProx, proposed by [12], lowers the potential damage to the model caused by non-IID data. It adds a proximal term to the local objective to control the local iterations. Similarly, SCAFFOLD introduces control variates combined with the gradients, decreasing the variance of local updates [8].
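For intuition, the FedProx local objective can be sketched as follows; the loss function and the value of \(\mu \) are illustrative assumptions on our part.

```python
# FedProx local objective (a sketch of [12]): the proximal term
# (mu/2) * ||theta - theta_global||^2 keeps local updates close to the
# global model, limiting client drift on non-IID data.
from typing import Callable
import numpy as np

def fedprox_objective(theta: np.ndarray, theta_global: np.ndarray,
                      local_loss: Callable[[np.ndarray], float],
                      mu: float = 0.01) -> float:
    proximal = 0.5 * mu * float(np.sum((theta - theta_global) ** 2))
    return local_loss(theta) + proximal
```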

3 Approach

In FL research, scholars usually focus on the algorithmic framework or on improving global model accuracy. However, we generally do not know the data distribution or data quality of other participants, and heterogeneous data may degrade performance when added to the global training.

With these motivations, we propose FedSmart, a new parameter-return method. Under this mechanism, an FL participant is smart enough to gain information from others with similar data properties. From another perspective, FedSmart can test whether the model from another client is useful to each client's side. Furthermore, FedSmart can be treated as a latent incentive mechanism: selfish sides that provide unrealistic or unqualified data are naturally filtered out as their weights decrease, and only those who provide valuable data benefit from the group with similar distributions.

3.1 The Information Transfer Framework

The framework of FL is adopted. There typically exists a server, which controls and publishes the model and jointly processes the parameters provided by participants. The participants, who contribute parameters by training local models, are called clients.

Fig. 1. Parameter update framework

All clients train locally on their own data. After the local model is updated, each client sends its gradient, computed on local data, to the server; the server packs these changes and sends them back, i.e. \(\Delta \Theta ^t = (\Delta \theta _1^t, \Delta \theta _2^t, \ldots , \Delta \theta _n^t)\) (see Fig. 1).
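A minimal sketch of one such communication round follows; the `local_update` and `receive` client methods are assumptions for illustration, not the authors' implementation.

```python
# One communication round of the framework in Fig. 1 (a sketch). Each client
# reports Delta theta_i^t; the server packs
# Delta Theta^t = (Delta theta_1^t, ..., Delta theta_n^t) and returns it.
from typing import List
import numpy as np

def communication_round(clients: List["Client"]) -> List[np.ndarray]:
    deltas = [c.local_update() for c in clients]  # each client's local gradient update
    for c in clients:                             # server broadcasts the packed updates
        c.receive(deltas)
    return deltas
```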

3.2 The Local Model Updating Mechanism

The local model updating mechanism considers the mutual predictive ability of non-IID data. If all clients train only one global model, it will inevitably lead to discrimination by distribution or sample size. FedSmart is designed to update each local model through weights, which biases each model toward its own side's data. This approach effectively optimizes the server model with the data from each client.

At initialization, the server initializes the model. When all clients receive the initial model, they conduct one batch of training and then launch the information transfer described above.

3.3 Performance-Based Weight Allocation

The weight for the next round is computed from the equation below. The performance of all clients is taken into consideration; in brief, the principle is that each model's weight is smartly adjusted according to the accuracy of each client.

$$\begin{aligned} ||w_i^t||=||w_i^{t-1}+\eta (acc_i^t-acc^t_{median})||_1 \end{aligned}$$
(1)

where \(acc_i^t\) represents the accuracy of Client i on its local validation set in round t, \(acc^t_{median}\) is the median of the set of accuracies, and \(\eta \) is the learning rate. The weight in round t is allocated according to the weight in the previous round and the change of accuracy in this round. The validation set is extracted from each client's data with proportion \(\alpha \in [0,1]\), and serves only that client.

In FedSmart, we update the model according to its performance on the validation set, which makes the model adapt to self-side data. In conclusion, FedSmart effectively optimizes each client's model with valuable data from others.
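A minimal sketch of this update, under our reading of Eq. (1), is given below; the variable names and the non-negativity clip are our assumptions.

```python
# Performance-based weight allocation, Eq. (1) (a sketch under our reading):
# each weight moves with the client's gap to the median validation accuracy,
# then the weight vector is L1-normalized.
import numpy as np

def fedsmart_weights(w_prev: np.ndarray, acc: np.ndarray, eta: float) -> np.ndarray:
    w = w_prev + eta * (acc - np.median(acc))  # reward above-median accuracy
    w = np.clip(w, 0.0, None)                  # assumption: keep weights non-negative
    return w / np.sum(np.abs(w))               # L1 normalization

# Client i then refreshes its local model with the packed updates, e.g.
# theta_i^{t+1} = theta_i^t + sum_j w_j * Delta theta_j^t (our reading of Sec. 3.2).
```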


4 Experiment

The experiment settings are described step by step, including how the dataset is processed and how FedSmart is configured. We also explain the impact of different parameters on model performance and demonstrate the mechanism of using the validation set.

4.1 Implementation Details

The data used for performance evaluation are simulated datasets derived from the MIMIC-III database [6, 7], which contains health information for critical-care patients at a large tertiary-care hospital in the U.S. The data-cleansing process follows [5].

The experimental data structure is shown in Table 1.

Table 1. Summary of experiment dataset

4.2 Experiment Settings

To illustrate the limited performance of FL on non-IID data, the data are partitioned into six heterogeneous datasets. Specifically, the pairs Client1 and Client4, Client2 and Client5, and Client3 and Client6 each share a similar data distribution.

The validation set proportion \(\alpha \) is set to 0.25 by default throughout the experiment.
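For illustration, a client's local train/validation split with \(\alpha = 0.25\) could look like the sketch below; array-based data and the fixed seed are our assumptions.

```python
# Per-client train/validation split with validation proportion alpha = 0.25
# (an illustrative sketch; the validation set never leaves the client).
from typing import Tuple
import numpy as np

def split_local_data(X: np.ndarray, y: np.ndarray, alpha: float = 0.25,
                     seed: int = 0) -> Tuple[tuple, tuple]:
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(alpha * len(X))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx])
```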

4.3 Results

The essence of centralized training is to aggregate the data of all parties and improve model accuracy by increasing the amount of data, so centralized results are often better than those of each client training on its own dataset alone. However, when the data are non-IID, centralized training struggles to balance the results. The model tends to favor groups with large samples or simple distributions, so the resulting global model is undoubtedly unfair to the other groups.

FedSmart vs. Local Training. The model trained with FedSmart outperforms all six local ones (see Fig. 2), matching the expectation that every FL participant gains a better model within the information-sharing framework than by using its own data alone: compared with working individually, everyone in a team that shares data information, i.e. within the FL framework, tends to gain something as a contributor.

Fig. 2. FedSmart vs. local training

Fig. 3. FedSmart vs. FedAvg vs. LoAdaBoost

FedSmart vs. FedAvg vs. LoAdaBoost. To illustrate the effectiveness of FedSmart, we compare FedSmart, FedAvg, and LoAdaBoost. In FedAvg, the server only receives the model parameters and returns the updated model parameters; there is no interactive updating mechanism. LoAdaBoost receives the loss and the parameters of the model, and combines both pieces of information to update the weight of the previous iteration [5]. FedSmart adopts different parameter combinations to update the model, making it approximate an unbiased estimate of the complete gradient. The result is shown in Fig. 3.

It can be seen that, regardless of the FL optimization algorithm, performance on IID data always exceeds that on non-IID data. One of the most important motivations of FL optimization algorithms is to reduce the influence of data distribution, i.e. the performance drop on non-IID data. FedSmart uses validation-set accuracy to measure the similarity of distributions and establishes multiple models on multiple clients by adjusting the weights of different client models, exchanging only encrypted parameters. The results show that its model performance is significantly better than FedAvg and moderately better than LoAdaBoost.

FedSmart. FedSmart considers a single party's distribution without repeatedly compromising across multiple distributions. To further explain the working mechanism and performance of FedSmart, the evolution of the joint parameter weights during training is shown in Fig. 4. The weights change in pairs: Client1 and Client4, Client2 and Client5, Client3 and Client6, which accords with our experimental settings and indicates that FedSmart identifies useful data.

Fig. 4. The process of weight allocation. The weights change in pairs: Client1 and Client4, Client2 and Client5, Client3 and Client6.

We can observe that, even in FL, performance improvements remain unbalanced across sides due to differences in distributions. Since normally only one global model is established, reducing the global loss and improving overall accuracy inevitably sacrifices some of the improvement on at least one distribution, which is ignored to some extent. As long as there is only one global model, such a trade-off must occur: attending to one distribution means losing sight of another. Therefore, for non-IID data, it is necessary to consider how to create multiple models suited to different distributions, making FL more universal.

5 Conclusion

Federated Learning is attracting attention in both academia and industry, as it addresses the isolated-data-island problem and offers a solution for privacy preservation. We propose FedSmart, a performance-based parameter-return method. Unlike the general idea of FL sharing one global model, FedSmart establishes multiple models by treating each client as a server so that its own model performs best. We use simulated MIMIC-III data, separated into six non-IID datasets, for federated training. The experimental results show that FedSmart outperforms FedAvg and even the centralized training method. FedSmart can be extended to industrial data-training scenarios.

In the continuation of our study, to compensate for this shortcoming and minimize the privacy leakage caused by model delivery, FedSmart can use a drop-out-like mechanism to make it difficult for training participants to obtain effective information from model changes. We will also improve and explore the FedSmart algorithm to make it generally stable and adaptable for both IID and non-IID datasets, tackling the root of the problems in FL frameworks.