1 Introduction

In traditional machine learning (ML) approaches, data are collected and stored by a single node (or centralized server) and used for training and testing. However, transmitting and centralizing data raises numerous administrative, ethical, and legal issues, mainly related to privacy and data protection under the General Data Protection Regulation (GDPR) [1]. Federated learning (FL) enables collaborative learning that addresses these data issues while protecting information security [2]. Recently, the FL framework has been increasingly used in real-world applications, e.g., healthcare [3, 4], purchase recommendation [5, 6], and distributed synthetic data generation systems [7, 8].

Generally, the FL framework involves three primary steps: (i) all parties receive the latest global model W from the centralized server (also called a broker), (ii) the parties train the received model using their local data, and (iii) they upload their locally trained models Wi back to the centralized server to be aggregated into an updated global model. These steps are repeated until a particular convergence criterion is met. However, such a distributed framework also incurs communication costs and can create a training bottleneck; communication efficiency therefore remains a significant concern for FL.

Recently, several frameworks have been proposed to improve communication efficiency in the horizontal federated learning (HFL) scenario [9,10,11]. The vertical federated learning (VFL) scenario is the opposite of the HFL scenario, in which all parties hold homogeneous data: in VFL, the parties partially overlap in the sample space but differ in the feature space. As a result, a VFL framework requires a more intricate communication architecture to ensure that each party remains unaware of the data and features of the other parties. The literature has proposed several VFL frameworks. For example, in 2019, Yang et al. proposed a simple VFL framework based on the client-server (C-S) communication architecture with one parameter server (PS) and two parties [12]. Figure 1 shows that the PS acts as a trusted coordinator mainly responsible for data aggregation and information distribution. Ou et al. [13] designed a vertical federated learning system utilizing Bayesian machine learning with homomorphic encryption, while Hou et al. [14] proposed a verifiable privacy-preserving scheme (VPRF) based on a vertical federated random forest. However, the stability and reliability of the PS are critical, as once the PS fails to provide accurate computation results, the VFL may produce a low-quality model [15]. To eliminate the effect of the PS, Chen et al. [16] proposed a secure VFL framework based on a pseudo-decentralization communication architecture. As illustrated in Fig. 2, the parties are divided into one active party and many passive parties, where the active party replaces the PS as the coordinator. In 2021, Zhu et al. [17] introduced a secure VFL framework named PIVODL, which trains GBDTs with data labels distributed across multiple devices. Zhang et al. [18] suggested a VFL framework based on an LSTM fault classification network for a firefighting IoT platform. Chen et al. [19] proposed an efficient and interpretable inference framework for decision tree ensembles in a VFL scenario. However, the pseudo-decentralization communication architecture still requires many communications to achieve high test accuracy and privacy security, and real-world applications involving such an intricate communication architecture impose high time and monetary costs. Although Gu et al. [20] proposed an efficient VFL framework called VFB2 to simplify the communication architecture, VFB2 is still vulnerable to semi-honest attacks [21] and affected by the coordinator. Hence, it is quite challenging to design a framework that considers both communication efficiency and privacy security in the VFL scenario.

Fig. 1  The VFL framework based on the C-S communication architecture

Fig. 2  The VFL framework based on the pseudo-decentralization communication architecture

In addition, a simplified framework is urgently needed to complete VFL modeling with limited communication resources and a reduced coordinator effect. Hence, this paper proposes VFL-R, a novel VFL framework that integrates a ring communication architecture with a homomorphic encryption (HE)-based approach, enabling multiple parties to train the model collaboratively. We summarize the contributions of this paper as follows.

  • We are the first to incorporate the ring communication architecture into the VFL framework. Our novel VFL framework thereby avoids a complicated communication protocol and reduces the coordinator’s effect. The performance of the VFL-R framework is evaluated on benchmark datasets and compared against other frameworks. The experimental results reveal that VFL-R effectively reduces the coordinator’s communication cost during the modeling process while preserving high test accuracy.

  • We provide a detailed theoretical analysis of our framework’s loss function and gradient. This is important because the theoretical analysis affords a better understanding of the framework’s operating mechanism and guides model optimization.

  • To protect the privacy of each party, we integrate an HE-based approach into our framework. Meanwhile, we analyze the classic semi-honest threat models and demonstrate VFL-R’s robustness to semi-honest attacks.

The remainder of this paper is organized as follows. Section 2 introduces the necessary methods and concepts for the VFL-R framework. Section 3 defines our framework’s new model formula, while Section 4 describes the proposed framework in detail. Section 5 presents the security analysis, and Sections 6, 7, and 8 present the experimental setup, the varying experiment settings, and the comparison of VFL-R against different VFL frameworks, respectively. Finally, Section 9 concludes this work and provides some future research directions.

2 Preliminary

This section introduces some methods and concepts for the VFL-R framework.

2.1 Paillier homomorphic encryption

Our framework employs the Paillier Homomorphic Encryption (PHE) [22, 23] scheme to protect data privacy. PHE is an additively homomorphic encryption method that supports addition of ciphertexts and multiplication of a ciphertext by a plaintext value. This paper defines a new encryption operation \('\odot ^{\prime }\).

Definition 1 (Encryption operation \('\odot ^{\prime }\))

For any \(a,b\in \mathbb {R}^{n}\), the operation \('\odot ^{\prime }\) denotes either of the following calculations:

  • \(a \oplus b = [[a]] + [[b]] = [[a + b]]\) (addition)

  • \(a \otimes b = a^{T}[[b]] = [[a^{T}b]]\) (scalar product),

where the additive homomorphic encryption of a vector \(u (u\in \mathbb {R}^{n})\) is represented as [[u]].
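
For concreteness, the two operations above can be reproduced with the open-source python-paillier library (`phe`); the library choice, key length, and sample values below are our own illustration and are not prescribed by this paper.

```python
# A minimal sketch of Definition 1 using python-paillier (pip install phe).
# The library, key size, and sample vectors are illustrative assumptions.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

a = [0.5, -1.2, 3.0]
b = [2.0, 0.7, -0.4]

# [[a]] and [[b]]: element-wise Paillier encryption of the two vectors
enc_a = [public_key.encrypt(x) for x in a]
enc_b = [public_key.encrypt(x) for x in b]

# Addition: [[a]] + [[b]] = [[a + b]]  (ciphertext + ciphertext)
enc_sum = [ea + eb for ea, eb in zip(enc_a, enc_b)]

# Scalar product: a^T [[b]] = [[a^T b]]  (plaintext scalars times ciphertexts)
enc_dot = sum(ai * eb for ai, eb in zip(a, enc_b))

print(private_key.decrypt(enc_sum[0]))  # 2.5   = 0.5 + 2.0
print(private_key.decrypt(enc_dot))     # -1.04 = a^T b
```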

2.2 Vertical federated learning

Let the training samples be \(\left \{\left (\mathbf {x}_{i}, y_{i}\right ): i=1,2, \ldots , n\right \}\), where \(\mathbf {x}_{i}\in \mathbb {R}^{d}\) and yi denote the input vector and output label, respectively, and d is the feature dimension of the training samples. In the VFL setting, xi is vertically distributed among K parties, and each party owns a disjoint subset of the feature vector, \(\mathbf {x}_{[i,k]} \in \mathbb {R}^{d_{k}}, k=1,2,\ldots ,K\), where dk is the feature dimension of the k-th party and \( \sum \limits _{k=1}^{K}d_{k}=d\). Similarly, we define \({\varTheta }=\left [\theta _{1};\theta _{2};\ldots ;\theta _{K}\right ]\), where \(\theta _{k} \in \mathbb {R}^{d_{k}}\) denotes the parameter of the k-th party. Supposing that the K-th party holds the label information \(y_{[i,K]}\in \mathbb {R}\), we focus on the following empirical risk minimization problem:

$$ \min_{\varTheta} \mathcal{L}({\varTheta}) \triangleq \frac{1}{n} {\sum}_{i=1}^{n} f\left( {\sum}_{k=1}^{K} \mathbf{x}_{[i,k]}\theta_{k}, y_{[i,K]}\right)+\lambda R\left( {\varTheta}\right), $$
(1)

where \({\mathscr{L}}: \mathbb {R}^{d} \rightarrow \mathbb {R}\) is smooth and convex, λ is a tuning parameter, and f(⋅) and R(⋅) denote the loss function and regularizer, respectively.
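
To make the vertical partition concrete, the following sketch (our illustration; the dimensions and random data are arbitrary) splits a feature matrix column-wise among K parties and verifies that the per-party contributions \(\mathbf {x}_{[i,k]}\theta _{k}\) sum to the centralized prediction.

```python
# Column-wise (vertical) split of a feature matrix among K parties.
# Dimensions and data are arbitrary; this only illustrates Section 2.2.
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 6, 8, 4                                 # samples, features, parties
X = rng.normal(size=(n, d))
theta = rng.normal(size=d)

splits = np.array_split(np.arange(d), K)          # disjoint feature blocks, sum(d_k) = d
X_parts = [X[:, idx] for idx in splits]           # x_[i,k] held by party k
theta_parts = [theta[idx] for idx in splits]      # theta_k held by party k

# the sum over parties of the local contributions equals the centralized inner product
w = sum(Xk @ tk for Xk, tk in zip(X_parts, theta_parts))
assert np.allclose(w, X @ theta)
```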

2.3 Gradient descent

Stochastic gradient descent (SGD) [24, 25] is one of the most commonly used algorithms for solving convex optimization problems. In this paper, we employ a gradient-based method to optimize (1). Assuming \({\mathscr{L}}({\varTheta })\) is differentiable, the local parameters 𝜃k of the k-th party are updated according to:

$$ \theta_{k}^{*}=\theta_{k}-\alpha \nabla\mathcal{L}\left( \theta_{k}\right), $$
(2)

where α is the learning rate and \(\nabla {\mathscr{L}}\left (\theta _{k}\right )\) denotes the gradient of \({\mathscr{L}}({\varTheta })\) with respect to 𝜃k. The empirical risk is reduced step by step along the negative gradient direction.

2.4 Loss function

The model’s loss function depends on the model’s purpose and can be regarded as a function of a single variable \(t (t\in \mathbb {R})\) [26]. We rewrite the loss function in (1) as \(\frac {1}{n} \sum \limits _{i=1}^{n} f\left (t\right )\) with t = w − y for regression or t = wy for classification, where \(w=\sum \limits _{k=1}^{K}\mathbf {x}_{[i,k]}\theta _{k}\) and y = y[i,K]. Some common loss functions are reported in Table 1.

Table 1 Typical loss functions used in machine learning

3 Preparations for VFL-R framework

A natural question that arises is which loss function in Table 1 should be adopted by our framework. To answer this question, this section introduces the necessary theoretical analysis and derives a new model formula applicable to our framework.

3.1 Theoretical analysis

For the existing VFL frameworks, Wan et al. [27] assumed that the loss function is implicitly linearly separable in the form f(t) = g(h(t)), where g is any differentiable function and h(t) is a linearly separable function of the form \(\sum \limits _{k=1}^{K}h\left (\theta _{k},\mathbf {x}_{[i,k]}\right )\). In this paper, we give a new property for the loss function involving the encryption operation \('\odot ^{\prime }\).

Property 1 (Encryption composed property)

For all \(t\in \mathbb {R}\), \([[f(t)]] \in {\mathscr{M}}\left (\ell ;^{\prime } \odot ^{\prime }\right )\). The encrypted set \({\mathscr{M}}\left (\ell ;^{\prime } \odot ^{\prime }\right )=\{m\}\), where each m is composed of the elements of \(\ell =\left \{\ell _{1}, \ell _{2}, \ldots , \ell _{K}\right \}\) via the operation \('\odot ^{\prime }\) as follows: \(m=\ell _{1} \odot \ell _{2} \odot {\cdots } \odot \ell _{K}\).

3.2 New model formula in \({\mathscr{L}}({\varTheta })\)

In our framework, the K-th party computes the encrypted loss function [[f(t)]] in the form of \({\mathscr{M}}_{[K]}\left (\ell ;'\odot ^{\prime }\right )\), where [K] is the index of the K-th party. We assume that the regularizer also satisfies Property 1. Then, the K-th party computes the encrypted regularizer [[R(Θ)]] in the form of \(\mathcal {N}_{[K]}\left (\theta ;'\odot ^{\prime }\right )\) with the set \(\theta =\left \{\theta _{1},\theta _{2},\ldots ,\theta _{K} \right \}\). The new model formula in (1) can be rewritten as:

$$ \frac{1}{n} \sum\limits_{i=1}^{n} \mathcal{M}_{[K]}\left( \ell;'\odot^{\prime}\right)+\lambda \mathcal{N}_{[K]}\left( \theta;'\odot^{\prime}\right). $$
(3)

3.3 Aggregation of the encrypted gradient \([[\nabla {\mathscr{L}}(\theta _{k})]]\)

The encrypted gradient aggregation is important for our framework to update the local parameters. Thus, this subsection introduces the assumption for the gradient.

Assumption 1

The gradients ∇f and ∇R satisfy Property 1, namely [[∇f]] comprises elements from set \(\mathcal {A}\) and [[∇R]] is composed of elements from set \({\mathscr{B}}\).

Theorem 1

Under Assumption 1, the encrypted gradient \([[\nabla {\mathscr{L}}(\theta _{k})]]\) can be composed of the elements from the set \({\mathscr{L}}\): for t = w − y, \({\mathscr{L}} =\mathcal {A} \cup {\mathscr{B}} \cup \left \{\mathbf {x}_{[i, k]}\right \}\), and for t = wy, \({\mathscr{L}}=\mathcal {A} \cup {\mathscr{B}} \cup \left \{\mathbf {x}_{[i, k]}, y_{[i, K]}\right \}\).

Proof

For t = w − y, we derive the explicit form of \(\nabla {\mathscr{L}}(\theta _{k})\) according to (1) as:

$$ \nabla\mathcal{L}\left( \theta_{k}\right)=\frac{1}{n}\sum\limits_{i=1}^{n} \left( \nabla f \times \mathbf{x}_{[i,k]}\right)+\nabla R. $$
(4)

Considering the encrypted form

$$ \begin{array}{@{}rcl@{}} [[\nabla\mathcal{L}\left( \theta_{k}\right)]]&=&\frac{1}{n}\sum\limits_{i=1}^{n} \left( \nabla f \otimes \mathbf{x}_{[i,k]}\right)\oplus\nabla R\\ &=&\frac{1}{n}\sum\limits_{i=1}^{n}\left( \mathbf{x}_{[i, k]}^{T} [[\nabla f]]\right)+[[\nabla R]], \end{array} $$
(5)

hence \([[\nabla {\mathscr{L}}(\theta _{k})]]\) is composed of elements from the set \({\mathscr{L}}=\mathcal {A}\cup {\mathscr{B}}\cup \left \{ \mathbf {x}_{[i,k]}\right \}\).

For t = wy, we derive the explicit form of \(\nabla {\mathscr{L}}(\theta _{k})\) according to (1) as:

$$ \nabla\mathcal{L}\left( \theta_{k}\right)=\frac{1}{n}\sum\limits_{i=1}^{n} \left( y_{[i,K]} \nabla f \times \mathbf{x}_{[i,k]}\right)+\nabla R. $$
(6)

Considering the encrypted form

$$ \begin{array}{@{}rcl@{}} [[\nabla\mathcal{L}\left( \theta_{k}\right)]]&=&\frac{1}{n}\sum\limits_{i=1}^{n} \left( y_{[i,K]} \otimes \nabla f \otimes \mathbf{x}_{[i,k]}\right)\oplus\nabla R\\ &=&\frac{1}{n}\sum\limits_{i=1}^{n} \left( y_{[i,K]} \mathbf{x}_{[i,k]}^{T} [[\nabla f]] \right)+[[\nabla R]], \end{array} $$
(7)

hence \([[\nabla {\mathscr{L}}(\theta _{k})]]\) can be composed of elements from the set \({\mathscr{L}}=\mathcal {A} \cup {\mathscr{B}} \cup \left \{ \mathbf {x}_{[i,k]} ,y_{[i,K]}\right \}\). □

According to Theorem 1, the local data \(\left \{\mathbf {x}_{[i, k]}, y_{[i, K]}\right \}\) are necessary to compute \([[\nabla {\mathscr{L}}(\theta _{k})]]\). However, in this paper, during the aggregation process the K-th party only computes the encrypted results [[∇f]] and [[∇R]], written as \(\mathcal {D}_{[K]}\left (D,^{\prime } \odot ^{\prime }\right )\), where the set \(D=\mathcal {A}\cup {\mathscr{B}}\) is written as \(\left \{ d_{1},d_{2},\ldots ,d_{K} \right \}\). Then, each party computes its encrypted gradient \(\left [\left [\nabla {\mathscr{L}}\left (\theta _{k}\right )\right ]\right ]\) during the local updating process. The purpose is to avoid gradient information leakage and to reduce the computational burden during the aggregation process.

4 The VFL-R architecture

This section introduces the novel VFL framework based on the ring architecture illustrated in Fig. 3. The design framework has the following characteristics:

  • It includes two party types: one coordinator and several workers. The coordinator does not participate in the model training.

  • A one-way channel exists between adjacent workers, and a two-way channel exists between the coordinator and the K-th worker.

  • During the modeling process, each worker only needs one public key from the coordinator. Changing the encryption pairs in our framework is unnecessary.

Fig. 3  The pipeline of the VFL-R framework

4.1 The VFL-R framework

We divide our framework into three phases. In Phase One, the primary task is to aggregate the model function and the encrypted results, while Phase Two performs local updating at each worker. Finally, Phase Three focuses on decrypting the encrypted local parameters.

a. Phase One

The K-th worker needs to compute the \({\mathscr{M}}_{[K]}(\ell ;'\odot ^{\prime })\), \(\mathcal {N}_{[K]}\left (\theta ;'\odot ^{\prime }\right )\) and \(\mathcal {D}_{[K]}\left (D,'\odot ^{\prime }\right )\). The aggregation ideas are summarized as:

$$ \begin{array}{@{}rcl@{}} &\mathcal{M}_{[1]}=\mathcal{M}\left( \ell_{1};'\odot^{\prime}\right)\\ &\mathcal{M}_{[2]}=\mathcal{M}\left( \mathcal{M}_{[1]}\cup \ell_{2};'\odot^{\prime}\right)\\ & {\cdots} \cdots\\ &\mathcal{M}_{[K-1]}=\mathcal{M}\left( \mathcal{M}_{[K-2]}\cup \ell_{K-1};'\odot^{\prime}\right)\\ &\mathcal{M}_{[K]}\left( \ell;^{\prime} \odot^{\prime}\right)\in\mathcal{M}\left( \mathcal{M}_{[K-1]}\cup \ell_{K};'\odot^{\prime}\right). \end{array} $$
(8)

Denote by \({\mathscr{M}}_{[i]} (i=1,2,\ldots ,K-1)\) the encrypted set computed by the i-th worker. Each element in \({\mathscr{M}}_{[i]}\) can be used to compute \({\mathscr{M}}_{[K]}\left (\ell ;'\odot ^{\prime }\right )\). The 1-st worker computes the encrypted \(\ell _{1}\) to form \({\mathscr{M}}_{[1]}\), while the 2-nd worker computes new elements based on the elements from \({\mathscr{M}}_{[1]} \cup \ell _{2}\). With the transfer of \({\mathscr{M}}_{[i]}\) in our proposed framework, we increase the availability of elements when computing the target model. Hence, the K-th worker can compute \({\mathscr{M}}_{[K]}\left (\ell ;'\odot ^{\prime }\right )\), and similarly, \(\mathcal {N}_{[K]}\left (\theta ;'\odot ^{\prime }\right )\) can be aggregated as:

$$ \begin{array}{@{}rcl@{}} &{}\mathcal{N}_{[1]}=\mathcal{N}\left( \theta_{1};'\odot^{\prime}\right)\\ &\mathcal{N}_{[2]}=\mathcal{N}\left( \mathcal{N}_{[1]}\cup \theta_{2};'\odot^{\prime}\right)\\ &{\cdots} \cdots\\ &\mathcal{N}_{[K-1]}=\mathcal{N}\left( \mathcal{N}_{[K-2]}\cup \theta_{K-1};'\odot^{\prime}\right)\\ &\mathcal{N}_{[K]}\left( \theta;'\odot^{\prime}\right)\in\mathcal{N}\left( \mathcal{N}_{[K-1]}\cup \theta_{K};'\odot^{\prime}\right). \end{array} $$
(9)

The \(\mathcal {D}_{[K]}\left (D,'\odot ^{\prime }\right )\) can be aggregated as:

$$ \begin{array}{@{}rcl@{}} &{}\mathcal{D}_{[1]}=\mathcal{D}\left( d_{1};'\odot^{\prime}\right)\\ &\mathcal{D}_{[2]}=\mathcal{D}\left( \mathcal{D}_{[1]}\cup d_{2};'\odot^{\prime}\right)\\ & {\cdots} \cdots\\ &\mathcal{D}_{[K-1]}=\mathcal{D}\left( \mathcal{D}_{[K-2]}\cup d_{K-1};'\odot^{\prime}\right)\\ &\mathcal{D}_{[K]}\left( D;'\odot^{\prime}\right)\in\mathcal{D}\left( \mathcal{D}_{[K-1]}\cup d_{K};'\odot^{\prime}\right). \end{array} $$
(10)

This phase includes the following three steps; a minimal code sketch of the ring pass follows the list.

  • Step 1: The coordinator creates encryption pairs and sends the public key to the K-th worker. Then, the K-th worker sends the public key to the 1-st worker.

  • Step 2: The 1-st worker receives the public key and computes the encrypted sets \({\mathscr{M}}_{[1]}\), \(\mathcal {N}_{[1]}\), \(\mathcal {D}_{[1]}\). The 2-nd worker does the same operations as the 1-st worker after receiving the public key and \({\mathscr{M}}_{[1]}\), \(\mathcal {N}_{[1]}\), \(\mathcal {D}_{[1]}\). This process is repeated until the encrypted sets \({\mathscr{M}}_{[K-1]}\), \(\mathcal {N}_{[K-1]}\), \(\mathcal {D}_{[K-1]}\) are sent to the K-th worker.

  • Step 3: The K-th worker completes the aggregation of \({\mathscr{M}}_{[K]}\left (\ell ;^{\prime } \odot ^{\prime }\right ), \mathcal {N}_{[K]}\left (\theta ;^{\prime } \odot ^{\prime }\right )\) and \(\mathcal {D}_{[K]}\left (D,^{\prime } \odot ^{\prime }\right )\).
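
The following sketch illustrates Steps 1-3 for a single quantity, the encrypted per-sample partial predictions; it is our simplified reading of the ring pass (using python-paillier), not the authors' implementation, and the decryption at the end is included only to verify the result.

```python
# Simplified ring-pass aggregation (Steps 1-3) for one element of the passed
# sets: the encrypted partial predictions w_i = sum_k x_[i,k] theta_k.
# Library choice, sizes, and the final verification step are our assumptions.
from phe import paillier
import numpy as np

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

K, n, d_k = 3, 4, 2
rng = np.random.default_rng(1)
X_parts = [rng.normal(size=(n, d_k)) for _ in range(K)]   # local features per worker
theta_parts = [rng.normal(size=d_k) for _ in range(K)]    # local parameters per worker

# Step 1: the coordinator's public key reaches worker 1 (via worker K)
running = [public_key.encrypt(0.0) for _ in range(n)]     # [[0]] for each sample

# Step 2: each worker adds its encrypted contribution and forwards the set
for k in range(K):
    w_k = X_parts[k] @ theta_parts[k]                     # local x_[i,k] theta_k
    running = [acc + public_key.encrypt(float(v)) for acc, v in zip(running, w_k)]

# Step 3: worker K now holds [[w_i]] = [[sum_k x_[i,k] theta_k]] for every sample.
# Decryption below is for verification only; in VFL-R only the coordinator decrypts.
w_dec = np.array([private_key.decrypt(c) for c in running])
assert np.allclose(w_dec, sum(Xk @ tk for Xk, tk in zip(X_parts, theta_parts)))
```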

b. Phase Two

In this phase, each worker computes the encrypted gradient and updates the local parameters. The steps are as follows.

  • Step 4: The K-th worker uses \(\mathcal {D}_{[K]}\left (D,'\odot ^{\prime }\right )\) to compute \([[\nabla {\mathscr{L}}(\theta _{K})]]\) and updates its local parameters in the form \([[\theta _{K}^{*}]]=[[\theta _{K}]]-\alpha [[\nabla {\mathscr{L}}\left (\theta _{K}\right )]]\) under the ciphertext environment. Next, \(\mathcal {D}_{[K]}\left (D,'\odot ^{\prime }\right )\) is sent to the 1-st worker, which performs the same operations as the previous worker. This procedure repeats until all workers complete the local updating (a sketch of this ciphertext-domain update follows this list).

  • Step 5: As illustrated in Figs. 4 and 5, all workers perform Steps 2-4 during the t-th (1 < t < T) iteration. The coordinator does not play any role during the modeling process and rarely has access to the intermediate results concerning the target model.
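
A per-coordinate sketch of the ciphertext-domain update in Step 4 is shown below; the gradient values are placeholders (in VFL-R each worker would assemble \([[\nabla {\mathscr{L}}(\theta _{k})]]\) from the aggregated set), and the final decryption is only a correctness check.

```python
# Ciphertext-domain update [[theta*]] = [[theta]] - alpha * [[grad]] with
# python-paillier. Gradient values are placeholders; decryption is a check only.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)
alpha = 0.1

theta_k = [0.3, -0.8]                       # current local parameters
grad_k = [1.5, -0.5]                        # stand-in gradient coordinates

enc_theta = [public_key.encrypt(t) for t in theta_k]
enc_grad = [public_key.encrypt(g) for g in grad_k]

# only ciphertext addition/subtraction and scaling by a plaintext constant are needed
enc_theta_new = [t - alpha * g for t, g in zip(enc_theta, enc_grad)]

print([private_key.decrypt(t) for t in enc_theta_new])   # [0.15, -0.75]
```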

c. Phase Three

Since the local parameter updating is in the ciphertext environment, it is necessary to decrypt the local parameters in the T-th iteration.

  • Step 6: As illustrated in Fig. 6, the K-th worker sends [[𝜃K]] to the 1-st worker after updating its local parameters. Then the 1-st worker sends {[[𝜃1]], [[𝜃K]]} to the 2-nd worker after updating its local parameters. This process repeats until the encrypted set \({\varTheta }=\left \{[[\theta _{1}]],[[\theta _{2}]],\ldots ,[[\theta _{K}]] \right \}\) is sent to the coordinator. The coordinator then decrypts the encrypted set using its private key.

    Fig. 4  The aggregation process for the VFL-R framework in Steps 1-3

    Fig. 5  The local updating process for the VFL-R framework in Step 4

    Fig. 6  The decryption of local parameters for the VFL-R framework in Step 6

5 Security analysis

This section discusses our framework’s privacy security. Given that the semi-honest threat models have been widely used in FL security analysis [28,29,30], we introduce two assumptions for semi-honest threat models and analyze the privacy security from two aspects: the coordinator and the workers.

Assumption 2 (Honest-but-curious)

Each party follows the designed protocol and performs the correct computations. However, some parties may infer other parties’ raw data and models by retaining records of the intermediate computation results.

Assumption 3 (Honest-but-colluding)

Each party follows the designed protocol and performs the correct computations. Unlike Assumption 2, some parties may collude to infer other parties’ raw data and models by sharing their retained records.

For workers

In our framework, each worker passes on the intermediate results and updates its local parameters in the ciphertext environment. Workers receive only encrypted values from other workers; under this encryption protection, it is challenging to perform inference attacks against the other workers under Assumptions 2–3.

For the coordinator

In our framework, the coordinator’s task is to distribute the public key in the first iteration and to decrypt the local parameters in the T-th iteration. Even if the coordinator obtains the actual values of the local parameters, it is still difficult to infer the raw data under Assumption 2.

6 Experiment setting

All experiments simulate the VFL scenario using Python 3.8.5 on an Intel Core E5-2640 CPU (2.40 GHz). The data are partitioned vertically into four non-overlapping parties with a nearly equal number of features. We randomly select 70% of the samples as training data and use the remaining 30% as testing data.

6.1 Problem

The following experiments focus on the binary classification problem and utilize the logistic regression [31, 32] model, written as:

$$ f(w,y)\triangleq \frac{1}{n}{\sum}_{i=1}^{n} \log \left[1+\exp \left( -wy\right)\right], $$
(11)

where \(w=\sum \limits _{k=1}^{K}\mathbf {x}_{[i,k]}\theta _{k}\) and \(y\in \left \{-1,1\right \}\). We add the 2-norm regularizer \(R({\varTheta })=\frac {1}{2}\|{\varTheta }\|_{2}^{2}\) to avoid overfitting. Meanwhile, we use a second-order Taylor approximation of the logistic loss function to handle its non-linearity [33]. The model function can be written as:

$$ f(w,y)\approx\frac{1}{n} \sum\limits_{i=1}^{n} \left( \log 2-\frac{1}{2} wy +\frac{1}{8}w^{2}\right)+\frac{\lambda}{2}\|{\varTheta}\|_{2}^{2}. $$
(12)
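
For clarity, (12) follows from the second-order Taylor expansion of the logistic loss \(f(z)=\log (1+e^{-z})\) around z = 0, where \(f(0)=\log 2\), \(f^{\prime }(0)=-\frac {1}{2}\), and \(f^{\prime \prime }(0)=\frac {1}{4}\), together with \(y\in \{-1,1\}\) so that \((wy)^{2}=w^{2}\):

$$ \log\left( 1+e^{-wy}\right)\approx\log 2-\frac{1}{2}wy+\frac{1}{8}(wy)^{2}=\log 2-\frac{1}{2}wy+\frac{1}{8}w^{2}. $$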

The gradient with respect to 𝜃k is:

$$ \nabla_{[k]} f(w,y)\approx\frac{1}{n} \sum\limits_{i=1}^{n}\left( \frac{1}{4} w -\frac{1}{2} y\right) \mathbf{x}_{[i,k]} + \lambda\theta_{k}. $$
(13)
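
As a plaintext sanity check of (12) and (13) (our sketch, with arbitrary data and a single party's feature block; `lam` stands for λ):

```python
# Plaintext check of the Taylor-approximated loss (12) and per-party gradient (13).
# Data, dimensions, and lam are arbitrary; the regularizer here uses theta_k only.
import numpy as np

def taylor_loss(w, y, theta, lam):
    # (1/n) sum(log 2 - wy/2 + w^2/8) + (lam/2) * ||theta||^2
    return np.mean(np.log(2) - 0.5 * w * y + 0.125 * w ** 2) + 0.5 * lam * theta @ theta

def taylor_grad_party(w, y, X_k, theta_k, lam):
    # (1/n) sum((w/4 - y/2) * x_[i,k]) + lam * theta_k
    return X_k.T @ (0.25 * w - 0.5 * y) / len(y) + lam * theta_k

rng = np.random.default_rng(2)
n, d_k = 8, 3
X_k = rng.normal(size=(n, d_k))
theta_k = rng.normal(size=d_k)
y = rng.choice([-1.0, 1.0], size=n)
w = X_k @ theta_k          # in VFL-R this would be the aggregated sum over all parties

print(taylor_loss(w, y, theta_k, lam=0.3))
print(taylor_grad_party(w, y, X_k, theta_k, lam=0.3))
```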

6.2 Benchmark datasets

We evaluate our framework’s performance on benchmark datasets with various numbers of samples and features. Specifically, we select four datasets from the UCI repository [34]: the Ionosphere, Statlog (Heart), Sonar, and Breast Cancer Wisconsin Diagnostic (WDBC) datasets. The sample and feature numbers are listed in Table 2, and the values of each feature are standardized into [0,1].

Table 2 Statistics of the benchmark datasets

6.3 The algorithm

Owing to limited computational power, it is hard to perform many repeated PHE operations with a single public key. To address this problem, we set t as a fixed training period in Algorithm 1; specifically, t is set to the maximum number of PHE operations that can be performed. Algorithm 2 gives the overall VFL-R system over T (T > t) iterations. In practice, the total period T is divided into many small periods of length t, and each party executes the VFL-R framework within each divided period.

Algorithm 1  The VFL-R framework with the t training period.

Algorithm 2  The VFL-R framework with T iterations.

7 Varying experiment settings

The performance assessment metrics are the convergence performance and the classification results on the benchmark datasets. Moreover, we explore the effects of various learning rates α and tuning parameters λ.

7.1 Varying learning rates α

In our experiments, it is hard to obtain the loss function curves directly because the local parameters remain encrypted. Thus, we assume that all workers save the encrypted parameter results and jointly compute the loss function with the coordinator (t = 1). The loss function curves of the VFL-R framework under various learning rates are presented in Figs. 7, 8, 9 and 10. These figures highlight that the loss function curves follow a consistent overall trend regardless of the learning rate. Moreover, from α = 0.01 to α = 0.3, the convergence speed of VFL-R improves. The classification results under various learning rates are reported in Table 3. When α = 0.1, the VFL-R framework achieves the best classification results; overall, the learning rate affects the classification performance. Therefore, the learning rate must be tuned appropriately rather than simply choosing a large value.

Fig. 7  Loss function curve with various α on the Ionosphere dataset, where α = \(\left \{ 0.01, 0.1, 0.2, 0.3 \right \}\)

Fig. 8  Loss function curve with various α on the Statlog (Heart) dataset, where α = \(\left \{ 0.01, 0.1, 0.2, 0.3 \right \}\)

Fig. 9  Loss function curve with various α on the Sonar dataset, where α = \(\left \{ 0.01, 0.1, 0.2, 0.3 \right \}\)

Fig. 10  Loss function curve with various α on the WDBC dataset, where α = \(\left \{ 0.01, 0.1, 0.2, 0.3 \right \}\)

Table 3 Classification results of the VFL-R framework under various learning rates on the benchmark datasets for T = 300

7.2 Varying tuning parameters λ

For this case, we set the learning rate to 0.1 and vary the tuning parameter. The loss function curves for various tuning parameters λ are illustrated in Figs. 11, 12, 13 and 14, demonstrating that the loss function curves have a similar convergence trend. The classification results of VFL-R with different tuning parameters are reported in Table 4, indicating that when λ increases from 0.1 to 0.9, VFL-R achieves a high classification performance of 84.69% - 85.05% on the Ionosphere dataset, 86.11% - 86.33% on the Statlog (Heart) dataset, 81.43% - 82.05% on the Sonar dataset, and 95.55% - 95.83% on the WDBC dataset.

Fig. 11  Loss function curve with various λ on the Ionosphere dataset, where λ = \(\left \{ 0.1, 0.3, 0.6, 0.9 \right \}\)

Fig. 12  Loss function curve with various λ on the Statlog (Heart) dataset, where λ = \(\left \{ 0.1, 0.3, 0.6, 0.9 \right \}\)

Fig. 13  Loss function curve with various λ on the Sonar dataset, where λ = \(\left \{ 0.1, 0.3, 0.6, 0.9 \right \}\)

Fig. 14  Loss function curve with various λ on the WDBC dataset, where λ = \(\left \{ 0.1, 0.3, 0.6, 0.9 \right \}\)

Table 4 Classification results of the VFL-R framework with various tuning parameters on four datasets for T = 300

8 Comparison with different VFL frameworks

At present, existing VFL frameworks pay little attention to innovations in the communication architecture. To better highlight the performance of VFL-R, we compare it against the VFL [12] and VFB2 [20] frameworks in different aspects, including functionality, test accuracy, and communication performance.

8.1 Functionality analysis

Table 5 reports the functional comparison of the above frameworks. Specifically, VFL is based on the C-S communication architecture and can defend against semi-honest attacks to preserve data security. However, the C-S communication architecture is inefficient, especially when many parties are involved. The VFB2 framework relies on the tree communication architecture and supports distributed learning. Although the tree communication architecture can significantly reduce the number of communications during the modeling process, it cannot guarantee high privacy security because it does not use encryption technology [35].

Table 5 Comparison analysis of the VFL and VFB2 frameworks

Furthermore, the VFL and VFB2 frameworks impose a significant communication burden on the coordinator: during the modeling process, the coordinator sends the gradient or other parameters, incurring unnecessary communication cost and a high risk of information disclosure. In contrast, our proposed framework balances the two frameworks and reduces the coordinator’s communication burden.

8.2 Test accuracy

To evaluate the test accuracy of the VFL-R framework, we compare it against the VFL and VFB2 frameworks. Furthermore, we examine the accuracy gap between different loss functions by considering a non-federated (NonF) experiment in which all data are integrated for modeling with the logistic loss function.

Figures 15, 16, 17 and 18 plot the test accuracy of the four frameworks on the benchmark datasets. For the Taylor loss function, the VFL-R framework achieves a test accuracy similar to the VFL and VFB2 frameworks, with the test accuracy of each framework deviating by at most 4% on the Ionosphere dataset. Considering the logistic loss function, the VFL-R framework attains a small test accuracy gap.

Fig. 15  Test accuracy on the Ionosphere dataset with various VFL frameworks, where T = {25,50,100,150,200,250,300,350,400}, α = 0.1 and λ = 0.3

Fig. 16  Test accuracy on the Statlog (Heart) dataset with different VFL frameworks, where T = {25,50,100,150,200,250,300,350,400}, α = 0.1 and λ = 0.3

Fig. 17  Test accuracy on the Sonar dataset with different VFL frameworks, where T = {25,50,100,150,200,250,300,350,400}, α = 0.1 and λ = 0.3

Fig. 18  Test accuracy on the WDBC dataset with different VFL frameworks, where T = {25,50,100,150,200,250,300,350,400}, α = 0.1 and λ = 0.3

8.3 Communication cost

We assume that each VFL framework includes N parties. \(\operatorname {Enc}\left (\cdot \right )\) denotes the encryption operation, and \(\left |\cdot \right |\) denotes the data size of each party during the modeling process. wi and gi represent the intermediate results for the i-th party, which are used to compute the loss function and gradient, respectively, and G is the gradient used in the modeling process.

For the VFL-R framework

During the aggregation process, each party sends \(\operatorname {Enc}\left (w_{i},g_{i}\right )\) to the next party and receives \(\operatorname {Enc}\left (w_{i-1},g_{i-1}\right )\) from the previous party. In the VFL-R framework, the coordinator does not participate in the modeling process, so the communication cost of the third-party coordinator is O(1).

For the VFL framework

Each party needs to send \(\operatorname{Enc}\left (w_{i},g_{i}\right )\) to the major party and receives \(\operatorname {Enc}\left (w_{n},g_{n}\right )\) from the other parties, where \(n\in \left \{1,2,\ldots ,i-1\right \}\). The coordinator has to send \(\operatorname {Enc}\left (G\right )\) to each party, and therefore the communication cost of the coordinator is \(O(\left |\operatorname {Enc}\left (G\right )\right |\cdot N )\).

For the VFB2 framework

Each party sends \(\left (w_{i},g_{i}\right )\) to the next party based on the tree communication architecture. Meanwhile, each party receives G from the coordinator. Thus, the communication cost of the coordinator is \(O(\left | G\right |\cdot N )\).

Table 6 compares in detail the communication costs of all competing frameworks, demonstrating that our framework reduces the coordinator’s communication cost. Meanwhile, compared with the VFL framework, our method greatly reduces the communication cost for each party.

Table 6 Communication cost of the VFL-R framework compared with the VFL and VFB2 frameworks in one communication round

8.4 The number of communications

Next, we compare the number of communications of the three VFL frameworks per communication round. In Fig. 19, the horizontal axis shows the number of parties K and the vertical axis shows the number of communications in one communication round. The figure reveals that the VFL framework requires O(K²) communications as the number of participants increases, whereas our proposed framework requires only O(K) communications, similar to the VFB2 framework. Nevertheless, as discussed in Section 8.1, the VFB2 framework provides poor privacy security, an issue that does not arise in our framework.

Fig. 19  The number of communications of the three VFL frameworks in one communication round

9 Conclusion and future work

This work proposes VFL-R, a new VFL framework that utilizes a ring communication architecture to simplify the intricate communication among the parties. In particular, the ring communication architecture reduces the coordinator’s communication burden and decreases the number of communications in one communication round. Furthermore, our framework employs HE-based technology to guarantee privacy security. Functionality analysis and extensive experiments demonstrate that VFL-R effectively reduces the communication cost and achieves high accuracy on all benchmark datasets.

Our framework is limited by the necessary assumptions on the loss function and gradient. Therefore, future work will pursue improvements using more complex machine learning approaches or other methods that relax these assumptions. Meanwhile, we will continue our research on designing an efficient framework that further enhances communication performance in the VFL scenario.