
1 Introduction

With the rapid development of wireless communication and Artificial Intelligence (AI) technologies, the Internet of Vehicles (IoV) has emerged as a significant application scenario for 5G and beyond, playing a crucial role in autonomous driving and Intelligent Transportation Systems (ITS) [1, 2]. However, Intel estimates that each smart car will generate approximately 4,000 GB of data per day, equivalent to the data produced by nearly 3,000 mobile phone users, so processing the data collected from vehicles in real time is a thorny challenge. Meanwhile, Mobile Edge Computing (MEC) and Federated Learning (FL) technologies have matured. On the one hand, since vehicle data is generated within IoV, MEC combines naturally with IoV, allowing data to be processed in the vicinity of the vehicles using the computing power and storage resources of an edge server (ES) [3]. On the other hand, federated learning, proposed by Google in 2016 [4], is a distributed deep learning paradigm in which vehicles train local deep learning models independently on local data and the server aggregates them into a global model. Vehicles never send local data directly and only share local model parameters, which protects vehicle privacy to some extent [5]. As real-time computing services on the vehicular edge continue to grow, the combination of IoV and FL will become a research focus.

Although existing FL clustering approaches and aggregation mechanisms have been effective in some IoV scenarios, several challenges persist. Because vehicles and devices differ in sensors and processors, the data collected by vehicles is non-independent and identically distributed (Non-IID); when trained on such Non-IID data, the FL model may degrade significantly in convergence speed and accuracy [6, 7]. Vehicles are typically highly mobile and their distance to the ES varies over time, which can cause communication congestion and delays when participating vehicles frequently upload model parameters to the ES. Additionally, vehicles differ in computational capability, so slow-performing stragglers can significantly prolong each round of FL aggregation and ultimately slow the convergence of the global model.

To tackle the aforementioned three challenges, we recognize the significance of cooperation among vehicles and propose a novel vehicle clustering-based semi-asynchronous federated learning framework for IoV (CSA_FedVeh). Our contributions are summarized as follows:

  • We establish a distributed FL training network that combines local training on vehicles with global aggregation at the ES. To ensure the quality of the FL model for vehicles and support the fastest possible model convergence, we formulate the convergence time of global model aggregation as a minimization problem.

  • Based on the CSA_FedVeh framework, we propose a Space-Time and Weight DBSCAN density clustering algorithm (STW-DBSCAN) that relies on both the space-time location similarities and the model weight similarities of vehicles. This algorithm effectively mitigates the straggler problem and accelerates local model training. Meanwhile, a semi-asynchronous federated aggregation mechanism further reduces resource consumption and communication costs by adjusting the server's waiting time.

  • We build a simulation of the vehicle-clustered FL network. Experimental results demonstrate that, under a fixed system operation time, our CSA_FedVeh framework outperforms four benchmark frameworks, shortening the running time by approximately 24.6% to 60.2% while achieving similar accuracy. At similar accuracy, communication consumption is reduced by 3.4% to 62.07% on the MNIST dataset and by 1.01% to 68.6% on the GTSRD dataset.

2 Related Work

In recent years, a growing number of studies have sought to implement FL frameworks in IoV scenarios [8,9,10]. Huang et al. [8] propose a novel FL framework called ‘FedParking’ that assists parked vehicles in providing computational offloading services and uses an LSTM model for parking space estimation. Liang et al. [9] propose a semi-synchronous FL (Semi-SynFed) protocol and a dynamic aggregation scheme that aggregates model parameters asynchronously to enhance the performance of FL in IoV. Huang et al. [10] propose an asynchronous FL privacy-preserving computation model (AFLPC) for 5G-V2X, which aims to better exploit the low latency of 5G networks while protecting data privacy in IoV. Like these frameworks, we also consider implementing FL in IoV for real-time processing of the data collected by vehicles.

Existing FL frameworks have been effective in addressing the impact of Non-IID data and resource constraints [11,12,13]. Ma et al. [11] propose a task offloading method based on data and resource heterogeneity in the HFEL environment, incorporating the statistical features of data into the cost function through information entropy to reshape the edge data. Briggs et al. [12] improve FL by introducing a hierarchical clustering step (FL+HC), which clusters clients by the similarity between their local updates and the global joint model. Tan et al. [13] propose a novel federated prototype learning (FedProto) framework, in which communication between devices and servers is done via class prototypes rather than gradients. Considering the Non-IID data and resource constraints present in IoV, we propose a clustering algorithm that alleviates both problems while effectively mitigating the impact of high vehicle mobility in IoV.

Since FL requires parameter uploads for global aggregation, existing FL aggregation mechanisms can be classified into two types: synchronous [4] and asynchronous [14]. In synchronous FL, the ES must collect the parameters of all participating vehicles before executing the aggregation process; however, stragglers [15], caused by the poor network or hardware resources of some vehicles, can lead to significant delays. In asynchronous FL, the ES can aggregate parameters without waiting for all vehicles in a round of FL aggregation, but this can cause gradient divergence, further degrading the performance of the FL model. In this work, we adopt the semi-asynchronous mechanism [16,17,18], which further reduces resource consumption and communication costs by adjusting the server’s waiting time. Sun et al. [17] propose a semi-asynchronous FL framework for extremely heterogeneous devices. Ma et al. [18] propose a semi-asynchronous federated learning mechanism called ‘FedSA’ and theoretically prove its convergence. In contrast to these frameworks, we consider high-speed vehicle mobility and combine the semi-asynchronous mechanism with vehicle clusters.

3 System Model and Problem Formulation

In this section, we first introduce the clustered federated learning process in the IoV scenario. We then describe the cluster-based semi-asynchronous federated learning framework (CSA_FedVeh). Finally, we formulate the problem of minimizing the global model training time to better address the challenges of federated learning in IoV.

3.1 Vehicle-Clustered Federated Learning Network

As shown in Fig. 1, we consider a vehicle-clustered FL network system consisting of vehicular users (VUs) and an edge server (ES). We assume N VUs are randomly distributed in the IoV system, forming a set of VUs \(\mathcal{V} = \left\{ {1,...,n,...,N} \right\} \). These VUs are clustered into M vehicle clusters using the STW-DBSCAN algorithm, which relies on both the space-time location similarities and the model weight similarities of vehicles (introduced in Sect. 4), forming a set of vehicle clusters \(\mathcal{C} = \left\{ {{c_1},...,{c_m},...,{c_M}} \right\} \). We assume the FL model converges after K rounds of global aggregation, where \(k \in \left\{ {1,2,...,K} \right\} \).

Fig. 1. Illustration of the clustered FL process in IoV.

Table 1. Notations and their meanings.

In the vehicle-clustered FL network system, because the VUs within a vehicle cluster are in close proximity, the information they collect and the models they train are highly similar. We can therefore assume that the data of the VUs in a vehicle cluster are the same and can be partitioned into shared data blocks (SDBs). During training, VUs in a vehicle cluster only need to train their own models on their own historical experience data blocks (DBs), without transmitting local data, where the DBs are partitioned according to the computing capabilities of the VUs within each vehicle cluster. The main vehicle cluster head (MCH) is responsible for uploading and downloading model weight parameters. If the MCH goes offline, a vice vehicle cluster head (VCH) is activated to take its place (Table 1).

We consider a unidirectional, straight, multi-lane IoV scenario in which VUs travel along the X-axis in the direction of the arrow. At time t, assuming VU n travels at a constant speed \({\bar{\upsilon }_n}\), its position is denoted \(\{ {x_n}(t),{y_n}(t)\} \). The associated ES e is located at a fixed position \(\{ {x_e},{y_e}\}\) with a coverage radius of r. The remaining distance of VU n within the coverage area of its ES can therefore be defined as:

$$\begin{aligned} {\Pi _n} = \sqrt{r^2 - {{({y_e} - {y_n}(t))}^2}} - ({x_n}(t) - {x_e}). \end{aligned}$$
(1)

Parameters can be uploaded to the current ES only while VU n is within its coverage area. Therefore, the sojourn time of VU n at the current ES is defined as:

$$\begin{aligned} {{T}}_{v,n}^\mathrm{{soj}} = \frac{{{\Pi _n}}}{{{{\bar{\upsilon }}_n}}}, \end{aligned}$$
(2)

where \({\bar{\upsilon }_n}\) is the speed of VU n.

In order to calculate the distance between arbitrary nodes i and j (including VUs and ES), the Euclidean distance formula is introduced:

$$\begin{aligned} dist(i,j) = \sqrt{{{({x_i} - {x_j})}^2} + {{({y_i} - {y_j})}^2}}. \end{aligned}$$
(3)

We resort to the Shannon capacity formula to compute the data rate from node i to node j in the k-th round of FL global aggregation, denoted as:

$$\begin{aligned} R_{{{i}},{{j}}}^k(dist(i,j)) = {B_{{i}}}{\log _2}(1 + \frac{{P_{{i}}^\mathrm{{tx}} \cdot h(dist(i,j))}}{{{N_0}}}), \end{aligned}$$
(4)

where \({N_0}\) is the noise power, \(h(dist(i,j))\) is the channel gain at the distance between nodes i and j, and \(P_{{i}}^\mathrm{{tx}}\) and \({B_{{i}}}\) are the transmission power and communication bandwidth from node i to node j.
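To make the mobility and communication model concrete, the following is a minimal Python sketch of Eqs. (1), (2), and (4). The function names, the example coordinates, and the abstract `channel_gain` argument (the paper does not fix a specific form for h) are illustrative assumptions, not the paper's implementation.

```python
import math

def remaining_distance(x_n, y_n, x_e, y_e, r):
    """Eq. (1): remaining distance of VU n inside the ES coverage circle."""
    return math.sqrt(r**2 - (y_e - y_n)**2) - (x_n - x_e)

def sojourn_time(pi_n, v_n):
    """Eq. (2): how long VU n stays within the current ES coverage."""
    return pi_n / v_n

def data_rate(bandwidth, p_tx, channel_gain, noise_power):
    """Eq. (4): Shannon capacity of the link from node i to node j."""
    return bandwidth * math.log2(1 + p_tx * channel_gain / noise_power)

# Hypothetical example: VU at (100 m, 5 m), ES at the origin with r = 500 m,
# speed 12 m/s, 1 MHz bandwidth, 0.1 W transmit power.
pi_n = remaining_distance(100.0, 5.0, 0.0, 0.0, 500.0)
print(sojourn_time(pi_n, 12.0))          # sojourn time in seconds
print(data_rate(1e6, 0.1, 1e-7, 1e-13))  # data rate in bits per second
```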

During the k-th round of global aggregation, the uplink transmission time for a vehicle cluster \(c_m\) to transmit its cluster model weight parameters to its corresponding ES can be expressed as:

$$\begin{aligned} T_{k,m}^{\mathrm{{tx}}} = \frac{{\left| {w_m^k} \right| }}{{R_{m,e}^k(dist(m,e))}}. \end{aligned}$$
(5)

The downlink transmission time for an ES to transmit global model weight parameters to the MCH of vehicle cluster \(c_m\) in its coverage area can be expressed as:

$$\begin{aligned} T_{k,m}^{\mathrm{{rx}}} = \frac{{\left| {{w_k}} \right| }}{{R_{e,m}^k(dist(e,m))}}. \end{aligned}$$
(6)

Since the time for intra-cluster transmission of parameters from cluster-member VUs to the MCH is short, it is neglected in this paper. Therefore, for the k-th round of global aggregation at the ES, the communication time of a vehicle cluster \(c_m\) consists of the uplink and downlink transmission times of its MCH:

$$\begin{aligned} T_{k,m}^{\mathrm{{comm}}} = T_{k,m}^{\mathrm{{tx}}} + T_{k,m}^{\mathrm{{rx}}}. \end{aligned}$$
(7)

3.2 Cluster-Based Semi-asynchronous Federated Learning Framework for IoV (CSA_FedVeh)

In CSA_FedVeh, let \({\mathcal{C}_k}\) denote the set of vehicle clusters participating in the k-th round of global aggregation, and let the semi-asynchronous aggregation quantity q denote the number of vehicle clusters taking part in each round of global aggregation.

Fig. 2. Illustration of the CSA_FedVeh framework when q=2.

The vehicle clusters \({c_m}\) that participate in the global aggregation process download the global model weight parameters \(w_k^{}\) from the ES and distribute them to all VUs within the cluster. Each VU then updates its local model weight parameters on its own historical data and uploads the new model weight parameters to the MCH, which synchronously aggregates them into the cluster model weight parameters \(w_{k + 1}^{c,m}\) and sends these back to the ES. The ES collects the q cluster model weight parameters that arrive first and performs global aggregation. Meanwhile, the vehicle clusters that did not participate in this round of global aggregation continue training. After performing global aggregation, the ES generates the next round of global model weight parameters \(w_{k + 1}^{}\) and sends them to the MCHs that uploaded local weight parameters in the previous round, which then transmit them to the VUs within their respective clusters for the next round of training.

For instance, as shown in Fig. 2, the vehicle clusters participating in global aggregation in the first, second, and third rounds are \({\mathcal{C}_1} =\left\{ {{c_1},{c_3}} \right\} \), \({\mathcal{C}_2} =\left\{ {{c_4},{c_1}} \right\} \), and \({\mathcal{C}_3} =\left\{ {{c_2},{c_3}} \right\} \), respectively.

To formalize the problem, we introduce the CSA_FedVeh framework from three aspects: vehicle user model training, intra-cluster model aggregation, and semi-asynchronous global model aggregation. The framework is formally described in Algorithm 1.

Vehicle User Model Training. Local loss function: each VU trains a local model based on local DBs, where the loss function of the k-th round FL training model of VU n is defined as:

$$\begin{aligned} {{F}_{v,n}}{{(}}w_k^{v,n}) = \frac{1}{{D_n}}\sum \limits _{({x_d},{y_d}) \in {\mathcal{D}_n}}^{} {f(w_k^{v,n},{x_d},{y_d})} , \end{aligned}$$
(8)

where \(f(w_k^{v,n},{x_d},{y_d})\) is the loss of the model on training sample \({x_d}\) with label \({y_d}\) under the local weight parameter \(w_k^{v,n}\), and \({\mathcal{D}_n}\) and \({D_n} \buildrel \varDelta \over = \left| {{\mathcal{D}_n}} \right| \) denote the local training DB of VU n after partitioning the SDB and its number of samples, respectively.

Local model update: after receiving the global model weight parameters \(w_k^{}\), VU n performs \({\tau _{}}\) iterations of local parameter updates:

$$\begin{aligned} w_{k + 1}^{v,n} \leftarrow w_k - \eta \nabla {F_{v,n}}(w_k), \end{aligned}$$
(9)

where \({\eta }\) is the learning rate and \(\nabla {F_{v,n}}(w_k)\) is the gradient computed by the local model of VU n under the global weight parameters \(w_k\).
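As a concrete illustration of the local update in Eqs. (8)-(9), the sketch below runs \(\tau \) mini-batch SGD steps starting from the received global weights, assuming a PyTorch model; `local_update` and its arguments are illustrative names rather than the paper's implementation.

```python
import torch

def local_update(model, loader, loss_fn, tau, eta):
    """Run tau SGD iterations from the received global weights w_k (Eq. (9))."""
    opt = torch.optim.SGD(model.parameters(), lr=eta)
    it = iter(loader)
    for _ in range(tau):
        try:
            x_d, y_d = next(it)          # one mini-batch from the local DB
        except StopIteration:            # restart the loader if it runs out
            it = iter(loader)
            x_d, y_d = next(it)
        opt.zero_grad()
        loss = loss_fn(model(x_d), y_d)  # f(w, x_d, y_d) averaged over the batch
        loss.backward()                  # gradient of F_{v,n}(w), Eq. (8)
        opt.step()                       # w <- w - eta * grad, Eq. (9)
    return {k: v.detach().clone() for k, v in model.state_dict().items()}
```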

Local resource cost: the computation time and energy consumption of each round of local model training for VU n are denoted as:

$$\begin{aligned} T_{v,n}^{\mathrm{{loc}}} = \frac{{\sum \limits _{{d_n} = 1}^{{D_n}} {{\psi _{{d_n}}}} }}{{c_n^p \cdot {f_n}}},\quad E_{v,n}^{\mathrm{{loc}}} = {p_n} \cdot T_{v,n}^{\mathrm{{loc}}}, \end{aligned}$$
(10)

where \({\psi _{{d_n}}}\) is the number of floating point operations (FLOPs) required to process sample \({d_n}\), \(c_n^p\) is the number of FLOPs VU n can process per CPU cycle, \({f_n}\) is its CPU frequency, and \({p_n}\) is its computational power.

Intra-Cluster Model Aggregation. Cluster model aggregation: the VUs within vehicle cluster \(c_m\) pass their trained local model weight parameters \(w_k^{v,n}\) to the MCH for intra-cluster aggregation, obtaining the cluster model weight parameters:

$$\begin{aligned} w_k^{c,m} \leftarrow \sum \limits _{n \in {c_m}} {{D_n} \cdot w_k^{v,n}}. \end{aligned}$$
(11)

Cluster gradient aggregation: in order to determine the convergence of the global model, the gradients of each vehicle cluster must be uploaded to the ES. Therefore, the aggregated gradient of vehicle cluster \(c_m\) is defined as:

$$\begin{aligned} \nabla {{{F}}_{c,m}}{{(}}w_k^{}) \leftarrow \sum \limits _{n \in {c_m}}^{} {\nabla {{{F}}_{v,n}}{{(}}w_k^{})} . \end{aligned}$$
(12)

Cluster resource cost: the computation time and energy consumption for training each round of local model in vehicle cluster \({c_m}\) is defined as:

$$\begin{aligned} {{T}}_{c,m}^{^{\mathrm{{loc}}}} = \mathop {\max }\limits _{c \in {c_m}} \{ {{T}}_{v,c}^{^{\mathrm{{loc}}}}\} ,\ {{E}}_{{{c,m}}}^{^{\mathrm{{loc}}}} = \sum \limits _{c \in {c_m}}^{} {{{E}}_{v,c}^{^{\mathrm{{loc}}}}} . \end{aligned}$$
(13)
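The MCH-side aggregation of Eqs. (11)-(13) can be sketched as follows; `cluster_updates`, which maps each member VU to its sample count, weights, gradients, and local costs, is an assumed data layout. Note that Eq. (11) is a sample-weighted sum rather than an average; normalization by the total sample count happens at the ES in Eq. (14).

```python
def intra_cluster_aggregate(cluster_updates):
    """cluster_updates[n] = (D_n, w_n, grad_n, T_loc_n, E_loc_n) for VU n."""
    w_cluster, grad_cluster = {}, {}
    T_loc, E_loc = 0.0, 0.0
    for D_n, w_n, grad_n, T_n, E_n in cluster_updates.values():
        for key, val in w_n.items():      # Eq. (11): sample-weighted sum
            w_cluster[key] = w_cluster.get(key, 0) + D_n * val
        for key, val in grad_n.items():   # Eq. (12): sum of member gradients
            grad_cluster[key] = grad_cluster.get(key, 0) + val
        T_loc = max(T_loc, T_n)           # Eq. (13): slowest member dominates
        E_loc += E_n                      # Eq. (13): energies accumulate
    return w_cluster, grad_cluster, T_loc, E_loc
```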
Algorithm 1. CSA_FedVeh

Semi-asynchronous Global Model Aggregation. Global model aggregation: during a round of global aggregation, once the ES has received the cluster model weight parameters transmitted by q vehicle clusters, it performs global aggregation:

$$\begin{aligned} {w_{k + 1}} \leftarrow \frac{1}{{{D_k}}}\sum \limits _{m \in {\mathcal{C}_k}} {w_k^{c,m}}, \end{aligned}$$
(14)

where \({D_k}\) is the total number of samples from all vehicle clusters participating in the k-th round of global aggregation.

Global gradient aggregation: in order to determine the stopping criterion for the convergence of the global model [19], the global gradient is defined as:

$$\begin{aligned} \nabla {{{F}}_{}}{{(}}w_{k + 1}^{}) \leftarrow \frac{1}{q}\sum \limits _{m \in {\mathcal{C}_k}}^{} {\nabla {{{F}}_{c,m}}{{(}}w_k^{})}. \end{aligned}$$
(15)

For the k-th round of FL aggregation at the ES, the time taken for global aggregation is determined by the largest total time among the participating vehicle clusters \({\mathcal{C}_k}\), denoted as:

$$\begin{aligned} {T_k} = \mathcal{T}_k^{sort}[q], \end{aligned}$$
(16)

where \(\mathcal{T}_k^{sort}[q]\) represents the q-th element of the set \(\mathcal{T}_k^{sort}\) (defined in Eq. (20)).
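The semi-asynchronous server step of Eqs. (14)-(16) can be sketched as below: the ES takes the q cluster updates that arrive first, averages their (already \(D_n\)-weighted) sums by the total sample count, and records the round time as the q-th sorted total time. The tuple layout of `updates` is an illustrative assumption.

```python
def global_aggregate(updates, q):
    """updates: list of (T_total, D_m, w_cluster) tuples, one per cluster."""
    chosen = sorted(updates, key=lambda u: u[0])[:q]   # q earliest arrivals
    T_k = chosen[-1][0]                     # Eq. (16): q-th sorted total time
    D_k = sum(D_m for _, D_m, _ in chosen)  # total samples of participants
    w_next = {}
    for _, _, w_cluster in chosen:          # Eq. (14): sum the D_n-weighted
        for key, val in w_cluster.items():  # cluster parameters ...
            w_next[key] = w_next.get(key, 0) + val
    return {key: val / D_k for key, val in w_next.items()}, T_k  # ... then normalize
```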

Assuming that all VUs update their local weight parameters \({\tau _{}}\) times during each cluster weight parameter upload, the local computation time of vehicle cluster \({c_m}\) is the duration between the start time of the k-th round of global aggregation and the end time of local training, denoted as:

$$\begin{aligned} T_{k,m}^{\mathrm{{comp}}}(s_k^m) = \left\{ \begin{array}{ll} T_{c,m}^{\mathrm{{loc}}} \cdot \tau - \sum \limits _{i = k - s_k^m}^{k - 1} {{T_i}}, &{} s_k^m > 0\\ T_{c,m}^{\mathrm{{loc}}} \cdot \tau , &{} s_k^m = 0 \end{array} \right. , \end{aligned}$$
(17)

where \(s_k^m\) denotes the number of global aggregation rounds between the k-th round, in which vehicle cluster \({c_m}\) participates, and the round in which \({c_m}\) last participated. For example, in Fig. 2, \(s_1^3 = 0\), so \(T_{1,3}^{\mathrm{{comp}}}(s_1^3) = T_{c,3}^{\mathrm{{loc}}} \cdot \tau \); \(s_2^1 = 0\), so \(T_{2,1}^{\mathrm{{comp}}}(s_2^1) = T_{c,1}^{\mathrm{{loc}}} \cdot \tau \); and \(s_3^2 = 2\), so \(T_{3,2}^{\mathrm{{comp}}}(s_3^2) = T_{c,2}^{\mathrm{{loc}}} \cdot \tau - \sum \limits _{i = 1}^{2} {{T_i}} = T_{c,2}^{\mathrm{{loc}}} \cdot \tau - {T_1} - {T_2}\).

In the k-th round of ES global aggregation, the total time consumption of vehicle cluster \({c_m}\) is the sum of local computation time and communication time, represented as:

$$\begin{aligned} T_{k,m}^{\mathrm{{total}}} = T_{k,m}^{\mathrm{{comp}}}(s_k^m) + T_{k,m}^{\mathrm{{comm}}}. \end{aligned}$$
(18)

Assuming that all vehicle clusters participate in the k-th round of global aggregation, the set of training times for all vehicle clusters \(\mathcal{C}\) in the k-th round of global aggregation is defined as:

$$\begin{aligned} {\mathcal{T}_k} = \left\{ {T_{k,1}^{\mathrm{{total}}},...,T_{k,m}^{\mathrm{{total}}},...,T_{k,M}^{\mathrm{{total}}}} \right\} . \end{aligned}$$
(19)

We sort the set of training times \({\mathcal{T}_k}\) in ascending order:

$$\begin{aligned} {\mathcal{T}_k}^{sort} = sort({\mathcal{T}_k}). \end{aligned}$$
(20)

We define the matrix of the number of times each vehicle cluster participates in the global aggregation of ES as:

$$\begin{aligned} {{G = }}\left( {\begin{array}{*{20}{c}} {{g_1}}&{...}&{{g_m}}&{...}&{{g_M}} \end{array}} \right) , \end{aligned}$$
(21)

where \({g_m}\) denotes the number of times that vehicle cluster \({{{c}}_m}\) participates in the global aggregation of ES, \(0 \le {g_m} \le K\).

We define the matrix of the energy consumption of all vehicle clusters as follows:

$$\begin{aligned} {{E = }}\left( {\begin{array}{*{20}{c}} {{{E}}_{c,1}^{\mathrm{{loc}}}}&{...}&{{{E}}_{c,m}^{\mathrm{{loc}}}}&{...}&{{{E}}_{c,M}^{\mathrm{{loc}}}} \end{array}} \right) . \end{aligned}$$
(22)

3.3 Problem Formulation

The optimization problem formulated over the vehicle-clustered federated learning network and the CSA_FedVeh framework can be described as:

$$\begin{aligned} \begin{array}{l} \mathrm{{(P1):}}\ \mathop {\min }\limits _{q,\mathcal{C}} \sum \limits _{k = 1}^{K} {{T_k}}\\ \mathrm{{s.t.}}\ \left\{ \begin{array}{l} {C_1}: \left\| {\nabla F(w_{k + 1})} \right\| \le \mu \left\| {\nabla F(w_k)} \right\| \\ {C_2}: \sum \limits _{k = 1}^{K} {{T_k}} \le {T_{\max }}\\ {C_3}: \tau \cdot G \cdot {E^T} \le {E_{\max }}\\ {C_4}: T_{k,m}^{\mathrm{{total}}} \le \mathop {\min }\limits _{c \in {c_m}} \{ T_{v,c}^{\mathrm{{soj}}}\} \\ {C_5}: q \in \{ 1,2,...,M\} . \end{array} \right. \end{array} \end{aligned}$$
(23)

In problem \(\mathrm{{P1}}\), the objective is to minimize the FL training time while satisfying the constraints under the semi-asynchronous aggregation quantity q and the vehicle network clustering strategy \(\mathcal{C}\). Constraint \({C_1}\) is the global aggregation stopping condition [19], where \(\mu \) \((0 \le \mu \le 1)\): \(\mu = 0\) requires the global model to reach an exact solution, whereas \(\mu = 1\) requires no progress from the global model. Constraint \({C_2}\) is the global training time constraint of FL, and constraint \({C_3}\) is the global training energy constraint of FL, where \({T_{\max }}\) and \({E_{\max }}\) are the maximum acceptable global training time and energy consumption of FL, respectively. Constraint \({C_4}\) requires that the total time spent by vehicle cluster \({c_m}\) in the k-th round of global aggregation not exceed the minimum sojourn time of the VUs in \({c_m}\).

Since many machine learning models have complex intrinsic properties, it is difficult to find closed-form solutions for the objective function. Therefore, in Sect. 4, we design novel solutions that reduce the training time and resource costs of FL while maintaining learning accuracy.

4 Methodology

To solve the aforementioned problem, in this section we propose the STW-DBSCAN clustering algorithm, which determines vehicle clustering strategies in a dynamic IoV system. We decompose the original problem, whose intrinsic properties are highly complex, into two sub-problems, the vehicle clustering problem and the semi-asynchronous aggregation mechanism problem, in order to approximate the solution to the original problem while mitigating the impact of Non-IID data on the FL process.

4.1 STW-DBSCAN Density Clustering Algorithm

In order to reduce the complexity of solving \(\mathrm{{P1}}\), we propose a density clustering algorithm (STW-DBSCAN) that relies on both the space-time location similarities and the model weight similarities of vehicles to determine the clustering strategy in problem \(\mathrm{{P1}}\). The algorithm integrates space-time location constraints into the DBSCAN [20] clustering algorithm to guarantee that VUs within the ES region stay within the range of their vehicle cluster. Additionally, considering the Non-IID nature of the data collected by vehicles, the cosine similarity between VU model weight parameters is calculated to ensure that the vehicle data within a cluster belong to the same distribution [12]. Lastly, the MCH is chosen as the VU with the longest sojourn time, and the VCH as the VU with the second-longest sojourn time. With this algorithm we acquire the set of vehicle clusters \(\mathcal{C}\), and by combining \(\mathcal{C}\) with Eq. (2) we determine the MCH and VCH.

Algorithm 2. STW-DBSCAN

The STW-DBSCAN algorithm is mainly determined by the neighborhood threshold \(\varepsilon \), the density threshold MinPts, the vehicle sojourn times \({{T}}_n^\mathrm{{soj}}\), the vehicle speeds \({\bar{\upsilon }_{}}\), the set of VU model weight parameters, and the similarity threshold \(\delta \), where \(\varepsilon \) and MinPts are system-determined hyperparameters. For a VU \(n \in {\mathcal{V}}\), we select the set of VUs in \(\mathcal{V}\) whose distance from VU n does not exceed \(\varepsilon \) as the \(\varepsilon \)-neighborhood of VU n, \({N_\varepsilon }(n)\), denoted as:

$$\begin{aligned} {N_\varepsilon }(n) = \{ p \in \mathcal{V}|dist(n,p) \le \varepsilon \} . \end{aligned}$$
(24)

We compute the sojourn times of the VUs in \({N_\varepsilon }(n)\) using Eq. (2) and take the minimum sojourn time over \({N_\varepsilon }(n)\):

$$\begin{aligned} T_{N,n}^{\mathrm{{soj}}} = \mathop {\min }\limits _{p \in {N_\varepsilon }(n) \cup \{ n\} } \{ T_{v,p}^{\mathrm{{soj}}}\} . \end{aligned}$$
(25)

In the IoV system, if the distance between VU n and every other VU in \({N_\varepsilon }(n)\) remains within \(\varepsilon \) after the minimum sojourn time \({{T}}_{N,n}^\mathrm{{soj}}\), the resulting set is referred to as the \(\varepsilon ^+\)-neighborhood of VU n, \(N_\varepsilon ^ + (n)\), denoted as:

$$\begin{aligned} N_\varepsilon ^ + (n) = \{ p \in {N_\varepsilon }(n)|\sqrt{{{(({x_n} - {{\bar{\upsilon }}_n} \cdot {{T}}_{N,n}^\mathrm{{soj}}) - ({x_p} - {{\bar{\upsilon }}_p} \cdot {{T}}_{N,n}^\mathrm{{soj}}))}^2} + {{({y_n} - {y_p})}^2}} \le \varepsilon \} . \end{aligned}$$
(26)

If \(N_\varepsilon ^ + (n)\) contains at least MinPts other VUs, i.e.,

$$\begin{aligned} \left| {N_\varepsilon ^ + (n)} \right| \ge MinPts, \end{aligned}$$
(27)

then a vehicle cluster \(c_m^{}\) is created, VU n and all VUs in \(N_\varepsilon ^ + (n)\) are added to the cluster, and all VUs in \(N_\varepsilon ^ + (n)\) are added to the candidate set \(\varOmega \). Each VU \(o \in \varOmega \) is checked in turn to see whether \(N_\varepsilon ^ + (o)\) contains at least MinPts other VUs; if o has not yet been added to a vehicle cluster, it is added to vehicle cluster \(c_m^{}\), o is removed from the candidate set \(\varOmega \), and \(N_\varepsilon ^ + (o)\) is added to \(\varOmega \). This process continues until \(\varOmega = \emptyset \). Additionally, if an unclustered VU contains at least MinPts other VUs in its \(\varepsilon ^+\)-neighborhood, a new vehicle cluster and candidate set are created.

We then randomly select a VU \(\alpha \in c_m^{}\) as a baseline and calculate the cosine similarity of the model weight parameters between VU \(\alpha \) and every other VU \(\beta \in c_m^{}\),

$$\begin{aligned} \theta (w_1^{v,\alpha },w_1^{v,\beta }) = \frac{{{{\left( {w_1^{v,\alpha }} \right) }^T}w_1^{v,\beta }}}{{\left\| {w_1^{v,\alpha }} \right\| \left\| {w_1^{v,\beta }} \right\| }},\quad \alpha ,\beta \in c_m^{}, \end{aligned}$$
(28)

to judge whether the data distributions of the VUs in vehicle cluster \(c_m^{}\) are similar. If \(\theta (w_1^{v,\alpha } ,w_1^{v,\beta } ) \le \delta \), indicating low similarity, \(\beta \) is removed from vehicle cluster \(c_m^{}\) and re-clustered, where \(\delta \) \(( - 1 \le \delta \le 1)\) is the similarity threshold; a cosine similarity closer to 1 indicates more similar data distributions, while a value closer to -1 indicates less similar ones. The STW-DBSCAN algorithm is formally described in Algorithm 2.
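A minimal sketch of the two STW-DBSCAN membership tests, the space-time \(\varepsilon ^+\)-neighborhood of Eq. (26) and the weight cosine similarity of Eq. (28), is given below. For brevity it scans all VUs rather than pre-filtering by the plain \(\varepsilon \)-neighborhood of Eq. (24), and the function names and data layout are illustrative assumptions.

```python
import math
import numpy as np

def eps_plus_neighbors(n, vus, eps, t_soj):
    """vus[i] = (x, y, speed); all VUs move along the X-axis (Eq. (26))."""
    x_n, y_n, v_n = vus[n]
    out = []
    for p, (x_p, y_p, v_p) in enumerate(vus):
        if p == n:
            continue
        # positions after the minimum sojourn time t_soj of the neighborhood
        dx = (x_n - v_n * t_soj) - (x_p - v_p * t_soj)
        if math.hypot(dx, y_n - y_p) <= eps:
            out.append(p)
    return out

def weight_similarity(w_a, w_b):
    """Eq. (28): cosine similarity of two flattened weight vectors."""
    return float(np.dot(w_a, w_b) / (np.linalg.norm(w_a) * np.linalg.norm(w_b)))
```

A member \(\beta \) would be kept in the cluster only if `weight_similarity` for its weights against the baseline exceeds the threshold \(\delta \).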

4.2 Semi-asynchronous Aggregation Mechanism

Once the clustering strategy \(\mathcal{C}\) of the STW-DBSCAN algorithm is fixed, \(\mathrm{{P1}}\) reduces to a problem in the single variable q:

$$\begin{aligned} \begin{array}{l} \mathrm{{(P2):}}\ \mathop {\min }\limits _q \sum \limits _{k = 1}^{K} {{T_k}}\\ \mathrm{{s.t.}}\ {C_1},{C_2},{C_3},{C_5}. \end{array} \end{aligned}$$
(29)

We adopt a semi-asynchronous aggregation mechanism [16] to accelerate global model training: each round of global aggregation selects the first q cluster model weight parameters to arrive.

5 Performance Evaluation

5.1 Simulation Setting

Benchmarks. We utilize three classic FL frameworks and a semi-asynchronous framework with randomized clustering as benchmarks for performance comparison.

  • FedAF: FedAF (FedAvgfull) is a synchronous FL framework, a variant of FedAvg [4]. In the FedAF framework, all VUs participate in the global update in each round.

  • FedASY [14]: FedASY is an asynchronous FL framework in which the ES immediately performs a global update upon receiving local model weight parameters from any VU.

  • SAFA [16]: SAFA is a semi-asynchronous FL framework. For simplicity, the client selection in SAFA is removed, and we set the number of participants per round to half of the total number of VUs in our experiments, simulating semi-asynchronous aggregation without clustering.

  • R-SAFA: R-SAFA adopts the SAFA aggregation framework with K-means [21] random clustering, where K is set to the number of clusters produced by STW-DBSCAN, simulating semi-asynchronous aggregation with generic clustering.

Models and Datasets. To ascertain the efficacy of the CSA_FedVeh framework, we conduct experiments with two different training models (LR [22] and CNN [23]) on two real-world datasets (MNIST [24] and the German Traffic Sign Recognition Database (GTSRD) [25]). The MNIST dataset comprises 60,000 training samples and 10,000 testing samples, each a 28\(\,\times \,\)28-pixel grayscale image of a handwritten digit. The GTSRD dataset includes 43 classes of RGB three-channel traffic sign images, divided into 39,209 training images and 12,630 testing images.

Performance Metrics. We use four commonly used performance metrics to evaluate training performance:

1) Loss Function: used to measure the difference between predicted values and actual values.

2) Accuracy: indicates the proportion of correctly classified samples by the model among all samples in the dataset.

3) Runtime: indicates the time taken to complete the training process, used to measure the training speed of the model.

4) Communication Cost: represents the total communication time spent between all vehicles and the ES upon completion of the training process.

Data Distribution. Considering the heterogeneity of data distributions among VUs and the similarity of data collected by vehicles within a given area in real-world IoV scenarios, we need to form a Non-IID dataset across VUs. To redistribute the data, we apply a mixed distribution based on the Dirichlet distribution [26], which produces similar data within each cluster and Non-IID data across VUs. The parameters of the mixed distribution are set to \(a=1.0\) and \(n=3\).
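The following sketch shows one common way to realize such a Dirichlet-based Non-IID partition, in the spirit of [26]; the exact mixed distribution used in the paper (with \(a=1.0\) and \(n=3\)) may differ, so this is an assumption-laden illustration rather than the paper's code.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=1.0, seed=0):
    """Split sample indices among clients, class by class, using Dir(alpha)."""
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_idx = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = rng.permutation(np.where(labels == c)[0])
        p = rng.dirichlet(alpha * np.ones(num_clients))  # per-class shares
        cuts = (np.cumsum(p)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_idx[client].extend(shard.tolist())
    return client_idx  # client_idx[i] holds the sample indices of client i
```

With a smaller alpha the per-class shares become more skewed and each client sees fewer classes, which is how the severity of the Non-IID split is tuned.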

Simulation Parameters. In the FL simulation of the IoV scenario, we consider 50 participating VUs scattered randomly along the lanes at safe distances. The average vehicle speed is approximately 43.6 km/h, and each vehicle is assigned a random speed. We set the neighborhood threshold \(\varepsilon = 25\) m and the density threshold MinPts = 1, with a similarity threshold \(\delta \) of 2%. We use the same batch size of 64 for all VUs. The global learning rate is set to \({\eta } = 0.01\) for both MNIST and GTSRD, and the number of local updates per epoch is set to H = 30.

5.2 Simulation Results

In this section, we compare our CSA_FedVeh framework with the baselines by training models for 30,000 s and 50,000 s on the Non-IID MNIST and GTSRD datasets, respectively. Table 2 lists more detailed training performance comparisons on the MNIST and GTSRD datasets. The results show that the CSA_FedVeh framework works well even when the data distribution is Non-IID.

Table 2. Performance comparison of CSA_FedVeh with four benchmarks under two models.

Convergence Performance. On the MNIST dataset, as shown in Fig. 3(e), when the global training loss of the LR model drops to 0.5, CSA_FedVeh reaches its fastest runtime of 3348 s, which is 44.7% faster than FedASY, 54.6% faster than FedAF, 48.1% faster than SAFA, and 31% faster than R-SAFA. In Fig. 3(f), when the global training loss of the CNN model drops to 0.5, CSA_FedVeh reaches its fastest runtime of 2348 s, which is 27.3% faster than FedASY, 56.6% faster than FedAF, 37.9% faster than SAFA, and 28.3% faster than R-SAFA. On the GTSRD dataset, as shown in Fig. 3(g), when the global training loss reaches 1.0, CSA_FedVeh reaches its fastest running time of 19077 s, which is 48% faster than FedASY, 60.2% faster than FedAF, 53.3% faster than SAFA, and 25.7% faster than R-SAFA. As shown in Fig. 3(h), when the CNN global training loss reaches 1.0, CSA_FedVeh reaches its fastest running time of 11868 s, which is 48.4% faster than FedASY, 58.9% faster than FedAF, 53.1% faster than SAFA, and 24.6% faster than R-SAFA. Results across datasets and models indicate that CSA_FedVeh reduces the time required to reach the same loss value by about 24.6%-60.2%.

Fig. 3. Accuracy and loss vs. runtime with LR and CNN over MNIST and GTSRD.

Not only can CSA_FedVeh stabilize the convergence of the global model, but it also outperforms the four benchmarks in terms of accuracy and convergence speed. Additionally, when comparing Figs. 3(e) and 3(f), as well as Figs. 3(g) and 3(h), CSA_FedVeh exhibits faster convergence speed on CNN models compared to LR models.

Resource Constraints. On the MNIST dataset, as shown in Fig. 3(a), with the running time under the LR model constrained to 30,000 s, CSA_FedVeh achieves the highest accuracy of 94.5%: 2% higher than FedASY, 2.7% higher than FedAF, 2.2% higher than SAFA, and 1.3% higher than R-SAFA. As shown in Fig. 3(b), under the CNN model with a running time constraint of 30,000 s, CSA_FedVeh achieves the highest accuracy of 98.4%: 0.7% higher than FedASY, 1.4% higher than FedAF, 0.8% higher than SAFA, and 0.6% higher than R-SAFA. On the GTSRD dataset, as shown in Fig. 3(c), under a time constraint of 50,000 s for the LR model, CSA_FedVeh achieves the highest accuracy of 89.3%, which is 7.5% higher than FedASY, 12.9% higher than FedAF, 9.4% higher than SAFA, and 3.3% higher than R-SAFA. As shown in Fig. 3(d), under a time constraint of 50,000 s for the CNN model, CSA_FedVeh achieves the highest accuracy of 96.1%, which is 4.4% higher than FedASY, 8.1% higher than FedAF, 5.8% higher than SAFA, and 0.6% higher than R-SAFA. These results indicate that, under the same time budget, CSA_FedVeh achieves higher accuracy and lower loss than FedAF, FedASY, SAFA, and R-SAFA, striking a good balance between convergence speed and accuracy.

Fig. 4. Comparison of communication resource consumption: (a) average communication rounds; (b) communication time cost at 90% accuracy on MNIST; (c) communication time cost at 75% accuracy on GTSRD.

According to Fig. 4(a), on the MNIST and GTSRD datasets, CSA_FedVeh has an average of 501.76 and 838.4 communication rounds, respectively. This is lower than FedASY by 324.62 and 539.3 rounds, lower than SAFA by 47.24 and 77.6 rounds, lower than R-SAFA by 77.24 and 127.1 rounds, and higher than FedAF by 196.76 and 329.4 rounds. These results indicate that CSA_FedVeh requires fewer communication rounds on average than FedASY, SAFA, and R-SAFA, and is second only to FedAF in this respect; moreover, CSA_FedVeh achieves 1.4%-12.9% higher accuracy than FedAF and converges faster. According to Fig. 4(b), when the LR and CNN models reach a training accuracy of 90% on the MNIST dataset, the global communication time of CSA_FedVeh is 6.96 s and 3.808 s, respectively. This is a reduction of 16.6% and 19.83% compared to FedAF, 62.07% and 49.56% compared to FedASY, 47.76% and 37.31% compared to SAFA, and 11.36% and 3.4% compared to R-SAFA. According to Fig. 4(c), when the LR and CNN models reach a training accuracy of 75% on the GTSRD dataset, the global communication time of CSA_FedVeh is 15.6 s and 10.496 s, respectively. This is a decrease of 35.13% and 35.21% compared to FedAF, 68.4% and 68.6% compared to FedASY, 56.99% and 56.93% compared to SAFA, and 1.01% and 7.54% compared to R-SAFA. These results demonstrate that CSA_FedVeh achieves the lowest global communication time across all models. This implies that, for the same communication budget, CSA_FedVeh reduces communication costs between VUs and the ES by executing cluster-based semi-asynchronous aggregation.

In summary, our CSA_FedVeh framework demonstrates superiority over the four benchmarks in the following aspects. Firstly, as seen in Fig. 3, CSA_FedVeh consistently converges better than the benchmarks during training and reduces the training time needed to reach the same loss level by approximately 24.6% to 60.2%. Secondly, as shown in Fig. 4(a), on both the MNIST and GTSRD datasets, the average number of communication rounds for CSA_FedVeh is lower than for FedASY, SAFA, and R-SAFA, and only slightly higher than for FedAF. Additionally, CSA_FedVeh achieves the lowest total communication time, the highest accuracy, and the fastest convergence speed. Finally, Figs. 4(b) and 4(c) show that, compared to the baselines, CSA_FedVeh reduces communication costs by 3.4% to 62.07% on the MNIST dataset and by 1.01% to 68.6% on the GTSRD dataset while achieving similar accuracy.

6 Conclusion

In this paper, we propose CSA_FedVeh, a novel cluster-based semi-asynchronous FL framework for IoV that aims to enhance the effectiveness of FL in the dynamic and intricate scenarios of IoV. Under this guidance, we propose the STW-DBSCAN clustering algorithm, which accounts for the Non-IID nature of the data collected by vehicles and clusters vehicles with similar space-time locations and high model weight similarity, efficiently addressing the straggler problem and accelerating global model training. Meanwhile, we combine it with a semi-asynchronous federated aggregation mechanism to accelerate global aggregation. The experimental results indicate that, compared with the baselines, our proposed framework achieves excellent performance under resource constraints on Non-IID datasets. In the future, we will explore our CSA_FedVeh framework on vehicle tasks that require stable, low-latency, and highly reliable services in IoV, such as object tracking, high-definition (HD) map generation, and augmented reality (AR) navigation.