1 Introduction

Many modern applications generate enormous spatiotemporal data streams that must be processed and analyzed as they arrive. For a variety of systems, the capacity to understand spatiotemporal data streams in real time is essential. Processing enormous amounts of spatiotemporal data from various sources, including online traffic, social media, and sensor networks, is a significant challenge [1], and storing all of this data for later analysis is impractical. In addition, the classification of such unbounded, evolving data streams is complicated by concept drift. Concept drift is the process by which the distribution of the input data, or the relationship between the input and the target label, changes over time. In practice, concept drift can occur in three ways in a stream learning setting: (a) "sudden/abrupt drift", where the data distribution changes completely at once; (b) "gradual drift", where the current concept gradually shifts to another concept over time; and (c) "recurring drift", where an old concept reappears after a specific interval of time [2]. Figure 1 depicts these types of concept drift.

Fig. 1 Structural types of concept drift

Assume that a sample \(\left(X,Y\right)\) in the data stream has three potential classes \((Y=\left({y}_{1}, {y}_{2}, {y}_{3}\right))\) and a two-dimensional feature vector \((X=\left({x}_{1}, {x}_{2}\right))\), as shown in Fig. 2. At time t, the samples follow a specific distribution \({P}_{t }(X, Y)\). Concept drift happens when the distribution changes at time t + 1, that is, when \({P}_{t }\left(X, Y\right) \ne {P}_{t+1 }(X, Y)\). Following the factorization \({P}_{t}\left(X, Y\right)= {P}_{t}\left(X\right)* {P}_{t}\left(Y|X\right)\), concept drift can be attributed to three sources: virtual, real, and mixed drift [3], as shown in Fig. 2. Virtual drift occurs when the decision boundary does not change (\({P}_{t}\left(Y|X\right)= {P}_{t+1}\left(Y|X\right)\)) but the feature vector distribution does, as seen in Fig. 2b, f; that is, \({P}_{t}\left(X\right)\ne {P}_{t+1}\left(X\right)\). Real drift, on the other hand, happens when the decision boundary shifts, as in Fig. 2c, g, where \({P}_{t}\left(Y|X\right) \ne {P}_{t+1}\left(Y|X\right)\), but the feature vector distribution does not change; that is, \({P}_{t}\left(X\right) = {P}_{t+1}\left(X\right)\). Finally, as seen in Fig. 2d, h, mixed drift happens when both the decision boundary (\({P}_{t}\left(Y|X\right) \ne {P}_{t+1}\left(Y|X\right)\)) and the feature vector distribution (\({P}_{t}\left(X\right) \ne {P}_{t+1}\left(X\right)\)) change [4]. Figure 2a, e shows the original dataset before the concept has changed.

Concept drift can also be categorized in two ways based on changes in the class prior probability P(Y). The first category, "FixedImb", as shown in Fig. 2, denotes concept drift with a fixed imbalance ratio: the class prior probability P(Y) remains constant, while the class-conditional probability P(X|Y) varies. The second category, "VarImb", denotes concept drift with a variable imbalance ratio, in which the class prior probability P(Y) changes, as also shown in Fig. 2. Note that this study does not specifically investigate class imbalance; it focuses on the broader concept drift phenomenon.
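The distinction between virtual and real drift can be illustrated with a small simulation. The following is a minimal sketch, not part of the study: it assumes two classes rather than three, Gaussian features, and a deterministic linear rule standing in for P(Y|X); the helper names `draw`, `mean_x1`, and `agreement` are hypothetical.

```python
import random

def draw(n, x_mean=0.0, boundary=1.0, seed=0):
    """Draw n labeled points: x1, x2 ~ N(x_mean, 1); y = 1 if x1 + x2 > boundary."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x1 = rng.gauss(x_mean, 1.0)
        x2 = rng.gauss(x_mean, 1.0)
        points.append(((x1, x2), int(x1 + x2 > boundary)))
    return points

def mean_x1(points):
    """Sample mean of the first feature, a crude summary of P(X)."""
    return sum(x[0] for x, _ in points) / len(points)

def agreement(points, boundary):
    """Fraction of points whose label matches the rule x1 + x2 > boundary."""
    return sum(int(x[0] + x[1] > boundary) == y for x, y in points) / len(points)

before = draw(2000)                    # reference concept at time t
virtual = draw(2000, x_mean=1.0)       # P(X) shifts, labeling rule unchanged
real = draw(2000, boundary=-1.0)       # P(X) unchanged, labeling rule shifts
```

Under these assumptions, `agreement(virtual, 1.0)` stays at 1.0 because the old rule still labels every point correctly, whereas `agreement(real, 1.0)` falls below 1.0 even though the feature distribution is identical to that of `before`.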

Fig. 2 Factors causing concept drift [4]

The learning model's performance degrades if drifts are left unattended. Concept drift is the most challenging issue in real-time learning because it significantly affects the consistency of streaming spatiotemporal data classification [5]. As a result, real-time analytics on streaming or non-stationary spatiotemporal data have recently caught the attention of researchers [6]. Spatiotemporal data streams are data collections that flow continuously and change as they enter a system. According to [7], data streams can be enormous, temporally ordered, rapidly changing, and potentially unbounded in length. Because the data on a streaming platform changes continually, conventional mining methods need to be adapted [8].

It is vital to construct models that can adapt online to both anticipated and unanticipated changes in spatiotemporal data, because traditional machine learning (ML) models cannot handle concept drift [3]. This research therefore proposes an ML-based drift-adaptive framework for spatiotemporal streaming data analytics, that is, for data with both spatial and temporal dimensions.

The framework consists of a "Bayesian Optimization with Tree-structured Parzen Estimator (BO-TPE)" approach for model optimization, an eXtreme Gradient Boosting (XGBoost) model for learning spatiotemporal data, and a newly proposed method called Bayesian-Optimized Adaptive and Sliding Windowing (BOASWIN) for concept drift adaptation. The effectiveness and efficiency of the proposed adaptive framework are assessed using seven open-source datasets. The main contributions of this article are summarized as follows:

  • Novel Drift Adaptation Approach: The paper introduces a novel method called "BOASWIN" to address the challenge of concept drift in spatiotemporal data. BOASWIN combines Bayesian optimization and sliding window techniques. This approach not only detects changes in data distribution but also optimizes model parameters to adapt effectively.

  • Efficient Adaptive Framework: The research presents an adaptive framework that combines "BO-TPE" for model optimization, "XGBoost" for learning spatiotemporal data, and the BOASWIN method for concept drift adaptation. This framework offers offline and online learning functionalities, enhancing its efficiency for spatiotemporal data categorization use cases.

  • Experimental Evaluation: The proposed approach is empirically evaluated using seven open-source datasets and compared against contemporary techniques. This evaluation provides evidence of the framework's effectiveness and efficiency in handling spatiotemporal data streams with varying patterns and concept drift.

While the proposed framework for spatiotemporal streaming data analytics presents several noteworthy contributions, some limitations deserve attention. Firstly, the size and complexity of the spatiotemporal datasets it encounters might influence the framework's effectiveness. Extensive experimentation on datasets with varying scales and dimensions is needed to gauge its scalability accurately. Additionally, despite its efficiency, the framework may introduce some computational overhead when dealing with particularly large-scale data streams. Further research should therefore explore strategies to optimize its computational efficiency.

2 Related works

Various concept drift learning strategies have been developed recently to adapt to shifting concepts [9,10,11]. "Concept drift detectors" try to spot changes in streams by monitoring either the streams' distribution or the performance of a classifier with respect to some measure, such as accuracy. The Adaptive Sliding Window (ADWIN) [12, 13] is a standard method that monitors a classifier's prediction accuracy under the presumption that a change in performance indicates a change in concept [14]. ADWIN maintains a window W, splits it into two adaptive subwindows, and compares their statistics to detect distribution changes. If no change is found, the window grows; if the statistics of the two subwindows differ, it shrinks. The Hoeffding bound [14] determines whether the difference is large enough to be recognized as a change. The "Drift Detection Method (DDM)" [15, 16], a popular model performance-based approach, uses two thresholds, a warning level and a drift level, to track changes in the model's error rate and standard deviation for drift detection [15].
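The two-subwindow test behind ADWIN can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the reference implementation: it stores the raw window and checks every split point, whereas ADWIN proper uses a compressed bucket structure; the class name `SimpleAdaptiveWindow` and the parameters `delta`, `min_sub`, and `max_len` are hypothetical, and the threshold is a Hoeffding-style bound with confidence `delta / n`.

```python
import math
from collections import deque

class SimpleAdaptiveWindow:
    """Simplified ADWIN-style detector over values in [0, 1] (illustrative only)."""

    def __init__(self, delta=0.002, min_sub=5, max_len=1000):
        self.delta = delta          # confidence parameter of the bound
        self.min_sub = min_sub      # smallest subwindow that is tested
        self.window = deque(maxlen=max_len)

    def update(self, value):
        """Add one value; return True if the window was cut because of a change."""
        self.window.append(value)
        values = list(self.window)
        n = len(values)
        total = sum(values)
        head = 0.0
        for n0 in range(1, n):      # n0 = size of the older subwindow
            head += values[n0 - 1]
            n1 = n - n0
            if n0 < self.min_sub or n1 < self.min_sub:
                continue
            m = 1.0 / (1.0 / n0 + 1.0 / n1)          # harmonic-mean term
            eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
            if abs(head / n0 - (total - head) / n1) > eps:
                for _ in range(n0):                  # shrink: drop the older part
                    self.window.popleft()
                return True
        return False                                 # no change: window keeps growing
```

Feeding the detector a stream of per-sample error indicators (0 for a correct prediction, 1 for a mistake) makes the window grow while the error rate is stable and shrink shortly after it jumps.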

In DDM, concept drift is signaled by a significant increase in the model's overall error rate and standard deviation. Since a learner is only changed when its performance deteriorates significantly, DDM is easy to use and can prevent unnecessary model modifications. While DDM is good at detecting abrupt drift, it frequently responds slowly to gradual drifts. Moreover, during a long, slow drift, many data samples must be stored before the drift level is reached, which can cause memory overflow [17]. The "Early drift detection method (EDDM)", a variation of DDM [18], examines the distance between two successive misclassifications rather than the total number of misclassifications. One benefit of this detector is its lack of an input parameter [6]. An incremental learning system called Online Passive-Aggressive (OPA) [19] adapts to drift by responding passively to accurate predictions and aggressively to errors. The standard K-Nearest Neighbors (KNN) model for online data analytics has been improved by "Self-Adjusting Memory with KNN (SAM-KNN)" [20]. The SAM-KNN algorithm uses two memory modules to adjust to concept drift: short-term memory (STM) for the present concept and long-term memory (LTM) for previous concepts [20].

Lu et al. presented the chunk-based dynamic weighted majority to analyze data streams with concept drift, in which the chunk size was adaptively chosen using statistical hypothesis testing [5]. To increase the classification accuracy, Zhang et al. suggested a three-layer concept drift detection method [21]. In [22], a framework for drifting data stream classification that integrates data pre-processing and the dynamic ensemble selection approach is proposed; it uses stratified bagging to train base classifiers. In [11], concept drift detection is achieved using a cluster-based histogram, and segmentation loss minimization increases the method's sensitivity. In [23], a selective ensemble technique based on a deep neural network is proposed to address the concept drift problem; shallow and deep features are merged in the depth unit to improve the convergence of the online deep learning model. A semi-supervised classification system was suggested by Din et al., where micro-clusters were dynamically maintained to capture concept drift in data streams [24]. In [25], a diversified dual ensemble model is built for drifting data streams, where the weights are updated dynamically and adaptively to identify gradual and sudden drift. The aforementioned studies are successful at resolving concept drift, but based on this review, classifier performance received more attention than stream data distributions. As a result, in this research, we examine both classifier performance and streaming data distribution for better classification.
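The warning and drift levels used by DDM, discussed earlier in this section, can be sketched as follows. This is a minimal illustration, assuming the commonly used rule that flags a warning when the error rate plus its standard deviation exceeds the recorded minimum by two standard deviations and a drift at three; the class name `SimpleDDM` and the parameters `warning_k`, `drift_k`, and `min_samples` are hypothetical.

```python
import math

class SimpleDDM:
    """Illustrative DDM-style detector driven by 0/1 error indicators."""

    def __init__(self, warning_k=2.0, drift_k=3.0, min_samples=30):
        self.warning_k = warning_k
        self.drift_k = drift_k
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """error is 1 for a misclassification, 0 otherwise; returns the current state."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n                  # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)     # its standard deviation
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:       # remember the best point so far
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_k * self.s_min:
            self.reset()                          # drift level reached: start over
            return "drift"
        if p + s > self.p_min + self.warning_k * self.s_min:
            return "warning"
        return "stable"
```

Because the statistics are cumulative, a slow increase in the error rate takes many samples to cross the drift level, which is consistent with the slow response to gradual drift noted above.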

3 Proposed model

This study proposes an optimized adaptive sliding window combined with the XGBoost model, called the BOASWIN-XGBoost model, which monitors the classifier's performance and regulates the distribution of data streamed into the classifier for improved classification. Figure 3 illustrates the three components of the proposed framework: pre-processing (stage 1), concept drift detection (stage 2), and classification (stage 3). An initial XGBoost model is trained using the historical dataset. Additionally, Bayesian Optimization, a hyperparameter optimization (HPO) technique, is used to tune the XGBoost model's hyperparameters to produce the optimized XGBoost model.

Fig. 3 Proposed model framework

The proposed system handles data streams that are produced continuously over time. The next step is to process the incoming data streams (stage 3) using the initial XGBoost model. If the proposed BOASWIN approach detects concept drift in the new data streams (stage 2), the XGBoost model is retrained on the new-concept samples collected by BOASWIN's adaptive window so that it fits the current concept. In this way, the system adapts to the ever-changing patterns of the new data streams and maintains correct classifications. Both the classifier's performance and the distribution of the data streams are monitored.

3.1 XGBoost model

The eXtreme Gradient Boosting (XGBoost) model is a powerful ensemble machine learning model built on decision trees [26]. The XGBoost described in [27] is based on the Gradient Boosting Decision Tree (GBDT) and has been shown to offer excellent convergence speed and generalization [28]. The objective function and optimization strategy of the XGBoost algorithm were introduced in [28]. XGBoost's objective function is given by Eq. 1 [29].

$$Obj\left(\theta \right)=L\left(\theta \right)+\Omega \left(\theta \right)$$
(1)

where \(L\left(\theta \right)=\sum_{i=1}^{n}l\left({\widehat{y}}_{i},{y}_{i}\right)\) and \(\Omega \left(\theta \right)=\gamma {\text{T}}+\frac{1}{2}\lambda {\Vert w\Vert }^{2}\).

The objective function \(Obj\left(\theta \right)\) to be optimized consists of two parts, \(L\left(\theta \right)\) and \(\Omega \left(\theta \right)\), where θ denotes the model's parameters. The goal is to find the values of θ that minimize this function. \(L\left(\theta \right)\) is a differentiable convex loss function that measures the difference between the prediction \({\widehat{y}}_{i}\) and the target \({y}_{i}\); that is, it measures how well the model fits the data [29]. Commonly used convex loss functions, such as the mean square loss in Eq. 2 and the logistic loss in Eq. 3, can be used in the above equation.

$$l\left({\widehat{y}}_{i},{y}_{i}\right)={\left({\widehat{y}}_{i}-{y}_{i}\right)}^{2}$$
(2)
$$l\left({\widehat{y}}_{i},{y}_{i}\right)={y}_{i}\mathrm{ln}\left(1+{e}^{-{\widehat{y}}_{i}}\right)+\left(1-{y}_{i}\right)\mathrm{ln}\left(1+{e}^{{\widehat{y}}_{i}}\right)$$
(3)

Complex models are penalized by the regularization term \(\Omega \left(\theta \right)\). \(\mathrm{T}\) is the number of leaves in the tree, and \(\gamma \) is the penalty coefficient applied to each leaf. The product \(\gamma \mathrm{T}\) acts as a pruning penalty on the tree, which prevents overfitting. Compared with the classic GBDT algorithm, the XGBoost algorithm adds the term \(\frac{1}{2}\lambda {\Vert w\Vert }^{2}\), where \(\lambda \) is the regularization parameter and \(w\) is the vector of leaf weights. Increasing the value of this term prevents the model from overfitting and improves its generalization capabilities. However, because the objective in Eq. 1 includes functions (the trees) as parameters, it cannot be optimized with classical methods. Instead, the model is trained additively: at each iteration, a new tree is added so that the prediction moves closer to the target \({y}_{i}\), as shown in Eq. 4 [29].
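As a worked illustration of Eqs. 1 to 3, the objective can be evaluated directly for a single tree. This is a minimal sketch, assuming that the loss is summed over all instances, that labels for the logistic loss are 0 or 1 with the prediction given as a raw score, and that the logistic loss takes its standard form y ln(1 + e^(-ŷ)) + (1 - y) ln(1 + e^(ŷ)); the function names are hypothetical, and `gamma` is treated here as the per-leaf penalty.

```python
import math

def squared_loss(y_hat, y):
    """Mean square loss term for one instance (Eq. 2)."""
    return (y_hat - y) ** 2

def logistic_loss(y_hat, y):
    """Standard logistic loss for a raw score y_hat and a label y in {0, 1}."""
    return y * math.log1p(math.exp(-y_hat)) + (1 - y) * math.log1p(math.exp(y_hat))

def regularizer(leaf_weights, gamma, lam):
    """Omega = gamma * T + 0.5 * lambda * ||w||^2, with T the number of leaves."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)

def objective(preds, targets, leaf_weights, gamma, lam, loss=squared_loss):
    """Obj = L + Omega (Eq. 1), with L summed over all instances."""
    data_term = sum(loss(p, t) for p, t in zip(preds, targets))
    return data_term + regularizer(leaf_weights, gamma, lam)

# Two instances, a tree with two leaves of weight 0.5 and -0.5.
value = objective([1.0, 2.0], [0.0, 2.0], [0.5, -0.5], gamma=1.0, lam=2.0)
```

In this example the data term is 1.0 and the regularizer is 2.0 + 0.5, so `value` is 3.5; raising `gamma` or `lam` increases the penalty on larger trees and larger leaf weights.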

$$Obj\left(\theta \right)=\sum_{i=1}^{n}l\left({y}_{i},{{\widehat{y}}_{i}}^{\left(t-1\right)}+{S}_{t}\left({\mathrm{T}}_{i}\right)\right)+\Omega \left(\theta \right)$$
(4)

where \({S}_{t}\left({\mathrm{T}}_{i}\right)\) denotes the output of the tree added at iteration \(t\) for instance \(i\), and n is the number of data points.

The optimization target in each iteration is to build a tree that minimizes the objective function. Eq. 4 is straightforward to optimize for the squared loss, but it becomes difficult for other loss functions. Therefore, Eq. 4 is approximated by Eq. 5 using a second-order Taylor expansion, which allows other loss functions to be handled.

$$Obj\left(\theta \right)=\sum_{i=1}^{n}\left[l\left({y}_{i},{{\widehat{y}}_{i}}^{\left(t-1\right)}\right)+{g}_{i}{S}_{t}\left({\mathrm{T}}_{i}\right)+\frac{1}{2}{h}_{i}{S}_{t}^{2}\left({\mathrm{T}}_{i}\right)\right]+\Omega \left(\theta \right)$$
(5)

where \({g}_{i}={\partial }_{{{\widehat{y}}_{i}}^{\left(t-1\right)}}l\left({y}_{i},{{\widehat{y}}_{i}}^{\left(t-1\right)}\right)\) is the first derivative of the loss function and \({h}_{i}={\partial }_{{{\widehat{y}}_{i}}^{\left(t-1\right)}}^{2}l\left({y}_{i},{{\widehat{y}}_{i}}^{\left(t-1\right)}\right)\) is its second derivative.
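The quantities in Eq. 5 can be made concrete for the two losses mentioned earlier. The sketch below is illustrative only: it assumes the squared loss (for which the second-order expansion is exact) and the standard logistic loss on raw scores, and it uses the well-known closed-form leaf weight w = -G / (H + λ), which is not stated in the text above; all function names are hypothetical.

```python
import math

def squared_grad_hess(y, y_hat):
    """First and second derivatives of (y_hat - y)^2 with respect to y_hat."""
    return 2.0 * (y_hat - y), 2.0

def logistic_grad_hess(y, y_hat):
    """Derivatives of the logistic loss for a raw score y_hat and label y in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-y_hat))
    return p - y, p * (1.0 - p)

def taylor_loss(y, y_hat, step, loss, grad_hess):
    """Second-order approximation of loss(y, y_hat + step), as in Eq. 5."""
    g, h = grad_hess(y, y_hat)
    return loss(y, y_hat) + g * step + 0.5 * h * step * step

def optimal_leaf_weight(grads, hess, lam):
    """Leaf weight minimizing sum(g*w + 0.5*h*w^2) + 0.5*lam*w^2."""
    return -sum(grads) / (sum(hess) + lam)

def sq(y, y_hat):
    return (y_hat - y) ** 2

# Three instances in one leaf, all currently predicted as 0.
pairs = [squared_grad_hess(y, 0.0) for y in (1.0, 2.0, 3.0)]
grads = [g for g, _ in pairs]
hess = [h for _, h in pairs]
```

With no regularization the leaf weight equals the mean target (2.0 here), and increasing `lam` shrinks it toward zero, which is the effect of the \(\lambda\) term in \(\Omega\).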

Because the tree model must find the best split points, the XGBoost implementation sorts the feature values in advance and stores them in several blocks. This structure is reused in subsequent iterations, resulting in a significant reduction in computing complexity. Furthermore, the information gain of each feature must be determined during the node splitting process, which employs the greedy algorithm shown in Algorithm 1; the block structure allows the calculation of information gain to be parallelized [28].

Algorithm 1 Split finding greedy algorithm

Algorithm 1 uses a greedy approach as its fundamental strategy. It first sorts the data by feature value and then iterates through each feature. For each feature, it considers every possible value as a potential split point and computes the corresponding gain. After evaluating all features in this manner, the algorithm selects the split with the largest gain as the optimal split point. Within the algorithm, 'j' represents the index used to iterate through all feature values during the sorting process, while 'k' is employed to iterate through all samples.
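Since the pseudocode of Algorithm 1 is not reproduced in the text, the following is a hedged sketch of an exact greedy split search in the same spirit. It assumes the standard XGBoost gain, 0.5 * [G_L^2/(H_L+λ) + G_R^2/(H_R+λ) - G^2/(H+λ)] - γ, computed from the first and second derivatives of the loss; the function name `best_split` and its return format are hypothetical.

```python
def best_split(columns, grads, hess, lam=1.0, gamma=0.0):
    """Scan every feature and every boundary between sorted values.

    columns: list of feature columns, each a list with one value per sample.
    Returns (gain, feature_index, threshold); feature_index is None if no
    split has positive gain.
    """
    G, H = sum(grads), sum(hess)
    best = (0.0, None, None)
    for feat, col in enumerate(columns):
        order = sorted(range(len(col)), key=col.__getitem__)  # sort by feature value
        GL = HL = 0.0
        for pos in range(len(order) - 1):
            i, nxt = order[pos], order[pos + 1]
            GL += grads[i]          # accumulate statistics of the left child
            HL += hess[i]
            if col[i] == col[nxt]:
                continue            # cannot split between equal values
            GR, HR = G - GL, H - HL
            gain = 0.5 * (GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam)
                          - G ** 2 / (H + lam)) - gamma
            if gain > best[0]:
                best = (gain, feat, (col[i] + col[nxt]) / 2.0)
    return best

# Squared-loss derivatives for targets [0, 0, 0, 5, 5, 5] at prediction 0.
targets = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
grads = [2.0 * (0.0 - y) for y in targets]
hess = [2.0] * len(targets)
columns = [[5, 1, 4, 2, 6, 3], [1, 2, 3, 10, 11, 12]]
result = best_split(columns, grads, hess)
```

In this toy example only the second feature separates the two target groups, so the search selects it with a threshold halfway between 3 and 10.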

3.2 Dynamic hyperparameter tuning

Dynamic hyperparameter tuning strategies are pivotal in ensuring that machine learning models maintain their effectiveness in changing data characteristics. These strategies enable models to adapt and optimize their hyperparameters to match evolving data distributions.

Dynamic hyperparameter tuning, as outlined in the literature [30], involves automatically adjusting hyperparameters during training and inference. One of the primary ways it achieves this is through a learning rate schedule. Learning rate scheduling dynamically adapts the learning rate based on performance metrics or predefined schedules. For instance, the learning rate may be reduced when the loss plateaus, allowing the model to fine-tune its parameters more delicately in response to data changes [31].
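A reduce-on-plateau schedule of the kind described above can be sketched as follows. This is an illustrative example rather than a method used in the study; the function name and the parameters `factor`, `patience`, and `min_delta` are assumptions.

```python
import math

def reduce_on_plateau(losses, lr=0.1, factor=0.5, patience=2, min_delta=1e-4):
    """Return the learning rate in effect after each observed loss value."""
    best = math.inf
    wait = 0
    schedule = []
    for loss in losses:
        if loss < best - min_delta:   # meaningful improvement: reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:      # plateau: shrink the learning rate
                lr *= factor
                wait = 0
        schedule.append(lr)
    return schedule

rates = reduce_on_plateau([1.0, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7])
```

Here the rate is halved each time the loss fails to improve for two consecutive observations.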

Another strategy involves early stopping, a widely recognized technique [32]. By monitoring a validation metric such as validation loss, the model's training can be halted when it begins to deteriorate, thereby preventing overfitting and ensuring that the model remains robust to variations in data characteristics. Adaptive optimizers like Adam and RMSprop [33] are also valuable in dynamic hyperparameter tuning. These optimizers adapt the learning rates for individual model parameters based on their gradients, allowing the model to navigate through varying data landscapes effectively.
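Early stopping can be expressed as a small rule over the sequence of validation losses. The sketch below is illustrative and assumes a simple patience criterion; the function name `early_stopping` and the `patience` parameter are hypothetical.

```python
import math

def early_stopping(val_losses, patience=3):
    """Return (stop_round, best_round) for a sequence of validation losses.

    Training stops at the first round that is `patience` rounds past the best
    round without improvement; otherwise it runs to the last round.
    """
    best = math.inf
    best_round = 0
    for rnd, loss in enumerate(val_losses):
        if loss < best:
            best, best_round = loss, rnd
        elif rnd - best_round >= patience:
            return rnd, best_round
    return len(val_losses) - 1, best_round

stop_round, best_round = early_stopping([0.9, 0.7, 0.6, 0.65, 0.66, 0.7, 0.5])
```

With a patience of three rounds, training in this example halts before the final value is ever observed, and the model from the best round would be kept.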

Hyperparameter search methods, as discussed in research by Wu et al. [34], can continuously seek optimal hyperparameter configurations as data evolves. Techniques like Bayesian optimization or grid search can be employed to identify the best hyperparameters for the current data distribution, ensuring the model's adaptability. Ensemble models [35] and online learning [19] are further strategies for adapting models to changing data patterns by combining multiple models or incrementally updating the model with new data.

In summary, dynamic hyperparameter tuning strategies, backed by research in the field, provide the means for models to adapt their hyperparameters to continuously changing data characteristics. By incorporating these strategies, machine learning models can maintain their performance and relevance over time. They are well-suited for real-world applications where data is subject to fluctuations and shifts.

3.2.1 Bayesian optimization (BO)

BO [36] models were created to solve optimization problems. BO comprises two essential components: a surrogate model that approximates the objective function, and an acquisition function that measures the value of evaluating the objective function at a new location [37]. Two activities, exploration and exploitation, take place during BO. Exploration samples previously unexplored regions, whereas exploitation samples the current region where the global optimum is most likely to lie. BO models should balance these two activities [38]. "The Gaussian Process (GP)" and "The Tree-structured Parzen Estimator (TPE)" are two popular BO surrogate models [39, 40]. Based on the surrogate model used, BO models can be divided into BO-GP and BO-TPE models [41]. In this study, we adopted BO-TPE because BO-GP has cubic computational complexity, \(O({n}^{3})\), which restricts its parallelizability. Instead of deriving a predictive distribution for the objective function, BO-TPE creates two density functions, l(x) and g(x), that act as generative models for all processed data. In BO-TPE, the observations are divided into two groups (good and poor observations) according to a predetermined threshold \({y}^{*}\), and each group is modeled using standard Parzen windows (Eq. 6):

$$p\left(x|y, D\right)= \left\{\begin{array}{ll}l\left(x\right), & \text{if }y< {y}^{*}\\ g\left(x\right), & \text{if }y>{y}^{*}\end{array}\right.$$
(6)

where y = f(x) represents the prediction for input data x, and D is the configuration search space.

Because it can optimize complex configurations with a low computational complexity of \(O(n\log n)\), BO-TPE has demonstrated excellent performance in a variety of machine learning applications [37, 38]. Furthermore, TPE handles conditional variables well because it uses a tree structure to preserve conditional dependencies [42]. Hence, we used BO to optimize the proposed adaptive XGBoost model and the adaptive sliding windows for effective concept drift handling in spatiotemporal data streams.
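The split into l(x) and g(x) in Eq. 6 can be sketched for a single continuous hyperparameter. This is a simplified, one-dimensional illustration and not the optimizer used in the study: it assumes Gaussian Parzen windows with a fixed bandwidth, takes the threshold y* as a quantile of the observed objective values, and picks the candidate that maximizes l(x)/g(x); the names `parzen` and `tpe_suggest` and all parameters are hypothetical.

```python
import math
import random

def parzen(points, bandwidth):
    """Gaussian Parzen-window density estimate built from 1-D points."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm
    return density

def tpe_suggest(history, quantile=0.25, n_candidates=64, bandwidth=0.5, seed=0):
    """Suggest the next x from a history of (x, y) pairs, lower y being better.

    Requires at least two observations so that both groups are non-empty.
    """
    rng = random.Random(seed)
    ordered = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, math.ceil(quantile * len(ordered)))   # y* is the quantile cut
    good = [x for x, _ in ordered[:n_good]]               # observations with y < y*
    bad = [x for x, _ in ordered[n_good:]]                # remaining observations
    l, g = parzen(good, bandwidth), parzen(bad, bandwidth)
    # Sample candidates around the good points and rank them by l(x) / g(x).
    candidates = [rng.gauss(rng.choice(good), bandwidth) for _ in range(n_candidates)]
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))

# Toy history for the objective f(x) = (x - 3)^2.
history = [(x, (x - 3.0) ** 2) for x in (2.8, 3.0, 3.2, -5.0, -5.5, -4.5, -6.0, -4.0)]
suggestion = tpe_suggest(history)
```

Each suggestion requires only density evaluations over the stored observations, so no matrix inversion is needed as it would be with a GP surrogate.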

3.3 Bayesian optimized adaptive sliding window (BOASWIN)

As seen from the literature [13, 17], an adaptive sliding window keeps growing until it is large enough to detect a drift, which is a drawback: a gradual drift may go unnoticed for some time, and the classifier produces wrong classifications because the drift is not detected promptly. To handle this challenge and monitor the distribution of the streams, we propose BOASWIN, in which the window sizes are treated as tunable variables, allowing closer monitoring.

The BOASWIN approach is proposed in this study to provide reliable analytics. It is intended to detect concept drift and adapt to the continually changing data stream. BOASWIN synthesizes ideas from sliding-window methods, adaptive-window methods, and performance-based approaches. Two essential functions, "ConceptDriftAdaptation" and "HPO_BO-TPE," comprise the entire BOASWIN technique. Given a set of hyperparameter values, the "ConceptDriftAdaptation" function identifies concept drift in streaming data and updates the XGBoost model with new-concept samples for drift adaptation. The "HPO_BO-TPE" function tunes the hyperparameters of the "ConceptDriftAdaptation" function using BO-TPE. BOASWIN uses two types of windows: a sliding window (P) for concept drift detection and an adaptive window (Pmax) for storing new-concept samples. The concept drift detection method also uses two thresholds: the drift level α and the warning level β.

The four parameters of the proposed BOASWIN method, α, β, P, and Pmax, are the crucial hyperparameters that directly affect how well the BOASWIN model performs. Since BO works well for both discrete and continuous hyperparameters, and the hyperparameters of BOASWIN are of both kinds, it is used to tune these four hyperparameters to obtain the optimal adaptive learner. We adopted the adaptive sliding window algorithm proposed in [43, 44] and modified it to achieve our goals. The resulting BOASWIN algorithm is given in Algorithm 2.

Algorithm 2 Bayesian-optimized adaptive sliding window (BOASWIN)
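Because the pseudocode of Algorithm 2 is not reproduced in the text, the following is only one plausible reading of the "ConceptDriftAdaptation" function, sketched under explicit assumptions: the sliding window holds the last P correctness indicators, a warning is raised when its accuracy falls below β times a baseline accuracy, a drift is declared when it falls below α times the baseline (or when the adaptive window reaches Pmax samples), and the learner is then retrained on the samples collected since the warning. The toy `MajorityClass` learner stands in for XGBoost, and every name below is hypothetical.

```python
from collections import deque

class MajorityClass:
    """Toy stand-in for the XGBoost learner (assumption for illustration)."""
    def __init__(self, label=0):
        self.label = label
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self
    def predict(self, x):
        return self.label

def concept_drift_adaptation(stream, model, alpha, beta, win_size, max_win_size,
                             make_model=MajorityClass):
    """Return (overall accuracy, drift points) for a stream of (x, y) pairs."""
    recent = deque(maxlen=win_size)   # sliding window P of 0/1 hits
    adaptive = []                     # adaptive window, at most Pmax samples
    baseline, warning = None, False
    correct, n_seen, drift_points = 0, 0, []
    for t, (x, y) in enumerate(stream):
        hit = int(model.predict(x) == y)
        correct += hit
        n_seen += 1
        recent.append(hit)
        if warning:
            adaptive.append((x, y))   # collect candidate new-concept samples
        if len(recent) < win_size:
            continue
        acc = sum(recent) / win_size
        if baseline is None:
            baseline = acc
        elif not warning:
            if acc < beta * baseline:             # warning level reached
                warning, adaptive = True, [(x, y)]
            else:
                baseline = max(baseline, acc)
        elif acc < alpha * baseline or len(adaptive) >= max_win_size:
            X_new = [xi for xi, _ in adaptive]    # drift level reached: retrain
            y_new = [yi for _, yi in adaptive]
            model = make_model().fit(X_new, y_new)
            drift_points.append(t)
            recent.clear()
            adaptive, baseline, warning = [], None, False
        elif acc >= beta * baseline:              # false alarm: back to normal
            warning, adaptive = False, []
    return correct / max(n_seen, 1), drift_points

stream = [((float(i),), 0) for i in range(200)] + [((float(i),), 1) for i in range(200, 400)]
accuracy, drifts = concept_drift_adaptation(stream, MajorityClass(0), alpha=0.9,
                                            beta=0.95, win_size=20, max_win_size=100)
```

On this toy stream a learner that is never retrained would score 50%, while the adaptive loop recovers shortly after the label flips. In the framework described above, an outer "HPO_BO-TPE" step would search over α, β, P, and Pmax, using the returned accuracy as the objective.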

4 Experimental analysis

Our experiments are carried out and analyzed using the River [45] library and Python 3.9. Our proposed approach, BOASWIN-XGBoost, is compared to seven cutting-edge models: ADWIN, DDM, EDDM, OPA, SAM-KNN, Streaming Random Patches (SRP), and XGBoost. We aim to make a fair comparison of how well these models perform in the face of various types of concept drift.

4.1 Datasets used in the study

The true location of each drift must be known before a drift detector's performance can be evaluated against the various detection criteria. Only synthetic datasets make this possible. The scikit-multiflow framework enables the creation of various types of synthetic data to simulate the occurrence of drifts [46]. Table 1 gives specific details about the seven datasets that were used in this research.

Table 1 Attributes of the datasets

The Agrawal generator [47] describes hypothetical loan applications using three categorical and six numeric attributes. A perturbation factor offsets the true values of the numeric attributes. The generator provides ten functions for deciding whether or not a loan should be granted, and concept drift is introduced by switching between these functions.

The SEA generator [48] comprises two classes and three randomly generated numerical attributes, the third of which acts as noise. The attribute values are drawn at random from the range [0,10]. Each instance is assigned to class 1 if \({f}_{1}+{f}_{2} \le \theta \), where \({f}_{1}\) and \({f}_{2}\) are the first two attributes and \(\theta \) is a threshold that defines the different concepts, taking a value of 8, 9, 7, or 9.5.
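The SEA labeling rule can be reproduced with a short generator. This is a minimal sketch, not the scikit-multiflow implementation: it assumes that the third attribute does not enter the labeling rule, that the concepts follow each other abruptly in the order of the thresholds listed above, and that an optional label-noise rate `noise` can flip labels; the function name `sea_stream` and its parameters are hypothetical.

```python
import random

def sea_stream(n_per_concept, thetas=(8.0, 9.0, 7.0, 9.5), noise=0.0, seed=0):
    """Yield (features, label, theta) with an abrupt drift at each change of theta."""
    rng = random.Random(seed)
    for theta in thetas:
        for _ in range(n_per_concept):
            f = [rng.uniform(0.0, 10.0) for _ in range(3)]  # f[2] is irrelevant
            label = int(f[0] + f[1] <= theta)               # class 1 if f1 + f2 <= theta
            if rng.random() < noise:                        # optional label noise
                label = 1 - label
            yield f, label, theta

samples = list(sea_stream(250))
```

Because the position of every threshold change is known by construction, detected drift points can be compared directly with the true ones.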

The weather dataset comprises data from over 9000 weather stations worldwide, provided by the US National Oceanic and Atmospheric Administration. Records go back to the 1930s and cover a wide range of weather patterns. Temperature, pressure, wind speed, and other variables are measured every day, along with indicators for precipitation and other weather-related phenomena. We used the data from Offutt Air Force Base in Bellevue, Nebraska, as a representative real-world dataset for this experiment because its long span of 50 years (1949–1999) and its variety of weather patterns make it a long-term precipitation classification/prediction drift challenge [49].

HyperPlane: in this dataset, gradually changing concepts are generated using the formula \(f\left(x\right)={\sum }_{i=1}^{d-1}{a}_{i}*\left(\left({x}_{i}+{x}_{i+1}\right)/{x}_{i}\right)\), where d = 10 is the dimension and \({a}_{i}\) controls the decision hyperplane [50].

The phishing dataset, which distinguishes between malicious and benign web pages, is taken from [51]. The digits dataset [51] was taken to represent a typical classification problem.

4.2 Results and discussion

BO automatically tunes the hyperparameters of the XGBoost and BOASWIN models to produce optimized versions. Table 2 displays the initial hyperparameter search ranges of the XGBoost and BOASWIN models and the hyperparameter values found for the seven datasets under consideration. The values found by BO were then assigned to the proposed models to create optimized models for spatiotemporal classification. Table 3 shows the default and optimized classification accuracy for the seven datasets; in each case, the optimized accuracy is higher than the default.

Table 2 Hyperparameters of XGBoost and BOASWIN
Table 3 Default and optimized performance accuracies of the XGBoost model

4.2.1 Analysis of varying window size

As part of our goal to monitor the stream data distribution in the sliding windows, Table 4 shows the experiments carried out in this study with varying window sizes to see which sizes of these windows produced good results in the presence of concept drift. It was observed that moderate-sized windows produce good outputs regarding the classifier accuracy on the seven datasets used in this study. The bold values from Table 4 were the best parameters; hence, they were used to detect changes in the datasets used, and the results of the experiments are shown in Fig. 4.

Table 4 Performance of the varying window sizes on the seven datasets
Fig. 4 Drift points detection graphs

4.2.2 Drift points

Drift points are the specific moments or data points at which changes in the data distribution become evident, often leading to the need for model adaptation or retraining. Identifying and monitoring drift points is crucial for maintaining model accuracy and effectiveness in applications that involve evolving data distributions. In Fig. 4, the dots indicate the change points in the datasets used. The blue lines, representing our proposed model, track the drift points while maintaining a higher classification accuracy than the offline XGBoost model, shown in red. The accuracy of the proposed BOASWIN + XGBoost model is compared in Table 5 to the cutting-edge drift adaptive techniques described in an earlier section.

Table 5 Comparison of the effectiveness of drift adaption techniques

4.2.3 Analysis of AGRAWAL dataset

Table 5 shows that the suggested adaptive model performs better than all previous techniques regarding accuracy on the seven datasets used in this study. Bold values show the best outcomes for each dataset in Table 5.

The proposed technique, applied to the AGRAWAL_a dataset and illustrated in Fig. 5, attained the highest accuracy of 70.83% among all implemented models by adapting to the sudden concept drift in the dataset. The offline XGBoost model, without drift adaptation, is slightly less accurate at 70.53%. The accuracies of the other six cutting-edge methods are also lower than that of our proposed strategy. Additionally, as shown in Fig. 6, our proposed model (BOASWIN + XGBoost), indicated as "OURs", beat the other state-of-the-art models in terms of precision, recall, and f1-score (69.61%, 67.87%, and 68.73%, respectively).
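The reported f1-score can be cross-checked against the reported precision and recall, since the f1-score is their harmonic mean. The short sketch below uses only the figures quoted above; the helper name `f1_score` is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Precision and recall reported for BOASWIN + XGBoost on AGRAWAL_a.
agrawal_a_f1 = f1_score(0.6961, 0.6787)
```

Rounded to two decimal places as a percentage, `agrawal_a_f1` gives 68.73%, which agrees with the reported value.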

Fig. 5 Comparison of the accuracy of the AGR_a dataset using different drift adaption techniques

Fig. 6 Comparison of the proposed model's precision, recall, and f1-score and other state-of-the-art models on the Agrawal_a dataset

The AGRAWAL_g dataset contains a gradual drift. As seen in Fig. 7 and Table 5, the proposed technique attained the best accuracy of 71.02% by responding to the gradual drift. In comparison, the offline XGBoost model, without drift adaptation, reaches a lower accuracy of 70.28%. This highlights the benefit of our proposed drift adaptation technique. The proposed technique is also more accurate than the other six examined methods, OPA, SAM-KNN, SRP, ADWIN, DDM and EDDM, which have accuracy values of 50.32%, 53.71%, 67.15%, 67.78%, 67.18%, and 67.03%, respectively. Figure 8 compares the precision, recall, and f1-score of our proposed model with those of the other models in this study. The proposed model was best in precision and f1-score, with values of 69.47% and 69.00%, respectively, while the XGBoost model had the highest recall value of 69.83%.

Fig. 7 Comparison of the accuracy of the AGRAWAL_g dataset using different drift adaption techniques

Fig. 8 Comparison of the proposed model's precision, recall, and f1-score and other state-of-the-art models on the Agrawal_g dataset

The success of "BOASWIN + XGBoost" on the AGRAWAL datasets can be attributed to the synergistic combination of Bayesian optimization (BOASWIN) and the XGBoost model. While other methods struggle to adapt to concept drift adequately, the proposed approach optimizes model hyperparameters dynamically, ensuring robust performance even in changing data distributions.

4.2.4 Analysis of the SEA datasets

The accuracy of the proposed BOASWIN + XGBoost model is compared on the SEA_a dataset, which contains sudden drift, and on the SEA_g dataset, which contains gradual drift. Figure 9 shows that the proposed model outperforms the other models by adapting to the sudden concept drift in the SEA_a dataset, and Fig. 10 shows the same for the gradual drift in the SEA_g dataset. Among all implemented models, the proposed model attained the highest accuracy: 76.76% on SEA_a (Fig. 9) and 76.96% on SEA_g (Fig. 10). Figure 11 compares the precision, recall, and f1-score of the proposed model with those of the other models on the SEA_a dataset. The proposed model was best in precision, with a value of 76.81%, while the XGBoost model had the highest recall and f1-score, at 80.05% and 76.82%, respectively.

Fig. 9
figure 9

Comparison of the accuracy of the SEA_a dataset using different drift adaption techniques

Fig. 10
figure 10

Comparison of the accuracy of the SEA_g dataset using different drift adaption techniques

Fig. 11
figure 11

Comparison of the proposed model's precision, recall, and f1-score and other state-of-the-art models on the SEA_a dataset

Figure 12 compares the precision, recall, and f1-score of the proposed model with those of the other models on the SEA_g dataset. The proposed BOASWIN + XGBoost model outperformed all the other models in precision and f1-score, with the highest values of 76.93% and 76.86%, respectively, while the XGBoost model had the best recall, at 79.25%.

Fig. 12
figure 12

Comparison of the proposed model's precision, recall, and f1-score and other state-of-the-art models on the SEA_g dataset

4.2.5 Analysis of the HYPERPLANE dataset

The HYP dataset contains both sudden and recurring concept drift, including a significant drift at the start of the test set. As seen in Fig. 13, the proposed technique attained the best accuracy of 84.26% by responding to both the sudden and recurring drifts. In comparison, the accuracy of the offline XGBoost model, without drift adaptation, drops to 74.66%. The proposed technique is also more accurate than the other six examined methods, OPA, SAM-KNN, SRP, ADWIN, DDM, and EDDM, whose accuracy values are 81.95%, 75.59%, 76.62%, 79.59%, 78.26%, and 77.44%, respectively. Figure 14 compares the precision, recall, and f1-score of the proposed model on the HYP dataset with those of the other models in this study. The proposed BOASWIN + XGBoost model outperformed all the other models in precision, recall, and f1-score, with the highest values of 84.02%, 84.64%, and 84.33%, respectively.

Fig. 13
figure 13

Comparison of the accuracy of the HYP dataset using different drift adaption techniques

Fig. 14
figure 14

Comparison of the proposed model's precision, recall, and f1-score and other state-of-the-art models on the HYP dataset

4.2.6 Analysis of PHISHING and WEATHER datasets

The accuracy of the proposed BOASWIN + XGBoost model is also evaluated on the real-world datasets PHI and WET, which contain drifts. Figure 15 compares the performance of the proposed model with that of the competing models on the PHI dataset, and Fig. 16 does the same for the WET dataset. The proposed model outperformed the other models on the PHI dataset, responding to the detected changes with an accuracy of 95.53%. Its accuracy of 78.35% was also the highest on the WET dataset. Figure 17 compares the precision, recall, and f1-score of the proposed model on the PHI dataset with those of the other models tested in this work. The proposed BOASWIN + XGBoost model outperformed all the other models in precision, recall, and f1-score, with the highest values of 94.99%, 97.10%, and 96.03%, respectively.

Fig. 15
figure 15

Comparison of the accuracy of the PHI dataset using different drift adaption techniques

Fig. 16
figure 16

Comparison of the accuracy of the WET dataset using different drift adaption techniques

Fig. 17
figure 17

Comparison of the proposed model's precision, recall, and f1-score and other state-of-the-art models on the PHI dataset

Figure 18 compares the precision, recall, and f1-score of the proposed model with those of the other models tested in this study on the WET dataset. Here the XGBoost model had the highest precision, at 73.69%, while the proposed model had the best recall and f1-score, with values of 58.66% and 63.38%, respectively.

Fig. 18
figure 18

Comparison of the proposed model's precision, recall, and f1-score and other state-of-the-art models on the WET dataset

The results obtained with the proposed "BOASWIN + XGBoost" model have implications for both false positives and false negatives in classification tasks. These implications follow from the model's precision, recall, F1-score, and accuracy, which together reflect how effectively it handles concept drift.

First, consider the effect of these results on false positives (Type I errors). Precision is the ratio of true positives to the total number of predicted positives. When "BOASWIN + XGBoost" achieves higher precision than the other models, a larger share of the instances it labels as positive are truly positive, so it produces fewer false positives. This matters most when false positives have substantial consequences, such as in medical diagnosis or fraud detection. The model's superior precision suggests that it reduces the risk of flagging instances as positive when they are, in fact, negative.

On the other hand, the results also have a significant impact on false negatives (Type II Errors). Recall, another essential metric, quantifies the ratio of true positives to the total actual positives. When "BOASWIN + XGBoost" achieves higher recall, the model is proficient at capturing actual positive instances while reducing false negatives. In practical terms, the model is less likely to miss positive cases, leading to a lower rate of false negatives. This characteristic is especially critical in applications where missing positive instances can have severe consequences, such as in medical screenings or cybersecurity, where failing to detect diseases or security breaches can be detrimental.

Moreover, the F1-score, a metric that balances precision and recall, plays a pivotal role. A higher F1-score achieved by "BOASWIN + XGBoost" suggests an effective trade-off between reducing false positives and false negatives. This balance is crucial in real-world scenarios where both types of errors can have significant implications. The model's ability to maintain high precision and recall implies that it can adapt to concept drift without disproportionately increasing either false positives or false negatives, making it an invaluable choice for applications where balanced performance is paramount.
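The relationship between these three metrics can be made concrete with a short sketch. The function names and the confusion-matrix counts below are hypothetical and chosen only for illustration; they are not taken from the experiments. The second helper recomputes the F1-score from a reported precision and recall pair, which offers a quick consistency check on the figures quoted above (for example, the AGRAWAL_a values of 69.61% and 67.87%).

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # TP / predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # TP / actual positives
    return precision, recall, f1_from_pr(precision, recall)


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")

# Consistency check on the AGRAWAL_a values reported in this section (in %).
print(f"F1 from reported precision/recall: {f1_from_pr(69.61, 67.87):.2f}")
```

Because the F1-score is a harmonic mean, it sits closer to the lower of the two inputs, which is why a model with high precision but low recall (or the reverse) cannot reach a high F1-score.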

In conclusion, the performance results of "BOASWIN + XGBoost" on precision, recall, F1-score, and accuracy collectively indicate its capability to achieve a harmonious equilibrium between minimizing false positives and false negatives. This equilibrium is essential in diverse real-world settings where the consequences of classification errors can vary widely. The model's ability to maintain this balance while handling concept drift positions it as a robust and adaptable solution for applications demanding accurate and well-balanced classifications.

4.2.7 Analysis of the average time

As shown in Fig. 19 and Table 5, the suitability of the proposed method for real-time learning is evaluated by calculating the average prediction time per instance, since timeliness matters in spatiotemporal systems. OPA, SAM-KNN, SRP, ADWIN, DDM, and EDDM all have shorter prediction times than the proposed model, but their accuracy is substantially worse. Regarding the trade-off between accuracy and efficiency, the proposed method continues to outperform these methods in the presence of concept drift. The experimental findings demonstrate the effectiveness and reliability of the proposed BOASWIN + XGBoost model for spatiotemporal streaming data analytics.
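One way to obtain an average prediction time per instance is sketched below. It is a minimal illustration under stated assumptions, not the measurement procedure used in the experiments: `average_prediction_time` and `toy_predict` are hypothetical names, the classifier is a stand-in threshold rule, and only wall-clock time spent inside the prediction call is counted.

```python
import time
from typing import Callable, Iterable, Sequence


def average_prediction_time(
    predict: Callable[[Sequence[float]], int],
    stream: Iterable[Sequence[float]],
) -> float:
    """Mean wall-clock seconds spent in predict() per instance of the stream."""
    total, count = 0.0, 0
    for x in stream:
        start = time.perf_counter()
        predict(x)                       # time only the prediction call
        total += time.perf_counter() - start
        count += 1
    return total / count if count else 0.0


def toy_predict(x: Sequence[float]) -> int:
    """Hypothetical stand-in for a trained model: a simple threshold rule."""
    return int(sum(x) > 1.0)


# Hypothetical three-instance stream with two features per instance.
stream = [(0.2, 0.5), (0.9, 0.4), (0.7, 0.7)]
avg = average_prediction_time(toy_predict, stream)
print(f"average prediction time: {avg:.2e} s per instance")
```

In a drift-adaptive setting, the time spent on detection and retraining could be measured separately in the same loop, which would make the accuracy and efficiency trade-off easier to attribute.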

Fig. 19
figure 19

Comparison of the average execution time of all the models used on the seven experimented datasets

In the AGR_a and AGR_g datasets, most models exhibit relatively short processing times, ranging from 0.36 to 65.43 min. While OPA operates swiftly, SRP, DDM, and EDDM require the most extensive processing durations. However, BOASWIN-XGBoost stands out for its relatively longer processing times in these datasets, with 40.03 and 65.43 min, respectively.

Across the SEA_a and SEA_g datasets, OPA, SAM-KNN, and ADWIN consistently demonstrate minimal processing times within the 0.02–0.33 min range. Here, too, BOASWIN-XGBoost exhibits longer processing times, which can be attributed to its intricate algorithm and comprehensive approach to handling concept drift.

However, XGBoost and BOASWIN-XGBoost stand out for their relatively longer processing times, especially in the case of SEA_a, where XGBoost's duration is notably higher. OPA is the quickest in the HYPERPLANE dataset, taking only 0.07 min. In contrast, BOASWIN-XGBoost significantly extends the processing time to 25.73 min, suggesting it may not be ideal for real-time applications within this dataset. This extended processing time for BOASWIN-XGBoost can be attributed to the complexity of its algorithm, which likely involves advanced techniques to maintain high predictive accuracy in the face of concept drift.

In the PHISHING dataset, OPA boasts the fastest processing time at 0.007 min, while both XGBoost and BOASWIN-XGBoost require more extensive processing periods, with XGBoost notably exceeding the duration of OPA. Again, this longer processing time for BOASWIN-XGBoost reflects its thorough approach to concept drift handling.

Lastly, within the WEATHER dataset, OPA maintains its reputation as the swiftest, with a mere 0.004 min. BOASWIN-XGBoost necessitates more processing time but remains within reasonable limits at 2.32 min. The extended processing time for BOASWIN-XGBoost in various datasets can be attributed to its algorithm's complexity and the thoroughness with which it tackles concept drift, resulting in higher predictive accuracy but longer processing durations.

BOASWIN-XGBoost compares well with the other drift adaption techniques for several reasons. First and foremost, average time consumption alone does not determine the effectiveness of a drift adaption technique. Instead, effectiveness depends on balancing time efficiency against high predictive accuracy in the face of concept drift. BOASWIN-XGBoost appears to strike this balance effectively, as it consistently achieves competitive or even superior performance compared to other techniques.

One key factor contributing to the strong performance of BOASWIN-XGBoost is its adaptability. Concept drift, which occurs when the underlying data distribution changes over time, is a common challenge in many machine learning applications. BOASWIN-XGBoost possesses a robust mechanism for detecting and adapting to these changes efficiently, reflected in its high accuracy, precision, recall, and F1-score across diverse datasets.

In conclusion, the BOASWIN + XGBoost model's suitability in real-time or resource-constrained scenarios depends on the task's context and requirements. While it can be computationally expensive, its accuracy benefits should be balanced against available resources and decision urgency. Careful model selection, deployment optimizations, and hardware choices can make it viable in various applications.
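The balance between accuracy and cost can be illustrated with simple arithmetic on figures already quoted in this section for the HYP dataset (accuracy of 81.95% for OPA and 84.26% for BOASWIN + XGBoost; processing times of 0.07 and 25.73 min). The helper `tradeoff` is a hypothetical name, and the sketch only restates those numbers as an accuracy gain and a time ratio; it is not an additional experimental result.

```python
# Figures quoted in this section for the HYP dataset.
accuracy = {"OPA": 81.95, "BOASWIN + XGBoost": 84.26}   # accuracy, %
minutes = {"OPA": 0.07, "BOASWIN + XGBoost": 25.73}     # processing time, min


def tradeoff(baseline: str, candidate: str) -> tuple[float, float]:
    """Return (accuracy gain in percentage points, processing-time ratio)."""
    gain = round(accuracy[candidate] - accuracy[baseline], 2)
    ratio = minutes[candidate] / minutes[baseline]
    return gain, ratio


gain, ratio = tradeoff("OPA", "BOASWIN + XGBoost")
print(f"accuracy gain: {gain:.2f} points, time ratio: {ratio:.0f}x")
```

On these figures the proposed model gains 2.31 percentage points of accuracy over OPA at roughly 368 times the processing time, which is the kind of comparison a practitioner would weigh against the available resources and the urgency of the decision.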

5 Conclusion

In this research, we have addressed the handling of concept drift in non-stationary spatiotemporal data streams. This challenge has grown in significance with the proliferation of data-rich environments. Our novel approach, BOASWIN, which combines an adaptive XGBoost-based model with the BO-TPE hyperparameter optimization strategy, has emerged as a potent tool for spatiotemporal data analytics. Our experiments involved seven diverse datasets (AGR_a, AGR_g, SEA_a, SEA_g, HYP, PHI, and WET). A key finding is that the classification accuracy of our model is consistently higher than that of a range of state-of-the-art drift adaptation techniques. On dataset AGR_a, BOASWIN + XGBoost achieved an accuracy of 70.83%.

Similarly, on dataset AGR_g, the model achieved an accuracy of 71.02%. This trend of outperforming other techniques was maintained across datasets SEA_a (76.76%), SEA_g (76.96%), HYP (84.26%), PHI (95.53%), and WET (78.35%). These results underscore the model's effectiveness in maintaining high classification accuracy across all datasets examined. The adaptability of BOASWIN + XGBoost, which enables it to respond autonomously to evolving data patterns, emerges as a critical asset in this research. Not only does it enhance classification accuracy, but it also ensures that models remain pertinent in scenarios where data distributions undergo continuous and unpredictable changes. This adaptability is a testament to the model's practicality in real-world applications where dynamic data streams are the norm. The implications of our research extend beyond academia and are relevant to a wide array of practical applications. Fields such as environmental monitoring, urban planning, and disaster management stand to gain from the availability of reliable and adaptive classification models. Our work represents a significant step in ensuring these domains can make informed decisions despite rapidly changing spatiotemporal data. As we chart our course into the future of spatiotemporal data analytics, we anticipate that our findings and limitations, as presented in section one, will catalyze the development of more resilient and effective solutions for handling concept drift within spatiotemporal data streams, thereby benefiting a multitude of applications and domains.