A two ensemble system to handle concept drifting data streams: recurring dynamic weighted majority

Sidhu, Parneeta; Bhatia, M. P. S.

doi:10.1007/s13042-017-0738-9

Original Article
Published: 02 November 2017

Volume 10, pages 563–578, (2019)

International Journal of Machine Learning and Cybernetics

Parneeta Sidhu¹ &
M. P. S. Bhatia¹

Abstract

We present an ensemble system, recurring dynamic weighted majority (RDWM), that maintains two ensembles of experts so as to accurately handle drifting concepts, mainly recurrent drifts. The primary online ensemble represents the present concepts, and the secondary ensemble represents the old concepts seen since the beginning of learning. An effective pruning methodology removes redundant and old classifiers, which might otherwise interfere with learning the new concepts. Experimental evaluation on several datasets shows that RDWM achieves very high generalization accuracy, irrespective of the speed or severity of drift or the presence of noise in the dataset.

1 Introduction

Mining large streams of data is an emerging area of research in the machine learning community. Data stream mining is the process of understanding the underlying concepts in data and analyzing drifts [3, 6, 32], so as to accurately classify new instances. A drift can be sudden, gradual, recurring, or incremental. A sudden change is observed when the concept changes from one class to another within a single time step. A gradual change occurs when the new concept emerges gradually over time. A change is said to be recurrent if an old concept reappears after some time. The drift is incremental if any two consecutive concepts are almost similar and the drift is felt only after a longer time period. Further, a drift can also be measured by its severity and speed. Severity represents the amount of change caused by a new concept, and speed is the inverse of the total time taken for a new concept to completely replace the old concept. Applications in which drifts have been observed include market-basket analysis [12], computer security and medical diagnosis.

Online approaches [1, 4, 6, 12, 16, 18, 26, 37] process each instance only “once” on arrival, without storing it for further processing. These can be categorized as approaches that explicitly use a mechanism to handle drifts [1, 6, 18] and approaches that do not explicitly use a mechanism for drift detection [4, 12]. An online approach may be a single classifier, a single ensemble, or an active classifier combined with a set of weighted classifier systems. None of the existing systems maintains more than one ensemble in its model. It has been shown that an ensemble of classifiers [5, 7, 11, 35, 38] provides higher generalization accuracy [3, 30, 36] than a single classifier system. Hence, we propose the Recurring Dynamic Weighted Majority (RDWM) system, which maintains two ensembles, a primary online ensemble and a secondary ensemble, for more accurate handling of drifting concepts, mainly recurrent drifts. The ensembles differ in the type of concept they represent and may perform differently depending on the speed or severity of drift. The primary ensemble represents the present concepts and is trained and updated as in Dynamic Weighted Majority (DWM) [12]; the secondary ensemble consists of the most accurate experts, copied from the primary ensemble at times of drift. Experimental evaluation using various datasets shows that for drifts with high speed or low speed (independent of severity), the primary ensemble provides better accuracy than the secondary ensemble. For recurrent drifts, the secondary ensemble provides better accuracy than the primary ensemble. Hence, RDWM performs better than, or at least similarly to, the existing systems for drift detection.

2 Research questions and paper organization

The paper aims at answering the following questions:

1. Why do we need to maintain two ensembles in RDWM? How does it ensure improved system performance while handling drifts?
2. Does our system always provide better accuracy as compared to the existing systems for drift detection?
3. How does severity impact the performance of RDWM in terms of prequential accuracy, kappa statistic, model cost, time and memory?
4. What is the impact of a change in the base classifier on the performance of RDWM?
5. How does the presence of noise impact our system's performance?

The answer to the first question is that RDWM needs to maintain a primary online ensemble and a secondary ensemble so as to achieve the best generalization accuracy while handling drifts. If the drift has high speed and low severity, the new concept is not the same but quite similar to the recent old concept. Thus, the updated primary ensemble has a very high possibility of providing good accuracy. If the drift has high speed and high severity, it results in big changes very suddenly. Hence, the re-initialized primary ensemble may provide good accuracy. For low speed drift (independent of severity), the new concept gradually replaces the old concept. Hence, immediately after the beginning of drift the new concept would be quite similar to the old concept. Thus, the primary ensemble has a very high possibility of achieving better accuracy. However, longer after the drift when the new concept would be quite different from the old concept, the re-initialized primary ensemble may provide better accuracy levels. For recurrent drifts, the secondary ensemble maintaining the old, most accurate experts provides better classification accuracy than the primary ensemble.

To answer the second question, we evaluated RDWM using datasets that vary in the speed of drift: Stagger concepts [26] (sudden drift), the moving hyperplane dataset [9] (gradual drift), real-time datasets such as the electricity pricing dataset [8], the power supply stream [25] and the KDD CUP 1999 dataset [34], and static datasets, e.g. the breast cancer dataset [31]. The analysis shows that, while handling sudden drift, RDWM achieved the best accuracy among the single-ensemble DWM [12], the single-classifier EDDM [1], naïve Bayes (NB) [20] and Hoeffding Tree (HT) [30]. Further, RDWM performs slightly better than DWM [12] and EDDM while handling gradual drifts. Experimental evaluation on the hyperplane dataset shows that RDWM performs best when severity is high, and worse than or similarly to the other approaches when severity is low. High severity of drift results in big changes in concept. The re-initialized primary ensemble helps RDWM achieve better accuracy than the updated DWM ensemble, the rebuilt single classifier in EDDM, and the standard implementation of NB with no drift-handling capabilities. When severity is low, the new concept is quite similar to the old concept. Our system's prediction is then the global prediction of its updated primary ensemble, which performs almost the same as the updated ensemble in DWM. Evaluation on various real-time drifting datasets shows that RDWM performs better than DWM, EDDM, ADWIN [23], DDM [6], PL [22], NB and HT. For static datasets, our system performs almost the same as NB, but slightly better than DWM and EDDM.

To answer the third question, we evaluated our system using the hyperplane dataset with varying severity levels. We observed that as the severity increases, RDWM's accuracy drops and the homogeneity among its experts is reduced. However, variation in severity does not affect the performance of RDWM in terms of the other metrics.

To answer the fourth question, we evaluated RDWM using NB and HT as its base classifiers. NB treats all the attributes as independent, whereas HT assumes feature dependence. RDWM with NB as the base classifier (RDWM-NB) performed better than RDWM with HT as its base classifier (RDWM-HT). HT on its own achieves accuracy better than or similar to that of NB. Hence, we can state that the better accuracy of RDWM-NB compared with RDWM-HT is due to the methodology inherent in RDWM and not to the choice of base classifier.

To answer the fifth question, we evaluated RDWM in a noisy domain. For datasets with gradual drifts and noise, RDWM shows high sensitivity to noise, resulting in reduced prequential accuracy and kappa statistic. However, noise does not impact our system's performance in terms of memory, evaluation time and model cost.

The rest of the paper is organized as follows. In Sect. 3, we give an overview of the existing approaches for drift detection. Section 4 describes our proposed system in detail. In Sect. 5, we describe the datasets and perform a detailed evaluation of RDWM. Section 5 also discusses the statistical analysis of the experimental results. Finally, we summarize the paper and discuss the scope for future research.

3 Related work

3.1 Online approaches for handling drifting concepts

Weighted majority (WM) [13] assumes that not all features are necessary for making a prediction. Drift detection method (DDM) [6] detects drift by monitoring the online error rate, whereas early drift detection method (EDDM) [1] monitors the distance between prediction errors. Adaptive windowing (ADWIN) [23] uses sliding windows of variable size. Paired learner (PL) [22] maintains a stable learner that predicts based on its learning since the last replacement and a reactive learner that predicts based on its most recent experience.

Adaptive classifier ensemble (ACE) [20] uses an online classifier, a set of batch classifiers and a drift detection mechanism for handling recurrent drifts. An enhanced version of ACE [18] uses a pruning strategy to remove old, redundant classifiers. DWM [12] dynamically creates new experts and removes an expert if its weight reaches a threshold value. In diversified dynamic weighted majority (DDWM) [30], the classification result is the class with the maximum support, considering both the low-diversity and the high-diversity ensembles. L-GEM [7] is a dynamic fusion method that estimates the local competence of base classifiers in multiple classifier systems. Pool and accuracy based stream classification (PASC) [28] maintains a pool of classifiers to track recurring concepts. A novel Just-In-Time (JIT) classifier [27] deals with recurrent drifts by means of a practical formalization of the concept representation and the definition of a set of operators working on such representations. A context-aware data stream learning system [15] uses available context information to improve existing ensemble approaches for handling recurrent concepts.

3.2 Performance metrics

Prequential accuracy (%): the average accuracy calculated online by classifying every instance before it is used for learning. For evaluating RDWM, we have used a sliding window (w) as the forgetting mechanism [29].
Kappa statistic (%): a score of homogeneity among the experts [30].
Model cost (RAM-Hours): one RAM-Hour is equivalent to one GB of RAM being deployed for one hour.
Time (CPU-seconds): the total runtime, which involves training and testing of the experts.
Memory (bytes): the total memory used to store the running statistics and the online model.
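To make the test-then-train idea behind prequential accuracy concrete, the following is a minimal Python sketch of a sliding-window version of the metric. It is an illustration only, not the MOA implementation used in the experiments; the class name, the function names and the `predict`/`learn` callables are hypothetical.

```python
from collections import deque


class WindowedPrequentialAccuracy:
    """Accuracy (in %) over the most recent `window` test-then-train steps."""

    def __init__(self, window=1000):
        # deque with maxlen drops the oldest outcome automatically (forgetting).
        self.hits = deque(maxlen=window)

    def update(self, y_true, y_pred):
        self.hits.append(1 if y_true == y_pred else 0)

    def value(self):
        # 0.0 before any instance has been seen.
        return 100.0 * sum(self.hits) / len(self.hits) if self.hits else 0.0


def prequential_run(stream, predict, learn, window=1000):
    """Classify each instance before learning from it; return the accuracy trace."""
    metric = WindowedPrequentialAccuracy(window)
    trace = []
    for x, y in stream:
        metric.update(y, predict(x))  # test first ...
        learn(x, y)                   # ... then train on the same instance
        trace.append(metric.value())
    return trace
```

With a window of 2 and a learner that always predicts class 1, the stream of labels 1, 1, 0, 0 gives the trace 100, 100, 50, 0: older outcomes fall out of the window as new ones arrive.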

4 Recurring dynamic weighted majority (RDWM) approach

RDWM maintains two ensembles: a primary online ensemble (EO) and a secondary ensemble (EB). An expert maintains an accuracy weight, a pruning weight and an accuracy value. The accuracy weight is used for class prediction and the pruning weight helps in determining the pruning order. The accuracy value measures the accuracy of the expert for the most recent W (window size) instances. The primary ensemble in RDWM is updated or pruned as in DWM [12]. The secondary ensemble is neither updated nor trained but only copies the best expert from the primary ensemble.

For every new instance arriving in the data stream, Algorithm 1 (Global Prediction) gives the global prediction by each ensemble. Algorithm 2 (Final Prediction) is used to predict the final class prediction. Algorithm 3 (Drift Handling) updates the system upon drift detection. Algorithm 4 (Recurring Dynamic Weighted Majority) outlines the main procedure followed by our approach.

4.1 Algorithm 1: global prediction

The ensemble EO maintains online experts, each with an initial accuracy weight of one and a pruning weight and accuracy value of zero. When the local prediction (LO) is incorrect (lines 4–5), the accuracy weight of an expert in EO is reduced by a multiplicative constant (β, 0 ≤ β < 1) [12]. However, when the local prediction is correct, the accuracy value is increased by one (lines 7–8, 21–22) at each time step. After every W instances, the accuracy value of each expert is reset to zero, so that the experts can be compared in terms of their accuracy on the most recent W instances (lines 9, 23–24).

The pruning weights of all the experts in both EO and EB are reduced by one at each time step (lines 16 and 28), except for the expert having the highest accuracy value for the most recent W instances.

Further, if the pruning weight of this expert is less than zero, we set it to zero (line 15); otherwise we increase it by one (line 13) at each time step. An expert in EO is removed if its pruning weight reaches the threshold value θ (line 33).
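The pruning-weight rule can be sketched in a few lines of Python. This is a hedged reading of the description, not the authors' code: the function names are hypothetical, ties for the best expert are broken by position, and "reaches the threshold θ" is assumed to mean that the pruning weight has fallen to θ or below.

```python
def update_pruning_weights(pruning_weights, accuracy_values):
    """One time step of the pruning-weight update for a non-empty ensemble."""
    # The expert with the highest accuracy value on the recent window is spared.
    best = max(range(len(accuracy_values)), key=accuracy_values.__getitem__)
    updated = []
    for j, pw in enumerate(pruning_weights):
        if j != best:
            updated.append(pw - 1)   # every other expert loses one
        elif pw < 0:
            updated.append(0)        # best expert: a negative weight is reset to zero
        else:
            updated.append(pw + 1)   # best expert: otherwise gains one
    return updated


def surviving_experts(pruning_weights, theta):
    """Indices of experts kept (assumption: removal when weight <= theta)."""
    return [j for j, pw in enumerate(pruning_weights) if pw > theta]
```

For example, with pruning weights `[0, -3, 2]` and accuracy values `[5, 9, 1]`, the second expert is the best one, so the weights become `[-1, 0, 1]`.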

The update of accuracy weights and removal of experts in EO is controlled by W (lines 4 and 31). The global prediction by each ensemble is the weighted majority vote of the experts’ predictions and is the class with the maximum support (line 30). After each update, the accuracy weight of all the experts is normalized so that after transformation the maximum value of weight is one (line 32). Algorithm 1 outputs the global class prediction by both the ensembles (line 35).
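As an illustration of the voting and weight-update steps described above, the sketch below implements a weighted majority vote, the multiplicative penalty β for wrong experts, and normalization so that the largest weight is one. The function names and the default β = 0.5 are assumptions made for this example; the paper only requires 0 ≤ β < 1.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return (winning class, its support) for one ensemble."""
    support = defaultdict(float)
    for label, w in zip(predictions, weights):
        support[label] += w          # each expert adds its accuracy weight
    winner = max(support, key=support.get)
    return winner, support[winner]


def penalise_and_normalise(predictions, weights, y_true, beta=0.5):
    """Scale the weight of each wrong expert by beta, then rescale so max == 1."""
    scaled = [w * beta if p != y_true else w for p, w in zip(predictions, weights)]
    top = max(scaled)
    return [w / top for w in scaled] if top > 0 else scaled
```

Returning the support together with the winning class is a design choice for this sketch: it lets the two ensembles' votes be compared later without recomputing them.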

4.2 Algorithm 2: final prediction

The final prediction (G) is the class with the maximum support, involving the weighted majority vote of the experts’ predictions from both EO and EB (lines 2–4). If the support for the class as predicted by EO is more than EB, the final prediction is the class as predicted by EO (line 2) else as predicted by EB (line 3). However, in the initial learning phase when EB is empty, the final prediction is the class as predicted by EO (line 5). For every new instance, the algorithm outputs the final class prediction G (line 7).
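The selection rule of Algorithm 2 is small enough to write down directly. In this sketch each ensemble's vote is assumed to be a `(class, support)` pair, and `None` stands for an empty EB; both conventions are illustrative choices rather than part of the paper.

```python
def final_prediction(eo_vote, eb_vote=None):
    """Pick the final class G from the two global predictions.

    eo_vote, eb_vote: (label, support) pairs; eb_vote is None while EB is empty.
    """
    if eb_vote is None:
        return eo_vote[0]            # initial learning phase: only EO exists
    # EO wins only with strictly greater support; otherwise EB's class is used.
    return eo_vote[0] if eo_vote[1] > eb_vote[1] else eb_vote[0]
```

Note that, following the text literally, a tie in support goes to EB.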

4.3 Algorithm 3: drift handling

After the first 2W instances have arrived in the data stream (line 1), the best expert having the highest accuracy on the most recent W instances is copied from EO into EB (lines 2–4). If the final class prediction G is incorrect (line 5), the following three cases arise:

Case I: drift is detected by both ensembles.

Case II: drift is detected by the primary ensemble only.

Case III: drift is detected by the secondary ensemble only.

For Case I or Case II, the best online expert from EO is copied into EB (line 10). EO is re-initialized so as to learn the next concept from scratch (line 11). However, if EB already contains m experts, the expert with the minimum pruning weight is first removed from EB (lines 7–8).

Similarly for Case III, the best online expert from EO is copied into EB (line 17). A new expert trained as per the new concept is added in EO (line 21). However, if EO is already full, we remove the expert with the minimum pruning weight (lines 18–20). The handling of drift by RDWM occurs only after the first 2W instances have arrived in the data stream (line 1).
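The three cases can be summarized in a short Python sketch. Several points are assumptions made for illustration and are not stated in the text: an ensemble is taken to have "detected drift" when its own global prediction is wrong, experts are plain dictionaries with `acc` (accuracy value) and `pw` (pruning weight) keys, re-initializing EO means replacing it with m fresh experts from a hypothetical `new_expert` factory, and the size limit m on EB is applied in Case III as well. The initial copy of the best expert once 2W instances have arrived is not shown.

```python
import copy


def handle_drift(eo, eb, g, y_true, eo_pred, eb_pred, new_expert, m, t, W):
    """Update EO and EB in place; return "I", "II", "III", or None if nothing is done."""
    if t <= 2 * W or g == y_true:
        return None                                   # too early, or final prediction correct
    eo_wrong = eo_pred != y_true
    eb_wrong = bool(eb) and eb_pred != y_true          # an empty EB cannot signal drift
    if not (eo_wrong or eb_wrong):
        return None
    best = max(eo, key=lambda e: e["acc"])             # most accurate online expert

    def copy_best_into_eb():
        if len(eb) >= m:                               # EB full: drop minimum pruning weight
            eb.pop(min(range(len(eb)), key=lambda i: eb[i]["pw"]))
        eb.append(copy.deepcopy(best))

    copy_best_into_eb()
    if eo_wrong:                                       # Case I or Case II
        eo[:] = [new_expert() for _ in range(m)]       # learn the next concept from scratch
        return "I" if eb_wrong else "II"
    if len(eo) >= m:                                   # Case III: make room in EO if full
        eo.pop(min(range(len(eo)), key=lambda i: eo[i]["pw"]))
    eo.append(new_expert())                            # add an expert for the new concept
    return "III"
```

Copying with `deepcopy` keeps the expert stored in EB independent of later training in EO, which matches the statement that the secondary ensemble is neither updated nor trained.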

4.4 Algorithm 4: recurring dynamic weighted majority

RDWM develops a primary ensemble (EO) using a modified version of online bagging [21], containing m experts, each with an accuracy weight (awo_jl) of one and a pruning weight (pwo_jl) and accuracy value (ao_jl) of zero (lines 1–4). The input to the system is n instances, each consisting of a feature vector and its corresponding class label (line 5).

For every new instance, we call Algorithm 1 to get the global prediction by each of the ensembles (line 6). The final class prediction for the instance is the class as predicted by Algorithm 2 (line 7). Algorithm 3 (line 8) is called when a drift is detected by our system. Training of the experts in EO is a continuous process happening at each time step and one could use any base learner considering the various parameters of the base learner (lines 9–11).

5 Experimental evaluation

5.1 Concept drifting data streams

5.1.1 Artificial datasets

5.1.1.1 Stagger concepts

A Stagger concept [26] has 3 features: shape ∈ {triangle, circle, rectangle}, size ∈ {small, medium, large} and color ∈ {blue, green, red}. It contains 240 instances, with a new instance at each time step. A learner is evaluated based on a pair of features only. It contains abrupt drifts and recurrent drifts.

5.1.1.2 Moving hyperplane dataset

The instances [9] are uniformly distributed in multi-dimensional space [0, 1]¹⁰ and are classified as positive if they satisfy the condition as in Eq. (1)

$$w_0 \leqslant \sum_{i=1}^{10} w_i x_i$$

(1)

For the various runs of the dataset, the weights {w_i} are initialized randomly in [−1, 1] and updated as w_i ← w_i + d s_i at each time step, where s_i ∈ {−1, 1} represents the direction of change and d represents the magnitude of change. At each time step, the threshold w₀ is calculated as given in Eq. (2).

$$w_0 = \frac{1}{2}\sum_{i=1}^{10} w_i$$

(2)

{s_i} is reset randomly after every 1000 instances. The dataset has a total of 3000 instances with gradual drifts and noise. For evaluating RDWM, the value of d was set to 0.001.
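The generator described by Eqs. (1) and (2) can be sketched as follows. This is an illustrative Python version, not the generator used in the experiments: the function name and the `seed` parameter are hypothetical, the noise level is not given in the text and is therefore exposed as a `noise` parameter (label-flip probability, 0 by default), and the choice to label an instance before the weights are updated is an assumption.

```python
import random


def moving_hyperplane(n=3000, dims=10, d=0.001, reset_every=1000, noise=0.0, seed=0):
    """Yield (x, label) pairs from a gradually drifting hyperplane concept."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(dims)]      # weights start in [-1, 1]
    s = [rng.choice((-1, 1)) for _ in range(dims)]         # direction of change per weight
    for t in range(n):
        if t > 0 and t % reset_every == 0:
            s = [rng.choice((-1, 1)) for _ in range(dims)]  # directions reset every 1000 steps
        x = [rng.random() for _ in range(dims)]            # uniform point in the unit cube
        w0 = 0.5 * sum(w)                                  # threshold, Eq. (2)
        label = 1 if w0 <= sum(wi * xi for wi, xi in zip(w, x)) else 0  # Eq. (1)
        if rng.random() < noise:                           # assumed noise model: label flip
            label = 1 - label
        yield x, label
        w = [wi + d * si for wi, si in zip(w, s)]          # gradual drift of magnitude d
```

Calling `list(moving_hyperplane())` produces 3000 instances with the settings listed above; passing a non-zero `noise` adds class noise under the stated assumption.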

5.1.2 Real-world datasets

As these represent a real-world phenomenon, we cannot predict the occurrence of drift.

5.1.2.1 Electricity pricing domain

The dataset [8] was obtained from the electricity supplier TransGrid, New South Wales, Australia. It contains 45,312 instances collected at 30-min intervals between 7 May 1996 and 5 December 1998. Each instance consists of five features and a class label of either up or down. The task is to predict whether the price of electricity goes up or down; the price is affected by demand and supply.

5.1.2.2 Power supply stream

The dataset [25] records the hourly power supply of an Italian electricity company, measuring the supply from the main grid and the power transformed from the other grids. It contains 3 years of power supply records, from 1995 to 1998, with a total of 29,928 instances. Each instance has two attributes. The task is to predict the hour (1 out of 24) to which the current power supply belongs.

5.1.2.3 KDD cup 1999 dataset

KDD Cup 1999 [34] is a network intrusion detection dataset. It consists of a large variety of intrusions simulated in a military network environment. The data set contains 494,020 instances, with each instance maintaining 41 attributes. The target class identifies whether the connection is an attack or a normal connection. For evaluating RDWM, only 15% of the dataset i.e. 74,103 instances were used.

5.1.2.4 Breast cancer dataset

This static dataset was obtained from the UCI repository [31]. It classifies an instance as either a recurrence-event or a no-recurrence-event. The dataset was provided by the Oncology Institute and contains a total of 286 instances. Each instance has nine attributes, each with either a linear or a nominal value.

5.2 Experimental objectives, design and measures analyzed

The main objective is to study the behavior of RDWM in different situations, answering the research questions discussed in Sect. 2. Experiments were done using the Massive Online Analysis (MOA) [2] tool, run on a Unix/Linux system with the Java 6 SDK installed. To run MOA, two jar files were needed: moa.jar and sizeofag.jar. RDWM was evaluated using various datasets, to study the change in its behavior with variations in the speed or severity of drift or the presence of noise. RDWM has been compared with the existing approaches both numerically and empirically, measuring its average performance over 50 runs of each dataset. The results were completely in favor of RDWM.

Table 1 lists the parametric values used by each learning system. To have a fair comparison, the number of experts in each system must be the same. So, we used EDDM, DDM, ADWIN and PL along with OzaBag [14]. The ensemble size in DWM was set to double the size (m) in RDWM. The examples in the real-time datasets were processed in the same temporal order as they appear in the dataset, with one example at each time step. The base learners used were NB (which assumes feature independence) and HT (which assumes a dependent feature set). In MOA, the width of the sliding window (w) was set to 1000.

Table 1 Parametric values used by the learning systems
