
1 Introduction

Due to recent advances in computer networks and data storage, data are produced and accumulated at an ever-increasing rate in the form of streams, such as online shopping records, logistics information, online news, stock market data, emails, and credit card transactions. These data arrive in real time, continuously and in order, and need to be analyzed promptly and effectively. For example, in an online mail system, incoming emails need to be classified into different categories, such as spam, business emails, personal emails, and important emails. This classification task, in which each stream example is associated with a single class label l from a set of labels \(L\left( |L|>1\right) \), is called single-label data stream classification and has been extensively studied [3, 4, 11, 20, 33] in the literature.

In many emerging applications, each stream record may carry multiple class labels. A good example is news reports in online news systems: most news reports cover multiple topics (e.g., entertainment, finance and politics). This setting is called Multi-Label data Stream Classification (MLSC) [21]. Formally, the multi-label stream classification problem is to train a model that attaches a label subset \(Y\subseteq L\) to each instance in a high-speed data stream. Although multi-label classification has been studied in traditional database mining scenarios, multi-label data stream classification is a relatively new concept and has not been fully addressed yet.

An intuitive approach to solving the multi-label classification problem is to transform it into one or more single-label classification problems. In this fashion, traditional single-label classifiers can be employed to make single-label predictions, and the multi-label prediction is then produced by combining these single-label predictions. Alternatively, the second category of methods adapts existing single-label classifiers directly to the multi-label setting [18, 23].

The multi-label data stream environment poses challenges different from the traditional batch learning setting. As the instances in a multi-label data stream carry multiple labels (multiple concepts), dealing with concept drift is the most important challenge for a classifier. Another challenge of MLSC is that an arriving example may belong to a set of labels some of which have not been observed before, because of the dynamic nature of the label set [21]. Besides, the learner must be able to handle the stream in real time using limited memory, because stream data flood in continuously at high speed, which makes it impossible to store them in memory and process them offline [14].

To address the above issues, in this paper we propose an efficient and effective ML-KNN-based ensemble model for multi-label stream classification with a balanced AdjustWeight function, called Streaming Weighted ML-KNN-based Ensemble Classifier (SWMEC). More specifically, we first propose an ML-KNN-based algorithm to build the basic classifier \(C_{i}\). The ensemble \(C=<C_{1},C_{2},\cdots ,C_{L}>\) is built at the beginning of the stream from L randomly selected data chunks. Building a classifier only requires computing and saving a small amount of information about the cluster center points, so the building process is highly efficient while consuming constant memory space. In addition, each classifier \(C_{i}\) has its own weight \(w_{i}\), giving \(C=<\left( C_{1},w_{1}\right) ,\left( C_{2},w_{2}\right) ,\cdots ,\left( C_{L},w_{L}\right)>\). As data flow in, the weights are adjusted and the ensemble C is updated according to the weights. Thus the proposed SWMEC approach can adapt to evolving data, deal with concept drifts, and efficiently classify the incoming data in real time.

The main contributions of this paper are as follows: (1) an adaptation of existing multi-label methods to evolving data streams; (2) an effective weight-adjustment strategy for ensemble classifiers; (3) experimental results comparing our method with benchmarks in terms of predictive performance and space complexity.

The remainder of this paper is organized as follows. In Sect. 2, we discuss related work on multi-label classification and stream classification. Section 3 presents the preliminaries of the problem. Section 4 describes the proposed framework in detail, and the experimental results are presented in Sect. 5. Finally, Sect. 6 concludes the paper.

2 Related Work

Our work is related to multi-label classification and stream classification techniques. We will briefly review the existing work on both of them.

Multi-label classification deals with instances that may belong to multiple different classes simultaneously and focuses on offline settings [16, 19]. Multi-label classification methods can be grouped into two categories, namely problem transformation and algorithm adaptation [21]. Problem transformation methods transform the multi-label problem into multiple single-label problems; representatives include Label Power-set (LP), Binary Relevance (BR) and Ranking by Pairwise Comparison (RPC) [12]. Algorithm adaptation methods extend traditional learning techniques to the multi-label context, such as decision trees [5], neural networks [30], maximal margin methods [10], maximum entropy methods [25], and ensemble methods [25]. One well-known approach of this kind is ML-KNN [31], which is derived from the popular lazy learning algorithm kNN. It is the approach most relevant to our model and will be introduced in the next section.

Many studies have also been done on single-label stream classification. There are two families of solutions: single-model based and ensemble based. Single-model based approaches [1, 2, 6, 9, 26, 27, 32, 33] use new data to incrementally update their model so that the model can scale to a large data volume. Ensemble based approaches [13, 22, 28], on the other hand, partition the data stream into equal-sized chunks and train multiple base models on different chunks. All the models are then combined for prediction. Ensemble based approaches are easier to scale and parallelize, tend to achieve better accuracy, and are less prone to overfitting than single-classifier methods.

Recently, there have also been studies focusing on multi-label stream classification [14, 15, 17, 18, 21, 29]. Kong et al. [14] build an ensemble of classifiers on successive data chunks and propose a random-tree-based algorithm to improve its efficiency. Further work has adopted the ensemble based strategy for handling multi-label streams [15, 29].

We follow a similar strategy to design our classification model with ML-KNN-based ensemble methods. Our model builds streaming classifiers by extending ML-KNN, which was originally designed for static multi-label data classification.

3 Preliminaries

We first introduce the notation that will be used throughout this paper, and then briefly describe the ML-KNN technique [31] to make this paper self-contained.

Consider a multi-label stream S with a label set \(\mathcal {L}=\left\{ 1,2,\cdots ,q\right\} \), \(S\subseteq \mathbb {R}^{d}\). Stream S consists of an infinite sequence of data chunks \(\left\{ D_{1},\cdots ,D_{n},\cdots \right\} \), where a labeled chunk is denoted by \(D_{L}\) and an unlabeled chunk by \(D_{U}\). Each instance \(x\in S\) has a label subset represented by a vector \(Y=\left\{ y_{1},y_{2},\cdots ,y_{q}\right\} \in \left\{ -1,1\right\} ^{q}\), where \(Y[j]=1\) if the instance carries label j and \(Y[j]=-1\) otherwise, and \(\left( x_{i},Y_{i}\right) \) is an instance in the multi-label data stream. The task of ensemble-based multi-label stream classification is to train a classification model \(F\left( \cdot \right) \) on the historical examples, with \(F\left( \cdot \right) =g\left( f_{1}\left( \cdot \right) ,f_{2}\left( \cdot \right) ,\cdots ,f_{L}\left( \cdot \right) \right) \), where \(f_{i}(\cdot )\) is a sub-classifier, \(g\left( \cdot \right) \) is the combination function that combines the outputs of all \(f_{i}\left( \cdot \right) \), and L is the number of sub-classifiers. \(F\left( \cdot \right) \) is then used to predict a label set \(Y_{i}\) for each incoming instance \(x_{i}\).

Table 1. Summary of major mathematical notations

3.1 ML-KNN [31]

We briefly describe our model's basic approach, ML-KNN, which is derived from the traditional k-Nearest Neighbor (kNN) algorithm and classifies static multi-label data in a lazy learning way. The approach has three main steps. For convenience, several notations are summarized in Table 1. In addition, we are given a training set \(T=\left\{ \left( x_{1},Y_{1}\right) ,\left( x_{2},Y_{2}\right) ,\cdots ,\left( x_{n},Y_{n}\right) \right\} \left( x_{i}\in \mathbb {R}^{d},Y_{i}\subseteq \mathcal {L}\right) \), a test instance t, and a smoothing parameter s with a default value of 1.

Step 1: Compute the prior probabilities \(P\left( H_{b}^{l}\right) \) according to Eqs. 1 and 2, where \(l\in \mathcal{L}\) is the l-th label and \(b\in \left\{ 0,1\right\} \). \(H_{1}^{l}\) represents the event that the instance has label l; conversely, \(H_{0}^{l}\) means the instance does not have label l.

$$\begin{aligned} P\left( H_{1}^{l}\right) =\left( s+{\displaystyle \sum _{i=1}^{n}L_{t}\left( l\right) }\right) /\left( s\times 2+n\right) \end{aligned}$$
(1)
$$\begin{aligned} P\left( H_{0}^{l}\right) =1-P\left( H_{1}^{l}\right) \end{aligned}$$
(2)
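To make Step 1 concrete, the prior estimation can be sketched in a few lines of Python (a minimal sketch assuming numpy and a binary label matrix; the function name and data layout are our own, not the paper's implementation):

import numpy as np

def mlknn_priors(Y, s=1.0):
    # Y: (n, q) binary matrix, Y[i, l] = 1 iff training instance i carries label l
    n, q = Y.shape
    p_h1 = (s + Y.sum(axis=0)) / (s * 2 + n)   # Eq. (1): one prior per label
    p_h0 = 1.0 - p_h1                          # Eq. (2)
    return p_h1, p_h0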

Step 2: Compute the posterior probabilities \(P\left( E_{j}^{l}\mid H_{b}^{l}\right) \) according to Eqs. 3 and 4, where \(E_{j}^{l}\left( j\in \left\{ 0,1,\cdots ,k\right\} \right) \) denotes the event that, among the k nearest neighbors of t, exactly j instances have label l. c[j] counts the number of training instances with label l whose k nearest neighbors contain exactly j instances with label l; correspondingly, \(c^{'}[j]\) counts the number of training instances without label l whose k nearest neighbors contain exactly j instances with label l.

$$\begin{aligned} P\left( E_{j}^{l}\mid H_{1}^{l}\right) =\left( s+c\left[ j\right] \right) /\left( s\times \left( k+1\right) +\sum _{p=0}^{k}c\left[ p\right] \right) \end{aligned}$$
(3)
$$\begin{aligned} P\left( E_{j}^{l}\mid H_{0}^{l}\right) =\left( s+c^{'}\left[ j\right] \right) /\left( s\times \left( k+1\right) +\sum _{p=0}^{k}c^{'}\left[ p\right] \right) \end{aligned}$$
(4)
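Step 2 can be sketched analogously (a minimal sketch assuming Euclidean distance, the same binary label matrix Y as above, and a numpy feature matrix X; helper names are illustrative):

import numpy as np

def mlknn_posteriors(X, Y, k=4, s=1.0):
    # X: (n, d) features; Y: (n, q) binary label matrix
    n, q = Y.shape
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                 # an instance is not its own neighbor
    knn = np.argsort(dist, axis=1)[:, :k]          # indices of the k nearest neighbors
    c = np.zeros((q, k + 1))                       # c[l, j]
    c_prime = np.zeros((q, k + 1))                 # c'[l, j]
    for i in range(n):
        delta = Y[knn[i]].sum(axis=0)              # how many neighbors of i carry each label
        for l in range(q):
            j = int(delta[l])
            if Y[i, l] == 1:
                c[l, j] += 1
            else:
                c_prime[l, j] += 1
    p_e_h1 = (s + c) / (s * (k + 1) + c.sum(axis=1, keepdims=True))              # Eq. (3)
    p_e_h0 = (s + c_prime) / (s * (k + 1) + c_prime.sum(axis=1, keepdims=True))  # Eq. (4)
    return p_e_h1, p_e_h0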

Step 3: Compute the output \(y_{t}\) (label subset) and \(r_{t}\) of t according to Eqs. 5 and 6, where \(r_{t}\) is a real-valued vector used to rank the labels in \(\mathcal {L}\) and \(C_{t}\left( l\right) \) denotes the number of t's k nearest neighbors that carry label l.

$$\begin{aligned} \overset{\rightarrow }{y_{t}}\left( l\right) =\arg \max _{b\in \left\{ 0,1\right\} }P\left( H_{b}^{l}\right) P\left( E_{C_{t}\left( l\right) }^{l}\mid H_{b}^{l}\right) \end{aligned}$$
(5)
$$\begin{aligned} \overset{\rightarrow }{r_{t}}\left( l\right) =P\left( H_{1}^{l}\mid E_{C_{t}\left( l\right) }^{l}\right) =\left( P\left( H_{1}^{l}\right) P\left( E_{C_{t}\left( l\right) }^{l}\mid H_{1}^{l}\right) \right) /P\left( E_{C_{t}\left( l\right) }^{l}\right) \end{aligned}$$
(6)

The detailed architecture of ML-KNN was given in [31].
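Step 3 then combines the estimated statistics for a test instance; a minimal sketch reusing the two helpers above (the function name and the use of Euclidean distance are our assumptions):

import numpy as np

def mlknn_predict(x_t, X, Y, p_h1, p_h0, p_e_h1, p_e_h0, k=4):
    # count, per label, how many of the k nearest training neighbors of x_t carry it
    knn = np.argsort(np.linalg.norm(X - x_t, axis=1))[:k]
    c_t = Y[knn].sum(axis=0).astype(int)           # C_t(l) for each label l
    q = Y.shape[1]
    y_t = np.empty(q, dtype=int)
    r_t = np.empty(q)
    for l in range(q):
        j = c_t[l]
        score1 = p_h1[l] * p_e_h1[l, j]
        score0 = p_h0[l] * p_e_h0[l, j]
        y_t[l] = 1 if score1 > score0 else -1      # Eq. (5)
        r_t[l] = score1 / (score1 + score0)        # Eq. (6), expanding P(E) by total probability
    return y_t, r_t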

4 Weighted Ensemble Classification

In this section we first present the main idea of our weighted ensemble classification approach, then introduce how the ensemble classifiers are trained and updated and how their weights are adjusted, and finally describe our algorithm.

4.1 Basic Idea

The data stream S is divided into equal-sized chunks and each classification model in the ensemble is trained from a different chunk. Each classifier in the ensemble has its own weight. Every newly arriving unlabeled data chunk is classified by the ensemble, and the weights are updated accordingly. Based on the weights, we decide whether the most recently classified chunk should be used to train a new model that replaces one of the existing models in the ensemble. In this way the ensemble is maintained at a fixed size and kept up to date, so the infinite length of the data stream and concept drift can both be addressed.

4.2 Classifier Training and Updating

The ML-KNN based multi-label ensemble classifier is built at the beginning of the data stream and updated continuously as the stream evolves, as follows. When a training data chunk arrives at time t, we build h clusters from its labeled data points by applying the X-Means technique. After building these clusters, we save each cluster's centroid \(O=\{o_{1},o_{2},\cdots ,o_{h}\},o_{i}\in \mathbb {R}^{d},i\in \left\{ 1,2,\cdots ,h\right\} \), and compute each centroid's label subset in the same way as ML-KNN. All centroids with their label subsets, \(C_{i}=<\left( o_{1},y_{1},r_{1}\right) ,\left( o_{2},y_{2},r_{2}\right) ,\cdots ,\left( o_{h},y_{h},r_{h}\right) >\), where \(y_{j}\) and \(r_{j}\) are respectively the label subset and the real-valued ranking vector computed by ML-KNN, are then saved as a summary, together with the summary's arrival time t. Each summary's weight w can be calculated from o, \(\overset{\rightarrow }{r}\) and t, and is set to 1 at the beginning. After L chunks have been processed, the ensemble classifier \(C=<\left( C_{1},w_{1}\right) ,\left( C_{2},w_{2}\right) ,\cdots ,\left( C_{L},w_{L}\right)>\) has been built, where each model \(C_{i}\) is a collection of h summaries (h is not fixed) and \(w_{i}\) is the weight of model \(C_{i}\). When an unlabeled data chunk \(D_{U}\) arrives, for each instance \(x_{i}\in D_{U}\) and each sub-classifier \(C_{i}\) we find the summary \(m_{i}=\left( \left( o_{j},y_{j}\right) ,w_{i}\right) \in C_{i},j\in [1,2,\cdots ,h]\) whose centroid \(o_{j}\) is nearest to \(x_{i}\). The corresponding weight \(w_{i}\) is then determined by the distance between \(o_{j}\) and \(x_{i}\), \(\overset{\rightarrow }{r_{i}}\) and \(t_{i}\), and \(x_{i}\) is labeled according to the summaries \(\left\{ m_{1},m_{2},\cdots ,m_{L}\right\} \). Each unlabeled data chunk is classified in this way. After a chunk \(D_{U}\) has been handled by the ensemble C, if the lowest weight \(w_{lowest}\in w\) falls below a threshold \(\epsilon \), the corresponding model \(C_{j}\) is replaced by a new model trained on \(D_{U}\). This ensures that the ensemble is updated as data chunks pass by while the number of models in the ensemble remains constant.
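The per-chunk training step described above can be sketched as follows. This is a minimal sketch, not the paper's implementation: we substitute scikit-learn's KMeans for X-Means for simplicity, reuse the ML-KNN helpers sketched in Sect. 3.1 to label the centroids, and the function name and summary layout are our own assumptions:

import time
import numpy as np
from sklearn.cluster import KMeans

def train_subclassifier(X_chunk, Y_chunk, h=10, k=4, s=1.0):
    # Cluster the labeled chunk and keep only the centroids as the model summary.
    km = KMeans(n_clusters=h).fit(X_chunk)
    p_h1, p_h0 = mlknn_priors(Y_chunk, s)
    p_e_h1, p_e_h0 = mlknn_posteriors(X_chunk, Y_chunk, k, s)
    summaries = []
    for o in km.cluster_centers_:
        # label each centroid in the same way ML-KNN labels a test instance
        y_o, r_o = mlknn_predict(o, X_chunk, Y_chunk, p_h1, p_h0, p_e_h1, p_e_h0, k)
        summaries.append((o, y_o, r_o))
    return {'summaries': summaries, 'time': time.time(), 'weight': 1.0}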

4.3 Ensemble Weighting

In this paper, inspired by [8], we propose a combination function \(g\left( \cdot \right) \) with three components (\(\overset{\rightarrow }{\alpha _{i}}\), \(\beta _{i}\) and \(\gamma _{i}\)) to combine the classifiers in the ensemble: (1) label confidence, a vector measuring the confidence of the sub-classifier in each label it outputs; (2) time difference; and (3) distance difference. Components (2) and (3) describe how confident a sub-classifier is when making a classification decision.

We estimate \(\overset{\rightarrow }{\alpha _{i}}\) from the ensemble model C by Eq. 7:

$$\begin{aligned} \overset{\rightarrow }{\alpha _{i}}=\frac{\overset{\rightarrow }{r_{i}}}{\sum _{i=1}^{L}\overset{\rightarrow }{r_{i}}} \end{aligned}$$
(7)

\(\beta _{i}\) is estimated from the time difference between the arriving data chunk \(D_{U}\) and the sub-classifier \(C_{i}\): \(\Delta t_{i}=t_{D_{U}}-t_{C_{i}}\), where \(t_{D_{U}}\) and \(t_{C_{i}}\) are the arrival times of the data chunk \(D_{U}\) and of the chunk that trained \(C_{i}\), respectively. The further apart the chunks are in time, the lower \(\beta _{i}\) is.

$$\begin{aligned} \beta _{i}=\frac{e^{-\Delta t_{i}}}{\sum _{i=1}^{L}e^{-\Delta t_{i}}};\quad 0<\beta _{i}<1,\ \sum _{i=1}^{L}\beta _{i}=1 \end{aligned}$$
(8)

\(\gamma _{i}\) is estimated from the distance between an instance \(x_{i}\) in the arriving data chunk \(D_{U}\) and its nearest center \(o_{j}\) in sub-classifier \(C_{i}\).

$$\begin{aligned} \gamma _{i}=\frac{e^{-d_{i}}}{\sum _{i=1}^{L}e^{-d_{i}}};0<\gamma _{i}<1;\sum _{i=1}^{L}\gamma _{i}=1 \end{aligned}$$
(9)

where \(d_{i}=distance\left( x_{i},o_{j}\right) \); the closer \(x_{i}\) is to \(o_{j}\), the more confident the sub-classifier.

Finally, we combine \(\overset{\rightarrow }{\alpha _{i}}\), \(\beta _{i}\) and \(\gamma _{i}\) to decide the weight \(w_{i}=\overset{\rightarrow }{\alpha _{i}}\times \beta _{i}\times \gamma _{i}\). Consequently:

$$\begin{aligned} g\left( \cdot \right) =\sum _{i=1}^{L}\left( f_{i}\left( x_{i}\right) \times \overset{\rightarrow }{\alpha _{i}}\times \beta _{i}\times \gamma _{i}\right) \end{aligned}$$
(10)

The output is \(Y=\left\{ y_{1},y_{2},\cdots ,y_{q}\right\} \in \left\{ -1,1\right\} ^{q}\), where

\(Y\left[ j\right] =\begin{cases} 1 & \text{if } g\left( \cdot \right) [j]=\sum _{i=1}^{L}\left( f_{i}\left( \cdot \right) [j]\times \overset{\rightarrow }{\alpha _{i}}[j]\times \beta _{i}\times \gamma _{i}\right) >0\\ -1 & \text{if } g\left( \cdot \right) [j]=\sum _{i=1}^{L}\left( f_{i}\left( \cdot \right) [j]\times \overset{\rightarrow }{\alpha _{i}}[j]\times \beta _{i}\times \gamma _{i}\right) <0 \end{cases}\).
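A minimal sketch of the weighting and combination step, assuming each sub-classifier is the summary dictionary produced by train_subclassifier above (all names are illustrative):

import numpy as np

def ensemble_predict(x, models, t_now):
    # For each sub-classifier, pick the nearest centroid summary and collect its vote.
    votes, r_vecs, dts, dists = [], [], [], []
    for m in models:
        cents = np.array([o for o, _, _ in m['summaries']])
        j = int(np.argmin(np.linalg.norm(cents - x, axis=1)))
        o_j, y_j, r_j = m['summaries'][j]
        votes.append(y_j)                                        # f_i(x) in {-1, 1}^q
        r_vecs.append(r_j)
        dts.append(t_now - m['time'])
        dists.append(np.linalg.norm(x - o_j))
    alpha = np.array(r_vecs) / np.array(r_vecs).sum(axis=0)      # Eq. (7), per label
    beta = np.exp(-np.array(dts))
    beta = beta / beta.sum()                                     # Eq. (8)
    gamma = np.exp(-np.array(dists))
    gamma = gamma / gamma.sum()                                  # Eq. (9)
    w = alpha * beta[:, None] * gamma[:, None]                   # per-model, per-label weights
    g = (np.array(votes) * w).sum(axis=0)                        # Eq. (10)
    return np.where(g > 0, 1, -1), w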

4.4 The Classification Algorithm

The pseudo code of our method for classifying multi-label data streams using the ML-KNN-based ensemble method is given in Algorithm 1.

Algorithm 1. SWMEC: classifying a multi-label data stream with the weighted ML-KNN-based ensemble.
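Since the pseudo-code figure itself is not reproduced here, the following is a sketch of the overall SWMEC loop as we read it from Sects. 4.2 and 4.3. It reuses train_subclassifier and ensemble_predict above; the chunk iterator, the smoothed weight update, and the use of predicted labels when retraining on an unlabeled chunk are our own simplifying assumptions:

import time
import numpy as np

def swmec_stream(chunks, L=4, h=10, k=4, eps=0.001):
    models, predictions = [], []
    for X_chunk, Y_chunk in chunks:              # Y_chunk is None for unlabeled chunks
        if len(models) < L and Y_chunk is not None:
            models.append(train_subclassifier(X_chunk, Y_chunk, h, k))
            continue
        if not models:                           # nothing to classify with yet
            continue
        preds = []
        t_now = time.time()
        for x in X_chunk:
            y_hat, w = ensemble_predict(x, models, t_now)
            preds.append(y_hat)
            for i, m in enumerate(models):       # fold each instance's weight into the model weight
                m['weight'] = 0.5 * m['weight'] + 0.5 * float(w[i].mean())
        predictions.append(np.array(preds))
        weights = [m['weight'] for m in models]
        if min(weights) < eps:                   # replace the weakest model with one trained
            idx = int(np.argmin(weights))        # on the newly classified chunk (Sect. 4.2)
            Y_hat = (np.array(preds) > 0).astype(int)
            models[idx] = train_subclassifier(X_chunk, Y_hat, h, k)
    return predictions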

5 Experiments

In this section, we report experiments evaluating the effectiveness of the proposed SWMEC for classification on the real-world multi-label data stream TMC2007 [24] (see Table 2 for details). All experiments are conducted on a PC with an Intel(R) Core(TM) i3-3220 3.30 GHz CPU and 4 GB RAM. We implemented SWMEC in Python 2.7. To obtain more precise classification results, we first preprocess the data set by removing infrequent words that occur in less than 11% of the documents, as done in [14].

Table 2. Summary of dataset TMC2007

A. Classification quality comparison with ML-KNN

We compare the performance of our approach SWMEC against ML-KNN [31], which serves as the baseline of our model. To apply this state-of-the-art offline method to stream classification, we train an ML-KNN classifier on the latest chunk of data and use it to classify the next chunk, similar to a sliding-window approach. Since the data chunk size is the most important factor in the classification and training process, we vary the chunk size (sliding window size) to test the performance of basic ML-KNN. The parameters are set as follows. SWMEC: L (the size of the ensemble) \(=4\), k (the number of nearest neighbors in ML-KNN) \(=4\), \(\epsilon \) (the weight-adjustment threshold) \(=0.001\). ML-KNN (\(w=100\)): w (the window size) \(=100\). ML-KNN (\(w=200\)): w (the window size) \(=200\). ML-KNN (\(w=400\)): w (the window size) \(=400\).

Multi-label classification has many different evaluation metrics, such as Hamming loss, F-measure, Log loss and Ranking loss [17]. Here we adopt two metrics to evaluate multi-label classification performance in a data stream. The first is Micro F1, which considers the micro-averages of precision and recall with equal importance and evaluates a classifier's label-set prediction performance. The higher the value, the better the performance.

$$ MicroF1\left( f,D_{L}\right) =\frac{2\times \sum _{i=1}^{|D_{L}|}|f\left( x_{i}\right) \cap Y_{i}|}{\sum _{i=1}^{|D_{L}|}|f\left( x_{i}\right) |+\sum _{i=1}^{|D_{L}|}|Y_{i}|} $$

where |D| is the number of instances in a multi-label data stream D containing \((x_{i},Y_{i})\) with \(Y_{i}\subseteq L\ (i=1,\cdots ,|D|)\), \(f(x_{i})\subseteq L\) denotes a multi-label classifier's predicted label set for \(x_{i}\), and \(f(x_{i},k)\) denotes the classifier's probability output for \(x_{i}\) on the k-th label (\(l_{k}\)).
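For reference, this metric can be computed over the ±1 label vectors of Sect. 3 as follows (a minimal sketch; the function name is our own):

import numpy as np

def micro_f1(Y_pred, Y_true):
    # Y_pred, Y_true: (n, q) matrices with entries in {-1, 1}
    pred, true = (Y_pred == 1), (Y_true == 1)
    inter = np.logical_and(pred, true).sum()
    return 2.0 * inter / (pred.sum() + true.sum())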

The second is Ranking loss [7], which computes the average number of label pairs that are incorrectly ordered for \(Y_{i}\), weighted by the size of the label set and the number of labels not in the label set. It evaluates the quality of the classifier's probability or real-valued outputs \(f(x_{i},k)\). The best performance is achieved with a ranking loss of zero.

$$ RankLoss(f,D)=\frac{1}{|D|}\sum _{i=1}^{|D|}\frac{1}{|Y_{i}||\bar{Y_{i}}|}loss\left( f,x_{i},Y_{i}\right) $$
$$ loss(f,x_{i},Y_{i})=\sum _{k\in Y_{i}}\sum _{k^{'}\in \bar{Y_{i}}}I\left( f\left( x_{i},k\right) \le f\left( x_{i},k^{'}\right) \right) $$

where \(\bar{Y_{i}}\) denotes the complementary set of \(Y_{i}\) in \(\mathcal {L}\).
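A corresponding sketch for Ranking loss, assuming a matrix of real-valued scores \(f(x_{i},k)\) (the function name is illustrative):

import numpy as np

def ranking_loss(F, Y_true):
    # F: (n, q) real-valued scores f(x_i, k); Y_true: (n, q) matrix in {-1, 1}
    n, q = F.shape
    total = 0.0
    for i in range(n):
        pos = np.where(Y_true[i] == 1)[0]
        neg = np.where(Y_true[i] == -1)[0]
        if len(pos) == 0 or len(neg) == 0:
            continue                             # skip degenerate label sets
        wrong = sum(F[i, a] <= F[i, b] for a in pos for b in neg)
        total += wrong / float(len(pos) * len(neg))
    return total / n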

Fig. 1. SWMEC against ML-KNN on the TMC2007 dataset.

Figure 1(a) shows the Micro F1 of the four algorithms throughout the stream on the TMC2007 dataset. We report the average performance over every |D| / 10 instances. For example, at X axis \(=5\), the Y values show the average Micro F1 of the four classification models from the \(|D|*4/10\)-th instance of the stream to the \(|D|*5/10\)-th instance. At this point, the Micro F1 of SWMEC is 0.3729, and the Y values of ML-KNN (\(w=100\)), ML-KNN (\(w=200\)) and ML-KNN (\(w=400\)) are all below that of SWMEC.

We also calculate the Ranking loss of the four classification models. Figure 1(b) shows the Ranking loss of the four algorithms throughout the stream on the TMC2007 dataset. For example, at X axis \(=6\), the Ranking loss of SWMEC is 0.1571, and the Y values of ML-KNN (\(w=100\)), ML-KNN (\(w=200\)) and ML-KNN (\(w=400\)) are all higher than that of SWMEC.

B. Classification quality comparison with SMART [14]

We also compare the performance of our approach SWMEC against SMART [14], which also adopts an ensemble strategy and provided significant inspiration for this paper. As Fig. 2(a) and (b) show, our method SWMEC is slightly better than SMART in Micro F1 and Ranking loss. Especially noteworthy is that SMART has a much larger space overhead than SWMEC, because SMART needs to maintain several tree structures while SWMEC only stores a small number of center points. In the future, we would like to combine the advantages of these two methods to address the multi-label classification problem in data streams.

Fig. 2. SWMEC against SMART on the TMC2007 dataset.

6 Conclusion

This paper presents an efficient algorithm for multi-label data stream classification based on ML-KNN. Because of the properties of data streams and the multiple labels assigned to each instance, this problem is more challenging than traditional static multi-label classification and single-label data stream classification. To address these challenges, we propose an ensemble multi-label data stream classification approach, namely the Streaming Weighted ML-KNN-based Ensemble Classifier (SWMEC), which efficiently updates the model as the multi-label data stream flows. The model can then effectively and efficiently predict multiple labels for future data points. The experimental results on a real-world dataset validate that our approach is effective and efficient for multi-label stream classification.