1 Introduction

The problem of classification for imbalanced data is one of the top ten studied areas in the data mining and machine learning field today [1]. Imbalanced data classification typically refers to a problem in which the classes are not equally represented. Applications of imbalanced data classification include a wide variety of “real-world” problems such as detecting fraudulent credit card transactions or characterizing the recurrence of breast cancer in patients. Most classic studies of imbalance focus on two-class (binary) classification, since a multi-class problem can be decomposed into multiple two-class problems. Commonly, in a two-class classification problem, the majority of the data are labeled as the “negative” class while the minority form the “positive” class. Because most standard classification learning algorithms [2,3,4] are designed for balanced datasets, the small “positive” class is often underrepresented when these algorithms are applied directly to imbalanced datasets. An excellent overall accuracy may be achieved simply because almost all samples are assigned to the negative class, but such accuracy merely reflects the underlying class distribution and can therefore be a misleading performance measure. Moreover, the positive class is usually of higher interest or importance than the negative class and often carries a higher misclassification cost.

Many techniques have been developed recently to combat imbalanced data classification in machine learning. These techniques can be categorized as either data resampling methods or algorithmic approaches. Data resampling methods aim to balance the data before applying standard classification learning algorithms. Two well-known strategies can be utilized: oversampling and undersampling. The synthetic minority over-sampling technique (SMOTE) [5] is an example of oversampling, in which new data samples are generated between the minority samples and their selected neighbors. The Tomek links method [6] and Wilson’s editing [7] are undersampling methods, in which a fraction of the majority samples are removed. Balancing data by resampling helps reduce misclassification error; however, its effectiveness is not consistent but problem dependent [8]. On the other hand, new machine learning classification algorithms, or modifications of existing ones, have also been developed, such as one-class classifiers [9] and cost-sensitive learning [10]. As a popular algorithmic approach, a cost-sensitive classifier incorporates cost sensitivity into the learning process, i.e., a different misclassification cost is assigned to each particular example [11]. However, in real-world applications the misclassification cost matrix may be difficult to generate.

Zong et al. [12] proposed a competitive cost-sensitive learning approach, known as weighted extreme learning machine (weighted ELM), which automatically generates the misclassification cost matrix in accordance with the class distribution. A weight inversely proportional to the number of samples in each class is assigned, which strengthens the impact of the positive class in imbalanced datasets. While maintaining the advantages of the original unweighted ELM, i.e., convenient implementation and easy application to multi-class data classification, weighted ELM demonstrates better performance on imbalanced datasets than unweighted ELM (i.e., ELM). However, the weighting schemes presented by Zong et al. misclassify a certain number of samples from the negative class in exchange for better accuracy in the positive class. Therefore, other ELM based approaches have been proposed. For instance, Zhang and Ji [13] developed fuzzy ELM (FELM), in which a fuzzy matrix is created to change the distribution of penalty factors; however, no unified design strategy for generating the fuzzy matrix is presented. In [14], Li et al. developed a Boosting weighted ELM, in which weighted ELM is embedded into a modified AdaBoost framework in order to obtain proper weights, but the computational complexity of the algorithm increases significantly due to the iterations. Lin et al. [15] combined a resampling technique with ELM; however, its performance depends largely on the adopted resampling technique.

In this paper, an improved weighted ELM (IW-ELM) algorithm is proposed, in which a voting based scheme is introduced to assign appropriate weights adaptively for imbalanced data classification. This paper is organized as follows. Section 2 briefly reviews the fundamental principles of weighted ELM and its issues. Section 3 elaborates the proposed algorithm. Experiment results on real-world imbalanced datasets are presented in Sect. 4. Section 5 compares the computational complexity of IW-ELM with other related classification algorithms. Section 6 draws a conclusion.

2 Related work

The proposed algorithm is developed based on ELM and weighted ELM. The basic principles of ELM and weighted ELM are reviewed in Sects. 2.1 and 2.2, respectively. Section 2.2 also discusses the problems that arise when weighted ELM is applied to imbalanced datasets.

2.1 Unweighted ELM

ELM [16], a least squares based learning algorithm, was originally proposed for single hidden layer feedforward networks (SLFNs): the input weights of the SLFN are randomly generated, and the output weights are trained with a batch least squares technique. It has been proved that SLFNs with randomly generated hidden neurons and tunable output weights possess universal approximation capability and excellent generalization performance [17]. More importantly, ELM outperforms most existing learning algorithms in training speed [18,19,20,21,22] and has been widely used in applications such as face recognition [23], image processing and classification [24], electricity price classification [25], energy commodity futures index forecasting [26], location fingerprinting [27], protein sequence classification [28], and location classification [29].

The basics of unweighted ELM are briefly outlined as follows:

For any input data \({\varvec{x}}\in R^{n}\), the output of a standard SLFN with L hidden nodes can be written as

$$\begin{aligned} f\left( {\varvec{x}} \right) =\sum \nolimits _{i=1}^L {\varvec{\beta }}_{i}G\left( {\varvec{w}}_{i},b_{i},{\varvec{x}} \right) ,\, {\varvec{w}}_{i}\in R^{n}, {\varvec{\beta }}_{i}\in R^{m}, \end{aligned}$$

where \({\varvec{\beta }}_{i}={[\beta _{i1},\beta _{i2},\ldots ,\beta _{im}]}^{T}\) is the weight vector connecting the ith hidden node to the output nodes, and \(G\left( {\varvec{w}}_{i},b_{i},{\varvec{x}} \right) \) is the output of the ith hidden node.

Consider a set of N training pairs with inputs \({\varvec{x}}_{1},{\varvec{x}}_{2},\ldots ,{\varvec{x}}_{N}\) (here \({\varvec{x}}_{i}\in R^{n})\) and target outputs \({\varvec{t}}_{1},{\varvec{t}}_{2},\ldots ,{\varvec{t}}_{N}\) (here \({\varvec{t}}_{i}\in R^{m})\). The mathematical model of an SLFN is

$$\begin{aligned} {\varvec{H}}{\varvec{\beta }}={\varvec{T}} \end{aligned}$$
(1)

with

$$\begin{aligned} {\varvec{H}}=\left( \begin{array}{*{20}c} G({\varvec{w}}_{1},b_{1},{\varvec{x}}_{1}) &{}\quad \cdots &{}\quad G({\varvec{w}}_{L},b_{L},{\varvec{x}}_{1})\\ \vdots &{} \quad \ddots &{} \quad \vdots \\ G({\varvec{w}}_{1},b_{1},{\varvec{x}}_{N}) &{} \quad \cdots &{}\quad G({\varvec{w}}_{L},b_{L},{\varvec{x}}_{N})\\ \end{array}\right) , \end{aligned}$$
(2)

where \({\varvec{H}}\) is the \(N\times L\) hidden-layer output matrix, \({\varvec{\beta }}={[{\varvec{\beta }}_{1},{\varvec{\beta }}_{2},\ldots ,{\varvec{\beta }}_{L}]}^{T}\) is the \(L\times m\) output weight matrix, and \({\varvec{T}}={[{\varvec{t}}_{1},{\varvec{t}}_{2},\ldots ,{\varvec{t}}_{N}]}^{T}\) is the \(N\times m\) target matrix. Similar to the least squares support vector machine (LS-SVM), the output weights \({\varvec{\beta }}\) can be obtained by solving the following optimization problem:

$$\begin{aligned}&\text {Minimize }\,\frac{1}{2}\left\| {\varvec{\beta }} \right\| ^{2}+\frac{C}{2}\sum \nolimits _{i=1}^{N}\left\| {\varvec{\varepsilon }}_{i} \right\| ^{2}, \end{aligned}$$
(3)
$$\begin{aligned}&\text {subject to }{\varvec{\varepsilon }} _{i}=H\left( {\varvec{x}}_{i} \right) {\varvec{\beta }}-{\varvec{t}}_{i},\, i=1,2,\ldots ,N, \end{aligned}$$
(4)

where \({\varvec{\varepsilon }}_{i}\) is the training error vector of the m output nodes with respect to the ith training sample, \(H\left( {\varvec{x}}_{i} \right) =\left[ G\left( {\varvec{w}}_{1},b_{1},{\varvec{x}}_{i} \right) ,\ldots ,G\left( {\varvec{w}}_{L},b_{L},{\varvec{x}}_{i} \right) \right] \) denotes the ith row of \({\varvec{H}}\), and C is a positive real regularization parameter.

By solving optimization problem (3), the least squares solution for \({\varvec{\beta }}\) can be determined as:

$$\begin{aligned} \text {when }N< & {} L:\, {\varvec{\beta }}=\tilde{{\varvec{H}}}^{+}{\varvec{T}}={\varvec{H}}^{T}\left( \frac{1}{C}{\varvec{I}}+{\varvec{H}}{\varvec{H}}^{T} \right) ^{-1}{\varvec{T}},\nonumber \\ \text {when } N\ge & {} L:\, {\varvec{\beta }}=\tilde{{\varvec{H}}}^{+}{\varvec{T}}=\left( \frac{1}{C}{\varvec{I}}+{\varvec{H}}^{T}{\varvec{H}} \right) ^{-1}{\varvec{H}}^{T}{\varvec{T}}. \end{aligned}$$
(5)
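To make the closed-form solution in Eq. (5) concrete, the following minimal Python sketch trains an unweighted ELM with sigmoid hidden nodes using NumPy. It is an illustrative sketch only, not the implementation used in the experiments; the function names and default parameters are assumptions made for the example.

```python
import numpy as np

def elm_train(X, T, L=100, C=1.0, seed=0):
    """Minimal unweighted ELM sketch: random hidden layer plus the regularized
    least squares solution of Eq. (5). X is N x n, T is the N x m target matrix."""
    rng = np.random.default_rng(seed)
    N, n = X.shape
    W_in = rng.standard_normal((L, n))              # random input weights w_i
    b = rng.standard_normal(L)                      # random biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W_in.T + b)))     # N x L hidden-layer output matrix
    if N >= L:
        beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
    else:
        beta = H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, T)
    return W_in, b, beta

def elm_predict(X, W_in, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W_in.T + b)))
    return H @ beta                                  # row-wise argmax gives the predicted class
```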

Similar to other conventional learning algorithms, the performance of unweighted ELM is highly dependent on the class distribution. It performs well on balanced datasets, but may be problematic for imbalanced classification. Intuitively, the negative class tends to push the separating boundary towards the positive class side to obtain better classification accuracy for itself (see [12] for a more detailed analysis). The weighted ELM algorithm [12] was developed to overcome this problem.

2.2 Weighted ELM

As a cost-sensitive learning algorithm, weighted ELM assigns a different weight to each training example so that misclassifying positive samples incurs a higher cost. The optimization problem of weighted ELM becomes:

$$\begin{aligned}&\text {Minimize }\,\frac{1}{2}\left\| {\varvec{\beta }} \right\| ^{2}+\frac{C}{2}\sum \nolimits _{i=1}^{N}W_{i,i}\left\| {\varvec{\varepsilon }}_{i} \right\| ^{2}, \end{aligned}$$
(6)
$$\begin{aligned}&\text {subject to }\,{\varvec{\varepsilon }}_{i}=H\left( {\varvec{x}}_{i} \right) {\varvec{\beta }}-{\varvec{t}}_{i},\, i=1,2,\ldots ,N, \end{aligned}$$
(7)

where \({\varvec{W}}\) is an \(N\times N\) diagonal misclassification cost matrix whose diagonal entries \(W_{i,i}\) are set in accordance with the class distribution.

Similar to (5), solutions of \({\varvec{\beta }}\) can be obtained:

$$\begin{aligned} \text {when }N< & {} L{:} \, {\varvec{\beta }}={\varvec{H}}^{T}\left( \frac{1}{C}{\varvec{I}}+{\varvec{WH}}{\varvec{H}}^{T} \right) ^{-1}{\varvec{WT}},\nonumber \\ \text {when }N\ge & {} L{:} \,{\varvec{\beta }}=\left( \frac{1}{C}{\varvec{I}}+{\varvec{H}}^{T}{\varvec{WH}} \right) ^{-1}{\varvec{H}}^{T}{\varvec{WT}}. \end{aligned}$$
(8)

In [12], two cost matrices, \({\varvec{W}}_{{\varvec{1}}}\) and \({\varvec{W}}_{{\varvec{2}}}\), were proposed.

$$\begin{aligned} \text {Weighting scheme }{\varvec{W}}_{{\varvec{1}}}{:}\, W_{i,i}=\frac{1}{\# (t_{i})}, \quad i=1,2,\ldots ,N, \end{aligned}$$
(9)

where \(t_{i}\) denotes the class to which sample \({\varvec{x}}_{i}\) belongs and \(\# (t_{i})\) is the number of samples in class \(t_{i}\).

With scheme \({\varvec{W}}_{{\varvec{1}}}\), each class contributes an equal total weight, so the imbalanced dataset is rebalanced in terms of class cardinality.

In comparison, another weighting scheme

$$\begin{aligned} {\varvec{W}}_{{\varvec{2}}}{:}\, \left\{ {\begin{array}{cc} W_{i,i}=\frac{1}{\# (t_{i})},&{} \mathrm {if}\,\# (t_{i})>\mathrm {AVG},\\ &{} \\ W_{i,i}=\frac{0.618}{\# (t_{i})},&{}\mathrm {if}\,\# (t_{i})\le \mathrm {AVG},\\ \end{array} } \right. \end{aligned}$$

where 0.618 approximates the golden ratio and AVG denotes the average number of samples per class. In effect, \({\varvec{W}}_{{\varvec{2}}}\) is a trade-off between unweighted ELM and weighted ELM with \({\varvec{W}}_{{\varvec{1}}}\).
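Both schemes translate directly into per-sample weights computed from the class counts. The following sketch is illustrative only and assumes the labels are given as an array y; it returns the diagonal of \({\varvec{W}}\) for either scheme as stated above.

```python
import numpy as np

def weighting_diag(y, scheme="W1"):
    """Diagonal of W for weighting schemes W1 and W2, computed from the label array y."""
    classes, counts = np.unique(y, return_counts=True)
    size = dict(zip(classes, counts))
    avg = counts.mean()                                  # AVG: average sample size per class
    w = np.empty(len(y))
    for i, label in enumerate(y):
        if scheme == "W1" or size[label] > avg:          # W1, or W2 for classes above AVG
            w[i] = 1.0 / size[label]
        else:                                            # W2 for classes at or below AVG
            w[i] = 0.618 / size[label]
    return w
```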

Table 1 Performance results of binary problems with imbalance ratio \(\in (0.0078, 0.8605)\)

After applying weighting scheme \({\varvec{W}}_{{\varvec{1}}}\) or \({\varvec{W}}_{{\varvec{2}}}\) to unweighted ELM, weighted ELM reduces the misclassification error associated with the small positive class. In other words, the weighting scheme pulls the boundary back towards the negative class side so that additional positive class samples can be correctly classified.

However, assigning higher weights to the positive class effectively lowers the relative weights of the negative class. If the weights of the positive class are too high, the classification accuracy in the negative class drops; if they are not high enough, the classification accuracy in the positive class may be unsatisfactory. As noted in [12], a certain number of samples from the negative class are misclassified in exchange for better accuracy in the positive class. The key point is therefore to determine, prior to applying weighted ELM, an optimal weight, i.e., the most appropriate position on the negative class side towards which the boundary should be pushed. In general this is not possible, because the optimal weight is problem dependent. As shown in Table 1, different weights are evaluated when applying the weighted ELM algorithm. Out of 46 tested datasets, only 11 happen to have \({\varvec{W}}_{{\varvec{1}}}\) or \({\varvec{W}}_{{\varvec{2}}}\) as the optimal weight.

3 Description of IW-ELM

Since it is not practical to determine an optimal weight in advance when designing a weighted ELM classifier, in IW-ELM we propose to use a series of “appropriate” weights in place of a single \({\varvec{W}}_{{\varvec{1}}}\) or \({\varvec{W}}_{{\varvec{2}}}\) for imbalanced data classification. For convenience, only binary classification problems are considered. Assume \(t_{1}\) is the positive class and \(t_{2}\) is the negative class, and let
$$\begin{aligned} a=\frac{1}{\# (t_{1})},\qquad b_{i}=\frac{1}{\# (t_{1})+\left[ \# (t_{2})-\# (t_{1}) \right] \times \frac{1}{k-1}\times \left( i+\frac{k-1}{2} \right) },\quad i=0,1,\ldots ,k-1. \end{aligned}$$
The proposed method is implemented in three major steps.

Table 2 The values of trade-off constant C and the number of hidden nodes L corresponding to the results in Table 1

Step 1 k different weight matrices are first selected:

$$\begin{aligned} \tilde{{\varvec{W}}}_{i}={\varvec{W}}_{i}^{+}+{\varvec{W}}_{i}^{-}, \end{aligned}$$
(10)

where \({\varvec{W}}_{i}^{+}=\mathrm {diag} \left( {\mathop {\underbrace{0,0,\cdots ,0}_{\#(t_{2})}}}, {\mathop {\underbrace{a,a,\cdots ,a}_{\#(t_{1})}}}\right) \) and \({\varvec{W}}_{i}^{-}=\mathrm {diag}\left( {\mathop {\underbrace{b_{i},b_{i},\cdots ,b_{i}}_{\#(t_{2})}}}, {\mathop {\underbrace{0,0,\cdots ,0}_{\#(t_{1})}}}\right) , i=0,1,\ldots ,k-1\), assuming the training samples are ordered with the \(\#(t_{2})\) negative samples first, followed by the \(\#(t_{1})\) positive samples.

Note that when i is equal to \(\frac{k-1}{2}\) (k is chosen to be odd), \(\tilde{{\varvec{W}}}_{i}\) becomes the weight matrix \({\varvec{W}}_{{\varvec{1}}}\) in Eq. (9).
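Step 1 can be sketched as follows: the code computes a, the sequence \(b_{i}\), and the diagonals of the k candidate matrices \(\tilde{{\varvec{W}}}_{i}\) of Eq. (10). It assumes, as in the definition above, that the samples are ordered with the negative class first; the function name is illustrative.

```python
import numpy as np

def candidate_weight_diags(n_pos, n_neg, k=11):
    """Diagonals of the k candidate weight matrices in Eq. (10)."""
    a = 1.0 / n_pos                                       # weight for positive (t1) samples
    diags = []
    for i in range(k):
        b_i = 1.0 / (n_pos + (n_neg - n_pos) * (i + (k - 1) / 2.0) / (k - 1))
        diags.append(np.concatenate([np.full(n_neg, b_i), np.full(n_pos, a)]))
    return diags

# Sanity check of the note above: for i = (k-1)/2 the negative-class weight b_i
# equals 1/n_neg, so that candidate matrix coincides with scheme W1.
```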

Step 2 k weighted ELM classifiers are trained, corresponding to the k weight matrices selected in Step 1.

It is expected that some of the classifiers are not usable: they may produce a significant number of false-negative or false-positive errors. In order to eliminate these unusable classifiers, the number of samples classified as positive is recorded for each classifier. The \(\left\lceil \frac{k}{4} \right\rceil \) classifiers with the largest positive counts are rejected, since they are very likely biased towards the positive class (\(\lceil \cdot \rceil \) denotes the ceiling function). Similarly, based on the recorded number of samples classified as negative, the \(\left\lceil \frac{k}{4} \right\rceil \) classifiers with the largest negative counts are rejected as well, because they are very likely to misclassify too many positive samples.

Step 3 After the \(\left\lceil \frac{k}{2} \right\rceil \) unusable classifiers have been removed, an ensemble composed of the remaining classifiers is formed, and majority voting among them is used to classify samples.
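Steps 2 and 3 can be sketched as below, assuming each of the k trained classifiers has already produced a vector of {+1, -1} predictions for the samples to be classified; the array layout and function name are hypothetical.

```python
import numpy as np
from math import ceil

def prune_and_vote(predictions):
    """predictions: (k, N) array of {+1, -1} labels, one row per candidate classifier.
    Drop the ceil(k/4) classifiers predicting the most positives and the ceil(k/4)
    predicting the most negatives, then classify by majority vote of the rest."""
    k = predictions.shape[0]
    r = ceil(k / 4)
    pos_counts = (predictions == 1).sum(axis=1)
    neg_counts = (predictions == -1).sum(axis=1)
    drop = set(np.argsort(pos_counts)[-r:]) | set(np.argsort(neg_counts)[-r:])
    keep = [i for i in range(k) if i not in drop]
    votes = predictions[keep].sum(axis=0)
    return np.where(votes >= 0, 1, -1)   # a tie, if any, is resolved in favor of the positive class
```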

Note that the proposed algorithm and the voting based extreme learning machine (V-ELM) proposed by Cao et al. [30] are significantly different, even though both are voting based techniques. V-ELM is an algorithm targeted at balanced datasets, in which several classifiers are first trained with the unweighted ELM algorithm and the classification results are then combined by majority voting. V-ELM needs to be modified for imbalanced data classification. One intuitive technique is to combine weighted ELM with V-ELM. We name this method VW-ELM; in it, the results of several weighted ELM classifiers (scheme \({\varvec{W}}_{{\varvec{1}}}\) or \({\varvec{W}}_{{\varvec{2}}})\) are combined by majority voting. It is known that the classifiers in an effective ensemble need to be “good and different”; that is, every single classifier in the ensemble should have a certain level of accuracy while maintaining some dissimilarity from the others. Although each individual weighted ELM classifier in VW-ELM may perform well, the similarity among its classifiers makes VW-ELM unsatisfactory.

4 Performance evaluation

In this section, the performance of the proposed IW-ELM is compared with unweighted ELM, weighted ELM \({\varvec{W}}_{{\varvec{1}}}\), weighted ELM \({\varvec{W}}_{{\varvec{2}}}\), and VW-ELM using fivefold cross-validation. The results of unweighted ELM are taken directly from Tables 3 and 4 in [12]. For weighted ELM \({\varvec{W}}_{{\varvec{1}}}\) and weighted ELM \({\varvec{W}}_{{\varvec{2}}}\), the better of the two results reported in Tables 3 and 4 in [12] is chosen. All experiments are conducted in MATLAB (R2014, 64 bit) on a Windows 10 OS with an Intel Core i7-2620M 2.70 GHz CPU and 12 GB RAM.

4.1 Data specification

For comparison purposes, the proposed IW-ELM is evaluated on the same 46 binary classification datasets listed in Table 1 of [12], with the attributes normalized into [−1, 1] and imbalance ratios ranging from 0.0078 to 0.8605. The imbalance ratio (IR) of a binary problem is defined as:

$$\begin{aligned} \mathrm {IR}=\frac{\# (t_{1})}{\# (t_{2})}. \end{aligned}$$
(11)
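For completeness, the preprocessing mentioned above can be sketched as follows; the code is illustrative and assumes X is the attribute matrix and y the label vector, with the positive class coded as +1 and the negative class as -1.

```python
import numpy as np

def normalize_attributes(X):
    """Min-max scale every attribute (column) of X into [-1, 1]."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)   # guard against constant columns
    return 2.0 * (X - x_min) / span - 1.0

def imbalance_ratio(y):
    """IR of Eq. (11): #(t1) / #(t2), with t1 the positive (minority) class."""
    return np.sum(y == 1) / np.sum(y == -1)
```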

4.2 Parameter settings

For convenience, the proposed algorithm is tested with Sigmoid additive hidden nodes, whose activation function is \(G\left( {\varvec{w}},b,{\varvec{x}}\right) =\frac{1}{1+\mathrm {exp}\left( -\left( {\varvec{w}}\cdot {\varvec{x}}+b\right) \right) }\). There are two parameters to tune: the trade-off constant C and the number of hidden nodes L. A grid search of C over \(\left\{ 2^{-18},2^{-16},\ldots ,2^{48},2^{50} \right\} \) and L over \(\left\{ 10,20,\ldots ,990,1000 \right\} \) is conducted in order to find the optimal result. The value of k in Eq. (10) is set to 11, so the number of remaining classifiers is 5. To make a fair comparison, the number of classifiers used in VW-ELM is also 5. The values of C and L corresponding to the optimal results in our experiments are shown in Table 2.
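The grid search described above can be organized as in the following sketch; fit_and_score is a placeholder for a routine that trains IW-ELM on one training fold with the given C and L and returns the G-mean on the corresponding validation fold.

```python
import numpy as np
from itertools import product

C_grid = [2.0 ** p for p in range(-18, 52, 2)]   # 2^-18, 2^-16, ..., 2^50
L_grid = list(range(10, 1001, 10))               # 10, 20, ..., 1000

def grid_search(folds, fit_and_score):
    """folds: list of (train, validation) splits from fivefold cross-validation."""
    best_C, best_L, best_gmean = None, None, -1.0
    for C, L in product(C_grid, L_grid):
        mean_gmean = np.mean([fit_and_score(tr, va, C, L) for tr, va in folds])
        if mean_gmean > best_gmean:
            best_C, best_L, best_gmean = C, L, mean_gmean
    return best_C, best_L, best_gmean
```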

4.3 Evaluation metrics for imbalanced datasets

Due to the possible presence of significant class imbalance, i.e., negative samples outnumbering positive samples by a large margin, accuracy alone is not a reliable performance measure for imbalanced data classification. Other frequently used measures include the F-measure, the receiver operating characteristic (ROC) curve, and the geometric mean of the true rates (G-mean). To compare the proposed method with the weighted ELM presented in [12], G-mean is used in this paper. For a binary classification problem, the G-mean value is defined as the square root of (positive class accuracy \(\mathrm {\times }\) negative class accuracy):

$$\begin{aligned} \text {G-mean value}=\sqrt{\frac{{\textit{TP}}}{{\textit{TP}}+{\textit{FN}}}\times \frac{{\textit{TN}}}{{\textit{TN}}+{\textit{FP}}}}, \end{aligned}$$
(12)

where TP, TN, FP, and FN stand for true positive, true negative, false positive and false negative, respectively.
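As a small illustration, the G-mean of Eq. (12) can be computed from the true and predicted labels as follows, assuming the labels are coded as +1 for the positive class and -1 for the negative class.

```python
import numpy as np

def g_mean(y_true, y_pred):
    """G-mean of Eq. (12)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    sensitivity = tp / (tp + fn)     # positive class accuracy
    specificity = tn / (tn + fp)     # negative class accuracy
    return float(np.sqrt(sensitivity * specificity))
```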

4.4 Experiment results

4.4.1 IW-ELM versus unweighted ELM

From Table 1, it can be seen that the proposed IW-ELM algorithm outperforms unweighted ELM. For problems with an imbalance ratio close to 1, i.e., relatively balanced datasets, weighted ELM does not have a significant advantage over unweighted ELM, while IW-ELM still performs better than either of them, with higher G-mean values.

4.4.2 IW-ELM versus weighted ELM (\({\varvec{W}}_{{\varvec{1}}}, {\varvec{W}}_{{\varvec{2}}})\)

As mentioned before, only 11 out of the 46 datasets happen to have \({\varvec{W}}_{{\varvec{1}}}\) or \({\varvec{W}}_{{\varvec{2}}}\) as the optimal weight. Even for these 11 datasets, the results of IW-ELM and weighted ELM are still comparable because of the ensemble used in IW-ELM.

4.4.3 IW-ELM versus VW-ELM

As seen in Table 1, VW-ELM generally performs better than weighted ELM because of the voting strategy, and IW-ELM achieves even higher classification performance than VW-ELM for most of the imbalanced problems tested. This is due to the fact that, in IW-ELM, unusable classifiers are eliminated from the ensemble and the remaining classifiers are more diverse than those in VW-ELM.

5 Computational complexity

When using IW-ELM, there is a compromise between the high accuracy achieved and the computational complexity. Taking ELM as a baseline, weighted ELM has approximately the same computational complexity, while the complexities of VW-ELM and the proposed IW-ELM are higher because multiple classifiers must be trained. Since about \(\frac{k}{2}\) weighted ELM classifiers are trained in VW-ELM and k weighted ELM classifiers in IW-ELM, the execution time of IW-ELM is roughly k times that of the original ELM.

6 Conclusions

In this paper, an improved weighted ELM algorithm, IW-ELM, has been proposed for solving binary imbalanced classification problems. There are three major steps in the process: training k weighted ELM classifiers, removing \(\left\lceil \frac{k}{2} \right\rceil \) unusable classifiers, and determining the final result by majority voting among the remaining classifiers. The performance of IW-ELM (with k set to 11) is verified using 46 imbalanced datasets with imbalance ratios ranging from 0.0078 to 0.8605. Simulation results demonstrate that IW-ELM achieves higher G-mean values than the other ELM based algorithms considered.