1 Introduction

In many real-world classification problems, the number of samples in one class is far smaller than that in the other class (He and Ma 2013; Shen et al. 2018). Such data sets are called imbalanced data sets. Classifying imbalanced data is a challenging problem that has attracted growing attention from scholars. Many outstanding works, normally grouped into data-based and algorithm-based strategies (Liu et al. 2009), have been done on class imbalance learning. Data-based strategies weaken the imbalance degree of the data by over-sampling the minority class and under-sampling the majority class. The synthetic minority over-sampling technique (SMOTE) (Nitesh et al. 2002) and SMOTE-based improved sampling algorithms are popular data-based strategies (Ramentol 2012; Zeng et al. 2009; Gao et al. 2011). Algorithm-based strategies adapt traditional classification algorithms to the data characteristics in order to enhance classification performance. The cost-sensitive boosting methods (Li and Mao 2014) updated the weights of AdaBoost in terms of the classification cost. Kernel-based learning methods were employed to adjust the classification boundary between minority and majority samples (Wu and Chang 2005; Wu et al. 2016). Three approaches, including boundary movement, biased penalty and class-boundary alignment, were presented to adjust boundary skews (Wu and Chang 2005). In the kernel boundary alignment algorithm, the kernel matrix was adjusted in terms of the skewed distribution of the dataset (Wu et al. 2016). A weighted extreme learning machine (weighted-ELM) (Zong et al. 2013) was presented to strengthen the role of the minority class by assigning an extra weight to each minority sample. Xiao et al. (2017) introduced class-specific regulation costs to penalize misclassified data, and then proposed a class-specific cost regulation extreme learning machine (CCR-ELM), together with its kernel-based extension, to solve binary or multiclass classification problems with imbalanced distributions.

In recent years, extreme learning machine (ELM) has attracted much attention due to its fast speed and good generalization in regression and classification applications. It provides a unified learning platform with a wide range of feature mappings by employing generalized single hidden layer feed-forward networks (SLFNs) and a least-squares-based learning algorithm (Huang et al. 2005). The input weights and hidden biases of the SLFNs are chosen randomly, and the corresponding output weights are analytically determined by a generalized inverse operation on the hidden-layer output matrix. Several improved ELM algorithms have been proposed to solve different kinds of regression and classification problems. Online sequential extreme learning machines employed additive hidden nodes with radial basis functions in SLFNs for better generalization when dealing with an online data flow or a concept-drifting data stream (Rong et al. 2009; Zhao et al. 2012; Mirza et al. 2015). Incremental extreme learning machine (Yang et al. 2017) randomly added hidden nodes and adjusted the output weights linking the hidden layer and the output layer. Semi-supervised ELM and unsupervised ELM are suitable for classification problems in which collecting a large amount of labeled data is hard and time-consuming (Huang et al. 2014). For class imbalance problems, weighted ELM (W-ELM) (Zong et al. 2013), total error rate ELM (TER-ELM) (Mirza et al. 2013) and class-specific cost regulation extreme learning machine (CCR-ELM) (Xiao et al. 2017) were presented, among which CCR-ELM showed better performance.

All of the above ELM-based learning algorithms may suffer from ill-conditioning problems due to the randomly selected input weights and hidden biases. Non-optimal or redundant input weights and hidden biases also make them respond slowly (Huang et al. 2005) and lead to worse generalization. In view of these shortcomings, population-based intelligent optimization algorithms, such as the genetic algorithm (GA) (Ertam and Avci 2017), particle swarm optimization (PSO) (Han et al. 2013), fruit fly optimization (FOA) (Li and Zhang 2014) and the krill herd algorithm (KH) (Guo et al. 2017), have been introduced to find the optimal parameters of the activation function in ELM-based learning algorithms, with the purpose of improving the classification accuracy.

In this paper, we focus on the class imbalance problem, in which the majority class carries the negative label and the minority class the positive label. Class-specific cost regulation ELM is employed to weaken the influence of the imbalance and dispersion degree of the data on the classification accuracy. The structure of CCR-ELM, determined by the number of hidden nodes, the input weights, the hidden biases and two tradeoff factors, plays a direct role in its classification performance. In general, these parameters are pre-set manually or randomly generated. Inappropriate values may lead to useless or redundant neuron nodes and make the whole structure complex. Motivated by this, the brain storm optimization algorithm (BSO) is employed to find the optimal parameter values of CCR-ELM. Because the numbers of input weights and biases depend on the number of hidden nodes, networks with different numbers of hidden nodes are represented by individuals of different lengths. Novel mutation and merging operators for generating new individuals are proposed to effectively exchange evolutionary information between variable-length individuals. On this basis, an adaptive CCR-ELM based on a variable-length brain storm optimization algorithm is presented to solve class imbalance problems.

The rest of the paper is organized as follows. A brief review of CCR-ELM is given in Sect. 2. BSO-based CCR-ELM for classifying imbalanced data is proposed in Sect. 3. The classification performance of the proposed method is compared with other class imbalance learning algorithms in Sect. 4. Finally, the whole paper is concluded and future work is outlined.

2 Class-specific cost regulation extreme learning machine

In the extreme learning machine originally proposed by Huang et al. (2005), the input weights and biases of SLFNs are randomly generated, and the output weights are analytically calculated from the hidden-layer output matrix in a few steps. Thus, the computational cost of ELM is low.

Suppose there are N distinct samples, denoted by \((X_{i},y_{i}), i=1,2,\ldots ,N\), where \(X_{i}=[x_{i1},x_{i2},\ldots ,x_{in}]^{T}\in R^{n}\) and \(y_{i}=[y_{i1},y_{i2},\ldots ,y_{im}]^{T}\in R^{m}\). Let \({a_j}\) and \({{\beta }_j}\) be the input and output weights, respectively, and \({b_j}\) the bias of a hidden unit. An SLFN with L hidden nodes is modeled as follows.

$$\begin{aligned} \sum \limits _{j = 1}^L {{\beta _j}g({a_j},{b_j},{X_i})} = {o_i},i = 1,...,N \end{aligned}$$
(1)

In the above formula, \(g(\bullet )\) is the activation function, which normally employs a traditional nonlinear function such as the sigmoid, sine or radial basis function (Huang et al. 2012). The error between the estimated output \({o_i}\) and the actual output \({y_i}\) is zero if the SLFN can exactly approximate the data. That is,

$$\begin{aligned} \sum \limits _{j = 1}^L {{\beta _j}g({a_j},{b_j},{X_i}) = {y_i}} ,i = 1,...,N \end{aligned}$$
(2)

Let \(\beta =[{\beta _{1}}^{T},\ldots ,{\beta _{L}}^{T}]^{T}\) and \(Y=[{y_{1}}^{T},\ldots ,{y_{N}}^{T}]^{T}\). The above model can be written compactly as \({H\beta }=Y\), where

$$H = \left[ {\begin{array}{*{20}c} {g(a_{1} ,b_{1} ,X_{1} )} & \cdots & {g(a_{L} ,b_{L} ,X_{1} )} \\ \vdots & \ddots & \vdots \\ {g(a_{1} ,b_{1} ,X_{N} )} & \cdots & {g(a_{L} ,b_{L} ,X_{N} )} \\ \end{array} } \right]_{{N \times L}}$$
(3)

H is the so-called hidden-layer output matrix. \({h_{ij}}\) represents the output of the jth hidden node for the input \({X_i}\). During the training process, the parameters of a hidden node, including \({a_j}\) and \({b_j}\), are not adjusted after being initially generated. The corresponding output weights are estimated as follows.

$$\begin{aligned} {{\hat{\beta }}} = {H^\dag }Y = \left\{ \begin{array}{l} {\left( {{\textstyle {I \over C}} + {H^T}H} \right) ^{ - 1}}{H^T}Y,\quad L < N\\ {H^T}{\left( {{\textstyle {I \over C}} + H{H^T}} \right) ^{ - 1}}Y,\quad L \ge N \end{array} \right. \end{aligned}$$
(4)

\(H^{\dag }\) is the Moore–Penrose generalized inverse of H. C is a pre-set parameter that provides a tradeoff between minimizing the training error and maximizing the marginal distance. I is the identity matrix. The optimal output weights are obtained by minimizing the cost function \(\Vert O-Y\Vert\).
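To make the training procedure concrete, the following minimal numpy sketch implements the random hidden layer and the regularized solution of Eq. (4) for the common \(L<N\) case. It is an illustrative sketch, not the authors' code; the sigmoid activation and all function names are our own choices.

```python
import numpy as np

def elm_train(X, Y, L, C=1.0, rng=np.random.default_rng(0)):
    """Minimal ELM training sketch implementing Eq. (4) for L < N.

    X: (N, n) inputs, Y: (N, m) targets, L: number of hidden nodes.
    """
    N, n = X.shape
    a = rng.uniform(-1, 1, size=(L, n))       # random input weights a_j
    b = rng.uniform(-1, 1, size=L)            # random hidden biases b_j
    H = 1.0 / (1.0 + np.exp(-(X @ a.T + b)))  # (N, L) hidden-layer output matrix
    # Eq. (4), L < N: beta = (I/C + H^T H)^(-1) H^T Y
    beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ Y)
    return a, b, beta

def elm_predict(X, a, b, beta):
    """Network output o = H(X) @ beta, cf. Eq. (1)."""
    H = 1.0 / (1.0 + np.exp(-(X @ a.T + b)))
    return H @ beta
```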

After introducing class-specific regulation costs, CCR-ELM was presented to solve class imbalance problems. Two tradeoff factors, \(C^{+}\) for the minority positive samples and \(C^{-}\) for the majority negative samples, are employed to rebalance the two classes (Xiao et al. 2017). Assume that the numbers of minority positive samples and majority negative samples are denoted by \(l_{1}\) and \(l_{2}\), respectively. CCR-ELM is modeled as follows.

$$\begin{aligned} \begin{aligned}&\min \, \left( \frac{1}{2}{\left\| \beta \right\| ^2} + \frac{1}{2}{C^ + }\sum \limits _{i = 1|{y_i} = + 1}^{{l_1}} {\xi _i^2} + \frac{1}{2}{C^ - }\sum \limits _{i = 1|{y_i} = - 1}^{{l_2}} {\xi _i^2} \right) \\&\mathrm {s.t.}\; h({x_i})\beta = {y_i} - {\xi _i},\; i = 1,\ldots ,N. \end{aligned} \end{aligned}$$
(5)

The corresponding output weights \({\hat{\beta }}\) are computed as:

$$\begin{aligned} {{\hat{\beta }}} = {H^\dag }Y = \left\{ \begin{array}{l} {\left( {{\textstyle {I \over {{C^ + }}}} + {\textstyle {I \over {{C^ - }}}} + {H^T}H} \right) ^{ - 1}}{H^T}Y,\quad L < N\\ {H^T}{\left( {{\textstyle {I \over {{C^ + }}}} + {\textstyle {I \over {{C^ - }}}} + H{H^T}} \right) ^{ - 1}}Y,\quad L \ge N \end{array} \right. \end{aligned}$$
(6)

For binary classification problems, the decision function of the CCR-ELM-based classifier is \(f(x)=\mathrm {sign}(h(x)\beta )\).

$$\begin{aligned} f(x) = \left\{ \begin{array}{l} \mathrm {sign}\, h(x){\left( {{\textstyle {I \over {{C^ + }}}} + {\textstyle {I \over {{C^ - }}}} + {H^T}H} \right) ^{ - 1}}{H^T}Y,\quad L < N\\ \mathrm {sign}\, h(x){H^T}{\left( {{\textstyle {I \over {{C^ + }}}} + {\textstyle {I \over {{C^ - }}}} + H{H^T}} \right) ^{ - 1}}Y,\quad L \ge N \end{array} \right. \end{aligned}$$
(7)

In CCR-ELM, five key parameters have a direct impact on the classification accuracy: the number of hidden nodes L, the input weights \({a_{j}}\), the biases \({b_{j}}\), \({C^{+}}\) for the minority positive samples and \({C^{-}}\) for the majority negative samples. The former three parameters determine the structure of the SLFN and are normally pre-set manually.
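As a concrete reading of Eqs. (6) and (7), the sketch below computes the CCR-ELM output weights and the binary decision, taking the printed formulas at face value (the regulation terms \(I/C^{+}\) and \(I/C^{-}\) are simply added to the Gram matrix). All names are illustrative.

```python
import numpy as np

def ccr_elm_beta(H, Y, C_pos, C_neg):
    """Output weights of CCR-ELM per Eq. (6), taken at face value.

    H: (N, L) hidden-layer output matrix, Y: (N, m) targets.
    C_pos, C_neg: tradeoff factors for the minority (+1) and majority (-1) classes.
    """
    N, L = H.shape
    if L < N:
        A = np.eye(L) / C_pos + np.eye(L) / C_neg + H.T @ H
        return np.linalg.solve(A, H.T @ Y)
    A = np.eye(N) / C_pos + np.eye(N) / C_neg + H @ H.T
    return H.T @ np.linalg.solve(A, Y)

def ccr_elm_decision(h_x, beta):
    """Binary decision per Eq. (7): sign of the network output."""
    return np.sign(h_x @ beta)
```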

3 Adaptive CCR-ELM with variable-length brain storm optimization algorithm

The brain storm optimization algorithm (BSO) proposed by Cheng et al. (2015) is a promising swarm intelligence algorithm simulating the human brainstorming process. The implicit parallel search ability of BSO gives the population superior diversity and makes it converge to the optimum faster, especially for multi-modal or large-scale optimization problems. Many improved BSO algorithms have been presented to avoid premature convergence and accelerate convergence. Cheng et al. (2014) proposed the population diversity enhanced brain storm optimization algorithm (en-BSO), in which two strategies for partially re-initializing solutions help the population jump out of local optima. The steps of en-BSO with the re-initialization strategy are shown in Algorithm 1 (Cheng et al. 2014).

In en-BSO, a new individual is formed by the following mutation or merging operators. Let \({\overline{{x_{ij}}}(t)}\) and \({x_{ij}(t)}\) be the jth gene of \({x_{i}}\) after and before the mutation operation, respectively. T and t denote the maximum number of iterations and the current generation, logsig is a logarithmic sigmoid transfer function, and c is a constant. The mutation operation corresponds to Step 3.1 and is shown as follows.

$$\begin{aligned} \overline{{x_{ij}}} (t)= & {} {x_{ij}}(t) + \xi (t) \times rand() \end{aligned}$$
(8)
$$\begin{aligned} \xi (t)= & {} \log {\mathrm{{sig}}}\left( {\frac{{0.5T - t}}{c}} \right) \times rand() \end{aligned}$$
(9)

The merging operation corresponding to Step 3.2 is performed on two selected parent individuals or cluster centers, denoted by \({x_{ij}(t)}\) and \({x_{kj}(t)}\). \({{\overline{\overline{{x_{ij}}}}(t)}}\) denotes the new individual, and \({\alpha \sim U(0,1)}\) is a random number.

$$\begin{aligned} \overline{\overline{{x_{ij}}}} (t) = \alpha {x_{ij}}(t) + (1 - \alpha ){x_{kj}}(t) \end{aligned}$$
(10)
Algorithm 1 en-BSO with the re-initialization strategy (Cheng et al. 2014)
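As a hedged illustration of Eqs. (8)–(10), the mutation and merging steps of en-BSO might look as follows in numpy; the default c = 20 is our assumption, since the text only calls c a constant.

```python
import numpy as np

def bso_mutate(x, t, T, c=20.0, rng=np.random.default_rng()):
    """Mutation per Eqs. (8)-(9): add a step-size-controlled random offset."""
    logsig = 1.0 / (1.0 + np.exp(-(0.5 * T - t) / c))  # logarithmic sigmoid
    xi = logsig * rng.random()                          # step size, Eq. (9)
    return x + xi * rng.random(x.shape)                 # Eq. (8), per gene

def bso_merge(x_i, x_k, rng=np.random.default_rng()):
    """Merging per Eq. (10): convex combination of two equal-length parents."""
    alpha = rng.random()
    return alpha * x_i + (1 - alpha) * x_k
```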

The above en-BSO is employed to find the optimal parameters of CCR-ELM. Among the five key parameters of CCR-ELM, the numbers of input weights \({a_{ij}} \in {R^{N \times L}}\) and biases \(b_{j}\in R^{L}\) vary with the number of hidden nodes L. Accordingly, networks with different L are described by individuals of different lengths, expressed as \(p_{i}=(C^{+},C^{-},a_{11},\ldots ,a_{1N},\ldots ,a_{L1},\ldots ,a_{LN},b_{1},b_{2},\ldots ,b_{L})\). The length of each individual is \(L(N+1)+2\). During the evolution, once two parent individuals have different lengths, the trailing genes of the longer individual have no matching genes in the shorter one and cannot be merged by Eq. (10). Therefore, a novel merging operator is proposed, yielding the variable-length en-BSO algorithm (VLen-BSO). The encoding is sketched below.
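A minimal sketch of this encoding, assuming N denotes the input dimension as in the expression for \(p_i\); encode and decode are hypothetical helper names.

```python
import numpy as np

def encode(C_pos, C_neg, a, b):
    """Flatten (C+, C-, a, b) into one individual of length L(N+1)+2.

    a: (L, N) input weights, b: (L,) biases, where N is the input dimension.
    """
    return np.concatenate([[C_pos, C_neg], a.ravel(), b])

def decode(p, N):
    """Recover (C+, C-, a, b); L is implied by the individual's length."""
    L = (p.size - 2) // (N + 1)
    C_pos, C_neg = p[0], p[1]
    a = p[2:2 + L * N].reshape(L, N)
    b = p[2 + L * N:]
    return C_pos, C_neg, a, b
```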

Assume that \({l_{i}}\) and \({l_{k}}\) are the lengths of the parent individuals. Only the genes in matching positions can be merged with each other; the new genes in the mismatching part are randomly generated based on the corresponding parent.

$$\begin{aligned} \overline{\overline{{x_{ij}}}} (t) = \left\{ {\begin{array}{ll} {\alpha {x_{ij}}(t) + (1 - \alpha ){x_{kj}}(t)}&j \le \min ({l_i},{l_k})\\ {x_{ij}}(t) + \alpha \xi (t)&\min ({l_i},{l_k}) < j \le \left\lceil {\max ({l_i},{l_k}) \times rand()} \right\rceil ,{l_i}> {l_k}\\ 0&j > \left\lceil {\max ({l_i},{l_k}) \times rand()} \right\rceil \end{array}} \right. \end{aligned}$$
(11)
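The following sketch is one possible reading of Eq. (11): matching genes are merged convexly, the mismatching genes of the longer parent are perturbed by \(\alpha \xi (t)\), and genes beyond \(\lceil \max (l_i,l_k)\times rand()\rceil\) are dropped, which determines the child's length. It ignores the constraint that a valid individual length must equal \(L(N+1)+2\); a repair step for that appears in the outline after Algorithm 2.

```python
import numpy as np

def vlen_merge(x_i, x_k, xi, rng=np.random.default_rng()):
    """Variable-length merging per Eq. (11); xi is the step size of Eq. (9)."""
    if x_i.size < x_k.size:                   # make x_i the longer parent
        x_i, x_k = x_k, x_i
    l_short, l_long = x_k.size, x_i.size
    alpha = rng.random()
    child_len = max(1, int(np.ceil(l_long * rng.random())))
    child = np.zeros(child_len)
    m = min(l_short, child_len)
    child[:m] = alpha * x_i[:m] + (1 - alpha) * x_k[:m]    # matching genes
    if child_len > l_short:                                 # mismatching part
        child[l_short:] = x_i[l_short:child_len] + alpha * xi
    return child
```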

The above variable-length en-BSO algorithm is introduced to find the optimal structure and tradeoff factors of CCR-ELM for class imbalance learning; we call the result adaptive CCR-ELM with VLen-BSO. In order to evaluate the classification accuracy of each individual, the fitness function is defined as the root mean square error (RMSE) on the training set \({X_{train}}\).

$$\begin{aligned} F = \sqrt{\frac{\sum \nolimits _{i = 1}^{N} \left\| \sum \nolimits _{j = 1}^{L} {{\hat{\beta }}}_j g({a_j},{b_j},{X_i}) - {y_i}\right\| _2^2}{N}} \end{aligned}$$
(12)
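Evaluating Eq. (12) for one individual then amounts to decoding it, training CCR-ELM, and scoring the training error; the sketch below reuses the hypothetical decode() and ccr_elm_beta() helpers from the earlier sketches.

```python
import numpy as np

def fitness(p, X_train, Y_train, N_in):
    """RMSE fitness of one individual per Eq. (12)."""
    C_pos, C_neg, a, b = decode(p, N_in)
    C_pos, C_neg = max(C_pos, 1e-8), max(C_neg, 1e-8)  # guard against nonpositive costs
    H = 1.0 / (1.0 + np.exp(-(X_train @ a.T + b)))     # (N, L) hidden outputs
    beta = ccr_elm_beta(H, Y_train, C_pos, C_neg)
    err = H @ beta - Y_train
    return np.sqrt(np.sum(err ** 2) / X_train.shape[0])
```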

The detailed algorithm steps of adaptive CCR-ELM with VLen-BSO are shown as follows.

Algorithm 2 Adaptive CCR-ELM with VLen-BSO
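Since the published algorithm figure is not reproduced here, the outline below sketches one plausible assembly of the whole procedure from the earlier helpers (bso_mutate, vlen_merge, fitness). Clustering is omitted, worst-replacement is used for selection, and the interpretation of \(p_g\) as the probability of mutating a single individual is our assumption.

```python
import numpy as np

def vlen_bso_ccr_elm(X, Y, N_in, pop=20, T=1500, p_g=0.6,
                     rng=np.random.default_rng(0)):
    """Skeleton of adaptive CCR-ELM with VLen-BSO (a sketch, not Algorithm 2)."""
    def random_ind():                              # L in {10, 20, ..., 1000}
        L = int(rng.choice(np.arange(10, 1001, 10)))
        C = 2.0 ** rng.integers(-24, 26, size=2)   # C+, C- in {2^-24, ..., 2^25}
        return np.concatenate([C, rng.uniform(-1, 1, size=L * (N_in + 1))])

    P = [random_ind() for _ in range(pop)]
    F = [fitness(p, X, Y, N_in) for p in P]
    for t in range(T):
        xi = (1.0 / (1.0 + np.exp(-(0.5 * T - t) / 20.0))) * rng.random()
        if rng.random() < p_g:                 # mutate one individual, Eqs. (8)-(9)
            child = bso_mutate(P[rng.integers(pop)], t, T, rng=rng)
        else:                                  # merge two parents, Eq. (11)
            i, k = rng.choice(pop, size=2, replace=False)
            child = vlen_merge(P[i], P[k], xi, rng=rng)
            min_len = 2 + (N_in + 1)
            if child.size < min_len:           # too short to decode; pad (sketch repair)
                child = np.concatenate(
                    [child, rng.uniform(-1, 1, min_len - child.size)])
            L = (child.size - 2) // (N_in + 1)
            child = child[:2 + L * (N_in + 1)]  # snap to a valid length L(N+1)+2
        f = fitness(child, X, Y, N_in)
        worst = int(np.argmax(F))              # replace the worst individual
        if f < F[worst]:
            P[worst], F[worst] = child, f
    return P[int(np.argmin(F))]                # best individual found
```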

4 Experimental results and discussion

Nine binary-class imbalanced datasets, preprocessed by Frank and Asuncion (2010), are chosen from the UCI machine learning repository for the experiments, with the purpose of fully analyzing and comparing the classification performance of the proposed method. In particular, the data in Wine1 record the chemical composition of wines from the same area of Italy and can in fact be classified into three categories. The task is transformed into a binary classification problem by combining the data of any two classes into majority samples and employing the remaining class as minority samples. Detailed information about these datasets is listed in Table 1. The imbalance ratio (IR) of the datasets varies from 0.1 to 0.4. In addition, a real-world class imbalance problem, the fault diagnosis of conveyor belts, is employed to further analyze the classification performance.

Table 1 Detailed information about the datasets from UCI

To evaluate the class imbalance algorithms, we examine not only the overall accuracy on all samples, but also the classification accuracy obtained within each class. Let TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. The overall accuracy is defined in Eq. (13). However, this index is sensitive to the class distribution and cannot fully reflect the imbalance classification performance (Xiao et al. 2017). Therefore, the minority accuracy, the majority accuracy and the G-mean are also employed to evaluate the binary classification performance (Yu et al. 2016).

$$\begin{aligned} overall\,accuracy= & {} \frac{{TP + TN}}{{TP + FP + TN + FN}} \end{aligned}$$
(13)
$$\begin{aligned} minority\,accuracy= & {} \frac{{TP}}{{TP + FN}} \end{aligned}$$
(14)
$$\begin{aligned} majority\,accuracy= & {} \frac{{TN}}{{TN + FP}} \end{aligned}$$
(15)
$$\begin{aligned} G - mean= & {} \sqrt{\frac{{TP}}{{TP + FN}} \times \frac{{TN}}{{TN + FP}}} \end{aligned}$$
(16)
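For clarity, the four indices of Eqs. (13)–(16) can be computed directly from the confusion-matrix counts, as in this small sketch (function name illustrative):

```python
import numpy as np

def imbalance_metrics(tp, tn, fp, fn):
    """Evaluation indices of Eqs. (13)-(16) from confusion-matrix counts."""
    overall = (tp + tn) / (tp + tn + fp + fn)  # Eq. (13)
    minority = tp / (tp + fn)                  # Eq. (14), recall on positives
    majority = tn / (tn + fp)                  # Eq. (15), recall on negatives
    g_mean = np.sqrt(minority * majority)      # Eq. (16)
    return overall, minority, majority, g_mean
```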

4.1 The influence of parameters on the performance of CCR-ELM

In CCR-ELM, the sigmoid function \(g(a,b,X)=1/(1+\exp (-(a\cdot X+b)))\) is employed as the activation function, and the bounds of the input weights and biases are set as \({a_{j}},{b_{j}}\in [-1,1]\). \({C^{+}}\) and \({C^{-}}\) are discrete values chosen from \(\{2^{-24},2^{-23},\ldots ,2^{25}\}\), and the possible values of L are \(\{10,20,\ldots ,1000\}\).
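In code, the discrete search grids quoted above would simply be:

```python
import numpy as np

C_grid = 2.0 ** np.arange(-24, 26)   # C+, C- in {2^-24, 2^-23, ..., 2^25}
L_grid = np.arange(10, 1001, 10)     # L in {10, 20, ..., 1000}
```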

The statistical classification performances under different L are shown in Fig. 1. Taking the Car dataset as an example, the effect of L on G-mean is shown in Fig. 2. As L increases, more hidden nodes are contained in the network, bringing better testing accuracy and a relatively stable G-mean when \({L\ge 500}\) for most of the datasets. For large-scale datasets such as Poker hand and Dota2, however, G-mean keeps growing with the number of hidden nodes, reflecting the weak generalization of CCR-ELM on such data. For Iris, a small-scale dataset, the classification performance is apparently independent of L, because a larger L only introduces redundant nodes that have little impact on the algorithm performance. In summary, a classifier with a more complex structure favors classification accuracy, but it obviously increases the computational complexity and makes learning more time-consuming.

Fig. 1 The G-mean under different L

Fig. 2 G-mean of ACCR-ELM with different numbers of hidden nodes L

With L pre-set to 400, the G-mean under different \({C^{+}}\) and \({C^{-}}\) is shown in Fig. 3. Obviously, G-mean is sensitive to both \({C^{+}}\) and \({C^{-}}\). When \({C^{+}}\) and \({C^{-}}\) are less than \(2^{-10}\), the classification accuracy remains poor because the misclassified data are penalized too lightly.

Fig. 3 The G-mean under different \({C^{+}}\) and \({C^{-}}\)

4.2 Comparison of the classification performances of ACCR-ELM combined with different evolutionary optimization algorithms

In this paper, five swarm intelligence optimization algorithms, namely GA, PSO (Guo et al. 2016), variable-length PSO (VPSO) (Wang et al. 2018), VLen-BSO and group-based VLen-BSO (GVLen-BSO), are introduced to find the optimal parameters of CCR-ELM, with the purpose of improving the generalization and classification accuracy. Group-based VLen-BSO employs a simple grouping method (Guo et al. 2015) in the clustering operator instead of the k-means method to reduce the computational cost. In the experiments, the maximum number of iterations and the population size are set to 1500 and 20, respectively, for all methods. For ACCR-ELM with GA, the crossover probability is 0.9, the mutation probability is 0.01, and the generation gap is 0.9. For ACCR-ELM with PSO and VPSO, the learning factors are \({c_{1}}=1.5\) and \({c_{2}}=1.7\), and the inertia weight is \({\omega } =1\). For ACCR-ELM with VLen-BSO, \({p_{c}=0.2}\) and \({p_{g}=0.6}\). The number of clusters is set to 20, and half of the solutions are re-initialized every 200 iterations.

The classification performances of all the above adaptive CCR-ELMs are compared with the original CCR-ELM, and the statistical results over 30 runs are listed in Tables 2 and 3.

Table 2 Comparison of the overall accuracy \((({C^{+}},{C^{-}},L)\,Max/Average\,(\%))\) among CCR-ELM and ACCR-ELM with GA, PSO, VPSO, VLen-BSO and GVLen-BSO
Table 3 Comparison of G-mean \((({C^{+}},{C^{-}},L)\,Max/Average\,(\%))\) among CCR-ELM and ACCR-ELM with GA, PSO, VPSO, VLen-BSO and GVLen-BSO

The statistical results of overall accuracy over 30 runs shown in Table 2 indicate that ACCR-ELM with VLen-BSO and with GVLen-BSO achieves better classification accuracy than the other algorithms. Further comparing the G-mean shown in Table 3, we see that the traditional CCR-ELM with randomly generated parameters has the worst classification accuracy, especially for the Iris, Dota2 and Poker hand datasets. In contrast, the adaptive CCR-ELMs with optimized parameters achieve better classification performance for most of the datasets. In particular, VLen-BSO and GVLen-BSO provide more rational parameters for CCR-ELM, classifying the minority data more accurately.

4.3 The effect of the imbalance ratio on the algorithm performance

In real-world classification problems, the imbalance ratio, which reflects the quantitative difference between the majority and minority classes, depends on the properties of the problem. Experiments with imbalance ratios ranging from 0.1 to 0.4 are conducted on the above nine imbalanced datasets, with the purpose of analyzing the robustness of the class imbalance learning algorithms under varied imbalance ratios.

The statistical results of G-mean under different imbalance ratios are shown in Fig. 4. A comparison of the classification performances of ACCR-ELM with VLen-BSO, VPSO and GVLen-BSO indicates that, whether the imbalance ratio changes or not, ACCR-ELM with VLen-BSO shows relatively stable classification performance. The G-means of ACCR-ELM with GVLen-BSO and with VPSO both fluctuate around 0.5, indicating relatively worse robustness, although GVLen-BSO obtains more rational parameters and hence relatively better classification accuracy than VPSO.

Fig. 4 The classification robustness of adaptive CCR-ELMs under varied imbalance ratios

4.4 Application of the ACCR-ELM algorithm to fault diagnosis of conveyor belts

The conveyor belt is the most important transportation equipment in coal mines. Any fault, such as overcurrent, off-tracking or overload, may shut the belt down immediately, causing production delays and even reducing production efficiency. Given that faults occur only occasionally and the belt generally operates normally, the monitoring data of a conveyor belt are in fact imbalanced.

The experiment is conducted on monitoring data collected from two belts in a coal mine. Belt 1 is used for the underground transportation of coal; Belt 2 is a conveyor belt in a coal separating plant. Detailed information about the two belts is listed in Tables 4 and 5. In ACCR-ELM with VLen-BSO, the number of hidden nodes is set to 500, and the sigmoid activation function with \(C=1\) is employed. The learning factors and inertia weight of VPSO are set to \({c_{1}}=1.5\), \({c_{2}}=1.6\) and \(\omega =1\). The probabilities of VLen-BSO are preset to \({p_{c}}=0.2\) and \({p_{g}}=0.6\). The statistical results over 30 runs shown in Table 6 indicate that ACCR-ELM with VLen-BSO has better classification accuracy and stability than ACCR-ELM with VPSO because of the more reasonable structure of the ELM.

Table 4 Information about the two belts
Table 5 Detailed information on the fault of belt in coal mine
Table 6 Comparison of the classification performance for the belt fault between ACCR-ELM with VLen-BSO and VPSO

5 Conclusions

An adaptive CCR-ELM combined with VLen-BSO was proposed to deal with class imbalance problems, with the purpose of improving the generalization and classification accuracy. VLen-BSO is employed to find the optimal parameters of CCR-ELM, including the number of hidden nodes, the input weights, the biases and the tradeoff factors. All of these parameters compose an individual in VLen-BSO, and the length of an individual varies with the number of hidden nodes. In order to combine two parent individuals of different lengths, a novel merging operator was presented to generate a new individual. The experimental results on nine imbalanced datasets show that VLen-BSO can find better parameters for CCR-ELM, resulting in better classification accuracy than other evolutionary optimization algorithms such as GA, PSO and VPSO. In addition, the classification performance of the proposed adaptive algorithm is relatively stable under varied imbalance ratios. Applying the proposed algorithm to the fault diagnosis of conveyor belts also proves that ACCR-ELM with VLen-BSO has better classification accuracy and stability. Finding the optimal structure for ensemble imbalance learning with more than one CCR-ELM is our future work.