1 Introduction

The blast furnace (BF) is an energy-intensive industrial reactor that has been extensively used to produce pig iron, often called hot metal. The core duty of a BF operator is to keep the furnace running smoothly and stably, and thereby to produce hot metal with high quality at low cost. To this end, the thermal state of the BF must be kept near its optimum. In practice, however, the thermal state cannot be measured directly because of the hostile conditions inside the furnace, such as high temperature and high pressure [27]. Instead, it is usually evaluated through thermal indicators such as the silicon content, the phosphorus content, and the sulfur content. Among them, the silicon content in blast furnace hot metal (Si) is the most important indicator of the inner thermal state of the furnace, and it needs to be controlled strictly within proper bounds to produce iron with high quality at low cost. Accurate prediction of the silicon content is therefore needed to help the BF operator take action. For this reason, prediction of the silicon content in BF has attracted considerable research interest, and data-driven modeling in particular is being investigated intensively [1, 2, 5,6,7, 11, 12, 18, 22, 24, 26, 38, 39]. Zhou et al. proposed a Hammerstein model for the prediction of the silicon content in BF using the least squares support vector machine-based nonlinear subspace identification method [26]. Zeng et al. established an input-output linear model for silicon content prediction using the subspace identification method and further applied this model to realize predictive control of the silicon content in BF hot metal [2]. Bhattacharya studied a silicon content prediction model based on the PLS algorithm, in which the observed input and output are decomposed into their principal components [1]. Gao et al. developed a Volterra expansion-based chaotic predictor; through a grid search over the reconstructed parameters, the chaos-based linear Volterra filter outperformed the Taylor expansion-based chaotic predictor in predicting the silicon content in BF hot metal [6]. Chen et al. developed a hybrid method in which neural networks evolved by a genetic algorithm are used to predict the silicon content in BF hot metal [5]. Besides direct regression of the silicon content, SVMs have been used to predict the change trend of the silicon content in BF hot metal [11, 12, 22]. Specifically, Gao et al. proposed a fuzzy-based support vector machine classification algorithm to determine the controllable bound of the silicon content in BF hot metal and formulated this prediction task as a multi-classification problem [11]. Nevertheless, shortcomings remain in these algorithms. For example, the learning rate of the Hammerstein model based on the Least Square Support Vector Machine (LS-SVM) nonlinear subspace identification method is difficult to determine, and the structure and size of a neural network are hard to determine. Moreover, SVM does not handle the imbalanced classification problem well directly.

To tackle the above challenges, this paper proposes a W-SVM ensemble predictor based on AdaBoost for the prediction task in the iron-making process. Some closely related works have been proposed by researchers, such as [9, 13, 15, 33], but they differ markedly in the settings of the weak classifier and the model parameters. In [15], Hu et al. proposed an ensemble classification method based on SVM and applied it to a diagnosis problem; in this method, the kernel width of the weak classifier is changed dynamically. Building on the work of Hu et al., Guo et al. modified the original AdaBoost sampling method to make it more informative for the multi-label classification problem [13]. To further improve the efficiency of the method of Hu et al., Chang et al. proposed a local diversity AdaBoost SVM (LDAB-SVM) method, which obtains more diversified SVMs as base classifiers to enhance the capability of the final classifier [33]. These three ensemble methods use SVMs with different kernels as weak learners and fix the number of iteration rounds T automatically, stopping when the weak learner's performance is less than 50%. The main difference between these works and the present one is that this research aims at the imbalanced classification problem: the weighted version of SVM (W-SVM) with a common kernel width is selected as the base classifier, and the kernel width γ, the trade-off parameter C and the iteration round T are fixed through a separate validation set. In addition, García and Lozano proposed the Boosting Support Vector Machine (BSVM), which employs the weighted SVM as the base learner and discards the less representative examples in the data set, namely the examples with small weight under the current distribution [9]. The weights of the base classifiers are optimized by an additional optimization process, which may cost additional time. The proposed method differs from BSVM in that only the weight distribution of the samples is adjusted, and the weights of the weak learners are then fixed within the AdaBoost framework without any additional computation. To the best of our knowledge, the work closest to ours is [30], which employed SMOTE to make each data batch class-balanced and then trained an AdaBoost SVM model. In short, the proposed method combines the well-known weighted SVM with the AdaBoost learning framework to deal with the imbalanced classification problem. On iteration round t + 1, the weight \( {s}_i^{t+1} \) of training sample \( {\mathrm{x}}_i \) is fixed by the classical AdaBoost framework and is then used directly to train the (t + 1)th W-SVM. To the best of our knowledge, this kind of deep fusion algorithm has not been proposed in the literature. The proposed method is therefore significantly different from the related works [9, 13, 15, 33, 40], which stay entirely within the AdaBoost framework by selecting SVM as the weak classifier and use the weight distribution of the training samples only to calculate the training error of the weak classifier.

The main contribution of this paper is a potential alternative for predicting the inner thermal state of the BF, namely the AdaBoost W-SVM predictor. The proposed predictor dynamically adjusts the weight distribution of the training samples according to the performance of the weak classifiers, and in this way effectively mines the information hidden in the samples. Furthermore, the prediction can guide operators in judging the thermal state of the BF.

The remainder of this paper is organized as follows. In Section 2, related work on the weighted support vector machine and the AdaBoost algorithm is reviewed. In Section 3, the proposed SVM ensemble predictor based on the AdaBoost algorithm is introduced. In Section 4, empirical evaluations on benchmark datasets and real-world datasets are presented to show the superiority of the proposed method. Finally, Section 5 concludes the paper.

2 Review of related methods

2.1 Weighted support vector machine

The support vector machine (SVM) is a kernel-based learning algorithm [34, 35, 41]. It constructs an optimal hyperplane in a feature space (a reproducing kernel Hilbert space) to separate two different classes [28, 31, 36]. This is realized by implicitly specifying a nonlinear mapping \( \varPhi \left(\mathrm{x}\right):{\mathcal{R}}^d\to \mathcal{H} \) and optimizing the linear decision function \( y={\omega}^{\mathrm{T}}\varPhi \left(\mathrm{x}\right)+b \), where d stands for the number of features, ω is the normal vector of the hyperplane and b is the intercept term. Mathematically, this procedure can be formulated as a quadratic program. Let the training dataset be \( T=\left\{\left({\mathrm{x}}_i,{y}_i\right)\mid i=1,2,\cdots, n\right\} \), where n is the number of training samples, \( {\mathrm{x}}_i\in {\mathcal{R}}^d \) is the input variable and \( {y}_i\in \left\{-1,1\right\} \) is the output variable. The primal problem of SVM is

$$ \underset{\omega, b,\xi }{\min}\ \frac{1}{2}\parallel \omega {\parallel}^2+C\sum \limits_{i=1}^n{\xi}_i\quad \mathrm{s}.\mathrm{t}.\quad {y}_i\left({\omega}^{\mathrm{T}}\varPhi \left({\mathrm{x}}_i\right)+b\right)\ge 1-{\xi}_i,\quad {\xi}_i\ge 0,\quad i=1,2,\cdots, n. $$
(1)

where \( {\xi}_i \) denotes the slack variable and C is the regularization parameter. By means of the Lagrange function, Eq. (1) can be transformed into the dual program

$$ \underset{\alpha }{\min}\ \frac{1}{2}\sum \limits_{i=1}^n\sum \limits_{j=1}^n{\alpha}_i{\alpha}_j{y}_i{y}_jk\left({\mathrm{x}}_i,{\mathrm{x}}_j\right)-\sum \limits_{i=1}^n{\alpha}_i\quad \mathrm{s}.\mathrm{t}.\quad \sum \limits_{i=1}^n{\alpha}_i{y}_i=0,\quad 0\le {\alpha}_i\le C,\quad i=1,2,\cdots, n. $$
(2)

Here, \( {\alpha}_i \) is the Lagrange multiplier, and the kernel trick is introduced to calculate the inner product in a high- or even infinite-dimensional feature space \( \mathcal{H} \), i.e., \( k\left({\mathrm{x}}_i,{\mathrm{x}}_j\right)={\varPhi}^{\mathrm{T}}\left({\mathrm{x}}_i\right)\varPhi \left({\mathrm{x}}_j\right) \), using the low-dimensional pattern space data without knowing the exact form of Φ.

Clearly, all training data of each class are treated uniformly by SVM. In many real applications, however, some data are more valuable than others, so the influence of different training points should differ. In this scenario the standard SVM is inadequate, because every training point is treated equally. To overcome this limitation, Lin and Wang proposed a weighted version of SVM that incorporates a weight associated with each training point [21].

Let the training set with associated sample weights be \( T=\left\{\left({\mathrm{x}}_i,{y}_i,{s}_i\right)\mid i=1,2,\cdots, n\right\} \), where the sample weight \( {s}_i \) satisfies \( 0\le {s}_i\le 1 \), i = 1, ⋯, n. The W-SVM is formulated as

$$ \underset{\omega, b,\xi }{\min}\ \frac{1}{2}\parallel \omega {\parallel}^2+C\sum \limits_{i=1}^n{s}_i{\xi}_i\quad \mathrm{s}.\mathrm{t}.\quad {y}_i\left({\omega}^{\mathrm{T}}\varPhi \left({\mathrm{x}}_i\right)+b\right)\ge 1-{\xi}_i,\quad {\xi}_i\ge 0,\quad i=1,2,\cdots, n. $$
(3)

Eq.(3) is rewritten as the corresponding dual form by means of non-negative Lagrange multipliers, i.e.,

$$ \underset{\alpha }{\min}\ \frac{1}{2}\sum \limits_{i=1}^n\sum \limits_{j=1}^n{\alpha}_i{\alpha}_j{y}_i{y}_jk\left({\mathrm{x}}_i,{\mathrm{x}}_j\right)-\sum \limits_{i=1}^n{\alpha}_i\quad \mathrm{s}.\mathrm{t}.\quad \sum \limits_{i=1}^n{\alpha}_i{y}_i=0,\quad 0\le {\alpha}_i\le {s}_iC,\quad i=1,2,\cdots, n. $$
(4)
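
To see where the per-sample box constraint in Eq. (4) comes from, the standard Lagrangian argument can be sketched as follows. The sketch assumes the W-SVM primal has the same margin constraints as Eq. (1), i.e., \( {y}_i\left({\omega}^{\mathrm{T}}\varPhi \left({\mathrm{x}}_i\right)+b\right)\ge 1-{\xi}_i \) and \( {\xi}_i\ge 0 \); the multipliers \( {\beta}_i \) attached to \( {\xi}_i\ge 0 \) are notation introduced only for this illustration.

$$ L=\frac{1}{2}\parallel \omega {\parallel}^2+C\sum \limits_{i=1}^n{s}_i{\xi}_i-\sum \limits_{i=1}^n{\alpha}_i\left[{y}_i\left({\omega}^{\mathrm{T}}\varPhi \left({\mathrm{x}}_i\right)+b\right)-1+{\xi}_i\right]-\sum \limits_{i=1}^n{\beta}_i{\xi}_i,\quad {\alpha}_i\ge 0,\ {\beta}_i\ge 0. $$

$$ \frac{\partial L}{\partial \omega }=0\Rightarrow \omega =\sum \limits_{i=1}^n{\alpha}_i{y}_i\varPhi \left({\mathrm{x}}_i\right),\quad \frac{\partial L}{\partial b}=0\Rightarrow \sum \limits_{i=1}^n{\alpha}_i{y}_i=0,\quad \frac{\partial L}{\partial {\xi}_i}=0\Rightarrow {s}_iC-{\alpha}_i-{\beta}_i=0. $$

Since \( {\beta}_i\ge 0 \), the last condition gives \( 0\le {\alpha}_i\le {s}_iC \), so a sample with a small weight can contribute only a small multiplier to the decision function. Setting \( {s}_i=1 \) for all i recovers the bound \( 0\le {\alpha}_i\le C \) of Eq. (2).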

Solving Eq.(4) can yield the optimal αi and further yield the decision function

$$ f\left(\mathrm{x}\right)=\operatorname{sgn}\ \left(\sum \limits_{i=1}^n{y}_i{\alpha}_ik\left({\mathrm{x}}_i,\mathrm{x}\right)+b\right). $$
(5)

2.2 AdaBoost algorithm

AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm formulated by Freund and Schapire [10]. It is used in conjunction with a base learning algorithm, i.e., a weak classifier, to improve the learner's performance. The individual learners can be weak, but as long as each one performs slightly better than random guessing, the final model can be proven to converge to a strong learner [10]. The outputs of the weak classifiers are combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak classifiers are tweaked in favor of those instances misclassified by previous classifiers [3]. However, it cannot handle the multi-classification problem directly; one may transform a multi-classification problem into several binary classification problems by means of the one-vs-one or one-vs-all strategy [37].

3 W-SVM ensemble algorithm based on AdaBoost

Specifically, the well-known W-SVM is selected as the weak classifier of the AdaBoost algorithm, since the weight distribution of the training samples can be incorporated efficiently into the W-SVM, i.e., Eqs. (3)–(4). Motivated by the work of [42], the following widely used weighting scheme is applied to generate the initial weight distribution of the training samples, \( {S}^1=\left\{{s}_1^1,{s}_2^1,\cdots, {s}_n^1\right\} \), where \( {s}_i^1=\frac{1}{\#{y}_i} \) and \( \#{y}_i \) is the number of samples belonging to class \( {y}_i \) in the training set. The W-SVM is then trained with the data samples D and the weight distribution \( {S}^1 \). In this way the first weak classifier is generated, denoted \( {f}_1:=\mathcal{L}\left(D,{S}^1\right) \). For each classifier \( {f}_t\left(\cdotp \right) \), its quality on the training set D is evaluated by the following criterion

$$ {\epsilon}_t={\Sigma}_{i=1}^n{s}_i^tI\left({f}_t\left({\mathrm{x}}_i\right)\mathbf{\ne}{y}_i\right) $$
(6)

where I(·) is the indicator function. The weight of classifier \( {f}_t\left(\cdotp \right) \) is then fixed by

$$ {w}_t=\frac{1}{2}\ln\ \frac{1-{\epsilon}_t}{\epsilon_t} $$
(7)

Based on wt, the weight distribution of training samples for the next W-SVM was updated by

$$ {s}_i^{t+1}={s}_i^t\exp\ \left\{-{w}_t{y}_i{f}_t\left({\mathrm{x}}_i\right)\right\}/s $$
(8)

where \( s={\Sigma}_{i=1}^n{s}_i^t\exp\ \left\{-{w}_t{y}_i{f}_t\left({\mathrm{x}}_i\right)\right\} \).
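
As a worked illustration of Eqs. (6)–(8), consider a made-up case (not taken from the experiments) with n = 4 samples, current weights \( {s}_i^t=1/4 \), and a weak classifier \( {f}_t \) that misclassifies only sample 4:

$$ {\epsilon}_t=\frac{1}{4},\quad {w}_t=\frac{1}{2}\ln \frac{1-1/4}{1/4}=\frac{1}{2}\ln 3\approx 0.549. $$

$$ {s}_i^t{e}^{-{w}_t}=\frac{0.25}{\sqrt{3}}\approx 0.144\ \left(i=1,2,3\right),\quad {s}_4^t{e}^{{w}_t}=0.25\sqrt{3}\approx 0.433,\quad s\approx 3\times 0.144+0.433=0.866. $$

$$ {s}_i^{t+1}=\frac{1}{6}\ \left(i=1,2,3\right),\quad {s}_4^{t+1}=\frac{1}{2}. $$

After normalization the single misclassified sample carries half of the total weight, so the next W-SVM is pushed to classify it correctly.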

Repeating the above procedure T times finally yields the ensemble classifier \( f\left(\mathrm{x}\right)={\Sigma}_{t=1}^T{w}_t{f}_t\left(\mathrm{x}\right) \). A schematic illustration of the W-SVM ensemble algorithm based on AdaBoost is given in Fig. 1.

Fig. 1 Schematic illustration of AdaBoost Algorithm

The proposed W-SVM ensemble algorithm based on AdaBoost is presented as follows.

figure f
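
The training loop described in this section (Eqs. 6–8 and the weighted vote) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the weak learner `train_stump` is a weighted decision stump swapped in so that the example runs with the standard library only, the initial weights are rescaled to sum to 1, and the error is clipped away from 0 and 1 to keep the logarithm finite. All function names are hypothetical. To follow the paper, `train_weak` would be replaced by a weighted SVM trainer (for instance, scikit-learn's `SVC.fit` accepts a `sample_weight` argument that rescales C per sample).

```python
import math

def class_balanced_weights(y):
    """Initial weights s_i^1 = 1/#y_i, rescaled to sum to 1 (rescaling is an assumption)."""
    counts = {label: y.count(label) for label in set(y)}
    raw = [1.0 / counts[label] for label in y]
    total = sum(raw)
    return [r / total for r in raw]

def train_stump(X, y, s):
    """Stand-in weak learner (NOT a W-SVM): weighted decision stump on one feature."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # Weighted error of the rule "predict sign if x_j >= thr, else -sign".
                err = sum(si for row, yi, si in zip(X, y, s)
                          if (sign if row[j] >= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    _, j, thr, sign = best
    return lambda row: sign if row[j] >= thr else -sign

def adaboost_fit(X, y, T, train_weak=train_stump):
    """Return a list of (w_t, f_t) pairs after T boosting rounds."""
    s = class_balanced_weights(y)
    ensemble = []
    for _ in range(T):
        f = train_weak(X, y, s)                       # f_t trained with weights S^t
        pred = [f(row) for row in X]
        eps = sum(si for si, p, yi in zip(s, pred, y) if p != yi)   # Eq. (6)
        eps = min(max(eps, 1e-10), 1 - 1e-10)         # clipping is an assumption
        w = 0.5 * math.log((1 - eps) / eps)           # Eq. (7)
        s = [si * math.exp(-w * yi * p) for si, yi, p in zip(s, y, pred)]  # Eq. (8)
        total = sum(s)
        s = [si / total for si in s]                  # normalize by s
        ensemble.append((w, f))
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the weighted vote sum_t w_t f_t(x)."""
    score = sum(w * f(row) for w, f in ensemble)
    return 1 if score >= 0 else -1

# Toy imbalanced data (hypothetical): four samples of class +1, one of class -1.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [1, 1, 1, 1, -1]
model = adaboost_fit(X, y, T=3)
predictions = [adaboost_predict(model, row) for row in X]
print(predictions)
```

Because the initial weights are class-balanced, the single minority sample starts with as much total weight as the four majority samples together, which is how the imbalance is addressed before any boosting round has run.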

3.1 Parameter selection

It is worth mentioning that the proposed ensemble W-SVM algorithm has three parameters: the trade-off parameter C, the kernel parameter, and the iteration number T. Cross validation with grid search is the natural first choice for determining the model parameters, but fixing three hyperparameters jointly in this way would be time consuming. To save time, in the current work one parameter is tuned while the other parameters are kept unchanged, and the best value found for each parameter is then recorded and combined. The same strategy is used for the compared methods. With the iteration round T, the trade-off parameter C and the kernel parameter fixed, the final ensemble classifier can be formulated as

$$ f\left(\mathrm{x}\right)=\operatorname{sgn}\left({\Sigma}_{t=1}^T{w}_t\left({\Sigma}_{i=1}^n{\alpha}_i^t{y}_ik\left({\mathrm{x}}_i,\mathrm{x}\right)+{b}_t\right)\right) $$
(9)

3.2 Computational complexity analysis

For the classical SVM and W-SVM, methods have been proposed that reduce the computational complexity to \( O\left({n}^2\right) \) for n training samples [20]. Training the W-SVM ensemble predictor therefore needs \( O\left(T{n}^2\right) \) operations for T weak classifiers on a training set with n samples. The computational complexity of the W-SVM ensemble predictor may be reduced significantly by incorporating the sampling strategy proposed by García and Lozano [9], which is beyond the scope of this paper and is left for future investigation.

4 Empirical analysis

In this section, an experiment is first presented that compares the proposed ensemble method with six well-known algorithms, i.e., SVDD [32], SVM [4], W-SVM [21], IOLSSVM [29], ELM [16] and OSELM [14], on five benchmark datasets. Experiments on two real-world BF datasets are then conducted to assess the performance of the proposed method. All experiments are performed in the MATLAB 7.14 environment on a single computer with a 3.4 GHz Intel Core i7 processor and 32 GB of RAM.

4.1 Evaluation criteria

Classification accuracy is the most frequently used evaluation metric for classification problems. However, it is not suitable for imbalanced classification problems. For example, in an imbalanced binary classification problem in which 99% and 1% of the samples belong to the two classes, respectively, a naive learner easily achieves 99% accuracy by assigning all samples to the majority class. In this case, the accuracy on the minority class is 0%, which may not be acceptable in practice. Thus, the widely used G-means metric, which is the geometric mean of the per-class accuracies, was selected as the evaluation metric [28]. G-means for the binary classification problem is expressed as

$$ \mathrm{G}\text{-}\mathrm{means}=\sqrt{TPR\times TNR} $$
(10)

where \( TPR=\frac{TP}{TP+ FN} \) and \( TNR=\frac{TN}{TN+ FP} \) measure the proportions of actual positives and negatives that are correctly identified, and TP, TN, FP, and FN stand for the numbers of true positives, true negatives, false positives and false negatives, respectively.
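
The 99%/1% example above can be checked numerically. The following sketch computes accuracy and G-means (Eq. 10) for the naive majority-class predictor; the function name `g_means` and the label convention (+1 positive, −1 negative) are assumptions made for this illustration, and an empty class is scored as 0 by choice.

```python
import math

def g_means(y_true, y_pred, positive=1, negative=-1):
    """Geometric mean of the true positive rate and the true negative rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p == negative)
    tn = sum(1 for t, p in pairs if t == negative and p == negative)
    fp = sum(1 for t, p in pairs if t == negative and p == positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0   # empty class scored as 0 (assumption)
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)

# 99 majority samples (+1), 1 minority sample (-1); the naive learner predicts +1 always.
y_true = [1] * 99 + [-1]
y_pred = [1] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = g_means(y_true, y_pred)
print(accuracy, score)
```

The accuracy is 0.99 while G-means is 0, because the true negative rate is 0; a classifier can only score well on G-means if it performs well on both classes.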

4.2 Evaluating the performance on benchmark datasets

Firstly, the performance of the proposed AdaBoost W-SVM predictor was evaluated on a number of benchmark datasets. All the datasets can be obtained from the LIBSVM website [4]. Details of the datasets are listed in Table 1.

Table 1 Details of Benchmark Datasets

All of the benchmark datasets are split into three parts: a training set, a validation set and a test set. The training set and the validation set contain 300 and 200 samples, respectively, and the remaining samples of each dataset form the test set. The validation set is used for the selection of the three parameters γ, C and T. In the current work, the frequently used RBF kernel \( k\left({\mathrm{x}}_i,{\mathrm{x}}_j\right)=\exp \left(-\gamma \parallel {\mathrm{x}}_i-{\mathrm{x}}_j{\parallel}^2\right) \) is selected as the kernel function for each weak classifier, i.e., W-SVM. The proposed AdaBoost W-SVM method thus has three parameters: the kernel width γ, the hyperparameter C and the iteration number T. Specifically, the kernel width γ is tuned over \( \left\{\frac{1}{5d}:\frac{1}{5d}:\frac{2}{d}\right\} \), where d is the number of features, and the hyperparameter C and the iteration number T are tuned over {100 : 100 : 2000} and {5 : 1 : 20}, respectively (in start : step : end notation). In Fig. 2, the subfigure for γ is obtained with the other parameters C and T fixed at 1000 and 10, respectively; it shows that the optimal γ is 0.1. For the subfigure of C, T is fixed at 10 and γ is set to its optimum (γ = 0.1). For the subfigure of T, γ and C are set to their optima (γ = 0.1, C = 1600).
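
The one-parameter-at-a-time tuning described above can be sketched as follows. The grids follow the ranges stated in the text; the starting values C = 1000 and T = 10 are those quoted for Fig. 2. The callable `score` is hypothetical: it stands for "train with (γ, C, T) and return G-means on the validation set", and `toy_score` below is a made-up surrogate used only so that the sketch runs.

```python
def make_grids(d):
    """Grids from the text: gamma in {1/(5d) : 1/(5d) : 2/d}, C in {100:100:2000}, T in {5:1:20}."""
    gammas = [k / (5 * d) for k in range(1, 11)]
    Cs = list(range(100, 2001, 100))
    Ts = list(range(5, 21))
    return gammas, Cs, Ts

def sequential_search(score, d, C0=1000, T0=10):
    """Tune gamma, then C, then T, each with the other two held fixed."""
    gammas, Cs, Ts = make_grids(d)
    best_gamma = max(gammas, key=lambda g: score(g, C0, T0))
    best_C = max(Cs, key=lambda C: score(best_gamma, C, T0))
    best_T = max(Ts, key=lambda T: score(best_gamma, best_C, T))
    return best_gamma, best_C, best_T

def toy_score(gamma, C, T):
    """Hypothetical stand-in for validation G-means, peaked at (0.1, 1600, 12)."""
    return -(abs(gamma - 0.1) + abs(C - 1600) / 1000 + abs(T - 12) / 10)

gammas, Cs, Ts = make_grids(d=14)
best = sequential_search(toy_score, d=14)
evaluations_sequential = len(gammas) + len(Cs) + len(Ts)
evaluations_full_grid = len(gammas) * len(Cs) * len(Ts)
print(best, evaluations_sequential, evaluations_full_grid)
```

With these grids the sequential strategy needs 10 + 20 + 16 = 46 model fits instead of 10 × 20 × 16 = 3200 for a full grid, at the price of possibly missing interactions between the parameters.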

Fig. 2 G-mean of AdaBoost W-SVM on Eegeye dataset with respect to different γ, C and T

It is important to note that a similar parameter optimization process was carried out for the compared methods. The hyperparameters γ and C in SVDD, SVM, W-SVM and IOLSSVM and the number of hidden neurons in ELM and OSELM are likewise fixed in advance on the validation set. With the optimal setting of the model parameters, the performances are compared on the test set. As shown in Fig. 3, AdaBoost W-SVM outperforms all baseline methods on three of the five datasets, i.e., Pima, Ijcnn, and German. On the Australian dataset the proposed method shows performance comparable with the best one (i.e., W-SVM), and on Eegeye its performance is comparable to the best one (i.e., IOLSSVM).

Fig. 3 Comparison results of SVDD, SVM, W-SVM, ELM and AdaBoost W-SVM on benchmark datasets

To provide a more comprehensive comparison, the frequently used Statlog ordering method [17] was selected to rank the compared algorithms over multiple datasets. The ranking in Table 2 shows that the proposed AdaBoost W-SVM compares favorably with the other baseline methods, i.e., SVDD, SVM, W-SVM, IOLSSVM, ELM and OSELM, in most cases. SVDD performs worst on the Pima, Eegeye, and Australian datasets. Table 1 shows that the ratios #Positive/#Negative of these three datasets are all around 1, so we roughly conclude that SVDD is not a good candidate for the general classification problem.

Table 2 Comprehensive Rank of Methods

To further validate the conclusions, Demšar's method [8] was applied, which provides a set of non-parametric statistical tests for comparing different methods. First, the Friedman test was used to measure whether there is a significant difference between the methods. Given k methods and N datasets, let \( {r}_i^j \) denote the rank of the j-th method on the i-th dataset. The Friedman test compares the average ranks of the methods, i.e., \( {R}_j=\frac{1}{N}{\Sigma}_{i=1}^N{r}_i^j \). Under the null hypothesis (all methods are equivalent, so their average ranks \( {R}_j \) are equal), Iman and Davenport showed that the statistic \( {F}_F=\frac{\left(N-1\right){\chi}_F^2}{N\left(k-1\right)-{\chi}_F^2} \), where \( {\chi}_F^2=\frac{12N}{k\left(k+1\right)}\left[{\Sigma}_j{R}_j^2-\frac{k{\left(k+1\right)}^2}{4}\right] \), is distributed according to the F-distribution with k − 1 and (k − 1)(N − 1) degrees of freedom. According to Table 2, the statistic \( {F}_F \) is 6 and the corresponding critical value \( {F}_{\alpha}\left(6,24\right) \) is 2.51 (α = 0.05). The Friedman test at the 0.05 significance level (with seven methods and five datasets) therefore rejects the null hypothesis that “all methods are equivalent.” In addition, a post-hoc test was applied for further comparison between the methods. In this research, the Nemenyi test [25] was adopted to check whether the proposed method is significantly different from the state-of-the-art methods, by checking whether the corresponding average ranks differ by at least the critical difference \( \mathrm{CD}={q}_{\alpha}\sqrt{\frac{k\left(k+1\right)}{6N}} \). The critical value for the two-tailed Nemenyi test over seven compared methods is \( {q}_{0.05}=2.95 \) and the corresponding CD is 3.89. Figure 4 shows the CD diagram for the G-mean criterion, where the average rank of each compared method is marked along the axis (lower ranks to the right). Any compared method whose average rank is within one CD of that of AdaBoost W-SVM is connected to it with a thick line. In contrast, any method not connected with AdaBoost W-SVM is deemed to differ significantly from it.
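
The statistics used in this comparison can be computed directly from a rank matrix. The sketch below implements the formulas for \( {R}_j \), \( {\chi}_F^2 \), \( {F}_F \) and CD as written above. The rank matrix is made up for illustration (three methods on four datasets) and does not reproduce Table 2; the value \( {q}_{0.05}=2.343 \) for three methods is taken from the table in Demšar's paper, and the function names are hypothetical.

```python
import math

def friedman_iman_davenport(ranks):
    """ranks[i][j] is the rank of method j on dataset i (1 = best).

    Returns the average ranks R_j, the Friedman chi-square and the
    Iman-Davenport F statistic. F is undefined when chi2 == N * (k - 1).
    """
    N = len(ranks)
    k = len(ranks[0])
    avg = [sum(row[j] for row in ranks) / N for j in range(k)]
    chi2 = 12 * N / (k * (k + 1)) * (sum(R * R for R in avg) - k * (k + 1) ** 2 / 4)
    f_stat = (N - 1) * chi2 / (N * (k - 1) - chi2)
    return avg, chi2, f_stat

def critical_difference(q_alpha, k, N):
    """Nemenyi critical difference for k methods and N datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * N))

# Hypothetical ranks of 3 methods on 4 datasets.
toy_ranks = [
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
    [1, 2, 3],
]
avg_ranks, chi2, f_stat = friedman_iman_davenport(toy_ranks)
cd = critical_difference(q_alpha=2.343, k=3, N=4)
# Pairs of methods whose average ranks differ by at least CD are significantly different.
significant = [(a, b) for a in range(3) for b in range(a + 1, 3)
               if abs(avg_ranks[a] - avg_ranks[b]) >= cd]
print(avg_ranks, chi2, f_stat, cd, significant)
```

The computed \( {F}_F \) is then compared with the critical value of the F-distribution with k − 1 and (k − 1)(N − 1) degrees of freedom, which has to be looked up separately.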

Fig. 4 Comparison of AdaBoost W-SVM (control method) against other methods with the Nemenyi test

As illustrated in Fig. 4, the SVDD method is not connected with AdaBoost W-SVM, so it is considered to be significantly different (α = 0.05) from the control algorithm.

4.3 Thermal state-prediction of blast furnace

In this section, the empirical performance of the proposed W-SVM ensemble algorithm is evaluated on real datasets sampled from two BFs, i.e., No. 6 BF at Baotou Iron and Steel Group Co. of China and No. 1 BF at Laiwu Iron and Steel Group Co. of China, labeled BF(a) and BF(b), respectively [19]. BF(a) is a medium-sized blast furnace with an inner volume of about 2500 m³, and BF(b) is a small (pint-sized) blast furnace with an inner volume of about 750 m³. These two BFs may be representative of medium and pint-sized blast furnaces, for which constructing reliable models poses a great challenge. For BF(a), the candidate variables include blast volume, feed speed, utilization coefficient, etc.; in total, 20 variables are collected by the data acquisition module, preprocessed by the data processing module of the Blast Furnace Expert System, and selected as the model inputs. For BF(b), only six relevant variables are collected because of the weak measurement conditions in this blast furnace.

There are 839 data samples observed from BF(a) and 800 data samples from BF(b). The sampling interval is about 1.5 h for BF(a) and about 2 h for BF(b). Fig. 5 shows the time series of the silicon content and the blast volume from the two BFs. The magnitudes of these two variables are clearly very different. Because a variable with a large magnitude has a stronger effect on the model than one with a small magnitude, the importance of large-magnitude variables may be overestimated. To overcome this issue, the input data are transformed to a common scale by

$$ {x}_i^j=\frac{u_i^j-\mu \left({u}^j\right)}{\delta \left({u}^j\right)},\quad i=1,2,\dots, n;\ j=1,2,\dots, d, $$
(11)

where \( {u}_i^j \) stands for the jth input variable of the ith sample, \( \mu \left({u}^j\right) \) denotes the mean of the jth variable, and \( \delta \left({u}^j\right) \) stands for the standard deviation of the jth variable.
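
Eq. (11) is a column-wise z-score, which can be sketched as below. The text does not say whether the standard deviation uses the n or the n − 1 denominator; the sketch assumes the sample standard deviation (n − 1). The numbers are invented values on the scale of a blast volume and a silicon content, and the function name is hypothetical.

```python
import statistics

def standardize(U):
    """Column-wise z-score of a list of rows, following Eq. (11).

    Uses the sample standard deviation (n - 1 denominator), which is an
    assumption; constant columns are mapped to 0 to avoid division by zero.
    """
    columns = list(zip(*U))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    return [[(u - m) / sd if sd > 0 else 0.0
             for u, m, sd in zip(row, means, stds)]
            for row in U]

# Hypothetical raw inputs: [blast volume, silicon content] per sample.
U = [
    [1200.0, 0.45],
    [1300.0, 0.55],
    [1250.0, 0.60],
    [1350.0, 0.40],
]
X_std = standardize(U)
for row in X_std:
    print(row)
```

After the transformation both columns have zero mean and unit standard deviation, so neither variable dominates the RBF kernel distance merely because of its unit.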

Fig. 5 Time series of silicon content in hot metal and blast volume

In the current experiment, the two sampled datasets are each divided into three separate parts, i.e., a training set, a validation set, and a test set. For both BFs, 300 data samples are used for model training, 200 data samples form the validation set, and the remainder are used for testing.

When predicting the thermal state of the BF, the proposed predictor goes through three key phases. (1) Cluster the silicon content in BF hot metal by k-means into low, normal and high states; the low and high states are merged into one state named abnormal, so that the original series is divided into two categories, normal and abnormal. (2) Select W-SVM as the weak classifier and perform the binary classification task on the imbalanced problem. (3) Employ the AdaBoost algorithm to adjust the weights of the training samples dynamically and fix the weights of the weak classifiers according to their accuracy, then output the final predictor by combining the weak classifiers. To design the binary W-SVM classifier, the expected control bound of the silicon content in BF hot metal must be determined from the collected data. Motivated by Luo et al. [23], data clustering, i.e., k-means, is selected for this task. In the current work, the control bounds of the hot metal silicon content are fixed as [0.39, 0.84] and [0.32, 0.61] for BF(a) and BF(b), respectively, each being the average of the results of ten random experiments. Figure 6 and Table 3 show the distribution of normal and abnormal states for BF(a) and BF(b) according to k-means. The ratio of normal to abnormal samples is about 4 : 1, so the corresponding binary classification problem is a typical imbalanced classification problem. In this research, the abnormal state is labeled as −1 and the normal state as 1.
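
Phase (1) above, turning a silicon content series into normal/abnormal labels, can be sketched with a one-dimensional k-means. This is an illustration under several assumptions that the text does not confirm: the centers are initialized deterministically at the minimum, median and maximum (the paper averages over ten random runs instead), the middle cluster is taken as the normal state, and the control bound is read off as the range of that cluster. The silicon values and function names are invented.

```python
def kmeans_1d(values, iters=100):
    """Lloyd's algorithm with three clusters on scalar data.

    Centers start at the minimum, median and maximum (deterministic
    initialization is an assumption made for reproducibility).
    """
    ordered = sorted(values)
    centers = [ordered[0], ordered[len(ordered) // 2], ordered[-1]]
    for _ in range(iters):
        clusters = [[], [], []]
        for v in values:
            idx = min(range(3), key=lambda c: abs(v - centers[c]))
            clusters[idx].append(v)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def label_thermal_state(values, centers):
    """Label the middle cluster as normal (+1), low and high as abnormal (-1)."""
    labels, normal = [], []
    for v in values:
        idx = min(range(3), key=lambda c: abs(v - centers[c]))
        labels.append(1 if idx == 1 else -1)
        if idx == 1:
            normal.append(v)
    bound = (min(normal), max(normal))   # control bound as cluster range (assumption)
    return labels, bound

# Hypothetical silicon content series (wt%).
silicon = [0.50, 0.22, 0.95, 0.45, 0.60, 0.20, 1.00, 0.52, 0.25, 0.48, 1.05, 0.55]
centers = kmeans_1d(silicon)
labels, bound = label_thermal_state(silicon, centers)
print(centers, bound, labels)
```

The resulting ±1 labels are what the W-SVM weak classifiers in phases (2) and (3) would be trained on.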

Fig. 6 The distribution of normal and abnormal status by k-means

Table 3 Categories Distribution of BFs

In the current experiment, the frequently used RBF kernel \( k\left({\mathrm{x}}_i,{\mathrm{x}}_j\right)=\exp \left(-\gamma \parallel {\mathrm{x}}_i-{\mathrm{x}}_j{\parallel}^2\right) \) is selected as the kernel function for the proposed AdaBoost W-SVM method and for the baseline methods SVDD, SVM, W-SVM and IOLSSVM. The sigmoid function is selected as the activation function in ELM and OSELM. The kernel width γ, the hyperparameter C and the iteration number T are optimized sequentially on the validation set as described in Section 4.2.

It is found that for BF(a) (γ, C, T) = (0.23,1400,10) achieves the best G-means on the validation set. Table 4 lists the weights of weak classifiers for BF(a).

Table 4 Weights of Weak Classifiers for BF(a)

The same strategy is also applied to BF(b) and (γ, C, T) is set as (0.0667,800,10). In Table 5, the weights of weak classifiers are listed.

Table 5 Weights of Weak Classifiers for BF(b)

The detailed experimental results are given in Tables 6 and 7, where the G-means, TPR, and TNR of the proposed W-SVM ensemble predictor are compared with those of six methods: SVDD, SVM, W-SVM, IOLSSVM, ELM and OSELM. The hyperparameters (γ, C) of SVDD, SVM, W-SVM and IOLSSVM and the number of hidden neurons of ELM and OSELM were tuned on the validation set. For BF(a), the best (γ, C) is (0.02, 0.1) for SVDD (note that C in SVDD is tuned over {0.001, 0.002, 0.01, 0.02, 0.1, 0.2, 1}), (0.05, 1800) for SVM, (0.03, 900) for W-SVM, and (16, 300) for IOLSSVM, and the optimal number of hidden neurons is 50 for ELM and 80 for OSELM. For BF(b), the best (γ, C) is (0.3, 0.01) for SVDD, (0.2, 1300) for SVM, (0.17, 900) for W-SVM, and (9.6, 100) for IOLSSVM, while the optimal number of hidden neurons is 100 for both ELM and OSELM. These parameters, estimated on the validation set, are then applied on the test set. The G-means of the W-SVM ensemble predictor is superior to that of the baseline methods: for BF(a) it achieves at least a 1.5% improvement, while for BF(b) the improvement is even greater and reaches 3.2%. The results suggest that the proposed W-SVM ensemble has stronger adaptability in practical applications and more flexibility in solving imbalanced classification problems. It is also worth mentioning that the size of the BF influences the classification performance: the medium-sized BF yields better performance than the pint-sized one, which indicates that the silicon content is more difficult to predict for a pint-sized furnace. However, the TNR on BF(b) is still not high enough and needs further investigation. The running time column shows that the classical SVDD, SVM, W-SVM, and ELM are extremely efficient. The proposed AdaBoost W-SVM costs more running time than the compared methods; however, since the blast furnace tapping interval is about 1.5–2 h, the AdaBoost W-SVM method is still efficient enough for the current BF thermal state prediction task.

Table 6 The Classification Prediction Results of Thermal State in BF(a)
Table 7 The Classification Prediction Results of Thermal State in BF(b)

5 Conclusion

In this paper, a W-SVM ensemble predictor based on the AdaBoost algorithm is proposed to address the thermal state prediction problem of the blast furnace, which is formulated as an imbalanced binary classification problem. To tackle the imbalance, the weighted SVM is selected as the weak classifier and the AdaBoost algorithm is employed to automatically determine the weight distribution of the training samples as well as the weight of each weighted SVM. Experimental results on benchmark datasets and real-world datasets demonstrate the efficacy of the proposed method. Compared with the classical SVDD, SVM, W-SVM, IOLSSVM, ELM and OSELM, the proposed predictor yields an improvement in terms of G-means. The prediction result may serve as a guideline for controlling the complex blast furnace iron-making process.