A weighted SVM ensemble predictor based on AdaBoost for blast furnace Ironmaking process

Luo, Shihua; Dai, Zian; Chen, Tianxin; Chen, Hongyi; Jian, Ling

doi:10.1007/s10489-020-01662-y


Published: 22 February 2020

Volume 50, pages 1997–2008, (2020)

Applied Intelligence


Shihua Luo¹,
Zian Dai¹,
Tianxin Chen¹,
Hongyi Chen² &
…
Ling Jian ORCID: orcid.org/0000-0002-9385-5977³


Abstract

As one of the most complex industrial reactors, the blast furnace (BF) still poses some urgent issues, such as BF automation and prediction of the inner thermal state. In this work, the prediction of the BF inner thermal state, which is represented by the silicon content in BF hot metal, is treated as an imbalanced binary classification problem, and a Weighted Support Vector Machine (W-SVM) ensemble predictor based on AdaBoost is presented for the prediction task. Compared with the traditional W-SVM algorithm, the proposed predictor dynamically adjusts the weight distribution of the training samples according to the performance of the weak classifiers, and in this way mines the information hidden in the samples. The prediction can guide operators in judging the thermal state of the BF in time. Experimental results on five benchmark datasets and two real-world BF datasets demonstrate the efficiency of the proposed W-SVM ensemble predictor.


1 Introduction

The blast furnace (BF) is a high-energy-consuming industrial reactor that is extensively used to produce pig iron, often called hot metal. The core duty of a BF operator is to keep the furnace running smoothly and stably and thereby produce hot metal of high quality at low cost. To this end, the thermal state of the BF must be kept in an optimal range. In practice, however, direct measurements of the thermal state are not available because of the hostile conditions inside the furnace, such as high temperature and high pressure [27]. The thermal state is therefore evaluated through thermal indicators such as the silicon content, the phosphorus content, and the sulfur content. Among them, the silicon content in BF hot metal (Si), the most important indicator of the inner thermal state of the furnace, needs to be controlled strictly within proper bounds to produce iron of high quality at low cost. Achieving this requires accurate prediction of the silicon content to help the BF operator take action. For this reason, prediction of the silicon content in BF hot metal has attracted considerable research interest, and data-driven modeling in particular has been investigated intensively [1, 2, 5, 6, 7, 11, 12, 18, 22, 24, 26, 38, 39]. Zhou et al. proposed a Hammerstein model for the prediction of the silicon content in BF using the least squares support vector machine-based nonlinear subspace identification method [26]. Zeng et al. established an input-output linear model for silicon content prediction using the subspace identification method and further applied this model to realize the predictive control of silicon content in BF hot metal [2]. Bhattacharya studied a silicon content prediction model based on the PLS algorithm, in which the observed inputs and outputs are decomposed into their principal components [1]. Gao et al. developed a Volterra expansion-based chaotic predictor. 
Through a grid search over the reconstructed parameters, a chaos-based linear Volterra filter completely outperformed the Taylor expansion-based chaotic predictor in predicting the silicon content in BF hot metal [6]. Chen et al. developed a hybrid method in which neural networks evolved by a genetic algorithm are used to predict the silicon content in BF hot metal [5]. Besides direct regression of the silicon content, SVMs have been used to predict the change trend of the silicon content in BF hot metal [11, 12, 22]. Specifically, Gao et al. proposed a fuzzy-based support vector machine classification algorithm to determine the controllable bound of the silicon content in BF hot metal and further formulated this prediction task as a multi-class classification problem [11]. Nevertheless, these algorithms have some shortcomings. For example, the learning rate of the Hammerstein model based on the Least Squares Support Vector Machine (LS-SVM) nonlinear subspace identification method is difficult to determine, and the structure and size of a neural network are hard to determine. Moreover, SVM is not well suited to handling imbalanced classification problems directly.

To tackle the above challenges, this paper proposes a W-SVM ensemble predictor based on AdaBoost for the prediction task of the ironmaking process. Some closely related works have been proposed, such as [9, 13, 15, 33]. However, they differ markedly from ours in the settings of the weak classifier and the model parameters. In [15], Hu et al. proposed an ensemble classification method based on SVM and applied it to a diagnosis problem; in this method, the kernel width of the weak classifier is changed dynamically. To solve the multi-label classification problem, Guo et al. built on the work of Hu et al. and modified the original AdaBoost sampling method to make it more informative [13]. To further improve the efficiency of the method of Hu et al., Chang et al. proposed a local diversity AdaBoost SVM (LDAB-SVM) method, which obtains more diversified SVMs as base classifiers to enhance the capability of the final classifier [33]. These three ensemble methods use SVMs with different kernels as weak learners and fix the number of iteration rounds T automatically, stopping when the weak learner's performance falls below 50%. The main difference between these works and ours is that we aim to solve the imbalanced classification problem: the weighted version of SVM (W-SVM) with a single kernel width is selected as the base classifier, and the kernel width γ, the trade-off parameter C and the iteration round T are fixed through a separate validation set. In addition, García and Lozano proposed the Boosting Support Vector Machine (BSVM), which employs the weighted SVM as the base learner and discards the less representative examples in the data set, namely the examples with small weight under the current distribution [9]. The weights of the base classifiers are determined by an additional optimization process, which may cost additional time. 
The proposed method differs from BSVM in that only the weight distribution of the samples is adjusted, and the weights of the weak learners are then fixed within the AdaBoost framework without any additional computation. To the best of our knowledge, the work closest to ours is [30], which employed SMOTE to make each data batch class-balanced and then trained an AdaBoost SVM model. All in all, the proposed method combines the well-known weighted SVM with the AdaBoost learning framework to deal with the imbalanced classification problem. In the (t+1)-th iteration round, the weight of training sample x_i, i.e., $ {s}_i^{t+1} $, is fixed by the classical AdaBoost framework and then used directly to train the (t+1)-th W-SVM. To the best of our knowledge, this kind of deep fusion algorithm has not been proposed in the literature. The proposed method is thus significantly different from the related works [9, 13, 15, 33, 40], which work entirely within the AdaBoost framework by selecting SVM as the weak classifier and use the weight distribution of the training samples only superficially, to calculate the training error of the weak classifier.

The main contribution of this paper is to provide a potential alternative for predicting the inner thermal state of the BF, namely the AdaBoost W-SVM predictor. The proposed predictor dynamically adjusts the weight distribution of the training samples according to the performance of the weak classifiers, and in this way effectively mines the information hidden in the samples. Furthermore, the prediction can guide operators in judging the thermal state of the BF.

The remainder of this paper is organized as follows. In Section 2, the related work on the Weighted Support Vector Machine and the AdaBoost algorithm is reviewed. In Section 3, the proposed SVM ensemble predictor based on the AdaBoost algorithm is introduced. In Section 4, empirical evaluations on benchmark datasets and real-world datasets are presented to show the superiority of the proposed method. Finally, Section 5 presents the conclusion of this paper.

2 Review of related methods

2.1 Weighted support vector machine

Support Vector Machine (SVM) is a kernel-based learning algorithm [34, 35, 41]. It works by constructing an optimal hyperplane in a feature space (a Reproducing Kernel Hilbert Space) to separate two classes [28, 31, 36]. This is realized by implicitly specifying a nonlinear mapping $ \varPhi \left(\mathrm{x}\right):{\mathcal{R}}^d\to \mathcal{H} $ and optimizing the linear decision function y = ω^TΦ(x) + b, where d is the number of features, ω is the normal vector of the hyperplane and b is the intercept term. Mathematically, this procedure can be formulated as a quadratic programming problem. Let the training dataset be T = {(x_i, y_i)| i = 1, 2, ⋯, n}, where n is the number of training samples, $ {\mathrm{x}}_i\in {\mathcal{R}}^d $ is the input variable and y_i ∈ {−1, 1} is the output variable. The primal problem of SVM is

$$ \underset{\omega, b,\xi }{\min}\ \frac{1}{2}\parallel \omega {\parallel}^2+C\sum \limits_{i=1}^n{\xi}_i\quad \mathrm{s.t.}\quad {y}_i\left({\omega}^{\mathrm{T}}\varPhi \left({\mathrm{x}}_i\right)+b\right)\ge 1-{\xi}_i,\ {\xi}_i\ge 0,\ i=1,2,\cdots, n. $$

(1)

where ξ_i denotes the slack variable and C is the regularization parameter. By means of the Lagrange function, Eq.(1) can be transformed into the dual program

$$ \underset{\alpha }{\min}\ \frac{1}{2}\sum \limits_{i=1}^n\sum \limits_{j=1}^n{\alpha}_i{\alpha}_j{y}_i{y}_jk\left({\mathrm{x}}_i,{\mathrm{x}}_j\right)-\sum \limits_{i=1}^n{\alpha}_i\quad \mathrm{s.t.}\quad \sum \limits_{i=1}^n{\alpha}_i{y}_i=0,\ 0\le {\alpha}_i\le C,\ i=1,2,\cdots, n. $$

(2)

Here, α_i is the Lagrange multiplier, and the kernel trick is introduced to calculate the inner product in a high- or even infinite-dimensional feature space $ \mathcal{H} $, i.e., k(x_i, x_j) = Φ^T(x_i)Φ(x_j), using the low-dimensional pattern space data without knowing the exact form of Φ.

It is clear that SVM treats all training data of each class uniformly. In many real applications, however, some data are more valuable than others, so different training points should influence the model to different degrees. In this scenario the standard SVM is inadequate, because every training point is treated equally. To overcome this limitation, Lin and Wang proposed a weighted version of SVM that incorporates a weight associated with each training point [21].

Consider the training set with associated sample weights, T = {(x_i, y_i, s_i)| i = 1, 2, ⋯, n}, where the sample weight s_i satisfies 0 ≤ s_i ≤ 1, i = 1, ⋯, n. The W-SVM is formulated as

$$ \underset{\omega, b,\xi }{\min}\ \frac{1}{2}\parallel \omega {\parallel}^2+C\sum \limits_{i=1}^n{s}_i{\xi}_i\quad \mathrm{s.t.}\quad {y}_i\left({\omega}^{\mathrm{T}}\varPhi \left({\mathrm{x}}_i\right)+b\right)\ge 1-{\xi}_i,\ {\xi}_i\ge 0,\ i=1,2,\cdots, n. $$

(3)

Eq.(3) is rewritten as the corresponding dual form by means of non-negative Lagrange multipliers, i.e.,

$$ \underset{\alpha }{\min}\ \frac{1}{2}\sum \limits_{i=1}^n\sum \limits_{j=1}^n{\alpha}_i{\alpha}_j{y}_i{y}_jk\left({\mathrm{x}}_i,{\mathrm{x}}_j\right)-\sum \limits_{i=1}^n{\alpha}_i\quad \mathrm{s.t.}\quad \sum \limits_{i=1}^n{\alpha}_i{y}_i=0,\ 0\le {\alpha}_i\le {s}_iC,\ i=1,2,\cdots, n. $$

(4)
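As a brief sketch of why the sample weight ends up in the box constraint of the dual, consider the Lagrangian of the weighted primal problem with margin constraints y_i(ω^TΦ(x_i) + b) ≥ 1 − ξ_i and ξ_i ≥ 0. The multipliers β_i ≥ 0 for the constraints ξ_i ≥ 0 are a symbol introduced here for illustration only:

$$ L=\frac{1}{2}\parallel \omega {\parallel}^2+C\sum \limits_{i=1}^n{s}_i{\xi}_i-\sum \limits_{i=1}^n{\alpha}_i\left[{y}_i\left({\omega}^{\mathrm{T}}\varPhi \left({\mathrm{x}}_i\right)+b\right)-1+{\xi}_i\right]-\sum \limits_{i=1}^n{\beta}_i{\xi}_i $$

$$ \frac{\partial L}{\partial \omega }=0\Rightarrow \omega =\sum \limits_{i=1}^n{\alpha}_i{y}_i\varPhi \left({\mathrm{x}}_i\right),\quad \frac{\partial L}{\partial b}=0\Rightarrow \sum \limits_{i=1}^n{\alpha}_i{y}_i=0,\quad \frac{\partial L}{\partial {\xi}_i}=0\Rightarrow {s}_iC-{\alpha}_i-{\beta}_i=0 $$

Since β_i ≥ 0, the last condition gives 0 ≤ α_i ≤ s_iC, so a sample with a small weight s_i can contribute only a small multiplier. Substituting the expression for ω back into L yields the objective of Eq.(4).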

Solving Eq.(4) yields the optimal α_i and hence the decision function

$$ f\left(\mathrm{x}\right)=\operatorname{sgn}\ \left(\sum \limits_{i=1}^n{y}_i{\alpha}_ik\left({\mathrm{x}}_i,\mathrm{x}\right)+b\right). $$

(5)
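To make Eq.(5) concrete, the following minimal sketch evaluates the decision function for given multipliers. It assumes the RBF kernel that is adopted later in the experiments; the support points, the multipliers `alpha` and the intercept `b` are hand-picked hypothetical values, not the result of solving Eq.(4), and the convention sgn(0) = 1 is an assumption.

```python
import math

def rbf_kernel(u, v, gamma):
    """RBF kernel k(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def wsvm_decision(x, support_x, support_y, alpha, b, gamma):
    """Evaluate Eq.(5): sgn(sum_i y_i * alpha_i * k(x_i, x) + b)."""
    score = sum(
        y_i * a_i * rbf_kernel(x_i, x, gamma)
        for x_i, y_i, a_i in zip(support_x, support_y, alpha)
    ) + b
    return 1 if score >= 0 else -1  # assumption: ties are mapped to +1

# Hypothetical values chosen for illustration (they satisfy sum_i alpha_i * y_i = 0).
support_x = [(0.0, 0.0), (2.0, 2.0)]
support_y = [-1, 1]
alpha = [1.0, 1.0]

near_positive = wsvm_decision((1.8, 1.9), support_x, support_y, alpha, b=0.0, gamma=0.5)
near_negative = wsvm_decision((0.1, 0.2), support_x, support_y, alpha, b=0.0, gamma=0.5)
print(near_positive, near_negative)  # 1 -1
```

A query point is assigned to the class of the support points whose kernel-weighted contribution dominates, which is why only samples with non-zero α_i matter at prediction time.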

2.2 AdaBoost algorithm

AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm formulated by Freund and Schapire [10]. It is used in conjunction with a base learning algorithm, i.e., a weak classifier, to improve the learner's performance. The individual learners can be weak, but as long as each performs slightly better than random guessing, the final model can be proven to converge to a strong learner [10]. The outputs of the weak classifiers are combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak classifiers are tweaked in favor of the instances misclassified by previous classifiers [3]. However, it cannot handle multi-class classification problems directly. One may transform a multi-class problem into several binary classification problems by means of one-vs-one or one-vs-all strategies [37].

3 W-SVM ensemble algorithm based on AdaBoost

Specifically, the well-known W-SVM is selected as the weak classifier of the AdaBoost algorithm, since the weight distribution of the training samples can be incorporated directly into W-SVM through Eqs.(3–4). Motivated by the work of [42], the following widely used weighting scheme is applied to generate the initial weight distribution of the training samples, $ {S}^1=\left\{{s}_1^1,{s}_2^1,\cdots, {s}_n^1\right\} $, where $ {s}_i^1=\frac{1}{\#{y}_i} $ and #y_i is the number of samples belonging to class y_i in the training set. The W-SVM is then trained with the data samples D and the weight distribution S¹. In this way, the first weak classifier is generated, denoted f₁ ≔ ℒ(D, S¹), where ℒ denotes the base learning algorithm. For each classifier f_t(·), its quality on the training set D is evaluated by the following criterion

$$ {\epsilon}_t=\sum \limits_{i=1}^n{s}_i^tI\left({f}_t\left({\mathrm{x}}_i\right)\ne {y}_i\right) $$

(6)

where I(·) is the indicator function. The weight of classifier f_t(·) is then fixed by

$$ {w}_t=\frac{1}{2}\ln\ \frac{1-{\epsilon}_t}{\epsilon_t} $$

(7)

Based on w_t, the weight distribution of the training samples for the next W-SVM is updated by

$$ {s}_i^{t+1}={s}_i^t\exp\ \left\{-{w}_t{y}_i{f}_t\left({\mathrm{x}}_i\right)\right\}/s $$

(8)

where $ s={\Sigma}_{i=1}^n{s}_i^t\exp\ \left\{-{w}_t{y}_i{f}_t\left({\mathrm{x}}_i\right)\right\} $.

This procedure is repeated T times, and the final output is the ensemble classifier $ f\left(\mathrm{x}\right)={\Sigma}_{t=1}^T{w}_t{f}_t\left(\mathrm{x}\right) $. A schematic illustration of the W-SVM ensemble algorithm based on AdaBoost is given in Fig. 1.

The proposed W-SVM ensemble algorithm based on AdaBoost is presented as follows.
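The loop of Eqs.(6)–(8) can be sketched as follows. This is an illustrative sketch, not the authors' MATLAB implementation: to stay dependency-free, the weak learner here is a stand-in weighted decision stump (`fit_stump`, a hypothetical helper), whereas the paper trains a W-SVM at that step. Rescaling the initial weights to sum to one, clipping ε_t away from 0 and 1, and taking the sign of the weighted vote are assumptions added for numerical safety and for a ±1 output.

```python
import math

def fit_stump(X, y, s):
    """Stand-in weak learner: best weighted single-feature threshold rule."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[j] >= thr else -sign for x in X]
                err = sum(w for w, p, t in zip(s, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    _, j, thr, sign = best
    return lambda x: sign if x[j] >= thr else -sign

def adaboost(X, y, fit_weak, T):
    """Boosting loop following Eqs.(6)-(8); returns a list of (w_t, f_t)."""
    counts = {c: y.count(c) for c in set(y)}
    s = [1.0 / counts[c] for c in y]            # s_i^1 = 1 / #y_i
    total = sum(s)
    s = [v / total for v in s]                  # assumption: rescale to sum to 1
    ensemble = []
    for _ in range(T):
        f = fit_weak(X, y, s)                   # the paper trains a W-SVM here
        pred = [f(x) for x in X]
        eps = sum(w for w, p, t in zip(s, pred, y) if p != t)    # Eq.(6)
        eps = min(max(eps, 1e-10), 1 - 1e-10)   # assumption: avoid log(0)
        w_t = 0.5 * math.log((1 - eps) / eps)                    # Eq.(7)
        s = [w * math.exp(-w_t * t * p) for w, p, t in zip(s, pred, y)]
        norm = sum(s)
        s = [w / norm for w in s]                                # Eq.(8)
        ensemble.append((w_t, f))
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote sum_t w_t * f_t(x)."""
    score = sum(w_t * f(x) for w_t, f in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical imbalanced toy data (six positives, two negatives).
X = [(float(v),) for v in range(8)]
y = [1, 1, 1, 1, 1, 1, -1, -1]
ensemble = adaboost(X, y, fit_stump, T=5)
print([predict(ensemble, x) for x in X])
```

Any learner that accepts per-sample weights can be passed as `fit_weak`. With scikit-learn, for instance, `SVC(C=C, kernel="rbf", gamma=gamma).fit(X, y, sample_weight=s)` rescales C per sample, which corresponds to the box constraint 0 ≤ α_i ≤ s_iC of the W-SVM dual.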

3.1 Parameter selection

It is worth mentioning that the proposed ensemble W-SVM algorithm has three parameters: the trade-off parameter C, the kernel parameter, and the iteration number T. Cross validation with grid search would be the natural first choice for determining the model parameters. However, fixing three hyperparameters jointly by cross validation with grid search would be time consuming. To save time, in the current work one parameter is tuned at a time while the others are kept unchanged, and the best value of each parameter is then recorded and combined. The same strategy is used for the other compared methods. With the iteration round T, the trade-off parameter C and the kernel parameter fixed, the final ensemble classifier can be formulated as

$$ f\left(\mathrm{x}\right)=\operatorname{sgn}\left(\sum \limits_{t=1}^T{w}_t\left(\sum \limits_{i=1}^n{\alpha}_i^t{y}_ik\left({\mathrm{x}}_i,\mathrm{x}\right)+{b}_t\right)\right) $$

(9)

3.2 Computational complexity analysis

For the classical SVM and W-SVM, methods have been proposed that reduce the computational complexity of training to O(n²) for n training samples [20]. Therefore, training the W-SVM ensemble predictor requires O(Tn²) operations for T weak classifiers on a training set with n samples. The computational cost may be reduced significantly by incorporating the sampling strategy proposed by García and Lozano [9]; this is beyond the scope of this paper and is left for future investigation.

4 Empirical analysis

In this section, an experiment is first presented to compare the proposed ensemble method with six well-known algorithms, i.e., SVDD [32], SVM [4], W-SVM [21], IOLSSVM [29], ELM [16] and OSELM [14], on five benchmark datasets. Experiments on two real-world BF datasets are then conducted to assess the performance of the proposed method. All experiments are performed in the MATLAB 7.14 environment on a single computer with a 3.4 GHz Intel Core i7 processor and 32 GB of RAM.

4.1 Evaluation criteria

Classification accuracy is the most frequently used evaluation metric for classification problems. However, it is not suitable for imbalanced classification. For example, in an imbalanced binary classification problem in which 99% and 1% of the samples belong to the two classes, a naive learner easily achieves 99% accuracy by assigning all samples to the majority class. In this case, the accuracy on the minority class is as low as 0%, which may not be acceptable in practice. Thus, the widely used G-means metric, which is the geometric mean of the per-class accuracies, is selected as the evaluation metric [28]. G-means for the binary classification problem is expressed as

$$ \mathrm{G\text{-}means}=\sqrt{TPR\times TNR} $$

(10)

where $ TPR=\frac{TP}{TP+ FN} $ and $ TNR=\frac{TN}{TN+ FP} $ measure the proportions of actual positives and negatives that are correctly identified, and TP, TN, FP, and FN stand for the numbers of true positives, true negatives, false positives and false negatives, respectively.
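The contrast between accuracy and G-means in the 99%/1% example can be checked with a few lines. This is a minimal sketch; the function name `g_means` and the choice of +1 as the positive label are assumptions made for illustration.

```python
import math

def g_means(y_true, y_pred, positive=1):
    """Geometric mean of TPR and TNR, as in Eq.(10)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)

# 99 majority samples, 1 minority sample; the naive learner predicts the majority class.
y_true = [1] * 99 + [-1]
majority_guess = [1] * 100
accuracy = sum(t == p for t, p in zip(y_true, majority_guess)) / len(y_true)
print(accuracy, g_means(y_true, majority_guess))  # 0.99 0.0
```

Because the two rates are multiplied, a classifier that ignores one class scores zero no matter how well it does on the other.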

4.2 Evaluating the performance on benchmark datasets

Firstly, the performance of the proposed AdaBoost W-SVM predictor was evaluated on a number of benchmark datasets. All the datasets can be obtained from the LIBSVM website [4]. Details of the datasets are listed in Table 1.

Table 1 Details of Benchmark Datasets


All of the benchmark datasets are split into three parts: a training set, a validation set and a test set. The training set and the validation set contain 300 and 200 samples, respectively, and the remaining samples of each dataset form the test set. The validation set is used to select the three parameters γ, C and T. In the current work, the frequently used RBF kernel k(x_i, x_j) = exp (−γ ∥ x_i− x_j∥²) is selected as the kernel function for each weak classifier, i.e., W-SVM. The proposed AdaBoost W-SVM method has three parameters: the kernel width γ, the hyperparameter C and the iteration number T. Specifically, the kernel width γ is tuned over $ \left\{\frac{1}{5d}:\frac{1}{5d}:\frac{2}{d}\right\} $ (start : step : end), where d is the number of features. The hyperparameter C and the iteration number T are tuned over {100 : 100 : 2000} and {5 : 1 : 20}, respectively. In Fig. 2, the subfigure for γ is obtained with C and T fixed at 1000 and 10, respectively; it shows that the optimal γ is 0.1. For the subfigure for C, T is fixed at 10 and γ is set to its optimal value (γ = 0.1). For the subfigure for T, γ and C are set to their optimal values (γ = 0.1, C = 1600).
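The search grids and the one-parameter-at-a-time tuning described above can be sketched as follows. The helper names, the value d = 2 and the function `toy_score` are hypothetical: `toy_score` merely stands in for "train on the training set and return G-means on the validation set", and its peak at (0.1, 1600, 10) is planted for illustration.

```python
def colon_range(start, step, stop):
    """Expand start : step : stop into a list, endpoints included."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]

def tune_sequentially(score, grids, start):
    """Tune one parameter at a time, carrying the best values forward."""
    best = dict(start)
    for name, grid in grids.items():
        best[name] = max(grid, key=lambda v: score(**{**best, name: v}))
    return best

d = 2  # hypothetical number of features
grids = {
    "gamma": colon_range(1 / (5 * d), 1 / (5 * d), 2 / d),
    "C": colon_range(100, 100, 2000),
    "T": colon_range(5, 1, 20),
}

def toy_score(gamma, C, T):
    """Hypothetical stand-in for the validation G-means."""
    return -((gamma - 0.1) ** 2) - ((C - 1600) / 1000) ** 2 - ((T - 10) / 10) ** 2

# gamma is tuned first with C = 1000 and T = 10, then C, then T.
best = tune_sequentially(toy_score, grids, start={"gamma": 0.1, "C": 1000, "T": 10})
print(len(grids["gamma"]), len(grids["C"]), len(grids["T"]))  # 10 20 16
print(best)
```

The sequential search evaluates 10 + 20 + 16 = 46 configurations instead of the 10 × 20 × 16 = 3200 of a full grid, at the price of possibly missing interactions between the parameters.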

Note that a similar parameter optimization process was carried out for the compared methods. The hyperparameters γ and C in SVDD, SVM, W-SVM and IOLSSVM, and the number of hidden neurons in ELM and OSELM, are likewise fixed in advance on the validation set. With the optimal parameter settings, the performances of all methods are compared on the test set. As shown in Fig. 3, AdaBoost W-SVM outperforms all baseline methods on three of the five datasets, i.e., Pima, Ijcnn, and German. On the Australian dataset the proposed method performs comparably to the best method (W-SVM), and on Eegeye it performs comparably to the best method (IOLSSVM).

In order to provide a more comprehensive comparison, the frequently used Statlog ordering method [17] is used to rank the compared algorithms over multiple datasets. The ranking in Table 2 shows that the proposed AdaBoost W-SVM outperforms the other baseline methods, i.e., SVDD, SVM, W-SVM, IOLSSVM, ELM and OSELM, in most cases. SVDD performs worst on the datasets Pima, Eegeye, and Australian. Table 1 shows that the ratios #Positive/#Negative of these three datasets are all around 1, so we roughly conclude that SVDD is not a good candidate for general classification problems.

Table 2 Comprehensive Rank of Methods


To further validate the conclusions, Demsar's method [8], which provides a set of non-parametric statistical tests, is applied to compare the different methods. First, the Friedman test is used to assess whether there is a significant difference between the methods. Given k methods and N datasets, let $ {r}_i^j $ denote the rank of the j-th method on the i-th dataset. The Friedman test compares the average ranks of the methods, i.e., $ {R}_j=\frac{1}{N}{\Sigma}_{i=1}^N{r}_i^j $. Under the null hypothesis (all methods are equivalent, so their average ranks R_j are equal), Iman and Davenport showed that the statistic $ {F}_F=\frac{\left(N-1\right){\chi}_F^2}{N\left(k-1\right)-{\chi}_F^2} $, where $ {\chi}_F^2=\frac{12N}{k\left(k+1\right)}\left[{\Sigma}_j{R}_j^2-\frac{k{\left(k+1\right)}^2}{4}\right] $, is distributed according to the F-distribution with k−1 and (k−1)(N−1) degrees of freedom. According to Table 2, the statistic F_F is 6 and the corresponding critical value F_α(6, 24) is 2.51 (α = 0.05). The Friedman test at the 0.05 significance level (with seven methods and five datasets) therefore rejects the null hypothesis that “all methods are equivalent.” In addition, a post-hoc test is applied for further comparison between the methods. In this research, the Nemenyi test [25] is adopted to check whether the proposed method is significantly different from the state-of-the-art methods, by checking whether the corresponding average ranks differ by at least the critical difference $ \mathrm{CD}={q}_{\alpha}\sqrt{\frac{k\left(k+1\right)}{6N}} $. The critical value for the two-tailed Nemenyi test over seven compared methods is q_0.05 = 2.95 and the corresponding CD is 3.89. Figure 4 shows the CD diagram for the G-means criterion, where the average rank of each compared method is marked along the axis (lower ranks to the right). Any compared method whose average rank is within one CD of that of AdaBoost W-SVM is connected to it with a thick line. 
In contrast, any method not connected with AdaBoost W-SVM is deemed to be significantly different.
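The statistics used in this comparison can be computed directly from a rank table. The sketch below implements the χ²_F, F_F and CD formulas as written above; the rank matrix is hypothetical (three methods on four datasets) and is not taken from Table 2.

```python
import math

def friedman_iman_davenport(ranks):
    """ranks[i][j] is the rank of method j on dataset i; returns (chi2_F, F_F)."""
    N, k = len(ranks), len(ranks[0])
    R = [sum(row[j] for row in ranks) / N for j in range(k)]  # average ranks R_j
    chi2 = 12 * N / (k * (k + 1)) * (sum(r * r for r in R) - k * (k + 1) ** 2 / 4)
    f_stat = (N - 1) * chi2 / (N * (k - 1) - chi2)
    return chi2, f_stat

def nemenyi_cd(q_alpha, k, N):
    """Critical difference CD = q_alpha * sqrt(k (k + 1) / (6 N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * N))

# Hypothetical ranks, for illustration only.
toy_ranks = [[1, 2, 3], [1, 3, 2], [2, 1, 3], [1, 2, 3]]
chi2, f_stat = friedman_iman_davenport(toy_ranks)
print(round(chi2, 3), round(f_stat, 3))  # 4.5 3.857

cd = nemenyi_cd(2.95, k=7, N=5)
print(round(cd, 2))  # 4.03
```

Evaluating the CD formula with the quoted q_0.05 = 2.95, k = 7 and N = 5 gives roughly 4.03 rather than 3.89, so the reported CD may rest on a setting that is not stated in the text.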

As illustrated in Fig. 4, the SVDD method is not connected with AdaBoost W-SVM, and it is therefore considered significantly different (α = 0.05) from the control algorithm.

4.3 Thermal state-prediction of blast furnace

In this section, the empirical performance of the proposed W-SVM ensemble algorithm is evaluated on real datasets sampled from two BFs, i.e., No.6 BF at Baotou Iron and Steel Group Co. of China and No.1 BF at Laiwu Iron and Steel Group Co. of China, labeled BF(a) and BF(b), respectively [19]. BF(a) is a medium-sized blast furnace with an inner volume of about 2500 m³, and BF(b) is a pint-sized blast furnace with an inner volume of about 750 m³. These two BFs may be regarded as representative of medium-sized and pint-sized blast furnaces, for which constructing reliable models poses a great challenge. For BF(a), the candidate variables include blast volume, feed speed, utilization coefficient, etc.; in total, 20 variables are collected by the data acquisition module, preprocessed by the data processing module of the Blast Furnace Expert System, and selected as the model inputs. For BF(b), only six relevant variables are collected because of the weak measurement conditions in this blast furnace.

There are 839 data samples observed from BF(a) and 800 from BF(b). The sampling interval is about 1.5 h for BF(a) and about 2 h for BF(b). Fig. 5 shows the time series of the silicon content and the blast volume for the two BFs. Clearly, the magnitudes of the two variables differ significantly. Because a variable with a large magnitude has a stronger effect on the model than one with a small magnitude, the importance of large-magnitude variables may be overestimated. To overcome this issue, the input data are rescaled to the same magnitude by

$$ {x}_i^j=\frac{u_i^j-\mu \left({u}^j\right)}{\delta \left({u}^j\right)},\quad i=1,2,\dots, n;\ j=1,2,\dots, d, $$

(11)

where $ {u}_i^j $ stands for the j-th input variable of the i-th sample, μ(u^j) denotes the mean of the j-th variable, and δ(u^j) stands for the standard deviation of the j-th variable.
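A minimal sketch of the standardization in Eq.(11) is given below. The input rows are hypothetical values for two variables of very different magnitude, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
import statistics

def standardize(U):
    """Column-wise rescaling: subtract each variable's mean, divide by its std."""
    cols = list(zip(*U))
    mu = [statistics.mean(c) for c in cols]
    sd = [statistics.pstdev(c) for c in cols]  # assumption: population std
    return [[(u - m) / s for u, m, s in zip(row, mu, sd)] for row in U]

# Hypothetical samples: one large-magnitude and one small-magnitude variable.
U = [[4000.0, 0.40], [4200.0, 0.55], [4400.0, 0.70]]
X = standardize(U)
for row in X:
    print([round(v, 4) for v in row])
```

After rescaling, both columns have zero mean and unit standard deviation, so neither variable dominates the kernel distance ∥x_i − x_j∥² merely because of its units. A constant column would have zero standard deviation and would need to be removed beforehand.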

In the current experiment, the two sampled datasets are each divided into three separate parts, i.e., a training set, a validation set, and a test set. For both BFs, 300 data samples are used for model training, 200 data samples form the validation set, and the remainder are used for testing.

When predicting the thermal state of the BF, the proposed predictor proceeds in three key phases: (1) Perform a clustering analysis of the silicon content in BF hot metal by k-means and divide it into low, high and normal states. The low and high states are merged into one state named abnormal, so that the original series is divided into two categories, i.e., normal and abnormal. (2) Select W-SVM as the weak classifier and perform the binary classification task on the imbalanced problem. (3) Employ the AdaBoost algorithm to adjust the weights of the training samples dynamically and fix the weights of the weak classifiers according to their accuracy, then output the final predictor by combining the weak classifiers. To design the binary W-SVM classifier mentioned above, the expected control bound of the silicon content in BF hot metal must be determined from the collected data. Motivated by Luo et al. [23], data clustering by k-means is selected for this task. In the current work, the controlled bounds of the hot metal silicon content are fixed as [0.39, 0.84] and [0.32, 0.61] for BF(a) and BF(b), respectively, each being the average of ten random experimental results. Figure 6 and Table 3 show the distributions of normal and abnormal states of BF(a) and BF(b) according to k-means. The ratio of normal to abnormal samples is about 4 : 1, so the corresponding binary classification problem is a typical imbalanced classification problem. In this research, the abnormal state is labeled −1 and the normal state is labeled 1.
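Phase (1) can be sketched as a one-dimensional k-means followed by merging the two outer clusters. Several details below are assumptions, because the text does not specify them: the centers are initialized deterministically at quantiles (the paper averages ten random runs instead), the control bounds are taken as the midpoints between adjacent cluster centers, and the silicon values are hypothetical.

```python
def kmeans_1d(values, k=3, iters=100):
    """Lloyd's algorithm on scalars; returns the sorted cluster centers."""
    srt = sorted(values)
    centers = [srt[int((i + 0.5) * len(srt) / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        new = [sum(g) / len(g) if g else centers[c] for c, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return sorted(centers)

def label_thermal_state(values):
    """Label normal (+1) inside the control bounds, abnormal (-1) outside."""
    low, mid, high = kmeans_1d(values, k=3)
    bounds = ((low + mid) / 2, (mid + high) / 2)  # assumption: midpoint bounds
    labels = [1 if bounds[0] <= v <= bounds[1] else -1 for v in values]
    return labels, bounds

# Hypothetical silicon contents (wt%), for illustration only.
silicon = [0.30, 0.32, 0.50, 0.52, 0.55, 0.58, 0.60, 0.62, 0.90, 0.95]
labels, bounds = label_thermal_state(silicon)
print(labels)
print([round(b, 3) for b in bounds])
```

The resulting ±1 labels are what the weak classifiers in phases (2) and (3) are trained on; with real furnace data the normal class is the majority, which is the source of the imbalance discussed in the text.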

Table 3 Categories Distribution of BFs


In the current experiment, the frequently used RBF kernel k(x_i, x_j) = exp (−γ ∥ x_i− x_j∥²) is selected as the kernel function for the proposed AdaBoost W-SVM method and the baseline methods SVDD, SVM, W-SVM and IOLSSVM. The sigmoid function is selected as the activation function in ELM and OSELM. The kernel width γ, the hyperparameter C and the iteration number T are optimized sequentially on the validation set as described in Section 4.2.

It is found that for BF(a) (γ, C, T) = (0.23,1400,10) achieves the best G-means on the validation set. Table 4 lists the weights of weak classifiers for BF(a).

Table 4 Weights of Weak Classifiers for BF(a)


The same strategy is also applied to BF(b), and (γ, C, T) is set to (0.0667, 800, 10). The weights of the weak classifiers are listed in Table 5.

Table 5 Weights of Weak Classifiers for BF(b)


The detailed experimental results are given in Tables 6 and 7, in which the G-means, TPR, and TNR of the proposed W-SVM ensemble predictor are compared with those of six methods: SVDD, SVM, W-SVM, IOLSSVM, ELM and OSELM. The hyperparameters (γ, C) of SVDD, SVM, W-SVM and IOLSSVM and the number of hidden neurons of ELM and OSELM are tuned on the validation set. For BF(a), the best (γ, C) is (0.02, 0.1) for SVDD (note that the hyperparameter C in SVDD is tuned over {0.001, 0.002, 0.01, 0.02, 0.1, 0.2, 1}), (0.05, 1800) for SVM, (0.03, 900) for W-SVM, and (16, 300) for IOLSSVM, and the optimal number of hidden neurons is 50 for ELM and 80 for OSELM. For BF(b), the best (γ, C) is (0.3, 0.01) for SVDD, (0.2, 1300) for SVM, (0.17, 900) for W-SVM, and (9.6, 100) for IOLSSVM, while the optimal number of hidden neurons is 100 for ELM and OSELM. The optimal parameters estimated on the validation set are then applied to the test set. The G-means of the W-SVM ensemble predictor is superior to that of the baseline methods: for BF(a) it achieves an improvement of at least 1.5%, while for BF(b) the improvement is even greater, reaching 3.2%. The results suggest that the proposed W-SVM ensemble has stronger adaptability in practical applications and more flexibility in solving imbalanced classification problems. It is also worth mentioning that the size of the BF influences the classification performance: the medium-sized BF yields better performance than the pint-sized one, which indicates that it is more difficult to predict the silicon content for the pint-sized furnace. However, the TNR on BF(b) is still not high enough and needs further investigation in the future. The running time column shows that the classical SVDD, SVM, W-SVM, and ELM are extremely efficient, whereas the proposed AdaBoost W-SVM costs more running time than the compared methods. However, the blast furnace tapping interval is about 1.5–2 h. 
The AdaBoost W-SVM method is therefore efficient enough for the current BF thermal state prediction task.

Table 6 The Classification Prediction Results of Thermal State in BF(a)


Table 7 The Classification Prediction Results of Thermal State in BF(b)


5 Conclusion

In this paper, a W-SVM ensemble predictor based on the AdaBoost algorithm is proposed to address the thermal state prediction problem of the blast furnace, which is formulated as an imbalanced binary classification problem. To tackle the imbalance, the weighted SVM is selected as the weak classifier and the AdaBoost algorithm is employed to determine automatically the weight distribution of the training samples as well as the weight of each weighted SVM. Experimental results on benchmark datasets and real-world datasets demonstrate the efficacy of the proposed method. Compared with the classical SVDD, SVM, W-SVM, IOLSSVM, ELM and OSELM, the proposed predictor yields an improvement in the G-means criterion. The prediction results may serve as a guideline for controlling the complex blast furnace ironmaking process.

References

Bhattacharya T (2005) Prediction of silicon content in blast furnace hot metal using partial least squares (pls). ISIJ Int 45(12):1943–1945
Cai J, Zeng J, Luo S (2012) A state space model for monitoring of the dynamic blast furnace system. ISIJ Int 52(12):2194–2199
Article Google Scholar
Cao Y, Miao QG, Liu JC, Gao L (2013) Advance and prospects of adaboost algorithm. Acta Automat Sin 39(6):745–758
Article Google Scholar
Chang C, Lin C (2011) Libsvm: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27
Article Google Scholar
Chen W, Wang BX, Han HL (2013) Prediction and control for silicon content in pig iron of blast furnace by integrating artificial neural network with genetic algorithm. Ironmaking & Steelmaking 37(6):458–463
Article Google Scholar
Chuanhou G, Ling J, Xueyi L, Jiming C, Youxian S (2011) Data-driven modeling based on volterra series for multidimensional blast furnace system. IEEE Trans Neural Netw 22(12):2272–2283
Article Google Scholar
Cui G, Jiang Z, Zhan W, Gu J (2015) Prediction of blast furnace temperature based on time series neural network. Metallurgy Automation 39(5):15–21
Google Scholar
Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(1):1–30
MathSciNet MATH Google Scholar
Elkin, G., Lozano, F.: Boosting support vector machines. International Conference on Machine Learning and Data Mining pp. 153–167 (2007)
Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm pp. 148–156 (1996)
Gao C, Ge Q, Ling J (2014) Rule extraction from fuzzy-based blast furnace svm multiclassifier for decision-making. IEEE Trans Fuzzy Syst 22(3):586–596
Article Google Scholar
Gao C, Ling J, Luo S (2011) Modeling of the thermal state change of blast furnace hearth with support vector machines. IEEE Trans Ind Electron 59(2):1134–1145
Article Google Scholar
Guo, H., Ma, J., Li, Z.: An improved adaboost-svm model based on sample weights and sampling equilibrium. In: 2016 4th international conference on electrical & electronics engineering and computer science (ICEEECS 2016). Atlantis Press (2016)
Guo W, Xu T, Tang K (2017) M-estimator-based online sequential extreme learning machine for predicting chaotic time series with outliers. Neural Comput & Applic 28(12):4093–4110
Article Google Scholar
Hu JH, Xie SS, Yang F, Cai K, Wang HT (2007) Ensemble of classification methods based on svm and its application in diagnosis. Journal Of Propulsion Technology-BeiJing 28(6):669
Google Scholar
Huang GB, Wang DH, Lan Y (2011) Extreme learning machines: a survey. Int J Mach Learn Cybern 2(2):107–122
Article Google Scholar
Jian L, Li J, Liu H (2018) Exploiting multilabel information for noise-resilient feature selection. ACM Trans Intell Syst Technol 9(5):52
Article Google Scholar
Jian L, Li J, Luo S (2017) Exploiting expertise rules for statistical data-driven modeling. IEEE Trans Ind Electron 64(11):8647–8656
Article Google Scholar
Jian L, Song Y, Shen S, Wang Y, Yin H (2015) Adaptive least squares support vector machine predictor for blast furnace ironmaking process. ISIJ Int 55(4):845–850
Article Google Scholar
Joachims T (1998) Making large-scale svm learning practical. Tech rep
Lin CF, Wang SD (2002) Fuzzy support vector machines. IEEE Trans Neural Netw 13(2):464–471
Article Google Scholar
Ling J, Gao C, Xia Z (2012) Constructing multiple kernel learning framework for blast furnace automation. IEEE Transactions on Automation Science & Engineering 9(4):763–777
Article Google Scholar
Luo S, Huang J, Zeng J, Zhang Q (2011) Identification of the optimal control center for blast furnace thermal state based on the fuzzy c-means clustering. ISIJ Int 51(10):1668–1673
Article Google Scholar
Mitra T, Pettersson F, Saxén H, Chakraborti N (2017) Blast furnace charging optimization using multi-objective evolutionary and genetic algorithms. Mater Manuf Process 32(10):1179–1188
Article Google Scholar
Nemenyi, P.B.: Distribution-free multiple comparisons (1963)
Ping Z, Song H, Hong W, Chai T (2017) Data-driven nonlinear subspace modeling for prediction and control of molten iron quality indices in blast furnace ironmaking. IEEE Trans Control Syst Technol 25(5):1761–1774
Article Google Scholar
Saxen H, Gao C, Gao Z (2013) Data-driven time discrete models for dynamic prediction of the hot metal silicon content in the blast furnace-a review. IEEE Transactions on Industrial Informatics 9(4):2213–2225
Article Google Scholar
Scholkopf, B., Smola, A.J.: Learning with kernels: support vector machines, Regularization, Optimization, and Beyond (2002)
Song X, Jian L, Song Y (2017) A chunk updating ls-svms based on block gaussian elimination method. Appl Soft Comput 51:96–104
Article Google Scholar
Sun J, Li H, Fujita H, Fu B, Ai W (2020) Class-imbalanced dynamic financial distress prediction based on adaboost-svm ensemble combined with smote and time weighting. Information Fusion 54:128–144
Article Google Scholar
Sun S, Xie X, Dong C (2019) Multiview learning with generalized eigenvalue proximal support vector machines. IEEE Transactions on Cybernetics 49:688–697
Article Google Scholar
Tax DMJ, Duin RP (1999) Support vector domain description. Pattern Recogn Lett 20(11–13):1191–1199
Article Google Scholar
Tiantian C, Hongwei L, Shuisheng Z (2009) Large scale classification with local diversity adaboost svm algorithm. J Syst Eng Electron 20(6):1344–1350
Google Scholar
Vapnik, N.: The nature of statistical learning theory (2000)
Xie X (2018) Regularized multi-view least squares twin support vector machines. Appl Intell 48:3108–3115
Article Google Scholar
Xie, X., Sun, S.: Multi-view support vector machines with the consensus and complementarity in- formation. IEEE Transactions on Knowledge and Data Engineering
Yang, X., Zhuang, M.A., Yuan, S.: Multi-class adaboost algorithm based on the adjusted weak classifier. Journal of Electronics & Information Technology (2016)
Zeng JS, Gao CH, Su HY (2010) Data-driven predictive control for blast furnace ironmaking process. Computers & Chemical Engineering 34(11):1854–1862
Article Google Scholar
Zeng JS, Liu XG, Gao CH, Luo SH, Jian L (2008) Wiener model identification of blast furnace ironmaking process. ISIJ International 48(12):1734–1738
Article Google Scholar
Zhang, Y., Ni, M., Zhang, C., Liang, S., Fang, S., Li, R., Tan, Z.: Research and application of adaboost algorithm based on svm. In: 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), pp. 662–666. IEEE (2019)
Zhao J, Xu Y, Fujita H (2019) An improved non-parallel universum support vector machine and its safe sample screening rule. Knowl-Based Syst 170(15):79–88
Article Google Scholar
Zong W, Huang GB, Chen Y (2013) Weighted extreme learning machine for imbalance learning. Neurocomputing 101:229–242
Article Google Scholar

Download references

Acknowledgements

This research is partially supported by Natural Science Foundation of China under Grant Nos. 61973145 and 61873279, Foundation of the Education of Jiangxi Province under Grant No. GJJ180247, National Key Research and Development Program of Shandong Province under Grant No. 2018GSF120020 and Fundamental Research Funds for the Central Universities under Grant No. 19CX05027B.

Author information

Authors and Affiliations

School of Statistics, Jiangxi University of Finance & Economics, Nanchang, 330013, China
Shihua Luo, Zian Dai & Tianxin Chen
College of Science, China University of Petroleum, Qingdao, 266580, China
Hongyi Chen
School of Economics and Management & Institute for Energy Economics and Policy, China University of Petroleum, Qingdao, 266580, China
Ling Jian

Corresponding author

Correspondence to Ling Jian.

Ethics declarations

Conflict of interest

There is no conflict of interest from the authors.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Luo, S., Dai, Z., Chen, T. et al. A weighted SVM ensemble predictor based on AdaBoost for blast furnace Ironmaking process. Appl Intell 50, 1997–2008 (2020). https://doi.org/10.1007/s10489-020-01662-y

Published: 22 February 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s10489-020-01662-y

1 Introduction