1 Introduction

The ironmaking process is a multivariate, nonlinear industrial process involving numerous chemical reactions (Peacey and Davenport 2016). The blast furnace is an essential step of the ironmaking process (Radhakrishnan and Ram 2001; Geerdes et al. 2009). Smooth operation of the blast furnace is very important for iron and steel companies. The burden distribution matrix is an important operating parameter of the blast furnace (Liu 2012). It consists of the angles of the rotating chute and the number of rotations at each chute angle. The burden distribution matrix is expressed as

$$\begin{aligned} \left[ {\begin{array}{*{20}{c}} {{\alpha _1}}&{}{{\alpha _2}}&{} \cdots &{}{{\alpha _i}}&{} \cdots &{}{{\alpha _n}}\\ {{N_1}}&{}{{N_2}}&{} \cdots &{}{{N_i}}&{} \cdots &{}{{N_n}} \end{array}} \right] , \end{aligned}$$
(1)

where \({\alpha _i}\) represents an angle of the rotating chute, \({N_i}\) is the number of rotations at angle \({\alpha _i}\), \(i = 1,2, \ldots ,n\).

A reasonable burden distribution matrix yields a reasonable gas flow distribution and makes full use of the gas energy, and it is one of the key requirements for achieving low consumption, high efficiency, high quality and a long campaign life of the blast furnace (Liu 2012; Shi et al. 2016). In practical production, the burden distribution matrix is not fixed: the angles of the rotating chute and the number of rotations at each angle can be changed. When the blast furnace exhibits abnormal conditions or the blast furnace condition parameters are poor, the burden distribution matrix should be adjusted. Moreover, the suitability of the burden distribution matrix can be assessed from the blast furnace condition parameters. In real blast furnace operation, operators use these parameters to decide whether the burden distribution matrix needs to be adjusted. There are seven blast furnace condition parameters (the blast volume, the blast pressure, the blast velocity, the top pressure, the permeability index, the gas utilization rate and the utilization coefficient). Based on these parameters and operation experience, this paper establishes a data-driven prediction model, built on a machine learning algorithm, to determine whether the burden distribution matrix needs to be adjusted.

Extreme learning machine (ELM) is a fast learning algorithm for single-hidden layer feedforward neural networks (SLFNs) (Huang et al. 2004; Li et al. 2005; Huang et al. 2006). The input weights and hidden biases of the ELM algorithm are randomly generated and do not need to be fine-tuned. Moreover, the ELM algorithm has better generalization performance and a faster learning rate than conventional gradient-based algorithms, and it avoids becoming trapped in local minima. As a result, the ELM algorithm has been widely applied in many fields, such as image processing, face recognition, fault diagnosis and human–computer interaction.

The ELM algorithm has attracted increasing attention and has developed considerably. Many variants of ELM improve specific aspects of the original algorithm's performance. For example, differential evolution (DE) (Storn and Price 1997) is used to optimize the input weights and hidden biases of ELM (Zhu et al. 2005); the online sequential extreme learning machine (OS-ELM) algorithm can learn data one-by-one or chunk-by-chunk (Liang et al. 2006); a dynamic ensemble ELM based on sample entropy is proposed to overcome the over-fitting problem (Zhai et al. 2012); a voting method is introduced into ELM for classification applications (Cao et al. 2012); a weighted ELM (W-ELM) is proposed to deal with data with imbalanced class distributions (Zong et al. 2013); multi-ELM is proposed to approximate any target continuous function and classify disjoint regions (Yang et al. 2015). In addition, the ELM algorithm and its improved variants are widely used. For instance, the upper integral network with the ELM algorithm (Wang et al. 2011) has better performance than the single upper integral classifier; an architecture selection algorithm for SLFNs trained by ELM, based on the initial localized generalization error model (LGEM), can automatically determine the number of hidden nodes (Wang et al. 2013); three ELM-based discriminative clustering methods are proposed by Huang et al. (2015); a single classifier trained by ELM obtains an optimal and generalized solution for multiclass traffic sign recognition (TSR) (Huang et al. 2017); based on MapReduce and an ensemble of ELM classifiers, a classification algorithm is proposed for imbalanced large data sets (Zhai et al. 2017); OS-ELM with sparse weighting is proposed to increase the classification accuracy of minority class samples and reduce the accuracy loss of majority class samples (Mao et al. 2017). In addition, a semi-supervised low-rank kernel learning method based on ELM is proposed (Liu et al. 2017); an unsupervised ELM based on features embedded by the extreme learning machine autoencoder (ELM-AE) (Kasun et al. 2013) is proposed to handle multicluster clustering (Ding et al. 2017). It is worth noting that the multilayer extreme learning machine (ML-ELM) algorithm (Kasun et al. 2013), which is based on ELM-AE, has also been proposed. ELM-AE is used to initialize all the hidden layer weights of ML-ELM, and ML-ELM does not need to be fine-tuned; hence, it costs less training time than deep learning (Bengio 2009). There are several improvements of ML-ELM. For example, ML-ELM with subnetwork nodes is proposed for representation learning (Yang and Wu 2016); a new architecture based on a multilayer network framework is proposed for dimension reduction and image reconstruction (Yang et al. 2016).

Like ELM, the output weights of the ML-ELM algorithm are obtained by \(\beta = {({H^\mathrm{T}}H)^{ - 1}}{H^\mathrm{T}}T\), where H is the output matrix of the last hidden layer for ML-ELM or the output matrix of the hidden layer for ELM. For ELM, the multicollinearity problem can degrade its generalization performance (Zhang et al. 2016). Owing to multicollinearity, \({H^\mathrm{T}H}\) may not always be nonsingular or may tend to be singular in some applications, so \(\beta = {({H^\mathrm{T}}H)^{ - 1}}{H^\mathrm{T}}T\) may not perform well (Huang et al. 2006). To obtain more stable solutions and better generalization performance, \(\beta = {(\frac{I}{\lambda } + {H^\mathrm{T}}H)^{ - 1}}{H^\mathrm{T}}T\) is used to obtain the output weights of ELM (Huang et al. 2012), and likewise of ML-ELM. PLS-ML-ELM (Su et al. 2016) uses the partial least squares (PLS) method to overcome the multicollinearity problem and has better generalization performance than the ML-ELM algorithm. However, the PLS-ML-ELM algorithm may produce different results in different trials. Hence, this paper further improves the PLS-ML-ELM algorithm: the ensemble model (Hansen and Salamon 1990) can overcome this problem and is used here to improve PLS-ML-ELM. The resulting algorithm is named EPLS-ML-ELM and consists of several PLS-ML-ELMs.

The blast furnace process belongs to the process industry and imposes strict timeliness requirements. Hence, in this paper, ML-ELM is selected as the classification algorithm of the data-driven prediction model for determining whether the burden distribution matrix needs to be adjusted. Moreover, the proposed EPLS-ML-ELM algorithm has better generalization performance and prediction accuracy than the ML-ELM algorithm and the PLS-ML-ELM algorithm. Hence, the proposed EPLS-ML-ELM algorithm is applied to establish the data-driven prediction model, and real industrial data are used to verify it. Compared with the SVM algorithm, the ELM algorithm, the ML-ELM algorithm and the PLS-ML-ELM algorithm, simulation results show that the data-driven prediction model based on the proposed EPLS-ML-ELM algorithm has better prediction accuracy and generalization performance.

The rest of this paper is organized as follows: Sect. 2 briefly introduces the ELM algorithm, the ELM-AE algorithm and the ML-ELM algorithm, and then presents the PLS-ML-ELM algorithm and the EPLS-ML-ELM algorithm in detail. The data-driven prediction model for determining whether the burden distribution matrix needs to be adjusted is described in Sect. 3. Simulation results are given in Sect. 4. Section 5 concludes the paper.

2 The improved multilayer extreme learning machine algorithm

Because the last hidden layer of ML-ELM suffers from the multicollinearity problem, which can affect the prediction accuracy, the PLS-ML-ELM algorithm uses the PLS method to overcome this problem. However, the PLS-ML-ELM algorithm may produce different results in different trials. This paper therefore proposes the EPLS-ML-ELM algorithm, which consists of several PLS-ML-ELMs, to obtain better generalization performance than the PLS-ML-ELM algorithm. This section briefly introduces the ELM algorithm, the ELM-AE algorithm and the ML-ELM algorithm, and then presents the PLS-ML-ELM algorithm and the proposed EPLS-ML-ELM algorithm in detail.

2.1 Extreme learning machine (ELM)

ELM, proposed by Huang et al., is a fast machine learning algorithm for SLFNs (Huang et al. 2006). It has an input layer, a single hidden layer and an output layer. Figure 1 gives the structure of the ELM algorithm. There are N arbitrary distinct samples \(({x_i},{t_i}) \in {R^n} \times {R^m}\), where \({x_i} = {[{x_{i1}},{x_{i2}}, \ldots ,{x_{in}}]^\mathrm{T}}\) indicates that the ith sample has n-dimensional features and \({t_i} = {[{t_{i1}},{t_{i2}}, \ldots ,{t_{im}}]^\mathrm{T}}\) is the target vector. ELM with l hidden layer nodes and activation function g(x) can be mathematically expressed as

$$\begin{aligned} \sum \limits _{i = 1}^l {{\beta _i}{g_i}({x_j})}= & {} \sum \limits _{i = 1}^l {{\beta _i} g({w_i}\cdot {x_j} + {b_i})} = {o_j},\nonumber \\&j = 1,2, \ldots ,N, \end{aligned}$$
(2)

where \({x_j} = [{x_{j1}},{x_{j2}}, \ldots ,{x_{jn}}]\) is the input vector, \({w_i} = [{w_{i1}},{w_{i2}}, \ldots ,{w_{in}}]^\mathrm{T}\) is the weight vector connecting the ith hidden layer node to all input nodes, \({b_i}\) is the bias of the ith hidden layer node, and \({\beta _i} = {[{\beta _{i1}},{\beta _{i2}}, \ldots ,{\beta _{im}}]^\mathrm{T}}\) is the weight vector connecting the ith hidden node to all output nodes.

Equation (2) can be briefly expressed as

$$\begin{aligned} H\beta \mathrm{{ = }}T, \end{aligned}$$
(3)

where \(H = {\left[ {\begin{array}{*{20}{c}} {g({w_1}\cdot {x_1} + {b_1})}&{} \cdots &{}{g({w_l}\cdot {x_1} + {b_l})}\\ \vdots &{} \ddots &{} \vdots \\ {g({w_1}\cdot {x_N} + {b_1})}&{} \cdots &{}{g({w_l}\cdot {x_N} + {b_l})} \end{array}} \right] _{N \times l}}\), \(\beta = {\left[ {\begin{array}{*{20}{c}} {\beta _1^\mathrm{T}}\\ \vdots \\ {\beta _l^\mathrm{T}} \end{array}} \right] _{l \times m}}\), \(T = {\left[ {\begin{array}{*{20}{c}} {t_1^\mathrm{T}}\\ \vdots \\ {t_N^\mathrm{T}} \end{array}} \right] _{N \times m}}\).

H is the output matrix of the hidden layer for ELM. The ith column of H is the ith hidden node output vector with respect to the inputs \({x_1},{x_2}, \ldots ,{x_N}\).

For ELM, the least squares method is used to calculate the output weight \({\beta }\), which is given by

$$\begin{aligned} \beta = {H^{\dag }}T = {({H^\mathrm{T}}H)^{ - 1}}{H^\mathrm{T}}T, \end{aligned}$$
(4)

where \(H^{\dag }\) is the Moore–Penrose generalized inverse of H.

Fig. 1 Structure of the ELM algorithm
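To make the training procedure concrete, the following is a minimal sketch of ELM training in Python/NumPy, assuming a sigmoid activation function; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def elm_train(X, T, n_hidden, seed=None):
    """Minimal ELM sketch: X is the N x n input matrix, T the N x m target matrix."""
    rng = np.random.default_rng(seed)
    # Input weights w_i and biases b_i are randomly generated and never fine-tuned.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    # Hidden layer output matrix H of Eq. (3), with sigmoid activation g.
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # Output weights via the least-squares solution beta = H^dagger T of Eq. (4);
    # the ridge form (I/lambda + H^T H)^(-1) H^T T mentioned in the introduction
    # can be substituted when H^T H is ill-conditioned.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```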

2.2 Extreme learning machine autoencoder (ELM-AE)

ELM-AE is an unsupervised learning algorithm (Kasun et al. 2013) in which the inputs are also used as the outputs. It has an input layer, a single hidden layer and an output layer. The input weights and hidden biases of ELM-AE are randomly generated and made orthogonal. The structure of the ELM-AE algorithm is shown in Fig. 2. There are N distinct samples \({x_i} \in {R^j}, i = 1,2, \ldots ,N\), where j is the number of input nodes. The outputs of the ELM-AE hidden layer can be written as

$$\begin{aligned} h = g(ax + b), \end{aligned}$$
(5)

where \({a^\mathrm{T}}a = I\), \({b^\mathrm{T}}b = 1\).

The relationship between the outputs of the hidden layer and the outputs of the output layer can be represented as

$$\begin{aligned} h({x_i})\beta = x_i^\mathrm{T}, i = 1,2, \ldots ,N, \end{aligned}$$
(6)

where \({\beta }\) represents the output weights of the output layer. There are three cases for calculating \({\beta }\) (Ding et al. 2015).

Case 1 The number of hidden layer nodes is less than the number of training data.

$$\begin{aligned} \beta = {\left( {\frac{I}{C} + {H^\mathrm{T}}H} \right) ^{ - 1}}{H^\mathrm{T}}X. \end{aligned}$$
(7)

Case 2 The number of hidden layer nodes is more than the number of training data.

$$\begin{aligned} \beta = {H^\mathrm{T}}{\left( {\frac{I}{C} + H{H^\mathrm{T}}} \right) ^{ - 1}}X. \end{aligned}$$
(8)

Case 3 The number of hidden layer nodes is equal to the number of training data.

$$\begin{aligned} \beta = {H^{ - 1}}X. \end{aligned}$$
(9)
Fig. 2 Structure of the ELM-AE algorithm
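As an illustration of Eqs. (5)–(8), the sketch below builds orthogonal random weights and computes \(\beta\) for Cases 1 and 2. The regularization constant C and the QR-based orthogonalization are implementation choices for this sketch, not prescribed by the paper.

```python
import numpy as np

def elm_ae(X, n_hidden, C=1e3, seed=None):
    """ELM-AE sketch: learn output weights beta that reconstruct the input X."""
    rng = np.random.default_rng(seed)
    N, n_features = X.shape
    # Random input weights, orthogonalized (a^T a = I when the layer is narrower
    # than its input, a a^T = I otherwise), and a unit-norm bias (b^T b = 1).
    A = rng.standard_normal((n_features, n_hidden))
    A = np.linalg.qr(A)[0] if n_hidden <= n_features else np.linalg.qr(A.T)[0].T
    b = rng.standard_normal(n_hidden)
    b /= np.linalg.norm(b)
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))          # Eq. (5)
    if n_hidden <= N:                               # Case 1, Eq. (7)
        beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)
    else:                                           # Case 2, Eq. (8)
        beta = H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, X)
    return beta                                     # shape: n_hidden x n_features
```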

2.3 Multilayer extreme learning machine (ML-ELM)

ML-ELM uses unsupervised learning to train the parameters of every hidden layer: ELM-AE is used to initialize the parameters of each hidden layer of ML-ELM (Kasun et al. 2013). The ML-ELM algorithm does not need to be fine-tuned, which saves training time (Tang et al. 2016). Figure 3 shows the structure of the ML-ELM algorithm.

For ML-ELM, the activation function of every hidden layer can be either linear or piecewise nonlinear (Kasun et al. 2013). Note: if the number of nodes in the kth hidden layer is equal to the number of nodes in the \((k-1)\)th hidden layer, the activation function is linear; otherwise, it is piecewise nonlinear. The output of the kth hidden layer can be represented as

$$\begin{aligned} {H_k} = g\left( {{{\left( {{\beta ^k}} \right) }^\mathrm{T}}{H_{k - 1}}} \right) , \end{aligned}$$
(10)

where \({H_k}\) and \({H_{k-1}}\) are the output matrix and the input matrix of the kth hidden layer, respectively, and \(g( \cdot )\) represents the activation function of every hidden layer. Note that for \(k-1=0\), \({H_0}\) is the input matrix of the first hidden layer (i.e., the output matrix of the input layer) and \({H_1}\) is the output matrix of the first hidden layer.

Fig. 3 Structure of the ML-ELM algorithm
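A compact sketch of the layerwise initialization and the forward pass of Eq. (10) is given below, reusing the elm_ae helper from the previous sketch; the rule for switching between a linear and a sigmoid transfer follows the note in the text, and the layer sizes are left to the caller.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ml_elm_hidden_outputs(X, layer_sizes, C=1e3):
    """ML-ELM sketch: initialize every hidden layer with ELM-AE and propagate (Eq. (10))."""
    H, prev_size, layer_weights = X, X.shape[1], []
    for size in layer_sizes:
        beta_k = elm_ae(H, size, C=C)     # layerwise unsupervised initialization (beta^k)
        Z = H @ beta_k.T                  # (beta^k)^T H_{k-1}, written row-wise
        # Linear transfer when the layer keeps its input width, otherwise sigmoid.
        H = Z if size == prev_size else sigmoid(Z)
        prev_size = size
        layer_weights.append(beta_k)
    return H, layer_weights               # H is H_Last, the output matrix of the last hidden layer
```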

2.4 The PLS-ML-ELM algorithm

The PLS method combines the characteristics of multiple linear regression and principal component analysis, and it is an effective method for modeling the relationship between multiple independent variables and multiple dependent variables (Wold et al. 2001). Moreover, PLS can deal with the multicollinearity problem. Hence, the PLS-ML-ELM algorithm uses the PLS method to remove the multicollinearity problem and noise. In addition, the output weights between the last hidden layer and the output layer can also be calculated directly through partial least squares.

Suppose that the output matrix of the last hidden layer of ML-ELM is \({H_\mathrm{Last}} \in {R^{N \times m}}\), where N is the number of samples and m is the number of nodes in the last hidden layer, and that the outputs of the output layer are \(Y \in {R^{N \times l}}\), where l is the number of nodes in the output layer. The PLS method is used to establish the relationship between \({H_\mathrm{Last}}\) and Y (Su et al. 2016), which is expressed as

$$\begin{aligned} Y = {H_\mathrm{Last}}{\beta _\mathrm{PLS}} + \xi , \end{aligned}$$
(11)

where \({\beta _\mathrm{PLS}}\) can be calculated by PLS method, and \(\xi \) is noise.

The main idea of the PLS-ML-ELM algorithm is as follows. The first component \({u_1}\) is extracted from \({H_\mathrm{Last}}\) (\({u_1}\) is a linear combination of \({h_1},{h_2}, \ldots ,{h_m}\), where \({H_\mathrm{Last}} = [ {h_1},{h_2}, \ldots ,{h_m}]_{N \times m}\)), and the first component \({v_1}\) is extracted from Y (\({v_1}\) is a linear combination of \({y_1},{y_2}, \ldots ,{y_l}\)); at the same time, the correlation between \({u_1}\) and \({v_1}\) is maximized. Then the regression model between \({y_1},{y_2}, \ldots ,{y_l}\) and \({u_1}\) is established. If the regression reaches satisfactory accuracy, the algorithm terminates; otherwise, \({u_2}\) and \({v_2}\) are extracted from the residual matrices of \({H_\mathrm{Last}}\) and Y. In this way the regression equation is built up until it achieves satisfactory accuracy (Geladi and Kowalski 1986).

Assume that r components \({u_1},{u_2}, \ldots ,{u_r}\) have been extracted and that the regression equation between \({y_1},{y_2},\ldots ,{y_l}\) and \({u_1},{u_2}, \ldots ,{u_r}\) has been established. From it, the regression equation between \({y_1},{y_2}, \ldots ,{y_l}\) and \({h_1},{h_2}, \ldots ,{h_m}\) is established, and the output weights \({\beta _\mathrm{PLS}}\) are obtained. The detailed steps are as follows.

Step 0 Given a data set \(({x_i},{t_i})\) with N samples, where \({x_i} = {[{x_{i1}},{x_{i2}}, \ldots ,{x_{im}}]^\mathrm{T}} \in {R^m}\) indicates that the ith sample has m features and \({t_i} = {[{t_{i1}},{t_{i2}}, \ldots ,{t_{il}}]^\mathrm{T}} \in {R^l}\) is the target vector, each hidden layer's weights are initialized by applying ELM-AE, which performs layerwise unsupervised training. According to Eq. (10), the output matrices of the hidden layers are calculated successively until \({H_\mathrm{Last}}\), the output matrix of the last hidden layer, is obtained.

Step 1 Extract the first pair of components \({u_1}\) and \({v_1}\) such that the correlation between \({u_1}\) and \({y_1},{y_2},\ldots ,{y_l}\) is as large as possible; moreover, \({u_1}\) and \({v_1}\) should meet two requirements:

  1. \({u_1}\) and \({v_1}\) extract as much information as possible from their respective variable groups.

  2. The correlation between \({u_1}\) and \({v_1}\) is maximal.

And \({u_1}\) and \({v_1}\) can be represented as

$$\begin{aligned} \left\{ \begin{array}{l} {u_1} = {H_\mathrm{Last}}{p_1} = \left[ {\begin{array}{*{20}{c}} {{h_{11}}}&{} \cdots &{}{{h_{1m}}}\\ \vdots &{} \ddots &{} \vdots \\ {{h_{N1}}}&{} \cdots &{}{{h_{Nm}}} \end{array}} \right] \left[ {\begin{array}{*{20}{c}} {{p_{11}}}\\ \vdots \\ {{p_{1m}}} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {{u_{11}}}\\ \vdots \\ {{u_{N1}}} \end{array}} \right] ,\\ {v_1} = Y{q_1} = \left[ {\begin{array}{*{20}{c}} {{y_{11}}}&{} \cdots &{}{{y_{1l}}}\\ \vdots &{} \ddots &{} \vdots \\ {{y_{N1}}}&{} \cdots &{}{{y_{Nl}}} \end{array}} \right] \left[ {\begin{array}{*{20}{c}} {{q_{11}}}\\ \vdots \\ {{q_{1l}}} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {{v_{11}}}\\ \vdots \\ {{v_{N1}}} \end{array}} \right] , \end{array} \right. \nonumber \\ \end{aligned}$$
(12)

where \({p_1} = {[{p_{11}}, \ldots ,{p_{1m}}]^\mathrm{T}}\) and \({q_1} = {[{q_{11}}, \ldots ,{q_{1l}}]^\mathrm{T}}\) are the loading vectors of \({H_\mathrm{Last}}\) and Y, respectively.

Requirements (1) and (2) can thus be formulated as the constrained optimization problem

$$\begin{aligned} \begin{array}{ll} \displaystyle max\begin{array}{*{20}{c}} {}&{}{} \end{array}({u_1},{v_1}) = ({H_\mathrm{Last}}{p_{1,}},Y{q_1}) = p_1^\mathrm{T}H_\mathrm{Last}^\mathrm{T}Y{q_1},\\ \displaystyle s.t.\begin{array}{*{20}{c}} {}&{}{} \end{array}\left\{ {\begin{array}{*{20}{c}} {p_1^\mathrm{T}{p_1} = {{\left\| {{p_1}} \right\| }^2} = 1,}\\ {q_1^\mathrm{T}{q_1} = {{\left\| {{q_1}} \right\| }^2} = 1.} \end{array}} \right. \end{array} \end{aligned}$$
(13)

Applying the Lagrange multiplier method gives

$$\begin{aligned} L = p_1^\mathrm{T}H_\mathrm{Last}^\mathrm{T}Y{q_1} - {\lambda _1}(p_1^\mathrm{T}{p_1} - 1) - {\lambda _2}(q_1^\mathrm{T}{q_1} - 1). \end{aligned}$$
(14)

Setting the partial derivatives of L with respect to \({p_1},{q_1},{\lambda _1},{\lambda _2}\) to zero yields

$$\begin{aligned} \left\{ \begin{array}{lll} \frac{{\partial L}}{{\partial {p_1}}} = H_\mathrm{Last}^\mathrm{T}Y{q_1} - 2{\lambda _1}{p_1} = 0,\\ \frac{{\partial L}}{{\partial {q_1}}} = {Y^\mathrm{T}}{H_\mathrm{Last}}{p_1} - 2{\lambda _2}{q_1} = 0,\\ \frac{{\partial L}}{{\partial {\lambda _1}}} = - (p_{_1}^\mathrm{T}{p_1} - 1) = 0,\\ \frac{{\partial L}}{{\partial {\lambda _2}}} = - (q_{_1}^\mathrm{T}{q_1} - 1) = 0. \end{array} \right. \end{aligned}$$
(15)

From Eq. (15), Eq. (16) can be obtained,

$$\begin{aligned} 2{\lambda _1} = 2{\lambda _2} = p_1^\mathrm{T}H_\mathrm{Last}^\mathrm{T}Y{q_1} = \left\langle {{H_\mathrm{Last}}{p_1},Y{q_1}} \right\rangle . \end{aligned}$$
(16)

Let \({\theta _1} = 2{\lambda _1} = 2{\lambda _2} = p_1^\mathrm{T}H_\mathrm{Last}^\mathrm{T}Y{q_1}\); then \({\theta _1}\) is the objective function value of the constrained optimization problem, and

$$\begin{aligned}&H_\mathrm{Last}^\mathrm{T}Y{q_1} = {\theta _1}{p_1}, \end{aligned}$$
(17)
$$\begin{aligned}&{Y^\mathrm{T}}{H_\mathrm{Last}}{p_1} = {\theta _1}{q_1}. \end{aligned}$$
(18)

Combining Eqs. (17) and (18), Eq. (19) can be obtained:

$$\begin{aligned} H_\mathrm{Last}^\mathrm{T}Y{Y^\mathrm{T}}{H_\mathrm{Last}}{p_1} = \theta _{_1}^2{p_1}. \end{aligned}$$
(19)

From Eq. (19), \({p_1}\) is an eigenvector of the matrix \(H_\mathrm{Last}^\mathrm{T}Y{Y^\mathrm{T}}{H_\mathrm{Last}}\), \(\theta _{_1}^2\) is the corresponding eigenvalue, and \({\theta _1}\) is the objective function value. According to Eq. (18), \({q_1}\) can be calculated from \({p_1}\); then the score vectors can be represented by the loading vectors \({p_1}\) and \({q_1}\),

$$\begin{aligned} \left\{ \begin{array}{l} {u_1} = H_\mathrm{Last}{p_1},\\ {v_1} = Y{q_1}. \end{array} \right. \end{aligned}$$
(20)

Step 2 Establish the regression equations of \({h_1},{h_2}, \ldots ,{h_m}\) on \({u_1}\) and of \({y_1},{y_2}, \ldots ,{y_l}\) on \({v_1}\), respectively. Let \({E_0} = H_{_\mathrm{Last}}\) and \({F_0} = {Y}\). The regression model is

$$\begin{aligned} \left\{ {\begin{array}{*{20}{c}} {{E_0} = {u_1}\alpha _1^\mathrm{T} + {E_1},}\\ {{F_0} = {v_1}\gamma _1^\mathrm{T} + {F_1},} \end{array}} \right. \end{aligned}$$
(21)

where \({\alpha _1} = {[{\alpha _{11}},{\alpha _{12}}, \ldots ,{\alpha _{1m}}]^\mathrm{T}}\) and \({\gamma _1} = [{\gamma _{11}},{\gamma _{12}}, \ldots , {\gamma _{1l}}]^\mathrm{T}\) are parameter vectors, and \({E_1}\) and \({F_1}\) are residual matrices. The least squares estimates of the regression parameter vectors \({\alpha _1}\) and \({\gamma _1}\) are

$$\begin{aligned} \left\{ {\begin{array}{*{20}{c}} {{\alpha _1} = \frac{{E_{^0}^\mathrm{T}{u_1}}}{{{{\left\| {{u_1}} \right\| }^2}}},}\\ {{\gamma _1} = \frac{{{F_0}^\mathrm{T}{v_1}}}{{{{\left\| {{v_1}} \right\| }^2}}}.} \end{array}} \right. \end{aligned}$$
(22)

Step 3 Replace \({E_0}\) and \({F_0}\) with \({E_1}\) and \({F_1}\). If the elements of \(F_1\) are close to zero, the regression equation established with the first pair of score vectors meets the required accuracy and the procedure stops. Otherwise, repeat Step 1 and Step 2; the second pair of score vectors can then be expressed as

$$\begin{aligned} \left\{ {\begin{array}{*{20}{c}} {{u_2} = {E_1}{p_2},}\\ {{v_2} = {F_1}{q_2},} \end{array}} \right. \end{aligned}$$
(23)

and the regression model becomes

$$\begin{aligned} \left\{ {\begin{array}{*{20}{c}} {{E_0} = {u_1}\alpha _{_1}^\mathrm{T} + {u_2}\alpha _2^\mathrm{T} + {E_2},}\\ {{F_0} = {v_1}\gamma _{_1}^\mathrm{T} + {v_2}\gamma _2^\mathrm{T} + {F_2}.} \end{array}} \right. \end{aligned}$$
(24)

where \({\alpha _2}\) and \({\gamma _2}\) are regression parameter vectors, which can be represented as

$$\begin{aligned} \left\{ {\begin{array}{*{20}{c}} {{\alpha _2} = \frac{{E_1^\mathrm{T}{u_2}}}{{{{\left\| {{u_2}} \right\| }^2}}},}\\ {{\gamma _2} = \frac{{F_1^\mathrm{T}{v_2}}}{{{{\left\| {{v_2}} \right\| }^2}}}.} \end{array}} \right. \end{aligned}$$
(25)

Step 4 Repeat Step 2 and Step 3 until r principal components are retained. The remaining \(m-r\) components show little variation, so they can be regarded as noise or as the source of the multicollinearity in the last hidden layer. Furthermore, the residuals \(E_r\) and \(F_r\) are extremely small. The regression model is therefore

$$\begin{aligned} \left\{ \begin{array}{l} {E_0} = {u_1}\alpha _1^\mathrm{T} + {u_2}\alpha _2^\mathrm{T} + \cdots + {u_r}\alpha _r^\mathrm{T} + {E_r} = U{\alpha ^\mathrm{T}} + {E_r},\\ {F_0} = {v_1}\gamma _1^\mathrm{T} + {v_2}\gamma _2^\mathrm{T} + \cdots + {v_r}\gamma _r^\mathrm{T} + {F_r} = V{\gamma ^\mathrm{T}} + {F_r}. \end{array} \right. \end{aligned}$$
(26)

Moreover, there is an inner relationship between \(u_k\) and \(v_k\) (Geladi and Kowalski 1986), which can be described as

$$\begin{aligned} {v_k} = {u_k}{b_k},k = 1,2, \ldots ,r. \end{aligned}$$
(27)

So the equation of \(F_0\) can be rewritten as

$$\begin{aligned} {F_0} = V{\gamma ^\mathrm{T}} + {F_r} = {u_1}{b_1}\gamma _1^\mathrm{T} + {u_2}{b_2}\gamma _2^\mathrm{T} + \cdots + {u_r}{b_r}\gamma _r^\mathrm{T} + {F_r} = UB{\gamma ^\mathrm{T}} + {F_r}, \end{aligned}$$
(28)

where \(\mathop U\limits ^ \wedge = {E_0}P\), the regression equation can be expressed as

$$\begin{aligned} {{\mathop {F}\limits ^ \wedge }_0} = {E_0}PB{\gamma ^\mathrm{T}} + {F_r}. \end{aligned}$$
(29)

According to the above analysis, the output weight of the output layer can be represented as

$$\begin{aligned} {{\mathop {{\beta }}\limits ^ \wedge }_\mathrm{PLS}} = PB{\gamma ^\mathrm{T}}, \end{aligned}$$
(30)

where P is the component matrix, B is a diagonal matrix, and \({\gamma ^\mathrm{T}}\) is the weight matrix of \(F_0\).
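The following sketch follows Steps 0–4 above to compute \(\beta_\mathrm{PLS}\) of Eq. (30). For simplicity, the stopping rule of Step 3 is replaced by a fixed number of components r, which in practice would be chosen by the accuracy criterion described above; the helper name is illustrative.

```python
import numpy as np

def pls_output_weights(H_last, Y, r):
    """PLS sketch of Eqs. (12)-(30): extract r components and form beta_PLS = P B gamma^T."""
    E, F = H_last.copy(), Y.copy()           # E_0 = H_Last, F_0 = Y
    P, gammas, b = [], [], []
    for _ in range(r):
        # p is the leading eigenvector of E^T F F^T E (Eq. (19)).
        _, eigvecs = np.linalg.eigh(E.T @ F @ F.T @ E)
        p = eigvecs[:, -1]
        q = F.T @ E @ p                      # Eq. (18), up to the scale factor theta
        q /= np.linalg.norm(q)
        u, v = E @ p, F @ q                  # score vectors (Eq. (20))
        alpha = E.T @ u / (u @ u)            # loadings (Eq. (22))
        gamma = F.T @ v / (v @ v)
        b.append((u @ v) / (u @ u))          # inner relation v ~ u b (Eq. (27))
        E = E - np.outer(u, alpha)           # deflation (Eq. (21))
        F = F - np.outer(v, gamma)
        P.append(p); gammas.append(gamma)
    P = np.column_stack(P)                   # component matrix P
    Gamma = np.column_stack(gammas)          # columns gamma_k
    return P @ np.diag(b) @ Gamma.T          # Eq. (30): beta_PLS = P B gamma^T
```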

2.5 The proposed EPLS-ML-ELM algorithm

The PLS-ML-ELM algorithm may produce different results in different trials. Hence, to overcome this problem, this paper constructs L PLS-ML-ELM networks to form the EPLS-ML-ELM algorithm, which has better generalization performance than PLS-ML-ELM. All PLS-ML-ELMs in EPLS-ML-ELM have the same number of hidden layers and the same number of nodes in each hidden layer. In addition, for each PLS-ML-ELM, the weights of every hidden layer are initialized by applying ELM-AE. The detailed steps of the EPLS-ML-ELM algorithm are as follows.

  • Step 1 Assemble L PLS-ML-ELMs with the same number of hidden layers, the same number of nodes in each hidden layer, and the same activation function for each hidden layer node.

  • Step 2 For each PLS-ML-ELM, initialize the weights of every hidden layer by applying ELM-AE, and obtain the output weights of each PLS-ML-ELM according to Eq. (30).

  • Step 3 Obtain the output matrix of the output layer for each PLS-ML-ELM network.

  • Step 4 There are two cases for the prediction result of the EPLS-ML-ELM algorithm.

  • Case 1 For the regression problem, the average of the prediction results of the L PLS-ML-ELM networks is used as the final prediction result of the EPLS-ML-ELM algorithm. It can be represented as

$$\begin{aligned} {O_\mathrm{final}} = \frac{1}{L}\sum \limits _{i = 1}^L {{O_i}}, i = 1,2 \ldots ,L, \end{aligned}$$
(31)

where i indicates the ith PLS-ML-ELM network.

  • Case 2 For the classification problem, the prediction result of each sample is determined by majority voting (Xue et al. 2014). Each sample has a class label expressed as a vector \(\varvec{v}\), whose dimension equals the total number of classes (suppose there are p classes). For EPLS-ML-ELM, if the prediction result of the ith PLS-ML-ELM network is the kth class, then the kth element of the corresponding vector \(\varvec{v_i}\) is set to one; otherwise, it is set to zero, where \(i=1,2,\ldots ,L\) and \(k=1,2,\ldots ,p\). After each sample has been predicted by all PLS-ML-ELMs, its prediction vector \({\varvec{v_{final}}}\) can be calculated as

$$\begin{aligned} \varvec{v_{final}}= \sum \limits _{i = 1}^L {\varvec{v_i}}, \end{aligned}$$
(32)

where L is the number of PLS-ML-ELMs in EPLS-ML-ELM. The class with the largest entry in \({\varvec{v_{final}}}\) is taken as the predicted label.
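A small sketch of the voting scheme in Case 2 is given below; member_outputs would hold the output-layer score matrices of the L PLS-ML-ELMs (the names are illustrative).

```python
import numpy as np

def ensemble_vote(member_outputs):
    """Majority voting sketch of Eq. (32) over L classifiers.
    member_outputs: list of L arrays, each N x p (output-layer scores per class)."""
    N, p = member_outputs[0].shape
    votes = np.zeros((N, p))
    for O in member_outputs:
        # Each member casts one vote per sample for its predicted class (vector v_i).
        winners = np.argmax(O, axis=1)
        votes[np.arange(N), winners] += 1      # accumulate v_final = sum_i v_i
    return np.argmax(votes, axis=1)            # class with the most votes per sample
```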

3 Data-driven prediction model of adjusting the burden distribution matrix for blast furnace

The burden distribution matrix is extremely important for smooth operation of the blast furnace. For example, adjusting the burden distribution matrix is an effective measure to control the radial distribution of gas flow in the blast furnace, and it can improve the gas utilization rate and the utilization coefficient. Moreover, adjusting the burden distribution matrix can stabilize the relationship between the blast pressure and the blast volume, produce a reasonable pressure difference between the blast pressure and the top pressure, and yield a reasonable permeability index. In practical operation, operators determine whether the burden distribution matrix needs to be adjusted according to the blast furnace condition parameters and operation experience. At the blast furnace operation site, these condition parameters (the blast volume, the blast pressure, the blast velocity, the top pressure, the permeability index, the gas utilization rate and the utilization coefficient) are used to determine whether the burden distribution matrix needs to be adjusted, and data for these seven parameters can be collected. Moreover, the burden distribution matrix has two class labels: class 1 indicates that the burden distribution matrix needs to be adjusted, and class 0 indicates that it does not need to be adjusted. Accordingly, the two class labels of the burden distribution matrix are the dependent variables and the seven blast furnace condition parameters are the independent variables. Hence, based on the collected data, operation experience and a machine learning algorithm, this paper establishes a data-driven prediction model to determine whether the burden distribution matrix needs to be adjusted.

Fig. 4 Structure of the data-driven prediction model of adjusting the burden distribution matrix based on the EPLS-ML-ELM algorithm

In this paper, the proposed EPLS-ML-ELM algorithm is used as the prediction algorithm of the data-driven prediction model. The structure of this data-driven prediction model is shown in Fig. 4. The detailed modeling steps are introduced as follows, and a minimal end-to-end sketch is given after the steps.

  • Step 1 Determine the parameters of data-driven prediction model.

    According to the above analysis, the seven blast furnace condition parameters (the blast volume, the blast pressure, the blast velocity, the top pressure, the permeability index, the gas utilization rate and the utilization coefficient) are used as the input parameters of the data-driven prediction model, and the class labels (1 indicates that the burden distribution matrix needs to be adjusted, and 0 indicates that it does not need to be adjusted) are used as the output parameters.

  • Step 2 Establish the EPLS-ML-ELM structure.

    The EPLS-ML-ELM algorithm consists of several PLS-ML-ELMs, and the number of PLS-ML-ELMs is denoted by L. Moreover, each PLS-ML-ELM in EPLS-ML-ELM uses the whole training data set.

  • Step 2 includes seven parts.

    (2–1) In this paper, the EPLS-ML-ELM algorithm, which consists of L PLS-ML-ELMs, is used as the prediction algorithm of the data-driven prediction model. According to Step 1, each PLS-ML-ELM has seven input nodes in the input layer and two output nodes in the output layer.

    (2–2) Determine the number of hidden layers and the number of nodes in each hidden layer. All PLS-ML-ELMs have the same number of hidden layers and the same number of nodes in each hidden layer. In this paper, both numbers are determined through repeated simulation tests. Moreover, for each PLS-ML-ELM, the weights of every hidden layer are initialized by applying ELM-AE.

    (2–3) The activation function of every hidden layer is the sigmoid function.

    (2–4) Determine the ensemble size L, i.e., the number of PLS-ML-ELMs that make up the EPLS-ML-ELM algorithm.

    (2–5) For each PLS-ML-ELM, the output matrix of every hidden layer \(H_i\) is calculated, where i represents the ith hidden layer.

    (2–6) For each PLS-ML-ELM, the connection weights \(\beta _\mathrm{PLS}\) between the last hidden layer and the output layer are calculated from \(H_\mathrm{Last}\) by Eq. (30).

    (2–7) For EPLS-ML-ELM, the prediction result can be obtained by Case 2 of Step 4 in Sect. 2.5. Based on these two steps, the structure of data-driven prediction model of adjusting the burden distribution matrix is established.

  • Step 3 Verify the data-driven prediction model.

    If the evaluation criteria of this data-driven prediction model meet the required precision, the data-driven prediction model has been established. Otherwise, return to (2–2) to adjust the number of hidden layers and the number of nodes in each hidden layer, and to (2–4) to adjust the number of PLS-ML-ELMs in EPLS-ML-ELM, and then reestablish the data-driven prediction model.
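Putting the pieces together, a minimal end-to-end sketch of the model is given below, reusing the elm_ae, ml_elm_hidden_outputs, sigmoid, pls_output_weights and ensemble_vote sketches from Sect. 2. The layer sizes, the number of PLS components r and the ensemble size L shown here are placeholders, since the actual values are chosen by the repeated simulation tests described above.

```python
import numpy as np

def pls_ml_elm_train(X, Y_onehot, layer_sizes, r, C=1e3):
    """Train one PLS-ML-ELM member: ELM-AE initialized layers, PLS output weights (Eq. (30))."""
    H_last, layer_weights = ml_elm_hidden_outputs(X, layer_sizes, C=C)
    beta_pls = pls_output_weights(H_last, Y_onehot, r)
    return layer_weights, beta_pls

def pls_ml_elm_forward(X, layer_weights, layer_sizes, beta_pls):
    """Propagate new samples through the initialized layers and the PLS output weights."""
    H, prev = X, X.shape[1]
    for beta_k, size in zip(layer_weights, layer_sizes):
        Z = H @ beta_k.T
        H = Z if size == prev else sigmoid(Z)
        prev = size
    return H @ beta_pls                       # output-layer scores for each class

def epls_ml_elm(X_train, Y_onehot, X_test, L=15, layer_sizes=(20, 20, 20), r=3):
    """EPLS-ML-ELM sketch: L independently initialized members, combined by voting."""
    members = [pls_ml_elm_train(X_train, Y_onehot, layer_sizes, r) for _ in range(L)]
    outputs = [pls_ml_elm_forward(X_test, w, layer_sizes, b) for w, b in members]
    return ensemble_vote(outputs)             # predicted adjust (1) / no-adjust (0) label
```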

Table 1 Information of sampling data
Table 2 Numbers of nodes of ML-ELM which has different numbers of hidden layers
Table 3 Comparison of prediction results using the ML-ELM which has different numbers of hidden layers

4 Simulation results

To verify the rationality and the prediction accuracy of the data-driven prediction model, this paper uses production data from a 2500 m\(^3\) blast furnace. There are 1000 data pairs (502 data pairs for class 1 and 498 data pairs for class 0). Eight hundred data pairs are used as the training data, and the remaining data pairs are used as the testing data. Detailed information on the data for the seven blast furnace condition parameters is given in Table 1, which shows large differences in the magnitudes of these parameters. These differences may cause fluctuations and affect the accuracy of the data-driven prediction model. Therefore, the data of these seven parameters are normalized in this paper. Besides, to show that the data-driven prediction model based on the EPLS-ML-ELM algorithm has better prediction accuracy and generalization performance, this paper also uses the SVM algorithm, the ELM algorithm, the ML-ELM algorithm and the PLS-ML-ELM algorithm to establish data-driven prediction models, which are compared with the model based on the EPLS-ML-ELM algorithm. All simulation experiments were conducted in MATLAB 8.3.0.
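Since the paper states only that the seven parameters are normalized without specifying the scheme, the snippet below shows one common choice, min-max scaling to [0, 1]; it is an assumption for illustration, not the paper's stated method.

```python
import numpy as np

def minmax_normalize(X, eps=1e-12):
    """Assumed normalization: scale each of the seven condition parameters to [0, 1]."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min + eps)   # eps guards against constant columns
```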

For all data-driven prediction models, this paper adopts the training time, the testing time, the accuracy and the F-score as evaluation criteria. Accuracy is a percentage; a value close to 100% indicates good generalization performance of the data-driven prediction model. The F-score ranges from 0 to 1 and combines precision and recall; it reaches its best value at 1 and its worst value at 0. Accuracy and F-score are defined as

$$\begin{aligned}&{\text {Accuracy}}\,=\,\frac{{{\text {TP}} \,+\, {\text {TN}}}}{{{\text {TP}} \,+ \,{\text {FP}} \,+ \,{\text {FN}} \,+ \,{\text {TN}}}}, \end{aligned}$$
(33)
$$\begin{aligned}&F{\text {-score}}=2 \cdot \frac{{{\text {precision}}\, \cdot {\text {recall}}}}{{{\text {precision}} \,+\, {\text {recall}}}}\nonumber \\&\quad =\frac{{2{\text {TP}}}}{{2{\text {TP}}\, + \,{\text {FP}} \,+ \,{\text {FN}}}}, \end{aligned}$$
(34)

where TP is the number of samples correctly predicted as “1”, FP is the number of samples falsely predicted as “1”, FN is the number of samples falsely predicted as “0”, and TN is the number of samples correctly predicted as “0”; precision denotes the precision ratio (\({\text {precision}} \,=\, \frac{\text {TP}}{{\text {TP}}\, +\, {\text {FP}}}\)), and recall denotes the recall ratio (\({{\text {recall}} \,= \frac{{{\text {TP}}}}{{{\text {TP}} \,+\, {\text {FN}}}}}\)).
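The two criteria of Eqs. (33)–(34) can be computed directly from the binary predictions, as in the short sketch below (function name illustrative).

```python
import numpy as np

def accuracy_and_fscore(y_true, y_pred):
    """Eqs. (33)-(34) for the binary adjust (1) / no-adjust (0) labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    fscore = 2 * tp / (2 * tp + fp + fn)   # equals 2*precision*recall/(precision+recall)
    return accuracy, fscore
```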

To ensure that the data-driven prediction model reaches its optimal performance, this paper repeats the experiments many times to determine the number of hidden layers and the numbers of nodes in all the hidden layers for these algorithms. For ELM, the number of hidden layer nodes is 100. For ML-ELM, two, three and four hidden layers are tested, respectively. The numbers of nodes in every hidden layer are shown in Table 2, and the prediction results are given in Table 3. The prediction accuracy of the ML-ELM algorithm with three or four hidden layers is better than that of the ML-ELM algorithm with two hidden layers. In addition, the evaluation criteria of the ML-ELM algorithm with three hidden layers are close to those of the ML-ELM algorithm with four hidden layers. To further examine the influence of the number of hidden layers on the prediction accuracy of the ML-ELM algorithm, five, six, seven and eight hidden layers are also tested. However, as the number of hidden layers increases, the prediction accuracy is not significantly improved. Hence, the ML-ELM algorithm with three hidden layers is sufficient to establish the data-driven prediction model. The simulation results of ML-ELM with different numbers of hidden layers are displayed in Fig. 5. Based on the above analysis, this paper adopts the ML-ELM algorithm with three hidden layers. PLS-ML-ELM also adopts the three-hidden-layer structure, with the same number of nodes in each hidden layer as ML-ELM.

Fig. 5 Accuracy of ML-ELM with different numbers of hidden layers

For EPLS-ML-ELM, each PLS-ML-ELM also adopts the three-hidden-layer structure, with the same number of nodes in each hidden layer as ML-ELM. The ensemble parameter L of the EPLS-ML-ELM algorithm also needs to be determined; in this paper, L is set to 5, 10, 15, 20 and 25, respectively. Prediction results for different numbers of PLS-ML-ELMs are shown in Table 4 and Fig. 6. They show that the best result is obtained when the number of PLS-ML-ELMs in EPLS-ML-ELM is 15, and this result is also better than that of a single PLS-ML-ELM. Hence, the EPLS-ML-ELM algorithm consisting of 15 PLS-ML-ELMs is used as the prediction algorithm of the data-driven prediction model.

Table 4 Comparison of prediction results using the EPLS-ML-ELM which has different numbers of PLS-ML-ELMs
Fig. 6 Accuracy of EPLS-ML-ELM with different numbers of PLS-ML-ELMs

Table 5 Comparison of prediction results using different algorithms

To determine whether the burden distribution matrix needs to be adjusted, this paper uses different algorithms to establish the data-driven prediction model, and each algorithm is run 50 times. The standard deviation (SD) is used as an additional evaluation criterion, written as

$$\begin{aligned} {\text {SD}} = \sqrt{\frac{{\sum \nolimits _{i = 1}^n {{{({X_i} - {\bar{X}})}^2}} }}{{n - 1}}}, \end{aligned}$$
(35)

where \({X_i}\) is the accuracy of the ith simulation, \({\bar{X}}\) is the average accuracy over all simulations, and n is the number of simulations.
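Eq. (35) is simply the sample standard deviation over the repeated runs, for example:

```python
import numpy as np

def accuracy_sd(run_accuracies):
    """Eq. (35): sample standard deviation of per-run accuracies (ddof=1 gives the n-1 denominator)."""
    return np.std(np.asarray(run_accuracies, dtype=float), ddof=1)
```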

Fig. 7 Results of data-driven prediction models based on different algorithms. a Accuracy for different algorithms. b F-score for different algorithms

Prediction results of the data-driven prediction models based on the different algorithms (the SVM algorithm, the ELM algorithm, the ML-ELM algorithm, the PLS-ML-ELM algorithm and the proposed EPLS-ML-ELM algorithm) are shown in Table 5 and Fig. 7. They show that the data-driven prediction model based on the EPLS-ML-ELM algorithm achieves better prediction results than the others. In Table 5, the training and testing accuracies of SVM are 75.88 and 74.00%; those of ELM are 88.13 and 86.50%; those of ML-ELM are 90.63 and 90.00%; those of PLS-ML-ELM are 92.13 and 91.50%; and those of the proposed EPLS-ML-ELM algorithm are 93.75 and 93.50%. Thus, the prediction accuracy of the proposed EPLS-ML-ELM algorithm is better than that of the other algorithms. In terms of training and testing time, the SVM algorithm costs 0.3609 and 0.0342 s; the ELM algorithm costs 0.0936 and 0.0033 s; the ML-ELM algorithm costs 0.3588 and 0.0312 s; the PLS-ML-ELM algorithm costs 0.4485 and 0.0396 s; and the proposed EPLS-ML-ELM algorithm costs 7.3106 and 0.5117 s. Although the data-driven prediction model based on the proposed EPLS-ML-ELM algorithm costs the most training and testing time, its prediction accuracy is higher than that of the data-driven prediction models based on the other algorithms, and its training and testing times are on the order of seconds, which meets the timeliness requirement of the blast furnace process. The training F-score and testing F-score of the data-driven prediction model based on the proposed EPLS-ML-ELM algorithm are 0.9382 and 0.9359, higher than those of the data-driven prediction models based on the other algorithms. Its training SD and testing SD are 0.0053 and 0.0062, smaller than those of the data-driven prediction models based on the other algorithms, which indicates that the proposed EPLS-ML-ELM algorithm is more accurate and more stable than the other algorithms. According to the above analysis, the proposed EPLS-ML-ELM algorithm has better prediction accuracy and generalization performance than the other algorithms, and the data-driven prediction model based on it can better predict whether the burden distribution matrix needs to be adjusted. To further demonstrate that the proposed EPLS-ML-ELM algorithm has better generalization performance, standard data sets are also used to test EPLS-ML-ELM and to compare it with the other algorithms; the results are given in the Appendixes.

5 Conclusions

A reasonable burden distribution matrix enables smooth operation of the blast furnace and is therefore extremely important. Based on the collected data from the blast furnace production site, operation experience and a machine learning algorithm, this paper establishes a data-driven prediction model that determines whether the burden distribution matrix needs to be adjusted. The proposed EPLS-ML-ELM algorithm, which builds on the PLS-ML-ELM algorithm and the ensemble model, is used to establish this data-driven prediction model. In PLS-ML-ELM, the PLS method is used to overcome the multicollinearity problem. However, PLS-ML-ELM may produce different results in different trials, so the ensemble model is introduced to overcome this problem, and EPLS-ML-ELM consists of several PLS-ML-ELMs. Blast furnace production data are used to validate the data-driven prediction model based on the EPLS-ML-ELM algorithm. Compared with data-driven prediction models based on other algorithms, simulation results illustrate that the data-driven prediction model based on the EPLS-ML-ELM algorithm can better determine whether the burden distribution matrix needs to be adjusted. Furthermore, this data-driven prediction model can support decision-making for subsequent blast furnace operation. In the future, we will continue to improve the ML-ELM algorithm and make the data-driven prediction model more precise.