1 Introduction

Demand for coal remains high and continues to grow steadily, and longwall mining is the method chiefly responsible for coal extraction. Mechanized longwall mining is highly productive compared with other underground mining methods [1]. However, the frequency of injuries and fatalities related to roof failure has risen along with production in longwall coal mines [2].

Designing sustainable and functional tunnels in a longwall panel is indispensable for ensuring mining safety and maximizing the recovery rate. Longwall mining commences at the coalface by creating a slot, after which coal is extracted by parallel cutting in narrow slices. A schematic view of a longwall panel along with its two side tunnels is presented in Fig. 1.

Fig. 1 Longwall panel along with tailgate and headgate roadways

Essentially, two tunnels are excavated in a single-entry longwall panel, each serving a particular function. The headgate roadway is used for haulage of the extracted material, personnel access, and transportation of supplies. The tailgate roadway is mainly used for egress and outby ventilation [1]. These tunnels are driven on either side of a longwall panel off the maingate roadway.

One of the primary goals in designing a longwall mine is to predict tunnel stability and support requirements accurately [3]. Longwall roadways may endure severe loading conditions because of stress concentration around the coalface and T-junctions. However, no well-proven solutions are currently available for tackling roof instabilities in longwall mining. The first stage in designing a longwall mine is, therefore, to perform a geotechnical assessment of the site, which helps identify the geological structures that chiefly influence tunnel stability [4]. In addition, a suitable monitoring program should be implemented to collect the strata displacements recorded in the loaded tunnels of the longwall panel. In this regard, remote reading telltales are reliable real-time monitoring systems for recording displacements in longwall coal mines [5].

Excessive displacements of the roof strata in the vicinity of a rock structure in longwall mining can cause roof instability or roof failure at the workings, which may lead to irrecoverable safety problems. Unstable roof strata may also result in unwanted collapses, damage to equipment, and production delays. Therefore, implementing a monitoring program for measuring roof displacements is a practical technique for coping with such hazards in longwall mining [6, 7].

However, parameters related to rock structures cannot be predicted with complete accuracy because of the inherent complexity and uncertainty of geological conditions. Nonetheless, a reasonable judgment on stability can be derived, provided that the displacement trends are recorded and their changes can be related to the geomechanical information. Such a relationship can also serve as a timely predictive model that is updated as new data become available.

A literature survey shows that the application of computational intelligence (CI) approaches to the analysis of underground rock structures has recently attracted much interest. Among them, artificial neural networks (ANNs) [8, 9], support vector machines (SVMs) [10, 11], and fuzzy logic systems [12, 13] are the most commonly used CI methods and have been applied to regression and function approximation problems.

However, ANNs suffer from deficiencies such as local extrema, slow convergence, inability to handle uncertainty, and poor interpretability, which may limit them in practice [14]. In contrast, fuzzy rule-based (FRB) systems are capable of human-like reasoning based on expert knowledge. FRB systems are composed of a set of IF–THEN rules that capture the uncertainty associated with complex geomechanical problems, in which the IF-part is the antecedent (premise) and the THEN-part is the consequent (conclusion). The antecedent part of a fuzzy rule consists of fuzzy sets that determine the Membership Degree (MD) of the input patterns [15].

Fuzzy set theory was developed to represent vague and imprecise information in scientific problems [15]. Thereafter, fuzzy logic, built on fuzzy set theory and approximate reasoning, has been extensively employed to handle the imprecision and ambiguity of engineering problems.

In recent decades, fuzzy-based models have emerged as suitable tools in various fields of mining engineering for dealing with uncertainties in earth sciences. In this regard, the adaptive neuro-fuzzy inference system (ANFIS) [16], which identifies the Membership Functions (MFs) by combining fuzzy logic and ANNs, has been extensively employed in different subjects related to underground rock structures.

Rangel et al. [17] developed a neuro-fuzzy system to appraise the stability of tunnels during excavation. Adoko et al. [18] predicted the tunnel convergence using the Mamdani fuzzy system. Adoko and Wu [6] proposed an ANFIS model to predict the tunnel convergence. Farid et al. [19] also developed a neuro-fuzzy model to predict the roof fall phenomenon in underground coal mines.

Song et al. [20] proposed a coal mine intelligent system based on the ANFIS for safety management through forecasting and optimizing the released gases in the mine environment. Ghasemi and Ataei [21] developed a fuzzy model to predict roof fall rate in coal mines. Bouayad and Emeriault [22] proposed a hybrid ANFIS method to predict the surface settlements in shield tunneling on the basis of operational and geological parameters. Felka and Brodny [23] employed ANFIS to predict hazardous zones in longwall coal mines. Chen et al. [24] proposed an ANFIS system for structural safety evaluation of in-service tunnels.

To overcome some deficiencies of ANFIS, the local linear models (LLMs) were developed [25, 26]. Two applicable local linear neuro-fuzzy models are the local linear model tree (LoLiMoT) and the hierarchical local model tree (HiLoMoT). The LLMs generate a fuzzy system that solves a nonlinear problem by breaking it down into several simpler subproblems.

The LLMs are developed from input–output data pairs without the need for settings predetermined by experts [27]. Moreover, the convergence speed and performance of the LLMs have recently been enhanced through a nonlinear relationship in the hierarchical algorithm of the HiLoMoT model [28]. The LoLiMoT and HiLoMoT models are still new to mining engineering and earth sciences, where they have rarely been applied. However, they have recently attracted much interest in various other fields of science.

Aflakian et al. [29] proposed an intelligent algorithm based on the LoLiMoT for kinematic controlling of the cable-suspended parallel robots. Du et al. [30] developed a new model on the basis of the HiLoMoT for rapid calibration test of diesel engines. Razavi et al. [31] employed the LoLiMoT in an updatable prognosis methodology. Oliaee et al. [32] proposed an incremental algorithm by combining the genetic and LoLiMoT methods for nonlinear fault detection and identification in industrial gas turbines. Rastegarmanesh et al. [33] employed LoLiMoT to develop a fuzzy model for prediction of rockburst in underground rock structures. Malekizadeh et al. [34] employed LoLiMoT as an ensemble neuro-fuzzy model to extract the short-term load profile trends. Salmanpour et al. [35] predicted the motor outcome using a machine learning method trained with LoLiMoT. Shahsavari et al. [36] proposed a LoLiMoT model to identify the geochemical anomalies.

This research is conducted to predict the roof displacements in longwall tailgates using local model networks, which account for the uncertainties associated with geomechanical and geological conditions. Three models, namely HiLoMoT, LoLiMoT, and ANFIS, are developed to predict roof displacements from the geomechanical information. The prediction capabilities of the proposed models are examined by comparing the results with actual measured data.

2 Case description

Tabas mine is a mechanized underground coal mine located in the northeast of Iran, which was commissioned in 2007 to produce about 1.7 Mt coking coal from each panel.

This mine is situated about 85 km south of Tabas County, South Khorasan Province (Fig. 2). Geologically, it lies in the Parvadeh coal basin, which is developed in the central part of an asymmetrical anticline covering an area of 1200 km², as shown in Fig. 2 [37]. The mine is bounded to the north by the Rostam fault, a reverse fault with a displacement of up to 700 m. Consequently, rock formations in the vicinity of the Rostam fault are severely deformed by tight folding and numerous minor faults.

Fig. 2 Location and geological structure of the mine, South Khorasan, Iran

The rock strata in the Tabas coal basin are typically sequences of mudstone, limestone, siltstone, and sandstone. The main coal seams are within a 50-m section of the central strata. Seam C1, which was developed from its outcrop on the south side of the Parvadeh anticline, is currently being extracted. Seam C2 lies closely above seam C1, at a distance ranging from less than 1 m in the southwest to 18 m in the northeast, and can increase the risk of roof failure because of thin bedding and low strength. As a result, many roof failures took place during mining operations in panel E2, causing several delays and downtimes in the production plan. The plan view of the longwall panels at the Tabas coalfield is shown in Fig. 3, in which panel E2, considered in this research, is hatched. The width and length of panel E2 are 212 m and 1200 m, respectively. The mining depth is around 300 m. The seam thickness in panel E2 varies from 1.5 to 2 m. The strata gradients in the tailgate roadway vary from 1 in 5 to 1 in 2 [37].

Fig. 3 Location of panel E2 in Tabas mine [37]

The instabilities in panel E2 are more severe than in other longwall panels, and many roof failures have occurred during exploitation of this panel. The major factor behind the strata instabilities and large roof collapses is found to be uncontrollable roof displacements. The idea behind this study is to develop a fuzzy-based model that indicates the unstable zones according to the corresponding geomechanical and monitoring information. Moreover, since the risk of encountering minor faulting is high in the region, especially in the southwestern margins, the timely prediction of roof displacements is a fundamental measure in controlling roof instabilities.

3 Methodology

The LLMs are built on the assumptions of fuzzy set theory. Therefore, the fuzzification procedure and the design of the fuzzy system are briefly reviewed first.

In fuzzy mathematics, the fuzzifier module first maps the crisp input patterns to fuzzy sets characterized by an MF (\(\mu \in \left[ {0,1} \right]\)). The MFs are selected according to the nature of the problem and may be triangular, trapezoidal, Gaussian, sigmoidal, etc. The FRB is then constructed as a collection of IF–THEN rules, which can be expressed as:

$$\Re^{j} : \left\{ \begin{array}{l} {\text{IF}}\;\; x_{1} \;{\text{is}}\; A_{1}^{j} \;\;{\text{AND}}\;\; x_{2} \;{\text{is}}\; A_{2}^{j} \;\;{\text{AND}}\;\; \ldots \;\;{\text{AND}}\;\; x_{n} \;{\text{is}}\; A_{n}^{j} \\ {\text{THEN}}\;\; y^{j} = B^{j} \end{array} \right.$$
(1)

where \(\Re^{j}\) is the \(j\)th fuzzy rule for \(j = 1, \ldots, R\), including the IF-part as the antecedent and the THEN-part as the consequent of the rule. Also, \(x_{k}\) is the \(k\)th input variable of the \(n\)-dimensional input vector \(\underline{x} = \left( x_{1}, x_{2}, \ldots, x_{n} \right)\), and the linguistic term \(A_{k}^{j}\) is the fuzzy MF associated with \(x_{k}\) in the \(j\)th rule. Furthermore, the output of the \(j\)th fuzzy rule is given by \(B^{j}\).

In order to aggregate the IF–THEN rules, a fuzzy inference engine should be applied, which uses fuzzy rules to implement the input–output mapping. Finally, the last step of the fuzzy modeling is defuzzification, in which the fuzzy set output would be converted to the crisp or numeric value. Figure 4 depicts a block diagram of the fuzzification process.

Fig. 4 Block diagram for fuzzification process

Among the various types of FRB models, the two most common are the Takagi–Sugeno (TS) and Mamdani systems. Since the consequent parts of the rules in the Mamdani inference system are defined by linguistic variables, such systems can only use fixed MFs that are predetermined by an expert. In contrast, the TS models, which can be interpreted as local models, use a mathematical function in the consequent part of each rule [38, 39]. Therefore, a TS fuzzy rule can be written as [40]:

$$\Re^{j} : \left\{ \begin{array}{l} {\text{IF}}\;\; x_{1} \;{\text{is}}\; A_{1}^{j} \;\;{\text{AND}}\;\; x_{2} \;{\text{is}}\; A_{2}^{j} \;\;{\text{AND}}\;\; \ldots \;\;{\text{AND}}\;\; x_{n} \;{\text{is}}\; A_{n}^{j} \\ {\text{THEN}}\;\; y^{j} = f\left( \underline{x} \right) \end{array} \right.$$
(2)

where \(y^{j} = f\left( \underline{x} \right)\) is a crisp function in the consequent part. In Eq. (2), the consequent of each rule is an arbitrary function associated with a partition of the input data, and the output of the system is obtained by weighted averaging. In general, defuzzification is simpler in the TS model than in the Mamdani model. In addition, TS models can approximate nonlinear systems with fewer rules. Together with their learning capability, this makes TS systems more reliable than the Mamdani type [41].

3.1 Fuzzy system design

There are different choices for MFs in the antecedent part of a fuzzy rule to convert the crisp input patterns to their fuzzy values. One of the most applicable MFs is the Gaussian function, in which the unknown parameters can be estimated by differentiable gradient-based methods. Herein, the Gaussian membership function (GMF) has been considered for specifying the fuzzy sets. Therefore, each input variable, \(x_{k}\), is first fuzzified through the GMF as follows:

$$\eta_{{A_{k}^{j} }} = \exp \left[ { - 0.5\left( {{ }\frac{{x_{k} - c_{k}^{j} }}{{\sigma_{k}^{j} }}} \right)^{2} } \right]$$
(3)

where \(c_{k}^{j}\) and \(\sigma_{k}^{j}\) stand for the center and the width of the \(j\)th GMF considered in the \(k\)th input variable. Afterward, by using the product inference method as the AND operation, the output of each rule can be obtained:

$$\mu^{j} = \mathop \prod \limits_{k = 1}^{n} \eta_{{A_{k}^{j} }}$$
(4)

Taking all the fuzzy rules into account, the normalized output of the \(j\)th rule is given as:

$$\hat{\mu }^{j} = \frac{{\mu^{j} }}{{\mathop \sum \nolimits_{l = 1}^{R} \mu^{l} }}$$
(5)

where \(\hat{\mu }^{j}\) is the firing strength employed for determining the contribution degree of each rule in the whole network. Finally, the output of the fuzzy system is calculated:

$$\hat{y} = \mathop \sum \limits_{j = 1}^{R} \hat{\mu }^{j} y^{j}$$
(6)
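As an illustration only, Eqs. (3)–(6) can be traced with a few lines of NumPy. The sketch below is not the MATLAB implementation used in this study; the function name `ts_fuzzy_output` and the two-rule parameter values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def ts_fuzzy_output(x, centers, widths, rule_outputs):
    """Evaluate Eqs. (3)-(6) for a single crisp input vector.

    x            : shape (n,)   input vector
    centers      : shape (R, n) Gaussian centers c_k^j
    widths       : shape (R, n) Gaussian widths sigma_k^j
    rule_outputs : shape (R,)   crisp consequent values y^j
    """
    # Eq. (3): Gaussian membership degree of every input in every rule
    eta = np.exp(-0.5 * ((x - centers) / widths) ** 2)
    # Eq. (4): product inference as the AND operation
    mu = eta.prod(axis=1)
    # Eq. (5): normalized firing strengths (they sum to one)
    mu_hat = mu / mu.sum()
    # Eq. (6): weighted average of the rule outputs
    return float(mu_hat @ rule_outputs), mu_hat

# Hypothetical two-rule, two-input system (illustrative values only)
centers = np.array([[0.2, 0.3], [0.8, 0.7]])
widths = np.array([[0.2, 0.2], [0.2, 0.2]])
rule_outputs = np.array([1.0, 3.0])

y_hat, mu_hat = ts_fuzzy_output(np.array([0.5, 0.5]), centers, widths, rule_outputs)
```

Because the test point lies at the same scaled distance from both rule centers, the two rules fire equally and the output is the plain average of the two consequents.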

3.2 ANFIS

ANFIS is a type of ANN developed on the basis of the TS fuzzy inference system. In an ANFIS model, nonlinear functions are approximated by a set of IF–THEN rules. The structure of an ANFIS model is therefore similar to that of an ANN (Fig. 5), with the neurons in the hidden layers replaced by fuzzy rules [16].

Fig. 5 Architecture of the ANFIS

The neuro-adaptive techniques can provide a learning strategy for the FRB models to learn proper rules. Therefore, ANFIS is capable of adjusting MFs and consequent parameters, which allows the TS system to learn from the dataset. Mathematically, an arbitrary linear function is considered in the consequent part of the fuzzy rules as [42]:

$$\Re^{j} : \left\{ \begin{array}{l} {\text{IF}}\;\; x_{1} \;{\text{is}}\; A_{1}^{j} \;\;{\text{AND}}\;\; x_{2} \;{\text{is}}\; A_{2}^{j} \;\;{\text{AND}}\;\; \ldots \;\;{\text{AND}}\;\; x_{n} \;{\text{is}}\; A_{n}^{j} \\ {\text{THEN}}\;\; y^{j} = a_{0}^{j} + a_{1}^{j} x_{1} + \cdots + a_{n}^{j} x_{n} \end{array} \right.$$
(7)

where \(a_{0}^{j}, a_{1}^{j}, \ldots,\) and \(a_{n}^{j}\) are the linear parameters of the consequent part of the \(j\)th rule.

Since the tunable coefficients are trained by a gradient-based algorithm, the computational time increases for large input datasets. Moreover, the larger the number of rules, the higher the probability of overfitting [41].

3.3 LoLiMoT

LoLiMoT is a TS fuzzy system that interpolates between LLMs to describe the training data through an incremental learning algorithm [43, 44]. Unlike ANFIS, the premise part of the fuzzy rules in LoLiMoT evolves from the input–output data pairs without the need for predetermined settings. The LLMs generate a fuzzy system through a divide-and-conquer strategy that breaks a complex identification problem down into simpler subproblems, which controls the rule generation procedure and limits the complexity of the model [44].

Figure 6 depicts the LoLiMoT topology, in which the whole input space is covered by a set of neurons. Each neuron is responsible for a specific zone and consists of an LLM and a validity function. Moreover, LoLiMoT has just a single hidden layer of neurons and a simple adder that aggregates the LLMs as the outputs of the neurons [42].

Fig. 6 LoLiMoT topology for \(n\) inputs and \(R\) local models

According to the topology of the network, the validity function of each neuron can be written as:

$$\theta_{j} \left( \underline{x} \right) = \exp \left[ - 0.5\left( \left( \frac{x_{1} - c_{1}^{j}}{\sigma_{1}^{j}} \right)^{2} + \cdots + \left( \frac{x_{k} - c_{k}^{j}}{\sigma_{k}^{j}} \right)^{2} + \cdots + \left( \frac{x_{n} - c_{n}^{j}}{\sigma_{n}^{j}} \right)^{2} \right) \right]$$
(8)

where \(\theta_{j}\) is the validity function of the \(j\)th neuron, and \(c_{k}^{j}\) and \(\sigma_{k}^{j}\) are the centroid and the standard deviation of the Gaussian activation functions, respectively. Afterward, the normalized validity function of the \(j\)th neuron is written as:

$$\Phi_{j} \left( \underline{x} \right) = \frac{\theta_{j} \left( \underline{x} \right)}{\mathop \sum \nolimits_{l = 1}^{R} \theta_{l} \left( \underline{x} \right)}$$
(9)

where \(R\) is the number of neurons (fuzzy rules) in a standard FRB system. Similar to the TS models, the first-order polynomials are considered for the LLM in each neuron as follows:

$$\hat{O}_{j} = w_{0}^{j} + w_{1}^{j} x_{1} + \cdots + w_{n}^{j} x_{n}$$
(10)

where \(\hat{O}_{j}\) is the output of the \(j\)th LLM, and \(\underline{w}^{j} = \left[ w_{0}^{j}, w_{1}^{j}, \ldots, w_{n}^{j} \right]^{T}\) is the vector of weights associated with \(\hat{O}_{j}\). Finally, the overall output of the LoLiMoT model is calculated as:

$$\hat{y} = \mathop \sum \limits_{j = 1}^{R} {\Phi }_{j} \left( {\underline {x} } \right)\hat{O}_{j}$$
(11)

in which the unknown parameters are the weights of the LLMs (\(\underline{w}^{j}\)) and the parameters of the Gaussian activation functions. Herein, the weights \(\underline{w}^{j}\) are estimated by the weighted least-squares (WLS) method as follows:

$$\underline{w}^{j} = \left( X^{T} Q_{j} X \right)^{-1} X^{T} Q_{j}\, y, \quad j = 1, 2, \ldots, R$$
(12)

where \(X\) is the regression matrix, \(Q_{j}\) is the diagonal weight matrix of the \(j\)th LLM, and \(y\) is the vector of measured outputs. When the given input data consist of \(N\) samples, these matrices can be formulated as:

$$X = \left[ \begin{array}{ccccc} 1 & x_{1,1} & x_{2,1} & \cdots & x_{n,1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1,N} & x_{2,N} & \cdots & x_{n,N} \end{array} \right]$$
(13)
$$Q_{j} = \left[ \begin{array}{cccc} \Phi_{j} \left( \underline{x}_{1} \right) & 0 & \cdots & 0 \\ 0 & \Phi_{j} \left( \underline{x}_{2} \right) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Phi_{j} \left( \underline{x}_{N} \right) \end{array} \right]$$
(14)
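A minimal NumPy sketch of Eqs. (8)–(14) is given here for orientation. It is an illustration under stated assumptions rather than the code used in this work: the function names are hypothetical, the data are synthetic, and the two local models are placed by hand on the two halves of the unit square.

```python
import numpy as np

def normalized_validity(X, centers, sigmas):
    """Eqs. (8)-(9). X: (N, n); centers, sigmas: (R, n). Returns Phi: (N, R)."""
    z = (X[:, None, :] - centers[None, :, :]) / sigmas[None, :, :]
    theta = np.exp(-0.5 * np.sum(z ** 2, axis=2))
    return theta / theta.sum(axis=1, keepdims=True)

def fit_local_models(X, y, Phi):
    """Eq. (12): one weighted least-squares fit per local model. Returns (R, n + 1)."""
    Xr = np.hstack([np.ones((X.shape[0], 1)), X])   # regression matrix, Eq. (13)
    weights = []
    for j in range(Phi.shape[1]):
        q = Phi[:, j]                                # diagonal of Q_j, Eq. (14)
        A = Xr.T @ (q[:, None] * Xr)                 # X^T Q_j X
        b = Xr.T @ (q * y)                           # X^T Q_j y
        weights.append(np.linalg.solve(A, b))
    return np.array(weights)

def predict(X, W, Phi):
    """Eqs. (10)-(11): validity-weighted sum of the local linear outputs."""
    Xr = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.sum(Phi * (Xr @ W.T), axis=1)

# Synthetic, exactly linear data (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1]

# Two hand-placed local models covering the left and right halves of the unit square
centers = np.array([[0.25, 0.5], [0.75, 0.5]])
sigmas = 0.33 * np.array([[0.5, 1.0], [0.5, 1.0]])

Phi = normalized_validity(X, centers, sigmas)
W = fit_local_models(X, y, Phi)
y_hat = predict(X, W, Phi)
```

Since the synthetic target is exactly linear, each local fit should recover the same coefficients, which is a convenient sanity check before moving to real, nonlinear data.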

Although the network topology of the LLMs differs from that of ANFIS, it has been proved that both models have a similar interpretation. In particular, the number of neurons in LoLiMoT corresponds to the number of fuzzy rules in ANFIS, and the LLM in each neuron is equivalent to the function in the rule’s consequent. Nonetheless, the training strategies of LoLiMoT and ANFIS are quite different. ANFIS is highly dependent on the initialization of the parameters, the number of fuzzy rules, and the topology of the model, and its structure is fixed during the learning phase. In contrast, LoLiMoT begins with just one LLM, and the network structure grows through an incremental algorithm until the optimum model is achieved [45]. Therefore, LoLiMoT is classified as an incremental tree-construction algorithm that employs axis-orthogonal partitioning to split the input space.

In general, each neuron in the LoLiMoT topology and its corresponding LLM provide a linear subsystem that acts as a local model contributing to the final output. Improving the model performance requires more LLMs, which is equivalent to increasing the number of neurons. This is done by orthogonal splitting along the different axes of the input space, starting from the worst local model. To identify the worst local model, the general and local cost functions corresponding to the output of each neuron are defined as:

$$J = { }\frac{1}{N}\mathop \sum \limits_{i = 1}^{N} e\left( {\underline {x}_{i} } \right)^{2} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {y\left( {\underline {x}_{i} } \right) - \hat{y}\left( {\underline {x}_{i} } \right)} \right)^{2}$$
(15)
$$J_{j} = \mathop \sum \limits_{i = 1}^{N} e\left( {\underline {x}_{i} } \right)^{2} {\Phi }_{j} \left( {\underline {x}_{i} } \right){ }, \,\,{ }j = 1,{ }2,{ } \ldots ,R$$
(16)

Here \(e\) is the prediction error, \(J\) is the overall error, and \(J_{j}\) is the error of the \(j\)th local model. The local model with the highest local error is chosen in each iteration and is then split into two distinct neurons with new LLMs. Therefore, the topology of the fuzzy network grows during the training phase. The main steps in LoLiMoT are summarized as:

  • Step 1: Starting with one neuron as the initial model, \(\Phi_{1} = 1\), and \(R = 1\). (The whole input space is covered by the corresponding LLM and its validity function.)

  • Step 2: Determining the worst LLM based on the maximum local cost function (Eq. 16).

  • Step 3: Breaking the worst LLM:

    a. Splitting each input dimension into two different local models.

    b. Assigning the new LLMs and validity functions for both regions, and estimating the parameters of LLM through the WLS (Eq. 12).

    c. Calculating the general cost function for the whole model (Eq. 15).

  • Step 4: Checking the stop condition and, if it is not met, going back to Step 2.
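Step 2 of the list above, together with the cost functions of Eqs. (15) and (16), can be sketched as follows. The function names and the four-sample example are hypothetical and serve only to show how the worst LLM is selected.

```python
import numpy as np

def general_cost(y, y_hat):
    """Eq. (15): mean squared error of the whole model."""
    return float(np.mean((y - y_hat) ** 2))

def local_costs(y, y_hat, Phi):
    """Eq. (16): squared errors weighted by each validity function. Returns (R,)."""
    return ((y - y_hat) ** 2) @ Phi

def worst_local_model(y, y_hat, Phi):
    """Step 2: index of the LLM with the maximum local cost."""
    return int(np.argmax(local_costs(y, y_hat, Phi)))

# Hypothetical example: four samples, two local models
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.0, 2.0, 3.0, 6.0])          # only the last sample is mispredicted
Phi = np.array([[0.9, 0.1],
                [0.8, 0.2],
                [0.3, 0.7],
                [0.1, 0.9]])                    # rows sum to one

J = general_cost(y, y_hat)
J_local = local_costs(y, y_hat, Phi)
worst = worst_local_model(y, y_hat, Phi)
```

In this example the mispredicted sample belongs mostly to the second local model, so that model carries most of the local cost and would be the one split in Step 3.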

Figure 7 depicts the orthogonal partitioning of the input space in a 2D LoLiMoT model. As seen, the upper region is split into two distinct subregions. This procedure is repeated at each iteration, in which the worst-performing LLM is subdivided into two new subregions to generate new LLMs and their validity functions. After each division, the centers of the subregions are chosen as the centers of the Gaussian validity functions (\(c_{k}^{j}\)) of the new local models. Moreover, given the extension of a subregion in each dimension, \(\Delta_{k}^{j}\), the standard deviation of each validity function is selected as \(\sigma_{k}^{j} = 0.33 \times \Delta_{k}^{j}\). The split is tested in all dimensions according to Step 3 of the LoLiMoT algorithm, and the one giving the highest performance is retained.
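The placement of the new validity functions after an axis-orthogonal split can be sketched as below. This is a simplified illustration that assumes each local model occupies a hyperrectangle described by hypothetical `lower` and `upper` corner vectors; the function names are likewise illustrative.

```python
import numpy as np

def split_region(lower, upper, dim, k_sigma=0.33):
    """Halve the hyperrectangle [lower, upper] along axis `dim`.

    Returns a list with one (center, sigma) pair per half, where the center is
    the middle of the half and sigma = k_sigma * extension in every dimension.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    mid = 0.5 * (lower[dim] + upper[dim])
    halves = []
    for lo_d, up_d in ((lower[dim], mid), (mid, upper[dim])):
        lo, up = lower.copy(), upper.copy()
        lo[dim], up[dim] = lo_d, up_d
        center = 0.5 * (lo + up)          # c_k^j: middle of the subregion
        sigma = k_sigma * (up - lo)       # sigma_k^j = 0.33 * Delta_k^j
        halves.append((center, sigma))
    return halves

def candidate_splits(lower, upper):
    """One candidate split per input dimension, as tested in Step 3."""
    return {d: split_region(lower, upper, d) for d in range(len(lower))}

# Unit square split along the first axis (illustrative)
(left_c, left_s), (right_c, right_s) = split_region([0.0, 0.0], [1.0, 1.0], dim=0)
candidates = candidate_splits([0.0, 0.0], [1.0, 1.0])
```

Each candidate would then be fitted by WLS and scored with the general cost function, and the dimension giving the lowest cost would be kept.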

Fig. 7 Orthogonal partitioning of a 2D input space in LoLiMoT

3.4 HiLoMoT

In the partitioning strategy of LoLiMoT, the subregion with the maximum local cost function is determined in each iteration and is then broken down into two new subregions by axis-orthogonal partitioning. Unfortunately, the optimization approach in LoLiMoT is linear, which mainly affects the convergence speed and performance of the model [46].

To cope with this deficiency, Hartmann et al. [28] developed a hierarchical model structure, namely HiLoMoT, based on hinging hyperplanes [47]. In contrast to LoLiMoT with its flat axis-orthogonal method, the split direction in the HiLoMoT algorithm is determined through a nonlinear relationship within a hierarchical algorithm [48].

The direction of splitting for LoLiMoT and HiLoMoT is compared in Fig. 8. In brief, HiLoMoT is characterized by nonlinear oblique partitioning, which overcomes the restriction to axis-orthogonal partitioning, a significant limitation of LoLiMoT [49]. Nonetheless, the nonlinear partitioning may increase the computational cost.

Fig. 8 Procedure of input space splitting for four successive iterations [46]

4 Results

Roof failure is a common challenge in the Tabas coal mine, and it is expected to become more serious as the mining depth increases in the near future. An unstable tailgate roadway would not only slow down or delay mining operations but could also lead to incidents causing injuries or fatalities. Therefore, given the high investment costs and the safety standards, roof displacements cannot be ignored during mining operations in the Tabas mine. In this research, three models, namely HiLoMoT, LoLiMoT, and ANFIS, are developed using the fuzzy toolboxes of MATLAB software to predict roof displacements. The model development procedure and the results are presented in the following subsections.

4.1 Data establishment

To predict the roof displacement in the tailgate roadways of the Tabas longwall mine, panel E2 was selected as the case study. For this purpose, a dataset was gathered from geological reports, borehole information, and rock mechanics laboratory tests. In addition, maximum roof displacements (\(d_{\max}\)) were recorded as the face advanced, using dual-height telltale instruments installed at specified distances along the investigated tailgate roadway. Continual reading of the telltales indicates the displacements within and above the bolted height. In other words, a telltale has two movement indicators: the “A” indicator, which shows roof displacements about 30 cm below the bolted height, and the “B” indicator, which displays displacements at least twice above the bolted height. Figure 9 illustrates the recorded data for some selected sections from Jul. 3, 2008, to Dec. 1, 2012.

Fig. 9 Roof displacements versus time for both indicators at six selected sections (“A” indicator shows roof displacements about 30 cm below the bolted height, and “B” indicator displays roof displacements at least twice above the bolted height)

In this research, the \(d_{\max}\) values read from the telltales are introduced to the fuzzy-based models to find the nonlinear relationship between the geomechanical information and \(d_{\max}\). The input parameters are the uniaxial compressive strength (UCS), rock mass rating (RMR), tensile strength (\(\sigma_{t}\)), shear strength (\(\tau\)), Young’s modulus (\(E\)), cohesion (\(C\)), angle of internal friction (\(\phi\)), slake durability index (\(I_{d2}\)), and density (\(\rho\)). The statistics of the employed data are presented in Table 1.

Table 1 Details of the datasets applied for training the neuro-fuzzy models

4.2 Data normalization

Since the independent variables are measured in different units, the input data have to be normalized before being introduced to the models. Normalization makes the data dimensionless and keeps the inputs between 0 and 1. In addition, working with dimensionless data enhances the learning speed and the stability of the model. The input data in our case are normalized using Eq. (17) [50]:

$$X_{{{\text{Norm}}}}^{pq} = \frac{{X^{pq} - X_{{{\text{min}}}}^{q} }}{{X_{{{\text{max}}}}^{q} - X_{{{\text{min}}}}^{q} }}$$
(17)

where \(X_{{{\text{Norm}}}}^{pq}\) is the normalized value and \(X^{pq}\) is the original value in the \(p\)th row and \(q\)th column, and \(X_{{{\text{min}}}}^{q}\) and \(X_{{{\text{max}}}}^{q}\) are, respectively, the minimum and maximum values of the \(q\)th column.
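Equation (17) is a column-wise min–max scaling, which can be sketched in NumPy as follows. The function name and the small input array are hypothetical (the numbers are not taken from Table 1), and the sketch assumes that no column is constant, since a constant column would lead to division by zero.

```python
import numpy as np

def min_max_normalize(X):
    """Eq. (17): scale every column of X to the range [0, 1].

    Assumes each column has distinct minimum and maximum values.
    Returns the normalized array together with the column minima and maxima,
    which are needed to normalize new data in the same way.
    """
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min), x_min, x_max

# Made-up values for two input columns in different units (illustrative only)
X_raw = np.array([[10.0, 40.0],
                  [20.0, 50.0],
                  [30.0, 70.0]])

X_norm, x_min, x_max = min_max_normalize(X_raw)
```

Keeping `x_min` and `x_max` from the training set allows the same scaling to be applied to unseen samples at prediction time.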

4.3 Model development

The procedure of model development in our research is summarized in Fig. 10. The main idea behind the local model networks is to approximate the nonlinear function using multiple piecewise linear models. As seen, the first step in our neuro-fuzzy model is to establish the LLMs. For initializing the FRB system, the training procedure is commenced with one neuron as the initial model, \(\Phi_{1} = 1\), and \(R = 1\). The related LLM and the validity function will then cover the whole input space.

Fig. 10 Flowchart of the modeling process

In the next step, the LLM with the maximum local cost function is identified as the worst one and is then split into two new LLMs. The partitioning strategy is then applied to estimate the new LLMs and their validity functions. The model is trained after partitioning, and its validity is checked by calculating the general cost function. Finally, the prediction capability of the trained model is examined by introducing new data and comparing the predicted and measured values.

The major difference between the LoLiMoT and HiLoMoT models lies in the partitioning method. As mentioned, LoLiMoT uses linear orthogonal partitioning, whereas HiLoMoT uses nonlinear oblique partitioning. As an example, the axis-orthogonal partitioning between the input variables and \(I_{d2}\) is shown in Fig. 11. In addition, the partitioning methods for some selected variables during training of the HiLoMoT and LoLiMoT models are compared in Fig. 12.

Fig. 11 Axis-orthogonal partitioning between input variables and \(I_{d2}\) in LoLiMoT model

Fig. 12 Comparison of partitioning methods in HiLoMoT and LoLiMoT for selected instances

As shown in Fig. 11, the input space in LoLiMoT is partitioned by the orthogonal partitioning strategy. Increasing the number of partitions increases the number of neurons and LLMs, which is equivalent to increasing the number of fuzzy rules. New validity functions and LLMs are therefore assigned to each newly generated local model. During training, the parameters of each validity function are kept unchanged until the corresponding local model is identified as the worst local model. Accordingly, the input space for the variables presented in Fig. 11 has been divided into four distinct subregions on the basis of the corresponding worst local models. The middle of each new subregion is set as the center of its Gaussian validity function, and one-third of the extension of the subregion is taken as its standard deviation.

On the other hand, the worst local models in HiLoMoT are split by oblique lines, in contrast to LoLiMoT, in which horizontal or vertical lines are employed (Fig. 12). The position and direction of the split lines in HiLoMoT are optimized by nonlinear optimization techniques. As shown in Fig. 12, the number of local models, and consequently the corresponding LLMs and validity functions, are quite different in the HiLoMoT model. In this figure, the input spaces in some cases, such as \(\tau\) and \(E\) or RMR and \(\phi\), were never identified as the worst local model, so the related validity functions consist of just one Gaussian function. Correspondingly, 3D views of the partitioning for a single-partition and a four-partition local model are presented in Fig. 13. As seen, the centers of the 3D Gaussian validity functions are located at the middle of the subregions, and the functions extend to cover the whole area of their subregions.
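The difference between an axis-orthogonal and an oblique split can be made concrete with a small sketch. It assumes that a split is represented by a hyperplane \(v_{0} + v^{T}u = 0\) smoothed by a sigmoid, which is one common way to parameterize oblique splits and is not necessarily the exact form used in this study; the function name `sigmoid_split`, the steepness `kappa`, and all numerical values are hypothetical.

```python
import numpy as np

def sigmoid_split(u, v, v0, kappa=10.0):
    """Soft membership of points u (N, p) on the positive side of the
    hyperplane v0 + v.u = 0; the other side receives 1 - membership.
    Assumption: sigmoid smoothing with steepness kappa."""
    u = np.atleast_2d(u)
    return 1.0 / (1.0 + np.exp(-kappa * (v0 + u @ v)))

pts = np.array([[0.2, 0.9],
                [0.9, 0.2]])

# Axis-orthogonal split: only one component of v is nonzero (vertical line u1 = 0.5).
axis_split = sigmoid_split(pts, v=np.array([1.0, 0.0]), v0=-0.5)

# Oblique split: both components of v are nonzero (diagonal line u1 = u2).
oblique_split = sigmoid_split(pts, v=np.array([1.0, -1.0]), v0=0.0)
```

In an oblique strategy, the entries of `v` and `v0` would be the quantities adjusted by the nonlinear optimization, whereas an axis-orthogonal strategy only chooses the split dimension.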

Fig. 13
figure 13

GMFs associated with the LLMs resulted during training the HiLoMoT model

In contrast to the local model networks, the first step in developing an ANFIS model is selecting the number of rules. For the ANFIS model, the MFs for the nine input variables are generated in the MATLAB environment by subtractive clustering, and the results are presented in Fig. 14. Visual inspection of the figure confirms that three MFs are created in each dimension, which is equivalent to three fuzzy rules.

Fig. 14
figure 14

GMFs associated with the nine input entries in the ANFIS model

After the fuzzy sets are determined, the ANFIS network structure is built from its different layers. Various sets of rules are examined, and the model with the highest accuracy is obtained with the three extracted fuzzy rules. ANFIS employs the back-propagation (BP) algorithm for tuning the MFs and least-squares estimation for training the coefficients of the consequent (conclusion) parts [51]. In other words, each training step is a two-pass procedure: in the forward phase, the consequent parameters are estimated by least squares; in the backward phase, the MF parameters are tuned by back-propagating the obtained error.
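The forward (least-squares) phase can be sketched for a first-order TS model in which the normalized firing strengths are held fixed. This is a minimal illustration and not the MATLAB implementation used in the study; the names `fit_consequents` and `predict`, the sigmoid-shaped firing strengths, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def fit_consequents(X, y, W):
    """Least-squares estimate of linear consequent parameters.

    X : (N, p) inputs, y : (N,) targets,
    W : (N, M) normalized firing strengths (held fixed in this phase).
    Returns theta of shape (M, p + 1): slopes and offset for each rule.
    """
    N, p = X.shape
    M = W.shape[1]
    X1 = np.hstack([X, np.ones((N, 1))])                      # append bias column
    A = (W[:, :, None] * X1[:, None, :]).reshape(N, M * (p + 1))
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta.reshape(M, p + 1)

def predict(X, W, theta):
    """Weighted sum of the local linear models."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.sum(W * (X1 @ theta.T), axis=1)

# Synthetic one-input, two-rule example (illustrative values only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 1))
w1 = 1.0 / (1.0 + np.exp(-5.0 * (X[:, 0] - 0.5)))
W = np.column_stack([w1, 1.0 - w1])
theta_true = np.array([[2.0, 0.5], [-1.0, 1.5]])
y = predict(X, W, theta_true)
theta_hat = fit_consequents(X, y, W)
y_fit = predict(X, W, theta_hat)
```

Because the output is linear in the consequent parameters once the firing strengths are fixed, this phase has a closed-form solution; only the MF parameters require an iterative gradient-based update.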

The \(d_{\max}\) values predicted by the three models are plotted against the measured ones in Fig. 15. The squared correlation coefficients (\(R^{2}\)) between the predicted and measured values in training the ANFIS, LoLiMoT, and HiLoMoT models are 0.921, 0.943, and 0.976, respectively, for the best-performing structure of each model.

Fig. 15
figure 15

\(R^{2}\) resulted from training three models of ANFIS, LoLiMoT, and HiLoMoT

4.4 Prediction capability of the model

The validity and performance of the HiLoMoT model in predicting \(d_{\max}\) are investigated by introducing an unseen dataset from different sections of panel E2. For this purpose, three indices of the coefficient of determination (\(R^{2}\)), variance accounted for (VAF), and the root mean square error (RMSE) between the measured and predicted values of \(d_{\max}\) are used:

$$R^{2} = \frac{{\left[ {\mathop \sum \nolimits_{i = 1}^{N} \left( {y_{i} - \overline{y}} \right)^{2} } \right] - \left[ {\mathop \sum \nolimits_{i = 1}^{N} \left( {y_{i} - \hat{y}_{i} } \right)^{2} } \right]}}{{\left[ {\mathop \sum \nolimits_{i = 1}^{N} \left( {y_{i} - \overline{y}} \right)^{2} } \right]}} , \quad\overline{y} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} y_{i}$$
(18)
$$VAF = 100 \times \left[ {1 - \frac{{var\left( {y_{i} - \hat{y}_{i} } \right)}}{{var\left( {y_{i} } \right)}}} \right]$$
(19)
$${\rm RMSE} = \sqrt {\frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {y_{i} - \hat{y}_{i} } \right)^{2} }$$
(20)

where \(N\) is the number of samples, \(var\) is the variance, and \(y_{i}\) and \(\hat{y}_{i}\) are, respectively, the measured and predicted values associated with the \(i\)th input sample. The values of \(R^{2}\), VAF, and RMSE obtained for the three models of ANFIS, LoLiMoT, and HiLoMoT are summarized in Table 2. As seen, the prediction capability of the HiLoMoT model is superior to that of the other two.
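The three indices in Eqs. (18)–(20) can be computed with a short sketch such as the following. The function names and the four sample values are hypothetical and serve only to illustrate the formulas; the population variance is assumed in VAF, although the ratio is the same for any consistent choice.

```python
import numpy as np

def r2_score(y, y_hat):
    """Coefficient of determination, Eq. (18)."""
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    ss_res = np.sum((y - y_hat) ** 2)
    return (ss_tot - ss_res) / ss_tot

def vaf(y, y_hat):
    """Variance accounted for in percent, Eq. (19)."""
    return 100.0 * (1.0 - np.var(y - y_hat) / np.var(y))

def rmse(y, y_hat):
    """Root mean square error, Eq. (20)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

# Illustrative measured and predicted values (not data from panel E2).
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])

r2_value = r2_score(y, y_hat)
vaf_value = vaf(y, y_hat)
rmse_value = rmse(y, y_hat)
```

Note that VAF equals \(100 \times R^{2}\) whenever the prediction errors have zero mean, as in this toy example; the two indices differ only when the predictions are biased.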

Table 2 \(R^{2}\), VAF, and RMSE for three models of ANFIS, LoLiMoT, and HiLoMoT

According to the results of the ANFIS model, it appears that the best performance is obtained when the number of rules is three. In other words, ANFIS achieved the highest \(R^{2}\) and the lowest RMSE for testing samples when there are three fuzzy rules.

In contrast to ANFIS, which is affected by the number of rules, the fuzzy structure is created automatically in HiLoMoT. In this research, the structures of the LoLiMoT and HiLoMoT models are found to be composed of seven and two local models, respectively. During generation of the local models, only the parameters of the LLMs are optimized, and the parameters of the validity functions remain unchanged. Therefore, the number of tuning parameters in HiLoMoT is smaller than in ANFIS, since the number of LLMs in local model networks is equivalent to the number of fuzzy rules. Moreover, the number of LLMs does not need to be predetermined by an expert. According to Table 2, HiLoMoT shows the best performance in terms of \(R^{2}\) and RMSE, while it has a simpler structure than the other two models.

In order to develop an ANFIS model, the premise and consequent parameters are to be optimized at first, whereas the learning process in HiLoMoT includes a forward and a backward part. The consequent parameters are optimized by the least-squares method in a forward path, and the premise parameters are adjusted by the gradient descent method in a backward path.

The values of \(d_{\max}\) predicted by the three models of ANFIS, LoLiMoT, and HiLoMoT are plotted against the measured values in Fig. 16. The \(R^{2}\) values between the predicted and measured values for ANFIS, LoLiMoT, and HiLoMoT are 0.8729, 0.9392, and 0.9520, respectively. These results indicate that the HiLoMoT model predicts \(d_{\max}\) reliably when it encounters unseen data.

Fig. 16
figure 16

\(R^{2}\) resulted from testing three models of ANFIS, LoLiMoT, and HiLoMoT

In addition, the absolute errors associated with each of the models are presented in Fig. 17. As seen, the predictions of the HiLoMoT method are more accurate than those of ANFIS. Moreover, local model networks have several advantages, such as simplicity, high flexibility, no need for presetting parameters, and a reduced risk of overfitting. Therefore, the HiLoMoT model may be implemented as a new applicable tool for predicting roof displacements ahead of time in mechanized longwall coal mining.

Fig. 17
figure 17

Errors associated with three models of ANFIS, LoLiMoT, and HiLoMoT

5 Discussion

The risk of facing unstable zones in longwall coal mining is evident because of the high stress concentration and large movements of the roof strata. In this respect, the tailgate roadway is more susceptible, since it undergoes a high stress redistribution, especially around the T-junctions, due to the superposition of the abutment stresses. Depending on the geometrical, geological, and geomechanical conditions of the mine, various failure mechanisms are prone to occur in longwall tailgates. Geological surveys in the Parvadeh coalfield show that the risk of encountering undetected minor faults is high. In addition, there were many roof strata instabilities in panel E2, which led to several technical and financial problems. Uncontrolled roof displacements may affect the profitability or workability of the mine, because any displacement in mechanized longwall mining can have adverse effects on the mining operations and the availability of equipment. Therefore, implementing a suitable program to collect geomechanical and monitoring information during the development stages of the next panels is essential. The proposed model may be used as a predictive measure for identifying roof displacement ahead of time on the basis of the associated geomechanical information. HiLoMoT, as a TS fuzzy model, is flexible in considering the uncertainties inherent in geological and geomechanical information. Employing such a model is recommended when the probability of minor faults is beyond normal bounds in the Tabas mine. Moreover, when poor ground conditions associated with faulting, which result in large roof displacements, are inevitable, the proposed neuro-fuzzy model helps to avoid or control adverse effects on the advance rate and production planning.

Figure 18 illustrates the predicted values of \(d_{\max}\) for the three models of ANFIS, LoLiMoT, and HiLoMoT. Comparison of the results shows that the proposed HiLoMoT model predicts \(d_{\max}\) more accurately than LoLiMoT and ANFIS. The HiLoMoT results are also in good agreement with the measured ones, as indicated by their closeness to the equality line and the high goodness of fit.

Fig. 18
figure 18

Comparing the \(d_{\max}\) values for measured (blue) and predicted (red) samples (Color figure online)

The histograms of the errors resulting from training and testing of the models are also presented in Fig. 19. As seen, the errors are concentrated in the vicinity of zero, which indicates that the models are well trained. The smallest testing errors, however, are obtained with the HiLoMoT model.

Fig. 19
figure 19

Comparison of the error histograms for training and testing samples

The proposed HiLoMoT model has several advantages over the ANFIS method, a commonly used neuro-fuzzy model. First, since ANFIS employs BP for tuning the parameters of the MFs, its convergence is relatively slow. The results of ANFIS also depend strongly on the initial selection of the parameters. Moreover, ANFIS suffers from significant limitations such as trapping in local extrema and overfitting [41, 51].

In contrast, the HiLoMoT model benefits from local optimization, least-squares estimation, and incremental tree construction [46]. Incremental tree construction subdivides the input space in each iteration and thereby chooses the number of neurons (rules or LLMs), improving the quality of the model without further iteration loops or trial-and-error procedures, whereas the MFs in ANFIS are chosen empirically by trial and error. In addition, the validity functions are kept fixed in the HiLoMoT model while the parameters of the local models are estimated by local optimization in each LLM. This local estimation neglects the overlap of the validity functions, which significantly reduces the risk of overfitting. Although the number of MFs must be kept fixed during the learning process in ANFIS, there is no such restriction in HiLoMoT. Therefore, in contrast to ANFIS, which needs preset parameters, the incremental partitioning of HiLoMoT does not require prior knowledge. A major weakness of LoLiMoT is its restriction to axis-orthogonal partitioning, which is removed by the oblique partitioning strategy in HiLoMoT.

Therefore, the HiLoMoT model has proved to be highly flexible and suitable for industrial applications in identifying both linear and nonlinear systems [43]. Despite widespread progress in mechanized longwall mining, there is presently no single technique that can provide information on roof instabilities during mining operations. Uncontrolled roof displacements are responsible for many downtimes and delays in the Tabas longwall mine, which can lead to calamitous consequences, especially in the tailgate roadway and at T-junctions. The proposed model may be useful for the timely detection of unstable zones by predicting roof displacements.

The immediate roof in this panel is composed of a weak listric mudstone with a thickness of 0.2 m, separated from the overlying mudstones by a polished bedding plane. The interval to seam C2 is less than 3 m, and the roof measures are predominantly weak seatearth mudstone. The overlying mudstone layer is well bedded, with several bedding planes forming well-defined partings, which indicates more serious conditions for the next panels.

In retreat longwall mining, the geological conditions and geomechanical information are confirmed during panel development. Introducing such information, the proposed model is practically functional in predicting unstable zones by identifying the ranges of \(d_{\max}\) changes. The risk of encountering catastrophic roof failure and unexpected disasters is therefore significantly reduced, although not entirely removed.

6 Conclusion

Roof displacement, as an indicator for predicting unstable zones in longwall tailgates, is modeled through a TS fuzzy-based HiLoMoT method. The purpose is to predict the nonlinear relationship between geomechanical information and roof strata displacements. The displacements were monitored using dual height telltales installed at specific distances in the tailgate roadway under study at the Tabas longwall mine. Due to the high stress concentrations in longwall tailgates, timely prediction of any roof strata displacement is vital to safe and well-organized mining operations. Tailgate instabilities in the Tabas coal mine are recognized as a serious concern with adverse consequences, varying from production delays to catastrophic roof failures. The results were compared with those of ANFIS to examine the prediction capability of the proposed model. The performance of the three models of ANFIS, LoLiMoT, and HiLoMoT is evaluated by calculating the three indices of \(R^{2}\), VAF, and RMSE. Accordingly, the \(R^{2}\) values resulting from training the HiLoMoT, LoLiMoT, and ANFIS models are 0.976, 0.943, and 0.921, respectively, for the best-performing structure of each model. For unseen data, the \(R^{2}\) values resulting from testing the three models of HiLoMoT, LoLiMoT, and ANFIS are 0.952, 0.939, and 0.873, respectively. The maximum VAF and minimum RMSE are also obtained for the proposed HiLoMoT model: 97.57 and 0.0050 for the training data, and 95.13 and 0.0193 for the testing data. According to the results, the proposed HiLoMoT model predicts \(d_{\max}\) more accurately than the others, and its results are in agreement with the measured ones, as indicated by a high goodness of fit. However, it is impossible to predict any parameter related to rock structures with complete accuracy, owing to the inherent complexity and uncertainty of the geological conditions.
Nonetheless, if the trends of the displacements are recorded and their changes are related to the geomechanical information through an intelligent model, it is feasible to derive a reasonable judgment for stability prediction in similar conditions at the mine. Such a model may also serve as a precise and timely predictor, which can be updated as new historical data are introduced.