1 Introduction

Owing to the harsh and unpredictable working conditions in underground coal mines, many accidents have occurred (Qiao and Zeng 2011; Wang et al. 2013). Furthermore, according to the Mine Safety and Health Administration (MSHA), explosions of flammable gases have occurred in underground mines (MSHA 2018), reducing production in the affected mines. Therefore, there is a need for a system that continuously monitors and forecasts gas concentrations in the enclosed regions of underground coal mines, both for the safety of workers and for the early reopening of the enclosed site to resume production. Some traditional sensing and monitoring technologies are available for monitoring gas concentrations (Kumar et al. 2013; Mandal et al. 2013; Chaulya and Prasad 2016). However, these traditional methods cannot process the massive amounts of multidimensional data provided by various sensors.

In this study, IoT-enabled gas sensors (CH4, CO2, CO, O2, and H2) are deployed in a gas collecting chamber installed outside the wall of an enclosed site in an underground coal mine. The inlet of the metallic gas chamber is connected to a solenoid valve followed by a metallic pipe inserted deep into the fireside through the sealed-off wall. The solenoid valve opens automatically at a fixed time interval to collect the gas concentrations of the sealed-off area. The five gas sensors measure the respective gas concentrations and transmit them to the prediction system via a wireless local area network (WLAN). After 2 min of sampling, the solenoid valve closes automatically and reopens after a predefined programmable interval. The multidimensional data generated by the IoT-enabled gas sensors are sent to the prediction model, where the model is trained. After the training and validation processes are complete, the prediction model predicts the respective gas concentrations of the sealed-off area and sends the prediction result to the cloud storage. Finally, the mine management accesses the cloud storage from the surface control room and monitors the environmental condition of the sealed-off area.

Recently, Huang and Kuo (2018) developed a deep convolutional neural network (CNN)-long short-term memory (LSTM) method for weather forecasting in a smart city. Deep learning networks have also been broadly used in natural language processing (NLP), computer vision, and object recognition (He et al. 2016; Collobert and Weston 2008), and they process complex multidimensional data effectively (Rashid and Rehmani 2016; Jo and Khan 2018; Saeed et al. 2019). However, this method does not correlate the gas concentration, air velocity, and temperature parameters of the underground mine. Hence, it cannot accurately predict the concentrations of gases in the enclosed region. Therefore, the combination of a deep learning network and IoT technologies has been adopted to monitor the underground mine environment (Rashid and Rehmani 2016; Jo and Khan 2018; Muduli et al. 2019; Saeed et al. 2019).

To achieve this goal, a reliable and efficient deep learning model has been developed that can accurately predict the gas concentration in real time for the sealed-off area of an underground mine. The proposed prediction model combines t-distributed stochastic neighbor embedding (t-SNE), a variational auto-encoder (VAE), and a bidirectional LSTM (bi-LSTM) network, and is named the t-SNE_VAE_bi-LSTM model. The t-SNE algorithm preprocesses the multidimensional time series data of the CH4, CO2, CO, O2, and H2 gases and reduces its dimension. Subsequently, the model combines VAE and bi-LSTM layers, which extract useful information and predict the concentration of gases. Traditionally, several statistical models have relied on inefficient methods for extracting the implicit features of the data during the analysis phase (Liu et al. 2004; McKeen et al. 2007). The VAE can extract latent information and reduce data volume, improving the prediction accuracy of the gas concentration. The LSTM is based on a recurrent neural network (RNN) and predicts values over time intervals (Sundermeyer et al. 2012). The bi-LSTM connects its hidden layers in both the forward and the backward direction. The bi-LSTM receives its input from the output of the VAE layer, which generates the essential features of the data from the five gas sensors. The bidirectional LSTM model is trained using the present and past features of the gas sensor data, improving prediction accuracy. Thus, the VAE layer extracts the important features, and the bi-LSTM efficiently predicts the gas concentration in the enclosed region of an underground mine.

Therefore, this study aims to develop a unique model for predicting CH4, CO2, CO, O2, and H2 concentrations, known as the t-SNE_VAE_bi-LSTM model, by learning the in-depth characteristics of these gases. Furthermore, as a result of this study, the proposed model can predict other mine hazards, such as roof fall in underground mines and slope failure in opencast mines with minor modifications.

The following are the paper's main contributions:

  • CH4, CO2, CO, O2, and H2 concentration prediction helps to improve mine safety and early reopening of the sealed-off area to start production.

  • The state-of-the-art approach, namely deep learning algorithms, has been used to build a prediction model for CH4, CO2, CO, O2, and H2 concentrations.

  • An automated metallic gas chamber has been designed that regularly collects gas samples from an underground coal mine sealed-off area at fixed time intervals.

  • In the enclosed region of coal mines, a real-time system for monitoring and forecasting CH4, CO2, CO, O2, and H2 gases concentration has been designed.

  • The CH4, CO2, CO, O2, and H2 gas concentration prediction model has been proposed by combining t-SNE, VAE, and bi-LSTM models named the t-SNE_VAE_bi-LSTM model.

  • The proposed model's t-SNE method aims to lower the dimension of the collected gas concentration data; the proposed model's VAE layer seeks to retrieve the inner characteristics of the low-dimensional gas concentration data. Finally, the proposed model's bi-LSTM layer forecasts the concentrations of CH4, CO2, CO, O2, and H2 gases at regular intervals.

  • The developed system and prediction model have been validated by deploying the complete unit in a coal mine.

The rest of this paper has been arranged as follows. Section 2 presents a recent survey of the literature on the prediction of gas concentrations. Section 3 provides an overview of the automated and intelligent prediction system. The problem statement for the concentration of gases in the deep coal mine sealed-off region is presented in Sect. 4. Section 5 presents the proposed methodology for predicting real-time gas concentrations. Section 6 presents the detailed dataset, results, and comparative analysis of the different prediction models for gas concentration prediction. Finally, Sect. 7 brings the findings to a conclusion. Figure 1 depicts the paper's structure.

Fig. 1 Structure of the paper

2 Related works

Mine accidents have increased yearly due to abnormally increased gas concentration levels (Qiao and Zeng 2011; Wang et al. 2013) as underground coal mines extend to greater depths. Xia et al. (2016) have discussed sudden increases in gas levels and disaster control in coal mines. The gas concentration level typically changes abnormally before an underground mine accident occurs. Rodriguez et al. (2014) and Song et al. (2019) have also described such a situation and tried to determine the gas concentration of the underground mine but were unable to predict it. Different machine learning (ML) methods have also been applied by various researchers (Karaca et al. 2006; Xi et al. 2015) to predict gas concentration. Multiple traditional methods are available for predicting the concentration of gases in underground coal mines. The existing prediction methods can be divided into different categories, such as the machine learning forecasting approach (Karaca et al. 2006; Xi et al. 2015), the statistical prediction approach (Brooks et al. 2016), and the mathematical prediction approach (Sundermeyer et al. 2012). These approaches are not capable of processing the vast amounts of data generated by five gas sensors in real time, so it is challenging to employ them for actual forecasting of the gas level in a sealed-off area. To predict the gas emission of the working face, Ye et al. (2006), Yang et al. (2009), Chen et al. (2016), and Guo et al. (2018) have divided the gas emission into different sources, each with its own forecasting method. They formed equations for the gas emission of the different emission areas of a mine, which provided significant insight and assistance for gas extraction and ventilation design. However, this approach involves numerous variables, such as the rate of coal falling per unit of time and the amount of coal remaining in the goaf (Zheng et al. 2019). Thus, applying the above techniques to forecast future gas levels in the underground coal mine is not easy. To determine the law of gas migration at an active working face, researchers have studied the gas flow, gas diffusion, and gas migration equations through a large number of numerical simulations (Cao and Li 2017; Xia et al. 2017). However, the mesh division of the numerical model has a substantial effect on the results of such experiments, and the simulated gas extraction at a working face depends on a basic model with ideal boundary conditions. A mathematical model can therefore demonstrate the law of gas extraction, but it is difficult to use it for actual forecasting of different gas concentrations.

Many researchers have studied time series sensor data and developed several prediction models. For example, the chaotic time series (CHAOS) model (Zhang et al. 2007; Cheng et al. 2008; Liu 2010), the autoregressive integrated moving average (ARIMA) model (Rekhi et al. 2020), and the support vector regression (SVR) model (Kun et al. 2016) have been proposed for predicting the concentration of gases in underground coal mines. However, the prediction speed and accuracy of these models were insufficient for actual gas level monitoring. The LSTM neural network is a popular recurrent method for time series data prediction (Lyu et al. 2020). It can remember long-term historical data, which makes it useful in various applications such as speech recognition, emotion detection, and forecasting.

Many researchers have recently developed optimization algorithms. These algorithms described how vital features could be extracted from the gas concentration in underground mines. For example, Abualigah (2019) has proposed a novel classifier for classifying text documents. Further, particle swarm optimization algorithm and dimension reduction technique has been used to get the novel feature in low-dimensional space. This unique characteristic is then employed to boost productivity while lowering the computational cost of the text clustering (TC) technique.

Furthermore, Abualigah et al. (2021a) have proposed a mathematical optimizer to find a solution in an ample search space. It used the distribution behavior of the four basic mathematical operations. Similarly, Abualigah et al. (2021b) have developed a lightweight optimizer motivated by Aquila's behavior. The optimization procedures of the proposed algorithm are divided into four methodologies for selecting and discovering separate search spaces.

Altabeeb et al. (2021) have proposed a cooperative modified firefly algorithm to solve the capacitated vehicle routing problem (CVRP). The proposed method attempts to find transport routes with the shortest distance traveled. Abd Elaziz et al. (2021) have proposed a modified artificial ecosystem-based optimization (AEO) to solve task scheduling in the cloud environment. Task scheduling is critical, and optimizing scheduling for IoT task requests can enhance organizational quality and profitability. Hassan et al. (2021) have presented an improved slime mold algorithm (ISMA) and used it to effectively solve single- and bi-objective financial and emission dispatch (FED) problems while considering valve-point effects. Eid et al. (2021) have developed an improved marine predators algorithm (IMPA) that extends the earlier marine predators algorithm (MPA). The proposed enhancements provide faster convergence and avoid the possible local-minima instability of the earlier MPA. In addition, IMPA regulates the voltage and current injected by the distributed generation to minimize overall system losses and total voltage deviations. Şahin and Abualigah (2021) have proposed a deep neural network-based intrusion prevention technique that determines features using a grouping system. A deep neural network is also utilized to store the time sequence characteristics mapped from actual past data. The proposed approach used an impervious feature extraction model to improve the recognition capability of static analyses. Hati et al. (2019) have proposed an intelligent wireless framework to manage the network.

Dey et al. (2021a) and Kumari et al. (2021) have proposed a deep neural network to forecast mining risks and explosive states in the underground mining site. The proposed system indicates the safe condition of the working zone of the underground mines by correlating several hazards parameters. In addition, Dey et al. (2021b) have presented a safe architecture for training the model securely. Similarly, Dey et al. (2021c) have introduced a deep network-based secure communication channel in the mining site to secure communication in underground mines. Muduli et al. (2019) have conducted a comprehensive survey on deploying wireless sensor network technology in the enclosed region of the mine site. It understood the coal mine limits as well as other aspects of the operating zone. Jiang et al. (2018) have used a hazard adjustment approach based on a machine-learning algorithm to anticipate rock bolt incompetence in underground coal mines.

In the predictive training phase, the support vector machine (SVM) algorithm was used for various mine gas concentrations. Zhang et al. (2016) have optimized the weight of the artificial neural of the machine learning model and predicted gas level using the old and disorder principles. Deng et al. (2018) have developed a combined architecture of regression and swarm optimization to estimate the atmospheric pressure of unexpected combustion processes in the goaf region. Qiang and Pu (2018) have presented a technique for predicting short-term electricity supply. The preceding predictions are based on machine learning algorithms and swarm optimization. Zhao et al. (2020) have recommended a wastewater treatment plant based on artificial intelligence (AI). This framework quickly separated the toxic substances from the wastewater. In addition, AI was used to improve the efficiency and data processing in sewerage systems. Osarogiagbon et al. (2020) have presented a trained machine learning approach for the milling process. This approach smoothly recognized the numerous hazardous events that occurred, mainly during the milling process. Finally, Sharafati et al. (2020) have developed an enhanced data mining technique for forecasting effluent sewage's mean values and predictability.

Deep learning has largely overtaken standard machine learning models in recent times. A deep learning method automatically extracts the vital features from a sequence of data. Moreover, it compresses the input data through multiple layers, which reduces the over-fitting problem during model training. However, the accuracy of a plain deep neural network is insufficient for complex time series data, and it is unable to process massive amounts of data in real time. Hence, the t-SNE_VAE_bi-LSTM model has been proposed in this paper for the accurate and efficient real-time prediction of gases present in underground coal mines. Some of the recent research on hazard prediction and different optimization techniques is summarized in Table 1.

Table 1 Summary of recent prediction models developed for forecasting or optimization of process

3 System description

An automated and intelligent system is designed to monitor, in real time, the condition of a fire inside an enclosed region of the underground mining site. The fire status is tracked by predicting the concentrations of gases inside the enclosed region. The sealed-off area is defined as a part of the underground mine where a fire has occurred. The area has been sealed by constructing a wall to control the fire by cutting off the oxygen supply and to isolate the area from other working faces of the underground mining site. The fire intensity in the isolated area gradually decreases once the oxygen flow into this area is cut off. Mine fires and explosions take many lives and cause much property damage every year. However, current methods of analyzing sealed-off areas in mines involve slow, cumbersome manual processes and are prone to error. Thus, an automated gas sampling and prediction system is developed for the sealed-off area to track the status of the fire. The details of the system are depicted in Fig. 2. The system consists of a data acquisition system, WLAN, five gas sensors (CH4, CO2, CO, O2, and H2) fitted inside a box/chamber, two solenoid valves, a suction pump, and a prediction model; it collects gas concentrations from the sealed-off area at a fixed time interval. The detailed specification of the gas sensors is given in Table 2. The system is connected to a pipe fitted in a fire-stopping brick wall for sampling air from the sealed-off area. After each predefined time interval, the solenoid valve near the pipe opens, and the suction pump starts drawing air from the sealed-off area. The rear solenoid valve opens subsequently. The drawn air passes through the box/chamber, and the gas concentrations are measured by the sensors fitted inside the chamber. The measured gas concentrations are sent to the prediction model over WLAN. In the prediction model, the gas concentration is then predicted using t-SNE and VAE with the bi-LSTM model. After the prediction process, the prediction result is uploaded to the cloud storage over WLAN. The mine management accesses the prediction result from the cloud storage and continuously monitors the gas level in the enclosed region of the underground mining site. The measurement process continues for 2 min. Then the system stops operation until the next cycle starts.
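The timing of the sampling cycle described above can be sketched as follows. This is only an illustration under stated assumptions: the 2-min sampling duration comes from the text, while the function name `sampling_windows`, the 30-min interval, and the start time are hypothetical values chosen for the example, not parameters reported for the deployed system.

```python
from datetime import datetime, timedelta

# From the text: each measurement cycle lasts 2 min.
SAMPLING_DURATION = timedelta(minutes=2)


def sampling_windows(first_start, interval, n_cycles):
    """Return (valve_open, valve_close) times for n_cycles sampling cycles.

    `interval` is the programmable time between the starts of two cycles.
    Its value is not given in the text, so it is a hypothetical parameter here.
    """
    if interval < SAMPLING_DURATION:
        raise ValueError("interval must be at least the sampling duration")
    windows = []
    for cycle in range(n_cycles):
        valve_open = first_start + cycle * interval
        windows.append((valve_open, valve_open + SAMPLING_DURATION))
    return windows


# Hypothetical schedule: three cycles, one every 30 min.
windows = sampling_windows(datetime(2021, 1, 1, 8, 0), timedelta(minutes=30), 3)
for valve_open, valve_close in windows:
    print(valve_open.time(), "->", valve_close.time())
```

Between two windows the valves stay closed and the system is idle, which matches the stop-until-next-cycle behavior described in the paragraph.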

Fig. 2 System architecture of forecasting system of the underground mining site to monitor the gas level in the enclosed region

Table 2 Permissible limit in underground coal mine and specification of different gas sensors

4 Problem scenario

The previously defined models cannot effectively process complex and multidimensional time series sensor data. As a result, they cannot efficiently predict the gas concentration in the enclosed region of the underground mining site. Time series data from the five gas sensors are collected from the designed metallic gas chamber and uploaded to the prediction model to predict the gas concentration accurately. The accumulated time series data cover 150 days of CH4, CO2, CO, O2, and H2 concentrations and are complex, multidimensional, and noisy. Hence, the dimension and noise of the collected data need to be reduced. The important features extracted from the collected data are then used to predict the gas concentration efficiently. An appropriate deep learning method is employed to reduce dimension and noise and to extract the vital features of the multidimensional time series data. Therefore, t-SNE, VAE, and bi-LSTM neural networks have been utilized to predict gas concentration efficiently.

In this case, we consider a 2D matrix \(A(i,j)\) where the ith row denotes a group of different gases, and the jth column denotes a group of timestamps \(T = (t_{1} ,t_{2} , \ldots ,t_{d} )\) where \(t_{1} < t_{2} < \cdots < t_{d}\). For example, the element \(a_{{t_{1} ,1}}\) is the value of the CH4 concentration at the timestamp \(t_{1}\).

$$ A(i,j) = \begin{array}{c} \mathrm{CH}_{4}\;\; \cdots\;\; \mathrm{H}_{2} \\ \left( \begin{array}{ccc} a_{t_{1},1} & \cdots & a_{t_{1},n} \\ \vdots & \ddots & \vdots \\ a_{t_{d},1} & \cdots & a_{t_{d},n} \end{array} \right) \end{array} $$
(1)
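As a small illustration of Eq. (1), the sketch below stores the readings in a NumPy array laid out as the equation is displayed, with one row per timestamp and one column per gas. The readings are synthetic placeholders, and the names `GASES`, `ch4_at_t1`, and `h2_series` are introduced only for this example.

```python
import numpy as np

# Column order as displayed in Eq. (1): CH4 first, H2 last.
GASES = ["CH4", "CO2", "CO", "O2", "H2"]

rng = np.random.default_rng(0)
d = 6  # number of timestamps (illustrative; the collected data span 150 days)

# Synthetic placeholder readings, not measured values.
A = rng.uniform(0.0, 1.0, size=(d, len(GASES)))

ch4_at_t1 = A[0, GASES.index("CH4")]  # element a_{t_1,1}
h2_series = A[:, GASES.index("H2")]   # all H2 readings from t_1 to t_d
```

Keeping time along the first axis is a layout choice for this sketch; it makes each row one multidimensional sample, which is the form the dimension reduction step consumes.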

The first objective is to reduce the dimension of \(A(i,j)\) by mapping it to \(B(i,j)\) using the t-SNE method. The second objective is to extract the important features from \(B(i,j)\) by mapping it to \(C(i,j)\) using the VAE method. Finally, the bi-LSTM layer takes \(C(i,j)\) as input and predicts the future gas sensor values.

The main problems in the underground coal mine scenario are as follows:

  • There is no real-time CH4, CO2, CO, O2, and H2 concentration prediction system available for the enclosed region of the underground mining site to improve mine safety.

  • There is no automated system available to collect gas concentrations from an enclosed region of the underground mining site. Traditionally, gas samples were collected manually from the sealed-off site using a sampling bag.

  • Previously defined models are incapable of processing complex and multidimensional gas concentrations in real-time.

  • Earlier described models can neither reduce dimension and noise nor extract the vital features from the multidimensional gas concentration.

Therefore, an automated metallic gas chamber that collects different gases at a fixed time interval has been designed. Section 3 contains a detailed description of the automated system. In addition, a prediction model has been developed based on t-SNE, VAE, and bi-LSTM neural network techniques. The model can process complex and multidimensional gas concentration data in real time and extract vital features that improve mine safety by reducing the noise and dimension of the collected gas concentration.

5 The proposed method for real-time gas concentration prediction

The real-time gas level prediction method is depicted in Fig. 3. It is divided into three parts. The first part describes the dimension reduction of the gas sensor data. The second part is the VAE layer, where the data are de-noised and the critical features are extracted. The last part is the bi-LSTM-based prediction model, which is trained and validated using the past and future features received from the VAE layer.

Fig. 3 Real-time gas concentration prediction model

The VAE is a type of deep learning model used here for de-noising the sensor data and extracting the important features. Various researchers have already applied the VAE to video anomaly detection (Fan et al. 2020), pattern recognition (Ma et al. 2019), and feature learning (Zhang et al. 2019), with good results. Hence, the present study employs the VAE layer to extract the latent features from multiple sensor data. The bi-LSTM is a deep learning model that performs data processing, classification, and prediction. Both historical and future sensor values play an essential role in efficiently predicting gas concentration. The bi-LSTM model learns short- and long-term dependencies from both historical and future values without holding duplicate data. Many researchers have already applied bi-LSTM to speech identification (Ogawa and Hori 2017), classification (Zhao et al. 2018), biomedicine (Tutubalina et al. 2018), and sentiment analysis (Chen et al. 2017), where it efficiently generated prediction results from time series data. Therefore, the bi-LSTM method is employed for the prediction process. Together, the t-SNE, VAE, and bi-LSTM techniques enhance the prediction accuracy and decrease the model training time.

5.1 Preprocessing of input data

Because the data samples are complex, multidimensional, and noisy, it is difficult to train the prediction model on them directly; doing so generates inaccurate prediction results. As a result, the data preprocessing approach is critical in decreasing data imbalance and improving prediction outcomes.

In this paper, the t-SNE method is adopted to reduce the dimension of the gas sensor data nonlinearly. It shrinks the time series data by translating the Gaussian distribution of neighborhoods in the multidimensional space to the low-dimensional space. As a result, the t-SNE technique can successfully capture a considerable fraction of local and global structures on a wide scale (Maaten and Hinton 2008). Furthermore, the similarity between the multidimensional sensor data points and the low-dimensional space is maintained by measuring the Gaussian joint probabilities between two data points (Fooladgar and Duwig 2018).

Here, we consider the multidimensional gas concentration as a two-dimensional matrix with \(A(1,j) = (a_{{t_{1} }} ,a_{{t_{2} }} , \cdots ,a_{{t_{d} }} ) \in {\mathbb{R}}^{Z}\) where \(t_{1} < t_{2} < \cdots < t_{d}\). In the matrix \(A(1,j)\), the first row represents the CH4 gas concentration, and the jth column denotes the timestamp of the CH4 gas concentration. The detailed description of the matrix is given in Eq. (1). The remaining gas concentrations are represented similarly. The conditional probability \(p_{l|k}\) between two neighboring data points \(a_{{t_{l} }}\) and \(a_{{t_{k} }}\) at the timestamps \(t_{k} ,t_{l}\) is given by:

$$ p_{l|k} = \frac{{\exp ( - \left\| {a_{{t_{k} }} - a_{{t_{l} }} } \right\|^{2} /2\sigma_{{t_{k} }}^{2} )}}{{\sum\nolimits_{m \ne k} {\exp ( - \left\| {a_{{t_{k} }} - a_{{t_{m} }} } \right\|^{2} /2\sigma_{{t_{k} }}^{2} )} }} $$
(2)

where \(\sigma_{{t_{k} }}^{2}\) is the variance of the Gaussian centered on the data point \(a_{{t_{k} }}\), and \(a_{{t_{m} }}\) is another neighboring data point at the timestamp \(t_{m}\). With \(p_{k|k} = 0\), the joint probability \(P_{kl}\) of the multidimensional space is determined as:

$$ P_{kl} = \frac{{(p_{l|k} + p_{k|l} )}}{2d} $$
(3)

where d is the number of data points of the multidimensional gas concentration, each with a different timestamp. The low-dimensional gas concentration is represented as \(B(1,j) = (b_{{t_{1} }} ,b_{{t_{2} }} , \cdots ,b_{{t_{d} }} ) \in {\mathbb{R}}^{z}\) where \(z < Z\). Similar to the above, the Gaussian variance \(\sigma_{{t_{k} }}\) in the conditional probability \(q_{l|k}\) of the low-dimensional space is set to \(\frac{1}{\sqrt 2 }\). The joint probability \(Q_{kl}\) of the low-dimensional space is defined as:

$$ Q_{kl} = \frac{{(1 + \left\| {b_{{t_{k} }} - b_{{t_{l} }} } \right\|^{2} )^{ - 1} }}{{\sum\nolimits_{m \ne o} {(1 + \left\| {b_{{t_{m} }} - b_{{t_{o} }} } \right\|^{2} )^{ - 1} } }} $$
(4)

where \(b_{{t_{m} }} ,b_{{t_{o} }}\) are two other neighboring data points at the timestamps \(t_{m} ,t_{o}\). The t-SNE algorithm seeks a low-dimensional \(B(i,j)\) that minimizes the mismatch between P and Q, so that the low-dimensional gas concentration has, as closely as possible, the same joint probability distribution as the multidimensional gas concentration. The Kullback–Leibler (KL) divergence between the multidimensional and low-dimensional gas concentrations is used to measure the mismatch between P and Q. The loss function \(L\) between P and Q is calculated as:

$$ L(b_{{t_{1} }} ,b_{{t_{2} }} , \cdots b_{{t_{d} }} ) = \sum\limits_{k} {KL(P_{k} ||Q_{k} )} = \sum\limits_{k} {\sum\limits_{l} {P_{kl} \log \frac{{P_{kl} }}{{Q_{kl} }}} } $$
(5)
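Equations (2) to (5) can be sketched numerically as follows. This is a minimal NumPy illustration, not the implementation used in this study: the data are synthetic, the function names are introduced only for the example, and a single fixed `sigma` is assumed for all points, whereas t-SNE normally tunes \(\sigma_{t_{k}}\) per point through a perplexity search.

```python
import numpy as np


def pairwise_sq_dist(X):
    """Squared Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def joint_p(A, sigma=1.0):
    """Joint probabilities P_kl of Eq. (3), built from Eq. (2)."""
    d = A.shape[0]
    affinity = np.exp(-pairwise_sq_dist(A) / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)                        # p_{k|k} = 0
    cond = affinity / affinity.sum(axis=1, keepdims=True)  # cond[k, l] = p_{l|k}
    return (cond + cond.T) / (2.0 * d)


def joint_q(B):
    """Joint probabilities Q_kl of Eq. (4) (Student-t kernel)."""
    kernel = 1.0 / (1.0 + pairwise_sq_dist(B))
    np.fill_diagonal(kernel, 0.0)
    return kernel / kernel.sum()


def kl_loss(P, Q):
    """KL divergence of Eq. (5); zero entries of P contribute nothing."""
    mask = P > 0.0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))


rng = np.random.default_rng(0)
A = rng.normal(size=(12, 5))  # synthetic stand-in: 12 timestamps, 5 gases
B = rng.normal(size=(12, 2))  # an arbitrary 2-D embedding of the same points
P, Q = joint_p(A), joint_q(B)
loss = kl_loss(P, Q)
```

Both `P` and `Q` sum to one over all pairs, so the loss is a KL divergence between two proper distributions and is zero only when they coincide.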

The loss function L is minimized in the weight updating process using a gradient descent algorithm. The t-SNE algorithm's gradient is defined as:

$$ \frac{\partial L}{{\partial b_{{t_{k} }} }} = 4\sum\limits_{l} {(P_{kl} - Q_{kl} )} (b_{{t_{k} }} - b_{{t_{l} }} )(1 + \left\| {b_{{t_{k} }} - b_{{t_{l} }} } \right\|^{2} )^{ - 1} $$
(6)

The weight updating of Eq. (6) is derived as:

$$ b_{t}^{n} = b_{t}^{n - 1} + \eta \frac{\partial L}{{\partial b_{t} }} + \alpha (n)(b_{t}^{n - 1} - b_{t}^{n - 2} ) $$
(7)

where \(\eta\) is the learning rate and \(\alpha (n)\) is the momentum at iteration n. The dimensionality is reduced through the iterative process described in Eq. (7). The dimension reduction process for a multidimensional time series dataset using the t-SNE method is given in Algorithm 1.

Algorithm 1 Dimension reduction of a multidimensional time series dataset using the t-SNE method
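The iteration of Eqs. (6) and (7) can be sketched as below. This is a simplified NumPy illustration on synthetic data, not a reproduction of Algorithm 1. It assumes a fixed `sigma`, a constant momentum `alpha`, and small hypothetical values for `eta` and `n_iter`. Eq. (7) is written with a plus sign in front of the gradient term; the sketch subtracts the gradient so that the loss decreases, which is an assumption about how the update is meant to be applied.

```python
import numpy as np


def pairwise_sq_dist(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def joint_p(A, sigma=1.0):
    # Eqs. (2)-(3) with one fixed sigma (an assumption of this sketch).
    affinity = np.exp(-pairwise_sq_dist(A) / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    cond = affinity / affinity.sum(axis=1, keepdims=True)
    return (cond + cond.T) / (2.0 * A.shape[0])


def joint_q(B):
    # Eq. (4); the unnormalized kernel is reused by the gradient.
    kernel = 1.0 / (1.0 + pairwise_sq_dist(B))
    np.fill_diagonal(kernel, 0.0)
    return kernel / kernel.sum(), kernel


def kl_loss(P, Q):
    mask = P > 0.0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))


def gradient(P, Q, kernel, B):
    # Eq. (6): 4 * sum_l (P_kl - Q_kl) (b_k - b_l) (1 + ||b_k - b_l||^2)^-1
    coeff = (P - Q) * kernel
    return 4.0 * (coeff.sum(axis=1)[:, None] * B - coeff @ B)


def tsne_reduce(A, z=2, eta=1.0, alpha=0.5, n_iter=200, seed=0):
    """Map A (d, Z) to B (d, z) by gradient descent with momentum."""
    rng = np.random.default_rng(seed)
    P = joint_p(A)
    B = 1e-2 * rng.normal(size=(A.shape[0], z))  # small random start
    B_prev = B.copy()
    losses = []
    for _ in range(n_iter):
        Q, kernel = joint_q(B)
        losses.append(kl_loss(P, Q))
        step = -eta * gradient(P, Q, kernel, B) + alpha * (B - B_prev)
        B_prev, B = B, B + step
    losses.append(kl_loss(P, joint_q(B)[0]))
    return B, losses


rng = np.random.default_rng(1)
A = rng.normal(size=(20, 5))  # synthetic stand-in: 20 timestamps, 5 gases
B, losses = tsne_reduce(A)
```

Production implementations add per-point bandwidth selection, early exaggeration, and adaptive step sizes, none of which are shown here.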

5.2 Essential feature extraction using VAE layer

The VAE layer takes its input from the preprocessing layer. In this layer, essential features are extracted from the low-dimensional gas concentration \(B(i,j) \in {\mathbb{R}}^{z}\). The extracted dataset is passed to the bi-LSTM layer, which predicts the gas concentration. The VAE is a generative deep learning model whose working principle is similar to variational Bayesian learning. The VAE extracts essential features from the low-dimensional gas concentration and produces new information. Figure 4 represents the VAE layer. According to Fig. 4, the VAE layer is split into two components: an encoder and a decoder. The encoder creates the latent vector from the input dataset, thereby extracting the main features. The decoder reconstructs the original input dataset from the latent vector. The input dataset is denoted as \(B(i,j)\), the latent vector that holds the extracted main features is denoted as \(C(i,j)\), the encoder parameters are represented as \(\phi\), and the decoder parameters are denoted as \(\theta\). The encoder is written as \(q_{\phi } (C(i,j)|B(i,j))\), and the decoder is written as \(p_{\theta } (B(i,j)|C(i,j))\).

Fig. 4 The architecture of variational auto-encoder layer

The VAE training procedure is described as follows:

  (i) The encoder \(q_{\phi } (C(i,j)|B(i,j))\) takes the dataset \(B(i,j)\) as input. The encoder then generates a latent vector \(C(i,j)\) using two vectors: the mean vector \(\mu_{\phi } (B(i,j))\) and the variance vector \(\sigma_{\phi }^{2} (B(i,j))\).

  (ii) The latent vector \(C(i,j)\) is sampled from a Gaussian distribution defined by the mean vector \(\mu_{\phi } (B(i,j))\) and the variance vector \(\sigma_{\phi }^{2} (B(i,j))\). The reparameterization trick (Kingma and Welling 2013; Kingma et al. 2015) is used to sample \(C(i,j)\).

  (iii) The decoder \(p_{\theta } (B(i,j)|C(i,j))\) reconstructs \(B(i,j)\) from the latent vector \(C(i,j)\). The decoder's posterior distribution is assumed to be Gaussian in this case. The decoder directly computes the mean vector \(\mu_{\theta } (C(i,j))\) and the variance vector \(\sigma_{\theta }^{2} (C(i,j))\) to regenerate \(B(i,j)\).

  (iv) The lower bound of the marginal likelihood \(p_{\theta } (B(i,j))\) is used for calculating the gradient. Then, the parameters are updated in the backpropagation process.

In this paper, the VAE layer utilizes the Gaussian distribution to generate the essential features from the input dataset. The encoder (\(\phi\)) and decoder (\(\theta\)) parameters are trained by maximizing the marginal log-likelihood \(\log p_{\theta } (B(i,j))\), which is calculated as follows:

$$ \begin{aligned} \log p_{\theta } (B(i,j)) & = \log \int {p_{\theta } (B(i,j)|C(i,j))p(C(i,j))} dC \\ & = \log \int {q_{\phi } (C(i,j)|B(i,j))\frac{{p_{\theta } (B(i,j)|C(i,j))p(C(i,j))}}{{q_{\phi } (C(i,j)|B(i,j))}}} dC \\ \end{aligned} $$
(8)
$$ \ge \int {q_{\phi } (C(i,j)|B(i,j))\log \frac{{p_{\theta } (B(i,j)|C(i,j))p(C(i,j))}}{{q_{\phi } (C(i,j)|B(i,j))}}} dC $$
(9)
$$ \begin{aligned} & = \int {q_{\phi } (C(i,j)|B(i,j))\left\{ {\log \frac{p(C(i,j))}{{q_{\phi } (C(i,j)|B(i,j))}} + \log p_{\theta } (B(i,j)|C(i,j))} \right\}dC} \\ & = \int {q_{\phi } (C(i,j)|B(i,j))\log p_{\theta } (B(i,j)|C(i,j))dC} - \int {q_{\phi } (C(i,j)|B(i,j))\log \frac{{q_{\phi } (C(i,j)|B(i,j))}}{p(C(i,j))}dC} \\ & = E_{{C(i,j) \sim q_{\phi } (C(i,j)|B(i,j))}} [\log p_{\theta } (B(i,j)|C(i,j))] - KL(q_{\phi } (C(i,j)|B(i,j))||p(C(i,j))) \\ \end{aligned} $$
(10)

where \(p(C(i,j)) = \mathcal{N}(C(i,j);0,I)\) and \(p_{\theta } (B(i,j)|C(i,j)) = \mathcal{N}(B(i,j);\mu_{\theta } ,\sigma_{\theta }^{2} )\). The expectation in Eq. (10) is approximated by sampling. Assuming that the number of samples is L, the approximation is:

$$ \log p_{\theta } (B(i,j)) \cong \frac{1}{L}\sum\limits_{l = 1}^{L} {\log p_{\theta } (B(i,j)|C(i,j)^{l} )} - KL(q_{\phi } (C(i,j)|B(i,j))||p(C(i,j))) $$
(11)
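As a rough illustration of Eq. (11), the following sketch estimates the bound for a single input vector. It assumes a diagonal Gaussian encoder and a standard normal prior, for which the KL term has a well-known closed form. The function names, the toy decoder `toy_decode`, and the numeric inputs are hypothetical and are not taken from the implementation used in this paper.

```python
import math
import random


def kl_to_standard_normal(mu, var):
    """Closed-form KL(N(mu, diag(var)) || N(0, I)) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + v - 1.0 - math.log(v) for m, v in zip(mu, var))


def gaussian_log_likelihood(x, mean, var):
    """log N(x; mean, diag(var)), summed over the dimensions of x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def elbo_estimate(x, mu, var, decode, num_samples=10, rng=None):
    """Monte Carlo estimate of Eq. (11) with `num_samples` latent draws (L)."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(num_samples):
        # Draw one latent vector: C = mu + sigma * eps, eps ~ N(0, I)
        c = [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]
        dec_mean, dec_var = decode(c)
        total += gaussian_log_likelihood(x, dec_mean, dec_var)
    return total / num_samples - kl_to_standard_normal(mu, var)


def toy_decode(c):
    """Hypothetical decoder: identity mean with unit variance."""
    return list(c), [1.0] * len(c)


bound = elbo_estimate([0.2, 0.4], mu=[0.1, 0.3], var=[0.5, 0.5], decode=toy_decode)
```

The KL term vanishes when the encoder output equals the prior (zero mean, unit variance), which gives a quick sanity check of the closed form.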

The latent vector is computed from the mean vector \(\mu_{\phi } (B(i,j))\) and the variance vector \(\sigma_{\phi }^{2} (B(i,j))\) using the following reparameterization trick.

$$ C(i,j) = \mu_{\phi } (B(i,j)) + \sigma_{\phi } (B(i,j)) \odot \varepsilon ,\quad \varepsilon \sim \mathcal{N}(0,I) $$
(12)
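The sampling step in Eq. (12) can be sketched in a few lines. This is a minimal illustration in plain Python; the function name `reparameterize` and the numeric means and variances are hypothetical values chosen for the example.

```python
import math
import random


def reparameterize(mu, var, rng):
    """Sample C = mu + sigma * eps with eps ~ N(0, I), as in Eq. (12)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + math.sqrt(v) * e for m, v, e in zip(mu, var, eps)]


rng = random.Random(42)   # fixed seed only to make the sketch repeatable
mu = [0.5, -1.0, 2.0]     # hypothetical encoder means
var = [0.04, 0.25, 1.0]   # hypothetical encoder variances
sample = reparameterize(mu, var, rng)

# With zero variance the sample collapses to the mean vector.
deterministic = reparameterize(mu, [0.0, 0.0, 0.0], rng)
```

Because the randomness enters only through the noise term, the sample remains a differentiable function of the mean and variance, which is what allows gradients to flow back to the encoder.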

Finally, the VAE loss is defined as follows:

$$ \zeta (\theta ,\phi ,B(i,j)) = - \log p_{\theta } (B(i,j)) $$
(13)

5.3 Prediction layer based on bi-LSTM

In this layer, the real-time prediction result is generated using bi-LSTM. The LSTM layer works on the same principle as the recurrent neural network (RNN) model: it maintains one hidden layer, followed by a regular feed-forward output layer. A traditional RNN cannot resolve the vanishing gradient and long-term dependency problems, whereas LSTM handles both efficiently. The long-term dependency problem arises as the time interval in time series data grows: newly learned information can no longer be connected to information from the distant past, which is associated with the vanishing gradient problem. Both the historical and the future characteristics of time series data are helpful in the prediction process, and a model built on both predicts the future concentration of gases more effectively. However, the hidden layer of a standard LSTM contains features from the historical data only. A bidirectional LSTM model is therefore used in this research. The model is trained using both historical and future features of the time series data to predict the gas concentration. The bi-LSTM prediction model is shown in Fig. 5. The left side of Fig. 5 shows the bi-LSTM architecture, and the right side shows the LSTM neural network.

Fig. 5 The proposed prediction model's architecture for estimating the gas level of an enclosed region in an underground mine

The LSTM comprises an input gate, a forget gate, and an output gate, each of which applies the logistic nonlinearity \(\sigma\). The input gate controls the input data, determining how much of the input time series is read. The VAE model extracts the feature \(C_{(t,t - 1, \cdots ,t - l)}\) covering the l hours before the timestamp T and passes it to the bi-LSTM model as input. This paper aims to predict the gas concentration of an enclosed region N hours after the timestamp T. Both l and N are preset time intervals. The input gate and the cell state are calculated using the following equations:

$$ i_{t} = \sigma (U_{i} h_{t - 1} + W_{i} x_{t} + b_{i} ) $$
(14)
$$ c_{t} = f_{t} *c_{t - 1} + i_{t} *\tanh (U_{c} h_{t - 1} + W_{c} x_{t} + b_{c} ) $$
(15)

where \(\sigma\) denotes the sigmoid function, and \(i_{t}\), \(c_{t}\), and \(f_{t}\) represent the input gate, the cell state vector, and the forget gate, respectively. \(h_{t}\) is the hidden state vector of the bi-LSTM neural network. \(U_{i}\) and \(U_{c}\) are the weight matrices applied to the hidden state \(h_{t - 1}\). \(W_{i}\) and \(W_{c}\) are the weight matrices of the input gate and the cell state for the input \(x_{t}\) of the bi-LSTM neural network. \(b_{i}\) and \(b_{c}\) are bias vectors.

The forget gate determines how much of the stored state information is discarded; in Eq. (15) it scales the previous cell state \(c_{t - 1}\). The forget gate is defined as:

$$ f_{t} = \sigma (U_{f} h_{t - 1} + W_{f} x_{t} + b_{f} ) $$
(16)

where \(\sigma\) denotes the sigmoid function, \(h_{t - 1}\) is the hidden state vector of the previous time step, and the subscript t denotes the timestamp. \(U_{f}\) is the weight matrix applied to the hidden state \(h_{t - 1}\). \(W_{f}\) is the weight matrix of the forget gate for the input \(x_{t}\) of the bi-LSTM neural network. \(b_{f}\) is a bias vector.

The output gate determines the hidden state \(h_{t}\), which is passed to the next time step and from which the gas concentration is finally predicted. The output gate is defined as:

$$ o_{t} = \sigma (U_{o} h_{t - 1} + W_{o} x_{t} + b_{o} ) $$
(17)
$$ h_{t} = o_{t} * \tanh (c_{t} ) $$
(18)

where \(\sigma\) denotes the sigmoid function. \(U_{o}\) is the weight matrix applied to the hidden state \(h_{t - 1}\). \(W_{o}\) is the weight matrix of the output gate for the input \(x_{t}\) of the bi-LSTM neural network. \(b_{o}\) is a bias vector.
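To make the gate equations concrete, the sketch below performs a single LSTM time step following Eqs. (14) to (18). It is an illustrative plain-Python version using lists for vectors and matrices; the function names, the `params` dictionary layout, and the all-zero example weights are assumptions made for this example only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(U, h, W, x, b):
    """Compute U h + W x + b for list-of-lists matrices and list vectors."""
    return [
        sum(u * hj for u, hj in zip(U_row, h))
        + sum(w * xj for w, xj in zip(W_row, x))
        + bk
        for U_row, W_row, bk in zip(U, W, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""

    def pre_activation(name):
        return affine(params["U" + name], h_prev,
                      params["W" + name], x_t, params["b" + name])

    i_t = [sigmoid(z) for z in pre_activation("i")]    # input gate, Eq. (14)
    f_t = [sigmoid(z) for z in pre_activation("f")]    # forget gate, Eq. (16)
    o_t = [sigmoid(z) for z in pre_activation("o")]    # output gate, Eq. (17)
    g_t = [math.tanh(z) for z in pre_activation("c")]  # candidate term in Eq. (15)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]  # Eq. (15)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                  # Eq. (18)
    return h_t, c_t


# Hypothetical sizes and all-zero weights, chosen so the result is easy to verify.
hidden, inputs = 2, 1
params = {}
for name in "ifoc":
    params["U" + name] = [[0.0] * hidden for _ in range(hidden)]
    params["W" + name] = [[0.0] * inputs for _ in range(hidden)]
    params["b" + name] = [0.0] * hidden

h_t, c_t = lstm_step([0.3], [0.0, 0.0], [1.0, -1.0], params)
```

With zero weights every gate evaluates to 0.5 and the candidate term to 0, so the cell state is simply halved. This shows how the forget gate scales the previous cell state in Eq. (15).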

The proposed bi-LSTM neural network model analyses time series values more effectively than a unidirectional LSTM neural network model because it processes the data in both the forward and backward directions. Both historical and future time interval values can affect the forecast of the present value, so features drawn from both directions can predict gas concentration more accurately. The backward pass stores its information in a backward-direction vector, which complements the forward pass; combining the backward and forward information enhances the prediction result. Figure 6 shows the training process of the bi-LSTM model. Here, time series of different gas sensor data are selected for the model training process. The forward LSTM takes the input from \(t = 1\) to 2T, and the backward LSTM takes the input from \(t = 2T\) to 1. The combination of the forward and backward LSTM outputs is then used to predict the gas concentration more accurately.
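The forward and backward passes described above can be sketched as follows. To keep the example short, a scalar tanh recurrence stands in for the full LSTM cell, so this illustrates only the bidirectional wiring, not the model used in this paper. The function names and weight values are hypothetical.

```python
import math


def simple_step(x_t, h_prev, w_x, w_h):
    """Stand-in recurrent update (a scalar tanh RNN, not a full LSTM cell)."""
    return math.tanh(w_x * x_t + w_h * h_prev)


def bidirectional_pass(series, fwd_weights, bwd_weights):
    """Return, for each time step, the pair (forward state, backward state)."""
    n = len(series)
    forward, backward = [0.0] * n, [0.0] * n

    h = 0.0
    for t in range(n):             # first time step to last
        h = simple_step(series[t], h, *fwd_weights)
        forward[t] = h

    h = 0.0
    for t in reversed(range(n)):   # last time step to first
        h = simple_step(series[t], h, *bwd_weights)
        backward[t] = h

    # Pairing the two states gives each time step access to past and future context.
    return list(zip(forward, backward))


states = bidirectional_pass([0.1, 0.4, 0.3, 0.8], (1.0, 0.5), (1.0, 0.5))
```

Each element of `states` holds a summary of the inputs up to that time step (forward) and of the inputs from that time step onward (backward), which is the information a downstream output layer would combine.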

Fig. 6 The training example of the bi-LSTM model

6 Experimental results

6.1 Dataset

The experiment was conducted at an Indian underground coal mining site to determine the prediction accuracy for CH4, CO2, CO, O2, and H2 gases using the proposed model. The dataset contains hourly gas concentrations of a sealed-off area from September 12, 2019, to February 12, 2020, as depicted in Fig. 7. The \(\mathrm{X}\)-axis of Fig. 7 represents time at one-hour intervals from September 12, 2019 to February 12, 2020, and the \(\mathrm{Y}\)-axis represents the corresponding concentration values of the different gases. Table 3 displays the names and units of the input variables.

Fig. 7 Hourly time interval concentration of CH4, CO2, CO, O2, and H2 gases (from left to right, top to bottom) from September 12, 2019 to February 12, 2020

Table 3 Input variables of the dataset

The t-SNE_VAE_bi-LSTM, ARIMA, and CHAOS models were trained using the collected data. After training, gas concentrations were predicted using the trained proposed model, and its effectiveness was compared with that of the existing ARIMA and CHAOS models.

In the experiment, 80\(\%\) of the collected data were used for model training, and the remaining 20\(\%\) were used to test the system. Table 4 gives the details of how the collected dataset was divided. The training dataset of gas concentration was used to train the model over 300 iterations. In each iteration, the performance of the model was optimized by updating the parameters from the error gradient. In the model's validation process, the training parameters were adjusted to increase the trained model's generalization capability and to reduce over-fitting by dropping some training parameters. Figure 8 shows the training loss and validation loss of the trained t-SNE_VAE_bi-LSTM model. Finally, the testing process demonstrated the effectiveness of the trained model, as described in Sect. 6.2.
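A minimal sketch of the 80/20 division is given below. It assumes the split is chronological, with the earliest 80% of the hourly records used for training and the most recent 20% held out for testing, which is consistent with the test period falling at the end of the recording window. The function name and the placeholder data are hypothetical.

```python
def chronological_split(samples, train_fraction=0.8):
    """Split a time-ordered sequence into leading train and trailing test parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


hourly_readings = list(range(100))  # placeholder for time-ordered sensor rows
train, test = chronological_split(hourly_readings)
```

Keeping the order intact, rather than shuffling, prevents future observations from leaking into the training set of a forecasting model.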

Table 4 Details of the dissemination process of the concentration of gases

Fig. 8 The training and validation losses of the proposed model

6.2 Prediction result

The experiment was conducted on the proposed prediction model to evaluate its prediction accuracy, which was compared with that of two traditional models, namely ARIMA and CHAOS. The input gas concentration was passed to the proposed model, which produced the forecast gas concentration. The mean squared error (MSE) and mean absolute error (MAE) were used to compare the t-SNE_VAE_bi-LSTM model with the ARIMA and CHAOS models. The RMSE (the square root of the MSE) and the MAE are defined as:

$$ RMSE = \sqrt {\frac{{\sum\limits_{i = 1}^{l} {(A_{i} - P_{i} )^{2} } }}{l}} $$
(19)
$$ MAE = \frac{{\sum\limits_{i = 1}^{l} {\left| {A_{i} - P_{i} } \right|} }}{l} $$
(20)

where \(A_{i}\) is the actual value of the CH4, CO2, CO, O2, or H2 gas concentration, \(P_{i}\) is the corresponding predicted value, and i indexes the l observations being compared. Smaller RMSE and MAE values indicate higher forecasting accuracy of the prediction model.
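The two error measures can be computed as in the sketch below. It uses the standard definition of the MAE, the mean of the absolute errors with no square root, and the RMSE as the square root of the mean squared error. The function names and the sample values are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [1.0, 2.0, 3.0, 4.0]      # hypothetical measured concentrations
predicted = [1.0, 2.0, 3.0, 6.0]   # hypothetical forecasts
print(rmse(actual, predicted))     # 1.0
print(mae(actual, predicted))      # 0.5
```

The single large error in the last sample raises the RMSE more than the MAE, which reflects the greater weight the RMSE places on large deviations.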

Before the model training process, each gas concentration was preprocessed, as described in Sect. 5.1. After preprocessing, each value of the CH4, CO2, CO, O2, and H2 was normalized to [0, 1]. The normalization process is described below:

$$ Normalize_{Variable} = \frac{V - \min (V)}{{\max (V) - \min (V)}} $$
(21)

where V denotes the gas concentration. After normalization, the normalized gas concentrations were sent to the VAE and bi-LSTM models for training. The training process ran for three hundred iterations. A backpropagation method was employed to minimize the training error, and the LSTM model was optimized using the Adam optimizer. The batch size was set to 64 samples per iteration throughout the training phase, and the learning rate was set to \(10^{-3}\). Table 5 gives the training parameters used in the VAE and bi-LSTM models.
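The min-max normalization in Eq. (21) can be sketched as follows. The function name and the sample values are hypothetical, and the handling of a constant series (where the denominator would be zero) is an assumption added for robustness rather than something stated in this paper.

```python
def min_max_normalize(values):
    """Scale a sequence of readings to the range [0, 1] as in Eq. (21)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant series: the ratio is undefined, so map every value to 0.0
        # (an assumption made for this sketch).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


readings = [2.0, 4.0, 6.0]            # hypothetical gas concentrations
print(min_max_normalize(readings))    # [0.0, 0.5, 1.0]
```

In practice, the minimum and maximum would typically be taken from the training data and reused for the test data so that both are scaled consistently.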

Table 5 Training parameters used in VAE and bi-LSTM models

The correlations among the CH4, CO2, CO, O2, and H2 concentrations were verified in the validation process. The prediction model was then trained in two ways: using each of the CH4, CO2, CO, O2, and H2 gas concentrations alone (before correlation) and using the correlated gas concentrations (after correlation). After training, the correlated prediction accuracy of the proposed t-SNE_VAE_bi-LSTM method for the five gases was compared with that of the existing ARIMA and CHAOS models. Figure 9 shows the real value versus the correlated prediction value of CH4, CO2, CO, O2, and H2 for the t-SNE_VAE_bi-LSTM, ARIMA, and CHAOS models over 739 h (from January 13, 2020, to February 12, 2020) of testing data in the forecasting phase. The X-axis of Fig. 9 denotes the time interval (in hours), and the Y-axis indicates the predicted level of CH4, CO2, CO, O2, and H2 gases. The color bar in the upper part of each panel identifies the observed values and the values predicted by the different models. The experimental results indicated that the t-SNE_VAE_bi-LSTM model achieved better accuracy than the ARIMA and CHAOS models in the forecasting phase.

Fig. 9 True value versus correlated prediction value of CH4, CO2, CO, O2, and H2 (from top to bottom) using t-SNE_VAE_bi-LSTM, ARIMA, and CHAOS models over 739 h in the forecasting phase

Table 6 gives the standard deviation of the percentage difference of the predicted CH4 gas concentrations for the t-SNE_VAE_bi-LSTM, ARIMA, and CHAOS models from January 13, 2020 to February 12, 2020. It covers 739 h of predicted data from the three models, with each row averaging 24 h of data. The standard deviation was calculated from the percentage difference between the actual and predicted CH4 gas concentrations for each of the three models and is reported in the last row of Table 6. Similarly, Tables A1–A4 in the electronic supplementary material give the standard deviations of the three models for the CO2, CO, O2, and H2 concentrations.

Table 6 Standard deviation of percentage difference of the predicted result of CH4 gas concentration for the t-SNE_VAE_bi-LSTM, ARIMA, and CHAOS models from January 13, 2020 to February 12, 2020

Figure 10 compares the standard deviation of the percentage difference for CH4, CO2, CO, O2, and H2 gas concentrations for the proposed model and the ARIMA and CHAOS models. For CH4 prediction, the standard deviations of the proposed model, ARIMA, and CHAOS models were 5.05%, 8.88%, and 8.89%, respectively. Similarly, the values were 4.48%, 5.65%, and 5.99% for CO2; 4.28%, 4.30%, and 4.62% for CO; 10.85%, 43.0%, and 43.51% for O2; and 6.94%, 8.49%, and 8.25% for H2. Thus, the proposed model has a lower standard deviation than the ARIMA and CHAOS models for every gas. Consequently, the proposed model is more accurate than the ARIMA and CHAOS models.
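The standard deviation of the percentage difference could be computed along the following lines. This is a sketch under two assumptions not specified in this paper: the percentage difference is taken relative to the actual value, and the sample (n − 1) standard deviation is used. The function names and data are illustrative.

```python
import statistics


def percentage_differences(actual, predicted):
    """Percentage difference of each prediction relative to the actual value."""
    # Assumes no actual value is zero; otherwise the ratio is undefined.
    return [100.0 * (p - a) / a for a, p in zip(actual, predicted)]


def std_of_percentage_difference(actual, predicted):
    """Sample standard deviation of the percentage differences."""
    return statistics.stdev(percentage_differences(actual, predicted))


actual = [2.0, 2.0, 2.0]       # hypothetical measured concentrations
predicted = [2.5, 1.5, 2.0]    # hypothetical forecasts
spread = std_of_percentage_difference(actual, predicted)
```

For this toy input the percentage differences are 25, −25, and 0, so the sample standard deviation is 25. Using `statistics.pstdev` instead would give the population version.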

Fig. 10 Standard deviation of percentage difference of the measured and predicted values by the proposed model, ARIMA and CHAOS models

The mean square error (MSE) and mean absolute error (MAE) of the correlated prediction values are compared for the proposed t-SNE_VAE_bi-LSTM model and the ARIMA and CHAOS models. Table 7 gives the MSE and MAE results of the three models. For CH4, CO2, CO, O2, and H2, respectively, the MSE values were 0.077, 0.998, 0.077, 0.298, and 0.233 for the proposed t-SNE_VAE_bi-LSTM model; 0.106, 1.035, 0.169, 2.179, and 1.468 for the ARIMA model; and 0.146, 1.017, 0.169, 2.190, and 1.433 for the CHAOS model. The corresponding MAE values were 0.369, 1.018, 0.296, 0.58, and 0.549 for the proposed t-SNE_VAE_bi-LSTM model; 0.489, 1.082, 0.412, 1.476, and 1.211 for the ARIMA model; and 0.510, 1.091, 0.411, 1.480, and 1.191 for the CHAOS model. Figures 11 and 12 present the comparative analysis of the MSE and MAE results of the proposed t-SNE_VAE_bi-LSTM, ARIMA, and CHAOS models. Figure 11 indicates that the MSE of the proposed t-SNE_VAE_bi-LSTM model for the CH4 forecast is 0.029 and 0.069 lower than that of the ARIMA and CHAOS models, respectively. Similarly, the MSE of the proposed model is lower than that of the ARIMA and CHAOS models by 0.037 and 0.019 for CO2; 0.092 and 0.092 for CO; 1.881 and 1.892 for O2; and 1.235 and 1.200 for H2. Figure 12 shows that the MAE of the proposed t-SNE_VAE_bi-LSTM model for the CH4 forecast is 0.120 and 0.141 lower than that of the ARIMA and CHAOS models, respectively. Similarly, the MAE of the proposed model is lower than that of the ARIMA and CHAOS models by 0.064 and 0.073 for CO2; 0.116 and 0.115 for CO; 0.896 and 0.900 for O2; and 0.662 and 0.642 for H2. Prediction accuracy increases as the MSE and MAE decrease. Figures 11 and 12 clearly show that the t-SNE_VAE_bi-LSTM model has lower MSE and MAE values than the ARIMA and CHAOS models. Thus, the proposed t-SNE_VAE_bi-LSTM model achieved better prediction accuracy than the ARIMA and CHAOS models.

Table 7 MSE and MAE values of the proposed model, ARIMA and CHAOS models in the forecasting phase

Fig. 11 MSE value of the proposed model, ARIMA and CHAOS models in the prediction process after correlation

Fig. 12 MAE value of the proposed model, ARIMA and CHAOS models in the prediction process after correlation

The proposed t-SNE_VAE_bi-LSTM model was trained using the CH4, CO2, CO, O2, and H2 gas concentrations from the dataset and the respective correlation value for each gas concentration. Figure 13 depicts the actual measured values versus the predictions made before and after correlation for the CH4, CO2, CO, O2, and H2 gas concentrations using the t-SNE_VAE_bi-LSTM, ARIMA, and CHAOS models over 739 h in the forecasting phase, where the X-axis represents the time interval (h), and the Y-axis represents the individual gas level.

Fig. 13 Actual measured values versus before and after the correlated prediction of CH4, CO2, CO, O2, and H2 concentrations (from top to bottom) using t-SNE_VAE_bi-LSTM models during the forecasting phase

The mean square error (MSE) and mean absolute error (MAE) of the proposed t-SNE_VAE_bi-LSTM model are compared before and after correlation. Table 8 gives the MSE and MAE results of the forecast values of the five gases before and after correlation. Before correlation, the MSE results of CH4, CO2, CO, O2, and H2 were 0.094, 1.073, 0.077, 0.333, and 0.240, respectively. After correlation, the MSE results of CH4, CO2, CO, O2, and H2 were 0.077, 0.998, 0.077, 0.298, and 0.233, respectively. Before correlation, the MAE results of CH4, CO2, CO, O2, and H2 were 0.378, 1.053, 0.296, 0.596, and 0.550, respectively. After correlation, the MAE results of CH4, CO2, CO, O2, and H2 were 0.369, 1.018, 0.296, 0.581, and 0.549, respectively. Figures 14 and 15 present the comparative analysis of the MSE and MAE results before and after correlation for the t-SNE_VAE_bi-LSTM based forecasting model. Figure 14 shows that the MSE results after correlation were 0.017, 0.075, 0.000, 0.035, and 0.007 lower than before correlation for the CH4, CO2, CO, O2, and H2 forecasts, respectively, of the t-SNE_VAE_bi-LSTM model. Figure 15 shows that the MAE results after correlation were 0.009, 0.035, 0.000, 0.015, and 0.001 lower than before correlation for the CH4, CO2, CO, O2, and H2 forecasts, respectively, of the t-SNE_VAE_bi-LSTM model. Lower MSE and MAE values indicate higher accuracy of the t-SNE_VAE_bi-LSTM model in the forecasting phase. Figures 14 and 15 show that the MSE and MAE results after correlation were lower than or equal to those before correlation for every gas. Thus, after correlating the five gas concentrations, the t-SNE_VAE_bi-LSTM model achieved better accuracy in the prediction process.

Table 8 MSE and MAE values of the t-SNE_VAE_bi-LSTM model before and after correlations

Fig. 14 MSE value before and after correlations of the proposed t-SNE_VAE_bi-LSTM model

Fig. 15 MAE value before and after correlations of the proposed t-SNE_VAE_bi-LSTM model

7 Conclusions

A novel prediction technique has been proposed to predict the CH4, CO2, CO, O2, and H2 gas concentrations in the enclosed region of an underground coal-mining site. The five gas values were correlated during the training phases, and this correlation increased the forecasting precision of the proposed model. Before training, the five gas concentrations were preprocessed using the t-SNE algorithm, which reduced the dimension of the gas concentration data. The preprocessed input data were given to the VAE layer, where the essential features were extracted, improving the efficiency of the prediction model. The output of the VAE was sent to the bi-LSTM model, where the actual forecasting model was trained to predict the sealed-off area's gas concentration. The forward and backward directions in bi-LSTM handled the time interval values efficiently and increased the forecasting precision. The forecasting results show that the proposed model has lower MSE and MAE values than the ARIMA and CHAOS models. Thus, the proposed model may be utilized for online monitoring and prediction of gas concentrations in the enclosed region of an underground coal mining site.

Future work includes predicting other mine hazards, such as roof fall in underground mines and slope failure in opencast mines.