1 Introduction

The Internet of Things and Industry 4.0 have become a reality in the mining industry. A major step in the advancement of underground operations, one that completed automation in the field, is the convergence of industrial systems with advanced computing, analytics, low-cost sensing and new levels of connectivity. Smart sensor technologies and advanced analysis techniques play an important role in monitoring and improving the mining process [11].

Automation has enabled access to very detailed data characterizing the operation of machines and devices (stored in monitoring systems). Longwall automation and monitoring systems allow a closer look at the ongoing processes underground. A vast amount of data is generated that should be used and handled more efficiently in a modern mining operation [4]. Nowadays, the analysis of collected data is mainly based on data-oriented techniques: BI techniques for operational reporting, as well as more advanced data mining and machine learning techniques for predictive maintenance [8]. To acquire new knowledge about the ongoing processes underground, we proposed a process-oriented analysis of the gathered data [2].

In this paper we present example data from a low-level machinery monitoring system used in an underground mine, which can be used for modelling and analysing the mining process carried out in a longwall face.

Our work proposes an original extension of the analysis of data from a low-level longwall machinery monitoring system with process mining techniques. To the best of the authors’ knowledge, it is the first attempt to use process mining in the mining domain [6].

The basic challenge arising from the proposed extension is the creation of an event log suitable for process mining, containing [1]: the timestamp of an activity, the activity name and the case id. This is not a trivial task, since the case id and the activities (process stages) are not given directly in the raw data from the low-level longwall machinery monitoring system. Moreover, there is no established procedure for identifying the case id in raw data from a longwall monitoring system, since, to our knowledge, process mining has not previously been applied in the mining domain.

To address the mentioned challenges we prepared two procedures for data processing:

  1. To define activity names, we proposed a mixture of supervised and unsupervised data mining techniques combined with domain knowledge, presented in more detail in [2]. That proposal includes, among others, data cleaning, clustering and classification for labelling the process stages in the raw data.

  2. To identify the case id, we propose the heuristic approach presented in this paper. Our solution is an example of how to handle raw data from a cyclic process in a specific production domain in which the start and the end of a case are not clearly marked.

Both procedures are written in R, mainly using the libraries dplyr, arules, cluster, forecast, CHAID and rpart.

The paper is structured as follows: Sect. 2 describes the mining process. Sect. 3 presents the identification of the case id in raw data. An example of the created event log is described in Sect. 4. Conclusions are presented in Sect. 5.

2 Process Description

The mining process can be defined as a collection of mining, logistics and transport operations. One of its most complex and difficult forms is underground mining, characterized by changeable geological and mining conditions as well as natural hazards that do not occur on the surface. The mining process in the longwall system is of particular interest because it is performed by machines and devices that move within a workspace and in relation to each other.

Main longwall equipment includes (Fig. 1): a shearer (A), an armoured longwall conveyor (B) and mechanized supports (C).

Fig. 1. Longwall machinery (https://famur.com/upload/2016/09/FAMUR_01-1.jpg)

Each of these machines carries out its own operation process; consequently, the mining process in a longwall face can be seen as a collection of machinery processes. The mining process can include up to a hundred such processes (depending on the dimensions of the mining excavation and the number of mechanized supports).

In this paper we focus on the operation process of the main longwall machine, namely the shearer. The operation of the shearer determines the cycle of the whole mining process [12]; therefore, it is the most intuitive choice for the case id in an event log. The theoretical shearer operation process is presented in Fig. 2.

Fig. 2. Example cycle of shearer operation. Source: based on [10]

In general, a shearer operation cycle consists of several characteristic phases. First, the shearer starts cutting from the driver unit side (1). The next phase is indentation, in which the shearer cuts towards the turning station for a distance of 30–40 m. The longwall conveyor is shifted together with the movement of the shearer (2). The third step is cutting towards the driver unit side, i.e. longwall cleaning (3). The powered roof support moves along with the shearer. In the next phase the shearer cuts without loading for a distance of 30–40 m (4), after which it cuts along the whole longwall up to the turning station. The conveyor and the powered roof support are moved along with the shearer (5, 6, 7).

The basic indicator of the shearer’s movement is the value of the “Location in the longwall” variable. The ideal model of the shearer operation with activity names is presented in Fig. 3.

Fig. 3. Example cycles of shearer operation in time dimension

The real location of the shearer in the raw data is presented in Fig. 4.

Fig. 4. Example cycles of the shearer operation

The figure illustrates two main challenges in modelling the mining process on the basis of real data: data quality and cycle variability. The first challenge stems mainly from technical problems in transferring data from the machinery to the surface, especially in the case of power-off events and the retrieval of data from the machinery’s local data containers. The second challenge is strictly related to the mining and geological conditions in which the process is carried out.

It should also be mentioned that the raw data contain various quantitative and qualitative (mostly binary) variables that describe the process stages only indirectly, not as activity names. Building event logs on top of them requires considerable analytic effort (activity recognition, choice of abstraction level, etc.). Our contribution in this area is presented in the following sections.

3 Identification of Case ID in Raw Data

In this section we present issues related to case id identification for the purpose of creating an event log from the shearer operation data of a selected hard coal mine. The raw data for this process comprise 2.5 million records covering one month, obtained from one of the Polish mining companies.

In Table 1 the selected variables characterizing the shearer operation are presented.

Table 1. Selected variables characterizing the shearer operation

The identification of the shearer’s work cycles (case id) was mainly based on the analysis of the attributes “Location in the longwall” (distance below 5 m and over 135 m) and “Shearer speed” (equal to 0 m/s).

The first approach to identifying the start and end of a cycle was based on the classic analysis of local minima and maxima. This approach did not yield satisfactory results, mainly because of the large local variability of the process.
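The limitation of the classic approach can be illustrated with a minimal Python sketch (the authors' procedures are written in R; this block is only an illustration). The position values are hypothetical and the strict neighbour comparison is an assumed, simplest form of local extremum detection. Even a single pass along the face yields many spurious extrema when the shearer briefly reverses.

```python
def local_extrema(values):
    """Return (minima, maxima) as index lists, using strict neighbour comparison."""
    minima, maxima = [], []
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if cur < prev and cur < nxt:
            minima.append(i)
        elif cur > prev and cur > nxt:
            maxima.append(i)
    return minima, maxima


# Hypothetical shearer positions (m): one pass along the face and back,
# with small local reversals on the way.
positions = [2, 30, 28, 60, 58, 90, 140, 100, 102, 50, 52, 3]
minima, maxima = local_extrema(positions)

# Only index 6 (140 m) is a real turning point of the cycle;
# every other detected extremum is caused by local variability.
print(minima, maxima)
```

In this toy series nine extrema are reported although the shearer reaches the far end of the face only once, which is the kind of over-detection that motivates restricting the search to the two ends of the longwall.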

Therefore, a heuristic approach with the following steps was proposed.

  1. The shearer’s position in the longwall face was split into three ranges (Fig. 5) according to the technological conditions and the theoretical model of the cycle:

     • the beginning of the longwall face: distance below 5 m (marked with 2),

     • the end of the longwall face: distance over 135 m (marked with 1),

     • the middle of the longwall face (marked with 0).

     Fig. 5. Ranges of the shearer’s position

  2. In the two outer ranges, below 5 m and over 135 m, the local minimum (1) and maximum (2) were detected accordingly.

  3. Characteristic peaks (start and end of the cycle) were identified by selecting only those sequences with a specific order of ranges (1) and (2) (Fig. 6).

     Fig. 6. Visualization of the beginning and the end of the cycles
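The three steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' R implementation: the zone names `low`, `high` and `mid` replace the numeric marks, the function names are hypothetical, and the ordering rule (a cycle runs from one low-range minimum to the next, with a high-range maximum in between) is an assumed reading of step 3. Only the 5 m and 135 m thresholds come from the text.

```python
LOW_LIMIT_M = 5.0     # "below 5 m" threshold from the text
HIGH_LIMIT_M = 135.0  # "over 135 m" threshold from the text


def zone(position):
    """Step 1: classify a position as 'low', 'high' or 'mid' (illustrative names)."""
    if position < LOW_LIMIT_M:
        return "low"
    if position > HIGH_LIMIT_M:
        return "high"
    return "mid"


def zone_extremes(positions):
    """Step 2: one point per contiguous run in an outer zone,
    the minimum of a 'low' run or the maximum of a 'high' run."""
    peaks = []  # list of (zone, index)
    i, n = 0, len(positions)
    while i < n:
        z = zone(positions[i])
        j = i
        while j + 1 < n and zone(positions[j + 1]) == z:
            j += 1
        if z != "mid":
            pick = min if z == "low" else max
            peaks.append((z, pick(range(i, j + 1), key=lambda k: positions[k])))
        i = j + 1
    return peaks


def find_cycles(positions):
    """Step 3 (assumed rule): keep a strictly alternating low/high sequence
    and report each cycle as (start_index, end_index) between successive lows."""
    alternating = []
    for z, idx in zone_extremes(positions):
        if alternating and alternating[-1][0] == z:
            # Same end visited again without reaching the opposite end:
            # keep only the more extreme of the two points.
            better = min if z == "low" else max
            prev_idx = alternating[-1][1]
            alternating[-1] = (z, better(prev_idx, idx, key=lambda k: positions[k]))
        else:
            alternating.append((z, idx))
    lows = [idx for z, idx in alternating if z == "low"]
    return list(zip(lows, lows[1:]))


# Hypothetical positions (m): two cycles, the first with a short return to the low range.
positions = [3, 1, 40, 90, 137, 139, 100, 60, 4, 20, 2, 70, 138, 80, 3]
cycles = find_cycles(positions)
print(cycles)
```

Collapsing repeated visits to the same end of the face is what makes the sketch tolerant of local reversals; it does not address incorrect location values, which would still need cleaning beforehand.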

In the analysed dataset 75 cycles were identified (9 cycles are presented in Fig. 7).

Fig. 7. Example of identified cycles

In most cases the proposed heuristic identified the start and end of the cycle correctly. Identification errors were caused mainly by data quality. The red arrows in Fig. 7 point out one of the main issues: an incorrect location state. The presented approach is thus sensitive to data quality, and further work will focus on improving data cleaning at the early stages to avoid this issue.

When values of the shearer location variable are missing, they can be interpolated between the two nearest points present in the data. Since the theoretical and technological course of the cycle is known, the interpolated values could be verified, also using the values of other variables.
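A minimal Python sketch of such gap filling is given below. It assumes linear interpolation in time between the nearest known samples on each side, which the text does not specify; the function name and the sample values are hypothetical. Gaps at the very start or end of the series are left unfilled because there is no second point to interpolate towards.

```python
def fill_gaps_linear(times, locations):
    """Fill None entries in `locations` by linear interpolation in time.

    `times` must be strictly increasing. Leading and trailing gaps stay None.
    """
    filled = list(locations)
    known = [i for i, value in enumerate(locations) if value is not None]
    for left, right in zip(known, known[1:]):
        span = times[right] - times[left]
        for i in range(left + 1, right):
            frac = (times[i] - times[left]) / span
            filled[i] = locations[left] + frac * (locations[right] - locations[left])
    return filled


# Hypothetical timestamps (s) and shearer locations (m) with missing values.
times = [0, 10, 20, 40, 50]
locations = [10.0, None, None, 50.0, None]
filled = fill_gaps_linear(times, locations)
print(filled)
```

The filled values would still need to be checked against other variables, for example the shearer speed, before they are used for cycle detection.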

The shearer cycle is crucial for all machinery working in the longwall face, because the other machines and devices simply adjust to the shearer’s position in the cycle. Therefore, for process modelling purposes, it is very important to find a way to identify the start and end of the cycle in raw data, especially when real cycles differ greatly from theoretical models.

Although our approach is for a very specific domain, we think that it can be helpful for the creation of event logs for similar problems and processes.

4 Creation of an Event Log

The creation of an event log based on sensor data requires, besides case id identification, the recognition and identification of activities (process stages). For this purpose, supervised and unsupervised data mining techniques can be applied [3, 5, 7, 9, 13].

Selected variables (Table 1) were used to distinguish the unique states of the shearer operation, according to the procedure described in [2]. The following stages were performed:

  1. Data preprocessing. In this stage exploratory data analysis was conducted. Subsequently, a correlation analysis for the numerical variables and cross tables for the logical variables were used to exclude dependent variables. Then all continuous variables were discretized into categorical variables. Finally, duplicate rows were removed from the final data set, which contained the discretized and logical variables.

  2. Data clustering. For the final data set a dissimilarity matrix based on Gower’s distance was created. Then hierarchical clustering was carried out using Ward’s minimum variance method. Finally, selected clusters were labelled with activity names, based on the results of the statistical analysis and on expert knowledge.

  3. Classification for labelling activity names in the raw data. In this stage instances with a labelled activity name (process stage) were used as a learning sample for the CHAID tree algorithm. For each label, unique rules were generated from the CHAID tree model and, on this basis, the activities in the raw data were labelled.
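Part of stages 1 and 2 can be illustrated with a short Python sketch: removing duplicate rows and computing a Gower dissimilarity matrix. It assumes that all variables are already categorical or logical and equally weighted, in which case Gower's dissimilarity reduces to the share of variables on which two rows differ. The variable values below are hypothetical (Table 1 is not reproduced here), and the Ward clustering and CHAID steps performed by the authors in R are not covered.

```python
def drop_duplicates(rows):
    """Remove duplicate rows, keeping the first occurrence of each in order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def gower_categorical(a, b):
    """Gower dissimilarity for two rows of categorical/logical values:
    the fraction of variables on which the rows differ (equal weights assumed)."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of variables")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dissimilarity_matrix(rows):
    """Full symmetric matrix of pairwise Gower dissimilarities."""
    return [[gower_categorical(a, b) for b in rows] for a in rows]


# Hypothetical discretized records: (speed bin, direction, motor on, position range).
rows = [
    ("stopped", "none", True, "low"),
    ("slow", "left", True, "mid"),
    ("stopped", "none", True, "low"),   # duplicate of the first row
    ("slow", "left", True, "high"),
]
unique_rows = drop_duplicates(rows)
matrix = dissimilarity_matrix(unique_rows)
for line in matrix:
    print(line)
```

Removing duplicates first keeps the matrix small, which matters because its size grows quadratically with the number of distinct rows.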

The identification of the case id and the definition of activities enable the creation of the event log presented in Table 2. The process stages labelled on the example traces are shown in Fig. 8.

Table 2. Fragment of an event log (selected in Fig. 8)

Fig. 8. Labelled process stages on example traces
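How the two ingredients, case ids and activity labels, combine into an event log can be sketched in Python as follows. The sketch assumes that records are already labelled with an activity and sorted by time, that cycle boundaries are given as index pairs, and that consecutive records with the same activity within a cycle form one event. The activity names, timestamps and field names are hypothetical and do not reproduce Table 2.

```python
from itertools import groupby


def build_event_log(records, cycles):
    """Build an event log from labelled records and cycle boundaries.

    records: list of (timestamp, activity) tuples in time order.
    cycles:  list of (start_index, end_index) pairs; end_index is exclusive
             because that record already opens the next cycle.
    """
    log = []
    for case_id, (start, end) in enumerate(cycles, start=1):
        segment = records[start:end]
        # Merge each run of identical consecutive activities into one event.
        for activity, run in groupby(segment, key=lambda record: record[1]):
            run = list(run)
            log.append({
                "case_id": case_id,
                "activity": activity,
                "start": run[0][0],
                "end": run[-1][0],
            })
    return log


# Hypothetical labelled records and two cycles.
records = [
    ("06:00:00", "stoppage"),
    ("06:00:05", "cutting"),
    ("06:00:10", "cutting"),
    ("06:00:15", "stoppage"),
    ("06:00:20", "cutting"),
    ("06:00:25", "cutting"),
]
cycles = [(0, 3), (3, 6)]
log = build_event_log(records, cycles)
for event in log:
    print(event)
```

Each resulting row carries the three elements required of an event log: a case id, an activity name and a timestamp (here both a start and an end).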

The created event log makes it possible to model the process with selected techniques and formalisms [1]; further work in this direction is in progress.

5 Conclusions

Current underground machinery monitoring systems can contain streaming data from hundreds of sensors of various types. Efficient processing of such amounts of data (Big Data) for process improvement is possible only with specific advanced analysis techniques from the fields of data mining and process mining.

Process mining techniques require an event log with a specific structure, including activity names and a case id, which are very often not present in raw industrial sensor data. The challenges related to activity recognition and case identification are strongly connected to data quality and to the nature of the analysed process. Therefore, cleaning and preprocessing are needed, and adequate analytic approaches should be found.

In this paper we presented case id identification problems using a selected example from a longwall monitoring system in an underground mine. The classic approach did not yield correct results in this case because of the high variability of the process and the existence of many local optima; therefore, a heuristic approach was developed.

We have contributed original solutions (procedures) for creating an event log from a low-level machinery monitoring system in underground mining for process mining purposes. Future challenges will be related to process modelling based on the prepared event logs under high process variability.