A Classical Approach to Modeling of Coal Mine Data

Yılmaz, Mehmet; Potas, Nihan; Buyum, Buse

doi:10.1007/978-3-319-18693-1_7

Mehmet Yılmaz²⁰,
Nihan Potas²¹ &
Buse Buyum²²

Part of the book series: Springer Proceedings in Complexity ((SPCOM))

1933 Accesses
1 Citations

Abstract

Data sets such as the occurrence time of random events or the lifetime of a certain product (or a system) are modelled by compound or mixture distributions especially in the last years. This situation is led to encounter proposal of more complex distribution models in the literature. One of the data set made a model proposal by in this way is coal mine data set. In this study, Two Component Mixed Exponential Distribution (2MED) model had more easier interpretation on this data set is used and compared with the other study results. Also, the extended coal mine data set with 191 observations is modelled by 2MED and the results are given.

Access provided by Autonomous University of Puebla. Download conference paper PDF

A Pliant Model to Count Data: Nabla Poisson–Lindley Distribution with a Practical Data Example

Article 21 April 2023

The Maxwell–Weibull Distribution in Modeling Lifetime Datasets

Article 29 May 2020

The Exponentiated Generalized Marshall–Olkin Family of Distribution: Its Properties and Applications

Article 05 June 2018

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

7.1 Introduction

Mining is an important source of foreign exchange for many developing countries. But this important source of foreign exchange is also one of the sectors in which it occurs the most accidents all over the world. These coal mine accidents is still under investigation by many different disciplines in today as in the past. Coal mine accidents are often used as a real data set in studies in the field of statistics as in other fields. In these studies, some researchers have made trend analysis by taking the occurrence time of coal mine accidents, some of them have tried to model of accident occurrence time. Purpose of modeling studies is obtained the model that can best forecast (or model) these processes. Because of estimate of the occurrence time of these accidents is of vital importance to prevent them addition to the measures that can be taken.

In the current study, the data set, which is one of the most widely used in the literature, is obtained firstly by Maguire et al. [8] is firstly analyzed by Cox and Lewis [2], is handled. The data set obtains the time intervals (in days) between coal mine accidents concluded death of 10 or more men. In later, this data set is arranged by Jarrett [6] and is extended to 191 observations. Some researchers who try to model the data set with 109 observations are Adamidis and Loukas [1], Kus [7], Mirhossaini and Dolati [10], and Rodriguesa et al. [11]. Some of them use non-mixture distributions such as Exponential, Gamma and Weibull and the others use mixture distributions such as Exponential-Poisson (EP), Exponential-Gamma (EG) and Exponential Conway-Maxwell Poisson (ECOMP).

In this study, Two Component Mixed Exponential Distribution (2MED) is used to model this famous data set. Aim of this study, propose 2MED as a new distribution (model) in addition to distributions used in modeling study. First of all, properties of 2MED, then parameter estimations of Maximum Likelihood (MLE) and the Least Squares (LSE) will be introduced. In here, MLE is obtained by Expectation-Maximization (EM) algorithm which is one of the numeric way. The results that is obtained from other studies and from this study will be compared with Kolmogorov Smirnov Test Statistic (KS) and it is tried to indicate that how 2MED is successful about modeling this data set. Parameter estimations, KS values and p values (p) are obtained by MATLAB.

7.2 Parameter Estimations Methods for 2MED

In this section, some basic properties of 2MED are introduced and then the methods of maximum likelihood and the least squares will be given.

7.2.1 Mixed Exponential Distribution with Two-Component (2MED)

Probability density function (p.d.f) of 2MED is given below.

$$\displaystyle\begin{array}{rcl} f(x;\alpha,\theta _{1},\theta _{2})& =& \alpha f_{1}(x;\theta _{1}) + (1-\alpha )f_{2}(x;\theta _{2}) {}\\ & =& \alpha \frac{1} {\theta _{1}} \exp (-x/\theta _{1}) + (1-\alpha )\frac{1} {\theta _{2}} \exp (-x/\theta _{2}) {}\\ \end{array}$$

where α ∈ (0, 1), θ _i > 0 (i = 1, 2), x > 0. Similarly the cumulative distribution function (c.d.f) is as follows.

$$\displaystyle\begin{array}{rcl} F(x;\alpha,\theta _{1},\theta _{2})& =& \alpha F_{1}(x;\theta _{1}) + (1-\alpha )F_{2}(x;\theta _{2}) {}\\ & =& \alpha (1 -\exp (-x/\theta _{1})) + (1-\alpha )(1 -\exp (-x/\theta _{2})) {}\\ \end{array}$$

Survival function,

$$\displaystyle\begin{array}{rcl} S(x;\alpha,\theta _{1},\theta _{2})& =& \alpha S_{1}(x;\theta _{1}) + (1-\alpha )S_{2}(x;\theta _{2}) {}\\ & =& \alpha \exp (-x/\theta _{1}) + (1-\alpha )\exp (-x/\theta _{2}) {}\\ \end{array}$$

and the hazard function,

$$\displaystyle\begin{array}{rcl} h(x;\alpha,\theta _{1},\theta _{2})& =& \frac{\alpha h_{1}(x)S_{1}(x;\theta _{1}) + (1-\alpha )h_{2}(x)S_{2}(x;\theta _{2})} {\alpha S_{1}(x;\theta _{1}) + (1-\alpha )S_{2}(x;\theta _{2})} {}\\ & =& h_{1}(x) \frac{\alpha S_{1}(x;\theta _{1})} {\alpha S_{1}(x;\theta _{1}) + (1-\alpha )S_{2}(x;\theta _{2})} {}\\ & & +h_{2}(x) \frac{(1-\alpha )S_{2}(x;\theta _{2})} {\alpha S_{1}(x;\theta _{1}) + (1-\alpha )S_{2}(x;\theta _{2})} {}\\ & =& h_{1}(x)w_{1}(x;\alpha,\theta _{1},\theta _{2}) + h_{2}(x)w_{2}(x;\alpha,\theta _{1},\theta _{2}) {}\\ \end{array}$$

where $h_{i}(x) = \frac{1} {\theta _{i}}$ and $w_{1}(.) + w_{2}(.) = 1.$

7.2.2 Maximum Likelihood Method for 2MED

Let $\ \underline{X} =\{ X_{1},X_{2,\ldots,}X_{n}\}$ be a random sampling with independent and identically distributed as 2MED having a p.d.f $f(\underline{x};\underline{\varPhi })$ where $\underline{\varPhi }= (\alpha,\theta _{1},\theta _{2})$ is a parameter vector. The likelihood function and the logarithmic form of the likelihood function of $\underline{\varPhi }$ are respectively given as below:

$$\displaystyle\begin{array}{rcl} L(\underline{\varPhi };\underline{x})& =& \prod \limits _{j=1}^{n}\left [\sum \limits _{ i=1}^{2}\alpha _{ i}\frac{1} {\theta _{i}} \exp (-x_{j}/\theta _{i})\right ] {}\\ \log L& =& \sum \limits _{j=1}^{n}\log \left [\sum \limits _{ i=1}^{2}\alpha _{ i}\frac{1} {\theta _{i}} \exp (-x_{j}/\theta _{i})\right ] -\lambda \left (\sum \limits _{i=1}^{2}\alpha _{ i} - 1\right ) {}\\ \end{array}$$

where $\sum \limits _{i=1}^{2}\alpha _{i} = 1$. If the derivative of this function respect to α _i, i = 1, 2 is equalized to zero,

$$\displaystyle{\frac{\partial \log L} {\partial \alpha _{i}} =\sum \limits _{ j=1}^{n} \frac{\frac{1} {\theta _{i}} \exp (-x_{j}/\theta _{i})} {\sum \limits _{i=1}^{2}\alpha _{i}\frac{1} {\theta _{i}} \exp (-x_{j}/\theta _{i})}-\lambda = 0}$$

then

$$\displaystyle{ \sum \limits _{j=1}^{n} \frac{\frac{1} {\theta _{i}} \exp (-x_{j}/\theta _{i})} {\sum \limits _{i=1}^{2}\alpha _{i}\frac{1} {\theta _{i}} \exp (-x_{j}/\theta _{i})} =\lambda }$$

(7.1)

Multiplying the both side of (7.1) by α _i and taking the sum over index i:

$$\displaystyle{\sum \limits _{j=1}^{n}\sum \limits _{ i=1}^{2} \frac{\alpha _{i}\frac{1} {\theta _{i}} \exp (-x_{j}/\theta _{i})} {\sum \limits _{i=1}^{2}\alpha _{i}\frac{1} {\theta _{i}} \exp (-x_{j}/\theta _{i})} =\lambda \alpha _{i}}$$

then n = λ. Based on Bayes’ rule, the probability that x _j belongs to ith component when X _j = x _j is observed is as follows:

$$\displaystyle{P(i\mid x_{j}) = \frac{\alpha _{i}\frac{1} {\theta _{i}} \exp (-x_{j}/\theta _{i})} {\sum \limits _{i=1}^{2}\alpha _{i}\frac{1} {\theta _{i}} \exp (-x_{j}/\theta _{i})}}$$

Thus,

$$\displaystyle{\hat{\alpha }_{i} = \frac{\sum \limits _{j=1}^{n}P(i\mid x_{j})} {n} }$$

where i = 1, 2. If the derivative of logL with respect to θ _i is equalized to zero,

$$\displaystyle\begin{array}{rcl} & \frac{\partial \log L} {\partial \theta _{i}} =\sum \limits _{ j=1}^{n}\frac{\frac{\alpha _{i}} {\theta _{i}}\left (\frac{x_{j}} {\theta _{i}^{2}} \right )\exp \left (\frac{-x_{j}} {\theta _{i}} \right )- \frac{\alpha _{i}} {\theta _{i}^{2}} \exp \left (\frac{-x_{j}} {\theta _{i}} \right )} {\sum \limits _{i=1}^{2}\alpha _{i}\frac{1} {\theta _{i}}\exp \left (\frac{-x_{j}} {\theta _{i}} \right )} = 0& {}\\ & \hat{\theta }_{i} = \frac{\sum \limits _{j=1}^{n}x_{ j}P(i\mid x_{j})} {\sum \limits _{j=1}^{n}P(i\mid x_{j})} & {}\\ \end{array}$$

where i = 1, 2. $\hat{\theta }_{i}$ is obtained and reminded that $P(2\mid x_{j}) = 1 - P(1\mid x_{j})$, then the solutions will be

$$\displaystyle{\hat{\theta }_{1} = \frac{1} {n\hat{\alpha }_{i}}\sum \limits _{j=1}^{n}x_{ j}P(i\mid x_{j})}$$

$$\displaystyle{\hat{\theta }_{2} = \frac{1} {n(1 -\hat{\alpha }_{i})}\sum \limits _{j=1}^{n}x_{ j}(1 - P(i\mid x_{j})).}$$

As seen in the above, the parameter estimations can not been obtained directly from derivative equations. Therefore numeric ways are preferred for solving of these equations. In this study, EM algorithm which is one of the numeric way is taken into account [3, 4, 9]. These are step solutions obtained by EM which steps are given:

1.
Input the initial values $(\alpha _{i}^{(0)},\theta _{i}^{(0)}),$ i = 1, 2.
2.
Calculate the P(i∣x _j).
3.
Calculate $\hat{\alpha }_{i}^{(k)},\hat{\theta }_{i}^{(k)}$
4.
After calculations of $\hat{\alpha }_{i}$ and $\hat{\theta }_{i}$, the values replace in logL and get the value of function. For ε > 0 selected small enough $\log L^{(k)} -\log L^{(k-1)} \leq \epsilon$ is provided then the values on the kth step will be used for parameter estimations. Steps 2–4 are repeated until converge is accomplished.

7.2.3 The Least Squares Method for 2MED

This method is based on the idea that there is a regression relationship between empirical $\hat{F}$ and parametric F distributions. Considering ordered observations $x_{(1)} \leq x_{(2)} \leq \ldots \leq x_{(n)}$ versus empirical distribution $\hat{F}(x_{(i)}) \equiv i/(n + 1)$, the vector Φ which minimizes the following expression is tried to determine. Detailed study was given in Gupta and Kundu [5] for non-mixture Generalized Exponential Distribution. System of equations that is occurred for the solutions for this optimization problem is as follows.

$$\displaystyle{Q\left (\underline{\varPhi }\right ) =\sum \limits _{ i=1}^{n}\left (\hat{F}\left (x_{ (i)}\right ) - F\left (x_{(i)};\underline{\varPhi }\right )\right )^{2}}$$

For solving of this optimization problem, since the expressions after derivative are related to parameters, it is difficult to obtain the solutions. Therefore it is necessary to use numerical ways. The values minimized $Q(\underline{\mbox{ $\varPhi $}})$ function are calculated numerically by current command in MATLAB. The stopping rule can be based on absolute value of the difference between the function values in the previous iteration and next iteration. So, when the measured absolute difference becomes less than 10⁽⁻²¹⁾ the search can be stopped.

7.3 Suggested and Current Models for Coal Mine Data

The data set, which is obtained firstly by Maguire et al. [8] and obtains the time intervals (in days) between coal mine accidents concluded death of 10 or more men is given in Table 7.1. In this section, the results obtained from the studies in the literature and from the current study will be compared.

Table 7.1 The time intervals (in days) between coal mine accidents

Full size table

First of all, in terms of providing comparison and ease of comment the parameter estimations, KS and p-values obtained by modeling with 2MED is given in Table 7.2.

Table 7.2 Parameter estimations, KS and p-values for 2MED

Full size table

In Adamidis and Loukas [1], firstly they are suggested Weibull and Gamma which are used frequently as non-mixture distributions and then they are tried to model the data set with EG. After modeling, KS value of EG is found as 0.076 and they said that the EG distribution fits the data set at least as good as the two popular alternatives. When this value and the KS value for 2MED according to two methods, it can be said that 2MED is more successful than EG distribution about modeling the data set.

In Kus [7], EP distribution is used in addition to the distributions used in Adamidis and Loukas [1]. The KS and p-value of EP is given in.

Table 7.3 KS and p-values for Kus [7]

Full size table

When the KS values for 2MED and EP given in Table 7.3 are compared, it can be seen that 2MED values are smaller than EP values. Therefore 2MED is the best amongst four distributions handled so far according to KS criteria.

Exponential distribution model is discussed in addition to the above non-mixture distributions in Mirhossaini and Dolati [10]. Besides non-mixture distributions, ME is used in modelling study and the results is given in the Table 7.4.

Table 7.4 KS values for Mirhossaini and Dolati [10]

Full size table

Considering the proposed model in the above, it is thinkable that ME is the closest model to 2MED. Even in this thought, it is clear that the KS values for 2MED is lower than the KS values for ME. The results is same for the other three distributions. The comment made on the results of other studies is also applied here.

The ECOMP distribution is used as well as commonly used for modeling this data set in Rodriguesa et al. [11]. In their study, the modified Cramer-von Mises (W*) and Anderson-Darling (A*) test statistics are taken into account but it is decided that which distribution is more successful according to the value. Therefore the A* value is calculated for 2MED while comparing with distributions in Rodriguesa et al. [11]. Computational code is taken from the first author Josemar Rodriguesa. The A* value is given in the Table 7.5.

Table 7.5 A* values for Rodriguesa et al. [11]

Full size table

The A* values calculated according to MLE and LSE methods for 2MED are found 0,426 and 0,722 respectively. A* value found according to MLE seems to be smaller than the value calculated for distribution in the above table. Accordingly, 2MED is more suitable for this data set.

Coal mine data set is arranged and is extended to 191 observations by Jarrett [6]. However any modelling study for 191 observations has not reached in the literature search. Here we try to show how this extended data set (given in Table 7.6) is modelled by 2MED in addition to the studies above.

Table 7.6 The extended coal mine data set

Full size table

Table 7.7 Parameter estimations, KS and p-values for 2MED of extended data set

Full size table

When the results given in Table 7.7 are examined, modelling of this extended data set with 2MED for two methods is also successful according to KS criteria.

7.4 Conclusion

In this study is handled seven different distributions used modelling of the time intervals (in days) between coal mine accidents. It can be seen that the data set is modelled mostly by Weibull, Gamma and Exponential as non-mixture distributions and by EG, EP as mixture distributions. Except for these distributions, ME and ECOMP are used in modelling study.

The results of KS statistic and the A* test statistic are used as a measure to compare three of these modelling studies. When the results obtained from both LSE and MLE methods for 2MED and the results of these distributions are compared, 2MED seems to be the best model according to KS and A* criteria (except for LSE). 2MED is the best model between mixture and non-mixture distributions used in Adamidis and Loukas [1], Kus [7], Mirhossaini and Dolati [10], and Rodriguesa et al. [11]. In addition to comparison, the extended data set is also analyzed. But studies for this data set is generally based on analyzing as a stochastic process. Therefore this extended data set is modelled only by 2MED and the results is given Table 7.7.

Finally, we can say that as an uncomplicated model 2MED can be recommended for the coal mine data set which is studied by many researches.

References

Adamidis K, Loukas S (1998) A lifetime distribution with decreasing failure rate. Stat Probab Lett 39:35–42
Article MathSciNet Google Scholar
Cox DR, Lewis PAW (1966) The statistical analysis of series of events. Chapman and Hall, London/Methuen
Book Google Scholar
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J R Stat Soc Ser B 39:1–38
MathSciNet Google Scholar
Everitt ES, Hand DJ (1981) Finite mixture distributions. Chapman and Hall, London
Book Google Scholar
Gupta RD, Kundu D (2000) Generalized exponential distribution: different method of estimations. J Stat Comput Simul 00:1–22
Google Scholar
Jarrett RG (1979) A note on the intervals between coal mining disasters. Biometrika 66:191–193
Article Google Scholar
Kus C (2007) A new lifetime distribution. Comput Stat Data Anal 51:4497–4509
Article MathSciNet Google Scholar
Maguire BA, Pearson ES, Wynn AHA (1952) The time intervals between industrial accidents. Biometrika 39:168–180
Article Google Scholar
McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions. Wiley, Hoboken
Book Google Scholar
Mirhossaini SM, Dolati A (2008) On a new generalization of the exponential distribution. J Math Ext 3(1):27–42
MathSciNet Google Scholar
Rodriguesa J, Balakrishnan N, Cordeiro GM (2011) A unified view on lifetime distributions. Comput Stat Data Anal 55(12):3311–3319
Article Google Scholar

Download references

Author information

Authors and Affiliations

Faculty of Science, Department of Statistics, Ankara University, 06100, Ankara, Turkey
Mehmet Yılmaz
Faculty of Economics and Administrative Sciences, Department of Health Care Management, Gazi University, 06100, Ankara, Turkey
Nihan Potas
Department of Statistics, Graduate School of Natural and Applied Science, Ankara University, 06100, Ankara, Turkey
Buse Buyum

Authors

Mehmet Yılmaz
View author publications
You can also search for this author in PubMed Google Scholar
Nihan Potas
View author publications
You can also search for this author in PubMed Google Scholar
Buse Buyum
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mehmet Yılmaz .

Editor information

Editors and Affiliations

Faculty of Education, Hacettepe University, Ankara, Turkey
Şefika Şule Erçetin

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Yılmaz, M., Potas, N., Buyum, B. (2016). A Classical Approach to Modeling of Coal Mine Data. In: Erçetin, Ş. (eds) Chaos, Complexity and Leadership 2014. Springer Proceedings in Complexity. Springer, Cham. https://doi.org/10.1007/978-3-319-18693-1_7

Download citation

DOI: https://doi.org/10.1007/978-3-319-18693-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18692-4
Online ISBN: 978-3-319-18693-1
eBook Packages: Business and ManagementBusiness and Management (R0)

Publish with us

Policies and ethics

A Classical Approach to Modeling of Coal Mine Data

Abstract

Similar content being viewed by others

A Pliant Model to Count Data: Nabla Poisson–Lindley Distribution with a Practical Data Example

The Maxwell–Weibull Distribution in Modeling Lifetime Datasets

The Exponentiated Generalized Marshall–Olkin Family of Distribution: Its Properties and Applications

Keywords

7.1 Introduction

7.2 Parameter Estimations Methods for 2MED

7.2.1 Mixed Exponential Distribution with Two-Component (2MED)

7.2.2 Maximum Likelihood Method for 2MED

7.2.3 The Least Squares Method for 2MED

7.3 Suggested and Current Models for Coal Mine Data

7.4 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

A Classical Approach to Modeling of Coal Mine Data

Abstract

Similar content being viewed by others

A Pliant Model to Count Data: Nabla Poisson–Lindley Distribution with a Practical Data Example

The Maxwell–Weibull Distribution in Modeling Lifetime Datasets

The Exponentiated Generalized Marshall–Olkin Family of Distribution: Its Properties and Applications

Keywords

7.1 Introduction

7.2 Parameter Estimations Methods for 2MED

7.2.1 Mixed Exponential Distribution with Two-Component (2MED)

7.2.2 Maximum Likelihood Method for 2MED

7.2.3 The Least Squares Method for 2MED

7.3 Suggested and Current Models for Coal Mine Data

7.4 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation