Abstract
Cost-effective sampling design is a major concern in some experiments, especially when measuring the characteristic of interest is costly, painful or time-consuming. In the current paper, the Fisher information matrix of the log-extended exponential–geometric distribution LEEGD\((\alpha ,\beta )\) with parameters \(\alpha \) and \(\beta \) is studied under simple random sampling (SRS), ranked set sampling (RSS), median RSS (MRSS) and extreme RSS (ERSS). We obtain the expressions for the Fisher information matrix in each case and use them to perform efficiency comparisons. It is found that MRSS is most efficient when one parameter is inferred at a time (with the other parameter known), while RSS is most efficient when both parameters are inferred simultaneously. A real data set is used for illustration.
1 Introduction
A random variable X has the log-extended exponential–geometric distribution if it has the probability density function (pdf)
with distribution function
where \(0<x<1\), \(\alpha >0\), \(\beta >0\). We write LEEGD\((\alpha , \beta )\) to denote the distribution defined in (1). This distribution has well-known applications to the populations of cities, the intensities of earthquakes and the sizes of power outages. For further details on its importance and applications, one may refer to Mitzenmacher (2004), Newman (2005), Sornette (2006) and Clauset et al. (2009).
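As a minimal sketch for experimenting with the sampling designs discussed later, the Python snippet below evaluates and simulates LEEGD\((\alpha ,\beta )\). It assumes the density \(f(x)=\alpha (1+\beta )x^{\alpha -1}/(1+\beta x^{\alpha })^{2}\) and distribution function \(F(x)=(1+\beta )x^{\alpha }/(1+\beta x^{\alpha })\) on \(0<x<1\), the form commonly given for this distribution; the function names are illustrative and do not come from the paper.

```python
import random


def leegd_pdf(x, alpha, beta):
    """Assumed LEEGD density on (0, 1)."""
    return alpha * (1.0 + beta) * x ** (alpha - 1.0) / (1.0 + beta * x ** alpha) ** 2


def leegd_cdf(x, alpha, beta):
    """Assumed LEEGD distribution function on (0, 1)."""
    xa = x ** alpha
    return (1.0 + beta) * xa / (1.0 + beta * xa)


def leegd_quantile(u, alpha, beta):
    """Inverse of leegd_cdf, obtained by solving F(x) = u for x."""
    return (u / (1.0 + beta - beta * u)) ** (1.0 / alpha)


def leegd_sample(n, alpha, beta, rng=None):
    """Draw n values by inversion: X = F^{-1}(U) with U uniform on [0, 1)."""
    rng = rng or random.Random()
    return [leegd_quantile(rng.random(), alpha, beta) for _ in range(n)]


# Usage: a small simulated sample (parameter values chosen for illustration only)
draws = leegd_sample(5, alpha=1.5, beta=2.0, rng=random.Random(0))
```

Sampling by inversion is possible here because the assumed distribution function has a closed-form inverse.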
Ranked set sampling (RSS) is a sampling technique used when measuring the sampling units is difficult or expensive in terms of cost, time or other factors, but a small set of units can easily be ranked according to the variable of interest without actual measurement. RSS was introduced by McIntyre (1952), and its mathematical theory was given by Takahasi and Wakimoto (1968). Dell and Clutter (1972) showed that RSS is more efficient than simple random sampling (SRS) even with errors in ranking. To reduce the errors in ranking, Samawi et al. (1996) introduced a modification of RSS called extreme RSS (ERSS). Another scheme, the median RSS (MRSS), was investigated by Muttlak (1997). For further introduction to these sampling schemes, see Abu-Dayyeh et al. (2013), Hassan (2013), Hussian (2014) and Esemen and Gürler (2018).
In the literature, a number of studies have focused on parametric inference for distributions under RSS and its modifications. Stokes (1995) studied parameter estimation for location-scale family distributions under RSS. Shaibu and Muttlak (2004) studied parameter estimation for the normal, exponential and gamma distributions under modified RSS. He et al. (2018) studied parameter estimation for the log-logistic distribution under RSS. One can increase the efficiency of RSS by increasing the sample size, but this may also increase the errors in ranking. Al-Saleh and Al-Omari (2002) suggested a multistage RSS method that increases the efficiency of RSS for a fixed sample size. See also Jemain and Al-Omari (2006) for estimating the population mean using multistage median ranked set samples; Al-Omari and Jaber (2008) for percentile double ranked set sampling; Al-Hadhrami and Al-Omari (2009) for Bayesian inference on the variance of the normal distribution using moving extremes RSS; Shadid et al. (2011) for best linear unbiased estimators and best linear invariant estimators of the location and scale parameters and the population mean using RSS; Al-Hadhrami and Al-Omari (2012) for Bayes estimation of the mean of the normal distribution using moving ERSS; Al-Omari and Al-Hadhrami (2011) for maximum likelihood estimators of the parameters of a modified Weibull distribution using extreme RSS; Haq et al. (2013) for partial RSS; Haq et al. (2014) for mixed RSS; and Al-Omari (2015) for the use of L RSS in estimating the distribution function. For some additional results and references, see Sinha et al. (1996), Chen et al. (2013, 2016, 2017), Dey et al. (2017), Chen et al. (2018), Qian et al. (2019), Chen et al. (2019) and He et al. (2019).
In the current paper, the Fisher information matrix of the log-extended exponential–geometric distribution LEEGD\((\alpha ,\beta )\) with parameters \(\alpha \) and \(\beta \) based on simple random sample (SRS), ranked set sample (RSS), median RSS (MRSS) and extreme RSS (ERSS) is discussed. These contents are, respectively, arranged in Sects. 2, 3, 4 and 5. Comparisons and conclusions are presented in Sect. 6. A real data set is used for illustration.
2 Fisher Information Matrix in SRS
Let \(\left\{ {X_1 ,X_2,\ldots ,X_m} \right\} \) be a simple random sample of size m from (1). Pedro et al. (2018) obtained the Fisher information matrix based on these samples:
where \(\Gamma _m(z,s,a) = \int _0^1 {\frac{{u^a \log ^{s - 1} (1/u)}}{{(1 + zu)^{m+ 1} }}} du\), \(m= 0,1,2,\ldots \), for any real numbers \(a \ge 0\) and \(s\ge 1\).
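The integral \(\Gamma _m(z,s,a)\) generally has to be evaluated numerically. The following is a minimal sketch using a plain midpoint rule, which avoids the endpoint \(u=0\) where \(\log (1/u)\) is unbounded; the function name `gamma_m` and the default number of subintervals are illustrative choices, and a dedicated quadrature routine would usually be preferable in practice.

```python
import math


def gamma_m(m, z, s, a, n=100_000):
    """Midpoint-rule approximation of
    int_0^1 u^a * log(1/u)^(s-1) / (1 + z*u)^(m+1) du
    for z > -1, a >= 0 and s >= 1.
    """
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h  # midpoint of the k-th subinterval, never 0 or 1
        total += u ** a * math.log(1.0 / u) ** (s - 1.0) / (1.0 + z * u) ** (m + 1)
    return total * h


# Usage: with z = 0 the integral reduces to Gamma(s) / (a + 1)^s,
# which gives a simple sanity check of the quadrature.
check = gamma_m(2, 0.0, 3.0, 1.0)  # close to Gamma(3) / 2^3 = 0.25
```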
3 Fisher Information Matrix in RSS
An important advantage of the RSS approach is that it improves the efficiency of estimators of the population parameters by providing a more representative sample from the target population. In this section, we will study the Fisher information matrix for LEEGD\((\alpha ,\beta )\) with model parameters \(\alpha \) and \(\beta \) under RSS. The RSS method suggested by McIntyre (1952) can be summarized as follows: One first draws \(m^2\) units at random from the population and partitions them into m sets of m units. The m units in each set are ranked without making actual measurements. From the first set of m units, the unit ranked lowest is chosen for actual quantification. From the second set of m units, the unit ranked second lowest is measured. The process is continued until the unit ranked largest is measured from the mth set of m units.
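The RSS procedure described above can be sketched in Python as follows. This is an illustration under the assumption of perfect ranking (the units are ranked by their true values); the function name `ranked_set_sample` and the `draw` callback, which returns one unit from the population, are hypothetical.

```python
import random


def ranked_set_sample(draw, m, rng):
    """Return an RSS of size m: from the i-th set of m units,
    keep the unit ranked i-th lowest (i = 1, ..., m)."""
    sample = []
    for i in range(m):
        units = sorted(draw(rng) for _ in range(m))  # rank one set of m units
        sample.append(units[i])                      # measure the (i+1)-th lowest
    return sample


# Usage: m = 4 sets of 4 units drawn from a uniform population (illustration only)
rng = random.Random(42)
sample = ranked_set_sample(lambda r: r.random(), 4, rng)
```

In total \(m^2\) units are drawn, but only m of them are actually measured.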
Let \(\left\{ {X_{1(1)} ,X_{2(2)} ,\ldots ,X_{m(m)} } \right\} \) be a ranked set sample of size m from (1), then the pdf of \({X_{i(i)} }\) is
where \(c(i,m) = \frac{{m!}}{{(m - i)!(i - 1)!}}\). The log-likelihood function based on these samples is
where \(d_{0}\) is a value which is free of \(\alpha \) and \(\beta \).
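As a hedged illustration of this log-likelihood, the sketch below evaluates it up to the constant \(d_{0}\), taken here to be \(\sum _{i}\log c(i,m)\). It assumes the density \(f(x)=\alpha (1+\beta )x^{\alpha -1}/(1+\beta x^{\alpha })^{2}\) and distribution function \(F(x)=(1+\beta )x^{\alpha }/(1+\beta x^{\alpha })\), and uses the standard order-statistic density \(c(i,m)F^{i-1}(x)[1-F(x)]^{m-i}f(x)\). The function name `rss_loglik` is illustrative.

```python
import math


def rss_loglik(x, alpha, beta):
    """Log-likelihood, without the constant d_0, for an RSS
    x = [x_1(1), ..., x_m(m)] under the assumed LEEGD form.

    Each term is log f + (i-1) log F + (m-i) log(1-F), simplified using
    1 - F(x) = (1 - x^alpha) / (1 + beta * x^alpha).
    """
    m = len(x)
    total = 0.0
    for i, xi in enumerate(x, start=1):
        xa = xi ** alpha
        total += (
            math.log(alpha)
            + i * math.log1p(beta)
            + (i * alpha - 1.0) * math.log(xi)
            + (m - i) * math.log(1.0 - xa)
            - (m + 1) * math.log1p(beta * xa)
        )
    return total


# Usage with made-up observations in (0, 1), for illustration only
value = rss_loglik([0.02, 0.07, 0.30], alpha=1.4, beta=5.0)
```

Every observation contributes the factor \((1+\beta x^{\alpha })^{-(m+1)}\), regardless of its rank i.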
Taking the first derivatives of \(L^{*}_{\mathrm{RSS}}\), we have
and
Taking the second derivatives of \(L^{*}_{\mathrm{RSS}}\), we have
and
Then, we can obtain
and
where \(\zeta (3) = \sum \nolimits _{k= 1}^\infty {\frac{1}{{k^3 }}}\). Combining (3), (4) and (5), we can obtain the Fisher information matrix under RSS:
4 Fisher Information Matrix in MRSS
The MRSS procedure, which is another scheme of RSS, was investigated by Muttlak (1997). The procedure is as follows: Randomly draw \(m^{2}\) units from the infinite population for which the unknown parameter is to be estimated, and randomly partition them into m sets of m units. If m is even, select the \(\frac{m}{2}\)th ranked unit from each of the first \(\frac{m}{2}\) sets for actual measurement, and the \(\left( \frac{m}{2}+1\right) \)th ranked unit from each of the other sets. Such a ranked set sample of size m is denoted by MRSSE. If m is odd, the \(\frac{m+1}{2}\)th ranked unit is measured from each set of m units. Such a ranked set sample of size m is denoted by MRSSO. In this section, we will study the Fisher information matrix for LEEGD\((\alpha ,\beta )\) with model parameters \(\alpha \) and \(\beta \) under MRSS.
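The MRSS selection rule for both even and odd m can be sketched as follows, again assuming perfect ranking. The function name `median_ranked_set_sample` and the `draw` callback are hypothetical.

```python
import random


def median_ranked_set_sample(draw, m, rng):
    """Return an MRSS of size m.

    Even m (MRSSE): rank m/2 from the first m/2 sets, rank m/2 + 1 from the rest.
    Odd m (MRSSO): rank (m + 1)/2 from every set.
    """
    sample = []
    for i in range(m):
        units = sorted(draw(rng) for _ in range(m))  # rank one set of m units
        if m % 2 == 1:
            rank = (m + 1) // 2
        elif i < m // 2:
            rank = m // 2
        else:
            rank = m // 2 + 1
        sample.append(units[rank - 1])  # ranks are 1-based, list indices 0-based
    return sample


# Usage: an MRSSE with m = 4 from a uniform population (illustration only)
rng = random.Random(42)
sample = median_ranked_set_sample(lambda r: r.random(), 4, rng)
```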
Let \( \{X_{1(\frac{m}{2})}, X_{2(\frac{m}{2})}, \ldots ,X_{m(\frac{m}{2} + 1)} \}\) be an MRSSE of size m from (1), then the pdfs of \(X_{i(\frac{m}{2})}\) \((i=1,2,\ldots ,\frac{m}{2})\) and \(X_{i(\frac{m}{2}+1)}\) \((i=\frac{m}{2}+1,\frac{m}{2}+2,\ldots ,m)\) are, respectively,
and
The log-likelihood function based on these samples is
where \(d_{1}\) is a value which is free of \(\alpha \) and \(\beta \).
Then, the first derivative of \(L^{*}_{\mathrm{MRSSE}}\) can be written as
and
The second derivative of \(L^{*}_{\mathrm{MRSSE}}\) can be written as
and
Then, we can obtain
and
Combining (7), (8) and (9), we can obtain the Fisher information matrix based on MRSSE:
Let \(\{X_{1(\frac{{m + 1}}{2})}, X_{2(\frac{{m + 1}}{2})}, \ldots , X_{m(\frac{{m + 1}}{2})}\}\) be an MRSSO of size m from (1), then the pdf of \(X_{i(\frac{{m + 1}}{2})}\) is
Based on these samples, we can obtain
and
The derivations of (11), (12) and (13) are similar to those of (7), (8) and (9).
Combining (11), (12) and (13), we can obtain the Fisher information matrix based on MRSSO:
5 Fisher Information Matrix in ERSS
Taking ranking error into consideration, Samawi et al. (1996) introduced a modification of RSS called extreme RSS (ERSS). The procedure of ERSS is as follows: Randomly draw \(m^{2}\) units from the population, and randomly partition them into m sets of m units. If m is even, select the lowest ranked unit from each of the first \(\frac{m}{2}\) sets for actual measurement, and the largest ranked unit from each of the other sets. Such a ranked set sample of size m is denoted by ERSSE. If m is odd, select the lowest ranked unit from each of \(\frac{m-1}{2}\) sets, the largest ranked unit from each of another \(\frac{m-1}{2}\) sets, and the \(\frac{m+1}{2}\)th ranked unit from the mth set for actual measurement. Such a ranked set sample of size m is denoted by ERSSO. In this section, we will study the Fisher information matrix for LEEGD\((\alpha ,\beta )\) with model parameters \(\alpha \) and \(\beta \) under ERSS.
Let \(\{X_{1(1)} ,X_{2(1)} , \ldots ,X_{m(m)} \}\) be an ERSSE of size m from (1), then the pdfs of \(X_{i(1)}\) \((i=1,2,\ldots ,\frac{m}{2})\) and \(X_{i(m)}\) \((i=\frac{m}{2}+1,\frac{m}{2}+2,\ldots ,m)\) are, respectively,
and
The log-likelihood function based on these samples is
where \(d_{2}\) is a value which is free of \(\alpha \) and \(\beta \).
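The ERSS selection procedure described at the start of this section can be sketched as follows, assuming perfect ranking. The function name `extreme_ranked_set_sample` and the `draw` callback are hypothetical, and the sets are processed in the order lowest, largest, then (for odd m) the middle rank from the last set.

```python
import random


def extreme_ranked_set_sample(draw, m, rng):
    """Return an ERSS of size m.

    Even m (ERSSE): minimum from the first m/2 sets, maximum from the rest.
    Odd m (ERSSO): minimum from (m-1)/2 sets, maximum from (m-1)/2 sets,
    and the unit ranked (m+1)/2 from the m-th set.
    """
    half = m // 2  # equals m/2 for even m and (m-1)/2 for odd m
    sample = []
    for i in range(m):
        units = sorted(draw(rng) for _ in range(m))  # rank one set of m units
        if i < half:
            sample.append(units[0])                  # lowest ranked unit
        elif i < 2 * half:
            sample.append(units[-1])                 # largest ranked unit
        else:
            sample.append(units[(m + 1) // 2 - 1])   # odd m only: m-th set
    return sample


# Usage: an ERSSO with m = 5 from a uniform population (illustration only)
rng = random.Random(42)
sample = extreme_ranked_set_sample(lambda r: r.random(), 5, rng)
```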
Taking the first derivatives of \(L^{*}_{\mathrm{ERSSE}}\), we have
and
Taking the second derivatives of \(L^{*}_{\mathrm{ERSSE}}\), we have
and
Then, we can obtain
and
Combining (15), (16) and (17), we can obtain the Fisher information matrix based on ERSSE:
Let \(\{X_{1(1)} ,X_{2(1)} , \ldots ,X_{m(\frac{{m + 1}}{2})} \}\) be an ERSSO of size m from (1). Based on these samples, we can obtain
and
The derivations of (19), (20) and (21) are similar to those of (15), (16) and (17).
Combining (19), (20) and (21), we can obtain the Fisher information matrix based on ERSSO:
6 Comparison and Conclusions
6.1 Numerical Comparison
In this subsection, we compare the efficiency of RSS and its modifications with that of SRS for parameter estimation for LEEGD\((\alpha ,\beta )\). For parameter inference of \(\alpha \), the relative efficiency of RSS with respect to (w.r.t.) SRS, the relative efficiency of MRSSK (K = E or O) w.r.t. SRS and the relative efficiency of ERSSK w.r.t. SRS may be defined as
respectively. For parameter inference of \(\beta \), the relative efficiency of RSS w.r.t. SRS, the relative efficiency of MRSSK w.r.t. SRS and the relative efficiency of ERSSK w.r.t. SRS may be defined as
respectively. For parameter inference of \(\alpha \) and \(\beta \), the relative efficiency of RSS w.r.t. SRS, the relative efficiency of MRSSK w.r.t. SRS and the relative efficiency of ERSSK w.r.t. SRS may be defined as
respectively.
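A minimal sketch of how such relative efficiencies can be computed from two \(2\times 2\) Fisher information matrices is given below. It assumes a common convention: for a single parameter, the ratio of the corresponding diagonal entries, and for both parameters jointly, the ratio of the determinants. The function names are illustrative, and the matrices in the usage lines are placeholders, not values from Tables 1, 2 and 3.

```python
def det2(mat):
    """Determinant of a 2x2 matrix given as nested lists."""
    return mat[0][0] * mat[1][1] - mat[0][1] * mat[1][0]


def re_single(info_design, info_srs, k):
    """Assumed single-parameter relative efficiency w.r.t. SRS.
    k = 0 refers to alpha and k = 1 to beta."""
    return info_design[k][k] / info_srs[k][k]


def re_joint(info_design, info_srs):
    """Assumed joint relative efficiency w.r.t. SRS (ratio of determinants)."""
    return det2(info_design) / det2(info_srs)


# Usage with placeholder information matrices (not results from the paper)
info_srs = [[1.0, 0.0], [0.0, 1.0]]
info_rss = [[2.0, 0.0], [0.0, 3.0]]
re_alpha = re_single(info_rss, info_srs, 0)
re_both = re_joint(info_rss, info_srs)
```

Under this convention, a value greater than 1 indicates that the design carries more information about the parameter(s) than SRS with the same number of measured units.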
It can be seen that \({\mathrm{RE}}^i\) \((i=1,2,\ldots ,9)\) are free of \(\alpha \). The results are, respectively, given in Tables 1, 2 and 3.
From Table 1, we conclude the following:
- (1) \({\mathrm{RE}}^1>1\), which means that for parameter inference of \(\alpha \) from LEEGD\((\alpha ,\beta )\) in which \(\beta \) is known, RSS is more efficient than SRS.
- (2) \({\mathrm{RE}}^2>1\), which means that for parameter inference of \(\alpha \) from LEEGD\((\alpha ,\beta )\) in which \(\beta \) is known, MRSS is more efficient than SRS.
- (3) \({\mathrm{RE}}^3>1\), which means that for parameter inference of \(\alpha \) from LEEGD\((\alpha ,\beta )\) in which \(\beta \) is known, ERSS is more efficient than SRS.
- (4) Comparing \({\mathrm{RE}}^1\), \({\mathrm{RE}}^2\) and \({\mathrm{RE}}^3\), we can conclude that for parameter inference of \(\alpha \) from LEEGD\((\alpha ,\beta )\) in which \(\beta \) is known, MRSS is more efficient than the other sampling designs.
From Table 2, we conclude the following:
- (5) \({\mathrm{RE}}^4>1\), which means that for parameter inference of \(\beta \) from LEEGD\((\alpha ,\beta )\) in which \(\alpha \) is known, RSS is more efficient than SRS.
- (6) \({\mathrm{RE}}^5>1\), which means that for parameter inference of \(\beta \) from LEEGD\((\alpha ,\beta )\) in which \(\alpha \) is known, MRSS is more efficient than SRS.
- (7) \({\mathrm{RE}}^6>1\), which means that for parameter inference of \(\beta \) from LEEGD\((\alpha ,\beta )\) in which \(\alpha \) is known, ERSS is more efficient than SRS.
- (8) Comparing \({\mathrm{RE}}^4\), \({\mathrm{RE}}^5\) and \({\mathrm{RE}}^6\), we can conclude that for parameter inference of \(\beta \) from LEEGD\((\alpha ,\beta )\) in which \(\alpha \) is known, MRSS is more efficient than the other sampling designs.
From Table 3, we conclude the following:
- (9) \({\mathrm{RE}}^7>1\), which means that for parameter inference of \(\alpha \) and \(\beta \) from LEEGD\((\alpha ,\beta )\), RSS is more efficient than SRS.
- (10) \({\mathrm{RE}}^8>1\), which means that for parameter inference of \(\alpha \) and \(\beta \) from LEEGD\((\alpha ,\beta )\), MRSS is more efficient than SRS.
- (11) \({\mathrm{RE}}^9>1\), which means that for parameter inference of \(\alpha \) and \(\beta \) from LEEGD\((\alpha ,\beta )\), ERSS is more efficient than SRS.
- (12) Comparing \({\mathrm{RE}}^7\), \({\mathrm{RE}}^8\) and \({\mathrm{RE}}^9\), we can conclude that for parameter inference of \(\alpha \) and \(\beta \) from LEEGD\((\alpha ,\beta )\), RSS is more efficient than the other sampling designs.
6.2 A Real Data Application
In this section, we present an analysis of a real data set. The data set is available from Frees (2010) and consists of 73 observations on 7 variables. The data were collected from a questionnaire carried out with the purpose of relating cost-effectiveness to management philosophy of controlling the company's exposure to various property and casualty losses, after adjusting for company effects such as size and industry type. These data have been previously analyzed by Schmit and Roth (1990), Gomez-Deniz et al. (2014) and Jodra and Jimenez-Gamero (2016). Here, interest is centered on the variable FIRMCOST (divided by 100), which is a measure of the cost-effectiveness of the risk management practices of the firm. The LEEGD was fitted to the variable FIRMCOST/100; the maximum likelihood estimates of \(\alpha \) and \(\beta \) are, respectively, 1.4322 and 52.1069. It can also be checked that the correlation coefficient between the theoretical and the empirical cumulative probabilities is 0.9956. The results of the analysis are presented in Tables 4 and 5. These tables lead to the same conclusions as the numerical comparison in Sect. 6.1.
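The correlation between theoretical and empirical cumulative probabilities mentioned above can be computed along the following lines. This sketch assumes the distribution function \(F(x)=(1+\beta )x^{\alpha }/(1+\beta x^{\alpha })\) and the plotting positions \((i-0.5)/n\) for the empirical probabilities. Both choices, as well as the function names, are assumptions, and the usage lines run on simulated values rather than on the FIRMCOST data.

```python
import math
import random


def leegd_cdf(x, alpha, beta):
    """Assumed LEEGD distribution function on (0, 1)."""
    xa = x ** alpha
    return (1.0 + beta) * xa / (1.0 + beta * xa)


def pp_correlation(data, alpha, beta):
    """Pearson correlation between fitted probabilities F(x_(i)) and the
    empirical plotting positions (i - 0.5) / n (an assumed convention)."""
    xs = sorted(data)
    n = len(xs)
    theo = [leegd_cdf(x, alpha, beta) for x in xs]
    emp = [(i - 0.5) / n for i in range(1, n + 1)]
    mean_t = sum(theo) / n
    mean_e = sum(emp) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(theo, emp))
    var_t = sum((t - mean_t) ** 2 for t in theo)
    var_e = sum((e - mean_e) ** 2 for e in emp)
    return cov / math.sqrt(var_t * var_e)


# Usage on synthetic data simulated by inversion (not the real data set)
rng = random.Random(1)
alpha, beta = 1.4, 5.0
synthetic = [(u / (1.0 + beta - beta * u)) ** (1.0 / alpha)
             for u in (rng.random() for _ in range(73))]
r = pp_correlation(synthetic, alpha, beta)
```

A value of r close to 1 indicates that the fitted and empirical cumulative probabilities lie close to a straight line.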
References
Abu-Dayyeh W, Assrhani A, Ibrahim K (2013) Estimation of the shape and scale parameters of Pareto distribution using ranked set sampling. Stat Pap 54(1):207–225
Al-Hadhrami SA, Al-Omari AI (2009) Bayesian inference on the variance of normal distribution using moving extremes ranked set sampling. J Mod Appl Stat Methods 8(1):227–235
Al-Hadhrami SA, Al-Omari AI (2012) Bayes estimation of the mean of normal distribution using moving extreme ranked set sampling. Pakistan J Stat Oper Res VIII(1):21–30
Al-Omari AI (2015) The efficiency of L ranked set sampling in estimating the distribution function. Afr Mat 26(7):1457–1466
Al-Omari AI, Al-Hadhrami SA (2011) On maximum likelihood estimators of the parameters of a modified Weibull distribution using extreme ranked set sampling. J Mod Appl Stat Methods 10(2):607–617
Al-Omari AI, Jaber K (2008) Percentile double ranked set sampling. J Math Stat 4(1):60–64
Al-Saleh MF, Al-Omari AI (2002) Multistage ranked set sampling. J Stat Plan Inference 102(2):273–286
Chen W, Xie M, Wu M (2013) Parametric estimation for the scale parameter for scale distributions using moving extremes ranked set sampling. Stat Probab Lett 83(9):2060–2066
Chen W, Xie M, Wu M (2016) Modified maximum likelihood estimator of scale parameter using moving extremes ranked set sampling. Commun Stat Simul Comput 45(6):2232–2240
Chen W, Tian Y, Xie M (2017) Maximum likelihood estimator of the parameter for a continuous one parameter exponential family under the optimal ranked set sampling. J Syst Sci Complex 30(6):1350–1363
Chen W, Tian Y, Xie M (2018) The global minimum variance unbiased estimator of the parameter for a truncated parameter family under the optimal ranked set sampling. J Stat Comput Simul 88(17):3399–3414
Chen W, Yang R, Yao D, Long C (2019) Pareto parameters estimation using moving extremes ranked set sampling. Stat Pap. https://doi.org/10.1007/s00362-019-01132-9
Clauset A, Shalizi CR, Newman MEJ (2009) Power-law distributions in empirical data. SIAM Rev 51(4):661–703
Dell TR, Clutter JL (1972) Ranked set sampling theory with order statistics background. Biometrics 28(2):545–555
Dey S, Salehi M, Ahmadi J (2017) Rayleigh distribution revisited via ranked set sampling. Metron 75(1):69–85
Esemen M, Gürler S (2018) Parameter estimation of generalized Rayleigh distribution based on ranked set sample. J Stat Comput Simul 88(4):615–628
Frees EW (2010) Regression modeling with actuarial and financial applications. International series on actuarial science. Cambridge University Press, Cambridge
Gomez-Deniz E, Sordo MA, Calderín-Ojeda E (2014) The Log–Lindley distribution as an alternative to the beta regression model with applications in insurance. Insurance Math Econ 54:49–57
Haq A, Brown J, Moltchanova E, Al-Omari AI (2013) Partial ranked set sampling design. Environmetrics 24(3):201–207
Haq A, Brown J, Moltchanova E, Al-Omari AI (2014) Mixed ranked set sampling design. J Appl Stat 41(10):2141–2156
Hassan AS (2013) Maximum likelihood and Bayes estimators of the unknown parameters for exponentiated exponential distribution using ranked set sampling. Int J Eng Res Appl 3(1):720–725
He X, Chen W, Qian W (2018) Maximum likelihood estimators of the parameters of the log-logistic distribution. Stat Pap. https://doi.org/10.1007/s00362-018-1011-3
He X, Chen W, Yang R (2019) Log-logistic parameters estimation using moving extremes ranked set sampling design. Appl Math A J Chin Univ Ser B (accepted)
Hussian MA (2014) Bayesian and maximum likelihood estimation for Kumaraswamy distribution based on ranked set sampling. Am J Math Stat 4(1):30–37
Jemain AA, Al-Omari AI (2006) Multistage median ranked set samples for estimating the population mean. Pakistan J Stat 22(3):195–207
Jodra P, Jimenez-Gamero MD (2016) A note on the Log–Lindley distribution. Insurance Math Econ 71:186–194
McIntyre GA (1952) A method for unbiased selective sampling using ranked sets. Aust J Agric Res 3(4):385–390
Mitzenmacher M (2004) A brief history of generative models for power law and log-normal distribution. Internet Math 1(2):226–251
Muttlak HA (1997) Median ranked set sampling. J Appl Stat Sci 6(4):245–255
Newman MEJ (2005) Power laws, Pareto distributions and Zipf’s law. Contemp Phys 46:323–351
Pedro J, Maria D, Jimenez G (2018) A quantile regression model for bounded responses based on the exponential geometric distribution. Revstat Stat J
Qian W, Chen W, He X (2019) Parameter estimation for the Pareto distribution based on ranked set sampling. Stat Pap. https://doi.org/10.1007/s00362-019-01102-1
Samawi HM, Ahmed MS, Abu-Dayyeh W (1996) Estimating the population mean using extreme ranked set sampling. Biom J 38(5):577–586
Schmit JT, Roth K (1990) Cost effectiveness of risk management practices. J Risk Insurance 57(3):455–470
Shadid MR, Raqab MZ, Al-Omari AI (2011) Modified BLUEs and BLIEs of the location and scale parameters and the population mean using ranked set sampling. J Stat Comput Simul 81(3):261–274
Shaibu AB, Muttlak HA (2004) Estimating the parameters of the normal, exponential and gamma distributions using median and extreme ranked set samples. Statistica 64(1):75–98
Sinha BK, Sinha BK, Purkayastha S (1996) On some aspects of ranked set sampling for estimation of normal and exponential parameters. Stat Decis 14(3):223–240
Sornette D (2006) Critical phenomena in natural sciences, 2nd edn. Springer, Berlin
Stokes L (1995) Parametric ranked set sampling. Ann Inst Stat Math 47(3):465–482
Takahasi K, Wakimoto K (1968) On unbiased estimates of the population mean based on the sample stratified by means of ordering. Ann Inst Stat Math 20(1):1–31
Acknowledgements
The authors thank the Editor in Chief, an associate editor and reviewers for their valuable comments and suggestions to improve the paper. This research was supported by the National Science Foundation of China (Grant No. 11901236), Scientific Research Fund of Hunan Provincial Science and Technology Department (Grant No. 2019JJ50479), Scientific Research Fund of Hunan Provincial Education Department (Grant No. 18B322), Winning Bid Project of Hunan Province for the 4th National Economic Census (Grant No. [2020]1), Young Core Teacher Foundation of Hunan Province (No. [2020]43) and Fundamental Research Fund of Xiangxi Autonomous Prefecture (Grant No. 2018SF5026).
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Yang, R., Chen, W., Yao, D. et al. The Efficiency of Ranked Set Sampling Design for Parameter Estimation for the Log-Extended Exponential–Geometric Distribution. Iran J Sci Technol Trans Sci 44, 497–507 (2020). https://doi.org/10.1007/s40995-020-00855-x