Abstract
In this paper, the L ranked set sampling (LRSS) technique (Al-Nasser in Simul Comput, 6:33–43, 2007) is considered for estimating the distribution function of a random variable. The suggested estimator of the distribution function is compared with its counterparts based on simple random sampling (SRS) and ranked set sampling (RSS). It is found that the suggested LRSS estimator of the distribution function is biased, yet for a given \(x\) it is more efficient than the SRS and RSS estimators based on the same number of measured units.
1 Introduction
RSS was suggested by [10] for estimating the population mean of pasture and forage yields as a more efficient method than SRS. RSS can be implemented in situations where the sampling units in a study can be ranked more easily than they can be quantified.
The RSS procedure suggested by [10] can be described as follows:
- Step 1: Randomly select \(n^2\) units from the target population.
- Step 2: Allocate the \(n^2\) selected units as randomly as possible into \(n\) sets, each of size \(n\).
- Step 3: Rank the units within each set with respect to the variable of interest. The ranking may be based on personal professional judgment or on a concomitant variable correlated with the variable of interest.
- Step 4: Choose a sample for actual quantification by taking the smallest ranked unit from the first set, the second smallest ranked unit from the second set, and so on, until the largest ranked unit is selected from the last set.
- Step 5: Repeat Steps 1 through 4 for \(m\) cycles to obtain a sample of size \(mn\) for actual quantification.
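The five steps above can be sketched in code. This is a minimal illustration and not the procedure of [10] as implemented in practice: it assumes perfect ranking (units are ranked by their true values, whereas in applications ranking is done by judgment or a concomitant variable), and the function name `ranked_set_sample` and the `draw` callback are hypothetical.

```python
import random


def ranked_set_sample(draw, n, m, rng):
    """Return m cycles of an RSS, each cycle holding n measured units.

    draw(rng) is a hypothetical callback that returns one unit from the
    population. Ranking is assumed perfect (sorting by the true value).
    """
    cycles = []
    for _ in range(m):                      # Step 5: repeat for m cycles
        measured = []
        for j in range(n):                  # one set of size n per rank j
            # Steps 1-3: draw a set of n units and rank them
            ranked_set = sorted(draw(rng) for _ in range(n))
            # Step 4: keep the (j+1)th smallest unit of the (j+1)th set
            measured.append(ranked_set[j])
        cycles.append(measured)
    return cycles
```

Each cycle draws \(n^2\) units but measures only \(n\) of them, so the returned structure holds \(mn\) measured values in total.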
Let \(X_1,\,X_2 ,\ldots ,X_n\) be a SRS of size \(n\) with a continuous probability density function (pdf) \(\varphi (x)\) and cumulative distribution function (cdf) \(\Phi (x)\), with a finite mean \(\mu \) and variance \(\sigma ^2\). Let \(X_{11h},\,X_{12h},{\ldots },X_{1nh}\); \(X_{21h},\,X_{22h},{\ldots },\,X_{2nh}\); ...;\(X_{n1h},\,X_{n2h},{\ldots },\,X_{nnh}\) be \(n\) independent SRS each of size \(n\) in the \(h\)th cycle \(( {h=1,2,...,m})\).
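Assume that \(X_{i(1:n)h}\), \(X_{i(2:n)h}\), ..., \(X_{i(n:n)h}\) are the order statistics of the \(i\)th sample \(X_{i1h},\,X_{i2h},\ldots ,\,X_{inh}\) \(( {i=1,2,...,n})\) in the \(h\)th cycle \(( {h=1,2,...,m})\). Then \(X_{1(1:n)h}\), \(X_{2(2:n)h}\), ..., \(X_{n(n:n)h}\) is an RSS of size \(n\). The cdf \(\Phi _{(j:n)} (x)\) of the \(j\)th order statistic \(X_{(j:n)}\) is given by
\[\Phi _{(j:n)} (x)=\sum _{i=j}^n \binom{n}{i}\Phi ^i(x)\left[ {1-\Phi (x)} \right] ^{n-i},\]
and the pdf \(\varphi _{(j:n)} (x)\) is given by
\[\varphi _{(j:n)} (x)=n\binom{n-1}{j-1}\Phi ^{j-1}(x)\left[ {1-\Phi (x)} \right] ^{n-j}\varphi (x).\]
Also, \(E( {X_{(j:n)} })=\mu _{(j:n)} =\int \nolimits _{-\infty }^\infty {x\varphi _{(j:n)} (x)dx} \) and \(\text{ Var }( {X_{(j:n)} })=\sigma _{(j:n)}^2 =\int \nolimits _{-\infty }^\infty ( {x-\mu _{(j:n)} })^2\varphi _{(j:n)} (x)dx \). See [8] for more details. [14] independently introduced the same RSS method and showed that
\[\varphi (x)=\frac{1}{n}\sum _{j=1}^n \varphi _{(j:n)} (x),\quad \mu =\frac{1}{n}\sum _{j=1}^n \mu _{(j:n)},\quad \sigma ^2=\frac{1}{n}\sum _{j=1}^n \sigma _{(j:n)}^2 +\frac{1}{n}\sum _{j=1}^n ( {\mu _{(j:n)} -\mu })^2.\]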
As a modification of the usual RSS, [1] suggested the LRSS as a robust method for estimating the population mean. The LRSS method consists of the following steps:
- Step 1: Select \(n\) samples, each of size \(n\), from the target population.
- Step 2: Rank the units within each sample with respect to the variable of interest by visual inspection or any other cost-free method.
- Step 3: Select the LRSS coefficient \(k=[\alpha n]\), where \(0\le \alpha <0.5\) and \([x]\) is the largest integer less than or equal to \(x\).
- Step 4: For each of the first \(k\) ranked samples, select the unit with rank \(k+1\) for actual measurement.
- Step 5: For each of the last \(k\) ranked samples, i.e., the \((n-k+1)\)th to the \(n\)th ranked samples, select the unit with rank \(n-k\) for actual measurement.
- Step 6: For \(j=k+1,k+2,...,n-k\), select the unit with rank \(j\) in the \(j\)th ranked sample for actual measurement.
- Step 7: The procedure can be repeated for \(m\) cycles if needed to obtain a sample of \(nm\) units.
Note that if \(k=0\), then the LRSS will be reduced to the usual RSS. The LRSS is used for estimating the linear regression model by [2].
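The selection rule of the LRSS steps can be illustrated with a short sketch. It is not code from [1]: it assumes perfect ranking by the true value, takes the coefficient \(k\) directly rather than computing \([\alpha n]\) (to avoid floating-point rounding of \(\alpha n\)), and the names `lrss_ranks`, `l_ranked_set_sample` and `draw` are hypothetical.

```python
import random


def lrss_ranks(n, k):
    """Rank measured in each of the n ranked samples (1-based).

    The first k samples use rank k+1, the last k samples use rank n-k,
    and sample j in between uses its own rank j. Requires 0 <= 2k < n.
    """
    if not 0 <= 2 * k < n:
        raise ValueError("k must satisfy 0 <= 2k < n")
    return [min(max(i, k + 1), n - k) for i in range(1, n + 1)]


def l_ranked_set_sample(draw, n, k, m, rng):
    """Return m cycles of an LRSS, assuming perfect ranking.

    draw(rng) is a hypothetical callback returning one population unit.
    """
    ranks = lrss_ranks(n, k)
    cycles = []
    for _ in range(m):
        measured = []
        for r in ranks:
            ranked_sample = sorted(draw(rng) for _ in range(n))
            measured.append(ranked_sample[r - 1])   # unit with rank r
        cycles.append(measured)
    return cycles
```

For instance, `lrss_ranks(8, 2)` gives the ranks 3, 3, 3, 4, 5, 6, 6, 6, while `lrss_ranks(n, 0)` gives 1, ..., n, which is the usual RSS.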
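Let \(X_{1(k+1:n)},\,X_{2(k+1:n)},{\ldots },X_{k(k+1:n)}\), \(X_{k+1(k+1:n)},\,X_{k+2(k+2:n)} ,{\ldots },X_{n-k(n-k:n)}\), \(X_{n-k+1(n-k:n)},\,X_{n-k+2(n-k:n)} ,{\ldots }, X_{n(n-k:n)}\) be an LRSS of size \(n\). The LRSS estimator of the population mean is defined as
\[\bar{X}_{LRSS} =\frac{1}{n}\left[ {\sum _{i=1}^k X_{i(k+1:n)} +\sum _{i=k+1}^{n-k} X_{i(i:n)} +\sum _{i=n-k+1}^n X_{i(n-k:n)} } \right] .\]
Its variance is given by
\[\text{ Var }( {\bar{X}_{LRSS} })=\frac{1}{n^2}\left[ {k\sigma _{(k+1:n)}^2 +\sum _{i=k+1}^{n-k} \sigma _{(i:n)}^2 +k\sigma _{(n-k:n)}^2 } \right] .\]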
For more information about RSS and its modifications, see, for example, the following works. [12] suggested extreme ranked set sampling (ERSS) for estimating a population mean. [6] suggested multistage RSS for estimating the population mean. Estimation of the mean based on modified robust ERSS was considered by [3]. [11] used ERSS and median RSS for estimating the distribution function. [9] considered estimation of the distribution function using extreme median RSS. [4] considered ratio-type estimators of the population mean using auxiliary information based on median RSS.
The remaining part of this paper is organized as follows: In Sect. 2, estimation of the distribution function is considered using SRS, RSS, and the suggested LRSS. Numerical comparisons are given in Sect. 3. Finally, in Sect. 4 some conclusions and recommendations are presented.
2 Estimation of \(\Phi (x)\) using SRS, RSS and LRSS
Let \(\Phi _{SRS} (x)\) be the empirical distribution function of an SRS of size \(nm\), \(X_1,\,X_2,\ldots ,\,X_{nm}\), from \(\Phi (x)\). It is well known that \(\Phi _{SRS} (x)\) is an unbiased estimator of \(\Phi (x)\) for a given \(x\), with variance \(\text{ Var }\left[ {\Phi _{SRS} (x)} \right] =\frac{1}{nm}\Phi (x)\left[ {1-\Phi (x)} \right] \), and that \(\Phi _{SRS} (x)\) is a consistent estimator of \(\Phi (x)\) (see [7]).
An estimator of \(\Phi (x)\) based on RSS was suggested by [13] for fixed \(x\) as \(\Phi _{RSS} (x)=\frac{1}{nm}\sum \nolimits _{h=1}^m {\sum \nolimits _{j=1}^n {I( {X_{j(j:n)h} \le x})} } \), where \(I(\cdot )\) is the indicator function. They proved that \(E\left[ {\Phi _{RSS} (x)} \right] =\frac{1}{nm}\sum \nolimits _{h=1}^m {\sum \nolimits _{j=1}^n {\Phi _{(j:n)} (x)} } =\Phi (x)\), that \(\text{ Var }\left[ {\Phi _{RSS} (x)} \right] =\frac{1}{n^2m}\sum \nolimits _{j=1}^n \Phi _{(j:n)} (x)\left[ {1-\Phi _{(j:n)} (x)} \right] \), and that \(\frac{\Phi _{RSS} (x)-E\left[ {\Phi _{RSS} (x)} \right] }{\sqrt{\text{ Var }\left[ {\Phi _{RSS} (x)} \right] } }\buildrel D \over \longrightarrow N(0,1)\) as \(m\rightarrow \infty \) when \(x\) and \(n\) are fixed.
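The SRS and RSS variances quoted above can be evaluated exactly once \(\Phi (x)\) is fixed, because the cdf of an order statistic is a binomial tail sum. The following sketch is an illustration only; the function names are hypothetical and `p` stands for the value \(\Phi (x)\).

```python
from math import comb


def order_stat_cdf(j, n, p):
    """P(X_(j:n) <= x) when Phi(x) = p: a binomial upper-tail sum."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j, n + 1))


def var_srs(n, m, p):
    """Variance of the empirical cdf from an SRS of size n*m."""
    return p * (1 - p) / (n * m)


def var_rss(n, m, p):
    """Variance of the RSS estimator of the cdf for set size n and m cycles."""
    cdfs = [order_stat_cdf(j, n, p) for j in range(1, n + 1)]
    return sum(q * (1 - q) for q in cdfs) / (n * n * m)
```

The order-statistic cdfs average to \(\Phi (x)\), which is why the RSS estimator is unbiased, and `var_rss` never exceeds `var_srs` for the same `n`, `m` and `p`.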
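The suggested LRSS estimator of the distribution function \(\Phi (x)\) is defined as
\[\Phi _{LRSS} (x)=\frac{1}{nm}\sum _{h=1}^m \left[ {\sum _{i=1}^k I( {X_{i(k+1:n)h} \le x}) +\sum _{i=k+1}^{n-k} I( {X_{i(i:n)h} \le x}) +\sum _{i=n-k+1}^n I( {X_{i(n-k:n)h} \le x}) } \right] ,\tag{3}\]
with variance
\[\text{ Var }\left[ {\Phi _{LRSS} (x)} \right] =\frac{1}{n^2m}\left[ {k\Phi _{(k+1:n)} (x)\left[ {1-\Phi _{(k+1:n)} (x)} \right] +\sum _{i=k+1}^{n-k} \Phi _{(i:n)} (x)\left[ {1-\Phi _{(i:n)} (x)} \right] +k\Phi _{(n-k:n)} (x)\left[ {1-\Phi _{(n-k:n)} (x)} \right] } \right] ,\]
where \(\Phi _{(j:n)} (x)=\sum \nolimits _{i=j}^n \binom{n}{i}\Phi ^i(x)\left[ {1-\Phi (x)} \right] ^{n-i}\) is the cdf of the \(j\)th order statistic.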
It is well known that when ranked set sampling is used, the set size is kept small so that the units can be ranked visually or by another cheap method. To increase the sample size, the process is therefore repeated for \(m\) cycles with the set size \(n\) held fixed. Accordingly, in the following two propositions it is assumed that the number of cycles becomes large.
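Proposition 1
For fixed \(x\), the expectation of \(\Phi _{LRSS} (x)\) is
\[E\left[ {\Phi _{LRSS} (x)} \right] =\frac{1}{n}\left[ {k\Phi _{(k+1:n)} (x)+\sum _{i=k+1}^{n-k} \Phi _{(i:n)} (x)+k\Phi _{(n-k:n)} (x)} \right] ,\tag{6}\]
with bias
\[Bias\left[ {\Phi _{LRSS} (x)} \right] =\frac{1}{n}\left[ {k\Phi _{(k+1:n)} (x)+\sum _{i=k+1}^{n-k} \Phi _{(i:n)} (x)+k\Phi _{(n-k:n)} (x)} \right] -\Phi (x).\tag{7}\]
Proof
To prove (6), take the expectation of (3) as
\[E\left[ {\Phi _{LRSS} (x)} \right] =\frac{1}{nm}\sum _{h=1}^m \left[ {\sum _{i=1}^k P( {X_{i(k+1:n)h} \le x}) +\sum _{i=k+1}^{n-k} P( {X_{i(i:n)h} \le x}) +\sum _{i=n-k+1}^n P( {X_{i(n-k:n)h} \le x}) } \right] =\frac{1}{n}\left[ {k\Phi _{(k+1:n)} (x)+\sum _{i=k+1}^{n-k} \Phi _{(i:n)} (x)+k\Phi _{(n-k:n)} (x)} \right] .\]
The proof of (7) follows directly from the definition \(Bias\left[ {\Phi _{LRSS} (x)} \right] =E\left[ {\Phi _{LRSS} (x)} \right] -\Phi (x)\).
Note that for fixed \(k\) the bias approaches zero as \(n\rightarrow \infty \), and that it does not depend on \(m\). \(\square \)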
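The expectation, bias and variance of the LRSS estimator can be checked numerically for a given value of \(\Phi (x)\). The sketch below is an illustration under the assumption that unit \(i\) of a cycle is the order statistic whose rank is given by the LRSS steps of Sect. 1; the function names are hypothetical and `p` denotes \(\Phi (x)\).

```python
from math import comb


def order_stat_cdf(j, n, p):
    """P(X_(j:n) <= x) when Phi(x) = p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j, n + 1))


def lrss_ranks(n, k):
    """Ranks measured in an LRSS cycle: k+1 (k times), k+1..n-k, n-k (k times)."""
    return [min(max(i, k + 1), n - k) for i in range(1, n + 1)]


def lrss_moments(n, k, m, p):
    """Return (expectation, bias, variance) of the LRSS estimator of Phi(x)."""
    cdfs = [order_stat_cdf(r, n, p) for r in lrss_ranks(n, k)]
    expectation = sum(cdfs) / n                 # does not involve m
    bias = expectation - p
    variance = sum(q * (1 - q) for q in cdfs) / (n * n * m)
    return expectation, bias, variance
```

With `k = 0` the bias vanishes for every `p`, in line with the unbiasedness of the RSS estimator, and by symmetry it also vanishes at `p = 0.5`. Increasing `m` shrinks the variance but leaves the bias unchanged.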
Proposition 2
For fixed x and n, when \(m\!\rightarrow \! \infty \), the random variable \(Z\!=\!\frac{\Phi _{LRSS} (x)\!-\!E\left[ {\Phi _{LRSS} (x)} \right] }{\sqrt{\text{ Var }\left[ {\Phi _{LRSS} (x)} \right] } }\) converges in distribution to the standard normal distribution.
for
Proof
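Suppose that
\[\Psi _w =\frac{1}{n}\left[ {\sum _{i=1}^k I( {X_{i(k+1:n)w} \le x}) +\sum _{i=k+1}^{n-k} I( {X_{i(i:n)w} \le x}) +\sum _{i=n-k+1}^n I( {X_{i(n-k:n)w} \le x}) } \right] ,\quad w=1,2,...,m,\]
so that \(\Phi _{LRSS} (x)=\bar{\Psi }_w =\frac{1}{m}\sum \nolimits _{w=1}^m \Psi _w \).
Then,
\[E( {\Psi _w })=\frac{1}{n}\left[ {kP( {X_{(k+1:n)} \le x}) +\sum _{i=k+1}^{n-k} P( {X_{(i:n)} \le x}) +kP( {X_{(n-k:n)} \le x}) } \right] ,\]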
where \(P( {X_{(j:n)} \le x})=\sum \nolimits _{i=j}^n \binom{n}{i}\Phi ^i(x)\left[ {1-\Phi (x)} \right] ^{n-i}\).
Since the random variables \(\Psi _w \) are independent and identically distributed (IID), the central limit theorem (CLT) implies that \(\frac{\sqrt{m} \left[ {\bar{\Psi }_w -E( {\Psi _w })} \right] }{\sqrt{\text{ Var }( {\Psi _w })} }\) converges in distribution to \(N(0,1)\) as \(m\rightarrow \infty \). \(\square \)
3 Numerical Comparisons
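In this section, a simulation study is conducted to investigate the efficiency of the LRSS method in estimating the distribution function \(\Phi (x)\). The suggested estimator is compared with its RSS and SRS competitors for sample sizes \(n=4,5,6,...,10\). The efficiencies of \(\Phi _{RSS} (x)\) and \(\Phi _{LRSS} (x)\) with respect to \(\Phi _{SRS} (x)\) are defined, respectively, as
\[eff\left[ {\Phi _{RSS} (x),\Phi _{SRS} (x)} \right] =\frac{\text{ Var }\left[ {\Phi _{SRS} (x)} \right] }{\text{ Var }\left[ {\Phi _{RSS} (x)} \right] }\]
and
\[eff\left[ {\Phi _{LRSS} (x),\Phi _{SRS} (x)} \right] =\frac{\text{ Var }\left[ {\Phi _{SRS} (x)} \right] }{MSE\left[ {\Phi _{LRSS} (x)} \right] },\]
where \(MSE\left[ {\Phi _{LRSS} (x)} \right] =\text{ Var }\left[ {\Phi _{LRSS} (x)} \right] +\left( {Bias\left[ {\Phi _{LRSS} (x)} \right] } \right) ^2\), because \(\Phi _{LRSS} (x)\) is a biased estimator.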
The bold entries in Tables 1, 2, 3 and 4 are the best efficiency values of LRSS relative to SRS, for fixed \(k\), in estimating the distribution function \(\Phi (x)\). The efficiency values are largest as \(\Phi (x)\) approaches zero (0.01) or one (0.99). For example, for \(n=10\) and \(k=2\), the efficiency values are 9.4327 and 9.5039 for \(\Phi (x)=0.01\) and 0.99, respectively. For the other values, \(0.10\le \Phi (x)\le 0.90\), the efficiency increases as \(\Phi (x)\) approaches 0.5.
For the RSS method, as shown in Table 5 and in [13], \(\Phi _{RSS} (x)\) is more efficient than \(\Phi _{SRS} (x)\). However, \(\Phi _{LRSS} (x)\) is more efficient than \(\Phi _{RSS} (x)\) in most of the cases considered in this study. For example, when \(n=8\) and \(\Phi (x)=0.99\), the efficiency values of RSS and of LRSS with \(k=2\) are 1.0523 and 12.3858, respectively.
From Table 6, the value of \(k\) that gives the best efficiency can be selected for each sample size.
Table 7 gives the most suitable sample size for each value of \(\Phi (x)\). In particular, the best sample sizes are \(n=5\) for \(\Phi (x)=0.01,\,0.99\), \(n=4\) for \(\Phi (x)=0.10,\,0.90\), and \(n=10\) for \(0.2\le \Phi (x)\le 0.8\).
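A Monte Carlo comparison of the kind reported in this section can be sketched as follows. This is not the author's simulation code: it assumes a uniform(0, 1) population (so that \(\Phi (x)=x\)), perfect ranking, and an efficiency defined as the exact SRS variance divided by the simulated mean squared error of the LRSS estimator. The function names, the number of replications and the seed are illustrative choices.

```python
import random


def lrss_ranks(n, k):
    """Ranks measured in an LRSS cycle: k+1 (k times), k+1..n-k, n-k (k times)."""
    return [min(max(i, k + 1), n - k) for i in range(1, n + 1)]


def simulate_lrss_efficiency(n, k, p, m=1, reps=10000, seed=1):
    """Estimate eff(LRSS, SRS) at Phi(x) = p for a uniform(0, 1) population."""
    rng = random.Random(seed)
    ranks = lrss_ranks(n, k)
    squared_error = 0.0
    for _ in range(reps):
        hits = 0
        for _ in range(m):
            for r in ranks:
                # Draw one set of n units, rank it, measure the unit of rank r
                ordered = sorted(rng.random() for _ in range(n))
                hits += int(ordered[r - 1] <= p)
        estimate = hits / (n * m)
        squared_error += (estimate - p) ** 2
    mse_lrss = squared_error / reps
    var_srs = p * (1 - p) / (n * m)     # exact variance of the SRS estimator
    return var_srs / mse_lrss
```

Under these assumptions, `simulate_lrss_efficiency(8, 2, 0.99)` returns a value of about 12, which is of the same order as the efficiency quoted above for \(n=8\), \(k=2\) and \(\Phi (x)=0.99\). Because the bias does not shrink with `m`, the efficiency obtained this way depends on the number of cycles.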
4 Conclusion
A new estimator of the distribution function is suggested using the L ranked set sampling method. The suggested estimator is compared with the SRS and RSS estimators of the distribution function. It is found that the LRSS estimator is biased but more efficient than the SRS and RSS estimators in most of the cases considered in this study. It is therefore recommended to use LRSS for estimating the distribution function of a continuous random variable.
References
1. Al-Nasser, A.D.: L-Ranked set sampling: a generalization procedure for robust visual sampling. Commun. Stat. Simul. Comput. 6, 33–43 (2007)
2. Al-Nasser, A.D., Radaideh, A.: Estimation of simple linear regression model using L ranked set sampling. Int. J. Open Probl. Comput. Sci. Math. 1(1), 18–33 (2008)
3. Al-Omari, A.I.: Estimation of mean based on modified robust extreme ranked set sampling. J. Stat. Comput. Simul. 81(8), 1055–1066 (2011)
4. Al-Omari, A.I.: Ratio estimation of population mean using auxiliary information in simple random sampling and median ranked set sampling. Stat. Probab. Lett. 82(11), 1883–1990 (2012)
5. Al-Omari, A.I., Al-Hadhrami, S.A.: On maximum likelihood estimators of the parameters of a modified Weibull distribution using extreme ranked set sampling. J. Mod. Appl. Stat. Methods 10(2), 607–617 (2011)
6. Al-Saleh, M.F., Al-Omari, A.I.: Multistage ranked set sampling. J. Stat. Plan. Inference 102(2), 273–286 (2002)
7. Bahadur, R.R.: A note on quantiles in large samples. Ann. Math. Stat. 37(3), 577–580 (1966)
8. David, H.A., Nagaraja, H.N.: Order Statistics, 3rd edn. Wiley, New Jersey (2003)
9. Kim, D.H., Kim, D.W., Kim, G.H.: On the estimation of the distribution function using extreme median ranked set sampling. J. Korean Data Anal. Soc. 7(2), 429–439 (2005)
10. McIntyre, G.A.: A method for unbiased selective sampling using ranked sets. Aust. J. Agric. Res. 3, 385–390 (1952)
11. Samawi, H., Al-Sageer, O.M.: On the estimation of the distribution function using extreme and median ranked set sampling. Biom. J. 43(3), 357–373 (2001)
12. Samawi, H.M., Mohmmad, S., Abu-Dayyeh, W.: Estimating the population mean using extreme ranked set sampling. Biom. J. 38(5), 577–586 (1996)
13. Stokes, S.L., Sager, T.W.: Characterization of a ranked set sample with application to estimating distribution functions. J. Am. Stat. Assoc. 83(402), 374–381 (1988)
14. Takahasi, K., Wakimoto, K.: On the unbiased estimates of the population mean based on the sample stratified by means of ordering. Ann. Inst. Stat. Math. 20, 1–31 (1968)
Acknowledgments
The author is thankful to the editor and the anonymous reviewers for their valuable comments and suggestions that significantly improved the original version of the article.
Cite this article
Al-Omari, A.I. The efficiency of L ranked set sampling in estimating the distribution function. Afr. Mat. 26, 1457–1466 (2015). https://doi.org/10.1007/s13370-014-0298-z