1 Introduction

Ranked set sampling (RSS) was suggested by [10] for estimating the population mean of pasture and forage yields as a more efficient method than simple random sampling (SRS). RSS can be implemented in situations where the sampling units in a study can be ranked more easily than they can be quantified.

The RSS scheme, as suggested by [10], can be described by the following steps (a simulation sketch is given after the list):

  • Step 1: Randomly select \(n^2\) units from the target population.

  • Step 2: Allocate the \(n^2\) selected units as randomly as possible into \(n\) sets, each of size \(n\).

  • Step 3: Rank the units within each set with respect to a variable of interest. This may be based on personal professional judgment or done based on a concomitant variable correlated with the variable of interest.

  • Step 4: Choose a sample for actual quantification by including the smallest ranked unit from the first set and the second smallest ranked unit from the second set; the process continues in this way until the largest ranked unit is selected from the last set.

  • Step 5: Repeat Steps 1 through 4 for \(m\) cycles to obtain a sample of size \(mn\) for actual quantification.
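
A minimal simulation sketch of these steps follows, assuming perfect ranking (units are ranked on their measured values) and a standard normal population for illustration; the names `rss_sample` and `draw` are ours, not from the original papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def rss_sample(draw, n, m):
    """Ranked set sample of size n*m; `draw(size)` returns SRS draws from the population."""
    sample = []
    for _ in range(m):                     # Step 5: repeat for m cycles
        units = draw(n * n).reshape(n, n)  # Steps 1-2: n random sets, each of size n
        units.sort(axis=1)                 # Step 3: rank within each set (perfect ranking)
        sample.extend(units[i, i] for i in range(n))  # Step 4: i-th order statistic of set i
    return np.asarray(sample)

x = rss_sample(lambda size: rng.normal(size=size), n=4, m=10)  # sample of size 40
```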

Let \(X_1,\,X_2 ,\ldots ,X_n\) be an SRS of size \(n\) from a continuous probability density function (pdf) \(\varphi (x)\) with cumulative distribution function (cdf) \(\Phi (x)\), finite mean \(\mu \) and variance \(\sigma ^2\). Let \(X_{11h},\,X_{12h},{\ldots },X_{1nh}\); \(X_{21h},\,X_{22h},{\ldots },\,X_{2nh}\); ...; \(X_{n1h},\,X_{n2h},{\ldots },\,X_{nnh}\) be \(n\) independent SRSs, each of size \(n\), in the \(h\)th cycle \(( {h=1,2,...,m})\).

Assume that \(X_{i(1:n)h} \), \(X_{i(2:n)h} \), ..., \(X_{i(n:n)h} \) are the order statistics of the \(i\)th sample \(X_{i1h},\,X_{i2h} ,\ldots ,\,X_{inh}\) \(( {i=1,2,...,n})\) in the \(h\)th cycle \(( {h=1,2,...,m})\). Then \(X_{1(1:n)h} \), \(X_{2(2:n)h} \), ..., \(X_{n(n:n)h} \) is a RSS of size \(n\). The cdf \(\Phi _{(j:n)} (x)\) of the \(j\)th order statistic \(X_{(j:n)} \) is given by

$$\begin{aligned} \Phi _{(j:n)} (x)=\sum \limits _{i=j}^n \binom{n}{i}\left[ {\Phi (x)} \right] ^i\left[ {1-\Phi (x)} \right] ^{n-i},\quad -\infty<x<\infty , \end{aligned}$$
(1)

and the pdf \(\varphi _{(j:n)} (x)\) is given by

$$\begin{aligned} \varphi _{(j:n)} (x)=\frac{n!}{(j-1)!\,(n-j)!}\left[ {\Phi (x)} \right] ^{j-1}\left[ {1-\Phi (x)} \right] ^{n-j}\varphi (x). \end{aligned}$$
(2)

Also, \(E( {X_{(j:n)} })=\mu _{(j:n)} =\int \nolimits _{-\infty }^\infty {x\varphi _{(j:n)} (x)dx} \) and \(\text{ Var }( {X_{(j:n)} })=\sigma _{(j:n)}^2 =\int \nolimits _{-\infty }^\infty ( {x-\mu _{(j:n)} })^2\varphi _{(j:n)} (x)dx \); see [8] for more details. [14] independently introduced the same RSS method and showed that

$$\begin{aligned} \varphi (x)\!=\!\frac{1}{n}\sum \limits _{j=1}^n {\varphi _{(j:n)} (x)} ,\mu \!=\!\frac{1}{n}\sum \limits _{j=1}^n {\mu _{(j:n)} } \;\text{ and } \sigma ^2=\frac{1}{n}\sum \limits _{j=1}^n {\sigma _{(j:n)}^2 } +\frac{1}{n}\sum \limits _{j=1}^n {\left[ {E\left( {X_{(j:n)} }\right) -\mu } \right] ^2}. \end{aligned}$$
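
As a quick numerical check of the first identity (our own illustration, assuming a standard normal population), the following sketch averages the order statistic pdfs of Eq. (2):

```python
import numpy as np
from scipy.stats import norm
from scipy.special import factorial

def pdf_order_stat(x, j, n):
    """pdf of the j-th order statistic of n iid N(0,1) variables, Eq. (2)."""
    c = factorial(n) / (factorial(j - 1) * factorial(n - j))
    return c * norm.cdf(x) ** (j - 1) * (1 - norm.cdf(x)) ** (n - j) * norm.pdf(x)

n = 5
x = np.linspace(-3, 3, 7)
avg = sum(pdf_order_stat(x, j, n) for j in range(1, n + 1)) / n
assert np.allclose(avg, norm.pdf(x))  # phi(x) = (1/n) * sum_j phi_(j:n)(x)
```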

As a modification of the usual RSS, [1] suggested L ranked set sampling (LRSS) as a robust method for estimating the population mean. The LRSS method consists of the following steps (a simulation sketch is given after the list):

  • Step 1: Select \(n\) samples each of size \(n\) from the target population.

  • Step 2: Rank the units within each sample with respect to a variable of interest by visual inspection or any cost free method.

  • Step 3: Select the LRSS coefficient \(k=[\alpha n]\), where \(0\le \alpha <0.5\) and \([x]\) is the largest integer less than or equal to \(x\).

  • Step 4: For each of the first \(k\) ranked samples, select the unit with rank \(k+1\) for actual measurement.

  • Step 5: For each of the last \(k\) ranked samples, i.e., the \((n-k+1)\)th to the \(n\)th ranked samples, select the unit with rank \(n-k\) for actual measurement.

  • Step 6: For \(j=k+1,k+2,...,n-k\), select the unit with rank \(j\) in the \(j\)th ranked sample for actual measurement.

  • Step 7: The procedure can be repeated for \(m\) cycles, if needed, to obtain a sample of size \(nm\) units.
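
A minimal sketch of the LRSS selection follows, again assuming perfect ranking and a standard normal population; `lrss_sample` and its arguments are our own illustrative names. The sample mean of the returned array is the estimator \(\bar{X}_{LRSS(k,n)}\) defined below.

```python
import numpy as np

rng = np.random.default_rng(1)

def lrss_sample(draw, n, m, alpha=0.2):
    """L ranked set sample of size n*m following Steps 1-7 (perfect ranking)."""
    k = int(alpha * n)                           # Step 3: k = [alpha * n]
    picks = np.clip(np.arange(n), k, n - k - 1)  # 0-based indices for ranks k+1, ..., n-k
    sample = []
    for _ in range(m):                           # Step 7: repeat for m cycles
        sets = np.sort(draw((n, n)), axis=1)     # Steps 1-2 plus ranking
        sample.extend(sets[i, picks[i]] for i in range(n))  # Steps 4-6
    return np.asarray(sample)

x = lrss_sample(lambda size: rng.normal(size=size), n=6, m=5)  # here k = 1
x_bar = x.mean()  # LRSS estimator of the population mean
```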

Note that if \(k=0\), then the LRSS reduces to the usual RSS. The LRSS was used for estimating the linear regression model by [2].

Let \(X_{1(k+1:n)},\,X_{2(k+1:n)},{\ldots },X_{k(k+1:n)}\), \(X_{k+1(k+1:n)},\,X_{k+2(k+2:n)} ,{\ldots },X_{n-k(n-k:n)}\), \(X_{n-k+1(n-k:n)},\, X_{n-k+2(n-k:n)} ,{\ldots }, X_{n(n-k:n)}\) be a LRSS of size \(n\). The LRSS estimator of the population mean is defined as

$$\begin{aligned} \bar{X}_{LRSS(k,n)} =\frac{1}{n}\left( {\sum \limits _{i=1}^k {X_{i(k+1:n)} } +\sum \limits _{i=k+1}^{n-k} {X_{i(i:n)} } +\sum \limits _{i=n-k+1}^n {X_{i(n-k:n)} } }\right) . \end{aligned}$$

Its variance is given by

$$\begin{aligned} Var( {\bar{X}_{LRSS(k,n)} })=\frac{1}{n^2}\left( {\sum \limits _{i=1}^k {Var( {X_{i(k+1:n)} })} +\sum \limits _{i=k+1}^{n-k} {Var( {X_{i(i:n)} })} +\sum \limits _{i=n-k+1}^n {Var( {X_{i(n-k:n)} })} }\right) . \end{aligned}$$

For more work on RSS-based estimation see, for example, the following: [12] suggested extreme ranked set sampling (ERSS) for estimating a population mean; [6] suggested multistage RSS for estimating the population mean; estimation of the mean based on a modified robust ERSS was considered by [3]; [11] used ERSS and median RSS for estimating the distribution function; [9] considered distribution function estimation using extreme median RSS; and [4] considered ratio-type estimators of the population mean using auxiliary information based on median RSS.

The remainder of this paper is organized as follows: in Sect. 2, estimation of the distribution function is considered using SRS, RSS, and the suggested LRSS. Numerical comparisons are given in Sect. 3. Finally, some conclusions and recommendations are presented in Sect. 4.

2 Estimation of \(\Phi (x)\) using SRS, RSS and LRSS

Let \(\Phi _{SRS} (x)\) be the empirical distribution function of an SRS of size \(nm\), \(X_1,\,X_2,\ldots ,\,X_{nm}\), from \(\Phi (x)\). It is well known that \(\Phi _{SRS} (x)\) is an unbiased and consistent estimator of \(\Phi (x)\) for a given \(x\), with variance \(\text{ Var }\left[ {\Phi _{SRS} (x)} \right] =\frac{1}{nm}\Phi (x)\left[ {1-\Phi (x)} \right] \) (see [7]).

Recently, a new estimator of \(\Phi (x)\) was suggested by [13] using RSS: for fixed \(x\), \(\Phi _{RSS} (x)=\frac{1}{nm}\sum \nolimits _{h=1}^m {\sum \nolimits _{j=1}^n {I( {X_{j(j:n)h} \le x})} } \), where \(I(\cdot )\) is an indicator function. They proved that \(E\left[ {\Phi _{RSS} (x)} \right] =\frac{1}{n}\sum \nolimits _{j=1}^n {\Phi _{(j:n)} (x)} = \Phi (x)\), \(\text{ Var }\left[ {\Phi _{RSS} (x)} \right] =\frac{1}{n^2m}\sum \nolimits _{j=1}^n \Phi _{(j:n)} (x)\left[ {1-\Phi _{(j:n)} (x)} \right] \), and \(\frac{\Phi _{RSS} (x)-E\left[ {\Phi _{RSS} (x)} \right] }{\sqrt{\text{ Var }\left[ {\Phi _{RSS} (x)} \right] } }\buildrel D \over \longrightarrow N(0,1)\) as \(m\rightarrow \infty \) for fixed \(x\) and \(n\).
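
All of these estimators are simply the proportion of quantified units not exceeding \(x\). A short Monte Carlo sketch (our illustration, reusing the hypothetical `rss_sample` from Sect. 1) compares the SRS and RSS variances at \(x=0\) for a standard normal population:

```python
import numpy as np

rng = np.random.default_rng(2)

def ecdf_at(sample, x):
    """Mean of the indicators I(X_i <= x): Phi_SRS(x), Phi_RSS(x) or Phi_LRSS(x),
    depending on the design that produced `sample`."""
    return np.mean(np.asarray(sample) <= x)

n, m, x0 = 5, 20, 0.0
srs = [ecdf_at(rng.normal(size=n * m), x0) for _ in range(2000)]
rss = [ecdf_at(rss_sample(lambda s: rng.normal(size=s), n, m), x0) for _ in range(2000)]
print(np.var(srs), np.var(rss))  # the RSS variance should be visibly smaller
```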

The suggested LRSS estimator of the distribution function \(\Phi (x)\) is defined as

$$\begin{aligned} \Phi _{LRSS} (x)&= \frac{1}{nm}\Bigg [\sum \limits _{h=1}^m {\sum \limits _{j=1}^k {I\left( {X_{j(k+1:n)h} \le x}\right) } }+\sum \limits _{h=1}^m \sum \limits _{j=k+1}^{n-k} I\left( {X_{j(j:n)h} \le x}\right) \nonumber \\&+\sum \limits _{h=1}^m \sum \limits _{j=n-k+1}^n I\left( {X_{j(n-k:n)h} \le x}\right) \Bigg ], \end{aligned}$$
(3)

with variance (given here for a single cycle, \(m=1\); for \(m\) cycles the expression is divided by \(m\))

$$\begin{aligned} \text{ Var }\left[ {\Phi _{LRSS} (x)} \right]&=\frac{k}{n^2}\left\{ {\Phi _{(k+1:n)} (x)\left[ {1-\Phi _{(k+1:n)} (x)} \right] +\Phi _{(n-k:n)} (x)\left[ {1-\Phi _{(n-k:n)} (x)} \right] } \right\} \nonumber \\&\quad +\frac{1}{n^2}\sum \limits _{j=k+1}^{n-k} {\Phi _{(j:n)} (x)\left[ {1-\Phi _{(j:n)} (x)} \right] } , \end{aligned}$$
(4)

where

$$\begin{aligned} \Phi _{(r:n)} (x)=I_{\Phi (x)} ( {r,n-r+1})=\frac{1}{B(a,b)}\int \limits _0^{\Phi (x)} {t^{a-1}( {1-t})^{b-1}dt} ,\quad a=r,\; b=n-r+1, \end{aligned}$$
(5)
$$\begin{aligned} B(a,b)=\int \limits _0^1 {t^{a-1}( {1-t})^{b-1}dt} ,\; a>0,\, b>0,\quad \text{ and }\; B(a,b)=\frac{(a-1)!(b-1)!}{(a+b-1)!} \text{ for integer } a,b. \end{aligned}$$
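
Equation (5) is the regularized incomplete beta function, available directly in scipy. A small sketch (our illustration) checks it against the binomial sum in Eq. (1):

```python
from math import comb
from scipy.special import betainc

def cdf_order_stat(u, r, n):
    """Phi_(r:n)(x) of Eq. (5), evaluated at u = Phi(x): I_u(r, n - r + 1)."""
    return betainc(r, n - r + 1, u)

u, r, n = 0.5, 3, 5
binom_sum = sum(comb(n, i) * u**i * (1 - u)**(n - i) for i in range(r, n + 1))
assert abs(cdf_order_stat(u, r, n) - binom_sum) < 1e-12  # Eq. (5) agrees with Eq. (1)
```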

In ranked set sampling, the set size \(n\) is kept small so that the units can be ranked visually or by some other cheap method; the overall sample size is then increased by repeating the process for \(m\) cycles. Therefore, in the following two propositions the number of cycles is assumed to be large.

Proposition 1

$$\begin{aligned} E\left[ {\Phi _{LRSS} (x)} \right]&= \frac{k}{n}\left[ {\Phi _{(k+1:n)} (x)+\Phi _{(n-k:n)} (x)} \right] +\frac{1}{n}\sum \limits _{j=k+1}^{n-k} {\Phi _{(j:n)} (x)} , \nonumber \\&= \frac{k}{n}\left[ {\Phi _{(k+1:n)} (x)\!+\!\Phi _{(n-k:n)} (x)} \right] \!-\!\frac{1}{n}\left[ {\sum \limits _{j=1}^k {\Phi _{(j:n)} (x)} +\sum \limits _{j=n-k+1}^n {\Phi _{(j:n)} (x)} } \right] \nonumber \\&+\Phi (x) \end{aligned}$$
(6)

with bias

$$\begin{aligned} Bias\left[ {\Phi _{LRSS} (x)} \right]&=E\left[ {\Phi _{LRSS} (x)} \right] -\Phi (x) \nonumber \\&=\frac{k}{n}\left[ {\Phi _{(k+1:n)} (x)+\Phi _{(n-k:n)} (x)} \right] \nonumber \\&\quad -\frac{1}{n}\left[ {\sum \limits _{j=1}^k {\Phi _{(j:n)} (x)} +\sum \limits _{j=n-k+1}^n {\Phi _{(j:n)} (x)} } \right] . \end{aligned}$$
(7)
Table 1 The efficiency and bias values of \(\Phi _{LRSS} (x)\) with respect to \(\Phi _{SRS} (x)\) for \(n=4,5\)
Table 2 The efficiency and bias values of \(\Phi _{LRSS} (x)\) with respect to \(\Phi _{SRS} (x)\) for \(n=6,7\)
Table 3 The efficiency and bias values of \(\Phi _{LRSS} (x)\) with respect to \(\Phi _{SRS} (x)\) for \(n=8\)
Table 4 The efficiency and bias values of \(\Phi _{LRSS} (x)\) with respect to \(\Phi _{SRS} (x)\) for \(n=9,10\)
Table 5 The efficiency of \(\Phi _{RSS} (x)\) with respect to \(\Phi _{SRS} (x)\) for \(n=4,5,...,10\)
Table 6 The best values of \(k\) for each sample size which give the large efficiency using LRSS with respect to SRS
Table 7 The best values of the sample size for each \(\Phi (x)\) using LRSS

Proof

To prove (6), take the expectation of (3):

$$\begin{aligned} E\left[ {\Phi _{LRSS} (x)} \right]&= \frac{1}{nm}E\Bigg [ \sum \limits _{h=1}^m {\sum \limits _{j=1}^k {I\left( {X_{j(k+1:n)h} \le x}\right) } } +\sum \limits _{h=1}^m \sum \limits _{j=k+1}^{n-k} I\left( {X_{j(j:n)h} \le x}\right) \\&+\sum \limits _{h=1}^m {\sum \limits _{j=n-k+1}^n {I\left( {X_{j(n-k:n)h} \le x}\right) } } \Bigg ]\\&= \frac{1}{n}\left\{ {k\left[ {\Phi _{(k+1:n)} (x)+\Phi _{(n-k:n)} (x)} \right] +\frac{1}{m}E\left[ {\sum \limits _{h=1}^m {\sum \limits _{j=k+1}^{n-k} {I\left( {X_{j(j:n)h} \le x}\right) } } } \right] } \right\} \\&= \frac{1}{n}\left\{ {k\left[ {\Phi _{(k+1:n)} (x)+\Phi _{(n-k:n)} (x)} \right] +\sum \limits _{j=k+1}^{n-k} {\Phi _{(j:n)} (x)} } \right\} \\&= \frac{1}{n}\Bigg \{ k\left[ {\Phi _{(k+1:n)} (x)+\Phi _{(n-k:n)} (x)} \right] +\sum \limits _{j=1}^n {\Phi _{(j:n)} (x)} \\&-\sum \limits _{j=1}^k {\Phi _{(j:n)} (x)} -\sum \limits _{j=n-k+1}^n {\Phi _{(j:n)} (x)} \Bigg \} \\&= \frac{k}{n}\left[ {\Phi _{(k+1:n)} (x)+\Phi _{(n-k:n)} (x)} \right] -\frac{1}{n}\left[ {\sum \limits _{j=1}^k {\Phi _{(j:n)} (x)} +\sum \limits _{j=n-k+1}^n {\Phi _{(j:n)} (x)} } \right] \\&+\,\Phi (x), \end{aligned}$$

where the last equality uses the identity \(\frac{1}{n}\sum \nolimits _{j=1}^n {\Phi _{(j:n)} (x)} =\Phi (x)\).

The proof of (7) follows directly from the definition \(Bias\left[ {\Phi _{LRSS} (x)} \right] =E\left[ {\Phi _{LRSS} (x)} \right] -\Phi (x)\).

Note that for fixed \(k\), the bias approaches zero as \(n\rightarrow \infty \), and it does not depend on \(m\). \(\square \)
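
The following sketch (our illustration, reusing the hypothetical `cdf_order_stat` from the sketch above) evaluates the exact bias in Eq. (7) and shows it shrinking in \(n\) for fixed \(k\):

```python
def bias_lrss(u, n, k):
    """Bias of Phi_LRSS(x) from Eq. (7), evaluated at u = Phi(x)."""
    lead = (k / n) * (cdf_order_stat(u, k + 1, n) + cdf_order_stat(u, n - k, n))
    tails = sum(cdf_order_stat(u, j, n) for j in range(1, k + 1))
    tails += sum(cdf_order_stat(u, j, n) for j in range(n - k + 1, n + 1))
    return lead - tails / n

for n in (5, 10, 20, 40):
    print(n, bias_lrss(0.3, n, k=1))  # |bias| decreases as n grows, for fixed k
```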

Proposition 2

For fixed x and n, when \(m\!\rightarrow \! \infty \), the random variable \(Z\!=\!\frac{\Phi _{LRSS} (x)\!-\!E\left[ {\Phi _{LRSS} (x)} \right] }{\sqrt{\text{ Var }\left[ {\Phi _{LRSS} (x)} \right] } }\) converges in distribution to the standard normal distribution.


Proof

Suppose that

$$\begin{aligned} \Psi _w&= \frac{\sum \nolimits _{j=1}^k {I( {X_{j(k+1:n)w} \le x})} }{n}+\frac{\sum \nolimits _{j=k+1}^{n-k} {I( {X_{j(j:n)w} \le x})} }{n}\\&+\frac{\sum \nolimits _{j=n-k+1}^n {I( {X_{j(n-k:n)w} \le x})} }{n}\quad \text{ for } w=1,2,\ldots ,m. \end{aligned}$$

Then,

$$\begin{aligned} E( {\Psi _w })&= \frac{k}{n}\left[ {\Phi _{(k+1:n)} (x)+\Phi _{(n-k:n)} (x)} \right] -\frac{1}{n}\left[ {\sum \limits _{j=1}^k {\Phi _{(j:n)} (x)} +\sum \limits _{j=n-k+1}^n {\Phi _{(j:n)} (x)} } \right] \\&+\Phi (x)<\infty , \end{aligned}$$
$$\begin{aligned} V( {\Psi _w })&= \frac{k}{n^2}\left\{ {\Phi _{(k+1:n)} (x)\left[ {1-\Phi _{(k+1:n)} (x)} \right] +\Phi _{(n-k:n)} (x)\left[ {1-\Phi _{(n-k:n)} (x)} \right] } \right\} \\&+\frac{1}{n^2}\sum \limits _{j=k+1}^{n-k} {\Phi _{(j:n)} (x)\left[ {1-\Phi _{(j:n)} (x)} \right] } <\infty , \\ \end{aligned}$$

where \(P( {X_{(j:n)} \le x})=\sum \nolimits _{i=j}^n {\binom{n}{i}\Phi ^i(x)( {1-\Phi (x)})^{n-i}} \), as in (1).

Note that \(\Phi _{LRSS} (x)=\bar{\Psi }=\frac{1}{m}\sum \nolimits _{w=1}^m {\Psi _w } \), so \(\text{ Var }\left[ {\Phi _{LRSS} (x)} \right] =V( {\Psi _w })/m\). Since the random variables \(\Psi _w \) are independent and identically distributed (iid), the central limit theorem (CLT) gives that \(\frac{\sqrt{m} \left[ {\bar{\Psi } -E( {\Psi _w })} \right] }{\sqrt{\text{ Var }( {\Psi _w })} }\) converges in distribution to \(N(0,1)\). \(\square \)
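
A Monte Carlo sketch of this result (our illustration, reusing the hypothetical `lrss_sample` and `ecdf_at` from the earlier sketches):

```python
import numpy as np

rng = np.random.default_rng(3)

# Replicate Phi_LRSS(0) many times for a N(0,1) population and standardize
reps = np.array([ecdf_at(lrss_sample(lambda s: rng.normal(size=s), n=6, m=100), 0.0)
                 for _ in range(1000)])
z = (reps - reps.mean()) / reps.std()
print(np.mean(np.abs(z) < 1.96))  # should be close to 0.95 under approximate normality
```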

3 Numerical Comparisons

In this section, a simulation study is conducted to investigate the efficiency of the LRSS method in estimating the distribution function \(\Phi (x)\). The suggested estimator is compared with its RSS and SRS competitors for samples of sizes \(n=4,5,6,...,10\). The efficiencies of \(\Phi _{RSS} (x)\) and \(\Phi _{LRSS} (x)\) with respect to \(\Phi _{SRS} (x)\) are defined, respectively, as

$$\begin{aligned} Eff\left[ {\Phi _{RSS} (x),\Phi _{SRS} (x)} \right] =\frac{\text{ Var }\left[ {\Phi _{SRS} (x)} \right] }{\text{ Var }\left[ {\Phi _{RSS} (x)} \right] }=\frac{n\Phi (x)\left[ {1-\Phi (x)} \right] }{\sum \nolimits _{j=1}^n {\Phi _{(j:n)} (x)\left[ {1-\Phi _{(j:n)} (x)} \right] } }, \end{aligned}$$
(8)

and

$$\begin{aligned}&Eff\left[ {\Phi _{LRSS} (x),\Phi _{SRS} (x)} \right] =\frac{\text{ Var }\left[ {\Phi _{SRS} (x)} \right] }{\text{ MSE }\left[ {\Phi _{LRSS} (x)} \right] } \nonumber \\&\quad =\frac{n\Phi (x)\left[ {1-\Phi (x)} \right] }{k\left\{ {\Phi _{(k+1:n)} (x)\left[ {1-\Phi _{(k+1:n)} (x)} \right] +\Phi _{(n-k:n)} (x)\left[ {1-\Phi _{(n-k:n)} (x)} \right] } \right\} +\sum \limits _{j=k+1}^{n-k} {\Phi _{(j:n)} (x)\left[ {1-\Phi _{(j:n)} (x)} \right] }+n^2\left[ {Bias}\left( {\Phi _{LRSS} (x)}\right) \right] ^2 }.\nonumber \\ \end{aligned}$$
(9)
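
A sketch for evaluating (8) and (9) (our illustration, reusing the hypothetical `cdf_order_stat` and `bias_lrss` from Sect. 2, and taking a single cycle, \(m=1\)):

```python
def eff_rss(u, n):
    """Eq. (8): efficiency of Phi_RSS(x) relative to Phi_SRS(x), at u = Phi(x)."""
    s = sum(cdf_order_stat(u, j, n) * (1 - cdf_order_stat(u, j, n))
            for j in range(1, n + 1))
    return n * u * (1 - u) / s

def eff_lrss(u, n, k):
    """Eq. (9): efficiency of Phi_LRSS(x) relative to Phi_SRS(x), at u = Phi(x)."""
    p1, p2 = cdf_order_stat(u, k + 1, n), cdf_order_stat(u, n - k, n)
    s = k * (p1 * (1 - p1) + p2 * (1 - p2))
    s += sum(cdf_order_stat(u, j, n) * (1 - cdf_order_stat(u, j, n))
             for j in range(k + 1, n - k + 1))
    return n * u * (1 - u) / (s + n**2 * bias_lrss(u, n, k) ** 2)

print(eff_rss(0.5, 8), eff_lrss(0.5, 8, k=2))  # LRSS vs. RSS gains at the median
```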

The bold entries in Tables 1, 2, 3 and 4 indicate the best efficiency values of LRSS relative to SRS, for fixed \(k\), in estimating the distribution function \(\Phi (x)\). It can be seen that the efficiency values are largest as \(\Phi (x)\) approaches 0 (i.e., 0.01) or 1 (i.e., 0.99). For example, for \(n=10\) and \(k=2\), the efficiency values are 9.4327 and 9.5039 for \(\Phi (x)=0.01\) and 0.99, respectively. For the intermediate values \(0.10\le \Phi (x)\le 0.90\), the efficiency increases as \(\Phi (x)\) approaches 0.5.

For the RSS method, as shown in Table 5, [13] showed that \(\Phi _{RSS} (x)\) is more efficient than \(\Phi _{SRS} (x)\). However, \(\Phi _{LRSS} (x)\) is more efficient than \(\Phi _{RSS} (x)\) for most of the cases considered in this study. For example, when \(n=8\) and \(\Phi (x)=0.99\), the efficiency values of RSS and of LRSS with \(k=2\) are 1.0523 and 12.3858, respectively.

From Table 6, the value of \(k\) that yields the best efficiency can be selected for each sample size.

Table 7 reports the suitable values of the sample size for each value of \(\Phi (x)\). The best sample sizes are \(n=5\) for \(\Phi (x)=0.01,\,0.99\); \(n=4\) for \(\Phi (x)=0.10,\,0.90\); and \(n=10\) for \(0.2\le \Phi (x)\le 0.8\).

4 Conclusion

A new estimator of the distribution function is suggested using the L ranked set sampling method. The suggested estimator is compared with the SRS and RSS estimators of the distribution function. It is found that the LRSS estimator is biased but more efficient than both the SRS and RSS estimators for most of the cases considered in this study. It is therefore recommended to use LRSS for estimating the distribution function of a continuous random variable.