1 Introduction

Ranked set sampling (RSS) is a data collection method introduced by McIntyre (1952) in an agricultural context involving pasture yields. It serves as an alternative to the usual simple random sampling (SRS) in situations where exact measurements of sample units are difficult or expensive to obtain but judgment ranking of them according to the variable of interest is relatively easy and cheap. The judgment ranking is usually performed visually (by a field expert, say) or using one or more concomitant variables, but it must not require actual measurements on the selected units.

The RSS design can be explained as follows:

  1. Draw k random samples, each of size k, from the target population.

  2. Apply judgment ordering, by any cheap method that does not involve actual measurement of the variable of interest, to the elements of the rth (\(r=1, \ldots ,k\)) sample and identify the \(r\hbox {th}\) smallest unit.

  3. Measure the variable of interest on the k units identified in step 2.

  4. Repeat steps 1–3 m times (cycles), if needed, to obtain a ranked set sample of size mk.
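The four steps can be sketched in Python as below. This is a simulation-oriented sketch only: it assumes perfect ranking (units are ordered by their true values, which is possible only in a simulation) and a hypothetical user-supplied sampler `draw(size, rng)`; the name `draw_rss` is illustrative.

```python
import numpy as np


def draw_rss(draw, k, m, rng):
    """Draw a ranked set sample with set size k and m cycles.

    draw(size, rng) is a hypothetical sampler for the target population.
    Ranking is perfect here (by the true values), which is only possible in
    a simulation; in practice it is done by judgment.
    Returns an (m, k) array whose [i, r] entry is X_[r+1](i+1).
    """
    sample = np.empty((m, k))
    for i in range(m):                        # step 4: cycles
        for r in range(k):                    # one set for each rank r
            units = draw(k, rng)              # step 1: a set of k units
            sample[i, r] = np.sort(units)[r]  # steps 2-3: record only the (r+1)th smallest
    return sample


rng = np.random.default_rng(2024)
x = draw_rss(lambda size, g: g.normal(size=size), k=3, m=4, rng=rng)
print(x.shape)  # (4, 3): m = 4 cycles, k = 3 ranks
```

Storing the sample as an (m, k) array (rows are cycles, columns are judgment ranks) keeps the \(X_{[r]i}\) indexing explicit, which the variance estimators later in the paper rely on.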

Let \(X_{[r]i}\) be the \(r\hbox {th}\) judgment order statistic from the ith cycle. This is standard notation in the RSS literature (see Chapter 1 in Chen et al. 2004, for example). The resulting ranked set sample is then denoted by \(\{X_{[r]i}: r=1,\ldots ,k\,;i=1,\ldots ,m \}\). The design parameter k is called the set size. To facilitate the judgment ranking, the set size should be kept small in practice, say 2–8. Nonetheless, larger set sizes can be used as long as the ranking process is not hampered.

A ranked set sample comprising m cycles with set size k exploits information about far more units than a simple random sample of size mk: a total of \(mk^2\) units are drawn, of which only mk are measured. In the RSS, the judgment ranking information about the remaining \(mk(k-1)\) unmeasured units contributes to drawing a more representative sample, whereas the SRS has no mechanism for incorporating judgment ranking information. Thus, RSS-based procedures are usually more efficient than their SRS competitors, and the extent of improvement hinges on the accuracy of the judgment ranking. The RSS has been applied in a variety of fields, including forestry (Halls and Dell 1966), entomology (Howard et al. 1982), environmental monitoring (Kvam 2003), clinical trials and genetic quantitative trait loci mapping (Chen 2007), segmentation of terahertz images (Ayech and Ziou 2015), and medicine (Zamanzade and Mahdizadeh 2017).

In reliability theory, the probability \(\theta =P(X<Y)\) represents the reliability of a stress–strength model, where X and Y represent the stress and strength variables, respectively. This probability also quantifies the steady-state availability of a repairable system, with X and Y denoting the repair time and the lifetime of the system, respectively. In fact, \(\theta \) provides a general measure of the difference between two populations, which has found applications in diverse areas (Kotz et al. 2003). For example, in economics it is a measure of household financial fragility when X and Y are disposable household income and consumption, respectively. In medicine, it is interpreted as a measure of a treatment's effectiveness if X and Y are the response variables from the control and treatment groups, respectively. The latter situation is illustrated using a real data set in Sect. 5.

It is well known that a point estimate generally differs from the true parameter value, say \(\vartheta \). Moreover, it does not by itself convey any measure of its reliability. Interval estimation is another type of estimation, which conveys more information about the data used to obtain the point estimate: it provides a range of plausible values for \(\vartheta \) together with a stated degree of confidence that the range covers \(\vartheta \). Interval estimators are called confidence intervals (CIs). Let L and U be two statistics such that \(P\left( L< \vartheta <U\right) =1-\alpha \), for some \(\alpha \in (0,1)\). Then, (L, U) is a CI for \(\vartheta \) with coverage probability (confidence level) \(1-\alpha \). The width of the CI reflects the amount of variability inherent in the point estimate. A good interval should be relatively narrow on average, with a high probability of enclosing the true parameter.

This article deals with constructing CIs for \(\theta \) in the RSS design. It is worth noting that point estimation of different population attributes has been comprehensively studied in the RSS literature, while hypothesis testing and interval estimation problems have received little attention (see Chen et al. 2004; Wolfe 2012; Chapter 15 in Hollander et al. 2014 for a good review of the RSS and its applications). Yin et al. (2016) proposed a CI for \(\theta \) based on kernel density estimation. A simpler approach, based on the empirical distribution function, has not been investigated yet, and we set out to fill this gap in this work. It emerges that the resulting intervals have an edge over the existing one.

In Sect. 2, our point estimator is introduced and its theoretical properties are studied; several estimators of its variance are also presented. In Sect. 3, six types of intervals are developed. Section 4 contains the results of Monte Carlo simulations assessing the performance of the suggested intervals in terms of coverage probability and expected length. An agricultural data set is analyzed in Sect. 5. Final conclusions appear in Sect. 6. Proofs are deferred to an appendix.

2 Nonparametric estimation

Let \(\{X_{[r]i}: r=1,\ldots ,k\,;i=1,\ldots ,m \}\) and \(\{Y_{[s]j}: s=1,\ldots ,\ell \,;j=1,\ldots ,n \}\) be independent ranked set samples from two populations with the distribution functions F and G, respectively. Also, the survival function associated with G is denoted by \({\bar{G}}\). The standard estimator of \(\theta \) is given by

$$\begin{aligned} {{\hat{\theta }}}_{\text {RSS}}=\frac{1}{mk n \ell } \sum _{i=1}^m \sum _{j=1}^n \sum _{r=1}^k \sum _{s=1}^\ell I(X_{[r]i}<Y_{[s]j}), \end{aligned}$$
(1)

where \(I\left( .\right) \) is the indicator function.
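Because (1) is a Mann-Whitney-type average of the indicator over all \(mk \times n\ell \) pairs, it can be computed in a single vectorized comparison. A minimal NumPy sketch, assuming the X and Y samples are stored as arrays of shape (m, k) and (n, \(\ell \)) with rows as cycles and columns as judgment ranks; `theta_hat` is a hypothetical name:

```python
import numpy as np


def theta_hat(x, y):
    """Estimator (1): the proportion of all (X, Y) pairs with X < Y.

    x is an (m, k) array of X_[r]i and y an (n, l) array of Y_[s]j.
    Because (1) averages I(X_[r]i < Y_[s]j) over every r, i, s, j,
    only the pooled values matter for the point estimate.
    """
    xv = np.asarray(x, dtype=float).ravel()
    yv = np.asarray(y, dtype=float).ravel()
    return float(np.mean(xv[:, None] < yv[None, :]))


# Pairs (1, 2), (1, 4) and (3, 4) satisfy X < Y; (3, 2) does not.
print(theta_hat([[1.0, 3.0]], [[2.0, 4.0]]))  # 0.75
```

The rank structure does matter for the variance of \({\hat{\theta }}_{\text {RSS}}\), which is why the array layout is kept rather than passing flat vectors.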

The properties of \({\hat{\theta }}_{\text {RSS}}\) in the special case of \(m=n=1\) were investigated by Sengupta and Mukhuti (2008). They showed that this estimator is unbiased and more efficient than its SRS counterpart, even in the presence of ranking errors. The following result establishes the asymptotic normality of \({\hat{\theta }}_{\text {RSS}}\).

Proposition 1

Let \({\hat{\theta }}_{\text {RSS}}\) be as in (1), and \(N=mk+n\ell \). If \(m,n \rightarrow \infty \) and \((mk)/N \rightarrow \lambda \in (0,1)\), then

$$\begin{aligned} \sqrt{N}({\hat{\theta }}_{\text {RSS}}-\theta ) {\mathop {\rightarrow }\limits ^{d}} N\left( 0,\frac{\sigma _1^2}{\lambda }+\frac{\sigma _2^2}{1-\lambda }\right) , \end{aligned}$$

where

$$\begin{aligned} \sigma _1^2=Var\left( {\bar{G}}(X)\right) -\frac{1}{k} \sum _{r=1}^k \left[ E\left( {\bar{G}}(X_{[r]})\right) -\theta \right] ^2, \end{aligned}$$

and

$$\begin{aligned} \sigma _2^2=Var\left( F(Y)\right) -\frac{1}{\ell } \sum _{s=1}^\ell \left[ E\left( F(Y_{[s]})\right) -\theta \right] ^2. \end{aligned}$$

Suppose \({\hat{\theta }}_{\text {SRS}}\) is the counterpart of (1) based on two independent simple random samples of sizes mk and \(n\ell \) from F and G, respectively. By virtue of the next result, \({\hat{\theta }}_{\text {RSS}}\) is asymptotically at least as efficient as \({\hat{\theta }}_{\text {SRS}}\), because the fraction added to 1 in the expression below is nonnegative. This statement is valid regardless of the accuracy of the judgment ranking process.

Proposition 2

The asymptotic relative efficiency of \({\hat{\theta }}_{\text {RSS}}\) to \({\hat{\theta }}_{\text {SRS}}\) is

$$\begin{aligned} ARE({\hat{\theta }}_{\text {RSS}},{\hat{\theta }}_{\text {SRS}})=1+\frac{\frac{1-\lambda }{k}\sum _{r=1}^k \left[ E\left( {\bar{G}}(X_{[r]})\right) -\theta \right] ^2 + \frac{\lambda }{\ell }\sum _{s=1}^\ell \left[ E\left( F(Y_{[s]})\right) -\theta \right] ^2 }{\frac{1-\lambda }{k}\sum _{r=1}^k Var\left( {\bar{G}}(X_{[r]})\right) + \frac{\lambda }{\ell }\sum _{s=1}^\ell Var\left( F(Y_{[s]})\right) }. \end{aligned}$$
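To see Proposition 2 at work, the ratio can be evaluated exactly in a special case chosen here purely for illustration (it is not one of the paper's settings): X and Y iid U(0,1) with perfect rankings. Then \(\theta =1/2\), \({\bar{G}}(X_{[r]})=1-X_{[r]}\), \(F(Y_{[s]})=Y_{[s]}\), and \(X_{[r]}\sim \) Beta\((r,k-r+1)\), so every term is an elementary moment. When \(k=\ell \), the numerator and denominator terms coincide for both samples and the ratio simplifies to \((k+1)/2\) for every \(\lambda \). A sketch with exact rational arithmetic (function names are hypothetical):

```python
from fractions import Fraction


def uniform_rank_moments(k):
    """Means and variances of X_[r] ~ Beta(r, k - r + 1), r = 1..k (perfect ranking, U(0,1))."""
    means = [Fraction(r, k + 1) for r in range(1, k + 1)]
    variances = [Fraction(r * (k - r + 1), (k + 1) ** 2 * (k + 2)) for r in range(1, k + 1)]
    return means, variances


def are_uniform(k, ell, lam):
    """ARE of Proposition 2 when X and Y are iid U(0,1) and rankings are perfect."""
    lam = Fraction(lam)
    theta = Fraction(1, 2)
    mx, vx = uniform_rank_moments(k)
    my, vy = uniform_rank_moments(ell)
    # E(Gbar(X_[r])) = 1 - E(X_[r]) and Var(Gbar(X_[r])) = Var(X_[r])
    dx = sum((1 - mu - theta) ** 2 for mu in mx) / k
    dy = sum((mu - theta) ** 2 for mu in my) / ell
    sx = sum(vx) / k
    sy = sum(vy) / ell
    return 1 + ((1 - lam) * dx + lam * dy) / ((1 - lam) * sx + lam * sy)


for k in (1, 2, 5):
    print(k, are_uniform(k, k, Fraction(1, 2)))  # ARE values 1, 3/2 and 3
```

Set size one (the SRS case) gives ARE = 1, as it should, since all rank means then equal \(\theta \).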

In light of the above result, interval estimation of \(\theta \) based on the RSS is expected to be more efficient than that based on the SRS. An estimate of \(Var({\hat{\theta }}_{\text {RSS}})\) is needed to propose an interval based on Proposition 1. To the best of our knowledge, this has not been studied yet. In the sequel, we introduce an estimator for \(\sigma _1^2\). Similar arguments yield an estimate of \(\sigma _2^2\). These are combined to arrive at the final estimator.

Let \(\{X_{[r]i}: r=1,\ldots ,k;\, i=1,\ldots ,m\}\) be a ranked set sample from a population with finite mean \(\mu \) and variance \(\sigma ^2\). If \(\mu _{[r]}\) and \(\sigma ^2_{[r]}\) denote the mean and variance of \(X_{[r]1}\), respectively, then Stokes (1980) showed that

$$\begin{aligned} \sigma ^2=\frac{1}{k} \sum _{r=1}^k \sigma ^2_{[r]}+\frac{1}{k} \sum _{r=1}^k \left( \mu _{[r]}-\mu \right) ^2. \end{aligned}$$
(2)
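Identity (2) can be checked exactly in a simple case chosen here only for illustration: a U(0,1) parent (\(\mu =1/2\), \(\sigma ^2=1/12\)) with perfect ranking, where \(X_{[r]1}\sim \) Beta\((r,k-r+1)\). The sketch below (with the hypothetical helper `stokes_terms`) computes the within-rank and between-rank parts with exact fractions; the within-rank part shrinks as k grows while the total stays at 1/12.

```python
from fractions import Fraction


def stokes_terms(k):
    """Within-rank and between-rank parts of (2) for a U(0,1) parent, perfect ranking."""
    mu = Fraction(1, 2)
    means = [Fraction(r, k + 1) for r in range(1, k + 1)]
    variances = [Fraction(r * (k - r + 1), (k + 1) ** 2 * (k + 2)) for r in range(1, k + 1)]
    within = sum(variances) / k                           # (1/k) sum sigma^2_[r]
    between = sum((m_r - mu) ** 2 for m_r in means) / k   # (1/k) sum (mu_[r] - mu)^2
    return within, between


for k in (2, 5):
    w, b = stokes_terms(k)
    print(k, w, b, w + b == Fraction(1, 12))
# prints: 2 1/18 1/36 True, then 5 1/36 1/18 True
```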

Suppose the random variable W is defined as \(W={\bar{G}}(X)\). Then using (2), we get

$$\begin{aligned} Var(W)=\frac{1}{k} \sum _{r=1}^k Var(W_{[r]})+\frac{1}{k} \sum _{r=1}^k \left[ E(W_{[r]})-E(W) \right] ^2, \end{aligned}$$

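where \(W_{[r]}={\bar{G}}(X_{[r]1})\). Since \(E(W)=\theta \), the second sum on the right-hand side is exactly the term subtracted from \(Var(W)\) in the definition of \(\sigma _1^2\) in Proposition 1, and hence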

$$\begin{aligned} \sigma _1^2=\frac{1}{k} \sum _{r=1}^k Var(W_{[r]}). \end{aligned}$$

Now, from Equation 3 in MacEachern et al. (2002), one can construct an estimator for \(\sigma _1^2\) as

$$\begin{aligned} \hat{\sigma }_1^2=\frac{1}{2k m(m-1)} \sum _{r=1}^k \sum _{i=1}^m \sum _{i'=1}^m \left( \mathcal {W}_{[r]i}-\mathcal {W}_{[r]i'} \right) ^2, \end{aligned}$$
(3)

where

$$\begin{aligned} \mathcal {W}_{[r]i}=\frac{1}{n \ell } \sum _{s=1}^\ell \sum _{j=1}^n I(X_{[r]i}<Y_{[s]j}). \end{aligned}$$

Similarly, an estimator of \(\sigma _2^2\) is obtained as

$$\begin{aligned} \hat{\sigma }_2^2=\frac{1}{2\ell n(n-1)} \sum _{s=1}^\ell \sum _{j=1}^n \sum _{j'=1}^n \left( \mathcal {Z}_{[s]j}-\mathcal {Z}_{[s]j'} \right) ^2, \end{aligned}$$
(4)

where

$$\begin{aligned} \mathcal {Z}_{[s]j}=\frac{1}{m k} \sum _{r=1}^k \sum _{i=1}^m I(X_{[r]i}<Y_{[s]j}). \end{aligned}$$

Combining (3) and (4), we conclude that

$$\begin{aligned} \widehat{Var}({\hat{\theta }}_{\text {RSS}})=\frac{1}{N}\left( \frac{\hat{\sigma }_1^2}{\hat{\lambda }}+\frac{\hat{\sigma }_2^2}{1-\hat{\lambda }}\right) , \end{aligned}$$
(5)

where \(\hat{\lambda }=(mk)/N\). The above estimator is expected to work well for moderate to large values of m and n, but it may perform poorly when they are small. In the sequel, three alternatives are suggested.
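A minimal NumPy sketch of (3), (4) and (5), assuming the (m, k) and (n, \(\ell \)) array layout with rows as cycles and \(m,n\ge 2\); the function names are hypothetical. It uses the fact that the double sum over i and i' in (3) equals 2m times the sum of squared deviations from the within-rank mean, so \(\hat{\sigma }_1^2\) is the average over r of the sample variances (divisor \(m-1\)) of \(\mathcal {W}_{[r]1},\ldots ,\mathcal {W}_{[r]m}\); the same holds for (4). Since \(\hat{\lambda }=mk/N\), (5) reduces to \(\hat{\sigma }_1^2/(mk)+\hat{\sigma }_2^2/(n\ell )\). The usage lines compare the shortcut with the literal double sum on placeholder arrays that are not genuine ranked set samples.

```python
import numpy as np


def rank_summaries(x, y):
    """Return W[i, r] = W_[r]i (shape (m, k)) and Z[j, s] = Z_[s]j (shape (n, l))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    less = x[:, :, None, None] < y[None, None, :, :]  # I(X_[r]i < Y_[s]j), shape (m, k, n, l)
    return less.mean(axis=(2, 3)), less.mean(axis=(0, 1))


def sigma2_hat(A):
    """(3) or (4): average over ranks of the within-rank sample variances (divisor cycles - 1)."""
    return float(np.var(A, axis=0, ddof=1).mean())


def var_hat(x, y):
    """Plug-in variance estimate (5) of theta_hat."""
    W, Z = rank_summaries(x, y)
    (m, k), (n, l) = W.shape, Z.shape
    N = m * k + n * l
    lam = m * k / N
    return (sigma2_hat(W) / lam + sigma2_hat(Z) / (1 - lam)) / N


rng = np.random.default_rng(7)
x = rng.normal(size=(5, 2))             # placeholder: m = 5 cycles, k = 2
y = rng.normal(0.5, 1.0, size=(6, 3))   # placeholder: n = 6 cycles, l = 3
W, Z = rank_summaries(x, y)
m, k = W.shape
literal = sum(((W[:, r][:, None] - W[:, r][None, :]) ** 2).sum()
              for r in range(k)) / (2 * k * m * (m - 1))
print(np.isclose(literal, sigma2_hat(W)), var_hat(x, y))
```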

The jackknife methodology has been proposed to serve two purposes, namely, to reduce a possible bias of an estimator, and to yield an approximation for its variance (see Quenouille 1956; Tukey 1958). Let \({\hat{\theta }}(X_1,\ldots ,X_n)\) be a statistic of interest, where \(X_ i\)’s are iid random variables, and \({\hat{\theta }}\) is invariant under permutation of the arguments. If \({\hat{\theta }}^{(i)}\) denotes the value of \({\hat{\theta }}\) based on \(X_1,\ldots ,X_{i-1},X_{i+1},\ldots ,X_n\), then the jackknife estimate of \(Var({\hat{\theta }})\) is given by

$$\begin{aligned} \widehat{Var}({\hat{\theta }})=\frac{n-1}{n}\sum _{i=1}^n \left( {\hat{\theta }}^{(i)}-\hat{\theta }^{(0)}\right) ^2, \end{aligned}$$

where \({\hat{\theta }}^{(0)}=\sum _{i=1}^n {\hat{\theta }}^{(i)}/n\).
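The iid jackknife described above can be sketched in a few lines. This is a generic illustration; `jackknife_var` is a hypothetical name. For the sample mean, the jackknife variance equals \(s^2/n\) exactly, which the usage line reproduces (here \(s^2=7\) and \(n=4\)).

```python
import numpy as np


def jackknife_var(data, stat):
    """Jackknife variance estimate of stat(data) for iid observations."""
    data = list(data)
    n = len(data)
    loo = np.array([stat(data[:i] + data[i + 1:]) for i in range(n)])  # theta^(i)
    return (n - 1) / n * float(np.sum((loo - loo.mean()) ** 2))      # loo.mean() is theta^(0)


x = [1.0, 2.0, 4.0, 7.0]
print(jackknife_var(x, np.mean), np.var(x, ddof=1) / len(x))  # both 1.75 up to rounding
```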

A ranked set sample consists of independent but not identically distributed random variables. Therefore, the above technique must be adapted to estimate \(Var({\hat{\theta }}_{\text {RSS}})\). The first method is to treat the data as \(m+n\) independent random vectors \({\mathbf {X}}_1,\ldots ,{\mathbf {X}}_m,\mathbf {Y}_1,\ldots ,\mathbf {Y}_n\) (the \({\mathbf {X}}_i\)'s being iid, as are the \(\mathbf {Y}_j\)'s), where \({\mathbf {X}}_i=(X_{[1]i},\ldots ,X_{[k]i})\) (\(i=1,\ldots ,m\)) and \(\mathbf {Y}_j=(Y_{[1]j},\ldots ,Y_{[\ell ]j})\) (\(j=1,\ldots ,n\)). That is, \({\mathbf {X}}_i\) (\(\mathbf {Y}_j\)) contains the elements of the X (Y) sample drawn in the ith (jth) cycle. Suppose \(\tilde{\theta }_{\text {RSS}}^{(t)}\) is the value of the reliability estimator when \(\mathbf {Z}_t\), \(t=1,\ldots ,m+n\), is omitted from the data, where

$$\begin{aligned} \mathbf {Z}_t = \left\{ \begin{array}{ll} {\mathbf {X}}_t &{} t=1,\ldots ,m\\ \mathbf {Y}_{t-m} &{} t=m+1,\ldots ,m+n \end{array} \right. . \end{aligned}$$

Now, the jackknife estimate of the variance is

$$\begin{aligned} {\widetilde{Var}}_1({\hat{\theta }}_{\text {RSS}})=\frac{m+n-1}{m+n} \sum _{t=1}^{m+n} \left( \tilde{\theta }_{\text {RSS}}^{(t)}-\tilde{\theta }^{(0)}\right) ^2, \end{aligned}$$
(6)

where \(\tilde{\theta }^{(0)}=\sum _{t=1}^{m+n} \tilde{\theta }_{\text {RSS}}^{(t)}/(m+n)\).
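A sketch of (6), assuming the (m, k) and (n, \(\ell \)) array layout with one row per cycle, so that deleting \(\mathbf {Z}_t\) means deleting one row of the X array (\(t\le m\)) or of the Y array (\(t>m\)); function names are hypothetical. If every X value lies below every Y value, all deleted-cycle estimates equal 1 and the estimate is exactly zero.

```python
import numpy as np


def theta_hat(x, y):
    """Estimator (1) for (m, k) and (n, l) arrays."""
    return float(np.mean(np.ravel(x)[:, None] < np.ravel(y)[None, :]))


def jackknife_var_j1(x, y):
    """Estimator (6): delete one cycle (row) at a time from either sample."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = x.shape[0], y.shape[0]
    reps = [theta_hat(np.delete(x, t, axis=0), y) for t in range(m)]   # Z_t = X_t
    reps += [theta_hat(x, np.delete(y, t, axis=0)) for t in range(n)]  # Z_t = Y_(t-m)
    reps = np.array(reps)
    g = m + n
    return (g - 1) / g * float(np.sum((reps - reps.mean()) ** 2))


rng = np.random.default_rng(6)
x = rng.normal(size=(5, 2))             # placeholder data, m = 5, k = 2
y = rng.normal(0.5, 1.0, size=(6, 3))   # placeholder data, n = 6, l = 3
print(jackknife_var_j1(x, y))
```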

Another jackknife-type estimate of the variance can be obtained by deleting one cycle from each of the two samples simultaneously. Let \(\breve{\theta }_{\text {RSS}}^{(u,v)}\) denote the value of the estimator when \({\mathbf {X}}_u\) and \(\mathbf {Y}_v\) are removed from the data. The second estimator is then

$$\begin{aligned} {\widetilde{Var}}_2({\hat{\theta }}_{\text {RSS}})=\frac{m n-1}{m n} \sum _{u=1}^{m} \sum _{v=1}^{n} \left( \breve{\theta }_{\text {RSS}}^{(u,v)}-\breve{\theta }^{(0)}\right) ^2, \end{aligned}$$
(7)

where \(\breve{\theta }^{(0)}=\sum _{u=1}^{m} \sum _{v=1}^{n} \breve{\theta }_{\text {RSS}}^{(u,v)}/(m n)\).
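The paired deletion in (7) differs from (6) only in how the replicates are formed: there are mn of them, one for every combination of a deleted X cycle and a deleted Y cycle. A sketch under the same array-layout assumption (function names are hypothetical), applying the multiplier \((mn-1)/(mn)\) exactly as printed in (7):

```python
import numpy as np


def theta_hat(x, y):
    """Estimator (1) for (m, k) and (n, l) arrays."""
    return float(np.mean(np.ravel(x)[:, None] < np.ravel(y)[None, :]))


def jackknife_var_j2(x, y):
    """Estimator (7): delete one X cycle and one Y cycle at the same time."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = x.shape[0], y.shape[0]
    reps = np.array([theta_hat(np.delete(x, u, axis=0), np.delete(y, v, axis=0))
                     for u in range(m) for v in range(n)])  # breve-theta^(u,v)
    mn = m * n
    return (mn - 1) / mn * float(np.sum((reps - reps.mean()) ** 2))


rng = np.random.default_rng(8)
x = rng.normal(size=(5, 2))             # placeholder data
y = rng.normal(0.5, 1.0, size=(6, 3))   # placeholder data
print(jackknife_var_j2(x, y))
```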

The bootstrap method, introduced by Efron (1979), can also be used to estimate the variance. The method involves drawing samples repeatedly from the empirical distribution function. Suppose \(X_1,\ldots ,X_n\) is a random sample from the target population, and \({\hat{\theta }}\) is an estimator of interest. First we draw a sample of size n, with replacement, from the data points (called a bootstrap sample). This sampling procedure is repeated B times, and the estimator is computed from each bootstrap sample. The sample variance of these B values is then the bootstrap estimate of \(Var({\hat{\theta }})\).
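The iid bootstrap described above can be sketched as follows (`bootstrap_var` is a hypothetical name). For the sample mean, the bootstrap variance should be close to the plug-in value \(\hat{\sigma }^2/n\); for the data \(0,1,\ldots ,9\) this is \(8.25/10=0.825\).

```python
import numpy as np


def bootstrap_var(data, stat, B, rng):
    """Bootstrap variance of stat(data): resample n points with replacement B times."""
    data = np.asarray(data, dtype=float)
    n = data.size
    reps = np.array([stat(rng.choice(data, size=n, replace=True)) for _ in range(B)])
    return float(reps.var(ddof=1))   # sample variance of the B replicates


rng = np.random.default_rng(1)
v = bootstrap_var(np.arange(10.0), np.mean, B=4000, rng=rng)
print(v)  # close to 0.825
```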

Modarres et al. (2006) suggested two bootstrap algorithms in the RSS design. The bootstrap ranked set sampling (BRSS) method, which is the more efficient of the two, is now described. Let \(F_{m k}\) be the empirical distribution function based on the ranked set sample \(\{X_{[r]i}: r=1,\ldots ,k;\, i=1,\ldots ,m\}\), i.e.

$$\begin{aligned} F_{m k}(x)=\frac{1}{m k} \sum _{r=1}^k \sum _{i=1}^m I(X_{[r]i}\le x). \end{aligned}$$

According to the BRSS algorithm, a bootstrap sample is drawn as follows:

  1. Assign to each element of the ranked set sample a probability of \((mk)^{-1}\).

  2. Randomly draw k elements \({\mathcal {X}}_1,\ldots ,{\mathcal {X}}_k {\mathop {\sim }\limits ^{iid}} F_{m k}\), sort them in ascending order \({\mathcal {X}}_{(1)},\ldots ,{\mathcal {X}}_{(k)}\), and retain \(X_{[r]1}^*={\mathcal {X}}_{(r)}\).

  3. Perform step 2 for \(r=1,\ldots ,k\).

  4. Repeat steps 2 and 3 m times to obtain \(\{X_{[r]i}^*\}\).

Following similar steps, a bootstrap copy of \(\{Y_{[s]j}: s=1,\ldots ,\ell ;\, j=1,\ldots ,n\}\) is generated. Suppose B pairs of bootstrap samples are drawn as described above, and let \({\hat{\theta }}_{\text {RSS}}^b\) be the value of the reliability estimator based on the data in the bth (\(b=1,\ldots ,B\)) replication. Then the bootstrap variance estimator is given by

$$\begin{aligned} {\widehat{Var}}_{\text {boot}}({\hat{\theta }}_{\text {RSS}})=\frac{1}{B-1}\sum _{b=1}^B \left( {\hat{\theta }}_{\text {RSS}}^b-\bar{\theta }^* \right) ^2, \end{aligned}$$
(8)

where \(\bar{\theta }^*=\sum _{b=1}^B {\hat{\theta }}_{\text {RSS}}^b/B\).
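The BRSS steps and the variance estimate (8) can be sketched as below, assuming the (m, k) array layout and NumPy's `Generator.choice` for the iid draws from \(F_{mk}\); function names are hypothetical. The replicates are returned as well, since the percentile interval (13) reuses them.

```python
import numpy as np


def theta_hat(x, y):
    """Estimator (1) for (m, k) and (n, l) arrays."""
    return float(np.mean(np.ravel(x)[:, None] < np.ravel(y)[None, :]))


def brss_resample(x, rng):
    """One BRSS bootstrap copy of an (m, k) ranked set sample, following steps 1-4."""
    x = np.asarray(x, dtype=float)
    m, k = x.shape
    pool = x.ravel()                 # step 1: each of the mk values has mass 1/(mk)
    out = np.empty((m, k))
    for i in range(m):               # step 4: m cycles
        for r in range(k):           # step 3: one set per rank r
            draws = rng.choice(pool, size=k, replace=True)  # step 2: k iid draws from F_mk
            out[i, r] = np.sort(draws)[r]                   # keep the (r+1)th smallest
    return out


def boot_var(x, y, B, rng):
    """Bootstrap variance estimate (8) from B pairs of BRSS copies."""
    reps = np.array([theta_hat(brss_resample(x, rng), brss_resample(y, rng))
                     for _ in range(B)])
    return float(reps.var(ddof=1)), reps


rng = np.random.default_rng(4)
x = rng.normal(size=(5, 2))             # placeholder data
y = rng.normal(0.5, 1.0, size=(4, 3))   # placeholder data
v, reps = boot_var(x, y, B=200, rng=rng)
print(v)
```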

3 Proposed CIs

In this section, we construct several CIs for \(\theta \) using asymptotic and resampling methods. Based on Proposition 1, one can employ the pivotal quantity

$$\begin{aligned} T=\frac{{\hat{\theta }}_{\text {RSS}}-\theta }{\sqrt{\widehat{Var}({\hat{\theta }}_{\text {RSS}})}} \thickapprox N(0, 1), \end{aligned}$$

where \(\widehat{Var}({\hat{\theta }}_{\text {RSS}})\) is defined in (5). The corresponding approximate (\(1-\alpha \))-CI is

$$\begin{aligned} \left( {\hat{\theta }}_{\text {RSS}}-z_{\alpha /2}\sqrt{\widehat{Var}({\hat{\theta }}_{\text {RSS}})},{\hat{\theta }}_{\text {RSS}}+z_{\alpha /2}\sqrt{\widehat{Var}({\hat{\theta }}_{\text {RSS}})} \right) , \end{aligned}$$
(9)

where \(z_{\alpha /2}\) is the (\(1-\alpha /2\)) quantile of the standard normal distribution. The pivotal quantity T can be modified by replacing \(\widehat{Var}({\hat{\theta }}_{\text {RSS}})\) with one of the estimates presented in (6), (7) and (8). Accordingly, natural modifications of (9) would be

$$\begin{aligned}&\left( {\hat{\theta }}_{\text {RSS}}-z_{\alpha /2}\sqrt{{\widetilde{Var}}_1({\hat{\theta }}_{\text {RSS}})},{\hat{\theta }}_{\text {RSS}}+z_{\alpha /2}\sqrt{{\widetilde{Var}}_1({\hat{\theta }}_{\text {RSS}})} \right) , \end{aligned}$$
(10)
$$\begin{aligned}&\left( {\hat{\theta }}_{\text {RSS}}-z_{\alpha /2}\sqrt{{\widetilde{Var}}_2({\hat{\theta }}_{\text {RSS}})},{\hat{\theta }}_{\text {RSS}}+z_{\alpha /2}\sqrt{{\widetilde{Var}}_2({\hat{\theta }}_{\text {RSS}})} \right) , \end{aligned}$$
(11)

and

$$\begin{aligned} \left( {\hat{\theta }}_{\text {RSS}}-z_{\alpha /2}\sqrt{{\widehat{Var}}_{\text {boot}}({\hat{\theta }}_{\text {RSS}})},{\hat{\theta }}_{\text {RSS}}+z_{\alpha /2}\sqrt{{\widehat{Var}}_{\text {boot}}({\hat{\theta }}_{\text {RSS}})} \right) . \end{aligned}$$
(12)

We can construct a two-sided equal-tailed (\(1-\alpha \))-CI for \(\theta \) from the empirical distribution function of a series of bootstrap replications of \({\hat{\theta }}_{\text {RSS}}\). The \(\alpha /2\) and the \(1-\alpha /2\) quantiles of the bootstrap replications are used as lower and upper confidence bounds. This procedure is called percentile bootstrap, and the corresponding interval is given by

$$\begin{aligned} \left( {\hat{\theta }}_{\text {RSS}}^{\alpha /2}, {\hat{\theta }}_{\text {RSS}}^{1-\alpha /2}\right) , \end{aligned}$$
(13)

where \({\hat{\theta }}_{\text {RSS}}^{\beta }\) is the \(\beta \) quantile of \({\hat{\theta }}_{\text {RSS}}^1,\ldots ,{\hat{\theta }}_{\text {RSS}}^B\).
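Given the bootstrap replicates \({\hat{\theta }}_{\text {RSS}}^1,\ldots ,{\hat{\theta }}_{\text {RSS}}^B\) (for instance, from a BRSS loop), the percentile interval (13) is just two sample quantiles. The text does not specify a quantile definition; the sketch below assumes NumPy's default linear interpolation, and `percentile_ci` is a hypothetical name.

```python
import numpy as np


def percentile_ci(reps, alpha=0.05):
    """Interval (13): alpha/2 and 1 - alpha/2 sample quantiles of the bootstrap replicates."""
    lo, hi = np.quantile(np.asarray(reps, dtype=float), [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


print(percentile_ci(np.linspace(0.0, 1.0, 101), alpha=0.1))  # approximately (0.05, 0.95)
```

Because the replicates themselves lie in [0, 1], so do the endpoints of this interval.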

The bootstrap-t method approximates quantiles of the distribution of T from sample quantiles of the quantities

$$\begin{aligned} T_b=\frac{{\hat{\theta }}_{\text {RSS}}^b-{\hat{\theta }}_{\text {RSS}}}{\sqrt{{\widehat{Var}}({\hat{\theta }}_{\text {RSS}}^b)}} \quad (b=1,\ldots ,B), \end{aligned}$$

where \({\hat{\theta }}_{\text {RSS}}^b\) and \({\widehat{Var}}({\hat{\theta }}_{\text {RSS}}^b)\) are computed from the bth bootstrap sample. The bootstrap-t interval is defined as

$$\begin{aligned} \left( {\hat{\theta }}_{\text {RSS}}-t_{1-\alpha /2}\sqrt{\widehat{Var}({\hat{\theta }}_{\text {RSS}})},{\hat{\theta }}_{\text {RSS}}-t_{\alpha /2}\sqrt{\widehat{Var}({\hat{\theta }}_{\text {RSS}})} \right) , \end{aligned}$$
(14)

where \(t_{\beta }\) is the \(\beta \) quantile of \(T_1,\ldots ,T_B\).
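A self-contained sketch of the bootstrap-t interval (14) is given below. It assumes that \({\widehat{Var}}({\hat{\theta }}_{\text {RSS}}^b)\) is the plug-in estimate (5) recomputed on each BRSS copy, uses NumPy's default quantile definition, truncates the endpoints to [0, 1] as described in the text for these intervals, and (our own assumption, not from the text) skips any degenerate replicate whose variance estimate is zero. All function names are hypothetical.

```python
import numpy as np


def theta_hat(x, y):
    """Estimator (1)."""
    return float(np.mean(x.ravel()[:, None] < y.ravel()[None, :]))


def var_hat(x, y):
    """Estimate (5), with (3) and (4) written as averaged within-rank variances."""
    less = x[:, :, None, None] < y[None, None, :, :]
    W, Z = less.mean(axis=(2, 3)), less.mean(axis=(0, 1))
    s1 = np.var(W, axis=0, ddof=1).mean()
    s2 = np.var(Z, axis=0, ddof=1).mean()
    return float(s1 / x.size + s2 / y.size)   # x.size = mk, y.size = n*l


def brss_resample(x, rng):
    """One BRSS copy of an (m, k) ranked set sample."""
    m, k = x.shape
    pool = x.ravel()
    return np.array([[np.sort(rng.choice(pool, size=k))[r] for r in range(k)]
                     for _ in range(m)])


def boot_t_ci(x, y, B=500, alpha=0.05, rng=None):
    """Interval (14), truncated to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    th, se = theta_hat(x, y), np.sqrt(var_hat(x, y))
    t_stats = []
    for _ in range(B):
        xb, yb = brss_resample(x, rng), brss_resample(y, rng)
        vb = var_hat(xb, yb)
        if vb > 0:   # assumption: skip degenerate replicates with zero variance
            t_stats.append((theta_hat(xb, yb) - th) / np.sqrt(vb))   # T_b
    t_lo, t_hi = np.quantile(t_stats, [alpha / 2, 1 - alpha / 2])
    return max(0.0, th - t_hi * se), min(1.0, th - t_lo * se)


rng = np.random.default_rng(3)
x = rng.normal(size=(5, 2))            # placeholder data
y = rng.normal(1.0, 1.0, size=(5, 2))  # placeholder data
print(boot_t_ci(x, y, B=300, rng=rng))
```

Note that the upper quantile \(t_{1-\alpha /2}\) sets the lower endpoint and vice versa, exactly as in (14).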

The intervals (9), (10), (11), (12), (13) and (14) will be referred to as Normal, Normal-J1, Normal-J2, Normal-B, Boot-p and Boot-t, respectively. All of the above intervals except Boot-p may have endpoints outside the unit interval. Therefore, we correct the original interval (L, U) to \(\left( \max \{0,L\},\min \{1,U\}\right) \).
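The four normal-type intervals (9) to (12) share one formula and differ only in the variance estimate plugged in, so a single helper covers them, including the truncation to [0, 1]. A sketch assuming the standard library's `statistics.NormalDist` for \(z_{\alpha /2}\); `wald_ci` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(theta, var, alpha=0.05):
    """Intervals (9)-(12): theta -/+ z_{alpha/2} * sqrt(var), truncated to [0, 1].

    Pass the variance estimate (5), (6), (7) or (8) as `var` to obtain
    Normal, Normal-J1, Normal-J2 or Normal-B, respectively.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # (1 - alpha/2) quantile of N(0, 1)
    half = z * sqrt(var)
    return max(0.0, theta - half), min(1.0, theta + half)


print(wald_ci(0.99, 0.01))  # upper endpoint is truncated to 1.0
```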

4 Simulation results

This section contains results of simulation studies conducted to compare the performances of the different intervals suggested in the previous section. We consider the cases where both X and Y follow either a normal or exponential distribution. If \(X-\mu \, (\mu \in \mathbb {R}\)) and Y are independent standard normal random variables, then

$$\begin{aligned} \theta =\Phi \left( \frac{-\mu }{\sqrt{2}} \right) , \end{aligned}$$

where \(\Phi (.)\) is the standard normal distribution function (that is, the distribution function of Y). Similarly, for independent standard exponential random variables \(X/\beta \,(\beta >0\)) and Y, it can be shown that

$$\begin{aligned} \theta =\frac{1}{1+\beta }. \end{aligned}$$
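Both formulas are easy to invert, which is how parameter values producing a target \(\theta \) can be obtained: \(\mu =\sqrt{2}\,\Phi ^{-1}(1-\theta )\) and \(\beta =1/\theta -1\). The sketch below only solves the two displayed equations (it does not reproduce Table 1); the function names are hypothetical and \(\Phi ^{-1}\) comes from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def normal_mu(theta):
    """Solve theta = Phi(-mu / sqrt(2)) for mu (X - mu and Y standard normal)."""
    return sqrt(2) * NormalDist().inv_cdf(1 - theta)


def exponential_beta(theta):
    """Solve theta = 1 / (1 + beta) for beta (X / beta and Y standard exponential)."""
    return 1 / theta - 1


for case, theta in zip("ABC", (0.25, 0.5, 0.75)):
    print(case, round(normal_mu(theta), 4), round(exponential_beta(theta), 4))
```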

Under each parent distribution, three values were assigned to the associated parameter so as to produce \(\theta =0.25,0.5,0.75\), which are referred to as cases A, B and C, respectively. The appropriate parameter values are given in Table 1. Denoting the total sample sizes by \(N_1=mk\) and \(N_2=n\ell \), we select \((N_1,N_2) \in \big \{(10,10),(10,20),(10,30),(20,20)\big \}\). Ranked set samples are drawn from the two populations using common set sizes \(k=\ell =1,2,5\), where set size one simply corresponds to the SRS design.

We assume that the variables of interest X and Y are ranked using the concomitant variables \({\mathcal {X}}\) and \({\mathcal {Y}}\), which are related to them through the equations

$$\begin{aligned} {\mathcal {X}}=\rho _1 \left( \frac{X-\mu _x}{\sigma _x} \right) + \sqrt{1-\rho _1^2} Z_1, \end{aligned}$$

and

$$\begin{aligned} {\mathcal {Y}}=\rho _2 \left( \frac{Y-\mu _y}{\sigma _y} \right) + \sqrt{1-\rho _2^2} Z_2, \end{aligned}$$

where \(\mu _x\,(\mu _y)\) and \(\sigma _x\,(\sigma _y)\) denote the mean and standard deviation of X (Y), \(\rho _i \in [0,1]\,\, (i=1,2)\), and \(Z_1\, (Z_2\)) is a standard normal random variable independent of \(X \,(Y\)). Moreover, \(Z_1\) and \(Z_2\) are independent. The quality of the rankings is controlled by \(\rho _1\) and \(\rho _2\): since \({\mathcal {X}}\) and \({\mathcal {Y}}\) have unit variance, it is easy to see that \(Corr(X,{\mathcal {X}})=\rho _1\) and \(Corr(Y,{\mathcal {Y}})=\rho _2\). The chosen values of \(\left( \rho _1,\rho _2\right) \) are (1, 1) for perfect rankings of X and Y, (1, 0.8) for perfect ranking of X and fairly accurate ranking of Y, and (0.8, 0.8) for fairly accurate rankings of X and Y.
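The concomitant-based ranking can be sketched as follows: each set of k units is ordered by its concomitant values rather than by the true values, and only the unit judged to have rank r is measured. The sampler argument `draw(size, rng)` and the function names are hypothetical; the exponential example (mean and standard deviation both 1) is only an illustration.

```python
import numpy as np


def concomitant(x, rho, mu, sigma, rng):
    """Concomitant rho * (x - mu) / sigma + sqrt(1 - rho^2) * Z with Z ~ N(0, 1)."""
    x = np.asarray(x, dtype=float)
    return rho * (x - mu) / sigma + np.sqrt(1 - rho ** 2) * rng.standard_normal(x.shape)


def draw_rss_concomitant(draw, k, m, rho, mu, sigma, rng):
    """RSS in which each set is ranked by the concomitant instead of by X itself."""
    sample = np.empty((m, k))
    for i in range(m):
        for r in range(k):
            units = draw(k, rng)
            order = np.argsort(concomitant(units, rho, mu, sigma, rng))
            sample[i, r] = units[order[r]]   # measure the unit judged (r+1)th smallest
    return sample


rng = np.random.default_rng(12)
big = rng.exponential(size=200_000)
print(np.corrcoef(big, concomitant(big, 0.8, 1.0, 1.0, rng))[0, 1])  # close to 0.8
s = draw_rss_concomitant(lambda size, g: g.exponential(size=size), 3, 4, 0.8, 1.0, 1.0, rng)
```

With \(\rho =1\) the noise term vanishes and the ranking is perfect; with \(\rho <1\) ranking errors occur.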

Table 1 Parameter values corresponding to cases A, B and C

For each combination of distribution, sample sizes and correlations, 5000 pairs of samples were generated in the RSS design (with the aforesaid set sizes). The six intervals were constructed from each pair of samples with \(\alpha =0.05\), using 500 bootstrap replications. The coverage rate and expected length of each interval were then estimated by the fraction of intervals containing the true \(\theta \) and the mean interval length, respectively. The results under perfect ranking are reported in Tables 2, 3, 4 and 5, where the interval lengths appear in parentheses.
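A scaled-down version of this Monte Carlo design can be sketched for the Normal interval (9) under the normal model with perfect ranking. The sketch uses far fewer replications than the 5000 pairs in the study and makes no claim about reproducing the reported tables; all function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def theta_hat(x, y):
    """Estimator (1)."""
    return float(np.mean(x.ravel()[:, None] < y.ravel()[None, :]))


def var_hat(x, y):
    """Estimate (5) via within-rank variances, as in (3) and (4)."""
    less = x[:, :, None, None] < y[None, None, :, :]
    W, Z = less.mean(axis=(2, 3)), less.mean(axis=(0, 1))
    return float(np.var(W, axis=0, ddof=1).mean() / x.size
                 + np.var(Z, axis=0, ddof=1).mean() / y.size)


def perfect_rss(draw, k, m):
    """Perfectly ranked set sample with m cycles and set size k."""
    return np.array([[np.sort(draw(k))[r] for r in range(k)] for _ in range(m)])


def coverage_normal(mu, k, m, n, reps, rng, alpha=0.05):
    """Monte Carlo coverage and mean length of interval (9) when X - mu and Y are N(0, 1)."""
    theta = NormalDist().cdf(-mu / np.sqrt(2))   # true theta
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hits, lengths = 0, []
    for _ in range(reps):
        x = perfect_rss(lambda s: rng.normal(mu, 1.0, s), k, m)
        y = perfect_rss(lambda s: rng.normal(0.0, 1.0, s), k, n)
        th, half = theta_hat(x, y), z * np.sqrt(var_hat(x, y))
        lo, hi = max(0.0, th - half), min(1.0, th + half)   # truncated interval
        hits += lo < theta < hi
        lengths.append(hi - lo)
    return hits / reps, float(np.mean(lengths))


# Case B (theta = 0.5) with N1 = N2 = 10 and k = l = 2.
print(coverage_normal(mu=0.0, k=2, m=5, n=5, reps=500, rng=np.random.default_rng(11)))
```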

In general, longer intervals tend to have higher coverage probabilities. The Normal-J2 and Boot-t CIs have the best coverage rates, and the latter is always shorter. Also, Normal-B and Boot-p are the shortest CIs, and their coverage rates are more or less the same. For a fixed \(N_1+N_2\), the CIs generally perform better when the two sample sizes are equal; compare, for example, the corresponding intervals for sample sizes (10, 30) and (20, 20) under the different parent distributions.

Table 2 Estimated coverage rates and lengths of 95% intervals under normal distribution with the perfect ranking when \((N_1,N_2)=(10,10),(10,20)\)
Table 3 Estimated coverage rates and lengths of 95% intervals under normal distribution with the perfect ranking when \((N_1,N_2)=(10,30),(20,20)\)
Table 4 Estimated coverage rates and lengths of 95% intervals under exponential distribution with the perfect ranking when \((N_1,N_2)=(10, 10),(10,20)\)
Table 5 Estimated coverage rates and lengths of 95% intervals under exponential distribution with the perfect ranking when \((N_1,N_2)=(10,30),(20,20)\)

For a given pair of total sample sizes, the lengths of all intervals decrease as the set size increases, regardless of the case (A, B or C). However, the coverage probabilities do not change regularly. These trends are consistent with some results in the RSS literature. For example, Terpstra and Miller (2006) studied exact inference for a population proportion based on the RSS. According to their findings, the expected length of the RSS-based CI is uniformly (as a function of the true population proportion) smaller than that of the SRS-based CI, but there is no uniform superiority in coverage probability (see Figure 3 in Terpstra and Miller 2006). In our problem, the situation is more complex because the intervals are based on asymptotic and/or resampling methods.

Under perfect rankings, the performance of each interval in cases A and C is in close agreement when the parent distribution is normal. The same is true for the exponential distribution if \(N_1=N_2\). These properties can also be observed under imperfect ranking (see Tables 1–8 in the supplementary material), but the additional condition \(\rho _1=\rho _2\) is needed for the exponential distribution. In the presence of ranking errors, the lengths of the CIs increase (compared with the perfect ranking case), but the coverage probabilities do not behave regularly. Overall, the Normal-J2 and Boot-t CIs have satisfactory coverage rates (close to or above the nominal level) in this situation, although they are longer than the other CIs.

Mahdizadeh and Zamanzade (2016) used kernel density estimation to estimate \(\theta \) in the RSS. Let \(h_1\) and \(h_2\) be the bandwidths of the kernel density estimators based on \(\{X_{[r]i}: r=1,\ldots ,k\,;i=1,\ldots ,m \}\) and \(\{Y_{[s]j}: s=1,\ldots ,\ell \,;j=1,\ldots ,n \}\), respectively (see Chen 1999 for kernel density estimation in the RSS). If \(t=\sqrt{h_1^2+h_2^2}\), then the kernel-based estimator is given by

$$\begin{aligned} \tilde{\theta }_{\text {RSS}}=\frac{1}{mk n \ell } \sum _{i=1}^m \sum _{j=1}^n \sum _{r=1}^k \sum _{s=1}^\ell \Phi \left( \frac{Y_{[s]j}-X_{[r]i}}{t} \right) , \end{aligned}$$

where \(\Phi (.)\) is the distribution function of the standard normal distribution. Yin et al. (2016) established the asymptotic normality of the above estimator and employed this result to develop a CI for \(\theta \). The corresponding interval is defined as

$$\begin{aligned} \left( \tilde{\theta }_{\text {RSS}}-z_{\alpha /2}\sqrt{\widehat{Var}(\tilde{\theta }_{\text {RSS}})},\tilde{\theta }_{\text {RSS}}+z_{\alpha /2}\sqrt{\widehat{Var}(\tilde{\theta }_{\text {RSS}})} \right) , \end{aligned}$$
(15)

where \(\widehat{Var}(\tilde{\theta }_{\text {RSS}})\) is computed as in (5), but now based on

$$\begin{aligned} \mathcal {W}_{[r]i}=\frac{1}{n \ell } \sum _{s=1}^\ell \sum _{j=1}^n \Phi \left( \frac{Y_{[s]j}-X_{[r]i}}{t} \right) , \end{aligned}$$

and

$$\begin{aligned} \mathcal {Z}_{[s]j}=\frac{1}{m k} \sum _{r=1}^k \sum _{i=1}^m \Phi \left( \frac{Y_{[s]j}-X_{[r]i}}{t} \right) . \end{aligned}$$
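A sketch of the kernel-based estimator and interval (15) follows. Only one bandwidth rule is illustrated: the normal-reference choice \(h=(4/(3n))^{1/5}\hat{\sigma }\) for a Gaussian kernel, applied as if the data were an SRS; this formula is our assumption of what the NR rule looks like here, and UCV and PI are not shown. As a sanity check, a vanishing bandwidth recovers the indicator-based estimate (1). Function names are hypothetical.

```python
import numpy as np
from math import erf, sqrt
from statistics import NormalDist

Phi = np.vectorize(lambda u: 0.5 * (1.0 + erf(u / sqrt(2.0))))  # standard normal cdf


def nr_bandwidth(sample):
    """Normal-reference bandwidth for a Gaussian kernel, treating the data as an SRS."""
    v = np.ravel(sample)
    return (4.0 / (3.0 * v.size)) ** 0.2 * v.std(ddof=1)


def kernel_ci(x, y, h1, h2, alpha=0.05):
    """Kernel estimate tilde-theta and interval (15), truncated to [0, 1]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = sqrt(h1 ** 2 + h2 ** 2)
    K = Phi((y[None, None, :, :] - x[:, :, None, None]) / t)  # shape (m, k, n, l)
    W, Z = K.mean(axis=(2, 3)), K.mean(axis=(0, 1))          # kernel versions of W and Z
    th = float(K.mean())
    var = (np.var(W, axis=0, ddof=1).mean() / x.size
           + np.var(Z, axis=0, ddof=1).mean() / y.size)       # analogue of (5)
    half = NormalDist().inv_cdf(1 - alpha / 2) * sqrt(var)
    return th, max(0.0, th - half), min(1.0, th + half)


rng = np.random.default_rng(9)
x = rng.normal(size=(5, 2))             # placeholder data
y = rng.normal(0.5, 1.0, size=(5, 2))   # placeholder data
print(kernel_ci(x, y, nr_bandwidth(x), nr_bandwidth(y)))
```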

We conducted a partial simulation study, based on 10,000 pairs of samples, to compare CIs (9) and (15) in terms of coverage probability and length. To determine \(h_1\) and \(h_2\), three bandwidth selection methods were used: the normal reference (NR) rule, unbiased cross-validation (UCV), and plug-in (PI). Although these techniques were developed for the SRS (see Sheather 2004 for more details), they can be applied in the RSS setup by treating the data as if they had been collected by the SRS.

Figures 1 and 2 display the results for \((N_1,N_2)=(10, 10)\) with \(k=\ell =2,5\) under perfect rankings. Here, the black solid curve corresponds to interval (9), while the blue dashed, red dotted and orange long-dashed curves are associated with interval (15) using the NR, UCV and PI methods, respectively. Interval (9) has better coverage, whereas interval (15) is always shorter; hence, neither interval is preferable on both criteria. Among the kernel-based intervals, the CI using the PI method performs satisfactorily overall.

Fig. 1

Estimated coverage rates and lengths of 95% intervals under normal distribution with the perfect ranking when \((N_1,N_2)=(10,10)\)

Fig. 2

Estimated coverage rates and lengths of 95% intervals under exponential distribution with the perfect ranking when \((N_1,N_2)=(10,10)\)

5 Illustration

We now apply the proposed procedures to an agricultural data set. Murray et al. (2000) conducted an experiment in which apple trees were sprayed with a chemical containing the fluorescent tracer Tinopal CBS-X at a 2% concentration in water. Two nine-tree plots were chosen for spraying. One plot was sprayed at high volume, using coarse nozzles on the sprayer to give a large average droplet size. The other plot was sprayed at low volume, using fine nozzles to give a small average droplet size. Fifty sets of five leaves were identified from the central five trees of each plot and used to draw, from each plot, a ranked set sample with set size five and 10 cycles (i.e., 50 measured leaves). The variable of interest is the percentage of the leaf surface area covered by the spray. Formal measurement entails chemical analysis of the solution collected from the leaf surfaces and is therefore a time-consuming and expensive process. The judgment ranking within each set is based on the visual appearance of the spray deposits on the leaf surfaces when viewed under ultraviolet light. Clearly, the latter method is cheap, and fairly accurate if implemented by an expert observer.

Table 6 Ranked set sample data for the percentage area covered on the surface of the leaves of apple trees
Table 7 95% CIs for \(\theta \) based on the apple trees data

The data are given in Table 6, where measurements obtained from the plot sprayed at high (low) volume constitute the control (treatment) group. The question of interest is whether the sprayer settings affect the percentage area coverage. If \(X\, (Y\)) denotes the response variable from the control (treatment) group, then \(\theta \) is a measure of the treatment effect. From the data in Table 6, \({\hat{\theta }}_{\text {RSS}}=0.6184\) is obtained, with estimated variances \({\widehat{Var}}({\hat{\theta }}_{\text {RSS}})=0.001344\), \({\widetilde{Var}}_1({\hat{\theta }}_{\text {RSS}})=0.001825\), \({\widetilde{Var}}_2({\hat{\theta }}_{\text {RSS}})=0.019038\), and \({\widehat{Var}}_{\text {boot}}({\hat{\theta }}_{\text {RSS}})=0.001169\). For the bootstrap-based estimate, \(B=5000\) is used. With the exception of \({\widetilde{Var}}_2({\hat{\theta }}_{\text {RSS}})\), all of the estimates are in good agreement. Table 7 displays 95% CIs for \(\theta \) based on the different methods. Except for the Normal-J2 interval, none of the intervals contains 0.5, so we may conclude that the treatment effect is significant at the 0.05 level. It should be mentioned that Normal-J2 is the longest interval in this example, which is consistent with the simulation results in Sect. 4.
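The four normal-type intervals depend only on \({\hat{\theta }}_{\text {RSS}}\) and the variance estimates quoted above, so they can be recomputed directly as a check against Table 7. The sketch below assumes \(z_{0.025}\) from `statistics.NormalDist` and the truncation to [0, 1]; Boot-p and Boot-t are omitted because they require the bootstrap replicates rather than summary numbers.

```python
from math import sqrt
from statistics import NormalDist

theta = 0.6184   # reported point estimate
variances = {    # reported variance estimates
    "Normal": 0.001344,
    "Normal-J1": 0.001825,
    "Normal-J2": 0.019038,
    "Normal-B": 0.001169,
}
z = NormalDist().inv_cdf(0.975)
intervals = {}
for name, v in variances.items():
    half = z * sqrt(v)
    lo, hi = max(0.0, theta - half), min(1.0, theta + half)
    intervals[name] = (lo, hi)
    print(f"{name:10s} ({lo:.4f}, {hi:.4f})  contains 0.5: {lo < 0.5 < hi}")
```

Only the Normal-J2 interval, whose variance estimate is roughly ten times larger than the others, covers 0.5, in line with the conclusion drawn in the text.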

As a reviewer pointed out, the accuracy of sampling and statistical inference largely hinges on the properties of the population, the conditions of the sample, and the method of estimation, the so-called sampling and statistical trinity (see Wang et al. 2012, for example). Here, the proposed procedures are illustrated using agricultural data. A spatial population may be dominated by spatial autocorrelation, spatial stratified heterogeneity, or both, and there may also be significant covariates. The properties of the population should be tested before choosing the most suitable of the numerous available estimators. To justify the choice of a method, a table may be drawn to compare the assumptions of the mainstream models in the area with the properties of the data under study (e.g. spatial autocorrelation, spatial stratified heterogeneity, and the significance of covariates). Unfortunately, we have no information about the population from which the sample in Table 6 was drawn, so it is not possible to check these properties.

6 Conclusion

The RSS method combines measurement with judgment ranking information for the purpose of statistical inference. It is advantageous in settings where precise measurement of the variable of interest is difficult (e.g., time-consuming, expensive or destructive), but small sets of units can be accurately ranked without actual quantification.

While point estimation of different population attributes has been exhaustively studied in the RSS literature, hypothesis testing and interval estimation problems have received little attention. This article aims to fill this gap in the context of estimating the reliability parameter. Several asymptotic and resampling-based intervals are developed and compared with their SRS analogs through an extensive simulation study. The results confirm that the RSS-based CIs are preferable with respect to length, although their coverage rates are not uniformly superior. An agricultural data set is used to illustrate the suggested interval estimation procedures.

The intervals presented in this work use a point estimator based on the empirical distribution function. We have partially investigated the performance of one of the CIs when modified using kernel density estimation. The other intervals can be adapted similarly; this will be considered in a separate article.