1 Introduction

Ranked set sampling (RSS) is an alternative method of data collection that has long been known as a cost-efficient sampling procedure. This approach to data collection was first proposed by McIntyre (1952) as a method to improve the precision of estimated pasture yield. Later, Takahasi and Wakimoto (1968) established a rigorous statistical foundation for the theory of RSS. For more details and real applications of the RSS scheme, one may refer to Patil et al. (1999), Chen (2007), Linder et al. (2015), Samawi et al. (2017), and Al-Omari and Bouza (2014). RSS retains the basic intuitive properties associated with simple random sampling (SRS), but adds the extra structure induced by the judgment ranking and the independence of the resulting order statistics. As a result, procedures based on RSS lead to more efficient estimators of population parameters than those based on an SRS of the same size. The existing literature also includes works on hypothesis testing as well as point and interval estimation under both parametric and nonparametric settings. See, for example, Bohn and Wolfe (1992), Chen et al. (2006), Fligner and MacEachern (2006), Frey (2007), Ozturk and Balakrishnan (2009), and the references cited therein.

The most basic version of RSS is the so-called balanced RSS. The process of generating an RSS involves drawing \(k^2\) units at random from the target population. These units are then randomly divided into k sets of k units each. Within each set, the units are ranked by some means other than direct measurement. For example, the ranking can be done either visually or by using a concomitant variable that is cheaper and easier to measure than the variable of interest itself. Finally, one unit from each set is chosen for actual quantification. To be more specific, from the first set we select the unit with the smallest judgment-rank for measurement, from the second set we select the unit with the second smallest judgment-rank, and so on, until the unit ranked largest is chosen from the kth set. This complete procedure, called a cycle, is repeated independently m times to obtain a ranked set sample of size mk. Therefore, a balanced RSS of size mk requires a total of \(mk^2\) units to be selected, but only mk of them are actually measured. Hence, a wider range of the population can be covered while greatly reducing the sampling cost. According to Takahasi and Wakimoto (1968), for easy implementation of RSS, the set size k is usually kept small, say 4 or less. However, a large sample can be obtained by increasing the cycle size m. Another option is unbalanced RSS. In an unbalanced RSS, \(n\times k\) units are selected at random from the target population. These units are then randomly divided into n sets of k units each. The units in each set are judgment ranked without actual measurement. In this setting, let \(m_{r}\) denote the number of sets allocated to measuring units with the rth judgment-rank, so that \(n=\sum _{r=1}^{k} m_{r}\). The measured observations then constitute an unbalanced RSS of size n.
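To make the balanced scheme concrete, the following small simulation sketch draws a balanced ranked set sample under perfect ranking (ranking is done on the true values); the function names and the choice of an exponential population are illustrative assumptions of ours, not part of the scheme itself.

```python
import numpy as np

def balanced_rss(rng, draw, k, m):
    """Draw a balanced ranked set sample of size m*k under perfect ranking.

    draw(rng, size) returns independent draws from the target population.
    """
    sample = []
    for _ in range(m):                    # m independent cycles
        for r in range(k):                # one set of k units per judgment rank
            units = np.sort(draw(rng, k))     # draw a set and rank it (perfectly)
            sample.append(units[r])       # measure only the unit with rank r+1
    return np.array(sample)               # m*k measurements out of m*k*k drawn units

rng = np.random.default_rng(0)
x_rss = balanced_rss(rng, lambda rng, size: rng.exponential(1.0, size), k=3, m=5)
print(x_rss)   # 15 measured units from a balanced RSS with k = 3, m = 5
```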

Recently, several bootstrap methods have been developed based on RSS. Hui et al. (2004) proposed a bootstrap confidence interval for the population mean based on RSS via linear regression, using the bootstrap to estimate the variance of the estimator of the population mean. In a similar vein, Modarres et al. (2006) developed several bootstrap procedures for balanced RSS and established their consistency for the sample mean. Drikvandi et al. (2006) proposed a bootstrap method to test for the symmetry of the distribution function about an unknown median based on RSS. Finally, Frey (2014) developed confidence bands for the CDF based on RSS by using the bootstrap method.

In this article, we develop three confidence interval methods for the population mean based on RSS. First, we suggest the bias-corrected and accelerated (\(BC_{a}\)) confidence interval method. The \(BC_{a}\) method was proposed by Efron (1987) for SRS and improves on the bootstrap percentile confidence interval in terms of coverage probability (see Hall 1988). The \(BC_{a}\) method has been considered by several researchers in different contexts. It requires numerical computation of the acceleration constant, typically via the jackknife method. However, in the case of RSS, this numerical computation becomes intensive as the set size increases. This motivates us to develop an alternative approach to reduce the computational burden. For this purpose, we derive a formula for the acceleration constant based on an Edgeworth expansion, which avoids the cumbersome numerical implementation. Next, two confidence interval methods are proposed based on monotone transformations. These methods apply a monotone transformation to the studentized pivot so that the distribution of the transformed pivot is nearly symmetric; confidence intervals are then obtained by inverting the transformation. Various transformations of the studentized pivot have been investigated by Johnson (1978), Hall (1992a), Zhou and Gao (2000), and Cojbasic and Loncar (2011) under SRS.

The remainder of this paper is organized as follows. Section 2 introduces the ranked set sample. In Sects. 3 and 4, we develop the \(BC_{a}\) method and the transformation confidence interval methods, respectively. Simulation results are presented in Sect. 5. Section 6 presents a real data application. We conclude the article with some brief remarks in Sect. 7. Proofs are relegated to the “Appendix”.

2 Ranked set sample

Let nk units be drawn randomly from a population with an unknown distribution F(x). Let \(\mu \) and \(\sigma ^{2}\) be the mean and variance of F(x), respectively. These units are then randomly divided into n groups \(G_1, \ldots , G_n\) of size k each. The rth group \(G_r\) consists of \(\{X_{r,1},X_{r,2},\ldots ,X_{r,k}\}\). The units within each of the n groups are then ranked on the attribute of interest by means of some ranking process. Let \(m_{r}\) be the number of actual measurements on units having rank \(r, r=1,\ldots ,k\), such that \(n=\sum _{r=1}^{k} m_{r}\). Under the assumption of perfect ranking, let \(X_{(r), j}\) denote the measurement on the jth unit having rank r and let the \(m_{r}\) resulting measurements on units with rank r be labeled as \(\{X_{(r),1},\ldots ,X_{(r),m_{r}}\}\). Therefore, the resulting RSS of size n drawn from the underlying distribution F(x) is given by \(\mathcal {X}_{RSS}=\{X_{(r),j}, r=1,\ldots ,k; j=1,\ldots ,m_{r}\}\). When \(m_{r}=m, r=1,\ldots ,k,\) this reduces to a balanced ranked set sample of size mk.

Let \(\{X_{(r),1},\ldots ,X_{(r),m_{r}}\}\) be a random sample from \(F_{r}\), where \(F_{r}\) denotes the distribution function of the rth order statistic from F(x). Let \(\mu _{r}\) denote the mean of \(F_{r}\). We are then interested in constructing confidence intervals for

$$\begin{aligned} \mu =k^{-1}\sum _{r=1}^{k}\mu _{r} \end{aligned}$$

based on \(\mathcal {X}_{RSS}\). As stated by Dell and Clutter (1972), the above identity holds under both perfect and imperfect rankings. An unbiased estimator of \(\mu \) can be obtained as

$$\begin{aligned} {\bar{X}}_{RSS}=\frac{1}{k}\sum _{r=1}^{k}{\bar{X}}_r=\frac{1}{k}\sum _{r=1}^{k}\frac{1}{m_{r}}\sum _{j=1}^{m_r}X_{(r),j}, \end{aligned}$$

where \({\bar{X}}_r\) is the sample mean based on \(\{X_{(r),1},\ldots ,X_{(r),m_{r}}\}\). Let \(\tau ^2\) be the variance of \({\bar{X}}_{RSS}\) given by

$$\begin{aligned} \tau ^2=\frac{1}{k^2}\sum _{r=1}^{k} \frac{\sigma ^{2}_{r}}{m_{r}}, \end{aligned}$$

where \(\sigma ^{2}_{r}\) is the variance of \(X_{(r),j}\). Set \(S^{2}_{r}=\frac{1}{m_r}\sum _{j=1}^{m_r}(X_{(r),j}-{\bar{X}}_{r})^{2}\), a plug-in estimator for \(\sigma ^{2}_{r}\), so that the corresponding plug-in estimator for \(\tau ^{2}\) becomes

$$\begin{aligned} {\hat{\tau }}^{2}= \frac{1}{k^2}\sum _{r=1}^{k}\frac{S^{2}_{r}}{m_{r}}. \end{aligned}$$
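As a small numerical illustration of the estimators above, the sketch below computes \({\bar{X}}_{RSS}\) and \({\hat{\tau }}^{2}\) for a (possibly unbalanced) ranked set sample stored as one array of measurements per judgment rank; the data layout and function name are our own assumptions.

```python
import numpy as np

def rss_mean_and_var(strata):
    """strata[r] holds the m_r measurements with judgment rank r+1."""
    k = len(strata)
    xbar_r = np.array([s.mean() for s in strata])                     # stratum means
    s2_r = np.array([((s - s.mean()) ** 2).mean() for s in strata])   # plug-in sigma^2_r
    m_r = np.array([len(s) for s in strata])
    xbar_rss = xbar_r.mean()                   # (1/k) * sum of stratum means
    tau2_hat = (s2_r / m_r).sum() / k**2       # plug-in estimator of tau^2
    return xbar_rss, tau2_hat

# toy unbalanced RSS with k = 2 judgment ranks (m_1 = 3, m_2 = 2)
strata = [np.array([0.3, 0.8, 0.5]), np.array([1.2, 1.9])]
print(rss_mean_and_var(strata))
```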

While seeking confidence intervals for \(\mu \), let \(t_{\zeta }\) denote the \(\zeta \)th quantile of the distribution of the pivot

$$\begin{aligned} T_{RSS}=\frac{({\bar{X}}_{RSS}-\mu )}{{\hat{\tau }}}=\frac{\sum _{r=1}^{k}({\bar{X}}_{r}-\mu _{r})}{\sqrt{\sum _{r=1}^{k}\frac{S^{2}_{r}}{m_r}}} \end{aligned}$$

such that \(P(T_{RSS} \le t_{\zeta })=\zeta .\) For the rest of this article, we take \(\alpha \) to be the nominal coverage probability of a confidence interval. Then, \(I_0= [{\bar{X}}_{RSS}- t_{\alpha }{\hat{\tau }}, \infty )\), \(I_1= (-\infty , {\bar{X}}_{RSS}- t_{1-\alpha }{\hat{\tau }} ]\), and \(I_2= [{\bar{X}}_{RSS}-t_{(1+\alpha )/2}{\hat{\tau }} , {\bar{X}}_{RSS}-t_{1-(1+\alpha )/2}{\hat{\tau }}]\) are the ideal lower, upper, and two-sided confidence intervals for \(\mu \), respectively. However, these intervals are unavailable since \(t_{\zeta }\) is unknown. Usually, we estimate them by applying the normal approximation, based on the central limit theorem, to the distribution of \(T_{RSS}\). An alternative to the normal approximation is the bootstrap, which has become a standard tool for constructing such confidence intervals.

Our proposed confidence intervals depend on detailed properties of the Edgeworth expansions of the distributions of \(T_{RSS}\) and \(S_{RSS}=({\bar{X}}_{RSS}-\mu )/\tau \). For this purpose, we assume that the distribution of \((X_{(r)}, X^{2}_{(r)} )\) satisfies Cramér’s continuity condition (see Hall 1992b, pp. 66–67) for each \(r=1,2, \ldots , k\) and that \(E(X^{8}) < \infty \). We also assume that \(m_{1},\ldots , m_{k}\) are of the same order, i.e., \(\lim _{n\rightarrow \infty } (m_{r}/n)=\lambda _{r}\in (0,1)\) for each r. These conditions are sufficient for all the results derived in this paper. We conclude this section by presenting the Edgeworth expansions of the distributions of \(T_{RSS}\) and \(S_{RSS}\).

Theorem 2.1

Under the above conditions, the distributions of \(S_{RSS}\) and \(T_{RSS}\) have the following Edgeworth expansions:

$$\begin{aligned} P(S_{RSS}\le x)=\Phi (x)+n^{-1/2}p_{1}(x)\phi (x)+O(n^{-1}), \end{aligned}$$

and

$$\begin{aligned} P(T_{RSS}\le x)=\Phi (x)+n^{-1/2}q_{1}(x)\phi (x)+O(n^{-1}), \end{aligned}$$

where \(\Phi (x)\) and \(\phi (x)\) are the cdf and pdf of the standard normal distribution, and \(p_{1}(x)\) and \(q_{1}(x)\) are even polynomials of degree 2 with the expressions

$$\begin{aligned} p_{1}(x)= & {} -\frac{1}{6}\eta _{1}^{-3/2} \eta _{2}(x^2-1),\\ q_{1}(x)= & {} \frac{1}{6}\eta _{1}^{-3/2} \eta _{2}(2x^2+1), \end{aligned}$$

with

$$\begin{aligned} \eta _{1}=\sum _{r=1}^{k} \frac{\sigma ^{2}_{r}}{\lambda _{r}}\;,\;\eta _{2}=\sum _{r=1}^{k} \frac{\gamma _{r}}{\lambda ^{2}_{r}}\;\hbox { and}\; \gamma _{r}=E{(X_{(r)}-\mu _{r})^{3}}. \end{aligned}$$

An excellent review of the theory of Edgeworth expansions can be found in Hall (1992a). A simpler procedure to approximate the confidence intervals \(I_0\), \(I_1\) and \(I_2\) is based on the normal approximation to the distribution of \(T_{RSS}\), using the fact that \(P(T_{RSS}\le x){\rightarrow } \Phi (x)\) as \(n{\rightarrow }\infty \). Let \(z_{\zeta }\) be the \(\zeta \)th quantile of the standard normal distribution. Then, \(I_{0,N}= [{\bar{X}}_{RSS}- {\hat{\tau }} z_{\alpha },\infty )\), \(I_{1,N}= (-\infty , {\bar{X}}_{RSS}- {\hat{\tau }} z_{1-\alpha }]\) and \(I_{2, N}= [{\bar{X}}_{RSS}- {\hat{\tau }} z_{(1+\alpha )/2}, {\bar{X}}_{RSS}- {\hat{\tau }} z_{1-(1+\alpha )/2}]\) are the respective lower, upper and two-sided confidence intervals for \(\mu \) based on the normal approximation. For a given \(\alpha \), Theorem 2.1 shows that

$$\begin{aligned} P(\mu \in I_{0,N})= P(\mu \in I_{1,N})=\alpha +O(n^{-1/2})\;\;\hbox {and}\;\; P(\mu \in I_{2,N})= \alpha +O(n^{-1}). \end{aligned}$$

Hence, \(I_{0,N}\) and \(I_{1,N}\) are first-order accurate confidence intervals, while \(I_{2,N}\) is second-order accurate. In the next section, we develop \(BC_{a}\) confidence intervals for \(\mu \) using RSS.
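For concreteness, a minimal sketch of the normal-approximation intervals \(I_{0,N}\), \(I_{1,N}\) and \(I_{2,N}\) is given below, reusing the plug-in estimators of the previous displays; the function name and data layout are assumptions of ours.

```python
import numpy as np
from scipy.stats import norm

def normal_approx_cis(strata, alpha=0.90):
    """Lower, upper and two-sided normal-approximation CIs for the RSS mean."""
    k = len(strata)
    xbar = np.mean([s.mean() for s in strata])
    tau_hat = np.sqrt(sum(((s - s.mean()) ** 2).mean() / len(s) for s in strata)) / k
    z = norm.ppf
    lower = (xbar - tau_hat * z(alpha), np.inf)                  # I_{0,N}
    upper = (-np.inf, xbar - tau_hat * z(1 - alpha))             # I_{1,N}
    two_sided = (xbar - tau_hat * z((1 + alpha) / 2),
                 xbar - tau_hat * z(1 - (1 + alpha) / 2))        # I_{2,N}
    return lower, upper, two_sided
```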

3 Bootstrap for ranked set samples

To facilitate the development of bootstrap confidence interval methods based on RSS, we first give a short description of the existing resampling methods for RSS, namely, BRSSR (bootstrap RSS by row), BRSS (bootstrap RSS), and MRBRSS (mixed row bootstrap RSS). Chen et al. (2004) introduced BRSSR, and Modarres et al. (2006) subsequently studied its properties and showed that BRSSR is asymptotically consistent in estimating the distribution of the standardized sample mean under RSS. Since an RSS can be viewed as k independent random samples from k different distributions, BRSSR can be easily implemented by drawing bootstrap samples independently from each of the k independent random samples.

BRSS and MRBRSS were proposed by Modarres et al. (2006) for balanced RSS under perfect rankings. The BRSS method draws a ranked set sample from the observed RSS to perform the bootstrap. They established that, under balanced RSS with perfect rankings, BRSS consistently estimates the distribution of the standardized sample mean, i.e., the BRSS estimator of this distribution converges to the true distribution almost surely as \(n \rightarrow \infty \). However, BRSS may not be appropriate for unbalanced RSS as it may introduce some bias (Modarres et al. 2006). The other resampling method, MRBRSS, is not appealing for RSS since it does not provide a consistent estimator of the distribution of the standardized sample mean (Modarres et al. 2006). The \(BC_{a}\) confidence intervals constructed here are therefore based on BRSSR.

3.1 Construction of bootstrap confidence intervals

Confidence intervals for \(\mu \) based on RSS can be easily constructed by extending Efron’s (1979) bootstrap percentile method to the case of RSS. This method has some attractive features, such as invariance to monotone transformations, but it suffers from poor coverage probabilities. Efron (1987) proposed a correction to the percentile method that reduces the coverage error while retaining the invariance property. The resulting confidence interval method is known as the bias-corrected and accelerated (\(BC_{a}\)) bootstrap. To facilitate the construction of \(BC_{a}\) confidence intervals based on RSS, let \(\{X^{*}_{(1),1},\ldots ,X^{*}_{(1),m_{1}}\},\ldots , \{X^{*}_{(k),1},\ldots ,X^{*}_{(k),m_{k}}\}\) denote k bootstrap samples drawn independently and randomly with replacement from the k sets \(\{X_{(1),1},\ldots ,X_{(1),m_{1}}\}, \ldots , \{X_{(k),1},\ldots ,X_{(k),m_{k}}\}\), respectively. Then, the resulting bootstrap sample \(\mathcal {X}^{*}_{RSS}=\{X^{*}_{(r),j}, r=1,\ldots ,k; j=1,\ldots ,m_{r}\}\) is known as BRSSR. Let us denote \({\bar{X}}^{*}_{r}=m^{-1}_{r}\sum _{i=1}^{m_r}X^{*}_{(r),i}\) and \(S^{*2}_{r}=m^{-1}_{r}\sum _{i=1}^{m_r}(X^{*}_{(r),i}-{\bar{X}}^{*}_{r})^{2}\). The bootstrap versions of \({\bar{X}}_{RSS}\) and \({\hat{\tau }}^2\) are then

$$\begin{aligned} {\bar{X}}^{*}_{RSS}=k^{-1}\sum _{r=1}^{k}{\bar{X}}^{*}_{r}\;\; \hbox {and}\;\; {\hat{\tau }}^{*2}=k^{-2}\sum _{r=1}^{k}\frac{S^{*2}_{r}}{m_{r}}. \end{aligned}$$

Define

$$\begin{aligned} \hat{u}_{\xi }=\sup \{u:P({\bar{X}}^{*}_{RSS}\le u|\mathcal {X}_{RSS})\le \xi \}. \end{aligned}$$

Then, \(I_{0,BP}= [\hat{u}_{1-\alpha },\infty )\), \(I_{1,BP}= (-\infty , \hat{u}_{\alpha }]\) and \(I_{2, BP}= [\hat{u}_{(1-\alpha )/2}, \hat{u}_{(1+\alpha )/2}]\) are the respective lower, upper and two-sided percentile-method confidence intervals for \(\mu \). It can be shown that

$$\begin{aligned} P\{\mu \in I_{0,BP}\}=P\{\mu \in I_{1,BP}\}=\alpha +O(n^{-1/2})\;\hbox {and}\; P\{\mu \in I_{2,BP}\}=\alpha +O(n^{-1}). \end{aligned}$$
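The BRSSR resampling step and the resulting percentile intervals can be sketched as follows; B, the number of bootstrap replications, and the function names are our own illustrative choices.

```python
import numpy as np

def brssr_resample(rng, strata):
    """Resample with replacement independently within each judgment-rank stratum."""
    return [rng.choice(s, size=len(s), replace=True) for s in strata]

def percentile_intervals(rng, strata, alpha=0.90, B=2000):
    boot_means = np.empty(B)
    for b in range(B):
        star = brssr_resample(rng, strata)
        boot_means[b] = np.mean([s.mean() for s in star])    # bootstrap X-bar*_RSS
    q = lambda p: np.quantile(boot_means, p)
    lower = (q(1 - alpha), np.inf)                           # I_{0,BP}
    upper = (-np.inf, q(alpha))                              # I_{1,BP}
    two_sided = (q((1 - alpha) / 2), q((1 + alpha) / 2))     # I_{2,BP}
    return lower, upper, two_sided
```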

For constructing \(BC_{a}\) confidence intervals for \(\mu \) based on RSS, let us define

$$\begin{aligned} \hat{G}_{RSS} (x)=P({\bar{X}}^{*}_{RSS}\le x|\mathcal {X}_{RSS}), \end{aligned}$$

the bootstrap distribution of \({\bar{X}}^{*}_{RSS}\). Put

$$\begin{aligned} \hat{d}= & {} \Phi ^{-1}\{\hat{G}_{RSS} ({\bar{X}}_{RSS})\}, \end{aligned}$$ (3.1)
$$\begin{aligned} l_{\hat{a}}(\alpha )= & {} \Phi [\hat{d}+(\hat{d}+z_{\alpha })\{1-\hat{a}(\hat{d}+z_{\alpha })\}^{-1}], \end{aligned}$$ (3.2)

where \(\hat{d}\) and \(\hat{a}\) are called the bias-correction and acceleration constants, respectively. We now define Efron’s (1987) \(BC_{a}\) confidence intervals for \(\mu \) based on RSS as follows:

$$\begin{aligned} I_{0,{BC_{a}}}= & {} [\hat{u}_{l_{\hat{a}}(1-\alpha )},\infty ),\\ I_{1,{BC_{a}}}= & {} (-\infty ,\hat{u}_{l_{\hat{a}}(\alpha )}] \end{aligned}$$

and

$$\begin{aligned} I_{2,{BC_{a}}}=[\hat{u}_{l_{\hat{a}}((1-\alpha )/2)}, \hat{u}_{l_{\hat{a}}((1+\alpha )/2)}], \end{aligned}$$

respectively. The bootstrap percentile and bias-corrected methods can be viewed as special cases of \(BC_{a}\), obtained by setting \(\hat{d}=\hat{a}=0\) and \(\hat{a}=0\), respectively. In particular, non-zero values of \(\hat{d}\) and \(\hat{a}\) adjust the bootstrap quantiles used by \(BC_{a}\). In practice, \(\hat{d}\) is computed as

$$\begin{aligned} \hat{d}= \Phi ^{-1}\biggl (\frac{\#\{{\bar{X}}^{*}_{RSS}\le {\bar{X}}_{RSS}\}}{B}\biggr ), \end{aligned}$$

where B is the number of bootstrap samples. The acceleration constant \(\hat{a}\) can be computed using the jackknife method (for details, see Efron 1987; Efron and Tibshirani 1993), which becomes computationally burdensome as the set size in RSS increases. The computational burden of the \(BC_{a}\) method for RSS can be substantially reduced by letting

$$\begin{aligned} \hat{a}=\frac{1}{6}n^{-1/2}\hat{\eta }^{-3/2}_{1}\hat{\eta }_{2}=\frac{1}{6}n^{-1/2}\left( \sum _{r=1}^{k} \frac{\hat{\sigma }^{2}_{r}}{\lambda _{r}}\right) ^{-3/2}\sum _{r=1}^{k} \frac{\hat{\gamma }_{r}}{\lambda ^{2}_{r}}. \end{aligned}$$

The above formula for \(\hat{a}\) is obtained by extending Hall’s (1988) finding to the case of RSS. Then, \(I^{*}_{0,{BC_{a}}}\), \(I^{*}_{1,{BC_{a}}}\) and \(I^{*}_{2,{BC_{a}}}\) are versions of \(I_{0,{BC_{a}}}\), \(I_{1,{BC_{a}}}\) and \(I_{2,{BC_{a}}}\) based on the above expression for \(\hat{a}\). The following result establishes the second-order accuracy of \(I^{*}_{0,{BC_{a}}}\), \(I^{*}_{1,{BC_{a}}}\) and \(I^{*}_{2,{BC_{a}}}\).

Theorem 3.1

Under the assumptions of Theorem 2.1,

$$\begin{aligned} P\{\mu \in I^{*}_{0,{BC_{a}}}\}=P\{\mu \in I^{*}_{1,{BC_{a}}}\}=P\{\mu \in I^{*}_{2,{BC_{a}}}\}=\alpha +O(n^{-1}). \end{aligned}$$

This result shows that the \(BC_{a}\) lower and upper confidence intervals are more accurate than those based on the normal approximation and the bootstrap percentile method, which have a coverage error of order \(O(n^{-1/2})\). However, for two-sided confidence intervals, the normal approximation and bootstrap percentile methods are similar to the \(BC_{a}\) method in terms of coverage probability, as they all result in a coverage error of order \(O(n^{-1})\). It is important to note that the normal approximation confidence intervals are not transformation-respecting.
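A small computational sketch of the two-sided \(BC_{a}\) interval, using the bias correction in (3.1), the adjusted level in (3.2) and the analytic acceleration constant above (with \(\lambda _{r}\) estimated by \(m_{r}/n\)), is given below; the function name and data layout are assumptions of ours.

```python
import numpy as np
from scipy.stats import norm

def bca_two_sided(rng, strata, alpha=0.90, B=2000):
    """Two-sided BC_a interval for the RSS mean using the analytic acceleration constant."""
    n = sum(len(s) for s in strata)
    xbar = np.mean([s.mean() for s in strata])

    # bootstrap distribution of X-bar*_RSS via BRSSR
    boot = np.empty(B)
    for b in range(B):
        star = [rng.choice(s, size=len(s), replace=True) for s in strata]
        boot[b] = np.mean([s.mean() for s in star])

    d_hat = norm.ppf(np.mean(boot <= xbar))          # bias correction, Eq. (3.1)

    # analytic acceleration constant: (1/6) n^{-1/2} eta1^{-3/2} eta2
    lam = np.array([len(s) / n for s in strata])     # lambda_r estimated by m_r / n
    sig2 = np.array([((s - s.mean()) ** 2).mean() for s in strata])
    gam = np.array([((s - s.mean()) ** 3).mean() for s in strata])
    a_hat = (gam / lam**2).sum() / (6.0 * np.sqrt(n) * (sig2 / lam).sum() ** 1.5)

    def adj_level(p):                                # adjusted level, Eq. (3.2)
        z = norm.ppf(p)
        return norm.cdf(d_hat + (d_hat + z) / (1.0 - a_hat * (d_hat + z)))

    return (np.quantile(boot, adj_level((1 - alpha) / 2)),
            np.quantile(boot, adj_level((1 + alpha) / 2)))
```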

4 Confidence intervals based on monotone transformations

Considerable work has been done on obtaining a simple and accurate approximation to the distribution of a statistic or a pivot. Many researchers (Johnson 1978; Hall 1992a; Zhou and Gao 2000; Cojbasic and Loncar 2011) have investigated the effects of a monotone transformation on the distribution of an asymptotic pivot for simple random samples. Such a transformation is useful in reducing the effects of skewness of data on the distribution of an asymptotic pivot. That is, the distribution of the transformed pivot becomes more symmetric than that of the original asymptotic pivot. Johnson (1978) proposed a modified one-sample t test based on a quadratic transformation that is less affected by the population skewness than the conventional t test. However, Hall (1992a) noticed that this transformation has some drawbacks in that it is not monotone and fails to correct adequately for skewness. For this reason, Hall (1992a) proposed a monotone transformation based on a cubic polynomial that has a simple inverse function. In our case, Hall’s (1992a) cubic transformation can be defined as

$$\begin{aligned} g_1(x)= x+n^{-1/2}\frac{1}{3}\hat{\eta }x^2+n^{-1}\frac{1}{27}\hat{\eta }^{2}x^{3}+n^{-1/2}\frac{1}{6}\hat{\eta }, \end{aligned}$$

where \(\hat{\eta }=\hat{\eta }^{-3/2}_{1}\hat{\eta }_{2}\). Under the assumptions of Theorem 2.1, it can be shown that \(P\{g_{1}(T_{RSS})\le x \}=\Phi (x)+O(n^{-1})\); that is, the distribution of the transformed pivot, \(g_{1}(T_{RSS})\), is more symmetric than that of \(T_{RSS}\).

Let \(z_{\zeta }\) be the \(\zeta \)th quantile of the standard normal distribution and define

$$\begin{aligned} I_{0,g_{1}}= & {} [{\bar{X}}_{RSS}- g^{-1}_{1}(z_{\alpha }){\hat{\tau }}, \infty ),\\ I_{1,g_{1}}= & {} (-\infty , {\bar{X}}_{RSS}- g^{-1}_{1}(z_{1-\alpha }){\hat{\tau }}] \end{aligned}$$

and

$$\begin{aligned} I_{2, g_{1}}= [{\bar{X}}_{RSS}-{\hat{\tau }} g^{-1}_{1}(z_{\gamma }),{\bar{X}}_{RSS}- {\hat{\tau }} g^{-1}_{1}(z_{1-\gamma })], \end{aligned}$$

where \(\gamma =\frac{1}{2}(1+\alpha )\). Then, \(I_{0,g_{1}}\), \(I_{1,g_{1}}\) and \(I_{2, g_{1}}\) are the lower, upper and two-sided confidence intervals for \(\mu \), based on the transformation \(g_{1}(.)\) with \(g^{-1}_{1}(x)=n^{1/2}(\frac{1}{3}\hat{\eta })^{-1}[\{1+\hat{\eta }(n^{-1/2}x-\frac{1}{6}n^{-1}\hat{\eta })\}^{1/3}-1]\). Hall (1992a) also proposed an exponential-type transformation, which is monotone and has a simple inverse function; for RSS it is given by

$$\begin{aligned} g_{2}(x)=\left( \frac{2}{3}n^{-1/2}\hat{\eta }\right) ^{-1}\left\{ \exp \left( \frac{2}{3}n^{-1/2}\hat{\eta }x\right) -1\right\} +\frac{1}{6}n^{-1}\hat{\eta }. \end{aligned}$$

Then, in this case, we have

$$\begin{aligned} I_{0,g_{2}}= & {} [{\bar{X}}_{RSS}- {\hat{\tau }} g^{-1}_{2}(z_{\alpha }), \infty ),\\ I_{1,g_{2}}= & {} (-\infty , {\bar{X}}_{RSS}- {\hat{\tau }} g^{-1}_{2}(z_{1-\alpha })] \end{aligned}$$

and

$$\begin{aligned} I_{2, g_{2}}= [{\bar{X}}_{RSS}-{\hat{\tau }} g^{-1}_{2}(z_{\gamma }),{\bar{X}}_{RSS}- {\hat{\tau }} g^{-1}_{2}(z_{1-\gamma })], \end{aligned}$$

as the respective lower, upper and two-sided confidence intervals for \(\mu \), based on the transformation \(g_{2}(.)\), with \(g^{-1}_{2}(x)=(\frac{2}{3}n^{-1/2}\hat{\eta })^{-1}\log \{1+\frac{2}{3}n^{-1/2}\hat{\eta }(x-n^{-1}\frac{1}{6}\hat{\eta })\}.\) The following result shows that the intervals \(I_{0,g_{i}}\), \(I_{1,g_{i}}\) and \(I_{2,g_{i}}\) are second-order accurate, for \(i=1, 2\).

Theorem 4.1

Under the assumptions of Theorem 2.1,

$$\begin{aligned} P\{\mu \in I_{0,g_{i}}\}=P\{\mu \in I_{1,g_{i}}\}= P\{\mu \in I_{2,g_{i}}\}=\alpha +O(n^{-1}),\quad i=1,2. \end{aligned}$$

Theorem 4.1 follows from the fact that \(P\{g_{i}(T_{RSS})\le x\}=\Phi (x)+O(n^{-1})\), \(i=1,2\). The above results show that the coverage error associated with the intervals \(I_{0,g_{i}}\) and \(I_{1,g_{i}}\) is of order \(O(n^{-1})\), and hence these intervals improve on the intervals \(I_{0,N}\) and \(I_{1,N}\) in terms of coverage error. Theorems 3.1 and 4.1 imply that the \(BC_{a}\) and transformation methods are asymptotically equivalent in terms of coverage errors; however, the \(BC_{a}\) method is transformation-respecting, while the transformation methods are easier to apply.
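For illustration, a minimal sketch of the two-sided intervals \(I_{2,g_{1}}\) and \(I_{2,g_{2}}\) is given below, with \(\hat{\eta }\) computed from the plug-in estimates of \(\eta _{1}\) and \(\eta _{2}\) (using \(\lambda _{r}\approx m_{r}/n\)); the function name and data layout are our own assumptions.

```python
import numpy as np
from scipy.stats import norm

def transformation_intervals(strata, alpha=0.90):
    """Two-sided CIs for the RSS mean via the cubic (g1) and exponential (g2) transformations."""
    n = sum(len(s) for s in strata)
    k = len(strata)
    m_r = np.array([len(s) for s in strata])
    xbar = np.mean([s.mean() for s in strata])
    sig2 = np.array([((s - s.mean()) ** 2).mean() for s in strata])
    gam = np.array([((s - s.mean()) ** 3).mean() for s in strata])
    lam = m_r / n
    tau_hat = np.sqrt((sig2 / m_r).sum()) / k
    eta = (gam / lam**2).sum() / (sig2 / lam).sum() ** 1.5    # eta_hat = eta1^{-3/2} * eta2

    def g1_inv(x):   # inverse of the cubic transformation g1
        return (3 * np.sqrt(n) / eta) * (np.cbrt(1 + eta * (x / np.sqrt(n) - eta / (6 * n))) - 1)

    def g2_inv(x):   # inverse of the exponential transformation g2
        c = 2 * eta / (3 * np.sqrt(n))
        return np.log(1 + c * (x - eta / (6 * n))) / c

    z_hi, z_lo = norm.ppf((1 + alpha) / 2), norm.ppf((1 - alpha) / 2)
    ci_g1 = (xbar - tau_hat * g1_inv(z_hi), xbar - tau_hat * g1_inv(z_lo))   # I_{2,g_1}
    ci_g2 = (xbar - tau_hat * g2_inv(z_hi), xbar - tau_hat * g2_inv(z_lo))   # I_{2,g_2}
    return ci_g1, ci_g2
```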

5 Simulation study

Simulation studies were performed to compare the confidence interval methods proposed in Sects. 3 and 4 with three conventional confidence interval methods, namely, the bootstrap percentile method, the bootstrap percentile-t method, and the normal approximation method. Lower, upper and two-sided confidence intervals were constructed based on each of these methods.

To facilitate the discussion of our simulation results, the lower, upper and two-sided confidence intervals are denoted by LCL, UCL and TCL, respectively. The symbols N, BP, BT and \(BC_{a}\) stand for the normal approximation method, the bootstrap percentile method, the bootstrap percentile-t method, and the bias-corrected and accelerated method, respectively. Moreover, \(N_{g_1}\) and \(N_{g_2}\) denote the normal approximation combined with the transformations \(g_1\) and \(g_2\), respectively. We also consider the confidence interval method proposed by Ahn et al. (2014) as a competing procedure; it uses t critical values based on a Welch-type approximation, and we denote it by \(t_{ALW}\). All these confidence interval methods are compared with respect to their coverage probabilities, average lower confidence limits, average upper confidence limits, and average interval widths. We also compare RSS and simple random sampling (SRS) with respect to all these methods.

We now describe the setting of our simulation study in detail. The ranked set samples were generated under several balanced and unbalanced RSS designs with different sample sizes, with set sizes \(k=2\), 3 and 5. Data were generated from three underlying distributions with various degrees of skewness and kurtosis: a chi-square distribution with one degree of freedom (\(\chi ^{2}_{1}\)), the standard exponential distribution (Exp(1)), and the half-normal distribution HN(0, 1). The means of these three distributions are, respectively, 1, 1 and \(\sqrt{2/\pi }\).
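The sketch below gives a toy version of this Monte Carlo design, estimating the coverage of the two-sided normal-approximation interval under a balanced RSS from Exp(1) with perfect ranking; the function name is ours and the number of replications is far smaller than that used in the actual study.

```python
import numpy as np
from scipy.stats import norm

def coverage_normal_ci(rng, k=3, m=5, alpha=0.90, n_sim=1000):
    """Monte Carlo coverage of the two-sided normal-approximation CI,
    balanced RSS from Exp(1) (true mean 1), perfect ranking."""
    z = norm.ppf((1 + alpha) / 2)
    hits = 0
    for _ in range(n_sim):
        # balanced RSS: for each rank r, sort m sets of size k and keep column r
        strata = [np.sort(rng.exponential(1.0, size=(m, k)), axis=1)[:, r] for r in range(k)]
        xbar = np.mean([s.mean() for s in strata])
        tau = np.sqrt(sum(((s - s.mean()) ** 2).mean() / m for s in strata)) / k
        hits += (xbar - z * tau <= 1.0 <= xbar + z * tau)
    return hits / n_sim

print(coverage_normal_ci(np.random.default_rng(1)))
```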

Table 1 Coverage probabilities of 90% confidence intervals for mean of \(\chi ^{2}_{1}\) data
Table 2 Coverage probabilities of 90% confidence intervals for mean of \(\chi ^{2}_{1}\) data
Table 3 Coverage probabilities of 90% confidence intervals for mean of Exp(1) data
Table 4 Coverage probabilities of 90% confidence intervals for mean of Exp(1) data
Table 5 Coverage probabilities of 90% confidence intervals for mean of HN(0, 1)
Table 6 Coverage probabilities of 90% confidence intervals for mean of HN(0, 1) data

For every distribution and sample size combination, we generated 5000 simulated samples. The coverage probability of each method for each combination was then estimated by the proportion of simulated samples in which the interval covered the true mean of the underlying distribution. The bootstrap confidence intervals were constructed using 3000 bootstrap samples. The simulation results are presented in Tables 1, 2, 3, 4, 5 and 6 for \(k=2, 3\). The simulation results for \(k=5\) are provided in the supplement. Based on the results in these tables, we have the following findings:

(i):

The \(BC_{a}\) method gives the most accurate coverage probabilities of LCL for \(\chi ^{2}_{1}\) and Exp(1) when \(n\le 20\). However, for these distributions, the coverage probabilities of LCL based on \(BC_{a}\), \(N_{g_1}\) and \(N_{g_2}\) are comparable for \(n>20\). We also observe that the LCL based on the BP, BT, \(t_{ALW}\) and N methods gives over-coverage in most cases for \(\chi ^{2}_{1}\) and Exp(1). The \(BC_{a}\), \(N_{g_1}\) and \(N_{g_2}\) methods give similar coverage probabilities for the lower confidence interval in most cases when data are generated from HN(0, 1). The coverage probabilities for the lower confidence interval corresponding to all methods, except N, are also similar for HN(0, 1) when \(n\ge 30\). The \(BC_{a}\) method provides the most accurate average lower limits among all methods when \(n\le 20\) for \(\chi ^{2}_{1}\) and Exp(1). However, for \(n > 20\), the average lower limits are comparable for all the methods. The average lower limits of all methods for HN(0, 1) are also comparable, although the \(BC_{a}\) method provides the most accurate average lower limits for \(n=10\). Based on our simulation study, \(BC_{a}\) is the method we would recommend for the construction of lower confidence intervals.

(ii):

For UCL, the BT method gives the best coverage accuracy among all the methods when \(n\le 20\), although even its coverage probability remains noticeably far from the nominal level for \(\chi ^{2}_{1}\) and Exp(1) in this range. For large \(n\) (\(>30\)), the methods BT, \(BC_{a}\), \(t_{ALW}\), \(N_{g_1}\) and \(N_{g_2}\) all give similar coverage probabilities for \(\chi ^{2}_{1}\) and Exp(1). For HN(0, 1), the methods BT, \(BC_{a}\), \(t_{ALW}\), \(N_{g_1}\) and \(N_{g_2}\) all give similar coverage probabilities for UCL when \(n \ge 20\). This may be because HN(0, 1) is less skewed than \(\chi ^{2}_{1}\) and Exp(1). The N and BP methods undercover consistently for all sample sizes. Also, when n is small, the \(t_{ALW}\) method produces better coverage probabilities for UCL than the \(BC_{a}\), \(N_{g_1}\) and \(N_{g_2}\) methods. Overall, the BT method appears to be the best in terms of coverage accuracy for UCL when \(n\le 30\).

(iii):

For two-sided confidence intervals, the BT method performs best in terms of coverage accuracy, albeit with very wide confidence intervals, for small to moderate sample sizes. For large sample sizes \((n=60)\), all methods provide very similar coverage probabilities. However, \(N_{g_1}\) and \(t_{ALW}\) give better coverage probabilities for TCL than the N, BP, \(BC_{a}\) and \(N_{g_2}\) methods for small to moderate sample sizes. In summary, the BT method gives the best coverage for TCL but produces wider confidence intervals for small to moderate sample sizes. For large sample sizes, \(N_{g_1}\), \(N_{g_2}\) and \(t_{ALW}\) all give coverage comparable to BT, in addition to being simple to compute and requiring less computational effort.

(iv):

The simulation results show that the SRS scheme produces better coverage probabilities than RSS for the methods N, BP, BT, \(BC_{a}\), \(N_{g_1}\) and \(N_{g_2}\) when \(n \le 20\). However, both sampling schemes give similar coverages for all methods when \(n\ge 30\). The most important observation is that, for each distribution and sample size combination considered, all confidence interval methods give more accurate average lower and upper limits, and shorter average interval widths, for LCL, UCL and TCL under the RSS scheme than under the SRS scheme.

(v):

The simulation results for set size \(k=5\) (see the supplement) lead to conclusions similar to those for set sizes \(k=2, 3\).

5.1 Imperfect ranking

Throughout this work, we have assumed the RSS scheme under perfect ranking. In practice, however, ranking errors may occur. It is therefore of natural interest to evaluate the performance of all these interval estimation methods under imperfect ranking. Several nonparametric tests have been proposed by Li and Balakrishnan (2008) to test the assumption of perfect ranking in RSS. In order to simulate imperfect RSS samples, we employ the model proposed by Dell and Clutter (1972). Consider the model \(Y_{i}=X_{i}+\epsilon _{i}\), where \(\epsilon _{i}\) represents the error involved in the judgment ranking. This model can be implemented by generating n independent realizations \(X_{i}\) from a given distribution F(x) and n independent normally distributed random errors with zero mean and variance \(\sigma ^2\). We compute the \(Y_{i}\)’s and order them. Let \(Y_{(r)}\) denote the rth order statistic of the \(Y_{i}\)’s; the corresponding X value then represents the rth judgment order statistic, \(X_{[r]}\). The error variance \(\sigma ^2\) controls the degree of judgment error. In particular, the correlation coefficient between Y and X can be expressed as \(\rho =\phi / \sqrt{\phi ^2+\sigma ^2}\), where \(\phi ^2\) is the variance of X. Clearly, the RSS scheme involves perfect ranking when \(\epsilon \) is degenerate at zero. We consider \(\rho =0.5\) and 0.75, with the values of \(\sigma ^2\) selected accordingly. The simulation results under imperfect ranking are presented in Tables 7, 8 and 9. For brevity, the results reported here are based on a set size of \(k=2\) with the balanced design. All the confidence interval methods appear to be robust under imperfect ranking; that is, we reach conclusions similar to those reported under perfect ranking. We also observe that, for a given sample size, these methods produce shorter intervals as the correlation increases.
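A short sketch of this error model is given below: it generates, for a single judgment rank, measurements under ranking based on \(Y=X+\epsilon \), with \(\sigma ^2\) chosen from the target correlation via \(\sigma ^2=\phi ^2(1-\rho ^2)/\rho ^2\); the function names and the exponential example are our own illustrative assumptions.

```python
import numpy as np

def imperfect_rss_stratum(rng, draw, var_x, k, m, r, rho):
    """m judgment order statistics of rank r+1 under the Dell-Clutter error model.

    Ranking is based on Y = X + eps with Var(eps) = sigma^2 chosen so that
    corr(X, Y) = rho, i.e. sigma^2 = var_x * (1 - rho^2) / rho^2.
    """
    sigma = np.sqrt(var_x * (1 - rho**2) / rho**2)
    meas = []
    for _ in range(m):
        x = draw(rng, k)                            # one set of k units
        y = x + rng.normal(0.0, sigma, size=k)      # noisy ranking variable
        meas.append(x[np.argsort(y)][r])            # X whose Y has rank r+1
    return np.array(meas)

rng = np.random.default_rng(2)
draw = lambda rng, size: rng.exponential(1.0, size)   # Exp(1): Var(X) = 1
print(imperfect_rss_stratum(rng, draw, var_x=1.0, k=2, m=10, r=0, rho=0.75))
```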

Table 7 Coverage probabilities of 90% confidence intervals for mean of \(\chi ^{2}_{1}\) data under imperfect ranking
Table 8 Coverage probabilities of 90% confidence intervals for mean of Exp(1) data under imperfect ranking
Table 9 Coverage probabilities of 90% confidence intervals for mean of HN(0, 1) data under imperfect ranking

6 Illustrative example

In this section, we use data from a study involving 46 shrubs. The dataset was first reported in Muttlak and McDonald (1990). First, three transect lines were laid out across the area and all shrubs intersecting each transect were sampled. The size of each shrub was then measured. This technique is well suited to sampling a very large area relatively quickly. These data were further used by Ghosh and Tiwari (2004) to construct an RSS. The original sample was broken into 15 groups, each containing 3 shrubs (leaving one shrub out). The 3 shrubs in the first group were ranked by size, and the shortest one was included in the sample. This was repeated for the first 5 groups, resulting in 5 replicates. For the next 5 groups, the shrub with the second smallest size in each group was included in the sample. Finally, from each of the remaining 5 groups, the largest shrub was chosen. This process resulted in a balanced RSS with set size 3 and cycle size 5. The data so obtained are presented in Ghosh and Tiwari (2004). Figure 1 presents the density plot of shrub sizes, which shows that the sample has a bimodal, skewed distribution. The resulting confidence intervals for the mean shrub size are presented in Table 10.

It can be seen that the BP method gives the largest lower limit for the 90% lower confidence interval for the mean shrub size. The lower limit from the \(BC_{a}\) method is close to that of the BP method. The N method is similar to the \(N_{g_1}\) and \(N_{g_2}\) methods, and their lower limits are smaller than those of the BP and \(BC_{a}\) methods. The BT method gives the smallest lower limit. Based on our simulation study, the \(BC_{a}\) method should be chosen for the lower confidence interval for the mean shrub size.

For the 90% upper confidence limit, the BP method produces the smallest upper limit, which is close to that of the \(BC_{a}\) method. The \(N_{g_1}\) and \(N_{g_2}\) methods give the same upper limit for the mean shrub size; their upper limits are larger than those of the BP and \(BC_{a}\) methods but close to the upper limit based on the N method. The BT method gives the largest upper limit, which is very similar to that of the \(t_{ALW}\) method. The results of the simulation study suggest that the BT method often has the best coverage accuracy for upper confidence intervals. Therefore, the BT method should be chosen for the upper confidence interval for the mean shrub size.

For the 90% two-sided confidence interval, the \(BC_{a}\) and BP methods have the shortest interval widths among all methods, while the BT method produces the widest 90% two-sided confidence interval for the mean shrub size. Our simulation study shows that the BT method has the best coverage probabilities for small to moderate sample sizes, but produces the widest two-sided confidence intervals, with \(t_{ALW}\) giving the next best coverage for two-sided confidence intervals. Hence, the \(t_{ALW}\) or BT method should be chosen for the two-sided confidence interval for the mean shrub size.

Fig. 1 Estimated density function for shrub sizes

Table 10 Summary of 90% LCL, UCL and TCL for mean shrub size

7 Discussion and concluding remarks

We have developed the bias-corrected and accelerated method along with two transformation methods, \(N_{g_1}\) and \(N_{g_2}\), for constructing confidence intervals for the population mean based on ranked set samples. We have studied the asymptotic properties of these methods and have shown that they are second-order accurate. These methods are asymptotically equivalent to the bootstrap percentile-t method in terms of coverage errors. From the simulation studies carried out, it is evident that, for a right-skewed distribution, the bias-corrected and accelerated method gives the best coverage probability for lower confidence intervals, whereas the bootstrap percentile-t method yields the smallest average lower limits among all methods. On the other hand, when the population distribution is right skewed, the bootstrap percentile-t method gives the best finite-sample coverage for the upper and two-sided intervals. This behavior is substantiated by the empirical results, which show the largest average upper limits and the widest two-sided confidence intervals for the bootstrap percentile-t method. The performance of all methods improves as the sample size increases, as one would expect. Hence, for large sample sizes, the bias-corrected and accelerated, bootstrap percentile-t, \(t_{ALW}\), \(N_{g_1}\) and \(N_{g_2}\) methods all give similar coverage probabilities for all three intervals. However, \(t_{ALW}\), \(N_{g_1}\) and \(N_{g_2}\) require less computing in terms of bootstrap resampling. Although our proposed confidence interval methods are developed under the assumption of perfect ranking, our simulation studies show that they are robust in the presence of judgment error. In this research, we did not consider the iterated bootstrap method. Even though this method is known to be effective in reducing coverage errors for SRS, it becomes computationally demanding as the set size in the RSS scheme increases. A reduction in the computational burden of the iterated bootstrap method in RSS may be possible by applying some analytical approximations to the nominal level. Work on this problem is currently in progress, and we hope to report these findings in a future paper.