1 Introduction

The judgment post-stratification (JPS) sampling scheme was introduced by MacEachern et al. (2004) as an alternative to simple random sampling (SRS) for situations where the variable of interest is expensive or difficult to measure, but judgment ranking is cheap and relatively easy. Such situations arise in many areas of applied research, including educational studies (Wang et al. 2016), sport sciences (He et al. 2018; Qian et al. 2019) and medical studies (Mahdizadeh and Zamanzade 2019; Zamanzade and Mahdizadeh 2020). A JPS sample is constructed from a simple random sample and its supplementary judgment ranks. The JPS sampling procedure is briefly described as follows.

To draw a JPS sample of size n using set size m from an infinite population, an SRS sample of size n is collected and measured. Then, for each measured unit in the simple random sample, \(m-1\) supplemental units are identified from the same population to create a set of size m. Finally, the judgment rank of the measured unit among the \(m-1\) supplemental units in its set is noted. Thus, a JPS sample consists of an SRS sample plus its accompanying judgment ranks and can be represented as \(\left( Y_1,R_1\right) , \ldots , \left( Y_n,R_n\right) \), where \(R_i\) is the judgment rank of \(Y_i\) among the \(m-1\) supplemental units in the set of size m. Throughout this paper, the term judgment rank is used to emphasize that the rank \(R_i\) is obtained using an inexpensive method, such as visual inspection, a covariate or an expert’s personal judgment, which does not require the exact values of the supplemental units in the set; hence the rank may be inaccurate and contain error. If the rank is obtained without error (perfect ranking), then the conditional distribution of \(Y_i\) given its rank \(R_i=r\) is the same as that of the rth order statistic of a sample of size m. Otherwise, the conditional distribution follows the rth judgment order statistic of a sample of size m. A ranking process is called consistent if

$$\begin{aligned} F\left( t\right) =\frac{1}{m}\sum _{r=1}^m F_{[r]}\left( t\right) , \end{aligned}$$

where \(F\left( t\right) \) and \(F_{[r]}\left( t\right) \) are cumulative distribution functions (CDFs) of the parent distribution and rth judgment order statistic of a sample of size m, evaluated at the point t, respectively (see Presnell and Bohn 1999).

Let \(n_r\) be the number of JPS units with judgment rank r. Then one can simply show that, under a consistent ranking process, the vector \(\mathbf {n}=\left( n_1,\ldots ,n_m\right) \) follows a multinomial distribution with n trials and probability vector \(\left( \frac{1}{m}, \ldots , \frac{1}{m}\right) \).
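This is easy to check by simulation. A minimal sketch (our own illustration, not from the paper), assuming perfect rankings of uniform variates so that each judgment rank is equally likely:

```python
import random

def jps_ranks(n, m, rng):
    """Ranks of n independently measured units, each ranked (perfectly)
    within its own set containing m - 1 supplemental units."""
    ranks = []
    for _ in range(n):
        y = rng.random()                              # measured unit
        supp = [rng.random() for _ in range(m - 1)]   # supplemental units
        ranks.append(1 + sum(s < y for s in supp))    # rank within the set
    return ranks

rng = random.Random(1)
m, n, reps = 3, 9, 2000
counts = [0] * m
for _ in range(reps):
    for r in jps_ranks(n, m, rng):
        counts[r - 1] += 1
props = [c / (n * reps) for c in counts]
# under a consistent ranking process each rank has probability 1/m
assert all(abs(p - 1 / m) < 0.02 for p in props)
```

Each \(n_r\) is then the number of rank-r labels among the n draws, which is exactly the multinomial count described above.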

Conditioning on the vector of ranks \(\mathbf {R}=\left( R_1,\ldots , R_n\right) \), a JPS sample can be regarded as an unbalanced ranked set sample (RSS). To draw an RSS sample, one first determines a set size m and a vector of post-strata sample sizes \(\mathbf {n}=\left( n_1, \ldots , n_m\right) \), so that \(n=\sum _{r=1}^m n_r\) is the total sample size. For \(r=1,\ldots ,m\), \(n_r\) sets of size m are identified from the population of interest, their units are ranked in increasing order of magnitude, and the unit with rank r in each set is selected for actual measurement.

Although JPS and RSS are in close connection with each other, two obvious differences between them can be found from their definitions. The first one is related to how judgment ranks are connected to the sample units. Note that in JPS (RSS) setting, the ranks are assigned to each sample unit after (before) its measurement, and therefore they are loosely (strongly) attached to the sample units and can (cannot) be ignored. This means that a JPS sample has more variability than its RSS counterpart, yet more flexibility to be used in practice since the standard SRS techniques can still be used for a JPS sample when the ranks are ignored. The second difference between JPS and RSS is related to the vector of post strata sample sizes \(\mathbf {n}=\left( n_1,\ldots ,n_m\right) \). While \(\mathbf {n}\) is a fixed and pre-specified vector in RSS, it is a random vector in JPS and follows a multinomial distribution.

The JPS has been the subject of many studies since its introduction, including estimation of the population mean (Wang et al. 2008; Frey and Feeman 2012; Dastbaravarde et al. 2016; Frey 2016), estimation of the population variance (Frey and Feeman 2013; Zamanzade and Vock 2015; Zamanzade 2016), estimation of the cumulative distribution function (CDF; Frey and Ozturk 2011; Wang et al. 2012; Duembgen and Zamanzade 2020), estimation of the population proportion (Zamanzade and Wang 2017), estimation of the population quantile (Ozturk 2014), two-sample problems (Ozturk 2015; Dastbaravarde and Zamanzade 2020), finite mixture model analysis (Omidvar et al. 2018), finite sample size corrections (Ozturk 2016) and perfect ranking test (Zamanzade and Vock 2018).

Most of the above literature shows that the JPS design provides more efficient statistical inference than an SRS design of comparable size, provided that the sample size is not too small and the quality of ranking is fairly good (Dastbaravarde et al. 2016). Most of the JPS literature focuses on inference about population characteristics based on the normal approximation (NA) method. However, since JPS is designed for situations in which obtaining exact values of sample units is hard or expensive, a sample large enough for valid asymptotic inference may not be available due to time or cost considerations. Hence, an alternative method is needed for drawing statistical inference about population characteristics from a JPS sample.

The bootstrap approach, proposed by Efron (1979), is a computer-based method that repeats a simple resampling operation many times. It makes few distributional assumptions and provides solutions to many standard statistical problems. In particular, the bootstrap can be used to estimate a CDF and to construct confidence intervals for population characteristics without making any distributional assumptions.

This paper describes two bootstrap approaches for a JPS sample, one of which was already used by Ozturk (2016) without a study of its consistency, while the other is new. In Sect. 2, two bootstrap methods based on the JPS sampling scheme are introduced, and their asymptotic consistency is established in Sect. 3. In Sect. 4, the bootstrap techniques are used to construct confidence intervals for the population mean based on a JPS sample. The bootstrap confidence intervals developed in this section are then compared with the confidence interval based on the NA using Monte Carlo simulation. In Sect. 5, a real dataset is used to show the applicability and efficiency of the introduced methods in practice. Some concluding remarks and directions for future research are provided in Sect. 6.

2 Bootstrap Methods for a Judgment Post Stratified Sample

Let \(\left( Y_1,R_1\right) , \ldots , \left( Y_n,R_n\right) \), be a JPS sample of size n from a population with CDF F, where \(R_i\) is the judgment rank of \(Y_i\) among \(m-1\) supplemental units in the set of size m. Let \(n_r\) be the number of JPS sample units with judgment rank r; then it is easy to show that, under a consistent ranking process, the vector \(\left( n_1,\ldots ,n_m\right) \) follows a multinomial distribution with n trials and probability vector \(\left( \frac{1}{m},\ldots ,\frac{1}{m}\right) \). As mentioned earlier, the conditional distribution of \(Y_i\) given its judgment rank \(R_i=r\) is given by \(F_{[r]}\left( t\right) =P\left( Y_i\le t|R_i=r\right) =P\left( Y_{[r]}\le t\right) \), where \(Y_{[r]}\) is the rth judgment order statistic of a sample of size m. Here, the square brackets [.] are used to indicate imperfect rankings. If the rankings are perfect, then the square brackets are replaced with round ones and \(F_{(r)}\left( t\right) =B\left( F\left( t\right) ,r,m+1-r\right) \), where \(B\left( t,\alpha ,\beta \right) \) is the CDF of the beta distribution with parameters \(\alpha \) and \(\beta \), evaluated at the point t. The empirical estimator of \(F_{[r]}\left( t\right) \) based on a JPS sample is given by

$$\begin{aligned} F_{[r],n_r}\left( t\right) =\left( \frac{1}{n_r}\sum _{i=1}^{n} \mathbb {I}\left( Y_i \le t\right) \mathbb {I}\left( R_i=r\right) \right) \mathbb {I}\left( n_r>0\right) , \end{aligned}$$
(1)

where \(\mathbb {I}\left( .\right) \) is the indicator function. The empirical estimator of CDF based on a JPS sample is then given by

$$\begin{aligned} F_{n,jps}\left( t\right) =\frac{1}{d_m}\sum _{r=1}^m F_{[r],n_r}\left( t\right) , \end{aligned}$$
(2)

where \(d_m={\sum _{r=1}^m \mathbb {I}\left( n_r>0\right) }\).
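For concreteness, Eqs. (1) and (2) can be coded directly from their definitions; the function name and the toy sample below are our own illustration:

```python
def jps_cdf(sample, m, t):
    """JPS empirical CDF at point t (Eqs. (1)-(2)).
    sample: list of (y, r) pairs with judgment ranks r in 1..m."""
    total, d_m = 0.0, 0
    for r in range(1, m + 1):
        stratum = [y for (y, rk) in sample if rk == r]
        if stratum:                       # indicator I(n_r > 0)
            d_m += 1                      # count non-empty post-strata
            total += sum(y <= t for y in stratum) / len(stratum)
    return total / d_m if d_m else 0.0

toy = [(0.2, 1), (0.9, 3), (0.5, 2), (0.7, 2), (0.1, 1)]
print(jps_cdf(toy, m=3, t=0.6))  # → 0.5
```

The averaging is over the \(d_m\) non-empty post-strata only, which is exactly how Eq. (2) handles ranks that happen not to occur in the sample.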

The standard mean estimator in the JPS setting can be defined in a similar fashion as follows

$$\begin{aligned} \hat{\mu }_{jps}=\frac{1}{d_m} \sum _{r=1}^m \bar{Y}_{[r]}, \end{aligned}$$
(3)

where

$$\begin{aligned} \bar{Y}_{[r]}=\left( \frac{1}{n_r}\sum _{i=1}^{n} Y_i \mathbb {I}\left( R_i=r\right) \right) \mathbb {I}\left( n_r>0\right) , \end{aligned}$$
(4)

is the mean of the sample units with judgment rank r. Dastbaravarde et al. (2016) examined the finite sample and asymptotic properties of \(\hat{\mu }_{jps}\). They showed that this estimator is unbiased regardless of the quality of ranking. Dastbaravarde et al. (2016) also proved that, as the sample size n goes to infinity, \(\sqrt{n}\left( \hat{\mu }_{jps}-\mu \right) \) converges in distribution to a mean zero normal distribution with variance \(\sigma ^2_{jps}=\frac{1}{m}\sum _{r=1}^m \sigma ^2_{[r]}\), where \(\sigma ^2_{[r]}\) is the variance of the rth judgment order statistic in a set of size m. One can simply show that \(\sigma ^2_{jps}\le \sigma ^2\), and therefore \(\hat{\mu }_{jps}\) is asymptotically at least as efficient as the standard mean estimator in SRS.
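A direct sketch of Eqs. (3)-(4), again with a hypothetical toy sample:

```python
def jps_mean(sample, m):
    """Standard JPS mean estimator (Eqs. (3)-(4)): average the within-rank
    means over the d_m non-empty post-strata."""
    stratum_means = []
    for r in range(1, m + 1):
        ys = [y for (y, rk) in sample if rk == r]
        if ys:                            # indicator I(n_r > 0)
            stratum_means.append(sum(ys) / len(ys))
    return sum(stratum_means) / len(stratum_means)

toy = [(0.2, 1), (0.9, 3), (0.5, 2), (0.7, 2), (0.1, 1)]
print(round(jps_mean(toy, m=3), 6))  # → 0.55
```

Here the within-rank means are 0.15, 0.6 and 0.9, so the estimator returns their unweighted average, as in Eq. (3).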

In what follows, we develop two algorithms to draw a bootstrap sample from a JPS sample. The first algorithm is called simple bootstrap JPS (SBJPS) which is based on drawing simple random samples of JPS units with replacement from the original JPS sample. In the second algorithm, the JPS sample units are partitioned into different strata based on their judgment ranks and then a simple random sample with replacement is drawn from each stratum. The second algorithm is called bootstrap JPS by stratum (BJPSS). These two algorithms are delineated as follows.

2.1 SBJPS: Simple Bootstrap JPS

The simple bootstrap JPS algorithm is based on the fact that the pairs \(\left( Y_i,R_i\right) \), \(i=1,\ldots ,n\), in the JPS sample are independent and identically distributed. Thus, the original bootstrap algorithm can be applied to the pairs \(\left( Y_i,R_i\right) \). This technique was first used by Ozturk (2016); here, we describe it in detail and establish its asymptotic properties.

Algorithm 1: SBJPS

  1. Assign probability \(\frac{1}{n}\) to each pair of \(\left( Y_1,R_1\right) ,\ldots ,\left( Y_n,R_n\right) \).

  2. Randomly draw n pairs with replacement from \(\left( Y_1,R_1\right) ,\ldots ,\left( Y_n,R_n\right) \) to obtain \(\left( Y_1^*,R_1^*\right) ,\ldots ,\left( Y_n^*,R_n^*\right) \).

  3. Define the bootstrap empirical distribution function as

     $$\begin{aligned} F_{n,jps}^*\left( t\right) =\frac{1}{d_m^*}\sum _{r=1}^m F_{[r],n_r}^*\left( t\right) , \end{aligned}$$

     where

     $$\begin{aligned} F_{[r],n_r}^*\left( t\right) =\left( \frac{1}{n_r^*}\sum _{i=1}^{n} \mathbb {I}\left( Y_i^* \le t\right) \mathbb {I}\left( R_i^*=r\right) \right) \mathbb {I}\left( n_r^*>0\right) , \end{aligned}$$

     \(d_m^*={\sum _{r=1}^m\mathbb {I}\left( n_r^*>0\right) }\) and \(n_r^*\) is the number of bootstrap sample units with judgment rank r (for \(r=1,\ldots ,m\)).

  4. The bootstrap estimate of the parameter \(\theta =g\left( F\right) \), for an arbitrary function \(g\left( .\right) \), is then obtained as \(\hat{\theta }^*=g\left( F_{n,jps}^*\right) \).

  5. Repeat steps 1–4 B times to obtain the bootstrap sample \(\left( \hat{\theta }^*_1, \ldots , \hat{\theta }^*_B \right) \).
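The steps above can be sketched as follows for the mean estimator; the function names are ours, and jps_mean implements Eq. (3):

```python
import random

def jps_mean(sample, m):
    """JPS mean estimator: average of within-rank means over non-empty strata."""
    means = []
    for r in range(1, m + 1):
        ys = [y for (y, rk) in sample if rk == r]
        if ys:
            means.append(sum(ys) / len(ys))
    return sum(means) / len(means)

def sbjps(sample, m, B, rng):
    """Simple bootstrap JPS: resample whole (Y, R) pairs with replacement."""
    n = len(sample)
    reps = []
    for _ in range(B):                                       # step 5
        star = [sample[rng.randrange(n)] for _ in range(n)]  # steps 1-2
        reps.append(jps_mean(star, m))                       # steps 3-4
    return reps

rng = random.Random(7)
toy = [(0.2, 1), (0.9, 3), (0.5, 2), (0.7, 2), (0.1, 1)]
boots = sbjps(toy, m=3, B=500, rng=rng)
```

Note that because whole pairs are resampled, the stratum counts \(n_r^*\) vary from one bootstrap replicate to the next, mirroring the randomness of \(\mathbf {n}\) in JPS.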

2.2 BJPSS: Bootstrap JPS by Stratum

This bootstrap algorithm is based on the idea that artificial post-strata can be constructed from the judgment ranks in a JPS sample. Thus, independent bootstrap samples can first be drawn from the different post-strata and then combined to obtain the final bootstrap sample. Let \(\left( Y_1,R_1\right) , \ldots , \left( Y_n,R_n\right) \), be a JPS sample of size n from a population with CDF F. Let \(\mathcal {Y}_r=\left\{ \left( Y_1,R_1\right) , \ldots , \left( Y_n,R_n\right) ; Y_i|R_i=r; i=1,\ldots ,n \right\} \) be the sample units with judgment rank r (for \(r=1, \ldots , m\)). Then, for a fixed r, the sample units in \(\mathcal {Y}_r\) are independent and identically distributed random variables. Thus, the following bootstrap algorithm is proposed.

Algorithm 2: BJPSS

  1. If \(n_r>0\), assign probability \(\frac{1}{n_r}\) to each element of \(\mathcal {Y}_r\) and randomly draw \(n_r\) pairs \((Y_i^*,R_i^*)\) with replacement from \(\mathcal {Y}_r\) to construct \(\mathcal {Y}_r^*\).

  2. Repeat step 1 for \(r=1,\ldots ,m\).

  3. The bootstrap sample is then obtained as \(\cup _{r=1}^m \mathcal {Y}_r^*\).

  4. The bootstrap empirical distribution function and the estimate of the parameter \(\theta =g\left( F\right) \) are then defined as in Algorithm 1.

  5. Repeat steps 1–4 B times to obtain the bootstrap sample \(\left( \hat{\theta }^*_1, \ldots , \hat{\theta }^*_B \right) \).

Finally, bootstrap statistical inference can be made based on the bootstrap sample \(\left( \hat{\theta }^*_1, \ldots , \hat{\theta }^*_B \right) \) obtained from either of the algorithms defined above. For example, the bootstrap variance estimate of \(\hat{\theta }\) is obtained from \(\widehat{Var}_B\left( \hat{\theta }\right) =\frac{1}{B-1}\sum _{b=1}^B \left( \hat{\theta }^*_b - \bar{\hat{\theta }}^*\right) ^2\), where \( \bar{\hat{\theta }}^*=\frac{1}{B}\sum _{b=1}^B \hat{\theta }^*_b\).
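Algorithm 2, together with the bootstrap variance estimate above, can be sketched as follows (function names and toy data are illustrative only):

```python
import random

def jps_mean(sample, m):
    """JPS mean estimator: average of within-rank means over non-empty strata."""
    means = []
    for r in range(1, m + 1):
        ys = [y for (y, rk) in sample if rk == r]
        if ys:
            means.append(sum(ys) / len(ys))
    return sum(means) / len(means)

def bjpss(sample, m, B, rng):
    """Bootstrap JPS by stratum: resample n_r pairs within each rank stratum."""
    strata = [[p for p in sample if p[1] == r] for r in range(1, m + 1)]
    reps = []
    for _ in range(B):                                      # step 5
        star = []
        for units in strata:
            if units:                                       # steps 1-2
                star += [units[rng.randrange(len(units))]
                         for _ in range(len(units))]
        reps.append(jps_mean(star, m))                      # steps 3-4
    return reps

rng = random.Random(7)
toy = [(0.2, 1), (0.9, 3), (0.5, 2), (0.7, 2), (0.1, 1)]
boots = bjpss(toy, m=3, B=500, rng=rng)
mean_b = sum(boots) / len(boots)
var_b = sum((b - mean_b) ** 2 for b in boots) / (len(boots) - 1)  # Var_B
```

Unlike pair resampling, this scheme keeps every \(n_r\) fixed at its observed value across bootstrap replicates.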

3 Asymptotic Results

In this section, we establish the consistency of the bootstrap algorithms described in Sect. 2 using the Mallows metric, which was first used by Bickel and Freedman (1981) in the context of the bootstrap technique. The definition of the Mallows metric is given below.

Definition 1

(Mallows metric) Let \(\Gamma _2\) be the set of CDFs having finite second moments. Let X and Y be random variables with CDFs \( G , H \in {\Gamma _2}\), respectively, and define \(\rho _2(G,H)=\inf _{\tau _{X,Y}}\mathbb {E}^{1/2} (\vert X-Y \vert ^2)\), where \(\tau _{X,Y}\) is the set of all possible joint distributions of the pair \((X,Y)\) whose marginal CDFs are G and H, respectively.
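For empirical distributions of two samples of equal size, the infimum in this definition is attained by pairing the sorted values (the comonotone coupling), a standard fact that makes \(\rho _2\) easy to compute; the sketch below relies on it:

```python
def mallows2(xs, ys):
    """rho_2 distance between the empirical CDFs of two equal-size samples:
    the optimal coupling pairs the sorted values."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    n = len(xs)
    return (sum((a - b) ** 2 for a, b in zip(xs, ys)) / n) ** 0.5

print(mallows2([0.0, 1.0], [0.0, 1.0]))  # → 0.0
print(mallows2([0.0, 0.0], [1.0, 1.0]))  # → 1.0
```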

Definition 2

(Concept of consistency for bootstrap estimators) Let \(T_n =T(X_1,X_2,\ldots ,X_n)\) be a statistic based on a random sample \(X_1,\ldots ,X_n\) and \(T_n^*\) be its corresponding bootstrap replicate. Then the bootstrap procedure is strongly consistent under \(\rho _2\) for T if \(\rho _2(H_n,H_n^*)\xrightarrow []{\text {a.s.}} 0\), where \(H_n\) is the sampling distribution of \(T_n\) and \(H_n^*\) is the sampling distribution of \(T_n^*\) and \(\rho _2\) is a metric on the space of CDFs.

In order to obtain asymptotic results for our bootstrap algorithms, the following lemmas are recalled from Bickel and Freedman (1981).

Lemma 1

Let \(G_n,G\in {\Gamma _2}\). Then \(\rho _2(G_n,G)\xrightarrow []{\text {a.s.}} 0\) as \(n\rightarrow \infty \) is equivalent to

$$\begin{aligned} G_n\rightarrow G \quad weakly \ and \quad \int t^2dG_n(t)\xrightarrow []{\text {a.s.}} \int t^2dG(t). \end{aligned}$$

Lemma 2

Suppose the \(U_j\) are independent, likewise for \(V_j\). Assume that their corresponding CDFs are in \(\Gamma _2\) and \(\mathbb {E}(U_j)=\mathbb {E}(V_j)\). Then

$$\begin{aligned} \rho _2\left( \sum _j U_j , \sum _j V_j\right) ^2 \le \sum _j \rho _2(U_j , V_j)^2. \end{aligned}$$

Lemma 3

Let the CDFs of U and V be in \(\Gamma _2.\) Then

$$\begin{aligned} \rho _2(U,V)^2 =\rho _2[U-\mathbb {E}(U) , V-\mathbb {E}(V)]^2+\vert \mathbb {E}(U)-\mathbb {E}(V)\vert ^2. \end{aligned}$$

The first theorem in this section shows that the empirical estimator of \(F_{[r]}\left( t\right) \) based on a JPS sample is consistent.

Theorem 3.1

Let \(\left( Y_1,R_1\right) , \ldots , \left( Y_n,R_n\right) \), be a JPS sample of size n from a population with CDF \(F \in {\Gamma _2}\) which is obtained under a consistent ranking process. Let \(F_{[r],n_r}\left( t\right) \) be the empirical estimator of \(F_{[r]}\left( t\right) \) as given in Eq. (1). Then \(\rho _2(F_{[r],n_r},F_{[r]})\xrightarrow []{\text {a.s.}} 0 \) as the sample size n goes to infinity.

Proof

Let \(\mathcal {Y}_r=\left\{ \left( Y_1,R_1\right) , \ldots , \left( Y_n,R_n\right) ; Y_i|R_i=r; i=1,\ldots ,n \right\} \) be the sample units with judgment rank r (for \(r=1, \ldots , m\)). Then, for a fixed value of r and under the consistent ranking assumption, the sample units in \(\mathcal {Y}_r\) are independent and identically distributed with CDF \(F_{[r]}\). A straightforward application of the Glivenko–Cantelli theorem indicates that

$$\begin{aligned} \sup _{t}\big |F_{[r],n_r}(t)-F_{[r]}(t)\big |\xrightarrow []{\text {a.s.}} 0 \quad as \ n_r\rightarrow \infty \ ; \ \forall r. \end{aligned}$$

If \(F \in {\Gamma _2}\), then \(F_{[r]} \in {\Gamma _2}\). Thus, it follows from the strong law of large numbers (SLLN) that:

$$\begin{aligned} \int t^2dF_{[r],n_r}(t)=\frac{1}{n_r}\sum _{j=1}^{n_r}Y_{[r]j}^2\xrightarrow []{\text {a.s.}} \int t^2dF_{[r]}(t). \end{aligned}$$

Therefore by Lemma 1, \(\rho _2\left( F_{[r],n_r},F_{[r]}\right) \xrightarrow []{\text {a.s.}} 0 \) as \( n_r\rightarrow \infty \) for \(r=1,\ldots ,m\). \(\square \)

Theorem 3.2

Let \(\left( Y_1,R_1\right) , \ldots ,\left( Y_n,R_n\right) ,\) be a JPS sample of size n from a population with CDF \(F \in {\Gamma _2}\) which is obtained under a consistent ranking process. Let \(F_{n,jps}\left( t\right) \) be the empirical estimator of \(F\left( t\right) \) based on the JPS sample as given in Eq. (2). Then \(\rho _2(F_{n,jps},F)\xrightarrow []{\text {a.s.}} 0 \) as \( n\rightarrow \infty \).

Proof

Note that under a consistent ranking process assumption, we can write

$$\begin{aligned} F\left( t\right) =\frac{1}{m}\sum _{r=1}^m F_{[r]}\left( t\right) . \end{aligned}$$

Therefore, we have

$$\begin{aligned} \sup _{t}|F_n(t)-F(t)|&=\sup _{t}\left|\frac{1}{d_m}\sum _{r=1}^m F_{[r],n_r}\left( t\right) -\frac{1}{m}\sum _{r=1}^m F_{[r]}(t)\right|\\&\le \sum _{r=1}^m \sup _{t} \left|\frac{1}{d_m} F_{[r],n_r}(t)-\frac{1}{m}F_{[r]}(t)\right|. \end{aligned}$$

Besides

$$\begin{aligned}&\sup _{t}\big |\frac{1}{d_m} F_{[r],n_r}(t)-\frac{1}{m}F_{[r]}(t)\big |\\&\quad \le \big |\frac{1}{d_m}-\frac{1}{m}\big |F_{[r],n_r}(t) + \frac{1}{m}\sup _{t}\big |F_{[r],n_r}(t)-F_{[r]}(t)\big |=o(1). \end{aligned}$$

Therefore \(\sup _{t}\big |F_n(t)-F(t)\big |= o(1) \). If \( F\in \Gamma _2\) then \(F_{[r]} \in \Gamma _2\). Thus, by SLLN we have:

$$\begin{aligned} \int t^2 dF_n(t) \xrightarrow []{\text {a.s.}}\sum _{r=1}^m \frac{1}{m} \int t^2 dF_{[r]}(t) = \int t^2 d\left( \frac{1}{m}\sum _{r=1}^m F_{[r]}(t)\right) =\int t^2 dF(t). \end{aligned}$$

Then by Lemma 1, \(\rho _2\left( F_n,F\right) \xrightarrow []{\text {a.s.}} 0 \) as \( n\rightarrow \infty \). \(\square \)

Theorem 3.3

Let \(\left( Y_1^*,R_1^*\right) , \ldots , \left( Y_n^*,R_n^*\right) ,\) be a SBJPS sample of size n based on a JPS sample of the same size from a population with CDF \(F \in {\Gamma _2}\) which is obtained under a consistent ranking process. Then \(\rho _2(F_{[r],n_r^*}^*,F_{[r],n_r})\xrightarrow []{\text {a.s.}} 0 \) as \( n_r \rightarrow \infty \) for \(r=1,\ldots ,m,\) where \(F_{[r],n_r^*}^*\) is the bootstrap analogue of \(F_{[r],n_r}\), computed from the SBJPS sample.

Proof

Let \(\mathcal {Y}_r^*=\left\{ \left( Y_1^*,R_1^*\right) , \ldots , \left( Y_n^*,R_n^*\right) ; Y_i^*|R_i^*=r; i=1,\ldots ,n \right\} \) be the bootstrap units with judgment rank r (for \(r=1, \ldots , m\)). Note that, conditionally on the original JPS sample, the units in \(\mathcal {Y}_r^*\) are independent and identically distributed with CDF \(F_{[r],n_r}\). Thus, it follows from the Glivenko–Cantelli theorem that

$$\begin{aligned} \sup _{t}\big |F_{[r],n_r^*}^*(t)-F_{[r],n_r}(t)\big |\xrightarrow []{\text {a.s.}} 0 \quad as \ n_r\rightarrow \infty ;\quad \forall r. \end{aligned}$$

If \(F \in {\Gamma _2}\), then \(F_{[r]} \in {\Gamma _2}\). By SLLN, we have

$$\begin{aligned} \int t^2dF_{[r],n_r^*}^*(t)\xrightarrow []{\text {a.s.}}\int t^2dF_{[r],n_r}(t). \end{aligned}$$

Therefore by Lemma 1, \(\rho _2\left( F_{[r],n_r^*}^*,F_{[r],n_r}\right) \xrightarrow []{\text {a.s.}} 0\) as \(n_r \rightarrow \infty \) for \(r=1,2,\ldots ,m\). \(\square \)

Theorem 3.4

Let \(\left( Y_1^*,R_1^*\right) , \ldots , \left( Y_n^*,R_n^*\right) ,\) be a SBJPS sample of size n based on a JPS sample of the same size from a population with CDF \(F \in {\Gamma _2}\) which is obtained under a consistent ranking process. Then \(\rho _2(F_{n,jps}^*,F_{n,jps})\xrightarrow []{\text {a.s.}} 0 \) as \( n\rightarrow \infty ,\) where \(F_{n,jps}^*\) is the empirical estimator of F based on the SBJPS sample.

Proof

$$\begin{aligned} \sup _{t}\big |F_{n,jps}^*(t)-F_{n,jps}(t)\big |&=\sup _{t}\big |\sum _{r=1}^m \frac{1}{d_m^*} F_{[r],n_r^*}^*(t)-\frac{1}{m}\sum _{r=1}^m F_{[r],n_r}(t)\big |\\&\le \sum _{r=1}^m \sup _{t} \big |\frac{1}{d_m^*} F_{[r],n_r^*}^*(t)-\frac{1}{m}F_{[r],n_r}(t)\big |=o(1). \end{aligned}$$

If \( F\in \Gamma _2\) then \(F_{[r]} \in \Gamma _2\). Thus, by SLLN we have

$$\begin{aligned} \int t^2 dF_n^*(t) \xrightarrow []{\text {a.s.}} \sum _{r=1}^m \frac{1}{m} \int t^2 dF_{[r],n_r}(t) =\int t^2 dF_n(t) \end{aligned}$$

and this completes the proof. \(\square \)

Now, we are ready to present the main results. We present them only for the SBJPS algorithm; the results for BJPSS can be obtained in a similar fashion.

Theorem 3.5

Let \(\left( Y_1^*,R_1^*\right) , \ldots , \left( Y_n^*,R_n^*\right) ,\) be a SBJPS sample of size n based on a JPS sample of the same size from a population with CDF \(F \in {\Gamma _2}\) which is obtained under a consistent ranking process. Then \(\rho _2(F_{n,jps}^*,F)\xrightarrow []{\text {a.s.}} 0 \) as \( n\rightarrow \infty ,\) where \(F_{n,jps}^*\) is the empirical estimator of F based on the SBJPS sample.

Proof

$$\begin{aligned} \sup _{t}\big |F_{n,jps}^*(t)-F(t)\big |\le \sup _{t}\big |F_{n,jps}^*(t)-F_{n,jps}(t)\big |+ \sup _{t}\big |F_{n,jps}(t)-F(t)\big |=o(1). \end{aligned}$$

If \( F\in \Gamma _2\) then \(F_{[r]} \in \Gamma _2\). So, SLLN indicates that

$$\begin{aligned}&\int t^2 dF_{n,jps}^*(t)\xrightarrow []{\text {a.s.}} \sum _{r=1}^m \frac{1}{m} \int t^2 dF_{[r]}(t) \\&\quad =\int t^2 d\left( \frac{1}{m}\sum _{r=1}^m F_{[r]}(t)\right) =\int t^2 dF(t), \end{aligned}$$

and this completes the proof. \(\square \)

Theorem 3.6

Let \(\left( Y_1^*,R_1^*\right) , \ldots , \left( Y_n^*,R_n^*\right) ,\) be a SBJPS sample of size n based on a JPS sample of the same size from a population with CDF \(F \in {\Gamma _2}\) which is obtained under a consistent ranking process. Define \(T_n=\sqrt{n}(\hat{\mu }_{jps}-\mu )\) and \(T_n^*=\sqrt{n}(\hat{\mu }_{jps}^*-\hat{\mu }_{jps}),\) then \(\rho _2(H_n^* , H_n)\xrightarrow []{\text {a.s.}} 0 \) as \(n\rightarrow \infty \) where \(H_n\) and \(H_n^*\) are the sampling distributions of \(T_n\) and \(T_n^*,\) respectively.

Proof

$$\begin{aligned} \rho _2\left( H_n^* , H_n\right)&=\rho _2\left( \sqrt{n}\left( \hat{\mu }_{jps}^*-\hat{\mu }_{jps}\right) ,\sqrt{n}\left( \hat{\mu }_{jps}-\mu \right) \right) \\&=\rho _2\left( \sqrt{n}\sum _{r=1}^{m}\left( \frac{1}{d_m^*}\bar{Y}_{[r]}^*- \frac{1}{d_m} \bar{Y}_{[r]}\right) ,\sqrt{n}\sum _{r=1}^{m}\left( \frac{1}{d_m} \bar{Y}_{[r]}-\frac{1}{m}\mu _{[r]}\right) \right) \\&=\rho _2\left( \sum _{r=1}^{m}\sqrt{\frac{n}{n_r}}\sqrt{n_r}\left( \frac{1}{d_m^*}\bar{Y}_{[r]}^*- \frac{1}{d_m} \bar{Y}_{[r]}\right) ,\right. \\&\quad \left. \sum _{r=1}^{m}\sqrt{\frac{n}{n_r}}\sqrt{n_r}\left( \frac{1}{d_m} \bar{Y}_{[r]}-\frac{1}{m}\mu _{[r]}\right) \right) \\&\quad \xrightarrow []{\text {a.s.}} \rho _2\left( \sum _{r=1}^{m}\sqrt{\frac{n_r}{m}}\left( \bar{Y}_{[r]}^*- \bar{Y}_{[r]}\right) ,\sum _{r=1}^{m}\sqrt{\frac{n_r}{m}}\left( \bar{Y}_{[r]}-\mu _{[r]}\right) \right) \\&\le \frac{1}{\sqrt{m}}\sqrt{\sum _{r=1}^{m}\rho _2\left( \sqrt{n_r}\left( \bar{Y}_{[r]}^*-\bar{Y}_{[r]}\right) ,\sqrt{n_r}\left( \bar{Y}_{[r]}-\mu _{[r]}\right) \right) ^2} \\&\le \frac{1}{\sqrt{m}}\sqrt{m}\sqrt{\sup _r\Big [\rho _2\left( \sqrt{n_r}\left( \bar{Y}_{[r]}^* -\bar{Y}_{[r]}\right) ,\sqrt{n_r}\left( \bar{Y}_{[r]}-\mu _{[r]}\right) \right) ^2\Big ]}=o(1). \end{aligned}$$

Because

$$\begin{aligned}&\rho _2\left( \sqrt{n_r}\left( \bar{Y}_{[r]}^*-\bar{Y}_{[r]}\right) ,\sqrt{n_r}\left( \bar{Y}_{[r]}-\mu _{[r]}\right) \right) ^2\\&\quad =\frac{1}{n_r}\rho _2\left( \sum _{j=1}^{n_r}\left( Y_{[r]j}^*-\bar{Y}_{[r]}\right) ,\sum _{j=1}^{n_r}\left( Y_{[r]j}-\mu _{[r]}\right) \right) ^2\\&\quad \le \frac{1}{n_r}\sum _{j=1}^{n_r}\rho _2\left( Y_{[r]j}^*-\bar{Y}_{[r]},Y_{[r]j}-\mu _{[r]}\right) ^2\\&\quad =\rho _2\left( Y_{[r]1}^*-\bar{Y}_{[r]},Y_{[r]1}-\mu _{[r]}\right) ^2\\&\quad =\rho _2\left( Y_{[r]1}^*,Y_{[r]1}\right) ^2-\big |\bar{Y}_{[r]}-\mu _{[r]}\big |^2\\&\quad =\rho _2\left( F_{[r],n_r^*}^*,F_{[r],n_r}\right) ^2-\big |\bar{Y}_{[r]}-\mu _{[r]}\big |^2\\&\quad =o(1). \end{aligned}$$

\(\square \)

In the next section, we will use the above bootstrap methods for constructing confidence intervals of the population mean.

4 Monte Carlo Simulation

In this section, we compare the performance of bootstrap expanded percentile confidence intervals based on the standard mean estimator in JPS with the confidence interval based on the NA technique. In the NA method, we use the fact that \(\hat{\mu }_{jps}\) is an unbiased estimator of the population mean and that \(\sqrt{n}\left( \hat{\mu }_{jps}-\mu \right) \) converges in distribution to a mean zero normal distribution with variance \(\sigma ^2_{jps}\) as the sample size n tends to infinity. An estimator \(\hat{\sigma }^2_{jps}\) such that \(\hat{\sigma }^2_{jps}/n\) is unbiased for \(\mathbb {V}(\hat{\mu }_{jps})\) is given by Ozturk (2016, Eq. 2.5 in Theorem 2). Based on this estimator, the \(100\left( 1-\alpha \right) \%\) asymptotic confidence interval for the population mean due to Ozturk (2016) is given by

$$\begin{aligned} \left( \hat{\mu }_{jps}-t_{1-\alpha /2,n-1}\sqrt{\frac{\hat{\sigma }^2_{jps}}{n}},\hat{\mu }_{jps}+t_{1-\alpha /2,n-1}\sqrt{\frac{\hat{\sigma }^2_{jps}}{n}} \right) , \end{aligned}$$

where \(t_{p,v}\) is the pth quantile of the t-distribution with v degrees of freedom.

There are several methods for constructing a confidence interval using the bootstrap approach. Here, we use the bootstrap expanded percentile confidence interval, since it applies adjustments similar to those of the NA confidence interval described above to the bootstrap setting, and it often performs better than the plain percentile confidence interval (see, for example, Hesterberg 2015).

The \(100\left( 1-\alpha \right) \%\) expanded percentile confidence interval for the population mean using the bootstrap approach can be constructed as follows. First, let \({\alpha }^\prime /2=\Phi \left( \sqrt{n/\left( n-1\right) }t_{\alpha /2,n-1} \right) \), where \(\Phi \left( .\right) \) is the CDF of the standard normal distribution. Then, draw B bootstrap samples of size n from the original JPS sample using each of the bootstrap methods introduced in Sect. 2. Next, compute the estimate of the population mean from each bootstrap sample to obtain \(\left( \hat{\mu }_{jps,1}^*,\ldots ,\hat{\mu }_{jps,B}^*\right) \). Finally, the \(100\left( 1-\alpha \right) \%\) bootstrap expanded percentile confidence interval is obtained as \(\left( \hat{\mu }_{jps}^{*,\alpha ^\prime /2}, \hat{\mu }_{jps}^{*,1-\alpha ^\prime /2}\right) \), where \(\hat{\mu }_{jps}^{*,p}\) is the pth quantile of the bootstrap sample \(\left( \hat{\mu }_{jps,1}^*,\ldots ,\hat{\mu }_{jps,B}^*\right) \).
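This construction can be sketched as follows; the t-quantile is passed in by the caller (e.g. from scipy.stats.t.ppf), and the replicate values below are placeholders rather than output of either bootstrap algorithm:

```python
import random
from statistics import NormalDist

def quantile(xs, p):
    """Linear-interpolation empirical quantile of a list."""
    s = sorted(xs)
    k = p * (len(s) - 1)
    f = int(k)
    c = min(f + 1, len(s) - 1)
    return s[f] + (k - f) * (s[c] - s[f])

def expanded_percentile_ci(boots, n, t_lower):
    """Expanded percentile CI; t_lower is t_{alpha/2, n-1}, the (negative)
    lower-tail t-quantile supplied by the user."""
    a2 = NormalDist().cdf((n / (n - 1)) ** 0.5 * t_lower)  # alpha'/2
    return quantile(boots, a2), quantile(boots, 1 - a2)

rng = random.Random(3)
boots = [rng.gauss(0.55, 0.1) for _ in range(2000)]  # placeholder replicates
lo, hi = expanded_percentile_ci(boots, n=20, t_lower=-2.093)  # t_{0.025,19}
assert lo < 0.55 < hi
```

Because \(\alpha ^\prime /2 < \alpha /2\), the expanded interval pushes the percentile cut-points slightly further into the tails, compensating for the narrowness of the plain percentile interval in small samples.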

To generate a JPS sample, we assume that the ranking in each set of size m is done using the perceptual linear ranking model of Dell and Clutter (1972). In this model, it is assumed that, in each set of size m, the actual rank of the concomitant variable Y is assigned as the judgment rank of the variable of interest X, where the following relation between X and Y holds

$$\begin{aligned} Y=\lambda \left( \frac{X-\mu _x}{\sigma _x} \right) +\sqrt{1-\lambda ^2}Z, \end{aligned}$$

where \(\mu _x\) and \(\sigma _x\) are the mean and standard deviation of X, respectively, Z is independent of X and follows a standard normal distribution, and the parameter \(\lambda \) is the correlation coefficient between X and Y, which controls the quality of ranking.
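A JPS sample under this model can be generated directly; a sketch for a standard normal X (so \(\mu _x=0\) and \(\sigma _x=1\); \(\lambda =1\) gives perfect ranking):

```python
import random

def jps_sample_dell_clutter(n, m, lam, rng):
    """Draw a JPS sample (X_i, R_i) of size n with set size m: each N(0,1)
    unit of interest is ranked within its set via the concomitant
    Y = lam * X + sqrt(1 - lam^2) * Z of the Dell-Clutter model."""
    sample = []
    for _ in range(n):
        xs = [rng.gauss(0.0, 1.0) for _ in range(m)]            # the set
        ys = [lam * x + (1.0 - lam ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
              for x in xs]                                      # concomitants
        rank = 1 + sum(ys[j] < ys[0] for j in range(1, m))      # judgment rank
        sample.append((xs[0], rank))                            # measure unit 0
    return sample

rng = random.Random(11)
jps = jps_sample_dell_clutter(n=30, m=4, lam=0.9, rng=rng)
```

Only the first unit of each set is measured; its judgment rank is the rank of its concomitant within the set, so ranking errors arise whenever \(\lambda < 1\).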

We set \(n \in \left\{ 10,20,30,50, 100 \right\} \), \(m \in \left\{ 3, 4, 5\right\} \) and \(\lambda \in \left\{ 0.5, 0.7, 0.9, 1\right\} \), and for each combination of \(\left( n,m, \lambda \right) \) we generated 10,000 JPS random samples from three symmetric distributions, i.e. the standard normal distribution (\(N\left( 0,1\right) \)), the standard uniform distribution (\(U\left( 0,1\right) \)) and the beta distribution with parameters 0.5 and 0.5 (\(B\left( 0.5,0.5\right) \)), and three asymmetric distributions, i.e. the standard log-normal distribution (\(LN\left( 0,1\right) \)), the Gamma distribution with scale parameter 1 and shape parameter 0.5 (\(G\left( 0.5\right) \)) and the Weibull distribution with scale parameter 1 and shape parameter 0.5 (\(W\left( 0.5\right) \)). We then estimated the coverage probability (CP) of the bootstrap and NA \(95\%\) confidence intervals from the simulated samples, with bootstrap size \(B=1000\). For brevity, we only present the results for the standard normal and standard log-normal distributions, in Figs. 1 and 2, respectively. The results for the other distributions can be found in Figs. S1, S2, S3 and S4 in the Supplementary Material.

Fig. 1

Estimated coverage probability (CP) of confidence intervals based on the NA (two-dash blue line), SBJPS (solid red line) and BJPSS (dotted black line) methods as a function of the sample size n for \(m \in \left\{ 3, 4, 5\right\} \) and \(\lambda \in \left\{ 0.5, 0.7, 0.9, 1\right\} \) when the parent distribution is \(N\left( 0,1\right) \). (Color figure online)

Fig. 2

Estimated coverage probability (CP) of confidence intervals based on the NA (two-dash blue line), SBJPS (solid red line) and BJPSS (dotted black line) methods as a function of the sample size n for \(m \in \left\{ 3, 4, 5\right\} \) and \(\lambda \in \left\{ 0.5, 0.7, 0.9, 1\right\} \) when the parent distribution is \(LN\left( 0,1\right) \). (Color figure online)

Figure 1 presents the simulation results when the parent distribution is standard normal. We observe that the NA confidence interval provides the CP nearest its nominal level of \(95\%\) in all considered cases, which is not surprising since the NA method should be the best technique when the parent distribution is truly normal. The CP of the NA confidence interval does not change much with the sample size n. The confidence interval based on the SBJPS method has CP slightly below (above) its nominal level for \(n=10\) (\(n\ge 20\)). The CP of the SBJPS confidence interval increases as the sample size goes from \(n=10\) to 20 and levels out for \(n\ge 20\). The confidence interval based on the BJPSS method has the lowest CP for \(n\le 30\), and the difference between its CP and the nominal level of \(95\%\) is most pronounced for \(n=10\). This can be explained by the nature of the BJPSS algorithm: BJPSS forces the number of units with each judgment rank to be fixed, and since the number of units with a particular rank is actually random in JPS, BJPSS underestimates the variability of the JPS mean estimate. The CP of the BJPSS confidence interval increases with the sample size n for \(n\le 50\) and remains almost unchanged for \(n>50\). It is also easy to see that the set size m and the quality of ranking \(\lambda \) do not have much effect on the CPs of the different confidence intervals.

Comparing Fig. 1 with Figs. S1 and S2 in the Supplementary Material, we observe that the patterns of the CPs of the different confidence intervals for the \(U\left( 0,1\right) \) and \(B\left( 0.5,0.5\right) \) distributions remain almost the same as for the standard normal distribution.

Simulation results for the standard log-normal distribution are given in Fig. 2. It is clear from this figure that the CPs of all considered confidence intervals are lower than the nominal level, yet they converge to \(95\%\) as the sample size goes from \(n=10\) to 100. The confidence interval based on the SBJPS method has the closest CP to the nominal level of \(95\%\) in all considered cases except for \(n=10\), for which the NA confidence interval usually performs slightly better. The BJPSS confidence interval has the poorest performance for \(n\le 30\), and the difference between its CP and the nominal level becomes sizeable for \(n=10\); for \(n\ge 50\), however, its CP usually falls between the CPs of the confidence intervals based on the SBJPS and NA approaches. Similar to what we observed for the standard normal distribution in Fig. 1, it is evident from Fig. 2 that the set size m and the quality of ranking \(\lambda \) do not have much effect on the CPs of the confidence intervals for the standard log-normal distribution either.

Comparing Fig. 2 with Figs. S3 and S4 in the Supplementary Material, we find that the patterns of the CPs of the different confidence intervals for the \(LN\left( 0,1\right) \), \(G\left( 0.5\right) \) and \(W\left( 0.5\right) \) distributions are almost the same.

5 A Real Data Example

In this section, a real dataset is used to illustrate the potential application of the procedures proposed in this paper for constructing a confidence interval for the population mean. The dataset is also used to evaluate the performance of the different confidence intervals in the JPS setting.

Bone mineral density (BMD) is the amount of bone mineral in bone tissues, and it is frequently used in medicine as an indicator for detecting osteoporosis. The BMD measurement is usually made over the lumbar spine and over the upper part of the hip using dual-energy X-ray absorptiometry (DEXA) technology, and a person is considered to be suffering from osteoporosis if his/her BMD measurement using DEXA technology is no larger than 0.56. Note that obtaining an exact measurement of BMD using DEXA technology is costly, and the technology may not be easily accessible in some developing countries. It is also inconvenient to use because it needs a medical expert to manually segment images. However, a medical expert can easily assign judgment ranks to sample units in a set of small size in terms of the probability of suffering from osteoporosis. This can be done using the medical expert's personal experience or by checking whether the patient has some risk factors of osteoporosis such as family history, cigarette smoking, and excessive alcohol and caffeine consumption. Therefore, JPS seems to be a better alternative to SRS for this application.

The dataset used in this section is obtained from the Third National Health and Nutrition Examination Survey (NHANES III), and is available online at http://www.cdc.gov/nchs/nhanes/nh3data.htm. We consider the BMD of people who suffer from osteoporosis in the NHANES III dataset as our hypothetical population (denoted hereafter as the BMD dataset). The histogram of the BMD dataset is presented in Fig. 3. Suppose that we are interested in constructing a confidence interval for the mean BMD in this population, so BMD is our variable of interest. Using \(n=20\) and \(m=3\), we draw a JPS sample from this population. To do so, we first draw a SRS sample of size 20 from the BMD dataset and measure all of its units. Then, for each measured unit, we draw 2 additional units to create a set of size \(m=3\). Sampling is done with replacement, so the independence assumption is guaranteed. The rank of each measured unit within its set of size \(m=3\) is determined using the linear ranking model described in Sect. 4 with \(\lambda =0.7\). The sample units with their ranks are presented in Table 1, and the corresponding confidence intervals for the population mean are given in Table 2.
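The sampling procedure just described can be sketched as follows. This is an illustrative implementation under one common formulation of imperfect ranking, in which each unit in a set is judged by a noisy perceived value correlated with the standardized response through \(\lambda \); the exact form of the paper's linear ranking model may differ, so treat the `perceived` line as an assumption.

```python
import numpy as np

def draw_jps_sample(population, n, m, lam, rng):
    """Draw a JPS sample of size n with set size m (with replacement).
    Ranking model (assumed): each unit in a set is judged by the
    perceived value lam * z + sqrt(1 - lam**2) * e, where z is the
    standardized response and e is standard normal noise, so lam = 1
    gives perfect ranking and smaller lam degrades ranking quality."""
    pop = np.asarray(population, dtype=float)
    mu, sd = pop.mean(), pop.std()
    y = rng.choice(pop, size=n, replace=True)   # the measured SRS of size n
    ranks = np.empty(n, dtype=int)
    for i in range(n):
        extras = rng.choice(pop, size=m - 1, replace=True)  # m-1 supplements
        set_vals = np.concatenate(([y[i]], extras))
        z = (set_vals - mu) / sd
        perceived = lam * z + np.sqrt(1 - lam**2) * rng.standard_normal(m)
        # judgment rank of the measured unit within its set of size m
        ranks[i] = 1 + np.sum(perceived < perceived[0])
    return y, ranks
```

For example, `draw_jps_sample(bmd_values, 20, 3, 0.7, rng)` (with `bmd_values` a hypothetical array of BMD measurements) would produce a sample like the one in Table 1: 20 measured responses, each carrying a judgment rank in \(\{1, 2, 3\}\).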

Fig. 3

The histogram of BMD dataset

Table 1 A JPS sample of size \(n=20\) using set size \(m=3\) from BMD dataset
Table 2 \(95\%\) JPS confidence intervals for the mean of the BMD using the data in Table 1

We next use the BMD dataset to compare the performance of the different confidence intervals. In doing so, we set \(n\in \left\{ 10, 20, 30, 50\right\} \) and \(m\in \left\{ 3,4,5 \right\} \), and for each combination of \(\left( n,m\right) \) we draw 10,000 JPS samples from the BMD dataset, where all sampling is done with replacement. The ranking is done using the linear ranking model described in Sect. 4 with \(\lambda \in \left\{ 1, 0.9, 0.7, 0.5 \right\} \). Finally, we estimate the CPs of the bootstrap expanded percentile and NA confidence intervals for the population mean in the JPS setting using the 10,000 samples. The bootstrap size is taken as \(B=1000\). The results are presented in Table 3.

Table 3 Estimated coverage probability (CP) of different confidence intervals of the population mean in the JPS setting at nominal level \(95\%\) using BMD dataset when the ranking is done using linear ranking model with \(\lambda \in \left\{ 1, 0.9, 0.7, 0.5 \right\} \)

The results in Table 3 are consistent with what we observed in Sect. 4. The confidence interval based on the SBJPS method often provides the nearest CP to the nominal level of \(95\%\), which can be justified by the fact that the distribution of the BMD dataset is skewed (see Fig. 3).

6 Conclusion

Judgment post stratification (JPS) is a useful sampling scheme in applications requiring cost efficiency. JPS adds judgment ranking information to a simple random sample to improve estimation of the population parameters. The judgment ranks are obtained through the subjective judgment of an expert, a concomitant variable, or a combination of the two, and need not be accurate.

Since JPS is a cost-efficient sampling method applicable in settings where obtaining exact values of sample units is much harder than ranking them within a set of small size, it is very common that a researcher cannot obtain a JPS sample large enough to use the asymptotic distribution of the estimators. Therefore, an alternative method for drawing statistical inference from a JPS sample is essential.

In this paper, we described two bootstrap methods based on a JPS sample: simple bootstrap JPS (SBJPS) and bootstrap JPS by stratum (BJPSS). SBJPS has already been used in the literature without a study of its consistency, and BJPSS is our proposal. We then showed that both bootstrap procedures are consistent. Finally, as an application of the bootstrap methods, we discussed the construction of bootstrap expanded percentile confidence intervals for the population mean using the empirical mean estimator in JPS, and we compared their coverage probabilities with that of the confidence interval obtained via normal approximation (NA) in a Monte Carlo simulation study covering a variety of sample sizes, set sizes and parent distributions. Based on the simulation study, we found that either the NA or the SBJPS technique can be the best method in terms of coverage probability (CP) in most of the considered cases. We therefore recommend the SBJPS technique for constructing a bootstrap confidence interval for the population mean when the sample size is small and the normality assumption is suspect.

Although this work studies two resampling methods for a JPS sample, ample space remains for future research in this field. For example, several mean estimators improving on the empirical mean estimator have been proposed in the literature by Wang et al. (2008), Frey and Feeman (2012) and Frey (2016), and exact expressions for the variances of some of these estimators are not analytically available. Thus, one can use the bootstrap methods described in this paper to construct confidence intervals for the population mean using the improved mean estimators, and it is intuitively expected that those confidence intervals would outperform the ones based on the empirical mean estimator. It is also known that the bias-corrected and accelerated and bootstrap-t methods often perform competitively compared with the bootstrap expanded percentile technique (Hall 1988). Therefore, it is of interest to discuss the construction of those confidence intervals based on a JPS sample. One can also consider constructing confidence intervals for population attributes other than the population mean.

Finally, we would like to mention that for a JPS sample of small size, it is very common to observe empty strata (i.e. \(n_r=0\) for some \(r \in \left\{ 1, \ldots , m\right\} \)). Statistical inference based on a JPS sample with at least one empty stratum may differ from that based on a JPS sample with no empty strata. For example, Zamanzade and Wang (2017) showed that when there is at least one empty stratum, the maximum likelihood estimator of the population proportion can be much more efficient than other estimators in certain circumstances. Thus, another interesting topic for future research is to study the effect of empty strata on the performance of bootstrap confidence intervals in JPS.