9.1 Introduction

Minimizing the cost associated with collecting the sample data is a critical consideration in most statistical analyses. As a result, it is important to develop statistical approaches to sampling that minimize the sample sizes necessary to achieve desired properties, whether it be precision of estimators, length of confidence intervals, or power of statistical tests. One technique that has been shown to be useful in this regard is ranked set sampling (RSS), first introduced by McIntyre (1952, reprinted in 2005) in the context of sampling from pasture and crop plots. This sampling approach uses readily available auxiliary information on individual units in a population to aid in the selection of more representative units for measurement than are typically generated by simple random sampling (SRS). Development of statistical procedures using this RSS approach remains an active area of research. [See, for example, the recent survey article by Wolfe (2012).] A second approach to data collection designed to reduce the sample size in a treatment versus control two-sample setting is the partially sequential (PS) paradigm introduced by Wolfe (1977a,b). This approach uses a negative binomial sampling framework to minimize the number of treatment observations necessary for reaching satisfactory statistical conclusions regarding the treatment’s efficacy.

In this paper we review the basic tenets of both the RSS and PS methodologies and discuss how to combine these approaches to develop partially sequential ranked set sample (PSRSS) two-sample test procedures. In Sect. 9.2 we present the PS two-sample framework and review previous work in this area. We describe the basic RSS approach in Sect. 9.3 and discuss a number of options available within this structure. We propose a class of melded PSRSS two-sample test procedures in Sect. 9.4 and develop their basic small sample and asymptotic properties as the control sample size becomes large. Section 9.5 is devoted to a general discussion of the opportunities presented by this new methodology as well as extensions for future research.

9.2 Partially Sequential Two-Sample Procedures

The partially sequential approach to data collection in the two-sample setting was first introduced by Wolfe (1977a). It is particularly appropriate for data collection settings such as the following:

  1.

    A sample from the first population (e.g., control) has already been collected and we do not wish to collect any more observations from the second population (e.g., new treatment) than are necessary for reaching a decision.

  2.

    Neither sample has been collected, but one of the samples (say the ‘standard’ procedure observations) is relatively easy and inexpensive to collect, while the other sample observations (corresponding to the ‘new treatment’) are costly and/or difficult to collect. In such situations our goal would be to collect a sample (usually large) of standard observations and then collect only as many of the difficult-to-obtain new treatment observations as are necessary to reach statistically valid conclusions about potential differences between the two populations.

We first describe a general PS procedure to test for differences between two distributions. Let X 1, ⋯ , X m be a random sample from a continuous probability distribution with p.d.f. f(x) and c.d.f. F(x), where m is a fixed positive integer, and let G(y) be a second continuous distribution function with associated p.d.f. g(y). Let (x 1, ⋯ , x m ) be an arbitrary m-tuple of real numbers and let A(x 1, ⋯ , x m ) be a subset of the real line R depending on the m-tuple (x 1, ⋯ , x m ). For example, A(⋅ ) could be the portion of R below the minimum x-value or the portion of R above the maximum x-value. Define the indicator function Ψ(⋅ ) by

$$\displaystyle\begin{array}{rcl} \varPsi (y) = \left \{\begin{array}{@{}l@{\quad }l@{}} 1,\quad &\hspace{-6.0pt}\mbox{ if }\ y \in A(x_{1},\cdots \,,x_{m}), \\ 0,\quad &\hspace{-6.0pt}\mbox{ if }\ y\not\in A(x_{1},\cdots \,,x_{m}). \end{array} \right.& &{}\end{array}$$
(9.1)

Now let Y be a random variable (independent of X 1, ⋯ , X m ) from the second distribution G(y). Applying Ψ(y) to these random variables X 1, ⋯ , X m and Y, we obtain the following

$$\displaystyle\begin{array}{rcl} \varPsi (Y ) = \left \{\begin{array}{@{}l@{\quad }l@{}} 1,\quad &\hspace{-6.0pt}\mbox{ if }\ Y \in A(X_{1},\cdots \,,X_{m}), \\ 0,\quad &\hspace{-6.0pt}\mbox{ if }\ Y \not\in A(X_{1},\cdots \,,X_{m}). \end{array} \right.& &{}\end{array}$$
(9.2)

Thus, Ψ(Y ) is the random indicator variable for the random set A(X 1, ⋯ , X m ).

With (9.2) in mind we sequentially sample mutually independent Ys from the distribution G(y) until a preset number, say r, of these Ys are in the set A(x 1, ⋯ , x m ), where (x 1, ⋯ , x m ) is the observed value of the previously collected random vector (X 1, ⋯ , X m ). Define the statistic N m (having random contributions from both the X and Y samples) by

$$\displaystyle{ N_{m} =\{ \text{number of }Y\text{ observations required to get }r\ Y\text{'s in }A(x_{1},\cdots \,,x_{m})\}. }$$
(9.3)

Wolfe (1977a) discussed how to use N m to test the null hypothesis H 0: F(x) ≡ G(x) against appropriate alternatives [depending on the nature of the set A(x 1, ⋯ , x m )]. The decision rule he proposed is to reject H 0 when N m  ≤ N 0(α, r, m, A), where N 0(α, r, m, A) is the lower αth percentile point for the null (H 0) distribution of N m . Note that with this approach we will never need to collect more than N 0(α, r, m, A) Y observations. In fact, we would typically stop even sooner, with a smaller Y sample size, since we (1) reject H 0 as soon as we obtain r Y observations in A(x 1, ⋯ , x m ) or (2) fail to reject H 0 as soon as we obtain \(\{N_{0}(\alpha,r,m,A) - r + 1\}\) Y observations not in A(x 1, ⋯ , x m ).
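This stopping rule can be sketched in a few lines of code. The Python fragment below is only an illustration: the sampling function draw_y, the indicator in_A, and the numerical values of r and N 0 are hypothetical placeholders, not quantities specified in the text.

```python
import numpy as np

def partially_sequential_test(x, draw_y, in_A, r, N0):
    """Sequentially sample Y's until the decision is determined: reject H0
    as soon as r of them fall in A(x_1, ..., x_m); fail to reject as soon
    as N0 - r + 1 of them fall outside A (so that N_m > N0 is certain)."""
    successes = 0   # Y observations in A(x_1, ..., x_m)
    failures = 0    # Y observations not in A(x_1, ..., x_m)
    while True:
        y = draw_y()
        if in_A(y, x):
            successes += 1
            if successes == r:
                return "reject H0", successes + failures
        else:
            failures += 1
            if failures == N0 - r + 1:
                return "fail to reject H0", successes + failures

# Hypothetical illustration: X ~ N(0,1) control sample, Y ~ N(0.5,1), and
# A(x_1, ..., x_m) taken to be the region above the X sample maximum.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=25)
decision, n_y = partially_sequential_test(
    x, draw_y=lambda: rng.normal(0.5, 1.0),
    in_A=lambda y, x: y > x.max(), r=3, N0=40)
print(decision, n_y)
```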

9.2.1 Properties of Partially Sequential Procedures

For given \(X_{1} = x_{1},\cdots \,,X_{m} = x_{m}\), let

$$\displaystyle{ p_{m} = p_{m}(x_{1},\cdots \,,x_{m}) = P_{G}\{Y \in A(x_{1},\cdots \,,x_{m})\}. }$$
(9.4)

Thus, p m is the conditional probability that an observation from the distribution G falls in the set A(x 1, ⋯ , x m ) prescribed by the observed values from the F distribution. Then, conditional on \(X_{1} = x_{1},\cdots \,,X_{m} = x_{m}\), N m has a negative binomial distribution with parameters r and p m ; that is,

$$\displaystyle{ \begin{array}{rl} P(N_{m} = n&\vert X_{1} = x_{1},\cdots \,,X_{m} = x_{m}) \\ & = \binom{n - 1}{r - 1}[p_{m}(x_{1},\cdots \,,x_{m})]^{r}[1 - p_{m}(x_{1},\cdots \,,x_{m})]^{n-r}I_{\{r,r+1,r+2,\cdots \,\}}(n). \end{array} }$$
(9.5)

The unconditional distribution of N m is obtained from the result in (9.5) by integrating over the distribution of the Xs, namely,

$$\displaystyle{ P(N_{m}=n)=E_{F}\{\binom{n - 1}{r - 1}[p_{m}(X_{1},\cdots \,,X_{m})]^{r}[1-p_{ m}(X_{1},\cdots \,,X_{m})]^{n-r}\}I_{\{ r,r+1,r+2,\cdots \,\}}(n). }$$
(9.6)
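As a numerical sanity check on (9.5) and (9.6), the unconditional probability P(N m  = n) can be approximated by averaging the conditional negative binomial probability over simulated X samples. The sketch below assumes a hypothetical setup (F standard normal, A(x) the region above the X sample maximum) and uses scipy's negative binomial, which counts the n − r failures rather than the total sample size.

```python
import numpy as np
from scipy.stats import nbinom, norm

def unconditional_pmf_mc(n, r, draw_x, p_m_of_x, rng, reps=20_000):
    """Monte Carlo approximation of (9.6): average the conditional negative
    binomial pmf in (9.5) over simulated X samples."""
    if n < r:
        return 0.0
    vals = np.empty(reps)
    for i in range(reps):
        p = p_m_of_x(draw_x(rng))          # p_m(x_1, ..., x_m) for this X sample
        vals[i] = nbinom.pmf(n - r, r, p)  # scipy parameterizes by the n - r failures
    return vals.mean()

# Hypothetical illustration under H0 with A(x) = {y : y > max(x)}, so that
# p_m(x) = 1 - F(max(x)), computable here because F is taken to be N(0, 1).
rng = np.random.default_rng(4)
draw_x = lambda rng: rng.normal(0.0, 1.0, size=10)
p_of_x = lambda x: 1.0 - norm.cdf(x.max())
print(unconditional_pmf_mc(n=6, r=2, draw_x=draw_x, p_m_of_x=p_of_x, rng=rng))
```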

Since the investigator has flexibility in setting the sample size m for the X observations, it is of interest to know how N m behaves as m becomes large, that is, as m → ∞. If, for given F and G, p m (X 1, ⋯ , X m ) converges in probability to a fixed number p = p(F, G), 0 < p ≤ 1, as m → ∞, then the limiting distribution (m → ∞) of N m is negative binomial with parameters r and p; that is, the asymptotic distribution (m → ∞) of N m is

$$\displaystyle{ P^{{\ast}}(N_{ m} = n) = \binom{n - 1}{r - 1}p^{r}(1 - p)^{n-r}I_{\{ r,r+1,r+2,\cdots \,\}}(n). }$$
(9.7)

(Note: A limiting value of \(p = p(F,G) = 0\) does not satisfy the conditions for this result. If a pair (F, G) produces a limiting value of p = 0, the statistic N m does not possess a limiting distribution as m → ∞, since in such cases N m increases stochastically without limit as m → ∞.)

When m is fixed and large we can use the limiting distribution in (9.7) to select r to guarantee asymptotic (m → ∞) power against an alternative to H 0 of interest. Let H a be an alternative to H 0 against which we require an approximate power β, where 0 < β < 1 is arbitrary. Let p ∗ be the value of p in (9.7) that corresponds to the alternative H a . Then from the definition of N 0(α, r, ∞, A) (i.e., the approximate α-level critical value for the asymptotic, m → ∞, distribution of N m ), this approximate power requirement corresponds to

$$\displaystyle{ \sum \limits _{n=r}^{N_{0}(\alpha,r,\infty,A)}\binom{n - 1}{r - 1}(p^{{\ast}})^{r}(1 - p^{{\ast}})^{n-r} \geq \beta. }$$
(9.8)

For many partially sequential procedures, the left side of the inequality in (9.8) is a non-decreasing function of r. In this case, to satisfy our asymptotic power requirements with the fewest Y observations, we can preset r to be r = r ∗, where r ∗ is the smallest integer for which (9.8) is satisfied.
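A minimal sketch of this calibration step, assuming scipy's negative binomial parameterization (which counts the N m  − r failures rather than N m itself). The critical-value convention used here (largest N 0 with null probability of N m  ≤ N 0 at most α) and the numerical values for α, β, the null value p 0, and the alternative value p ∗ are illustrative choices, not values given in the text.

```python
from scipy.stats import nbinom

def asymptotic_critical_value(alpha, r, p0):
    """Largest N0 with P_H0(N_m <= N0) <= alpha, where N_m - r is negative
    binomial(r, p0) under the null; may fall below r if no such N0 exists."""
    k = int(nbinom.ppf(alpha, r, p0))      # smallest k with CDF >= alpha
    if nbinom.cdf(k, r, p0) > alpha:
        k -= 1                             # step back to stay at or below alpha
    return r + k

def smallest_r(alpha, beta, p0, p_star, r_max=200):
    """Smallest r whose asymptotic power at p_star reaches beta, per (9.8)."""
    for r in range(1, r_max + 1):
        N0 = asymptotic_critical_value(alpha, r, p0)
        if N0 >= r and nbinom.cdf(N0 - r, r, p_star) >= beta:
            return r
    return None

# Hypothetical numbers: alpha = 0.05, target power 0.80, null p0 = 0.5
# (as in the median test of Sect. 9.2.2) and alternative p* = 0.75.
print(smallest_r(0.05, 0.80, 0.5, 0.75))
```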

9.2.2 Examples

The PS approach can be used equally well in parametric or nonparametric settings. We briefly discuss two such examples.

Example 1: Parametric Setting

Let \(F(x) =\varPhi \{\frac{x-\mu _{1}} {\sigma } \}\) and \(G(y) =\varPhi \{\frac{y-\mu _{2}} {\sigma } \}\), where Φ(t) is the standard normal distribution function. The null hypothesis of interest is H 0: μ 1 = μ 2 and we consider here the alternative H a : μ 2 > μ 1.

One method for selecting the set A(x 1, ⋯ , x m ) would be to view the indicator Ψ(⋅ ) in (9.1) as a critical function for testing H 0 against H a for random samples of sizes m and 1 from the F and G distributions, respectively. For example, we know that the uniformly most powerful level α ∗ test of H 0 against H a for m X observations and a single Y observation has critical region

$$\displaystyle{ C(y,x_{1},\cdots \,,x_{m}) = \left \{(y,x_{1},\cdots \,,x_{m}): \frac{m^{1/2}} {(m + 1)^{1/2}} \frac{(y -\bar{ x})} {s} \geq t_{\alpha ^{{\ast}}}(m - 1)\right \}, }$$

where \(\bar{x} =\sum \limits _{ i=1}^{m}x_{i}/m\), \(s^{2} =\sum \limits _{ i=1}^{m}(x_{i} -\bar{ x})^{2}/(m - 1)\) and \(t_{\alpha ^{{\ast}}}(m - 1)\) is the upper α ∗ percentile point for the t distribution with m − 1 degrees of freedom.

Thus, in this setting it is natural to take the set A(x 1, ⋯ , x m ) to be

$$\displaystyle{ A(x_{1},\cdots \,,x_{m}) =\{ y: y \geq \bar{ x} + t_{\alpha ^{{\ast}}}(m - 1)(\{m + 1\}s^{2}/m)^{\frac{1} {2} }\}. }$$
(9.9)

In fact, Orban and Wolfe (1978) showed that this choice of A(x 1, ⋯ , x m ) leads to the asymptotically (m → ∞) most powerful level α partially sequential procedure for testing H 0 against H a .

With A(x 1, ⋯ , x m ) given by (9.9), we have

$$\displaystyle{ p_{m} = 1 -\varPhi (\{\bar{x} + t_{\alpha ^{{\ast}}}(m - 1)(\{m + 1\}s^{2}/m)^{\frac{1} {2} } -\mu _{2}\}/\sigma ) }$$

and the limiting distribution of N m as m → ∞ is negative binomial (9.7) with parameters r and \(p =\lim \limits _{m\rightarrow \infty }p_{m} = 1 -\varPhi \left \{\frac{\mu _{1}-\mu _{2}} {\sigma } + z_{\alpha ^{{\ast}}}\right \}\).
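A short numerical sketch of these two quantities, using the cutoff in (9.9) and the limiting value above; the parameter values (μ 1 = 0, μ 2 = 0.5, σ = 1, α ∗ = 0.05) and the simulated X sample are hypothetical.

```python
import numpy as np
from scipy.stats import norm, t

def p_m_normal(x, mu2, sigma, alpha_star):
    """Conditional probability p_m that Y ~ N(mu2, sigma^2) falls in the
    set A(x_1, ..., x_m) of (9.9), given the observed X sample x."""
    m = len(x)
    xbar, s = x.mean(), x.std(ddof=1)
    cutoff = xbar + t.ppf(1 - alpha_star, m - 1) * np.sqrt((m + 1) / m) * s
    return 1 - norm.cdf((cutoff - mu2) / sigma)

def p_limit_normal(mu1, mu2, sigma, alpha_star):
    """Limiting value of p_m as m grows without bound."""
    return 1 - norm.cdf((mu1 - mu2) / sigma + norm.ppf(1 - alpha_star))

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=101)   # hypothetical X sample with mu1 = 0, sigma = 1
print(p_m_normal(x, mu2=0.5, sigma=1.0, alpha_star=0.05),
      p_limit_normal(0.0, 0.5, 1.0, 0.05))
```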

Example 2: Nonparametric Setting

Let F and G be arbitrary, continuous distribution functions. We wish to test H 0: F ≡ G against the alternative H a : ξ 2 > ξ 1, where ξ 1 and ξ 2 are the medians of the F and G distributions, respectively. Assume that m is an odd integer (the case of even m is more complicated, but still tractable) and define A(x 1, ⋯ , x m ) by

$$\displaystyle{ A(x_{1},\cdots \,,x_{m}) =\{ y: y > m_{x}\}, }$$
(9.10)

where m x  = median(x 1, ⋯ , x m ). Then the PS two-sample median test associated with N m  (9.3) has the following properties:

  (a)

    \(p_{m} = 1 - G(m_{x})\) and the exact null (H 0) distribution of N m is given by

    $$\displaystyle\begin{array}{rcl} P_{0}(N_{m} = n) = \left \{\begin{array}{@{}l@{\quad }l@{}} \binom{n - 1}{r - 1} \frac{m!} {[\{(m-1)/2\}!]^{2}} \frac{\varGamma \left (\frac{m+2n-2r+1} {2} \right )\varGamma \left (\frac{2r+m+1} {2} \right )} {(m+n)!},\quad &\ n = r,r + 1,\cdots \\ 0, \quad &\ \text{elsewhere}.\end{array} \right.& & {}\end{array}$$
    (9.11)
  (b)

    The limiting distribution of N m as m → ∞ is negative binomial (9.7) with parameters r and \(p =\lim \limits _{m\rightarrow \infty }p_{m} = 1 - G(\xi _{1})\), with p = 1/2 or > 1/2 depending on whether H 0 or H a is true, respectively.

    Wolfe (1977b) initially proposed this PS two-sample median procedure and Orban and Wolfe (1982) studied its properties, including the expected number of Y observations required to conduct the test. They also provided the necessary tables for selecting r so that the approximate power requirement in (9.8) can be attained.
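The exact null distribution in (9.11) can be evaluated directly on the computer. The sketch below works on the log scale via gamma functions; the critical-value convention (largest N 0 whose cumulative null probability does not exceed α) and the illustrative values m = 25, r = 5, α = 0.05 are our own choices rather than values taken from the cited tables.

```python
import numpy as np
from scipy.special import gammaln

def log_p0_median(n, r, m):
    """log of the exact null probability P0(N_m = n) in (9.11), m odd."""
    if n < r:
        return -np.inf
    d = (m - 1) // 2
    return (gammaln(n) - gammaln(r) - gammaln(n - r + 1)      # log C(n-1, r-1)
            + gammaln(m + 1) - 2 * gammaln(d + 1)             # log m!/[d!]^2
            + gammaln((m + 2 * n - 2 * r + 1) / 2)
            + gammaln((2 * r + m + 1) / 2)
            - gammaln(m + n + 1))                             # log (m+n)!

def ps_median_critical_value(alpha, r, m, n_max=10_000):
    """Largest N0 with P0(N_m <= N0) <= alpha; None if even n = r exceeds alpha."""
    cum, N0 = 0.0, None
    for n in range(r, n_max):
        cum += np.exp(log_p0_median(n, r, m))
        if cum > alpha:
            break
        N0 = n
    return N0

print(ps_median_critical_value(alpha=0.05, r=5, m=25))   # illustrative values
```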

9.3 Ranked Set Sampling

The goal of RSS is to collect observations that are more likely to be representative of the full range of values in a population than the same number of observations obtained via SRS. To obtain a balanced RSS of k observations from a population, we proceed as follows. First, an initial SRS of k units is selected from the population and rank ordered on the attribute of interest. This ranking can be obtained through a variety of mechanisms, including visual comparisons, expert opinion, or the use of correlated concomitant variables, but it cannot involve actual measurements of the attribute of interest on the selected units. The unit that is judged to be the smallest in this ranking is taken as the first item in the RSS and the attribute of interest is formally measured for that unit and denoted by X [1]. Note that square brackets are used instead of the usual round brackets for the smallest order statistic since X [1] may or may not actually have the smallest attribute measurement among the k units in the SRS, even though our ranking judged it to be the smallest. The remaining k − 1 units in our initial SRS are not considered further in making inferences about the population; they were used solely to assist in the selection of the smallest judgment-ranked unit for measurement.

Following the selection of X [1], a second SRS (independent of the first SRS) of size k is selected from the population and ranked in the same manner as the first SRS. From this second SRS we select the item ranked as the second smallest of the k units (i.e., the second judgment order statistic) and add its attribute measurement, X [2], to the RSS. From a third SRS (independent of both previous SRS’s) of size k we select the unit ranked to be the third smallest (i.e., the third judgment order statistic) and include its attribute measurement, X [3], in the RSS. This process continues until we have selected the unit ranked to be the largest of the k units in the kth independent SRS and included its attribute measurement, X [k], in our RSS.

This process results in the k measured observations X [1], X [2], ⋯ , X [k] and is called a cycle. The number of units, k, in each SRS is called the set size. To complete a single ranked set cycle, we need to access a total of k 2 units from the population to separately rank k independent simple random samples of size k each. The measured observations, X [1], X [2], ⋯ , X [k], constitute a balanced ranked set sample of size k, where the descriptor “balanced” refers to the fact that we have collected one judgment order statistic for each of the ranks 1, 2, ⋯ , k. To obtain a final balanced RSS with a desired total number of measured observations n = qk, we repeat the entire process for q independent cycles, yielding the balanced RSS of size n: \(X_{[1]_{j}},X_{[2]_{j}},\cdots \,,X_{[k]_{j}}\), for j = 1, ⋯ , q.
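The cycle construction is easy to simulate. The sketch below is a hypothetical illustration in which each population unit carries a measurable attribute and a noisy concomitant; ranking within each set uses only the concomitant, so the judgment ranking is good but imperfect, and only the selected unit in each set is measured.

```python
import numpy as np

def balanced_rss(draw_unit, k, q, rng):
    """Simulate a balanced ranked set sample of size n = q * k.
    draw_unit(rng) returns (attribute, concomitant); ranking within each set
    uses only the concomitant, and only the attribute of the selected unit
    in each set is retained as a measured observation."""
    measured = []
    for _ in range(q):                            # q independent cycles
        for i in range(k):                        # k sets of size k per cycle
            units = [draw_unit(rng) for _ in range(k)]
            units.sort(key=lambda u: u[1])        # judgment ranking, no measurement
            measured.append(units[i][0])          # i-th judgment order statistic
    return np.array(measured)

# Hypothetical population: attribute ~ N(10, 2^2); the concomitant is the
# attribute plus noise, so judgment ranking errors are possible.
rng = np.random.default_rng(2)
def draw_unit(rng):
    attr = rng.normal(10.0, 2.0)
    return attr, attr + rng.normal(0.0, 0.5)

sample = balanced_rss(draw_unit, k=4, q=5, rng=rng)
print(len(sample), sample.mean())
```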

Note that a balanced RSS of size n differs from an SRS of size n in a number of important ways. An SRS is designed so that the n observations in the sample are mutually independent and identically distributed. This means that, probabilistically speaking, each of the individual sample items can be viewed as representative of a typical value from the underlying population. That is certainly not the case for a balanced RSS of size n. While the individual observations in a balanced RSS are also mutually independent, they are clearly not identically distributed. As such, it is not the case that each of the individual observations in a balanced RSS represents a typical value from the underlying population. On the contrary, the individual judgment order statistics represent distinctly different portions of the underlying population. It is, however, precisely this additional structure on the items in the balanced RSS that provides greater assurance that the entire range of population values is represented in the sample data.

There have been numerous papers in the literature demonstrating the advantages that balanced RSS provides relative to SRS, both in terms of estimation precision and in terms of reducing required sample sizes. Dell and Clutter (1972) showed that the estimator of the population mean μ based on a balanced RSS is unbiased and has a variance that is never larger than the variance of the estimator of μ based on an SRS of the same size. The remarkable thing is that this result holds even if the judgment ranking for the balanced RSS is not perfect. The better the judgment ranking, of course, the greater the improvement from using a balanced RSS instead of an SRS. Stokes and Sager (1988) obtained similar results for the RSS estimator of the distribution function of the population and Terpstra (2004) did the same for the RSS maximum likelihood estimator of a population proportion.

While a balanced RSS is the most commonly occurring form of ranked set sampling data, there are situations where it is not optimal to collect the same number of measured observations for each of the judgment order statistics. For example, suppose we are interested primarily in making inferences about the median ξ of a distribution based on an odd number of observations \(k = 2d + 1\). It is well known that among all the order statistics the sample median, X (d+1), contains the most information about ξ when the underlying distribution is unimodal and symmetric. Thus, to make inferences about ξ, it is natural to measure the same judgment order statistic, X [d+1], in each set so that it is measured all k times in each of the q cycles. The resulting RSS consists of qk measured observations, each of which is a judgment median from a set of size k. This is the most efficient RSS for making inferences about the population median ξ for a distribution that is both unimodal and symmetric, and it is clearly as unbalanced as possible. (A similar argument calls for a distinctly different unbalanced RSS for estimating the median of an asymmetric unimodal population or a multimodal population. See, for example, Ozturk and Wolfe 2000, and Chen et al. 2006.) We should point out, however, that such a median unbalanced RSS would not necessarily be a good idea if we wanted to make inference about other features of the population, such as its distribution function or the population variance.
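For contrast with the balanced case, the following sketch simulates this median-only unbalanced RSS under the simplifying assumption of perfect ranking (ranking on the attribute itself); the normal population used is hypothetical.

```python
import numpy as np

def median_rss(draw_unit, k, q, rng):
    """Unbalanced RSS that measures only the judgment median: in each of the
    q * k sets of size k (k odd), rank the units and measure the middle one.
    Perfect ranking is assumed, so units are sorted on the attribute itself."""
    d = (k - 1) // 2
    measured = []
    for _ in range(q * k):
        units = sorted(draw_unit(rng) for _ in range(k))
        measured.append(units[d])               # judgment median of the set
    return np.array(measured)

# Hypothetical unimodal symmetric population, N(10, 2^2).
rng = np.random.default_rng(5)
sample = median_rss(lambda rng: rng.normal(10.0, 2.0), k=5, q=4, rng=rng)
print(np.median(sample), sample.mean())
```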

RSS and its related methodology have an active and rich literature. The interested reader is referred to the recent survey and review articles in Wolfe (2004) and Wolfe (2012) for more comprehensive discussions.

9.4 A Class of PSRSS Two-Sample Percentile Test Procedures

There are three approaches that can be taken to incorporate RSS into partially sequential procedures:

  1.

    Use RSS for the X sample data and SRS for the sequentially obtained Y sample data.

  2.

    Use RSS for both the X sample and Y sample data.

  3.

    Use SRS for the X sample data and RSS for the Y sample data.

All three of these options are worthy of consideration, although the first approach is probably the least interesting in the contexts where partially sequential procedures would be most useful. In this paper we concentrate on the third option, which strikes us as the most natural, to illustrate how to introduce RSS into the partially sequential process. To facilitate the discussion we consider the particular unbalanced RSS corresponding to all of the observations being collected at a single judgment order statistic, and we assume that the judgment ranking is perfect, so that the various judgment order statistics can be viewed as true order statistics.

As before, let X 1, ⋯ , X m be a random sample from a probability distribution with p.d.f. f(x) and c.d.f. F(x), where m is an odd integer, and let G(y) be a second distribution function with associated p.d.f. g(y). Let M X be the X sample median and let m x be the observed value of M X . Once again we wish to test H 0: F ≡ G against the alternative H a : ξ 2 > ξ 1, where ξ 1 and ξ 2 are the medians of the F and G distributions, respectively.

For illustrative purposes, we consider collecting unbalanced RSS data from G using a single cycle (q = 1) with set size k and measuring the jth order statistic, Y (j), at each step of the sequential sampling, for fixed j ∈ { 1, ⋯ , k}. With this RSS Y -sampling scheme and the indicator set A(x 1, ⋯ , x m ) = { y: y > m x }, the associated PSRSS test of H 0: F ≡ G against the alternative H a : ξ 2 > ξ 1 has the following properties:

  (a)

    The unconditional exact distribution of N m still has the form

    $$\displaystyle{ P(N_{m} = n) = E_{F}\left \{\binom{n - 1}{r - 1}[p_{m}(X_{1},\cdots \,,X_{m})]^{r}[1 - p_{ m}(X_{1},\cdots \,,X_{m})]^{n-r}\right \}I_{\{ r,r+1,r+2,\cdots \,\}}(n), }$$
    (9.12)

    but the parameter p m  = p m (x 1, ⋯ , x m ) is now given by

    $$\displaystyle{ p_{m} = P\{Y _{(j)} > m_{x}\} = 1 - Q_{j}(m_{x}), }$$

    where Q j (⋅ ) is the c.d.f. for the j th order statistic for a random sample of size k from G, given by

    $$\displaystyle{ Q_{j}(t) =\sum \limits _{ u=j}^{k}\binom{k}{u}[G(t)]^{u}[1 - G(t)]^{k-u}. }$$
    (9.13)

    Combining (9.12) and (9.13), the unconditional distribution of N m becomes

    $$\displaystyle{ P(N_{m} = n) = E_{F_{M_{ X}}}\left \{\binom{n - 1}{r - 1}[1 - Q_{j}(M_{X})]^{r}[Q_{ j}(M_{X})]^{n-r}\right \}I_{\{ r,r+1,r+2,\cdots \,\}}(n), }$$
    (9.14)

    where \(F_{M_{X}}\) is the c.d.f. of the sample median for a random sample of size m from F. Using the standard form of \(F_{M_{X}}\) for an odd sample size m in expression (9.14), it follows that

    $$\displaystyle{ \begin{array}{rl} P(N_{m} = n) =&\int _{-\infty }^{\infty }\binom{n - 1}{r - 1}[1 - Q_{j}(t)]^{r}[Q_{j}(t)]^{n-r} \\ & \times \frac{m!} {[\left (\frac{m-1} {2} \right )!]^{2}} \{F(t)[1 - F(t)]\}^{\frac{m-1} {2} }f(t)\,dt\,I_{\{r,r+1,r+2,\cdots \,\}}(n). \end{array} }$$
    (9.15)

    Under H 0: F ≡ G it follows from the change of variable v = F(t) in (9.15) that the null distribution for N m “simplifies” to

    $$\displaystyle{ \begin{array}{rl} P(N_{m} = n) =&\int _{0}^{1}\binom{n - 1}{r - 1}\left \{1 -\sum \limits _{u=j}^{k}\binom{k}{u}[v]^{u}[1 - v]^{k-u}\right \}^{r}\left [\sum \limits _{u=j}^{k}\binom{k}{u}[v]^{u}[1 - v]^{k-u}\right ]^{n-r} \\ & \times \frac{m!} {[\left (\frac{m-1} {2} \right )!]^{2}} \{v(1 - v)\}^{\frac{m-1} {2} }\,dv\,I_{\{r,r+1,r+2,\cdots \,\}}(n). \end{array} }$$
    (9.16)

    This expression clearly does not depend on the form of the continuous F, so that the test based on N m is distribution-free and the exact critical values for the test can be evaluated from (9.16) without knowledge of F (see the numerical sketch following this list).

  (b)

    The limiting distribution of N m as m → ∞ is negative binomial with parameters r and \(p_{j}^{{\ast}} =\lim \limits _{m\rightarrow \infty }p_{m} = 1 - Q_{j}(\xi _{1})\).

    Using the expression for Q j (t) in (9.13), we see that

    $$\displaystyle{ p_{j}^{{\ast}} = 1 - Q_{ j}(\xi _{1}) = 1 -\sum \limits _{u=j}^{k}\binom{k}{u}[G(\xi _{ 1})]^{u}[1 - G(\xi _{ 1})]^{k-u}, }$$

    which simplifies under the null hypothesis to

    $$\displaystyle{ p_{0j}^{{\ast}} = 1-Q_{ j}(\xi _{1}) = 1-\sum \limits _{u=j}^{k}\binom{k}{u}[F(\xi _{ 1})]^{u}[1-F(\xi _{ 1})]^{k-u} = 1-\sum \limits _{ u=j}^{k}\binom{k}{u}[0.5]^{u}[0.5]^{k-u}. }$$
    (9.17)
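As a numerical sketch of how the distribution-free null distribution (9.16) can be evaluated, and of the null limiting value p 0j ∗ in (9.17), the fragment below integrates (9.16) with scipy. The critical-value convention (largest N 0 whose cumulative null probability does not exceed α) and the illustrative values m = 101, k = 5, j = 3, r = 5, α = 0.05 are our own choices.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import comb, gammaln

def Qj(v, j, k):
    """C.d.f. of the j-th order statistic of a size-k sample, expressed in
    terms of the underlying c.d.f. value v, as in (9.13)."""
    u = np.arange(j, k + 1)
    return np.sum(comb(k, u) * v ** u * (1 - v) ** (k - u))

def p0_psrss(n, r, m, j, k):
    """Exact null probability P(N_m = n) from (9.16), m odd.  The constant
    C(n-1, r-1) m!/[((m-1)/2)!]^2 is folded into the integrand on the log
    scale to keep the numerical integration well scaled."""
    if n < r:
        return 0.0
    d = (m - 1) // 2
    log_const = (gammaln(n) - gammaln(r) - gammaln(n - r + 1)   # log C(n-1, r-1)
                 + gammaln(m + 1) - 2 * gammaln(d + 1))          # log m!/[d!]^2

    def integrand(v):
        q = Qj(v, j, k)
        return np.exp(log_const + d * np.log(v * (1 - v))) * (1 - q) ** r * q ** (n - r)

    val, _ = quad(integrand, 0.0, 1.0)
    return val

def psrss_critical_value(alpha, r, m, j, k, n_max=5_000):
    """Largest N0 with P0(N_m <= N0) <= alpha; None if no such N0 exists."""
    cum, N0 = 0.0, None
    for n in range(r, n_max):
        cum += p0_psrss(n, r, m, j, k)
        if cum > alpha:
            break
        N0 = n
    return N0

print(psrss_critical_value(alpha=0.05, r=5, m=101, j=3, k=5))
print(1 - Qj(0.5, j=3, k=5))   # p*_{0j} from (9.17): equals 0.5 for the median
```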

9.4.1 Special Cases

  1.

    j = k—here we are measuring the maximum judgment order statistic in each set and

    $$\displaystyle{ p_{0k}^{{\ast}} = 1 -\sum \limits _{ u=k}^{k}\binom{k}{u}[0.5]^{u}[0.5]^{k-u} = 1 -\binom{k}{k}[0.5]^{k}[0.5]^{k-k} = 1 - (0.5)^{k}, }$$
    (9.18)

    which converges to 1 as k → ∞.

  2.

    j = 1—here we are measuring the minimum judgment order statistic in each set and

    $$\displaystyle\begin{array}{rcl} p_{01}^{{\ast}}& =& 1 -\sum \limits _{ u=1}^{k}\binom{k}{u}[0.5]^{u}[0.5]^{k-u} \\ & =& 1 - [\sum \limits _{u=0}^{k}\binom{k}{u}[0.5]^{u}[0.5]^{k-u} -\binom{k}{0}[0.5]^{0}[0.5]^{k-0}] \\ & =& 1 - [1 - (0.5)^{k}] = (0.5)^{k}, {}\end{array}$$
    (9.19)

    which converges to 0 as k → ∞. (Remember that this is not a viable option for PSRSS.)

  3.

    \(j = d + 1\), where \(k = 2d + 1\) is an odd integer—here we are measuring the median judgment order statistic, Y (d+1), in each set and

    $$\displaystyle\begin{array}{rcl} p_{0(d+1)}^{{\ast}}& =& 1 -\sum \limits _{u=d+1}^{2d+1}\binom{2d + 1}{u}[0.5]^{u}[0.5]^{(2d+1)-u} \\ & =& \sum \limits _{u=0}^{d}\binom{2d + 1}{u}[0.5]^{u}[0.5]^{(2d+1)-u} \\ & =& \frac{1} {2}\sum \limits _{u=0}^{2d+1}\binom{2d + 1}{u}[0.5]^{u}[0.5]^{(2d+1)-u} = \frac{1} {2}(1) = 0.5. {}\end{array}$$
    (9.20)
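The three special cases above are easy to verify numerically from (9.17); the set size k = 7 below is an arbitrary illustrative choice.

```python
from scipy.special import comb

def p0j_star(j, k):
    """Null limiting value p*_{0j} = 1 - sum_{u=j}^{k} C(k, u) (0.5)^k from (9.17)."""
    return 1 - sum(comb(k, u) for u in range(j, k + 1)) * 0.5 ** k

k = 7
print(p0j_star(k, k), 1 - 0.5 ** k)      # j = k: agrees with (9.18)
print(p0j_star(1, k), 0.5 ** k)          # j = 1: agrees with (9.19)
print(p0j_star((k + 1) // 2, k))         # j = d + 1: equals 0.5, as in (9.20)
```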

The fact that the limiting distribution under the general alternative F ≢ G depends on both the negative binomial stopping parameter r and the set size k provides us with even greater flexibility in designing a study with the goal of guaranteeing prescribed power against specific alternatives. Both increasing r and increasing k will lead to increased power for the PSRSS median procedure, but increasing r will also lead to a larger number of measured observations from the Y distribution, something that we are trying to avoid. Increasing k and/or increasing the initial sample size m from the X distribution can be used as effective alternatives for increasing the power without increasing the number of measured Y observations.
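To make the trade-off concrete, the sketch below tabulates the approximate (large-m) power of the PSRSS median procedure for several (r, k) combinations under a hypothetical normal shift alternative (F standard normal, G shifted by 0.5), using the negative binomial limit with p j ∗ = 1 − Q j (ξ 1); the values of α, r, and k are illustrative.

```python
import numpy as np
from scipy.special import comb
from scipy.stats import nbinom, norm

def Qj(v, j, k):
    u = np.arange(j, k + 1)
    return np.sum(comb(k, u) * v ** u * (1 - v) ** (k - u))

def asymptotic_power(r, k, j, p_null, G_at_xi1, alpha=0.05):
    """Approximate (large-m) power: critical value from the null negative
    binomial(r, p_null), power evaluated at p_j* = 1 - Q_j(xi_1)."""
    k0 = int(nbinom.ppf(alpha, r, p_null))      # candidate for N0 - r
    if nbinom.cdf(k0, r, p_null) > alpha:
        k0 -= 1
    if k0 < 0:
        return 0.0                              # no level-alpha test for this r
    p_star = 1 - Qj(G_at_xi1, j, k)
    return nbinom.cdf(k0, r, p_star)

# Hypothetical shift alternative: F = N(0,1), G = N(0.5,1), so G(xi_1) = Phi(-0.5);
# measuring the judgment median j = (k+1)/2, for which the null p is 0.5.
G_at_xi1 = norm.cdf(-0.5)
for r in (5, 7, 9):
    for k in (1, 3, 5):
        j = (k + 1) // 2
        print(r, k, round(asymptotic_power(r, k, j, 0.5, G_at_xi1), 3))
```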

9.5 Discussion and Future Research

Small sample and asymptotic properties of the PSRSS two-sample median test procedure (corresponding to special case 3), based on measuring the Y sample median (for an odd set size k) in every ranked set, have been investigated extensively by Matthews et al. (2016). They found that taking the RSS approach for collection of the Y sample observations leads to both increased power and decreased expected Y sample size relative to the PS version studied by Orban and Wolfe (1982). This is due both to the intrinsic structure inherent in the partially sequential approach to the two-sample problem and to the ranked set sampling methodology employed in obtaining the Y sample. As noted in Sect. 9.4, further improvements in both power and reduced Y sample size can likely be obtained by utilizing RSS to collect both the X and Y sample observations. The basic formulation of this dual RSS approach would be analogous to the one utilized in this paper, where SRS was used to collect the X sample items, although the mathematical properties would be more complicated. Another intriguing possibility would be to develop PSRSS methodology that utilizes a fully balanced RSS approach to the collection of the Y observations, rather than relying solely on the use of the medians of the ranked sets. This could also include a fully balanced RSS approach to collection of the initial X sample, leading to natural partially sequential analogues to the two-sample balanced RSS procedures considered by Bohn and Wolfe (1992) and Fligner and MacEachern (2006).