
7.1 Introduction

Multivariate data in factorial designs with few replications arise in agricultural, behavioral, and biomedical studies, among many other fields. However, due to the lack of appropriate inference procedures, such data are often analyzed using simplistic univariate methods or questionable model assumptions (e.g., multivariate normality). In this article, we develop fully nonparametric methods for the analysis of such data. These nonparametric methods allow for the analysis of data with ordinal responses, and they possess desirable invariance properties: each variable (endpoint) can be monotonically transformed without changing the outcome of the analysis. Furthermore, the proposed methods represent the first nonparametric approach that is asymptotically valid when the number (a) of different samples or treatment groups (more generally, the number of levels of one of the factors) is large. In order to illustrate application of the procedure, we use the following data example.

7.1.1 Agricultural Field Trial

In an agricultural experiment that serves here as an example of many similarly conducted field trials, several varieties of crabapples are examined with regard to their disease resistance (Chatfield et al. 2000). The response variable is a rating of tree health on an ordinal scale from 0 to 5. Trees are evaluated at different times during the growing season, generating a multivariate observation vector per tree. When the experiment is repeated in a different year or at a different location, a second treatment factor is introduced whose main effect and interaction with the plant variety need to be considered, in addition to the variety effect. In Chatfield et al. (2000), the number of crabapple varieties was a = 63, justifying the use of methods derived for the asymptotic situation a →∞. The number \(n_{ij}\) of replicates per variety was between 3 and 5. If we assume that the same study is performed at two different agricultural experiment stations or in two different years, we would be in the situation with b = 2.

7.1.2 Model

We describe the model using a two-factor layout corresponding to the data example. Generalization to higher-way layouts can be done using the techniques described here. On each experimental unit, a p-dimensional response vector is observed. These vectors are described by

$$\displaystyle{\mathbf{X}_{ijr} = (X_{ijr}^{(1)},X_{ ijr}^{(2)},\ldots,X_{ ijr}^{(p)})'.}$$

Here, the first two indices, \(i = 1,\ldots,a\) and \(j = 1,\ldots,b\), denote the levels of the two explanatory factors considered (in the example, variety and year or location, respectively). The index \(r = 1,\ldots,n_{ij}\) stands for the replication or experimental unit within a factor level combination, and the super-index \(d = 1,\ldots,p\) denotes the respective variable among the total of p response variables considered (p-dimensional response). A possible multivariate additive linear model for \(\mathbf{X}_{ijr}\) could be:

$$\displaystyle\begin{array}{rcl} \mathbf{X}_{ijr} =\boldsymbol{\mu } +\boldsymbol{\xi }_{i} +\boldsymbol{\lambda } _{j} +\boldsymbol{\gamma } _{ij} +\boldsymbol{\varepsilon } _{ijr},& & {}\\ \end{array}$$

where \(\boldsymbol{\xi }_{i}\), \(\boldsymbol{\lambda }_{j}\), and \(\boldsymbol{\gamma }_{ij}\) are the effects due to variety, experimental condition, and the interaction between variety and experimental condition, respectively, and \(\boldsymbol{\varepsilon }_{ijr}\) is the random variation, assumed to be independently distributed with mean vector 0 and covariance matrix \(\boldsymbol{\varSigma }_{ij}\).

Some drawbacks of the linear model approach are that the results depend on the type of transformation used and can be heavily influenced by outliers. In this manuscript, we propose a completely nonparametric alternative to the linear model approach. This nonparametric model can be written as

$$\displaystyle{ \mathbf{X}_{ijr} \sim F_{ij}, }$$
(7.1)

where F ij is the multivariate p-dimensional distribution of the response vector for factor level combination (i, j). This model imposes no restriction on distributions or correlations of error terms or random effects. The dependence in the data, induced by observing several outcome variables on the same subject, is entirely absorbed by modeling them as multivariate observation vectors, allowing for arbitrary, unspecified dependence structures among the response variables. The vectors X ijr are independent for different indices i, j, or r, but the components of the vectors are possibly dependent.

In this manuscript, we propose a completely nonparametric model for the analysis of multivariate data from factorial experiments, applicable in a variety of situations. Inferential methods for the two-factor heteroscedastic model have been relatively well developed in the univariate case, in the parametric as well as nonparametric contexts (for the latter, see, for example, the monograph by Brunner et al. (2002) and the references therein). There is some recent work on the semiparametric multivariate counterpart (Harrar and Bathke 2012; Konietschke et al. 2015; Van Aelst and Willems 2011), and several procedures have been proposed under the assumption of multivariate normality (Belloni and Didier 2008; Girón and del Castillo 2010; Kawasaki and Seo 2012; Krishnamoorthy and Lu 2010; Krishnamoorthy and Yu 2004, 2012; Nel and Van der Merwe 1986; Zhang 2011, 2012; Zhang and Liu 2013). However, not much has been done under the nonparametric paradigm, in particular under the asymptotic framework of a large number of factor levels. This asymptotic setup is becoming increasingly popular due to high throughput diagnostics and other bioinformatics tools which generate massive amounts of data. Further motivation for this type of asymptotics in agriculture, health sciences, and other disciplines is found in Boos and Brownie (1995), Akritas and Arnold (2000), Bathke (2002, 2004), and Harrar and Gupta (2007) in univariate settings, and in Gupta et al. (2006, 2008), Bathke and Harrar (2008), and Harrar and Bathke (2008) in the multivariate setting. Whereas the work of Gupta et al. (2006, 2008) is restricted to the equal covariance matrix case, Bathke and Harrar (2008) and Harrar and Bathke (2008) consider the single-factor nonparametric situation.

In the following sections, hypotheses and corresponding test statistics are introduced, and their asymptotic properties are derived. A section is devoted to the cumbersome task of consistent estimation of the variance-covariance matrix, and one section shows empirical evidence regarding the performance of the proposed tests, based on a simulation study.

Regarding the notation, a block diagonal matrix with blocks A and B will be written as \(\mathbf{A} \oplus \mathbf{B}\), and the Kronecker product of matrices will be denoted as \(\mathbf{A} \otimes \mathbf{B}\). See, for example, Schott (2005, Sect. 8.2) or Harville (2008, Sect. 16.1) for the definition and basic properties of the Kronecker product.

7.2 Hypotheses and Test Statistics

In general notation, the two experimental factors are denoted as factor A and factor B, respectively. Based on the nonparametric model (7.1), we will test hypotheses pertaining to these factors. The hypotheses are formulated in terms of the distribution functions \(F_{ij}\). To this end, define the vector \(\mathbf{F} = (F_{11},\ldots,F_{1b},F_{21},\ldots,F_{ab})'\) of cumulative distribution functions. Here, we assume the normalized versions of the distribution functions, allowing naturally for ties (Kruskal 1952; Lévy 1925; Ruymgaart 1980) and thus not restricting the methodology to absolutely continuous distributions.

Particular hypotheses of interest will be of the form \(\mathcal{H}^{\psi }: D_{\psi }\mathbf{F} = \mathbf{0}\), postulating absence of the effect ψ. More specifically,

$$\displaystyle{D_{\psi } = \left \{\begin{array}{@{}l@{\quad }l@{}} \mathbf{P}_{a} \otimes \frac{1} {b}\mathbf{J}_{b},\quad &\text{ for }\psi = A, \\ \mathbf{P}_{a} \otimes \mathbf{I}_{b} \quad &\text{ for }\psi = A\vert B, \\ \frac{1} {a}\mathbf{J}_{a} \otimes \mathbf{P}_{b} \quad &\text{ for }\psi = B, \\ \mathbf{I}_{a} \otimes \mathbf{P}_{b} \quad &\text{ for }\psi = B\vert A, \\ \mathbf{P}_{a} \otimes \mathbf{P}_{b} \quad &\text{ for }\psi = AB,\\ \quad \end{array} \right.}$$

where I d is the d × d identity matrix, J d is the d × d unity matrix (matrix of ones), and \(\mathbf{P}_{d} = \mathbf{I}_{d} - d^{-1}\mathbf{J}_{d}\).
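For readers who wish to experiment with these contrast matrices, the following minimal Python/numpy sketch constructs \(D_{\psi }\). The function names and the lexicographic cell ordering are our own illustrative choices, not part of the formal development.

```python
import numpy as np

def centering(d):
    """P_d = I_d - J_d / d, the usual centering matrix."""
    return np.eye(d) - np.ones((d, d)) / d

def contrast_matrix(psi, a, b):
    """Contrast matrix D_psi acting on the vector F of the a*b cell
    distribution functions, arranged lexicographically in (i, j)."""
    Ia, Ib = np.eye(a), np.eye(b)
    Ja, Jb = np.ones((a, a)), np.ones((b, b))
    Pa, Pb = centering(a), centering(b)
    if psi == "A":
        return np.kron(Pa, Jb / b)
    if psi == "A|B":
        return np.kron(Pa, Ib)
    if psi == "B":
        return np.kron(Ja / a, Pb)
    if psi == "B|A":
        return np.kron(Ia, Pb)
    if psi == "AB":
        return np.kron(Pa, Pb)
    raise ValueError(psi)
```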

These nonparametric hypotheses imply their corresponding parametric counterparts (see, e.g., Brunner et al. 2002; Harrar and Bathke 2008). As an illustration in the univariate context, the interaction effect in a parametric linear model can be expressed as \((\mathbf{P}_{a} \otimes \mathbf{P}_{b})\boldsymbol{\mu }\), where \(\boldsymbol{\mu }\) is the lexicographically arranged vector of cell means, \(\boldsymbol{\mu }= (\mu _{11},\ldots,\mu _{1b},\mu _{21},\ldots,\mu _{ab})'\). The implication between nonparametric and parametric hypotheses is immediately clear when expressing \((\mathbf{P}_{a} \otimes \mathbf{P}_{b})\boldsymbol{\mu }\) in terms of the distribution functions as \((\mathbf{P}_{a} \otimes \mathbf{P}_{b})\int x\,d\mathbf{F}(x)\). The same relation holds between the multivariate nonparametric and parametric analogs. The converse of this relation is not true: the parametric hypotheses do not imply their nonparametric counterparts.

In order to define nonparametric (rank-based) test statistics, let \(\mathbf{R}_{ij} = (\mathbf{R}_{ij1},\mathbf{R}_{ij2},\ldots,\mathbf{R}_{ijn_{ij}})\), where \(\mathbf{R}_{ijk} = (R_{ijk}^{(1)},\ldots,R_{ijk}^{(p)})'\) and \(R_{ijk}^{(l)}\) is the (mid-)rank of \(X_{ijk}^{(l)}\) among all \(N =\sum _{ i=1}^{a}\sum _{j=1}^{b}n_{ij}\) random variables \(X_{111}^{(l)},\ldots,X_{abn_{ab}}^{(l)}\). Use of mid-ranks follows naturally from the normalized version of the cumulative distribution function (see above). Arranging these mid-ranks \(R_{ijk}^{(l)}\) into a p × N matrix, put \(\mathbf{R} = (\mathbf{R}_{1},\mathbf{R}_{2},\ldots,\mathbf{R}_{a})\), where \(\mathbf{R}_{i} = (\mathbf{R}_{i1},\mathbf{R}_{i2},\ldots,\mathbf{R}_{ib})\). Then, denote the p × p hypothesis and error sums of squares and cross products matrices based on the ranks as \(\mathbf{H}^{(A)}(\mathbf{R})\), \(\mathbf{H}^{(A\vert B)}(\mathbf{R})\), \(\mathbf{H}^{(AB)}(\mathbf{R})\), and \(\mathbf{G}(\mathbf{R})\). The corresponding matrices for testing main and simple effects of factor B can be written analogously to those of factor A. However, due to the large-a asymptotics considered in this manuscript, we will not consider tests for the main effect of factor B in detail here. The matrices are defined as

$$\displaystyle\begin{array}{rcl} \mathbf{H}^{(A)}& =& \frac{1} {a - 1}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b}(\mathbf{\tilde{R}}_{ i..} -\mathbf{\tilde{R}}_{\ldots })(\mathbf{\tilde{R}}_{i..} -\mathbf{\tilde{R}}_{\ldots })' {}\\ & =& \frac{1} {a - 1}\mathbf{R}\left [\left (\bigoplus _{i=1}^{a}\bigoplus _{ j=1}^{b} \frac{1} {n_{ij}}\mathbf{1}_{n_{ij}}\right )(\mathbf{P}_{a} \otimes \frac{1} {b}\mathbf{J}_{b})\left (\bigoplus _{i=1}^{a}\bigoplus _{ j=1}^{b} \frac{1} {n_{ij}}\mathbf{1}_{n_{ij}}'\right )\right ]\mathbf{R}', {}\\ \mathbf{H}^{(A\vert B)}& =& \frac{1} {(a - 1)b}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b}(\mathbf{\bar{R}}_{ ij.} -\mathbf{\tilde{R}}_{.j.})(\mathbf{\bar{R}}_{ij.} -\mathbf{\tilde{R}}_{.j.})' {}\\ & =& \frac{1} {(a - 1)b}\mathbf{R}\left [\left (\bigoplus _{i=1}^{a}\bigoplus _{ j=1}^{b} \frac{1} {n_{ij}}\mathbf{1}_{n_{ij}}\right )(\mathbf{P}_{a} \otimes \mathbf{I}_{b})\left (\bigoplus _{i=1}^{a}\bigoplus _{ j=1}^{b} \frac{1} {n_{ij}}\mathbf{1}_{n_{ij}}'\right )\right ]\mathbf{R}', {}\\ \mathbf{H}^{(B)}& =& \frac{1} {b - 1}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b}(\mathbf{\tilde{R}}_{.j.} -\mathbf{\tilde{R}}_{\ldots })(\mathbf{\tilde{R}}_{.j.} -\mathbf{\tilde{R}}_{\ldots })' {}\\ & =& \frac{1} {b - 1}\mathbf{R}\left [\left (\bigoplus _{i=1}^{a}\bigoplus _{ j=1}^{b} \frac{1} {n_{ij}}\mathbf{1}_{n_{ij}}\right )(\frac{1} {a}\mathbf{J}_{a} \otimes \mathbf{P}_{b})\left (\bigoplus _{i=1}^{a}\bigoplus _{ j=1}^{b} \frac{1} {n_{ij}}\mathbf{1}_{n_{ij}}'\right )\right ]\mathbf{R}', {}\\ \mathbf{H}^{(B\vert A)}& =& \frac{1} {a(b - 1)}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b}(\mathbf{\bar{R}}_{ ij.} -\mathbf{\tilde{R}}_{i..})(\mathbf{\bar{R}}_{ij.} -\mathbf{\tilde{R}}_{i..})' {}\\ & =& \frac{1} {a(b - 1)}\mathbf{R}\left [\left (\bigoplus _{i=1}^{a}\bigoplus _{ j=1}^{b} \frac{1} {n_{ij}}\mathbf{1}_{n_{ij}}\right )(\mathbf{I}_{a} \otimes \mathbf{P}_{b})\left (\bigoplus _{i=1}^{a}\bigoplus _{ j=1}^{b} \frac{1} {n_{ij}}\mathbf{1}_{n_{ij}}'\right )\right ]\mathbf{R}', {}\\ \mathbf{H}^{(AB)}& =& \frac{1} {(a - 1)(b - 1)}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b}(\mathbf{\bar{R}}_{ ij.} -\mathbf{\tilde{R}}_{i..} -\mathbf{\tilde{R}}_{.j.} + \mathbf{\tilde{R}}_{\ldots })(\mathbf{\bar{R}}_{ij.} -\mathbf{\tilde{R}}_{i..} -\mathbf{\tilde{R}}_{.j.} + \mathbf{\tilde{R}}_{\ldots })' {}\\ & =& \frac{1} {(a - 1)(b - 1)}\mathbf{R}\left [\left (\bigoplus _{i=1}^{a}\bigoplus _{ j=1}^{b} \frac{1} {n_{ij}}\mathbf{1}_{n_{ij}}\right )(\mathbf{P}_{a} \otimes \mathbf{P}_{b})\left (\bigoplus _{i=1}^{a}\bigoplus _{ j=1}^{b} \frac{1} {n_{ij}}\mathbf{1}_{n_{ij}}'\right )\right ]\mathbf{R}',\quad \text{and} {}\\ \mathbf{G}& =& \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b} \frac{1} {n_{ij}(n_{ij} - 1)}\sum \limits _{k=1}^{n_{ij} }(\mathbf{R}_{ijk} -\mathbf{\bar{R}}_{ij.})(\mathbf{R}_{ijk} -\mathbf{\bar{R}}_{ij.})' = \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b} \frac{1} {n_{ij}}\mathbf{S}_{ij} {}\\ & =& \frac{1} {ab}\mathbf{R}\left (\bigoplus _{i=1}^{a}\bigoplus _{ j=1}^{b} \frac{1} {n_{ij}(n_{ij} - 1)}\mathbf{P}_{n_{ij}}\right )\mathbf{R}', {}\\ \end{array}$$

where \(\mathbf{\bar{R}}_{ij.} = \frac{1} {n_{ij}}\sum \limits _{k=1}^{n_{ij}}\mathbf{R}_{ijk}\), \(\mathbf{\tilde{R}}_{i..} = \frac{1} {b}\sum \limits _{j=1}^{b}\mathbf{\bar{R}}_{ ij.}\), \(\mathbf{\tilde{R}}_{.j.} = \frac{1} {a}\sum \limits _{i=1}^{a}\mathbf{\bar{R}}_{ ij.}\), \(\mathbf{\tilde{R}_{\ldots }} = \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b}\mathbf{\bar{R}}_{ ij.}\), and \(\mathbf{S}_{ij} = \frac{1} {(n_{ij}-1)}\sum \limits _{k=1}^{n_{ij}}(\mathbf{R}_{ ijk} -\mathbf{\bar{R}}_{ij.})(\mathbf{R}_{ijk} -\mathbf{\bar{R}}_{ij.})'\).

These sums of squares and cross products matrices essentially constitute a nonparametric multivariate unweighted means analysis. The matrix notation above shows the pattern after which they can also be defined in higher-way layouts. To keep this manuscript concise, this extension to higher-way layouts is not carried out in detail here. Under the hypothesis \(\mathcal{H}^{\psi }\), the expectation of \(\mathbf{H}^{(\psi )}\) is equal to the expectation of G, which suggests the following way of constructing multivariate test statistics.
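As an illustration of how the mid-ranks and the unweighted-means matrices might be computed in practice, consider the following sketch. The helper names are hypothetical, and `cells[(i, j)]` is assumed to hold the \(n_{ij} \times p\) data matrix of cell (i, j); the sketch implements the summation forms of \(\mathbf{H}^{(A)}\), \(\mathbf{H}^{(AB)}\), and G given above, and the remaining H matrices follow the same pattern.

```python
import numpy as np
from scipy.stats import rankdata

def midranks(cells):
    """Variable-wise mid-ranks among all N observations; `cells[(i, j)]`
    is an (n_ij, p) array. Returns a dict of equally shaped rank arrays."""
    keys = sorted(cells)
    pooled = np.vstack([cells[k] for k in keys])            # (N, p)
    ranks = np.column_stack([rankdata(pooled[:, l])         # mid-ranks
                             for l in range(pooled.shape[1])])
    out, pos = {}, 0
    for k in keys:
        n = cells[k].shape[0]
        out[k] = ranks[pos:pos + n]
        pos += n
    return out

def h_and_g(R, a, b):
    """Unweighted-means H^(A), H^(AB), and G from cell-wise rank arrays."""
    p = next(iter(R.values())).shape[1]
    Rbar = {(i, j): R[(i, j)].mean(axis=0) for i in range(a) for j in range(b)}
    Ri = {i: np.mean([Rbar[(i, j)] for j in range(b)], axis=0) for i in range(a)}
    Rj = {j: np.mean([Rbar[(i, j)] for i in range(a)], axis=0) for j in range(b)}
    Rg = np.mean(list(Rbar.values()), axis=0)
    HA = sum(np.outer(Ri[i] - Rg, Ri[i] - Rg)
             for i in range(a) for j in range(b)) / (a - 1)
    HAB = sum(np.outer(Rbar[(i, j)] - Ri[i] - Rj[j] + Rg,
                       Rbar[(i, j)] - Ri[i] - Rj[j] + Rg)
              for i in range(a) for j in range(b)) / ((a - 1) * (b - 1))
    G = np.zeros((p, p))
    for (i, j), Rij in R.items():
        S = np.cov(Rij, rowvar=False)       # S_ij with (n_ij - 1) denominator
        G += S / Rij.shape[0]
    return HA, HAB, G / (a * b)
```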

Let ψ be one of the effects under consideration: AB, A | B, A, B, or B | A. We propose the following multivariate test statistics for testing \(\mathcal{H}^{\psi }\).

  (a) Dempster’s ANOVA type criterion: \(T_{\mathrm{D}}^{(\psi )} =\mathrm{ tr}(\mathbf{H}^{(\psi )})/\mathrm{tr}(\mathbf{G})\).

  (b) Wilks’ Λ criterion: \(T_{\mathrm{LR}}^{(\psi )} =\log \vert \mathbf{I} + \mathbf{H}^{(\psi )}\mathbf{G}^{-}\vert \).

  (c) The Lawley-Hotelling criterion: \(T_{\mathrm{LH}}^{(\psi )} =\mathrm{ tr}(\mathbf{H}^{(\psi )}\mathbf{G}^{-})\).

  (d) The Bartlett-Nanda-Pillai criterion: \(T_{\mathrm{BNP}}^{(\psi )} =\mathrm{ tr}\left (\mathbf{H}^{(\psi )}\mathbf{G}^{-}(\mathbf{I} + \mathbf{H}^{(\psi )}\mathbf{G}^{-})^{-}\right )\).

These test statistics are similar to the four test statistics considered in Harrar and Bathke (2012) in the context of a two-factor semiparametric MANOVA under heteroscedasticity. Their use in this manuscript is distinct in two important ways. First, in the present article, the sums of squares and cross products \(\mathbf{H}^{(\psi )}\) and G are computed from the ranks, which cannot be assumed to be independent across subjects. Second, due to the discreteness of the rankings, it may not be reasonable to assume non-singularity of the matrices G and \(\mathbf{H}^{(\psi )} + \mathbf{G}\). Thus, we use Moore-Penrose generalized inverses in defining the test statistics. The Moore-Penrose generalized inverse has the useful continuity property (Schott 2005, Sect. 5.7; for a proof see, e.g., Penrose 1955).
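Once \(\mathbf{H}^{(\psi )}\) and G are available, the four criteria are straightforward to compute. The following is a minimal sketch using numpy's Moore-Penrose pseudoinverse; writing the Bartlett-Nanda-Pillai statistic with a generalized inverse of \(\mathbf{I} + \mathbf{H}^{(\psi )}\mathbf{G}^{-}\) reflects our reading of the criterion above.

```python
import numpy as np

def criteria(H, G):
    """Dempster, Wilks, Lawley-Hotelling, and Bartlett-Nanda-Pillai type
    statistics, with Moore-Penrose inverses as discussed in the text."""
    Gm = np.linalg.pinv(G)
    HG = H @ Gm
    I = np.eye(H.shape[0])
    T_D = np.trace(H) / np.trace(G)                 # Dempster ANOVA type
    T_LR = np.log(np.linalg.det(I + HG))            # Wilks' Lambda type
    T_LH = np.trace(HG)                             # Lawley-Hotelling
    T_BNP = np.trace(HG @ np.linalg.pinv(I + HG))   # Bartlett-Nanda-Pillai
    return T_D, T_LR, T_LH, T_BNP
```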

7.3 Asymptotic Results

For the asymptotic derivations in this section, we will assume that a →∞, while b and all \(n_{ij}\) remain bounded. The asymptotics are somewhat involved, as the quadratic forms \(\mathbf{H}^{(\psi )}\) and G are based on a matrix of ranks R which exhibits both row-wise and column-wise dependence.

For the mathematical derivations in the technical proofs of this manuscript, it is convenient to use the so-called “asymptotic rank transforms” (ART) and “rank transforms” (RT). They are formally introduced in the following definition. For the concept of ART, see also Brunner et al. (2002, p. 77).

Definition 7.1.

Let \(\mathbf{X}_{ijk} = (X_{ijk}^{(1)},\ldots,X_{ijk}^{(p)})',\ i = 1,\ldots,a,\ j = 1,\ldots,b\), and \(k = 1,\ldots,n_{ij}\), be independent random vectors with possibly dependent components \(X_{ijk}^{(l)}\) whose marginal distribution is \(F_{ij}^{(l)},\ l = 1,\ldots,p\). Let \(N =\sum _{ i=1}^{a}\sum _{j=1}^{b}n_{ij}\). Further let

$$\displaystyle{H^{(l)}(x) = \frac{1} {N}\sum _{i=1}^{a}\sum _{ j=1}^{b}n_{ ij}F_{ij}^{(l)}(x)}$$

denote the average cdf for variable (l),

$$\displaystyle{\hat{H} ^{(l)}(x) = \frac{1} {N}\sum _{i=1}^{a}\sum _{ j=1}^{b}\sum _{ k=1}^{n_{ij} }c(x - X_{ijk}^{(l)}),}$$

where \(c(t) = 0,1/2,1\) if t < 0, t = 0, t > 0, respectively, denote the average empirical cdf. Further, let \(\mathbf{Y} = (\mathbf{Y}_{1},\ldots,\mathbf{Y}_{a})\), where \(\mathbf{Y}_{i} = (\mathbf{Y}_{i1},\ldots,\mathbf{Y}_{ib})\), \(\mathbf{Y}_{ij} = (\mathbf{Y}_{ij1},\ldots,\mathbf{Y}_{ijn_{ij}})\), and \(\mathbf{Y}_{ijk} = (Y _{ijk}^{(1)},\ldots,Y _{ijk}^{(p)})'\); the quantity \(Y _{ijk}^{(l)} = H^{(l)}(X_{ijk}^{(l)})\) is known as the asymptotic rank transform (ART) of \(X_{ijk}^{(l)}\). The matrix of rank transforms (RT), \(\hat{\mathbf{Y}}\), is defined analogously, with elements \(\hat{Y } _{ijk}^{(l)} = \hat{H} ^{(l)}(X_{ijk}^{(l)})\).

The expression “rank transform” pays tribute to the fact that \(\hat{Y }_{ijk}^{(l)}\) is related to the (mid-)rank R ijk (l) by \(\hat{Y } _{ijk}^{(l)} = N^{-1}(R_{ijk}^{(l)} -\frac{1} {2})\). However, the “asymptotic rank transforms” are technically more tractable than the “rank transforms”, due to the simpler covariance structure of Y as compared to \(\hat{\mathbf{Y}}\). Note that the ART of independent random variables are independent, but the RT are not.
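The identity \(\hat{Y } _{ijk}^{(l)} = N^{-1}(R_{ijk}^{(l)} -\frac{1} {2})\) is easy to verify numerically. The following sketch (with illustrative helper names of our own) computes the rank transforms from mid-ranks and checks them against a direct evaluation of the averaged empirical cdf with the normalized counting function c.

```python
import numpy as np
from scipy.stats import rankdata

def rank_transform(x):
    """Rank transforms Yhat = (R - 1/2) / N for one variable, where R are
    the mid-ranks among all N values of that variable."""
    return (rankdata(x) - 0.5) / len(x)

def h_hat(x, grid):
    """Averaged empirical cdf with c(t) = 0, 1/2, 1 for t < 0, = 0, > 0."""
    return np.array([(np.sum(t > x) + 0.5 * np.sum(t == x)) / len(x)
                     for t in grid])

x = np.array([2.0, 5.0, 5.0, 1.0])          # small example with a tie
assert np.allclose(rank_transform(x), h_hat(x, x))
```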

Denote \(\mathrm{Var}(\mathbf{Y}_{ij1}) = \boldsymbol{\varSigma }_{ij}\) and assume that the following limit exists:

$$\displaystyle{\lim \limits _{a\rightarrow \infty } \frac{1} {ab}\sum _{i=1}^{a}\sum _{ j=1}^{b}\boldsymbol{\varSigma }_{ ij} = \boldsymbol{\varSigma }.}$$

For later use, we also introduce the notation \(\mathbf{M} = (\boldsymbol{\mu }_{1},\boldsymbol{\mu }_{2},\ldots,\boldsymbol{\mu }_{a})\), \(\boldsymbol{\mu }_{i} = (\boldsymbol{\mu }_{i1},\ldots,\boldsymbol{\mu }_{ib})\), \(\boldsymbol{\mu }_{ij} = (\boldsymbol{\mu }_{ij1},\ldots,\boldsymbol{\mu }_{ijn_{ij}})\), where \(\boldsymbol{\mu }_{ijk} = (\mu _{ijk}^{(1)},\ldots,\mu _{ijk}^{(p)})'\) is the vector of expectations of the ART vector \(\mathbf{Y}_{ijk}\), that is, \(\mu _{ijk}^{(l)} = E(Y _{ijk}^{(l)})\), and \(\mathbf{Y}_{\mu } = \mathbf{Y} -\mathbf{M},\ \hat{\mathbf{Y}} _{\mu } = \hat{\mathbf{Y}}-\mathbf{M}\).

For ψ ∈ { A, B, A | B, B | A, AB}, we denote the ART analogs of the matrices H (ψ) and G defined in Sect. 7.2 by \(\tilde{\mathbf{H}}^{(\psi )}\) and \(\tilde{\mathbf{G}}\), respectively. In order to prove asymptotic normality results for the rank-based test statistics considered in this paper, we need to first establish the asymptotic equivalence of certain quadratic forms defined in terms of (H (ψ), G) (based on “rank transforms”) and the corresponding quadratic forms defined in terms of \((\tilde{\mathbf{H}}^{(\psi )},\tilde{\mathbf{G}})\) (based on “asymptotic rank transforms”).

We begin this task by showing the asymptotic equivalence between certain matrix differences in “rank transforms” and the corresponding ones in “asymptotic rank transforms”. Recall that the ranks \(R_{ijk}^{(l)}\) take values in [1, N], while the “rank transforms” \(\hat{Y }_{ijk}^{(l)} = N^{-1}(R_{ijk}^{(l)} -\frac{1} {2})\) and the “asymptotic rank transforms” \(Y _{ijk}^{(l)}\) take values within the unit interval, making it necessary to divide the rank-based matrices by \(N^{2}\) in order to be able to establish asymptotic equivalence.

Proposition 7.1.

Assume that b, p, and the \(n_{ij}\) are bounded. Then

  (i) Under the hypothesis \(\mathcal{H}^{\psi }\) for ψ ∈ {A, A|B, AB},

    $$\displaystyle{\sqrt{a}\ \left \{ \frac{1} {N^{2}}(\mathbf{H}^{(\psi )} -\mathbf{G}) - (\tilde{\mathbf{H}}^{(\psi )} -\tilde{\mathbf{G}})\right \} = o_{ p}(1)\quad \text{ as }a \rightarrow \infty \ .}$$

  (ii) Under the hypothesis \(\mathcal{H}^{\psi }\) for ψ ∈ {B, B|A},

    $$\displaystyle{ \frac{1} {N^{2}}\mathbf{H}^{(\psi )} -\tilde{\mathbf{H}}^{(\psi )} = o_{ p}(1)\quad \text{ as }a \rightarrow \infty \ ,}$$

    and

  (iii) \(N^{-2}\mathbf{G} -\tilde{\mathbf{G}} = o_{p}(1)\) as a →∞.

Proof.

The proof can be established using the same techniques as in the proof of Theorem 4 in Harrar and Bathke (2008).

The following proposition asserts that the difference \(N^{-2}\mathbf{G} -\boldsymbol{\varSigma }\) is asymptotically (a →∞) stochastically negligible.

Proposition 7.2.

Assume that the n ij are bounded. Then \(N^{-2}\mathbf{G} -\boldsymbol{\varSigma }\stackrel{p}{\rightarrow }0\) as a →∞.

Proof.

Since \(N^{-2}\mathbf{G} -\tilde{\mathbf{G}} = o_{p}(1)\) by (iii) of Proposition 7.1, it suffices to show that \(\tilde{\mathbf{G}} -\boldsymbol{\varSigma }\stackrel{p}{\rightarrow }0\). This follows from Theorem 1 of Harrar and Bathke (2012) if we show that \(\sum _{i=1}^{a}\sum _{j=1}^{b}n_{ij}^{-2}(n_{ij} - 1)^{-1}\boldsymbol{\varSigma }_{ij} \otimes \boldsymbol{\varSigma }_{ij} = o(a^{2})\) and \(\sum _{i=1}^{a}\sum _{j=1}^{b}n_{ij}^{-3}K_{4}(\mathbf{Y}_{ij1}) = o(a^{2})\) as a →∞. Both conditions follow from the fact that the components of \(\mathbf{Y}_{ij1}\) are uniformly bounded random variables.

Next, we obtain the asymptotic null distributions of the four test statistics for testing the main, simple, and interaction effects. Since the results for testing \(\mathcal{H}^{(AB)}\), \(\mathcal{H}^{(A)}\), \(\mathcal{H}^{(A\vert B)}\) and \(\mathcal{H}^{(B\vert A)}\) are similar in form and their derivations proceed along the same lines, we consider them together.

We know from Proposition 7.2 that \(N^{-2}\mathbf{G} -\boldsymbol{\varSigma } = o_{p}(1)\) as a →∞, and it is established in Theorem 7.1 below that \(\sqrt{a}\,\mathrm{tr}(\mathbf{H}^{(\psi ) } -\mathbf{G})\boldsymbol{\varOmega } = O_{p}(1)\) as a →∞, for any matrix of constants \(\boldsymbol{\varOmega }\). All four test statistics, scaled and centered suitably, can be expressed as

$$\displaystyle{ \sqrt{a}\ (\ell T_{\mathcal{G}}^{(\psi )} - h) = \sqrt{a}\ \mathrm{tr}(\mathbf{H}^{(\psi )} -\mathbf{G})\boldsymbol{\varOmega } + o_{ p}(1), }$$
(7.2)

where \(\ell = 1,\ 2,\ 1,\ 4\), \(h = 1,\ 2p\log 2,\ p,\ 2p\), and \(\boldsymbol{\varOmega } = (1/\mathrm{tr}\,\boldsymbol{\varSigma })\mathbf{I}_{p},\ \boldsymbol{\varSigma }^{-},\ \boldsymbol{\varSigma }^{-},\ \boldsymbol{\varSigma }^{-}\) for \(\mathcal{G} =\mathrm{ D},\mathrm{LR},\mathrm{LH},\mathrm{BNP}\), respectively (see Harrar and Bathke 2012 for more details). Therefore, the null distributions of the four test statistics can be derived in a unified manner by obtaining the null distribution of \(\sqrt{a}\ \mathrm{tr}(\mathbf{H}^{(\psi ) } -\mathbf{G})\boldsymbol{\varOmega }\) for any fixed matrix \(\boldsymbol{\varOmega }\). The null distribution of this latter quantity is given in Theorem 7.1.
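In this unified form, the centered and scaled statistics of (7.2) are immediate to compute. The sketch below is one possible reading, reusing the hypothetical `criteria` helper from Sect. 7.2 and plugging in the constants \((\ell,h)\) listed above.

```python
import numpy as np

def centered_statistics(H, G, a):
    """sqrt(a) * (l * T - h) for the four criteria, with (l, h) equal to
    (1, 1), (2, 2p log 2), (1, p), (4, 2p) for D, LR, LH, BNP."""
    p = G.shape[0]
    T_D, T_LR, T_LH, T_BNP = criteria(H, G)   # sketch from Sect. 7.2
    root_a = np.sqrt(a)
    return {"D":   root_a * (T_D - 1),
            "LR":  root_a * (2 * T_LR - 2 * p * np.log(2)),
            "LH":  root_a * (T_LH - p),
            "BNP": root_a * (4 * T_BNP - 2 * p)}
```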

Theorem 7.1.

Let ψ = AB, A, A|B, or B|A. Under the hypothesis \(\mathcal{H}_{0}^{(\psi )}\), \(\sqrt{a}\ \mathrm{tr}(\mathbf{H}^{(\psi ) } -\mathbf{G})\boldsymbol{\varOmega }\stackrel{\mathcal{L}}{\rightarrow }N\left (0,\tau _{\psi }^{2}(\boldsymbol{\varOmega })\right )\) as a →∞ with \(n_{ij}\) and b bounded, where

$$\displaystyle{\tau _{\psi }^{2}(\boldsymbol{\varOmega }) = \left \{\begin{array}{@{}l@{\quad }l@{}} \frac{2} {b}\left \{v_{1}(\boldsymbol{\varOmega }) + \frac{v_{2}(\boldsymbol{\varOmega })} {(b-1)^{2}} \right \} \quad &\text{when}\quad \psi = AB, \\ \frac{2} {b}\left \{v_{1}(\boldsymbol{\varOmega }) + v_{2}(\boldsymbol{\varOmega })\right \} \quad &\text{when}\quad \psi = A, \\ \frac{2} {b}v_{1}(\boldsymbol{\varOmega }) \quad &\text{when}\quad \psi = A\vert B, \\ \frac{2} {b^{2}} \left \{v_{1}(\boldsymbol{\varOmega }) + \frac{v_{2}(\boldsymbol{\varOmega })} {(b-1)^{2}} \right \}\quad &\text{when}\quad \psi = B\vert A.\\ \quad \end{array} \right.}$$

Here,

$$\displaystyle{v_{1}(\boldsymbol{\varOmega }) =\lim _{a\rightarrow \infty } \frac{1} {ab}\sum _{i=1}^{a}\sum _{ j=1}^{b} \frac{\mathrm{tr}(\boldsymbol{\varOmega }\boldsymbol{\varSigma }_{ij})^{2}} {n_{ij}(n_{ij} - 1)},}$$

and

$$\displaystyle{v_{2}(\boldsymbol{\varOmega }) =\lim _{a\rightarrow \infty } \frac{1} {ab}\sum _{i=1}^{a}\sum _{ j\neq j'}^{b}\frac{\mathrm{tr}(\boldsymbol{\varOmega }\boldsymbol{\varSigma }_{ij}\boldsymbol{\varOmega }\boldsymbol{\varSigma }_{ij'})} {n_{ij}n_{ij'}},}$$

assuming the limits exist.

Proof.

Considering Proposition 7.1, it is enough to show that \(N^{-2}\sqrt{a}\ \mathrm{tr}(\mathbf{H}^{(\psi )} -\mathbf{G})\boldsymbol{\varOmega }\stackrel{\mathcal{L}}{\rightarrow }N\left (0,\tau _{\psi }^{2}(\boldsymbol{\varOmega })\right )\) as a →∞ with \(n_{ij}\) and b bounded. This follows from Theorem 2 of Harrar and Bathke (2012) if, for some δ > 0, \(E\vert (\mathbf{Y}_{ij1} -\frac{1} {2}\mathbf{1})'\boldsymbol{\varSigma }_{ij}^{-1}(\mathbf{Y}_{ ij1} -\frac{1} {2}\mathbf{1})\vert ^{2+\delta } < \infty \) and

$$\displaystyle\begin{array}{rcl} & & \lim _{a\rightarrow \infty }\frac{1} {a}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b} \frac{1} {n_{ij}^{1+\delta /2}(n_{ij} - 1)^{1+\delta /2}}\mathrm{tr}(\boldsymbol{\varOmega }\boldsymbol{\varSigma }_{ij})^{2+\delta } < \infty \text{ and } {}\\ & & \qquad \qquad \qquad \qquad \qquad \lim _{a\rightarrow \infty }\frac{1} {a}\sum \limits _{i=1}^{a}\sum \limits _{ j\neq j'}^{b} \frac{1} {n_{ij}^{1+\delta /2}n_{ij'}^{1+\delta /2}}\mathrm{tr}(\boldsymbol{\varOmega }\boldsymbol{\varSigma }_{ij}\boldsymbol{\varOmega }\boldsymbol{\varSigma }_{ij'})^{1+\delta /2} < \infty. {}\\ \end{array}$$

Recalling again that the components of Y ij1 are bounded random variables completes the proof.

Under the assumptions and notations of Theorem 7.1, the asymptotic distribution of Dempster’s ANOVA type criterion can be obtained by setting \(\boldsymbol{\varOmega }= (1/\mathrm{tr}\boldsymbol{\varSigma })\mathbf{I}_{p}\). For the other three criteria, we set \(\boldsymbol{\varOmega }=\boldsymbol{\varSigma } ^{-1}\) to get the asymptotic null distributions.

Needless to say, the asymptotic null distributions of T LR, T LH and T BNP, scaled and centered as in (7.2), are the same up to the order \(O(a^{-1/2})\). A comparison of the asymptotic variances in Theorem 7.1 reveals that the test statistic for the interaction effect has smaller variance than that for the main effect of A. We also see from the asymptotic variances in Theorem 7.1 that the test statistic for the simple effect of A has smaller variance than those for both the interaction and the main effect.

7.4 Consistent Variance and Covariance Matrix Estimation

Multivariate data in factorial designs present a major technical difficulty concerning the derivation of valid nonparametric test statistics: unlike in the multivariate one-way design discussed in Harrar and Bathke (2008), the covariance matrices do not simplify under the null hypotheses considered here. Therefore, devising consistent variance estimators is more complicated.

The following theorem provides an asymptotic result formulated in terms of the unobservable “asymptotic rank transforms”. The expression is analogous to the variance estimator defined in Theorem 2.3 of Harrar and Bathke (2012) in the semiparametric context. However, since the “asymptotic rank transforms” are by definition bounded between 0 and 1, it is not necessary to require a moment condition as in Harrar and Bathke (2012).

Theorem 7.2.

Let the model and assumptions be as in Theorem  7.1. Define

$$\displaystyle\begin{array}{rcl} \widetilde{\boldsymbol{\varPsi }}_{ij}(\boldsymbol{\varOmega })& =& \frac{1} {4c_{ij}}\sum \limits _{(k_{1},k_{2},k_{3},k_{4})\in \mathcal{K}}^{n_{ij} }\boldsymbol{\varOmega }(\mathbf{Y}_{ijk_{1}} -\mathbf{Y}_{ijk_{2}})(\mathbf{Y}_{ijk_{1}} -\mathbf{Y}_{ijk_{2}})' {}\\ & & \times \boldsymbol{\varOmega }(\mathbf{Y}_{ijk_{3}} -\mathbf{Y}_{ijk_{4}})(\mathbf{Y}_{ijk_{3}} -\mathbf{Y}_{ijk_{4}})', {}\\ \end{array}$$

where \(\mathcal{K}\) is the set of all quadruples \(\kappa = (k_{1},k_{2},k_{3},k_{4})\) with pairwise distinct entries, and \(c_{ij} = n_{ij}(n_{ij} - 1)(n_{ij} - 2)(n_{ij} - 3)\) . Also, define

$$\displaystyle{\tilde{\mathbf{S}}_{ij} = \frac{1} {(n_{ij} - 1)}\sum \limits _{k=1}^{n_{ij} }(\mathbf{Y}_{ijk} -\mathbf{\bar{Y}}_{ij.})(\mathbf{Y}_{ijk} -\mathbf{\bar{Y}}_{ij.})'.}$$

Then,

$$\displaystyle{ \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b} \frac{1} {n_{ij}(n_{ij} - 1)}\mathrm{tr}(\widetilde{\boldsymbol{\varPsi }}_{ij}(\boldsymbol{\varOmega })) - \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b} \frac{1} {n_{ij}(n_{ij} - 1)}\mathrm{tr}(\boldsymbol{\varOmega }\boldsymbol{\varSigma }_{ij})^{2} = o_{ p}(1),}$$

and

$$\displaystyle{ \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j\neq j'}^{b} \frac{1} {n_{ij}n_{ij'}}\mathrm{tr}(\boldsymbol{\varOmega }\tilde{\mathbf{S}}_{ij}\boldsymbol{\varOmega }\tilde{\mathbf{S}}_{ij'}) - \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j\neq j'}^{b} \frac{1} {n_{ij}n_{ij'}}\mathrm{tr}(\boldsymbol{\varOmega }\boldsymbol{\varSigma }_{ij}\boldsymbol{\varOmega }\boldsymbol{\varSigma }_{ij'}) = o_{p}(1),}$$

as a →∞.

The proof is similar to that of Theorem 2.3 in Harrar and Bathke (2012); alternatively, the result follows from the theory of U-statistics (see, e.g., Serfling 1980).
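For the small cell sizes considered here, the quadruple sum defining \(\widetilde{\boldsymbol{\varPsi }}_{ij}(\boldsymbol{\varOmega })\) can be evaluated by brute force. A minimal sketch follows (the helper name is our own; `Y` holds the \(n_{ij}\) transform vectors of one cell as rows, and \(n_{ij} \geq 4\) is required):

```python
import numpy as np
from itertools import permutations

def psi_tilde(Y, Omega):
    """U-statistic Psi_ij(Omega) of Theorem 7.2 for one cell; Y is the
    (n_ij, p) array of (asymptotic) rank transforms, n_ij >= 4."""
    n, p = Y.shape
    c = n * (n - 1) * (n - 2) * (n - 3)
    acc = np.zeros((p, p))
    # permutations() enumerates exactly the quadruples in K (all distinct)
    for k1, k2, k3, k4 in permutations(range(n), 4):
        d12 = Y[k1] - Y[k2]
        d34 = Y[k3] - Y[k4]
        acc += Omega @ np.outer(d12, d12) @ Omega @ np.outer(d34, d34)
    return acc / (4 * c)
```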

Since the “variance estimator” presented in the previous theorem is not observable and therefore cannot be used in practice, the next two theorems introduce observable rank-based estimators and establish their asymptotic equivalence to the corresponding expressions formulated in terms of the “asymptotic rank transforms”.

Theorem 7.3.

Let \(\widetilde{\boldsymbol{\varPsi }}_{ij}(\boldsymbol{\varOmega })\) be defined as in Theorem  7.2. Define \(\widehat{\boldsymbol{\varPsi }}_{ij}(\boldsymbol{\varOmega })\) analogously, but using rank transforms instead of asymptotic rank transforms (see Definition  7.1). Then,

$$\displaystyle{ D = \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b} \frac{1} {n_{ij}(n_{ij} - 1)}\mathrm{tr}(\widehat{\boldsymbol{\varPsi }}_{ij}(\boldsymbol{\varOmega }))-\frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b} \frac{1} {n_{ij}(n_{ij} - 1)}\mathrm{tr}(\widetilde{\boldsymbol{\varPsi }}_{ij}(\boldsymbol{\varOmega })) = o_{p}(1), }$$

as a →∞.

Proof.

Without loss of generality, assume that \(\boldsymbol{\varOmega }= \mathbf{I}\). Define

$$\displaystyle\begin{array}{rcl} & & \tilde{D} = \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b} \frac{1} {n_{ij}(n_{ij} - 1)} \frac{1} {4c_{ij}} {}\\ & & \sum \limits _{(k_{1},k_{2},k_{3},k_{4})\in \mathcal{K}}^{n_{ij} }\big[(\hat{\mathbf{Y}} _{ijk_{1}} -\hat{\mathbf{Y}} _{ijk_{2}})(\hat{\mathbf{Y}} _{ijk_{1}} -\hat{\mathbf{Y}} _{ijk_{2}})' \otimes (\hat{\mathbf{Y}} _{ijk_{3}} -\hat{\mathbf{Y}} _{ijk_{4}})(\hat{\mathbf{Y}} _{ijk_{3}} -\hat{\mathbf{Y}} _{ijk_{4}})' {}\\ & & \qquad \qquad - (\mathbf{Y}_{ijk_{1}} -\mathbf{Y}_{ijk_{2}})(\mathbf{Y}_{ijk_{1}} -\mathbf{Y}_{ijk_{2}})' \otimes (\mathbf{Y}_{ijk_{3}} -\mathbf{Y}_{ijk_{4}})(\mathbf{Y}_{ijk_{3}} -\mathbf{Y}_{ijk_{4}})'\big], {}\\ \end{array}$$

where the \(c_{ij}\) are as defined in Theorem 7.2, and consider an arbitrary element of this \(p^{2} \times p^{2}\) matrix. Each element \(\tilde{D}_{q_{1},q_{2},q_{3},q_{4}}\) is uniquely determined by a combination of four indices \(q_{1},q_{2},q_{3},q_{4}\), where \(q_{r} = 1,\ldots,p\), \(r = 1,\ldots,4\). Then, we have

$$\displaystyle\begin{array}{rcl} & & \tilde{D}_{q_{1},q_{2},q_{3},q_{4}} = \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b} \frac{1} {n_{ij}(n_{ij} - 1)} \frac{1} {4c_{ij}}\sum \limits _{(k_{1},k_{2},k_{3},k_{4})\in \mathcal{K}}^{n_{ij} } {}\\ & & \qquad \qquad \big[(\hat{Y } _{ijk_{1}}^{(q_{1})} -\hat{Y } _{ ijk_{2}}^{(q_{1})})(\hat{Y } _{ ijk_{1}}^{(q_{2})} -\hat{Y } _{ ijk_{2}}^{(q_{2})})(\hat{Y } _{ ijk_{3}}^{(q_{3})} -\hat{Y } _{ ijk_{4}}^{(q_{3})})(\hat{Y } _{ ijk_{3}}^{(q_{4})} -\hat{Y } _{ ijk_{4}}^{(q_{4})}) {}\\ & & \qquad \qquad \qquad \quad - (Y _{ijk_{1}}^{(q_{1})} - Y _{ ijk_{2}}^{(q_{1})})(Y _{ ijk_{1}}^{(q_{2})} - Y _{ ijk_{2}}^{(q_{2})})(Y _{ ijk_{3}}^{(q_{3})} - Y _{ ijk_{4}}^{(q_{3})})(Y _{ ijk_{3}}^{(q_{4})} - Y _{ ijk_{4}}^{(q_{4})})\big] {}\\ & & = \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b} \frac{1} {n_{ij}(n_{ij} - 1)4c_{ij}}\sum \limits _{(k_{1},k_{2},k_{3},k_{4})\in \mathcal{K}}^{n_{ij} } {}\\ & & \qquad \quad \big[(\hat{H} ^{(q_{1})}(X_{ ijk_{1}}^{(q_{1})}) -\hat{H} ^{(q_{1})}(X_{ ijk_{2}}^{(q_{1})}))(\hat{H} ^{(q_{2})}(X_{ ijk_{1}}^{(q_{2})}) -\hat{H} ^{(q_{2})}(X_{ ijk_{2}}^{(q_{2})})) {}\\ & & \qquad \qquad \qquad \qquad \times (\hat{H} ^{(q_{3})}(X_{ ijk_{3}}^{(q_{3})}) -\hat{H} ^{(q_{3})}(X_{ ijk_{4}}^{(q_{3})}))(\hat{H} ^{(q_{4})}(X_{ ijk_{3}}^{(q_{4})}) -\hat{H} ^{(q_{4})}(X_{ ijk_{4}}^{(q_{4})})) {}\\ & & \qquad \qquad \qquad \qquad - (H^{(q_{1})}(X_{ ijk_{1}}^{(q_{1})}) - H^{(q_{1})}(X_{ ijk_{2}}^{(q_{1})}))(H^{(q_{2})}(X_{ ijk_{1}}^{(q_{2})}) - H^{(q_{2})}(X_{ ijk_{2}}^{(q_{2})})) {}\\ & & \qquad \qquad \qquad \qquad \times (H^{(q_{3})}(X_{ ijk_{3}}^{(q_{3})}) - H^{(q_{3})}(X_{ ijk_{4}}^{(q_{3})}))(H^{(q_{4})}(X_{ ijk_{3}}^{(q_{4})}) - H^{(q_{4})}(X_{ ijk_{4}}^{(q_{4})}))\big] {}\\ & & = \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j=1}^{b} \frac{1} {n_{ij}(n_{ij} - 1)4c_{ij}}\sum \limits _{(k_{1},k_{2},k_{3},k_{4})\in \mathcal{K}}^{n_{ij} } \frac{1} {N^{4}}\sum _{s_{1}=1}^{N}\sum _{ s_{2}=1}^{N}\sum _{ s_{3}=1}^{N}\sum _{ s_{4}=1}^{N} {}\\ & & \qquad \qquad \zeta (X_{ijk_{1}}^{(q_{1})},X_{ ijk_{2}}^{(q_{1})},X_{ ijk_{1}}^{(q_{2})},X_{ ijk_{2}}^{(q_{2})},X_{ ijk_{3}}^{(q_{3})},X_{ ijk_{4}}^{(q_{3})},X_{ ijk_{3}}^{(q_{4})},X_{ ijk_{4}}^{(q_{4})},X_{ s_{1}},X_{s_{2}},X_{s_{3}},X_{s_{4}}), {}\\ & & \mbox{ where }\zeta (X_{ijk_{1}}^{(q_{1})},X_{ ijk_{2}}^{(q_{1})},X_{ ijk_{1}}^{(q_{2})},X_{ ijk_{2}}^{(q_{2})},X_{ ijk_{3}}^{(q_{3})},X_{ ijk_{4}}^{(q_{3})},X_{ ijk_{3}}^{(q_{4})},X_{ ijk_{4}}^{(q_{4})},X_{ s_{1}},X_{s_{2}},X_{s_{3}},X_{s_{4}}) {}\\ & & \qquad \qquad \quad =\Big ([c(X_{ijk_{1}}^{(q_{1})} - X_{ s_{1}}) - c(X_{ijk_{2}}^{(q_{1})} - X_{ s_{1}})][c(X_{ijk_{1}}^{(q_{2})} - X_{ s_{2}}) - c(X_{ijk_{2}}^{(q_{2})} - X_{ s_{2}})] {}\\ & & \qquad \qquad \qquad \quad \times [c(X_{ijk_{3}}^{(q_{3})} - X_{ s_{3}}) - c(X_{ijk_{4}}^{(q_{3})} - X_{ s_{3}})][c(X_{ijk_{3}}^{(q_{4})} - X_{ s_{4}}) - c(X_{ijk_{4}}^{(q_{4})} - X_{ s_{4}})] {}\\ & & \qquad \qquad \qquad \quad - [F_{s_{1}}(X_{ijk_{1}}^{(q_{1})}) - F_{ s_{1}}(X_{ijk_{2}}^{(q_{1})})][F_{ s_{2}}(X_{ijk_{1}}^{(q_{2})}) - F_{ s_{2}}(X_{ijk_{2}}^{(q_{2})})] {}\\ & & \qquad \qquad \qquad \quad \times [F_{s_{3}}(X_{ijk_{3}}^{(q_{3})}) - F_{ s_{3}}(X_{ijk_{4}}^{(q_{3})})][F_{ s_{4}}(X_{ijk_{3}}^{(q_{4})}) - F_{ s_{4}}(X_{ijk_{4}}^{(q_{4})})]\Big), {}\\ & & c(\cdot )\ \mbox{ denotes the normalized counting function }\ c(x) = \frac{1} {2}(I\{x > 0\} + I\{x \geq 0\}), {}\\ & & \mbox{ and }\ F_{t}\ \mbox{ denotes the cdf of }\ X_{t}. {}\\ \end{array}$$

Note that E(ζ) = 0 if all indices \(s_{1},s_{2},s_{3},s_{4}\) are different from each other and the corresponding random variables are independent of the other eight random variables. This holds because the first part of ζ, integrated over \((X_{s_{1}},X_{s_{2}},X_{s_{3}},X_{s_{4}})\), equals the second part. Therefore, \(E(\tilde{D}) \rightarrow 0\), since the number of \((s_{1},s_{2},s_{3},s_{4})\) index combinations resulting in nonzero expectation is of order \(N^{3}\), but the sum is divided by \(N^{4}\). Consider now

$$\displaystyle\begin{array}{rcl} & & \tilde{D}_{q_{1},q_{2},q_{3},q_{4}}^{2} = \frac{1} {a^{2}b^{2}}\sum \limits _{i_{1}=1}^{a}\sum \limits _{ i_{2}=1}^{a}\sum \limits _{ j_{1}=1}^{b}\sum \limits _{ j_{2}=1}^{b} \frac{1} {n_{i_{1}j_{1}}n_{i_{2}j_{2}}(n_{i_{1}j_{1}} - 1)(n_{i_{2}j_{2}} - 1)16c_{i_{1}j_{1}}c_{i_{2}j_{2}}} {}\\ & & \qquad \qquad \qquad \quad \sum \limits _{(k_{1},k_{2},k_{3},k_{4})\in \mathcal{K}}^{n_{i_{1}j_{1}} }\sum \limits _{(l_{1},l_{2},l_{3},l_{4})\in \mathcal{K}}^{n_{i_{2}j_{2}} } \frac{1} {N^{8}}\sum _{s_{1}=1}^{N}\sum _{ s_{2}=1}^{N}\sum _{ s_{3}=1}^{N}\sum _{ s_{4}=1}^{N}\sum _{ t_{1}=1}^{N}\sum _{ t_{2}=1}^{N}\sum _{ t_{3}=1}^{N}\sum _{ t_{4}=1}^{N} {}\\ & & \qquad \quad \zeta (X_{i_{1}j_{1}k_{1}}^{(q_{1})},X_{ i_{1}j_{1}k_{2}}^{(q_{1})},X_{ i_{1}j_{1}k_{1}}^{(q_{2})},X_{ i_{1}j_{1}k_{2}}^{(q_{2})},X_{ i_{1}j_{1}k_{3}}^{(q_{3})},X_{ i_{1}j_{1}k_{4}}^{(q_{3})},X_{ i_{1}j_{1}k_{3}}^{(q_{4})},X_{ i_{1}j_{1}k_{4}}^{(q_{4})},X_{ s_{1}},X_{s_{2}},X_{s_{3}},X_{s_{4}}) {}\\ & & \times \zeta (X_{i_{2}j_{2}l_{1}}^{(q_{1})},X_{ i_{2}j_{2}l_{2}}^{(q_{1})},X_{ i_{2}j_{2}l_{1}}^{(q_{2})},X_{ i_{2}j_{2}l_{2}}^{(q_{2})},X_{ i_{2}j_{2}l_{3}}^{(q_{3})},X_{ i_{2}j_{2}l_{4}}^{(q_{3})},X_{ i_{2}j_{2}l_{3}}^{(q_{4})},X_{ i_{2}j_{2}l_{4}}^{(q_{4})},X_{ t_{1}},X_{t_{2}},X_{t_{3}},X_{t_{4}}). {}\\ \end{array}$$

Again, when all involved random variables with indices \(s_{1},s_{2},s_{3},s_{4},t_{1},t_{2},t_{3},t_{4}\) are independent of each other, and of the remaining random variables, the expectation of each ζ-function is zero, and therefore so is the expectation of the product. Similarly to above, this can be seen by first integrating over the random variables with indices \((s_{1},s_{2},s_{3},s_{4})\), conditional on those with indices \((k_{1},k_{2},k_{3},k_{4})\). The number of cases with nonzero expectation is again of smaller order, in this case \(N^{7}\), while division is by \(N^{8}\). It follows that \(E(\tilde{D}_{q_{1},q_{2},q_{3},q_{4}}^{2}) \rightarrow 0\) and therefore \(\tilde{D}_{q_{1},q_{2},q_{3},q_{4}} = o_{p}(1)\) for each element of \(\tilde{D}\), which proves \(\tilde{D} = o_{p}(1)\).

Theorem 7.4.

Let \(\tilde{\mathbf{S}}_{ij}\) be defined as in Theorem  7.2, and define \(\widehat{\mathbf{S}}_{ij}\) analogously, but using rank transforms instead of asymptotic rank transforms. Then,

$$\displaystyle{K = \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j\neq j'}^{b} \frac{1} {n_{ij}n_{ij'}}\mathrm{tr}(\boldsymbol{\varOmega }\widehat{\mathbf{S}}_{ij}\boldsymbol{\varOmega }\widehat{\mathbf{S}}_{ij'}) - \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j\neq j'}^{b} \frac{1} {n_{ij}n_{ij'}}\mathrm{tr}(\boldsymbol{\varOmega }\widetilde{\mathbf{S}}_{ij}\boldsymbol{\varOmega }\widetilde{\mathbf{S}}_{ij'}) = o_{p}(1),}$$

as a →∞.

Proof.

As in the proof of Theorem 7.3, assume without loss of generality that \(\boldsymbol{\varOmega }= \mathbf{I}\), and define

$$\displaystyle\begin{array}{rcl} & & \tilde{\mathbf{K}} = \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j\neq j'}^{b} \frac{1} {n_{ij}(1 - n_{ij})n_{ij'}(1 - n_{ij'})}\sum \limits _{k=1}^{n_{ij} }\sum \limits _{k'=1}^{n_{ij'} } {}\\ & & \big[(\hat{\mathbf{Y}} _{ijk} -\hat{\bar{\mathbf{Y}}} _{ij\cdot })(\hat{\mathbf{Y}} _{ijk} -\hat{\bar{\mathbf{Y}}} _{ij\cdot })' \otimes (\hat{\mathbf{Y}} _{ij'k'} -\hat{\bar{\mathbf{Y}}} _{ij'\cdot })(\hat{\mathbf{Y}} _{ij'k'} -\hat{\bar{\mathbf{Y}}} _{ij'\cdot })' {}\\ & & \qquad \quad - (\mathbf{Y}_{ijk} -\bar{\mathbf{Y}}_{ij\cdot })(\mathbf{Y}_{ijk} -\bar{\mathbf{Y}}_{ij\cdot })' \otimes (\mathbf{Y}_{ij'k'} -\bar{\mathbf{Y}}_{ij'\cdot })(\mathbf{Y}_{ij'k'} -\bar{\mathbf{Y}}_{ij'\cdot })'\big], {}\\ \end{array}$$

and consider again an arbitrary element \(\tilde{K}_{q_{1},q_{2},q_{3},q_{4}}\) of this \(p^{2} \times p^{2}\) matrix, determined by a combination of four indices \(q_{1},q_{2},q_{3},q_{4}\), where \(q_{r} = 1,\ldots,p\), \(r = 1,\ldots,4\).

$$\displaystyle\begin{array}{rcl} & & \tilde{K}_{q_{1},q_{2},q_{3},q_{4}} = \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j\neq j'}^{b} \frac{1} {n_{ij}(1 - n_{ij})n_{ij'}(1 - n_{ij'})}\sum \limits _{k=1}^{n_{ij} }\sum \limits _{k'=1}^{n_{ij'} } {}\\ & & \big[(\hat{Y } _{ijk}^{(q_{1})} -\hat{\bar{Y }} _{ ij\cdot }^{(q_{1})})\hat{Y } _{ ijk}^{(q_{2})}(\hat{Y } _{ ij'k'}^{(q_{3})} -\hat{\bar{Y }} _{ ij'\cdot }^{(q_{3})})\hat{Y } _{ ij'k'}^{(q_{4})} - (Y _{ ijk}^{(q_{1})} -\bar{ Y }_{ ij\cdot }^{(q_{1})})Y _{ ijk}^{(q_{2})}(Y _{ ij'k'}^{(q_{3})} -\bar{ Y }_{ ij'\cdot }^{(q_{3})})Y _{ ij'k'}^{(q_{4})}\big] {}\\ & & \qquad = \frac{1} {ab}\sum \limits _{i=1}^{a}\sum \limits _{ j\neq j'}^{b} \frac{1} {n_{ij}(1 - n_{ij})n_{ij'}(1 - n_{ij'})}\sum \limits _{k=1}^{n_{ij} }\sum \limits _{k'=1}^{n_{ij'} } {}\\ & & \qquad \qquad \quad \Big\{\big[\hat{Y } _{ijk}^{(q_{1})}\hat{Y } _{ ijk}^{(q_{2})}\hat{Y } _{ ij'k'}^{(q_{3})}\hat{Y } _{ ij'k'}^{(q_{4})} - Y _{ ijk}^{(q_{1})}Y _{ ijk}^{(q_{2})}Y _{ ij'k'}^{(q_{3})}Y _{ ij'k'}^{(q_{4})}\big] {}\\ & & \qquad \qquad \quad -\big [\hat{\bar{Y }} _{ij\cdot }^{(q_{1})}\hat{Y } _{ ijk}^{(q_{2})}(\hat{Y } _{ ij'k'}^{(q_{3})} -\hat{\bar{Y }} _{ ij'\cdot }^{(q_{3})})\hat{Y } _{ ij'k'}^{(q_{4})} -\bar{ Y }_{ ij\cdot }^{(q_{1})}Y _{ ijk}^{(q_{2})}(Y _{ ij'k'}^{(q_{3})} -\bar{ Y }_{ ij'\cdot }^{(q_{3})})Y _{ ij'k'}^{(q_{4})}\big] {}\\ & & \qquad \qquad \quad -\big [\hat{Y } _{ijk}^{(q_{1})}\hat{Y } _{ ijk}^{(q_{2})}\hat{\bar{Y }} _{ ij'\cdot }^{(q_{3})}\hat{Y } _{ ij'k'}^{(q_{4})} - Y _{ ijk}^{(q_{1})}Y _{ ijk}^{(q_{2})}\bar{Y }_{ ij'\cdot }^{(q_{3})}Y _{ ij'k'}^{(q_{4})}\big]\Big\}. {}\\ \end{array}$$

The terms in each of the three square brackets can be considered separately, using basically the same techniques for each. We show details of the proof for the first term.

$$\displaystyle\begin{array}{rcl} & & \hat{Y } _{ijk}^{(q_{1})}\hat{Y } _{ ijk}^{(q_{2})}\hat{Y } _{ ij'k'}^{(q_{3})}\hat{Y } _{ ij'k'}^{(q_{4})} - Y _{ ijk}^{(q_{1})}Y _{ ijk}^{(q_{2})}Y _{ ij'k'}^{(q_{3})}Y _{ ij'k'}^{(q_{4})} {}\\ & & \quad =\hat{ H}^{(q_{1})}(X_{ ijk}^{(q_{1})})\hat{H} ^{(q_{2})}(X_{ ijk}^{(q_{2})})\hat{H} ^{(q_{3})}(X_{ ij'k'}^{(q_{3})})\hat{H} ^{(q_{4})}(X_{ ij'k'}^{(q_{4})}) {}\\ & & \quad - H^{(q_{1})}(X_{ ijk}^{(q_{1})})H^{(q_{2})}(X_{ ijk}^{(q_{2})})H^{(q_{3})}(X_{ ij'k'}^{(q_{3})})H^{(q_{4})}(X_{ ij'k'}^{(q_{4})}) {}\\ & & \quad = \frac{1} {N^{4}}\sum _{s_{1}=1}^{N}\sum _{ s_{2}=1}^{N}\sum _{ s_{3}=1}^{N}\sum _{ s_{4}=1}^{N}\big[c(X_{ ijk}^{(q_{1})} - X_{ s_{1}})c(X_{ijk}^{(q_{2})} - X_{ s_{2}})c(X_{ij'k'}^{(q_{3})} - X_{ s_{3}})c(X_{ij'k'}^{(q_{4})} - X_{ s_{4}}) {}\\ & & \qquad \qquad \quad - F_{s_{1}}(X_{ijk}^{(q_{1})})F_{ s_{2}}(X_{ijk}^{(q_{2})})F_{ s_{3}}(X_{ij'k'}^{(q_{3})})F_{ s_{4}}(X_{ij'k'}^{(q_{4})})\big]. {}\\ \end{array}$$

Clearly, the expected value of this expression is 0 when all indices \(s_{1},s_{2},s_{3},s_{4}\) are different from each other and the corresponding random variables are independent of the other four random variables. This can be seen by integrating over \((X_{s_{1}},X_{s_{2}},X_{s_{3}},X_{s_{4}})\) first. The number of \((s_{1},s_{2},s_{3},s_{4})\) index combinations resulting in nonzero expectation is of order \(N^{3}\), while the sum is divided by \(N^{4}\). Using similar techniques for the remaining components of \(\tilde{K}_{q_{1},q_{2},q_{3},q_{4}}\), it follows that \(E(\tilde{K}_{q_{1},q_{2},q_{3},q_{4}}) \rightarrow 0\). Consider next

$$\displaystyle\begin{array}{rcl} & & \tilde{K}_{q_{1},q_{2},q_{3},q_{4}}^{2} = \frac{1} {a^{2}b^{2}}\sum \limits _{i_{1}=1}^{a}\sum \limits _{ i_{2}=1}^{a}\sum \limits _{ j_{1}\neq j_{1}'}^{b}\sum \limits _{ j_{2}\neq j_{2}'}^{b} {}\\ & & \qquad \qquad \frac{1} {n_{i_{1}j_{1}}n_{i_{2}j_{2}}(1 - n_{i_{1}j_{1}})(1 - n_{i_{2}j_{2}})n_{i_{1}j_{1}'}n_{i_{2}j_{2}'}(1 - n_{i_{1}j_{1}'})(1 - n_{i_{2}j_{2}'})} {}\\ & & \qquad \qquad \sum \limits _{k_{1}=1}^{n_{i_{1}j_{1}} }\sum \limits _{k_{2}=1}^{n_{i_{2}j_{2}} }\sum \limits _{k'=1}^{n_{i_{1}j_{1}'} }\sum \limits _{k_{2}'=1}^{n_{i_{2}j_{2}'} }\big[(\hat{Y } _{i_{1}j_{1}k_{1}}^{(q_{1})} -\hat{\bar{Y }} _{ i_{1}j_{1}\cdot }^{(q_{1})})\hat{Y } _{ i_{1}j_{1}k_{1}}^{(q_{2})}(\hat{Y } _{ i_{1}j_{1}'k_{1}'}^{(q_{3})} -\hat{\bar{Y }} _{ i_{1}j_{1}'\cdot }^{(q_{3})})\hat{Y } _{ i_{1}j_{1}'k_{1}'}^{(q_{4})} {}\\ & & \qquad \qquad \qquad \qquad \qquad \qquad - (Y _{i_{1}j_{1}k_{1}}^{(q_{1})} -\bar{ Y }_{ i_{1}j_{1}\cdot }^{(q_{1})})Y _{ i_{1}j_{1}k_{1}}^{(q_{2})}(Y _{ i_{1}j_{1}'k_{1}'}^{(q_{3})} -\bar{ Y }_{ i_{1}j_{1}'\cdot }^{(q_{3})})Y _{ i_{1}j_{1}'k_{1}'}^{(q_{4})}\big] {}\\ & & \qquad \qquad \qquad \quad \big[(\hat{Y } _{i_{2}j_{2}k_{2}}^{(q_{1})} -\hat{\bar{Y }} _{ i_{2}j_{2}\cdot }^{(q_{1})})\hat{Y } _{ i_{2}j_{2}k_{2}}^{(q_{2})}(\hat{Y } _{ i_{2}j_{2}'k_{2}'}^{(q_{3})} -\hat{\bar{Y }} _{ i_{2}j_{2}'\cdot }^{(q_{3})})\hat{Y } _{ i_{2}j_{2}'k_{2}'}^{(q_{4})} {}\\ & & \qquad \qquad \qquad \qquad \qquad \qquad - (Y _{i_{2}j_{2}k_{2}}^{(q_{1})} -\bar{ Y }_{ i_{2}j\cdot }^{(q_{1})})Y _{ i_{2}j_{2}k_{2}}^{(q_{2})}(Y _{ i_{2}j_{2}'k_{2}'}^{(q_{3})} -\bar{ Y }_{ i_{2}j_{2}'\cdot }^{(q_{3})})Y _{ i_{2}j_{2}'k_{2}'}^{(q_{4})}\big]. {}\\ \end{array}$$

The product of the square brackets can be decomposed into the following and similar terms, using the same decomposition as in the first part of this proof.

$$\displaystyle\begin{array}{rcl} & & \frac{1} {N^{8}}\sum _{s_{1}=1}^{N}\sum _{ s_{2}=1}^{N}\sum _{ s_{3}=1}^{N}\sum _{ s_{4}=1}^{N}\sum _{ t_{1}=1}^{N}\sum _{ t_{2}=1}^{N}\sum _{ t_{3}=1}^{N}\sum _{ t_{4}=1}^{N} {}\\ & & \qquad \quad \big[c(X_{i_{1}j_{1}k_{1}}^{(q_{1})} - X_{ s_{1}})c(X_{i_{1}j_{1}k_{1}}^{(q_{2})} - X_{ s_{2}})c(X_{i_{1}j_{1}'k_{1}'}^{(q_{3})} - X_{ s_{3}})c(X_{i_{1}j_{1}'k_{1}'}^{(q_{4})} - X_{ s_{4}}) {}\\ & & \qquad \quad \qquad \quad - F_{s_{1}}(X_{i_{1}j_{1}k_{1}}^{(q_{1})})F_{ s_{2}}(X_{i_{1}j_{1}k_{1}}^{(q_{2})})F_{ s_{3}}(X_{i_{1}j_{1}'k_{1}'}^{(q_{3})})F_{ s_{4}}(X_{i_{1}j_{1}'k_{1}'}^{(q_{4})})\big] {}\\ & & \qquad \quad \big[c(X_{i_{2}j_{2}k_{2}}^{(q_{1})} - X_{ t_{1}})c(X_{i_{2}j_{2}k_{2}}^{(q_{2})} - X_{ t_{2}})c(X_{i_{2}j_{2}'k_{2}'}^{(q_{3})} - X_{ t_{3}})c(X_{i_{2}j_{2}'k_{2}'}^{(q_{4})} - X_{ t_{4}}) {}\\ & & \qquad \quad \qquad \quad - F_{t_{1}}(X_{i_{2}j_{2}k_{2}}^{(q_{1})})F_{ t_{2}}(X_{i_{2}j_{2}k_{2}}^{(q_{2})})F_{ t_{3}}(X_{i_{2}j_{2}'k_{2}'}^{(q_{3})})F_{ t_{4}}(X_{i_{2}j_{2}'k_{2}'}^{(q_{4})})\big]. {}\\ \end{array}$$

As above, it can be seen that when all involved random variables with indices \(s_{1},s_{2},s_{3},s_{4},t_{1},t_{2},t_{3},t_{4}\) are independent of each other, and of the remaining random variables, the expectation of this expression is zero. The number of cases with nonzero expectation is of order \(N^{7}\), while division is by \(N^{8}\). A tedious calculation verifies that this is also the case for the remaining components of \(\tilde{K}_{q_{1},q_{2},q_{3},q_{4}}^{2}\). Thus, \(E(\tilde{K}_{q_{1},q_{2},q_{3},q_{4}}^{2}) \rightarrow 0\) and \(\tilde{K}_{q_{1},q_{2},q_{3},q_{4}} = o_{p}(1)\) for each element of \(\tilde{K}\), proving \(\tilde{K} = o_{p}(1)\).

The three previous theorems together establish the consistency of a rank-based estimator of the asymptotic variances. Aggregating these results and taking advantage of the results from Harrar and Bathke (2012), we can formulate Theorem 7.5.

Theorem 7.5.

Let ψ = AB, A, A|B, or B|A. Under the hypothesis \(\mathcal{H}_{0}^{(\psi )}\), \(\sqrt{a}\ \mathrm{tr}(\mathbf{H}^{(\psi ) } -\mathbf{G})\hat{\boldsymbol{\varOmega }}\hat{\tau }_{\psi }^{-1}(\hat{\boldsymbol{\varOmega }})\stackrel{\mathcal{L}}{\rightarrow }N\left (0,1\right )\) as a →∞ with \(n_{ij}\) and b bounded, where \(\hat{\boldsymbol{\varOmega }}\) is the consistent estimator of \(\boldsymbol{\varOmega }\) obtained by replacing \(\boldsymbol{\varSigma }\) with \(N^{-2}\mathbf{G}\) (see Proposition 7.2), and where

$$\displaystyle{\hat{\tau }_{\psi }^{2}(\hat{\boldsymbol{\varOmega }}) = \left \{\begin{array}{@{}l@{\quad }l@{}} \frac{2} {b}\left \{\hat{v}_{1}(\hat{\boldsymbol{\varOmega }}) + \frac{\hat{v}_{2}(\hat{\boldsymbol{\varOmega }})} {(b-1)^{2}} \right \} \quad &\text{when}\quad \psi = AB, \\ \frac{2} {b}\left \{\hat{v}_{1}(\hat{\boldsymbol{\varOmega }}) +\hat{ v}_{2}(\hat{\boldsymbol{\varOmega }})\right \} \quad &\text{when}\quad \psi = A, \\ \frac{2} {b}\hat{v}_{1}(\hat{\boldsymbol{\varOmega }}) \quad &\text{when}\quad \psi = A\vert B, \\ \frac{2} {b^{2}} \left \{\hat{v}_{1}(\hat{\boldsymbol{\varOmega }}) + \frac{\hat{v}_{2}(\hat{\boldsymbol{\varOmega }})} {(b-1)^{2}} \right \}\quad &\text{when}\quad \psi = B\vert A.\\ \quad \end{array} \right.}$$

Here, \(\hat{v}_{1}(\hat{\boldsymbol{\varOmega }}) = \frac{1} {ab}\sum _{i=1}^{a}\sum _{ j=1}^{b} \frac{\mathrm{tr}(\widehat{\boldsymbol{\varPsi }}_{ij}(\hat{\boldsymbol{\varOmega }}))} {n_{ij}(n_{ij}-1)}\) and \(\hat{v}_{2}(\hat{\boldsymbol{\varOmega }}) = \frac{1} {ab}\sum _{i=1}^{a}\sum _{ j\neq j'}^{b}\) \(\frac{\mathrm{tr}(\hat{\boldsymbol{\varOmega }}\hat{\mathbf{S}} _{ij}\hat{\boldsymbol{\varOmega }}\hat{\mathbf{S}} _{ij'})} {n_{ij}n_{ij'}}\).
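Putting the pieces together, the variance estimator of Theorem 7.5 might be computed as follows. This is a sketch with hypothetical names: `S_hat` and `Psi_hat` are assumed to hold the rank-based \(\hat{\mathbf{S}}_{ij}\) and \(\widehat{\boldsymbol{\varPsi }}_{ij}(\hat{\boldsymbol{\varOmega }})\), the latter obtainable, e.g., from the `psi_tilde` sketch above applied to rank transforms.

```python
import numpy as np

def tau_hat_sq(psi, S_hat, Psi_hat, n, a, b, Omega):
    """Variance estimator of Theorem 7.5; dicts are keyed by (i, j) with
    i in range(a), j in range(b), and n[(i, j)] holds the cell sizes."""
    v1 = np.mean([np.trace(Psi_hat[(i, j)]) / (n[(i, j)] * (n[(i, j)] - 1))
                  for i in range(a) for j in range(b)])
    v2 = sum(np.trace(Omega @ S_hat[(i, j)] @ Omega @ S_hat[(i, jp)])
             / (n[(i, j)] * n[(i, jp)])
             for i in range(a) for j in range(b)
             for jp in range(b) if jp != j) / (a * b)
    if psi == "AB":
        return 2 / b * (v1 + v2 / (b - 1) ** 2)
    if psi == "A":
        return 2 / b * (v1 + v2)
    if psi == "A|B":
        return 2 / b * v1
    if psi == "B|A":
        return 2 / b ** 2 * (v1 + v2 / (b - 1) ** 2)
    raise ValueError(psi)
```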

7.5 Simulation Study

In order to investigate the finite sample performance of the proposed inference methods, we conducted a simulation study under the exemplary setting of dimension p = 3, with the number of levels of factor A between a = 6 and a = 50, the number of levels of factor B set to b = 3, and sample sizes per cell of \(n_{ij} = 4\), 5, and 6. The underlying distributions chosen were normal and skew normal. The multivariate skew normal data were generated according to Proposition 6 in Azzalini and Dalla Valle (1996), where we used (in their notation) \(\boldsymbol{\delta }= \frac{\sqrt{2}} {\sqrt{p(p+1)+2}}(1,\ldots,p)'\) and \(\varOmega =\mathrm{ I}_{p} + \frac{1} {2\pi }\boldsymbol{\delta }\boldsymbol{\delta }'\).
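For reproducibility, skew normal variates can be generated from a classical stochastic representation with a shared half-normal factor; whether this matches the exact parametrization of Proposition 6 in Azzalini and Dalla Valle (1996) used here is an assumption of the following sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def skew_normal(n, delta, rng):
    """Multivariate skew-normal-type variates via the classical stochastic
    representation Z = delta * |X0| + X, X ~ N(0, I_p), X0 ~ N(0, 1)
    (cf. Azzalini and Dalla Valle 1996). Matching the chapter's exact
    parametrization is an assumption of this sketch."""
    x0 = np.abs(rng.standard_normal(n))          # shared half-normal factor
    x = rng.standard_normal((n, len(delta)))
    return delta * x0[:, None] + x

p = 3
delta = np.sqrt(2.0 / (p * (p + 1) + 2)) * np.arange(1, p + 1)  # as in the text
cell = skew_normal(4, delta, rng)                # one cell with n_ij = 4
```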

The results under the null hypothesis are shown in Fig. 7.1. As expected from a fully nonparametric rank-based approach, the underlying distribution does not have a major effect on the performance. In all cases considered, Wilks’ Λ type statistic performed best, in the sense of the simulated level being closest to the nominal level while not exceeding it.

Fig. 7.1 Simulated α under the null hypothesis for p = 3, b = 3, a = 6 to 50, \(n_{ij}\) = 4, 5, 6. Normal and skew normal underlying data distributions, nominal α = 5 %. Main effect of A and interaction between A and B

Due to its best performance under the null hypothesis, Wilks’ Λ type test statistic was selected for a power simulation. Here, the power of the statistic based on variable-wise ranks, as proposed in the present article, was compared to that of the analogous procedure using the original observations instead of the ranks (justified by Harrar and Bathke 2012). While there were no visible differences for underlying normal distributions, the power gain of the nonparametric rank-based method became quite pronounced when the underlying distribution was chosen as contaminated normal. Figure 7.2 shows simulation results for an exemplary situation with heteroscedastic contaminated multivariate normal distributions \(0.9N_{3}(\mathbf{0},\boldsymbol{\varSigma }_{ij}) + 0.1N_{3}(10 \cdot \mathbf{1},\boldsymbol{\varSigma }_{ij})\). Here, the \(\boldsymbol{\varSigma }_{ij}\) were different compound symmetric variance-covariance matrices with off-diagonal elements \(\rho _{ij} = \sqrt{ij}/(1 + ij)\) and diagonal elements \(1 -\rho _{ij}\).
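The contaminated normal data can be generated along the following lines (a sketch; the mixture and \(\boldsymbol{\varSigma }_{ij}\) are taken from the description above, with i and j counted from 1):

```python
import numpy as np

rng = np.random.default_rng(7)

def sigma_ij(i, j, p=3):
    """Compound symmetric Sigma_ij: off-diagonal rho_ij = sqrt(ij)/(1 + ij),
    diagonal 1 - rho_ij (i, j counted from 1)."""
    rho = np.sqrt(i * j) / (1.0 + i * j)
    return np.full((p, p), rho) + (1.0 - 2.0 * rho) * np.eye(p)

def contaminated_cell(n, i, j, p=3, rng=rng):
    """n draws from 0.9 N_3(0, Sigma_ij) + 0.1 N_3(10 * 1, Sigma_ij)."""
    z = rng.multivariate_normal(np.zeros(p), sigma_ij(i, j, p), size=n)
    z[rng.random(n) < 0.1] += 10.0           # contaminating component
    return z
```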

Fig. 7.2 Simulated power of Wilks’ Λ type test statistic using ranks, and using raw data, for p = 3, a = 20, b = 3, \(n_{ij}\) = 4, 5, 6. Contaminated normal underlying data distributions, nominal α = 5 %. Main effect of A and interaction between A and B. Location shifts for main and interaction effects as described in the text

Alternatives were modeled by location shifts. Specifically, in the main effects power simulation, expected values of all variables were shifted up by δ units for levels 10–20 of factor A, and down by δ units for the remaining levels 1–9. In the interaction effects power simulation, the upward shift was applied to the factor level combinations (i, j) with i ≥ 10 and j ≥ 2, and the downward shift to those with i < 10 and j < 2. In both cases, a = 20, b = 3, and p = 3.
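In code, the shift pattern for the alternatives might be encoded as follows (a sketch with an illustrative helper name; levels are counted from 1 as in the text):

```python
def location_shift(i, j, delta, effect="A"):
    """Shift applied to all p variables of cell (i, j) under the
    alternatives described above (a = 20, b = 3)."""
    if effect == "A":                    # main effect of factor A
        return delta if i >= 10 else -delta
    if effect == "AB":                   # interaction effect
        if i >= 10 and j >= 2:
            return delta
        if i < 10 and j < 2:
            return -delta
        return 0.0
    raise ValueError(effect)
```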

The results show the rather striking advantage of the nonparametric rank-based approach over its semiparametric competitor using the original observations instead of ranks.

7.6 Discussions and Conclusions

In this somewhat theoretical manuscript, we have introduced fully nonparametric, rank-based test statistics for inference on multivariate data in factorial designs. To our knowledge, no comparable results of such general applicability (for example, for fully ordinal data) have been established yet. Due to the rather cumbersome technicalities, the work has only been carried out here for a design with two factors, but it can be extended in a straightforward way to higher-way layouts. Also, we have focused here on large-a asymptotics (the number of levels of factor A tends to infinity) and only considered in detail those test statistics that yield asymptotic normality under this type of asymptotic setting. The asymptotic distribution of the test for the main effect of factor B will be that of a weighted sum of \(\chi ^{2}\) random variables.

It should be pointed out that the test statistics can be calculated directly; they do not involve any iterative computational procedures. The test statistics presented here can be taken as a basis for small sample approximations based on moment estimators or expansions. In future work, it would be interesting to compare their performance with resampling-based methods such as those of Konietschke et al. (2015), or with other robust procedures based on semiparametric models.