Abstract
Neyman [7] was the first to propose a change of measure in the context of goodness-of-fit problems. This provided an alternative density to the one under the null hypothesis. Hoeffding introduced a change of measure formula for the ranks of the observed data which led to locally most powerful rank tests. In this paper, we review these methods and propose a new approach which, on the one hand, leads to new derivations of existing statistics and, on the other, yields Bayesian applications for ranking data.
1 Introduction
In a landmark paper, [7] considered the nonparametric goodness of fit problem and introduced the notion of smooth tests of fit by proposing a parametric family of alternative densities to the null hypothesis. In this article, we describe a number of applications of this change of measure. In particular, we obtain a new derivation of the well-known Friedman statistic as the locally most powerful test in an embedded family of distributions.
2 Smooth Models
Suppose that the probability mass function of a discrete k-dimensional random vector \(\textit{\textbf{X}}\) is given by
where \(\textit{\textbf{x}}_{j}\) is the jth value of \(\textit{\textbf{X}}\) and \(\textit{\textbf{p}}=\left( p_{j}\right) '\) denotes the vector of probabilities when \(\varvec{\theta }=\varvec{\theta }_{0}\). Here \(K\left( \varvec{\theta }\right) \) is a normalizing constant for which
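One standard family of this type, sketched here under the assumption that the tilt is taken around \(\varvec{\theta }_{0}=\textit{\textbf{0}}\) and that \(K\left( \varvec{\theta }\right) \) is written as a log normalizing constant (the convention consistent with \(K(\textit{\textbf{0}})=0\) used later), is:

```latex
\pi_{j}\left(\varvec{\theta}\right)
  = p_{j}\exp\left(\varvec{\theta}'\textit{\textbf{x}}_{j}
      - K\left(\varvec{\theta}\right)\right),
\qquad
K\left(\varvec{\theta}\right)
  = \log \sum_{j} p_{j}\exp\left(\varvec{\theta}'\textit{\textbf{x}}_{j}\right),
```

so that \(K(\textit{\textbf{0}})=\log \sum _{j}p_{j}=0\) and \(\pi _{j}(\textit{\textbf{0}})=p_{j}\), recovering the null distribution.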
We see that the model in (1) prescribes a change of measure from the null to the alternative hypothesis. Let \(\textit{\textbf{T}}=\left[ \textit{\textbf{x}}_{1},\ldots ,\textit{\textbf{x}}_{m}\right] \) be the \(k\times m\) matrix of possible vector values of \(\textit{\textbf{X}}\). Then under the distribution specified by \(\textit{\textbf{p}}\),
where the expectations are with respect to the model (1). This particular situation arises often when dealing with the nonparametric randomized block design. Define
and suppose that we would like to test
Letting \(\textit{\textbf{N}}\) denote a multinomial random vector with parameters \(\left( n,\varvec{\pi }\left( \varvec{\theta }\right) \right) \), we see that the log likelihood as a function of \(\varvec{\theta }\) is, apart from a constant, proportional to
The score vector under the null hypothesis is then given by
Under the null hypothesis,
and the score statistic is given by
where \(r=\text {rank}\left( \textit{\textbf{T}}'\varvec{\Sigma }^{-1}\textit{\textbf{T}}\right) .\)
In the one-sample ranking problem whereby a group of judges are each asked to rank a set of t objects in accordance with some criterion, let \(\mathcal {P}=\left\{ \varvec{\nu }_{j},j=1,\ldots ,t!\right\} \) be the space of all t! permutations of the integers \(1,2,\ldots ,t\) and let the probability mass distribution defined on \(\mathcal {P}\) be given by
where \(p_{j}=\Pr \left( \varvec{\nu }_{j}\right) \). Conceptually, each judge selects a ranking \(\varvec{\nu }\) in accordance with the probability mass distribution \(\textit{\textbf{p}}.\) In order to test the null hypothesis that each of the rankings are selected with equal probability, that is,
where \(\textit{\textbf{p}}_{0}=\frac{1}{t!}\mathbf {1},\) define a k-dimensional vector score function \(\textit{\textbf{X}}\left( \varvec{\nu }\right) \) on the space \(\mathcal {P}\) and following (1), let its smooth probability mass function be given as
where \(\varvec{\theta }\) is a t-dimensional vector, \(K\left( \varvec{\theta }\right) \) is a normalizing constant and \(\textit{\textbf{x}}_{j}\) is a t-dimensional score vector to be specified in (8). Since
it can be seen that \(K(\textit{\textbf{0}})=0\) and hence the hypotheses in (5) are equivalent to testing
It follows that the log likelihood function is proportional to
where
and \(n_{j}\) represents the number of observed occurrences of the ranking \(\varvec{\nu }_{j}\). The Rao score statistic evaluated at \(\varvec{\theta }=\textit{\textbf{0}}\) is
whereas the information matrix is
The test then rejects the null hypothesis whenever
where \(\chi _{f}^{2}\left( \alpha \right) \) is the upper \(100\left( 1-\alpha \right) \%\) critical value of a chi-square distribution with \(f=\text {rank}(\textit{\textbf{I}}\left( \varvec{\theta }\right) )\) degrees of freedom. We note that the test just obtained is the locally most powerful test of \(H_{0}.\)
Specializing this test statistic to the Spearman score function of adjusted ranks
we can show that the Rao score statistic is the well-known Friedman test [5].
where \(\bar{R}_{i}\) is the average of the ranks assigned to the ith object.
2.1 The Two-Sample Ranking Problem
The approach just described can be used to deal with the two-sample ranking problem, assuming again the Spearman score function. Let \(\textit{\textbf{X}}_{1},\textit{\textbf{X}}_{2}\) be two independent random vectors whose distributions, as in the one-sample case, are expressed for simplicity as
where \(\varvec{\theta }_{l}=\left( \theta _{l1},\ldots ,\theta _{lt}\right) ^{\prime }\) represents the vector of parameters for population l. We are interested in testing
The probability distribution \(\left\{ p_{l}\left( j\right) \right\} \) represents an unspecified null situation. Define
where \(n_{lj}\) represents the number of occurrences of the ranking \(\varvec{\nu }_{j}\) in sample l.
Also, for \(l=1,2\), set \(\sum _{j}n_{lj}\equiv n_{l}\), \(\varvec{\gamma }=\varvec{\theta }_{1}-\varvec{\theta }_{2}\) and
where
Let \(\varvec{\Sigma }_{l}\) be the covariance matrix of \(\textit{\textbf{X}}_{l}\) under the null hypothesis defined as
where \(\varvec{\Pi }_{l}=diag\left( p_{l}\left( 1\right) ,\ldots ,p_{l}\left( t!\right) \right) \) and \(\textit{\textbf{p}}_{l}=\left( p_{l}\left( 1\right) ,\ldots ,p_{l}\left( t!\right) \right) '\). The logarithm of the likelihood L as a function of \(\left( \textit{\textbf{m}},\varvec{\gamma }\right) \) is proportional to
In order to test
we calculate the Rao score test statistic which is given by
It can be shown to have asymptotically a \(\chi _{f}^{2}\) distribution whenever \(n_{l}/n\rightarrow \lambda _{l}>0\) as \(n\rightarrow \infty ,\) where \(n=n_{1}+n_{2}.\) Here \(\hat{\textit{\textbf{D}}}\) is the Moore–Penrose inverse of \(\textit{\textbf{T}}_{S}\hat{\varvec{\Sigma }}\textit{\textbf{T}}_{S}'\), \(\hat{\varvec{\Sigma }}\) is a consistent estimator of \(\varvec{\Sigma }=\frac{\varvec{\Sigma }_{1}}{\lambda _{1}}+\frac{\varvec{\Sigma }_{2}}{\lambda _{2}}\), and f is the rank of \(\hat{\textit{\textbf{D}}}\).
2.2 The Use of Penalized Likelihood
In the previous sections, it was possible to derive test statistics for the one- and two-sample ranking problems by means of the change of measure paradigm. This paradigm may be exploited further to obtain new results for ranking problems. Specifically, we consider a penalized likelihood function, defined to be the negative log likelihood subject to a constraint on the parameters, which is then minimized with respect to the parameter. This approach yields further insight into ranking problems.
For the one-sample ranking problem, let
represent the penalizing function for some prescribed values of the constant c. We shall assume for simplicity that \(\left\| \textit{\textbf{x}}_{j}\right\| =1\). When t is large (say \(t\ge 10\)), the computation of the exact value of the normalizing constant \(K(\varvec{\theta })\) involves a summation of t! terms. [6] noted the resemblance of (6) to the continuous von Mises-Fisher density
where \(\left\| \varvec{\theta }\right\| \) is the norm of \(\varvec{\theta }\), \(\textit{\textbf{x}}\) lies on the unit sphere, and \(I_{\upsilon }(z)\) is the modified Bessel function of the first kind given by
This seems to suggest the approximation of the constant \(K\left( \varvec{\theta }\right) \) by
In [1], penalized likelihood was used in ranking situations to obtain further insight into the differences between groups of rankers.
3 Bayesian Models for Ranking Data
The fact that the model in (1) is itself parametric in nature leads one to consider an extension to Bayesian considerations. Let \(\textit{\textbf{R}}=(R(1),\ldots ,R(t))'\) be a ranking of t items, labeled \(1,\ldots ,t\), and define the standardized rankings as
where \(\textit{\textbf{y}}\) is the \(t\times 1\) vector with \(\left\| \textit{\textbf{y}}\right\| \equiv \sqrt{\textit{\textbf{y}}'\textit{\textbf{y}}}=1\). We consider the following more general ranking model:
where the parameter \(\varvec{\theta }\) is a \(t\times 1\) vector with \(\left\| \varvec{\theta }\right\| =1\), parameter \(\kappa \ge 0\), and \(C(\kappa ,\varvec{\theta })\) is the normalizing constant. This model has a close connection to the distance-based models considered in [3]. Here, \(\varvec{\theta }\) is a real-valued vector, representing a consensus view of the relative preference of the items from the individuals. Since both \(\left\| \varvec{\theta }\right\| =1\) and \(\left\| \textit{\textbf{y}}\right\| =1\), the term \(\varvec{\theta }'\textit{\textbf{y}}\) can be seen as \(\cos \phi \) where \(\phi \) is the angle between the consensus score vector \(\varvec{\theta }\) and the observation \(\textit{\textbf{y}}\). The probability of observing a ranking is thus proportional to the exponential of the cosine of its angle from the consensus score vector. The parameter \(\kappa \) can be viewed as a concentration parameter. For small \(\kappa \), the distribution of rankings will appear close to uniform, whereas for larger values of \(\kappa \), the distribution of rankings will be more concentrated around the consensus score vector. We call this new model an angle-based ranking model.
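To make the concentration interpretation concrete, the following self-contained sketch evaluates the exact model probabilities for \(t=3\). It assumes the usual standardization of a rank vector, centering at \((t+1)/2\) and scaling to unit norm; the rankings and \(\kappa \) values are illustrative.

```python
import math
from itertools import permutations

def standardize(rank):
    """Centre a rank vector by its mean (t+1)/2 and scale to unit norm."""
    t = len(rank)
    c = [r - (t + 1) / 2 for r in rank]
    n = math.sqrt(sum(v * v for v in c))
    return [v / n for v in c]

def model_probs(t, kappa, theta):
    """Exact probabilities p(y) proportional to exp(kappa * theta'y)
    over all t! standardized rankings y."""
    ys = [standardize(p) for p in permutations(range(1, t + 1))]
    w = [math.exp(kappa * sum(a * b for a, b in zip(theta, y))) for y in ys]
    z = sum(w)  # normalizing constant 1 / C(kappa, theta)
    return [v / z for v in w]

theta = standardize([1, 2, 3])          # consensus direction
flat = model_probs(3, 0.0, theta)       # kappa = 0: uniform over the 3! rankings
peaked = model_probs(3, 5.0, theta)     # larger kappa: mass concentrates
```

At \(\kappa =0\) every ranking gets probability \(1/t!\); as \(\kappa \) grows, the ranking aligned with \(\varvec{\theta }\) dominates.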
To compute the normalizing constant \(C(\kappa ,\varvec{\theta })\), let \(\text {P}_{t}\) be the set of all possible permutations of the integers \(1,\ldots ,t\). Then
Notice that the summation is over the t! elements in \(\text {P}_{t}\). When t is large, say greater than 15, the exact calculation of the normalizing constant is prohibitive. Using the fact that the set of t! permutations lie on a sphere in \((t-1)\)-space, our model resembles the continuous von Mises-Fisher distribution, abbreviated as \(vMF(\textit{\textbf{x}}|\textit{\textbf{m}},\kappa )\), which is defined on the \(\left( p-1\right) \)-dimensional unit sphere with mean direction \(\textit{\textbf{m}}\) and concentration parameter \(\kappa \):
where
and \(I_{a}(\kappa )\) is the modified Bessel function of the first kind with order a. Consequently, we may approximate the sum in (12) by an integral over the sphere:
where \(\Gamma (.)\) is the gamma function. In [9], it is shown that this approximation is very accurate for values of \(\kappa \) ranging from 0.01 to 2 and t ranging from 4 to 11. Moreover, the error drops rapidly as t increases. Note that this approximation allows us to approximate the first and second derivatives of \(\log \,C\) which can facilitate our computation in what follows.
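The quality of this approximation can be checked directly for small t by comparing the exact sum over all t! standardized rankings with the sphere-based formula. The code below is a sketch under the assumption that the approximation takes the form \(t!\,\Gamma (p/2)\,(2/\kappa )^{p/2-1}I_{p/2-1}(\kappa )\) with \(p=t-1\) (my reading of replacing the discrete average by a uniform average over the sphere, not the paper's exact display); the Bessel function is computed from its series.

```python
import math
from itertools import permutations

def bessel_i(a, z, terms=30):
    """Modified Bessel function of the first kind I_a(z), via its series."""
    return sum((z / 2) ** (2 * k + a) / (math.factorial(k) * math.gamma(k + a + 1))
               for k in range(terms))

def standardize(rank):
    """Centre a rank vector by (t+1)/2 and scale to unit norm."""
    t = len(rank)
    c = [r - (t + 1) / 2 for r in rank]
    n = math.sqrt(sum(v * v for v in c))
    return [v / n for v in c]

def exact_sum(t, kappa, theta):
    """Sum of exp(kappa * theta'y) over all t! standardized rankings y."""
    total = 0.0
    for perm in permutations(range(1, t + 1)):
        y = standardize(perm)
        total += math.exp(kappa * sum(a * b for a, b in zip(theta, y)))
    return total

def approx_sum(t, kappa):
    """Sphere-based approximation: the t! points are treated as uniform on
    the sphere S^(t-2), giving t! * Gamma(p/2) * (2/kappa)^(p/2-1) * I_(p/2-1)(kappa)
    with p = t - 1."""
    p = t - 1
    return (math.factorial(t) * math.gamma(p / 2)
            * (2 / kappa) ** (p / 2 - 1) * bessel_i(p / 2 - 1, kappa))

t, kappa = 5, 1.0
theta = standardize(list(range(1, t + 1)))  # any unit vector in the rank space
exact = exact_sum(t, kappa, theta)
approx = approx_sum(t, kappa)
rel_err = abs(approx - exact) / exact
```

For \(t=5\) and \(\kappa =1\) the two quantities agree closely, consistent with the accuracy reported in [9].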
3.1 Maximum Likelihood Estimation (MLE) of Our Model
Let \(\textit{\textbf{Y}}=\left\{ \textit{\textbf{y}}_{1},\ldots ,\textit{\textbf{y}}_{N}\right\} \) be a random sample of N standardized rankings drawn from \(p(\textit{\textbf{y}}|\kappa ,\varvec{\theta })\). The log likelihood of \(\left( \kappa ,\varvec{\theta }\right) \) is then given by
Maximizing (13) subject to \(\left\| \varvec{\theta }\right\| =1\) and \(\kappa \ge 0\), we find that the maximum likelihood estimator of \(\varvec{\theta }\) is given by \(\hat{\varvec{\theta }}_{MLE}=\frac{\sum _{i=1}^{N}\textit{\textbf{y}}_{i}}{\left\| \sum _{i=1}^{N}\textit{\textbf{y}}_{i}\right\| },\) and \(\hat{\kappa }\) is the solution of
A simple approximation to the solution of (14) following [4] is given by
A more precise approximation can be obtained from a few iterations of Newton’s method. Using the method suggested by [8], starting from an initial value \(\kappa _{0}\), we can recursively update \(\kappa \) by iteration:
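A sketch of the two estimation steps just described: the closed-form start following [4] and the Newton refinement suggested by [8]. Here \(p\) denotes the dimension of the space in which the standardized rankings lie, \(A_{p}(\kappa )=I_{p/2}(\kappa )/I_{p/2-1}(\kappa )\), and \(\bar{r}=\left\| \sum _{i}\textit{\textbf{y}}_{i}\right\| /N\) would come from the data; the numeric values below are hypothetical, and the update shown is the standard vMF one, assumed to match the paper's.

```python
import math

def bessel_i(a, z, terms=30):
    """Modified Bessel function of the first kind I_a(z), via its series."""
    return sum((z / 2) ** (2 * k + a) / (math.factorial(k) * math.gamma(k + a + 1))
               for k in range(terms))

def a_p(p, kappa):
    """A_p(kappa) = I_{p/2}(kappa) / I_{p/2-1}(kappa)."""
    return bessel_i(p / 2, kappa) / bessel_i(p / 2 - 1, kappa)

def kappa_banerjee(r_bar, p):
    """Closed-form starting value of Banerjee et al. [4]."""
    return r_bar * (p - r_bar ** 2) / (1 - r_bar ** 2)

def kappa_newton(r_bar, p, iters=5):
    """Refine by Newton's method on A_p(kappa) = r_bar, following Sra [8].
    Uses A_p'(kappa) = 1 - A_p^2 - (p - 1) / kappa * A_p."""
    k = kappa_banerjee(r_bar, p)
    for _ in range(iters):
        a = a_p(p, k)
        k -= (a - r_bar) / (1 - a * a - (p - 1) / k * a)
    return k

# hypothetical mean resultant length and dimension
r_bar, p = 0.5, 4
k_hat = kappa_newton(r_bar, p)
```

A few Newton iterations typically reduce the residual \(A_{p}(\hat{\kappa })-\bar{r}\) to negligible size.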
3.2 One-Sample Bayesian Method with Conjugate Prior
Taking a Bayesian approach, we consider the following conjugate prior for \((\kappa ,\varvec{\theta })\) as
where \(\left\| \textit{\textbf{m}}_{0}\right\| =1\), \(\nu _{0},\beta _{0}\ge 0\). Given \(\textit{\textbf{y}}\), the posterior density of \((\kappa ,\varvec{\theta })\) can be expressed by
where \(\textit{\textbf{m}}=\left( \beta _{0}\textit{\textbf{m}}_{\textit{\textbf{0}}}+\sum _{i=1}^{N}\textit{\textbf{y}}_{i}\right) \beta ^{-1},\) \(\beta =\left\| \beta _{0}\textit{\textbf{m}}_{0}+\sum _{i=1}^{N}\textit{\textbf{y}}_{i}\right\| \). The posterior density can be factored as
where \(p(\varvec{\theta }|\kappa ,\textit{\textbf{y}})\sim vMF(\varvec{\theta }|\textit{\textbf{m}},\beta \kappa )\) and
The normalizing constant for \(p(\kappa |\textit{\textbf{y}})\) is not available in closed form. For reasons explained in [9], we approximate the posterior distribution using the method of variational inference (abbreviated VI from here on). Variational inference provides a deterministic approximation to an intractable posterior distribution through optimization. We first adopt a joint vMF-Gamma distribution as the prior for \((\kappa ,\varvec{\theta })\):
where \(Gamma(\kappa |a_{0},b_{0})\) is the Gamma density function with shape parameter \(a_{0}\) and rate parameter \(b_{0}\) (i.e., mean equal to \(\frac{a_{0}}{b_{0}}\)), and \(p(\varvec{\theta }|\kappa )=vMF(\varvec{\theta }|\textit{\textbf{m}}_{0},\beta _{0}\kappa )\). The choice of \(Gamma(\kappa |a_{0},b_{0})\) for \(p(\kappa )\) is motivated by the fact that for large values of \(\kappa \), \(p(\kappa )\) in (15) tends to take the shape of a Gamma density. In fact, for large values of \(\kappa \), \(I_{\frac{t-3}{2}}(\kappa )\simeq \frac{e^{\kappa }}{\sqrt{2\pi \kappa }}\), and hence \(p(\kappa )\) becomes the Gamma density with shape \((\nu _{0}-1)\frac{t-2}{2}+1\) and rate \(\nu _{0}-\beta _{0}\):
Using the variational inference framework, [9] showed that the optimal posterior distribution of \(\varvec{\theta }\) conditional on \(\kappa \) is a von Mises-Fisher distribution \(vMF(\varvec{\theta }|\textit{\textbf{m}},\kappa \beta )\) where
The optimal posterior distribution of \(\kappa \) is a \(Gamma(\kappa |a,b)\) with shape a and rate b with
Finally, the posterior mode \(\bar{\kappa }\) can be obtained from the previous iteration as
3.3 Two-Sample Bayesian Method with Conjugate Prior
Let \(\textit{\textbf{Y}}_{i}=\left\{ \textit{\textbf{y}}_{i1},\ldots ,\textit{\textbf{y}}_{iN_{i}}\right\} \) for \(i=1,2,\) be two independent random samples of standardized rankings each drawn, respectively, from \(p(\varvec{y_{i}}|\kappa _{i},\varvec{\theta _{i}}).\) Taking a Bayesian approach, we assume that conditional on \(\kappa \), there are independent von Mises conjugate priors, respectively, for \((\varvec{\theta }_{1},\varvec{\theta }_{2})\) as
where \(\left\| \textit{\textbf{m}}_{i0}\right\| =1\), \(\nu _{i0},\beta _{i0}\ge 0\). We shall be interested in computing the Bayes factor when considering two models. Under model 1, denoted \(M_{1},\) \(\varvec{\theta _{1}}=\varvec{\theta _{2}}\) whereas under model 2, denoted \(M_{2}\), equality is not assumed. The Bayes factor comparing the two models is defined to be
The Bayes factor enables us to compute the posterior odds of model 2 to model 1. We first deal with the denominator of \(B_{21}.\) Under \(M_{1}\), we assume a joint von Mises-Fisher prior on \(\varvec{\theta }\) and a Gamma prior on \(\kappa \):
Hence,
where \(N=N_{1}+N_{2}\) and
Now,
where in the last step, we used the method of variational inference as an approximation, with
and the posterior mode \(\bar{\kappa }\) is
It follows that the denominator of \(B_{21}\) is
For the numerator, we shall assume that conditional on \(\kappa \), there are independent von Mises conjugate priors, respectively, for \(\varvec{\theta }_{1},\varvec{\theta }_{2}\) given by
where \(\left\| \textit{\textbf{m}}_{0}\right\| =1\), \(\beta _{0}\ge 0\). Hence,
where for \(i=1,2,\)
and the posterior mode \(\bar{\kappa }\) is given recursively:
It follows that the numerator of the Bayes factor is
The Bayes factor is then given by the ratio
4 Conclusion
In this article, we have considered a few applications of the change of measure paradigm. In particular, it was possible to obtain a new derivation of the Friedman statistic. As well, extensions to the Bayesian models for ranking data were considered. Further applications as, for example, to the sign and Wilcoxon tests are found in [2].
References
Alvo, M., Xu, H.: The analysis of ranking data using score functions and penalized likelihood. Austrian J. Stat. 46, 15–32 (2017)
Alvo, M., Yu, P.L.H.: A Parametric Introduction to Nonparametric Statistics. Springer, Berlin (2018)
Alvo, M., Yu, P.L.H.: Statistical Methods for Ranking Data. Springer, Berlin (2014)
Banerjee, A., Dhillon, I.S., Ghosh, J., Sra, S.: Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res. 6, 1345–1382 (2005)
Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 32, 675–701 (1937)
McCullagh, P.: Models on spheres and models for permutations. In: Fligner, M.A., Verducci, J.S. (eds.) Probability Models and Statistical Analyses for Ranking Data, pp. 278–283. Springer, Berlin (1993)
Neyman, J.: Smooth test for goodness of fit. Skandinavisk Aktuarietidskrift 20, 149–199 (1937)
Sra, S.: A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of \(I_{s}(x)\). Comput. Stat. 27(1), 177–190 (2012)
Xu, H., Alvo, M., Yu, P.L.H.: Angle-based models for ranking data. Comput. Stat. Data Anal. 121, 113–136 (2018)
Acknowledgements
Work supported by the Natural Sciences and Engineering Council of Canada, Grant OGP0009068.
© 2020 Springer Nature Switzerland AG
Cite this paper
Alvo, M. (2020). Change of Measure Applications in Nonparametric Statistics. In: La Rocca, M., Liseo, B., Salmaso, L. (eds) Nonparametric Statistics. ISNPS 2018. Springer Proceedings in Mathematics & Statistics, vol 339. Springer, Cham. https://doi.org/10.1007/978-3-030-57306-5_2
Print ISBN: 978-3-030-57305-8
Online ISBN: 978-3-030-57306-5