Abstract
Neyman [7] was the first to propose a change of measure in the context of goodness-of-fit problems. This provided an alternative density to the one under the null hypothesis. Hoeffding introduced a change of measure formula for the ranks of the observed data which led to locally most powerful rank tests. In this paper, we review these methods and propose a new approach which, on the one hand, leads to new derivations of existing statistics and, on the other, yields Bayesian applications for ranking data.
1 Introduction
In a landmark paper, [7] considered the nonparametric goodness of fit problem and introduced the notion of smooth tests of fit by proposing a parametric family of alternative densities to the null hypothesis. In this article, we describe a number of applications of this change of measure. In particular, we obtain a new derivation of the well-known Friedman statistic as the locally most powerful test in an embedded family of distributions.
2 Smooth Models
Suppose that the probability mass function of a discrete k-dimensional random vector \(\textit{\textbf{X}}\) is given by
where \(\textit{\textbf{x}}_{j}\) is the jth value of \(\textit{\textbf{X}}\) and \(\textit{\textbf{p}}=\left( p_{j}\right) '\) denotes the vector of probabilities when \(\varvec{\theta }=\varvec{\theta }_{0}\). Here \(K\left( \varvec{\theta }\right) \) is a normalizing constant for which
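One standard family of this type, sketched here under the assumption that the tilt is taken around \(\varvec{\theta }_{0}=\textit{\textbf{0}}\) and that \(K\left( \varvec{\theta }\right) \) is written as a log normalizing constant (the convention consistent with \(K(\textit{\textbf{0}})=0\) used later), is:

```latex
\pi_{j}\left(\varvec{\theta}\right)
  = p_{j}\exp\left(\varvec{\theta}'\textit{\textbf{x}}_{j}
      - K\left(\varvec{\theta}\right)\right),
\qquad
K\left(\varvec{\theta}\right)
  = \log \sum_{j} p_{j}\exp\left(\varvec{\theta}'\textit{\textbf{x}}_{j}\right),
```

so that \(K(\textit{\textbf{0}})=\log \sum _{j}p_{j}=0\) and \(\pi _{j}(\textit{\textbf{0}})=p_{j}\), recovering the null distribution.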
We see that the model in (1) prescribes a change of measure from the null to the alternative hypothesis. Let \(\textit{\textbf{T}}=\left[ \textit{\textbf{x}}_{1},\ldots ,\textit{\textbf{x}}_{m}\right] \) be the \(k\times m\) matrix of possible vector values of \(\textit{\textbf{X}}\). Then under the distribution specified by \(\textit{\textbf{p}}\),
where the expectations are with respect to the model (1). This particular situation arises often when dealing with the nonparametric randomized block design. Define
and suppose that we would like to test
Letting \(\textit{\textbf{N}}\) denote a multinomial random vector with parameters \(\left( n,\varvec{\pi }\left( \varvec{\theta }\right) \right) \), we see that the log likelihood as a function of \(\varvec{\theta }\) is, apart from a constant, proportional to
The score vector under the null hypothesis is then given by
Under the null hypothesis,
and the score statistic is given by
where \(r=\text {rank}\left( \textit{\textbf{T}}'\varvec{\Sigma }^{-1}\textit{\textbf{T}}\right) .\)
In the one-sample ranking problem whereby a group of judges are each asked to rank a set of t objects in accordance with some criterion, let \(\mathcal {P}=\left\{ \varvec{\nu }_{j},j=1,\ldots ,t!\right\} \) be the space of all t! permutations of the integers \(1,2,\ldots ,t\) and let the probability mass distribution defined on \(\mathcal {P}\) be given by
where \(p_{j}=\Pr \left( \varvec{\nu }_{j}\right) \). Conceptually, each judge selects a ranking \(\varvec{\nu }\) in accordance with the probability mass distribution \(\textit{\textbf{p}}.\) In order to test the null hypothesis that each of the rankings are selected with equal probability, that is,
where \(\textit{\textbf{p}}_{0}=\frac{1}{t!}\mathbf {1},\) define a k-dimensional vector score function \(\textit{\textbf{X}}\left( \varvec{\nu }\right) \) on the space \(\mathcal {P}\) and following (1), let its smooth probability mass function be given as
where \(\varvec{\theta }\) is a t-dimensional vector, \(K\left( \varvec{\theta }\right) \) is a normalizing constant and \(\textit{\textbf{x}}_{j}\) is a t-dimensional score vector to be specified in (8). Since
it can be seen that \(K(\textit{\textbf{0}})=0\) and hence the hypotheses in (5) are equivalent to testing
It follows that the log likelihood function is proportional to
where
and \(n_{j}\) represents the number of observed occurrences of the ranking \(\varvec{\nu }_{j}\). The Rao score statistic evaluated at \(\varvec{\theta }=\textit{\textbf{0}}\) is
whereas the information matrix is
The test then rejects the null hypothesis whenever
where \(\chi _{f}^{2}\left( \alpha \right) \) is the upper \(100\left( 1-\alpha \right) \%\) critical value of a chi-square distribution with \(f=\text {rank}(\textit{\textbf{I}}\left( \varvec{\theta }\right) )\) degrees of freedom. We note that the test just obtained is the locally most powerful test of \(H_{0}.\)
Specializing this test statistic to the Spearman score function of adjusted ranks
we can show that the Rao score statistic is the well-known Friedman test [5].
where \(\bar{R}_{i}\) is the average of the ranks assigned to the ith object.
2.1 The Two-Sample Ranking Problem
The approach just described can be used to deal with the two-sample ranking problem, assuming again the Spearman score function. Let \(\textit{\textbf{X}}_{1},\textit{\textbf{X}}_{2}\) be two independent random vectors whose distributions, as in the one-sample case, are expressed for simplicity as
where \(\varvec{\theta }_{l}=\left( \theta _{l1},\ldots ,\theta _{lt}\right) ^{\prime }\) represents the vector of parameters for population l. We are interested in testing
The probability distribution \(\left\{ p_{l}\left( j\right) \right\} \) represents an unspecified null situation. Define
where \(n_{lj}\) represents the number of occurrences of the ranking \(\varvec{\nu }_{j}\) in sample l.
Also, for \(l=1,2\), set \(\sum _{j}n_{lj}\equiv n_{l}\), \(\varvec{\gamma }=\varvec{\theta }_{1}-\varvec{\theta }_{2}\) and
where
Let \(\varvec{\Sigma }_{l}\) be the covariance matrix of \(\textit{\textbf{X}}_{l}\) under the null hypothesis defined as
where \(\varvec{\Pi }_{l}=diag\left( p_{l}\left( 1\right) ,\ldots ,p_{l}\left( t!\right) \right) \) and \(\textit{\textbf{p}}_{l}=\left( p_{l}\left( 1\right) ,\ldots ,p_{l}\left( t!\right) \right) '\). The logarithm of the likelihood L as a function of \(\left( \textit{\textbf{m}},\varvec{\gamma }\right) \) is proportional to
In order to test
we calculate the Rao score test statistic which is given by
It can be shown to have asymptotically a \(\chi _{f}^{2}\) distribution whenever \(n_{l}/n\rightarrow \lambda _{l}>0\) as \(n\rightarrow \infty ,\) where \(n=n_{1}+n_{2}.\) Here \(\hat{\textit{\textbf{D}}}\) is the Moore–Penrose inverse of \(\textit{\textbf{T}}_{S}\hat{\varvec{\Sigma }}\textit{\textbf{T}}_{S}'\), \(\hat{\varvec{\Sigma }}\) is a consistent estimator of \(\varvec{\Sigma }=\frac{\varvec{\Sigma }_{1}}{\lambda _{1}}+\frac{\varvec{\Sigma }_{2}}{\lambda _{2}}\), and f is the rank of \(\hat{\textit{\textbf{D}}}\).
2.2 The Use of Penalized Likelihood
In the previous sections, it was possible to derive test statistics for the one- and two-sample ranking problems by means of the change of measure paradigm. This paradigm may be exploited further to obtain new results for ranking problems. Specifically, we consider a penalized likelihood function, defined to be the negative log likelihood subject to a constraint on the parameters, which is then minimized with respect to the parameter. This approach yields further insight into ranking problems.
For the one-sample ranking problem, let
represent the penalizing function for some prescribed values of the constant c. We shall assume for simplicity that \(\left\| \textit{\textbf{x}}_{j}\right\| =1\). When t is large (say \(t\ge 10\)), the computation of the exact value of the normalizing constant \(K(\varvec{\theta })\) involves a summation of t! terms. [6] noted the resemblance of (6) to the continuous von Mises-Fisher density
where \(\left\| \varvec{\theta }\right\| \) is the norm of \(\varvec{\theta }\), \(\textit{\textbf{x}}\) lies on the unit sphere, and \(I_{\upsilon }(z)\) is the modified Bessel function of the first kind given by
This seems to suggest the approximation of the constant \(K\left( \varvec{\theta }\right) \) by
In [1], penalized likelihood was used in ranking situations to obtain further insight into the differences between groups of rankers.
3 Bayesian Models for Ranking Data
The fact that the model in (1) is itself parametric in nature leads one to consider an extension to Bayesian considerations. Let \(\textit{\textbf{R}}=(R(1),\ldots ,R(t))'\) be a ranking of t items, labeled \(1,\ldots ,t\), and define the standardized rankings as
where \(\textit{\textbf{y}}\) is the \(t\times 1\) vector with \(\left\| \textit{\textbf{y}}\right\| \equiv \sqrt{\textit{\textbf{y}}'\textit{\textbf{y}}}=1\). We consider the following more general ranking model:
where the parameter \(\varvec{\theta }\) is a \(t\times 1\) vector with \(\left\| \varvec{\theta }\right\| =1\), parameter \(\kappa \ge 0\), and \(C(\kappa ,\varvec{\theta })\) is the normalizing constant. This model has a close connection to the distance-based models considered in [3]. Here, \(\varvec{\theta }\) is a real-valued vector, representing a consensus view of the relative preference of the items from the individuals. Since both \(\left\| \varvec{\theta }\right\| =1\) and \(\left\| \textit{\textbf{y}}\right\| =1\), the term \(\varvec{\theta }'\textit{\textbf{y}}\) can be seen as \(\cos \phi \) where \(\phi \) is the angle between the consensus score vector \(\varvec{\theta }\) and the observation \(\textit{\textbf{y}}\). The probability of observing a ranking is thus proportional to the exponential of the cosine of its angle from the consensus score vector. The parameter \(\kappa \) can be viewed as a concentration parameter. For small \(\kappa \), the distribution of rankings will appear close to uniform, whereas for larger values of \(\kappa \), the distribution of rankings will be more concentrated around the consensus score vector. We call this new model an angle-based ranking model.
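To make the concentration interpretation concrete, the following self-contained sketch evaluates the exact model probabilities for \(t=3\). It assumes the usual standardization of a rank vector, centering at \((t+1)/2\) and scaling to unit norm; the rankings and \(\kappa \) values are illustrative.

```python
import math
from itertools import permutations

def standardize(rank):
    """Centre a rank vector by its mean (t+1)/2 and scale to unit norm."""
    t = len(rank)
    c = [r - (t + 1) / 2 for r in rank]
    n = math.sqrt(sum(v * v for v in c))
    return [v / n for v in c]

def model_probs(t, kappa, theta):
    """Exact probabilities p(y) proportional to exp(kappa * theta'y)
    over all t! standardized rankings y."""
    ys = [standardize(p) for p in permutations(range(1, t + 1))]
    w = [math.exp(kappa * sum(a * b for a, b in zip(theta, y))) for y in ys]
    z = sum(w)  # normalizing constant 1 / C(kappa, theta)
    return [v / z for v in w]

theta = standardize([1, 2, 3])          # consensus direction
flat = model_probs(3, 0.0, theta)       # kappa = 0: uniform over the 3! rankings
peaked = model_probs(3, 5.0, theta)     # larger kappa: mass concentrates
```

At \(\kappa =0\) every ranking gets probability \(1/t!\); as \(\kappa \) grows, the ranking aligned with \(\varvec{\theta }\) dominates.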
To compute the normalizing constant \(C(\kappa ,\varvec{\theta })\), let \(\text {P}_{t}\) be the set of all possible permutations of the integers \(1,\ldots ,t\). Then
Notice that the summation is over the t! elements in \(\text {P}_{t}\). When t is large, say greater than 15, the exact calculation of the normalizing constant is prohibitive. Using the fact that the set of t! permutations lie on a sphere in \((t-1)\)-space, our model resembles the continuous von Mises-Fisher distribution, abbreviated as \(vMF(\textit{\textbf{x}}|\textit{\textbf{m}},\kappa )\), which is defined on the \(\left( p-1\right) \)-dimensional unit sphere with mean direction \(\textit{\textbf{m}}\) and concentration parameter \(\kappa \):
where
and \(I_{a}(\kappa )\) is the modified Bessel function of the first kind with order a. Consequently, we may approximate the sum in (12) by an integral over the sphere:
where \(\Gamma (.)\) is the gamma function. In [9], it is shown that this approximation is very accurate for values of \(\kappa \) ranging from 0.01 to 2 and t ranging from 4 to 11. Moreover, the error drops rapidly as t increases. Note that this approximation allows us to approximate the first and second derivatives of \(\log \,C\) which can facilitate our computation in what follows.
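The quality of this approximation can be checked directly for small t by comparing the exact sum over all t! standardized rankings with the sphere-based formula. The code below is a sketch under the assumption that the approximation takes the form \(t!\,\Gamma (p/2)\,(2/\kappa )^{p/2-1}I_{p/2-1}(\kappa )\) with \(p=t-1\) (my reading of replacing the discrete average by a uniform average over the sphere, not the paper's exact display); the Bessel function is computed from its series.

```python
import math
from itertools import permutations

def bessel_i(a, z, terms=30):
    """Modified Bessel function of the first kind I_a(z), via its series."""
    return sum((z / 2) ** (2 * k + a) / (math.factorial(k) * math.gamma(k + a + 1))
               for k in range(terms))

def standardize(rank):
    """Centre a rank vector by (t+1)/2 and scale to unit norm."""
    t = len(rank)
    c = [r - (t + 1) / 2 for r in rank]
    n = math.sqrt(sum(v * v for v in c))
    return [v / n for v in c]

def exact_sum(t, kappa, theta):
    """Sum of exp(kappa * theta'y) over all t! standardized rankings y."""
    total = 0.0
    for perm in permutations(range(1, t + 1)):
        y = standardize(perm)
        total += math.exp(kappa * sum(a * b for a, b in zip(theta, y)))
    return total

def approx_sum(t, kappa):
    """Sphere-based approximation: the t! points are treated as uniform on
    the sphere S^(t-2), giving t! * Gamma(p/2) * (2/kappa)^(p/2-1) * I_(p/2-1)(kappa)
    with p = t - 1."""
    p = t - 1
    return (math.factorial(t) * math.gamma(p / 2)
            * (2 / kappa) ** (p / 2 - 1) * bessel_i(p / 2 - 1, kappa))

t, kappa = 5, 1.0
theta = standardize(list(range(1, t + 1)))  # any unit vector in the rank space
exact = exact_sum(t, kappa, theta)
approx = approx_sum(t, kappa)
rel_err = abs(approx - exact) / exact
```

For \(t=5\) and \(\kappa =1\) the two quantities agree closely, consistent with the accuracy reported in [9].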
3.1 Maximum Likelihood Estimation (MLE) of Our Model
Let \(\textit{\textbf{Y}}=\left\{ \textit{\textbf{y}}_{1},\ldots ,\textit{\textbf{y}}_{N}\right\} \) be a random sample of N standardized rankings drawn from \(p(\textit{\textbf{y}}|\kappa ,\varvec{\theta })\). The log likelihood of \(\left( \kappa ,\varvec{\theta }\right) \) is then given by
Maximizing (13) subject to \(\left\| \varvec{\theta }\right\| =1\) and \(\kappa \ge 0\), we find that the maximum likelihood estimator of \(\varvec{\theta }\) is given by \(\hat{\varvec{\theta }}_{MLE}=\frac{\sum _{i=1}^{N}\textit{\textbf{y}}_{i}}{\left\| \sum _{i=1}^{N}\textit{\textbf{y}}_{i}\right\| },\) and \(\hat{\kappa }\) is the solution of
A simple approximation to the solution of (14) following [4] is given by
A more precise approximation can be obtained from a few iterations of Newton’s method. Using the method suggested by [8], starting from an initial value \(\kappa _{0}\), we can recursively update \(\kappa \) by iteration:
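A sketch of the two estimation steps just described: the closed-form start following [4] and the Newton refinement suggested by [8]. Here \(p\) denotes the dimension of the space in which the standardized rankings lie, \(A_{p}(\kappa )=I_{p/2}(\kappa )/I_{p/2-1}(\kappa )\), and \(\bar{r}=\left\| \sum _{i}\textit{\textbf{y}}_{i}\right\| /N\) would come from the data; the numeric values below are hypothetical, and the update shown is the standard vMF one, assumed to match the paper's.

```python
import math

def bessel_i(a, z, terms=30):
    """Modified Bessel function of the first kind I_a(z), via its series."""
    return sum((z / 2) ** (2 * k + a) / (math.factorial(k) * math.gamma(k + a + 1))
               for k in range(terms))

def a_p(p, kappa):
    """A_p(kappa) = I_{p/2}(kappa) / I_{p/2-1}(kappa)."""
    return bessel_i(p / 2, kappa) / bessel_i(p / 2 - 1, kappa)

def kappa_banerjee(r_bar, p):
    """Closed-form starting value of Banerjee et al. [4]."""
    return r_bar * (p - r_bar ** 2) / (1 - r_bar ** 2)

def kappa_newton(r_bar, p, iters=5):
    """Refine by Newton's method on A_p(kappa) = r_bar, following Sra [8].
    Uses A_p'(kappa) = 1 - A_p^2 - (p - 1) / kappa * A_p."""
    k = kappa_banerjee(r_bar, p)
    for _ in range(iters):
        a = a_p(p, k)
        k -= (a - r_bar) / (1 - a * a - (p - 1) / k * a)
    return k

# hypothetical mean resultant length and dimension
r_bar, p = 0.5, 4
k_hat = kappa_newton(r_bar, p)
```

A few Newton iterations typically reduce the residual \(A_{p}(\hat{\kappa })-\bar{r}\) to negligible size.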
3.2 One-Sample Bayesian Method with Conjugate Prior
Taking a Bayesian approach, we consider the following conjugate prior for \((\kappa ,\varvec{\theta })\) as
where \(\left\| \textit{\textbf{m}}_{0}\right\| =1\), \(\nu _{0},\beta _{0}\ge 0\). Given \(\textit{\textbf{y}}\), the posterior density of \((\kappa ,\varvec{\theta })\) can be expressed by
where \(\textit{\textbf{m}}=\left( \beta _{0}\textit{\textbf{m}}_{\textit{\textbf{0}}}+\sum _{i=1}^{N}\textit{\textbf{y}}_{i}\right) \beta ^{-1},\) \(\beta =\left\| \beta _{0}\textit{\textbf{m}}_{0}+\sum _{i=1}^{N}\textit{\textbf{y}}_{i}\right\| \). The posterior density can be factored as
where \(p(\varvec{\theta }|\kappa ,\textit{\textbf{y}})\sim vMF(\varvec{\theta }|\textit{\textbf{m}},\beta \kappa )\) and
The normalizing constant for \(p(\kappa |\textit{\textbf{y}})\) is not available in closed form. For reasons explained in [9], we approximate the posterior distribution using the method of variational inference (abbreviated VI from here on). Variational inference provides a deterministic approximation to an intractable posterior distribution through optimization. We first adopt a joint vMF-Gamma distribution as the prior for \((\kappa ,\varvec{\theta })\):
where \(Gamma(\kappa |a_{0},b_{0})\) is the Gamma density function with shape parameter \(a_{0}\) and rate parameter \(b_{0}\) (i.e., mean equal to \(\frac{a_{0}}{b_{0}}\)), and \(p(\varvec{\theta }|\kappa )=vMF(\varvec{\theta }|\textit{\textbf{m}}_{0},\beta _{0}\kappa )\). The choice of \(Gamma(\kappa |a_{0},b_{0})\) for \(p(\kappa )\) is motivated by the fact that for large values of \(\kappa \), \(p(\kappa )\) in (15) tends to take the shape of a Gamma density. In fact, for large values of \(\kappa \), \(I_{\frac{t-3}{2}}(\kappa )\simeq \frac{e^{\kappa }}{\sqrt{2\pi \kappa }}\), and hence \(p(\kappa )\) becomes the Gamma density with shape \((\nu _{0}-1)\frac{t-2}{2}+1\) and rate \(\nu _{0}-\beta _{0}\):
Using the variational inference framework, [9] showed that the optimal posterior distribution of \(\varvec{\theta }\) conditional on \(\kappa \) is a von Mises-Fisher distribution \(vMF(\varvec{\theta }|\textit{\textbf{m}},\kappa \beta )\) where
The optimal posterior distribution of \(\kappa \) is a \(Gamma(\kappa |a,b)\) with shape a and rate b with
Finally, the posterior mode \(\bar{\kappa }\) can be obtained from the previous iteration as
3.3 Two-Sample Bayesian Method with Conjugate Prior
Let \(\textit{\textbf{Y}}_{i}=\left\{ \textit{\textbf{y}}_{i1},\ldots ,\textit{\textbf{y}}_{iN_{i}}\right\} \) for \(i=1,2,\) be two independent random samples of standardized rankings each drawn, respectively, from \(p(\varvec{y_{i}}|\kappa _{i},\varvec{\theta _{i}}).\) Taking a Bayesian approach, we assume that conditional on \(\kappa \), there are independent von Mises conjugate priors, respectively, for \((\varvec{\theta }_{1},\varvec{\theta }_{2})\) as
where \(\left\| \textit{\textbf{m}}_{i0}\right\| =1\), \(\nu _{i0},\beta _{i0}\ge 0\). We shall be interested in computing the Bayes factor when considering two models. Under model 1, denoted \(M_{1},\) \(\varvec{\theta _{1}}=\varvec{\theta _{2}}\) whereas under model 2, denoted \(M_{2}\), equality is not assumed. The Bayes factor comparing the two models is defined to be
The Bayes factor enables us to compute the posterior odds of model 2 to model 1. We first deal with the denominator of \(B_{21}.\) Under \(M_{1}\), we assume a joint von Mises-Fisher prior on \(\varvec{\theta }\) and a Gamma prior on \(\kappa \):
Hence,
where \(N=N_{1}+N_{2}\) and
Now,
where in the last step, we used the method of variational inference as an approximation, with
and the posterior mode \(\bar{\kappa }\) is
It follows that the denominator of \(B_{21}\) is
For the numerator, we shall assume that conditional on \(\kappa \), there are independent von Mises conjugate priors, respectively, for \(\varvec{\theta }_{1},\varvec{\theta }_{2}\) given by
where \(\left\| \textit{\textbf{m}}_{0}\right\| =1\), \(\beta _{0}\ge 0\). Hence,
where for \(i=1,2,\)
and the posterior mode \(\bar{\kappa }\) is given recursively:
It follows that the numerator of the Bayes factor is
The Bayes factor is then given by the ratio
4 Conclusion
In this article, we have considered a few applications of the change of measure paradigm. In particular, it was possible to obtain a new derivation of the Friedman statistic. As well, extensions to the Bayesian models for ranking data were considered. Further applications as, for example, to the sign and Wilcoxon tests are found in [2].
References
Alvo, M., Xu, H.: The analysis of ranking data using score functions and penalized likelihood. Austrian J. Stat. 46, 15–32 (2017)
Alvo, M., Yu, P.L.H.: A Parametric Introduction to Nonparametric Statistics. Springer, Berlin (2018)
Alvo, M., Yu, P.L.H.: Statistical Methods for Ranking Data. Springer, Berlin (2014)
Banerjee, A., Dhillon, I.S., Ghosh, J., Sra, S.: Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res. 6, 1345–1382 (2005)
Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 32, 675–701 (1937)
McCullagh, P.: Models on spheres and models for permutations. In: Fligner, M.A., Verducci, J.S. (eds.) Probability Models and Statistical Analyses for Ranking Data, pp. 278–283. Springer, Berlin (1993)
Neyman, J.: Smooth test for goodness of fit. Skandinavisk Aktuarietidskrift 20, 149–199 (1937)
Sra, S.: A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of \(I_{s}(x)\). Comput. Stat. 27(1), 177–190 (2012)
Xu, H., Alvo, M., Yu, P.L.H.: Angle-based models for ranking data. Comput. Stat. Data Anal. 121, 113–136 (2018)
Acknowledgements
Work supported by the Natural Sciences and Engineering Council of Canada, Grant OGP0009068.
© 2020 Springer Nature Switzerland AG
Cite this paper
Alvo, M. (2020). Change of Measure Applications in Nonparametric Statistics. In: La Rocca, M., Liseo, B., Salmaso, L. (eds) Nonparametric Statistics. ISNPS 2018. Springer Proceedings in Mathematics & Statistics, vol 339. Springer, Cham. https://doi.org/10.1007/978-3-030-57306-5_2
Print ISBN: 978-3-030-57305-8
Online ISBN: 978-3-030-57306-5