
1 Dual Representation of the \(\varphi \)-Divergences and Tests

We consider CASM divergences (see [15] for definitions and properties):

$$\begin{aligned} D_\varphi (Q,P) = {\left\{ \begin{array}{ll} \int \varphi (\frac{dQ}{dP}) dP \text { if } Q \ll P \\ + \infty \text { otherwise} \end{array}\right. } \end{aligned}$$

where Q and P are probability measures on the same probability space. Extensions to divergences between probability measures and signed measures can be found in [22]. Dual formulations of divergences can be found in [7, 16]. Another interpretation of these formulations can be found in [9, Section 4.6]. They are widely considered in statistics, data analysis and machine learning (see e.g. [4, 20]).

As in [1], let \(\mathcal F\) be some class of \(\mathcal B\)-measurable (Borel) real valued functions and let \(\mathcal M_{\mathcal F} = \{P\in \mathcal M : \int |f| dP < \infty , \forall f\in \mathcal F\}\), where \(\mathcal M\) is the space of probability measures. Let \(P^*\in \mathcal M\) be arbitrary; in the following sections it will be the underlying true unknown probability law in a statistical context. Assume that \(\varphi \) is differentiable and strictly convex. Then, for all \(P\in \mathcal M_{\mathcal F}\) such that \(D_\varphi (P,P^*)\) is finite and \(\varphi '(dP/dP^*)\) belongs to \(\mathcal F\), \(D_\varphi \) admits the dual representation (see Theorem 4.4 in [6]):

$$\begin{aligned} D_\varphi (P,P^*) = \sup _{f\in \mathcal F} \int f dP - \int \varphi ^\#(f) dP^*, \end{aligned}$$
(1)

where \(\varphi ^\#(x) = \sup _{t\in \mathbb R} tx-\varphi (t)\) is the Fenchel-Legendre convex conjugate. Moreover, the supremum is uniquely attained at \(f = \varphi '(dP/dP^*)\).

This result can be used in two directions. First, a statistical model, e.g. a parametric model \(\{P_\theta : \theta \in \varTheta \}\) in which each \(P_\theta \) is absolutely continuous with respect to some dominating measure \(\mu \), naturally induces a family \(\mathcal F = \{\varphi '(p_\theta /p_{\theta '}) : \theta ,\theta '\in \varTheta \}\). This is the main framework of this paper.

Conversely, a class of functions \(\mathcal F\) determines which pairs of distributions P and Q can be compared, namely those such that \(\varphi '(dP/dQ)\in \mathcal F\), and it induces a divergence \(D_\varphi \) on these pairs. A typical example is the logistic model.

The KLm divergence is defined by the generator \(\varphi : x\in \mathbb R^{+*} \mapsto -\log x + x -1\) and leads to the maximum likelihood estimator for both forms of estimation, one of which, the supremal estimator, is defined below (see Remark 3.2 in [7]).
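
For the reader's convenience, here is the short computation behind this claim, written as a sketch under the extra assumption that P and \(P^*\) are mutually absolutely continuous. The generator gives

$$\begin{aligned} \varphi '(x) = 1 - \frac{1}{x}, \qquad \varphi ^\#(y) = \sup _{t>0}\,\bigl \{ty + \log t - t + 1\bigr \} = -\log (1-y), \quad y<1. \end{aligned}$$

At the optimal dual function \(f = \varphi '(dP/dP^*) = 1 - dP^*/dP\) one has \(\int f dP = 0\) and \(\varphi ^\#(f) = \log (dP/dP^*)\), so the right-hand side of (1) reduces to \(\int \log (dP^*/dP) dP^* = D_\varphi (P,P^*)\). The second integral in (1) is thus an expected log-likelihood ratio under \(P^*\), which explains why replacing \(P^*\) by the empirical measure leads back to maximum likelihood.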

We consider in this paper the problem of testing the number of components in a mixture model. This question has been considered by various authors. [2, 10, 12, 14, 17] considered likelihood ratio tests and pointed out difficulties due to the fact that the likelihood ratio statistic is unbounded with respect to n. [17] prove that its distribution is driven by a \(\log \log n\) term in a specific simple Gaussian mixture model, so the test statistic needs to be calibrated accordingly. But first, as stated by [17], the convergence to the limit distribution is extremely slow, which makes this result impractical. And second, it seems very difficult to derive the corresponding term for a different model, let alone for a general situation.

Our approach to this problem is suggested by the dual representation of the divergence. For the KLm divergence, it amounts to considering the maximum likelihood estimator itself as a test statistic instead of the usual maximum value of the likelihood function. This leads to a well-defined limiting distribution for the test statistic under the null. This holds for a class of estimators obtained by substituting KLm by any regular divergence. This approach also eliminates the curse of irregularity encountered by many authors for the problem of testing the number of components in a mixture.

Since we are interested in composite hypotheses, there is no guarantee in this context that the likelihood ratio test is the best (in terms of uniform power), as is usually assumed (e.g. [8, 18]); moreover, [13] showed the difficulties that likelihood ratio tests can encounter in this context.

[8] considered tests based on an estimate of the minimum divergence between the true distribution and the null model. Here we make use of the uniqueness of the optimiser in the dual representation of the divergence (1) and of the supremal divergence estimator introduced by [7]. An immediate practical advantage of this choice compared to estimating the minimum divergence is that one less optimisation is needed. Moreover, [23] showed that this estimator is robust for several choices of the divergence.

Our procedure for composite hypotheses consists in the aggregation of simple tests in the spirit of [11]. [5] used a similar aggregation procedure for testing between two distributions under noisy data and obtained some control of the resulting test power.

2 Notation and Hypotheses

Let \(\{f_1(\,.\,;\theta _1):\theta _1\in \varTheta _1\}\), \(\varTheta _1\subset \mathbb R^p\), and \(\{f_2(\,.\,;\theta _2):\theta _2\in \varTheta _2\}\), \(\varTheta _2\subset \mathbb R^q\), be probability density families with respect to a \(\sigma \)-finite measure \(\lambda \) on \((\mathcal X, \mathcal B)\). For some fixed open interval \(]a,b[ \ni 0\), let \(\varTheta \subset ]a,b[ \times \varTheta _1 \times \varTheta _2\), and

$$\begin{aligned} g_{\pi ,\theta } = (1-\pi ) f_1(\,.\,;\theta _1) + \pi f_2(\,.\,;\theta _2) \end{aligned}$$

for any \((\pi ,\theta )\in \varTheta \) with \(\theta = (\theta _1,\theta _2)\).

Assume that \(x_1,\dots ,x_n\in \mathbb R\) have been observed and are modelled as a realisation of the i.i.d. sample \(X_1,\dots ,X_n\) whose distribution \(\mathbb P^* := g_{\pi ^*,\theta ^*}.\lambda \) is known up to the parameters \((\pi ^*,\theta ^*)\in \varTheta \). Our aim is to test the hypothesis \(H_0 : \pi ^* = 0\).
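
As a purely illustrative instance (the Gaussian components and the parameter values below are our own choices, not required by the theory), the mixture density \(g_{\pi ,\theta }\) and a sample under \(H_0\) can be encoded as follows.

```python
import numpy as np
from scipy.stats import norm

def g_mix(x, pi, theta1, theta2):
    """g_{pi,theta}(x) = (1 - pi) f1(x; theta1) + pi f2(x; theta2).

    Illustrative choice: f1 and f2 Gaussian, theta1 = (mu1, sigma1), theta2 = (mu2, sigma2),
    and lambda the Lebesgue measure on the real line.
    """
    mu1, sigma1 = theta1
    mu2, sigma2 = theta2
    return (1.0 - pi) * norm.pdf(x, loc=mu1, scale=sigma1) + pi * norm.pdf(x, loc=mu2, scale=sigma2)

# Under H0 (pi* = 0) the data are drawn from the single component f1(.; theta1*):
rng = np.random.default_rng(0)
x_sample = rng.normal(loc=0.0, scale=1.0, size=500)   # theta1* = (0, 1)
```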

Assume that \(g_{\pi , \theta } = g_{\pi ^*, \theta ^*} \Rightarrow \pi = \pi ^*, \theta _1 = \theta _1^* \text { and, if } \pi ^* \ne 0 \text {, } \theta _2 = \theta _2^*\).

Let g be a probability density with respect to \(\lambda \) such that \(Supp(g) \subset Supp(g_{\pi ,\theta })\) for any \((\pi ,\theta )\in \varTheta \) and such that

$$\begin{aligned} \forall (\pi ,\theta ) \in \varTheta , \int \left| \varphi '(\frac{g}{g_{\pi ,\theta }}) \right| g d\lambda < \infty . \end{aligned}$$

Let us define for any \((\pi ,\theta ) \in \varTheta \),

$$\begin{aligned} m_{\pi ,\theta } : x\in \mathcal X \mapsto \int \varphi '\Bigl (\frac{g}{g_{\pi ,\theta }}\Bigr ) g d\lambda - \varphi ^\# \Bigl ( \frac{g}{g_{\pi ,\theta }} \Bigr ) (x) \end{aligned}$$

and assume that \((\pi ,\theta )\mapsto m_{\pi ,\theta }(x)\) is continuous for any \(x\in \mathcal X \). Let us also assume that

$$\begin{aligned} \forall (\tilde{\pi },\tilde{\theta })\in \varTheta , \exists r_0>0 / \forall r \le r_0, \ P^*\, \bigl |\sup _{d((\tilde{\pi },\tilde{\theta }),(\pi ,\theta ))< r} m_{\pi ,\theta } \bigr | < \infty \end{aligned}$$

where \(d(\cdot ,\cdot )\) denotes the Euclidean distance and where, as usual, the operator-type notation \(\mathbb P^*Y\) denotes the expectation—with respect to the probability measure \(\mathbb P^*\)—of the random variable Y.

Theorem 1

For any \((\pi ^*,\theta ^*)\in \varTheta \)

$$\begin{aligned} D_\varphi (g.\lambda ,g_{\pi ^*,\theta ^*}.\lambda ) = \sup _{(\pi ,\theta )\in \varTheta } P^*m_{\pi ,\theta }, \end{aligned}$$

which we call the supremal form of the divergence. Moreover, the supremum is uniquely attained at \((\pi ,\theta ) = (\pi ^*,\theta ^*)\).

Definition 1

Let \(\mathbb {P}_{n}\) denote the empirical measure pertaining to the sample \(X_{1},\dots ,X_{n}\). Define

$$\begin{aligned} (\hat{\pi },\hat{\theta }) := \arg \max _{\left( \pi ,\theta \right) \in \varTheta }\mathbb {P}_{n}m_{\pi ,\theta } \end{aligned}$$

the supremal estimator of \(\left( \pi ^{*},\theta ^{*}\right) \).

The existence of \(( \hat{\pi },\hat{\theta })\) can be guaranteed by assuming that \(\varTheta \) is compact. When uniqueness does not hold, consider any maximiser. This class of estimators was introduced in [7].

3 Consistency of the Supremal Divergence Estimator

Let us first state the consistency of the supremal divergence estimator of the proportion and of the parameters of the existing component when the parameters of the non-existing component are held fixed, uniformly over the latter.

Here and below, by abuse of notation, we let \(\varphi '\bigl (\frac{g}{g_{\pi ,\theta }}\bigr )\) stand for \(x\mapsto \varphi '\bigl (\frac{g(x)}{g_{\pi ,\theta }(x)}\bigr )\), and so on.

Remark that, for \(\pi ^* = 0\) and any \(\theta _1^*\in \varTheta _1\) and \(\theta _2\in \varTheta _2\), we can unambiguously write \(m_{\pi ^*,\theta _1^*}\) for \(m_{\pi ^*,\theta _1^*,\theta _2}\) since the parameter \(\theta _2\) is not involved in the expression of \(m_{0,\theta _1^*,\theta _2}\).

Theorem 2

Assume that \(\pi ^* = 0\) and, for any \(\theta _2\in \varTheta _2\), let \((\hat{\pi }(\theta _2),\hat{\theta }_1(\theta _2))\in ]a,b[ \times \varTheta _1\) be such that

$$\begin{aligned} \inf _{\theta _2\in \varTheta _2}P_nm_{\hat{\pi }(\theta _2),\hat{\theta }_1(\theta _2),\theta _2} \ge P_nm_{\pi ^*,\theta _1^*} - o_{P^*}(1). \end{aligned}$$
(2)

Then

$$\begin{aligned} \sup _{\theta _2\in \varTheta _2} d\bigl ( (\hat{\pi }(\theta _2), \hat{\theta }_1(\theta _2)), (0, \theta _1^*) \bigr ) \xrightarrow [n\rightarrow \infty ]{P^*} 0. \end{aligned}$$

The convergence holds a.s. in the particular case of (2) when, a.s.,

$$\begin{aligned} \forall \theta _2\in \varTheta _2, (\hat{\pi }(\theta _2),\hat{\theta }_1(\theta _2)) \in \mathop {\textrm{argmax}}\limits _{(\pi ,\theta _1) \in ]a,b[ \times \varTheta _1} P_nm_{\pi ,\theta _1,\theta _2}. \end{aligned}$$
(3)
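
Since, for the KLm generator, the supremal estimator coincides with the maximum likelihood estimator and does not depend on g (see Sect. 6), a maximiser as in (3) can in that case be computed with a generic optimiser. The sketch below is illustrative only: the Gaussian components, the starting point and the box constraints are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def mixture_density(x, pi, mu1, mu2):
    # Illustrative model: g_{pi,theta} = (1 - pi) N(mu1, 1) + pi N(mu2, 1), unit variances for simplicity.
    return (1.0 - pi) * norm.pdf(x, loc=mu1) + pi * norm.pdf(x, loc=mu2)

def pi_hat_klm(x, theta2, pi_bounds=(-0.2, 0.95), mu1_bounds=(-10.0, 10.0)):
    """Sketch of (3) for the KLm generator: maximise the average log-likelihood over
    (pi, theta1) in ]a,b[ x Theta_1 with theta2 (= mu2 here) held fixed."""
    def neg_criterion(par):
        pi, mu1 = par
        dens = mixture_density(x, pi, mu1, theta2)
        # clip keeps the log finite if a slightly negative pi makes the density vanish somewhere
        return -np.mean(np.log(np.clip(dens, 1e-300, None)))
    res = minimize(neg_criterion, x0=np.array([0.0, np.mean(x)]),
                   bounds=[pi_bounds, mu1_bounds], method="L-BFGS-B")
    return res.x  # (pi_hat(theta2), theta1_hat(theta2))
```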

4 Asymptotic Distribution of the Supremal Divergence Estimator

Under \(H_0\) (\(\pi ^* = 0\)), the joint asymptotic distribution of \((\hat{\pi }(\theta _2),\hat{\pi }(\theta _2'))\) is provided by the following theorem. The interior of \(\varTheta \) will be denoted by \(\mathring{\varTheta }\).

Theorem 3

Let \(\theta _2\in \varTheta _2\) and \(\theta _2'\in \varTheta _2\) be such that \((\pi ^*,\theta _1^*,\theta _2)\in \mathring{\varTheta }\) and \((\pi ^*,\theta _1^*,\theta _2')\in \mathring{\varTheta }\). Write

$$\begin{aligned} \varTheta (\theta _2)&= \{(\pi ,\theta _1)\in ]-\infty ,1[\times \varTheta _1 : (\pi ,\theta _1,\theta _2)\in \varTheta \} \\ \varTheta (\theta _2')&= \{(\pi ,\theta _1)\in ]-\infty ,1[\times \varTheta _1 : (\pi ,\theta _1,\theta _2')\in \varTheta \}, \end{aligned}$$

and let \((\hat{\pi },\hat{\theta }_1)\) and \((\hat{\pi }',\hat{\theta }_1')\) be such that

$$\begin{aligned} (\hat{\pi },\hat{\theta }_1)&\in \mathop {\textrm{argmax}}\limits _{(\pi ,\theta _1)\in \varTheta (\theta _2)} P_nm_{\pi ,\theta _1,\theta _2} \\ (\hat{\pi }',\hat{\theta }_1')&\in \mathop {\textrm{argmax}}\limits _{(\pi ,\theta _1)\in \varTheta (\theta _2')} P_nm_{\pi ,\theta _1,\theta _2'}. \end{aligned}$$

Assume that \(\pi ^* = 0\).

Moreover, assume that:

  • \((\pi ,\theta _1)\in \varTheta (\theta _2)\mapsto m_{\pi ,\theta _1,\theta _2}(x)\) (resp. \((\pi ,\theta _1)\in \varTheta (\theta _2')\mapsto m_{\pi ,\theta _1,\theta _2'}(x)\)) is differentiable \(\lambda \)-a.e. with derivative \(\psi _{\pi ,\theta _1} = \begin{pmatrix} \frac{\partial }{\partial \pi }m_{\pi ,\theta _1,\theta _2} \\ \frac{\partial }{\partial \theta _1}m_{\pi ,\theta _1,\theta _2} \end{pmatrix}_{|(\pi ,\theta _1)}\) (resp. \(\psi _{\pi ,\theta _1}' = \begin{pmatrix} \frac{\partial }{\partial \pi }m_{\pi ,\theta _1,\theta _2'} \\ \frac{\partial }{\partial \theta _1}m_{\pi ,\theta _1,\theta _2'} \end{pmatrix}_{|(\pi ,\theta _1)}\)) such that \(P^* \psi _{\pi ^*,\theta _1^*} = 0\) (resp. \(P^* \psi _{\pi ^*,\theta _1^*}' = 0\)).

  • \((\pi ,\theta _1)\in \varTheta (\theta _2) \mapsto P^* \psi _{\pi ,\theta _1}\) (resp. \((\pi ,\theta _1)\in \varTheta (\theta _2') \mapsto P^* \psi _{\pi ,\theta _1}'\)) is differentiable at \({\pi ^*,\theta _1^*}\) with invertible derivative matrix \(H = D{\left( P^*\psi \right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } \) (resp. \(H' = D{\left( P^*\psi '\right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } \)).

  • \(\{\psi _{\pi ,\theta _1} : (\pi ,\theta _1)\in \varTheta (\theta _2) \}\) and \(\{\psi _{\pi ,\theta _1}' : (\pi ,\theta _1)\in \varTheta (\theta _2') \}\) are \(P^*\)-Donsker.

  • \(\int (\psi _{\hat{\pi },\hat{\theta }_1}(x) - \psi _{\pi ^*,\theta _1^*}(x))^2 dP^*(x) \xrightarrow []{P^*}0\) and \(\int (\psi _{\hat{\pi },\hat{\theta }_1}'(x) - \psi _{\pi ^*,\theta _1^*}'(x))^2 dP^*(x) \xrightarrow []{P^*}0\).

Assume that \(H {=} D{\left( P^*\psi \right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } {=} P^*D^2{\left( h\right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } \) (resp. \(H' {=} D{\left( P^*\psi '\right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } {=}\) \( P^*D^2{\left( h'\right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } \)) with \(P^* |D^2{\left( h\right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } | < \infty \) (resp. \(P^* |D^2{\left( h'\right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } | < \infty \)) where \(h : (\pi ,\theta _1)\in \varTheta (\theta _2)\mapsto m_{\pi ,\theta _1,\theta _2}(x)\) (resp. \(h' : (\pi ,\theta _1)\in \varTheta (\theta _2')\mapsto m_{\pi ,\theta _1,\theta _2'}(x)\)).

Then with \(a_n\) (resp. \(a_n'\)) being the (1, 1)-entry of the matrix \(H_n^{-1} \cdot \bigl (P_n\psi _{\hat{\pi },\hat{\theta }_1}\psi _{\hat{\pi },\hat{\theta }_1}^T\bigr ) \cdot H_n^{-1}\) (resp. of \(H_n'^{-1} \cdot \bigl ( P_n\psi _{\hat{\pi },\hat{\theta }_1}'\psi _{\hat{\pi },\hat{\theta }_1}'^T\bigr ) \cdot H_n'^{-1}\)), where \(H_n\) (resp. \(H_n'\)) denotes the Hessian matrix of \((\pi ,\theta _1) \mapsto P_nm_{\pi ,\theta _1,\theta _2}\) (resp. of \((\pi ,\theta _1) \mapsto P_nm_{\pi ,\theta _1,\theta _2'}\)) at the point \((\hat{\pi },\hat{\theta }_1)\) and is assumed to be invertible with high probability, one gets

$$\begin{aligned} \begin{pmatrix} \sqrt{\frac{n}{a_n}}(\hat{\pi }- \pi ^*)\\ \sqrt{\frac{n}{a_n'}}(\hat{\pi }' - \pi ^*)\end{pmatrix} \xrightarrow {\mathcal L} \mathcal N(0,U) \end{aligned}$$

with

$$\begin{aligned} U = \left( \begin{array}{cc} 1 &{} \frac{b}{\sqrt{a a'}} \\ \frac{b}{\sqrt{a a'}} &{} 1 \\ \end{array} \right) \end{aligned}$$
(4)

where b is the (1, 1)-entry of the matrix \(H^{-1} \cdot \bigl (P^*\psi _{\pi ^*,\theta _1^*}\psi _{\pi ^*,\theta _1^*}'^T\bigr ) \cdot H'^{-1}\), and a (resp. \(a'\)) the (1, 1)-entry of the matrix \(H^{-1} \cdot \bigl (P^*\psi _{\pi ^*,\theta _1^*}\psi _{\pi ^*,\theta _1^*}^T\bigr ) \cdot H^{-1}\) (resp. of \(H'^{-1} \cdot \bigl (P^*\psi _{\pi ^*,\theta _1^*}'\psi _{\pi ^*,\theta _1^*}'^T\bigr ) \cdot H'^{-1}\)).

This result naturally generalises to k-tuples. The marginal result for \(\theta \) actually also holds when \(\pi ^* > 0\) and \(\theta _2 = \theta _2^*\), which is useful to control the power of the test procedure to be defined.

Let us consider as a test statistic \(T_n = \sup _{\theta _2} \sqrt{\frac{n}{a_n}} \hat{\pi }\) and let us reject \(H_0\) when \(T_n\) takes large values. It seems sensible to standardise \(\hat{\pi }\) for each value of \(\theta _2\) so that, under \(H_0\), it is asymptotically distributed as a \(\mathcal N(0,1)\) and the (standardised) values of \(\hat{\pi }\) for different values of \(\theta _2\) can be compared. In practice, the asymptotic variance has to be estimated, hence the substitution of \(\hat{\pi }\), \(\hat{\theta }_1\), and \(P_n\) for \(\pi ^*\), \(\theta _1^*\), and \(P^*\) in \(H^{-1} P^* \psi _{\pi ^*,\theta _1^*}\psi _{\pi ^*,\theta _1^*}^T H^{-1}\). This choice is justified in [19].
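
A minimal numerical sketch of this plug-in (assuming a vectorised implementation of \(m_{\pi ,\theta _1,\theta _2}\) with \(\theta _2\) fixed, and using central finite differences in place of analytic derivatives, an implementation shortcut of our own) could look as follows.

```python
import numpy as np

def sandwich_a_n(m, par_hat, x, eps=1e-4):
    """Plug-in estimate a_n of Theorem 3: (1,1)-entry of H_n^{-1} (P_n psi psi^T) H_n^{-1}.

    m(par, x): returns the vector (m_{par}(x_1), ..., m_{par}(x_n)) for fixed theta2,
               where par = (pi, theta1 components) are the free parameters;
    par_hat:   the supremal estimate (pi_hat, theta1_hat);
    x:         the observed sample.
    """
    par_hat = np.asarray(par_hat, dtype=float)
    k, n = par_hat.size, len(x)

    # psi: per-observation gradient of par -> m_par(x_i) at par_hat (central differences)
    psi = np.zeros((n, k))
    for j in range(k):
        e = np.zeros(k); e[j] = eps
        psi[:, j] = (m(par_hat + e, x) - m(par_hat - e, x)) / (2 * eps)

    # H_n: Hessian of par -> P_n m_par at par_hat
    def emp(par):
        return np.mean(m(par, x))
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = eps
            ej = np.zeros(k); ej[j] = eps
            H[i, j] = (emp(par_hat + ei + ej) - emp(par_hat + ei - ej)
                       - emp(par_hat - ei + ej) + emp(par_hat - ei - ej)) / (4 * eps ** 2)

    H_inv = np.linalg.inv(H)
    S = psi.T @ psi / n                     # P_n psi psi^T
    return (H_inv @ S @ H_inv)[0, 0]        # a_n
```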

The Bonferroni aggregation rule is not sensible here since the tests for different values of \(\theta _2\) are obviously not independent, so that such a procedure would lead to a conservative test. Hence the need, in Theorem 3, for the joint asymptotic distribution, so as to take into account the dependence between the values of \(\hat{\pi }\) for different \(\theta _2\). This leads to the study of the asymptotic distribution of \(T_n\), which should be the distribution of \(\sup W\) where W is a Gaussian process whose covariance structure is given by Theorem 3. This is proved in the next section.

5 Asymptotic Distribution of the Supremum of Supremal Divergence Estimators

\(H_0\) is assumed to hold in this section.

We first state that the asymptotic distribution of \(T_n\) is that of the supremum of a Gaussian process with covariance \(\frac{b}{\sqrt{aa'}}\), as in (4).

We then state that the distribution of the latter can be approximated by maximising, over a finite grid of values of \(\theta _2\), the Gaussian process with covariance \(\frac{b_n}{\sqrt{a_na_n'}}\), where \(a_n\), \(a_n'\), and \(b_n\) are estimates of the corresponding quantities.

Let X be the centred Gaussian process over \(\varTheta _2\) with

$$\begin{aligned} \forall \theta _2, \theta _2' \in \varTheta _2, r(\theta _2,\theta _2') = \textrm{Cov}(X_{\theta _2}, X_{\theta _2'}) = \frac{b}{\sqrt{aa'}} \end{aligned}$$

where a and b are defined in Theorem 3.

Theorem 4

Under general regularity conditions pertaining to the class of derivatives of m (Glivenko-Cantelli classes), we have

$$\begin{aligned} \Bigl (\sqrt{\tfrac{n}{a_n}}\, (\hat{\pi }(\theta _2) - \pi ^*)\Bigr )_{\theta _2\in \varTheta _2} \xrightarrow []{\mathcal L} X. \end{aligned}$$

This results from [21] when \(dim(\varTheta _2) = 1\) and [24] when \(dim(\varTheta _2) > 1\).

Theorem 5

Under the same general regularity conditions as above, we have

$$\begin{aligned} T_n \xrightarrow []{\mathcal L} \sup _{\theta _2\in \varTheta _2}X(\theta _2). \end{aligned}$$

The proof of the last result when \(dim(\varTheta _2) = 1\) makes use of the fact that \(\theta _2 \mapsto \hat{\pi }(\theta _2)\) is càdlàg ([21]). This is a reasonable assumption, which holds in the examples which we considered. We are eager for counter-examples! When \(dim(\varTheta _2) > 1\), the result also holds, by [3].

Let now \(X^n\) be the centred Gaussian process over \(\varTheta _2\) with

$$\begin{aligned} \forall \theta _2, \theta _2' \in \varTheta _2, \textrm{Cov}(X^n_{\theta _2}, X^n_{\theta _2'}) = \frac{b_n}{\sqrt{a_n a_n'}} \end{aligned}$$

where \(a_n\), \(a_n'\) are defined in Theorem 3 and \(b_n\) is defined analogously.

Theorem 6

Let, for any \(\delta > 0\), \(\varTheta _2^\delta \) be a finite set such that \(\forall \theta _2\in \varTheta _2, \exists \tilde{\theta }_2\in \varTheta _2^\delta / \Vert \theta _2-\tilde{\theta }_2\Vert \le \delta \). Then

$$\begin{aligned} M_n^\delta = \sup _{\theta _2\in \varTheta _2^\delta } X^n_{\theta _2} \xrightarrow [\begin{array}{c} n \rightarrow \infty \\ \delta \rightarrow 0 \end{array}]{\mathcal L} M = \sup _{\theta _2\in \varTheta _2} X_{\theta _2}. \end{aligned}$$
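
In practice, the law of \(M_n^\delta \) can be approximated by Monte Carlo simulation of the centred Gaussian vector on the grid. The sketch below is one possible implementation; the plain Cholesky factorisation and the small diagonal jitter are our own implementation choices.

```python
import numpy as np

def critical_value(R, alpha=0.05, n_sim=100_000, seed=0, jitter=1e-10):
    """Monte Carlo (1 - alpha)-quantile of M_n^delta = max over the grid Theta_2^delta of the
    centred Gaussian vector with correlation matrix R (entries b_n / sqrt(a_n a_n'))."""
    rng = np.random.default_rng(seed)
    k = R.shape[0]
    L = np.linalg.cholesky(R + jitter * np.eye(k))     # jitter keeps R numerically positive definite
    draws = rng.standard_normal((n_sim, k)) @ L.T      # rows ~ N(0, R)
    return np.quantile(draws.max(axis=1), 1.0 - alpha)
```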

6 Algorithm

Our algorithm for testing that the data was sampled from a single-component mixture (\(H_0 : \pi ^* = 0\)) against a two-component mixture (\(H_1 : \pi ^* > 0\)) is presented in Algorithm 1.

In this algorithm, \(\hat{\pi }(\theta _2)\) is defined in (2) and (3). It depends on g. The above theorems hold as long as g fulfils \(Supp(g) \subset Supp(g_{\pi ,\theta })\) for any \((\pi ,\theta )\in \varTheta \). However, g has to be chosen with care: the constants in the asymptotic distribution in Theorem 3 depend on it, and [23] argue that the choice of g can influence the robustness properties of the procedure.

The choice of \(\varphi \) is also obviously crucial (see also [23] for the induced robustness properties).

The choices of \(\varphi \) and g are important practical questions, which are work in progress.

As already stated, the supremal estimator for the modified Kullback-Leibler divergence \(\varphi : x \in \mathbb R^{+*} \mapsto -\log x + x - 1\) is the usual maximum likelihood estimator. In this instance the estimator does not depend on g.

[Algorithm 1]
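
The original pseudocode of Algorithm 1 is not reproduced here. The following rough sketch only assembles the steps described above (grid of \(\theta _2\) values, standardised estimates, Monte Carlo critical value); the callables estimate, variance and covariance are hypothetical placeholders for the routines sketched in the previous sections, not part of the authors' procedure.

```python
import numpy as np

def mixture_test(x, theta2_grid, estimate, variance, covariance,
                 alpha=0.05, n_sim=100_000, seed=0):
    """Test H0: pi* = 0 against a two-component mixture, along the lines of Sect. 6.

    estimate(x, theta2)        -> (pi_hat(theta2), theta1_hat(theta2))   (e.g. (3) for KLm)
    variance(x, theta2)        -> a_n of Theorem 3 at theta2
    covariance(x, t, t_prime)  -> b_n of Theorem 3 for the pair (t, t_prime)
    """
    n, k = len(x), len(theta2_grid)

    pi_hat = np.array([estimate(x, t)[0] for t in theta2_grid])
    a_n = np.array([variance(x, t) for t in theta2_grid])

    # Standardised estimates and the test statistic T_n
    z = np.sqrt(n / a_n) * pi_hat
    T_n = z.max()

    # Estimated correlation matrix b_n / sqrt(a_n a_n') on the grid
    R = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            R[i, j] = R[j, i] = covariance(x, theta2_grid[i], theta2_grid[j]) / np.sqrt(a_n[i] * a_n[j])

    # Monte Carlo critical value for the supremum of the grid Gaussian process (Theorem 6)
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(R + 1e-10 * np.eye(k))
    sup_draws = (rng.standard_normal((n_sim, k)) @ L.T).max(axis=1)
    c_alpha = np.quantile(sup_draws, 1.0 - alpha)

    return {"T_n": T_n, "critical_value": c_alpha, "reject_H0": T_n > c_alpha}
```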