Abstract
We consider two robust detection problems formulated in terms of nonparametric hypotheses such that both hypotheses are composite and indistinguishable. Strongly consistent testing rules are presented.
The research reported here was supported in part by the National Development Agency (NFÜ, Hungary) as part of the project Introduction of Cognitive Methods for UAV Collision Avoidance Using Millimeter Wave Radar (grant no.: KMR-12-1-2012-0008).
1 Composite Hypotheses Defined by Half Spaces of Distributions
Let \(\nu _0,\nu _1\) be fixed distributions on \(\mathbb {R}^d\) which are the nominal distributions under the two hypotheses. Let
$$ V(\nu ,\mu )=\sup _{A}|\nu (A)-\mu (A)| $$
denote the total variation distance between two distributions \(\nu \) and \(\mu \), where the supremum is taken over all Borel sets A of \(\mathbb {R}^d\).
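Since Scheffé's theorem (used again in Sect. 2) gives \(V(\nu ,\mu )=\frac{1}{2}\int |f_{\nu }-f_{\mu }|\) whenever both distributions have densities, the total variation distance between two nominal distributions is easy to approximate numerically. The following Python sketch is purely illustrative and not part of the chapter; the Gaussian nominals, the grid, and the function names are our own choices.

```python
# Illustrative sketch (not from the chapter): for nominal distributions
# with densities, Scheffe's identity gives
#   V(nu, mu) = (1/2) * integral |f_nu - f_mu|,
# so the total variation distance can be approximated on a grid.
import numpy as np
from scipy.stats import norm

def tv_distance(pdf0, pdf1, grid):
    """Numerical total variation distance between two densities."""
    return 0.5 * np.trapz(np.abs(pdf0(grid) - pdf1(grid)), grid)

grid = np.linspace(-10.0, 10.0, 20001)
nu0_pdf = norm(0.0, 1.0).pdf        # example nominal density under H_0
nu1_pdf = norm(1.0, 1.0).pdf        # example nominal density under H_1
print(tv_distance(nu0_pdf, nu1_pdf, grid))   # about 0.383 for these Gaussians
```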
Let \(X,X_1,X_2,\dots \) be i.i.d. random vectors with common distribution \(\mu \). We observe \(X_1,\dots ,X_n\). Under the hypothesis \(H_j\) (\(j=0,1\)) the distribution \(\mu \) is a distorted version of \(\nu _j\). Formally, the two hypotheses are defined by
and
Our aim is to construct a distribution-free strongly consistent test, which makes an error only finitely many times almost surely (a.s.). The concept of a strongly consistent test is quite unusual: it means that, both under \( H_0\) and under \( H_1\), the test a.s. makes no error after a random sample size. In other words, denoting by \({\mathbb {P}}_0\) and \({\mathbb {P}}_1\) the probability under the hypotheses \( H_0\) and \( H_1\), we have
and
In a real-life problem, for example when the data arrive sequentially, one gets the data just once and should make good inferences from these data. Strong consistency means that this single sequence of inferences is a.s. perfect once the sample size is large enough. This concept is close to the definition of discernibility introduced by Dembo and Peres [5]. For a discussion and references, we refer the reader to Biau and Györfi [3], Devroye and Lugosi [7], Gretton and Györfi [10], and Györfi and Walk [15].
Motivated by a related goodness of fit test statistic of Györfi and van der Meulen [14], we put
and
where \(\mu _n\) denotes the empirical measure associated with the sample \(X_1, \ldots ,X_n\), so that
$$ \mu _n(A)=\frac{1}{n}\sum _{i=1}^{n}\mathbb {I}_{\{X_i\in A\}} $$
for any Borel subset A, and \(\mathcal {P}_n=\{A_{n,1},\ldots ,A_{n,m_n}\}\) is a finite partition of \({\mathbb R}^d\).
We introduce a test such that the hypothesis \(H_0\) is accepted if
and rejected otherwise.
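The displays defining the statistics \(T_n^{(0)}\), \(T_n^{(1)}\) and the acceptance rule (22.1) are not reproduced above. Assuming, in the spirit of the \(L_1\) statistic of Györfi and van der Meulen [14] cited before, that \(T_n^{(j)}=\sum _{A\in \mathcal {P}_n}|\mu _n(A)-\nu _j(A)|\) and that \(H_0\) is accepted when \(T_n^{(0)}\le T_n^{(1)}\), a minimal Python sketch is given below; the product partition built from per-axis bin edges and all parameter choices are illustrative, not the authors'.

```python
# Hedged sketch of a test of the form (22.1), under the assumption
#   T_n^{(j)} = sum_{A in P_n} |mu_n(A) - nu_j(A)|   (L1-type statistic),
# accepting H_0 when T_n^{(0)} <= T_n^{(1)}.  The product partition and
# all parameter choices are illustrative, not the authors'.
import numpy as np
from scipy.stats import norm

def l1_statistic(sample, nominal_cell_probs, edges):
    """T_n^{(j)} over a product partition given by per-axis bin edges."""
    counts, _ = np.histogramdd(sample, bins=edges)
    emp = counts / len(sample)            # empirical measure mu_n(A) per cell
    return np.sum(np.abs(emp - nominal_cell_probs))

def accept_H0(sample, nu0_cell_probs, nu1_cell_probs, edges):
    t0 = l1_statistic(sample, nu0_cell_probs, edges)
    t1 = l1_statistic(sample, nu1_cell_probs, edges)
    return t0 <= t1

# Example in d = 1 with Gaussian nominals; cells restricted to [-6, 6],
# so the (negligible) mass outside this interval is ignored.
edges = [np.linspace(-6.0, 6.0, 25)]                    # 24 cells
cell0 = np.diff(norm(0.0, 1.0).cdf(edges[0]))           # nu_0 cell probabilities
cell1 = np.diff(norm(1.0, 1.0).cdf(edges[0]))           # nu_1 cell probabilities
x = norm(0.0, 1.0).rvs(size=(1000, 1), random_state=0)  # data generated under H_0
print(accept_H0(x, cell0, cell1, edges))                # expected: True
```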
The sequence of partitions \(\mathcal {P}_1,\mathcal {P}_2,\dots \) is called asymptotically fine if for any sphere S centered at the origin
$$ \lim _{n\rightarrow \infty }\, \max _{j:\,A_{n,j}\cap S\ne \emptyset }\mathrm {diam}(A_{n,j})=0. $$
Theorem 22.1
Assume that the sequence of partitions \(\mathcal {P}_1,\mathcal {P}_2,\dots \) is asymptotically fine and
Then the test (22.1) is strongly consistent.
Proof
Assume \(H_0\) without loss of generality. Then the error event means that
Thus,
where
Introduce the notation
The sequence of partitions \(\mathcal {P}_1,\mathcal {P}_2,\dots \) is asymptotically fine, which implies that
(cf. Biau and Györfi [3]). Thus, for all n large enough,
Beirlant et al. [2] and Biau and Györfi [3] proved that, for any \(\delta >0\),
Therefore
Because of (22.2),
and so the Borel-Cantelli lemma implies that a.s.
for all n large enough, i.e., the error
occurs a.s. for only finitely many n. Thus, strong consistency is proved. \(\Box \)
In a straightforward way, the proof of Theorem 22.1 can be extended to infinite partitions if we assume that for each sphere S centered at the origin
Next, a variant of the test (22.1) with much smaller computational complexity will be defined. The test statistic is based on a recursive histogram. In this section we assume that the partitions are infinite and all cells of the partitions have finite and positive Lebesgue measure \(\lambda \). Let \(A_n(x)\) denote the cell of \(\mathcal {P}_n\) to which x belongs. The density estimate
is called a recursive histogram.
For \(A\in \mathcal {P}_n\), introduce the estimate
Notice that \(\mu _n^*(A)\) can be calculated in a recursive way, which is important in on-line applications. These definitions imply that
If the sequence of partitions \(\mathcal {P}_1,\mathcal {P}_2,\dots \) is nested, i.e., the sequence of \(\sigma \)-algebras \(\sigma (\mathcal {P}_n)\) is non-decreasing, then for \(A\in \mathcal {P}_n\) let the ancestor \(B_A^{(i)}\in \mathcal {P}_i\) be such that \(A \subseteq B_A^{(i)}\) (\(i\le n\)). One can check that for nested partitions the estimate has the following form:
Put
(\(j=0,1\)). We introduce a test such that the hypothesis \(H_0\) is accepted if
and rejected otherwise.
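The displays defining the recursive histogram \(f_n\) and the estimate \(\mu _n^*(A)\) are not reproduced above. Assuming the usual recursive histogram \(f_n(x)=\frac{1}{n}\sum _{i=1}^{n}\mathbb {I}_{\{X_i\in A_i(x)\}}/\lambda (A_i(x))\), in which each observation is binned with the partition in force when it arrives, and \(\mu _n^*(A)=\int _A f_n\), the Python sketch below illustrates the constant-time online update and the nested-ancestor computation of \(\mu _n^*(A)\) on dyadic partitions of \([0,1)\); the dyadic partitions, the level schedule, and all names are our own illustrative choices.

```python
# Hedged sketch of a recursive (online) histogram on nested dyadic
# partitions of [0, 1), assuming the usual recursive histogram
#   f_n(x) = (1/n) * sum_{i<=n} 1{X_i in A_i(x)} / lambda(A_i(x)),
# i.e. each observation is binned with the partition used at its
# arrival time.  The dyadic partitions and level schedule are illustrative.
from collections import defaultdict
import math
import random

class RecursiveDyadicHistogram:
    def __init__(self):
        self.n = 0
        # counts[level][k]: number of observations that were binned at
        # `level` and fell into the cell [k * 2^-level, (k+1) * 2^-level).
        self.counts = defaultdict(lambda: defaultdict(int))

    def _level(self, i):
        # Partition P_i used for the i-th observation (2^level cells);
        # any slowly growing, non-decreasing schedule keeps the partitions nested.
        return max(1, int(math.log2(i + 1)))

    def update(self, x):
        """O(1) per observation: bin x (in [0, 1)) with the current partition."""
        self.n += 1
        lev = self._level(self.n)
        self.counts[lev][int(x * 2 ** lev)] += 1

    def mu_star(self, level, k):
        """mu_n^*(A) for A = [k * 2^-level, (k+1) * 2^-level), where `level`
        is the current (finest) level, via the nested-ancestor formula:
        the ancestor B_A^{(i)} of A at a coarser level drops low bits of k."""
        total = 0.0
        for lev, cells in self.counts.items():
            ancestor = k >> (level - lev)            # index of B_A^{(i)}
            total += cells.get(ancestor, 0) * 2.0 ** (lev - level)
        return total / self.n

# Example usage with data in [0, 1):
random.seed(0)
hist = RecursiveDyadicHistogram()
for _ in range(1000):
    hist.update(random.random())
level = hist._level(hist.n)                          # current partition level
print(sum(hist.mu_star(level, k) for k in range(2 ** level)))   # 1.0
```

Presumably the test (22.3) then compares \(L_1\)-type statistics of the same form as before, with \(\mu _n^*\) in place of \(\mu _n\); the corresponding displays are likewise not reproduced above.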
Theorem 22.2
Assume that the sequence of partitions \(\mathcal {P}_1,\mathcal {P}_2,\dots \) is asymptotically fine such that
Further suppose that \(\mu \) has a density. Then the test (22.3) is strongly consistent.
Proof
Assume \(H_0\) without loss of generality. One notices
where
and
By Biau and Györfi [3],
the latter because of \(H_0\). Next we show that \(L_n^*\rightarrow 0 \) a.s. Denote the density of \(\mu \) by f. Thus
Therefore we have to prove the strong \(L_1\)-consistency of the recursive histogram. Consider the bias part. Introduce the ordinary histogram:
and put
According to the Abou-Jaoude theorem, if the sequence of partitions \(\mathcal {P}_1,\mathcal {P}_2,\dots \) is asymptotically fine, then
(cf. Devroye and Györfi [6]). Thus, for the bias term of the recursive histogram, we get
For the variation term of the recursive histogram, we apply the generalized Kolmogorov theorem. Let \(U_n\), \(n=1,2,\dots \) be an \(L_2\)-valued sequence of independent, zero-mean random variables such that
where \(\Vert \cdot \Vert _2\) denotes the \(L_2\) norm. Then
a.s. (cf. Györfi et al. [11]). For
one has to verify the condition of the generalized Kolmogorov theorem:
by the condition of the theorem, and so
a.s. From Lemma 3.1 in Györfi and Masry [13] we get that the limit relations (22.4) and (22.5) imply
a.s. Therefore a.s.
for all n large enough, and so strong consistency is proved. \(\Box \)
2 Composite Hypotheses Defined by Half Spheres of Distributions
Again, under the hypothesis \(H'_j\) (\(j=0,1\)) the distribution \(\mu \) is a distorted version of \(\nu _j\). In this section we assume that the true distribution lies within a certain total variation distance of the underlying nominal distribution.
We formally define the two hypotheses by
where
Because of
the test (22.1) in the previous section is strongly consistent. In this section we introduce a simpler test. For the notations
the general version of Scheffé’s theorem implies that
where
Introduce the notation
where
The proposed test is the following: accept hypothesis \(H'_0\) if
and reject otherwise.
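The displays defining \(A_{0,1}\) and the acceptance rule (22.7) are not reproduced above. Assuming, as the discussion in Sects. 3.2 and 3.5 suggests, that \(A_{0,1}\) is the Scheffé set of \(\nu _0\) versus \(\nu _1\) (for densities, \(\{x: f_0(x)\ge f_1(x)\}\)) and that \(H'_0\) is accepted when \(\mu _n(A_{0,1})\) exceeds \((\nu _0(A_{0,1})+\nu _1(A_{0,1}))/2\), a minimal Python sketch follows; the Gaussian nominals and all names are illustrative.

```python
# Hedged sketch of a test in the spirit of (22.7), assuming A_{0,1} is the
# Scheffe set {x : f_0(x) >= f_1(x)} and that H'_0 is accepted when
# mu_n(A_{0,1}) > (nu_0(A_{0,1}) + nu_1(A_{0,1})) / 2.  The Gaussian
# nominals and all names are illustrative.
import numpy as np
from scipy.stats import norm

nu0, nu1 = norm(0.0, 1.0), norm(1.0, 1.0)   # nominal distributions (example)

def in_A01(x):
    # Scheffe set of nu_0 versus nu_1; for these Gaussians it is {x <= 1/2}.
    return nu0.pdf(x) >= nu1.pdf(x)

# Precomputed once, before seeing any data (cf. Sect. 3.2).
grid = np.linspace(-10.0, 10.0, 20001)
mask = in_A01(grid)
p0 = np.trapz(np.where(mask, nu0.pdf(grid), 0.0), grid)   # nu_0(A_{0,1})
p1 = np.trapz(np.where(mask, nu1.pdf(grid), 0.0), grid)   # nu_1(A_{0,1})

def accept_H0_prime(sample):
    mu_n = np.mean(in_A01(sample))           # empirical mass of A_{0,1}
    return mu_n > (p0 + p1) / 2.0

x = nu0.rvs(size=1000, random_state=1)       # data drawn from nu_0 itself
print(accept_H0_prime(x))                    # expected: True
```

Only the single probability \(\mu _n(A_{0,1})\) has to be updated as the data arrive, which is what makes the test computationally attractive (see Sect. 3.2).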
Then, we get that
Theorem 22.3
The test (22.7) is strongly consistent.
Proof
Assume \(H'_0\) without loss of generality. Put
Observe that by the Scheffé theorem [22],
Rearranging the obtained inequality, we get that
Therefore, (22.8) and Hoeffding’s inequality [16] imply that
Therefore the Borel-Cantelli lemma implies strong consistency. \(\Box \)
3 Discussion
3.1 Indistinguishability
For the hypotheses \(H_0\) and \(H_1\) there is no positive margin, because the gap between \(H_0\) and \(H_1\) is just the hyperplane
Moreover, the margin is zero:
Without any positive margin condition it is impossible to derive a uniform bound on the error probabilities. The pair \((H_0,H_1)\) of hypotheses is called distinguishable if there is a sequence of uniformly consistent tests, which means that the errors of the first and second kind tend to zero uniformly. For a test \(T_n\) with sample size n, let \(\alpha _{n,\mu }(T_n)\) and \(\beta _{n,\mu }(T_n)\) denote the errors of the first and second kind, resp. Put
A sequence of tests \(T_n\), \(n=1,2,\dots \) is called uniformly consistent if
It is known that a necessary condition for distinguishability is that for any distribution \(\mu \)
(See Barron [1], Ermakov [9], Hoeffding and Wolfowitz [17], Kraft [18], Le Cam [19], Le Cam and Schwartz [20], Schwartz [23].) Obviously, this necessary condition is not satisfied when \(\mu ^*=(\nu _0+\nu _1)/2\). Because of
the pair \((H'_0,H'_1)\) of hypotheses is indistinguishable, too.
3.2 Computation
The hypothesis testing method (22.7) proposed above is computationally quite simple. The set \(A_{0,1}\) and the nominal probabilities \(\nu _0(A_{0,1})\) and \(\nu _1(A_{0,1})\) may be computed and stored before seeing the data. Then one merely needs to calculate \(\mu _n(A_{0,1})\).
3.3 Hypotheses Formulated by Densities
Devroye et al. [8] formulated a special case of hypotheses \((H'_0,H'_1)\), when \(\mu \), \(\nu _0\), and \(\nu _1\) have densities f, \(f_0\), and \(f_1\). Under some mild margin condition they proved uniform exponential bounds for the probability of failure for \(k\ge 2\) hypotheses. Moreover, they illustrated robustness of these bounds under additive noise, and showed examples where the test (22.7) is consistent and the maximum likelihood test does not work. Formally, the maximum likelihood test \(T_n\) is defined by
For \(f\in H'_0\), the strong law of large numbers implies the strong consistency of the maximum likelihood test if both integrals \(\int f\log f_0\) and \(\int f\log f_1\) are well defined, and
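The display defining the maximum likelihood test is not reproduced above. Assuming the usual form, namely deciding \(H'_0\) when the log-likelihood of the sample under \(f_0\) is at least as large as under \(f_1\), a minimal Python sketch with illustrative Gaussian nominal densities looks as follows.

```python
# Hedged sketch of the maximum likelihood test, assuming the usual form:
# decide H'_0 when sum_i log f_0(X_i) >= sum_i log f_1(X_i).
# The Gaussian nominal densities are illustrative; the remark above gives
# a condition on the integrals int f log f_0 and int f log f_1 under
# which the strong law of large numbers yields strong consistency.
import numpy as np
from scipy.stats import norm

f0, f1 = norm(0.0, 1.0), norm(1.0, 1.0)      # nominal densities (example)

def ml_test_accepts_H0(sample):
    return np.sum(f0.logpdf(sample)) >= np.sum(f1.logpdf(sample))
```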
3.4 Robustness
Note that Theorem 22.3 does not require any assumptions about the nominal distributions. The test is robust in a very strong sense: we obtain consistency under the sole assumption that the distorted distribution remains within a certain total variation distance of the nominal distribution. For example, if \(\mu \) is either \((1-\delta )\nu _0 + \delta \tau \), or \((1-\delta )\nu _1 + \delta \tau \) with arbitrary “strange” distribution \(\tau \) such that \(\delta < \varDelta \), then we have (22.6):
The outliers’ distribution \(\tau \) is really arbitrary: for example, it need not have an expectation, and it may even be a discrete distribution. The outlier probability \(\delta \) can be at most equal to \(\varDelta \). The outlier mechanism can be formulated as follows: we are given three independent i.i.d. sequences \(\{U_i\}\), \(\{V_i\}\), \(\{I_i\}\), where \(\{U_i\}\), \(\{V_i\}\) are \(\mathbb {R}^d\)-valued and \(\{I_i\}\) are binary. Put
$$ X_n=(1-I_n)U_n+I_nV_n. $$
If \(U_n\) is \(\nu _0\) distributed, \(V_n\) is \(\tau \) distributed, and \({\mathbb {P}}\{I_n=1\}=\delta \), then we get the previous scheme. Other applications include the case of censored observations, when \(V_n\) is a distortion of \(U_n\) such that some components of the vector \(U_n\) are censored. In this scheme \(\delta \) is the probability of censoring. Notice that in order to estimate the distribution from censored observations one needs the samples \(\{(X_i,I_i)\}_{i=1}^n\) (cf. Györfi et al. [12]), while for detection it is enough to have \(\{X_i\}_{i=1}^n\).
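To make this outlier scheme concrete, the following Python sketch simulates \(X_n=(1-I_n)U_n+I_nV_n\) with \(U_n\sim \nu _0\), \(V_n\sim \tau \) and \({\mathbb {P}}\{I_n=1\}=\delta \). The standard Cauchy choice of \(\tau \) (a distribution without expectation), \(\delta =0.1\), and the Gaussian nominals are illustrative, not taken from the chapter.

```python
# Hedged simulation of the contamination scheme X_n = (1 - I_n) U_n + I_n V_n
# with U_n ~ nu_0, V_n ~ tau and P{I_n = 1} = delta.  The standard Cauchy
# tau (which has no expectation) and delta = 0.1 are illustrative choices.
import numpy as np
from scipy.stats import norm, cauchy, bernoulli

n, delta = 1000, 0.1
U = norm(0.0, 1.0).rvs(size=n, random_state=2)     # nominal nu_0 observations
V = cauchy().rvs(size=n, random_state=3)           # arbitrary outlier observations
I = bernoulli(delta).rvs(size=n, random_state=4)   # contamination indicators
X = np.where(I == 1, V, U)                         # observed distorted sample

# Empirical mass of the Scheffe set {x <= 1/2} of N(0,1) versus N(1,1)
# (cf. the earlier sketch of test (22.7)); for a small contamination
# probability it remains well above the threshold 1/2.
print(np.mean(X <= 0.5))
```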
3.5 Open Problems
1. Characterize the distributions \(\mu \in H_0{\setminus } H'_0\) for which the simple test (22.7) is strongly consistent. As in the proof of Theorem 22.3, strong consistency can be verified if
$$ \mu (A_{0,1}) > \frac{\nu _0(A_{0,1}) + \nu _1(A_{0,1})}{2}. $$
We are interested in non-consistent examples, too.

2. Perhaps the test (22.1) can be improved, since the construction of the partitions does not take the properties of \(\nu _0\) and \(\nu _1\) into account. For example, one could somehow incorporate the set \(A_{0,1}\).
3.6 Sequential Tests
We have dealt with sequences of nonparametric tests with increasing sample size n, for which, almost surely, type I and type II errors occur for only finitely many n. These must be distinguished from nonparametric sequential tests with power one (cf. Darling and Robbins [4], Sect. 6 in Robbins [21], Sect. 9.2 in Sen [24]). If the alternative hypothesis is valid, such tests almost surely terminate at a random sample size, rejecting the null hypothesis \(H_0\) after finitely many observations; if \(H_0\) is valid, with positive probability they do not terminate (open-ended procedures). In the latter case an upper bound on the complementary probability is an upper bound on the type I error probability.
References
Barron, A.R.: Uniformly powerful goodness of fit tests. Ann. Stat. 17(1), 107–124 (1989)
Beirlant, J., Devroye, L., Györfi, L., Vajda, I.: Large deviations of divergence measures on partitions. J. Stat. Plan. Inference 93(1–2), 1–16 (2001)
Biau, G., Györfi, L.: On the asymptotic properties of a nonparametric \(l_1\)-test statistic of homogeneity. IEEE Trans. Inf. Theory 51(11), 3965–3973 (2005)
Darling, D.A., Robbins, H.: Some nonparametric sequential tests with power one. Proc. Natl. Acad. Sci. USA 61(3), 804–809 (1968)
Dembo, A., Peres, Y.: A topological criterion for hypothesis testing. Ann. Stat. 22(1), 106–117 (1994)
Devroye, L., Györfi, L.: Nonparametric Density Estimation: The \(L_1\) View. Wiley, New York (1985)
Devroye, L., Lugosi, G.: Almost sure classification of densities. J. Nonparametric Stat. 14(6), 675–698 (2002)
Devroye, L., Györfi, L., Lugosi, G.: A note on robust hypothesis testing. IEEE Trans. Inf. Theory 48(7), 2111–2114 (2002)
Ermakov, M.: On distinguishability of hypotheses. Technical report (2013). arXiv:1308.4295
Gretton, A., Györfi, L.: Consistent nonparametric tests of independence. J. Mach. Learn. Res. 11, 1391–1423 (2010)
Györfi, L., Györfi, Z., Vajda, I.: A strong law of large numbers and some applications. Stud. Sci. Math. Hung. 12, 233–244 (1977)
Györfi, L., Kohler, M., Krzyżak, A., Walk, H.: Distribution-Free Theory of Nonparametric Regression. Springer, New York (2002)
Györfi, L., Masry, E.: The \(L_1\) and \(L_2\) strong consistency of recursive kernel density estimation from dependent samples. IEEE Trans. Inf. Theory 36(3), 531–539 (1990)
Györfi, L., van der Meulen, E.C.: A consistent goodness of fit test based on the total variation distance. In: Roussas, G. (ed.) Nonparametric Functional Estimation and Related Topics, pp. 631–645. Kluwer, Dordrecht (1990)
Györfi, L., Walk, H.: Strongly consistent nonparametric tests of conditional independence. Stat. Probab. Lett. 82(6), 1145–1150 (2012)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
Hoeffding, W., Wolfowitz, J.: Distinguishability of sets of distributions. Ann. Math. Stat. 29(3), 700–718 (1958)
Kraft, C.: Some conditions for consistency and uniform consistency of statistical procedures. Univ. Calif. Publ. Stat. 2, 125–142 (1955)
Le Cam, L.: Convergence of estimates under dimensionality restrictions. Ann. Stat. 1(1), 38–53 (1973)
Le Cam, L., Schwartz, L.: A necessary and sufficient condition for the existence of consistent estimates. Ann. Math. Stat. 31(1), 140–150 (1960)
Robbins, H.: Statistical methods related to the law of the iterated logarithm. Ann. Math. Stat. 41(5), 1397–1409 (1970)
Scheffé, H.: A useful convergence theorem for probability distributions. Ann. Math. Stat. 18(3), 434–438 (1947)
Schwartz, L.: On Bayes procedures. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 4(1), 10–26 (1965)
Sen, P.K.: Sequential Nonparametrics: Invariance Principles and Statistical Inference. Wiley, New York (1981)