An early version of this work appeared as TR16-015 of ECCC. The original version reproduced some text from the author’s lecture notes on property testing [8], which were later used as a basis for his book [9]. The current revision is quite minimal, except for the correction of various typos and a significant elaboration of the Appendix.

1 Introduction

Inspired by Diakonikolas and Kane [5], we present, for every fixed distribution D over [n], a simple reduction of the problem of testing whether an unknown distribution over [n] equals D to the problem of testing whether an unknown distribution over [n] equals the uniform distribution over [n]. Specifically, we reduce \(\epsilon \)-testing of equality to D to \(\epsilon /3\)-testing of equality to the uniform distribution over [6n], denoted \(U_{6n}\).

Hence, the sample (resp., time) complexity of testing equality to D, with respect to the proximity parameter \(\epsilon \), is at most the sample (resp., time) complexity of testing equality to \(U_{6n}\) with respect to the proximity parameter \(\epsilon /3\). Since optimal bounds were known for both problems (cf., e.g., [1, 2, 4, 12, 14, 17]), our reduction yields no new bounds. Still, it provides a simple way of obtaining testers for equality to arbitrary fixed distributions from testers for the uniform distribution.

The Setting at a Glance. For any fixed distribution D over [n], we consider the problem of \(\epsilon \)-testing equality to D, where the tester is given samples drawn from an unknown distribution X and is required to distinguish the case that \(X\equiv D\) from the case that X is \(\epsilon \)-far from D, where the distance is the standard statistical distance. The sample complexity of this testing problem depends on D, and is viewed as a function of n and \(\epsilon \). We write \(D\subseteq [n]\) to denote that D ranges over [n].

Wishing to present reductions between such problems, we need to spell out what we mean by such a reduction. Confining ourselves to problems of testing equality to fixed distributions, we use a very stringent notion of a reduction, which we call reduction via a filter. Specifically, we say that \(\epsilon \)-testing equality to \(D\subseteq [n]\) reduces to \(\epsilon '\)-testing equality to \(D'\subseteq [n']\) if there exists a randomized process F (called a filter) that maps [n] to \([n']\) such that the distribution D is mapped to the distribution \(D'\) and any distribution that is \(\epsilon \)-far from D is mapped to a distribution that is \(\epsilon '\)-far from \(D'\), where we say that F maps the distribution X to the distribution Y if \(Y\equiv F(X)\).

Note that the foregoing is a very stringent notion of reduction between distribution testing problems: Under this notion, a tester T for \(\epsilon \)-testing equality to D is derived by invoking an \(\epsilon '\)-tester \(T'\) for equality to \(D'\) and providing \(T'\) with the sample \(F(i_1),...,F(i_s)\), where \(i_1,...,i_s\) is the sample provided to T. Still, our main result can be stated as follows.

Theorem 1.1

(completeness of testing equality to \(U_n\)): For every distribution D over [n] and every \(\epsilon >0\), it holds that \(\epsilon \)-testing equality to D reduces to \(\epsilon /3\)-testing equality to \(U_{6n}\), where \(U_m\) denotes the uniform distribution over [m]. Furthermore, the same reduction F can be used for all \(\epsilon >0\).

Hence, the sample complexity of \(\epsilon \)-testing equality to D is upper-bounded by the sample complexity of \(\epsilon /3\)-testing equality to \(U_{6n}\). We mention that in some cases, testing equality to D can be easier than testing equality to \(U_n\); such natural cases include grained distributions (see below). (A general study of the dependence on D of the complexity of testing equality to D was undertaken in [17].)

Our Reduction at a Glance. We decouple the reduction asserted in Theorem 1.1 into two steps. In the first step, we assume that the distribution D has a probability function q that ranges over multiples of 1/m, for some parameter \(m\in \mathbb N\); that is, \(m\cdot q(i)\) is a non-negative integer (for every i). We call such a distribution m-grained, and reduce testing equality to any fixed m-grained distribution to testing equality to the uniform distribution over [m]. This reduction maps i to an element selected uniformly at random in a set \(S_i\) of size \(m\cdot q(i)\) such that the \(S_i\)’s are disjoint. Clearly, this reduction maps the distribution q to the uniform distribution over m fixed elements, and it can be verified that this randomized mapping preserves distances between distributions.

Since every distribution over [n] is \(\epsilon /2\)-close to an \(O(n/\epsilon )\)-grained distribution, it stands to reason that the general case can be reduced to the grained case. This is indeed true, but the reduction is less obvious than the treatment of the grained case. Actually, we shall use a different “graining” procedure (than the one alluded to above), which yields a better result (i.e., a reduction to the case of O(n)-grained distributions rather than to the case of \(O(n/\epsilon )\)-grained distributions). Specifically, we present a reduction of \(\epsilon \)-testing equality to any distribution \(D\subseteq [n]\) to \(\epsilon /3\)-testing equality to some 6n-grained distribution \(D'\), where \(D'\) depends only on D. This reduction is described next.

Letting \(q:[n]\rightarrow [0,1]\) denote the probability function of D, the reduction maps \(i\in [n]\) to itself with probability \(\frac{{\lfloor 6n\,\cdot \, q(i)\rfloor }/6n}{q(i)}\), and otherwise maps i to \(n+1\). This description suffices when \(q(i)\ge 1/2n\) for every \(i\in [n]\), since in this case \(\frac{{\lfloor 6n\,\cdot \, q(i)\rfloor }/6n}{q(i)}\ge \frac{2}{3}\), and in order to guarantee this condition (i.e., \(q(i)\ge 1/2n\) for every \(i\in [n]\)) we use a preliminary reduction that maps \(i\in [n]\) to itself with probability 1/2 and maps it uniformly to [n] otherwise. This preliminary reduction cuts the distance between distributions by a factor of two, and it can be shown that the main randomized mapping preserves distances between distributions up to a constant factor (of 2/3).

History, Credits, and an Acknowledgement. The study of testing properties of distributions was initiated by Batu, Fortnow, Rubinfeld, Smith and White [2].Footnote 1 Testers of sample complexity \(\mathrm{poly}(1/\epsilon )\cdot {\sqrt{n}}\) for equality to \(U_n\) and for equality to an arbitrary distribution D over [n] were presented by Goldreich and Ron [12] and Batu et al. [1], respectively, where the presentation in [12] is only implicit.Footnote 2 The tight lower and upper bounds of \(\varTheta ({\sqrt{n}}/\epsilon ^2)\) on the sample complexity of both problems were presented in [4, 14, 17] (see also [5, 6]). For a general survey of the area, the interested reader is referred to Canonne [3].

As stated upfront, our reductions are inspired by Diakonikolas and Kane [5], who presented a unified approach for deriving optimal testers for various properties of distributions (and pairs of distributions) via reductions to testing the equality of two unknown distributions that have small \(\mathcal{L}_2\)-norm. We note that our reduction from testing equality to grained distributions to testing equality to the uniform distribution is implicit in [6].

Lastly, we wish to thank Ilias Diakonikolas for numerous email discussions, which were extremely helpful in many ways.

Organization. In Sect. 2 we recall the basic context and define the restricted notion of a reduction used in this work. The core of this work is presented in Sect. 3, where we prove Theorem 1.1. In Sect. 4 we briefly consider the problem of testing whether an unknown distribution is grained, leaving an open problem. The appendix addresses a side issue that arises in Sect. 4.

2 Preliminaries

We consider discrete probability distributions. Such distributions have a finite support, which we assume to be a subset of [n] for some \(n\in \mathbb N\), where the support of a distribution is the set of elements assigned positive probability mass. We represent such distributions either by random variables, like X, that are assigned values in [n] (indicated by writing \(X\in [n]\)), or by probability functions like \(p:[n]\rightarrow [0,1]\) that satisfy \(\sum _{i\in [n]}p(i)=1\). These two representations correspond via \(p(i)=\mathbf{Pr}[X\!=\!i]\). At times, we also refer to distributions as such, and denote them by D. (Distributions over other finite sets can be treated analogously, but in such a case we may provide the tester with a description of the set; indeed, n serves as a concise description of [n].)

Recall that the study of “distribution testing” refers to testing properties of distributions. That is, the object being tested is a distribution, and the property it is tested for is a property of distributions (equiv., a set of distributions). The tester itself is given samples from the distribution and is required to distinguish the case that the distribution has the property from the case that the distribution is far from having the property, where the distance between distributions is defined as the total variation distance between them (a.k.a. the statistical difference). That is, X and Y are said to be \(\epsilon \)-close if

$$\begin{aligned} \frac{1}{2}\cdot \sum _{i}\Bigl |\mathbf{Pr}[X\!=\!i]-\mathbf{Pr}[Y\!=\!i]\Bigr |\le \epsilon , \end{aligned}$$
(1)

and otherwise they are deemed \(\epsilon \)-far. With this definition in place, we are ready to recall the standard definition of testing distributions.
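As a concrete companion to Eq. (1), the following Python sketch computes the total variation distance between two distributions represented as dictionaries mapping elements to probabilities (the function name is ours, not the paper's):

```python
def tv_distance(p, q):
    """Total variation distance, as in Eq. (1).
    Distributions are dicts mapping elements to their probabilities;
    elements missing from a dict are assigned probability 0."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(i, 0.0) - q.get(i, 0.0)) for i in support)
```

For example, the distributions (0.7, 0.3) and (0.5, 0.5) over {1, 2} are at distance 0.2.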

Definition 2.1

(testing properties of distributions): Let \(\mathcal{D}=\{\mathcal{D}_n\}_{n\in \mathbb N}\) be a property of distributions and \(s:\mathbb N\times (0,1]\rightarrow \mathbb N\). A tester, denoted T, of sample complexity s for the property \(\mathcal D\) is a probabilistic machine that, on input parameters n and \(\epsilon \), and a sequence of \(s(n,\epsilon )\) samples drawn from an unknown distribution \(X\in [n]\), satisfies the following two conditions.

  1.

    The tester accepts distributions that belong to \(\mathcal D\): If X is in \(\mathcal{D}_n\), then

    $$\mathbf{Pr}_{i_1,...,i_s\sim X}[T(n,\epsilon ;i_1,...,i_s)\!=\!1]\ge 2/3,$$

    where \(s=s(n,\epsilon )\) and \(i_1,...,i_s\) are drawn independently from the distribution X.

  2.

    The tester rejects distributions that are far from \(\mathcal D\): If X is \(\epsilon \)-far from any distribution in \(\mathcal{D}_n\) (i.e., X is \(\epsilon \)-far from \(\mathcal D\)), then

    $$\mathbf{Pr}_{i_1,...,i_s\sim X}[T(n,\epsilon ;i_1,...,i_s)\!=\!0]\ge 2/3,$$

    where \(s=s(n,\epsilon )\) and \(i_1,...,i_s\) are as in the previous item.

Our focus is on “singleton” properties; that is, the property is \(\{D_n\}_{n\in \mathbb N}\), where \(D_n\) is a fixed distribution over [n]. Note that n fully specifies the distribution \(D_n\), and we do not consider the complexity of obtaining an explicit description of \(D_n\) from n. For the sake of simplicity, we will consider a generic n and omit it from the notation (i.e., use D rather than \(D_n\)). Furthermore, we refer to \(\epsilon \)-testers derived by setting the proximity parameter to \(\epsilon \). Nevertheless, all testers discussed here are actually uniform with respect to the proximity parameter \(\epsilon \) (and also with respect to n, assuming that they have already derived or obtained an explicit description of \(D_n\)).

Confining ourselves to problems of testing equality to distributions, we formally restate the notion of a reduction used in the introduction. In fact, we explicitly refer to the randomized mapping at the heart of the reduction, and also define a stronger (i.e., uniform over \(\epsilon \)) notion of a reduction that captures the furthermore part of Theorem 1.1.

Definition 2.2

(reductions via filters): We say that a randomized process F, called a filter, reduces \(\epsilon \)-testing equality to \(D\subseteq [n]\) to \(\epsilon '\)-testing equality to \(D'\subseteq [n']\) if the following two conditions hold:

  1.

    The filter F maps the distribution D to the distribution \(D'\); that is, \(p'(i)=\sum _{j}p(j)\cdot \mathbf{Pr}[F(j)\!=\!i]\), where p and \(p'\) denote the probability functions of D and \(D'\), respectively.

  2.

    The filter F maps any distribution that is \(\epsilon \)-far from D to a distribution that is \(\epsilon '\)-far from \(D'\); that is, if q is \(\epsilon \)-far from D, then \(q'(i)\,{\mathop {=}\limits ^\mathrm{def}}\,\sum _{j}q(j)\cdot \mathbf{Pr}[F(j)\!=\!i]\) is \(\epsilon '\)-far from \(D'\).

We say that F reduces testing equality to \(D\subseteq [n]\) to testing equality to \(D'\subseteq [n']\) if, for some constant c and every \(\epsilon >0\), it holds that F reduces \(\epsilon \)-testing equality to D to \(\epsilon /c\)-testing equality to \(D'\).

Recall that we say that F maps the distribution X to the distribution Y if Y and F(X) are identically distributed (i.e., \(Y\equiv F(X)\)), where we view the distributions as random variables. We stress that if F is invoked t times on the same i, then the t outcomes are (identically and) independently distributed. Hence, a sequence of samples drawn independently from a distribution X is mapped to a sequence of samples drawn independently from the distribution F(X).
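The pushforward of a probability function by a filter, i.e., \(q'(i)=\sum _{j}q(j)\cdot \mathbf{Pr}[F(j)\!=\!i]\) from Definition 2.2, can be computed explicitly when the filter is given as a stochastic kernel. The following Python sketch (with hypothetical names; the paper specifies filters as sampling procedures, not kernels) illustrates this computation using exact rationals:

```python
from fractions import Fraction

def pushforward(p, F):
    """Probability function of F(X) when X ~ p, as in Definition 2.2:
    q'(i) = sum_j p(j) * Pr[F(j) = i].  Here the filter F is represented
    explicitly as a dict j -> {i: Pr[F(j) = i]} (a stochastic kernel)."""
    out = {}
    for j, pj in p.items():
        for i, prob in F[j].items():
            out[i] = out.get(i, Fraction(0)) + pj * prob
    return out
```

For instance, if F(1) = 1 deterministically while F(2) is uniform over {1, 2}, then the uniform distribution on {1, 2} is mapped to (3/4, 1/4).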

Note (added in revision): As stated in the introduction, Definition 2.2 captures a natural but stringent notion of a reduction. First, note that this notion extends to reducing testing any set of distributions \(\mathcal D\) to testing the set \(\mathcal{D}'\) (by requiring that F maps any distribution in \(\mathcal D\) to some distribution in \(\mathcal{D}'\) while mapping any distribution that is \(\epsilon \)-far from \(\mathcal D\) to a distribution that is \(\epsilon '\)-far from \(\mathcal{D}'\)). However, more general definitions may allow the tester of \(\mathcal D\) to use the sample provided to it in arbitrary ways and invoke the tester of \(\mathcal{D}'\) on an arbitrary sample as long as it distinguishes distributions in \(\mathcal D\) from distributions that are \(\epsilon \)-far from \(\mathcal D\). While such general definitions are analogous to Cook-reductions, Definition 2.2 seems analogous to a very restricted (i.e., “local”) notion of a Karp-reduction.

3 The Reduction

Recall that testing equality to a fixed distribution D means testing the property \(\{D\}\); that is, testing whether an unknown distribution equals the fixed distribution D. For any distribution D over [n], we present a reduction of the task of \(\epsilon \)-testing \(\{D\}\) to the task of \(\epsilon /3\)-testing the uniform distribution over [6n].

3.1 Overview

We decouple the reduction into two steps. In the first step, we assume that the distribution D has a probability function q that ranges over multiples of 1/m, for some parameter \(m\in \mathbb N\); that is, \(m\,\cdot \, q(i)\) is a non-negative integer (for every i). We call such a distribution m-grained, and reduce testing equality to any fixed m-grained distribution to testing uniformity (over [m]). Next, in the second step, we reduce testing equality to any distribution over [n] to testing equality to some 6n-grained distribution.

Definition 3.1

(grained distributions): A probability distribution over [n] having a probability function \(q:[n]\rightarrow [0,1]\) is m-grained if q ranges over multiples of 1/m; that is, if for every \(i\in [n]\) there exists a non-negative integer \(m_i\) such that \(q(i)=m_i/m\).

Note that the uniform distribution over [n] is n-grained, and it is the only n-grained distribution having support [n]. Furthermore, if a distribution D results from applying some deterministic process to the uniform distribution over [m], then D is m-grained. On the other hand, any m-grained distribution must have support size at most m.
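Checking m-grainedness is straightforward when the probability function is given exactly; the following Python sketch (the function name is ours) uses exact rationals to avoid floating-point issues:

```python
from fractions import Fraction

def is_grained(q, m):
    """Is the distribution q (a dict i -> exact Fraction summing to 1)
    m-grained, i.e., is every q(i) a non-negative multiple of 1/m?"""
    return all((Fraction(v) * m).denominator == 1 for v in q.values())
```

For example, the uniform distribution over [n] is n-grained, and the distribution (1/2, 1/4, 1/4) is 4-grained but not 3-grained.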

3.2 Testing Equality to a Fixed Grained Distribution

Fixing any m-grained distribution (represented by a probability function) \(q:[n]\rightarrow \{j/m:j\in \mathbb N\cup \{0\}\}\), we consider a randomized transformation (or “filter”), denoted \(F_q\), that maps the support of q to \(S=\{{\langle {i,j}\rangle }:i\!\in \![n]\wedge j\!\in \![m_i]\}\), where \(m_i=m\cdot q(i)\). (We stress that, as with any randomized process, invoking the filter several times on the same input yields independently and identically distributed outcomes.) Specifically, for every i in the support of q, we map i uniformly to \(S_i=\{{\langle {i,j}\rangle }:j\!\in \![m_i]\}\); that is, \(F_q(i)\) is uniformly distributed over \(S_i\). If i is outside the support of q (i.e., \(q(i)=0\)), then we map it to \({\langle {i,0}\rangle }\). Note that \(|S|=\sum _{i\in [n]}m_i=\sum _{i\in [n]}m\cdot q(i)=m\). The key observations about this filter are:

  1.

    The filter \(F_q\) maps q to a uniform distribution: If Y is distributed according to q, then \(F_q(Y)\) is distributed uniformly over S; that is, for every \({\langle {i,j}\rangle }\in S\), it holds that

    $$\begin{aligned} \mathbf{Pr}[F_q(Y)={\langle {i,j}\rangle }]&= \mathbf{Pr}[Y=i] \cdot \mathbf{Pr}[F_q(i)={\langle {i,j}\rangle }] \\&= q(i) \cdot \frac{1}{m_i} \\&= \frac{m_i}{m} \cdot \frac{1}{m_i} \end{aligned}$$

    which equals \(1/m=1/|S|\).

  2.

    The filter preserves the variation distance between distributions: The total variation distance between \(F_q(X)\) and \(F_q(X')\) equals the total variation distance between X and \(X'\). This holds since, for \(S'=S\cup \{{\langle {i,0}\rangle }:i\in [n]\}\), we have

    $$\begin{aligned}&\sum _{{\langle {i,j}\rangle }\in S'} \Bigl |\mathbf{Pr}[F_q(X)={\langle {i,j}\rangle }]-\mathbf{Pr}[F_q(X')={\langle {i,j}\rangle }] \Bigr | \\&\;\;= \sum _{{\langle {i,j}\rangle }\in S'} \Bigl |\mathbf{Pr}[X=i] \cdot \mathbf{Pr}[F_q(i)={\langle {i,j}\rangle }] -\mathbf{Pr}[X'=i] \cdot \mathbf{Pr}[F_q(i)={\langle {i,j}\rangle }] \Bigr | \\&\;\;= \sum _{{\langle {i,j}\rangle }\in S'} \mathbf{Pr}[F_q(i)={\langle {i,j}\rangle }]\cdot \Bigl |\mathbf{Pr}[X=i]-\mathbf{Pr}[X'=i] \Bigr | \\&\;\;= \sum _{i\in [n]} \Bigl |\mathbf{Pr}[X=i]-\mathbf{Pr}[X'=i] \Bigr |. \end{aligned}$$

    Indeed, this is a generic statement that applies to any filter that maps i to a random variable \(Z_i\), which only depends on i, such that the supports of the different \(Z_i\)’s are disjoint.

Observing that knowledge of q allows one to implement \(F_q\) as well as to map S to [m], we obtain the following reduction.

Algorithm 3.2

(reducing testing equality to m-grained distributions to testing uniformity over [m]): Let D be an m-grained distribution with probability function \(q:[n]\rightarrow \{j/m:j\in \mathbb N\cup \{0\}\}\). On input \((n,\epsilon ;i_1,...,i_s)\), where \(i_1,...,i_s\in [n]\) are samples drawn according to an unknown distribution \(p:[n]\rightarrow [0,1]\), invoke an \(\epsilon \)-tester for uniformity over [m] by providing it with the input \((m,\epsilon ;i'_1,...,i'_s)\) such that for every \(k\in [s]\) the sample \(i'_k\) is generated as follows:

  1.

    Generate \({\langle {i_k,j_k}\rangle }\leftarrow F_q(i_k)\).

    Recall that if \(m_{i_k}\,{\mathop {=}\limits ^\mathrm{def}}\, m\cdot q(i_k)>0\), then \(j_k\) is selected uniformly in \([m_{i_k}]\), and otherwise \(j_k\leftarrow 0\). We stress that if \(F_q\) is invoked t times on the same i, then the t outcomes are (identically and) independently distributed. Hence, the s samples drawn independently from p are mapped to s samples drawn independently from \(p'\) such that \(p'({\langle {i,j}\rangle })=p(i)/m_i\) if \(j\in [m_i]\) and \(p'({\langle {i,0}\rangle })=p(i)\) if \(m_i=0\).

  2.

    If \(j_k\in [m_{i_k}]\), then \({\langle {i_k,j_k}\rangle }\in S\) is mapped to its rank in S (according to a fixed order of S), where \(S=\{{\langle {i,j}\rangle }:i\!\in \![n]\wedge j\!\in \![m_i]\}\), and otherwise \({\langle {i_k,j_k}\rangle }\not \in S\) is mapped to \(m+1\).

(Alternatively, the reduction may just reject if any of the \(j_k\)’s equals 0.)Footnote 3
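The steps of Algorithm 3.2 (applying \(F_q\) and then mapping S to [m] by ranks) can be sketched as follows. This is an illustration under the assumption that q is given as exact rationals; the names (e.g., make_Fq, sizes) are ours, and we fix the order of S by sorting the i's:

```python
import random
from fractions import Fraction

def make_Fq(q, m):
    """Sketch of the filter F_q of Algorithm 3.2, composed with the
    rank map from S to [m].  q maps elements of [n] to exact Fractions
    that are multiples of 1/m.  Each i in the support is mapped uniformly
    into its block S_i, whose elements are identified with their ranks in
    S = [m]; an i with q(i) = 0 is mapped to m+1 (standing for <i, 0>)."""
    sizes = {i: int(Fraction(v) * m) for i, v in q.items()}  # m_i = m*q(i)
    start, offset = {}, 0
    for i in sorted(sizes):          # fix an order of S by sorting the i's
        start[i] = offset
        offset += sizes[i]
    assert offset == m               # sum_i m_i = m for an m-grained q

    def Fq(i):
        if sizes.get(i, 0) > 0:
            return start[i] + random.randrange(sizes[i]) + 1  # rank in [m]
        return m + 1                 # i outside the support of q
    return Fq
```

For the 4-grained distribution (1/2, 1/4, 1/4), element 1 is mapped uniformly to {1, 2}, while elements 2 and 3 are mapped to 3 and 4, respectively.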

The foregoing description presumes that the tester for uniform distributions over [m] also operates well when given arbitrary distributions (which may have a support that is not a subset of [m]).Footnote 4 However, any tester for uniformity can be easily extended to do so (see discussion in Sect. 3.4). In any case, we get

Proposition 3.3

(Algorithm 3.2 as a reduction): The filter \(F_q\) used in Algorithm 3.2 reduces \(\epsilon \)-testing equality to an m-grained distribution D (over [n]) to \(\epsilon \)-testing equality to the uniform distribution over [m], where the distributions tested in the latter case are over \([m+1]\). Furthermore, if the support of D equals [n] (i.e., \(q(i)>0\) for every \(i\in [n]\)), which may happen only if \(m\ge n\), then the reduction is to testing whether a distribution over [m] is uniform on [m].

Using any of the known uniformity tests that have sample complexity \(O({\sqrt{n}}/\epsilon ^2)\),Footnote 5 we obtain—

Corollary 3.4

(testing equality to m-grained distributions): For any fixed m-grained distribution D, the property \(\{D\}\) can be \(\epsilon \)-tested in sample complexity \(O({\sqrt{m}}/\epsilon ^2)\).

The foregoing tester for equality to grained distributions seems to be of independent interest that extends beyond its use towards testing equality to arbitrary distributions.

3.3 From Arbitrary Distributions to Grained Ones

We now turn to the problem of testing equality to an arbitrary known distribution, represented by \(q:[n]\rightarrow [0,1]\). The basic idea is to round all probabilities to multiples of \(\gamma /n\), for an error parameter \(\gamma \) (which will be a small constant). Of course, this rounding should be performed so that the sum of probabilities equals 1. For example, we may use a randomized filter that, on input i, outputs i with probability \(\frac{m_i\,\cdot \,\gamma /n}{q(i)}\), where \(m_i={\lfloor q(i)\,\cdot \, n/\gamma \rfloor }\), and outputs \(n+1\) otherwise. Hence, if i is distributed according to p, then the output of this filter will be i with probability \(\frac{\gamma m_i/n}{q(i)}\cdot p(i)\). This works well if \(\gamma m_i/n\approx q(i)\), which is the case if \(q(i)\gg \gamma /n\) (equiv., \(m_i\gg 1\)), but may run into trouble otherwise.

For starters, we note that if \(q(i)=0\), then \(\frac{\gamma m_i/n}{q(i)}\) is undefined, and replacing it by either 0 or 1 will not do. More generally, suppose that \(q(i)\in (0,\gamma /n)\) (e.g., \(q(i)=0.4\gamma /n\)). In this case, setting \(m_i=0\) means that the filter is oblivious of the probability assigned to this i, and does not distinguish distributions that agree on \(\{i:q(i)\ge \gamma /n\}\) but greatly differ on \(\{i:q(i)<\gamma /n\}\), which means that it does not distinguish the distribution associated with q from some distributions that are \(0.1\gamma \)-far from it.Footnote 6 Hence, we modify the basic idea so as to avoid this problem.

Specifically, we first use a filter that averages the input distribution p with the uniform distribution, and so guarantees that all elements occur with probability at least 1/2n, while preserving distances between different input distributions (up to a factor of two). Only then do we apply the foregoing proposed filter (which outputs i with probability \(\frac{m_i\,\cdot \,\gamma /n}{q(i)}\), where \(m_i={\lfloor q(i)\cdot n/\gamma \rfloor }\), and outputs \(n+1\) otherwise). Details follow.

  1.

    We first use a filter \(F'\) that, on input \(i\in [n]\), outputs i with probability 1/2, and outputs an element that is uniformly distributed in [n] otherwise. Hence, if i is distributed according to the distribution p, then \(F'(i)\) is distributed according to \(p'=F'{({p})}\) such that

    $$\begin{aligned} p'(i)=\frac{1}{2}\cdot p(i)+\frac{1}{2}\cdot \frac{1}{n}. \end{aligned}$$
    (2)

    (Indeed, we denote by \(F'{({p})}\) the probability function of the distribution obtained by selecting i according to the probability function p and outputting \(F'(i)\).)

    Let \(q'=F'{({q})}\); that is, \(q'(i)=0.5\cdot q(i)+(1/2n)\ge 1/2n\).

  2.

    Next, we apply a filter \(F''_{q'}\), which is related to the filter \(F_q\) used in Algorithm 3.2. Letting \(m_i={\lfloor q'(i)\cdot n/\gamma \rfloor }\), on input \(i\in [n]\), the filter outputs i with probability \(\frac{m_i\,\cdot \,\gamma /n}{q'(i)}\), and outputs \(n+1\) otherwise (i.e., with probability \(1-\frac{m_i\gamma /n}{q'(i)}\)).

    Note that \(\frac{m_i\gamma /n}{q'(i)}\le 1\), since \(m_i\le q'(i)\cdot n/\gamma \). On the other hand, recalling that \(q'(i)\ge 1/2n\) and observing that \(m_i\cdot \gamma /n > ((q'(i)\cdot n/\gamma )-1)\cdot \gamma /n = q'(i)-(\gamma /n)\), it follows that \(\frac{m_i\gamma /n}{q'(i)} > \frac{q'(i)\,-\,(\gamma /n)}{q'(i)} \ge 1-2\gamma \), since \(q'(i)\ge 1/2n\).

    Now, if i is distributed according to the distribution \(p'\), then \(F''_{q'}(i)\) is distributed according to \(p'':[n+1]\rightarrow [0,1]\) such that, for every \(i\in [n]\), it holds that

    $$\begin{aligned} p''(i)=p'(i)\cdot \frac{m_i\cdot \gamma /n}{q'(i)} \end{aligned}$$
    (3)

    and \(p''(n+1)=1-\sum _{i\in [n]}p''(i)\).

    Let \(q''\) denote the probability function related (by \(F''_{q'}\)) to \(q'\). Then, for every \(i\in [n]\), it holds that \(q''(i)=q'(i)\cdot \frac{m_i\gamma /n}{q'(i)}=m_i\,\cdot \,\gamma /n \in \{j\cdot \gamma /n:j\in \mathbb N\cup \{0\}\}\) and \(q''(n+1)=1-\sum _{i\in [n]}m_i\cdot \gamma /n<\gamma \), since \(m\,{\mathop {=}\limits ^\mathrm{def}}\,\sum _{i\in [n]}m_i > \sum _{i\in [n]}((n/\gamma )\cdot q'(i)-1) = (n/\gamma )-n\).

    Note that if \(n/\gamma \) is an integer, then \(q''\) is \(n/\gamma \)-grained. Furthermore, if \(m=n/\gamma \), which happens if and only if \(q'(i)=m_i\cdot \gamma /n\) for every \(i\in [n]\) (i.e., \(q'\) is itself \(n/\gamma \)-grained), then \(q''\) has support [n], and otherwise it has support \([n+1]\).
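The two filters above can be sketched in Python as follows, assuming q is given as exact rationals over \([n]=\{1,\ldots ,n\}\). The function and variable names are ours; the returned \(q''\) is the grained target distribution:

```python
import math
import random
from fractions import Fraction

def make_filters(q, gamma=Fraction(1, 6)):
    """Sketch of the two filters of this subsection.  q maps [n] to exact
    Fractions; gamma is the graining constant (1/6 in the sequel).
    F1 realizes Eq. (2): average the input with the uniform distribution.
    F2 realizes Eq. (3): keep i with probability (m_i * gamma/n) / q'(i),
    and otherwise output n+1.  Also returned is q'' = F2(q')."""
    n = len(q)
    qprime = {i: Fraction(q[i]) / 2 + Fraction(1, 2 * n) for i in q}
    sizes = {i: math.floor(qprime[i] * n / gamma) for i in q}   # m_i

    def F1(i):
        return i if random.random() < 0.5 else random.randint(1, n)

    def F2(i):
        keep = (sizes[i] * gamma / n) / qprime[i]   # lies in (1-2*gamma, 1]
        return i if random.random() < keep else n + 1

    qpp = {i: sizes[i] * gamma / n for i in q}      # q''(i) = m_i * gamma/n
    qpp[n + 1] = 1 - sum(qpp.values())              # leftover mass, < gamma
    return F1, F2, qpp
```

For example, with \(\gamma =1/6\) and q = (1/5, 4/5) over [2], we get q' = (7/20, 13/20), hence \(m_1=4\) and \(m_2=7\), and q'' = (1/3, 7/12, 1/12), which is indeed 12-grained (i.e., 6n-grained).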

Combining these two filters, we obtain the desired reduction.

Algorithm 3.5

(reducing testing equality to a general distribution to testing equality to an O(n)-grained distribution): Let D be an arbitrary distribution with probability function \(q:[n]\rightarrow [0,1]\), and let T be an \(\epsilon '\)-tester for equality to m-grained distributions having sample complexity \(s(m,\epsilon ')\). On input \((n,\epsilon ;i_1,...,i_s)\), where \(i_1,...,i_s\in [n]\) are \(s=s(O(n),\epsilon /3)\) samples drawn according to an unknown distribution p, the tester proceeds as follows:

  1.

    It produces an s-long sequence \((i''_1,...,i''_s)\) by applying \(F''_{F'{({q})}}\circ F'\) to \((i_1,...,i_s)\), where \(F'\) and \(F''_{q'}\) are as in Eqs. (2) and (3); that is, for every \(k\in [s]\), it produces \(i'_k\leftarrow F'(i_k)\) and \(i''_k\leftarrow F''_{F'{({q})}}(i'_k)\).

    (Recall that \(F''_{q'}\) depends on a universal constant \(\gamma \), which we shall set to 1/6.)

  2.

    It invokes the \(\epsilon /3\)-tester T for equality to \(q''\) while providing it with the sequence \((i''_1,...,i''_s)\). Note that this is a sequence over \([n+1]\).

Using the notations as in Eqs. (2) and (3), we first observe that the total variation distance between \(p'=F'{({p})}\) and \(q'=F'{({q})}\) is half the total variation distance between p and q (since \(p'(i)=0.5\cdot p(i)\,+\,(1/2n)\) and ditto for \(q'\)). Next, we observe that the total variation distance between \(p''=F''_{q'}{({p'})}\) and \(q''=F''_{q'}{({q'})}\) is lower-bounded by a constant fraction of the total variation distance between \(p'\) and \(q'\). To see this, let X and Y be distributed according to \(p'\) and \(q'\), respectively, and observe that

$$\begin{aligned} \sum _{i\in [n]}\Bigl |\mathbf{Pr}[F''_{q'}(X)=i]-\mathbf{Pr}[F''_{q'}(Y)=i] \Bigr |&= \sum _{i\in [n]} \left| p'(i)\cdot \frac{m_i\gamma /n}{q'(i)} -q'(i)\cdot \frac{m_i\gamma /n}{q'(i)} \right| \\&= \sum _{i\in [n]} \frac{m_i\gamma /n}{q'(i)}\cdot \left| p'(i)-q'(i)\right| \\&\ge \min _{i\in [n]}\left\{ \frac{m_i\gamma /n}{q'(i)}\right\} \cdot \sum _{i\in [n]}\left| p'(i)-q'(i)\right| . \end{aligned}$$

As stated above, recalling that \(q'(i)\ge 1/2n\) and \(m_i={\lfloor (n/\gamma )\cdot q'(i)\rfloor }>(n/\gamma )\cdot q'(i)-1\), it follows that

$$\frac{m_i\gamma /n}{q'(i)} > \frac{((n/\gamma )\cdot q'(i)-1)\cdot \gamma /n}{q'(i)} = 1-\frac{\gamma /n}{q'(i)} \ge 1-\frac{\gamma /n}{1/2n} = 1-2\gamma .$$

Hence, if p is \(\epsilon \)-far from q, then \(p'\) is \(\epsilon /2\)-far from \(q'\), and \(p''\) is \(\epsilon /3\)-far from \(q''\), where we use \(\gamma \le 1/6\). On the other hand, if \(p=q\), then \(p''=q''\). Noting that \(q''\) is an \(n/\gamma \)-grained distribution, provided that \(n/\gamma \) is an integer (as is the case for \(\gamma =1/6\)), we complete the analysis of the reduction. Hence,

Proposition 3.6

(Algorithm 3.5 as a reduction): The filter \(F''_{F'{({q})}}\circ F'\) used in Algorithm 3.5 reduces \(\epsilon \)-testing equality to any fixed distribution D (over [n]) to \(\epsilon /3\)-testing equality to a 6n-grained distribution over \([n']\), where \(n'\in \{n,n+1\}\) depends on q.Footnote 7 Furthermore, the support of \(F''_{F'{({q})}}\circ F'{({q})}\) equals \([n']\).

Hence, the sample complexity of \(\epsilon \)-testing equality to arbitrary distributions over [n] equals the sample complexity of \(\epsilon /3\)-testing equality to 6n-grained distributions (which is essentially a special case).

Digest. One difference between the filter underlying Algorithm 3.2 and the one underlying Algorithm 3.5 is that the former preserves the exact distance between distributions, whereas the latter preserves distances only up to a constant factor. The difference is reflected in the fact that the first filter maps the different i’s to distributions of disjoint support, whereas the second filter (which is composed of the filters of Eqs. (2) and (3)) maps different i’s to distributions of non-disjoint support. (Specifically, the filter of Eq. (2) maps every \(i\in [n]\) to a distribution that assigns each \(i'\in [n]\) probability at least 1/2n, whereas the filter of Eq. (3) typically maps each \(i\in [n]\) to a distribution with a support that contains the element \(n+1\).)

3.4 From Arbitrary Distributions to the Uniform One

Combining the reductions stated in Propositions 3.3 and 3.6, we obtain a proof of Theorem 1.1.

Theorem 3.7

(Theorem 1.1, restated): For every probability function \(q:[n]\rightarrow [0,1]\) the filter \(F_{q''}\circ F''_{F'{({q})}}\circ F'\), where \(q''=F''_{F'{({q})}}\circ F'{({q})}\) is as in Algorithm 3.5 and \(F_{q''}\) is as in Algorithm 3.2, reduces \(\epsilon \)-testing equality to q to \(\epsilon /3\)-testing equality to the uniform distribution over [6n].

Proof:

First, setting \(\gamma =1/6\) and using the filter \(F''_{F'{({q})}}\circ F'\), we reduce the problem of \(\epsilon \)-testing equality to q to the problem of \(\epsilon /3\)-testing equality to the 6n-grained distribution \(q''\), while noting that the distribution \(q''\) has support \([n']\), where \(n'\in \{n,n+1\}\) (depending on q). Note that the latter assertion relies on the furthermore part of Proposition 3.6. Next, using the furthermore part of Proposition 3.3, we note that \(F_{q''}\) reduces \(\epsilon /3\)-testing equality to \(q''\) to \(\epsilon /3\)-testing equality to the uniform distribution over [6n].    \(\blacksquare \)

Observe that the proof of Theorem 3.7 avoids the problem discussed right after the presentation of Algorithm 3.2, which refers to the fact that testing equality to an m-grained distribution over [n] is reduced to testing whether distributions over \([m']\) are uniform over [m], where in some cases \(m'\in [m,m\,+\,n]\) rather than \(m'=m\). These bad cases arise when the support of the m-grained distribution is a strict subset of [n] (see Footnote 4), and they were avoided above because we applied the filter of Algorithm 3.2 to distributions \(q'':[n']\rightarrow [0,1]\) that have support \([n']\). Nevertheless, it is nice to have a reduction from the general case of “testing uniformity” to the special case, where the general case refers to testing whether distributions over [n] are uniform over [m], for any n and m, and the special case mandates that \(m=n\). Such a reduction is provided next.

Theorem 3.8

(testing uniform distributions, a reduction between two versions): There exists a simple filter that maps \(U_m\) to \(U_{2m}\), while mapping any distribution X that is \(\epsilon \)-far from \(U_m\) to a distribution over [2m] that is \(\epsilon /2\)-far from \(U_{2m}\). We stress that X is not necessarily distributed over [m] and remind the reader that \(U_n\) denotes the uniform distribution over [n].

Thus, for every n and m, this filter reduces \(\epsilon \)-testing whether distributions over [n] are uniform over [m] to \(\epsilon /2\)-testing whether distributions over [2m] are uniform over [2m].

Proof:

The filter, denoted F, maps \(i\in [m]\) uniformly at random to an element in \(\{i,m+i\}\), while mapping any \(i\not \in [m]\) uniformly at random to an element in [m]. Observe that any distribution over [n] is mapped to a distribution over [2m], and that \(F(U_m)\equiv U_{2m}\). Note that F does not necessarily preserve distances between arbitrary distributions over [n] (e.g., both the uniform distribution over [2m] and the uniform distribution over \([m]\cup [2m+1,3m]\) are mapped to the same distribution), but (as shown next) F preserves distances to the relevant uniform distributions up to a constant factor. Specifically, note that

$$\begin{aligned} \sum _{i\in [m+1,2m]}\Bigl |\mathbf{Pr}[F(X)\!=\!i]-\mathbf{Pr}[U_{2m}\!=\!i]\Bigr |= & {} \sum _{i\in [m]}\left| \mathbf{Pr}[X\!=\!i]\cdot \frac{1}{2}-\frac{1}{2m}\right| \\= & {} \frac{1}{2}\cdot \sum _{i\in [m]}\Bigl |\mathbf{Pr}[X\!=\!i]-\mathbf{Pr}[U_{m}\!=\!i]\Bigr | \end{aligned}$$

and

$$\begin{aligned} \sum _{i\in [m]}\Bigl |\mathbf{Pr}[F(X)\!=\!i]-\mathbf{Pr}[U_{2m}\!=\!i]\Bigr |\ge & {} \mathbf{Pr}\left[ F(X)\in [m]\right] -\mathbf{Pr}\left[ U_{2m}\in [m]\right] \\= & {} \left( \mathbf{Pr}[X\in [m]]\cdot \frac{1}{2}+\mathbf{Pr}[X\not \in [m]]\right) -\frac{1}{2} \\= & {} \frac{1}{2}\cdot \mathbf{Pr}[X\not \in [m]] \\= & {} \frac{1}{2}\cdot \sum _{i\not \in [m]} \Bigl |\mathbf{Pr}[X\!=\!i]-\mathbf{Pr}[U_{m}\!=\!i]\Bigr |. \end{aligned}$$

Hence, the total variation distance between F(X) and \(U_{2m}\) is at least half the total variation distance between X and \(U_{m}\).    \(\blacksquare \)
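To make the foregoing concrete, here is a minimal Python sketch (the names `push_forward` and `tv` are ours, not the paper's) that computes the exact pushforward of a distribution under this filter, rather than sampling from it. It lets one check on small examples that \(U_m\) is mapped to \(U_{2m}\) and that the distance to the relevant uniform distribution shrinks by at most a factor of two.

```python
def push_forward(p, m):
    """Exact distribution of F(X) over [2m], given the distribution p of X
    (a dict mapping elements of [n] to probabilities).  F maps i in [m]
    uniformly to {i, m+i}, and i outside [m] uniformly into [m]."""
    q = {j: 0.0 for j in range(1, 2 * m + 1)}
    for i, pi in p.items():
        if 1 <= i <= m:
            q[i] += pi / 2          # stay at i with probability 1/2
            q[m + i] += pi / 2      # move to m+i with probability 1/2
        else:
            for j in range(1, m + 1):
                q[j] += pi / m      # spread uniformly over [m]
    return q

def tv(p, q, support):
    """Total variation (statistical) distance between p and q."""
    return 0.5 * sum(abs(p.get(i, 0.0) - q.get(i, 0.0)) for i in support)
```

For instance, a point mass on an element outside [m], which is at distance 1 from \(U_m\), is pushed to the uniform distribution over [m], which is at distance exactly 1/2 from \(U_{2m}\); this matches the factor-of-two loss in the theorem.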

4 On Testing Whether a Distribution Is Grained

A natural question that arises from the interest in grained distributions refers to the complexity of testing whether an unknown distribution is grained. Specifically, given n and m (and a proximity parameter \(\epsilon \)), how many samples are required in order to determine whether an unknown distribution X over [n] is m-grained or \(\epsilon \)-far from any m-grained distribution?
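Recall that a distribution is m-grained if each probability it assigns is an integer multiple of 1/m. The following Python sketch checks this condition and computes the statistical distance to the nearest m-grained distribution by largest-remainder rounding; the rounding scheme and the function names are our own illustration, not part of the paper.

```python
from math import floor

def is_m_grained(p, m, tol=1e-9):
    """p (a list of probabilities) is m-grained iff every probability
    is an integer multiple of 1/m (up to floating-point tolerance)."""
    return all(abs(pi * m - round(pi * m)) <= tol for pi in p)

def dist_to_m_grained(p, m):
    """Statistical distance from p to the nearest m-grained distribution:
    floor every m*p(i), then hand the leftover units of 1/m to the
    entries with the largest remainders (largest-remainder rounding)."""
    scaled = [pi * m for pi in p]
    base = [floor(s) for s in scaled]
    leftover = m - sum(base)  # units of 1/m still to distribute
    by_remainder = sorted(range(len(p)),
                          key=lambda i: scaled[i] - base[i], reverse=True)
    for i in by_remainder[:leftover]:
        base[i] += 1
    return 0.5 * sum(abs(pi - ci / m) for pi, ci in zip(p, base))
```

For example, the distribution (0.4, 0.6) is 5-grained but not 2-grained, and its distance to the nearest 2-grained distribution, namely (0.5, 0.5), is 0.1.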

This question can be partially answered by invoking the results of Valiant and Valiant [16]. Specifically, for an upper bound we use their “learning up to relabelling” algorithm, which may be viewed as a learner of histograms (which is what it actually does). Recall that the histogram of the probability function p is defined as the multiset \(\{p(i):i\in [n]\}\) (equiv., as the set of pairs \(\{(v,m):m=|\{i\!\in \![n]:p(i)\!=\!v\}|>0\}\)).
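The histogram just defined is straightforward to compute; the following tiny Python illustration (our own, not from [16]) uses the second, equivalent formulation as a set of (value, multiplicity) pairs.

```python
from collections import Counter

def histogram(p):
    """Histogram of a probability function p, given as a list of its
    values: the pairs (v, multiplicity) for each value v occurring in p,
    returned here as a sorted list for a canonical representation."""
    return sorted(Counter(p).items())
```

Being a function of the multiset of values only, the histogram is invariant under relabelling: permuting the entries of p leaves it unchanged, which is exactly why it suffices for testing label-invariant properties.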

Theorem 4.1

(learning the histogram [16, Thm. 1]):Footnote 8 There exists an \(O(\epsilon ^{-2}\cdot n/\log n)\) time algorithm that, on input \(n,\epsilon \) and \(O(\epsilon ^{-2}\cdot n/\log n)\) samples drawn from an unknown distribution \(p:[n]\rightarrow [0,1]\), outputs, with probability \(1-1/\mathrm{poly}(n)\), a histogram of a distribution that is \(\epsilon \)-close to p.

The implication of this result for testing any label-invariant property of distributions is immediate, where a property of distributions \(\mathcal D\) is called label-invariant if for every distribution \(p:[n]\rightarrow [0,1]\) in \(\mathcal D\) and every permutation \(\pi :[n]\rightarrow [n]\) it holds that \(p\circ \pi \) is in \(\mathcal D\). In our case, the tester consists of employing the algorithm of Theorem 4.1 with proximity parameter \(\epsilon /2\) and accepting if and only if the output fits a histogram of a distribution that is \(\epsilon /2\)-close to being m-grained. The same holds with respect to estimating the distance from the set of m-grained distributions (which can be captured as a special case of label-invariant properties). Hence, we get

Corollary 4.2

(testing whether a distribution is grained): For every \(n,m\in \mathbb N\), the set of m-grained distributions over [n] has a tester of sample complexity \(O(\epsilon ^{-2}\cdot n/\log n)\). Furthermore, the distance of an unknown distribution to the set of m-grained distributions over [n] can be approximated up to an additive error of \(\epsilon \) using the same number of samples.

We comment that it seems that, using the techniques of [16], one can reduce the complexity to \(O(\epsilon ^{-2}\cdot n'/\log n')\), where \(n'=\min (n,m)\). (For the case of testing, this is shown in the Appendix, using a reduction.) On the other hand, for \(m\in [\varOmega (n),O(n)]\), the above distance approximator is optimal, whereas it makes no sense to consider \(m>n/\epsilon \) (since any distribution over [n] is \(\epsilon \)-close to being \(n/\epsilon \)-grained). The negative result follows from the corresponding result of Valiant and Valiant [16].

Theorem 4.3

(optimality of Theorem 4.1 [16, Thm. 2]):Footnote 9 For every sufficiently small \(\epsilon >0\), there exist two distributions \(p_1,p_2:[n]\rightarrow [0,1]\) that are indistinguishable by any label-invariant algorithm that takes \(O(\epsilon ^{-1}n/\log n)\) samples although \(p_1\) is \(\epsilon \)-close to the uniform distribution over [n] and \(p_2\) is \(\epsilon \)-close to the uniform distribution over some set of n/2 elements.

Let us spell out that, in the current context, an algorithm A is called label-invariant if for every permutation \(\pi :[n]\rightarrow [n]\) and every sample \(i_1,...,i_s\), it holds that \(A(n,\epsilon ;i_1,...,i_s)\equiv A(n,\epsilon ;\pi (i_1),...,\pi (i_s))\). Indeed, when estimating the distance to a label-invariant property, we may assume (w.l.o.g.) that the algorithm is label-invariant. Combining Theorem 4.3 with the latter fact, we get—

Corollary 4.4

(optimality of Corollary 4.2): For any \(m\in [\varOmega (n),O(n)]\), estimating the distance to the set of m-grained distributions over [n] up to a sufficiently small additive constant requires \(\varOmega (n/\log n)\) samples.

Similarly, tolerant testing (cf. [15]) in the sense of distinguishing distributions that are \(\epsilon _1\)-close to being m-grained from distributions that are \(\epsilon _2\)-far from being m-grained requires \(\varOmega (n/\log n)\) samples, for any constant \(\epsilon _2\in (0,1/(2\cdot {\lfloor 2m/n\rfloor }))\) and \(\epsilon _1\in (0,\epsilon _2)\).

Proof:

The case of \(m=n/2\) follows by invoking Theorem 4.3, while observing that \(p_2\) is \(\epsilon \)-close to being m-grained, whereas \(p_1\) is \(\epsilon \)-close to a distribution (i.e., \(U_{2m}\)) that is \((0.499-\epsilon )\)-far from being m-grained,Footnote 10 where \(p_1,p_2\) and \(\epsilon \) are as in Theorem 4.3. Hence, distinguishing the distributions \(p_2\) and \(p_1\) (in a label-invariant manner) is reducible to \((0.499-2\epsilon )\)-testing the set of distributions that are \(\epsilon \)-close to being m-grained, which implies that the latter task has sample complexity \(\varOmega (n/\log n)\).

The case of \(m<n/2\) is reduced to the case of \(m=n/2\), by resetting \(n\leftarrow 2m\). This yields a lower bound of \(\varOmega (m/\log m)\). Using the hypothesis \(m=\varOmega (n)\), we derive the desired lower bound of \(\varOmega (n/\log n)\).

For \(m>n/2\) (equiv., \(n<2m\)), we show a reduction of the distinguishing task underlying Theorem 4.3 to the testing problem at hand. Specifically, let \(t={\lceil 2m/n\rceil }\) and \(m'\,{\mathop {=}\limits ^\mathrm{def}}\,{\lfloor m/t\rfloor }\), and note that \(t\in [2,O(1)]\) and \(m'\in [\varOmega (n),n/2]\) (by the various hypotheses).Footnote 11 We assume for simplicity that \(m'=m/t\) (equiv., t divides m).Footnote 12 Now, consider a randomized filter, denoted \(F_{m,t}:[n]\rightarrow [n]\), that maps each \(i\in [m']\) to \(m'+i\) with probability 1/t and otherwise maps it to itself, but always maps \(i\in [n]\setminus [m']\) to \(i-m'\). Then:

  • \(F_{m,t}\) maps the uniform distribution over \([m']\) to an m-grained distribution, since \(q'_2(j)\,{\mathop {=}\limits ^\mathrm{def}}\,\mathbf{Pr}[F_{m,t}(U_{m'})\!=\!j]\) equals \(\mathbf{Pr}[U_{m'}\!=\!j]\cdot \frac{t\,-\,1}{t} = \frac{1}{m'}\cdot \frac{t\,-\,1}{t} = \frac{t\,-\,1}{m}\) if \(j\in [m']\) and equals \(\mathbf{Pr}[U_{m'}\!=\!j\,-\,m']\cdot \frac{1}{t}=\frac{1}{m}\) if \(j\in [m'+1,2m']\).

  • \(F_{m,t}\) maps the uniform distribution over \([2m']\) to a distribution that is (0.999/2t)-far from being m-grained, since \(q'_1(j)\,{\mathop {=}\limits ^\mathrm{def}}\,\mathbf{Pr}[F_{m,t}(U_{2m'})\!=\!j]\) equals \(\mathbf{Pr}[U_{2m'}\!=\!j+m']+\mathbf{Pr}[U_{2m'}\!=\!j]\cdot \frac{t\,-\,1}{t} =\frac{t\,+\,t\,-\,1}{2m't} =\frac{2t\,-\,1}{2m}\) if \(j\in [m']\) and equals \(\mathbf{Pr}[U_{2m'}\!=\!j\,-\,m']\cdot \frac{1}{t}=\frac{1}{2m}\) if \(j\in [m'+1,2m']\).

Applying the filter \(F_{m,t}\) to the distributions \(p_1\) and \(p_2\) of Theorem 4.3 (while setting \(n=2m'=2\cdot m/t\)), we obtain distributions \(p'_2\) and \(p'_1\) such that \(p'_2\) is \(\epsilon \)-close to \(q'_2\), which is m-grained, whereas \(p'_1\) is \(\epsilon \)-close to \(q'_1\), which is (0.999/2t)-far from being m-grained, since filters can only decrease the distance between distributions. Hence, distinguishing the distributions \(p_2\) and \(p_1\) (over [2m/t]) is reducible to \(((0.999/2t)-2\epsilon )\)-testing the set of distributions that are \(\epsilon \)-close to being m-grained, which implies that the latter task has sample complexity \(\varOmega ((2m/t)/\log (2m/t))\). (The claim follows by recalling that \(1/t=\varOmega (1)\), since \(m=O(n)\).)    \(\blacksquare \)
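The computations in the two bullets can be replayed exactly in code. Below is a small Python sketch of the pushforward under \(F_{m,t}\) (the function name is ours); on the toy setting \(m'=3\), \(t=2\) (so \(m=6\)), it confirms that \(U_{m'}\) is pushed to an m-grained distribution while \(U_{2m'}\) is pushed to the distribution with values \((2t-1)/2m\) and 1/2m.

```python
def push_forward_Fmt(p, m_prime, t):
    """Exact pushforward of p (a dict over [2*m_prime]) under the filter
    F_{m,t} of the proof: i in [m'] moves to m'+i with probability 1/t
    and stays put otherwise, while i outside [m'] always moves to i-m'."""
    q = {j: 0.0 for j in range(1, 2 * m_prime + 1)}
    for i, pi in p.items():
        if 1 <= i <= m_prime:
            q[i] += pi * (t - 1) / t        # stay with probability (t-1)/t
            q[m_prime + i] += pi / t        # shift with probability 1/t
        else:
            q[i - m_prime] += pi            # deterministic shift down
    return q
```

The exact-pushforward form is convenient here precisely because the proof argues about the images of distributions, not about individual samples.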

Open Problems. Note that Corollary 4.4 does not refer to testing, but rather to distance approximation, and there are natural cases in which the complexity of testing a property of distributions is significantly lower than the corresponding distance approximation task (cf. [12] versus [16]). Hence, we ask—

Open Problem 4.5

(the sample complexity of testing whether a distribution is grained): For any m and n, what is the sample complexity of testing the property that consists of all m-grained distributions over [n]?

This question can be generalized to properties that allow m to reside in some predetermined set M, where the most natural case is that M is an interval, say of the form \([m',2m']\).

Open Problem 4.6

(Problem 4.5, generalized): For any finite set \(M\subset \mathbb N\) and \(n\in \mathbb N\), what is the sample complexity of testing the property that consists of all distributions over [n] that are each m-grained for some \(m\in M\)?