We introduce a new aspect of the influence of information-theoretic methods on statistical theory. We study procedures for identifying the probability distributions of K(≥ 1) random objects, each having one distribution from a known set of M(≥ 2) distributions. N-sequences of discrete independent RV’s represent the results of N observations for each of the K objects. On the basis of such samples, decisions must be made concerning the probability distributions of the objects. For N → ∞ the exponential decrease of the test’s error probabilities is considered. The reliability matrices of logarithmically asymptotically optimal procedures are investigated for some models and formulations of the identification problems. We investigate the optimal subsets of reliabilities whose values may be given beforehand, and the conditions guaranteeing positivity of all the reliabilities.

“In statistical literature such a problem is referred to as one of classification or discrimination, but identification seems to be more appropriate”

Radhakrishna Rao [27].

1 Problem Statement

Let \({\mathbf {X}}_k=(X_{k,n}, \ n\in \left [ N \right ]), \ k\in \left [ K \right ]\), be K(≥ 1) sequences of N discrete i.i.d. RV’s representing possible results of N observations, respectively, for each of K randomly functioning objects.

For \(k \in \left [ K \right ]\), \(n \in \left [ N \right ]\), \(X_{k,n}\) assumes values \(x_{k,n}\) in the finite set \({\mathcal {X}}\) of cardinality \(|\mathcal {X}|\). Let \(\mathcal {P}(\mathcal {X})\) be the space of all possible distributions on \({\mathcal {X}}\). There are M(≥ 2) probability distributions \(G_1, \dots, G_M\) from \(\mathcal {P}(\mathcal {X})\) under inspection, some of which are assigned to the vectors \({\mathbf{X}}_1, \dots, {\mathbf{X}}_K\). This assignment is unknown and must be determined on the basis of N-samples (results of N independent observations) \({\mathbf{x}}_k = (x_{k,1}, \dots, x_{k,N})\), where \(x_{k,n}\) is the result of the nth observation of the kth object.

When M = K and all objects are different (no two objects can have the same distribution), there are K! possible decisions. When the objects are independent, there are \(M^K\) possible combinations.

Bechhofer, Kiefer, and Sobel presented investigations on sequential multiple-decision procedures in [7]. This book is concerned principally with a particular class of problems referred to as ranking problems.

Chapter “Models with Prior Knowledge at the Sender” of the book by Ahlswede and Wegener [5] is devoted to statistical identification and ranking problems.

We study models considered in [7] and [5] and variations of these models inspired by the pioneering papers by Ahlswede and Dueck [4] (see chapter “Identification via Channels” in Part I) and by Ahlswede [1], applying the concept of optimality developed in [9, 16, 22, 23, 24, 28] for the models with K = 1.

Consider the following family of error probabilities of a test

$$\displaystyle \begin{aligned} \alpha^{(N)}_{m_1, m_2,\dots, m_K|l_1, l_2, \dots, l_K}, \ (m_1, m_2,\dots, m_K)\neq (l_1, l_2, \dots, l_K),\ m_k, l_k\in \left[ M \right], \ k\in [K], \end{aligned}$$

which are the probabilities of making the decisions \(l_1, l_2, \dots, l_K\) when the actual indices of the distributions of the objects were, respectively, \(m_1, m_2, \dots, m_K\).

The probabilities of rejecting all K hypotheses when they are true are the following:

$$\displaystyle \begin{aligned} \alpha^{(N)}_{m_1, m_2,\dots, m_K|m_1, m_2,\dots, m_K} =\sum\limits_{(l_1, l_2, \dots, l_K)\neq (m_1, m_2,\dots, m_K)} \alpha^{(N)}_{m_1, m_2,\dots, m_K|l_1, l_2, \dots, l_K}. \end{aligned}$$

We study the exponential decrease of the error probabilities as N → ∞ and define (using logarithms and exponents to base e)

$$\displaystyle \begin{aligned} \limsup_{{N\to\infty}} -\frac{1}{N} \log \alpha^{(N)}_{m_1, m_2,\dots, m_K|l_1, l_2, \dots, l_K} = E_{m_1, m_2,\dots, m_K|l_1, l_2, \dots, l_K}\geq0. \end{aligned} $$
(1)

These are the exponents of the error probabilities, which we call reliabilities (in association with Shannon’s reliability function [15]). We shall examine the matrix \(\mathbf {E}=\{E_{m_1, m_2,\dots , m_K|l_1, l_2, \dots , l_K}\}\) and call it the reliability matrix.

Our criterion of optimality is the following: given M, K, and the values of a part of the reliabilities, to obtain the best (the largest) values for the others. In addition it is necessary to describe the conditions under which all these reliabilities are positive. The procedure that realizes such testing is an identification, which, following Birgé [9], we call “logarithmically asymptotically optimal” (LAO).

Let \(N(x|\mathbf{x})\) be the number of repetitions of the element \(x\in \mathcal {X}\) in the vector \(\mathbf {x}\in \mathcal {X}^N\), and let

$$\displaystyle \begin{aligned} Q=\{Q(x)=N(x|\mathbf{x})/N,\ x \in\mathcal{X}\} \end{aligned}$$

be the distribution called “the empirical distribution” (ED) of the sample x in statistics, “the type” in information theory [14, 15], and “the composition” in the algebraic literature.

Denote the space of all empirical distributions for given N by \(\mathcal {P}^{(N)}(\mathcal {X})\) and by \(\mathcal {T}^{(N)}_{Q}\) the set of all vectors of the ED \(Q\in \mathcal {P}^{(N)}(\mathcal {X})\).

Consider for \(k\in \left [ K \right ]\), \(m\in \left [ M \right ]\), relative entropies

$$\displaystyle \begin{aligned} D(Q_k||G_m)=\sum\limits_{x\in\mathcal{X}}Q_k(x)\log {\frac{Q_k(x)}{G_m(x)}}, \end{aligned}$$

and entropies

$$\displaystyle \begin{aligned} H(Q_k)=-\sum\limits_{x \in\mathcal{X}}Q_k(x)\log Q_k(x). \end{aligned}$$

We shall use the following relations for the probability of the vector x when G m is the distribution of the object:

$$\displaystyle \begin{aligned} G_m^{(N)}(\mathbf{x})=\prod\limits_{n=1}^ {N}G_m(x_n)=\exp\{-N[D(Q||G_m)+H(Q)]\}. \end{aligned}$$

For \(m_k\in \left [ M \right ]\), \(k\in \left [ K \right ]\), when the objects are independent and \(G_{m_k}\) is the distribution of the kth object:

$$\displaystyle \begin{aligned} P_{m_1, m_2, \dots, m_K}^{(N)}({\mathbf{x}}_1, {\mathbf{x}}_2, \dots, {\mathbf{x}}_K )=\exp\Big\{-N\sum_{k=1}^{K}[D(Q_k||G_{m_k})+H(Q_k)]\Big\}. \end{aligned} $$
(2)

The equalities follow from the independence of the N observations of the K objects and from the definitions of relative entropy and entropy. It should be noted that the equality (2) remains valid even when its left-hand side equals 0; in that case, for one of the \({\mathbf{x}}_k\), the distribution \(Q_k\) is not absolutely continuous relative to \(G_{m_k}\) and \(D(Q_k||G_{m_k})=\infty\).

Our arguments will be based on the following fact: the maximum-likelihood test accepts as its solution the values \(m_1, m_2, \dots, m_K\) which maximize the probability \(P_{m_1,m_2,\dots ,m_K}^{(N)}({\mathbf {x}}_1, {\mathbf {x}}_2,\dots , {\mathbf {x}}_K)\); but from (2) we see that the same solution can be obtained by minimizing the sum \(\sum \limits _{k=1}^{K}[D(Q_k||G_{m_k})+H(Q_k)]\). That is, comparing the EDs of the observed vectors with their hypothetical distributions by means of relative entropies may be helpful.
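To make this concrete, the following minimal sketch (in Python; the helper names are ours and purely illustrative) implements the maximum-likelihood decision through empirical distributions: for each object it computes the type Q of the sample and selects the hypothesis minimizing \(D(Q||G_m)\); the entropy terms \(H(Q_k)\) do not depend on the hypotheses and drop out of the minimization.

```python
import numpy as np

def empirical_distribution(sample, alphabet_size):
    """Type (ED) of a sample: Q(x) = N(x|sample) / N."""
    counts = np.bincount(sample, minlength=alphabet_size)
    return counts / len(sample)

def kl_divergence(q, g):
    """D(Q||G), with the conventions 0*log(0/g) = 0 and q*log(q/0) = infinity."""
    mask = q > 0
    if np.any(g[mask] == 0):
        return np.inf
    return float(np.sum(q[mask] * np.log(q[mask] / g[mask])))

def ml_test(samples, hypotheses):
    """Maximum-likelihood decision via (2): for each object's sample,
    pick the index m minimizing D(Q||G_m)."""
    decisions = []
    for sample in samples:
        q = empirical_distribution(sample, hypotheses.shape[1])
        decisions.append(int(np.argmin([kl_divergence(q, g) for g in hypotheses])))
    return decisions

# Example: two objects observed N = 100 times, two hypotheses on {0, 1}.
G = np.array([[0.1, 0.9], [0.65, 0.35]])
rng = np.random.default_rng(0)
x1 = rng.choice(2, size=100, p=G[0])    # object 1 governed by G_1
x2 = rng.choice(2, size=100, p=G[1])    # object 2 governed by G_2
print(ml_test([x1, x2], G))             # typically [0, 1]
```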

In this lecture we consider the following models.

  1. K objects are different: they have different distributions among M ≥ K possibilities. For simplicity we restrict ourselves to the case K = 2, M = 2. This is the identification problem in the formulations of the books [7] and [5].

  2. K objects are independent, that is, some of them may have the same distribution. We consider an example with K = 2, M = 2. Surprisingly, this model has not been considered earlier in the literature.

  3. We investigate one object, K = 1, and M possible probability distributions. The question is whether the mth distribution occurred or not. This is the problem of identification of distributions in the spirit of chapter “Identification via Channels”.

  4. Ranking, or ordering, problem [1]. We have one vector of observations \({\mathbf{X}} = (X_1, X_2, \dots, X_N)\) and M hypothetical distributions. The receiver wants to know whether the index of the true distribution of the object is in \(\{1, 2, \dots, r\}\) or in \(\{r + 1, \dots, M\}\).

  5. r-identification of distribution [1]. Again K = 1. One wants to identify the observed object as a member either of the subset \(\mathcal {S}\) of \(\left [ M \right ]\) or of its complement, with r being the number of elements of \(\mathcal {S}\).

Section 2 of this lecture presents the necessary notions and results on hypothesis testing. The models of identification for independent objects are considered in Sect. 3 and for different objects in Sect. 4. Section 5 is devoted to the problem of identification of an object’s distribution, and Sect. 6 to the problems of r-identification and ranking. Some results are illustrated by numerical examples and graphs. Many directions of further research are indicated in the course of the text and in Sect. 7.

2 Background

The study of the interdependence of the exponential rates of decrease, as the sample size N goes to infinity, of the error probabilities \(\alpha _{1|2}^{(N)}\) of the “first kind” and \(\alpha _{2|1}^{(N)}\) of the “second kind” was started by the works of Hoeffding [23], Csiszár and Longo [16], Tusnády [28], Longo and Sgarro [24], and Birgé [9], and for multiple hypotheses by Haroutunian [22]. Similar problems for Markov dependence of experiments were investigated by Natarajan [26], Haroutunian [21], Gutman [18], and others. As Blahut remarked in his book [11], it is unfortunately confusing that the errors are called type I and type II while the hypotheses are subscripted 0 and 1. The word “type” is also used in another sense, to refer to the type of a measurement or the type of a vector. For this reason we do not use the names “0” and “1” for hypotheses or the name “type” for errors. Note that in [10, 11, 17] an application of the methods of hypothesis testing to problems of information theory proper is developed.

It will be very interesting to combine investigation of described models with the approach initiated by the paper of Ahlswede and Csiszár [3] and developed by many authors, particularly, for the exponentially decreasing error probabilities by Han and Kobayashi [20].

In [8] Berger formulated the problem of remote statistical inference. Zhang and Berger [29] studied a model of an estimation system with compressed information. Similar problems were examined by Ahlswede and Burnashev [2] and by Han and Amari [19]. In the paper of Ahlswede, Yang, and Zhang [6] identification in channels via compressed data was considered. Fu and Shen [17] studied hypothesis testing for an arbitrarily varying source.

Our further considerations will be based on the results of [22] on multiple hypothesis testing, so we now briefly present the corresponding formulations and proofs. In our terms it is the case of one object (K = 1) and M possible distributions (hypotheses) \(G_1, \dots, G_M\). A test φ(x) determines the distribution on the basis of the N-sample \(\mathbf{x} = (x_1, \dots, x_N)\). Since the experiments are independent, the probability of the sample x when the distribution is \(G_m\) is

$$\displaystyle \begin{aligned} G_m^{(N)}(\mathbf{x})= \prod\limits_{n=1}^ {N}G_m(x_n), \ m \in \left[ M \right]. \end{aligned}$$

We study the error probabilities \(\alpha _{m|l}^{(N)}\) for \( m, l\in \left [ M \right ]\). Here \(\alpha _{m|l}^{(N)}\) is the probability that the distribution \(G_l\) is accepted instead of the true distribution \(G_m\). For m = l, the probability to reject \(G_m\) when it is true is denoted by \(\alpha _{m|m}^{(N)}\); thus:

$$\displaystyle \begin{aligned} \alpha_{m|m}^{(N)}=\sum\limits_{l:l\neq m}\alpha_{m|l}^{(N)}. \end{aligned}$$

This probability is called [12] the test’s “error probability of the kind m”. The matrix \(\{\alpha _{m|l}^{(N)}\}\) is sometimes called the “power of the test” [12].

In this lecture we suppose that the list of possible hypotheses is complete. Remark that, as Rao noted [27], the case when the objects may also have distributions different from \(G_1, \dots, G_M\) is interesting too.

Let us analyze the reliability matrix

$$\displaystyle \begin{aligned} \mathbf{E}=\left ( \begin{array}{c} E_{1| 1}\,\, \ldots \, E_{1| l}\, \ldots \,\, E_{1| M}\\ \ldots \ldots \ldots \ldots \ldots \ldots \ldots \\ E_{m | 1} \ldots E_{m| l} \ldots E_{m | M}\\ \ldots \ldots \ldots \ldots \ldots \ldots \ldots \\ E_{M | 1} \ldots E_{M | l} \ldots E_{M | M} \end{array} \right ) \end{aligned}$$

with components

$$\displaystyle \begin{aligned} E_{m|l}=\limsup_{{N \to \infty}} -\frac{1}{N}\log {{\alpha}_{m|l}^{(N)}}, \ m, l\in \left[ M \right]. \end{aligned}$$

According to this definition and the definition of \(\alpha _{m|l}^{(N)}\) we can derive that

$$\displaystyle \begin{aligned} E_{m|m}=\min\limits_{l:m\neq l}E_{m|l}. \end{aligned} $$
(3)

Indeed,

$$\displaystyle \begin{aligned} E_{m|m} &=\limsup_{{N \to \infty}}-\frac{1}{N}\log {\sum_{l:m\neq l}\alpha_{m|l}^{(N)}} \\ &=\limsup_{{N \to \infty}}-\frac{1}{N}\log \max\limits_{l: m\neq l}\alpha_{m|l}^{(N)} + \limsup_{{N \to \infty}}-\frac{1}{N}\log \left[\left( \sum_{l:m\neq l}\alpha_{m|l}^{(N)}\right)/ \max\limits_{l: m\neq l} \alpha_{m|l}^{(N)}\right] \\ &=\min\limits_{l:m\neq l}E_{m|l}. \end{aligned} $$

The last equality is a consequence of the fact that for all m and N

$$\displaystyle \begin{aligned} 1\leq(\sum_{l:m\neq l}\alpha_{m|l}^{(N)})/ \max\limits_{l:m\neq l} \alpha_{m|l}^{(N)}\leq M-1. \end{aligned}$$

In the case M = 2, the reliability matrix is

$$\displaystyle \begin{aligned} \mathbf{E}= \left( \begin{array}{ccc} E_{1|1} & E_{1|2}\\ E_{2|1} & E_{2|2}\\ \end{array} \right) \end{aligned} $$
(4)

and it follows from (3) that there are only two different values of elements, namely

$$\displaystyle \begin{aligned} E_{1|1}=E_{1|2} \mbox{ and } E_{2|1}=E_{2|2}, \end{aligned} $$
(5)

so in this case the problem is to find the maximal possible value of one of them, given the value of the other.

In the case of M hypotheses, for given positive and finite \(E_{1|1}, E_{2|2}, \dots, E_{M-1|M-1}\), let us consider the regions of distributions

$$\displaystyle \begin{aligned} \mathcal{R}_l&=\{Q: D(Q||G_l)\leq E_{l|l}\}, \qquad l\in \left[ M-1 \right], \,{} \end{aligned} $$
(6)
$$\displaystyle \begin{aligned} \mathcal{R}_{M}&=\{Q: D(Q||G_l)>E_{l|l},\ l\in \left[ M-1 \right]\} =\mathcal{P}(\mathcal{X})-\bigcup\limits_{l=1}^{M-1}\mathcal{R}_l, {} \end{aligned} $$
(7)
$$\displaystyle \begin{aligned} \mathcal{R}_{l}^{(N)}&=\mathcal{R}_{l}\bigcap\mathcal{P}^{(N)}, \qquad l \in [M]. {} \end{aligned} $$
(8)

Let

$$\displaystyle \begin{aligned} E_{l|l}^{\ast}&=E_{l|l}^{\ast}(E_{l|l})=E_{l|l},\ l\in \left[ M-1 \right], {} \end{aligned} $$
(9)
$$\displaystyle \begin{aligned} E_{m|l}^{\ast} &=E_{m|l}^{\ast}(E_{l|l}) =\inf\limits_{Q\in\mathcal{R}_l}D(Q||G_m), \ m\in \left[ M \right],\ m\neq l,\ l\in \left[ M-1 \right], {} \end{aligned} $$
(10)
$$\displaystyle \begin{aligned} E_{m|M}^{\ast} &=E_{m|M}^{\ast}(E_{1|1},\dots,E_{M-1,M-1}) =\inf\limits_{Q\in\mathcal{R}_{M}}D(Q||G_{m}),\ m\in \left[ M-1 \right], {} \end{aligned} $$
(11)
$$\displaystyle \begin{aligned} E_{M|M}^{\ast} &=E_{M|M}^{\ast}(E_{1|1},\dots,E_{M-1,M-1}) =\min\limits_{l\in \left[ M-1 \right]}E_{M|l}^{\ast}. {} \end{aligned} $$
(12)

If some distribution \(G_m\) is not absolutely continuous relative to \(G_l\), the reliability \(E_{m|l}^{\ast }\) will equal infinity; this means that the corresponding \(\alpha _{m|l}^{(N)}=0\) for N large enough.

The principal result of [22] is:

Theorem 272

If all the distributions \(G_m\) are different and all elements of the matrix {D(G l||G m)}, \(l,m\in \left [ M \right ]\) , are positive but finite, then two statements hold:

  (i) when the positive numbers \(E_{1|1}, E_{2|2}, \dots, E_{M-1|M-1}\) satisfy the conditions

    $$\displaystyle \begin{aligned} E_{1|1}&<\min\limits_{l\in \left[ 2,M \right]}D(G_l||G_1), \\ &\ \ \vdots \\ E_{m|m}&<\min\Big[\min\limits_{l\in \left[ m-1 \right]}E_{m|l}^{\ast}(E_{l|l}),\ \min\limits_{l\in \left[ m+1,M \right]}D(G_l||G_{m})\Big], \quad m\in \left[ 2,M-1 \right], {} \end{aligned} $$
    (13)

    then there exists a LAO sequence of tests whose reliability matrix \(E^{\ast }=\{E_{m|l}^{\ast }\}\) is defined by (9)–(12), and all of its elements are positive;

  (ii) if even one of the conditions (13) is violated, then the reliability matrix of any test has at least one element equal to zero (that is, the corresponding error probability does not tend to zero exponentially).

The essence of the proof of Theorem 272 consists in the construction of the following optimal sequence of tests. Let the decision l be taken when x falls into the set

$$\displaystyle \begin{aligned} \mathcal{B}_{l}^{(N)}=\bigcup_{Q\in\mathcal{R}_{l}^{(N)}}\mathcal{T}_{Q}^{(N)}, \qquad l\in \left[ M \right],\ N=1,2,\dots. {}\end{aligned} $$
(14)

The non-coincidence of the distributions \(G_m\) and the conditions (13) guarantee that the sets in (14) are non-empty; they satisfy the conditions

$$\displaystyle \begin{aligned} \mathcal{B}_{l}^{(N)}\bigcap\mathcal{B}_{m}^{(N)}=\emptyset,\qquad l\neq m, \end{aligned} $$

and

$$\displaystyle \begin{aligned} \bigcup_{l=1}^{M}\mathcal{B}_l^{(N)}=\mathcal{X}^{N}, \end{aligned} $$

and so they define a sequence of tests, which proves to be LAO.
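As a sketch, the decision rule (14) can be implemented by comparing the type of the sample with the hypothetical distributions (our own naming, reusing empirical_distribution and kl_divergence from the sketch in Sect. 1):

```python
def lao_test(sample, hypotheses, thresholds):
    """Decision rule (14): accept hypothesis l (0-based) if the type Q of
    the sample lies in R_l, i.e. D(Q||G_l) <= E_{l|l}, for some
    l = 0, ..., M-2; otherwise accept the last hypothesis M.
    Under conditions (13) the acceptance regions are disjoint, so at
    most one l can fire."""
    q = empirical_distribution(sample, hypotheses.shape[1])
    for l, e_ll in enumerate(thresholds):   # thresholds = E_{1|1}, ..., E_{M-1|M-1}
        if kl_divergence(q, hypotheses[l]) <= e_ll:
            return l
    return len(hypotheses) - 1              # hypothesis M
```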

For the simplest particular case M = 2, the elements of the reliability matrix (4) satisfy the equalities (5), and for given \(E_{1|1}\), from (5) and (7) we obtain the value of \(E_{2|1}^{\ast }=E_{2|2}^{\ast }\):

$$\displaystyle \begin{aligned} E_{2|1}^{\ast}(E_{1|1})=\inf_{Q: D(Q||G_{1})\leq E_{1|1}} D(Q||G_{2}). {} \end{aligned} $$
(15)

Here, according to (13), we can take \(E_{1|1}\) from \((0, D(G_2||G_1))\), and \(E_{2|1}^{\ast }(E_{1|1})\) will range between \(D(G_1||G_2)\) and 0.
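On a binary alphabet the infimum in (15) reduces to a one-dimensional search over the simplex. A rough grid-search sketch (our own helper, reusing kl_divergence from the sketch in Sect. 1):

```python
import numpy as np

def e21_star(g1, g2, e11, grid=10001):
    """Grid approximation of (15) on a binary alphabet:
    E*_{2|1}(E_{1|1}) = inf { D(Q||G_2) : D(Q||G_1) <= E_{1|1} }."""
    q0 = np.linspace(0.0, 1.0, grid)
    Q = np.stack([q0, 1.0 - q0], axis=1)
    d1 = np.array([kl_divergence(q, np.asarray(g1)) for q in Q])
    d2 = np.array([kl_divergence(q, np.asarray(g2)) for q in Q])
    return d2[d1 <= e11].min()

# As E_{1|1} grows from 0 to D(G_2||G_1), E*_{2|1} falls from D(G_1||G_2) to 0.
g1, g2 = np.array([0.1, 0.9]), np.array([0.65, 0.35])
for e11 in (0.01, 0.2, 0.5):
    print(e11, e21_star(g1, g2, e11))
```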

3 Identification Problem for Model with Independent Objects

We begin with the study of the second model. To illustrate possible developments and essential features we consider the particular case K = 2, M = 2. It is clear that the case M = 1 is trivial. The reliability matrix is (see (1))

$$\displaystyle \begin{aligned} \mathbf{E}= \begin{pmatrix} E_{1,1|1,1} & E_{1,1|1,2} & E_{1,1|2,1} & E_{1,1|2,2} \\ E_{1,2|1,1} & E_{1,2|1,2} & E_{1,2|2,1} & E_{1,2|2,2} \\ E_{2,1|1,1} & E_{2,1|1,2} & E_{2,1|2,1} & E_{2,1|2,2} \\ E_{2,2|1,1} & E_{2,2|1,2} & E_{2,2|2,1} & E_{2,2|2,2} \end{pmatrix} . \end{aligned}$$

Let us denote by \(\alpha _{m_1|l_1}^{(N,1)}\), \(\alpha _{m_2|l_2}^{(N,2)}\) and \(E_{m_1|l_1}^{(1)}\), \(E_{m_2|l_2}^{(2)}\) the error probabilities and the reliabilities, as in (4), of the first and the second objects, respectively.

Lemma 273

If \(0< E_{1|1}^{(i)}< D(G_2||G_1)\) , i = 1, 2, then the following equalities hold true:

$$\displaystyle \begin{aligned} E_{m_1,m_2|l_1,l_2}&=E_{m_1|l_1}^{(1)}+E_{m_2|l_2}^{(2)}, &&\mathit{\text{if }}m_1 \neq l_1,\ m_2\neq l_2,{} \end{aligned} $$
(16)
$$\displaystyle \begin{aligned} E_{m_1,m_2|l_1,l_2}&=E_{m_i|l_i}^{(i)}, &&\mathit{\text{if }}m_{3-i}=l_{3-i},\ m_i\neq l_i,\ i=1,2,{} \end{aligned} $$
(17)

Proof

From the independence of the objects it follows that

$$\displaystyle \begin{aligned} \alpha_{m_1,m_2|l_1, l_2}^{(N)} &=\alpha_{m_1|l_1}^{(N,1)}\alpha_{m_2|l_2}^{(N,2)}, &&\text{if } m_1\neq l_1,\ m_2\neq l_2,{} \end{aligned} $$
(18)
$$\displaystyle \begin{aligned} \alpha_{m_1,m_2|l_1, l_2}^{(N)} &=\alpha_{m_i|l_i}^{(N, i)} (1-\alpha_{m_{3-i}|l_{3-i}}^{(N, 3-i)}), &&\text{if } m_{3-i}=l_{3-i},\ m_i\neq l_i,\ i=1,2, {} \end{aligned} $$
(19)

According to (1), from (18) we obtain (16); from (19) and the positivity of \(E_{1|1}^{(i)}\) and \(E_{2|2}^{(i)}\), i = 1, 2, (17) follows. □

Theorem 274

If the distributions \(G_1\) and \(G_2\) are different, and the strictly positive elements \(E_{1,1|1,2}\), \(E_{1,1|2,1}\) of the reliability matrix E are given and bounded above:

$$\displaystyle \begin{aligned} E_{1,1|1,2}<D(G_2||G_1),\mathit{\text{ and }} E_{1,1|2,1}<D(G_2||G_1), {} \end{aligned} $$
(20)

then the other elements of the matrix E are defined as follows:

$$\displaystyle \begin{aligned} E_{2,1|2,2}&=E_{1,1|1,2}, & E_{1,2|2,2}&=E_{1,1|2,1}, \\ E_{2,2|1,1}&=E_{1,2|1,1}+E_{2,1|1,1}, & E_{2,1|1,2}&=E_{2,1|1,1}+E_{1,2|2,2}, \\ E_{1,2|2,1}&=E_{1,2|1,1}+E_{1,2|2,2}, & E_{1,1|2,2}&=E_{1,1|1,2}+E_{1,1|2,1}, \end{aligned} $$
$$\displaystyle \begin{aligned} E_{1,2|1,1}&=E_{2,2|2,1}=\inf\limits_{Q: D(Q||G_1)\leq E_{1,1|1,2}}D(Q||G_2), \\ E_{2,1|1,1}&=E_{2,2|1,2}=\inf\limits_{Q: D(Q||G_1)\leq E_{1,1|2,1}}D(Q||G_2), {} \\ E_{m_1,m_2|m_1,m_2}&=\min\limits_{(l_1,l_2)\neq (m_1,m_2)}E_{m_1,m_2|l_1,l_2},\ m_1, m_2 = 1, 2. \end{aligned} $$
(21)

If one of the inequalities (20) is violated, then at least one element of the matrix E is equal to 0.

Proof

The last equalities in (21) follow, as (3) does, from the definition of

$$\displaystyle \begin{aligned} \alpha_{m_1,m_2|m_1,m_2}^{(N)} =\sum\limits_{(l_1,l_2)\neq (m_1,m_2)}\alpha_{m_1,m_2|l_1,l_2}^{(N)}, \qquad m_1,m_2=1,2. \end{aligned}$$

Let us consider the reliability matrices of each of the objects \({\mathbf{X}}_1\) and \({\mathbf{X}}_2\):

$$\displaystyle \begin{aligned} {\mathbf{E}}^{(1)}= \begin{pmatrix} E_{1|1}^{(1)}& E_{1|2}^{(1)} \\ E_{2|1}^{(1)} & E_{2|2}^{(1)} \end{pmatrix} \qquad \text{ and }\qquad {\mathbf{E}}^{(2)}= \begin{pmatrix} E_{1|1}^{(2)}& E_{1|2}^{(2)}\\ E_{2|1}^{(2)} & E_{2|2}^{(2)} \end{pmatrix} . \end{aligned}$$

From (5) we know that \(E_{1|1}^{(i)}= E_{1|2}^{(i)} \) and \(E_{2|1}^{(i)} =E_{2|2}^{(i)}\), i = 1, 2. From (20) it follows that \(0<E_{1|1}^{(1)}<D(G_2||G_1)\) and \(0< E_{1|1}^{(2)}<D(G_2||G_1)\). Indeed, if \(0 < E_{1,1|1,2} < D(G_2||G_1)\) but \(E_{1|1}^{(2)}\geq D(G_2||G_1)\), then from (19) and (1) we arrive at

$$\displaystyle \begin{aligned} \limsup_{{N\to\infty}}-\frac{1}{N}\log(1-\alpha^{(N,2)}_{1|2})<0, \end{aligned}$$

so there would exist an \(N_0\) such that, along a subsequence of N > N 0, \(1-\alpha ^{(N,2)}_{1|2}>1\). But this is impossible, because \(\alpha ^{(N,2)}_{1|2}\) is a probability and cannot be negative.

Using Lemma 273 we can deduce that the reliability matrix E can be obtained from matrices E (1) and E (2) as follows:

$$\displaystyle \begin{aligned} \mathbf{E}= \begin{pmatrix} \min(E_{1|2}^{(1)},E_{1|2}^{(2)}) & E_{1|2}^{(2)} & E_{1|2}^{(1)} &E_{1|2}^{(1)}+ E_{1|2}^{(2)} \\ E_{2|1}^{(2)} & \min( E_{1|2}^{(1)},E_{2|1}^{(2)} ) & E_{1|2}^{(1)}+E_{2|1}^{(2)}& E_{1|2}^{(1)} \\ E_{2|1}^{(1)} & E_{2|1}^{(1)}+ E_{1|2}^{(2)}& \min(E_{2|1}^{(1)},E_{1|2}^{(2)})& E_{1|2}^{(2)} \\ E_{2|1}^{(1)}+E_{2|1}^{(2)} & E_{2|1}^{(1)} &E_{2|1}^{(2)} & \min (E_{2|1}^{(1)}, E_{2|1}^{(2)}) \\ \end{pmatrix}, \end{aligned}$$

in other words, provided that the conditions (20) are fulfilled, we find that

$$\displaystyle \begin{aligned} E_{1,1|1,2}&=E_{1|2}^{(2)}=E_{1|1}^{(2)}, & E_{1,1|2,1}&=E_{1|2}^{(1)}=E_{1|1}^{(1)}, \\ E_{2,1|2,2}&=E_{1,1|1,2}=E_{1|2}^{(2)}, & E_{1,2|2,2}&=E_{1,1|2,1}=E_{1|2}^{(1)}, \\ E_{1,2|1,1}&=E_{2,2|2,1}=E_{2|1}^{(2)}, & E_{2,1|1,1}&=E_{2,2|1,2}=E_{2|1}^{(1)}, \\ E_{2,2|1,1}&=E_{2|1}^{(1)}+E_{2|1}^{(2)}, & E_{2,1|1,2}&=E_{2|1}^{(1)}+E_{1|2}^{(2)},{} \\ E_{1,2|2,1}&=E_{1|2}^{(1)}+E_{2|1}^{(2)}, & E_{1,1|2,2}&=E_{1|2}^{(1)}+E_{1|2}^{(2)}, \end{aligned} $$
(22)

and

$$\displaystyle \begin{aligned} E_{m_1,m_2|m_1,m_2}=\min \{E_{m_1|m_1}^{(1)},E_{m_2|m_2}^{(2)}\}, \qquad m_1, m_2 = 1, 2. \end{aligned}$$

From Theorem 272 we know that if \(E_{1|1}^{(i)}\in (0, D(G_2||G_1))\), i = 1, 2, then the tests of both objects are LAO and the elements \(E_{2|1}^{(i)}\), i = 1, 2, can be calculated (see (15)) by

$$\displaystyle \begin{aligned} E_{2|1}^{(i)}=\inf \limits_{Q:D(Q||G_1)\leq E_{1|1}^{(i)}}D(Q||G_2), \ \ \ \ i=1,2, {} \end{aligned} $$
(23)

and if \(E_{1|1}^{(i)}\geq D(G_2||G_1)\), then \(E_{2|1}^{(i)}=0\).

According to (22) and (23) we obtain that, when (20) holds, the elements of the matrix E are determined by the relations (21). When one of the inequalities (20) is violated, then from (23) and the first and third lines of (22) we see that some elements of the matrix E must equal 0 (namely, either \(E_{1,2|1,1}\) or \(E_{2,1|1,1}\), and others).

Now let us show that the compound test for two objects is LAO, that is, optimal. Suppose that for given \(E_{1,1|1,2}\) and \(E_{1,1|2,1}\) there exists a test with matrix \({\mathbf {E}}^{'}\) that has at least one element exceeding the respective element of the matrix E. Comparing the elements of the matrices E and \({\mathbf {E}}^{'}\) different from \(E_{1,1|1,2}\) and \(E_{1,1|2,1}\), from (22) we obtain that either \(E_{1,2|1,1}<E_{1,2|1,1}^{\prime }\) or \(E_{2,1|1,1}<E_{2,1|1,1}^{\prime }\), i.e. either \(E_{2|1}^{(2)} < E_{2|1}^{(2)'}\) or \(E_{2|1}^{(1)} <E_{2|1}^{(1)'}\). This contradicts the fact that LAO tests have been used for the objects \({\mathbf{X}}_1\) and \({\mathbf{X}}_2\).

When it is required that the reliabilities of the first and the second objects take the same values, \(E_{1|2}^{(1)}=E_{1|2}^{(2)}=a_1\) and, consequently, \(E_{2|1}^{(1)}=E_{2|1}^{(2)}=a_2\), the matrix E takes the following form:

$$\displaystyle \begin{aligned} \mathbf{E}= \begin{pmatrix} a_1 & a_1 & a_1 &2a_1 \\ a_2 & \min( a_1,a_2 ) & a_1+a_2& a_1\\ a_2& a_1+ a_2& \min(a_1,a_2)&a_1 \\ 2a_2 & a_2&a_2 & a_2 \\ \end{pmatrix}. \end{aligned} $$
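Since the passage from the single-object matrices \({\mathbf{E}}^{(1)}, {\mathbf{E}}^{(2)}\) to \(\mathbf{E}\) is mechanical, a short sketch (our own helper, not part of the original text) can serve as a numerical check of Lemma 273 and of the matrix above:

```python
import numpy as np

def compound_reliability_matrix(E1, E2):
    """Reliability matrix of two independent objects (Lemma 273).
    Rows/columns are indexed by the pairs (m1, m2), ordered
    (1,1), (1,2), (2,1), (2,2).  An off-diagonal entry adds the
    per-object reliabilities over the coordinates where the decision
    differs from the truth, as in (16)-(17); diagonal entries are the
    row minima, as in (3)."""
    pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    E = np.zeros((4, 4))
    for i, (m1, m2) in enumerate(pairs):
        for j, (l1, l2) in enumerate(pairs):
            if i != j:
                E[i, j] = (E1[m1, l1] if m1 != l1 else 0.0) \
                        + (E2[m2, l2] if m2 != l2 else 0.0)
        E[i, i] = min(E[i, j] for j in range(4) if j != i)
    return E

# With equal per-object reliabilities a1, a2 (structure (5)) this
# reproduces the matrix displayed above.
a1, a2 = 0.2, 0.35
E1 = E2 = np.array([[a1, a1], [a2, a2]])
print(compound_reliability_matrix(E1, E2))
```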

4 Identification Problem for Models with Different Objects

The K objects are not independent: they have different distributions, and so the number M of distributions is not less than K. This is the model studied in [7]. For brevity we consider the case K = 2, M = 2. The matrix of reliabilities is the following:

$$\displaystyle \begin{aligned}\mathbf{E}= \begin{pmatrix} E_{1,2|1,2} & E_{1,2|2,1} \\ E_{2,1|1,2} & E_{2,1|2,1} \end{pmatrix} . {} \end{aligned} $$
(24)

Since the objects are strictly dependent this matrix coincides with the reliability matrix of the first object (see (4))

$$\displaystyle \begin{aligned} \mathbf{E^{(1)}}= \begin{pmatrix} E_{1|1}^{(1)} & E_{1|2}^{(1)}\\ E_{2|1}^{(1)} & E_{2|2}^{(1)} \end{pmatrix} , \end{aligned}$$

because the distribution of the second object is uniquely defined by the distribution of the first one.

We can conclude that among the 4 elements of the reliability matrix of two dependent objects only 2 are distinct, and the second of them is determined by the given \(E_{1|1}^{(1)}=E_{1,2|1,2}\).

By symmetry, the matrix (24) may equally be determined by the reliability matrix of the second object.

5 Identification of the Probability Distribution of an Object

Suppose we have one object, K = 1, and M ≥ 2 known possible distributions. The question is whether the rth distribution occurred or not. There are two error probabilities for each \(r\in \left [ M \right ]\): the probability \({\alpha }_{m=r|l\neq r}^{(N)}\) of accepting an l different from r when r is true, and the probability \({\alpha }_{m\neq r|l =r}^{(N)}\) of accepting r when it is not correct.

The probability \({\alpha }_{m=r|l\neq r}^{(N)}\) is already known: it coincides with the probability \({\alpha }_{r|r}^{(N)}\), which equals \(\sum \limits _{l: l\neq r}{\alpha }_{r|l}^{(N)}\). The corresponding reliability \(E_{m=r|l\neq r}\) equals \(E_{r|r}\), which satisfies the equality (3).

We have to determine the dependence of \(E_{m\neq r|l=r}\) on the given \(E_{m=r|l\neq r} = E_{r|r}\), which can be assigned values satisfying the conditions (13); this time the conditions are:

$$\displaystyle \begin{aligned}0< E_{r|r}< \min\limits_{l: l\neq r}D(G_l\Vert G_r), \quad r\in \lbrack M \rbrack. \end{aligned} $$

We need the probabilities of the different hypotheses. Let us suppose that the hypotheses \(G_1, \dots, G_M\) have prior probabilities \({\Pr }(r), \ r\in [M]\). The only supposition we shall use is that \({\Pr }(r)>0, \ r\in [M]\). We will see that the result formulated in the following theorem does not depend on the values of \({\Pr }(r), \ r\in [M]\), provided they are all strictly positive.

Now we can reason as follows for each \(r\in \left [ M \right ]\):

$$\displaystyle \begin{aligned} {\alpha}_{m\neq r|l=r}^{(N)}=\frac{{\Pr}^{(N)}(m \neq r, l=r)}{{\Pr}(m \neq r)} =\frac{1}{\sum\limits_{m: m\neq r}{\Pr}(m)} \sum_{m: m \neq r}{\Pr}^{(N)} (m,r).\end{aligned} $$

From here we see that for r ∈ [M]

$$\displaystyle \begin{aligned} E_{m\neq r|l=r} &=\limsup_{{N\to \infty}}\left (-\frac{1}{N}\log {\alpha}_{m\neq r|l=r}^{(N)}\right ) \\ &=\limsup_{{N\to \infty}} \frac{1}{N}\left(\log \sum_{m: m\neq r}{\Pr}(m)-\log \sum_{m: m\neq r}{\alpha}_{m|r}^{(N)}{\Pr} (m)\right ) \\ &= \min_{m: m\neq r}E_{m|r}^{\ast}. {}\end{aligned} $$
(25)

Using (25), by analogy with the formula (15) we conclude (with \(\mathcal {R}_r\) defined as in (6) for each r, including r = M, by values of \(E_{r|r}\) taken from \((0, \min \limits _{l: l \neq r}D(G_l|| G_r))\)) that

$$\displaystyle \begin{aligned} E_{m\neq r|l=r}(E_{r|r}) &=\min_{m: m\neq r}\inf_{Q\in \mathcal{R}_r} D(Q\Vert G_m) \\ &=\min_{m: m\neq r}\inf_{Q: D(Q\Vert G_r) \leq E_{r|r}}D(Q\Vert G_m), \ r \in [M]. {} \end{aligned} $$
(26)

We can summarize this result in

Theorem 275

For the model with different distributions, for the given sample x we define its ED Q, and when \(Q\in \mathcal {R}_r^ {(N)}\) we accept the hypothesis r. Under the condition that the probabilities of all M hypotheses are positive, the reliability \(E_{m\neq r|l=r}\) of such a test for given \(E_{m=r|l\neq r} = E_{r|r}\) is defined by (26).
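On a binary alphabet the double minimization in (26) again admits a simple grid approximation; the following sketch (our own helper, reusing kl_divergence from the sketch in Sect. 1) evaluates it for a family of distributions G and a given \(E_{r|r}\):

```python
import numpy as np

def identification_reliability(G, r, e_rr, grid=10001):
    """Approximates (26): E_{m != r | l = r}(E_{r|r})
    = min_{m != r} inf { D(Q||G_m) : D(Q||G_r) <= E_{r|r} },
    for an M x 2 array G of distributions on a binary alphabet."""
    q0 = np.linspace(0.0, 1.0, grid)
    Q = np.stack([q0, 1.0 - q0], axis=1)
    in_ball = np.array([kl_divergence(q, G[r]) for q in Q]) <= e_rr
    return min(
        np.array([kl_divergence(q, G[m]) for q in Q])[in_ball].min()
        for m in range(len(G)) if m != r
    )
```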

For the presentation of examples let us consider the set \(\mathcal {X}=\{0, 1\}\) with only 2 elements. Let 5 probability distributions be given on \({\mathcal {X}}\):

$$\displaystyle \begin{aligned} G_1&=\{0.1,\,\, 0.9\} \\ G_2&=\{0.65,\,\, 0.35\} \\ G_3&=\{0.45,\,\, 0.55\}\\ G_4&=\{0.85,\,\, 0.15\} \\ G_5&=\{0.23,\,\, 0.77\} \end{aligned} $$

In Fig. 1, the results of calculations of \(E_{m\neq r|l=r}\) as a function of \(E_{m=r|l\neq r}\) are presented.

Fig. 1 \(E_{m\neq r|l=r}\) as a function of \(E_{m=r|l\neq r}\) for \(r = 1, 2, 3, 4, 5\)

The elements of the matrix of relative entropies of all pairs of distributions, used to evaluate the conditions (13) for this example, are:

$$\displaystyle \begin{aligned} \left\{D(G_m\Vert G_l)\right\}_{m\in [5]}^{l\in [5]}=\left( \begin{array}{ccccc} 0 & 0.956 & 0.422 & 2.018 & 0.082\\ 1.278 & 0 & 0.117 & 0.176 & 0.576\\ 0.586 & 0.120 & 0 & 0.618 & 0.169\\ 2.237 & 0.146 & 0.499 & 0 & 1.249\\ 0.103 & 0.531 & 0.151 & 1.383 & 0\\ \end{array} \right).\end{aligned} $$
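As a numerical check (our own script, not part of the original text), this matrix can be recomputed directly from the five distributions; the printed values are reproduced with base-2 logarithms, so they differ from the base-e definitions above only by the factor \(\ln 2\):

```python
import numpy as np

G = np.array([[0.10, 0.90],
              [0.65, 0.35],
              [0.45, 0.55],
              [0.85, 0.15],
              [0.23, 0.77]])

# D(G_m || G_l) for all pairs; base-2 logs match the printed matrix.
D = np.array([[np.sum(G[m] * np.log2(G[m] / G[l])) for l in range(5)]
              for m in range(5)])
print(np.round(D, 3))   # e.g. first row: [0.    0.956 0.422 2.018 0.082]
```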

In Figs. 2 and 3 the results of calculations of the same dependence are presented for 4 distributions taken from the previous 5.

Fig. 2 \(E_{m\neq t|l=t}\) as a function of \(E_{t|t}\) for \(t = 1, 2, 3, 4\)

Fig. 3 \(E_{m\neq t|l=t}\) as a function of \(E_{t|t}\) for \(t = 1, 2, 3, 4\)

6 r-Identification and Ranking Problems

The model was introduced in [1] and named K-identification. Since the letter K is already used in this lecture, we speak of r-identification. Given an N-sample x of measurements of the object, the problem is to answer the question: is the distribution of the object in the part \(\mathcal {S}\) of the M possible distributions, or in its complement? Here r is the number of elements of the set \(\mathcal {S}\).

Again we can make the decision on the basis of the ED Q of the sample x, and we suppose that before the experiments all hypotheses have some positive prior probabilities

$$\displaystyle \begin{aligned} {\Pr}(1),\dots, {\Pr}(M). {} \end{aligned}$$

Using (6)–(8) with some \(E_{1|1}, \dots, E_{M-1|M-1}\) meeting the conditions (13), when \(Q\in \bigcup \limits _{l\in \mathcal {S}}\mathcal {R}_l^{(N)}\) the decision “the distribution is in \({\mathcal {S}}\)” follows.

The model of ranking is the particular case of the model of r-identification with \(\mathcal {S}=\{1,2,\dots ,r\}\). Conversely, the r-identification problem may without loss of generality be considered as the ranking problem: to this end we renumber the hypotheses, placing the hypotheses of \(\mathcal {S}\) in the first r places. Because these two models are mathematically equivalent, we shall speak below only of the ranking model.

It is enough to consider the cases r ≤⌈M∕2⌉, because for larger r we can replace \({\mathcal {S}}\) with its complement. Remark that the case r = 1 was considered in Sect. 5.

We study two error probabilities of a test: the probability \({\alpha }_{m\leq r|l>r}^{(N)}\) of making an incorrect decision when m is not greater than r, and the probability \({\alpha }_{m>r|l\leq r}^{(N)}\) of making an error when m is greater than r. The corresponding reliabilities are

$$\displaystyle \begin{aligned} E_1(r)=E_{m\leq r|l>r} \ \ \mbox{and} \ \ E_2(r)=E_{m>r|l\leq r},\ \, 1\leq r \leq \lceil M/2 \rceil. {}\end{aligned} $$
(27)

With the supposition above on the prior probabilities we have

$$\displaystyle \begin{aligned} {\alpha}_{m\leq r|l >r}^{(N)} &=\frac{{\Pr}^{(N)}(m\leq r,l>r)}{{\Pr}(m\leq r)} \\ &=\frac{1}{{\sum \limits_{m \leq r}{\Pr}(m)}} \sum_{m\leq r}\sum_{l>r}{\Pr}^{(N)}(m,l) \\ &=\frac{1}{\sum \limits_{m \leq r}{\Pr}(m)} \sum_{m\leq r}\sum_{l>r}{\alpha}_{m|l}^{(N)}{\Pr}(m). {}\end{aligned} $$
(28)

The definition (27) of \(E_1(r)\) and the equality (28) give

$$\displaystyle \begin{aligned} E_1(r)&=\limsup_{{N\to \infty}}-\frac{1}{N} \log{\alpha}_{m\leq r|l>r}^{(N)} \\ &=\limsup_{{N\to \infty}}- \frac{1}{N}\left[\log\sum_{m\leq r}\sum_{l>r}{\Pr}(m) {\alpha}_{m|l}^{(N)}-\log {\sum \limits_{m \leq r}{\Pr}(m)}\right] \\ &= \min\limits_{m\leq r, l>r}E_{m|l}. {} \end{aligned} $$
(29)

Analogously,

$$\displaystyle \begin{aligned} E_2(r) &=\limsup_{{N\to \infty}}-\frac{1}{N}\log{\alpha}_{m>r|l\leq r}^{(N)} \\ &=\limsup_{{N\to \infty}}-\frac{1}{N}\left[\log\sum_{m>r} \sum_{l\leq r}{\Pr}(m)\,{\alpha}_{m|l}^{(N)}-\log {\sum \limits_{m > r}{\Pr}(m)} \right] \\ &=\min\limits_{m>r, l\leq r}E_{m|l}. {} \end{aligned} $$
(30)
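In matrix terms, (29) and (30) say that \(E_1(r)\) and \(E_2(r)\) are block minima of the reliability matrix; a minimal sketch (our own helper, with 0-based rows m and columns l):

```python
import numpy as np

def ranking_reliabilities(E, r):
    """E_1(r), E_2(r) as the block minima (29)-(30) of an M x M
    reliability matrix E (row = true index m, column = decision l)."""
    e1 = float(E[:r, r:].min())   # min over m <= r, l > r
    e2 = float(E[r:, :r].min())   # min over m > r, l <= r
    return e1, e2
```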

For any test the value of \(E_1(r)\) must satisfy the condition (compare (3) and (29))

$$\displaystyle \begin{aligned} E_1(r)\geq \min\limits_{m: m\leq r} E_{m|m}. {} \end{aligned} $$
(31)

Thus, for any test meeting all the inequalities in (13) for m ≤ r and the inequality (31), the reliability E 2(r) may be calculated by the equality (30). For a given value of E 1(r), the best E 2(r) is obtained if we use the freedom of selecting the largest values of the reliabilities \(E_{m|m}\), r < m ≤ M − 1, satisfying the conditions (13) for those m. These reasonings are illustrated by Fig. 4 and summarized as follows:

Fig. 4 Calculation of \(E_2(r) \left [E_1(r) \right ]\)

Theorem 276

When the probabilities of the hypotheses are positive, for given E 1(r), with the reliabilities \(E_{m|m}\), m ≤ r, not exceeding the expressions on the right-hand sides of (13), E 2(r) may be calculated in the following way:

$$\displaystyle \begin{aligned} E_2(r)\left [E_1(r)\right ]= \max\limits_{\{E_{m|l}, \ m, l\in \left[ M \right]\}:\min\limits_{m\leq r,\ l>r}E_{m|l}^{\ast}=E_1(r)}\left[\min\limits_{m>r,\ l\leq r}E_{m|l}^{\ast}\right] \,\, {} \end{aligned} $$
(32)

with \(E_{m|l}^{\ast }\) defined in (9)–(12).

Remark

One can see that for r = 1 formula (32) reduces to (26).

In Figs. 5 and 6, for 2 subsets of 3 distributions taken from the 5 defined for Fig. 1, the results of calculation of the dependence (26) are presented, and in Figs. 7 and 8 the corresponding results of the formula (32).

Fig. 5 \(E_{m\neq t|l=t}\) as a function of \(E_{t|t}\) for \(t = 1, 2, 3\)

Fig. 6 \(E_{m\neq t|l=t}\) as a function of \(E_{t|t}\) for \(t = 1, 2, 3\)

Fig. 7 \(E_2(r)\) as a function of \(E_1(r)\)

Fig. 8 \(E_2(r)\) as a function of \(E_1(r)\)

7 Conclusion and Extensions of Problems

This lecture is a contribution to the influence of information-theoretic methods on statistical theory. We have shown by simple examples what questions arise in different models of statistical identification.

The problems and results of the lecture may be extended in several directions, some of which have already been noted above.

It is necessary to examine models in which measurements are described by more general classes of RV’s and processes [18, 19, 21, 26].

One of the directions is connected with the use of compressed data of measurements [2, 6, 8, 19, 29].

One may see perspectives in the application of the identification approach and methods to authentication theory [25] and steganography [13].