INTRODUCTION

The problems of integer data analysis considered here arise at the stage of training logical classification procedures from precedents. To obtain theoretical estimates of the complexity of synthesizing logical classifiers and to forecast time costs, one needs to study the metric (quantitative) properties of the solution sets of these problems.

We introduce the basic concepts. Let M be the set of objects under study. Each object of M is represented by a numerical vector obtained by observing or measuring a number of its characteristics, called attributes. Each attribute is assumed to have a finite set of admissible values, which are encoded by integers.

Let \(X = \{x_1, \ldots, x_n\}\) be the given set of attributes, let \(H = \{x_{j_1}, \ldots, x_{j_r}\}\), \(j_1 < \ldots < j_r\), be a set of \(r\) attributes, and let \(\sigma = (\sigma_1, \ldots, \sigma_r)\) be a tuple in which \(\sigma_i\) is an admissible value of attribute \(x_{j_i}\), \(i = \overline{1, r}\). The pair \((\sigma, H)\) is called an elementary fragment (EF) of rank \(r\). The set of all EFs generated by the set of attributes \(X\) is denoted by \(W(X)\).

Let \(S = (a_1, \ldots, a_n)\) be an object from M (here \(a_j\), \(j \in \{1, 2, \ldots, n\}\), is the value of attribute \(x_j\) for \(S\)). We say that \(S\) contains the EF \((\sigma, H)\), \(H = \{x_{j_1}, \ldots, x_{j_r}\}\), \(\sigma = (\sigma_1, \ldots, \sigma_r)\), if \(a_{j_i} = \sigma_i\) for \(i = \overline{1, r}\).

Let \(D\) be a sample of objects from M, and let \(p\) be a number with \(1 \leqslant p \leqslant |D|\), where \(|D|\) is the number of objects in \(D\). The objects in \(D\) are not necessarily distinct.

An EF \((\sigma, H) \in W(X)\) is called \((p, D)\)-frequent if at least \(p\) objects from \(D\) contain \((\sigma, H)\). An EF \((\sigma, H) \in W(X)\) of rank \(r\), \(r \leqslant |D|\), is called correct in \(D\) if \((\sigma, H)\) is \((r, D)\)-frequent. An EF \((\sigma, H) \in W(X)\) is called infrequent in \(D\) if no object from \(D\) contains \((\sigma, H)\); it is called minimal infrequent in \(D\) if it is infrequent in \(D\) and every EF \((\sigma', H')\) obtained from \((\sigma, H)\) by deleting at least one attribute together with its value (\(\sigma' \subset \sigma\), \(H' \subset H\)) is not infrequent in \(D\).
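The definitions above can be made concrete with a small brute-force sketch. It assumes that objects are Python tuples of integer attribute values and that an EF is stored as a pair of tuples `(sigma, H)` with 0-based attribute indices. All function names are hypothetical and chosen only for illustration.

```python
# Objects are tuples of integer attribute values (a_1, ..., a_n).
# An EF (sigma, H) is stored as two tuples: the values sigma and the
# 0-based attribute indices H = (j_1, ..., j_r) with j_1 < ... < j_r.

def contains(obj, sigma, H):
    """True if the object obj contains the EF (sigma, H)."""
    return all(obj[j] == s for s, j in zip(sigma, H))

def support(D, sigma, H):
    """Number of objects of the sample D (repetitions counted) containing (sigma, H)."""
    return sum(contains(obj, sigma, H) for obj in D)

def is_frequent(D, sigma, H, p):
    """(p, D)-frequent: at least p objects of D contain (sigma, H)."""
    return support(D, sigma, H) >= p

def is_correct(D, sigma, H):
    """Correct in D: an EF of rank r = len(H) is (r, D)-frequent."""
    return is_frequent(D, sigma, H, len(H))

def is_infrequent(D, sigma, H):
    """Infrequent in D: no object of D contains (sigma, H)."""
    return support(D, sigma, H) == 0

def is_minimal_infrequent(D, sigma, H):
    """Infrequent in D, while no EF obtained by deleting attributes is.

    Checking the sub-fragments of rank r - 1 suffices: every proper
    sub-fragment is part of one of them, and any EF extending an
    infrequent EF is itself infrequent.
    """
    if not is_infrequent(D, sigma, H):
        return False
    for t in range(len(H)):
        if is_infrequent(D, sigma[:t] + sigma[t + 1:], H[:t] + H[t + 1:]):
            return False
    return True

D = [(0, 1, 2), (0, 1, 0), (1, 1, 2)]
print(is_correct(D, (0, 1), (0, 1)))             # True: two objects, rank 2
print(is_minimal_infrequent(D, (1, 0), (0, 2)))  # True
```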

Logical classification of integer data assumes that several nonoverlapping samples \(D_1, \ldots, D_l\), \(l \geqslant 2\), of objects from M are given, each representing a certain class of objects. The objects in these samples are called precedents, and the attributes from \(X\) are called features. At the training stage, in each sample \(D_i\), \(i \in \{1, 2, \ldots, l\}\), we search for the frequent EFs that are infrequent in \(D_j\) for every \(j \ne i\). The EFs found in this way make it possible to distinguish precedents from different classes and are called logical patterns or representative elementary classifiers [16].

Additional conditions may be imposed on the desired EFs, depending on the classifier model under consideration. For example, one may seek the so-called irredundant representative elementary classifiers. An elementary classifier \((\sigma, H)\) is called an irredundant representative for \(D_i\), \(i \in \{1, 2, \ldots, l\}\), if two conditions are met: (1) \((\sigma, H)\) is a \((1, D_i)\)-frequent EF; and (2) \((\sigma, H)\) is minimal infrequent in \(D_j\) for every \(j \ne i\). In this case, the search for minimal infrequent EFs leads to the intractable discrete problem of constructing the irredundant covers of an integer matrix [3] whose rows are the descriptions of the precedents that do not belong to \(D_i\).
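The two conditions can be tested for a given candidate by brute force. The sketch reads them literally, so condition (2) is checked separately in each sample \(D_j\), \(j \ne i\). It assumes that samples are lists of integer tuples and that EFs use 0-based attribute indices. The function names are hypothetical. The sketch only tests one candidate. It does not construct irredundant covers.

```python
def contains(obj, sigma, H):
    """True if the object obj contains the EF (sigma, H)."""
    return all(obj[j] == s for s, j in zip(sigma, H))

def is_infrequent(D, sigma, H):
    """No object of D contains (sigma, H)."""
    return not any(contains(obj, sigma, H) for obj in D)

def is_minimal_infrequent(D, sigma, H):
    """Infrequent in D, and no rank-(r - 1) sub-fragment is infrequent."""
    if not is_infrequent(D, sigma, H):
        return False
    return all(
        not is_infrequent(D, sigma[:t] + sigma[t + 1:], H[:t] + H[t + 1:])
        for t in range(len(H))
    )

def is_irredundant_representative(samples, i, sigma, H):
    """Conditions (1) and (2) for the sample samples[i]."""
    # (1): (sigma, H) is (1, D_i)-frequent, i.e. some precedent of D_i contains it.
    if not any(contains(obj, sigma, H) for obj in samples[i]):
        return False
    # (2): (sigma, H) is minimal infrequent in every other sample D_j.
    return all(
        is_minimal_infrequent(D_j, sigma, H)
        for j, D_j in enumerate(samples) if j != i
    )

samples = [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]
print(is_irredundant_representative(samples, 0, (0,), (0,)))      # True
print(is_irredundant_representative(samples, 0, (0, 0), (0, 1)))  # False
```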

In [6], a model of a logical classifier is proposed that first finds the correct EFs in each sample \(D_i\), \(i \in \{1, 2, \ldots, l\}\), and then selects among them those that are not contained in the descriptions of precedents from other classes. This model is considerably faster than the classical model based on constructing irredundant representative elementary classifiers, while not being inferior to it in classification quality.
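The two-stage scheme described above can be illustrated by exhaustive enumeration. This is a hypothetical brute-force illustration for tiny samples, not the algorithm of [6]. It lists the correct EFs of \(D_i\) and discards those contained in some precedent of another class. Candidate value tuples σ are restricted to projections of precedents of \(D_i\), because an EF contained in no precedent cannot be correct.

```python
from itertools import combinations

def contains(obj, sigma, H):
    """True if the object obj contains the EF (sigma, H)."""
    return all(obj[j] == s for s, j in zip(sigma, H))

def correct_patterns(samples, i):
    """Correct EFs of samples[i] that are contained in no precedent of another class."""
    own = samples[i]
    others = [obj for j, D in enumerate(samples) if j != i for obj in D]
    n = len(own[0])
    found = []
    for r in range(1, min(n, len(own)) + 1):
        for H in combinations(range(n), r):
            # A correct EF is contained in some precedent of D_i, so its values
            # must be the projection of such a precedent onto H.
            for sigma in sorted({tuple(obj[j] for j in H) for obj in own}):
                if sum(contains(obj, sigma, H) for obj in own) < r:
                    continue  # not correct in D_i
                if any(contains(obj, sigma, H) for obj in others):
                    continue  # occurs in a precedent of another class
                found.append((sigma, H))
    return found

samples = [[(0, 0, 1), (0, 1, 1), (0, 0, 0)], [(1, 0, 1), (1, 1, 0)]]
print(correct_patterns(samples, 0))
# [((0,), (0,)), ((0, 0), (0, 1)), ((0, 1), (0, 2))]
```

The enumeration grows exponentially with the number of features. It is meant only to make the definitions concrete. The estimates in this paper concern how large the set of correct EFs typically is.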

It is of interest to obtain asymptotic estimates (as \(n \to \infty\)) of the typical number of correct EFs and of the typical length of a correct EF. In [7], such estimates are obtained for the case when the number of objects in \(D\) is much smaller than the number of attributes and each attribute takes values from the set \(\{0, 1, \ldots, k - 1\}\), \(k \geqslant 2\).

The new results obtained in this paper mainly concern the metric properties of the set of correct EFs in the case \(n \leqslant |D|\). Similar properties of the set of minimal infrequent EFs were studied earlier in a number of publications (for example, [3, 8, 9]), which, among other things, consider the case \(n \leqslant |D|\). The estimates of the number of minimal infrequent EFs given in this paper have a form that allows them to be compared with the corresponding estimates for correct EFs. The comparison indicates that, in terms of reducing time costs, it is advisable to use methods that search for frequent EFs when synthesizing logical classifiers; this agrees with the experimental results obtained in [6] on random model data.

Section 1 states the problem: the initial data are represented by an integer matrix whose rows are the descriptions of the objects from \(D\), and the two main theorems on the number of correct EFs are formulated. Their proofs are given in Section 2. Section 3 presents the previously obtained and the new estimates of the typical number of minimal infrequent EFs and of the typical length of a minimal infrequent EF.

1. STATEMENT OF THE PROBLEM AND FORMULATION OF THE MAIN RESULTS

Let \(L = (a_{ij})\), \(i = \overline{1, m}\), \(j = \overline{1, n}\), be a matrix with elements from \(\{0, 1, \ldots, k - 1\}\), \(k \geqslant 2\). Let \(E_k^r\), \(r \leqslant n\), be the set of tuples \((\sigma_1, \ldots, \sigma_r)\), \(\sigma_i \in \{0, 1, \ldots, k - 1\}\), \(i = \overline{1, r}\). Let \(W_r^n\), \(r \leqslant n\), be the set of all sets of the form \(\{j_1, \ldots, j_r\}\), where \(j_t \in \{1, 2, \ldots, n\}\) for \(t = \overline{1, r}\) and \(j_1 < \ldots < j_r\). Let \(V_r^m\), \(r \leqslant m\), be the set of all sets of row numbers \((i_1, \ldots, i_r)\), where \(i_t \in \{1, 2, \ldots, m\}\) and \(i_1 < \ldots < i_r\), so that \(|V_r^m| = C_m^r\).

Let \(\sigma \in E_k^r\), \(\sigma = (\sigma_1, \ldots, \sigma_r)\), and \(w \in W_r^n\), \(w = \{j_1, \ldots, j_r\}\). The number \(r\) is called the length of the set \(w\).

The set \(w\) is called σ-admissible for \(L\) if there exists a set \(v = (i_1, \ldots, i_r) \in V_r^m\) such that \(a_{i_t j_s} = \sigma_s\) for \(t, s = \overline{1, r}\). In other words, each of the rows \(i_1, \ldots, i_r\) coincides with \(\sigma\) in the columns \(j_1, \ldots, j_r\). We say that the σ-admissible set \(w\) is generated by the tuple \(\sigma\).

It is easy to see that if the rows of \(L\) are the descriptions of the objects from the sample \(D\), then the set \(w \in W_r^n\), \(w = \{j_1, \ldots, j_r\}\), is σ-admissible for \(L\) if and only if the EF \((\sigma, H)\), \(H = \{x_{j_1}, \ldots, x_{j_r}\}\), is correct in \(D\).
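The following brute-force sketch computes \(|U_r(L)|\) for a small matrix. It reads the admissibility condition as \(a_{i_t j_s} = \sigma_s\) for all \(t, s\), which is the reading used in the proof of Lemma 1. Under that reading, \(w\) is σ-admissible exactly when at least \(r\) rows of \(L\) coincide with \(\sigma\) on the columns of \(w\), that is, when the corresponding EF is correct. The matrix is stored as a list of row tuples with 0-based indices. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def is_admissible(L, sigma, w):
    """True if w (increasing 0-based column indices) is sigma-admissible for L.

    There must be r = len(w) distinct rows i_1, ..., i_r with
    L[i_t][j_s] == sigma[s] for all t, s, i.e. at least r rows of L
    coincide with sigma in the columns of w.
    """
    matching_rows = sum(all(row[j] == s for j, s in zip(w, sigma)) for row in L)
    return matching_rows >= len(w)

def admissible_length_counts(L):
    """Counter mapping each length r to |U_r(L)|, by brute force over w and sigma."""
    m, n = len(L), len(L[0])
    counts = Counter()
    for r in range(1, min(m, n) + 1):
        for w in combinations(range(n), r):
            # Only tuples that occur in some row can be admissible.
            for sigma in {tuple(row[j] for j in w) for row in L}:
                if is_admissible(L, sigma, w):
                    counts[r] += 1
    return counts

L = [(0, 1), (0, 1), (1, 0)]
print(dict(admissible_length_counts(L)))  # {1: 4, 2: 1}, so |U(L)| = 5
```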

We introduce the following notation: \(\mathfrak{M}_{mn}^k\) is the set of all \(m \times n\) matrices with elements from \(\{0, 1, \ldots, k - 1\}\), \(k \geqslant 2\); \(U(L, \sigma)\), \(L \in \mathfrak{M}_{mn}^k\), \(\sigma \in E_k^r\), is the set of all σ-admissible sets for the matrix \(L\); \(U_r(L, \sigma)\) is the set of all sets in \(U(L, \sigma)\) of length \(r\); \(U(L)\), \(L \in \mathfrak{M}_{mn}^k\), is the collection of all admissible sets for \(L\) in which each set occurs as many times as there are tuples in \(E_k^r\) that generate it; \(|N|\) is the cardinality of the set \(N\);

$$\left| {{{U}_{r}}\left( L \right)} \right| = \mathop \sum \limits_{\sigma \in E_{k}^{r}} \left| {{{U}_{r}}\left( {L,~\sigma } \right)} \right|;$$
$$\left| {U\left( L \right)} \right| = \mathop \sum \limits_{r = 1}^n \mathop \sum \limits_{\sigma \in E_{k}^{r}} \left| {{{U}_{r}}\left( {L,~\sigma } \right)} \right|;$$

\(r_1 = [0.5\log_k mn - 0.5\log_k \log_k^2 mn - \log_k \log_k \log_k n]\), where here and below \([q]\) is the integer part of the number \(q\); \(r_2 = \,]0.5\log_k mn - 0.5\log_k \log_k^2 mn + \log_k \log_k \log_k n[\), where here and below \(]q[\) is the smallest integer greater than \(q\); \(\phi_1\) is the interval \([r_1, r_2]\); \(r_3 = \,]\log_k m + \log_k \log_k m[\); \(\phi_2\) is the interval \([1, r_3]\). For sequences \(b_n\) and \(c_n\), the relation \(b_n \approx c_n\), \(n \to \infty\), means that \(\lim_{n \to \infty} b_n/c_n = 1\); the relation \(b_n \precsim c_n\), \(n \to \infty\), means that \(\limsup_{n \to \infty} b_n/c_n \leqslant 1\); and \(b_n \succsim c_n\) means that \(c_n \precsim b_n\).
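To get a feel for the magnitudes involved, the bounds \(r_1\), \(r_2\), and \(r_3\) can be evaluated numerically. This is a minimal sketch, assuming \(m > k\) and \(n > k\) so that all iterated logarithms are defined. It also assumes that the integer part \([q]\) is computed as the floor, which matches the definition for the positive values that occur here. The function name `r_values` is hypothetical.

```python
import math

def r_values(m, n, k):
    """Return (r1, r2, r3) as defined in the text; assumes m > k and n > k."""
    def log(x):
        return math.log(x, k)

    q = 0.5 * log(m * n) - 0.5 * log(log(m * n) ** 2)
    t = log(log(log(n)))
    r1 = math.floor(q - t)                      # [q - t]: integer part
    r2 = math.floor(q + t) + 1                  # ]q + t[: smallest integer > q + t
    r3 = math.floor(log(m) + log(log(m))) + 1   # ]log_k m + log_k log_k m[
    return r1, r2, r3

print(r_values(100, 10**6, 2))  # (6, 11, 10): phi_1 = [6, 11], phi_2 = [1, 10]
```

For practical sizes, the correction term \(\log_k \log_k \log_k n\) stays small (about 2.1 here). The interval \(\phi_1\) is therefore narrow, but the asymptotics themselves set in slowly.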

Below we present asymptotic estimates of the typical value of \(|U(L)|\) and of the typical length of an admissible set for \(L\), for different relations between \(m\) and \(n\).

A typical situation is described by a statement of the form “for almost all matrices \(L\) from \(\mathfrak{M}_{mn}^k\), \(F_1(L) \approx F_2(L)\) as \(n \to \infty\)” (here \(F_1(L)\) and \(F_2(L)\) are two functionals defined on the matrices from \(\mathfrak{M}_{mn}^k\)). This statement means that there exist two positive functions \(\alpha(n)\) and \(\beta(n)\) tending to zero as \(n \to \infty\) such that, for all sufficiently large \(n\),

$$1 - |\mathfrak{M}| / |\mathfrak{M}_{mn}^k| \leqslant \alpha(n),$$

where \(\mathfrak{M}\) is the set of matrices \(L\) in \(\mathfrak{M}_{mn}^k\) for which

$$1 - \beta(n) < |F_1(L)| / |F_2(L)| < 1 + \beta(n)$$

holds.

The following Theorems 1 and 2 hold.

Theorem 1. If \(m^a \leqslant n \leqslant k^{m^\beta}\), \(a > 1\), \(\beta < 1\), \(k \geqslant 2\), then, as \(n \to \infty\), for almost all matrices \(L\) from \(\mathfrak{M}_{mn}^k\) the relations

$$~\mathop \sum \limits_{r \leqslant {{r}_{1}}} \left| {{{U}_{r}}\left( L \right)} \right| \approx \left| {{{U}_{{{{r}_{1}}}}}\left( L \right)} \right| \approx C_{n}^{{{{r}_{1}}}}C_{m}^{{{{r}_{1}}}}{{k}^{{{{r}_{1}} - {{r}_{1}}^{2}}}},$$
$$~\mathop \sum \limits_{r \geqslant {{r}_{2}}} \left| {{{U}_{r}}\left( L \right)} \right| \approx \left| {{{U}_{{{{r}_{2}}}}}\left( L \right)} \right| \approx C_{n}^{{{{r}_{2}}}}C_{m}^{{{{r}_{2}}}}{{k}^{{{{r}_{2}} - {{r}_{2}}^{2}}}},$$
$$\left| {U\left( L \right)} \right| \approx \mathop \sum \limits_{r \in {{\phi }_{1}}} \left| {{{U}_{r}}\left( L \right)} \right| \approx \mathop \sum \limits_{r \in {{\phi }_{1}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}}$$

hold, and the lengths of almost all sets from \(U(L)\) belong to the interval \(\phi_1\).

Theorem 2. If \(n \leqslant m \leqslant k^{n^\beta}\), \(\beta < 1/2\), \(k \geqslant 2\), then, as \(n \to \infty\), for almost all matrices \(L\) from \(\mathfrak{M}_{mn}^k\) the relations

$$~\mathop \sum \limits_{r \geqslant {{r}_{3}}} \left| {{{U}_{r}}\left( L \right)} \right| \approx \left| {{{U}_{{{{r}_{3}}}}}\left( L \right)} \right| \approx C_{n}^{{{{r}_{3}}}}C_{m}^{{{{r}_{3}}}}{{k}^{{{{r}_{3}} - {{r}_{3}}^{2}}}},$$
$$\left| {U\left( L \right)} \right| \precsim \mathop \sum \limits_{r \in {{\phi }_{2}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}}$$

hold, and the lengths of almost all sets from \(U(L)\) belong to the interval \(\phi_2\).

The proofs of Theorems 1 and 2 are based on a number of lemmas given in Section 2.

2. PROOFS OF THEOREMS 1 AND 2

Let \(v \in V_r^m\), \(v = (i_1, \ldots, i_r)\); \(\sigma \in E_k^r\), \(\sigma = (\sigma_1, \ldots, \sigma_r)\); and \(w \in W_r^n\), \(w = \{j_1, \ldots, j_r\}\). A matrix \(L = (a_{ij})\), \(i = \overline{1, m}\), \(j = \overline{1, n}\), \(L \in \mathfrak{M}_{mn}^k\), is called a \((v, \sigma, w)\)-matrix if \(a_{i_t j_s} = \sigma_s\) for \(t, s = \overline{1, r}\). We denote by \(N_{(v, \sigma, w)}\) the set of \((v, \sigma, w)\)-matrices in \(\mathfrak{M}_{mn}^k\). We denote by \(N_{(v, \sigma, w)}^*\) the set of all matrices \(L\) in \(N_{(v, \sigma, w)}\) such that \(L \notin N_{(v_1, \sigma, w)}\) for every \(v_1 \in V_r^m\), \(v_1 \ne v\). Equivalently, \(N_{(v, \sigma, w)}^*\) consists of the matrices in which the rows with numbers from \(v\) are the only rows that coincide with \(\sigma\) in the columns with numbers from \(w\).

Lemma 1. If \({v} \in V_{r}^{m},~\,\,w \in W_{r}^{n},{{\;\sigma }} \in E_{k}^{r}\), then

$$\left| {{{N}_{{({v},{{\sigma }},w)}}}} \right| = {{k}^{{mn - {{r}^{2}}}}}.$$

Proof. We count the number of ways to construct a matrix \(L\) from \(N_{(v, \sigma, w)}\). The \(r^2\) elements of \(L\) located at the intersection of the rows with numbers from \(v\) and the columns with numbers from \(w\) are uniquely determined. The remaining \(mn - r^2\) elements can be chosen arbitrarily, in \(k^{mn - r^2}\) ways. This yields the required equality. Lemma 1 is proved.
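For tiny parameters, the count in Lemma 1 can be checked by enumerating all \(k^{mn}\) matrices. This sketch assumes 0-based row and column indices and a hypothetical helper `count_N`. It serves only as a sanity check of the formula, not as part of the proof.

```python
from itertools import product

def count_N(m, n, k, v, sigma, w):
    """|N_(v, sigma, w)| by enumeration: rows in v must equal sigma on the columns in w."""
    total = 0
    for cells in product(range(k), repeat=m * n):
        L = [cells[i * n:(i + 1) * n] for i in range(m)]
        if all(L[i][j] == s for i in v for j, s in zip(w, sigma)):
            total += 1
    return total

m, n, k = 3, 3, 2
v, sigma, w = (0, 2), (1, 0), (0, 1)
r = len(w)
print(count_N(m, n, k, v, sigma, w), k ** (m * n - r * r))  # 32 32
```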

Lemma 2. If \({v} \in V_{r}^{m},\,\,~w \in W_{r}^{n}\), \({{\sigma }} \in E_{k}^{r}\), then

$$\left| {N_{{({v},{{\sigma }},w)}}^{*}} \right| = {{(1 - {{k}^{{ - r}}})}^{{m - r}}}{{k}^{{mn - {{r}^{2}}}}}.$$

Proof. We count the number of ways to construct a matrix \(L\) from \(N_{(v, \sigma, w)}^*\). The elements of \(L\) located in the columns with numbers not in \(w\) can be chosen arbitrarily, in \(k^{m(n - r)}\) ways. Now consider the submatrix of \(L\) formed by the columns with numbers from \(w\). In it, the \(r\) rows with numbers from \(v\) coincide with \(\sigma\), and each of the remaining \(m - r\) rows can be any of the \(k^r - 1\) tuples different from \(\sigma\). Hence this submatrix can be chosen in \((k^r - 1)^{m - r}\) ways. Since \(k^{m(n - r)}(k^r - 1)^{m - r} = (1 - k^{-r})^{m - r} k^{mn - r^2}\), we obtain the required equality. Lemma 2 is proved.
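Lemma 2 can be sanity-checked in the same brute-force way. The sketch uses the characterization of \(N_{(v, \sigma, w)}^*\) as the matrices in which the rows matching σ on \(w\) are exactly the rows in \(v\). It compares the count with the integer form \((k^r - 1)^{m - r} k^{m(n - r)}\) of the lemma's formula. Indices are 0-based, and the helper names are hypothetical.

```python
from itertools import product

def count_N_star(m, n, k, v, sigma, w):
    """|N*_(v, sigma, w)| by enumeration: the rows coinciding with sigma on the
    columns in w must be exactly the rows in v."""
    total = 0
    for cells in product(range(k), repeat=m * n):
        L = [cells[i * n:(i + 1) * n] for i in range(m)]
        matching = {i for i in range(m) if all(L[i][j] == s for j, s in zip(w, sigma))}
        if matching == set(v):
            total += 1
    return total

def lemma2_count(m, n, k, r):
    """(1 - k^-r)^(m - r) k^(mn - r^2), rewritten as an integer expression."""
    return (k ** r - 1) ** (m - r) * k ** (m * (n - r))

print(count_N_star(3, 3, 2, (0, 2), (1, 0), (0, 1)), lemma2_count(3, 3, 2, 2))  # 24 24
```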

Lemma 3. Let \(v_1 \in V_r^m\), \(v_2 \in V_l^m\), \(w_1 \in W_r^n\), \(w_2 \in W_l^n\), \(\sigma' \in E_k^r\), \(\sigma'' \in E_k^l\). Suppose the sets \(v_1\) and \(v_2\) have \(a\) (\(a \geqslant 0\)) common elements, and the sets \(w_1\) and \(w_2\) have \(b\) (\(b \geqslant 0\)) common elements. Then

$$\left| {{{N}_{{({{{v}}_{1}},{{\sigma '}},~{{w}_{1}})~}}} \cap {{N}_{{({{{v}}_{2}},{{\sigma ''}},~{{w}_{2}})}}}} \right| \leqslant {{k}^{{mn - {{r}^{2}} - {{l}^{2}} + ab}}}.$$

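We omit the straightforward proof of Lemma 3. The \(r^2\) elements fixed by the first condition and the \(l^2\) elements fixed by the second overlap in exactly \(ab\) positions. Either the two conditions contradict each other and the intersection is empty, or exactly \(r^2 + l^2 - ab\) elements are determined and the remaining ones can be chosen arbitrarily.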
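The overlap argument can be illustrated on a tiny case in which the two fixed blocks share one cell. This sketch uses 0-based indices and hypothetical helper names. With consistent values in the shared cell, the bound of Lemma 3 is attained. With conflicting values, the intersection is empty.

```python
from itertools import product

def fixed_cells(v, sigma, w):
    """Cells (i, j) and the values fixed by membership in N_(v, sigma, w)."""
    return {(i, j): s for i in v for j, s in zip(w, sigma)}

def count_intersection(m, n, k, cond1, cond2):
    """Number of m x n matrices satisfying both sets of fixed cells."""
    total = 0
    for cells in product(range(k), repeat=m * n):
        L = [cells[i * n:(i + 1) * n] for i in range(m)]
        if all(L[i][j] == s for (i, j), s in list(cond1.items()) + list(cond2.items())):
            total += 1
    return total

m, n, k = 3, 3, 2
v1, w1, v2, w2 = (0, 1), (0, 1), (1, 2), (1, 2)  # a = 1 common row, b = 1 common column
bound = k ** (m * n - 2 * 2 - 2 * 2 + 1 * 1)     # k^(mn - r^2 - l^2 + ab)
consistent = count_intersection(m, n, k, fixed_cells(v1, (1, 1), w1), fixed_cells(v2, (1, 0), w2))
conflicting = count_intersection(m, n, k, fixed_cells(v1, (1, 1), w1), fixed_cells(v2, (0, 0), w2))
print(consistent, conflicting, bound)  # 4 0 4
```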

In the proofs of Lemmas 4–6 we use the notation \(b_n \leqslant_n c_n\), which means that \(b_n \leqslant c_n\) for all sufficiently large \(n\).

Lemma 4. 1. If  \(m \leqslant n \leqslant {{k}^{{{{m}^{\beta }}}}},\,\,\beta < 1\), then

$$\mathop \sum \limits_{r \leqslant {{r}_{1}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}} \precsim C_{n}^{{{{r}_{1}}}}C_{m}^{{{{r}_{1}}}}{{k}^{{{{r}_{1}} - r_{1}^{2}}}},\quad n \to \infty .$$

2. The following relation is valid:

$$\mathop \sum \limits_{r \geqslant {{r}_{2}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}} \precsim C_{n}^{{{{r}_{2}}}}C_{m}^{{{{r}_{2}}}}{{k}^{{{{r}_{2}} - r_{2}^{2}}}},\quad n \to \infty .$$

3. If \(n \leqslant m\), then

$$\mathop \sum \limits_{r \geqslant {{r}_{3}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}} \precsim C_{n}^{{{{r}_{3}}}}C_{m}^{{{{r}_{3}}}}{{k}^{{{{r}_{3}} - r_{3}^{2}}}},\quad n \to \infty ~.$$

Proof. We put \(a_r = C_n^r C_m^r k^{r - r^2}\), \(q = 0.5\log_k mn - 0.5\log_k \log_k^2 mn\), and \(t = \log_k \log_k \log_k n\).

1. Let \(m \leqslant n \leqslant k^{m^\beta}\), \(\beta < 1\), and \(r \leqslant r_1 + 1\). We use the facts that \(q \leqslant 0.5\log_k mn\), \(k^{2q} = mn/\log_k^2 mn\), \((n - q) \geqslant_n 0.5n\) for \(m \leqslant n\), and \((m - q) \geqslant_n 0.5m\) for \(n \leqslant k^{m^\beta}\). With these, we get

$$\frac{{{{a}_{{r - 1}}}}}{{{{a}_{r}}}} = \frac{{{{r}^{2}}{{k}^{{2r - 2}}}}}{{\left( {n - r + 1} \right)\left( {m - r + 1} \right)}} \leqslant \frac{{{{q}^{2}}{{k}^{{2q - 2t}}}}}{{\left( {n - q} \right)\left( {m - q} \right)}}{{ \leqslant }_{n}}{{k}^{{ - 2t}}}.$$

2. At \(r \geqslant {{r}_{2}} - 1~\), we get

$$\frac{{{{a}_{{r + 1}}}}}{{{{a}_{r}}}} \leqslant \frac{{mn}}{{{{r}^{2}}}}{{k}^{{ - 2r}}}{{ \leqslant }_{n}}\frac{{mn}}{{{{q}^{2}}~}}{{k}^{{ - 2q - 2t + 2}}}{{ \leqslant }_{n}}{{k}^{{ - 2t}}}.$$

3. At \(n \leqslant m, \; r \geqslant {{r}_{3}} - 1\), we get

$$\frac{{{{a}_{{r + 1}}}}}{{{{a}_{r}}}} \leqslant \frac{{mn}}{{{{r}^{2}}}}{{k}^{{ - 2r}}}{{ \leqslant }_{n}}\frac{1}{{{{{({\text{lo}}{{{\text{g}}}_{k}}n)}}^{2}}}}.$$

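Thus, \(a_{r - 1} = o(a_r)\), \(n \to \infty\), in case 1, and \(a_{r + 1} = o(a_r)\), \(n \to \infty\), in each of cases 2 and 3. Moreover, the bounds obtained for these ratios do not depend on \(r\). Hence each of the sums in Lemma 4 is majorized by a geometric series whose ratio tends to zero and whose first term is \(a_{r_1}\), \(a_{r_2}\), or \(a_{r_3}\), respectively. Lemma 4 is proved.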
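The closed form of the ratio \(a_{r-1}/a_r\) used in part 1 of the proof can be checked exactly with rational arithmetic. The sketch below does this for one illustrative choice of \(m\), \(n\), \(k\) and also locates the largest term \(a_r\). These sizes are far from the asymptotic regime, so the peak position only illustrates the unimodal shape of the sequence. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def a(r, m, n, k):
    """a_r = C_n^r C_m^r k^(r - r^2) as an exact rational number."""
    return comb(n, r) * comb(m, r) * Fraction(k) ** (r - r * r)

def ratio_formula(r, m, n, k):
    """Closed form of a_{r-1} / a_r: r^2 k^(2r - 2) / ((n - r + 1)(m - r + 1))."""
    return Fraction(r * r * k ** (2 * r - 2), (n - r + 1) * (m - r + 1))

m, n, k = 20, 50, 2
identity_holds = all(
    a(r - 1, m, n, k) / a(r, m, n, k) == ratio_formula(r, m, n, k)
    for r in range(1, m + 1)
)
peak = max(range(m + 1), key=lambda r: a(r, m, n, k))
print(identity_holds, peak)  # True 3
```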

Lemma 5. If \(m \leqslant n\) and \(r,~\,\,l \leqslant {{r}_{2}}\), then

$$\mathop \sum \limits_{b = 0}^{{\text{min}}(r,l)} {{k}^{{lb}}}C_{n}^{r}C_{r}^{b}C_{{n - r}}^{{l - b}} \leqslant C_{n}^{r}C_{n}^{l}(1 + {{\delta }}(n)),$$

where δ(n) → 0 as n → ∞.

Proof. We denote \({{{{\lambda }}}_{b}} = {{k}^{{lb}}}C_{n}^{r}C_{r}^{b}C_{{n - r}}^{{l - b}}{\text{/}}C_{n}^{r}C_{{n - r}}^{l}\). Since

$$\frac{{C_{r}^{b}C_{{n - r}}^{{l - b}}}}{{C_{{n - r}}^{l}}} \leqslant {{\left( {\frac{{rl}}{{n - r - l}}} \right)}^{b}}~,$$

and since, under the conditions of the lemma, \(r, l \leqslant_n 0.5\log_k mn \leqslant \log_k n\) and \((r + l)/n \leqslant_n 0.5\), we have

$${{{{\lambda }}}_{b}}{{ \leqslant }_{n}}{{\left( {\frac{{2{\text{log}}_{k}^{2}n}}{n}} \right)}^{b}}~.$$

Therefore, the sum being estimated does not exceed \(C_n^r C_{n - r}^l (1 + \delta(n))\), where \(\delta(n) \to 0\) as \(n \to \infty\). Using the inequality \(C_{n - r}^l \leqslant C_n^l\), we obtain the assertion of the lemma. Lemma 5 is proved.

Lemma 6. If \(m \leqslant k^{n^\beta}\), \(\beta < 1/2\), and \(r, l \leqslant r_3\), then

$$\mathop \sum \limits_{b = 0}^{{\text{min}}(r,l)} {{k}^{{lb}}}C_{n}^{r}C_{r}^{b}C_{{n - r}}^{{l - b}} < C_{n}^{r}C_{n}^{l}(1 + {{\delta }}(n)),$$

where δ(n) → 0 as n → ∞.

The proof of Lemma 6 is similar to the proof of Lemma 5 (in this case \(r, \; l \leqslant 2{{n}^{{{\beta }}}}\) and \({{{{\lambda }}}_{b}}{{ \leqslant }_{n}}{{(8{{n}^{{2{{\beta }} - 1}}})}^{b}}\)).

We regard \(\mathfrak{M}_{mn}^k = \{L\}\) as a space of elementary events in which each elementary event \(L\) has probability \(1/|\mathfrak{M}_{mn}^k|\). The mathematical expectation of a random variable \(X(L)\) defined on \(\mathfrak{M}_{mn}^k\) is denoted by \(\mathbf{M}X(L)\), and its variance by \(\mathbf{D}X(L)\).

Lemma 7 [10]. Let \(X_1(L)\) and \(X_2(L)\) be random variables defined on \(\mathfrak{M}_{mn}^k\) such that \(X_1(L) \geqslant X_2(L) \geqslant 0\). Suppose that, as \(n \to \infty\), \(\mathbf{M}X_1(L) \approx \mathbf{M}X_2(L)\) and \(\mathbf{D}X_2(L)/(\mathbf{M}X_2(L))^2 \to 0\). Then, for almost all matrices \(L\) from \(\mathfrak{M}_{mn}^k\), \(X_1(L) \approx X_2(L) \approx \mathbf{M}X_2(L)\) as \(n \to \infty\).

Let \(\sigma \in E_k^r\) and \(w \in W_r^n\). On \(\mathfrak{M}_{mn}^k = \{L\}\) we consider the random variable \(\zeta_{(\sigma, w)}(L)\), which is equal to 1 if \(w\) is a σ-admissible set for \(L\) and to 0 otherwise. We put

$${{{{\mu }}}_{r}}\left( L \right) = \mathop \sum \limits_{w \in W_{r}^{n}} \mathop \sum \limits_{{{\sigma }} \in E_{k}^{r}} {{{{\zeta }}}_{{\left( {{{\sigma }},w} \right)}}}\left( L \right),\quad {{\zeta }}\left( L \right) = \mathop \sum \limits_{r = 1}^{{\text{min}}\left( {m,n} \right)} {{{{\mu }}}_{r}}\left( L \right)~,\quad {{{{\zeta }}}_{i}}\left( L \right) = \mathop \sum \limits_{r \in {{\phi }_{i}}} {{{{\mu }}}_{r}}\left( L \right),\quad i \in \left\{ {1,~2} \right\}~.$$

It is easy to see that \(\mu_r(L) = |U_r(L)|\) (the number of sets in \(U(L)\) of length \(r\)), \(\zeta(L) = |U(L)|\), and \(\zeta_i(L)\), \(i \in \{1, 2\}\), is the number of those sets in \(U(L)\) whose lengths belong to the interval \(\phi_i\).

We estimate the probability of the event \(\zeta_{(\sigma, w)}(L) = 1\), \(\sigma \in E_k^r\), \(w \in W_r^n\), denoted below by \(P(\zeta_{(\sigma, w)}(L) = 1)\). This event occurs if and only if \(L \in N_{(v, \sigma, w)}\) for some \(v \in V_r^m\). Therefore, by Lemma 1,

$$P({{{{\zeta }}}_{{\left( {{{\sigma }},w} \right)}}}\left( L \right) = 1) \leqslant \mathop \sum \limits_{{v} \in V_{r}^{m}} \left| {{{N}_{{\left( {{v},{{\sigma }},w} \right)}}}} \right|{\text{/|}}\mathfrak{M}_{{mn}}^{k}{\text{|}} = C_{m}^{r}{{k}^{{ - {{r}^{2}}}}}.$$
(2.1)

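On the other hand, the sets \(N_{(v, \sigma, w)}^*\), \(v \in V_r^m\), are pairwise disjoint, and each of them is contained in this event. Therefore, by Lemma 2, we have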

$$P({{\zeta }_{{\left( {{{\sigma }},w} \right)}}}\left( L \right) = 1) \geqslant \mathop \sum \limits_{{v} \in V_{r}^{m}} {\text{|}}N_{{\left( {{v},{{\sigma }},w} \right)}}^{*}{\text{|/|}}\mathfrak{M}_{{mn}}^{k}{\text{|}} = C_{m}^{r}{{(1 - {{k}^{{ - r}}})}^{{m - r}}}{{k}^{{ - {{r}^{2}}}}}.$$
(2.2)
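Under the uniform distribution on \(\mathfrak{M}_{mn}^k\), the entries are independent and uniform. Each row therefore coincides with σ on the \(r\) columns of \(w\) independently, with probability \(p = k^{-r}\). With admissibility read as in Lemma 1 (at least \(r\) matching rows), \(P(\zeta_{(\sigma, w)}(L) = 1)\) is thus an exact binomial tail. The sketch below computes it with rational arithmetic and checks that it lies between the bounds (2.2) and (2.1). The lower bound is exactly the term with \(r\) matching rows. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_admissible(m, k, r):
    """Exact P(zeta_(sigma, w)(L) = 1): at least r of the m rows match sigma on w."""
    p = Fraction(1, k ** r)
    return sum(comb(m, s) * p ** s * (1 - p) ** (m - s) for s in range(r, m + 1))

def bounds(m, k, r):
    """Lower bound (2.2) and upper bound (2.1)."""
    base = comb(m, r) * Fraction(1, k ** (r * r))
    return base * (1 - Fraction(1, k ** r)) ** (m - r), base

all_within = all(
    bounds(m, k, r)[0] <= prob_admissible(m, k, r) <= bounds(m, k, r)[1]
    for m in range(1, 9) for k in (2, 3) for r in range(1, m + 1)
)
print(all_within, prob_admissible(2, 2, 1))  # True 3/4
```

Since this probability depends neither on σ nor on \(w\), linearity of expectation gives \(\mathbf{M}\mu_r(L) = C_n^r k^r\,P(\zeta_{(\sigma, w)}(L) = 1)\) exactly.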

The following lemma immediately follows from (2.1) and Lemma 4.

Lemma 8. If  \(m \leqslant n \leqslant {{k}^{{{{m}^{{{\beta }}}}}}},~\,\,{{\beta }} < 1\), then the following relations are valid:

$${\mathbf{M}}{{{{\mu }}}_{{{{r}_{1}}}}}\left( L \right) \leqslant C_{n}^{{{{r}_{1}}}}C_{m}^{{{{r}_{1}}}}{{k}^{{{{r}_{1}} - r_{1}^{2}}}},\quad n \to \infty ,$$
$$\mathop \sum \limits_{r \leqslant {{r}_{1}}} {\mathbf{M}}{{{{\mu }}}_{r}}\left( L \right) \precsim C_{n}^{{{{r}_{1}}}}C_{m}^{{{{r}_{1}}}}{{k}^{{{{r}_{1}} - r_{1}^{2}}}},\quad n \to \infty .$$

Lemma 9. If  \({{m}^{a}} \leqslant n,\,\,a > 1\), then

$$\mathbf{M}\mu_{r_1}(L) \succsim C_n^{r_1} C_m^{r_1} k^{r_1 - r_1^2},\quad n \to \infty,$$
$$\sum_{r \leqslant r_1} \mathbf{M}\mu_r(L) \succsim C_n^{r_1} C_m^{r_1} k^{r_1 - r_1^2},\quad n \to \infty.$$

Proof. We have

$$\mathop \sum \limits_{r \leqslant {{r}_{1}}} {\mathbf{M}}{{{{\mu }}}_{r}}\left( L \right) \geqslant {\mathbf{M}}{{{{\mu }}}_{{{{r}_{1}}}}}\left( L \right).~$$

Since \(m k^{-r_1} \to 0\) as \(n \to \infty\) (this is where the condition \(m^a \leqslant n\), \(a > 1\), is used), we have \((1 - k^{-r_1})^{m - r_1} \to 1\) as \(n \to \infty\). From this, using (2.2), we obtain

$$\mathbf{M}\mu_{r_1}(L) \succsim C_n^{r_1} C_m^{r_1} k^{r_1 - r_1^2},\quad n \to \infty.$$

Lemma 9 is proved.

Lemmas 8 and 9 immediately imply the following lemma.

Lemma 10. If  \({{m}^{a}} \leqslant n \leqslant {{k}^{{{{m}^{{{\beta }}}}}}},~\,\,a > 1,{{\;\beta }} < 1\), then

$$\mathop \sum \limits_{r \leqslant {{r}_{1}}} {\mathbf{M}}{{\mu }_{r}}\left( L \right) \approx {\mathbf{M}}{{\mu }_{{{{r}_{1}}}}}\left( L \right) \approx C_{n}^{{{{r}_{1}}}}C_{m}^{{{{r}_{1}}}}{{k}^{{{{r}_{1}} - r_{1}^{2}}}},\quad ~n \to \infty .~$$

We omit the proofs of Lemmas 11–13 below, since they are completely analogous to the proof of Lemma 10.

Lemma 11. If  \({{m}^{a}} \leqslant n,\,\,a > 1\), then

$$\mathop \sum \limits_{r \geqslant {{r}_{2}}} {\mathbf{M}}{{{{\mu }}}_{r}}\left( L \right) \approx {\mathbf{M}}{{{{\mu }}}_{{{{r}_{2}}}}}\left( L \right) \approx C_{n}^{{{{r}_{2}}}}C_{m}^{{{{r}_{2}}}}{{k}^{{{{r}_{2}} - r_{2}^{2}}}},~\quad n \to \infty $$

Lemma 12. If  \(n \leqslant m\), then

$$\mathop \sum \limits_{r \geqslant {{r}_{3}}} {\mathbf{M}}{{{{\mu }}}_{r}}\left( L \right) \approx {\mathbf{M}}{{{{\mu }}}_{{{{r}_{3}}}}}\left( L \right) \approx C_{n}^{{{{r}_{3}}}}C_{m}^{{{{r}_{3}}}}{{k}^{{{{r}_{3}} - r_{3}^{2}}}},~\quad n \to \infty .$$

Lemma 13. If  \({{m}^{a}} \leqslant n \leqslant {{k}^{{{{m}^{{{\beta }}}}}}},~\,\,a > 1,\,\,{{\beta }} < 1\), then

$${\mathbf{M}}{{\zeta }}\left( L \right) \approx {\mathbf{M}}{{{{\zeta }}}_{1}}\left( L \right) \approx \mathop \sum \limits_{r \in {{\phi }_{1}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}},\quad n \to \infty .$$

Lemma 14. If  \({{m}^{a}} \leqslant n \leqslant ~{{k}^{{{{m}^{{{\beta }}}}}}},~\,\,a > 1,\,\,\beta < 1\), then

$${\mathbf{D}}{{{{\zeta }}}_{1}}(L){\text{/}}{{({\mathbf{M}}{{{{\zeta }}}_{1}}(L))}^{2}} \to 0,\quad ~n \to \infty .$$

Proof. We have

$${\mathbf{D}}{{{{\zeta }}}_{1}}\left( L \right) = {\mathbf{M}}{{\left( {{{{{\zeta }}}_{1}}\left( L \right)} \right)}^{2}} - {{\left( {{\mathbf{M}}{{{{\zeta }}}_{1}}\left( L \right)} \right)}^{2}}.$$
(2.3)

It is easy to see that

$${\mathbf{M}}{{\left( {{{{{\zeta }}}_{1}}\left( L \right)} \right)}^{2}} \leqslant ~\mathop \sum \limits_{r,l \in {{\phi }_{1}}} \mathop \sum \limits_{\begin{array}{*{20}{c}} {{{{v}}_{1}} \in V_{r}^{m},{{{v}}_{2}}~ \in V_{l}^{m}} \\ {{{w}_{1}} \in W_{r}^{n},~{{w}_{2}} \in W_{l}^{n}~} \end{array}} \mathop \sum \limits_{\begin{array}{*{20}{c}} {{{\sigma '}} \in E_{k}^{r}} \\ {{{\sigma ''}} \in E_{k}^{l}} \end{array}} \left| N \right|{\text{/}}{{k}^{{mn}}},$$

where \(N = N_{(v_1, \sigma', w_1)} \cap N_{(v_2, \sigma'', w_2)}\). Hence, using Lemma 3 (together with the inequality \(ab \leqslant lb\), which holds since \(a \leqslant l\)) and Lemma 5, we obtain

$${\mathbf{M}}{{\left( {{{{{\zeta }}}_{1}}\left( L \right)} \right)}^{2}} \leqslant \mathop \sum \limits_{r,l \in {{\phi }_{1}}} \mathop \sum \limits_{b = 0}^{{\text{min}}(r,l)} {{k}^{{r + l}}}{{k}^{{ - {{r}^{2}} - {{l}^{2}} + lb}}}C_{n}^{r}C_{r}^{b}C_{{n - r}}^{{l - b}}C_{m}^{r}C_{m}^{l}$$
$$ \leqslant \mathop \sum \limits_{r,l \in {{\phi }_{1}}} C_{n}^{r}C_{n}^{l}C_{m}^{r}C_{m}^{l}{{k}^{{r + l}}}{{k}^{{ - {{r}^{2}} - {{l}^{2}}}}}(1 + {{\delta }}(n)),~$$
(2.4)

where \(\delta(n) \to 0\) as \(n \to \infty\).

On the other hand, by Lemma 13,

$${{\left( {{\mathbf{M}}{{{{\zeta }}}_{1}}\left( L \right)} \right)}^{2}} \approx \mathop \sum \limits_{r,l \in {{\phi }_{1}}} C_{n}^{r}C_{n}^{l}C_{m}^{r}C_{m}^{l}{{k}^{{r + l}}}{{k}^{{ - {{r}^{2}} - {{l}^{2}}}}},\quad n \to \infty . \; $$
(2.5)

Relations (2.3)–(2.5) imply the assertion of the lemma. Lemma 14 is proved.

Lemmas 15–17 below are proved similarly to Lemma 14.

Lemma 15. If  \({{m}^{a}} \leqslant n \leqslant {{k}^{{{{m}^{\beta }}}}},\,\,a > 1,\,\,{{\beta }} < 1\), then

$${\mathbf{D}}{{{{\mu }}}_{{{{r}_{1}}}}}\left( L \right){\text{/}}{{\left( {{\mathbf{M}}{{\mu }_{{{{r}_{1}}}}}\left( L \right)} \right)}^{2}} \to 0,\quad n~\,\, \to \infty .$$

Lemma 16. If  \({{m}^{a}} \leqslant n,\,\,a > 1\), then

$${\mathbf{D}}{{{{\mu }}}_{{{{r}_{2}}}}}(L){\text{/}}{{({\mathbf{M}}{{{{\mu }}}_{{{{r}_{2}}}}}(L))}^{2}} \to \; 0~,\quad n \to \infty .$$

Lemma 17. If  \(n \leqslant m\), then

$${\mathbf{D}}{{{{\mu }}}_{{{{r}_{3}}}}}(L){\text{/}}{{({\mathbf{M}}{{{{\mu }}}_{{{{r}_{3}}}}}(L))}^{2}} \; \to 0,\quad n \to \infty .$$

Let \(v \in V_r^m\), \(\sigma \in E_k^r\), \(w \in W_r^n\). On \(\mathfrak{M}_{mn}^k = \{L\}\) we consider the random variable \(\xi_{(v, \sigma, w)}(L)\), which is equal to 1 if \(L \in N_{(v, \sigma, w)}\) and to 0 otherwise. We put

$${{\xi }}\left( L \right) = \mathop \sum \limits_{r = 1}^{{\text{min}}\left( {m,n} \right)} \mathop \sum \limits_{{v} \in V_{r}^{m},~w \in W_{r}^{n}} \mathop \sum \limits_{{{\sigma }} \in E_{k}^{r}} ~{{{{\xi }}}_{{\left( {{v},{{\sigma }},w} \right)}}}\left( L \right),$$
$${{{{\xi }}}_{1}}\left( L \right) = \mathop \sum \limits_{r \in {{\phi }_{2}}} \mathop \sum \limits_{{v} \in V_{r}^{m},~w \in W_{r}^{n}} \mathop \sum \limits_{{{\sigma }} \in E_{k}^{r}} ~{{{{\xi }}}_{{\left( {{v},{{\sigma }},w} \right)}}}\left( L \right).~$$

Lemma 18. If \(n \leqslant m \leqslant k^{n^\beta}\), \(\beta < 1/2\), then, as \(n \to \infty\), the following relation holds for almost all matrices \(L\) from \(\mathfrak{M}_{mn}^k\):

$$\xi \left( L \right) \approx {{{{\xi }}}_{1}}\left( L \right) \approx \mathop \sum \limits_{r \in {{\phi }_{2}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}}.$$

Proof. We estimate the probability of the event \(\xi_{(v, \sigma, w)}(L) = 1\), \(v \in V_r^m\), \(\sigma \in E_k^r\), \(w \in W_r^n\), denoted below by \(P(\xi_{(v, \sigma, w)}(L) = 1)\). By Lemma 1,

$$P({{{{\xi }}}_{{\left( {{v},{{\sigma }},w} \right)}}}\left( L \right) = 1) \; = {\text{|}}{{N}_{{\left( {{v},{{\sigma }},w} \right)}}}{\text{|}}/{\text{|}}\mathfrak{M}_{{mn}}^{k}{\text{|}} = {{k}^{{ - {{r}^{2}}}}}.$$

Therefore, according to Lemma 4,

$${\mathbf{M}}~{{\xi }}\left( L \right) \approx {\mathbf{M}}{{{{\xi }}}_{1}}\left( L \right) \approx \mathop \sum \limits_{r \in {{\phi }_{2}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}},\quad n \to \infty .$$
(2.6)

From (2.6) and Lemma 6, using the scheme of the proof of Lemma 14, we obtain

$$\mathbf{D}\xi_1(L)/(\mathbf{M}\xi_1(L))^2 \to 0,\quad n \to \infty.$$
(2.7)

From (2.6), (2.7), and Lemma 7, the assertion of the lemma to be proved follows. Lemma 18 is proved.

The assertions of Theorem 1 follow directly from Lemmas 7, 10, 11, 13, and 14–16, while the assertions of Theorem 2 follow directly from Lemmas 7, 12, 17, 18, and the inequality \({{\zeta }}\left( L \right) \leqslant {{\xi }}\left( L \right)\).

3. ESTIMATES OF THE TYPICAL NUMBER OF MINIMAL INFREQUENT EFS AND OF THE TYPICAL LENGTH OF A MINIMAL INFREQUENT EF

We put \(L \in \mathfrak{M}_{{mn}}^{k},L = ({{a}_{{ij}}}),i = 1,~ \ldots ,~m, \; j = 1,~ \ldots ,~n\); \({{\sigma }} \in E_{k}^{r},{{\sigma }} = \left( {{{{{\sigma }}}_{1}},~ \ldots ,{{{{\sigma }}}_{r}}} \right);w \in W_{r}^{n},w = \left\{ {{{j}_{1}},~ \ldots ,~{{j}_{r}}} \right\}\).

The set \(w\) is called a σ-covering of length \(r\) for the matrix \(L\) if for every \(i \in \{1, 2, \ldots, m\}\) there exists \(t \in \{1, \ldots, r\}\) such that \(a_{i j_t} \ne \sigma_t\). We say that the σ-covering \(w\) is generated by the tuple \(\sigma\).

A σ-covering w for matrix L is called irredundant if for any \(~t \in \left\{ {1,~2, \ldots , \; r} \right\}\) the set \(w{{\backslash }}\{ {{j}_{t}}\} \) is not a γt-covering for matrix L, where \({{{{\gamma }}}_{t}} = \left( {{{{{\sigma }}}_{1}},~ \ldots ,~{{{{\sigma }}}_{{t - 1}}},~{{{{\sigma }}}_{{t + 1}}},~ \ldots ,~{{{{\sigma }}}_{r}}~} \right)\). If w is an irredundant σ-covering for matrix L, it is easy to see that the columns of matrix L with numbers from w contain a submatrix that, up to row permutation, has the form

$$\left( {\begin{array}{cccccc} {{\beta }_{1}} & {{\sigma }_{2}} & {{\sigma }_{3}} & \cdots & {{\sigma }_{{r - 1}}} & {{\sigma }_{r}} \\ {{\sigma }_{1}} & {{\beta }_{2}} & {{\sigma }_{3}} & \cdots & {{\sigma }_{{r - 1}}} & {{\sigma }_{r}} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ {{\sigma }_{1}} & {{\sigma }_{2}} & {{\sigma }_{3}} & \cdots & {{\sigma }_{{r - 1}}} & {{\beta }_{r}} \end{array}} \right),$$

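where \({{{{\beta }}}_{p}} \ne {{\sigma }_{p}}\) for \(p = 1,~2, \ldots ,~r\). Indeed, for each t some row of L coincides with γt on \(w{{\backslash }}\{ {{j}_{t}}\} \); since w is a σ-covering, this row differs from σ exactly in column \({{j}_{t}}\), and these r rows form the submatrix. Such a submatrix is called a σ-submatrix.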
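Irredundancy and the σ-submatrix can also be tested by brute force. The sketch below (hypothetical helper names; same conventions as a plain list-of-rows matrix with 0-based column indices) checks the definition literally by deleting each \(j_t\), and separately searches, for every t, for a row that differs from σ on w only in column \(j_t\). For a σ-covering such rows exist exactly when w is irredundant, and they form the σ-submatrix.

```python
def is_sigma_covering(L, sigma, w):
    # every row differs from sigma somewhere on the columns w
    return all(any(row[j] != s for j, s in zip(w, sigma)) for row in L)


def is_irredundant(L, sigma, w):
    """Literal check: w covers L, and no w minus j_t is a gamma_t-covering."""
    if not is_sigma_covering(L, sigma, w):
        return False
    for t in range(len(w)):
        w_t = w[:t] + w[t + 1:]              # w \ {j_t}
        gamma_t = sigma[:t] + sigma[t + 1:]  # sigma with sigma_t removed
        if is_sigma_covering(L, gamma_t, w_t):
            return False                     # column j_t is redundant
    return True


def sigma_submatrix_rows(L, sigma, w):
    """Row indices i_1, ..., i_r where row i_t differs from sigma on w only at position t.

    Returns None if some position t has no such row.
    """
    rows = []
    for t in range(len(w)):
        for i, row in enumerate(L):
            mismatches = [s for s, (j, v) in enumerate(zip(w, sigma)) if row[j] != v]
            if mismatches == [t]:
                rows.append(i)
                break
        else:  # inner loop found no witness row for position t
            return None
    return rows


L = [[0, 1], [1, 0]]
print(is_irredundant(L, (0, 0), (0, 1)), sigma_submatrix_rows(L, (0, 0), (0, 1)))  # True [1, 0]
```

For example, with rows (0, 1) and (1, 1) the set w = {0, 1} is still a (0, 0)-covering, but column 0 is redundant, and no witness row exists for it.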

Note that when the descriptions of the objects from the sample D are taken as the rows of matrix L, the set \(w \in W_{r}^{n}, \; w = \left\{ {{{j}_{1}},~ \ldots ,~{{j}_{r}}} \right\}\), is an irredundant σ-covering for matrix L if and only if the EF (σ, H), \(H = \left\{ {{{x}_{{{{j}_{1}}}}}, \ldots ,~{{x}_{{{{j}_{r}}}}}} \right\}\), is minimal infrequent in D.
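From the data-analysis side, the same objects can be enumerated as minimal infrequent EFs. A brute-force sketch, assuming (our reading of the frequency definitions given earlier in the paper) that an EF is frequent in D when at least p objects contain it and infrequent otherwise; with p = 1 an EF is infrequent exactly when its column set is a σ-covering. Minimality is checked only against EFs obtained by deleting one attribute, which suffices because every sub-EF of a frequent EF is frequent. Attribute values are assumed to be encoded as 0, …, k − 1, and the function names are hypothetical.

```python
from itertools import combinations, product


def occurrences(D, H, sigma):
    """Number of objects in D containing the EF (sigma, H); H holds 0-based attribute indices."""
    return sum(all(obj[j] == s for j, s in zip(H, sigma)) for obj in D)


def minimal_infrequent_efs(D, k, p=1):
    """All (H, sigma) occurring in fewer than p objects whose one-attribute deletions occur in >= p.

    Assumes values are encoded as 0, ..., k - 1 and 1 <= p <= len(D).
    Exhaustive search: only suitable for very small n and k.
    """
    n = len(D[0])
    found = []
    for r in range(1, n + 1):
        for H in combinations(range(n), r):
            for sigma in product(range(k), repeat=r):
                if occurrences(D, H, sigma) >= p:
                    continue  # frequent EF, so not infrequent
                if all(
                    occurrences(D, H[:t] + H[t + 1:], sigma[:t] + sigma[t + 1:]) >= p
                    for t in range(r)
                ):
                    found.append((H, sigma))
    return found


D = [[0, 1], [1, 0]]
print(minimal_infrequent_efs(D, k=2))  # [((0, 1), (0, 0)), ((0, 1), (1, 1))]
```

For p = 1 the output lists exactly the pairs (w, σ) for which w is an irredundant σ-covering of the matrix whose rows are the objects of D.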

We introduce the following notation: \(B(L,~\sigma ),L \in \mathfrak{M}_{{mn}}^{k},{{\sigma }} \in E_{k}^{r}\), is the set of all irredundant σ-coverings for matrix \(L;\,\,S(L,~\sigma ),\,\,L \in \mathfrak{M}_{{mn}}^{k},\,\,{{\sigma }} \in E_{k}^{r}\), is the set of all σ-submatrices of matrix L; Br(L, σ), \(L \in \mathfrak{M}_{{mn}}^{k},{{\sigma }} \in E_{k}^{r}\), is the set of all sets in B(L, σ) of length \(r;{{S}_{r}}\left( {L,~\sigma } \right),{{\;}}L \in \mathfrak{M}_{{mn}}^{k},{{\;\sigma }} \in E_{k}^{r}\), is the set of all submatrices in \(S\left( {L,~\sigma } \right)\) of order \(r; \; B\left( L \right), \; L \in \mathfrak{M}_{{mn}}^{k}\), is the set of all irredundant σ-coverings for matrix L over all σ, in which each covering occurs as many times as there are sets σ that generate it; Br(L) is the set of all sets in B(L) of length r; \(S\left( L \right), \; L \in \mathfrak{M}_{{mn}}^{k}\), is the set of all σ-submatrices of matrix L for all σ from \(E_{k}^{r}\), \(r = 1, \ldots ,n\);

$$\left| {B\left( L \right)} \right| = \mathop \sum \limits_{r = 1}^n \mathop \sum \limits_{\sigma \in E_{k}^{r}} \left| {{{B}_{r}}\left( {L,{{\sigma }}} \right)} \right|;$$
$$\left| {S\left( L \right)} \right| = \mathop \sum \limits_{r = 1}^n \mathop \sum \limits_{\sigma \in E_{k}^{r}} \left| {{{S}_{r}}\left( {L,~{{\sigma }}} \right)} \right|;$$

\({{r}_{3}} = \left] {{{\log }_{k}}m + {{\log }_{k}}{{\log }_{k}}m} \right[\); \({{\phi }_{2}}\) is the interval \([1,{{r}_{3}}]\); \({{r}_{4}} = \left[ {0.5{{\log }_{k}}(mn) - 0.5{{\log }_{k}}{{\log }_{k}}(mn) - {{\log }_{k}}{{\log }_{k}}{{\log }_{k}}n} \right]\); \({{r}_{5}} = \left] {0.5{{\log }_{k}}(mn) - 0.5{{\log }_{k}}{{\log }_{k}}(mn) + {{\log }_{k}}{{\log }_{k}}{{\log }_{k}}n} \right[\); \({{\phi }_{3}}\) is the interval \([{{r}_{4}},{{r}_{5}}]\); \({{r}_{6}} = \left] {{{\log }_{k}}m + {{\log }_{k}}{{\log }_{k}}m + {{\log }_{k}}{{\log }_{k}}{{\log }_{k}}n} \right[\); \({{\phi }_{4}}\) is the interval \([1,{{r}_{6}}]\). Here [x] denotes the integer part of x, and ]x[ denotes the least integer not smaller than x.
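For concrete parameters these bounds are easy to evaluate. The sketch below reads ]x[ as the ceiling, [x] as the floor, and \(\log_k mn\) as \(\log_k(mn)\); the function name and the guard on the arguments (m, n > k, so that every iterated logarithm is defined) are our own additions.

```python
import math


def length_bounds(m: int, n: int, k: int) -> dict:
    """r3, r4, r5, r6 and the intervals phi2, phi3, phi4 for given m, n, k.

    Assumes ]x[ = ceiling, [x] = floor, and log_k mn = log_k(m * n).
    """
    if k < 2 or m <= k or n <= k:
        raise ValueError("need k >= 2 and m, n > k so that all iterated logs exist")

    def lg(x: float) -> float:
        return math.log(x, k)

    core = 0.5 * lg(m * n) - 0.5 * lg(lg(m * n))
    c = lg(lg(lg(n)))  # log_k log_k log_k n; negative when n < k^k
    r3 = math.ceil(lg(m) + lg(lg(m)))
    r4 = math.floor(core - c)
    r5 = math.ceil(core + c)
    r6 = math.ceil(lg(m) + lg(lg(m)) + c)
    return {"r3": r3, "r4": r4, "r5": r5, "r6": r6,
            "phi2": (1, r3), "phi3": (r4, r5), "phi4": (1, r6)}


print(length_bounds(100, 10**6, 2))
# {'r3': 10, 'r4': 8, 'r5': 14, 'r6': 12, 'phi2': (1, 10), 'phi3': (8, 14), 'phi4': (1, 12)}
```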

Theorem 3 [3]. If \({{m}^{a}} \leqslant n \leqslant {{k}^{m}},~\,\,a > 1, \; k \geqslant 2\), then the following relations hold as n → ∞ for almost all matrices L from \(\mathfrak{M}_{{mn}}^{k}\):

$$\mathop \sum \limits_{r \leqslant {{r}_{4}}} \left| {{{B}_{r}}\left( L \right)} \right| \approx \left| {{{B}_{{{{r}_{4}}}}}\left( L \right)} \right| \approx C_{n}^{{{{r}_{4}}}}C_{m}^{{{{r}_{4}}}}{{r}_{4}}!{{\left( {k - 1} \right)}^{{{{r}_{4}}}}}{{k}^{{{{r}_{4}} - r_{4}^{2}}}},$$
$$\mathop \sum \limits_{r \geqslant {{r}_{5}}} \left| {{{B}_{r}}\left( L \right)} \right| \approx \left| {{{B}_{{{{r}_{5}}}}}\left( L \right)} \right| \approx C_{n}^{{{{r}_{5}}}}C_{m}^{{{{r}_{5}}}}{{r}_{5}}!{{\left( {k - 1} \right)}^{{{{r}_{5}}}}}{{k}^{{{{r}_{5}} - r_{5}^{2}}}},$$
$$\left| {B\left( L \right)} \right| \approx \left| {S\left( L \right)} \right| \approx \mathop \sum \limits_{r \in {{\phi }_{3}}} C_{n}^{r}C_{m}^{r}r!{{\left( {k - 1} \right)}^{r}}{{k}^{{r - {{r}^{2}}}}},$$

and the lengths of almost all sets from B(L) belong to the interval ϕ3.
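The summand \(C_n^r C_m^r r!(k-1)^r k^{r-r^2}\) in Theorem 3 admits a direct counting reading, offered here as an interpretation rather than as the authors' derivation: it is the expected number of σ-submatrices of order r in a matrix whose entries are independent and uniform on k values (choose r columns and r rows, r! placements of the β's, \(k^r\) tuples σ; each row matches σ in r − 1 positions and differs in the remaining one with probability \((k-1)k^{-r}\)). The sketch evaluates the summand exactly and finds the rank at which it peaks; both helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def summand(m: int, n: int, k: int, r: int) -> Fraction:
    """C_n^r * C_m^r * r! * (k - 1)^r * k^(r - r^2), computed exactly."""
    return (comb(n, r) * comb(m, r) * factorial(r) * (k - 1) ** r
            * Fraction(k) ** (r - r * r))


def peak_rank(m: int, n: int, k: int, r_max: int) -> int:
    """Rank r in 1..r_max at which the summand is largest."""
    return max(range(1, r_max + 1), key=lambda r: summand(m, n, k, r))


print(summand(5, 4, 2, 2), peak_rank(5, 4, 2, 4))  # 30 2
```

Comparing `peak_rank` with the interval φ3 for moderate m and n is only indicative, since Theorem 3 is an asymptotic statement.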

Theorem 4. If \(n \leqslant m \leqslant {{k}^{{{{n}^{{{\beta }}}}}}},\,\,{{\beta }} < 1{\text{/}}2,~\,\,k \geqslant 2\), then the following relations hold as n → ∞ for almost all matrices L from \(\mathfrak{M}_{{mn}}^{k}\):

$$\mathop \sum \limits_{r \geqslant {{r}_{6}}} \left| {{{B}_{r}}\left( L \right)} \right| \approx \left| {{{B}_{{{{r}_{6}}}}}\left( L \right)} \right| \approx C_{n}^{{{{r}_{6}}}}C_{m}^{{{{r}_{6}}}}{{r}_{6}}!{{\left( {k - 1} \right)}^{{{{r}_{6}}}}}{{k}^{{{{r}_{6}} - r_{6}^{2}}}},$$
$$\left| {B\left( L \right)} \right| \leqslant \left| {S\left( L \right)} \right| \approx \mathop \sum \limits_{r \in {{\phi }_{2}}} C_{n}^{r}C_{m}^{r}r!{{\left( {k - 1} \right)}^{r}}{{k}^{{r - {{r}^{2}}}}},$$

and the lengths of almost all sets from B(L) belong to the interval ϕ4.
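The inequality \(|B(L)| \leqslant |S(L)|\) in Theorem 4 holds for every matrix, since each irredundant σ-covering carries at least one σ-submatrix on its columns. The brute-force sketch below counts both quantities for a small matrix directly from the definitions, counting each covering once per generating σ. It relies on the observation that, for fixed w and σ, a σ-submatrix amounts to a choice of one row per position t among the rows that differ from σ on w only at t. Values are assumed to be encoded as 0, …, k − 1; the function name is hypothetical, and the search is exponential in n.

```python
from itertools import combinations, product


def count_B_and_S(L, k):
    """Return (|B(L)|, |S(L)|) by exhaustive search over all w and sigma."""
    n = len(L[0])
    b_total = s_total = 0
    for r in range(1, n + 1):
        for w in combinations(range(n), r):
            for sigma in product(range(k), repeat=r):
                witnesses = [0] * r  # rows differing from sigma on w only at position t
                covering = True
                for row in L:
                    mism = [t for t, (j, v) in enumerate(zip(w, sigma)) if row[j] != v]
                    if not mism:
                        covering = False  # this row coincides with sigma on w
                    elif len(mism) == 1:
                        witnesses[mism[0]] += 1
                n_sub = 1
                for c in witnesses:
                    n_sub *= c  # one witness row chosen for every position t
                s_total += n_sub
                if covering and all(witnesses):
                    b_total += 1  # irredundant sigma-covering
    return b_total, s_total


print(count_B_and_S([[0, 1], [1, 0]], k=2))  # (2, 6)
```

In the example, the two irredundant coverings are generated by σ = (0, 0) and σ = (1, 1), while S(L) additionally contains the four σ-submatrices of order 1.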

The proof of Theorem 4 follows the same scheme as the proof of Theorem 2.

Thus, in each of the two cases considered, the typical lengths of sets from U(L) and from B(L) belong to the same interval. The results of Theorems 1 and 3 and of Theorems 2 and 4 are illustrated in Figs. 1 and 2, respectively.

Fig. 1.

Typical values for the lengths of sets from U(L) (see Section 1) and B(L) when \({{m}^{a}} \leqslant n \leqslant {{k}^{{{{m}^{{{\beta }}}}}}},\,\,a > 1,\,\,{{\beta }} < 1\).

Fig. 2.

Typical values for the lengths of sets from U(L) (see Section 1) and B(L) when \(n \leqslant m \leqslant {{k}^{{{{n}^{{{\beta }}}}}}},\,\,{{\beta }} < 1{\text{/}}2\).

5 CONCLUSIONS

We have considered topical issues in the logical analysis of integer data concerning the metric (quantitative) properties of the sets of frequent and infrequent elements of such data. The technique for obtaining estimates of the typical values of the main numerical characteristics of these sets has been improved, and new estimates for these characteristics have been obtained. A theoretical justification is given for the expediency (in terms of reducing time costs) of using methods that search for frequent elements at the stage of training classifiers based on logical analysis of the training sample.

The results obtained in this paper are also important for a number of other applied areas, notably the search for association rules in data. In this setting, D is called a database, and each object of D is called a transaction. An association rule establishes a relationship between two frequent EFs, according to which one frequent EF (the premise) entails, with some “certainty”, another frequent EF (the consequence); the premise and the consequence are generated by one common frequent EF. The problem of synthesizing association rules arose in connection with market basket analysis [11].
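The text leaves the measure of “certainty” open; a common choice in association-rule mining is the confidence of a rule, that is, the share of transactions containing the premise that also contain the consequence. A minimal sketch under that assumption, with an EF represented as a dict from 0-based attribute index to value (a representation chosen only for this illustration; the function names are hypothetical):

```python
from fractions import Fraction


def support(D, ef):
    """Number of transactions in D that contain the EF (dict: attribute index -> value)."""
    return sum(all(t[j] == v for j, v in ef.items()) for t in D)


def confidence(D, premise, consequence):
    """support(common EF) / support(premise), as an exact fraction."""
    shared = premise.keys() & consequence.keys()
    if any(premise[j] != consequence[j] for j in shared):
        raise ValueError("premise and consequence disagree on a shared attribute")
    base = support(D, premise)
    if base == 0:
        raise ValueError("premise occurs in no transaction")
    common = {**premise, **consequence}  # the common EF generating the rule
    return Fraction(support(D, common), base)


D = [[1, 1, 0], [1, 1, 1], [1, 0, 1], [0, 1, 1]]
print(confidence(D, {0: 1}, {1: 1}))  # 2/3
```

Here the premise \(x_1 = 1\) occurs in three transactions, two of which also contain \(x_2 = 1\).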