Number of Solutions for Some Special Logical Analysis Problems of Integer Data

Djukova, A. P.; Djukova, E. V.

doi:10.1134/S1064230723050052


COMPUTER METHODS
Published: 11 November 2023

Volume 62, pages 817–826, (2023)

Journal of Computer and Systems Sciences International


A. P. Djukova¹ &
E. V. Djukova¹


Abstract

In the class of discrete enumeration problems, an important place belongs to the problems of searching for frequently and infrequently occurring elements in integer data. Questions on the effectiveness of such a search are directly related to the study of the metric (quantitative) properties of sets of frequent and infrequent elements. It is assumed that the initial data are presented in the form of an integer matrix whose rows are descriptions of the studied objects in a given system of numerical characteristics of these objects, called attributes. The case is considered when each attribute takes values from the set $\{0, 1, \ldots, k - 1\}$, $k \geqslant 2$. Asymptotic estimates for the typical number of special, frequent fragments of object descriptions, called correct fragments, and estimates for the typical length of such a fragment are given. We also present new results concerning the study of the metric properties of the minimal infrequent fragments of descriptions of objects.


INTRODUCTION

The considered problems of the analysis of integer data arise at the stage of training logical classification procedures by precedents. The metric (quantitative) properties of the sets of solutions to these problems need to be studied in order to obtain theoretical estimates of the complexity of the synthesis of logical classifiers and forecast the time costs.

We introduce the basic concepts. A set M of objects is studied. It is known that each object in M can be represented as a numerical vector obtained from the observation or measurement of a number of its characteristics. Such characteristics are called attributes. It is assumed that each attribute has a limited set of admissible values, which are encoded as integers.

Let $X = \{x_1, \ldots, x_n\}$ be the given set of attributes; let $H$ be a set of $r$ attributes of the form $H = \{x_{j_1}, \ldots, x_{j_r}\}$, $j_1 < \ldots < j_r$; and let $\sigma = (\sigma_1, \ldots, \sigma_r)$ be a tuple in which $\sigma_i$ is an admissible value of the attribute $x_{j_i}$, $i = \overline{1, r}$. The pair (σ, H) is called an elementary fragment (EF) of rank r. The set of all EFs generated by the set of attributes X is denoted by W(X).

Let $S = (a_1, \ldots, a_n)$ be an object from M (here $a_j$, $j \in \{1, 2, \ldots, n\}$, is the value of the attribute $x_j$ for the object S). We say that S contains the EF $(\sigma, H)$, $H = \{x_{j_1}, \ldots, x_{j_r}\}$, $\sigma = (\sigma_1, \ldots, \sigma_r)$, if $a_{j_i} = \sigma_i$ for $i = \overline{1, r}$.

Let a set D of objects from M and a number p, $1 \leqslant p \leqslant |D|$, be given, where |D| is the number of objects in D. The objects in D are not necessarily distinct.

An EF $(\sigma, H) \in W(X)$ is called (p, D)-frequent if at least p objects from D contain (σ, H). An EF $(\sigma, H) \in W(X)$ of rank $r$, $r \leqslant |D|$, is called correct in D if (σ, H) is $(r, D)$-frequent. An EF $(\sigma, H) \in W(X)$ is called infrequent in D if no object from D contains (σ, H), and it is called minimal infrequent in D if, whenever $\sigma' \subset \sigma$ and $H' \subset H$, the EF $(\sigma', H')$ is not infrequent in D.
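The definitions above can be illustrated by a minimal Python sketch. It assumes that objects are tuples of integers, that attributes are indexed from 0, and that a fragment is passed as a pair of tuples `sigma` and `H`; all function names are hypothetical and are not taken from the paper. Minimality is tested by removing one attribute at a time, which suffices because any fragment that extends an infrequent fragment is itself infrequent.

```python
def contains(obj, sigma, H):
    """True if the object (a tuple of attribute values) contains the EF (sigma, H)."""
    return all(obj[j] == s for j, s in zip(H, sigma))

def support(D, sigma, H):
    """Number of objects in D that contain the EF (sigma, H)."""
    return sum(contains(obj, sigma, H) for obj in D)

def is_frequent(D, sigma, H, p):
    """(p, D)-frequent: at least p objects of D contain the fragment."""
    return support(D, sigma, H) >= p

def is_correct(D, sigma, H):
    """Correct in D: the rank r does not exceed |D| and the fragment is (r, D)-frequent."""
    r = len(H)
    return r <= len(D) and support(D, sigma, H) >= r

def is_infrequent(D, sigma, H):
    """Infrequent in D: no object of D contains the fragment."""
    return support(D, sigma, H) == 0

def is_minimal_infrequent(D, sigma, H):
    """Infrequent, and every fragment obtained by dropping one attribute occurs in D."""
    if not is_infrequent(D, sigma, H):
        return False
    for t in range(len(H)):
        if is_infrequent(D, sigma[:t] + sigma[t + 1:], H[:t] + H[t + 1:]):
            return False
    return True

# Illustrative sample of three objects over three binary attributes.
D = [(0, 1, 1), (0, 1, 0), (1, 1, 0)]
print(is_correct(D, (0, 1), (0, 1)))
print(is_minimal_infrequent(D, (1, 1), (0, 2)))
```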

The logical classification of integer data assumes the presence of several nonoverlapping samples $D_1, \ldots, D_l$, $l \geqslant 2$, of objects from M, each of which represents a certain class of objects. The objects contained in these samples are called precedents, and the attributes from X are called features. At the training stage, in each sample $D_i$, $i \in \{1, 2, \ldots, l\}$, we search for those frequent EFs that are infrequent in $D_j$ for any $j \ne i$. The found EFs make it possible to distinguish precedents from different classes and are called logical patterns or representative elementary classifiers [1–6].

Some additional conditions may be imposed on the type of the desired EF (depending on the classifier model under consideration). For example, the so-called irredundant representative elementary classifiers are sought. An elementary classifier (σ, H) is called an irredundant representative for $D_i$, $i \in \{1, 2, \ldots, l\}$, if two conditions are met: (1) (σ, H) is a $(1, D_i)$-frequent EF; and (2) (σ, H) is minimal infrequent in $D_j$ for any $j \ne i$. In this case, when searching for minimal infrequent EFs, we need to consider the intractable discrete problem of constructing the irredundant covers of an integer matrix [3] whose rows are descriptions of precedents that do not belong to $D_i$.

In [6], a model of a logical classifier is proposed that is based on an initial search in each sample $D_i$, $i \in \{1, 2, \ldots, l\}$, for the correct EFs and the subsequent selection among them of those that are not contained in the descriptions of precedents from other classes. This model demonstrates a significant advantage in computation speed over the classical model based on the construction of irredundant representative elementary classifiers, while not being inferior to it in classification quality.

It is of interest to obtain asymptotic estimates (for n → ∞) of the typical number of correct EFs and estimates of the typical length of a correct EF. In [7], the required estimates are obtained for the case when the number of objects in D is significantly less than the number of attributes and each attribute takes values from the set $\{0, 1, \ldots, k - 1\}$, $k \geqslant 2$.

The new results obtained in this paper mainly concern research on the metric properties of the set of correct EFs in the case $n \leqslant \left| D \right|$. It should be noted that similar properties of the set of minimal infrequent EFs were previously studied in a number of publications (for example, [3, 8, 9]), in which, among other things, the case $n \leqslant \left| D \right|$ is considered. The estimates of the number of minimal infrequent EFs given in the article have a form that allows us to compare them with the corresponding estimates of the correct EFs. The result of the comparison indicates the expediency (in terms of reducing the time costs) of using methods for searching for frequent EFs for the synthesis of logical classifiers and agrees with the experimental results obtained in [6] on random model data.

In Section 1 the problem statement is given. The initial data are presented as an integer matrix whose rows are descriptions of the objects from D. Statements of the two main theorems on the number of correct EFs are given. The proofs of these theorems are contained in Section 2. The previously obtained and new estimates of the typical values for the number of minimal infrequent EFs and the length of a minimal infrequent EF are shown in Section 3.

1. STATEMENT OF THE PROBLEM AND FORMULATION OF THE MAIN RESULTS

Let $L = (a_{ij})$, $i = \overline{1, m}$, $j = \overline{1, n}$, be a matrix with elements from $\{0, 1, \ldots, k - 1\}$, $k \geqslant 2$; let $E_k^r$, $r \leqslant n$, be the set of tuples $(\sigma_1, \ldots, \sigma_r)$ with $\sigma_i \in \{0, 1, \ldots, k - 1\}$, $i = \overline{1, r}$; let $W_r^n$, $r \leqslant n$, be the set of all sets of the form $\{j_1, \ldots, j_r\}$, where $j_t \in \{1, 2, \ldots, n\}$ for $t = \overline{1, r}$ and $j_1 < \ldots < j_r$; and let $V_r^m$, $r \leqslant m$, be the set of all ordered sets of the form $(i_1, \ldots, i_r)$, where $i_t \in \{1, 2, \ldots, m\}$ and $i_t \ne i_l$ for $t \ne l$, $t, l = \overline{1, r}$.

Let $\sigma \in E_k^r$, $\sigma = (\sigma_1, \ldots, \sigma_r)$, and $w \in W_r^n$, $w = \{j_1, \ldots, j_r\}$. The number $r$ is called the length of the set w.

The set w is called σ-admissible for L if there is a set $v = (i_1, \ldots, i_r)$, $v \in V_r^m$, such that $a_{i_t j_s} = \sigma_s$ for $t, s = \overline{1, r}$, that is, each of the rows $i_1, \ldots, i_r$ of L coincides with σ in the columns with numbers from w. We say that the σ-admissible set w is generated by the tuple σ.

It is easy to see that in the case when the matrix L takes descriptions of objects from the sample D as its rows, the set $w \in W_{r}^{n}, \; w = \left\{ {{{j}_{1}},~ \ldots ,~{{j}_{r}}} \right\}$, is σ-admissible for L if and only if the EF $\left( {{{\sigma }},~H} \right), \; H = \left\{ {{{x}_{{{{j}_{1}}}}}, \ldots ,~{{x}_{{{{j}_{r}}}}}} \right\}$ is correct in D.
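The equivalence above suggests a direct way to count admissible sets for a small matrix. The following sketch is an illustration under the assumption that σ-admissibility is read as "at least r rows of L agree with σ on the columns of w" (the reading given by the equivalence with correct EFs); the function name `count_admissible` is hypothetical, and columns are indexed from 0.

```python
from itertools import combinations

def count_admissible(L, r):
    """Count pairs (sigma, w) with |w| = r such that w is sigma-admissible for L.

    For each column set w, the rows of L are grouped by their values on w;
    a value pattern sigma generates w if it occurs in at least r rows.
    """
    n = len(L[0])
    total = 0
    for w in combinations(range(n), r):
        counts = {}
        for row in L:
            key = tuple(row[j] for j in w)
            counts[key] = counts.get(key, 0) + 1
        total += sum(1 for c in counts.values() if c >= r)
    return total

# Illustrative 3 x 3 binary matrix.
L = [[0, 1, 1],
     [0, 1, 0],
     [1, 1, 0]]
print([count_admissible(L, r) for r in (1, 2, 3)])
```

Summing the counts over all r gives the size of the collection in which every admissible set is counted once for each tuple that generates it.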

Let us introduce the following notation: $\mathfrak{M}_{mn}^{k}$ is the set of all matrices of size m × n with elements from $\{0, 1, \ldots, k - 1\}$, $k \geqslant 2$; $U(L, \sigma)$, $L \in \mathfrak{M}_{mn}^{k}$, $\sigma \in E_k^r$, is the set of all σ-admissible sets for the matrix L; $U_r(L, \sigma)$ is the set of all sets in $U(L, \sigma)$ of length r; $U(L)$, $L \in \mathfrak{M}_{mn}^{k}$, is the collection of all admissible sets for the matrix L in which each set occurs as many times as there are tuples σ that generate it; |N| is the cardinality of the set N;

$$\left| {{{U}_{r}}\left( L \right)} \right| = \mathop \sum \limits_{\sigma \in E_{k}^{r}} \left| {{{U}_{r}}\left( {L,~\sigma } \right)} \right|;$$

$$\left| {U\left( L \right)} \right| = \mathop \sum \limits_{r = 1}^n \mathop \sum \limits_{\sigma \in E_{k}^{r}} \left| {{{U}_{r}}\left( {L,~\sigma } \right)} \right|;$$

$r_1 = [0.5\log_k mn - 0.5\log_k \log_k^2 mn - \log_k \log_k \log_k n]$; here and below, [q] is the integer part of the number q; $r_2 = \,]0.5\log_k mn - 0.5\log_k \log_k^2 mn + \log_k \log_k \log_k n[$; here and below, ]q[ is the smallest integer greater than q; $\phi_1$ is the interval $[r_1, r_2]$; $r_3 = \,]\log_k m + \log_k \log_k m[$; $\phi_2$ is the interval $[1, r_3]$; $b_n \approx c_n$, n → ∞, means that $\lim_{n \to \infty} b_n / c_n = 1$; and $b_n \preccurlyeq c_n$, n → ∞, means that $\lim_{n \to \infty} b_n / c_n \leqslant 1$.
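For concrete sizes, the boundaries defined above can be evaluated numerically. The sketch below assumes that $\log_k^2 mn$ denotes $(\log_k mn)^2$, which agrees with the identity $k^{2q} = mn/\log_k^2 mn$ used later in the proof of Lemma 4; the helper names `log_k` and `r_values` are hypothetical.

```python
import math

def log_k(x, k):
    """Logarithm of x to base k."""
    return math.log(x) / math.log(k)

def r_values(m, n, k):
    """Return (r1, r2, r3) for an m x n matrix over {0, ..., k-1}."""
    lmn = log_k(m * n, k)
    q = 0.5 * lmn - 0.5 * log_k(lmn ** 2, k)
    t = log_k(log_k(log_k(n, k), k), k)
    r1 = math.floor(q - t)                  # integer part [q - t]
    r2 = math.floor(q + t) + 1              # smallest integer greater than q + t
    r3 = math.floor(log_k(m, k) + log_k(log_k(m, k), k)) + 1
    return r1, r2, r3

# Illustrative sizes: 64 objects, 2**20 binary attributes.
print(r_values(64, 2 ** 20, 2))
```

The triple logarithm is only defined when $\log_k \log_k n > 0$, so the sketch is meaningful only for sufficiently large n.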

Below we present asymptotic estimates for the typical value of |U(L)| and an estimate of the typical length of an admissible set for $L$ for different values of $m$ and $n$.

The identification of the typical situation is connected with statements of the type “for almost all matrices L from $\mathfrak{M}_{mn}^{k}$, $F_1(L) \approx F_2(L)$ as n → ∞” (here $F_1(L)$ and $F_2(L)$ are two functionals defined on matrices from $\mathfrak{M}_{mn}^{k}$). Such a statement means that there are two positive functions α(n) and β(n), both tending to zero as n → ∞, such that for all sufficiently large n

$$1 - \left| \mathfrak{M} \right|{\text{/|}}\mathfrak{M}_{{mn}}^{k}{\text{|}} \leqslant {{\alpha }}(n)$$

where $\mathfrak{M}$ is the set of such matrices L in $\mathfrak{M}_{{mn}}^{k}$ for which

$$1 - ~{{\beta }}\left( n \right) < \left| {{{F}_{1}}\left( L \right)} \right|/\left| {{{F}_{2}}\left( L \right)} \right| < 1 + {{\beta }}\left( n \right)$$

is fulfilled.

Theorems 1 and 2 below are valid.

Theorem 1. If $m^a \leqslant n \leqslant k^{m^{\beta}}$, $a > 1$, $\beta < 1$, $k \geqslant 2$, then as n → ∞, for almost all matrices L from $\mathfrak{M}_{mn}^{k}$, the relations

$$~\mathop \sum \limits_{r \leqslant {{r}_{1}}} \left| {{{U}_{r}}\left( L \right)} \right| \approx \left| {{{U}_{{{{r}_{1}}}}}\left( L \right)} \right| \approx C_{n}^{{{{r}_{1}}}}C_{m}^{{{{r}_{1}}}}{{k}^{{{{r}_{1}} - {{r}_{1}}^{2}}}},$$

$$~\mathop \sum \limits_{r \geqslant {{r}_{2}}} \left| {{{U}_{r}}\left( L \right)} \right| \approx \left| {{{U}_{{{{r}_{2}}}}}\left( L \right)} \right| \approx C_{n}^{{{{r}_{2}}}}C_{m}^{{{{r}_{2}}}}{{k}^{{{{r}_{2}} - {{r}_{2}}^{2}}}},$$

$$\left| {U\left( L \right)} \right| \approx \mathop \sum \limits_{r \in {{\phi }_{1}}} \left| {{{U}_{r}}\left( L \right)} \right| \approx \mathop \sum \limits_{r \in {{\phi }_{1}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}}$$

are fulfilled and the lengths of almost all sets from U(L) belong to the interval ϕ₁.

Theorem 2. If $n \leqslant m \leqslant {{k}^{{{{n}^{\beta }}}}},\,\,{{\beta }} < 1{\text{/}}2,~\,\,k \geqslant 2$, then at n → ∞ for almost all matrices L from $\mathfrak{M}_{{mn}}^{k}$,

$$~\mathop \sum \limits_{r \geqslant {{r}_{3}}} \left| {{{U}_{r}}\left( L \right)} \right| \approx \left| {{{U}_{{{{r}_{3}}}}}\left( L \right)} \right| \approx C_{n}^{{{{r}_{3}}}}C_{m}^{{{{r}_{3}}}}{{k}^{{{{r}_{3}} - {{r}_{3}}^{2}}}},$$

$$\left| {U\left( L \right)} \right| \precsim \mathop \sum \limits_{r \in {{\phi }_{2}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}}$$

are valid and the lengths of almost all sets from U(L) belong to the interval ϕ₂.

The proofs of Theorems 1 and 2 are based on a number of lemmas given in Section 2.
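To get a feeling for the main term $C_n^r C_m^r k^{r - r^2}$ in Theorem 1, one can tabulate it exactly for moderate sizes and see where it peaks. The sketch below is only a numerical illustration for one choice of parameters, not a verification of the asymptotic statement; the names `main_term` and `peak_and_share` are hypothetical, and the interval bounds are passed in by the caller.

```python
from fractions import Fraction
from math import comb

def main_term(m, n, k, r):
    """Exact value of C(n, r) * C(m, r) * k**(r - r*r)."""
    return comb(n, r) * comb(m, r) * Fraction(k) ** (r - r * r)

def peak_and_share(m, n, k, lo, hi):
    """Return the r that maximizes the main term and the share of the
    total sum over r = 1..min(m, n) contributed by lo <= r <= hi."""
    terms = {r: main_term(m, n, k, r) for r in range(1, min(m, n) + 1)}
    total = sum(terms.values())
    peak = max(terms, key=terms.get)
    share = sum(v for r, v in terms.items() if lo <= r <= hi) / total
    return peak, float(share)

# For m = 64, n = 2**20, k = 2 the definitions of r_1 and r_2 give 6 and 11.
print(peak_and_share(64, 2 ** 20, 2, 6, 11))
```

Exact rational arithmetic is used here because the individual terms differ by many orders of magnitude.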

2. PROOFS OF THEOREMS 1 AND 2

Let $v \in V_r^m$, $v = (i_1, \ldots, i_r)$; $\sigma \in E_k^r$, $\sigma = (\sigma_1, \ldots, \sigma_r)$; and $w \in W_r^n$, $w = \{j_1, \ldots, j_r\}$. A matrix $L = (a_{ij})$, $i = \overline{1, m}$, $j = \overline{1, n}$, $L \in \mathfrak{M}_{mn}^{k}$, is called a $(v, \sigma, w)$-matrix if $a_{i_t j_s} = \sigma_s$ for $t, s = \overline{1, r}$. We denote by $N_{(v, \sigma, w)}$ the set of $(v, \sigma, w)$-matrices in $\mathfrak{M}_{mn}^{k}$, and by $N_{(v, \sigma, w)}^{*}$ the set of all matrices L in $N_{(v, \sigma, w)}$ such that $L \notin N_{(v_1, \sigma, w)}$ for every $v_1 \in V_r^m$, $v_1 \ne v$.

Lemma 1. If ${v} \in V_{r}^{m},~\,\,w \in W_{r}^{n},{{\;\sigma }} \in E_{k}^{r}$, then

$$\left| {{{N}_{{({v},{{\sigma }},w)}}}} \right| = {{k}^{{mn - {{r}^{2}}}}}.$$

Proof. We count the ways to construct a matrix L from $N_{(v, \sigma, w)}$. The $r^2$ elements of L located at the intersection of rows with numbers from $v$ and columns with numbers from w are uniquely determined. The remaining elements of this matrix can be chosen arbitrarily (in $k^{mn - r^2}$ ways). From this we obtain the required equality. Lemma 1 is proved.
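Lemma 1 can be checked by exhaustive enumeration for very small sizes. The sketch below assumes that a $(v, \sigma, w)$-matrix is one in which every row with number from $v$ coincides with σ on the columns from w (the reading under which exactly $r^2$ entries are fixed, as in the proof); rows and columns are indexed from 0, and the name `count_N` is hypothetical.

```python
from itertools import product

def count_N(m, n, k, v, sigma, w):
    """Number of m x n matrices over {0..k-1} whose rows in v equal sigma on columns w."""
    count = 0
    for flat in product(range(k), repeat=m * n):
        L = [flat[i * n:(i + 1) * n] for i in range(m)]
        if all(L[i][j] == s for i in v for j, s in zip(w, sigma)):
            count += 1
    return count

# 3 x 2 binary matrices, r = 2: the lemma predicts 2**(6 - 4) matrices.
print(count_N(3, 2, 2, (0, 2), (0, 1), (0, 1)), 2 ** (3 * 2 - 2 ** 2))
```

The enumeration visits all $k^{mn}$ matrices, so it is feasible only for tiny m and n.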

Lemma 2. If ${v} \in V_{r}^{m},\,\,~w \in W_{r}^{n}$, ${{\sigma }} \in E_{k}^{r}$, then

$$\left| {N_{{({v},{{\sigma }},w)}}^{*}} \right| = {{(1 - {{k}^{{ - r}}})}^{{m - r}}}{{k}^{{mn - {{r}^{2}}}}}.$$

Proof. We count the ways to construct a matrix L from $N_{(v, \sigma, w)}^{*}$. The elements of this matrix located in columns with numbers not in w can be chosen arbitrarily (in $k^{m(n - r)}$ ways). In the submatrix of L formed by the columns with numbers from w, the rows with numbers from $v$ are fixed, and each of the remaining m − r rows must differ from σ, so these rows can be chosen in $(k^r - 1)^{m - r}$ ways. Multiplying, we obtain the required equality. Lemma 2 is proved.
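The count in Lemma 2 can also be compared with brute-force enumeration. This sketch makes two assumptions that are not spelled out in the text: row sets are treated as increasing tuples (so that a reordering of $v$ is not a different set), and membership in $N_{(v, \sigma, w)}$ means that every row from $v$ equals σ on the columns from w. The names `in_N` and `count_N_star` are hypothetical.

```python
from itertools import combinations, product

def in_N(L, rows, sigma, w):
    """True if every listed row of L equals sigma on the columns w."""
    return all(L[i][j] == s for i in rows for j, s in zip(w, sigma))

def count_N_star(m, n, k, v, sigma, w):
    """Matrices in N_(v, sigma, w) that belong to no N_(v1, sigma, w) with v1 != v."""
    r = len(v)
    others = [v1 for v1 in combinations(range(m), r) if v1 != tuple(v)]
    count = 0
    for flat in product(range(k), repeat=m * n):
        L = [flat[i * n:(i + 1) * n] for i in range(m)]
        if in_N(L, v, sigma, w) and not any(in_N(L, v1, sigma, w) for v1 in others):
            count += 1
    return count

def lemma2_value(m, n, k, r):
    """(1 - k**-r)**(m - r) * k**(m*n - r*r), written with integers only."""
    return (k ** r - 1) ** (m - r) * k ** (m * (n - r))

print(count_N_star(3, 2, 2, (0, 1), (0, 1), (0, 1)), lemma2_value(3, 2, 2, 2))
```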

Lemma 3. Let $v_1 \in V_r^m$, $v_2 \in V_l^m$, $w_1 \in W_r^n$, $w_2 \in W_l^n$, $\sigma' \in E_k^r$, $\sigma'' \in E_k^l$; let the sets $v_1$ and $v_2$ have $a$ $(a \geqslant 0)$ elements in common, and let the sets $w_1$ and $w_2$ have $b$ $(b \geqslant 0)$ elements in common. Then

$$\left| {{{N}_{{({{{v}}_{1}},{{\sigma '}},~{{w}_{1}})~}}} \cap {{N}_{{({{{v}}_{2}},{{\sigma ''}},~{{w}_{2}})}}}} \right| \leqslant {{k}^{{mn - {{r}^{2}} - {{l}^{2}} + ab}}}.$$

The proof of Lemma 3 is not given due to its obviousness.

Lemmas 4–6 below are proved using the notation $b_n \leqslant_n c_n$, which means that $b_n \leqslant c_n$ for all sufficiently large n.

Lemma 4. 1. If $m \leqslant n \leqslant {{k}^{{{{m}^{\beta }}}}},\,\,\beta < 1$, then

$$\mathop \sum \limits_{r \leqslant {{r}_{1}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}} \precsim C_{n}^{{{{r}_{1}}}}C_{m}^{{{{r}_{1}}}}{{k}^{{{{r}_{1}} - r_{1}^{2}}}},\quad n \to \infty .$$

2. The following relation is valid:

$$\mathop \sum \limits_{r \geqslant {{r}_{2}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}} \precsim C_{n}^{{{{r}_{2}}}}C_{m}^{{{{r}_{2}}}}{{k}^{{{{r}_{2}} - r_{2}^{2}}}},\quad n \to \infty .$$

3. If $n \leqslant m$, then

$$\mathop \sum \limits_{r \geqslant {{r}_{3}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}} \precsim C_{n}^{{{{r}_{3}}}}C_{m}^{{{{r}_{3}}}}{{k}^{{{{r}_{3}} - r_{3}^{2}}}},\quad n \to \infty ~.$$

Proof. We put ${{a}_{r}} = C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}}$, $q = 0.~5{\text{lo}}{{{\text{g}}}_{k}}mn - 0.~5{\text{lo}}{{{\text{g}}}_{k}}~{\text{log}}_{k}^{2}mn,\,\,t = {\text{lo}}{{{\text{g}}}_{k}}{\text{lo}}{{{\text{g}}}_{k}}{\text{lo}}{{{\text{g}}}_{k}}n$.

1. Let $m \leqslant n \leqslant k^{m^{\beta}}$, $\beta < 1$, and $r \leqslant r_1 + 1$. Then, using the facts that $q \leqslant 0.5\log_k mn$, $k^{2q} = mn/\log_k^2 mn$, $(n - q) \geqslant_n 0.5n$ for $m \leqslant n$, and $(m - q) \geqslant_n 0.5m$ for $n \leqslant k^{m^{\beta}}$, we get

$$\frac{{{{a}_{{r - 1}}}}}{{{{a}_{r}}}} = \frac{{{{r}^{2}}{{k}^{{2r - 2}}}}}{{\left( {n - r + 1} \right)\left( {m - r + 1} \right)}} \leqslant \frac{{{{q}^{2}}{{k}^{{2q - 2t}}}}}{{\left( {n - q} \right)\left( {m - q} \right)}}{{ \leqslant }_{n}}{{k}^{{ - 2t}}}.$$

2. At $r \geqslant {{r}_{2}} - 1~$, we get

$$\frac{{{{a}_{{r + 1}}}}}{{{{a}_{r}}}} \leqslant \frac{{mn}}{{{{r}^{2}}}}{{k}^{{ - 2r}}}{{ \leqslant }_{n}}\frac{{mn}}{{{{q}^{2}}~}}{{k}^{{ - 2q - 2t + 2}}}{{ \leqslant }_{n}}{{k}^{{ - 2t}}}.$$

3. At $n \leqslant m, \; r \geqslant {{r}_{3}} - 1$, we get

$$\frac{{{{a}_{{r + 1}}}}}{{{{a}_{r}}}} \leqslant \frac{{mn}}{{{{r}^{2}}}}{{k}^{{ - 2r}}}{{ \leqslant }_{n}}\frac{1}{{{{{({\text{lo}}{{{\text{g}}}_{k}}n)}}^{2}}}}.$$

Thus, ${{a}_{{r - 1}}} = o({{a}_{r}}), \; n \to \infty $, in case 1 and ${{a}_{{r + 1}}} = o({{a}_{r}}), \; n \to \infty $, in each of cases 2 and 3. Lemma 4 is proved.
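The ratio identity used in case 1 and the upper bound used in cases 2 and 3 follow from the definition of $a_r$ and can be confirmed with exact arithmetic. The sketch below is a sanity check for a few small parameter values only; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def a(m, n, k, r):
    """a_r = C(n, r) * C(m, r) * k**(r - r*r), computed exactly."""
    return comb(n, r) * comb(m, r) * Fraction(k) ** (r - r * r)

def ratio_closed_form(m, n, k, r):
    """Closed form of a_{r-1} / a_r from the proof of Lemma 4."""
    return Fraction(r * r * k ** (2 * r - 2), (n - r + 1) * (m - r + 1))

def ratio_upper_bound(m, n, k, r):
    """Bound (m*n / r**2) * k**(-2r) on a_{r+1} / a_r."""
    return Fraction(m * n, r * r * k ** (2 * r))

m, n, k = 6, 9, 3
for r in range(1, 6):
    assert a(m, n, k, r - 1) / a(m, n, k, r) == ratio_closed_form(m, n, k, r)
    assert a(m, n, k, r + 1) / a(m, n, k, r) <= ratio_upper_bound(m, n, k, r)
print("ratio checks passed")
```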

Lemma 5. If $m \leqslant n$ and $r,~\,\,l \leqslant {{r}_{2}}$, then

$$\mathop \sum \limits_{b = 0}^{{\text{min}}(r,l)} {{k}^{{lb}}}C_{n}^{r}C_{r}^{b}C_{{n - r}}^{{l - b}} \leqslant C_{n}^{r}C_{n}^{l}(1 + {{\delta }}(n)),$$

where δ(n) → 0 at n → ∞.

Proof. We denote ${{{{\lambda }}}_{b}} = {{k}^{{lb}}}C_{n}^{r}C_{r}^{b}C_{{n - r}}^{{l - b}}{\text{/}}C_{n}^{r}C_{{n - r}}^{l}$. Since

$$\frac{{C_{r}^{b}C_{{n - r}}^{{l - b}}}}{{C_{{n - r}}^{l}}} \leqslant {{\left( {\frac{{rl}}{{n - r - l}}} \right)}^{b}}~,$$

and since, under the hypotheses of the lemma, $r, l \leqslant_n 0.5\log_k mn \leqslant \log_k n$ and $(r + l)/n \leqslant_n 0.5$, we have

$${{{{\lambda }}}_{b}}{{ \leqslant }_{n}}{{\left( {\frac{{2{\text{log}}_{k}^{2}n}}{n}} \right)}^{b}}~.$$

Therefore, the sum being estimated does not exceed $C_n^r C_{n - r}^l (1 + \delta(n))$, where δ(n) → 0 as n → ∞. Hence, using the inequality $C_{n - r}^l \leqslant C_n^l$, we obtain the assertion of the lemma. Lemma 5 is proved.
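The binomial inequality at the start of this proof holds whenever $n - r - l > 0$, since $C_r^b \leqslant r^b$ and $C_{n-r}^{l-b}/C_{n-r}^{l}$ is a product of b factors, each at most $l/(n - r - l)$. The following sketch checks it exactly over a small grid of values; it is an illustration with hypothetical function names, not part of the proof.

```python
from fractions import Fraction
from math import comb

def lhs(n, r, l, b):
    """C(r, b) * C(n - r, l - b) / C(n - r, l)."""
    return Fraction(comb(r, b) * comb(n - r, l - b), comb(n - r, l))

def rhs(n, r, l, b):
    """(r * l / (n - r - l)) ** b."""
    return Fraction(r * l, n - r - l) ** b

n = 50
for r in range(1, 6):
    for l in range(1, 6):
        for b in range(0, min(r, l) + 1):
            assert lhs(n, r, l, b) <= rhs(n, r, l, b)
print("inequality holds on the grid")
```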

Lemma 6. If $m \leqslant ~{{k}^{{{{n}^{{{\beta }}}}}}},\,\,{{\beta }} < 1{\text{/}}2$, and $r,~\,\,l \leqslant $ ${{r}_{3}}$, then

$$\mathop \sum \limits_{b = 0}^{{\text{min}}(r,l)} {{k}^{{lb}}}C_{n}^{r}C_{r}^{b}C_{{n - r}}^{{l - b}} < C_{n}^{r}C_{n}^{l}(1 + {{\delta }}(n)),$$

where δ(n) → 0 at n → ∞.

The proof of Lemma 6 is similar to the proof of Lemma 5 (in this case $r, \; l \leqslant 2{{n}^{{{\beta }}}}$ and ${{{{\lambda }}}_{b}}{{ \leqslant }_{n}}{{(8{{n}^{{2{{\beta }} - 1}}})}^{b}}$).

We consider $\mathfrak{M}_{mn}^{k} = \{L\}$ as the space of elementary events in which each event L occurs with probability $1/|\mathfrak{M}_{mn}^{k}|$. The mathematical expectation of a random variable X(L) defined on the set $\mathfrak{M}_{mn}^{k}$ is denoted by $\mathbf{M}X(L)$, and its variance by $\mathbf{D}X(L)$.

Lemma 7 [10]. We assume that for random variables ${{X}_{1}}\left( L \right)$ and ${{X}_{2}}\left( L \right)$ defined on $\mathfrak{M}_{{mn}}^{k}$, ${{X}_{1}}\left( L \right) \geqslant {{X}_{2}}\left( L \right) \geqslant 0$ is fulfilled; and at n → ∞, ${\mathbf{M}}{{X}_{1}}\left( L \right) \approx {\mathbf{M}}{{X}_{2}}\left( L \right)$ and ${\mathbf{D}}{{X}_{2}}\left( L \right){\text{/}}{{({\mathbf{M}}{{X}_{2}}\left( L \right))}^{2}}$ → 0 are valid. Then for almost all matrices L from $\mathfrak{M}_{{mn}}^{k}$, ${{X}_{1}}\left( L \right) \approx {{X}_{2}}\left( L \right) \approx {\mathbf{M}}{{X}_{2}}\left( L \right),\,\,~n \to \infty $, is valid.

Assume ${{\sigma }} \in E_{k}^{r}, \; w \in W_{r}^{n}$. On $\mathfrak{M}_{{mn}}^{k} = \left\{ L \right\}$ we consider a random variable ${{{{\zeta }}}_{{\left( {{{\sigma }},w} \right)}}}\left( L \right), \; $equal to 1 if w is the σ-admissible set for matrix L and equal to 0 otherwise. We put

$${{{{\mu }}}_{r}}\left( L \right) = \mathop \sum \limits_{w \in W_{r}^{n}} \mathop \sum \limits_{{{\sigma }} \in E_{k}^{r}} {{{{\zeta }}}_{{\left( {{{\sigma }},w} \right)}}}\left( L \right),\quad {{\zeta }}\left( L \right) = \mathop \sum \limits_{r = 1}^{{\text{min}}\left( {m,n} \right)} {{{{\mu }}}_{r}}\left( L \right)~,\quad {{{{\zeta }}}_{i}}\left( L \right) = \mathop \sum \limits_{r \in {{\phi }_{i}}} {{{{\mu }}}_{r}}\left( L \right),\quad i \in \left\{ {1,~2} \right\}~.$$

It is easy to see that $\mu_r(L) = |U_r(L)|$ (the number of sets in U(L) of length r), $\zeta(L) = |U(L)|$, and $\zeta_i(L)$, $i \in \{1, 2\}$, is the number of those sets in U(L) whose lengths belong to the interval $\phi_i$.

We estimate the probability of the event $\zeta_{(\sigma, w)}(L) = 1$, $\sigma \in E_k^r$, $w \in W_r^n$, denoted below by $P(\zeta_{(\sigma, w)}(L) = 1)$. Obviously, by Lemma 1,

$$P({{{{\zeta }}}_{{\left( {{{\sigma }},w} \right)}}}\left( L \right) = 1) \leqslant \mathop \sum \limits_{{v} \in V_{r}^{m}} \left| {{{N}_{{\left( {{v},{{\sigma }},w} \right)}}}} \right|{\text{/|}}\mathfrak{M}_{{mn}}^{k}{\text{|}} = C_{m}^{r}{{k}^{{ - {{r}^{2}}}}}.$$

(2.1)

However, by Lemma 2 we have

$$P({{\zeta }_{{\left( {{{\sigma }},w} \right)}}}\left( L \right) = 1) \geqslant \mathop \sum \limits_{{v} \in V_{r}^{m}} {\text{|}}N_{{\left( {{v},{{\sigma }},w} \right)}}^{*}{\text{|/|}}\mathfrak{M}_{{mn}}^{k}{\text{|}} = C_{m}^{r}{{(1 - {{k}^{{ - r}}})}^{{m - r}}}{{k}^{{ - {{r}^{2}}}}}.$$

(2.2)
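Bounds (2.1) and (2.2) can be compared with the exact probability for a tiny matrix. The sketch assumes the uniform distribution on all matrices, as in the text, and reads σ-admissibility as "at least r rows agree with σ on the columns of w"; the names `prob_admissible` and `bounds` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

def prob_admissible(m, n, k, sigma, w):
    """Exact probability that w is sigma-admissible for a uniformly random matrix."""
    r = len(w)
    hits = 0
    for flat in product(range(k), repeat=m * n):
        rows = [flat[i * n:(i + 1) * n] for i in range(m)]
        matching = sum(all(row[j] == s for j, s in zip(w, sigma)) for row in rows)
        hits += matching >= r
    return Fraction(hits, k ** (m * n))

def bounds(m, k, r):
    """Lower bound (2.2) and upper bound (2.1) on the probability."""
    upper = comb(m, r) * Fraction(1, k ** (r * r))
    lower = upper * (1 - Fraction(1, k ** r)) ** (m - r)
    return lower, upper

p = prob_admissible(3, 2, 2, (0, 1), (0, 1))
lower, upper = bounds(3, 2, 2)
print(lower, p, upper)
```

For small m the gap between the two bounds is visible; it closes when $m k^{-r} \to 0$, which is the condition used in the proof of Lemma 9.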

The following lemma immediately follows from (2.1) and Lemma 4.

Lemma 8. If $m \leqslant n \leqslant {{k}^{{{{m}^{{{\beta }}}}}}},~\,\,{{\beta }} < 1$, then the following relations are valid:

$${\mathbf{M}}{{{{\mu }}}_{{{{r}_{1}}}}}\left( L \right) \leqslant C_{n}^{{{{r}_{1}}}}C_{m}^{{{{r}_{1}}}}{{k}^{{{{r}_{1}} - r_{1}^{2}}}},\quad n \to \infty ,$$

$$\mathop \sum \limits_{r \leqslant {{r}_{1}}} {\mathbf{M}}{{{{\mu }}}_{r}}\left( L \right) \precsim C_{n}^{{{{r}_{1}}}}C_{m}^{{{{r}_{1}}}}{{k}^{{{{r}_{1}} - r_{1}^{2}}}},\quad n \to \infty .$$

Lemma 9. If ${{m}^{a}} \leqslant n,\,\,a > 1$, then

$${\mathbf{M}}{{{{\mu }}}_{{{{r}_{1}}}}}\left( L \right) \succcurlyeq C_{n}^{{{{r}_{1}}}}C_{m}^{{{{r}_{1}}}}{{k}^{{{{r}_{1}} - r_{1}^{2}}}},\quad n \to \infty .$$

$$\mathop \sum \limits_{r \leqslant {{r}_{1}}} {\mathbf{M}}{{\mu }_{r}}\left( L \right) \succcurlyeq C_{n}^{{{{r}_{1}}}}C_{m}^{{{{r}_{1}}}}{{k}^{{{{r}_{1}} - r_{1}^{2}}}},\quad ~n \to \infty .$$

Proof. We have

$$\mathop \sum \limits_{r \leqslant {{r}_{1}}} {\mathbf{M}}{{{{\mu }}}_{r}}\left( L \right) \geqslant {\mathbf{M}}{{{{\mu }}}_{{{{r}_{1}}}}}\left( L \right).~$$

Since $m{{k}^{{ - {{r}_{1}}}}} \to 0, \; n \to \infty $, then ${{(1 - {{k}^{{ - {{r}_{1}}}}})}^{{m - {{r}_{1}}}}} \to 1, \; n \to \infty $. From this, using (2.2), we obtain

$${\mathbf{M}}{{\mu }_{{{{r}_{1}}}}}\left( L \right) \succcurlyeq C_{n}^{{{{r}_{1}}}}C_{m}^{{{{r}_{1}}}}{{k}^{{{{r}_{1}} - r_{1}^{2}}}},\quad n \to \infty .$$

Lemma 9 is proved.

Lemmas 8 and 9 immediately imply the following lemma.

Lemma 10. If ${{m}^{a}} \leqslant n \leqslant {{k}^{{{{m}^{{{\beta }}}}}}},~\,\,a > 1,{{\;\beta }} < 1$, then

$$\mathop \sum \limits_{r \leqslant {{r}_{1}}} {\mathbf{M}}{{\mu }_{r}}\left( L \right) \approx {\mathbf{M}}{{\mu }_{{{{r}_{1}}}}}\left( L \right) \approx C_{n}^{{{{r}_{1}}}}C_{m}^{{{{r}_{1}}}}{{k}^{{{{r}_{1}} - r_{1}^{2}}}},\quad ~n \to \infty .~$$

The proofs of Lemmas 11–13 presented below are not given, since they are completely analogous to the proof of Lemma 10.

Lemma 11. If ${{m}^{a}} \leqslant n,\,\,a > 1$, then

$$\mathop \sum \limits_{r \geqslant {{r}_{2}}} {\mathbf{M}}{{{{\mu }}}_{r}}\left( L \right) \approx {\mathbf{M}}{{{{\mu }}}_{{{{r}_{2}}}}}\left( L \right) \approx C_{n}^{{{{r}_{2}}}}C_{m}^{{{{r}_{2}}}}{{k}^{{{{r}_{2}} - r_{2}^{2}}}},~\quad n \to \infty $$

Lemma 12. If $n \leqslant m$, then

$$\mathop \sum \limits_{r \geqslant {{r}_{3}}} {\mathbf{M}}{{{{\mu }}}_{r}}\left( L \right) \approx {\mathbf{M}}{{{{\mu }}}_{{{{r}_{3}}}}}\left( L \right) \approx C_{n}^{{{{r}_{3}}}}C_{m}^{{{{r}_{3}}}}{{k}^{{{{r}_{3}} - r_{3}^{2}}}},~\quad n \to \infty .$$

Lemma 13. If ${{m}^{a}} \leqslant n \leqslant {{k}^{{{{m}^{{{\beta }}}}}}},~\,\,a > 1,\,\,{{\beta }} < 1$, then

$${\mathbf{M}}{{\zeta }}\left( L \right) \approx {\mathbf{M}}{{{{\zeta }}}_{1}}\left( L \right) \approx \mathop \sum \limits_{r \in {{\phi }_{1}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}},\quad n \to \infty .$$

Lemma 14. If ${{m}^{a}} \leqslant n \leqslant ~{{k}^{{{{m}^{{{\beta }}}}}}},~\,\,a > 1,\,\,\beta < 1$, then

$${\mathbf{D}}{{{{\zeta }}}_{1}}(L){\text{/}}{{({\mathbf{M}}{{{{\zeta }}}_{1}}(L))}^{2}} \to 0,\quad ~n \to \infty .$$

Proof. We have

$${\mathbf{D}}{{{{\zeta }}}_{1}}\left( L \right) = {\mathbf{M}}{{\left( {{{{{\zeta }}}_{1}}\left( L \right)} \right)}^{2}} - {{\left( {{\mathbf{M}}{{{{\zeta }}}_{1}}\left( L \right)} \right)}^{2}}.$$

(2.3)

It is easy to see that

$${\mathbf{M}}{{\left( {{{{{\zeta }}}_{1}}\left( L \right)} \right)}^{2}} \leqslant ~\mathop \sum \limits_{r,l \in {{\phi }_{1}}} \mathop \sum \limits_{\begin{array}{*{20}{c}} {{{{v}}_{1}} \in V_{r}^{m},{{{v}}_{2}}~ \in V_{l}^{m}} \\ {{{w}_{1}} \in W_{r}^{n},~{{w}_{2}} \in W_{l}^{n}~} \end{array}} \mathop \sum \limits_{\begin{array}{*{20}{c}} {{{\sigma '}} \in E_{k}^{r}} \\ {{{\sigma ''}} \in E_{k}^{l}} \end{array}} \left| N \right|{\text{/}}{{k}^{{mn}}},$$

where $N = N_{(v_1, \sigma', w_1)} \cap N_{(v_2, \sigma'', w_2)}$. Hence, using Lemmas 3 and 5, we obtain

$${\mathbf{M}}{{\left( {{{{{\zeta }}}_{1}}\left( L \right)} \right)}^{2}} \leqslant \mathop \sum \limits_{r,l \in {{\phi }_{1}}} \mathop \sum \limits_{b = 0}^{{\text{min}}(r,l)} {{k}^{{r + l}}}{{k}^{{ - {{r}^{2}} - {{l}^{2}} + lb}}}C_{n}^{r}C_{r}^{b}C_{{n - r}}^{{l - b}}C_{m}^{r}C_{m}^{l}$$

$$ \leqslant \mathop \sum \limits_{r,l \in {{\phi }_{1}}} C_{n}^{r}C_{n}^{l}C_{m}^{r}C_{m}^{l}{{k}^{{r + l}}}{{k}^{{ - {{r}^{2}} - {{l}^{2}}}}}(1 + {{\delta }}(n)),~$$

(2.4)

where ${{\delta }}(n) \to 0$ at n → ∞.

However, by Lemma 13

$${{\left( {{\mathbf{M}}{{{{\zeta }}}_{1}}\left( L \right)} \right)}^{2}} \approx \mathop \sum \limits_{r,l \in {{\phi }_{1}}} C_{n}^{r}C_{n}^{l}C_{m}^{r}C_{m}^{l}{{k}^{{r + l}}}{{k}^{{ - {{r}^{2}} - {{l}^{2}}}}},\quad n \to \infty . \; $$

(2.5)

From (2.3)–(2.5) the assertion of the lemma being proved follows. Lemma 14 is proved.

Lemmas 15–17 below are proved similarly to Lemma 14.

Lemma 15. If ${{m}^{a}} \leqslant n \leqslant {{k}^{{{{m}^{\beta }}}}},\,\,a > 1,\,\,{{\beta }} < 1$, then

$${\mathbf{D}}{{{{\mu }}}_{{{{r}_{1}}}}}\left( L \right){\text{/}}{{\left( {{\mathbf{M}}{{\mu }_{{{{r}_{1}}}}}\left( L \right)} \right)}^{2}} \to 0,\quad n~\,\, \to \infty .$$

Lemma 16. If ${{m}^{a}} \leqslant n,\,\,a > 1$, then

$${\mathbf{D}}{{{{\mu }}}_{{{{r}_{2}}}}}(L){\text{/}}{{({\mathbf{M}}{{{{\mu }}}_{{{{r}_{2}}}}}(L))}^{2}} \to \; 0~,\quad n \to \infty .$$

Lemma 17. If $n \leqslant m$, then

$${\mathbf{D}}{{{{\mu }}}_{{{{r}_{3}}}}}(L){\text{/}}{{({\mathbf{M}}{{{{\mu }}}_{{{{r}_{3}}}}}(L))}^{2}} \; \to 0,\quad n \to \infty .$$

We assume ${v} \in V_{r}^{m},~\,\,{{\sigma }} \in E_{k}^{r},\,\,w \in W_{r}^{n}$. On $\mathfrak{M}_{{mn}}^{k} = \left\{ L \right\}$, we consider a random variable ${{{{\xi }}}_{{\left( {{v},{{\sigma }},w} \right)}}}(L),~$ equal to 1 if $L \in {{N}_{{\left( {{v},{{\sigma }},w} \right)}}}$, and equal to 0 otherwise. We put

$${{\xi }}\left( L \right) = \mathop \sum \limits_{r = 1}^{{\text{min}}\left( {m,n} \right)} \mathop \sum \limits_{{v} \in V_{r}^{m},~w \in W_{r}^{n}} \mathop \sum \limits_{{{\sigma }} \in E_{k}^{r}} ~{{{{\xi }}}_{{\left( {{v},{{\sigma }},w} \right)}}}\left( L \right),$$

$${{{{\xi }}}_{1}}\left( L \right) = \mathop \sum \limits_{r \in {{\phi }_{2}}} \mathop \sum \limits_{{v} \in V_{r}^{m},~w \in W_{r}^{n}} \mathop \sum \limits_{{{\sigma }} \in E_{k}^{r}} ~{{{{\xi }}}_{{\left( {{v},{{\sigma }},w} \right)}}}\left( L \right).~$$

Lemma 18. If $n \leqslant m \leqslant ~{{k}^{{{{n}^{{{\beta }}}}}}},\,\,{{\beta }} < 1{\text{/}}2$, then at n → ∞, the following relation is fulfilled for almost all matrices L from $\mathfrak{M}_{{mn}}^{k}$:

$$\xi \left( L \right) \approx {{{{\xi }}}_{1}}\left( L \right) \approx \mathop \sum \limits_{r \in {{\phi }_{2}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}}.$$

Proof. We estimate the probability of an event ${{{{\xi }}}_{{\left( {{v},{{\sigma }},w} \right)}}}\left( L \right)~\,\, = 1,\,\,{v} \in V_{r}^{m},\,\,{{\sigma }} \in E_{k}^{r},\,\,w \in W_{r}^{n}$, denoted below by $P({{{{\xi }}}_{{\left( {{v},{{\sigma }},w} \right)}}}\left( L \right) = 1)$. By Lemma 1

$$P({{{{\xi }}}_{{\left( {{v},{{\sigma }},w} \right)}}}\left( L \right) = 1) \; = {\text{|}}{{N}_{{\left( {{v},{{\sigma }},w} \right)}}}{\text{|}}/{\text{|}}\mathfrak{M}_{{mn}}^{k}{\text{|}} = {{k}^{{ - {{r}^{2}}}}}.$$

Therefore, according to Lemma 4,

$${\mathbf{M}}~{{\xi }}\left( L \right) \approx {\mathbf{M}}{{{{\xi }}}_{1}}\left( L \right) \approx \mathop \sum \limits_{r \in {{\phi }_{2}}} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}},\quad n \to \infty .$$

(2.6)
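
The step from the probability ${{k}^{{ - {{r}^{2}}}}}$ to (2.6) can be spelled out by linearity of expectation. The sketch below assumes, consistently with the factors appearing in (2.6), that $|V_{r}^{m}| = C_{m}^{r}$, $|W_{r}^{n}| = C_{n}^{r}$, and $|E_{k}^{r}| = {{k}^{r}}$; the restriction of the sum to $r \in {{\phi }_{2}}$ is taken to be what Lemma 4 provides.

$${\mathbf{M}}\xi \left( L \right) = \sum\limits_{r = 1}^{\min \left( {m,n} \right)} \sum\limits_{v \in V_{r}^{m},\, w \in W_{r}^{n}} \sum\limits_{\sigma \in E_{k}^{r}} P({{\xi }_{\left( {v,\sigma ,w} \right)}}\left( L \right) = 1) = \sum\limits_{r = 1}^{\min \left( {m,n} \right)} C_{m}^{r}C_{n}^{r}\,{{k}^{r}} \cdot {{k}^{{ - {{r}^{2}}}}} = \sum\limits_{r = 1}^{\min \left( {m,n} \right)} C_{n}^{r}C_{m}^{r}{{k}^{{r - {{r}^{2}}}}}.$$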

From (2.6) and Lemma 6, using the scheme of the proof of Lemma 14, we obtain

$${\mathbf{D}}{{\xi }_{1}}(L){\text{/}}{{({\mathbf{M}}{{\xi }_{1}}(L))}^{2}} \to 0,\quad n \to \infty .$$

(2.7)

From (2.6), (2.7), and Lemma 7, the assertion of the lemma to be proved follows. Lemma 18 is proved.
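
How a relation of the form ${\mathbf{D}}{{\xi }_{1}}{\text{/}}{{({\mathbf{M}}{{\xi }_{1}})}^{2}} \to 0$ leads to a statement about almost all matrices can be seen from Chebyshev's inequality. This is the standard second-moment argument, given here only as a sketch; that Lemma 7 plays exactly this role is an assumption, since its statement is not reproduced in this section. For any fixed $\varepsilon > 0$,

$$P\left( {\left| {{{\xi }_{1}}(L) - {\mathbf{M}}{{\xi }_{1}}(L)} \right| \geqslant \varepsilon \,{\mathbf{M}}{{\xi }_{1}}(L)} \right) \leqslant \frac{{{\mathbf{D}}{{\xi }_{1}}(L)}}{{{{\varepsilon }^{2}}{{{({\mathbf{M}}{{\xi }_{1}}(L))}}^{2}}}} \to 0,\quad n \to \infty ,$$

so the fraction of matrices L for which ${{\xi }_{1}}(L)$ deviates from ${\mathbf{M}}{{\xi }_{1}}(L)$ by a factor outside $(1 - \varepsilon ,1 + \varepsilon )$ tends to zero.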

The assertions of Theorem 1 follow directly from Lemmas 7, 10, 11, 13, and 14–16, while the assertions of Theorem 2 follow directly from Lemmas 7, 12, 17, 18, and the inequality ${{\zeta }}\left( L \right) \leqslant {{\xi }}\left( L \right)$.

4 3. ESTIMATES OF THE TYPICAL VALUES OF THE NUMBER OF MINIMAL INFREQUENT EFS AND THE LENGTH OF THE MINIMUM INFREQUENT EFS

We put $L \in \mathfrak{M}_{{mn}}^{k},L = ({{a}_{{ij}}}),i = 1,~ \ldots ,~m, \; j = 1,~ \ldots ,~n$; ${{\sigma }} \in E_{k}^{r},{{\sigma }} = \left( {{{{{\sigma }}}_{1}},~ \ldots ,{{{{\sigma }}}_{r}}} \right);w \in W_{r}^{n},w = \left\{ {{{j}_{1}},~ \ldots ,~{{j}_{r}}} \right\}$.

The set w is called a σ-covering for matrix L of length r if for any $i \in \left\{ {1,2, \ldots ,m} \right\}$ there is $t \in \left\{ {1, \ldots ,r} \right\}$ such that ${{a}_{{i{{j}_{t}}}}} \ne {{\sigma }_{t}}$. We say that the σ-covering w is generated by the set σ.

A set w that is a σ-covering for matrix L is called irredundant if for any $t \in \left\{ {1,2, \ldots ,r} \right\}$ the set $w{{\backslash }}\{ {{j}_{t}}\} $ is not a ${{\gamma }_{t}}$-covering for matrix L, where ${{\gamma }_{t}} = \left( {{{\sigma }_{1}}, \ldots ,{{\sigma }_{{t - 1}}},{{\sigma }_{{t + 1}}}, \ldots ,{{\sigma }_{r}}} \right)$. If w is an irredundant σ-covering for matrix L, it is easy to see that the columns of matrix L with numbers from w contain a submatrix that, up to row permutation, has the form

$$\left( {\begin{array}{*{20}{c}} {{{{{\beta }}}_{1}}~{{{{\sigma }}}_{2}}~{{{{\sigma }}}_{3}}~ \ldots ~{{{{\sigma }}}_{{r - 1}}}~{{{{\sigma }}}_{r}}} \\ {{{{{\sigma }}}_{1}}~{{{{\beta }}}_{2}}~{{{{\sigma }}}_{3}}~ \ldots ~{{{{\sigma }}}_{{r - 1}}}~{{{{\sigma }}}_{r}}} \\ {~ \ldots } \\ {~{{{{\sigma }}}_{1}}~{{{{\sigma }}}_{2}}~{{{{\sigma }}}_{3}}~ \ldots ~{{{{\sigma }}}_{{r - 1}}}~{{{{\beta }}}_{r}}} \end{array}} \right),$$

where ${{{{\beta }}}_{p}} \ne {{\sigma }_{p}}$ at $p = 1,~2, \ldots ,~r.~$ Such a submatrix is called a σ-submatrix.

Note that when the descriptions of objects from the sample D are taken as the rows of matrix L, the set $w \in W_{r}^{n}, \; w = \left\{ {{{j}_{1}}, \ldots ,{{j}_{r}}} \right\}$, is an irredundant σ-covering for matrix L if and only if the EF (σ, H), $H = \left\{ {{{x}_{{{{j}_{1}}}}}, \ldots ,{{x}_{{{{j}_{r}}}}}} \right\}$, is minimal infrequent in D.
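
The two definitions above can be checked mechanically on a small matrix. The following is a minimal sketch rather than the authors' procedure: the function names `is_sigma_covering` and `is_irredundant` are hypothetical, columns are indexed from 0, and w and σ are passed as tuples of equal length.

```python
def is_sigma_covering(L, w, sigma):
    """True if every row of L differs from sigma in at least one column of w."""
    return all(any(row[j] != s for j, s in zip(w, sigma)) for row in L)


def is_irredundant(L, w, sigma):
    """True if w is a sigma-covering and dropping any single column
    (together with the matching component of sigma) destroys the covering."""
    if not is_sigma_covering(L, w, sigma):
        return False
    for t in range(len(w)):
        w_t = w[:t] + w[t + 1:]              # w without j_t
        gamma_t = sigma[:t] + sigma[t + 1:]  # sigma without sigma_t
        if is_sigma_covering(L, w_t, gamma_t):
            return False
    return True


# Toy matrix with k = 2, m = 2, n = 3 (illustrative data only).
L = [(1, 0, 0),
     (0, 1, 0)]
print(is_irredundant(L, (0, 1), (0, 0)))        # columns 0 and 1 are both needed
print(is_irredundant(L, (0, 1, 2), (0, 0, 1)))  # column 2 can be dropped
```

In the first call the rows of L restricted to columns 0 and 1 form a σ-submatrix for σ = (0, 0), which is the structure described above for irredundant coverings.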

We introduce the following notation, where $L \in \mathfrak{M}_{{mn}}^{k}$ and ${{\sigma }} \in E_{k}^{r}$: $B(L,\sigma )$ is the set of all irredundant σ-coverings for matrix L; $S(L,\sigma )$ is the set of all σ-submatrices of matrix L; ${{B}_{r}}(L,\sigma )$ is the set of all sets in $B(L,\sigma )$ of length r; ${{S}_{r}}(L,\sigma )$ is the set of all submatrices in $S(L,\sigma )$ of order r; $B(L)$ is the set of all irredundant σ-coverings for matrix L, in which each covering occurs as many times as there are sets in $E_{k}^{r}$ that generate it; $S(L)$ is the set of all σ-submatrices of matrix L for all σ from $E_{k}^{r}$;

$$\left| {B\left( L \right)} \right| = \mathop \sum \limits_{r = 1}^n \mathop \sum \limits_{\sigma \in E_{k}^{r}} \left| {{{B}_{r}}\left( {L,{{\sigma }}} \right)} \right|;$$

$$\left| {S\left( L \right)} \right| = \mathop \sum \limits_{r = 1}^n \mathop \sum \limits_{\sigma \in E_{k}^{r}} \left| {{{S}_{r}}\left( {L,~{{\sigma }}} \right)} \right|;$$

${{r}_{3}} = \,]{{\log }_{k}}m + {{\log }_{k}}{{\log }_{k}}m[$; ${{\phi }_{2}}$ is the interval $[1,{{r}_{3}}]$; ${{r}_{4}} = [0.5\,{{\log }_{k}}mn - 0.5\,{{\log }_{k}}{{\log }_{k}}mn - {{\log }_{k}}{{\log }_{k}}{{\log }_{k}}n]$; ${{r}_{5}} = \,]0.5\,{{\log }_{k}}mn - 0.5\,{{\log }_{k}}{{\log }_{k}}mn + {{\log }_{k}}{{\log }_{k}}{{\log }_{k}}n[$; ${{\phi }_{3}}$ is the interval $[{{r}_{4}},{{r}_{5}}]$; ${{r}_{6}} = \,]{{\log }_{k}}m + {{\log }_{k}}{{\log }_{k}}m + {{\log }_{k}}{{\log }_{k}}{{\log }_{k}}n[$; ${{\phi }_{4}}$ is the interval $[1,{{r}_{6}}]$.

Theorem 3 [3]. If ${{m}^{a}} \leqslant n \leqslant {{k}^{m}},\,\,a > 1, \; k \geqslant 2$, then the following relations hold as $n \to \infty$ for almost all matrices L from $\mathfrak{M}_{{mn}}^{k}$:

$$\mathop \sum \limits_{r \leqslant {{r}_{4}}} \left| {{{B}_{r}}\left( L \right)} \right| \approx \left| {{{B}_{{{{r}_{4}}}}}\left( L \right)} \right| \approx C_{n}^{{{{r}_{4}}}}C_{m}^{{{{r}_{4}}}}\,{{r}_{4}}!\,{{\left( {k - 1} \right)}^{{{{r}_{4}}}}}{{k}^{{{{r}_{4}} - r_{4}^{2}}}},$$

$$\mathop \sum \limits_{r \geqslant {{r}_{5}}} \left| {{{B}_{r}}\left( L \right)} \right| \approx \left| {{{B}_{{{{r}_{5}}}}}\left( L \right)} \right| \approx C_{n}^{{{{r}_{5}}}}C_{m}^{{{{r}_{5}}}}\,{{r}_{5}}!\,{{\left( {k - 1} \right)}^{{{{r}_{5}}}}}{{k}^{{{{r}_{5}} - r_{5}^{2}}}},$$

$$\left| {B\left( L \right)} \right| \approx \left| {S\left( L \right)} \right| \approx \mathop \sum \limits_{r \in {{\phi }_{3}}} C_{n}^{r}C_{m}^{r}r!{{\left( {k - 1} \right)}^{r}}{{k}^{{r - {{r}^{2}}}}},$$

and the lengths of almost all sets from B(L) belong to the interval ϕ₃.
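
To get a feel for the quantities in Theorem 3, the interval ϕ₃ and the summands of the asymptotic expression can be evaluated numerically. The sketch below is an illustration only. It assumes that $[x]$ denotes the integer part (floor) and $]x[$ the least integer not smaller than x (ceiling); the function names and the sample values of m, n, k are hypothetical, and the summands are handled through their natural logarithms to avoid overflow.

```python
import math


def log_k(x, k):
    """Logarithm of x to base k."""
    return math.log(x) / math.log(k)


def phi3(m, n, k):
    """Endpoints (r4, r5) of the interval phi_3, assuming [x] = floor, ]x[ = ceil."""
    main = 0.5 * log_k(m * n, k) - 0.5 * log_k(log_k(m * n, k), k)
    shift = log_k(log_k(log_k(n, k), k), k)
    return math.floor(main - shift), math.ceil(main + shift)


def log_term(m, n, k, r):
    """Natural log of C_n^r * C_m^r * r! * (k-1)^r * k^(r - r^2)."""
    return (math.log(math.comb(n, r)) + math.log(math.comb(m, r))
            + math.lgamma(r + 1) + r * math.log(k - 1)
            + (r - r * r) * math.log(k))


# Sample values satisfying m^a <= n <= k^m with a = 3 (chosen for illustration).
m, n, k = 100, 10**6, 2
r4, r5 = phi3(m, n, k)
# Index r of the largest summand over all admissible r.
r_peak = max(range(1, min(m, n) + 1), key=lambda r: log_term(m, n, k, r))
print(r4, r5, r_peak)
```

For these sample values the largest summand falls inside $[{{r}_{4}},{{r}_{5}}]$, which is consistent with the statement that the lengths of almost all sets from B(L) belong to ϕ₃; a single finite example does not, of course, verify the asymptotic claim.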

Theorem 4. If $n \leqslant m \leqslant {{k}^{{{{n}^{{{\beta }}}}}}},\,\,{{\beta }} < 1{\text{/}}2,\,\,k \geqslant 2$, then the following relations hold as $n \to \infty$ for almost all matrices L from $\mathfrak{M}_{{mn}}^{k}$:

$$\mathop \sum \limits_{r \geqslant {{r}_{6}}} \left| {{{B}_{r}}\left( L \right)} \right| \approx \left| {{{B}_{{{{r}_{6}}}}}\left( L \right)} \right| \approx C_{n}^{{{{r}_{6}}}}C_{m}^{{{{r}_{6}}}}\,{{r}_{6}}!\,{{\left( {k - 1} \right)}^{{{{r}_{6}}}}}{{k}^{{{{r}_{6}} - r_{6}^{2}}}},$$

$$\left| {B\left( L \right)} \right| \leqslant \left| {S\left( L \right)} \right| \approx \mathop \sum \limits_{r \in {{\phi }_{2}}} C_{n}^{r}C_{m}^{r}r!{{\left( {k - 1} \right)}^{r}}{{k}^{{r - {{r}^{2}}}}},$$

and the lengths of almost all sets from B(L) belong to the interval ϕ₄.

The scheme of the proof of Theorem 4 is similar to that of the proof of Theorem 2.

Thus, in each of the two cases considered, the typical length of a set from U(L) and the typical length of a set from B(L) belong to the same interval. The results of Theorems 1, 3 and Theorems 2, 4 are illustrated in Figs. 1 and 2, respectively.

5 CONCLUSIONS

Topical issues of logical analysis of integer data concerning the research on the metric (quantitative) properties of sets of frequent and infrequent elements of such data are considered. The technique for obtaining estimates for the typical values of the main numerical characteristics of the specified sets has been improved and new estimates for such characteristics have been found. A theoretical substantiation of the expediency (in terms of reducing time costs) of using methods for searching for frequent elements at the stage of training classifiers based on a logical analysis of the training sample is given.

The results of this study are also important for a number of other applied areas, among which the search for association rules in data is worth highlighting. In this setting, D is called a database, and each object of D is a transaction. An association rule establishes a relationship between two frequent EFs, according to which one frequent EF (the premise) entails, with some “certainty,” another frequent EF (the consequence). The premise and the consequence are generated by one common frequent EF. Questions of the synthesis of association rules arose in connection with the analysis of the consumer basket [11].
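
The relationship between premise and consequence can be illustrated on a toy database. The sketch below is not taken from the paper: it interprets the “certainty” of a rule as the usual confidence ratio from association-rule mining, which is an assumption, and represents an EF as a hypothetical dictionary mapping attribute indices to values.

```python
def support(D, fragment):
    """Number of rows (transactions) of D that contain the fragment.
    fragment: dict {attribute index: value}."""
    return sum(all(row[j] == v for j, v in fragment.items()) for row in D)


def confidence(D, premise, consequence):
    """Share of rows containing the premise that also contain the consequence.
    The common EF generating both parts is their union."""
    common = {**premise, **consequence}
    s = support(D, premise)
    return support(D, common) / s if s else 0.0


# Toy database with binary attributes (illustrative data only).
D = [(1, 1, 0),
     (1, 1, 1),
     (1, 0, 1),
     (0, 1, 1)]
premise = {0: 1}       # attribute x_1 takes value 1
consequence = {1: 1}   # attribute x_2 takes value 1
print(support(D, premise), confidence(D, premise, consequence))
```

Here the common EF {x₁ = 1, x₂ = 1} occurs in two of the three rows that contain the premise.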

REFERENCES

1. L. V. Baskakova and Yu. I. Zhuravlev, “Model of recognition algorithms with representative sets and support set systems,” Zh. Vychisl. Mat. Mat. Fiz. 21 (5), 1264–1275 (1981).
2. P. L. Hammer, “Partially defined Boolean functions and cause-effect relationships,” in Lectures at the Int. Conf. on Multi-Attribute Decision Making via O.R.-based Expert Systems (University of Passau, Passau, Germany, 1986).
3. E. V. Dyukova and Yu. I. Zhuravlev, “Discrete analysis of feature descriptions in recognition problems of high dimensionality,” Comput. Math. Math. Phys. 40 (8), 1214–1227 (2000).
4. E. V. Dyukova and N. V. Peskov, “Search for informative fragments in descriptions of objects in discrete recognition procedures,” Comput. Math. Math. Phys. 42 (5), 711–723 (2002).
5. Yu. I. Zhuravlev, V. V. Ryazanov, and O. V. Sen’ko, Recognition. Mathematical Methods. Software System. Practical Applications (FAZIS, Moscow, 2006) [in Russian].
6. N. Dragunov, E. Djukova, and A. Djukova, “Supervised classification and finding frequent elements in data,” in VIII Int. Conf. on Information Technology and Nanotechnology (ITNT) (IEEE, Samara, 2022).
7. E. V. Djukova and A. P. Djukova, “On the complexity of learning logical classification procedures,” Informatics and Applications 16 (4), 57–62 (2022).
8. A. E. Andreev, “On the asymptotic behavior of the number of dead-end tests and the length of the minimum test for almost all tables,” Probl. Kibern., No. 41, 117–142 (1984) [in Russian].
9. E. V. Djukova and R. M. Sotnezov, “Asymptotic estimates for the number of solutions of the dualization problem and its generalizations,” Comput. Math. Math. Phys. 51 (8), 1431–1440 (2011).
10. V. N. Noskov and V. A. Slepyan, “On the number of dead-end tests for a class of tables,” Kibernetika, No. 1, 60–65 (1972) [in Russian].
11. C. C. Aggarwal, Frequent Pattern Mining (Springer, New York, 2014). https://www.charuaggarwal.net/freqbook.pdf.

Author information

Authors and Affiliations

Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 119333, Moscow, Russia
A. P. Djukova & E. V. Djukova

Corresponding author

Correspondence to E. V. Djukova.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Publisher’s Note.

Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Djukova, A.P., Djukova, E.V. Number of Solutions for Some Special Logical Analysis Problems of Integer Data. J. Comput. Syst. Sci. Int. 62, 817–826 (2023). https://doi.org/10.1134/S1064230723050052

Received: 03 April 2023
Revised: 28 April 2023
Accepted: 05 June 2023
Published: 11 November 2023
Issue Date: October 2023
DOI: https://doi.org/10.1134/S1064230723050052
