Introduction. The formation of sets of independent random variables is necessary for reducing the dimension of information processing problems and for synthesizing effective decision-making algorithms. A priori information about the independence of random variables makes it possible to improve the approximation properties of a nonparametric probability density estimate in comparison with kernel statistics for dependent variables [1,2,3]. The results obtained in these works are confirmed by a study of the asymptotic properties of a nonparametric kernel-type estimate of the equation of a separating surface in the pattern recognition problem [4].

The traditional method for testing the hypothesis of the independence of random variables is based on the Pearson χ2 test. However, its application includes a stage that is difficult to formalize (dividing the range of values of the random variables into multidimensional intervals) and requires large volumes of initial statistical data, which is associated with the transition from continuous to discrete random variables [5]. Methods for discretizing the interval of values of a one-dimensional random variable are considered in [6,7,8]. In [7, 9], formulas for discretizing the range of values of a multidimensional random variable with a normal distribution law are given, obtained from an analysis of the asymptotic properties of the histogram. The work [10] substantiates a method for optimal discretization of the range of values of a multidimensional random variable. This technique is consistent with the formula for choosing the number of sampling intervals for a one-dimensional random variable with a uniform distribution law (the Heyhold–Gaede formula) [11], and its implementation requires estimating the integral of the square of a multidimensional probability density. The first positive results in estimating this functional of the probability density were obtained in [12, 13]. Therefore, it was necessary to develop a new method for testing the hypothesis under consideration that avoids the problem of decomposing the domain of values of the random variables. A similar problem is solved when testing the hypothesis of the identity of the distribution laws of random variables on the basis of a nonparametric pattern recognition algorithm [14,15,16]. These works show that the problem of testing a hypothesis about the distributions of random variables can be replaced by the problem of testing a hypothesis about the equality of the pattern recognition error to a certain threshold value.

The purpose of this article is to develop a methodology for the formation of sets of independent components of a multidimensional random variable using a nonparametric algorithm for pattern recognition of kernel type that meets the criterion of maximum likelihood.

Testing the hypothesis of the independence of the components of a two-dimensional random variable. Let there be a sample \( V=\left({x}_1^i,{x}_2^i,i=1,\dots, n\right) \) of volume n formed from independent observations of a two-dimensional random variable x = (x1, x2). Observations x are extracted from general populations characterized by unknown probability densities p(x1)p(x2) or p(x1, x2). It is necessary to check the hypothesis of the independence of the random variables x1, x2:

$$ {H}_0:p\left({x}_1,{x}_2\right)\equiv p\left({x}_1\right)p\left({x}_2\right). $$
(1)

To test the hypothesis H0 (1), we will solve a two-alternative pattern recognition problem. The classes Ω1, Ω2 correspond to the domains of definition of the probability densities p(x1)p(x2), p(x1, x2). Under these conditions, the Bayesian decision rule corresponding to the criterion of maximum likelihood has the form

$$ m(x):\left\{\begin{array}{l}x\in {\Omega}_1\kern0.5em \mathrm{if}\kern0.5em p\left({x}_1,{x}_2\right)<p\left({x}_1\right)p\left({x}_2\right);\\ {}x\in {\Omega}_2\kern0.5em \mathrm{if}\kern0.5em p\left({x}_1,{x}_2\right)\ge p\left({x}_1\right)p\left({x}_2\right).\end{array}\right. $$
(2)

In contrast to the traditional formulation of the pattern recognition problem, in the synthesis of decision rule (2) under conditions of initial uncertainty no training sample is explicitly available. The probability densities p(x1)p(x2), p(x1, x2) are estimated from the sample V. For this, nonparametric probability density estimates of the Rosenblatt–Parzen type are used [1,2,3, 17, 18]:

$$ \overline{p}\left({x}_1\right)\overline{p}\left({x}_2\right)=\frac{1}{n^2{c}_1{c}_2}\sum \limits_{i=1}^n\sum \limits_{j=1}^n\Phi \left(\frac{x_1-{x}_1^i}{c_1}\right)\Phi \left(\frac{x_2-{x}_2^i}{c_2}\right); $$
(3)
$$ \overline{p}\left({x}_1,{x}_2\right)=\frac{1}{nc_1{c}_2}\sum \limits_{i=1}^n\Phi \left(\frac{x_1-{x}_1^i}{c_1}\right)\Phi \left(\frac{x_2-{x}_2^i}{c_2}\right), $$
(4)

where c1, c2 are the blur coefficients of the kernel functions Φ(uv), v = 1, 2.
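As a concrete illustration, the kernel statistics (3), (4) can be sketched in Python with a Gaussian kernel (the specific kernel is an assumption made here for illustration; any Φ satisfying the conditions listed below is admissible):

```python
import numpy as np

def gauss(u):
    """Gaussian kernel, one admissible choice for Phi."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def product_density(x1, x2, s1, s2, c1, c2):
    """Estimate p(x1)p(x2) via Eq. (3): the double sum over i, j
    factors into the product of two marginal Rosenblatt-Parzen
    estimates built from the samples s1, s2."""
    n = len(s1)
    p1 = gauss((x1 - s1) / c1).sum() / (n * c1)
    p2 = gauss((x2 - s2) / c2).sum() / (n * c2)
    return p1 * p2

def joint_density(x1, x2, s1, s2, c1, c2):
    """Estimate p(x1, x2) via Eq. (4): a single sum over the paired
    observations (s1[i], s2[i])."""
    n = len(s1)
    k = gauss((x1 - s1) / c1) * gauss((x2 - s2) / c2)
    return k.sum() / (n * c1 * c2)
```

Note that (3) is written as a double sum divided by n², which is algebraically identical to the product of the two one-dimensional estimates computed above.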

In statistics (3) and (4), the kernel functions Φ(uv) satisfy the conditions

$$ 0\le \Phi \left({u}_v\right)<\infty; \kern1em \Phi \left({u}_v\right)=\Phi \left(-{u}_v\right);\kern1em \int_{-\infty}^{+\infty}\Phi \left({u}_v\right){du}_v=1;\kern1em \int_{-\infty}^{+\infty }{u}_v^m\Phi \left({u}_v\right){du}_v<\infty; \kern1em 0\le m<\infty; \kern1em v=1,2. $$

The values of the kernel function blur coefficients c1, c2 decrease with an increase in the sample volume n of statistical data V.

The nonparametric statistics (3), (4) are asymptotically unbiased and consistent [1, 2, 18]. It was found that the minimum mean squared deviation of \( \overline{p}\left({x}_1\right)\overline{p}\left({x}_2\right) \) from p(x1)p(x2) tends to zero in proportion to n–4/5 as the volume n of initial statistical data increases. The order of convergence of the nonparametric estimate \( \overline{p}\left({x}_1,{x}_2\right) \) to the probability density p(x1, x2) is lower and amounts to n–4/6. Thus, a priori information on the independence of random variables increases the order of convergence of a nonparametric kernel-type probability density estimate.

Taking into account expressions (2)–(4), we write the nonparametric decision rule for the classification of random variables x = (x1, x2) as

$$ \overline{m}(x):\left\{\begin{array}{l}x\in {\Omega}_1\kern0.5em \mathrm{if}\kern0.5em \overline{p}\left({x}_1,{x}_2\right)<\overline{p}\left({x}_1\right)\overline{p}\left({x}_2\right);\\ {}x\in {\Omega}_2\kern0.5em \mathrm{if}\kern0.5em \overline{p}\left({x}_1,{x}_2\right)\ge \overline{p}\left({x}_1\right)\overline{p}\left({x}_2\right).\end{array}\right. $$
(5)

In the modification of the nonparametric algorithm for pattern recognition (5), the optimal blur coefficients c1, c2 of the kernel estimates of the probability densities \( \overline{p}\left({x}_1\right)\overline{p}\left({x}_2\right) \) and \( \overline{p}\left({x}_1,{x}_2\right) \) are selected based on the results of the analysis of their approximation properties.

The optimal value of the blur coefficient cv of the kernel functions of the nonparametric estimate of the one-dimensional probability density, under the condition of the minimum mean squared deviation of \( \overline{p}\left({x}_v\right) \) from p(xv), is determined by the equation [18]:

$$ {c}_v^{\ast }={\left[{\left\Vert \Phi (u)\right\Vert}^2/\left(n{\left\Vert {p}^{(2)}\left({x}_v\right)\right\Vert}^2\right)\right]}^{1/5}. $$
(6)

Here p(2)(xv) is the second derivative of the probability density p(xv) with respect to xv;

$$ {\left\Vert \Phi (u)\right\Vert}^2=\int_{-\infty}^{\infty }{\Phi}^2(u) du;\kern1em {\left\Vert {p}^{(2)}\left({x}_v\right)\right\Vert}^2=\int_{-\infty}^{\infty }{\left({p}^{(2)}\left({x}_v\right)\right)}^2{dx}_v. $$

After simple transformations, we represent Eq. (6) in the form

$$ {c}_v^{\ast }={\upbeta}_v{\sigma}_v{n}^{-1/5}, $$

where σv is the standard deviation of the random variable xv;

$$ {\upbeta}_v={\left[{\left\Vert \Phi (u)\right\Vert}^2/\left({\upsigma}_v^5{\left\Vert {p}^{(2)}\left({x}_v\right)\right\Vert}^2\right)\right]}^{1/5}. $$

The component of the functional βv, defined by the expression

$$ {\uplambda}_v={\upsigma}_v^5{\left\Vert {p}^{(2)}\left({x}_v\right)\right\Vert}^2, $$

is a constant for a number of unimodal probability densities. The value λv is determined by the form of the probability density and does not depend on its parameters [19,20,21,22,23,24,25,26,27].
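The constants entering Eq. (6) can be checked numerically for a simple reference case. The sketch below (an illustrative computation, not part of the authors' lognormal study) evaluates ‖Φ(u)‖² and ‖p(2)(xv)‖² for a standard normal density with a Gaussian kernel; the resulting β = (4/3)^{1/5} ≈ 1.06 is the well-known Silverman constant:

```python
import numpy as np

# Fine grid for accurate trapezoidal integration.
x = np.linspace(-10.0, 10.0, 200001)
phi = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)   # standard normal density
phi2 = (x**2 - 1.0) * phi                          # its second derivative

norm_K = np.trapz(phi**2, x)     # ||Phi(u)||^2  = 1/(2*sqrt(pi))
norm_p2 = np.trapz(phi2**2, x)   # ||p^(2)||^2   = 3/(8*sqrt(pi))

# Eq. for beta_v with sigma_v = 1 (standard normal):
beta = (norm_K / norm_p2) ** 0.2
print(round(beta, 4))            # ~1.0592, i.e. (4/3)^(1/5)
```

This agrees with the familiar rule of thumb c* ≈ 1.06 σ n^{−1/5} for near-normal data.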

According to the data of computational experiments for a family of one-dimensional lognormal distribution laws, the authors of this article have determined the parameter estimates

$$ \overline{\upbeta_v}=1.49\exp \left(-0.589{\overline{\upalpha_v}}^{0.857}\right)+0.021, $$

where \( {\overline{\upeta}}_v \) and \( {\overline{\updelta}}_v \) are the estimates of the coefficients of counter-excess and asymmetry of the random variable xv, v = 1, 2, respectively, and \( {\overline{\upalpha}}_v={\left({\overline{\updelta}}_v^2+{\overline{\upeta}}_v^2\right)}^{1/2} \).

The optimal kernel function blur coefficient in the statistic \( \overline{p}\left({x}_v\right) \) is estimated by the formula

$$ {\overline{c}}_v^{\ast }={\overline{\upbeta}}_v{\overline{\upsigma}}_v{n}^{-1/5},\kern1em v=1,2. $$
(7)

According to the proposed methodology and taking into account the results of research [27], for a two-dimensional random variable x = (x1, x2), the estimates of the optimal blur coefficients of the kernel statistic \( \overline{p}\left({x}_1,{x}_2\right) \) are determined by the expression

$$ {\overline{\overline{c}}}_v^{\ast }=\overline{\upbeta}{\overline{\upsigma}}_v{n}^{-1/6},\kern1em v=1,2. $$
(8)

In Eq. (8), the parameter \( \overline{\upbeta} \) takes on the value

$$ \overline{\upbeta}=1.498\exp \left(-0.524{\overline{\upalpha}}^{0.809}\right)+0.0356, $$

where

$$ \overline{\upalpha}={\left({\overline{\updelta}}_1^2+{\overline{\updelta}}_2^2+{\overline{\upeta}}_1^2+{\overline{\upeta}}_2^2\right)}^{1/2}. $$

The obtained estimates for the functionals βv, v = 1, 2, and β refine the research results [27,28,29].

Let us estimate the probabilities ρ1, ρ2 of pattern recognition errors for the nonparametric decision rule (5) from the statistical data V. In doing so, we take into account the estimates (7), (8) of the blur coefficients of the kernel statistics \( \overline{p}\left({x}_1\right)\overline{p}\left({x}_2\right),\overline{p}\left({x}_1,{x}_2\right). \)

The estimates \( {\overline{\uprho}}_t \) are calculated in the “sliding exam” mode on the sample V under the assumption that its elements belong to the class Ωt:

$$ {\overline{\uprho}}_t=\frac{1}{n}\sum \limits_{j=1}^n1\left(\delta (j),\overline{\delta}(j)\right), $$

where \( 1\left(\delta (j),\overline{\delta}(j)\right) \) is the indicator function; δ(j) = t indicates that \( {x}^j=\left({x}_1^j,{x}_2^j\right)\in {\Omega}_t \); \( \overline{\updelta}(j) \) is the “decision” of algorithm (5) on the membership of the situation xj in one of the classes Ωt, t = 1, 2.

When calculating \( {\overline{\uprho}}_t \) using the “sliding exam” methodology, the situation \( {x}^j=\left({x}_1^j,{x}_2^j\right) \) from the sample V, which is controlled by algorithm (5), is excluded from the process of generating statistics (3) and (4).

When forming the values \( {\overline{\uprho}}_t \), the indicator function is defined as

$$ 1\left(\updelta (j),\overline{\updelta}(j)\right)=\left\{\begin{array}{l}0\kern1em \mathrm{if}\kern1em \updelta (j)=\overline{\updelta}(j);\\ {}1\kern1em \mathrm{if}\kern1em \updelta (j)\ne \overline{\updelta}(j).\end{array}\right. $$
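A minimal Python sketch of the “sliding exam” estimates \( {\overline{\uprho}}_1,{\overline{\uprho}}_2 \) is given below (a Gaussian kernel and externally supplied blur coefficients are assumed; the selection formulas (7), (8) are omitted for brevity):

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def sliding_exam_errors(s1, s2, c1, c2):
    """Leave-one-out ("sliding exam") estimates of rho_1, rho_2.
    For each held-out point the remaining n-1 observations form the
    kernel statistics (3), (4), and rule (5) assigns a class."""
    n = len(s1)
    err1 = err2 = 0
    for j in range(n):
        keep = np.arange(n) != j                # exclude the examined point
        k1 = gauss((s1[j] - s1[keep]) / c1)
        k2 = gauss((s2[j] - s2[keep]) / c2)
        m = n - 1
        p_prod = (k1.sum() / (m * c1)) * (k2.sum() / (m * c2))   # Eq. (3)
        p_joint = (k1 * k2).sum() / (m * c1 * c2)                # Eq. (4)
        decided_1 = p_joint < p_prod            # decision rule (5)
        # indicator function: 1 when the assumed class label disagrees
        # with the decision of the algorithm
        err1 += 0 if decided_1 else 1           # elements assumed in class 1
        err2 += 1 if decided_1 else 0           # elements assumed in class 2
    return err1 / n, err2 / n
```

Since every point is assigned to exactly one class, the two estimates always satisfy \( {\overline{\uprho}}_1+{\overline{\uprho}}_2=1 \) in this two-alternative setting.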

Let us compare the values \( {\overline{\uprho}}_1,{\overline{\uprho}}_2 \) obtained under the assumption that the elements of the sample V belong to the classes Ω1, Ω2, respectively. The hypothesis H0 is accepted if \( {\overline{\uprho}}_1<{\overline{\uprho}}_2. \) This assertion is valid because, when the random variables are independent, the relation \( \overline{p}\left({x}_1\right)\overline{p}\left({x}_2\right)>\overline{p}\left({x}_1,{x}_2\right) \) holds in the domains of definition Ω1, Ω2 of the probability density estimates with the estimated error probability \( {\overline{\uprho}}_1. \) Otherwise, when \( {\overline{\uprho}}_2<{\overline{\uprho}}_1, \) the random variables x1, x2 are dependent.

The probabilities of Bayesian errors ρ1, ρ2 of class recognition are linear functionals of the probability densities p(x1)p(x2) and p(x1, x2), respectively. Since the nonparametric estimates of the indicated probability densities have the properties of asymptotic convergence [4, 18], the asymptotic convergence of the statistical estimates \( {\overline{\uprho}}_1,{\overline{\uprho}}_2\kern0.5em \mathrm{to}\kern0.5em {\uprho}_1,{\uprho}_2 \) is assumed.

With bounded volumes n of the initial sample V, the problem of confidence estimation of the probabilities of pattern recognition errors arises. To solve it, one can use the traditional method of confidence estimation of probabilities [5] or the Kolmogorov–Smirnov test [30], in which the deviation \( {\overline{D}}_{12}=\left|{\overline{\uprho}}_1-{\overline{\uprho}}_2\right| \) is compared with the threshold value Dβ = [− ln(β0/2)/n]1/2. Here β0 is the probability (risk) of rejecting the hypothesis \( {\overline{H}}_0:{\uprho}_1<{\uprho}_2. \) If the relation \( {\overline{D}}_{12}<{D}_{\upbeta} \) holds, then the hypothesis \( {\overline{H}}_0 \) is accepted and the risk of rejecting it does not exceed β0. When \( {\overline{D}}_{12}\ge {D}_{\upbeta} \), the hypothesis \( {\overline{H}}_0 \) is rejected.
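The threshold Dβ is a one-line computation; the sketch below reproduces the values used in the experimental section (Dβ ≈ 0.192 for n = 100 and Dβ ≈ 0.086 for n = 500 at β0 = 0.05):

```python
import math

def d_threshold(beta0, n):
    """Kolmogorov-Smirnov-type threshold from the text:
    D_beta = sqrt(-ln(beta0 / 2) / n)."""
    return math.sqrt(-math.log(beta0 / 2.0) / n)

print(round(d_threshold(0.05, 100), 3))   # 0.192
print(round(d_threshold(0.05, 500), 3))   # 0.086
```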

Methods for the formation of sets of components of a multidimensional random variable. Suppose there is a sample of observations \( V=\left({x}_v^i,v=1,\dots, k,i=1,\dots, n\right) \) of volume n, composed of statistically independent observations of the components of a multidimensional random variable x = (xv, v = 1, ..., k). The form of the probability density p(x) is a priori unknown. From the statistical data of the sample V, using the above proposed hypothesis testing criterion

$$ {H}_{vj}:p\left({x}_v,{x}_j\right)\equiv p\left({x}_v\right)p\left({x}_j\right) $$
(9)

for the components xv, xj, v, j = 1, ..., k, v > j, it is required to form sets of independent random variables x(t) = (xv, v ∈ It), t = 1, ..., m. Here It is the set of indices of the components that make up the set x(t), and the number m of sets of components of the random variable x is unknown.

The proposed technique consists of three stages.

Stage 1 In accordance with the above recommendations, the hypothesis Hvj of type (9) is tested for each pair of components (xv, xj) of a multidimensional random variable x = (xv, v = 1, ..., k). The number of such pairs is 0.5k(k – 1).

Stage 2 Based on the results of stage 1, an information graph G(X, A) is constructed, where X is the set of vertices corresponding to the components of the random variable x; A is the set of edges. If the hypothesis Hvj holds, that is, the components xv, xj are independent, then there is an edge between two vertices xv, xj.

Stage 3 Analyze the information graph G(X, A) and determine the complete subgraphs G(Xt, At), t = 1, ..., m. If the components of the random variable x are independent, then each pair of vertices of the subgraph G(Xt, At) is connected by an edge. Complete subgraphs are detected by algorithms that decompose the original graph on the basis of its adjacency matrix [31]. The components xv, v ∈ It, corresponding to the vertices of the complete subgraph G(Xt, At) form a set of independent random variables. In this case, several variants of decomposing the information graph may be found.
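Stage 3 can be illustrated with a brute-force search for maximal complete subgraphs, which is adequate for a small number of components (the decomposition algorithms of [31] would be preferable for large graphs):

```python
from itertools import combinations

def maximal_cliques(adj):
    """Enumerate maximal complete subgraphs of a small undirected graph
    given by its adjacency matrix (brute force over vertex subsets)."""
    k = len(adj)
    cliques = []
    for size in range(k, 0, -1):            # largest subsets first
        for subset in combinations(range(k), size):
            if all(adj[u][v] for u, v in combinations(subset, 2)):
                # keep only subsets not contained in a found clique
                if not any(set(subset) <= c for c in cliques):
                    cliques.append(set(subset))
    return cliques

# Graph of the experiment under transformation (10): vertices 0..3
# correspond to x1..x4; edges join independent pairs (all except (x1, x4)).
adj = [[0, 1, 1, 0],
       [1, 0, 1, 1],
       [1, 1, 0, 1],
       [0, 1, 1, 0]]
print(maximal_cliques(adj))   # [{0, 1, 2}, {1, 2, 3}], i.e. (x1,x2,x3), (x2,x3,x4)
```

The two maximal cliques found here match the two variants of independent-component sets reported in the analysis of Fig. 1a.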

Analysis of the results of computational experiments. Let us check the efficiency of the proposed method by analyzing the data of a computational experiment, investigating how the procedure for forming sets of independent components depends on the volume n of the initial statistical data and on the degree of dependence of the random variables.

In the design of the computational experiments, the components x1, x2, x3 of the multidimensional random variable x = (x1, x2, x3, x4) in the statistical data \( V=\left({x}_1^i,\dots, {x}_4^i,i=1,\dots, n\right) \) are assumed to be independent. Their values are generated by sensors with the uniform distribution law p(x1) = R(3; 1) and the normal laws p(x2) = N(3; 1), p(x3) = N(3; 1), where R(3; 1) and N(3; 1) denote distribution laws with mathematical expectation and standard deviation equal to three and one, respectively. The values of the component x4 are determined by one of the dependences specified by the conditions of the study:

$$ {x}_4=\upvarphi \left({x}_1\right)={x}_1^2-6{x}_1+10+\upvarepsilon; $$
(10)
$$ {x}_4={\upvarphi}_4\left({x}_1,{x}_2\right)={x}_1^2-{x}_2^2+6\left({x}_2-{x}_1\right)+20+\upvarepsilon . $$
(11)

Here ε is the value of a random variable with the normal distribution law N(0; 1).

Sensors of the random variables x2, x3 with normal distribution laws are formed on the basis of the expression

$$ {x}_v=3+\left(\sum \limits_{l=1}^r{\upvarepsilon}_0^l-0.5r\right)6/\sqrt{3r},\kern1em v=2,3, $$

where \( {\varepsilon}_0^l \) are values of random variables with a uniform probability density on the interval [0; 1]; r = 12.
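The data-generation scheme of the experiment can be sketched in Python (an illustrative re-implementation; the original study used Delphi-7, and R(3; 1) is read here as the uniform law with mathematical expectation 3 and standard deviation 1):

```python
import math
import random

random.seed(1)  # the paper seeds via Delphi's Randomize; any seed works here

def normal_31(r=12):
    """CLT-based N(3; 1) sensor from the text: sum of r uniform values,
    centered and scaled by 6 / sqrt(3 r)."""
    s = sum(random.random() for _ in range(r))
    return 3.0 + (s - 0.5 * r) * 6.0 / math.sqrt(3.0 * r)

def sample(n):
    """One data set of the experiment: x1 ~ R(3; 1) uniform,
    x2, x3 ~ N(3; 1), and x4 from dependence (10)."""
    data = []
    for _ in range(n):
        # uniform with mean 3 and std 1 -> interval [3 - sqrt(3), 3 + sqrt(3)]
        x1 = 3.0 + (random.random() - 0.5) * 2.0 * math.sqrt(3.0)
        x2 = normal_31()
        x3 = normal_31()
        eps = normal_31() - 3.0                  # N(0; 1) noise
        x4 = x1**2 - 6.0 * x1 + 10.0 + eps       # Eq. (10)
        data.append((x1, x2, x3, x4))
    return data
```

With r = 12 the scale factor 6/√(3r) equals 1, so the generator reduces to the classical sum-of-twelve-uniforms approximation to the normal law.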

The computational experiments are implemented in the Delphi-7 programming environment. To generate a random variable ε0 ∈ (0; 1) with a uniform distribution law, we use the standard function random and the procedure Randomize, which uses the time of day as the basis for generating pseudo-random numbers with a uniform distribution law.

The volume n of the initial statistical data in the computational experiments was 100, 500, and 1000. For a specific volume n of initial data, the parameters \( {\overline{\uprho}}_1,{\overline{\uprho}}_2,\kern0.5em \overline{D} \) were computed for dependences (10), (11). The values \( {\overline{\uprho}}_1,{\overline{\uprho}}_2 \) were calculated 10 times and then averaged (cf. Table 1).

Table 1 Results of Testing Hypotheses about the Independence of Paired Combinations of a Four-Dimensional Random Variable

With relatively small volumes n = 100 of initial statistical data and using dependence (10), the values \( {\overline{\uprho}}_1 \) and \( {\overline{\uprho}}_2 \) for the pairs of random variables (x1, x2), (x1, x3), (x2, x3), (x2, x4), and (x3, x4) are not reliably different. Under these conditions, the values \( {\overline{D}}_{12},{\overline{D}}_{13},{\overline{D}}_{23},{\overline{D}}_{24},{\overline{D}}_{34} \) are less than the threshold Dβ = 0.192 with the risk β0 = 0.05 of rejecting the hypothesis \( {\overline{H}}_0. \) According to the conditions of the computational experiment, these paired sets of components of the random variable are a priori independent. With small volumes of initial data there is thus an uncertainty in decision making, which follows from the analysis of the results of testing the hypotheses under study. This is explained by the difference in the convergence conditions, for example, of the nonparametric probability density estimates \( \overline{p}\left({x}_1\right)\overline{p}\left({x}_2\right) \) and \( \overline{p}\left({x}_1,{x}_2\right), \) which is confirmed by the results of the studies in [1,2,3].

If the paired combinations of random components are dependent, then the proposed method, under the conditions of the considered computational experiment, unambiguously rejects the hypothesis \( {\overline{H}}_0. \) This conclusion is valid for the components (x1, x4) when dependence (10) is used in the computational experiment. In this case, the inequality \( {\overline{\uprho}}_1>{\overline{\uprho}}_2 \) holds, i.e., the condition \( \overline{p}\left({x}_1,{x}_4\right)>\overline{p}\left({x}_1\right)\overline{p}\left({x}_4\right) \) is satisfied, and the decision made according to rule (5) is valid. The specified condition is reliable, since \( {\overline{D}}_{14}>{D}_{\upbeta} \) (\( {\overline{D}}_{14}=0.5 \), Dβ = 0.192). With the complication of the dependences between the random variables when transformation (11) is used with n = 100, the above regularities are preserved.

With an increase in the volume n of the initial statistical data V, the efficiency of the proposed method for testing hypotheses about the independence of random variables increases. With a volume of initial data n = 500, the hypothesis of the independence of the paired combinations of random variables (x1, x2), (x1, x3), (x2, x3), (x3, x4) is accepted, since the corresponding pattern recognition errors of decision rule (5) satisfy the relation \( {\overline{\uprho}}_1<{\overline{\uprho}}_2. \) This relation is reliable for the paired combinations (x1, x3), (x3, x4) when dependence (10) is used in the computational experiment with the risk β0 = 0.05 of rejecting the hypothesis \( {\overline{H}}_0. \) The dependence between the components (x1, x4) is even more significant: \( {\overline{D}}_{14}=0.616 \) with Dβ = 0.086. As the relationship (11) between the random variables becomes more complex, a reliable dependence is observed between (x1, x4), (x2, x4), as well as a reliable independence of the components (x1, x2), (x3, x4).

With a volume of initial data n = 1000, the patterns noted for n = 500 are basically preserved. The developed methodology enables detection of dependences contradicting the hypothesis \( {\overline{H}}_0 \) with a high level of reliability when transformations (10), (11) are used in the computational experiment. If the components of the random variable x are a priori independent, then the inequality \( {\overline{\uprho}}_1<{\overline{\uprho}}_2 \) holds; however, under the conditions of the considered computational experiment, it can be reliable or unreliable.

Based on the data in Table 1, we form an information graph G(X, A), where X is the set of vertices that correspond to the components xv, v = 1, ..., 4, of the random variable x; A is the set of edges between the vertices of the graph. There is an edge between two vertices xv, xj if the corresponding components are independent (cf. Fig. 1).

Fig. 1. Information graphs G(X, A) for n = 1000 using transformations (10), (11) (a, b, respectively).

When using transformation (10), there are two versions of complete subgraphs, which correspond to sets of independent components (x1, x2, x3) and (x2, x3, x4) (cf. Fig. 1a). The choice of a specific option is determined by the method of subsequent processing of the initial data. If transformation (11) is applied, then one set of independent random variables (x1, x2, x3) is found (cf. Fig. 1b). The results obtained correspond to the conditions for designing a computational experiment based on transformations (10), (11).

Conclusion. In the proposed method, the formation of sets of independent components of a multidimensional random variable is reduced to testing hypotheses about the independence of their paired combinations. To solve this problem, a two-dimensional nonparametric kernel-type pattern recognition algorithm is used, which corresponds to the criterion of maximum likelihood. This approach bypasses the problem of decomposing the range of values of the random variables into multidimensional intervals. When optimizing the nonparametric pattern recognition algorithm, it is advisable to use fast algorithms for selecting the blur coefficients, which is especially important when processing large volumes of data. Based on the results of testing hypotheses about the independence of two-dimensional random variables, an information graph can be built whose vertices correspond to random variables and whose edges denote their independence. Decomposition of the information graph into complete subgraphs enables detection of different variants of sets of independent components of a multidimensional random variable. The choice of a particular set is determined by the adopted procedure for the subsequent processing of the initial data.

According to the results of the computational experiment, it was established that the proposed technique is especially sensitive to the detection of dependent random variables, both for small and for large volumes of initial statistical data. For small volumes of values of a four-dimensional random variable (n < 500), it is impossible to make an unambiguous decision about the independence of random variables. For n ≥ 500, under the conditions of the computational experiment, the dependent components of a four-dimensional random variable are reliably detected, while independent random variables are determined with varying degrees of reliability under limited volumes of initial statistical data. A promising direction for further research is comparison of the proposed method with the traditional one based on the Pearson criterion with different formulas for discretizing the range of values of random variables.

The results obtained form the basis for the synthesis of a multi-level nonparametric system, where each level corresponds to a specific set of independent random variables. Such systems are efficient when processing large data volumes.

The study was carried out with the financial support of the Russian Foundation for Basic Research, the Government of the Krasnoyarsk Territory, and the Krasnoyarsk Regional Science Foundation within the framework of the scientific project No. 20-41-240001.