1 Introduction

An information system (IS) is usually represented as a triple \(S = (U, A, d)\), where \(U = \{u_{1}, u_{2}, \ldots , u_{m}\}\) is a set of objects (records, examples, instances, etc.) referred to as a dataset, \(A = \{a_{1}, a_{2}, \ldots , a_{n}\}\) is the set of condition attributes (features), and \(d\) is the decision attribute (class attribute) of the dataset. The number of objects in a dataset and the number of attributes describing it are called the size and the dimension of the dataset, respectively. For instance, the dataset with \(U = \{u_{1}, u_{2}, \ldots , u_{m}\}\) and \(A = \{a_{1}, a_{2}, \ldots , a_{n}\}\) has a size of \(m\) objects and a dimension of \(n\) attributes.

One of the basic problems of ISs is the selection of a minimal subset of attributes (MSA) for a given dataset so that the selected MSA not only significantly reduces the dataset, but also provides a correct classification of objects within the reduced dataset.

In rough set theory, every MSA is referred to as a reduct, the number of attributes composing a reduct is referred to as the size of the reduct, and the problem of finding the reducts of a dataset is referred to as attribute reduction (AR) or feature selection; it is a specific instance of the unweighted set cover problem. In this study, the term attribute reduction is used.

AR provides many benefits for ISs, such as reducing the dimension and size of the dataset, simplifying the classification rules of the dataset while improving their classification efficiency, speeding up data mining algorithms, and reducing the amount of storage needed for ISs [17]. Due to these and other benefits, AR is widely used for preprocessing the datasets in ISs such as pattern recognition, knowledge discovery, data mining, information retrieval, computer vision, and bioinformatics [1, 2, 4, 6, 7].

From a methodological point of view, existing AR approaches are divided into two categories: wrapper approaches (WAs) and filter approaches (FAs). Since a WA selects an optimal reduct by minimizing the training error of the chosen classifier, it can achieve better classification accuracy only for particular classifiers. Moreover, WAs are time-consuming. In contrast, FAs evaluate attributes based on criteria independent of any classifier and find reducts by optimizing these criteria [5, 6]. In this study, we deal only with FAs.

According to rough set theory, once a reduct for a dataset has been computed, the rules classifying the dataset are easily constructed by overlaying the reduct on the originating dataset and reading the values off [1, 8]. Unfortunately, an optimal reduct is usually relative to a certain criterion imposing restrictions on parameters such as the reduct size, the dataset classification accuracy, the query response time, the memory capacity required, and the degree of inconsistency of the reduced dataset [5, 9-11]. The basic solution for locating an optimal reduct is to generate all possible reducts and retrieve those that meet the given criterion. Unfortunately, such a problem is NP-hard [5, 10-12].

In order to reduce the computational complexity of AR, most AR methods involve heuristic criteria based on measures such as distance, information, dependency (correlation), learning model performance, consistency, and so on [7, 13-15]. According to the first four criteria, an attribute \(X\) is preferred over any other attribute \(Y\) if \(X\) induces a greater difference between the two class-conditional probabilities than \(Y\); if the information gain from \(X\) is greater than that from \(Y\); if the association between \(X\) and the class \(C\) is higher than the association between \(Y\) and \(C\); and/or if the classification accuracy provided by \(X\) is higher than that provided by \(Y\). As for the consistency-based heuristic, it selects a minimal set of attributes that separates the dataset into consistent subsets of objects [10, 13]. Unfortunately, every heuristic AR method generates a single reduct or a small number of possible reducts [2, 7, 13, 14, 16, 17], with the risk of overlooking the optimal ones. Since heuristic AR methods cannot guarantee the optimality of the reduct found [16, 17], they estimate the efficiency of a reduct via experiments on the dataset for which it was generated [10, 18].

Above, it has been mentioned that the basic solution for locating an optimal reduct is to generate all possible reducts and retrieve those that meet the given criterion. Unfortunately, there is only one such method, developed by Skowron and Rauszer and named discernibility function (DF)-based AR [8, 10, 17, 19-22]. It is based on the conversion of the DF of a dataset from conjunctive normal form (CNF) to disjunctive normal form (DNF). But since the CNF to DNF conversion is NP-hard, DF-based AR is applicable only to simple datasets [8, 10, 14, 17, 19, 23].

The analysis of CNF to DNF conversion processes shows that there are two main factors complicating them: the number of prime implicants (PIs) and the number of redundant products (RPs) generated during the process. Recall that a product is a conjunction of at least two logic variables. In the process of the CNF to DNF conversion, every product is either absorbed by the others or grows into a member of the final result. Every product of the first type is called an RP, and every product of the second type is called a PI. Recall that the aim of the CNF to DNF conversion for a dataset is to obtain the set of PIs of the DF belonging to the dataset and to consider each PI as one certain reduct of the dataset.

In [24], it has been shown that the worst case time complexity (WCTC) of the conventional CNF to DNF conversion is the square of its worst case space complexity (WCSC). Therefore, in this study, we focus only on reducing the WCSC of the mentioned conversion. Note that the WCSC of the CNF to DNF conversion is the maximum possible sum of the theoretically achievable numbers of PIs and RPs in this conversion. In [1] and [25], it has been shown that the number of PIs of the DNF of a function of \(n\) variables may be no more than \(\left( {_{n/2}^n } \right) \). As for the number of RPs, it is usually many times greater than that of PIs [26]. In order to minimize the WCSC of the CNF to DNF conversion as much as possible, we first represent a DF by a \(q\times n\) bit-matrix (BM) \(M_{1}\), where the bit-clause \(BC_{t\in \{1,2, \ldots ,q\}}\) and the attribute \(a_{j\in \{1,2, \ldots ,n\}}\) are represented by the row \(R_{t}\) and the column \(a_{j}\), respectively. For brevity, in the sequel, instead of the phrase "the WCSC of the conversion of a BM to the DNF," we will use the phrase "the WCSC of the BM".

The first transformation iteratively partitions the BM \(M_{1}\) into sub-BMs (SBMs) such that, firstly, the WCSC of the SBM processed in any iteration cannot exceed the square root of the WCSC of the conversion of the whole DF and, secondly, the number of PIs generated in any iteration cannot exceed the limit of \(\left( {_{n/2}^n } \right) /2\). The second transformation is based on dividing the RPs generated during the conventional CNF to DNF conversion into two categories [26]: preventable RPs (PRPs) and unpreventable RPs (URPs). While every PRP can be pre-detected and avoided without being generated, every URP can be detected and removed only after it is generated. Fortunately, the ratio between the numbers of URPs and PRPs is usually so small that it cannot affect the complexity class of the CNF to DNF conversion. Moreover, the Algorithm_ILE developed in Sect. 5 catches and discards each URP once it is generated. Due to this property of the second transformation, none of the results (intermediate or final) generated during a CNF to DNF conversion can contain any RP. This transformation is also iterative and can reduce the space complexity of the CNF to DNF conversion by up to \(n-1\) times in each iteration. We apply it for converting the SBMs produced by the first transformation into the appropriate PIs of the DF represented by the BM.

The paper is organized as follows. In Sect. 2, preliminaries to DF-based AR are given. In Sect. 3, the problem is stated and its complexity is analyzed. In Sect. 4, unique properties of the approaches used for converting a BM into the appropriate set of PIs are exposed. In Sect. 5, the algorithm for converting a BM into the appropriate set of PIs and a numerical example demonstrating the work of the algorithm in detail are given. In Sects. 6 and 7, experimental results and conclusions are given.

2 Preliminaries

As is known, every DF is represented in conjunctive normal form (CNF), which is a conjunction of different disjunctions of \(1\le k\le n-1\) attributes [1, 8, 21, 22]. Each such disjunction is called a clause. The number of attributes composing a clause and the number of clauses composing a DF in CNF are called the size of the clause and the size of the DF (or the size of the CNF), respectively. For short, we denote the size of a set \(X\) by \(\left| X \right| \).

A clause \(C_{ij}\) for a dataset with \(n\) attributes and \(m\) objects is defined as [1, 8, 19, 22]:

$$\begin{aligned} C_{ij} =\vee _{k=1}^n a_k :a_k (u_i )\ne a_k (u_j ) \end{aligned}$$
(1)

where \(a_{k}(u_{i})\) and \(a_{k}(u_{j})\) are the values of the attribute \(a_{k}\) for the objects \(u_{i\in \{1,2, \ldots ,m-1\}}\) and \(u_{j\in \{i+1,i+2,\ldots ,m\}}\), respectively. The DF for a dataset is formed as follows [1, 8, 10, 19]:

$$\begin{aligned} DF=\wedge _{i=1}^{m-1} \wedge _{j=i+1}^m C_{ij} \end{aligned}$$
(2)

This formula prescribes logic multiplication (in the sequel referred to simply as multiplication) over all clauses derived from the same dataset. Usually, a DF contains many clauses absorbed (subsumed) by other ones. We call such clauses Redundant Clauses (RCs) and those not absorbed by any other clause Prime Clauses (PCs). In order to avoid the superfluous operations and memory capacity related to RCs, the RCs should be removed before the computations in (2). A DF without any RC, i.e., containing only PCs, is called a minimal DF and denoted by \(DF_\mathrm{min}\). In [27], it is shown that for datasets with numbers of objects in the range from thousands to tens of thousands, the ratio between \({\vert }\hbox {DF}{\vert }\) and \({\vert }DF_\mathrm{min}{\vert }\) usually lies in the range from thousands to millions. Therefore, an algorithm obtaining the \(DF_\mathrm{min}\) directly from a dataset should compare each newly generated clause with the already existing ones and either remove from them those absorbed by it, or discard it if it is absorbed by any of them. Such an algorithm is given in [27].
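To make the procedure concrete, the following Python sketch (our illustration under a set-based clause representation; it is not the code of [27], and all names are ours) builds \(DF_\mathrm{min}\) directly from a dataset exactly in this manner: every newly generated clause of (1) is discarded if an already stored clause absorbs it, and otherwise it evicts the stored clauses it absorbs.

```python
# Illustrative sketch: clauses are frozensets of attribute indices; a clause X
# absorbs a clause Y when X is a subset of Y (X is the shorter disjunction).

def clause(u_i, u_j):
    """Eq. (1): the attributes on which the two objects differ."""
    return frozenset(k for k, (x, y) in enumerate(zip(u_i, u_j)) if x != y)

def df_min(dataset):
    """Keep only prime clauses while scanning all object pairs of Eq. (2)."""
    prime = []
    m = len(dataset)
    for i in range(m - 1):
        for j in range(i + 1, m):
            c = clause(dataset[i], dataset[j])
            if not c:                                 # identical objects: nothing to discern
                continue
            if any(p <= c for p in prime):            # c is absorbed by a stored clause
                continue
            prime = [p for p in prime if not c <= p]  # drop stored clauses absorbed by c
            prime.append(c)
    return prime

# A toy three-object, four-attribute dataset (not the dataset of Table 1):
print(df_min([(0, 1, 1, 0), (1, 0, 1, 1), (0, 0, 0, 1)]))
```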

Example 1

Obtain the \(DF_\mathrm{min}\) for the dataset in Table 1.

According to (1), the clauses for this dataset are as in Table 2. The original and minimized forms of the DF obtained from Table 2 are, respectively, as follows:

$$\begin{aligned} \hbox {DF}&= C_{12 }{\wedge C}_{13}{\wedge C}_{14}{ \wedge C}_{15}{\wedge C}_{23 } {\wedge C}_{24 }\wedge C_{25 }\wedge C_{34 }\wedge C_{35 }\wedge C_{45 }\\&= (a_{1}\vee a_{2}\vee a_{3}\vee a_{4}) \wedge (a_{1}\vee a_{3}\vee a_{4}) \wedge (a_{2}\vee a_{3}) \wedge a_{4} \wedge (a_{1}\vee a_{2}\vee a_{3})\\&\quad \wedge (a_{1}\vee a_{3}\vee a_{4}) \wedge (a_{1}\vee a_{2}\vee a_{3}\vee a_{4}) \wedge (a_{1}\vee a_{2}\vee a_{4})\\&\quad \wedge (a_{1}\vee a_{3}\vee a_{4}) \wedge (a_{2}\vee a_{3}\vee a_{4})\\ \hbox {DF}_\mathrm{min}&= C_{14} \wedge C_{15} = a_{4} \wedge (a_{2}\vee a_{3}) \end{aligned}$$

As seen from Table 2 and these expressions, \({\vert }\hbox {DF}{\vert }=10\) while \({\vert }\hbox {DF}_\mathrm{min}{\vert }=2\). Since a reduct of a DF is a product of the variables the DF depends on, we obtain the DNF of \(\hbox {DF}_\mathrm{min}\) by expanding it as follows:

$$\begin{aligned} \hbox {DF}_\mathrm{min} = a_{4}(a_{2} \vee a_{3})=a_{2}a_{4 }\vee a_{3}a_{4} \end{aligned}$$

where each of the products \(a_{2}a_{4}\) and \(a_{3}a_{4}\) is a reduct of the dataset given in Table 1. The descriptions of the dataset by these reducts are given in Tables 3 and 4, respectively.

Table 1 A dataset example (adopted from [10])
Table 2 The clauses for the dataset given in Table 1
Table 3 Description of the dataset by \(a_{2}\) and \(a_{4}\)
Table 4 Description of the dataset by \(a_{3}\) and \(a_{4}\)

For simplicity and clarity of operations on the clauses, in [24, 26, 27], each clause, \(C_{ij}\), given by (1) is represented by a bit-clause (BC) \(B_{ij}\) defined as:

$$\begin{aligned} B_{ij} =(b_{ijk} )_{k=1}^n \end{aligned}$$
(3)

where \(b_{ijk}=1\) if \(a_{k}(u_{i})\ne a_{k}(u_{j})\) and \(b_{ijk}=0\) otherwise, and \(k\) is the position of the bit at hand in \(B_{ij}\). As seen from (2), for a dataset with \(m\) objects, exactly \(Q=m(m-1)/2\) BCs are generated. They compose the set

$$\begin{aligned} S=\left\{ {B_{ij} } \right\} ,i=1,2,\ldots ,m-1,j=i+1,i+2,\ldots ,m \end{aligned}$$
(4)

Since every \(B_{ij}\) is a bitwise representation of a certain propositional clause \(C_{ij}\), the set \(S\) contains as many redundant BCs as the propositional DF it represents contains redundant clauses. For example, the BC-based representation of Table 2 is given in Table 5, where attributes and bit positions in BCs are associated by the following attribute-bit structure.

$$\begin{aligned} { Attribute }\text {-}{ bit }\text {-}{ str }=\{a_1 :1;a_2 :1;a_3 :1;a_4 :1\} \end{aligned}$$
(5)

In this structure, \(a_{k}:1\) means that, in BCs, the attribute \(a_{k}\) is represented by a single bit in the \(k\)th position.

Table 5 The BC-based representation of Table 2

In Table 5, a redundant BC (RBC) is recognized as follows: Let \(A\) and \(B\) be different BCs. \(A\) is absorbed by \(B\) (\(A\) is an RBC) if \( A \& B=B\), and \(B\) is absorbed by \(A\) (\(B\) is an RBC) if \( A \& B=A\). If the result of \( A \& B\) is neither \(A\) nor \(B\), then neither of these BCs is an RBC. Based on this rule, we obtain from Table 5 that \(S_\mathrm{min} = \{B_{1,4}, B_{1,5}\}=\{0110, 0001\}\). According to the structure in (5), this result should be interpreted as the set of reducts \(a_{2}a_{4}\) and \(a_{3}a_{4}\).
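The same absorption rule is easy to exercise with plain integers as bit vectors. The sketch below (ours, not taken from the paper's software) uses the BCs read off from the clauses of Example 1 (bit order \(a_1 a_2 a_3 a_4\), as in structure (5)) and reproduces \(S_\mathrm{min}=\{0110, 0001\}\).

```python
# Bit-clause absorption: A is redundant if A & B == B for some other BC B,
# i.e., B's literals form a subset of A's literals.

def minimize(bcs):
    s_min = []
    for a in bcs:
        if any(b != a and a & b == b for b in bcs):
            continue              # a is absorbed by a shorter clause
        if a not in s_min:        # drop exact duplicates such as B_{1,2} and B_{2,5}
            s_min.append(a)
    return s_min

# BCs of C12, C13, C14, C15, C23, C24, C25, C34, C35, C45 from Example 1
bcs = [0b1111, 0b1011, 0b0110, 0b0001, 0b1110, 0b1011, 0b1111, 0b1101, 0b1011, 0b0111]
print([format(b, "04b") for b in minimize(bcs)])   # -> ['0110', '0001']
```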

Note that removing the RBCs from the set \(S\) disorders the indices of the BCs remaining in \(S_\mathrm{min}\). Therefore, for indexing the BCs in \(S_\mathrm{min}\), we use a single variable \(t = 1, 2, {\ldots }, q \le Q\) and rewrite (3), (4) and (2), respectively, as follows:

$$\begin{aligned} B_t&= (b_{tk} )_{k=1}^n\end{aligned}$$
(6)
$$\begin{aligned} S_\mathrm{min}&= \{B_t \}_{t=1}^q\end{aligned}$$
(7)
$$\begin{aligned} W&= |_{t=1}^q E(B_t ) \end{aligned}$$
(8)

where \(W\) is the bitwise representation of a DF, "\({\vert }\)" denotes the bitwise OR operation used for performing the AND operation on pairs of BCs [24, 26, 27], and \(E(B_{t})\) is an operator expanding a \(B_{t}\) into the set of associated unit BCs as follows:

$$\begin{aligned} E\left( {B_t } \right) =\left\{ {Pr_k \left( {B_t } \right) |b_{tk} =1} \right\} \end{aligned}$$
(9)

where \(Pr_{k}(B_{t})\) is the projection of \(B_{t}\) onto the bit-position \(k\), which gives a unit BC with a single 1 in the \(k\)th bit-position and 0's in all the others. For instance, for \(B_{t}= 1011001\), \(E(B_{t}) = \{ Pr_{k}(B_{t}){\vert }b_{tk}=1\}=\{1000000, 0010000, 0001000, 0000001\}\).
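The expansion operator of (9) may be realized as a short helper over integer bit-clauses; the sketch below is illustrative only, with our own names, and prints the bit-strings with the leftmost character corresponding to the first attribute.

```python
def expand(bc, n):
    """E(B_t): one unit BC for every position k with b_tk = 1 (Eq. (9))."""
    return [1 << (n - 1 - k) for k in range(n) if bc >> (n - 1 - k) & 1]

bt = int("1011001", 2)                               # the B_t of the example above
print([format(u, "07b") for u in expand(bt, 7)])
# -> ['1000000', '0010000', '0001000', '0000001']
```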

3 Problem statement

As discussed before, every DF is considered as a Boolean function in which every variable represents an attribute of the dataset to which the DF belongs. Since every variable in a DF represents a certain attribute occurring only in the positive (uncomplemented) form, it may simply be considered as a literal. Therefore, in this study, the concepts of attribute, variable, and literal are used interchangeably.

Recall that according to DF-based AR, each reduct is obtained as one PI of the DF, and to obtain the PIs of a DF, it should be converted from CNF to DNF [1, 8, 19, 21]. As is known, all CNF to DNF conversion algorithms are based on Nelson's theorem [15, 28-31], stating that "all PIs of a Boolean function given in CNF can be obtained by straightforward multiplying its clauses and deleting all RPs from the result" [29]. As is easy to understand, Nelson's CNF to DNF conversion is an exhaustive process during which no minimization of intermediate results is done. We will refer to such a conversion as an exhaustive conversion. In the process of such a conversion of a DF with \(n\) variables, a total of \(\Omega _{EC} \left( n \right) =\prod _{i=1}^q w(c_i )\approx w(c)^{q}\) products are generated [24, 30], where \(q\) is the size of the DF, \(w(c_{i})<n\) is the size of the clause \(C_{i}\), \(w(c)<n\) is the average size per clause, and \(\Omega _{EC}(n)\) is the space complexity of the exhaustive CNF to DNF conversion. On the other hand, the number of PIs of a logic function of \(n\) variables may be as large as \(3^{n}/n\) [29]. This is to say that the number of RPs generated in the DF to DNF conversion process may be as large as \(R_\mathrm{max}=w(c)^{q} - 3^{n}/n\). But while this upper bound for \(R_\mathrm{max}\) is valid for a generic Boolean function of \(n\) variables, it is a gross overestimate for a DF in which every attribute appears only in the positive form. Namely, every DF is a completely monotonic function (CMF). Such a specification of a DF is very beneficial because the number of PIs of a CMF of \(n\) variables is limited by \(\left( _{n/2}^n \right) \) [1, 25], which is much smaller than \(3^{n}/n\) for all \(n >9\). That is, for a CMF, \(R_\mathrm{max}\) may be as large as \(w(c)^{q}-(_{n/2}^n )\). The analysis of this formula shows that \(w(c)^{q}\) reaches its maximum value at \(w(c)=n/2\) and \(q=(_{n/2}^n)\). Namely,

$$\begin{aligned} R_\mathrm{max} =\left( {n/2} \right) ^{(_{n/2}^n )}-(_{n/2}^n ) \end{aligned}$$
(10)

Moreover, a DF considered as a CMF may be specified in the binary space \(\{0,1\}^{n}\) of complexity \(2^{n}\) instead of the ternary one \(\{0,1,x\}^{n}\) with the complexity \(3^{n}\) needed for specifying a generic function of \(n\) variables. The computations by (10) show that even for \(n=7\), the \(R_\mathrm{max}\) for a CMF could be as large as \(10^{19}\). This is because with every increase of \(n\) by 1, the exponent of \(R_\mathrm{max}\) more than doubles. For instance, the increase of \(n\) from 9 to 10 causes an increase in the exponent of \(R_\mathrm{max}\) from 82 to 176. While modern computers cannot afford a set whose size is of the order of \(10^{19}\), which would arise even at \(n=7\), they have to process datasets with \(n \gg 7\). Briefly speaking, the main factor complicating the CNF to DNF conversion is the excessively high WCSC of this problem, which usually exceeds even the virtual memory capacity of modern computers. Therefore, reducing the WCSC of the mentioned conversion is one of the actual problems in logical data processing.
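The quoted exponents are easy to reproduce; in the short check below (ours), the subtracted term \((_{n/2}^n )\) of (10) is ignored because it is negligible next to \((n/2)^{(_{n/2}^n )}\).

```python
# Decimal exponent of the dominant term (n/2)^C(n, n/2) of Eq. (10)
from math import comb, log10

for n in range(7, 11):
    print(n, round(comb(n, n // 2) * log10(n / 2)))   # 7->19, 8->42, 9->82, 10->176
```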

In order to reduce the WCSC of the CNF to DNF conversion, a number of attempts have been made. In particular, the algorithm proposed by Slagle et al. [32] represents a function in CNF by a semantic tree, considers each prime path in it as a PI of the function, and finds the PIs by a depth-first search in the tree. Unfortunately, this algorithm usually performs superfluous operations, as it may generate some PIs more than once and present some RPs as PIs [31]. Another search tree (ST)-based CNF to DNF conversion method is Thelen's approach [33-35]. A Thelen's ST is built such that in every level of it, one clause of the CNF is processed. Therefore, a ST representing a CNF of size \(q\) has exactly \(q\) levels. Every node in level \(i\) of the ST has as many outgoing arcs as there are literals in the clause \(C_{i}\) processed in this level, and each arc outgoing from every node of level \(i\) is labeled by one of the literals of \(C_{i}\). In such a ST, the conjunction of all literals labeling the arcs on the path from the root to a node is considered as the product represented by this node. Therefore, each leaf node of such a ST represents a product that is either a PI or an RP. In order to prevent the growth of the number of RPs, a ST is pruned at each of its levels by the following rules [34, 35]:

R1: An arc is pruned if its predecessor node (product) contains the complement of the arc-literal.

R2: A clause is discarded if it contains a literal which also appears in the predecessor node (product).

R3: An arc is pruned if another non-expanded arc with the same arc-literal still exists on a higher level.

Rule \(R1\) has no meaning for DFs since, as a CMF, every DF contains only positive variables. In terms of switching functions, Thelen's approach begins with a root node representing a 1. In order to build each next level of a ST, the algorithm compares the next clause to be embedded into the ST with each of the products represented by the nodes in the most recently built level. While the clause is multiplied by every product that does not contain any literal from it, it is discarded for every product that contains at least one literal from it (rule \(R2\)). This process is continued iteratively until all clauses of the DF are processed (embedded into the ST). As a result, the algorithm gives all possible PIs and some URPs that could not be recognized by rule \(R2\). These URPs are removed from the result by rule \(R3\). Note that both rules \(R2\) and \(R3\) realize the same logic absorption law given below.

$$\begin{aligned} \hat{P} \cap \hat{C} =\hat{P} \rightarrow P\wedge C=P \end{aligned}$$
(11)

where \(P\) is a product, \(C\) is a clause, and \(\hat{P}\) and \(\hat{C}\) are the sets of literals present in \(P\) and \(C\), respectively. Since (11) is the rule for the irredundant logic expansion (ILE) of a product \(P\) by a clause \(C\), we will refer to an approach based on this rule as an ILE. As stated in [35], in spite of the iterative pruning of the ST, the WCSC of the ILE is exponential in \(n\). This is because of rule \(R2\), according to which every clause to be embedded into a ST should be compared with all already computed products, the number of which may be as large as \((_{n/2}^n )\), which is of the order of \(2^{n}\). In [36], Socher proposed an \(n\times q\) binary matrix (BM)-based approach where the row \(k\in \{1,2, {\ldots }, n\}\) and the column \(t\in \{1,2, {\ldots }, q\}\) are labeled by the literal (attribute) \(a_{k} \in A\) and by the BC \(B_{t}\in S_\mathrm{min}\), respectively. In such a BM, the entry \((k, t)\) is always associated with the bit \(b_{tk} =Pr_{k}(B_{t})\). For example, the BM representation of the function \(F=(a\vee c)(b\vee c)(b\vee d\vee f)(b\vee e\vee f)\) is as in Table 6.

Table 6 The BM representation of the function \(F\)

The mentioned approach is based on the divide-and-conquer strategy, according to which the given BM is processed iteratively and its remainder to be processed in the \(i\)th, \(i\le n-1\), iteration is denoted by \(M_{i}\) (the original BM is denoted by \(M_{1}\)). According to this approach, in the \(i\)th iteration, a split literal \(SL_{i}\) is chosen and the \(M_{i}\) is partitioned into two sub-BMs (SBMs) so that one of the SBMs, denoted by \(D_{i}\), contains only those columns of the \(M_{i}\) for which \(SL_{i}=0\), while the other SBM, denoted by \(M_{i+1}\), contains all the rows of the BM except the row labeled by the \(SL_{i}\). The SBM \(D_{i}\) is processed at once in order to obtain all PIs that can be derived from it, whereas the SBM \(M_{i+1}\) is passed to the next iteration as the remainder of the BM \(M_{1}\). In the sequel, we will call this method the sequential logic partitioning (SLP). In short, a SLP-based algorithm discovers the PIs of a DF by generating all paths through the BM, discarding subsumed paths and considering each preserved path as a PI [31]. Since at each iteration of the SLP one SL is chosen, the number of iterations of a SLP-based algorithm cannot exceed the limit of \(n-1\). Unfortunately, this approach performs superfluous operations, as it generates some products more than once and outputs some RPs [31, 37]. In [37], an improvement of the SLP is proposed, according to which the BM is partitioned into several cofactors. The subsets of PIs for all cofactors are obtained separately and then cross-concatenated. There are several other methods for CNF to DNF conversion reviewed in [31, 37], but they are not so attractive for the conversion of a DF to DNF. In our opinion, the algorithms proposed in the abovementioned studies suffer from the following drawbacks making them not so convenient for converting a DF from CNF to DNF:

  1. The algorithms have been developed for logic minimization in the space \(\{0,1,x\}^{n}\), which is much wider than the space \(\{0,1\}^{n}\) sufficient for DFs.

  2. Every algorithm fixes all paths existing in the BM or ST, removes redundant paths from the result, and considers each of the remaining paths as a PI. But since the number of RPs for a function is usually much greater than that of PIs, an algorithm may generate so many RPs that the memory overflows.

  3. The computational complexities of the algorithms have not been satisfactorily estimated.

  4. Searching for PIs in a BM or in a ST is more difficult than computing them by switching-function techniques, where the PIs are calculated by logic transformations on the input cubes (vectors specified in the space \(\{0,1,x\}^{n}\)).

The contributions of this paper include:

  • We show that every DF is exactly a CMF and can be processed in a much simpler way than when it is considered as a generic function,

  • We show that in the SLP the number of iterations cannot exceed \(n-1\) and the most complex iteration is the first one with a space complexity no higher than the square root of that of the original BM,

  • We show that the WCSC of the ILE applied is \((_{n/2}^n )/2\),

  • We show that SLP and ILE may be combined in a single algorithm so that the WCSC of the CNF to DNF conversion reduces from \(\Omega _{EC} \left( n \right) =\left( {n/2} \right) ^{(_{n/2}^n )}\) to \(\Omega _{ILE} \left( {n-1} \right) =(_{n/2}^n )/2,\) where the \(\Omega _{EC} \left( n \right) \) and \(\Omega _{ILE} \) are WCSCs of the exhaustive and ILE-based CNF to DNF conversions, respectively.

4 Unique properties of SLP and ILE

In our opinion, SLP and ILE have some unique properties that allow us to combine them in a single algorithm more efficient than each of them alone. The unique properties of the SLP are as follows:

  1. It iteratively (sequentially) partitions the BM so that at each iteration only one SBM, with a WCSC much less than that of the original BM, is processed,

  2. The number of PIs generated in each iteration cannot exceed half of the number of all PIs. This property is very useful, especially in cases where \(n\) is a big number and the number of PIs is close to the maximum possible one, which is almost exponential in \(n\).

The unique property of the ILE is that it can prevent the generation of the vast majority of RPs by using rule \(R2\) given above. As for the remaining few RPs, referred to above as URPs, they can be detected and deleted by the logic absorption law in (11) at the end of constructing every level of the ST. In our opinion, the ILE may be considered as an improvement of Nelson's method in the sense that it can, additionally, avoid the generation of all PRPs. Therefore, we obtain the WCSC of the ILE via the WCSC of Nelson's method.

4.1 The WCSC of the Nelson’s method

As mentioned above, in Nelson's method, the minimized DNF (the disjunction of all PIs) of a Boolean function given in CNF is obtained by straightforwardly multiplying its clauses and removing all RPs from the result. If the clauses of a CNF are \(C_{1}, C_{2} , {\ldots }, C_{q}\), then Nelson's algorithm can be written as given in Fig. 1.

Fig. 1 The original Nelson's algorithm

In order to convert a DF to DNF, the Algorithm_ON (Fig. 1) multiplies all clauses of the DF together, during which a large number of RPs may be generated. Above, such a conversion has been referred to as an exhaustive one and its WCSC has been obtained as \(\Omega _{EC} \left( n \right) =\prod _{i=1}^q w(c_i )\). Since in the worst case \(q=(_{n/2}^n )\) and \(\forall i \in \{1,2,{\ldots },q\},\, w(c_{i}) =n/2\), the WCSC of the Algorithm_ON is:

$$\begin{aligned} \Omega _{ON} \left( n \right) =\left( {n/2} \right) ^{(_{n/2}^n )} \end{aligned}$$
(12)

While processing such a large number of products is beyond the power of modern computers for \(n \ge 7\), the number of PIs of a function of \(n\) variables cannot exceed \(q=(_{n/2}^n )\ll \Omega _{EC} \left( n \right) \). However, the WCSC of the DF to DNF conversion can be reduced on a large scale if statement 2 in Fig. 1 is included in the For loop as in Fig. 2.

Fig. 2 The improved Nelson's algorithm

In the iteration \(t\in \{0,1,2, {\ldots },q\}\) of the Algorithm_IN (Fig. 2), first the set of products \(W\) is expanded by multiplying it with the clause \(C_{t}\), and second, all RPs present in \(W\) are removed. Namely, no RP is passed to the next, \((t+1)\)th, iteration. According to the algorithm, every iteration begins with a products set \(W\) of size \(\left| W \right| \le (_{n/2}^n )\). Statement 1 increases the size of the set \(W\) by a factor of \(n/2\) and statement 2 reduces it back to \(\left| W \right| \le (_{n/2}^n)\). Therefore, the WCSC of the Algorithm_IN is as follows:

$$\begin{aligned} \Omega _{IN} \left( n \right) =\left( {n/2} \right) \times (_{n/2}^n ) \end{aligned}$$
(13)

As is easy to see, the WCSC of the Algorithm_IN (Fig. 2) is \(R=\Omega _{ON} \left( n \right) /\Omega _{IN} (n)=\left( {n/2} \right) ^{(_{n/2}^n )}/(\left( {n/2} \right) \times (_{n/2}^n ))\) times less than that of the Algorithm_ON. For example, for \(n=4\), 5, 6, 7, 8, 9 and 10, \(R = 5.3,\, 3.8\times 10^{2},\, 5.8\times 10^{7},\, 9.0\times 10^{16},\, 5.0\times 10^{39}, \,3.5\times 10^{79}\) and \(1.1\times 10^{173}\), respectively. In spite of the reduction on these large scales, the WCSC of the Algorithm_IN remains in the class of exponential complexity because \(\left( {n/2} \right) \times (_{n/2}^n )\) may be specified as \(O(2^{n})\), i.e.,

$$\begin{aligned} O\left( {\Omega _{IN} \left( n \right) } \right) =O(2^{n}) \end{aligned}$$
(14)

Fortunately, however, the space complexity of real tasks is much less than the mentioned WCSC, and hence the conversion of a DF to DNF is applicable for datasets with several tens of attributes.
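For illustration only, the scheme of Fig. 2 can be sketched in a few lines of Python under our own set-based representation (a clause is a set of literal names, a product is a frozenset); this is not the published pseudocode, merely its idea of removing RPs inside the loop.

```python
def absorb(products):
    """Statement 2: a product P is redundant if some other product Q is a proper subset of P."""
    return {p for p in products if not any(q < p for q in products)}

def cnf_to_dnf_in(clauses):
    w = {frozenset()}                                 # the empty product plays the role of "1"
    for c in clauses:
        w = {p | {lit} for p in w for lit in c}       # statement 1: multiply W by the clause
        w = absorb(w)                                 # no RP survives to the next iteration
    return w

# DF_min of Example 1: a4 AND (a2 OR a3)  ->  the reducts a2a4 and a3a4
print(cnf_to_dnf_in([{"a4"}, {"a2", "a3"}]))
```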

4.2 The WCSC of the ILE

Recall that according to the ILE, while a clause to be inserted (embedded) into the ST is multiplied by every product that does not contain any literal from it, it is neglected for every product that contains at least one literal from it (rule \(R2\)). This process is continued iteratively until all clauses of the DF at hand are embedded into the ST. As shown in [26] and [34], rule \(R2\) may detect and prevent the generation of most RPs in each level of the ST. But the number of RPs that cannot be detected by rule \(R2\) not only remains unknown but may also grow from level to level. This problem can be avoided if rule \(R3\) is applied not at the end of the algorithm but at the end of building each level of a ST.

In the worst case, the DF has \(q=(_{n/2}^n )\) clauses with \(n/2\) literals per clause, and each literal appears in exactly \((_{n/2}^n )/2\) products of the DNF of the DF. This is to say that in the expansion (multiplication) of a products set \(W\) with a clause \(C\) by the ILE, the maximum possible numbers of expanded and unexpanded elements of the set \(W\) are the same and equal to \((_{n/2}^n )/2\), i.e., the WCSC of the ILE is

$$\begin{aligned} \Omega _{ILE} \left( n \right) =(_{n/2}^n ) \end{aligned}$$
(15)

From (13) and (15), it is seen that the amount of intermediate products produced by an ILE-based algorithm is \(n/2\) times less than that of the improved Nelson's algorithm. This is to say that in many cases where the improved Nelson's algorithm may overflow the memory, an ILE-based algorithm can work easily. In spite of this, \(\Omega _{ILE} (n)\) belongs to the same complexity class \(O(2^{n})\) as \(\Omega _{IN} (n)\). However, the ILE is applicable for datasets with dozens of attributes because, firstly, the space complexities of real AR tasks are usually much less than the mentioned WCSC and, secondly, the SLP applied to the BM before the ILE may seriously reduce the work the ILE has to do.

4.3 The WCSC of the SLP

Since the SLP was developed for solving problems with a few variables, such as theorem proving [36, 37], it has not been subjected to a serious computational complexity analysis. But in our opinion, since this method may be applied to the processing of DFs with dozens of attributes, such an analysis is necessary. For this aim, consider the BM structure given in [24], which is a 90-degree right-rotated form of the conventional BM. In such a BM, every row and column is labeled with one BC and one certain literal of the DF it represents, respectively. Since the layout of such a BM in memory is almost the same as its logic structure, it may be processed in an easier way. Moreover, every BM of this structure may easily be subjected to the commutative decomposition, according to which every BM can be partitioned into smaller and independent SBMs without losing its original features [38]. In our opinion, among the related approaches, the one that most satisfies this criterion is Socher's approach [36] summarized above (Sect. 3). In [24], two more important concepts for processing the BMs have been introduced: the weight of an attribute and the weight of a BC, obtained as the numbers of 1's in the column and in the row labeled by the attribute and the BC, respectively. While the weights of the attributes are used for selecting the best suited SL for the current iteration, the weights of the BCs are used for obtaining the space complexity of the exhaustive processing of the BM. According to [22, 24, 34, 36], the attribute with the highest weight in the BM to be processed in an iteration is chosen as the SL for that iteration. If there is more than one such attribute, any one of them may be chosen. An algorithm implementing the SLP is given in Fig. 3.

Fig. 3 The algorithm of the SLP

In the Algorithm_SLP (Fig. 3), \(R_{t}\) is the \(t\)th row of the SBM \(M_{i},\, W_{i}\) is the subset of prime paths derived from the SBM \(M_{i}\), and \(W\) is the union of the subsets \(W_{1}, W_{2}, {\ldots }, W_{i-1}\). In order to derive the SBM \(D_{i}\) from the SBM \(M_{i}\), first the \(M_{i}\) is considered as a set and second the \(D_{i}\) is derived from it by the statement \(A2\).

As is easy to see from Fig. 3, the Algorithm_SLP processes the original BM \(M_{1}\) iteratively so that, in the \(i\)th iteration, all paths containing the \(SL_{i}\) (the split literal for the \(i\)th iteration) but not containing any literal from the set \(\left\{ {SL_I } \right\} _{I=1}^{i-1}\) are generated (see statement A3), where \(i\in \{1,2, {\ldots },n-1\}\). This is to say that in the \(i\)th iteration, the algorithm processes a BM with \(n-i+1\) literals, including the \(SL_{i}\). Generally speaking, the algorithm begins with a BM of \(n\) literals, reiterates, removing one literal from the BM in every iteration, and ends when a SBM \(M_{i}=0\) (a SBM with a single zero row) appears. This is to say that for a BM with \(n\) columns (literals), the Algorithm_SLP will do at most \(n-1\) iterations. But the number of BCs in the SBM \(D_{i}\) processed in the \(i\)th iteration and the number of PIs generated in the same iteration may be as large as \((_{(n-i)/2}^{n-i} )\). It is easy to see that with each increment of \(i\) by 1, the value of \((_{(n-i)/2}^{n-i} )\) decreases approximately by half. In other words, the WCSC of the \((i+1)\)th iteration of the Algorithm_SLP is approximately two times less than that of the \(i\)th iteration.

As mentioned above, in the \(i\)th iteration, the BM \(M_{i}\) is partitioned into the SBMs \(D_{i}\) and \(M_{i+1}\). While the \(M_{i+1}\) is passed to the next iteration, the \(D_{i}\) is processed at once to find PIs derivable from it.
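A sketch of one such partitioning step (statements A1 to A3 of Fig. 3) is given below; it is our own reconstruction, with the rows of the BM stored as Python integers and with illustrative helper names.

```python
def column_weights(rows, n):
    """w(a): the number of 1's in every column of the BM (leftmost bit = a1)."""
    return [sum(r >> (n - 1 - k) & 1 for r in rows) for k in range(n)]

def slp_step(rows, n):
    """One SLP iteration: choose SL_i, derive D_i and the remainder M_{i+1}."""
    w = column_weights(rows, n)
    sl = w.index(max(w))                              # A1.1: split literal of highest weight
    b0 = 1 << (n - 1 - sl)                            # A1.2: the unit BC B_0
    d = [b0] + [r for r in rows if not r & b0]        # A2: rows with SL_i = 0, plus B_0
    cleared = [r & ~b0 for r in rows]                 # A3: clear the SL_i column ...
    m_next, seen = [], set()
    for r in cleared:                                 # ... and drop duplicate/absorbed rows
        if r and r not in seen and not any(s != r and r & s == s for s in cleared):
            m_next.append(r)
            seen.add(r)
    return b0, d, m_next                              # an empty m_next marks termination
```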

In the \(i\)th iteration, the Algorithm_SLP (Fig. 3) converts from CNF to DNF a SBM \(D_{i}\) with \(n-i+1\) nonzero columns (literals) and no more than \((_{(n-i)/2}^{n-i})\) rows (BCs). Since \(\max _i \left\{ {(_{(n-i)/2}^{n-i} )} \right\} =(_{(n-1)/2}^{n-1} )\), among the SBMs \(D_{1}, D_{2}, D_{3}, {\ldots },D_{n-1}\) processed in the 1st, 2nd, \({\ldots }\), \((n-1)\)th iterations of the algorithm, respectively, the one with the highest WCSC is \(D_{1}\). Therefore, below we will focus only on the WCSC of the 1st iteration of the Algorithm_SLP.

As is well known, the most complex BM representing a DF with \(n\) literals is a BM with \(q=(_{n/2}^n )\) rows, \(n\) columns, exactly \(w(c)=n/2\) 1's per row, and exactly \((_{n/2}^n )/2\) 1's per column [1, 24, 25]. The WCSC of the exhaustive conversion of such a BM to DNF is that of the Algorithm_ON (Fig. 1), i.e.,

$$\begin{aligned} \Omega _{EMC_1 } \left( n \right) =\Omega _{ON} \left( n \right) =(n/2)^{(_{n/2}^n )} \end{aligned}$$
(16)

As mentioned above, in the first iteration of a SLP-based algorithm, it is not the whole original BM \(M_{1}\) that is converted to DNF but only its SBM \(D_{1 }=\{R_{t}{\vert }R_{t}\in M_{1}\) and \(SL_{1}(R_{t})=0\}\), where \(R_{t}\) is the label of the row representing the \(BC_{t}\). Since in such a BM each column (attribute) has exactly \((_{n/2}^n )/2\) 0's, the SBM \(D_{1}\) will have exactly \((_{n/2}^n )/2\) rows, with the following WCSC of the exhaustive conversion:

$$\begin{aligned} \Omega _{ECD_1 } \left( n \right) =(n/2)^{(_{n/2}^n )/2}=\sqrt{\Omega _{EMC_1 } (n)} \end{aligned}$$
(17)

Namely, the SLP reduces the WCSC of the exhaustive processing of a BM at least by a factor of \(\sqrt{\Omega _{EMC_1 } (n)}\). The efficiency of this reduction grows almost quadratically with every increase of \(n\) by 1. For instance, for \(n=7\), 8, 9 and 10, while the orders of the values of \(\Omega _{EMC_1 } (n)\) are \(10^{19},\, 10^{42},\, 10^{82}\) and \(10^{176}\), the orders of the values of \(\Omega _{ECD_1 } (n)\) are \(10^{9},\, 10^{21},\, 10^{41}\) and \(10^{88}\), respectively. In spite of this, the space complexity of the mentioned conversion still remains very high for all \(n \ge 8\). However, the SLP has the following advantages that may be considered as a base for developing more efficient algorithms: 1) the WCSC of the approach cannot exceed the square root of that of the BM as a whole, 2) the number of PIs that may be generated in one iteration of the approach cannot exceed half of the maximum possible number of all PIs, 3) the number of iterations of the approach cannot exceed \(n-1\).

5 The combined SLP-ILE approach

As mentioned above, the SLP allows us to reduce the WCSC of the exhaustive processing of a BM from \(\Omega _{EMC_1 } \left( n \right) =(n/2)^{(_{n/2}^n )}\) to \(\Omega _{ECD_1 } \left( n \right) =\sqrt{\Omega _{EMC_1 } (n)}\) and the maximum possible number of simultaneously generated PIs from \((_{n/2}^n )\) to \((_{(n-1)/2}^{n-1} )\approx (_{n/2}^n )/2\). As for the ILE, it can reduce the WCSC of the mentioned processing from \(\Omega _{ECD_1 } \left( n \right) =\sqrt{\Omega _{EMC_1 } (n)}\) to \(\Omega _{ILE} \left( n \right) =(_{n/2}^n )/2\). That is, the SLP and ILE may be combined together such that the WCSC of the CNF to DNF conversion of a DF is reduced from \((n/2)^{(_{n/2}^n )}\) to \((_{n/2}^n )/2\). In order to do this, it is sufficient to replace statements \(A4\) to \(A6\) of the Algorithm_SLP (Fig. 3) with a procedure implementing the ILE. But since the most efficient system of operations for processing a BM is bitwise logic [16, 24], we should implement the ST-based ILE in terms of this logic.

5.1 The bitwise implementation of the ILE

In our opinion, the ILE is based on the commutative decomposition principle and the logic absorption law. According to this principle, a function is partitioned into smaller and independent sub-functions without losing its original features [38]. For example, the logic expression \(f\) consisting of the multiplication of a product \(P\) by a clause \(C=(x_{i}\vee x_{j} \vee {\cdots }\vee x_{h})\) may be expressed as follows:

$$\begin{aligned} f=P\left( {x_i \vee x_j \vee \cdots \vee x_h } \right) =x_i P\vee x_j P\vee \cdots \vee x_h P \end{aligned}$$
(18)

where \(i<j<h<n\). In order to explain the application of the logic absorption law to a function like (18), let us denote \(x_{i}P, x_{j}P, x_{k} P, {\ldots }, x_{h}P\) by \(F_{i}, F_{j}, F_{k}, {\ldots }, F_{h}\), respectively. For every \(F_{g\in {\{i, j, k, \ldots , h\}}}\), rule \(R2\) (Sect. 3) may be rewritten as:

$$\begin{aligned} \text {if} \,x_g \in \hat{P}\, \text {then} \, F_g =P\, \text {else}\,F_g =x_g P \end{aligned}$$
(19)

where \(\hat{P}\) is the set of literals present in \(P\). The bitwise representation of (19) is as follows:

$$ \begin{aligned} \text {if}\, BP_i \& B_t \ne 0\, \text {then}\, F_g =BP_i \, \hbox {else}\, F_g =BP_i |E(B_t) \end{aligned}$$
(20)

where \(BP_{i }\) is a bit-product, \(B_{t}\) is a BC, \(E(B_{t})\) is the set of unit BCs generated according to (9), and "\(\vert \)" is the sign of the bitwise OR operation used for bitwise ANDing the BPs and unit BCs. Note that the BP representation of a product \(P\) contains 1's and 0's in the positions of the literals present and not present in \(P\), respectively.

Equation (20) says that "if \(BP_{i}\) & \(B_{t}\ne 0\), then \(BP_{i}\) may be preserved (as a candidate to be a PI) without any operation on it; otherwise it must be expanded by the \(B_{t}\)". Based on this corollary, our main procedure, given by (8) and to be realized by the ILE, can be expressed as the following system of equations:

$$ \begin{aligned} V_{t1}&= \left\{ {BP:BP\in V_{t-1} \,and \,BP \& B_t \ne 0} \right\} ;V_{t2} =V_{t-1} -V_{t1}\end{aligned}$$
(21)
$$\begin{aligned} E\left( {B_t } \right)&= \left\{ {b_{tk} :b_{tk} =Pr_k \left( {B_t } \right) \,and\,b_{tk} =1} \right\} \end{aligned}$$
(22)
$$\begin{aligned} Q&= V_{t2} |E(B_t )\end{aligned}$$
(23)
$$ \begin{aligned} V_{t3}&= \left\{ {h:h\in Q\,and\,\bar{\exists } z\in V_{t1} :z \& h=z} \right\} \end{aligned}$$
(24)
$$\begin{aligned} V_t&= V_{t1} \cup V_{t3} . \end{aligned}$$
(25)

As can be seen from (21)-(25), while the set \(V_{t1}\) consists of those elements of \(V_{t-1}\) that are to be included into the set \(V_{t}\) without any expansion, the set \(V_{t2}\) contains those elements of \(V_{t-1}\) that are to be expanded by \(B_{t}\) (multiplied by \(E(B_{t})\)) and then included into the set \(V_{t}\). This is to say that no element of \(V_{t1}\) may contain more literals than any element of the set \(V_{t3}\). In other words, some elements of \(V_{t3}\) may be absorbed by some elements of \(V_{t1}\), but not vice versa; i.e., \(V_{t3}\) may contain redundant BPs. To discard them instantly, each new element generated for \(V_{t3}\) should be checked as to whether it is absorbed by any element of \(V_{t1}\). In this way, the set of all non-redundant BPs generated in the \(t\)th level of a ST is obtained and included into the set of prime BPs by (24) and (25), respectively. Equations (21)-(25) can be implemented by the Algorithm_ILE (Fig. 4), where \(B_{0}\) is the unit BC with a 1 in the position of the \(SL_{i}\) and 0's in all other positions. In the algorithm, the contribution of a SL has been realized indirectly via the \(B_{0}\). This allows us to process every BM (or SBM) using only parallel bitwise operations on its rows.

Fig. 4 The bitwise algorithm for the ILE

In the Algorithm_ILE (Fig. 4), statement \(H1\) initializes the set \(V_{0}\), statement \(H2\) implements (21), statement \(H3\) implements (22) and initializes the set \(V_{t3}\), and statements \(H4\) to \(H7\) implement (23) and (24), discarding each unpreventable RP once it is generated. Statement \(H8\) implements (25).
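The following Python sketch is our own reconstruction of this bitwise ILE (Eqs. (21)-(25)); products and BCs are plain integers, \(d\) is the SBM \(D_{i}\) with its unit BC \(B_{0}\) as the first element, and all names are illustrative.

```python
def expand_unit(bc, n):
    """E(B_t) of Eq. (9): the unit BCs of all 1-bits of bc."""
    return [1 << k for k in range(n) if bc >> k & 1]

def ile(d, n):
    b0, rest = d[0], d[1:]
    v = [b0]                                          # H1: V_0 = {B_0}
    for bt in rest:
        v1 = [p for p in v if p & bt]                 # (21): products already covering B_t
        v2 = [p for p in v if not p & bt]             # (21): products to be expanded
        v3 = []
        for p in v2:                                  # (23): expand by every unit BC of B_t
            for u in expand_unit(bt, n):
                h = p | u
                # (24): discard h at once if it is absorbed by an element of V_t1
                if h not in v3 and not any(z & h == z for z in v1):
                    v3.append(h)
        v = v1 + v3                                   # (25): V_t = V_t1 ∪ V_t3
    return v                                          # the prime bit-products of D_i
```

Encoding the rows of the SBM \(D_{1}\) of Table 8a as integers and feeding them to this sketch should reproduce, up to ordering, the sets \(V_{1}, {\ldots }, V_{5}\) traced step by step in the example of Sect. 5.2 below.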

5.2 The SLP-ILE algorithm

As mentioned above, the SLP iteratively (sequentially) partitions a BM into SBMs so that the WCSC of the SBM processed in any iteration is much less than that of the original BM. Since in this approach every SBM is transformed into the appropriate subset of PIs by the conventional CNF to DNF conversion, the processing of SBMs of high dimensions may overflow the computer memory. As mentioned above, the main factor complicating the CNF to DNF conversion is that, besides the essential products, numerous RPs are usually generated. We can avoid the generation of these RPs by replacing statements \(A4\) to \(A6\) in the Algorithm_SLP (Fig. 3) with the Algorithm_ILE (Fig. 4), as shown in Fig. 5.

Fig. 5 The Algorithm_SLP_ILE

As mentioned before, the most complex SBM generated by statements A1.1, A1.2, A2, and A3 of the Algorithm_SLP_ILE is the SBM \(D_{1}\) with \(n-1\) literals apart from the SL. According to (15), this SBM is processed by the Algorithm_ILE with the WCSC \((_{(n-1)/2}^{n-1} )\approx (_{n/2}^n )/2\). Albeit this WCSC is almost exponential in \(n\), the probability of occurrence of a function with this WCSC is as small as \(1/(\left( {n-1} \right) !+n^{n-2})\), where \(\left( {n-1} \right) !+n^{n-2}\) is the number of all possible minimal forms of a function of \(n\) variables [39]. In particular, this probability becomes less than 0.001 even at \(n=6\) and decreases more than \(2n\) times with every increment of \(n\) by 1.

Example

As an example, we select the dataset Diabet [40] with \(m=768\) objects and \(n=8\) attributes denoted by \(a_{1},a_{2} ,{\ldots },a_{8}\). This dataset was preferred because it is the smallest of the real datasets available to us that is suitable for demonstrating all details of the proposed algorithm.

For the dataset Diabet, \(Q=294568\) BCs were generated, of which at most \(q=(_{n/2}^n )=70\) can be prime. But in this example, fortunately, \(q=15<70\). Namely, 294553 BCs have been removed as absorbed by the 15 prime BCs (PBCs) denoted by \(R_{1}, R_{2}, {\ldots }, R_{15}\) and represented as the BM \(M_{1}\) given in Table 7.

Table 7 The BM \(M_{1}\) representing the set of PBCs for the dataset Diabet

In the BM \(M_{1}\) (Table 7), the elements of the extra row \(w(a)\) and the extra column \(w(R)\) contain the weights of the attributes \(a{\in }\{a_{1}, a_{2}, {\ldots }, a_{8}\}\) and of the BCs \(R{\in }\{R_{1}, R_{2}, {\ldots }, R_{15}\}\), respectively. The \(w(a)\) and \(w(R)\) are obtained as the numbers of 1's present in the column and in the row labeled by \(a\) and \(R\), respectively. For ease of processing the BM, its rows are ordered by the value of \(w(R)\).

The space complexity of the BM \(M_{1}\) is \(\Omega _{EMC_1 } \left( n \right) =\prod _1^{15} w\left( {R_i } \right) =2.8\times 10^{9}\). The PIs for the dataset are obtained by the Algorithm_SLP_ILE as follows:

The \(\mathbf{1}{\mathbf{st}}\) Iteration of the Until Statement: i = 1, input is \(M_{1}\)

  • A1.1: Since in the BM \(M_{1}\) (Table 7) \(max\{w(a_{j})\}_{j=1}^8 =w(a_{7})= 10\), \(SL_{1}= a_{7}\)

  • A1.2: \(B_{0}=00000010\)

  • A2: \(D_{1 }= \{B_{0}\} \cup \{R_{t}: R_{t} \in M_{1}\) and \(R_{t}(SL_{1})=0\}=\{B_{0}, R_{8}, R_{9}, R_{13}, R_{14}, R_{15}\}\), which are re-denoted as \(\{B_{0}, \,B_{1}, B_{2}, B_{3}, B_{4}, B_{5}\}\), respectively (Table 8a)

  • A3: \(M_{2}=M_{1}\) with the column \(SL_{1}=a_{7}\) cleared and the redundant rows \((R_{8},R_{9},R_{13},R_{14},R_{15})\) removed (Table 8b)

Table 8 The SBMs \(D_{1}\) (a) and \(M_{2}\) (b)

The space complexity of the SBM \(D_{1}\) to be processed in the first iteration is \(\Omega _{ECD_1 } \left( n \right) ={\prod }_0^5 w\left( {B_t } \right) =1\times 4\times 5\times 6\times 6\times 6=4320 \ll \sqrt{\Omega _{EMC_1 } (n)}=\sqrt{2.8\times 10^{9}}=52915\). But according to (15), the Algorithm_SLP_ILE will process the \(D_{1}\) with the WCSC \(\Omega _{ILE} \left( {n=8,i=1} \right) \le (_{(8-1)/2}^{8-1} )=35\). Since \(\Omega _{ECD_1} \left( {n=8,i=1} \right) >\Omega _{ECD_i } \left( {n-i+1} \right) \) and \(\Omega _{ILE} \left( {n=8,i=1} \right) >\Omega _{ILE} \left( {n-i} \right) \) for all \(i>1\), we will estimate only the first iteration of the Until statement of the algorithm.

Start of the Algorithm_ILE \((\mathbf{D}_{\mathbf{i}})\) at \(\mathbf{i}=\mathbf{1}\). Since the SBM \(D_{1}\) has 5 rows, the Algorithm_ILE \((D_{1})\) will do 5 iterations.

  • H1: \(V_{0} =\{B_{0}\}=\{00000010\}\)

The \(\mathbf{1}{\mathbf{st}}\) iteration: \(\hbox {t} = 1\), inputs are \(V_{0}\) and \(B_{1}\)

  • H2:\(V_{11}=\{P:P\in V_{0}\) and \( P \& B_{1}\ne 0\}=\emptyset ;\, V_{12}=V_{0}-V_{11} =\{00000010\}\)

  • H3:\(E(B_{1})=\{Pr_{k}(B_{1}):b_{1k} =1\}=\{10000000, 01000000, 00100000, 00000100\}\)

  • H4-H7:\(V_{13}=V_{12}{\vert }E(B_{1})=\{10000010, 01000010, 00100010, 00000110\}\)

  • H8:\(V_{1}=V_{11} \cup V_{13}=V_{13}\)

The \(\mathbf{2}{\mathbf{nd}}\) iteration: \(\hbox {t} = 2\), inputs are \(V_{1}\) and \(B_{2}\)

  • H2:\(V_{21} =\{P:P\in V_{1}\) and \( P \& B_{2}\ne 0\}=\{01000010, 00100010, 00000110\}; \,V_{22}=V_{1}-V_{21} =\{10000010\}\)

  • H3:\(E(B_{2})=\{Pr_{k}(B_{2}):b_{2k}=1\}=\{01000000, 00100000, 00010000, 00000100, 00000001\}\)

  • H4-H7:\(V_{23}=V_{22}{\vert }E(B_{2})=\{10010010, 10000011\} //\)Here were generated and discarded 3 URPs: 11000010, 10100010 and 10000110.

  • H8:\(V_{2}=V_{21} \cup V_{23} =\{01000010, 00100010,00000110, 10010010, 10000011\}\)

The \(\mathbf{3}{\mathbf{rd}}\) iteration: \(\hbox {t}=3\), inputs are \(V_{2}\) and \(B_{3}\)

  • H2:\(V_{31} \!=\!\{P:P \!\in \! V_{2}\) and \( P \& B_{3}\ne 0\}\!=\!\{01000010, 00100010, 10010010, 10000011\};\) \(V_{32}=V_{2}-V_{31} =\{00000110\}\)

  • H3:\(E(B_{3})= \{Pr_{k}(B_{3}):b_{3k} =1\}=\{10000000, 01000000, 00100000, 00010000,\) \(00001000, 00000001\}\)

  • H4-H7:\(V_{33}=V_{32} \vert E(B_{3})= \{10000110, 00010110, 00001110, 00000111\}\, //\)Here were generated and discarded 2 URPs: 01000110 and 00100110

  • H8:\(V_{3}=V_{31} \cup V_{33} =\{01000010, 00100010, 10010010, 10000011, 10000110,\) \(00010110, 00001110, 00000111\}\)

The \(\mathbf{4}{\mathbf{th}}\) iteration: \(\hbox {t}=4\), inputs are \(V_{3}\) and \(B_{4}\)

  • H2:\(V_{41} =\{P:P\in V_{3}\) and \( P \& B_{4}\ne 0\}=\{00100010, 10010010, 10000011,\) \(10000110, 00010110, 00001110, 00000111\}; \,V_{42}=V_{3}-V_{41} =\{01000010\}\)

  • H3:\(E(B_{4})= \{Pr_{k}(B_{4}):b_{4k} =1\}=\{10000000, 00100000, 00010000, 00001000,\) \(00000100, 00000001\}\)

  • H4-H7:\(V_{43}\!=\!V_{42}\vert E(B_{4})\!=\! \{11000010, \,01010010, 01001010, 01000110, 01000011 \}\, //\) Here was generated and discarded 1 URP: 01100010

  • H8:\(V_{4}=V_{41} \cup V_{43} =\{00100010, 10010010, 10000011, 10000110, 00010110,\) \(00001110, 00000111, 11000010, 01010010, 01001010, 01000110, 01000011\}\)

The \(\mathbf{5}{\mathbf{th}}\) iteration: \(\hbox {t}=5\), inputs are \(V_{4}\) and \(B_{5}\)

  • H2:\(V_{51} \!=\!\{P:P\in V_{4}\) and \( P \& B_{5}\ne 0\}\!=\{10010010, 10000011, 10000110, 00010110,\) \(00001110, 00000111, 11000010, 01010010, 01001010, 01000110, 01000011\}; \,V_{52}=V_{4}-V_{51} = \{00100010\}\)

  • H3:\(E(B_{5})=\{Pr_{k}(B_{5}):b_{5k} =1\}=\{10000000, 01000000, 00010000, 00001000,\) \(00000100, 00000001\}\)

  • H4-H7:\(V_{53}=V_{52}\vert E(B_{5})=\{10100010, 01100010, 00110010, 00101010, 00100110,\) \(00100011\}\)

  • H8:\(V_{5}=V_{51} \cup V_{53} =\{10010010,10000011, 10000110, 00010110, 00001110,\) \(00000111, 11000010, 01010010, 01001010, 01000110, 01000011, 10100010, 01100010,\) \(00110010, 00101010, 00100110, 00100011\}\)

  • H9:\(W_{1} =V_{5};\) The end of the Algorithm_ILE

While in this iteration of the Until statement, conventionally, \(\sum _{t=0}^4 |V_t| \times |E(B_{t+1})|=1\times 4+4\times 5+5\times 6+8\times 6+12\times 6=174\) multiplications with the space complexity \(max\{4, 20, 30, 48, 72, 17\} = 72\) would have to be performed, where the number 17 is the size of the set \(W_{1}\), the algorithm performs only \(\sum _{t=1}^5 |V_{t2}| \times |E(B_t )|=1\times 4+1\times 5+1\times 6+1\times 6+1\times 6=27\) multiplications with the space complexity \(max\{4, 5, 6, 8, 12, 17\} = 17\).

The \(\mathbf{2}{\mathbf{nd}}\) Iteration of the Until Statement: \(\hbox {i}=2\), input is \(\hbox {M}_{2}\)

  • A1.1: Since, in the BM \(M_{2}\) (Table 8b), \(max\{w(a_{j})\}_{j=1}^8 = w(a_{3})= w(a_{6}) =5,\, SL_{2}= a_{3};\,A1.2: B_{0}=00100000\)

  • A2:\(D_{2 }= \{B_{0}\} \cup \{R_{t}: R_{t} \in M_{2}\) and \(R_{t} (SL_{2})=0\}=\{B_{0}, \, R_{1}, R_{2}, R_{3}, R_{4}, R_{10}\}\), which are re-denoted as \(\{B_{0}, \,B_{1}, B_{2}, B_{3}, B_{4}, B_{5}\}\), respectively (Table 9a)

  • A3:\(M_{3}=M_{2}\) with the column \(SL_{2}= a_{3}\) cleared and the redundant (absorbed) rows removed (Table 9b).

Table 9 The submatrices \(D_{2}\) (a) and \(M_{3}\)(b)

Start of the Algorithm_ILE \((\mathbf{D}_{\mathbf{i}})\) at \(\mathbf{i}=\mathbf{2}\). Since the SBM \(D_{2}\) has 5 rows, the Algorithm_ILE \((D_{2})\) will do 5 iterations.

  • H1:\(V_{0} = \{00100000\}\)

The \(\mathbf{1}{\mathbf{st}}\) iteration: \(\hbox {t}=1\), inputs are \(V_{0}\) and \(B_{1}\)

  • H2:\(V_{11} =\{P:P\in V_{0}\) and \( P \& B_{1}\ne 0\}=\emptyset ;\, V_{12}=V_{0}-V_{11} =\{00100000\}\)

  • H3:\(E(B_{1})= \{Pr_{k}(B_{1}):b_{1k} =1\}=\{00000100, 00000001\}\)

  • H4-H7:\(V_{13}=V_{12}\vert E(B_{1})=\{00100100, 00100001\}\)

  • H8:\(V_{1}= V_{11} \cup V_{13 }=V_{13}\)

The \(\mathbf{2}{\mathbf{nd}}\) iteration: \(\hbox {t}=2\), inputs are \(V_{1}\) and \(B_{2}\)

  • H2:\(V_{21}=\{P:P\in V_{1}\) and \( P \& B_{2}\ne 0\}=\{00100001\};\, V_{22}= V_{1 }-V_{21} = \{00100100\}\)

  • H3:\(E(B_{2} )=\{Pr_{k}(B_{2}):b_{2k} =1\}=\{01000000, 00000001\}\)

  • H4-H7:\(V_{23}=V_{22}\vert E(B_{2})=\{01100100\} //\)Here was generated and discarded 1 URP: 00100101

  • H8:\(V_{2}=V_{21} \cup V_{23} =\{00100001, 01100100\}\)

The \(\mathbf{3}{\mathbf{rd}}\) iteration: \(\hbox {t}=3\), inputs are \(V_{2}\) and \(B_{3}\)

  • H2:\(V_{31} =\{P:P\in V_{2 }\) and \( P \& B_{3}\ne 0\}=\{01100100\};\, V_{32}=V_{2}-V_{31} =\{00100001\}\)

  • H3:\(E(B_{3})=\{Pr_{k}(B_{3}):b_{3k} =1\}=\{01000000, 00000100\}\)

  • H4-H7:\(V_{33}=V_{32}\vert E(B_{3})=\{01100001, 00100101\}\)

  • H8:\(V_{3}=V_{31} \cup V_{33} =\{01100100,01100001, 00100101\}\)

The \(\mathbf{4}{\mathbf{th}}\) iteration: \(\hbox {t}=4\), inputs are \(V_{3 }\)and \(B_{4}\)

  • H2:\(V_{41}=\{P:P\in V_{3}\) and \( P \& B_{4}\ne 0\}=\{01100100, 01100001\};\, V_{42}=V_{3 }-V_{41} =\{00100101\}\)

  • H3:\(E(B_{4})=\{Pr_{k}(B_{4}):b_{4k}=1\}= \{01000000, 00010000\}\)

  • H4-H7:\(V_{43}=V_{42}\vert E(B_{4})=\{00110101\}\, //\)Here was generated and discarded 1 URP 01100101;

  • H8:\(V_{4}=V_{41} \cup V_{43} =\{01100100, 01100001, 00110101\}\)

The \(\mathbf{5}{\mathbf{th}}\) iteration: \(\hbox {t}=5\), inputs are \(V_{4}\) and \(B_{5}\)

  • H2:\(V_{51}=\{P:P\in V_{4}\) and \( P \& B_{5}\ne 0\}=\{01100100, 00110101\};\, V_{52}=V_{4}-V_{51} =\{01100001\}\)

  • H3:\(E(B_{5})=\{Pr_{k}(B_{5}):b_{5k} =1\}=\{10000000, 00010000, 00001000, 00000100\}\)

  • H4-H7:\(V_{53}=V_{52}\vert E(B_{5})=\{11100001, 01110001, 01101001\} \, //\) Here was generated and discarded 1 URP: 01100101

  • H8:\(V_{5}=V_{51} \cup V_{53} =\{01100100, 00110101, 11100001, 01110001, 01101001\}\)

  • H9:\(W_{2}=V_{5}\); The end of the Algorithm_ILE

To complete this example, two more iterations of the Until statement, with \(i=3\) and \(i=4\), are to be performed. At \(i=3\), first the SBM \(M_{3}\) (Table 9b) is partitioned by \(SL_{3}=a_{2}\) (either \(a_{6}\) or \(a_{8}\) could be used instead of \(a_{2}\)) into the SBMs \(D_{3}\) and \(M_{4}\) (Table 10), and then \(D_{3}\) is converted into the set of PBPs \({\mathbf {W}}_{3} =\{01000101,11010100,11001100, 11010001,11001001\}\).

Table 10 The submatrices \(D_{3}\) (a) and \(M_{4}\) (b)

At \(i=4\), first the SBM \(M_{4}\) (Table 10b) is partitioned by \(SL_{4}=a_{1}\) (either \(a_{4}\), \(a_{6}\) or \(a_{8}\) could be used instead of \(a_{1}\)) into the SBMs \(D_{4}\) and \(M_{5}\) (Table 11), and then \(D_{4}\) is converted into the set of PBPs \(W_{4} =\{10010101\}\).

Table 11 The submatrices \(D_{4}\) (a) and \(M_{5}\) (b)

Since the SBM \(M_{5}\) consists of a single zero row, the Algorithm_SLP_ILE is terminated with:

  • A8. \(W=\bigcup \nolimits _{j=1}^4 W_j = \{10010010, 10000011, 10000110, 00010110, 00001110, 00000111\), \(11000010, 01010010, 01001010, 01000110, 01000011, 10100010, 01100010, 00110010\), \(00101010, 00100110, 00100011, 01100100, 00110101, 11100001, 01110001, 01101001\), \(01000101, 11010100, 11001100, 11010001, 11001001, 10010101\}\);

  • A9. Consider each element of \(W\) as a PI of the DF: \(W_{PI} =\{a_{1}a_{4}a_{7}, a_{1}a_{7}a_{8}, a_{1}a_{6}a_{7}\), \(a_{4}a_{6}a_{7}, a_{5}a_{6}a_{7}, a_{6}a_{7}a_{8}, a_{1}a_{2}a_{7}\), \(a_{2}a_{4}a_{7}, a_{2}a_{5}a_{7}\), \(a_{2}a_{6}a_{7}, a_{2}a_{7}a_{8}\), \(a_{1}a_{3}a_{7}, a_{2}a_{3}a_{7}, a_{3}a_{4}a_{7}, a_{3}a_{5}a_{7}\), \(a_{3}a_{6}a_{7}\), \(a_{3}a_{7}a_{8}, a_{2}a_{3}a_{6}, a_{3}a_{4}a_{6}a_{8}, a_{1}a_{2}a_{3}a_{8}, a_{2}a_{3}a_{4}a_{8}, a_{2}a_{3}a_{5}a_{8}\), \(a_{2}a_{6}a_{8}, a_{1}a_{2}a_{4}a_{6}\), \(a_{1}a_{2}a_{5}a_{6}\), \(a_{1}a_{2}a_{4}a_{8}, a_{1}a_{2}a_{5}a_{8}, a_{1}a_{4}a_{6}a_{8}\}\).
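Step A9 is purely mechanical; a tiny helper (ours, for illustration) that performs this decoding is shown below, with the leftmost bit mapped to \(a_{1}\) as in the attribute-bit structure (5).

```python
def decode(bit_product, names):
    """Turn a bit-product such as '10010010' into the attribute product a1a4a7."""
    return "".join(a for a, b in zip(names, bit_product) if b == "1")

names = [f"a{k}" for k in range(1, 9)]
print(decode("10010010", names))   # -> a1a4a7
print(decode("00110101", names))   # -> a3a4a6a8
```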

6 Experimental results

To estimate the performance of the proposed method, we compare the results generated by it with the results generated by RSES (Rough Set Exploration System), which is the only available exhaustive search-based AR program generating all reducts. In the experiments, we used a target machine with an Intel Core i7-3840QM @ 2.80 GHz processor and 16 GB of memory, running the Microsoft Windows 8 Ultimate Edition OS. For the experiments, we chose datasets with different characteristics, such as the number of attributes, the number of classes, the number of distinct values of the attributes, the types of attribute values (symbolic, numeric and mixed), and the number of objects (instances). We loaded these datasets into memory without any preprocessing. The results of the experiments, measured by Microsoft's Process Explorer utility, show that our algorithm has important advantages over RSES in terms of both the memory used and the time elapsed. As examples, the results of processing 23 datasets from the UCI repository [40] are given in Table 12, where "\(>\hbox {X}\)" denotes the expression "greater than X". Processing some of these datasets by RSES failed due to memory overflow.

Table 12 The amounts of the memory used, the CPU-times elapsed, and the number of reducts generated by SLP-ILE and RSES approaches

As seen from Table 12, our method requires much less memory and much less processing time than RSES, especially for medium- and large-sized datasets. For example, for the dataset Sonar with 60 attributes, 208 objects and 2 classes, RSES failed after 404788.14 seconds with no result due to memory overflow at about 1800 MB of intermediate results, while our program used only 7.48 MB of memory and generated all of the 86872 reducts in only 21.03 seconds.

For evaluating the comparative performance of the reducts generated by the SLP-ILE approach, we also generated reducts for the 11 datasets given in Table 12 by widely used AR approaches such as the Correlation-based approach (CBA), the Consistency approach (CA), and the Wrapper approach (WA) [41]. The sizes and the attributes of the reducts generated by the mentioned AR approaches are given in Table 13.

Table 13 The size and the attributes of reducts generated by AR approaches

Note that the shortest reduct is not always the reduct with the best classification accuracy [10, 42]. Therefore, the SLP-ILE column of Table 13 gives not the sizes of the shortest reducts but the sizes of the reducts with the best classification accuracies.

For evaluating the classification accuracies of the reducts generated by SLP-ILE, CBA, CA, and WA, we used the Euclidean distance-based k-nearest neighbor (k-NN) classification algorithm with k = 9. The classification results for the datasets given in Table 13, obtained with 10-fold cross-validation, are given in Table 14.

Table 14 The classification accuracies and the elapsed CPU-Times for classification of reducts generated by AR approaches

As seen from Table 14, the reducts generated by SLP-ILE for 12 of the 14 datasets provide significantly higher classification accuracies than those generated by the other AR approaches. Moreover, the CPU time elapsed by SLP-ILE is usually lower than that of the other AR approaches.

7 Conclusions

At present, the generation of the complete set of reducts for a dataset can be done only by the exhaustive search program RSES, which, unfortunately, is very time-consuming and suffers from frequent memory overflows even for datasets of medium dimensionality. The proposed algorithm is based on sequentially partitioning the BM of a DF into SBMs and preventing the generation of redundant products during the transformation of a SBM into PIs (reducts) of the DF. Although this approach reduces the WCSC of DF-based AR on a large scale, the WCSC still remains theoretically exponential in \(n\). This is mainly because the mentioned WCSC is evaluated for DFs containing all \((_{n/2}^n )\) combinations of \(n\) attributes with uniformly distributed frequencies. But since the space complexity of the proposed algorithm decreases rapidly as the imbalance in the column weights of the BM increases, it may be quite useful for problems with tens and even hundreds of attributes whose DFs have seriously imbalanced frequency distributions. This is confirmed by the analysis of the results of processing numerous datasets borrowed from practice and from the UCI repository. The proposed approach also allows us to process the original BM in parallel by first partitioning it into the SBMs \(D_{1},\, D_{2}, {\ldots },\, D_{k\le n-1}\), then processing them separately, and finally uniting the separately obtained results. In this respect, if some SBM is found to be so large that it cannot be processed by the computer used, it may itself be processed as a pseudo-original BM, and its result united with that of the rest of the processing of the original BM. It should be noted that the proposed approach not only reduces the WCSC of the DF to DNF conversion on a very large scale, but also reduces the WCTC of this conversion from the square of the WCSC to a value close to the WCSC itself. Moreover, when only a single reduct is to be found, it may be obtained simply by considering every \(D_{i}\) as \(M_{i+1}\) and taking each SL as a component of the reduct. We believe that the proposed approach could be an efficient solution for the unweighted set cover problem used for solving a wide range of discrete optimization problems in knowledge and data engineering.