1 Introduction

The use of computational processing techniques has increased in the last years owing to the massive amount of data generated nowadays. In the field of data mining, the learning of association rules (AR) is a popular method used to discover interesting and apparently hidden relations among variables in large databases [2, 4].

There exist many algorithms focused on the discovery of AR in datasets with discrete values such as AIS [1] or Apriori [2]. However, the domain of most of the real-world data applications is continuous [25], and in that case, the AR are known as quantitative association rules (QAR). A common strategy to deal with continuous values is to discretize the attributes, but this loss of information may lead to poor results in some contexts.

Evolutionary algorithms (EA) [28] and genetic programming [19] have been widely used to discover AR and QAR since they offer several advantages for knowledge extraction and, specifically, for rule induction processes. EA were originally designed to solve single-objective optimization problems. Nevertheless, the use of EA to solve multi-objective real-world optimization problems has increased recently. Multi-objective optimization searches for a set of non-dominated solutions (rules in this context), called the Pareto-optimal set [38], whereas the single-objective optimization seeks to obtain the best solution for a single objective.

The mining process of AR can be considered as a multi-objective problem in which the measures used to evaluate a rule are different objectives in conflict that require simultaneous optimization [12]. In this context, two goals are pursued: to discover the rules that will form the Pareto-optimal set, and moreover, these rules have to be as diverse as possible. The most common multi-objective evolutionary algorithms (MOEA, from now on) based on a Pareto-optimal set are strength Pareto-evolutionary algorithm (SPEA-II) [37] and non-dominated sorting genetic algorithm (NSGA-II) [10].

Most of the MOEA are based on a combination of weighted objectives in a fitness function. Although a weighted scheme could be solved by a single-objective optimization, several disadvantages are identified such as the best trade-off for all the objectives could not be found, all non-dominated solutions could not be identified or the solutions depend on the weight assignments. Hence, the methods based on the Pareto-optimal set improve the AR optimization process since neither weights nor previous information on the problem under analysis is required.

This work extends previous proposals which optimize a sum of weighted objectives into a multi-objective approach based on the well-known NSGA-II algorithm. In particular, a non-dominated MOEA to discover QAR, hereafter called MOQAR, is here described. MOQAR is able to find QAR in datasets with continuous attributes avoiding the discretization. This task is accomplished by using adaptive intervals and by not requiring that attributes belong to either the antecedent or the consequent of the rules.

On the other hand, a review of the literature reveals that there is no a general agreement on the selection of the best set of quality measures to evaluate QAR, that is, it does not exist a defined group of AR quality measures to be optimized. Therefore, the study proposed in [24] is used as first step to discover the most adequate objectives to be included in the fitness function in order to simultaneously optimize as many measures as possible.

Moreover, the effects on the performance of different strategies to rerank solutions without modifying other components of the evolutionary process in MOQAR are also analyzed. For this purpose, two substitute distance assignment methods based on those proposed in [17] and one new distance here proposed have been applied. These distances use different strategies to recompute the secondary ranking of the solutions.

In summary, the main contribution of this work is to introduce a new methodology to obtain the most suitable configuration settings to discover high-quality QAR by means of MOEA-based approaches. The methodology proposed attempts to provide the most suitable set of objectives to guide the search of the rules and distance assignment measures to rank the non-dominated solutions.

MOQAR has been tested on several real-world datasets of different nature to assess its performance according to the proposed techniques. Specifically, datasets from the public Bilkent University Function Approximation (BUFA) repository [16] have been used. The best configuration settings of objectives and distance assignments have been compared to those obtained by other existing MOEA to discover QAR, specifically, the MOEA proposed in [5, 15] and [21], henceforth named MODENAR, MOEA_Gosh and QAR-CIP-NSGAII, respectively. Nonparametric statistical tests have been used in the performance study.

The remainder of the paper is structured as follows. Section 2 presents a summary of several methods to discover QAR mainly focused on MOEA. Section 3 provides a brief preliminary on AR. Section 4 widely describes the main features of MOQAR, the statistical analysis carried out to select the best objectives to be optimized and the distances used in the secondary ranking of MOQAR. The results achieved by the different configuration settings of MOQAR according to the selected objectives and the substitute distance assignments are discussed in Sect. 5. This section also presents a comparative study to assess MOQAR’s performance versus other MOEA approaches. Finally, Sect. 6 summarizes the most relevant conclusions.

2 Related work

This section explores the different approaches recently published in the literature related to the extraction of QAR. A large amount of methods have been proposed to address the discovery of QAR by optimizing a fitness function composed of weighted objectives [22]. The authors in [26] proposed an EA-based approach, called GENAR, capable of obtaining a variable number of quantitative attributes in the antecedent of the rule. GENAR optimized a fitness function based on a sum of weighted measures such as the support and confidence measures and the number of recovered instances. The same quality measures plus the comprehensibility and the amplitude of the intervals forming the rule were included in the fitness function of the GAR-plus algorithm [28].

In [4], an EA is proposed as a search strategy to mine both positive and negative QAR within databases. The discovery of QAR was optimized by a sum weighted of objectives composed of support, confidence, number of attributes and amplitude. Another EA known as EARMGA was used in [35] in order to obtain QAR. However, the confidence was the only objective included in the fitness function. To fulfill this goal, the authors avoided the specification of the actual minimum support, which is the main contribution of this work indeed. A weighted support based on the individual weight of the items according to their importance in the dataset was calculated in [29].

Nevertheless, the majority of the approaches are focused on Pareto-based MOEA with the purpose of finding the best trade-off between two or more conflicting objectives. A thorough review of the state of the art of MOEA can be found in [36]. The QAR mining based on a multi-objective genetic algorithm (MOGA) was introduced in [15]. In that work, the comprehensibility, interest and confidence (or predictive accuracy) were the objectives to be optimized. These measures were also considered in a Pareto-based MOEA rule mining method, which used as model the GA presented in [34].

The authors in [11] proposed a fast and scalable MOGA for mining QAR from large datasets using parallel processing and a homogeneous network of workstations. The confidence, comprehensibility and interest of QAR were the target objectives.

A multi-objective Pareto-based EA called MODENAR was presented in [5]. The fitness function was composed of four different objectives: support, confidence, comprehensibility of the rule (aimed to be maximized) and the amplitude of the intervals that forms the rule (intended to be minimized). An approach for mining QAR by means of NSGA-II was proposed in [7]. Different measures were used in the evaluation, such as interestingness, comprehensibility, support, confidence and the number of attributes.

Another multi-objective EA to discover QAR is proposed in [31]. This algorithm does not take into account the minimum support and confidence and applies the FP-tree algorithm. The fitness function maximizes the correlation between support and confidence. The authors in [21] proposed a MOEA based on the NSGA-II scheme which optimizes the product between the support and confidence, the interest of the rules and the comprehensibility. Furthermore, the amplitude of the rules was also evaluated.

Finally, two approaches based on NSGA-II and SPEA-II using a grammar-guided genetic programming were presented in [20]. Both proposals obtained very frequent and reliable rules achieving an optimal trade-off in terms of support, confidence and interest.

As can be appreciated, most existing multi-objective approaches based on Pareto-optimal set extend the NSGA-II algorithm to optimize the discovery of QAR, as MOQAR does. Recently, new substitute distance arrangement mechanisms have been proposed in [17] to replace the crowding distance in NSGA-II due to its poor performance in multi-objective optimization problems. However, all the existing proposals based on NSGA-II to discover QAR typically use the crowding distance.

Table 1 summarizes the measures used for both evaluation and optimization in the works reviewed in this section. It can be noted that most of the works reviewed in this section use the support and the confidence to evaluate and optimize the quality of the rules obtained. However, the optimization of such measures is not usually enough. In fact, very general QAR could be obtained and the amplitude of the intervals could be enlarged until reaching the whole domain of each attribute if only the support is optimized. Regarding the confidence, the support of the consequent of the rule is not considered; therefore, it is not able to detect negative dependence among attributes. Other existing measures to assess the QAR quality are presented in Sect. 3.

Table 1 QAR algorithms and quality measures used in the literature

From the review of the literature, it can be concluded that an analysis of the quality measures is needed to select the most adequate objective function. In a previous work [24], the authors proposed a methodology to automatically select a subset of measures whose optimization leads to the optimization of the entire set of measures. This work was focused on finding relations among different quality measures in order to determine which measures must be included in the fitness function. To fulfill this goal, a principal component analysis (PCA) was used to select the best subset of measures.

In the light of the above considerations, this paper attempts to overcome limitations of the existing MOEA to optimize the QAR extraction process. Moreover, new features have been included in this proposal, which will be described in the following sections.

3 Association rule mining: definition and measures

This section provides a brief preliminary on AR including some definitions, in addition to some interestingness quality measures described in the literature.

The AR mining process is considered as a non-supervised learning technique to describe and explore data properties instead of predicting the class of new data. The main purpose of AR mining is to discover groups of attributes appearing frequently together in a dataset and provide comprehensive rules able to represent the existing relationships among the attributes.

Formally, AR were first defined by Agrawal et al. [1]. Let \(I=\{i_1, i_2, \ldots , i_n\}\) be a set of n items or attributes, and \(D=\{tr_1, tr_2, \ldots , tr_N\}\) a set of N instances in a dataset, where each \(tr_j\) contains a subset of items or attributes. Thus, a rule can be defined as \(X \Rightarrow Y\), where \(X, Y \subseteq I\) and \(X \cap Y = \emptyset , X\) and Y are known as antecedent (or left side of the rule) and consequent (or right side of the rule), respectively.

When the domain is continuous, the AR are known as QAR. In this context, let \(F=\{F_1,\ldots ,F_n\}\) be a set of features, with values in \(\mathbb {R}\). Let A and C be two disjunct subsets of F, that is, \(A \subset F, C \subset F\), and \(A \cap C = \emptyset \). A QAR is a rule \(X \Rightarrow Y\), in which features in A belong to the antecedent X, and features in C belong to the consequent Y, such that X and Y are formed by a conjunction of multiple Boolean expressions of the form \(F_i \in [v_1, v_2]\) (with \(v_1, v_2 \in \mathbb {R}\)). Thus, in a QAR the features or attributes of the antecedent are related with the features of the consequent, establishing an interval of membership values for each attribute involved in the rule. The consequent Y is usually a single expression.

Alternatively, it is needed to measure the quality of the rules in order to select the best ones and evaluate the results obtained by any algorithm. Support and confidence are the most common measures to evaluate AR. The support of the rule \(X \longrightarrow Y\) is the percentage of instances in the dataset that satisfy X and Y simultaneously. It is used to evaluate the generality of the rule. The confidence is the probability that instances satisfying X, also satisfy Y. In other words, it is the support of the rule divided by the support of the antecedent. Table 2 details the most popular quality measures proposed in the literature to assess the AR quality. Support and confidence measures are denoted as sup and conf, respectively, in Table 2. Furthermore, the mathematical definition, the description of the evaluated properties and the range of variability of the measures are provided. Note that n(X) is the number of occurrences of the itemset X in the dataset and N is the total number of instances in the dataset.

Table 2 Quality measures for quantitative association rules

4 Description of the algorithm

MOQAR is a non-dominated MOEA that discovers QAR for datasets with continuous attributes without a previous discretization. Adaptive intervals instead of fixed ranges are used to identify subgroups of samples whose features share certain sets of values in a continuous domain. MOQAR performs the search for the most adequate intervals, which are adjusted to find high-quality QAR. It should be noted that it is not necessary to set what variables must belong to the antecedent or to the consequent of the rules.

In the population of MOQAR, each individual constitutes a rule and each gene of an individual represents the interval bounds and the membership type of each attribute to the rule. Specifically, the representation of an individual consists of two data structures where the first structure includes all the attributes of the dataset, and the second structure indicates the membership of an attribute to the rule represented by an individual. The individual representation is described in [23]. The individuals are subject to an evolutionary process, in which specific mutation and crossover operators are applied. An iterative rule learning (IRL) [33] is performed to penalize instances already covered by rules found by MOQAR, in order to boost the covering of instances still not covered.

MOQAR reuses the individual representation and genetic operators of QARGA [23] (crossover operator and mutation operators focused on the type of the attributes or the interval of the attributes). However, MOQAR includes new features to improve the mining of QAR. The main achievements of MOQAR is that it solves the inherent drawbacks of the use of fitness functions based on a weighted objective scheme to obtain QAR as described in Sect. 1. A multi-objective approach is the best way to tackle such problems since the best trade-off among all the measures could be obtained. MOQAR is inspired by NSGA-II [10], which is a computationally fast and elitist MOEA based on a non-dominated sorting approach.

NSGA-II assigns primary and secondary rankings to solutions using two different procedures. The primary ranking is given by a non-dominated sorting and the secondary ranking by the crowding distance of the solutions [3]. The primary ranking is used to sort the solutions, and the secondary ranking is applied when solutions have equals primary rankings. Its main purpose is the evolution of the population according to the sort of the solutions in dominance fronts. The first non-dominated front is composed of the non-dominated solutions of the population (called Pareto-front), the second one is composed of the solutions dominated by one solution, the third one of solutions is dominated by two, and so forth.

The multi-objective evolutionary process of MOQAR based on the NSGA-II algorithm is thoroughly described in Sect. 4.1. Although MOQAR is a MOEA based on NSGA-II, new features have been added to improve it. These enhancements are detailed in the following sections:

  • Section 4.2 describes the generation of the initial population based on the instances previously covered by rules.

  • Section 4.3 presents the genetic operators of MOQAR. In particular, crossover and mutation operators focused on the type and intervals of the attribute have been applied.

  • Section 4.4 explains the selection of the most representative quality measures based on a principal component analysis (PCA) in order to optimize the discovery of QAR.

  • Finally, Sect. 4.5 exposes different distance criteria used in the secondary ranking of MOQAR as an alternative option to the crowding distance [10]. These distances are used in the selection of the best individual of all the non-dominated solutions of the Pareto-front at the end of the evolutionary process.

4.1 Multi-objective evolutionary model

This section describes the MOQAR’s evolutionary process. The pseudocode of the algorithm is shown in Fig. 1.

Fig. 1
figure 1

Pseudocode of the MOQAR algorithm

The algorithm starts by creating an initial population of parents, \(P_{t=0}\), with N individuals (line 4) focused on instances covered by not many rules. The individuals are evaluated by means of the measures selected as objectives (line 5). Then, the population evolves through nGen generations (line 6). Because the MOQAR algorithm aims at finding the rule that presents the best trade-off among all the objectives in each iteration, the rule that belongs to the first Pareto-front \(F_{1}\) with higher value in terms of the distance is selected (line 7). A solution a is better than other one b if a is located in a region with a greater crowding distance. Note that the extremal points of the Pareto-set are not considered to be selected since their crowding distances are always assigned with an infinite value. MOQAR performs an IRL process to boost the covering of instances still not covered. The instances covered by the best rule found are penalized (line 8) for this purpose. The samples covered by few rules have a higher priority to be selected for the initial population. The whole evolutionary process is repeated until the desired number of rules is reached and the best rules found are returned by MOQAR (line 11).

During the evolutionary process (procedure 2), an offspring population \(Q_{t}\), of same size as \(P_{t}\), is generated for each generation (line 17) by applying the crossover (line 15) and mutation (line 16) operators to a selection of individuals from \(P_{t}\). Note that the individuals are selected using binary tournament-based selection method [27]. Later, the individuals of \(P_{t}\) and \(Q_{t}\) are merged into \(R_{t}\) (line 18) and the population \(R_{t}\) is ranked using the fast non-dominated sort method (line 19), which consists in sorting the individuals of a population in different Pareto-fronts \(F=(F_1,F_2,\ldots )\) according to the level of non-domination (primary ranking). Therefore, each solution must be previously compared to every other solution in the population to check if it is dominated or not. The next population \(P_{t+1}\) (line 20) consists of the N best individuals of \(R_{t}\). The evolutionary process ends when the desired number of generations is reached (line 22).

The building process of the population for the next generation (procedure 3) works until \(P_{t+1}\) (lines 27–30) is completed with N individuals as follows: the individuals of the ith Pareto-front \(F_{i}\) are added to the population \(P_{t+1}\) if the number of individuals of population \(P_{t+1}\) is less or equal to N individuals when the individuals of the ith Pareto-front \(F_{i}\) are added (line 28). In other case, the crowding distance assignment (line 32) is used to sort the population of \(F_{i}\) (line 33), and the \(N - {\mid }P_{t+1}{\mid }\) best individuals are selected to fill the population of next generation \(P_{t+1}\) (line 34) (secondary ranking).

4.2 Generation of the initial population based on ranking

The initial population in MOQAR is generated at the beginning of each evolutionary process and is performed such that at least one sample of the dataset is covered. The samples of the dataset are selected based on their level of hierarchy. The hierarchy is organized according to the number of rules which cover a sample. Thus, the records are sorted by the number of rules they satisfy and the samples covered by few rules have the highest priority.

A sample is selected according to the inverse of the number of rules which cover the sample. Intuitively, the process is similar to the roulette selection method where the parents are selected depending on their fitness function. In this selection method, a sample is represented by a portion of roulette inversely proportional to the number of rules that cover the sample. Therefore, the samples covered by few rules have a wider portion of roulette and the probability of selection is higher. All samples are equally probable to be selected when generating the initial population for the first time.

The process that generates the initial population is listed below:

  1. 1.

    For all instances of the dataset, the cumulative sum totalSum of the inverse of the number of rules that cover every instance is computed.

  2. 2.

    A random number between 0 and totalSum is generated.

  3. 3.

    For each instance of the dataset, if the totalSum is greater or equal than the random number, then the current example is selected.

The selected example is used to generate a new individual of the initial population as follows:

  • First, the number of attributes which belong to the rule is randomly selected according to a maximum and minimum number of attributes given by the user.

  • Taking into account that number, the number of attributes in the antecedent and consequent of the rule are randomly selected limited by a maximum and minimum number of attributes given by the user.

  • Then, the attributes to appear in the antecedent and the consequent are randomly selected from those that belong to the dataset considering the structure of the rule. The user can define whether there are attributes fixed or not fixed in the consequent of the rule.

  • Finally, an upper and lower bound are randomly generated to create the interval for each attribute selected taking into account the instance selected in the above-described process.

This process is repeated until completing the number of individuals that compose the initial population.

4.3 Genetic operators

An elitist strategy replicating the individuals with the best fitness and a binary tournament-based selection method [27] for the remaining individuals of the population is applied in order to select the individuals to be parents.

The genetic operators implemented in MOQAR are the crossover and the mutation described in [23]. The mutation process consists in modifying the genes of the randomly selected individuals according to a probability. The mutation of a gene can be focused on the type of the attribute or the interval of the attribute. The choice between the mutation of the type or the interval depends on a given probability. The limits of the intervals and types of the attributes of the offspring are checked to ensure those represent meaningful rules. If any attribute does not fulfill the required constraints related to the type of attributes, the individual is discarded and a new mutation is applied to the same original individual.

In addition, the directional mutation operator, focused on the type of the attribute, has been added. If the selected attribute belongs to the antecedent of the rule, i.e., the attribute membership type is 1, then the type of the attribute changes to belong to the consequent of the rule and the new type is 2. On the contrary, if the selected attribute belongs to the consequent, i.e., the attribute membership type is 2, then the type of such attribute changes to the antecedent and now the type is 1.

4.4 Principal component analysis

A statistical analysis for the quality measures described in Sect. 3 has been applied to select the best subset of measures to be optimized by MOQAR, according to the study carried out in [24]. Following with that study, QARGA [23] has been applied to 32 datasets from the BUFA repository described in Sect. 5.1. In particular, three sets of experiments have been performed to obtain diversity in the input data of PCA. The first set of experiments has used a minimum support threshold equal to 0 to obtain all the QAR found by QARGA. The second and third sets have used a minimum support threshold equal to 0.05 and 0.1, respectively, to obtain only those QAR with a support greater or equal to the minimum support.

All configurations of QARGA have been executed by modifying the weight \(w_{a}\), corresponding to the attribute number measure, in order to include or not this measure in the fitness function of QARGA. QARGA was run 6 times (3 \(\times \) 2) for each dataset, that is, 192 times in total. Note that a maximum number of 100 QAR has been obtained for each configuration and dataset. Thus, a matrix composed of 15,487 rows (192 \(\times \) the number of QAR obtained by each configuration that satisfy the minimum threshold), and one column for each quality measure considered has been generated as PCA’s input. Once PCA has been applied to the input data, one measure representing each component is selected. Thus, all selected measures will be the objectives used by MOQAR as fitness function. Figure 2 illustrates the complete process including the application of QARGA and PCA to obtain the objectives to optimize as first step before executing MOQAR. The results obtained after applying PCA are presented and discussed in Sect. 5.3.

Fig. 2
figure 2

General process of MOQAR including the application of QARGA and PCA to obtain the objectives to be optimized

4.5 Substitute distance assignments

As defined in Sect. 4.1, a two-stage ranking is performed to define an order over the individuals of the population in the Pareto-dominance fronts of the NSGA-II algorithm [10].

The primary ranking considers that two individuals are equal if they both belong to the same Pareto-front, that is, if the individuals are dominated by the same number of solutions. In the case of a maximization problem, an individual x dominates another different individual y (denoted \(x < y\)) if the values of all objectives for x are greater than those for y. That is,

$$\begin{aligned} x<y \iff f_{i}(x) \ge f_{i}(y), \quad \forall i\in \{1,2, \ldots , M\} \end{aligned}$$
(1)

where \(f_{i}\) is the ith objective and M is the maximum number of objectives to be maximized.

When the individuals belong to the same Pareto-front, then the secondary ranking establishes differences among them by using a distance [36]. Individuals with low distance values are desirable. NSGA-II applies the crowded-comparison operator \(\prec _{n}\) to guide the selection of the individuals which will belong to the Pareto-optimal front [10]. The order \(\prec _{n}\) between two individuals x and y of the population, taking into account the result obtained by the primary ranking \(x_{r}\) and the crowding distance \(x_{dist}\), can be defined as follows:

$$\begin{aligned} x \prec _{n} y \iff (x_{r} < y_{r})\quad \text{ or }\quad (x_{r} = y_{r} \text{ and } x_{dist} < y_{dist}) \end{aligned}$$
(2)

However, the dominance among solutions becomes more and more unlikely when the number of objectives increases [17]. This fact is due to the decrease in the selective pressure in NSGA-II because the proportion of non-dominated solutions in the population is increased. Therefore, several substitute distance assignment measures have been proposed in [17] to be used instead of crowding distance to improve the NSGA-II convergence. These distance measures are focused on several criteria according to the degree in which a solution A is nearly dominated by a solution B. For instance, the number of objectives with smaller or larger values, the difference between the values of different objectives or a multi-criterion based on the former ones can be considered.

Several distances are now described to replace the crowding distance for the secondary ranking in NSGA-II and therefore in MOQAR. Specifically, the subvector dominance assignment (henceforth svDominance) and the epsilon dominance assignment (henceforth eDominance), proposed in [17], assess the highest degree in which a solution is nearly Pareto-dominated by any other solution. On the other hand, the individuals dominance assignment (henceforth indDominance) is a novel measure, proposed in this work for the first time based on the number of individuals dominated for each individual. All the distances described in this section have been included in MOQAR, as well as the crowding distance. The experimental results for all distances considered are reported in Sect. 5.4.

4.5.1 Subvector dominance assignment (svDominance)

Let A and B be two solutions of a set S of solutions, the subvector dominance distance counts the number of objectives of a solution B that are better than the corresponding objectives of A. It is applied to every individual \(B \ne A\) that belongs to S, and the largest value is designated as the distance of the solution A. The svDominance distance is computed as follows [3]:

$$\begin{aligned} svd(A, B) = count\{f_{i}(B) > f_{i}(A),\quad i = 1,2, \ldots , M \} \end{aligned}$$
(3)

where \(f_{i}\) is the ith objective and M is the maximum number of objectives to be optimized. The secondary ranking \(d^{S}_{A}\) of A is given by:

$$\begin{aligned} d^{S}_{A} = max \{svd(A, B) \quad \forall B \in S \quad B \ne A\} \end{aligned}$$
(4)

The smaller the distance value is, the better the solution is, and therefore, a smaller \(d^{S}_{A}\) implies a better secondary ranking.

4.5.2 Epsilon dominance assignment (eDominance)

Let A and B be two solutions of a set S of solutions, and the epsilon dominance distance considers all the objectives of B that are worse than the corresponding objectives of A. Then, it calculates the necessary largest magnitude epsilon for solution B to become a solution Pareto-dominating of A. It is applied to all the individuals \(B \ne A\) which belong to S, and the smallest value is designated as the distance of the solution A. The eDominance distance is computed as follows [3]:

$$\begin{aligned} mepsd(A, B) = max\{f_{i}(A) - f_{i}(B), i = 1,2, \ldots , M \} \end{aligned}$$
(5)

where \(f_{i}\) is the ith objective and M is the maximum number of objectives to be optimized. The secondary ranking \(d^{S}_{A}\) of A is given by:

$$\begin{aligned} d^{S}_{A}=min \{mepsd(A, B) \quad \forall B \in S \quad B\ne A \} \end{aligned}$$
(6)

The larger the distance value is, the better the solution is, and the higher the difficulty is to make the other solutions Pareto-dominating of A. Therefore, a larger \(d^{S}_{A}\) implies a better secondary ranking.

4.5.3 Individuals dominance assignment (indDominance)

Let A and B be two solutions of a set S of solutions, and the individuals dominance distance calculates the difference between the number of solutions dominated by a solution A and the number of solutions dominated by a solution B, taking into account the dominance definition previously described in this section. It is applied to all the individuals \(B \ne A\) which belong to S and the largest value is designated as the distance of the solution A. The indDominance distance is computed as follows:

$$\begin{aligned} ind(A, B) = \{|P_{A}| - |P_{B}|\} \end{aligned}$$
(7)

where \(P_{A}\) and \(P_{B}\) are the set of solutions which belong to the population of individuals dominated by the solution A and B, respectively. The secondary ranking \(d^{S}_{A}\) of A is given by:

$$\begin{aligned} d^{S}_{A} = max\{ind(A, B) \quad \forall B \in S \quad B \ne A \} \end{aligned}$$
(8)

The larger the distance is, the better the solution is, and therefore, a larger \(d^{S}_{A}\) implies a better secondary ranking.

5 Experimentation

This section presents and discusses the experiments carried out to assess the MOQAR’s performance for the datasets described in Sect. 5.1. Specifically, different configuration settings have been executed: first, by using the distance assignments specified in Sect. 4.5 as substitutes of the crowding distance in the secondary ranking of the multi-objective evolutionary process, and second, by optimizing different sets of objectives according to the statistical study detailed in Sect. 4.4. Furthermore, a comparative analysis is performed to compare MOQAR to other similar techniques.

This section is divided as follows. Section 5.1 describes the datasets from the public BUFA repository, in which MOQAR has been tested. Then, a summary of the main parameter settings of MOQAR and the MOEA (MODENAR [5], the MOEA proposed in [15] henceforth named MOEA_Ghosh, and the MOEA described in [21] hereinafter called QAR-CIP-NSGAII) analyzed in the comparative study can be found in Sect. 5.2. The set of measures to be optimized according to the results provided by the PCA is presented in Sect. 5.3. Section 5.4 discusses the experimental results obtained by MOQAR with different configuration settings related to distances and objectives. Finally, the performance of the best configurations of MOQAR are compared to MODENAR, MOEA_Ghosh and QAR-CIP-NSGAII in Sect. 5.5.

5.1 Datasets description

This section summarizes the datasets from BUFA repository [16] used for the experiments. The number of records and attributes of the thirty-two datasets is summarized in Table 3.

Table 3 Public datasets from BUFA repository

Note that Buying, College, Education, Read and Usnews Colleged have been preprocessed in order to deal with missing values. Namely, the K-means imputation method, proposed in [18] and available in the KEEL tool [6], has been applied.

5.2 Parameters setup of MOQAR and the algorithms used in the comparative study

Table 4 summarizes the main parameter values of MOQAR and each MOEA used in the comparative study. MOQAR is compared to MODENAR [5] (Multi-objective differential evolution algorithm for mining numeric association rules), MOEA_Ghosh [15] (Multi-objective rule mining using genetic algorithms) and QAR-CIP-NSGAII [21] algorithms, which are available in KEEL software. It is noteworthy that the parameter settings of the studied algorithms are those appearing as default parameter values in the KEEL tool, having followed the kind suggestions of the authors.

Table 4 Parameters setting of the algorithms

5.3 Selecting the measures to be optimized by MOQAR

This section details the statistical analysis performed by the analysis of PCA. This analysis has been applied to the following quality measures: support of the rule, support of the antecedent, support of the consequent, confidence, leverage, accuracy, lift, gain, certainty factor, amplitude and finally, number of attributes of the rule. Note that the conviction measure has been excluded because a positive infinitive value for this measure is obtained for most of the rules since the confidence of them is 1. Therefore, PCA cannot be computed when infinitive values are presented in the input data.

Table 5 shows the principal components extracted after the application of PCA to the thirty-two datasets as detailed in Sect. 4.4. It can be noticed that three principal components, each of them corresponding to an independent group of measures, have been obtained. Accuracy, gain and support of the consequent belong to the first group because they are the most correlated measures in the first component. Note that the second component is composed of the certainty factor and confidence measures. Finally, support of the rule and leverage belong to the third group due to the high correlation that theses measures present in the third component.

Table 5 Matrix of rotated components obtained by PCA from BUFA datasets

The next step is to select a representative measure for each component. The accuracy, confidence and leverage measures are selected to be optimized by MOQAR since these measures present the highest correlation in each group (0.997, 0.963 and 0.916, respectively). However, support of the antecedent has a similar correlation in the second and third component and lift does not present a high correlation in any component. Hence, these measures should be also considered as objectives since they do not belong to a specific component. Note that amplitude of the intervals and number of attributes are not considered as objectives since optimal values depend on the type of desired rules.

From the previous analysis, two set of objectives are considered to be maximized to assess the quality of the QAR obtained by MOQAR. Specifically, the first set of objectives is composed of accuracy, confidence and leverage, and the second one is composed of the lift and support of the antecedent in addition to the aforementioned three measures.

5.4 Performance of MOQAR using different distances and objectives

This section presents a comparative analysis for different MOQAR’s configuration settings and for the two sets of objectives obtained in the previous section. Additionally, the crowding distance defined to sort the individuals in the secondary ranking of NSGA-II scheme has been compared to the distances described in Sect. 4.5.

The average results obtained for the fifty executions for each set of objectives selected in Sect. 5.3 and for each distance assignment described in Sect. 4.5 are summarized in Table 6. Percentage of records, support of the rule, support of the antecedent, support of the consequent, lift, confidence, accuracy, leverage, certainty factor and gain have been the measures used in order to evaluate the QAR. Note that the best configuration setting for each quality measure has been emboldened.

Table 6 Average results obtained by MOQAR for each configuration setting

When the number of objectives is five (lift and support of the antecedent are included), the percentage of records decreases in all distance assignments except indDominance. A similar behavior was found for the support of the rules, support of the antecedent and support of the consequent. Lift and accuracy values are incremented when crowDistance and eDominance are applied. On the other hand, confidence, leverage, certainty factor and gain get worse results when the set of five objectives is optimized.

In general terms, the substitute distance assignments provide more specific, reliable and accurate rules when three objectives are optimized by MOQAR. It can observed that crowDistance and svDominance are the distances that present the worst performance when five objectives are maximized. Alternatively, the proposed distance assignment indDominance and eDominance are the best ones when three objectives are optimized.

A statistical analysis has been conducted to assess the performance of MOQAR according to different objectives to be optimized and distances following the nonparametric procedures discussed in [13]. In particular, the analysis is devoted to determine the best configuration setting for each measure separately and for the normalized average of all the measures.

5.4.1 Statistical tests for each quality measure separately

Friedman and Iman-Davenport tests have been applied to assess whether there are global differences in the configuration settings of MOQAR for all the measures considered. The ranking obtained for each measure for the level of significance \(\alpha = 0.05\) is summarized in Table 7. The average ranking of all measures is shown in the rightmost. The best setting for each quality measure has been emboldened. It can be appreciated that the best average ranking values are obtained for the indDominance and eDominance distances when optimizing three objectives.

Table 7 Average ranking obtained by Friedman of each quality measure for each configuration setting of MOQAR

The statistics obtained by Friedman and Iman-Davenport tests have been greater than the critical values associated with each measure, according to the \(\chi ^2\) and \(F-Snedecor\) distributions. Therefore, the null hypothesis is rejected, that is, there are significant differences among the configuration settings of MOQAR for each quality measure. Hence, a post hoc statistical analysis has been performed in order to find significant differences among the results obtained for each one of the different configuration settings with respect to the best configuration setting for every quality measure.

Table 8 summarizes the results obtained by Friedman and Iman-Davenport tests and the Holm and Finner post hoc statistical tests for a level of significance \(\alpha = 0.05\). The column Measure describes the quality measure under study. They are grouped according to the algorithm that finds the best solution. The PCA component column shows the number of the principal component obtained by PCA in Sect. 5.3 for each measure specified in the first column. The Control algorithm column details the configuration setting determined as control algorithm, that is, the best configuration setting for each measure. It is worth noting that all the measures belonging to the same PCA component share the same control configuration setting obtained for the Friedman and Iman-Davenport tests. Consequent support achieves the best values with eDominance and three objectives, but it can be observed that is negatively correlated with the first PCA component. Therefore, the optimization of the set of three objectives using the indDominance distance should lead to the optimization of all the measures belonging to the first and second PCA components in addition to lift. On the contrary, if MOQAR uses the eDominance distance with the set of three objectives, it should maximize the third PCA component. Alternatively, if MOQAR optimizes the set of five objectives and uses the svDominance distance, it should optimize the support of the antecedent, which does not belong to a predetermined group.

Table 8 Post hoc comparison table (\(\alpha = 0.05\)) of each quality measure for each configuration setting of MOQAR

Finally, the fourth column specifies the configuration settings in which the null hypothesis is accepted with respect to the control algorithm for each measure. That is, there are not significant differences between the specified configuration settings and the best one (Control algorithm column).

5.4.2 Statistical tests for the normalized average of all the quality measures

The average of the results retrieved for all the measures considered for each configuration setting of MOQAR has been calculated. Note that it is necessary to normalize (taking into account the minimum and maximum values obtained in the experimentation) all measures in order to compute the average. The normalized value of each measure for each configuration setting and dataset is computed considering their minimum and maximum values, which are calculated according to that obtained by all the configuration settings and all datasets studied in the experimentation.

Friedman and Iman-Davenport (ID) tests have been applied to determine which configuration setting presents the best performance for the mean of all measures since there is no single algorithm that has obtained the best performance for all the considered measures.

The results obtained by both tests for the level of significance \(\alpha = 0.05\) have shown that there exist significant differences in the results obtained for each different configuration settings of MOQAR as the p-values obtained from both tests are lower than the level of significance considered. Therefore, a post hoc statistical analysis is required.

The average ranking obtained for the average of the normalized measures considering the level of significance \(\alpha = 0.05\) is summarized in Table 9. The ranking obtained agrees with that of Table 7 since the top two is composed of eDominance and indDominance using three objectives.

Table 9 Average ranking obtained by Friedman for the normalized average of quality measures for MOQAR

The Holm and Finner tests have been applied to compare separately the control configuration setting, that is, eDominance using three objectives against the other configuration settings. Table 10 shows the sorted p-values obtained for each configuration setting for the level of significance \(\alpha = 0.05\). These tests allow to conclude that eDominance with three objectives has better performance in six out of seven configurations for such level of significance. The post hoc statistical tests reject the hypotheses that have a p-value less or equal to 0.05 in the Holm and Finner tests. It can be noted that indDominance using three objectives is not significantly better than eDominance using three objectives, but is significantly better than the other six settings according to the p-values. In short, eDominance and indDominance distances present a similar performance for the average of all the measures when considering three objectives. However, each one of these two configurations could be applied depending on the desired type of rules: eDominance can be used to obtain general rules, that is, QAR with a high support; and indDominance to obtain specific rules with high confidence and accuracy. Finally, the svDominance presents the worst performance regardless of the number of objectives.

Table 10 Post hoc comparison table (\(\alpha =0.05\)) for the normalized average of quality measures for each configuration setting of MOQAR (Control algorithm: eDominance-3)

5.5 Comparison to other algorithms

This section details a comparative study between the best configuration settings of MOQAR, MODENAR [5], MOEA_Ghosh [15] and QAR-CIP-NSGAII [21]. As a result of the previous section, eDominance and indDominance have been the distances used in the secondary ranking and three objectives, confidence, leverage and accuracy, to be maximized by MOQAR have been considered.

The average results obtained by MODENAR, MOEA_Ghosh, QAR-CIP-NSGAII and MOQAR are summarized in Table 11. Note that MOQAR-e and MOQAR-i denote MOQAR when using the eDominance and indDominance distances for the secondary ranking, respectively. The best value for each quality measure and each algorithm is emboldened.

Table 11 Average measures of the rules obtained by MODENAR, MOEA_Ghosh, QAR-CIP-NSGAII and MOQAR algorithms

The percentage of covered records and average rule support for the QAR obtained by MODENAR, MOEA_Ghosh, QAR-CIP-NSGAII, MOQAR-e and MOQAR-i point out how general the rules are. It can be noted that the mean percentage of records covered by the rules obtained by MOQAR-e reaches values close to 100 % for all the datasets, whereas the percentage of records covered by MODENAR and MOEA_Ghosh is below 70 %. Specifically, MOQAR-e obtains the 100 % in 29 datasets. QAR-CIP-NSGAII and MOQAR-i obtain the second best values. Regarding the average support, MOEA_Ghosh obtains highly general QAR since average values close to 50 % are obtained, in contrast to the specific rules found by MOQAR-i.

Alternatively, the confidence, certainty factor, leverage, lift and gain measures are devoted to evaluate the reliability of the rules. The QAR found by MOQAR-e, MOQAR-i and QAR-CIP-NSGAII present a confidence value above 95 %. MOEA_Ghosh obtains the worst values regarding confidence measure.

MOQAR-i outperforms all the algorithms considered in the comparative study in terms of lift, accuracy, certainty factor and gain. Additionally, MOQAR-e is the best algorithm with regard to leverage measure. It is noteworthy that MODENAR and MOEA_Ghosh obtain the worst values for gain, certainty factor and leverage because the support of the consequent is close to 50 %. MOEA_Ghosh achieves the higher number of rules although this algorithm is not able to cover the entire dataset. Note that MOQAR-e and MOQAR-i cover most of instances of the datasets with less than 100 rules which lead to conclude that rules are more diverse and less overlapping.

The average runtime in seconds is also summarized in Table 11 for all the approaches. It can be observed that QAR-CIP-NSGAII algorithm achieves the best runtime. Although MOQAR-e and MOQAR-i spend more computational time, it should be considered that the number of evaluations spent by MOQAR-e and MOQAR-i is 20 times higher than the rest of algorithms. This fact is due to the iterative scheme implemented in both proposals since a complete evolutionary process is executed for each rule to obtain. Therefore, if the number of evaluations invested in each iteration is 10,000 and the fixed number of rules to obtain for each iteration is 100, the presented runtime refers to 1 million of evaluations. Hence, 50000 evaluations would be executed in 30 s (on average approximately) by MOQAR-e and MOQAR-i, which would be comparable to other algorithms such as MOEA_Ghosh (Table 11).

It can be concluded that although MOEA_Ghosh presents better results in terms of support of the rules, MOQAR discovers more precise and interesting rules with a strong dependency between the antecedent and consequent. Furthermore, MOQAR reaches higher information gain on the rules regarding the consequent, when the antecedent is also present. In spite of the support of the rules obtained by MOQAR is slightly lower, this fact should not be considered particularly relevant since this measure only determines how general QAR are. Additionally, support of the rules is not able to find negative dependencies between the antecedent and the consequent and high values for that measure do not guarantee quality rules.

In the light of the results obtained, it can be stated that MOQAR presents the best trade-off among all the measures under study. Furthermore, the optimization of the selected measures by the methodology based on PCA leads to the optimization of the whole set of quality measures specified in Sect. 5.3. Representative QAR obtained by both MOQAR-e and MOQAR-i are shown in Table 12. As can be observed, all the measures reach values close to their maximum values, whereas other algorithms usually obtain worse results in the measures not maximized in their fitness function. Alternatively, the use of alternative distances to crowding distance involves better results in terms of quality measures, not only for optimization problems with many objectives but also for problems with three objectives.

Table 12 Example of QAR obtained by MOQAR-e (ID 1 and ID 2) and MOQAR-i (ID 3 and ID 4) using AI dataset

Finally, a nonparametric statistical analysis has been carried out to determine whether significant differences are presented between the results obtained by MOQAR and the other approaches. For this purpose, support of the rule, percentage of covered instances, lift, confidence, accuracy, leverage, certainty factor and gain measures obtained from the results obtained by MODENAR, MOEA_Ghosh, QAR-CIP-NSGAII and MOQAR for the thirty-two datasets from BUFA have been calculated. The number of rules have not been included in the statistical analysis since a larger number of them does not necessarily involve that an algorithm is better.

Friedman and Iman-Davenport tests have been applied for a level of significance \(\alpha = 0.05\). Note that there exist significant differences for each quality measure for the algorithms under study, since the p-values are lower than the level of significance considered. Therefore, a post hoc statistical analysis has been carried out.

Table 13 shows the average rankings obtained by the Friedman test for the studied algorithms. The best algorithm for each measure is stressed in bold. It can be appreciated that MOQAR is the best algorithm in six out of seven measures, and in particular, MOQAR-i is the best configuration setting in five measures.

Table 13 Average rankings obtained by Friedman for each quality measure for all algorithms

Regarding the support of the rules, MODENAR is the best algorithm, although MOEA_Ghosh achieves better average values. It can be noted that such algorithm is better in terms of support, but the rules obtained cover around the 67 % of instances of datasets (see Table 11). On the contrary, both MODENAR and MOEA_Ghosh are the worst algorithms in the ranking for the remaining measures.

In short, MOQAR-i is the best algorithm for confidence, certainty factor, lift, accuracy and gain measures, MOQAR-e for the leverage measure and percentage of covered records and MODENAR for support of the rule.

Finally, Table 14 summarizes the results obtained by the Holm and Finner post hoc statistical tests for each quality measure. It can be observed that MOQAR-i is the control algorithm for the most of the measures, excluding the support, percentage of covered records and leverage measures in which MODENAR and MOQAR-e are the control algorithms, respectively. As for the support, MOQAR-e does not present significant differences with MODENAR. According to lift, CF and gain, QAR-CIP-NSGAII does not present significant differences with MOQAR-i.

Table 14 Post hoc comparison table (\(\alpha =0.05\)) for each quality measure for all algorithms

6 Conclusions

An algorithm, MOQAR, based on the NSGA-II scheme, has been improved to discover high-quality QAR. First, MOQAR has found the best trade-off among all the measures by maximizing the objectives selected by a principal component analysis. Namely, accuracy, confidence and leverage have been the set of objectives that better summarize all the considered measures. Second, several distances have been proposed to replace the well-known crowding distance used to obtain the secondary ranking in the Pareto-fronts achieved by the evolutionary process of MOQAR. In particular, the iDominance and eDominance distances have outperformed the quality of the QAR obtained by MOQAR when comparing to those found by using the crowding distance. Finally, thirty-two datasets from BUFA repository have been considered to compare the best configuration settings of MOQAR to other existing evolutionary algorithms such as MODENAR, MOEA_Ghosh and QAR-CIP-NSGAII. From the statistical tests, it can be concluded that MOQAR outperformed the remaining algorithms for all measures except for the rule support.

As future work, the authors intend to consider other modern approaches based on decomposition or differential evolution as a base learning to discover efficiently high-quality QAR.