Introduction

Data is growing explosively today and is available in many forms (numerical, text, images, etc.). To manage this humanly unmanageable amount of data, researchers and data scientists have developed many techniques. Within knowledge discovery in databases (KDD), data mining is a popular technique for extracting the required information and finding patterns between data items. Association rule mining (ARM), classification, clustering, and regression are a few well-known data mining techniques. Agrawal et al. [2] introduced ARM in 1993 for finding relationships between different data items and later proposed the Apriori algorithm [3] and its variants to discover interesting rules in large databases. ARM is widely used in market basket analysis, medical diagnosis, and bioinformatics. Apriori and FP-growth [28] are the most popular algorithms in classical association rule mining. Different authors hold various opinions about the discretization process and ARM. Recently, Draheim [18] “provides a frequentist semantics for conditionalization on partially known events, which is given as a straightforward generalization of classical conditional probability via so-called probability testbeds.”

Classical association rule mining deals only with binary attributes, whereas real-world data have mixed attributes (numerical, categorical). Therefore, whenever data is in numerical form (e.g., height, weight, or age), the data items need to be transformed from numerical to discrete using a discretization process. The process of finding association rules in numerical data items is referred to as numerical association rule mining (NARM) or quantitative association rule mining (QARM) [60]. Initially, NARM started with the discretization method; later, many authors investigated the discretization method and proposed various alternatives to it, so that further methods (optimization, distribution) appeared in the literature.

In the literature, various methods with multiple algorithms have been discussed; however, how to select an appropriate algorithm for a NARM task, with valid reasons, has not yet been addressed. This article extends our previous work [32] and provides a detailed study of thirty NARM algorithms belonging to the different NARM methods. We also investigate to what extent discretization techniques have been used in numerical association rule mining methods.

We conducted an automated search over the Scopus database and a manual search on Google Scholar, using the term (“Numerical Association Rule Mining” OR “Quantitative Association Rule Mining”) in the abstract, title, and keywords. Our search is limited to articles published between 1996 and 2020. The retrieved papers were then assessed against the following inclusion criteria:

  • Papers introducing a novel algorithm in numerical or quantitative association rule mining.

  • Papers extending an existing algorithm in numerical or quantitative association rule mining.

Moreover, we use the following criteria to exclude the papers from the list of searched papers:

  • Papers that merely introduce an application of a NARM algorithm in some field.

  • Papers published in languages other than English.

  • Technical reports, theses, and other documents that did not undergo a peer-review process.

The paper is structured as follows. In section “Preliminaries,” we describe preliminaries. In section “Methods to Solve Numerical ARM Problems,” we discuss all three methods to solve numerical association rule mining problems. In section “The Optimization Method,” the optimization method is discussed with all its sub-methods. In section “The Distribution Method,” the distribution method is introduced and discussed, and in section “The Discretization Method,” the discretization method is discussed. A discussion on various methods and algorithms is given in section “Discussion.” The conclusion is given in section “Conclusion.”

Preliminaries

In this section, we provide basic introductions about ARM and NARM.

Association rule mining

In ARM, association rules are based on if-then relations, consisting of an antecedent (if) and a consequent (then) [2]. For example, (1) shows the association rule “If a customer buys bread, then he also buys milk.” Here, Bread appears as the antecedent and Milk as the consequent. Generally, an association rule may be represented as a production rule in an expert system, an if statement in a programming language, or an implication in a logical calculus.

$$\begin{aligned} \{\mathrm{Bread}\} \Rightarrow \{\mathrm{Milk}\} \end{aligned}$$
(1)

In a database, let I be a set of m binary attributes \(\{i_1, i_2, i_3, \ldots , i_m \}\) called database items. Let T be a set of n transactions \(\{t_1, t_2, t_3, \ldots , t_n\}\), where each transaction \(t_i\) has a unique ID and consists of a subset of the items in I, i.e., \(t_i \subseteq I\). As in (1), an association rule is an implication of the form

$$\begin{aligned} X \Rightarrow Y \end{aligned}$$
(2)

where \(X, Y \subseteq I\) (itemsets) and \(X \cap Y = \emptyset\). Association rules are extracted on the basis of two important measures: support and confidence. The support of an association rule is the percentage of all transactions that contain both itemsets X and Y, i.e., \(X \cup Y\). The confidence of an association rule is the percentage of the transactions containing X that also contain Y.

$$\begin{aligned} \mathrm{Support}(X \Rightarrow Y) = \mathrm{Supp}(X \cup Y) \end{aligned}$$
(3)
$$\begin{aligned} \mathrm{Confidence}(X \Rightarrow Y) = \frac{\mathrm{Supp}(X \cup Y)}{\mathrm{Supp}(X)} \end{aligned}$$
(4)

For instance, the concept of support and confidence can be illustrated with reference to Table 1. The support of the association rule \((\mathrm{Bread} \Rightarrow \mathrm{Milk})\) is 2/6 = 0.33, since both items are bought together in two out of six transactions, i.e., the support is 33%. Both items are bought together in two of the four transactions that contain Bread, so the confidence is 2/4 = 0.5, i.e., 50%.
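To make Eqs. (3) and (4) concrete, the following minimal Python sketch computes support and confidence over a toy transaction list; the six transactions are hypothetical stand-ins, not the actual contents of Table 1.

```python
# Toy transactions (hypothetical stand-ins for Table 1).
transactions = [
    {"Bread", "Milk"}, {"Bread", "Butter"}, {"Milk", "Eggs"},
    {"Bread", "Milk", "Eggs"}, {"Eggs"}, {"Bread", "Butter", "Eggs"},
]

def supp(itemset):
    """Fraction of transactions containing every item in `itemset` (Eq. 3)."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y):
    """Supp(X u Y) / Supp(X) (Eq. 4)."""
    return supp(X | Y) / supp(X)

print(supp({"Bread", "Milk"}))          # support of Bread => Milk: 2/6 = 0.33
print(confidence({"Bread"}, {"Milk"}))  # confidence: 2/4 = 0.5
```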

Table 1 Market basket analysis in association rule mining

In ARM, various interestingness measures have been proposed in the literature to find interesting rules [58]. In classical ARM, frequent itemsets and association rules are discovered from a Boolean dataset; therefore, it is also known as binary or Boolean ARM. Table 2 shows a Boolean dataset for classical ARM: the table contains an attribute for each item and a row for each transaction, and each attribute has the value “1” if the item occurs in the transaction and “0” otherwise.

Table 2 Example of Boolean dataset

Numerical Association Rule Mining

To extract association rules from numerical data, the problem of quantitative and categorical attributes was first discussed by Srikant and Agrawal in 1996 [60]. In NARM, whenever data is in numerical form (e.g., height, weight, or age), the data items need to be transformed from numerical to discrete using a discretization process; this process of finding association rules over numerical data items is referred to as numerical association rule mining (NARM) [60]. NARM can easily be understood by the following example.

$$\begin{aligned} \mathrm{Age} \in [25,40] \wedge \mathrm{Gender} = \mathrm{Female} \Rightarrow \mathrm{Salary} \in [1300,2000]\\ (\mathrm{Supp}=30\%, \mathrm{Conf}=60\%) \end{aligned}$$

Given a set of transactions T, the antecedent denotes the set of transactions in T in which Age has a value between 25 and 40 and Gender is Female; the consequent denotes the set of transactions in which Salary has a value between $1300 and $2000. With reference to Table 3, \(\mathrm{Supp}=30\%\) denotes that 30% of the employees are female, between the ages of 25 and 40, and earn a salary between $1300 and $2000. \(\mathrm{Conf}=60\%\) denotes that 60% of the female employees between the ages of 25 and 40 earn a salary between $1300 and $2000. Here, Age and Salary are numerical attributes and Gender is a categorical attribute.

Table 3 Example of numerical values dataset

As an early solution, the problem of association rules over numerical data was solved using a discretization process in which numeric attributes are divided into different intervals and thereafter treated as categorical attributes [12]. For example, an attribute Age with values between 20 and 80 can be divided into six age intervals \((20\!-\!30,30\!-\!40,40\!-\!50,50\!-\!60,60\!-\!70,70\!-\!80)\). Data discretization is an obvious solution; however, it entails a loss of valuable information, which may cause poor results [17]. Thus, in section “Methods to Solve Numerical ARM Problems,” we review solutions from three different approaches (discretization, distribution and optimization) to the numerical association rule mining problem.
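As a minimal sketch of such equal-width binning (interval width 10, with boundary values assigned to the upper interval and the maximum value to the last interval; these conventions are our illustrative assumptions):

```python
def discretize(value, low=20, high=80, width=10):
    """Map a numeric Age onto an equal-width interval label, e.g., 27 -> '20-30'."""
    if not low <= value <= high:
        raise ValueError(f"{value} outside [{low}, {high}]")
    left = min(low + (value - low) // width * width, high - width)
    return f"{int(left)}-{int(left + width)}"

print([discretize(a) for a in [27, 40, 61, 80]])  # ['20-30', '40-50', '60-70', '70-80']
```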

Methods to Solve Numerical ARM Problems

To solve the issues in NARM, three main approaches (discretization, distribution and optimization) have been discussed in the literature, and many different NARM algorithms have been proposed based on them. The optimization method has several sub-methods, such as swarm-intelligence-based and evolution-based algorithms, which cover most of the work on NARM. The distribution method does not contribute much in this area, whereas the discretization method, a common method that transforms continuous attributes into discrete attributes, is further subdivided into three sub-methods. Figure 1 (compare also Fig. 1 in [9]) shows all three approaches and the different algorithms proposed under each approach.

Fig. 1
figure 1

Different methods and algorithms to solve numerical association rule mining problems

The Optimization Method

To solve NARM problems, many researchers have turned to optimization methods, which provide a robust and efficient way to explore a massive search space. Here, researchers have devised a collection of heuristic optimization methods inspired by the movements of animals and insects. For finding association rules, optimization methods work in two phases: in the first phase, all frequent itemsets are found, and in the second phase, all relevant association rules are extracted. As shown in Fig. 1, optimization methods are divided into bio-inspired and physics-based optimization methods. Table 4 gives an overview of all algorithms that fall under the optimization method.

Table 4 An overview of optimization method algorithms for NARM

The Bio-inspired Optimization Method

Biology-based algorithms are generally divided into two groups: swarm-intelligence-based algorithms and evolution-based algorithms [15]. These algorithms originate in the biological behavior of natural organisms [68].

Evolution-Based Algorithms

Evolution-based algorithms are inspired by Darwinian principles and were first applied to NARM in [48]. These algorithms mimic nature’s capability to develop living beings that are well adapted to their environment [68]. They exploit stochastic search methods that follow the ideas of natural selection and genetics, show strong adaptability and self-organization [15], and use biology-inspired operators such as crossover, mutation, and natural selection [68]. The Genetic Algorithm [30] and the Differential Evolution algorithm [63] are two examples of evolution-based algorithms. Table 5 gives an overview of the evolution-based algorithms for NARM, together with their concepts.


Genetic Algorithms (GA) GAs, first proposed by Holland [30], are among the most popular algorithms in bio-inspired optimization. A basic genetic algorithm consists of five phases: initialization, evaluation, reproduction, crossover, and mutation. GAs for NARM can be divided into three fields: basic genetic algorithms, genetic programming, and multi-objective genetic algorithms. A basic genetic algorithm was proposed by Mata et al. [47] together with the tool GENAR (GENetic Association Rules) to discover association rules with numeric attributes. With this tool, rules with an undetermined number of numeric attributes in the antecedent and a single numeric attribute in the consequent can be obtained. Association rules in GENAR allow intervals (maximum and minimum values) for each numeric attribute. Mata et al. [48] further extended GENAR and proposed a technique named GAR (Genetic Association Rules) to discover association rules in numeric databases without discretization, i.e., a technique to find frequent itemsets in numeric databases without needing to discretize numeric attributes. This algorithm was useful only for finding frequent itemsets, not association rules. Here, a genetic algorithm was used to find suitable amplitudes of the intervals that conform a k-itemset and can have a high support value without overly wide intervals. In [39], the GAR algorithm was further extended to EGAR (Extended Genetic Association Rules), which generates frequent patterns from continuous data [48].
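To make the interval-evolving idea concrete, the following minimal sketch follows the spirit of GENAR/GAR without reproducing the authors’ implementations: a chromosome encodes one (low, high) interval per numeric attribute, and the fitness rewards high support while penalizing overly wide intervals. All data, operators, and parameters are illustrative assumptions.

```python
import random

# Synthetic two-attribute numeric database (e.g., Age, Salary).
data = [[random.uniform(20, 80), random.uniform(1000, 3000)] for _ in range(200)]
LO = [min(r[i] for r in data) for i in range(2)]
HI = [max(r[i] for r in data) for i in range(2)]

def random_chromosome():
    """One (low, high) interval per numeric attribute."""
    genes = []
    for lo, hi in zip(LO, HI):
        a, b = sorted(random.uniform(lo, hi) for _ in range(2))
        genes.append((a, b))
    return genes

def fitness(ch):
    """Support of the itemset minus a penalty for wide intervals."""
    covered = sum(all(lo <= row[i] <= hi for i, (lo, hi) in enumerate(ch))
                  for row in data) / len(data)
    width = sum((hi - lo) / (HI[i] - LO[i]) for i, (lo, hi) in enumerate(ch)) / len(ch)
    return covered - 0.5 * width

def mutate(ch, sigma=2.0):
    """Gaussian perturbation of one randomly chosen interval."""
    i = random.randrange(len(ch))
    lo, hi = ch[i]
    ch = list(ch)
    ch[i] = tuple(sorted((lo + random.gauss(0, sigma), hi + random.gauss(0, sigma))))
    return ch

pop = [random_chromosome() for _ in range(50)]
for _ in range(100):                                # evolve for 100 generations
    pop.sort(key=fitness, reverse=True)
    pop = pop[:25] + [mutate(p) for p in pop[:25]]  # elitism + mutation
print(max(pop, key=fitness))                        # best intervals found
```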

A genetic-based strategy and two algorithms, ARMGA and EARMGA, were proposed by Yan et al. [74]. In this approach, an encoding method was developed with relative confidence as the fitness function. ARMGA was proposed for Boolean ARM and EARMGA for quantitative attributes or generalized association rules. These algorithms do not require a minimum support threshold. The GAR-plus tool was presented by Alvarez [10]; it deals with categorical and numeric attributes in large databases without any prior discretization of numeric attributes.

In 2013, Salleb et al. [56] proposed QuantMiner, a quantitative association rule mining system based on a genetic algorithm. This tool dynamically discovers meaningful intervals in association rules by optimizing both the confidence and the support values.

Seki and Nagao [57] worked on a GA-based QuantMiner for multi-relational data mining and developed RelQM-J, a tool for relational quantitative association rules written in the Java programming language. In this tool, efficient computation of the support of the rules is realized using a hash-based data structure.

A real-coded genetic algorithm [36] was presented in [46] in 2010. The proposed algorithm, RCGA, follows the CHC binary-coded evolutionary algorithm [19]. RCGA has been applied to pollutant-agent time series and helps to find the relations between atmospheric pollution and climatological conditions.

Table 5 An overview of evolution based algorithms for NARM

Genetic Programming for ARM Genetic Programming (GP) [37] is a well-known variant of GA. In a GA, the genome is a string structure, while in GP, the genome has a tree structure [29]. Genetic Network Programming (GNP) is a graph-based evolutionary algorithm that finds association rules for continuous attributes. In this method, important rules are stored in a pool and the extracted rules are measured by the chi-squared test; the pool is updated in every generation by replacing a stored association rule with the same rule whenever the latter has a higher chi-squared value [64].


Multi-Objective Genetic Algorithm The multi-objective genetic algorithm was proposed by Fonseca et al. [21] in 1993. Generally, the resource consumption of an association rule mining computation is governed by two parameters, minimum support and minimum confidence. Classical ARM algorithms use only a single measure (support or confidence) to evaluate rule interestingness; therefore, if minimum support and minimum confidence are not set appropriately, the number of association rules may be very small or very large. This problem can be solved by using more objectives or measures, as done in multi-objective ARM.

Ghosh and Nath [23] used a Pareto-based genetic algorithm to solve the multi-objective rule mining problem using three measures: interestingness, comprehensibility and predictive accuracy. The single-objective algorithm ARMGA [74] had issues that were addressed by the multi-objective genetic algorithm ARMMGA, introduced by Qodmanan et al. [53]. ARMGA finds high-confidence but low-support rules, whereas ARMMGA finds high-confidence and high-support rules. ARMGA also produces a larger set of rules than ARMMGA; this problem was solved by a new fitness function in ARMMGA. To prevent invalid chromosomes in ARMGA, new crossover and mutation operators were presented in the literature.

Srinivas and Deb [61] proposed the non-dominated sorting genetic algorithm (NSGA) to solve multi-objective optimization problems. In 2002, Deb et al. [16] extended NSGA to NSGA-II. In 2011, Martin et al. [45] extended NSGA-II with a trade-off between interpretability and accuracy: NSGA-II performs evolutionary learning of attribute intervals, and for each rule a condition selection is made for three objectives (interestingness, comprehensibility and performance). This method does not depend on minimum support and confidence thresholds. Martin et al. later extended their research on NSGA-II to a new approach called QAR-CIP-NSGA-II and compared its results with other multi-objective evolutionary algorithms (MOEAs).


Differential Evolution Algorithms Differential evolution (DE) algorithms are evolution-based algorithms proposed by Storn and Price [62]. DE algorithms are simple and effective single-objective optimization algorithms that solve real-valued problems based on the principle of natural evolution. They use genetic operators such as crossover, mutation, and selection. Although the evolution process of DE is similar to that of GA, it relies on a mutation operator rather than a crossover operator [69].

A Pareto-based multi-objective DE algorithm for ARM was first proposed by Alatas et al. [7] for searching accurate and comprehensible association rules. The problem of mining association rules was formulated as a four-objective optimization problem over support, confidence, comprehensibility and amplitude. Support, confidence and comprehensibility are maximization objectives, while the amplitude of the intervals is a minimization objective. In a single run, the Pareto-based multi-objective DE algorithm searches both the intervals of numeric attributes and the association rules.

In 2018, a novel approach for mining association rules with numerical and categorical attributes based on DE was proposed in [20]. In this algorithm, a single-objective optimization problem is considered in which the support and confidence of association rules are combined into a fitness function. This DE for ARM (ARM-DE) with mixed (i.e., numerical and categorical) attributes consists of three stages: (1) domain analysis, (2) representation of a solution, and (3) definition of a fitness function.
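A minimal sketch of such a combined single-objective fitness, assuming a simple weighted sum of support and confidence (the weights alpha and beta are illustrative, not taken from [20]):

```python
def fitness(rule, transactions, alpha=1.0, beta=1.0):
    """Weighted combination of support and confidence; weights are illustrative."""
    X, Y = rule                     # antecedent and consequent predicates
    n = len(transactions)
    n_x = sum(1 for t in transactions if X(t))
    n_xy = sum(1 for t in transactions if X(t) and Y(t))
    if n_x == 0:
        return 0.0
    return (alpha * (n_xy / n) + beta * (n_xy / n_x)) / (alpha + beta)

rows = [{"Age": 30, "Salary": 1500}, {"Age": 50, "Salary": 900},
        {"Age": 28, "Salary": 1800}, {"Age": 35, "Salary": 2500}]
rule = (lambda t: 25 <= t["Age"] <= 40, lambda t: 1300 <= t["Salary"] <= 2000)
print(fitness(rule, rows))  # (2/4 + 2/3) / 2 = 0.58
```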

Swarm Intelligence Based Algorithms

Swarm intelligence-based algorithms are further divided into two sub-optimization methods, particle swarm optimization and the wolf search algorithm. Table 6 provides an overview of swarm intelligence algorithms for NARM.

Table 6 An overview of swarm-intelligence-based algorithms for NARM

Particle Swarm Optimization Particle swarm optimization (PSO) is a population-based optimization algorithm for nonlinear functions, developed in 1995 [33, 52]. The algorithm is oriented towards animal behavior such as bird flocking and fish schooling. PSO was first used for NARM, to find the intervals of numerical attributes, in 2008 [4].

Rough PSOA, based on rough patterns, was proposed in [4]; rough values are defined with upper and lower intervals. This algorithm can complement existing tools developed in rough computing, as rough values are helpful in representing an interval for an attribute. In this work, each particle consists of decision variables with three parts: the first part of each decision variable indicates whether the item belongs to the antecedent or the consequent of the rule and takes values between 0 and 1; the second part describes the lower bound and the third part the upper bound of the item interval. The second and third parts are combined into one rough value during the implementation of the particle representation.
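A minimal sketch of this three-part particle representation (the decoding threshold for the first part is an illustrative assumption):

```python
import random

ATTRS = ["Age", "Salary"]
BOUNDS = {"Age": (20, 80), "Salary": (1000, 3000)}

def random_particle():
    """One three-part decision variable per attribute: (role, low, high)."""
    p = []
    for a in ATTRS:
        lo, hi = BOUNDS[a]
        low, high = sorted(random.uniform(lo, hi) for _ in range(2))
        role = random.random()       # part 1: antecedent/consequent indicator
        p.append((role, low, high))  # parts 2 and 3: the interval bounds
    return p

def decode(particle, threshold=0.5):
    """Hypothetical decoding: role < threshold => antecedent, else consequent."""
    rule = {"antecedent": {}, "consequent": {}}
    for attr, (role, low, high) in zip(ATTRS, particle):
        side = "antecedent" if role < threshold else "consequent"
        rule[side][attr] = (low, high)
    return rule

print(decode(random_particle()))
```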

Alatas and Akin [5] proposed a novel PSO algorithm based on chaos numbers. The CENPSOA (chaotically encoded PSO) algorithm uses chaotic decision variables and chaotic particles. The relation between chaos and PSO was first explored by Liu et al. [42]; CENPSOA encodes particles as chaos numbers, which consist of a midpoint part and a radius part [5]. Alatas and Akin [6] also proposed a multi-objective chaotic particle swarm optimization algorithm for mining accurate and comprehensible classification rules.

Yan et al. [73] proposed a parallel PSO algorithm for NARM, designed with two strategies called particle-oriented and data-oriented parallelization. Particle-oriented parallelization is more efficient, while data-oriented parallelization is more scalable for processing large datasets.

To discover association rules in a single step without prior discretization of numerical attributes, Beiranvand et al. [12] proposed a multi-objective particle swarm optimization algorithm (MOPAR) with multiple objectives such as confidence, comprehensibility and interestingness. In the Pareto approach, rather than identifying a single candidate solution that is better than all others, a set of best (non-dominated) solutions is identified whose members are not outperformed in every objective by any other candidate.
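A minimal sketch of the underlying Pareto-dominance test, assuming all objectives are to be maximized:

```python
def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# (confidence, comprehensibility, interestingness) of three candidate rules
rules = [(0.9, 0.5, 0.4), (0.8, 0.4, 0.3), (0.6, 0.7, 0.2)]
front = [r for r in rules if not any(dominates(o, r) for o in rules)]
print(front)  # [(0.9, 0.5, 0.4), (0.6, 0.7, 0.2)]: the non-dominated set
```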

Kuo et al. [38] proposed a multi-objective particle swarm optimization algorithm (MOPSO) using an adaptive archive grid for NARM, also based on the Pareto-optimal strategy. In this algorithm, minimum support and minimum confidence need not be specified before mining. MOPSO is executed in three parts: (1) initialization, (2) adaptive archive grid, and (3) particle swarm optimization searching.

PSO for NARM with Cauchy distribution (PARCD) was evaluated in [65], and the results showed that PARCD performs better than MOPAR.


Wolf Search Algorithm The wolf search algorithm (WSA) is a bio-inspired heuristic optimization algorithm proposed in [67] that imitates the way wolves search for food and survive by avoiding their enemies. WSA has been tested and compared with other heuristic algorithms and investigated with respect to its memory requirements. A pack of wolves commutes together as a nuclear family, which distinguishes WSA from particle swarm optimization [72].

Agbehadji and Fong [1] proposed a new meta-heuristic algorithm that uses the wolf search algorithm for NARM. The wolf has three preying behaviors: preying initiatively, preying passively, and escaping. When preying initiatively, the wolf checks its visual perimeter to detect prey; if prey is found within visual distance, the wolf moves towards the prey with the highest fitness value, otherwise the wolves maintain their direction. When preying passively, the wolf merely stays alert to threats and tries to improve its position. In escape mode, when a threat is detected, the wolf relocates quickly to a new position at an escape distance greater than its visual range.
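A minimal one-dimensional sketch of these three modes (the objective, step sizes and threat probability are illustrative assumptions, not the authors’ settings):

```python
import random

def fitness(x):
    return -(x - 3.0) ** 2                            # toy objective, peak at x = 3

def wolf_step(x, pack, visual=1.0, escape=5.0, threat_prob=0.1):
    if random.random() < threat_prob:                 # escape mode
        return x + random.choice((-1, 1)) * (escape + random.random())
    better = [w for w in pack
              if w != x and abs(w - x) <= visual and fitness(w) > fitness(x)]
    if better:                                        # prey initiatively
        return x + 0.5 * (max(better, key=fitness) - x)
    candidate = x + random.uniform(-visual, visual)   # prey passively
    return candidate if fitness(candidate) > fitness(x) else x

pack = [random.uniform(-10, 10) for _ in range(5)]
for _ in range(200):
    pack = [wolf_step(x, pack) for x in pack]
print(max(pack, key=fitness))  # typically close to 3.0
```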

Physics-Based Algorithm

Physics-based meta-heuristic optimization algorithms simulate the physical behavior and properties of matter or follow the laws of physics [15]. For NARM, the gravitational search algorithm is such a physics-based meta-heuristic optimization algorithm.

Gravitational Search Algorithm

Rashedi et al. proposed an optimization algorithm based on the law of gravity, named the gravitational search algorithm (GSA) [54]. Newton’s law of gravitation states that “every particle in the universe attracts every other particle with a force that is directly proportional to the product of their masses and inversely proportional to the square of the distance between them.” In GSA, agents act as objects and their performance is evaluated by their mass. Each mass represents a solution and it is expected that the masses will be attracted by the heaviest mass. GSA is thus a small artificial world of masses obeying the Newtonian laws of gravitation and motion. There are four ways of representing the agents, or coding the problem variables: continuous (real-valued), binary-valued, discrete, and mixed, which are called GSA variants [55].
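In symbols, the quoted law reads

$$\begin{aligned} F = G\,\frac{m_1 m_2}{r^2} \end{aligned}$$

where \(F\) is the attractive force, \(G\) the gravitational constant, \(m_1\) and \(m_2\) the masses of the two particles, and \(r\) the distance between them. (In GSA itself, the squared distance is replaced by the plain distance plus a small constant \(\epsilon\), as reported in [54].)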

Can and Alatas [13] first used GSA for NARM. Their GSA eliminates the task of choosing minimum support and confidence values; the automatically mined rules have high confidence and support values. In this work, GSA was designed to automatically find the numerical intervals of the attributes, i.e., without any a priori data processing at rule mining time. The problem of interactions among attributes is also avoided, because the designed GSA, thanks to its global, population-based search, neither selects one attribute at a time nor evaluates partially constructed candidate rules.

The Distribution Method

In [11], Aumann and Lindell introduced a new definition of numerical association rules based on statistical inference theory. In this study, they employed several distribution measures, including the mean, median, and variance. The following example shows the kind of generalization of ARM proposed by the authors.

$$\begin{aligned} \mathrm{Gender} = F \Rightarrow \mathrm{Wage}{:}\ \mathrm{mean} = \$8.50 \quad (\mathrm{overall\ mean\ wage} = \$12.60) \end{aligned}$$
(5)

As the example shows, the average wage for females was $8.50 per hour. The rule shows that the wage of this group was far below the overall average wage; therefore, the rule can be considered useful. The authors’ algorithm identifies the frequent itemsets and then calculates the desired statistics with respect to them. This procedure is restricted by the requirement to store every frequent itemset in memory throughout frequent itemset generation: where the data is not sparse, the number of frequent itemsets will be huge, and their storage and access will dominate the computation. The authors concluded that the suggested algorithm is beneficial and can find rules between two given quantitative attributes. Webb [71] extended the work of Aumann and Lindell [11] under the name impact rules, using the OPUS search algorithm [70]. He evaluated the impact of conditions on a numeric variable, which association rules obtained via discretization cannot emulate, and compared the frequent itemset approach with the OPUS_IR approach, finding that OPUS_IR avoids the large memory requirements of the frequent itemset approach by avoiding the need to store all frequent itemsets.
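A minimal sketch of this kind of mean-based rule evaluation, assuming a simple z-test of the subgroup mean against the overall mean (the data and the 1.96 threshold are illustrative, not those of [11]):

```python
from statistics import mean, stdev
from math import sqrt

rows = [{"Gender": "F", "Wage": w} for w in (8.0, 9.5, 8.2, 8.6, 7.9, 8.8)] + \
       [{"Gender": "M", "Wage": w} for w in (14.0, 15.5, 13.2, 16.0, 14.8, 15.1)]

wages = [r["Wage"] for r in rows]
sub = [r["Wage"] for r in rows if r["Gender"] == "F"]   # rule antecedent

# z-statistic for "subgroup mean differs from the overall mean"
z = (mean(sub) - mean(wages)) / (stdev(wages) / sqrt(len(sub)))
print(f"subgroup mean={mean(sub):.2f}, overall mean={mean(wages):.2f}, z={z:.2f}")
if abs(z) > 1.96:  # ~5% significance level, illustrative threshold
    print("Gender=F => Wage: mean differs significantly from the overall mean")
```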

The Discretization Method

Discretization is the process of quantizing numerical attributes into groups of intervals, and it is one of the most popular methods for solving the numerical association rule mining problem. There are numerous discretization methods in the literature; due to different needs, they have been developed along different dimensions, such as supervised vs. unsupervised, dynamic vs. static, global vs. local, splitting (top-down) vs. merging (bottom-up), and direct vs. incremental [43]. Classical ARM algorithms cannot process numerical columns directly [44], i.e., all columns need to be categorical, which is a major limitation of ARM [66].

Discretization of numerical values is used to overcome this problem [34, 49, 50]. When a numeric column is divided into useful target groups, it becomes easier to identify and generate association rules, i.e., discretization helps to understand the numeric columns better. The discretized groups are useful only if the values in the same group have no objective difference; discretization minimizes the impact of trivial variations between values. Discretization can be performed by fuzzifying, clustering, and partitioning and combining [8]. In Table 7, we summarize selected discretization algorithms used in NARM.

Table 7 An overview of discretization-based algorithms for NARM

Fuzzifying

Fuzzifying is the technique of representing numeric values as fuzzy sets [35], which can help to rectify the sharp boundary problem of ARM: sometimes the endpoint values of discretized groups have more or less influence on the result than the midpoint values, a phenomenon known as the sharp boundary problem. The Fuzzy Class Association Rule Support Vector Machine (FCARSVM) is a model proposed by Kianmehr et al. [35] to obtain fuzzy class association rules. In the first phase of the model, fuzzy class association rules (FCARs) are extracted using the fuzzy c-means clustering algorithm on quantitative datasets; in the second phase, the extracted FCARs are weighted based on a scoring-metric strategy.

For mining fuzzy quantitative association rules that have crisp values, fuzzy terms and intervals in both antecedent and consequent, Zhang [76] presented the algorithm EDPFT (equal-depth partition with fuzzy terms). The author used an equal-depth partition algorithm to find the intervals of numeric values, mapped the crisp values and fuzzy terms of each categorical attribute onto consecutive integers, and generated frequent itemsets using an extended Apriori algorithm. In 1999, Hong et al. [31] proposed the algorithm FTDA (fuzzy transaction data-mining algorithm), which integrates fuzzy-set concepts with the Apriori algorithm. This method faces the problem of requiring the fuzzy sets and their corresponding membership functions in advance; choosing the best fuzzy sets for mining association rules is difficult, and anomalies may occur if the fuzzy sets are not well chosen. To tackle this problem, [26] introduced an additional fuzzy normalization process and proposed an algorithm for fuzzy quantitative association rules. The authors also compared mining fuzzy quantitative rules with and without normalization and showed that the method with normalization yields a higher number of interesting rules, using three interest measures: fuzzy support, fuzzy confidence, and fuzzy correlation. In 2014, [77] proposed a novel algorithm, OFARM (optimized fuzzy association rule mining), to optimize the partition points of fuzzy sets with multiple objective functions; a two-level iteration process generates the frequent itemsets, and the certainty factor together with confidence is employed to evaluate fuzzy association rules.
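A minimal sketch of fuzzifying a numeric attribute with a triangular membership function and computing a fuzzy support (the membership parameters and the mean-based aggregation are illustrative assumptions):

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside [a, c], 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Fuzzy term for Age (parameters are illustrative)
young = lambda age: triangular(age, 15, 25, 40)

ages = [22, 27, 35, 41, 58]
# Fuzzy support of the fuzzy item "Age is young": mean membership over the data
fuzzy_support = sum(young(a) for a in ages) / len(ages)
print(f"fuzzy support of 'Age is young': {fuzzy_support:.2f}")  # 0.38
```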

Clustering

Clustering is a popular method for discretizing a numerical column in an unsupervised manner [8]. In clustering, a numerical column is segregated into different groups according to the properties of each value; the probability of values landing in the same group depends on their degree of similarity or dissimilarity [27, 59]. To obtain the best results, the degrees of similarity and dissimilarity need to be well defined [24]: “In other words, the intra-cluster variance is to be minimized, and the inter-cluster variance is to be maximized” [66]. Two-step clustering [59] is the most common clustering method.
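A minimal sketch of clustering-based discretization, using a tiny one-dimensional k-means (the value of k and the initialization are illustrative) to turn a numeric column into interval labels:

```python
def kmeans_1d(values, k=3, iters=50):
    """Tiny 1-D k-means; returns final centers and the groups of values."""
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            i = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            groups[i].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

ages = [21, 23, 24, 38, 40, 42, 61, 63, 66]
_, groups = kmeans_1d(ages)
for g in groups:
    if g:
        print(f"interval [{min(g)}, {max(g)}]")  # [21, 24], [38, 42], [61, 66]
```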

DRMiner Algorithm

Lian et al. [41] proposed the DRMiner algorithm, which exploits the notion of “density” to capture the characteristics of numeric attributes and provides an efficient procedure to locate the “dense regions.” DRMiner scales well with high-dimensional datasets. When a database is mapped to a multi-dimensional space, the data points (transactions) are not distributed evenly throughout the space. For this kind of distribution, a density measure was introduced, and the problem of mining quantitative association rules was transformed into the problem of finding dense regions and mapping them to quantitative association rules. Weaknesses of this method are the prior requirement of many thresholds and the unsolved curse of dimensionality; it was also noted that the algorithm might not perform well on datasets with uniform density between the minimum density threshold and low density.

DBSMiner

DBSMiner is a density-based sub-space mining algorithm that uses the notion of density-connectivity to cluster the high-density sub-spaces of numeric attributes and the gravitation between grid cells/clusters to deal with the low-density cells [25]. DBSMiner employs an efficient high-dimensional clustering algorithm, CBSD (Clustering Based on Sorted Dense units), to deal with high-dimensional data sets. The algorithm has the unique feature that, to handle low-density sub-spaces, it does not need to scan the whole space but only checks the neighboring cells, and it can find interesting association rules.

MQAR

MQAR (Mining Quantitative Association Rules based on a dense grid) is a novel algorithm proposed by Yang and Zhang [75]. Its main objective is to mine numeric association rules using a tree structure, the DGFP-tree, to cluster dense space. The algorithm helps to eliminate noise and redundant rules by transforming the problem into finding regions of sufficient density and mapping them to quantitative association rules. A novel subspace clustering algorithm was also proposed, based on searching the DGFP-tree and inserting each dense cell of the database space into the DGFP-tree as a path from the root node to a leaf node. MQAR has the advantage that the DGFP-tree compresses the database, so there is no need to scan the database several times.

ARCS

The Association Rule Clustering System (ARCS) [40] was presented by Lent et al. together with a new geometric-based clustering algorithm, BitOp. The paper considers the problem of clustering association rules of the form \((A \wedge B) \Rightarrow C\), where the left-hand side has quantitative attributes and the right-hand side a categorical attribute, and forms a two-dimensional grid in which each axis represents one of the left-hand-side attributes. ARCS is an automated system for computing a clustering of two-attribute spaces in large databases. For a given partitioning of the input attributes, the ARCS binner makes only one pass through the data and allows the support or confidence thresholds to change without requiring a new pass. The BitOp algorithm enumerates the clusters, performing bit-wise operations to locate them within the bitmap grids.

Partitioning and Combining

In [60], Srikant and Agrawal discussed the problems of numeric attributes in databases and addressed the issue of mining association rules from large databases containing both numerical and categorical attributes. A partitioning method was introduced that partitions quantitative attributes into intervals and maps the pairs (attribute, interval) to Boolean attributes. A measure of partial completeness was introduced to quantify the information lost due to partitioning and to decide the number of partitions and whether or not to partition a quantitative attribute. The following formula computes the number of required partitions.

$$\begin{aligned} \mathrm{Number \,\,of\,\, intervals} = \frac{2n}{m(K-1)} \end{aligned}$$
(6)

where n is the number of numeric attributes, m is the minimum support and K is the partial completeness level. To identify interesting rules and to prevent the generation of similar rules, the authors used the “greater-than-expected-value” interest measure.
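For example, with \(n = 2\) quantitative attributes, a minimum support of \(m = 0.1\), and a partial completeness level of \(K = 2\), Eq. (6) yields \(2 \cdot 2/(0.1 \cdot (2-1)) = 40\) intervals.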

In [14], a novel algorithm, APACS2 was proposed, which implemented adjusted difference analysis to find the interesting associations among attributes. This algorithm has the advantage of discovering both positive and negative associations and it avoids user-specified threshold, which is hard to determine. Fukuda et al. [22] presented a novel algorithm to generate optimized intervals in linear time for sorted data. They used randomized bucketing as a prepossessing method because it was expensive to sort the quantitative attribute for large databases.

Table 8 Summary of different numerical association rule mining methods

Discussion

In Table 8, we discuss the advantages and disadvantages of the optimization, discretization, and distribution methods. Every mining method for numerical association rules has pros and cons; although fundamentally different, these approaches share the standard support and confidence measures and mostly rely on user-specified thresholds. We investigated which methods use the discretization technique as a pre-processing step for partitioning or finding the intervals of numeric attributes, and observed that none of the sub-methods of the optimization method use the discretization technique, whereas it is used in the distribution method. Figure 2 depicts the year-wise contribution of each method to NARM. Most algorithms of the discretization method were proposed in the 20th century and only a few in the 21st century; OFARM, proposed in 2014, is the most recent among them. In the swarm intelligence method, parallel PSO and MOPSO are the most recent algorithms across all methods. Algorithms from evolution-based methods appeared after 2000. The distribution method was proposed in 2003 and does not contribute much to NARM. Recently, a grand report tool has also been proposed, which reports the mean values of a chosen numeric target column with respect to all possible combinations of influencing factors [51].

Fig. 2
figure 2

Year-wise contribution of existing algorithms of NARM

Conclusion

Real-world databases contain a high volume of quantitative/numerical and categorical data; therefore, it is essential to use NARM methods for discovering knowledge from these data sets. In this article, we conducted a detailed study of three NARM methods and their supporting algorithms and investigated the use of the discretization technique for partitioning numerical attributes in the various NARM methods. We found that the optimization methods (evolution-based, swarm-intelligence-based and physics-based algorithms) do not use discretization techniques; however, they have higher computational costs. The distribution method has not been discussed much in the literature and does not support the multiple-comparisons procedure. In the discretization method, the curse of dimensionality and the requirement of many user-specified thresholds are disadvantages. Finding the best partition is still very challenging, and there is vast scope for it in NARM. This article highlighted open research challenges and the pros and cons of popular NARM methods and algorithms. We conclude that no single NARM method is perfect for discovering patterns from real-world datasets.