Abstract
The size of collected data is increasing, and the number of rules generated from those datasets is growing with it. Producing compact and accurate models has therefore become one of the most important tasks of data mining.
In this research work, we develop a new associative classifier, ACHC, which utilizes agglomerative hierarchical clustering as a post-processing step to reduce the number of rules; we also propose a new method for the rule-selection step to increase classification accuracy.
Experimental evaluations show that ACHC achieves significantly better results than classical rule-learning algorithms in terms of the number of rules on larger datasets while maintaining classification accuracy on those datasets. More precisely, ACHC achieved the best result on the average number of rules (43) and the third-highest average classification accuracy (84.8%) among 10 classification algorithms.
Keywords
- Frequent itemsets
- Class association rules
- Associative classification
- Agglomerative hierarchical clustering
- Cluster center
1 Introduction
Since information technologies are developing very rapidly and the amount of collected data is growing, analyzing such big data is an important task of data mining.
The number of rules generated from “real-life” datasets can easily grow very large, which may cause a combinatorial explosion. Therefore, mining association rules from these data and reducing their number to produce compact models for end-users is becoming a crucial data mining task [17].
To overcome this problem and achieve more compact as well as understandable models, rules have to be pruned and/or clustered.
Association [1] and classification rule mining [2, 7, 26] are two important fields of data mining. Classification rule mining aims at building accurate models to forecast the class value of a future object by selecting small subsets of rules, while association rule mining algorithms find all existing rules in a dataset based on some user-specified constraints by exploring the entire search space.
Associative Classification (AC) [5, 12, 14, 18, 21, 22] is another data mining technique that combines classification and association rule mining. The main goal of AC methods is to produce compact, accurate and descriptive models based on association rules. The accuracy of AC methods can sometimes exceed that of traditional classification methods, albeit at the cost of efficiency; AC methods are also sensitive to user-defined parameters such as minimum support and confidence.
Another area in data mining is clustering [13, 25, 29]. Clustering methods can usually be divided into two groups: partitional clustering [24, 30], which aims at grouping similar objects together by using partitioning techniques, and hierarchical clustering [28], which produces a nested sequence of partitions.
In this research work, we present a novel cluster-based associative classification method. Firstly, we describe a new normalized combined distance metric to find the similarity of two class association rules (CARs). Secondly, we cluster the CARs by using a bottom-up approach of hierarchical agglomerative clustering. In this step, we automatically identify the optimal number of clusters. Thirdly, after CARs are clustered, we present a novel method of extracting the “representative” CAR for each cluster. Finally, we develop a compact and accurate associative classification model by including just the representative CARs from each cluster.
The performance of our method (ACHC) is evaluated on 12 selected datasets taken from the UCI ML Repository [6] and compared with 9 popular (associative) classification approaches, such as Decision Table and Naïve Bayes (DTNB) [10], Decision Table (DT) [15], PART (PT) [7], C4.5 [26], CBA [18], Simple Associative Classifier (SA) [20], FURIA (FR) [11], Ripple Down rules (RDR) [27], and J&B [19].
2 Related Work
The first related approach [3] is clustering of association rules based on the k-means clustering algorithm. Like our approach, it utilizes the “APRIORI” algorithm [1] to generate the association rules. The key differences between ACHC and this method are that (1) it uses a different algorithm in the clustering step, clustering the rules based on interestingness measures with the k-means method, and (2) our method uses a novel distance metric to cluster the rules.
The fuzzy clustering algorithm on association rules (FCAR) [4] is proposed based on rule simulation. In this method, researchers aimed to develop a fuzzy clustering method which partitions n objects into k subsets. FCAR uses the “APRIORI” algorithm in the rule-generation part as in our method, but they use a different technique (partitional) in the clustering phase. Another key difference is that they cluster the association rules while we intend to cluster class association rules.
Another related approach [8] is distance-based clustering of association rules. In this method, an “indirect” distance metric (based on CARs support and coverage) is used to group the association rules, while we develop a novel “combined” distance metric (based on direct and indirect measures) and an agglomerative clustering method to cluster the rules.
In [16], the authors mine clusters with association rules. The APRIORI algorithm is used to find strong CARs in the first step, and then these CARs are clustered using a hierarchical clustering method, as in our method. However, they use a different, “indirect” distance metric (based on coverage probabilities) in the clustering step.
The algorithm described in this paper extends our previous work from [21] and [22]. In [21], the CMAC algorithm used a direct distance metric in the clustering phase of CARs and the cluster centroid approach to select the representative CAR for each cluster, while in [22] CMAC was compared to two similar algorithms, one using the direct distance metric with a covering approach in the representative-CAR selection phase, and the other using a combined (direct and indirect) distance metric with the same covering approach. This paper presents the only remaining combination, i.e., using a combined distance metric in the CAR clustering phase and the cluster centroid approach to select the representative CAR for each cluster.
3 Our Proposed Method – ACHC
We assume that we are given a relational table consisting of S examples (transactions). Each example is described by A different attributes and classified into one of C known classes.
Our main goals and contributions within this research are listed below:
-
Developing a novel distance metric to measure the similarity of class association rules;
-
Clustering of class association rules by utilizing the hierarchical clustering algorithm and finding the optimal number of clusters for each class value;
-
Defining a new method of selecting a representative class association rule for each cluster to represent the compact and accurate classifier;
-
Performing an experimental evaluation to show the advantages and disadvantages of the developed algorithm.
3.1 Class Association Rules
This section describes how to produce the strong class association rules. We first discover the frequent itemsets and then generate the class association rules from them. To generate the frequent itemsets, we apply the minimum support threshold, that is, only itemsets that satisfy the minimum support constraint are generated in the first step. After generating the frequent itemsets, it is straightforward to discover the class association rules from them. In this step, we apply the minimum confidence threshold to produce strong CARs.
We utilize the APRIORI algorithm to generate the frequent itemsets, because APRIORI is a well-known and frequently used algorithm for association rule mining.
Once we generate the frequent itemsets, our next goal is to produce the strong class association rules that satisfy the minimum confidence constraint. We check the confidence of each rule and, if it satisfies the required threshold, we output that rule. The confidence of a rule \(A\to B\) is computed as follows:

\(confidence(A\to B)=\frac{support\_count(A\cup B)}{support\_count(A)}\)  (1)
Equation (1) describes the confidence of a rule based on the support count of a frequent itemset, where A represents the antecedent (left-hand side of the rule) and B the consequence (class value in the case of CARs) of a rule. Moreover, support_count \((A \cup B)\) is the number of examples in the dataset that match the itemset \(A \cup B\), and support_count (A) is the number of examples that match the itemset A. We generate the strong class association rules that satisfy the minimum confidence constraint based on Eq. (1), as follows:
-
1.
All nonempty subsets S of frequent itemset L belonging to class C are generated;
-
2.
For every nonempty subset S of L, output the strong rule R in the form “S→C” if \(confidence(R)\ge\) min_conf, where min_conf is the minimum confidence threshold.
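The two steps above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes the frequent itemsets and their per-class support counts have already been mined (e.g., with APRIORI), and the tuple layout of a rule is our own choice.

```python
def strong_cars(freq_counts, class_counts, min_conf):
    """Generate strong CARs of the form S -> C using the Eq. (1) filter.

    freq_counts  : {frozenset(antecedent): support count of the antecedent}
    class_counts : {(frozenset(antecedent), class_label): support count of
                    the antecedent together with that class}
    min_conf     : minimum confidence threshold in [0, 1]
    """
    rules = []
    for (antecedent, cls), joint_count in class_counts.items():
        confidence = joint_count / freq_counts[antecedent]  # Eq. (1)
        if confidence >= min_conf:
            rules.append((antecedent, cls, joint_count, confidence))
    return rules
```

For example, an antecedent covered 10 times, 8 of them with class “yes”, yields a rule with confidence 0.8, which survives a 0.6 threshold.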
3.2 The New “combined” Distance Metric
In our previous research [21], we proposed a novel “direct” distance metric and described its advantages and disadvantages. In this research, we use the previously developed “direct” distance metric and also adopt an “indirect” distance metric based on the Conditional Market-basket Probability (CMBP) [8]. Using a probability estimate for distance computation has many advantages: probabilities are well understood and intuitive, they are a good measure for further processing, and they are appropriate for rules with the same consequent. The distance \({d}^{CMBP}\) between two rules rule1 and rule2 is the (estimated) probability that one rule does not hold for a basket, given that at least one of the two rules holds for that basket. This distance is defined as follows:

\({d}^{CMBP}(rule1,rule2)=1-\frac{\left|m({BS}_{rule1}\cup {BS}_{rule2})\right|}{\left|m({BS}_{rule1})\right|+\left|m({BS}_{rule2})\right|-\left|m({BS}_{rule1}\cup {BS}_{rule2})\right|}\)  (2)
where BS denotes both sides of a rule, that is, the itemset of the association rule. \(m(BS)\) and \(m({BS}_{rule1}\cup {BS}_{rule2})\) denote the sets of transactions (baskets) matched by \(BS\) and \({BS}_{rule1}\cup {BS}_{rule2}\), respectively, while \(\left|m(BS)\right|\) and \(\left|m({BS}_{rule1}\cup {BS}_{rule2})\right|\) are the numbers of such transactions.
With this metric, rules having no common market baskets are at a distance of 1, and rules valid for an identical set of baskets are at a distance of 0.
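A small sketch of the CMBP distance, assuming each rule's coverage is available as a set of transaction ids (the function name and inputs are our own illustration, not the authors' code):

```python
def cmbp_distance(cov1, cov2):
    """CMBP distance between two rules with the same consequent.

    cov1, cov2 : sets of transaction ids matched by each rule's itemset.
    Returns 1 - |covered by both| / |covered by at least one|, so rules
    covering identical transactions are at distance 0 and rules with no
    common transactions are at distance 1.
    """
    either = len(cov1 | cov2)
    if either == 0:
        return 1.0  # neither rule matches anything; treat as maximally distant
    return 1.0 - len(cov1 & cov2) / either
```

For instance, two rules covering {1, 2, 3} and {2, 3, 4} share 2 of 4 distinct transactions, giving a distance of 0.5.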
In this research, we combine the “direct” and “indirect” distance metrics to produce a new Weighted and Combined Distance Metric (WCDM). WCDM combines a direct measure (rule items) and an indirect measure (rule coverage). The weighted distance \({d}^{\mathrm{WCDM}}\) between two rules, rule1 and rule2, is defined in Eq. (3):

\({d}^{WCDM}(rule1,rule2)=\alpha \cdot {d}^{Direct}(rule1,rule2)+(1-\alpha )\cdot {d}^{CMBP}(rule1,rule2)\)  (3)
where \(\alpha\) is a weight-balancing parameter in (3). We set \(\alpha =0.5\) when developing the distance metric, that is, the contributions of the direct and indirect distances are considered equal to obtain a weighted and balanced metric. The resulting distance metric is defined in Eq. (4):

\({d}^{WCDM}(rule1,rule2)=0.5\cdot {d}^{Direct}(rule1,rule2)+0.5\cdot {d}^{CMBP}(rule1,rule2)\)  (4)
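The combined metric then reduces to a one-line convex combination. The sketch below assumes the direct and CMBP distances have already been computed for a pair of rules:

```python
def wcdm(d_direct, d_cmbp, alpha=0.5):
    """Weighted Combined Distance Metric: a convex combination of the
    'direct' (rule-item) distance and the 'indirect' (CMBP coverage)
    distance, with weight alpha on the direct part."""
    return alpha * d_direct + (1 - alpha) * d_cmbp
```

With the default alpha = 0.5, both components contribute equally, e.g. wcdm(0.2, 0.6) yields 0.4.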
3.3 Clustering
Clustering algorithms are usually split into two groups: partitional and agglomerative hierarchical clustering. In our research, we apply the complete-linkage method of agglomerative hierarchical clustering (in a bottom-up fashion) because it is frequently used and more consistent than other linkage methods.
In bottom-up algorithms, every example is initially considered a separate cluster, and the two nearest clusters are merged in each iteration until all examples have been merged into a single cluster.
In the complete-linkage (farthest-neighbor) method of agglomerative hierarchical clustering, the similarity between two clusters is computed based on their most dissimilar examples, that is, the distance between the farthest members of the two clusters is taken as the inter-cluster distance.
To cluster the class association rules, we first identify the optimal number of clusters. In this step, we utilize the most-common technique (described in Algorithm 1), where the dendrogram is cut from the point which represents the maximum difference of two consecutive cluster heights.
Algorithm 1 gets cluster heights (distance between two clusters) which are computed during the dendrogram construction process as an input, and it outputs the optimal number of clusters.
Line 3 stores the total number of clusters in N. For each pair of consecutive cluster heights (lines 4–9), we compute the difference and store the position of the maximum difference in the Opt_number_of_cls parameter.
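Algorithm 1 can be sketched as follows, assuming the merge heights produced during dendrogram construction are given in ascending order (variable names are illustrative):

```python
def optimal_cluster_count(heights):
    """Cut the dendrogram at the largest gap between two consecutive
    merge heights and return the resulting number of clusters.

    heights : merge heights in ascending order; for N leaves there are
              N - 1 merges, hence N - 1 values.
    """
    n = len(heights) + 1          # total number of leaf clusters
    best_gap, best_k = -1.0, 1
    for i in range(len(heights) - 1):
        gap = heights[i + 1] - heights[i]   # difference of consecutive heights
        if gap > best_gap:
            best_gap = gap
            best_k = n - (i + 1)  # clusters remaining after merges 0..i
    return best_k
```

For example, heights [0.1, 0.2, 0.9] (4 leaves) have their largest gap between 0.2 and 0.9, so the dendrogram is cut there, leaving 2 clusters.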
3.4 Selecting the “Representative” CAR
After clustering the CARs, we define an algorithm (presented in Algorithm 2) to select a “representative” CAR for each cluster. Only representative CARs are included in the final classifier.
Algorithm 2 describes the method of extracting the representative CAR. Firstly, for each CAR, the distances between it and all other CARs in the cluster are calculated. Secondly, we find the CAR with the minimum total distance and return that class association rule as the representative.
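A minimal sketch of this cluster-centroid (medoid) selection, assuming a list of CARs and a pairwise distance function such as the WCDM of Sect. 3.2 (both inputs are our own illustrative abstraction):

```python
def representative(cluster, dist):
    """Return the CAR whose summed distance to all other CARs in the
    cluster is minimal (the cluster 'centroid', i.e. a medoid).

    cluster : non-empty list of CARs
    dist    : callable giving the distance between two CARs
    """
    return min(
        cluster,
        key=lambda r: sum(dist(r, other) for other in cluster if other is not r),
    )
```

As a toy check, with "rules" 0, 1 and 10 under absolute difference, 1 has the smallest total distance (1 + 9 = 10) and is selected.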
Algorithm 3 summarizes all the above-mentioned steps and describes the final ACHC classifier. It takes the training dataset and the minimum support and confidence thresholds as input parameters, and outputs the compact and accurate associative classification model, ACHC.
The first three lines generate the strong CARs defined in Sect. 3.1 and sort them in descending order of confidence and support according to the following criterion:
If \({Rule}_{1}\) and \({Rule}_{2}\) represent two CARs, \({Rule}_{1}\) is said to have higher rank than \({Rule}_{2}\) (\({Rule}_{1}>{Rule}_{2}\)):
-
if and only if, \(conf\left({Rule}_{1}\right)>conf({Rule}_{2})\) or
-
if \(conf\left({Rule}_{1}\right)=conf({Rule}_{2})\), but \(supp\left({Rule}_{1}\right)>supp({Rule}_{2})\) or
-
if \(conf\left({Rule}_{1}\right)=conf({Rule}_{2})\) and \(supp\left({Rule}_{1}\right)=supp\left({Rule}_{2}\right),\) \(\mathrm{but} \,\,{Rule}_{1}\) has fewer items in its left-hand side than \({Rule}_{2}\).
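The ranking criterion above maps directly onto a sort key; the tuple layout of a rule below (items, class, support, confidence) is a hypothetical choice for illustration:

```python
def rank_key(rule):
    """Sort key implementing the CAR ranking: higher confidence first,
    then higher support, then fewer items on the left-hand side."""
    items, _cls, supp, conf = rule
    return (-conf, -supp, len(items))

# sorted(rules, key=rank_key) puts the highest-ranked CAR first
```

With this key, a rule tying another on confidence and support but with a shorter left-hand side is ranked higher.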
Once the CARs are sorted, we group them according to class value (line 4) and then build the distance matrix (line 6) for each group of CARs (defined in Sect. 3.2) to apply the hierarchical clustering algorithm with complete linkage (for example, if we have 3 class values, we build the model for three groups of CARs and merge them at the end). Line 7 finds the cluster heights (distances between clusters), and the optimal number of clusters is identified using these heights (line 8). After determining the optimal number of clusters, the hierarchical clustering algorithm is utilized again to find the clusters of CARs (the Cluster array stores the list of clustered CARs) in line 9. In the last step (lines 10–12), we select the representative class association rule for every cluster (described in Sect. 3.4) to produce the final compact and accurate model.
4 Experimental Setting and Results
Our developed algorithm is compared against 9 rule learners in terms of classification accuracy and number of rules. All associative classifiers were run with the default parameters set by the WEKA [9] software. Some parameters (minimum support and confidence) were modified on “imbalanced” datasets to achieve the intended number of rules (at least 10 rules for each class value) for the AC methods.
Statistical significance testing is performed with the paired t-test (the significance threshold was set to 95%). The description of the datasets and input parameters is shown in Table 1.
Moreover, a 10-fold cross-validation assessment technique was employed to represent all experimental results. Table 2 shows the experimental results for classification accuracies (with standard deviations).
Table 2 shows that our proposed associative classifier (ACHC) achieved an average accuracy (84.8%) comparable to the other classification models on the selected datasets. More precisely, ACHC attained the third-highest average accuracy among the 10 classification and association rule mining algorithms.
Our developed classifier obtained the best accuracy on “Balance” (except DTNB), “Breast.Can”, “Spect.H”, “Hayes.R” and “Connect4” datasets among all algorithms, while on “Car.Evn” and “Nursery” datasets, ACHC is beaten by all “classical” classification models.
Unexpectedly, the standard deviation of all rule learners was high on the “Hayes.R”, “Lymp” and “Connect4” datasets (this happens mainly with imbalanced datasets, which affect the rule-generation part of the “APRIORI” algorithm).
It can be seen from Fig. 1 that all classification algorithms achieved similar average accuracies, except DTNB, DT and SA.
Table 3 represents the statistical significance testing (wins/losses counts) on accuracy between ACHC and other classification models. W: winning count (our approach was significantly better than the compared algorithm); L: losing count (our approach was significantly worse than the selected algorithm); N: no statistically significant difference has been detected in the comparison.
Table 3 illustrates that the accuracy of the ACHC method is better than that of the DTNB, DT and SA methods, while it is similar to C4.5, FR, J&B, RDR and CBA according to the win/loss counts. Table 4 shows the sizes (numbers of rules) of all classification methods.
Experimental evaluations on the number of rules (Table 4) show that ACHC produced the best result in terms of the average number of rules among 10 rule-learners. Our approach produced a statistically smaller classifier on “Car.Evn” and “Nursery” datasets, although not achieving the best classification accuracies on those datasets.
On “Hayes-root” and “Balance” datasets, ACHC obtained an unexpectedly larger number of rules (due to imbalanced datasets) but it produced accurate classifiers for those datasets.
Figure 2 illustrates that associative classifiers achieved better results than the “classical” classification models on the average number of rules. The main reason is that “classical” classification models are sensitive to the size of the dataset.
Table 5 shows that ACHC produced smaller classifiers than DTNB, DT, C4.5, PT, FR, SA, and J&B algorithms on more than 8 datasets out of 12 (by win/losses count).
5 Conclusion and Future Work
Overall, our developed method achieved the best result on the average number of rules by exhaustively searching the entire example space using constraints and clustering, while maintaining a classification accuracy comparable to state-of-the-art rule-learning classification algorithms. Experimental evaluations showed that ACHC produced an accurate and compact classifier that reduced the number of classification rules by a factor of 2–4 on average compared to the other “classical” rule learners, and this ratio is even larger on datasets with more examples.
The main drawback of our proposed method (ACHC) is its time efficiency. In future work, we plan to optimize ACHC to bring its time complexity closer to that of state-of-the-art “divide-and-conquer” rule-learning algorithms.
We also plan to develop new methods by setting different values for the \(\alpha\) parameter and to perform experiments that show the advantages and disadvantages of those methods.
References
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Bocca, J.B., Jarke, M., Zaniolo, C. (eds.) VLDB 1994 Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499. Chile (1994)
Cohen, W.W.: Fast Effective rule induction. In: Prieditis, A., Russel, S.J. (eds.) ICML 1995 Proceedings of the Twelfth International Conference on Machine Learning, pp. 115–123. California (1995)
Dahbi, A., Mouhir, M., Balouki, Y., Gadi, T.: Classification of association rules based on K-means algorithm. In: Mohajir, M.E., Chahhou, M., Achhab, M.A., Mohajir, B.E. (eds.) 4th IEEE International Colloquium on Information Science and Technology, pp. 300–305. Tangier, Morocco (2016)
Dechang, P., Xiaolin, Q.: A new fuzzy clustering algorithm on association rules for knowledge management. Inf. Technol. J. 7(1), 119–124 (2008)
Deng, H., Runger, G., Tuv, E., Bannister, W.: CBC: an associative classifier with a small number of rules. Decis. Support Syst. 50(1), 163–170 (2014)
Dua, D., Graff, C.: UCI Machine Learning Repository. University of California, Irvine, CA (2019)
Frank, E., Witten, I.: Generating accurate rule sets without global optimization. In: Shavlik, J.W. (eds) Fifteenth International Conference on Machine Learning, pp. 144–151. USA (1998)
Gupta, K.G., Strehl, A., Ghosh, J.: Distance based clustering of association rules. In: Proceedings of Artificial Neural Networks in Engineering Conference, pp. 759–764. USA (1999)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)
Hall, M., Frank, E.: Combining Naive Bayes and Decision Tables. In: Wilson, D.L, Chad, H. (eds.) Proceedings of Twenty-First International Florida Artificial Intelligence Research Society Conference, pp. 318–319, Florida, USA (2008)
Hühn, J., Hüllermeier, E.: FURIA: an algorithm for unordered fuzzy rule induction. Data Min. Knowl. Disc. 19(1), 293–319 (2009). https://doi.org/10.1007/s10618-009-0131-8
Hu, L.Y., Hu, Y.H., Tsai, C.F., Wang, J.S., Huang, M.W.: Building an associative classifier with multiple minimum supports. SpringerPlus 5, 528 (2016). https://doi.org/10.1186/s40064-016-2153-1
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, USA (1990).
Khairan, D.R.: New associative classification method based on rule pruning for classification of datasets. IEEE Access 7, 157783–157795 (2019)
Kohavi, R.: The power of decision tables. In: Lavrač, N., Wrobel, S. (eds) 8th European Conference on Machine Learning, pp. 174–189. Crete, Greece (1995)
Kosters, W.A., Marchiori, E., Oerlemans, A.A.J.: Mining clusters with association rules. In: Hand, D.J., Kok, J.N., Berthold, M.R. (eds.) IDA 1999. LNCS, vol. 1642, pp. 39–50. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48412-4_4
Lent, B., Swami, A., Widom, J.: Clustering association rules. In: Gray, A., Larson, P. (eds.) Proceedings of the Thirteenth International Conference on Data Engineering, pp. 220–231. England (1997)
Liu, B., Hsu, W., Ma, Y.: Integrating classification and association rule mining. In: Agrawal, R., Stolorz, P. (eds.) Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pp. 80–86. New York, USA (1998)
Mattiev, J., Kavšek, B.: A compact and understandable associative classifier based on overall coverage. In: Procedia Computer Science, vol. 170, pp. 1161–1167. Warsaw, Poland (2020)
Mattiev, J., Kavšek, B.: Simple and accurate classification method based on class association rules performs well on well-known datasets. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds.) LOD 2019. LNCS, vol. 11943, pp. 192–204. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-37599-7_17
Mattiev, J., Kavšek, B.: CMAC: clustering class association rules to form a compact and meaningful associative classifier. In: Nicosia, G., et al. (eds.) LOD 2020. LNCS, vol. 12565, pp. 372–384. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-64583-0_34
Mattiev, J., Kavšek, B.: Distance-based clustering of class association rules to build a compact, accurate and descriptive classifier. Comput. Sci. Inf. Syst. 18(3), 791–811 (2021). https://doi.org/10.2298/CSIS200430037M
Mattiev, J., Kavsek, B.: Coverage-based classification using association rule mining. Appl. Sci. 10, 7013 (2020). https://doi.org/10.3390/app10207013
Ng, T.R., Han, J.: Efficient and effective clustering methods for spatial data mining. In: Bocca, J.B., Jarke, M., Zaniolo, C. (eds.) Proceedings of the 20th Conference on Very Large Data Bases (VLDB), pp. 144–155. Santiago, Chile (1994)
Phipps, A., Lawrence, J.H.: An overview of combinatorial data analysis. In: Clustering and Classification, pp. 5–63. World Scientific, New Jersey (1996)
Quinlan, J.: C4.5: programs for machine learning. Mach. Learn. 16(3), 235–240 (1993)
Richards, D.: Ripple down rules: a technique for acquiring knowledge. Decision-making support systems: achievements, trends and challenges for, pp. 207–226. IGI Global, USA (2002)
Theodoridis, S., Koutroumbas, K.: Hierarchical algorithms. Pattern Recogn. 4(13), 653–700 (2009)
Zait, M., Messatfa, H.: A comparative study of clustering methods. Futur. Gener. Comput. Syst. 13(2–3), 149–159 (1997)
Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: an efficient data clustering method for very large databases. In: Widom, J. (ed) Proceedings of the 1996 ACM-SIGMOD International Conference on Management of Data, pp. 103–114. Montreal, Canada (1996)
Acknowledgement
The authors gratefully acknowledge the European Commission for funding the InnoRenew CoE project (Grant Agreement #739574) under the Horizon2020 Widespread-Teaming programme and the Republic of Slovenia (Investment funding of the Republic of Slovenia and the European Union of the European Regional Development Fund). They also acknowledge the Slovenian Research Agency ARRS for funding the project J2-2504. Jamolbek Mattiev is also funded for his Ph.D. by the “El-Yurt-Umidi” foundation under the Cabinet of Ministers of the Republic of Uzbekistan.
© 2021 Springer Nature Switzerland AG
Mattiev, J., Kavšek, B. (2021). ACHC: Associative Classifier Based on Hierarchical Clustering. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2021. IDEAL 2021. Lecture Notes in Computer Science, vol 13113. Springer, Cham. https://doi.org/10.1007/978-3-030-91608-4_55