1 Introduction

Association rule mining (ARM) has attracted significant attention in data mining research. It involves determining the frequency of items and deriving conditional inference rules between them. Data mining applications span numerous techniques across many domains [1]. Data mining is the process of extracting useful knowledge from large datasets by exploring and analyzing massive blocks of information to discover significant patterns and correlations, including hidden relationships between items. Frequent itemsets are the foundation of frequent itemset mining (FIM) over transaction databases. Association rules are essential in many real-life applications, such as cross-marketing, retail, market basket analysis, fraud detection, and spam email filtering. The underlying databases may be transactional or relational [2, 3].

Association analysis is widely used to discover behavioral patterns. It relies on support and confidence thresholds to determine whether the data are correlated. Support is the frequency of occurrence of an itemset in the transaction dataset, and confidence indicates the strength of a rule as a percentage. An itemset is called frequent when its support exceeds or equals the user-defined minimum support. The methodology first finds the frequent itemsets by scanning the dataset according to the support threshold and then extracts association rules from them [4]. Discovered association rules whose confidence is less than the minimum confidence are eliminated. ARM [5] is one of the most widely used techniques for association analysis; based on it, potential relationships between items in a large transaction database can be identified. ARM seeks to recognize the transactional behaviors that conform to the primary metrics used to build an association rule [6]. Minimum support and minimum confidence are the primary metrics for evaluating the effectiveness of the rules generated from the database.

Let I = {i1, i2, i3, …, in} be a set of distinct items and DB = {T1, T2, T3, …, Tm} a database of transactions, where every transaction T is a subset of the item list I. An association rule has the form X → Y with X ∩ Y = ∅, where X and Y are subsets of I. The itemset X is referred to as the antecedent and Y as the consequent. The support of X depends on its occurrence frequency in the database: freq(X)/|DB| is the fraction of rows that contain all items of X. The support of an association rule is determined from the combination of the antecedent and the consequent (X, Y) as follows:

$$Support\left( X \to Y \right) = \frac{freq\left( X \cup Y \right)}{\left| DB \right|}$$

The confidence of an association rule is the percentage of rows containing X that also contain Y:

$$Confidence\left( X \to Y \right) = P\left( Y \mid X \right) = \frac{Support\left( X \cup Y \right)}{Support\left( X \right)}$$
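For concreteness, the following minimal Python sketch computes the support and confidence of a rule X → Y according to the formulas above; the item names and the toy transaction database are illustrative only.

```python
# Minimal sketch: support and confidence of a rule X -> Y over a toy
# transaction database (item names are illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def support(itemset, db):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(antecedent, consequent, db):
    """support(X ∪ Y) / support(X) for the rule X -> Y."""
    return support(antecedent | consequent, db) / support(antecedent, db)

X, Y = {"bread"}, {"milk"}
print(support(X | Y, transactions))   # support of the rule: 2/4 = 0.5
print(confidence(X, Y, transactions)) # 0.5 / 0.75 ≈ 0.667
```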

ARM seeks to recognize all rules that meet user-specified constraints. Association mining aims to derive interesting similarities, frequent patterns, and association rules from transaction databases or other data repositories. Numerous data mining applications build on these techniques, for example product placement in retail shopping, telecommunication networks [7], the textile industry [8], physical education [9], and the Internet of Things (IoT) [10]. These applications operate on items with specific attributes: patients/symptoms, restaurants/menus, internet/keywords, baskets/products, and railways/timings.

Several tasks are of great concern, such as market basket analysis, financial product promotion, warehouse planning, news recommendation, and network fault analysis. Market basket analysis indicates which products are frequently purchased together and can therefore inform promotion strategies. For example, the purchase of shampoo is often accompanied by the purchase of a conditioner, so promoting shampoo may increase conditioner sales. In traditional algorithms, the users themselves specify the primary metrics for discovering the association rules, and the results are directly affected by whether these values are set too small or too large.

Frequent pattern mining techniques face two major problems. First, the database is scanned several times. Second, the candidate-generation step produces far too many candidate itemsets. These two problems are the main efficiency bottlenecks in FIM.

Many FIM techniques have been discussed extensively, and surveys of frequent itemset mining are given in [11, 12]. One of the best-known algorithms is Apriori, which finds frequent itemsets in large databases and extends them to larger frequent itemsets [13]. The frequent patterns identified by Apriori are then used to generate association rules. However, the Apriori algorithm repeatedly scans the database during mining, which slows it down, and its search space over all frequent itemsets is large. Many enhanced algorithms [14, 15] have been proposed to increase Apriori's efficiency based on the original algorithm, and several useful algorithms for distributed and parallel frequent itemset mining exist. The AprioriTID algorithm [16] is a variation of Apriori in which a TID list stores the IDs of the transactions containing each itemset. AprioriTID reuses the encoding of the candidates created in the previous iteration, so the encoding shrinks after every pass, which reduces the mining complexity to a certain extent. Nevertheless, many challenges remain [17]. Handling unbalanced and cost-sensitive data is an essential concern, mining massive datasets is a challenge, and designing efficient algorithms that extract valuable patterns in iterative and collaborative mining is a major interest in big data applications [18].
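To make the candidate-generation bottleneck concrete, the sketch below implements a bare-bones level-wise Apriori loop in Python. It is an illustration only: it omits the subset-pruning and TID-list refinements of the variants cited above, and the toy database is invented for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Bare-bones level-wise Apriori returning {itemset: support count}.
    Every pass re-scans the whole database and joins candidate itemsets,
    which is exactly the cost the tree-based methods try to avoid."""
    db = [set(t) for t in transactions]
    n = len(db)
    # Level 1: frequent single items.
    items = {i for t in db for i in t}
    counts = {frozenset([i]): sum(1 for t in db if i in t) for i in items}
    frequent = {s: c for s, c in counts.items() if c / n >= min_support}
    result, k = dict(frequent), 2
    while frequent:
        # Join step: size-k candidates from frequent (k-1)-itemsets.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        # Count step: another full database scan for this level.
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c / n >= min_support}
        result.update(frequent)
        k += 1
    return result

# Example with an illustrative toy database and min_support = 0.5.
print(apriori([["f", "c", "a"], ["f", "c", "b"], ["c", "b"]], 0.5))
```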

A further achievement in FIM is the FP-growth algorithm [19, 20], which compresses the information required for frequent pattern mining into an FP-tree without candidate generation and recursively builds FP-trees for all frequent patterns. Prefix-trees are used to store the database in the compact FP-tree model, which is decomposed into conditional pattern bases whose frequent itemsets are then mined separately.

To compensate for this deficit in the mining procedure, this study proposes the modified FP-growth (MFP-growth) algorithm based on existing methods. The proposed MFP-growth has three main features. First, it employs an address table structure to reduce the mapping difficulty of recurrent itemsets in the MFP-tree. Second, a new FP-tree* structure eliminates the need to construct conditional FP-trees. Third, by using the header address table and the FP-tree*, the proposed algorithm has lower memory demand and higher efficiency than FP-growth-based algorithms.

Experimental results compare the proposed method with various existing algorithms, including the FP-growth particle swarm optimization algorithm (PSOFP-growth) [21]. PSOFP-growth is an enhanced algorithm that finds an appropriate support level and mines rules using the FP-growth algorithm. The MFP-growth and PSOFP-growth algorithms were applied to the association analysis of a social security incident, a problem that is not easy to solve. The experimental results showed that MFP-growth analyzes frequent itemsets effectively with lower storage requirements, using data filtering to provide an accurate output of association rules that meet the specified requirements. MFP-growth also outperforms the other algorithms in execution time under various minimum supports.

The applicability of traditional association rule algorithms has decreased as a result of the explosive growth of data, and it is hard to discover the needed rules directly from a large amount of data. Thus, in the context of large-scale data, the proposed MFP-growth improves the efficiency of the mined rules and decreases the algorithm's running time and memory consumption. The major contributions of this paper are listed below:

  • Proposing the MFP-growth algorithm to boost FP-growth efficiency.

  • Using an address table layout to present frequent itemsets in the MFP-tree, which reduces its mapping complexity.

  • Presenting the MFP-tree mining methodology, which obviates the requirement of constructing conditional pattern bases and facilitates the configuration of the MFP-tree.

  • Evaluating the performance of the modified algorithm on twenty different datasets with various minimum support levels.

  • Testing the efficiency of the proposed algorithm by conducting comparative experiments with numerous advanced algorithms, such as the FP-growth and PSOFP-growth algorithms.

  • Presenting a simplified model intended to increase the reliability of association rules and greatly improve the user experience.

  • Applying data filtering in the MFP-growth mining process to measure the strength of the generated rules.

The rest of the paper is arranged as follows. Section 2 introduces the related work with a brief survey of current FIM and association rule mining algorithms. The MFP-growth algorithm and MFP-tree construction are discussed in Sect. 3. The comparative experiments on real and synthetic datasets, including dataset descriptions, runtime, and memory consumption, are analyzed in Sect. 4. Finally, Sect. 5 concludes the paper and presents future work.

2 Related works

The construction of conditional FP-trees incurs considerable space and time overhead [17, 19–47], and many attempts [20–36, 47] have been made to resolve this issue. Zaki [22] developed the equivalence class transformation (ECLAT) algorithm for frequent itemset mining. It relies on the intersection property and can perform both sequential and parallel computing. Unlike Apriori and FP-growth, ECLAT works vertically and creates a TID list for each item. Unfortunately, like Apriori, ECLAT suffers from generating too many candidate itemsets. The hyper-structure mining of frequent patterns (H-Mine) algorithm [23] was proposed to overcome the FP-growth performance bottlenecks by using queues rather than a tree data structure. H-Mine organizes transaction items into distinct queues and uses hyperlinks to connect transactions sharing the same first item. It is considered good for sparse datasets and is more efficient in memory and runtime than Apriori and FP-growth, but on dense datasets FP-growth significantly outperforms H-Mine. Borgelt [24] developed an effective implementation of an FP-growth projection design to accelerate the construction of conditional FP-trees. Grahne et al. [25] proposed a new FP-array approach to minimize tree traversal time, but the recurring construction of conditional FP-trees remains. Lin et al. [26] utilized a new address table in the improved FP-growth (IFP-growth) structure to boost FP-growth efficiency; however, it still requires considerable computation to generate conditional FP-trees. Tanbeer et al. [27] introduced a compressed pattern tree structure (CP-tree) that captures the database information in one scan (the insertion phase). The CP-tree has been shown to enhance frequent pattern mining efficiency and is useful in applications such as interactive and incremental mining.

Current parallelization methods are based largely on multithreading and distributed architectures. For example, the work in [28] parallelizes FP-growth with shared memory using a multithreading approach: FIM workflows are executed on separate computing devices and the final results are obtained by analyzing their output data. Map-Reduce [29], another successful distributed computing framework implemented in Hadoop, provides a secure, powerful, and fault-tolerant data processing service. For instance, the parallel FP-growth (PFP) algorithm [30] employs three Map-Reduce phases to distribute the tasks of FP-growth and combine the intermediate data. The frequent node set-based boundary POC-tree (FNBP) algorithm [31] employs the Map-Reduce approach to identify mining tasks by establishing an early-stage boundary that removes irrelevant patterns. The mining medical aggregation behavior based on distributed computing (DCMMAB) methodology [32] incorporates the distributed Map-Reduce computing model with ARM techniques to identify abnormal activities in the medical insurance process; it uses a detection algorithm over the medical cluster that focuses, based on frequent pattern mining, on distinct behaviors of patients during medical treatment. However, Map-Reduce is generally not preferred for workflows with iterative operations, which are typical of ARM algorithms, and this has driven the increased usage of the Apache Spark framework [33].
Many methods have been proposed that exploit Spark's memory-based workflow engine, such as yet another frequent itemset mining (YAFIM) [34]. These algorithms still suffer from the inherent Apriori limitations in time and memory complexity. Zhang et al. [35] introduced the distributed frequent itemset mining (DFIMA) pruning procedure, which uses a matrix to reduce the computational complexity of the pattern-growth stage. The parallel Spark workflow version of the array prefix-tree growth (PAPT-growth) algorithm [36] facilitates the processing of large data as a Spark workflow. There is also the associative classification (AC) method [37]; however, it produces too many rules, an issue inherited from ARM. Classification-based association (CBA) [37] is one of the algorithms developed under the AC approach; it employs Apriori's candidate-generation function to explore association rules in databases. The active pruning rules (APR) method in [38], a new classification approach, showed an improvement in classification precision. The EAA-SMO technique employed an enhanced Apriori algorithm (EAA) together with sequential minimal optimization (SMO) over a context ontology to mine and analyze collected data using the discovered patterns and rules [39]. The fast incremental updating frequent pattern growth algorithm (FIUFP-growth) is a newer approach that focuses on incremental frequent itemset mining by using a new incremental conditional pattern tree structure (ICP-tree) [40]. Table 1 summarizes all of the above-mentioned related works.

Table 1 Summary of the related work from 2000 to 2021

The related works above provide a better understanding of the outstanding issues. It is therefore necessary to improve the operating efficiency of FIM algorithms when mining high-dimensional datasets. The contribution of this paper focuses on enhancing the execution time and the association rules of the algorithm. Irrelevant association rules can misguide decision-making and must be taken into account. This paper concentrates on improving the reliability of the mined association rules and reducing their memory space requirements.

3 The proposed modified FP-growth algorithm

The proposed MFP-growth algorithm aims to decrease both runtime and memory usage. The FP-growth algorithm traverses the header table in descending order of item frequency, scans the current FP-tree to collect the conditional pattern base of each item, and then recursively creates conditional FP-trees without making full use of pruning strategies. Pattern growth is achieved by extending suffix patterns with the frequently generated itemsets. Figure 1 presents an example of the recursive mining operation in the FP-growth algorithm [41]. These recursive FP-tree constructions require additional time and memory, so a new structure is required to avoid these costly operations and improve runtime performance.

Fig. 1
figure 1

Recursive mining process of FP-growth algorithm
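As an illustration of the step depicted in Fig. 1, the following Python sketch gathers the conditional pattern base of one suffix item from a set of prefix paths. The path representation and the counts are illustrative only; they are not the paper's actual FP-tree data structures.

```python
def conditional_pattern_base(prefix_paths, item):
    """Return the prefix paths that lead to `item`, truncated before it,
    together with the count attached to each path."""
    base = []
    for path, count in prefix_paths:
        if item in path:
            idx = path.index(item)
            if idx > 0:
                base.append((path[:idx], count))
    return base

# Prefix paths ending at item p (item names follow the running example;
# the counts are illustrative).
paths = [(["f", "c", "a", "m", "p"], 2), (["c", "b", "p"], 1)]
print(conditional_pattern_base(paths, "p"))
# [(['f', 'c', 'a', 'm'], 2), (['c', 'b'], 1)]
```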

The MFP-tree is a prefix-tree with a node layout and a frequent-item address table consisting of three fields: one field holds the node list of the frequent item, whereas the other two fields carry the item and the total frequency count accumulated over every node in the node list. The tree also maintains the relationships between itemsets.

The structure of the proposed MFP-tree consists of three components:

  • Root node: labeled "null."

  • Child nodes: a set of prefix subtrees.

  • Header table: a table of the frequencies of the items.

There are three attributes in every node in the FP-tree:

  • Item name: registers the name of the item stored in this node.

  • Count: registers the number of item occurrences.

  • Node link: points to the next node that holds the same item, or is labeled null if there is none.

There are two attributes for every node in the FP-tree*, as sketched in code after this list:

  • Item name.

  • Head of node link: points to the address table.
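The node and address-table layouts listed above can be sketched as plain Python classes. The field names below are ours and merely mirror the attributes in the lists; they are not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class FPNode:
    """Tree node: item name, occurrence count, node link to the next node
    holding the same item (None if there is none), plus parent and
    children pointers forming the prefix subtrees."""
    item: Optional[str]                       # None for the root node
    count: int = 0
    node_link: Optional["FPNode"] = None
    parent: Optional["FPNode"] = None
    children: Dict[str, "FPNode"] = field(default_factory=dict)

@dataclass
class AddressTableEntry:
    """One row of the frequent-item address table: the item, its total
    count accumulated over its node list, and the head of that list."""
    item: str
    total_count: int = 0
    head: Optional[FPNode] = None
```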

The MFP-tree is built primarily over an FP-tree structure, which keeps the information of the dataset without requiring the generation of new FP-trees. In addition, the MFP-growth algorithm eliminates the generation of subtrees and conditional pattern bases, which consume additional time and memory, so the proposed algorithm traverses the tree quickly. This paper introduces the FP-tree* technique, which is used in every conditional pattern mining step to minimize the need for reconstructing FP-trees. By combining the FP-tree* approach with the conditional FP-tree methodology, an MFP-tree mining approach is proposed to discover frequent itemsets effectively. The FP-tree* mining concept is similar to that of conditional FP-trees, but its mining direction differs. Every FP-tree* can be built from the original FP-tree, which reduces memory demands.

The FP-tree* construction is presented as follows. Each node in the FP-tree* consists of the item name, node count, and node link. Algorithm 1 shows the procedure for constructing an FP-tree*. The support of each parent node (PNode) is accumulated from its child nodes (CNode), and the node link of every node connects it to the other nodes carrying the same item name.

figure a
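The following Python sketch shows one possible reading of the construction described by Algorithm 1, reusing the FPNode and AddressTableEntry classes sketched above; it is illustrative rather than the authors' implementation. Each prefix path is inserted with its count, parent supports accumulate from the paths passing through their children, and nodes with the same item name are chained through node links.

```python
def build_fp_tree_star(prefix_paths):
    """Illustrative FP-tree* construction from (path, count) prefix paths,
    where paths are already ordered by descending item frequency."""
    root = FPNode(item=None)
    address_table = {}
    for path, count in prefix_paths:
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item=item, parent=node)
                node.children[item] = child
                entry = address_table.setdefault(item, AddressTableEntry(item))
                # New node becomes the head of this item's node list.
                child.node_link, entry.head = entry.head, child
            # Parent supports accumulate from every path passing through.
            child.count += count
            address_table[item].total_count += count
            node = child
    return root, address_table
```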

Figure 2 shows the mining scheme of the FP-tree* of item p. The frequent itemsets associated with f, c, and p are mined recursively. For the frequent item "p", the following paths are drawn: {<f:1, c:1, a:1, p:1>, <f:1, c:1, p:1>, <c:1, p:1>}. The supports of nodes f and c in the tree are 2, assembled from the paths <f, c, a, p> and <f, c, p>. In the FP-tree*, the frequent items are recorded in the address table; the frequent itemset {pc} is output, and the FP-tree* of {pc} is constructed for further mining. No frequent itemsets are derived from {pc} in this example, so the frequent itemsets associated with p are {p, pc}.

Fig. 2
figure 2

FP-tree* of item p

The advantages of the FP-tree* procedure are as follows. First, memory is saved because conditional FP-trees are not reconstructed as in FP-growth. Second, the construction of the FP-tree* is fast because no extra cost is needed to eliminate infrequent items.

The MFP-tree generates and extracts frequent patterns. Algorithm 2 shows the basic steps of the MFP-tree construction, namely: (1) find the frequent items by scanning the DB once, (2) sort the items by frequency in descending order, and (3) scan the DB again and build the MFP-tree. MFP-growth adopts a hybrid method for frequent itemset mining. The frequent itemsets (P) are output whenever new frequent itemsets are found. At the beginning of the algorithm, the frequent itemsets in a single path are discovered. Then, the MFP-growth mining method recognizes the type of the conditional pattern tree (β) and, depending on that type, generates all the frequent itemsets from the MFP-tree by performing the MFP-growth mining procedure shown in Algorithm 3. The frequent itemsets are explored from the MFP-tree from last to first using a bottom-up approach.

figure b
figure c
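A minimal Python sketch of steps (1)–(3) of Algorithm 2 follows, again reusing the routines sketched above; the mining procedure of Algorithm 3 is not reproduced. This is our reading of the construction under stated assumptions, not the authors' implementation.

```python
from collections import Counter

def build_mfp_tree(transactions, min_support_count):
    """Two-scan construction sketch: one scan counts item frequencies,
    only frequent items are kept and sorted in descending frequency
    order, and a second scan inserts every reordered transaction as a
    (path, 1) prefix path via build_fp_tree_star."""
    counts = Counter(item for t in transactions for item in t)
    frequent = {item for item, c in counts.items() if c >= min_support_count}

    def reorder(t):
        # Ties in frequency are broken alphabetically in this sketch.
        return sorted((i for i in set(t) if i in frequent),
                      key=lambda i: (-counts[i], i))

    paths = [(reorder(t), 1) for t in transactions]
    return build_fp_tree_star(paths)
```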

Figure 3 shows the flowchart of the MFP-growth algorithm, and its construction proceeds as follows. Table 2 presents the transaction database, and Table 3 displays the list of frequent items with their support counts. The frequent items are arranged in descending order of support count, and items with lower support values are pruned.

Fig. 3
figure 3

Flowchart of the MFP-growth algorithm

Table 2 Transaction database
Table 3 Items priority

A simple example of the MFP-tree construction process is illustrated in Fig. 4. First, the MFP-growth algorithm generates a node called "root." The second scan then generates nodes for each transaction. Figure 4 shows that the first transaction of the database contains five items f, c, a, m, and p arranged in descending order, building the first branch (f:1) (c:1) (a:1) (m:1) (p:1) of the MFP-tree with five nodes. The second transaction contains the five items (f:1) (c:1) (a:1) (b:1) (m:1); this branch shares the prefix 'f', 'c', 'a' with the existing path of transaction "01", so the counts of those nodes are incremented to "2", and new nodes for items 'b' and 'm' are generated from item 'a'. This continues for the remaining transactions. Figure 4c shows the complete MFP-tree.

Fig. 4
figure 4

Construction of MFP-tree
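As a usage sketch of the construction routine above, the two transactions quoted in the text can be inserted directly in their Table 3 priority order; the remaining transactions of Table 2 are not reproduced here, so the counting and reordering scan is skipped.

```python
# Usage sketch for the two transactions quoted in the text, already sorted
# in the f, c, a, ... priority of Table 3.
paths = [(["f", "c", "a", "m", "p"], 1),
         (["f", "c", "a", "b", "m"], 1)]
root, table = build_fp_tree_star(paths)
# The shared prefix f -> c -> a carries count 2, while b, m, and p branch
# off with node count 1, matching the description of Fig. 4 (the two m
# nodes each have count 1, so m's total in the address table is 2).
print({item: entry.total_count for item, entry in table.items()})
# {'f': 2, 'c': 2, 'a': 2, 'm': 2, 'p': 1, 'b': 1}
```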

4 Results analysis and discussion

The validity of the proposed MFP-growth algorithm was examined through four experiments. The first experiment compared the MFP-growth, CBA, and APR algorithms. The second experiment tested the superiority of the MFP-growth algorithm compared with four recent frequent itemset mining algorithms. The third experiment is a comparative analysis of MFP-growth and five FIM algorithms in terms of memory consumption. Finally, the fourth experiment examined the runtime of the modified MFP-growth algorithm, PSOFP-growth, and the well-known Apriori and FP-growth algorithms, where runtime is the time needed to mine the frequent itemsets of all transactions. For performance evaluation, datasets from various domains were chosen, and different support levels were used for each dataset. The Weka 3.8 workbench [42] was used to conduct the evaluations; it is open-source machine learning software built in Java that implements different data mining methods such as classification, regression, association rule mining, and visualization. The experiments were performed on a desktop computer with 4 GB of free RAM and a Core i5 processor running Windows 10, and they measure the execution time against the "minsup" threshold parameter.

4.1 EXP 1: classifier processing time

The performance of the proposed MFP-growth algorithm was evaluated using various datasets from the University of California Irvine (UCI) repository [43]. Table 4 presents the full information of the datasets used in the implementation, where all numerical attributes are described. Several established classification algorithms that generate rule-based classifiers were used to conduct the experiments. In all experiments, the minimum support and minimum confidence for APR, CBA, and MFP-growth were set to 2% and 30%, respectively, following [38]. The minimum support value has a significant influence on the number of generated candidates, whereas the minimum confidence value has minimal influence on the execution and was set to 30%.

Table 4 UCI dataset characteristics

Figure 5 shows the time the algorithms required for classifier construction on the UCI datasets. The proposed algorithm consistently outperforms CBA and APR in terms of the training time required to generate the classifier. This is because the MFP-growth pruning procedure discards the generation of conditional subtrees. This is not the case with CBA, because it assumes that all associated rules satisfy the confidence threshold; thus, CBA tests all candidates until the training dataset is empty during classifier construction. On the other hand, the APR classifier needed less execution time than CBA, which is noticeable on the "Mushroom" and "Led" datasets, which have a significant number of data examples compared with the other datasets. Overall, the proposed algorithm took less processing time to build the classifier than CBA and APR, providing clear evidence of the effectiveness of the MFP-growth pruning procedure.

Fig. 5
figure 5

Processing time needed for creating the UCI dataset classifier

4.2 EXP 2: execution time evaluation

Several workflow experiments were conducted on a Spark platform to compare MFP-growth with the PFP, PAPT-growth, DFIMA, and YAFIM algorithms and test the execution time on large-scale databases [33]. Various datasets were used in the experiments: mushroom, accidents, KddCup99, and Kosarak were downloaded from [43], whereas the synthetic datasets T10I4D100K and T40I10D100K were obtained from [44] and generated with the IBM data generator. Table 5 presents the properties and characteristics of every dataset. Different minimum support levels were run on every dataset in this experiment. Figures 6 and 7 show the experimental results for each dataset.

Table 5 Properties of the real-life and synthetic datasets
Fig. 6
figure 6figure 6

Performance of running time on real-life datasets

Fig. 7
figure 7

Performance of runtime on synthetic datasets

The mushroom dataset is not large but is highly complex; its transactions are mostly replicated and redundant, indicating that it contains many potential frequent items. Figure 6a shows the algorithms' runtime on the mushroom dataset for different support levels. Due to Apriori's inherent limitation, the running time of YAFIM is far beyond the axis range, which is why the YAFIM curve is not shown in the figure. As shown in the figure, the PFP curve tends to rise as the threshold approaches 1%, probably because many items have a support count between 1 and 3%. The curves of MFP-growth, PAPT-growth, and PFP are quite smooth for support counts between 4 and 5%. In this experiment, MFP-growth clearly outperforms the other three algorithms because the dataset is dense, which strongly affects the candidate-generation operations required by their pruning procedures.

The accidents database is quite dense and contains data about traffic accidents. Figure 6b shows the experimental results on the accidents dataset. Increasing the minimum support value influences the runtime of all algorithms: the number of associated patterns decreases as the support level increases, which significantly improves the recursive process.

KddCup99 and Kosarak, from the UCI repository, are both distinctive real-life databases. KddCup99 consists of many short transactions with small, separate items, making it fairly easy to process. Figure 6c and d shows the runtime measurements on KddCup99 and Kosarak, respectively. As shown in Fig. 6c, all the curves increase gradually as the support threshold decreases, and MFP-growth outperforms the other state-of-the-art algorithms. Kosarak is a sparse dataset that contains many distinct items. All performance curves gradually increase as the support threshold decreases, as shown in Fig. 6d, and MFP-growth again shows better time performance than the PFP, PAPT-growth, DFIMA, and YAFIM algorithms.

Figure 7 shows the effectiveness measures for the synthetic datasets T10I4D100K and T40I10D100K. These datasets were created under different conditions by the IBM data generator; they are similar but vary in size and complexity. All algorithms show consistent efficiency at different thresholds, as seen in Fig. 7a and b. Because T10I4D100K contains many distinct items, variation in the support value has little effect on the number of frequent itemsets, and thus the curves appear smooth. As shown in Fig. 7b, the running time of the MFP-growth algorithm is very low, which increases the mining speed. These results show that MFP-growth with the MFP-tree is superior in time cost for FIM to PAPT-growth, PFP, DFIMA, and YAFIM. Even when the minimum support is reduced, MFP-growth remains effective. Therefore, MFP-growth is well suited for high-performance computing.

4.3 EXP 3: memory consumption evaluation

This experiment evaluates the performance of six frequent itemset mining algorithms on the zoo dataset. Zoo is a dense dataset that contains many common items. The algorithms were compared based on their memory usage for discovering the set of all valid association rules [45]. The full description of the zoo dataset is shown in Table 6, and the results obtained are shown in Fig. 8.

Table 6 Dataset description
Fig. 8
figure 8

Performance of memory usage on a dense dataset

The vertical axis of the graph shows the memory in MB, and the horizontal axis shows the different support threshold values. The charts in Fig. 8 show the memory consumption of the MFP-growth, AprioriTID, H-Mine, Apriori, ECLAT, and FP-growth algorithms. The support threshold varies from 20 to 40% to record the maximum memory utilization at different support counts. Since zoo is a dense dataset containing many frequent items, the size of the constructed MFP-tree is significantly smaller due to the overlapping of frequent items. Moreover, the proposed algorithm eliminates the generation of conditional pattern bases and conditional subtrees, which removes the need for space to store conditional patterns. Therefore, the experimental results prove the superiority of the proposed MFP-growth algorithm in terms of memory consumption. FP-growth is the next best in memory usage due to its compact and small FP-tree construction. On the other hand, Apriori and AprioriTID are the most expensive algorithms in terms of memory usage because of the huge number of candidates stored during each pass and the multiple database scans.

4.4 EXP 4: number of generated association rules

This experiment investigates the association rules for social security incidents. We compare the proposed MFP-growth algorithm with PSOFP-growth [21], the standard Apriori, and the FP-growth algorithms to confirm the superiority of our algorithm and analyze the output association rules. Table 7 shows that the number of association rules explored by the MFP-growth algorithm is approximately 29% lower than that of the Apriori and FP-growth algorithms, achieved by adjusting the preset support value and filtering the data to keep only the association rules that satisfy the specified requirements. Some irrelevant rules were removed, which reduced the memory space requirements and made it easier to find the needed rules in massive datasets. These results show that the modified algorithm can construct a new set of rules with high confidence and improved efficiency, indicating the significant effect of data filtering on the output.

Table 7 Comparison of the number of rules produced with the conventional algorithms

The experiments were conducted with the modified MFP-growth, PSOFP-growth, Apriori, and FP-growth algorithms. The runtime comparison of the algorithms is presented in Fig. 9. The MFP-growth algorithm requires even less time than the PSOFP-growth algorithm and the traditional Apriori and FP-growth algorithms. As shown in the figure, the greater the support, the lower the execution time of the algorithm, indicating the significant impact of the support level on algorithm performance. These results show that, in terms of time cost, MFP-growth using the MFP-tree is the best choice for ARM compared with the PSOFP-growth, Apriori, and FP-growth algorithms.

Fig. 9
figure 9

Comparison of runtime with different minimum support

5 Conclusion

This paper presented a new scheme to discover associations from a wide range of relations across a dataset. We developed an improved MFP-growth algorithm that discovers frequent associations by combining the FP-tree* mining method with the header table of FP-growth. The major advantages of MFP-growth and the MFP-tree are that they eliminate the requirement of reconstructing conditional pattern bases and subtrees and simplify tree construction. Extensive workflow experiments on different types of datasets showed that MFP-growth significantly improves the mining speed over recent FIM algorithms for association rules generated with various support levels and confidence thresholds. The first experiment was conducted on a set of 12 different UCI datasets and compared the performance of MFP-growth with the CBA and APR algorithms. To further evaluate the efficiency of MFP-growth, the second experiment compared it with four recent frequent itemset mining algorithms: PAPT-growth, PFP, YAFIM, and DFIMA. Further experiments evaluated memory consumption on a dense dataset and compared the association rules that satisfy the degree of interest and the runtime of the enhanced MFP-growth, PSOFP-growth, Apriori, and FP-growth algorithms applied to the ARM of social security incidents. Different support levels were used for each dataset. Finally, data filters were used to demonstrate the effectiveness of the association rules, facilitating the collection of productive knowledge. The findings confirmed the efficiency of MFP-growth and showed that it is a suitable algorithm for dense datasets. However, MFP-growth has the same efficiency constraint as FP-growth because both need a prefix-tree to represent transaction information: the MFP-tree cannot be stored on one node when the transaction database exceeds a certain size. Although the information is stored in a compact form in the MFP-tree structure, long running times may render it impractical for time-critical applications. Thus, when the database is very large, the algorithm may not fit in shared memory.

In future work, we intend to further evaluate the accuracy and performance of the modified algorithm by applying it to other massive real-life datasets. Moreover, association rule discovery can be a powerful tool in fighting pandemics when utilized in data mining to advance science and provide new insights into health care.