1 Introduction

With the rapid growth of technology, data collection has become a relatively easy task. Huge amounts of data can be collected and transformed into valuable information. Data mining is a method of retrieving and transforming raw data into meaningful information [1]. One well-known data mining technique is association rule mining (ARM), an approach for discovering valuable relationships among a set of items within a given set of data. It was first introduced by Agrawal et al. [2]. ARM produces a list of rules, made up of antecedents and consequents.

In ARM, dealing with a dataset containing both categorical and quantitative attributes is a challenging problem. Most ARM methodologies discretize quantitative attributes and treat them as categorical [3]. However, quantitative attributes usually span a wide range of continuous values, and discretizing all the attributes is a complex process [4, 5]. To deal with a numerical dataset, a discretization scheme is required to transform numerical data into categorical data. This transformation requires a predefined range or limit, which is subjectively defined; if the range is not suitable, the result is unlikely to be accurate. To overcome this problem, some methods have incorporated discretization into the algorithm itself [6, 7]. This paper likewise proposes an association rule algorithm that can automatically categorize numerical data. The proposed algorithm therefore does not require a separate data discretization step, since it categorizes each variable automatically.

Measuring the quality of rules discovered by a data mining algorithm is a non-trivial problem. It involves several criteria, some of which are quite subjective. Ideally, the rules discovered by a data mining algorithm should be accurate, comprehensible (simple) and interesting (novel, surprising, useful) [8]. Consider, for example, the rule {Pregnant} → {Female}. This rule has very high accuracy and can be considered simple and comprehensible. However, it is entirely uninteresting, since it expresses a very obvious relationship. To cover these objectives, it has been proposed that ARM be treated as a multi-objective problem rather than a single-objective one.

Solving ARM as a multi-objective problem has been proposed in some previous studies. Ghosh and Nath [3] and Qodmanan et al. [9] proposed a multi-objective ARM considering three common objectives: confidence, comprehensibility and interestingness. These studies solved ARM using a multi-objective genetic algorithm. Other metaheuristic-based methods for solving multi-objective ARM have been proposed by Alatas et al. [6] and Beiranvand et al. [4]. Alatas et al. [6] considered the confidence, comprehensibility and amplitude of interval as the objectives, and solved the problem using a differential evolution-based algorithm. Recently, Beiranvand et al. [4] applied particle swarm optimization to solving multi-objective ARM.

This paper proposes a rule discovery method that uses an adaptive archive grid multi-objective particle swarm optimization algorithm (MOPSO) to find effective rules. It considers three objectives, namely confidence, comprehensibility and interestingness. In addition, this paper focuses on numerical data.

The remainder of this paper is arranged as follows. Section 2 describes the basic concepts of association rules, multi-objective optimization and PSO. The proposed MOPSO for numerical association rule mining is described in Sect. 3. Section 4 presents a thorough discussion of the computational experiments. Section 5 presents the case study. Finally, concluding remarks are given in Sect. 6.

2 Literature study

This section briefly reviews some theories related to association rule mining and particle swarm optimization as applied in this paper.

2.1 Association rules

Association rule mining (ARM) was first introduced by Agrawal et al. in 1993. It aims to find interesting relations between variables within a dataset. An association rule is an implication expression of the form \(X \to Y\), where \(X\) and \(Y\) are disjoint item sets. X is referred to as the antecedent, while Y is called the consequent. Usually, association rules are applied to datasets with Boolean attributes. Although Boolean ARM can extract meaningful information from the data, many real datasets consist of categorical (e.g., sex or brand) and quantitative (e.g., age, salary or heat) attributes.

2.2 Multi-objective optimization

A key challenge in real-world optimization problems is that multiple solutions may exist and it is difficult to compare one solution with another. Thus, how to optimize several objectives simultaneously is practically relevant. This paper applies the Pareto method to optimize more than a single objective. In the Pareto method, a candidate solution that is no worse than another candidate in every objective and strictly better in at least one is said to dominate that candidate. None of the solutions included in the Pareto optimal front (POF) is better than the other solutions in the same POF for all the objectives being optimized; hence, all of them are equally acceptable.

In a maximization problem, let \(F = \left\{ f_{1}, f_{2}, \ldots, f_{n} \right\}\) be a set of objective functions. A solution \(s\) belongs to the POF if there is no other solution \(s^{\prime}\) that dominates it. A solution \(s^{\prime}\) dominates \(s\) if and only if the two conditions in Eq. (1) are satisfied:

$$\forall i \in \left\{ 1,2, \ldots ,n \right\}: f_{i}\left( s^{\prime} \right) \ge f_{i}\left( s \right), \quad \text{and} \quad \exists i \in \left\{ 1,2, \ldots, n \right\}: f_{i}\left( s^{\prime} \right) > f_{i}\left( s \right)$$
(1)
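To make the dominance test concrete, the following minimal Python sketch implements Eq. (1) for maximization; representing a solution as a NumPy vector of objective values, and the function name itself, are illustrative assumptions rather than anything prescribed by the paper.

```python
import numpy as np

def dominates(s_prime, s):
    """Dominance test of Eq. (1), assuming maximization.

    s_prime, s: 1-D arrays of objective values [f_1, ..., f_n].
    """
    s_prime, s = np.asarray(s_prime), np.asarray(s)
    # No objective is worse, and at least one is strictly better.
    return bool(np.all(s_prime >= s) and np.any(s_prime > s))
```

For example, `dominates([0.9, 0.5], [0.8, 0.5])` returns `True`, while two solutions that each win on a different objective are mutually non-dominated and would both stay in the POF.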

Figure 1 shows an example of a Pareto optimal front with two objectives \(f_{1}\) and \(f_{2}\), which must be maximized simultaneously. The shaded region represents the feasible solutions. \(s_{i}\) represents solution \(i\) in the objective space. Solutions \(s_{1} ,s_{3} ,s_{6} ,s_{9} ,s_{11} ,s_{12}\) are not dominated by any other solutions among the available feasible solutions.

Fig. 1 Pareto front concept

2.3 Metaheuristic method

Metaheuristics provide acceptable solutions in a reasonable time for hard and complex problems in science and engineering [10, 11]. Recently, metaheuristic algorithms have been applied in association rule mining [12, 13]. Table 1 summarizes some previous studies related to the application of metaheuristic algorithms in multi-objective association rule mining for numerical datasets.

Table 1 Studies related to metaheuristics in numerical ARM

2.4 Multi-objective particle swarm optimization using adaptive archive grid (MOPSO)

The particle swarm optimization (PSO) algorithm is a nature-inspired metaheuristic proposed by Eberhart and Kennedy [14]. Each particle represents a potential solution. In searching for the optimal solution, the PSO algorithm maintains the search directions of all particles. Let \(x_{i} \left( t \right)\) denote the position of particle \(i\) in the search space at time step \(t\) [15]. Each particle iteratively moves across the search space in search of the best solution (fitness). Besides its current position, each particle also records the best position it has found so far, together with its fitness value; this position is called pbest. Another position, called gbest, is the best position among all particles. During the exploration process, the particles update their velocity (\(v_{i}\)), pbest and gbest. The velocity is updated using Eq. (2):

$$\vec{v}_{i}\left( t + 1 \right) = w\,\vec{v}_{i}\left( t \right) + r_{1} c_{1} \left( pbest - \vec{x}_{i}\left( t \right) \right) + r_{2} c_{2} \left( gbest - \vec{x}_{i}\left( t \right) \right),$$
(2)

where \(w\) is an inertia weight that controls the exploration and exploitation abilities of the particle, \(r_{1}\) and \(r_{2}\) are two random numbers uniformly distributed in [0, 1], and \(c_{1}\) and \(c_{2}\) are two acceleration constants which usually lie in the range [1, 4].
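As an illustration, a minimal sketch of the update in Eq. (2) follows; the function name and NumPy array representation are assumptions, and the default values of \(w\), \(c_{1}\) and \(c_{2}\) are borrowed from the case study settings in Sect. 5.2.

```python
import numpy as np

def update_velocity(v, x, pbest, gbest, w=0.8, c1=2.0, c2=1.0, rng=None):
    """Velocity update of Eq. (2) for one particle (1-D NumPy arrays)."""
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(), rng.random()  # uniform in [0, 1], as in Eq. (2)
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
```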

Multi-objective optimization (MOO) differs from single-objective optimization. This study applies a multi-objective approach inspired by Coello et al. [16]. The main purpose of an archive in multi-objective optimization algorithms is to track all of the non-dominated solutions found so far. The MOPSO is based on a Pareto archive strategy. It uses an adaptive grid concept, in which the objective space is divided into a number of hypercubes. The edge length of these hypercubes, \(l_{k}\), is calculated using Eqs. (3)–(5):

$$l_{k} = \frac{\alpha \left( f_{\max,k} - f_{\min,k} \right)}{nGrid},$$
(3)

where

$$f_{\max,k} = \max_{x_{a} \in A} f_{k}\left( x_{a} \right),$$
(4)
$$f_{\min,k} = \min_{x_{a} \in A} f_{k}\left( x_{a} \right),$$
(5)

where \(f_{k}\) is the value of objective function \(k\), \(f_{\max,k}\) and \(f_{\min,k}\) are the maximum and minimum values of objective function \(k\) over the archive, \(\alpha \in \left( 0,1 \right)\) is a parameter controlling the ratio between the objective range and the grid size, \(nGrid\) is the number of grid divisions and \(A\) denotes the archive.

A truncated archive is used to store non-dominated solutions. At each iteration, if the archive is not yet full, a new particle position representing a non-dominated solution is added to the archive. However, because of the size limit of the archive, priority is given to new non-dominated solutions located in less populated areas. When members of the archive must be deleted, members in densely populated areas have the highest probability of deletion; correspondingly, more densely populated hypercubes receive a lower selection score. Roulette wheel selection is then used to select a hypercube, \(H_{h}\), based on the selection fitness values. The global guide for particle \(i\) is selected randomly from among the members of hypercube \(H_{h}\); particles will therefore have different global guides.
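The grid-based leader selection can be sketched as follows. This is a simplified reading of Eqs. (3)–(5) and the roulette wheel step: the function names and the archive-as-matrix representation are introduced here for illustration, and the value x = 10 follows Sect. 3.1.4.

```python
import numpy as np

def hypercube_index(F, n_grid=7, alpha=1.0):
    """Assign each archive member to a hypercube (Eqs. 3-5).

    F: (members, objectives) array of the archive's objective values.
    """
    f_min, f_max = F.min(axis=0), F.max(axis=0)            # Eqs. (4)-(5)
    edge = alpha * (f_max - f_min) / n_grid + 1e-12        # Eq. (3), guard zero range
    coords = np.clip(np.floor((F - f_min) / edge), 0, n_grid - 1).astype(int)
    return [tuple(c) for c in coords]

def select_leader(F, n_grid=7, x=10.0, rng=None):
    """Roulette wheel over hypercubes: sparser cells score higher.

    Returns the archive index of the selected global guide.
    """
    rng = rng or np.random.default_rng()
    cells = hypercube_index(F, n_grid)
    unique = list(set(cells))
    score = np.array([x / cells.count(c) for c in unique])  # fitness = x / occupancy
    chosen = unique[rng.choice(len(unique), p=score / score.sum())]
    members = [i for i, c in enumerate(cells) if c == chosen]
    return members[rng.integers(len(members))]              # random member of the cell
```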

3 Methodology

This paper proposes a multi-objective particle swarm optimization algorithm using an adaptive archive grid for numerical association rule mining. The original MOPSO algorithm was proposed by Coello et al. [16]: a multiple-objective particle swarm optimization algorithm integrating the concept of Pareto dominance with an archive controller. Together, these two components decide which new non-dominated solutions found in each iteration are stored. The proposed MOPSO algorithm considers three objectives: confidence, comprehensibility and interestingness.

(a) Confidence

Confidence is a standard measure for association rules; it measures how often Y appears in transactions that contain X. The confidence value, \(CONF\), is obtained by Eq. (6):

$$CONF = \frac{\sigma \left( X \cup Y \right)}{\sigma \left( X \right)},$$
(6)

where \(\sigma \left( X \cup Y \right)\) is the total number of transactions that contain both X and Y, and \(\sigma \left( X \right)\) is the total number of transactions that contain X [17].

(b) Comprehensibility

Knowledge comprehensibility is a subjective concept: a rule that is incomprehensible to one user may be very comprehensible to another. Nevertheless, to avoid such subjective issues, the data mining literature often uses an objective measure of rule comprehensibility. Following Ghosh and Nath [3], the fewer the attributes in the antecedent, the more comprehensible the rule. Equation (7) quantifies the comprehensibility of rules which may contain more than one attribute in the consequent:

$$\text{Comprehensibility} = \frac{{\log \left( {1 + \left| C \right|} \right)}}{\log (1 + |A \cup C|)},$$
(7)

where \(|C|\) and \(\left| A \cup C \right|\) denote the number of attributes in the consequent and in the whole rule (both antecedent and consequent), respectively [5].

(c) Interestingness

Rule interestingness measures the potential for rules generated by ARM to surprise the user. The basic intuition behind this approach is to find the potentially interesting rules over the whole dataset. For instance, the rule {Salary = High} → {Credit = good} tends not to be very surprising for the user, even though it might be an overall good rule in the sense of being accurate and comprehensible. In contrast, a similar rule {Salary = High} → {Credit = bad} will be surprising and thus potentially interesting, since its prediction is the opposite of what the user would expect given the condition in the rule antecedent [18]. The interestingness can be calculated by Eq. (8):

$$\text{Interestingness} = \left[ {\frac{{SUP\left( {A \cup C} \right)}}{SUP\left( A \right)}} \right] \times \left[ {\frac{{SUP\left( {A \cup C} \right)}}{SUP\left( C \right)}} \right] \times \left( {1 - \frac{{SUP\left( {A \cup C} \right)}}{SUP\left( D \right)}} \right).$$
(8)

The equation contains three parts. The first expression describes the probability of generating the rule based on the antecedent. The second part shows the probability based on the consequent, while the last one describes the probability of not generating the rule based on the whole dataset [5].
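A sketch of how the three objectives might be computed for one interval rule over a numerical dataset is given below. The masks-and-bounds encoding mirrors the particle representation of Sect. 3.1.3, but all names, and the assumption that the rule has at least one antecedent and one consequent attribute, are illustrative.

```python
import numpy as np

def rule_objectives(data, ante_mask, cons_mask, lower, upper):
    """Confidence, comprehensibility and interestingness (Eqs. 6-8) of one rule.

    data: (n, m) numerical dataset; ante_mask, cons_mask: boolean masks over the
    m attributes; lower, upper: the rule's per-attribute interval bounds.
    Assumes at least one antecedent and one consequent attribute.
    """
    inside = (data >= lower) & (data <= upper)              # record matches interval
    cover_A = inside[:, ante_mask].all(axis=1)              # covered by the antecedent
    cover_C = inside[:, cons_mask].all(axis=1)              # covered by the consequent
    n = len(data)
    sup_A, sup_C = cover_A.sum(), cover_C.sum()
    sup_AC = (cover_A & cover_C).sum()
    confidence = sup_AC / sup_A if sup_A else 0.0           # Eq. (6)
    n_C, n_AC = cons_mask.sum(), ante_mask.sum() + cons_mask.sum()
    comprehensibility = np.log(1 + n_C) / np.log(1 + n_AC)  # Eq. (7)
    if sup_A and sup_C:                                     # Eq. (8), with SUP(D) = n
        interestingness = (sup_AC / sup_A) * (sup_AC / sup_C) * (1 - sup_AC / n)
    else:
        interestingness = 0.0
    return confidence, comprehensibility, interestingness
```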

These objectives are computed and treated as a maximization problem based on the Pareto optimal strategy. For evaluation purposes, the resulting rules are assessed using several measures. At least three factors can be evaluated to ensure the quality of the generated rules, namely support, amplitude and size.

(a) Support

This criterion measures the quality of a rule based on the number of occurrences of the rule in the whole dataset. A rule with very low support may occur simply by chance, whereas a rule with more occurrences in the dataset is considered to be of better quality. Support is evaluated over all the generated rules using Eq. (9), where \(n\) is the total number of transactions:

$$SUP = \frac{{\sigma \left( {X \cup Y} \right)}}{n}$$
(9)
(b) Amplitude

Amplitude evaluates the mean size of the intervals extracted for each attribute in the rules; a smaller amplitude is considered more interesting. The amplitude value is calculated by Eq. (10):

$$\text{Amplitude} = \frac{1}{m}\sum_{i = 1}^{m} \frac{u_{i} - l_{i}}{\max \left( A_{i} \right) - \min \left( A_{i} \right)},$$
(10)

where \(u_{i}\) and \(l_{i}\) are the upper and lower bounds of interval \(i\), respectively, \(A_{i}\) is attribute \(i\) and \(m\) is the number of attributes.

(c) Size

The size shows the mean number of attributes appearing in the resulting rules. In line with the comprehensibility objective, the fewer the attributes in the antecedent, the more understandable the rule. Thus, this factor is measured by the number of attributes involved in each rule; a smaller size indicates that fewer attributes are involved.
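Under the same illustrative encoding as the objective sketch above, the three evaluation measures could be computed as follows; averaging the amplitude over the attributes that actually appear in the rule is an assumption about how Eq. (10) is applied per rule.

```python
import numpy as np

def rule_quality(data, ante_mask, cons_mask, lower, upper, attr_min, attr_max):
    """Support (Eq. 9), amplitude (Eq. 10) and size of one interval rule.

    All arguments are NumPy arrays; attr_min, attr_max are the per-attribute
    minima and maxima over the dataset (assumed non-constant, so the
    denominators in Eq. 10 are nonzero).
    """
    inside = (data >= lower) & (data <= upper)
    used = ante_mask | cons_mask                       # attributes in the rule
    support = inside[:, used].all(axis=1).mean()       # Eq. (9): sigma(X u Y) / n
    width = (upper - lower) / (attr_max - attr_min)    # normalized interval widths
    amplitude = width[used].mean()                     # Eq. (10), over rule attributes
    size = int(used.sum())                             # attribute count of the rule
    return support, amplitude, size
```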

3.1 MOPSO algorithm

The proposed MOPSO for numerical ARM consists of three parts: initialization, adaptive archive grid and particle swarm optimization (PSO) searching. Figure 2 shows the flowchart of the MOPSO algorithm.

Fig. 2 Multi-objective particle swarm optimization algorithm using adaptive archive grid

3.1.1 Solution representation

In this MOPSO algorithm, each particle represents a rule solution. Each attribute in the particle consists of three parts. The first part of an attribute specifies which part of the rule it belongs to: if \(0 \le {\text{AC}}_{i} < 0.33\), the attribute belongs to the antecedent; if \(0.33 \le {\text{AC}}_{i} < 0.66\), it belongs to the consequent; and if \(0.66 \le {\text{AC}}_{i} \le 1\), it belongs to neither. The second and third parts represent the lower and upper bound values of that attribute in the dataset; the upper bound generated in each particle is always greater than the lower bound. Table 2 illustrates the particle representation.

Table 2 Solution representation

For example, suppose a dataset consists of four attributes (A, B, C, D). MOPSO represents a solution for this dataset as follows:

$${\text{A}}\left( {0.1} \right)\left( {0.2} \right)\left( {0.5} \right){\text{ B}}\left( {0.45} \right)\left( {0.8} \right)\left( {0.9} \right){\text{ C}}\left( {0.9} \right)\left( {0.1} \right)\left( {0.3} \right){\text{ D}}\left( {0.36} \right)\left( {0.7} \right)\left( {0.8} \right)$$

The expression above represents a generated rule A → BD. Attribute C is not included in the rule, since its AC value is 0.9. The rule covers attribute A from 0.2 to 0.5, attribute B from 0.8 to 0.9 and attribute D from 0.7 to 0.8.
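The decoding rule can be written down directly; the following sketch (with hypothetical function and variable names) reproduces the example above.

```python
def decode_particle(particle, names):
    """Decode (AC, lower, upper) triples into an antecedent -> consequent rule."""
    antecedent, consequent = [], []
    for (ac, lo, up), name in zip(particle, names):
        if ac < 0.33:                          # first part selects the rule role
            antecedent.append((name, lo, up))
        elif ac < 0.66:
            consequent.append((name, lo, up))
        # ac >= 0.66: the attribute is excluded from the rule
    return antecedent, consequent

# The particle from the example decodes to A -> BD:
ante, cons = decode_particle(
    [(0.1, 0.2, 0.5), (0.45, 0.8, 0.9), (0.9, 0.1, 0.3), (0.36, 0.7, 0.8)],
    ["A", "B", "C", "D"])
# ante == [("A", 0.2, 0.5)]; cons == [("B", 0.8, 0.9), ("D", 0.7, 0.8)]
```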

3.1.2 Initialization

The initial particles and velocities are randomly generated based on the size of the population and the aforementioned particle structure. The initial particle generation is based on [19]. The random particles are evaluated to find the potential solutions based on their objectives.

3.1.3 Adaptive archive grid

In this step, all the particle solutions are compared to obtain the non-dominated solutions based on the Pareto optimality approach. According to Coello et al. [16], two components maintain the repository, as follows:

(a) The archive controller

The archive controller decides whether a given solution is eligible to enter the archive. The non-dominated vectors found at each iteration in the primary population are compared (on a one-to-one basis) with the contents of the external repository, which is empty at the beginning of the search. If the external archive is empty, the current solution is accepted (Fig. 3, Case 1). If the solution is dominated by an individual within the external archive, it is discarded (Fig. 3, Case 2). Otherwise, if none of the elements in the external population dominates the solution wishing to enter, the solution is stored in the external archive (Fig. 3, Cases 3 and 4). Finally, if the external population has reached its maximum allowable capacity, the adaptive grid procedure is invoked (Fig. 3, Case 5).

Fig. 3 Possible cases for archive controller
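Reusing the `dominates` and `hypercube_index` sketches introduced earlier, the five cases could be arranged as below. Deleting from the single most crowded hypercube is a simplification of the paper's probabilistic deletion, and all names are illustrative.

```python
import numpy as np

def archive_update(archive, candidate, capacity=100, n_grid=7, rng=None):
    """Archive controller of Fig. 3 over a list of objective vectors (a sketch)."""
    rng = rng or np.random.default_rng()
    if any(dominates(a, candidate) for a in archive):
        return archive                                 # Case 2: candidate discarded
    # Drop members the candidate dominates, then accept it (Cases 1, 3 and 4).
    archive = [a for a in archive if not dominates(candidate, a)]
    archive.append(candidate)
    if len(archive) > capacity:                        # Case 5: adaptive grid truncation
        cells = hypercube_index(np.array(archive), n_grid)
        densest = max(set(cells), key=cells.count)     # most crowded hypercube
        victims = [i for i, c in enumerate(cells) if c == densest]
        archive.pop(victims[rng.integers(len(victims))])
    return archive
```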

(b) The grid

In order to produce well-distributed Pareto fronts, this approach uses a variation of the adaptive grid proposed by Knowles and Corne [20]. The basic idea is to use an external archive to store all the solutions that are non-dominated with respect to the contents of the archive. In the archive, the objective function space is divided into regions, as shown in Fig. 4. Note that if the individual inserted into the external population lies outside the current bounds of the grid, then the grid must be recalculated, and each individual within it must be relocated, as shown in Fig. 5.

Fig. 4
figure 4

Adaptive grid when the solution lies within the current boundaries of the grid

Fig. 5
figure 5

Adaptive grid when the solution lies outside the previous boundaries of the grid

The adaptive archive grid is a space formed by hypercubes. The detailed steps for the adaptive archive grid are as follows:

Step 1:

Obtain and store all non-dominated particles in the archive.

Step 2:

The archive controller determines which particles must be inserted into the repository (archive), based on the rules in Fig. 3. At the start, the archive controller allows all particle values to be added, since the archive, named rep, is empty.

Step 3:

Generate hypercubes over the search space explored so far, and locate the particles using these hypercubes as a coordinate system, where each particle's coordinates are defined according to the values of its objective functions.

Step 4:

Obtain all non-dominated particles. The non-dominated particles are obtained after the particles move to different positions in the search space. These new non-dominated particles are ready to be inserted into rep. However, the size of the archive must be checked first: if it exceeds the predefined size, the excess particles must be deleted based on their hypercube positions. Otherwise, rep is simply updated.

3.1.4 PSO searching

The PSO searching part of the proposed MOPSO algorithm proceeds in the following steps:

Step 1:

Stopping criteria

The stopping criterion in this MOPSO procedure is the number of iterations.

Step 2:

Local best selection (pBest)

Local best is the memory of each particle that serves as a guide to travel through the search space. If the current particle dominates the local best, then the local best will be updated by the current particle.

Step 3:

Global best selection (rep(h))

The global best, rep(h), is selected using roulette wheel selection from rep. The index h is selected in the following way: hypercubes containing more than one particle are assigned a fitness equal to a number x (x > 1; in this paper, x = 10 [16]) divided by the number of particles they contain.

Step 4:

Velocities update

The velocity of each particle is updated using Eq. (2).

Step 5:

Boundary constraints

The boundary constraints restrict the velocity and prevent particles from moving beyond the feasible search space. In this study, particle positions are limited to between 0 and 1, and velocities to between − 0.5 and 0.5, based on [16].

Step 6:

Particle update

The new position of particle \(i\), \(pop\left( i \right)\), is updated using Eq. (11), where \(v\left( i \right)\) is the velocity of particle \(i\).

$$pop\left( i \right) = pop\left( i \right) + v\left( i \right)$$
(11)
Step 7:

Particle evaluation

The new particle position is re-evaluated to find other potential solutions. The particle evaluation procedure is explained under Initialization.
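Putting Steps 2–7 together, one pass over the swarm could look like the following sketch. It reuses the earlier `update_velocity`, `select_leader` and `dominates` sketches, stores the archive as (position, objectives) pairs, and is an illustration rather than the authors' MATLAB implementation.

```python
import numpy as np

def pso_searching_step(pop, vel, pbest, pbest_f, rep, evaluate,
                       w=0.8, c1=2.0, c2=1.0):
    """One iteration of PSO searching (Steps 2-7).

    pop, vel: (particles, dim) position and velocity arrays; pbest, pbest_f:
    each particle's best position and its objectives; rep: non-empty list of
    (position, objectives) pairs; evaluate: maps a position to its objective
    vector (Eqs. 6-8).
    """
    F = np.array([f for _, f in rep])
    for i in range(len(pop)):
        gbest = rep[select_leader(F)][0]             # Step 3: rep(h), roulette wheel
        vel[i] = update_velocity(vel[i], pop[i], pbest[i], gbest, w, c1, c2)
        vel[i] = np.clip(vel[i], -0.5, 0.5)          # Step 5: velocity limits
        pop[i] = np.clip(pop[i] + vel[i], 0.0, 1.0)  # Step 6: Eq. (11) + position bounds
        f_new = evaluate(pop[i])                     # Step 7: particle evaluation
        if dominates(f_new, pbest_f[i]):             # Step 2: update the local best
            pbest[i], pbest_f[i] = pop[i].copy(), f_new
    return pop, vel, pbest, pbest_f
```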

4 Experiment results

This section presents the evaluation of the proposed method. The proposed MOPSO for numerical ARM is applied to several benchmark datasets. The evaluation covers a sensitivity analysis of the MOPSO parameter settings and a comparison with other metaheuristic methods.

4.1 Datasets

The computational experiments were conducted using three benchmark datasets selected from the Bilkent University Function Approximation Repository. These datasets were introduced by Guvenir and Uysal [21]. Table 3 gives a description of each dataset. The algorithm is implemented in MATLAB, on a PC with an Intel Core i7 @3.40 GHz Processor and 16 GB RAM.

Table 3 Brief information on the benchmark datasets

4.2 Parameter settings

Since the proposed algorithm involves several parameters, this section analyzes the effect of the parameter settings on the results. This paper applies a general factorial design to determine the best combination of tuning parameters. The experimental factors for PSO are the inertia weight (\(w\)), learning rate 1 (\(c_{1}\)) and learning rate 2 (\(c_{2}\)), while the archive size is fixed at 100. The factors and levels are summarized in Table 4. The parameter settings in this paper are based on previous papers and a preliminary experiment [20, 22, 23]. Table 5 summarizes the results of the general factorial design.

Table 4 Factors and levels
Table 5 Average MOPSO solution on different tuning parameters

Analysis of variance is employed to statistically test the effects of the MOPSO tuning parameters. The statistical results given in Table 6 show that, for the Basketball dataset, \(c_{1}\) and the interaction between \(w\) and \(c_{1}\) significantly influence the results; smaller values of \(c_{1}\) and \(w\) lead to better results for this dataset. For the Body Fat dataset, \(w\) and the interaction between \(w\) and \(c_{1}\) significantly influence the results, with better results obtained for higher values of \(w\) and \(c_{1}\). For the Quake dataset, on the other hand, different parameter settings do not significantly affect the results. The best parameter settings for each dataset are shown in Table 7.

Table 6 Statistical test for parameter analysis (p values)
Table 7 Optimal tuning parameters for MOPSO

4.3 Experimental results and analysis

In order to obtain the best solutions, 20 independent runs were conducted using the optimal parameters. The results were then compared with those of other algorithms, whose parameter settings were optimized in previous studies [4, 5, 7, 24]. The proposed MOPSO algorithm is compared with the multi-objective PSO algorithm for association rules (MOPAR) [4], the multi-objective differential evolution algorithm for mining numerical association rules (MODENAR) [6], the genetic association rules algorithm (GAR) [7], the multi-objective genetic algorithm (MOGAR) [5] and the rough particle swarm optimization algorithm (RPSOA) [24]. The first comparison considers the support measure, followed by confidence, amplitude, size and the number of rules.

Table 8 shows the average support value of each method over the benchmark datasets. The MOPSO algorithm obtains the best support for the Quake dataset, while the best algorithms for the Body Fat and Basketball datasets are MODENAR and MOGAR, respectively. In terms of confidence, shown in Table 9, the proposed MOPSO algorithm achieves better results for two of the three tested datasets. These results also show that three of the compared algorithms, namely MOPSO, MOPAR and MOGAR, have relatively higher confidence. This is because they use confidence as one of their objective functions.

Table 8 Comparison of the support (%) measure in different methods
Table 9 Comparison of the confidence (%) measure in different methods

Furthermore, the amplitude results are presented in Table 10. The proposed MOPSO algorithm obtains the best amplitude value only for the Body Fat dataset. For the Basketball dataset, the best result is given by the MOPAR algorithm, while for the Quake dataset, the best result is given by MODENAR; for both of these datasets, the MOPSO algorithm is the second best. Similar results were obtained for the size measure, presented in Table 11. Additional information about the results is summarized in Tables 12 and 13.

Table 10 Comparison of the amplitude measure in different methods
Table 11 Comparison of the size measure in different methods
Table 12 Comparison of the number of rules in different methods
Table 13 MOPSO summary result

These results indicate that the proposed MOPSO algorithm is a promising ARM algorithm for numerical datasets. It can generate a set of rules with high support and confidence. The generated rules also have smaller amplitude values, showing that they are interesting rules. Furthermore, the attributes involved in the rules are not too numerous.

In addition, the experimental results reveal that PSO-based and DE-based ARM algorithms can give better results than GA-based ARM algorithms. This might be because generating a set of rules requires deep, focused searches. Both the PSO and DE algorithms have a high capability for such narrow exploitation, whereas the GA relies on a mutation operator. This operator is very important for avoiding local optima and performing broad exploration; however, for problems with fewer local optima, or when the optimal solution is located in a narrow region, the mutation operator may become a disadvantage. This would explain why the PSO-based and DE-based algorithms achieve relatively better results here. Furthermore, the objective functions also have a significant effect on the results. For instance, the MOPSO, MOPAR and MOGAR algorithms use confidence, interestingness and comprehensibility as their objective functions, and therefore obtain good results in terms of confidence and support. On the other hand, MODENAR includes amplitude in its objective functions, and its results have low amplitude.

5 Case study

The performance evaluation presented in Sect. 4 reveals that the proposed MOPSO algorithm can outperform several other algorithms. This paper therefore applies the proposed algorithm to a real-world problem. The case study comes from a domestic company that developed a healthy weight loss application (app). The app helps users manage their personal health through a calorie counter, calorie-controlled meals, location information and other functions for maintaining body weight. To identify the behavior of the app's users, association rule mining is applied to determine the relations between the functions within the app. The dataset is the click-log history, consisting of 38 page functions (attributes) that serve as attribute inputs for the MOPSO. In addition to the page attributes, there are several other important attributes.

5.1 Preprocessing

The data preprocessing involves data normalization and attribute selection. Not all attributes are included in the data processing, because some attributes consist largely of zero values. The attribute selection is conducted based on the following steps (a sketch of the filtering follows the list):

Step 1:

Calculate the percentages of zero values for each attribute.

Step 2:

Conduct observations by eliminating attributes that have more than 90%, 80%, 70% and 60% zero values, respectively.

Step 3:

Compare the results and determine the selected attributes.
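As referenced above, the zero-value filtering could be sketched as follows; the function name and the array representation of the click log are assumptions for illustration.

```python
import numpy as np

def select_attributes(data, threshold=0.8):
    """Keep attributes whose fraction of zero values does not exceed `threshold`.

    data: (records, attributes) array of app click-log values.
    """
    zero_ratio = (data == 0).mean(axis=0)  # Step 1: share of zeros per attribute
    keep = zero_ratio <= threshold         # Step 2: eliminate above the threshold
    return data[:, keep], keep

# Step 3: compare the number of surviving attributes across thresholds, e.g.
# for t in (0.9, 0.8, 0.7, 0.6):
#     print(t, select_attributes(data, t)[1].sum())
```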

Table 14 lists the alternatives of the selected attributes. The total number of attributes in solutions 4 and 5 is similar, and the same holds for solutions 3 and 4. However, the difference in the number of remaining attributes between solutions 2 and 3 is significant. In other words, if solution 2 is picked, the selected attributes will contain more attributes with zero values than without, since most zero-value percentages lie between 80 and 90%. Therefore, solution 3 is considered the best choice.

5.2 MOPSO implementation

In applying the MOPSO algorithm to the case study, the parameter combination for \(w\), \(c_{1}\) and \(c_{2}\) follows the parameter settings for the Body Fat dataset: 0.8, 2 and 1, respectively. This combination is chosen because the number of attributes in the case study dataset is similar to that of the Body Fat dataset, and the statistical analysis indicates that these parameters have a significant impact for the Body Fat dataset. The convergence history of the MOPSO algorithm for the case study data is shown in Fig. 6.

Fig. 6 APP fitness convergence history

Table 15 presents the association rule results. The top rules, with confidence equal to 1, reveal that users tend to visit and spend more time on click_calories_map. The close relation between click_calories_map and suggest_a_meal is shown by the highest support value, found in rule #1: users tend to browse the calorie maps provided by restaurants before ordering their meals. These findings are reasonable, since the calorie information in this app strongly supports users' diet or exercise programs. Users also tend to use suggest_a_meal and select_a_restaurant together with exercise_information. Furthermore, users tend to click select_a_restaurant and then exercise_information, as found in rule #4. A close relation between the calorie map, weight details and exercise is found in rules #6, #7 and #8.

Table 14 Alternatives of selected attributes

The following conclusions can be drawn from this result:

(a) Users of this application tend to use only some of the app's functions, as indicated by the many zero records found.

(b) Among the most visited functions, the calorie map provided by restaurants is found to be very useful, helping users select restaurants or plan their exercise (Table 15).

Table 15 List of association rules over weight loss application dataset in duration

6 Conclusions

This paper proposes a multi-objective particle swarm optimization (MOPSO) algorithm for numerical association rule mining. While most ARM algorithms are applicable only to categorical data, the proposed MOPSO algorithm includes a discretization procedure that enables it to process numerical data: intervals for each variable are represented within a particle, so the best interval for each dataset is found automatically. Furthermore, in order to obtain interesting and reliable sets of rules, this paper considers three objectives when generating the rules, namely confidence, interestingness and comprehensibility. The algorithm thus removes the need for decision-makers to determine the minimum support and confidence in advance.

In order to evaluate the proposed MOPSO algorithm, three benchmark datasets were used. The results were compared with those of five other ARM algorithms, namely MOPAR, MODENAR, GAR, MOGAR and RPSOA. The computational results showed that the proposed MOPSO algorithm gives better results in terms of confidence and amplitude, while generating fewer rules. The results also reveal that PSO-based and DE-based ARM algorithms are more promising than GA-based ARM algorithms, because ARM requires intensive exploitation to obtain a better set of rules. In addition, the objective functions used by an algorithm influence the rules it produces: by choosing confidence, interestingness and comprehensibility as the objective functions, the generated rules tend to have higher support and confidence. This study also applied the proposed MOPSO algorithm to a real-world problem, a case study revealing important information about user behavior in a weight loss application. Based on the results obtained, further study should consider more objectives, such as amplitude, in order to obtain better rules. Hybrid metaheuristics should also be evaluated to generate better rules.