An evolutionary algorithm to discover quantitative association rules in multidimensional time series

Martínez-Ballesteros, M.; Martínez-Álvarez, F.; Troncoso, A.; Riquelme, J. C.

doi:10.1007/s00500-011-0705-4

Published: 22 March 2011

Volume 15, pages 2065–2084, (2011)

Soft Computing

Abstract

An evolutionary approach for finding existing relationships among several variables of a multidimensional time series is presented in this work. The proposed model to discover these relationships is based on quantitative association rules. This algorithm, called QARGA (Quantitative Association Rules by Genetic Algorithm), uses a particular codification of the individuals that addresses two basic problems. First, it does not require a previous discretization of the attributes and, second, it is not necessary to specify which variables belong to the antecedent or the consequent. Therefore, it may discover all underlying dependencies among different variables. Three experiments have been carried out to evaluate the proposed algorithm. As an initial step, several public datasets have been analyzed in order to compare QARGA with other existing evolutionary approaches. Next, the algorithm has been applied to synthetic time series (where the relationships are known) to analyze its potential for discovering rules in time series. Finally, a real-world multidimensional time series composed of several climatological variables has been considered. All the results show a remarkable performance of QARGA.

1 Introduction

Natural phenomena are often correlated with other variables. Thus, real-world processes can be modeled by inferring knowledge from associated variables that actually affect the original process. For instance, acid rain cannot be understood without considering other pollutant agents, such as carbon monoxide or sulfur dioxide. In other words, knowing how some variables affect others may be useful for obtaining accurate behavior models.

Quantitative association rule (QAR) extraction in time series can be extremely useful for predictive purposes (Shidara et al. 2008; Wang et al. 2008). In particular, it can be of interest to find relationships among several time series in order to determine the range of values of a particular time series in a given time interval depending on the values of the others in the same interval. For instance, rules such as $hour \in [10, 12] \wedge demand \in [12{,}000, 15{,}000] \Rightarrow price \in [3.2, 4.5]$ can provide useful knowledge for forecasting the electric energy price at peak hours (from 10 am to noon) depending on the energy demand during these hours. This information could help to obtain different models adjusted to different intervals, or to develop a family of models, one for each rule. Hence, QAR are introduced here into a time series framework as a means of obtaining relationships among correlated time series that help to model their behavior.

Evolutionary algorithms (EA) have been extensively used for optimization and model adjustment in data mining tasks. In fact, the use of metaheuristics in general, and of EA in particular, to deal with data mining problems is an active research topic (Alcalá-Fdez et al. 2009a, 2010; Chen et al. 2010; del Jesús et al. 2009; Yan et al. 2009). EA have also been used to build rule-based systems (Aguilar-Ruiz et al. 2007; Berlanga et al. 2010; Orriols-Puig and Bernadó-Mansilla 2009).

Real-coded genetic algorithms (RCGA) are very important within EA due to the increasing interest in solving real-world optimization problems. The main problem of RCGA, on which many researchers have focused their work, is the definition of adequate genetic operators (Herrera et al. 2004; Kalyanmoy et al. 2002). In particular, a new RCGA, henceforth called QARGA (Quantitative Association Rules by Genetic Algorithm), is proposed in this work. It is worth noting that QARGA does not perform a previous discretization of the variables; that is, it handles numeric data during the whole rule extraction process, in contrast with many other approaches that discretize the data to discover rules (Agrawal et al. 1993; Aumann and Lindell 2003; Vannucci and Colla 2004). Furthermore, the approach allows several degrees of freedom in specifying the user’s preferences regarding both the number of attributes and the structure of the rules. In addition to the well-known support and confidence measures, the rules are also evaluated with a measure called lift, due to its usefulness in the specific area of time series analysis (Ramaswamy et al. 1998).

First, QARGA has been applied to datasets from the Bilkent University Function Approximation (BUFA) repository (Guvenir and Uysal 2000). These datasets have been chosen because several EA in the literature have been applied to them (Alatas and Akin 2006; Alatas et al. 2008; Mata et al. 2002). Next, synthetic time series have been generated to determine the suitability of applying QARGA to temporal data. Finally, QAR have been extracted from multidimensional real-world time series. In particular, climatological time series have been analyzed to discover the factors that cause high ozone concentration levels in the atmosphere.

The remainder of the paper is organized as follows: Sect. 2 provides a formal description of QAR and introduces the quality indices applied in QARGA. Section 3 presents the most relevant related work found in the literature. Section 4 describes the main features of QARGA. The results of applying the proposed algorithm to different datasets are reported and discussed in Sect. 5. Finally, Sect. 6 summarizes the conclusions.

2 Preliminaries

This section formally describes QAR and introduces the quality measures used in this paper.

2.1 Quantitative association rules

Association rules (AR) were first defined by Agrawal et al. (1993) as follows. Let $I=\{i_1, i_2,\ldots, i_n\}$ be a set of $n$ items, and $D=\{tr_1, tr_2,\ldots, tr_N\}$ a set of $N$ transactions, where each $tr_j$ contains a subset of items. Thus, a rule can be defined as $X \Rightarrow Y,$ where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. Finally, $X$ and $Y$ are called antecedent (or left side of the rule) and consequent (or right side of the rule), respectively.
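The set-based definition above maps directly onto Python sets. The following minimal sketch is illustrative only: it assumes items are plain strings and transactions are `frozenset`s, and the helper names `is_valid_rule` and `count_containing` are hypothetical, not part of the paper.

```python
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]


def is_valid_rule(x: Itemset, y: Itemset, items: Set[str]) -> bool:
    """X => Y is well formed if X and Y are subsets of I and X ∩ Y = ∅."""
    return x <= items and y <= items and not (x & y)


def count_containing(itemset: Itemset, transactions: List[Itemset]) -> int:
    """Number of transactions tr_j such that itemset ⊆ tr_j."""
    return sum(1 for tr in transactions if itemset <= tr)


# Toy data for illustration only (not taken from the paper).
I = {"bread", "milk", "butter"}
D = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
]
print(is_valid_rule(frozenset({"bread"}), frozenset({"milk"}), I))   # True
print(is_valid_rule(frozenset({"bread"}), frozenset({"bread"}), I))  # False, X ∩ Y is not empty
print(count_containing(frozenset({"bread", "milk"}), D))             # 2
```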

When the domain is continuous, association rules are known as QAR. In this context, let $F=\{F_1,\ldots,F_n\}$ be a set of features, with values in ${\mathbb{R}}$. Let $A$ and $C$ be two disjoint subsets of $F$, that is, $A \subset F, C \subset F,$ and $A \cap C = \emptyset$. A QAR is a rule $X \Rightarrow Y,$ in which features in $A$ belong to the antecedent $X,$ and features in $C$ belong to the consequent $Y,$ such that

$$ X = \bigwedge_{F_i \in A} F_i \in [l_i, u_i] $$

(1)

$$ Y = \bigwedge_{F_j \in C} F_j \in [l_j, u_j] $$

(2)

where $l_i$ and $l_j$ represent the lower limits of the intervals for $F_i$ and $F_j,$ respectively, and $u_i$ and $u_j$ the upper ones. For instance, a QAR could be numerically expressed as

$$ F_1 \in [12,25] \wedge F_3 \in [5,9] \Rightarrow F_2 \in [3, 7] \wedge F_5 \in [2,8]$$

(3)

where $F_1$ and $F_3$ constitute the features appearing in the antecedent and $F_2$ and $F_5$ the ones in the consequent.
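To make Eqs. (1)–(3) concrete, the sketch below encodes each side of a QAR as a mapping from feature name to a closed interval $[l, u]$ and checks whether a single record satisfies it. This is a hypothetical representation chosen for illustration (records as `dict`s of floats, helper names `satisfies` and `is_valid_qar`); it is not the individual codification used by QARGA.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]    # closed interval [l, u]
Conditions = Dict[str, Interval]  # feature name -> interval


def satisfies(record: Dict[str, float], conditions: Conditions) -> bool:
    """Conjunction of Eqs. (1)/(2): every listed feature lies in its interval."""
    return all(lo <= record[f] <= hi for f, (lo, hi) in conditions.items())


def is_valid_qar(antecedent: Conditions, consequent: Conditions) -> bool:
    """Antecedent features A and consequent features C must be disjoint."""
    return not (antecedent.keys() & consequent.keys())


# Rule (3): F1 in [12, 25] and F3 in [5, 9]  =>  F2 in [3, 7] and F5 in [2, 8]
antecedent = {"F1": (12.0, 25.0), "F3": (5.0, 9.0)}
consequent = {"F2": (3.0, 7.0), "F5": (2.0, 8.0)}

# A made-up record, used only to exercise the functions.
record = {"F1": 20.0, "F2": 4.0, "F3": 6.0, "F4": 100.0, "F5": 3.0}

print(is_valid_qar(antecedent, consequent))  # True
print(satisfies(record, antecedent))         # True
print(satisfies(record, consequent))         # True
```

Features not mentioned in a rule (here `F4`) are simply ignored, which mirrors the fact that a QAR only constrains the features in $A \cup C$.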

2.2 Quality parameters

This section describes the support, confidence and lift indices (Brin et al. 1997), which are used to measure the interestingness of rules, as well as a new index, called $recovered,$ intended to ensure that the full search space is explored.

The support of an itemset $X$ is defined as the ratio of transactions in the dataset that contain $X$. Formally:

$$ sup(X) = {\frac{\#X}{N}}=P(X) $$

(4)

where $\#X$ is the number of times that $X$ appears in the dataset and $N$ is the number of transactions in the dataset. Some authors refer to the support of $X$ simply as the probability of $X$, $P(X)$.
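Eq. (4) can be computed by counting the records that satisfy every interval of $X$ and dividing by $N$. A minimal sketch, assuming a hypothetical representation in which records are `dict`s of floats and $X$ is a mapping from feature name to a closed interval (`satisfies` and `support` are illustrative names, not the paper's):

```python
from typing import Dict, List, Tuple

Conditions = Dict[str, Tuple[float, float]]  # feature -> closed interval [l, u]
Record = Dict[str, float]


def satisfies(record: Record, conditions: Conditions) -> bool:
    """True if every feature in `conditions` lies inside its interval."""
    return all(lo <= record[f] <= hi for f, (lo, hi) in conditions.items())


def support(conditions: Conditions, dataset: List[Record]) -> float:
    """sup(X) = #X / N, Eq. (4)."""
    if not dataset:
        raise ValueError("support is undefined for an empty dataset")
    count = sum(1 for r in dataset if satisfies(r, conditions))  # #X
    return count / len(dataset)                                  # divide by N


# Toy dataset (hypothetical values, not Table 1).
data = [
    {"F1": 15.0, "F2": 4.0},
    {"F1": 30.0, "F2": 5.0},
    {"F1": 20.0, "F2": 10.0},
    {"F1": 12.0, "F2": 3.0},
]
print(support({"F1": (12.0, 25.0)}, data))  # 0.75
```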

Let $X$ and $Y$ be the itemsets that identify the antecedent and consequent of a rule, respectively. The confidence of a rule is expressed as follows:

$$ conf(X \Longrightarrow Y) = {\frac{sup(X \Longrightarrow Y)} {sup(X)}} $$

(5)

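where $sup(X \Longrightarrow Y)$ denotes the support of $X$ and $Y$ occurring together, that is, the fraction of transactions that contain both $X$ and $Y$. Confidence can be interpreted as the probability that transactions containing $X$ also contain $Y$. In other words, it measures how reliable the rule under analysis is.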
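Because the antecedent and consequent features are disjoint, the joint condition "$X$ and $Y$" can be represented by merging both interval maps, which gives a direct way to evaluate Eq. (5). The sketch below is illustrative and assumes the same hypothetical dict-based representation of records and interval conditions; it raises an error when $sup(X) = 0$, where confidence is undefined.

```python
from typing import Dict, List, Tuple

Conditions = Dict[str, Tuple[float, float]]  # feature -> closed interval [l, u]
Record = Dict[str, float]


def support(conditions: Conditions, dataset: List[Record]) -> float:
    """Fraction of records satisfying every interval in `conditions`."""
    if not dataset:
        raise ValueError("support is undefined for an empty dataset")
    hits = sum(
        1 for r in dataset
        if all(lo <= r[f] <= hi for f, (lo, hi) in conditions.items())
    )
    return hits / len(dataset)


def confidence(x: Conditions, y: Conditions, dataset: List[Record]) -> float:
    """conf(X => Y) = sup(X => Y) / sup(X), Eq. (5)."""
    sup_x = support(x, dataset)
    if sup_x == 0:
        raise ValueError("confidence is undefined when sup(X) = 0")
    sup_xy = support({**x, **y}, dataset)  # records satisfying X and Y together
    return sup_xy / sup_x


data = [  # toy values for illustration only
    {"F1": 15.0, "F2": 4.0},
    {"F1": 30.0, "F2": 5.0},
    {"F1": 20.0, "F2": 10.0},
    {"F1": 12.0, "F2": 3.0},
]
print(confidence({"F1": (12.0, 25.0)}, {"F2": (3.0, 7.0)}, data))  # about 0.667 (2/3)
```

Merging with `{**x, **y}` is only a faithful conjunction because $A \cap C = \emptyset$; if the two maps shared a feature, the consequent interval would silently overwrite the antecedent one.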

Finally, the interest or lift of a rule is defined as

$$ lift(X \Longrightarrow Y) = {\frac{sup(X \Longrightarrow Y)} {sup(X)sup(Y)}} $$

(6)

Lift measures how many times more often $X$ and $Y$ occur together in the dataset than would be expected if their occurrences in transactions were statistically independent. Lift values greater than one are desirable because they indicate a positive statistical dependence between the occurrences of $X$ and $Y$; the rule therefore provides valuable information about $X$ and $Y$.
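Eq. (6) compares the observed joint support with the product of the marginal supports. A minimal sketch under the same hypothetical representation (records as `dict`s, conditions as closed intervals), using a toy dataset that is not taken from the paper:

```python
from typing import Dict, List, Tuple

Conditions = Dict[str, Tuple[float, float]]  # feature -> closed interval [l, u]
Record = Dict[str, float]


def support(conditions: Conditions, dataset: List[Record]) -> float:
    """Fraction of records satisfying every interval in `conditions`."""
    if not dataset:
        raise ValueError("support is undefined for an empty dataset")
    hits = sum(
        1 for r in dataset
        if all(lo <= r[f] <= hi for f, (lo, hi) in conditions.items())
    )
    return hits / len(dataset)


def lift(x: Conditions, y: Conditions, dataset: List[Record]) -> float:
    """lift(X => Y) = sup(X => Y) / (sup(X) * sup(Y)), Eq. (6)."""
    sup_x, sup_y = support(x, dataset), support(y, dataset)
    if sup_x == 0 or sup_y == 0:
        raise ValueError("lift is undefined when sup(X) or sup(Y) is 0")
    return support({**x, **y}, dataset) / (sup_x * sup_y)


# Toy data where X and Y always occur together.
data = [
    {"F1": 15.0, "F2": 4.0},   # X and Y
    {"F1": 30.0, "F2": 10.0},  # neither
    {"F1": 20.0, "F2": 5.0},   # X and Y
    {"F1": 40.0, "F2": 12.0},  # neither
]
print(lift({"F1": (12.0, 25.0)}, {"F2": (3.0, 7.0)}, data))  # 2.0
```

Here $sup(X) = sup(Y) = 0.5$ and $sup(X \Longrightarrow Y) = 0.5$, so $X$ and $Y$ co-occur twice as often as independence would predict.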

For a better understanding of these indices, a dataset comprising ten transactions and three features is shown in Table 1. Consider also the example rule

$$ F_1 \in [180,189] \wedge F_2 \in [85,95] \Rightarrow F_3 \in [33, 36] $$

(7)

Table 1 Illustrative dataset
