Abstract
Technological change has occurred at an exponential rate in recent years, leading to the generation of large amounts of data in various sectors. Many databases and data warehouses have been built to store and manage these data, and the data relevant to a given task must be extracted from them. Earlier mining approaches use the same minimum support value for all items. In this paper, we propose a fuzzy data mining algorithm that generates fuzzy association rules from time series data with different minimum support values. A temperature dataset is used to generate the fuzzy rules, and the proposed algorithm also predicts the variation of temperature. Experimental results are also presented.
1 Introduction
Data mining plays a vital role in today’s applications, so researchers are paying increasing attention to new techniques that can be developed within it. It is applied across a wide range of domains, such as business, medicine, and biometrics. Fuzzy concepts have had a great impact on data mining methodology. Data warehouses are used to store and manage data efficiently across different domains. A time series is a collection of data points, each of which records a specific value at a particular instant, so the values vary over time. This paper proposes an algorithm that induces fuzzy association rules with multiple minimum support values. Many earlier algorithms use a single minimum support value for all items, although different itemsets may call for different minimum supports. To illustrate this, we apply fuzzy concepts to time series data, specifically temperature data. Time series data belongs to the category of sequence data, which contains trends or patterns, so the algorithm can predict near-future temperatures using trend analysis. The proposed algorithm has two advantages. First, the results are easier to understand, because fuzzy theory is close to natural language. Second, it helps to detect sudden changes in the temperature of a place.
The remainder of this paper is organized as follows. Fuzzy set theory is reviewed in Sect. 2. Related work is discussed in Sect. 3. The proposed algorithm is described in Sect. 4, and an example of its operation is given in Sect. 5. Experimental results are presented in Sect. 6. Finally, the conclusion and future work are discussed in Sect. 7.
2 Fuzzy Set Theory
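Fuzzy set theory was introduced by Zadeh in 1965 in his seminal paper entitled “Fuzzy sets”, which observed that imprecisely defined classes play an important role in human thinking, particularly in the domains of pattern recognition, communication of information, and abstraction. A fuzzy set is described by a membership function, which generalizes the characteristic function of a crisp set. For a given crisp set B, the characteristic function assigns a value \(\mu _{\mathrm{B}}(x)\) to every \(x \in X\) such that
$$\begin{aligned} \mu _{\mathrm{B}}(x)=\begin{cases} 1, & x\in \text {B},\\ 0, & x\notin \text {B}. \end{cases} \end{aligned}$$
A fuzzy set generalizes this function so that \(\mu _{\mathrm{B}}(x)\) may take any value in [0, 1], expressing the degree to which x belongs to B (its grade of membership).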
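Assume that \(\text {x}_{1}\) to \(\text {x}_{\text {k}}\) are the elements in fuzzy set B, and \(\mu _{1}\) to \(\mu _{\text {k}}\) are, respectively, their grades of membership in B. B is usually represented as follows, where “+” denotes union rather than arithmetic addition:
$$\begin{aligned} \text {B}=\mu _{1}/\text {x}_{1}+\mu _{2}/\text {x}_{2}+\cdots +\mu _{\text {k}}/\text {x}_{\text {k}}. \end{aligned}$$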
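The notation above can be made concrete with a short Python sketch. It contrasts a crisp characteristic function with a fuzzy set written in the μ1/x1 + … + μk/xk form. The triangular membership function, its breakpoints (10, 20, 30), and the helper names `crisp_membership`, `triangular`, and `to_sum_notation` are illustrative assumptions and are not taken from the paper.

```python
def crisp_membership(x, crisp_set):
    """Characteristic function of a crisp set: 1 if x is in the set, else 0."""
    return 1 if x in crisp_set else 0


def triangular(x, a, b, c):
    """Hypothetical triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def to_sum_notation(fuzzy_set):
    """Render a fuzzy set {x: mu} as mu1/x1 + ... + muk/xk."""
    return " + ".join(f"{mu:g}/{x}" for x, mu in fuzzy_set.items())


# Membership grades of a few sample values in an assumed "Middle" fuzzy set.
middle = {x: triangular(x, 10, 20, 30) for x in (10, 15, 20, 25)}

print(crisp_membership(15, {10, 15, 20}))  # 1
print(to_sum_notation(middle))             # 0/10 + 0.5/15 + 1/20 + 0.5/25
```

Unlike the crisp function, which returns only 0 or 1, the fuzzy set assigns intermediate grades such as 0.5 to values that partly belong to the set.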
3 Related Work
Data mining is frequently used to induce association rules from large itemsets. An association rule describes how the presence or absence of items in a transaction relates to the presence of other items. Each rule is evaluated by two measures: support and confidence.
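The two crisp measures can be illustrated with a short Python sketch. Support is the fraction of transactions that contain every item of an itemset. The confidence of a rule X → Y is the support of X ∪ Y divided by the support of X. The transactions and the function names `support` and `confidence` are hypothetical and serve only as an illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent | consequent divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # about 0.667
```

The fuzzy algorithm in Sect. 4 keeps these two measures but replaces the 0/1 containment test with fuzzy membership values.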
Hong proposed an algorithm that induces association rules with multiple minimum supports using maximum constraints on general items. Au and Chan proposed a fuzzy mining approach to find fuzzy rules in time series data. Das proposed a mining algorithm for time series prediction: he used clustering to extract basic shapes from a time series and then applied the Apriori method to induce association rules over these shapes.
4 The Proposed Algorithm with Multiple Minimum Support
Input: A time series TS with n data points, a list of m membership functions for the data points, a predefined minimum support threshold \(\text {ms}_{\text {i}}\) for each fuzzy item, i \(=\) 1 to z, where z is the number of fuzzy items, a predefined minimum confidence threshold \(\lambda \), and a sliding window size ws.
-
Step 1:
Convert the time series TS into a list of subsequences W(TS) according to the sliding-window size ws. That is, \(\text {W}(\text {TS})=\{\text {s}_{\text {b}} \mid \text {s}_{\text {b}}=(\text {d}_{\text {b}}, \text {d}_{\text {b}+1}, \ldots , \text {d}_{\text {b}+\text {ws}-1}),\ \text {b}=1 \text { to } \text {n}-\text {ws}+1\}\), where \(\text {d}_{\text {b}}\) is the value of the b-th data point in TS.
-
Step 2:
Transform the k-th (k \(=\) 1 to ws) quantitative value \(\text {v}_{\text {bk}}\) in each subsequence \(\text {s}_{\text {b}}\) (b \(=\) 1 to n \(-\) ws \(+\) 1) into a fuzzy set \(\text {f}_{\text {bk}}\) represented as \(\left( \text {f}_{\text {bk1}}/\text {R}_{\text {k1}}+\text {f}_{\text {bk2}}/\text {R}_{\text {k2}}+\cdots +\text {f}_{\text {bkm}}/\text {R}_{\text {km}}\right) \) using the given membership functions. Here \(\text {R}_{\text {kl}}\) is the l-th fuzzy region of the k-th data point in each subsequence, m is the number of fuzzy regions (membership functions), and \(\text {f}_{\text {bkl}}\) is the fuzzy membership value of \(\text {v}_{\text {bk}}\) in region \(\text {R}_{\text {kl}}\). Each \(\text {R}_{\text {kl}}\) is called a fuzzy item.
-
Step 3:
Compute the scalar cardinality of each fuzzy item \(\text {R}_{\text {kl}}\) as
$$\begin{aligned} \text {Count}_{\text {kl}} =\sum _{b=1}^{n-ws+1} {f_{\text {bkl}} } \end{aligned}$$
-
Step 4:
Check whether the support value (\(=\text {count}_{\text {kl}}/(\text {n}-\text {ws}+1)\)) of each \(\text {R}_{\text {kl}}\) in \(\text {C}_{1}\) (the candidate 1-itemsets, i.e., all fuzzy items) is greater than or equal to its predefined minimum support threshold \(\text {ms}_{\text {R}_{\text {kl}}}\). If \(\text {R}_{\text {kl}}\) satisfies this condition, put it in the set of large 1-itemsets \(\text {L}_{1}\). That is:
$$\begin{aligned} \text {L}_{1}=\left\{ \text {R}_{\text {kl}} \;\middle|\; \frac{\text {count}_{\text {kl}}}{\text {n}-\text {ws}+1}\ge \text {ms}_{\text {R}_{\text {kl}}},\; 1\le \text {k}\le \text {ws}\;\text {and}\;1\le l\le \text {m}\right\} . \end{aligned}$$
-
Step 5:
If \(\text {L}_{1}\) is not null, go to the next step; otherwise, terminate the algorithm.
-
Step 6:
Set t \(=\) 1, where t is used to represent the number of fuzzy items in the current itemsets to be processed.
-
Step 7:
Join the large t-itemsets \(\text {L}_{\text {t}}\) to obtain the candidate (t \(+\) 1)-itemsets \(\text {C}_{\text {t}+1}\) in the same way as in the Apriori algorithm, subject to two conditions. First, two fuzzy items obtained from the same position of data points in the subsequences (that is, with the same k) cannot appear together in an itemset in \(\text {C}_{\text {t}+1}\). Second, the support of every large t-itemset being joined must be greater than or equal to the maximum of the minimum supports of the fuzzy items in these large t-itemsets.
-
Step 8:
Perform the following substeps for each candidate (t \(+\) 1)-itemset I in \(\text {C}_{\text {t}+1}\):
-
(a)
Compute the fuzzy value of I in each subsequence \(\text {s}_{\text {b}}\) as \(\text {f}_{\text {I}}^{\text {s}_{\text {b}}}=\text {f}_{\text {I}_{1}}^{\text {s}_{\text {b}}}\wedge \text {f}_{\text {I}_{2}}^{\text {s}_{\text {b}}}\wedge \cdots \wedge \text {f}_{\text {I}_{\text {t}+1}}^{\text {s}_{\text {b}}}\), where \(\text {f}_{\text {I}_{\text {k}}}^{\text {s}_{\text {b}}}\) is the membership value of fuzzy item \(\text {I}_{\text {k}}\) in \(\text {s}_{\text {b}}\). If the minimum operator is used for the intersection, then:
$$\begin{aligned} \text {f}_{\text {I}}^{\text {s}_{\text {b}}}=\min _{k=1}^{t+1} \text {f}_{\text {I}_{\text {k}}}^{\text {s}_{\text {b}}}. \end{aligned}$$
-
(b)
Compute the count of I in all the subsequences as:
$$\begin{aligned} \text {Count}_{\text {I}} =\sum _{b=1}^{n-ws+1} \text {f}_{\text {I}}^{\text {s}_{\text {b}}} \end{aligned}$$
-
Step 9:
If the support (\(=\text {count}_{\text {I}}/(\text {n}-\text {ws}+1)\)) of I is greater than or equal to the maximum of the minimum supports of the fuzzy items in I, put I in \(\text {L}_{\text {t}+1}\):
$$\begin{aligned} \text {L}_{\text {t}+1}=\left\{ \text {I}\in \text {C}_{\text {t}+1} \;\middle|\; \frac{\text {count}_{\text {I}}}{\text {n}-\text {ws}+1}\ge \max _{k=1}^{t+1} \text {ms}_{\text {I}_{\text {k}}}\right\} . \end{aligned}$$
-
Step 10:
If \(\text {L}_{\text {t}+1}\) is null, go to the next step; otherwise, set t \(=\) t \(+\) 1 and repeat Steps 7–10.
-
Step 11:
Generate the association rules for each large h-itemset I with items (\(\text {I}_{1}, \text {I}_{2},\ldots , \text {I}_{\text {h}})\), \(\text {h}\ge \)2, using the following substeps:
-
(a)
Form each possible association rule as follows: \(\text {I}_{1}\wedge \cdots \wedge \text {I}_{\text {j}-1}\wedge \text {I}_{\text {j}+1}\wedge \cdots \wedge \text {I}_{\text {h}}\rightarrow \text {I}_{\text {j}}\), j \(=\) 1 to h.
-
(b)
Calculate the confidence value of each association rule by the following formula:
$$\begin{aligned} \text {conf}\left( \text {I}_{1}\wedge \cdots \wedge \text {I}_{\text {j}-1}\wedge \text {I}_{\text {j}+1}\wedge \cdots \wedge \text {I}_{\text {h}}\rightarrow \text {I}_{\text {j}}\right) =\frac{\sum _{b=1}^{n-ws+1} \text {f}_{\text {I}}^{\text {s}_{\text {b}}}}{\sum _{b=1}^{n-ws+1} \left( \text {f}_{\text {I}_{1}}^{\text {s}_{\text {b}}}\wedge \cdots \wedge \text {f}_{\text {I}_{\text {j}-1}}^{\text {s}_{\text {b}}}\wedge \text {f}_{\text {I}_{\text {j}+1}}^{\text {s}_{\text {b}}}\wedge \cdots \wedge \text {f}_{\text {I}_{\text {h}}}^{\text {s}_{\text {b}}}\right) } \end{aligned}$$
-
Output: A set of fuzzy association rules that satisfy the maximum-of-minimum-supports condition and whose confidence values are greater than or equal to \(\lambda \).
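The following Python sketch strings Steps 1–11 together on a toy series. It is not the authors' C implementation. It rests on several assumptions:

- Three regions (Low, Middle, High) with made-up breakpoints stand in for the membership functions of Fig. 1.
- Fuzzy items are encoded as (k, region) pairs.
- `ms` is a dictionary of per-item minimum supports.
- The Step 7 condition is read as requiring every t-subset of a candidate to be large, with support at least the maximum of the minimum supports of the candidate's items.

All function names, the series, and the thresholds are hypothetical.

```python
from itertools import combinations


# Hypothetical membership functions (breakpoints assumed, not from Fig. 1).
def low(v):
    return 1.0 if v <= 10 else max(0.0, (20 - v) / 10)


def middle(v):
    return max(0.0, 1 - abs(v - 20) / 10)


def high(v):
    return 1.0 if v >= 30 else max(0.0, (v - 20) / 10)


REGIONS = {"Low": low, "Middle": middle, "High": high}


def sliding_windows(ts, ws):
    """Step 1: the n - ws + 1 subsequences of length ws."""
    return [ts[b:b + ws] for b in range(len(ts) - ws + 1)]


def fuzzify(sub):
    """Step 2: fuzzy item (k, region) -> membership of the k-th value."""
    return {(k, r): f(v) for k, v in enumerate(sub, start=1)
            for r, f in REGIONS.items()}


def count(itemset, fuzzy_subs):
    """Steps 3 and 8: sum over subsequences of the minimum membership."""
    return sum(min(fs[i] for i in itemset) for fs in fuzzy_subs)


def mine(ts, ws, ms, lam):
    """Return (large itemsets, rules); ms maps (k, region) -> minimum support."""
    subs = [fuzzify(s) for s in sliding_windows(ts, ws)]
    if not subs:
        return set(), []

    def sup(itemset):
        return count(itemset, subs) / len(subs)

    def max_ms(itemset):
        return max(ms[i] for i in itemset)

    # Step 4: each fuzzy item is checked against its own threshold.
    level = {frozenset([i]) for i in subs[0] if sup([i]) >= ms[i]}
    large, t = set(level), 1
    while level:  # Steps 5 and 7-10
        cands = set()
        for a, b in combinations(level, 2):
            c = a | b
            # Step 7: t+1 items, no two taken from the same position k.
            if len(c) != t + 1 or len({k for k, _ in c}) != t + 1:
                continue
            # Every t-subset must be large and meet the maximum constraint.
            if all(c - {i} in level and sup(c - {i}) >= max_ms(c) for i in c):
                cands.add(c)
        # Step 9: keep candidates whose support reaches the max of their ms.
        level = {c for c in cands if sup(c) >= max_ms(c)}
        large |= level
        t += 1

    rules = []  # Step 11: each item of a large itemset in turn as consequent.
    for itemset in large:
        if len(itemset) < 2:
            continue
        for j in itemset:
            antecedent = itemset - {j}
            denom = count(antecedent, subs)
            conf = count(itemset, subs) / denom if denom else 0.0
            if conf >= lam:
                rules.append((antecedent, j, conf))
    return large, rules


if __name__ == "__main__":
    ws = 2
    thr = {"Low": 0.2, "Middle": 0.2, "High": 0.3}  # hypothetical thresholds
    ms = {(k, r): thr[r] for k in range(1, ws + 1) for r in REGIONS}
    large, rules = mine([5, 25, 5, 25, 5, 25, 5], ws, ms, lam=0.8)
    for ante, cons, conf in rules:
        print(sorted(ante), "->", cons, conf)
```

With this toy series and these thresholds, the two rules kept at λ = 0.8 are (2, Middle) → (1, Low) and (1, Middle) → (2, Low), each with confidence 1.0. The print order may vary because itemsets are stored in Python sets.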
5 An Example
This section illustrates the proposed algorithm on a sample time series and generates fuzzy association rules (Table 1).
The membership functions assumed in the example are shown in Fig. 1 (Table 2).
-
Step 1:
The window size is set to 5. With n \(=\) 15 data points, the formula n \(-\) ws \(+\) 1 gives (15\(-\)5\(+\)1) \(=\) 11 subsequences.
-
Step 2:
The data values are then converted into fuzzy items using the membership functions assumed for this example.
-
Step 3:
The count of each fuzzy item is the sum of its membership values over all subsequences. For example, the count of fuzzy item \(\text {Q}_{1}\).Middle is (\(0+0.33+1+0+0.33+1+0+0.2+0+1+0\)) \(=\) 3.86.
-
Step 4:
Compare the count of each fuzzy item with its own predefined minimum support count. Fuzzy items whose count is greater than or equal to their minimum support are put in \(\text {L}_{1}\) (Table 3).
-
Step 5:
If \(\text {L}_{1}\) contains at least one fuzzy item, proceed to Step 6; otherwise, terminate.
-
Step 6:
Candidate set \(\text {C}_{\text {t}+1}\) is generated from \(\text {L}_{\text {t}}\). The fuzzy items in \(\text {L}_{1}\) are Q1.Low, Q1.Middle, Q2.Low, Q2.Middle, Q3.Low, Q3.Middle, Q4.Low, Q5.Low, and Q5.High.
-
Step 7:
\(\text {L}_{1}\) is joined to generate \(\text {C}_{2}\). The new fuzzy itemsets in \(\text {C}_{2}\) are as follows: (Q1.Low, Q2.Mid), (Q1.Low, Q3.Mid), (Q1.Low, Q5.High), (Q1.Low, Q2.Mid), (Q1.Low, Q3.Mid), (Q1.Low, Q5.Low), (Q2.Low, Q3.Low), (Q2.Low, Q4.Low), (Q2.Low, Q5.High), (Q2.Low, Q1.Mid), (Q2.Low, Q3.Mid), (Q2.Low, Q5.Low), (Q3.Low, Q4.Low), (Q3.Low, Q5.High), (Q3.Low, Q1.Mid), (Q3.Low, Q2.Mid), (Q3.Low, Q5.Low), (Q4.Low, Q1.Mid), (Q4.Low, Q2.Mid), (Q4.Low, Q5.High), (Q4.Low, Q5.Low), (Q5.High, Q1.Mid), (Q5.High, Q2.Mid), (Q5.High, Q3.High).
-
Step 8:
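Compute the count of every itemset in \(\text {C}_{2}\) by taking, in each subsequence, the minimum of the membership values of its two fuzzy items and summing over all subsequences.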
-
Step 9:
Compare the count of each itemset in \(\text {C}_{2}\) with the maximum of the minimum support counts of its two fuzzy items. Itemsets whose count is greater than or equal to this maximum are stored in \(\text {L}_{2}\).
-
Step 10:
Since \(\text {L}_{2}\) is not null, Steps 6–9 are repeated until \(\text {L}_{\text {t}}\) is null (Tables 4, 5, 6).
-
Step 11:
(a) In this example, only the itemset (Q3.Low, Q2.Mid) remains, so the association rules formed are: If Q3 \(=\) Low then Q2 \(=\) Mid, and If Q2 \(=\) Mid then Q3 \(=\) Low. (b) The confidence of the rule If Q2 \(=\) Mid then Q3 \(=\) Low is 3.34/3.34 \(=\) 1. That is, if the value of a data point is Mid at time 2, then the value of the data point at time 3 is Low with a confidence of 1.
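The arithmetic of this example can be checked with a few lines of Python. The membership values and the two counts of 3.34 are the figures reported above. The support of Q1.Middle is derived from the Step 4 definition of Sect. 4, support = count/(n − ws + 1), which the example itself does not compute.

```python
# Figures reported in the example; only the arithmetic is recomputed here.
n, ws = 15, 5
num_subsequences = n - ws + 1  # Step 1

# Step 3: membership values of fuzzy item Q1.Middle in the 11 subsequences.
q1_middle = [0, 0.33, 1, 0, 0.33, 1, 0, 0.2, 0, 1, 0]
q1_middle_count = sum(q1_middle)
q1_middle_support = q1_middle_count / num_subsequences  # Step 4 definition

# Step 11(b): count(Q3.Low and Q2.Mid) / count(Q2.Mid), both reported as 3.34.
confidence = 3.34 / 3.34

print(num_subsequences)             # 11
print(round(q1_middle_count, 2))    # 3.86
print(round(q1_middle_support, 3))  # 0.351
print(confidence)                   # 1.0
```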
6 Experimental Results
The proposed algorithm was implemented in the C programming language. The dataset consists of temperature readings from 2008 to 2012, taken from the National Data Center (NDC), US (Figs. 2 and 3).
As Fig. 4 shows, when the minimum support values of the fuzzy itemsets increase, the number of fuzzy association rules decreases, because fewer itemsets satisfy the higher thresholds. The induced rules read naturally; for example, a rule may state that if the temperature on the second day of a month is moderate, then it may be high on the third day of the month.
7 Conclusion and Future Work
In this paper, the proposed algorithm induces fuzzy association rules efficiently by assigning each fuzzy item its own predefined minimum support. The temperature prediction is expected to be more accurate than that of earlier approaches. The algorithm also offers another viewpoint on defining the minimum supports of fuzzy items. In this paper, the membership functions are known in advance. In future work, they could be set dynamically, and more complex operations could be explored.
References
Agrawal, R.: Mining association rules between sets of items in large databases. ACM
Stepnicka, M.: Time series analysis and prediction based on fuzzy rules and the fuzzy transform
Sivatsa, S.K.: Inaccuracy minimization by partitioning fuzzy data sets: validation of an analytical methodology. Int. J. Comput. Sci. Inf. Secur. (IJCSIS) 8(1) (2010)
Das, G.: Rule discovery from time series. In: Proceedings of the 4th International Conference
Pongracz, R.: Application of fuzzy rule-based modeling technique to regional drought. J. Hydrol. 224, 100–114 (1999)
Mueen, A.A.: Exact primitives for time series data mining. University of California, Riverside (2012)
Zhu, Y.: High performance data mining in time series: techniques and case studies. New York University, New York (2004)
Herrera, F.: Learning the membership function contexts for mining fuzzy association rules by using genetic algorithms. Fuzzy Sets Syst. 160, 905–921 (2009)
Han, J.: Data Mining: Concepts and Techniques
Hong, T.P.: Mining association rules with multiple minimum supports. Int. J. Approx. Reason. 3, 38–42 (2005)
© 2014 Springer India
About this paper
Cite this paper
Rathi, R., Jain, V., Gautam, A.K. (2014). Inducing Fuzzy Association Rules with Multiple Minimum Supports for Time Series Data. In: Babu, B., et al. Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012. Advances in Intelligent Systems and Computing, vol 236. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1602-5_47
DOI: https://doi.org/10.1007/978-81-322-1602-5_47
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1601-8
Online ISBN: 978-81-322-1602-5
eBook Packages: Engineering, Engineering (R0)