Keywords

1 Introduction

Data mining plays a vital role in today's applications, so researchers are paying increasing attention to new techniques that can improve it. It covers a large domain and is frequently applied in areas such as business, medicine, and biometrics. Fuzzy concepts have a great impact on data mining methodology, and various data warehouses are managed to store and use data efficiently across different domains. A time series is a collection of data points, each with a specific value at an instant of time, that varies with respect to time. This paper proposes an algorithm that induces fuzzy association rules with multiple minimum support values. Many algorithms have been proposed earlier, but they use a single minimum support for all items, whereas different itemsets may require different minimum supports. To illustrate this, we apply fuzzy concepts to time series data, specifically temperature data. Time series data falls under the category of sequence data, which carries some trend or pattern, so the algorithm can predict the near-term temperature using trend analysis. The proposed algorithm has two advantages. First, the results are easy to understand, because fuzzy theory is close to natural language. Second, it helps to detect sudden changes in the temperature of a place.

The remaining parts of this paper are organized as follows: a review of fuzzy set theory is given in Sect. 2. The related work is discussed in Sect. 3. The proposed algorithm is explained in Sect. 4. Experimental results are shown in Sect. 5. Finally, conclusion and future work are discussed in Sect. 6.

2 Fuzzy Set Theory

Fuzzy set theory was introduced in 1965 by Zadeh in his seminal paper entitled “Fuzzy sets”, and has played a vital role in modeling human thinking, particularly in the domains of pattern recognition, communication of information, and abstraction. Fuzzy set theory is built on fuzzy membership functions: a fuzzy set expresses the degree to which an element belongs to a set through its characteristic function. For a given crisp set B, the characteristic function assigns a value \(\mu _{\mathrm{B}}\)(x) to every x \(\in \) X such that

$$\begin{aligned} \mu _\text {B} \left( \text {x} \right) ={\left\{ \begin{array}{ll} 1 &{} \text {iff x}\in \text {B} \\ 0 &{} \text {iff x}\notin \text {B} \end{array}\right. } \end{aligned}$$

Assume that \(\text {x}_{1}\) to \(\text {x}_{\text {k}}\) are the elements in fuzzy set B, and \(\mu _{1}\) to \(\mu _{\text {k}}\) are their respective grades of membership in B. B is usually represented as follows:

$$\begin{aligned} \text {B }=\mu _1 /\text {x}_1 +\mu _2 /\text {x}_2 +\cdots +\mu _\text {k} /\text {x}_\text {k} \end{aligned}$$
(1)

3 Related Work

Data mining is frequently used to induce association rules from large itemsets. An association rule describes the effect of the presence or absence of an item in a transaction on other items, in terms of two measures: support and confidence.

Hong proposed an algorithm that induces association rules with multiple minimum supports, using maximum constraints on general items. Au and Chan proposed a fuzzy mining approach to find fuzzy rules for time series data. Das proposed a mining algorithm for time series data prediction, which uses a clustering method to extract basic shapes from the time series and then applies the Apriori method to induce association rules on them.

4 The Proposed Algorithm with Multiple Minimum Support

Input: A time series TS with n data points, a list of m membership functions for data points, a predefined minimum support threshold for each fuzzy item \(\text {ms}_{\text {i}}\), i \(=\) 1 to z, a predefined minimum confidence threshold \(\lambda \), and a sliding window size ws.

  1. Step 1:

    Convert the time series TS into a list of subsequences W(TS) according to the sliding-window size ws. That is, \(\text {W}\left( {\text {TS}} \right) =\{\text {s}_\text {b}|\text {s}_\text {b} =\left( {\text {d}_\text {b},\text { d}_{\text {b}+1}, \ldots , \text { d}_{\text {b}+\text {ws}-1} } \right) ,\text { b }={1 \text {to n}}-\text {ws}+1\}\), where \(\text {d}_{\text {b}}\) is the value of the b-th data point in TS.

  2. Step 2:

    Transform the k-th (k \(=\) 1 to ws) quantitative value \(v_{\text {bk}}\) in each subsequence \(\text {s}_{\text {b}}\) (b \(=\) 1 to n\(-\)ws\(+\)1) into a fuzzy set \(\text {f}_{\text {bk}}\) represented as \(\left( {\text {f}_{\text {bk1}} /\text {R}_{\text {k1}} +\text {f}_{\text {bk2}} /\text {R}_{\text {k2}} +\cdots +\text {f}_{\text {bkm}} /\text {R}_{\text {km}} } \right) \) using the given membership functions, where \(\text {R}_{\text {kl}}\) is the l-th fuzzy region of the k-th data point in each subsequence, m is the number of membership functions, and \(\text {f}_{\text {bkl}}\) is \(\text {v}_{\text {bk}}\)’s fuzzy membership value in region \(\text {R}_{\text {kl}}\). Each \(\text {R}_{\text {kl}}\) is called a fuzzy item.

  3. Step 3:

    Compute the scalar cardinality of each fuzzy item \(\text {R}_{\text {kl}}\) as

    $$\begin{aligned} \text {Count}_{\text {kl}} =\sum _{b=1}^{n-ws+1} {f_{\text {bkl}} } \end{aligned}$$
  4. Step 4:

    Check whether the support value (\(=\text {count}_{\text {kl}}/(\text {n}-\text {ws}+1)\)) of each \(\text {R}_{\mathrm{kl}}\) in \(\text {C}_{1}\) is greater than or equal to its predefined minimum support threshold \(\text {ms}_{\text {Rkl}}\). If \(\text {R}_{\text {kl}}\) satisfies this condition, collect it in the set of large 1-itemsets (\(\text {L}_{1}\)). That is:

    $$\begin{aligned} \text {L}_1 =\{\text {R}_{\text {kl}} |\text {count}_{\text {kl}} \ge \text {ms}_{\text {Rkl}},\; 1\le \text {k}\le \text {ws}\;{\text {and}}\;1\le l\le \text {m}\}. \end{aligned}$$
  5. Step 5:

    IF \(\text {L}_{1}\) is not null, then perform the next step; otherwise, terminate the algorithm.

  6. Step 6:

    Set t \(=\) 1, where t is used to represent the number of fuzzy items in the current itemsets to be processed.

  7. Step 7:

    Join the large t-itemsets \(\text {L}_{\text {t}}\) to obtain the candidate (t \(+\) 1)-itemsets \(\text {C}_{\text {t}+1}\) in the same way as in the Apriori algorithm, with two restrictions: two items obtained from the same order of data points in subsequences cannot coexist in an itemset of \(\text {C}_{\text {t}+1}\), and the support of each large t-itemset used in the join must be greater than or equal to the maximum of the minimum supports of the fuzzy items in that t-itemset.

  8. Step 8:

    Now, perform the following steps for each candidate itemset I in \(\text {C}_{\text {t}+1}\):

    1. (a)

      Compute the fuzzy value of I in each subsequence \(\text {s}_{\text {b}}\) as \(\text {f}_{\text {I}}^{\text {sb}} =\text {f}_{\text {I1}}^{\text {sb}} \wedge \text {f}_{\text {I2}}^{\text {sb}} \wedge \cdots \wedge \text {f}_{\text {I(t+1)}}^{\text {sb}}\), where \(\text {f}_{\text {Ik}}^{\text {sb}}\) is the membership value of fuzzy item \(\text {I}_{\text {k}}\) in \(\text {s}_{\text {b}}\). If the minimum operator is used for the intersection, then:

      $$\begin{aligned} \text {f}_{\text {I}}^{\text {sb}} =\mathop {\text {Min}}\nolimits _{\text {k}=1}^{\text {t}+1}\, \text {f}_{\text {Ik}}^{\text {sb}}. \end{aligned}$$
    2. (b)

      Compute the count of I in all the subsequences as:

      $$\begin{aligned} \text {count}_{\text {I}} =\sum _{b=1}^{n-ws+1} {f_{I}^{sb}} \end{aligned}$$
  9. Step 9:

    If the support (\(=\text {count}_{\text {I}}/(\text {n}-\text {ws}+1)\)) of I is greater than or equal to the maximum of the minimum support values of the fuzzy items in I, put I in \(\text {L}_{\text {t}+1}\):

    $$\begin{aligned} \text {L}_{\text {t}+1} =\{\text {I}\mid \text {count}_{\text {I}} \ge \mathop {\max }\nolimits _{\text {k}=1}^{\text {t}+1} \text {ms}_{\text {Ik}} \}. \end{aligned}$$
  10. Step 10:

    If \(\text {L}_{\text {t}+1}\) is null, then do the next step; otherwise, set t \(=\) t \(+\) 1 and repeat Steps 7–9.

  11. Step 11:

    Generate the association rules for each large h-itemset I with items (\(\text {I}_{1}, \text {I}_{2},\ldots , \text {I}_{\text {h}})\), \(\text {h}\ge \)2, using the following substeps:

    1. (a)

      Form each possible association rule as follows: \(\text {I}_{1} \wedge \cdots \wedge \text {I}_{\text {n}-1} \wedge \text {I}_{\text {n}+1} \wedge \cdots \wedge \text {I}_{\text {h}} \rightarrow \text {I}_{\text {n}}\), n \(=\) 1 to h.

    2. (b)

      Calculate the confidence values of all association rules by the following formula:

    $$\begin{aligned} \frac{\sum \nolimits _{b=1}^{n-ws+1} f_{I}^{sb}}{\sum \nolimits _{b=1}^{n-ws+1} \left( f_{I_1}^{sb} \wedge \cdots \wedge f_{I_{n-1}}^{sb} \wedge f_{I_{n+1}}^{sb} \wedge \cdots \wedge f_{I_h}^{sb} \right) } \end{aligned}$$

Output: A set of association rules that satisfy the condition of the maximum of the minimum supports.

5 An Example

This section explains the working of the proposed algorithm step by step and generates fuzzy association rules (Table 1).

Table 1 Set of data points
Fig. 1 Membership function used in this example

Assume the membership function used in the example as Fig. 1 (Table 2).

Table 2 Predefined minimum support value of all fuzzy itemset
  1. Step 1:

    The window size is assumed to be 5. Using the formula n \(-\) ws \(+\) 1, we get 15 \(-\) 5 \(+\) 1 \(=\) 11 subsequences.

  2. Step 2:

    The data values are then converted into fuzzy itemsets using the membership functions shown in Fig. 1.

  3. Step 3:

    Sum the membership values of each fuzzy region over all the subsequences; this sum is called its count. For example, for the fuzzy item \(\text {Q}_{1}\).Middle, the count is (\(0+0.33+1+0+0.33+1+0+0.2+0+1+0\)) \(=\) 3.86.

  4. Step 4:

    Now, compare the count of each fuzzy item with its individual predefined minimum support count. Fuzzy items whose count is greater than or equal to their own minimum support value are put into \(\text {L}_{1}\) (Table 3).

  5. Step 5:

    If \(\text {L}_{1}\) consists of fuzzy item, proceed to step 6, else terminate.

  6. Step 6:

    Candidate set \(\text {C}_{\text {t}+1}\) is generated from \(\text {L}_{\text {t}}\). The fuzzy items in \(\text {L}_{1}\) are (Q1.Low, Q1.Middle, Q2.Low, Q2.Middle, Q3.Low, Q3.Middle, Q4.Low, Q5.Low, Q5.High).

  7. Step 7:

    \(\text {L}_{1}\) is joined to generate \(\text {C}_{2}\). The fuzzy items in \(\text {C}_{2}\) are as follows: (Q1.Low, Q2.Mid), (Q1.Low, Q3.Mid), (Q1.Low, Q5.High), (Q1.Low, Q5.Low), (Q2.Low, Q3.Low), (Q2.Low, Q4.Low), (Q2.Low, Q5.High), (Q2.Low, Q1.Mid), (Q2.Low, Q3.Mid), (Q2.Low, Q5.Low), (Q3.Low, Q4.Low), (Q3.Low, Q5.High), (Q3.Low, Q1.Mid), (Q3.Low, Q2.Mid), (Q3.Low, Q5.Low), (Q4.Low, Q1.Mid), (Q4.Low, Q2.Mid), (Q4.Low, Q5.High), (Q4.Low, Q5.Low), (Q5.High, Q1.Mid), (Q5.High, Q2.Mid), (Q5.High, Q3.Mid).

  8. Step 8:

    Now compute the count of all the fuzzy items of \(\text {C}_{2}\).

  9. Step 9:

    Compare the count of each \(\text {C}_{2}\) itemset with the minimum support counts of its fuzzy items. \(\text {C}_{2}\) itemsets whose count is greater than or equal to the maximum of the minimum supports of their two fuzzy items are stored in \(\text {L}_{2}\).

  10. Step 10:

    Since \(\text {L}_{2}\) is not null, set t \(=\) t \(+\) 1 and repeat Steps 7–9 until \(\text {L}_{\text {t}}\) is null (Tables 4, 5, 6).

  11. Step 11:

    (a) In this example, only (Q3.Low, Q2.Mid) exists in \(\text {L}_{2}\). The association rules formed are: if Q3 \(=\) Low then Q2 \(=\) Mid, and if Q2 \(=\) Mid then Q3 \(=\) Low. (b) The confidence of (Q3.Low, Q2.Mid) is 3.34/3.34 \(=\) 1. This means that if the value of the data point at time 2 is mid, then the value of the data point at time 3 is low, with a confidence factor of 1.

Fig. 2 Temperature varying data

Table 3 Sequence generated, ws=5
Table 4 Converted fuzzy set
Table 5 Candidate set \(\text {C}_{2 }\)
Table 6 Fuzzy itemset \(\text {L}_{2}\)

6 Experimental Results

The proposed algorithm is implemented in the C programming language. The dataset consists of temperature data points from the years 2008–2012, taken from the National Data Center (NDC), US (Figs. 2 and 3).

Fig. 3 Membership function used in experiment

Fig. 4 Relation between support value and confidence

In Fig. 4, as the minimum support values of the fuzzy itemsets are increased, the number of induced fuzzy association rules decreases; the mined rules are therefore sensitive to the chosen support thresholds. A typical induced rule reads: if the temperature on the second day of a month is moderate, then it may be high on the third day of the month.

7 Conclusion and Future Work

In this paper, the proposed algorithm provides an efficient way to induce fuzzy association rules, since each fuzzy item has its own predefined minimum support. The resulting temperature prediction should be more accurate than earlier approaches that use a single support value. As future work, the membership functions, which in this paper are known in advance, could be set dynamically, and more complex operations could be supported. The multiple-support formulation also offers another viewpoint for defining the minimum support of fuzzy items.