
1 Introduction

The main goal of this paper is to provide an overview of the results that have been obtained over several years in analyzing, forecasting, and mining information from time series using methods that are predominantly nonstatistical in character. Since our methods are not sufficiently known among statisticians, we want to fill this gap. We argue that our methods can be successful in time series processing and demonstrate this on examples. On the one hand, they provide results similar to statistical methods but give a different point of view on them. On the other hand, they also give an explanation of the obtained results. Moreover, our methods can provide additional information that can hardly be obtained using statistical methods. We discuss this feature of our methods in the sequel. However, let us emphasize that our goal is not to beat statistical methods but rather to extend the power of time series processing methods and benefit from their mutual synergy. Let us remark that all methods described below have been developed by the authors of this paper and their collaborators.

Recall that by a fuzzy set A on the universe U, we understand a function A : U → [0, 1]. We often write \(A \mathrel{\subset\!\!\!\sim} U\) or \(A\in \mathcal {F}(U)\) where \(\mathcal {F}(U)\) is the set of all fuzzy sets on U.

It is important to note that there is an essential difference between the probabilistic and fuzzy techniques, which we argued extensively in [32] and elsewhere. Recall from the latter that probability theory provides a model of uncertainty characterized by a lack of information about the possible results of some event (experiment). Fuzzy sets, on the other hand, provide a mathematical model of the vagueness phenomenon. The latter arises if we want to form a class of all objects with a vague property (e.g., “warm,” “strong,” “steep,” etc.). No random event (a result of some experiment) occurs in this case and so, it cannot be considered.

Both phenomena occur in reality and should be treated using different mathematical principles. While probability theory is based on the properties of measure, and a key notion is that of independence of events, fuzzy set theory (and fuzzy logic) is based on the properties of ordered structures. Thus, probability and fuzzy set theory are complementary rather than competitive.

In this paper, we present special techniques of fuzzy modeling suitable for applications in time series processing. The first is the fuzzy transform (F-transform), and the second consists of a few selected methods of fuzzy natural logic (FNL). These techniques are described in detail in various papers. A comprehensive explanation, including applications, can be found in the book [34].

Our methods can be applied to the processing of classical time series. There is also a branch focusing on the elaboration of so-called fuzzy time series [36, 42], which are sequences \(\{A(t)\mid t\in \mathbb {T}\}\) where A(t) are fuzzy sets. We do not deal with this approach in this paper.

This paper is structured as follows. In the first two sections, we introduce the basic concepts of fuzzy transform and fuzzy natural logic. In Sect. 4, we introduce the decomposition of time series into components that are later elaborated separately. Section 5 describes basic principles of forecasting of time series. In Sect. 6, we overview three main kinds of information that can be mined from time series. Its last subsection is a brief overview of other applications of our methods in time series processing.

2 Fuzzy Transform

This is a universal technique introduced by I. Perfilieva in [37, 39] that is widely applied in many areas. Its fundamental idea is to transform a real bounded continuous function f : [a, b] → [c, d], where \([a, b], [c, d]\subset \mathbb {R}\), to a finite vector of components and then transform it back. The former is called a direct F-transform and the latter an inverse one. The result of the inverse F-transform is a function \(\hat {f}:[a, b]\to \mathbb {R}\) that approximates the original function f. We can set the parameters so that the approximating function \(\hat {f}\) has the desired properties.

The F-transform has several strengths: excellent approximation abilities, ability to filter out high frequencies, ability to reduce noise [18, 19, 35], and ability to estimate values of the first and second derivatives in an approximately specified area (cf. [11]).

The first step of the F-transform procedure is to form a fuzzy partition of the domain [a, b] which consists of a finite set of fuzzy sets on [a, b]

$$\displaystyle \begin{aligned} \mathcal{A}_h=\{A_0,\ldots, A_n\}, \qquad n\geq 2, \end{aligned} $$
(1)

defined over the set of nodes a = c0, …, cn = b such that ck+1 = ck + h where h > 0. Each fuzzy set Ak has a support defined over three nodes ck−1, ck, ck+1 where Ak(ck) = 1 and Ak(ck−1) = Ak(ck+1) = 0. The fuzzy sets Ak are often called basic functions. The properties of basic functions are defined axiomatically; for the details, see [37] and elsewhere.
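As an illustration, a uniform fuzzy partition with triangular basic functions can be set up in a few lines. This is only a sketch under the assumptions above (uniform nodes, triangular shapes); the helper name `basic_functions` is ours, not the authors' implementation:

```python
import numpy as np

def basic_functions(a, b, n):
    """Uniform fuzzy partition of [a, b] by n+1 triangular basic functions.

    Returns the nodes c_0, ..., c_n and a function A(k, x) giving the
    membership degree of x in the k-th basic function A_k."""
    nodes = np.linspace(a, b, n + 1)
    h = (b - a) / n

    def A(k, x):
        # Triangular shape: 1 at c_k, falling linearly to 0 at c_{k-1}, c_{k+1}.
        return np.maximum(0.0, 1.0 - np.abs(x - nodes[k]) / h)

    return nodes, A
```

Note that at every interior point the two overlapping triangles sum to 1 (the Ruspini condition), which is one of the axiomatic properties of a fuzzy partition.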

If the fuzzy partition is given, then an (n + 1)-tuple

$$\displaystyle \begin{aligned} {\mathbf{F}}^m[f]=(F^m_0[f],\ldots, F^m_n[f]) \end{aligned}$$

is called the m-th degree direct fuzzy transform of f if

$$\displaystyle \begin{aligned} F^m_k[f](x)= \beta^0_k[f] + \beta^1_k[f] (x-c_k) +\cdots+ \beta^m_k[f] (x-c_k)^m, \end{aligned} $$
(2)

for all k = 0, …, n. We call \(F^m_k[f]\) in (2) the components of the fuzzy transform. The precise computation of the components (2) is described in detail in [20] and elsewhere.

The F-transform is linear, namely, if f = α1g1 + α2g2 where α1, α2 are numbers and g1, g2 real bounded functions on the same domain, then

$$\displaystyle \begin{aligned} F^m_k[f]=\alpha_1 F^m_k[g_1]+\alpha_2 F^m_k[g_2] \end{aligned}$$

for all k = 0, …, n.

The inverse F-transform is

$$\displaystyle \begin{aligned} \hat{f}_h^m(x)=\sum_{k= 0}^n F^m_k[f]\cdot A_k(x),\quad x\in [a, b]. \end{aligned} $$
(3)

It can be proved that the function \(\hat {f}_h^m\) approximates the original function f with arbitrary precision depending on the choice of h when forming the fuzzy partition \(\mathcal {A}_h\). The computational complexity of fuzzy transform is linear.
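The direct F0-transform and the inverse F-transform (3) can be sketched as follows, assuming the uniform triangular partition above; the names `f_transform0` and `inverse_f_transform` are our illustrative helpers. On a sampled linear function, the inverse reproduces the original almost exactly away from the boundary nodes:

```python
import numpy as np

def f_transform0(x, y, nodes, h):
    """Degree-0 direct F-transform: each component is the weighted average
    of the samples y over the area of one triangular basic function."""
    comps = []
    for c in nodes:
        w = np.maximum(0.0, 1.0 - np.abs(x - c) / h)  # triangular weights
        comps.append(np.sum(w * y) / np.sum(w))
    return np.array(comps)

def inverse_f_transform(x, comps, nodes, h):
    """Inverse F-transform (3): weighted combination of the components."""
    W = np.maximum(0.0, 1.0 - np.abs(x[None, :] - nodes[:, None]) / h)
    return (comps[:, None] * W).sum(axis=0)
```

Decreasing h (i.e., refining the partition) makes the approximation arbitrarily precise, in accordance with the statement above.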

The following holds for the coefficients \(\beta ^j_k\) in (2) (see [39]):

$$\displaystyle \begin{aligned} \beta^0_k[f] &= f(c_k) + O(h^2), {} \end{aligned} $$
(4)
$$\displaystyle \begin{aligned} \beta^1_k[f] &= f'(c_k) + O(h^2),{} \end{aligned} $$
(5)
$$\displaystyle \begin{aligned} \beta^2_k[f] &= \frac{f''(c_k)}{2} + O(h^2).{} \end{aligned} $$
(6)

Hence, each coefficient \(\beta ^j_k\) provides a weighted average of values as well as of derivatives of the function f over the area characterized by the fuzzy set \(A_k\in \mathcal {A}_h\).
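The estimates (4) and (5) can be computed as weighted averages over the support of a basic function. The following sketch is our own helper; it assumes a symmetric triangular weight, which makes the constant and linear basis terms orthogonal, so the coefficients reduce to simple ratios. For f(x) = x² it recovers f(c_k) and f′(c_k) up to the O(h²) terms:

```python
import numpy as np

def ft1_coeffs(x, y, c, h):
    """First-degree F-transform coefficients over the area given by the
    triangular basic function centred at node c (illustrative helper)."""
    w = np.maximum(0.0, 1.0 - np.abs(x - c) / h)   # triangular weights
    d = x - c
    beta0 = np.sum(w * y) / np.sum(w)              # estimates f(c), cf. (4)
    beta1 = np.sum(w * y * d) / np.sum(w * d * d)  # estimates f'(c), cf. (5)
    return beta0, beta1
```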

3 Fuzzy Natural Logic

This is a class of special theories of mathematical fuzzy logic whose goal is to model human reasoning based on the use of natural language. So far, it consists of the following theories:

  1. (a)

    A formal theory of evaluative linguistic expressions explained in detail in [24] (see also [23, 34]) that are expressions of natural language such as small, medium, big, very short, more or less deep, quite roughly strong, extremely high, etc.

  2. (b)

    A formal theory of fuzzy/linguistic IF-THEN rules and approximate reasoning [22, 30, 33, 34]. The basic concept here is that of a linguistic description, that is, a finite set of fuzzy/linguistic IF-THEN rules:

    $$\displaystyle \begin{aligned} \mathcal{R}_j := \text{IF }{X}\text{ is }{\mathcal{A}_j}\text{ THEN }{Y}\text{ is }{\mathcal{B}_j}, \qquad j=1, \ldots, m, \end{aligned} $$
    (7)

    where “\({X}\text{ is }{\mathcal {A}_j}\),” “\({Y}\text{ is }{\mathcal {B}_j}\),” j = 1, …, m are evaluative linguistic predications (e.g., “trend is very steep, difference is small, trend-cycle is stagnating,” etc.). The linguistic description can be learned from data.

    To find a proper conclusion on the basis of a linguistic description, it is necessary to use a special reasoning method called perception-based logical deduction (PbLD). This method is based on the mathematical model of the evaluative predications used. To find a conclusion, it acts locally, so that it mimics the way people reason on the basis of linguistic information. A more detailed description of PbLD can be found in [34].

  3. (c)

    A formal theory of intermediate and generalized fuzzy quantifiers [6, 15, 25] and elsewhere. These are expressions of natural language such as most, many, a lot of, a few, several, etc.

Theory (b) is applied in forecasting. Theories (a) and (c) are applied in mining information from time series, described in Sect. 6. For more details about FNL, see the cited literature. A less formal explanation of the methods of FNL, including a description of applications, can be found in [34].

4 Analysis of Time Series

Application of techniques of fuzzy modeling in time series analysis is based on the assumption that a time series can be decomposed as follows: let \(\mathbb {T}=\{1, \ldots , p\}\) be a set of natural numbers interpreted as time moments. Then a time series is a set X = {X(t, ω)∣t ∈ 𝕋, ω ∈ Ω} where

$$\displaystyle \begin{aligned} X(t, \omega) = \mathit{T\!C}\,(t)+S(t)+ R(t, \omega), \qquad t\in \mathbb{T}, \omega\in \varOmega. \end{aligned} $$
(8)

The \({\mathit {T\!C}}(t)\) is a trend-cycle that can be further decomposed into trend and cycle, i.e., \({\mathit {T\!C}}\,(t)= {\mathit {T\!r}}\,(t)+C(t)\). The S(t) is a seasonal component that is a mixture of r periodic functions:

$$\displaystyle \begin{aligned} S(t)= \sum_{j=1}^r P_j e^{i\lambda_j t} \end{aligned} $$
(9)

where λ1, …, λr are frequencies and Pj, j = 1, …, r are constants. Without loss of generality, we assume that the frequencies are ordered λ1 < ⋯ < λr (this corresponds to the ordering of periodicities T1 > ⋯ > Tr).

Note that \({\mathit {T\!C}}\) and S are ordinary non-stochastic functions. Only R is random noise; we assume that it is a stationary stochastic process with mean E(R(t, ω)) = 0 and variance Var(R(t, ω)) < σ, \(t\in \mathbb {T}\).
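A synthetic realization of the model (8)-(9), with a linear trend-cycle, a single periodic term of periodicity T1 = 12, and stationary Gaussian noise, can be generated as follows (all constants are illustrative, chosen only for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1, 241)                    # time moments T = {1, ..., p}, p = 240
TC = 0.05 * t + 2.0                      # trend-cycle (here a pure trend)
S = 3.0 * np.sin(2 * np.pi * t / 12)     # seasonal component, periodicity 12
R = rng.normal(0.0, 0.5, t.size)         # stationary noise with E(R) = 0
X = TC + S + R                           # one realization (10) of the model (8)
```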

In practice, we always have only one realization of the time series at our disposal, obtained by fixing ω ∈ Ω. Then

$$\displaystyle \begin{aligned} X = \{X(t)\mid t\in \mathbb{T}\} \end{aligned} $$
(10)

is an ordinary real (or complex) valued function.

Let us now choose a fuzzy partition \(\mathcal {A}_h\) for some h > 0 and apply the F-transform to X in (10). The result of the inverse F-transform is

$$\displaystyle \begin{aligned} \hat{X}(t)= \hat{{\mathit{T\!C}}}\,(t)+\hat{S}(t)+\hat{R}(t), \qquad t\in \mathbb{T}. \end{aligned} $$
(11)

Then the following can be proved:

Theorem 1 ([16, 17, 35])

  1. (a)

    If we set \(h= d\, \bar {T}\) where d > 0 and \(\bar {T}\) is the longest periodicity occurring in S, then \(\lim _{d\rightarrow \infty } |\hat {S}(t)|= 0\).

  2. (b)

    \(\lim _{h\rightarrow \infty } \mathbf {Var}(\hat {R}(t))=0\).

  3. (c)

    There is a number D(m, h), \(m\in \mathbb {N}\), such that

    $$\displaystyle \begin{aligned} |\hat{X}(t) - {\mathit{T\!C}}\,(t)|\leq 2\omega(h, {\mathit{T\!C}})+ D(m, h), \qquad t\in [c_1, c_{n-1}] \end{aligned} $$
    (12)

    where \(\lim _{h\rightarrow \infty } D(m, h) = 0\) and \(\omega (h, {\mathit {T\!C}})\) is a modulus of continuity w.r.t. h and \({\mathit {T\!C}}\).

It follows from this theorem that, by a proper setting of h, the F-transform makes it possible to “wipe out” part or the whole of the seasonal component S of the time series and to significantly reduce its noise. To set h, we follow the general OECD specification: Trend (tendency) is a component of a time series that represents variations of low frequency, the high-frequency and medium-frequency fluctuations having been filtered out. Trend-cycle is a component that represents variations of low and medium frequency in a time series, the high-frequency fluctuations having been filtered out.

Hence, we proceed as follows:

  1. (i)

    Find periodicities:

    $$\displaystyle \begin{aligned} T_1>\cdots > T_s \end{aligned} $$
    (13)

    using a periodogram (see [1, 2, 8] and elsewhere). Choose a proper periodicity T from the list (13) and, due to Theorem 1(a), set h = dT for some d (we usually put d ∈ {1, 2}).

  2. (ii)

    Form a fuzzy partition (1) and compute Fm-transform components:

    $$\displaystyle \begin{aligned} {\mathbf{F}}^m[X]= (F^m_0[X], \ldots, F^m_{n-1}[X]) \end{aligned} $$
    (14)

    for m ∈{0, 1}.

  3. (iii)

    Compute the estimate of the trend or trend-cycle using the inverse F-transform (3). Taking into account the equality (11) and Theorem 1, we can estimate the trend-cycle as (by ≈ we denote approximate equality)

    $$\displaystyle \begin{aligned} {\mathit{T\!C}}\,(t)\approx\hat{X}_{h_{TC}}(t) \end{aligned}$$

    and the trend as

    $$\displaystyle \begin{aligned} {\mathit{T\!r}}\,(t)\approx\hat{X}_{h_T}(t) \end{aligned}$$

    where hTC is set according to a periodicity T chosen from the middle of the list (13), while hT is set according to a periodicity chosen from the left part of it.
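Step (i) can be sketched with a plain FFT periodogram. The helper `dominant_period` is our illustrative name; it assumes a detrended, uniformly sampled series and returns the periodicity of the strongest spectral peak:

```python
import numpy as np

def dominant_period(x):
    """Estimate the strongest periodicity in a (detrended) series from the
    periodogram, i.e. the squared magnitudes of the discrete Fourier transform."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0)
    k = 1 + int(np.argmax(spec[1:]))   # skip the zero-frequency bin
    return 1.0 / freqs[k]
```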

Note that (3) also provides an analytic form of the estimate. As a consequence of Theorem 1, we conclude that the F-transform makes it possible to estimate the trend or trend-cycle with high fidelity. A convincing demonstration of this statement is presented in [35].
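Putting steps (i)-(iii) together on a synthetic series of the form (8): with a uniform triangular partition and h equal to the chosen periodicity T = 12 (i.e., d = 1), the F0-transform smoothing suppresses the seasonal component and most of the noise, leaving a close estimate of the trend on the interior of the domain. All constants are illustrative:

```python
import numpy as np

# Synthetic realization: linear trend + seasonal term (T = 12) + noise.
rng = np.random.default_rng(1)
t = np.arange(1.0, 241.0)
TC = 0.02 * t + 5.0
X = TC + 2.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0.0, 0.2, t.size)

h = 12.0                                # step (i): h = d * T with d = 1
nodes = np.arange(t[0], t[-1] + 1, h)   # uniform nodes c_0, c_1, ...

comps = []                              # step (ii): degree-0 components
for c in nodes:
    w = np.maximum(0.0, 1.0 - np.abs(t - c) / h)
    comps.append(np.sum(w * X) / np.sum(w))
comps = np.array(comps)

# Step (iii): inverse F-transform gives the trend-cycle estimate.
W = np.maximum(0.0, 1.0 - np.abs(t[None, :] - nodes[:, None]) / h)
TC_hat = (comps[:, None] * W).sum(axis=0)
```

The estimate is reliable only away from the boundary nodes, in accordance with the interval [c1, c_{n−1}] appearing in (12).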

5 Forecasting Time Series

Recall that the direct F-transform provides an estimate of the trend-cycle \({\mathit {T\!C}}\) (or trend) in the form of the vector of components (14). Using the special learning method developed in FNL (see [34]), we can learn a linguistic description which characterizes the principles of the behavior of \({\mathit {T\!C}}\). Then, using the PbLD method, we can forecast k future F-transform components:

$$\displaystyle \begin{aligned} F^m_n[X], \ldots, F^m_{n+k-1}[X]. \end{aligned} $$
(15)

Finally, from (15) we compute the forecast of the trend-cycle as the inverse F-transform \(\hat {X}(t)\) for t = p + 1, …, p + K where K is the forecast horizon. In our case, it is k times the width of the basic functions, i.e., K = 2kh. The idea of the forecast is depicted in Fig. 1.

Fig. 1

Scheme of the forecasting idea: the component Fn+1 is forecasted using the PbLD method on the basis of the learned linguistic description

Example of such a learned linguistic description is

(16)

Let us emphasize that the learned linguistic description (16) explains in natural language how the forecast was obtained. This information can be interesting for users, who can exploit it in further decision-making or in planning their strategy (Fig. 2).

Fig. 2

Example of time series forecasting. The left part is the validation part; the right part is the testing part (never used in computation). The dotted lines are the real and forecasted trend-cycle; the full lines are the real and forecasted values of the time series

The seasonal component is forecast separately. The forecast of the whole time series is obtained by summing predictions of \({\mathit {T\!C}}\) and S (cf. (8)). Demonstration of our forecasting method on real time series is presented in [21, 44] and elsewhere.

6 Mining Information from Time Series

One of the essential characteristics of our methods is the possibility to characterize various features of time series using expressions of natural language. Hence, our methods have a big potential in the area of mining information from time series since they can provide information that cannot be obtained using statistical methods. Below we will mention some of them. A concise overview of methods for mining information from time series is given in [7].

Linguistic Evaluation of Local Trend

An exciting question is what trend (tendency) of the time series can be recognized in a specific time interval. Surprisingly, recognizing the trend is by no means a trivial task, even when people watch the time series graph. Moreover, it can be essentially influenced by subjective opinion. Therefore, an objective and independent tool for this task is welcome. A convenient one is the F1-transform, since it makes it possible to estimate the average slope (tangent) over an imprecisely determined area, and using methods of FNL, this slope can be characterized in natural language. For example, we can say “clear decrease (or huge increase) of trend,” “the trend is negligibly increasing (or stagnating),” etc. Such linguistic expressions characterize the trend (tendency) of the time series in an area specified by the user. The ability to generate such linguistic evaluations is quite an important achievement of the fuzzy techniques. The method is based on theoretical results in fuzzy natural logic and is described in more detail in [26, 27].

The evaluation of the steepness of the slope in natural language is obtained using the function of local perception:

$$\displaystyle \begin{aligned} \mathcal{A}={\mathrm{LPerc}}\,(\beta^1,w_{tg}) \end{aligned} $$
(17)

which assigns a proper evaluative expression \(\mathcal {A}\) to the value β1 w.r.t. the context wtg. To determine the context, we must first specify what is meant by an “extreme (utmost) increase (decrease)” in the given situation. It can be determined as the largest acceptable difference of time series values in relation to a given (basic) time interval (e.g., 12 months, 31 days), i.e., as a minimal and maximal tangent. In practice, we set only the maximal tangent vR, while the smallest one is usually vL = 0. The typical medium value vS is determined analogously to vR. The result is the context wtg = 〈vL, vS, vR〉 that determines the interval [vL, vS] ∪ [vS, vR]. A demonstration of the evaluation of the slope is in Fig. 3.
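A minimal sketch of the mapping (17) can look as follows. The expression names and the thresholds inside the context are illustrative assumptions of ours, not the exact FNL model of evaluative expressions:

```python
def lperc(beta1, w_tg):
    """Illustrative stand-in for LPerc (17): map a slope beta1 to an
    evaluative expression w.r.t. the context w_tg = (vL, vS, vR).
    Thresholds and wording are assumptions, not the exact FNL semantics."""
    vL, vS, vR = w_tg
    a = abs(beta1)
    sign = "increasing" if beta1 > 0 else "decreasing"
    if a <= vL + 0.1 * (vS - vL):          # close to the smallest tangent
        return "stagnating"
    if a <= vS:                            # below the medium tangent
        return "slightly " + sign
    if a <= vS + 0.5 * (vR - vS):          # between medium and maximal
        return "roughly " + sign
    return "sharply " + sign               # close to the maximal tangent
```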

Fig. 3

Demonstration of the slope determined by the value of β1 computed over an area characterized by a triangular basic function (depicted above the x-axis). It can be characterized linguistically w.r.t. the context wtg of the tangent, which is determined by the ratio of the largest difference between values of the time series and the basic time interval (day, week, month, etc.). The evaluation in this picture is “slightly decreasing.” Note that the human eye does not immediately see this slope from the course of the time series

A related task is to find intervals in which the time series has a monotonous trend (see [32]). This means that we decompose the time domain \(\mathbb {T}\) into a set of intervals:

$$\displaystyle \begin{aligned} \mathcal{T}= \{\mathbb{T}_i\mid i=1, \ldots, s\}, \qquad \bigcup \mathcal{T}=\mathbb{T} \end{aligned} $$
(18)

so that the time series \(X|\mathbb {T}_i\) (the restriction of X to the interval \(\mathbb {T}_i\)) has a monotonous trend and each two adjacent time intervals \(\mathbb {T}_i, \mathbb {T}_{i+1}\) have a common time point. Each \(\mathbb {T}_i\) is the largest interval that is evaluated using the same evaluative expression \(\mathcal {A}\). For example, it is the largest interval in which the trend is stagnating (sharply increasing/decreasing, etc.), while the interval \(\mathbb {T}_{i+1}\) has a different slope.

Finding Perceptually Important Points

Finding perceptually important points is another task successfully solved using our methods. According to [7], these are points where the time series essentially changes its course. In that paper, however, the authors have in mind isolated points of the time series. The character of a time series can be quite complicated, with various frequencies and noise present; therefore, we cannot expect a perceptually important point to be a single isolated time point. It is better understood as an area that cannot be precisely determined. Therefore, we suggest a method based on the higher-degree F-transform, because it makes it possible to estimate the first and second derivatives (5) and (6) of a function with a complicated course in a vaguely specified area. The perceptually important points can be recognized in areas \(A_k\in \mathcal {A}_h\), k ∈ {0, …, n}, in which the estimate of the slope in (2) is close to zero, i.e., \(\beta ^1_k\approx 0\). The method is explained in more detail in [29].
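The core of this idea, flagging areas whose F1-slope is close to zero, can be sketched as follows. The helper name `important_areas` and the threshold `eps` are our illustrative assumptions; the published method [29] is more elaborate:

```python
import numpy as np

def important_areas(x, y, nodes, h, eps=0.05):
    """Flag areas A_k where the F^1 slope estimate beta1_k is close to zero,
    i.e. candidate perceptually important areas (local turns or flat spots)."""
    flagged = []
    for k, c in enumerate(nodes):
        w = np.maximum(0.0, 1.0 - np.abs(x - c) / h)  # triangular weights
        d = x - c
        beta1 = np.sum(w * y * d) / np.sum(w * d * d)  # slope estimate, cf. (5)
        if abs(beta1) < eps:
            flagged.append(k)
    return flagged
```

On a sine wave, the flagged areas are those around the extrema, where the derivative vanishes.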

Recently, a new promising method for finding perceptually important points in time series has been presented in [38]. It is based on construction of a special Laplacian with kernels producing fuzzy partition used in fuzzy transform. The method can further be used to register similar time series or in a new algorithm for their forecasting.

Structural Breaks

Structural breaks are sudden, considerable changes in the ordinary course of the time series X. In statistics, there are many methods suggested to solve this task [4, 5, 40].

In [28], we suggested a method for their detection which is similar to finding intervals of monotonous behavior described above. We check the slope of the time series within two subsequent intervals determined by two adjacent fuzzy sets \(A_i, A_{i+1}\in \mathcal {A}_h\) for a particular fuzzy partition \(\mathcal {A}_h\) with a shorter h (in practice, we set h ∈{4, …, 7}). The main difference lies in searching for intervals \(\mathbb {T}_i\) in which the slope of \(X|\mathbb {T}_i\) is largely or hugely increasing/decreasing, while the slope of \(X|\mathbb {T}_{i+1}\) in the adjacent interval \(\mathbb {T}_{i+1}\) is much less increasing/decreasing or even stagnating. Examples demonstrating our method in the identification of structural breaks are in [44] (this volume). Let us remark that we have also developed a method for detection of structural breaks in time series volatility.
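A rough sketch of this comparison of slopes in adjacent areas is given below. The numeric thresholds `big` and `small` stand in for the evaluative expressions ("hugely increasing," "stagnating"); they and the helper name are our assumptions, not the FNL model used in [28]:

```python
import numpy as np

def structural_breaks(x, y, nodes, h, big=1.0, small=0.2):
    """Flag nodes k+1 where the F^1 slope drops from 'large' in area A_k
    to 'much smaller' in the adjacent area A_{k+1} (illustrative sketch)."""
    def slope(c):
        w = np.maximum(0.0, 1.0 - np.abs(x - c) / h)   # triangular weights
        d = x - c
        return np.sum(w * y * d) / np.sum(w * d * d)   # slope estimate (5)

    slopes = [slope(c) for c in nodes]
    breaks = []
    for k in range(len(nodes) - 1):
        if abs(slopes[k]) >= big and abs(slopes[k + 1]) <= small:
            breaks.append(k + 1)
    return breaks
```

On a series that rises steeply and then suddenly flattens, the flagged node marks the area just after the break.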

Other Applications

On the basis of our theories, we also developed the following methods:

  1. (a)

    Detection of “bull and bear” phases of financial time series—see [21].

  2. (b)

    Measures of similarity between time series — see [12, 31]. We suggested two indexes that measure similarity (and, potentially, dependence) between two time series. Both indexes are based on the F-transform and give convincing results.

  3. (c)

    Automatic summarization of knowledge about one or several time series. This task is addressed by various authors (see, e.g., [3, 7, 9, 10, 13, 41]). The theory of fuzzy natural logic contains a sophisticated formal theory of intermediate quantifiers, which are expressions of natural language such as “many, almost all, most, a few,” etc. Using them, it is possible to derive statements that provide summarization of knowledge about time series. A typical example of such a summarizing statement is

    The trend during the past three years in almost all tracked time series is clearly increasing.

    The theory also enables humanlike syllogistic reasoning on the basis of the formal model of generalized Aristotle’s syllogisms. For more details, see [14, 26].

  4. (d)

    It is well known that there are many methods for time series forecasting, but none of them outperforms all the others. The reason is that each method is well suited to time series having specific features; when these features are absent, the given method fails. This suggests the idea of forming a linear combination of several forecasting methods using weights that express a certain degree of successfulness of each method. However, it is difficult to set the weights. Our idea is to find a linguistic description (7) based on specific features of the time series, for example, trend, seasonality, stationarity, and others. The linguistic description is learned using a method for mining linguistic associations. Our approach is described in [43], where experimental justification is also provided.
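The similarity indexes mentioned in item (b) above can be roughly sketched as follows. Correlating the degree-0 F-transform component vectors of two series is our illustrative stand-in, inspired by, but not identical to, the indexes of [12, 31]:

```python
import numpy as np

def ft_similarity(y1, y2, n=10):
    """Illustrative F-transform-based similarity index: correlate the
    degree-0 component vectors of two equally long series."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    x = np.arange(y1.size, dtype=float)
    nodes = np.linspace(x[0], x[-1], n + 1)
    h = (x[-1] - x[0]) / n

    def comps(y):
        # degree-0 components: triangular-weighted local averages
        out = []
        for c in nodes:
            w = np.maximum(0.0, 1.0 - np.abs(x - c) / h)
            out.append(np.sum(w * y) / np.sum(w))
        return np.array(out)

    return float(np.corrcoef(comps(y1), comps(y2))[0, 1])
```

Because the components smooth out high frequencies and noise, the index compares the coarse shapes of the two series rather than their pointwise values.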

All our methods are robust and very fast because of the linear time complexity of the F-transform.

7 Conclusion

In this paper, we gave an overview of a few nonstatistical methods for analyzing and forecasting time series and for mining information from them. The theoretical background of our methods is the theory of the fuzzy transform and the theory of fuzzy natural logic. The former enables us to estimate the trend or trend-cycle with high fidelity and to reduce noise. Moreover, the F-transform also provides an analytic form of the latter. Using selected methods of FNL, we can accompany these results by an explanation in natural language.

Moreover, a combination of the latter and the F-transform provides a forecast of the time series. Further applications of our methods lie in the area of mining information from time series. They include finding intervals of monotonous behavior together with their linguistic evaluation, detection of perceptually important points and structural breaks, summarization, measurement of similarity, and other applications.