
1 Introduction

The construction of artificial neural network (ANN) structures suffers from several deficiencies: local minima, the lack of efficient constructive methods, and slow convergence. In response, researchers introduced the wavelet neural network (WNN), a class of neural networks that incorporates the wavelet transform. The WNN was presented by Zhang and Benveniste and is used to approximate complex functions with a high rate of convergence [1]. This model has recently attracted extensive attention for its ability to effectively identify nonlinear dynamic systems from incomplete information [15]. A satisfying WNN performance depends on an appropriate determination of the WNN structure, and many methods have been proposed to optimize the WNN parameters, such as least-squares training, which can be adapted to the case where outliers are present. These training methods reduce a cost function and improve the approximation quality of the wavelet neural network.

On the other hand, the WNN has mostly been applied to problems of small dimension [6], because the complexity of the network structure increases exponentially with the input dimension. The WNN structure has been studied by several researchers, and considerable research effort has been devoted to this problem over the last decades [69]. The number of wavelet functions in the hidden layer grows with the dimension, so building and storing a WNN of large dimension is prohibitively expensive. Many methods have been proposed to reduce the size of the wavelet neural network for large-dimensional tasks. For example, the magnitude approach eliminates wavelet functions with small coefficients [3]. The Matching Pursuit (MP) method, first introduced by Mallat, determines a good wavelet basis within a dictionary [10]. Following this line of work, the Residual Based Regressor Selection (RBRS) algorithm was proposed for the synthesis of WNNs, and the Stepwise Regressor Selection by Orthogonalization (SRSO) and Orthogonal Least Squares (OLS) methods are both popular approaches [8]. These methods reduce the complexity of the selected subset models, so that the number of regressors is smaller than the number of inputs [27, 28]. In addition, Bellil et al. proposed a new initialization that selects wavelets from a library during WNN training; this approach approximates functions with a small number of inputs [22].

In this study, we use the Least Trimmed Squares (LTS) method to select a small subset of wavelet candidates from a Multi-Library Wavelet Neural Network (MLWNN) to build the WNN structure, and from it a method to classify a dataset of DNA sequences. This method is designed to handle the large number of inputs of the DNA sequences. The Beta wavelet function is used to build the WNN; its adjustable parameters make WNN training very efficient.

Various approaches have been used for clustering DNA sequences, among them the WNN, which is applied to construct classification systems. Wu et al. used an artificial neural network to classify DNA sequences [11]. Jach and Marín proposed a method to classify mitochondrial DNA sequences that combines a WNN with a self-organizing map, the feature vectors of the sequences being constructed with the WNN [12]. Yang et al. used wavelet packet analysis to extract features of DNA sequences, which are then applied to recognize the types of other sequences [13]. Wu et al. applied a three-layer, feed-forward neural network trained with the back-propagation algorithm to classify nucleic acid sequences [14]. Since a DNA sequence can be converted into a sequence of digital signals, the feature vector can be built in the time or the frequency domain. However, most traditional methods, such as k-tuple and DMK, build their feature vectors only in the time domain, i.e., they use direct word sequences.

The rest of this paper is organized as follows: in Sect. 2, we present our proposed approach. Section 3 presents the wavelet theory used to construct the WNN of our method. Section 4 shows the simulation results of our approach and Sect. 5 ends with a conclusion.

2 Methods

This paper presents a new approach based on the wavelet neural network, which is constructed using a Multi-Library Wavelet Neural Network (MLWNN). The WNN structure selection is solved with the LTS method. Our approach is divided into two stages: approximation of the input signal of each sequence, and clustering of the features extracted from the DNA sequences using the WNN and the k-means algorithm.

2.1 Conversion of the DNA Sequence into a Genomic Signal

Species are classified using their DNA sequences, which are composed of the four basic nucleotides A (adenine), G (guanine), C (cytosine) and T (thymine); each organism is identified by its DNA sequence [21, 22]. A feature vector represents the DNA sequence and is used to classify the species [23]. In this study, this vector is coded in a digital format in which, for the DNA signal spectrum, a 1 or a 0 indicates the presence or absence of a specific nucleotide at each position of the DNA sequence [24, 25]. For example, if x[n] = [A A A T…], we obtain: x[n] = [1000 1000 1000 0001…]

The indicator sequence can then be manipulated with mathematical methods. The sequence of complex numbers f[k] (1) is obtained by applying the discrete Fourier transform to the indicator sequence \( x_{e}[n] \):

$$ f[k] = \sum\limits_{n = 0}^{N - 1} x_{e}[n]\, e^{-j2\pi kn/N}, \quad k = 0, 1, 2, \ldots, N - 1 $$
(1)

The power spectrum \( S_{e}[k] \) (2), for frequencies k = 0, 1, 2, …, N−1, is defined as

$$ S_{e}[k] = \left| f[k] \right|^{2} $$
(2)
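
As a concrete illustration of this conversion, the following sketch (our own illustration, not the authors' code; all function names are ours) builds the binary indicator sequence for one nucleotide and computes its power spectrum with the FFT:

```python
import numpy as np

def indicator(seq, nucleotide):
    """Binary indicator sequence: 1 where `nucleotide` occurs in seq, else 0."""
    return np.array([1.0 if s == nucleotide else 0.0 for s in seq])

def power_spectrum(x):
    """S_e[k] = |f[k]|^2, where f[k] is the DFT of the indicator (Eqs. 1-2)."""
    return np.abs(np.fft.fft(x)) ** 2

seq = "AAAT"                 # toy sequence from the example in the text
x_a = indicator(seq, "A")    # -> [1., 1., 1., 0.]
s_e = power_spectrum(x_a)    # spectrum over frequencies k = 0, ..., N-1
```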

2.2 Wavelet Neural Network

The wavelet neural network combines the wavelet transform and artificial neural networks [33, 34]. It is composed of three layers; each neuron is connected to every neuron of the following layer, and the weighted outputs are summed. The WNN (Fig. 1) is defined by weighting a set of wavelets, dilated and translated from one candidate mother wavelet, so as to approximate a given signal f. The response of the WNN is:

$$ \hat{y} = \sum\limits_{i = 1}^{{N_{w} }} {w_{i} } \varPsi \left( {\frac{{x - b_{i} }}{{a_{i} }}} \right) + \sum\limits_{k = 0}^{{N_{i} }} {a_{k} } x_{k} $$
(3)

where (x1, x2,…, xNi) is the input vector, Nw is the number of wavelets and ŷ is the output of the network. The output also includes an affine component in the input variables, with coefficients a_k (k = 0, 1, …, Ni) (Fig. 1). The mother wavelet Ψ(x) is selected from the MLWNN and is parameterized by a dilation (ai), which controls the scale, and a translation (bi), which controls the position. A WNN is used to approximate an unknown function:

$$ y = f\left( x \right) + \varepsilon , $$
(4)
Fig. 1. The three-layer wavelet network

where \( f \) is the regression function and \( \varepsilon \) is the error term.
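
The response (3) can be sketched in code as follows; the Mexican-hat mother wavelet here is only a stand-in for the Beta wavelet of Sect. 3, and all names are ours:

```python
import numpy as np

def mexican_hat(t):
    # Stand-in mother wavelet; the paper itself uses the Beta wavelet (Sect. 3)
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def wnn_response(x, w, a, b, lin, psi=mexican_hat):
    """Eq. (3): weighted sum of dilated/translated wavelets plus an affine
    term; x is a 1-D array of input samples, lin = (a0, a1) the linear part."""
    y = lin[0] + lin[1] * x
    for wi, ai, bi in zip(w, a, b):
        y = y + wi * psi((x - bi) / ai)
    return y
```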

2.3 Multi Library Wavelet Neural Network (MLWNN)

Many methods are used to construct the wavelet neural network. Zhang constructed the wavelet neural network in two stages [2, 3]. First, discretely dilated and translated versions of the mother wavelet function Ψ are used to build the MLWNN:

$$ W = \left\{ \psi_{i} : \psi_{i}(x) = \alpha_{i}\,\psi\left( \frac{x_{k} - b_{i}}{a_{i}} \right),\;\; \alpha_{i} = \left( \sum\limits_{k = 1}^{n} \left[ \psi\left( \frac{x_{k} - b_{i}}{a_{i}} \right) \right]^{2} \right)^{-\frac{1}{2}},\;\; i = 1, \ldots, L \right\}, $$
(5)

where L is the number of wavelets in W and xk is the sampled input. Then the best M wavelets are selected from the library W, based on the training set, to construct the regression:

$$ f_{M}(x) = \hat{y} = \sum\limits_{i \in I} w_{i}\,\psi_{i}(x), $$
(6)

where \( M \le L \) and I is a subset of indices of wavelets from the library.
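
A minimal sketch of the library construction in Eq. (5), assuming a user-chosen grid of dilations and translations (grid and names are our illustration):

```python
import numpy as np

def build_library(xk, psi, scales, shifts):
    """Eq. (5): library W of dilated/translated versions of the mother
    wavelet psi, each normalized to unit energy on the sampled input xk."""
    library = []
    for a in scales:
        for b in shifts:
            col = psi((xk - b) / a)
            energy = np.sum(col ** 2)
            if energy > 0:                # alpha_i = energy ** (-1/2)
                library.append((a, b, col / np.sqrt(energy)))
    return library
```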

Second, the following cost function is minimized:

$$ j(I) = \mathop{\min}\limits_{w_{i},\, i \in I} \frac{1}{n}\sum\limits_{k = 1}^{n} \left( y_{k} - \sum\limits_{i \in I} w_{i}\,\psi_{i}(x_{k}) \right)^{2}, $$
(7)

Zhang used stepwise selection by orthogonalization to select the appropriate wavelets for the hidden units, and backward elimination to choose the number of hidden units, i.e. the number of wavelets M, selected as the minimizer of the so-called Akaike final prediction error (FPE) criterion [2, 3]:

$$ j_{FPE}(\hat{f}) = \frac{1 + n_{pa}/n}{1 - n_{pa}/n}\,\frac{1}{2n}\sum\limits_{k = 1}^{n} \left( \hat{f}(x_{k}) - y_{k} \right)^{2} $$
(8)

where \( n_{pa} \) is the number of parameters of the estimator.
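
Criterion (8) transcribes directly to code (a sketch, assuming \( \hat{f}(x_k) \) are the fitted values):

```python
import numpy as np

def fpe(y, y_hat, n_pa):
    """Akaike's final prediction error criterion of Eq. (8)."""
    n = len(y)
    penalty = (1.0 + n_pa / n) / (1.0 - n_pa / n)
    return penalty * np.sum((np.asarray(y_hat) - np.asarray(y)) ** 2) / (2.0 * n)
```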

Gradient algorithms, such as least mean squares, are used to train the WNN by reducing the mean squared error:

$$ j(w) = \frac{1}{n}\sum\limits_{i = 1}^{n} \left( y_{i} - \hat{y}_{i}(w) \right)^{2}, $$
(9)

where \( \hat{y}(w) \) is the output of the wavelet neural network. Thanks to the time-frequency localization property of wavelets, a candidate library \( W \) of wavelet bases can be constructed for a given signal \( f \).

2.4 Wavelet Network Construction Using the LTS Method

The set of training data \( T_{N} = \left\{ (x_{k}, f(x_{k})) \right\}_{k = 1}^{N} \) is used to adjust the weights and the WNN parameters; the output of the three-layer WNN in Fig. 2 is given by (3). Model selection chooses the wavelet candidates from the Multi-Library Wavelet Neural Network (MLWNN); these mother wavelets are used to construct the wavelet neural network structure. In this study, the Least Trimmed Squares (LTS) estimator is proposed to select a small subset of wavelet candidates from the MLWNN. These candidates constitute the hidden layer of the WNN [35]. Furthermore, a gradient algorithm is used to optimize the wavelet neural network parameters. The residual (or error) ei at the output of the WNN for the ith example is defined by:

$$ e_{i} = y_{i} - \hat{y}_{i}, \quad i = 1, \ldots, n $$
(10)
Fig. 2. Proposed approach

The Least Trimmed Squares estimator is used to select the WNN weights that minimize the total sum of trimmed squared errors:

$$ E_{total} = \frac{1}{2}\sum\limits_{k = 1}^{p} \sum\limits_{i = 1}^{l} e_{ik}^{2} $$
(11)
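
The trimming idea behind LTS can be sketched as follows: square the residuals, sort them, and keep only the h smallest when scoring a candidate wavelet, so that outlying examples do not dominate the selection. The trimming count h is our assumption; the paper does not state its value:

```python
import numpy as np

def lts_cost(residuals, h):
    """Least Trimmed Squares cost: sum of the h smallest squared residuals."""
    r2 = np.sort(np.asarray(residuals) ** 2)
    return float(np.sum(r2[:h]))
```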

The gradient algorithm used to optimize the parameters (ai, bi, wi) of the WNN relies on the following partial derivatives:

$$ \frac{\partial c}{\partial \hat{y}} = e_{k} $$
(12)
$$ \frac{\partial c}{\partial w_{i}} = e_{k}\,\psi\left( \frac{x_{k} - b_{i}}{a_{i}} \right) $$
(13)
$$ \frac{\partial c}{\partial b_{i}} = -e_{k} w_{i} \frac{1}{a_{i}}\,\psi'\left( \frac{x_{k} - b_{i}}{a_{i}} \right) $$
(14)
$$ \frac{\partial c}{\partial a_{i}} = -e_{k} w_{i} \left( \frac{x_{k} - b_{i}}{a_{i}^{2}} \right)\psi'\left( \frac{x_{k} - b_{i}}{a_{i}} \right) $$
(15)

where

$$ e_{k} = \hat{y}(x_{k}) - f(x_{k}) $$
(16)
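
A single gradient-descent step following Eqs. (13)-(15) might look as follows; `dpsi` is the derivative ψ′ of the mother wavelet and `lr` a learning rate (our notation, a sketch rather than the authors' implementation):

```python
def gradient_step(xk, ek, w, a, b, psi, dpsi, lr):
    """One gradient-descent update of (w_i, b_i, a_i) per Eqs. (13)-(15),
    for one example xk with residual ek = y_hat(xk) - f(xk) as in Eq. (16)."""
    for i in range(len(w)):
        t = (xk - b[i]) / a[i]
        wi_old = w[i]
        w[i] -= lr * ek * psi(t)                         # Eq. (13)
        b[i] -= lr * (-ek * wi_old * dpsi(t) / a[i])     # Eq. (14)
        a[i] -= lr * (-ek * wi_old * t * dpsi(t) / a[i]) # Eq. (15)
```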

2.5 Approximation of DNA Sequence Signal

The classification of DNA sequences is an NP-complete problem: beyond the alignment of two DNA sequences, the problem rapidly becomes very complex because the space of possible alignments becomes very large. Recent advances in sequencing technology have produced a considerable number of DNA sequences to be analyzed, the aim being to organize the sequences into homogeneous groups according to a criterion to be determined. In this paper, the power spectrum is used to process the signal of each DNA sequence. These signals are used by the wavelet neural network (WNN) to extract the signatures of the DNA sequences, which are then used to match a test DNA sequence with all the sequences in the training set [29]. Initially, the signatures developed by the 1-D wavelet network during the learning stage give the wavelet coefficients used to match the test DNA sequences against the training set. Then, the test DNA sequence is projected onto the wavelet neural networks of the learning DNA sequences and the coefficients specific to this sequence are computed. Finally, the coefficients of the learning DNA sequences are compared to the coefficients of the test DNA sequence by computing the correlation coefficient. In this stage, k-means clustering is used to classify the signatures of the DNA sequences [27].
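
This matching step can be sketched as a correlation between the coefficient vector of the test sequence and each training signature (a minimal illustration; names are ours):

```python
import numpy as np

def best_match(test_coeffs, train_signatures):
    """Index of the training signature whose wavelet coefficients correlate
    most strongly with those of the test DNA sequence."""
    scores = [np.corrcoef(test_coeffs, s)[0, 1] for s in train_signatures]
    return int(np.argmax(scores))
```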

2.6 Learning Wavelet Network Using Gradient Algorithm and LTS Methods

In this section, we show how the wavelet library is used to learn a wavelet neural network [15, 16, 26, 27].

  • Learning approach

    • Step 1: The dataset of DNA sequences is divided into two groups, a training and a testing dataset, used respectively to train and test the wavelet neural network.

    • Step 2: Conversion of each DNA sequence into a genomic signal using the binary indicator and power spectrum signal processing.

    • Step 3: Discretely dilated and translated versions of the mother wavelet are used to construct the library W from the training data. The Least Trimmed Squares (LTS) algorithm (10), (11) is applied to select the optimal mother wavelet functions and to choose, from the library, the N wavelet candidates that best match the output vector.

      • Step 3.1: Initialize the mother wavelet function library.

      • Step 3.2: Randomly initialize wjk and vij.

      • Step 3.3: For k = 1, …, m:

        (a) Calculate the predicted output \( \hat{y}_{i} \) via (3).

        (b) Compute the residuals \( e_{ik} = y_{i} - \hat{y}_{i} \) via (10).

        (c) If the stopping criterion is satisfied, stop; otherwise, go to the next step.

        (d) Sort the squared residuals \( e_{ik}^{2} \le \ldots \le e_{im}^{2} \) and choose the N best mother wavelet functions to initialize the WNN.

    • Step 4: The values of \( w_{ij}^{opt}, a_{i}^{opt} \) and \( b_{i}^{opt} \) are computed using the gradient algorithm via (13)-(16); then go to Step 3.3.

  • Clustering using K-means (a code sketch follows this list)

    • Step 1: Generate a matrix M_signature of DNA sequence signatures \( \left( w_{ij}^{opt}, a_{i}^{opt}, b_{i}^{opt} \right) \).

    • Step 2: Let the rows of M_signature, \( s_{i} = \left\{ w_{ij}^{opt},\; a_{i}^{opt},\; b_{i}^{opt} \right\} \), be the set of data points and V = {v1, v2,…, vc} be the set of cluster centers.

    • Step 3: Choose the number of groups, k.

    • Step 4: Take k training instances of DNA signatures as initial single-element groups.

    • Step 5: Assign each of the remaining (n−k) training instances of DNA signatures to the group with the nearest centroid, and recalculate the centroid of the winning group.

    • Step 6: Compute the distance of each DNA signature from the centroid of each group; move the instance if it is not in the nearest group, and update the centroids of the winning and losing groups.

    • Step 7: Repeat Step 6 until convergence is achieved.
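
A minimal sketch of this clustering stage, assuming Euclidean distance between signature vectors (names and defaults are ours):

```python
import numpy as np

def kmeans_signatures(S, k, iters=100, seed=0):
    """Steps 3-7 above for the signature matrix S (one row per DNA sequence)."""
    rng = np.random.default_rng(seed)
    centers = S[rng.choice(len(S), size=k, replace=False)]      # Step 4
    for _ in range(iters):
        d = np.linalg.norm(S[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                               # Steps 5-6
        new_centers = np.array(
            [S[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
             for j in range(k)])
        if np.allclose(new_centers, centers):                   # Step 7
            break
        centers = new_centers
    return labels, centers
```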

3 The Beta Wavelet Family

The Beta function is defined as β(x) = βx0, x1, p, q(x) [21, 22, 32, 33], where x0 and x1 are real parameters:

$$ \beta(x, p, q, x_{0}, x_{1}) = \left\{ \begin{array}{ll} \left( \frac{x - x_{0}}{x_{c} - x_{0}} \right)^{p} \left( \frac{x_{1} - x}{x_{1} - x_{c}} \right)^{q} & \text{if } x \in \left[ x_{0}, x_{1} \right] \\ 0 & \text{otherwise} \end{array} \right. $$
(17)

The derivatives of this function belong to \( L^{2}(\mathbb{R}) \) and are of class \( C^{\infty} \). The general form of the nth derivative of the Beta function is:

$$ \begin{aligned} \psi_{n}(x) = \frac{d^{n}\beta(x)}{dx^{n}} = {} & \left[ (-1)^{n}\frac{n!\,p}{(x - x_{0})^{n + 1}} + \frac{n!\,q}{(x_{1} - x)^{n + 1}} \right]\beta(x) + P_{n}(x)P_{1}(x)\beta(x) \\ & + \sum\limits_{i = 1}^{n} C_{n}^{i}\left[ (-1)^{n}\frac{(n - i)!\,p}{(x - x_{0})^{n + 1 - i}} + \frac{(n - i)!\,q}{(x_{1} - x)^{n + 1 - i}} \right] P_{1}(x)\beta(x) \end{aligned} $$
(18)

where:

$$ P_{1}(x) = \frac{p}{x - x_{0}} - \frac{q}{x_{1} - x} $$
(19)
$$ P_{n}(x) = (-1)^{n}\frac{n!\,p}{(x - x_{0})^{n + 1}} - \frac{n!\,q}{(x_{1} - x)^{n + 1}} $$
(20)
$$ x_{c} = \frac{{(px_{1} + qx_{0} )}}{(p + q)} $$
(21)
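
For illustration, the Beta function (17) and a first-derivative Beta wavelet can be sketched as follows; the default parameter values and the finite-difference derivative are our choices, not the paper's closed form (18):

```python
import numpy as np

def beta(x, p=2.0, q=2.0, x0=-1.0, x1=1.0):
    """Beta function of Eq. (17); parameter values here are illustrative."""
    x = np.asarray(x, dtype=float)
    xc = (p * x1 + q * x0) / (p + q)                # Eq. (21)
    y = np.zeros_like(x)
    m = (x > x0) & (x < x1)
    y[m] = ((x[m] - x0) / (xc - x0)) ** p * ((x1 - x[m]) / (x1 - xc)) ** q
    return y

def beta_wavelet_1(x, h=1e-5, **params):
    """First-derivative Beta wavelet, approximated by central differences
    rather than the closed form of Eq. (18)."""
    x = np.asarray(x, dtype=float)
    return (beta(x + h, **params) - beta(x - h, **params)) / (2 * h)
```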

4 Results and Discussion

This paper uses three datasets, HOG100, HOG200, and HOG300, selected from microbial organisms [23]. Different experiments are used to evaluate the performance of our approach. The dataset of DNA sequences is divided into test and training data. Published empirical and synthetic datasets were selected to perform the comparative clustering analysis [23] (Table 1).

Table 1. Distribution of available data into training and testing set of DNA sequence
Table 2. Selected mother wavelets and normalized square root of mean square error using residual based regressor selection (RBRS)
Table 3. Selected mother wavelets and normalized square root of mean square error using stepwise regressor selection by orthogonalization (SRSO)

4.1 Selecting the Mother Wavelet

The NSRMSE and the training time serve to evaluate the performance of our method. The LTS gives a better performance in selecting the wavelet candidates from the Multi-Library Wavelet Neural Network (MLWNN). During the approximation phase, our approach decomposes the input signal of every DNA sequence and then reconstructs it; the performance of this phase is measured by the NSRMSE. Table 4 shows that the obtained NSRMSE values are low (0.000869) and that the run time increases with the size of the DNA sequence: the training time depends on the length of the sequence and reaches 12.213 s when the size equals 700. The complexity of the WNN structure increases exponentially with the input dimension; to handle the approximation problem we applied the wavelet library, which incorporates a wavelet family and is called the Multi-Library Wavelet Neural Network (MLWNN) model (Tables 2 and 3).
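
The paper does not define the NSRMSE explicitly; a common choice for approximation problems, assumed in the sketch below, normalizes the root mean square error by the standard deviation of the target signal:

```python
import numpy as np

def nsrmse(y, y_hat):
    """Normalized square root of the mean square error, assuming the common
    normalization by the standard deviation of the target signal."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    return np.sqrt(np.mean((y - y_hat) ** 2)) / np.std(y)
```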

Table 4. Selected mother wavelets and normalized square root of mean square error using the least trimmed squares (LTS).

4.2 Classification Results

Experiments were performed to prove the effectiveness of our proposed approach. The evaluation metrics Precision, Recall and F-measure are used to compare our approach with other competitive methods. The F-measure combines the precision and the recall metrics: for each given class, we calculate the recall and the precision of each cluster. More specifically, for cluster j and class i, the F-measure is given by

$$ F(j,i) = \frac{2 \times Recall(i,j) \times Precision(i,j)}{Precision(i,j) + Recall(i,j)} $$
(22)
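
Eq. (22) in code form, with recall and precision computed from the contingency counts of class i and cluster j (a standard construction; names are ours):

```python
def f_measure(n_ij, n_i, n_j):
    """Eq. (22) for class i and cluster j: n_ij is the number of sequences of
    class i inside cluster j, n_i the class size, n_j the cluster size."""
    recall = n_ij / n_i
    precision = n_ij / n_j
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * recall * precision / (precision + recall)
```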

Table 5 shows that WNN-LTS (our method) outperforms the other models (WFV, K-tuple and DMK) in terms of classification results and optimal settings. The number of classes obtained by our approach is slightly smaller than with the other methods. The F-score (F-measure) confirms the efficiency of our method: it increases when the WNN is combined with the LTS method, which is applied to optimize the WNN structure.

Table 5. Classification results of WNN-LTS (our method) and other models (WFV, K-tuple, DMK) on different datasets of DNA sequences

4.3 Running Time

Tables 5 and 6 show that the WNN can produce very good prediction accuracy. The results of our WNN-LTS approach on the datasets show that its accuracy outperforms the other techniques in terms of the percentage of correct species identifications. Tables 5 and 6 also show the distribution of good classifications by class, as well as the global classification rate over all the DNA sequences of the validation phase. WNN-LTS (our approach) is faster than the other methods; this speed is due to the use of the Least Trimmed Squares (LTS) algorithm.

Table 6. Running time in seconds of each method on all datasets.

5 Conclusions

In this study, we have used the LTS method to select a subset of wavelet functions from the Multi-Library Wavelet Neural Network model. This subset is applied to build the wavelet neural network (WNN), which is used to approximate the function f(x) of a DNA sequence signal. First, binary codification and the power spectrum are used to process the DNA sequence signal. Second, the wavelet library is constructed and the LTS method is used to select the best wavelets from the library; these wavelets are applied to construct the WNN. Third, k-means clustering is used to group similar DNA sequences according to given criteria. This clustering aims at distributing DNA sequences, characterized by p variables X1, X2, …, Xp, into a number m of subgroups that are as homogeneous as possible while every group is well differentiated from the others. The proposed approach helps to classify the DNA sequences of organisms into classes, and these clusters can be used to extract significant biological knowledge.