Introduction

The electroencephalogram (EEG) is a non-invasive recording of the brain's electric potentials, measured from the human scalp via a set of electrodes. Because of the complex character of the EEG, research is needed to better understand the mechanisms underlying epileptic disorders. EEG researchers have long been aware that EEG activity contains a wide variety of frequency components, which may be associated with different physiological processes. The range of clinical and physiological interest, however, lies between 0.5 and 30 Hz. This range is divided approximately into the following frequency bands: δ (0.5–4 Hz), θ (4–8 Hz), α (8–13 Hz), β (13–30 Hz) [1].
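As a brief illustration of how these bands are used in practice, the following Python sketch (not part of the original study; the sampling rate fs is an assumption and must match the actual recording) estimates the power in each clinical band from a single EEG segment using Welch's method.

```python
# Illustrative sketch: power in the standard clinical EEG bands via Welch's method.
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(eeg, fs=173.61):
    """Return absolute power in each clinical band for a 1-D EEG segment."""
    freqs, psd = welch(eeg, fs=fs, nperseg=min(len(eeg), 1024))
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        powers[name] = np.trapz(psd[mask], freqs[mask])  # integrate PSD over the band
    return powers

# Example with a synthetic, alpha-dominant (10 Hz) signal
t = np.arange(0, 4, 1 / 173.61)
x = np.sin(2 * np.pi * 10 * t) + 0.2 * np.random.randn(len(t))
print(band_powers(x))
```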

The epileptiform EEG contains abnormal electrical activity generated during an epileptic seizure, superimposed on the ongoing background EEG. The detection of epileptiform discharges is of particular interest for determining the regions of the brain involved in the seizure activity, especially at seizure onset. Much work has also been performed on analyzing the structure of the normal background EEG in healthy subjects, in particular on extracting a measure of 'vigilance' from the background wake EEG.

Since time-domain analysis of the EEG signal is insufficient, frequency-domain analysis and various signal processing and soft computing techniques are generally used to extract the diagnostic information. For example, Alkan et al. [1] used logistic regression and an artificial neural network for epileptic seizure detection. They applied three different spectral analysis methods as pre-processing to obtain features of the EEG signals, which were then used to train and test the artificial neural network. Subasi et al. [3] used wavelet neural networks [2] together with parametric and subspace methods for epileptic seizure detection.

In recent times, the independent component analysis (ICA) method has been introduced into the field of bio-signal processing as a promising technique for separating independent sources [4–7]. The ICA method can process raw EEG data and find features related to various types of brain activity. ICA therefore overcomes the problems associated with ensemble averaging and allows the waveforms of the EEG data to be examined directly. Research results have been reported on applying ICA to EEG and magnetoencephalogram (MEG) signals. For example, Jung et al. [5] applied ICA to remove electrooculogram (EOG) artifacts from EEG data, Ikeda et al. [6] applied ICA to remove environmental noise from the signals, and Tang et al. [7] applied ICA to dipole estimation using MEG data. In the field of EEG and MEG research, the main applications of ICA are noise reduction and dipole estimation (Fig. 1).

Fig. 1 Block diagram of the proposed classification system

Hence, little research has been undertaken on applying the ICA method to extract the desired EEG signals related to motion and emotion. ICA performs a blind separation of statistically independent sources using higher-order statistics. Several different implementations of ICA can be found in the literature; see Comon [8], Bell and Sejnowski [9], Makeig et al. [10] and Leal et al. [11].

In this study we used the fast independent component analysis (FastICA) algorithm as a projection pursuit method, because of its ease of implementation and speed of operation. The outputs of the FastICA algorithm are used as inputs to a multilayer perceptron neural network (MLPNN) to classify the subjects as epileptic or not. The results of the classification system are superior to those of the studies given in references [1, 2].

Materials and methods

In our application, a detailed classification between a set of healthy subjects and a set of epileptic subjects was performed. The correct classification rates and convergence rates of the proposed MLPNN model, which uses FastICA as a preprocessor, were examined and the performance of the model is reported.

Finally, some conclusions are drawn concerning the use of the MLPNN for classification of EEG signals. Sample epileptic and non-epileptic EEG signals are given in Fig. 2; the details of the data can be found in reference [12]. We performed all of the methods and computations described in this study on an IBM PC with a 3 GHz Pentium IV processor, 1 GB of memory, and the Windows XP Professional operating system.

Fig. 2 Sample EEG recordings. (a) Epileptic. (b) Normal

Independent component analysis

ICA is a data analysis method that has gained interest in recent years and has been applied to several data analysis problems. ICA was originally developed for the cocktail party problem, which involves separating the individual signals from the mixture that results when two or more speakers talk simultaneously. With the increased attention paid to ICA in recent years, several further application examples have been reported.

Definition of independent component analysis

Mixture signals consist of independent original signals that overlap under different conditions. ICA solves the problem of decomposing the observed mixture signals into independent signals. Assume that we observe an M-dimensional observation vector expressed as

$$x(t) = \left[ \begin{array}{c} x_{1}(t) \\ x_{2}(t) \\ \vdots \\ x_{M}(t) \end{array} \right]$$
(1)

and an N-dimensional original source vector s(t) that can be written as

$$s(t) = \left[ \begin{array}{c} s_{1}(t) \\ s_{2}(t) \\ \vdots \\ s_{N}(t) \end{array} \right]$$
(2)

Without loss of generality, we can assume that both the mixture variables and the independent components have zero mean: if this is not true, the observable variables x(t) can always be centered by subtracting the sample mean, which makes the model zero-mean [15]. Using this vector-matrix notation, the mixing model relating x(t) and s(t) is written as

$$x(t) = a_{1} s_{1} + a_{2} s_{2} + \cdots + a_{N} s_{N} = A s(t)$$
(3)

It is implicit that there exists a linear relationship between the signals x and s, so the mixing matrix A is given by A = [a_1, a_2, a_3, ..., a_N].

The aim of ICA is to estimate the mixing matrix A and, consequently, the independent source vector from the observed mixed vector x. This is equivalent to finding a separating matrix W that satisfies

$$\widehat{s} = Wx$$
(4)

where \( \widehat{s} \) is the estimation of s.
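As a minimal numerical illustration of Eqs. (3) and (4) (a hedged sketch under our own assumptions, not the authors' code), two synthetic non-Gaussian sources are mixed by a hypothetical matrix A and then separated with a FastICA implementation; the product of the estimated separating matrix W with A should be close to a scaled permutation matrix, since ICA recovers sources only up to ordering and scaling.

```python
# Minimal sketch of the ICA mixing/unmixing model x = A s, s_hat = W x.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2000)

# Two independent, non-Gaussian sources (N = 2)
s = np.c_[np.sign(np.sin(2 * np.pi * 7 * t)),   # square-like wave
          rng.laplace(size=t.size)]              # heavy-tailed noise

A = np.array([[1.0, 0.6],                        # hypothetical mixing matrix
              [0.4, 1.0]])
x = s @ A.T                                      # observed mixtures, x(t) = A s(t)

ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
s_hat = ica.fit_transform(x)                     # estimated sources
W = ica.components_                              # separating matrix, s_hat ≈ W x

# Recovery is only up to permutation and scaling, so W A ≈ scaled permutation
print(np.round(W @ A, 2))
```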

To find the separating matrix W, the following assumptions are made: (1) the sources are statistically independent; (2) the sources must have non-Gaussian distributions. It should be pointed out here that non-Gaussianity is a requirement for the elements of \( \widehat{s} \), i.e. the source signals or independent components. There are different measures of non-Gaussianity, such as kurtosis and negentropy. Non-Gaussianity is measured here by an approximation of negentropy.

Negentropy is based on the information-theoretic quantity of (differential) entropy. The entropy of a random variable can be interpreted as the degree of information that the observation of the variable gives. The more random, i.e. unpredictable and unstructured, the variable is, the larger its entropy. The differential entropy H of a random variable y with density p(y) is defined as H(y) = −∫p(y) log p(y) dy. A fundamental result of information theory is that a Gaussian variable has the largest entropy among all random variables of equal variance. This means that entropy could be used as a measure of non-Gaussianity. To obtain a measure of non-Gaussianity that is zero for a Gaussian variable and always non-negative, one often uses a slightly modified version of the definition of differential entropy, called negentropy, which is defined as J(y) = H(y_gauss) − H(y), where y_gauss is a Gaussian random variable with the same covariance matrix as y. Due to the above-mentioned properties, negentropy is always non-negative, and it is zero if and only if y has a Gaussian distribution [15].

The negentropy J can be approximated as follows:

$$J(y) \approx k\left[E\{G(y)\} - E\{G(v)\}\right]^{2},$$
(5)

where G is practically any non-quadratic function, k is a positive constant and v is a Gaussian variable of zero mean and unit variance.
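The following small Monte-Carlo sketch (an illustration of Eq. (5) under the assumptions G(y) = log cosh(y) and k = 1, not taken from the paper) shows that the approximated negentropy is close to zero for Gaussian data and clearly positive for a heavy-tailed, super-Gaussian sample.

```python
# Sketch: Monte-Carlo evaluation of the negentropy approximation in Eq. (5)
# with G(y) = log cosh(y) and the constant k taken as 1.
import numpy as np

rng = np.random.default_rng(1)

def negentropy_approx(y, n_gauss=100_000):
    y = (y - y.mean()) / y.std()              # zero mean, unit variance
    v = rng.standard_normal(n_gauss)          # reference Gaussian variable
    G = lambda u: np.log(np.cosh(u))
    return (G(y).mean() - G(v).mean()) ** 2

print(negentropy_approx(rng.standard_normal(100_000)))  # ~0 for Gaussian data
print(negentropy_approx(rng.laplace(size=100_000)))     # > 0 for super-Gaussian data
```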

Therefore, in order to find one independent component, one can maximize the function J_G,

$$J_{G}(w) = \left[E\{G(w^{T}x)\} - E\{G(v)\}\right]^{2}$$
(6)

where w, a vector in the weight matrix W, is constrained so that E{(w^T x)^2} = 1. According to Hyvärinen [13, 14], the function G has the following choices:

$$G_{1}(u) = \frac{1}{a_{1}} \log\cosh(a_{1} u),$$
(7)
$$G_{2}(u) = -\exp(-u^{2}/2),$$
(8)
$$G_{3}(u) = \frac{1}{4} u^{4},$$
(9)

where 1 ≤ a_1 ≤ 2, often a_1 ≈ 1. G_1 is said to be a good general-purpose function among these three choices. When the independent components are highly super-Gaussian, or when robustness is very important, G_2 may be better. G_3 is based on the kurtosis and is justified on statistical grounds only for estimating sub-Gaussian independent components when there are no outliers.
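For reference, here is a hedged sketch of the three contrast functions of Eqs. (7)–(9) together with their first derivatives g(u), which are what fixed-point algorithms such as FastICA actually evaluate; the choice a_1 = 1 is an assumption consistent with the text.

```python
# Sketch of the contrast functions of Eqs. (7)-(9) and their derivatives.
import numpy as np

A1 = 1.0  # 1 <= a_1 <= 2; a_1 = 1 is the usual choice

def G1(u):  # general-purpose contrast
    return np.log(np.cosh(A1 * u)) / A1

def g1(u):  # derivative of G1
    return np.tanh(A1 * u)

def G2(u):  # preferred for highly super-Gaussian sources or when robustness matters
    return -np.exp(-u ** 2 / 2)

def g2(u):  # derivative of G2
    return u * np.exp(-u ** 2 / 2)

def G3(u):  # kurtosis-based, mainly for sub-Gaussian sources without outliers
    return u ** 4 / 4

def g3(u):  # derivative of G3
    return u ** 3
```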

Preprocessing

In data analysis using ICA, it is necessary to preprocess the observation signals; this step is referred to as whitening. Because whitening can be carried out easily compared with the independent component analysis itself, it is a suitable way to reduce the complexity of the problem. Hyvärinen and Oja [15] developed a fast fixed-point algorithm (FastICA). FastICA is a method by which the independent components are extracted one after the other using a measure of non-Gaussianity such as the kurtosis, and it converges very quickly.
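The two steps described in this subsection can be sketched as follows (a minimal illustration under our own assumptions, not the published implementation): the observations are centered and whitened via an eigenvalue decomposition of their covariance matrix, and one independent component is then extracted with the standard FastICA fixed-point update using g(u) = tanh(u).

```python
# Sketch: whitening followed by a one-unit FastICA fixed-point iteration.
import numpy as np

def whiten(x):
    """x: (channels, samples). Returns whitened data with identity covariance."""
    x = x - x.mean(axis=1, keepdims=True)        # centering
    cov = np.cov(x)
    d, E = np.linalg.eigh(cov)                   # eigen-decomposition of the covariance
    V = E @ np.diag(d ** -0.5) @ E.T             # whitening matrix
    return V @ x, V

def fastica_one_unit(z, max_iter=200, tol=1e-6):
    """Extract one independent component from whitened data z."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = w @ z
        # Fixed-point update: w+ = E{z g(w^T z)} - E{g'(w^T z)} w, with g = tanh
        w_new = (z * np.tanh(wx)).mean(axis=1) - (1 - np.tanh(wx) ** 2).mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1) < tol:        # convergence up to sign
            return w_new
        w = w_new
    return w

# Example: recover one component from two mixed Laplacian sources
rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 5000))
x = np.array([[1.0, 0.5], [0.3, 1.0]]) @ s
z, V = whiten(x)
w = fastica_one_unit(z)
```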

Results and conclusions

Normal and epileptic EEG signals were classified using a standard artificial neural network (ANN). The total number of EEG records is 200 (100 epileptic and 100 normal). Half of the data is used for training and the other half for testing the network. Since each record is very long (4,097 samples), we divided each record into two parts, thereby doubling the data set to 400 segments of 2,048 samples each. As a result, we used 200 epileptic and 200 normal EEG signals. To reduce the dimensionality of the signals and enable the classification between epileptic and normal signals, we used ICA as a projection pursuit method. Although there are different ICA algorithms in the literature, we used the FastICA algorithm because of its computational speed.
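The paper does not give implementation details for this step, so the following is only a plausible sketch of the feature-extraction stage: the 400 segments of 2,048 samples are stacked into a data matrix and projected onto 20 FastICA components, giving one 20-dimensional feature vector per segment. The placeholder arrays stand in for the real recordings described in reference [12]; all names and parameters here are our own assumptions.

```python
# Hedged sketch: segmenting records and reducing each segment to 20 FastICA
# components, used as a projection-pursuit style feature extractor.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Placeholder data standing in for the real recordings (100 records per class,
# 4,097 samples each); the genuine data are described in reference [12].
records_epileptic = rng.standard_normal((100, 4097))
records_normal = rng.standard_normal((100, 4097))

def make_segments(records, seg_len=2048):
    """Split each 4,097-sample record into two 2,048-sample segments."""
    return np.vstack([np.stack([r[:seg_len], r[seg_len:2 * seg_len]]) for r in records])

X = np.vstack([make_segments(records_epileptic), make_segments(records_normal)])  # (400, 2048)
y = np.r_[np.ones(200), np.zeros(200)]                                            # 1 = epileptic

ica = FastICA(n_components=20, whiten="unit-variance", random_state=0, max_iter=1000)
features = ica.fit_transform(X)   # (400, 20) feature matrix fed to the MLPNN
```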

The outputs of FastICA are used as inputs to the multilayer perceptron neural network (MLPNN). Since the FastICA output contains 20 components, we used an MLPNN with 20 input neurons and 1 output neuron. The optimum number of hidden layer neurons was determined experimentally to be 29. After 35 iterations the network reached a performance of 9.01214 × 10⁻²⁴. This result can be seen in Fig. 3.
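A hedged sketch of the classifier stage using scikit-learn's MLPClassifier (the original study may well have used a different neural network implementation and training setup): 20 inputs, one hidden layer of 29 neurons, a single output, trained on half of the feature vectors produced by the previous sketch and tested on the other half.

```python
# Sketch of the 20-29-1 MLPNN classifier stage; assumes `features` and `y`
# from the previous sketch. Training details are our own assumptions.
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X_train, X_test, y_train, y_test = train_test_split(
    features, y, test_size=0.5, stratify=y, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(29,), activation="logistic",
                    max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)
print("test accuracy:", mlp.score(X_test, y_test))
```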

Fig. 3 Error level of the ANN versus iteration count

The sensitivity and specificity values can be calculated by using the equations given below.

$$\text{Sensitivity} = \text{TP}/(\text{TP} + \text{FN})$$
(10)
$$\text{Specificity} = \text{TN}/(\text{FP} + \text{TN})$$
(11)
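Equations (10) and (11) can be evaluated directly from the confusion matrix of the test predictions, as in this short sketch that continues the hypothetical pipeline above.

```python
# Sketch: sensitivity and specificity from the test-set confusion matrix,
# following Eqs. (10) and (11); continues the hypothetical pipeline above.
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
sensitivity = tp / (tp + fn)   # Eq. (10)
specificity = tn / (fp + tn)   # Eq. (11)
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```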

After training the MLPNN, we used the test data to evaluate the network. The test results are summarized in Table 1. Using these test results in Eqs. 10 and 11 [1], the sensitivity and specificity values are found to be 98% and 90.5%, respectively. The results of the classification system, especially the sensitivity value, are superior to those of the studies given in references [1, 2]. The testing performance of the neural network diagnostic system that uses the FastICA algorithm as a pre-processor is found to be satisfactory, and we think that this system can be used in clinical studies in the future. Since time-series analysis of EEG signals is unsatisfactory and requires expert clinicians for evaluation, this application brings objectivity to the evaluation of EEG signals and makes it easier to use in clinical applications.

Table 1 Number of epilepsy and normal EEG data after testing by ANN