1 Overview

At birth, the brain contains all of its nerve cells, called neurons; the number of neurons at birth is the most that will ever be present in the brain. Unlike other cells of the body, such as skin or bone cells, nerve cells cannot repair themselves. Therefore, neurons that die with age are not replaced. Parkinson’s disease (PD) is a slowly advancing neurological condition that destroys the dopamine neurons in the substantia nigra pars compacta, which disrupts the communication pathways of the brain. PD usually affects people aged 50 or older and has affected approximately 7–10 million people worldwide to date, according to WHO statistics [1]. In the next 25 years, the number of PD cases is expected to increase owing to the rise in the proportion of elderly people [1]. The cardinal symptoms of PD are tremor, muscle stiffness, bradykinesia (slowness of movement), unstable posture or diminished balance and coordination, and dysphonia (voice disorders). In addition, PD is characterized by non-motor symptoms, including cognitive dysfunction. However, diagnosing PD from clinical symptoms remains difficult when there are no significant motor or non-motor symptoms. On the other hand, detecting PD from a neuroscience viewpoint provides another way to explore the neuronal mechanisms and improve diagnostic findings [2].

Researchers have proposed various noninvasive methods to detect PD using voice, gait, and wearable sensors [3–5]. Chen et al. [59] used a feature reduction method to exclude redundant information from the original PD voice signal and integrated it with a fuzzy classifier for PD diagnosis. They achieved a mean classification accuracy of 96.07% [6]. Zuo et al. [7] presented a study based on particle swarm optimization enhanced with a fuzzy classifier for PD detection and obtained a highest accuracy of 97.47%. Ma et al. [4] combined an extreme learning machine with a subtractive clustering features algorithm to detect PD, which resulted in a best mean accuracy of 99.49%. Daliri [5] used gait information and employed a feature discriminant ratio based on the short-time Fourier transform, achieving a maximum classification accuracy of 91.20%. In recent years, detection of PD using electroencephalogram (EEG) signals has drawn significant attention from numerous researchers, as PD is a complicated neurodegenerative disease and EEG may carry more information about the underlying neural mechanisms [1] than other modalities. It has been extensively shown that EEG analysis, using linear or nonlinear methods, can provide more global indices of brain function, which can reflect the disturbed subcortico-cortical mechanisms in patients with PD. Various researchers have revealed abnormalities in the EEG signals of PD patients by using conventional spectral methods (e.g., fast Fourier transform [FFT]), time–frequency analysis (e.g., wavelets), or nonlinear time series methods (e.g., correlation dimension) [1, 8–12]. Pezard et al. (2001) found a significant decrease in EEG spectral power between the PD patients and normal subjects [10]. Muller et al. [13] revealed a significant difference in the resting-state EEG signals of PD patients compared to normal subjects using correlation dimension. Han et al. [1] found that the EEG signals of PD patients are characterized by higher entropy in the frequency domain. Thus, understanding the neural basis of PD is essential, both from a prognostic perspective and for the development of targeted therapeutic strategies. However, it is still unclear which measures are most useful for revealing important information about brain dysfunction. Furthermore, EEG has been increasingly used to study the cortical integrative functions and their subcortical driving structures. It has also been shown that quantitative analysis of EEG signals can provide a vital biomarker for many neurological disorders such as epilepsy, schizophrenia, and Alzheimer’s disease [14–17].

Over the years, several linear methods have been used to analyze EEG signals and quantify the central nervous system activity in PD patients [1, 10, 13]. However, such methods are not effective at identifying the subtle variations in EEG signals because of their complex, nonlinear, and chaotic nature. Even when frequency-domain methods are used, the accuracy of the spectral information decreases at low signal-to-noise ratio [18]. For this reason, nonlinear algorithms are widely used to unearth the hidden signatures in EEG signals [19]. Moreover, it has been shown that the development of PD is associated with slowing of the EEG, a reduction of its complexity, and the presence of perturbations in EEG synchrony. This essential and hidden information can be evaluated by examining the nonlinear components present in the EEG signals, since they capture momentary variations related to properties such as the similarity, predictability, reliability, and sensitivity of the signal. In recent years, subtle changes in biosignals have been extracted via the widely used higher-order spectra (HOS) method [20–23]. HOS-based bispectrum methods have been used for various biomedical applications such as epilepsy diagnosis [20], sleep staging [21], cardiac abnormalities [22], and affective computing [23]. However, such studies have not yet been performed on the EEG signals of PD patients. This work, therefore, aims to extract the hidden changes in the EEG signals to support the automated classification of PD.

A description of the materials (i.e., participants and EEG recording) used in this study is given in Sect. 2. In Sect. 3, a detailed description of the methodology (including preprocessing, HOS features, ranking, and classification) is presented, followed by the experimental results in Sect. 4. We also propose a “PD diagnosis index” to determine whether a recorded EEG signal belongs to a normal subject or a PD patient using a single numeric value. The interpretation of the results and the conclusion of this study are given in Sects. 5 and 6, respectively.

2 Materials

2.1 Participants

After the approval from Hospital University Kebangsaan Malaysia’s (HUKM) Ethics Committee, the EEG signals of 20 idiopathic PD patients (10 women and 10 men, average age 59.05 ± 5.64 years, range 45–65 years) were acquired. The mean duration of PD was 5.75 ± 3.52 years (range between 1 and 12 years). The Hoehn and Yahr (H-Y) severity stage was I–III: Two PD patients were in stage I, eleven were in stage II, and seven were in stage III. The Mini-Mental State Examination (MMSE) scores were within the normal limits (26.90 ± 1.51 [range 25–30]). Exclusion criteria included the presence of other neurological disorders (e.g., epilepsy) or psychiatric conditions (e.g., depression) and any other severe mental illness. All the PD patients took levodopa (L-dopa) drugs in order to reduce the heterogeneity in the medication.

Twenty age-matched normal subjects (11 women and 9 men, average age 58.10 ± 2.95 years) with no history or symptoms of neurological or mental illness were recruited. The MMSE scores for the normal subjects were 27.15 ± 1.63. Both PD patients and normal participants self-reported as right-handed, which was confirmed by the Edinburgh Handedness Inventory (EHI), and had no hearing impairments. Approval for this study was sought from all participants by explaining the potential risk involved.

2.2 EEG recordings

The participants were seated comfortably in a quiet room with their eyes closed to attain a state of relaxed wakefulness and were instructed prior to the study to avoid any body movements, such as eye movement or blinking, during the experiment. The EEG signals were recorded for 5 min in the eyes-closed resting state with a 14-channel (AF3, AF4, F3, F4, FC5, FC6, F7, F8, T7, T8, P7, P8, O1, and O2) wireless (2.4 GHz band) Emotiv EPOC neuroheadset at a 128 Hz sampling rate.

3 Analysis of the EEG data

Figure 1 displays the block diagram of the proposed methodology. It consists of signal preprocessing, HOS features, ranking, integrated Parkinson’s Disease Diagnosis Index (PDDI), and classification steps.

Fig. 1 System for automated identification of PD patients

3.1 Preprocessing

During preprocessing of the EEG signals, artifacts due to eye blinking were eliminated by a thresholding technique that discarded amplitudes of more than 80 µV. Then, the data were filtered using a sixth-order Butterworth bandpass filter with lower and upper cutoff frequencies of 1 and 49 Hz in order to reduce the artifact components. The filtering was performed in the forward and reverse directions, twice, to cancel the phase nonlinearity of the Butterworth filter. In each channel, the artifact-free signals were separated into EEG epochs of 2 s for further processing [24].
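The preprocessing steps can be illustrated with a minimal sketch. It is not the authors' MATLAB implementation: it assumes NumPy and SciPy, applies a single forward-backward pass, and rejects whole 2 s epochs whose peak amplitude exceeds 80 µV after filtering, which is only one possible reading of the thresholding step. The function name `preprocess` and the design order `N=3` (SciPy doubles the order for band-pass designs) are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 128  # sampling rate in Hz, as stated in Sect. 2.2


def preprocess(signal_uv, fs=FS, band=(1.0, 49.0), thresh_uv=80.0, epoch_s=2.0):
    """Zero-phase band-pass filter one channel, cut it into epochs, drop noisy epochs."""
    x = np.asarray(signal_uv, dtype=float)
    # Assumption: N=3 yields a sixth-order band-pass, because SciPy doubles N
    # for band-pass designs. Second-order sections keep the filter stable.
    sos = butter(3, band, btype="bandpass", fs=fs, output="sos")
    # Forward and reverse filtering cancels the phase distortion.
    filtered = sosfiltfilt(sos, x)
    n = int(round(epoch_s * fs))           # 256 samples per 2 s epoch
    n_epochs = len(filtered) // n
    epochs = filtered[: n_epochs * n].reshape(n_epochs, n)
    # Assumption: an epoch is discarded if any sample exceeds the threshold.
    keep = np.max(np.abs(epochs), axis=1) <= thresh_uv
    return epochs[keep]
```

For a 5 min recording at 128 Hz this returns at most 150 epochs of 256 samples per channel, which matches the segment count quoted in Sect. 3.6.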

3.2 Feature extraction—higher-order spectra (HOS)

HOS is a powerful tool for studying the nonlinear characteristics of the EEG signal [25]. It is a spectral representation of higher-order statistics. HOS preserves the information due to deviations from Gaussianity and the degree of nonlinearity in the time series. As EEG signals are expected to have nonlinearities in their generating mechanism, HOS analysis of the EEG of PD patients might reveal additional non-Gaussian and nonlinear information [25]. In this work, the third-order spectrum of the signal, called the “bispectrum,” was used. It is defined as: \(B(f_{1} ,f_{2} ) = E[X(f_{1} )X(f_{2} )X^{*} (f_{1} + f_{2} )]\), where \(B(f_{1} ,f_{2} )\) is the bispectrum at the bifrequency \((f_{1} ,f_{2} )\), \(X(f)\) is the discrete-time Fourier transform of the given signal, * denotes the complex conjugate, and E[] denotes the statistical expectation over an ensemble of possible realizations of the signal [25]. The bispectrum is the most accessible of the HOS as it is the simplest to compute (computational complexity increases with increasing order) and its properties have been well explored [20, 25, 26]. The bispectrum displays symmetry, so it is evaluated in the principal domain region (Ω) as given in [25]. Bicoherence is the squared magnitude of the normalized bispectrum [25]. A total of thirteen bispectrum features are extracted, namely bispectrum mean magnitude (BiMag) [27], bispectral entropies (BiEnt1 and BiEnt2) [20], bispectrum phase entropy (BiPhEnt) [20], bispectrum moments [26] (sum of logarithmic amplitudes of the bispectrum (H1), sum of logarithmic amplitudes of the diagonal elements in the bispectrum (H2), first-order spectral moment of the amplitudes of the diagonal elements of the bispectrum (H3), second-order spectral moment of the amplitudes of the diagonal elements of the bispectrum (H4), and first-order spectral moment of the amplitudes of the principal domain region in the bispectrum (H5)), weighted center of bispectrum (WCBix and WCBiy) [28], and absolute weighted center of bispectrum (aWCBix and aWCBiy) [20]. These bispectrum features can capture minute changes in the EEG signals that discriminate PD from healthy brain dynamics and can be used for automated diagnosis. To calculate these bispectrum features, epochs of 256 samples (2 s) with a 50% overlap Hanning window and a record of 256 NFFT points at a 128 Hz sampling rate were used. The mathematical equations of the thirteen extracted bispectrum features are given below:

$${\text{Bi}}_{\text{Mag}} = \frac{1}{N}\sum\nolimits_{\varOmega } {\left| {{\text{Bi}}(f_{1} ,f_{2} )} \right|}$$
(1)
$${\text{BiEnt}}_{1} = - \sum\limits_{x} {q_{x} \log (q_{x} )} ,\quad {\text{where}}\quad q_{x} = \frac{{\left| {{\text{Bi}}(f_{1} ,f_{2} )} \right|}}{{\sum\nolimits_{\varOmega } {\left| {{\text{Bi}}(f_{1} ,f_{2} )} \right|} }}$$
(2)
$${\text{BiEnt}}_{2} = - \sum\limits_{x} {S_{x} \log (S_{x} )} ,\quad {\text{where}}\quad S_{x} = \frac{{\left| {{\text{Bi}}(f_{1} ,f_{2} )} \right|^{2} }}{{\sum\nolimits_{\varOmega } {\left| {{\text{Bi}}(f_{1} ,f_{2} )} \right|}^{2} }}$$
(3)
$${\text{BiPhEnt}} = \sum\limits_{x} {{\text{ph}}(\alpha_{x} )\log {\text{ph}}(\alpha_{x} )} ,\quad {\text{where}}\quad {\text{ph}}(\alpha_{x} ) = \frac{1}{N}\sum\nolimits_{\varOmega } {1\left( {\phi ({\text{Bi}}(f_{1} ,f_{2} )) \in \alpha_{x} } \right)}$$
(4)
$$\alpha_{x} = \left\{ {\phi \mid - \pi + 2\pi x/M \le \phi < - \pi + 2\pi (x + 1)/M} \right\},\quad x = 0,1,2, \ldots ,M - 1$$
$$H_{1} = \sum\nolimits_{\varOmega } {\log \left( {\left| {{\text{Bi}}(f_{1} ,f_{2} )} \right|} \right)}$$
(5)
$$H_{2} = \sum\nolimits_{\varOmega } {\log \left( {\left| {{\text{Bi}}(f_{D} ,f_{D} )} \right|} \right)}$$
(6)
$$H_{3} = \sum\limits_{m = 1}^{N} {m\log \left( {\left| {{\text{Bi}}(f_{D} ,f_{D} )} \right|} \right)}$$
(7)
$$H_{4} = \sum\limits_{m = 1}^{N} {(m - H_{3} )^{2} \log \left( {\left| {{\text{Bi}}(f_{D} ,f_{D} )} \right|} \right)}$$
(8)
$$H_{5} = \sum\limits_{\varOmega } {\sqrt {i^{2} + j^{2} } \log \left( {\left| {{\text{Bi}}(f_{i} ,f_{j} )} \right|} \right)}$$
(9)
$${\text{WCBi}}x = \frac{{\sum {_{\varOmega } i{\text{Bi}}(i,j)} }}{{\sum {_{\varOmega } {\text{Bi}}(i,j)} }}\quad {\text{and}}\quad {\text{WCBi}}y = \frac{{\sum {_{\varOmega } j{\text{Bi}}(i,j)} }}{{\sum {_{\varOmega } {\text{Bi}}(i,j)} }}$$
(10)
$$a{\text{WCBi}}x = \frac{{\sum {_{\varOmega } i\left| {{\text{Bi}}(i,j)} \right|} }}{{\sum {_{\varOmega } \left| {{\text{Bi}}(i,j)} \right|} }}\quad {\text{and}}\quad a{\text{WCBi}}y = \frac{{\sum {_{\varOmega } j\left| {{\text{Bi}}(i,j)} \right|} }}{{\sum {_{\varOmega } \left| {{\text{Bi}}(i,j)} \right|} }}$$
(11)

where N is the total number of points within the Ω region, \(\phi\) is the phase angle of the bispectrum, M is the number of phase bins, \(1(\cdot)\) is the indicator function, whose value is 1 when the phase angle falls inside bin \(\alpha_{x}\), and i and j are the bispectrum frequency bin indices in the principal domain region Ω.
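As an illustration of how such features might be computed, the following sketch estimates the bispectrum with the direct FFT method and evaluates a subset of the features (Eqs. 1–3, 5, 6, and 11). It assumes NumPy and is not the authors' implementation. The function names, the shape of the input (an array of equal-length segments), the triangular principal domain \(1 \le j \le i,\; i + j < \text{NFFT}/2\), and the small `eps` added inside the logarithms are all assumptions made for illustration.

```python
import numpy as np


def bispectrum(segments, nfft=256):
    """Direct estimate: average X(f1) X(f2) X*(f1 + f2) over the segments."""
    segs = np.asarray(segments, dtype=float)
    segs = segs - segs.mean(axis=1, keepdims=True)       # remove the DC offset
    X = np.fft.fft(segs * np.hanning(segs.shape[1]), n=nfft, axis=1)
    half = nfft // 2
    f = np.arange(half)
    Xf = X[:, :half]
    X_sum = X[:, f[:, None] + f[None, :]]                # X(f1 + f2)
    return np.mean(Xf[:, :, None] * Xf[:, None, :] * np.conj(X_sum), axis=0)


def bispectrum_features(B, eps=1e-12):
    """Subset of the features, evaluated over an assumed principal domain."""
    half = B.shape[0]
    i, j = np.meshgrid(np.arange(half), np.arange(half), indexing="ij")
    omega = (j >= 1) & (j <= i) & (i + j < half)         # assumed region
    mag = np.abs(B[omega])
    q = mag / mag.sum()                                  # Eq. (2) weights
    s = mag**2 / np.sum(mag**2)                          # Eq. (3) weights
    k = np.arange(1, (half + 1) // 2)                    # diagonal bins inside the region
    diag = np.abs(B[k, k])
    return {
        "BiMag": mag.mean(),
        "BiEnt1": -np.sum(q * np.log(q + eps)),
        "BiEnt2": -np.sum(s * np.log(s + eps)),
        "H1": np.sum(np.log(mag + eps)),
        "H2": np.sum(np.log(diag + eps)),
        "aWCBix": np.sum(i[omega] * mag) / mag.sum(),
        "aWCBiy": np.sum(j[omega] * mag) / mag.sum(),
    }
```

The zero-frequency bins are excluded because the mean is removed from each segment, which would otherwise drive the logarithmic features toward negative infinity.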

3.3 Feature ranking/selection

The feature extraction step usually results in a large number of feature vectors, and many of these vectors may not contribute to differentiating the two classes. Hence, a common approach is to apply ranking/selection algorithms to the extracted feature vectors to rank them by their discriminating ability. This also reduces the complexity of the classifiers without degrading their performance. In this study, Student’s t test is used for this purpose [29, 30]. For each feature vector, the t test yields two parameters: (i) the p value, which is used to determine the significance of the extracted feature (a low p value indicates high significance), and (ii) the t value, which is used to rank the feature vectors (a high t value indicates a better rank and a better feature). To evaluate the performance, the ranked features are added one by one to a particular classifier until the highest classification accuracy is reached.
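A minimal sketch of this ranking step is shown here. It assumes SciPy's independent two-sample t test and ranks by the absolute t value; the source does not state whether the sign of t is taken into account, so that choice, like the function name `rank_features`, is an assumption.

```python
import numpy as np
from scipy.stats import ttest_ind


def rank_features(X_pd, X_normal, names):
    """Rank feature columns by |t| from a two-sample Student's t test.

    X_pd and X_normal hold one row per EEG segment and one column per feature.
    """
    t, p = ttest_ind(np.asarray(X_pd), np.asarray(X_normal), axis=0)
    order = np.argsort(-np.abs(t))                 # largest |t| first
    return [(names[i], float(t[i]), float(p[i])) for i in order]
```

The ranked names can then be fed to a classifier one at a time, keeping the subset that gives the highest cross-validated accuracy.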

3.4 Parkinson’s Disease Diagnosis Index (PDDI)

It is time-consuming to develop an automated system as it involves feature extraction, feature ranking, training, and testing. It is therefore more convenient for clinicians to use a single number that clearly separates the two classes. The concept of an integrated index was first conceived by Ghista [31, 32] and further applied to the diagnosis of depression [33], epilepsy [34], sudden cardiac death [35], carotid plaque [36], thyroid [37], diabetes [38], and glaucoma [39]. Accordingly, we have proposed and formulated an integrated index called the Parkinson’s Disease Diagnosis Index (PDDI) by combining the most distinguishing feature vectors in such a way that the value of the integrated index is distinctly different for normal subjects and PD patients.

The PDDI is developed using the three most highly ranked features (H1, Ent1, and H2) from Table 1. We developed Eq. (12) by trial and error in such a way that it clearly discriminates the two classes using a single number (PDDI). The mathematical formulation of this PD index is given by

$${\text{PDDI}} = \frac{{\left( {3.5 \times {\text{Ent}}1} \right) + \left[ {0.5 \times \left( {H_{1} /H_{2} } \right)} \right]}}{10}$$
(12)

Table 1 Mean ± SD of bispectrum feature vectors extracted from EEG signals for PD and normal subjects
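Equation (12) is simple enough to express directly. The sketch below only transcribes the formula; the function and argument names are chosen here for illustration, and the input values in the usage line are made up rather than taken from Table 1.

```python
def pddi(ent1, h1, h2):
    """Parkinson's Disease Diagnosis Index as written in Eq. (12)."""
    return (3.5 * ent1 + 0.5 * (h1 / h2)) / 10.0


# Illustrative inputs only (not values from the study).
example = pddi(ent1=2.0, h1=10.0, h2=5.0)
```

With these made-up inputs, the index evaluates to (7.0 + 1.0) / 10 = 0.8.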

3.5 Classification

In the decision tree (DT) classifier, the input feature vectors were used to construct a tree [40]. This tree provides the rules for separating the two classes, and these rules were used to determine the class of the test data. The performance of this classifier depends on how well the tree is designed. The Gini index was used to measure the impurity at each node [41]. The fuzzy K-nearest neighbor (FKNN) classifier assigns a class based on the majority class among the K nearest neighbors. Here, the Euclidean distance was used in FKNN to allocate the fuzzy class membership before making a decision. The fuzzy strength parameter (m) determines how heavily the distance is weighted when calculating each neighbor’s contribution to the membership value. In this study, the highest classification performance was obtained using m = 1.24 and k = 3. The K-nearest neighbor (KNN) classifier calculates the minimum distance between the testing and training data in terms of the K nearest neighbors [40]. The Euclidean distance and k = 2 were used for evaluating the separation. The naive Bayes (NB) classifier is a probabilistic classifier that is based on Bayes’ theorem and on the assumption that the features are independent random variables [42]. The probabilistic neural network (PNN) is a multilayered feed-forward network that uses an exponential activation function. In this study, the best performance was achieved using a smoothing parameter (σ) value of 0.284. The support vector machine (SVM) classifier separates the training data into two classes in the feature space by constructing a separating hyperplane [43]. Data that are not easily separable are mapped to a higher-dimensional feature space using kernel functions. Polynomial kernel functions of order 2 and 3, the radial basis function (RBF), and linear kernels were used in this work.
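To make the role of the fuzzy strength parameter concrete, here is a minimal sketch of a fuzzy K-nearest neighbor rule. The source does not give the exact FKNN formulation, so this assumes the common inverse-distance weighting in which each neighbor contributes with weight \(d^{-2/(m-1)}\) and crisp training labels; the function name and the `eps` guard are illustrative.

```python
import numpy as np


def fknn_predict(X_train, y_train, X_test, k=3, m=1.24, eps=1e-12):
    """Fuzzy k-NN: class memberships weighted by distance ** (-2 / (m - 1))."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    # Crisp (one-hot) memberships for the training points: an assumption.
    onehot = (y_train[:, None] == classes[None, :]).astype(float)
    # Euclidean distance from every test point to every training point.
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]                  # indices of the k nearest
    dk = np.take_along_axis(d, nn, axis=1)
    w = 1.0 / (dk ** (2.0 / (m - 1.0)) + eps)          # eps guards zero distances
    memb = (w[:, :, None] * onehot[nn]).sum(axis=1) / w.sum(axis=1, keepdims=True)
    return classes[memb.argmax(axis=1)], memb
```

With m close to 1, as in the value m = 1.24 reported above, the exponent is large, so the nearest neighbor dominates the membership value.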

3.6 Performance measures

Tenfold cross-validation was used to evaluate the performance of the developed system, and the performance was evaluated using five different measures: sensitivity, the true-positive rate, which quantifies the percentage of correctly classified PD patients among all PD feature vectors; specificity, the true-negative rate, which quantifies the percentage of correctly classified normal subjects among all healthy feature vectors; accuracy, the percentage of correctly classified samples (both PD patients and normal subjects) among all feature vectors; precision, the percentage of correctly classified PD samples among all feature vectors recognized as PD; and F-score, the harmonic mean of precision and sensitivity. Herein, 20 participants per group with 150 EEG segments of 2 s per electrode were analyzed, which resulted in a total of 3000 × 14 (electrodes) feature vectors per group.
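The five measures follow directly from the confusion-matrix counts. The sketch below treats PD as the positive class; the function name and the counts in the usage line are illustrative and are not results from this study.

```python
def performance_measures(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, precision, and F-score as fractions."""
    sensitivity = tp / (tp + fn)            # PD segments correctly detected
    specificity = tn / (tn + fp)            # normal segments correctly detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f_score": f_score,
    }


# Made-up counts for illustration only.
scores = performance_measures(tp=90, fn=10, tn=80, fp=20)
```

In a tenfold setting, these values would be computed on each held-out fold and then averaged.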

4 Results

Figure 2a, b displays the magnitude bispectrum plots of normal and PD EEG signals. It can be observed from these figures that the magnitude in the bifrequency plane is unique for each class. It is also clear that the bispectrum has most of its magnitude within −0.2 to +0.2 (the bifrequency range) on the normalized scale (i.e., \(- 0.5 \le f_{1} ,\,f_{2} \le + 0.5\)). Figure 3a, b displays the bicoherence plots of the EEG signals of normal subjects and PD patients. These plots also indicate that the magnitudes are randomly distributed at various frequencies throughout the plot. In normal subjects, the spread in the bifrequency plane is greater than in PD patients. This may be because the EEG signal becomes less chaotic as the disease develops.

Fig. 2 Bispectrum plots of (a) PD patient EEG (b) normal subject EEG

Fig. 3 Bicoherence plots of (a) PD patient EEG (b) normal subject EEG

In this work, a total of thirteen bispectrum features were extracted from EEG segments of 2 s. Table 1 presents the mean ± standard deviation (SD) of the various HOS feature vectors. The feature vectors were ranked with respect to their t values. It can be noted from Table 1 that all the extracted bispectrum features are lower for the PD class. This observation suggests that the EEG signal became less complex for PD patients due to dysfunction in the neural circuits. The p values and t values indicated that almost all the features were statistically significant. Table 2 shows the results of classification using different classifiers. It can be noted that the SVM classifier using the RBF kernel function (SVM-RBF) achieved the best mean accuracy of 99.62 ± 0.57%, sensitivity of 100.00 ± 0.00%, and specificity of 99.25 ± 0.53%. This performance was obtained using only the three top-ranked bispectrum features, namely H1, Ent1, and H2. EEG signals are nonlinear in nature; hence, nonlinear kernel functions, such as RBF, perform well. Figure 4 displays the plot of accuracy vs number of features for the various classifiers used. It clearly shows that the classifiers yielded better classification accuracy for the top three ranked features, beyond which the accuracy decreases. Figure 5 shows the plot of mean accuracy (%), sensitivity (%), and specificity (%) vs number of folds (tenfold cross-validation) using the SVM-RBF classifier.

Table 2 Performance measures of various classifiers using PD patients and normal subjects EEG signals based on different combinations of HOS features

Fig. 4 Plot of number of features versus average accuracy for the various classifiers used

Fig. 5 Plot of average accuracy (%), sensitivity (%), specificity (%) vs number of folds (tenfold cross-validation for SVM-RBF classifier)

Table 3 shows the range of PDDI for normal subjects and PD patients. Figure 6 shows the plot of PDDI for the two classes. The table and figure show that there is a clear separation between the two classes; hence, they can be separated using a single number.

Table 3 Range of PDDI for normal and PD classes

Fig. 6 Plot of PDDI for normal subjects and PD patients

5 Discussion

In this study, a nonlinear method for the automated diagnosis of PD was proposed using HOS bispectrum features extracted from EEG signals. The novelty of this work lies in the formulation of the PDDI and in the unique HOS plots proposed for the normal and PD classes. Table 4 lists the classification accuracies of works conducted on the diagnosis of PD. It can be seen from the table that most of the studies have used dysphonia-based features (a group of vocal symptoms) to identify PD. However, it is well known that PD is predominantly a motor disorder mainly caused by the loss of dopamine-producing neurons in the basal ganglia. Also, non-motor impairment involving cognitive dysfunction in PD patients has often been noted [44]. Cognitive status is mainly associated with neurophysiological signals (e.g., EEGs). Consequently, an understanding of the neuronal activity is an important complement to voice impairment symptoms, for the advancement of both targeted therapeutic strategies and prognostic perspectives. It has been shown that the analysis of EEG signals can help reveal the disturbed subcortico-cortical mechanisms in patients with PD or dementia [9, 10, 12]. Thus, this study was performed to develop an automated detection system for PD patients using EEG signals. In addition, it can be seen from the table that researchers have achieved accuracies between 70 and 99.5% using various methods.

Table 4 Summary of studies conducted on automated detection of PD and normal classes

Table 4 shows that the HOS-based method proposed in this paper has given superior performance compared to the other listed methods. Moreover, the best distinguishing features were selected and combined into an integrated index, as shown in Eq. (12), to optimally separate the two classes.

The main salient features of proposed automated diagnosis system are as follows:

  1. The method yielded an optimum mean accuracy of 99.62%, sensitivity of 100%, and specificity of 99.25% using bispectrum features.

  2. The method proposed the PDDI using short segments (2 s) of EEG signals. This can be used by clinicians for the diagnosis of PD using one numeric value.

  3. The method was implemented using MATLAB software and can be installed in hospitals. This could help to reduce the workload of neurologists and support the accurate diagnosis of PD patients.

  4. The developed technique is fully automatic, noninvasive, and robust.

  5. The extracted HOS-based bispectrum features are more robust to noise, and the method can be extended to the diagnosis of other neurological disorders such as cerebral palsy.

A sensitivity of 100% and a specificity of 99.25% were obtained using only three highly ranked HOS features. However, the proposed method needs to be tested with a larger database of PD patients belonging to diverse ethnic groups. This will require a large storage space to extract the features and run the classification algorithms. The study was conducted using an Intel i7-2410M processor @ 2.30 GHz with 8.0 GB RAM. The entire experiment was carried out using MATLAB (version 8.1.0.604, R2014a). The average time required to train and test the system was 6.484 s.

The limitation of this study is that only 20 PD patients participated and the data were obtained only from a Malaysian population. In order to obtain a reliable index, a much larger dataset with data from diverse ethnic groups is needed.

6 Conclusion

This study presented an application of HOS features extracted from EEG signals for the diagnosis of PD patients. The findings demonstrate that the proposed technique is able to clearly discriminate PD from normal EEG signals using a single number (PDDI), with an average sensitivity, specificity, and accuracy of 100, 99.25, and 99.62%, respectively, using the SVM classifier. Thereby, our developed EEG-based automated system can be used as a promising alternative tool in the diagnosis of PD. The proposed index provides distinct, non-overlapping ranges for the normal and PD classes. It can help neurologists make a faster and more accurate diagnosis of PD during screening. The proposed technique can be extended to classify the severity levels of PD.