1 Introduction

PD is a neurodegenerative disease of the central nervous system that causes partial or complete loss of motor reflexes, speech, behavioral and mental processes, and other vital functions [1, 2]. The disease involves the loss of the dopamine-producing neurons in the brain. It was described and named in 1817 by Dr. James Parkinson [3]. In a recent comprehensive study, the incidence of the disease was given as 20/100,000 [4]. It is known that there are more than one million patients with PD in North America alone [5]. In addition, it is estimated that 20 % of patients are currently not diagnosed correctly [6]. PD affects a significant part of the population, impacting approximately 1 % of those over 50 years of age [7]. This ratio is expected to increase as people live longer; aging is thus an important risk factor for PD [8].

Some PD symptoms can be reduced with pharmacological and/or surgical intervention, and the life span of patients can consequently be extended. Currently, no specific method has been developed for PD diagnosis. Specialists use many different measurement techniques, such as the Unified Parkinson’s Disease Rating Scale (UPDRS), the Hoehn–Yahr Scale, the Schwab and England Scale of Activities of Daily Living, the Parkinson’s Disease Questionnaire 39, and the Parkinson’s Disease Quality of Life Questionnaire, to measure the severity of PD. The UPDRS is the most commonly used technique [9]. These scales are based on the history of the disease and usually help to detect the existence and severity of symptoms. However, these processes are known to consume both time and effort [10, 11].

In recent years, research on computer-based solutions supporting medical decision making has increased considerably. When these studies are reviewed, the relationship between speech disorders and PD appears to be well established [12–14]. Many studies have also reported a reduction in the use of speech as the disease progresses [15, 16]. Speech samples of the patient are therefore well suited to a decision support system for diagnosis: the technique is noninvasive, and the speech data can be collected easily. Speech samples have been used in several investigations concerning the diagnosis of PD [17–22].

Recent studies have proposed machine learning methods using audio recordings associated with PD. Little et al. [23] aimed to analyze the stage of the disease by measuring the dysphonia that occurs due to PD. In their study, they made recordings of the sustained “a” vowel of 31 subjects, including 23 patients with PD. Dysphonia measures were then extracted from these recordings, and attempts were made to determine the level of the disease by remote monitoring. Shahbaba et al. [24] presented a nonlinear model based on a Dirichlet mixture for diagnostic purposes. Das [25] carried out a comparative analysis using four different methods. Guo et al. [26] proposed a method based on a genetic algorithm (GA) and expectation maximization (EM). Luukka [27] proposed a new method using fuzzy entropy measures and a similarity classifier. Li et al. [28] applied a fuzzy-based nonlinear transformation approach with a support vector machine (SVM) to a PD dataset. Ozcift et al. [29] presented a new classification scheme based on SVM-selected attributes to train rotation forest (RF) ensemble classifiers in order to improve the diagnosis of PD. Spadoto et al. [30] proposed an evolutionary-based method involving an optimum-path forest (OPF) classifier for the diagnosis of PD. Polat [31] applied a fuzzy c-means clustering feature weighting (FCMFW) method with a k-nearest neighbor (KNN) classifier. Zuo et al. [32] used a new diagnostic model based on particle swarm optimization (PSO) for the diagnosis of PD. Sakar and Kursun [33] applied mutual information-based feature selection with permutation tests to determine the validity and statistical significance of the relationship between the features of the illness and UPDRS scores, and created a classification model by giving the selected features to an SVM classifier. Chen et al. [34] proposed a detection system using a fuzzy k-nearest neighbor approach with principal component analysis (PCA). Ma et al. [35] obtained high accuracy rates with subtractive clustering feature weighting followed by a kernel-based extreme learning machine. Comparative information about previous studies on the diagnosis of PD is given before the conclusion section.

In this study, a PD dataset comprising features obtained from speech samples is used for the diagnosis of PD. As a method, a new hybrid model based on feature weighting and a complex-valued classifier is proposed. Feature weighting is used to increase the classification performance; here, the KMCFW method is preferred as the weighting method. The aims of KMCFW are (i) to transform a nonlinearly separable dataset into a linearly separable one and (ii) to gather similar data points together. The new features obtained after the weighting process were converted into complex number format. In the final stage, these feature values were presented as complex-valued neural network (CVANN) inputs.

The outline of the study is as follows: General information about the dataset and the methods used in this study, together with the application and experimental results, is presented in Sect. 2. The conclusion is given in Sect. 3.

2 Materials and methods

2.1 Data

The PD dataset used in this study, comprising speech samples, was created by Max Little in cooperation with the National Centre for Voice and Speech, Denver, Colorado, and the University of Oxford. It was obtained from the UCI Machine Learning Repository [36]. The dataset consists of 195 biomedical voice measurements taken from 31 people: 8 healthy subjects and 23 with PD.

The features of the PD dataset used in this study are as follows: mean, maximum, and minimum vocal fundamental frequency; measures of irregularity in fundamental frequency; measures of amplitude irregularity; measurements of the harmonics-to-noise ratio; nonlinear dynamical complexity measures; nonlinear measures of fundamental frequency variation; and the signal fractal scaling exponent. The PD dataset also includes a status column, defined as 0 for healthy subjects and 1 for PD patients. Table 1 presents the statistical values of the features of the PD dataset together with their definitions [37, 38].

Table 1 Statistical values and definitions of the features in PD dataset

2.2 K-means clustering-based feature weighting (KMCFW): Data preprocessing

Clustering is the process of dividing data into groups according to similarity or dissimilarity criteria between data points. Clustering algorithms are used not only for classification but also for data compression, feature weighting, and data reduction. The most commonly preferred clustering methods are k-means clustering [39], fuzzy c-means clustering [40], mountain clustering [41], and subtractive clustering [42]. In this study, the data weighting process has been carried out using the k-means clustering (KMC) algorithm, the most widely preferred in the literature.

In KMCFW, the cluster centers of each feature are first found using KMC. Then, the distance between each feature’s cluster center and the mean value of that feature is calculated, and the features are weighted in accordance with this distance [43].

The aim of the feature weighting method is to map the features according to their distributions in a dataset and also to transform them from nonlinearly separable datasets to linearly separable ones [43]. The feature weighting method works upon the principle that it decreases the variance in features forming the dataset. By means of this, data displaying the same features are gathered together, and the differentiation ability of the classifier is increased.

The k-means algorithm determines the cluster centers by minimizing a squared error-based cost function. The purpose of this algorithm is to locate the cluster centers as far away from each other as possible and to associate each data point with the nearest cluster center [44]. Euclidean distance is often used as the dissimilarity measure in the KMC algorithm. The Euclidean distance-based cost function J is defined as in Eq. 1:

$$ J = \sum_{i = 1}^{K} \left( \sum_{k} \| x_{k} - c_{i} \|^{2} \right) $$
(1)

where K indicates the number of clusters, \( c_{i} \) indicates the center of the ith cluster, and \( x_{k} \) indicates the kth pattern in the ith cluster. Each pattern is a member of the closest cluster center, and accordingly, the elements of the binary membership matrix \( u \) are defined as in Eq. 2:

$$ u_{ij} = \begin{cases} 1, & \| x_{j} - c_{i} \|^{2} \le \| x_{j} - c_{t} \|^{2}, \;\; \forall t \ne i \\ 0, & \text{otherwise} \end{cases} $$
(2)

where \( u_{ij} \) indicates whether or not the jth pattern belongs to the ith cluster. The cluster center \( c_{i} \) minimizing the cost function is defined as in Eq. 3:

$$ c_{i} = \frac{\sum_{j = 1}^{N} u_{ij} x_{j}}{\sum_{j = 1}^{N} u_{ij}} $$
(3)

where N indicates the number of patterns.

The working of the KMC algorithm can be summarized as follows (a code sketch is given after this list):

  1. k units are selected randomly as the initial cluster centers.

  2. The remaining units are assigned, in accordance with the defined distance measure, to the clusters of the initial cluster centers.

  3. New cluster centers are created by averaging the units in each of the k clusters thus formed.

  4. Units are assigned to the closest of the newly created cluster centers, and the distances are calculated.

  5. The distances to the previous cluster centers are compared with the distances to the newly created cluster centers.

  6. If the distances are still decreasing appreciably, return to step 4.

  7. If no substantial change occurs, the iteration is finalized and the algorithm ends.
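As an illustration, these steps can be written as a minimal NumPy sketch (the function name, the convergence tolerance, and the random initialization scheme are our own choices, not part of the original description):

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Minimal k-means following the steps above.

    X : (N, d) array of patterns; k : number of clusters.
    Returns the cluster centers and the label of each pattern.
    """
    rng = np.random.default_rng(seed)
    # Step 1: select k units randomly as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Steps 2/4: assign each unit to its closest center (Eq. 2).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: new centers are the means of the assigned units (Eq. 3).
        new_centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                else centers[i] for i in range(k)])
        # Steps 5-7: stop when the centers no longer change appreciably.
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, labels
```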

Briefly, the KMC feature weighting works as follows [43]: At first, the cluster centers are calculated using the KMC method. After the centers of the features have been calculated, the ratios of the means of these features to their centers are calculated, and these ratios are multiplied by the data points of each feature. Figure 1 shows the flowchart of KMCFW, and Fig. 2 shows the pseudo-code of the feature weighting; a Python sketch is also given below.

Fig. 1
figure 1

Flowchart of KMCFW

Fig. 2
figure 2

Pseudo-code of the feature weighting
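A minimal sketch of this weighting scheme, assuming scikit-learn’s KMeans for the per-feature clustering; the number of clusters per feature and the use of the mean of the resulting centers are our assumptions, since the text leaves these details open:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmcfw(X, n_clusters=2):
    """K-means clustering-based feature weighting (sketch).

    For each feature, cluster centers are found with k-means; the
    ratio of the feature's mean to the mean of its cluster centers
    is taken as the weight, and the feature is multiplied by it.
    """
    Xw = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        col = X[:, j].reshape(-1, 1)
        centers = KMeans(n_clusters=n_clusters, n_init=10).fit(col).cluster_centers_
        weight = col.mean() / centers.mean()   # ratio of mean to center
        Xw[:, j] = X[:, j] * weight            # weighted feature values
    return Xw
```

With this weighting, features whose mass sits far from their cluster centers are rescaled, pulling similar data points together as described above.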

2.3 Complex-valued artificial neural network (CVANN)

In a complex-valued neural network, the input signals, weights, threshold values, and output signals are all complex numbers (Fig. 3). Recently, the use of complex-valued classifiers has increased for the solution of various classification problems [45–49].

Fig. 3
figure 3

A three-layer complex-valued neural network [47]. It comprises input, hidden, and output layers. Each circle represents a single neuron. \( I_{N}, W_{lm}, \theta_{m}, O_{n}, z \), and \( f_{c}(z) \) are all complex numbers

There are many studies in the literature emphasizing the advantages of complex-valued ANNs over real-valued ANNs [45–47, 50]. These advantages are high-level functionality, better plasticity, and greater flexibility. Additionally, they learn faster and arrive at better generalizations [51]. Neurons in a complex-valued neural network are able to learn without generating higher-degree inputs or moving to a higher-dimensional space. In addition, the study by Nitta et al. [50] can be examined to see the advantages of CVANN more clearly: it shows that the XOR problem, which cannot be solved using a two-layered real-valued neural network, can easily be solved using a two-layered CVANN.

2.3.1 The mathematical model of the CVANN algorithm

The mathematical model of complex-valued neural networks is presented below [52, 53]. The activation value \( Y_{n} \) of neuron n can be defined as follows:

$$ Y_{n} = \sum_{m} W_{nm} I_{m} + \theta_{n} $$
(4)

In Eq. 4, \( W_{nm} \) is the complex-valued connection weight between neuron n and neuron m, \( I_{m} \) is the complex-valued input signal from neuron m, and \( \theta_{n} \) is the complex-valued threshold value of neuron n. To obtain the complex-valued output signal, the activation value \( Y_{n} \) is split into its real and imaginary parts as shown below:

$$ Y_{n} = x + iy = z $$
(5)

Here, i stands for \( \sqrt{-1} \). Since each neuron may use a different output function, the output function is defined using the following equation:

$$ f_{c}(z) = f_{R}(x) + i f_{R}(y) $$
(6)

\( f_{R}(x) \) and \( f_{R}(y) \) express the activation function of the neural network applied to the real and imaginary parts, respectively. Suppose that the sigmoid function is selected as the activation function. In this case, \( f_{R}(u) = 1/(1 + \exp(-u)), u \in R \) (R denotes the set of real numbers), and the real and imaginary parts of a neuron’s output are the sigmoid functions of the real part \( x \) and the imaginary part \( y \) of the net input \( z \), respectively.

Figure 3 presents the three-layered (input, hidden, and output) CVANN structure used in this study. \( W_{lm} \) is the weight between input layer neuron l and hidden layer neuron m; \( V_{nm} \) is the weight between hidden layer neuron m and output layer neuron n; \( \theta_{m} \) indicates the threshold value of hidden layer neuron m; and \( \lambda_{n} \) indicates the threshold value of output layer neuron n. \( I_{l} \), \( H_{m} \), and \( O_{n} \) indicate input layer neuron l, hidden layer neuron m, and output layer neuron n, respectively. Similarly, \( U_{m} \) and \( S_{n} \) indicate the activation values of hidden layer neuron m and output layer neuron n, respectively.

$$ U_{m} = \sum_{l} W_{lm} I_{l} + \theta_{m} $$
(7)
$$ H_{m} = f_{c} \left( U_{m} \right) $$
(8)
$$ S_{n} = \sum_{m} V_{nm} H_{m} + \lambda_{n} $$
(9)
$$ O_{n} = f_{c} \left( S_{n} \right) $$
(10)
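Putting Eqs. 6–10 together, the forward pass can be sketched as follows (a NumPy sketch; the vectorized shapes and names are our assumptions):

```python
import numpy as np

def complex_sigmoid(z):
    """Split sigmoid activation f_c(z) = f_R(Re z) + i f_R(Im z) (Eq. 6)."""
    f_r = lambda u: 1.0 / (1.0 + np.exp(-u))   # real-valued sigmoid f_R
    return f_r(np.real(z)) + 1j * f_r(np.imag(z))

def forward(I, W, theta, V, lam):
    """Forward pass of the three-layer CVANN (Eqs. 7-10).

    I : (L,) complex inputs; W : (M, L) and theta : (M,) for the hidden
    layer; V : (N, M) and lam : (N,) for the output layer.
    """
    H = complex_sigmoid(W @ I + theta)   # Eqs. 7-8: hidden layer outputs
    O = complex_sigmoid(V @ H + lam)     # Eqs. 9-10: network outputs
    return H, O
```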

In this study, the squared error function was preferred. For pattern p, it is expressed as shown in Eq. 11:

$$ E_{p} = \frac{1}{2} \sum_{n = 1}^{N} \left| T_{n} - O_{n} \right|^{2} = \frac{1}{2} \sum_{n = 1}^{N} \left| \delta^{n} \right|^{2} $$
(11)

where N is the number of neurons in the output layer and \( \delta^{n} = T_{n} - O_{n} \) is the error between \( O_{n} \), the output of the nth output layer neuron, and \( T_{n} \), the target output. The squared error can also be rewritten as:

$$ E_{p} = \frac{1}{2} \sum_{n = 1}^{N} \left\{ \left| Re(T_{n}) - Re(O_{n}) \right|^{2} + \left| Im(T_{n}) - Im(O_{n}) \right|^{2} \right\} $$
(12)

In order to minimize the squared error \( E_{p} \), the learning rule of the complex-valued back-propagation model is described below [54]. The weights and threshold values are updated according to the following equations, where \( \eta > 0 \) is a small learning constant:

$$ \Delta V_{nm} = - \eta \frac{\partial E_{p}}{\partial Re[V_{nm}]} - i \eta \frac{\partial E_{p}}{\partial Im[V_{nm}]} $$
(13)
$$ \Delta \lambda_{n} = - \eta \frac{\partial E_{p}}{\partial Re[\lambda_{n}]} - i \eta \frac{\partial E_{p}}{\partial Im[\lambda_{n}]} $$
(14)
$$ \Delta W_{ml} = - \eta \frac{\partial E_{p}}{\partial Re[W_{ml}]} - i \eta \frac{\partial E_{p}}{\partial Im[W_{ml}]} $$
(15)
$$ \Delta \theta_{m} = - \eta \frac{\partial E_{p}}{\partial Re[\theta_{m}]} - i \eta \frac{\partial E_{p}}{\partial Im[\theta_{m}]} $$
(16)

The expressions given in Eqs. 13–16 can be rewritten as follows:

$$ \Delta V_{nm} = \overline{H_{m}} \, \Delta \lambda_{n} $$
(17)
$$ \Delta \lambda_{n} = \eta \left( Re[\delta^{n}] \left( 1 - Re[O_{n}] \right) Re[O_{n}] + i \, Im[\delta^{n}] \left( 1 - Im[O_{n}] \right) Im[O_{n}] \right) $$
(18)
$$ \Delta W_{ml} = \overline{I_{l}} \, \Delta \theta_{m} $$
(19)
$$ \begin{aligned} \Delta \theta_{m} ={}& \eta \left( 1 - Re[H_{m}] \right) Re[H_{m}] \sum_{n} \Big( Re[\delta^{n}] \left( 1 - Re[O_{n}] \right) Re[O_{n}] \, Re[V_{nm}] + Im[\delta^{n}] \left( 1 - Im[O_{n}] \right) Im[O_{n}] \, Im[V_{nm}] \Big) \\ & - i \eta \left( 1 - Im[H_{m}] \right) Im[H_{m}] \sum_{n} \Big( Re[\delta^{n}] \left( 1 - Re[O_{n}] \right) Re[O_{n}] \, Im[V_{nm}] - Im[\delta^{n}] \left( 1 - Im[O_{n}] \right) Im[O_{n}] \, Re[V_{nm}] \Big) \end{aligned} $$
(20)
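A sketch of one back-propagation update implementing Eqs. 17–20 (self-contained NumPy; the vectorized forms are our derivation from the per-element equations above):

```python
import numpy as np

def backprop_step(I, T, W, theta, V, lam, eta=0.9):
    """One training step of the complex back-propagation rule (Eqs. 17-20).

    I : (L,) complex input; T : (N,) complex target.
    W : (M, L), theta : (M,), V : (N, M), lam : (N,) complex parameters.
    """
    f_r = lambda u: 1.0 / (1.0 + np.exp(-u))
    fc = lambda z: f_r(np.real(z)) + 1j * f_r(np.imag(z))  # Eq. 6
    H = fc(W @ I + theta)                          # Eqs. 7-8
    O = fc(V @ H + lam)                            # Eqs. 9-10
    delta = T - O                                  # delta^n = T_n - O_n
    a = delta.real * (1 - O.real) * O.real         # real error factor
    b = delta.imag * (1 - O.imag) * O.imag         # imaginary error factor
    d_lam = eta * (a + 1j * b)                     # Eq. 18
    dV = np.outer(d_lam, np.conj(H))               # Eq. 17
    d_theta = (eta * (1 - H.real) * H.real * (V.real.T @ a + V.imag.T @ b)
               - 1j * eta * (1 - H.imag) * H.imag * (V.imag.T @ a - V.real.T @ b))  # Eq. 20
    dW = np.outer(d_theta, np.conj(I))             # Eq. 19
    return W + dW, theta + d_theta, V + dV, lam + d_lam
```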

2.3.2 Summary of the CVANN algorithm

  • Initialization: Assign small random complex numbers to all weights and threshold values.

  • Submission of inputs and outputs (the target): Provide the complex-valued input vectors \( I_{1}, I_{2}, I_{3}, \ldots, I_{N} \) and the corresponding complex-valued output (target) vectors \( T_{1}, T_{2}, T_{3}, \ldots, T_{N} \) to the network, where N is the number of patterns used in training.

  • Calculation of actual output: Calculate the actual output \( O_{n} \) using Eq. 10.

  • Determining the error value: Calculate the error between the obtained output and the target output according to Eq. 11.

  • Changing the weight and threshold values: Update the weight and threshold values using the formulas in Eqs. 17–20. Continue this process until the error is minimized (a training-loop sketch is given below).
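Tying these steps together, a minimal training loop might look as follows (a sketch; it reuses backprop_step() from the previous code block, the layer sizes follow the 11-10-1 architecture reported in Sect. 2.4, and the initialization range is our assumption):

```python
import numpy as np

def train_cvann(inputs, targets, L=11, M=10, N=1, eta=0.9, epochs=1000, seed=0):
    """Train a three-layer CVANN with complex back-propagation.

    inputs  : iterable of (L,) complex input vectors
    targets : iterable of (N,) complex target vectors
    Reuses backprop_step() defined in the previous sketch.
    """
    rng = np.random.default_rng(seed)
    # Initialization: small random complex weights and thresholds.
    rand_c = lambda *s: rng.uniform(-0.1, 0.1, s) + 1j * rng.uniform(-0.1, 0.1, s)
    W, theta, V, lam = rand_c(M, L), rand_c(M), rand_c(N, M), rand_c(N)
    for _ in range(epochs):
        for I, T in zip(inputs, targets):
            W, theta, V, lam = backprop_step(I, T, W, theta, V, lam, eta)
    return W, theta, V, lam
```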

2.4 Application and experimental results

In this study, a new hybrid model is proposed for PD diagnosis. As shown in Fig. 4, the proposed method consists of two steps. In the first step, the features in the PD dataset were weighted using the KMCFW method. The aim of this method is to map the features according to their distributions in the dataset and to transform the linearly non-separable space into a linearly separable one. Using this method, similar data points within each feature are gathered together, which substantially helps to improve the differentiation ability of the classifiers [31, 43]. In the next step, an input set was created by forming one complex value from two real values for the CVANN input. For example, if the first feature value is \( X_{1} \) and the second is \( X_{2} \), these two feature values are converted into the complex number format \( X_{1} + iX_{2} \).

Fig. 4
figure 4

Block diagram of the proposed system for the diagnosis of PD

In this way, 11 complex-valued features were obtained from the 22 feature values; a sketch of this pairing is given below. The feature values obtained in this last step are classified using the CVANN algorithm. The block diagram of the proposed system is shown in Fig. 4.
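The pairing step can be sketched as follows (the pairing of consecutive features is our assumption; the text only states that two real values form one complex value):

```python
import numpy as np

def to_complex_features(X):
    """Pair 2k real features into k complex features: X1 + iX2, X3 + iX4, ...

    X : (n_samples, 22) array of KMCFW-weighted features.
    Returns an (n_samples, 11) complex array for the CVANN input.
    """
    assert X.shape[1] % 2 == 0, "an even number of features is required"
    return X[:, 0::2] + 1j * X[:, 1::2]
```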

Figure 5 shows a box plot of the original and weighted PD dataset with all 22 features. Figures 6 and 7 show the 3D distributions of the two classes for the original and weighted 195 samples, formed by the best three principal components obtained using the PCA algorithm. From Figs. 5, 6, and 7, it can be seen that the separability of the PD dataset has been improved substantially by the KMCFW approach. After this data preprocessing step, the classification algorithm was applied to the weighted PD dataset.

Fig. 5
figure 5

Original and weighted PD dataset

Fig. 6
figure 6

3D distribution of two classes in the original feature space

Fig. 7
figure 7

3D distribution of two classes in the weighted feature space

In the classification stage, the CVANN algorithm was preferred. The neural network architecture giving the highest accuracy rate and its parameters were found empirically. Accordingly, the optimal network structure (input–hidden–output) was identified as 11-10-1. The learning coefficient was set to 0.9, the maximum number of iterations to 1000, and the complex sigmoid was selected as the activation function.

The prediction performance of the KMCFW–CVANN method was tested using five different performance evaluation criteria, the formulations of which are given below. These criteria are accuracy, sensitivity, specificity, f-measure, and the kappa statistic, respectively.

$$ {\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{FP}} + {\text{FN}} + {\text{TN}}}} \times 100\,\% $$
(21)
$$ {\text{Sensitivity}} = \frac{\text{TP}}{{{\text{TP}} + {\text{FN}}}} \times 100\,\% $$
(22)
$$ {\text{Specificity}} = \frac{\text{TN}}{{{\text{FP}} + {\text{TN}}}} \times 100\,\% $$
(23)
$$ {\text{Precision}} = \frac{\text{TP}}{{{\text{TP}} + {\text{FP}}}} $$
(24)
$$ {\text{Recall}} = \frac{\text{TP}}{{{\text{TP}} + {\text{FN}}}} $$
(25)
$$ {\text{f-measure}} = \frac{{2 \times {\text{Precision}} \times {\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}} $$
(26)

where the f-measure is composed of the precision and recall values. TP is the number of true positives: cases within the PD class correctly classified as having PD. FN is the number of false negatives: cases within the PD class classified as healthy. TN is the number of true negatives: cases within the healthy class correctly classified as healthy. FP is the number of false positives: cases within the healthy class classified as having PD.
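For illustration, Eqs. 21–26 can be computed directly from these four counts (a sketch; the function name is ours):

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity (Eqs. 21-23, in %) and f-measure (Eq. 26)."""
    accuracy = 100.0 * (tp + tn) / (tp + fp + fn + tn)
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (fp + tn)
    precision = tp / (tp + fp)          # Eq. 24
    recall = tp / (tp + fn)             # Eq. 25
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, sensitivity, specificity, f_measure
```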

The kappa statistic is an alternative way of evaluating the accuracy of classifiers. It was initially introduced as a measure of the degree of agreement between two observers [55] and has since been used in a variety of disciplines. In the field of machine learning, it is used to compare the accuracy of a classifier with that of a random classifier which guesses by chance. The measure is defined as:

$$ {\text{KS}} = \frac{{P_{0} - P_{c} }}{{1 - P_{c} }} $$
(27)

\( P_{0} \) is the accuracy of the classifier, while \( P_{c} \) is the accuracy obtained by random guessing on the same dataset. The kappa statistic takes values between −1 and 1, where −1 indicates complete inconsistency (a completely wrong classification) and 1 indicates perfect consistency (a completely correct classification). The results obtained according to these performance evaluation criteria are presented in Table 2, together with the results obtained by applying a real-valued ANN to the same feature values. To allow a fair comparison with the results obtained by other researchers, both k-fold cross-validation and 50–50 % holdout were used as data partitioning methods. Each experiment was repeated 10 times to assess the reliability and stability of the results, and the average values were reported. As Table 2 shows, the CVANN method gives much better results than the real-valued ANN.
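As a companion to the metrics sketch above, Eq. 27 can be computed from the same confusion-matrix counts (estimating \( P_{c} \) from the marginal class frequencies is the usual convention and our assumption here):

```python
def kappa_statistic(tp, fn, tn, fp):
    """Kappa statistic (Eq. 27) from confusion-matrix counts."""
    total = tp + fn + tn + fp
    p0 = (tp + tn) / total                               # observed accuracy P_0
    # chance accuracy P_c from the marginal class frequencies
    pc = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total**2
    return (p0 - pc) / (1 - pc)
```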

Table 2 Results obtained according to the performance evaluation criteria

The comparative analysis between the diagnosis of PD performed in this study and previous studies is given in Table 3. As shown, the proposed method obtains better classification results than all the methods proposed in previous studies: the accuracy rates obtained by other researchers vary between 85 and 97 %, whereas the proposed method achieves an accuracy of 99.52 %. In an area as critical as medical diagnosis, even a 0.1 % increase in accuracy can be very important. Consequently, the proposed method is expected to make an important contribution to this field.

Table 3 Comparison of the results with the literature

There is no significant difference between the proposed method and the methods presented in Table 3 in terms of simplicity and computational load; the proposed method has two steps, combining feature weighting and classification. Using CVANN in the classification stage does not introduce an additional computational load. Regarding computation time, it can be seen from Table 3 that the complex-valued classifier classifies faster than a real-valued classifier. As a result, the proposed method is fast and carries a light computational load.

3 Conclusion

This paper presents an automated diagnostic system supporting the neurologist in the diagnosis of PD. The main novelty lies in the proposed system, entitled KMCFW–CVANN, which integrates an effective clustering-based feature weighting method with a fast classifier, allowing PD to be diagnosed in an efficient and rapid manner.

In this study, a Parkinson’s dataset comprising the features obtained from speech and sound samples was used. In the proposed method, KMCFW was used as a data preprocessing tool, with the aim of decreasing the variance in features of the PD dataset in order to further improve the diagnostic accuracy of the CVANN classifier.

It can be seen from the experiments that the complex-valued ANN gives much better results than the real-valued ANN. The prediction performance of the KMCFW–CVANN hybrid method was tested with five different performance evaluation criteria: accuracy, sensitivity, specificity, f-measure, and the kappa statistic. The proposed method gave better results, with an accuracy of 99.52 %; with this value, the proposed system clearly outperforms the other methods proposed in the literature.

All of this points to the fact that the proposed system, using complex-valued classifiers, can have a positive impact in terms of providing an accurate and rapid diagnosis of PD. It is projected that such high prediction accuracy rates can also be obtained in other medical diagnosis problems.