Introduction

Schizophrenia is a mental disorder that affects about one in 100 people. It is among the most serious psychological diseases and can disrupt all aspects of a person’s life. Based on the diagnostic criteria of the American Psychiatric Association (DSM-IV) [1], the disease is often described as a personality rupture disorder, because the thoughts and feelings of the patient lose their natural and logical connection with each other. The patient is also affected by cognitive impairment. These deficits can reduce expression memory, short-term memory, language performance, and other executive functions, and can lead to occupational difficulties. They can also help in the early diagnosis of schizophrenia [2]. It is very important to treat schizophrenia as soon as possible after onset. With delayed or ineffective treatment, patients may be at increased risk of brain volume loss, with harmful implications for long-term treatment outcomes.

Extensive studies have been performed using electroencephalogram (EEG) signals based on DSM-IV criteria for the diagnosis of schizophrenia and comparing schizophrenic patients with healthy subjects [3]. Several researchers have reported that using EEG irregularities and paroxysmal dysrhythmias may have a positive effect on the early diagnosis and prediction of schizophrenia [4].

The EEG signals measured from schizophrenic patients appear to differ significantly between the resting state and task conditions. In task-based studies, the time-locked neuronal responses are specific to the designed event; all other spontaneous activity is usually treated as background noise. Many studies suggest that the brain is a system that acts intrinsically, with inherent resting-state integration. External sensory information interacts with, rather than determines, the operation of brain systems. Using resting-state EEG for the diagnosis of schizophrenia seems to give acceptable results. People identified as having schizophrenia can then be referred to a specialist for more accurate diagnosis and management.

In recent years, nonlinear methods and machine learning have been used to classify healthy individuals and patients. Today, decision support systems (DSS) are widely used to diagnose a variety of diseases, such as hepatitis [5, 6] and Parkinson’s [7], which shows that intelligent systems used as DSS can save lives by helping to diagnose diseases. Boostani et al. [8] used support vector machine (SVM) and linear discriminant analysis (LDA) classifiers to classify resting EEG signals of 13 schizophrenic patients and 18 healthy controls. The extracted features, namely autoregressive model coefficients (ARC), sub-band frequencies (SF), fractal dimension (FD), and wavelet energy (WE), were used to classify the signals. A genetic algorithm was also used to select the best EEG recording channels and reduce the dimension of the extracted features. Most of the channels selected by the genetic algorithm came from the temporal lobe regions of the brain, and the classification accuracy improved. Boostani et al. [9] and Sabeti et al. [10] used direct linear discriminant analysis (DLDA), weighted distance nearest neighbor (WDNN), basic nearest neighbor (BsNN), naive Bayes (NB), adaptive distance measure (ADM), LDA, Adaboost, fuzzy SVM (FSVM), and SVM classifiers to classify the healthy and schizophrenic groups. The features extracted from the EEG signal included the AR model coefficients, SF, and FD. Leave-one-out cross-validation was used to obtain reliable accuracy estimates. Their results showed that the SVM, WDNN, and DLDA classifiers were highly accurate. In another study, Sabeti et al. [11] used Adaboost and LDA classifiers to classify the resting-state EEG signals of 20 schizophrenia patients and 20 healthy subjects.
The features extracted from the measured EEG signals included Shannon entropy (ShEn), spectral entropy (SpEn), approximate entropy (ApEn), Lempel-Ziv complexity (LZC), and Higuchi fractal dimension (HFD). The channels carrying discriminatory information were Cz, C3, T3, T4, Fp2, F3, F4, T5, and O2, most of which are situated over the frontal, temporal, and limbic regions. Li et al. [12] chose an artificial neural network (ANN) classifier to classify the EEG signals of depressed and schizophrenia patients as well as healthy individuals. Back-propagation artificial neural networks (BP ANN) and self-organizing competitive artificial neural networks (SOC ANN) were used. They extracted the power spectrum as a feature of the EEG signal. Darkhovsky et al. [13] focused on reducing the dimension of the extracted features, namely the power spectrum (PS). An autoregressive moving average (ARMA) model, which estimates the PS coefficients, reduces the dimension of the features. After a significant reduction in dimension (from 96 to 4), SVM and random forest (RF) classifiers were used for classification. Boostani et al. [14] used three methods, phase-locking value (PLV), robust synchronization (RS), and synchronization likelihood (SL), to extract features from the EEG signal. The greedy overall relevancy (GOR) and across-group variance (AGV) methods were employed to optimize the feature extraction. SVM, DLDA, and modified nearest neighbor (BNN) classifiers were utilized to distinguish patients with schizophrenia from patients with bipolar disorder. Jahmunah et al. [15] used non-linear entropy features to classify the EEG signals of healthy individuals and schizophrenic patients. The results of this study showed that the SVM classifier was more accurate than the LD, k-nearest-neighbour (KNN), probabilistic neural network (PNN), and DT classifiers.
Two other studies [16, 17] used convolutional neural network (CNN) methods based on deep learning to distinguish healthy individuals from schizophrenic patients. They converted the EEG signal to 2D using the short-time Fourier transform (STFT) to extract the useful features. A literature review of the available techniques for schizophrenia classification is provided in Table 1.

Table 1 A summary of the properties of the methods available in EEG signal processing

Here, a new classification approach with efficient features is introduced for distinguishing schizophrenic patients from healthy subjects. The proposed method is based on the adaptive neuro-fuzzy inference system (ANFIS) and classifies EEG signals recorded from 14 schizophrenic patients and 14 age-matched control participants. A second-order Butterworth filter is used to remove possible artifacts from the 16 EEG channels. Four features, namely Shannon entropy, spectral entropy, approximate entropy, and the absolute value of the highest slope of autoregressive coefficients (AVLSAC), were extracted from each selected EEG channel in 5 frequency sub-bands: Delta, Theta, Alpha, Beta, and Gamma. The AVLSAC feature is a novel contribution that does not appear to have been used previously. The results show that our proposed method is superior to the previously published methods.

Materials and methods

Clinical data

We tested our algorithm on 14 healthy controls (7 females: 28.7 ± 3.4 years, 7 males: 26.8 ± 2.9 years) and 14 patients with paranoid schizophrenia (7 females: 28.3 ± 4.1 years, 7 males: 27.9 ± 3.3 years). The patients were hospitalized at the Institute of Psychiatry and Neurology in Warsaw, Poland, and the study protocol was approved by the Ethics Committee of the Institute of Psychiatry and Neurology in Warsaw [25]. The EEG signals were recorded in the resting state with eyes closed (EC) and were measured according to the 10–20 standard for 15 min with a sampling frequency of 250 Hz. The 19 EEG channels were Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, and O2, and the electrode FCz was used as the reference electrode.

The selected areas were the frontal, temporal, and occipital regions. From the 19 main channels, the 16 channels Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, T6, O1, and O2, which have the greatest overlap with the involved areas, were used. The channels were selected based on previous studies [11, 26, 27]. The positions of the electrodes and the brain areas selected for EEG signal processing are shown in Fig. 1.

Fig. 1 The position of the electrodes and brain areas selected for EEG signal processing

Pre-processing

Initially, the EEG signals of the 16 selected channels were divided into 32 non-overlapping segments of 25 s and filtered with a second-order Butterworth filter to remove artifacts. In the second step, each of the 16 channels was decomposed into the 5 frequency sub-bands Delta, Theta, Alpha, Beta, and Gamma by the wavelet algorithm with the Daubechies 2 (db2) mother wavelet.

Five frequency regions of delta (∆), theta (θ), alpha (α), beta (β), gamma (γ) were used for more accurate analysis of the EEG signal. The regions of selected frequency filters included 2–4 Hz, 4.5–7.5 Hz, 8.5–12.5 Hz, 13–30 Hz, and 30–45 Hz, respectively. Figure 2 shows the frequency regions of the F4 channel of a healthy individual and schizophrenic individual.
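The wavelet split described above can be illustrated with a single decomposition level. The following is a minimal pure-Python sketch, not the authors' implementation: the function name `dwt_db2` is hypothetical, and the periodic boundary handling and the single level are assumptions, since the text does not state the decomposition depth or boundary mode that were used.

```python
import math

# Daubechies-2 (db2) low-pass filter, 4 taps.
_S3 = math.sqrt(3.0)
_NORM = 4.0 * math.sqrt(2.0)
H = [(1 + _S3) / _NORM, (3 + _S3) / _NORM, (3 - _S3) / _NORM, (1 - _S3) / _NORM]
# High-pass filter from the quadrature-mirror relation g[k] = (-1)^k * h[3 - k].
G = [H[3], -H[2], H[1], -H[0]]


def dwt_db2(signal):
    """One level of the db2 wavelet transform with periodic extension.

    Returns (approximation, detail), each half the input length.
    Assumes an even number of samples (at least 4).
    """
    n = len(signal)
    if n % 2 != 0 or n < 4:
        raise ValueError("signal length must be even and at least 4")
    approx, detail = [], []
    for i in range(0, n, 2):
        # Four consecutive samples, wrapping around at the end of the signal.
        window = [signal[(i + k) % n] for k in range(4)]
        approx.append(sum(h * x for h, x in zip(H, window)))
        detail.append(sum(g * x for g, x in zip(G, window)))
    return approx, detail


if __name__ == "__main__":
    x = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
    a, d = dwt_db2(x)
    print(len(a), len(d))
```

Applying `dwt_db2` again to the approximation output roughly halves the covered frequency range at each level, which is how a dyadic wavelet decomposition approaches sub-bands such as those listed above.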

Fig. 2 The 5 frequency regions of the F4 channel EEG signal. a Healthy individual, b schizophrenia patient

Feature extraction

Feature selection is of paramount importance since different types of brain activity patterns can be decoded from the EEG signals of patients. Four types of features, namely ShEn, SpEn, ApEn, and the absolute value of the highest slope of autoregressive coefficients (AVLSAC), are used to differentiate healthy individuals from schizophrenic patients. These features reflect the main nature of the EEG signal well. The three entropy methods (spectral, Shannon, and approximate), which are compatible with both non-stationary and stationary signals, are described in detail in [13, 28, 29]. In addition, the AVLSAC seems to be a suitable feature extraction method because it can increase the frequency resolution by using the AR method, which is a powerful algorithm in signal modeling [30].
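To make the three entropy features concrete, the following is a minimal pure-Python sketch of one common definition of each. It is illustrative only: the histogram bin count, the embedding dimension `m = 2`, and the tolerance `r = 0.2` times the standard deviation are assumptions (the study's settings are given in its cited references, not in this text), and the function names are hypothetical.

```python
import cmath
import math
import random


def shannon_entropy(signal, bins=16):
    """Shannon entropy (bits) of the amplitude histogram of `signal`."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return 0.0  # a constant signal has no amplitude uncertainty
    counts = [0] * bins
    for x in signal:
        idx = min(int((x - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(signal)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def spectral_entropy(signal):
    """Shannon entropy (bits) of the normalized power spectrum (naive DFT)."""
    n = len(signal)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0
    return -sum((p / total) * math.log2(p / total) for p in power if p > 0)


def approximate_entropy(signal, m=2, r_factor=0.2):
    """Approximate entropy ApEn(m, r) with r = r_factor * standard deviation."""
    n = len(signal)
    mean = sum(signal) / n
    r = r_factor * math.sqrt(sum((x - mean) ** 2 for x in signal) / n)

    def phi(dim):
        templates = [signal[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r (self-match included).
            matches = sum(1 for b in templates
                          if max(abs(u - v) for u, v in zip(a, b)) <= r)
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(200)]
    print(shannon_entropy(noise), spectral_entropy(noise), approximate_entropy(noise))
```

A regular signal such as a pure tone yields low spectral and approximate entropy, while broadband noise yields high values, which is the property these features exploit.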

The AR model of order \(p\) predicts the input signal \(s\) by minimizing the forward and backward prediction errors using the Burg method [31]. The finite set of AR coefficients \({c}_{i}\), constrained to satisfy the Levinson–Durbin recursion, gives:

$$ s\left( t \right) = - \mathop \sum \limits_{i = 1}^{p} c_{i} s\left( {t - i} \right) $$
(1)

The value \(p\) represents the order of the model. There are several methods to determine the model order, such as decision rules based on the Bayesian approach [32], the number of information measures [33], and the maximum likelihood approach [34]. In this study, the minimum error (threshold) method [35] is used to obtain the best order of the model; the order used is 20. We then calculated the absolute value of the highest slope between the autoregressive coefficients. Figure 3 shows an example of the absolute value of the maximum slope in a specific window of the F4 channel of a healthy individual and a schizophrenia patient.
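The AR-based feature can be sketched as follows. This is a minimal pure-Python illustration of Burg's method using the sign convention of Eq. 1; it is not the authors' code. The function names `burg_ar` and `avlsac` are hypothetical, and reading "the highest slope between the autoregressive coefficients" as the largest absolute difference between consecutive coefficients is an assumption.

```python
import random


def burg_ar(signal, order):
    """Estimate AR coefficients c_1..c_p with Burg's method.

    Sign convention as in Eq. 1: s(t) = -sum_i c_i * s(t - i).
    """
    n = len(signal)
    forward = list(signal)   # forward prediction errors
    backward = list(signal)  # backward prediction errors
    coeffs = [1.0]
    for m in range(order):
        num = sum(forward[i] * backward[i - 1] for i in range(m + 1, n))
        den = sum(forward[i] ** 2 + backward[i - 1] ** 2 for i in range(m + 1, n))
        k = -2.0 * num / den  # reflection coefficient
        # Levinson-Durbin style update of the coefficient vector.
        padded = coeffs + [0.0]
        coeffs = [padded[i] + k * padded[m + 1 - i] for i in range(m + 2)]
        # Update the error sequences in place, from the end backwards.
        for i in range(n - 1, m, -1):
            f_old = forward[i]
            forward[i] = f_old + k * backward[i - 1]
            backward[i] = backward[i - 1] + k * f_old
    return coeffs[1:]


def avlsac(coeffs):
    """Largest absolute difference between consecutive AR coefficients.

    Assumed reading of 'absolute value of the highest slope'.
    """
    return max(abs(b - a) for a, b in zip(coeffs, coeffs[1:]))


if __name__ == "__main__":
    random.seed(1)
    # Synthetic stand-in for one 10 s window at 250 Hz (not real EEG).
    window = [random.gauss(0.0, 1.0) for _ in range(2500)]
    print(avlsac(burg_ar(window, 20)))
```

With the order of 20 used in the study, `burg_ar` returns 20 coefficients per window and `avlsac` reduces them to a single scalar feature.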

Fig. 3 Example of the absolute value of the maximum slope in a specific window of the F4 channel EEG

Classifiers

Accurate classification and labeling of the different groups depend on choosing proper classifiers. Three types of classifiers, SVM [36, 37], ANN [38], and ANFIS [39], were selected to classify the two groups, since these three classifiers appear to offer good discriminative power and compatibility with the EEG signals of patients with schizophrenia.

SVM structure

Nonlinear SVM with a radial basis function (RBF) kernel: given a training set of instance-label pairs \(\left( {x_{i} ,y_{i} } \right)\), \(i = 1, \ldots ,l\), where \(x_{i} \in {\mathcal{R}}^{n}\) and \(y \in \left\{ { - 1, + 1} \right\}^{l}\), support vector machines (SVM) require the solution of the following (primal) optimization problem:

$$ \begin{gathered} \mathop {\min }\limits_{w,b,\xi } \frac{1}{2}\left( {w^{T} w} \right) + C\mathop \sum \limits_{i = 1}^{l} \xi_{i} \hfill \\ subject\,\,to\,\,y_{i} \left( {w^{T} z_{i} + b} \right) \ge 1 - \xi_{i} , \xi_{i} \ge 0, i = 1, \ldots , l \hfill \\ \end{gathered} $$
(2)

Here, the training vector \({x}_{i}\) is mapped into a higher- (possibly infinite-) dimensional space by the function \(\phi\) as \(z_{i} = \phi \left( {x_{i} } \right)\). \(C > 0\) is the penalty parameter of the error term.

Usually, Eq. 2 is solved through the following dual problem:

$$ \begin{gathered} \mathop {\min }\limits_{\alpha } F\left( \alpha \right) = \frac{1}{2}\alpha^{T} \varrho \alpha - e^{T} \alpha \hfill \\ subject\,\,to\,\,0 \le \alpha_{i} \le C , i = 1, \ldots , l, y^{T} \alpha = 0. \hfill \\ \end{gathered} $$
(3)

where e is the vector of all ones and \(\varrho\) is an l by l positive semidefinite matrix.

The \(\left( {i,j} \right)\)th element of \(\varrho\) is given by \(\varrho_{ij} \equiv y_{i} y_{j} k\left( {x_{i} ,x_{j} } \right),\) where \(k\left( {x_{i} ,x_{j} } \right) \equiv \phi \left( {x_{i} } \right)^{T} \phi \left( {x_{j} } \right)\) is called the kernel function. Then \(w = \mathop \sum \limits_{i = 1}^{l} \alpha_{i} y_{i} \phi \left( {x_{i} } \right)\) and the decision function is as follows:

$$ sgn\left( {w^{T} \phi \left( x \right) + b} \right) = sgn\left( {\mathop \sum \limits_{i = 1}^{l} \alpha_{i} y_{i} k\left( {x_{i} ,x} \right) + b} \right) $$
(4)

For two samples \(\tilde{x }\) and \(\overline{x }\), taken as feature vectors in some input space, the Gaussian radial basis function (RBF) kernel is defined as follows:

$$ k\left( {\tilde{x},\overline{x}} \right) = \exp \left( { - \frac{{\left\| {\tilde{x} - \overline{x}} \right\|^{2} }}{{2\sigma^{2} }}} \right), \gamma = \frac{1}{{2\sigma^{2} }} $$
(5)

The best value for the kernel function coefficient, \(\gamma = 0.6\), was selected. As the gamma value increases, the algorithm tries to fit the training data set exactly, which increases the generalization error and leads to over-fitting. The C parameter trades off the correct classification of training examples against the maximization of the decision function’s margin. The optimal value of C was found to be 0.1. Figure 4 shows the architecture of the SVM and the structure of the network in detail.
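The kernel and decision function above can be written out in a few lines. The sketch below is a pure-Python illustration with \(\gamma = 0.6\) as reported; the support vectors, multipliers, and bias in the example are made-up toy values (hypothetical, chosen only so that each multiplier respects the bound C = 0.1) and do not come from the study.

```python
import math


def rbf_kernel(x, z, gamma=0.6):
    """Gaussian RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, alphas, labels, bias, gamma=0.6):
    """Sign of sum_i alpha_i * y_i * k(x_i, x) + b, the decision rule of Eq. 4."""
    score = sum(a * y * rbf_kernel(sv, x, gamma)
                for sv, a, y in zip(support_vectors, alphas, labels))
    return 1 if score + bias >= 0 else -1


if __name__ == "__main__":
    # Toy model: one support vector per class (illustrative values only).
    svs = [[0.0, 0.0], [2.0, 2.0]]
    alphas = [0.1, 0.1]
    labels = [-1, 1]
    print(svm_decision([1.8, 2.1], svs, alphas, labels, bias=0.0))
```

A larger `gamma` makes the kernel decay faster with distance, so each support vector influences a smaller neighbourhood, which is the mechanism behind the over-fitting risk described above.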

Fig. 4 The architecture of the SVM (a) and the structure of the network in detail (b)

ANN structure

A multi-layer perceptron neural network (MLPNN), one of the most widely used neural networks, was used for the classification procedure and trained by the supervised learning method known as backpropagation [40]. It is a feed-forward artificial neural network used for the classification of data. It has several layers of nodes arranged in a directed graph, and every layer is fully connected to the subsequent layer. It is a variation of the linear perceptron that can differentiate samples that are not linearly separable. Learning occurs inside the network by updating the connection weights after each piece of data is processed, based on the error between the obtained output and the desired result.

It is designed with three layers: an input layer, a hidden layer, and an output layer. Neurons in the input layer act as buffers for distributing the input signals \({z}_{p}\) to neurons in the hidden layer. Each neuron \(n\) in the hidden layer sums its input signals \({z}_{p}\) after weighting them with the strengths of the respective connections \({w}_{np}\) from the input layer. It then calculates its output \({y}_{n}\) by passing the sum through a nonlinear activation function, namely the binary sigmoid function, as presented in the following equation.

$$ y_{n} = sigm\left( {\mathop \sum \limits_{p} { }w_{np} z_{p} } \right){ } $$
(6)

The backpropagation algorithm is a gradient descent algorithm [41]. When the momentum term is added, the algorithm gives the change \(\Delta w_{np} \left( k \right)\) in the weight of the connection between neurons \(n\) and \(p\) as follows:

$$\Delta {w}_{np}\left(k\right)=\gamma {\delta }_{n}{z}_{p}+\alpha \Delta {w}_{np}\left(k-1\right)$$
(7)

where \({\delta }_{n}\) is a factor that depends on whether neuron \(n\) is a hidden neuron or an output neuron, \(\gamma \) is the learning rate parameter, and \(\alpha \) is the momentum coefficient. For an output neuron, \({\delta }_{n}\) is obtained as follows:

$${\delta }_{n}=\left(\frac{\partial f}{\partial {net}_{n}}\right)\left({u}_{n}^{(k)}-{y}_{n}\right) , k=1,.., n$$
(8)

where \(u_{n}^{\left( k \right)}\) is the desired output for neuron \(n\) and \( \frac{\partial f}{{\partial net_{n} }} = y_{n} \left( {1 - y_{n} } \right)\). Here f denotes the sigmoid function. Thus, iteratively, beginning with the output layer, the \(\delta \) term is computed for neurons in all layers and the weights of all connections are updated according to Eq. 7 [38].
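Equations 6 to 8 can be traced for a single neuron with the short sketch below. It is a pure-Python illustration, not the trained network of the study: the learning rate of 0.01 is the reported value, whereas the momentum coefficient of 0.9, the inputs, and the target are assumptions chosen for the example, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Binary sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(weights, inputs):
    """Eq. 6: y_n = sigm(sum_p w_np * z_p) for one neuron."""
    return sigmoid(sum(w * z for w, z in zip(weights, inputs)))


def output_delta(target, y):
    """Eq. 8 for an output neuron, using the sigmoid derivative y * (1 - y)."""
    return y * (1.0 - y) * (target - y)


def momentum_update(delta, z, previous_change, learning_rate=0.01, momentum=0.9):
    """Eq. 7: weight change = gamma * delta_n * z_p + alpha * previous change."""
    return learning_rate * delta * z + momentum * previous_change


if __name__ == "__main__":
    weights, changes = [0.0, 0.0], [0.0, 0.0]
    inputs, target = [1.0, 0.5], 1.0  # illustrative values
    for _ in range(200):
        y = neuron_output(weights, inputs)
        d = output_delta(target, y)
        changes = [momentum_update(d, z, c) for z, c in zip(inputs, changes)]
        weights = [w + c for w, c in zip(weights, changes)]
    print(neuron_output(weights, inputs))
```

The momentum term reuses a fraction of the previous weight change, which smooths the updates and can speed up convergence along consistent gradient directions.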

Figure 5 shows the architecture of the perceptron neural network and the structure of the network in detail. The MLPNN was trained several times with different numbers of hidden neurons until the best performance was achieved. The best performance was obtained with 50 hidden neurons and a learning rate of \(\gamma = 0.01\).

Fig. 5 The architecture of the perceptron neural network (a) and the structure of the network in detail (b)

ANFIS structure

The adaptive neuro-fuzzy inference system (ANFIS) is a type of artificial neural network that learns features in the data set and adjusts the system parameters according to a given error criterion. It is widely used in biological signal analysis. To improve generalization, the ANFIS classifiers were trained with a hybrid of back-propagation gradient descent (GD) and the least-squares estimation (LSE) technique. Feature vectors were used as inputs for ANFIS. Binary values of (1, 0) and (0, 1) were set as the target outputs for patients with schizophrenia and control subjects, respectively. For simplicity, we assume that the fuzzy inference system under consideration has two inputs \({k}_{1}\) and \({k}_{2}\) and one output \(u\). Figure 6 shows our implemented ANFIS architecture. This system is based on the Takagi–Sugeno fuzzy inference system, and the fuzzy rule structure of the ANFIS classifiers was created by utilizing a generalized bell-shaped membership function defined as follows:

$$ \beta A_{i} \left( {k_{1} } \right) = \frac{1}{{1 + \left\{ {\left( {\frac{{k_{1} - t_{i} }}{{a_{i} }}} \right)^{2} } \right\}^{{b_{i} }} }} $$
(9)

where \({a}_{i}\), \({b}_{i}\), and \({t}_{i}\) are adaptable parameters, \({k}_{1}\) is the input to node \(i\), and \({A}_{i}\) is the linguistic label. Next, two first-order Sugeno-type ANFIS models are implemented with inputs of feature vectors and one output. The first-order Sugeno fuzzy models have the following rules:

$$ \begin{gathered} R_{1} :\,if \left( {k_{1} \,is \,A_{1} \,and\, k_{2} \,is \,B_{1} } \right) \hfill \\ then\,u\,is \,f_{1} = n_{1} + p_{1} k_{1} + q_{1} k_{2} \hfill \\ R_{2} :\,if \left( {k_{1} \,is \,A_{2} \,and\, k_{2} \,is\, B_{2} } \right) \hfill \\ then\,u \,is \,f_{2} = n_{2} + p_{2} k_{1} + q_{2} k_{2} \hfill \\ \end{gathered} $$
(10)

where \(R_{i}\) is the \(i\)th rule of the fuzzy system, \(k_{i} \left( {i = 1,2} \right)\) are the inputs to the fuzzy system, and u is the output of the fuzzy system; \(\left\{ {n_{i} ,p_{i} , and\,q_{i} } \right\} \left( {i = 1,2} \right)\) are the linear consequent parameters of the rules.

Fig. 6 The architecture of ANFIS (a) and the structure of the network in detail (b)

Every node in layer 2 is a square node labeled \(\mathop \prod \limits_{ } \) which multiplies the incoming signals and sends the product out. For instance:

$${w}_{i}={\beta A}_{i}\left({k}_{1}\right)\times {\beta B}_{i}\left({k}_{2}\right)$$
(11)

Every node in layer 3 is a circle node labeled \(N\). The ith node calculates the ratio of the ith rule’s firing strength to the sum of all rules’ firing strengths:

$${\overline{w} }_{i}=\frac{{w}_{i}}{{w}_{1}+{w}_{2}}$$
(12)

Every node i in layer 4 is a square node with a node function:

$${\overline{w} }_{i}{f}_{i}={\overline{w} }_{i}({n}_{i}+{p}_{i}{k}_{1}+{q}_{i}{k}_{2})$$
(13)

The following formula produces the ANFIS output:

$$ U = \mathop \sum \limits_{i = 1,2} \overline{w}_{i} f_{i} = \frac{{\mathop \sum \nolimits_{i} w_{i} f_{i} }}{{\mathop \sum \nolimits_{i} w_{i} }} $$
(14)

ANFIS learns and updates all its modifiable parameters by using a two-pass learning algorithm consisting of a forward pass and a backward pass. ANFIS trains its parameters, namely the membership function (premise) parameters \({a}_{i}\), \({b}_{i}\), and \({t}_{i}\) and the linear (consequent) parameters \({n}_{i}\), \({p}_{i}\), and \({q}_{i}\), to minimize the error between the actual and the desired outputs using a hybrid of GD and LSE. For our data, four bell-shaped membership functions and 100 training epochs gave the best results.
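The forward pass defined by Eqs. 9 to 14 can be sketched for the two-input, two-rule case as follows. This is a minimal pure-Python illustration with made-up parameter values; it covers only the forward computation, not the hybrid GD and LSE training, and the function names are hypothetical.

```python
def bell_mf(x, a, b, t):
    """Generalized bell membership function of Eq. 9."""
    return 1.0 / (1.0 + (((x - t) / a) ** 2) ** b)


def anfis_forward(k1, k2, premise, consequent):
    """Forward pass of a two-input first-order Sugeno ANFIS.

    premise: per rule, a pair ((a, b, t) for k1, (a, b, t) for k2).
    consequent: per rule, a triple (n, p, q).
    """
    # Layer 2: firing strength of each rule (Eq. 11).
    firing = [bell_mf(k1, *mf_a) * bell_mf(k2, *mf_b) for mf_a, mf_b in premise]
    # Layer 3: normalized firing strengths (Eq. 12).
    total = sum(firing)
    normalized = [w / total for w in firing]
    # Layer 4: first-order rule outputs (Eq. 10), weighted as in Eq. 13.
    rule_out = [n + p * k1 + q * k2 for n, p, q in consequent]
    # Output: weighted sum (Eq. 14).
    return sum(w * f for w, f in zip(normalized, rule_out))


if __name__ == "__main__":
    # Illustrative parameters only, not values from the study.
    premise = [((1.0, 2.0, 0.0), (1.0, 2.0, 0.0)),
               ((1.0, 2.0, 2.0), (1.0, 2.0, 2.0))]
    consequent = [(0.0, 1.0, 1.0), (1.0, -0.5, 0.5)]
    print(anfis_forward(0.4, 1.3, premise, consequent))
```

In the hybrid scheme described above, the consequent triples would be fitted by least squares in the forward pass and the premise triples by gradient descent in the backward pass.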

Evaluation parameters

Cross-validation is a model evaluation method that determines how well the results of a statistical analysis on a data set generalize to data that are independent of the training data. The data set is divided into k equal subsets. A fraction n/k of the data is randomly selected for training and a fraction m/k for testing the model (\(m+n=k, n\ge m\)). This procedure is repeated k times. Sixty percent of the individuals' feature vectors are used as training samples and 40 percent as test samples. For further validation, the accuracy was calculated for five random repetitions, and the average of the five iterations was reported.
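One way to realize the split described above is sketched below in pure Python. It assumes k = 5 subsets with three used for training and two for testing (which gives roughly 60/40) and averages a score over five random repetitions; the function names and the `evaluate` callback are hypothetical, and the exact randomization used in the study is not specified in the text.

```python
import random


def subset_split(n_items, k=5, n_train_subsets=3, seed=0):
    """Split item indices into k subsets and pick some of them for training.

    With k=5 and n_train_subsets=3 this yields roughly a 60/40 split.
    """
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    subsets = [indices[i::k] for i in range(k)]
    chosen = set(rng.sample(range(k), n_train_subsets))
    train = [i for s in range(k) if s in chosen for i in subsets[s]]
    test = [i for s in range(k) if s not in chosen for i in subsets[s]]
    return train, test


def repeated_accuracy(evaluate, n_items, repeats=5):
    """Average the score returned by evaluate(train, test) over random splits."""
    scores = [evaluate(*subset_split(n_items, seed=r)) for r in range(repeats)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    train, test = subset_split(20)
    print(len(train), len(test))
```

Here `evaluate` stands for any routine that trains a classifier on the `train` indices and returns its accuracy on the `test` indices.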

Figure 7 shows the confusion matrix. The accuracy and error of the results were obtained through the following mathematical formulas:

$$ Accuracy = \frac{TP + TN}{{FP + FN + TP + TN}} = 1 - Error $$
(15)

Here true positives (TP) are the number of positive samples that the model correctly classified as positive; true negatives (TN) are the number of negative samples that the model correctly classified as negative; false positives (FP) mean the number of negative samples that the model wrongly classified as positive; false negatives (FN) are the number of positive samples that the model wrongly classified as negative.
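Equation 15 can be checked on a small example. The sketch below counts the four confusion-matrix entries from label lists and computes the accuracy; the labels in the example are made up for illustration and the function names are hypothetical.

```python
def confusion_counts(true_labels, predicted_labels, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification result."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Eq. 15: fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0]      # 1 = patient, 0 = control (illustrative)
    predicted = [1, 1, 0, 0, 0, 1]
    print(accuracy(*confusion_counts(truth, predicted)))
```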

Fig. 7 The confusion matrix

Result

Sixteen of the 19 main EEG channels of 14 schizophrenia patients and 14 healthy controls were selected and divided into 5 frequency bands (delta, theta, alpha, beta, and gamma) because of the complex and non-stationary nature of the EEG signal [42]. The ShEn, ApEn, SpEn, and AVLSAC features were then extracted from these frequency bands. To calculate ApEn, ShEn, and AVLSAC, the frequency band of each channel was divided into windows of six, four, and ten seconds, respectively. With an available signal length of 900 s, this gave 150 windows for ApEn, 225 windows for ShEn, and 90 windows for AVLSAC. The average of the feature values over all windows was then reported as the feature. To obtain SpEn, the whole signal was used. With four feature extraction methods and five frequency bands, a feature vector of length 20 was calculated for each channel of each person.
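The window counts and the per-window averaging can be reproduced with the short sketch below. It is a pure-Python illustration in which `feature` is any function mapping a list of samples to a number; the helper names are hypothetical, and non-overlapping windows are assumed.

```python
def window_count(duration_s, window_s):
    """Number of complete non-overlapping windows in a recording."""
    return int(duration_s // window_s)


def windowed_mean(signal, fs, window_s, feature):
    """Average of `feature` over consecutive non-overlapping windows."""
    size = int(fs * window_s)
    values = [feature(signal[start:start + size])
              for start in range(0, len(signal) - size + 1, size)]
    return sum(values) / len(values)


if __name__ == "__main__":
    # 900 s of signal with windows of 6 s, 4 s, and 10 s.
    print(window_count(900, 6), window_count(900, 4), window_count(900, 10))
    # 4 features x 5 frequency bands per channel.
    print(4 * 5)
```

For example, `windowed_mean(band_signal, 250, 10, feature)` would average a feature over the 90 ten-second windows of one band-limited channel, where `band_signal` is a placeholder for that channel's samples.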

Figure 8 shows the accuracy of the ANFIS classifier for the 16 channels with SpEn, ShEn, ApEn, and AVLSAC as feature extraction methods. The accuracy range shown in this figure is between 86% and 97%. Accuracies were calculated in the five frequency bands. The best accuracy, 97%, was obtained by the ANFIS classifier for channel O1 in the alpha band with the AVLSAC method. The best available features for each channel in the 5 frequency regions are summarized in Table 2. In this table, the five frequency regions and the four feature extraction methods are listed separately for each of the 16 channels. A check mark in Table 2 indicates the best feature for each channel, that is, the one with the highest accuracy. A total of 46 of 640 possible features for healthy subjects and schizophrenic patients were selected. The accuracies of the ANFIS, SVM, and ANN classifiers with the best features under five-fold cross-validation were close to 100%, 98.89%, and 95.56%, respectively.

Fig. 8 The accuracy of the ANFIS classifier containing 16 channels with SpEn, ShEn, ApEn, and AVLSAC as feature extraction methods. This figure covers the accuracy range from 86 to 97

Table 2 The selection of the best available features for each channel in 5 frequency regions

The largest difference in standard deviation between the patient and healthy groups was obtained in channel O1, as can be seen in Fig. 9. The absolute value of this difference is 1.3, for the AVLSAC feature in the alpha band. Figure 9 shows a statistical chart of the mean and standard deviation of channel O1 in the 5 sub-frequencies and 4 feature extraction methods. This chart includes channel O1 of the 14 patients and 14 healthy individuals.

Fig. 9 Statistical chart of the mean and standard deviation of the channel O1 in 5 sub-frequencies and 4 feature extraction methods

Discussion

Today, the use of DSS in the diagnosis of various diseases is relatively common; therefore, recent studies have endeavored to design and present DSS with acceptable accuracy [5,6,7]. A DSS requires signals that contain useful information in order to function properly; the EEG signal contains useful information on brain function and disease diagnosis and has therefore been considered by the majority of researchers. In this study, we attempted to provide an acceptable DSS for schizophrenia using the information contained in the EEG signal. An ideal classifier should detect patients at a high rate while correctly identifying all healthy controls. Therefore, the appropriate features and the classifier must be selected together, which ultimately leads to high accuracy.

The four features SpEn, ApEn, ShEn, and AVLSAC were selected as explained in Section “Feature extraction”. These features capture the momentary changes and the complexity of the signal and are compatible with the non-stationary property of the data. In addition, the AVLSAC can reveal the hidden nature of the signal owing to its ability to increase the frequency resolution. Figure 10 shows the performance of the features in the three classifiers SVM, ANN, and ANFIS for each channel of the dataset presented in Section “Clinical data”. The accuracy range shown in this figure is between 92 and 97. According to Fig. 10, the area most commonly involved across the three classifiers is the frontal area in the delta band, and the AVLSAC feature shows acceptable performance and the highest accuracy among the classifiers.

Fig. 10 The performance of the features in the three classifiers of SVM, ANN, and ANFIS with a common dataset. The accuracy range is between 92 and 97

We compared our results with several available methods. Table 3 presents the classification accuracy of the ANFIS method and earlier methods with common feature dimensions. Typically, 5-fold or 10-fold cross-validation is used for validation. Figure 11 shows a comparison of the accuracy results obtained with different k values from 5 to 10 in k-fold cross-validation for the three classifiers. These results show that our method obtains the highest classification accuracy (99.92%) with 5-fold cross-validation.

Table 3 The comparison results of the ANFIS accuracy with other available methods

Fig. 11 The performance of three implemented classifiers with different k values from 5 to 10 in k-fold cross-validation

Our dataset was previously evaluated by Shu Lih Oh et al. [17], who reported a classification accuracy of 98.07%. Their method was a CNN model that used raw EEG signals as inputs. It used an eleven-layered CNN model to distinguish schizophrenic patients from controls, performing the extraction, selection, and classification steps automatically. The same dataset was also evaluated with a CNN model proposed by Zülfikar Aslan et al. [16]. Their method uses 2-D time–frequency features for the automatic diagnosis of schizophrenic patients and obtained 97% accuracy. The second dataset was previously evaluated by Santos-Mayo et al. [48], who reported 93.42% classification accuracy; their method was a multi-layer perceptron (MLP) that used EEGLAB and J5 for feature extraction.

Compared with the previous methods related to our work, our method outperforms the majority of them. Our improved performance seems to be largely due to three steps applied to the raw EEG signals: (1) converting the EEGs to sub-frequency regions that explicitly represent the signal frequency information, (2) using the AVLSAC feature extraction method, and (3) selecting the best features. The AVLSAC seems to be a suitable feature extraction method because it can increase the frequency resolution by using the AR method, which is a powerful algorithm in signal modelling [30]. Table 4 summarizes the relevant literature methods according to the methodology and the accuracy obtained. The table shows that the method proposed in this paper outperforms all the methods listed in it.

Table 4 The comparison results of our method and the different methods available on the common dataset

Conclusion

It is beneficial to present a new decision support system (DSS) as an interactive information system that handles and analyzes large volumes of data, allowing for better-informed decision making, timely problem solving, and improved efficiency in screening schizophrenia patients. In this paper, we present a new computerized information system that can receive a person’s EEG signal and distinguish schizophrenic patients from healthy subjects with high accuracy. Individuals identified by the system as having schizophrenia can then be referred to a specialist for more accurate diagnosis and management.