1 Introduction

Alcohol addiction is perhaps the most widely recognized mental disorder associated with extensive morbidity and mortality [1]. According to the World Health Organization (WHO) report of 2014, almost 3.3 million deaths (5.9% of all deaths) are attributable to alcohol consumption [2], which is the fifth leading cause of death [3] and the principal risk factor for premature death and disability [4]. Alcohol dependency has a wide range of health effects, for example liver and heart diseases, mental impairment, and certain cancers. Likewise, alcohol consumption is a major cause of various kinds of social harm, such as street crime, road accidents, and social problems, and it contributes to family breakdown [5, 20]. Alcoholics suffer from various cognitive deficits, for example learning and memory impairments, problems with motor skills, and lasting behavioral changes that include anxiety and depression [6, 7].

The electroencephalogram (EEG) is one of the most important medical tools for studying brain events, functions, and disorders. EEG signals record the electrical activity produced by the firing of neurons in different brain regions [8, 22, 39, 42]. The recorded EEG signals are highly complex in nature and amount to a large volume of data to be examined. Usually, trained clinicians rely on visual inspection to decide whether the signals come from normal or alcoholic subjects. Even experienced clinicians can fail to identify the differences between signals because of the presence of noise [9, 40, 41, 43]. The motivation of this study is therefore to develop an automated analysis system that diagnoses alcoholism with acceptable accuracy, in response to the growing need for proper diagnosis and education regarding neurological abnormalities. Such a system can help to provide early warnings of impending illness [21].

For the automated identification of alcoholic and normal EEG signals, the literature offers time-domain, frequency-domain, non-linear feature based, auto-regressive (AR), and time-frequency approaches. Methods that work purely in the time domain or purely in the frequency domain are not well suited to non-stationary signals, because such signals have time-varying characteristics; time-frequency analysis is therefore necessary. In study [13], AR and fast Fourier transform (FFT) methods were used to estimate the power distribution of EEG signals for classification purposes. In [14], a computerized method is proposed that integrates an AR model, a fuzzy-based adaptive approach, and principal component analysis (PCA).

Several non-linear features, such as approximate entropy (ApEn), largest Lyapunov exponent (LLE), sample entropy (SampEn), correlation dimension (CD), and Hurst exponent (H), along with higher-order spectral features, are employed in studies [10,11,12] to classify normal and alcoholic EEG signals. In study [15], time-frequency approaches are suggested for discriminating normal from alcoholic EEG signals. A spectral entropy based approach is proposed in [16], which indicates that the gamma band is suitable for extracting useful information related to alcoholism. Recently, a graph-based approach combined with non-linear features was presented for identifying the signals of alcoholic subjects. Despite the extensive work in the alcoholism EEG field, there is still a need for a stable automated system that uses few features and yet delivers high classification results across several performance measures.

To address the aforementioned issues, this study presents a stable computer-aided diagnosis framework that uses only a single type of feature to obtain high classification results across different performance measures. In the alcoholism EEG field, most of the work focuses only on classification accuracy and uses several distinct features for the classification of EEG signals [29, 30]. In the proposed computerized system, we first divide each category’s EEG data into several segments of an optimal duration and remove artifacts from each segment. We treat each segment as one signal. Second, we compute the autocorrelation of each EEG signal to enhance its quality and reduce its sensitivity to noise. Third, we take the autocorrelation coefficients as features and concatenate them for decision making. Finally, the statistically significant features are provided as input to bayesnet, naïve Bayes, support vector machine with the radial basis function, linear, and sigmoid kernels, logistic regression, multi-layer perceptron, simple logistic, sequential minimal optimization, voted perceptron, k-nearest neighbor, k star, locally weighted learning, AdaBoost, bagging, logit boost, rotation forest, decision stump, Hoeffding tree, J48, logistic model tree, random forest, and random tree classifiers. The results from these classifiers are verified with several performance measures, namely accuracy, sensitivity, specificity, precision, F-measure, area under the receiver operating characteristic curve (AUC), and Matthews correlation coefficient (MCC).

2 Materials

The EEG alcoholism dataset was acquired from the brains of alcoholic and control subjects. This dataset is publicly available for research purposes (https://archive.ics.uci.edu/ml/datasets/EEG+Database). It provides recordings from 64 electrodes placed on the scalps of the subjects. The electrodes were positioned at standard sites (American Electroencephalographic Association 1990). The sampling frequency of the recorded EEG signals is 256 Hz. There are two categories of subjects: normal and alcoholic. Across the two categories there are 122 subjects, and each subject completed 120 trials. The recordings provide 32 s EEG data segments. There are three versions of the EEG dataset: the small data set, the large data set, and the full data set. In this study, the small data set is used for the experiments [17, 18].

Figure 1 shows a visual representation of alcoholic and control EEG signals.

3 Methods

In this study, an auto-correlation based approach is presented for the classification of normal and alcoholic EEG signals. The overall workflow of the proposed framework is divided into the following modules: pre-processing, auto-correlation as feature extraction, and classification, as shown in Fig. 2. These modules are described below.

Fig. 1. Graphical representation of normal and alcoholic EEG signals

Fig. 2. Block diagram of the proposed auto-correlation framework for classification of normal and alcoholic EEG signals

3.1 Module 1: Pre-processing

The dataset contains EEG signals recorded at a 256 Hz sampling rate with 12-bit resolution for 32 s (about 16400 samples). In this study, the experiments use the small data set, in which a baseline filter removes artifacts such as eye blinks and muscle movements (>73.3 \(\upmu \)V). Each long EEG recording is divided into four equal eight-second segments of 2048 samples for further analysis.
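The segmentation step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes non-overlapping segments, uses a hypothetical all-zero list as a stand-in for one channel of a recording, and does not show the artifact filter. The function name `segment_recording` is introduced here for illustration only.

```python
def segment_recording(samples, segment_length=2048):
    """Split one recording into consecutive, non-overlapping segments.

    Trailing samples that do not fill a whole segment are dropped.
    """
    n_segments = len(samples) // segment_length
    return [
        samples[i * segment_length:(i + 1) * segment_length]
        for i in range(n_segments)
    ]


SAMPLING_RATE_HZ = 256   # sampling rate stated for the dataset
SEGMENT_LENGTH = 2048    # samples per segment, as stated in the text

# Hypothetical stand-in for one channel of a recording (4 x 2048 samples).
recording = [0.0] * (4 * SEGMENT_LENGTH)

segments = segment_recording(recording, SEGMENT_LENGTH)

# Duration of one segment in seconds: 2048 samples / 256 Hz.
segment_seconds = SEGMENT_LENGTH / SAMPLING_RATE_HZ
```

At 256 Hz, a 2048-sample segment lasts 8 s, so four such segments cover 32 s of signal.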

3.2 Module 2: Auto-correlation as Feature Extraction

A signal can also be correlated with shifted segments of itself. This technique, known as autocorrelation, amounts to computing the cross-correlation of a signal with itself. The autocorrelation function essentially measures how strongly a signal is correlated with shifted versions of itself. Autocorrelation is helpful for identifying repeated portions of a signal and provides information on how the signal relates to its neighboring samples. Determining how successive segments of a signal relate to each other gives some insight into the way in which the underlying processes have created or changed the signal. For instance, a signal that remains highly correlated with itself over some duration must have been produced, or modified, by a process that took past values of the signal into account. Such a process can be described as having memory, since it must remember past values of the signal and use this information to shape the signal's present values. The longer the memory, the longer the signal remains partially correlated with shifted versions of itself. The mathematical formulation of the autocorrelation function is given as follows [19]:

$$\begin{aligned} A_{aa}[k]=\frac{1}{M} \sum _{m=1}^{M} a[m]\, a[m+k], \quad k=0,1,2, \ldots , K \end{aligned}$$
(1)

where M denotes the number of data points and k is the displacement, which varies from 0 to K; the value of K depends on how the endpoints are treated. Specifically, this approach correlates the signal with K shifted versions of itself, where K may be fairly large. The shift k is also called the lag and specifies the number of samples by which the signal is shifted for a particular correlation. If the signals being autocorrelated are functions of time, lags can be converted into time shifts in seconds.

Feature extraction selects and/or combines variables into features in order to efficiently reduce the amount of data to be processed while still representing the original data accurately and completely [25, 26]. In the EEG-based alcoholism field, several feature extraction approaches have been used, such as time-domain, frequency-domain, time-frequency-domain, linear, and non-linear approaches; however, each method has inherent deficiencies. Despite the intensive work, there is still a need for effective features for the classification of normal and alcoholic EEG signals. In the present work, we employ autocorrelation coefficients as features. We computed autocorrelation functions for different lag values to obtain the autocorrelation coefficients. After empirical evaluation, the lag value is set to 40 for each trial. Figure 3 shows the autocorrelation coefficients for trial 1 of the normal and alcoholic EEG signals. There is a notable difference between the coefficients of the two classes, which indicates the potential of these features for better discrimination.
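A direct evaluation of Eq. (1) can be sketched as below. Several points are assumptions made for this illustration and are not stated in the text: products whose index m + k runs past the end of the signal are treated as zero (one possible treatment of the endpoints), the coefficients for k = 0 to 40 are all kept as the feature vector, and the input is a synthetic 10 Hz sine rather than real EEG data. The function name `autocorrelation` is introduced here for illustration.

```python
import math


def autocorrelation(a, max_lag=40):
    """Return [A_aa[0], ..., A_aa[max_lag]] following Eq. (1).

    Indices are 0-based here, whereas Eq. (1) counts m from 1.
    Assumption: terms whose index m + k falls beyond the end of the
    signal are treated as zero, so each sum runs over M - k products.
    """
    M = len(a)
    coeffs = []
    for k in range(max_lag + 1):
        total = sum(a[m] * a[m + k] for m in range(M - k))
        coeffs.append(total / M)
    return coeffs


# Hypothetical test signal: a 10 Hz sine sampled at 256 Hz for 8 s.
fs = 256
signal = [math.sin(2 * math.pi * 10 * n / fs) for n in range(2048)]

# Feature vector of 41 coefficients (lags 0 through 40) for this segment.
features = autocorrelation(signal, max_lag=40)
```

With this endpoint treatment, the lag-0 coefficient is the mean squared value of the segment, and no coefficient exceeds it in magnitude.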

Fig. 3. Autocorrelation coefficients for alcoholic (blue color) and normal (red color) EEG signal (Color figure online)

3.3 Module 3: Classification

To separate normal and alcoholic EEG signal features, we employed bayesnet (C1), naïve Bayes (C2), support vector machine with the radial basis function, linear, and sigmoid kernels (C3, C4, and C5), logistic regression (C6), multi-layer perceptron (C7), simple logistic (C8), sequential minimal optimization (C9), voted perceptron (C10), k-nearest neighbor (C11), k star (C12), locally weighted learning (C13), AdaBoost (C14), bagging (C15), logit boost (C16), rotation forest (C17), decision stump (C18), Hoeffding tree (C19), J48 (C20), logistic model tree (C21), random forest (C22), and random tree (C23) classifiers [23, 24].
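To make the classification step concrete, the sketch below implements one of the listed classifiers, k-nearest neighbor (C11), in plain Python. It is an illustration under stated assumptions, not the configuration used in the study: the choice of k = 3, the Euclidean distance, and the short three-value feature vectors are all hypothetical, and the function name `knn_predict` is introduced here.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors."""
    # Euclidean distance from x to every training feature vector.
    distances = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest training vectors.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors standing in for autocorrelation coefficients.
train_X = [
    [0.90, 0.80, 0.70], [1.00, 0.70, 0.60], [0.95, 0.75, 0.65],  # alcoholic
    [0.20, 0.10, 0.00], [0.30, 0.10, 0.05], [0.25, 0.15, 0.02],  # normal
]
train_y = ["alcoholic"] * 3 + ["normal"] * 3

prediction_a = knn_predict(train_X, train_y, [0.92, 0.78, 0.66], k=3)
prediction_n = knn_predict(train_X, train_y, [0.22, 0.12, 0.01], k=3)
```

In practice each feature vector would hold the autocorrelation coefficients of one EEG segment, and the predictions would be scored on segments held out from training.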

4 Results and Discussions

The statistical significance of the features extracted from all trials is tabulated in Table 1, where T denotes the trial. The table shows a marked difference between the mean and standard deviation (STD) of the two categories. The probability (P) values for all features are also very small, indicating that the features are statistically significant.

Table 1. Statistical analysis of features

The performance of the proposed framework is measured by the following performance measures [27, 28]:

  • Accuracy: Ratio of correctly predicted labels to the total number of labels.

  • True Positive Rate (TPR): The ability to recognize alcoholic EEG signals correctly (sensitivity).

  • True Negative Rate (TNR): The ability to recognize normal EEG signals correctly (specificity).

  • Precision: The proportion of signals predicted as alcoholic that are actually alcoholic.

  • F-measure: A single numeric value, the harmonic mean of sensitivity and precision, that reflects the balance between the two.

  • Matthews correlation coefficient (MCC): It considers true positives, true negatives, false positives, and false negatives to measure the quality of alcoholic and normal EEG classification.

  • Area Under the Receiver Operating Characteristic Curve (AUC): AUC takes a value between 0 and 1; a value close to 1 indicates the reliability of a classifier for classification of alcoholic and normal EEG signals.
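The measures listed above, except AUC, can be computed directly from the four confusion-matrix counts, as in the following sketch. The alcoholic class is taken as the positive class, and the counts are hypothetical values chosen for illustration, not figures reported in the study. AUC is omitted because it requires classifier scores rather than a single set of counts. The function name `classification_metrics` is introduced here.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute the measures from confusion-matrix counts.

    tp/fn: alcoholic signals classified correctly/incorrectly.
    tn/fp: normal signals classified correctly/incorrectly.
    Assumes no denominator is zero.
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    tpr = tp / (tp + fn)                       # sensitivity
    tnr = tn / (tn + fp)                       # specificity
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "tpr": tpr,
        "tnr": tnr,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts: 40 alcoholic and 40 normal test signals,
# with one normal signal misclassified as alcoholic.
metrics = classification_metrics(tp=40, fn=0, tn=39, fp=1)
```

With these illustrative counts the function gives an accuracy of 0.9875, a TPR of 1.0, and a TNR of 0.975.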

Figure 4 shows the accuracy, TPR, TNR, and F-measure results (in %). In terms of classification accuracy, the proposed framework delivers 98.75% for 10 classifiers, which indicates the effectiveness and flexibility of the framework. The minimum classification accuracy of 93.75% is obtained by bayesnet (C1), which is only 5 percentage points below the highest result. The TPR, i.e. the detection capability of the classifiers for the alcoholic EEG class, is 100% in most cases, whereas the TNR, i.e. the detection capability of the classifiers for normal EEG signals, is 97.5%. It is worth noting that the difference between TPR and TNR is only 2.5 percentage points, indicating that the proposed framework is fairly stable in recognizing alcoholic and normal EEG signals. Additionally, the best precision and F-measure results are 98% and 98.9%, respectively. Figure 5 presents the MCC and AUC results delivered by all classifiers. As seen in Fig. 5, C7, C8, C15, C17, C21, and C22 obtain a value of 1, which indicates that these classifiers provide accurate results. In addition, C1 achieves a value of 0.95, and the difference among the results of the different classifiers is very small. Furthermore, C4, C6, C7, C8, C9, C11, C16, C17, and C21 deliver a value of 0.98, which is also very close to 1. These results clearly indicate that the proposed framework is effective and stable and can be used for the classification of alcoholic and normal EEG signals.

Fig. 4. Classification results for proposed framework

Fig. 5. MCC and AUC results for proposed framework

Fig. 6. Comparison of proposed framework with available literature

The classification accuracy of the proposed study is compared with the available literature, as shown in Fig. 6. Studies [12, 31,32,33,34,35,36,37,38] all require signal decomposition approaches, non-linear features, and higher-order statistical features, which suffer from mode mixing, complexity, and noise artifact issues. In contrast, the proposed auto-correlation coefficient features are relatively simple and help to reduce noise artifacts. In comparison with the above-stated studies, the proposed study provides up to 15.77% improvement in distinguishing alcoholic subjects from normal subjects.

5 Conclusion

A computerized method for the classification of normal and alcoholic subjects is designed in the present study. The proposed system performs segmentation, signal enhancement by autocorrelation, feature extraction and concatenation, and classification. The autocorrelation coefficients are taken as features and classified by twenty-three classifiers. The results suggest that the 40-lag autocorrelation coefficients, tested with the support vector machine, logistic regression, multi-layer perceptron, simple logistic, sequential minimal optimization, k-nearest neighbor, k star, logit boost, rotation forest, and logistic model tree classifiers, result in an average classification accuracy of 98.75%, sensitivity of 100%, specificity of 97.5%, precision of 98%, F-measure of 98.9%, area under the receiver operating characteristic curve of 98.7%, and Matthews correlation coefficient (MCC) of 97.77%. The achieved results are better than the state of the art.