1 Introduction

Sleep scoring is a part of sleep neurobiology that is closely related to cognitive neuroscience and helps in understanding the neural basis of cognitive functions such as learning and memory. Sleep stage scoring is therefore of fundamental importance in neuroscience for the detection of pathologies such as insomnia, hypersomnia, circadian rhythm disorders, epilepsy and sleep apnea. The functional changes of the neuronal system are quantified by an electrophysiological technique called polysomnography (PSG), which involves electroencephalography (EEG), electrooculography (EOG) and electromyography (EMG). Polysomnography is the traditional standard for sleep stage classification: sleep specialists visually inspect the physiological signals according to the Rechtschaffen and Kales (R&K) guidelines [1], first published in 1968, or the manual of the American Academy of Sleep Medicine (AASM) [2], first issued in 2007. The R&K rules score sleep in 30 s epochs into six stages: Wake, REM, and the N-REM stages S1, S2, S3 and S4. The AASM manual classifies sleep into five stages by merging N-REM stages 3 and 4 into a single stage. Sleep stage scoring by visual inspection has several problems: it is tedious, time-consuming and error-prone due to fatigue. To overcome these issues, automatic approaches that are sufficiently precise, robust, scalable and cost-effective have been developed for sleep stage classification. Approaches that use multiple physiological signals for automatic sleep stage classification [3] require complex preparation procedures and limit the subject’s movements, among other problems. Because of these drawbacks, and because the EEG signal captures the dynamics of neural information processing in the brain, automatic sleep stage classification using only the EEG signal has attracted the attention of the sleep research community.
The works [4,5,6,7] on automatic sleep stage scoring based on single-channel EEG suggest that single-channel analysis is a suitable way of scoring sleep stages and that the Pz-Oz channel is more accurate than the Fpz-Cz channel. According to the results obtained in [7, 8], because the EEG signal is non-stationary and non-linear, the DWT is much better suited to sleep stage classification than its time-domain counterparts. Since cepstral feature extraction frames and windows the signal and thereby integrates time-localization information, cepstral features can be applied to sleep stage analysis to yield better classification performance. In addition, cepstral features are more robust in the presence of nuisance variation in the signal and have found application in numerous studies [9,10,11]. This work proposes a new methodology for sleep stage scoring that combines feature extraction based on Mel-frequency cepstral coefficients (MFCC) with the DWT. The work focuses on single-channel (Pz-Oz) EEG analysis, where the EEG signal is first segmented into epochs of 30 s duration. Then the detail and approximation coefficients are calculated by decomposing the cerebral rhythm into five sub-bands: γ rhythm (>25 Hz), β rhythm (12–25 Hz), α rhythm (6–12 Hz), θ rhythm (3–6 Hz) and δ rhythm (<3 Hz). Each rhythm is associated with specific sleep stages. From the wavelet coefficients, the short-term power spectrum of the EEG signal is computed as a linear cosine transform of the log power spectrum on a non-linear Mel scale of frequency. Next, the statistical properties of the cepstral coefficients are computed. Finally, the MFCC statistical feature vectors are used to train a Gaussian mixture model classifier fitted by expectation maximization (GMM-EM) for sleep stage classification.

The rest of the paper is arranged as follows: Sect. 2 gives an overview of the existing methods. Section 3 describes the proposed methodology. Section 4 presents the experimental results. Section 5 presents the conclusion of this work.

2 Related Works

This section discusses the existing methods related to this work. In most studies, the features extracted from the EEG signal are forwarded to a classifier that assigns each EEG segment to one of the six possible sleep stages. Siuly et al. [12] introduced a new clustering technique for feature extraction and a least squares support vector machine for classification. The experiment was conducted on publicly available epileptic EEG, motor imagery EEG and mental imagery task EEG datasets, with classification accuracies of 94.55%, 84.52% and 61.60% respectively. Bajaj and Pachori [13] suggested the smoothed pseudo Wigner-Ville distribution to separate the EEG signal into different sleep stages based on its time-frequency images (TFIs). The histograms of the segmented TFIs are used by a multiclass least squares support vector machine for classification. Hsu et al. [14] used energy features extracted from the EEG signal to differentiate the sleep stages with a recurrent neural classifier. Herrera et al. [15] employed the wavelet transform, Hjorth parameters and symbolic representation to extract different combinations of features. The features are ranked using normalized mutual information and fed into an SVM classifier. In addition, a stacked sequential learning approach is used to improve the classification results.

Sen et al. [16] presented a comparative study of sleep stage classification using different feature sets (time-domain, frequency-domain, time-frequency and linear features) and different classification algorithms (random forest, feed-forward neural network, SVM, radial basis function neural network and decision tree). Zhu et al. [7] introduced sleep stage classification based on single-channel EEG, using visibility graphs and horizontal visibility graphs to extract the features. The resulting graph features are forwarded to an SVM classifier.

Hafeez Ullah Amin et al. [17] computed the relative wavelet energy of the EEG signal by applying the Discrete Wavelet Transform (DWT). Their experimental results compare the classification performance of SVM, MLP, K-NN and Naive Bayes. Mohammed Diykh et al. [18] suggested a classification method that maps the derived statistical properties and the EEG segments to a complex network. The K-means technique is applied to two sets of twelve and nine features. Nandini Sengupta et al. [10] proposed a feature set computed from the statistical properties of cepstral coefficients to classify lung sounds into three different types. They observed that the statistical properties of the cepstral coefficients yield better classification accuracy than the wavelet coefficients, and that they incur less computational overhead than the baseline cepstral features.

The proposed system introduces a feature extraction technique based on cepstral coefficients for automatic sleep stage analysis. It extracts statistical properties from the robust cepstral coefficients and passes the extracted features to a GMM-EM pattern recognizer. The classification results are analyzed in terms of accuracy, sensitivity and specificity. These metrics are found to be higher than those of other conventional methods.

3 Proposed Methodology

The key aim of the proposed work is to develop an efficient sleep stage scoring system that classifies single-channel EEG into one of the six possible stages according to the R&K recommendation. A sample EEG epoch of each sleep stage is shown in Fig. 1. The proposed automatic sleep stage scoring proceeds through the steps of preprocessing, wavelet decomposition, computation of cepstral coefficients, feature extraction and classification, as illustrated in Fig. 2. In the first step, the sleep EEG signal is segmented into epochs of 30 s duration. In the second step, each epoch is decomposed into different frequency rhythms by applying the DWT. In the third step, the short-term power spectrum of the EEG signal, based on a linear cosine transform of the log power spectrum on a non-linear Mel scale of frequency, is computed from the wavelet coefficients. In the fourth step, feature extraction provides a compact characterization of the large data set without losing distinctive information. The final classification step assigns each epoch to its corresponding sleep stage: the MFCC statistical feature vectors are used to train the GMM-EM classifier. The test procedure used validation data prepared from 3 h of data from the subjects SC4001E0 to SC4051E0 of the expanded Sleep-EDF database of PhysioNet, as shown in Table 1.
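The segmentation step can be sketched as follows. This is a minimal illustration that assumes the 100 Hz sampling rate of the data set and non-overlapping epochs; the function name `segment_epochs` is hypothetical and not part of the original work.

```python
def segment_epochs(signal, fs=100, epoch_sec=30):
    """Split a 1-D sequence into non-overlapping epochs.

    Samples that do not fill a complete epoch at the end are dropped
    (an assumption of this sketch).
    """
    n = int(fs * epoch_sec)            # samples per epoch: 3000 at 100 Hz
    count = len(signal) // n
    return [signal[i * n:(i + 1) * n] for i in range(count)]


# Three hours of signal at 100 Hz contain 1,080,000 samples.
three_hours = [0.0] * (3 * 3600 * 100)
epochs = segment_epochs(three_hours)
```

Under these assumptions, 3 h of recording per subject yield 360 epochs of 3000 samples each.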

Fig. 1. Sample EEG epoch of various sleep stages

Fig. 2. Structure of the proposed methodology

Table 1. Validation data prepared for the test procedure

3.1 Data Description

The data used to conduct the experiment are from the publicly available PhysioNet Sleep-EDF data set [19], which is widely adopted in the literature [6, 7]. The signals were recorded from Caucasian males and females aged 21–35 years with a miniature telemetry system. The signals were recorded at a sampling rate of 100 Hz during 24 h of the subjects’ daily life. This work uses the EEG signal from the Pz-Oz channel; all other signals are discarded.

3.2 Discrete Wavelet Transform

The DWT is used to extract local features from biomedical signals, and from the EEG signal in particular, because of its non-stationary and non-linear characteristics. The DWT decomposes the signal into successive multilevel frequency rhythms by employing a scaling function (ϕ) and a wavelet function (ψ), and is given by,

$$ \mathrm{DWT}(j, k) = \frac{1}{\sqrt{\left| 2^{j} \right|}} \int\limits_{-\infty}^{+\infty} x(t)\, \psi\left( \frac{t - 2^{j} k}{2^{j}} \right) \mathrm{d}t $$
(1)

This represents the signal as a series of approximation coefficients and detail coefficients. The detail coefficients are the output of the high-pass filter g(n), the discrete mother wavelet, and the approximation coefficients are the output of the low-pass filter h(n), its mirror version. The approximation and detail coefficients at the first decomposition level are denoted A1 and D1 respectively. A1 is further decomposed, and the procedure is repeated until the specified number of decomposition levels is reached, as shown in Fig. 3(a). At each level, filtering doubles the frequency resolution and downsampling halves the time resolution. In this work, the normalized DWT of the Daubechies family with two vanishing moments (Db2) is employed to decompose the EEG signal into successive multilevel frequency bands. Db2 is chosen because it captures the data variation efficiently with only two vanishing moments.
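The filter-bank view of the decomposition can be sketched in plain Python as below. This is an illustrative sketch, not the implementation used in this work: it uses the standard Db2 filter coefficients, assumes periodic extension at the epoch borders, and requires the input length to be divisible by 2 raised to the number of levels. A 3000-sample epoch does not satisfy that last condition for four levels, so a practical implementation would need padding or another border mode. The names `dwt_level` and `wavedec` are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Db2 low-pass (scaling) filter h(n) and its quadrature-mirror high-pass filter g(n)
H = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
G = [(-1) ** k * H[3 - k] for k in range(4)]


def dwt_level(x):
    """One analysis step: filter with h and g, then keep every second output."""
    n = len(x)
    approx = [sum(H[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    detail = [sum(G[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return approx, detail


def wavedec(x, levels=4):
    """Return [A_levels, D_levels, ..., D1], re-decomposing the approximation each time."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels in this sketch")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = dwt_level(approx)
        details.append(d)
    return [approx] + details[::-1]


signal = [math.sin(0.3 * i) for i in range(64)]
coeffs = wavedec(signal, levels=4)
```

Because the Db2 filters are orthonormal, the total energy of the coefficients equals the energy of the input, and a constant input produces detail coefficients that are zero up to rounding.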

Fig. 3. (a) Structure of 4 levels DWT decomposition (b) DWT decomposition structure of the proposed method

The dataset used for this experiment is recorded at a sampling frequency of 100 Hz. By the Nyquist theorem the signal therefore contains frequencies up to 50 Hz, and four levels of decomposition are required to obtain the required frequency bands of the sleep EEG signal. The structure of the 4-level DWT adopted in this work, with the corresponding frequency ranges, is shown in Fig. 3(b). Each sleep stage is characterized by particular EEG rhythms, tabulated in Table 2. The four-level decomposition of an epoch during which the subject is in the REM stage is shown in Fig. 4.
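The nominal frequency range of each sub-band follows from halving the band at every level. A small sketch, assuming ideal half-band filters (the function name `dwt_bands` is hypothetical):

```python
def dwt_bands(fs, levels):
    """Nominal (low, high) band edges in Hz for each DWT sub-band."""
    nyquist = fs / 2.0
    bands = {}
    for j in range(1, levels + 1):
        bands[f"D{j}"] = (nyquist / 2 ** j, nyquist / 2 ** (j - 1))
    bands[f"A{levels}"] = (0.0, nyquist / 2 ** levels)
    return bands


bands = dwt_bands(fs=100, levels=4)
```

For a 100 Hz signal this gives D1 = 25–50 Hz, D2 = 12.5–25 Hz, D3 = 6.25–12.5 Hz, D4 = 3.125–6.25 Hz and A4 = 0–3.125 Hz, which line up approximately with the γ, β, α, θ and δ rhythms. Real wavelet filters are not ideal, so neighbouring bands overlap to some degree.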

Table 2. EEG rhythms corresponding to the sleep stages
Fig. 4. Wavelet decomposition of an epoch at which the subject is at REM stage

3.3 Mel Frequency Cepstral Coefficient

The Mel-frequency cepstral coefficient (MFCC) is a static feature extraction method based on spectral analysis of the signal with a fixed resolution along a subjective frequency scale called the Mel frequency scale. The structure of MFCC feature extraction is shown in Fig. 5. The input sleep EEG is first framed and windowed. Windowing is a point-wise multiplication of the frame and the window function in the time domain. The window function is applied to minimize spectral distortion and to increase the continuity between adjacent frames. Then the FFT is applied to each frame and the magnitude of the resulting spectrum is warped onto the Mel scale. The idea behind the FFT is to represent a signal as a sum of properly chosen sinusoidal waves; it converts each frame from the time domain to the frequency domain. The window applied to each frame is the Hamming window, which is defined on the set of N samples as follows:

Fig. 5. Structure of MFCC computation

$$ w(n) = 0.54 - 0.46 \cos\left( \frac{2\pi n}{N - 1} \right), \quad 0 \le n \le N - 1 $$
(2)
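Equation (2) translates directly into code. The sketch below builds the window and applies it to one frame by point-wise multiplication; the helper names are hypothetical and the frame length is an arbitrary choice for illustration.

```python
import math


def hamming(N):
    """Hamming window of Eq. (2); N must be at least 2."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def apply_window(frame):
    """Point-wise product of a frame and a window of the same length."""
    w = hamming(len(frame))
    return [x * wn for x, wn in zip(frame, w)]


window = hamming(5)
tapered = apply_window([1.0] * 5)
```

The window equals 0.08 at both ends and 1.0 at its centre, so the edges of each frame are attenuated rather than cut off abruptly.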

The result of windowing and applying the FFT is the spectrum, or periodogram, of the EEG signal. The spectrum is then mapped onto the Mel scale using triangular filters, and the logarithm of the filter outputs gives the log Mel spectrum. The Mel scale is a mapping between the real frequency scale (Hz) and the perceived frequency scale (Mels). The mapping is approximately linear at low frequencies and logarithmic at high frequencies, and is given by,

$$ m = 2595 \log_{10}\left( \frac{f}{700} + 1 \right) $$
(3)
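As a small illustration of Eq. (3), the conversion and its inverse can be written as below. The inverse is obtained by solving Eq. (3) for f, and `mel_points` is a hypothetical helper that places points evenly on the Mel scale, as is commonly done when positioning triangular filters; the band limits and point count are assumptions.

```python
import math


def hz_to_mel(f):
    """Eq. (3): frequency in Hz to Mels."""
    return 2595.0 * math.log10(f / 700.0 + 1.0)


def mel_to_hz(m):
    """Inverse of Eq. (3)."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_points(f_low, f_high, count):
    """Frequencies in Hz that are evenly spaced on the Mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (count - 1)
    return [mel_to_hz(lo + i * step) for i in range(count)]


edges = mel_points(0.0, 50.0, 6)   # 0-50 Hz is the usable band at fs = 100 Hz
```

By construction 1000 Hz maps to roughly 1000 Mels, and the spacing between consecutive edges in Hz grows with frequency.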

In the next step, the log Mel spectrum is converted back to the time domain by applying the Discrete Cosine Transform. The final result is the set of Mel Frequency Cepstral Coefficients, obtained by,

$$ c_{n} = \sum\limits_{k = 1}^{K} \log\left( S_{k} \right) \cos\left[ n\left( k - \frac{1}{2} \right) \frac{\pi}{K} \right], \quad n = 0, 1, \ldots, N - 1 $$
(4)

Where, \( S_{k} \) is the output of the k-th Mel filter, K is the number of filters, N is the number of coefficients and \( c_{n} \) is the n-th Mel Frequency Cepstral Coefficient.
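The cosine transform of the log filter outputs can be sketched as follows, taking the upper summation limit as the number of filters K. The input values are made-up filter outputs used only to exercise the formula, and the function name is hypothetical.

```python
import math


def mfcc_from_filterbank(S, num_coeffs):
    """Cosine transform of the log Mel filter outputs S_1..S_K, as in Eq. (4)."""
    K = len(S)
    log_s = [math.log(s) for s in S]   # requires strictly positive filter outputs
    return [
        sum(log_s[k - 1] * math.cos(n * (k - 0.5) * math.pi / K) for k in range(1, K + 1))
        for n in range(num_coeffs)
    ]


# A flat log spectrum (every filter output equal to e) has all its content in c_0.
flat = mfcc_from_filterbank([math.e] * 8, num_coeffs=4)
```

With a flat log spectrum, c_0 equals K and the higher coefficients vanish, which shows that they respond only to the shape of the spectrum and not to its overall level.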

The usefulness of MFCC is confirmed by the histograms of the MFCCs for the various sleep stages, illustrated in Fig. 6. The shape and the range of the coefficient values are markedly distinct among the different sleep stages, confirming the usefulness of MFCC for sleep EEG signal analysis.

Fig. 6. Histograms of MFCCs for various sleep stages

3.4 Feature Extraction

Feature extraction is used to extract relevant information from the EEG recording for evaluating and understanding the desired cognitive processes. Its main goal is to reduce the dimensionality of a large volume of signal data without losing discriminative information. The extracted features have a direct impact on the classification performance of the system, so extracting suitable features from EEG signals is essential for high classification performance. In this work the statistical features energy, envelope kurtosis, envelope skewness, standard deviation and variance are extracted from the MFCCs computed on the wavelet coefficients of each decomposition level.

3.4.1 Energy

The energy of the signal in discrete form is calculated by the given equation,

$$ E = T \sum\limits_{n = 0}^{N - 1} x^{2}[n] $$
(5)

Where, T is the duration of one sample (the sampling interval), x[n] are the discrete samples and N is the number of samples.

3.4.2 Variance

Variance is the measure of how far a set of numbers is spread out from its mean.

$$ \upsigma^{ 2} = \frac{{\sum {\left( {{\text{x}} -\upmu} \right)^{2} } }}{\text{N}} $$
(6)

Where, x is a sample value, N is the number of samples and μ is the mean.

3.4.3 Standard Deviation

Standard deviation is the measure of dispersion of a dataset given by,

$$ {\text{Std}}\left( {\text{x}} \right) = \sqrt {\frac{{\sum {\left( {{\text{x}} -\upmu} \right)^{2} } }}{\text{N}}} $$
(7)

Where, x is a sample value, N is the number of samples and μ is the mean.
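Equations (5) to (7) can be sketched in a few lines. The input sequence is an arbitrary example, and T is treated here as a per-sample duration, which is an assumption of this sketch.

```python
import math


def energy(x, T=1.0):
    """Eq. (5): T times the sum of squared samples."""
    return T * sum(v * v for v in x)


def variance(x):
    """Eq. (6): population variance (divides by N)."""
    mu = sum(x) / len(x)
    return sum((v - mu) ** 2 for v in x) / len(x)


def std(x):
    """Eq. (7): square root of the population variance."""
    return math.sqrt(variance(x))


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
features = (energy(sample, T=0.01), variance(sample), std(sample))
```

For this example the mean is 5, the variance is 4 and the standard deviation is 2. The two dispersion features carry the same information on different scales, since one is the square root of the other.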

3.4.4 Envelope

Envelope is a smooth curve outlining the extremes of the signal detected by using the Hilbert transform defined by,

$$ {\text{h}}_{\text{x}} = {\text{hilbert transform}}\left( {\text{x}} \right) $$
(8)

Where, x is the input dataset.

3.4.5 Envelope Kurtosis

Kurtosis is a non-dimensional quantity that measures the peakedness of a dataset.

$$ \mathrm{Kurt}\left( \hat{f}(x) \right) = \frac{E\left[ \left( \hat{f}(x) - \mu_{\hat{f}(x)} \right)^{4} \right]}{\left( E\left[ \left( \hat{f}(x) - \mu_{\hat{f}(x)} \right)^{2} \right] \right)^{2}} $$
(9)

Where, \( \hat{f}(x) \) is the envelope of the dataset x obtained through the Hilbert transform \( h_{x} \) and \( \mu_{\hat{f}(x)} \) is its mean.

3.4.6 Envelope Skewness

Skewness defines the extent to which a distribution differs from a normal distribution.

$$ {\text{Skew }}\left( {{\hat{f}} ( {\text{x)}}} \right) = \frac{{{\text{E}}\left[ {\left( {{\hat{f}} ( {\text{x)}} -\upmu_{{{\hat{f}} ( {\text{x)}}}} } \right)^{3} } \right]}}{{\left( {{\text{E}}\left[ {\left( {{\hat{f}} ( {\text{x)}} -\upmu_{{{\hat{f}} ( {\text{x)}}}} } \right)^{2} } \right]} \right)^{{\frac{3}{2}}} }} $$
(10)

Where, \( \hat{f}(x) \) is the envelope of the dataset x obtained through the Hilbert transform \( h_{x} \) and \( \mu_{\hat{f}(x)} \) is its mean.
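The envelope features can be sketched as below. The sketch makes two assumptions that the text does not spell out: the envelope is taken as the magnitude of the analytic signal built from the Hilbert transform, and kurtosis uses the standard definition (fourth central moment divided by the squared variance). A naive O(N²) DFT keeps the example dependency-free; a real implementation would use an FFT. All function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def envelope(x):
    """Magnitude of the analytic signal: keep DC/Nyquist, double positive bins, zero the rest."""
    n = len(x)
    X = dft(x)
    for k in range(1, n):
        if n % 2 == 0 and k == n // 2:
            continue                      # Nyquist bin is kept unchanged
        X[k] = 2 * X[k] if k < (n + 1) // 2 else 0
    return [abs(v) for v in dft(X, inverse=True)]


def central_moment(v, p):
    mu = sum(v) / len(v)
    return sum((a - mu) ** p for a in v) / len(v)


def kurtosis(v):
    return central_moment(v, 4) / central_moment(v, 2) ** 2


def skewness(v):
    return central_moment(v, 3) / central_moment(v, 2) ** 1.5


tone = [math.cos(2 * math.pi * 4 * t / 64) for t in range(64)]
env = envelope(tone)                      # a pure tone has a flat envelope
```

A pure cosine has a constant envelope of 1, so its envelope carries no shape information; amplitude-modulated activity such as bursts would produce a varying envelope whose skewness and kurtosis differ from epoch to epoch.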

3.5 Classification

To demonstrate the effectiveness of the proposed feature extraction scheme for sleep stage classification, the GMM classifier is used.

3.5.1 Gaussian Mixture Model-Expectation Maximization

In the GMM-based model, the random variable y is represented as a weighted sum of G Gaussian functions. Such models are widely used for the automatic identification of bio-signals [20, 21] and for approximating a continuous probability density function over multi-dimensional features. The mixture density is given by,

$$ p(y) = \sum\limits_{j = 1}^{G} q_{j}\, N(y, \mu_{j}, \sigma_{j}) $$
(11)

Where \( N(y, \mu_{j}, \sigma_{j}) \) is the n-dimensional normal density with mean \( \mu_{j} \) and covariance \( \sigma_{j} \), and \( q_{j} \) is the weight representing the probability of component j. The normal density is defined as

$$ N(y, \mu_{j}, \sigma_{j}) = \frac{1}{\sqrt{(2\pi)^{n} \left| \sigma_{j} \right|}} \exp\left( -\frac{1}{2} \left( y - \mu_{j} \right)^{T} \sigma_{j}^{-1} \left( y - \mu_{j} \right) \right) $$
(12)
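Equations (11) and (12) can be evaluated as in the sketch below. To stay free of matrix inversion, it assumes a diagonal covariance, so each component is described by a vector of per-dimension variances; this is a simplification for illustration and not necessarily the covariance structure used in this work.

```python
import math


def gaussian_pdf(y, mu, var):
    """Eq. (12) for a diagonal covariance given as a list of variances."""
    n = len(y)
    det = math.prod(var)                                   # determinant of a diagonal matrix
    quad = sum((yi - mi) ** 2 / vi for yi, mi, vi in zip(y, mu, var))
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** n * det)


def gmm_pdf(y, weights, means, variances):
    """Eq. (11): weighted sum of the component densities."""
    return sum(q * gaussian_pdf(y, m, v) for q, m, v in zip(weights, means, variances))


# Two-component toy mixture in two dimensions (made-up parameters).
density = gmm_pdf([0.0, 0.0],
                  weights=[0.5, 0.5],
                  means=[[0.0, 0.0], [0.0, 0.0]],
                  variances=[[1.0, 1.0], [1.0, 1.0]])
```

With two identical standard-normal components the mixture reduces to a single two-dimensional standard normal, whose density at the origin is 1/(2π).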

According to Bayes’ rule, the conditional probability that the observation vector s belongs to the component \( g_{j} \) of the GMM is defined by,

$$ p\left( g_{j} \mid s \right) = \frac{q_{j}\, N(s, \mu_{j}, \sigma_{j})}{\sum\nolimits_{k = 1}^{G} q_{k}\, N(s, \mu_{k}, \sigma_{k})} $$
(13)

The Expectation-Maximization (EM) procedure is used to estimate the parameters q, μ and σ that maximize the likelihood of the observed data D. The parameters of the mapping function are obtained from the joint probability density of the source and target features. A joint feature vector \( Y = [s^{T}, t^{T}]^{T} \) is formed, where s and t are the time-aligned input and output feature vectors, and is used to estimate the GMM parameters. The mapping function is defined by

$$ M(s) = \sum\limits_{z = 1}^{G} p\left( g_{z} \mid s \right) \left[ \mu_{z}^{t} + \sigma_{z}^{ts} \left( \sigma_{z}^{ss} \right)^{-1} \left( s - \mu_{z}^{s} \right) \right] $$
(14)

Where, \( \mu_{z}^{s} \) and \( \mu_{z}^{t} \) are the source and target parts of the mean, and \( \sigma_{z}^{ss} \) and \( \sigma_{z}^{ts} \) are the corresponding blocks of the covariance of the z-th Gaussian distribution. In the GMM-based technique the number of Gaussian functions G is determined by the amount of training samples.
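The EM procedure alternates between computing the component posteriors of Eq. (13) and re-estimating the weights, means and variances. The sketch below fits a one-dimensional mixture to made-up data; the scalar features, the initial values and the variance floor are all assumptions chosen to keep the example short, not details of this work.

```python
import math


def normal_pdf(y, mu, var):
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iterations=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture by expectation maximization."""
    means, variances, weights = list(means), list(variances), list(weights)
    G = len(means)
    for _ in range(iterations):
        # E-step: posterior probability of each component for each sample, Eq. (13)
        resp = []
        for y in data:
            joint = [weights[j] * normal_pdf(y, means[j], variances[j]) for j in range(G)]
            total = sum(joint)
            resp.append([v / total for v in joint])
        # M-step: re-estimate parameters from the posterior-weighted samples
        for j in range(G):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * y for r, y in zip(resp, data)) / nj
            spread = sum(r[j] * (y - means[j]) ** 2 for r, y in zip(resp, data)) / nj
            variances[j] = max(spread, var_floor)   # floor avoids a collapsing component
    return weights, means, variances


data = [-2.1, -1.9, -2.0, -2.2, -1.8, 3.0, 3.1, 2.9, 3.2, 2.8]
q, mu, var = em_gmm_1d(data, means=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
```

On this toy data the two components settle on the two clusters, with means near -2 and 3 and equal weights. One common way to turn such models into a classifier is to fit one mixture per sleep stage and assign an epoch to the stage whose mixture gives the highest likelihood; whether this work does so is not stated, so that reading is only a possibility.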

4 Result and Discussion

The EEG signal recorded during sleep provides useful information about the sleep stages, which helps in the diagnosis of sleep-related conditions such as epilepsy, depression, sleep apnea and stress. This work attempts to solve the problems of conventional sleep stage classification by employing the statistical features of cepstral coefficients computed from the DWT sub-bands of the single-channel sleep EEG signal. The confusion matrix between the developed system with the GMM-EM classifier and the experts’ visual scoring according to the R&K manual is shown in Table 3.

Table 3. Confusion matrix between the scoring result by GMM-EM classifier and experts scoring

The stage S1 is a transition phase between wakefulness and sleep, so its neuronal oscillations resemble those of the wake stage. The classification of the S1 stage is therefore a major challenge for any sleep stage scoring system, and S1 is often misclassified as wake or REM. The proposed system classifies 42.18% of S1 epochs correctly. The REM stage, which accounts for 5–20% of whole-night sleep, is of particular importance for the diagnosis of various sleep-related disorders including REM behavior disorder (RBD) and narcolepsy. The proposed method detects 83.57% of REM epochs correctly. The proposed method also correctly classifies 79.25%, 71.03%, 48.71% and 45.4% of the Wake, S2, S3 and S4 epochs respectively. In addition, the performance of the developed system is evaluated in terms of precision, accuracy, sensitivity and specificity, shown in Table 4. This work achieves an average precision of 59.46%, accuracy of 88.72%, sensitivity of 61.52% and specificity of 93.43%.

Table 4. Performance evaluation of the proposed work using GMM-EM classifier
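The four evaluation measures can be derived from a confusion matrix in a one-vs-rest fashion, as sketched below. This is one plausible reading of how per-stage values are computed; the matrix in the example is made up and is not the data of Table 3.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of epochs of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # class c epochs scored as something else
        fp = sum(cm[r][c] for r in range(n)) - tp     # other epochs scored as class c
        tn = total - tp - fn - fp
        results.append({
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
            "accuracy": (tp + tn) / total,
        })
    return results


toy = [[8, 2],
       [1, 9]]
metrics = per_class_metrics(toy)
```

Under this one-vs-rest definition the true negatives of a rare stage dominate its accuracy and specificity, which is one way that average accuracy can be much higher than average sensitivity on imbalanced data.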

The durations of the sleep stages are not equal, so the recorded EEG signal has an uneven number of epochs per stage, as can be seen in Table 1. Automatic sleep stage scoring therefore faces a class imbalance problem, in which the samples of one class in the training data set greatly outnumber the samples of the other classes. As a result, the classification model tends to favor the majority class. Although the proposed method performs well, the class imbalance problem prevents it from achieving 100% accuracy in all the cases of interest. Because the cepstral MFCC features frame and window the signal, they integrate time-localization information, unlike the wavelet-based method. The proposed MFCC-based statistical features for sleep stage scoring provide an average classification accuracy of 89.71%. Since the work is implemented on single-channel EEG, the method does not require any filtering, artifact rejection or noise removal algorithm. This is a major advantage of the proposed methodology, which makes it convenient for portable sleep quality evaluation devices. Because the sleep stage scoring scheme operates directly on the recorded signal, such a device can have reduced power consumption.

The classification results of wavelet-based features are demonstrated in [22]. Comparing the classification performance of the wavelet-based features and the cepstral features, it is found that the cepstral features perform 0.99 times better than the wavelet-based features. This may be because the short-term spectral characteristics of the EEG signal represent the sleep stage information more effectively than the statistical measures of the EEG signal in the wavelet domain.

5 Conclusion

This paper explores Mel Frequency Cepstral Coefficients for automatic sleep stage scoring based on single-channel EEG. The study exploits the statistical parameters of the Mel Frequency Cepstral Coefficients to classify sleep stages. The proposed automatic sleep stage scoring system provides more accurate and faster analysis of EEG signals corresponding to complex cognitive tasks, and it will be useful in clinical applications such as the diagnosis of epilepsy, depression, sleep apnea and stress. The MFCC-based feature extraction technique achieves better performance than the wavelet-based features and is relatively robust for sleep stage classification. Since the proposed framework is capable of analyzing the non-stationary EEG signal, it can also be applied to real-time EEG signal analysis, including Brain-Computer Interface (BCI) research on controlling external devices.