1 Introduction

With a lifetime prevalence of up to 4–5 %, bipolar disorders (BD) are among the most prevalent psychiatric diseases [1]. Regrettably, BD still poses a diagnostic challenge, and the criteria for bipolarity remain controversial [2], which causes bipolar subjects to be misdiagnosed as having unipolar depression, leading to insufficient treatment and poor outcomes [3–5]. BD and unipolar disorder (UD) have distinct pathophysiologies but similar depressive presentations, and current diagnoses are determined mainly by structured clinical assessment based on the Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV), a symptom-based rather than an etiology-based approach [6]. Patients' self-reports of past history and the depressive presentation of bipolar patients may also inadvertently cause misdiagnosis [7]. Thus, an effective classification method is required to dichotomize unipolar and bipolar subjects in order to apply the right treatment to the right patient [8]. Recent studies have used neuroimaging methods in bipolar and unipolar disorders to reveal discrete patterns of functional and structural abnormalities in neural systems critical for emotion regulation [9–11], while other studies employed traditional statistical techniques that rely on the basic assumption of linear combinations, which may not be appropriate for such tasks [12].

Classification is considered a useful tool for medical problems, with a common application area being medical diagnosis [13, 14]. Fundamentally, a classification policy could be established by medical experts to enable a better understanding of the problem. Recent engineering studies have contributed to the classification of diseases using techniques such as expert systems, artificial neural networks, linear programming, database systems, evolutionary algorithms, and swarm intelligence [14–19].

Over the past decade, machine learning (ML) methods have been used increasingly in the field of affective disorders and in the comparison of these patients to those with other psychiatric disorders [20]. With their extensive use and promising results, ML approaches avoid oversimplification by incorporating high-order interactions between predictive variables, and studies underline the superiority of artificial neural networks (ANNs) over linear methods in a number of areas of medical research [21–24]. In most ANN applications, gradient-based algorithms are used for training, which may become trapped in local minima. Although the training process may be repeated a number of times starting from different initial conditions, this rarely guarantees reaching the global optimum of a multimodal high-dimensional problem.

This problem in the training process can be addressed by applying optimization algorithms, which have become more popular in recent years [25]. The increasing number of features in the medical domain requires a feature selection (FS) process, and recent studies highlight swarm intelligence algorithms as a crucial step for evaluating and processing the data in an efficient way [14]. An appropriate and relevant feature subset selection process also reduces the risk of overfitting, thus improving model generalization by decreasing the model's complexity [26]. This is particularly important in small, high-dimensional datasets, where the curse of dimensionality is present and a significant gain in performance can be achieved with a small subset of features [27, 28].

In a recent study, a hybrid particle swarm optimization (PSO)–back-propagation algorithm was used for feed-forward neural network training. The combined method could overcome the slow search of PSO around the global optimum. In another study, two methods of neural network training using PSO and back-propagation learning for medical decision-making were proposed, and the experimental results suggested that back-propagation is generally preferable over PSO for imbalanced training data, especially with small datasets and a large number of features [29]. In this context, the motivation for the present research was an interest in developing a robust classification tool to address the diagnostic problem of unipolar and bipolar disorders. With its multidisciplinary nature, this study combined ML and metaheuristic approaches to discriminate unipolar and bipolar subjects with a reduced number of features, using the PSO algorithm for feature selection and employing quantitative EEG (QEEG) cordance as a biomarker.

2 Materials and methods

2.1 Subjects

We conducted a retrospective investigation involving a study group of 89 patients selected from a larger population of 1200 patients, all of whom were consecutively admitted for BD or UD at the Neuropsychiatry Istanbul Hospital Department of Psychiatric Outpatient Clinics between January 2010 and December 2013. We matched 31 bipolar disorder depressive episode patients and 58 unipolar depressive episode patients from various age groups and genders. Eligible subjects were outpatients suffering from a depressive episode associated with BD or UD. All participants were given a primary diagnosis of either BD or UD according to DSM-IV criteria, specifically the Structured Clinical Interview for Axis I Disorders (SCID-I). We included subjects with a diagnosis of UD who scored at least 8 on the 17-item Hamilton Depression Rating Scale (HDRS), or subjects with a diagnosis of a BD episode scoring higher than 13 points on the Young Mania Rating Scale (YMRS) [30]. We excluded subjects experiencing their first depressive episode or an episode with current psychotic features, as well as those with a history of rapid cycling (≥4 cycles during a year), a history of mixed episodes, current psychiatric comorbidity on Axis I, serious unstable medical illness or neurologic disorder (e.g., epilepsy, head trauma with loss of consciousness), alcohol or substance abuse within 6 months preceding the study, and patients treated with electroconvulsive therapy within 3 months before their participation in the study. All patients were medication-free for at least 48 h. Participants underwent routine laboratory studies (complete blood count, chemistry, thyroid-stimulating hormone); a urine toxicology screen and electrocardiogram were performed at study screening, and subjects were required to be medically stable before enrollment in the study.

2.2 EEG recordings and cordance calculations

For all patients, EEG was recorded for 12 h in a drug-free condition. In order to observe and reveal the efficacy of cordance, QEEG data were collected from the 89 subjects, who were seated in a sound-attenuated, electrically shielded room in a reclining chair with eyes closed (wakeful resting condition). Technicians monitored the QEEG data during the recording and re-alerted the subjects every minute as needed to avoid drowsiness. Electrodes were placed using an electrode cap with 19 recording electrodes distributed across the head according to the international 10–20 system. Three minutes of eyes-closed resting EEG were acquired using a Scan LT EEG amplifier and electrode cap (Compumedics/Neuroscan, USA) at a sampling rate of 250 Hz. Sintered Ag/AgCl electrodes were positioned according to the international 10–20 system with a binaural reference. For each individual, cordance values were calculated using the EEG data gathered from the recording electrodes and ten regions (prefrontal, frontocentral, central, left temporal, right temporal, left parietal, occipital, midline, left frontal, and right frontal) in the delta, theta, and alpha frequency bands.

Cordance combines complementary information from the absolute and relative power of the EEG spectrum to yield values having a stronger correlation with regional cerebral perfusion than either measure alone [31]. Absolute power, coherence, and cordance have been shown to be indices of local cerebral perfusion in previous studies [32, 33]. Increased slow-wave and decreased fast-wave activity on the electroencephalogram are common in brain dysfunction and may be caused by partial cortical deafferentation. Cordance is measured along a continuum of values: positive values denote concordance, an indicator associated with normally functioning brain tissue, and negative values denote discordance, an indicator associated with undercutting lesions, low perfusion, and low metabolism [34].

Raw EEG signals were filtered through a band-pass filter (0.15–30 Hz) before artifact elimination. Artifact detection was performed visually to remove EEG segments with obvious eye and head movements, muscle artifacts, or a decrease in alertness. Manually selected artifact-free EEG data (minimum 2 min) with a minimum split-half reliability ratio of 0.95 and test–retest reliability ratio of 0.90 were used for cordance calculations. The EEG reviewer was blind to the subjects' treatment condition and clinical status. The fast Fourier transform (FFT) was used to calculate absolute and relative power in each of four non-overlapping frequency bands [35], delta (1–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), and beta (12–20 Hz), using NeuroGuide Deluxe 2.5.1 software (Applied Neuroscience; St. Petersburg, FL, USA). Cordance values were calculated using a custom algorithm in MATLAB® 7.10.0.499 available for research purposes.
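As a rough illustration of this step, the following minimal Python sketch computes absolute and relative band power for a single artifact-free channel. It is not the NeuroGuide pipeline used in the study: Welch's method stands in here for the FFT-based power estimation, and the window length, the synthetic test segment, and the function name band_powers are illustrative assumptions.

```python
import numpy as np
from scipy.signal import welch

FS = 250  # sampling rate in Hz, as reported for the recordings
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 12), "beta": (12, 20)}

def band_powers(eeg_segment, fs=FS, bands=BANDS):
    """Return absolute and relative power per band for one artifact-free channel."""
    # Power spectral density via Welch's method (2-s windows chosen here)
    freqs, psd = welch(eeg_segment, fs=fs, nperseg=2 * fs)
    df = freqs[1] - freqs[0]

    # Absolute power: integrate the PSD over each frequency band
    absolute = {name: psd[(freqs >= lo) & (freqs < hi)].sum() * df
                for name, (lo, hi) in bands.items()}

    # Relative power: each band's share of the total power over all bands
    total = sum(absolute.values())
    relative = {name: p / total for name, p in absolute.items()}
    return absolute, relative

# Example with a synthetic 2-min segment (placeholder for real EEG data)
rng = np.random.default_rng(0)
abs_p, rel_p = band_powers(rng.standard_normal(120 * FS))
```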

This algorithm normalizes power across both electrode sites and frequency bands in three consecutive steps. First, absolute power values are re-attributed to each individual electrode by averaging the power from all bipolar electrode pairs sharing that electrode; this electrode-referencing method is similar to the Hjorth transformation [36], except that the current method averages the power from neighboring electrode pairs, enabling a stronger correlation between surface-measured EEG and perfusion of the underlying brain tissue than either the linked-ears reference or the conventional Hjorth transformation [37]. Relative power values are then obtained by dividing the absolute power value by the total power at each electrode site and frequency band. In the second step, the maximum absolute and relative power values (A_MAX,f and R_MAX,f) in each frequency band (f) are determined, and normalized absolute (A_NORM(s,f)) and normalized relative (R_NORM(s,f)) power values are obtained by dividing the absolute and relative power values at each electrode site (s) and frequency band (f) by A_MAX,f and R_MAX,f, respectively. Finally, the cordance value at each electrode site and frequency band is calculated by summing the A_NORM and R_NORM values after the half-maximal value (0.5 on the normalized scale) is subtracted from each: CORDANCE(s,f) = (A_NORM(s,f) − 0.5) + (R_NORM(s,f) − 0.5) [38].
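The normalization and summation steps described above can be condensed into a short NumPy sketch. This is a schematic reimplementation, not the custom MATLAB algorithm used in the study; it assumes the re-attributed absolute power values of step 1 are already available as an electrodes × bands array, and the function name cordance and the random example values are illustrative.

```python
import numpy as np

def cordance(absolute_power):
    """Compute cordance values from re-attributed absolute power.

    absolute_power : array of shape (n_electrodes, n_bands), absolute power
                     per electrode site s and frequency band f (step 1 output).
    Returns an array of the same shape with CORDANCE(s, f) values.
    """
    A = np.asarray(absolute_power, dtype=float)

    # Relative power: divide by the total power at each electrode site
    R = A / A.sum(axis=1, keepdims=True)

    # Step 2: normalize by the maximum value within each frequency band
    A_norm = A / A.max(axis=0, keepdims=True)   # A_NORM(s, f)
    R_norm = R / R.max(axis=0, keepdims=True)   # R_NORM(s, f)

    # Step 3: cordance is the summed deviation from the half-maximal value 0.5
    return (A_norm - 0.5) + (R_norm - 0.5)

# Example: 19 electrodes x 4 bands of hypothetical absolute power values
rng = np.random.default_rng(1)
cord = cordance(rng.uniform(1.0, 30.0, size=(19, 4)))
```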

2.3 Feature selection

In the feature selection process, feature interaction is a challenge to overcome. The best features are usually a group of features exhibiting complementarity, and there may be two-way or multi-way interactions among features [39, 40]. Thus, an individually relevant and important feature may become redundant when considered together with other features, yet eliminating some features may also reduce the complexity. On the other hand, an individually redundant or weakly relevant feature may become highly relevant when considered with others. To resolve this dilemma, the optimal feature subset should be a group of complementary features, which implies a large search space. The size of the search space increases exponentially with the number of available features in the dataset [41] (for n features there are 2^n candidate subsets), which makes an exhaustive search practically impossible in most situations. Although various search algorithms have been applied to the feature selection process, many of them still suffer from stagnation in local optima or are computationally expensive [42, 43]. In order to better address feature selection problems, an efficient global search technique is required. Evolutionary computation (EC) techniques are well known for their global search abilities. PSO [44, 45], a relatively recent and promising EC method, is computationally less expensive than other EC algorithms and has been employed as an effective feature selection method in many studies [43, 46].
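One common way to couple such a search with a classifier, sketched below under illustrative assumptions, is to let each particle position encode a candidate feature subset by thresholding its components and to use the cross-validated error of the classifier on that subset as the fitness. Neither the 0.5 threshold nor the helper names below are taken from the study.

```python
import numpy as np

def subset_from_particle(position, threshold=0.5):
    """Interpret a continuous particle position as a feature-inclusion mask."""
    mask = np.asarray(position) > threshold
    # Guard against an empty subset by keeping the strongest component
    if not mask.any():
        mask[int(np.argmax(position))] = True
    return mask

def subset_error(position, X, y, cv_error):
    """Fitness of a particle: cross-validated error on the selected features.

    cv_error : callable returning the misclassification rate of the chosen
               classifier for a given feature matrix (assumed, not specified
               by the study).
    """
    mask = subset_from_particle(position)
    return cv_error(X[:, mask], y)
```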

2.4 Particle swarm optimization

Particle swarm optimization (PSO) is an evolutionary-based algorithm inspired by social behaviors of animals such as fish schooling, bird flocking, and insect swarming [44]. PSO searches for the optimum solution(s) by updating a population of candidate solutions and directing the search toward the regions of interest in the search space [47]. The search in PSO is performed by a population of s particles x_i (i ∈ [1, …, s]) that update their locations in the search space through a modified velocity V_ij (i ∈ [1, …, s], j ∈ [1, …, n], where n is the dimension of the search space) over time. The velocity (V) and particle position update equations are given in Eqs. (1) and (2), respectively,

$$ V_{ij} \left( t \right) = W \times V_{ij} \left( {t - 1} \right) + C_{ij} + S_{ij} $$
(1)
$$ x_{i,j} \left( t \right) = x_{i,j} \left( {t - 1} \right) + V_{i,j} \left( t \right) $$
(2)
$$ C_{ij} = c_{1} r_{1,j} \times \left( {Pbest_{i,j} \left( {t - 1} \right) - x_{i,j} \left( {t - 1} \right)} \right) $$
(3)
$$ S_{ij} = c_{2} r_{2,j} \times \left( {Gbest_{j} \left( {t - 1} \right) - x_{i,j} \left( {t - 1} \right)} \right) $$
(4)

In the given equations, W is the inertia weight, t is the current iteration, i is the particle index in the population, and j is the dimension. Here, r_{1,j} and r_{2,j} are distinct random values in the range between 0 and 1, and c_1 and c_2 are acceleration coefficients that control the effectiveness of the cognitive (C) and social (S) components, respectively. Several methods have been suggested to adjust the parameters in Eq. (1), including linearly decreasing inertia weight (LDIW), time-varying inertia weight (TVIW), linearly decreasing acceleration coefficients (LDAC), time-varying acceleration coefficients (TVAC), random inertia weight (RANDIW), fixed inertia weight (FIW), random acceleration coefficients (RANDAC), and fixed acceleration coefficients (FAC). LDIW and FAC are the most common approaches among the proposed methods in several studies [48–50]. Equation (5) gives the LDIW formulation as

$$ W = \left( {w_{1} - w_{2} } \right) \times \frac{{\left( {\max_{\text{iter}} - t} \right)}}{{\max_{\text{iter}} }} + w_{2} $$
(5)

where w_1 and w_2 are the initial and final inertia weights, t is the current iteration, and max_iter is the maximum number of iterations used to terminate the loop. Pbest_i and Gbest are the local (personal) and global best solutions, representing the best solution found by an individual particle and the best overall solution found by the swarm; they are updated using Eqs. (6) and (7). In these equations, f represents the fitness function used to assess the quality of a particle (x), a local best solution (Pbest), or the global best solution (Gbest).

$$ Pbest_{i} \left( t \right) = \begin{cases} Pbest_{i} \left( {t - 1} \right), & {\text{if}}\;f\left( {x_{i} \left( t \right)} \right) \ge f\left( {Pbest_{i} \left( {t - 1} \right)} \right) \\ x_{i} \left( t \right), & {\text{otherwise}} \end{cases} $$
(6)
$$ Gbest\left( t \right) = \arg \min \left\{ {f\left( {Pbest_{1} (t)} \right), f\left( {Pbest_{2} (t)} \right), \ldots , f\left( {Pbest_{s} (t)} \right)} \right\} $$
(7)

The update of the global best solution can be affected by the subset of particles that share their local best solutions; this is called the neighborhood topology. The common choices are local and global topologies: the local neighborhood allows sub-swarms of particles that update their best solutions based on the set of personal best solutions found by the members of the sub-swarm, whereas the global neighborhood topology maintains a single global best solution for the entire swarm [51–53]. The pseudocode of the PSO approach is given in the following algorithm, and the hybrid structure of PSO with ANN is given in Fig. 1.

Fig. 1 Flow chart of the hybrid PSO–ANN feature selection process

  • Initialization: Randomly initialize a population.

  • Initial evaluation: Evaluate all members of the population using the fitness function f.

  • repeat

    1.

      Updating the population: Update the velocity of each particle using Eq. (1) and then update the particle position by applying the new velocity in Eq. (2)

    2.

      Evaluation: Evaluate all members of the population using the fitness function f

    3.

      Updating the best findings: Update Pbest and Gbest using Eqs. (6) and (7)

  • until termination condition is satisfied: the maximum number of iterations is reached or the best member of the population (Gbest) performs above the highest expected performance [54].
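A minimal Python sketch of this loop, using the LDIW schedule of Eq. (5), the update rules of Eqs. (1)–(4), and the global-best topology of Eqs. (6)–(7), is given below. The swarm size, acceleration coefficients, inertia bounds, and the sphere-function example are illustrative assumptions rather than the settings of the study; for feature selection, the fitness would instead be the cross-validated classification error of the ANN on the feature subset encoded by the particle, as in Fig. 1.

```python
import numpy as np

def pso(fitness, dim, n_particles=30, max_iter=100,
        c1=2.0, c2=2.0, w1=0.9, w2=0.4, bounds=(-1.0, 1.0), seed=0):
    """Minimize `fitness` with a global-best PSO using a linearly decreasing inertia weight."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds

    # Initialization: random positions, zero velocities
    x = rng.uniform(lo, hi, size=(n_particles, dim))
    v = np.zeros_like(x)

    # Initial evaluation
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    g = np.argmin(pbest_f)
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]

    for t in range(max_iter):
        # Eq. (5): linearly decreasing inertia weight
        w = (w1 - w2) * (max_iter - t) / max_iter + w2

        # Eqs. (1)-(4): velocity and position updates
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        cognitive = c1 * r1 * (pbest - x)
        social = c2 * r2 * (gbest - x)
        v = w * v + cognitive + social
        x = np.clip(x + v, lo, hi)

        # Evaluation and Eqs. (6)-(7): update personal and global bests
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = f[improved]
        g = np.argmin(pbest_f)
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]

    return gbest, gbest_f

# Example: minimize the sphere function in 5 dimensions
best, best_f = pso(lambda p: float(np.sum(p ** 2)), dim=5, bounds=(-5.0, 5.0))
```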

2.5 Artificial neural network

An artificial neural network (ANN) is an artificial intelligence method used to create a computational model inspired by the structural and functional features of biological neural networks. One of the main outstanding properties of ANNs is their ability to model complex nonlinear relationships, potentially incorporating high-order interactions between predictive variables. The use of ANNs in research fields requiring classification or prediction, such as psychiatry, robotics, and biology, is appealing to researchers [55, 56] due to their ability to adapt, learn, generalize, organize, and classify data. The superiority of the ANN method over linear data mining methods is well documented in a number of areas of medical research as well [21–23]. An ANN model is formed of neurons arranged in layers and weighted connections transmitting signals between the neurons, in a forward or looped manner, to pass the information gathered from the inputs or former neurons to the output [57]. The generated model thus represents a distributed adaptive system built from multiple interconnected processing elements, much as real neural networks are.

In feed-forward neural networks (FNN), the processing elements, the neurons, are distributed over several layers. The intermediate layers are known as hidden layers, while the first layer is called the input layer and the last one the output layer. In general terms, each neuron receives signals processed and transmitted by the neurons in the preceding layer and transmits them to the next layer. The number of layers and the way in which the neurons are connected form the architecture of the network. The signal passing through each connection is scaled by an adjustable parameter associated with that connection, called a weight, which is set randomly before the modeling process is initiated. Each neuron in a hidden layer collects the signals from the former layer(s), sums them, and generates the output for the following layer using an activation function. Depending on the structure of the system, a linear or nonlinear transfer function is used at the junction points of the neurons. The output of each layer is transmitted to the following layer, and finally the output layer generates the output to be compared with the reference in order to calculate the error value. The weights of the neuron connections are then modified according to the selected training algorithm in order to minimize the error. This process is repeated until a previously established criterion is reached, for example, when the error value reaches a threshold or stops decreasing [58].

One way to minimize the error value is the back-propagation (BP) algorithm, a gradient-descent procedure which, ideally, requires infinitesimal changes in the connection weights. In BP, the network error for the given inputs is calculated, and the weights of the connections between the neurons in the last hidden layer and the output layer are modified according to the extent to which these connections have contributed to the current error [59].
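To make the forward pass and one BP weight update concrete, the following NumPy sketch implements a single-hidden-layer network with a logistic-sigmoid hidden layer and a linear output, trained by plain gradient descent on the squared error for one example. All sizes, the learning rate, and the synthetic input are illustrative assumptions; the study itself used the Levenberg–Marquardt variant of BP rather than plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, lr = 28, 20, 0.01       # illustrative sizes and learning rate

# Randomly initialized weights and biases
W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (1, n_hidden))
b2 = np.zeros(1)

def logsig(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Forward pass: weighted sums plus activations, layer by layer."""
    h = logsig(W1 @ x + b1)   # hidden layer, nonlinear transfer
    y = W2 @ h + b2           # output layer, linear transfer
    return h, y

# One back-propagation step for a single (input, target) example
x = rng.standard_normal(n_in)
target = np.array([1.0])

h, y = forward(x)
err = y - target                     # output error
dW2 = np.outer(err, h)               # gradient of 0.5 * err**2 w.r.t. W2
db2 = err
dh = (W2.T @ err) * h * (1.0 - h)    # error back-propagated through the sigmoid
dW1 = np.outer(dh, x)
db1 = dh

# Gradient-descent update of the connection weights
W2 -= lr * dW2
b2 -= lr * db2
W1 -= lr * dW1
b1 -= lr * db1
```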

In this study, the ANN model used the back-propagation learning algorithm with one hidden layer of 20 neurons. Because of its nonlinear structure, the logsig transfer function was employed in the hidden layer and the purelin transfer function in the output layer, as shown in Fig. 2.

Fig. 2 Structure of the back-propagation neural network used

Input data were collected from 19 electrodes in three frequency bands, the trainlm (Levenberg–Marquardt) training function was used to train the model, and sixfold cross-validation was used to test the classifier. To evaluate the classification algorithm, the receiver operating characteristic (ROC) curve, a plot of sensitivity [true-positive rate (TPR)] as a function of the false-positive rate (FPR, 1 − specificity), was used.
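A hedged scikit-learn sketch of this evaluation protocol is shown below for orientation. It is not the MATLAB implementation used in the study: scikit-learn's MLPClassifier has no Levenberg–Marquardt (trainlm) solver, so L-BFGS is substituted, its output layer is logistic rather than purelin, and the feature matrix X and labels y are random placeholders standing in for the 28 cordance features of the 89 subjects.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve

# Placeholder data: 89 subjects x 28 features; label 1 = bipolar, 0 = unipolar
rng = np.random.default_rng(0)
X = rng.standard_normal((89, 28))
y = np.concatenate([np.ones(31, dtype=int), np.zeros(58, dtype=int)])

# One hidden layer of 20 logistic (logsig-like) units; L-BFGS stands in for
# the Levenberg-Marquardt (trainlm) algorithm, which scikit-learn does not offer.
clf = MLPClassifier(hidden_layer_sizes=(20,), activation="logistic",
                    solver="lbfgs", max_iter=2000, random_state=0)

# Sixfold cross-validation with out-of-fold probability estimates
cv = StratifiedKFold(n_splits=6, shuffle=True, random_state=0)
proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]

accuracy = accuracy_score(y, (proba >= 0.5).astype(int))
auc = roc_auc_score(y, proba)
fpr, tpr, thresholds = roc_curve(y, proba)   # points of the ROC curve
```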

3 Results

In this study, the combination of a swarm intelligence method, PSO, for feature selection and an ML approach, ANN, was used to assess the value of artificial intelligence methods for the diagnosis, treatment planning, and monitoring of psychiatric and neurological diseases. Initially, the classification performance of the ANN model was expressed in terms of accuracy; then, in order to improve the outcome of the model, it was transformed into a hybrid model incorporating a feature selection process. Twenty-eight inputs were used from the Fp1, Fp2, F3, F4, F7, F8, T3, T4, T5, T6, C3, C4, P3, and P4 electrodes in the alpha and theta frequency bands. A significant improvement was observed with the contribution of PSO, and the classification accuracy increased despite the decreasing number of features. The classification results before and after the feature selection process are given in terms of overall accuracy, sensitivity, and area under the ROC curve in Table 1. The ROC curves for the compared approaches are plotted in Fig. 3 as well.

Table 1 Classification performance of PSO–ANN and standalone ANN models
Fig. 3 ROC curves of bipolar disorder subjects for the ANN and PSO–ANN hybrid models

Throughout the classification process, the (FPR, TPR) pair at each threshold is plotted to form the ROC curve. Each point on the ROC curve thus represents a sensitivity/(1 − specificity) pair corresponding to a particular decision threshold. Depending on the classification performance, the relative changes of TPR and FPR may differ, causing sharp transitions between cutoff points in the ROC curve. After the frequency band and channel selection phase, the PSO algorithm was used to reduce the feature set, with the classification error as the cost function. The contribution of the feature selection process to the accuracy is substantial. The hybrid model classified the subjects with 89.89 % overall accuracy, the percentage of examples classified correctly. Sensitivities also increased from 64.52 to 83.87 % for bipolar disorder subjects and from 77.59 to 93.1 % for unipolar disorder subjects.
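As a cross-check of these figures against the sample sizes (31 bipolar and 58 unipolar subjects), and assuming the hybrid model correctly classified 26 bipolar and 54 unipolar subjects, the only integer counts consistent with the reported sensitivities, the values follow as

$$ \frac{26}{31} \approx 83.87\,\% ,\qquad \frac{54}{58} \approx 93.10\,\% ,\qquad \frac{26 + 54}{89} = \frac{80}{89} \approx 89.89\,\% $$

which matches the overall accuracy given above.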

The area under the ROC curve (AUC) was also used to underline the performance of the PSO algorithm. Feature selection increased the AUC value for bipolar disorder subjects from 0.757 to 0.905; comparative plots are given in Fig. 3. Following the feature selection and classification process, 14 features, namely C3, C4, Fp1, Fp2, F3, F7, T4, and T5 from the alpha frequency band and Fp1, F4, C4, P4, T4, and T6 from the theta frequency band, were retained as prominent, and the 14 remaining features were eliminated due to their limited informative contribution.

4 Discussions and conclusions

In this paper, we present a hybrid artificial intelligence approach combining particle swarm optimization and an artificial neural network to discriminate unipolar and bipolar depressive disorders using more informative features. The literature on feature selection techniques is vast, encompassing many applications of ML and classification. The proposed approach first eliminated the less informative features according to their contribution to the output. Using the selected features, a back-propagation neural network was then generated in order to classify the subjects into two classes, unipolar or bipolar. The outcomes of the combined model are promising for clinicians, and the approach could be evaluated as a useful tool in the diagnostic process.

The clinical interpretation of the outcomes is noteworthy for subsequent interdisciplinary studies. Numerous clinical and neuroimaging studies have been conducted with the aim of validating the differentiation of unipolar and bipolar depression. Results of previous neuroimaging studies suggest that abnormal activation in prefrontal and subcortical regions underlies the impaired cognitive control and impulsivity commonly reported in BD and major depressive disorder (MDD) [60, 61]. Only a small number of neuroimaging studies have compared the brain functioning of bipolar and unipolar depressive subjects [62–64].

There is evidence that unipolar depression is associated with increased functional connectivity of three networks: the default mode network, the cognitive control network, and the affective network, converging on the dorsal medial prefrontal cortex [65]. In another study, 14 individuals with bipolar II depression and 26 patients with recurrent unipolar depression, aged between 21 and 45 years, were compared [66]. All participants underwent functional magnetic resonance imaging (fMRI) and functional connectivity analyses while performing two repetitions of a motor activation task. The two groups did not significantly differ in their task performance. However, bipolar patients had significantly stronger functional connectivity between the posterior cingulate cortex and one cluster in the right parietal/insular region, compared with unipolar patients. This cluster included portions of the right inferior parietal lobule, the precentral gyrus and insula, and surrounding regions.

Functional neuroimaging studies suggest that in BD, dysregulation of mood is caused by disturbed prefrontal modulation of subcortical and medial temporal structures within the anterior limbic network. Elevated activity and volume loss of the hippocampus, orbitofrontal and ventral prefrontal cortex, hypometabolism of the dorsal prefrontal cortex, and bidirectional metabolic changes of the anterior cingulate have been described in BD [67–69]. Results of other researchers also suggest similar dysfunctions in brain connectivity in unipolar depression [70, 71].

Efforts to develop a reliable biomarker to predict treatment response resulted in "cordance" studies [34]. In unipolar depression, one of the best-documented brain functional biomarkers predicting a response to an antidepressant is the decrease in quantitative EEG (QEEG) prefrontal cordance in the theta frequency band [72–74]. Furthermore, another study described that a decrease in cordance value was associated with a switch to mania [75].

Numerous former EEG studies have attempted to evaluate the distinct features of BD and UD as compared to other clinical and non-clinical populations. One well-replicated finding in UD is an inter-hemispheric frontal alpha asymmetry, compared to healthy subjects, due to increased left frontal alpha power, a well-known indicator of idling activity on that side [76]. Decreased alpha and increased theta power in the frontocentral regions are the most common findings in BD patients [77, 78]. A recent study reported that deficient left-hemisphere alpha power in BD and decreased inter-hemispheric theta coherence in UD could discriminate these two groups. That study also underlined that BD patients, as compared to UD patients, exhibited greater central–temporal theta and parietal–temporal alpha and theta coherence [79]. We therefore used only theta and alpha activity data and excluded delta and beta activity data in this study.

Unfortunately, despite intensive research in the field, findings in cerebral metabolic studies of BD remain controversial. When compared to unipolar depression, the cerebral metabolic changes observed in bipolar disorder have been suggested to be more associated with dysregulation of the dorsolateral prefrontal circuit [69] and the anterior cingulate [80]. One study suggested that the disrupted baseline metabolic status is reversed by effective treatment [81], but there is also some evidence of persistent metabolic abnormalities in euthymic patients [82].

Finally, the results demonstrate that EEG cordance values have the potential to discriminate between UD and BD. The loss of temporal synchronization in the frontal interhemispheric and right-sided frontolimbic neuronal networks was suggested to be a distinguishing feature between BD and UD in previous research [79]. In this context, this paper puts forward a two-step hybridized methodology: the PSO algorithm for the feature selection process and an ANN for the training process. The noteworthy performance of the PSO–ANN approach showed that it is possible to discriminate 31 bipolar and 58 unipolar subjects using selected features from the alpha and theta frequency bands with 89.89 % overall classification accuracy.

Our findings support the potential utility of the proposed methodology as a clinical tool for classifying UD and BD subjects. Functional neuroimaging methods provide information about differences in the neural processes associated with unipolar versus bipolar depression. Further studies are warranted to replicate this result in order to lead to the development of clinically useful diagnostic methodologies.