1 Introduction

Schizophrenia is a chronic brain disorder that disrupts a person's normal thinking, speech, and behavior. Functional magnetic resonance imaging (fMRI) plays a pivotal role in the design of automated tools for the diagnosis of schizophrenia. It is a neuro-imaging technique that captures brain activity in small units of brain volume, called voxels, by measuring changes in the blood-oxygen-level-dependent (BOLD) [41] signal over time. Brain activity is closely linked to the supply of oxygen to the various regions of the brain, since the blood oxygenation level of a region varies with its neural activity. Because oxygenated and deoxygenated hemoglobin have different magnetic properties, the magnetic resonance (MR) signal of blood differs slightly depending on its degree of oxygenation, and it is these differences that fMRI detects.

Functional magnetic resonance imaging is used to detect biomarkers of task-related brain activation, enabling the detection of several brain disorders such as schizophrenia, Parkinson’s disease, Alzheimer’s disease, mild traumatic brain injury, addiction, and bipolar disorder. Several models based on machine learning techniques have been proposed [17, 18, 26, 39, 45] for analyzing fMRI data to identify different ailments. High dimensionality poses a major challenge in applying machine learning techniques to fMRI data. fMRI data are typically 4-dimensional, consisting of 3-D images acquired over time. A 3-D fMRI image may be thought of as a stack of 2-D images (slices) covering the whole brain, and each slice is made up of small units of brain volume called voxels. Thus, a voxel represents a particular position in the brain. Another issue confronting researchers is the limited number of subjects (data samples). The curse of dimensionality [5] and the small sample size make most models very sensitive to changes in the data. To deal with high-dimensional fMRI data, most models employ some form of feature reduction or feature selection.

In this paper, we propose a three-stage feature selection model to classify schizophrenia patients and healthy subjects using fMRI data acquired during an auditory oddball (AUD) task. The first stage applies a General Linear Model (GLM), the second stage applies a two-sample t-test, and the third stage employs the Non-dominated Sorting Genetic Algorithm (NSGA-II) [14] to generate a feature set (set of voxels) that has low cardinality and yields high classification accuracy. The proposed model achieves classification accuracies in the range 92.6% to 99.5% on the FBIRN Phase-II dataset [30] of healthy and schizophrenia subjects. Using the proposed model, we are also able to identify relevant brain regions affected by schizophrenia. To the best of our knowledge, an evolutionary approach has not previously been used in a bi-objective framework on fMRI data to build a computer-aided diagnosis model for schizophrenia.

The rest of the paper is organized as follows: in Section 2, we summarize the related work; in Section 3, we describe the data sets and the details of the proposed methodology; in Section 4, we describe the experimental settings and the results, and finally in Section 5, we summarize the conclusions and outline the scope of future work.

2 Related work

Acquisition of fMRI data is a complex process that generates huge volumes of data. Extracting knowledge from these data involves several steps, including preprocessing, feature reduction, and modelling, often using machine learning techniques. Techniques such as Principal Component Analysis (PCA) [8, 16, 18, 28, 49], Fisher Linear Discriminant (FLD) analysis [17, 45], Singular Value Decomposition (SVD) [4, 27], deep neural networks [31, 52], and Convolutional Neural Networks (CNN) [42] are often used for feature extraction and feature selection.

Ford et al. [17] combined structural and functional MRI scans to classify schizophrenia patients and healthy individuals. They extracted the hippocampal formation by applying a mask and used Fisher linear discriminant analysis (FLDA) to reduce the feature set, with the objective of maximizing the ratio of between-class to within-class variability. Using Leave-One-Out Cross-Validation (LOOCV), they obtained an accuracy of 83% to 87% on a group of 23 subjects (15 schizophrenia patients and 8 healthy subjects). In another study, Ford et al. [18] applied Principal Component Analysis (PCA) to lower the dimensionality of the data and then applied FLD to distinguish healthy subjects from schizophrenia patients, obtaining an accuracy of 60% to 80% for different choices of principal components. They also demonstrated the effectiveness of the approach for differentiating healthy subjects from patients with Alzheimer’s disease and patients with mild traumatic brain injury. Shi et al. [45] used regional homogeneity (ReHo) [54] as a measure of the regional coherence of spontaneous brain activity. They applied an anatomical template to the ReHo map to partition it into 116 brain regions, and used the mean and standard deviation of the ReHo values in each region as features for the classification model. A pseudo-Fisher linear discriminant (PFLD) was applied in a LOOCV manner to classify healthy subjects and schizophrenia patients, achieving a correct prediction rate of 80%. Demirci et al. [15] proposed a projection pursuit (PP) algorithm for feature selection and used Independent Component Analysis (ICA) to separate the data into maximally independent groups and identify the networks related to schizophrenia. They applied three group ICA operations to the data from three different tasks and obtained 20 independent spatial components. Classification was performed using LOOCV. Arribas et al. [4] used a two-step method, a one-sample t-test followed by Singular Value Decomposition (SVD), to reduce the number of features of the AUD-task fMRI scans for classifying healthy subjects, patients with bipolar disorder, and patients with schizophrenia. They trained four classifiers using a stochastic gradient learning rule and obtained an average three-way correct classification rate (CCR) in the range 70% to 72%. Using resting-state and task-related fMRI data, Du et al. [16] classified schizophrenia patients and healthy controls with a three-level feature selection approach. In the first step, they used hypothesis testing based on the t-test. In the second step, they used kernel principal component analysis (K-PCA) to compute a low-dimensional representation of the significant voxels, and finally they applied FLD to further extract features that maximize the ratio of between-class variability to within-class variability. Classification was done using the LOOCV approach. Using majority voting, they achieved accuracies of 98% and 93% for the AUD task and the resting-state data, respectively.

Castro et al. [10] combined Multiple Kernel Learning (MKL) machines and proposed a new MKL algorithm (ν-MKL) that achieves a tunable sparse selection of feature sets, which improved classification accuracy on a functional brain imaging dataset. They obtained classification accuracies of 85% and 90% using lp-norm and L-norm, respectively. Juneja et al. [27] used pattern recognition techniques for dimensionality reduction of fMRI data to classify schizophrenia patients and healthy subjects. They proposed a three-phase method for analysing the fMRI data. In the first phase, they generated 3-D spatial maps using GLM, and used ICA to generate independent components. In the second phase, they used clustering to retain local spatial contiguity, followed by singular value decomposition (SVD) on each cluster, thus reducing the number of features substantially. In the third phase, a novel hybrid multivariate forward feature selection method was used to select the features. Finally, schizophrenia patients and healthy controls were classified using SVM with a LOOCV policy, achieving 92.6% and 94% classification accuracy on the two fMRI datasets from FBIRN.

In another study, Juneja et al. [26] applied a statistical paired t-test to the contrast map images created by SPM to develop a computer-aided diagnosis (CAD) tool for distinguishing schizophrenia patients from healthy controls. Having obtained a minimal set of features from statistical significance testing, they used the selected features for classification with a Support Vector Machine (SVM). Using LOOCV, they obtained accuracies of around 80% and 88% on the two fMRI datasets from FBIRN. In a further study, Juneja et al. [28] proposed a three-phase dimensionality reduction technique comprising segmentation of 3-D spatial maps (ICA and β maps) into anatomical brain regions, feature extraction using fuzzy kernel PCA, and finally filter-cum-wrapper feature selection to find a reduced set of features. In their model, schizophrenia patients and healthy subjects were classified in a LOOCV manner using SVM, resulting in accuracies of 95.6% and 96% on the two fMRI datasets from FBIRN.

Some works have also been reported on computer-aided diagnosis of schizophrenia using resting-state fMRI (rs-fMRI). Chyzhyk et al. [13] used a Pearson’s-correlation-based feature selection method, followed by a genetic algorithm, to find an optimal set of features. Subsequently, they applied an ensemble of extreme learning machine classifiers, obtaining an accuracy of around 86%. Savio et al. [43] worked on rs-fMRI data of schizophrenia patients and healthy controls. They computed different local activity measures, followed by three feature selection algorithms, namely Pearson’s correlation, the Bhattacharyya distance [6], and Welch’s t-test [50]. Finally, they used SVM for classification and obtained a maximum accuracy of around 80%.

Recently, evolutionary and multi-objective optimization approaches have been used for the analysis of fMRI data. Aaberg et al. [1] proposed an evolutionary approach to select features for multivariate pattern analysis. They used a single-subject fMRI dataset in which brushing and resting-state conditions alternated. A Multiple Linear Regression (MLR) classifier was applied to subjects individually using only five voxels, obtaining an accuracy of 74.3%. Niiniskorpi et al. [39] used particle swarm optimization (PSO) in conjunction with a simple MLR classifier and an SVM with a linear kernel to identify the brain regions relevant to the classification task. They built two datasets: one from a single healthy subject (alternating brushing task and resting state), and another comprising nine healthy controls (finger-tapping task). They achieved a classification score of 83.5% on group-level 3-D fMRI data from the finger-tapping study. Ulker et al. [47] used a combination of an active method [38] and a genetic algorithm for feature selection. Using a set of 300 voxels, they obtained a classification accuracy of around 90%. A genetic algorithm was also used by Shahamat et al. [44] for feature selection on fMRI images, followed by Linear Discriminant Analysis (LDA) to classify schizophrenia patients and healthy controls. The authors obtained an average classification accuracy of 83.0%, but did not identify the brain regions responsible for schizophrenia. Smart et al. [46] studied the application of Genetic Programming (GP) to feature selection using intracranial electroencephalography (iEEG) and fMRI data of epilepsy patients. They observed the need for patient-specific feature selection to obtain better classification results. Using nearest-neighbour classification and 30 GP generations, they achieved over 60% median sensitivity and over 60% median selectivity for fMRI data. Ma et al. [37] formulated Multi-Voxel Pattern Analysis (MVPA) as a Multi-Objective (MO) pattern classification problem. They integrated a hierarchical heterogeneous PSO (HHPSO) scheme with SVM to propose a feature interaction detection framework for voxel selection. In this framework, the first stage finds a subset of interacting features, while the second stage further eliminates interaction (or connectivity) redundancy, improving classification accuracy.

3 Materials and methods

3.1 Dataset

All data used in this study were obtained from the Function BIRN (FBIRN) Data Repository. The FBIRN repository contains a multi-site fMRI dataset that includes schizophrenia patients and healthy subjects. The data were acquired using 1.5T and 3T scanners, with all other parameters kept the same across subjects. In this study, we used BOLD fMRI data from the auditory oddball (AUD) task; all subjects had normal hearing and adequate eyesight and were able to perform the cognitive task. Healthy subjects were excluded if they had a current or past history of head injury or major medical illness. Patients were included only if they met the criteria of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) [20] for schizophrenia or schizoaffective disorder.

3.1.1 Dataset details

In our study, we used two datasets, namely D1 and D2. Dataset D1 contains fMRI data of 30 schizophrenia patients and 30 healthy subjects (from sites 0009 and 0010 of the FBIRN repository), acquired with a 1.5T scanner. Four runs of each subject’s scan were used in the experiments. Table 1 shows the demographic details of the dataset.

Table 1 Demographic details of the dataset D1 (1.5 Tesla)

Dataset D2 comprises fMRI data of 25 schizophrenia patients and 25 healthy subjects (from sites 0005, 0006 and 0018 of the FBIRN repository), acquired with a 3T scanner. Four runs of each subject’s scan were used in the experiments. Table 2 shows the demographic details of the dataset.

Table 2 Demographic details of the dataset D2 (3 Tesla)

3.1.2 Task details

The auditory oddball task is commonly used [3, 25, 29, 36, 40] to detect alterations in brain activation patterns that help differentiate schizophrenia patients from healthy subjects. A subject is presented with a continuous stream of discrete auditory stimuli comprising standard tones and deviant (i.e. oddball) tones, and must identify the deviant tones. Standard tones (1000 Hz) occur in 95% of trials, while deviant tones (1200 Hz), which are distinct from the standard tones, occur only occasionally (5% of trials). FBIRN conducted the auditory oddball task in four experimental runs, each lasting 280 seconds. During each run, the subjects looked at a gray screen with a black fixation cross in the middle and were asked to press button ‘1’ each time they heard a deviant tone, while focusing on the cross and listening to the tones. Each run began with a 15-second silent fixation block. Then a sequence of standard tones (duration = 100 ms) was presented, with a deviant tone (duration = 100 ms) presented every 6 to 15 seconds. A 15-second period of silence ended each run. In each run, 140 brain scans were acquired with a repetition time (TR) of 2 seconds (140 × 2 s = 280 s).

3.1.3 Imaging parameters

According to the FBIRN repository, the functional scans were T2*-weighted gradient-echo EPI (Echo Planar Imaging) sequences. Pulse sequence parameters were closely matched across sites based on pilot studies carried out by the FBIRN research group: orientation: anterior commissure-posterior commissure (AC-PC) line; number of slices: 27; slice thickness: 4 mm; TR: 2 s; echo time (TE): 40 ms for 1.5 T scanners; matrix: 64 × 64; field of view: 22 cm; and flip angle: 90°.

3.2 Theoretical background

Genetic algorithms (GA), often used for solving optimization problems, are evolutionary algorithms inspired by natural (biological) evolution. They follow Darwin’s “survival of the fittest” principle, evolving a set of candidate solutions towards the optimum. A genetic algorithm starts with an initial population of candidate solutions encoded as strings over an alphabet, most commonly the binary alphabet {0, 1}. These strings, called chromosomes, are randomly initialized. Once the chromosomes are generated, the algorithm evaluates the fitness of each of them for the optimization problem at hand. The next generation of solutions (child chromosomes) is created using selection, crossover, and mutation operations. The selection step imitates survival of the fittest by giving preference to better individuals, and the selected chromosomes are placed in a common mating pool. In the crossover step, a crossover point is randomly selected, and two new offspring are created by recombining the portions of two parent individuals on either side of that point. The mutation step flips one or more bits of an individual; its purpose is to maintain diversity among the chromosomes and thereby avoid premature convergence. These steps are repeated until no significant improvement is observed over successive generations or a time-out condition is reached.
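The GA cycle described above can be illustrated with a minimal Python sketch. It is a generic textbook GA on a toy problem (OneMax, where fitness is simply the number of ones), not the implementation used in this study; the function names, population size, and stopping rule are illustrative assumptions.

```python
import random

def fitness(chrom):
    # Toy fitness (OneMax): the number of ones in the chromosome.
    return sum(chrom)

def tournament(pop, k=2):
    # Binary tournament: draw k random individuals and keep the fittest.
    return max(random.sample(pop, k), key=fitness)

def one_point_crossover(a, b):
    # Pick a random cut point and swap the tails of the two parents.
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chrom, rate=0.01):
    # Flip each bit independently with probability `rate` to keep diversity.
    return [1 - g if random.random() < rate else g for g in chrom]

def run_ga(n_bits=30, pop_size=40, generations=100, seed=0):
    random.seed(seed)
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            c1, c2 = one_point_crossover(tournament(pop), tournament(pop))
            children += [mutate(c1), mutate(c2)]
        pop = children[:pop_size]
        if max(map(fitness, pop)) == n_bits:   # stop once the optimum is reached
            break
    return max(pop, key=fitness)

best = run_ga()
print(fitness(best))
```

Because OneMax has a known optimum (all ones), it is a convenient sanity check that selection, crossover, and mutation together drive the population towards better solutions.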

A bi-objective optimization problem is modelled using two conflicting objective functions f1 and f2 as:

  • f1: To be maximized or minimized

  • f2: To be maximized or minimized

The problem can then be written in terms of a vector-valued objective function f:

$$f:X\rightarrow \mathbb{R}^{2}, \quad f(x)=(f_{1}(x),f_{2}(x))^{T}$$

where a point \(x \in X\) denotes a feasible solution, X is the decision space, and \(Y = f(X) \subseteq \mathbb{R}^{2}\) (the objective space) denotes the image of X. Since the objectives are conflicting in nature, no single solution can optimize both objective functions simultaneously. A solution \(x^{*} \in X\) is said to be Pareto optimal [14] if and only if there is no other solution \(x \in X\) that is at least as good as \(x^{*}\) on both objectives and strictly better on at least one of them.
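To make the dominance relation concrete, the following Python sketch checks Pareto dominance for the two objectives used later in this paper (accuracy to be maximized, number of features to be minimized) and filters a small list of candidate scores down to its non-dominated set. The scores are made-up illustrative values, not results from this study.

```python
from typing import List, Tuple

# A solution is scored as (accuracy, n_features): accuracy is maximized, n_features minimized.
Score = Tuple[float, int]

def dominates(a: Score, b: Score) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(scores: List[Score]) -> List[Score]:
    """Keep the scores that no other score dominates (simple O(N^2) pairwise check)."""
    return [s for s in scores if not any(dominates(t, s) for t in scores)]

candidates = [(0.95, 40), (0.90, 20), (0.95, 60), (0.85, 25), (0.99, 55)]
print(pareto_front(candidates))   # [(0.95, 40), (0.9, 20), (0.99, 55)]
```

Here (0.95, 60) is dominated by (0.95, 40) (same accuracy, fewer features) and (0.85, 25) by (0.90, 20); the three survivors are mutually incomparable trade-offs.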

The dimensionality of fMRI data is too high for a classification model to reliably distinguish between healthy subjects and schizophrenia patients. Therefore, an appropriate feature set must be selected for a decision model to be effective. To the best of our knowledge, evolutionary approaches have not been effectively applied to select relevant features that help differentiate between schizophrenia patients and healthy subjects. Moreover, classification accuracy and feature set size are conflicting objectives. This paper is the first attempt towards bi-objective modelling of fMRI data analysis in schizophrenia to address this conflict. We use the Non-dominated Sorting Genetic Algorithm (NSGA-II) [2, 14] to arrive at the Pareto-optimal front. NSGA-II is an evolutionary algorithm for multi-objective optimization that improves the fitness and adaptability of a population of candidate solutions, driving it towards the Pareto front.

The runtime of NSGA-II is dominated by non-dominated sorting, the most expensive part of the algorithm; the time spent computing the objective functions is comparatively insignificant. Thus, the runtime complexity of the algorithm is of the order \(O(mN^{2})\) per generation, where m is the number of objective functions and N is the population size [14]. As NSGA-II dominates the computation time of the proposed approach, the runtime complexity of the overall approach is also \(O(mN^{2})\) per generation. The space complexity of NSGA-II is of the order \(O(mN + N^{2})\) [14, 19]. In this study, we propose a three-step feature selection algorithm. In the first step, we use the standard general linear model (GLM) [21] approach. The second step applies the two-sample t-test. Finally, we apply NSGA-II to select the features useful for the classification task.
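The \(O(mN^{2})\) cost comes from the pairwise dominance comparisons in the fast non-dominated sorting procedure of Deb et al. [14]. A minimal Python sketch of that procedure follows; it assumes every objective is to be minimized, so accuracy would be passed as its negative. It is an illustrative reimplementation, not the code used in this study.

```python
def dominates(a, b):
    # All objectives minimized: a is no worse everywhere and strictly better somewhere.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def fast_non_dominated_sort(objs):
    """Return a list of fronts, each a list of indices into `objs` (front 0 = Pareto front)."""
    n = len(objs)
    dominated_set = [[] for _ in range(n)]   # S_p: solutions that p dominates
    dom_count = [0] * n                      # n_p: number of solutions dominating p
    fronts = [[]]
    for p in range(n):
        for q in range(n):                   # N^2 pairs, each compared on m objectives
            if dominates(objs[p], objs[q]):
                dominated_set[p].append(q)
            elif dominates(objs[q], objs[p]):
                dom_count[p] += 1
        if dom_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_set[p]:       # removing front i may free q for front i+1
                dom_count[q] -= 1
                if dom_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]                       # drop the trailing empty front

# (negated accuracy, number of features) for five hypothetical chromosomes
objs = [(-0.95, 40), (-0.90, 20), (-0.95, 60), (-0.85, 25), (-0.99, 55)]
print(fast_non_dominated_sort(objs))   # [[0, 1, 4], [3, 2]]
```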

In this study, we have identified the following two conflicting objective functions:

  • f1: Maximization of classification accuracy

  • f2: Minimization of number of features

In the next section, we will discuss each step of our feature selection methodology in detail.

3.3 Our approach

For classification of schizophrenic patients and healthy subjects, we adopted a three-stage approach as shown in Fig. 1. Each stage of the proposed approach is described in the following subsections.

Fig. 1 Stages in the proposed approach


3.3.1 Data pre-processing

The raw datasets taken from the FBIRN repository were preprocessed using the Statistical Parametric Mapping toolbox, version 8 (SPM8, Wellcome Trust Centre for Neuroimaging, University College London, UK). Raw scans were collected at a voxel size of 3.4 × 3.4 × 4 mm³. The scans were first realigned using the first scan as the reference. Slice-timing correction was then applied to correct for differences in acquisition time between slices. Subsequently, the fMRI scans were spatially normalized to the standard Montreal Neurological Institute (MNI) space using the EPI template available in SPM8. This resamples the voxels to 3 × 3 × 3 mm³ and yields volumes of 53 × 63 × 46 voxels. Finally, spatial smoothing was performed with a Gaussian kernel of 9 × 9 × 9 mm full width at half maximum (FWHM) to obtain the smoothed volumes.
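For reference, a Gaussian kernel's FWHM relates to its standard deviation by FWHM = 2√(2 ln 2)·σ ≈ 2.355σ, so a 9 mm FWHM corresponds to σ ≈ 3.82 mm, or about 1.27 voxels at 3 mm resolution. The Python sketch below illustrates separable isotropic Gaussian smoothing with NumPy under that conversion; it is a simplified stand-in (zero-padded borders, kernel truncated at 3σ), not SPM8's own smoothing routine.

```python
import math
import numpy as np

def fwhm_to_sigma(fwhm_mm):
    # For a Gaussian, FWHM = 2*sqrt(2*ln 2)*sigma ≈ 2.3548*sigma.
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def gaussian_kernel_1d(sigma_vox, truncate=3.0):
    radius = int(math.ceil(truncate * sigma_vox))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma_vox) ** 2)
    return k / k.sum()                    # normalized, so the total signal is preserved

def smooth_volume(vol, fwhm_mm=9.0, voxel_mm=3.0):
    """Separable isotropic Gaussian smoothing of a 3-D volume (zero-padded at the borders)."""
    k = gaussian_kernel_1d(fwhm_to_sigma(fwhm_mm) / voxel_mm)
    out = np.asarray(vol, dtype=float)
    for axis in range(out.ndim):          # a 3-D Gaussian factorizes into three 1-D passes
        out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), axis, out)
    return out

print(round(fwhm_to_sigma(9.0), 3))       # 3.822 (mm), i.e. about 1.27 voxels at 3 mm
```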

3.3.2 Stage-1: 1st level analysis

The preprocessed 4-D fMRI scans of each subject are analyzed with a general linear model (GLM) using the SPM8 toolbox in MATLAB. The GLM analysis is carried out by specifying the contrast of the deviant-tone response versus the standard-tone response.

The GLM analysis generates a 3-D contrast map, also called an activation map. In a contrast map, the value at a voxel estimates the difference in that voxel's activation between the two conditions. A zero value at a voxel indicates that the voxel is not activated during the task condition [26]. The GLM analysis is carried out for each of the four runs of each subject. Thereafter, for each subject, an average 3-D contrast map containing only the activated voxels is generated by averaging the contrast maps obtained for the four runs. Although this stage reduces the feature vector size considerably, the dimensionality is still too high to distinguish the two classes efficiently.
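A minimal NumPy sketch of the first-level GLM for one run is shown below. It assumes events are modelled as stick functions convolved with a double-gamma approximation of the canonical HRF, an intercept column as the only nuisance regressor, and hypothetical onset times; SPM8's actual design (e.g. high-pass filtering, temporal derivatives, autocorrelation modelling) is richer. The contrast vector [1, −1, 0] yields the deviant-minus-standard estimate per voxel, and averaging the four per-run maps yields the subject-level map.

```python
import math
import numpy as np

TR, N_SCANS = 2.0, 140   # per-run values from Sect. 3.1.2

def double_gamma_hrf(tr=TR, duration=32.0):
    # Approximation of SPM's canonical HRF: a gamma(6) peak minus a gamma(16)/6 undershoot.
    t = np.arange(0.0, duration, tr)
    gamma_pdf = lambda a: t ** (a - 1) * np.exp(-t) / math.gamma(a)
    h = gamma_pdf(6.0) - gamma_pdf(16.0) / 6.0
    return h / h.sum()

def regressor(onsets_s, tr=TR, n=N_SCANS):
    # Stick function at the scans containing each onset (100 ms tones treated as events),
    # convolved with the HRF and truncated to the run length.
    sticks = np.zeros(n)
    np.add.at(sticks, (np.asarray(onsets_s) / tr).astype(int), 1.0)
    return np.convolve(sticks, double_gamma_hrf(tr))[:n]

def contrast_map(Y, dev_onsets, std_onsets):
    """Y: (n_scans, n_voxels) BOLD data of one run -> deviant-minus-standard effect per voxel."""
    n = len(Y)
    X = np.column_stack([regressor(dev_onsets, n=n), regressor(std_onsets, n=n), np.ones(n)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)   # ordinary least-squares fit of the GLM
    c = np.array([1.0, -1.0, 0.0])                 # contrast vector: deviant - standard
    return c @ beta

# Subject-level map from a hypothetical list `runs` of (Y, dev_onsets, std_onsets):
# subject_map = np.mean([contrast_map(*run) for run in runs], axis=0)
```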

3.3.3 Stage-2: the statistical testing

We linearized each subject’s data into a one-dimensional vector. In the second stage of analysis, we used the two-sample t-test to select relevant features. The t-test is applied feature-wise to assess the statistical significance of the difference in a feature between the two groups. The null hypothesis H0, that the mean value of the feature is the same in both groups, is tested at significance level α = 0.01. Let d be the total number of features selected in the first stage; then the t-statistic for the ith feature is computed as:

$$ t_{i}= \frac{ \mu_{s_{i}} - \mu_{h_{i}}}{\sqrt{\frac{\sigma^{2}_{s_{i}}}{n_{s}}+\frac{\sigma^{2}_{h_{i}}}{n_{h}}}}, \qquad \forall i = 1,2,\ldots,d \tag{1} $$

where \(\mu _{s_{i}}\) and \(\sigma ^{2}_{s_{i}}\) are the mean and variance of the ith feature for the schizophrenia patients, \(\mu _{h_{i}}\) and \(\sigma ^{2}_{h_{i}}\) are the corresponding mean and variance for the healthy subjects, and \(n_{s}\) and \(n_{h}\) are the numbers of schizophrenia patients and healthy subjects, respectively. A higher t-value signifies greater relevance of a feature. The features are ranked by their t-values and, after experimenting with different numbers of selected features, the top 300 ranked features were passed to Stage 3 of our approach.
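Equation (1) and the ranking step can be sketched in NumPy as follows. The sketch assumes unbiased sample variances and ranks features by \(|t_i|\), so that strongly negative differences also count as relevant; this ranking rule and the function names are assumptions. It does not compute p-values for the α = 0.01 test, which would additionally require the t-distribution (for example from SciPy).

```python
import numpy as np

def t_scores(X_s, X_h):
    """Eq. (1) for every feature. X_s: (n_s, d) patients, X_h: (n_h, d) healthy subjects."""
    mu_s, mu_h = X_s.mean(axis=0), X_h.mean(axis=0)
    var_s, var_h = X_s.var(axis=0, ddof=1), X_h.var(axis=0, ddof=1)   # unbiased variances
    num = mu_s - mu_h
    den = np.sqrt(var_s / len(X_s) + var_h / len(X_h))
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)  # constant features -> 0

def top_k_features(X_s, X_h, k=300):
    # Indices of the k features with the largest |t|, most relevant first (assumed ranking rule).
    return np.argsort(-np.abs(t_scores(X_s, X_h)))[:k]
```

In the LOOCV protocol of Sect. 3.3.4, this ranking would be recomputed on each training fold rather than on all subjects.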

3.3.4 Stage-3: application of GA

The third stage of the proposed approach applies the non-dominated sorting genetic algorithm (NSGA-II) [14], outlined in Algorithm 1. Based on the features selected in Stage 2, we create a population of binary chromosomes, each 300 bits long. A one (zero) at a position in the chromosome indicates the presence (absence) of the corresponding feature. The chromosomes of the initial population are randomly generated with 20% of their bits set to one. In our experiments, the population size (S) is fixed at 200. The fitness value of a chromosome for the first objective function (f1, maximization of classification accuracy) is evaluated independently with three different classifiers, namely a support vector machine (SVM) with a linear kernel, an SVM with a sigmoid kernel, and a k-NN classifier (with k = 1). The fitness value for the second objective function (f2, minimization of the number of features) is computed by counting the number of ones in the chromosome. The offspring population \(M_i\) is generated using binary tournament selection, followed by one-point crossover and mutation, with a mutation rate of 0.01. The fitness values of the offspring population \(M_i\) are computed, and a pooled population \(T_i\) of the parent population \(P_i\) and the offspring population \(M_i\) is formed. The pooled population \(T_i\) is then sorted to find the non-dominated solutions along the Pareto front, and the chromosomes representing the trade-off solutions, \(P_{i+1}\), are passed to the next generation. The maximum number of generations (MaxGen) is set to 100.
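The chromosome encoding and the two objective values can be sketched as below. This is an illustrative NumPy version with hypothetical function names: it uses only the 1-NN classifier for f1 (the SVM variants would need an SVM library such as libsvm), and it assumes f1 is estimated by an inner leave-one-out loop over the samples it is given, which in the actual pipeline would be the training fold only.

```python
import numpy as np

def init_population(pop_size=200, n_bits=300, frac_ones=0.2, seed=0):
    """Random binary chromosomes with exactly 20% of bits set (60 of 300 candidate voxels)."""
    rng = np.random.default_rng(seed)
    pop = np.zeros((pop_size, n_bits), dtype=np.uint8)
    n_ones = int(round(frac_ones * n_bits))
    for row in pop:                              # each row is a view into `pop`
        row[rng.choice(n_bits, size=n_ones, replace=False)] = 1
    return pop

def loocv_1nn_accuracy(X, y):
    # Inner leave-one-out 1-NN: each sample is labelled by its nearest *other* sample.
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(D, np.inf)
    return float(np.mean(y[D.argmin(axis=1)] == y))

def objectives(chrom, X, y):
    """Return (f1, f2): f1 = accuracy on the selected columns (maximize), f2 = #ones (minimize)."""
    mask = chrom.astype(bool)
    if not mask.any():
        return 0.0, 0                            # empty feature set: zero accuracy
    return loocv_1nn_accuracy(X[:, mask], y), int(mask.sum())
```

The pair returned by `objectives` is what the non-dominated sorting step would rank, with accuracy negated if the sorter assumes minimization.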

We used the LOOCV scheme for feature selection. In LOOCV, one data sample is used for testing and the rest are used for training. This process is repeated N times (where N is the sample size) so that each sample is used as the test sample exactly once. The feature selection process is carried out only on the training data to avoid double dipping [33]. We repeated each experiment 10 times to capture the variability of the evolutionary approach. The feature selection process is shown in Algorithm 2.
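The key point of this protocol, that feature selection only ever sees the training fold, can be expressed as an outer LOOCV loop that calls the selection routine afresh for every held-out subject. In the NumPy sketch below, the selector and classifier are hypothetical stand-ins (a mean-difference ranking and 1-NN) for Stages 2 and 3 and for the classifiers used in the paper.

```python
import numpy as np

def loocv_with_selection(X, y, select, classify):
    """Outer LOOCV; `select` sees only the training fold, never the held-out subject."""
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        feats = select(X[train], y[train])              # e.g. t-test ranking + NSGA-II
        pred = classify(X[train][:, feats], y[train], X[i, feats])
        correct += int(pred == y[i])
    return correct / len(y)

def select_by_mean_diff(X, y, k=1):
    # Hypothetical stand-in for Stages 2-3: keep the k features with the largest |mean difference|.
    diff = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(-diff)[:k]

def nearest_neighbour(X_train, y_train, x):
    # 1-NN prediction for a single test vector x.
    return y_train[np.argmin(((X_train - x) ** 2).sum(axis=1))]

# Toy data: feature 0 separates the classes, feature 1 is small noise.
X = np.array([[0.0, 0.3], [0.1, -0.3], [0.2, 0.0], [1.0, 0.3], [1.1, -0.3], [1.2, 0.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(loocv_with_selection(X, y, select_by_mean_diff, nearest_neighbour))   # 1.0
```

Running the selector once on all subjects and then cross-validating would let the test subject influence which features are chosen, which is exactly the double dipping the protocol avoids.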


4 Experimental results and discussion

Experiments were carried out using MATLAB R2014a (MathWorks Inc., Natick, MA, USA) on Ubuntu 14.04 LTS, on a machine with an Intel® Xeon 2.10 GHz x17 processor and 32 GB RAM. We used the SPM8 toolbox for preprocessing and general linear modelling, the libsvm [11] package for classification, the Talairach Daemon for mapping, and the Multi-image Analysis GUI (Mango) [35] for visualizing the mapped brain regions.

We used C-Support Vector Classification (C-SVC) [7], available in the libsvm tool for MATLAB, with fine-tuned parameters. The regularization parameter was set to C = 100 after evaluating values of C from 0.01 to 1000 in multiplicative steps of 10. C-SVC solves the following optimization problem:

$$ \begin{aligned} \min_{\omega, b, \xi} \quad & \frac{1}{2} \omega^{T} \omega + C\sum_{i=1}^{l}\xi_{i} \\ \text{subject to} \quad & y_{i}\left(\omega^{T}\phi(x_{i}) + b\right) \geq 1 - \xi_{i}, \\ & \xi_{i} \geq 0, \quad i = 1,\ldots,l \end{aligned} \tag{2} $$

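where \(\phi(x_{i})\) maps \(x_{i}\) into a higher-dimensional space, \(\omega\) is the weight vector, \(b\) is the bias, \(\xi_{i}\) are the slack variables, \(y_{i} \in \{-1, +1\}\) are the class labels, \(l\) is the number of training samples, and \(C > 0\) is the regularization parameter [11].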
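For a fixed \((\omega, b)\), the smallest slacks satisfying the constraints of Eq. (2) are \(\xi_{i} = \max(0, 1 - y_{i}(\omega^{T}\phi(x_{i}) + b))\), so the objective reduces to a hinge-loss form. The NumPy sketch below evaluates it for a linear kernel (\(\phi(x) = x\)) on a toy example; it only evaluates the objective and is not a solver like libsvm. The grid of C values is our reading of the tuning range (0.01 to 1000 in factors of 10).

```python
import numpy as np

C_GRID = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]   # candidate values of C, factors of 10

def csvc_primal_objective(w, b, X, y, C=100.0):
    """Eq. (2) with a linear kernel (phi(x) = x); labels y must be in {-1, +1}."""
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)   # smallest slack with y_i(w^T x_i + b) >= 1 - xi_i, xi_i >= 0
    return 0.5 * w @ w + C * xi.sum(), xi

X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
obj, xi = csvc_primal_objective(np.array([1.0, 0.0]), 0.0, X, y)
print(obj, xi)   # 50.5 [0.  0.5 0. ]
```

Only the second point lies inside the margin (margin 0.5), so it alone incurs slack; with C = 100 its penalty dominates the \(\frac{1}{2}\omega^{T}\omega\) term, which is why larger C pushes the solver towards fewer margin violations.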

4.1 Experimental results

Each experiment in our three-stage evolutionary approach was repeated ten times to capture variability. For dataset D1 (1.5 Tesla), we obtained mean classification accuracies of 99.0%, 99.5%, and 95.0% using SVM with a sigmoid kernel, SVM with a linear kernel, and the 1-NN classifier, respectively. Table 3 shows the results of each run of the experiments on D1, including the mean and standard deviation of the number of features obtained in each run along with the classification accuracies of the models. Similar experiments were carried out on dataset D2 (3 Tesla), for which we obtained accuracies of 97.4%, 95.2%, and 92.6% using SVM with a sigmoid kernel, SVM with a linear kernel, and the 1-NN classifier, respectively. Table 4 shows the results for dataset D2.

Table 3 Classification accuracy, mean and standard deviation values of feature sets for each of the 10 runs of the experiments with i) SVM using Sigmoid kernel, ii) SVM using Linear kernel and iii) 1-NN classifier respectively for dataset D1
Table 4 Classification accuracy, mean and standard deviation values of feature sets for each of the 10 runs of the experiments with i) SVM using Sigmoid kernel, ii) SVM using Linear kernel and iii) 1-NN classifier respectively for dataset D2

Figures 2 and 3 show the variability in the number of relevant features selected for classification using linear SVM across the 10 runs for datasets D1 and D2, respectively.

Fig. 2 Variability in the number of features selected by linear SVM across the ten runs for dataset D1

Fig. 3 Variability in the number of features selected by linear SVM across the ten runs for dataset D2

To evaluate the relevance of the proposed methodology, we also conducted experiments on datasets D1 and D2 without any feature selection. The feature set obtained after the GLM analysis was used directly for classification with SVM (linear and sigmoid kernels) and 1-NN classifiers in a LOOCV manner. For datasets D1 and D2, we obtained mean classification accuracies of 45.0% and 44.0% with linear SVM, 53.33% and 40% with sigmoid-kernel SVM, and 53.33% and 42% with the 1-NN classifier, respectively (see Table 5).

Table 5 Mean classification accuracy for datasets D1 and D2 without feature selection

We also conducted experiments using the principal component analysis (PCA) tool available in MATLAB 2014b for feature extraction. The feature set obtained from the GLM and t-test analyses of Stages 1 and 2 was used as input to PCA. For classification, we used a linear-kernel SVM and an L1-regularized L2-loss linear support vector classifier (SVC) in a LOOCV manner. We tuned the cost parameter by varying C from 0.01 to 1000 in multiplicative steps of 10 to obtain the best accuracy. For dataset D1, linear SVM and L1-regularized L2-loss SVC gave highest mean classification accuracies of 65% and 60%, respectively. For dataset D2, they gave highest mean accuracies of 50% and 52%, respectively.
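A minimal NumPy sketch of PCA via the singular value decomposition is shown below, fitted on the training fold and then applied to held-out data. The number of components is a hypothetical parameter; this is not MATLAB's `pca` function, although it computes the same principal axes up to sign.

```python
import numpy as np

def pca_fit(X_train, n_components):
    # Center on the training mean; the leading right-singular vectors are the principal axes.
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    # Project (training or held-out) samples onto the axes learned from the training fold.
    return (X - mean) @ components.T

# Inside an outer LOOCV fold (hypothetical arrays X_train, x_test):
# mean, comps = pca_fit(X_train, n_components=10)
# Z_train, z_test = pca_transform(X_train, mean, comps), pca_transform(x_test, mean, comps)
```

With about 60 subjects per dataset, a training fold supports at most one fewer informative component than it has samples, so the number of components is necessarily far below the number of input voxels.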

4.2 Discussion

Different runs of the experiment on dataset D1 (see Table 3) resulted in about 40 to 50 features for each fold of LOOCV. Across the ten repetitions, about 800 distinct voxels were selected. These voxels represent the regions that help distinguish schizophrenia patients from healthy subjects. The brain regions to which these voxels belong were identified using the Talairach Daemon [34], with analysis at multiple levels of the human brain in Talairach space: hemisphere, lobe, gyrus, and cell type (Brodmann area). Figures 4, 5, 6 and 7 show the selected regions for dataset D1. As shown in Fig. 4, the majority of the selected voxels belong to the left cerebrum, the right cerebrum, or the right brainstem. Figure 5 shows the percentage-wise distribution of voxels among the lobes; the majority of the voxels lie in the frontal lobe, limbic lobe, midbrain, or temporal lobe. Figure 6 presents the gyrus-level analysis: the identified voxels lie mainly in the superior frontal gyrus, medial frontal gyrus, middle frontal gyrus, culmen, postcentral gyrus, and thalamus. Figure 7 shows the percentage-wise distribution of the identified voxels among the Brodmann areas (BA): the majority lie in BA 10, 6, 37, 8, 9, 2, 3 and 19, the substantia nigra, the red nucleus, or the hypothalamus.
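Mapping selected features back to brain locations requires undoing the linearization of Sect. 3.3.3. The sketch below assumes each feature index refers to a position in the full 53 × 63 × 46 volume, linearized in MATLAB's column-major order and converted to 0-based indexing; if only a subset of voxels was retained, indices must first be mapped through that mask. The region labels would come from an atlas lookup such as the Talairach Daemon, which is not reproduced here, and the function names are hypothetical.

```python
import numpy as np
from collections import Counter

VOL_SHAPE = (53, 63, 46)   # normalized volume size (Sect. 3.3.1)

def feature_to_voxel(idx, order="F"):
    # MATLAB linearizes column-major, hence order="F"; idx is 0-based (MATLAB index minus 1).
    return np.unravel_index(idx, VOL_SHAPE, order=order)

def percentage_by_region(labels):
    # labels: one region name per selected voxel, e.g. from an atlas lookup.
    counts = Counter(labels)
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.most_common()}

print(percentage_by_region(["A", "A", "B", "C"]))   # {'A': 50.0, 'B': 25.0, 'C': 25.0}
```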

Fig. 4 Percentage-wise distribution of affected voxels across hemisphere regions for dataset D1

Fig. 5 Percentage-wise distribution of affected voxels across the lobes for dataset D1

Fig. 6 Percentage-wise distribution of affected voxels across gyral regions for dataset D1

Fig. 7

Percentage-wise distribution of affected voxels across Brodmann areas for dataset D1

As shown in Figs. 8, 9, 10 and 11, the results on dataset D2 map to regions similar to those found for dataset D1, except for the red nucleus and hypothalamus. In addition, the results on dataset D2 include voxels from the anterior cingulate, parahippocampal gyrus, inferior frontal gyrus, and precuneus.

Fig. 8

Percentage-wise distribution of affected voxels across hemisphere regions for dataset D2

Fig. 9

Percentage-wise distribution of affected voxels across the lobes for dataset D2

Fig. 10

Percentage-wise distribution of affected voxels across gyral regions for dataset D2

Fig. 11

Percentage-wise distribution of affected voxels across Brodmann areas for dataset D2

The regions identified by our approach are consistent with those reported in previous studies on schizophrenia [10, 24, 30, 48, 51, 53]. Several studies comparing schizophrenia patients with healthy subjects have sought to localise the brain regions associated with the disorder. Kim et al. [30] identified regions such as the culmen, superior temporal gyrus, middle temporal gyrus, inferior frontal gyrus, postcentral gyrus, parahippocampal gyrus, precuneus, and angular cingulate gyrus, among others. In a similar study, Garrity et al. [22] showed that patients with schizophrenia exhibited similar brain connectivity among the posterior cingulate, precuneus, and cingulate gyrus. Our experiments also identified the precuneus and middle frontal gyrus, regions involved in selective attention, whose disruption is an important characteristic of schizophrenia [48]. Neuropathological alterations in the substantia nigra have been reported in a previous study [51], and task-evoked hyperactivity in this region has also been observed in schizophrenia patients [53]. In another study, Honea et al. [24] analysed structural MRI with the objective of distinguishing schizophrenia patients from healthy subjects. Their findings showed that patients with schizophrenia had highly significant volume decreases in the frontal cortex, mainly in the bilateral medial frontal cortex and the inferior frontal gyri. They noted that the changes in gray matter volume were most evident in the prefrontal and medial frontal cortices. Another structural study reported a decrease in hypothalamus volume [32].

Figures 12 and 13 show the identified voxels in three different views of the brain for datasets D1 and D2, respectively. Our study also points to differences in these regions between schizophrenia patients and healthy subjects, as seen in Figs. 6, 7, 10 and 11. Moreover, several studies [9, 12, 23] have identified regions similar to those found in our study.

Fig. 12

Voxels identified by the proposed approach, with their t-test values, in different views of the brain for dataset D1 (1.5 Tesla)

Fig. 13

Voxels identified by the proposed approach, with their t-test values, in different views of the brain for dataset D2 (3 Tesla)

The main contribution of this paper is the application of a bi-objective optimization framework to the classification of fMRI data. The framework uses NSGA-II to select a small set of features (voxels) that improves classification accuracy. Our study also identifies brain regions that are potentially affected in schizophrenia, which may shed light on conventional lines of treatment for the disorder.
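NSGA-II ranks candidate voxel subsets by Pareto dominance and breaks ties within a front by crowding distance. The sketch below shows only these two ranking steps of NSGA-II, assuming each candidate is scored as (number of selected voxels, classification error) with both objectives minimised; this objective pairing is an assumption for illustration, the candidate scores in the example are made up, and the selection, crossover, and mutation operators used in the paper are not reproduced.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One candidate voxel subset scored as (number of selected voxels, classification
# error); both objectives are minimised. This pairing is an assumption.
Objectives = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points: List[Objectives]) -> List[List[int]]:
    """Rank candidates into Pareto fronts (front 0 = non-dominated), as in NSGA-II."""
    dominated: List[List[int]] = [[] for _ in points]  # indices each point dominates
    count = [0] * len(points)  # number of points that dominate each point
    fronts: List[List[int]] = [[]]
    for p, a in enumerate(points):
        for q, b in enumerate(points):
            if dominates(a, b):
                dominated[p].append(q)
            elif dominates(b, a):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    while fronts[-1]:
        next_front = []
        for p in fronts[-1]:
            for q in dominated[p]:
                count[q] -= 1
                if count[q] == 0:
                    next_front.append(q)
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(points: List[Objectives], front: List[int]) -> Dict[int, float]:
    """Crowding distance of each member of one front; boundary points get infinity."""
    distance = {i: 0.0 for i in front}
    if not front:
        return distance
    for m in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        distance[ordered[0]] = distance[ordered[-1]] = math.inf
        if hi == lo:
            continue
        for k in range(1, len(ordered) - 1):
            gap = points[ordered[k + 1]][m] - points[ordered[k - 1]][m]
            distance[ordered[k]] += gap / (hi - lo)
    return distance


# Made-up scores for five candidate subsets: (n_voxels, error).
candidates = [(40, 0.05), (50, 0.02), (45, 0.10), (120, 0.02), (30, 0.20)]
fronts = fast_non_dominated_sort(candidates)
crowd = crowding_distance(candidates, fronts[0])
```

Here (45, 0.10) is dominated by (40, 0.05) and (120, 0.02) by (50, 0.02), so both fall into the second front; the first front holds the trade-off between very few voxels and low error.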

5 Conclusion and future scope

In this paper, we have addressed the problem of feature selection in fMRI data to improve classification accuracy in a bi-objective framework. We have proposed a three-stage approach comprising GLM analysis, statistical hypothesis testing, and NSGA-II to obtain a small set of relevant features that yields high classification accuracy. Using a small set of 40 to 50 voxels, we achieved a mean classification accuracy of 99.5% over ten runs of the experiment. Using brain atlases in Talairach space, we have identified the brain regions that are most affected in schizophrenia patients; specifically, we identified the regions that help distinguish healthy subjects from schizophrenia patients. In the future, one may explore the applicability of other evolutionary approaches, such as differential evolution, particle swarm optimization, and ant colony optimization, for identifying the brain regions affected by schizophrenia. Further, this study may be extended to incorporate the effect of covariates such as age, gender, smoking habits, and antipsychotic medication. It may also be interesting to apply the proposed methodology to structural MRI analysis to find volumetric changes in the brain.
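The statistical hypothesis testing stage can be illustrated with a voxel-wise two-sample t-test that keeps only voxels whose statistic exceeds a cut-off. This is a minimal NumPy sketch under stated assumptions: the inputs are per-subject GLM estimates arranged as subjects × voxels, the test is Student's pooled-variance t-test (Welch's unequal-variance test would be an equally plausible choice), and `t_threshold` is a hypothetical parameter; the significance level and any multiple-comparison correction used in the paper are not reproduced here.

```python
import numpy as np


def two_sample_t(patients, controls):
    """Voxel-wise Student's two-sample t statistic with pooled variance.

    patients, controls: arrays of shape (n_subjects, n_voxels), assumed to hold
    per-subject GLM estimates from stage 1. Voxels with zero variance in both
    groups yield inf or nan.
    """
    n1, n2 = len(patients), len(controls)
    v1 = patients.var(axis=0, ddof=1)
    v2 = controls.var(axis=0, ddof=1)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return (patients.mean(axis=0) - controls.mean(axis=0)) / se


def select_voxels(patients, controls, t_threshold):
    """Indices of voxels whose |t| exceeds a (hypothetical) threshold, plus all t values."""
    t = two_sample_t(patients, controls)
    return np.flatnonzero(np.abs(t) > t_threshold), t


# Toy data: 3 subjects per group, 3 voxels; only voxel 0 differs between groups.
patients = np.array([[5.0, 1.0, 0.0], [6.0, 2.0, 1.0], [7.0, 0.0, -1.0]])
controls = np.array([[1.0, 1.0, 0.5], [2.0, 2.0, -0.5], [3.0, 0.0, 0.0]])
# 2.776 is approximately the two-sided 5% critical value for 3 + 3 - 2 = 4 degrees of freedom.
kept, t_values = select_voxels(patients, controls, t_threshold=2.776)
```

In this toy case voxel 0 has means 6 and 2 with unit variance in both groups, giving t = 4 / sqrt(2/3) ≈ 4.9, so it is the only voxel retained.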