1 Introduction

Of the many activities the brain makes possible, visual object recognition is among the most intriguing. Neuroscientists have long sought to further their understanding of the various cognitive processes. This is increasingly possible through brain mapping, in which a relationship is established between a perceptual state and specific patterns of activity in the brain.

Functional Magnetic Resonance Imaging (fMRI) is an imaging technology used primarily to record brain activation during a task. Its non-invasive, safe and easy-to-use nature, together with its promising spatial and good temporal resolution, has contributed immensely to its popularity in medicine, research and industry. It has been instrumental in studies that have shed light on the functional aspects of the brain with respect to memory, language, pain, learning and emotion, as discussed at length in [1]. Applying multiple methods of data analysis to fMRI data can give deeper insight into the patterns represented by these images of the brain. Researchers have employed fMRI in hundreds of studies that identify which regions of the brain are activated, on average, when a human performs a particular cognitive task, and research publications have enumerated summary statistics of brain activity at various locations.

As elucidated in [2,3,4], a number of machine learning techniques can be effectively employed to draw scientific conclusions from such data. These works show how computing is used as a tool to delve deeper into the patterns generated in the brain, which algorithms must then use to draw inferences. Pattern analysis is therefore the key to solving the brain-mapping problem.

1.1 Related Studies

Many approaches have been developed for pattern analysis. Multivariate Pattern Analysis (MVPA) is described and used in [5] and [6]. It analyses the pattern by considering the fMRI data as a whole, and it has proven to be more sensitive to, and more informative about, the functional organization of the cortex than univariate analysis with the General Linear Model (GLM). MVPA allows us to study how specific stimuli are encoded in detailed activity patterns in specific parts of the brain.

Classifying the stimuli for a particular activity is a fundamental task in the study of the brain. Machine learning makes this possible with data classification algorithms, and a range of classifiers has been used across various works. LDA is implemented in [7]. A technique that uses a collection of machine learning algorithms to train classifiers of specific stimuli is adopted in [8], where GNB and kNN classifiers are combined to achieve more than 95% accuracy. That work points out that the high dimensionality and intrinsically low signal-to-noise ratio of fMRI data create a need for alternative and collective methods of classification. The approach appears to improve multiple-subject experiments by reducing the high inter-subject variability in brain function. The work in [9] uses LDA and SVM classifiers, with the SVM classifiers achieving 53% accuracy for restricted voxels, and explains the classification with specific reference to the visual cortex. The work in [10] also uses an SVM classifier, to predict the orientation of the stimulus. Jeiran Choupan [11] compares SVM, NN and CRF classifiers under various conditions for the same dataset used in our work. Song [12] presents a comprehensive study of SVM classification for fMRI data with different voxel selection schemes. Weili Zheng [13] points out the need for these classifiers to optimally select brain regions. Given the high dimensionality and volume of fMRI data, the SVM classifier appears to outperform the other classifiers, as in [14]. Hence, this work incorporates an SVM classifier and, to further enhance its performance, employs a variant of it.

1.2 Fuzzy SVM

Fuzzy SVM (FSVM) is a classification methodology that extends the SVM classifier with additional conditions for classification. A clear explanation of the underpinnings of the SVM and FSVM formulations is given in [15]. It explains how classification decisions are handled by certain rules whenever the distribution of the test data in the feature space does not yield a decisive classification.

The interest is to train classifiers to automatically decode the subjects' visual cognitive state over an interval in time. When such classifiers are trained reliably, they can serve as virtual sensors of cognitive states for further analysis. This study investigates the utility of such methods in improving the prediction accuracy of classifiers trained on functional neuroimaging data taken from [16].

1.3 Scope of Our Work

This work explores the use of a classification method, FSVM, in the context of an event-related functional neuroimaging experiment where participants viewed images of objects in intervals. It involves training support vector machines on functional data to predict with greater accuracy the objects viewed by the participants. It shows that the classifier achieves better-than-random predictions and that, on average, its predictions agree closely with the actual stimuli. Here, the classification method consists of feature extraction, feature selection and classification parts, and it employs a feature extraction method based on the mean change in intensity from the baseline condition to the sample.

To process the fMRI data corresponding to the task of viewing objects belonging to a finite set of categories one after the other, classifiers are built to classify input fMRI image volumes into their corresponding categories. This involves performing statistical corrections and analysis on the data, selecting and extracting the characteristic features as voxels, and training an SVM classifier on the data, followed by the constructed FSVM classifier. An N-fold cross-validation mechanism is used in training these classifiers. The two classifiers are compared with respect to their relative accuracies in predicting the different categories corresponding to the data.

2 System Design

The system implemented in this work uses the fMRI data of the visual one-back task dataset downloaded from [16] and the acquired fMRI data of two additional subjects, which are employed as test data in the classification. The image volumes are preprocessed using several techniques and the category-representing features are extracted. The feature set is given as input to the SVM and FSVM classifiers, which assign the data to the categories of objects that were viewed by the subject during data acquisition.

2.1 Dataset

While the fMRI images for the dataset were recorded, the subjects saw the eight objects presented as greyscale photographs for 24 s, followed by 12 s of rest. Each stimulus is held for 500 ms with an inter-stimulus interval of 1500 ms. Twelve time series volumes are extracted for each of the eight subjects.

We acquired additional real test data by carrying out the same task (for two categories only: shoe and bottle images) with two healthy volunteers under the same experimental conditions [repetition time (TR) = 2500 ms, 40 sagittal images of 3.5 mm thickness, field of view (FOV) = 24 cm, echo time (TE) = 30 ms, \({\text {flip angle}} = 90^{\circ }\)].

The dataset thus covers the visual identification of eight different categories of objects (House, Scrambled, Cat, Shoe, Bottle, Scissors, Chair and Face), presented as greyscale images to eight different subjects, together with the additional test data.

2.2 Preprocessing

A series of operations is applied to correct and normalize the data so that features can be extracted and processed further. This prepares the data for classification. The operations are summarized in Table 1.

Table 1 A summary of the preprocessing steps applied

2.3 Feature Extraction

Features are extracted with respect to the baseline condition, using feature space reduction and a searchlight technique, to construct data that can be used for training.

The major steps in feature selection and extraction are explained as follows:

Examples Creation The brain images corresponding to each category are distributed across time in independent blocks. This step combines the images across time points as an example. It is done by block averaging, i.e. averaging the images within each block of time in a run.
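
Block averaging as described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in this work: the function name `block_average`, the flattened `(timepoints, voxels)` layout and the toy values are all assumptions made for the example.

```python
import numpy as np

def block_average(volumes, block_ids):
    """Average the volumes of each block into one example per block.

    volumes   : array of shape (n_timepoints, n_voxels) (assumed layout)
    block_ids : one id per time point; equal ids mark the same block
    """
    volumes = np.asarray(volumes, dtype=float)
    block_ids = np.asarray(block_ids)
    # Keep the blocks in order of first appearance rather than sorted order.
    _, first_idx = np.unique(block_ids, return_index=True)
    ordered_ids = block_ids[np.sort(first_idx)]
    examples = np.stack(
        [volumes[block_ids == b].mean(axis=0) for b in ordered_ids]
    )
    return examples, ordered_ids

# Toy data: four time points, two voxels, two blocks.
toy = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]])
examples, ids = block_average(toy, ["face", "face", "house", "house"])
```

Each row of `examples` is then one labelled example, so a run with one block per category yields one example per category.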

Spherical Searchlight The image volume examples in a trial are analysed by applying a searchlight that compares each voxel with its neighbouring voxels. This process infers whether the voxel is representative of the features of the category. A set of voxels that represent the features is thus selected, and the pattern is formulated as a feature vector labelled by the category it represents.

To reduce the input dimensionality, feature-representing voxels are selected using an approach similar to that in [3]:

  i. A fixed sphere is moved over the brain image volume, voxel-by-voxel.

  ii. The mean intensity of all the voxels within the sphere is computed.

  iii. Fixing the mean value within the sphere as a threshold, all the voxels with higher intensity are assigned a score based on ranking.

  iv. This scoring information is corrected for multiple comparisons as each data point is used multiple times.

  v. Finally, all those voxels with the maximum score are selected.
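
The sphere-based scoring steps above can be illustrated with a simplified sketch. It is not the PyMVPA searchlight used in this work: the score here is simply the number of spheres in which a voxel exceeds the sphere mean (a stand-in for the rank-based score), the multiple-comparison correction of step iv is omitted, and the function names and radius are hypothetical.

```python
import numpy as np

def sphere_offsets(radius):
    """Integer (x, y, z) offsets lying within `radius` voxels of the origin."""
    r = int(radius)
    grid = np.mgrid[-r:r + 1, -r:r + 1, -r:r + 1].reshape(3, -1).T
    return grid[(grid ** 2).sum(axis=1) <= radius ** 2]

def searchlight_scores(volume, radius=1):
    """Count, per voxel, the spheres in which it exceeds the sphere mean."""
    volume = np.asarray(volume, dtype=float)
    shape = np.array(volume.shape)
    offsets = sphere_offsets(radius)
    scores = np.zeros(volume.shape, dtype=int)
    for centre in np.ndindex(*volume.shape):       # step i: move the sphere
        coords = offsets + np.array(centre)
        # Drop sphere positions that fall outside the volume.
        inside = np.all((coords >= 0) & (coords < shape), axis=1)
        x, y, z = coords[inside].T
        values = volume[x, y, z]
        above = values > values.mean()             # steps ii and iii
        scores[x[above], y[above], z[above]] += 1
    return scores

def select_max_score(scores):
    """Step v: coordinates of the voxels that attain the maximum score."""
    return np.argwhere(scores == scores.max())

# Toy volume: a single bright voxel in the centre of a 3 x 3 x 3 grid.
toy = np.zeros((3, 3, 3))
toy[1, 1, 1] = 10.0
scores = searchlight_scores(toy, radius=1)
selected = select_max_score(scores)
```

In the toy volume only the bright centre voxel ever exceeds a sphere mean, so it is the only voxel selected.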

Voxel Reduction The set of voxels returned by the searchlight is very large and contains voxels that are trivial. The features that represent the visual activity are localized around the visual cortex. A brain atlas that provides a spatial mask of the visual cortex is used as an anatomical mask to select the voxels that are confined to the region around the visual cortex and reject the remaining voxels. This process yields a reduced list of voxels based on the Region of Interest (ROI).
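
A minimal sketch of this ROI-based reduction is given below. It assumes the atlas mask has already been resampled to the functional grid and loaded as a boolean array; the function name `restrict_to_roi` and the toy mask are illustrative only.

```python
import numpy as np

def restrict_to_roi(voxel_coords, roi_mask):
    """Keep only the voxels whose coordinates fall inside the ROI mask.

    voxel_coords : (n_voxels, 3) integer array of x, y, z indices
    roi_mask     : boolean 3D array on the same grid (assumed pre-aligned)
    """
    coords = np.asarray(voxel_coords, dtype=int)
    keep = roi_mask[coords[:, 0], coords[:, 1], coords[:, 2]].astype(bool)
    return coords[keep]

# Toy mask: the "ROI" is the half of a 4 x 4 x 4 grid with x >= 2.
mask = np.zeros((4, 4, 4), dtype=bool)
mask[2:, :, :] = True
candidates = np.array([[0, 1, 1], [2, 1, 1], [3, 3, 3]])
reduced = restrict_to_roi(candidates, mask)
```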

Generation of Training and Test Data The reduced set of voxels is converted into a form of data which can be used to train a classifier. The x, y and z coordinates of the voxels are indicated along with the category label whose features the voxels represent. The unequal number of voxels of each category is adjusted by padding with out-of-bound values. The data is then optimally split into test and training data.
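
The padding step can be sketched as follows. The fill value `(-1, -1, -1)` is an assumed out-of-bound coordinate (no valid voxel index is negative); the value actually used in this work is not stated, and the function name is hypothetical.

```python
def pad_voxel_lists(voxels_by_category, fill=(-1, -1, -1)):
    """Pad every category's voxel list to the length of the longest one.

    voxels_by_category : dict mapping a category label to a list of
                         (x, y, z) tuples
    fill               : assumed out-of-bound coordinate used as padding
    """
    target = max(len(v) for v in voxels_by_category.values())
    return {
        label: list(v) + [fill] * (target - len(v))
        for label, v in voxels_by_category.items()
    }

voxels = {"shoe": [(1, 2, 3), (4, 5, 6)], "bottle": [(7, 8, 9)]}
padded = pad_voxel_lists(voxels)
```

After padding, every category contributes a feature vector of the same length, which a linear classifier requires.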

2.4 Classification

Support Vector Machine (SVM) and Fuzzy Support Vector Machine (FSVM) classifiers with linear kernels have been used for the classification. A cross-validation mechanism is used to determine the best possible subset for training.

SVM leads to good generalization performance [17] even in case of high-dimensional data and a small set of training patterns. It reduces the problems due to dimensionality by reducing the risk of overfitting the training data when the number of voxels is reduced.

FSVM follows the same principle of SVM, but certain additional computations are performed to add more decision rules to classify data that are either unclassified or classified in an overlapping fashion.

In FSVM, for m-dimensional inputs \( \mathbf {x_i} \) \((i = 1, \ldots , M )\) belonging to class \(y_i\), and assuming the data to be linearly separable, the decision function is given by

$$\begin{aligned} D_i(\mathbf {x}) = \mathbf {w}^t \mathbf {x} + b \end{aligned}$$
(1)

where \(\mathbf {w}\) is an m-dimensional vector and b is a scalar, with the separating hyperplane satisfying:

$$\begin{aligned} y_i (\mathbf {w}^t\mathbf {x_i} + b) \ge 1 \end{aligned}$$
(2)

As stated in [15], the procedure of classification is as follows:

  i. If \(D_i(\mathbf {x}) > 0\) for just one class, the input is classified into that class.

  ii. If \(D_i(\mathbf {x}) > 0\) for more than one class \(i \in \{i_1 , \ldots , i_l\}\) \((l > 1)\), the datum is classified into the class with the maximum \(D_i(\mathbf {x})\) \((i \in \{i_1 , \ldots , i_l\})\).

  iii. If \(D_i(\mathbf {x}) \le 0\) for all the classes, the datum is assigned to the class with the minimum absolute value of \(D_i(\mathbf {x})\).
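
The three decision rules above translate directly into code. The sketch below is only an illustration of those rules applied to per-class decision values; the function names and the example values are hypothetical, and training the per-class weights is outside its scope.

```python
def decision_value(w, x, b):
    """Linear decision function D(x) = w^t x + b, as in Eq. (1)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(decision_values):
    """Apply rules i to iii to a dict mapping class label -> D_i(x)."""
    positive = {c: d for c, d in decision_values.items() if d > 0}
    if len(positive) == 1:
        # Rule i: exactly one class claims the datum.
        return next(iter(positive))
    if len(positive) > 1:
        # Rule ii: several classes claim it; take the largest D_i(x).
        return max(positive, key=positive.get)
    # Rule iii: no class claims it; take the smallest |D_i(x)|.
    return min(decision_values, key=lambda c: abs(decision_values[c]))

single = classify({"shoe": 0.4, "bottle": -1.2})
overlap = classify({"shoe": 0.4, "bottle": 0.9, "cat": -0.1})
unclaimed = classify({"shoe": -0.3, "bottle": -1.2})
```

Rules ii and iii are what resolve the overlapping and unclassified regions that a plain one-against-all scheme leaves undecided.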

The category determined by the decision function is output. The classifiers report the classification results for the test data of all the categories as a confusion matrix.

3 Results and Discussion

The features are extracted from the fMRI data in the dataset to construct the training set and tested with test data for classification. The results obtained from the classifiers are analysed to measure their performance.

The neuroimaging data exist as an anatomical image volume and functional image volumes. The functional image volumes are the acquired data that reflect the intensity change as the stimulus events take place. The 4D time series for each subject consists of 1452 volumes with 40\(\,\times \,\)64\(\,\times \,\)64 voxels, corresponding to a voxel size of 3.5\(\,\times \,\)3.75\(\,\times \,\)3.75 mm and a volume repetition time of 2.5 s.

A sequence of preprocessing steps was applied to the four-dimensional images using FSL [18] to refine them and highlight the features. The brain portion is extracted from the image volumes and the corresponding masks are generated. Motion correction and filtering are then applied to correct recording errors and remove noise.

The resulting image volumes were further preprocessed to normalize the intensities across the voxel space. Detrending was performed on consecutive image volumes in the time series. After the remaining preprocessing steps, a slice of the resulting image volume appears as in Fig. 1.

Fig. 1 A slice of the final preprocessed fMRI image volume
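
As a rough illustration of the detrending and normalization mentioned above, the sketch below removes a straight-line trend from each voxel's time course and then z-scores it. Both choices (a linear trend and z-scoring over time) are assumptions made for the example, as is the function name; the exact settings used in this work are those of the preprocessing tools.

```python
import numpy as np

def detrend_and_zscore(timeseries):
    """Remove a linear trend per voxel, then scale to zero mean, unit variance.

    timeseries : array of shape (n_timepoints, n_voxels) (assumed layout)
    """
    ts = np.asarray(timeseries, dtype=float)
    t = np.arange(ts.shape[0])
    # Least-squares straight line fitted independently to every voxel column.
    slope, intercept = np.polyfit(t, ts, deg=1)
    residual = ts - (np.outer(t, slope) + intercept)
    std = residual.std(axis=0)
    std[std < 1e-12] = 1.0  # avoid dividing (near-)constant voxels by zero
    return (residual - residual.mean(axis=0)) / std

# Toy data: noise plus a drift of 0.5 intensity units per volume.
rng = np.random.default_rng(0)
t = np.arange(50)
toy = rng.normal(size=(50, 3)) + 0.5 * t[:, None]
cleaned = detrend_and_zscore(toy)
```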

The final preprocessed fMRI data is used for creating examples of the average image volumes for the corresponding stimuli conditions. Further, the spherical searchlight technique is applied on them using PyMVPA [19] to extract the voxels which represent the features.

The voxels in the features of the corresponding runs are reduced in number using the ROI representing the visual cortex, and the corresponding feature data of 577 voxels per object category is generated. The feature data is then organized into training and test data.

The input data is split into training data consisting of nine runs and test data consisting of three runs per subject.

An SVM classifier with a linear kernel is invoked using PyMVPA with the generated training and testing data as input. It performs N-fold cross-validation by selecting various combinations of the training and test data to come up with the best possible classification.
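
One simple way to generate such training and test combinations is to hold out three of the twelve runs at a time. The sketch below assumes contiguous, non-overlapping groups of runs, which gives four folds; how the combinations were actually chosen is not specified here, and the function name is hypothetical.

```python
def run_folds(n_runs=12, test_size=3):
    """Split run indices into folds of `test_size` held-out runs each."""
    runs = list(range(n_runs))
    folds = []
    for start in range(0, n_runs, test_size):
        test = runs[start:start + test_size]
        train = [r for r in runs if r not in test]
        folds.append((train, test))
    return folds

folds = run_folds()
for train, test in folds:
    # Every fold trains on nine runs and tests on the remaining three.
    print(len(train), "training runs; test runs:", test)
```

Splitting by whole runs, rather than by individual samples, keeps samples from the same run out of both sides of a fold.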

The 12 runs of each of the 8 subjects are split into 9 runs of training data and 3 runs of test data. There are eight categories of visual objects. The classifier outputs the predicted label in each case. The category labels predicted by the classifier for the test samples are compared with the actual categories they belong to.

The test samples of each category were tested with the SVM classifier. Out of 192 total samples, 140 were correctly classified. Fig. 2 summarizes the results as a confusion matrix of the predicted categories versus the actual categories, covering the 24 test samples of each of the eight categories.

With the FSVM classifier, 146 of the 192 samples were correctly classified. The results of the FSVM classification are presented as a confusion matrix in Fig. 3.

Fig. 2 SVM confusion matrix

Fig. 3 FSVM confusion matrix

The real test data acquired for the shoe and bottle categories were tested with the SVM and FSVM classifiers, and the results are summarized in Table 2. The FSVM classifier gives the correct prediction for both subjects, indicating better generalization than SVM.

Table 2 Results of testing SVM and FSVM classifiers with acquired dataset

Table 3 compares the number of test samples that were classified correctly in each category by the SVM and FSVM classifiers.

Table 3 Classification results of SVM and FSVM classifiers for various categories

Fig. 4 Performance comparison of SVM and FSVM classifiers

The overall accuracy of SVM was 72.92% and that of FSVM was 76.04%. Figure 4 shows that FSVM improved the overall accuracy as well as the accuracy of certain specific categories. The accuracy of the face category, which was 45.83% with SVM, crossed the halfway mark to reach 54.17%. Scrambled, a category that is especially hard to generalize, rose from 75% with SVM to 83.33%. The bottle and chair categories also improved, each by more than 4 percentage points. The remaining categories were classified with the same accuracy by FSVM as by SVM.
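
The overall accuracies follow directly from the counts reported above (140 and 146 correct out of 192 test samples, with 24 test samples in each of the eight categories). The short check below reproduces that arithmetic; the helper name is illustrative.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * correct / total, 2)

samples_per_category = 24
total_samples = samples_per_category * 8       # eight object categories

svm_accuracy = accuracy_percent(140, total_samples)
fsvm_accuracy = accuracy_percent(146, total_samples)

# One additional correct sample in a category moves its accuracy by this much.
step_per_sample = accuracy_percent(1, samples_per_category)
```

Because each category has only 24 test samples, a gain of a single correctly classified sample already accounts for a change of just over 4 percentage points in that category.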

4 Conclusion and Future Work

This work predicted the visual state of a subject from the object being viewed and classified the visual stimuli into various categories. The major task was to consolidate the characteristic features of each stimulus object into a set of voxels for use in multivariate pattern analysis. The extracted features were used to train an SVM classifier, which was then tested to understand which categories were predicted accurately and which were mistaken for other categories, and the accuracy of the classifier was recorded. To minimize the effect of wrongly classified or unclassified data, a Fuzzy SVM (FSVM) classifier was built by modifying the existing classifier and was trained and tested on the same data. This work demonstrates the improvement in classification accuracy over the existing SVM algorithm when FSVM is used. It aims to highlight the possibility of applying computational methods to further current medical diagnosis practices, and it can be extended to building human–computer interfaces and to understanding how the brain encodes visual information.