1 Introduction

Parkinson’s disease (PD) is a progressive neurodegenerative disorder caused by the early death of dopaminergic neurons in the substantia nigra [26]. This pattern of neurodegeneration starts in the dorsal striatum and extends further into the ventral striatum as the disease progresses. The striatum, comprising the caudate nucleus and the putamen, regulates various aspects of motor and cognitive function. In Parkinson’s disease, high levels of reactive oxygen species produced by dopamine metabolism result in increased iron content, which is liable to damage cell components and impair neuronal function [12]. Striatal dopamine deficiency is seen in 2-3% of the population aged over 65 years [26]. PD is the second most common neurodegenerative disease; it has been strongly associated with an increased mortality rate in recent years [25] and affects about 1.4% of the total population worldwide [8]. The manifestations of PD become more widespread in the striatal region in the later stages of the disease. The PD symptoms are related to the impairment of the dopaminergic pathways [17]. This depletion of dopaminergic neurons produces a range of motor and non-motor symptoms. Motor symptoms include shaking, rigidity, slowness of movement and difficulty with walking. Although the cardinal features of the disease are motor in nature, the major non-motor symptoms include depression, psychosis, falls, genitourinary problems and sleep disturbances [4]. The symptoms appear when 60% of the dopaminergic neurons have started to decline [5]. These motor and non-motor symptoms correlate with ageing factors [28] and contribute to the deterioration of quality of life. Appropriate treatment measures that could slow or halt the progression of the disease are crucial during its earlier stages.

Neuroimaging can capture the pathophysiological changes that reflect the impairment of the dopaminergic pathways and thereby support diagnosis of the disease. A wide range of neuroimaging studies have been performed to obtain an in vivo quantification of early PD [23]. Various neuroimaging modalities such as Magnetic Resonance Imaging (MRI), Single Photon Emission Computed Tomography (SPECT), Positron Emission Tomography (PET), Transcranial Sonography and Functional Magnetic Resonance Imaging (fMRI) are used for the diagnosis of PD [1, 29]. PET and SPECT are used to visualise the striatal region and reveal the dopaminergic deficiency. However, these neuroimaging methods can diagnose the disease only when 80% of the neurons have started degenerating and salient manifestations of the disease have developed [3]. As a result, the symptomatic treatments available for PD are initiated at a late stage, by which time the disease has deteriorated extensively and the treatment is of minimal help. Hence early diagnosis of PD is required so that management can begin early and better treatment procedures can be followed. New biomarkers are therefore required alongside SPECT and PET imaging to assess disease severity and progression. Recently, Magnetic Resonance (MR) brain imaging methods have shown promising results in characterising PD in the early stages and are expected to have better sensitivity than standard clinical measures [21]. MRI can monitor structural changes in the brain [3] and detect iron accumulation in the substantia nigra [12]. Quantifying these structural changes would facilitate the evaluation of disease progression. However, a comprehensive analysis of the MR images is required to investigate these pathological changes.

Volumetric analysis is one of the widely used MRI protocols for demonstrating PD-related pathological changes in the striatal region [23]. Voxel based morphometry is an automatic volumetric method employed for the detection of gray matter intensity reduction [18] in the caudate and putamen regions [12]. These recent studies on shape differences in the brain structures have revealed local atrophy. However, these analyses suffer from a lack of spatial specificity. Deep learning has recently emerged as a powerful approach for exploiting the spatial structure of subanatomical regions [7]. Various deep learning techniques have been used in medical image analysis for segmentation, lesion detection, registration, shape modelling and disease classification [10]. Deep neural networks have an exceptional generalisation capacity and can extract higher level features that provide better accuracy in disease classification. Additionally, the development of Convolutional Neural Networks (CNN) for image processing has integrated feature extraction and model learning into a unified framework. CNN architectures have achieved remarkable performance in a wide range of medical imaging applications, including classification of Alzheimer’s disease [36] and alcoholism detection using MR images [37]. Integrating feature extraction and model learning improves the diagnostic performance; however, it makes the construction of the CNN architecture more laborious and time consuming. An alternative approach is therefore to utilise the weights of a trained model that has been constructed for a different application [34]. The knowledge acquired by these pre-trained models can be reused through transfer learning for medical image analysis. Several CNN models such as LeNet, AlexNet, VGG-Net, GoogLeNet, ResNet and Inception have been built for various image classification tasks [32].
AlexNet is a deep convolutional neural network model that was trained to classify the images of the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), a subset of ImageNet [15]. Improved classification performance of the AlexNet architecture with transfer learning has been reported in various medical applications such as detection of polyps from colonoscopy data [34], soft tissue sarcoma classification [11], detection of pathological brain conditions [20] and diagnosis of leukemia from blood slides [35].

In this work, T2 weighted MR brain images are analysed using a CNN model to discriminate between healthy control and PD subjects. A pre-trained AlexNet model adapted by transfer learning is used to classify these images into two classes, Healthy Control (HC) and PD. The modified network is trained and tested on the MR images and its performance measures are evaluated. The remainder of the paper describes the methodology and architecture used in this work, followed by the experimental results of the classification and the conclusion.

2 Methodology

This section briefly describes the MR image database, the pre-processing of the MR images, the AlexNet CNN architecture, the transfer learning applied to the pre-trained AlexNet model, and the classification metrics used to evaluate the model on the input image dataset.

2.1 PD image database

The axial T2 weighted MR images are obtained from the Parkinson’s Progression Markers Initiative (PPMI) public domain database [22]. The PPMI database is widely used by researchers to identify progression biomarkers in PD and to assess the structure and function of the brain over the course of the disease. The PPMI cohort used in this study consists of 182 subjects: 82 Healthy Control and 100 Parkinson’s disease subjects. The demographics of the subjects used in this study are given in Table 1.

Table 1 Demographics of the subjects

2.2 MR image pre-processing

In the pre-processing stage, the input MR images are normalised to achieve a uniform contrast and intensity range across all the images [24]. Each MR image in the input dataset is normalised to the range (0, 1) by selecting a general normalisation criterion [10, 27]. The normalised MR images are then filtered for noise reduction. A 2D Gaussian filter with a smoothing kernel of size 5 × 5 and an optimised standard deviation σ of 0.8 is used to smooth the images and thus reduce their intensity inhomogeneities before further processing.
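The two pre-processing steps can be illustrated with a minimal sketch in plain Python. It assumes min-max scaling as the normalisation criterion and edge replication at the image border; neither choice is stated in the text, and the function names are hypothetical. Only the 5 × 5 kernel size and σ = 0.8 come from the description above.

```python
import math

def min_max_normalise(image):
    """Rescale a 2D list of intensities to 0..1 (min-max scaling is an assumption)."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in image]
    return [[(p - lo) / (hi - lo) for p in row] for row in image]

def gaussian_kernel(size=5, sigma=0.8):
    """Build a size x size Gaussian kernel whose weights sum to 1."""
    half = size // 2
    raw = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            for dx in range(-half, half + 1)]
           for dy in range(-half, half + 1)]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]

def gaussian_filter(image, kernel):
    """Smooth the image with the kernel, replicating edge pixels (assumed border rule)."""
    rows, cols = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i, krow in enumerate(kernel):
                for j, w in enumerate(krow):
                    rr = min(max(r + i - half, 0), rows - 1)  # clamp row index
                    cc = min(max(c + j - half, 0), cols - 1)  # clamp column index
                    acc += w * image[rr][cc]
            out[r][c] = acc
    return out

# Toy 3 x 3 "slice" with made-up intensities, for illustration only.
toy_slice = [[0, 50, 100], [150, 200, 250], [30, 60, 90]]
smoothed = gaussian_filter(min_max_normalise(toy_slice), gaussian_kernel())
```

With σ = 0.8 most of the kernel weight sits on the centre pixel and its immediate neighbours, so the filter suppresses pixel-level noise while blurring anatomical boundaries only slightly.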

2.3 Architecture

The AlexNet model is used as the CNN architecture. It comprises different layers: the input layer, convolution layers, pooling layers, dropout layers and fully connected layers. These layers perform the necessary operations to classify the input data as HC or PD. The input layer receives the MR images and resizes them as required by the pre-trained model before passing them to the next layer. The first few slices of each subject, which do not carry any information required for the analysis, are removed [13] and the remaining slices are used for the analysis.

The architecture of the AlexNet model considered is shown in Fig. 1. There are five convolution layers and three fully connected layers. The outputs of the first, second and fifth convolution layers are passed to max-pooling layers. Each convolution layer convolves its input with predefined kernels. The kernel sizes and stride parameters used in this model are given in Fig. 1. The convolution layers of AlexNet use a nonlinear activation function known as the Rectified Linear Unit (ReLU), which mitigates the problem of diminishing or vanishing gradients. The ReLU activation function can be given as

$$ f(x)=\begin{cases}0, & \text{if } x<0\\ x, & \text{if } x\ge 0\end{cases} \tag{1} $$
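As a small illustration of Eq. (1) and of why ReLU helps against vanishing gradients, the sketch below compares the slope of ReLU with that of a sigmoid unit. The sigmoid comparison and the function names are illustrative assumptions, not part of the described model.

```python
import math

def relu(x):
    """Eq. (1): 0 for negative inputs, x otherwise."""
    return x if x >= 0 else 0.0

def relu_grad(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise (value at x = 0 is a convention)."""
    return 1.0 if x > 0 else 0.0

def sigmoid_grad(x):
    """Slope of the logistic sigmoid, shown only for comparison; it never exceeds 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

activations = [relu(v) for v in (-2.0, -0.5, 0.0, 1.5)]  # [0.0, 0.0, 0.0, 1.5]

# Backpropagating through 8 stacked units at x = 2.0: the ReLU factor stays 1,
# while the product of sigmoid slopes shrinks towards zero.
relu_chain = relu_grad(2.0) ** 8
sigmoid_chain = sigmoid_grad(2.0) ** 8
```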
Fig. 1 Architecture diagram

Normalisation is applied to the first two convolution layers in the network. The pooling layers downsample the outputs of the previous layer, by either max-pooling or average pooling, to reduce their dimensionality. The fully connected layers extract compact, high level features and determine which features correlate with a given class. The atrophy in PD subjects compared with HC is captured as high level features for classification. Dropout layers are added after the fully connected layers in order to reduce over-fitting of the network [9]. The dropout probability is set to 0.5 [31].
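The downsampling performed by convolution and pooling layers can be sketched as follows. The kernel and stride values in the example (11 × 11 with stride 4 for the first convolution, 3 × 3 with stride 2 for max-pooling) are those of the originally published AlexNet and are assumed here, since the values used in this work appear only in Fig. 1; the function names are hypothetical.

```python
def output_size(n, kernel, stride, padding=0):
    """Spatial size produced by a convolution or pooling window sliding over n pixels."""
    return (n + 2 * padding - kernel) // stride + 1

def max_pool2d(fmap, size, stride):
    """Max-pooling over a 2D feature map given as a list of lists."""
    rows = output_size(len(fmap), size, stride)
    cols = output_size(len(fmap[0]), size, stride)
    return [[max(fmap[r * stride + i][c * stride + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)]
            for r in range(rows)]

# Assumed standard AlexNet values: a 227 x 227 input, 11 x 11 kernels, stride 4.
conv1 = output_size(227, kernel=11, stride=4)   # 55
pool1 = output_size(conv1, kernel=3, stride=2)  # 27

# Toy feature map to show the pooling operation itself.
pooled = max_pool2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]], size=2, stride=1)
```

Each pooling step keeps only the strongest response in its window, which is how the spatial dimensions shrink while the most salient features are retained.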

2.4 Transfer learning

In machine learning, the transfer of knowledge from one learned task to a new task is known as transfer learning [14]. The pre-trained AlexNet model is adapted by transfer learning for the classification of MR images for Parkinson’s disease. The weights of the convolutional layers of the CNN are initialised with the weights of the pre-trained AlexNet model, which has the same architecture. The features in a CNN are more generic in the early layers and become specific to the Parkinson’s dataset in the later layers. The last fully connected layers are fine-tuned with Parkinson’s disease MR images so that high level features of the Parkinson’s data are learnt. This is achieved by replacing the last fully connected layer with one that has two output neurons, to classify MR images as HC or PD. Training a CNN from scratch would additionally require a large amount of Parkinson’s data for generating and updating the weights. Hence transferring the weights from the pre-trained model to the CNN yields desirable performance and an improved rate of convergence [34]. The memory required for computation is also reduced by utilising a pre-trained model. The softmax layer following the fine-tuned fully connected layer uses the softmax output unit activation function, given as

$$ y_r(x)=\frac{e^{a_r(x)}}{\sum_{j=1}^{k} e^{a_j(x)}} \tag{2} $$

where \( a_r(x) \) is the input to output unit r, k is the number of classes, \( 0 \le y_r \le 1 \) and \( \sum_{j=1}^{k} y_j = 1 \). The weights of the fully connected layers are optimised during training using the stochastic gradient descent algorithm [2, 16], which updates the parameters in batches known as mini-batches. The cross entropy function, commonly called the log loss, measures the performance of a classifier whose output is a probability value between 0 and 1. The cross entropy function is therefore used in the classification layer, where each input image is assigned to a specific class.
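Eq. (2) and the cross entropy loss can be written out for the two-class case as a minimal sketch. The input values, the class ordering (index 0 for HC, index 1 for PD) and the function names are made up for illustration; subtracting the maximum before exponentiating is a common numerical safeguard and does not change the result of Eq. (2).

```python
import math

def softmax(a):
    """Eq. (2): map the k inputs a_1..a_k to probabilities that sum to 1."""
    m = max(a)  # shift by the maximum for numerical stability
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Log loss for one image: negative log of the probability given to the true class."""
    return -math.log(probs[true_index])

# Hypothetical inputs to the two output neurons (index 0 = HC, index 1 = PD).
probs = softmax([1.2, -0.3])
loss_if_hc = cross_entropy(probs, 0)  # small, because HC received the higher probability
loss_if_pd = cross_entropy(probs, 1)  # larger, because PD received the lower probability
```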

2.5 Classification metrics

The performance of the classifier using the transfer learned pre-trained AlexNet CNN is assessed by measuring its accuracy, specificity and sensitivity. Specificity is the proportion of negative cases that are correctly identified as negative (true negatives). Sensitivity is the proportion of positive cases that are correctly identified as positive (true positives). Accuracy is the proportion of all predictions, true positives and true negatives together, that are correct.
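These three measures follow directly from the confusion matrix counts, as in the sketch below. The counts in the example are made up for illustration and are not the results of this study; the function name is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion matrix counts."""
    sensitivity = tp / (tp + fn)                # true positives among all positive cases
    specificity = tn / (tn + fp)                # true negatives among all negative cases
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions among all cases
    return accuracy, sensitivity, specificity

# Hypothetical counts: 100 positive and 100 negative test images.
accuracy, sensitivity, specificity = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```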

3 Experiments

3.1 Data preparation

The MR image dataset used in this work comprises the Healthy Control and Parkinson’s disease subjects selected from the PPMI database. Representative MR images of HC and PD subjects are shown in Fig. 2(a-b). These images are pre-processed using the Gaussian filter and the corresponding filtered images are shown in Fig. 2(c-d). MR images can illustrate the extent to which brain structures have degenerated. A significant grey matter intensity loss, with changes in the striatum region, is observed in PD subjects when compared with HC. Gaussian filtering reduces the noise and thereby improves the sensitivity of the analysis.

Fig. 2 Representative sets of (a) Healthy Control and (b) Parkinson’s disease MR images, and (c-d) their Gaussian filtered images, respectively

The AlexNet CNN architecture is used in this study to classify the HC and PD subjects. AlexNet has been pre-trained with colour images of size 227 × 227 pixels, which it processes through its layers from input to output. 80% of the image dataset is used for training and the remaining 20% is used for testing. The number of images taken from each subject and given to the deep learning model is on average 40 ± 5 slices, based on the selection criterion, as shown in Table 2. These images are passed to the subsequent convolution layers. The last three layers of the pre-trained AlexNet model are modified for transfer learning. The momentum parameter of the stochastic gradient descent solver is set to 0.9. The learning rate is initially set to 1 × 10⁻⁴.
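The role of the momentum and learning rate values can be seen in a minimal sketch of one common form of the stochastic gradient descent update with momentum. The exact update used by the training software is not given in the text, so this classical form is an assumption, and the toy loss and function name are hypothetical.

```python
def sgd_momentum_step(weights, grads, velocity, lr=1e-4, momentum=0.9):
    """One parameter update: the velocity accumulates past gradients, then moves the weights."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Toy problem: minimise f(w) = w^2 starting from w = 1.0 with zero initial velocity.
w, v = [1.0], [0.0]
for _ in range(2):
    grads = [2.0 * wi for wi in w]  # gradient of the toy loss
    w, v = sgd_momentum_step(w, grads, v)
```

With momentum 0.9, each step reuses 90% of the previous step, which smooths the noisy mini-batch gradients; the small learning rate keeps the pre-trained weights from changing abruptly during fine-tuning.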

Table 2 Number of images in each category

3.2 Experimental results

The network is trained on the MR images with the hyper-parameters stated in the previous section. The first convolution layer receives the raw data and convolves the images with its filters. Across the convolution layers, the extracted features can be visualised as a hierarchy: pixels, edges formed from the pixels, shapes formed from the edges and finally complex regions formed from the shapes, which serve as features for discriminating between the two classes. The first 64 feature maps of the fifth convolution layer are shown in Fig. 3. The variations in intensity levels and the complex regions learnt by the network are captured as features for the classification process. The fully connected layers then combine the features learned by the previous layers at a high level. The CNN model is interpreted by visualising its weights. The modified fully connected layer has outputs corresponding to the two classes, healthy control and Parkinson’s disease. The feature maps obtained from the modified last fully connected layer for each class, HC and PD, are shown in Fig. 4. The final fully connected layer is found to retrieve discriminative features that capture the structural variations between the two classes.

Fig. 3 First 64 features of the fifth convolution layer

Fig. 4 Last fully connected layer corresponding to each class as HC and PD

The network is trained for 30 epochs, where each epoch is one pass through the entire dataset. Training and test performance are evaluated once every thirty iterations in order to reduce the computational load on the network and the time taken for the learning process. The accuracy and loss incurred during the training and test phases are monitored to estimate the effectiveness of the network for classification.

The learning process is shown as an accuracy plot in Fig. 5. The accuracy converges and remains stable from 2500 iterations onwards in both the training and testing phases. The loss incurred during the training and testing cycles is given in Fig. 6. The loss, or error rate, in classifying the HC and PD images saturates and remains constant from 2500 iterations onwards, in line with the accuracy curve. The proposed transfer learned AlexNet architecture achieves an accuracy of 88.90%, with sensitivity and specificity values of 89.30% and 88.40%, respectively. Without complicated image feature extraction and selection, the proposed model is able to discriminate between HC and PD subjects with good accuracy, supporting objective classification of MR images.

Fig. 5 Accuracy plot for training and test set during the learning process

Fig. 6 Loss incurred for training and test set during the learning process

The Receiver Operating Characteristic (ROC) curve is computed for both classes to evaluate the performance of the network. The area under the curve (AUC) calculated from the ROC curve given in Fig. 7 is 0.9618. The ROC curve thus indicates that the trained model achieves an acceptable classification performance for each class. Additionally, fine tuning the later layers of the pre-trained AlexNet model makes it possible to transfer knowledge from natural images to MR images.
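For reference, the AUC can be computed without plotting the curve, using its equivalent interpretation as the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one. The sketch below uses made-up scores and labels, not the outputs of this study, and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical softmax scores for the positive class and the true labels (1 = positive).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
auc = roc_auc(scores, labels)  # 5 of the 6 positive-negative pairs are ordered correctly
```

An AUC of 1.0 corresponds to perfect separation of the two classes and 0.5 to chance-level ranking.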

Fig. 7 ROC curve of the transfer learned pre-trained AlexNet architecture

Several studies have investigated machine learning strategies for the diagnosis of PD using MRI data. The effectiveness of the proposed method is compared with other state-of-the-art studies utilising MR images from the PPMI database, as shown in Table 3. Amoroso et al. [3] developed a diagnostic support system using Random Forest and SVM, in which complex network measures derived from brain patches were used as PD measures. Cigdem et al. [6] assessed the feasibility of principal component analysis (PCA) and probability distribution function (PDF) with SVM for the classification of PD. Although all these studies on the PPMI dataset have achieved comparable accuracy, they investigated PD patterns based on selected regions of interest. Hence these approaches could suffer from loss of information. These methods are also intrinsically biased by segmentation errors, which would affect their discrimination power. The proposed CNN model, based on whole brain analysis, exploits the spatial structure without relying on hand-crafted features and is hence considered to be robust.

Table 3 Comparison of performance of the classifiers (units in %)

Previous studies have reported comparable performance levels on different Parkinson’s disease datasets, utilising various dimensionality reduction techniques [30] and machine learning algorithms [19, 23]. Similarly, assessment of PD has been carried out on MR and SPECT images using different configurations of CNN [33]. However, the preliminary results reported for this method show an accuracy of 74%.

4 Conclusion

The assessment of pathophysiological changes using neuroimaging is essential for the diagnosis of Parkinson’s disease. MRI has received growing attention in recent years in the study of degenerative diseases. The progression of Parkinson’s disease can be assessed from structural changes in the MR images. Many algorithms, from machine learning to deep learning, are used in the field of image analysis to identify these pathological changes and discriminate between HC and PD. In this work, an attempt has been made to classify the MR images of HC and PD subjects by applying a deep learning architecture. The images used for classification are taken from the PPMI public domain database. The MR images are pre-processed by normalisation, and a Gaussian filter is applied to the normalised images. The AlexNet convolutional neural network model is used for classification. The weights of the pre-trained model are utilised and the last fully connected layer is fine tuned with suitable hyper-parameters to classify the HC and PD subjects. The model is trained to learn low level to high level features and the classification results are validated. An accuracy of 88.90% is achieved for classifying the HC and PD subjects. An AUC value of 0.9618 is obtained from the ROC curve, which shows the good discriminative ability of the proposed deep learning model. The proposed methodology can be extended by deeper fine tuning of the AlexNet model to obtain improved performance levels. Thus, with the rapid growth in deep learning architectures, an objective diagnosis of Parkinson’s disease will no longer be a laborious job for clinicians in the near future.