1 Introduction

Parkinson’s disease (PD) affects more than 6 million people worldwide [15] and is the second most common neurodegenerative disorder after Alzheimer’s Disease (AD). PD is a severe, progressive and chronic disease of unknown origin, driven mainly by the progressive loss of dopaminergic neurons of the nigrostriatal pathway [11]. The loss of these neurons leads to a reduction in the neurotransmitter dopamine, which is essential in the control of movement. PD currently has no cure, and by the time symptoms appear neuronal loss is already extensive, so treatments that might slow or stop progression are largely ineffective. The greatest challenge in this disorder is therefore early diagnosis, ideally before the onset of symptoms.

At present, the diagnosis of PD is typically clinical and based on a subjective assessment of the patient’s symptoms. Two main factors complicate it. First, many movement disorders of different etiology present similarly and are collectively referred to as parkinsonism. Second, the severity of the symptoms changes over time. For these reasons, approximately 25% of PD diagnoses are incorrect when compared to autopsy findings [10].

Diagnosis has improved over time with the increasing use of biomarkers such as tau protein, present in cerebrospinal fluid (CSF). Many studies also aim to diagnose PD from movement or tremor recorded by wearable sensors [5, 7] or from smartphone data [16], and others relate PD to speech faculties [12].

Neuroimaging modalities enable early diagnosis of PD by providing noninvasive biomarkers. Among them, Single Photon Emission Computed Tomography (SPECT) uses highly specific radiopharmaceuticals (e.g., DaTSCAN) that bind to dopamine transporters in the striatum, making it possible to observe dopaminergic deficits in the brain. However, these images provide high-dimensional features that are better exploited computationally than by simple visual inspection. As a consequence, Machine Learning (ML) has thrived in recent years, mainly for differential diagnosis in Computer Aided Diagnosis (CAD) systems [6, 11, 17]. Fewer studies, however, try to make sense of PD progression using these images together with other variables.

This study aims to develop an algorithm that can model the progression of PD, offering support for its early diagnosis. The proposed system is based on a non-linear decomposition of the SPECT images using unsupervised machine learning methods and the modelling of the composite variables via support vector machines (SVMs). The system is tested on two tasks, both using SVMs: differential diagnosis (classification) and disease progression analysis (regression) on a longitudinal dataset. It is evaluated by means of stratified k-fold cross-validation with several performance metrics in order to assess the validity of the decomposition techniques for studying and diagnosing PD.

Fig. 1. Flowchart of the study.

2 Methodology

2.1 Dataset Description and Image Preprocessing

Data used in the preparation of this article were obtained from the Parkinson’s Progression Markers Initiative (PPMI) database (https://www.ppmi-info.org/accessdata-specimens/download-data). For up-to-date information on the study, visit www.ppmi-info.org.

The cohort used for this study is composed of subjects initially diagnosed either as controls (CTL), with no evidence of neurodegenerative deficits, or as Parkinson’s Disease (PD) patients with different levels of severity. The CTL group consists of 101 males and 53 females, with 2 of them showing mild symptomatology (HY_on=1). The PD group consists of 284 males and 159 females. Table 1 shows the demographic analysis of the dataset. PD subjects and some subjects of the CTL group were clinically followed for up to 5 years, providing data for 1399 sessions, which are evaluated later to study the progression of imaging biomarkers and their relationship to PD-specific progression indicators.

Table 1. Demographics of the PPMI subjects included in the study.

DaTSCAN images were preprocessed using affine registration to the MNI space as in [8]. Intensity normalization used a thresholding technique based on an alpha-stable model of the image intensity distribution, following the algorithm proposed in [2]. Additionally, three categorical variables are used for classification: approximate diagnosis at each visit (APPRDX), diagnosis at the first visit (PRIMDIAG), and the Hoehn and Yahr (HY) scale for PD, which ranges from 0 (no symptoms) to 3 (severe disability). APPRDX distinguishes between healthy subjects, PD patients and SWEDD patients, whereas PRIMDIAG distinguishes only between PD and CTL. The Unified Parkinson’s Disease Rating Scale (UPDRS), which measures the degree of symptomatology, is also used as a continuous variable in regression.

2.2 Manifold Learning

Manifold Learning builds on the assumption that real, high-dimensional data such as images lie on a lower-dimensional nonlinear manifold. The algorithms that approximate this manifold and project real data onto it are known as Manifold Learning algorithms [8].

This work follows the outline in Fig. 1, in which feature extraction is performed via different dimensionality reduction techniques. After the image preprocessing described in Sect. 2.1, we apply one of two algorithms: Principal Component Analysis (PCA) or Isomap.

Principal Component Analysis (PCA) is a linear dimensionality reduction technique that identifies new linear subspaces. The principal components (PCs) are the spatial directions that maximize the variance of the data while being orthogonal to each other; the projections onto them are therefore uncorrelated variables [9].
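As an illustration of the properties just described, the following minimal sketch computes a PCA projection with NumPy via the singular value decomposition. It is not the implementation used in this study: the function name `pca_project` and the random stand-in data `X_demo` are assumptions made for the example, with each row playing the role of one flattened, preprocessed image.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X (samples x features) onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre every feature
    # Thin SVD: the rows of Vt are orthonormal directions ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T  # coordinates of each sample in the PC subspace
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance

# Hypothetical stand-in for 40 flattened images of 500 voxels each.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 500))
scores, components, explained_variance = pca_project(X_demo, n_components=2)
```

The returned `components` are mutually orthogonal and the columns of `scores` are uncorrelated, which is the sense in which the PCs define uncorrelated variables.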

In contrast to the linear nature of PCA, we have also applied isometric feature mapping, or Isomap, an algorithm that performs nonlinear modelling of a manifold by extending metric multidimensional scaling with geodesic distances. The aim is to embed data that lie in a high-dimensional space into a lower-dimensional Euclidean space while preserving the geodesic distances between data points [3].

We used the algorithm for computing Isomap as defined in [14], which is composed of three steps:

  1. First, K-nearest neighbors are applied to construct a neighborhood graph.

  2. Second, geodesic distances are estimated by computing the shortest path between all pairs of points in the graph, usually with the Dijkstra or Floyd-Warshall algorithms.

  3. Finally, a d-dimensional Euclidean embedding is constructed by a partial eigenvalue decomposition (i.e., taking the d largest eigenvalues of the kernel).
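The three steps above can be sketched in a few lines of NumPy. This is a didactic sketch under stated assumptions, not the implementation of [14]: the function name `isomap_embed`, the symmetric neighbourhood graph, the choice of Floyd-Warshall for step 2, and the toy semicircle data are all illustrative, and the dense pairwise computation is only practical for small sample sizes.

```python
import numpy as np

def isomap_embed(X, n_neighbors=5, n_components=2):
    """Toy Isomap: k-NN graph, all-pairs shortest paths, classical MDS."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Dense pairwise Euclidean distances (memory grows as n * n * features).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Step 1: symmetric k-nearest-neighbour graph; np.inf marks "no edge".
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    for i in range(n):
        graph[i, nearest[i]] = dist[i, nearest[i]]
        graph[nearest[i], i] = dist[i, nearest[i]]

    # Step 2: Floyd-Warshall shortest paths approximate the geodesic distances.
    geo = graph.copy()
    for k in range(n):
        geo = np.minimum(geo, geo[:, [k]] + geo[[k], :])
    if np.isinf(geo).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")

    # Step 3: classical MDS on the double-centred squared geodesic distances,
    # keeping the eigenvectors of the d largest eigenvalues.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (geo ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Toy data: 30 points on a semicircle, i.e. a 1-D manifold embedded in 2-D.
t = np.linspace(0.0, np.pi, 30)
arc = np.column_stack([np.cos(t), np.sin(t)])
embedding = isomap_embed(arc, n_neighbors=2, n_components=1)
```

On the semicircle, the one-dimensional embedding orders the points along the arc and spans roughly the arc length rather than the straight-line distance between its endpoints, which is what preserving geodesic distance means in practice.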

2.3 Classification and Regression Experiments

The last step in our analysis is the application of machine learning models for predicting scores. We used an algorithm that has proven robust in many applications: Support Vector Machines (SVM), which have been widely used in Alzheimer’s [13] and cancer [1] studies. We used the LIBLINEAR implementation of SVM classifiers (SVC) and SVM regression (SVR) [4].

Linear SVC and SVR are closely related, as both predict a target variable Y from a set of data X, which in our case are the coordinates of the projections of the images onto the manifold. Both fit a linear hyperplane to the data: in classification the hyperplane (ideally) separates the classes, whereas in SVR it acts as a predictor of a continuous variable.
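The relationship between the two models can be made concrete by writing out the L2-regularized linear objectives documented for LIBLINEAR. The L1-loss variants are shown here for illustration only, since the text does not state which solver options were used. The symbols are assumptions of this sketch: \(w\) is the weight vector, \(x_i\) the manifold coordinates of image \(i\), \(y_i\) its target, \(C\) the regularization trade-off and \(\varepsilon\) the width of the insensitive tube.

\[
\text{SVC:}\quad \min_{w}\ \tfrac{1}{2}\, w^{\top} w + C \sum_{i} \max\bigl(0,\ 1 - y_i\, w^{\top} x_i\bigr), \qquad y_i \in \{-1, +1\}
\]

\[
\text{SVR:}\quad \min_{w}\ \tfrac{1}{2}\, w^{\top} w + C \sum_{i} \max\bigl(0,\ \lvert w^{\top} x_i - y_i \rvert - \varepsilon\bigr), \qquad y_i \in \mathbb{R}
\]

Both share the same regularizer and the same linear function \(w^{\top} x\); they differ only in the loss, which penalizes margin violations in classification and deviations larger than \(\varepsilon\) in regression.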

5-fold stratified cross-validation (CV) is used to obtain performance measures both for regression and classification. For the classification approach, the average and standard deviation of accuracy, sensitivity, specificity and balanced accuracy are provided, along with the ROC curve in each CV loop. In the case of regression, Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE) and \(R^2\) are used as performance measures.
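The evaluation measures listed above can be sketched with NumPy as follows. The helper names (`stratified_kfold_indices`, `classification_metrics`, `regression_metrics`) are hypothetical, and the fold assignment shown here stratifies on class labels only; how stratification was applied to the continuous regression target is not specified in the text.

```python
import numpy as np

def stratified_kfold_indices(y, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose class proportions mirror those of y."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        members = rng.permutation(np.flatnonzero(y == label))
        fold_of[members] = np.arange(len(members)) % n_splits  # deal out round-robin
    for fold in range(n_splits):
        yield np.flatnonzero(fold_of != fold), np.flatnonzero(fold_of == fold)

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and balanced accuracy for a binary task."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and the coefficient of determination R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "mae": np.mean(np.abs(err)),
        "mse": mse,
        "rmse": np.sqrt(mse),
        "r2": 1.0 - ss_res / ss_tot,
    }
```

In a full experiment, a model would be fitted on each training split and these functions applied to its predictions on the corresponding test split, with the mean and standard deviation taken over the five folds.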

Fig. 2. DaTSCAN central slice placed in the corresponding coordinates of the first 2 dimensions of PCA (left) and Isomap (right).

3 Results and Discussion

First of all, we observe the differences between the linear decomposition with PCA and the non-linear decomposition with Isomap in Fig. 2, where the first two dimensions are shown for each methodology. Both approaches show similar trends in the first dimension (dimension 0), spanning from negative to positive values that are related to the intensity of the striatal region. However, Isomap yields a more uniformly distributed coordinate space, thanks to its nonlinear modelling, whereas PCA is more influenced by extreme values and outliers. Dimension 1, however, differs. In Isomap it is related to the intensity of the tails of the striatal region (putamen), whereas in PCA it seems to measure roughly the asymmetry of the image.

With the aim of predicting the symptomatology of a patient just from image composite values, we trained an SVR with the projections of both Isomap and PCA. The performance of the SVR in predicting the UPDRS variable is shown in Table 2. There, Isomap proves to be a more accurate decomposition than PCA, with no large differences between the two- and three-component decompositions, and a higher \(R^2\) (above 0.2) than PCA.

Table 2. Performance of the SVR in predicting UPDRS from the 2 and 3-component PCA and Isomap projections.

The resulting SVR models for Isomap and PCA are shown in Fig. 3. There, the black lines represent a perfect linear reconstruction. We can observe that the predictions using the Isomap decomposition are more linear with respect to the real UPDRS score than those of PCA. The error is mainly due to larger UPDRS values, which correspond to more severe symptomatology.

Fig. 3. Comparison of the predictions obtained by a SVR using the PCA and the Isomap decomposition with two components.

The composite imaging features obtained from PCA and Isomap are also used to predict three different targets: PRIMDIAG, APPRDX, and HY scale (see Sect. 2.1 for a description of these variables). To do so, an SVC is trained, and Accuracy, Sensitivity, Specificity, and Balanced Accuracy (the average of Sensitivity and Specificity) are reported in Table 3.

Table 3. Performance of the SVC when using the two- and three-dimensional projections of Isomap and PCA.

It can be observed that the best performing decomposition is again Isomap with three dimensions. In this case, it is able to approximate the diagnosis of each subject and session with high accuracy. It is also capable of predicting primary diagnosis with high sensitivity (>0.98). Symptomatology as measured by HY was predicted with a multiclass one-vs-all SVC. The results were good for classes 0 and 2, but not for classes 1 and 3. This is relevant for interpreting the result, as class 0 are controls and class 2 has severe symptoms. Class 3 subjects were interpreted as class 2, probably because they show a similar dopaminergic deficit, in contrast to controls. Class 1 is so close to class 0 that some subjects with a primary diagnosis of CTL have class 1 assigned. However, it is very likely that some of these subjects are incorrectly interpreted as controls and others as class 2 because they show dopaminergic deficit.

ROC curves for the differential binary diagnosis are shown in Fig. 4. Individual curves for each CV fold are shown, and the mean ROC curve is shown in blue. The AUC of our methodology is very high, supporting a good differential diagnosis of PD based solely on imaging markers.

Fig. 4. ROC curves obtained with SVC applied to the three-dimensional Isomap output for the approximate diagnosis, HY scale and primary diagnosis.

4 Conclusions

Due to the critical importance and the difficulty of diagnosing Parkinson’s disease, it is necessary to find reliable and accurate methods of early diagnosis. The aim of this work is to explore the usefulness of nonlinear manifold learning methodologies to construct a new space in which imaging features relate to the symptomatology of PD, allowing for the creation of longitudinal PD progression models. The proposed methodology, using Isomap as a manifold learning method and Support Vector Machines (SVM) as machine learning models for classification and regression, yielded high performance in regression, binary classification, and multi-class classification, as in the case of the HY score.

Each of the dimensions of the nonlinear subspace found by Isomap can be related to relevant changes in the brain, such as the concentration of dopamine transporters (DaT) in the striatum or the DaT concentration in the putamen (as is the case for dimension 1). This builds a machine learning model that, unlike many in the literature, is fully interpretable by healthcare professionals, paving the way for a more informed diagnosis.

The latent space of Isomap and other manifold learning algorithms in DaTSCAN images is yet to be fully explored. Other decomposition methods based on self-supervised neural networks could be of help here, along with more complex regression models that account for differences in distribution and hierarchical models with covariates, paving the way for new interpretable Computer Aided Diagnosis systems to understand the diagnosis and progression of PD.