Introduction

The clinical diagnosis of idiopathic Parkinson’s disease (IPD) is often inaccurate: 25% of patients classified as having Parkinson’s turn out to have an atypical parkinsonian syndrome (APS) [1]. Multiple system atrophy (MSA) and progressive supranuclear palsy (PSP), which together account for over 90% of APS cases, can be difficult to distinguish from IPD at early disease stages [2,3,4,5]. Indeed, in a study comparing neuropathological diagnosis with the initial clinical diagnosis made in untreated or ambiguously responsive parkinsonian subjects, the positive predictive value (PPV) of a PD diagnosis by a movement disorder specialist was only 26%; in early-stage patients who responded to therapy, the PPV was only 53% [6].

Metabolic imaging can improve the diagnostic accuracy in distinguishing PD from APS using observer-dependent and observer-independent methods [7,8,9,10,11]. We developed an automated, image-based algorithm that classifies patients based on the expression of disease-related metabolic networks at the individual subject level [12, 13]. Specifically, we used 18F-fluorodeoxyglucose (FDG) PET to differentiate between IPD, MSA, and PSP based on their respective pattern expression values [12]. Several subsequent studies showed that the resulting classifications agree closely with the clinical diagnosis made 3 to 4 years after imaging by movement disorder experts who were blind to the results of the automated analysis. The PPV of an IPD diagnosis ranged from 92 to 98% [12, 14,15,16].

Here, we asked how the accuracy of automated imaging classification compared to the neuropathological gold standard, and how it compared to the clinical impression at the time of referral for imaging.

Methods

Study design

We evaluated cases of uncertain diagnosis of parkinsonism that had undergone both FDG PET imaging and eventual neuropathological assessment (Fig. 1). All patients were referred to the Center for Parkinson’s Disease and Other Movement Disorders at Columbia University Irving Medical Center (CUIMC) in New York by an outpatient physician/neurologist to seek the opinion of a center specializing in movement disorders. At CUIMC, all patients underwent detailed clinical assessments by a movement disorder specialist (S.F., S.A.O.). Only patients with an uncertain diagnosis were referred to The Feinstein Institutes for Medical Research (Manhasset, NY) for FDG PET imaging [12].

Fig. 1

Study design. Patients with parkinsonism and uncertain clinical diagnosis were referred by a movement disorder specialist for brain imaging with FDG PET an average of 4 years from symptom onset. The first step (level 1) determines whether the diagnosis is likely IPD or an atypical parkinsonian syndrome (APS); if the algorithm predicts an APS, level 2 evaluates the diagnosis of progressive supranuclear palsy and multiple system atrophy. At either level, the pattern expression may be insufficiently developed for the algorithm to render a diagnosis, leading to an indeterminate (IND) diagnosis. The probability of each disease was calculated for every case (see Methods and Table 1). Each patient was followed clinically by a specialist who was blind to the results of the imaging algorithm. For every case, the referral diagnosis, the algorithm’s diagnosis, and the final clinical diagnosis were compared to the neuropathology, determined an average of 2 years after the final clinical diagnosis. (18F-FDG PET, fluorodeoxyglucose positron emission tomography; PSPRP, progressive supranuclear palsy-related pattern; MSARP, multiple system atrophy-related pattern; PDRP, Parkinson’s disease-related pattern)

The inclusion criteria for this study were as follows: (1) uncertain clinical diagnosis, i.e., a differential of two or more possible conditions or a non-specific diagnosis of parkinsonism between January 1, 1996 and January 1, 2016, and (2) no evidence of secondary cause for parkinsonism based on structural imaging. At the time of the referral to FDG PET, the movement disorder specialist recorded a referral diagnosis representing the clinical impression, i.e., the most likely of the various diagnostic possibilities. After imaging, scans were classified according to an automated image-based algorithm [12] performed by an analyst (C.C.T.), blind to patient identity, clinical diagnosis, and autopsy findings. The algorithm provided an image-based diagnosis of IPD, MSA, PSP, or indeterminate (IND) based on the pattern expression values (subject scores) computed for each patient. A final clinical diagnosis was subsequently made following at least two follow-up visits by a movement disorder specialist (S.F.), who was blind to the results of the imaging algorithm. The final clinical diagnosis of each patient was made in agreement with consensus criteria for IPD [17], MSA [18], and PSP [19]. In each case, the autopsy diagnosis was determined by a neuropathologist (J.-P.V.) at the New York Brain Bank (NYBB), who likewise was blind to the image-based diagnosis.

Study sample

Two hundred six parkinsonian patients were referred for FDG PET because of an initially uncertain diagnosis and followed by movement disorder specialists at CUIMC to receive a final clinical diagnosis. Of these, 20 (9.7%) had postmortem examination at NYBB. Because the algorithm was validated only for the differential diagnosis of IPD, MSA, and PSP [14, 15], we included only patients confirmed at autopsy to have one of these disorders. Thus, four patients with other pathological diagnoses (three with corticobasal syndrome (CBS); one with an unclassified tauopathy) were excluded. Additionally, one patient was excluded because structural MRI at the time of PET disclosed severe atrophy, which can confound the computation of subject scores and invalidate the algorithm [14]. Thus, the image-based classification was compared to the neuropathological diagnosis in 15 patients with parkinsonism. Descriptive information from a subset of these patients has been reported previously, as highlighted in Supplementary Table 1 [12].

At the end of the study, the imaging classifications and the clinical and autopsy findings were separately submitted to an independent evaluator (D.K.G.) at a third site. He opened the blind and merged the results into a final database that was shared with the other sites and used for statistical analysis.

FDG PET

All subjects were scanned using the General Electric Advance tomograph (Milwaukee, WI) at the Feinstein Institutes/Northwell Health as described previously [20]. In brief, all subjects fasted overnight, and antiparkinsonian medication was withheld at least 12 h before the scan. Scans from each subject were realigned and spatially normalized to a standard Montreal Neurological Institute (MNI)-based PET brain template and smoothed with an isotropic Gaussian kernel (10 mm) in all directions to improve the signal-to-noise ratio. Image processing was performed using Statistical Parametric Mapping (SPM5) software (Wellcome Centre for Human Neuroimaging, London, UK).

Automated pattern-based classification

Expression values (subject scores) for each of the three previously validated disease-related metabolic covariance patterns (PDRP, MSARP, and PSPRP) were computed on a prospective individual-scan basis [21] using existing software (ScAnVp, freely available at http://www.feinsteinneuroscience.org/). Subject scores for each disease pattern were standardized (z-scored) with respect to corresponding values from healthy volunteer subjects used in the original study [12].
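The standardization step can be illustrated with a minimal sketch. It assumes that z-scoring means subtracting the healthy-control mean and dividing by the healthy-control standard deviation; the function name, the use of the sample standard deviation, and the example values are hypothetical and are not taken from ScAnVp or from the original study.

```python
from statistics import mean, stdev


def standardize_scores(patient_scores, control_scores):
    """Z-score raw subject scores against a healthy-control reference group.

    Assumption: the reference mean and (sample) standard deviation come
    from the healthy volunteers' raw scores for the same pattern.
    """
    mu = mean(control_scores)
    sigma = stdev(control_scores)
    if sigma == 0:
        raise ValueError("control scores have zero variance")
    return [(score - mu) / sigma for score in patient_scores]


# Hypothetical raw scores for one pattern (illustrative values only).
controls = [1.0, 2.0, 3.0]
patients = [2.0, 4.0]
z_scores = standardize_scores(patients, controls)
```

Under this convention, a standardized score of 0 corresponds to the healthy-control mean, and a score of 2 lies two control standard deviations above it.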

Using logistic regression analysis, we computed the probability of each disease category for each patient from that individual’s subject scores (see Fig. 1). Based on these probabilities, we classified each of the subjects according to a two-level procedure as described previously [12]. At level 1, each subject was classified as IPD or APS by comparing the subject’s probabilities to the cutoff probabilities for IPD (0.81) and APS (0.79) determined in the original study. Patients who had a higher probability than the cutoff value for IPD were classified as IPD, whereas those with a higher probability than the cutoff value for APS were classified as APS. At level 2, subjects classified at level 1 as APS were further subclassified as MSA or PSP using the previously reported cutoff probabilities for MSA (0.74) and PSP (0.55) [12]. Subjects with probabilities lower than the cutoff values were classified as “indeterminate” (IND) at each level. The pattern-based classification algorithm employed in this analysis was identical to that used in the original cohort [12] and in the subsequent validation studies [14,15,16].
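The two-level decision rule can be sketched as follows. The cutoff values are those quoted above; everything else is an assumption made for illustration: the function name, the order in which the comparisons are made, and the treatment of a probability exactly equal to a cutoff as indeterminate. The sketch takes the disease probabilities as inputs and does not reproduce the logistic regression that generates them.

```python
# Cutoff probabilities quoted in the text (from the original study [12]).
IPD_CUTOFF = 0.81
APS_CUTOFF = 0.79
MSA_CUTOFF = 0.74
PSP_CUTOFF = 0.55


def classify_scan(p_ipd, p_aps, p_msa, p_psp):
    """Return 'IPD', 'MSA', 'PSP', or 'IND' for one scan.

    p_ipd and p_aps are the level 1 probabilities; p_msa and p_psp are the
    level 2 probabilities, consulted only when level 1 yields APS.
    Assumption: a probability equal to its cutoff counts as indeterminate.
    """
    # Level 1: IPD versus APS.
    if p_ipd > IPD_CUTOFF:
        return "IPD"
    if p_aps <= APS_CUTOFF:
        return "IND"  # neither level 1 cutoff exceeded

    # Level 2: subclassify APS as MSA or PSP.
    if p_msa > MSA_CUTOFF:
        return "MSA"
    if p_psp > PSP_CUTOFF:
        return "PSP"
    return "IND"  # APS at level 1, but neither level 2 cutoff exceeded


# An IPD probability of 0.839 exceeds the 0.81 cutoff; the other arguments
# are hypothetical placeholders.
label = classify_scan(p_ipd=0.839, p_aps=0.161, p_msa=0.5, p_psp=0.5)
```

Because an indeterminate result can arise at either level, a scan can be labeled IND even after it has been assigned to APS at level 1.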

Neuropathology

All patients underwent postmortem examination and received a histopathological diagnosis by a neuropathologist (J.-P.V.) according to established criteria [19, 22,23,24]. Half-brains were placed in formalin and cut; tissue was processed according to published protocols [25]. Briefly, 7-μm-thick sections were stained with Luxol fast blue counterstained with hematoxylin and eosin for general survey. Selected sections were stained using the Bielschowsky method to evaluate axons, neuritic plaques, and neurofibrillary and glial tangles; AT8 antibodies against hyperphosphorylated tau; and antibodies against β-amyloid, α-synuclein, ubiquitin, and TDP43.

Statistical analysis

For statistical analysis, we calculated the proportions of correctly classified cases for the imaging diagnosis, the referral diagnosis, and the final clinical diagnosis each in comparison with the pathological diagnosis. The binomial test was used to test the hypothesis that the proportion of correctly classified cases was different from a chance guess, i.e., a probability of 0.50, and was considered significant for p < 0.05, two-tailed. All statistical analyses were performed in SAS Studio.
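As an illustration of the test against chance, the sketch below uses the normal approximation to the binomial test with a null proportion of 0.50. This is an assumption about the computation, not a description of the SAS procedure that was run; the function name is hypothetical. The counts in the usage lines follow from the percentages given in the Results (for example, 80.0% of 15 cases is 12 correct).

```python
import math


def binomial_z_test(correct, total, p0=0.5):
    """Two-tailed z-test of an observed proportion against p0.

    Uses the normal approximation to the binomial distribution, with the
    standard error computed under the null hypothesis.
    """
    p_hat = correct / total
    standard_error = math.sqrt(p0 * (1 - p0) / total)
    z = (p_hat - p0) / standard_error
    # Two-tailed p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Level 1 imaging classification: 12 of 15 cases correct.
z_level1, p_level1 = binomial_z_test(12, 15)
# Referral diagnosis: 10 of 15 cases correct.
z_referral, p_referral = binomial_z_test(10, 15)
```

With these inputs the sketch gives Z = 2.32 for the level 1 classification and Z = 1.29 for the referral diagnosis. With only 15 cases, an exact binomial test may give somewhat different p-values from the approximation.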

Results

Sample characteristics

The 15 patients in the study had a postmortem diagnosis of IPD (n = 4), MSA (n = 6), or PSP (n = 5) (Supplementary Table 1). Thus, the majority of cases (73.3%) had an atypical form of parkinsonism at autopsy. This is not unexpected, given that autopsies are seldom performed except in the most challenging cases. The mean age of the patients at the time of imaging was 66.3 (range 44–82) years, and the mean symptom duration was 4.1 (range 2–9) years at the time of PET (Supplementary Table 2). The mean time interval between PET and autopsy was 3.5 (range 0.4–5.8) years. The final clinical diagnosis was reached a mean of 1.9 (range 0.3–4.7) years after imaging. The mean time from final clinical diagnosis to autopsy was 1.7 (range 0.1–5.0) years.

Clinical referral diagnosis

At the time of the referral for PET, nine of the 15 cases were thought likely to have IPD based on the leading diagnostic impression, whereas the other six were thought to have either PSP or MSA as the most likely possibility. Relative to postmortem, the referral diagnoses were 66.7% correct, which did not differ significantly from chance (Z = 1.29, p = 0.19; Table 1). All five initial misdiagnoses were APS patients thought incorrectly to have IPD. However, of the six cases with an initial clinical impression of MSA or PSP, all diagnoses were subsequently confirmed at autopsy.

Table 1 Concordance of clinical and imaging diagnosis with postmortem

Automated image-based diagnosis

The automated algorithm produced one of three possible results: a correct diagnosis (with reference to autopsy), a misdiagnosis, or an “indeterminate” classification reflecting classification probabilities beneath the prespecified cut points. For comparison of diagnostic accuracy in this study, we considered indeterminate (IND) as an incorrect diagnosis.

Based on the neuropathological diagnosis, the automated algorithm correctly classified 80.0% of the cases at level 1 (IPD vs. APS), which was greater than chance (Z = 2.32, p = 0.02; Table 1). Of the 8 cases classified as APS at level 1, the algorithm correctly classified 87.5% as MSA or PSP at level 2, which was also greater than chance (Z = 2.12, p = 0.03; Table 1). Two representative cases are shown in Fig. 2.

Fig. 2

Cases of autopsy-confirmed Parkinson’s disease (PD) and progressive supranuclear palsy (PSP). (a) Patient 2 (57-year-old male) had an uncertain clinical diagnosis of idiopathic Parkinson’s disease (IPD) at the time of PET referral, roughly 9 years after symptom onset. The automated imaging algorithm classified the patient as IPD (99% likelihood). One year after imaging, a final clinical diagnosis of IPD was reached, which was confirmed on autopsy 4 months later. The neuropathological examination demonstrated Lewy body-containing neurons and severe neuronal loss in the pars compacta of the substantia nigra (LHE, 200×; left). Lewy body-containing neurons were labeled with α-synuclein (400×; right). LHE, Luxol fast blue hematoxylin and eosin. (b) At the time of PET referral, patient 12 (69-year-old male, symptom duration 2 years) had an uncertain clinical diagnosis with PSP as the leading possibility. The automated imaging algorithm classified the patient as PSP (83.3% likelihood). Six months after imaging, a final clinical diagnosis of PSP was made, which was confirmed on autopsy 10 months later. The histopathological examination showed neuronal loss in the globus pallidus, substantia nigra, red nucleus, subthalamic nucleus, pons, medulla oblongata, and cerebellum. The cerebellar cortex displayed loss of Purkinje cells and presence of torpedoes (Bielschowsky, 400×; left). AT8-labeled cells including tufted astrocytes and glial cytoplasmic inclusions were found in the paracentral cortex (630×; right), superior parietal lobe, and prefrontal cortex

The automated algorithm misclassified two patients with APS as IPD. In both instances, IPD was also the leading referral diagnosis, but case 9 was eventually found to have MSA at autopsy and case 15 to have PSP. The algorithm classified two cases as IND (cases 10 and 13). At the time of PET referral, the leading clinical impression of case 10 was IPD, but the final clinical diagnosis was MSA 4 years later, which was consistent with autopsy 4 months later. Because neuronal loss and gliosis of the putamen and globus pallidus were heavily lateralized to the left hemisphere, the algorithm was unable to determine whether this metabolic asymmetry was due to a reduction on the left or relative increase on the right and so rendered a classification of IND at level 1.

Final clinical diagnosis

Based on the neuropathology, the final clinical diagnosis was correct in 14/15 (93.3%) cases, which was significantly greater than chance (Z = 3.36, p = 0.0008; Table 1). A single case with MSA (mixed type) at autopsy (Fig. 3) was incorrectly classified by the algorithm as IPD and by the expert clinician as CBS.

Fig. 3

A challenging case of pathology-proven multiple system atrophy (MSA). Patient 9 (64-year-old female, symptom duration 2 years) was referred for FDG PET because of an uncertain clinical diagnosis with a suspicion of IPD or CBS. The automated imaging algorithm classified the patient as IPD (83.9% likelihood). After 6 months of additional clinical follow-up, the clinical diagnosis was revised to CBS. Autopsy performed 4 years later revealed changes consistent with MSA: severe neuronal loss and reactive gliosis in the putamen (LHE, 200×; left), pons, substantia nigra, and cerebellum. α-synuclein-labeled glial cytoplasmic inclusions were found in the frontal cortex (α-synuclein antibody, 400×; right), the striatum, the pons, amygdala, and medulla oblongata

Discussion

In this small sample of 15 diagnostically challenging cases of parkinsonism, the clinical referral diagnosis, rendered after a mean disease duration of 4.1 years, correctly predicted the autopsy result in 67% of cases (Fig. 4). Metabolic imaging in conjunction with the pattern-based algorithm was accurate in 80% (level 1) and 88% (level 2) of the cases. The final clinical diagnosis, reached after an average of two additional years of clinical follow-up, was even more accurate (93%). Although the sample size is a limitation of the current study, these statistics are congruent with previous studies showing close agreement between PET-based classification and the final clinical diagnosis made 3–4 years later, where the algorithm distinguished IPD from APS with 94% specificity and achieved 90% specificity for MSA and 94% for PSP [12, 14,15,16]. The data also accord with comprehensive studies showing a close relationship between the final diagnosis of the expert clinician and autopsy [1, 26]. APS clearly still presented a challenge to the expert clinician as evident in the Hughes et al. study [26], which did not incorporate brain imaging: the final clinical diagnosis, achieved after a mean disease duration of 5.3 years, was correct for 85.7% of the MSA cases and 80% of the PSP cases.

Fig. 4

Comparison of the concordance of clinical and PET diagnoses with autopsy. At the time of PET referral, in patients with uncertain clinical diagnosis (mean symptom duration 4 years), the referral diagnosis agreed with autopsy only 67% of the time. Automated classification based on contemporaneous PET imaging increased diagnostic accuracy to 80% for level 1 and 87.5% for level 2. The final clinical diagnosis, reached by the expert clinician after an average of 2 years of follow-up, was 93% concordant with autopsy findings obtained approximately 2 years later

The low likelihood of accurate clinical diagnosis in patients with disease duration <5 years is particularly concerning, and it has not improved appreciably over the past few decades [1, 6]. In the current study, the referral diagnosis, which was made by a movement disorder specialist an average of 4 years after symptom onset, was correct in only 67% of patients. Indeed, of nine cases initially thought to have IPD, five were found to have APS at postmortem examination. By contrast, all initial diagnostic impressions of APS were correct, despite the clinical uncertainty that prompted the request for imaging to begin with.

Two of the four discordant classifications of the imaging algorithm were attributed to indeterminate readouts, i.e., where the classification probability did not exceed the requisite cut point for diagnosis. In such cases, the algorithm provides a degree of doubt that is preferable to an incorrect classification [12]. That notwithstanding, only two of the misdiagnoses made by the algorithm reflected true errors. Case 9 was the most challenging (Fig. 3). This patient had MSA but was misclassified as IPD by both the initial clinical assessment and the algorithm; the final clinical diagnosis of CBS was also incorrect. It is possible that at the time of referral, the site of neurodegeneration in this patient was more nigral than striatal. This would accord with a more IPD-like clinical presentation, which includes levodopa responsiveness [3, 27]. With regard to the algorithm, all three expression scores were quite low, with absolute values ≤0.6. While the case was categorized as IPD, the probability was only 83.9%, marginally exceeding the cut point (81.0%) for this diagnosis. In all likelihood, the initial classification of IPD would not have been sustained over time as neurodegeneration progressed to involve the striatum. Overall, this case was the only low expression case (pattern expression scores of all three patterns ≤1.0) in this study. The PPV for discriminating IPD from atypical parkinsonism was 96–98% in previous studies [12, 14, 15]. In cases with low expression, the PPV was reduced to 80% for IPD [12], which should be considered when making the final diagnosis in clinical practice.

Case 15 had a primary diagnosis of PSP with coexisting Alzheimer’s disease (AD) pathology and was misclassified as IPD by the algorithm. The AD-related covariance pattern (ADRP) is characterized by relative metabolic decreases in the hippocampus, parahippocampal gyrus, and parietal and temporal association regions [28,29,30]. This topography includes a circumscribed area of modest overlap with the PDRP, which has minimal effect on PDRP expression in PD datasets [30]. Thus, the presence of incipient AD pathology with slight PDRP elevations would be unlikely to bias algorithmic classifications of patients as IPD, as seen in cases 3 and 4. The PSPRP, on the other hand, has little topographic overlap with PDRP. Indeed, in a recent study, PSP co-pathologies, the most common being AD, had minimal clinical impact on the disease [31]. Despite the topographic and clinical independence of the two pathologies, it is possible that a patient with PSP-AD dual pathology will exhibit metabolic decreases in parietal cortex due to AD rather than PSP. As noted above, a localized change of this sort can cause a spurious, albeit modest, increase in PDRP expression and lead to a misclassification as IPD, as seen in case 15.

Nonetheless, at the time of PET referral, the automated image-based classification was substantially more accurate than the contemporaneous clinical diagnosis. Specifically, three of the five patients who initially were thought (incorrectly) to have IPD were correctly classified as APS by the algorithm. As targeted proteinopathy-specific therapies are developed, confirming a clinical diagnosis of one or the other major forms of APS will become relevant [32].

Conclusion

This study confirms the advantages of clinical expertise combined with longitudinal patient observation in difficult-to-diagnose cases. Unfortunately, many patients will not have access to such expertise, and few are willing to wait several years for an accurate diagnosis and a realistic prognosis. In such circumstances, reliable biomarkers will help reduce diagnostic uncertainty early in the disease course. So far, the image-based algorithm has the highest specificity and predictive value of available network biomarkers in differentiating PD, MSA, and PSP [7, 12,13,14,15, 33]. The advantage of metabolic imaging is that it can provide a wealth of information regarding rates of disease progression and treatment responses [13, 33, 34]. Perhaps more importantly, accurate early diagnosis is critical for efficient clinical trial design. Many disease-modification trials in PD focus on recent onset, drug-naïve PD patients—the very group for whom early diagnosis is least accurate [6]. Using the algorithm to screen for potential APS patients “masquerading” as IPD may enhance the likelihood of detecting a therapeutic effect. This attribute is particularly relevant in phase 2 clinical trials in which sample sizes are smaller and significant results can be obscured by just a few misdiagnoses.