Introduction

Parkinson’s disease (PD) is a chronic progressive neurodegenerative disorder comprising a wide range of motor and non-motor symptoms. Cardinal motor symptoms include tremor at rest, rigidity, bradykinesia and postural instability [1]. Non-motor symptoms include, e.g., neuropsychiatric symptoms, fatigue, and sensory, sleep and autonomic dysfunctions [2]. Traditionally, clinical trial outcome measurement has emphasized motor symptoms, but the past decade has seen increased recognition of non-motor features [2].

There is an increasing emphasis on the importance of involving end-users, e.g., those affected by a particular condition, in research [3–5]. Focusing on research questions that are relevant from an end-user perspective is considered key to the success of clinical research [6]. Accordingly, it has been argued that interventions should be evaluated using outcomes that are meaningful to patients [4, 7, 8]. However, involving end-users appears to be a relatively underutilized avenue in research targeting people with PD (PwPD). For example, although the disease areas considered for outcome measurement in clinical PD trials have generally broadened over the past decades, the selection of outcome variables still appears largely investigator-driven.

Knowledge regarding what outcomes PwPD prioritize is limited. Nisenzon et al. [9] investigated treatment success and expectations from a PwPD perspective using a questionnaire covering 10 investigator-predefined disease areas. They found that PwPD rated walking, slowness, fatigue, sleep and activities of daily living as the most important areas in which to see improvement [9]. This largely agrees with another study that identified fatigue, sleep, pain, depression and mobility as significant predictors of illness-related distress in PD [10]. In a recent mixed-methods study, we used a participant-driven approach (concept mapping) to identify and conceptually map prioritized areas for outcome measurement in clinical PD trials from the perspectives of two main end-user stakeholders, PwPD and health care professionals (HCPs) [11]. Among the prioritized areas identified, quality of life, walking, sleep and psychological wellbeing were regarded as the most important. However, that study did not differentiate between the perspectives of PwPD and HCPs, so the degree of concurrence in their respective conceptualizations and prioritizations remains unclear. In this study, we therefore reanalyzed the data from Sjodahl Hammarlund et al. [11] to contrast the perspectives of PwPD with those of HCPs regarding prioritized areas for outcome measurement in clinical PD trials.

Methods

Concept mapping

Concept mapping is a mixed-methods approach that combines qualitative and quantitative methodologies to explore phenomena, reveal their structures and discover new meaning [12, 13].

Concept mapping is conducted in multiple steps: statement/item generation through brainstorming, sorting and rating, data analysis and interpretation. First, a focus prompt is developed to guide the generation of data (statements/items). Statements are then sorted according to their perceived conceptual similarities and each statement is rated, typically regarding its importance in relation to the focus prompt. The sorted data are analyzed by multidimensional scaling (MDS) to depict relationships among statements/items in a 2D map. Cluster analysis is then used to identify relevant and interpretable clusters of statements representing specific aspects. Finally, the map is interpreted qualitatively. The general concept mapping procedure is described in detail elsewhere [12, 13].
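The sorting step can be made concrete with a small sketch. Assuming each participant's sort is recorded as a list of piles of 0-based statement indices (a format chosen here for illustration, not necessarily the one used by the Concept Systems software), the group similarity matrix counts, for every pair of statements, how many participants placed them in the same pile. The function name `similarity_matrix` is hypothetical, and setting the diagonal to the number of sorters assumes that every participant sorted every statement.

```python
from itertools import combinations


def similarity_matrix(sorts, n_items):
    """Aggregate participants' pile sorts into a co-occurrence (similarity) matrix.

    sorts: one entry per participant; each entry is a list of piles, and each
    pile is a list of 0-based statement indices (an illustrative format).
    Cell [i][j] counts how many participants put statements i and j in one pile.
    """
    sim = [[0] * n_items for _ in range(n_items)]
    for piles in sorts:
        for pile in piles:
            for i, j in combinations(sorted(set(pile)), 2):
                sim[i][j] += 1
                sim[j][i] += 1
    # Assumes every participant sorted every statement, so each statement
    # counts as "co-sorted" with itself by all participants.
    for i in range(n_items):
        sim[i][i] = len(sorts)
    return sim


# Two hypothetical participants sorting four statements.
sorts = [
    [[0, 1], [2, 3]],   # participant 1: two piles
    [[0, 1, 2], [3]],   # participant 2: two piles
]
S = similarity_matrix(sorts, 4)
# S[0][1] == 2 (both participants paired 0 and 1); S[0][3] == 0
```

A dissimilarity matrix suitable as MDS input can then be formed as the number of sorters minus each cell. With 99 statements, the upper triangle holds 99 × 98 / 2 = 4,950 statement pairs.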

Participants and procedures

The study was conducted in accordance with the Declaration of Helsinki. All participants gave their written informed consent following oral and written study information. No personal information was recorded that would allow any data to be linked to individual participants.

Study procedures have been described in detail elsewhere [11]. Briefly, 12 PwPD (mean age 67; Hoehn and Yahr [14] stages of PD, II–V) and 12 HCPs (mean age 47) from multidisciplinary PD teams at two hospitals (one university hospital and one central hospital) participated in four and three item-generation focus group sessions, respectively (Table 1). To maximize input at the item-generation stage, three preclinical researchers working on disease mechanisms and development of novel interventions for PD formed an additional focus group.

Table 1 Sample characteristics

Focus group sessions lasted for 1–1.5 h and generated statements in response to the prompt: “A concrete example of what is most important to assess when treating PD, regardless of whether there is such a treatment available or not, is…” All generated items were recorded and reviewed by the group at the end of each session. Data saturation was considered reached when no new statements were generated. The resulting statements were reviewed; duplicates and non-relevant statements were removed.

The statements were then sorted and rated by 19 PwPD (mean age 69; Hoehn and Yahr [14] stages of PD, I–IV) and 19 HCPs (mean age 45); five participants in each group had also taken part in the focus groups (Table 1). Participants were instructed to sort statements into piles “in a way that made sense to them” based on perceived conceptual similarities. Next, participants rated each statement on a 1–5 scale regarding its perceived importance for outcome measurement in PD (5 = very important), regardless of whether such a treatment is available or not.

Following MDS and cluster analyses, the resulting PwPD and HCP maps were reviewed by a subset of participants. The PwPD map was reviewed by nine PwPD (2 groups), and the HCP map by nine HCPs (3 groups) (Table 1). Each session lasted 1–1.5 h and aimed to review the contents of each cluster in order to provide representative cluster descriptions. Finally, PwPD and HCPs reviewed and interpreted their respective cluster maps as a whole, considering the relative location of each cluster, to provide a higher-order interpretation of the map.

Data analyses and interpretation

Data were analyzed separately for PwPD and HCPs using the Concept Systems software (version 4.0.175; www.conceptsystems.com), PROTEST software (http://jackson.eeb.utoronto.ca), IBM SPSS (version 20) and MS Excel 2010. First, the relationships (distances) between items were estimated by means of 2D non-metric MDS based on a similarity matrix generated from participants’ sorting of statements [12, 13]. Goodness-of-fit between the input sort data in MDS and the resulting map was assessed by the stress value. Lower stress values indicate better fit; 0.39 has been suggested as an upper acceptable limit for 2D non-metric MDS, based on similarly sized input matrices [15]. Meta-analyses of concept mapping studies [16, 17] have found average stress values of 0.28 (ranges 0.17–0.34 and 0.15–0.35, respectively). To further assess the internal representational validity of the MDS-generated maps, their configural similarities were assessed by correlating (Pearson’s product-moment correlation) the raw aggregated similarity matrices with the final matrices of Euclidean distances between pairs of points on the MDS-generated maps [16]. These values have been found to average 0.66 and range between 0.53 and 0.83 across previous concept mapping studies [16].
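The two fit indices described above can be sketched as follows. This minimal Python version evaluates a given 2D configuration rather than performing the MDS itself. It assumes Kruskal's stress-1 formula with monotone regression (software packages differ in the exact stress variant they report), and it negates the Pearson correlation between co-sort counts and map distances so that a map placing frequently co-sorted statements close together yields a positive configural similarity; that sign convention is our assumption. All function names are illustrative.

```python
import math
from itertools import combinations


def pava(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to values."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block mean exceeds the current block mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def kruskal_stress1(dissim, coords):
    """Kruskal stress-1 of a configuration against a dissimilarity matrix."""
    pairs = list(combinations(range(len(coords)), 2))
    dist = {p: math.dist(coords[p[0]], coords[p[1]]) for p in pairs}
    # Order by input dissimilarity; within ties, by distance (primary approach).
    pairs.sort(key=lambda p: (dissim[p[0]][p[1]], dist[p]))
    d = [dist[p] for p in pairs]
    dhat = pava(d)  # disparities: monotone regression of distances on rank order
    raw = sum((a - b) ** 2 for a, b in zip(d, dhat))
    return math.sqrt(raw / sum(a ** 2 for a in d))


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def configural_similarity(sim, coords):
    """Correlation between co-sort counts and Euclidean map distances, negated."""
    s, d = [], []
    for i, j in combinations(range(len(coords)), 2):
        s.append(sim[i][j])
        d.append(math.dist(coords[i], coords[j]))
    return -pearson(s, d)
```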

Next, hierarchical cluster analysis was applied to the MDS-generated xy map coordinates to identify groupings of items [12, 13]. Each possible solution from 3 to 20 clusters was independently examined for interpretability and statistical adequacy [bridging values (BVs)] to choose the most representative number of clusters. BVs are indices ranging from 0 to 1 that denote the degree to which participants have sorted an item within the cluster it resides in versus other clusters. High cluster BVs suggest a more complex construct with conceptual similarities with other clusters. Low cluster BVs suggest that cluster statements hang well together. At the item level, a low BV suggests that the statement can be considered a representative “anchor” for its cluster [12].
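A minimal sketch of agglomerative clustering on the MDS xy coordinates follows. The text specifies only hierarchical cluster analysis; the use of Ward's criterion here is an assumption (it is a common choice in concept mapping), implemented with the Lance–Williams update on squared Euclidean distances. Bridging values are not computed, and the function `ward_clusters` is hypothetical.

```python
def ward_clusters(coords, k):
    """Agglomerative clustering with Ward's criterion, cut at k >= 1 clusters.

    Uses the Lance-Williams update on squared Euclidean distances.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    clusters = {i: [i] for i in range(len(coords))}
    # d[(a, b)] with a < b: Ward merge distance between active clusters a and b.
    d = {}
    for a in clusters:
        for b in clusters:
            if a < b:
                d[(a, b)] = sum((p - q) ** 2 for p, q in zip(coords[a], coords[b]))
    next_id = len(coords)
    while len(clusters) > k:
        a, b = min(d, key=d.get)  # cheapest merge
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        d_ab = d.pop((a, b))
        new = {}
        for c in clusters:
            nc = len(clusters[c])
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            # Lance-Williams coefficients for Ward's method.
            new[c] = ((na + nc) * d_ac + (nb + nc) * d_bc - nc * d_ab) / (na + nb + nc)
        clusters[next_id] = merged
        for c, val in new.items():
            d[(c, next_id)] = val
        next_id += 1
    return sorted(sorted(m) for m in clusters.values())
```

Examining solutions from 20 down to 3 clusters, as described above, amounts to calling `ward_clusters(xy, k)` for each k in that range and inspecting the resulting groupings for interpretability.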

To assess the quantitative association between the conceptual interpretations of PwPD and HCPs, two analyses were conducted. First, vectors of the total similarity matrices (overall raw sorting data) produced by PwPD and HCPs, respectively, were correlated (Pearson’s product–moment correlation). The result of this analysis can be viewed as an index of how similarly participants in the two groups paired statements together within piles [16, 18]. Next, to test the correspondence between the structural representations of the two groups, we applied a multivariate rotational-fit algorithm to the two configurations, represented by the two sets of xy coordinates for each respective map (2D, multivariate summaries). To account for the lack of independence among distances and the spuriousness of their correlation, Procrustes analysis was used to minimize the sum of squared deviations between the data values in two observation-by-variable matrices through matrix translation, reflection, rigid rotation and dilation [19, 20]. Residuals between the original values and the best-fit solution were calculated for each observation to identify outlying and deviant points, and the resulting m12 statistic (0 < m12 < 1.0) indicated the goodness-of-fit between the two spatial configurations (lower values indicating closer fit). To evaluate the significance of the observed m12 statistic, 10,000 random permutations were run to estimate the probability of the derived statistic and to ensure the relative stability of the estimated P value [19].
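For 2D maps, the Procrustes fit statistic can be sketched compactly by treating each point as a complex number: after centering and scaling both configurations to unit sum of squares, the best rotation plus dilation is a single complex multiplier, and a reflection corresponds to complex conjugation. The sketch below computes the symmetric m12 in this way and runs a row-permutation test in the spirit of PROTEST; it is not the PROTEST program itself, and the p-value convention (hits + 1) / (permutations + 1) is an assumption. Function names are hypothetical.

```python
import math
import random


def _standardize(points):
    """Center a 2D configuration and scale it to unit sum of squares (as complex numbers)."""
    z = [complex(x, y) for x, y in points]
    mean = sum(z) / len(z)
    z = [p - mean for p in z]
    norm = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / norm for p in z]


def procrustes_m12(X, Y):
    """Symmetric Procrustes residual after translation, rotation, reflection and dilation."""
    zx, zy = _standardize(X), _standardize(Y)
    rot = abs(sum(a.conjugate() * b for a, b in zip(zx, zy)))  # rotation only
    refl = abs(sum(a * b for a, b in zip(zx, zy)))             # rotation plus reflection
    return 1.0 - max(rot, refl) ** 2


def protest(X, Y, n_perm=10_000, seed=0):
    """Permutation test: share of row-shuffled Y configurations fitting at least as well."""
    rng = random.Random(seed)
    observed = procrustes_m12(X, Y)
    Y = list(Y)  # shuffle a copy, not the caller's list
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(Y)
        if procrustes_m12(X, Y) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Because m12 = 1 − r², the corresponding Procrustes correlation can be recovered as `math.sqrt(1 - m12)`.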

To assess similarities and differences in the values assigned to statements, mean and median item importance ratings from the two groups were correlated (Pearson’s product–moment and Spearman’s rank correlations, respectively) and compared (Mann–Whitney U tests, with and without Benjamini–Hochberg correction) to explore differences in priorities.
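The rating comparisons can be sketched as a per-statement Mann–Whitney U test followed by Benjamini–Hochberg adjustment across all 99 statements. This stdlib-only version uses the normal approximation with a tie correction and no continuity correction; these are assumptions, and p-values from statistical packages may differ slightly. Function names are illustrative.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation with tie correction)."""
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # all observations tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (step-up) adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted
```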

Results

A total of 175 statements were generated, yielding a final set of 99 statements after removal of duplicates and non-relevant statements. MDS analysis of the sorting data resulted in 2D maps with stress values of 0.31 for PwPD and 0.21 for HCPs, both within the range found in previous meta-analytic studies of concept mapping [16, 17].

The configural similarity correlation was 0.61 for PwPD and 0.75 for HCPs (both P < 0.001). Both were within the range found in studies of similarly constructed concept maps [16], but they differed significantly (z, 13.13; P < 0.001 following Fisher’s r-to-z transformation). The PwPD configural similarity correlation was near the lower end of the range (a relatively weaker relationship between sort and distance data), and the HCPs’ configural similarity correlation was near the upper end of the range (a relatively stronger relationship between sort and distance data).

Following cluster analyses, an eight-cluster solution was regarded as the most parsimonious structural interpretation for both PwPD and HCPs. Although both groups had eight clusters, their configurations were not identical (Fig. 1). The number of statements per cluster ranged from 7 to 19 for PwPD and from 3 to 23 for HCPs (Table 2). The correlation between the raw sorting data (total similarity matrices) of PwPD and HCPs was 0.80 (P < 0.001). The spatial arrangement of the PwPD configuration showed marked similarity to that of HCPs, and the fit of the two spatial arrangements was greater than expected by chance (m12 = 0.53; P < 0.001). Thus, a highly significant concordance between the multivariate data sets was detected, and the two matrices showed a moderate non-random resemblance [21]. Given that m12 = 1 − r², we solved for r (r² = 1 − 0.53 = 0.47) and obtained r = 0.68. We then compared the magnitude of agreement for the sorting arrangement (r = 0.80; n = 4,950) with that for the spatial arrangement (r = 0.68; n = 99) and found that they differed (z, 2.62; P < 0.009 following Fisher’s r-to-z transformation).

Fig. 1

The multidimensional scaling generated point maps from people with Parkinson’s disease (a) and health care professionals (b) representing the distances between the 99 statements (each numbered point on the map represents one statement; see Table 2 for statement contents), with the respective eight-cluster solutions superimposed

Table 2 Sorting of 99 items representing important aspects for outcome measurement in clinical Parkinson’s disease (PD) trials by people with PD (PwPD; n = 19) and health care professionals (HCPs; n = 19) into eight clusters

The mean cluster level BVs and importance ratings are depicted in Figs. 2 and 3, respectively. The sorting and BVs of each statement for PwPD and HCPs are presented in Table 2. In general, similarities between the sorting of PwPD and HCPs outweighed the differences. For example, 16 out of 19 statements in the PwPD cluster Mobility and motor functioning were also sorted together and given the same label by HCPs; the remaining three statements (numbers 4, 10 and 37) were sorted in a cluster labeled Pain, fatigue and miscellaneous symptoms by the HCPs (Table 2).

Fig. 2

Cluster maps with the respective eight-cluster solutions from people with Parkinson’s disease (a) and health care professionals (b) with cluster names and their average bridging values in parentheses (more cluster layers = higher bridging value). Average cluster bridging values for people with Parkinson’s disease and health care professionals are 0.36 and 0.48, respectively (P = 0.195)

Fig. 3

Cluster maps with the respective eight-cluster solutions from people with Parkinson’s disease (a) and health care professionals (b) with cluster names and their average importance ratings in parentheses (more cluster layers = greater importance). Average cluster importance ratings for people with Parkinson’s disease and health care professionals are 3.54 and 3.87, respectively (P = 0.012)

PwPD divided their map (Figs. 2a, 3a) into two main higher-order areas. The “northwest” part consisted of Mobility and motor functioning, Sensory, speech and swallowing problems and Autonomic dysfunctions (which was considered partly related to Cognitive functioning). The “southeast” part consisted of Cognitive functioning, Psychological symptoms, Executive functioning and participating in society and Social functioning. Neuropsychiatric symptoms and emotional reactions occupied a central position and was regarded as influencing all other clusters.

HCPs divided their map (Figs. 2b, 3b) into four main higher-order areas. They considered the “northern” part (Mobility and motor functioning, Pain, fatigue and miscellaneous symptoms) to represent physical, somatic and “measurable” aspects. The “southern” part (Neuropsychiatric symptoms, Psychosocial problems) was described as representing psychological and abstract problems that persons may wish to conceal unless specifically asked about them. The “eastern” part (Cognitive functioning, Social interaction) was interpreted as aspects of interaction and communication. The “western” part (Eating and blood pressure, Autonomic dysfunctions) was considered concealed and beyond the control of the individual. The clusters were also considered interrelated: if the functions represented by one cluster are impaired, aspects of other clusters will also be affected.

The Pearson correlation between mean statement importance ratings (n = 99) by PwPD and HCPs was 0.60 (P < 0.001), and the corresponding Spearman correlation between median importance ratings was 0.44 (P < 0.001). For 15 statements, importance ratings differed significantly (P < 0.05) between PwPD and HCPs, none of which remained significant after Benjamini–Hochberg correction (Table 3). In all instances of uncorrected significant differences, HCPs rated the statement higher (i.e., as more important) than PwPD did (Table 3; Fig. 3). Table 4 lists the ten statements receiving the highest mean importance ratings from each group.

Table 3 Statements with significantly (uncorrected P < 0.05) different importance ratings between PwPD and HCPs
Table 4 The ten highest rated statements from PwPD and HCPs

Discussion

By employing a diagnostic mixed-methods approach to evaluate the conceptualization patterns of PwPD and HCPs, we were able to distinguish similarities and differences in both structure and substance. Quantitatively, we found consistency in the within-group relationship between the sort data and the distance data among items on the map. There was a strong correspondence for the HCPs; that is, we found a low stress value and high configural similarity in our examination of their sort and distance data. Greater variability was found in the relationship between sort and distance data for the PwPD, where we observed higher stress and lower configural similarity. This pattern of greater variability among PwPD was also reflected in their importance ratings, which had larger SDs as compared to HCPs’ ratings.

Similarities predominated when comparing how the statements were sorted by PwPD and HCPs, which was also confirmed by a strong correlation between the sorting data from the two groups. Although a significant relationship between the coordinate representations confirmed their non-random resemblance, the resulting fit value suggests only moderate concordance between the PwPD and HCP configurations. While we expected the correlation between the sort data of the two groups to be stronger than the fit of the configurations, the discrepancy between the two measures further highlights differences in how each group’s data were represented in the 2D space. Collectively, these observations suggest some variation in the structural representation of the content organized by the two groups.

The higher configural similarity correlation of the HCPs may be explained by their training and clinical experience, which may similarly have influenced their relatively homogeneous importance ratings compared with those of PwPD. The greater variability in sort and distance data, as well as in importance ratings, among PwPD may reflect that the data stem from a heterogeneous sample representing a broad variety of experiences and manifestations of the disease. It thus appears reasonable to assume that the views of each HCP are based on knowledge of, and clinical experience with, a wide range of disease severities and expressions that collectively reflect a relatively representative view of the disease, albeit influenced by personal views and professional perspectives. That is, each HCP participant’s data point may be seen as an average, whereas PwPD data reflect each participant’s personal experience of the disorder. While the inclusion of a wide array of personal PwPD views may be seen as problematic in that it can compromise representativeness and cloud or disperse the overall picture, this is precisely where the value of incorporating this perspective lies [3, 4].

The relatively less coherent results from PwPD compared with HCPs (and with concept mapping studies in general [16]) suggest that, from a PwPD perspective, various aspects and consequences of PD (expressed here in terms of outcome variables) are interrelated and difficult to separate from one another. That is, while various aspects/outcomes may be grouped into separate clusters, these are nevertheless intermingled and part of a unity; the impact of the disorder appears heterogeneous but unidimensional rather than homogeneous and multidimensional. This view was also reflected in the qualitative map interpretations by HCPs, but it was more apparent in the quantitative data from PwPD than in those from HCPs, and it is probably also reflected in the generally larger PwPD clusters compared with those of HCPs. More in-depth qualitative studies are needed to understand this better.

The interpretation that the HCP perspective reflects a scholarly biomedical frame of reference, whereas the PwPD perspective reflects perceived lived experience, may thus account for the observed differences in sorting and the resulting clusters. For example, depression was sorted into clusters labeled Neuropsychiatric symptoms by HCPs and Psychological symptoms by PwPD. Although the cluster labels are similar and their contents overlap, there are also discrepancies within clusters reflecting the two perspectives. That is, the sorting of depression by PwPD appears to reflect its relation to experiences of daily living, as suggested by its co-sorting with statements such as perform everyday occupations, sense of shame, emotional balance and fatigue. In contrast, HCPs’ sorting of depression mainly appears to reflect a clinical perspective, given its grouping with aspects such as anxiety, sleep problems, hallucinations and compulsiveness. At the cluster level, HCPs’ relatively coherent sorting of the Neuropsychiatric symptoms cluster seems to reflect consequences of the disease and treatment complications. PwPD, on the other hand, included neuropsychiatric symptoms and emotional reactions in the same cluster, which may be due to relationships between these aspects in terms of their perceived burden. These two perspectives are in line with the concepts of “disease” and “illness” developed in anthropology [22, 23]. In this view, “disease” represents the biomedical perspective of HCPs, largely based on the presence of a diagnosis and impaired body function/structure, whereas “illness” represents patients’ personal experience and perception of ill health [22, 23].

Both groups rated quality of life as the most important statement of all. PwPD related quality of life to aspects of cognitive and executive functioning and sense of control, but also sorted it close to their Social functioning cluster. HCPs, however, sorted quality of life with Psychosocial problems, including, for example, shame and participation in society, but at some distance from their Cognitive functioning cluster. It thus seems that both groups regarded quality of life as associated with aspects of social functioning and participation, but PwPD associated it more with cognitive aspects than HCPs did. More in-depth studies, including operational definitions, are needed to further elucidate how these constructs may relate to one another.

From an outcome measurement perspective, our observations suggest that clinical PD trial endpoints should primarily comprise quality of life, walking ability/mobility, psychological well-being, control over the disease and sleep-related variables to best meet the PwPD end-user perspective. By doing so, and by also considering fluctuations, depression and falls, endpoints will also be relevant from the HCPs’ perspective. However, in addition to selecting meaningful variables for outcome measurement, it is equally important to ensure that those variables are measured in a meaningful as well as scientifically rigorous manner [7, 24, 25]. A number of outcome measures are currently used to capture the prioritized areas mentioned above in PD [26–31]. However, although these measures are employed in clinical trials, evidence regarding their appropriateness as rigorous outcome measures of these variables is largely lacking. For example, the 39-item PD questionnaire (PDQ-39) is commonly used to measure quality of life in PD, yet evidence speaks against its appropriateness as a clinical trial quality of life outcome measure [27, 32–36]. This illustrates the need for comprehensive critical evaluations of available scales regarding their appropriateness as outcome measures in clinical PD trials, not merely regarding their statistical/psychometric properties, but also regarding the extent to which they appear to constitute meaningful and valid representations of well-defined constructs [24, 25, 37, 38].

There are some limitations to this study. The number of participants is relatively small, and a larger sample may have further clarified similarities and differences within and between PwPD and HCPs. Importantly, however, a wide range of PD severities was represented, as were HCPs from multiple teams at two hospitals representing different levels of care provision and a fair range of the professions typically involved in PD care. There were also slight biases toward people with moderate-to-severe PD (with no representatives of mild unilateral, Hoehn and Yahr stage I PD) during item generation, and toward mild-to-moderate PD during sorting and rating. These aspects may limit the generalizability of the findings. Nevertheless, it is reasonable to believe that the list of statements generated by the focus groups was fairly exhaustive, since multiple perspectives were taken into account and saturation was reached. A strength of this study is the mixed-methods approach to the comparative analysis of the conceptual models produced by the two groups, which enhanced the understanding of similarities and differences in the perspectives of PwPD and HCPs.

In conclusion, despite similarities in perspectives, this study illustrates the importance of attending to the perspective of PwPD if clinical trial outcomes are to be relevant to this group of end-users. Taking this perspective into account, together with that of HCPs, is likely to yield a broader and richer picture and to provide evidence from clinical investigations that is meaningful and interpretable for end-users. Arguably, this could also play a role in licensing and reimbursement decision-making for therapies.