Introduction

Freezing of gait (FOG) is an episodic, disabling symptom that commonly affects patients with Parkinson’s disease (PD), particularly in the later stages of the disease [8, 24]. Patients with FOG typically experience abrupt events in which they are unable to move their feet despite the intention to walk forward. FOG is often triggered by cognitive and emotional load and responds only partially to medication. It not only hinders efficient locomotion but also affects quality of life beyond gait and mobility. It may also increase the incidence of falls, morbidity, and even mortality [4].

Evaluating FOG and quantifying its severity in patients with PD is challenging for several reasons. One is the unpredictable, paroxysmal nature of the phenomenon. Patients may appear free of the symptom in the clinical setting, typically while “on” anti-parkinsonian medications, although evaluation in the “off” state can increase the likelihood of recording freezing episodes [22]. Although patients or their spouses may report FOG, it is often not observed in the clinic [11, 24, 29]. Furthermore, FOG is triggered under specific conditions, such as multitasking and challenging environments, that are difficult to recreate in clinical settings. Without observing FOG, clinicians cannot be confident about the nature of the gait problem they are treating. Moreover, assessment of FOG in the clinic may give only a limited indication of FOG frequency at home and during daily activities, away from the “sterile” clinical environment. The “gold standard” of FOG assessment is clinical observation to determine the conditions that elicit FOG and its severity. However, this process is subjective, and inter-rater reliability varies from low to good [21].

The literature contains numerous protocols designed to trigger FOG episodes so that they can be observed directly in clinical settings [28, 29, 30]. Proposed tasks include figure-of-eight walking with and without an additional task, rapid 360° axial turns in both directions, and walking along narrow trajectories with obstacles. Nonetheless, a standardized clinical assessment of FOG is still lacking, and existing assessments lack sensitivity.

Development of a FOG questionnaire began in 2000 [9]. Because of its limitations, a new version (the NFOG-Q) was later introduced [23]. It first determines whether the respondent has experienced any freezing. It then quantifies the frequency and duration of episodes and their effect on activities of daily living, which enables clinicians to rate FOG severity based on self-report. Recently, Mancini et al. summarized the latest insights into the clinical and methodological challenges of assessing FOG [16]. The authors emphasized that, to be fully evaluated, FOG must be provoked so that its phenotype can be observed. They concluded that rapid 360° turns in place to both sides are the most sensitive provocation. Ziegler and colleagues [32] proposed a structured FOG-provoking test to elicit FOG under single- and multiple-task conditions. The test includes 360° turns clockwise and counterclockwise and other functional provoking situations (i.e., rising from a chair and passing through a narrow door). This performance-based test is scored by a rater who determines the occurrence of FOG or festination. In the original protocol, each provoking part of the test is scored from 0 to 3 points (see “Methods”). However, this score is not influenced by the number or duration of FOG episodes, which potentially limits the ability of the test to differentiate FOG severity among patients. In addition, scoring relies on the assessor’s clinical judgment, which restricts its use to experts and may reduce inter-rater reliability. Thus, the clinical assessment of FOG is not yet sufficiently established, and existing clinical tests lack optimal clinimetric properties. To address these issues, several groups have proposed using smartphones and wearable sensors to collect objective measures for FOG detection [1, 2, 15, 18, 19, 26]. These approaches, however, remain largely confined to research and gait-laboratory settings and are not part of routine clinical practice.

In the present work, we used the FOG-provoking test described by Ziegler et al. [32]. This test incorporates full 360° turns, as previously recommended, along with other common FOG triggers. We propose a simple modification that addresses the limitations of the original test and better evaluates FOG severity in a given subject. The goal was to improve quantification of the degree and severity of FOG in each task condition using only a stopwatch, and to evaluate changes in response to interventions or medications. Specifically, we compared adding the time taken to complete each condition of the test with the original observer-based scoring, and explored the benefit of this addition for evaluating FOG severity.

Methods

Study participants

We collected data from subjects with advanced PD and marked FOG. The cohort consisted of individuals from two sites: the Hinda and Arthur Marcus Institute for Aging Research, Hebrew SeniorLife, Boston, USA, and the Tel Aviv Sourasky Medical Center, Israel. These individuals were invited to participate in a study evaluating the effect of transcranial direct current stimulation (tDCS) on FOG. All subjects met the UK Brain Bank criteria for idiopathic PD. Other inclusion criteria were:

- a Mini-Mental State Examination (MMSE) score of 21 points or more;
- evidence of FOG on examination;
- a new freezing of gait questionnaire (NFOG-Q) score of 9 or above [23];
- a stable medication regimen (i.e., no change in medications during the month before study participation).

Subjects were excluded if they could not comply with the FOG-provoking protocol, reported neurological or psychiatric disorders other than PD, had severe orthopedic problems, or had a history of seizures or deep brain stimulation. Within a single visit, participants completed the FOG-provoking test twice. The first session was in a practically defined OFF state (after at least 12 h of anti-parkinsonian medication withdrawal). The second was 1 h after they took their morning dose of medication. The study was approved by the appropriate ethics committee and has therefore been performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments. All participants provided written informed consent.

Demographic details and disease-related measures were obtained. These included disease duration, disease severity (Unified Parkinson’s Disease Rating Scale, UPDRS [7]), anti-parkinsonian medications, and self-reported FOG severity (NFOG-Q) [23]. The NFOG-Q consists of three parts:

- Part I distinguishes freezers from non-freezers.
- Part II rates freezing severity based on the frequency and duration of freezing episodes during turning and when initiating the first step.
- Part III rates the impact of freezing on daily life.

Participants completed the FOG-provoking test [32] in the OFF medication state and then again in the ON state.

The FOG-provoking protocol

The protocol includes standing up from a chair, walking 1 m to a marked square on the floor, completing two 360° turns (one in each direction), walking through a door, turning around, walking back to the chair, and sitting down. The test is performed under three conditions of increasing difficulty:

1. single task, i.e., usual walking;
2. dual task, i.e., walking while carrying a tray;
3. triple task, i.e., walking while carrying a tray and serially subtracting 7s.

In each condition, five items are scored: start hesitation (gait initiation) after getting up, the clockwise turn, the counterclockwise turn, passage through the door, and the turn inside the room. Each item is scored as follows:

- 0 points: neither festination nor FOG is observed.
- 1 point: festination or any hastening steps (“shuffling”) are observed.
- 2 points: freezing (trembling in place or total akinesia) is observed.
- 3 points: the task is aborted or the examiner has to intervene.

Each condition is therefore scored from 0 to 15 points, for a maximum total of 45 points [5]. The original test does not consider the number or duration of FOG episodes in each condition. One or more FOG episodes score two points, hesitation or festination scores one point, and no FOG scores zero points. In addition to this conventional scoring, we recorded the time taken to complete each condition with a stopwatch. All test conditions were videotaped and analyzed offline, and FOG episodes were annotated by two raters. The total time frozen was extracted from the video annotations as the sum of the ‘pure’ frozen time during each test condition, in both the OFF and the ON medication states.
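The scoring rules above can be summarized in a short Python sketch. It is a hypothetical illustration, not software used in the study. The item identifiers and the `Condition` class are assumptions. So is the rule that each item receives the worst level observed at it, which follows from the statement that one or more FOG episodes score two points. The sketch shows why two performances can share a conventional score while differing greatly in duration and total time frozen.

```python
from dataclasses import dataclass, field

# Item levels of the original FOG-provoking test scoring.
NO_FOG = 0        # neither festination nor FOG observed
FESTINATION = 1   # festination or hastening ("shuffling") steps
FREEZING = 2      # trembling in place or total akinesia
ABORTED = 3       # task aborted or examiner had to intervene

# The five scored items of each condition (hypothetical identifiers).
ITEMS = ("initiation", "turn_cw", "turn_ccw", "door", "turn_in_room")


@dataclass
class Condition:
    """One condition (single, dual, or triple task) of the test."""
    observed: dict       # item -> list of levels observed at that item
    duration_s: float    # stopwatch time to complete the condition
    fog_episodes_s: list = field(default_factory=list)  # annotated FOG episode lengths (s)

    def item_score(self, item):
        # Assumption: the worst observed level sets the item score, so one
        # short episode and several long ones both yield FREEZING (2 points).
        return max(self.observed.get(item, []), default=NO_FOG)

    def score(self):
        # Conventional condition score, 0..15 (five items x 0..3 points).
        return sum(self.item_score(item) for item in ITEMS)

    def total_time_frozen(self):
        # Sum of the annotated freezing durations, in seconds.
        return sum(self.fog_episodes_s)


def total_score(conditions):
    # Conventional total over the three conditions, 0..45.
    return sum(c.score() for c in conditions)


# Two hypothetical performances of the same condition.
brief = Condition({"turn_cw": [FREEZING]}, duration_s=30.0, fog_episodes_s=[2.0])
prolonged = Condition({"turn_cw": [FREEZING, FREEZING, FREEZING]},
                      duration_s=90.0, fog_episodes_s=[15.0, 20.0, 25.0])
print(brief.score(), prolonged.score())        # 2 2
print(brief.duration_s, prolonged.duration_s)  # 30.0 90.0
print(prolonged.total_time_frozen())           # 60.0
```

Both performances score 2 points. Only the stopwatch time and the annotated frozen time separate them, which is the gap that timing each condition is meant to fill.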

Statistical analyses

Statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS), version 26. Means and standard deviations (SD) were calculated for all dependent variables. To compare scoring with timing, we evaluated how well each method captured differences between the conditions of the FOG-provoking test and the effect of anti-parkinsonian medication. The Friedman test was used to compare the scores and durations of the three FOG-provoking conditions in both the OFF and ON medication states. Post hoc analysis was performed using Wilcoxon signed-rank tests with Bonferroni correction for multiple comparisons. Effect sizes (Cohen’s d) were calculated from the Wilcoxon Z statistic as \(d' = Z/\sqrt{n}\). Specifically, we evaluated two contrasts:

- (a) the change between conditions 1 and 3 (i.e., the motor–cognitive cost);
- (b) the change within each condition before and after anti-parkinsonian medication intake (OFF vs. ON medication state).

Effect sizes of 0.2, 0.5, and 0.8 were considered small, medium, and large, respectively. Spearman correlation analysis was used to explore the association between self-reported FOG severity (the NFOG-Q) and the duration and score of each testing condition. Correlations between duration and score within the same condition were also evaluated. To explore the added value of test duration over scoring, forward linear regression was used with the NFOG-Q and the total time frozen as dependent variables. Statistical significance was set at p < 0.05.
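The analyses above were run in SPSS. The dependency-free Python sketch below illustrates the main computations under stated assumptions. It is not a reproduction of the SPSS output, and all function names are hypothetical. The sketch covers:

- the Friedman statistic for the three conditions;
- a Bonferroni adjustment of post hoc p-values;
- the effect size \(d' = Z/\sqrt{n}\), labeled with the 0.2/0.5/0.8 thresholds.

Within a subject, tied values receive average ranks. No tie correction is applied to the Friedman statistic. The Wilcoxon Z for each paired comparison is assumed to come from a statistics package. The Friedman p-value uses the chi-square approximation. With three conditions the distribution has 2 degrees of freedom, so its upper tail reduces exactly to exp(-x/2).

```python
import math


def average_ranks(values):
    """Rank one subject's values (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi2(rows):
    """Friedman statistic for n subjects (rows) x k conditions (columns), no tie correction."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * n * (k + 1)


def chi2_pvalue_df2(statistic):
    """Upper-tail chi-square p-value for 2 degrees of freedom (three conditions)."""
    return math.exp(-statistic / 2.0)


def bonferroni(pvalues):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def effect_size(z, n):
    """Effect size d' = |Z| / sqrt(n) from a Wilcoxon signed-rank Z statistic."""
    return abs(z) / math.sqrt(n)


def magnitude(d):
    """Label an effect size with the 0.2 / 0.5 / 0.8 thresholds."""
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "below small"


# Hypothetical durations (s) for four subjects x three conditions.
durations = [[30.0, 41.0, 55.0], [25.0, 33.0, 47.0],
             [60.0, 72.0, 118.0], [28.0, 35.0, 40.0]]
chi2 = friedman_chi2(durations)           # 8.0: every subject ranks 1 < 2 < 3
print(chi2, round(chi2_pvalue_df2(chi2), 4))
print(magnitude(effect_size(-2.4, 16)))   # medium
```

For more than three conditions, the closed-form p-value no longer applies. A general chi-square survival function would then be needed.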

Results

Seventy-one patients with PD and marked FOG participated in this study (mean age 69.9 ± 7.2 years, mean disease duration 9.3 ± 5.8 years, 80% male, mean NFOG-Q score 19.7 ± 4.1, range 10–29). All participants were non-demented (Mini-Mental State Examination mean score: 27.9 ± 1.9). From the cohort, 48 participants (68%) were able to perform the FOG-provoking test both in the OFF- and ON-medication states.

Both the duration and the conventional score differed significantly across the three conditions of the FOG-provoking test, in both the OFF and ON medication states (p < 0.0001; Table 1). As expected, the time to complete the test increased with the level of difficulty. With regard to duration, subjects performed the single-task condition (usual walking) significantly faster than both the dual-task condition (walking while carrying a tray) and the triple-task condition (walking while carrying a tray and subtracting) (p < 0.001). In contrast, with the scoring method, a significant difference was found only between the single- and triple-task conditions, in both the OFF and ON states (p < 0.001). Scores in the single- and dual-task conditions did not differ significantly, either OFF or ON (p > 0.537).

Table 1 The duration and scoring of the FOG-provoking test

The difference between medication states across conditions was significant for the time to completion, i.e., test duration (p = 0.015), but not for the scoring method (p = 0.226; see Table 1). Figure 1 contrasts scoring and duration for the triple-task condition in the ON medication state. Within this condition, each test score corresponded to a wide range of completion times. Similar results were also observed in the other two conditions and in the OFF medication state.

Fig. 1

Scatterplot of scoring and timing during the triple-task condition (the most difficult one) in the ON medication state. For each score, task timing varied widely across subjects. A longer time for a given score may reflect either longer or more numerous FOG episodes. For example, subjects who completed the triple task in 22 s and in 113 s received the same score (4).

We compared test duration with conventional scoring to explore the motor–cognitive cost (single-task vs. triple-task condition, i.e., the extreme conditions). Effect sizes were larger for duration than for scoring in both the OFF (0.85 vs. 0.68, respectively) and the ON (0.87 vs. 0.55, respectively) medication states. Similarly, when comparing OFF and ON states, effect sizes based on test duration were larger than those based on test scoring in all conditions. The dual-task condition (condition 2) showed the largest difference (see Fig. 2 and Table 2). Additionally, forward linear regression revealed that test duration was the only independent predictor of the NFOG-Q (OFF state R = 0.53, p < 0.001; ON state R = 0.26, p = 0.041). Test duration was also the only independent predictor of the total time frozen across conditions and medication states (OFF state R > 0.98, p < 0.00001; ON state R > 0.77, p < 0.00001).

Fig. 2

The left panel presents the effect size between the single, usual-walking task (condition 1) and the triple task, which adds concurrent motor and cognitive load (condition 3). The right panel presents the effect size between the ON and OFF medication states within each of the three conditions. The timing method (black bars) captured greater changes in all contrasts, suggesting higher sensitivity to change than the conventional scoring. Dashed lines mark the effect-size thresholds: small = 0.2, medium = 0.5, large = 0.8.

Table 2 Effect sizes based on duration and scoring of the FOG-provoking test

In general, test duration and conventional scoring were moderately correlated with each other (rho < 0.7, p < 0.05). The correlations of NFOG-Q parts II and III with duration and scoring are presented in Table 3. Correlations were stronger in the OFF than in the ON medication state. Test duration also correlated more strongly with the NFOG-Q than test scoring did. Specifically, moderate correlations were found between NFOG-Q part III and test duration.

Table 3 Spearman correlations of self-reported FOG severity with timing and scoring

Discussion

The current study evaluated the possibility of enhancing the scoring of a previously validated FOG-provoking test. Our findings support the notion that recording the duration of each condition improves the sensitivity of the test to medication (ON vs. OFF) and to task complexity. Two findings support this:

1. Effect sizes were larger for task duration than for scoring across all conditions and medication states.
2. Task duration was the only independent predictor of both the NFOG-Q and the total time frozen.

The motor–cognitive cost, for example, was better reflected by test duration than by conventional scoring (a large vs. a medium effect size, respectively). More importantly, the shift in effect-size category was more pronounced in the ON state. This state reflects the conditions of most clinical assessments and of daily life.

From a clinical perspective, the severity of FOG (frequency and duration) may be underestimated when the scoring method is used alone. The regression analysis suggests that simply timing the test with a stopwatch could improve the assessment of freezing severity. It could also better capture the effects of medication and possibly of other interventions.

FOG is a multidimensional problem that varies in presentation, severity, duration, phenomenology, and time of occurrence. Moreover, the relationship between FOG and medication intake is complex. It is therefore difficult to capture FOG episodes and to determine the severity of the problem. The FOG-provoking test may help clinicians to elicit and directly observe FOG episodes in the clinic or at home. In addition, timing the test simplifies the assessor’s role and offers an objective measure of this multifaceted phenomenon. Timing may also provide objective information about disease progression and FOG severity. The significant correlations between test duration and NFOG-Q part III (Table 3) point to the potential added value of timing over the original scoring method alone, and the regression results point the same way. These associations suggest that a FOG assessment based on test duration may better reflect the burden of freezing and its impact on the daily functional activities of individuals with PD.

It is interesting to compare the evolution of the present test with that of another performance-based test used to assess functional mobility in older adults. Originally termed the “Get up and go” test [17], it began as a qualitative assessment. Timing of test completion was later added, introducing a quantitative, objective measure that could be obtained with minimal expertise or training. This timed up and go (TUG) version [27] enabled cut-off points for fall risk and other outcomes to be established in many populations. It is now a widely used measure of mobility. Subsequently, it was suggested that the test not only taxes motor function but also draws on cognitive resources [12]. Finally, several groups have used wearable sensors to obtain objective measures (i.e., the instrumented timed up and go, iTUG) and to expand the evaluation of its sub-tasks [13, 14, 25, 31]. Because of its ease of use, clinical utility, and objective outcome, the TUG has been used in thousands of studies. Somewhat analogously, adding timing to the conditions of the present FOG test appears to improve its clinimetric properties, such as sensitivity to medication. It may also improve inter-rater reliability and reduce ceiling and floor effects. Moreover, this modification might broaden the clinical utility of the test, for example by establishing cut-off points for fall risk or by distinguishing patients with more severe FOG. Furthermore, the stopwatch time can serve as the denominator of the percent time frozen, that is, the cumulative duration of FOG episodes divided by the total duration of the walking task. Several studies have used percent time frozen as an outcome measure of change [3, 6, 10, 20]. It has shown very strong inter-rater agreement and was found to be a reliable metric of FOG severity [21].
It would be interesting to use this approach in the future to better grade FOG severity. However, video annotation of FOG is time-consuming and challenging, which makes it less applicable to routine clinical assessment and to non-specialist raters.
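Once FOG episodes have been annotated, the percent-time-frozen ratio defined above is simple arithmetic. The hypothetical Python sketch below assumes that episodes are available as (start, end) times in seconds, for example from video annotation. It merges overlapping intervals so that no frozen time is counted twice. The stopwatch time to complete the task serves as the denominator.

```python
def merge_intervals(intervals):
    """Merge overlapping (start_s, end_s) FOG annotations into disjoint intervals."""
    merged = []
    for start, end in sorted(intervals):
        if end < start:
            raise ValueError(f"episode ends before it starts: {(start, end)}")
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current episode
        else:
            merged.append([start, end])
    return merged


def total_time_frozen(intervals):
    """Cumulative duration of FOG episodes, in seconds."""
    return sum(end - start for start, end in merge_intervals(intervals))


def percent_time_frozen(intervals, task_duration_s):
    """Cumulative FOG duration divided by total task duration, as a percentage."""
    if task_duration_s <= 0:
        raise ValueError("task duration must be positive")
    frozen = total_time_frozen(intervals)
    if frozen > task_duration_s:
        raise ValueError("frozen time exceeds task duration")
    return 100.0 * frozen / task_duration_s


# Hypothetical example: two overlapping annotations and one separate episode
# during a condition that took 40 s to complete.
episodes = [(2.0, 6.0), (5.0, 8.0), (20.0, 24.0)]
print(total_time_frozen(episodes))          # 10.0
print(percent_time_frozen(episodes, 40.0))  # 25.0
```

Merging overlaps is a design choice for annotations that may overlap, such as when two annotation passes are pooled. If annotations are already disjoint, it has no effect on the result.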

Following the example of the TUG, the next step in the development of the FOG-provoking test could be to instrument the task with wearable devices and body-fixed sensors. Analysis of signals such as acceleration or pressure during the test could provide further information about the timing of the sub-tasks that make up the test (i.e., gait initiation, walking, and turning). It could also provide additional quantitative gait and balance measures (e.g., of sit-to-stand and stand-to-sit transitions). We speculate that this approach may yield sensitive markers for the detection and characterization of FOG episodes and further enhance the test’s utility.

Meanwhile, simply measuring the test duration with a stopwatch can already be applied in home and clinical environments. The required equipment is inexpensive and easy to use, and no special expertise is needed. Test duration can provide immediate feedback on freezing severity and a clear interpretation of a patient’s performance. These initial results suggest that the minimal extra effort required to time a FOG-provoking test enhances its utility and sensitivity.