1 Introduction

Parkinson’s disease (PD) is a chronic, progressive, neurodegenerative disorder [2, 10, 14, 16, 30] that has been associated with a wide range of motor and non-motor symptoms. The disorder was first described by James Parkinson in 1817 [26]. It affects movement and is typically characterized by a loss of (motor) function, increased slowness and rigidity. The cause and origin of PD remain unknown [9, 14, 17, 30], and the disease cannot be cured. Consequently, treatments aim at reducing the severity and frequency of motor complications. PD is generally associated with elderly people and is rarely diagnosed before the age of 40; the mean age of onset is estimated to be about 65 years [30].

PD is a great burden as it considerably decreases the quality of life, due to a gradual loss of function and decreasing ability to take care of oneself. The World Health Organization (WHO) considers the burden of PD to be on the same disability level as an amputated arm, drug dependency, congestive heart failure, deafness and tuberculosis [20]. The cardinal symptoms are bradykinesia, rigidity, tremor and postural instability [1, 10, 14, 16, 17, 30, 32]. However, a number of non-motor-related symptoms (e.g., sleep disturbances, depression, psychosis, autonomic and gastrointestinal dysfunction as well as dementia) may occur as well [10, 14, 16, 18, 30].

One of the motor symptoms is freezing of gait (FOG), also known as freezing or motor blocks. It is a form of akinesia that presents as an inability to initiate or continue gait [12, 16, 24, 30, 31]. Motor blocks are a common symptom among people with Parkinson’s (although they do not occur uniformly) and can affect various extremities (e.g., arms and legs) as well as the face [16]. Freezing greatly impairs the quality of life of those affected and is one of the most disabling symptoms. It is usually associated with the medium and advanced stages of PD and is a common cause of falls [6, 16, 30]. A single freezing episode typically lasts several seconds; in severe cases, episodes can last several minutes.

Continuous monitoring of FOG events can give neurologists information that is otherwise difficult to obtain. Clinical assessment of FOG at the doctor’s office is considered problematic since symptoms are commonly not evident in this clinical environment [25]. Thus, a wearable device capable of ambulatory monitoring of FOG could benefit patients in two ways. First, it could provide clinicians with complementary information about the disease that can be used to improve treatment [31]. Second, since patients can improve their gait in response to haptic, visual or auditory cues [19], real-time FOG detection, as in the system presented in [3], could allow patients to avoid some episodes and, consequently, falls. For these reasons, many studies have attempted to develop wearable devices for the detection of FOG.

The literature indicates that these studies typically use multiple sensors (e.g., accelerometers, gyroscopes) at various body locations and usually employ some form of supervised learning [e.g., a support vector machine (SVM) or a neural network (NN)]. Djurić-Jovičić et al. [11] achieved an error rate of up to 16 % in classifying “normal” (i.e., standing and regular steps) and pathological (i.e., festination, akinesia, shuffling and small steps) walking patterns of PD patients with an NN (using multiple inertial measurement units). The approach by Cole et al. [7] yielded 82.9 % sensitivity and 97.3 % specificity in detecting FOG (using acceleration and electromyography (EMG) sensors) with a multistage algorithm that combined a simple linear classifier and a dynamic neural network (DNN). Cole et al. employed data collected during unscripted and unconstrained activities in an apartment-like setting; however, no information is given on the activities that patients performed. The other works found in the literature used signals gathered during scripted activities. Niazmand et al. [23], for example, used an accelerometer-based smart garment [22] to extract gait-related features and achieved 88.3 % sensitivity and 85.3 % specificity (using multiple accelerometers). The approach by Bächlin et al. [5] yielded 73.1 % sensitivity and 81.6 % specificity for detecting FOG events in real time (using multiple accelerometers and gyroscopes).

In this work, the authors present a multistage approach based on an SVM and a single tri-axial accelerometer. Using a linear SVM kernel and the full feature set (see Table 5), an accuracy of 98.7 % and a geometric mean of sensitivity and specificity of 96.1 % have been achieved. The overall dataset includes signals from 20 PD patients, 8 of whom presented FOG episodes. It is divided into a training and a testing dataset, the latter including signals from two patients with FOG and three patients without FOG. These results were obtained with a patient-independent methodology. Furthermore, the algorithm can be configured toward a higher sensitivity or a higher specificity. The movement signals were collected for the database of the REMPARK project (Personal Health Device for the Remote and Autonomous Management of Parkinson’s Disease) [29]. This project aims to develop a closed-loop system that monitors PD motor and non-motor symptoms and responds to them in real time using a series of actuators. The inertial signals of the REMPARK database were collected in four countries (Spain, Italy, Israel and Ireland). Signals were recorded in patients’ homes during the execution of loosely scripted tasks (e.g., walking around the apartment and showing all rooms) that allowed patients to perform partially unconstrained activities.

2 Methods

Firstly, the data acquisition is described. Then, the methodology and model selection of the proposed approach are outlined.

2.1 Data acquisition and labeling

All participants (aged between 50 and 75 years) had a clinical diagnosis of idiopathic Parkinson’s disease according to the UK PD Society Brain Bank criteria [15]. All patients presented clinical fluctuations and a Hoehn and Yahr stage [13] above two (moderate to severe PD). All patients gave signed informed consent before participating, and the experimental protocol was approved by the corresponding local ethics review committee. For this paper, signals from 20 PD patients were used: 8 patients presented FOG episodes and 12 did not present the symptom. The recordings are identical to those employed by Rodríguez-Martín et al. [27, 28].

Table 1 The number of windows (before aggregation) in each dataset that are used for signifying FOG
Table 2 The full set of features used for FOG detection. In contrast, the reduced feature set is only comprised of a fast Fourier transform (i.e., index 1)

As part of the experiments, participants were recorded with an HD camera while wearing a set of sensors (i.e., accelerometer, gyroscope and magnetometer) and performing a set of scripted activities. These activities are, however, of a rather general nature (e.g., walking around the apartment and showing it to the researchers, or carrying a full glass of water from the kitchen to another room) and are much more variable than typical scripted activities such as hand-to-nose or similar gestures performed in a seated position. The recordings also include non-scripted activities that occurred during the sessions, such as answering the phone or similar unexpected situations, which in some cases led to FOG episodes due to turning or passing through narrow spaces. The experiments took place in the participant’s apartment and started in the morning. Two recording sessions took place over the course of the day: one in the “OFF” motor state and one in the “ON” motor state. For the first session, participants were asked to skip their morning dose of medication, so that recordings started while the participant was in a clinically defined OFF state [29]. After the first session, participants took their normal medication, and the second recording session started once the participant had reached a clinically defined ON state. During both sessions, participants performed a series of short controlled activities. In the OFF state, patients performed an indoor walking test, a FOG provocation test and a gait test. In the ON state, a dyskinesia test, a dual-task test and a set of activities of daily living (ADLs) were also performed. The ADLs included brushing teeth, shaking a deodorant, erasing with an eraser, writing with a pencil, typing on a computer keyboard, cleaning a window or furniture and drying a wet glass [29].

Experienced clinicians labeled the videos according to the activities that patients performed and the symptoms they showed. The clinicians who performed the labeling were also physically present during the recording sessions. Each clinical site (one per country) had two clinicians with several years of experience with PD patients (i.e., \(\ge\)5 years). Prior to the recording sessions, all clinicians received a training session on establishing baselines for labeling symptoms (including FOG). The group that performed the labeling is disjoint from the group that performed the analysis.

Video and inertial signals were synchronized following the procedure described in [29]. The FOG labels provided by clinicians were processed with an automatic labeling procedure in order to account for specific peculiarities of FOG. In this respect, it is important to note that the clinicians in charge of FOG labeling had been with the patients during the experimental protocol and therefore knew, before examining the video recordings, whether a patient presented FOG episodes. However, when labeling patients with FOG, clinicians may have missed some episodes, since at times the video camera was not close enough to reveal mild episodes. Accordingly, recordings of freezers were cut to the point where only FOG labels remained. This reduced the amount of data from freezers but ensured that no freezing episodes that might not have been properly labeled were used. Patients without any freezing episodes, on the other hand, were relabeled such that all of their available data could be used. Consequently, sensitivity was determined using data from patients with freezing episodes, whereas specificity was determined using data from non-freezing patients. Overall, this procedure allowed larger portions of the recordings to be used.

As far as the actual labeling is concerned, the presence of any type of freezing (e.g., start, turn, end) was considered to be an episode of FOG. The detection of individual types of freezing requires additional contextual information which is not contained within the database (DB). Furthermore, such a fine granularity might not provide an additional value (e.g., to a PD monitoring system). The fact that a freezing episode is happening is more relevant than the actual type of episode (e.g., for rhythmic cueing purposes). Consequently, freezing episodes are detected rather than individual types of freezing.

2.2 Methodology

In the general methodology, acceleration signals from a waist-mounted sensor are split into equally sized windows (i.e., a sliding window is applied to the time series). Features are extracted from these windows and fed to an SVM for training or classification. The classification outputs of n consecutive windows \(s_1,\ldots ,s_n\) are then aggregated over a time period t (in seconds) to achieve higher accuracies. However, the volatile nature of FOG must be considered when developing an algorithm for detecting such episodes. In contrast to resting tremor (or dyskinesia, for that matter), FOG episodes do not last for prolonged periods of time, which makes the choice of the window size ws particularly important. The contents of the database are split into two datasets: a training dataset, used for training the SVM and optimizing additional parameters, and a testing dataset, used for evaluation. The datasets are the same for all approaches (details are listed in Table 1) and hold 15 and 5 patients, respectively.
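The patient-independent split can be illustrated with a minimal Python sketch. The text does not state how the five test patients were chosen, so the random assignment, the seed and the names used below (`split_patients`, `windows_for`, patient identifiers such as `F0` and `N0`) are hypothetical. Only two points are taken from the text: each patient contributes to exactly one dataset, and the test set holds two freezers and three non-freezers.

```python
import random


def split_patients(freezers, non_freezers, n_test_freezers=2,
                   n_test_non_freezers=3, seed=0):
    """Assign whole patients to either the training or the testing dataset.

    Hypothetical random assignment: the original selection is not described.
    Keeping every patient in only one dataset is what makes the evaluation
    patient independent.
    """
    rng = random.Random(seed)
    test = (rng.sample(freezers, n_test_freezers)
            + rng.sample(non_freezers, n_test_non_freezers))
    train = [p for p in freezers + non_freezers if p not in test]
    return sorted(train), sorted(test)


def windows_for(patients, windows_by_patient):
    """Collect all windows belonging to the given patients."""
    return [w for p in patients for w in windows_by_patient.get(p, [])]


# 8 freezers and 12 non-freezers, as in the subset used in this work
freezers = [f"F{i}" for i in range(8)]
non_freezers = [f"N{i}" for i in range(12)]
train_ids, test_ids = split_patients(freezers, non_freezers)
print(len(train_ids), len(test_ids))  # 15 5
```

Splitting by patient rather than by window prevents windows from the same recording from appearing in both datasets, which would otherwise inflate the test scores.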

Two feature sets are evaluated: a reduced feature set containing only the fast Fourier transform (FFT) and a full feature set with various additional features (see Table 2). The effect of adding these features is quantified in Sect. 3. The additional features comprise the freezing index [4] as well as several frequency-related features for different frequency ranges [21].
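The frequency features can be sketched in Python with NumPy. Because the contents of Table 2 are not reproduced here, this is only an illustration under stated assumptions: the FFT feature is taken as the one-sided magnitude spectrum of a mean-removed window, and the freezing index uses the commonly cited bands, that is, power in a 3 to 8 Hz “freeze” band divided by power in a 0.5 to 3 Hz “locomotor” band. The exact bands used in [4] and the function names below are assumptions.

```python
import numpy as np

FS = 40.0  # sampling rate after resampling (Hz), as stated in the text


def _spectrum(window, fs=FS):
    """Frequencies and one-sided power spectrum of a mean-removed window."""
    x = np.asarray(window, dtype=float)
    x = x - x.mean()  # remove the gravity/DC component (assumption)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return freqs, power


def fft_features(window):
    """Reduced feature set (assumed form): one-sided FFT magnitude."""
    x = np.asarray(window, dtype=float)
    return np.abs(np.fft.rfft(x - x.mean()))


def band_power(window, lo, hi, fs=FS):
    """Spectral power in the band [lo, hi) Hz."""
    freqs, power = _spectrum(window, fs)
    return power[(freqs >= lo) & (freqs < hi)].sum()


def freezing_index(window, fs=FS):
    """Freeze-band (3-8 Hz) over locomotor-band (0.5-3 Hz) power (assumed bands)."""
    locomotor = band_power(window, 0.5, 3.0, fs)
    freeze = band_power(window, 3.0, 8.0, fs)
    return freeze / locomotor if locomotor > 0 else float("inf")


# Synthetic 128-sample windows (3.2 s at 40 Hz)
time_s = np.arange(128) / FS
walking = np.sin(2 * np.pi * 1.25 * time_s)   # energy in the locomotor band
trembling = np.sin(2 * np.pi * 5.0 * time_s)  # energy in the freeze band
print(freezing_index(walking) < 1 < freezing_index(trembling))  # True
```

The synthetic frequencies 1.25 Hz and 5 Hz fall exactly on FFT bins (resolution 40/128 = 0.3125 Hz), which keeps spectral leakage out of the toy example.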

First, varying window sizes ws were evaluated in order to optimize FOG detection. Window sizes were compared at the episode level rather than at the window level. A FOG episode was considered detected when at least one window within the actual episode was classified as FOG. For non-freezing periods, windows were aggregated over a period equal to the average length of a FOG episode plus twice its standard deviation. The acceleration data are resampled to 40 Hz and split into equally sized chunks \(s_1, \ldots , s_m\) of length ws that overlap by 50 %. Features are then extracted from these windows and fed to an SVM for training and classification. This constitutes the first, naive approach, where \(\mathrm{freezing}_{j}^{1}\) denotes whether FOG is present in the \(j\mathrm{th}\) window of the series \(s_1, \ldots , s_m\).
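The windowing and the episode-level evaluation described above can be sketched as follows. The 40 Hz rate, the 50 % overlap and the aggregation period (mean FOG duration plus two standard deviations, using the values reported in Sect. 3) come from the text. Dropping incomplete trailing windows and reading “a window within an actual FOG episode” as “a window overlapping the episode” are assumptions, and the function names are hypothetical.

```python
import numpy as np

FS = 40  # Hz, sampling rate after resampling


def sliding_windows(signal, ws=128, overlap=0.5):
    """Split a 1-D signal into windows of ws samples overlapping by `overlap`.

    Returns the windows (one per row) and their start indices in samples.
    Trailing samples that do not fill a complete window are dropped (assumption).
    """
    x = np.asarray(signal, dtype=float)
    step = int(ws * (1 - overlap))  # 64 samples for ws = 128 and 50 % overlap
    if x.size < ws:
        return np.empty((0, ws)), np.empty(0, dtype=int)
    starts = np.arange(0, x.size - ws + 1, step)
    return np.stack([x[s:s + ws] for s in starts]), starts


def fog_episode_detected(predictions, starts, ws, ep_start, ep_end):
    """Episode-level rule: an annotated FOG episode [ep_start, ep_end) (samples)
    counts as detected if at least one window overlapping it is predicted as FOG.
    Treating 'within the episode' as 'overlapping the episode' is an assumption."""
    return any(
        pred == 1 and s < ep_end and s + ws > ep_start
        for pred, s in zip(predictions, starts)
    )


# Aggregation period for non-freezing data: mean FOG length + 2 SD (Sect. 3)
NON_FOG_PERIOD_S = 3.48 + 2 * 3.29  # about 10.06 s

windows, starts = sliding_windows(np.zeros(1000))
print(windows.shape)  # (14, 128)
```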

$$\begin{aligned} \mathrm{freezing}_{j}^{1} = \left\{ \begin{array}{lll} 0 &{} \text {no freezing} &{} \text {if} \ \ f_{{SVM}} \le 0\\ 1 &{} \text {freezing} &{} \text {if} \ \ f_{{SVM}} > 0 \end{array} \right. \end{aligned}$$
(1)

where \(f_{SVM}= \sum _{i=1}^l y_i \alpha _i K({\mathbf {x}}_i, \mathbf {f}) + b\), \({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_l\) are the support vectors (SVs), \(y_i\) and \(\alpha _i\) are the label and Lagrange multiplier of each SV, \(K\) is the kernel function, \(\mathbf {f}\) is the feature vector extracted from the window and b is the bias [8]. The superscript (here: 1) indicates the variation; the second and third variations use 2 and 3, respectively.
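Eq. (1) can be made concrete with a small Python sketch that evaluates \(f_{SVM}\) directly from its definition for a linear and an RBF kernel. The support vectors, labels, multipliers and bias below are hypothetical toy values, not a trained model; in practice an SVM library would supply them after training.

```python
import numpy as np


def linear_kernel(x, z):
    """K(x, z) = <x, z>."""
    return float(np.dot(x, z))


def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    d = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(d, d)))


def svm_decision(f, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """f_SVM = sum_i y_i * alpha_i * K(x_i, f) + b for a feature vector f."""
    return sum(y * a * kernel(x, f)
               for x, y, a in zip(support_vectors, labels, alphas)) + b


def freezing_v1(f_svm):
    """Eq. (1): 1 (freezing) if f_SVM > 0, otherwise 0 (no freezing)."""
    return 1 if f_svm > 0 else 0


# Toy model with two hypothetical support vectors
svs = [[1.0, 0.0], [0.0, 1.0]]
ys, alphas, b = [1, -1], [0.5, 0.5], 0.0
print(freezing_v1(svm_decision([2.0, 0.0], svs, ys, alphas, b)))  # 1
print(freezing_v1(svm_decision([0.0, 2.0], svs, ys, alphas, b)))  # 0
```

Note that a decision value of exactly zero maps to “no freezing”, matching the \(\le 0\) condition in Eq. (1).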

The second variation aggregates the SVM outputs over a time period t and calculates a degree of confidence \(c_j\). If the confidence value reaches or exceeds a threshold th, the aggregated time frame t is considered an episode of FOG; otherwise, it is not. Here, \(\mathrm{freezing}_{j}^{2}\) covers a time frame t (starting at the \(j\mathrm{th}\) window and covering n windows) and determines whether FOG is present in that time frame.

$$\begin{aligned} t= & {} \frac{ws(n + 1)}{40 * 2} \end{aligned}$$
(2)
$$\begin{aligned} c_j= & {} \sum ^{j+n-1}_{i=j}{\frac{\mathrm{freezing}_{i}^{1}}{n}} \end{aligned}$$
(3)
$$\begin{aligned} \mathrm{freezing}_{j}^{2}= & {} \left\{ \begin{array}{lll} 0 &{} \text {no freezing} &{} \text {if} \ \ c_j < th\\ 1 &{} \text {freezing} &{} \text {if} \ \ c_j \ge th \end{array} \right. \end{aligned}$$
(4)

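where \(c_j, th \in [0, 1]; n, j \in \mathbb {N}; n, j > 0; t \in {\mathbb {R}}^{+}\). As previously described, n is the number of windows aggregated to span a period of t seconds. Eq. (2) follows from the 50 % overlap: n consecutive windows of ws samples span \(ws + (n-1)\,ws/2 = ws(n+1)/2\) samples, which at 40 Hz corresponds to \(ws(n+1)/80\) s.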
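Eqs. (2) to (4) translate directly into a short Python sketch. The text lists t in seconds but does not state how a given t is converted to an integer n (for ws = 128, t = 60 s gives n = 36.5 from Eq. (2)); the sketch assumes the smallest n whose span reaches t. Indices are 0-based here, whereas Eq. (3) counts from 1, and all function names are hypothetical.

```python
import math

FS = 40  # Hz


def aggregation_period(ws, n):
    """Eq. (2): seconds spanned by n windows of ws samples overlapping by 50 %."""
    return ws * (n + 1) / (FS * 2)


def windows_for_period(t, ws):
    """Smallest n whose span reaches at least t seconds (rounding rule assumed)."""
    return max(1, math.ceil(FS * 2 * t / ws - 1))


def confidence(v1_outputs, j, n):
    """Eq. (3) with 0-based j: fraction of windows j..j+n-1 classified as freezing."""
    if j < 0 or j + n > len(v1_outputs):
        raise ValueError("aggregation frame exceeds the available windows")
    return sum(v1_outputs[j:j + n]) / n


def freezing_v2(v1_outputs, j, n, th):
    """Eq. (4): 1 (freezing) if c_j >= th, otherwise 0."""
    return 1 if confidence(v1_outputs, j, n) >= th else 0


n = windows_for_period(60, 128)
print(n, aggregation_period(128, n))  # 37 60.8
```

With th = 0.5, a frame is flagged as freezing when at least half of its windows were classified as FOG by the naive approach.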

The third variation introduces a second threshold. The lower threshold \(th_l\) and upper threshold \(th_u\) can be used to tune sensitivity and specificity separately: \(th_l\) sets the confidence value below which a period is labeled “no freezing,” and \(th_u\) sets the minimum confidence value for freezing episodes. Because the two thresholds are not required to be equal (equal thresholds would reduce to variation two), the output of the algorithm may be “freezing,” “no freezing” or “undefined”; the latter occurs when the confidence value lies between the two thresholds. Consequently, some aggregated windows are discarded and data usage is lowered.

$$\begin{aligned} \mathrm{freezing}_{j}^{3} =\left\{ \begin{array}{lll} 0 &{} \text {no freezing} &{} \text {if} \ \ c_j < th_l\\ 1 &{} \text {freezing} &{} \text {if} \ \ c_j \ge th_u\\ -1 &{} \text {undefined} &{} \text {if} \ \ th_l \le c_j < th_u \end{array} \right. \end{aligned}$$
(5)

where \(c_j, th_l, th_u \in [0, 1]; th_l \le th_u; j \in \mathbb {N}; j > 0\)
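The two-sided rule of Eq. (5) and its effect on data usage can be sketched in Python. Data usage is not defined formally in the text; the sketch assumes it is the fraction of aggregated frames that receive a defined decision (i.e., are not “undefined”). The function names are hypothetical.

```python
def freezing_v3(c, th_l, th_u):
    """Eq. (5): 1 = freezing, 0 = no freezing, -1 = undefined."""
    if not 0.0 <= th_l <= th_u <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= th_l <= th_u <= 1")
    if c >= th_u:
        return 1
    if c < th_l:
        return 0
    return -1  # th_l <= c < th_u


def data_usage(decisions):
    """Fraction of aggregated frames with a defined decision (assumed definition)."""
    if not decisions:
        return 0.0
    return sum(d != -1 for d in decisions) / len(decisions)


confidences = [0.1, 0.45, 0.55, 0.9]
two_sided = [freezing_v3(c, 0.4, 0.6) for c in confidences]
print(two_sided, data_usage(two_sided))  # [0, -1, -1, 1] 0.5
```

Setting \(th_l = th_u\) removes the undefined band, so every frame receives a decision and the rule coincides with Eq. (4).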

2.3 Model Selection

The individual SVM models are trained with the features that were extracted from the training dataset. For the second and third variation, the individual parameters t, th, \(th_l\) and \(th_u\) are also optimized on the training dataset. The final results are obtained from the testing dataset.

The window size ws is determined before any of these parameters are evaluated. For each of the proposed window sizes ws (see below), the naive algorithm is applied to the training dataset. The window size that yields the best combination of accuracy and geometric mean is chosen.

During training, various settings for the kernel, class weighting, cost and gamma were considered. The weighting parameters were used to balance the two classes “FOG” and “non-freezing.” The cost and gamma parameters were evaluated systematically (i.e., \(10^{q}, q \in \{-3,-2,\ldots ,2, 3\}\)) depending on the chosen kernel (i.e., radial basis function (RBF) or linear kernel). Additionally, tenfold cross-validation is performed on the training dataset. However, instead of the average accuracy across folds, the geometric mean of sensitivity and specificity (i.e., \(\sqrt{\mathrm{sensitivity}\cdot \mathrm{specificity}}\)) is used to identify parameter combinations with both high sensitivity and high specificity. The parameters with the maximum geometric mean are selected to obtain the final SVM model, which is then applied to the test dataset. The geometric mean was chosen because it treats both classes equally, whereas accuracy implicitly weights each class by its prior, which can be a problem when the classes have (very) different priors.
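The selection criterion can be sketched in Python. The parameter grid \(10^q, q \in \{-3, \ldots, 3\}\) is taken from the text; `cv_geometric_mean` is a hypothetical callback standing in for the tenfold cross-validation, which is not implemented here, and the class-weighting step is omitted. The final lines illustrate why accuracy is a poor criterion under imbalanced priors.

```python
import itertools
import math


def scores(y_true, y_pred):
    """Sensitivity, specificity, accuracy and geometric mean (1 = FOG, 0 = no freezing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(pairs)
    return sens, spec, acc, math.sqrt(sens * spec)


# Cost and gamma candidates 10^q, q in {-3, ..., 3}; gamma only matters for RBF
VALUES = [10.0 ** q for q in range(-3, 4)]
GRID = {
    "linear": [(c, None) for c in VALUES],
    "rbf": list(itertools.product(VALUES, VALUES)),
}


def select_params(kernel, cv_geometric_mean):
    """Return the (cost, gamma) pair with the highest cross-validated geometric mean.

    `cv_geometric_mean(kernel, cost, gamma)` is a hypothetical callback that would
    run the tenfold cross-validation and return the geometric mean."""
    return max(GRID[kernel], key=lambda p: cv_geometric_mean(kernel, *p))


# With 10 % FOG windows, always answering "no freezing" scores 90 % accuracy
# but a geometric mean of 0.
y_true = [1] * 10 + [0] * 90
print(scores(y_true, [0] * 100))  # (0.0, 1.0, 0.9, 0.0)
```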

The following discrete values were evaluated: \(ws \in \{2^5, 2^6, 2^7, 2^8\}\) samples; \(t \in \{10, 15, 20, 25, 30, 45, 60\}\) s; \(th, th_l \in \{0, 0.05, 0.1, \ldots , 0.95, 1.0\}\); and \(th_u \in \{th_u \in \{0, 0.05, 0.1, \ldots , 0.95, 1\} \mid th_u \ge th_l\}\). The values and parameters were evaluated for each of the four conditions (two kernels and two feature sets).
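The size of this search space follows from simple counting, sketched here in Python with the values listed above. Per condition, variation two has 7 × 21 = 147 candidate (t, th) pairs, while the constraint \(th_u \ge th_l\) leaves 21 · 22 / 2 = 231 threshold pairs and hence 7 × 231 = 1617 candidates for variation three. The variable names are illustrative only.

```python
ws_values = [2 ** k for k in (5, 6, 7, 8)]           # 32, 64, 128, 256 samples
t_values = [10, 15, 20, 25, 30, 45, 60]              # aggregation periods in s
th_values = [round(0.05 * i, 2) for i in range(21)]  # 0, 0.05, ..., 1.0

# Variation 2: one threshold per aggregation period
grid_v2 = [(t, th) for t in t_values for th in th_values]
# Variation 3: every threshold pair with th_u >= th_l
th_pairs = [(lo, hi) for lo in th_values for hi in th_values if hi >= lo]
grid_v3 = [(t, lo, hi) for t in t_values for lo, hi in th_pairs]

print(len(ws_values), len(grid_v2), len(th_pairs), len(grid_v3))  # 4 147 231 1617
```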

3 Results

The average length of a FOG episode in our dataset was 3.48 [\(\pm\) 3.29] s (total: 209 freezing events). Figure 1 shows several measures for varying window sizes (i.e., sensitivity, specificity, geometric mean and accuracy). The best values for those measures were achieved with a window size of 128 samples (i.e., \(2^7\) samples). Accuracy and geometric mean were closest at this level. Consequently, this window size was utilized in all further analyses.

Fig. 1 Results of an evaluation for varying window sizes with respect to freezing episodes. For each window size, an SVM has been trained on the training dataset and evaluated on the test dataset

Table 3 presents the results obtained for the first variation. On the training dataset, the full and reduced feature sets yield a similar geometric mean regardless of the SVM kernel. On the test dataset, however, the results diverge: the RBF kernel appears to benefit from the reduced feature set, while the linear kernel favors the full feature set. Acceptable levels of specificity are consistently achieved on the test dataset, whereas sensitivity is reduced by false negatives (FNs). The latter may be counteracted by aggregating windows. Nonetheless, accuracies above 90 % were consistently reached.

The impact of window aggregation t and threshold th is highlighted in Fig. 2 for all four conditions. The subfigures indicate that a threshold close to 50 % works best in all cases. Furthermore, it is observed that the geometric mean increases with the aggregation level.

Numerical results for the second variation are shown in Table 4. All conditions yielded a threshold close to the intuitive border of 50 %, which is consistent with the observations in Fig. 2. Moreover, the aggregation period t is the same across all four conditions. With t and th optimized on the training dataset, the results on the testing dataset show an average increase of 9.4 %. All conditions achieve a high specificity of 98 % or greater, and most conditions also reach a sensitivity of 90 % or above for an aggregation period of 60 s.

Results of the third variation are listed in Table 5. Most conditions still favor an aggregation period of 60 s. The lower and upper thresholds (i.e., \(th_l\) and \(th_u\)) were consistently found to enclose the thresholds th found for the second approach (see Table 4). Allowing two thresholds increased sensitivity and specificity on the test dataset for the linear kernel. The RBF kernel, however, did not benefit from this approach in terms of geometric mean. The average change in geometric mean from variation two to three was \(-\)1.2 % and 3.7 % for the RBF and linear kernel, respectively. Nonetheless, all conditions yield a sensitivity of roughly 90 % and a specificity well above 90 %. This came at the cost of slightly reduced data usage, although it remained above 90 % for the most part.

Table 3 Results in signifying FOG with the naive approach (i.e., variation 1). Various measures are listed for both datasets
Fig. 2 Effect of window aggregation t and threshold th on geometric mean. The results are shown for all four conditions: a RBF kernel with full feature set; b RBF kernel with reduced feature set; c linear kernel with full feature set; d linear kernel with reduced feature set

Table 4 Results in signifying FOG with the one-sided approach (i.e., variation 2)
Table 5 Results in detecting FOG with the two-sided approach (i.e., variation 3)

4 Discussion

The presented FOG detection methods yield a geometric mean of 88.7, 96.1 and 96.1 % for the three proposed approaches, respectively (linear kernel with full feature set). Thus, the meta-analysis used in the second and third variations enables a better recognition of FOG episodes, since it improves the overall performance (geometric mean) by about 8 % (from 88.7 to 96.1 %). Regarding the feasibility of detecting FOG solely by means of frequency features, a geometric mean of 96.1 % (one-sided approach with RBF kernel) was achieved with them. It is therefore concluded that frequency features enable reliable monitoring of FOG.

The results previously obtained by Niazmand et al. [23], Cole et al. [7] and Bächlin et al. [5] were consistently lower than those obtained by the presented approach in its third variation, which yielded an average sensitivity and specificity above 94 %. Niazmand et al. [23] achieved a sensitivity of 88.3 % and a specificity of 85.3 %. Bächlin et al. [5] reported a sensitivity (73.1 %) and specificity (81.6 %) much lower than those reported in this paper. Cole et al. [7] achieved a similar level of specificity (97.3 %) but a considerably lower sensitivity (82.9 %), although their signals were collected during completely unconstrained activities.

One limitation of the presented work lies in its applicability to real-time detection for providing rhythmic cues. In this case, a short lag between the onset of a FOG episode and its detection is desired. The meta-analysis in the second and third variations may add a delay of up to the aggregation time (60 s), which would prevent their use for this particular purpose, although they remain useful for monitoring tasks. The first variation, however, could be employed, since its lag is roughly 3.2 s (128 samples at 40 Hz). A second limitation, with respect to the work of Cole et al. [7], is that the signals employed in this work were not acquired in a completely unconstrained setting. Consequently, performance may decrease with new activities in the daily life of patients. However, the REMPARK database also includes signals recorded under such conditions. In the future, these signals will be analyzed to determine the applicability and performance of the presented methodology under these conditions.

Besides the performance comparison, the proposed approach has the advantage over [5] of being patient independent, since the same classifier can be used for any patient. Moreover, only a single tri-axial accelerometer at the waist is used, whereas Cole et al. [7] used three tri-axial accelerometers and surface EMG, Niazmand et al. [23] five accelerometers and Bächlin et al. [5] three accelerometers and three gyroscopes. The presented approach also offers configurability, since the algorithm can be tuned toward either high sensitivity or high specificity by adjusting the thresholds. Finally, the optimal window size was determined by evaluating the performance of the algorithm at the episode level, as opposed to the window level used in previous works, which may have increased specificity.

5 Conclusion

This work has evaluated three approaches to detecting FOG in Parkinson’s patients based on a waist-worn accelerometer. The optimal window size was determined, and it has been analyzed whether frequency features are sufficient to reliably detect FOG.

Although the linear and RBF kernels do not benefit equally from the third approach, combining the results from both variations (i.e., the second and third) shows promising results. While the RBF kernel achieved a geometric mean greater than 95 % and an accuracy greater than 98 % with the second approach, the linear kernel reached similar levels (close to 95 % geometric mean and 98 % accuracy) with the third approach, although at the cost of slightly reduced data usage. The findings suggest that the full feature set is not required for satisfactory results; instead, a linear kernel trained on the FFT alone can accurately detect FOG episodes. Finally, the optimal window size was found to be 128 samples (at 40 Hz).

In comparison with the previous approaches, the method presented in this work obtained higher performance metrics than those previously reported; it should be noted, however, that the conditions of each study differ. In particular, our study lacks completely unconstrained activities, which may decrease the method’s performance during the daily activities of PD patients. The REMPARK database also includes signals recorded under such conditions, and these will be analyzed in the future to determine the applicability and performance of the presented methodology in that setting. On the other hand, the present approach has the advantage of being patient independent and requiring only a single tri-axial accelerometer.