1 Introduction

Neurodegenerative diseases (NDs) are incurable and debilitating conditions caused by the progressive degeneration of nerve cells, affecting movement and/or mental skills. Alzheimer's disease (AD) is the most common among them and, because of the worldwide increase in life expectancy, its incidence is expected to rise dramatically in the coming decades.

AD produces a slow and progressive decline in mental functions such as memory, thought, judgment, and learning abilities. The predominant symptom in the early stages of AD is episodic memory impairment, whereas later stages are characterized by progressive amnesia and deterioration in other cognitive domains.

Unfortunately, there is no cure for AD, but its symptoms can be managed as the disease progresses. This creates a critical need to improve the approaches currently used to diagnose it as early as possible. Since both cognitive and motor functions are involved in the planning and execution of movements, and since handwriting requires precise and properly coordinated control of the body [20], the analysis of handwriting dynamics might provide a cheap and non-invasive method for evaluating disease progression [10]. Furthermore, the application of machine learning methods to motor function has shown promise in decreasing the time taken to perform clinical assessments [1, 12]. To this aim, cheap and widely available graphic tablets can be used to administer handwriting tests, which include simple and easy-to-perform handwriting/drawing tasks [10], and to record kinematic and dynamic information about the performed movements. For this reason, researchers are showing an increasing interest in developing machine-learning-based methodologies to support both the diagnosis and the treatment of NDs, and several methods have been proposed for the diagnosis of AD [24].

The Kinematic Theory of rapid movements, together with the Sigma-Lognormal model, allows the decomposition of a complex movement into a vector summation of simple time-overlapped movements [15,16,17]. This theory has been applied in several fields to model numerous human movements, such as handwriting [14], speech [2], and head and trunk movements [11]. However, it has rarely been applied to the detection and monitoring of neuromuscular disorders [13, 19]. Specifically, this model has been used to classify parkinsonian patients in [8, 9], where the authors obtained competitive performance by combining it with other velocity-based features such as the Maxwell-Boltzmann distribution and the Fourier and cepstrum transforms.

In this paper, we present the results of a preliminary study aimed at exploring the use of lognormal features to classify patients affected by AD on the basis of their ability to accomplish six handwriting tasks. These tasks were introduced in [4] and are described later in this paper. We collected the data produced by 174 participants (89 AD patients and 85 healthy people). To the best of our knowledge, this is the largest dataset containing handwriting data related to AD. Starting from the lognormal parameters computed to represent the handwriting contained in these data, we identified fourteen features that can be used to characterize the handwriting of people affected by AD, and we assessed their effectiveness using seven well-known and widely used classifiers. The promising results achieved confirm that lognormal features can be used to support AD diagnosis.

The organization of this paper is as follows. Section 2 describes the Sigma-Lognormal model used for the representation of handwriting. In Sect. 3 we present the tasks used to collect handwriting data and the features extracted using the Sigma-Lognormal model. Section 4 details the experimental results. Concluding remarks and possible future investigations are outlined in Sect. 5.

2 The Sigma-Lognormal Model

Based on lognormal movement decomposition, several studies have established normative ranges of variation for the lognormal parameters, which give a notion of how ideal a movement could be [18]. To parametrize human movement velocity and trajectory according to the Kinematic Theory, different algorithms have been developed, such as Robust XZERO [5, 14] and IDeLog [6]. In this work we rely on the IDeLog algorithm [6].

The Sigma-Lognormal model represents the velocity of each simple, fast movement primitive as a lognormal function (\(\varLambda \)): each velocity peak between two speed minima is modeled by a lognormal. The lognormal parameters \(t_{0_j} \), \(\mu _j\) and \(\sigma _j^2\) are estimated by iteratively minimizing the error between the observed velocity profile and the reconstructed lognormals, and between the original trajectory and the reconstructed one. The lognormal function that models each velocity peak, or “simple movement” or “stroke”, is defined as:

$$\begin{aligned} v_j (t;t_{0_j},\mu _j,\sigma _j^2 )= D_j \varLambda (t;t_{0_j},\mu _j,\sigma _j^2 )=\frac{D_j}{\sigma _j\sqrt{2\pi }\,(t-t_{0_j})}\exp \left\{ -\frac{[\ln (t-t_{0_j})-\mu _j]^2}{2\sigma _j^2}\right\} \end{aligned}$$
(1)

where t is the time basis, \(D_j\) the amplitude, \(t_{0_j}\) the time of occurrence, \(\mu _j\) the time delay and \(\sigma _j\) the response time, the latter two on a logarithmic time scale.

In the case of a complex movement, i.e., a succession of simple movements or strokes as can be observed in Fig. 1, the velocity profile \(v_n(t)\) is given by the time superposition of the M lognormals:

$$\begin{aligned} v_n(t)=\sum _{j=1}^{M}v_j (t;t_{0_j},\mu _j,\sigma _j^2)=\sum _{j=1}^{M}D_j \begin{bmatrix} \cos (\varPhi _j (t))\\ \sin (\varPhi _j(t)) \end{bmatrix} \varLambda (t;t_{0_j},\mu _j,\sigma _j^2 ) \end{aligned}$$
(2)

where \(\varPhi _j(t)\) is the angular position given by:

$$\begin{aligned} \varPhi _j (t)=\varTheta _{s_j}+\frac{\varTheta _{e_j}-\varTheta _{s_j}}{2}\left[ 1+\mathrm {erf}\left( \frac{\ln (t-t_{0_j})-\mu _j}{\sigma _j\sqrt{2}}\right) \right] \end{aligned}$$
(3)

where \(\varTheta _{s_j}\) and \(\varTheta _{e_j}\) are the starting and ending angular directions of the \(j\)-th simple movement or stroke.
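For concreteness, the following minimal NumPy/SciPy sketch synthesizes a velocity profile from a given set of stroke parameters according to Eqs. (1)-(3). It covers only the synthesis direction; the parameter extraction itself (performed in this work by IDeLog [6]) is a considerably more involved optimization and is not reproduced. Function and variable names are ours, not part of any reference implementation.

```python
import numpy as np
from scipy.special import erf

def lognormal_velocity(t, D, t0, mu, sigma):
    """Eq. (1): speed profile of a single stroke, D * Lambda(t; t0, mu, sigma^2)."""
    v = np.zeros_like(t)
    m = t > t0                      # the lognormal is defined only for t > t0
    dt = t[m] - t0
    v[m] = D / (sigma * np.sqrt(2 * np.pi) * dt) * np.exp(
        -((np.log(dt) - mu) ** 2) / (2 * sigma ** 2))
    return v

def angular_position(t, t0, mu, sigma, theta_s, theta_e):
    """Eq. (3): direction of the stroke as it unfolds in time."""
    phi = np.full_like(t, theta_s)
    m = t > t0
    phi[m] += (theta_e - theta_s) / 2 * (
        1 + erf((np.log(t[m] - t0) - mu) / (sigma * np.sqrt(2))))
    return phi

def sigma_lognormal_velocity(t, strokes):
    """Eq. (2): vector sum of M time-overlapped lognormal strokes.

    `strokes` is an iterable of tuples (D, t0, mu, sigma, theta_s, theta_e);
    returns an array of shape (2, len(t)) with the x and y velocity components.
    """
    vx, vy = np.zeros_like(t), np.zeros_like(t)
    for D, t0, mu, sigma, theta_s, theta_e in strokes:
        speed = lognormal_velocity(t, D, t0, mu, sigma)
        phi = angular_position(t, t0, mu, sigma, theta_s, theta_e)
        vx += speed * np.cos(phi)
        vy += speed * np.sin(phi)
    return np.vstack([vx, vy])
```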

Fig. 1. An example of a lognormal.

3 Tasks and Features

The following subsections detail the data collection procedure, the tasks used, and the features extracted.

3.1 Data Collection

Alterations in handwriting are known to be among the first signs of AD, which is why the data acquisition step of this work focused on the recording and collection of handwriting samples. These samples come from the execution of a protocol [4] composed of different kinds of handwriting tasks. Every participant executed all the tasks with a special pen on A4 paper sheets fixed to a graphic tablet, which records the handwriting as x-y-z coordinates for each point, acquired at a constant sampling rate of 200 Hz. The first two coordinates are spatial and represent the point position on the two-dimensional writing surface, while the third is a measure of the pressure exerted by the subject at that point. The pressure assumes positive values when the pen is on the sheet and a null value when it is lifted, up to a maximum distance of 3 cm from the sheet, beyond which the system cannot receive information. The protocol was administered to a group of 174 participants: 89 patients at the first stages of AD and a control group of 85 people. Both the AD patients and the control group were recruited with the support of the geriatrics department, Alzheimer's unit, of the “Federico II” hospital in Naples, and were selected according to recruiting criteria based on standard clinical tests, such as the Mini-Mental State Examination (MMSE), the Frontal Assessment Battery (FAB) and the Montreal Cognitive Assessment (MoCA).
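As an illustration, a few lines of Python suffice to load such a recording and separate on-paper from in-air points using the pressure channel. The file name and column layout below are hypothetical, since the storage format of the acquisition tool is not specified here.

```python
import numpy as np

FS = 200  # tablet sampling rate (Hz), as stated above

# Hypothetical layout: one row per sampled point, columns x, y, z (pressure).
samples = np.loadtxt("task3_participant042.csv", delimiter=",")
x, y, z = samples.T

on_paper = z > 0       # positive pressure: pen touching the sheet
in_air = ~on_paper     # null pressure: pen lifted (up to ~3 cm above the sheet)

t = np.arange(len(x)) / FS  # uniform time axis implied by the 200 Hz sampling
```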

3.2 Tasks

In this study we considered only the handwriting samples relative to six tasks of the protocol:

  1. Join two points with a vertical line, continuously, four times. The up-down vertical movements involve the finger and wrist joints. This task is useful to investigate elementary motor functions [27];

  2. Trace a circle, continuously, four times. The circle diameter has to be 6 cm. This task allows testing the automaticity of movements and the regularity and coordination of the sequence of movements [21];

  3. Write the bigram ‘le’ in cursive, continuously, four times. These letters allow testing the alternation of motion control;

  4. Copy in reverse order a simple Italian word: “bottiglia” (“bottle” in English). This task was inspired by the MMSE test, in which one of the tasks requires people to spell a word backward;

  5. Write a telephone number (10 digits) under dictation. The hypothesis underlying the introduction of this task is that motor planning in writing a telephone number is different from that in writing a word;

  6. Perform the Clock Drawing Test (CDT). In [26] the authors found that the CDT shows a high sensitivity for mild AD.

The first two tasks belong to the category of graphic tasks, whose objective is to test the patient's ability in: (i) writing elementary traits; (ii) joining some points; (iii) drawing figures (simple or complex and scaled in various dimensions). The third and the fourth tasks are copy and reverse-copy tasks, whose objective is to test the person's ability to repeat complex graphic gestures that have a semantic meaning, such as letters, words and numbers (of different lengths and with different spatial organizations). The fifth is a dictation task, whose purpose is to investigate how the writing varies (with phrases or numbers) when the use of working memory is necessary throughout the execution. The sixth task is a graphic task whose purpose is to test not only the dynamic abilities of a person, but also their cognitive skills, spatial dysfunction and lack of attention. This test requires verbal understanding, memory and spatially coded knowledge in addition to constructive skills [25].

3.3 Lognormal Features

The feature engineering process allowed us to identify a set of features that, according to our domain knowledge, were good candidates for discriminating the handwriting of people affected by AD from that of healthy people. The Sigma-Lognormal model, defined in Sect. 2, was applied to the data acquired as stated in Sect. 3.1. The result of this procedure was the decomposition of each task into a vector summation of simple time-overlapped movements, from which it was possible to extract a set of Sigma-Lognormal parameters \(P_j=[D_j, t_{0_j}, \mu _j, \sigma _j, \varTheta _{s_j}, \varTheta _{e_j}]\). In particular, for every point (x, y) acquired during the execution of the tasks, one or more overlapping lognormals were found, so their parameters and their percentage of contribution were stored for every point. The term “first lognormal” refers to the lognormal that contributes most to a certain point. Once the Sigma-Lognormal parameters were obtained for every task and every participant, it was possible to compute a set of fourteen features, described in Table 1.

Table 1. Summary of computed features

The aim of this procedure is to use those computed features to distinguish between patients and healthy controls, the two groups of participants involved. From this section on, these features will be referred to as “Lognormal Features”.
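Since Table 1 is not reproduced here, the snippet below only illustrates the kind of aggregation involved: given the per-stroke parameters extracted for one task, it computes a handful of plausible summary statistics. These example features and their names are ours; they are not the actual fourteen features listed in Table 1.

```python
import numpy as np

def example_lognormal_features(strokes, first_share):
    """Illustrative aggregates over per-stroke Sigma-Lognormal parameters.

    `strokes` is an (M, 6) array with columns [D, t0, mu, sigma, theta_s,
    theta_e]; `first_share` holds, for every acquired point, the percentage
    of contribution of its first (dominant) lognormal.
    """
    D, mu, sigma = strokes[:, 0], strokes[:, 2], strokes[:, 3]
    return {
        "n_strokes": len(strokes),          # lognormals needed for the task
        "D_mean": D.mean(),                 # average stroke amplitude
        "D_std": D.std(),
        "mu_mean": mu.mean(),               # average time delay
        "sigma_mean": sigma.mean(),         # average response time
        "first_share_mean": np.mean(first_share),
    }
```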

4 Experimental Results

This section presents the results obtained by applying several classification approaches according to the input data. Specifically, lognormal features are classified through seven well-known ML algorithms, while RGB images are used to feed three different kinds of CNNs.

We used the lognormal features (see Sect. 3.3) with standard machine learning algorithms: k-Nearest Neighbors (KNN), Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), Gradient Boosting (GB), and XGBoost (XGB). We used the scikit-learn library, leaving the hyperparameters at their default values. The only exceptions are the SVM classifier, for which we used a linear kernel, and the KNN classifier, for which the number of neighbors was set to 3. In order to obtain statistically significant results, we performed 30 runs for each classifier. For each run, the dataset was randomly shuffled and a 5-fold cross-validation strategy was adopted. To evaluate the performance of the mentioned models we considered the following metrics: accuracy (acc), sensitivity (True Positive Rate, TPR), specificity (True Negative Rate, TNR), precision, False Negative Rate (FNR), and Area Under the Curve (AUC).
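A sketch of this evaluation protocol is given below, assuming the lognormal features of one task are collected in a matrix X with binary labels y (1 = patient); `probability=True` is added to the SVM only so that AUC can be computed, and XGBoost is imported from its own package.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from xgboost import XGBClassifier  # XGBoost ships separately from scikit-learn

CLASSIFIERS = {
    "KNN": KNeighborsClassifier(n_neighbors=3),     # k = 3
    "SVM": SVC(kernel="linear", probability=True),  # linear kernel
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
    "LR": LogisticRegression(),
    "GB": GradientBoostingClassifier(),
    "XGB": XGBClassifier(),
}

def evaluate(clf, X, y, n_runs=30, n_folds=5):
    """30 runs of shuffled 5-fold CV; returns mean/std of acc, TPR, TNR, AUC."""
    scores = {"acc": [], "TPR": [], "TNR": [], "AUC": []}
    for run in range(n_runs):
        cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=run)
        for train, test in cv.split(X, y):
            clf.fit(X[train], y[train])
            pred = clf.predict(X[test])
            tn, fp, fn, tp = confusion_matrix(y[test], pred).ravel()
            scores["acc"].append((tp + tn) / (tp + tn + fp + fn))
            scores["TPR"].append(tp / (tp + fn))   # sensitivity; FNR = 1 - TPR
            scores["TNR"].append(tn / (tn + fp))   # specificity
            proba = clf.predict_proba(X[test])[:, 1]
            scores["AUC"].append(roc_auc_score(y[test], proba))
    return {k: (np.mean(v), np.std(v)) for k, v in scores.items()}
```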

Since we performed 30 runs for each classifier, the above-mentioned metrics were computed for each run, and their averages and standard deviations (in parentheses) are shown in the following tables. All the metrics are expressed as percentages, except for the AUC; bold values highlight the best performance achieved.

Looking at the accuracies in Table 2, it is worth noting that we achieved the best performance on task 3, with an accuracy of 74.66% (SVM), whereas the worst performance was obtained on task 1, with an accuracy of 58.24% (DT). The best performing algorithm was SVM, except for tasks 2 and 4, where RF achieved higher values. The DT classifier, on the contrary, achieved the worst performance on almost every task.

Looking at the table, we can observe a general trend: the first two tasks yield worse performance than the others. A possible explanation comes from the analysis of the tasks themselves. As mentioned in Sect. 3, the first two tasks are graphic tasks that test the dynamics of simple movements and the motor control of the person executing them, without requiring significant cognitive attention. The other tasks involve words, numbers and the clock drawing test, and they do require cognitive attention: some of them carry semantic meaning and include descending and ascending traits, demanding greater coordination, control skills and the use of working memory. These considerations suggest that lognormal features are more effective on tasks with a semantic meaning than on graphic tasks, better bringing out the difference between patients and healthy controls.

Table 3 shows the sensitivity values obtained during the experimental process. Sensitivity is a very important metric when facing problems in the medical field, as it measures the proportion of patients correctly recognized. The best sensitivity score is obtained by RF on the fourth task (77.47%), while the worst by DT on the first task (59.29%). According to this table, the RF and LR classifiers achieved good sensitivity values. This does not make them the best classifiers overall, since SVM achieves the best accuracy, but it shows that other classifiers are better at correctly recognizing patients.

Table 4 reports the specificity values obtained. The best specificity is achieved by SVM on the third task (82.03%), while the worst by RF on the second task (54.06%). This measure complements the sensitivity, as it gives information about the healthy control participants correctly classified. Since SVM was the best classifier according to accuracy but did not achieve the highest sensitivity values, the specificity table consequently shows high values of this metric for SVM. In other words, SVM is the best classifier according to accuracy but, taking into account the considerations on sensitivity and specificity, it seems better at recognizing healthy controls than patients among our participants.

Table 5 shows that the best precision value is achieved by SVM on the third task (80.36%), while the worst by DT on the first task (59.89%). Although, according to the sensitivity, SVM was not the best classifier at recognizing patients, this table shows that it is the most precise.

The FNR values obtained during the experimental process are shown in Table 6, where we can see that the best value is obtained by RF on the fourth task (22.52%), while the worst by DT on the first task (40.71%). FNR is directly linked to sensitivity: the two are complementary (FNR = 1 − TPR). FNR represents the proportion of patients that are erroneously classified as healthy and should, of course, be as low as possible. This is fundamental information in the medical field, because misclassifying a patient is a more serious problem than misclassifying a healthy person.

Table 7 shows the AUC values. AUC measures the area under the ROC curve, which illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied; the higher the value, the better. From the table we can observe that LR on the third task achieved the best result (0.83), whereas the worst was obtained by DT on the first task (0.58).

Table 2. Average Accuracy achieved on 30 runs for every ML algorithm on lognormal features
Table 3. Average Sensitivity achieved on 30 runs for every ML algorithm on lognormal features
Table 4. Average Specificity achieved on 30 runs for every ML algorithm on lognormal features
Table 5. Average Precision achieved on 30 runs for every ML algorithm on lognormal features
Table 6. Average FNR achieved on 30 runs for every ML algorithm on lognormal features
Table 7. Average AUC achieved on 30 runs for every ML algorithm on lognormal features

4.1 Comparison Findings

To test the effectiveness of the extracted lognormal features, we compared the results shown in the above tables with those achieved by some deep learning (DL) networks trained on synthetic images generated from the raw data described in Sect. 3.1. The image generation process and the comparison between our approach and DL are detailed in the following.

RGB Images. Starting from the raw data acquired as described in Sect. 3.1 and stored in terms of x-y coordinates and pressure at a frequency of 200 Hz, we generated synthetic images to feed Convolutional Neural Networks (CNNs). The traits of these images are obtained by considering the points (\(x_i, y_i\)) as vertices of the polyline that approximates the original curve. We encoded kinematic information in the RGB channels and, since the acquisition tools also record in-air movements, these images contain both in-air and on-paper information. In particular, they were obtained by considering the triplet of values (\(z_i, v_i, j_i\)) as the RGB color components of the \(i\)-th trait, delimited by the pair of points \((x_i, y_i)\) and (\(x_{i+1}, y_{i+1}\)). The triplet is obtained as follows:

  • \(z_i\) is the pressure value at point \((x_i, y_i)\), assumed constant along the \(i\)-th trait;

  • \(v_i\) is the velocity of the \(i\)-th trait, computed as the ratio between the length of the trait and the 5 ms time interval corresponding to the acquisition period of the tablet;

  • \(j_i\) is the jerk of the \(i\)-th trait, defined as the second derivative of the velocity.

The values of the triplet (\(z_i, v_i, j_i\)) were normalized into the range [0, 255] to match the standard 0-255 color scale, by considering the minimum and the maximum values of these three quantities over the entire training set. For further details about the generation of these images, we refer the reader to our recent publication [3]. We selected three CNN models whose input images are automatically resized to 256 \(\times \) 256 for VGG19 [22], 224 \(\times \) 224 for ResNet50 [7], and 299 \(\times \) 299 for InceptionV3 [23], respectively. Taking these constraints into account, the original x, y coordinates were resized into the range [0, 299] for each image, in order to provide ex-ante images of suitable size and minimize the loss of information related to possible zooming in/out.
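The following sketch illustrates this rendering under simplifying assumptions: normalization is done per sample rather than over the whole training set, the background color and line width are our choices, and the jerk is approximated with finite differences.

```python
import numpy as np
from PIL import Image, ImageDraw

def to_rgb_image(x, y, z, fs=200, size=299):
    """Render a handwriting sample as an RGB polyline image.

    Each trait between consecutive points is drawn with the color
    (pressure, velocity, jerk), each channel rescaled to 0-255.
    """
    dt = 1.0 / fs                                # 5 ms acquisition period
    v = np.hypot(np.diff(x), np.diff(y)) / dt    # velocity of each trait
    j = np.gradient(np.gradient(v, dt), dt)      # jerk: 2nd derivative of v

    def rescale(a, lo, hi):
        span = a.max() - a.min()
        return lo + (a - a.min()) / (span if span else 1.0) * (hi - lo)

    xs, ys = rescale(x, 0, size - 1), rescale(y, 0, size - 1)
    r = rescale(z[:-1], 0, 255)                  # pressure at the trait start
    g, b = rescale(v, 0, 255), rescale(j, 0, 255)

    img = Image.new("RGB", (size, size), "black")
    draw = ImageDraw.Draw(img)
    for i in range(len(v)):
        draw.line([(xs[i], ys[i]), (xs[i + 1], ys[i + 1])],
                  fill=(int(r[i]), int(g[i]), int(b[i])))
    return img
```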

ML/DL Comparison. As mentioned above, lognormal features can be given as input to standard ML algorithms, whereas RGB images, containing dynamic information encoded in the three color channels, can be used to feed different CNNs. Table 8 shows the accuracy achieved by the two approaches. From the table we can observe that in most cases ML outperformed DL, especially with the SVM classifier; DL won only on the second task, with the VGG19 net. For the sake of comparison, for each task we plotted the ROC curves of the classification algorithms/nets that outperformed the others in at least one task, namely RF and SVM among the ML classifiers and VGG19 among the CNNs (see Table 8). Looking at these two different sources of evaluation, we can observe that the deep approach (RGB images) outperformed the lognormal-based one on the graphic tasks (Tasks #1 and #2). On the contrary, the lognormal features confirmed their effectiveness in dealing with handwriting and cognitive tasks (see Fig. 2).

Table 8. Comparison results.
Fig. 2. Comparison of ROC curves obtained from RF, SVM and VGG19 for every task.

5 Conclusions and Future Work

Neurodegenerative diseases are impairments that can manifest through a loss of graphonomic skills; Alzheimer's and Parkinson's are the two diseases most commonly studied through handwriting. This paper analyzes the writing of patients and healthy people using lognormal features derived from the Kinematic Theory of rapid movements. To study the skill levels of participants, we used a dataset with healthy people and patients at the early stage of AD, whose handwriting included letters, words, numbers and drawings produced following a well-established protocol for Alzheimer's [4]. Our preliminary results confirm that lognormal features are more effective on handwriting tasks with cognitive content than on purely graphic tasks. In particular, we achieved the best results (accuracy > 70%) on the following tasks:

  • The bigram ‘le’ (repeated four times);

  • The word ‘bottiglia’ written backward;

  • Dictated telephone number;

  • Clock drawing test.

It is worth noting that we did not apply any hyperparameter optimization in this work; we expect that a grid search procedure will lead to a significant improvement in future results. Furthermore, these results are in line with those achieved using standard kinematic features (velocity, acceleration, etc.). We also plan to analyze on which tasks lognormal and deep features perform better, which would allow us to determine, for each task, which features (and approach) achieve the best classification performance. Finally, as in related works, there is room to improve the final prediction by combining the responses of the single classifiers (one per task).