1 Introduction

Each person uses a mobile device in a characteristic way. The way users hold and touch their devices can be sensed by the device's sensors (such as the accelerometer or the touchscreen). Several studies have used touchscreen swipes for continuous user authentication. Feng et al. [15] were the first to examine gesture-based continuous user authentication, using horizontal and vertical swipes. Zooming gestures were also investigated for the same purpose. Similar studies were performed by Li et al. [19] and Frank et al. [16], who concluded that user identity can be predicted with high accuracy from a sequence of swipes. Bo et al. [2] added the micro-movement characteristics of the device (obtained from the accelerometer sensor) to the touchscreen data. In addition, Serwadda et al. [26] and Shahzad et al. [27] confirmed that several consecutive swipes are required in order to obtain high accuracy in user authentication. Roy et al. [23] used Hidden Markov models on the dataset collected by Frank et al. and thereby improved on the authentication performance obtained by the creators of the dataset. In this article we show to what extent personality traits of users can be predicted from touchscreen swipes.

Several studies have been conducted to reveal the relationship between the use of digital technology and the personality traits of users. Phillips et al. [21] sought to identify personality traits associated with mobile phone games. Butt and Phillips reported that personality can predict the amount of mobile phone use [4]. Chittaranjan et al. [6, 7] investigated the relationship between behavioural characteristics derived from rich smartphone data and self-reported personality traits; specifically, they developed an automatic method to infer the personality type of a user from mobile phone usage. de Montjoye et al. [20] provided the first evidence that personality traits could be reliably predicted from standard mobile phone logs. Personality traits have also been predicted from social media profiles such as Facebook or Twitter [1, 9, 22, 28]. The use of Internet services and its relationship to personality traits was also investigated [18, 29].

The Eysenck Personality Questionnaire (EPQ) was developed and published in 1975 by Eysenck and Eysenck [12]. Several subsequent improvements resulted in a revised version of the questionnaire (EPQ-R) [11, 13]. The EPQ-R has been translated into different languages and its validity has been demonstrated in several studies [5, 10, 14, 24, 25]. In our study we used the Hungarian 58-question Eysenck Personality Questionnaire. The participants completed the questionnaire on a mobile device, which in our case was a tablet computer.

Our first objective is to study whether user identity information can be revealed from the constrained swipes that participants make when answering the questionnaire. The second objective is to examine how reliably personality traits can be predicted from the same data. We also report the significant differences in mobile usage data among people belonging to different personality types.

2 Materials and Methods

2.1 Participants and Materials

Ninety-eight participants (60 male, 38 female, aged 19–58) took part in the experiment. All participants completed the Hungarian 58-question Eysenck Personality Questionnaire in a controlled environment. Details of data acquisition are shown in Table 1. All information related to this research is available at http://www.ms.sapientia.ro/~manyi/personality.html

Table 1. Details of data acquisition.

The questionnaire is used to assess four personality traits: Extraversion (E), Neuroticism (N), Psychoticism (P) and Social Desirability (L - Lie scale). The Lie scale measures to what extent subjects deliberately attempt to control their scores. This version of the Eysenck questionnaire consists of 16 questions related to Extraversion, 19 to Neuroticism, 15 to Psychoticism and 8 to the Lie scale. On each scale the number of questions with corresponding answers was counted and then multiplied by 2 for evaluation purposes. This resulted in four integer numbers (denoted by E, N, P and L), one for each trait. For example, subjects obtained values between 0 and 32 on the Extraversion scale. Subjects scoring below 16 may be considered introverted while those scoring above 16 may be regarded as extraverted.
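The scoring rule above can be illustrated with a minimal sketch. The function and constant names are hypothetical and are not taken from the study's software; the only facts used are the number of questions per scale and the multiplication by 2.

```python
# Number of questions per scale, as stated in the text (16 + 19 + 15 + 8 = 58).
QUESTIONS_PER_SCALE = {"E": 16, "N": 19, "P": 15, "L": 8}


def scale_score(keyed_answers: int, scale: str) -> int:
    """Return the scale score: the count of keyed answers multiplied by 2."""
    n_questions = QUESTIONS_PER_SCALE[scale]
    if not 0 <= keyed_answers <= n_questions:
        raise ValueError(f"{scale} has only {n_questions} questions")
    return 2 * keyed_answers


def score_range(scale: str) -> tuple:
    """Return the (minimum, maximum) score obtainable on a scale."""
    return (0, 2 * QUESTIONS_PER_SCALE[scale])
```

Under this rule the Extraversion score ranges from 0 to 32 and the Neuroticism score from 0 to 38, so the midpoints of the two ranges are 16 and 19 respectively.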

2.2 Procedure

Raw Data. Participants were instructed to answer all 58 personality questions shown on a tablet by dragging a slider with their finger from the middle point either to the left for a negative answer or to the right for a positive one. The raw data recorded from the sensors at each touch point during the drag operation are as follows (see Fig. 1): action code: \(\{DOWN, MOVE, UP\}\); x, y coordinates; acceleration measured along the x, y, z axes; pressure exerted on the screen; finger area - a normalized value of touch area in pixels; timestamp.

Fig. 1. Swipe consisting of five touch points. In each touch point the saved raw data are the following: x, y - touch point coordinates; t - timestamp; p - pressure; FA - finger area; Gx, Gy, Gz - acceleration measured along the x, y and z axes.

Some participants made more than one drag operation in order to answer the question, resulting in several swipes connected to a question. In these cases we always kept only the swipe that contained the most touch points.
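The selection rule can be sketched as follows. Representing a swipe as a list of touch points is an assumption made for illustration, as is the tie-breaking behaviour (the first of several equally long swipes is kept), which the text does not specify.

```python
from typing import List, Sequence


def select_swipe(swipes: Sequence[List]) -> List:
    """Keep the swipe with the most touch points among those recorded for one question.

    ASSUMPTION: ties are resolved in favour of the earliest swipe, because
    max() returns the first maximal element.
    """
    if not swipes:
        raise ValueError("no swipe recorded for this question")
    return max(swipes, key=len)
```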

Feature Extraction. For each swipe we computed a feature vector having nine features: average_velocity (av), acceleration_at_start (aas), midstroke_pressure (msp), midstroke_finger_area (msfa), mean_pressure (mp), mean_finger_area (mfa), meangx (mgx), meangy (mgy), meangz (mgz).

Let us consider a swipe consisting of n touch points: \(Swipe=\{ P_1, P_2, \ldots P_n\}\), where a touch point is defined as \(\quad P_i=(action_i,x_i, y_i, gx_i, gy_i, gz_i, p_i, fa_i, t_i)\) and \(action_1=DOWN\), \(action_i=MOVE \quad i=2 \ldots n-1\), \(action_n=UP\). The nine features were computed as follows:

$$\begin{aligned} av = \frac{\sum _{i=1}^{n-1} d(P_i,P_{i+1})}{t_n - t_1} \end{aligned}$$
(1)
$$\begin{aligned} aas = \frac{1}{3} \sum _{i=1}^{3} \frac{v_{i+1}-v_i}{t_{i+1} - t_{i}}, v_{i+1} = \frac{d(P_{i+1},P_i)}{t_{i+1}-t_i} , i>0 \end{aligned}$$
(2)
$$\begin{aligned} msp = p_{ {\lfloor \frac{n}{2} \rfloor } }, msfa = fa_{ \lfloor \frac{n}{2} \rfloor } \end{aligned}$$
(3)
$$\begin{aligned} mp = \frac{1}{n} {\sum _{i=1}^{n}{p_i}}, mfa = \frac{1}{n} {\sum _{i=1}^{n}{fa_i}} \end{aligned}$$
(4)
$$\begin{aligned} mgx = \frac{1}{n} {\sum _{i=1}^{n}{gx_i}}, mgy = \frac{1}{n} {\sum _{i=1}^{n}{gy_i}}, mgz = \frac{1}{n} {\sum _{i=1}^{n}{gz_i}} \end{aligned}$$
(5)
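Equations (1) to (5) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: d is taken to be the Euclidean distance between touch point coordinates, and v_1 (which Eq. 2 leaves undefined) is assumed to be 0, i.e. the finger is at rest at the DOWN event. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TouchPoint:
    x: float
    y: float
    gx: float
    gy: float
    gz: float
    p: float   # pressure
    fa: float  # finger area
    t: float   # timestamp


def dist(a: TouchPoint, b: TouchPoint) -> float:
    """ASSUMPTION: d(P_i, P_j) is the Euclidean distance in screen coordinates."""
    return math.hypot(b.x - a.x, b.y - a.y)


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def swipe_features(swipe: List[TouchPoint]) -> Dict[str, float]:
    n = len(swipe)
    if n < 4:
        raise ValueError("Eq. (2) needs at least four touch points")

    # Eq. (1): path length divided by total duration.
    path_length = sum(dist(swipe[i], swipe[i + 1]) for i in range(n - 1))
    av = path_length / (swipe[-1].t - swipe[0].t)

    # Eq. (2): v[0] stands for v_1 (ASSUMED to be 0), v[j] for v_{j+1}.
    v = [0.0]
    for i in range(3):
        v.append(dist(swipe[i + 1], swipe[i]) / (swipe[i + 1].t - swipe[i].t))
    aas = mean((v[i + 1] - v[i]) / (swipe[i + 1].t - swipe[i].t) for i in range(3))

    # Eq. (3): the 1-based index floor(n/2) is the 0-based index n // 2 - 1.
    mid = swipe[n // 2 - 1]

    return {
        "av": av,
        "aas": aas,
        "msp": mid.p,
        "msfa": mid.fa,
        "mp": mean(q.p for q in swipe),    # Eq. (4)
        "mfa": mean(q.fa for q in swipe),  # Eq. (4)
        "mgx": mean(q.gx for q in swipe),  # Eq. (5)
        "mgy": mean(q.gy for q in swipe),  # Eq. (5)
        "mgz": mean(q.gz for q in swipe),  # Eq. (5)
    }
```

The sketch assumes that consecutive timestamps are strictly increasing; a real implementation would need to guard against repeated timestamps before dividing.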

Classification. Two well-known classification algorithms were used to evaluate our datasets: k-Nearest Neighbours (k-NN) and Random forest [3]. The k-NN algorithm is an instance-based classification algorithm, in which a new instance is classified by a majority vote of its k nearest neighbours. It is one of the simplest machine learning algorithms and does not require an explicit training phase. In contrast, the Random forest algorithm is an ensemble method, which constructs a multitude of decision trees at training time.

For evaluation we implemented a Java application based on the Weka data mining tools (version 3.6) [17]. Two types of cross-validation were used to evaluate the accuracy of our methods. The first type was the usual stratified 10-fold cross-validation. The dataset was partitioned into 10 partitions (folds), then the classifier was trained using 9 partitions and tested with the remaining one. This was repeated 10 times, once for each partition. The second type is a variant of leave-one-out cross-validation, namely the leave-one-user-out cross-validation introduced by Cornelius and Kotz [8]. In this case a classifier was trained using the whole dataset except one user’s data and tested with the omitted user’s data. The procedure was repeated for each user. This type of cross-validation tests the generality of the classifier, namely how it performs in the case of an unseen user [8].
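The leave-one-user-out partitioning can be sketched as a simple generator. This is an illustration only: the study used a Java application built on Weka, whereas the sample layout (user id, feature vector, label) and the function name below are assumptions.

```python
from typing import Iterator, List, Tuple

Sample = Tuple[int, list, str]  # (user_id, feature_vector, class_label), hypothetical layout


def leave_one_user_out(samples: List[Sample]) -> Iterator[Tuple[int, List[Sample], List[Sample]]]:
    """Yield (held_out_user, training_set, test_set), once for each user."""
    users = sorted({user_id for user_id, _, _ in samples})
    for held_out in users:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test
```

Unlike 10-fold cross-validation, no swipe of the tested user ever appears in the training set, which is what makes the estimate representative of an unseen user.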

Besides single-swipe classification we also evaluated the classification performance based on sequences of swipes. A sequence of swipes was classified using the following method. Let N denote the number of classes and X the sequence of swipes to be classified:

$$\begin{aligned} X=\{x_{1},x_{2},\ldots ,x_{T}\},\qquad x_{i}\in {R}^{D}, \end{aligned}$$
(6)

where T is the number of swipes and D is the dimension of the feature vector (in our case we used nine features). We computed for each swipe the prediction distribution (Eq. 7).

$$\begin{aligned} P_i=\{p_i^{1}, p_i^{2}, \ldots ,p_i^{N}\}, \quad p_i^{k}\in [0,1], \quad k=1 \ldots N, \quad i=1 \ldots T, \end{aligned}$$
(7)

where \(p_i^{k}\) is the probability that \(x_{i}\) belongs to class k (we used the distributionForInstance function from Weka [17]).

This was followed by computing the average probability for each class and choosing the maximum one (Eq. 8).

$$\begin{aligned} Class(X) = \arg \max _{k =1}^N \left\{ {\frac{\sum _{i=1}^{T} p_i^k}{T}} \right\} \end{aligned}$$
(8)

Consequently, a sequence of swipes is classified as belonging to the \({k^{th}}\) class if the average probability for this class is the maximum one.
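The averaging rule of Eqs. (7) and (8) can be sketched as follows. The per-swipe distributions are assumed to come from some classifier (in the study, Weka's distributionForInstance); the function name and the 0-based class index are choices made for this illustration.

```python
from typing import List, Sequence, Tuple


def classify_sequence(distributions: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Average the per-swipe class probabilities and return (best class index, averages).

    distributions[i][k] is the probability that swipe i belongs to class k.
    The returned class index is 0-based, whereas Eq. (8) counts classes from 1.
    """
    T = len(distributions)
    if T == 0:
        raise ValueError("empty swipe sequence")
    N = len(distributions[0])
    if any(len(d) != N for d in distributions):
        raise ValueError("all distributions must cover the same N classes")
    averages = [sum(d[k] for d in distributions) / T for k in range(N)]
    best = max(range(N), key=lambda k: averages[k])
    return best, averages
```

For example, for the three distributions (0.6, 0.4), (0.2, 0.8) and (0.3, 0.7), the first swipe taken alone favours the first class, while the averaged probabilities favour the second one. This shows how a sequence can correct an individual misclassification.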

3 Results

3.1 Eysenck Personality Questionnaire Results

Table 2 presents the descriptive statistics of the personality trait scores of the 98 participants.

Table 2. EPQ results for the 98 participants: means, standard deviations (SD), minimum and maximum values and medians for the Extraversion, Neuroticism, Psychoticism and Lie scale traits.

Extraversion was negatively correlated with Neuroticism (\(r=-0.33, p <0.001\)) and Psychoticism (\(r=-0.23, p<0.05\)). Neuroticism was positively correlated with Psychoticism (\(r=0.33, p<0.001\)), while Lie scale was negatively correlated with Neuroticism (\(r=-0.36, p<0.001\)) and positively correlated with Psychoticism (\(r=0.26, p<0.01\)).

3.2 User Classification Results

We performed user classification using 10-fold cross-validation for classifier evaluation. In this case we had 98 classes (the number of subjects) and 58 samples from each subject. Both the k-NN and the Random forest classifiers were evaluated. Measurements were repeated five times, each time increasing the length of the swipe sequence to be classified. The best classification results were obtained with the Random forest algorithm, especially when using 5 swipes (98.8 % accuracy). The detailed results for the k-NN and Random forest classifiers are shown in Table 3.

Table 3. User classification accuracies with standard deviation in parentheses. Measurements: 10-fold cross-validation. Swipe sequences of length: 1, 2, 3, 4, 5.

3.3 Personality Traits Classification Results

According to the EPQ evaluation procedure (using the Extraversion value obtained for each user), we can split the users into two classes: the more introverted class E1 (\(E < 16\)) and the more extraverted class E2 (\(E \ge 16\)). Similarly, we split the dataset along the Neuroticism scale into a less neurotic class N1 (\(N < 19\)) and a more neurotic class N2 (\(N \ge 19\)).
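The two splits amount to simple thresholding of the trait scores. The sketch below uses hypothetical function names; the thresholds and class labels are those given in the text.

```python
E_THRESHOLD = 16  # midpoint of the 0..32 Extraversion range
N_THRESHOLD = 19  # midpoint of the 0..38 Neuroticism range


def extraversion_class(e_score: int) -> str:
    """Return 'E1' (more introverted) for E < 16, otherwise 'E2' (more extraverted)."""
    return "E2" if e_score >= E_THRESHOLD else "E1"


def neuroticism_class(n_score: int) -> str:
    """Return 'N1' (less neurotic) for N < 19, otherwise 'N2' (more neurotic)."""
    return "N2" if n_score >= N_THRESHOLD else "N1"
```

Every swipe of a user inherits the class of that user, so the two resulting datasets contain the same feature vectors and differ only in their labels.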

These splits resulted in two datasets (E2dataset and N2dataset). The only difference between these datasets is the class information. The class populations of the two datasets are shown in Table 4.

Table 4. Datasets class information.

We evaluated these datasets using the two types of cross-validation described in Sect. 2.2; the results are presented in Tables 5 and 6. In the case of leave-one-user-out cross-validation, classifiers were trained using data from 97 subjects (5626 instances) and tested using data from a single user (58 instances). After repeating the same procedure for each of the 98 users, the mean and the standard deviation were computed.

Table 5. E and N classification accuracies with standard deviation in parentheses. Measurements: 10 runs, 10-fold cross-validation. Swipe sequences of length: 1-5.
Table 6. E, N and EN classification accuracies with standard deviation in parentheses. Measurements: leave one user out cross-validation. Swipe sequences of length: 1-5.

As can be seen in Table 5, high accuracies were obtained for both datasets, especially with the Random forest classifier and when using 5 swipes for classification. The best accuracies were 90.4 % for one swipe and 99.4 % for five swipes, both obtained by the Random forest classifier on the E2dataset.

Repeating the same measurements with leave-one-user-out cross-validation resulted in a dramatic drop in mean accuracy (see Table 6). Using the Random forest classifier on the E2dataset, the accuracies obtained varied from 60.5 % for one swipe to 62.9 % for five swipes. Although the mean accuracy is low, 40 of the 62 users in the more extraverted class (E2) were identified with high accuracy (over 80 %). The distribution of recognition accuracies for users is shown in Fig. 2. It can be seen that the classification accuracy is higher for the E2 class (more extraverted) than for the E1 class (more introverted).

Fig. 2. E2dataset classification results for the 98 measurements (one swipe, leave-one-user-out cross-validation). Accuracy distribution among the two classes: E1 - more introverted and E2 - more extraverted.

Fig. 3. E2dataset statistics. Sample probability distribution of the meangy feature for the classes in E2dataset.

3.4 Statistical Analysis

Statistical analysis was performed using the ttest2 function from MATLAB (The MathWorks, Inc., Natick, MA). A p value below 0.001 was considered significant.

The discriminatory capability of each feature was tested individually using a two-tailed t-test for the difference between the class means, at a significance level of 0.001 and assuming equal variances. The mean values of 8 features out of 9 (all except acceleration_at_start) differ significantly (\(p<0.001\)) between the two classes of the E2dataset. As for the N2dataset, significant differences in the mean were found for only two features, meangx and meangy.
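The test statistic behind an equal-variance two-sample t-test can be sketched with the Python standard library. This is an illustration and not the MATLAB ttest2 call used in the study; the function name is hypothetical, and the sketch stops at the statistic and its degrees of freedom, since the two-tailed p value would additionally require the cumulative distribution function of Student's t distribution.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for a two-sample t-test assuming equal variances."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two values")
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(a) - mean(b)) / standard_error, df
```

In the setting of this section, a and b would hold the values of one feature (for example meangy) for the swipes of the two classes.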

Differences in device holding position can also be inferred from the sample probability distribution of the meangy feature for the classes in the E2dataset (Fig. 3). The mean values for meangy are 5.953 for class E1 and 3.979 for class E2, which is a consequence of the tendency in group E1 to hold the device more vertically. Laying the device on the table (meangy around 0) was mostly characteristic of the extraverted group.

4 Discussion

Several research papers have shown that touchscreen swipes contain a large amount of user-specific information and may therefore be used in authentication tasks. In this paper we have evaluated user classification based on the information content of constrained horizontal swipes. Of the two classifiers, Random forest provided the better accuracy in all test cases. Although the accuracy for one swipe is not outstanding (80.6 %), increasing the number of swipes to five resulted in 98.8 % user classification accuracy.

Another question we have examined is whether personality traits can be reliably predicted from touchscreen swipes using machine learning methods. For this purpose two types of cross-validation measurement were used, namely 10-fold cross-validation and a leave-one-user-out cross-validation. While the first type of cross-validation shows how well a method is performing in the case of a particular known user, the second method shows how well an unseen user’s data is classified according to a criterion (generalisation). We evaluated the two datasets (E2dataset, N2dataset) using binary classifiers. Our classifiers performed well for each dataset using the traditional 10-fold cross-validation (see Table 5). Moreover, the classification accuracies increased in the case where more than one swipe was used for classification.

In order to better reflect reality, we needed to predict the personality traits of an unseen user. This was achieved by using the leave-one-user-out cross-validation method. The results are shown in Table 6. We can see that the best results were obtained for the Extraversion trait, which can be predicted with approximately 60 % accuracy. The large difference between the two types of evaluation may be explained by taking into account the user identity classification results shown in Table 3. These show that swipes contain a large amount of information about user identity, and this helps in recognising the personality traits of a user in the case of 10-fold cross-validation (in this case each user’s data is present in the training set). However, when we classify the data of an unseen user (the case of leave-one-user-out cross-validation), the identity information of that user is missing from the training data.

5 Conclusions

In this paper we have analysed user identity and personality related data contained in simple left-right swipes on the touchscreen of a mobile device. The mobile application we developed presents the user with the Hungarian version of Eysenck’s Personality Questionnaire, containing 58 questions, and records usage data for each answer. During our study 98 users completed the test using identical mobile devices. Despite the use of constrained swipes, very good user classification accuracy was obtained by the Random forest classifier. We should mention that this accuracy was obtained by using 5-swipe sequences (98.8 %).

From the collected data two datasets were created in order to analyse the predictability of users’ personality traits across two dimensions: Extraversion and Neuroticism. These datasets were analysed with 10-fold cross-validation and leave-one-user-out cross-validation. Very high accuracies were obtained using 10-fold cross-validation (over 99 % for 5 swipes), although this method cannot be used reliably to predict the personality traits of an unseen user based on our feature set. Only leave-one-user-out cross-validation provides an effective method of predicting the personality traits of an unseen user. Results obtained by this method are slightly better than the chance level (62.9 % average accuracy for the E2dataset and 5 swipes). However, the classification accuracy is over 80 % for two thirds of the more extraverted subjects. This may lead to some potentially promising future research.

Our conclusions are drawn taking into consideration the limitation imposed by having only 98 users with 58 samples/user in this study.