Predicting User Identity and Personality Traits from Mobile Sensor Data

Antal, Margit; Szabó, László Zsolt; Nemes, Győző

doi:10.1007/978-3-319-46254-7_13


Part of the book series: Communications in Computer and Information Science (CCIS, volume 639)

Included in the following conference series:

International Conference on Information and Software Technologies


Abstract

Several types of information can be revealed from data provided by mobile sensors. In this study, touchscreen and accelerometer data were collected from a group of 98 volunteers while they filled in the Eysenck Personality Questionnaire on a tablet computer. Subjects performed swipes on the touchscreen in order to answer the questions. Touchscreen swipes have already been used for user authentication. We show that our constrained swipes contain enough user-specific information to be utilized for the same task. Moreover, we have studied the predictability of personality traits such as extraversion and neuroticism from the collected data. Extraversion was found to be the most reliably predictable personality trait.



1 Introduction

People use their mobile devices in a unique way. The way they hold and touch their devices can be sensed by the device's sensors (such as the accelerometer or the touchscreen). Several studies have used touchscreen swipes for continuous user authentication. Feng et al. [15] were the first to examine gesture-based continuous user authentication, using horizontal and vertical swipes. Zooming gestures were also investigated for the same purpose. Similar studies were performed by Li et al. [19] and Frank et al. [16], who concluded that user identity can be predicted with high accuracy based on a sequence of swipes. Bo et al. [2] added the micro-movement characteristics of the device (obtained from the accelerometer sensor) to the touchscreen data. In addition, Serwadda et al. [26] and Shahzad et al. [27] confirmed that several consecutive swipes are required in order to obtain high accuracy in user authentication. Roy et al. [23] applied Hidden Markov models to the dataset collected by Frank et al. and thereby improved on the authentication performance obtained by the creators of the dataset. In this article we show to what extent personality traits of users can be predicted from touchscreen swipes.

Several studies have been conducted in order to reveal the relationship between the use of digital technology and personality traits of users. Phillips et al. [21] sought to identify personality traits associated with the use of mobile phones for games. Butt and Phillips reported that personality can predict the amount of mobile phone use [4]. Chittaranjan et al. [6, 7] investigated the relationship between behavioural characteristics derived from rich smartphone data and self-reported personality traits. Specifically, they developed an automatic method to infer the personality type of a user based on mobile phone usage. de Montjoye et al. [20] provided the first evidence that personality traits could be reliably predicted from standard mobile phone logs. Personality traits have also been predicted from social media profiles such as Facebook or Twitter [1, 9, 22, 28]. The use of Internet services and its relationship to personality traits was also investigated [18, 29].

The Eysenck Personality Questionnaire (EPQ) was developed and published in 1975 by Eysenck and Eysenck [12]. Several subsequent improvements resulted in a revised version of the questionnaire (EPQ-R) [11, 13]. The EPQ-R has been translated into different languages and its validity has been demonstrated in several studies [5, 10, 14, 24, 25]. In our study we used the Hungarian 58-question Eysenck Personality Questionnaire. The participants completed the questionnaire on a mobile device, which in our case was a tablet computer.

Our first objective is to study whether user identity information can be revealed from the constrained swipes used to answer the questionnaire. The second objective is to examine how reliably personality traits can be predicted from the same data. We also report the significant differences in mobile usage data among people belonging to different personality types.

2 Materials and Methods

2.1 Participants and Materials

Ninety-eight participants (60 male, 38 female, aged 19–58) took part in the experiment. All participants completed the Hungarian 58-question Eysenck Personality Questionnaire in a controlled environment. Details of data acquisition are shown in Table 1. All information related to this research is available at http://www.ms.sapientia.ro/~manyi/personality.html

Table 1. Details of data acquisition.


The questionnaire is used to assess four personality traits: Extraversion (E), Neuroticism (N), Psychoticism (P) and Social Desirability (L, the Lie scale). The Lie scale measures to what extent subjects deliberately attempt to control their scores. This version of the Eysenck questionnaire consists of 16 questions related to Extraversion, 19 to Neuroticism, 15 to Psychoticism and 8 to the Lie scale. For each scale, the number of questions answered in the direction corresponding to the trait was counted and then multiplied by 2 for evaluation purposes. This resulted in four integer numbers (denoted by E, N, P and L), one for each trait. For example, subjects obtained values between 0 and 32 on the Extraversion scale. Subjects scoring below 16 may be considered introverted, while those scoring above 16 may be regarded as extraverted.
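The count-then-double scoring rule can be illustrated with a short sketch. The answer key and the question numbering used here are hypothetical, since the actual key of the Hungarian EPQ is not reproduced in this paper. Only the arithmetic (matching answers multiplied by 2, giving 0 to 32 on a 16-item scale) follows the text.

```python
from typing import Dict


def scale_score(answers: Dict[int, bool], key: Dict[int, bool]) -> int:
    """Count the answers that match the scale's key, then double the count."""
    matches = sum(1 for q, keyed in key.items() if answers.get(q) == keyed)
    return 2 * matches


# Hypothetical key for a 16-item Extraversion scale: question number -> keyed answer.
extraversion_key = {q: True for q in range(1, 17)}

# Hypothetical subject: answers "yes" to items 1..10 and "no" to items 11..16.
answers = {q: q <= 10 for q in range(1, 17)}

e_score = scale_score(answers, extraversion_key)  # 10 matching answers, doubled
```

With this made-up key, a subject matching all 16 items would reach the maximum of 32 stated in the text.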

2.2 Procedure

Raw Data. Participants were instructed to answer all 58 personality questions shown on a tablet by dragging a slider with their finger from the middle point either to the left for a negative answer or to the right for a positive answer. The raw data recorded from the sensors at each touch point during the drag operation are as follows (see Fig. 1): action code: $\{DOWN, MOVE, UP\}$; x, y coordinates; acceleration measured along the x, y, z axes; pressure exerted on the screen; finger area (a normalized value of the touch area in pixels); timestamp.

Some participants made more than one drag operation in order to answer the question, resulting in several swipes connected to a question. In these cases we always kept only the swipe that contained the most touch points.
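The selection rule for questions answered with several drag operations can be sketched as follows. The representation of a swipe as a list of touch-point tuples and the function name `select_swipe` are assumptions made for illustration. The text does not say how ties were resolved; this sketch keeps the first of the longest swipes.

```python
from typing import List, Sequence, Tuple

# A touch point is represented here by an arbitrary tuple of recorded values.
TouchPoint = Tuple[float, ...]


def select_swipe(swipes: Sequence[List[TouchPoint]]) -> List[TouchPoint]:
    """Keep the swipe with the most touch points (first one wins on ties)."""
    if not swipes:
        raise ValueError("at least one swipe is required")
    return max(swipes, key=len)


# Three attempts recorded for one question, with 2, 5 and 3 touch points.
attempts = [
    [(0.0, 0.0)] * 2,
    [(1.0, 0.0)] * 5,
    [(2.0, 0.0)] * 3,
]
kept = select_swipe(attempts)
```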

Feature Extraction. For each swipe we computed a feature vector having nine features: average_velocity (av), acceleration_at_start (aas), midstroke_pressure (msp), midstroke_finger_area (msfa), mean_pressure (mp), mean_finger_area (mfa), meangx (mgx), meangy (mgy), meangz (mgz).

Let us consider a swipe consisting of n touch points: $Swipe=\{P_1, P_2, \ldots, P_n\}$, where a touch point is defined as $P_i=(action_i, x_i, y_i, gx_i, gy_i, gz_i, p_i, fa_i, t_i)$ and $action_1=DOWN$, $action_i=MOVE$ for $i=2, \ldots, n-1$, $action_n=UP$. The nine features were computed as follows:

$$av = \frac{\sum_{i=1}^{n-1} d(P_i, P_{i+1})}{t_n - t_1} \tag{1}$$

$$aas = \frac{1}{3} \sum_{i=1}^{3} \frac{v_{i+1} - v_i}{t_{i+1} - t_i}, \qquad v_{i+1} = \frac{d(P_{i+1}, P_i)}{t_{i+1} - t_i}, \quad i > 0 \tag{2}$$

$$msp = p_{\lfloor \frac{n}{2} \rfloor}, \qquad msfa = fa_{\lfloor \frac{n}{2} \rfloor} \tag{3}$$

$$mp = \frac{1}{n} \sum_{i=1}^{n} p_i, \qquad mfa = \frac{1}{n} \sum_{i=1}^{n} fa_i \tag{4}$$

$$mgx = \frac{1}{n} \sum_{i=1}^{n} gx_i, \qquad mgy = \frac{1}{n} \sum_{i=1}^{n} gy_i, \qquad mgz = \frac{1}{n} \sum_{i=1}^{n} gz_i \tag{5}$$
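The feature definitions in Eqs. 1 to 5 can be sketched in code. Several details are assumptions, because the text does not state them: $d$ is taken to be the Euclidean distance between the (x, y) coordinates, the initial velocity $v_1$ is taken to be 0, timestamps are assumed to be strictly increasing, and a swipe is assumed to have at least four touch points so that Eq. 2 is defined. The names `TouchPoint` and `extract_features` are illustrative only.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List


@dataclass
class TouchPoint:
    action: str  # "DOWN", "MOVE" or "UP"
    x: float
    y: float
    gx: float    # acceleration along x
    gy: float    # acceleration along y
    gz: float    # acceleration along z
    p: float     # pressure
    fa: float    # finger area
    t: float     # timestamp


def d(a: TouchPoint, b: TouchPoint) -> float:
    """Euclidean distance between two touch points (an assumption)."""
    return hypot(b.x - a.x, b.y - a.y)


def extract_features(swipe: List[TouchPoint]) -> Dict[str, float]:
    n = len(swipe)
    if n < 4:
        raise ValueError("Eq. 2 needs at least 4 touch points")

    # Eq. 1: path length divided by the swipe duration.
    av = sum(d(swipe[i], swipe[i + 1]) for i in range(n - 1)) / (swipe[-1].t - swipe[0].t)

    # Eq. 2: v[j] holds v_{j+1}; v_1 = 0 is an assumption.
    v = [0.0]
    for j in range(1, 4):
        v.append(d(swipe[j], swipe[j - 1]) / (swipe[j].t - swipe[j - 1].t))
    aas = sum((v[j] - v[j - 1]) / (swipe[j].t - swipe[j - 1].t) for j in range(1, 4)) / 3

    # Eq. 3: the 1-based index floor(n/2) becomes the 0-based index n // 2 - 1.
    mid = n // 2 - 1

    def mean(values: List[float]) -> float:
        return sum(values) / n

    return {
        "av": av,
        "aas": aas,
        "msp": swipe[mid].p,
        "msfa": swipe[mid].fa,
        "mp": mean([q.p for q in swipe]),    # Eq. 4
        "mfa": mean([q.fa for q in swipe]),  # Eq. 4
        "mgx": mean([q.gx for q in swipe]),  # Eq. 5
        "mgy": mean([q.gy for q in swipe]),  # Eq. 5
        "mgz": mean([q.gz for q in swipe]),  # Eq. 5
    }
```

The returned dictionary holds the nine features in the order listed in the text, so its values can be used directly as a feature vector.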

Classification. Two well-known classification algorithms were used to evaluate our datasets: k-Nearest Neighbours (k-NN) and Random forest [3]. The k-NN algorithm is an instance-based classification algorithm, in which a new instance is classified by a majority vote of its k nearest neighbours. It is one of the simplest machine learning algorithms and does not require a training phase. In contrast, the Random forest algorithm is a more complex algorithm, which constructs a multitude of decision trees at training time.
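The majority-vote rule of k-NN can be sketched in a few lines. This is a minimal illustration, not the Weka implementation used in the study. The Euclidean distance and the value k = 3 are assumptions, since the text states neither the metric nor k.

```python
from collections import Counter
from math import dist
from typing import List, Sequence


def knn_predict(train_x: List[Sequence[float]], train_y: List[str],
                query: Sequence[float], k: int = 3) -> str:
    """Classify `query` by a majority vote among its k nearest training instances."""
    # Sort training indices by Euclidean distance to the query.
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    # Count the class labels of the k closest instances.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy two-dimensional data with two classes, for illustration only.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
```

No model is fitted: all the work happens at prediction time, which is why no training phase is needed.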

For evaluation we implemented a Java application based on the Weka data mining tools (version 3.6) [17]. Two types of cross-validation were used to evaluate the accuracy of our methods. The first type was the usual stratified 10-fold cross-validation. The dataset was partitioned into 10 partitions (folds), then the classifier was trained using 9 partitions and tested with the remaining one. This was repeated 10 times, once for each partition. The second type is a variant of leave-one-out cross-validation, namely the leave-one-user-out cross-validation introduced by Cornelius and Kotz [8]. In this case a classifier was trained using the whole dataset except one user’s data and tested with the omitted user’s data. The procedure was repeated for each user. This type of cross-validation tests the generality of the classifier, namely how it performs in the case of an unseen user [8].
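The leave-one-user-out scheme can be sketched as a split generator. This is an illustration in Python rather than the Java/Weka code used in the study, and the function name `leave_one_user_out` is hypothetical. Each instance is assumed to carry the identifier of the user who produced it.

```python
from typing import Iterator, List, Tuple


def leave_one_user_out(user_ids: List[int]) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (held-out user, training indices, test indices) for every user."""
    for held_out in sorted(set(user_ids)):
        train = [i for i, u in enumerate(user_ids) if u != held_out]
        test = [i for i, u in enumerate(user_ids) if u == held_out]
        yield held_out, train, test


# 98 users with 58 swipes each, matching the size of the dataset in this study.
user_ids = [user for user in range(98) for _ in range(58)]
splits = list(leave_one_user_out(user_ids))
```

Because no instance of the held-out user appears in the training indices, the classifier cannot exploit user identity, which is the point of this evaluation.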

Besides single-swipe classification we also evaluated the classification performance based on sequences of swipes. A sequence of swipes was classified using the following method. Let N denote the number of classes and X the sequence of swipes to be classified:

$$X = \{x_1, x_2, \ldots, x_T\}, \qquad x_i \in R^D, \tag{6}$$

where T is the number of swipes and D is the dimension of the feature vector (in our case we used nine features). We computed for each swipe the prediction distribution (Eq. 7).

$$P_i = \{p_i^1, p_i^2, \ldots, p_i^N\}, \qquad p_i^k \in [0, 1], \quad k = 1, \ldots, N, \quad i = 1, \ldots, T, \tag{7}$$

where $p_i^{k}$ is the probability that $x_{i}$ belongs to class k (we used the distributionForInstance function from Weka [17]).

This was followed by computing the average probability for each class and choosing the maximum one (Eq. 8).

$$Class(X) = \underset{k = 1, \ldots, N}{\arg\max} \left\{ \frac{\sum_{i=1}^{T} p_i^k}{T} \right\} \tag{8}$$

Consequently, a sequence of swipes is classified as belonging to the ${k^{th}}$ class if the average probability for this class is the maximum one.
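The averaging rule of Eqs. 7 and 8 can be sketched as follows. The per-swipe distributions are taken as given (in the study they came from Weka). The function name `classify_sequence` and the example probabilities are made up for illustration, and classes are indexed from 0 here rather than from 1.

```python
from typing import List, Sequence, Tuple


def classify_sequence(distributions: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Average the per-swipe class probabilities and pick the class with the largest mean."""
    T = len(distributions)     # number of swipes in the sequence
    N = len(distributions[0])  # number of classes
    averages = [sum(p[k] for p in distributions) / T for k in range(N)]
    best = max(range(N), key=lambda k: averages[k])
    return best, averages


# Three swipes, two classes: the first swipe alone would vote for class 0,
# but the averaged distribution favours class 1.
example = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
predicted, averages = classify_sequence(example)
```

The example shows why longer sequences can help: a single misleading swipe is outweighed once the probabilities of several swipes are averaged.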

3 Results

3.1 Eysenck Personality Questionnaire Results

Table 2 presents the descriptive statistics of the personality trait scores of the 98 participants.

Table 2. EPQ results for the 98 participants: means, standard deviations (SD), minimum, maximum and median values for the Extraversion, Neuroticism, Psychoticism and Lie scale traits.


Extraversion was negatively correlated with Neuroticism ($r=-0.33, p <0.001$) and Psychoticism ($r=-0.23, p<0.05$). Neuroticism was positively correlated with Psychoticism ($r=0.33, p<0.001$), while Lie scale was negatively correlated with Neuroticism ($r=-0.36, p<0.001$) and positively correlated with Psychoticism ($r=0.26, p<0.01$).

3.2 User Classification Results

We performed user classification using 10-fold cross-validation for classifier evaluation. In this case we had 98 classes (the number of subjects) and 58 samples from each subject. Both the k-NN and the Random forest classifiers were evaluated. Measurements were repeated five times, each time increasing the length of the swipe sequence to be classified. The best classification results were obtained with the Random forest algorithm, especially when using 5 swipes (98.8 % accuracy). The detailed results for the k-NN and Random forest classifiers are shown in Table 3.

Table 3. User classification accuracies with standard deviation in parentheses. Measurements: 10-fold cross-validation. Swipe sequences of length: 1, 2, 3, 4, 5.


3.3 Personality Traits Classification Results

According to the EPQ evaluation procedure (using the Extraversion value obtained for each user), we can split the users into two classes: the more introverted class E1 ($E < 16$) and the more extraverted class E2 ($E \ge 16$). Similarly, we split the dataset along the Neuroticism scale into a less neurotic class N1 ($N < 19$) and a more neurotic class N2 ($N \ge 19$).

These splits resulted in two datasets. The only difference between these datasets is the class information. The class populations of the two datasets are shown in Table 4.

Table 4. Datasets class information.


We evaluated these datasets using the two types of cross-validation described in Sect. 2.2, and the results are presented in Tables 5 and 6. In the case of leave-one-user-out cross-validation, classifiers were trained using data from 97 subjects (5626 instances) and tested using data from a single user (58 instances). After repeating the same procedure for each of the 98 users, the mean and the standard deviation were computed.

Table 5. E and N classification accuracies with standard deviation in parentheses. Measurements: 10 runs, 10-fold cross-validation. Swipe sequences of length: 1-5.


Table 6. E, N and EN classification accuracies with standard deviation in parentheses. Measurements: leave one user out cross-validation. Swipe sequences of length: 1-5.


As can be seen in Table 5, high accuracies were obtained for both datasets, especially with the Random forest classifier and 5 swipes per classification. The best accuracies were 90.4 % for one swipe and 99.4 % for five swipes, both obtained by the Random forest classifier on the E2dataset.

Repeating the same measurements with leave-one-user-out cross-validation resulted in dramatically lower mean accuracies (see Table 6). Using the Random forest classifier on the E2dataset, the accuracies varied from 60.5 % for one swipe to 62.9 % for five swipes. Although the mean accuracy is low, 40 of the 62 users in the more extraverted class of the E2dataset are identified with high accuracy (over 80 %). The distribution of recognition accuracies for users is shown in Fig. 2. It can be seen that the classification accuracy is higher for the E2 class (more extraverted) than for the E1 class (more introverted).

3.4 Statistical Analysis

Statistical analysis was performed using the ttest2 function from MATLAB (The Mathworks, Inc., Natick, MA). A p value below 0.001 was considered significant.

The discriminatory capability of each feature was tested individually using a two-tailed t-test for the difference between the class means, at a significance level of 0.001 and assuming equal variance. The mean values of 8 features out of 9 (all except acceleration_at_start) differ significantly ($p<0.001$) between the two classes of the E2dataset. As for the N2dataset, significant differences in the mean were found only for two features, meangx and meangy.
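The statistic behind an equal-variance two-sample t-test can be sketched as follows. This is not the MATLAB code used in the study, and the sample values are invented for illustration. The sketch computes only the t statistic and the degrees of freedom; obtaining the two-tailed p value additionally requires the cumulative distribution function of Student's t distribution, which a statistics library would provide.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return the t statistic and degrees of freedom, assuming equal variances."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (statistics.variance is the sample variance).
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


# Invented feature values for two classes, for illustration only.
class_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
class_2 = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = pooled_t(class_1, class_2)
```

In the study this test was applied to each of the nine features separately, comparing the feature values of the two classes of a dataset.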

Differences in device holding position can also be inferred from the sample probability distribution of the meangy feature for the classes in the E2dataset (Fig. 3). The mean values for meangy are 5.953 for class E1 and 3.979 for class E2, which is a consequence of the tendency in group E1 to hold the device more vertically. Laying the device flat on the table (meangy around 0) was mostly characteristic of the extraverted group.

4 Discussion

Several research papers have shown that touchscreen swipes contain a large amount of user-specific information and may therefore be used in authentication tasks. In this paper we have evaluated user classification based on the information content of constrained horizontal swipes. Of the two classifiers, Random forest provided the better accuracies in all test cases. Although the accuracy for a single swipe is not outstanding (80.6 %), increasing the number of swipes to five resulted in 98.8 % user classification accuracy.

Another question we have examined is whether personality traits can be reliably predicted from touchscreen swipes using machine learning methods. For this purpose two types of cross-validation measurement were used, namely 10-fold cross-validation and a leave-one-user-out cross-validation. While the first type of cross-validation shows how well a method is performing in the case of a particular known user, the second method shows how well an unseen user’s data is classified according to a criterion (generalisation). We evaluated the two datasets (E2dataset, N2dataset) using binary classifiers. Our classifiers performed well for each dataset using the traditional 10-fold cross-validation (see Table 5). Moreover, the classification accuracies increased in the case where more than one swipe was used for classification.

In order to better reflect reality, we needed to predict the personality traits of an unseen user. This was achieved by using the leave-one-user-out cross-validation method. The results are shown in Table 6. We can see that the best results were obtained for the Extraversion trait, which can be predicted with approximately 60 % accuracy. The large difference between the two types of evaluation may be explained by taking into account the user identity classification results shown in Table 3. These results show that swipes contain a large amount of information about user identity, which helps in recognising the personality traits of a user in the case of 10-fold cross-validation (where each user’s data is also present in the training set). However, when we classify the data of an unseen user (the case of leave-one-user-out cross-validation), the identity information is missing from the training data.

5 Conclusions

In this paper we have analysed user identity and personality related data contained in simple left-right swipes on the touchscreen of a mobile device. The mobile application we developed presents the user with the Hungarian version of Eysenck’s Personality Questionnaire, containing 58 questions, and records usage data for each answer. During our study 98 users completed the test using identical mobile devices. Despite the use of constrained swipes, very good user classification accuracy was obtained by the Random forest classifier. We should mention that this accuracy (98.8 %) was obtained by using 5-swipe sequences.

From the collected data two datasets were created in order to analyse the predictability of users’ personality traits across two dimensions: Extraversion and Neuroticism. These datasets were analysed with a 10-fold cross-validation and a leave-one-user-out cross-validation. Very high accuracies were obtained using the 10-fold cross-validation (over 99 % for 5 swipes), although this method cannot be used reliably to predict the personality traits of an unseen user based on our feature set. Only the leave-one-user-out cross-validation provides a realistic estimate of how well the personality traits of an unseen user can be predicted. Results obtained by this method are slightly better than the chance level (62.9 % average accuracy for the E2dataset and 5 swipes). However, the classification accuracy is over 80 % for two thirds of the more extraverted subjects, which makes this a promising direction for future research.

Our conclusions are drawn taking into consideration the limitation imposed by having only 98 users with 58 samples/user in this study.

References

1. Back, M.D., Stopfer, J.M., Vazire, S., Gaddis, S., Schmukle, S.C., Egloff, B., Gosling, S.D.: Facebook profiles reflect actual personality, not self-idealization. Psychol. Sci. 21(3), 372–374 (2010)
2. Bo, C., Zhang, L., Li, X.Y., Huang, Q., Wang, Y.: Silentsense: Silent user identification via touch and movement behavioral biometrics. In: Proceedings of the 19th Annual International Conference on Mobile Computing & Networking, pp. 187–190. MobiCom 2013, ACM (2013)
3. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
4. Butt, S., Phillips, J.G.: Personality and self reported mobile phone use. Comput. Hum. Behav. 24(2), 346–360 (2008)
5. Chico, E., Tous, J., Lorenzo-Seva, U., Vigil-Colet, A.: Spanish adaptation of Dickman’s impulsivity inventory: its relationship to Eysenck’s personality questionnaire. Pers. Individ. Differ. 35(8), 1883–1892 (2003)
6. Chittaranjan, G., Blom, J., Gatica-Perez, D.: Who’s who with big-five: Analyzing and classifying personality traits with smartphones. In: 2011 15th Annual International Symposium on Wearable Computers (ISWC), pp. 29–36. IEEE (2011)
7. Chittaranjan, G., Blom, J., Gatica-Perez, D.: Mining large-scale smartphone data for personality studies. Pers. Ubiquit. Comput. 17(3), 433–450 (2013)
8. Cornelius, C.T., Kotz, D.F.: Recognizing whether sensors are on the same body. Pervasive Mobile Comput. 8(6), 822–836 (2012)
9. Counts, S., Stecher, K.B.: Self-presentation of personality during online profile creation. In: ICWSM, pp. 191–194 (2009)
10. Dazzi, C.: The Eysenck personality questionnaire-revised (EPQ-R): A confirmation of the factorial structure in the Italian context. Pers. Individ. Differ. 50(6), 790–794 (2011)
11. Eysenck, H.J., et al.: Manual of the Eysenck personality scales (EPS adult) (1991)
12. Eysenck, H.J., Eysenck, S.B.G.: Manual of the Eysenck Personality Questionnaire (junior and adult). Hodder and Stoughton, London (1975)
13. Eysenck, H., Eysenck, M.: A natural science approach (1985)
14. Eysenck, S.B., Barrett, P.T., Barnes, G.E.: A cross-cultural study of personality: Canada and England. Pers. Individ. Differ. 14(1), 1–9 (1993)
15. Feng, T., Liu, Z., Kwon, K.A., Shi, W.: Continuous mobile authentication using touchscreen gestures. In: 2012 IEEE Conference on Technologies for Homeland Security (HST), pp. 451–456 (2012)
16. Frank, M., Biedert, R., Ma, E., Martinovic, I., Song, D.: Touchalytics: On the applicability of touchscreen input as a behavioral biometric for continuous authentication. Inf. Forensics Secur. IEEE Trans. 8(1), 136–148 (2013)
17. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009)
18. Hamburger, Y.A., Ben-Artzi, E.: The relationship between extraversion and neuroticism and the different uses of the internet. Comput. Human Behav. 16(4), 441–449 (2000)
19. Li, L., Zhao, X., Xue, G.: Unobservable re-authentication for smartphones. In: NDSS. The Internet Society (2013)
20. de Montjoye, Y.A., Quoidbach, J., Robic, F., Pentland, A.S.: Predicting personality using novel mobile phone-based metrics. In: Social Computing, Behavioral-Cultural Modeling and Prediction, pp. 48–55. Springer (2013)
21. Phillips, J.G., Butt, S., Blaszczynski, A.: Personality and self-reported use of mobile phones for games. CyberPsychol. Behav. 9(6), 753–758 (2006)
22. Ross, C., Orr, E.S., Sisic, M., Arseneault, J.M., Simmering, M.G., Orr, R.R.: Personality and motivations associated with Facebook use. Comput. Hum. Behav. 25(2), 578–586 (2009)
23. Roy, A., Halevi, T., Memon, N.: An HMM-based behavior modeling approach for continuous mobile authentication. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3789–3793, May 2014
24. Ruch, W.: Die revidierte Fassung des Eysenck Personality Questionnaire und die Konstruktion des deutschen EPQ-R bzw. EPQ-RK. Z. für Differ. Diagnostische Psychol. 20, 1–14 (1999)
25. Sanderman, R., Eysenck, S., Arrindell, W.: Cross-cultural comparisons of personality: The Netherlands and England. Psychol. Reports 69(3f), 1091–1096 (1991)
26. Serwadda, A., Phoha, V., Wang, Z.: Which verifiers work?: A benchmark evaluation of touch-based authentication algorithms. In: 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), pp. 1–8 (2013)
27. Shahzad, M., Liu, A.X., Samuel, A.: Secure unlocking of mobile touch screen devices by simple gestures: You can see it but you can not do it. In: Proceedings of the 19th Annual International Conference on Mobile Computing & Networking, pp. 39–50. MobiCom 2013, ACM, New York, NY, USA (2013)
28. Stecher, K.B., Counts, S.: Spontaneous inference of personality traits and effects on memory for online profiles. In: ICWSM, pp. 118–126 (2008)
29. Tosun, L.P., Lajunen, T.: Does internet use reflect your personality? Relationship between Eysenck’s personality dimensions and internet use. Comput. Hum. Behav. 26(2), 162–167 (2010)


Author information

Authors and Affiliations

Faculty of Technical and Human Sciences, Sapientia University, Soseaua Sighisoarei 1C, 540485, Tirgu Mures/Corunca, Romania
Margit Antal & László Zsolt Szabó
Telekom, Bucharest, Romania
Győző Nemes


Corresponding author

Correspondence to Margit Antal.

Editor information

Editors and Affiliations

Kaunas University of Technology, Kaunas, Lithuania
Giedre Dregvaite
Kaunas University of Technology, Kaunas, Lithuania
Robertas Damasevicius

About this paper

Cite this paper

Antal, M., Szabó, L.Z., Nemes, G. (2016). Predicting User Identity and Personality Traits from Mobile Sensor Data. In: Dregvaite, G., Damasevicius, R. (eds) Information and Software Technologies. ICIST 2016. Communications in Computer and Information Science, vol 639. Springer, Cham. https://doi.org/10.1007/978-3-319-46254-7_13


DOI: https://doi.org/10.1007/978-3-319-46254-7_13
Published: 22 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46253-0
Online ISBN: 978-3-319-46254-7
eBook Packages: Computer Science, Computer Science (R0)
