Keywords

1 Introduction

User identification is not only key to privacy and security but also offers a way to personalize user experiences, e.g., by displaying user-specific content. Apart from biometric user identification, a non-intrusive alternative is offered by user behavior. In contrast to physical traits, behavioral-based authentication allows for continuous (re-)identification during user sessions. In this context, particularly mouse movements is of interest, as it does not require additional hardware and allows implicit and non-inversive measurements of behavioral biometrics [10, 11, 16, 17].

Similar to gestures in human communication, the dynamics of the pointing device in human-computer interaction are unique and can deliver valuable and deterministic information about the user [2, 9, 11, 16, 17]. However, the question raises how such a system could be reasonably build. Traditionally, a multi-class classification approach suggests itself: every user is identified with a class and a classifier chooses the most likely user among the candidates. While such an approach may work for hardly changing environments, dynamic scenarios with many new and deleted users require frequent retraining of the classifier. For a large user base with many sessions per day, this could quickly become infeasible.

By contrast, we treat user identification as an anomaly detection problem [12, 13, 15] and propose to learn a model of normality for every user. The idea is as follows: As long as the user interacts with the system, the corresponding model correctly identifies the user. If a third user takes over, the identification fails and the model considers the third party as an anomaly and may shut down critical applications and data access points. Maintaining a multitude of these models is simple. Once a user logs in, the right model is retrieved and used until the end of the session. Retraining can be trivially parallelized for all users, new user models are integrated by training a single new model, and a deletion of a user simply deletes the corresponding model without any side effects for other models.

Our contributions are as follows: (i) We cast user identification as an anomaly detection problem, where user profiles are learned in a rich (non-linear) feature space spanned by a set of automatically derived features [13, 15] and in a deep neural architecture [12]. (ii) We evaluate the impact of splitting sessions into sequences including pause-based and an equal number of data points splits. (iii) We report on lessons learned that may shed light on future research in this area.

2 Related Work

Mouse movement has been investigated in the context of user authentication [4, 5, 14] and behavioral analyses [2, 3, 10, 16]; a great deal of these publications rely on hand-engineered features [2, 3, 5, 8, 14, 16] though. User identification based on biometrics extracted from mouse behavior has been first introduced by Gamboa & Fred [6]. They proposed a number of features and split the session into single sequences based on mouse clicks. Features are subsequently reduced by greedy search and fed into a sequential classifier. Feher et al. [5] introduce a hierarchical structure of mouse features, proposing in total 66 features. With these features, a random forest classifier is trained using 30 actions for verification. Recently, Chong et al. [4] investigate different architectures for user authentication using mouse data. However, their approach requires to retain the full model with samples of all users, when a new user is added.

3 Algorithms

Perhaps the most prominent one-class-classifier is the One-Class Support-Vector-Machine (OC-SVM) [13]. Its objective is to find the max-margin hyperplane that separates the origin from the data, where the latter is mapped by a function \(\phi \) into a (possibly nonlinear) feature space spanned by \(\phi :\mathcal {X}\mapsto \mathcal {F}\). Given a training set \(\mathcal {D} = \{x_1, ...,x_n\}\) with \(x_i \in \mathcal {X}\), the primal problem of the OC-SVM can be written as

$$\begin{aligned} \min _{\varvec{\omega }, \rho , \varvec{\xi }} \quad \frac{1}{2} \Vert \varvec{\omega }\Vert ^2 - \rho + \frac{1}{\nu n} \mathbf {1}^\top \xi \quad \text {s.t.}\,\, \forall \,\,i: \,\, \varvec{\omega }^\top \phi (\textit{\textbf{x}}_i) \ge \rho - \xi _i \,\,\wedge \,\, \xi _i\ge 0 \end{aligned}$$

where \(\rho \) is the distance of the hyperplane to the origin and acts as a threshold such that a new instance \(\textit{\textbf{x}}\) is considered anomalous (not belonging to the class that is represented by data \(\mathcal {D}\)) if \(f(\textit{\textbf{x}})=\varvec{\omega }^\top \phi (\textit{\textbf{x}}) - \rho <0\).

The Support Vector Data Description (SVDD) [15] is similar to the OC-SVM, but uses a hypersphere as a model of normality. The objective of the SVDD is to find the smallest hypersphere, given by radius \(R > 0\) and center \(c \in \mathcal {F}\), which encloses the majority of the data in feature space. The primal optimization problem is given by

$$\begin{aligned} \min _{\textit{\textbf{c}}, R, \varvec{\xi }} \;\;\; R^2 + \frac{1}{\nu n} \mathbf {1}^\top \xi \quad \text {s.t.}\,\, \forall \,\,i: \,\, \Vert \phi (\textit{\textbf{x}}_i) - \textit{\textbf{c}} \Vert ^2 \le R^2 + \xi _i\,\,\wedge \,\, \xi _i \ge 0. \end{aligned}$$

New points are considered anomalous if they lie outside of the hyperball, that is, if \(\Vert \phi (\textit{\textbf{x}}) - \textit{\textbf{c}}\Vert ^2>R^2\).

Recently, [12] presented a deep variant of the SVDD. An autoencoder is used for dimensionality reduction while a second part of the network minimizes the volume of the data-enclosing hypersphere. The objective of the Deep SVDD [12] is given by

$$\begin{aligned} \min _{\textit{\textbf{c}},R, \mathcal {W}} \;\; R^2 + \frac{1}{\nu n} \sum _{i = 1}^{n}\max \{0,\Vert \phi (\textit{\textbf{x}}_i; \mathcal {W}) -\textit{\textbf{c}}\Vert ^2 - R^2 \} + \frac{\lambda }{2}\sum _{l=1}^{L}\Vert \mathcal {W}^l\Vert ^2\quad \text {s.t.} \,\, R > 0 \end{aligned}$$

The second term penalizes points lying outside the sphere analogously to the traditional SVDD.

4 Empirical Study

We use data from the Balabit Mouse Dynamic ChallengeFootnote 1 that comprises sessions from ten different users. The training data encompasses five to seven longer sessions for each user while the test set contains multiple smaller sessions. The test set contains also out-of-sample users not present in the training set as well as sessions from anonymous attackers. The latter are simulated by mixing sessions from other users into the test session of a third user. Table 1 summarizes the data.

Table 1. Overview of Balabit Mouse Dynamics Challenge dataset

4.1 Session Splitting

We split the mouse movement within a session into short sequences. We investigate two different splitting criteria, the first Time Difference Split (TDS) and Equal Number Of Data Points Split (EDPS).

The former splits the session by time differences between two consecutive mouse coordinates. The pauses made by the user during the interaction with the system are an active field of mouse movement research [5, 6, 16]. Our approach is similar to [4], but instead of setting a hyper-parameter for the time difference splitting criterion, we determine the parameter based on the users’ mouse data using quantiles. We study the effect of splitting mouse movements at 95%, 98% and 99% quantiles. This leads to a unique splitting criterion for every user, see Fig. 1.

EDPS splits mouse data into sequences by using a fixed number of data points. We investigating different lengths of sequences (\(m \in {\{50, 100, 200, 500\}}\)). Using a fixed number of data points as a splitting criterion ensures that the session is separated and provides sequences of the same length, see Fig. 2.

The splitted sequences are cleaned and the resulting logs contain the following variables: timestamps, (x, y) coordinates, mouse buttons (left, right, scroll) and the action type (move, pressed, released, drag). Since the velocity of a scroll is not given we discard the related actions and ignore scroll operations entirely. We compare the 65 features from [5] with additional 12 features described in the Appendix (Table 2). All features are normalized.

Fig. 1.
figure 1

Sequences produced by the TDS-method, using the 98% quantile of the overall pauses as a splitting criterion. The first row shows legal, while the second row shows illegal sessions of user 7.

Fig. 2.
figure 2

Sequences produced by the EDPS-method, using 200 data points as a splitting criterion. The first row shows legal, while the second row shows illegal sessions of user 7.

We evaluate area under the curve (AUC) and the equal error rate (EER). The latter is identical to the intersection of the false acceptance rate (FAR) and the false rejection rate (FRR). To not clutter the evaluation part unnecessarily, we report only results for TDS using the 99% quantile and EDPS with length 100 that worked best over all tested parameter settings.

We compare OC-SVM, SVDD, and Deep SVDD and also include a vanilla SVM trained in a one-versus-rest manner, denoted by OvR-SVM for interpretability. The results are shown in Fig. 3 for TDS-99% and Fig. 4 for EDPS-50.

Unsurprisingly, the OvR-SVM clearly outperforms the one-class approaches. However, OvR-SVM also uses more information by including unified data from all other users as the negative class in the training process. Thus, OvR-SVM shows that there is room for improvement for the methods of interest, but, by construction, poses a solution that is clearly inappropriate in many practical scenarios. Also unsurprisingly, OC-SVM and SVDD perform equivalent throughout the experiment; for certain normalized feature representation, their objective functions become identical and provide naturally the same solution. The Deep SVDD performs well for user 7, 9 and 20 for sequences derives by the EDPS- as well as the TDS-method on 65 and 77 respectively. This finding gives rise to two conjectures: The first is that some users can, in general, be identified by their mouse movement as was also shown e.g., in [1, 4,5,6, 14]. And second, that perhaps the feature representation was simply not the right one for the other users. Thus, it can be hypothesised that the features learned for the authentication process have to be individualized so that the detection performance is maximized (see e.g., [7]).

Fig. 3.
figure 3

Results for TDS-99%

Fig. 4.
figure 4

Results for EDPS-100

5 Conclusions

We studied user identification by mouse movements. Conceptually, we interpreted the problem setting as an anomaly detection problem and evaluated traditional (OC-SVM, SVDD, OvR) and recent (DeepSVDD) methods. Preliminary empirical results showed that some users can actually be identified solely based on their mouse movement. This finding however does not hold for most of the users. Our lessons learned is twofold: (i) We conjecture that mouse behaviour is idiosyncratic, which is in line with other studies [1, 4,5,6, 14], and (ii) that we might be able to improve user identification by tailoring (learning) an individual feature representation for every user.