
1 Introduction

User authentication is the process of determining whether a user is genuine (“allowed to access the system”) or an impostor (“prohibited from accessing the system”) [1]. User authentication falls into three classes: knowledge based, object or token based, and biometric based. Knowledge-based authentication is characterized by confidentiality; it relies on something that only the user knows. Object-based authentication is characterized by control; it relies on something that the user has. Biometric-based authentication relies on the user’s physiological or behavioral characteristics; it relies on something the user is. The weakness of the knowledge-based and object-based approaches is that the user may lose or forget passwords and tokens, whereas the advantage of the biometric-based approach is that it can uniquely identify an individual from that individual’s biological characteristics.

Although biometrics make authentication stronger and establish a user’s identity uniquely, verification based on physiological biometrics such as the iris, face, or fingerprint offers mainly one-time, static authentication [2, 3]. To avoid this drawback, behavioral biometrics such as mouse clickstream data can be used to authenticate a user continuously by monitoring the user’s behavior [4]. In this work, an empirical evaluation of three classifiers is conducted on the Balabit dataset [5], which contains data for 10 users with a set of 39 behavioral features per user [6].

The rest of the paper is organized into four sections. Section 2 summarizes previous research in this area. Section 3 describes the Balabit dataset and the feature extraction method. Section 4 describes the model and the experiments, followed by a discussion of the test results. Section 5 offers concluding remarks and suggestions for future work.

2 Related Work

User behavioral analysis has been a focus of research for more than a decade. This section briefly presents some of the research on mouse-based authentication.

Antal et al. [6] applied a Random Forest (RF) classifier to each user’s mouse movements for impostor detection. They used the Balabit dataset [5], which includes 10 users, each with many sessions and mouse actions. They segmented each session’s data into three types of mouse actions: Mouse Movement (MM), Point Click (PC), and Drag and Drop (DD). The researchers extracted 39 features and obtained an average accuracy (ACC) of 80.17% and an average Area Under the Curve (AUC) of 0.87. The highest accuracies, achieved for users 7 and 9, were 93% with an AUC of 0.97. The lowest accuracy, achieved for user 8, was 72% with an AUC of 0.80.

Nakkabi et al. [7] proposed a user authentication scheme based on mouse dynamics. They collected mouse behavior data from 48 users and applied a fuzzy classification that relied on a learning algorithm for multivariate data analysis. They conducted an evaluation and achieved a False Acceptance Rate (FAR) of 0% and a False Rejection Rate (FRR) of 0.36%. Their experiments required more than 2000 mouse events in order to classify a user as legitimate.

Feher et al. [8] introduced a framework for user verification using mouse activities, divided into three parts: acquisition, learning, and verification. The first step captures user actions from the user’s mouse activity. The second step classifies each event by type and stores it in a database. The third step sends each event to the appropriate classifier based on its action type. The classifier has two layers: a prediction layer and a decision layer. The researchers conducted multi-class classification tests using an RF classifier on data collected from 25 volunteers and obtained an Equal Error Rate (EER) of 1.01% based on 30 actions.

Gamboa et al. [9] developed a data acquisition system for collecting users’ mouse activities. The system records all of a user’s interactions while browsing the World Wide Web. The dataset was collected from 50 participants, with 400 strokes per user, where a stroke is defined as a group of points between two actions. The authors proposed 58 behavioral features extracted from the raw data using mathematical operations. These features were used to identify a user based on how they interact with the system. Furthermore, Gamboa et al. developed a sequential classifier using statistical pattern recognition techniques in order to distinguish between users. The authors achieved an equal error rate of 0.7% per 100 mouse strokes.

Another biometric authentication approach based on mouse dynamics was introduced by Shen et al. [10]. They collected user behavioral data in a controlled environment using a software tool they developed, which recorded “mouse move” and “mouse click” events for about thirty minutes per session. The resulting dataset had 15 sessions for each of 28 subjects. Using a pattern-growth mining method, the researchers employed frequent and fixed mouse actions as behavioral patterns for extracting user characteristics. They used an SVM and achieved an FAR of 0.37% and an FRR of 1.12%.

Schulz [11] collected a dataset from 72 volunteers using a software tool installed on their personal machines. The tool implemented a continuous authentication system based on mouse events; it segmented a user’s events into movement-length, curvature, inflection, and curve-straightness features, and then computed the user’s behavioral signature as histograms of these curve characteristics. For the verification stage, the researcher used the Euclidean distance between a login attempt’s mouse activity and the stored signature. An EER of 24.3% was obtained from groups of 60 mouse curves; with groups of 3600 mouse curves, performance improved to an EER of 11.2%.

Bours et al. [12] proposed a login system based on mouse dynamics. They collected data from 28 participants of different age groups using a technique called “follow the maze,” in which the participants traced tracks on their own computers. This task was performed five times per session in order to acquire sufficient mouse-movement data. The maze contained 18 tracks, divided into 9 horizontal and 9 vertical tracks. They compared users’ traces using Euclidean distance, Manhattan distance, and edit distance algorithms, obtaining an EER of 26.8% for the horizontal direction and 27.0% for the vertical direction.

Hashia et al. [13] studied mouse movement as a biometric. They proposed two authentication methods: the first for initial user login (enrollment) and the second for monitoring a computer for suspicious activity (verification). Each method required about 20 s of user interaction. In the enrollment phase, the user moves the mouse to follow a series of dots that appear one at a time on the screen; the mouse coordinates are recorded every 50 ms, from which the speed, deviation from a straight line, and angles are computed. In the verification phase, a user’s current behavior is compared against the data collected during enrollment. They tested their approach with 15 participants aged 22–30, achieving an error rate of 20% when using 1.5 standard deviations from the corresponding enrollment average, and 15% when using 1 standard deviation.

3 Description of Mouse Raw Data

This research used the Balabit Mouse Challenge dataset [5], collected at the Budapest office of the Balabit company. The dataset contains raw data from 10 users working through remote desktop clients connected to remote servers. It comprises many sessions that capture how each person uses a mouse. Each session is a set of rows, where each row records a user action as (rtime, ctime, button, state, x, y): “rtime” is the elapsed time since the start of the session as recorded by the network monitoring device, “ctime” is the elapsed time recorded on the client computer, “button” identifies the mouse button, “state” gives the state of that button, and “x” and “y” are the Cartesian coordinates of the mouse position [6].
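As an illustration of how this raw format can be read, the following sketch loads one session file into a table. It assumes each session is a CSV whose six columns correspond to the fields above; the column names and file path used here are this sketch’s own assumptions, not necessarily the dataset’s headers.

```python
# Minimal reading sketch for one raw session file (assumed CSV layout).
import pandas as pd

COLUMNS = ["rtime", "ctime", "button", "state", "x", "y"]  # illustrative names

def load_session(path: str) -> pd.DataFrame:
    """Load one session of raw mouse events and order it chronologically."""
    events = pd.read_csv(path, header=0, names=COLUMNS)
    # Sort by client time so consecutive rows form a chronological event stream
    return events.sort_values("ctime").reset_index(drop=True)

# Hypothetical usage:
# events = load_session("training_files/user7/session_0123456789")
# print(events.head())
```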

3.1 Extraction of Features

A mouse action is a sequence of consecutive mouse events that together represent a movement of the mouse between two points. This study uses the user features extracted from the Balabit Mouse Challenge dataset [5]. This dataset divides the raw data into three types of actions: MM, PC, and DD. MM describes a movement between two screen positions; PC is a point-and-click (mouse click) action; DD is a drag-and-drop action. The dataset provides 39 features extracted from an individual’s mouse actions; a detailed description of the features is given in [6].
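The listing below is only an illustration of the kind of kinematic statistics such features are built from (duration, path length, average speed, and straightness of one movement segment); it does not reproduce the 39 features of [6].

```python
# Illustrative kinematic quantities for one mouse-movement segment, i.e. a run
# of consecutive x, y, ctime samples. These are NOT the 39 features of [6].
import numpy as np

def movement_features(x, y, t):
    """Return duration, path length, average speed, and straightness."""
    x, y, t = np.asarray(x, float), np.asarray(y, float), np.asarray(t, float)
    dx, dy = np.diff(x), np.diff(y)
    step_len = np.hypot(dx, dy)              # distance between consecutive samples
    duration = t[-1] - t[0]
    path_len = step_len.sum()
    straight = np.hypot(x[-1] - x[0], y[-1] - y[0])  # end-to-end distance
    return {
        "duration": duration,
        "path_length": path_len,
        "avg_speed": path_len / duration if duration > 0 else 0.0,
        # 1.0 means a perfectly straight movement between the endpoints
        "straightness": straight / path_len if path_len > 0 else 0.0,
    }
```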

4 Mouse Dynamics Model and Experimental Results

In this research, supervised machine-learning techniques were used to monitor user behavior in order to distinguish legitimate users from impostors [14]. Three machine-learning algorithms were evaluated: Decision Tree (DT), k-Nearest Neighbors (k-NN), and Random Forest (RF). The Scikit-learn library was used for the analysis of mouse clickstream data [15]. An important preparatory step was to put the training data into CSV format so that it could be read by the classifiers. In the model, if a user’s mouse dynamics match the characteristics stored in the system’s database, the system lets the user continue working on the device; otherwise, the system logs the user out (see Fig. 1). Specifically, the model works as follows:

Fig. 1. User behavioral biometrics model

  • Data Collection Phase: The users’ raw mouse data were collected.

  • Feature Extraction Phase: Meaningful features were extracted from the MM, PC, and DD actions using the method reported in Antal et al. [6].

  • Data Preparation Phase: All the users’ data were aggregated and put in random order. The dataset was then split into two parts: the first part (70% of the data) was used for training, and the second part (30%) was used for testing the model’s performance. For every experiment, the proportions of the training and evaluation sets remained the same in order to avoid classifier bias.

  • Classifier Selection Phase: DT, RF, and KNN were used to show the ability of the proposed model to determine, from a user’s mouse clickstream data, whether the user was genuine or an impostor.

  • Training Phase: The training process began by reading the characteristics of all users from the training set and loading them into the three classifiers. This step is significant because each training record contains both the user’s behavioral features and a class label.

  • Testing Phase: After training was complete, the model was tested on new data that had never been used for training, to categorize each user as either genuine or an impostor (a minimal pipeline sketch follows this list).
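The following sketch illustrates the preparation, training, and testing steps above using scikit-learn. It assumes the prepared CSV holds one mouse action per row, with the 39 feature columns plus a binary “label” column (1 = genuine, 0 = impostor); the file name, column name, and hyperparameters are assumptions rather than the paper’s actual settings.

```python
# Minimal 70/30 train-test pipeline for the three classifiers (assumed CSV layout).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv("prepared_actions.csv")           # hypothetical file name
X, y = data.drop(columns=["label"]), data["label"]

# 70/30 split, shuffled and stratified so both sets keep the same class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, shuffle=True, stratify=y, random_state=42)

classifiers = {
    "DT": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                        # training phase
    acc = accuracy_score(y_test, clf.predict(X_test))  # testing phase
    print(f"{name}: ACC = {acc:.3f}")
```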

The experiment was conducted in two stages: (i) a verification stage and (ii) an authentication stage. Performance was measured using classification accuracy (ACC) and the area under the curve (AUC). The classifiers were also examined by plotting the receiver operating characteristic (ROC) curve, which plots the True Positive Rate (TPR) against the False Positive Rate (FPR) [16].
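One possible way to produce such ROC curves and AUC values with scikit-learn, continuing from the classifiers fitted in the previous sketch, is shown below. It assumes each classifier exposes predict_proba, which holds for DT, KNN, and RF in scikit-learn.

```python
# ROC/AUC sketch, reusing `classifiers`, `X_test`, and `y_test` from above.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

plt.figure()
for name, clf in classifiers.items():
    scores = clf.predict_proba(X_test)[:, 1]   # probability of the genuine class
    fpr, tpr, _ = roc_curve(y_test, scores)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {auc(fpr, tpr):.3f})")

plt.plot([0, 1], [0, 1], linestyle="--", label="chance")  # random-guess baseline
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```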

4.1 Verification Stage

In this stage, all three classifiers were first trained using data that contained only the genuine user’s actions (positive class). Each user has many sessions, and the data from all users’ sessions were placed in one Excel file. Training and testing were then performed for each user with the DT, KNN, and RF classifiers. The goal of the verification stage was to verify whether given mouse data belonged to a particular user. After testing all users with the three classifiers, a verification rate of 100% was achieved.

4.2 Authentication Stage

In this stage, each user’s actions belong to one of two classes: genuine (positive) or impostor (negative). The impostor actions were selected from the other users. Positive actions were labeled {1} and negative actions {0}. The classifiers determine the probability that a user belongs to the genuine class or the impostor class. All classifiers were therefore tested under the following two scenarios (a labeling sketch follows the list):

  A. A single user’s data with all actions (MM, PC, DD)

  B. All the users’ data with a single action (MM, PC, DD)
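As a rough illustration of how scenario A can be set up, the sketch below labels one user’s actions as genuine (1) and all other users’ actions as impostor (0). The “actions” DataFrame and its “user” column are assumptions made for this sketch, not names taken from the dataset.

```python
# Illustrative labeling for scenario A: one genuine user against all others.
import pandas as pd

def scenario_a(actions: pd.DataFrame, genuine_user: int) -> pd.DataFrame:
    """All action types for one user: genuine rows get label 1, others 0."""
    data = actions.copy()
    data["label"] = (data["user"] == genuine_user).astype(int)
    return data.drop(columns=["user"])

# e.g. the binary problem for user 9, ready for the pipeline shown earlier:
# user9_data = scenario_a(actions, genuine_user=9)
```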

Scenario A: A Single User’s Data with All Actions.

In scenario A, an experiment was conducted separately for each user (users 35, 7, 9, 12, 15, 16, 20, 21, 23, and 29) with all actions (MM, PC, and DD), using the three classifiers. The DT, KNN, and RF classifiers achieved average accuracies of 91.9%, 94.4%, and 79.7%, respectively. The highest average accuracy was achieved for user 9 (average ACC 91.8%; DT 96.2%, KNN 99.2%, RF 80.1%). The lowest average accuracy was achieved for user 12 (average ACC 85.6%; DT 90.1%, KNN 91.5%, RF 75.2%). Table 1 reports the detailed results for each user. The AUC value is computed from the FPR and the TPR. ROC curves are given in Figs. 2, 3, and 4.

Table 1. Scenario A: single user, all actions (MM, PC, DD)
Fig. 2. ROC curve for DT, single user, all actions

Fig. 3. ROC curve for KNN, single user, all actions

Fig. 4. ROC curve for RF, single user, all actions

Scenario B: All Users’ Data with a Single Action.

In scenario B, the dataset was first separated into three groups of mouse actions: MM, PC, and DD. Each group contained all users (35, 7, 9, 12, 15, 16, 20, 21, 23, and 29). The three classifiers were then trained and tested on each group separately. The results are reported in Table 2 (MM), Table 3 (PC), and Table 4 (DD). The highest accuracies were achieved with the PC action, as shown in Table 3: DT (ACC 87.6%, AUC 90.3%), KNN (ACC 99.3%, AUC 99.9%), and RF (ACC 89.9%, AUC 92.5%). ROC curves are given in Figs. 5, 6, and 7 for MM, Figs. 8, 9, and 10 for PC, and Figs. 11, 12, and 13 for DD.
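A corresponding sketch for scenario B is given below; it assumes the same hypothetical “actions” DataFrame also carries an “action_type” column holding “MM”, “PC”, or “DD” (again an assumed column name, for illustration only).

```python
# Illustrative grouping for scenario B: all users, one action type at a time.
import pandas as pd

def scenario_b(actions: pd.DataFrame, action_type: str) -> pd.DataFrame:
    """Keep every user but only one action type, e.g. "PC"."""
    return actions[actions["action_type"] == action_type].drop(
        columns=["action_type"])

# pc_actions = scenario_b(actions, "PC")  # then label per user as in scenario A
```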

Table 2. Scenario B: all users, single action (MM action)
Table 3. Scenario B: all users, single action (PC action)
Table 4. Scenario B: all users, single action (DD action)
Fig. 5. ROC curve for DT, all users, MM action

Fig. 6. ROC curve for KNN, all users, MM action

Fig. 7. ROC curve for RF, all users, MM action

Fig. 8. ROC curve for DT, all users, PC action

Fig. 9. ROC curve for KNN, all users, PC action

Fig. 10. ROC curve for RF, all users, PC action

Fig. 11. ROC curve for DT, all users, DD action

Fig. 12. ROC curve for KNN, all users, DD action

Fig. 13. ROC curve for RF, all users, DD action

5 Conclusion

This paper presented a continuous user authentication model based on the analysis of mouse clickstream data. Each of the three machine-learning classifiers used 39 features extracted from the mouse actions MM, PC, and DD. The classifiers were able to distinguish a genuine user from an impostor with reasonable accuracy and AUC.

In the verification phase, the model was able to recognize the user with an accuracy of 100%. In the authentication phase, data containing genuine and impostor actions were examined using two scenarios: (A) a single user with all actions, and (B) a single action with all users. The best results were obtained from scenario B using the PC action: (DT - ACC: 87.6%, AUC: 90.3%), (KNN - ACC: 99.3%, AUC: 99.9%), and (RF - ACC: 89.9%, AUC: 92.5%). In the future, a deep learning model will be constructed using the MM, PC, and DD actions, and its performance will be compared with the traditional classifiers.