
1 Introduction

Authentication is the process by which a system recognizes a user or verifies the identity of a user trying to access it. Deploying a robust authentication technique that prevents impersonation is of utmost importance for any personalized system, since it is the primary defence against unauthorized access. Procedures for establishing the identity of a user can be broadly divided into three categories [1]:

  1. Proof by Knowledge—The user's identity is authenticated using information known only to the genuine user (e.g., a password).

  2. Proof by Possession—Authentication relies on an object specific to, and in the possession of, the genuine user (e.g., a smart card).

  3. Proof by Property—The user's identity is validated by measuring certain properties and comparing them against the genuine user's enrolled properties (e.g., biometrics such as fingerprints).

The majority of research in this domain focuses on proof by knowledge, where validation relies on passwords, PINs, or pattern-based techniques. These authentication schemes are particularly vulnerable to shoulder surfing, as shown in Fig. 1. Shoulder surfing is a form of spying used to gain knowledge of a person's password or identity information: the forger or imposter observes the password, PIN, or pattern being entered during authentication and may later use it to impersonate the valid user. Extensive research is being carried out in this field to support applications such as the prevention of e-financial incidents [7]. Most of these applications rely on keystroke patterns [8], biometrics [9, 11], or password entry [10] for authentication. The visual feedback provided by these techniques makes them vulnerable to identity theft. A possible remedy is to exploit the fact that the field of view of the valid user differs from that of an impersonator engaged in shoulder surfing. Combining this observation with the minimization of visual feedback is likely to yield a robust system resistant to user impersonation.

Fig. 1 An instance portraying authentication by a legitimate user while an imposter is applying shoulder surfing

This paper proposes a novel authentication technique to avoid identity theft caused mainly by shoulder surfing. We use a pattern-based authentication technique without visual feedback (unlike the pattern-based authentication used in touch-enabled devices), with a Leap Motion device serving as the sensor that captures the input signal. The device's interface is used to create patterns through on-air gestures. The Leap Motion sensor, a recent release by Leap Motion Inc., captures real-time movement of the hand and fingers and tracks them precisely in three-dimensional space, with a claimed tracking accuracy of 0.01 mm. The device is currently used in various gesture-based applications such as serious gaming [13], human-computer interfaces, augmented reality, and physical rehabilitation [12]. It is a low-cost, compact device that supports a number of frameworks and is fairly accurate; these features make it a good choice compared to similar devices such as Microsoft's Kinect or Intel's RealSense. For proper tracking, the user should place his/her hand within the field of view of the device, which spans about 150\(^{\circ }\) up to a distance of less than a meter. The device comprises a pair of infrared cameras and three LEDs, providing a frame rate varying from 20 to 200 fps. Information regarding the positions of the fingers and palm, as well as the frame time-stamp, can be obtained from each frame.

We have developed a methodology to use this device for authentication on personalized devices. We start by partitioning the 2D screen or display into non-overlapping rectangular blocks and mapping them to the 3D field of view of the device. With each block representing one character or symbol of the alphabet, users are asked to draw patterns in the air. During this process, no visual feedback is provided to the user; therefore, no cursor movement is visible on the screen. The task of recognizing these patterns can be carried out by classifiers such as the Hidden Markov Model (HMM) [5], Support Vector Machine (SVM), or Conditional Random Field (CRF). Here we use an HMM due to its ability to model sequential dependencies and its robustness to intra-user variations. We train an independent HMM for each distinct pattern in the training set, and a given sequence is then verified against all trained models; the model yielding the maximum likelihood is selected.

Fig. 2 Partitioning of the display into non-overlapping blocks and assignment of symbols or alphabets

The rest of the paper is organized as follows. The proposed methodology is presented in Sect. 2. Results obtained on a large set of samples collected in the laboratory from several volunteers are presented in Sect. 3. We conclude in Sect. 4 by highlighting some possible future extensions of the present work.

2 Proposed Methodology of Authentication

This section describes signal acquisition, field-of-view mapping, and the training and testing of the authentication methodology.

2.1 Device Mapping and Feature Extraction

First, we divide the whole screen or display into non-overlapping rectangular boxes and label each of them. As an example, the screen can be arranged as a \(4\times 4\) matrix labelled “A” to “P”, as depicted in Fig. 2. Using the finger and hand tracking utility of the Leap Motion device, we track the movement of the user’s index finger while the authentication gesture is performed. Initially, we provide visual feedback in the form of a visible cursor that gives the user an idea of the initial position of his/her finger with respect to the screen. Before drawing the authentication pattern, the user first executes a predefined gesture (e.g. a circle gesture) that serves as the start marker of the authentication pattern; thereafter the cursor is hidden. With the visual feedback removed, the user draws the pattern through sense and anticipation. We tested various gestures, such as swipe, screen-tap, key-tap, and circle, to determine the best choice for the start marker. The circle gesture was found to be the most suitable and comfortable by the volunteers; based on their feedback, and because executing the gesture should help the user establish the finger position on screen before the cursor is hidden, it was adopted. A minimal sketch of the block labelling is given below.
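The labelling can be sketched in a few lines of Python; the grid size, the row-major label assignment consistent with Fig. 2, and all helper names below are illustrative assumptions rather than details of our actual implementation.

import string

# Label a 4x4 grid "A".."P" (row-major, matching Fig. 2) and map a screen
# coordinate to the block containing it.
ROWS, COLS = 4, 4
LABELS = string.ascii_uppercase[:ROWS * COLS]   # "A".."P"

def block_label(x_s, y_s, screen_w, screen_h):
    """Return the label of the block containing screen point (x_s, y_s)."""
    col = min(int(x_s / screen_w * COLS), COLS - 1)   # clamp right edge
    row = min(int(y_s / screen_h * ROWS), ROWS - 1)   # clamp bottom edge
    return LABELS[row * COLS + col]

print(block_label(960, 540, 1920, 1080))  # centre of a 1920x1080 screen -> "K"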

Fig. 3 Respective coordinate systems and possible mapping

Next, we present the method for mapping the field of view of the Leap Motion device to the display screen (e.g. a computer screen). Since the display screen is rectangular, instead of mapping the device's entire inverted-pyramid interaction space (3D) to the 2D screen, we create an interaction box within the field of view to ease the movement and mapping of the fingers. The height of the interaction box can be set according to the user's preferred interaction height. The respective coordinate systems of the display screen and the Leap Motion device are shown in Fig. 3. From the figure, it is evident that we need to flip the Y-axis of the Leap Motion device to map the coordinates correctly onto the display screen. We normalize the real-world position of the finger so that the coordinates lie between 0 and 1, and then translate these coordinates to the screen position as described in (1) and (2). This lets us localize the finger-tip on the segment of the display screen towards which the finger is pointing. We do not include the Z-axis of the finger's real-world position (with respect to the device), since the movement is portrayed on the 2D screen.

$$\begin{aligned} X_{s} = (X_{n})W_{s} \end{aligned}$$
(1)
$$\begin{aligned} Y_{s} = (1-Y_{n})H_{s} \end{aligned}$$
(2)

where \(X_{s}\) and \(Y_{s}\) represent the X and Y coordinates of the finger position mapped onto the screen, respectively; \(X_{n}\) and \(Y_{n}\) represent the normalized X and Y coordinates of the finger-tip within the field of view of the device; and \(W_{s}\) and \(H_{s}\) represent the width and height of the screen. A sketch of this mapping is given below.
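A direct implementation of (1) and (2) follows. In practice, the normalized coordinates \(X_{n}\) and \(Y_{n}\) would come from the device's interaction box (the v2 Leap SDK exposes an InteractionBox.normalize_point() call; relying on it here is an assumption about the SDK version), but any source of coordinates normalized to [0, 1] works.

def to_screen(x_n, y_n, screen_w, screen_h):
    """Map normalized interaction-box coordinates to screen coordinates."""
    x_s = x_n * screen_w           # Eq. (1): X_s = X_n * W_s
    y_s = (1.0 - y_n) * screen_h   # Eq. (2): Leap's Y grows upward,
                                   # screen Y grows downward, hence the flip
    return x_s, y_s

# A finger at the top-left of the interaction box, (0, 1), maps to the
# screen origin (0, 0); the Z-coordinate is discarded throughout.
print(to_screen(0.0, 1.0, 1920, 1080))  # -> (0.0, 0.0)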

Fig. 4 A sample pattern “AEIJKL” drawn over the field of view of the device and its corresponding 2D mapping onto the screen

Next, the acquisition of authentication patterns under the above mapping is described. Suppose a user wants to draw the pattern “AEIJKL” depicted in Fig. 4. The user needs to move his/her finger within the device's field of view so as to traverse the labelled boxes in the sequence A, E, I, J, K, L. To accomplish this, the user brings his/her finger into the device's field of view and points at box “A”. After making a small circle gesture on box “A” (as described earlier), the user traverses the other boxes in the above order. Although there is no visual feedback, the position of the finger-tip is recorded in every frame, and this information is used for generating the pattern. A pattern of such movement can be represented as

$$\begin{aligned} p = (x_1, y_1), (x_2, y_2), \ldots , (x_k, y_k) \end{aligned}$$
(3)

where p represents the pattern under consideration and (\(x_k\), \(y_k\)) represents the coordinate of the finger-tip with respect to the screen space in the \(k\mathrm{th}\) frame. Figure 5 depicts some of the patterns used in this experiment. A sketch of this acquisition step is given below.
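The sketch below builds the pattern p of (3) from per-frame normalized tip positions, reusing to_screen() and block_label() from the earlier sketches. Collapsing consecutive repeated labels into a string such as “AEIJKL” is an illustrative convenience for inspection; the classifier described next consumes the coordinate sequence of (3).

def frames_to_pattern(tip_positions, screen_w, screen_h):
    """tip_positions: iterable of normalized (x_n, y_n) pairs, one per frame."""
    return [to_screen(x_n, y_n, screen_w, screen_h) for x_n, y_n in tip_positions]

def pattern_to_labels(pattern, screen_w, screen_h):
    """Collapse a coordinate sequence into the string of visited blocks."""
    labels = []
    for x_s, y_s in pattern:
        lab = block_label(x_s, y_s, screen_w, screen_h)
        if not labels or labels[-1] != lab:  # drop consecutive duplicates
            labels.append(lab)
    return "".join(labels)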

Fig. 5 Different test patterns involved in the study

2.2 Training of Hidden Markov Model and Recognition

In this section, we present a methodology to implement the authentication protocol. We apply a Hidden Markov Model (HMM) based stochastic sequential classifier to train our system and classify test patterns. In our authentication scheme, users were asked to register their favourite patterns or secret sequences of symbols.

An HMM is a preferred choice for such pattern classification tasks because of its ability to model sequential dependencies. An HMM is defined by the initial state probabilities \(\pi \), the state transition matrix \(A=[a_{ij}]\), \(i, j = 1, 2, \ldots , N\), where \(a_{ij}\) denotes the transition probability from state i to state j, and the output probability \(b_j(O)\), modelled here with a discrete output probability distribution over S states. After several experiments, we found that \(S=5\) provides optimum results. Vector quantization with 16 clusters is used to discretize the input patterns or sequences. Recognition is performed using the Viterbi decoding algorithm [2,3,4]. We assume that the observation variable depends only on the present state; therefore, a first-order left-to-right Markov model is presumed in the present context. Maximum-likelihood parameter estimation is carried out using the Baum-Welch algorithm, which applies the EM technique to maximize the likelihood, where \(\theta = (A, b_j, \pi )\) describes the hidden Markov chain. The algorithm finds a local maximum of \(\theta \) for a given set of observations Y, as depicted in (4), where Y represents the observation sequence. More on the method can be found in Rabiner's pioneering tutorial on HMMs [5].

$$\begin{aligned} \theta ^* = \max _{\theta } P(Y|\theta ) \end{aligned}$$
(4)

The parameter \(\theta \) that maximizes the probability of the observation can be used to predict the state sequence for a given vector [6]. We compute the probability of observing a particular pattern \(p_j\) using (5), where \(\theta _i\) represents the parameters of the \(i\mathrm{th}\) HMM learned during training and X denotes the hidden state sequence. Finally, given a test pattern, we can classify it into one of the classes using (6), assuming there are C distinct patterns in the dataset.

$$\begin{aligned} P(p_j,\theta _i) = \sum _{X}P(p_j|X,\theta _i)P(X,\theta _i) \end{aligned}$$
(5)
$$\begin{aligned} \arg \max _{\theta _i} P(p_j,\theta _i), \quad i = 1, 2, \ldots , C \end{aligned}$$
(6)

Since all samples, both training and testing, are represented by normalized coordinate vectors, the approach is fairly robust to intra-user variations. In addition, because HMMs naturally handle observation sequences of varying length, the recognition process works fairly well regardless of the length of the coordinate sequence. The procedure is summarized in Algorithm 1, and a sketch of it follows.

Algorithm 1 Training of per-pattern HMMs and maximum-likelihood authentication
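The following is a minimal sketch of this procedure, assuming scikit-learn for the 16-cluster vector quantization and a recent hmmlearn (whose CategoricalHMM models discrete emissions) for the per-pattern models. The libraries and seeding are assumptions, since the text does not name an implementation; hmmlearn's default topology is also ergodic, whereas the text presumes a left-to-right model, which could be enforced by initializing transmat_ as an upper-triangular matrix before fitting.

import numpy as np
from sklearn.cluster import KMeans
from hmmlearn import hmm

N_STATES, N_SYMBOLS = 5, 16   # S = 5 states, 16-symbol vector quantization

def fit_quantizer(train_patterns):
    """train_patterns: list of (T_i, 2) arrays of screen coordinates."""
    return KMeans(n_clusters=N_SYMBOLS, n_init=10,
                  random_state=0).fit(np.vstack(train_patterns))

def quantize(pattern, km):
    """Map a coordinate sequence to a (T, 1) column of discrete symbols."""
    return km.predict(np.asarray(pattern)).reshape(-1, 1)

def train_models(train_sets, km):
    """train_sets: dict {pattern_id: list of coordinate sequences}."""
    models = {}
    for pid, seqs in train_sets.items():
        obs = [quantize(s, km) for s in seqs]
        m = hmm.CategoricalHMM(n_components=N_STATES, n_iter=50,
                               random_state=0)
        m.fit(np.vstack(obs), [len(o) for o in obs])  # Baum-Welch (EM)
        models[pid] = m
    return models

def authenticate(test_pattern, models, km):
    """Return the pattern id with maximum log-likelihood, as in Eq. (6)."""
    x = quantize(test_pattern, km)
    return max(models, key=lambda pid: models[pid].score(x))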

3 Results

This section presents the results of an experiment involving 10 unbiased volunteers. To test the robustness of the proposed system, we selected 10 authentication patterns of varying complexity (simple as well as complex), and users were asked to mimic them. Each volunteer took part in the data acquisition phase after a short demonstration to familiarize them with the Leap Motion device. A total of 1000 patterns were collected; 80 % of this data was used for training and the remaining 20 % for testing. A sketch of this evaluation protocol is given below.
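The protocol can be sketched as follows, reusing fit_quantizer(), train_models(), and authenticate() from the previous sketch; the per-pattern splitting, the seeding, and the use of scikit-learn's confusion_matrix are illustrative choices.

import numpy as np
from sklearn.metrics import confusion_matrix

def evaluate(all_sets, seed=0):
    """all_sets: dict {pattern_id: list of coordinate sequences}."""
    rng = np.random.default_rng(seed)
    train_sets, test_items = {}, []
    for pid, seqs in all_sets.items():
        idx = rng.permutation(len(seqs))
        cut = int(0.8 * len(seqs))                      # 80 % for training
        train_sets[pid] = [seqs[i] for i in idx[:cut]]
        test_items += [(pid, seqs[i]) for i in idx[cut:]]
    km = fit_quantizer([s for seqs in train_sets.values() for s in seqs])
    models = train_models(train_sets, km)
    y_true = [pid for pid, _ in test_items]
    y_pred = [authenticate(s, models, km) for _, s in test_items]
    return confusion_matrix(y_true, y_pred)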

Table 1 Confusion matrix depicting accuracy of test pattern recognition (authentication)
Fig. 6 Illustration of closely matching patterns “DGKP” and “HGKL”

A total of 10 models were created, one for each of the 10 unique patterns (essentially representing 10 distinct users). These HMMs were trained following the procedure described in Algorithm 1. Of the 1000 samples, 800 patterns were used for training and 200 for testing. The confusion matrix of the classification is presented in Table 1. It is evident from the results that the accuracy is quite high for the majority of the patterns. However, a single instance each of patterns 7 and 9 was confused with patterns 9 and 6, respectively; in the latter case, 9 (“HGKL”) was recognized as 6 (“DGKP”). This is because, while trying to draw the “HGKL” pattern, the user may have traversed the path representing “DGKP”, as depicted in Fig. 6. Unintentionally visiting nearby blocks during the gesture can therefore cause the log-in procedure to fail. Our experiments revealed a mismatch in only these two cases; the remaining cases were detected correctly, giving an overall accuracy of 99 %.

4 Conclusion

This paper proposes a novel technique for authentication on personalized devices via patterns drawn without visual feedback. We conclude that eliminating visual feedback during authentication makes the process more robust. Existing touch-less and touch-based systems rely on visual feedback; in contrast, the proposed Leap Motion based interface is robust against shoulder surfing attacks, owing to the difference between the fields of view of the authentic user and the imposter.

The proposed system can be used to design robust authentication schemes for personalized electronic devices, mitigating some of the limitations of existing contact-based or visual-feedback-based authentication mechanisms. However, the system still needs to be tested against real imposter attacks, and further experiments are required to assess its protection against such attacks.