1 Introduction

The prevalence of smartphones has increased rapidly thanks to their improved interactive features and sensing capabilities, and people now rely on them to accomplish a wide range of everyday tasks. With this growing dependence, strengthening smartphone security has become an urgent need. The most commonly addressed mobile security features are authentication mechanisms, traditionally explicit methods such as the personal identification number (PIN), patterns, and passwords. However, these are all one-time authentication methods that require active user participation, which is inconvenient, and they are easy to breach: an attacker can defeat them through shoulder surfing (Zakaria et al. 2011) or smudge attacks (Aviv et al. 2010). Recently, researchers have proposed biometric authentication methods that exploit built-in smartphone sensors such as accelerometers and gyroscopes. These mechanisms rely on the user’s physiological or behavioral characteristics. Physiological mechanisms are based on body characteristics such as fingerprints, facial features or retina images. Unfortunately, physiological biometric-based mechanisms are also one-time authentication methods, their performance is heavily influenced by external factors, and they require specialized equipment to perform biometric scanning (Salem et al. 2016). Behavioral authentication methods, in contrast, are based on how the user interacts with the smartphone during everyday use, such as touchscreen interaction (Frank et al. 2013; Kambourakis et al. 2016), gait (Damaševičius et al. 2016; Hoang et al. 2013) and hand motions (Sitová et al. 2016). This type of authentication relies on features of the user’s behavior that remain stable over time (Alzubaidi and Kalita 2016). Behavioral mechanisms authenticate the user continuously without his/her intervention, and no additional hardware is required to identify the smartphone’s owner. However, the majority of existing methods use a single behavioral biometric, suffer from low accuracy, and have not achieved performance good enough for real-world deployment (Do et al. 2014). Therefore, recent academic research has adopted multimodal biometrics to enhance authentication performance (Akhtar et al. 2017; Galdi et al. 2016).

From the above, it is evident that there is a crucial need for multimodal authentication mechanisms that continuously authenticate the smartphone user without his/her intervention. By leveraging the capabilities of today’s multi-sensor smartphones, sensor inputs such as gait signals and keystroke dynamics can serve as sources of authentication data gathered without user awareness. Most people walk while performing routine activities, and texting while walking is among the most common. Data acquired from the built-in smartphone sensors during simultaneous walking and typing therefore makes it possible to build accurate behavioral patterns for the user. As detailed in Sect. 2, academic research has demonstrated that keystroke dynamics and gait patterns are unique to each user. These two modalities are widely used in smartphone authentication studies, but most of this research treats keystrokes or gait as a single behavioral biometric or combines them with other biometrics. None of the existing work actually combines these two biometrics, using a real multimodal dataset, to build a user profile for authentication purposes; this is the main contribution of our work.

This paper makes use of gait patterns and keystroke dynamics to build a new multimodal authentication method. The proposed method continuously acquires the user’s gait signal together with keystroke dynamics during simultaneous walking and text input, using the smartphone’s built-in sensors without explicitly seeking user cooperation. Moreover, to reduce the impact of continuous sensing on battery life, the proposed mechanism uses only the accelerometer, considered the most energy-efficient sensor, to measure acceleration during movement. To demonstrate the efficacy of the proposed mechanism, a real multimodal dataset of keystroke dynamics and gait patterns was collected from 20 volunteers across various scenarios. The experimental results show that the proposed authentication method is able to enhance the smartphone’s security.

The rest of this paper is organized as follows. Section 2 addresses related work. Section 3 describes the proposed framework architecture, and Sect. 4 details the proposed authentication method. Section 5 reports the experimental results. Section 6 discusses the usability of the approach. Finally, we conclude this paper and outline further directions for this work in Sect. 7.

2 Related works

Several studies have already applied gait patterns and keystroke dynamics to continuous authentication. In this section, we categorize them as follows.

2.1 Gait patterns-based authentication solutions

The gait modality has received considerable attention in previous work through proposals of different gait-pattern authentication methods that use accelerometer sensors for user identity recognition. Earlier studies (Derawi et al. 2010; Mantyjarvi et al. 2005) demonstrated that gait signals acquired with three-dimensional accelerometers could be used to identify mobile phone users. Unlike previous work, Hoang et al. (2013) proposed a gait authentication mechanism that is robust to installation errors, using both the built-in accelerometer and magnetometer, with a novel segmentation algorithm to split the signal into separate gait cycles. An experiment with 38 volunteers achieved approximately 94.93% accuracy in identification mode, and a false match rate (FMR) of 0%, a false non-match rate (FNMR) of 3.89% and a processing time of less than 4 s in authentication mode. Later, Choi et al. (2014) proposed a set of new gait signature metrics for recognizing different walking patterns that efficiently extract distinctive gait characteristics and identify an individual from a list of subjects. Recently, Zhang et al. (2015) introduced an accelerometer-based gait recognition method that avoids cycle detection failures and inter-cycle phase misalignment by combining a multi-scale signature point (SP) extraction method, an SP sparse-encoding scheme with implicit consideration of phase propinquity, and a classifier for sparse-code collection (CSCC). This methodology achieved an accuracy of 95.8% for identification and an equal error rate (EER) of 2.2% for verification. Zhong et al. (2015) proposed a pace-independent mobile gait biometric algorithm to address the challenges of varying walking speed and sensor rotation; on a realistic mobile gait dataset of 51 subjects it achieved an EER of 7.22%, a performance improvement of 37%. In addition to these studies, a number of other methods have been proposed for gait-based recognition on mobile devices using various types of features and classification algorithms (Damaševičius et al. 2016; Muaaz and Mayrhofer 2013, 2014; Zhao and Zhou 2017).

2.2 Keystroke dynamics based authentication solutions

The application of keystroke dynamics to continuous authentication is not entirely new, as it derives from research on authenticating computer access (Brown and Rogers 1993; Gunetti and Picardi 2005; Monrose and Rubin 2000). With the interactive features of present-day touchscreen smartphones, typing behavior has become easier to capture from virtual keyboards, with additional features such as pressure and finger area. Antal et al. (2015) examined the effect of these additional touchscreen features on identification and verification performance and concluded that adding these feature sets improves the accuracy of both processes, whereas Buschek et al. (2015) showed that including spatial touch features reduces implicit authentication EER by 26.4–36.8% relative to the previously used temporal features. Keystroke dynamics are also used in multi-factor authentication methods to strengthen user authentication on smartphones: Salem et al. (2016) proposed a user verification and identification system for touchscreen mobile devices that uses keystroke dynamics as a second authentication factor alongside a password; the model achieved a false acceptance rate (FAR) of 2.2%, a false rejection rate (FRR) of 8.67% and an EER of 5.43%, demonstrating that keystroke dynamics provide an acceptable level of performance as a second authentication factor. Kambourakis et al. (2016) proposed a two-factor touch-stroke user authentication method to discriminate between the legitimate user and intruders; experimental results with 20 participants showed that touch stroking has significant potential for designing enhanced authentication systems for smartphones. To investigate the effectiveness of sensor-enhanced keystroke dynamics, Stanciu et al. (2016) implemented a statistical attack against sensor-enhanced keystroke dynamics and evaluated its impact on detection accuracy; the results showed that sensor-enhanced keystroke dynamics are generally robust against statistical attacks, with a marginal EER impact (< 0.14%). Moreover, several other studies using different authentication algorithms and classification features have also achieved promising results in continuous smartphone authentication based on keystroke dynamics (Alsultan et al. 2016; Antal and Szabó 2015; Bours and Mondal 2015; Kang and Cho 2015).

2.3 Gait patterns and keystroke dynamics in multimodal based authentication solutions

The majority of the aforementioned works use gait patterns or keystroke dynamics as a single behavioral biometric to recognize the user’s identity. Despite their inherent advantages, unimodal systems face numerous challenges, such as the susceptibility of the biometric sensor to outside factors, the changing emotional or physical state of the user, and poor data acquisition. To overcome these limitations, research has moved from unimodal to multimodal biometrics to increase authentication performance. In Saevanee et al. (2012), keystroke dynamics were combined with behavioral and linguistic profiling to discriminate users, and matching-level fusion methods were applied to study the feasibility of the proposed system; the results showed that matching-level fusion could improve classification performance, with an overall EER of 8%. Later, Crawford and Renaud (2014) and Crawford et al. (2013) combined keystroke dynamics and voice recognition on mobile systems to identify the device owner; initial results showed that the described transparent authentication framework is effective in increasing both usability and security. Furthermore, Damer et al. (2016) introduced a multi-biometric continuous authentication solution that fuses information from face images and the keystroke dynamics of the user. Most existing multimodal studies involving gait capture it with video cameras at a distance, coupled with face recognition (Almohammad et al. 2012; Guan et al. 2013; Hofmann et al. 2012; Hossain and Chetty 2011; Nanda et al. 2017; Xing et al. 2015), or with body-worn motion recording sensors (Tao et al. 2018), such as the work of Vildjiounaite et al. (2006), where gait patterns were combined with voice recognition for user authentication. The only related study within our scope of research is by Do et al. (2014), which combined gait biometrics acquired from the smartphone’s built-in accelerometer and magnetometer sensors with keystroke dynamics. A virtual dataset was created by fusing gait and keystroke dynamics, with fusion operating at both the feature extraction level and the matching score level. The proposed methodology achieved a recognition rate of approximately 97.86% in identification mode and an EER of approximately 1.11% in authentication mode.

3 The system framework

Figure 1 represents the proposed framework architecture. Specifically, the proposed framework is divided into two phases: enrolment and authentication.

Fig. 1 The proposed smartphone user authentication framework

-Enrolment phase: initiated by the user’s text input and walking, the system acquires gait signals and keystroke dynamics from the built-in smartphone sensors. The collected biometrics are first separately preprocessed and analyzed to extract features. The extracted features are then processed to obtain the final data used to build the user’s behavioral profile template.

-Authentication phase: the system autonomously collects new sensor samples whenever there is an attempt to manipulate the smartphone and compares them with the stored template. The user is authorized to access the smartphone’s services only upon a successful match; otherwise, the user is classified as an imposter.

A detailed description of the authentication process will be explained in the following sections.
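As an illustrative sketch only, the snippet below shows how the two phases could map onto a standard machine-learning workflow in Python; the function names, the scikit-learn MLP classifier and the 0.5 decision threshold are our assumptions, and the classifiers actually evaluated are described in Sect. 5.

```python
from sklearn.neural_network import MLPClassifier

def enroll(feature_vectors, labels):
    """Enrolment: build the behavioural profile template as a binary classifier
    trained on fused gait + keystroke feature vectors (1 = owner, 0 = impostor)."""
    template = MLPClassifier(max_iter=500)
    template.fit(feature_vectors, labels)
    return template

def authenticate(template, new_sample, threshold=0.5):
    """Authentication: accept access only if the stored template scores the new
    feature vector above the decision threshold; otherwise treat it as an impostor."""
    score = template.predict_proba([new_sample])[0, 1]
    return score >= threshold
```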

4 The methodology

4.1 Data acquisition

Many dataset resources containing unimodal biometrics are accessible to academic research; however, no realistic multimodal dataset based on gait and keystroke dynamics is available. In this paper, we collected a real multimodal dataset from 20 subjects with a balanced gender distribution, aged 22–33 years, using a Xiaomi 2S mobile phone running Android 5.0.2. For the data collection task, we developed a customized Android application with a virtual keyboard for accelerometer and keystroke data collection (see Fig. 2). The application collects three-dimensional accelerometer data (X, Y, and Z axes) while the smartphone user is walking, together with the user’s input rhythm. The default Android keyboard only exposes the IDs of pressed keys, whereas, as detailed in Sect. 4.3, additional features such as the pressure and the size of the touch area are required in our work to construct the user’s behavioral profile template. Hence, a virtual keyboard was designed using the MotionEvent and Gesture classes provided by the Android SDK (Software Development Kit) and set as the default keyboard in the device settings, so that the additional keystroke features could be collected from all applications that require typing.

Fig. 2 The proposed application and the keyboard interface used for accelerometer and keystroke data collection

To evaluate the effectiveness of the proposed method, a real dataset is constructed under realistic acquisition scenarios, based on how users interact with their smartphones during their routine activities. In this paper, data is collected from each participant using the following four scenarios:

  • In the first scenario, the participants were asked to put the smartphone into their trouser pocket and walk as naturally as possible in a straight corridor.

  • In the second scenario, the participants were asked to hold the smartphone freely in hand and walk.

  • In the third scenario, the participants were asked to answer a call while walking.

  • In the last scenario, the participants were asked to walk in their usual manner and type the phrase “the quick brown fox jumped over the lazy ghost.”, which contains every letter of the alphabet, spaces and a final period to indicate completion of the typing process. This sentence has become widely known within keystroke analysis and has been adopted by many studies (Kambourakis et al. 2016; Lau et al. 2004). The set text provided a controlled variable to ensure a similar amount of data for all participants.

In this scenario, participants could use any application that requires typing, as long as they used the designed keyboard. They were also allowed to make mistakes and use the backspace key for corrections. It should be mentioned that each scenario was repeated five times and that the participants activated the accelerometer when they started walking and stopped it upon reaching the end of the corridor.

4.2 Gait data pre-processing and feature extraction

4.2.1 Data interpolation

The accelerometer sensor reports output values only when the forces acting on the three axes of the smartphone change considerably, which keeps power consumption low. As a consequence, acceleration signals are acquired at a variable sampling rate: in our study, the device’s sampling rate is not stable, resulting in non-constant time intervals between successive samples. Therefore, the acquired signal is resampled to 50 Hz using linear interpolation to correct the irregular time intervals.
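A minimal sketch of this resampling step, assuming per-axis arrays of raw samples with timestamps in seconds (the function name and the NumPy-based implementation are ours, not the paper’s):

```python
import numpy as np

def resample_to_50hz(timestamps_s, axis_values, fs=50.0):
    """Resample one irregularly sampled accelerometer axis onto a uniform
    50 Hz grid using linear interpolation."""
    t_uniform = np.arange(timestamps_s[0], timestamps_s[-1], 1.0 / fs)
    return t_uniform, np.interp(t_uniform, timestamps_s, axis_values)

# usage (hypothetical arrays): t_u, ax_u = resample_to_50hz(t_raw, ax_raw)
```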

4.2.2 Noise elimination

Gait signal acquisition using the built-in accelerometer is sensitive to noise. Mobile accelerometers produce considerably more noise than standalone sensors, since their functionality is fully governed by the mobile OS layer (Hoang et al. 2013). Therefore, noise must be eliminated to improve the quality of the acquired signal. In our study, outlier observations are first detected and removed from the data; then an FIR low-pass filter with a passband cutoff frequency of fp = 0.9 Hz, designed using MATLAB, is applied to the acquired signal. An FIR design was chosen because it allows linear-phase filtering, meaning the filter introduces no phase distortion across the frequency band.
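A possible implementation of this preprocessing step is sketched below; the 3-sigma outlier clipping and the use of zero-phase filtering (filtfilt) over the full recording are our assumptions, while the 0.9 Hz passband edge and the FIR design follow the text.

```python
import numpy as np
from scipy import signal

def denoise(axis_samples, fs=50.0, fp=0.9, numtaps=101):
    """Suppress outliers, then low-pass filter one resampled accelerometer axis."""
    x = np.asarray(axis_samples, dtype=float)
    mu, sigma = x.mean(), x.std()
    x = np.clip(x, mu - 3 * sigma, mu + 3 * sigma)   # 3-sigma outlier clipping (our choice)
    taps = signal.firwin(numtaps, fp, fs=fs)         # linear-phase FIR, 0.9 Hz passband edge
    return signal.filtfilt(taps, [1.0], x)           # zero-phase filtering of the full recording
```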

4.2.3 Segmentation and feature extraction

In this paper, the adopted segmentation method is based on a sliding-window algorithm, as sliding-window-based methods are the most commonly used in activity recognition studies (Bersch et al. 2014; Niazi et al. 2017; Ravi et al. 2005). The raw data is divided into windows (i.e., segments) with a fixed length of 10 s and 50% overlap. A feature extraction method is then applied to construct a feature vector that is later fed to the classifier. A combination of features from both the time and frequency domains is extracted from four components, as shown in Fig. 3, which plots the accelerometer readings of two randomly selected users on the X-axis, Y-axis, Z-axis and the magnitude axis $a_{xyz}$, where $a_{xyz}$ is defined as:

Fig. 3 Example of data samples captured from two randomly selected users

$${a_{xyz}}=\sqrt {a_{x}^{2}+a_{y}^{2}+a_{z}^{2}} .$$
(1)
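The windowing described above might be implemented as follows (a sketch; window length and overlap follow the text, the function names are ours):

```python
import numpy as np

def magnitude(ax, ay, az):
    """Resultant acceleration a_xyz = sqrt(ax^2 + ay^2 + az^2), Eq. (1)."""
    return np.sqrt(ax ** 2 + ay ** 2 + az ** 2)

def sliding_windows(x, fs=50, win_seconds=10, overlap=0.5):
    """Split a 1-D signal into fixed 10 s windows with 50% overlap."""
    win = int(win_seconds * fs)              # 500 samples per window at 50 Hz
    hop = int(win * (1 - overlap))           # 250-sample hop between windows
    return [x[i:i + win] for i in range(0, len(x) - win + 1, hop)]
```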

Frequency-domain features are derived from the fast Fourier transform (FFT) computed on each window for the four types of acceleration. A set of statistical features comprising the maximum, minimum, mean, median and standard deviation is derived, within each window, from each of the acceleration components mentioned above, in both the time and frequency domains. We extended this set by adding the following features (a code sketch of the per-window feature extraction is given after the list):

  • Average absolute difference

$$MD=avg\left( {\left| {{x_t} - \overline {x} } \right|} \right),$$
(2)

where $x_t$ is a data point in the time series of a window and $\overline{x}$ is the mean of the window.

  • Spectral centroid: in this case, the spectrum refers to the identification window of acceleration values (Singha et al. 2017). For each of the four types of acceleration, the spectral centroid of each window is calculated as follows:

    $$C=\frac{1}{l}\sum\limits_{k=1}^{l} x_{tk} \, f_{tk},$$
    (3)

where

$x_{tk}$ is a data point in the time series of a window,

$f_{tk}$ is a data point in the frequency series of a window,

$l$ is the length of the window.

  • Cross-correlation refers to the correlation between the entries of two vectors. In this paper, three correlations are calculated:

$Corr_{xy}$ cross-correlation of the x-axis and y-axis

$Corr_{xz}$ cross-correlation of the x-axis and z-axis

$Corr_{yz}$ cross-correlation of the y-axis and z-axis

  • Energy: is the normalized summation of absolute values of Discrete Fourier Transform of a windowed signal sequence.

  • The first ten FFT coefficients:

$${X_k}=\sum\limits_{{n=0}}^{{N - 1}} {{x_n}} {e^{ - \frac{{i2\pi kn}}{N}}},\,\,\,\,\,\,\,k=0, \ldots 9$$
(4)
  • The first ten DCT coefficients:

$$X_k = w_k \sum\limits_{n=1}^{N} x_n \cos\!\left(\frac{\pi (2n-1)(k-1)}{2N}\right), \qquad k=1,\ldots,10,$$

where

$$w_k=\begin{cases} \dfrac{1}{\sqrt{N}} & k=1 \\ \sqrt{\dfrac{2}{N}} & 2 \leq k \leq N. \end{cases}$$
(5)
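Under the definitions above, the per-window feature extraction could look roughly like the following sketch; the use of FFT magnitudes, the exact normalizations and our reading of Eq. (3) as a product of the time and frequency series of the window are assumptions rather than the paper’s stated implementation.

```python
import numpy as np
from scipy.fftpack import dct

def window_features(w):
    """Per-window descriptors for one acceleration component (x, y, z or magnitude)."""
    w = np.asarray(w, dtype=float)
    spec = np.abs(np.fft.fft(w))                      # frequency-domain series of the window
    stats = lambda v: [v.max(), v.min(), v.mean(), np.median(v), v.std()]
    feats = stats(w) + stats(spec)                    # time- and frequency-domain statistics
    feats.append(np.mean(np.abs(w - w.mean())))       # average absolute difference, Eq. (2)
    feats.append(np.sum(w * spec) / len(w))           # spectral centroid as we read Eq. (3)
    feats.append(np.sum(spec) / len(w))               # energy: normalized sum of DFT magnitudes
    feats.extend(spec[:10])                           # magnitudes of the first ten FFT coefficients, Eq. (4)
    feats.extend(dct(w, norm='ortho')[:10])           # first ten DCT coefficients, Eq. (5)
    return feats

def axis_correlations(wx, wy, wz):
    """Pairwise cross-correlations Corr_xy, Corr_xz and Corr_yz for one window."""
    return [np.corrcoef(wx, wy)[0, 1],
            np.corrcoef(wx, wz)[0, 1],
            np.corrcoef(wy, wz)[0, 1]]
```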

4.3 Keystroke dynamics data pre-processing and feature extraction

Once the keystroke input data has been collected, it is cleaned by eliminating missing entries. Various feature extraction methods are used in keystroke analysis research; the most widely employed features focus on timing information, such as the moments when keys are pressed and released. In this work, the following timing features have been extracted:

  • Digraphs: the time latencies between two successive keystrokes (Zhong and Deng 2015), including dwell time (the holding time of a key) and flight time. The following types of latencies are used:

  • Down–up (hold time) (DU): is the time interval between pressing and releasing the same key.

  • Down–down key latency (DD): is the time interval between a key press and the next key press.

  • Up–down key latency (UD): is the time interval between the release of a key and the pressing of the next key.

  • Up–up key latency (UU): is the time interval between the release of a key and release of the next key.

  • Trigraph: the time interval between every other key press, i.e., between a key press and the press of the key after the next.

In our study, the following additional non-timing features are extracted from the keystroke data (a code sketch covering both the timing and non-timing features is given after the list):

  • Pressure: the pressure exerted on the keyboard when the key is pressed.

  • Size of the touch area: size of the touch area when the user’s finger presses a key.

  • Typing speed: the average time to press and release a key.

  • Typing error: the number of times the backspace key is pressed.
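A sketch of the keystroke feature extraction, assuming each key event is recorded as a dictionary with a key identifier, press/release timestamps in milliseconds, pressure and touch size; the aggregation by mean over a typing session and the 'BACKSPACE' key name are our assumptions.

```python
import numpy as np

def keystroke_features(events):
    """Timing and non-timing keystroke features from an ordered list of key events,
    each a dict like {'key', 'down', 'up', 'pressure', 'size'} with times in ms."""
    du = [e['up'] - e['down'] for e in events]                         # hold time (DU)
    dd = [b['down'] - a['down'] for a, b in zip(events, events[1:])]   # down-down latency (DD)
    ud = [b['down'] - a['up'] for a, b in zip(events, events[1:])]     # up-down latency (UD)
    uu = [b['up'] - a['up'] for a, b in zip(events, events[1:])]       # up-up latency (UU)
    tri = [c['down'] - a['down'] for a, c in zip(events, events[2:])]  # trigraph latency
    return {
        'DU': np.mean(du), 'DD': np.mean(dd), 'UD': np.mean(ud),
        'UU': np.mean(uu), 'TRI': np.mean(tri),
        'pressure': np.mean([e['pressure'] for e in events]),
        'touch_size': np.mean([e['size'] for e in events]),
        'typing_speed': np.mean(du),                                   # average press/release time
        'typing_errors': sum(e['key'] == 'BACKSPACE' for e in events), # backspace count (assumed key name)
    }
```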

4.4 Fusion method

After preprocessing and analyzing the acquired gait and keystroke dynamics data, the constructed feature vectors are combined into a final feature vector that is fed to a machine learning classifier to determine whether the user is genuine or an imposter. Feature-level fusion is believed to be more effective because a feature set contains richer information about the input biometric data than the matching score or the output decision of a classifier (Ross and Jain 2004). A simple feature fusion method is to concatenate the individual feature vectors into a single vector. Let $X = \{x_1, x_2, \ldots, x_m\}$ and $Y = \{y_1, y_2, \ldots, y_n\}$ denote the gait and keystroke feature vectors, respectively; the resulting feature vector Z is obtained by concatenating the normalized vectors X′ and Y′ and then applying feature selection to the fused vector. In this paper, the concatenated feature vector is normalized with the min–max normalization technique, scaling all values into the range −1 to 1.
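A minimal sketch of this fusion step, reading the min–max normalization as per-feature scaling of the fused (samples × features) matrix into [−1, 1]; the exact order of normalization and concatenation is our interpretation.

```python
import numpy as np

def min_max_scale(features, lo=-1.0, hi=1.0):
    """Per-feature min-max scaling of a (samples x features) matrix into [-1, 1]."""
    f = np.asarray(features, dtype=float)
    fmin, fmax = f.min(axis=0), f.max(axis=0)
    span = np.where(fmax > fmin, fmax - fmin, 1.0)    # guard against constant columns
    return lo + (f - fmin) * (hi - lo) / span

def fuse(gait_feats, keystroke_feats):
    """Feature-level fusion: concatenate the two modalities, then normalize, Z = [X' Y']."""
    z = np.hstack([np.asarray(gait_feats, float), np.asarray(keystroke_feats, float)])
    return min_max_scale(z)
```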

4.5 Feature selection

Feature selection aims to select a small subset of features that minimizes redundancy and maximizes relevance to the target, such as the class labels in classification (Tang et al. 2014). In this study, fusing the gait and keystroke vectors through concatenation produces a high-dimensional feature vector, which increases the complexity of the classifier. Therefore, the sequential floating forward selection (SFFS) algorithm (Somol et al. 1999) is applied to the resulting vector. SFFS extends the sequential forward selection (SFS) algorithm with additional backward (or forward) steps that can remove (or re-include) features after they have been included (or excluded), so that a larger number of feature subset combinations is sampled. We use the selected features instead of all features to decrease the training time of the algorithm, given the memory and computational constraints of smartphones. Although reducing the feature set to 24 features decreased the accuracy of the proposed method by 0.9%, this is acceptable considering the gains in memory and processing time (the time taken to build a model dropped from 72.478 s to less than 1 s). Table 1 lists the final feature subset together with the information gain of each feature. The dimension of the final feature vector is thus reduced to 24 by the SFFS algorithm.
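The paper does not name an SFFS implementation; as one possible sketch, mlxtend’s SequentialFeatureSelector with floating=True performs forward selection with the backtracking step described above (the random-forest wrapper classifier and scoring choice are ours).

```python
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.ensemble import RandomForestClassifier

def select_features(Z, y, k=24):
    """Reduce the fused feature matrix Z to k features with SFFS: forward selection
    plus floating (backtracking) steps that can drop already-included features."""
    sffs = SFS(RandomForestClassifier(n_estimators=100),
               k_features=k, forward=True, floating=True,
               scoring='accuracy', cv=10, n_jobs=-1)
    sffs = sffs.fit(Z, y)
    return sffs.transform(Z), sffs.k_feature_idx_
```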

Table 1 List of the final feature subset

5 Evaluation

5.1 Database description

The data was collected from 20 participants in a single session under a controlled environment using a Xiaomi 2S mobile phone. In total, 63,500 accelerometer samples and more than 8600 keystrokes were collected from all participants. After preprocessing, a dataset of 24 features was constructed, with a separate file for each user and scenario. In total, 80 files were created, where each file contains the samples of the genuine user and the samples of the remaining 19 users, who are considered impostors. Each data row consists of the 24 features and the binary class label, ‘TRUE’ for the genuine user and ‘FALSE’ for imposters.

5.2 Evaluation of performance

To select the most suitable model for the proposed authentication system, experiments were conducted with various classifiers. We use the 10-fold cross-validation technique to evaluate the performance of the learning models; it partitions the dataset into equally sized folds (groups of instances) so that each fold appears in both the training and test sets. Five popular algorithms implemented in Weka (Holmes et al. 1994) are considered in this work: support vector machine (SVM), random forest (RF), random tree (RT), naïve Bayes (NB) and multilayer perceptron (MLP). To determine how effective the learning models are, we use several statistical metrics, taking into account that the authentication task is a binary classification problem in which the system accepts or rejects the user’s identity. We therefore report the FAR and FRR, i.e., the proportions of imposters incorrectly accepted and of authorized users incorrectly rejected by the proposed biometric system, in addition to the EER and accuracy metrics.
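The experiments were run in Weka; purely as an illustrative stand-in, the sketch below evaluates comparable scikit-learn classifiers with 10-fold cross-validation and derives FAR, FRR, accuracy and an EER estimate from the cross-validated scores (the classifier settings and the 0.5 acceptance threshold are our assumptions).

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_curve

# scikit-learn stand-ins for the Weka classifiers named in the text
classifiers = {
    'SVM': SVC(probability=True),
    'RF': RandomForestClassifier(n_estimators=100),
    'RT': DecisionTreeClassifier(),           # closest analogue of Weka's RandomTree
    'NB': GaussianNB(),
    'MLP': MLPClassifier(max_iter=500),
}

def evaluate(clf, Z, y):
    """10-fold cross-validated FAR, FRR, accuracy and EER (y: 1 = genuine, 0 = impostor)."""
    y = np.asarray(y)
    scores = cross_val_predict(clf, Z, y, cv=10, method='predict_proba')[:, 1]
    accepted = scores >= 0.5                   # acceptance threshold (our assumption)
    far = np.mean(accepted[y == 0])            # impostors wrongly accepted
    frr = np.mean(~accepted[y == 1])           # genuine users wrongly rejected
    acc = np.mean(accepted == y)
    fpr, tpr, _ = roc_curve(y, scores)
    eer = fpr[np.nanargmin(np.abs(fpr - (1 - tpr)))]   # point where FAR ~= FRR
    return far, frr, acc, eer
```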

Table 2 summarises the experimental results in terms of accuracy and EER across the four scenarios (detailed in Sect. 4), and Fig. 4 shows the average FAR and FRR of each classifier per scenario. The results demonstrate the efficiency of the proposed multimodal authentication system with the MLP classifier, which achieved the highest accuracy of 99.11% with average FAR, FRR and EER values of 0.684%, 7% and 1%, respectively. The MLP classifier outperforms the NB and RF classifiers, which only reached an acceptable accuracy because their FRR remained high, whereas SVM and RT achieved the lowest accuracy among the classifiers, also with a high FRR. This issue results from the imbalance between genuine and imposter class data. To compare the EER across scenarios, we performed different test procedures, namely the chi-square test of independence and Marascuilo’s test for the equality of several proportions. As shown in Table 3, the critical value of χ² with 12 degrees of freedom is 12.026 (p value < 0.001), which indicates a significant difference among the EER values achieved under the different scenarios. Applying Marascuilo’s test, the comparisons (p1–p3), (p2–p3) and (p3–p4) are significantly different from each other, where p1, p2, p3 and p4 refer to scenarios 1, 2, 3 and 4, respectively, whereas the differences between the remaining scenarios are not statistically significant.
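For the scenario comparison, a chi-square test of independence can be run on a scenarios × outcome contingency table; the sketch below is illustrative only (the paper compares EER proportions, and building the table from per-scenario decisions is our assumption).

```python
import numpy as np
from scipy.stats import chi2_contingency

def scenario_independence_test(preds_by_scenario, labels_by_scenario):
    """Chi-square test of independence between scenario and decision correctness.
    Inputs: one array of predicted labels and one of true labels per scenario."""
    table = np.array([[int(np.sum(np.asarray(p) != np.asarray(y))),   # wrong decisions
                       int(np.sum(np.asarray(p) == np.asarray(y)))]   # correct decisions
                      for p, y in zip(preds_by_scenario, labels_by_scenario)])
    chi2, pval, dof, _ = chi2_contingency(table)
    return chi2, dof, pval
```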

Table 2 Performance of each classifier per scenario
Fig. 4 The average FAR and FRR values of each classifier per scenario

Fig. 5 Multimodal authentication with a single feature

Table 3 Significance test of EER

Moreover, the performance of gait-only authentication, using all of the extracted gait features, is evaluated on the first three scenarios, in which participants were asked to walk while holding the phone during various activities. Acceptable results were obtained with the RF classifier, with average EER values of 9.73%, 6.82% and 3.34% for the three scenarios, respectively. To study the effect of the adopted fusion method, we also evaluated the performance of keystroke dynamics separately. Table 4 shows the obtained results.

Table 4 Performance of keystroke dynamics

5.3 Identification result

Based on the encouraging results obtained in authentication mode, we conducted an experiment to evaluate the effectiveness of the proposed method in identification mode as well. Table 5 presents the experimental results for each classifier. It can easily be observed that the SVM classifier achieved the best identification results, with average accuracy and EER values of 97.5% and 0%, respectively. Our experimental results, in both authentication and identification mode, are competitive with those reported by Do et al. (2014). Unlike their work, we propose an energy-efficient model that requires fewer sensor readings to authenticate the smartphone user. Moreover, the effectiveness of our model is evaluated on a real multimodal dataset collected under realistic acquisition scenarios. Finally, the effectiveness of each feature of the final selected subset is also examined independently for both authentication and identification modes; Figs. 5 and 6 illustrate the impact of each feature on the EER of the five classification models used in this study.

Table 5 Performance of the proposed multimodal system under identification mode
Fig. 6 Multimodal identification with a single feature

Fig. 7 The experimental settings

5.4 Realistic usage scenarios

A smartphone user may operate his/her device under different conditions, such as stress and fatigue, or on different types of ground. Therefore, the study also verifies the effectiveness of the proposed method under several walking and typing conditions. To evaluate the efficacy of our approach across a variety of real-life conditions, the second experiment in our study involved only two subjects (1 male, 1 female), both in good health. In this experiment, we address a series of real-life walking and typing conditions: fatigue, walking in high-heeled shoes, and walking on different types of ground (flat and level, grassy, and uneven forest terrain). Data were collected from the participants on two separate days. On the first day, data were gathered from the subjects throughout the day under different levels of fatigue (morning, noon, evening). On the second day, the two participants were asked to walk and type in three distinct experimental settings; Fig. 7 shows the different types of ground used in this experiment. Finally, the woman was asked to wear high-heeled shoes (100 mm heel height) and type while walking in a straight corridor. In total, 45 iterations were completed by the woman and 40 by the man. Table 6 reports the accuracy and EER values of the participants under the aforementioned conditions. The results suggest that changes in user condition or ground type do not affect the performance of the proposed system. Several studies, such as Ulinskas et al. (2017, 2018), have reported that a person’s typing characteristics can be affected by the level of fatigue during the day. We observe that our approach overcomes this issue: by using multimodal traits, it is highly unlikely that the system would be affected by the above-mentioned conditions. However, some diseases such as Parkinson’s can affect the walking and typing rhythm of the user, leading to significant changes in the characteristics of these two biometrics. In future work, we aim to address this by expanding the number of participants and taking different health conditions into consideration in order to provide even more accurate results.

Table 6 Performance of the system under different conditions

5.5 Resistance against attacks

From the results of the aforementioned experiments, we validate that multimodal biometrics based on gait and keystroke dynamics are a reliable identifier of the smartphone user. However, walking and typing behaviors can easily be observed and impersonated by an attacker. To evaluate the security strength of the proposed method against such attacks, we designed a real-world experiment with 10 randomly selected participants (6 males, 4 females). Five of them had already participated in the previous experiment, whereas the remaining five were new volunteers without any prior knowledge of the system. In same-gender pairs, each subject played the role of either attacker or victim and then exchanged roles with their partner. Two types of attack were considered in the experiment:

  • Zero-effort attack: the attacker has no prior knowledge of the victim’s behavior and simply tries to type and walk using the victim’s smartphone.

  • Minimal-effort mimicking attack: the attacker observes the victim’s behavior before trying to mimic him/her. The attacker was asked to watch the target walking and typing as many times as they wanted, to focus on his/her behaviors, and then to try to mimic the target by walking side by side.

Before starting the experiment, we first collected data from the victims; they were asked to walk at their usual pace and to type the same sentence used in the previous experiment (see Sect. 4.1). Each attacker then made 20 attempts per attack. Every participant in this experiment executed 20 rounds as a victim and 40 rounds as an attacker; in total, 80 × 10 trials were performed.

To estimate the FAR values, we matched the mimicked gait samples of each attacker against the victim’s samples; Fig. 8 shows the FAR values of each attacker for the zero-effort and minimal-effort attacks. The average FAR over the ten attackers was 0.112% under the zero-effort attack and 0% under the minimal-effort attack. These results demonstrate the resistance of the proposed method against these types of attack. Moreover, we observed no significant difference between the FAR values of the two attacks, which indicates that imitating the target’s typing and walking behavior did not give the attacker a better chance of matching the victim’s template.

Fig. 8 Performance of the proposed multimodal system against the zero-effort attack and the minimal-effort mimicking attack

It should be mentioned that all participants stated that it was difficult to emulate a target’s walking and typing manner at the same time: focusing on impersonating one behavior caused them to lose focus on the target’s second behavior, which supports the hypothesis that multimodal systems generally provide higher levels of security against attacks.

6 Discussion

The evaluation of the proposed method showed a high security level (99.1% accuracy). Our experiments measured the security strength of the smartphone-based gait and keystroke authentication system against zero-effort and minimal-effort mimicking attacks and demonstrated the proposed system’s ability to resist such attacks. However, the security level alone cannot measure the success of an authentication system; evaluating usability is also essential. Therefore, we developed a questionnaire containing six questions on a 5-point Likert scale (from strongly agree to strongly disagree). After completing the data collection task, the participants were asked to answer the questions. We first asked whether they preferred behavioral authentication methods over traditional methods, such as passwords and patterns, for smartphone authentication: 86% of participants agreed or strongly agreed that they preferred behavioral authentication systems over traditional ones. Next, 92% of participants agreed or strongly agreed that the proposed system was easy to use. 68% of users reported that they often type while walking, whereas 20% disagreed, meaning that typing and walking simultaneously was not convenient for them. The balance between security level, time and energy consumption was also examined: 40% of users declared that it was acceptable to decrease the security level of the system in exchange for lower time and memory consumption, whereas 40% disagreed or strongly disagreed with any decrease in the security level. Moreover, 44% of the volunteers agreed or strongly agreed to accept the battery drain of continuous sensing as long as they obtained a highly secure system, while 28% disagreed. Finally, participants were asked about the security of the proposed system: 68% agreed or strongly agreed that it would be difficult to attack, while only 8% agreed or strongly agreed that the system could easily be attacked. We observed that user preferences differ from person to person: some participants considered the security level the most important criterion in an authentication system, whereas the rest preferred systems that are not only highly secure but also convenient and fast. Based on this analysis, the proposed smartphone-based gait and keystroke authentication system might achieve high acceptance rates among real-life users (Fig. 9).

Fig. 9 Preferences of the smartphone users

7 Conclusion and future works

Continuous authentication methods based on user behavior have been widely used to enhance smartphone security. In this paper we have proposed a new continuous multimodal biometric authentication system for smartphone users. Our approach is based on analyzing gait signals and keystroke dynamics acquired from built-in smartphone sensors, and then applying a fusion method to the acquired biometrics to build a final profile for user authentication purposes. A series of experiments covering various realistic acquisition scenarios was conducted with 20 participating subjects. The achieved results, with FAR = 1.68%, FRR = 7%, EER = 1% and an accuracy of 99.1%, are very promising for further investigation into designing enhanced authentication systems on smartphones. The security strength of the proposed system was investigated against two types of attack, the zero-effort attack and the minimal-effort mimicking attack; our evaluation shows that the proposed method is robust and secure regardless of the attacker’s level of knowledge about the target’s behavior. While smartphones are also used in positions beyond those considered in this study (sitting, standing, lying in bed, etc.), this can be addressed by enhancing the system with a seamless activity recognition step that detects the user’s current activity and selects the appropriate model accordingly. In future work, we plan (1) to evaluate the performance of the method on an expanded participant base, (2) to include more complex data collection scenarios, and (3) to apply advanced segmentation methods and extract new features to improve the accuracy of the proposed multimodal biometric system.