
1 Introduction

Human Interaction Proof (HIP) schemes, also known as CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart), are common and widely used security defense mechanisms in online services [1]. HIP schemes require users to prove, through a challenge-response test, that a human rather than malicious software is interacting with the system, aiming to keep online services protected from automated software agents [2]. The design of an efficient and effective HIP scheme entails an inevitable tradeoff between usability and security: increasing the challenge difficulty improves the security of the mechanism but significantly decreases usability [3, 4, 24, 30]. Therefore, numerous works have focused on providing a better tradeoff between the usability and security of these mechanisms [24,25,26,27,28,29,30,31,32,33,34,35,36]. Current HIP implementations can be broadly categorized into text-recognition schemes, which require users to recognize a set of distorted textual characters, and image-recognition schemes, which require users to solve image puzzle problems (e.g., identify a set of images among a larger set) [5].

Nowadays, one of the most commonly used HIP schemes is Google’s reCAPTCHA (Fig. 1) [6], which aims to minimize users’ cognitive burden through implicit collection of user interaction data. In particular, the mechanism uses intelligent techniques to analyze users’ interaction data on a given Website and implicitly infer that a human is interacting with the service, without asking the user to solve a challenge-response test. Nonetheless, when the mechanism is not sufficiently confident in this inference, the user must solve a fallback image-recognition task. This fallback task typically splits an image into a 3 × 3 grid and asks the user to select the segments of the grid that contain the requested information (e.g., cars, traffic lights, cats, etc.).

Fig. 1.

Google’s reCAPTCHA [6] mechanism.

Research Motivation.

From a cognitive processing perspective, the image-recognition HIP task requires visual information processing, and research indicates individual differences in such processing, suggesting that individuals have an inherent and preferred mode of processing information either holistically (globally) or analytically (locally) [9, 10]. Among a plethora of cognitive processing difference theories, this work focuses on the field dependence-independence cognitive style theory [10], an accredited and widely applied model [11,12,13,14,15] that classifies individuals as Field Dependent (or Holistic) and Field Independent (or Analytic). Evidence suggests that Holistic and Analytic individuals differ in visual perception and visual information processing [7, 10,11,12,13,14,15]. While Holistic individuals view the perceptual field as a whole and are not attentive to detail, Analytic individuals view the information presented in their visual field as a collection of parts and tend to experience items as separate from their backgrounds.

Given that such human cognitive differences exist, we believe that the current “one-size-fits-all” approach employed in image-recognition fallback HIP schemes might favor a certain cognitive style group (Holistic vs. Analytic). Hence, in this paper, we investigate whether human cognitive differences in visually processing information influence users’ visual behavior when interacting with an image-recognition HIP task. To do so, we conducted an eye tracking study (n = 46) in which users solved an image-recognition HIP task. Analysis of the results revealed several main effects of human cognitive differences on user interaction and visual behavior in image-recognition HIP schemes.

2 User Study

2.1 Research Questions

  • RQ1. Are there differences in time to solve the image-recognition HIP challenge between Holistic and Analytic users?

  • RQ2. Are there differences in time to explore the image-recognition HIP between Holistic and Analytic users?

  • RQ3. Are there differences in users’ visual behavior while exploring and solving the image-recognition HIP challenge between Holistic and Analytic users?

2.2 Study Instruments

Image-Recognition HIP Mechanism.

We developed a Web-based image-recognition HIP mechanism (Fig. 2), in which an image is segmented into a 3 × 3 grid of smaller parts. The instructions of the task are displayed above the grid and the submit button below it. Users are asked to select all squares that contain the requested information (e.g., a window) and then click the submit button to validate their solution. If the provided solution is incorrect, an error message instructs users to retry.
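The core validation logic of such a challenge is straightforward set comparison. The sketch below is a minimal, hypothetical server-side check; the function and variable names are ours for illustration and do not reflect the study’s actual implementation:

```python
# Minimal sketch of server-side validation for a 3 x 3 image-grid HIP
# challenge. Cells are indexed 0..8 in row-major order. Names are
# illustrative, not from the study's implementation.

GRID_SIZE = 9  # 3 x 3 grid

def validate_challenge(solution_cells, selected_cells):
    """Return True only if the user selected exactly the target cells."""
    solution = set(solution_cells)
    selected = set(selected_cells)
    if not selected <= set(range(GRID_SIZE)):
        return False  # reject out-of-range cell indices
    return selected == solution

# Example: the requested object (e.g., a window) appears in cells 0, 1 and 4.
print(validate_challenge({0, 1, 4}, {0, 1, 4}))  # exact match -> True
print(validate_challenge({0, 1, 4}, {0, 1}))     # missing a cell -> False
```

An exact-match rule is the strictest variant; deployed schemes may tolerate near-misses, but that policy choice is orthogonal to the sketch above.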

Fig. 2.

An example image-recognition HIP challenge.

Apparatus.

The study was conducted using an All-in-One HP personal computer with a 24” monitor at a screen resolution of 1920 × 1080 pixels. To capture the eye gaze metrics, we used the Gazepoint GP3 video-based eye tracker [16]. No equipment was attached to the participants.

Eye Gaze Metrics.

Following common practices [8, 17], we selected fixation count, i.e., the total number of fixations during which a user’s eyes focus on a certain item within the surroundings.
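Given a chronological sequence of fixation centroids exported by the eye tracker, per-AOI fixation counts and revisit counts (used in the analysis of RQ3) can be derived with simple bookkeeping. The sketch below is a minimal illustration under our own assumptions (rectangular AOIs, one AOI per fixation); names are illustrative:

```python
# Sketch: compute fixation counts and AOI revisits from a chronological
# list of fixation centroids. AOIs are axis-aligned rectangles given as
# (x_min, y_min, x_max, y_max) in screen pixels. Names are illustrative.

def aoi_of(fix, aois):
    """Return the name of the first AOI containing the fixation, else None."""
    x, y = fix
    for name, (x0, y0, x1, y1) in aois.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None

def fixation_stats(fixations, aois):
    """Count fixations per AOI and revisits (re-entries after leaving)."""
    counts = {name: 0 for name in aois}
    visits = {name: 0 for name in aois}
    prev = None
    for fix in fixations:
        cur = aoi_of(fix, aois)
        if cur is not None:
            counts[cur] += 1
            if cur != prev:
                visits[cur] += 1  # a new entry into this AOI
        prev = cur
    # A revisit is any entry into an AOI after the first one.
    revisits = {name: max(0, v - 1) for name, v in visits.items()}
    return counts, revisits
```

For example, with two AOIs `{"a": (0, 0, 100, 100), "b": (100, 0, 200, 100)}` and fixations `[(50, 50), (150, 50), (60, 40)]`, AOI `a` receives two fixations and one revisit, while `b` receives one fixation and no revisits.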

Human Cognitive Factor Elicitation.

Users’ holistic and analytic characteristics were measured through the Group Embedded Figures Test (GEFT) [18], a widely accredited and validated paper-and-pencil test [11,12,13,14,15]. The test measures the user’s ability to find common geometric shapes in a larger design. The GEFT consists of 25 items; in each item, a simple geometric figure is embedded within a complex pattern, and participants are required to identify the simple figure by drawing it with a pencil over the complex figure. Based on a widely applied cut-off score, participants who solve fewer than 12 items are considered to have a holistic cognitive style, while participants who solve 12 or more items are considered to have an analytic cognitive style.
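The cut-off rule amounts to a one-line classifier. A minimal sketch (the function name and range check are ours):

```python
GEFT_CUTOFF = 12  # widely applied cut-off score on the GEFT

def cognitive_style(geft_score):
    """Classify a GEFT score: fewer than 12 -> Holistic, 12 or more -> Analytic."""
    if not 0 <= geft_score <= 25:
        raise ValueError("GEFT score must be between 0 and 25")
    return "Analytic" if geft_score >= GEFT_CUTOFF else "Holistic"

print(cognitive_style(8))   # -> Holistic
print(cognitive_style(17))  # -> Analytic
```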

2.3 Sampling and Procedure

Participants.

We recruited 46 participants, all undergraduate university students. Two users were outliers without sufficient eye tracking measures and were excluded from the analysis, resulting in a final dataset of 44 users. To increase the internal validity of the study, we recruited participants who had no prior experience with image-recognition HIP schemes, as assessed in a post-study interview.

Experimental Design and Procedure.

We adopted the University’s human research protocol, which takes into consideration users’ privacy, confidentiality and anonymity. All participants performed the task in a quiet lab room with only the researcher present. To avoid experimental bias effects, no details regarding the research objective were revealed to the participants until the end of the study. The user study involved the following steps: i) participants were informed that the data collected during interaction with the HIP mechanism would be stored anonymously and used only for research purposes; ii) users signed a consent form and completed a demographics questionnaire; iii) an eye-calibration process followed; and iv) participants were requested to solve an image-recognition HIP challenge in order to access an online service. To increase the ecological validity of the user study, the HIP challenge was framed as a secondary task of user interaction. Finally, a post-study interview was conducted to gain further insights into the users’ interactions and experiences with the HIP scheme.

3 Analysis of Results

Data are mean ± standard deviation, unless otherwise stated. Two significant outliers, identified by inspection of a boxplot, were excluded from the analysis. Figures 3, 4 and 5 summarize the results: the time to solve the image-based challenge, the time to explore the image-based challenge, and the number of fixations during user interaction with the image-based challenge, respectively.

Fig. 3.

Time to solve the HIP challenge indicating a tendency of Analytic users requiring more time to solve the challenge compared to Holistic users.

3.1 Differences in Time to Solve the Image-Recognition HIP Challenge Between Holistic and Analytic Users

To investigate RQ1, an independent-samples t-test was run to determine whether there were differences in time to solve the HIP task between Holistic and Analytic users (Fig. 3). There was homogeneity of variances, as assessed by Levene’s test for equality of variances (p = .058). Results revealed that Analytic users needed more time to solve the HIP task (9.78 ± 5.45 s) than Holistic users (8.3 ± 3.6 s); however, this difference of 1.47 s was not statistically significant (95% CI, −4.25 to 1.29), t(42) = −1.077, p = .28.
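Because Levene’s test indicated homogeneity of variances, the pooled-variance (Student’s) form of the independent-samples t-test is the appropriate one here. A minimal stdlib-only sketch of the statistic and its degrees of freedom (the p-value additionally requires a t-distribution CDF, e.g., from a statistics package):

```python
import statistics
from math import sqrt

def students_t(a, b):
    """Pooled-variance (Student's) independent-samples t statistic and df.

    Appropriate when Levene's test indicates homogeneity of variances.
    Only t and the degrees of freedom are computed; a p-value requires
    a t-distribution CDF (e.g., scipy.stats.t.cdf).
    """
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2  # degrees of freedom
```

For instance, `students_t([1, 2, 3], [3, 4, 5])` yields t ≈ −2.449 with 4 degrees of freedom.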

3.2 Differences in Time to Visually Explore the Image During Solving the Image-Recognition HIP Challenge Between Holistic and Analytic Users

To investigate RQ2, an independent-samples t-test was run to determine whether there were differences in time to visually explore the image between Holistic and Analytic users (Fig. 4). There was homogeneity of variances, as assessed by Levene’s test for equality of variances (p = .246). In line with time to solve, results revealed that Analytic users spent more time exploring the image (7.33 ± 4.09 s) than Holistic users (5.19 ± 2.95 s), a marginally significant difference of 2.13 s (95% CI, −4.28 to 10.75), t(42) = −2.008, p = .051.

Fig. 4.

Time to explore the HIP challenge indicating that Analytic users require more time to visually explore the challenge compared to Holistic users.

3.3 Differences in Eye Gaze Behavior During Solving the Image-Recognition HIP Challenge Between Holistic and Analytic Users

To investigate RQ3, we conducted two analyses, with the total number of fixations and the number of AOI (Area of Interest) revisits as the dependent variables. We first investigated whether there were differences in the total number of fixations between Holistic and Analytic users (Fig. 5). A Welch t-test was run because the assumption of homogeneity of variances was violated (p = .001). Results revealed that Analytic users generated more fixations while exploring the image (29.45 ± 14.31) than Holistic users (20.2 ± 6.48), a statistically significant difference of 9.24 (95% CI, −15.81 to −2.66), t(25.448) = −2.669, p = .013. We further ran a Welch t-test to determine whether there were differences in the number of AOI revisits between Holistic and Analytic users, again because the assumption of homogeneity of variances was violated (p < .001). Results revealed that Analytic users had more AOI revisits while exploring the image (17.65 ± 10.84) than Holistic users (10.12 ± 4.36), a statistically significant difference of 7.52 (95% CI, −12.4 to −2.64), t(24.114) = −2.911, p = .008.
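When homogeneity of variances is violated, as in both comparisons above, Welch’s t-test with Welch–Satterthwaite degrees of freedom replaces the pooled-variance test (which is why the reported df are fractional). A minimal stdlib-only sketch:

```python
import statistics
from math import sqrt

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Used instead of Student's t when the two groups have unequal
    variances; the resulting df is generally fractional. A p-value
    additionally requires a t-distribution CDF (e.g., scipy.stats.t.cdf).
    """
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    se1, se2 = v1 / n1, v2 / n2  # squared standard errors per group
    t = (statistics.mean(a) - statistics.mean(b)) / sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df
```

For instance, `welch_t([1, 2, 3], [2, 4, 6, 8])` yields t ≈ −2.121 with df ≈ 4.08, whereas the pooled test would report df = 5.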

Fig. 5.

Number of generated fixations indicating that Analytic users produce significantly more fixations than Holistic users.

4 Main Findings

The analysis of results revealed several main effects of human cognitive differences (holistic vs. analytic) on user interaction and visual behavior in image-recognition HIP schemes. Next, we summarize the main findings of the study.

Finding A.

Analytic users tended to require more time to solve the image-recognition HIP challenge compared to Holistic users, although the difference was not statistically significant (95% CI, −4.25 to 1.29; t(42) = −1.077, p = .28). This tendency can be attributed to their analytical approach to information processing, since Analytic users visually explored and processed more attention points than Holistic users.

Finding B.

Analytic users spent more time visually exploring the image-recognition HIP challenge compared to Holistic users, a marginally significant difference (95% CI, −4.28 to 10.75; t(42) = −2.008, p = .051). This finding is in line with [11], which reported similar effects in image-recognition graphical authentication schemes.

Finding C.

Analytic users fixated cumulatively on more attention points (95% CI, −15.81 to −2.66; t(25.448) = −2.669, p = .013) and had a significantly higher fixation count on attention point revisits than Holistic users (95% CI, −12.4 to −2.64; t(24.114) = −2.911, p = .008). This can be explained by their analytical approach to visual information processing, which generated more fixations than the more global approach Holistic users followed in viewing the image grid.

5 Conclusions and Future Work

This paper presents the results of a cognitive-centered research endeavor that investigated human cognitive differences in information processing and their effects on users’ visual behavior and interaction with image-recognition HIP schemes. For this purpose, we designed an eye tracking study that entailed a psychometric-based survey for eliciting the users’ cognitive processing characteristics and an ecologically valid interaction scenario with an image-recognition HIP task.

The findings underpin the value of considering human cognitive differences as an important human factor, at both design time and run time, in order to implement more effective HIP mechanisms and to avoid deploying image-recognition HIP schemes that unintentionally favor a specific group of users based on the designer’s decisions. Specifically, results revealed that Analytic users spent more time interacting with and exploring the image-recognition HIPs, and generated significantly more fixations during interaction than Holistic users, which can be explained by the Analytic users’ inherent way of processing information through local information processing streams and greater attention to detail.

Despite our efforts to preserve the validity of the study, some design aspects of the experiment introduce limitations. First, we used a specific background image. Although users’ choices may be affected by the content and complexity of the image [22, 23], we provided images from the most widely used image categories (depicting a specific scenery and people [19,20,21]). Future work will consider a greater variety of image categories in order to increase the validity of the study. Moreover, given the controlled in-lab nature of the eye tracking study, the users’ visual behavior and performance might have been influenced; however, no such comment was received from participants in the informal discussions that followed task completion.