1 Introduction

With the rapid growth network technologies, Internet services including cloud computing, email services, online registration, online voting, chat rooms, weblogs, and online games are constantly being developed. Because most of the services are free of charge, they have become prone to attacks. Automatic scripts and bots have been established to create free accounts, send spam, cheat in online games, and vote remotely. Although some authentication studies [2, 14, 19] have focused on helping to recognize users, privacy issue has become another burden. Human Interactive Proofs (HIPs) or completely automated public turing tests to tell computer and human apart (CAPTCHAs) [17, 21, 23] are widely used to resist these types of attacks. A CAPTCHA is a type of challenge-response test used in computing as an attempt to ensure that a response is generated by human beings. The process usually involves a computer asking a user to complete a simple test that the computer is able to grade. These tests are designed such that they are easy for a computer to generate but difficult to solve. However, owing to the limitations of the image sets and the types of response, they can be easily broken by recreating the entire database [8, 12, 17] or using image recognition [5, 18, 20]. Alternatively, text-based CAPTCHAs, require the user to translate an image or sound of words, but it still appears to be easily broken [5, 6, 11, 13, 24]. Although it is possible to systematically enhance the security of existing CAPTCHAs, the addition of noise distortion and obfuscation techniques would make characters recognition difficult for humans. Studies have shown that even computers can sometimes do much better than human beings [4, 7]. In addition, designs based on both text and speech recognition are language dependent.

On the other hand, traditional CPATCHAs, are designed for personal computers, which have a large screen and physical keyboard and mouse, as the input may not be suitable for handset devices. The examples show a traditional CAPTCHA on a handset device (i.e., iphone4s), which is inconvenient for users to operate because most of the submit buttons and challenge images are prohibitively small owing to the limitation of screen size and virtual keyboard (Fig. 1). In most cases, users need to zoom in/out and move up/down using different gestures to reveal and complete a challenge. Because the number of handset device users continues to undergo rapid growth, the CAPTCHA design for mobile users has become one of the most important features.

Fig. 1
figure 1

Traditional CPATCHAs on iPhone4S

In this study, we focus on more advanced human cognitive process abilities and propose a new scheme based on game-based image semantics named GISCHA. The GISCHA may be as simple as a rolling ball or puzzle games, which can be easily operated using only simple arrow keys, mouse movements, gestures and accelerometer. We believe that GISCHA has an extremely high resistance to automated malware attacks because it is considered to be nearly impossible for computers to reach such an advanced stage of artificial intelligence, regardless of how advanced the technology may be. Furthermore, it is language independent and background knowledge is not required.

We briefly introduce the related CAPTCHA works in Section 2 and the concept of GISCHA is described in Section 3. In Section 4, we confirm the usage of GISCHA, and discuss our design in section 5. We conclude our paper in Section 6.

2 Related works

A CAPTCHA is based on the Turing test concept, which was developed by Turing [21, 22]. The most common and successful CAPTCHAs are visually distorted images of letters and numbers that can ideally be identified by humans, but not by computers [8, 17]. CAPTCHAs can be grouped into three general categories: 1) Text-based: A string of characters is presented to the user, who is asked to recognize the combination of either words or random alphanumeric characters and punctuations [5, 12]. 2) Image-based: Images are presented to the user, who is normally challenged in the form of identifiable real-world objects, but which can also be presented in the form of shapes [24]. 3) Audio-based: The user is presented with an audio version of a CAPTCHA [1]. Current CAPTCHA implementations are primarily visual-based, and are therefore inaccessible to users who are unable to view the screen. On the other hand, audio CAPTCHAs ask users to interpret spoken audio sounds, thus posing other challenges (e.g., language dependency and dealing with the presence of annoying noises). To defeat automated speech recognition, these audio HIPs use a significant amount of background noise and a variety of speakers, making interpretation difficult. Many sites do not provide audio CAPTCHAs [1], and in this study, we therefore focus only on visual CAPTCHAs.

2.1 Text-based CAPTCHA

The ReCAPTCHA (Fig. 2) [25], which is character-based, recognition-based, and sound-based CAPTCHA system, is one of the most successful examples. However, with the rapid growth of computation power and development of advanced algorithms, most of the text-based CAPTCHAs have been reported to be breached. Studies have shown that more than 85 % of the commercial CAPTCHAs are vulnerable to automated attacks [5, 18]. Studies have also demonstrated that most CAPTCHA schemes are broken if they can reliably segmented [6, 11, 13]. Although the security levels can be enhanced by systematically adding distortion and noise to a text-based CAPTCHA, it becomes difficult for humans to recognize characters (Fig. 3).

Fig. 2
figure 2

An example of ReCAPTCHA

Fig. 3
figure 3

An example of CAPTCHAs with enhanced distortions and noises

2.2 Image-based CAPTCHA

Image-recognition-based CAPTCHAs have been considered as one of the best alternatives to text-recognition-based CAPTCHAs (Fig. 4). Images, which are intuitive to humans and are of a large variety, are rich in information. Based on the type of challenge presented, image-based CAPTCHAs can be categorized into two groups: anomaly-based and recognition-based. The recognition-based CAPTCHAs ask the user to identify the object in images [15] and the anomaly-based CAPTCHAs ask the user to determine the object that does not belong to the images [22]. Early image-based CAPTCHAs include Bongo and Pix [22], which use shapes and labeled images as challenges, and ask users to choose the tag related to all the images. Because of the limited number of shapes and labels, they may suffer from guessing and database reconstruction attacks [8]. Recently, image recognition techniques have also been able to help break the CAPTCHA system [5, 18, 20, 26]. In addition, labeling and objects may be ambiguous because different persons’ perceptions differ (i.e., HotCAPTCHA asks users to select a person found to be most attractive).

Fig. 4
figure 4

An example of an image-based CAPTCHA

On the other hand, Asirra [10], relies more on the human nature, because of the different capabilities of humans and bots to distinguish between cats and dogs. However, some images may not suitable for challenges, allowing the machine learning techniques to attack these types of CAPTCHAs [12]. Because of the limited number of image sets, potentially leading to database reconstruction attacks [8], recent CAPTCHAs have been designed to use the image search function provided by Google. Unfortunately, the search results may not be completely reliable because the returned images may additional confusing objects (e.g., images containing both dogs and cats). In addition, hackers can also obtain the correct answers by sending the challenged images to Google’s search engine.

Another example of CAPTCHAs is based on the image semantics. A randomly ordered four-panel cartoon is used, and the user is asked to reorder them as a human would (Fig. 5) [26]. It is almost impossible for the computer to understand the humor intended by the human beings. However, it is not yet understood how to properly make use of image semantics, called ambiguous semantics. Besides, Fig. 5 shows that it may fail to pass the challenge if the user is not familiar with Japanese language and culture. In addition, a four-panel cartoon CAPTCHA may suffer from brute force attacks because there are only 4! Combinations, and databases with four-panel comics may be limited in size.

Fig. 5
figure 5

A four-panel cartoon CAPTCHA

So far, existing CAPTCHAs are also suffering from defects involving language dependency and annoyance. Most of the existing CAPTCHA systems require a short paragraph to explain how it works, and provide tough challenges that lack native language support. Hence, we have to find another more advanced human cognitive processing ability to tackle this challenge.

3 GISCHA

3.1 Concept

In this study, we focused on the ability to solve games, which is considered to be one of the most advanced human cognitive processing abilities. Through simple web-based games, we can easily identify human because it is almost impossible for computers (Artificial Intelligence, AI) to understand the meanings of all types of games. In addition, simple games are considered to be solvable by all age groups without additional background, and they are also language independent.

As a specific example, we have proposed a GISCHA using the simple rolling ball game (Fig. 6a). In this example, there is a rolling ball and destination holes with different shapes; the user who is able to move the ball to the destination hole shaped as a circle is identified as a human. However, for a computer, it would be difficult to understand the meaning of the rolling ball game and make corresponding movement. Moreover, even if image processing capabilities are developed to the level where the computer could recognize the meanings of the images and make correct responses, it would be almost impossible for computers to understand the real meaning of the rolling ball game by adding traps (Fig. 6b) and enticing the user to fake destinations (Fig. 6b), in which destinations have the same shape (circular), but different colors. Moreover, randomly changing the types of paths with barriers (Fig. 6c) gives another example. For human beings, it is easy to associate the ball with the same color if they recognize that all destinations have the same shape. Fig. 6c requires basic spatial ability, which is natural to human beings.

Fig. 6
figure 6

Rolling ball game example

3.2 Control system

Performing traditional CAPTCHAs on a handset device may require more time because most CAPTCHAs were designed and displayed on a large screen. Another problem is that they are usually designed to be operated using physical keyboard and mouse, which is also a limitation for users of handset devices [9]. An accelerometer is a device that measures motion inputs and changes its orientation. As of early 2009, almost all new mobile phones and digital cameras were designed with at least a tilt sensor, and sometimes also had an accelerometer for the purpose of auto image rotation, motion-sensitive mini-games, and to compensate for shaking when taking photographs. In this study, we proposed two games, which can be operated by the accelerometer. Fig. 6 gives an example of GISCHA using HTML5 and JavaScript, which can be operated by a traditional keyword and mouse. In addition, we enhanced the control ability of handset devices by using a virtual keyboard and accelerometer. We use the information provided by accelerometers and gyroscopes which is built into modern handset devices. By turning handsets at different angles, one can control the direction of the rolling ball. Fig. 7a shows how to move the ball to the left side, Fig. 7b shows how to move the ball to the right, Fig. 7c shows how to move the ball to the top, and so on.

Fig. 7
figure 7

Control system of GISCHA

3.3 Definition

von Ahn et al. [23] broadly defined the CAPTCHA mathematically for transferring of AI problems into CAPTCHAs. In this study, on the basis of this mathematical model, we deduce a corollary to clarify that GISCHA can be automatically generated. The definitions are as follows.

Definition 1

A test V is said to be (α, β)-human executable if at least an α portion of the human population has success greater than β over V.

Definition 2

A game/problem is a triple P = (S, D, f), where S is a set of problem instances, D is a probability distribution over S, and f : S → {0,1}* answers the instances. Let δ∈(0,1]. We require that for an α > 0 fraction of human H, Pr x ← D [H(x) = f(x)] ≥ δ where Pr(.) is the deterministic programming that results when P uses random coins r.

Definition 3

A game/problem is said to be (δ, τ) − solved if there exists a program A, running in time at most τ on any input from S, such that \( \underset{x\leftarrow D,r}{ \Pr}\left[{A}_r(x)=f(x)\right]\ge \delta \).

3.4 Game problem

Let I be a distribution on images and T be a distribution of image transformations. We assume for simplicity that if i,i′∈[I] and i ≠ i′, then T(i) ≠ T′(i′) for any T,T′∈[T] where [I] denotes the support of I. Let M be a finite set of movements. Let λ:[I] → M compute the movement of a game. The set of problem instances is S I,T  = {t(i):t ∈ [T] and i ∈ [I]}, and the distribution on instances D I,T is the one induced by choosing i ← I and t ← T. Define g I,T,λ so that g I,T,λ (t(i)) = λ(i). Then the game problem GP I,T,λ  = (S I,T , D I,T , g I,T,λ ) involves writing a program that take t(i) as the input and outputs λ(i).

3.5 Definition of GISCHA

A GISCHA instance is described by a tuple G = (I,T,M,λ,τ), where I is a distribution of images and T is a distribution of image transformations that can be easily computed using current computer programs. GISCHA is a CAPTCHA with the following property: any program that has high success over G = (I,T) can be used to solve game problem (GP). The GISCHA verifier works as follows. First, V draws i ← I, and t ← T. Then, V sends to user U the message {t(i),M}, and sets a timer for τ. P responds with an operation mM. V accepts if m = λ(i) and its timer has not expired, and rejects otherwise.

Corollary 1

If G = (I,T,M,λ,τ) is (α,β)-human executable, then G is a (α,β)-CAPTCHA.

3.6 Algorithms

The notations used in this paper are listed as follows:

Notation

Description

Info

The user’s information.

OTPU

One time pass solution sends by the user.

r j

The random factor.

EC(.)

The generator of GISCHA with input Random r.

A➔B: M

Entity A sends message M to entity B.

The proposed scheme is composed of two phases: registration and verification. The registration phase presents EC to the newly incoming user with random factor r j , which contains the colors, arrows and type of the destinations. After the user fills info and completes EC(r j ) with OTPU i , the verification phase will check whether EC(OTPU i ) has been accomplished. If the EC(OTPU i ) output has the value “PASS,” then the server will send the message to the user regarding the success of the login or registration. Otherwise, the server will generate another EC(r j + 1) with newly generated r j+1 .

  1. A.

    Registration Phase

    If a user visits a web service, he/she may register a new account or login to the server.

    1. A1:

      Server → User: EC(r j )

      1. a.

        Generate r j

      2. b.

        Generate EC(r j )

      3. c.

        Send EC(r j ) to User i

    2. A2:

      User → Server: (Info, OTPU i )

      1. a.

        The user i provides his/her information Info for logon or registration on the service.

      2. b.

        Response OTPU i according to EC

      3. c.

        Submit (Info, OTPU i )

  2. B.

    Verification Phase

    Upon receiving the registration or login request from the user, the server verifies the GISCHA response and provides additional services:

    1. B1:

      Server → User: EC(r j+1 ) or PASS

      1. a.

        verify EC(OTPU i )

      2. b.

        if EC(OTPU i ) output fails then the server regenerates r j+1

      3. c.

        render EC(r j+1 ) and sends to user for another retry

      4. d.

        if EC(OTPU i ) output passes then send user login success and redirect web pages

4 Experiment design

In this section, we conducted several experiments to evaluate the first-time pass rate, the average operation time with a different control system, and the preference of the user. We also asked the user to perform the traditional text-based CAPTCHA system for the comparison purposes.

4.1 Subjects

There are 42 participants in this experiment with ages from 15 to 60, including professors, students, engineers and housekeepers. They performed these selected GISCHAs from their own handset devices or using provided iPad minis, which included iphone4S, iphone5, Desire HD, and Nexus7. We also asked participants to perform reCAPTCHA [25] operations on handset devices as a control group.

4.2 Experimental tools and procedures

In Section 3, we clearly defined the requirements, then we proposed two different types of GISCHAs as the experiment tool on the basis of the Corollary. There were three different types of rolling ball games: 1) destinations with different shapes, 2) destinations with different colors, and 3) paths with barriers (Fig. 6). The other game is the Dance Dance Revolution (DDR) game, which is based on the concept of the video game and has two types: 1) arrows with a sequence from top to bottom and left to right, and 2) arrows of type (1) with different colors added (Fig. 8). In Table 1, we summarize the types of experiments and control systems. Because there are different test platforms (e.g., iOS and Android) and different sizes of handset devices (e.g., screen sizes from 3.5 to 7 in.), we have to develop the experimental tools using HTML5, CSS3, and JavaScript technologies to enable cross-platform capabilities with a consistent operating environment. The other challenge is that there may be different sample rates on the platforms of iOS and Android, and so there may be inconsistent rolling speeds while operate the rolling ball games. In addition, the event firing models are also somewhat different for the different Internet browsers installed in these handset devices (e.g., Chrome, Firefox, and Safari).

Fig. 8
figure 8

A DDR GISCHA game

Table 1 Experimental setup with different types

There are six experiments in total: three for testing different types of rolling ball games, two for DDR games and one for reCAPTCHA. We did not explain how their operation, and so the participants played these GISCHAs for the first time. Then, before they passed the challenge, we revealed the first-time pass rate and the learning period. The participants were asked to strictly perform these six CAPTCHAs, and the operation time started when the user touched the canvas and stopped when they pressed the submit button.

4.3 Experimental results

In the experiment, all the GISCHAs had the pass rate of more than 90 %, which indicated that it was easy for human beings to operate. The repeat time indicates that most of the users passed the GISCHAs on the second trial, which implies that it is easy to learn through trial-and- error (Fig. 9). In comparison, when using reCAPTCHA, they need an average of 2.5 times to pass the challenge, which may imply that it is slightly more difficult for the users to recognize the distorted characters. From (Fig. 9), we observe that the average operating time of GISCHAs averages comes to 9.06 s and 17.5 s for reCAPTCHA, which means that the user needs more time to pass the challenge through virtual keyboards, and this may be possibly owing to the limitation of the screen size. The average operating time is not all positively correlated with the degree of operational difficulty because the average operating time of the different methods is not always the same, e.g., the keyboard operation time of DDR game type one is a little bit longer than when using the accelerometer. We also carried out a survey to find out whether users behind the screen are willing to use the GISCHA. The respective experimental results are shown in Table 2.

Fig. 9
figure 9

The average challenge time with virtual keyboard as input

Table 2 Experimental results

5 Discussions

5.1 Semantic ambiguities

One problem associated with the use of semantics is that the retrieved semantic information from an image tends to be subjective and user-dependent. For example, with the image-recognize-based CAPTCHA that asks the user to choose the most beautiful flowers, the result may be different for each person. This intrinsic ambiguity in semantics makes it difficult to generate CAPTCHA challenges using image semantics. Besides, the differences in users’ intellectual growth and their knowledge may also lead to different results. Hence, we propose GISCHA which is based on unambiguous high-level semantics. We focused on the ability to solve games, which is considered to be one of the most advanced human cognitive processing abilities. In addition, simple games are considered solvable for all ages without the need for additional knowledge, and they are language independent. Fig. 10 shows that the user takes longer to finish the challenge when more semantics information is added to the picture, which is a sort of semantic ambiguity. However, the pass rates in Table 2 show that it only takes slightly more time to figure out the ambiguity.

Fig. 10
figure 10

The camparsion of pass rates and repeat times

5.2 Resisting the replay attack

Most of the CAPTCHAs are proposed and deployed with strong assumptions, such as that the communication channel must be secure and protected from any modification. The main reason is that they may suffer from the replay attack.

5.2.1 Scenario 1: Replay attack using eavesdropping

If the CAPTCHA scheme sends a packet that has a HTTP protocol, the challenges of the CAPTCHA and the corresponding password may be transferred in plain text. Because it is possible to eavesdrop on the transmission, the hacker can simply replace both the challenge and answer. For GISCHA, games can be generated on the user side by applying both FLASH and HTML5 technology which means that there is no specific pair of challenge and answer.

5.2.2 Scenario 2: Replay attack using database reconstruction

If the user uses automated programs that can record the corresponding answers, they can breach the CAPTCHA that appeared before by replaying the procedure, which is also named the database reconstruction attack [8]. Applications such as JDownloader and FreeRapid which are used to help download files from cloud stages protected by the CAPTCHA system, are used to construct the database of text-based CAPTCHA challenges. When the user is challenged by a text-based CAPTCHA that was not previously presented, the application will ask the user to input the answer. If the user passes the challenge, the application will record the challenge and answer pair for the next challenge, and so on. In our proposed scheme, it is difficult to realize the actual moves of solution, which are the solutions for different games; this means that it is difficult to determine if the challenge is the same as before and when to begin recording the corresponding answer. Besides, there exists more than one solution for the same game.

5.3 Resisting the brute force attack

Because the info, e.g., password, of the traditional user authentication mechanism is usually fixed and rarely changes; it may suffer from brute force attacks. Some CAPTCHAs [8, 26] may suffer from the same attacks because the challenge and answer spaces are limited. Four-panel comics, for example, may suffer from this type of attack because there exist only 4! combinations. In the proposed scheme, the GISCHA, e.g., EC(r j ) is an one-time permutation or one-time pad (OTP) and a human must be present to answer the EC(r j ). If the user fails to pass the challenge, it will be changed in the next session with newly generated r j+1 , e.g., EC(r j+1 ), which implies that the automated program cannot launch a so-called brute force attack because the correct answer will change every time. We analyzed the two proposed games as follows:

5.3.1 Scenario 1: Rolling ball games

For rolling ball games, the user can operate with only four arrow keys. However, the solution may be different because there exists more than one path to the destination. In our proposed scheme, using Fig. 6a as an example, the shortest path to the destination is going up and to the right using four arrow keys. However, the brute force attack will have to be tried at least 44 different keys before the best case solution is found. The situation may worsen if we add destinations have various colors and barriers.

5.3.2 Scenario 2: DDR games

For DDR games, the user can operate using only four arrow keys. In our proposed scheme, using Fig. 8a as an example, the correct answer is found by first pressing the right arrow, followed by the down arrow, then the left arrow time, and finally the up arrow, implying that they must be answered in sequence. A brute force attack will have to try at least 40 combinations before the correct sequence is found. In addition, the situation may become worse if we add the limitation of the wrong inputs. Besides, we may also add various colors as barriers for more advanced protection.

5.4 Resisting the machine learning

The use of machine learning to classify or recognize objects was an effective attack on Asirra [12], but it will not work on GISCHA because the objects used in a current challenge are uncorrelated with those used in other challenges. The reason why we added berries as the sample is that even though the computer can recognize the sample as a maze, it is almost impossible for a computer to know how to avoid the barriers without physical/spatial cognition; the barriers can be replaced by meanings that are only understood by human beings. This type of processing ability is innate and is common in our daily life. In other words, we can enhance the security by adding new and different types of obstacles by changing the position after every try.

5.5 Operation complexity and language independence

Traditional text-based CAPTCHAs usually ask the user to enter vocabularies and phrases, that may be difficult for foreigners to spell, and which therefore lead to high failure rates. However, image-based CAPTCHAs, may suffer from mislabeling, synonymy and polysemy troubles (Table 3). Because GISCHA was designed to use simple games in order to identify humans, they are designed to be operated by simple and repeatable keys. In other words, GISCHA is language independent both for the challenge and operation instructions, which are suitable for all aspects.

Table 3 Comparison of previously proposed CAPTCHA with GISCHA

5.6 Entertainment and commercial aspects

Traditional CAPTCHAs are annoying to users, which may prevent the marketing expansion of commercial websites. Text-based CAPTCHAs with over distorted images may also increase the number of false results. In the proposed scheme, despite the fact that the proposed method may sometimes require more time for authentication compared to conventional CAPTCHAs using text-based ones, the level of usability experienced by the user is not expected to decrease significantly. In addition, users who use handset devices feel even better while operating GISCHA with an accelerator when walking or taking public transportation. Furthermore, there is a potential to realize commercial value. We demonstrated the rolling ball game on a flat surface, while indicating that these tiles can have other commercial purposes. For example, advertising posters can be used to decorate these tiles. In addition, all types of barriers can be big head dolls for commercial sausage.

6 Conclusion

Traditional CAPTCHAs may increase the sense of annoyance felt by users who have to prove that they are human every time they access the web. However, a CAPTCHA should bring a more pleasant experience to the user. In this study, we proposed a GISCHA scheme that uses games to differentiate between humans and bots. We formally define the limitations and types of games that can be used to develop the GISCHA. We also introduced the concept of using accelerometers which are built into most of the handset devices to control the CAPTCHA for conveniences and to add more fun. We also developed three games to evaluate the usability of our proposed scheme. The results show that the GISCHA is easily operated and language independent. The experiments show that the average response time is 11.6 s and the correct attempt rate is above 90 %. Besides, most of the user can pass the challenge at the first try and the learning period is relatively short.

Currently, there remains scope for improvement in terms of both security and usability, and so we plan to make improvements to the proposed method using experiments to reveal the cognitive load of different games and GUIs. Furthermore, we need to evaluate the extent to which the correct response rate and the total response time for the GISCHA depends on the intelligence of each user, and we attempt to implement the GISCHA in other games that lead to better results.