1 Introduction

Since the early days of the Internet, different services have been provided to serve a variety of users. Service providers aim to allocate these services and resources to end-users [8, 41, 53]. Internet users increasingly encounter malware programs and bots (automated scripts) that spread malicious information, reduce system performance, and waste resources [5, 17]. To counter these threats, the CAPTCHA test (Completely Automated Public Turing Test to Tell Computers and Humans Apart) was developed, building on the test that Alan Turing proposed in 1950 [24]. The intention of Artificial Intelligence was to create a machine able to think like a human. CAPTCHA is a security mechanism used to distinguish human users from malicious computer programs trying to gain illegitimate access to resources [6, 21, 50, 54]. Hence, its main functions are to protect web sites, applications, interfaces, and services such as Google; to prevent spam in blogs; and to protect email addresses [43].

CAPTCHAs are used in a variety of online applications such as free email accounts, e-commerce, online polling, chat rooms, and many other interactive online services [6, 8]. There are different categories of CAPTCHAs: text-based, audio, video, and image-based [16, 49]. Image-based CAPTCHA is one of the most effective defenses against the machine attacks that web sites face.

Recognizing objects and images is natural for humans, but very complex for computers [25]. This type of CAPTCHA requires minimal interaction with the keyboard. It does require significantly more server processing and more area on a web page [35], but it is nevertheless widely applied [35, 51]. In this research, several forms of image-based CAPTCHAs were employed. This type of CAPTCHA is connected to the elements of Human-Computer Interaction: the user chooses one or several images from an offered list based on the requested properties, so solving it is affected by human factors. Selecting the appropriate CAPTCHA represents a major challenge for web administrators.

The authors defined an innovative research model that was applied in the paper through four phases. The purpose of this study was to rank seven different image-based CAPTCHAs by their usability using the PROMETHEE-GAIA method. The objective was to identify the best alternative, in this case the best-ranked CAPTCHA, through two scenarios that evaluate three different criteria with a subjective and an objective assessment approach. The AHP method was used to calculate the subjective weights of the criteria, and the results were analyzed in scenario 1. The Shannon entropy method was used to calculate the objective weights in scenario 2.

This study makes an empirical contribution by linking the scientific literature on CAPTCHA tests with the PROMETHEE/GAIA methods, which addresses a gap in that literature. It also has practical implications in the field of computer science. Web administrators who face a decision-making dilemma when implementing appropriate computer security mechanisms on their websites can make an adequate decision by considering a broad variety of usability indicators with multi-criteria decision-making methods.

The paper is organized as follows. Section 2 gives the relevant literature background regarding the introduction and usability of image-based CAPTCHA; Section 3 describes the experimental part and the formulation of the four-step methodological framework; Section 4 presents a discussion of the obtained results and analyses the contribution; and, finally, the conclusions and guidelines for future work are presented in Section 5.

2 Theoretical framework of research

One of the safest and most advanced types of CAPTCHA is image-based CAPTCHA. It requires users to find and point to the desired image in an image list. Because it is based on image details, it represents an extremely difficult task for a bot to solve [35]. Figure 1 shows an example of image-based CAPTCHA where users have to select all images in which Bill Gates appears [30]; in some cases, users need to select only one image.

Fig. 1 Example of image-based CAPTCHA. Source: [30]

In the scientific literature, authors have considered different approaches to the usability of CAPTCHA, which is one of its fundamental issues. Usability relates to the process of solving the CAPTCHA test by Internet users: if humans cannot use it, the reason for its application does not exist. Usability also refers to measuring the efficiency of solving the CAPTCHA test through response time, number of attempts, and level of difficulty for the computer user. The usability of image-based and text-based CAPTCHAs that verify computer users who join the web was analyzed by [35]. The results of this study showed that the respondents preferred ESP-PIX and ASIRRA, and that there were no statistically significant differences between the CAPTCHAs in task completion time. Kwon and Cha [30] proposed a technique for analyzing the effectiveness of image annotation based on CAPTCHAs. The results of this technique showed a positive impact on the respondents. Yan and El Ahmad [56, 57] investigated usability aspects of text-based CAPTCHA design. The results showed that this type of CAPTCHA was difficult for foreigners due to users’ language barriers. Table 1 presents a chronological overview of the relevant literature concerning the usability of different types of CAPTCHAs (Fig. 2).

Table 1 Literature review of the usability of CAPTCHA

Fig. 2 Proposed research model

Based on the detailed analysis of the previous studies, it can be stated that the application of multi-criteria decision methods in combination with CAPTCHAs has not been considered. The intention of this paper is to overcome this gap and offer a scientific contribution by applying the PROMETHEE-GAIA methodology, which introduces a systematic approach to planning and managing the CAPTCHA security mechanism in web pages. In addition, this study has a practical contribution: it helps resolve the dilemmas that web administrators encounter when choosing adequate security mechanisms.

3 Methodology framework for ranking CAPTCHAs

In this paper, the PROMETHEE method (Preference Ranking Organization Method for Enrichment Evaluations) was used to rank the image-based CAPTCHAs according to their usability, from the perspective of a set of Internet users and informatics experts [1, 2, 18]. This method is one of the most prominent multi-criteria decision-making (MCDM) methods and is appropriate for solving very complex decision-making problems [1, 4, 11]. The motive for using the PROMETHEE method to process the starting data set lies in certain advantages it has over other MCDM methods. These advantages are reflected in the way the problem is structured, the amount of data that can be processed, the ability to quantify qualitative data, good software support, and the presentation of the obtained results ([33]; Visual PROMETHEE software, 2012). Analyzing the state of the art of CAPTCHAs, the authors did not find scientific papers that analyzed criteria for ranking CAPTCHAs. Therefore, the scientific contribution of this research is reflected in the fact that this methodology can become widely recognized in computer science. The experimental part of the study was conducted through four steps (Fig. 2): (1) problem identification and data collection; (2) determination of criteria; (3) weighting the criteria via the AHP and Shannon entropy methods; and (4) application of the PROMETHEE/GAIA method. Internet users familiar with CAPTCHA and experts in the field of computer science were included in the study. Respondents evaluated seven different image-based CAPTCHAs on three criteria: time to find a solution, number of attempts, and task difficulty. Two approaches (two scenarios) were used to weight the criteria. In scenario 1, the subjective opinion of experts in the field of computer science was examined, and the AHP method was used to quantify the data. In scenario 2, the data collected from the Internet users were quantified objectively with the Shannon entropy method. In the last phase of the research model, the ranking of CAPTCHAs was performed using the PROMETHEE II method, and the GAIA (Geometrical Analysis for Interactive Assistance) plane was used for the graphical representation of the obtained results.

3.1 Data collection

Three hundred and twenty subjects who use the Internet daily participated in the experiment. All respondents participated voluntarily and anonymously through an online form. To avoid influencing them, the respondents were not informed about the scope of the analysis or the types of data collected. Data collection was conducted between November and December 2019 and consisted of two parts. In the first part of the survey, respondents filled in a questionnaire about their demographic characteristics. Each respondent was characterized by gender, age, level of education, number of years of Internet experience, and daily Internet usage in hours. The second part of the survey focused on the analysis of image-based CAPTCHAs. The task of each respondent was to solve the following seven image-based CAPTCHAs: Animal in wild (CPT 1), House numbers (CPT 2), Picture of CAPTCHA (CPT 3), Animated character (CPT 4), Face of an old woman (CPT 5), Surprised face (CPT 6), and Worried face (CPT 7). Three metrics were analyzed to judge the usability of the CAPTCHAs: time to find a solution, number of attempts, and task difficulty. For each respondent, the solution time (in seconds) was measured from the moment the respondent started the task until its completion. All respondents needed between 1 and 3 attempts. A smart phone and a tablet were used for the measurements. The criterion task difficulty was rated on a scale from 1 to 5 (1 - very easy to solve; 2 - easy to solve; 3 - neutral attitude; 4 - difficult to solve; and 5 - very difficult to solve).

3.2 PROMETHEE and GAIA methods

Among the numerous MCDM methods, outranking methods have progressed rapidly because they adapt flexibly to realistic decision-making situations. The PROMETHEE method first appeared in the analysis of the efficiency of different services in hospitals by D’Avignon et al. [19] and in the comparison of different teaching projects by Dujardin [20]. It is now widely used in construction, ecology, agriculture, economy, medicine, computer science, etc. [3, 4, 9, 10, 42, 55].

PROMETHEE is an outranking method for a finite set of alternatives, developed by Brans [12]. The method starts with the formulation of the alternatives and a set of criteria, which are then arranged in an m x n decision matrix [46, 58]. In the PROMETHEE method, it is possible to choose one of six forms of the preference function (Fig. 3): Usual, U-shape, V-shape, Level, Linear, and Gaussian, where each form can be described with two thresholds (Q and P). The indifference threshold (Q) is the largest deviation that the decision-maker considers unimportant, while the preference threshold (P) is the smallest deviation that is decisive for the decision [13, 23]. The PROMETHEE method is based on the positive (Φ+) and negative (Φ−) preference flows of each alternative in the valued outranking relation, which are used to rank the alternatives according to the selected preferences (weights) [12]. The positive flow expresses how much a specific alternative dominates the other alternatives, and the negative flow expresses how much that alternative is dominated by the others [32, 37, 52, 60].

Fig. 3 List of preference functions [13]
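The role of the two thresholds can be illustrated with a minimal sketch of three of the six preference functions (Usual, Linear, and Level). The function names and the threshold values in the example are hypothetical and are not taken from the study; the shapes follow the common textbook definitions, in which the deviation d is oriented so that a positive value favours the first alternative.

```python
def usual(d):
    """Usual criterion: any positive deviation counts as strict preference."""
    return 1.0 if d > 0 else 0.0


def linear(d, q, p):
    """Linear criterion: 0 up to q, rising linearly to 1 at p."""
    if d <= q:
        return 0.0
    if d >= p:
        return 1.0
    return (d - q) / (p - q)


def level(d, q, p):
    """Level criterion: 0 up to q, 0.5 between q and p, 1 above p."""
    if d <= q:
        return 0.0
    if d <= p:
        return 0.5
    return 1.0


# Hypothetical thresholds (in seconds) for a time criterion.
Q_EXAMPLE, P_EXAMPLE = 0.5, 3.0
mid_linear = linear(1.75, Q_EXAMPLE, P_EXAMPLE)  # halfway between q and p
mid_level = level(1.75, Q_EXAMPLE, P_EXAMPLE)    # inside the 0.5 step
```

For a criterion that is minimised, such as solution time, the deviation would be taken as f(b) − f(a) before it is passed to these functions, so that a shorter time yields a positive d.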

The success of application of the PROMETHEE method is achieved due to the mathematical properties applied in it. Four major steps are usually involved in making decisions based on PROMETHEE method [11, 12, 38, 52]:

  1. Step 1.

    Forming a decision-making table. This table includes the alternatives and criteria, which can be based on cardinal (quantitative) or ordinal (qualitative) data.

  2. Step 2.

    Calculation of the weights of the criteria. The weight of each criterion is an index that expresses its priority relative to the other criteria and indicates its relative importance. A greater weight implies greater importance compared to the other criteria. Weights are non-negative numbers whose sum is equal to 1.

  3. Step 3.

    Evaluation of the preference model. The deviations are defined by pairwise comparisons of the alternatives on each criterion. Eq. (1) is used for the calculation, where dj(a, b) denotes the difference between the evaluations of alternatives a and b on criterion j.

$$ {d}_j\left(a,b\right)={f}_j(a)-{f}_j(b) $$
(1)

Priority indicators are obtained over all criteria by applying the weights Wj calculated in Step 2. Eq. (2) determines the preference of alternative a over alternative b, where Pj(a, b) is the value of the chosen preference function for the deviation dj(a, b).

$$ \pi \left(a,b\right)=\sum \limits_{j=1}^n{W}_j\cdot {P}_j\left(a,b\right);\kern0.5em \sum \limits_{j=1}^n{W}_j=1 $$
(2)

π(a, b) is the overall preference of option a over option b, aggregated over all criteria. When π(a, b) < 1, option a is not fully preferred on every criterion, for example because option b is prioritized on some of them. π(a, b) ≃ 0 implies a weak preference of option a over option b, and π(a, b) = 1 implies a strong (complete) preference of option a over option b [7].

  4. Step 4.

    Calculation of the preference flows. To rank the alternatives, Eqs. (3), (4), and (5) are used to calculate the leaving, entering, and net flows, respectively, where A is the set of m alternatives.

$$ {\varPhi}^{+}(a)=\frac{1}{m-1}{\sum}_{x\in A}\pi \left(a,x\right) $$
(3)
$$ {\varPhi}^{-}(a)=\frac{1}{m-1}{\sum}_{x\in A}\pi \left(x,a\right) $$
(4)
$$ \varPhi (a)={\varPhi}^{+}(a)-{\varPhi}^{-}(a) $$
(5)

Φ+(a) is the leaving flow or output of option a (the priority of option a over the other alternatives), Φ−(a) is the entering flow or input of option a (the extent to which option a is dominated by the other alternatives), and Φ(a) is the net flow used for ranking. To facilitate the decision-making process, the decision-maker usually wants a complete ranking. The alternative with the lowest Φ(a) has the worst ranking, and the alternative with the highest Φ(a) has the best [11, 37, 38].
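The four steps can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in the study (which relied on the Visual PROMETHEE software): the scores of the three alternatives are invented, the usual preference function is assumed for every criterion, and only the weights (0.53, 0.21, 0.26 for time, attempts, and difficulty) are borrowed from scenario 1.

```python
def usual(d):
    """Usual preference function: 1 for any positive deviation, else 0."""
    return 1.0 if d > 0 else 0.0


def promethee_flows(scores, weights, pref_funcs, minimise):
    """Return (phi_plus, phi_minus, phi_net) for m alternatives.

    scores[i][j]  : evaluation f_j(a_i) of alternative i on criterion j
    weights[j]    : criterion weights W_j, assumed to sum to 1
    pref_funcs[j] : maps a deviation d to a preference degree in [0, 1]
    minimise[j]   : True if smaller values of criterion j are better
    """
    m, n = len(scores), len(weights)

    def pi(a, b):
        total = 0.0
        for j in range(n):
            d = scores[a][j] - scores[b][j]          # Eq. (1)
            if minimise[j]:
                d = -d                               # lower is better
            total += weights[j] * pref_funcs[j](d)   # Eq. (2)
        return total

    others = lambda a: (x for x in range(m) if x != a)
    phi_plus = [sum(pi(a, x) for x in others(a)) / (m - 1) for a in range(m)]   # Eq. (3)
    phi_minus = [sum(pi(x, a) for x in others(a)) / (m - 1) for a in range(m)]  # Eq. (4)
    phi_net = [p - q for p, q in zip(phi_plus, phi_minus)]                      # Eq. (5)
    return phi_plus, phi_minus, phi_net


# Hypothetical scores: [time in s, attempts, difficulty], all minimised.
scores = [
    [10.0, 1.2, 2.0],   # alternative A
    [12.0, 1.1, 2.5],   # alternative B
    [15.0, 1.5, 3.0],   # alternative C
]
weights = [0.53, 0.21, 0.26]
phi_plus, phi_minus, phi_net = promethee_flows(
    scores, weights, [usual, usual, usual], [True, True, True]
)
# Complete ranking: indices sorted from highest to lowest net flow.
ranking = sorted(range(len(scores)), key=lambda a: -phi_net[a])
```

In this toy example alternative A beats B on time and difficulty but loses on attempts, so π(A, B) = 0.79 and π(B, A) = 0.21. Because every pairwise preference is counted once as leaving and once as entering, the net flows always sum to zero.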

The GAIA method for visual representation complements the PROMETHEE ranking. The GAIA matrix consists of the decomposition of the net outranking flows Φ(a) by criterion [26, 29]. The matrix data are processed by a Principal Component Analysis (PCA) algorithm and then displayed on the GAIA biplot [52]. The quality of the GAIA representation is given by the delta parameter (Δ), which indicates the share of information retained in the GAIA plane. As a rule of thumb, values larger than 70% can be considered acceptable: little information is lost and the GAIA plane provides a good, reliable representation of the decision problem [32, 38].
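A rough sketch of how Δ can be obtained is given below. It assumes the usual definition of the single-criterion net flows and estimates the two largest eigenvalues of their cross-product matrix by power iteration; Δ is then the share of the total variance carried by the first two principal components. The data are invented, the usual preference function is assumed, and the helper names are hypothetical, so the sketch does not reproduce the Visual PROMETHEE computation.

```python
import math


def usual(d):
    """Usual preference function: 1 for any positive deviation, else 0."""
    return 1.0 if d > 0 else 0.0


def unicriterion_flows(scores, minimise, pref=usual):
    """m x n matrix of single-criterion net flows phi_j(a_i)."""
    m, n = len(scores), len(scores[0])
    flows = [[0.0] * n for _ in range(m)]
    for j in range(n):
        sign = -1.0 if minimise[j] else 1.0
        for a in range(m):
            for x in range(m):
                if x != a:
                    d = sign * (scores[a][j] - scores[x][j])
                    flows[a][j] += (pref(d) - pref(-d)) / (m - 1)
    return flows


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue/eigenvector of a symmetric PSD matrix (power iteration)."""
    n = len(matrix)
    v = [1.0 / (k + 1) for k in range(n)]
    lam = 0.0
    for _ in range(iterations):
        w = [sum(matrix[r][c] * v[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:          # matrix is (numerically) zero
            return 0.0, v
        v = [x / norm for x in w]
        lam = norm
    return lam, v


def gaia_delta(flows):
    """Share of variance kept by the first two principal components."""
    n = len(flows[0])
    # Columns of the flow matrix already sum to zero, so no centring is needed.
    cov = [[sum(row[r] * row[c] for row in flows) for c in range(n)] for r in range(n)]
    trace = sum(cov[k][k] for k in range(n))
    if trace == 0:
        return 0.0
    lam1, v1 = top_eigenpair(cov)
    deflated = [[cov[r][c] - lam1 * v1[r] * v1[c] for c in range(n)] for r in range(n)]
    lam2, _ = top_eigenpair(deflated)
    return (lam1 + lam2) / trace


# Hypothetical scores: [time in s, attempts, difficulty], all minimised.
example_scores = [
    [10.0, 1.2, 2.0],
    [12.0, 1.1, 2.5],
    [15.0, 1.5, 3.0],
    [11.0, 1.4, 1.8],
]
example_flows = unicriterion_flows(example_scores, [True, True, True])
example_delta = gaia_delta(example_flows)
```

With only three criteria, as in this study, at most one principal component is dropped, which is consistent with Δ values close to 100%.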

3.3 Calculation of the weights of criteria

For solving MCDM problems, an adequate approach for determining the weights of the selected indicators is important. The weights can be classified into two categories depending on the information source: subjective and objective weights [42]. The weight of each criterion is an index that expresses its priority relative to the other criteria and indicates its relative importance.

In scenario 1 (SC1), the weights of the criteria were calculated as subjective weights using the Analytical Hierarchy Process (AHP), and the obtained results are shown in Table 3. Three decision-makers who were familiar with CAPTCHAs were involved in this phase. The AHP method was proposed by Thomas Saaty [44]. To determine the weights of the parameters in MCDA, pairs of criteria were compared on the Saaty scale from 1 to 9, and the comparisons were then used to calculate the normalized weight vector. However, the main problem of the subjective approach to determining the weights of criteria was inconsistency, which occurred mainly because the decision-makers could not always provide a consistent assessment under different weighting schemes, and because the weighting process depended on the structure of the problem. This problem could be overcome through an objective approach to assigning weights to the criteria, which could be conducted independently of the subjective assessment of the decision-makers and applied when reliable subjective weights could not be determined [33, 46]. This approach was used in the second scenario.
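As an illustration of the AHP weighting step, the following sketch derives weights from a pairwise comparison matrix with the row geometric mean method (a common approximation of Saaty's principal eigenvector) and checks the consistency ratio. The comparison matrix is invented for illustration and is not the one elicited from the three decision-makers, so the resulting weights differ from those in Table 3.

```python
import math

# Saaty's random consistency index for small matrices.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(pairwise):
    """Normalised weights from the geometric mean of each row."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(pairwise, weights):
    """CR = CI / RI, with lambda_max estimated as the mean of (A w)_i / w_i."""
    n = len(pairwise)
    aw = [sum(pairwise[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical judgements on the Saaty scale; order: time, attempts, difficulty.
pairwise = [
    [1.0, 3.0, 2.0],       # time compared with (time, attempts, difficulty)
    [1 / 3, 1.0, 1 / 2],   # attempts
    [1 / 2, 2.0, 1.0],     # difficulty
]
ahp_w = ahp_weights(pairwise)
ahp_cr = consistency_ratio(pairwise, ahp_w)
```

A consistency ratio below 0.1 is conventionally treated as acceptable; larger values suggest that the pairwise judgements should be revisited, which is the inconsistency problem described above.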

Scenario 2 (SC2) used objective weights calculated with the Shannon entropy method. This method measures the amount of useful information in the data provided [59]. The lower the entropy value, the lower the degree of disorder of the system: if the evaluated objects differ strongly on a criterion, that criterion provides more useful information [38]. To calculate the objective weights, the following four steps were applied [18, 38, 59].

  1. Step 1.

    The decision matrix was normalized through Eq. (6), in which fj(ai) was the evaluation of option ai on criterion j.

$$ {N}_{ij}=\frac{f_j\left({a}_i\right)}{\sum \limits_{i=1}^m{f}_j\left({a}_i\right)};{\forall}_j $$
(6)
  2. Step 2.

    The entropy Ej was calculated for each criterion using Eq. (7), summing over the m alternatives.

$$ {E}_j=\sum \limits_{i=1}^m{N}_{ij}\times \log \left({N}_{ij}\right);{\forall}_j $$
(7)
  3. Step 3.

    The Wj weight matrix was calculated using Eq. (8):

$$ {W}_j=1+\left(k\times {E}_j\right);{\forall}_j $$
(8)

where k was constant and it was obtained through Eq. (9), and m was the number of alternatives:

$$ k=\frac{1}{\log (m)} $$
(9)
  4. Step 4.

    After the weights were calculated for each criterion, their sum might not be equal to 1. Thus, the final normalization of the weights over the n criteria was performed according to Eq. (10).

$$ {W}_j=\frac{W_j}{\sum \limits_{l=1}^n{W}_l};{\forall}_j $$
(10)
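The four steps of Eqs. (6) to (10) can be sketched as follows. The decision matrix is invented for illustration and all scores are assumed to be positive; the sign convention follows the equations as written, so the sum in Eq. (7) is non-positive and Eq. (8) adds it to 1.

```python
import math


def entropy_weights(scores):
    """Objective criterion weights from an m x n matrix of positive scores."""
    m, n = len(scores), len(scores[0])
    k = 1.0 / math.log(m)                        # Eq. (9)
    raw = []
    for j in range(n):
        column_sum = sum(scores[i][j] for i in range(m))
        e_j = 0.0
        for i in range(m):
            n_ij = scores[i][j] / column_sum     # Eq. (6)
            if n_ij > 0:                         # 0 * log(0) is taken as 0
                e_j += n_ij * math.log(n_ij)     # Eq. (7), a non-positive sum
        raw.append(1.0 + k * e_j)                # Eq. (8)
    total = sum(raw)
    if total <= 1e-12:
        # Every column is uniform, so the data cannot discriminate the criteria.
        raise ValueError("all criteria have maximal entropy")
    return [w / total for w in raw]              # Eq. (10)


# Hypothetical mean scores: [time in s, attempts, difficulty].
example_matrix = [
    [10.0, 1.2, 2.0],
    [12.0, 1.1, 2.5],
    [15.0, 1.5, 3.0],
    [11.0, 1.4, 1.8],
]
example_weights = entropy_weights(example_matrix)
```

A criterion whose values are almost identical across the alternatives receives a weight close to zero, because its normalised column is nearly uniform and its entropy is close to the maximum.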

To apply the PROMETHEE/GAIA model, a set of parameters needed to be assigned to each criterion in addition to its weight. These parameters included the impacts of the proposed alternatives, the direction of optimization (maximizing or minimizing), and a preference function with the related thresholds.

4 Results and discussion

The descriptive statistics of the respondents, presented in Table 2, indicate that 55.6% of the participants were men and 44.4% were women. The respondents were randomly selected, and the only requirement was that they be Internet users familiar with CAPTCHA. The largest group of respondents, 43.1%, were older than 41 years. Regarding the level of education, 39.1% had an MSc diploma and 29.1% had a BSc diploma. In terms of Internet experience, 58.8% of respondents had between 11 and 15 years of experience, and 33.8% had between 6 and 10 years. These two groups were dominant in the sample. On the other hand, the largest group of respondents, 49.1%, did not spend much time on the Internet daily (up to 3 h a day), and more than 12 h a day was chosen by only 2.2% of respondents.

Table 2 Results of demographic characteristics of respondents

Based on the criteria weights calculated with the AHP method for SC1, time to find a solution had the highest priority, with a dominant value of 0.53. Next was task difficulty, with a value of 0.26, and the criterion number of attempts had the lowest priority, with a value of 0.21 (Table 3). Input parameters for SC2 were determined using the Shannon entropy method after the data were collected by questionnaire, and the results are also shown in Table 3. For SC2, task difficulty had the highest priority, with a value of 0.39, followed by number of attempts with a value of 0.35, and the criterion time to find a solution had the lowest priority, with a value of 0.26.

Table 3 Input parameters of impact matrix

Thereafter, the obtained weights of each criterion were entered into the Visual PROMETHEE software. All criteria were minimized (Table 3). Given that most ratings obtained in the questionnaire had similar values and only minimal differences existed among some evaluations, these fluctuations and conflicts had to be considered when determining the preference function, since the result could change if evaluations were duplicated. Therefore, to prevent such an occurrence, the domain of changes for each criterion was calculated.

Considering that the data in Table 3 had a quantitative character, a linear function was chosen as the preference function for the first criterion (time to find a solution), and a level function for the other two criteria (number of attempts and task difficulty), with indifference and preference thresholds (q and p) in the 5% and 30% zones, respectively [32]. The estimated impacts and their indices were submitted to PROMETHEE and are given in Table 3. Application of PROMETHEE for SC1, which was based on subjective AHP weights, and for SC2, which was based on objective entropy weights, produced the results presented in Table 4 and Table 5, respectively, for the leaving flows Φ+(a), entering flows Φ−(a), and net flows Φ(a). The leaving and entering flows take values between 0 and 1, while the net flow lies between −1 and 1.

Table 4 Preference flow for the SC1 (Subjective AHP weights)
Table 5 Preference flow for the SC2 (Objective entropy weights)

The difference in the obtained results can be seen in the ranking of the first three alternatives in both scenarios. The net flow ranking (Table 6) showed that the most preferred alternative for SC1 was CPT 5 (Face of an old woman), followed by CPT 4 (Animated character) and CPT 2 (House numbers). In SC2 the most preferred alternative was CPT 4 (Animated character), followed by CPT 2 (House numbers) and CPT 5 (Face of an old woman). This indicated that for SC1 the most usable image-based CAPTCHA from the perspective of experts was Face of an old woman, while from the perspective of the Internet users it was Animated character. There was no difference between SC1 and SC2 in the ranking of the last four positions. The order of alternatives from the fourth to the seventh position was: CPT 1, CPT 6, CPT 3, and CPT 7, respectively.

Table 6 Complete ranking of CAPTCHA for the selected scenarios

In order to determine the scope of preferred relations with given ranking, the analyses of interval of stability for both scenarios (SC1 and SC2) were done and the obtained results were presented in Table 7.

Table 7 Weight stability intervals for referent scenarios

The purpose of the stability interval analysis was to check the robustness of the chosen preference relations. The stability interval defined the limits within which the weight of a given criterion could vary without changing the obtained ranking. A wide stability interval indicated that the ranking did not change even when the parameters varied to a wide extent. The weight could be changed for only one criterion at a time, while the relative weights of the other criteria stayed the same. Based on the relatively wide stability intervals (Table 7), it can be concluded that the final ranking of the alternatives did not change even when the weight coefficients varied within relatively wide limits.
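One simple way to approximate such an interval is to scan the weight of a single criterion over a grid, rescale the remaining weights proportionally, and record the range around the original weight in which the complete ranking is unchanged. The sketch below does this for invented scores, assuming the usual preference function and minimised criteria; it is a brute-force approximation and not the procedure implemented in Visual PROMETHEE.

```python
def net_flows(scores, weights):
    """PROMETHEE II net flows (usual preference function, criteria minimised)."""
    m = len(scores)

    def pi(a, b):
        # Sum of the weights of the criteria on which a is strictly better than b.
        return sum(w for j, w in enumerate(weights) if scores[a][j] < scores[b][j])

    return [
        sum(pi(a, x) - pi(x, a) for x in range(m) if x != a) / (m - 1)
        for a in range(m)
    ]


def ranking(scores, weights):
    """Indices of the alternatives from best to worst net flow."""
    phi = net_flows(scores, weights)
    return sorted(range(len(scores)), key=lambda a: -phi[a])


def stability_interval(scores, weights, j, steps=100):
    """Approximate weight range of criterion j that keeps the ranking unchanged.

    Assumes weights[j] < 1 so that the other weights can be rescaled.
    """
    base = ranking(scores, weights)
    others = 1.0 - weights[j]

    def rank_with(wj):
        scaled = [wj if c == j else w * (1.0 - wj) / others
                  for c, w in enumerate(weights)]
        return ranking(scores, scaled)

    grid = [s / steps for s in range(steps + 1)]
    start = min(range(len(grid)), key=lambda s: abs(grid[s] - weights[j]))
    lo = hi = start
    while lo > 0 and rank_with(grid[lo - 1]) == base:
        lo -= 1
    while hi < steps and rank_with(grid[hi + 1]) == base:
        hi += 1
    return grid[lo], grid[hi]


# Hypothetical scores: [time in s, attempts, difficulty]; SC1-style weights.
scores = [
    [10.0, 1.2, 2.0],
    [12.0, 1.1, 2.5],
    [15.0, 1.5, 3.0],
]
weights = [0.53, 0.21, 0.26]
time_interval = stability_interval(scores, weights, 0)
attempts_interval = stability_interval(scores, weights, 1)
```

In this toy example the ranking is insensitive to the weight of the time criterion, whereas raising the weight of number of attempts to about one half lets the second alternative overtake the first, so its interval ends near 0.5.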

The GAIA biplot provided valuable information in addition to the PROMETHEE ranking. This two-dimensional representation of the problem displayed the relations between alternatives and criteria, indicating the strong and weak features of all alternatives and their interaction with the criteria. Alternatives were represented by triangles, and criteria by squares. Based on the position of the criteria in the GAIA plane (Fig. 4), conformity or conflict between individual criteria could be determined. The positions of the alternatives also showed their strengths and weaknesses with respect to the criteria. The closer an alternative lay to the direction of an individual criterion axis, the better that alternative performed on that criterion. In SC1, CPT 2 had the closest position for the criteria number of attempts and task difficulty, while CPT 1 had the closest position for the criterion time to find a solution. In SC2, CPT 5 had the closest position for the criterion time to find a solution, CPT 2 for the criterion number of attempts, and CPT 1 for the criterion task difficulty.

Fig. 4 GAIA plane for defined Scenario 1 (Δ = 97.4%) and Scenario 2 (Δ = 99.1%). Source: Visual PROMETHEE software

5 Conclusion

Computer science faces numerous challenges when it comes to the selection of security mechanisms for web pages. The most successful technology used as a standard security mechanism has been the CAPTCHA test. In order to define and implement the most effective measures against undesirable or malicious Internet bot programs, decision-makers should identify not only the main weaknesses of each web page, but its strong points as well. This would enable the formulation of a comprehensive and coherent set of measures that would neutralize negative effects and support the strengths of each web platform. Preliminary research along these lines requires the application of multi-criteria analysis, considering that web administrators have to consider a broad variety of indicators of systemic importance. This paper is an example of such an analysis. The application of the PROMETHEE/GAIA method in combination with the entropy model and the AHP method enabled the examination of the usability of CAPTCHA tests. The analysis found that Face of an old woman, based on the opinion of experts in computer science, and Animated character, based on the opinion of the Internet users, were the best tests according to the observed criteria. From the perspective of users, the analysis demonstrated which CAPTCHAs were the easiest to solve, and suggested that these two types of CAPTCHAs were useful because they offered the best human accuracy when solving. The ease of solving the proposed CAPTCHAs confirmed that humans could accurately recognize matched faces even under severe distortion, because they were linked to cognitive psychology. Worried face was the worst-ranked CAPTCHA according to both groups of respondents. The tested CAPTCHA samples were closely connected to facial expressions, which were difficult for respondents to recognize. On the other hand, from a web administrator’s point of view, even more complex CAPTCHAs should be applied to prevent potential machine attacks.

Although the practical and theoretical concepts employed in this study are clearly stated, and the model can be expanded to many other research problems, some constraints are inevitable. One of these constraints can be seen within decision-makers and respondents who were included in the research. Data in this research were collected during a short period of time in order to maintain the compatibility and coherence of the data. Also, the needs and expectations of users are rapidly changing. Therefore, the results of this research are regarded as a snapshot taken in a certain time, and these results can vary as time passes.