
1 Introduction

Usability is currently considered one of the most important quality attributes in software development; it aims to make the developed product easier for its users to use [1]. Given the impact of usability on software products, there are various methods to evaluate whether it has been achieved within an application, one of which is usability testing.

Usability testing is a UX research technique in which a researcher asks a participant to complete certain tasks in a specific interface, so that the researcher can observe the participant's behavior and hear their comments [2].

Similarly, a high-quality user experience (UX) has today become a key competitive factor in product development. User experience methods and techniques investigate how people feel about a system. However, we must keep in mind that traditional evaluation methods, whether interviews or questionnaires, are based on self-reported data, which are often subject to social desirability bias [3]. Therefore, by obtaining additional information from the user's interaction with the system under evaluation, beyond interviews or questionnaires, we can build a more robust picture of the user experience.

In this study, we present the results of a Systematic Literature Review (SLR) conducted to identify the use and application of heatmaps in usability testing. Our main objective is to identify studies that report on the implementation of heatmaps in usability testing, in order to understand their advantages, disadvantages, and challenges. To do this, we follow the protocol established by B. Kitchenham [4], complemented by the PICOC criteria of M. Petticrew [5]. With the information obtained, we identify a series of challenges, applications, and metrics related to the use of heatmaps in usability testing. This allows us to deepen our understanding of how this technique can be used to evaluate the usability of a software product.

This document is structured as follows. In Sect. 2, we describe the main Human-Computer Interaction (HCI) concepts used in this study. In Sect. 3, we present the conduct of the systematic literature review and, in Sect. 4, we discuss its results. Finally, in Sect. 5, we present the conclusions of the research.

2 Background

This section presents the main Human-Computer Interaction (HCI) concepts used in the present research.

2.1 User Experience

According to I. Maslov and S. Nikou [6], user experience is a concept that describes the user's interactions with a product, including perception, learning, and usage. This can be evaluated based on ten facets: legal or ethical, economic, technological, pragmatic, cultural, emotional, social, reciprocal, cognitive, and perceptual. On the other hand, according to ISO 9241–210 [7], user experience refers to the perceptions and responses of a person resulting from the use of a product, system, or service.

2.2 Usability

According to J. Nielsen [2], usability is a quality attribute that represents how easy user interfaces are to use. Usability is defined by five components. The first, learnability, refers to how easy the system is to learn. Efficiency indicates that, once users have learned the system, they should reach a high level of productivity. Memorability means the system should be easy to remember, so that users who stop using it do not have to relearn everything. The errors component requires the system to have a low error rate, so that users make as few errors as possible and can recover quickly when they do occur. Finally, satisfaction means the system should be pleasant to use, so that users are satisfied.

2.3 Usability Test

According to J. Nielsen [2], a usability test is a user experience research method: a usability evaluator asks a participant to perform different tasks using one or more specific user interfaces while the evaluator observes the participant's behavior and listens to their feedback on the evaluated interface. The objective varies from study to study, but usability tests are generally used to identify problems, discover opportunities, or learn about the behavior and preferences of the target user [8].

2.4 Heatmap

Heatmaps are a tool complementary to usability tests that has been used successfully in the HCI area [3]. They provide a user-friendly way to visualize the metrics obtained from the user's interaction with the system. The metrics best suited to being shown through a heatmap are the clicks, mouse movements, and eye movements of the user participating in the usability test [9].

2.5 Eye Tracking

According to S. Stuart [9], eye tracking is the process of monitoring and measuring the movements and positioning of the eyes during a specific task, without causing any discomfort. The device used for this purpose is commonly referred to as an eye tracker.

2.6 Data Visualization

According to C. Chen, W. Härdle, and A. Unwin [10], data visualization is the representation of data in a graphical form that can be easily understood by readers or viewers. As a result, it is becoming an increasingly important tool in scientific research.

2.7 User Experience Metric

According to the Nielsen Norman Group [11], a user experience metric is numerical data that informs usability specialists about some aspect of the user experience of the product being evaluated. These metrics can be highly valuable, as they help to improve the quality of designs and to identify problems.

3 Systematic Literature Review

The Systematic Literature Review aims to identify the available research related to the phenomenon of interest. The methodology proposed by B. Kitchenham and S. Charters [4] was employed, in which the review is carried out in three stages. The first stage, planning, involves identifying the need for the review and developing a review protocol; in this stage, the review questions, search strings, and inclusion and exclusion criteria are defined. The second stage, conducting the review, consists of identifying and selecting the studies relevant to the research; data extraction and synthesis are also performed in this stage. Finally, the reporting and dissemination stage covers the preparation of the main report [4]. The process was executed with the web tool Parsifal, following the phases presented above.

3.1 Review Goal

The main goal of this review is to identify studies that report on the implementation of heatmaps in remote usability tests to understand their advantages, disadvantages, and challenges. Additionally, it aims to identify the tools used for creating and generating heatmaps in this context, to find the best way to implement them.

3.2 Review Questions

To achieve the objectives proposed above, the following review questions were formulated.

  • RQ1: How have heatmaps been used in the application of usability tests?

  • RQ2: What challenges were addressed by using heatmaps in usability testing? How were they addressed?

  • RQ3: What are the metrics that have been used when using heatmaps in usability tests?

To address each of the questions, the PICOC criteria were defined; these are included in the B. Kitchenham protocol [4] and were defined by M. Petticrew and H. Roberts [5]. The PICOC criteria are as follows: (1) Population, referring to the object of study; (2) Intervention, related to the aspects of the population that will be studied and how; (3) Comparison, in which interventions are compared against each other; (4) Outcomes, related to what is expected or obtained from the systematic review; and (5) Context, which establishes the circumstances under which the studies to be identified were carried out. It is important to mention that, in this case, the Comparison criterion was not applied because the review does not intend to compare interventions. Table 1 shows the definition of the PICOC criteria.

Table 1. Definition of the general concepts using PICOC

3.3 Search Strategy

Search Engines.

Four search engines were considered for the research: SCOPUS, the ACM Digital Library, and the IEEE Digital Library, for their relevance in the field of computer engineering, and ALICIA from Concytec, the largest digital collection of scientific and technological production in Peru, which also includes the thesis repositories of various Peruvian universities.

Search Strings.

To formulate the search strings, it is important to define the keywords obtained from the definition of the PICOC criteria. Related words are then sought for each concept. Table 2 shows the grouping of each concept with its related words and the PICOC criterion to which they belong.

Table 2. Keywords and related words

Then, to build the search string, the logical operator OR was used to join the keywords and synonyms within the same group, while the logical operator AND was used to join the different groups. An asterisk was appended to keywords that have more than one inflected or plural form. The search string resulting from this process is shown below.

("heatmap*" OR "eye track*" OR "eyetrack*" OR "heat map*") AND ("usability test*" OR "UX" OR "user test*") AND ("algorithm*" OR "technique*" OR "disadvantage" OR "downside*" OR "drawback*" OR "metric*" OR "benchmark*" OR "criterion*" OR "measure*" OR "tools" OR "instrument*").

For ALICIA, the same string translated into Spanish was used. The search string is shown below.

("mapa de calor" OR "eye track" OR "seguimiento ocular") AND ("pruebas de usabilidad" OR "UX") AND ("algoritmo" OR "técnica" OR "desventajas" OR "métrica" OR "benchmark" OR "criterios" OR "medidas" OR "herramientas" OR "instrumentos").

Inclusion and Exclusion Criteria.

Since not all scientific articles can answer the questions posed in the systematic review, Kitchenham prescribes the definition of inclusion and exclusion criteria, which help determine which studies are useful for answering those questions.

For this reason, the study inclusion criteria were:

  • IC1: The study describes the different types of heatmaps in usability testing, as well as their challenges.

  • IC2: The study reports tools for the implementation of a heatmap module in usability testing.

  • IC3: The study describes eye-tracking methods applied to usability testing.

Similarly, the exclusion criteria for discarding investigations are as follows:

  • EC1: The study is more than 15 years old.

  • EC2: The study is written in a language other than English or Spanish.

  • EC3: The study focuses on usability but does not mention heatmap tools.

  • EC4: The study focuses on the implementation of heatmaps but not on usability.

3.4 Search Results

The search string was executed on August 23, 2022, in each of the selected databases, and the results are divided into three parts: the total number of studies found in each database, the number of duplicate studies found in each search engine, and the number of selected studies in each database. The search results by database are shown in Table 3.

Table 3. Number of extracted, duplicated, and selected studies

Additionally, Table 4 shows the primary studies collected to answer the previously posed research questions.

Table 4. Selected primary studies

3.5 Data Extraction

Once the articles had been selected according to the inclusion and exclusion criteria, a data extraction form was developed to organize the most relevant information from each article for answering the research questions. Table 5 shows the data extraction form, including each field's name, its description, and the review question it helps to answer.

Table 5. Data extraction form

4 Data, Analysis, and Results

After reviewing the 22 primary studies, it was observed that the majority come from Europe, which suggests that heatmap research there is considered an additional resource for usability testing. It is also worth noting that the number of studies is increasing, as evidenced in 2019. Finally, the relevant information from each study was compiled in the extraction form and used to answer the questions posed for the systematic review.

4.1 Answer to Review Question RQ1

The question posed for the systematic review, "How have heatmaps been used in the application of usability tests?", is addressed by a group of twenty studies from which, taking into account the stages reported in each one, four common stages were identified; these are shown in Table 6.

Table 6. Stages for the generation of heatmaps in the application of usability tests.

The following is a description of each stage in obtaining heatmaps.

  1. Training of the model to be used in data acquisition: To obtain data on the interaction between the user and the system, it is important to train the developed software using a data model [S01, S10 & S12] so as to obtain more precise results, specific to the scenario in which it will be applied. This training can be performed using data sets such as Face Expression Recognition Plus [S20].

  2. Selection of User and System Sample for Evaluation: To perform a usability test, obtain the expected results, and finally visualize them through a heatmap, it is important to select a sample of users appropriate to each scenario [S01, S02, S03, S04, S06, S07, S08, S10, S12, S13, S14, S15, S16, S17, S18, S19, S20, S21 & S22]. One factor to consider is the age of the evaluated users [S18 & S20], since participants must belong to the target audience of the system.

  3. Usability Test Execution: To obtain metrics that provide more information about the interaction between the users and the system, different methods providing both quantitative and qualitative results have been used [S03, S13 & S15]. Chief among them is the use of surveys to capture the user's opinion of the system under evaluation. There is also eye tracking [S10 & S16], which determines users' areas of interest from where they fixate their gaze, and it is likewise possible to analyze the user's emotions through audio or facial gestures [S08 & S19]. The data collected from each evaluation allows a more effective analysis of the usability problems or design errors that the system's graphical interfaces may present [S01, S02, S03, S04, S06, S07, S08, S10, S12, S13, S14, S15, S16, S17, S18, S19, S20, S21 & S22].

  4. Display of Information Generated by User and System Interaction through a Heatmap: Finally, after the data to be evaluated have been generated, they are displayed through graphs useful to the evaluators. One such graph is the heatmap [S02, S03, S04, S06, S08, S09, S10, S11, S13, S15, S16 & S19], which shows the number of user fixations [S02, S08, S09, S11, S16 & S18] on sectors of the screen through different shades of color. The colors used are yellow, blue, and red, with yellow being the least intense shade and red the most intense. The fixations shown can also be divided by quantity and time [S15, S19 & S20], providing new possibilities for analysis (a rendering sketch is given after this list).
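To make this last stage concrete, the following is a minimal sketch of how a fixation heatmap could be rendered in a browser. The Fixation type, the color ramp, and the overlay radius are illustrative assumptions on our part, not details reported by the reviewed studies.

  // Minimal sketch: overlay a fixation heatmap on an HTML canvas.
  // The Fixation shape is a hypothetical type, not taken from the studies.
  interface Fixation {
    x: number;          // screen coordinates in CSS pixels
    y: number;
    durationMs: number; // fixation duration, used to weight intensity
  }

  function renderHeatmap(canvas: HTMLCanvasElement, fixations: Fixation[], radius = 40): void {
    if (fixations.length === 0) return; // nothing to draw
    const ctx = canvas.getContext("2d")!;
    const maxDuration = Math.max(...fixations.map(f => f.durationMs));

    for (const f of fixations) {
      // Each fixation contributes a translucent radial gradient whose opacity
      // grows with its duration, so longer fixations look "hotter".
      const alpha = 0.2 + 0.6 * (f.durationMs / maxDuration);
      const g = ctx.createRadialGradient(f.x, f.y, 0, f.x, f.y, radius);
      g.addColorStop(0, `rgba(255, 0, 0, ${alpha})`);         // red: most intense
      g.addColorStop(0.5, `rgba(255, 255, 0, ${alpha / 2})`); // yellow: middle band
      g.addColorStop(1, "rgba(0, 0, 255, 0)");                // fades out at the edge
      ctx.fillStyle = g;
      ctx.fillRect(f.x - radius, f.y - radius, radius * 2, radius * 2);
    }
  }

Overlapping translucent gradients accumulate naturally: sectors that receive many or long fixations saturate toward red, while isolated fixations remain faint, which corresponds to the visual encoding described above.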

4.2 Answer to Review Question RQ2

The answer to the question posed for the systematic review of “What challenges were addressed when using heatmaps in usability tests and how were they addressed?” is addressed by a group of twelve studies, which are synthesized into three main challenges (Table 7).

Table 7. Challenges addressed when using heatmaps in usability testing.

Each of these challenges, along with the solutions proposed in the literature, is described below.

  1. Ineffective Visualization of User-Stimulus Interaction Data Collection: After the data resulting from the user's interaction with the system have been obtained, it is important to present them visually to the researchers [S02]. One of the proposed alternatives is the heatmap [S02, S03, S04, S06, S08, S09, S11, S16 & S19], which supports a better evaluation of the user experience because the information resulting from the user's interaction with the system [S03] can be easily visualized through colors over the various screen areas [S04], according to how frequently each sector is interacted with.

  2. Searching for New Criteria Different from Effectiveness and Accuracy in Usability Tests: Usability tests often evaluate only the effectiveness and accuracy of systems. However, given the high competitiveness of the market, analyzing these indicators alone is not enough, as users may remain unsatisfied [S02, S08 & S15]. Therefore, to obtain new evaluation criteria allowing a better analysis of the user experience and of software quality, the use of data-gathering methods that expose the user's interaction with the system, such as eye tracking [S09, S10 & S19], is proposed, so that new criteria can be derived that help researchers build a user-friendly and satisfying system.

  3. High Cost of Specialized Tools for Information Gathering and Data Generation to Complement Test Results: The alternatives for obtaining additional information to complement the data from the user's interaction with the system tend to be costly. First, when extra equipment is used alongside the software, a laboratory is required or, failing that, many replicas of the equipment, which generates high costs [S08 & S13]. There are also paid alternatives for generating heatmaps, such as HotJar or MouseFlow [S06], in which analysis options are enabled only upon payment and access to the source code is not provided. As an alternative, there are open-source tools, such as WebGazer.js or SearchGazer.js [S06], which make it possible to obtain information from the user's interaction with the system at no cost and with access to the source code (a usage sketch follows this list).
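As an illustration of this zero-cost route, the sketch below shows how raw gaze samples could be collected with WebGazer.js through its documented gaze-listener API. The sample buffer and the GazeSample shape are our own assumptions about how the data might be stored for later heatmap generation.

  // Sketch: collect gaze samples with the open-source WebGazer.js library.
  // webgazer is provided globally by the library's <script> tag.
  declare const webgazer: any;

  interface GazeSample { x: number; y: number; t: number; } // viewport px, ms

  const samples: GazeSample[] = [];

  webgazer
    .setGazeListener((data: { x: number; y: number } | null, elapsedTime: number) => {
      if (data === null) return; // no gaze prediction for this frame
      samples.push({ x: data.x, y: data.y, t: elapsedTime });
    })
    .begin(); // starts the webcam-based prediction loop (requests camera permission)

Because the library runs entirely in the browser and its source code is available, this route avoids both the laboratory equipment and the licensing costs described above.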

4.3 Answer to Review Question RQ3

The question posed for the systematic review, "What metrics have been used when using heatmaps in usability tests?", is addressed by a group of nineteen studies, which are synthesized into three main metrics shown in Table 8.

Table 8. Metrics used when using heatmaps in usability tests

Below is the description of each of the displayed metrics.

  1. Number of user fixations on screen sectors: One of the main metrics for generating heatmaps is the number of user fixations on sectors of the screen [S02, S03, S04, S06, S07, S08, S10, S12, S13, S14, S15, S16, S17, S18, S19, S20, S21 & S22]. This capture is performed using eye tracking, which determines the areas that users observed most frequently (a sketch showing how fixations can be detected from raw gaze samples is given after this list).

  2. User fixation time on screen sectors: In addition to the number of fixations, there is the user's fixation time in the various screen areas [S02, S15 & S16]. This metric is obtained by capturing eye movement, so that the screen areas where the user dwells the longest can be identified.

  3. Position of the clicks made by users on screen sectors: This metric shows which sectors of the screen receive the highest number of clicks from the user [S03 & S06], revealing the screen areas with which the user interacts the most. The metric is then shown on the heatmap according to the number of clicks in each zone, using the colors yellow, blue, and red, with yellow being the least intense and red the most intense.
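To show how the first two metrics could be derived from raw gaze samples, the following is a simplified dispersion-threshold (I-DT) sketch. The GazeSample and Fixation shapes and both threshold values are illustrative assumptions, not values reported by the reviewed studies.

  // Sketch: detect fixations from raw gaze samples with a simplified
  // dispersion-threshold (I-DT) pass. Thresholds are illustrative.
  interface GazeSample { x: number; y: number; t: number; } // viewport px, ms

  interface Fixation { x: number; y: number; durationMs: number; }

  function dispersion(win: GazeSample[]): number {
    const xs = win.map(s => s.x), ys = win.map(s => s.y);
    return (Math.max(...xs) - Math.min(...xs)) + (Math.max(...ys) - Math.min(...ys));
  }

  function detectFixations(
    samples: GazeSample[],
    maxDispersionPx = 50, // spatial threshold: how tightly samples must cluster
    minDurationMs = 100,  // temporal threshold: shortest accepted fixation
  ): Fixation[] {
    const fixations: Fixation[] = [];
    let start = 0;
    while (start < samples.length) {
      let end = start + 1;
      // Grow the window while adding the next sample keeps the cluster tight.
      while (end < samples.length && dispersion(samples.slice(start, end + 1)) <= maxDispersionPx) {
        end++;
      }
      const win = samples.slice(start, end);
      const durationMs = win[win.length - 1].t - win[0].t;
      if (durationMs >= minDurationMs) {
        fixations.push({
          x: win.reduce((a, s) => a + s.x, 0) / win.length, // cluster centroid
          y: win.reduce((a, s) => a + s.y, 0) / win.length,
          durationMs,
        });
        start = end; // continue after the detected fixation
      } else {
        start++;     // too short: treat as saccade noise and slide on
      }
    }
    return fixations;
  }

Metric 1 is then the number of detected fixations falling in each screen sector, and metric 2 is the sum of their durationMs values per sector; the click-position metric needs no detection step, since each click event already carries its own coordinates.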

5 Conclusion

As a result of the systematic review carried out using the B. Kitchenham protocol, it was determined that the literature does not report a formal process indicating the tasks an evaluation team should perform to complement user-testing results with heatmaps for a better analysis of usability problems and design errors.

Additionally, although some tools were identified in the literature, they are not readily available to the academic or industrial community: they either do not provide the metrics necessary for generating heatmaps, or they only provide raw information on the user's interaction with the system without supporting an effective and efficient analysis.

Finally, it was identified that heatmaps are useful for specialists who carry out usability testing with users, because they make it possible to visually display the user's interaction with the system under evaluation, such as the areas of gaze fixation on the screen.