1 Introduction

With the continuous advance of wireless networks and the great proliferation of smartphones, many applications are launched on the market every day. Nowadays, these applications increasingly pursue the vision of ubiquitous computing, in which software is part of people's daily life and transparently available at any time, anywhere and from any device. This ubiquity is achieved by automatically monitoring contextual information related to the use of the applications.

Collecting data from smartphone users' experiences and associating them with the context where the interactions occur is a great challenge for the Human-Computer Interaction (HCI) area. Situations change, and test results are highly dependent on context. For example, a person interacting with a mobile application while sitting on the sofa at home is subject to different external interferences than when performing the same task while walking down the street.

In the literature, many authors defend the need to relate the influence of context to users' interactions with these applications [1]. To conduct studies with such a broad reach, it is necessary to use methodologies and techniques that allow experiments to collect data contextualized with the scenarios where the interactions take place [2]. This fact provokes much discussion regarding the place where experiments are conducted (in the field or in the laboratory) [3], as well as the techniques that may be used to extract the best set of data to characterize the experiments [4]. This work was motivated by these discussions and makes the following main contributions: (i) a comparison of the main approaches used to evaluate smartphone applications; (ii) the use of the UXEProject infrastructure, a new approach with the potential to extract and relate quantitative, contextual and subjective data; and (iii) the conduction of field experiments relating contextual factors to usability metrics for smartphone applications.

The remainder of this paper is organized as follows. Section 2 presents the state of the art concerning smartphone usability evaluations, encompassing the investigation of the approaches used to carry out usability experiments. Section 3 describes the UXEProject infrastructure adopted to facilitate the experiment presented in this work. Section 4 describes the methodology used for the execution of the experiment. Section 5 presents and discusses the results obtained. Section 6 presents the conclusions and future prospects.

2 The State of the Art

The relationship between context and usability is an issue widely discussed by the scientific community that studies the influence of scenarios on interaction with smartphones. Experience shows that humans often interact with systems in unexpected ways [5]. Thus, the inclusion of users and real scenarios in the tests is essential to delineate users' preferences and consequently adapt the products addressed to them [3].

Kawalek et al. [6] suggest evaluation methods that encompass different observation angles in the experiments, such as quantitative data (usability metrics), subjective evaluation (users' feelings) and contextual data (e.g., environmental conditions and device characteristics). The main problem is the lack of literature covering approaches that support these three requirements combined in a single experiment; generally, only one or two of them are related.

Coursaris and Kim [7] carried out a systematic survey, covering 2000 to 2010, which allowed them to identify that 47 % of the works that evaluate mobile devices are done in the laboratory, 21 % in the field, 10 % use both scenarios and 22 % are conducted without the participation of users. A point to be observed is that many studies do not consider the mobile nature of such devices, applying traditional evaluation methods. Another fact that calls attention in the results is that 47 % of the studies evaluate individual, out-of-context tasks, 46 % are based on the technology used and only 14 % consider context data and users' characteristics.

In order to identify the current state of usability investigations related to smartphones, this section presents a study encompassing works published from 2008 to 2012 that describe empirical experiments and investigate at least one of the following usability attributes: efficiency, effectiveness, satisfaction, learnability, operability, accessibility, flexibility, usefulness and ease of use. The works were retrieved from ACM, IEEE, Springer, and Google Scholar. Twenty-one works were selected; they are listed in Table 1 along with the investigation techniques used. The results of this study are detailed as follows:

Table 1. Works (2008–2012) that investigate the usability of applications for smartphones.

  • Table 1 summarizes the techniques used for data capture in the usability experiments: 71.4 % used surveys, 19 % logging, 14.2 % evaluators' direct observation, 14.2 % interviews with users, 19 % the think-aloud technique and 28.5 % other, less traditional, techniques. The sum of the percentages exceeds 100 % because 66.3 % of the experiments encompass more than one technique.

  • Concerning the number of times each usability attribute was investigated, it is possible to conclude that Ease of use (100 %), Satisfaction (90.4 %), Effectiveness (76.1 %) and Efficiency (52.4 %) are the most investigated usability attributes.

  • The number of participants was divided into three ranges: 55 % of the experiments used from 5 to 24 participants, 20 % used from 25 to 44 participants and 25 % were carried out with over 44 users.

  • Regarding the investigation scenario, 52.3 % of the experiments were conducted in the laboratory, 33.3 % in the field, 9.5 % in both scenarios and 4.7 % with simulators.

  • One of the main aspects to be highlighted is that only 3 experiments investigated contextual data, and these were conducted in the laboratory, falling short of the expectations and desires of a great number of researchers.

  • The last issue to be pointed out is that none of the approaches captures the users' impressions concerning the application's usability during their interactions, which would allow subjective data to be correlated with the other data in the evaluations.

The main observation of this study is that the experiments rely on surveys to collect data, which might complicate the correlation between different kinds of information when searching for usability problems [29]. Furthermore, in most cases, contextual factors are not investigated, an issue posed by many researchers as a primary factor for advances in the usability evaluation area [1, 2, 7].

3 The UXEProject Infrastructure

The UXEProject infrastructure was built to support usability evaluation based on the analysis of data captured directly from the devices. The formal model that originated the infrastructure can be found in full in [30]. The infrastructure is conceptually divided into three units, which comprise: (1) the mapping of the tasks that will be investigated; (2) the combination of traceability metrics that enables the capture of contextual data, usability statistics and subjective information regarding the experiences provided to users; and (3) the storage and evaluation of the data captured during the experiments.

Using the UXEProject, the task mapping is built by capturing the methods executed in the application under evaluation. The Evaluation Team is responsible for choosing and mapping the tasks, as well as for creating the data capture metrics. It is important to emphasize that no programming experience is necessary to carry out these activities. The following subsections describe the tools that implement the components of the three infrastructure units.

3.1 Mapping Unit

The first tool developed in the infrastructure encompasses the source code preparation that enables the mapping of the tasks provided by the applications. This tool, named Mapping Aspect Generator (MAG), imports the source code of the application to be mapped and creates an Aspect that inserts the onUserInteraction method into the classes belonging to the interaction layer. This process allows the users' actions to be detected. To have the application ready to be mapped, it is necessary to compile the application source code together with the generated Aspect. After that, it is enough to install the application on a smartphone and perform the interactions.
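
The Aspect that MAG generates is not published; the following AspectJ sketch merely illustrates the mechanism described above, intercepting onUserInteraction (a standard Android Activity callback fired on every touch or key event) to detect user actions. The package name com.example.app is a placeholder, not part of the infrastructure.

```java
import android.app.Activity;
import android.util.Log;

// Hypothetical sketch of a MAG-style mapping Aspect: it reacts to every
// execution of onUserInteraction() in the application's Activities and
// records which class handled the interaction.
public aspect MappingAspect {

    // Restrict the advice to classes of the application under evaluation
    pointcut interactionLayer(): within(com.example.app..*);

    // onUserInteraction() is invoked by Android for each touch/key event
    after(Activity activity): interactionLayer()
            && execution(void Activity+.onUserInteraction())
            && this(activity) {
        Log.d("UXE", "interaction in " + activity.getClass().getName());
    }
}
```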

To let the Evaluation Team map the tasks, another tool, named Automatic Task Description (ATD), was developed. The ATD must be installed on a device and executed simultaneously with the application to be mapped. Thus, as the Evaluation Team interacts with the application, the methods executed are automatically captured to be used as the steps for the conclusion of a task.

The ATD method consists of a filter that identifies when a user interaction occurs. The filter identifies which classes, methods and parameters of the application were used. This information is stored in an XML file, which is sent to the server to be used in the creation of metrics. A hypothetical example of such a file is sketched below.
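
The schema of the ATD output is not shown in the original publication; the fragment below is only an assumed illustration of the kind of class, method and parameter information the text describes.

```xml
<!-- Hypothetical ATD task description; element and attribute names
     are assumptions, not the published schema. -->
<task name="Register a fuel fill-up" application="Mileage">
  <step order="1" class="FillupActivity" method="onCreate"/>
  <step order="2" class="FillupActivity" method="saveFillup">
    <parameter name="liters" type="double"/>
  </step>
</task>
```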

3.2 Traceability Unit

The tool designed to instrument the applications and enable the data capture was named UXE Metrics Generation. This tool contains a library with the metric structures needed to perform the measurements.

Initially, the tool takes as input the XML file generated in the Mapping Unit. Then, the methods listed in the XML file are connected to the Metrics Library available in the tool, allowing the creation of the Aspects responsible for the capture, transmission and persistence of the data. Finally, it is sufficient to compile the application's source code together with the generated Aspects and to install the application on the device that will be used by a participant.
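
As with MAG, the generated metric Aspects are not published; the sketch below shows what an efficiency metric (task execution time) connected to two mapped methods could look like. The class and method names come from the hypothetical XML example above, not from the real Metrics Library.

```java
import android.util.Log;

// Hypothetical generated efficiency metric: measures the elapsed time
// between the first and the last mapped step of a task.
public aspect TaskTimeMetric {

    private long taskStart;

    // The task starts when its first mapped method executes
    before(): execution(* *..FillupActivity.onCreate(..)) {
        taskStart = System.currentTimeMillis();
    }

    // The task ends at its last mapped step; the elapsed time would then
    // be persisted and transmitted to the server
    after(): execution(* *..FillupActivity.saveFillup(..)) {
        long elapsedMs = System.currentTimeMillis() - taskStart;
        Log.d("UXE", "task 'Register a fuel fill-up' took " + elapsedMs + " ms");
    }
}
```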

To encompass the data collection, three types of metrics were defined: usability and context metrics use the logging technique [8, 9], while subjective metrics use the Experience Sampling Method [31].
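
The Experience Sampling Method interrupts the user with short questions during or right after the activity of interest. The UXEProject prompt itself is not described in the paper, so the following Android sketch, with an assumed question and five-point scale, only illustrates how a subjective metric could sample the user's impression when a task finishes.

```java
import android.app.Activity;
import android.app.AlertDialog;
import android.util.Log;

// Hypothetical Experience Sampling prompt shown after a mapped task ends
public class SatisfactionPrompt {

    private static final String[] SCALE = {
        "Very dissatisfied", "Dissatisfied", "Neutral",
        "Satisfied", "Very satisfied"
    };

    // Ask one question and log the answer for later upload to the server
    public static void show(Activity activity, String taskName) {
        new AlertDialog.Builder(activity)
            .setTitle("How satisfied were you with this task?")
            .setItems(SCALE, (dialog, which) ->
                Log.d("UXE", taskName + ": satisfaction=" + which))
            .show();
    }
}
```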

In order that the data related to the experiments could be transmitted and stored in a database, a micro instance of the Amazon EC2 service was used.

3.3 Assessment Unit

To encompass the components defined in the Assessment Unit, the following processes were performed: (i) creating and setting up an FTP and a database (DB) server and making them available on the Internet; (ii) modeling a DB and a Data Warehouse (DW) to store and enable the analysis of the information captured during the experiments; (iii) creating tools to detect the presence of new files on the FTP server, populate the DB and load the DW; and (iv) choosing an OLAP tool for the data analysis.

The Database Management System selected to store the data was MySQL Community Server. To load the data into the DB, a tool named Data Load was developed. The steps executed by this tool are: detect the arrival of new files on the FTP server, extract the data and load them into the DB. A rough sketch of this loop is shown below.
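
The Data Load implementation is not published; the following sketch only illustrates the three steps just described, assuming the FTP server drops files into a local directory. The path, table and column names are hypothetical, and the one-record-per-line format is an assumption.

```java
import java.nio.file.*;
import java.sql.*;

public class DataLoad {
    public static void main(String[] args) throws Exception {
        Path inbox = Paths.get("/srv/ftp/uxe-inbox");  // hypothetical drop directory
        try (WatchService watcher = FileSystems.getDefault().newWatchService();
             Connection db = DriverManager.getConnection(
                     "jdbc:mysql://localhost/uxe", "uxe", "secret")) {
            inbox.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watcher.take();           // block until a file arrives
                for (WatchEvent<?> event : key.pollEvents()) {
                    Path file = inbox.resolve((Path) event.context());
                    loadFile(db, file);                  // extract and insert records
                }
                key.reset();
            }
        }
    }

    // Assumes one semicolon-separated record per line; the real format
    // is not described in the paper
    private static void loadFile(Connection db, Path file) throws Exception {
        try (PreparedStatement ps = db.prepareStatement(
                "INSERT INTO interaction_event (user_id, task, value) VALUES (?,?,?)")) {
            for (String line : Files.readAllLines(file)) {
                String[] f = line.split(";");
                ps.setInt(1, Integer.parseInt(f[0]));
                ps.setString(2, f[1]);
                ps.setString(3, f[2]);
                ps.executeUpdate();
            }
        }
    }
}
```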

The last tool designed, the ETL Maker, extracts, transforms and loads the data, transferring them from the DB to the DW. To make the data analysis easier, the OLAP tool Pentaho Analysis Services was chosen.

4 Experiments Conducted

The experiment reported in this article was divided into six phases (Table 2), based on the directives proposed in the DECIDE framework [32], which guided the specification of the steps throughout the experiment.

Table 2. Phases of the experiment based on the DECIDE framework.

An important aspect of the experiment was the exploratory research conducted to find applications with attractive functionalities and the possibility of being inserted into people's daily life. The applications considered had to meet the following prerequisites: they must have been developed in Java for the Android platform; their source code must be available, with explicit rights of use; and they must have been built using good programming techniques, showing a good modularization of their functionalities, so that the source code could be instrumented with AOP.

Three applications were selected for the experiments. The first, named Mileage, helps users control their costs with fuel and other automobile maintenance services, such as oil changes and brake pad changes. The second was ^3 (Cubed), a music and video clip manager; from its main menu, it is possible to select songs or videos and play them. The last application, named Shuffle, is an activity scheduler that allows users to link tasks to dates and times, besides permitting their association with projects and contexts. Figure 1 shows the interfaces of the three selected applications and Table 3 details the tasks investigated in the usability experiments.

Fig. 1. The three selected mobile application interfaces.

Table 3. Tasks investigated in the experiments.

Another important aspect of the experiment was defining the participants. The selection considered the profiles under analysis and the features of the participants' smartphones. Twenty-one users were selected, taking into consideration age, educational level, school background, occupation and purchasing power.

The set of data used in the experiment was defined according to the capture strategies provided by the UXEProject infrastructure. Thus, the usability data were related to the mapped tasks, the users' profiles, the smartphones' features and the contextual data obtained through sensors. The smartphones' screen features and the ranges of values considered to compose the interaction context were:

  • Resolution (pixels): χ ≤ 320 × 240 (low); 320 × 240 < χ ≤ 320 × 480 (medium); χ > 320 × 480 (high)

  • Size (inches): χ ≤ 2.4 (small); 2.4 < χ ≤ 3.5 (medium); χ > 3.5 (large)

To contextualize the environment where the interactions take place, the captured data consider the degree of luminosity, the device position during the interactions and the speed at which the user moves. These context data are captured directly from the device sensors, and their reference values are the following (a small classification sketch follows the list):

  • Luminosity (lux): χ ≤ 100 (low); 100 < χ ≤ 10000 (medium); χ > 10000 (high)

  • Movement (m/s): χ < 0.2 (stationary); 0.2 ≤ χ ≤ 2.7 (walking); χ > 2.7 (motorized)

  • Position: vertical; horizontal; mixed
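
The ranges above translate directly into code; the following sketch is merely our transcription of the published thresholds, and the class and method names are not part of the UXEProject.

```java
// Classification of context values according to the reference ranges
// defined for the experiment (class/method names are ours).
public final class ContextClassifier {

    public static String resolution(int width, int height) {
        int pixels = width * height;
        if (pixels <= 320 * 240) return "low";
        if (pixels <= 320 * 480) return "medium";
        return "high";
    }

    public static String screenSize(double inches) {
        if (inches <= 2.4) return "small";
        if (inches <= 3.5) return "medium";
        return "large";
    }

    public static String luminosity(float lux) {
        if (lux <= 100) return "low";
        if (lux <= 10000) return "medium";
        return "high";
    }

    public static String movement(double metersPerSecond) {
        if (metersPerSecond < 0.2) return "stationary";
        if (metersPerSecond <= 2.7) return "walking";
        return "motorized";
    }
}
```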

5 Experimental Results

Initially, the percentage of tasks completed with errors in each application was observed with respect to the luminosity variation. The objective here is to identify the possible influence of this contextual variable on the interactions. To conduct the analysis, luminosity was isolated and related to the percentage of tasks completed with errors in each application, as presented on the left of Fig. 2.

Fig. 2. Error rates due to luminosity (left) and movement speed (right).

It was verified that, for all applications, the highest error rates in completed tasks occur when the luminosity is either too high or too low, that is, when the interaction scenario's conditions are not within the parameters considered standard, which demonstrates the influence of luminosity on users' performance.

The next evaluation refers to the speed at which users move while performing the interactions. The speed usually reflects three possibilities: the user is walking, stationary, or in some means of transportation. In Fig. 2, it is possible to identify, in all applications, that actions performed without movement show a lower error rate than those performed while moving.

The next analysis, detailed in Table 4, regards the smartphone positions in the interactions. The aim here is to find usability problems in specific tasks related to the interaction position (vertical, horizontal or mixed). Only the tasks with an error rate over 10 % related to the position of interaction are shown. This sort of information is useful for application developers, as in future versions the interactions in positions with high error rates can be inhibited. Table 4 also shows that more than 50 % of the problems occur when tasks are done in a mixed position, that is, started in one position and finished in another.

Table 4. Error/failure rate due to the position of interaction.

The following analysis verifies the interference of contextual variables related to smartphone characteristics, such as screen resolution and size. To carry out this evaluation, the task executions were investigated considering the smartphones' characteristics. The data presented in the left chart of Fig. 3 show that the screen resolution significantly influences the task execution speed, that is, the higher the resolution, the faster the tasks are concluded. For the Cubed application, moving from low to high resolution increases the task speed by 26.03 % on average; in the Mileage application, this difference is 19.66 % and, in Shuffle, 17.17 %.

Fig. 3. Task execution time (in seconds) due to screen resolution (left) and size (right).

The same analysis was then made to verify the influence of screen size on the users' interactions. On the right of Fig. 3, one can see that the screen size is another contextual variable that influences the users' performance. In the Cubed application, the average task execution time on large screens decreases by around 4.1 s when compared to small-screen smartphones; in the Mileage application, this difference is around 6.9 s, and, in Shuffle, 10.9 s.

A fact observed in the smartphone market is that phones with smaller screens normally also have lower resolution. Thus, the users' performance was observed considering the two variables simultaneously. The metric used to measure performance was the percentage of tasks completed with errors. Table 5(a) shows that the smaller the size and the lower the resolution of the smartphone screen, the more errors are found in the executed tasks. The difference between the extremes, that is, large screens with high resolution compared to small screens with low resolution, is 9.3 % of tasks executed with errors.

Table 5. Task error percentage considering (a) screen size and resolution variation and (b) purchasing power and screen resolution variation.

When analyzing the rate of tasks executed with errors along with the participants' profiles, an intriguing fact was observed: the occurrence of errors in the low social class is greater than in the other classes. In search of an explanation for this result, the kind of device used by these participants in the experiment was investigated. The conclusion was that the error rate was not related to the users' purchasing power, but to the low resolution of the devices' screens. As the majority of the people with low purchasing power used low-resolution smartphones, an isolated analysis of social class could lead to wrong conclusions. This analysis highlights one of the strengths of the UXEProject infrastructure, as it permits associating different contextual factors in a single evaluation, thereby decreasing the possibility of wrong conclusions. In Table 5(b), it is observed that, regardless of purchasing power, errors are more frequent when low-resolution smartphones are used.

6 Conclusions and Future Works

From Sect. 2 of this article, the conclusion is that the majority of the experiments made to evaluate the usability of applications for smartphones use surveys to collect the data, and there is no correlation between the contextual variables and the usability problems observed. This fact is contrary to the expectations of many researchers in the area.

The results obtained in the experiment showed that the UXEProject infrastructure is a good solution for the investigation of usability problems associated with different types of data, highlighting the data collection through the smartphones' sensors. The experiments' results show that approximately 70 % of the interactions occur when users are stationary, holding the device in a single position and under normal environmental luminosity. However, when these contextual factors change, the users make more mistakes and take longer to execute the tasks. This information suggests that applications should, for example: (i) disable interactions in positions that offer a higher probability of errors, forcing users to interact in an appropriate position; (ii) detect the external luminosity and try to balance the luminosity radiated by the device to guarantee good visualization; and (iii) identify the user's movement and enable only the most usual functionalities, decreasing the visual pollution.
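
As an illustration of suggestion (ii), the following Android sketch (our own, not part of the UXEProject) reads the ambient light sensor and compensates the window brightness, reusing the low/medium/high luminosity thresholds adopted in the experiment; the chosen brightness levels are assumptions.

```java
import android.app.Activity;
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;
import android.view.WindowManager;

// Hypothetical adaptive-brightness helper for suggestion (ii)
public class AdaptiveBrightness implements SensorEventListener {

    private final Activity activity;

    public AdaptiveBrightness(Activity activity) {
        this.activity = activity;
        SensorManager sm = (SensorManager)
                activity.getSystemService(Activity.SENSOR_SERVICE);
        sm.registerListener(this,
                sm.getDefaultSensor(Sensor.TYPE_LIGHT),
                SensorManager.SENSOR_DELAY_NORMAL);
    }

    @Override
    public void onSensorChanged(SensorEvent event) {
        float lux = event.values[0];
        // Raise brightness in bright environments, lower it in dark ones,
        // following the experiment's low (<=100) / high (>10000) thresholds
        float brightness = lux > 10000 ? 1.0f : lux <= 100 ? 0.3f : 0.6f;
        WindowManager.LayoutParams lp = activity.getWindow().getAttributes();
        lp.screenBrightness = brightness;  // 0.0..1.0, affects this window only
        activity.getWindow().setAttributes(lp);
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) { }
}
```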

Another important observation concerns the interference of the smartphones' configuration in the users' performance. Furthermore, it was shown that the correlation of different kinds of information is important for drawing conclusions from the results, as seen in the relationship between the error rates and users with low purchasing power. As future work, we intend to incorporate other sensors into the UXEProject, aiming to conduct new investigations involving different contextual factors.