Introduction

In recent years, mobile devices such as smartphones and tablets have become increasingly popular. As of 2015, around 1.86 billion people use smartphones (Url 1), and approximately 1 billion people use tablet computers (Url 2). As mobile device usage grows, mobile applications gain popularity, and the leading mobile application markets are expanding at an enormous rate: there were 2.2 million applications on the Google Play Store and 2 million applications on the Apple App Store in July 2016 (Url 3). Although many mobile applications are available in these stores, some of them have not been adopted, and some adopters stop using an application because its features are complex, inconsistent or difficult to use. Therefore, usability is one of the most critical quality factors affecting both the intention to use a system and the continued usage of mobile applications.

Usability is defined by ISO 9241-11 as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” (ISO 1998). A usable system should be learnable, efficient and memorable. It should also satisfy the expectations and needs of its users with its physical and technical features. Companies that are aware of the importance of usability try to design and develop more usable mobile applications. The System Usability Scale (SUS) with an adjective rating scale is one of the most popular and easy-to-use questionnaires for measuring the usability level of any product. In the current study, we aim to explore the differences in usability level among some of the most popular mobile applications: Facebook, YouTube, WhatsApp, and Mail.

In addition, this study aims to reveal the difference in mobile application usability between the Android and iOS operating systems. iOS is a closed source operating system for mobile devices manufactured by Apple, Inc. Since no other devices use iOS, it is highly compatible with Apple mobile phones and tablets. Android, on the other hand, is an open source operating system that is used on many models of mobile phones. Because Android runs on different types of mobile phones, the integrity of the Android system might not be at the same level for every model.

This study is organised as follows: A review of the literature is presented in Section “Literature Review”. The research methodology is discussed in Section “Methodology”, and results are shown in Section “Results”. In the last section, the results are discussed, and possible future studies are given.

Literature Review

There are three types of usability evaluation methodologies for mobile applications: laboratory experiments, field studies and hands-on measurement (Nayebi et al. 2012). In laboratory experiments, participants perform specific tasks related to mobile applications in a controlled lab environment. The main idea is to control the environment in which the users conduct their tasks. This is useful because the participants perform all the tasks needed to measure usability without any distractions. On the other hand, the main problem of laboratory experiments is that real-world conditions may differ from the lab environment, so the actual usability of the mobile applications may not be measured. In the literature, several studies use laboratory experiments (Biel et al. 2010; Masood and Thigambaram 2015) to measure the usability of different types of mobile applications. In field studies (Hoehle and Venkatesh 2015; Hoehle et al. 2016), questionnaires are used to ask mobile application users about their experience. Field studies are not always the best method for testing the user interface of mobile applications, because they are more time consuming than lab tests and need more preparation, pre-tests and pilot applications (Kaikkonen et al. 2005). In hands-on measurement, defined specifications of mobile applications are measured directly to evaluate usability (Nayebi et al. 2012). The usability evaluation method should be chosen carefully with regard to the nature of the application and of the evaluation.

The features of mobile devices, such as small screen size, mobile context, connectivity, different display resolutions, limited processing capability and power, and restrictive data entry methods (Jacko 2011; Zhang and Adipat 2005), differ from those of other computer systems and influence the usability of mobile applications. In addition, the operating system of a mobile device may affect the usability of mobile applications. Kortum and Sorber (2015) use SUS to measure the usability of the top ten mobile applications on the iOS and Android platforms for smartphones and tablets. The results of their study show that mobile applications on the iOS platform are more user-friendly than Android-based applications.

Google’s Android and Apple’s iOS each have their own user interface guidelines that developers must follow to release their mobile applications on the Apple and Google stores. In addition to these guidelines, the literature contains several mobile application usability guidelines that researchers have developed based on these types of user interface guidelines. Hoehle and Venkatesh (2015) develop 19 first-order constructs, such as instant start, effort minimisation and concise language, and six second-order constructs, such as application design and user interface graphics, for mobile applications based on Apple’s general user guidelines. They validate their conceptualisation through a survey of U.S. consumers who use social media applications. The results of the study show that application design, application utility, and user interface graphics are the more important predictors of mobile application loyalty and continued intention to use. Hoehle et al. (2016) develop ten usability constructs for mobile applications, such as aesthetic graphics, colour, fingertip-size controls and gestalt, based on Microsoft’s mobile usability guidelines. They validate their constructs through a survey of German consumers who use social media applications such as Facebook and Twitter on their mobile phones. The results of the study show that gestalt, fingertip-size controls, and subtle animation are the most significant factors of continued intention to use. In addition, gestalt, fingertip-size controls, and control obviousness are essential factors of brand loyalty.

The literature review reveals no similar study conducted in Turkey that measures mobile application usability. Therefore, this study aims to estimate the usability of popular mobile applications used by Turkish consumers and to reveal usability-related problems by using the SUS survey adapted with an adjective rating scale. The results of the study may help mobile application developers to design more user-friendly products.

Methodology

The methodology of this study consists of three steps. First, a questionnaire consisting of the SUS items supplemented with an adjective rating scale is applied to Turkish participants who use the Facebook, YouTube, Mail and WhatsApp applications on their mobile phones. Second, average SUS scores and adjective rating scale scores are calculated for each mobile application and operating system. Third, statistical analysis is applied to find out whether there is any significant difference between the mobile applications and operating systems in terms of usability. In addition, the correlation between SUS scores and adjective rating scale scores is calculated.

SUS (System Usability Scale)

John Brooke developed SUS in 1996. It contains ten basic and simple questions about the usability of a system. SUS is a useful tool for understanding the problems that users face while they are using the system.

The items in the SUS are (Brooke 1996):

  1. I think that I would like to use this system frequently.
  2. I found the system unnecessarily complex.
  3. I thought the system was easy to use.
  4. I think that I would need the support of a technical person to be able to use this system.
  5. I found the various functions in this system were well integrated.
  6. I thought there was too much inconsistency in this system.
  7. I would imagine that most people would learn to use this system very quickly.
  8. I found the system very cumbersome to use.
  9. I felt very confident using the system.
  10. I needed to learn a lot of things before I could get going with this system.

SUS stands out for its wide range of application areas, its simplicity, and its quickness of use for both practitioners and participants (Bangor et al. 2008). SUS provides a general overview of the usability of a product with the help of its understandable score calculation. Although it is a 100-point scale, it does not give an absolute judgement of the usability of a product. To deal with this situation, a seven-point adjective-anchored Likert scale is added as the eleventh question (Bangor et al. 2009). The question is: “Overall, I would rate the user-friendliness of this product as:” and the answer to this question ranges from “1: Worst imaginable” to “7: Best imaginable”. The adjective rating scale could help to derive an absolute judgement from the SUS questionnaire (Bangor et al. 2009).

To apply the modified SUS questionnaire, we choose the most commonly used mobile applications. WhatsApp is a messaging application that enables sending text messages, pictures and videos to individuals or groups of people. It is #1 on the list “Top Free Applications” on the App Store and Google Play Store (Url 5, Url 6). Facebook is a social media platform on which people share their thoughts, photos, videos and news about themselves. It is #5 on the list “Top Free Applications” on the App Store and #3 on Google Play Store (Url 5, Url 6). YouTube is the biggest online video-sharing platform on the World Wide Web. It is #3 on the list “Top Free Applications” on the App Store (Url 5). Both iOS and Android have a default e-mail application. For iOS, its name is Mail, and for Android, it is E-Mail.

Calculating Average SUS Scores and Adjective Rating Scales

The participants answer the SUS questions on a scale from 1 (Strongly disagree) to 5 (Strongly agree). However, each answer is converted to a score in the range of 0–4 according to Brooke’s scoring.

As can be seen in the SUS, the odd-numbered questions are positively worded, and the even-numbered questions are negatively worded. Positive questions are scored by reducing the user’s answer by one point. For example, if the user’s answer to question 5 is 4, then the resulting score is 3. Negative questions are scored by subtracting the user’s answer from 5. For example, if the user’s answer to question 4 is 3, then the resulting score is 2. After all the scores are determined, their sum is multiplied by 2.5 so that the final score ranges between 0 and 100. For each mobile application on Android and iOS, average SUS scores and adjective rating scale scores are calculated.
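The scoring rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used in the study; the function name `sus_score` and the sample answers are hypothetical.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for ten answers on a 1-5 scale.

    responses[0] is the answer to item 1, responses[9] the answer to item 10.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly ten responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be a whole number from 1 to 5")

    total = 0
    for item_number, answer in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd (positively worded) items: answer minus 1.
            total += answer - 1
        else:
            # Even (negatively worded) items: 5 minus answer.
            total += 5 - answer
    # Each item now contributes 0-4, so the sum is 0-40; scale it to 0-100.
    return total * 2.5


# Hypothetical respondent, for illustration only.
example = [4, 2, 5, 1, 4, 2, 5, 1, 4, 2]
print(sus_score(example))  # 85.0
```

In this made-up example, the odd items contribute 3 + 4 + 3 + 4 + 3 = 17 points and the even items contribute another 17, so the score is 34 × 2.5 = 85.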

Statistical Analysis

To find out whether there is any significant difference between mobile applications and operating systems in terms of usability, general linear model (GLM) univariate analysis is applied by using the SPSS program. GLM univariate analysis reveals the effect of multiple independent factors or variables on the means of various groupings of one dependent variable (Ho 2006). In addition, applying post hoc tests after an overall F test shows whether there is any difference between specific means (Ho 2006). Furthermore, the correlation between SUS scores and adjective rating scale scores is calculated to show that SUS adapted with an adjective rating scale gives meaningful results.
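As a rough illustration of the correlation step, the following Python sketch computes a Pearson correlation coefficient from paired SUS scores and adjective ratings. It assumes that the Pearson coefficient is the intended measure, which the text does not state explicitly; the function name `pearson_r` and the sample values are hypothetical and are not data from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cross / math.sqrt(ss_x * ss_y)


# Made-up responses, NOT data from this study.
sus_scores = [85.0, 70.0, 92.5, 60.0, 77.5]
adjective_ratings = [6, 5, 7, 4, 5]
print(round(pearson_r(sus_scores, adjective_ratings), 3))
```

A value close to 1 would indicate that respondents who give a high SUS score also tend to choose a high adjective rating.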

Results

The questionnaire is conducted with 222 Turkish participants in 2017. The participants evaluate the applications YouTube, Mail, Facebook, and WhatsApp in terms of usability based on their experience. Furthermore, participants who use more than one of these applications complete the survey for each application they use. Because of this, the number of surveys collected is 643. Demographic characteristics of the participants are shown in Table 1.

Table 1 Demographic characteristics of the participants

The average SUS scores and adjective rating scale scores of the mobile applications for the iOS and Android operating systems are shown in Table 2. The mean of the average SUS scores of the four applications for iOS is 79.41 (ranging from 71.39 to 88.53), and the mean of the average adjective scale scores is 5.21 (ranging from 4.73 to 5.89). The mean of the average SUS scores of the four applications for Android is 81.2 (ranging from 75 to 86.1), and the mean of the average adjective scale scores is 5.18 (ranging from 4.73 to 5.72). These results show that, when the average SUS scores are compared, Android has slightly better usability than iOS. On the other hand, according to the adjective rating scale, iOS performs better than Android. The average SUS scores of each application for iOS and Android are higher than 70, so they are all acceptable in terms of usability (Bangor et al. 2009).
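The averaging and the acceptability check described above could be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the function name `mean_sus_by_group` and the sample rows are hypothetical, and only the threshold of 70 comes from the text.

```python
from collections import defaultdict

ACCEPTABLE_SUS = 70  # threshold the text attributes to Bangor et al. (2009)


def mean_sus_by_group(records):
    """Average SUS score for each (application, operating system) pair.

    `records` is an iterable of (application, operating_system, sus_score).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for app, os_name, score in records:
        sums[(app, os_name)] += score
        counts[(app, os_name)] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Made-up survey rows, NOT data from this study.
rows = [
    ("WhatsApp", "iOS", 90.0),
    ("WhatsApp", "iOS", 85.0),
    ("WhatsApp", "Android", 87.5),
    ("Facebook", "iOS", 65.0),
    ("Facebook", "Android", 75.0),
    ("Facebook", "Android", 72.5),
]

for group, mean in sorted(mean_sus_by_group(rows).items()):
    verdict = "acceptable" if mean > ACCEPTABLE_SUS else "below threshold"
    print(group, round(mean, 2), verdict)
```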

Table 2 Average SUS scores and adjective rating scales of the mobile apps

According to the average SUS scores and adjective rating scale scores, for both operating systems, Facebook has the lowest scores and WhatsApp has the highest scores among the applications. Although these average values roughly indicate the usability of the applications and operating systems, further statistical analysis is needed to gain a better understanding. Therefore, GLM univariate analysis is applied to show whether there is any difference between the applications and operating systems in terms of usability.

According to Levene’s test results, the error variances of the SUS scores of the operating systems are homogeneous (α = 0.971 > 0.05). Since the homogeneity assumption is met, the Tukey test is applied as the post hoc test. The results show that the operating system has no significant effect on SUS scores, and there is no difference between the iOS and Android operating systems in terms of usability (mean difference = −1.649, std. error = 1.349, α = 0.222 > 0.05). Furthermore, the same tests are applied to the average adjective rating scale scores of the operating systems and mobile applications. The error variances of the adjective rating scores of the operating systems are homogeneous (α = 0.725 > 0.05) according to Levene’s test results. The operating system has no significant effect on adjective rating scores, and there is no difference between the iOS and Android operating systems in terms of usability (mean difference = 0.037, std. error = 0.081, α = 0.649 > 0.05).
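To make the homogeneity check more concrete, the sketch below computes Levene’s W statistic from raw group scores. It assumes the mean-centred form of the test and is not the SPSS procedure used in the study; the function name `levene_w` and the sample scores are hypothetical. Turning W into a significance value additionally requires the F distribution with (k − 1, N − k) degrees of freedom, which is omitted here.

```python
def levene_w(groups):
    """Levene's W statistic (mean-centred form) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need at least two groups with two or more values each")

    # Absolute deviation of every observation from its own group mean.
    deviations = []
    for g in groups:
        group_mean = sum(g) / len(g)
        deviations.append([abs(y - group_mean) for y in g])

    group_dev_means = [sum(z) / len(z) for z in deviations]
    grand_dev_mean = sum(sum(z) for z in deviations) / n_total

    # Between-group and within-group sums of squares of the deviations.
    between = sum(
        len(z) * (z_mean - grand_dev_mean) ** 2
        for z, z_mean in zip(deviations, group_dev_means)
    )
    within = sum(
        (value - z_mean) ** 2
        for z, z_mean in zip(deviations, group_dev_means)
        for value in z
    )
    if within == 0:
        raise ValueError("W is undefined when the deviations do not vary within groups")
    return (n_total - k) / (k - 1) * between / within


# Made-up SUS scores, NOT data from this study.
ios_scores = [80.0, 75.0, 90.0, 85.0]
android_scores = [82.5, 70.0, 95.0, 77.5]
print(round(levene_w([ios_scores, android_scores]), 3))
```

A small W means that the spread of scores is similar across the groups, which is the situation in which the homogeneity assumption is considered to be met.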

According to Levene’s test results, the error variances of the SUS scores of the mobile applications are not homogeneous (α = 0.000 < 0.05). Because of these inhomogeneous variances, the Dunnett test is applied as the post hoc test, and its results are shown in Table 3. They show that there is no difference between the average SUS scores of YouTube and WhatsApp (α = 0.087 > 0.05) or between those of Mail and Facebook (α = 0.465 > 0.05). In other words, based on their average SUS scores, the usability of YouTube and WhatsApp is at the same level and is better than the usability of Mail and Facebook. The same steps are then followed for the adjective rating scale. The error variances of the adjective rating scores of the mobile applications are not homogeneous (α = 0.000 < 0.05) according to Levene’s test results, so the Dunnett test is again applied as the post hoc test, and its results are shown in Table 4. They show that there is no difference between the average adjective rating scores of Mail and Facebook (α = 0.978 > 0.05). Contrary to the results based on SUS scores, YouTube and WhatsApp have different usability levels (α = 0.000 < 0.05) in terms of adjective rating scores. When the average adjective rating scores are compared, WhatsApp is better than the other applications. In addition, YouTube has better usability than Facebook and Mail.

Table 3 Post hoc test results of mobile applications in terms of SUS scores
Table 4 Post hoc test results of mobile applications in terms of adjective rating scores

We also made an item-based evaluation by using GLM univariate analysis. First, we check whether there is any difference between iOS and Android; the results show that the two operating systems differ on only two questions: “S5: I found the various functions in this system were well integrated” (α1 = 0.018 < 0.05) and “S9: I felt very confident using the system” (α2 = 0.044 < 0.05). Comparing the average scores for S5 and S9, users find Android better integrated and feel more confident using it than iOS. For all questions, the operating system has no effect on the usability of any individual mobile application; for example, there is no difference between the usability of YouTube on iOS and on Android.

Furthermore, the same kind of analysis is conducted to make an item-based evaluation of the usability of the mobile applications. For “S1: I think that I would like to use this system frequently”, WhatsApp is better than the other applications and Facebook is the worst; there is no difference between Mail and YouTube (α = 0.32 > 0.05). For “S2: I found the system unnecessarily complex”, Facebook and Mail are more complex than WhatsApp and YouTube; there is no difference between Facebook and Mail (α = 0.96 > 0.05) or between YouTube and WhatsApp (α = 0.96 > 0.05). For “S3: I thought the system was easy to use”, WhatsApp is easier to use than the other applications, and YouTube is better than Facebook and Mail; Facebook and Mail are at the same level in terms of ease of use (α = 0.99 > 0.05). For “S4: I think that I would need the support of a technical person to be able to use this system”, no application requires additional support. For “S5: I found the various functions in this system were well integrated”, WhatsApp is the best-integrated application; Mail and Facebook are not as well integrated as the others, and there is no difference between them (α = 0.99 > 0.05). For “S6: I thought there was too much inconsistency in this system”, WhatsApp and YouTube are more consistent than Mail and Facebook; there is no difference between WhatsApp and YouTube (α = 0.073 > 0.05) or between Facebook and Mail (α = 0.569 > 0.05). For “S7: I would imagine that most people would learn to use this system very quickly”, WhatsApp is easier to learn, while Mail and Facebook are more difficult to learn, with no difference between the two (α = 0.575 > 0.05). For “S8: I found the system very cumbersome to use”, none of the applications is cumbersome to use. For “S9: I felt very confident using the system”, users feel less confident using Facebook, and there is no difference between WhatsApp, YouTube, and Mail. For “S10: I needed to learn a lot of things before I could get going with this system”, there is no difference between the applications; users do not need to learn many things to use these mobile applications.

Discussion and Conclusion

In this study, we determined the usability scores of the four most used mobile applications, Facebook, WhatsApp, YouTube, and Mail, by using SUS adapted with an adjective rating scale. The applications used in this study are highly popular. Therefore, it is not surprising that the SUS scores of these applications are acceptable (over 70). In addition, their average SUS score is relatively high (80.63). When the usability of the mobile applications is compared, the results show that WhatsApp has the highest usability scores because of its easier to use, less complicated and well-integrated structure. On the other hand, Facebook has the lowest usability scores due to its complex structure and the privacy concerns of its users. In addition, the usability of YouTube is lower than that of WhatsApp but better than that of Mail and Facebook. The result showing that YouTube is better than Facebook in terms of usability is consistent with the study of Kortum and Sorber (2015). In addition, the results related to the usability of Facebook are consistent with the studies in the literature (Hart et al. 2008). The results of the study are also consistent with the rankings on the “Top Free Applications” lists of the Apple App Store and Google Play Store.

The results show that privacy is an essential aspect of usability and that users feel less confident using Facebook than the other applications. To reduce the privacy concerns of users, Facebook should strengthen its security system against cyber-attacks and guide its users in improving their security against profile hacks. In addition, it should review its privacy and data use policy. Privacy is not an issue for Facebook alone: all mobile application developers should be sensitive to privacy issues if they want to increase the brand loyalty of their users. Furthermore, the results show that all the mobile applications are easy to learn, and there is no need for additional support to use them. Mobile application developers should also pay attention to complexity, consistency and integration, which are important issues affecting the usability of mobile applications.

The other aim of this study is to investigate the effect of operating systems on the usability of mobile applications, and the results show that there is no significant difference between the usability scores of mobile applications running on iOS and Android. This result is not consistent with the study of Kortum and Sorber (2015), which shows that mobile applications on the iOS platform are more user-friendly than Android-based applications. This inconsistency may arise from the different number and type of mobile applications used. They conducted their study with more applications and more participants than our study.

In addition, a correlation analysis is conducted to reveal how well SUS scores match the adjective rating scale scores. SUS scores correlate well with adjective rating scale scores (r = 0.674, α < 0.01). This result is compatible with the study of Bangor et al. (2009), although the correlation they report between SUS scores and the adjective rating scale is higher (r = 0.822, α < 0.01). The difference between the correlations may be due to several reasons, such as the different demographics of the participants and the different usability measurement methodologies applied to different types of products. Bangor et al. (2009) use a user testing method: after participants performed several tasks with products such as TVs, websites and cell phones, they completed the surveys. Our study focuses only on mobile applications and conducts surveys based on user experience without any user testing.

This study is important because mostly young people (average age of 22) participated in the questionnaire. According to the statistics, young people aged 18–24 years spend more time on mobile applications than other age groups, and usage time declines with age (Url 7). Therefore, the results reveal the opinions of young people about the usability of mobile applications, and the study shows general directions for developing more user-friendly mobile applications. On the other hand, this situation is also a limitation of this study. While usability dimensions such as the learnability and complexity of mobile applications may not be a problem for young people, they could be severe problems for users who are older or disabled. In addition, this study focuses on only the four most used mobile applications, which are owned by high tech companies such as Google and Facebook. Most of the usability problems that could exist in an ordinary mobile application are already solved in these applications. Because of this, as a future study, different types of mobile applications should be investigated with more participants from different demographics to understand the usability of mobile applications in detail.

This study uses SUS adapted with an adjective rating scale to measure the usability of mobile applications. SUS was developed for measuring the usability of any product or software; it is not a scale specific to mobile applications. Although it is straightforward to use and quick to apply, it can only detect general usability problems and give a general overview. To identify more usability problems specific to mobile applications, a comprehensive study is needed that uses a scale, checklist or usability guideline specific to mobile applications and provides more insight into their usability.