On the dimensionality of the System Usability Scale: a test of alternative measurement models

Borsci, Simone; Federici, Stefano; Lauriola, Marco

doi:10.1007/s10339-009-0268-9

Letter to the editor
Published: 30 June 2009

Volume 10, pages 193–197, (2009)

Cognitive Processing

Abstract

The System Usability Scale (SUS), developed by Brooke (Usability evaluation in industry, Taylor & Francis, London, pp 189–194, 1996), has had great success among usability practitioners because it is a quick and easy-to-use measure for collecting users’ usability evaluations of a system. Recently, Lewis and Sauro (Proceedings of the human computer interaction international conference (HCII 2009), San Diego CA, USA, 2009) proposed a two-factor structure, Usability (8 items) and Learnability (2 items), suggesting that practitioners might take advantage of these new factors to extract additional information from SUS data. To verify the dimensionality of the SUS, we used structural equation modeling to estimate and test alternative measurement models on a sample of 196 university users. Our data indicated that both the unidimensional model and the two-factor model with uncorrelated factors proposed by Lewis and Sauro (Proceedings of the human computer interaction international conference (HCII 2009), San Diego CA, USA, 2009) fit the data unsatisfactorily. We therefore relaxed the hypothesis that Usability and Learnability are independent components of SUS ratings and tested a less restrictive model with correlated factors. This model not only yielded a good fit to the data but also represented the structure of SUS ratings significantly better than either alternative.

Introduction

The System Usability Scale (SUS), developed in 1986 by Digital Equipment Corporation©, is a ten-item scale giving a global assessment of Usability, operationally defined as the subjective perception of interaction with a system (Brooke 1996). The SUS items were developed according to the three usability criteria defined by ISO 9241-11: (1) the ability of users to complete tasks using the system, and the quality of the output of those tasks (i.e., effectiveness), (2) the level of resources consumed in performing tasks (i.e., efficiency), and (3) the users’ subjective reactions to using the system (i.e., satisfaction).

Practitioners have considered the SUS unidimensional (Brooke 1996; Kirakowski 1994) because its scoring system yields a single summated rating of overall usability. This scoring procedure rests on the assumption that all items load on a single latent factor. So far this assumption has been tested with inconsistent results. Whereas Bangor et al. (2008) retrieved a single principal component of SUS items, Lewis and Sauro (2009) suggested a two-factor orthogonal structure, which practitioners may use to score the SUS on independent Usability and Learnability dimensions. This latter finding is inconsistent with the unidimensional SUS scoring system, because items loading on independent factors of Usability and Learnability cannot be summated according to classical test theory (Carmines and Zeller 1992). Furthermore, these factor analyses of the SUS were carried out with exploratory techniques, which lack the formal machinery needed to test which of the two proposed factor solutions best accounts for the collected data.
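The two scoring approaches discussed above can be illustrated with a minimal sketch. It assumes the conventional SUS scoring (odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response) and assumes, following Lewis and Sauro (2009), that items 4 and 10 form the Learnability subscale; the item numbers and the name `sus_scores` are not given in this text and are used here only for illustration.

```python
from typing import Dict, Sequence

# Assumption: items 4 and 10 form the Learnability subscale, following
# Lewis and Sauro (2009); the item numbers are not listed in this text.
LEARNABILITY_ITEMS = (4, 10)


def sus_scores(responses: Sequence[int]) -> Dict[str, float]:
    """Score one respondent's ten SUS answers (each 1-5) on 0-100 scales."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("expected ten responses, each an integer from 1 to 5")

    # Odd-numbered items are positively worded (contribution = r - 1);
    # even-numbered items are negatively worded (contribution = 5 - r).
    contrib = {
        item: (r - 1 if item % 2 == 1 else 5 - r)
        for item, r in enumerate(responses, start=1)
    }

    learn = sum(contrib[i] for i in LEARNABILITY_ITEMS)
    usab = sum(v for i, v in contrib.items() if i not in LEARNABILITY_ITEMS)

    return {
        "overall": (learn + usab) * 2.5,  # 10 items, maximum raw sum 40
        "usability": usab * 3.125,        # 8 items, maximum raw sum 32
        "learnability": learn * 12.5,     # 2 items, maximum raw sum 8
    }
```

The multipliers rescale each raw sum to the same 0 to 100 range, so the overall score and the two subscale scores can be read on a common metric.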

Unlike exploratory factor analysis, confirmatory factor analysis (CFA) is a theory-driven approach that requires a priori specification of the number of latent variables (i.e., the factors), of the relations between observed and latent variables (i.e., the factor loadings), and of the correlations among latent variables (Fabrigar et al. 1999). Once the model’s parameters have been estimated, the hypothesized model is evaluated according to its ability to reproduce the sample data. These features make CFA the state-of-the-art methodology for comparing alternative factorial structures and deciding which one is best.

Purpose

In the present study, we aim to compare three alternative factor models of the SUS items: the one-factor solution with an overall usability factor (overall SUS) resulting from Bangor et al. (2008) (Fig. 1a); the two-factor solution resulting from Lewis and Sauro (2009) with uncorrelated Usability and Learnability factors (Fig. 1b); and its less restrictive alternative, which treats Usability and Learnability as correlated factors (Fig. 1c).

Methods

Procedure

One hundred and ninety-six Italian students of the University of Rome “La Sapienza” (28 males, 168 females, mean age = 21) were asked to navigate a website (http://www.serviziocivile.it); all the students declared that they had never visited it before. The navigation was organized in three consecutive sessions:

1. In the first session, a 20-min pre-experimental training, the participants were asked to navigate the website freely in order to learn the features, graphic layout, information structure, and organization of the interface.
2. In the second session, a scenario-based navigation with no time limit, the participants were asked to navigate the website following four scenario targets.
3. Finally, in the third session, the usability evaluation, the Italian version of the SUS was administered to the participants (Table 1).
Table 1 Synoptical table of the English and Italian versions of the SUS

Statistical analyses

All models were estimated with the robust maximum likelihood method because the data were not normally distributed (Mardia’s normalized coefficient = 10.72). This method provided the Satorra–Bentler scaled chi-square statistic (S–Bχ²), an adjusted measure of fit for non-normal data that is more accurate than the standard ML statistic (Satorra and Bentler 2001). Because the χ² test becomes more powerful as the sample grows, virtually any factor model can be rejected if the sample size is large enough; many authors (McDonald and Ho 2002; Widaman and Thompson 2003) therefore recommend supplementing the evaluation of model fit with more “practical” indices. The Comparative Fit Index (CFI; Bentler 1990) was purposefully designed to take sample size into account: it compares the hypothesized model’s χ² with the null model’s χ². By convention (Hu and Bentler 2004), a CFI greater than 0.90 indicates an acceptable fit to the data, with values greater than 0.95 being strongly recommended. A second suggested index is the Root Mean Square Error of Approximation (RMSEA; Browne and Cudeck 1993). Like the CFI, the RMSEA is relatively insensitive to sample size, as it measures the discrepancy between the model-reproduced covariance matrix and the population covariance matrix. Unlike the CFI, the RMSEA is a “badness of fit” index: a value of 0 indicates perfect fit, and the greater the RMSEA, the worse the model’s fit. By convention (Hu and Bentler 2004), an RMSEA less than 0.05 corresponds to a “good” fit and an RMSEA less than 0.08 corresponds to an “acceptable” fit.
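As a rough illustration of how the two indices relate to the χ² statistics, the sketch below uses the commonly cited point-estimate formulas. The function names are ours, the RMSEA uses N − 1 in the denominator (some programs use N), and the input values in the usage note are hypothetical rather than taken from Table 2.

```python
import math


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index from the model and null-model chi-squares."""
    # Non-centrality estimates, truncated at zero.
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null


def rmsea(chi2_model, df_model, n):
    """RMSEA point estimate; uses N - 1 (some programs use N instead)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))
```

For example, with hypothetical values χ² = 50 on 34 df for the tested model, χ² = 500 on 45 df for the null model, and N = 196, `cfi(50, 34, 500, 45)` is about 0.965 and `rmsea(50, 34, 196)` is about 0.049, which would count as a good fit under the conventions above.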

Results

Table 2 shows that the S–Bχ² was statistically significant for all the models we tested, regardless of the number of factors and of whether the factors were correlated (Bentler 2004). Inspection of the CFI and RMSEA, however, indicated that the less restrictive model with Usability and Learnability as correlated factors (Fig. 1c) yielded a good fit (i.e., CFI > 0.95 and RMSEA < 0.06), whereas the unidimensional model (Fig. 1a) proposed by Bangor et al. (2008) yielded only an acceptable fit (i.e., CFI > 0.90 and RMSEA < 0.08). In contrast, the two-factor model with uncorrelated factors (Fig. 1b) proposed by Lewis and Sauro (2009) did not meet any of the recommended cut-offs.

Table 2 Exact and close fit confirmatory factor analysis statistics/indices maximum likelihood estimation for the system usability scale

Since both the Bangor et al. and the Lewis and Sauro factor models are nested within the less restrictive, best-fitting model (i.e., the model with Usability and Learnability as correlated factors), we could formally compare the fit of each model proposed in the literature with the fit of the model in which it is nested. However, because we used the Satorra–Bentler scaled χ² for data that were not multivariate normal, we could not simply take the χ² difference of two nested models. Instead, we assessed the scaled S–Bχ² difference according to the procedure devised by Satorra and Bentler (2001). The first contrast, which compared the Lewis and Sauro (2009) model (Fig. 1b) with the less restrictive two-factor model with correlated factors (Fig. 1c), was statistically significant (ΔS–Bχ² = 30.17; df = 1; p < 0.001). Likewise, the second contrast, which compared the unidimensional model (Bangor et al. 2008) (Fig. 1a) with the less restrictive two-factor model with correlated factors (Fig. 1c), was statistically significant (ΔS–Bχ² = 28.54; df = 1; p < 0.001). Based on the inspection of absolute and relative fit indexes, as well as on the formal tests of χ² differences, we may conclude that the two-factor model with correlated factors outperformed both factor models proposed in the literature as measurement models of the SUS.
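The scaled difference test used for these contrasts can be sketched as follows. This is an illustration of the Satorra and Bentler (2001) formula under stated assumptions, not the EQS computation itself: the function names are ours, the inputs are the uncorrected ML χ² values and their scaling corrections, and the p-value helper is a small closed-form routine for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def sb_scaled_difference(t0, c0, df0, t1, c1, df1):
    """Satorra-Bentler (2001) scaled chi-square difference test.

    t0, t1: uncorrected ML chi-squares of the nested (more restricted)
            model and of the comparison model.
    c0, c1: their scaling corrections (ML chi-square / S-B scaled chi-square).
    """
    df_diff = df0 - df1
    cd = (df0 * c0 - df1 * c1) / df_diff
    if cd <= 0:
        raise ValueError("non-positive scaling correction; test undefined")
    t_diff = (t0 - t1) / cd
    return t_diff, df_diff, chi2_sf(t_diff, df_diff)
```

With hypothetical inputs, `sb_scaled_difference(120.0, 1.30, 35, 80.0, 1.25, 34)` gives a scaled difference of about 13.33 on 1 df. Applied to the differences reported above, `chi2_sf(30.17, 1)` and `chi2_sf(28.54, 1)` both fall well below 0.001, in line with the reported p-values.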

Inspection of the parameters of the best-fitting model (Table 3) indicated that all the SUS items loaded significantly on the appropriate factor, with factor loadings ranging from |0.44| to |0.74| for Usability and greater than 0.70 for Learnability. Accordingly, the factor reliability assessed by the ω coefficient (see Notes) yielded fairly high values of 0.81 and 0.76 for the Usability and Learnability factors, respectively. The correlation between Usability and Learnability was positive and significant (r = 0.70), showing that the greater the perceived Usability, the greater the perceived Learnability.

Table 3 Maximum likelihood standardized solution for the two-factor model of the system usability scale

Conclusions

Although the SUS is one of the most widely used questionnaires for evaluating the usability of systems, recent contributions have provided inconsistent results regarding the factorial structure of its items, which in turn has important consequences for determining the most appropriate scoring system for practitioners and researchers. The traditional unidimensional structure (Brooke 1996; Kirakowski 1994; Bangor et al. 2008) has been challenged by the more recent view of Lewis and Sauro (2009), who treat Learnability and Usability as independent factors. Based on a relatively large sample of users’ evaluations of an existing website, we tested which of the two alternative models best accounted for SUS ratings. Our data indicated that neither proposed model fit the data satisfactorily: the unidimensional model was too narrow to represent the contents of all SUS items, and the two-factor model with uncorrelated factors was too restrictive in its psychometric assumptions. We therefore relaxed the hypothesis that Usability and Learnability are independent components of SUS ratings and tested a less restrictive model with correlated factors. This model not only yielded a good fit to the data but also represented the structure of SUS ratings significantly better. Although the literature has reported greater reliability coefficients (e.g., >0.80) for the overall SUS scale, the reliability of the Learnability and Usability factors was in keeping with the psychometric standards required for short scales (Carmines and Zeller 1992). Thus, we propose that future usability studies may evaluate systems according to the scoring rule suggested by Lewis and Sauro (2009), which is consistent with the bidimensional, best-fitting model retrieved in this study.
However, since we found that the Usability and Learnability factors are correlated, future studies should clarify under which circumstances researchers may expect to obtain Usability scores dissociated from Learnability scores (e.g., systems with high Learnability but low Usability). In the present study, users evaluated a single system (i.e., the serviziocivile.it website), and this might have inflated the association between the two factors. Alternatively, our sample of users, which consisted of college students, might have higher computer skills than the general population, and this might also have inflated the factor correlation. Other studies of the SUS should therefore consider different combinations of systems and users to test the generality of the correlation between the two factors.

Notes

\( \omega = \frac{\left( \sum \lambda_{i} \right)^{2}}{\left( \sum \lambda_{i} \right)^{2} + \sum \text{Var}\left( \varepsilon_{i} \right)} \), where λ_i are the standardized factor loadings for the factor and Var(ε_i) is the error variance associated with each indicator variable (both reported in Table 3).
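The ω formula can be computed directly from a factor's standardized loadings, as in the sketch below. The function name is ours. When error variances are not supplied, the sketch assumes a fully standardized solution, so that each error variance equals one minus the squared loading; loadings are also assumed to be entered with a common sign (for instance, as absolute values after reverse-keyed items are reflected). The loadings in the usage note are hypothetical and are not the Table 3 estimates.

```python
def omega(loadings, error_variances=None):
    """Composite reliability (omega) of one factor from standardized loadings."""
    if error_variances is None:
        # Assumption: fully standardized solution, Var(e_i) = 1 - lambda_i**2
        error_variances = [1.0 - lam ** 2 for lam in loadings]
    true_part = sum(loadings) ** 2
    return true_part / (true_part + sum(error_variances))
```

For instance, two hypothetical indicators that each load 0.78 on a factor give `omega([0.78, 0.78])` of roughly 0.76, which shows how a two-item factor with loadings above 0.70 can reach a reliability of that size.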

References

Bangor A, Kortum PT, Miller JT (2008) An empirical evaluation of the system usability scale. Int J Hum Comp Interact 24:574–594
Bentler PM (1990) Comparative fit indexes in structural models. Psychol Bull 107:238–246
Bentler PM (2004) EQS structural equations modeling software (Version 6.1) (Computer software). Multivariate Software, Encino
Brooke J (1996) SUS: a 'quick and dirty' usability scale. In: Jordan PW, Thomas B, Weerdmeester BA, McClelland IL (eds) Usability evaluation in industry. Taylor & Francis, London, pp 189–194
Browne MW, Cudeck R (1993) Alternative ways of assessing model fit. In: Bollen KA, Long JS (eds) Testing structural equation models. Sage, Beverly Hills, pp 136–162
Carmines EG, Zeller RA (1992) Reliability and validity assessment. SAGE, Beverly Hills
Fabrigar LR, Wegener DT, MacCallum RC, Strahan EJ (1999) Evaluating the use of exploratory factor analysis in psychological research. Psychol Meth 4:272–299
Hu L, Bentler PM (2004) Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Model 6:1–55
Kirakowski J (1994) The use of questionnaire methods for usability assessment (unpublished manuscript). http://sumi.ucc.ie/sumipapp.html
Lewis JR, Sauro J (2009) The factor structure of the system usability scale. In: Proceedings of the human computer interaction international conference (HCII 2009), San Diego CA, USA
McDonald RP, Ho MR (2002) Principles and practice in reporting structural equation analyses. Psychol Meth 7:64–82
Satorra A, Bentler PM (2001) A scaled difference chi-square test statistic for moment structure analysis. Psychometrika 66:507–514
Widaman KF, Thompson JS (2003) On specifying the null model for incremental fit indices in structural equation modeling. Psychol Meth 8:16–37

Author information

Authors and Affiliations

ECoNA, Interuniversity Centre for Research on Cognitive Processing in Natural and Artificial Systems, University of Rome ‘La Sapienza’, Rome, Italy
Simone Borsci
Department of Human and Educational Sciences, University of Perugia, Perugia, Italy
Stefano Federici
Department of Psychology of Socialization and Development Processes, University of Rome ‘La Sapienza’, Rome, Italy
Marco Lauriola

Corresponding author

Correspondence to Simone Borsci.

About this article

Cite this article

Borsci, S., Federici, S. & Lauriola, M. On the dimensionality of the System Usability Scale: a test of alternative measurement models. Cogn Process 10, 193–197 (2009). https://doi.org/10.1007/s10339-009-0268-9

Received: 30 May 2009
Revised: 12 June 2009
Accepted: 15 June 2009
Published: 30 June 2009
Issue Date: August 2009
DOI: https://doi.org/10.1007/s10339-009-0268-9
