1 Introduction

Smartphones are the fastest growing and most integrated technology in everyday life [1]. In 2022, the number of people using smartphones exceeded 4.7 billion, and it is estimated that there will be 5.1 billion smartphone users worldwide by 2028 [2]. Unlike old-fashioned mobile phones, smartphones are not only communication devices: they also provide round-the-clock (24/7) access to the Internet and social networks and, being mobile, can be used anywhere at any time. Smartphones equipped with increasingly powerful processors and graphics units are now widespread, which has promoted the development of many applications [3].

Many useful smartphone applications have emerged alongside the Android and iOS operating systems, which have dominated the market for approximately the last ten years. In the first quarter of 2022, there were nearly 3.5 million apps available on the Google Play Store and 1.6 million apps on the Apple App Store. Furthermore, the number of apps downloaded from Google Play reached 27 billion in the first quarter of 2023. On the other hand, the Android app retention rate 30 days after installation was only 2.6% [4], while the iOS retention rate after 30 days was 43% in the third quarter of 2022 [5]. Studies on mobile application usability reveal that usability-related factors may influence the decision to use mobile applications and loyalty to the brand [6, 7]. For this reason, understanding usability factors and integrating them into design, development, and evaluation may enhance the user experience of mobile applications.

While a significant body of research has examined the usability of mobile applications, relatively few studies have created an instrument for measuring usability specific to mobile applications. Hoehle and Venkatesh [6] conceptualized mobile application usability based on Apple’s iOS Human Interface Guidelines and then created a survey instrument. In a similar study, Hoehle et al. [7] developed a survey instrument based on Microsoft’s usability guidelines, following the same methodology as Hoehle and Venkatesh [6]. While these two studies [6, 7] examined the mobile application development guidelines created by major mobile operating system providers such as Microsoft and Apple, the guidelines provided by Google for the Android mobile operating system have yet to be explored. Considering that Android is the most used operating system not only among mobile operating systems but also among all operating systems (desktop and mobile) [8, 9], there is a noticeable gap in the literature. Addressing this gap, the present study aims to conceptualize mobile application usability and develop a survey instrument based on Google’s mobile application development guidelines. Furthermore, while previous research provided fundamental instruments, the rapid development of mobile technology calls for updated research instruments. This work addresses this need and introduces novel constructs, such as ‘Sound’, ‘Component properties and shape’, and ‘Colour’, not discussed in previous studies [6, 7]. The proposed instrument can help mobile app developers and usability experts efficiently gather and analyse user feedback, providing invaluable insights for application improvements. Furthermore, this instrument can serve as a comprehensive guide for developers during the various phases of the mobile app’s lifecycle, including design, development, and testing, leading to more user-friendly applications.

This paper is structured as follows: the second section discusses mobile application usability literature. The following section presents the methodology and the findings. The last section discusses managerial implications, limitations, and future research.

2 Literature review

Usability is ‘the extent to which a system, product, or service can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use’ [10]. Usability is a crucial factor in the success of mobile applications, and poor usability is a primary reason users choose not to use certain apps [6]. A lack of usability in mobile applications can lead to reduced productivity and decreased user satisfaction. Therefore, when designing a new mobile app, it is essential to consider the attributes specific to mobile applications. At the same time, it should not be neglected that mobile devices and applications are constrained by limitations such as small screen size, Internet connection speed, data entry method, limited processing capacity and power, battery life, and low-resolution settings [11].

In the literature, there are several studies examining mobile application usability in specific contexts such as mobile health [12,13,14,15,16,17,18,19], mobile commerce [20,21,22], mobile learning [23,24,25], mobile government [26], mobile libraries [27], mobile travel [28], mobile shopping [29], and mobile games [30]. Different methods have been employed to investigate mobile application usability, including heuristic evaluation [25, 26], usability testing [18], and questionnaires [16, 17, 24, 27]. Most of the studies using questionnaires relied on existing usability questionnaires such as the SUS [31,32,33,34], PSSUQ [35,36,37], and CSUQ [38,39,40,41]. However, only a few studies have attempted to develop a usability instrument specifically for mobile applications. Hoehle and Venkatesh [6] and Hoehle et al. [7] utilized grounded theory to conceptualize mobile application usability and develop such instruments.

Grounded theory, a systematic approach to identifying patterns and developing concepts and theories from the data, was first proposed by Glaser and Strauss [42]. According to Makri and Neely [43], one of the biggest advantages of grounded theory for the researcher is that the gap in the literature can be identified in a more practical way through the eyes of practitioners. The open and axial coding procedure recommended by Strauss and Corbin [44] was followed during the domain development phase of this study. This approach is an iterative process that begins with open coding, where the main source of information is inspected line by line and divided into separate codes, and continues with axial coding, where connections between categories and subcategories are established [45].

Grounded theory has been widely used in numerous studies in many fields such as medicine [46,47,48], computer science [49, 50], social sciences [51,52,53], and business & management [54, 55] to explore new concepts and theories. In addition, there are studies using grounded theory in the field of usability. Ming et al. [56] investigated the factors affecting users’ behavioural intention to use a mobile library application, Mai et al. [57] investigated the functionality and management of cryptocurrency tools from the users’ perspective, Gallagher et al. [58] examined the usage of an anonymity software system, and Wang [59] investigated the low utilization of government websites using grounded theory. However, although grounded theory is widely used in the usability field, only a few studies employ it to define concepts related to mobile application usability [6, 7]. Hoehle and Venkatesh [6] conceptualized mobile application usability by examining the user experience guidelines that Apple provides to developers on its website. As a result of their conceptualization study using grounded theory, they defined six second-order constructs, including application design, application utility, user interface graphics, user interface input, user interface output, and user interface structure, and 19 first-order constructs regarding mobile application usability. Similarly, Hoehle et al. [7] examined Microsoft’s mobile usability guidelines and developed 10 constructs regarding mobile application usability, including aesthetic graphics, colour, control obviousness, entry point, fingertip-size controls, font, gestalt, hierarchy, subtle animation, and transition. In the present study, mobile application usability was conceptualized by examining the guidelines on the material.io [60] website prepared by Google for mobile application developers. Therefore, this study makes an important contribution to the literature by confirming the applicability of grounded theory in the field of mobile application usability. Furthermore, a survey instrument is developed based on these concepts to measure the usability level of mobile applications from the perspective of mobile application users.

3 Conceptualization and survey instrument development

In this study, we follow a three-step procedure: (1) Domain Development, (2) Survey Instrument Development, and (3) Evaluation of Measurement Properties, adapted from the grounded theory procedure of Strauss and Corbin [44], Lewis et al.’s [61] three-step construct development methodology, and MacKenzie et al.’s [62] 10-step validation procedure. In the first step, the conceptual coverage of the constructs based on the material.io [60] website is defined through content analysis and open and axial coding procedures. The second step includes the development of a survey instrument and assessment of the instrument through a face validity check (pre-test), a pilot study, and a content validity check. The third step validates the survey instrument with exploratory and confirmatory assessments. The details of the three-stage conceptualization and survey instrument development procedure are given in Appendix 1, Figure 2.

3.1 Construct domain development

The process followed in the creation of constructs involves the careful analysis of the document taken as a source. This helps in understanding the importance of the constructs to be formed and the purpose they serve in the conceptualization of mobile application usability. This process should broadly be carried out by making inferences from a written text such as a literature review, case study, interview, or open-ended questionnaire study [61]. Initially, an open and axial coding procedure was carried out for the content analysis of the material.io [60] website. This content was chosen for analysis because material.io [60] contains adaptive guidelines regarding user interface design, which facilitate the connection between mobile application designers and developers, thus enabling companies to build better mobile applications quickly. Open coding is defined as an analytical process in which concepts are defined and their dimensions are revealed. Axial coding is the process of associating a category with its subcategories and linking categories and subcategories (belonging to the relevant categories) together in terms of their properties and dimensions [44].

Regarding the open and axial coding process, the material.io [60] website was studied carefully over 11 days, following Strauss and Corbin’s [44] line-by-line analysis. During these 11 days, in line with Lewis et al.’s [61] suggestion, the content analysis was carried out in more than one iteration and the information contained in the constructs was formed. The questions to be answered and particularly emphasised in this process are as follows: ‘What are the major criteria/elements associated with mobile application usability on the material.io website?’, ‘What are the keywords concerning these criteria/elements?’, ‘Do these keywords contain enough information and do they contain more subcategories?’ and ‘What are the definitions assigned to the keywords?’.

Accordingly, all the content on the material.io [60] website was analysed, constructs were created, and subcategories within these constructs were composed. In addition, open codes representing these subcategories were assigned to them, with regard to similarities and/or differences, using axial coding. For instance, two open codes under the colour axial code went through the following process: ‘(1) When appropriate, mobile applications should have baseline theme, primary and secondary colours and some additional colours such as colours for backgrounds, surfaces, errors, typography, and iconography. (2) Additionally, text and important elements, like icons, should meet legibility standards when appearing on coloured backgrounds’. As a result of the line-by-line analysis of the material.io [60] website, these two open codes were conceptualized and gathered under the ‘colour usage’ subcategory. Then, using the axial coding procedure, these two open codes were placed under the main category, ‘Colour’.

All content on the material.io [60] website was coded and transformed into a coding matrix using the format described in Miles and Huberman’s [63] study. Afterwards, this coding matrix was reviewed and assessed by one associate professor, two PhD students, and one MSc student working or studying at the university. Care was taken to ensure that the judges selected for this task were working in the area of HCI but were not specifically mobile usability experts; this was done to avoid potential bias in the evaluation process. The first template, obtained at the end of the 11 days, was discussed repeatedly and shaped according to the feedback received, reaching its final form in the eighth draft.

Appendix 3, Table 8 shows the final open and axial code matrix adapted from Google’s [60] mobile application development guideline. In this table, besides the data obtained from the material.io [60] website, there are also literature-based axial codes that the judges advised should be included in the guideline. For instance, it was concluded that the responsiveness and consistency axial codes are fundamental to the usability of mobile applications, and these factors were therefore added to the conceptual structure of this study. As a result, the axial codes in the leftmost column of Appendix 3, Table 8 form a conceptual basis for each construct. The subcategories in the middle column show the subcategories of the constructs, and the rightmost column lists the open codes derived from material.io [60].

At the end of the analysis, a total of 12 constructs (Branding; Colour; Component properties and shape; Consistency; Feedback; Help; Interaction; Navigation and motion system; Responsiveness; Sound; Typography, text and writing; Visualization, imagery and iconography) were formed, and these constructs were then related to the usability literature in the domains of desktop computers, websites, mobile devices, and mobile applications. The list of these studies is given in Appendix 4, Table 9. The conceptualization of each construct is discussed next.

3.1.1 Branding

Branding can be expressed as a process that aims to promote a brand and answers questions such as who the brand is and why it should be trusted [64]. In various studies on web and mobile usability, usability and branding factors have been taken into consideration and used as variables in models, and a statistically significant positive correlation between these concepts has frequently been found [64,65,66,67,68]. In a study in which website usability was expressed in terms of website quality, it was argued that branding should be used as an auxiliary factor in increasing the trustworthiness of e-commerce websites [69]. For Android- and iOS-based mobile applications, it has been suggested that the use of a branding kit on application interfaces adds a dynamic ambience to the application and enhances the user experience [70]. In this study, branding is defined as ‘the measure of perception by the user whether the elements of branding are used effectively in a mobile application.’ This conceptualization of branding aligns with the explanations of Hoehle and Venkatesh [6].

3.1.2 Colour

The use of colours in menus, buttons, icons, links, outlines, and backgrounds is important for the usability of systems [71,72,73,74,75]. Colour is a visual element and influences visual aesthetics [76], which in turn influence the user experience on mobile devices [77]. In an experimental study of Taiwanese mobile commerce applications, design aesthetics, including colour elements, were shown to build mobile trust through personalization, usefulness, and ease-of-use factors [78].

Even though colour has not been considered a distinct construct in previous studies, the colour concept emerged as an important factor in the design of mobile applications on the material.io [60] website. Therefore, colour is considered one of the dimensions of mobile application usability rather than being categorized only under elements such as visual appeal, design aesthetics, and aesthetic graphics. Colour is defined as ‘the measure of perception by the user whether the colour factor is used effectively and conveniently in a mobile application’.

3.1.3 Component properties and shape

In Google’s mobile application development guidelines, a comprehensive explanation is provided of visual design, component properties, shape elements, and component-based design content. It is also known that the information regarding material design in these guidelines is used to develop graphical user interfaces [79]. Standard components commonly found in Android-based applications include the status bar, application bar (action bar), navigation drawer, tabs, floating action buttons, and notification screen [80]. Clifton [80] also stated that shapes should be used extensively for expression in mobile applications, so that users can make inferences simply by looking at the shape of an element without even reading any text.

Component properties and shape, which have rarely been encountered in the literature, are considered as a construct in this study and conceptualized accordingly. Even though constructs such as animation and transition are mentioned by Hoehle and Venkatesh [6] and Hoehle et al. [7], the focus of this study is mainly on the dimensions of mobile application usability, specifically component properties and shape. Component properties and shape are defined as ‘the measure of perception by the user whether components and shape factor are used effectively and for the favour of the user in a mobile application’.

3.1.4 Consistency

Consistency, one of the most prominent characteristics of usability [81], can be simply defined as an action or task performed at an interface in the same or a very similar way, with the expectation of the same or a very similar result [82]. According to Steinau et al. [81], systems with this property are learned more rapidly by users and increase user satisfaction. George [83] also argued that all elements used during the design of websites should express a cohesive whole and be consistent with each other. Accordingly, a consistent website is not only visually attractive but also shortens users’ learning curves and increases the overall usability of the website. In the context of mobile applications on the Android platform, Alharbi et al. [84] noted that the number of mobile applications is very high and that one of the elements that distinguishes an application from others is its level of consistency. They also highlighted that an application with inconsistent content can confuse, annoy, and even harm users.

In this study, consistency is defined as ‘the measure of perception by the user that a function or task performed by the user in a mobile application always has the same or very similar input and output’. The consistency construct defined and conceptualized above matches the literature.

3.1.5 Feedback

The concept of feedback is based on systems (desktop, software, or mobile) informing users about what is happening in the system at appropriate time intervals [85]. The feedback concept, which is considered a usability attribute by Kim et al. [86], should be included in usability tests [87].

Feedback is especially important for usability in field studies. Preuschl et al. [88] put forward that a mobile feedback system informs exercisers about their performance by taking into account their body movements and physical strain during physical effort. Thus, the application server becomes a more usable data provider for its users.

Feedback, mentioned in terms of the help and visibility-of-system-status elements in Nielsen’s 10 heuristics [89], was conceptualized in this study as a separate construct. Although feedback is not included in the studies of Hoehle and Venkatesh [6] and Hoehle et al. [7], both the information on the material.io [60] website and supporting statements obtained from the literature indicate that it should be treated as a construct. In this study, feedback is defined as ‘the user’s perception of how effectively the feedback feature is used in a mobile application’.

3.1.6 Help

The help construct, which corresponds to one of Nielsen’s 10 usability heuristics, is discussed in the literature together with documentation and bug-reporting elements for websites and mobile applications. In line with the literature, the help content on the material.io [60] website consists of content aimed at facilitating navigational access to help resources, reporting potential errors, and providing step-by-step instructions for the user.

It has been determined that embedding a tool in a web application that provides the required support to users while they perform a task increases usability. It has also been underlined that this assistance should be integrated into the process at the moment users need it, while they are performing their tasks [90, 91]. In this study, the construct of help is defined as ‘the measure of perception by the user whether the help element is used efficiently and effectively in a mobile application’.

3.1.7 Interaction

Interaction is considered one of the important dimensions of usability in the literature [86, 92,93,94]. Its indirect effect on usability through visualization and orientation on mobile devices has also been validated [95, 96]. In addition, inadequately defined key features of mobile devices may affect user interaction negatively [6]. Therefore, this study conceptualizes the usability and interaction elements frequently mentioned in the literature. The interaction construct, in light of the material.io [60] website, covers an expanded version of the control obviousness explanations found in Hoehle et al.’s [7] study. In this study, interaction is defined as the user’s perception of how effectively all elements related to interaction are utilized in a mobile application.

3.1.8 Navigation and motion system

Navigation is related to the menu structure and the transitions (through the motion system) between hierarchical layers, and mobile applications should be user-friendly when switching from one page to another [97]. In addition, navigation is the most common form of interaction with a mobile application and has a significant effect on satisfaction with mobile application use [98]. Regarding the motion system, Google’s [60] documentation is not limited to switching from one page to another or ease of navigation. According to Clifton [80], transitions play an undeniable role in the motion system because they communicate what was on the screen before and what will happen next. Besides, Mew [99] explains that using the transition element during animations in mobile applications helps users navigate from one action, task, or activity to another.

In this study, navigation and motion system is defined as ‘the user’s perception of their ability to navigate from one page (or level) to another and the movement of the elements in the mobile application’. This definition is compatible with the concept of navigation and motion systems that Zhang and Adipat [11] mentioned in their study.

3.1.9 Responsiveness

Responsiveness has been defined as the ability of an instrument to detect and measure changes, even small ones [100, 101]. Looking at the element of responsiveness in mobile website design, Groth and Haslwanter [102] considered responsiveness a non-static approach in which the content adapts to the screen size and adjusts the layout. The study by Karkin and Janssen [103], which focuses on local government websites in Turkey, defined usability in terms of the content, quality, and functionality of the website. Many studies point out that usability is closely related to user satisfaction, accessibility, and involvement rate.

According to Palmer [104], responsiveness is one of the five basic design elements essential for improving website usability. Mullins [105] concludes that a responsive design can be costly, but when applied correctly and matched with the right content, it is beneficial in terms of user experience and usability in the long term. Responsiveness, expressed as the system responding at the right time and treated as intertwined with interaction in [92], is defined in the current study as ‘the user’s perception of a mobile application’s ability to adapt to the changes encountered’.

3.1.10 Sound

Although the use of sounds appears to be an important component for users, sounds should be used appropriately and effectively to increase usability. Redundant or distracting sounds annoy users rather than increasing usability [106]. For instance, in a study on health, fitness, and diet applications, it was recommended that notification sounds, alarms, and music should be designed to benefit users and serve as reminders for important tasks [107].

In this study, sound is defined as ‘the user’s perception of how effectively sound elements are used in a mobile application’. The sound construct, which has not been addressed in any previous conceptualization study reviewed in the literature, is conceptualized for the first time in this study.

3.1.11 Typography, text, and writing

The use of typography in a written text to convey the necessary message to the user is called typographic design [108], and the discussion of how to create typography dates back to the pre-HCI period of printed media [109]. Considering typography, which is stated to constitute 95% of web content [108], the first sub-element that draws attention is the font. When it comes to fonts, it is commonly believed that ‘bigger is better’, although the weight and size of the font may vary based on screen size. In this context, it has been observed that larger font sizes, standard width instead of compressed width, and uppercase letters instead of lowercase are often preferred [110, 111].

One of the implications derived from these studies is that typography, text, and writing elements are not associated only with font size in the mobile application context. Therefore, in this study, in light of the content of the material.io [60] website, mobile application typography, including font selection (particularly size and weight), alignment, paragraphing, and white space, as well as text and writing elements, is considered.

While Hoehle et al. [7] considered only the font element as a construct in their study, this concept is presented in an expanded manner here. Typography, text, and writing are defined as the user’s perception of how effectively typographic elements are used in a mobile application.

3.1.12 Visualization, imagery, and iconography

In the literature, many studies emphasize the use of images, visualization elements (graphics and dashboards, where present), and particularly icons on websites and in mobile applications. The use of powerful and interactive images and icons can speed up processes or tasks in mobile applications, thus saving energy and space [112]. For instance, it has been demonstrated that adding icons that provide a realistic experience to mobile applications may create a better user experience, because users rapidly grasp which function each icon represents [95, 113]. However, when icons, defined in the simplest sense as the visual representation of an object, action, or idea, are used in mobile applications, there are many matters to be considered. Icons ought to be aesthetic and distinguishable from other objects [114], and their design ought to be chosen correctly to convey their meaning directly to users [112].

In this study, visualization, imagery, and iconography are defined as ‘the user’s perception of whether the necessary elements are effectively used and visualized in a mobile application’. This conceptualization is in line with Hoehle and Venkatesh’s [6] ‘short icon-labelling’ and Hoehle et al.’s [7] ‘aesthetic graphic’ definitions.

3.2 Survey instrument development

The second step of the methodology focuses on survey instrument development and consists of three separate substages [61]. First, items representing the constructs are developed [62]. Then, a pre-test, also referred to as a face validity check [6], is performed on the listed items [61]. After this stage, a pilot test is carried out with users representing the main population, and the wording of the items is refined and corrected according to the feedback [62, 115]. Finally, the items are screened and content validity is checked quantitatively. Content validity measures the extent to which an item represents the related construct [61, 115]. The numbers of items revised and eliminated during these stages are summarized in Fig. 1.

Fig. 1 Phases of survey instrument development: refinement and validation

In the current study, the item pool was formed using the open codes created in the previous stage and listed in Appendix 3, Table 8, together with keywords obtained from the relevant literature. A total of 98 items measuring the 12 constructs were developed. Following the recommendation of Hoehle et al. [7], each construct was represented by at least four to six items. For example, within the ‘Branding’ construct, the open code ‘Show brand colours at memorable moments that reinforce your brand’s style’ was iteratively refined during the survey instrument development process to become the survey item ‘In general, the mobile application uses brand colours to reflect the brand style’.

3.2.1 Face validity check (pre-test)

A face validity test was applied to the 98 items, which represent the most principal aspects of the 12 constructs, in order to simplify and correct their wording. For this purpose, four volunteers participated in the face validity study. Before the survey, as a prerequisite, the participants were asked whether they owned a smartphone and had sufficient mobile application experience; this ensured that the content could be fully understood by the participants without confusion. In addition, it was underlined to the participants that they should focus only on the content of the items and did not need to put the items in any order. In the survey, a feedback field was presented to the participants for each item. Participants were asked to review each item and mark those that were confusing, meaningless, or ambiguous, and feedback was received on these matters. As a result of the feedback received for the 98 items, 39 items were found to be unclear and were excluded from the measurement instrument. For 22 items, participants proposed minor changes. As a result, 59 items were carried forward to the next stage, the pilot test. In this process, the criterion that each construct be represented by at least 4 items was also taken into consideration.

3.2.2 Pilot test

The primary purpose of conducting the pilot test is to extend the evaluation of the instrument after the revisions of the face validity check. It is recommended that the sample selected for the pilot study have characteristics similar to those of the larger sample [61]. Therefore, responses were collected from 30 participants representing the main population. The demographic profile of the pilot test participants is given in Appendix 5, Table 10.

So that the participants could complete the survey in their mother tongue and answer more comfortably, the items were expressed in both English and Turkish without loss of meaning (bilingual translation). Feedback was received from the participants during the pilot test, and they were asked to make suggestions to improve the items. After the pilot study, 21 of the 59 items in the item pool were flagged to be corrected or removed. Six of these items were removed from the pool, wording corrections were made to thirteen of them, and a new item was suggested by the reviewers for inclusion in the item pool. It was again ensured that each construct was represented by at least 4 items [7]. As a result, 54 items remained after the pilot test.

3.2.3 Content validity check

Content validity is defined as the degree to which an item represents the construct to which it belongs [61, 62, 116]. In the content validity check, the following two questions should be answered appropriately: (1) Does each item represent the content of the construct to which it belongs? (2) Do all items belonging to a construct represent the whole content of that construct? [62].

For the content validity check, Anderson and Gerbing’s [115] approach was used, in which each item can represent only one construct. A matrix structure is used in which the items are in the rows and the constructs are in the columns, and participants are expected to mark which item belongs to which construct, so that the most appropriate item–construct match can be identified. In the quantitative analysis of content validity, the PSA (proportion of substantive agreement) and CSV (substantive validity coefficient) indices were used, as explained in the study by Anderson and Gerbing [115]. PSA is the proportion of participants who assign an item to its intended construct. The formula for PSA is as follows:

$$P_{\text{SA}}=\frac{n_c}{N}$$

In the above formula, N represents the total number of participants, and $n_c$ is the number of participants who assigned the item to its intended construct. The closer this value is to 1, the stronger the consensus that the item belongs to that construct.

The CSV index measures the extent to which an item is assigned to its intended construct relative to the extent to which it is assigned to any other construct. The formula for CSV is as follows:

$$C_{\text{SV}}=\frac{n_c-n_0}{N}$$

In the above formula, N indicates the total number of participants and $n_c$ the number of participants who assigned the item to its intended construct, while $n_0$ indicates the number of participants who assigned the item to the next most frequently selected construct. CSV takes values between −1 and 1, and positive values indicate that an item is assigned to its intended construct more often than to any other construct.
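To make these indices concrete, the following sketch (an illustration only, not the authors’ analysis script; the item responses and construct names are hypothetical) computes PSA and CSV for a single item from raw sorting responses and flags it against the 0.25 CSV threshold applied later in this study.

```python
from collections import Counter

def psa_csv(assignments, intended):
    """Compute PSA and CSV for one item.

    assignments: construct labels chosen by the N judges for this item
    intended:    the construct the item was written to measure
    """
    N = len(assignments)
    counts = Counter(assignments)
    n_c = counts.get(intended, 0)                    # judges who chose the intended construct
    others = [v for k, v in counts.items() if k != intended]
    n_0 = max(others) if others else 0               # next most frequently chosen construct
    return n_c / N, (n_c - n_0) / N                  # (PSA, CSV)

# Hypothetical sorting responses from 41 judges for one 'Colour' item
responses = ["Colour"] * 30 + ["Branding"] * 7 + ["Feedback"] * 4
psa, csv = psa_csv(responses, "Colour")
print(f"PSA = {psa:.2f}, CSV = {csv:.2f}")           # PSA = 0.73, CSV = 0.56
if csv < 0.25:                                       # threshold recommended in [117]
    print("Item flagged for rewording or removal")
```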

In this study, 41 people were selected for the content validity check, following Hunt et al.’s [116] recommendation that the number of participants should be at least 12. The matrix, with the items in the rows and the definitions of the constructs in the columns, was presented to the participants as a survey of several pages. The demographic information of the 41 participants is given in Appendix 5, Table 10; most participants were young adults with relatively high mobile application experience and competence. The PSA and CSV indices of the constructs are shown in Appendix 2, Table 7. According to the results, the CSV of 10 of the 54 items was below the recommended threshold value of 0.25 [117]. Taking care that each construct remained represented by at least 4 items, it was decided to drop three of the 10 items (CMPS1, CON5, NVMS5) from the item pool, while seven items (BRND3, CON2, CON4, FEDB1, FEDB2, RSPN1, RSPN4) were reworded. After these modifications were made to better align the items with the construct domain, 51 items related to the constructs were validated through the content validity check. The final item pool is given in Appendix 6, Table 11.

3.3 Evaluation of measurement properties

The third phase of the methodology focuses on the evaluation of measurement properties. At this step, data were collected twice, and the measurement properties of the instrument were evaluated and optimised. Lewis et al. [61] recommend that exploratory and confirmatory assessments be applied consecutively and on distinct samples to obtain better results: exploratory factor analysis (EFA) uncovers the factor structure in the first sample, and confirmatory factor analysis (CFA) verifies the scale characteristics in the second.

3.3.1 Exploratory assessment

The survey instrument developed in the second phase is first assessed through exploratory factor analysis. A survey methodology was used, and all the items in the survey instrument were measured on a 7-point Likert scale (from 1 = totally disagree to 7 = totally agree). The target population is mobile shopping application users in Turkey. The decision to test the developed instrument on mobile shopping applications was motivated by several factors, including their increasing popularity, significant investments in the retail sector, companies’ strategies directing customers to mobile shopping, high user engagement, and a diverse user base. Furthermore, these applications are used for a range of activities, from casual browsing to executing complex transactions [118,119,120]. At the beginning of the survey, a list of the most commonly used mobile shopping applications in Turkey was presented to the participants: Trendyol, N11, Hepsiburada, Gittigidiyor, Letgo, Amazon, Çiçeksepeti, and Morhipo. The survey was administered through an online system, and at the beginning of the survey participants were asked to choose the mobile shopping application they used most frequently. The survey questions were then automatically adapted to the selected application; for instance, instead of ‘The mobile application uses a legible font’, the statement ‘The Trendyol mobile application uses a legible font’ was shown to participants in the survey form.

A total of 309 questionnaires were collected from mobile shopping application users. Some questionnaires were excluded from the sample because the respondents answered the control question ‘Please do not answer this question.’ In the end, a total of 293 questionnaires were available for further analysis. The IBM SPSS Statistics 26 package was used for the analysis. The demographic profile of the respondents is given in Appendix 5, Table 10.

Initially, Bartlett’s test of sphericity and the Kaiser–Meyer–Olkin (KMO) test were conducted to assess whether the data obtained from the survey were suitable for factor analysis [121]. The recommended KMO value is above 0.6 [122]. The results reveal that the KMO value is nearly 0.93 and Bartlett’s test is significant at the 0.01 level, which shows that the data are suitable for factor analysis. We then conducted a maximum likelihood (ML) factor analysis with varimax rotation to reduce the data. The ‘eigenvalue greater than one’ criterion was used to determine the number of factors. The ML analysis produced twelve factors that accounted for 61.5% of the total variance, and the extracted factors correspond to the constructs conceptualized in the first phase. For interpretation, a cut-off point of 0.5 for item loadings was chosen [123]. As shown in Table 1, all factor loadings were greater than 0.5 except TTW4, which loaded at 0.39; this item was dropped from the survey instrument. With regard to internal reliability, Cronbach’s alpha values range between 0.78 for ‘Responsiveness’ and 0.88 for ‘Feedback’, surpassing the critical value of 0.7 [124].
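As a rough illustration of this analysis pipeline (the original analysis was run in IBM SPSS), the sketch below shows how the suitability tests, the ML extraction with varimax rotation, and Cronbach’s alpha could be reproduced in Python with the factor_analyzer library; the file name and item columns are hypothetical.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Placeholder for the first sample (293 responses, one column per 7-point Likert item)
items = pd.read_csv("efa_sample.csv")

# Suitability of the data for factor analysis
chi_square, p_value = calculate_bartlett_sphericity(items)   # Bartlett's test of sphericity
_, kmo_total = calculate_kmo(items)                          # overall KMO, should exceed 0.6
print(f"Bartlett p = {p_value:.4f}, KMO = {kmo_total:.2f}")

# 'Eigenvalue one' criterion to decide how many factors to retain
fa_unrotated = FactorAnalyzer(rotation=None)
fa_unrotated.fit(items)
eigenvalues, _ = fa_unrotated.get_eigenvalues()
n_factors = int((eigenvalues > 1).sum())

# Maximum likelihood extraction with varimax rotation
fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax", method="ml")
fa.fit(items)
loadings = pd.DataFrame(fa.loadings_, index=items.columns)
print(loadings.where(loadings.abs() > 0.5))                  # keep loadings above the 0.5 cut-off

def cronbach_alpha(construct_items):
    """Cronbach's alpha for the items (columns) of a single construct."""
    k = construct_items.shape[1]
    item_var = construct_items.var(ddof=1).sum()
    total_var = construct_items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)
```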

Table 1 Exploratory factor analysis: item descriptive statistics and loadings, Cronbach’s alpha values and variance explained

3.3.2 Confirmatory assessment

Lewis et al. [61] recommended that the finalized item set obtained from the exploratory assessment should be re-evaluated by confirmatory assessment. By selecting a new random sample that is independent of the sample used in the exploratory evaluation, a re-evaluation of the survey properties should be performed through confirmatory factor analysis. Confirmatory factor analysis is an ideal analysis method that evaluates the effectiveness of the measurement created for items and measures consistency between the theoretical concept and the structural equation model [125].

Furthermore, examining the nomological network of the scale as part of the confirmatory evaluation is also recommended by Lewis et al. [61]. It has been underlined that, to evaluate the nomological validity of conceptualized constructs, it is essential to collect data that may be theoretically related to these constructs [62]. To evaluate the nomological validity of the usability constructs, we examine the impact of the mobile application usability constructs on continued intention to use, satisfaction, and brand loyalty. Brand loyalty is defined as the desire of users to revisit a website or mobile application or to repurchase a preferred product or service [126, 127]. Gommans et al. [128] showed that electronic loyalty is affected by five factors, one of which is website design. Well-designed navigation and menu structure, customized content, different language options, personalized features, an effective search function, and appropriate feedback mechanisms directly affect users’ intention to revisit a website. Cyr et al. [127] pointed out that a simple page outline on websites and the use of decorative colours, typography, iconography, images, and visuals promote the aesthetics and utility perceived by users and thus have a positive influence on brand loyalty. Similarly, it has been stated that factors such as colour, typography, iconography, menu design, and the help menu positively affect users’ continued intention to use and brand loyalty [129]. Hoehle et al. [7] found significant effects of aesthetic graphics, entry point, fingertip-size controls, gestalt, subtle animation, and transition on continued intention to use mobile applications.

Furthermore, in the same study, it was found that brand loyalty to mobile applications is explained by control obviousness, fingertip size controls, gestalt, hierarchy, subtle animation, and transition. Dou et al. [130] put forward that a mobile application with a good design model also highlights the branding element, and this has a positive effect on user satisfaction. Therefore, the following research questions will be answered for nomological validation of mobile application usability factors.

RQ1 Do the mobile application usability factors have an effect on satisfaction?

RQ2 Do the mobile application usability factors have an effect on continued intention to use?

RQ3 Do the mobile application usability factors have an effect on brand loyalty?

Considering all this information, to measure the brand loyalty, continued intention to use, and satisfaction constructs, items representing these constructs were adapted from studies in the literature [67, 131,132,133,134,135,136]. The items listed in Table 2 were included in the mobile application usability survey instrument, and for nomological validation, the effects of the mobile usability factors on satisfaction, continued intention to use, and brand loyalty were analysed.

Table 2 Outcome variables to be measured by scales

For the confirmatory assessment, a new, second sample was collected from the population of mobile shopping application users who had not participated in the first study. A total of 374 respondents agreed to participate in the study. Since some of the respondents answered the control question ‘Please do not answer this question’, a total of 340 questionnaires were available for confirmatory factor analysis. The demographic information of the participants is shown in Appendix 5, Table 10. The IBM SPSS AMOS 26 programme was used to analyse the data.

First, the normality of the data was tested with the skewness values. The skewness values lie between −2 and 2, which indicates that the data are approximately normally distributed [137]. In the CFA, the fit indices of the confirmatory model were also checked. Following the statement of [138] that “the researcher should report one incremental and one absolute index, in addition to the χ2 value and associated degrees of freedom, and at least one of these indices should be badness-of-fit indices”, the Chi-square value, degrees of freedom, comparative fit index (CFI), Tucker–Lewis index (TLI), incremental fit index (IFI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR) were used to check the fit of the model to the data. As shown in Table 3, the goodness-of-fit statistics (CFI = 0.91, IFI = 0.91, TLI = 0.90), the badness-of-fit statistics (RMSEA = 0.05, SRMR = 0.05), and the Chi-square to degrees of freedom ratio (1.68) indicate a reasonably good fit to the data.
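The authors ran the CFA in IBM SPSS AMOS. As a rough open-source analogue, the sketch below shows how a comparable measurement model and several of the same fit indices (χ², CFI, TLI, RMSEA) could be obtained with the Python semopy package; the file name and item codes are hypothetical, only three of the twelve constructs are shown, and SRMR and IFI are not part of semopy’s standard output.

```python
import pandas as pd
import semopy

# Placeholder for the second sample (340 responses); item codes are illustrative
data = pd.read_csv("cfa_sample.csv")

# Measurement model in lavaan-style syntax: each construct loads on its own items
# (the remaining nine constructs would be defined the same way)
model_desc = """
Colour         =~ CLR1 + CLR2 + CLR3 + CLR4
Responsiveness =~ RSPN1 + RSPN2 + RSPN3 + RSPN4
Feedback       =~ FEDB1 + FEDB2 + FEDB3 + FEDB4
"""

model = semopy.Model(model_desc)
model.fit(data)

print(semopy.calc_stats(model).T)      # fit indices, including chi2, DoF, CFI, TLI, RMSEA
print(model.inspect(std_est=True))     # standardized loadings for convergent validity
```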

Table 3 Confirmatory study: model fit indices

The convergent validity of the confirmatory model is assessed with standardized factor loadings, t-values, Cronbach’s alpha, composite reliability (CR), and average variance extracted (AVE) values, as shown in Table 4 and Table 5. The standardized factor loadings of all items range between 0.70 and 0.91, meeting the threshold of 0.70 [138]. All item t-values were significant at the 0.01 level [139]. Furthermore, internal consistency reliability was measured with Cronbach’s alpha and CR values. The minimum Cronbach’s alpha value is 0.81, for ‘Responsiveness’, which is greater than the threshold value of 0.70 [124], and the minimum CR value, also for ‘Responsiveness’, is 0.84, greater than the recommended value of 0.70 [141]. Furthermore, the AVE values of all constructs exceed the recommended value of 0.50 [142].
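For reference, CR and AVE follow directly from the standardized loadings. The helper functions below are a generic sketch (not tied to the authors’ AMOS output) with illustrative loading values for a four-item construct.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error = sum(1 - l ** 2 for l in loadings)     # error variance of each standardized item
    return squared_sum / (squared_sum + error)

def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Hypothetical standardized loadings for a four-item construct
loads = [0.78, 0.81, 0.74, 0.86]
print(f"CR  = {composite_reliability(loads):.2f}")        # should exceed 0.70
print(f"AVE = {average_variance_extracted(loads):.2f}")   # should exceed 0.50
```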

Table 4 Confirmatory study: item descriptive statistics, standardized factor loadings and Cronbach’s alpha values
Table 5 Confirmatory study: AVE, CR, and correlations

The discriminant validity of the constructs is checked with the procedure proposed by Fornell and Larcker [142]. In this procedure, the square root of the AVE value of each construct should be greater than its correlation with other constructs [143, 144]. In Table 5, the values on the diagonal axis show the square root of AVE values, and each value is greater than the correlation value with other constructs. This shows the discriminant validity of the constructs [142].
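A minimal sketch of the Fornell–Larcker comparison is given below: assuming an inter-construct correlation matrix and AVE values (illustrative numbers for a subset of constructs), the check verifies that each construct’s √AVE on the diagonal exceeds its correlations with every other construct.

```python
import numpy as np
import pandas as pd

constructs = ["Colour", "Sound", "Feedback"]            # illustrative subset of the 12 constructs
ave = pd.Series([0.62, 0.58, 0.66], index=constructs)   # hypothetical AVE values
corr = pd.DataFrame(                                    # hypothetical inter-construct correlations
    [[1.00, 0.41, 0.35],
     [0.41, 1.00, 0.44],
     [0.35, 0.44, 1.00]],
    index=constructs, columns=constructs)

fornell_larcker = corr.copy()
np.fill_diagonal(fornell_larcker.values, np.sqrt(ave.values))  # diagonal holds sqrt(AVE)

for c in constructs:
    largest_corr = fornell_larcker.loc[c].drop(c).abs().max()
    assert fornell_larcker.loc[c, c] > largest_corr, f"Discriminant validity issue for {c}"
print(fornell_larcker.round(2))
```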

Table 6 R-square value of the outcome variables and standardized regression weights of each factor

The effects of the mobile application usability factors on continued intention to use, brand loyalty, and satisfaction were tested for the nomological validation of the model. As shown in Table 6, regarding research question 1, a total of 51.7% of the variance in satisfaction was explained by ‘responsiveness’, ‘typography, text and writing’, ‘consistency’, ‘colour’, ‘sound’, ‘navigation and motion system’ and ‘interaction’. Regarding research question 2, 32.7% of the variance in continued intention to use was explained by the usability factors ‘typography, text and writing’, ‘component properties and shape’, ‘navigation and motion system’ and ‘responsiveness’. Furthermore, ‘colour’, ‘typography, text and writing’, ‘consistency’, ‘sound’ and ‘responsiveness’ explain 36.2% of the variance in brand loyalty, as questioned in RQ3. The significant relationships between the usability factors and continued intention to use, satisfaction, and brand loyalty provide evidence for the applicability of the developed survey instrument [61].

4 Discussion and conclusion

This study conceptualized mobile application usability based on Google’s mobile application development guidelines and proposed a survey instrument to measure mobile application usability. A three-step formal methodology, adapted from the studies of Lewis et al. [61], MacKenzie et al. [62], and Strauss and Corbin [44], was followed. In the first phase, the construct domain related to mobile application usability was conceptualized based on Google’s mobile application development guidelines and the literature. We conceptualized mobile application usability via twelve constructs: ‘Help’, ‘Colour’, ‘Feedback’, ‘Typography, text and writing’, ‘Visualization, images and iconography’, ‘Consistency’, ‘Sound’, ‘Branding’, ‘Component properties and shape’, ‘Navigation and motion system’, ‘Responsiveness’ and ‘Interaction’. In the second phase, the survey instrument was developed through a scale development process: the item pool was created by adhering to the domain defined in the first phase, and a pre-test, pilot test, and content validity check were subsequently performed to establish the content validity of the survey instrument. The third phase of the methodology tested the validity and reliability of the survey instrument with exploratory and confirmatory assessment techniques. We collected two independent samples from users of mobile shopping applications on their smartphones. With the first sample, exploratory factor analysis was performed to establish the factorial validity and reliability of the constructs. Subsequently, with the second sample, confirmatory factor analysis was performed to validate the findings of the exploratory factor analysis and to assess the generalizability of the survey instrument. Furthermore, it was revealed that the conceptualized mobile application usability constructs significantly predict continued intention to use, satisfaction, and brand loyalty to mobile shopping applications.

4.1 Theoretical contributions and implications

The study makes several contributions to the current literature. First, the study conceptualized mobile application usability and developed a survey instrument based on Google’s mobile application development guidelines. While previous studies [6, 7] have used guidelines from Apple and Microsoft, the current study is the first to employ the guidelines provided by Google. We developed twelve usability constructs, including responsiveness, navigation and motion systems, and component properties, by considering the specific needs of mobile applications. For example, responsiveness refers to a mobile application’s adaptability to changes. Given the constraints of screen sizes, it is essential for mobile apps to adjust component dimensions based on the device and efficiently utilize layers and containers for images and icons. Additionally, the navigation and motion system focuses on swift transitions and understanding element relationships, actions, and their results.

Furthermore, this study highlights the influence of usability constructs on users’ behaviours and their continued intention to use mobile apps, referencing information systems acceptance theories such as TAM [145] and UTAUT (Unified Theory of Acceptance and Use of Technology) [146]. The research revealed that continued intention to use is influenced by factors such as ‘typography, text and writing’, ‘component properties and shape’, and ‘navigation and motion system’, accounting for 32.7% of the variance. Factors including ‘colour’, ‘typography, text, and writing’, ‘consistency’, ‘sound’, and ‘responsiveness’ account for 36.2% of the variance in brand loyalty. Additionally, 51.7% of the variance in satisfaction is explained by elements such as ‘colour’, ‘typography, text, and writing’, ‘consistency’, ‘sound’, and ‘responsiveness’. In the context of mobile shopping applications, it is crucial for app developers to prioritize these usability factors to ensure increased user satisfaction, brand loyalty, and continued app usage.

This research introduced a formal methodology by integrating approaches from Strauss and Corbin [44], MacKenzie et al. [62], and particularly from Lewis et al. [61]. This systematic approach is applicable to various systems, not just mobile applications. Researchers can utilize the procedure illustrated in this study as a framework for their conceptualizations and instrument developments. Furthermore, the proposed mobile application usability instrument and open codes can serve as valuable guidelines for mobile app developers throughout different phases of the mobile application development life cycle. During the design and development phases, the open codes and the instrument can be utilized as a checklist to eliminate usability issues, thereby fostering the creation of more user-friendly applications. Similarly, existing applications can use this checklist to detect and address usability concerns, enhancing the app’s overall usability. Moreover, by leveraging the instrument, end-user feedback can be gathered, allowing developers to highlight and prioritize areas that significantly impact user satisfaction.

5 Future studies and limitations

While the present study makes significant contributions to mobile application usability, it is important to acknowledge its limitations. The usability instrument was initially validated in mobile shopping applications. Evaluating its validity, reliability, and generalizability across diverse mobile app categories like education, gaming, health care, or social media is crucial for a comprehensive understanding. Furthermore, the proposed usability constructs explained specific variances in brand loyalty, satisfaction, and intention to use. However, unexplained variances indicate that there may be additional underlying factors requiring further investigation. As potential future research, examining the impact of demographic factors, such as age, gender, education, and prior mobile application experience, on mobile application usability can provide valuable insights into tailoring user experiences and enhancing app usability for diverse user groups. Furthermore, in future research, conducting comparisons between the proposed instrument and existing usability evaluation instruments will be beneficial in identifying the strengths and weaknesses of the developed instrument and for instrument refinement.