1 Introduction

Understanding why, and how, people choose which apps to install is important for several reasons: to help developers make apps that will be installed, and to help companies sell their apps.

The vast amount of information returned by most search queries is often considered overwhelming. It is increasingly important to understand “how users manage to orient themselves and retrieve relevant information” [30, p. 778]. Given the enormous number of apps on app selection platforms (such as the Google Play Store), it appears unreasonable to expect consumers to regularly apply their full cognitive abilities and conduct detailed evaluations of the available apps [16].

Evaluation of available information (cues and factors in the app platform) is necessary to make decisions [1, 4]. Cues, such as ratings, can be observed in the app platform. Factors are related to users, not the app selection platform. An example of a factor is the user’s motive to install an app, e.g. the problem that creates the need which the app would fulfill.

It is crucial to understand what drives people’s decisions to download apps, since this affects the benefits and usefulness of such technologies [26]. App developers are the key source of the information consumers need, since they are the ones who incorporate that information (e.g., cues) into their apps. The provided app description “represents a key channel to inform consumer choice” [21, p. 1].

In recent years, mobile app ecosystems have developed rapidly, and the number of mobile apps and app categories has increased significantly. As of October 2021, DataReportal recorded 5.29 billion unique mobile phone users, accounting for 67.1% of the global population [7]. The worldwide increase in mobile apps has also led to fierce competition, resulting in some apps’ disappearance. To succeed in this highly competitive app market, app developers need to understand their users’ experiences and make strategic decisions for app survival and sales [23]. App store owners, developers and marketers need to attract users’ attention by improving the app store interface in order to further increase app downloads [3, 31]. With a better understanding of users’ behaviour, devices, services and applications can be improved [32].

Our research aims to illustrate how users choose which app to install, and to investigate the importance of the cues and factors they consider, using data mining approaches to find meaningful relationships, “interesting patterns,” and “useful insights.” We also explore the decision-making strategies employed by users when searching for apps.

This work contributes to the understanding of the ‘app search space’ in the following ways: First, we extend the app model created by Huang and Bashir [14] by identifying multiple stages that users go through before installing an app and determining which information cues are used as a starting point in the app selection process. Second, we correct a misconception reported in earlier work that the information cues that users initially view are important. Our results show that the importance of these cues is determined later in the search process. Third, we found that users tended to consider a combination of cues and factors when searching for apps. Based on their experience of evaluating other apps, they may gradually develop a substantially different view of the importance of various cues. Fourth, our study shows the importance of certain cues, e.g. app reviews, and their role in the app selection process. Fifth, we show that users use different strategies when searching for apps. Sixth, we identified a new heuristic called “Return to the First.”

2 Background and Related Work

We outline previous research by providing theoretical aspects from a general heuristic perspective, heuristics from an app selection perspective, along with app selection studies.

Also, we briefly describe the data mining approaches and the measures used to assess the relationships between the itemsets “patterns.”

2.1 Heuristics

Ever since the early days of research on decision-making, “it has been argued that humans do not always use their cognitive abilities extensively before they make a decision or execute an action” [30, p. 779]. Users tend to rely on heuristics as a result of “time deficit and insufficient skills,” applying shortcuts and ignoring some of the available information to facilitate quicker decisions [22, p. 59].

Such heuristics support people’s primary objectives when looking for information by finding it “quickly, conveniently, and without any substantial engagement with the information or source itself” [22, p. 62].

2.2 App Selection Studies

We first address heuristics from an app selection perspective. Then, we describe related app selection studies.

Fig. 1. Dogruel et al.’s classification of decision-making methods [8].

Dogruel et al. [8] studied heuristic decision-making strategies users incorporate during the app selection process. Figure 1 summarizes their finding of five decision-making heuristics when users search for apps. Four of these heuristics are “variants of ‘TtF’ Take the First heuristics”.

They concluded that the most used heuristic was “Take the First (TtF)”, which was used by 80% of their participants and involved installing the first app that users encountered without looking for or examining other apps [8, p. 139]. Note that they did not investigate the Pondering Alternative heuristic due to its low frequency of occurrence among their participants.

Bowman et al. [2] and Joeckel et al. [16] conducted studies similar to Dogruel et al. [8] but both used think-aloud protocols and had their participants watch recorded footage of their app selection tasks. Bowman et al. found that 42% of their participants chose only to look for games that they had “either previously owned or had some specific familiarity with” [2, p. 4]. Joeckel et al. [16] concluded that users relied on the familiarity of an app, ratings and reviews.

Huang and Bashir [14] created an adoption flow model of anxiety-related apps in the mental health domain and collected eight types of what they called “metadata cues” (“app prices, ratings, reviews, installs, categories, permissions, ranking, and title”) [14, p. 4]. Their method did not include users.

2.3 Data Mining

Due to the increasing amount of data collected, many domains are using data mining techniques to extract valuable information and hidden interesting patterns through the process of knowledge discovery for further use [17].

Frequent itemset mining and association rule mining are approaches for discovering recurring patterns and interesting rules in a dataset of transactions. Frequent itemsets are itemsets that appear in a dataset often enough to satisfy a given minimum support threshold. Association rule mining is the task of uncovering strongly associated itemsets, or patterns, that occur together in large datasets. Rules are mined and ranked using measures such as confidence, which help identify the relationships between these patterns, analyze the data better, and support decision-making in a given domain (e.g., product placement) [12, 25]. The antecedent (or premise) is the left-hand side (LHS) of a rule. The consequent (or conclusion) is the right-hand side (RHS) of a rule.

Interesting Measures. We describe a few of the most used measures to evaluate the generated rules.

Support is the proportion of transactions in a dataset that include both items X and Y, relative to the total number of transactions. It shows how frequently the items or combinations of itemsets occur together (e.g., are bought together) [25].

Confidence is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X. It shows how frequently items X and Y appear together in a transaction, given the number of times that X occurs [25].

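Lift compares the observed frequency of X and Y together with the “frequency of X and Y together if both are statistically independent of each other” [25, p. 21]. It shows the strength of a rule over the random co-occurrence of X and Y: a lift above 1 indicates that X and Y appear together more often than would be expected by chance.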
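The three measures can be illustrated with a minimal sketch. It assumes that each transaction is represented as a set of cue names; the toy transactions and the function names are hypothetical and are not taken from the study data.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], items: Iterable[str]) -> float:
    """Share of transactions that contain every item in `items`."""
    itemset = frozenset(items)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: List[Transaction],
               lhs: Iterable[str], rhs: Iterable[str]) -> float:
    """support(LHS and RHS) / support(LHS) for the rule LHS -> RHS."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return support(transactions, lhs | rhs) / support(transactions, lhs)


def lift(transactions: List[Transaction],
         lhs: Iterable[str], rhs: Iterable[str]) -> float:
    """confidence(LHS -> RHS) / support(RHS); 1.0 means independence."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Hypothetical toy data: five transactions of cues (illustration only).
toy = [
    frozenset({"ratings", "reviews"}),
    frozenset({"ratings", "reviews", "screenshots"}),
    frozenset({"ratings"}),
    frozenset({"reviews"}),
    frozenset({"description"}),
]

print(round(support(toy, {"ratings", "reviews"}), 3))       # 0.4
print(round(confidence(toy, {"ratings"}, {"reviews"}), 3))  # 0.667
print(round(lift(toy, {"ratings"}, {"reviews"}), 3))        # 1.111
```

In this toy example the lift is slightly above 1, so ratings and reviews co-occur a little more often than independence would predict.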

3 Research Objective

Prior research has identified multiple information cues when searching for apps [8, 9, 14, 15, 19]. The research conducted by Dogruel et al. [8], and Huang and Bashir [14] is the most relevant to our work, despite having different objectives and slightly different methods.

Dogruel et al. [8] did not address the use of multiple heuristics among their participants, and they did not investigate the Pondering Alternative heuristic. Dogruel et al. [8] assumed that the cues that were looked at first by users were important. Moreover, search terms were stored on the phone to “keep specific search terms consistent” among their participants [8, p. 131]. We note that users may not necessarily know the exact name of the app that they were looking to find in actual practice.

Huang and Bashir’s [14] work was limited to apps in the anxiety category.

We adopted the heuristic-decision strategies of Dogruel et al. [8] and adapted Huang and Bashir’s [14] model as the base to develop our model of how users choose which apps to install.

Following earlier work, we present the following research questions:

  • What type of information influences users when searching for apps?

  • Are the information cues that were looked at first important?

  • Do users follow the same heuristics while selecting an app?

  • What is the role of the “Pondering Alternative” heuristic?

4 Method

4.1 Participants

According to Creswell and Clark [5], the sample size recommended for using the thematic analysis approach to provide proper codes and themes is 20–30 participants.

The study data were collected from a convenience sample of twenty-six participants (25 males and 1 female) on the campus of a Canadian university. Twelve participants were 18–22 years of age, twelve were 23–30, one was 31–40, and one was older than 40. Of the sample, 34% were undergraduate students and 65% were graduate students; the latter group comprised PhD students (11% of the sample), Master’s students (50%), and one participant who held two Master’s degrees but was studying for a Bachelor’s degree. Participants were recruited through the university e-mail lists. To be eligible, participants had to be at least 18 years old and own an Android smartphone. Participants received $10 compensation for their participation.

4.2 Procedure

The study procedure was modified from that of Felt et al. [9] and Kelley et al. [18, 19]. It was an observational lab study with semi-structured interviews, consisting of five parts as presented in the Supplementary Details (Table A1). After the participants signed the consent form, they were asked to think aloud while installing two apps. They also completed a short questionnaire to collect demographic data, and were interviewed to elaborate on why they either considered or ignored some factors/information cues during the search task. The study took about 30 min to complete.

It is important to note that five participants spent more time on the given app search task, and due to the time restriction of our study, they were only able to engage in one app search task. Also, paid apps were not considered in this study.

4.3 Measures Used for Data Preparation and Analysis

The following steps were taken to prepare the data for analysis [6]: (1) The participants’ interactions with the smartphone were video-recorded during the lab study using built-in software. We found that video-recording without eye-tracking provided sufficient data [8] as all interactions and touches were recorded. The phone has a 5.5-inch (\(1080 \times 1920\) pixel) screen. (2) All interviews were transcribed. (3) Video recordings were coded using the rules guide adapted from the work of Dogruel et al. [8] with modifications as shown in the Supplementary Details (Table A2). (4) Thematic analysis was used by identifying main themes based on participants’ observations and responses. Codes representing each theme were generated, and some of these codes were later merged into higher-level codes as shown in the Supplementary Details (Table A2). (5) The letter N represents the overall frequency of occurrences. (6) We created two visual matrices as shown in Fig. 4. (7) The first visual matrix, in Fig. 4a, depicts the total combination of cues and factors employed by participants, and it shows the participants’ ID numbers and the task order (1 for first, and 2 for second) with the number of attempts made for each task. It is organized by the number of cues and factors based on frequency of occurrences, from the most used to the least (nine to zero). Zeroes indicate that participants moved between apps during the app selection tasks without considering any cues/factors. Note that because some participants viewed multiple apps during their first and second tasks, there are many occurrences of 1 and 2 in the chart. For example, ID10 viewed five apps (that is, that participant made five attempts) during the first task and viewed three apps during the second task. The second visual matrix, shown in Fig. 4b, is organized by participants’ ID numbers for easier interpretation. (8) Each row/record of the factors/cues considered by participants is treated as a transaction coded as 1 (for true) or 0 (for false). An itemset is a subset of items within a transaction, e.g. {ratings, reviews} from {ratings, reviews, number of downloads, full description}. (9) We used tools such as Rapidminer and Weka to create models of the Apriori and FP-Growth algorithms to find the frequent itemsets (patterns) and to generate the association rules. (10) We used different techniques to interpret some of the derived association rules, such as a table-based technique and graph-based visualization.
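Steps (8) and (9) can be sketched as follows. This is an illustration under stated assumptions, not the Weka or Rapidminer models used in the study: the cue names, the four binary rows, and the helper names are hypothetical, and the itemsets are found by brute-force enumeration rather than by Apriori or FP-Growth (on data this small, all three yield the same frequent itemsets).

```python
from collections import Counter
from itertools import combinations

# Hypothetical column order for the binary coding (illustration only).
CUES = ["ratings", "reviews", "downloads", "description"]

# Hypothetical rows: 1 = cue considered, 0 = cue not considered.
rows = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
]


def to_transactions(rows, cues):
    """Turn each 0/1 row into the set of cues that were coded as 1."""
    return [frozenset(c for c, flag in zip(cues, row) if flag == 1)
            for row in rows]


def frequent_itemsets(transactions, min_support):
    """Enumerate every itemset and keep those meeting min_support.

    Brute force: exponential in transaction size, fine for a sketch.
    """
    counts = Counter()
    for t in transactions:
        for k in range(1, len(t) + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


transactions = to_transactions(rows, CUES)
result = frequent_itemsets(transactions, min_support=0.5)
for itemset, sup in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a minimum support of 0.5, the sketch keeps five itemsets, including {ratings, reviews}, and drops the number of downloads, which appears in only one of the four hypothetical rows.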

5 Results

First, we describe the model we created of how users choose which apps to download. Second, we identify decision-making patterns, along with the use of multiple heuristics and the Return to the First heuristic. Third, we outline the installation cues/factors considered by participants. Fourth, we describe our models for patterns discovery. Lastly, we detail some of the derived association rules and show the importance of the app reviews cue in particular and some other cues.

Fig. 2. A model of how users choose which apps to install. The dotted-red cue and its corresponding behaviour (namely permissions) are not available after Android 6. (Color figure online)

5.1 A Model of How Users Choose Apps

Based on our observations during the app selection process we created a model of how users choose apps to install. Figure 2 is an extended version of the model created by Huang and Bashir [14].

From 41 instances, we identify the cues used by participants as a starting point in the search process. Figure 2 shows three stages that participants go through before installing an app. It also shows the cues that are available in the app selection platform. Moreover, as can be seen on the bottom left, there is a list of factors that are considered by users that are not part of the app selection platform.

In the first stage, participants initiate the search process by typing a keyword for the desired app. Before deciding whether to view an app for more information, they observe or consider the following cues available on the search results screen of the app selection platform: blue badge (a mark given by Google to the “top developers” in a particular field), app ratings, app logo, app name/brand, app popularity, whether the app is free, and whether the app has no ads or fewer ads.

If the participants were interested in viewing the app they proceeded to the second stage, where they examined and evaluated the cues of app description, categories, number of downloads, screenshots, and reviews.

There were two possibilities in the third stage. If participants were not interested in their initial options, they could (1) browse the app search results again or (2) initiate another search. If they decided to go ahead with the app installation, they were presented with either (1) no permissions or (2) a list of permissions.

Note that the following factors in our model: having a problem, quick install, knowing the app, and install/try both are not directly related to the observed cues available on the app selection platform. Instead, they are related to the participants’ app search behaviour criteria or actions, where some participants indicated that they would have an issue and that the app would solve it. Some of our participants explained that they would quickly install the app rather than look for cues, as they wanted to experience the use of the app first-hand before making a decision to keep it. Similarly, some expressed that they would install more than one app to get a better sense when comparing two or more apps.

5.2 Identifying Decision-Making Patterns

Overall, the “Pondering Alternatives” heuristic was the most frequent, accounting for 70% (\(N=59\)) of instances. The “Confirm Take the First” heuristic was used in only 17% (\(N=14\)) of instances, while the “Inform and Confirm and Take the First” heuristic came third at 8% (\(N=7\)). The least-used heuristics were “Inform and Take the First” and “Pure Take the First” with 1% (\(N=1\)) and 4% (\(N=3\)), respectively.

5.3 The Use of Multiple Heuristics and the Return to the First Heuristic

In terms of the use of multiple heuristics, we found that 65% (\(N=17\)) of participants adhered to only one type of heuristic, whereas \(23\%\) \((N=6)\) used two different types of heuristics (simple and complex), and 11% (\(N=3\)) employed two variants of the “Take the First” heuristic (two instances of cTtF and icTtF, one instance of iTtF and icTtF).

We found that participants viewed only one app in 54% (\(N=25\)) of the search tasks, two apps in 24% (\(N=11\)), three apps in 11% (\(N=5\)), four apps in 7% (\(N=3\)), and five apps in 4% (\(N=2\)).

Investigating the “Pondering Alternatives” heuristic led us to some intriguing results in terms of the decision-making strategies used by some of the participants. We found that in 8% (\(N=7\)) of instances, participants evaluated other apps and then returned to the first app they had viewed. Based on the heuristic literature [27, 29], we call this the “Return to the First” heuristic. In our study, ID10 and ID25 returned to their first chosen app in both tasks, while ID16, ID22 and ID24 returned to their first chosen app in one of the tasks.

Fig. 3. Distribution of all cues & factors and first ones considered during search.

5.4 Cues and Factors Influencing App Selection

We report the overall distribution of cues, factors and the first ones considered.

Figure 3a suggests what influenced our participants to select some apps rather than others. Figure 3b shows the first cues/factors chosen among participants.

Overall, 57% (\(N=48\)) considered reviews, 43% (\(N=36\)) looked at ratings, 33% (\(N=28\)) read the app’s full description, 32% (\(N=27\)) noted the app screenshots, and 23% (\(N=19\)) looked for the number of downloads. For permissions, 67% of participants skipped them, 7% read them, and 26% checked if the permissions requested seemed reasonable to them.

Most of the cues and factors shown in Fig. 3 are self-explanatory. However, the “nothing” item in Fig. 3b represents the 11% (\(N=9\)) of cases in which participants moved quickly between apps during the selection tasks, apparently without considering anything. Lastly, Fig. 3b shows that in 18% (\(N=15\)) of cases, app ratings were the first cue considered when selecting an app (discussed in Sect. 6.1).

5.5 Models for Patterns Discovery

We report first our initial finding of the combination of cues and factors. Then, we describe our models for finding frequent itemsets “patterns” and the derived association rules.

Combinations of Cues and Factors. Our findings show that participants had considered different combinations of cues and factors during the app search process as shown in Fig. 4.

Correlations of the combinations that are significant, according to the Spearman’s Rho test, are presented in the Supplementary Details (Table A3). We elaborate more on this behaviour and provide some examples. ID10 viewed five apps in the first task. His search path was: (1) He chose the Smart Receipts app and considered four cues: free, ratings, number of downloads, and reviews. (2) Then, he viewed the Cash Receipts app and considered app reviews and read the app’s full description. (3) He went back and chose the Receipts app as his third choice and considered only the app’s full description cue. (4) Then, he decided to view another app called Receipts Bank. He considered app reviews and reading the full app description in his fourth choice, similar to what he did when viewing the second app. (5) Finally, for the fifth app, he decided to return to his first choice and viewed the Smart Receipts app and considered three different cues and factors from his first time viewing the app. He considered the full app description, install/try both, and reasonable permissions.

Participant ID25 took a slightly different approach where he viewed three apps in total for the second task. He considered five cues at first, then viewed another app, and he decided to return to his first choice without considering any additional cues or factors.

In summary, when participants searched for apps in the selection platform, they considered different combinations of cues and factors. Most participants did not rely solely on individual cues or factors (in 60 of 84 instances). We found that some participants might consider additional cues and factors or disregard them while searching for apps.

Fig. 4. Combinations of cues & factors employed by each participant while completing their tasks, organized by number of cues & factors and by the participants’ ID numbers.

Fig. 5. Weka and Rapidminer models for the Apriori and FP-Growth algorithms.

Frequent Itemsets or Patterns. After our initial analysis of discovering the combinations of cues and factors as shown in Fig. 4, we used the data mining approaches shown in Fig. 5 and created two models using Weka [10] and Rapidminer [13].

The Weka model used both the Apriori and FP-Growth algorithms, while the Rapidminer model used only the FP-Growth algorithm.

Table 1. The frequency of the frequent itemsets of one item.
Table 2. The frequency of some frequent itemsets of two, three and four itemsets.

We present the most frequent itemsets using the Apriori algorithm in Weka. We chose minimum support of 3% to show the items that occurred three times, the lowest frequency in our dataset as seen in Table 1, which is similar to Fig. 4.
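The link between a 3% minimum support and a minimum frequency of three can be checked with a short calculation. The sketch assumes that the dataset holds the 84 instances reported elsewhere in the results; the function name is hypothetical.

```python
import math


def min_count(n_transactions: int, min_support: float) -> int:
    """Smallest number of occurrences that satisfies the support threshold.

    The product is rounded first to guard against floating-point noise
    (e.g. 100 * 0.07 evaluates to 7.000000000000001).
    """
    return math.ceil(round(n_transactions * min_support, 9))


N_TRANSACTIONS = 84  # assumption: one transaction per reported instance

print(min_count(N_TRANSACTIONS, 0.03))  # 3
print(round(3 / N_TRANSACTIONS, 4))     # 0.0357, above the 3% threshold
print(round(2 / N_TRANSACTIONS, 4))     # 0.0238, below the 3% threshold
```

Under this assumption, an itemset that occurs three times has a support of about 3.6% and is retained, while one that occurs twice falls below the threshold.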

Overall, we found 60, 73, 35, and 4 cases of two, three, four and five frequent itemsets of cues and factors, respectively.

Table 2 displays a few examples which show the relative importance of the different combinations of cues that occurred together.

5.6 The Derived Association Rules

We briefly describe the differences between the Weka and Rapidminer tools in terms of finding the rules. We report some interesting rules with preliminary interpretations.

Table 3. App ratings and reviews association rules with minimum support of 33%.
Fig. 6. A graph showing the links between some of the association rules of some cues.

Weka [10] shows the overall frequency of each cue and factor, but does not display the percentage of the support measure as shown in Table 3. For the first rule, 36 means the overall frequency of the rating cue, and 28 means the overall frequency of the reviews cue that appeared with the rating cue. Interestingly, the first rule has higher confidence of 78% than the second rule, which means that for rule one, the consequent, app reviews, is more likely to be present in a transaction containing the rating cue as an antecedent.
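The measures for the first rule (ratings → reviews) can be recomputed from the reported counts. The sketch below assumes that the rule was mined from the 84 instances and that the overall frequency of the reviews cue is the 48 reported in Sect. 5.4; the lift value is derived here for illustration and is not a figure reported in Table 3.

```python
# Counts reported for the first rule.
N_RATINGS = 36              # overall frequency of the rating cue
N_RATINGS_AND_REVIEWS = 28  # reviews appearing together with ratings

# Assumptions taken from other parts of the results (see lead-in).
N_TOTAL = 84                # total number of instances (transactions)
N_REVIEWS = 48              # overall frequency of the reviews cue

support_rule = N_RATINGS_AND_REVIEWS / N_TOTAL
confidence_rule = N_RATINGS_AND_REVIEWS / N_RATINGS
lift_rule = confidence_rule / (N_REVIEWS / N_TOTAL)

print(f"support    = {support_rule:.2f}")     # 0.33
print(f"confidence = {confidence_rule:.2f}")  # 0.78
print(f"lift       = {lift_rule:.2f}")        # 1.36
```

Under these assumptions, the computed support and confidence agree with the 33% minimum support and the 78% confidence given for the rule.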

Table 4. The derived rules using the FP-Growth algorithm. The app reviews cue was highlighted using bold and italic text on the right side of the rule (Consequent). Bold: for the app reviews cue only; Italic for the app reviews appeared with other cues.

In Rapidminer [13], Table 4 does not display the frequency of the occurrence of each cue. However, it does show the actual percentage of the support measure.

It was interesting to see that the reviews cue appeared as a consequent in 20 rules (highlighted in bold) out of 32, as shown in Table 4. Moreover, in 3 instances (highlighted in italic), app reviews appeared with other cues as a consequent (RHS), which brings the total to 23 instances of the app reviews cue out of 32. This shows how important app reviews are to users.

Figure 6 shows the links between some of the association rules with some cues. The app review (F5-Rev) is the most important cue in the graph because even if the users considered other cues, most of the rules show that app review is the core of those rules. Also, it shows that almost all of the rules are headed towards the app reviews cue. It was mostly considered as the consequent (RHS) in the rules except in three cases when reviews were used as the first cue for three participants. Two of them viewed the app review cue in their second or third attempt when searching for apps and while moving between apps to evaluate them before their final decision to install the app.

Fig. 7. A heat map plot showing the importance of the association rules of some cues based on the Lift measure.

What stands out in Fig. 6 is that some cues are used by users as a starting point (antecedents) when searching for apps such as app rating (F2), app logo (F12), app name/brand (F17), avoid ads (F13), app popularity (F4) and free (F8), which validates our model. Those who consider the app logo, for example, as shown in Fig. 7 are 3.8 times more likely to view app screenshots, reviews and ratings. Moreover, the heat map plot of Lift measures (Fig. 7) shows the relative importance of cues.

Another interesting aspect was the number of downloads (F3), the full app description (F6), and the app screenshot (F11) cues. Figure 6 indicates that users tend to consider them and move on to other cues.

Table 5. A few association rules of some participants that show their behaviour.

Lastly, Table 5 shows the behaviour of a few participants that stood out when generating the association rules. According to the Lift measure, ID20 is 4.4 times more likely to consider the app reviews and screenshots.

6 Discussion

In this section, we discuss and elaborate on some of the results.

6.1 The Assumptions of the Importance of First Information Cues

One of the assumptions made in the work of Dogruel et al. [8] was that the cues that were considered first were important to users.

However, based on the model we created (Fig. 2) and results (Table 4, Fig. 6, Fig. 7), the importance of cues was determined later in the search process, i.e., after viewing apps and looking at different combinations of cues at the same time. The use of different combinations was in line with what was reported by Todorov et al., who stated that “people consider all relevant pieces of information, elaborate on these pieces of information, and form a judgment base on these elaborations.” [28, p. 196].

We reported earlier that the two most influential cues were app reviews and ratings. We found that the app rating cue came first (18%) when we looked at the first cue chosen by participants. However, the reviews cue came in eighth place (4%).

This suggests that the first cue considered is not necessarily an important one, which contradicts the work of Dogruel et al. [8]. Such cues may play an important role because they were considered together with, or associated with, other cues/factors. Still, they were not used as the primary reason for participants to make the final decision to install apps. Instead, they were considered to be a “starting point” or “just a place to start” the search process [9, p. 6].

7 Limitations

We provide an overview of some of the limitations of our work. We must first address one of the flaws of the conducted study, which is gender imbalance. In our study sample, we had only one female participant.

As a result, our results cannot be considered representative of how users, in general, install apps from the Google Play Store; they apply only to a very limited sample of mainly male participants.

On the other hand, the study can still indicate what kind of behaviour users may exhibit when searching for apps.

It is also important to note that a few participants could only download one app, and that we did not examine the role of paid apps.

8 Future Work

Our initial research effort was an attempt to understand and observe how users choose which apps to install.

Therefore, future work should look at integrating multiple settings, such as moving from a laboratory setting to “real-world” settings or letting participants use their phones, which could make it more realistic rather than an artificial task [2, p. 7]. Researchers should also consider the individual differences and the domain knowledge (i.e., level of expertise). Everyone has their own internal ranking of what matters that might be related to the knowledge acquired about the alternatives when making the decision [20, 24] since some “apps might be more personally or situationally relevant to users” [16, p. 13].

Furthermore, since our study sample is small, future work should address this issue by incorporating a larger and more representative sample.

Lastly, we aim to refine our model by incorporating the changes made to the Google Play Store. For example, the blue badges popularity indicator was replaced with the “editors’ choice” badge.

Perhaps most importantly, the store now supports filtering results by rating and badge status. We stress that although our study used the Google Play Store, we are trying to create a general model for every app selection platform.

9 Conclusion

We conducted an observational lab study with semi-structured interviews to understand better how users search for apps in the Google Play Store. Our work expands and contributes to the app search domain, validated by our models and the analysis of the different combinations of cues and factors.

Using data mining approaches, we found that there are influential cues that are used as a starting point in the process of searching for apps, that certain cues are used by users temporarily before moving on to other cues, and that the most influential cue is app reviews. We found that while participants were searching for apps in the app selection platform, they sometimes changed the cues they were attending to (considering additional cues or disregarding cues they had used previously).

Surprisingly, we also found that in 8% of the instances (\(N=7\)), after evaluating other apps, participants ultimately returned to the app they had first viewed, a process which we named the “Return to the First” heuristic.

In conclusion, our work showcases the importance of certain information cues that may help app developers to understand their users.