1 Introduction

Mobile applications are an effective approach to motivate individuals for healthy behavior, and recent years have seen an increase in the use of mobile and ubiquitous technology for changing human behavior or attitudes in the health domain. Behavior change support systems (BCSS) utilize persuasive technologies to assist users in pursuing their goals (Oinas-Kukkonen 2012), and they are widely used in health as well as welfare, commerce, education, energy saving, and other areas (Oinas-Kukkonen and Harjumaa 2008). With mobile and ubiquitous technology, behavior change techniques can be used to influence individuals based on their context, personal needs, and progress, and they have been shown to be effective particularly in mobile applications (Unal et al. 2014).

Physical activity applications have an important place among mobile health applications in terms of their wide popularity, commonality, and the need to use behavior change techniques to initiate and promote physical activity. Researchers have sought effective ways of encouraging physical activity and have shown that interventions designed to increase physical activity may improve the success rate from 50% without intervention to about 70–88% with certain interventions (Dishman and Buckworth 1996). In this study, we aimed to investigate whether significant relationships exist between user adoption and evaluation of physical activity applications and features pertaining to behavior change interventions. To this end, we conducted hands-on research by obtaining mobile physical activity applications from the Turkish and US versions of Google Play Store. From these two versions of the store, we selected 78 top health and fitness applications in the following subcategories: fitness, workout, pedometer, and running. Each application was downloaded to a mobile phone to extract and classify all relevant features. To discover the significance and contribution of the features from the users’ adoption and evaluation perspective, we analyzed the relationship between the features and an application’s current standing in the store in terms of its number of downloads and rating.

Our analysis of the relationship among behavior change features, user ratings, and download numbers revealed a significant relationship between these features and download counts but no significant relationship between the features and ratings. Furthermore, the subcategories of physical activity applications (fitness, running, pedometer, and workout) produced different results concerning their relation to behavior change support features.

2 Related Work

2.1 Behavior Change Features in Mobile Context

In recent years, the behavior change techniques used in mobile applications for physical activity have been analyzed by several authors using content analysis. Applications were rated based on the taxonomy of Abraham and Michie (2008) concerning the behavior change techniques used in interventions. The original study was designed for general interventions; therefore, the studies based on mobile applications had to interpret features of the applications and undertake some tailoring to fit the application context (Middelweerd et al. 2014). In the study by Middelweerd et al. (2014), the most frequently used behavior change techniques in mobile applications were found to be goal setting, self-monitoring, and feedback on performance, which is consistent with other types of intervention tools. The presence or absence of behavior change techniques was identified for physical activity and/or dietary behavior applications in research undertaken by Direito et al. (2014). The authors found that the most commonly used behavior change techniques were provide instruction (83% of the apps), set graded tasks (70%), and prompt self-monitoring (60%). The limitation of these two studies lies in how they quantified the existence of behavior change techniques in the applications. The presence of a single self-monitoring feature, such as self-reports, a diary, or route tracking, was deemed sufficient to label the application as exhibiting the related BCSS approach.

In the literature, there are a limited number of studies that investigated the effects of the features utilized in mobile physical activity systems. Munson and Consolvo (2012) found that the use of goals and reminders is more promising in positively affecting the user’s activity than rewards and sharing. Reminders were the most appreciated feature among all the participants, none of whom disabled this feature; however, expected rewards did not appear to motivate the users (Munson and Consolvo 2012). In their qualitative study, the users reported benefits from the use of both secondary and primary goals but considered that there were limited benefits in sharing their progress. In another qualitative study by Harjumaa et al. (2009), the most motivating of ten features were self-monitoring, reduction, and reminders. Praise and rewards were found to be effective only in some specific cases. In the qualitative study conducted by Dennison et al. (2013), recording and tracking behavior and goals, as well as getting advice and information, were valued by users, whereas context-sensing capabilities and social media interactions were found to be unnecessary and disturbing.

In the current study, we made use of some of the features mentioned in the literature (Middelweerd et al. 2014; Direito et al. 2014; Munson and Consolvo 2012). The main features, namely, self-monitoring, goal setting, rewards, and sharing, which were previously explored by Munson and Consolvo (2012), were further investigated through hands-on research using mobile physical activity applications obtained from the application store and examining related features.

2.2 Users’ Adoption and Evaluation of Mobile Apps in the Market

The number of downloads and user ratings provide an insight into users’ point of view and their adoption and evaluation of mobile applications in the mobile application market. The number of downloads gives commercially valuable information about an application; however, application stores only provide this information to application developers and avoid making the data public. Google Play Store is the only application market that gives information about the download statistics of all applications; however, instead of providing the exact value, they give download counts in buckets. One of the main reasons for using Google Play Store to select applications is the availability of the download data. Another characteristic of Google Play Store is that the majority of the applications are free. This allows us to obtain a set of uniform applications that compete on the same basis.

All application stores make user rating data available to the public, which provides valuable information concerning the consumer’s perception of applications. Users can rate applications they have downloaded from 1 to 5 stars, with 5 being the highest possible rating. The average of these ratings is displayed in application markets for each application. However, there are serious drawbacks in using these ratings. Most importantly, the average rating is the average of multiple releases over time, which does not provide valid information for the user, who is interested in the latest release (Fu et al. 2013). Then, there are inconsistencies between user comments and ratings, which mostly result from careless mistakes or developers or their competitors’ attempt to manipulate ratings (Shi and Ali 2012). Finally, ratings are usually polarized, with the vast majority of ratings being either 1 or 5. This is the case for Google Play Store, in which most applications are free, and users tend to give an application 1 star when it does not work and 5 when it fulfills their expectations (Shi and Ali 2012).

In the literature, a significant positive correlation has been reported between the number of downloads and user ratings for Android applications (Sunyaev et al. 2013; Dehling et al. 2015). Similar results were found in the Blackberry market, in which authors observed a strong correlation between ratings and downloads, and highly rated applications had the most frequent downloads (Finkelstein et al. 2014). A 2014 study on Google Play Store concluded that although there was an expectation that applications with higher ratings would have higher download rates, this was not the case: All the paid applications had an average overall rating of 4, with free applications having an average overall rating that was greater than 4. On the other hand, combining both free and paid applications, the average rating was between 4 and 4.5 in any bucket of the download range (Viennot et al. 2014).

3 Methodology

3.1 Data Collection Process

The application data was collected from the US and Turkish versions of Google Play Store. Free applications were targeted in the study because more users prefer to download free applications rather than paid ones (Mohan et al. 2013). The health and fitness category was selected in both versions of the store. In the Turkish version of the store, there was a list named Top Free Apps in the health and fitness category. The names, number of downloads, and rating values of the first 200 applications from the Top Free Apps list were recorded as of December 7, 2014. Google Play represents the number of downloads information as a range (e.g., 10–20 million). The minimum number of downloads within this range was recorded (e.g., 10 million). As a result, a total of 25 free applications were obtained from the US version of the store.
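The recording rule described above, keeping only the minimum of each download range, can be illustrated with a minimal sketch. The label formats handled here (for example "10–20 million" or "1,000,000 - 5,000,000") and the function name bucket_lower_bound are assumptions made for illustration; the exact strings shown by the store at the time of data collection are not reported in this study.

```python
import re

# Unit words that may follow the numbers in a range label (assumed formats).
_MULTIPLIERS = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}


def bucket_lower_bound(label: str) -> int:
    """Return the minimum download count implied by a range label."""
    text = label.lower().replace(",", "")
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    if not numbers:
        raise ValueError(f"no number found in {label!r}")
    # Use the first unit word found in the label, or 1 if none is present.
    unit = next((m for word, m in _MULTIPLIERS.items() if word in text), 1)
    # The first number in the label is the lower end of the range.
    return int(float(numbers[0]) * unit)
```

Recording the lower bound is a conservative choice: every application in a bucket is treated as having the smallest download count that the bucket allows.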

After obtaining the initial application lists from both stores, applications were included in the study based on the following criteria: (i) the language of the application should be English and (ii) the application should support a behavior change, i.e., it should direct or guide users to undertake physical activity. Applications that only provide information and guidelines about health or fitness but do not have behavior change features that encourage the user to undertake physical activity were excluded from further analysis. Five of the applications obtained from the US version of the store could not be downloaded due to differences in regional releases; therefore, they also had to be excluded. Additionally, nine applications from this store were already included in the applications from the Turkish store.

From the US and Turkish versions of Google Play Store, we selected 78 top health and fitness applications in the following subcategories: fitness, workout, pedometer, and running. The main reason for creating subcategories was to distinguish specific features of different types of applications. Of the 78 selected applications, 11 (14.1%) were from the US store, and 67 (85.9%) were from the Turkish store. First, all the applications were screened by one reviewer. Then, the applications were shared equally among three other reviewers each assessing the applications in terms of the presence of features related to behavior change. Thus, each application was reviewed by two different reviewers. All reviewers explored each application by downloading and using it with all the available functions. All the applications were installed on a smartphone with the Android operating system version 4.3. Each reviewer stated his/her opinion about whether the applications contained features related to behavior change techniques based on the taxonomy prepared and explanations of the features. For this purpose, Abraham and Michie’s (2008) taxonomy of behavior change techniques used in interventions was adapted to the conditions of today’s mobile technology, resulting in 34 items as presented in Table 7.1. In case of a conflict between two reviewers, a third reviewer screened the application that caused the conflict, and the features were extracted based on the opinion of the majority (2 of 3).
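The conflict resolution rule described above (two reviewers per application, with a third reviewer consulted only on disagreement) can be sketched as follows. The function and argument names are hypothetical, and the third reviewer is modeled as a callable that is only invoked for conflicting features; this is an illustration of the rule, not the procedure as it was actually recorded.

```python
from typing import Callable, Dict


def resolve_features(
    first: Dict[str, bool],
    second: Dict[str, bool],
    third_opinion: Callable[[str], bool],
) -> Dict[str, bool]:
    """Combine two reviewers' feature codings into one coding per feature."""
    resolved = {}
    for feature, a in first.items():
        b = second[feature]
        if a == b:
            # Both reviewers agree, so no third review is needed.
            resolved[feature] = a
        else:
            # Conflict: the third reviewer screens the feature and the
            # majority opinion (2 of 3) is kept.
            c = third_opinion(feature)
            resolved[feature] = (a + b + c) >= 2
    return resolved
```

Because the first two reviewers disagree whenever the third is consulted, the majority always coincides with the third reviewer's opinion; the explicit vote count is kept only to mirror the rule as stated.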

Table 7.1 Availability of behavior change features in the selected applications

3.2 Data Analysis

In the analyses, dependent variables were selected as the number of downloads and rating of the selected applications. Independent variables were the features extracted related to behavior change. Table 7.2 shows the descriptive statistics of the dependent variables.

Table 7.2 Descriptive statistics for the number of downloads and rating

The Mann-Whitney U test is used to test for differences between two independent groups. In this study, this test examines whether there is a significant difference in the number of downloads or rating values in terms of the presence of the features. In order to identify the important features on the dependent variables, we performed feature selection using the minimum-redundancy maximum-relevance (mRMR) method (Peng et al. 2005). It was shown that, concerning feature selection and classification accuracy, mRMR achieved the lowest error rate compared with other algorithms such as Naïve Bayes, support vector machines, and linear discriminant analysis (Peng et al. 2005). The main aim of the mRMR algorithm is to find the features that best describe the target variable (i.e., the number of downloads and rating in the current study). The presence of these features in each application constituted our feature vector (FV1 = [a1, a2, a3,…, a13]). In other words, FV1 is a binary vector, which indicates whether the important features are present in the applications.
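To make the idea behind mRMR concrete, the following is a minimal sketch of a greedy ranking that uses the difference form of the criterion (relevance to the target minus mean redundancy with the features already selected), with mutual information estimated from counts of discrete values. It is not the implementation used in this study; the function names and the choice of the difference form are assumptions for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


def mrmr_rank(features, target, k=None):
    """Greedily order feature names by relevance minus mean redundancy."""
    remaining = list(features)
    selected = []
    relevance = {
        name: mutual_information(features[name], target) for name in remaining
    }

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and (k is None or len(selected) < k):
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Here features maps a feature name to its binary presence vector across applications, and target holds a discrete value per application, such as the recorded download bucket. The first feature returned is the one with the highest relevance to the target; later features are penalized for overlapping with those already chosen.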

According to Cohen (1992), in order to conduct a statistical test measuring the difference between independent means, a sufficient number of data points are required. Cohen suggested 26 as the minimum number of data points to be included in a group in order to observe large differences between two groups. However, several features in our dataset had a lower number of data points than the given threshold. In order to not lose much data, we included the features having at least 20 data points.
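The inclusion rule above can be expressed as a simple filter. This sketch reads "data points" as the number of applications in which a feature is present, which is an interpretation rather than something stated explicitly in the text; the function name and the example feature names are hypothetical.

```python
def features_with_enough_data(presence, min_count=20):
    """Keep the features that are present in at least `min_count` applications.

    `presence` maps a feature name to a list of 0/1 flags, one per application.
    """
    return [name for name, flags in presence.items() if sum(flags) >= min_count]
```

Lowering the threshold from 26 to 20 retains more features at the cost of reduced power to detect differences between the two groups.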

4 Results and Discussion

4.1 Findings on All Application Types

In each analysis for the feature we are studying, we have divided the applications into two groups: one group of applications with that feature and other group of applications without the feature. For example, we have investigated the differences between the number of downloads/ratings of applications having voice coach feature and without this feature. The difference between the presence of the features (FV1) on the number of downloads and the rating values was investigated using the Mann-Whitney U test since the number of downloads and the rating data were not normally distributed (D(78) = 0.43, p < 0.05; D(46) = 0.17, p < 0.05). Based on the results, the following features were found to be highly relevant with the number of downloads, given in order of the results of the mRMR test: voice coach (U = 488.5, Z = −2.44, p = 0.02), visualize activity statistics (U = 304.5, Z = −3.55, p < 0.01), self-reports (U = 430.5, Z = −2.67, p = 0.01), reminders (U = 301, Z = −4.62, p < 0.01), share activity summary via other apps on device (U = 504, Z = −2.27, p = 0.02), provide a social platform (U = 238.5, Z = −4.65, p < 0.01), and share with community friends (U = 282.5, Z = −3.655, p < 0.01). However, none of the selected features were found to be significant with rating (p > 0.05). This may imply that if an application provides these features, its number of downloads is expected to be significantly higher than that of the applications without these features. Furthermore, the order of features with a significant effect has important implications for system designers.
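For readers who wish to reproduce this kind of comparison, the following is a minimal sketch of the Mann-Whitney U test using the normal approximation with a tie correction and no continuity correction. It is an illustration under these stated assumptions, not the statistical software used in this study, so the values it produces may differ slightly from those reported by statistical packages. The function names are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def mann_whitney_u(with_feature, without_feature):
    """Return (U, Z, two-sided p) for two independent groups."""
    n1, n2 = len(with_feature), len(without_feature)
    combined = list(with_feature) + list(without_feature)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)  # report the smaller U, so Z is never positive
    n = n1 + n2
    # Tie correction for the variance of U.
    ties = sum(t**3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all values are identical; the test is undefined")
    z = (u - n1 * n2 / 2) / sqrt(variance)
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value from the normal distribution
    return u, z, p
```

In this setting, with_feature would hold the download counts (or ratings) of the applications that offer a given feature and without_feature those of the remaining applications. Tie handling matters here because download counts are recorded as bucket minimums, so many applications share the same value.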

4.2 Findings on Subcategories: Workout, Pedometer, Fitness, and Running Applications

Forty-six of 78 mobile health applications (58.97%) were included in the category of workout. In order to determine the features that had an effect on the number of downloads and the rating values of workout applications, the mRMR algorithm was used. The order of features plotted in Fig. 7.1 shows that the first feature, message exchanges, is highly related with the number of downloads. The same analysis was performed for the rating values of workout applications. The relevant features are plotted in Fig. 7.2.

Fig. 7.1 Differences between the average numbers of downloads of workout applications with and without the given features

Fig. 7.2 Differences between the average ratings of workout applications with and without the given features

Fourteen of 78 applications (17.95%) were included in the category of pedometer. Since the number of pedometer applications was less than 26, statistical tests could not be performed for these applications. Therefore, the features highly relevant to the number of downloads and the rating of pedometer applications were identified using the mRMR algorithm, which, as reported by Ding and Peng (2005), does not require a minimum number of data points, unlike the statistical tests.

Figure 7.3 shows the differences between the average number of downloads of applications with and without the features given on the y-axis. The mRMR algorithm orders the features according to their effect on the number of downloads. The most relevant feature was found to be visualize activity statistics. Figure 7.4 presents a similar chart for the rating values and shows that, as with the number of downloads, the most relevant feature for the rating values of pedometer applications is visualize activity statistics.
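The quantity plotted in these charts, the difference between the group averages for a single feature, can be computed as in the following minimal sketch. The function name is hypothetical, and the sign convention (applications with the feature minus applications without it) is an assumption.

```python
from statistics import mean


def mean_difference(values, has_feature):
    """Average of `values` for apps with the feature minus apps without it."""
    with_f = [v for v, flag in zip(values, has_feature) if flag]
    without_f = [v for v, flag in zip(values, has_feature) if not flag]
    if not with_f or not without_f:
        raise ValueError("both groups need at least one application")
    return mean(with_f) - mean(without_f)
```

A positive result means that applications offering the feature have a higher average number of downloads (or rating) than those without it; a negative result means the opposite.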

Fig. 7.3 Differences between the average numbers of downloads of pedometer applications with and without the given features

Fig. 7.4 Differences between the average ratings of pedometer applications with and without the given features

Nine of 78 applications (11.54%) were included in the category of fitness. As with the pedometer applications, the mRMR algorithm was employed to identify the important features for fitness applications since the number of fitness applications was relatively low (only 9). The differences between the applications with and without the selected features in terms of the average number of downloads and average rating values are given in Figs. 7.5 and 7.6, respectively. The order of the features in the figures is given by the mRMR algorithm. Figure 7.5 shows that with the feature of message exchanges of social platform, the average number of downloads for applications without the given features becomes higher than that of applications with these features.

Fig. 7.5 Differences between the average numbers of downloads of fitness applications with and without the given features

Fig. 7.6 Differences between the average ratings of fitness applications with and without the given features

The first seven features in Fig. 7.5 are highly related to the number of downloads. The average number of downloads of applications offering the features of select sports type and voice coach is higher than that of applications without these features. A similar result was obtained from the feature of visualizing activity statistics for ratings (Fig. 7.6). The mRMR values of both Share on Facebook and Share on Twitter are negative, meaning that they do not have a positive relationship with the rating values.

Nine of 78 mobile health applications (11.54%) were included in the category of running. Figure 7.7 shows that the average number of downloads for applications offering the reminder feature was about 10 million, whereas it was only about 1 million for applications without this feature. Similarly, there was a great difference regarding the features of a challenge with invited friends and link to smart watch. The differences become smaller as the features become less relevant to the number of downloads (i.e., toward the bottom of the y-axis in Fig. 7.7). When the rating values of running applications were compared, no major difference was found. Since the difference between the minimum and maximum values was small, it can be concluded that the rating values of running applications in this study do not differ substantially, as shown in Fig. 7.8.

Fig. 7.7 Differences between the average numbers of downloads of running applications with and without the given features

Fig. 7.8 Differences between the average ratings of running applications with and without the given features

5 Principal Findings

The current study aimed to explore the relationship between the number of downloads, user ratings, and support features promoting behavior change in health and fitness applications available in Google Play Store. This is the first study utilizing empirical hands-on research to determine the relation of the features in applications promoting physical activity with the number of downloads and user rating levels based on live data from users in the application store.

According to the results, the features of a voice coach, visualizing activity statistics, self-reports, reminders, sharing activity summary via other apps on the device, providing a social platform, and sharing with community friends have a significant relationship with the number of downloads. In the literature, reminders were also found to be the most valued feature for users (Munson and Consolvo 2012). Sharing with friends on Facebook, Twitter, Google+, other applications, and communities was separately analyzed in this study. Similar to the results of the previous studies, sharing with Facebook friends did not contribute significantly to user ratings and number of downloads. In the previous studies, users reported limited benefits of sharing their progress on Facebook. Since information can be shared with all circles from family members to old friends, users are less willing to share personal information due to privacy concerns and social pressure (Oinas-Kukkonen 2012; Munson and Consolvo 2012; Ding and Peng 2005). Sharing with friends on Twitter, Google+, and other applications was explored for the first time in this domain, and these features were also found not to be related to the number of downloads and user ratings. Providing a social platform and sharing with community friends were separately analyzed and found to be valuable for users. In agreement with the findings of the current study, in the literature, users reported benefits in disclosing their personal data to strangers and anonymous online communities (Ding and Peng 2005). These findings have significant implications for the differences between different modes of information sharing with others.
The targeted audience in information sharing is important because people with whom information is shared can strongly influence individuals’ motivation to pursue physical activity; thus, connecting individuals facing similar challenges would have a comforting and encouraging effect and can facilitate social support. No significant relation was found between any of the features and user ratings. This may be due to the strong inclination to give high ratings to free applications and Google Play Store applications as mentioned in the literature (Finkelstein et al. 2014; Viennot et al. 2014). Another reason may be the small sample size of each evaluated feature.

In studies that performed content analyses in the literature, rather than exploring the effectiveness of features, the frequency of using behavior change techniques was investigated. Although this is a fundamental difference in methodology, some of the most frequently used behavior change techniques overlap with the most relevant features in our study. In the study by Middelweerd et al. (2014), the most frequently used behavior change techniques were found to be goal setting, self-monitoring, and feedback on performance. Direito et al. (2014) reported provide instruction, set graded tasks, and prompt self-monitoring to be the most prominent techniques. In our study, the features that were found most relevant were reminders and voice coach, which are related with provide instruction and goal setting. Visualizing activity statistics, providing a social platform, and sharing activities can be considered relevant to self-monitoring and feedback on performance.

For pedometer applications, visualizing activity statistics, sharing activity via other applications, and challenging friends for a distance were found to be the most relevant three features in terms of the number of downloads, whereas for running applications, the most relevant three features were reminders, challenging invited friends, and link to smart watch. The features that were most relevant for running applications seemed to fit the domain since reminders, challenging invited friends, and link to smart watch were the features in running applications that received higher download counts from users. On the other hand, for pedometer application users, visualizing their statistics and sharing them on other applications, e.g., through providing a link to gadgets, received higher download counts. When the features and the ratings of pedometer applications were considered, the most relevant three features were the same as those obtained from the number of downloads. However, for running applications, providing exercise programs for each sports type, sharing on Facebook, and challenging friends for a distance were the three most relevant features for rating. One factor behind the discrepancy between the relevant features in terms of the number of downloads and rating of running applications may be how the particular features are implemented, i.e., the user interface and usability. Our findings suggest that there is a large discrepancy between the reasons for downloading an application and its final rating. Furthermore, these discrepancies vary depending on the type of physical activity. This can be attributed to users not being satisfied with the quality or implementation of the features that initially attracted them to download the applications. Application developers can use our findings to improve these features in order to enhance the efficacy and usability of their applications and thereby improve the ratings of their applications.

The findings of this study show that physical activity applications can be substantially improved using applied behavior change techniques. This would create an opportunity to develop more sophisticated and effective mobile applications that address user needs and priorities. The implications of this study will guide designers of mobile physical activity applications to enhance user adoption and evaluation and provide them with a better insight into which features work better through successful interventions intended for behavior change.

6 Strengths of the Study

Previous studies have either assessed mobile physical activity applications by using content analysis or conducting qualitative techniques to determine the effectiveness of features pertaining to behavior change techniques. To the best of our knowledge, this is the first empirical study to evaluate the presence of a relationship between application features and live data on the number of downloads and ratings from the Google Play Store.

In previous studies that conducted a content analysis of mobile physical applications, behavior change techniques were rated. One of the strengths of our study is that we explored, in more detail, the specific features pertaining to the same behavior change techniques. Thus, we had the opportunity to discriminate between sharing on Facebook and sharing with community friends, and we found completely different relations.

7 Limitations

The present study has a few limitations. First, the analysis was limited to the applications in the Google Play Store since it is the only application store that provides download counts. Second, even though both the Turkish and US versions of Google Play Store were used, the analysis was limited to English language applications. Therefore, there is a need for further research on applications listed on other English and non-English language versions of Google Play Store. Another limitation is the use of only free applications. Since the majority of applications are free in Google Play Store, there was only a uniform set of applications available to conduct a statistical analysis.

One other limitation of the current study is that a detailed statistical analysis could not be conducted on the subcategories of physical activity, which were workout, pedometer, running, and fitness, due to the limited number of applications analyzed for each category. Therefore, further research is needed. Lastly, there was no follow-up research after downloading and using applications for a period of time. Therefore, the effects of rewards, which are typically granted only after the features have been used, could not be properly analyzed in our research.

Despite the limitations, this study is significant in being the first empirical quantitative research on user evaluations of mobile physical activity applications in terms of the number of downloads, ratings, and behavior change features. To better understand the effectiveness of features, further research can be conducted to determine how frequently and for how long these applications are used.

8 Conclusion

Our hands-on research approach helped us understand the effects of features on users’ adoption and evaluation of mobile applications in the physical activity category. Based on our findings, when designing and developing mobile health applications, designers can focus on certain features relevant to the type of application to improve user adoption and evaluation. These specific features can be added to applications to enhance the effects of behavior change interventions for the mobile health domain. The theoretical ground can be incorporated into mobile applications with the help of clinicians and experts on health behavior change systems to develop applications that better address user needs.