Introduction

Eating addiction and food addiction represent enormous public health issues in today’s world. Between 1975 and 2016, the worldwide prevalence of obesity almost tripled. In 2016, over 39% of adults aged 18 years and older (more than 1.9 billion people) were overweight. Of these, over 650 million adults were obese, representing 13% of the world’s adult population [1•]. Obesity-related conditions, such as heart disease, stroke, type 2 diabetes, and certain types of cancer, are among the leading causes of preventable, premature death [2]. While somewhat controversial, some researchers have likened obesity to drug addiction, as both conditions involve an enhanced value of one type of reinforcer (hyperpalatable food and drugs, respectively) at the expense of other reinforcers, resulting from conditioned learning and resetting of reward thresholds after habitual drug abuse or overeating [3,4,5]. In this model, exposure to a reinforcer or to a conditioned stimulus triggers a mental simulation of the expected reward, which simultaneously over-activates motivation circuits and inhibits the cognitive control circuit, resulting in an inability to inhibit the urge to indulge, despite knowledge of the undesirable consequences of doing so [6••].

Targeting the addiction neurocircuitry that overlaps with obesity has been shown to reduce maladaptive eating behavior. For example, naltrexone, an opioid antagonist commonly used to treat alcohol and opioid dependence, has shown preliminary evidence of reducing food consumption, decreasing subjective liking of foods (particularly hyperpalatable foods), and diminishing reward activation in the dorsal anterior cingulate and caudate upon seeing and tasting chocolate [7]. Reinforcement or associative learning (i.e., operant conditioning) was evolutionarily advantageous when food was scarce, as it was important for humans to remember where, when, and how they obtained palatable foods. This includes both positive and negative reinforcement: the receipt of a reward or the elimination of an aversive stimulus, respectively, either of which increases the likelihood that the behavior will be repeated [8,9,10,11]. Therefore, when we consume a delicious, calorie-dense food, we create a powerful emotional memory that integrates all the situational cues surrounding the experience and prompts us to repeat the behavior in response to those cues (positive reinforcement). Similarly, when we eat to numb our negative emotions, we create a memory to eat certain foods to make us feel less sad or stressed (negative reinforcement) [12••].

In our modern environment, in which calorically dense foods are abundant and may even be engineered to “hijack” the reinforcement learning system, this mechanism is no longer evolutionarily advantageous but instead contributes to the obesity epidemic and related health problems. Repeatedly eating highly processed foods with excessive salt, sugar, or fat content may alter the reward circuitry in the brain, stimulating dopamine release along the same associative learning pathway as addictive drugs [8, 13,14,15,16,17]. Repeated, excessive sugar consumption may condition us to anticipate pleasure not only in response to sugary foods themselves but also in response to stimuli we associate with the food (e.g., seeing the McDonald’s logo) [6••]. These stimuli may then trigger learned associations that induce eating in the absence of hunger [18,19,20,21]. Through the process of reinforcement learning, habits are created based on the rewarding experience rather than physical hunger [22,23,24]. We feel sad (trigger), eat high-calorie foods (behavior), and temporarily feel better (reward), which leads us to repeat this behavior. As the behavior is repeated and eventually becomes ingrained as habit, recognizing the difference between homeostatic and non-homeostatic hunger becomes more and more difficult [12, 25]. An effective strategy for behavioral health interventions to help people break the cycle of reward-related eating may be to target the habit loop that reinforces maladaptive eating behavior [12].

Mobile technology has quickly spread around the world, and with an estimated five billion mobile device owners worldwide as of 2019, healthcare delivery using mobile phones (mHealth) has become increasingly common [25, 26]. In 2017, the term “digital therapeutics” was coined as a marker of the progression of treatments moving from in-person to app-based delivery [27]. Thousands of wellness apps are currently available for download, and researchers have demonstrated that mobile health technology can effectively promote behavior change [28, 29]. The goal of many of these smartphone apps is to promote healthy eating and nutrition behaviors [30]. The aim of this study was to evaluate and rank smartphone apps designed to improve eating behavior by targeting the addictive component of overeating, using a standardized scale (the Mobile App Rating Scale, MARS).

Methods

This study included mobile apps (both free and paid) aimed at improving eating behavior by targeting the addictive component of overeating or the addictive properties of unhealthy foods themselves (e.g. sugar addiction) found in the official stores of Apple iPhone (App Store) and Android (Play Store) in January 2020.

We systematically searched the iTunes App Store using all combinations of “food” and “eating” with a predetermined list of words related to addiction, generating the following search terms: food addiction, eating addiction, food craving, craving eating, food dependence, eating dependence, food fixation, eating fixation, food obsession, eating obsession, compulsive food, compulsive eating, food habit, eating habit, uncontrolled food, uncontrolled eating, disinhibited food, disinhibited eating, emotional eating, emotional food, and stress eating. This search identified an initial 54 apps. We then screened the app descriptions for relevancy, and a total of 17 apps, which were not centrally focused on eating (e.g. intended for addictive behaviors in general), were excluded. Upon testing the remaining 37 apps, the following inclusion and exclusion criteria were applied. Inclusion criteria: the app is available on the iOS App Store and aims to improve eating behavior by targeting the addictive quality of either overeating/binge eating or unhealthy foods themselves. Exclusion criteria: the app requires an external course or subscription, is not offered in English, has no free trial available, or is found to be a scam (no content beyond an advertisement for a particular product or program). As a result, 18 more apps were excluded: 12 did not offer a free trial, 4 were scams, and 2 required an external course or subscription (see Supplementary Figure S1).
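As a quick arithmetic check, the screening flow described above can be tallied in a few lines. This is only a sketch: the counts are those reported in the text, while the variable names and stage labels are our own.

```python
# Screening counts as reported in the text; names and labels are illustrative.
identified = 54
excluded_on_description = 17  # not centrally focused on eating
exclusions_after_testing = {
    "no free trial": 12,
    "scam (advertisement only)": 4,
    "requires external course or subscription": 2,
}

tested = identified - excluded_on_description
included = tested - sum(exclusions_after_testing.values())

print(f"Identified by search: {identified}")
print(f"Tested after description screening: {tested}")
for reason, count in exclusions_after_testing.items():
    print(f"  excluded ({reason}): {count}")
print(f"Included in the MARS evaluation: {included}")
```

The tally reproduces the 37 apps that were tested and the 19 apps that were finally included.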

A total of 19 apps were finally included; 9 were available on both the App Store and the Play Store, and 10 were available only on the App Store (see Supplementary Table S1). Each app was downloaded and then evaluated by two independent reviewers using the Mobile App Rating Scale (MARS).

The MARS includes four sections: app classification, app quality ratings, app subjective quality, and app-specific items. The app classification section gathers descriptive and technical information about the app. The app quality ratings section is further subdivided into domains A through D, evaluating the app on four dimensions: engagement, functionality, aesthetics, and information. Each item is rated on a 5-point Likert scale from “1—inadequate” to “5—excellent”. The final app quality score is calculated by averaging the mean scores of these four domains. The subjective quality section asks for the rater’s own opinions, for example, “Would you recommend this app to people who might benefit from it?” Lastly, the app-specific section includes six items that can be tailored to the app being rated and used to evaluate the app’s impact on the user’s knowledge, attitudes, intentions to change, and the likelihood of changing the target health behavior (which was identified as eating and/or food addiction) [31].
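The app quality score described above (the average of the four domain means) can be sketched as follows. The item ratings are hypothetical and are not study data. Skipping items marked as not applicable is an assumption of this sketch, not a rule stated in the text.

```python
from statistics import mean

# Hypothetical item ratings (1-5) for one app from one reviewer; not study data.
ratings = {
    "engagement":    [3, 4, 3, 2, 4],
    "functionality": [5, 4, 4, 5],
    "aesthetics":    [4, 3, 4],
    "information":   [3, 4, None, 3, 4, 2, None],  # None = item rated N/A (assumption)
}

def domain_mean(items):
    """Mean of the items that were actually rated, skipping N/A entries."""
    rated = [r for r in items if r is not None]
    return mean(rated)

def app_quality_score(domain_items):
    """Average of the domain means (A-D), as described in the text."""
    return mean(domain_mean(items) for items in domain_items.values())

score = app_quality_score(ratings)
print(f"App quality score: {score:.2f}")
```

Because the score is a mean of domain means rather than a mean of all items, each domain contributes equally regardless of how many items it contains.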

Differences in scores between the two reviewers were evaluated, and if a significant difference was discovered on any domain of the MARS for a particular app (greater than 2 points of difference), a third reviewer rated that domain. If the third rating did not decrease the disparity, the individual items of the domain were compared. Specific items for which reviewers gave significantly different scores were discussed and reevaluated until an agreement was reached. The new domain score was calculated as the average of the three reviewers. The final score for each app was then calculated as the average score of the reviewers. Based on these final scores, apps were divided into tertiles: worst-rated apps, average apps, and best-rated apps.
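The discrepancy check and the tertile split can be illustrated with a short sketch. The function names and example scores are hypothetical. The rule for dividing a number of apps that is not a multiple of three is an assumption, since the text does not state how the cut points were chosen.

```python
DISCREPANCY_THRESHOLD = 2  # "greater than 2 points of difference"

def needs_third_reviewer(score_a, score_b, threshold=DISCREPANCY_THRESHOLD):
    """True when two reviewers' domain scores differ by more than the threshold."""
    return abs(score_a - score_b) > threshold

def split_into_tertiles(final_scores):
    """Rank apps by final score and cut the ranking into three near-equal groups.

    Assumption: when the count is not divisible by three, the extra apps go
    to the lower groups first.
    """
    ranked = sorted(final_scores, key=final_scores.get)
    base, extra = divmod(len(ranked), 3)
    sizes = [base + (1 if i < extra else 0) for i in range(3)]
    labels = ["worst-rated", "average", "best-rated"]
    groups, start = {}, 0
    for label, size in zip(labels, sizes):
        groups[label] = ranked[start:start + size]
        start += size
    return groups

# Hypothetical final scores for six placeholder apps; not study data.
example_scores = {"A": 2.6, "B": 3.1, "C": 3.4, "D": 3.6, "E": 4.2, "F": 4.9}
groups = split_into_tertiles(example_scores)
print(groups)
print(needs_third_reviewer(4.5, 2.0))  # difference of 2.5 exceeds the threshold
```

With 19 apps, this particular rule would yield groups of 7, 6, and 6; other reasonable rules would place the extra app elsewhere.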

A descriptive analysis was then performed. We report the scores (of each section and the final score) of every app, as well as the mean score and standard deviation. To evaluate whether there were significant differences between tertiles, Kruskal-Wallis tests were run with group as the independent variable and average rating as the dependent variable (as none of the ratings were shown to follow a normal distribution in the Shapiro-Wilk test). Lastly, we performed Mann-Whitney U tests with a Bonferroni correction for multiple comparisons to calculate specific p-values between tertiles. All analyses were performed using R version 3.6.1.
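The study's analyses were run in R. Purely as an illustration, the Kruskal-Wallis H statistic and a Bonferroni adjustment can be sketched in standard-library Python. The scores below are hypothetical, the tertile sizes of 6, 6, and 7 are an assumption, and the tie correction factor is omitted because the example has no ties. With three groups the reference chi-square distribution has 2 degrees of freedom, for which the upper-tail probability is exactly exp(-H/2).

```python
import math

def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without the tie correction factor)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

def chi2_sf_df2(h):
    """Upper-tail p-value of a chi-square with 2 df, which equals exp(-h/2)."""
    return math.exp(-h / 2)

def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical scores: 19 apps in fully separated tertiles of sizes 6, 6, 7.
low = [2.6, 2.7, 2.8, 2.9, 3.0, 3.1]
mid = [3.2, 3.3, 3.35, 3.4, 3.45, 3.5]
high = [3.6, 3.8, 4.0, 4.2, 4.4, 4.6, 4.87]
h = kruskal_wallis_h([low, mid, high])
p = chi2_sf_df2(h)
print(f"H = {h:.2f}, p = {p:.5f}")
```

For three non-overlapping groups of these sizes, H works out to about 16.01. This value is the largest H attainable for 19 observations split 6/6/7, which is what one would expect for a variable that was itself used to define the tertiles.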

Results

A total of 37 nonduplicate apps were initially identified as potential apps targeting the addictive component of overeating to be included in this study. Of these, 12 were excluded due to lack of a free trial, 4 were identified as scams because they did not include any content other than advertisements, and 2 required an external course or subscription, leaving a total of 19 apps to be analyzed.

Table 1 shows the apps included in the study and their main characteristics. Apps had user star ratings ranging from 2 to 5, with a mean star rating of 4.36 (SD 0.75); the exception was Diet Daily, which did not have a sufficient number of ratings to qualify for a user star rating. The affiliation of the vast majority of the apps was commercial, with five apps coming from nongovernmental organizations and five apps having unknown affiliations (see Supplementary Table S1). The apps most often targeted goal setting and behavior change in an attempt to improve eating behavior. Similarly, goal setting was the theoretical background/strategy most commonly used by the apps. Six apps allowed data sharing to social media websites such as Facebook and Twitter, and eight sent reminders; other technical aspects offered by apps included app communities, password protection, and a required log-in.

Table 1 Characteristics of eating-related apps included in the study

Table 2 displays the mean quality score and standard deviation for each app, as well as the app’s mean score and standard deviation on each specific domain. The mean app quality score ranged from 2.58 (worst-rated app) to 4.87 (best-rated app). For the domains, the following score ranges were observed: 1.80 to 5.00 (engagement), 1.50 to 5.00 (functionality), 2.50 to 5.00 (aesthetics), 1.60 to 4.64 (information), 1.00 to 4.50 (app subjective quality), and 1.00 to 5.00 (app specific). Overall, the 19 apps obtained a mean quality score of 3.52 (SD = 0.70). On average, the best-rated section was functionality (mean = 4.15; SD = 0.87), followed by aesthetics (mean = 3.54; SD = 0.93), information (mean = 3.25; SD = 0.83), app specific (mean = 3.19; SD = 0.47), engagement (mean = 3.15; SD = 1.02), and finally subjective quality (mean = 2.44; SD = 1.12). Based on the division of the apps into tertiles, the minimum mean quality score to be considered as a best-rated app was 3.57.

Table 2 Description of eating-related apps included in the study

Kruskal-Wallis H tests showed significant differences between tertiles across all domains (see Supplementary Table S2). The most significant difference was in app quality, χ²(2) = 16.01, p < 0.001, with significant differences between the highest and middle (p = 0.004) and the highest and lowest (p = 0.007) ranked apps. Additionally, significant differences were found in the information (domain D) and app subjective quality (domain E) domains after adjusting for multiple comparisons (p = 0.007 and p = 0.02, respectively). Figure 1 illustrates the difference in average scores between tertiles in each of the MARS domains. After correcting for multiple comparisons, no significant differences were found between the best-rated and worst-rated apps in the engagement (domain A), functionality (domain B), aesthetics (domain C), or app-specific (domain F) domains (p = 0.09, p = 0.06, p = 0.15, and p = 0.11, respectively).

Fig. 1 Ratings by tertiles across MARS domains

Discussion

Our study found highly significant differences between apps that received the highest versus lowest MARS ratings in overall app quality, information, and app subjective quality, suggesting that there is a large disparity in overall quality between apps. This finding is further supported by the significant differences between tertiles across all domains shown by the Kruskal-Wallis H tests. More importantly, the statistically significant difference in information suggests serious differences in, as per the MARS, “high-quality information from a credible source.” Given that these smartphone apps are aimed at addressing eating-related behaviors and have an impact on health outcomes, it is important to have reputable sources provide information. Specifically, information quality scores showed a broad range (1.2 to 4.6).

Apps with high scores in the information quality category came from reputable sources and included evidence-based information. The top-rated app in terms of information quality was Foodstand, which was developed by the social impact agency Purpose Campaigns, LLC. According to Foodstand’s app store description, the app is “science based” and was developed using the latest techniques in behavioral medicine and designed in collaboration with Registered Dietitians, as well as the Center for Science in the Public Interest (CSPI), Johns Hopkins Center for a Livable Future, and True Health Initiative (THI). An interesting feature is that when groups of users sign up as “teams,” Foodstand provides these teams with detailed weekly reports revealing trends, behaviors, outcomes, retention, and more. However, this app has not undergone any clinical trials, so no data are available about its user engagement or efficacy.

The app that received the second-best information quality score was Eat Right Now. Eat Right Now was developed by Dr. Judson Brewer, MD, PhD, an addiction psychiatrist and expert in mindfulness training (MindSciences Inc.). This was the only app included in the study that has clinical trial evidence behind it, though only from a single-arm feasibility trial [32]. Conducted at the University of California, San Francisco, the trial examined 104 overweight and obese individuals and found a 40% reduction in craving-related eating, as well as reductions in reward-related eating and eating for social reasons [12••]. While the data are suggestive of possible mechanisms of mindfulness in breaking reinforcement learning links between craving and eating, future randomized controlled trials are needed to determine efficacy.

Neither of the apps ranked third and fourth on information quality (OA Workshops Free and OA Speakers Lite) has undergone clinical trials, but both are based on Overeaters Anonymous (OA), a twelve-step program that has been reviewed and shown to be effective in several scientific studies [33, 34].

The vast disparity in overall app and information quality among eating/food addiction-related smartphone applications is further illustrated by the lack of testing: none of the remaining fifteen apps has been subject to any kind of scientific testing, so it is impossible to draw any conclusions as to their engagement, efficacy, or effectiveness. Further, seven of the nineteen apps did not show any scientifically backed theory or evidence base: in their app store descriptions, these apps failed to provide any information about the validity of the program they offer. Not only have they not been tested for efficacy through clinical trials, but they also did not report having been developed in consultation with healthcare professionals. This puts the onus on consumers to be vigilant in screening apps before downloading them and trusting their strategies. Future work on standardizing the information that apps provide on app stores would help mitigate this (e.g., a statement that an app has been verified in scientific studies or by healthcare professionals).

It is important to note that, by specifically targeting the addiction model via carefully curated search terms, our app search and screening process likely weeded out many eating/diet apps without a scientific basis, such as apps built on a forced-restraint approach to dieting or on simple calorie counting. A search of the app store using the term “eating” returned 241 apps, and “diet” returned 232 apps. This suggests that our search criteria narrowed the field considerably for this review, and that a much larger number of eating and dieting apps are available to consumers and likely have not been evaluated for their scientific basis or efficacy.

This lack of theory-based approaches and scientific validation extends beyond apps related to food addiction to mental health apps in general: the vast majority of mental health apps are not backed by empirical evidence. A recent commentary suggested that in the current landscape, there is no way to know whether or not mental health apps are effective and pointed out that some may even have adverse effects on mental health [35••]. For example, while a 2013 review identified over 1500 depression-related apps available in commercial app stores, only five of these apps were backed by scientific papers assessing their effects on mental health symptoms or disorders. Even among the health apps listed as “safe and trusted” by the UK’s National Health Service (NHS), only 4 of the 14 apps devoted to treating depression offered scientifically based evidence to support their claims [36]. Furthermore, a separate analysis found that 35 of the apps originally included in the NHS library transmitted identifying information about users over the internet [37]. This library has since been taken down and replaced by a much shorter list of apps. Economists suggest that the lack of scientific evidence to support claims made by mental health apps is a product of the high cost of efficacy research, which reduces funds available for marketing and advertising.

Although apps claiming to prevent, diagnose, or treat diseases are often considered medical devices and subjected to regulatory scrutiny by the US Food and Drug Administration (FDA), apps that simply claim to improve mood or provide coaching are commonly able to escape such scrutiny. As a result, the potential risks associated with these apps are largely unknown. Researchers have identified two broad categories into which most health apps tend to fall: commercially developed apps with minimal supporting evidence and academic or government-developed apps that are more often backed by clinical trials. Unfortunately, the former tend to be more engaging for users, while the latter take much longer to develop and often seem outdated by the time they become available on app stores [35••].

Limitations

The most significant limitation of this study is that we were unable to include any apps in the evaluation that did not offer a free trial. Of the original 37 relevant apps identified by the raters in the iTunes app store, 12 were excluded due to lack of a free trial. Additionally, it is possible that we missed some apps that did not include any of our search terms related to “food” and “addiction” in their titles or descriptions. Furthermore, the app market is constantly changing, with old apps being updated or removed from the app store and new apps being added. As a result, reviews like this will need to be updated regularly to follow the quickly changing digital therapeutic landscape.

Conclusions

As the regulation of mental health apps is murky, the burden of finding an app that effectively addresses the target behavior and has no harmful effects falls on consumers. Although thousands of mental health-related apps are commercially available, the number of these that have been tested in clinical trials is minuscule. Some apps may provide incorrect or even damaging information and feedback. This is supported by the vast disparity we found between apps on information quality. Consumers should seek out apps that have been subjected to randomized controlled trials by independent researchers while actively avoiding apps that provide no scientific basis for their lofty claims of being able to cure depression, alleviate anxiety, or improve eating behavior.