1 Introduction to the Food Recommendation Problem

The importance of food to human life cannot be overstated. Food provides sustenance, but more than that it helps shape our identity [10]. Even in the recommender systems literature, two food recommendation papers—from different groups of authors—quote the idiom “You are what you eat” in the title [75, 153]. Food forms the basis of many of our social interactions. Friends tend to have similar eating habits [30] and our perception of others changes based on what we know about their diet [169]. Food also has major cultural and religious significance. Different cultures are associated with differing foods (think haggis, sauerkraut or frog’s legs and we are certain you can associate these meals with particular locations) and food forms the basis of celebrations and ceremonies regardless of where you are in the world [11, 210].

The importance of food can also be observed in many of the major challenges we face in modern society. Health problems ranging from obesity and diabetes to hypertension, heart disease and cancer, have all been associated with food consumption. Worrying increases in the prevalence of diet-related diseases suggest that people have difficulty finding and maintaining balanced diets. Similarly, the fact that food production accounts for over a quarter greenhouse gas emissions [33], as well as deforestation and loss of bio-diversity [170] have led to the suggestion that changes to individuals’ diet on a global scale could be a vital part of the solution to climate change [51, 142, 198]. These facets combine with the fact that, in many places in the world, we have never had such variety of food and food options, yet in other contexts paucity of choice has been documented [40]. Both an abundance and lack of choice have an influence on the food recommendation problem.

All of the points highlighted thus far combine to form the background to this chapter on food recommendation. For all of these reasons and more food recommendation is an important problem. But it is for the very same reasons that the problem is so challenging and worthy of scientific attention. The food recommender literature is still in its infancy and as such no formal theory specifically relating to food recommendation yet exists. There are, however, relevant theories from diverse fields, such as nutritional science (e.g., [36, 61, 140]), psychology (e.g., [164]) cultural science (e.g., [11]), and behavioural economics (e.g. [80, 175]). Food recommender research has in some cases been motivated by these theories and in some cases the results align with theoretical contributions from other fields. Where appropriate we refer to such links. Moreover, practitioners wishing to develop working food recommenders can profit from the results of past research. We summarise lessons that can be learned for these readers.

The chapter is structured as follows. In Sect. 2 we unpack the problem of food recommender systems by detailing the numerous possible facets, which could and should be addressed by researchers. In Sects. 3 and 4 we examine solutions proposed in the literature first, in terms of algorithms and second, via interfaces. In Sect. 5 we summarise evaluation methods, while 6 reviews the resources available for researchers in the field. In Sect. 7 we offer advice to practitioners based on the literature and our experience and present future aspects of research in the area.

2 Problem Description

In this section, we systematically define the various forms the food recommendation problem can take. We break the problem down into differing components by defining various user profiles, diverse food items that can be recommended, and different tasks, for which food recommenders can be used. A schematic illustration of the problem is featured in Fig. 1. As presented, the food recommendation problem is context-dependent and must account for different types of inputs, dietary constraints, and interfaces to generate an output, typically in the form of a ranked list. This multi-faceted structure and many other issues make the problem so complex as also acknowledged by the literature [140].

Fig. 1
figure 1

Schematic illustration of the food recommendation problem/process from input to output

2.1 User Roles and Groups

Food recommendations can be relevant to both individuals [55, 75] or groups of users [43]. Viewing these simply as two individual problems is an oversimplification, however, as these cases can be further teased apart. Individuals can, for instance, make personal decisions, such as what to eat themselves, at any given moment. Individuals can also, though, make decisions on behalf of a group of people, such as a family member deciding what the family will eat for dinner. At a group level, we can make a similar distinction. We may have a group that makes a shared decision, such as deciding which restaurant to visit together. Similarly, in a group setting within a restaurant, individual group members make decisions about what they are going to eat. We know from the literature, however, that what people choose will be influenced by the others present [121].

2.2 Item Types

At the lowest level, food items can be of two basic types. We distinguish between a basic foodstuff, such as an onion or carrot, and a food product, such as a chocolate bar. Both of these basic items can be referred to as grocery items in a shopping context, e.g., [5]. Basic items can be grouped into larger compounds to be recommended. The most prominent grouping in the literature is a recipe, which contain parts and/or multiples of both food items and food products, which would, in this context, be subsumed under the term of ingredients, e.g., [55]. One could argue that a food product is also the result of a recipe mixing different food items. For practical purposes, however, we define a food product differently since a precise separation into its ingredients is often not possible and typically impractical. When examining the food actually consumed, multiple food items, products, and dishes resulting from recipes can be integrated into one meal that is consumed together in one temporal context [50]. The most complex meal form is a menu, which contains multiple predefined dishes and possibly appropriate drinks and snacks (food products), e.g., [126]. When addressing a sequence of temporal occasions, a recommender would need to provide a meal plan, e.g., [44]. In a set of dishes, as given in a meal or menu, the single ingredients should likely harmonize or complement each other [22]. In a meal plan, on the other hand, the sequence of recommended items should be diversified.

2.3 Types of Food Choices

People typically make over 200 food choices every day [195]. Just as a few examples, people have to decide what to eat, when to eat, and where to eat. People also have to decide how to attain the food they wish to eat, to either cook it themselves or to buy it in a pre-prepared state. Then they need to consider where to buy their food. In the supermarket, at the farmers market, in a convenience store, at a restaurant, or should they have it delivered. A lot of these decisions are made out of habit without conscious consideration of the consequences. It is very difficult to formulate a recommender system that can address all of these situations. The recommendation of food is a collection of diverse and distinct sub-domains that each address a different user need. Despite these sub-domains revolving around the user’s interaction with food, the needs and goals of the user change in each scenario.

Different scenarios mean changing the means of assessing the suitability of food items both in terms of the type of item being recommended and the constraints considered in the recommendation process. Standing in a supermarket, for example, we may be more interested in the cost of a product compared to similar products. When we are going out with friends and searching for a restaurant, depending on our financial resources, the price may be less relevant than our current location and the mix of preferences in our group. These two simple examples hint at the complex network of constraints that surround a food decision. These incorporate aspects from the users’ personality, psychological state, and physical state, through location and socio-demographic factors, to health and environmental priorities. This complexity is reflected in models of food choice in the food and behavioural sciences (see e.g. [61, 140]). In Sects. 3.2, 3.3, and 3.4 we examine such factors in detail, showing how they should influence recommendations and how they can be contradictory.

To illustrate the complexity of food recommendation in the following sub-sections, we take a detailed look into different food-related scenarios (cooking, grocery, restaurant, and health recommender systems). We examine their associated user profiles (single or group), food item profiles, choice contexts, and recommender tasks.

The first three scenarios are based on a targeted action in a fixed location context. In the cooking recommender scenario, the user is at home and intending to prepare a meal. In the grocery recommender scenario, the user is visiting a store intending to buy items. In the restaurant recommender scenario, the users are looking for a restaurant that is reachable from their location and provides menu items suiting their food-related goals. The final scenario, health, is independent of a specific location or action. A health recommender in the context of food, adds the dimension of health to the recommendation process in each of the previously listed example scenarios. An overview of the number of works addressing each scenario is given in Fig. 2.

Fig. 2
figure 2

Number of papers focusing on one of the four major food recommender subdomains over the past 20 years

2.4 Cooking Recommender

One could imagine different categories of cooking recommender users. The first is a provider (e.g., a parent or carer) cooking a meal for someone other than themselves. This could be an individual or a whole family. Their goal, in this situation, would be to satisfy the eaters’ user profile, such as taste preferences and health needs. For example, in a family setting, the preferences of children are known to play an important role [134]. Individuals could also utilise a recommender for their own user profiles. One could imagine many different motivations for using such a system. For example, to diversify or extend one’s diet, to learn new cooking skills, or to improve one’s habits with respect to health, fitness, or sustainability. Different motivations will need to be serviced with different recommendations, and if systems fail to address these, then it is unlikely the recommender will be used. In a cooking context, the primary item type to be recommended will likely be recipes. The exception being the recommendation of substitute ingredients to make a recipe suitable, e.g., [38, 62]. From these recipe and ingredient profiles, ratings, flavour, and time requirements, are highly relevant. Allergies, intolerances, religious, and other dietary restrictions (e.g., vegetarian) are less likely to play a role since they would already have been addressed when buying the ingredients currently available in the kitchen. Important contextual constraints in a cooking scenario include those set by one’s own kitchen, meaning the available food items and kitchen equipment [74]. Additionally, the current cooking skills of the user and the available time play an important role [74]. A typical cooking recommender needs to rank recipes considering these user profiles and contexts. Examples of recipe and substitute recommender systems that target the cooking context are shown in Table 1.

Table 1 Work on recommending recipes, menus, and ingredients to cook

2.5 Grocery Recommender

User types in a grocery shopping context are very similar to those in the cooking context. Users are either shopping and making decisions for a whole household or for their own needs. Other relevant user profile criteria that should influence recommendations include: individual or group-specific preferences or nutritional needs [196]. In contrast to the cooking scenario, the grocery recommender focuses on single food items or food products. Such items could, of course, be derived from being ingredients in a recommended recipe. Even though price and quality are the most relevant item properties, the best price and highest quality will not always lead to a buying decision. Some criteria are hard constraints, such as individuals having allergies [162]. Others may be softer, depending on user priorities, such as price, ecological footprint or region. One could imagine systems recommending items from multiple stores (e.g., Google Shopping). In other cases, the physical location or a store preference would represent a hard contextual constraint for recommendations [32]. In this context, one typical use case is to replace food products with healthier but similar alternatives [70] fitting the overall shopping list. Examples of grocery recommender systems are summarized in Table 2.

Table 2 Work on recommending food products to buy during grocery shopping

2.6 Restaurant Recommender

A restaurant recommender scenario typically addresses groups of people who want to visit a restaurant together as a social event [138]. Positive user experience depends on user preferences and other aspects, such as location [205], cost [138], and cuisine [207]. It is possible, however, that different group members weigh these priorities differently. While some might prefer restaurants within walking distance, others might require affordable prices or even vegan menu items. As in the grocery context, some constraints are hard, e.g., allergies [99] and others soft, such as service quality [147]. The recommender can either focus on a whole restaurant or base the decision on the selection of available menu items for subsequent dish choices [106]. Within a chosen restaurant context, the ranking of available menu items could further consider the suitability of multiple items being combined into one meal [194]. Restaurant recommendations are very context-dependent, with location, social context, and temporal, both seasonal and daytime, context changing preferences drastically. Recommender tasks vary depending on the targeted items and supported contexts. Examples of restaurant and restaurant item recommender systems are summarized in Table 3.

Table 3 Work on recommending restaurants and their items

2.7 Health Recommender

Health can be considered in any of the previously presented scenarios. Users may either want to cook healthier alternatives of a recipe, buy healthier products, or choose a healthy restaurant. The health aspect requires a multitude of additional user profile information, such as age, gender, health status, or history of behaviour. In addition to diverse user profiles, health recommendations are typically targeted at specific user groups, such the elderly [146], children [131], hospital patients [90], individuals with diabetes [8] or simply users aiming for well-being [201]. Health recommender systems must balance user taste preferences with other criteria, such as the nutritional needs of the user [86]. In addition to all abovementioned items, meal planners are especially tightly connected to health since healthy people are often unwilling to invest time and effort into meal planning (see Tables 4 and 5). The temporal context is prominent in the health recommender scenario due to the balancing act of different nutritional requirements over time [54]. While all recommender tasks of the previous scenarios are relevant for healthy recommendations as well, personalization is a crucial aspect of these recommenders’ success. One recent advance in this area is the research in personalized nutrition according to genome and microbiome information [204]. Examples of health recommender systems in each of these categories are shown in Table 4 and in Table 5.

Table 4 Work on recommending healthy food for special target groups such as patients in hospitals, diabetes patients, children, and elderly. Table is sorted first by category and then by year
Table 5 Work on recommending food for health and wellbeing

3 Algorithms

In this section we examine algorithmic solutions to the food recommendation problem. First, in Sect. 3.1 we summarise research on the core problem, which focuses on recommending food items that appeal to users and which has typically been formalised as a prediction or ranking task. Predicting food choices is highly context dependent. In Sect. 3.2, we examine the evidence for which context variables can influence food choices and as such, should be accounted for by food recommenders. Finally, in Sect. 3.4, we look at formalisations of the food recommendation problem that go beyond predicting appreciation to incorporate the complexities outlined in Sect. 2. We conclude each section by reflecting on how the contributions made in the literature relate.

3.1 Food Recommendation as a User-Item Ranking Problem

Regardless of the type of food item being recommended, the variant of the food recommendation problem that has received the most research attention is the most basic formulation where only user food preferences are accounted for. The system suggests food items (i.e. recipes, meals, products etc.) that are estimated to appeal to or be appreciated by users. Typically, researchers have formulated this as a prediction task. The aim being to “learn” the preferences of system users and make predictions for how a given user would rate a given recipe. Systems can then be compared with respect to, for example, the extent to which the predicted ratings deviate on average from the actual ratings users provided (e.g., [55, 75]). A second popular approach—and this is becoming the standard option in the literature—is to formulate the recommendation task as a ranking problem (e.g., [180, 181]). Here the aim is to provide a ranking of food items for users where the items ranked highest are predicted to be appropriate for or most likely to be accepted by the user.

Most of the key algorithmic approaches from the general recommender systems literature (i.e. content-based in Chap. “Semantics and Content-based Recommendations”, collaborative filtering in Chap. “Advances in Collaborative Filtering”, knowledge-based and hybrid approaches) have been evaluated with data relating to food items of different types. There is little clear evidence to suggest that one approach performs better in any particular situation.

For ease of narrative, we focus the examples in this section on recipe recommendation as the majority of the of the literature has focused on this kind of food item. Recipe recommender systems provide recommendations for the user(s) to cook based on their profile(s) and potentially additional constraints, such as context information. Nevertheless the reader should bear in mind that similar core algorithmic approaches have been tested on other food and food-related items, such as restaurants, e.g., [138, 207, 208], cooked meals from menus [88, 92], meal plans [44, 94, 180] and products in supermarkets [100].

When content-based methods are used, items (typically recipes) are represented based on the contained ingredients and the similarity is estimated between items, as well as between items and user profiles modelled in different ways. The representations employed have varied with vector based representations [55, 75], topic model–based representations [95], dependency-tree representations [84], and multi-modal representations that account for different aspects of recipe content [123] all being utilised.

Diverse collaborative filtering-based methods have also been applied, ranging from nearest-neighbour approaches [55, 75] to singular value decomposition [75] and matrix factorization [64, 65] to latent dirichlet allocation and weighted matrix factorization [180]. All of these approaches allow the interaction between user and food items to be exploited in the recommendation process. Finally, hybrid methods have tried to combine the ideas behind the different content and collaborative filtering approaches. Two good examples of hybrid approaches are Freyne and Berkovsky’s [55] combination of a user-based collaborative filtering method with a content-base method. The same authors also tested a second hybrid technique where three different recommendation strategies were combined in a single model, with the exact strategy followed being determined by the ratio of the number of items rated by a user and number of items overall. Such hybrid approaches can be very effective. For example, in the experiments reported on by Harvey and colleagues [75], the best performance was achieved using an SVD approach augmented with content-based biases (see also [184]). A further hybrid approach is to exploit knowledge about the user and her goals, preferences and mood, as well as knowledge about the content and properties of food-items. Musto and colleagues evaluated what they refer to as a knowledge-based recommender with mixed results [126]. Many of the factors did not correlate with preferences, but there were hints that knowledge about gender and mood can be useful (we discuss these and other context factors in greater detail below). Other Knowledge-based recommenders have been proposed in the literature (e.g., [72, 82]), but lack of data resources have hindered their utility and also evaluation. Recent developments may change this trend. Haussmann and colleagues [76] recently presented a unified diet related knowledge graph incorporating aspects of foods and ingredients, nutritional knowledge and medical conditions, as well as how these relate. Incorporating such knowledge into the recommendation process could allow better personalised or context sensitive recommendations, for example, to account for available ingredients or cooking equipment.

A detailed overview of the algorithmic approaches that have been tested in a food recommendation context in [179]. Nevertheless, it is difficult to draw direct conclusions when comparing the contributions summarised above as the experiments performed were conducted using different, often small, proprietary datasets. The experiments were, moreover, often setup differently (i.e. to minimise predicted rating errors vs rank problem), with different algorithmic implementations being evaluated. One of the few trends one can find in the results is, for example, that both Harvey et al. [75] and Freyne and Berkovsky [55] report ingredient based CB methods to outperform CF approaches.

Trattner and Elsweiler [181] went some way to resolving the issue of comparability of results by testing several recommender algorithms using standardised implementations and a large, naturalistic dataset sourced from the online recipe portal allrecipes.com. This analysis showed a different outcome to Harvey et al. and Freyne and Berkovsky in that collaborative filtering methods clearly outperformed content-based approaches in their experiments. The results, while contradictory, are not incompatible. Further analyses by Trattner and Elsweiler where the user sample size was varied show that CF methods only start to outperform content approaches when a sample of 637 users was tested (roughly 50% of the data set). In both of the previous studies much smaller samples were employed. While unsurprising this result highlights the dangers of over interpreting studies with small, homogeneous samples.

A second point regarding providing food recommendations purely-based on user taste profiles is that there has been surprisingly little work exploiting different content-based aspects such as how the food looks and how the food tastes. This is despite the fact that we know that human food choice to be highly visual [35, 110] and that flavour preferences vary [107, 202]. Some recent studies have shown that visual information encoded within recipes accessed via online recipes can be used to predict user preferences. For example, Yang and colleagues learned users’ visual food preferences and were able to improve predictions with their models [201]. Elsweiler and colleagues showed that low-level image features, such as brightness, sharpness and contrast in photos, can be used to predict user preferences from recipe pairs [46]. Zhang et al. [209] tested the same features along with deep neural networking approaches on the images on three large recipe portals. The results showed that both approaches were able to offer predictive power, but the DNN approaches worked best. There is considerable scope to investigate how visual information can be best combined with other information in hybrid-approaches. Similarly, early work on the relationships between the flavour components in recipes from online portals suggest strong patterns can be observed in tastes, particularly in different geographical locations [4]. Accounting for different aspects of content is particularly important given the low performance found on food datasets.

This is a third striking observation relating to algorithms for food recommendation: Standard recommendation algorithms perform significantly less well when used for recommending food items than on other problems, such as movies or online purchases in an e-commerce context. As a point of comparison, for example, Rendle and colleagues’ [143] experiments when evaluating BPR (Bayesian Personalized Ranking from Implicit Feedback) against other modern and benchmark algorithms on movie and online purchase data achieved similar performance on both data sets. The AUC values achieved were consistently above .85 with several algorithms on both collections with the best performance found being 0.89 on the e-commerce data set and 0.93 ). Using the same algorithms on two different recipe datasets, however, attained much poorer performance scores (AUCmax=0.71). There are many possible reasons for this. One such explanation is that it has less to do with the kind of item being recommended and more to do with the property of the dataset, e.g., how dense the ratings are for users. Trattner and Elsweiler also experimented with this aspect by taking samples with different groups of users and items [181]. This did not, however, influence the maximum performance achieved. Yet another potential explanation is that food taste profiles are less stable than, for example, preferences for movies. As we discuss in detail in the following subsection, food habits are extremely context-dependent—what people eat depends on who they are with, where they are, financial and time constraints, as well as, as we shall see, many further factors.

3.2 Context-Dependent Food Recommendations

Context-dependent food recommenders alter the recommendations the provide to account for aspects of context. As has been demonstrated in evaluations of other kinds of recommended items, such as music [15], movies [154] and hotels [104], the appropriateness of recommended food-items is highly context-dependent. Using naturalistic data collected via online food-portals, e.g., [29], as well as laboratory studies, e.g., [75] researchers have gleaned insight into which contextual variables influence the acceptability of food-item recommendations.

Harvey and colleagues [75] studied the reasons why recommendations were considered suitable or unsuitable by participants in a longitudinal, naturalistic study. Many of the reasons provided related not to taste or to the content of the suggestion, but to the relationship between the recipe and the current context. For example, a recipe may have been appealing, but the participant, at that moment, sometimes lacked the time or cooking equipment necessary to prepare the meal, making the recommendation unsuitable.

Time is an important contextual factor for food recommendation. There is strong evidence suggesting that the food items users prefer varies seasonally. Numerous investigations have identified temporal trends in food choices, including the nutritional content, style of food and ingredients contained in recipes. These temporal patterns have been discovered in analyses of food related Tweets [67], in interaction data from online recipe portals [97, 191] and for searches for food using web search engines [197].

Food serves more that sustenance and relates to identity, health and well-being, social relationships and ritual [11]. As such, food choice is culturally embedded and culture becomes an important context variable for food recommender systems [209, 210]. In recommender systems and related fields culture has typically been operationalised using location. Several analyses have taste preferences to vary within (e.g., [185, 212]) and between (e.g., [191, 209]) countries. The size of the city has also been shown to be a location-related factor [29], as has the availability of food in specific locations [39]. Examining ingredients contained within recipes, as well as the flavour components making up ingredients Ahn and colleagues revealed that ingredients that are often paired in recipes vary across geographic regions [4]. Whereas Western cuisines show often use ingredient pairs that share many flavour compounds, East Asian cuisines tend to avoid compound sharing ingredients. Sajadmanesh analysed the content of crawled recipes from 200 different cuisines, identifying strong geographical and cultural similarities on recipes, health indicators, and culinary preferences [155]. These analyses focused on ingredient, flavour and nutritional content of recipes, whereas Zhang and colleagues examined the visual aesthetics of online recipes and how this varies across countries [209]. Again they identified strong regional variation. They found that food images perceived as attractive vary between users of German, American and Chinese food portals, but the visual ideals of German and American users seem to be closer than to those of the users of the Chinese portals. The empirical findings of these food recommender related studies align well with the theories underpinning food choices in other domains, which underline the complex, multi-facted, socially influenced, and personally variable (see e.g. [61, 140]) nature of the problem.

3.3 Preferences Vary Across User Groups

The research efforts summarised in Sect. 3.1 have shown that recommendations can be improved by personalising items to user taste profiles. Section 3.2 used further research to argue that personalisation efforts can be improved by accounting for context-factors. Here, we take this one step further, citing evidence suggesting that food recommender systems should behave differently for different groups of users to reflect varying food preferences groups. Gender is a good example of this, with evidence suggesting that male and female users prefer different dishes, make use of different spices and own and utilise different kitchen utensils [148]. Harvey and colleagues grouped users based on their attitude to healthy eating [75]. In their study, a small group of users who identified themselves as being health conscious behaved in a manner which reflected this. These participants rated meals with higher fat and calorific content negatively whereas they rated lower-fat, lower-calorie dishes more favourably. In the remaining participant sample no such relationship was found. Other research indicates that similar groupings will occur in different contexts. For example, hardened meat eaters should be supported differently to those open to transitioning to vegetarian or vegan diets [13]. Even within the latter group users may be grouped by their receptivity to meat-replacement products [159].

3.4 Variations on the Food Recommendation Problem

As emphasised in the introduction, what makes food recommendation such an interesting and challenging domain is the fact that the problem itself varies as the user or users have different needs, goals and priorities in different situations.

One way of thinking about more complicated food recommendation situations is to treat these as multi-objective optimisation problems. For example, as people often eat socially, group recommendation becomes important. In group recommendation situations recommendations are optimised to suit multiple taste profiles with the preferences of different users being traded-off or balanced against each other [17, 43]. Similarly, in a health context, the recommendations are not only derived such that they cater for user taste preferences, but also for some additional property that accounts for the healthiness of the food. This is important both for users who wish to prevent illness [45, 65, 180] and those who are ill and wish to manage systems or recover, e.g., diabetes patients [8, 25, 101, 166]. Food choices are increasingly made considering the environmental impact of the dish [13, 79]. Again, as with health-aware systems, recommendations must trade-off the user’s food preferences with some measure of environmental impact to satisfy this goal. This could be measured, for instance, in terms of food miles [79], carbon dioxide emissions [13, 41] or some other combined metric of environmental impact, e.g., [34].

As the literature is most pronounced in the area of health, we will focus on this domain. However, with few exceptions the technical solutions that have been employed to derive healthy in food recommendations could equally be applied for sustainability.

Health-Aware Food Recommendations

One means of providing health-aware food recommendations is to alter the recommendation algorithms to account for some health property. Ge and colleagues [65] achieve this by means of a calorie balance function of the difference between the calories the user still needs (calculated based on foods eaten that day and an estimation function) and the calories of the recipe. The smaller the difference is, the healthier the recipe is estimated to be. This assumes that calorie balance is one indicator for health. A similar approach was suggested by Elsweiler et al. who proposed optimising recommendations based on a weighted linear combination of recipes predicted to score highly with respect to user taste, and low distance from an estimated nutritional requirement [45].

Trattner and Elsweiler experimented with post-filtering methods where recommendation rankings were altered. In such an approach each item (recipe) for a particular user is re-weighted according to a scoring function relating to one of two health metrics (The WHO or inverse FSA metrics discussed above) for the recipe [180]. Their results demonstrated far superior approaches to the linear combination suggested previously by Elsweiler et al.

Ueta and colleagues [188] presented a system, which aimed at recommending recipes with particular health-related goals in mind. The starting point is a user provided query that provides context for recommendations, e.g., ‘I want to cure my acne’ or ‘I want to recover from my fatigue’. To achieve this a co-occurrence matrix was established between 45 common nutrients and nouns, such as ‘cold’, ‘acne’, ‘bone’ etc. By creating a nutritional profile for the user query, recommendations were provided from a large pool of recipes sourced from the recipe portal Cookpad.com.

Moving away from simply recommending different items by re-scoring existing recipes, Chen and colleagues algorithmically generate “healthy” pseudo recipes. A pseudo-recipe consists of a list of ingredients and respective quantities, with the nutritional values of the pseudo-recipe matching the predefined targets as best is possible [28]. To generate a pseudo-recipe, the authors propose an embedding-based ingredient predictor, which represents all ingredients a latent space and predicts the supplemented ingredients based on the distances of ingredient representations. A second component computes the quantities of the supplemented ingredients. The framework was tested on two large online recipe portals, allrecipes.com and yummly, with the experiments showing that the approach is able to improve the average healthiness of the recommended recipes without requiring any pre-computed nutritional information for the recipes.

Rather than generating completely new recipes, another means of making recipes more healthy is to substitute one or more ingredients within recipes to improve the “healthiness” of individual recipes. Scholars have investigated different methods of generating plausible substitutions. Achananuparp and Weber, for example, used food diary data to test several approaches inspired by the distributional hypothesis in linguistics, that is, foods or ingredients that are consumed in similar contexts are more likely to be similar dietarily and can therefore be treated as substitutes [2]. A crowd-sourced evaluation demonstrated the feasibility of such an approach. A different approach to the same problem was taken by Gaillard and colleagues, who extended a generic case-based reasoning system to handle both ingredient substitution and ingredient amounts using a formal concept analysis and mixed linear optimization [62]. Teng et al. used the user comments associated with recipes to generate an ingredient substitution network [174]. Comments in the form of, for example, “I replaced the egg with soya flour to make the cake vegan” were first parsed patterns matching the form of ”replace a with b”, “substitute b for a” etc, were isolated and matched against lists of ingredients. This allowed a directed ingredient substitute network to be built representing users’ knowledge about which ingredients could be substituted. Rather than examining the feasibility of substitutions, the utility of the network was shown in a recommendation task where the system should predict, from a given pair of similar recipes, which one has higher average rating.

How such substitution approaches work in health contexts, however, have yet to be evaluated. Initial steps in this direction were taken by Kusmierczyk and colleagues whose findings illustrate that to some extent it is possible to recommend a user substitute ingredients based on the their previously uploaded recipes and accounting for context information [98]. A further initial effort was published by who performed clustering analysis of foods with diabetic patients in mind. Employing Self-Organizing Maps and K-mean clustering on nutritional components of food items, in order to provide appropriate food item substitutions for diabetic patients.

A different algorithmic approach with health in mind is not to recommend meals as independent items, but to group them to create dietary plans [44, 57]. This fits with nutritional advice suggesting that individual meals themselves are not unhealthy, but rather should combine to create a balanced diet [54].

Lee et al. [101] proposed a system incorporating an ontology, personalisation and fuzzy logic as means to utilise uncertain data and knowledge to create meal plans for diabetic patients. Domain experts were used to evaluate the output of the system and while the details on the evaluation are minimal, the authors claim that the evidence shows that the proposed approach can work effectively and that the menus can be provided as a reference for patients.

In a research project associated with malnutrition in the elderly, Aberg [1] proposed a menu-planning tool which accounted for several sources of information and constraints, many of which were discussed in Sect. 3.2. Aberg accounted for user taste preferences and dietary restrictions (e.g., allergies); the nutritional make up of recipes; how long a meal would take to prepare, as well as how difficult it is to prepare; the cost and availability of the ingredients and the variety of meals in terms of used ingredients and meal category. To account for all these requirements, Aberg employed a design combining diverse technologies including collaborative filtering, content-based and constraint-based recommendation. The constraint-based component constructed the optimal meal plans. The paper presents a prototype system, which recommends meal-plans to users over particular time periods. Although the authors describe an ongoing user-base, to our knowledge no formal evaluation of the system was published.

Elsweiler and colleagues [44] evaluated a meal planning algorithm for a more general user group whose goal is to nourish themselves for well being. Rather than recommending individual food items, a sequence individual items, optimal under certain criteria are recommended. Starting from a personalised recommended recipe ranking generated from a recommender (as described in Sect. 3.1), the algorithm combines two main meals (for dinner and lunch) with a breakfast, plus an allowance for drinks and snacks such that the user’s daily nutritional needs are met. Nutritional needs were calculated firstly by estimating the daily calories that should be consumed by the user and then breaking this down to determine where the calories should be sourced (e.g., from proteins, carbohydrates, fats etc.). A simulated study was devised to test the approach systematically. Plans were created based on recommendations for given user taste profiles mined from a naturalistic dataset, such that they met the needs of diverse personas. Personas were defined user profiles that included details which influence nutritional required nutritional intake, such as height, weight, gender, age, nutritional goal (lose/gain/maintain weight) and activity level (from sedentary to highly active) [44]. While the meal-plan generating component is far simpler than that proposed by Aberg, the evaluation presents analyses on the properties of the generated plans. For example, in addition to testing the feasibility of creating plans that meet theoretical user nutritional needs, the authors explore how plans relate to user taste preferences, as well as diversity

Recommendations for Behavioural Change

Many of the technical and empirical contributions described in this section have the goal of behavioural change of some kind. That is, systems aim to alter the eating habits of the user for his or her own benefit. For example, to prevent illnesses associated with being overweight or obese, healthier recipes, meal plans or food items are recommended. In the literature, technical or empirical contributions have been described in conjunction with theories or frameworks from other fields, such as behavioural sciences. In the remainder of this section we describe contributions from the recommender systems literature that reference or were inspired by such concepts.

The behavioural sciences have provided several theoretical frameworks to discuss behavioural change. One such concept, bounded rationality, allows decisions to be influenced by accounting for the cognitive limitations of the decision-maker. Two competing ideas associated with bounded rationality are those of ‘nudges’ and ‘boosts’. ‘Nudges’ are interventions that steer people in a particular direction while preserving their freedom of choice (e.g., [71, 161]). Boosts, on the other hand, are educational in nature, where interventions provide the user with the knowledge or tools to make better decisions for themselves (see [80] for a detailed comparison of these approaches). In the food recommender literature ‘nudging’ has been explicitly referenced by Elsweiler, Trattner and Harvey [46] who employed machine learning techniques with low-level image properties of recipes photographs to predict, given pair of similar recipes, which would be preferred by the user. In their experiments they show that when two randomly chosen similar recipes are chosen, users chose the recipe with the highest fat content most often. However, when pairs were selected such that the recipe with the lowest fat content was predicted to be the most visibly attractive, this trend was reversed. As has been demonstrated in diverse contexts from politics to energy consumption [175], this kind of ‘nudge’ can be a powerful means to influence individual user choices. One major limitation of ‘nudges’, however, is that no learning takes place and when the nudge is no longer applied ’normal behaviour’ returns. There are also major ethical discussions regarding the freedom of choice for users. By educating users via interventions ‘boosting’ is advantageous in both these respects. To our knowledge, the food recommendation literature has not explicitly referenced ‘boosting’. However, explanations for recommendations, which have been suggested by several scholars (see Sect. 4) would correspond to this kind of intervention. For example, an explanation in the form of:

This recipe was recommended because it contains lentils, which you like. Lentils are an excellent source of B vitamins, iron, magnesium, potassium and zinc.

would correspond to a ‘boost’ as described by Hertwig and Grüne-Yanoff or an ‘educative nudge’ as defined by Sunstein [172].

Most studies have been limited to decision-making in individual moments. Few studies have monitored users over a longer time period to establish long-term behavioural change, which would be more difficult and require ingrained personal and social practices to be adapted [157]. Starke argues that whereas it is easy to achieve a smaller changes in behaviour, the example he provides is eating two cookies a day instead of four, moving away from one’s ingrained behaviour is challenging. To account for the difficulty of changes, Schäfer and Willemsen [157] proposed the use of the psychometric Rasch model to conceptualise nutrient intake as a single-dimensional construct. They argue that a user’s willingness or ability to make changes can be observed via current behaviours. This means that easier changes are associated with high probability of success, while changes associated with higher costs or difficulties are less likely. This can be used to tailor the user’s goals and make change more manageable. Schäfer and Willemsen investigated the idea of tailoring the goals of a nutrition assistance system based on the user’s abilities according to a Rasch scale. Evaluating two versions of a mobile system that tracked the user’s diet and personalised recipe recommendations. The control version targets optimal nutrition and focuses on the six nutrients most necessary of change. The experimental version tailors the advice to the next six achievable nutrients according to a Rasch scale. The results of a two-week study indicate that the tailored advice led to higher success for the focused nutrients and was perceived by users to be more diverse and personalised and therefore more effective.

4 Interfaces

The evidence suggests that when and how food recommendations are provided influences how users interact with these. For example, Trattner and colleague’s study of how online recipes are interacted with over time emphasises that despite the similar functionality and look and feel of the food portals Kochbar.de and Allrecipes.com, the presentation of recommendations has a strong influence on whether or not they will be bookmarked by users [184]. In this section we wish to provide the reader with an overview of the various interface options that have been proposed for food recommender systems. We summarise different interface components that one can find in the literature, providing examples as necessary. We further give one example of a popular commercial application for each of the scenarios defined in Sect. 2.

4.1 Presenting and Accessing Recommendations

Trattner and colleagues [184] demonstrate that since Kochbar only promotes newly uploaded recipes, there is a sharp drop-off in the frequency of bookmarks after a short amount of time has passed. This is not the case on Allrecipes.com where recipes have a longer active lifespan.

The importance of recommendation presentation is also emphasised by Chen and Keung Tsoi’s [27] results when comparing three common layout designs: list, grid and pie. Whereas with list and grid format interactions with items tended to be focused on the top-3 recommendations, interactions were more evenly distributed in pie layout. The non-linear formats seemed to be preferred by users and additional effects, such as in increasing users’ confidence in their decisions.

Current online food portals typically display recommendations as a simple list (e.g., kochbar.de) or in a grid format (e.g., allrecipes.com, cookpad.com). There are many such portals with similar functionality. Despite being three of the most commonly referred to sites in the recommender systems literature, none of these services provide personalised recommendations to users. Rather, they make general recommendations and combine these with faceted search interface.

Research prototypes described in the literature make use of the same display approaches. One can find recommendations presented in a list format, e.g., [73, 188] and as a grid, e.g., [43]. To our knowledge no system has been published where food item recommendations are presented using the pie layout.

Different pieces of information are typically shown for recommended items. Geleijnse [66], for example, present a graphical representation of the healthfulness of a recipe, which may influence whether this meal is cooked or not.

Elahi and colleagues present an interface, which allows users to clarify their needs using tags [43]. Not only do these tags help users specify their preferences, but the authors’ experiments show that this extra information can improve the accuracy of recommendations.

Svensson et al. [173] proposed a system named Kalas that allowed recipes in a large database to be navigated socially (see Fig. 3). Different kinds of social navigation were offered in order to study their respective effects on user behavior. Recipes were grouped into sub-collections with specific themes (e.g., vegetarian or spicy food). Traversing recipes could be influenced by other users logged into the system as the real-time presence of others and their navigation trails is displayed. The recommender functionality—also achieved via a socially related collaborative filtering algorithm—affects which recipes appear on the screen. Moreover, recipes can be commented on by users, and the details of past interactions with recipes is shown. Users can also chat about recipes in a chat function. Kalas is one of the few systems to be evaluated in the literature. 302 participants used the system for 200 days, with 18% of cooked recipes coming from recommendations.

Fig. 3
figure 3

Socially navigating recipe collections and recommendations with KALAS [173]

One point of note in the KALAS evaluation was that half of the system’s users did not understand how recommendations were generated. Explaining recommendations has become an important topic in recommender systems to increase transparency, trust, persuasiveness and satisfaction. We view explanations as a particularly important aspect in food recommender systems, particularly if the goal is to encourage a positive behavioural change. This is underlined by the use of food examples in papers, such as [203]. To our knowledge, however, only a few publications in the food recommender systems make note of explanations.

Elahi et al. [43] present recommendations along with tags as a means to explain recommendations. Moreover, users can explain the ratings they apply using tags. Both Leipold et al. [103] and Schäfer et al. [157] provide textual explanations for the provided recommendations, which emphasise the macro-nutritional benefits of the dish with respect to what is missing in the user’s previous intake and at the same time present in the recommended recipe. The example presented in their paper (see Fig. 4) describes the vitamin B2 content.

Fig. 4
figure 4

Explanations via macro-nutritional content translated from Schäfer et al. [157]

In a restaurant recommendation context location becomes an important factor with respect to the selection of recommendations. Systems, such as that presented by Vico et al. [190] (see Fig. 5), often use maps as a means to justify or explain recommendations [138, 163, 205].

Fig. 5
figure 5

Explaining restaurant recommendations with geographical context (Vico et al. [190])

Research from HCI could provide clues as to how recommendations may be presented in the future including the provision of explanations. Henze and colleagues [78] prototyped different means of augmenting diverse food items with information. In the surveys and focus groups performed to evaluate the concept participants were overwhelmingly positive about the idea and were creative in providing potential use cases.

4.2 Eliciting User Needs and Preferences

This section describes different interface variants for establishing the information required to generate suitable recommendations. This includes establishing user taste-preferences, but also additional criteria, such as the nutritional requirements of users.

Typically recommender systems learn user preferences based on past interactions with food items, e.g., ratings [55], bookmarks [180] or tags [43] applied to recipes. Yang et al. [200] raise different problems with this approach in recipe and restaurant recommendation contexts citing the high cognitive and time load on users, as well as a data sparsity issue. Similar issues have been highlighted for lightweight food diary systems where users take photos of the food they eat (e.g., [37, 201]). Yang and colleagues proposed a system called PlateClick to address these issues and explore the advantages of visual strategies for quickly eliciting user food preferences. The aim was to create an engaging experience for users by repeatedly presenting them with algorithmically generated pairs of visually-similar recipes to choose between. They could either indicate their preference or assert that neither recipe appealed. The system was evaluated by means of a field study of 227 users with the results showing the visual comparison method to significantly out-perform baselines. This approach has subsequently been applied in other experiments and systems including [46, 201].

Increased popularity of conversational assistants, such as Siri and Amazon Echo have led to interest in conversational recommendations [31] including for food [16, 59, 156]. Barko-Sherif and colleagues explored the potential for conversational preference elicitation in a food recommender context via a Wizard of Oz Study [16]. Using a between groups design, spoken and text-input chat interfaces were compared where the user interacted with an assistant to explain and refine the criteria of their needs and preferences. This work demonstrates that such interfaces may hold utility. Samagaio presented a RASA-based chatbot that is able to recognise and classify user intentions in a conversation designed to elicit food preferences for recommendation [156].

Other hardware developments with scope to influence food recommender systems relate to wearable and pervasive technology. One popular idea is to make use of wearable cameras to identify consumed food [132, 133]. Further wearable sensors have also been proposed to detect food intake [52]. Smart trays [211] have been proposed for this purpose too. Another use of wearable technology is to determine user calorie burn as an input to food recommendation algorithm [65].

4.3 Commercial Applications

In contrast to most research applications, commercial products concerning food recommendations often present rich and interactive interfaces while suffering from a lack of algorithmic recommender solutions. Figure 6 presents a popular commercial system for each scenario. The applications were selected by taking the first results on searches for the term cooking/grocery/restaurant/healthy food recommendation application. The most popular cooking application, BigOven,Footnote 1 provides a large recipe database that can be searched by themes, contexts, or ingredients. It further offers interfaces for meal planning and grocery shopping based on selected recipes. The most popular grocery application, AnyList,Footnote 2 offers an interface for organizing grocery items, including their location contextualization, cost aggregation, recipe disaggregation, and a meal planner. The most popular restaurant application, Yelp,Footnote 3 offers a ranked list and a map view of recommended restaurants as well as filters for cuisine preference, pricing, distance, and temporal context. In the case of healthy food, the second result, MyFitnessPal,Footnote 4 was chosen due to the first choice providing only a glossary. MyFitnessPal offers collections of healthy recipes, including their nutritional information, as well as personalized health feedback on current and previous intake when adding these recipes to a personal profile.

Fig. 6
figure 6

Commercial applications for cooking (A, BigOven), grocery shopping (B, AnyList), restaurant visits (C, Yelp), and health (D, MyFitnessPal)

5 Evaluation

Researchers have different means of evaluating food recommender systems depending on the type of problem they are tackling, the stage of development, and the availability of target group users. An overview of their usage over the past 20 years is shown in Fig. 7. The most frequently used method is the offline evaluation of new algorithms based on compiled dataset (e.g., [181]). Another frequently applied method is to use surveys to collect ratings and feedback for a new recommender system (e.g., [126]). A third common type of evaluation is to perform a user study, which can be varied by type and scales, depending research goals and available resources. These methods are described in detail below. Other methods found in the literature are rarer. These include field studies where the system is used in the wild [196], expert evaluations [49], interviews [48], and case studies [205].

Fig. 7
figure 7

Number of papers conducting one of the three major evaluation types over the past 20 years

Offline Evaluations are used across all types of food recommender systems. However, it relies on the existence of a large ground truth or benchmark dataset. These are most easily accessible via recipe databases, restaurant databases, or online grocery stores. The most frequently targeted variable is, in all cases, the rating of items. The dataset is typically split either by a chronological context or by using k-fold cross-validation. The offline evaluation has also been used in the context of substitutes and health. For substitutes, the ground truth datasets can be derived from large consumption databases or from recipe databases. In the health area, recipe databases can be a good source for offline tests if they contain nutritional information. The benefit of this method is the relatively low effort and cost. One of the biggest limitations of this approach is that it cannot capture the actual rate at which users would follow the given recommendation since there is a discrepancy between online rating behavior and actual preference and behavior. One way to get closer to the actual acceptance rate of the new recommendations with relatively low cost is to conduct online surveys.

Online Evaluations come in different forms such as surveys, user studies, field tests, expert evaluations, and interviews. The two main ones used in food recommender systems are surveys and user studies.

Surveys have been conducted via Mechanical Turk [126] or by inviting mailing lists to a survey tool [201]. The goal of most surveys being to attain user feedback with respect to a developed recommender system’s output compared with that of a baseline or a second variant of the system [181]. Surveys are often used for recipes or menu recommenders where the user has a clear preference in mind (see Table 1). In a health context surveys can be used to correlate the user’s reaction to healthy food with other variables such as demographic information or health attitude and intention [5]. Of course, any online survey can also be conducted in person if the required sample size allows for it. The primary advantage of online surveys is that it allows high number of participants to be reached. On the other hand, the quality of feedback can vary largely, which is why safeguards need to be implemented [116] to filter out random responses or contradicting responses.

User Studies offer the richest source of insights but are also time and cost-intensive with many pitfalls in terms of the study design. It is unsurprising, therefore, that most user studies conducted on food recommender systems are connected to the area of health where it is difficult to source offline data. Characteristically, user studies conducted in the food context provide the participants with a system that they can use either in predefined sessions [117] or over a longer period [23]. In addition to measuring the user’s behavior and interaction with the system, feedback on the system and characteristics of the user are often retrieved via questionnaires or interviews before, during, or after the study. User studies can further target different types of insights. The most prominent approach is to compare two systems with each other in order to validate one system’s usefulness or effectiveness, and to gain insights into the user’s behavior for system improvements. In a health context, the duration is of high importance. While limited interaction sessions can show the usefulness of a system, the effectiveness is often only measurable after several weeks or longer of using the system [24].

The variables that are measured during user studies typically focus on the user’s choices and actions in the system, e.g., preferring one recommendation over another or choosing one item from a list, their perception of the system, e.g., a system usability score, and their food-related behavior when using the system, e.g., dietary diary [157] or grocery bills [114]. One limitation of studying the behavior within such a setting is that many users behave differently when they are part of a study (known as the Hawthorne effect [120]).

6 Implementation Resources

In this section we describe implementation resources including datasets and frameworks typically used in the context of food recommender systems research and development. This section also introduces resources to compute and measure, for example, health, flavour and sustainability.

6.1 Recipe Datasets

To date most food recommender systems research has been performed in the context of recipes [179]. That being said, almost no publicly available datasets exist that make it easy for the research community to perform standardized benchmarks. Frequently resources are sourced online, for example, via recipe portals, such as Allrecipes.Footnote 5 Other commonly used Web resources in the European context are Chefkoch.Footnote 6 To obtain these data sources, researchers typically have to implement Web crawlers and be careful not to breach the Terms of Services that prohibit sharing of the gathered data or crawling.

Other data sources that have often been used in the literature in the context of food recipe recommendations are CookpadFootnote 7 and Yummly.Footnote 8 These services support the research community with an affordable licence model to access their recipe collections.

Non Web-based resources that have been employed includes CSIRO’S Wellbeing Diet Book,Footnote 9 which was used by [43, 55]. The dataset is, however, not publicly available.

The Koch-Wiki dataset used in [103] represents a further resource. Unlike the other sources mentioned so far, the Koch Wiki shares data trough a Creative Commons licence, which makes it easy to not only to study the data itself but also to re-use the recipes in ones own research. One limitation is that the data source is in German and does not include any kind of interaction data to perform offline experiments.

Finally, the one million recipe dataset collected by MITFootnote 10 researchers and made publicly available. The dataset comprises over one million recipes including food images and some meta-data and is a web crawl of the most prominent food recipe websites including Allrecipes and Foods. The dataset has to date mostly been used for food classification tasks, such as the image2recipe task, where the aim is to predict the recipe and the ingredients of a recipe given a picture alone as input to a system. The dataset has not been used by many recommender systems researchers as no interaction data is included.

6.2 Grocery Datasets

In the context of grocery shopping only a handful of datasets exist that have been officially published to be used for research purposes.

The first to mention is the so-called instacard dataset made available via KaggleFootnote 11 in 2017. The dataset contains information about three million purchases via instacard in several grocery stores in the US. The dataset contains session information about purchases, as well as item meta-data. The dataset has been used to predict future purchases mostly employing machine learning [20].

Another interesting data source for potential future research in the context of grocery food recommenders is the Tesco Grocery 1.0 dataset,Footnote 12 published in February 2020 in Nature’s Scientific Data repository. The dataset consists of a record of 420 million food items purchased by 1.6 million fidelity card owners that shopped in 411 Tesco shops in London. While the dataset is very interesting, a downside is that it is currently only available in an aggregated format. Session or per user data is not available. The dataset only allows predictions to be made on an area level. A further downside of the dataset is that it does not contain detailed information about the actual grocery items. Only categorical information is available.

A dataset that does contain meta-data information is the grocery dataset published by Klasson et al. in 2019 [93]. Included in the dataset are grocery images and explanations.Footnote 13 Moreover the OpenFoodsFactsFootnote 14 dataset may be the largest open access database of groceries all around the globe with extensive meta-date available. Again, the dataset provides no user interaction data that would it make possible to test personalised algorithms.

With the exception of the mentioned resources, research relies on proprietary Web crawls from AmazonFootnote 15 and other large online retailers that allow the crawling of their data or provide a dedicated API [77].

6.3 Meals, Menus and Restaurant Datasets

To implement meal plan recommender systems, research typically relies on single item representations as, for example, available in the datasets as discussed above, most commonly recipes. Dedicated meal plans as a ground truth data can, however, also be obtained from online sources such as Eatingwell,Footnote 16 FitBitFootnote 17 and many other popular health platforms. These sources provide the possibility to obtain general meal plan templates for a day or a week.

In the context of meals, i.e., a combination of, for example, a starter, main dish and dessert, Allrecipes has been used as a resource in research [28, 46, 180, 181]. Others researchers have used Foods, TudoGostosoFootnote 18 as well as giallozafferano as Italian food website.Footnote 19 In [126] the authors also released a detailed behavioral dataset, including user interactions with the items.Footnote 20

Finally, YelpFootnote 21 is prominently used as a data source in the context of restaurant recommendations for eating out [186]. Other researchers have used Twitter [163],Footnote 22 Foursquare [163],Footnote 23 Tripadvisor [208],Footnote 24 as well as Dianping [207]Footnote 25 and Baidu [205].Footnote 26

All of these data sources are preparatory and no standardised sources exist.

6.4 Flavour Resources

Another important aspect of food one is often confronted with when building food recommendation services is flavour. To compute flavour of a given recipe or ingredient the current approach is to rely on food chemicals. To extract these chemicals, Ahn et al. [4] rely on Fenaroli’s handbook of flavor ingredients [21]. The dataset can be downloaded directly from their article in Scientific Reports. A newer approach to the same problem is the online resource FlavorDB.Footnote 27 FlavorDB is a database that comprises over 25 thousand flavor molecules representing an array of tastes and odor associated with over 900 natural ingredients. The repository has been made available through an online app and is accessible through a Creative Commons licence.

Another useful resources available in the food recommender systems context, though less used yet for that purpose are the online service Foodpairing,Footnote 28 which allows to estimate which types of food go well together with based on taste and smell. While the platform offers promising service, the science behind the service is rather in-transparent and not free of charge.

6.5 Software Frameworks

To date, no dedicated software package exists to build a food recommender system. All of the existing research in the food recommender context either has implemented their own system or has been building their prototypes upon existing recommendation frameworks. This of course creates issues such as, for example, algorithms being developed in the movie domain and used in the food domain, but not suitable for also recommending healthy food items [180]. Another issue is obviously that the results obtained in one work cannot be compared to another study, even on the same dataset as protocols and evaluation metrics often differ. To work towards resolving this issue the well-know LibRec library has been extended especially for the purpose of food recommendations [181], e.g., by integrating recipe content and collaborative features. A downside of the framework, however, is that the framework yet only is available in the Java programming language. Similar to the well-known LensKitFootnote 29 framework it is planned to transition from Java to Python.

A selection of other recommender systems frameworks in other programming languages can be found on Graham Jenson’s Github page.Footnote 30 These frameworks typically provide standard recommender systems algorithms, such as user- and item-based collaborative filtering with nearest neighborhood search as well as BPR, SVD and many others. In general, these existing algorithms work more or less well in predicting the users preferences. Further research, however, is needed to make these more accurate as current experiments show that standard algorithms perform significantly less well for the purpose of recommending foods (e.g., recipes) than movies, their original application domain [180].

6.6 Nutrition Resources

In the context of food recommender systems, it is often essential to know more about a certain food item, such as their energy value or other nutritional properties. These are often essential to develop health-aware food recommender systems or systems that aim for certain food constraints or goals.

To measure and compute the nutrition of a given food item the usual way is to map ingredients to standard databases. Examples of such standard databases used in food computing are the ones provided by the USDAFootnote 31 (US) and the BLSFootnote 32 (Germany). The USDA database is also used in Google’s knowledge graph.

A typical issue associated with these direct mapping principles is that food items can only be mapped correctly and computed if they exactly match the entries in the database. That often causes issues and calls for NLP techniques that are aimed to normalise words [96]. A more practical solution to the problem is to relay on existing frameworks or Web services. Examples of such systems are Spoonacular,Footnote 33 a Web service that is able to extract ingredients, nutrition and correct amounts from noise text inputs as well as Edamam.Footnote 34 A downside of both are that they are commercial services. In [149], however, a method to predict nutrition of recipes based noisy data is discussed. Software and data can be obtained for free from the authors.

6.7 Health Resources

Health resources are needed when implementing health-aware food recommender systems. A typical approach to make a food recommender system “health-aware” is to filter food items in terms of their healthiness. To measure healthiness one can rely on a variety of resources. In the following we discuss the most commonly used these days in research.

A common approach in the real world to inform someone of to what extent a particular food item is healthy via food labels and standards as set by food safety authorities. The latter is typically country dependent. Successfully used so far in the food recommendation context have been the standards as set by the Food Standard Agency (FSA)Footnote 35 in the UK as well as standards as set by the World Healthcare Organisation (WHO)Footnote 36 [180]. Both standards are based on a 2000kcal diet and account for different nutritional properties of a food, see Tables 6 and 7. While the FSA accounts for Fat, Saturated Fat, Sugars and Salt, the WHO guidelines also take into consideration Fibre, Proteins and Carbs. Both metrics have been used in previous research in the context of food recommender systems. Other metrics used to date to measure or compute healthiness is the ‘Healthy Eating Index’Footnote 37 proposed by the USDA to target the US population. More and more such standards are being developed now all over the world, per region and country, and it is expected that they find their ways in food recommender systems of the future.

Table 6 FSA front of package guidelines for healthy eating
Table 7 WHO dietary guidelines

6.8 Food Sustainability Resources

A currently emerging hot topic in the food recommender systems context is sustainability [13]. Only a handful of tools exist to measure to what extent a food item is sustainable or not. One of the most recently released resources is the NAHGAST onlineFootnote 38 tool that considers the following 4 dimensions of sustainability [167]:

  • Environment: Material Footprint (<2670 g/<4000 g), Carbon Footprint (<800 g/<1200 g), Water use (<640 L/<975 L), Land use (<1.25 m2/<1.875 m2).

  • Social: Share of fair ingredients (>90%/>85%)l Share of animal-based food that foster animal welfare (>60%/>55%).

  • Health: Energy (<670 kcal/<830 kcal), Fat (<24 g/<30 g), Carbohydrates (<90 g/<95 g), Sugar (<17 g/<19 g), Fibers (<8 g/> 6 g).

  • Economic: Popularity (without quantified target value), Cost recovery (without quantified target value).

Other related resources in that context can be found at Greenhouse Gas and Dietary choices Open source Toolkit Footnote 39 and Take a BITE out of CLIMATE CHANGE.Footnote 40

6.9 Other Resources

Another useful resource one would like to use when building a food recommender system is provided by foodsubs.Footnote 41 The service includes a food thesaurus that can suggest food substitutes. This is a particularly useful resource to develop future similar item food recommender systems [183] that are able to display recommendations with ingredient alternatives [182].

Furthermore useful resources to develop food recommender systems are food word lists such as provided by enchantedlearning Footnote 42 and Wikipedia.Footnote 43 They are typically used in content-based recommender approaches to normalise ingredient list from noisy data sources such as the Web.

Last but not least, one may also want to employ regional data in a food recommender system such, for example, provided by the Centers for Disease Control and Prevention (CDC) in the US to implement, for example, regional and health-aware food recommender systems [153, 185].

Finally, one would like to employ knowledge-graphs for the purpose of recommending foods to people. Sources in this context are still at an early stage but can be more and more found online see, for example, FoodKG Footnote 44 and in the literature [76]. These may be useful in the future to provide semantic recommendations.

Plenty of further resources can be found on the Web to construct food recommender systems. It is though recommended that one validates the quality of these sources with care as they often lacking scientific foundation.

7 Conclusion

Lessons Learned

In the following, we attempt to extract key takeaways from the literature. In particular we use our experience to provide guidance for practitioners unfamiliar with the research and who wish to develop a food recommender system. In our view the biggest mistake one could make would be to reduce the problem to a single algorithmic problem. While there is no evidence of one particular algorithm being best suited to different food recommendation problems, differences do occur in terms of what should be shown, how items should be shown and to whom. More concretely, the research results so far encourage practitioners to:

  • Think about the task: What people want to be recommended differs depending on the application (i.e. the type of item the system recommends. See Sect. 4)

  • Consider who the users are and what they may need: The evidence shows that different groups of users have different preferences and priorities when looking for food items (see Sect. 3.3)

  • Consider context: What people want to eat changes depending on a wide range of context variables and it is important to consider how these may affect your system (see Sect. 3.2).

  • Exploit the visual nature of food choice: Food choices in digital environments are highly visual (see Sect. 3.4) and this is one of the few constant findings across different settings. This is even true across cultures (see Sect. 3.2).

Summary and Future Work

In the following, we shed light on the food recommendation problem. We have provided a review of the current state-of-the-art in the field from an algorithm, interface, evaluation, and implementation resources perspective. We try to summarize the most important findings and points of our work:

  • At the moment, no theory about food recommender systems exists. There is research that tries to combine theory about food choice but no work that has tried to integrate systems into a theory.

  • Most research on food recommender systems is in the context of health, but less has been done in sub-fields such as cooking, grocery shopping, or restaurant visits, including menu items.

  • Algorithms are mostly based on standard recommender systems approaches, and not many specialized algorithmic developments exist.

  • Most recommender system approaches in food typically integrate a collaborative component and use algorithms such as collaborative filtering.

  • Algorithms in food recommender systems are mostly focused on single users. The group context exists but is not the main focus of research at the moment.

  • When it comes to the evaluation of recommender system algorithms, there is furthermore a lack of standardized datasets. Still, there is no initiative that tries to collect these preparatory datasets. The reasons for this are GDPR or copyright limitations for behavioral data or recipe data.

  • Furthermore, when it comes to evaluation, offline assessments are still dominating, but online evaluations have been catching up, which is commendable, as online evaluation protocols capture the real world better than offline simulations.

  • There has been little research on how different interfaces change the effectiveness and acceptance rate of food recommender systems. Only a few studies test interfaces to their algorithms at all, and none compare different variants of interfaces for the same recommender system.

  • Finally, when it comes to the availability of implementation resources, many tools exist, but no standardized food recommender systems framework, which would allow the community to build upon and advance the research field.

Besides these main findings, our review of existing work reconfirms the gaps and challenges identified by previous more specialized surveys of food recommender systems, as shown in Table 8. For example, personalization of health-focused food recommender systems by the user’s genome, microbiome, blood values, or changing behavioral patterns is a major gap identified by five surveys. In behavioral personalization, this challenge is further complicated by the accurate and effortless extraction of user data regarding past eating behavior. One solution to this issue and significant challenge itself is the extraction of information from food images and the accurate integration of food-related sensor data, e.g., volatile organic compounds sensors. Besides health-related issues, both the prediction of preference or other success criteria and standard algorithms’ performance is still below that in other fields of recommender systems, especially when regarding multiple criteria at once. We further confirm missing contextualization as one large research gap across different food recommender systems that might lead to much higher acceptance rates.

Table 8 Challenges and suggestions for future research listed in previous food recommender surveys.

In summary, there are many challenges ahead when it comes to solving the food recommendation problem. While this may sound limiting for practitioners, this is, on the other hand, exciting from a scientific point of view, as there are many open issues and questions that need to be resolved. Therefore, it may be some more time before we see such systems all over in our everyday lives, as is the case with other commercial recommender systems, such as media content recommender systems.