1 Introduction

Chronic diseases, such as heart disease, cancer and diabetes, are responsible for approximately 70% of deaths in Europe and the USA each year, and they account for about 75% of health spending (Footnotes 1 and 2). Such chronic diseases are largely preventable by eating healthily, exercising regularly, avoiding (tobacco) smoking and receiving preventive services. Prevention at every stage of life would help people stay healthy, avoid or delay the onset of diseases, and keep diseases they already have from becoming worse or debilitating; it would also help people lead productive lives and, in the end, reduce the costs of public health.

In the last decades, healthcare systems in many countries have invested substantial effort in informing people about the benefits of adopting healthy behaviors in their lives (Intille 2003). Given the increasing popularity of mobile and personalized applications and devices (e.g., smart watches), an interesting opportunity for the digital health domain is the design of artificial intelligence (AI)-enabled tools providing user-tailored services.

Internet-based and mobile technologies allow the collection of data from personal devices, off-the-shelf wearable sensors and external sources, exploiting these data to generate effective personalized recommendations and to engage people in developing and maintaining healthier patterns of living. The orchestration of these data sources is a challenging task since it is necessary to address the design and development of effective and efficient mobile ad hoc solutions able to manage data intensive smart health applications (Comito and Talia 2017).

However, according to Fogg (2002), such tools can increase the persuasive power of an intervention.

To carry out this task, a system providing personalized support for a healthy lifestyle has to take into account and reason on a considerable amount of knowledge from different domains (e.g., user attitudes, preferences and environmental conditions), in order to generate effective personalized recommendations, and to adapt the message in response to both the environment and the user status.

To this end, engaging people in developing and maintaining healthier patterns of living is a challenging task as well. The generation of effective personalized recommendations implies, for example, the justification of given suggestions and the adaptation of messages in response to changes in the environment and in the user status. For this reason, as opposed to hardwired persuasive features, systems are required that apply general reasoning capabilities to provide flexible persuasive communication based on rich and diverse linguistic outputs. In this context, modeling persuasion mechanisms and performing flexible and context-dependent persuasive actions are more ambitious than most current approaches to persuasive technologies (see Captology, Fogg 2002). The works focusing on context-dependent persuasive actions usually leverage a domain and a user model that are exploited by (probabilistic) reasoner systems (Carolis and Mazzotta 2017) or argumentation-based systems (Hunter et al. 2019; Chalaguine et al. 2018). The reader may refer to Agnisarman et al. (2018) for a survey.

In this paper, we present the personalization capabilities of the HORUS.AI solution (Dragoni et al. 2018) (Footnote 3). In particular, we describe how our solution works as a persuasive personal health assistant supporting tailored interactions with users and patients by assisting them in adopting a healthy lifestyle and in self-managing chronic diseases associated with bad lifestyle habits. This work extends what we already presented and evaluated in two previous research contributions. In Dragoni et al. (2018), we presented the overall HORUS.AI architecture and the knowledge layer together with the motivations behind the choice of relying on semantic technologies. These motivations are also recapped in Sect. 5.2.2. The details of the knowledge layer and, in particular, the conceptual model upon which the HORUS.AI solution has been developed are discussed in Dragoni et al. (2018). Here, we focus on the dialogue-based persuasive layer (see Sect. 5) that implements the tailored persuasive interactions with the users. This module has been exploited for: (i) acquiring personal data from users and patients and (ii) providing personalized support concerning the adoption of a healthy lifestyle or the management of their chronic diseases based on the results of personal data processing. Such results are produced by logic reasoning operations, described in Dragoni et al. (2018), and translated into motivational strategies and messages by the dialogue-based persuasive layer itself.

To the best of our knowledge, the contribution proposed in this paper is innovative with respect to the state of the art for two reasons. First, we discuss the capability of the HORUS.AI solution to exploit a knowledge-based encoding of persuasive strategies provided and managed by different actors (i.e., different categories of healthcare professionals). In this way, the solution demonstrates that it is general with respect to the scenario in which it may be deployed. Second, we discuss the technological implementation of the component in charge of performing rule-based reasoning activities on monitored people's behaviors. This component is responsible for triggering the generation of tailored and persuasive feedback to end users.

The proposed architecture has been validated within a pilot project (named Key to Health) run at Fondazione Bruno Kessler (FBK), where a mobile application linked to the HORUS.AI solution has been used by a group of 120 users for 49 days. Our aim was to observe whether HORUS.AI would be able to support them in improving the quality of their lifestyle. This pilot project falls in the context of the workplace health promotion (WHP) initiative. WHP, defined as the combined efforts of employers, employees and society to improve the mental and physical health and well-being of people at work (Footnote 4), aims at preventing the onset of chronic diseases related to an incorrect lifestyle through organizational interventions directed to workers. Actions can concern the promotion of a correct diet, physical activity and social and individual well-being, as well as the discouragement of bad habits, such as smoking and alcohol consumption.

The rest of the paper is structured as follows. Section 2 provides a brief overview of available solutions for the monitoring of people’s lifestyles in real-world scenarios. In Sect. 3, we present the Key to Health project in which the proposed approach has been deployed and validated. Section 4 introduces the theoretical psychological framework integrated into HORUS.AI. Then, in Sect. 5 we present the details of the persuasive engine in charge of managing the interactions with users. Section 6 presents an evaluation of HORUS.AI from both the quantitative and qualitative perspectives. This is followed by Sect. 7, where we discuss some lessons learnt during the project along with the qualitative feedback on the user experience. Finally, Sect. 8 concludes the paper.

2 Related work

Systems for personalized healthy lifestyle recommendations and advice fall in the broad area of the decision support. The goal of these systems is to help and guide users in taking healthy informed decisions about their lifestyle, regarding aspects such as food consumption or everyday physical activities. Such systems support humans in taking better decisions (e.g., by suggesting some physical exercises or conscious food consumption), similarly as a human expert would do, based on available data (e.g., nutrients ingested in the past meals, user’s health conditions), and to communicate these decisions to the users according to their preferred means and modalities. Looking at the state of the art, the use of dialogue-based systems and conversational agents is becoming increasingly significant and relevant in the healthcare domain and in the development of mobile health applications (Survey of conversational 2019; Wyke et al. 2019). The work in Laranjo et al. (2018) presents a systematic review of studies regarding conversational agents developed in the healthcare domain; the authors found that the use of natural language input capabilities is still only at an experimental stage without reliable efficacy or safety evaluation. However, given the increasing use of dialogue-based systems, especially in the healthcare domain, the authors of Hoermann et al. (2017) present a review of the state of the art focusing on on-line mental health interventions that use text-based synchronous chats. The 24 articles reviewed showed improvements in mental health and positive outcomes following the text-based interventions; however, the authors underline the necessity of further work and research to improve the mode and timing of the intervention. In this work, we integrated multi-step dialogues but, since we did not want to increase the risk of a low acceptability of the system, all users provided information in a controlled way without using natural language text. 
Hence, in this case we cannot state that our system includes a full-fledged conversational architecture. Its design and integration are part of future work.

The work in Kim et al. (2001) presents a web-based prototyping and clinical conversation system that aims to integrate extensive clinical data with users’ medical record systems, creating a comprehensive clinical information system. However, the study showed that much improvement is still needed in order to create a well-designed interactive clinical dialogue program that integrates clinical information systems with patients’ medical records.

The work in Sama et al. (2014) is a review of mobile health applications focusing on the prevention of diseases or on the treatment of chronic diseases. Among the 400 applications reviewed, self-monitoring was the most common user engagement method; however, many applications also applied progress tracking, especially for the tracking of physical activity using wearable devices. Approaches based on user engagement were also found, but they are limited and in need of development and improvement. This article shows the importance of providing effective monitoring or tracking methods in health-promoting systems and applications. In van der Weegen et al. (2015), the authors present a monitoring and feedback tool embedded in a self-management support program to stimulate physical activity. The tool paired with counseling was effective, but it needs improvements such as tailoring and customizing the intervention to specific users. The availability of small, cheap and customer-accessible tracking devices eases their utilization across a wide variety of contexts. A way to use monitoring devices to track activity and collect personal data to support users in self-managing chronic diseases is discussed in Chiauzzi et al. (2015). Still, the authors notice that there is room for improvement on how to use the data and how best to engage the user in self-management. To be effective and impactful, monitoring devices must be valid and reliable; moreover, user-personalized design improves the engagement and, ultimately, the effectiveness of the program. The articles reported before show how there is still a lot of work and research to do toward the automatic tracking of physical activity, but much work has still to be done in the reporting and logging of dietary habits and food eaten; these tasks are still manual and usually time-consuming.

Concerning the exploitation of persuasive technologies in health care, there are many studies regarding health promotion and disease risk prevention, which address system design and implementation along with effectiveness evaluation (Cawsey et al. 2000). These systems can be classified in two broad categories: vertical and horizontal solutions.

Vertical systems implement solutions that are tailored for a specific domain and usually rely on ad hoc strategies such as canned texts. These systems have the advantage of being effective on the domain, but their flexibility is usually low and extensive reengineering is required to port them to new domains. In Berrouiguet et al. (2016), the authors present a systematic review of mobile phone and web-based text messages to promote mental health (reminders, information provision, tailored and standardized supportive messages and self-monitoring instructions). Of the 36 studies considered, 35 show the positive impact of text messaging on patient motivation to improve their health and encourage treatment. Other studies, such as Head et al. (2013) and Dijkstra (2014), show that tailored and personalized messages with variety in frequency are most efficacious, mainly in physical activity and smoking cessation interventions. In Job et al. (2017), researchers conducted an exploratory study to evaluate the acceptability of tailored text messaging when used in the maintenance phase. Women involved in the study received encouragement messages to adopt healthy behaviors and text messages to prompt self-reported weight, goal setting and goal monitoring. Also in this case, positive results showed the importance of the tailored content and scheduling of text messages. Similarly, in Colineau and Paris (2011) the authors observed that the use of goal setting proved effective in a family environment. The studies mentioned above give an objective validation of the use of tailored and personalized persuasive messages in behavior change, thus further supporting the use of this kind of message in our application. Besides, HORUS.AI enables the dynamic creation of persuasive messages based on the profile of the specific user and on the data provided.

On the contrary, horizontal solutions are not bound to a particular domain and they try to address the problem of rich persuasive generation from a general perspective. Horizontal approaches have the potential of being easily portable and adaptable; however, they usually remain at a theoretical or proof-of-concept level. For example, in Oinas-Kukkonen (2013), Kelders et al. (2016) and Oinas-Kukkonen and Harjumaa (2009) the authors make an important contribution by defining a persuasive systems design model for behavioral change support systems. These works detail the concepts and methodology for the design and evaluation of persuasive behavior change systems adaptable to different domains and contexts. Focusing on generative aspects, some seminal works on argumentation-based text generation have been proposed (Zukerman et al. 2000; Reed et al. 1996), but the authors focus on the validity of the generated messages rather than their effectiveness. A more recent approach, presented in Pan and Zhou (2014), introduces a persuasive framework combining natural language generation (NLG) strategies with information gathered from social media.

Turning to the specific task of generating motivational messages for health promotion, in op den Akker et al. (2015) the authors present a theoretical framework for representing real-time tailored messages in behavior change applications that can be adapted to different generation strategies ranging from canned text to deep generation. Four important properties of a motivational message are considered: timing, intention, content and representation. This framework inspired the HORUS.AI persuasive engine development (see Sect. 5). However, differently from our work, it has not been instantiated in any real system.

Occupational health can be important and can have an impact on workers’ lifestyle, but not many studies have tested the actual reliability of such programs. Robroek et al. (2012) tested the participation in and attendance of an Internet-delivered health promotion program at six different workplaces. Participants were asked to answer a questionnaire on lifestyle, work and health factors. They were also offered a physical health check and personal advice on a healthier lifestyle. During the period of the study, the participants could visit the website to access information on lifestyle, health and personalized feedback; they could also use web-based tools to self-monitor food intake and to ask questions. Users received monthly email messages to promote website visits. The study showed a need for more appealing and engaging techniques to attract people to the program.

Melzner et al. (2014) proposed an integrated framework for mobile application adoption in workplace health promotion, focusing on four drivers for technological intervention: attitude beliefs (in terms of usefulness, usability, severity and enjoyment perception), influences (external and interpersonal), facilitating conditions (availability of resources to execute a behavior) and cues to action (feedback and motivational messages). The authors conclude that implementing features tailored to the specific context of the workplace in the application can improve users’ perceived usefulness and enjoyment. Balk-Møller et al. (2017) evaluated the efficacy of a web- and app-based solution for workplace health promotion: social features, competition and prizes motivated people to lose weight and adopt a healthy lifestyle. Self-reporting of diet and exercise was required, and feedback, practical tips and tricks on health and well-being were sent to the employees with periodic text messages and emails, set according to each user’s preferences. The authors report that the competition method of intervention used in the study was not effective enough to keep users motivated beyond an initial period.

We summarized above the main research directions and approaches dedicated to the healthy lifestyle recommendation research topic. Here, we also want to connect our work with the role played by social media in the effectiveness of persuasive techniques. This promising research direction also implies significant ethical aspects (e.g., exploiting the behavior of famous people to trigger a behavior activation in a patient) that have not been addressed in the current version of HORUS.AI and are left as future work.

Social media brought a new way of providing information due to the use of slang and other ways of communicating between people (Barbosa and Feng 2010). Social media have a significant influence on people in many areas of everyday life. This topic has been dealt with by many studies in different specific domains, e.g., fashion (Brambilla et al. 2021), mental health (Jaini et al. 2020; Keles et al. 2020; Berryman et al. 2018; Ivie et al. 2020), health behaviors (Moreno et al. 2018), alcohol consumption (Egan and Moreno 2011), sexual behavior (Romo et al. 2016), young people’s health and well-being (Moreno et al. 2018; Shafi et al. 2018), as well as the influence of social media on eating behavior and lifestyle (Blundell and Forwood 2021). Understanding the factors that may trigger behavioral activation within users is fundamental to support the successful translation of healthy lifestyle goals into the target of such behavioral activation (Lassen et al. 2016; Carrillo et al. 2011). We can consider, as an example, how social media like Instagram changed the way many people consume food (Walsh and Baker 2020). Food pictures are used on Instagram in photographic exchanges to identify and interact with the community (Walsh and Baker 2020). The food topic is the second most popular one on Instagram after selfies (Amato et al. 2017). Popular social media users called influencers have a strong impact on their followers’ decision-making (Mardon et al. 2018; Hudders et al. 2021; De Jans et al. 2021; Coates et al. 2020). In the field of healthy food, they replace the already established food personalities and celebrity chefs and become the creators of healthy eating rules (Goodman and Jaworska 2020) and informal sources of health education (Marks et al. 2020). Influencers increasingly change the behavior of individuals in connection with food choice and diet and thus play a crucial role in public health (Byrne et al. 2017).
As mentioned above, this is a very important ethical aspect to manage if tailored persuasive systems like HORUS.AI decide to exploit the social media context.

With respect to the works presented in the literature, we provide a full-fledged AI-based solution supporting (i) the modeling and storing of all the data required to provide personalized healthy lifestyle support, as well as (ii) the definition and execution, via a reasoning engine, of a dynamic set of rules performing real-time monitoring of people’s lifestyles. The output of the reasoning task is then used for suggesting that people change their habits in order to follow healthier behaviors. Our proposal fits in the context of ontology-centric decision support systems (Rospocher and Serafini 2012), as all the data processed (e.g., user profile, meals) and produced (e.g., violations) by the system are stored in an ontology-based repository. To the best of our knowledge, our contribution is innovative with respect to the state of the art due to the capability of exploiting domain-specific knowledge provided and managed by different actors (i.e., different categories of healthcare professionals) with the aim of performing rule-based reasoning activities, at a very fine-grained level, for monitoring people’s behaviors in the healthcare context.

3 The Key to Health project

As introduced in Sect. 1, this work is part of the Key to Health project. This project is part of a workplace health promotion initiative run inside FBK in line with the declaration on workplace health promotion in the European Union. The pilot activity was supported by HORUS.AI, i.e., an AI-enabled system that monitors people’s behaviors by means of knowledge-based technologies in order to persuade them to follow healthy lifestyles.

The system relies on four layers. The input layer is responsible for storing events that trigger the platform activities, and it accounts for the ability of a persuasive system to sense the context of interaction. These events are of two types: (i) data input, where data are sent from the input layer to the knowledge layer, and (ii) context communication, where contextual information is sent from the input layer to the persuasion layer that may exploit this information for persuasive purposes. The knowledge layer encompasses two kinds of information: (i) the Augmented Domain Knowledge, i.e., the structured representation of the domain of interest including those relations that are relevant for persuasion purposes, such as the similar-taste relation or the categorization of food properties into negative and positive ones, and (ii) the Monitoring Knowledge, i.e., the structured representation of the rules driving the behavior change process (i.e., the rules that a user should follow). The persuasion layer exploits the output of the knowledge layer (i.e., reasoning operations) for choosing the persuasive intentions to include in the natural language generated messages, and focuses on the tasks of selecting the arguments to include in the message, ordering them and choosing the right wording for each argument. Finally, the output layer is in charge of closing the loop by providing the feedback to users. It represents the many devices that are able to receive the data produced by the persuasion layer and convey the physical feedback to users.
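The flow across the four layers can be sketched as follows. This is a minimal illustration in Python, not the HORUS.AI implementation: every name (`InputEvent`, `knowledge_layer`, `persuasion_layer`, `output_layer`, `run_pipeline`), the step-count rule and the `min_steps` value are assumptions introduced only for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InputEvent:
    """Event stored by the input layer (hypothetical structure)."""
    user_id: str
    data: dict                                   # data input, sent to the knowledge layer
    context: dict = field(default_factory=dict)  # context, sent to the persuasion layer


def knowledge_layer(event: InputEvent, min_steps: int = 8000) -> Optional[dict]:
    """Toy stand-in for reasoning: flag a violation when steps fall below a goal."""
    steps = event.data.get("steps")
    if steps is not None and steps < min_steps:
        return {"user_id": event.user_id, "steps": steps, "goal": min_steps}
    return None


def persuasion_layer(violation: Optional[dict], context: dict) -> Optional[str]:
    """Select the arguments, order them and choose a wording (canned text here)."""
    if violation is None:
        return None
    parts = [f"You walked {violation['steps']} steps today (goal: {violation['goal']})."]
    if context.get("weather") == "sunny":
        parts.append("It is sunny outside: a short walk could close the gap.")
    return " ".join(parts)


def output_layer(message: Optional[str]) -> List[str]:
    """Deliver the feedback; a real system would push it to a device."""
    return [message] if message else []


def run_pipeline(event: InputEvent) -> List[str]:
    violation = knowledge_layer(event)                    # data input path
    message = persuasion_layer(violation, event.context)  # context communication path
    return output_layer(message)
```

Calling `run_pipeline(InputEvent("u1", {"steps": 3000}, {"weather": "sunny"}))` returns a single message that combines the feedback on the detected behavior with a context-dependent remark, while an event that meets the goal produces no message.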

Within the Key to Health project, a mobile application integrated within the HORUS.AI solution has been used by 120 FBK workers (both researchers and employees) as a tool to persuade and motivate them to follow WHP recommendations. Appendix 1 (Figures 10 to 18) reports a set of screenshots showing the main functionalities of the application with a brief description of their purposes. The engagement of the users was on a voluntary basis and no incentives were adopted. Indeed, all the users were already motivated to participate in the study (from the behavioral intervention perspective, they were all in the contemplation stage). All involved users have been equipped with smart bands that synchronized information about steps and physical activity data with our system. However, users were also asked to manually insert a report of performed activities in order to validate the synchronized information. Part of future work is to reduce the effort required to acquire physical activity information.

This study represented a first step within the Trentino Salute 4.0 digital health initiative, which aims at extending the availability of the HORUS.AI solution to the whole Province of Trento before the end of 2019 and to the Italian national level during the two-year period 2020-2021.

Table 1 shows the main demographic information concerning the users involved in the evaluation campaign. All users were in a healthy status. Indeed, in this first pilot we decided not to involve people affected by chronic or other diseases.

Indeed, the involvement of patients affected by some disease requires the integration of the AI-based system within clinical practice. This means that each AI-based service must be certified by an ethical committee. This aspect is within our long-term agenda, but it is out of scope of this work.

Table 1 Distribution of demographic information of the users involved in the evaluation campaign

4 The HORUS.AI behavioral change strategies

The theoretical foundations of the personalized behavior change interventions implemented in HORUS.AI are inspired by the behavioral intervention technology (BIT) model presented in Mohr et al. (2014). This BIT model is a broad framework that combines behavioral principles with technological features. The model supports the translation of the clinical (and well-being) aims of a BIT treatment and its intervention components into BIT features that are easy to implement. The state of change of a user can be solicited by using a sequence of possible interventions that compose a treatment. The components of the reference BIT model are organized into (i) a theoretical level (why and conceptual how components) and (ii) an instantiation level (what, technical how and when components). Why describes the aim of a particular intervention (e.g., weight loss). Conceptual how defines the behavior change strategy (BCS) chosen for achieving the aim (e.g., Feedback, Goal, Monitoring, Education, Motivation Enhancement). A BCS is instantiated by a series of BIT elements, which are the what of the model (e.g., messaging, information delivery, reports, etc.). The technical how defines the technical characteristics of the intervention (e.g., complexity, the medium, the esthetics, etc.). Finally, the workflow of the intervention describes when each element of the intervention is delivered to the user (e.g., frequency, conditions, etc.).
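The five components and the two levels described above can be captured in a small data structure. The sketch below is only an illustration in Python: the class name `BitIntervention`, its field names and the example values are assumptions, not part of the BIT model specification or of HORUS.AI.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BitIntervention:
    """One intervention described along the five BIT components."""
    why: str             # aim of the intervention
    conceptual_how: str  # behavior change strategy (BCS)
    what: List[str]      # BIT elements instantiating the BCS
    technical_how: dict  # medium, complexity, esthetics, ...
    when: dict           # workflow: frequency, conditions, ...

    def theoretical_level(self) -> dict:
        """Why and conceptual how."""
        return {"why": self.why, "conceptual_how": self.conceptual_how}

    def instantiation_level(self) -> dict:
        """What, technical how and when."""
        return {
            "what": self.what,
            "technical_how": self.technical_how,
            "when": self.when,
        }


# Illustrative values only, taken from the examples quoted in the text.
example = BitIntervention(
    why="weight loss",
    conceptual_how="Feedback",
    what=["messaging", "reports"],
    technical_how={"medium": "text"},
    when={"frequency": "daily"},
)
```

Keeping the two levels as separate views makes it explicit that the same aim and strategy could be instantiated with different elements, media or workflows.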

The instantiation of HORUS.AI in the domain of healthy lifestyles combines all the components proposed in the BIT model, as detailed in the following sections. Both behavioral change strategies (Sect. 4.1) and BIT elements (Sect. 4.2) were selected by a team of behavioral change strategy experts (psychologists) with good knowledge and experience in the healthy lifestyle domain. The selection has been based both on their experience and on the BCS literature. To this end, we report the taxonomical identifiers of the implemented behavioral change strategies according to the taxonomy in Michie et al. (2013).

4.1 BIT theoretical level

At the theoretical level, the why component simply coincides with the aims of the specific motivational application implemented by HORUS.AI. As we explain below, in the Key to Health project we designed an application for promoting healthy behaviors at the workplace, an evolution of the one presented in Dragoni et al. (2017). Specifically, the aims of the interventions are to motivate people to adopt a better diet that follows the rules of the Mediterranean diet and to increase the level of physical activity.

The conceptual how is realized in HORUS.AI by the implementation of several strategies of intervention: Goal setting, Monitoring, Feedback, Argument, Suggestion and Motivational enhancement. Goal setting (taxonomical id 1.1 in Michie et al. (2013)) foresees the assignment of goals in the health areas where the user needs to improve according to his/her profile. Monitoring (taxonomical id 2.5) comprises both the initial collection of data from the users for assigning them to a specific profile (e.g., smokers, people who consume too much processed meat) and the collection of data, through dialogue, diary or sensors, to infer the users’ behavior and the fulfillment of the assigned goal(s). Feedback (taxonomical id 2.2) is the delivery of informative messages about the behavior of a person. These messages include information about food consumption (e.g., the number of meals with cold cuts in a week), physical performance (e.g., the number of daily steps) and the adherence to or violation of a specific goal. Argument (taxonomical id 5.1) is the delivery of informative messages about the consequences (negative or positive) of performing a particular behavior that does not meet a goal. For example, such messages could state the possible long-term negative consequences (e.g., hypertension, cardiovascular disease) of consuming too many cold cuts. Suggestion (taxonomical id 8.2) is the delivery of informative messages about alternatives to a behavior that does not meet a goal. According to the user’s profile, the system could suggest the consumption of alternative food instead of red meat or a step-by-step reduction of red meat. Motivational enhancement (taxonomical id 3.1) in the Key to Health project is provided by the intervention of a human counselor alongside the intervention of the application.
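The strategies and their taxonomical identifiers can be kept in a single lookup structure, as in the minimal Python sketch below. The identifiers are those quoted in the text; the `Strategy` enum, the `strategies_for_violation` helper and its selection logic are hypothetical and only illustrate how a message combining feedback, argument and an optional suggestion might be planned.

```python
from enum import Enum
from typing import List


class Strategy(Enum):
    """BCS names with the taxonomy identifiers quoted in the text."""
    GOAL_SETTING = "1.1"
    MONITORING = "2.5"
    FEEDBACK = "2.2"
    ARGUMENT = "5.1"
    SUGGESTION = "8.2"
    MOTIVATIONAL_ENHANCEMENT = "3.1"


def strategies_for_violation(has_alternative: bool) -> List[Strategy]:
    """Hypothetical plan for a goal violation.

    Report the behavior (Feedback), explain its consequences (Argument)
    and, when an alternative is known, propose it (Suggestion).
    """
    chosen = [Strategy.FEEDBACK, Strategy.ARGUMENT]
    if has_alternative:
        chosen.append(Strategy.SUGGESTION)
    return chosen
```

For instance, `strategies_for_violation(True)` yields Feedback, Argument and Suggestion in that order, which mirrors the order in which the three strategies are described above.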

4.2 BIT instantiation level

At the instantiation level, the what elements implemented in HORUS.AI are Notifications, Information delivery, Logs, Visualizations and Reports. Logs are implemented: (i) through a dialogue for collecting information about the profile of a user and (ii) through a diet diary for gathering the input data about food consumption. In the BIT literature, the self-reported diary is known to be affected by the underreporting problem: users may misreport their food intake due to a possible social desirability bias (Hebert et al. 1995; Miller et al. 2008). Coping with it is not easy, and a combination of detection and prevention methods is the best solution (Nederhof 1985). We did not address this in the Key to Health project, and it is left as future work. However, in this project, the users are in a state of change in which they are ready to accept and meet persuasion goals. This, in principle, would reduce the underreporting problem.

Notifications and Information delivery at this level realize the feedback, argument and suggestion BCS of the theoretical level, and can be part of a more complex dialogue constituted both by motivational utterances to reinforce or discourage detected behaviors and by advice, recommendations or useful information. Graphical visualizations and written reports provide users with summaries of their behavior in relation to the assigned goals. In this version of HORUS.AI, the medium at the technical how level is text and graphical representations, with a complexity that depends on the specific application (e.g., messages for medium-literacy employees can be more complex than those for a wide audience) and on the motivational strategies adopted. Finally, the when level is realized in HORUS.AI by the implementation of time-based rules and event-based rules, which also depend on the context. For example, an intervention asking the user to provide information about food consumed is delivered after the main meals. A message with feedback, argument and possible suggestion is delivered when the system infers the user’s behavior from input data.
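The distinction between time-based and event-based rules at the when level can be illustrated with a small dispatcher. The Python sketch below is an assumption-laden example, not the HORUS.AI scheduler: the `Trigger` class, the `due` function, the 13:30 delivery time, the `behavior_inferred` event name and the `do_not_disturb` context flag are all hypothetical.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, List, Optional


def _always(ctx: dict) -> bool:
    """Default context condition: always satisfied."""
    return True


@dataclass
class Trigger:
    name: str
    at: Optional[time] = None        # time-based: fire at this clock time
    on_event: Optional[str] = None   # event-based: fire on this event type
    condition: Callable[[dict], bool] = _always  # context-dependent check


def due(triggers: List[Trigger], now: time, event: Optional[str], ctx: dict) -> List[str]:
    """Return the names of the interventions to deliver right now."""
    fired = []
    for t in triggers:
        time_ok = t.at is not None and t.at == now
        event_ok = t.on_event is not None and t.on_event == event
        if (time_ok or event_ok) and t.condition(ctx):
            fired.append(t.name)
    return fired


TRIGGERS = [
    # Ask for a food report after a main meal (the time is an assumed value).
    Trigger("ask_lunch_report", at=time(13, 30)),
    # Send feedback once the system has inferred the user's behavior.
    Trigger(
        "send_feedback",
        on_event="behavior_inferred",
        condition=lambda ctx: not ctx.get("do_not_disturb", False),
    ),
]
```

With these definitions, `due(TRIGGERS, time(13, 30), None, {})` selects the meal report request, while a `behavior_inferred` event selects the feedback message unless the context forbids it.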

The elements at both levels sketched above are discussed in more detail in Sect. 5, where the implementation of the HORUS.AI dialogue-based persuasive layer is presented.

5 PerSEO: the dialogue-based persuasive layer

In this section, we describe PerSEO (Persuasive mEssage generatOr), the HORUS.AI component in charge of generating and managing personalized dialogues to sustain effective interactions with the users and to motivate them to adopt correct behaviors, both in the self-management of chronic diseases and in the maintenance of healthy lifestyles. Dialogues and messages generated by PerSEO are contextualized, since they unfold according to the users’ data (either explicitly provided or implicitly acquired from sensors) and to the responses provided by users to system utterances.

This component is shown in Figure 1. We only report the HORUS.AI modules involved in the customization of the health recommendations as the others are out of scope.

Fig. 1
Extracts of the HORUS.AI solution. Arrows represent data flow and tables contain the BIT elements and the related BCS presented above. The number in parentheses is the identifier in the taxonomy in Michie et al. (2013)

To help the reader understand the system in Fig. 1, we provide a scenario with a running example. Our user is a 25-year-old man with a medium physical activity level. He consumes large amounts of food and beverages that are high in sugar and would like to decrease this consumption. The first interaction with the mobile application concerns the collection of information about his lifestyle behavior (input layer) to create a profile and to set a persuasion goal (dialogue manager component). He sets, as persuasion goal, to drink less than 200 ml of sweet beverages per meal.

Rules are represented in a structured format by using the Turtle Footnote 5 language:

Fig. 2
A monitoring rule defining the maximum amount of fruit juice that a user should consume within a single meal

Briefly, the meaning of each property of a rule is the following. Row 3 specifies the profile which the rule is associated with. Here, only one profile is specified, but it is possible to include multiple vc:appliesTo axioms. Row 4 defines the priority of the rule. Row 5 provides the kind of validation that the reasoner has to perform. In this case, the command property assumes the value contains, meaning that the rule has the goal of monitoring the detailed amount of a specific entity (e.g., the fruit juice in our case). Other possible values are notcontains (the opposite of the previous one), occurrence (i.e., the number of times a specific entity occurs) and property (i.e., a specific property of the monitored entity, e.g., the amount of calories). Row 6 specifies the mathematical operator used for validating the rule. Allowable values are less, lessequal, equal, greater, greaterequal and percentage. This property is exploited at reasoning time for the execution of data aggregation and comparison operations. Row 7 provides the timing of the rule, i.e., whether a rule refers to a specific meal or to other time spans such as an entire day or an entire week. Finally, rows 8, 9, 10 and 11 describe the entity (as a HeLiS class) that is monitored by the rule, the type of the entity (this information helps the reasoner perform some steps when it is not possible to automatically infer the type of the monitored entity), the values that satisfy the rule and the respective unit of measure.
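To make the role of these properties more concrete, the following Python sketch mirrors them in a plain data structure and evaluates a contains rule against a single meal. It is only an illustration under stated assumptions: the class name, the field names, the profile label and the meal representation (a mapping from entity to amount) are hypothetical, and the percentage operator is left out because its semantics are not detailed here. The actual rules are expressed in Turtle and evaluated by the reasoner.

```python
from dataclasses import dataclass
import operator

# Comparison operators named as in the text. "percentage" is omitted because
# its exact semantics are not specified in this section.
OPERATORS = {
    "less": operator.lt,
    "lessequal": operator.le,
    "equal": operator.eq,
    "greater": operator.gt,
    "greaterequal": operator.ge,
}


@dataclass(frozen=True)
class MonitoringRule:
    """Hypothetical in-memory view of one monitoring rule."""
    applies_to: tuple      # profiles the rule is associated with
    priority: int
    command: str           # "contains", "notcontains", "occurrence", "property"
    comparison: str        # key of OPERATORS
    timing: str            # e.g. "Meal", "Day", "Week"
    monitored_entity: str  # name of the monitored class (illustrative)
    entity_type: str
    value: float           # value that satisfies the rule
    unit: str


def is_satisfied(rule, meal):
    """Check a 'contains' rule against one meal given as {entity: amount}."""
    if rule.command != "contains":
        raise NotImplementedError("only the 'contains' command is sketched")
    observed = meal.get(rule.monitored_entity, 0.0)
    return OPERATORS[rule.comparison](observed, rule.value)


# Running example: less than 200 ml of fruit juice in a single meal.
juice_rule = MonitoringRule(
    applies_to=("ExampleProfile",), priority=1, command="contains",
    comparison="less", timing="Meal", monitored_entity="FruitJuice",
    entity_type="Food", value=200.0, unit="ml",
)

print(is_satisfied(juice_rule, {"FruitJuice": 150.0}))  # rule respected
print(is_satisfied(juice_rule, {"FruitJuice": 300.0}))  # rule violated
```

Keeping the operator as a symbolic name, as the rule format does, lets the same checking code serve every rule regardless of the direction of the comparison.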

At each meal, our user enters the consumed food through a diet diary (input layer) in the mobile application. According to the implemented conditions elements (event listener component), the user’s profile, the chosen persuasion goals and his diet diary are sent to the knowledge layer. This layer is in charge of processing the data provided by PerSEO by verifying their compliance with a set of monitoring rules associated with the goals. These monitoring rules are stored in a dedicated ontology (the HeLiS ontology; Dragoni et al. 2018) and the compliance checking is performed with a logical reasoner (Dragoni et al. 2018). Once a violation of a monitoring rule has been detected in the user’s meal, the reasoner returns a set of logical statements (hereafter called violation package) that contains information about the violated monitoring rule, such as the kind of food that generated the violation, the type of meal (e.g., breakfast or lunch), the level of the detected undesired behavior (that is, whether the user inserted 300 ml of sweet beverages or 500 ml) and the number of times the monitoring rule has been violated.

Figure 3 shows an example of violation represented by using the Turtle language:

Fig. 3
Example of the violation bean produced by the reasoner as a consequence of the violation of the rule shown in Fig. 2. vc is the namespace prefix used for the concepts of the HeLiS ontology. The violation is described by using the Turtle language

Rows 1, 2 and 3 contain information about both the violation and the user ids. The information provided between rows 4 and 12 is inherited from the definition of the rule that has been violated. This information prepares the feedback generation task, since it spares PerSEO from performing further queries on the knowledge repository. Rows 13 to 16 contain information directly provided by the reasoner, i.e., the observed quantity of the entity described at row 10, the unit of measure of that quantity, the violation level and the timestamp at which the violation has been detected. Finally, the information between rows 17 and 22 is computed during the inference step, where the knowledge repository is queried to retrieve more specific information about the generated violation. The start and end timestamps shown at rows 17 and 18 are extracted from the collection of the events (in this case the list of meals shown in rows 20, 21 and 22) that caused the generation of the violation. The violation history value provided at row 19 is computed last. This value provides a recidivism index describing how inclined a user is to violate specific rules, and it is exploited by the NLG component for choosing the proper terminology at feedback generation time.

The violation packages are sent back to PerSEO, whose Intervention Composer component implements, as textual messages, the feedback, the argument and the suggestion BCS to generate the personalized health recommendations for the user. This rendering in natural language is performed in two steps: (i) the selection of a violation according to an implemented ranking strategy and (ii) the composition of the final message. The latter is performed with data-to-text natural language generation techniques that leverage a database of ad hoc templates and query the HeLiS ontology for gathering consequences of a bad lifestyle and suggestions of alternative behaviors. The strategies implemented within the HORUS.AI system are inspired by data-to-text techniques (Pauws et al. 2019; Reiter 2007), as described in Sect. 5.2.3. In addition, the Intervention Composer implements another BCS: the monitoring of the adherence to the selected persuasion goals. This is rendered with graphical charts. Both textual messages and charts are shown to the user through the mobile application.

In the following, we detail the components of PerSEO along with a brief description of the knowledge layer.

5.1 The dialogue manager component

In this version of HORUS.AI, the dialogue manager gathers the profile of a user through a dialogue. A dialogue is implemented as a set of connected utterances, either questions to collect data or statements (e.g., feedback and motivational messages), which can be represented as a Directed Acyclic Graph (DAG). The vertices are the single text messages sent by the system to the user (system utterances; see an example in Fig. 4), and the edges connect each utterance to the next one, according to the implemented strategies.

Fig. 4
A fragment of a DAG representing a dialogue for profiling the user’s lifestyle habits, with question messages that require a categorical answer or a numeric one (in yellow and green, respectively), and motivational messages (in red)

Each system utterance can be either a statement, which does not require an answer, or a question, possibly accompanied by a motivational part. In the former case, the utterance can be a leaf vertex, so that the dialogue ends until the next interaction triggered by PerSEO or by the user, or it can have an edge to the next message or question of the same dialogue. In the latter case, each possible answer to the question corresponds to an edge that connects the message to the next system utterance to send to the user. In this version, we modeled two kinds of questions:

  1. Closed-ended questions, which require the user to choose among a set of predefined answers (e.g., Do you drink sweet beverages for lunch? Yes/No). These are represented in the user interface with a list of possible choices.

  2. Questions that require a numerical answer (e.g., How much time of physical activity do you do in a week?). The answer is processed by PerSEO using comparison operators (\(\le\), \(\ge\), =, etc.) to pick the next message according to the conditions formalized in the DAG vertex.
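The dialogue structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the PerSEO implementation: the vertex identifiers, the dictionary layout, the thresholds and the texts of the two statements are hypothetical, while the two question texts are the examples given above.

```python
import operator

COMPARATORS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
               "<": operator.lt, ">": operator.gt}

# Each vertex is one system utterance. Edges are stored on the vertex:
# closed-ended questions map each answer to the next vertex, numeric
# questions carry ordered (comparator, threshold, next vertex) conditions.
DIALOGUE = {
    "q_sweet": {
        "kind": "closed",
        "text": "Do you drink sweet beverages for lunch?",
        "edges": {"Yes": "q_activity", "No": "m_good"},
    },
    "q_activity": {
        "kind": "numeric",
        "text": "How much time of physical activity do you do in a week?",
        # Hypothetical threshold, expressed in hours per week.
        "edges": [("<=", 2, "m_move_more"), (">", 2, "m_good")],
    },
    "m_move_more": {"kind": "statement", "text": "Try to be a bit more active."},
    "m_good": {"kind": "statement", "text": "Well done, keep it up!"},
}


def next_vertex(vertex_id, answer=None):
    """Return the id of the next utterance, or None when the dialogue ends."""
    vertex = DIALOGUE[vertex_id]
    if vertex["kind"] == "closed":
        return vertex["edges"][answer]
    if vertex["kind"] == "numeric":
        for symbol, threshold, target in vertex["edges"]:
            if COMPARATORS[symbol](answer, threshold):
                return target
        return None
    # Statements may be leaves (no "next" key) or point to another utterance.
    return vertex.get("next")


print(next_vertex("q_sweet", "Yes"))  # moves on to the activity question
print(next_vertex("q_activity", 1))   # low activity branch
```

Because every edge points to a vertex that comes later in the conversation, the structure stays acyclic and each dialogue is guaranteed to reach a leaf.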

A motivational message (or the motivational part of a question) can be predefined, or composed at runtime by the motivational engine depending on the context. In our running example, the system gathers the user profile described above and the user selects, as goal, the rule of drinking less than 200 ml of sweet beverages per meal. In addition, the user has the possibility of selecting further goals.

5.2 Context-dependent message generation

The generation of personalized motivational messages follows the model inspired by op den Akker et al. (2015). This model enables the exploitation of personalized information for the composition of tailored messages of different levels of complexity. A message can be generated according to (i) the timing of the message generation trigger; (ii) the level of the detected undesired behavior; (iii) the information that the user can be interested in; and (iv) the history of previous messages sent to the user. Below, we focus on these four factors and on the meta-reasoning implemented for each of them.

5.2.1 The event listener component

The Event Listener module in PerSEO implements the Conditions elements of the When dimension in the BIT model. The Event Listener is in charge of recognizing the events that trigger PerSEO to deliver a new message, which constitutes the intervention. This module implements the Timing concept proposed in op den Akker et al. (2015). Here we consider only system instantiated timing; contextualization, tailoring and efficacy of the message depend heavily on this aspect. PerSEO executes a meta-reasoning to evaluate whether a message generation is needed and which form of message is more appropriate in a particular moment. There are three kinds of events detected by the Event Listener:

  • Events related to user’s habits and behavior: in general a behavior is analyzed when the user inputs data in the system, such as a new meal in the food diary.

  • Time scheduling: PerSEO may need to send particular information to the user at a specific time of the day or of the week (e.g., every Sunday at 6 p.m. the employee receives a report about weekly adherence to the Mediterranean diet) or to perform a data input check to send (if needed) reminders, recommendations or advice of good practice to the user. In this case, the scheduling is defined by observing the user’s routine.

  • Localization: the third event triggering the intervention of PerSEO is the mobile application recognizing that the user is in a specific place (e.g., near a vending machine). Even in this case, the generation of a message depends on the event time. For example, if the position in front of a vending machine is detected mid-morning, it is highly probable that the employee is going to have a snack.

These events determine the form and the structure of the message. In the first case, the message is generated as a post-strategy, i.e., after the behavior has occurred, while in the second and third cases, messages can be generated as a pre-strategy, i.e., before the behavior occurs.
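The mapping between the three kinds of events and the two message strategies can be summarized with the following Python sketch. The enumeration and the function name are hypothetical and only restate the description above; they are not taken from the PerSEO code base.

```python
from enum import Enum


class EventKind(Enum):
    USER_INPUT = "user_input"        # e.g. a new meal in the food diary
    TIME_SCHEDULE = "time_schedule"  # e.g. a weekly report or a reminder
    LOCALIZATION = "localization"    # e.g. the user is near a vending machine


def message_strategy(kind):
    """Return 'post' for messages following a behavior, 'pre' otherwise.

    Scheduled and location-based events *can* lead to a pre-strategy message;
    this sketch simply assumes that they always do.
    """
    if kind is EventKind.USER_INPUT:
        return "post"
    return "pre"


for kind in EventKind:
    print(kind.name, "->", message_strategy(kind))
```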

5.2.2 The knowledge layer

The knowledge layer contains the ontologies and databases, the software services for their population by the domain experts and the logical reasoning facilities used in the HORUS.AI solution. We present here only the components needed by PerSEO: (i) the HeLiS ontology; (ii) the logical reasoner; and (iii) the database of templates for the natural language generation of the personalized messages.

The rationale behind using knowledge-based technologies relies on the following three pillars:

  • the capability of capturing and modeling how experts work (expert’s knowledge) in assessing adherence to healthy lifestyle recommendations among others. This involves representing different/heterogeneous kinds of data (e.g., food, nutrients, physical activities, user diseases or needs) and how they interplay in defining a healthy lifestyle (e.g., the best practices an expert would recommend);

  • the possibility of developing effective and efficient techniques implementing such an expert’s knowledge. This way, it would be possible to apply expert’s knowledge on real-time users’ data in order to assess how healthy and compliant with guidelines they are. In turn, the resulting system is able to provide motivational messages based on the data of monitored users;

  • the possibility of evolving healthy prescriptions through the use of dedicated facilities. Healthy lifestyle best practices are not immutable, and sometimes not even universally shared; they continuously evolve (e.g., new prescriptions for new typologies of users). There is therefore a need to support experts in defining and revising the prescriptions used in the system.

The HeLiS ontology is a state-of-the-art conceptualization that describes in a formal language the composition of foods and recipes, the rules of the Mediterranean diet, the physical activities domain and user preferences/habits in order to support the promotion of healthy lifestyles. HeLiS is the reference ontology adopted in our system, and the parts used in the presented work concern: (i) the composition of foods, with rules stating, for example, that a Carbonara dish is composed of eggs, pasta, cold cuts, EVO and aged cheese; (ii) the rules of the Mediterranean diet to be associated with the persuasion goals, for example, that cold cuts can be consumed at most twice per week; (iii) the long-term consequences of an improper intake of a given food, for example, that an excessive intake of cold cuts over a long time could cause cardiovascular diseases (specific details about this module can be found in Dragoni et al. 2019); and (iv) the alternatives to a given food according to similar nutritional properties, for example, that legumes can be a valid alternative source of proteins to red meat. Further details about HeLiS can be found in Dragoni et al. (2018).

The reasoner component, in this work, is in charge of performing two tasks:

  • given a user profile and a span of time (e.g., the last meal, the last day or the last week), the reasoner will check whether the persuasion goals of the user have been violated Footnote 6. In such a case, it returns a set of violation packages.

  • given a user profile and a particular food, the reasoner will return all the long-term consequences of an improper consumption of that food, together with its alternative foods that are compliant with the user’s profile and the selected persuasion goals.

From the technological perspective, this component is based on a state machine implemented in Drools Footnote 7, a Business Rules Management System (BRMS) solution with a forward and backward chaining inference-based rules engine. Strategies of composition of dialogues in response to the events (e.g., facts of the world), represented by the incoming data, are defined through the modeling of antecedent–consequent Drools rules. For the details of the reasoning component and the inference mechanism, the reader may refer to Dragoni et al. (2018). The violation package returned by the reasoner, one per violated rule, is a set of logical facts that describe a violation. It is implemented as a dictionary data structure, as shown in Fig. 3, whose fields are exploited by PerSEO as follows:

  • The food entity: the food label (e.g., sweet beverages) that violated a rule associated with a persuasion goal;

  • The time span: whether it is a rule related to a particular time span or event, such as the last week or the last meal. This is used in the template system for choosing proper temporal adverbs.

  • The rule intention: according to op den Akker et al. (2015), a persuasive message conveys an intention, which can be the reinforcement of a user’s state of change, or the encouragement/discouragement of a certain behavior. Here, we adopt the encouragement and discouragement intentions. The former encourages users to consume a larger amount of the food entity in the monitored rule related to the persuasion goal, e.g., increase the consumption of fruit and vegetables in a day. The latter discourages users from consuming too much of the food entity in the monitored rule related to the persuasion goal, e.g., decrease the consumption of sweet beverages in a meal.

  • Expected quantity: the expected amount of food (e.g., maximum 200 ml per meal) stated in the violated rule;

  • Real quantity: the real amount of food consumed by the user in the time span (e.g., 300 ml in the last meal);

  • Measure: the unit that measures the amount of food (e.g., ml for beverages).

  • Violation level: this inconsistency level is a numeric value greater than 0 that discretizes how much a user violates a rule. For example, drinking 300 ml of juice will have a level of 1 for the rule “Drink at most 200 ml of sweet beverages for a meal.” Drinking 500 ml will have a violation level of 3 for the same rule. These values are used to select the violation to be notified to the user (e.g., notify the most violated rules) and for rendering the feedback message. A highly violated rule will be rendered with a different modifier: “Today you drank A LOT OF orange juice.”

  • The rule history: the number of times the given user violated the given monitored rule.
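As an illustration of these fields, the Python sketch below builds a violation package as a dictionary and computes a violation level. The discretization actually used by the reasoner is not specified in this section; the function assumes one level for every 50% of the expected quantity exceeded, a hypothetical choice that merely reproduces the two examples above (300 ml gives level 1 and 500 ml gives level 3 against a 200 ml limit). The key names and the history value are also illustrative, and only the discouragement case (an excess over a maximum) is covered.

```python
import math


def violation_level(expected, observed, step_fraction=0.5):
    """Discretize the excess over the expected quantity into integer levels.

    step_fraction is a hypothetical parameter: one level for every
    step_fraction * expected units consumed beyond the limit.
    """
    excess = observed - expected
    if excess <= 0:
        return 0  # the rule is not violated
    return math.ceil(excess / (expected * step_fraction))


# Hypothetical dictionary view of a violation package.
violation_package = {
    "entity": "sweet beverages",
    "timing": "meal",
    "intention": "discourage",
    "expected_quantity": 200,
    "real_quantity": 300,
    "unit": "ml",
    "level": violation_level(200, 300),
    "history": 2,  # illustrative number of past violations of this rule
}

print(violation_package["level"])  # 1
print(violation_level(200, 500))   # 3
```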

The database of templates contains the templates used by the Intervention Composer to render a violation package into a natural language sentence. Formally, a template is a grammar whose terminal symbols are filled according to the data in the violation package and new information queried in the reference ontology. In addition, some terminal symbols do not depend on the queries to the reasoner but are fixed. These can be, for example, chunks of text at the beginning of the feedback/argument/suggestion sentence, such as “Do you know that ...,” “Long-term consequences of an excessive consumption of ...,” “Next time try with ...” and “An alternative food is ...” For the Key to Health project, the templates were written by the mentioned team of BCS experts.

5.2.3 The intervention composer component

The Intervention Composer converts the violation packages returned by the knowledge layer into the implemented BCS. As mentioned above, here we focus on the feedback, argument and suggestion BCS, which the Intervention Composer renders as persuasive textual messages. This rendering is performed in two steps:

  1. the selection of the violation package with the highest priority according to a ranking strategy;

  2. the composition of the textual message given the selected violation package.

The ranking strategy has been developed by the BCS experts and takes into account the user’s conscientiousness and recidivism. A violation package will have high priority if the user demonstrates low conscientiousness for the related persuasion goal and high recidivism in violating the monitored rule. This is easily implemented by ranking the violation packages according to both the violation level and history keys defined above. The higher the values of these keys, the higher the priority. If multiple violation packages have the highest priority, a random choice is made, as it does not affect the persuasion effectiveness.
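A possible reading of this ranking strategy is sketched below in Python. The text does not state how the two keys are combined, so the lexicographic order (violation level first, then history) is an assumption made for illustration, as are the key names of the dictionaries.

```python
import random


def select_violation(packages, rng=random):
    """Pick the package with the highest (level, history) pair.

    Ties are broken with a random choice. The lexicographic combination
    of the two keys is an assumption of this sketch.
    """
    if not packages:
        return None
    best_key = max((p["level"], p["history"]) for p in packages)
    candidates = [p for p in packages
                  if (p["level"], p["history"]) == best_key]
    return rng.choice(candidates)


packages = [
    {"entity": "cold cuts", "level": 1, "history": 5},
    {"entity": "cheese", "level": 3, "history": 0},
    {"entity": "sweet beverages", "level": 3, "history": 2},
]
print(select_violation(packages)["entity"])  # sweet beverages
```

Passing the random generator as a parameter makes the tie-breaking reproducible in tests, for instance with `random.Random(0)`.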

After the selection of a violation according to the ranking strategy, the Intervention Composer implements the message composition, that is, the generation of three textual messages for the feedback, the argument and the suggestion, respectively. This is inspired by the work in op den Akker et al. (2015) and expanded by taking into consideration additional strategies presented in Guerini et al. (2007). These consist of several persuasion strategies that can be combined to form a complex message. Each mentioned BCS is rendered through natural language text with a template. A template is formalized as a grammar whose terminal symbols are filled according to the data in the violation package and new information queried in the reference ontology. Once the templates are filled, a sentence realizer (Gatt and Reiter 2009) generates natural language sentences that respect the grammatical rules of a desired language Footnote 8. In particular, the Italian language requires a morphological engine (based on morph-itFootnote 9) to generate well-formed sentences starting from the constraints written in the template (e.g., tenses and subject consistency for verbs). Below we describe the implemented strategies to automate the message generation, focusing also on linguistic choices.

Feedback: the part of the message that informs the user about the violation of the goal that has been set up. Feedback is generated from the data included in the selected violation: the food entity of the violation represents the object of the feedback, whereas the level of violation (e.g., the deviation between the expected food quantity and the quantity actually consumed by the user) is used to represent the severity of the incorrect behavior.

The intention of the violation represents the fact that the user has consumed too much or too little of the food entity. Feedback also contains information about the kind of meal (breakfast, lunch, dinner or snack) to inform the user about the time span in which the violation was committed. The template aligned with the terminal symbols of the violation in our running example is in Fig. 5.

Fig. 5
Model (template and example of violation) for generating the text of the feedback. The choices of template and message chunks depend on the violation package. This holds also for both the argument and the suggestion. Dashed lines represent a dependency relation

From a linguistic point of view, choices in the feedback are related to the verb and its tense: e.g., beverages imply the use of the verb to drink, while for solid food we use to eat. To increase the variety of the messages, the verbs to consume and to intake are also used. The simple past tense is used when the violation is related to a specific moment (e.g., You drank a lot of fruit juice for lunch), while the present continuous is used when the violation is related to a period of several days that has not yet ended (e.g., You are drinking a lot of fruit juice this week).
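The linguistic choices above can be illustrated with a small template-filling sketch in Python. It is a simplification under stated assumptions: the real system relies on template grammars and a sentence realizer, whereas this sketch uses plain string formatting, covers only the discouragement intention, and introduces a hypothetical threshold (`high_level`) and a hypothetical milder quantifier ("too much") for low violation levels.

```python
# Verb forms per kind of entity: (simple past, present continuous).
VERBS = {
    "beverage": ("drank", "are drinking"),
    "food": ("ate", "are eating"),
}


def feedback_sentence(entity, entity_kind, timing, level, high_level=3):
    """Fill a feedback template from a few fields of a violation package.

    timing is a meal name (e.g. "lunch") or "week"; high_level is a
    hypothetical threshold above which the stronger modifier is used.
    """
    past, continuous = VERBS[entity_kind]
    quantifier = "A LOT OF" if level >= high_level else "too much"
    if timing == "week":
        # The period is still running: present continuous.
        return f"You {continuous} {quantifier} {entity} this week."
    # A specific moment in the past: simple past.
    return f"You {past} {quantifier} {entity} for {timing}."


print(feedback_sentence("fruit juice", "beverage", "lunch", 3))
print(feedback_sentence("fruit juice", "beverage", "week", 1))
```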

Argument: the part of the message informing users about the possible consequences of a behavior. For example, in the case of diet recommendations, the Argument consists of two parts: (i) information about the nutrients contained in the food intake that caused the violation and (ii) information about the consequences that these nutrients have on the human body and health. Consequences cover the positive or negative aspects of nutrients. The template for this BCS is shown in Fig. 6.

Fig. 6
Model (template and example of violation) for generating the text of the argument

In this case, the Intervention Composer uses the intention element contained in the selected violation package to identify the type of argument to generate. Let us consider the violation of our running example where the monitoring rule limits the daily fruit juice drinking to less than 200 ml (a water glass) since it contains too much sugar. In the presence of an excess in juice consumption (discouraging intention), the argument consists of a statement with the negative consequences of this behavior on user health. Conversely, the violation of a rule requiring the consumption of at least 200 g of vegetables per day leads the system to generate an argument explaining the many advantages of getting the nutrients contained in that food (encouraging intention). In both cases, this information is not stored in the violation package and is retrieved by querying the reference ontology in the knowledge layer.

Moreover, the Intervention Composer analyzes the message history to decide which property returned by the knowledge layer to use in the Argument, so that the message content depends on, e.g., the content of the messages sent in the past few days, ensuring a certain degree of variability. With respect to the linguistic choices, the type of nutrients and their consequences influence the verb usage in the text. Finally, to emphasize different aspects of the detected violation, the templates encode the use of appropriate parts of speech. For example, to stress the negative aspects of the violated food, the verb contain is used for nutrients and can cause for the consequences. On the other hand, positive aspects are highlighted by the verb phrase is rich in for nutrients and the verb help for consequences.
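The interplay between intention, verb choice and message history can be sketched as follows in Python. The function, its parameters and the representation of the history as a set are hypothetical. The nutrient and consequence strings in the usage example are illustrative placeholders and are not content of the HeLiS ontology.

```python
# Verb choices per intention: (verb for nutrients, verb for consequences).
ARGUMENT_VERBS = {
    "discourage": ("contains", "can cause"),
    "encourage": ("is rich in", "helps"),
}


def argument_sentence(entity, intention, properties, history):
    """Compose an argument, preferring properties not used recently.

    properties: list of (nutrient, consequence) pairs, assumed to be
                returned by the knowledge layer.
    history:    set of pairs already used in recent messages; it is
                updated in place with the pair chosen here.
    """
    nutrient_verb, consequence_verb = ARGUMENT_VERBS[intention]
    # Fall back to the full list when every property was already used.
    fresh = [p for p in properties if p not in history] or properties
    nutrient, consequence = fresh[0]
    history.add((nutrient, consequence))
    return (f"{entity.capitalize()} {nutrient_verb} {nutrient}, "
            f"which {consequence_verb} {consequence}.")


history = set()
properties = [("too much sugar", "health problems in the long term"),
              ("many calories", "weight gain")]
print(argument_sentence("fruit juice", "discourage", properties, history))
print(argument_sentence("fruit juice", "discourage", properties, history))
```

The second call picks a different property than the first one, which is the kind of variability the message history is meant to ensure.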

Suggestion: this part represents an alternative behavior that PerSEO delivers to the user in order to motivate him/her to change his/her lifestyle. Exploiting the information available, described at the beginning of this section, PerSEO generates a post-suggestion to inform the user about the healthy behavior that he/she can adopt as an alternative. To do that, the data contained in the selected violation are not sufficient. The Intervention Composer performs an additional meta-reasoning to identify the appropriate content, which depends on (i) the qualitative properties of the entities involved in the event; (ii) the user profile; (iii) other specific violations; and (iv) the history of messages sent. The model for generating a suggestion message is shown in Fig. 7 where, for the sake of readability, we report only the second point of the list: the compliance with the user profile.

Fig. 7
Model (template and example of violation) for generating text of the suggestion

Continuing with the running example, first the Intervention Composer queries the HeLiS ontology in the knowledge layer through the reasoner to obtain a list of foods that are valid alternatives to the violated behavior (e.g., similar-taste relation, list of nutrients, consequences on user health). These alternatives are queried according to two constraints: (i) compliance with the user profile and (ii) compliance with the other goals that have been set up. Regarding the first constraint, the reasoner will not return alternative foods that are not appropriate for the specific profile. Let us consider a vegetarian profile. The system does not suggest that vegetarian users consume fish as an alternative to meat, even if fish is an alternative to meat when only the nutrients are considered. The second constraint is needed to avoid alternatives that could generate a contradiction with other healthy behavior rules. For example, the system will not propose cheese as an alternative to meat if the user has the persuasion goal of cheese reduction. Finally, a check on the message history is again executed to avoid suggesting alternatives that were recently proposed. Regarding the linguistic aspect, the system uses appropriate verbs, such as try or alternate, to emphasize the alternative behavior.
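The filtering of alternatives described above can be sketched in Python as three successive checks. In HORUS.AI the first two constraints are enforced by the reasoner when the ontology is queried; here they are emulated with plain sets, and all names are hypothetical.

```python
def suggest_alternative(candidates, excluded_by_profile, reduction_goals, recent):
    """Return the first alternative that passes all filters, else None.

    candidates:          alternative foods, assumed to be ordered by preference
    excluded_by_profile: foods not appropriate for the user profile
    reduction_goals:     foods the user is already trying to reduce
    recent:              alternatives suggested in recent messages
    """
    for food in candidates:
        if food in excluded_by_profile:  # e.g. fish for a vegetarian user
            continue
        if food in reduction_goals:      # e.g. cheese with a cheese reduction goal
            continue
        if food in recent:               # avoid repeating recent suggestions
            continue
        return food
    return None


# Alternatives to meat for a vegetarian user who also wants to reduce cheese.
choice = suggest_alternative(
    candidates=["fish", "cheese", "legumes"],
    excluded_by_profile={"fish"},
    reduction_goals={"cheese"},
    recent=set(),
)
print(f"Next time try with {choice}.")  # Next time try with legumes.
```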

6 Evaluation

In this section, we report the evaluation activities we performed on the HORUS.AI solution during the forty-nine-day time span of the Key to Health project. This user study consisted of providing a group of users with a mobile application we created based on the services included in our solution. We analyzed the usage of the mobile application connected with our solution for the project time span by monitoring the information provided by the users and the associated violations, if any. Our goals were:

  • to measure the effectiveness of the persuasive messages generated by HORUS.AI by observing the evolution of the number of detected violations;

  • to test the suitability of HORUS.AI in a real-time scenario;

  • to gather qualitative information from users about their usage experience of the provided application.

The 120 users involved in the Key to Health project were split into two groups. We used a non-randomized experimental setup, as described in Harris et al. (2006).

In particular, we relied on the setting using Intervention and Control groups with posttest design. This design involves two groups where the intervention is implemented in one group and compared with a second group without the intervention, based on a posttest measure from both groups. A first group of 92 users (the intervention group) received messages according to all the BCS implemented in the PerSEO layer. A second group of 28 users, which was our control group, did not receive messages implementing the arguments about the consequences of a violation and the behavior suggestion BCS. Indeed, they received only canned text messages with the feedback BCS notifying when a rule was violated. An example of canned text is “Today you have drunk too much (300 ml of maximum 200 ml) fruit juice” notified as soon as the related violation is detected. The control group of 28 users represents our baseline to evaluate the impact of the generated messages.

Our hypothesis was that a persuasive message exploiting the three mentioned BCS leads to a larger decrease in the number of violations as the application is used. In addition, for the Key to Health project, the domain experts validated and adopted three kinds of dietary rules provided by the HeLiS ontology:

  • QB-Rules (Quantity-Based rules related to single meals) that check the proper amount of a given food category to be consumed in a meal. Users were asked to insert four meals every day: breakfast, lunch, snack and dinner. A pair (meal, day), e.g., breakfast at day 1, is associated with an identifier number.

  • DAY-Rules (related to a single day) that check the maximum (or minimum) amount (or portion) of a given food category that can (or should) be daily consumed.

  • WEEK-Rules (related to a single week) that check the maximum (or minimum) amount (or portion) of a given food category that can (or should) be weekly consumed.

We start by presenting an analysis of the raw data we collected during the forty-nine-day time span of the Key to Health project (Sect. 6.1). Then, we discuss the effectiveness of the personalized messages generated by exploiting information inferred through reasoning activities (Sect. 6.2).

6.1 Analysis of real data

Figure 8a provides an average characterization of the meals inserted by users based on their type (breakfast, lunch, snack, dinner), in terms of number of composing foods, calories and main nutrients (carbs, lipids, proteins), and the number of triples necessary to encode this meal data in the triplestore; a daily per-user aggregate is also reported. The data give evidence (on average) of a 1800 kcal daily diet, although users may have omitted some consumed food or underestimated its amount, either unintentionally (e.g., they forget to enter a meal) or intentionally (e.g., to “hide” the consumption of unhealthy foods). The number of triples needed for a meal is on the order of a few tens, suggesting that the representation of meals in the ontology is compact and thus makes it easier to store and manipulate large numbers of meals. On average, 88 triples per user per day are currently needed, meaning that a small HORUS.AI deployment supporting 1B triples would manage to store one month’s worth of meal data for roughly 400K users.
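The storage estimate can be checked with a short calculation, sketched here in Python. It only counts the meal triples mentioned above and ignores any other data kept in the triplestore (for instance the triples asserted for violations), so it should be read as an upper bound. Depending on whether a month is taken as 30 or 28 days, the result is about 379K or about 406K users, which is consistent with the order of magnitude quoted above.

```python
def users_supported(capacity_triples, triples_per_user_day, days):
    """Number of users whose meal data fits in the given triple budget."""
    return capacity_triples // (triples_per_user_day * days)


print(users_supported(10**9, 88, 30))  # 378787
print(users_supported(10**9, 88, 28))  # 405844
```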

Fig. 8

Real data analysis for the Key to Health project: (a) consumed foods, nutrients and RDF triples for user-supplied meals aggregated per user/day; (b) reasoning time and output violation distributions

The box plots in Fig. 8b summarize the distributions of reasoning time and of the number of output violations for the three kinds of dietary rules defined above (Footnote 10). Both reasoning time and the number of violations increase moving from a single meal to a whole week, as more input data are processed and more rules can be violated. Each violation corresponds on average to the assertion of 18 triples in the triplestore. Reasoning takes around 1 s in half of the cases (medians in box plots), but other cases require much more time (up to 14 s for WEEK-Rules) and lead to an increase in the average reasoning time (diamond means in box plots).

This variability results from the interplay of two main factors: server overload and the large number of triples generated from the data provided by some users. In both cases, the time required by the reasoner increased. The issue is purely technological. Hence, when deploying the system in a real-world scenario, this insight will support the sizing of the machines to use.

6.2 Effectiveness of persuasive messages

Figure 9 presents the evolution of the average number of violations per user related to the QB-Rules, DAY-Rules and WEEK-Rules sets, respectively.

Fig. 9

Evolution of the average number of detected violations through the Key To Health project time span

The blue line represents the average number of violations in the intervention group, and the red line the standard deviation observed for each single event in that group. The green line represents our baseline, i.e., the average number of violations generated by the control group, and the orange line the associated standard deviation. As mentioned above, QB-Rules are verified every time a user stores a meal within the solution, DAY-Rules are verified at the end of the day, and WEEK-Rules are verified at the end of each week. The increasing gap between the blue and green lines shows the positive impact of the persuasive messages sent to users. For the QB-Rules, the average number of violations is below 1.0 after the 7 weeks of the project. This means that some users started to follow all the guidelines about what to consume during a single meal. A positive result was also obtained for the DAY-Rules and the WEEK-Rules. The standard deviation lines remain within low bounds and, after a more in-depth analysis of the data, we did not observe any outliers (see the box plots in the Appendix, Section 1).

The analysis of the drop in violations after the 7-week time span of the project, reported in Table 2, shows that both QB-Rules and DAY-Rules obtained good drops.

Table 2 Drop of violations at the end of the observation period. The first column contains the type of rules observed, the second and third columns the drops observed within the Intervention and Control groups, respectively. The highest drops in violations occur with the more frequent rules (i.e., QB-Rules and DAY-Rules)

In general, we observe that in all cases the intervention group shows a bigger drop than the control group. For the WEEK-Rules, however, the drop remained limited. This can be explained by the fact that violations of QB-Rules and DAY-Rules are notified more frequently (after every meal and at the end of every day, respectively), whereas violations of WEEK-Rules are notified once a week. As a consequence, users pay more attention to the more frequent kinds of notification. Another interesting aspect of the WEEK-Rules is how much their drops differ from those of the QB-Rules and DAY-Rules. Within the Intervention group, the drop of the WEEK-Rules is around 48% and 37% lower than the drops of the QB-Rules and DAY-Rules, respectively. Within the Control group, these differences increase to 88% and 85%. Here, lower percentages mean a higher effectiveness of the provided messages. We may infer that the canned texts, i.e., the ones received by the users in the Control group, are far less effective than the ones received by the users in the Intervention group. The hypothesis validated by the experts is that the more complete messages received by the users in the Intervention group led them to plan their diet better in the subsequent week. This point will be investigated in more depth as a further research question.
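
The percentages above compare the WEEK-Rules drop with the drop of a more frequently notified rule set. One plausible reading, which the text does not spell out, is a relative difference between the two drops. The symbols below are introduced here for illustration only: d_ref is the drop of the reference rule set (QB-Rules or DAY-Rules) and d_WEEK the drop of the WEEK-Rules, both taken from Table 2 for the same group.

```latex
\[
\Delta_{\mathrm{rel}} \;=\; \frac{d_{\mathrm{ref}} - d_{\mathrm{WEEK}}}{d_{\mathrm{ref}}}
\]
```

Under this reading, a value of 0.48 means that the WEEK-Rules drop is 48% smaller than the reference drop, and a smaller value means that the weekly messages were nearly as effective as the more frequent ones.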

By combining the evolution of the number of violations with the demographic information shown in Table 1, we did not find any particular correlation worthy of discussion.

A further quantitative analysis concerns the time our system needs to become effective. Figure 9 shows that the two groups tend to diverge at a certain point during the Key To Health time span. Here, we are interested in the day (or week) when the two groups start to diverge with statistical significance. We report this analysis in Table 3, which lists these days/weeks along with their p-values and the average number of violations for both the intervention and control groups.
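
The search for the divergence point can be sketched as a scan over the days, testing at each day whether the two groups differ. The text does not state which statistical test was used, so the exact two-sided permutation test below is a stand-in chosen because it needs only the standard library. The function names, the significance level of 0.05 and the toy violation counts are all assumptions made for illustration, not project data.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(intervention, control):
    """Two-sided exact permutation test on the difference of group means."""
    pooled = list(intervention) + list(control)
    n_a = len(intervention)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(intervention) - mean(control))
    extreme = 0
    count = 0
    # Enumerate every way of relabelling n_a participants as "intervention".
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count


def first_significant_day(daily_intervention, daily_control, alpha=0.05):
    """Return the first 1-based day whose p-value is below alpha, else None."""
    for day, (a, b) in enumerate(zip(daily_intervention, daily_control), start=1):
        if exact_permutation_pvalue(a, b) < alpha:
            return day
    return None


# Toy per-user violation counts (made up): one inner list per day.
toy_intervention = [[3, 4, 3, 4, 3], [1, 1, 2, 1, 2]]
toy_control = [[3, 4, 4, 3, 3], [4, 5, 4, 5, 6]]

divergence_day = first_significant_day(toy_intervention, toy_control)
print(divergence_day)
```

Exact enumeration is only feasible for small groups; with larger groups a sampled permutation test or a parametric test would be the usual substitute.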

Table 3 Key days/week where the intervention and control groups start to diverge with statistical significance. The first column contains the type of rules. The second column contains the day, or the week, after which the difference between the drops observed within the Intervention and Control group starts to become statistically significant. The third column contains the related p-value. Finally, the fourth and fifth columns contain the averages of the violations observed for a user during the entire time span and the related standard deviations for both the Intervention and Control groups, respectively

The DAY-Rules have the earliest starting point, as the two groups diverge from the 19th day; that is, the HORUS.AI system took less than 39% of the project time span to become effective. On the other hand, the QB-Rules are the slowest to become effective, taking 29 days of system usage. This is because these rules concern deep-rooted dietary habits, which require constant attention and effort to change. Indeed, for both the intervention and control groups the average number of violations is quite small. The WEEK-Rules have a starting point similar to that of the QB-Rules. This can be explained by the fact that respecting the WEEK-Rules requires some organization: users need to plan their meals for the week and, consequently, buy the proper food with these rules in mind. This planning takes effort and time.
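
For reference, the share of the forty-nine-day time span elapsed at each reported starting point works out as follows (only the two day counts given in the text are used):

```latex
\[
\frac{19}{49} \approx 0.388 \;(38.8\%), \qquad \frac{29}{49} \approx 0.592 \;(59.2\%)
\]
```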

The evaluation proposed in this section follows a standard procedure for observing, from a quantitative point of view, the effectiveness of an intervention implementing specific behavior change strategies. From the theoretical perspective, an alternative baseline may be to use predictions formulated by domain experts and to observe whether such predictions are in line with actual trends. This possibility has not yet been investigated in the literature because of the difficulty of reaching complete agreement between experts about predictions. The reader may find a discussion of this aspect, based on our experience in a previous project, in Dragoni et al. (2018). Concerning predictions, however, we have started to investigate the use of machine learning approaches to simulate users’ behaviors based on their personalities (Donadello et al. 2022). Preliminary results demonstrated the suitability of this research direction, and we consider it an important future activity for extending the contributions presented in this paper.

7 Beyond data analysis

Besides the quantitative analysis provided above, the overall Key to Health project experience allowed us to collect some lessons that will contribute to improving the effectiveness of the HORUS.AI solution and to designing future living lab evaluations. The lessons learnt concern the automatic acquisition of data, the suitability of HORUS.AI in a real-time scenario and qualitative feedback from users about the personalization features of the solution.

Data acquisition. The observation of the provided data highlighted a disparity between the amount of dietary records and that of physical activity records. Indeed, all users reported their meals on a regular basis (that is, 5 times a day for a period of 49 days), while their physical activities were reported only occasionally (around 20%). This is why, in the previous section, we focused our analysis only on violations generated from meal data. The occasional reporting of physical activity data was not due to poor usability of the mobile application, but to the limited availability of personal wearable devices. Indeed, those who had such a device provided data on a regular basis, but their number was too low to perform a significant analysis. From what we observed, the trend in the number of detected violations concerning physical activity was consistent with the dietary one, but this result was not conclusive. Improving data collection about physical activities, by supporting users with further modalities for providing such data, will be part of future work.

A second important aspect of data acquisition is the challenge of data heterogeneity. In this work, we operated within a closed environment where users belonged to the same study group and all used the same equipment. However, the deployment of the HORUS.AI solution in an open environment (i.e., a real-world production system) needs to take into account that some data might be acquired by third-party services and/or devices using different data models implemented within distributed data sources (Comito and Talia 2004). Distributed data sources can be heterogeneous in their formats, schemas, access mechanisms and policies, ownership, and capabilities. Hence, it would be necessary to address data integration, which would be a key issue in this context. The adoption of data harmonization strategies based on the use of ontologies is a feasible solution.
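
As a simplified illustration of harmonization, the sketch below maps source-specific field names and units onto a shared vocabulary, which is the role an ontology would play in a full solution. Everything in it is hypothetical: the two source names, their field names and the shared terms do not correspond to any real device, service or HeLiS concept.

```python
# Hypothetical mappings: shared term -> (source field name, unit conversion factor).
SOURCE_MAPPINGS = {
    "tracker_a": {"steps": ("stepCount", 1.0), "distance_m": ("distanceKm", 1000.0)},
    "tracker_b": {"steps": ("steps_total", 1.0), "distance_m": ("dist_m", 1.0)},
}


def harmonize(source: str, record: dict) -> dict:
    """Translate a source-specific record into the shared vocabulary.

    Fields missing from the record are skipped rather than guessed.
    """
    mapping = SOURCE_MAPPINGS[source]
    return {
        shared_term: record[field] * factor
        for shared_term, (field, factor) in mapping.items()
        if field in record
    }


record_a = harmonize("tracker_a", {"stepCount": 5000, "distanceKm": 3.5})
record_b = harmonize("tracker_b", {"steps_total": 5000, "dist_m": 3500})
print(record_a == record_b)  # both sources end up in the same representation
```

An ontology-based approach would replace the hand-written dictionary with mappings expressed against ontology concepts, so that new sources can be aligned without changing the reasoning rules.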

Real-time suitability of the HORUS.AI solution. The system we proposed aims to be deployed in a real-time context. Personalized feedback and recommendations have to be provided to users in a timely manner, based on the evolution of their behaviors and of the surrounding environment. Hence, we observed the performance of the whole reasoning process implemented in the HORUS.AI solution. Details about the effectiveness of the reasoner component are out of the scope of this paper. Our focus here is on the optimization process that allows HORUS.AI to generate real-time personalized messages efficiently. Our results derive from the optimization of the rule design and of the rule evaluation schedule. In a first stage, we designed a few complex rules to cover all possible monitoring activities. On the one hand, we were able to cover several constraints with one rule. On the other hand, the computational time required to evaluate these rules was too high, with the consequence that the personalized tracking of users’ behavior was neither effective nor efficient. Therefore, in a second stage, we opted for splitting the rules into simpler ones and, at the same time, scheduling their evaluation according to their time-based and event-based properties. This strategy improved the overall reasoning performance. It both makes the HORUS.AI system deployable within a real-time environment and gives us easier control over the overall reasoning process (exactness of the Violation instances, debugging operations, etc.).
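
The scheduling idea can be sketched as a registry that groups simple rules by the trigger that should fire them: storing a meal for QB-Rules, the end of the day for DAY-Rules and the end of the week for WEEK-Rules. This is an illustration under stated assumptions rather than the HORUS.AI implementation: the class `RuleScheduler`, the trigger names and the plain-Python rule functions are hypothetical, whereas the real system evaluates rules with a reasoner over triples.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# A rule takes the aggregated portions per food category and returns violations.
Rule = Callable[[Dict[str, float]], List[str]]


class RuleScheduler:
    """Groups simple rules by trigger so each event evaluates only its own rules."""

    def __init__(self) -> None:
        self._rules: DefaultDict[str, List[Rule]] = defaultdict(list)

    def register(self, trigger: str, rule: Rule) -> None:
        self._rules[trigger].append(rule)

    def fire(self, trigger: str, portions: Dict[str, float]) -> List[str]:
        """Evaluate the rules registered for this trigger and collect violations."""
        violations: List[str] = []
        for rule in self._rules[trigger]:
            violations.extend(rule(portions))
        return violations


def vegetables_daily(portions: Dict[str, float]) -> List[str]:
    # Rule from the text: vegetables at least three times a day.
    return ["Vegetables: fewer than 3 portions"] if portions.get("Vegetables", 0) < 3 else []


def dairy_daily(portions: Dict[str, float]) -> List[str]:
    # Rule from the text: milk and yogurt at least once a day.
    return ["Milk and Yogurt: fewer than 1 portion"] if portions.get("Milk and Yogurt", 0) < 1 else []


scheduler = RuleScheduler()
scheduler.register("end_of_day", vegetables_daily)
scheduler.register("end_of_day", dairy_daily)

day_totals = {"Vegetables": 2, "Milk and Yogurt": 1}
day_violations = scheduler.fire("end_of_day", day_totals)
meal_violations = scheduler.fire("meal_stored", day_totals)  # no QB-Rules registered here
print(day_violations)
print(meal_violations)
```

Keeping each rule small and tied to one trigger means that storing a meal never pays the cost of weekly checks, which mirrors the motivation given above for splitting the complex rules.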

In the scenario addressed by the current deployment of HORUS.AI, reasoning operations are performed on datasets of triples that describe users’ specific events. A future improvement of personal tracking capabilities will be the investigation of stream reasoning strategies. These strategies are necessary when a continuous flow of information has to be monitored. In addition, learning strategies can be paired to suggest new rules or adaptations of existing ones. An example within the health domain is the real-time monitoring of the glycemic index.

On the end-user side, instead, the adoption of a real-time solution able to collect data from different sources calls for efficient mobile applications that also take energy management into account, with the aim of (i) optimizing the data management of the mobile application (Comito and Talia 2017) and, in turn, (ii) avoiding gaps in the acquisition of crucial knowledge exploited during reasoning operations (Dragoni et al. 2018).

User perception about personalization. The last consideration concerns how the users involved in our living lab actually perceived the personalization capabilities of the proposed solution. To this end, we organized a focus group at the end of the Key to Health project. The aim was to collect qualitative feedback about this perception by asking the users where the system succeeded and where it could be improved with respect to personalized interactions. Overall, the users appreciated the system's responsiveness and message tailoring capabilities when new data about both food consumption and physical activities were provided. However, during the discussion, we discovered that several users found the combination of some rules very hard to follow. Examples of such rules were the ones related to Vegetables (at least three times a day) and to the consumption of Milk and Yogurt (at least once a day). In the first case, many users found it hard to introduce a third portion of vegetables into their daily diet. In the second case, some users experienced a psychological barrier to consuming this food category because they feared digestion problems. We reported this feedback to the domain experts, who took it into account for a new refinement iteration of the monitoring rules. The refined rules will be implemented in future deployments of the solution. A further consideration concerns the abandonment rate of the system: an overly pushy notification system could lead to a high abandonment rate. In our case, all the users used the system until the end of the project and no complaints about the notifications were raised during the focus group. However, a common request concerned the possibility of better exploiting the geographical information that can be acquired through the smartphone sensors. This information was considered relevant for motivating people to change habits in some real-life situations, for example not stopping at a vending machine during a walk. Suggested uses of geographical information include sending alerts about nearby healthy nutrition shops, restaurants cooking recipes that are compliant with users' goals, sport events related to users' preferred habits, etc. These suggestions will guide the next version of the personalization component of HORUS.AI, in order to improve the perception that the system is providing real-time support to users.

8 Conclusions

In this paper, we presented the HORUS.AI solution and its key components, i.e., the rule-based reasoner adopted for monitoring users’ behaviors in order to support the promotion of healthy lifestyles, and the dialogue-based system enabling the user profiling activity and supporting the interaction and data acquisition tasks. We discussed in particular the role of the dialogue-based persuasive component and how the knowledge layer supports it by providing the information needed to create contextual, effective messages. Through a running example, we described how the reasoner operates on the data provided by the users and how messages are composed. We evaluated the overall solution in the context of the Key To Health project by comparing the effectiveness of full persuasive messages against a control group receiving persuasive messages with limited behavior change strategies. The difference is statistically significant. In addition, qualitative results demonstrated the effectiveness of HORUS.AI and the suitability of adopting the system in real-world scenarios.

The limitations of the work concern both BCS and AI-related aspects. On the BCS side, HORUS.AI does not cope with the underreporting problem in the diet diary. Therefore, according to Nederhof (1985), a combination of detection and prevention methods has to be developed as future work. On the AI side, the input layer should be enhanced with: (i) image classification techniques to allow users to easily record the consumed food with just a picture taken with their smartphones (Donadello and Dragoni 2019); and (ii) Natural Language Understanding techniques to understand users’ barriers and capacities in meeting the persuasion goals through interactive dialogues (Zhang et al. 2020). This will make the system more user-friendly and supportive. Regarding the PerSEO layer, future work will focus on developing and implementing natural language generation techniques that are more advanced and less hard-coded (Pauws et al. 2019) than the template-based solution proposed here.

Thanks to its flexibility, HORUS.AI is suitable to be used as the persuasive and motivational technological support in several national and international projects crossing different domains of interest (e.g., asthma, diabetes, mental health, etc.). The experts will be involved in creating the rules used by the solution for monitoring users and for validating the output produced by the persuasive layer during the feedback provision.