1 Introduction

Smart environments are being developed to support and enhance humans in their regular lives by facilitating everyday procedures. This is achieved by making these procedures more convenient or less time-consuming, or by enabling people, through Ambient Assisted Living (AAL), to accomplish tasks they could otherwise not manage.

As we spend much of our time in our homes, Smart Home environments in particular offer a vast number of potential anchor points for adding assistance functionality. However, the large divergence in user expectations, the diversity of requirements, legal regulations, safety and privacy aspects, and the range of technical possibilities still pose many unsolved challenges.

Isolated solutions, such as vacuuming robots, light control systems or keypad-enabled door locks, are already available, as are protocols, buses and hubs to link and control these devices. However, solutions that encompass fully-fledged interconnectivity of such subsystems, combine the gathered information to achieve a deeper understanding of the ongoing processes and provide goal-oriented assistance have not yet progressed significantly beyond the state of early prototypes.

As a first step towards our general vision of an interconnected cognitive smart home environment, we concentrated on the aspect of cooking. Cooking is an important everyday task that usually requires a lot of practice and experience. Therefore, an assistance system that not only facilitates cooking but also includes a teaching component is highly desirable.

Given the complexity of the cooking process and the associated safety aspects, mentally handicapped or elderly people are often forced to move into retirement homes or care institutions because they can no longer cook for themselves. Particularly in light of the ongoing demographic change, devices and systems that enable such people to live in their own homes for longer are becoming more and more important for society as a whole.

Given this situation, we introduce KogniChef, a smart and interconnected kitchen environment and software framework that implements a lane-keeping assistant for cooking recipes. Our prototype (see Fig. 1) features a framework for the integration of multi-modal displays, a service-oriented sensor layer, interfaces for appliance control units, and an overarching, dynamically generated controller that connects these components to interactively assist users as they work through a recipe. Our vision is a system that provides appropriate and adaptive cooking assistance similar to that given by a human chef.

Fig. 1

KogniChef system during a presentation at the trade fair for Automation and Mechatronics (AUTOMATICA 2016) in Munich/Germany

This article is organized as follows. After discussing the state of the art of assistive cooking technology in Sect. 2, we introduce requirements and important concepts for human-computer interaction in smart environments in Sect. 3. The setup and functionality of our research prototype are explained in Sect. 4, followed by a set of interconnected task assistance modules in Sect. 5. Section 6 presents the results of a usability study, and Sect. 7 concludes with a summary of our work.

2 Related Work

Assistive technology for cooking and for general household tasks has a long tradition. While earlier approaches were restricted to very specific tasks, modern devices are able not only to guide the user but also to adapt to them.

The Vorwerk Thermomix TM 5 is particularly noteworthy, as it is already a mature series product. The Thermomix is a single pot endowed with devices and sensors enabling it to perform 12 different functions such as grinding, mixing, stirring, cooking or weighing. The cooking process is configured by selecting a recipe, which is then serialized into step-by-step instructions for the user. Reactions to the device have been mixed: while some consider it Germany's most successful kitchen aid [1], the ratings of Stiftung Warentest were only mediocre [20]. There are many obvious similarities to our KogniChef idea, but our vision extends the functionality along two crucial axes. First, the versatility of the Thermomix is limited by its dimensions, as all features are embedded into a single device. Second, the user is forced to follow the suggested steps.

This leads to a role allocation in which the human is reduced to an assistant of the device. In contrast, we strive for a user-centered approach that allows for self-determined cooking: KogniChef acts as an expert assistant and is embedded into the regular cooking process using existing appliances.

Apart from this commercial product, there are many related research systems. The Assistive Kitchen [3] employs a mobile robot to support and assist people in their activities of daily living (ADLs). Other research focuses on supporting the storage and retrieval of goods, using a dialog-based interface [6] or (semi-)automated replenishment [9]. The monitoring system presented in [4] provides ambient assisted living services for a smart kitchen: it observes and documents the users' ADLs in order to automatically deactivate kitchen appliances or to warn users in emergency situations. These projects introduce assistance functionality for kitchen ADLs, but they provide no particular features for improving the cooking process itself.

The cooking assistance system MimiCook navigates the user through a recipe using an augmented reality environment [16]: cooking instructions and immediate scale feedback are projected directly onto the kitchen counter. The multimedia application Cooking Navi reschedules recipe steps by interpreting the cooking workflow as an optimization problem [10]. Their user study showed positive responses to the system's multimedia navigation, but also revealed that the rescheduled recipes were often perceived as unnatural. Both MimiCook and Cooking Navi guide the user through the recipe and offer a set of useful assistance functions, but in contrast to KogniChef, they neither actively control kitchen appliances nor dynamically adapt to the user's actions.

The Mampf system [15] realizes an adaptive kitchen that guides the user through the cooking process by providing auditory and visual feedback. Their main contribution, however, is zoneless cooking, which allows cooking targets to be associated with pots and pans. Similar to our Hob Control module (see Sect. 5), a pot can be moved to another hob zone and the system automatically adapts the controlled zone. In contrast to the automatic adaptation of recipe steps implemented by KogniChef, their recipe step rearrangement method requires manual configuration. Furthermore, their system is limited to a hob with a configurable number of zones, and its interface is limited to a single tablet PC.

3 Conceptual Requirements

When developing assistance systems for the kitchen domain, the large variety of kitchens and their equipment, as well as the skills and individual preferences of their users, have to be taken into account when defining the requirements. To this end, we involved several domain experts, such as professional chefs, ecotrophologists, nursing services, manufacturers of kitchen appliances, engineers, privacy experts and even lawyers, to define typical personas of an exemplary ordinary family. This allowed us to come up with an initial set of common use cases that served as a starting point for a user-centered design (UCD) process [12], in which we iteratively refined our concepts and requirements in consideration of technical possibilities and limitations.

3.1 Engineering-Related Requirements

In the course of the UCD process, we identified the following requirements.

  • Adaptive Personalized Workflows: The system should follow the user and not vice versa; ideally learning the user’s preferences and adapting to the user’s skills. The system may guide the user’s attention to prevent failure, but the user remains in control.

  • Multimodality: Multimodality is required in the cognitive kitchen for both interaction and perception to cope with a changing and non-deterministic environment.

  • ELSI: Ethical, Legal and Social Implications must be carefully considered to avoid potential abuse of the assistance system or one of its components.

  • Loose Coupling and Concurrency: Considering the previously mentioned requirements, the system complexity suggests a modular software setup, in which components process and exchange data asynchronously.

    As for any complex technical system, common UCD standards such as usability, fault tolerance and system state transparency have to be met to achieve user acceptance, safety and security.

3.2 Concepts for Kitchen Assistive Systems

The spectrum of possible kitchen assistance systems ranges from an informed kitchen, which merely augments the kitchen with helpful information for the user, to fully-fledged automatic cooking scenarios (see Fig. 2). The bottom triangle represents the actual physical kitchen, including sensors, kitchen appliances, cooking tools and ingredients. Sensors can be employed by monitoring processes (blue) in order to retrieve information about the environment, such as the temperature of a pot or the state of an ingredient. Appliance interfaces allow control modules to actively take over responsibility for kitchen appliances (green).

Fig. 2

Concepts for interaction in smart kitchens

In contrast to either completely manual (blue) or completely automated (green) control, a weighted fusion of both aspects allows a set of very promising smart assistance scenarios to be defined, in which control is shared between the system and the user. In a teaching or guidance scenario, the system provides a configurable amount of active support. In the guidance case, a user profile that encodes the user's cooking skills or possible age-related restrictions regarding the use of certain appliances could be employed to derive a user-specific assistance level. A teaching scenario could additionally attach information about the user's previous cooking sessions and update the profile on the basis of an estimated user score.

Further parameters associated with the user profiles would even allow this idea to be extended towards a taking-care scenario suited, for example, to elderly people. Assuming good cooking skills but a non-negligible tendency to forget mandatory cooking steps or actions (such as turning off the hob), the system would only actively intervene in safety-relevant situations and otherwise provide a minimal amount of assistance.

With this in mind, specialized and configurable assistance modules, based on an overarching controller unit, have been implemented; these can be dynamically combined to create a system that best suits the individual user.

3.3 Context-Sensitive Guidance

As a human cook must track a large number of parallel cooking processes, user attention should be treated as a scarce resource. The design of multimodal interfaces for assistance systems should therefore merge these processes into the user's context [8]. Given a certain context, the presentation of information must avoid dragging the user's attention away from other, possibly more relevant, tasks. This must also take into account the different characteristics of modalities such as vision and hearing: while vision is the dominant conscious sense, it features a narrow, self-controlled focus of attention. In other words, a user can look away with minimal effort, whereas the ability to ignore a sound depends heavily on the sound itself. We therefore propose the following saliency levels of system feedback.

  • Peripheral: Events that require no immediate user intervention (e.g. device status updates or other background events, such as the oven reaching its target temperature).

  • Overview: A structured overview of the current system state which can be manually navigated by the user (e.g. device status, recipe progress, ingredients, next steps).

  • Task-Related and Focused: System feedback that can be perceived without shifting the current visual focus of attention (e.g. positioning advice for the current cookware).

  • Task-Related and Monitored: Information that offers an additional value for the current task and can be accessed easily by the user (e.g. timing and weighting feedback).

  • Critical: Information that requires immediate user feedback (e.g. critical device event).

4 KogniChef: A Cognitive Kitchen Platform

As a first step towards the realization of our envisioned smart kitchen assistance system that matches the requirements presented in Sect. 3, we developed the KogniChef prototype, which serves as an extensible research platform. The setup not only enables us to significantly speed up research-related adaptations of the hardware, such as re-positioning or adding sensors; its mobile construction also allows the whole system to be presented and tested outside our laboratory.

The prototype already integrates several common kitchen appliances (hob, steamer and oven), whose connections are always implemented in a bidirectional fashion, so that our system can not only read their states but also actively configure, start, stop and control them. In addition, several other types of sensors including a microphone array, 2D and 3D cameras and custom-designed scale sensors are available.

In order to demonstrate the versatility of the proposed KogniChef platform, we implemented a set of modules, each of which significantly facilitates a common cooking sub-task. The overarching controller integrates these modules in a reusable, configurable and reentrant fashion, which allows not only different modules to be executed at once but also several parameterized instances of the same module to be created (see Sect. 5).

4.1 Our Research Prototype (Hardware Setup)

Our KogniChef system (see Fig. 3) consists of a main unit and a side unit. The main unit includes a \(1.4\times 0.6\,\mathrm{m}\) working area with an integrated induction hob. An easily cleanable 4K screen replaces the back wall and extends from worktop level up to an upper compartment that houses an extractor hood, a bright short-throw projector and a multi-camera array (see Fig. 4). The central communication hub is a tablet PC attached to a custom-designed side arm mounted to the right of the screen. In addition to a microphone array, we integrated custom-designed, load-cell-based four-point scale sensors into a small dedicated area of the worktop as well as into the hob in order to facilitate weighing and the detection of physical interaction.

Fig. 3

Key components of the KogniChef hardware setup (the camera rig is omitted here)

Our software system (see Sect. 4.2) runs on a 6-core Linux PC hidden behind the lower drawers. The side unit is a standard cupboard containing an oven and a steamer. All kitchen appliances (hob, steamer and oven) employ custom-modified firmware that allows them to be remotely controlled, which is thus far not legally possible with standard series products.

Fig. 4

The system employs a camera rig that rigidly connects a Kinect 3D camera, a high-quality RGB-camera and a thermal camera. All cameras are calibrated in the scene, which allows an RGB/XYZ/temperature point cloud to be computed

4.2 Software Architecture

Mirroring our conceptual requirements (see Sect. 3), the KogniChef software architecture follows a modular, multi-layered design approach (see Fig. 5). On top of the hardware layer, defined by sensors, auditory and visual displays and appliances, sits a service layer. Its services abstract away particular hardware interfaces and protocols by providing logical, unified, event-driven network interfaces based on the Robotics Service Bus (RSB) middleware [21]. These services are used by the components in the monitoring layer, which are configured and run either manually (triggered by user input) or by the overarching top-level controller. Dedicated units embedded in the control layer can be configured to create independent control loops. The most important units are discussed in more detail below.

Fig. 5

Overview of the software layer architecture
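To make the event-driven layering concrete, the following minimal Python sketch mimics how a service-layer component could publish unified sensor events on a logical scope and how a monitoring component could subscribe to them. The EventBus class and the scope name are hypothetical stand-ins; they do not reproduce the actual RSB API.

```python
from collections import defaultdict
from typing import Any, Callable

class EventBus:
    """Minimal synchronous stand-in for a publish/subscribe middleware such as RSB."""
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, scope: str, handler: Callable[[Any], None]) -> None:
        self._handlers[scope].append(handler)

    def publish(self, scope: str, payload: Any) -> None:
        for handler in self._handlers[scope]:
            handler(payload)  # a real middleware would dispatch asynchronously

bus = EventBus()

# A monitoring-layer component subscribes to a (hypothetical) logical sensor scope ...
bus.subscribe("/kitchen/sensors/scale", lambda grams: print(f"weight: {grams} g"))

# ... while the service-layer scale driver publishes unified, hardware-agnostic events.
bus.publish("/kitchen/sensors/scale", 250.0)
```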

4.2.1 Object Detection and Tracking

The object detection and tracking module is one of the most central components of our system, as it provides the fundamental basis for tracking the cooking actions carried out by the user. In each processing step, the module provides a list of Object-Beliefs, each referring to a detected and classified object in the scene with an attached set of computed features. Objects can be identified by a uniquely assigned, class-dependent ID, which allows even identical-looking objects to be distinguished.

Based on an input XYZ/RGB/temperature point cloud (see Fig. 4), our implementation employs a cascaded three-layer approach consisting of a model-free segmentation unit [18], a simple yet powerful bag-of-features-based classification system [19] and a spatio-temporal tracking layer.
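As an illustration, an Object-Belief could be modeled along the following lines; all field names in this Python sketch are hypothetical, as the exact structure is not specified here.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectBelief:
    """One detected, classified and tracked object; field names are illustrative."""
    object_id: int                          # class-dependent unique ID, stable over time
    class_label: str                        # e.g. "pot", "bowl", "spoon"
    position: tuple[float, float, float]    # XYZ centroid in scene coordinates
    temperature_c: float                    # surface temperature from the thermal camera
    features: dict[str, float] = field(default_factory=dict)  # further computed features

# Example: a tracked pot at 86.5 deg C, distinguished from other pots by its ID.
belief = ObjectBelief(object_id=3, class_label="pot",
                      position=(0.42, 0.10, 0.05), temperature_c=86.5)
```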

4.2.2 Grasp Detection

In order to react immediately to user actions, it is often mandatory to recognize which objects are grasped. In particular, in situations in which a known but arbitrarily ordered set of actions is required (e.g. pouring three different ingredients into a pot), the ID of the next grasped object is needed to appropriately set up potential monitoring components such as the pouring assistant (see Sect. 5.2). Our grasp detection module matches a heuristically identified hand segment with the detected objects, so that a spatial overlap indicates that an object is grasped.
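A minimal sketch of this overlap test, assuming that both the hand segment and the tracked objects are reduced to axis-aligned 3D bounding boxes (a simplification of the actual heuristic):

```python
Box = tuple[tuple[float, float, float], tuple[float, float, float]]  # (min_xyz, max_xyz)

def boxes_overlap(a: Box, b: Box) -> bool:
    """Axis-aligned 3D intersection test: true iff the boxes overlap on every axis."""
    (a_min, a_max), (b_min, b_max) = a, b
    return all(a_min[i] <= b_max[i] and b_min[i] <= a_max[i] for i in range(3))

def grasped_object_id(hand_box: Box, objects) -> int | None:
    """objects: (object_id, bounding_box) pairs delivered by the tracking module."""
    for object_id, box in objects:
        if boxes_overlap(hand_box, box):
            return object_id  # spatial overlap -> this object is considered grasped
    return None

hand = ((0.40, 0.05, 0.00), (0.55, 0.20, 0.15))
pot = (7, ((0.50, 0.10, 0.00), (0.80, 0.40, 0.20)))
print(grasped_object_id(hand, [pot]))  # -> 7
```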

4.2.3 Speech Processing and Dialog

For verbal communication, we employ a combination of an incremental speech processing framework and an asynchronous dialogue manager [5]. Internally, speech recognition is realized with the Sphinx framework [11], while MaryTTS [17] is used for text-to-speech synthesis. As motivated in Sect. 3, we use verbal feedback for important notifications that do not require visual attention (e.g. the system starting to control a kitchen appliance or the expiration of a cooking timer). In addition, the recipe can be navigated using a simple dialog interface. This is of particular importance in cases in which the standard touch-gesture-based control would require the user to suspend his current action in order to reach the control hub, or even to wash his hands first.
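The following Python sketch shows how recognized utterances could be mapped to recipe-navigation events; the phrases and intent names are purely illustrative assumptions, not the system's actual dialog grammar.

```python
# Hypothetical mapping from recognized utterances to recipe-navigation events;
# the actual grammar and dialog acts of the system may differ.
NAVIGATION_INTENTS = {
    ("next", "next step", "continue"): "NEXT_STEP",
    ("back", "previous step"): "PREVIOUS_STEP",
    ("repeat", "say that again"): "REPEAT_STEP",
}

def interpret(utterance: str) -> str | None:
    """Map a recognized utterance to a controller event, or None if unknown."""
    text = utterance.strip().lower()
    for phrases, intent in NAVIGATION_INTENTS.items():
        if text in phrases:
            return intent
    return None

assert interpret("Next step") == "NEXT_STEP"
```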

4.3 Recipe Representation

To assist the user in cooking different meals, a recipe representation providing all necessary information for the KogniChef system is required. Similar to [14], we decompose the cooking workflow into meaningful subcomponents, which we call recipe steps.

Figure 6 shows an example of the partial order in a rice pudding recipe, which contains recipe steps that can be carried out in an arbitrary order as well as parallel steps. In addition, conditions can be specified to achieve synchronization between specific steps. Recipe steps are parameterized by an action unit, such as add, measure, heat or cool down, and a list of involved utensils, appliances and specified amounts of ingredients. Optionally, recipe steps can be nested to combine similar steps, such as adding several ingredients in one step.

Fig. 6

Workflow example for a rice pudding recipe

The cooking workflow description, represented as an XML file, defines the basis of our KogniChef system. The description is parsed to dynamically generate a specialized, user-specific controller as well as the content of a recipe-specific user interface for the control hub.
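Since the concrete schema is not reproduced here, the following Python sketch parses a hypothetical recipe fragment whose element and attribute names are illustrative only:

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe fragment; the actual schema of the system may differ.
RECIPE_XML = """
<recipe name="rice pudding">
  <step action="measure" amount="250" unit="g">
    <ingredient>rice</ingredient>
  </step>
  <step action="heat" target="90" unit="C">
    <appliance>hob</appliance>
  </step>
</recipe>
"""

root = ET.fromstring(RECIPE_XML)
for step in root.iter("step"):
    involved = [child.tag + ":" + child.text for child in step]
    print(step.get("action"), step.attrib, involved)
```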

4.4 Controller

In order to account for the high degree of parallelism in cooking, the controller, internally based on SCXML, implements a concurrent hierarchical state machine (HSM) [2]. Hierarchical states are used to handle user interactions, e.g. allowing the recipe to be navigated by speech at all times. Recipe steps of arbitrary order are automatically converted into decision steps that serialize the step order based on user activity. Action units, representing frequently conducted processes (see Sect. 4.3), are transformed into assist modules. These modules can be instantiated and maintained as background processes by the controller, allowing them to be added in a plug-and-play manner.
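The conversion of arbitrary-order step groups into decision steps can be sketched as follows; this Python class is a simplified stand-in for the actual SCXML-based HSM, with hypothetical trigger and step names.

```python
class DecisionStep:
    """Serializes an arbitrary-order group of recipe steps based on user activity.

    Simplified stand-in for the SCXML-based concurrent HSM: the first observed
    user action (e.g. a grasped ingredient) selects which pending step becomes active.
    """
    def __init__(self, pending: dict[str, str]) -> None:
        self.pending = dict(pending)   # maps a triggering observation to a step name

    def on_observation(self, trigger: str) -> str | None:
        """Activate and remove the step matching the observed user action."""
        return self.pending.pop(trigger, None)

    @property
    def done(self) -> bool:
        return not self.pending

group = DecisionStep({"milk": "add milk", "rice": "add rice", "sugar": "add sugar"})
print(group.on_observation("rice"))  # -> "add rice": the user reached for the rice first
```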

The controller state is continuously updated as the KogniChef system guides the user step by step through the recipe and tracks the user's progress based on object positions, temperatures, weight information and the states of the kitchen devices. At the end of nested recipe steps, the controller halts to allow a process review by the user and waits for confirmation. The user, however, stays in charge and can always explicitly override and adapt the state to his current needs.

4.5 Interaction Loop

The control hub is the central interface providing the user with information about the current recipe and system status. In order to minimize the user's cognitive load while still commanding an appropriate level of attention, all visual information is structured according to Gestalt principles [13]. Recipe information is semantically separated into tabs, including images and video clips of complicated or often unfamiliar cooking actions (e.g. cutting a vanilla bean) to facilitate their understanding. Additional status information and controls (e.g. appliance states or sound volume) can be accessed through a hidden side menu.

The interface employs a Model-View-ViewModel (MVVM) pattern using Vue.js [22] to keep the control and visualization flows decoupled. This allows immediate feedback for user actions, such as measuring ingredients or grasping items, but also enables the user to alter the system state (e.g. to override current tasks). Additional view-specific model parameters handle feedback in cases where user-induced state changes have yet to be consolidated.

The resulting architecture is entirely stateless and allows content to be replaced at run-time by other subsystems. Doing so triggers an immediate adaptation of the related view components, for instance when the controller has to modify the recipe structure due to a resolved ambiguity (see Sect. 4.4). This feature is also used to synchronize the interface with the controller, as both subsystems use the same configuration in the form of the previously mentioned recipe files (see Sect. 4.3).

To handle critical and dynamic information (e.g. the system offering to take control of a heating procedure), a system component can use the notification system to request a user decision. Depending on the notification's severity/saliency level and the requested resource, the interface chooses an appropriate audiovisual representation.
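A sketch of such a routing policy in Python, reusing the saliency levels from Sect. 3.3; the concrete channel mapping shown here is an illustrative assumption:

```python
from enum import IntEnum

class Saliency(IntEnum):
    """Saliency levels from Sect. 3.3, ordered by demanded user attention."""
    PERIPHERAL = 0
    OVERVIEW = 1
    TASK_FOCUSED = 2
    TASK_MONITORED = 3
    CRITICAL = 4

def choose_channel(level: Saliency) -> str:
    """Illustrative routing policy; the real interface also considers the resource."""
    if level is Saliency.CRITICAL:
        return "speech + modal dialog"      # auditory output cannot be 'looked away' from
    if level in (Saliency.TASK_FOCUSED, Saliency.TASK_MONITORED):
        return "in-situ projection"         # close to the current visual focus
    return "status tab on the control hub"  # consulted on demand by the user

print(choose_channel(Saliency.CRITICAL))  # -> speech + modal dialog
```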

5 Cooking Assist Modules

The following modules, which aim to facilitate common sub-tasks of cooking, were implemented for our platform.

5.1 Hob Control

One very common sub-step is the warming up of food in pots and pans. However, the indirect control of the food's temperature by altering the hob's power level (commonly in discrete steps between 1 and 9) requires continuous attention from the user. With the Hob Control (HC) module, we enabled our system to intuitively take control of the warming-up process.

HC is based on the thermal information attached to each detected object (see Sect. 4.2). Similar to the work presented in [15], HC is configured with a target temperature \(t_d\) for a uniquely identifiable container or food object. By internally employing a simple PID controller, it can automatically reach and optionally also maintain \(t_d\) as a background process. Due to the association with an object, HC can even dynamically switch to controlling another hob plate when a displacement of the object is detected.
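A minimal Python sketch of such a control loop; the gains are untuned, illustrative values, and the mapping of the continuous controller output onto the hob's discrete power levels (including 0 as "off") is an assumption.

```python
class PIDController:
    """Textbook PID controller; the gains used below are illustrative, not tuned."""
    def __init__(self, kp: float, ki: float, kd: float, setpoint: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint          # target temperature t_d in deg C
        self._integral = 0.0
        self._prev_error = 0.0

    def update(self, measured: float, dt: float) -> float:
        error = self.setpoint - measured
        self._integral += error * dt
        derivative = (error - self._prev_error) / dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative

def to_power_level(control: float) -> int:
    """Clamp the continuous output to the hob's discrete levels (0 = off, assumed)."""
    return max(0, min(9, round(control)))

pid = PIDController(kp=0.4, ki=0.05, kd=0.1, setpoint=90.0)   # t_d = 90 deg C
print(to_power_level(pid.update(measured=62.0, dt=1.0)))      # -> 9 (full power)
```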

5.2 Interactive Pouring Assistant

Another common cooking step is measuring ingredients with a scale, often using an extra weighing container. To facilitate this step, KogniChef provides an interactive pouring assistant that allows this task to be carried out more quickly and without additional equipment. By fusing a vision-based fill-level estimate with weight information acquired through our integrated scale sensors (see Sect. 4.1), a higher accuracy can be achieved. In addition, the assistant uses the object detection module to provide an automatic tare function, thereby streamlining the workflow. A specialized fill-state widget (see Fig. 7) provides a real-time visualization of the filling progress. For transparent fluids and non-fluid ingredients, the system seamlessly switches over to relying fully on the scale data.

Fig. 7

Real-time feedback widget of the interactive pouring assistant
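One plausible realization of the described sensor fusion is an inverse-variance weighted average, sketched below in Python; the conversion of the visual fill level into a mass estimate and all variance values are assumptions, not taken from the system description.

```python
def fuse_mass_estimates(vision_g: float, vision_var: float,
                        scale_g: float, scale_var: float) -> float:
    """Inverse-variance weighted fusion of two mass estimates (in grams).

    Assumes the vision-based fill level has already been converted into a mass
    estimate via known container geometry and ingredient density.
    """
    w_vision = 1.0 / vision_var
    w_scale = 1.0 / scale_var
    return (w_vision * vision_g + w_scale * scale_g) / (w_vision + w_scale)

# For transparent fluids the visual estimate is unusable: assigning it a very
# large variance makes the fusion fall back to the scale alone, as described above.
print(fuse_mass_estimates(vision_g=240.0, vision_var=100.0,
                          scale_g=252.0, scale_var=4.0))  # ~251.5 g
```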

5.3 Stirring Assistant

Our four-point scale sensor design (see Sect. 4.1) not only allows us to weigh directly on the worktop or on the hob, but can also be employed to detect stirring patterns. The stirring pattern detection is based on an estimated temporal trajectory of an object's center of gravity, which is computed using standard triangulation methods. Assuming that the stirring spoon has significant contact with the bottom of the mixing bowl, we can robustly distinguish circular and figure-eight patterns as well as the stirring direction, and even estimate the stirring cycles per minute (cpm). The assistant can be set up in a pattern-advisor mode, in which the system encourages the user to perform a particular stirring pattern. Alternatively, it operates in a timer mode, providing interactive feedback about the progress towards a desired stirring time while taking the estimated cpm value into account.
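A simplified Python sketch of both computations, assuming a rectangular load-cell layout; the classification of circle versus figure-eight patterns is omitted, and the corner naming is an assumption.

```python
import math

def center_of_pressure(fl: float, fr: float, bl: float, br: float,
                       width: float, depth: float) -> tuple[float, float]:
    """Center of pressure from four corner load cells (forces in N, sizes in m).

    Assumed corner layout: fl/fr = front-left/right, bl/br = back-left/right.
    """
    total = fl + fr + bl + br
    x = width * (fr + br) / total - width / 2   # lateral offset from the center
    y = depth * (bl + br) / total - depth / 2   # longitudinal offset
    return x, y

def cycles_per_minute(trajectory: list[tuple[float, float]], dt: float) -> float:
    """Estimate stirring cpm from the accumulated angle of the CoP trajectory."""
    angles = [math.atan2(y, x) for x, y in trajectory]
    turned = sum(abs(math.remainder(b - a, 2 * math.pi))
                 for a, b in zip(angles, angles[1:]))
    duration_min = dt * (len(trajectory) - 1) / 60.0
    return (turned / (2 * math.pi)) / duration_min

# One synthetic circular stir at 60 cpm, sampled at 10 Hz for two seconds:
traj = [(math.cos(2 * math.pi * t / 10), math.sin(2 * math.pi * t / 10))
        for t in range(21)]
print(round(cycles_per_minute(traj, dt=0.1)))  # -> 60
```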

6 Results and Discussion

The KogniChef system has been continuously improved since its first public showing. In its current state, the system can track the progress of an abstract recipe representation. So far, it has been tested with two example recipes: crème brûlée and rice pudding with apple crumble and vanilla sauce. The system can easily be extended with further related recipes; limitations arise from the fact that the user interface requires authored content and media for each recipe.

During an early usability study in 2015, conducted with 12 participants, KogniChef was generally perceived positively (see Fig. 8). However, the study revealed several interaction obstacles, such as erroneously interpreted hand gestures and visibility issues with the large back screen, which often required the user to step back to see the whole content. With this in mind, the main graphical user interface was redesigned and relocated to the tablet control hub.

Fig. 8

During an initial user study, participants were asked whether they considered their crème brûlée preparation successful, whether they felt comfortable during the interaction with the system and whether they would like to have such a system at home. While success and comfort were rated positively, the feedback regarding personal usage was mixed

User feedback also questioned the general suitability of a large screen in the kitchen, which is at risk of being damaged by steam or boiling oil while cooking. Even though this is a valid remark, one has to consider that most components of the prototype serve as bridging technologies that would have to be replaced or specifically evaluated for a real-world application.

As a next step, we plan to extend the user testing to groups with specific needs to evaluate and improve the actual assistance capabilities of KogniChef in close consultation with our KogniHome project partners.

Besides new ideas for system improvements, this also offers new insights into user participation in the development of assisted living systems [7] as well as valuable training data for further progress in smart cooking. All of these factors will play a vital role in reaching the user acceptance, robustness, safety and security required to widely deploy such assistance systems in users' homes.

7 Conclusion

We defined concepts and context-specific requirements for a cognitive cooking assistance system and used these as the basis for our proposed KogniChef hardware and software platform. Based on a formal recipe description, the system is able to dynamically generate a user interface that is specialized for the kitchen configuration and guides the user through the recipe. During this process, it automatically triggers novel assistance modules such as an adaptive hob control, a stirring detection integrated into the workspace and a vision-based filling detection. The user always stays in charge of the system and can pause, alter and skip steps. Additionally, if the user insists on having used a sufficient amount of an ingredient, KogniChef will adopt this belief. Further steps involve reaching out to specific user groups to gain new insights about cooking support in assisted living scenarios.