
1 Introduction

1.1 Intuition Behind the Proposed Product

Many of us have failed to follow a recipe properly and ended up with an empty stomach. Even though recipes are extremely detailed, we may still end up with a burned, bland or uncooked meal. It makes us wonder: what if the recipe could talk to us and say, “it’s too late to save that curry, it’s charred, start over”?

The proposed product, Kochen Helfer, intends to replace tedious recipes with a life-like cooking assistant: an assistant that can not only read out the recipe but also give real-time feedback on the state of the food being cooked. The primary intention behind the proposed product is to provide a human touch through human-like responses.

1.2 Answering the Why

In the present era, the internet has opened doors to many possibilities. Looking up video recipes is the go-to option for cooking amateurs, but those are pre-made videos that cannot assess the current state of your food and give proper feedback. That is where our proposed product comes into the picture.

One of the primary focuses of the field of AI has been to make machines imitate human behaviour. Kochen Helfer is an approach to achieve this by using image processing and recommendation systems to equip an amateur with a personal cooking assistant.

2 Associated Publications

2.1 AI in Real-Life World

Artificial intelligence plays a significant role in our daily life; it is difficult to do anything that does not involve AI. With the advancements in technology, there has been a substantial impact on how businesses are carried out, and the strategies needed for any business to run smoothly are critical. The works of Ghimire et al. [1], Prakash et al. [2] and Saxena et al. [3] elaborately explain the growth of business and how machine learning and deep learning, subsets of AI, are used for tackling business problems related to marketing, product recommendation, fraud detection and many more. The finance sector has also been significantly affected by the growing AI technology. Zhang and Kedmey [4] and Kolanovic and Krishnamachari [5] give a detailed discussion of the trends and attempts at processing business needs, and provide a model that can be used for the strategic modelling of an organisation.

Considerable use of AI in education has been witnessed in the last couple of years, and several tools have been created to make learning and teaching easy and interactive. Bhattacharya and Nakhare [6], Aldahdooh and Naser [7] and Roll and Wylie [8] provide a detailed analysis of the use of AI-based tools in the vocational study sector and the impact they have on students.

The pandemic made us realise that AI plays a significant role in dealing with such problems: helping to build proper strategies for controlling the situation, suggesting appropriate plans, understanding the nature of the virus and developing the necessary medicines and vaccines. The paper by Nirmala and More [9] provides an insight into the importance of AI in fighting COVID-19.

2.2 AI in Smart Kitchen Appliance

In recent times, the rise of artificial intelligence has revolutionised our ways and standards of living. As the field expands, we see numerous applications of it in our daily life. One such application is in smart kitchen appliances: be it a refrigerator, a stove or a storage system, AI finds its use everywhere.

Some of these scenarios are discussed by Mallikarjun et al. [10] and Floarea and Sgârciu [11], who address the use of AI and IoT in an intelligent refrigerator. They propose a method by which AI can help determine the quality and quantity of food present in the refrigerator and notify the user through an Android application, as well as a machine learning algorithm that recommends recipes based on the fruits and vegetables present in the refrigerator.

Another such case is presented by Afroz et al. [12], who propose an AI-enabled gas stove with two-step safety and age verification. Using an ML algorithm for age detection, it detects whether a child is trying to turn on the stove and prevents them from doing so. The authors implement a machine learning object detection algorithm and a CNN-based deep learning architecture for system execution.

Keeping track of dietary consumption is a crucial aspect of a person’s life. We can do that manually by diet journaling, but for the elderly it becomes challenging to keep track of their dietary habits. To tackle that, Gerina et al. [13] and Achananuparp et al. [14] propose tracking a person’s diet by measuring air quality patterns in the kitchen using a deep neural network. This shows how far AI has reached and how greatly it can impact our lifestyle.

2.3 AI in Everyday Cooking

The role of artificial intelligence is primarily to reduce manual effort. That is why, in the modern world, technologies like the Internet of Things, machine learning and deep learning are being applied in varied domains such as business, finance and sports. The same is true of the cooking industry. Here we focus specifically on the application of artificial intelligence in everyday cooking and the cooking industry.

There have been several improvements and developments in this domain, and multiple examples are discussed here. As discussed by Bień et al. [15] and Papineni et al. [16], text generation poses several challenges, but text-generating neural networks have made the process more accessible. Bień et al. [15] show how a recipe text dataset can help here: their work relates recipe text data and semi-structured data through the RecipeNLG dataset. The final goal is to generate new recipes based on the recipe dataset the model learns from. The authors note that a cooking recipe consists of ingredients, quantities and their units of measure, which can be used to train a model to extract food ingredients from a set of tokens. Once the model can identify food ingredients from a group of tokens, the same information is used to suggest new recipes that could be cooked.

Similarly, another such use case, Market2Dish, is described by Wang et al. [17]. It is a health-aware food recommendation scheme that maps ingredients sold in the nearby market to interesting, healthy dishes that could be cooked at home. The product has three components, i.e. recipe retrieval, user-health profiling and health-aware food recommendation, each with its own role. Recipe retrieval scans the ingredients available to the consumer and, based on those, searches a large-scale dataset of recipes. User-health profiling captures health parameters entered by the user. Finally, the health-aware food recommendation component suggests the healthiest dishes based on these inputs.

3 Value Perspective

3.1 Need of the Product—A Helping Hand

A prevalent instance in a country like India is that, after completing his schooling, a student moves to another city for his graduation, where he lives alone in a rented flat. He cannot afford domestic help on the limited pocket money he receives each month, and because of that, he needs to cook for himself.

The above narration is a common story for thousands of students in India. It is easy for a student staying at home because his parents take care of him. Here, the situation calls for a solution in the form of an assistant that can help the student in need. That is why we propose an AI-based cooking assistant that can help the student in the hour of need.

Kochen Helfer is a cooking assistant that works on the concepts of deep learning and natural language processing. There are two pieces to this product. The first piece is the generic cooking assistant that helps with the recipe. This assistant supports beginners in the domain of cooking by being a constant companion throughout the process; text-to-speech conversion is needed here for the assistant to work. For example, say someone wants to cook white sauce pasta. The person can go to the app, select the recipe he would like to cook, activate the assistant and ask it to guide him through the recipe. To cook white sauce pasta, one first needs to gather all the ingredients: the pasta along with all the spices. Then the process of cooking starts, and throughout it the assistant gives instructions, telling the user when to boil the pasta, when to prepare the sauce and finally when to mix the two to get the final output, i.e. the white sauce pasta (Figs. 1 and 2).

Fig. 1 White sauce pasta

Fig. 2 Ingredients to cook white sauce pasta
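The step-by-step guidance described above can be prototyped with an off-the-shelf text-to-speech engine. The following is a minimal sketch, assuming the pyttsx3 library and a hard-coded recipe; the recipe text and function name are illustrative only, not part of an existing Kochen Helfer codebase.

```python
import pyttsx3  # assumed text-to-speech library; any TTS engine would do

# Hypothetical recipe steps; in the real app these would come from the recipe book.
WHITE_SAUCE_PASTA_STEPS = [
    "Gather penne pasta, butter, flour, milk, capsicum and spices.",
    "Boil the pasta in salted water for about ten minutes, then drain it.",
    "Melt butter, stir in flour and slowly whisk in milk to make the white sauce.",
    "Add the sauteed vegetables and spices to the sauce.",
    "Mix the boiled pasta into the sauce and serve hot.",
]

def read_recipe(steps):
    """Read each recipe step aloud, waiting for the user before moving on."""
    engine = pyttsx3.init()
    for number, step in enumerate(steps, start=1):
        engine.say(f"Step {number}. {step}")
        engine.runAndWait()
        input("Press Enter when you are ready for the next step...")

if __name__ == "__main__":
    read_recipe(WHITE_SAUCE_PASTA_STEPS)
```

In the full product, the blocking input() prompt would be replaced by voice commands or in-app buttons, but the flow of reading out one step at a time stays the same.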

The second part of the app is the cooking correctness checker. This piece works on the concepts of deep learning; specifically, a convolutional neural network can be used here. The speciality of this module is that it tells whether your food is correctly cooked or not, classifying it as undercooked, perfectly cooked or overcooked. This makes the app a handy way to know if your food is cooked well. It works on a CNN model that can be trained on thousands of images taken while various recipes were being executed (Fig. 3).

Fig. 3 Process of cooking white sauce pasta
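To make the correctness checker concrete, the sketch below shows how a previously trained image classifier could be applied to a photo of the dish. It is a minimal illustration assuming TensorFlow/Keras, a 224 × 224 input size and a hypothetical saved model file (kochen_helfer_cnn.h5); training such a model is discussed in Sect. 4.4.

```python
import numpy as np
import tensorflow as tf

# Assumed label order used during training.
CLASSES = ["undercooked", "perfectly cooked", "overcooked"]

def check_cooking_state(image_path, model_path="kochen_helfer_cnn.h5"):
    """Classify a photo of the dish into one of the three cooking states."""
    model = tf.keras.models.load_model(model_path)
    image = tf.keras.utils.load_img(image_path, target_size=(224, 224))
    batch = np.expand_dims(tf.keras.utils.img_to_array(image) / 255.0, axis=0)
    probabilities = model.predict(batch)[0]
    return CLASSES[int(np.argmax(probabilities))]

# Example usage:
# print(check_cooking_state("pasta_step_3.jpg"))  # e.g. "overcooked"
```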

3.2 User Persona

The proposed product, Kochen Helfer, is designed to assist both amateurs who are new to the cooking world and those who already have experience in cooking. For amateurs, it can give constant feedback about the state of the food, whereas for experienced cooks it can offer new recipes from the recipe book. From college students living in a hostel to kids aspiring to become a MasterChef and adults trying to pick up a hobby, anyone who wants to step into the cooking world or learn something new will find this product helpful.

4 Proposed Architecture

This section explains the proposed architecture of Kochen Helfer, from the high-level pipeline to its individual stages: data requirement and gathering, data sanity and preprocessing, model building, model testing and model deployment. The different stages are described below.

4.1 High-Level Architecture

The initial and most crucial part of the Kochen Helfer application is data requirement and gathering; we must have the data the product actually requires. Once we have gathered the data, it becomes necessary to verify that it is trustworthy. Preprocessing is the next step after the data sanity check; here, various processing techniques are applied to the data. Preprocessing is followed by the model building process, testing is done once the model is built, and deployment is the final step in creating the application (Fig. 4).

Fig. 4 High-level architecture

4.2 Data Requirement and Gathering

The requirements of the data for this product are:

  1. The dataset should contain food images.

  2. The images should be of the different stages of a recipe.

  3. For each stage of a recipe, the dataset should contain images of uncooked, overcooked and perfectly cooked food.

For example, if we are cooking a curry, then the dataset should contain images of all the cooking stages, be it sauteing the vegetables or adding the spices. Moreover, for each stage, let’s say “sauteing the vegetables”, there should be enough images of un-sauteed, burnt and perfectly sauteed vegetables.

One such dataset resource can be the Recipe1M+ dataset, one of the most extensive publicly available food datasets.

4.3 Data Sanity and Preprocessing

Once we have collected the image data of various dishes and their cooking stages, it is crucial to check that the corresponding data is correct. In this step, cooking experts can review a few samples and attest to their genuineness and correctness. Such an activity helps us conclude that the data collected for the model building process is correct.

For example, if we are trying to cook a rice-oriented dish, the domain expert can tell from the images whether the rice is overcooked, undercooked or perfectly cooked. If the rice is overcooked, it breaks very easily and looks slimy; if it is undercooked, it does not shine white; and if it is cooked perfectly, it shines white and swells well beyond its original size. Such distinguishing factors can help deep learning models classify whether the food is cooked correctly or not.

Going further, the preprocessing techniques that can be applied to structure the images are as follows (a minimal OpenCV sketch illustrating several of them appears after this list):

  1. Transformations—Such techniques are used to correct distortions or perspective issues in images. There are two types of transformations, affine and non-affine. Affine transformations, such as scaling, rotation and translation, preserve collinearity and parallelism, while non-affine (projective) transformations maintain collinearity and incidence but do not preserve parallelism (Fig. 5).

     Fig. 5 Affine and non-affine transformations

  2. Rotations—This operation rotates an image into its correct orientation. A rotation matrix, parameterised by the angle through which the image has to be rotated, is used for this (Fig. 6).

     Fig. 6 Image rotation

  3. Scaling and Resizing—Interpolation is used to construct new data points within the range of a discrete set of known data points. Such techniques help to resize images, i.e. zoom in or zoom out as per requirements. Similarly, the image pyramid can be used to upscale or downscale images (Fig. 7).

     Fig. 7 Image scaling

  4. Cropping Images—Several times, the entire image is not needed and only a specific portion is required. In such cases, the images can be cropped, excluding the non-required area of the image (Fig. 8).

     Fig. 8 Image cropping

  5. Convolutions and Blurring—Convolution is a mathematical operation performed on two functions producing a third function, typically a modified version of one of the original functions. Usually, a kernel is defined, an n × n matrix that is run over the image. Image blurring is an application of convolution where the pixels are averaged within a region (Fig. 9).

     Fig. 9 Image blurring

  6. Sharpening—A similar application of convolution is sharpening, which strengthens and emphasises the edges of an image. As this involves convolution, a kernel is again used; the sum of the kernel should be 1. If the kernel does not sum to 1, the image’s brightness and contrast may change (Fig. 10).

     Fig. 10 Image sharpening

  7. Thresholding and Binarization—Thresholding is an operation in which an image is binarised. Before an image is binarised, it has to be converted to grayscale. Simple thresholding requires us to provide the threshold value, whereas adaptive thresholding determines it automatically (Fig. 11).

     Fig. 11 Image binarization

  8. Dilation and Erosion—Dilation is the process of adding pixels to the boundaries of objects in an image, while erosion is the process of removing pixels from the edges of the objects in an image (Fig. 12).

     Fig. 12 Dilation and Erosion

  9. Edge Detection and Image Gradients—Edge detection is the process of capturing sudden changes in an image. This is a significant step as it helps in feature engineering for deep learning models (Fig. 13).

     Fig. 13 Image gradient
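As mentioned above, several of these operations can be illustrated with OpenCV. The sketch below is a minimal example under the assumption that OpenCV (cv2) and NumPy are used for preprocessing; the input file name and parameter values are placeholders.

```python
import cv2
import numpy as np

image = cv2.imread("food_stage.jpg")  # hypothetical input image
h, w = image.shape[:2]

# Rotation: rotate by 90 degrees around the image centre using a rotation matrix.
rotation = cv2.getRotationMatrix2D((w / 2, h / 2), 90, 1.0)
rotated = cv2.warpAffine(image, rotation, (w, h))

# Scaling and resizing with interpolation.
resized = cv2.resize(image, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)

# Cropping: keep only a region of interest.
cropped = image[50:250, 100:300]

# Blurring and sharpening via convolution kernels (the sharpening kernel sums to 1).
blurred = cv2.blur(image, (7, 7))
sharpen_kernel = np.array([[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]])
sharpened = cv2.filter2D(image, -1, sharpen_kernel)

# Thresholding and binarization (the image must first be converted to grayscale).
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 11, 2)

# Dilation, erosion and edge detection.
kernel = np.ones((5, 5), np.uint8)
dilated = cv2.dilate(binary, kernel, iterations=1)
eroded = cv2.erode(binary, kernel, iterations=1)
edges = cv2.Canny(gray, 100, 200)
```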

4.4 Model Building

One of the most important steps in building an AI-enabled cooking assistant is food image processing, which involves showing many food images to a mathematical model that can then classify food images into various subclasses. The proposed product, Kochen Helfer, will categorise any phase of cooking into three categories, namely “perfectly cooked”, “undercooked” and “overcooked” (Fig. 14).

Fig. 14 Process of convolutional neural network

In recent times, convolutional neural networks (CNNs) have given us a massive breakthrough in the field of image analysis. A CNN is a class of deep learning neural network most commonly used to analyse visual images and classify them into different categories. A neural network is a system of hardware and/or software designed to operate in the way neurons do in the brain. In a CNN, the neurons are arranged more like those in the visual cortex of the brain, the area that deals with visual stimuli. A CNN is a multilayer system designed to reduce processing. The different layers of a CNN are as follows:

  1. Convolution Layer

  2. Pooling Layer

  3. Fully Connected Layer

  4. Normalisation Layer

Unlike classical ML algorithms, where we need to engineer and show features to the algorithm manually, a CNN extracts features from the pixels of the image and convolves them into a much simpler set of machine-understandable features. These features can then be used to identify and classify any image that is passed through the network. Training a CNN from scratch needs a lot of pictures and thus a lot of processing time; therefore, many CNN architectures (e.g. AlexNet, GoogLeNet) are available with pre-trained weights, which can then be fine-tuned and used for any problem statement as needed. This gives better accuracy, since the architecture has already been trained on many images to understand the features of an image and classify them distinctly.

Some examples are Yadav et al. [18], Zhou et al. [19] and Simard et al. [20], where SqueezeNet and VGG-16 CNNs are used to classify food images automatically. A similar kind of architecture, fine-tuned to classify images as “perfectly cooked”, “undercooked” or “overcooked”, can be used for the proposed product, Kochen Helfer.
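The following is a minimal transfer learning sketch along those lines, assuming TensorFlow/Keras, a VGG-16 backbone and a hypothetical data/train directory with one sub-folder per class; the image size, epoch count and file names are illustrative choices, not the final configuration.

```python
import tensorflow as tf

# Assumed layout: data/train/<undercooked|perfectly_cooked|overcooked>/*.jpg
train_data = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(224, 224), batch_size=32)

# Pre-trained VGG-16 backbone without its classification head.
backbone = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
backbone.trainable = False  # reuse the pre-trained features as-is

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),  # three cooking states
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_data, epochs=5)
model.save("kochen_helfer_cnn.h5")
```

Freezing the backbone keeps training fast on a modest dataset; once enough labelled cooking-stage images are available, the top layers of the backbone could be unfrozen for further fine-tuning.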

4.5 Model Testing

Testing plays a vital role in any development process. It is the crucial step that decides whether the model is ready to use or requires further development. Hence it is essential to choose the right testing tool and to create the right test cases.

Testing needs to be done end to end, so it is essential to cover all scenarios. These typically include functional testing, usability testing, performance testing, fit-and-finish testing, regression testing, device-specific testing and user acceptance testing.

Below are some points that need to be kept in mind while selecting the testing tool and creating the test cases:

  • Multiple scripting languages should be supported.

  • The application is going to be deployed on multiple mobile platforms, so the creation of test scripts in various languages should be possible.

  • The testing tool should integrate with the CI/CD pipeline.

Appium is a popular automated mobile application testing tool that can be used to test hybrid or native iOS and Android applications. Appium uses the WebDriver interface to run the test cases, the reusability of test code across Android and iOS makes the tool robust, and integration with CI/CD tools is simple.
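As an illustration, the sketch below shows what an Appium test of the login flow might look like using the Appium Python client. The server URL, capabilities and element identifiers (such as the accessibility id login_button) are assumptions about the app, not its real identifiers.

```python
from appium import webdriver
from appium.options.android import UiAutomator2Options
from appium.webdriver.common.appiumby import AppiumBy

# Assumed capabilities for a local Android emulator running the app under test.
options = UiAutomator2Options()
options.platform_name = "Android"
options.device_name = "Android Emulator"
options.app = "/path/to/KochenHelfer.apk"  # placeholder path

# Older Appium 1.x servers expect ".../wd/hub" as the base path.
driver = webdriver.Remote("http://localhost:4723", options=options)
try:
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "username").send_keys("test_user")
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "password").send_keys("secret")
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "login_button").click()
    assert driver.find_element(AppiumBy.ACCESSIBILITY_ID, "homepage_title").is_displayed()
finally:
    driver.quit()
```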

4.6 Model Deployment

Software deployment can be considered a combination of processes that makes a software system available to users. Deployment of the web server into a scalable production environment and deployment to the Google Play Store or Apple App Store are the two main components of the deployment process. The web server transfers data to and from the app, so proper configuration is essential. Similarly, for the app stores, we will be required to provide screenshots and marketing material and fill out the forms of the various stores.
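To make the web server’s role concrete, the sketch below shows a minimal Flask endpoint that receives an uploaded food photo and returns the predicted cooking state. The endpoint name, model file and label order are illustrative assumptions rather than a finalised API.

```python
import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)
model = tf.keras.models.load_model("kochen_helfer_cnn.h5")   # hypothetical model file
CLASSES = ["undercooked", "perfectly cooked", "overcooked"]  # assumed label order

@app.route("/predict", methods=["POST"])
def predict():
    """Receive an uploaded food photo and return the predicted cooking state."""
    uploaded = request.files["image"]
    image = Image.open(uploaded).convert("RGB").resize((224, 224))
    batch = np.expand_dims(np.asarray(image, dtype="float32") / 255.0, axis=0)
    state = CLASSES[int(np.argmax(model.predict(batch)[0]))]
    return jsonify({"state": state})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

The mobile app would POST the photo taken at each cooking stage to this endpoint and display the returned state in the results view.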

Mobile Enterprise Application Platforms (MEAPs) can be used as middleware to achieve a high level of flexibility. They provide quick deployment procedures along with support for high-level languages. Most importantly, they allow the application to be built once and deployed to various mobile device types.

5 User Interaction Interface

5.1 Log In View

The Log In View is the application’s landing page. Its main purpose is to register new users into the application and to log in already registered users. The view contains a text field for the username and another for the password, along with the login button. Apart from that, there is a register button to get new users signed up; pressing it opens a registration form where a new user can enter their name, age, email ID and phone number and set up a password. After registering, the user can log in with their credentials, and on pressing the login button they are redirected to the homepage view (Fig. 15).

Fig. 15 Login view

5.2 Homepage View

The homepage view is the view the user lands on after logging into the application. Its primary purpose is to encapsulate all the different features present in Kochen Helfer and to provide a user-friendly interface between the user and the functionalities.

This view contains a side panel that contains a menu for all the different functionalities. Next, we have a button to take us to the recipe selection and uploads view. The view also contains a recipe book using which the user can go through any online recipe (Fig. 16).

Fig. 16 Home page and recipe selection view

5.3 Recipe Selection and Uploads View

The recipe selection page presents the different dishes that can be made with the help of the application. It contains recipes from various regions, segregated categorically. Once we click on a particular recipe, we are directed to the instructions that need to be followed for that dish, with each step explained in a very detailed manner.

The upload view allows the user to upload the current state of the food being prepared. The primary purpose of this page is to make the experience interactive and provide a comfortable cooking experience.

5.4 Results View

In the results view, the evaluation results of the uploaded image are displayed. The interface helps the user understand whether the particular stage of the food is cooked or still needs time; accordingly, it displays “undercooked”, “perfectly cooked” or “overcooked”. If it is cooked or overcooked, the next step in the process is displayed on the screen. This helps the user track their progress at every step (Fig. 17).

Fig. 17 Results view

6 Conclusion

As we can see, the proposed approach can help amateurs cook while being away from home. Such a product could help develop independent individuals who can take responsibility for themselves. With time, such applications could help amateur cooks with very minimal knowledge gain genuine expertise.

With the right model in the right place, it could provide quality insights into whether the food is being cooked correctly. It could save millions of people from wasting food by giving them a guide to help them out.

7 Future Work

This project is currently a theoretical concept, and its real implementation is yet to take place. In the near future, we will work on implementing the entire Kochen Helfer pipeline, from the machine learning models to the mobile application.