1 Introduction

Many people enjoy eating junk food and soft drinks, which are high in sugar and calories. Lack of exercise, limited knowledge of healthy diets and uncontrolled eating habits have led to an increase in obesity. Obesity is associated with several health problems, such as hypertension, diabetes, cardiac disease and breathing difficulties. Excess body weight can damage the ligaments of the knees and other joints, causes breathlessness when walking or climbing stairs, and puts extra strain on the heart as it pumps blood around the body. Diabetes is a condition in which the body produces too little insulin, or cannot use it effectively, leading to an increase in blood sugar.

Imbalanced eating habits and the consumption of food with few nutrients and many calories are the main causes of obesity. A proper, balanced diet combined with regular exercise can reduce obesity and helps maintain a healthy life with a normal body mass index (BMI). Eating measured portions of food that is high in nutrients and fibre and low in calories can help with weight loss. However, it is very difficult for a person to measure food volume and to know the nutrient content of each food, and it is not practical for everyone to have a dietician available at all times. An automated system is therefore needed that can assist anyone at any time, using only an image of the food, by reporting its volume and nutrient content. Visual perception of an image is based mainly on colour and texture. The proposed image processing system performs image resizing, feature extraction, segmentation and classification. A multilayer perceptron (MLP) is used for classification, and the calorific value is calculated from the estimated food volume.

Digital imaging has shown promising results for recognizing food items and estimating calories compared with traditional methods. However, recognizing food items and estimating calories for proper dietary monitoring remains a challenging research problem. We propose an improved MLP algorithm for recognizing food items with high performance and accuracy. The main objective of this work is to provide a computer-based solution for maintaining proper dietary intake and BMI.

2 Related work

During adolescence, everyone undergoes many physiological and psychological changes. Eating habits change as adolescents begin to decide what they want to eat, and it is difficult for them to maintain a balanced diet with regular physical activity. Physical activity and nutrition are linked, and together they help maintain normal health and reduce health hazards [1].

Even an expert dietician finds it difficult to give full nutritional information by looking at a plate of food, because the salt, sugar, fruit, vegetable, meat and oil content cannot be assessed without tasting it. For natural foods such as vegetables and fruits, however, the nutrient values are easy to give. The energy (calorie) content of a food is calculated from its protein, carbohydrate and fat content, and a balanced diet with the required calories helps maintain a proper BMI. Table 1 lists sample fruits, vegetables and nuts with their calorie values according to international food standards [2, 3].
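As a worked illustration of deriving energy from macronutrient content, the sketch below applies the general Atwater factors (about 4 kcal/g for protein and carbohydrate, 9 kcal/g for fat). These factors are an assumption: the text does not state which conversion the standards behind Table 1 use, and the function name and portion values are hypothetical.

```python
# General Atwater factors in kcal per gram (assumed; not taken from Table 1).
ATWATER_KCAL_PER_G = {"protein": 4.0, "carbohydrate": 4.0, "fat": 9.0}

def energy_kcal(protein_g: float, carbohydrate_g: float, fat_g: float) -> float:
    """Estimate food energy (kcal) from macronutrient masses in grams."""
    return (protein_g * ATWATER_KCAL_PER_G["protein"]
            + carbohydrate_g * ATWATER_KCAL_PER_G["carbohydrate"]
            + fat_g * ATWATER_KCAL_PER_G["fat"])

# Hypothetical portion: 10 g protein, 20 g carbohydrate, 5 g fat.
portion_kcal = energy_kcal(10, 20, 5)  # 40 + 80 + 45 = 165 kcal
```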

Table 1 Calories, protein and fat content of sample fruits, vegetables and nuts

Traditionally, a clinical technician or dietician monitors the food and drink intake of inpatients and records the dietary information. For outpatients, it is not easy to obtain all the details, since patients cannot remember everything they have eaten and drunk. Calculating dietary information manually is therefore difficult, and an automated system that reports the nutrient and calorie content of food is required. To overcome the drawbacks of clinical methods, several researchers have developed faster methods in which a simple image of the food is given to a system that automatically calculates its calorie content [2,3,4].

Artificial intelligence is attracting increasing interest among researchers and is used in many real-time practical applications. Like humans, an AI system must reach a solution to a problem quickly. This is achieved by providing the required knowledge and data and training the system to solve the problem effectively. Many researchers use ANN, SVM, CNN, KNN, decision trees and other methods to classify fruits and vegetables [5,6,7,8].

Amazon Rekognition, Vision AI, Computer Vision and Clarifai are some of the deep learning platforms developed for identifying or detecting logos, celebrities, emotions, objects, text, food, vegetables, places, etc. [9]. Convolutional neural network models have been used for food classification, providing information such as the name of the food, its calories and its nutritional value [10, 11].

Pro Trip uses internet technology to give users personalised recommendations based on demographic information, actions, motivations and climate attributes. Based on the climate attributes, it suggests available foods according to their nutritive value [12].

Ancestral knowledge of traditional food is important for maintaining proper health. In [13], images of traditional food are captured under suitable conditions for better quality and divided into training, validation and testing sets, and a CNN model performs automatic recognition and analysis with high accuracy.

Machine-learning-based automatic classification of food images and feature extraction have been performed using a convolutional neural network. In this approach, the client sends food images to a server for decision-making, and the server returns feedback to the client [14]. A Fast R-CNN classifier was used to determine the category labels of regions of interest, and the YOLO framework was later used for computation and prediction. The neural network model treats the food images as a binary dataset from which it extracts information for food recognition [15, 16].

Recognizing food images and their calories is a challenging task. Using a cross-modal alignment and transfer network, food images are aligned with base images and visual embedding is used to perform food recognition [17]. A small-scale dataset of ten food image classes has been used as input to a model that combines a conventional bag-of-features representation with a linear support vector machine for food image recognition [18].

Images of Korean food were resized and divided into training and testing groups. A deep convolutional neural network was used for food recognition, and the results were compared with those of other models such as AlexNet, GoogLeNet, VGG and ResNet [19].

To segment food items, affinity propagation and unsupervised clustering methods have been adopted; affinity propagation with agglomerative hierarchical clustering (AHC) achieves 95% accuracy. A model for monitoring ingestive behaviour has been built to monitor and estimate calories. Food calories have also been measured with a personal digital assistant (PDA), in which food intake details are recorded and calories are estimated; however, it is not easy for a patient to carry a PDA at all times [2, 20].

Input images containing single and mixed food items have been taken with a smartphone and divided into training and testing sets. Pre-processing is carried out first, followed by vision-based segmentation, and a deep learning algorithm is then applied to estimate calories. A chopstick in the input image is used as a reference for measurement, and a density-based food database is used to estimate food volume, weight and calories. The relative average error rates of the estimated weight and calories are 6.65% and 6.70%, respectively [21, 22].

Two datasets have been collected and trained with single-task and multitask CNNs; the multitask CNN classifier achieved better results in food identification and calorie estimation than the single-task CNN. In another work, the top 10 Thai curries are considered: the segmented image is fed into a fuzzy logic system that identifies the ingredients based on their intensities and boundaries, and the calories are calculated as the sum of the calories of all the ingredients [23,24,25].

Food shape, size, colour and texture characteristics are used to measure the nutritional level from an image [26]. The extracted features are given to a support vector machine, and the calories are calculated using a nutrition table [27]. Some systems were unable to calculate nutrient values for semi-solid or liquid foods. A mobile phone integrated with a diet data recorder system was able to calculate food volume and provide nutrient and dietary information about the food [28].

Calculating the nutrient value of food from an image is a tedious process, since the actual contents of the food, such as salt, sugar and oil, cannot be measured from the image. Based on the portion volume, the calories and nutrients are looked up in tables. The patient must be warned about rotten or spoiled food so that it can be avoided. The system must also tell users, based on their BMI, what type and quantity of food they should eat to maintain a balanced diet [29].

The input consists of mixed food items. The input image undergoes multiple-hypothesis image segmentation, followed by feature extraction to detect local and global features, and a feed-forward neural network classifier is then applied to recognize the food items. The experiments reached accuracies of 0.947 (MAA) and 0.9599 (SA) [30].

The food images were collected from web pages. A dataset of 92,000 images divided into 23 food classes was used, with a fixed image resolution of 150 × 150. A DCCN model was used to perform food item recognition, and its results were compared with those of other networks [31].

A multi-scale multi-view feature aggregation (MSMVFA) model has been proposed for food item recognition. Various features, such as mid-level, high-level and deep visual features, are extracted from a supervised CNN for class representation, and the MSMVFA scheme with various deep networks achieves high performance and accuracy [32].

RNN classification with multiple hyperparameters has been used to detect malware. Three feature vectors were considered in the implementation, and the RNN with a Word2Vec feature vector provides high-performance, stable malware detection [33]. Experiments with various machine learning algorithms on 8200 Java classes from 14 software systems, evaluated using sensitivity and specificity, showed that the multilayer perceptron is the best predictor of change proneness using code smells [34]. Recommendation systems are used mainly in e-commerce applications to find good products; a hybrid action-related K-nearest neighbour similarity (HAR-KNN) method has been used to classify features from quality aspects, and evaluation with metrics such as MAE, MSE and RMSE showed a low error rate and a high predictive rate [35].

Several researchers have worked on different pre-processing methods, segmentation, feature selection and food item recognition techniques, such as Bayesian classifiers, SVMs, neural networks, HMMs and SOMs, to improve the recognition rate and reduce the complexity of food item recognition and calorie estimation for correct dietary intake. However, the computer-based solutions proposed so far for recognizing food items are not fully convincing, and their accuracy is not sufficient for maintaining dietary intake.

3 Proposed work

The block diagram of the proposed system is shown in Fig. 1. An image of fruit, vegetables or other food is used as the input to the system. The system has two stages: training and testing. In the training phase, the required features are extracted from the input images and the classifier is trained on them. The input image is resized and segmented to obtain the region of interest (ROI) whose nutrient value is to be determined. Features such as size, shape and texture are extracted from the segmented image and stored as datasets, which are used to train the multilayer perceptron (MLP) for the testing phase.

Fig. 1 Proposed system architecture

In the testing phase, images of food items are given as input, resized to 128 × 128 pixels and then segmented. Without segmentation, the required features cannot be extracted: an entire image of a plate containing food items cannot be used to calculate nutrient values, so the region of interest must be segmented using a suitable segmentation technique. Features are then extracted from the segmented image, and the model obtained in the training phase is used to calculate the calorific value.

Images of the food samples are shown in Fig. 2. Apple, banana, bread, guava, pizza and pomegranate are used as samples in this work.

Fig. 2 Sample input images of the food items

Fig. 3 Resized and ROI for different samples of banana

Fig. 4 Resized and ROI for different samples of apple

Fig. 5 Resized and ROI for different samples of bread

3.1 Resizing

The first step is to resize the input food image to 128 × 128 pixels. The region of interest is then obtained by determining the edges of the food sample, so that the required feature information can be extracted from it. The original image, resized image and edge-detected region of interest are shown in Figs. 3, 4, 5, 6, 7 and 8 for samples of banana, apple, bread, pomegranate, pizza and guava, respectively.
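Resizing to a fixed 128 × 128 grid can be sketched with nearest-neighbour sampling on an image stored as a list of rows. The interpolation method is an assumption: the text does not state which one was used (MATLAB's `imresize`, for instance, defaults to bicubic), and the function name is illustrative.

```python
def resize_nearest(image, out_h=128, out_w=128):
    """Resize a 2-D image (list of rows) using nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        # Map each output pixel back to the source pixel that covers it.
        [image[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

# A 2 x 2 image enlarged to 4 x 4: every source pixel becomes a 2 x 2 patch.
small = [[1, 2],
         [3, 4]]
big = resize_nearest(small, 4, 4)
```

Called with the defaults, `resize_nearest(img)` returns the 128 × 128 grid used by the later stages, whatever the original image size.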

3.2 Feature extraction

The ROI is used to extract various features from the image. The features required for calculating the calorific value are size, shape, colour and texture, and three different methods are used to extract them.

3.2.1 Scale invariant feature transform

The scale-invariant feature transform (SIFT) is used to identify local features of the image: the key points and their feature vectors are extracted from the ROI image.

The entire ROI is used to obtain the key points. Scale-space extrema detection identifies candidate key points using the Gaussian function in Eq. (1), and key point localization then eliminates low-contrast key points from the extracted features.

$$L\left( {x,y,{\upsigma }} \right) = G\left( {x,y,{\upsigma }} \right)*I \left( {x,y} \right)$$
(1)

where \(L(x, y, \upsigma)\) is the convolution of the original image \(I(x, y)\) with the Gaussian blur \(G(x, y, \upsigma) = \frac{1}{2\pi\upsigma^{2}} e^{-(x^{2} + y^{2})/(2\upsigma^{2})}\) at scale \(\upsigma\). The gradient direction is then used to assign an orientation, reorienting the image region if required, and scaling is performed on the oriented image. The gradient orientation, which makes the features invariant to rotation, is computed as given in Eq. (2).

$$\theta(x, y) = \tan^{-1}\left(\frac{L(x, y + 1) - L(x, y - 1)}{L(x + 1, y) - L(x - 1, y)}\right)$$
(2)
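Eqs. (1) and (2) can be sketched in plain Python for a greyscale image stored as a list of rows. The 3σ truncation of the kernel, the border replication and the use of `atan2` instead of a plain arctangent (so the orientation covers the full circle and a zero horizontal difference does not divide by zero) are implementation choices assumed here, not details given in the text. The sketch covers only blurring and orientation, not full SIFT key point detection.

```python
import math

def gaussian_kernel(sigma):
    """Sampled G(x, y, sigma), truncated at 3 sigma and normalised to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-radius, radius + 1)]
         for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]

def gaussian_blur(image, sigma):
    """Eq. (1): L = G * I. Border pixels are replicated (indices are clamped)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                yy = min(max(y + dy, 0), h - 1)
                for dx in range(-r, r + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc
    return out

def gradient_orientation(L, x, y):
    """Eq. (2) via atan2, giving radians in (-pi, pi]; L is indexed L[y][x]."""
    dy = L[y + 1][x] - L[y - 1][x]
    dx = L[y][x + 1] - L[y][x - 1]
    return math.atan2(dy, dx)

# Horizontal intensity ramp: the gradient points along +x, so theta is 0.
ramp = [[float(x) for x in range(11)] for _ in range(11)]
L = gaussian_blur(ramp, sigma=1.0)
theta = gradient_orientation(L, 5, 5)
```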

The key point descriptor generates a feature vector from the key points, with the gradient orientations in each region quantized into 8 orientation bins.
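The 8-bin orientation summary can be sketched as a histogram of the Eq. (2) orientations over one region, with each pixel weighted by its gradient magnitude. This is a simplification assumed for illustration: standard SIFT additionally applies Gaussian weighting and interpolation between bins, and concatenates 4 × 4 such histograms into a 128-element descriptor. The function name is hypothetical.

```python
import math

def orientation_histogram(L, x0, y0, x1, y1, bins=8):
    """Magnitude-weighted orientation histogram over x in [x0, x1), y in [y0, y1).
    The region must not touch the outermost image border (central differences)."""
    hist = [0.0] * bins
    for y in range(y0, y1):
        for x in range(x0, x1):
            dy = L[y + 1][x] - L[y - 1][x]
            dx = L[y][x + 1] - L[y][x - 1]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)      # map to [0, 2*pi)
            b = int(angle / (2 * math.pi) * bins) % bins     # guard against rounding to `bins`
            hist[b] += magnitude
    return hist

# On a horizontal ramp every gradient points along +x and falls into bin 0.
ramp6 = [[float(x) for x in range(6)] for _ in range(6)]
hist = orientation_histogram(ramp6, 1, 1, 5, 5)
```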

3.2.2 Gabor filter method

This method extracts a feature vector from the resized input image. The Gabor filter bank returns an array of Gabor objects, each specified by a wavelength (in pixels per cycle) and an orientation. The Gabor function is given in Eq. (3), and the energy of the filtered image is obtained from the filter responses.

$$G(x, y, \theta, u, \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\left\{-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right\} \exp\left\{2\pi i \left(u x \cos\theta + u y \sin\theta\right)\right\}$$
(3)

\(\theta\) is the orientation of the normal to the parallel stripes of the Gabor function, \(u\) is the frequency of the sinusoid (in cycles per pixel, the reciprocal of the wavelength) and \(\sigma\) is the standard deviation of the Gaussian envelope.
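A direct, unoptimised rendering of Eq. (3) together with a filter-energy measure is sketched below. Defining the energy as the sum of squared response magnitudes over the positions where the kernel fits inside the image is an assumption, since the text does not give the energy formula; the 3σ kernel radius, function names and test pattern are likewise illustrative.

```python
import cmath
import math

def gabor_kernel(theta, u, sigma):
    """Complex Gabor kernel of Eq. (3): a Gaussian envelope times a complex
    sinusoid of frequency u (cycles per pixel) along direction theta."""
    radius = max(1, math.ceil(3 * sigma))
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)
            carrier = cmath.exp(2j * math.pi * u * (x * math.cos(theta) + y * math.sin(theta)))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def gabor_energy(image, theta, u, sigma):
    """Sum of |response|^2 over positions where the kernel fits inside the image.
    Uses correlation; for a real image its magnitude equals that of convolution."""
    k = gabor_kernel(theta, u, sigma)
    r = len(k) // 2
    h, w = len(image), len(image[0])
    energy = 0.0
    for y in range(r, h - r):
        for x in range(r, w - r):
            resp = sum(k[dy + r][dx + r] * image[y + dy][x + dx]
                       for dy in range(-r, r + 1) for dx in range(-r, r + 1))
            energy += abs(resp) ** 2
    return energy

# Vertical stripes with a period of 4 pixels (u = 0.25) respond strongly at theta = 0.
stripes = [[math.cos(2 * math.pi * 0.25 * x) for x in range(20)] for _ in range(20)]
e_along = gabor_energy(stripes, theta=0.0, u=0.25, sigma=2.0)
e_across = gabor_energy(stripes, theta=math.pi / 2, u=0.25, sigma=2.0)
```

A bank of such filters at several orientations and frequencies yields one energy value per filter, which is how a texture feature vector can be formed.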

3.2.3 Colour histogram

An image histogram is a graphical representation of the distribution of pixel values in an image. The histogram of a digital image with grey levels in the range [0, L−1] is the discrete function given in Eq. (4).

$$h\left( {r_{k} } \right) = n_{k}$$
(4)

where \(r_{k}\) is the kth grey level and \(n_{k}\) is the number of pixels in the image with grey level \(r_{k}\).

The colour histogram gives the colour distribution of the image: pixels are grouped by colour range and counted in the histogram. For a grey-level image, pixels are instead grouped by intensity level.
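Eq. (4) and its colour counterpart reduce to simple counting. In the sketch below, eight equal-width bins per RGB channel is an assumed choice for the colour zones (the text does not specify the bin layout), and the function names and pixel values are illustrative.

```python
def grey_histogram(image, levels=256):
    """Eq. (4): h[r_k] = n_k, the number of pixels whose grey level is r_k."""
    h = [0] * levels
    for row in image:
        for value in row:
            h[value] += 1
    return h

def colour_histogram(rgb_image, bins=8, levels=256):
    """One histogram per R, G, B channel, each with `bins` equal-width colour zones."""
    hist = [[0] * bins for _ in range(3)]
    for row in rgb_image:
        for pixel in row:
            for channel, value in enumerate(pixel):
                hist[channel][value * bins // levels] += 1
    return hist

grey = [[0, 0, 255],
        [128, 255, 255]]
h = grey_histogram(grey)
rgb = [[(255, 0, 0), (250, 10, 0)]]   # two reddish pixels
ch = colour_histogram(rgb)
```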

3.3 Segmentation

In this work, segmentation means dividing the image into equal parts, which simplifies computation. The image is divided into a 4 × 4 grid of equal-sized blocks, and a Gabor filter bank with 6 orientations and 5 scales is applied to each block.
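The 4 × 4 block split and the 6-orientation, 5-scale filter bank can be sketched as follows. Evenly spaced orientations kπ/6 are assumed, and because the five scale settings (frequency and σ per scale) are not given in the text they appear only as indices; the function name and synthetic 128 × 128 image are illustrative.

```python
import math

def split_into_blocks(image, grid=4):
    """Split an image (list of rows) into grid x grid equal blocks in row-major order.
    Rows or columns left over when the size does not divide evenly are dropped."""
    h, w = len(image), len(image[0])
    bh, bw = h // grid, w // grid
    return [
        [row[c * bw:(c + 1) * bw] for row in image[r * bh:(r + 1) * bh]]
        for r in range(grid)
        for c in range(grid)
    ]

ORIENTATIONS = [k * math.pi / 6 for k in range(6)]  # assumed: evenly spaced over [0, pi)
SCALES = range(5)  # hypothetical scale indices; their (u, sigma) values are not specified

image = [[(x + y) % 256 for x in range(128)] for y in range(128)]
blocks = split_into_blocks(image)
# One Gabor energy value per block and per (orientation, scale) filter.
n_features = len(blocks) * len(ORIENTATIONS) * len(SCALES)
```

For a 128 × 128 input this gives 16 blocks of 32 × 32 pixels and 16 × 6 × 5 = 480 texture features.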

4 Classification

The output of the Gabor filter bank is used as the image feature on which the classification algorithms make their decisions. The classes are independent of each other, i.e. a change in one training class does not affect the others. Classification can be supervised or unsupervised and has two phases, training and testing: in the training phase, a reliable class model is generated from the training dataset, and it is then used for classification in the testing phase.

The basic classification algorithm is as follows:

  1. Acquire multiple images of each fruit, vegetable or food item for training and testing.

  2. Pre-process and segment each image to obtain the ROI.

  3. Extract the appropriate features based on the boundaries.

  4. Measure the actual volume of the fruits.

  5. Train the network using the real volume as the target to obtain the trained model.

  6. Estimate the volume in other images and calculate the calories.

4.1 Multilayer perceptron

In supervised learning, labelled datasets are provided, so the expected results are already known. In unsupervised learning, no labelled data are provided, and the algorithm must find structure in the data over several iterations. In semi-supervised learning, only some labelled samples are provided, and the algorithm must learn to use them, updating its parameters during the learning process.

The human brain can process complicated tasks easily by storing information as patterns and gaining, through experience, the knowledge needed to solve complex problems. Similarly, a multilayer perceptron is a feed-forward neural network loosely inspired by the human brain. The MLP is trained with a supervised algorithm and, as shown in Fig. 9, has input, hidden and output layers.

Fig. 6 Resized and ROI for different samples of pomegranate

Fig. 7 Resized and ROI for different samples of pizza

Fig. 8 Resized and ROI for different samples of guava

In Fig. 9, each circle represents a network unit. Units other than those in the input and output layers are hidden units, whose inputs and outputs are not directly accessible. Given a suitable number of hidden units, activation function and weights, an MLP can approximate any continuous function to arbitrary accuracy. The output of unit i in layer m + 1 for input pattern n is obtained by applying the activation function f to the weighted sum of the outputs of the \(N^{m}\) units in layer m, as given in Eq. (5).

$$x_{i}^{m + 1}(n) = f\left[\sum_{j = 1}^{N^{m}} W_{ij}^{m} x_{j}^{m}(n)\right]$$
(5)
Fig. 9 Multilayer perceptron model
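Eq. (5) amounts to a weighted sum followed by the activation function f, applied layer by layer. The sketch below assumes the logistic sigmoid as the default f (the activation Eq. (7) uses for hidden units) and, like Eq. (5), has no separate bias term; the function names are illustrative and weights are plain nested lists.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def layer_forward(W, x, f=sigmoid):
    """Eq. (5): x_i^{m+1} = f(sum_j W_ij^m x_j^m) for every unit i of layer m + 1."""
    return [f(sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in W]

def network_forward(weights, x, f=sigmoid):
    """Apply Eq. (5) once per weight matrix, from the input layer to the output layer."""
    for W in weights:
        x = layer_forward(W, x, f)
    return x

# Two inputs -> two hidden units -> one output, with an identity f for easy checking.
out = network_forward([[[1.0, 2.0], [3.0, 4.0]], [[1.0, -1.0]]], [1.0, 1.0], f=lambda a: a)
```

Here the hidden layer produces [3.0, 7.0] and the output unit computes 3.0 − 7.0 = −4.0.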

The weights are updated from the error, as given in Eq. (6), where \(\gamma\) is the learning rate, \(T\) is the number of training samples and \(\delta_{i}^{m + 1}(n)\) is the error term of unit \(i\) in layer \(m + 1\) for sample \(n\).

$$W_{ij}^{m}(\text{new}) = W_{ij}^{m}(\text{old}) + \gamma \sum_{n = 1}^{T} \delta_{i}^{m + 1}(n)\, x_{j}^{m}(n)$$
(6)

Algorithm for MLP:

  1. Initialize the neurons and weights.

  2. Define the activation functions of the hidden and output layers, as given in Eqs. (7) and (8), where d, H and K are the numbers of input, hidden and output units and \(x_{0} = z_{0} = 1\) are bias inputs.

    $$z_{h} = \text{sigmoid}\left(\sum_{j = 0}^{d} w_{hj} x_{j}^{t}\right)$$
    (7)
    $$y_{i} = \text{softmax}\left(\sum_{h = 0}^{H} v_{ih} z_{h}\right)$$
    (8)
  3. Update the hidden-layer weights as given in Eq. (9), where \(\eta\) is the learning rate and \(r_{i}^{t}\) is the target output for sample t.

    $$\Delta w_{hj} = \eta \left(\sum_{i = 1}^{K} \left(r_{i}^{t} - y_{i}^{t}\right) v_{ih}\right) z_{h} \left(1 - z_{h}\right) x_{j}^{t}$$
    (9)
  4. Compute the validation error as given in Eq. (10).

    $$\text{err}(\text{epoch}) = \frac{1}{2} \sum_{x^{t} \in X_{\text{validating}}} \sum_{i = 1}^{K} \left(r_{i}^{t} - y_{i}^{t}\right)^{2}$$
    (10)
  5. Update the weights as given in Eq. (11).

    $$W_{ij}^{m}(\text{new}) = W_{ij}^{m}(\text{old}) + \gamma \sum_{n = 1}^{T} \delta_{i}^{m + 1}(n)\, x_{j}^{m}(n)$$
    (11)
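The steps above can be sketched as a one-hidden-layer network trained online (one sample at a time). The output-layer update Δv_ih = η(r_i − y_i)z_h is the standard companion of the hidden-layer rule for softmax outputs with a cross-entropy loss and is assumed here, since the text gives only the hidden-layer rule; the class name, layer sizes, learning rate, epoch count and toy data are illustrative, not the settings used in the experiments.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def softmax(o):
    m = max(o)                                  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in o]
    s = sum(e)
    return [v / s for v in e]

class MLP:
    """One hidden layer: sigmoid hidden units (Eq. 7) and softmax outputs (Eq. 8)."""

    def __init__(self, d, H, K, seed=0):
        rng = random.Random(seed)
        # w[h][j]: input j -> hidden unit h + 1; j = 0 is the bias input x_0 = 1.
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(d + 1)] for _ in range(H)]
        # v[i][h]: hidden unit h -> output i; h = 0 is the bias unit z_0 = 1.
        self.v = [[rng.uniform(-0.5, 0.5) for _ in range(H + 1)] for _ in range(K)]

    def forward(self, x):
        xb = [1.0] + list(x)
        z = [1.0] + [sigmoid(sum(a * b for a, b in zip(wh, xb))) for wh in self.w]
        y = softmax([sum(a * b for a, b in zip(vi, z)) for vi in self.v])
        return xb, z, y

    def train_step(self, x, label, eta):
        xb, z, y = self.forward(x)
        diff = [(1.0 if i == label else 0.0) - yi for i, yi in enumerate(y)]  # r_i - y_i
        # Eq. (9): hidden weights, back-propagated through the current output weights v.
        for h, wh in enumerate(self.w, start=1):
            g = sum(diff[i] * self.v[i][h] for i in range(len(y))) * z[h] * (1.0 - z[h])
            for j in range(len(wh)):
                wh[j] += eta * g * xb[j]
        # Output weights: delta v_ih = eta * (r_i - y_i) * z_h.
        for i, vi in enumerate(self.v):
            for h in range(len(vi)):
                vi[h] += eta * diff[i] * z[h]

    def error(self, data):
        """Eq. (10): half the summed squared error over a validation set."""
        total = 0.0
        for x, label in data:
            y = self.forward(x)[2]
            total += sum(((1.0 if i == label else 0.0) - yi) ** 2 for i, yi in enumerate(y))
        return 0.5 * total

    def predict(self, x):
        y = self.forward(x)[2]
        return max(range(len(y)), key=y.__getitem__)

# Toy 1-D, 2-class problem standing in for real food features (hypothetical data).
data = [([0.0], 0), ([0.1], 0), ([0.9], 1), ([1.0], 1)]
net = MLP(d=1, H=3, K=2)
before = net.error(data)
for epoch in range(1000):
    for x, label in data:
        net.train_step(x, label, eta=0.5)
after = net.error(data)
```

Evaluating `error` on a held-out set after each epoch, as in step 4, is what would normally be used to decide when to stop training.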

4.2 SVM

SVM classification, like other supervised methods, involves training and testing. Each sample in the training set contains several features: the SVM feature vectors hold information such as the colour, shape, size and texture of the food item. The features extracted during the training phase are used when performing the testing.
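As a minimal sketch, assuming a linear kernel, a two-class SVM can be trained with the Pegasos stochastic sub-gradient method on the regularised hinge loss. The text does not state the kernel, solver or regularisation used, so `lam`, the epoch count, the function names and the 2-D feature vectors below are hypothetical; for the six food classes, training one such classifier per class (one-vs-rest) is one common extension.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos: stochastic sub-gradient descent on the regularised hinge loss.
    X is a list of feature vectors, y a list of labels in {-1, +1}.
    A constant 1.0 is appended to each vector so the last weight acts as a bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                       # decreasing step size
            margin = label * sum(wk * xk for wk, xk in zip(w, x))
            w = [(1.0 - eta * lam) * wk for wk in w]    # shrink: regulariser gradient
            if margin < 1.0:                            # hinge loss active: move towards x
                w = [wk + eta * label * xk for wk, xk in zip(w, x)]
    return w

def svm_predict(w, x):
    score = sum(wk * xk for wk, xk in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Hypothetical 2-D feature vectors for two food classes.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
```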

After the region of interest (the food portion) is identified, the volume is measured. Both the top and side views of the food are used to calculate the food volume, and the length, width, height and depth are measured from the images. The area of the food is calculated using Eq. (12).

$$A = \sum_{i = 1} T_{i}$$
(12)

The food volume is determined by the total area and the depth of the food, as given in Eq. (13).

$$V = A \times T$$
(13)

The system calculates the mass of the food according to the food type, by multiplying the food volume (V) by its density (ρ), as given in Eq. (14).

$$M = \rho V$$
(14)

Finally, the calorie value of the food is obtained by multiplying the calculated mass by the normalized calorific value (calories per unit mass) of the food.
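The chain from volume to calories in Eqs. (13) and (14) can be sketched as three small functions. The area is taken as already known because the text does not define the terms \(T_{i}\) in Eq. (12); the function names, units (cm, g, kcal) and all numeric values below are hypothetical.

```python
def food_volume(area_cm2, depth_cm):
    """Eq. (13): V = A x T, with T the depth (thickness) of the food."""
    return area_cm2 * depth_cm

def food_mass(volume_cm3, density_g_per_cm3):
    """Eq. (14): M = rho * V."""
    return density_g_per_cm3 * volume_cm3

def food_calories(mass_g, kcal_per_g):
    """Calories = mass x calorific value per gram of the recognized food type."""
    return mass_g * kcal_per_g

# Hypothetical example: 40 cm^2 area, 5 cm depth, density 0.8 g/cm^3, 0.52 kcal/g.
volume = food_volume(40.0, 5.0)       # 200 cm^3
mass = food_mass(volume, 0.8)         # about 160 g
kcal = food_calories(mass, 0.52)      # about 83.2 kcal
```

In a full pipeline the density and calories per gram would be looked up from the class predicted by the classifier.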

5 Results

Six food classes are considered: apple, banana, bread, guava, pizza and pomegranate. SVM- and MLP-based classifiers are used, and their results are compared. The efficiency and accuracy of the SVM and MLP methods are analysed using precision, recall and F-measure, and the results are tabulated.

In measurement terms, accuracy describes how close measurements are to the true value, whereas precision describes how close repeated measurements are to each other. For classification, precision is the fraction of samples predicted as a class that actually belong to it, as given in Eq. (15).

$$\text{Precision} = \frac{|\text{true positive}|}{|\text{true positive}| + |\text{false positive}|}$$
(15)

Table 2 shows the precision values of MLP and SVM; the precision of MLP is higher than that of SVM. Recall measures the proportion of relevant (actually positive) samples that are correctly retrieved, and is obtained from Eq. (16).

$$\text{Recall} = \frac{|\text{true positive}|}{|\text{true positive}| + |\text{false negative}|}$$
(16)
Table 2 Precision rate

Table 3 shows that MLP has better recall values than SVM. The F-measure is calculated from the precision and recall values of each class and is shown in Table 4.

Table 3 Recall rate
Table 4 F-measure

For each class, the F-measure values obtained with MLP compare favourably with those of the traditional classifiers.
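The per-class precision, recall and F-measure of Eqs. (15) and (16) can be computed from a confusion matrix, treating each class in turn as the positive class (one-vs-rest). The F-measure is taken to be the balanced F1 score, 2PR/(P + R), which is an assumption since the text does not give its formula; overall accuracy (correct predictions over all predictions) is included for completeness. The function names and the confusion matrix are hypothetical.

```python
def per_class_metrics(cm):
    """cm[a][p] counts samples of actual class a predicted as class p.
    Returns one (precision, recall, f1) tuple per class."""
    k = len(cm)
    results = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[a][c] for a in range(k)) - tp   # predicted c, actually another class
        fn = sum(cm[c]) - tp                        # actually c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((precision, recall, f1))
    return results

def accuracy(cm):
    """Fraction of all samples on the diagonal (correctly classified)."""
    return sum(cm[c][c] for c in range(len(cm))) / sum(map(sum, cm))

cm = [[8, 2],
      [1, 9]]
metrics = per_class_metrics(cm)
```

For class 0 this gives precision 8/9, recall 0.8 and F1 = 16/19; the overall accuracy is 17/20 = 0.85.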

The mean square error (MSE) measures the average squared difference between the original image and the recognized image. It is evaluated as

$$\text{MSE} = \frac{1}{MN}\sum_{i = 1}^{M}\sum_{j = 1}^{N}\left(f(i, j) - f'(i, j)\right)^{2}$$
(17)

In Eq. (17), \(f(i, j)\) is the original image, \(f'(i, j)\) is the recognized image, M is the height of the image and N is its width.
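Eq. (17) translates directly into code for two equally sized greyscale images stored as lists of rows; the function name and the small arrays in the example are hypothetical.

```python
def mse(f, g):
    """Eq. (17): mean of squared pixel differences between images f and g (M x N)."""
    M, N = len(f), len(f[0])
    return sum((f[i][j] - g[i][j]) ** 2 for i in range(M) for j in range(N)) / (M * N)

original = [[0, 2]]
recognized = [[0, 0]]
err = mse(original, recognized)   # (0 + 4) / 2 = 2.0
```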

Sensitivity (the true positive rate) measures the proportion of positive samples that the system classifies correctly. It is evaluated as

$${\text{Sensitivity}} = \frac{{{\text{True}}\,{\text{Positive}}}}{{({\text{True}}\,{\text{positive}} + {\text{False}}\,{\text{Negative}})}}$$
(18)

Specificity (the true negative rate) measures the proportion of negative samples that are correctly rejected during food item recognition. It is evaluated as

$${\text{Specificity}} = \frac{{{\text{True}}\,{\text{Negative}}}}{{({\text{True}}\,{\text{negative}} + {\text{False}}\,{\text{positive}})}}$$
(19)
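Eqs. (18) and (19) can be evaluated per class from a multi-class confusion matrix by treating one class as positive and all others as negative. This one-vs-rest reduction is one common way to obtain per-class values and is assumed here; the function name and the 3 × 3 matrix are hypothetical.

```python
def sensitivity_specificity(cm, c):
    """One-vs-rest sensitivity (Eq. 18) and specificity (Eq. 19) for class c.
    cm[a][p] counts samples of actual class a predicted as class p."""
    k = len(cm)
    total = sum(map(sum, cm))
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                          # class c samples missed
    fp = sum(cm[a][c] for a in range(k)) - tp     # other samples labelled as c
    tn = total - tp - fn - fp                     # everything else
    return tp / (tp + fn), tn / (tn + fp)

cm = [[5, 1, 0],
      [0, 4, 1],
      [1, 0, 6]]
sens0, spec0 = sensitivity_specificity(cm, 0)   # 5/6 and 11/12
```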

Table 5 shows the mean square error, sensitivity and specificity of the proposed system. The calorie estimates of SVM and MLP are shown in Table 6. Calorific values are calculated from the reference calorific value and the calculated and original masses of the food items. Table 6 shows that the MLP results are very close to the actual calorie values of each food type, whereas the SVM calorie values are considerably higher than the actual calories of the samples.

Table 5 Mean square error, sensitivity and specificity of SVM and MLP
Table 6 Comparison results of calorie estimation

The mean absolute error (MAE) is computed over the dataset to measure the average magnitude of the errors, and these values are combined to identify the variation in error. The per-class accuracy is related to the precision, recall and F-measure values, from which the incorrectly classified instances can be identified. The classification results, including the weighted average, are shown in Fig. 10.

Fig. 10 Classification result

Figure 11 illustrates the overall accuracy on the training and validation sets during the training process. In the proposed framework, false quantization values are used to simulate the error; the results indicate that combining these with full-precision training helps identify high-quality models and generate the values used to compute the classification parameters.

Fig. 11 Overall accuracy

Figure 12 shows the confusion matrix of the proposed classification technique, which compares the actual class of each input with its predicted class. The off-diagonal entries show which other classes a given class is confused with, information that is not visible from a single overall classification score.

Fig. 12 Confusion matrix

Computation time is an important measure of the efficiency of the proposed technique. Fig. 13 compares the computation time of the proposed technique, which is minimized, with that of the relevant technique.

Fig. 13 Computation time

Precision is the number of correctly predicted positive values divided by the total number of predicted positive values, and recall is the number of correctly predicted positive values divided by the total number of actual positive values. The F1-score is the harmonic mean of precision and recall, and accuracy is the ratio of correctly predicted values to the total number of values. These measures are compared in Fig. 14.

Fig. 14 Performance comparison

A detailed survey of food item recognition shows that SVM gives good-quality results and performance, so the SVM classifier is taken as the baseline and its results are compared with those of the improved MLP. The results show that the proposed MLP algorithm provides better performance and classification accuracy than other existing food item recognition techniques.

6 Conclusion

In this work, food item recognition and calorie estimation are carried out using SVM and improved MLP models. The proposed work handles single food items using various pre-processing, segmentation and feature extraction techniques, and the extracted features are fed into SVM and MLP classifiers for recognition. This application can help diabetic patients maintain a proper diet and BMI by automatically calculating the calorie values of food items. The proposed method is implemented in MATLAB. Food samples from 6 classes are considered for calorific value estimation, and the MLP obtained calorie values closer to the actual sample values than the SVM. The performance metrics are evaluated in terms of precision, recall, true positive rate, false positive rate, true negative rate and false negative rate. The proposed method thus achieves high accuracy and efficient results in recognizing food items. Future work will address complex food items analysed from different directions, with features extracted using optimized techniques. The system could also be integrated with external devices, and clustering techniques and various machine learning models could be used for real-time food item recognition, monitoring a person's food intake and reporting it regularly to help prevent various diseases.