Keywords

1 Introduction

In the twenty-first century, people’s living standards are constantly improving, and health has become a widespread concern. Health involves both physical health and a healthy diet. To provide proper dietary guidance, the relevant nutrition and diet authorities in China have formulated dietary guidelines based on the actual situation and publicized them through various channels. For example, increasingly popular short videos and mini-programs (Applets) often advertise the nutritional content, calorie content, and preparation methods of certain foods. Still, it is unclear whether they can provide people with personalized guidance [1, 2]. With the popularization of Artificial Intelligence (AI) technology, a variety of nutrition and diet software on the market also recommends suitable food combinations based on users’ eating habits. However, most computer-based healthy catering systems target particular groups, such as pregnant women, infants, and diabetic patients [3]. For ordinary people, meal planning is cumbersome and requires professional knowledge, so it is difficult for them to judge whether the food combinations recommended by these systems are reasonable. Therefore, applying AI, Cloud Computing, and other technologies to the nutrition, health, and balanced diet of the general public has become a concern of researchers in related fields.

A long-term, scientific, reasonable, and balanced diet keeps people healthy, enables them to create more value for society, and improves work efficiency, thereby promoting national economic development. With the progress of mobile communication, the number of intelligent terminals is increasing rapidly, and technologies related to Cloud Computing and Communication are ever more widely used [4,5,6]. In the dietary nutrition field, people’s growing attention to a balanced diet contrasts sharply with the current prevalence of dietary nutrition imbalance. Therefore, it is urgent to apply computer technologies to food nutrition identification and health analysis, for example, by applying Big Data to food nutrition management or by using Cloud Communication technology to build a cloud platform for food nutrition and diet management. Such a food cloud platform handles a large volume of data: it can effectively extract food-related data, such as the picture and weight of food, and analyze this information with the help of the Cloud Communication platform [7]. However, computer-based approaches must cope with the diverse nutritional needs, food types, and nutrients required by users, as well as constraints imposed by users’ families, cooking menus, and regions. The convolution operation characteristic of Convolutional Neural Networks (CNNs) in Deep Learning (DL) can extract and classify features of food nutrition data. By learning autonomously from a large number of orders, a CNN can also capture users’ personalized preferences for matching nutritional needs with food nutrition, enabling intelligent recommendation of food nutrition combinations [8, 9]. Digital Twins (DTs) technology can map food pictures and other data collected in the physical space to a virtual space, where their nutrition is analyzed on a cloud platform to help users achieve balanced nutrient intake [10].

To sum up, given the current progress of science, technology, and the social economy, making people’s food intake nutritionally balanced and healthy is of significant practical value for promoting social development. The innovation of this study lies in four aspects. Firstly, Cloud Communication technology is adopted to analyze human nutrition and health needs. Secondly, an improved AlexNet network extracts and classifies food images. Thirdly, DTs technology is used to map the obtained image information and human nutrition and health data. Fourthly, a food DTs nutrition evaluation model is constructed based on Cloud Computing combined with AlexNet. In addition, an objective function based on Sum Throughput Maximization (STM) is constructed and optimized. Finally, the model is verified by experiments, providing ideas for balanced nutrient intake and the intelligent development of the dietary nutrition field.

2 Recent Related Work

2.1 Development Status of Intelligent Detection of Dietary Nutrition Intake

Regular monitoring of people’s nutritional intake is essential to reduce the risk of disease-related malnutrition, especially for special groups such as pregnant women, infants, and people with diabetes. Although methods for estimating nutrient intake have gradually been developed, there is still a clear need for a more reliable, fully automated technique. Many scholars have researched the intelligent detection of food nutrition and intake. Jia et al. (2019) employed an egocentric (first-person) wearable camera to acquire pictures and developed an AI-based algorithm that automatically detects food in images for dietary assessment. They found that the algorithm could process real-world images and detect food automatically with an accuracy of more than 85% [11]. Liu et al. (2020) proposed a joint food-ingredient learning module based on an Attention Fusion Network for fine-grained food and ingredient recognition; simulations showed that the module effectively recognized fine-grained food images [12]. Lu et al. (2020) put forward a new AI-based system that accurately estimates food nutrient intake by processing Red Green Blue-Depth (RGB-D) image pairs captured before and after meals. The estimated nutrient intakes were highly correlated with the ground truth (correlation > 0.91) with a small average relative error (< 20%), significantly outperforming existing nutrient intake assessment techniques [13]. Rachakonda et al. (2020) proposed a DL model for Edge Computing platforms that automatically detects, classifies, and quantifies food on a user’s plate; compared with other paradigms, the model achieved an overall accuracy of 98% and an average accuracy of 85.8% [14]. Arslan et al. (2021) investigated food classification using common DL algorithms on a public food database and reported a classification accuracy of 90.02%, which meets practical classification needs [15].

2.2 Application Status and Development Trend of DTs

DTs technology has been extensively applied in various industries in recent years. It maps actual data obtained in the industrial field into a virtual space, further increasing the potential value of the data. At present, DTs technology has also extended to the medical and health fields, and many scholars have studied it. Laamarti et al. (2020) put forward an ISO/IEEE 11073-standardized DTs framework that cyclically acquires information from personal health devices, analyzes the data, and sends feedback to users. The authors showed that this framework could serve as a foundation for intelligent medical DTs [16]. Wan et al. (2021) used DTs technology to map physical brain images to a virtual space to handle the large amount of unlabeled data in brain images. On this basis, they established an integrated DTs diagnosis and prediction model for brain images based on a Semi-Supervised Support Vector Machine and an improved AlexNet. Experiments showed that the model achieved high accuracy, acceleration efficiency, and segmentation and recognition performance while keeping errors low, supporting brain image feature detection and digital diagnosis [17]. Francisco et al. (2021) used smart meters to manage building energy within smart-city DTs, making smart energy management of medium and large buildings across the geographical scope of a smart city a critical step in smart city construction [18]. Zhang et al. (2021) proposed a social-awareness-based vehicular edge caching mechanism to cope with the rapid development of intelligent transportation. The mechanism dynamically coordinated the caching capabilities of roadside units and smart vehicles according to user preference similarity and service availability. At the same time, the edge caching system was mapped into a virtual space using DTs technology to facilitate the construction of a social relationship model. Simulations showed that the proposed edge caching scheme had significant advantages in optimizing caching utility [19].

2.3 Analysis and Summary of Existing Research

In summary, many scholars in the above fields have studied the intelligent detection of food nutritional components. However, few studies address dietary intake and nutritional matching. Moreover, with the continuous improvement of DTs technology, its applications have expanded from initial industrial production to medical health and many other areas. Therefore, this paper introduces DTs technology and uses intelligent algorithms such as DL to evaluate dietary nutrition and balance food intake.

3 Analysis of the Nutrition Evaluation Model of Food DTs Based on Cloud Computing and DL

3.1 Analysis of Demand for Intelligent Assessment of Food Nutrition

The human body obtains all its nutrients from food. To maintain the healthy balance the body requires and achieve reasonable nutrition, people should eat diverse types of food and supplement nutrients according to standard levels. The Dietary Guidelines for Chinese Residents (2019) [20] issued by the Chinese Nutrition Society point out that eating more green leafy vegetables and fruits, taking cereals as the staple food, adequately supplementing dairy products and beans, and maintaining a reasonable daily combination of meat and vegetables allow the human body to carry out normal metabolism. The reference values include the average level of a nutrient required by the human body (average requirement), the amount recommended by nutrition standards (recommended daily intake), the intake level considered adequate (adequate intake), and the acceptable range for macronutrients such as protein. Based on these nutritional requirements, the Ministry of Health has included macronutrients (protein, fat, and carbohydrates, i.e., sugars), minerals (macro-elements and trace elements), and vitamins (fat-soluble and water-soluble) in the Dietary Reference Intakes of Chinese Residents. Energy-yielding (thermogenic) nutrients, namely proteins, lipids, and carbohydrates, release energy in the human body [21]. Figure 1 displays the food nutrition collocation.

Fig. 1
A diagram illustrates 4 layers of foods with their proportions from top to bottom: condiments, eat less; eggs, milk, and meat, moderate; vegetables and fruits, eat more; and cereals, eat most.

Examples of food nutrition collocation

As shown in Fig. 1, human food intake primarily includes condiments, eggs, milk, meat, vegetables, fruits, and cereals. Because foods differ in nutritional content, the recommended intake increases from the top layer to the bottom layer of the diagram; however, different groups have different nutrient demands. Food recommendation must cope with diverse user characteristics, user nutrient requirements, food types, and the nutritional components of foods and agricultural products, as well as constraints such as multi-user households, cooking menus, historical menus, seasons, and regions. Therefore, selecting suitable foods from many agricultural products across numerous product categories and recommending them intelligently to users is a practical need. In the food DTs nutrition evaluation model based on Cloud Computing combined with DL, this paper uses Cloud Computing technology to collect relevant human-body and food data and transmits them to the cloud center for processing. DTs technology then maps the data to a virtual space, where a multi-objective optimization algorithm for human nutrition demand is designed. The proposed methods aim to achieve optimal matching between the nutrition required by the human body and the nutrients supplied by various foods.

3.2 Application of Cloud Computing and Communication Technology to Human Nutrition and Health Analysis

As an essential branch of communication networks, the Wireless Body Area Network (WBAN) takes the human body as its network information environment. It is a communication network composed of lightweight, ultra-thin, low-power intelligent sensors located on the body surface or inside the body. The nodes transmit the critical physiological parameters acquired by the WBAN in real time to the health monitoring center and the Cloud Computing center through the wireless communication network to realize data sharing [22, 23]. Here, the WBAN is applied to human health analysis.

A WBAN is a wireless communication network constructed from multiple sensor nodes located on the surface of, inside, and around the human body. The whole system operates by first sensing information in a distributed manner, then aggregating and processing it, and finally analyzing it remotely. The WBAN communication architecture is divided into three layers: intra-WBAN communication, inter-WBAN communication, and communication beyond the WBAN, as shown in Fig. 2.

Fig. 2
A diagram illustrates the relationship between communication outside the body area network, inter-body area network communication, and intra-body area network communication.

WBAN communication architecture under cloud communication

As shown in Fig. 2, in intra-WBAN communication, various sensors (such as EEG, ECG, and blood pressure sensors) distributed over the head, limbs, and trunk communicate with the sink node in real time. A network structure suited to the scenario is designed, and the corresponding communication protocol is chosen according to the receiver structure. In inter-WBAN communication, a Personal Digital Assistant (PDA) obtains human physiological information, shares it with other WBANs (including home data terminals) through Bluetooth or Wi-Fi, and performs preliminary data integration and analysis. Communication beyond the WBAN shares human physiological data with hospitals, smart homes, smart cars, etc., directly through base stations and the Internet to provide people with various services. Human data can further be analyzed with Big Data and Cloud Computing to support high-quality decisions on dietary balance and human health.

Owing to the dynamic characteristics of the human body (such as body posture and muscle movement), the shadowing effect, and the varying environments of a WBAN, a typical path loss model PL can be described as Eq. (1).

$$ {P}_L= PL(d)+X $$
(1)

In Eq. (1), PL(d) is the path loss at distance d from the node, obtained from the Friis transmission formula in free space, and X is a shadowing factor that follows a normal distribution with zero mean; its standard deviation is determined by the channel environment.

In body-surface communication, the channel loss function PdB(d) between body-surface nodes is constructed as Eq. (2).

$$ {P}_{dB}(d)={P}_{0, dB}+10n\log \left(d/{d}_0\right) $$
(2)

In Eq. (2), n is the path loss exponent, d0 is the reference distance, and P0,dB is the path loss at the reference distance d0.
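As a minimal sketch of how Eqs. (1) and (2) combine on the body surface, the Python function below evaluates the log-distance loss and adds a zero-mean Gaussian shadowing term in dB. The function name and the default values (d0 = 0.1 m, P0,dB = 35 dB, n = 3) are illustrative placeholders, not measured WBAN parameters.

```python
import math
import random


def path_loss_db(d, d0=0.1, p0_db=35.0, n=3.0, sigma_db=0.0, rng=None):
    """Log-distance path loss in dB with optional Gaussian shadowing.

    Computes P_dB(d) = P_0,dB + 10 n log10(d / d0) + X with X ~ N(0, sigma_db^2),
    i.e. Eq. (2) plus the shadowing factor of Eq. (1).
    Default parameter values are hypothetical placeholders.
    """
    if d <= 0 or d0 <= 0:
        raise ValueError("distances must be positive")
    deterministic = p0_db + 10.0 * n * math.log10(d / d0)
    if sigma_db == 0.0:
        return deterministic
    rng = rng or random.Random()
    return deterministic + rng.gauss(0.0, sigma_db)


# Doubling the distance adds 10 * n * log10(2) dB, about 9.03 dB for n = 3.
print(round(path_loss_db(0.2) - path_loss_db(0.1), 2))
```

Keeping the shadowing term optional makes the deterministic trend easy to inspect before randomness is added.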

However, the transmission medium formed by human tissue layers is non-uniform and lossy. Each tissue layer absorbs part of the electromagnetic energy radiated by the sensor, significantly attenuating the signal inside the body. Therefore, the channel loss model Pd between the transmitter and receiver in in-body communication can be expressed as:

$$ {P}_d=\frac{P_t}{P_r}={P}_0{\left(d/{d}_0\right)}^n{\chi}_r $$
(3)
$$ {P}_d(dB)={P}_0(dB)+10n\log \left(d/{d}_0\right)+{\chi}_r(dB) $$
(4)

where Pt and Pr represent the transmitted and received signal power of the communication link, respectively; P0 is the path loss at the reference distance d0; and \( {\chi}_r\sim N\left(0,{\sigma}_{\chi}^2\right) \) is a normally distributed random variable with mean 0 and variance \( {\sigma}_{\chi}^2 \) that captures the deviation caused by the different tissue layers of the human body (such as skin, muscle, and fat) and by direction-dependent antenna gain.
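Equation (4) is Eq. (3) expressed in decibels. The sketch below checks this numerically under one reading of the noise term: χr is Gaussian on the dB scale, as in Eq. (4), which makes its linear counterpart in Eq. (3) log-normal. That reading, the function names, and all numeric values are assumptions for illustration.

```python
import math
import random


def in_body_loss_linear(d, d0, p0, n, chi_r):
    """Eq. (3): P_d = P_t / P_r = P0 * (d / d0)**n * chi_r, all on a linear scale."""
    return p0 * (d / d0) ** n * chi_r


def in_body_loss_db(d, d0, p0_db, n, chi_r_db):
    """Eq. (4): P_d(dB) = P0(dB) + 10 n log10(d / d0) + chi_r(dB)."""
    return p0_db + 10.0 * n * math.log10(d / d0) + chi_r_db


def to_db(x):
    """Convert a linear power ratio to dB."""
    return 10.0 * math.log10(x)


# Hypothetical values; the tissue deviation is drawn on the dB scale (sigma = 4 dB assumed).
rng = random.Random(0)
chi_db = rng.gauss(0.0, 4.0)
p0_db, n, d0, d = 40.0, 4.0, 0.05, 0.1
linear = in_body_loss_linear(d, d0, 10 ** (p0_db / 10), n, 10 ** (chi_db / 10))
print(abs(to_db(linear) - in_body_loss_db(d, d0, p0_db, n, chi_db)) < 1e-9)
```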

When monitoring human health, a WBAN can monitor critical physiological parameters of the human body in real time by transferring and harvesting radio frequency (RF) energy and acquiring information data. The energy and information transmission models of a WBAN principally include a point-to-point transmission model and a cooperative transmission model with relay nodes, as presented in Fig. 3.

Fig. 3
2 diagrams, a and b, each with a human body, a source node P, and a sensor S inside a cloud. In a, RF energy flows from source node P to sensor S, and information flows from S to node P. In b, the RF energy and information from the source pass through a relay node R before reaching the recipient. The RF energy flow is labeled h and the information flow is labeled g.

Energy and information transmission model of WBAN under cloud communication (a. point-to-point transmission model; b. cooperative transmission model with relay nodes)

In Fig. 3, h and g are the channel coefficients of the downlink and uplink, respectively. The point-to-point energy and information transmission model is the most basic and simple transmission model in a WBAN and suits scenarios with short distances and little shadowing interference. Over long distances, however, the direct point-to-point link degrades the system’s overall performance. A relay node is then needed to forward energy and information, mitigating the effects of human body dynamics and shadowing and ensuring efficient and reliable information transmission.

In the point-to-point transmission model, the source node (SN) P transmits an energy flow carrying command information to the sensor S. At the receiving end, a power splitter directs part of the received wireless signal to energy harvesting. The signal ys received by the sensor node is expressed as Eq. (5).

$$ {y}_s=\sqrt{P_a}h{x}_a+{n}_s $$
(5)

In Eq. (5), Pa and xa denote the transmit power and the transmitted signal of the SN P, respectively, and ns is the antenna noise of the sensor S. A fraction ρ of the received signal is used for energy harvesting, so the energy E collected by the sensor can be written as Eq. (6).

$$ E=\eta \rho {P}_a{\left|h\right|}^2\left(T/2\right) $$
(6)

In Eq. (6), ρ (0 < ρ < 1) and η (0 < η < 1) are the power splitting ratio and the energy conversion efficiency, respectively, and T is the time period of information transmission. The remaining part of the wireless signal is used for information decoding. The signal ys' received by the sensor for decoding and the signal xs sent by the sensor can be written as:

$$ {y_s}^{\hbox{'}}=\sqrt{1-\rho}\left(\sqrt{P_a}h{x}_a+{n}_s\right) $$
(7)
$$ {x}_s=\sqrt{\mu}\left(\sqrt{1-\rho}\left(\sqrt{P_a}h{x}_a+{n}_s\right)+{n_s}^{\hbox{'}}\right) $$
(8)

where \( {n_s}^{\hbox{'}}\sim N\left(0,{\sigma}_n^2\right) \) stands for the additional processing noise on the sensor node. μ refers to the amplification and forwarding coefficient, as shown in Eq. (9).

$$ \mu =\eta \rho {P}_a{\left|h\right|}^2/\left(\left(1-\rho \right){P}_s{\left|g\right|}^2+{\sigma}_n^2\right) $$
(9)

In Eq. (9), Ps denotes the transmit power of the sensor S.
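To make the role of the power splitting ratio ρ concrete, the sketch below evaluates the harvested energy of Eq. (6) together with the signal power left for decoding, (1 − ρ)Pa|h|², implied by Eq. (7). The function names and the numbers in the usage loop are hypothetical.

```python
def harvested_energy(p_a, h, rho, eta, T):
    """Eq. (6): E = eta * rho * P_a * |h|^2 * (T / 2); h may be complex."""
    if not (0 < rho < 1 and 0 < eta < 1):
        raise ValueError("rho and eta must lie in (0, 1)")
    return eta * rho * p_a * abs(h) ** 2 * (T / 2)


def decoding_signal_power(p_a, h, rho):
    """Signal (not noise) power routed to information decoding, from Eq. (7)."""
    return (1 - rho) * p_a * abs(h) ** 2


# Hypothetical values: P_a = 1 W, |h| = 0.1, eta = 0.6, T = 1 s.
for rho in (0.2, 0.5, 0.8):
    e = harvested_energy(1.0, 0.1, rho, 0.6, 1.0)
    s = decoding_signal_power(1.0, 0.1, rho)
    print(f"rho={rho}: E={e:.5f} J, decoding power={s:.4f} W")
```

Raising ρ increases the harvested energy linearly while reducing the decoding signal power by the same proportion, which is the trade-off the power splitter controls.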

Moreover, after the sensor S decodes the information and harvests the energy, it sends the sensed information back to the SN P, completing the information transmission within the period T. With the amplify-and-forward coefficient μ chosen to balance the harvested and consumed energy, the signal ya received by the SN P is expressed as Eq. (10).

$$ {y}_a=g\sqrt{\mu}\left(\sqrt{1-\rho}\left(\sqrt{P_a}h{x}_a+{n}_s\right)+{n_s}^{\hbox{'}}\right)+{n}_a $$
(10)

In Eq. (10), na refers to the antenna noise of SN P following a normal distribution with mean 0 and variance \( {\sigma}_a^2 \). Finally, the throughput from sensor S to SN P is expressed as Eq. (11).

$$ R=\frac{1}{2}{\log}_2\left(1+\gamma \right) $$
(11)

In Eq. (11), γ refers to the Signal to Noise Ratio (SNR) from the sensor S to the SN P in the point-to-point transmission model.
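Equation (11) needs the SNR on a linear scale; since link budgets are often quoted in dB, the short sketch below converts first. We read the factor 1/2 as the uplink using half of each period T, which is our interpretation rather than a statement of the model; the function names are hypothetical.

```python
import math


def db_to_linear(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10 ** (snr_db / 10)


def throughput(gamma):
    """Eq. (11): R = (1/2) log2(1 + gamma) in bit/s/Hz, gamma on a linear scale."""
    if gamma < 0:
        raise ValueError("SNR must be non-negative")
    return 0.5 * math.log2(1 + gamma)


# A 15 (linear) SNR gives 0.5 * log2(16) = 2 bit/s/Hz.
print(throughput(15), throughput(db_to_linear(10.0)))
```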

In the relay cooperative transmission model, the relay node R amplifies and forwards the wireless energy and information between the SN P and the sensor S. Two protocols are commonly used to transmit energy and information: power splitting and time switching.

In the power splitting protocol, the SN P and the sensor S transmit wireless energy and information to the relay node during the first T/2 time slot. The signal yr received by the relay node can then be described as Eq. (12).

$$ {y}_r=\sqrt{P_a}h{x}_a+\sqrt{P_s}g{x}_s+{n}_r $$
(12)

In Eq. (12), nr is the noise generated at the relay node. After power splitting, the part of yr used for information processing is expressed as Eq. (13).

$$ {y_r}^{\prime }=\sqrt{1-\rho}\left(\sqrt{P_a}h{x}_a+\sqrt{P_s}g{x}_s+{n}_r\right) $$
(13)

The relay node then forwards the information and energy to the SN P and the sensor S, respectively, in the remaining time slot. Equation (14) gives the signal ya,ps received by the SN P.

$$ {y}_{a, ps}=h\sqrt{\beta_{ps}}\left(\sqrt{1-\rho}\sqrt{P_s}g{x}_s+{n_r}^{\prime}\right)+{n}_a $$
(14)

In Eq. (14), \( {n_r}^{\hbox{'}}\sim N\left(0,{\sigma}_r^2\right) \) is the additional processing noise at the relay, and βps is the balance coefficient that equates the energy harvested and consumed by the relay, as shown in Eq. (15).

$$ {\beta}_{ps}=\eta \rho {P}_a{\left|h\right|}^2/\left(\left(1-\rho \right){P}_s{\left|g\right|}^2+{\sigma}_r^2\right) $$
(15)

Then, the SNR at the SN P can be written as Eq. (16).

$$ {\gamma}_{a, ps}=\frac{\left(1-\rho \right){P}_s{\left|h\right|}^2{\left|g\right|}^2}{{\left|h\right|}^2{\sigma}_r^2+{\sigma}_a^2\left(\left(\left(1-\rho \right){P}_s{\left|g\right|}^2+{\sigma}_r^2\right)/\left(\eta \rho {P}_a{\left|h\right|}^2\right)\right)} $$
(16)
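As a consistency check on Eqs. (14) to (16), the sketch below computes the SNR at the SN P in two ways: directly from the signal and noise powers in the received signal of Eq. (14), with βps from Eq. (15), and from the closed form, keeping the energy conversion efficiency η that Eq. (15) carries. Unit-power symbols are assumed, the helper names are ours, and all channel and noise values are hypothetical.

```python
def beta_ps(p_a, p_s, h, g, rho, eta, sigma_r2):
    """Eq. (15): relay balance (amplification) coefficient under power splitting."""
    return eta * rho * p_a * abs(h) ** 2 / ((1 - rho) * p_s * abs(g) ** 2 + sigma_r2)


def snr_ps_direct(p_a, p_s, h, g, rho, eta, sigma_r2, sigma_a2):
    """SNR read off the signal model of Eq. (14): signal power over noise power."""
    b = beta_ps(p_a, p_s, h, g, rho, eta, sigma_r2)
    signal = abs(h) ** 2 * b * (1 - rho) * p_s * abs(g) ** 2
    noise = abs(h) ** 2 * b * sigma_r2 + sigma_a2
    return signal / noise


def snr_ps_closed(p_a, p_s, h, g, rho, eta, sigma_r2, sigma_a2):
    """Closed form after dividing numerator and denominator by beta_ps (eta kept)."""
    h2, g2 = abs(h) ** 2, abs(g) ** 2
    num = (1 - rho) * p_s * h2 * g2
    den = h2 * sigma_r2 + sigma_a2 * ((1 - rho) * p_s * g2 + sigma_r2) / (eta * rho * p_a * h2)
    return num / den


params = dict(p_a=1.0, p_s=0.01, h=0.3, g=0.3, rho=0.5, eta=0.6, sigma_r2=1e-6, sigma_a2=1e-6)
print(snr_ps_direct(**params), snr_ps_closed(**params))
```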

In the time switching protocol, the SN P transmits wireless energy to the relay node in the first time slot τT, where 0 ≤ τ ≤ 1 is the time switching ratio. Equation (17) gives the total energy Er,ts harvested by the relay node.

$$ {E}_{r, ts}=\eta {P}_a{\left|h\right|}^2\tau T $$
(17)

Furthermore, in the second time slot (1 − τ)T/2, the information yr,ts received by the relay node from the sensor is expressed as:

$$ {y}_{r, ts}=\sqrt{P_s}g{x}_s+{n_r}^{\hbox{'}} $$
(18)

where \( {n_r}^{\prime}\sim N\left(0,{\sigma}_r^2\right) \) is the noise at the relay node. The relay node then forwards the amplified signal to the SN P, while the sensor performs information decoding and energy harvesting. Equation (19) gives the balance coefficient βts that equates energy harvesting and consumption.

$$ {\beta}_{ts}=2\tau \eta {P}_a{\left|h\right|}^2/\left(\left(1-\tau \right)\left({P}_s{\left|g\right|}^2+{\sigma}_r^2\right)\right) $$
(19)

Following the same analysis as for the power splitting protocol, the SNR γa,ts at the SN P is given by Eq. (20).

$$ {\gamma}_{a, ts}=\frac{P_s{\left|h\right|}^2{\left|g\right|}^2}{{\left|h\right|}^2{\sigma}_r^2+{\sigma}_a^2\left(\left(\left(1-\tau \right)\left({P}_s{\left|g\right|}^2+{\sigma}_r^2\right)\right)/\left(2\tau \eta {P}_a{\left|h\right|}^2\right)\right)} $$
(20)

The final throughput Rts from the sensor to the SN P is:

$$ {R}_{ts}=\frac{1-\tau }{2}{\log}_2\left(1+{\gamma}_{a, ts}\right) $$
(21)
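The time switching ratio τ trades harvesting time against forwarding time: a larger τ gives the relay more energy (Eq. (17)) but shrinks the (1 − τ) fraction that carries data in Eq. (21). The sketch below sweeps τ on a grid to locate the best ratio for one hypothetical set of channel and noise values. η is included in βts, consistent with Eqs. (17) and (20); unit-power symbols and all parameter values are assumptions.

```python
import math


def beta_ts(tau, p_a, p_s, h, g, eta, sigma_r2):
    """Eq. (19) with eta: relay transmit power 2*tau*eta*P_a*|h|^2 / (1 - tau),
    divided by the relay's received power P_s*|g|^2 + sigma_r^2."""
    return 2 * tau * eta * p_a * abs(h) ** 2 / ((1 - tau) * (p_s * abs(g) ** 2 + sigma_r2))


def snr_ts(tau, p_a, p_s, h, g, eta, sigma_r2, sigma_a2):
    """Eq. (20): SNR at the SN P under time switching (requires 0 < tau < 1)."""
    b = beta_ts(tau, p_a, p_s, h, g, eta, sigma_r2)
    h2, g2 = abs(h) ** 2, abs(g) ** 2
    return p_s * h2 * g2 / (h2 * sigma_r2 + sigma_a2 / b)


def rate_ts(tau, **params):
    """Eq. (21): R_ts = (1 - tau) / 2 * log2(1 + gamma_a,ts)."""
    return (1 - tau) / 2 * math.log2(1 + snr_ts(tau, **params))


# Hypothetical link parameters (linear units).
params = dict(p_a=1.0, p_s=0.01, h=0.3, g=0.3, eta=0.6, sigma_r2=1e-6, sigma_a2=1e-6)
# Endpoints excluded: tau = 0 makes beta_ts zero (division by zero), tau = 1 leaves no data time.
taus = [k / 100 for k in range(1, 100)]
best_tau = max(taus, key=lambda t: rate_ts(t, **params))
print(f"best tau = {best_tau:.2f}, R_ts = {rate_ts(best_tau, **params):.3f} bit/s/Hz")
```

A grid sweep is used instead of a closed-form optimum because it makes the trade-off visible and needs no further assumptions about the channel.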

3.3 Application of DL in Food Nutrition Evaluation and Analysis

DL is a typical family of feature extraction methods. The CNN is a widely used feedforward neural network that performs particularly well on image tasks. Its most significant advantages lie in local connectivity and weight sharing: its neurons are arranged in a specific pattern so that each responds to an overlapping region of the visual field [25]. In the convolution operation, the first argument is the input, the second argument (the function w) is the kernel, and the output is the feature map [26]. CNNs generally perform convolutions over multiple dimensions. If the input is a two-dimensional matrix I and a two-dimensional kernel K is used, the convolution is given by Eq. (22).

$$ S\left(i,j\right)=\left(I\ast K\right)\left(i,j\right)=\sum \limits_m\sum \limits_nI\left(m,n\right)K\left(i-m,j-n\right) $$
(22)

In Eq. (22), i and j index the position in the output feature map, while m and n index the input matrix and are summed over. When a CNN is applied to food image feature extraction and classification, the food image passes through several convolutional and pooling layers, and the resulting features are aggregated by one or more fully connected layers, in which every neuron is connected to all neurons of the previous layer. This part usually consists of two one-dimensional layers that integrate the local information extracted by the convolutional or pooling layers. All neurons usually use the ReLU function as the activation function to improve network performance.
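A direct, unoptimized transcription of Eq. (22) in plain Python helps show what the indices do: the kernel is flipped relative to the input, and positions outside either array contribute zero (a "full" convolution). Real CNN libraries use optimized routines and usually compute cross-correlation, that is, without the flip; the function name here is hypothetical and the code is for illustration only.

```python
def conv2d_full(image, kernel):
    """Full 2-D convolution, Eq. (22): S(i, j) = sum_m sum_n I(m, n) K(i - m, j - n).

    Indices outside I or K contribute zero, so the output has shape
    (H + kh - 1) x (W + kw - 1).
    """
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (W + kw - 1) for _ in range(H + kh - 1)]
    for i in range(H + kh - 1):
        for j in range(W + kw - 1):
            acc = 0.0
            for m in range(H):
                for n in range(W):
                    u, v = i - m, j - n  # kernel index: flipped relative to the input
                    if 0 <= u < kh and 0 <= v < kw:
                        acc += image[m][n] * kernel[u][v]
            out[i][j] = acc
    return out


# A 1 x 3 row convolved with a 1 x 2 box kernel.
print(conv2d_full([[1, 2, 3]], [[1, 1]]))
```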

AlexNet [27], a deep CNN model with more network layers and stronger learning ability, is selected in this paper to simplify the computation and enhance the generalization ability of the CNN. Figure 4 shows the process of extracting and classifying DTs data features of food images based on AlexNet.

Fig. 4
A flow diagram begins with input layer, convolution layer followed by local normalization, pooling layer, and fully connected layers leads to SoftMax. SoftMax has hidden layer and output layer.

Extraction and classification process of DTs data features of food images via AlexNet

In Fig. 4, the convolution operation is performed on the food image first when extracting and classifying the DTs data of food images. Then, the local normalization, pooling, and full connection operations are performed.

Moreover, the functional layers of AlexNet’s convolutional stage are improved: specifically, the order of the local normalization and pooling operations is reversed. This change can enhance the generalization ability of AlexNet, weaken overfitting, and significantly reduce training time. In addition, applying local normalization after Overlapping Max Pooling (OMP) retains more useful information and discards redundant information during pooling. It also speeds up the convergence of model training and highlights the advantages of OMP over conventional non-overlapping max pooling (MP). Each layer is computed as follows.

The overlapping pooling method downsamples the t-th feature map \( {y}_t^l\left(i,j\right) \) of the l-th convolutional layer as in Eq. (23).

$$ {\displaystyle \begin{array}{c}{a}_t^l\left(i,j\right)=\max \Big\{{y}_t^l\left(i,j\right),{i}_s\le i\le {i}_s+{w}_c-1,\\ {}{j}_s\le j\le {j}_s+{w}_c-1\Big\}\end{array}} $$
(23)

In Eq. (23), s is the pooling stride, wc is the width of the pooling window, and wc > s, so adjacent windows overlap; is and js are the starting row and column of the current window, which advance by s.
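The sketch below applies Eq. (23) to a single feature map stored as a list of lists. The 3 × 3 window with stride 2 is the configuration used in the original AlexNet and is only an assumed default here; the text itself only requires wc > s. The function name is hypothetical.

```python
def overlapping_max_pool(fmap, w_c=3, s=2):
    """Eq. (23): max over each w_c x w_c window whose top-left corner (i_s, j_s)
    advances by stride s; w_c > s makes neighbouring windows overlap."""
    if w_c <= s:
        raise ValueError("overlapping pooling requires w_c > s")
    H, W = len(fmap), len(fmap[0])
    out = []
    for i_s in range(0, H - w_c + 1, s):
        row = []
        for j_s in range(0, W - w_c + 1, s):
            row.append(max(fmap[i][j]
                           for i in range(i_s, i_s + w_c)
                           for j in range(j_s, j_s + w_c)))
        out.append(row)
    return out


# A 5 x 5 map with values 0..24; windows start at rows/columns 0 and 2 and share a row/column.
fmap = [[5 * i + j for j in range(5)] for i in range(5)]
print(overlapping_max_pool(fmap))
```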

After the first and second pooling operations of the AlexNet model, a local normalization layer is introduced to normalize the feature maps according to Eq. (24).

$$ {c}_t^l\left(i,j\right)={a}_t^l\left(i,j\right)/{\left(k+\alpha \sum \limits_{t'=\max \left(0,t-m/2\right)}^{\min \left(N-1,t+m/2\right)}{\left({a}_{t'}^l\left(i,j\right)\right)}^2\right)}^{\beta } $$
(24)

In Eq. (24), the sum runs over the feature maps t' adjacent to t; k, α, β, and m are hyperparameters with values of 2, 0.78, \( {10}^{-4} \), and 7, respectively; and N is the number of convolution kernels in the first convolutional layer. The ReLU function is used to activate the convolution output \( {S}_t^l\left(i,j\right) \) to avoid vanishing gradients in the network model. The activation function \( {y}_t^l\left(i,j\right) \) can be expressed as Eq. (25).

$$ {y}_t^l\left(i,j\right)=f\left({S}_t^l\left(i,j\right)\right)=\max \left\{0,{S}_t^l\left(i,j\right)\right\} $$
(25)
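To illustrate Eqs. (24) and (25), the sketch below applies ReLU to hypothetical convolution outputs at one spatial position and then normalizes across the N feature maps (the pooling step between them is omitted for brevity). The defaults follow the values stated above; for reference, the original AlexNet paper used k = 2, a neighbourhood of 5 maps, α = 10⁻⁴, and β = 0.75. Rounding m/2 down to an integer channel offset is our assumption, since Eq. (24) does not say how an odd m is handled.

```python
def local_response_norm(a, k=2.0, alpha=0.78, beta=1e-4, m=7):
    """Eq. (24) at one spatial position (i, j).

    a holds the activations a_0 .. a_{N-1} of the N feature maps at (i, j).
    Each a_t is divided by (k + alpha * S_t) ** beta, where S_t sums the squares
    of a_u for channels u within m // 2 of t (clipped to 0 .. N - 1).
    """
    N = len(a)
    half = m // 2  # assumption: integer channel offset for m / 2
    out = []
    for t in range(N):
        lo, hi = max(0, t - half), min(N - 1, t + half)
        s_t = sum(a[u] ** 2 for u in range(lo, hi + 1))
        out.append(a[t] / (k + alpha * s_t) ** beta)
    return out


def relu(s):
    """Eq. (25): y = max(0, s)."""
    return max(0.0, s)


# Hypothetical convolution outputs at one position across N = 4 feature maps.
activated = [relu(s) for s in [0.5, -1.0, 2.0, 0.0]]
print(local_response_norm(activated))
```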

The dropout rate is set to 0.5 in the fully connected layers to prevent overfitting. All food feature maps are flattened into a single high-dimensional layer of neurons, C5. Equation (26) describes the input Zi6 of the i-th neuron in the sixth fully connected layer.

$$ {Z}_i^6={W}_i^6{C}^5+{b}_i^6 $$
(26)

In Eq. (26), Wi6 and bi6 are the weight vector and the bias of the i-th neuron in the sixth fully connected layer, respectively.

To improve generalization, dropout is applied to the outputs Cl of the 6th and 7th fully connected layers: \( {r}_j^l\sim bernoulli(dp) \) and \( {\tilde{C}}^l={r}^l{C}^l \), so each neuron is retained with probability dp. Then, the input \( {Z}_i^{l+1} \) of the i-th neuron in the 7th and 8th fully connected layers is \( {W}_i^{l+1}{\tilde{C}}^l+{b}_i^{l+1} \), and the output \( {C}_i^l \) of the i-th neuron in the 6th and 7th fully connected layers is \( f\left({Z}_i^l\right) \), i.e., \( \max \left\{0,{Z}_i^l\right\} \). Finally, the output qi of the i-th neuron of the 8th fully connected layer is obtained according to Eq. (27).

$$ {q}_i=\mathrm{softmax}\left({Z}_i^8\right)=\frac{e^{Z_i^8}}{\sum_{j=1}^{12}{e}^{Z_j^8}} $$
(27)
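The sketch below traces one fully connected step of the kind described above: a Bernoulli mask with keep probability dp on the previous layer's outputs, the affine map W x + b of Eq. (26), then ReLU. The text does not say how dropout is handled at inference; scaling the inputs by dp at test time, as in the original dropout formulation, is our assumption, and all function names and values are hypothetical.

```python
import random


def relu_vec(z):
    """Element-wise ReLU, max(0, z_i)."""
    return [max(0.0, v) for v in z]


def dense(W, x, b):
    """Affine map of Eq. (26): Z_i = W_i . x + b_i for every neuron i."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def dropout(C, dp, rng):
    """C~ = r * C with r_j ~ Bernoulli(dp): unit j is kept with probability dp."""
    return [c if rng.random() < dp else 0.0 for c in C]


def fc_layer(W, b, C, dp=0.5, train=True, rng=None):
    """One fully connected step: dropout on the previous outputs, affine map, ReLU.

    At inference the inputs are scaled by dp instead of masked (assumed convention).
    """
    rng = rng or random.Random(0)
    C_tilde = dropout(C, dp, rng) if train else [dp * c for c in C]
    return relu_vec(dense(W, C_tilde, b))


# Hypothetical 2-unit layer.
W, b, C = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], [2.0, 4.0]
print(fc_layer(W, b, C, train=True), fc_layer(W, b, C, train=False))
```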

Meanwhile, the Cross-Entropy Loss Function (CELF) for the food image classification problem is used as the model’s error function, which can be written as Eq. (28).

$$ Loss=-\sum \limits_{i=1}^K{y}_i\cdot \log \left({p}_i\right) $$
(28)
$$ {p}_i=\frac{\exp \left({\tilde{y}}_i\right)}{\sum_{j=1}^K\exp \left({\tilde{y}}_j\right)} $$
(29)

In Eqs. (28) and (29), K is the number of classes; yi is the true class distribution of the sample; \( {\tilde{y}}_i \) is the output of the CNN; and pi is the classification probability produced by the Softmax classifier. The input of the Softmax function is an N-dimensional real vector x, as presented in Eq. (30).

$$ \xi {(x)}_i=\frac{e^{x_i}}{\sum \limits_{n=1}^N{e}^{x_n}},i=1,2,\dots, N $$
(30)
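A numerically stable version of the Softmax and cross-entropy computation is sketched below. Subtracting the largest logit before exponentiating leaves the Softmax output unchanged (the factor cancels between numerator and denominator) but prevents overflow for large inputs; the loss is written with the usual leading minus sign so that it is non-negative. The function names are hypothetical.

```python
import math


def softmax(x):
    """Softmax of a list of logits; subtracting max(x) avoids overflow."""
    mx = max(x)
    exps = [math.exp(v - mx) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, logits):
    """Loss = -sum_i y_i log p_i with p = softmax(logits); terms with y_i = 0 are skipped."""
    p = softmax(logits)
    return -sum(y * math.log(pi) for y, pi in zip(y_true, p) if y > 0)


# Two equal logits give p = [0.5, 0.5], so a one-hot label costs log 2.
print(softmax([0.0, 0.0]), cross_entropy([1, 0], [0.0, 0.0]))
```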

In essence, the Softmax function maps an arbitrary N-dimensional real vector to an N-dimensional vector whose elements lie in (0, 1) and sum to 1, thereby normalizing the vector. To simplify the neural network’s computation and improve the model’s prediction ability, μ-law companding is applied to reduce the output to \( {2}^8 \) = 256 levels, i.e., μ = 255, as in Eq. (31).

$$ f\left({x}_t\right)=\mathit{\operatorname{sign}}\left({x}_t\right)\frac{\ln \left(1+\mu \left|{x}_t\right|\right)}{\ln \left(1+\mu \right)},\left|{x}_t\right|<1 $$
(31)
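The sketch below implements the μ-law compression of Eq. (31) with μ = 255, a uniform quantizer onto the 256 integer levels 0 to 255, and the inverse expansion. The quantizer and the expansion function are standard companions of μ-law coding that the text does not spell out, so treat them, and the function names, as assumptions.

```python
import math

MU = 255  # mu = 2**8 - 1, giving 256 quantization levels


def mu_law_compress(x, mu=MU):
    """Eq. (31): f(x) = sign(x) * ln(1 + mu|x|) / ln(1 + mu), for |x| <= 1."""
    if abs(x) > 1:
        raise ValueError("input must lie in [-1, 1]")
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def quantize(y, mu=MU):
    """Map a companded value in [-1, 1] to one of mu + 1 integer levels 0..mu."""
    return int((y + 1) / 2 * mu + 0.5)


def mu_law_expand(y, mu=MU):
    """Inverse of Eq. (31): x = sign(y) * ((1 + mu)**|y| - 1) / mu."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


# Small amplitudes are stretched before quantization, large ones compressed.
for x in (0.01, 0.3, 1.0):
    y = mu_law_compress(x)
    print(x, round(y, 4), quantize(y))
```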

3.4 Construction and Analysis of the DTs Evaluation Model of Food Nutrition and Health Based on Cloud Computing and AlexNet

First, human nutrition and health needs are analyzed through Cloud Communication technology to support multi-objective, personalized, intelligent food recommendation. The improved AlexNet is used to extract and classify features of food images. Then, DTs technology maps the obtained image information and human nutrition and health data to the virtual space to intelligently recommend options suited to human nutrition and health. Figure 5 illustrates the food nutrition and health DTs evaluation model based on Cloud Computing and AlexNet.

Fig. 5
A flow diagram has 3 blocks from top to bottom: physical space, virtual space, and data fusion layer. Physical space has body area network cloud communication and improved AlexNet. Virtual space has human health data and food nutrition data. Data fusion layer has 5 steps: data denoising, feature extraction, feature fusion, data transformation, and data reorganization. After evaluation, it leads to healthy recipes.

Food nutrition and health DTs evaluation model based on Cloud Computing and AlexNet

As shown in Fig. 5, the model uses Cloud Computing and Communication technology to obtain human health data. The improved AlexNet then extracts and classifies food nutrition data. The human health and food nutrition data obtained in the physical space are mapped to the virtual space via DTs technology. Finally, the data are evaluated through data fusion and classification transmission to determine healthy recipes suited to human health.

When Cloud Communication technology is used to acquire and analyze human health data, the proposed time switching protocol is used to construct the overall throughput objective function of the multipoint WBAN, which serves as the objective of the STM optimization. The optimization problem for the overall system throughput is:

$$ \left(\textrm{P}1\right):\underset{\tau_0,{\tau}_i}{\max}\sum \limits_{i=1}^K{R}_i^{ab}\left({\tau}_0,{\tau}_i\right)=\underset{\tau_0,{\tau}_i}{\max}\frac{1}{2}\sum \limits_{i=1}^K{\tau}_i{\log}_2\left(1+\frac{X_i^{ab}\frac{\tau_0}{\tau_i}}{Y_i^{ab}\frac{\tau_0}{\tau_i}+{Z}_i^{ab}}\right) $$
(32)
$$ s.t.\sum \limits_{i=0}^K{\tau}_i\le 1,{\tau}_i\in \left[0,1\right],i=0,1,\dots, K $$
(33)

where \( {R}_i^{ab} \) denotes the throughput from sensor Si to the SN P under abnormal conditions; \( {X}_i^{ab} \), \( {Y}_i^{ab} \), and \( {Z}_i^{ab} \) are coefficients of the objective function; τ0 denotes the initial energy transmission time slot; and τi denotes the energy transmission time slot at time i.
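To make (P1) concrete, the sketch below evaluates the objective for a given slot allocation and compares an equal-time allocation (the ETA baseline) with the best of many randomly sampled feasible allocations. This is a crude stand-in for illustration, not the STM optimization itself. The coefficient values are hypothetical, and we assume the time constraint covers τ0 together with τ1, …, τK, with the total normalized to 1. Because the SINR term X r / (Y r + Z) increases with r = τ0/τi, the objective is maximized on the boundary where the slots sum to 1, so sampling from the simplex loses nothing.

```python
import numpy as np

def total_throughput(tau, X, Y, Z):
    """Objective of (P1): 0.5 * sum_i tau_i * log2(1 + X_i r_i / (Y_i r_i + Z_i)), r_i = tau_0 / tau_i."""
    tau = np.asarray(tau, dtype=float)
    tau0 = tau[0]
    taui = np.maximum(tau[1:], 1e-12)  # guard against division by zero
    r = tau0 / taui
    sinr = X * r / (Y * r + Z)
    return 0.5 * float(np.sum(taui * np.log2(1.0 + sinr)))

def random_search(X, Y, Z, n_samples=5000, seed=0):
    """Hypothetical stand-in optimizer (NOT the paper's STM method): sample feasible allocations."""
    K = len(X)
    rng = np.random.default_rng(seed)
    # Dirichlet samples are non-negative and sum to 1, so they satisfy the time constraint.
    samples = rng.dirichlet(np.ones(K + 1), size=n_samples)
    eta = np.full(K + 1, 1.0 / (K + 1))  # equal time allocation baseline
    candidates = np.vstack([eta, samples])
    values = np.array([total_throughput(t, X, Y, Z) for t in candidates])
    best = int(np.argmax(values))
    return candidates[best], values[best], values[0]

# Hypothetical coefficients for K = 3 sensors (not taken from the paper)
X = np.array([8.0, 5.0, 3.0])
Y = np.array([0.5, 0.5, 0.5])
Z = np.array([1.0, 1.0, 1.0])
tau_best, r_best, r_eta = random_search(X, Y, Z)
print("best tau:", tau_best.round(3), "throughput:", round(r_best, 3), "ETA:", round(r_eta, 3))
```

Including the ETA point among the candidates guarantees the search never reports a result worse than equal allocation; a real solver would replace random sampling with a convex or gradient-based method.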

In this model, a large amount of food image data is used to train the improved AlexNet through self-learning, so that it accurately learns users' personalized preferences for matching their nutritional needs with the nutrition of agricultural products. The improved AlexNet is trained with the Weighted Cross-Entropy as the cost function. Let zk(x, θ) denote the unnormalized log-probability of pixel x of the food image belonging to the k-th category under parameters θ. The Softmax function pk(x, θ) is then defined in Eq. (34).

$$ {p}_k\left(x,\theta \right)=\frac{\exp \left\{{z}_k\left(x,\theta \right)\right\}}{\sum \limits_{k'=1}^K\exp \left\{{z}_{k'}\left(x,\theta \right)\right\}} $$
(34)

In Eq. (34), K is the number of food categories. In the prediction stage, pixel x is assigned the class k that maximizes Eq. (34), i.e., k = arg max_k pk(x, θ). The normalized probability of pixel xi for class k is abbreviated as pik. Training therefore seeks the optimal network parameters θ* by minimizing the weighted cross-entropy loss function ℓ(x, θ), i.e., \( {\theta}^{\ast }=\underset{\theta }{\arg \min}\ell \left(x,\theta \right) \). The weighted loss function for food images is defined in Eq. (35).

$$ \ell \left(x,\theta \right)=-\sum \limits_{i=1}^N\sum \limits_{k=1}^K{w}_{ik}{q}_{ik}\log {p}_{ik} $$
(35)

In Eq. (35), qik = q(yi = k | xi) is the ground-truth label distribution, i.e., the probability that pixel xi belongs to the k-th category, and wik is the weighting coefficient. During training, wik is computed with the strategy in Eq. (36).

$$ {w}_{ik}=\frac{1}{\ln \left(c+{p}_{ik}\right)} $$
(36)

In Eq. (36), c is an additional hyperparameter, empirically set to 1.11 in this experiment.
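A minimal NumPy sketch of Eqs. (34) to (36), treating each row of `logits` as one pixel or sample with one-hot targets qik. It follows the text literally by computing wik from the predicted probability pik; in an autodiff framework one would usually treat w as a constant (no gradient), an implementation detail the paper does not specify. With c = 1.11, wik ranges from about 1.34 (pik = 1) to about 9.58 (pik → 0), so poorly predicted samples are weighted up to roughly seven times more heavily. Function names are our own.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log of Eq. (34), computed row-wise over K classes."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def weighted_cross_entropy(logits, labels, c=1.11):
    """Eqs. (34)-(36) with one-hot targets.

    logits: (N, K) unnormalized log-probabilities z_k(x_i, theta)
    labels: (N,) integer class indices y_i
    """
    log_p = log_softmax(np.asarray(logits, dtype=float))
    p = np.exp(log_p)                       # p_ik, Eq. (34)
    n = log_p.shape[0]
    q = np.zeros_like(p)
    q[np.arange(n), labels] = 1.0           # q_ik = 1 if y_i == k else 0
    w = 1.0 / np.log(c + p)                 # w_ik, Eq. (36)
    return -np.sum(w * q * log_p)           # Eq. (35)

def predict(logits):
    """Prediction rule k = argmax_k p_k(x, theta); softmax is monotonic, so argmax of logits suffices."""
    return np.argmax(logits, axis=1)
```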

Figure 6 shows the training process of the improved AlexNet algorithm applied to the food image DTs model.

Fig. 6
A 15-step training flow algorithm.

Training flow of the improved AlexNet algorithm applied to the food image DTs model

3.5 Experimental Testing and Evaluation

This section evaluates the performance of the constructed food nutrition and health DTs evaluation model based on Cloud Computing combined with AlexNet. First, the analysis of human health data under Cloud Communication technology is evaluated. The human body path loss model is applied with the sensors located on the torso. Because the uplink and downlink channels are reciprocal, their channel coefficients are identical. The objective function optimized by the STM algorithm reported here is compared with the Equal Time Allocation (ETA) algorithm [28] and the Minimum Throughput Maximization (MTM) algorithm [29]. The algorithms are compared in terms of how the overall system throughput varies with the transmission power (TP) Pa of the source node (SN) P, the number of sensors S, and the energy conversion efficiency η. The trend of the energy transmission time slot τi with the number of sensors is also analyzed, and the algorithms are further compared in terms of a fairness index.
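The text does not say which fairness index is used. A common choice in wireless resource allocation is Jain's fairness index, sketched below as an assumption: it equals 1 when all sensors obtain identical throughput and 1/n when one of n sensors receives everything, which is consistent with reporting a fairness index of 1 for an allocation that equalizes throughput.

```python
import numpy as np

def jain_fairness(throughputs):
    """Jain's index (assumed metric): (sum x)^2 / (n * sum x^2), in [1/n, 1]."""
    x = np.asarray(throughputs, dtype=float)
    return x.sum() ** 2 / (x.size * np.sum(x ** 2))

print(jain_fairness([2.0, 2.0, 2.0]))  # 1.0: identical throughputs
print(jain_fairness([1.0, 0.0]))       # 0.5: one sensor gets everything
```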

The improved AlexNet proposed here is also evaluated on food nutritional characteristics data. About 1284 kinds of food nutrition components are collected from the Chinese Food Nutrient Composition Table as the experimental data. After collection, the nutritional data of agricultural products are processed to remove irrelevant data and parameters. The food data are then divided into five categories (staple food, vegetables, meat, aquatic products, and fruits) according to the nutritional balance guidelines, leaving about 200 data entries for the experiment. Each entry is assigned a unique identifier (ID) so that it can be distinguished by food category. The neural network hyperparameters are set as follows: 120 iterations, a simulation time of 2000 s, and a Batch Size of 128. The learning rate is updated with the Poly (Polynomial Decay) strategy, expressed as \( init\_ lr\times {\left(1-\frac{epoch}{\max \_ epoch}\right)}^{power} \), where the initial learning rate init_lr is 0.0005 (5e-4) and power is 0.9.
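The Poly schedule fits in a few lines. In the sketch below we assume `max_epoch` = 120 (the iteration count given above) and that the rate is updated once per epoch; the function name `poly_lr` is hypothetical.

```python
def poly_lr(epoch, max_epoch=120, init_lr=5e-4, power=0.9):
    """Polynomial ("Poly") decay: init_lr * (1 - epoch / max_epoch) ** power."""
    return init_lr * (1 - epoch / max_epoch) ** power

for e in (0, 30, 60, 90, 119):
    print(e, f"{poly_lr(e):.2e}")
```

With power = 0.9 the decay is slightly faster than linear early on and reaches 0 exactly at `max_epoch`.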

The algorithm reported here is compared with algorithms applied by other scholars in related fields, including Long Short-Term Memory (LSTM) [30], CNN [31], Recurrent Neural Network (RNN) [32], AlexNet [33], and Multilayer Perceptron (MLP). Classification performance is compared in terms of Accuracy, Precision, Recall, and F1 value, and the time required by each model is also examined.
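The averaging scheme for Precision, Recall, and F1 over the five food categories is not stated. The sketch below assumes macro-averaging (unweighted mean over classes) computed from a confusion matrix; classes that are never predicted or never present receive a score of 0 instead of raising a division error. The function name is hypothetical.

```python
import numpy as np

def classification_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged Precision, Recall, and F1 (assumed averaging)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(float)
    pred_count = cm.sum(axis=0)          # how often each class was predicted
    true_count = cm.sum(axis=1)          # how often each class actually occurs
    precision = np.divide(tp, pred_count, out=np.zeros_like(tp), where=pred_count > 0)
    recall = np.divide(tp, true_count, out=np.zeros_like(tp), where=true_count > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }

m = classification_metrics([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
print(m)
```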

The hardware and software configuration used in the experiments is summarized in Table 1.

Table 1 Experimental configuration

4 Research Results

4.1 Comparative Analysis of Human Health Data Evaluation under Different Cloud Communication Technologies

The objective function optimized by the STM algorithm reported here is compared with the ETA and MTM algorithms. The experiments examine how the overall system throughput varies with the TP Pa of the SN P, the number of sensors S, and the energy conversion efficiency η; how the energy transmission time slot varies with the number of sensors; and how fair each algorithm is. The results are shown in Figs. 7, 8, and 9. Each algorithm is run twice, and the two runs are labeled 1 and 2 in the figures.

Fig. 7
Line graphs a and b of the overall throughput versus transmit power and η for STM 1, STM 2, MTM 1, MTM 2, ETA 1, and ETA 2. Each graph has increasing trends.

Variation curves of the overall throughput varying with the TP (mW) of the SN and energy conversion efficiency η (a. the overall throughput varying with the TP of the SN; b. the overall throughput varying with the energy conversion efficiency η)

Fig. 8
Line graphs a and b of the overall throughput and τi versus the number of sensors for STM 1, STM 2, MTM 1, MTM 2, ETA 1, and ETA 2. Graph a has increasing trends. Graph b has decreasing trends.

Variation curves of overall throughput and energy transmission time slot with the number of sensors (a. The overall throughput changing with the number of sensors; b. the energy transmission time slot changing with the number of sensors)

Fig. 9
Line graphs a and b of average throughput versus transmit power and the number of sensors for STM 1, STM 2, MTM 1, MTM 2, ETA 1, and ETA 2. Graph a has increasing trends. Graph b has decreasing trends. Grouped bar chart c of fairness index versus the number of sensors plots the MTM column at its highest, 0.5, at 2.

Results of the average throughput varying with different factors (a. Average throughput varying with the TP of the SN; b. average throughput varying with the number of sensors; c. the fairness index varying with the number of sensors)

As shown in Fig. 7a, the overall throughput of all algorithms grows as the TP of the SN P increases, because a higher transmit power means the SN broadcasts more energy, so the overall throughput rises smoothly. According to the international WBAN standard, the TP must not exceed 1 mW, since higher power could harm the human body; the TP of the SN therefore ranges from 0 mW to 1 mW. The throughput of the STM algorithm reported here is significantly higher than that of the MTM and ETA algorithms. The ETA algorithm has lower overall throughput, and the gap between the algorithms does not exceed 2 bps/Hz. According to Fig. 7b, the overall throughput of all algorithms increases smoothly with the energy conversion efficiency η, likely because a higher conversion efficiency lets each sensor harvest more energy, which yields greater throughput. Again, the STM algorithm clearly outperforms the MTM and ETA algorithms, and the ETA algorithm has the lowest overall throughput. Moreover, the throughput gap between the STM and ETA algorithms widens as η increases; when η is 1, the difference between the two is about 1.5 bps/Hz.

Figure 8a shows that the overall throughput of every algorithm increases significantly as the number of sensors grows. The STM algorithm reported here again achieves substantially higher overall throughput than the MTM and ETA algorithms, and the ETA algorithm has the lowest. As sensors are added, the overall throughput gap between the STM, MTM, and ETA algorithms becomes smaller; with 7 sensors, the gap does not exceed 0.5 bps/Hz. According to Fig. 8b, as the number of sensors increases, the energy transmission time slots τi of the MTM and ETA algorithms decrease and are strongly affected by the number of sensors, whereas the τi of the proposed STM algorithm does not decrease significantly and stays almost constant within a narrow range. More sensors require more information transmission time slots, which leaves less of the normalized time slot T for energy transmission; this explains why τi decreases for every algorithm. The STM algorithm, however, keeps τi within a narrow range (around 0.05), which ensures stable energy transmission.

As shown in Fig. 9a, the average system throughput under all algorithms increases smoothly with the TP of the SN, for the same reason that the overall throughput increases with transmit power. The STM algorithm reported here again yields significantly higher throughput than the MTM and ETA algorithms, and the difference between the MTM and ETA algorithms gradually narrows as the TP increases. According to Fig. 9b, the average throughput decreases as the number of sensors increases, which is consistent with Fig. 8b: adding sensors shortens the energy transmission time slots, so the throughput of each sensor falls. The STM algorithm again has a significantly higher average throughput than the MTM and ETA algorithms. Figure 9c shows how the fairness index changes with the number of sensors. The fairness index of the MTM algorithm decreases gradually because adding new sensors to the network makes resource allocation uneven. The fairness index of the STM algorithm stays at 1, because it allocates resources so that all sensors obtain the same throughput while maximizing the average throughput. The ETA protocol assigns the same information and energy transmission time slots to all sensors; since equal time does not guarantee equal throughput, its fairness index decreases slightly as sensors are added and remains lower than that of the MTM algorithm. Therefore, in terms of resource fairness, the STM algorithm outperforms the ETA and MTM algorithms, offering better stability and better performance in analyzing human health data.

4.2 Prediction Performance Analysis of Food Nutrition Classification Under Different Algorithms

This test analyzes the food nutrition recognition accuracy of the food nutrition and health DTs evaluation model based on Cloud Computing combined with AlexNet constructed here. The recognition and prediction accuracy of the algorithm is compared with that of LSTM, CNN, RNN, AlexNet, and MLP in terms of Accuracy, Precision, Recall, and F1 value. The results are shown in Fig. 10, and Fig. 11 further compares the training and test time required by each algorithm.

Fig. 10
Grouped bar charts a to d of accuracy, precision, recall, and F1 versus iterations for the proposed algorithm, AlexNet, LSTM, CNN, RNN, and MLP, each reaching its highest values at 80 to 120 iterations.

Recognition results of different algorithms as the number of iterations increases (a. Accuracy; b. Precision; c. Recall; d. F1 value)

Fig. 11
Line graphs a and b of time versus data volume. Both graphs have increasing trends.

Comparison of the running time of different algorithms under the training set and test set under different food image data amounts (a. training set; b. testing set)

Figure 10 compares the model constructed here with models proposed by scholars in related fields in terms of Accuracy, Precision, Recall, and F1 value. The Accuracy of this algorithm reaches 93.17%, at least 2.01% higher than that of the other neural network algorithms. The algorithm reported here also achieves the highest Precision, Recall, and F1 values, exceeding the other algorithms by at least 3.24%. In conclusion, compared with the other neural network algorithms, the food nutrition and health DTs evaluation model based on Cloud Computing combined with AlexNet achieves higher food nutrition recognition and prediction accuracy.

Figure 11 shows the time required by each algorithm on the training and test sets for different amounts of food image data. The optimized AlexNet algorithm is faster than the algorithms used by other scholars. For all algorithms, the required time increases gradually with the amount of food image data, and the time on the test set is slightly lower than on the training set. This is because, during training, the model has not yet formed a nutritional identification and analysis path for the food images; the network learns this path autonomously during training and can afterwards analyze the nutritional classification of many food images precisely and much more efficiently. Therefore, across all compared methods, the food nutrition and health DTs evaluation model based on Cloud Computing combined with AlexNet performs best and achieves the best food nutrition classification and identification results.

5 Conclusion

Today, people pay increasing attention to balanced nutrition. This work combines Cloud Communication technology, AlexNet, and DTs technology to build a food DTs nutrition evaluation model. Experiments show that the STM optimization algorithm proposed here achieves the highest throughput among the compared algorithms, which ensures stable identification and processing of human nutrition and health data. Meanwhile, the improved AlexNet algorithm reaches a classification and recognition accuracy of 93.17% on food images, providing a direction for balanced dietary intake and the intelligent development of the dietary nutrition field. Nevertheless, the present work has some shortcomings. For example, the Body Area Network cannot acquire all types of human nutrition and health data. Future work will therefore optimize users' nutritional structure and study the impact of human social attributes on food nutrition correlations.