1 Introduction

The integration of multimedia-assisted healthcare systems with cloud-computing services and mobile technologies has led to increased accessibility for healthcare providers and patients [1]. Utilizing cloud computing infrastructures and virtualization technologies allows the transformation of traditional healthcare systems, which demand manual care and monitoring, into more salient, automatic, and cost-effective systems [1, 2]. With such a paradigm, patients not only receive fast responses to health-related inquiries without any disturbance to their daily routines, but also gain access to sophisticated back-end emergency processing centres, which in the end can help reduce healthcare costs [3]. Our goal in this paper is to develop a multimedia-assisted mobile healthcare application using cloud-computing virtualization technology. We consider food recognition and calorie measurement [4–6] as an example healthcare application that needs cloud-computing virtualization for efficient execution and mass deployment.

In our food recognition and calorie measurement application, the user takes a picture of the food with a smartphone and the application measures the calorie intake automatically. The system enables the user/patient to obtain the measurement results of the food intake from the application, which simulates the calculation procedure performed by a dietician. The process entails several key functionalities, such as the use of food pictures, graph cut segmentation, image processing and analysis, and deep learning algorithms for food classification and recognition. Several challenges make virtualization particularly effective in this type of real-time healthcare application. For example, existing client-side devices (e.g., smartphones, tablets, etc.) have limited ability to adaptively allocate the computing resources needed to handle time-sensitive and computationally intensive algorithms. Image processing and deep learning algorithms, essential for food recognition, drain device batteries quickly, which is inconvenient for the user. It is also very challenging for client-side devices to scale to the large number of food data and images necessary to achieve high accuracy. The entire process is time-consuming, inefficient, and inconvenient from the user's perspective, and may deter users from using the application.
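
To make the workflow concrete, the following minimal sketch traces a single picture through the pipeline just described. It is a toy illustration only: the function bodies, label, portion size, and calorie table are placeholder values standing in for the graph cut segmentation, CNN classification, and calorie computation detailed in Sect. 4 and in [4–6, 29].

```python
# Hypothetical sketch of the calorie-measurement flow described above.
# All names and values are illustrative placeholders, not the real API.

CALORIES_PER_100G = {"spaghetti": 158, "apple": 52}  # toy lookup table

def classify_food(image_pixels):
    # Stand-in for the deep CNN inference step run in the cloud.
    return "spaghetti", 0.94  # (label, confidence)

def estimate_portion_grams(image_pixels):
    # Stand-in for the segmentation-based portion-size estimation.
    return 250.0

def measure_calories(image_pixels):
    label, conf = classify_food(image_pixels)
    grams = estimate_portion_grams(image_pixels)
    kcal = CALORIES_PER_100G[label] * grams / 100.0
    return label, conf, kcal

print(measure_calories(object()))  # -> ('spaghetti', 0.94, 395.0)
```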

With cloud services, the above challenges can be overcome. Users with mobile devices can rely on cloud resources to perform computationally intensive operations such as image segmentation, deep learning, data mining, and image processing. In this paper, we address the above challenges by proposing a virtualization mechanism in cloud computing that utilizes the Android architecture. Android allows an application to be partitioned into activities run by the front-end user and services that run back-end tasks. Generally, virtualization is performed using either a hosted or a hypervisor architecture [7]. A hosted architecture installs and runs the virtualization layer as an application on top of an operating system and supports the broadest range of hardware configurations. In contrast, a hypervisor (bare-metal) architecture installs the virtualization layer directly on a clean x86-based system [8]. A hypervisor acts as a virtual machine manager that allows multiple operating systems to share a single hardware host. Each operating system appears to have the host's processor, memory, and other resources all to itself. However, the hypervisor actually controls the host processor and resources, allocating what is needed to each operating system in turn and making sure that the guest operating systems (virtual machines) cannot disrupt each other [8]. In this paper, we have implemented both models (details are in Sect. 3), wherein (i) we configured an Android x86 image on top of VMware Elastic Sky Integrated (ESXi) to achieve the bare-metal architecture and (ii) an Android x86 image on top of VMware Workstation on the host operating system. We finally used the hosted architecture for publishing our Android-based food recognition and calorie measurement application in the cloud. The main contributions of this paper are as follows:

  • We propose a cloud-based virtualization mechanism for a multimedia-assisted mobile food recognition application. Our mechanism allows users to control their virtual smartphone operations through a dedicated client application installed on their smartphones, while the processing of the application continues to run on the virtual Android image even if the user is disconnected for any unexpected reason. With our mechanism, the mobile environment is emulated on the cloud in a manner that users always feel as if the application is running on the smartphone, while in reality it is running virtually on a remote server in the cloud. This is significant for overcoming the limited capability of smartphones to run intensive machine learning algorithms similar to those presented in this paper.

  • Our model integrates the virtualization mechanism of the multimedia-assisted mobile application with deep neural networks in cloud computing. In our system, we use a deep convolutional neural network (CNN) as the backbone of the application, and the system handles training and testing requests at the top layers without affecting the central layers. This allows us to enhance the accuracy of food recognition by adding a large data set of images in cloud-based storage.

  • Our experimental results of running the application on cloud servers show significant improvements compared with experiments run on a local server. The results show that the food recognition rate when we run the application in the cloud is more than 94.33 %, compared to 87.16 % when we run the application on a local server. Also, with our virtualization mechanism the results are processed up to 49 % faster than when running the application locally.

The rest of the paper is organized as follows: The related work is presented in the next section followed by the proposed system in Sect. 3. In Sect. 4, we present the implementation of the proposed virtualization mechanism for our food recognition and classification application. In Sect. 5, we provide the experimental results and finally in Sect. 6 we conclude the paper and provide directions for future work.

2 Related work

In this section, we present related work in two areas: multimedia-assisted food recognition and classification systems, and virtualization mechanisms for healthcare applications using cloud computing. In particular, we examine a sample of studies from both areas that we believe to be representative and specifically related to our work. We also describe the main advantages and drawbacks of existing work and the need for new mechanisms such as the one proposed in this paper.

2.1 Multimedia-assisted food classification and recognition systems

The open literature describes several approaches that use multimedia mechanisms such as image processing to analyze food content, e.g., [9–14]. In [9], the authors propose a system that utilizes food images captured and stored by multiple users in a public Web service called FoodLog. A dictionary dataset of 6512 images, including calorie estimates, is then formed, and the images in the dictionary are used in a dietary assessment approach. The calorie-measurement accuracy of this approach, however, is very low. In [10], a new 3D/2D model-to-image registration framework is presented for estimating food volume from a single-view 2D image containing a reference object. In this system, the food is segmented from the background image using morphological operations, while the size of the food is estimated based on a user-selected 3D shape model. In [11, 12], sets of pictures are taken before and after food consumption in order to recognize and classify the food and determine its size. This method relies on a premeasured, predefined measurement pattern placed inside the images to translate the size in pixels of each portion. All these conditions can generate difficulties, which have been addressed by Martin et al. [13], who proposed a system that captures the images and sends them to a research facility where the analysis and extraction are performed. The major disadvantage of such a system is that it does not provide information to the users in real time; there is a considerable delay due to the offline processing of images. A smartphone-based application for recording food intake is proposed in [14]. This system uses image analysis tools for identification and quantification of food consumption. However, it does not provide an estimation of calorie intake and only processes 2D images. Various other image processing and machine learning techniques applied in different steps of food recognition systems are discussed in [4–6].

2.2 Virtualization of healthcare applications in cloud computing

In the past few years, several studies, such as those presented in [15–20], have proposed methods for the virtualization of healthcare applications in cloud computing as an alternative underlying technology to overcome the limitations of existing healthcare systems. For example, the work presented in [15] proposes a system called "MedCloud", which utilizes cloud-based technologies in conjunction with privacy and security rules for patients' data storage. The authors in [16] propose a Mobile Cloud for Assistive Healthcare (MoCAsH), an infrastructure that makes use of cloud computing as well as collaborative plans by deploying intelligent mobile agents, context-aware middleware, and a collaborative protocol for efficient resource sharing and planning. It also addresses various quality-of-service issues concerning critical responses and energy consumption. The goal of the work in [17] is to develop a cloud-assisted mobile pervasive system with medical software as a service (SaaS). Its virtualization mechanism uses a back-end real-time application server stack to store and manage patients' health records. It focuses on deadline-critical real-time medical data generated by sensor-based medical devices, such as a wireless electrocardiogram (ECG). In order for the system to handle time-sensitive and mission-critical medical data in a public cloud computing infrastructure, it uses a real-time application server operated as a virtual machine. Virtualization of healthcare sensors is addressed in [18–20], which develop a virtual sensor for remote health monitoring applications. The system tries to overcome the discomfort of wearing blood pressure sensors during monitoring. The application is deployed in the cloud to ensure scalability, accessibility, and flexibility. In such a system, the physician or caretaker is notified of any deviation from normal values for immediate attention.

In parallel to the above work, several mechanisms have been proposed to help users conveniently use cloud-based healthcare systems. In [21], the authors build a virtual network computing (VNC) server and a VNC client for establishing remote desktop connectivity between a mobile device and a server. They were able to establish a remote connection to the mobile device from the desktop and send text messages from the desktop. The Cuckoo framework [22] is based on a client-server communication methodology, wherein the mobile device and the server communicate via remote procedure call (RPC) and remote method invocation (RMI). A Cuckoo application can run either remotely or locally and is based on the Android platform. The Ibis high-performance programming system [23] is used as the basis for Cuckoo's communication component, and the system includes a decision-making algorithm for offloading content to the cloud. Our model differs from [21, 22, 24] and [23] in several aspects: It is designed on the Android platform, targeting not only smartphone devices but also the Android x86 emulator. It allows establishing a remote connection from the physical device to the virtual machine. Our model also supports content offloading to the cloud, and our system uses VNC instead of Ibis [23] while supporting the RPC and RMI protocols for performing remote operations. While Ibis provides remote access to resources and acts as a middleware, it needs prior installation and may not be suitable for disconnected operations. VNC, on the other hand, is available for all platforms and easy to port to new platforms.
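
As a toy illustration of the RPC-style offloading pattern discussed above, the sketch below exposes a recognition stub as a remote procedure using Python's standard xmlrpc module. This is not the actual protocol stack of Cuckoo or of our system; the host, port, function name, and return values are placeholders.

```python
# Toy RPC offloading sketch (placeholder values, not an actual stack).
# Server side: runs in the cloud and exposes the heavyweight step.
from xmlrpc.server import SimpleXMLRPCServer

def recognize(image_b64):
    # Stand-in for the computationally intensive recognition step.
    return {"label": "spaghetti", "probability": 0.94}

server = SimpleXMLRPCServer(("0.0.0.0", 8000))
server.register_function(recognize)
server.serve_forever()

# Client side (on the mobile device), in a separate process:
#   import xmlrpc.client
#   proxy = xmlrpc.client.ServerProxy("http://cloud-host:8000")
#   print(proxy.recognize("...base64-encoded image..."))
```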

Our work is closely related to [24], as we share the similar objective of virtualizing an Android application on Android x86. The model in [24], although designed for a different application than ours, is implemented based on a virtual connection for Android. With multiple images running the same functionality, opening a PDF file in Android x86 images hosted in a data center takes less than 1 s, while the average time for opening the same PDF file is 10 s. In our work, we propose a similar mobile virtualization system based on an Android x86 virtual machine, and we use the hosted infrastructure to achieve virtualization with the intention of deploying our application in the cloud. A similar approach has also been proposed in [25]: an Android application for performing object recognition using the camera sensors of the smartphone. The implementation in [25] is based on the Ibis middleware and hence has the same drawbacks as the Cuckoo model, which is also based on the client-server communication methodology. Although we share some features with [24–26], our model is advantageous over both studies from the virtualization aspect. With virtualization, the virtual machine layer is located between the physical hardware and the operating system, which minimizes the overhead of code generation and processing; virtualization also supports multiple operating systems. In comparison to [9, 11] and [12], our approach provides much higher accuracy in almost real time. This means the user of our system does not need to wait for the result until after he/she eats the meal, which could be too late to alert them about the number of calories on the plate. In the next section, we present the proposed system in detail.

3 Proposed system

In this section, we discuss our proposed system. We first introduce the reader to the concept of virtualization in Sect. 3.1. Then, in Sect. 3.2, we provide details of our proposed model using the Android virtualization mechanism (Fig. 1).

Fig. 1
figure 1

a Bare-metal virtualization versus b hosted virtualization

3.1 Android virtualization mechanism

Figure 2 shows a high-level view of our virtualization model. In this model, we are able to run our Android application "eat healthy stay healthy (EHSH)" in the cloud with an Android x86 image. An Android-based smartphone normally runs on an ARM processor, and the Android operating system is built specifically to be compatible with the ARM processor. The Android x86 emulator mimics all of the hardware and software features of a typical mobile device. It runs a full Android system stack, down to the kernel level, and includes a set of preinstalled applications that can be accessed from the user's applications. For a smartphone, it emulates the mobile environment in a manner that the user always feels as if the application is running on the Android smartphone, while in reality it runs virtually on the remote server (Android x86) with proper synchronization between the local client (smartphone) and the remote server.

Fig. 2
figure 2

Cloud-based virtualization on Android for eat healthy stay healthy application

Running the mobile application on virtual systems can be achieved through different approaches. The first approach is to implement the mobile application in the kernel layer (bare-metal architecture). The bare-metal architecture installs the virtualization layer directly on a clean x86-based system. A hypervisor, acting as a virtual machine manager, allows multiple operating systems to share a single hardware host. Each operating system appears to have the host's processor, memory, and other resources all to itself; however, the hypervisor actually controls the host processor and resources, allocating what is needed to each operating system in turn and making sure that the guest operating systems (virtual machines) cannot disrupt each other. The main challenge of this method is that Android supports only one display and keypad device, since Android is mainly designed for smartphones; hence we would not be able to enter any input in the text fields of our application. In contrast, a hosted architecture installs and runs the virtualization layer as an application on top of an operating system and supports the broadest range of hardware configurations.

3.2 System mechanism

Our model uses both the hosted and hypervisor models, but we used the hosted architecture for publishing our Android application in the cloud, because the bare-metal architecture does not allow us to publish the virtual machine structure in the cloud. By doing so, the users of EHSH can control their virtual smartphone operations through a dedicated client application installed on their smartphones, while the processing of the application continues to run on the virtual Android image even if the user is disconnected for any unexpected reason. To establish a remote connection from the Android physical device to the Android x86 image, we used VNC, installing the VNC server on the Android x86 image and the VNC client on the physical smartphone, as shown in Fig. 2. The remote desktop protocol (RDP) is semantic, which means it is aware of controls, fonts, and other graphic primitives. When rendering screen changes across the network, if the only change between two consecutive frames is a newly added button, RDP sends only the button's location on the screen, its size, and its colour. VNC, in contrast, sends actual images over the network. VNC is available on all platforms, while RDP is made specifically for the Windows platform; VNC also supports file sharing, and the simplicity of the protocol makes it easy to port to new platforms. Hence we use VNC for our model, as we have an Android ARM processor on the physical smartphone and Android x86 on the virtual machine; furthermore, VNC provides the flexibility to establish a remote connection to Android.

As noted above, in the bare-metal virtualization architecture the mobile application is implemented in the kernel layer, and since Android supports only one display and keypad device, the user would not be able to enter any input in the text fields of our application (especially in the login credential section). We therefore used the hosted architecture, in which this problem is avoided. Having applied virtualization to the Android application (EHSH), the end result is that we are able to establish a virtual cloud session on the mobile device, as elaborated in Fig. 2. The login page is displayed on the mobile screen, and the user always feels as if the application is running on the Android smartphone, while in reality it runs virtually on the remote server (Android x86) with proper synchronization between the local client (smartphone) and the remote server. We have used this methodology to implement our Android application in the cloud, as explained further in the next section.
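
To give a flavour of why VNC's underlying remote framebuffer (RFB) protocol is considered simple to port, the following minimal sketch performs only the opening RFB version handshake against a VNC server. It is an illustrative fragment, not part of our system: the hostname is a placeholder, and a real client would continue with security negotiation, framebuffer updates, and input events.

```python
import socket

# Minimal RFB version handshake (illustrative only).
# Per the RFB specification, the server opens with a 12-byte version
# string such as b"RFB 003.008\n"; the client replies with the highest
# version it is willing to speak.

def rfb_version_handshake(host, port=5900):
    with socket.create_connection((host, port), timeout=5) as sock:
        server_version = sock.recv(12)   # e.g. b"RFB 003.008\n"
        sock.sendall(b"RFB 003.008\n")   # agree on RFB 3.8
        return server_version.decode("ascii").strip()

# Usage, assuming a VNC server is listening on the Android x86 image
# (hostname is a placeholder):
# print(rfb_version_handshake("android-x86.example.com"))
```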

4 Implementation

In this section, we describe the implementation and operation of our system. The aim of this section is to illustrate the use of VNC and the Android platform for the virtualization of our application. We also provide brief details of the image segmentation and deep learning steps based on our previous work [4–6].

We implement our system such that both the smartphone and the virtual machine run Android. The smartphone used in the implementation is a Samsung Galaxy S4 running Android 4.4.2 (KitKat), while the Android x86 emulator has the Android 4.4 (KitKat) image. We have stored a replica of our application in the Android x86 virtual machine hosted on Amazon Web Services; see Fig. 3. The VNC client on the physical smartphone is used to connect to the VNC server on the virtual machine. Once the connection is established, VNC transmits all the events and Android images, similar to streaming a video. For handling connections from multiple users, we assign a virtual image to each user; hence the user session remains consistent and the user never realizes that the session is running on a remote machine. The hardware virtualization layer (hypervisor) sits between the physical hardware of the virtual machine and the Android x86 OS, which enables the deployment of replica VMs. To maintain consistency between the Android smartphone and the Android x86 replica, any new file added to the file system of the smartphone (from sources other than the Internet, in the case of continuously connected operation) is also sent to the replica.

The first component of our system is the virtual computing architecture, which contains all parts of the virtualization architecture. The second component is food image processing and recognition, which performs feature extraction, classification, and calorie measurement. This process is fully automatic and does not require any intervention from the administrator or the user. After the features are extracted and the food image is classified, the result is compared with the database to generate the corresponding calorie value of the food item.
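
The paper does not specify how the per-user Android x86 replicas are provisioned on Amazon Web Services. Purely as an illustration of the one-image-per-user arrangement described above, the sketch below launches a tagged EC2 instance with the boto3 library; the AMI ID, instance type, and tag names are hypothetical placeholders.

```python
import boto3

# Hypothetical per-user provisioning of an Android x86 replica on EC2.
# AMI ID, instance type, and tag values are placeholders; the actual
# provisioning procedure is not described in this paper.

def launch_android_replica(user_id, ami_id="ami-0123456789abcdef0"):
    ec2 = boto3.client("ec2")
    resp = ec2.run_instances(
        ImageId=ami_id,           # prebuilt Android x86 machine image
        InstanceType="t2.small",  # placeholder sizing
        MinCount=1,
        MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "ehsh-user", "Value": user_id}],
        }],
    )
    return resp["Instances"][0]["InstanceId"]

# Usage: instance_id = launch_android_replica("user-42")
```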

Fig. 3
figure 3

Cloud-based virtualization on Android for EHSH application

The system makes use of a set of key functionalities such as the use of food pictures, image segmentation, and image analysis. The details of the image segmentation and classification are provided in [4–6] and will not be repeated here. Furthermore, we apply deep learning neural networks to increase our food recognition accuracy [27, 28]. We do this by initially capturing a set of images of one particular class and labelling them with the object name (the object being spaghetti in Fig. 3). These form our set of relevant (positive) images, with which we then train the system. As training takes place virtually on the server, we have the much-needed processing power, so the system is trained quickly (depending upon the number of images in a class). In the second step of training, we re-train the system with a set of negative images (images that do not contain the relevant object); in our case, we trained the system with background images so that it does not categorize them as part of the class. Once the model file is generated from the training, we load it into the Android application and test it against the images captured and submitted by the user. The system then performs the image recognition process and generates a list of probabilities against the label names. The label with the highest probability is presented to the user in a dialog box to confirm the object name. Once the object name is confirmed, the system performs the calorie computation as fully described in [29] and shows the computed calories to the user. The final result, containing the food item name and the corresponding calorie value, is sent to the Android API.
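
The ranking-and-confirmation step just described can be summarized by the small sketch below. The label list, probability vector, and calorie table are toy values, not the system's trained model or database; in the application, the confirmation happens through a dialog box rather than the assumed acceptance shown here.

```python
# Toy illustration of top-label ranking and calorie lookup.
# Labels, probabilities, and calorie values are placeholders.

labels = ["spaghetti", "pizza", "salad", "apple", "bread", "rice", "soup"]
probs  = [0.62, 0.15, 0.09, 0.06, 0.04, 0.03, 0.01]  # mock CNN output

# Rank labels by probability and keep the top 5, as in our experiments.
top5 = sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)[:5]
best_label, best_prob = top5[0]

# The user would confirm best_label in a dialog box; here we assume
# confirmation and look up the calorie value for the detected portion.
calorie_table = {"spaghetti": 395}  # kcal, placeholder portion value
print(best_label, best_prob, calorie_table.get(best_label))
```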

5 Experimental results

This section reports on the experimental evaluation of the performance of the proposed model. We compare the results of running our application’s algorithms such as deep learning, image segmentation, and image processing on three different configurations of cloud servers and a local server connected to the smartphone.

The setup of our testing is as follows: We used seven different food classes, each containing 20 test images. For each image belonging to a class, we recorded the recognition accuracy, the recognition success, and the time, in seconds, to return the results to the user. We ran the first set of experiments on a single-instance cloud server (1 ECU and 2 GB RAM), in which we included the top 5 results for each image. The average timings for this configuration were between 16 and 18 s. For most of the image classes, the recognition success rate was 18 out of 20 images, with accuracy varying between 59.05 % and 100 %; except for a couple of images, most food items recorded a recognition rate above 95 %.

In the second setup, we performed the experiments on a cloud server with two instances, each with 4 GB RAM and 3 ECUs. For this experiment, we again included the top 5 results for each image. The average timings were between 15 and 16 s. The recognition success rates were similar to the single-instance case, and the accuracy rates were between 59.05 % and 99.9487 %. In the third setup, we performed the experiment using 4 instances of the cloud server with 15 GB RAM and 13 ECUs. With this relatively powerful cloud server, there was a significant improvement in the timings, and the average processing time was 14.60 s.

The experiment was also performed on a local server with a 1.7 GHz processor and 1.78 GB of shared RAM. We noticed a significant difference in both the time and the recognition results in this case. The average timing for the algorithm ranged between 24 and 28 s, with varying results on each run, and the recognition success rate was 16 out of 20 images. The processing times of the above experiments are shown in Table 1. As we can see, the time improved significantly on all cloud configurations compared to the local server and, as expected, the improvement increased as the processing power of the cloud server increased. For some food items, the boost in processing time reached a 50 % improvement. One aspect of the results that did not show significant improvement is the accuracy of food recognition, as we can see in Table 3: even when running the system on the most powerful cloud server, there were no improvements. The reason for such results is the number of images used in the experiment; the training algorithm needs a much bigger data set to show a difference between the two experiments. This is a limitation of the current experiment, and we are working towards adding more images for further investigation. To compare the accuracy of the system, we ran the deep learning neural network method on both the cloud and the local server, with the results shown in the tables below. By increasing the number of images, the cloud version will achieve higher accuracy; moreover, by updating the system periodically with unrecognized images, the cloud version will reach better accuracy within a short time compared to the local server (Table 2; Fig. 4).
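
For clarity, the percentage improvements reported in Table 2 follow the standard relative-improvement formula, computed here with the average recognition times quoted in Sect. 5 (26.96 s on the local server versus 14.64 s on the cloud); per-item figures from Table 1 would be substituted in the same way.

```python
# Relative processing-time improvement, as reported in Table 2.

def improvement_pct(t_local, t_cloud):
    """Percentage reduction of cloud time relative to local time."""
    return 100.0 * (t_local - t_cloud) / t_local

# Average food recognition times reported in this section:
print(round(improvement_pct(26.96, 14.64), 2))  # -> 45.7
```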

Table 1 Comparison results of the application processing time
Table 2 Percentage improvement of processing time
Table 3 Comparison of the accuracy percentage
Fig. 4
figure 4

Accuracy of cloud and local server on spaghetti

We have also compared our time and accuracy results to other similar models. The models we compare against share the same goal of object recognition via a mobile device and make use of cloud computing models, although the approaches they follow differ in both the recognition concept and the cloud computing model. We present two graphs comparing our model to the other models in terms of time and accuracy, respectively.

In our simulations, the time taken (in seconds) is the overall time for the object recognition process (feature extraction, classification, etc.), from the moment the images are captured on the mobile device until the final recognition result is displayed on the mobile screen. The Matusiak et al. [30] model is a food object recognition model in a mobile phone application for visually impaired users. It makes use of colour detection, an object recognition module, and a light source detector, wherein the image resolution for recognition is 320 \(\times \) 480 and the system is trained on a database of 30 food products. The object recognition time in this model is 36.7 s per image. The application is implemented as a standalone application on the mobile device, without the use of any external server.

The eyeDentify system [31] was also designed to recognize objects. Similar to our method, the computation requirements for the recognition task in [31] were not met by the smartphone alone. By using cyber foraging, in which some computations were offloaded to the cloud, the authors were able to increase the application's responsiveness and accuracy and decrease the amount of energy used. In [31], the Ibis middleware was used to build a distribution system, which was evaluated using eyeDentify's object recognition task. The problem with this type of approach is its reliance on the Ibis distributed deployment system: because the adapters for grid middleware on Amazon EC2 were still in progress, the authors could not run the Ibis distributed environment on Amazon Web Services (AWS). In our system, in contrast, we offloaded the computational part to multiple Amazon EC2 instances as part of AWS. In addition, Ibis analyzes the images by dividing them into circles, whereas our system processes images based on pixels. The circular region approach [31] misses the information not covered by the circles, whereas the pixel approach analyzes all the information in an image. Furthermore, in [31], object recognition was based only on the colour feature, which is inefficient when objects have the same colours; in contrast, our model is based on four feature sets. Their implementation ran on an 8-node VU cluster, where each node was a dual-CPU/dual-core 2.4 GHz AMD Opteron machine with 4 GB RAM running the Linux operating system. The overall time taken with the cyber foraging concept in [31] was 34.8 s.

As shown in Fig. 5, compared to [30] and [31], our model (EHSH) has an improved overall timing result. On the local client-server model, the time taken for food object recognition was 26.96 s, which further improved to 14.64 s with the use of the cloud-based virtualization model.

Fig. 5
figure 5

Time comparison across various recognition models

Fig. 6
figure 6

Classification accuracy comparison between food recognition models

Also, based on Fig. 6, we can observe that our model has an average classification accuracy of 87.16 % for the food objects. Existing food object recognition models such as that of Okamoto et al. [32] have an overall accuracy of 74.8 %. In [32], the authors propose a mobile real-time eating-action recognition system that classifies food items such as meat, rice, pumpkin, bell pepper, and carrot. Compared to [32], with the implementation of deep learning in our cloud-based virtualization model we achieve a 12.36 % better accuracy result (87.16 % versus 74.8 %), with certain food objects even classified with an accuracy of 100 % for single food portions. As shown in Fig. 6, we can observe similar accuracy improvements of 24.64 % over [33], 9.16 % over [34], and 20.83 % over [35].

6 Conclusion and future work

This paper described a virtualization mechanism for a real-time multimedia-assisted mobile food recognition application in cloud computing. The key ideas, the overall system implementation, and experimental results supporting the validity of the proposed mechanism were described. We were able to virtually run a computationally intensive and time-sensitive Android-based food recognition and classification application on a virtual system (Android x86) hosted in the cloud. With the proposed virtualization mechanism, we addressed several challenges related to the complex food recognition algorithm. Our experimental results of running the application on cloud servers show significant improvements compared with experiments run on a local server. The results show that the food recognition rate when we run the application on the cloud servers is more than 94.33 %, compared to 87.16 % when we run the application on a local server. Also, with our virtualization mechanism the results are processed up to 59 % faster than when running the application locally. One of the main contributions of the proposed system is that it allows users to control their virtual smartphone operations, through a dedicated client application installed on their smartphones, while the processing of the application continues to run on the virtual Android image even if the user is disconnected for any unexpected reason. With our mechanism, the mobile environment is emulated on the cloud in a manner that users always feel as if the application is running on the smartphone, while in reality it runs virtually on the remote server in the cloud. This is significant for overcoming the limited capability of smartphones to run intensive machine learning algorithms similar to those presented in this paper.

As for our plans for future work, we will embed the VNC server configuration inside our Android application. Currently, we establish the remote connection outside of our Android application, thereby allowing the user to access other applications. One of the main reasons for creating this dedicated VNC configuration is that there is a stage in our Android application where the user captures an image from the physical device; the captured images are then used for image recognition, which is processed remotely on the virtual machine. The existing configuration does not re-establish the remote connection once it is broken while capturing images; hence, the images are currently fetched from the gallery before the launch of the application. We will address these issues in the near future. Another aspect of the results is related to the accuracy of food recognition, shown in Table 3. We noticed that the accuracy percentage improved only marginally, even when comparing the accuracy results on the most powerful cloud server. The reason for such results is the number of images used in the experiments; the training algorithm needs a much bigger data set to show a difference in the accuracy results. This is a limitation of the current experiments, and we plan to increase our set of data images and perform further investigation to study the accuracy of the system.