1 Introduction

Over the years, Wireless Sensor Networks (WSNs) have experienced unprecedented growth in terms of applications, interfacing, scalability, interoperability and data computation. These technological advances, along with innovations in Radio Frequency Identification (RFID) and in wireless and cellular networks, have laid a solid foundation for the Internet of Things (IoT). The term Internet of Things was first coined by Kevin Ashton in 1999 in the context of supply chain management [1]. It refers to a smarter world in which every object is connected to the Internet [2]. In IoT, these objects, also known as entities, have digital identities and can therefore be organized, managed and controlled remotely, which extends their reach well beyond their physical location. Driven by the growth of smart objects, IoT has enriched almost every aspect of our daily lives and continues to do so with a diverse range of novel, innovative and intelligent applications [3, 4]. These applications include smart healthcare [5], smart cities [6], smart agriculture [7], crowd sensing [8, 9] and crowdsourcing [10], as shown in Fig. 1.

Fig. 1 Applications of IoT

These advancements and innovative applications are highly encouraging and point to a bright future for IoT, but they also raise multiple challenges. These include security, big data analytics, interoperability, Quality of Service (QoS) and energy management [11]. Among them, big data is critical because of the interrelations between IoT objects and the plethora of data streams they generate. A vast variety of IoT devices and applications produce huge amounts of information, and various big data analytics techniques are employed to mine it and improve decision making. In an IoT context, big data has been classified and described by researchers from different perspectives, and various models have been proposed [12,13,14]; the most prevalent among them is the 5V model. This model characterizes big data along five attributes: the size of the data (volume), real-time data collection (velocity), heterogeneous data collected from a diverse range of sources (variety), unpredictable data (veracity), and the usefulness of such data in fields such as industry and academia (value). Recently, big data research has grown phenomenally owing to its application in various domains. This growth has been further fueled by the integration of IoT with big data, which creates opportunities to improve services in many complex systems, such as healthcare. The IoT literature reports a large number of big data technologies used to analyze large volumes of data from multiple sources in the smart healthcare domain. Among these technologies, machine learning (ML) is a dominant technique that performs complex analysis, intelligent judgment and creative problem solving on big data.
It is estimated that the economic impact of using ML techniques for big data analytics, i.e., ML-based products and platforms, will range from $5.2 trillion to $6.7 trillion per year by 2025 [15]. This underlines the importance of ML in big data, and particularly in IoT.

There exist numerous comprehensive literature reviews that identify research trends in big data, ML and IoT. For instance, in [16], the authors discussed the characteristics of big data along several dimensions, i.e., volume, velocity, variety, veracity, variability and value. They also discussed current and emerging deep learning architectures and algorithms designed specifically for big data analytics in various IoT domains. However, that review is generic, because it covers deep learning techniques for big data analysis across multiple domains. The authors in [17] studied the latest ML techniques for big data analytics used for IoT traffic profiling, device identification, security, edge-enabled computing infrastructure and network management. However, that survey is restricted to the applicability of ML techniques for big data analysis across a wide range of applications within a specific domain. Similarly, big data technologies across sectors such as smart health, smart traffic and logistics, and smart agriculture were discussed in [18]. This survey enables readers to choose the most suitable technique from a diverse range of data analytics techniques across various domains, and it also studies the applicability of these techniques across domains. However, it is limited in scope and pertains only to a single domain; besides, it only partially discusses the techniques from each domain. Some surveys, on the other hand, target only a single IoT domain. For instance, the authors in [19] presented a taxonomy of ML-based techniques for the smart city domain; however, it does not consider the security of the data or of the underlying network. All these literature reviews and surveys study big data and ML from an IoT perspective for applications such as intelligent transportation systems, smart cities, smart agriculture, crowd sensing and smart homes.
However, the literature lacks research that exclusively investigates big data analytics and ML in the IoT healthcare domain. Some of the aforementioned surveys dedicate a single section to this topic, but there is no comprehensive survey that identifies the most suitable big data technologies and ML techniques for IoT healthcare. Moreover, studies that interlink the two domains, i.e., big data analytics and healthcare, are still in their infancy and require further attention from the research community. Similarly, no single study examines the significance of data aggregation and its vital role in this specific domain.

To identify these research gaps, we carefully reviewed various papers on ML techniques for big data analysis. Considering the challenging aspects of big data in IoT healthcare, our objective in this work is to present the state-of-the-art literature on ML techniques and big data analytics proposed exclusively for IoT eHealth. We also highlight their strengths, weaknesses and future challenges. This will enable readers to choose the most suitable technique from the available pool of big data analytics tools for healthcare and to explore them further. Based on our extensive literature review, this is the first work that targets this particular domain, which distinguishes it from other papers in the literature. The main contributions of this paper are as follows:

  • It discusses the relationship between big data and IoT in general, followed by the state of the art in big data research for IoT smart health. Finally, it provides a comprehensive discussion of research challenges that open further opportunities in this domain, highlighting the most promising directions for interested parties to explore in the years ahead.

  • Fundamental concepts of big data and the complex relationship between big data and IoT are explored.

  • Big data challenges in IoT healthcare domain are discussed and future research directions are provided in this context.

  • Existing ML-based data aggregation techniques and their applicability to IoT smart health are systematically reviewed.

The rest of this paper is organized as follows. Section 2 describes the article classification and our motivation for researching this specific domain. Section 3 introduces IoT and highlights its contribution to various applications. Section 4 studies recent developments in, and the transformation of, the conventional healthcare sector, along with a layered architecture for Wireless Body Sensor Networks (WBSNs). Section 5 discusses big data challenges in IoT from a smart healthcare perspective. Section 6 provides a detailed discussion of the role of ML techniques in the analysis of big data in IoT healthcare. A comprehensive and up-to-date literature review of ML techniques for big data analytics in IoT eHealth is provided in Section 7. Research challenges in the field are presented in Section 8. Finally, Section 9 concludes the paper by stating its limitations and future work. The overall structure of this paper is depicted in Fig. 2.

Fig. 2 Structure of the paper

2 Articles classification

In this work, we examined well-known academic databases and publishers such as Google Scholar, ABI/INFORM Global, Academic Search Premier, Applied Science and Technology Full Text (EBSCO), ACM Digital Library, IEEE Xplore Digital Library and ScienceDirect, as well as the general Google search engine. We used various keywords, including but not limited to big data, IoT and big data, big data analytics in IoT health, IoT eHealth, and machine learning and big data analytics in IoT healthcare, to explore the primary challenges and issues in applying ML to big data analytics in IoT smart health. We sought the latest literature, including journal papers, conference papers, standards, project reports, patents, white papers and industry reports. Furthermore, we restricted our search to literature published over the past 4 years, i.e., from 2016 to 2020, with particular emphasis on papers related to big data research in the IoT healthcare domain. As a result, a total of 361 papers were downloaded, of which only 90 were selected and thoroughly reviewed, as shown in Fig. 3. Each paper was carefully analyzed to find research gaps and to clarify our research direction and our motivation for carrying out this research. Based on this analysis, we selected only 7 of these papers, namely [18, 20,21,22,23,24,25]. A detailed discussion of these survey papers is provided in Section 1, which justifies why we carried out this research work and explains our motivation. The strengths and weaknesses of these papers are also provided to justify our work, along with the contributions and novelty of this survey.

Fig. 3 Relevant articles published over time

3 The Internet of Things

IoT is a web of smart, self-configuring things that can communicate with each other over a global network. It is essentially a cyber-physical system, or a network of networks. IEEE put forth an informal description of the term “IoT” as “a network of objects each of which is embedded with sensors and these sensors are connected to the Internet” [26]. Seamless communication among participating objects is facilitated by low-cost sensors installed in a diverse range of objects, supporting ubiquitous and pervasive computing applications [27]. Other technologies that further stimulated the development of IoT are wireless technologies, micro-electro-mechanical systems (MEMS) and the Internet. According to market analysts, around 25 billion sensor-enabled devices will be installed by 2020 [28], and the market for such devices is expected to reach around 2.1 trillion by 2025 [29]. This implies that billions of physical devices or sensor-enabled objects will be connected and will communicate with each other via the Internet. These objects will generate huge volumes of heterogeneous and complex data, in most cases in real time. It is therefore imperative to extract useful patterns from this raw data efficiently: the raw data gathered from the physical environment needs to be analyzed and mined for feature extraction and useful information. This becomes particularly important with the evolution of intelligent IoT applications, in which devices communicate with each other, share information and make intelligent decisions. As a result, big data analytics using data mining techniques is evolving as a new area of research. In recent years, we have witnessed the development and deployment of a large number of IoT applications [30,31,32], including smart cities, smart energy management, smart agriculture, military applications, environmental monitoring and healthcare.
IoT has the capability to transform the current and future healthcare sector, with promising technological, economic and social prospects. It is estimated that the economic impact of IoT-enabled hardware and software will reach USD 176.82 billion by 2026 [33]. The healthcare sector alone will constitute about 41% of the IoT market, the largest share, followed by industrial automation with 33% and energy with 7% [34]. In addition, 15% of the IoT market relates to transportation, agriculture, urban infrastructure, security and retail. These outlooks indicate remarkable growth of IoT services in the healthcare industry on the one hand and, on the other, challenges such as big data that the research community will face in the near future.

4 IoT in healthcare

With the emergence of eHealth and mHealth, we have witnessed an increasing role of technology in the healthcare sector. Millions of sensors are attached to patients to continuously monitor their health using various physiological, environmental and behavioural parameters. In healthcare IoT, i.e., eHealth and mHealth, the wireless body sensor network (WBSN) is the predominant technology for monitoring patients. A WBSN consists of sensors deployed around the human body [35]. The layered architecture of a WBSN comprises a sensing layer, a communication layer, a processing layer, a storage layer, and a mining and learning layer, as shown in Fig. 4 [36]. Each layer contains various components with their own responsibilities. The sensing layer includes sensing devices such as wearable and in-body sensors. Recently, medical super sensors (MSS) have come onto the market; they have more memory and improved processing and communication capabilities compared with ordinary sensor nodes. These sensors are usually wearable or are sometimes implanted under the patient's skin, and they can communicate with the network. They gather vital information such as body temperature, blood pressure, heart rate, respiration rate, ECG, and blood glucose for diabetic patients [37]. In recent years, actuators have also been employed to raise alarms and modify environmental parameters whenever necessary. These developments have produced a range of novel monitoring applications, which in turn generate a large amount of contextual data. Big data must therefore be considered, among other challenging issues, when designing devices at the sensing layer. Other such issues include price, size, energy consumption, memory, processing, power, and the deployment and organization of devices at this layer. The next layer is the communication layer, which is somewhat similar to the physical layer of the TCP/IP model.
This layer enables physical objects in the WBSN to connect and share data using specific communication protocols, and it facilitates both inter- and intra-network communication. The standards and communication protocols defined at this layer provide interoperability within the WBSN and also facilitate data exchange with existing infrastructures. Various standards are used for intra-network communication at this layer, such as Bluetooth, ZigBee, RFID, NFC and UWB [38,39,40]. Each of these standards has its pros and cons, and the choice depends on the specific application's requirements [41]. The challenges faced at this layer include network management, QoS (congestion, latency and energy efficiency), and security and privacy. In addition, data aggregation and big data analytics need further exploration at this layer, since these techniques preserve the energy of resource-starved networks by substantially reducing data transmission across the network. The third layer is the processing layer, which analyzes the gathered data, makes decisions, and raises alarms and notifications. Its main components are (a) the processing unit, (b) hardware platforms, and (c) the operating system. The main challenge at this layer is the limited processing capability of the hardware components. The partially analyzed data is then passed on to the next layer, i.e., the storage layer. In IoT healthcare, a large number of devices can be attached to the human body, generating massive and complex data. The storage layer is responsible for managing and storing such data efficiently for further analysis and use. IoT-based systems have little memory and are therefore unable to store such data themselves. To overcome this limitation, numerous cloud-based platforms are available for data storage, such as ThingWorx [41], OpenIoT [26, 42], Google Cloud [43], Amazon [44], Nimbits [45] and GENI [46, 47].
These platforms improve data management and storage, and the data can be reviewed and accessed virtually from anywhere. This in turn enables health professionals and researchers to explore it further for better understanding and advancement of the field. Finally, the mining and learning layer is responsible for big data analytics and knowledge extraction. Various data mining techniques are available in the literature; among them, ML techniques have been successfully applied to big data analytics in healthcare IoT [17, 48]. ML-based techniques can manage huge data sets efficiently, learn from the data and improve with experience. They are used to mine vast amounts of medical information and extract useful, potentially interesting, unique and hidden information. The main components of this layer are clustering, classification, association analysis, time series analysis and outlier analysis [19, 49]. In the future, feedback is expected to emerge from this layer, as opposed to the present IoT scenario, in which it comes from the clinics.
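
As a rough illustration of the processing layer's role of analyzing gathered data and raising alarms, the Python sketch below checks each incoming reading against a configured range and passes all readings on to storage. The `Reading` type, the sensor names and the `NORMAL_RANGES` bounds are hypothetical placeholders chosen for illustration; they are not clinical reference values and are not taken from any cited work.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Reading:
    sensor: str       # sensor type, e.g. "heart_rate" (hypothetical naming)
    value: float
    timestamp: float  # seconds since monitoring started

# Hypothetical, illustrative bounds only; NOT clinical reference values.
NORMAL_RANGES = {
    "heart_rate": (50.0, 110.0),  # beats per minute
    "body_temp": (35.5, 38.0),    # degrees Celsius
    "resp_rate": (10.0, 24.0),    # breaths per minute
}

def check_reading(r: Reading) -> Optional[str]:
    """Return an alert message if the reading falls outside its configured range."""
    bounds = NORMAL_RANGES.get(r.sensor)
    if bounds is None:
        return None  # no rule for this sensor type: make no decision
    low, high = bounds
    if r.value < low:
        return f"ALERT {r.sensor} low ({r.value}) at t={r.timestamp}"
    if r.value > high:
        return f"ALERT {r.sensor} high ({r.value}) at t={r.timestamp}"
    return None

def processing_layer(readings: List[Reading]) -> Tuple[List[str], List[Reading]]:
    """Raise alerts locally and pass all (partially analyzed) readings on to storage."""
    alerts = [msg for msg in (check_reading(r) for r in readings) if msg is not None]
    return alerts, list(readings)
```

In a real deployment the decision rules would come from clinicians; the sketch only shows how simple local checks let the processing layer react before data reaches the storage and mining layers.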

Fig. 4 Layered architecture of a Wireless Body Sensor Network

5 Big data challenges in IoT smart healthcare

Despite the hype surrounding smart eHealth and mHealth applications in IoT, big data remains a challenging issue. Sensors and medical devices attached to patients' bodies generate massive volumes of heterogeneous data, also called big data [50]. This data contains highly correlated and redundant patterns, and it must be mined to provide continuous, efficient and seamless healthcare around the clock. The challenge lies in processing and transmitting such data across the network: doing so consumes both energy and bandwidth in resource-constrained networks, which leads to congestion and reduces the energy reserves and lifetime of the underlying networks [51]. It is therefore imperative to aggregate raw data using big data analytics before transmitting it across the network, so that decisions remain accurate and timely. Processing the data intelligently and efficiently within the network thus becomes a major concern for all stakeholders. Removing redundant and erroneous data, while identifying and extracting meaningful information and gaining new insights from large volumes of raw captured data, is the core utility of big data analytics [52]. These techniques not only improve performance but also, together with novel energy management techniques, conserve energy and enable the long-term operation of these networks [20, 51, 53].
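
One simple way to suppress redundant data before it leaves a sensor node is a send-on-delta (deadband) filter, sketched below in Python. It is a generic technique named here for illustration, not one of the cited schemes, and the threshold `delta` is a hypothetical tuning parameter: a reading is transmitted only if it differs from the last transmitted value by more than `delta`.

```python
from typing import Iterable, List, Optional, Tuple

def send_on_delta(samples: Iterable[Tuple[float, float]], delta: float) -> List[Tuple[float, float]]:
    """Return only the (timestamp, value) samples that would be transmitted.

    A sample is sent if it is the first one, or if its value differs from the
    last *transmitted* value by more than `delta`; all others are suppressed.
    """
    sent: List[Tuple[float, float]] = []
    last_sent: Optional[float] = None
    for t, v in samples:
        if last_sent is None or abs(v - last_sent) > delta:
            sent.append((t, v))
            last_sent = v
    return sent

# Example: nine heart-rate samples (bpm), one per second.
hr = [72, 72, 73, 72, 74, 90, 91, 90, 72]
transmitted = send_on_delta(list(enumerate(hr)), delta=2)
print(transmitted)  # [(0, 72), (5, 90), (8, 72)]
```

Here three of nine samples are sent. The trade-off is that changes no larger than `delta` are never reported, so for clinical signals the threshold has to be chosen with the required accuracy in mind.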

6 Machine learning and big data analytics for IoT

In this section, we discuss the application of ML to big data analytics. ML is a subfield of computer science that evolved from pattern recognition and computational learning theory [54]. It is a branch of Artificial Intelligence (AI) that gives machines the ability to learn, and to make complex decisions, without being explicitly programmed [55]. It has been successfully applied in domains such as computer vision [56], computer graphics [57], natural language processing (NLP) [58], speech recognition [59], computer networks [60] and intelligent control [61]. In recent years, we have witnessed its vital role in IoT and big data analytics, owing to the phenomenal growth of IoT and its diverse range of innovative applications. These heterogeneous and complex data sources, i.e., IoT devices, produce highly correlated data, which makes data management in these systems extremely difficult and poses numerous challenges for the research community [62,63,64,65]. Managing data from such a large number of sources, with increasing velocity and at scale, requires novel big data analysis techniques. Existing techniques fall short because of their lower accuracy and higher energy consumption, and they do not cater to this diverse range of applications; they therefore need to be improved. ML techniques play a pivotal role in IoT eHealth [66]. They enable deep analytics over a larger pool of available information, mine useful information and features hidden in IoT data, and facilitate decision making. Moreover, they help in developing efficient and intelligent IoT applications. An IoT analysis model consists of components such as data sources, edge/fog computing, and ML techniques for IoT big data analytics. In this model, the potential data sources include wearable devices such as sensors, and body area networks.
These sources capture information related to human health, such as temperature and ECG, as well as environmental data such as humidity and camera images. Various ML techniques are then applied to the captured data for further analysis. The literature shows that ML techniques have been successfully applied to big data analysis in various IoT applications, such as smart traffic [67, 68], smart agriculture [69], smart human activity control [70], smart weather prediction [16, 71], healthcare [72, 73] and smart cities [19]. Although big data has been studied in a diverse range of IoT domains, there is no comprehensive literature review that exclusively investigates big data analytics in IoT healthcare. Some of the aforementioned surveys dedicate a section to this domain, but no single study examines the significance of ML techniques for big data analysis in IoT healthcare. In the next section, we present the state-of-the-art literature by reviewing the latest ML techniques for big data analysis in IoT smart healthcare systems, and we highlight their strengths, weaknesses and future challenges. This gives readers the insight needed to explore the area further.
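
To make the edge/fog stage of this analysis model more concrete, the Python sketch below shows one way an edge node might condense a raw sensor stream into per-window summary features before forwarding them to a cloud-side ML model. The window length and the chosen features (mean, minimum, maximum, population standard deviation) are illustrative assumptions, not part of any specific model described above.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Features = Tuple[float, float, float, float]  # (mean, min, max, std)

def window_features(values: Sequence[float], window: int) -> List[Features]:
    """Summarize consecutive, non-overlapping windows of `window` samples.

    An incomplete trailing window is dropped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    features: List[Features] = []
    for start in range(0, len(values) - window + 1, window):
        w = values[start:start + window]
        features.append((mean(w), min(w), max(w), pstdev(w)))
    return features

# Example: eight raw samples reduced to two feature tuples at the edge.
print(window_features([1, 2, 3, 4, 5, 6, 7, 8], window=4))
```

Forwarding four numbers per window instead of every raw sample is one way edge computing can cut bandwidth; the cloud-side model would then be trained on these features rather than on raw signals.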

7 A taxonomy of machine learning techniques for big data analysis in IoT smart healthcare system

IoT aims to improve the quality of human life by automating basic tasks that humans would otherwise perform manually. In this context, monitoring and decision making shift from humans to machines. For instance, in IoT-based assisted living applications, sensors are attached to the health monitoring unit used by patients. The information gathered by these sensors is transmitted across the network and made available to all interested parties. This not only enables timely treatment of patients but also improves the responsiveness and accuracy of the underlying application [74, 75]. Moreover, the medicines currently taken by a patient are monitored, and the risk of a new medication is evaluated in terms of possible allergic reactions [66, 76]. As a result, both time and money are saved. In this section, we review selected ML techniques for big data analytics in IoT eHealth. The key concepts, along with their similarities and differences, strengths and weaknesses, are summarized in Table 1.

Table 1 Key technological concepts, their similarities and differences

7.1 ML-based recommendation system

In [77], the authors proposed a recommendation system that identifies the most suitable IoT wearable devices based on an individual's needs. The proposed system first gathers the available data related to a patient's health, e.g., medical history, demographic information, and archived data retrieved from the sensors attached to the patient. ML-based classification techniques such as decision trees, logistic regression and LibSVM are then used to predict the occurrence of diseases. Finally, a mathematical model recommends a customized IoT solution for each individual. In [78], the authors proposed a disease prediction system based on real-time electrocardiogram (ECG) analysis. First, the proposed approach analyzes and classifies ECG waveforms captured in real time by ECG monitoring devices, using ML classifiers such as KNN and bagged trees. Next, signs of disease and abnormalities in the ECG are predicted and communicated to the cloud in real time via a purpose-built IoT network owned by the National Health Service (NHS), UK. Simulation results showed that the precision of the proposed scheme can reach 99.4%. However, the scheme should also be evaluated using other metrics, such as time complexity and energy efficiency.
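
The classification step in [78] is described only by the names of its classifiers. As a self-contained illustration of one of them, the Python sketch below implements a plain k-nearest-neighbours (KNN) majority vote together with the precision metric, TP / (TP + FP), used to report the results. The two features (an RR interval and a QRS duration, in seconds) and the tiny training set are hypothetical toy values, not data or parameters from [78].

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], str]  # (feature vector, label)

def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training samples (Euclidean)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def precision(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """Precision = TP / (TP + FP) for the positive class (0.0 if nothing is predicted positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0

# Hypothetical toy features: (RR interval [s], QRS duration [s]).
train = [
    ((0.80, 0.09), "normal"), ((0.85, 0.08), "normal"), ((0.78, 0.10), "normal"),
    ((0.50, 0.14), "abnormal"), ((0.45, 0.15), "abnormal"), ((0.55, 0.13), "abnormal"),
]
print(knn_predict(train, (0.82, 0.09)))  # normal
print(knn_predict(train, (0.48, 0.14)))  # abnormal
```

Precision alone says nothing about missed abnormal beats (false negatives) or about time and energy costs, which is why the complementary metrics mentioned above matter.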

In [79], the authors proposed an IoT architecture with five distinct but interrelated layers. The first layer is the sensing layer, which includes the sensing devices used to gather data, including but not limited to sensors, actuators and a wide range of wearable devices. The second layer is the sending layer, which is somewhat similar to the physical layer of the Open Systems Interconnection (OSI) model. Its main responsibility is to provide communication mechanisms, such as Wi-Fi, Bluetooth, ZigBee and Long Term Evolution (LTE), for sending data to the cloud. The third layer is the processing layer, which processes data based on predefined criteria and generates notifications and alerts in response. Processing occurs on devices such as smartphones, microcontrollers and microprocessors. At the fourth layer, i.e., the storage layer, the data is stored at a preferred location, such as a cloud or hosted servers. Finally, the fifth layer, also known as the mining layer, converts the information into decisions using a diverse range of data mining or ML algorithms, and suggestions and recommendations are made based on these decisions. In [80], the authors proposed a recommender system (RS), Pro-Trip, which allows users to organize activities before or during a trip. Pro-Trip collects data from patients and uses it for further recommendations to provide accurate results. The authors also proposed a food RS designed for the healthcare system. Pro-Trip is evaluated on climate and food datasets collected in real time. For the food recommendation system, the performance was evaluated with latency, energy efficiency and security in mind.
In [81], the authors proposed a novel Type-2 fuzzy ontology-aided RS designed especially for IoT-based healthcare systems. It overcomes the issues faced in monitoring patients' data and extracting the optimal values of risk factors from it. The proposed technique observes the patient and then recommends a diet with specific amounts of food and medicine. This approach evaluates the risk faced by the patient, deduces the patient's health state with the help of sensor-embedded wearable devices, and then suggests prescriptions of medicines and food. The authors combined two techniques, Type-2 fuzzy logic and fuzzy ontology, which remarkably improves the prediction rate of the recommendations. Its accuracy, recall and precision were compared with those of other ontologies, i.e., Type-1 and classical ontologies, and showed superior results. Future work could extend this approach with a Type-2 fuzzy neural network and sentiment analysis for the RS. Table 2 lists various recommendation systems for smart healthcare.
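
To illustrate what distinguishes Type-2 from Type-1 fuzzy logic, the Python sketch below evaluates an interval Type-2 fuzzy set: instead of a single membership grade, each input receives an interval bounded by a lower and an upper membership function, and the region between them (the footprint of uncertainty) represents uncertainty about the membership grade itself. The triangular shapes and the blood-pressure parameters are hypothetical illustrations, not the ontology or rule base of [81].

```python
from typing import Tuple

Tri = Tuple[float, float, float, float]  # (a, b, c, height): feet at a and c, peak at b

def triangular(x: float, a: float, b: float, c: float, height: float = 1.0) -> float:
    """Type-1 triangular membership: 0 outside (a, c), rising to `height` at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return height * (x - a) / (b - a)
    return height * (c - x) / (c - b)

def it2_membership(x: float, upper: Tri, lower: Tri) -> Tuple[float, float]:
    """Interval Type-2 membership: the [lower, upper] interval of grades at x."""
    lo = triangular(x, *lower)
    hi = triangular(x, *upper)
    return (min(lo, hi), max(lo, hi))

# Hypothetical fuzzy set "high systolic pressure" (mmHg), for illustration only.
UPPER: Tri = (130.0, 160.0, 190.0, 1.0)
LOWER: Tri = (140.0, 160.0, 180.0, 0.8)

for sbp in (135.0, 150.0, 160.0):
    print(sbp, it2_membership(sbp, UPPER, LOWER))
```

A complete Type-2 system would additionally need type reduction (for example, the Karnik–Mendel procedure) before defuzzification; that step is omitted here.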

Table 2 ML-based Recommendation Systems for Smart Healthcare

7.2 ML-based prediction system

In [82], the authors proposed an IoT framework for predicting whether the person under observation is in stress or not by monitoring his/her heart beats. The proposed framework detects the pulse waveforms using a specially designed WiFi equipped board, which forwards the data to a pre-defined server. Next, the data gathered at different time intervals are assembled and stress prediction is evaluated by applying various ML techniques such as SVM and logistic regression. Simulation results showed that precision of the proposed framework can reach up to 68%. However, its precision can be improved further using appropriate classification models. In [83], the authors proposed a smart tele-health monitoring system using speech recognition algorithms. Its design goal is to identify and predict the occurrence of Parkinson’s disease using K-mean algorithm. The proposed system is device-independent and can be employed by a variety of wearable devices. The proposed system employs an edge computing framework as the wearable devices are resource-limited. The idea behind using edge computing is to achieve distributed services by reducing the reliance on centralized infrastructure. In [84], a cloud-based IoT framework was proposed for monitoring various diseases. It forecasts the level of these diseases, i.e., from normal to severe among students. It utilizes the concept of computational science on the data collected from the students using sensors and are stored at a repository to predict severity of the disease. Furthermore, various classification algorithms are used to predict the occurrence of such diseases. The proposed approach is evaluated using various performance metrics such as specificity, sensitivity, and F-measure. Simulation results prove that in terms of accuracy, the proposed approach outperforms the traditional approaches. In [85], the authors proposed a smart e-Health Gateway at the edge of the network in Fog-assisted system architecture. 
The gateway can perform real-time data processing, data mining, and data storage, locally. Moreover, the strength of the proposed architecture is that it can enable us to solve some of the emerging and complex issues faced by the ubiquitous health-care systems, such as mobility, energy efficiency, scalability, and reliability. Practical demonstration of proposed prototype demonstrated high-level features such as Early Warning Score (EWS) of our health monitoring system. The authors in [86] proposed a three-layer architecture for storing a large amount of sensory data for earlier prediction of heart diseases. In the proposed architecture, the first layer is responsible for data collection. The second layer is concerned with the storage of large volume of sensory data at the cloud. Finally, in the third layer, a prediction model for heart diseases is developed. At this layer, “Receiver Operating Characteristic Curve (ROC) analysis is performed that identifies potential symptoms before the occurrence of heart disease. In [87], the authors discussed the application of IoT in healthcare. They presented a novel ML-based model for disease classification in a healthcare monitoring system. Based on the extensive simulations, it was concluded that the proposed framework can extensively enhance the performance and detects diseases with higher accuracy. In [88], the authors proposed a Hierarchical Computing Architecture (HiCH) for the IoT healthcare sector. They proposed and implemented a system, similar to IBM’s MAPE-K model REF for the arrhythmia detection. The proposed system has three distant but interrelated layers of fog computing. They are: sensor devices layer, edge computing devices layer, and cloud computing layer. The responsibility of the first layer, i.e., sensor devices layer, is to sense and monitor the phenomenon of interest. Next, edge computing devices layer is responsible for making a local decision as well as system management. 
Finally, computationally heavy training procedures are performed at the cloud layer. Simulation results show that the proposed system outperforms traditional systems in terms of response time, bandwidth utilization, and memory utilization. However, its accuracy is lower and may be improved in future work. In [89], the authors proposed a low-cost remote monitoring system that detects various fatal diseases such as cardiovascular diseases, diabetes mellitus, hypertension, and different chronic degenerative medical conditions. The system detects these diseases by measuring Heart Rate Variability (HRV), i.e., the variation in the time interval between consecutive heartbeats. The data from patients are captured using a Zigbee pulse sensor and transmitted to the application server using Message Queuing Telemetry Transport (MQTT), a lightweight messaging protocol designed for IoT. At the application server, the HRV data are further analyzed and visualized to reveal any abnormalities so that timely action can be taken. Similarly, in [90], a novel intelligent system called the neuro-fuzzy temporal intelligent medical diagnosis system was proposed. It uses fuzzy rules to classify and efficiently predict various fatal diseases. Table 3 summarizes various ML-based prediction systems for smart healthcare.
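The HRV analysis performed at the application server in [89] can be illustrated with two widely used time-domain metrics: SDNN (the standard deviation of the intervals between beats) and RMSSD (the root mean square of successive differences). The sketch below is not the implementation of [89]; it assumes that RR intervals in milliseconds have already been extracted from the pulse sensor signal, and the alert threshold in the hypothetical helper `flag_low_hrv` is a placeholder, not a clinical cut-off.

```python
import math
import statistics


def hrv_metrics(rr_ms):
    """Compute time-domain HRV metrics from RR intervals given in milliseconds."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # SDNN: sample standard deviation of the beat-to-beat intervals.
    sdnn = statistics.stdev(rr_ms)
    # RMSSD: root mean square of the differences between successive intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Mean heart rate in beats per minute (60 000 ms per minute).
    mean_hr = 60000.0 / statistics.mean(rr_ms)
    return {"sdnn": sdnn, "rmssd": rmssd, "mean_hr": mean_hr}


def flag_low_hrv(metrics, rmssd_threshold_ms=20.0):
    # Hypothetical alert rule with an illustrative threshold, not a clinical value.
    return metrics["rmssd"] < rmssd_threshold_ms


m = hrv_metrics([800, 810, 790, 800])
print(m, flag_low_hrv(m))
```

Keeping the metric computation separate from the alert rule mirrors the split described in [89] between capturing data and analyzing it at the server, and lets a real deployment substitute a clinically validated decision rule.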

Table 3 ML-based Prediction Systems for Smart Healthcare

7.3 ML-based data aggregation

The authors in [91] proposed a real-time data compression technique known as Adaptive Learner Vector Quantization (ALVQ). A unique feature of ALVQ is that it works without prior knowledge of the underlying topology. Initially, data are aggregated at the sensor level by wearables to ensure that only non-correlated data are forwarded towards the cluster head (CH). This reduces not only the computational cost at the CH but also the communication cost in the network. However, the proposed technique does not devise any aggregation mechanism at the CH level. Moreover, its applicability should be evaluated for critical applications that require an acceptable level of accuracy. In [92], the authors presented a cluster-based, self-organizing data aggregation framework for healthcare facilitation. A self-organizing algorithm is employed to classify the aggregated healthcare data. The proposed scheme maps the high-dimensional data space into a low-dimensional space, which lowers the amount of data transmitted in the network and extends the network lifetime. Moreover, it also enhances the quality of the aggregated data. In [93], the authors eliminated highly correlated data using big data techniques. The Hadoop framework was used to extract critical information from data captured by sensors attached to the patients. Once redundancy is eliminated, the refined data are forwarded to physicians in real time for timely action. As a result, the services provided by healthcare professionals are significantly improved. Eliminating redundancy also reduces the amount of data transmitted across the network, which in turn improves responsiveness, accuracy, QoS, energy conservation, and network lifetime. In [94], a novel framework known as the “health informatics processing pipeline” for big data analytics in IoT was proposed. The proposed framework uses various techniques to extract useful patterns from the raw gathered data.
The main features of the proposed framework include data capturing, storage, analysis, and searching. The framework eliminates correlated data and transmits only highly refined and useful features, which enable it to make decisions through a decision support system that employs various ML techniques. Table 4 summarizes various ML-based data aggregation schemes for smart healthcare.
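The idea of eliminating highly correlated data before forwarding it, as in [93] and [94], can be sketched with a simple greedy filter: a sensor stream is kept only if its absolute Pearson correlation with every stream already kept stays below a threshold. This is a minimal illustration, not the Hadoop-based pipeline of [93]; the stream names and the 0.95 threshold are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # Correlation is undefined for a constant stream; treat it as uncorrelated here.
        return 0.0
    return cov / (sx * sy)


def drop_correlated(streams, threshold=0.95):
    """Greedily keep streams whose |r| with every already-kept stream is <= threshold."""
    kept = {}
    for name, values in streams.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical readings: hr_b duplicates hr_a (scaled), so it carries no new information.
streams = {
    "hr_a": [70, 72, 75, 80],
    "hr_b": [140, 144, 150, 160],
    "spo2": [98, 97, 99, 96],
}
print(list(drop_correlated(streams)))  # only the non-redundant streams are forwarded
```

The greedy order matters: whichever of two redundant streams appears first is the one kept, so a real scheme would choose that order deliberately (for example, by sensor reliability).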

Table 4 ML-based Data Aggregation for Smart Healthcare

7.4 ML-based living assistance

IoT-based solutions are assisting the elderly population in the form of personalized, preventive, and collaborative care. In this regard, the authors in [95] presented IoT-based living assistance for the aged population. The proposed system monitors and stores the vital information of patients using a cloud-connected wristband. In critical situations, an alarm is raised that informs healthcare professionals so that they can take the right decisions and actions. The proposed solution is both energy- and cost-efficient. Likewise, in [96], the authors proposed a framework that monitors the medicine intake of patients. Its key attribute is that it tracks medicine intake from the patient's history, including missed doses. In case of a medication discrepancy, such as a missed dose or an overdose, an alarm is generated to alert both the patient and the medical staff. Moreover, in [97], the authors designed a monitoring system for critically ill patients in the intensive care unit (ICU). The proposed system informs and assists all stakeholders in real time whenever abrupt changes occur in pre-defined conditions, so that timely action can be taken. In [98], the authors proposed a novel monitoring system based on patient movement. The system provides emergency services by evaluating a patient's emergency situation from his/her movements, and it is particularly suited to in-home patient monitoring. In [99], a system that detects human presence without using cameras or motion detectors was proposed. The system first collects interaction data, i.e., reading or writing activity across a diverse range of devices. Next, human presence is detected using various ML classification algorithms such as the C4.5 decision tree, linear SVC, and random forest. The system was trained and tested using a dataset gathered over a period of 3 days from 900 users.
Simulation results show that the precision of the proposed approach varies from 50 to 99% depending on the classification algorithm. However, it needs to be tested in real-world scenarios under various settings to study its behaviour. In [100], the authors proposed an inexpensive healthcare monitoring system for patients. The model is based on lightweight, sensor-enabled wearable devices that sense, analyze, and share real-time healthcare data from the patients. An Arduino-based wearable device with body sensor networks is employed for data collection. Moreover, LabVIEW is integrated with the system to facilitate the remote monitoring of home-bound patients. The proposed system eliminates many deficiencies of manual systems. Table 5 summarizes various ML-based assisted living approaches for smart healthcare.
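The medication-tracking behaviour described for [96] (detecting missed doses and over-dosage and raising an alarm) can be illustrated with a minimal sketch that matches recorded intake events against a dose schedule. The function name, the one-hour tolerance window, and the alert labels are hypothetical; [96] does not specify how its matching works.

```python
from datetime import datetime, timedelta


def check_adherence(scheduled, taken, window=timedelta(hours=1)):
    """Match each scheduled dose to at most one intake within +/- window.

    Returns (kind, timestamp) alerts: 'missed' for a scheduled dose with no
    matching intake, 'extra' for an intake that matches no scheduled dose
    (a possible over-dosage).
    """
    unmatched = sorted(taken)
    alerts = []
    for dose in sorted(scheduled):
        # Greedy: take the earliest unmatched intake inside the tolerance window.
        match = next((t for t in unmatched if abs(t - dose) <= window), None)
        if match is None:
            alerts.append(("missed", dose))
        else:
            unmatched.remove(match)
    alerts.extend(("extra", t) for t in unmatched)
    return alerts


day = datetime(2024, 1, 1)
schedule = [day.replace(hour=8), day.replace(hour=20)]
intakes = [day.replace(hour=8, minute=15), day.replace(hour=9)]
for kind, when in check_adherence(schedule, intakes):
    print(kind, when.isoformat())  # each alert would be routed to patient and staff
```

Greedy matching is adequate when doses are spaced well apart relative to the window; closely spaced schedules would need an optimal assignment instead.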

Table 5 ML-based Assisted Living Techniques for Smart Healthcare

7.5 ML-based secured analysis

It is imperative to ensure the security and privacy of healthcare data due to its sensitive nature. In this regard, the authors in [101] presented an online healthcare monitoring system. The proposed system collects and analyzes health-related data from patients using sensors and medical devices, with the aim of preventing life-threatening situations. The authors combined techniques such as watermarking and signal enhancement to improve security and performance while accounting for clinical errors. The authors in [102] proposed a collaborative and intelligent security model for IoT-based healthcare environments. Its main objective is to reduce the security risks posed to a diverse range of IoT-enabled healthcare solutions. The proposed system is designed with particular emphasis on recent advances in this field, and various ML techniques are used for the secure classification of patient data. Likewise, the authors in [103] presented a WBSN-enabled IoT healthcare solution. The proposed approach monitors the patient using a wireless body sensor network that consists of tiny, lightweight sensor nodes, and it uses various ML techniques to enhance security by protecting the WBSN from intruders and various attacks. In [104], the authors proposed a novel mobile cloud computing framework for big data analytics. Its main features are that it offers availability and interoperability of healthcare data, which can be shared among all interested parties. Various ML and DL techniques were used to classify and test the data gathered from patients. Although the privacy and security of healthcare data were thoroughly discussed, they were not evaluated in practice. Table 6 summarizes various ML-based secured analysis approaches for smart healthcare.
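The watermarking mentioned for [101] is not detailed there, so the sketch below uses a generic, well-known technique purely for illustration: least-significant-bit (LSB) embedding, which hides watermark bits in integer-valued sensor samples while changing each sample by at most one quantization step. It assumes integer ADC samples and watermark bits given as 0/1; it is not claimed to be the scheme of [101], and LSB marks are fragile and provide no cryptographic protection on their own.

```python
def embed_lsb(samples, bits):
    """Overwrite the least significant bit of the first len(bits) samples."""
    if len(bits) > len(samples):
        raise ValueError("watermark longer than signal")
    out = list(samples)
    for i, bit in enumerate(bits):
        # Clear the LSB, then set it to the watermark bit (0 or 1).
        out[i] = (out[i] & ~1) | (bit & 1)
    return out


def extract_lsb(samples, n_bits):
    """Read back the first n_bits least significant bits."""
    return [s & 1 for s in samples[:n_bits]]


def verify_lsb(samples, bits):
    """True if the embedded watermark is intact (a fragile integrity check)."""
    return extract_lsb(samples, len(bits)) == list(bits)


signal = [512, 513, 600, 601, 700]   # hypothetical integer ADC samples
watermark = [1, 0, 1, 1]             # hypothetical watermark bits
marked = embed_lsb(signal, watermark)
print(marked, verify_lsb(marked, watermark))
```

Because the distortion is bounded by one quantization step, such a mark can coexist with clinical analysis of the signal, which is the kind of trade-off between security and clinical error that [101] accounts for.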

Table 6 ML-based Secured Analysis for Smart Healthcare

8 Challenges and open research issues

In this section, we provide insight into various challenges related to ML techniques for big data analytics in the IoT healthcare domain, as shown in Fig. 5. We also identify research gaps that researchers can address in the future.

Fig. 5

Challenges faced by ML techniques for big data analytics for IoT healthcare

8.1 Resource scarcity

In IoT, most devices, such as sensors, smartphones, microcontrollers, actuators, RFIDs, and gateways, have limited energy and low computational and processing power [105,106,107]. Moreover, the data generated by these densely deployed, resource-starved devices contain similar and redundant patterns. Transmitting such correlated data across the network results in high energy consumption, lower QoS, and lower throughput [108, 109]. The resource limitation issue has been resolved to some extent by integrating IoT with the cloud computing paradigm; however, this increases cost and complexity. Besides, other resource management issues, such as resource discovery, modeling, provisioning, scheduling, estimation, and monitoring, remain a major concern due to the unique nature of IoT networks [110]. Furthermore, the optimization of resource allocation techniques is an area to be explored further in this context. Because most existing data aggregation techniques are not energy-efficient, it is essential to design novel, lightweight, and energy-efficient ML-based data aggregation techniques. Moreover, novel schemes should be devised that distribute tasks among various IoT components in a way that not only respects the resource constraints of these networks but also offers an acceptable level of accuracy [105].
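To make the energy cost of transmitting redundant data concrete, the sketch below uses the first-order radio model that is common in the WSN literature (introduced alongside LEACH by Heinzelman et al.): transmitting k bits over distance d costs E_elec·k + ε_amp·k·d², and receiving costs E_elec·k. The parameter values (E_elec = 50 nJ/bit, ε_amp = 100 pJ/bit/m²), the packet size, the cluster size, and the distance are assumptions for illustration, not measurements from the cited works; the energy spent fusing data at the cluster head is ignored for simplicity.

```python
E_ELEC = 50e-9     # J/bit, transceiver electronics energy (assumed model value)
EPS_AMP = 100e-12  # J/bit/m^2, transmit amplifier energy (assumed model value)


def tx_energy(bits, distance_m):
    """First-order radio model: energy to transmit `bits` over `distance_m`."""
    return E_ELEC * bits + EPS_AMP * bits * distance_m ** 2


def rx_energy(bits):
    """First-order radio model: energy to receive `bits`."""
    return E_ELEC * bits


def cluster_head_energy(n_members, bits, distance_m, aggregate):
    """Energy spent by a cluster head that receives one packet per member and
    forwards either every packet (aggregate=False) or one fused packet
    (aggregate=True) to a sink `distance_m` away. Fusion cost is ignored."""
    receive = n_members * rx_energy(bits)
    packets_out = 1 if aggregate else n_members
    return receive + packets_out * tx_energy(bits, distance_m)


# Hypothetical setting: 10 wearables, 2000-bit packets, sink 100 m away.
raw = cluster_head_energy(10, 2000, 100, aggregate=False)
fused = cluster_head_energy(10, 2000, 100, aggregate=True)
print(f"raw: {raw:.4f} J, aggregated: {fused:.4f} J")
```

Under these assumptions the cluster head spends about 0.022 J when forwarding every packet versus about 0.0031 J when forwarding one fused packet, which illustrates why in-network aggregation matters for resource-starved devices.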

8.2 Security and privacy

The application of IoT in the healthcare domain provides personalized facilities, i.e., customized and rapid access to healthcare that was unimaginable earlier. In these applications, technology and healthcare devices work together to offer a wide range of services. According to [111], almost 40% of IoT-related technology is forecast to be health-related in the near future, more than any other market segment, with a market value of USD 136.8 billion by 2021. Such developments are revolutionary; however, they should be adopted carefully because of the security, privacy, and sensitivity challenges associated with health-related data [112,113,114]. Upstream transmission of compromised data not only undermines the underlying data aggregation technique but also degrades its performance [115]. It exposes the underlying networks to a wide range of security attacks, such as DoS, eavesdropping, Sybil, sinkhole, and sleep deprivation attacks. These threats remain a challenge due to the rapid expansion of the field and the ever-increasing number and complexity of emerging software and hardware vulnerabilities. Besides, healthcare data containing sensitive information such as personal details, family history, electronic medical records, and genomic data must be kept confidential. It was predicted that 72% of malicious traffic would target healthcare data [116]. It is thus imperative to protect such data from hackers by enforcing privacy and security, both physically and virtually [117]. Other challenges include weak security as well as misconfigured devices and network settings. Moreover, data from this varied range of devices are mostly heterogeneous in nature and usually managed by third parties, which makes their governance, security, and privacy challenging [118, 119]. Furthermore, many existing security techniques are not feasible due to the resource-constrained nature of IoT devices.
Designing lightweight and energy-efficient data aggregation techniques that ensure the confidentiality, security, and privacy of the data is a promising domain for further research.
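One comparatively lightweight, symmetric-key measure against the upstream transmission of tampered data is to attach a message authentication code to each reading, so that an aggregator can discard messages that fail verification before fusing them. The sketch below uses HMAC-SHA256 from the Python standard library. It provides integrity and authenticity only, not confidentiality; key distribution, encryption, and replay protection (for which the hypothetical `seq` field could be used) are assumed to be handled elsewhere, and the field names and key are illustrative.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    # Canonical JSON so that sender and receiver authenticate identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign_reading(key: bytes, reading: dict) -> dict:
    """Attach an HMAC-SHA256 tag to a sensor reading."""
    tag = hmac.new(key, _canonical(reading), hashlib.sha256).hexdigest()
    return {"payload": reading, "tag": tag}


def verify_reading(key: bytes, message: dict) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, _canonical(message["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


key = b"hypothetical-shared-key"
msg = sign_reading(key, {"device": "wb-01", "seq": 7, "hr": 72})
print(verify_reading(key, msg))  # an aggregator would drop messages that fail this check
```

Symmetric MACs avoid the computational cost of public-key signatures on constrained devices, at the price of requiring a shared secret per device or group.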

8.3 Interoperability

Recently, we have witnessed rapid development in both hardware and software, but the actual challenge is the lack of globally accepted standards. As a result, healthcare IoT devices pose serious interoperability challenges. Designers must not only focus on development but also strive for interoperability among all aspects of IoT eHealth, such as smart wearables, body area sensors, and advanced pervasive healthcare, to promote healthier lifestyles [120, 121]. The benefits of interoperable devices include increased throughput, fewer unplanned outages, and reduced maintenance costs. Semantic interoperability of clinical information is an important area for future research.
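At the data level, one common way to cope with heterogeneous devices is an adapter layer that maps each vendor's payload, with its own field names and units, onto a single common record. The sketch below illustrates the idea only: both vendor formats, the target schema, and all names are hypothetical. In practice, a shared standard (for example, HL7 FHIR) would define the target schema, and semantic interoperability would additionally require agreed clinical vocabularies.

```python
def from_vendor_a(msg):
    # Hypothetical vendor A format: {"pid": ..., "tempF": ...} in degrees Fahrenheit.
    celsius = (msg["tempF"] - 32) * 5 / 9
    return {"patient_id": msg["pid"], "metric": "body_temperature",
            "value": round(celsius, 2), "unit": "Cel"}  # "Cel": UCUM code for degrees Celsius


def from_vendor_b(msg):
    # Hypothetical vendor B format: {"patient": ..., "body_temp_c": ...} already in Celsius.
    return {"patient_id": msg["patient"], "metric": "body_temperature",
            "value": msg["body_temp_c"], "unit": "Cel"}


# Registry of adapters keyed by device source; new vendors plug in here.
ADAPTERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}


def normalize(source, msg):
    """Convert a vendor-specific payload into the common record format."""
    try:
        adapter = ADAPTERS[source]
    except KeyError:
        raise ValueError(f"no adapter registered for {source!r}") from None
    return adapter(msg)


print(normalize("vendor_a", {"pid": "p1", "tempF": 98.6}))
print(normalize("vendor_b", {"patient": "p1", "body_temp_c": 37.0}))
```

Downstream analytics then consume only the common record, so adding a device type means writing one adapter rather than changing every consumer.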

8.4 Energy management

Energy management is another challenging aspect of IoT healthcare applications. Wearables and sensors attached to the human body are usually energy-constrained, as they are equipped with limited energy supplies [122]. Frequently changing the batteries of these sensors and devices is cumbersome and sometimes impossible. Additional healthcare staff, at additional cost, would be required to constantly look after these devices and replace batteries whenever their energy falls below certain thresholds. In dynamic environments, this can lead to fatigue and mismanagement. Energy efficiency therefore becomes an integral factor that determines the success of the underlying applications [22]. To improve energy conservation, it is necessary to design low-power sensors that do not require frequent battery changes while still providing a reliable power supply. Moreover, energy optimization algorithms with smarter energy management techniques have received little attention and therefore need serious consideration from researchers in the IoT healthcare sector [123, 124]. Another area of research is the optimization of routing approaches that exploit the correlation among the captured data before it reaches its final destination, i.e., data aggregation techniques. These techniques eliminate redundancy, which lowers the communication cost, conserves energy, and enhances the network lifetime.
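A simple way for a battery-powered sensor to exploit redundancy in its own readings is send-on-delta (deadband) reporting: a reading is transmitted only when it differs from the last transmitted value by more than a threshold, and the receiver holds the last value in between, so reconstruction error is bounded by that threshold. The sketch below is a generic illustration, not a technique from the cited works; the heart-rate values and the threshold of 2 are hypothetical, and in a clinical setting the threshold would have to be chosen so that relevant changes are never suppressed.

```python
def send_on_delta(readings, delta):
    """Return the (index, value) pairs a sensor would transmit.

    The first reading is always sent; afterwards a reading is sent only if it
    differs from the last *transmitted* value by more than `delta`.
    """
    sent = []
    last = None
    for i, value in enumerate(readings):
        if last is None or abs(value - last) > delta:
            sent.append((i, value))
            last = value
    return sent


heart_rate = [72, 72, 73, 72, 80, 81, 79, 72]  # hypothetical readings (bpm)
tx = send_on_delta(heart_rate, delta=2)
print(tx, f"{len(tx)}/{len(heart_rate)} readings transmitted")
```

Since radio transmission usually dominates a node's energy budget, cutting the number of transmitted readings directly extends battery life.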

8.5 Big data analytics

Another challenging aspect of IoT healthcare is big data analytics, which deals with large-scale unstructured data. Recently, we have witnessed significant developments in hardware, software, and a diverse range of innovative IoT applications. Moreover, the forecast growth of IoT is even more pronounced, with a large number of interconnected data sources and platforms and a global infrastructure for information and communication. As a result, a huge amount of data is produced. This large volume of mostly redundant data is transmitted across the network for analysis and decision making, which can adversely affect network performance. This raises many challenging issues that must be dealt with the utmost care [125]. In this context, it would be interesting to see how ML and DL-enabled techniques can gain insight into this huge volume of data for better decision making and optimized operations [126]. It is imperative to design novel big data analytics tools and techniques that perform analysis and extract the required information. Innovative noise removal techniques are needed to enhance the data signal, improve the quality of aggregated data, and conserve the overall energy of the network [127]. More importantly, in healthcare applications, most devices perform real-time monitoring and analysis. It would be interesting to see novel ML techniques in the future that apply real-time analytics by monitoring current conditions and responding accordingly. Novel data aggregation techniques with outlier reduction should be devised with improved security, better QoS, and lower computational complexity. Furthermore, data aggregation is strongly related to the underlying network topology, and the performance of these techniques is greatly affected by it [128,129,130,131].
In this regard, clustering tends to be more effective in static networks, where the network configuration remains unchanged for long periods. However, clustering-based techniques need to be studied in dynamic as well as heterogeneous environments [132,133,134,135,136]. Finding optimal locations for these devices should be investigated further so that IoT can cater for a wide range of emerging healthcare applications in the years ahead.
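The outlier reduction called for in big data aggregation can be illustrated with the Hampel filter, a simple and well-known robust technique: each sample is compared with the median of a sliding window, and if it deviates by more than a few robust standard deviations (estimated from the median absolute deviation, MAD) it is replaced by that median. This is a generic sketch, not a method from the cited works; the window size, the threshold of 3, and the sample values are assumptions.

```python
import statistics


def hampel(values, half_window=3, n_sigmas=3.0):
    """Replace samples that deviate from the local median by more than
    n_sigmas robust standard deviations (1.4826 * MAD) with that median."""
    k = 1.4826  # scales the MAD to the standard deviation for Gaussian noise
    n = len(values)
    out = list(values)
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        window = values[lo:hi]
        med = statistics.median(window)
        mad = statistics.median([abs(x - med) for x in window])
        # A perfectly flat window (MAD == 0) is left unchanged; this is a known
        # limitation of this simple version.
        if mad > 0 and abs(values[i] - med) > n_sigmas * k * mad:
            out[i] = med
    return out


readings = [72, 73, 71, 200, 72, 74, 73]  # hypothetical heart rate with one spike
print(hampel(readings))
```

Because the median and MAD are insensitive to isolated spikes, the filter removes sensor glitches without smoothing away genuine, sustained changes, and it needs only a small buffer, which suits resource-constrained aggregators.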

9 Limitations and future work

In this paper, we have presented a detailed survey of big data analytics in the IoT healthcare domain. We thoroughly studied the literature and selected the most relevant and up-to-date surveys to identify the research gap. Furthermore, we have provided a comprehensive, state-of-the-art review of ML-based techniques for big data analytics in IoT smart health, along with a detailed discussion of their strengths and weaknesses. This gives readers insight into the domain and enables them to start their research by selecting a topic of their choice from the available pool of techniques. We also discussed various research issues and challenges to motivate researchers to explore them further. Moreover, various issues arising from the emerging and cross-domain architectures of IoT, i.e., the Internet of Nano-Things (IoNT) and the Web of Things (WoT), were thoroughly discussed with the aim of making the universal IoT vision a reality: a vision that successfully integrates this technology into almost all domains and that will hopefully enrich our daily lives in the years to come.