
2.1 Introduction

Big Data is now a widely discussed concept [1]. Data are generated in huge volumes and at very high speed, and the amount is increasing exponentially [2, 3]. Big Data is not a novel concept; however, researchers continuously change the way it is defined [4]. In the past it was identified by three characteristics (volume, variety, and velocity), whereas recent work defines it by fifteen characteristics [5,6,7]. Big Data in healthcare is an active research topic [7]. The size of healthcare Big Data was about 150 exabytes in 2011, and it grows at a rate of between 1.2 and 2.4 exabytes annually [8]. Big Data are complex data that cannot be managed and analyzed using conventional software and hardware [9]. Researchers all over the world have difficulty handling multidimensional healthcare data [10], so improved algorithms and approaches are required to obtain the best outcomes, and novel techniques and algorithms for handling Big Data continue to be developed [2]. Big Data analytics (BDA) plays a significant role in extracting valuable information from huge volumes of data [11, 12].

Healthcare is a data-rich sector [5]. Monitoring a patient’s health is useful for detecting various types of diseases [13]. It is predicted that the number of medical staff in 2030 will be forty-three million, which is much lower than what the world needs, so alternatives are needed [14]. One of the most useful technologies that can help the healthcare industry is the Internet of Things (IoT) [13]. IoT refers to machines that communicate and share data with each other [15]. These devices are linked, and an IP address is assigned to each device [16]. To obtain IoT data, sensors are usually deployed to collect data, which are then transmitted to a centralized server. IoT therefore generates a huge amount of data [17]. IoT is one of the Internet applications that can improve our lives, and it includes machine-to-machine (M2M) communication [15, 17].

Extracting useful information from IoT data is not an easy task, so BDA tools are needed to analyze real-time IoT data. Combining BDA with IoT makes IoT more valuable [15]. Big Data can be employed in the healthcare field to analyze and manage large amounts of health data [18]. Privacy and security are the main challenges in IoT and BDA [10, 15, 19, 20]. The purpose of this chapter is to summarize Big Data characteristics, Big Data sources, BDA-related work, the challenges of BDA in healthcare, BDA platforms and tools, the challenges of integrating BDA with IoT, and recent advances in IoT-based Big Data.

This chapter aims to provide researchers with a review of recent advances in IoT-based medical Big Data analytics.

The major contributions of this chapter are as follows:

  • It reviews Big Data platforms and tools.

  • It reviews the work that has been done on BDA in healthcare. It also highlights the significance of Big Data in healthcare.

  • It discusses the challenges of BDA in healthcare.

  • It reviews different types of sensors in healthcare monitoring systems and discusses the challenges that sensors face.

  • It discusses the challenges of integrating IoT with BDA.

  • It reviews the recent advances in IoT-based Big Data.

  • It will empower researchers to further work on BDA under IoT.

This chapter is organized as follows. Section 2.2 presents the definition of Big Data. Section 2.3 presents Big Data sources, and Sect. 2.4 presents the healthcare Big Data sources. Section 2.5 reviews Big Data platforms and tools. Section 2.6 presents BDA-related work. Section 2.7 discusses the significance of Big Data in healthcare. The challenges of BDA are discussed in Sect. 2.8. Sections 2.9, 2.10, 2.11, 2.12, and 2.13 discuss IoT, sensors, sensor challenges, the challenges of integrating IoT with BDA, and recent advances in IoT-based Big Data, respectively. Figure 2.1 depicts the structure of the chapter.

Fig. 2.1 Big Data characteristics

2.2 Big Data Definition

The first property of Big Data that comes to mind is size. However, there are other characteristics [21]. Big Data means huge data that involve structured, unstructured, and semi-structured data [11]. Most authors define Big Data by three characteristics, namely volume, velocity, and variety, which are also known as the three Vs [11]. Volume refers to the large amount of data produced continuously from several sources. Velocity refers to the speed at which data are produced and must be processed in order to extract useful information [9, 11]. Variety refers to data being produced from different sources and in many formats. However, some authors and scientists have added more characteristics to give a better definition [11]. IBM introduced the fourth V, which is veracity. Veracity indicates that some sources of data are unreliable [21]. It also refers to data quality and the degree of certainty [9]. Other authors have defined Big Data by six characteristics, denoted as the six Vs, by adding value and variability. Value refers to the useful outcomes of the analysis [9]. Variability relates to the variance in the data flow rate [21]. The authors in [6] describe Big Data by fifteen characteristics (fourteen Vs and a C). Thus, Validity, Volatility, Visualization, Virality, Viscosity, Venue, Vocabulary, Vagueness, and Complexity have been added to the characteristics [6]. Complexity refers to the difficulty of organizing and analyzing Big Data due to evolving data relationships [11]. Figure 2.1 exhibits the Big Data characteristics.

2.3 Big Data Sources

There are three essential sources of Big Data: social data, machine data, and transactional data. Social data are generated from likes, tweets, comments, and video uploads. Machine data are generated by industrial equipment such as sensors. These kinds of data grow exponentially as IoT continues to grow. Transactional data are generated from daily transactions such as payment orders and storage records [22].

2.4 Healthcare Big Data Sources

  1. Physiological Data

    These data are enormous in terms of both volume and velocity:

    (a) Volume

      A variety of signals is gathered from different sources to monitor patient attributes such as blood pressure, blood glucose, and heart rate.

    (b) Velocity

      Data are collected rapidly. In the healthcare sector, data are generated at high speed [23]. The growing volume of data resulting from continuous monitoring needs to be processed in real time to support appropriate decisions. Effective approaches are needed to analyze and process the gathered signals and deliver usable data to healthcare experts and other related partners [24].

  2. Electronic Medical Records/Electronic Health Records

    An electronic medical record (EMR) is a digital record that contains information on the patient’s medical activity, and it is usually used to make treatment decisions [1]. Electronic health records (EHRs), the digitized healthcare data of a patient, are the most important source of Big Data [24, 25]. EHRs are gathered from and exchanged by clinics and insurance agencies. EHRs store, display, and gather all details of the patient’s health, such as vital signs, immunizations, radiology and laboratory test results, past clinical history, pathology reports, allergies, active medical problems, and medications [25,26,27]. EHRs contain both structured and unstructured data [1, 25]. The difference between an EHR and an EMR is that the EHR contains the whole patient record from birth to death in one place, so it includes the patient’s records from several doctors [1, 25].

  3. Medical Images

    Medical images produce a massive amount of data that can help experts distinguish or recognize disease. They are unstructured data and include X-ray and CT scans, and they play a significant role in diagnosis. Due to the complexity, dimensionality, and noise of the gathered images, effective image processing approaches are needed to provide appropriate data for patient care [24, 28]. One of the systems that can be employed for storing medical images is the Picture Archiving and Communication System (PACS) [25].

  4. Clinical Notes

    Clinical notes contain claims, recommendations, and decisions, and they are one of the biggest unstructured sources of healthcare Big Data. Because clinical notes vary in format, dependability, completeness, and exactness, it is difficult to ensure that the healthcare provider has the right information. Effective data mining and natural language processing techniques are needed to extract useful data [24].

  5. Behavioral Data

    The sources of these data are social networks and sensors [25, 28]. Sensors collect data from patients, and diseases and their related symptoms are tracked continuously in order to provide the best treatment [28].

  6. Genomic Data

    These data describe the structure and sequence of DNA and the functions of different genes. A dedicated program is needed to store and process them. A repository called a genomic database includes human genomes and association rules related to genomes. This repository helps identify similar symptoms that affect health and their related infections [28].

2.5 Big Data Platforms and Tools

  1. Advanced Data Visualization (ADV)

    ADV handles various data types and is easy to use. It assists analysts in exploring data, it is reported to achieve very good results, and it is employed to extract hidden medical patterns from healthcare data [25, 28].

  2. Presto

    Presto is a distributed SQL query engine that is employed to analyze large amounts of clinical data. Using Presto, data analysis can be performed rapidly [25, 28].

  3. Hive

    Hive was initially developed by Facebook; however, it is currently used and developed by other organizations such as Netflix and Amazon [24]. It is employed to handle large-scale data records. Hive is slower than Presto, but it is an effective tool that can efficiently perform the kinds of tasks usually done in Excel sheets. It is commonly used to store and retrieve medical records [25, 28]. Data stored in Hive can be accessed by Presto [25]. Hive is utilized for Big Data analysis, summarization, and queries [3].

  4. Vertica

    Vertica is similar to Presto and is used to process massive volumes of clinical data that can later be used for data analytics. It is affordable, and its architecture has the advantage of simplicity. Its other advantages include low operational costs and faster healthcare reports and documentation, which assist in analyzing the health patterns of patients [25, 28].

  5. Key Performance Indicators (KPI)

    KPIs use electronic healthcare records to characterize people’s practices. Patients who are more susceptible to the clinic environment may be monitored with KPI tools to improve outcomes [28].

  6. Online Analytical Processing (OLAP)

    In OLAP, data are organized in multidimensional structures, and statistical computation can be performed rapidly. It strengthens data safety restrictions and improves quality control. It also tracks healthcare records and assists in disease diagnosis [25, 28].

  7. Online Transaction Processing (OLTP)

    OLTP is used to process patient registration. It is also employed for analyzing different patient operations [28].

  8. Apache Hadoop

    Yahoo and Facebook were among the first to use Hadoop. It is an open source data processing framework that can store and process massive volumes of data on a cluster of hardware [29]. The framework is based on Java [30]. Hadoop consists of many components, the most significant of which are the Hadoop Distributed File System (HDFS) and the MapReduce programming model. HDFS is employed for storing data, while MapReduce is utilized for processing these data [29]. Hadoop has many benefits, such as high scalability and flexibility, cost-effectiveness, and reliability in managing and processing large volumes of structured and unstructured data [31].

  9. The Hadoop Distributed File System (HDFS)

    HDFS improves the performance of clinical data analytics by partitioning massive data sets into smaller ones, which are then distributed across the whole system. It assists in diagnosis [25, 28].

  10. Cassandra File System (CFS)

    CFS is very similar to HDFS. It is employed for analytical processes, and it is fault tolerant [28].

  11. MapReduce System

    This system handles large amounts of data. It splits a task into subtasks and gathers their outputs. It efficiently incorporates different operational computations into the system and monitors each server on which a task is being performed. It can perform parallel tasks effectively [28], and it is cost-effective [3].

  12. HBase

    HBase is a column-oriented database management system that runs on top of HDFS. It is suitable for processing real-time data [32]. It is used to gather and analyze billions of rows in a short time [3].

  13. Cloud Computing

    Cloud computing has a great impact on the healthcare field. It makes healthcare more valuable by reducing costs and by increasing productivity and data analysis capacity. It can also offer strong security [25].

  14. 1010data

    1010data contains a columnar database, and it is commonly used to handle semi-structured data such as IoT data. It is an advanced analytic tool that can visualize, report, and integrate data, and it provides optimization and statistical analysis services. In addition, it supports large-scale infrastructure. The drawback of this tool is its inefficiency in extracting and transforming data [29].

  15. Cloudera Data Hub

    Cloudera Data Hub is a Hadoop-based platform that is used to process and analyze massive IoT data. To obtain more reliable service, excellent performance, and data access control, it integrates Cloudera Manager, Navigator, and its backup and recovery components [29].

  16. Infobright

    It is employed to solve data management and analytic issues. About fifty terabytes of data can be analyzed via this tool. It is appropriate for IoT data. This tool is commonly employed with Hadoop [29].

  17. Hortonworks

    Hortonworks builds a massive IoT data analytics and management framework based on Hadoop. The Hortonworks Data Platform (HDP) is a free open source software distribution that concentrates on the enhancement of Hive. However, with its HDP plugin, it cannot decrease the number of hosts per node group in the produced cluster [29].

  18. HP-HAVEn

    HAVEn (Hadoop, Autonomy, Vertica, Enterprise Security, and n applications) was presented by HP. It is a novel large-scale IoT data platform architecture for a large number of HP systems, and it can be employed with any number of applications. HP offers reference hardware configurations for the major distributors of the Hadoop software. Autonomy’s IDOL software offers search and exploration services for unstructured data [29].

  19. MongoDB

    MongoDB can be employed as a file system. It can manage data that change frequently, and it can be utilized for unstructured or semi-structured data [24].

  20. Apache Spark

    Apache Spark is an open source framework that is utilized to process large amounts of data. It outperforms Hadoop MapReduce in speed and simplicity. It supports real-time processing and machine learning algorithms [3, 7, 33].

  21. Apache Mahout

    Apache Mahout is highly scalable, and it supports machine learning techniques for smart data analysis applications [33].
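
As a hedged illustration of the multidimensional organization attributed to OLAP in the list above, the following minimal Python sketch aggregates a cost measure along chosen dimensions. It does not use any OLAP product; the records, the field names (`ward`, `year`, `diagnosis`, `cost`), and the function `roll_up` are hypothetical and are introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical fact records: each row stands for one hospital visit.
VISITS = [
    {"ward": "cardiology", "year": 2019, "diagnosis": "arrhythmia", "cost": 1200.0},
    {"ward": "cardiology", "year": 2020, "diagnosis": "arrhythmia", "cost": 900.0},
    {"ward": "oncology", "year": 2020, "diagnosis": "lymphoma", "cost": 4000.0},
    {"ward": "cardiology", "year": 2020, "diagnosis": "heart failure", "cost": 2500.0},
]


def roll_up(rows, dimensions, measure):
    """Sum `measure` for every combination of values of the chosen dimensions."""
    cube = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)  # one cell of the cube
        cube[key] += row[measure]
    return dict(cube)


# Two views of the same data at different levels of detail.
by_ward_year = roll_up(VISITS, ("ward", "year"), "cost")
by_ward = roll_up(VISITS, ("ward",), "cost")
```

Dropping a dimension from the key, as in `by_ward`, corresponds to rolling the data up to a coarser level, which is the kind of rapid summary computation the OLAP entry refers to.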

2.6 Big Data Analytics–Related Work

A large volume of data is generated continuously, so advanced data analysis is needed to understand the data and extract valuable information. BDA faces many challenges, such as the complexity of the data, which are heterogeneous. BDA is an active research field. There are various analytical techniques, including data mining, visualization, statistical analysis, and machine learning [11, 34]. Data mining can help in diagnosing diseases and providing efficient treatments [35].

BDA is capable of handling huge amounts of data efficiently. The healthcare field produces a massive amount of data, which may be structured or unstructured, and this makes processing them a challenging task. In [36], EMRs generated by many medical devices and apps were loaded into MongoDB via the Hadoop framework. As a result, a better understanding of the data was achieved, and decisions could be made quickly [36].

Heterogeneous data are generated in the healthcare sector. Without BDA, these data are useless [37]. In [37], a survey on BDA in healthcare is presented. It was found that Hadoop can perform BDA effectively, so that urgent cases can be predicted.

BDA can support doctors’ decisions [38]. In [38], the authors present a review on BDA in healthcare that covers BDA platforms, algorithms, and challenges.

In [39], a Big Data framework for healthcare systems was proposed. These systems can provide services based on the analysis of vital signs such as ECG. Algorithms for extracting feature values from raw data were added to the Hadoop platform because Hadoop is unable to handle unstructured bio-signals.

In [40], the authors present a review on the use of BDA in heart attack prediction. Privacy concerns and challenges of BDA were also discussed. It was found that BDA can be utilized to predict and prevent heart attacks effectively.

In [41], the authors present a system for gathering EHRs using the Hadoop framework. These EHRs are stored in HBase, the database employed by the Hadoop ecosystem. MapReduce functions are employed to split the data into parts, and each part is mapped to a specific node in the cluster for processing. Medical data can be updated, and statistics can be viewed.
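
The split, map, and reduce steps described for [41] can be sketched in a few lines of plain Python. This is an illustrative assumption, not the implementation of [41] and not the Hadoop API: the chunks are processed sequentially in a single process, whereas a real cluster would assign each chunk to a different node, and the record fields and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical vital-sign records; the field names are illustrative only.
RECORDS = [
    {"patient_id": "p1", "heart_rate": 72},
    {"patient_id": "p2", "heart_rate": 95},
    {"patient_id": "p1", "heart_rate": 88},
    {"patient_id": "p2", "heart_rate": 90},
]


def split_into_chunks(records, n_chunks):
    """Split the input into parts; on a cluster each part would go to one node."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def map_phase(chunk):
    """Emit (key, value) pairs, here (patient_id, heart_rate)."""
    return [(r["patient_id"], r["heart_rate"]) for r in chunk]


def shuffle(mapped_chunks):
    """Group all emitted values by key across the chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce each group to a single value, here the peak heart rate."""
    return {key: max(values) for key, values in groups.items()}


def run_job(records, n_chunks=2):
    chunks = split_into_chunks(records, n_chunks)
    mapped = [map_phase(chunk) for chunk in chunks]  # sequential stand-in for parallel nodes
    return reduce_phase(shuffle(mapped))


peak_rates = run_job(RECORDS)
```

The result does not depend on how many chunks the input is split into, which is the property that allows the map step to be distributed across nodes.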

The healthcare sector produces a large amount of data at high speed. This huge volume of data requires novel BDA frameworks, because conventional machine learning tools cannot be applied directly. In [42], a BDA framework was proposed to develop a risk adjustment model of patient expenditures. A random forest regression algorithm was employed to enhance the accuracy of the predictive model. The paper demonstrates the efficacy of predictive analytics using the random forest algorithm.

BDA and deep learning are two important techniques, as many organizations have been collecting huge amounts of data [43]. The paper in [43] discusses the significance of deep learning in BDA and its ability to extract useful information from huge volumes of data.

In [44], the authors argue that deep learning plays a significant role in predictive analytics and that it can be used for analyzing medical images and diagnosing diseases.

In [45], a voice pathology detection system using deep learning in a mobile healthcare framework was proposed. A convolutional neural network was used, and the obtained accuracy was 97.5%.

Measuring a hospital’s performance is important but not easy. BDA can be used to measure the performance of hospitals in order to enhance the quality of healthcare. In [46], the performance of US hospitals was measured using machine learning.

In [47], a BDA framework for analyzing ten billion health records was proposed. Building the BDA platform required changes to the configuration of MapReduce and to the indexing of HBase. This work showed that the presented BDA process and configuration fulfilled the security requirements and that the performance was satisfactory.

In [48], BDA techniques in healthcare, applications, and challenges have been reviewed. It was found that BDA increases the treatment efficiency. In addition, it can be used to predict diseases. Also, deep learning can provide advanced BDA in the area of medical images.

Health forecasting is very important. Many researchers have worked in this area in order to provide effective predictive analytics models that can forecast the future health situation using machine learning. In [49], a BDA model for disease forecasting using the Naive Bayes technique (BPA-NB) was proposed. This technique is appropriate for huge datasets. Heart disease data were used to train the model, and the obtained accuracy was 97.12%. Hadoop-Spark was employed as the computing tool.
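
To make the Naive Bayes idea concrete, the following is a minimal sketch of a categorical Naive Bayes classifier with add-one (Laplace) smoothing in plain Python. It is not the BPA-NB model of [49] and does not run on Hadoop-Spark; the two features (chest pain, high blood pressure), the six training samples, and the function names are hypothetical and serve only to show how class priors and per-feature likelihoods are combined.

```python
import math
from collections import Counter, defaultdict


def train(samples, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature index
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            feature_counts[y][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def predict(model, x):
    """Return the label with the highest log-posterior for feature tuple x."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, v in enumerate(x):
            # Add-one smoothing avoids zero probabilities for unseen values.
            numerator = feature_counts[label][i][v] + 1
            denominator = count + len(values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (chest_pain, high_blood_pressure) -> label.
SAMPLES = [("yes", "yes"), ("yes", "no"), ("yes", "yes"),
           ("no", "no"), ("no", "yes"), ("no", "no")]
LABELS = ["disease", "disease", "disease", "healthy", "healthy", "healthy"]

model = train(SAMPLES, LABELS)
```

Because training only requires counting, the counts can be accumulated independently over partitions of the data and then summed, which is one reason the technique scales to large datasets.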

Diagnosing cancer at an early stage is crucial for effective treatment. Each single patient record produces a huge amount of data. Various machine learning methods have been proposed to classify cancer [50]. In [50], a BDA algorithm was proposed to diagnose cancer. HDFS was used to classify EHRs and EMRs, and Natural Language Processing (NLP) was utilized for the analysis and classification of cancer patient data.

Data mining is an approach for exploring huge amounts of data and extracting hidden patterns in order to understand medical data and prevent heart diseases. There are various data mining techniques, including Naïve Bayes, decision trees, neural networks, genetic algorithms, clustering algorithms, k-nearest neighbors (KNN), and support vector machines. In [51], data mining models for predicting heart disease are reviewed.

2.7 Significance of Big Data in Healthcare

It is important to extract valuable information from Big Data and discard the useless parts. Through BDA, Big Data in healthcare can deliver economic advantages such as cost savings in the healthcare industry. In addition, it can be employed in clinical diagnosis, medical research, and hospital management [1]. Research institutions can benefit from BDA in developing drugs. For instance, cancer data can be reprocessed to help produce new cancer medicines, and clinical trials can be improved with the help of statistical tools. Thus, BDA can contribute to effective drugs and reduce costs for patients [1, 37]. BDA helps ensure that patients get suitable treatment [23]. Doctors can also make better decisions based on these analytics, which improves patient care. BDA can assist in predicting a disease at an early stage, before it spreads, so that spreading can be prevented [37]. Using BDA, governments can monitor the quality of hospitals and take the required measures against hospitals that fall short [37]. Patients can easily store, manage, and share health data using smartphone apps and websites. With the help of BDA, diseases can be detected earlier than before, and patients can receive treatment at an early stage. Patients can monitor their health and improve their lives using information mining, and they can make the right decisions about their health, such as choosing the best diet and exercise [1, 23]. Via BDA, insurance companies can design better policies [29].

2.8 Challenges of BDA

A massive amount of data is available in the healthcare industry, so it is difficult to know which data can be employed and why. Another issue is the lack of suitable IT infrastructure. Furthermore, there is a need to employ distributed data processing instead of paper records [52]. It was predicted that healthcare Big Data would reach nearly 40 ZB by the end of 2020. Healthcare Big Data consist of a huge amount of unstructured data, which makes data analysis a difficult task [1]. In addition, BDA systems that can be employed in real-time cases are required [52]. Storing large amounts of medical data is also difficult owing to the high cost of storage. The healthcare industry generates huge data such as medical images and the outcomes of diagnostic tests. These medical data grow continuously, and they need to be maintained for a long time, over fifty years, in order to track the patient’s health [1, 5]. Medical data are among the most sensitive Big Data, and patients’ data need to be protected to maintain patient privacy. Data sharing between hospitals and health organizations increases concern about Big Data privacy in the healthcare sector. Moreover, one of the problems that governments face in BDA for healthcare is the shortage of data protocols and standards [1, 5, 52]. Another issue is that EHR data depend on the team that enters the patient’s data; this team may enter wrong data, which affects the outcome [9]. Also, only a few companies in the world are professional in the field of BDA, so there is an urgent need for well-trained data analysts who know how to visualize the data using the best tools and who can interpret Big Data results [1, 52].

Moreover, healthcare professionals typically charge for their services on the basis of face-to-face meetings with the patient. This creates a serious bias against adopting new technology that reduces human interaction [53].

The data in several healthcare organizations, especially hospitals, are usually fragmented. For example, cost information is used by the financial team, while clinical data such as patient history, vital signs, progress notes, and the outcomes of diagnostic tests are stored in the EHR. These clinical data are available to physicians and nurses and are employed for tracking patient care and planning treatment [5, 53]. The solution to the fragmentation problem is to collect data from the different sources and then normalize them into a consistent structure. Organizations will then not need data bridges, and AI will be able to perform well in real time [53].
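
The normalization step described above can be sketched as follows. This is only an assumed illustration: the two source layouts, their field names (`PatientID`, `Cost`, `pid`, `measure`, `val`), and the target structure are hypothetical and do not correspond to any particular hospital system or standard.

```python
# Hypothetical rows as they might arrive from two separate departments.
FINANCE_ROWS = [{"PatientID": 17, "Cost": "250.00"}]
CLINICAL_ROWS = [{"pid": "17", "measure": "heart_rate", "val": 72}]


def normalize_finance(row):
    """Map a finance row onto the shared (patient_id, field, value) structure."""
    return {"patient_id": str(row["PatientID"]), "field": "cost_usd",
            "value": float(row["Cost"])}


def normalize_clinical(row):
    """Map a clinical row onto the same shared structure."""
    return {"patient_id": str(row["pid"]), "field": row["measure"],
            "value": float(row["val"])}


def merge_by_patient(records):
    """Combine normalized records into one view per patient."""
    merged = {}
    for record in records:
        merged.setdefault(record["patient_id"], {})[record["field"]] = record["value"]
    return merged


normalized = ([normalize_finance(r) for r in FINANCE_ROWS]
              + [normalize_clinical(r) for r in CLINICAL_ROWS])
patient_view = merge_by_patient(normalized)
```

Once every source is mapped onto one structure, downstream analytics can read a single per-patient view instead of querying each department’s system separately.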

Imbalanced data are another challenge in Big Data, and this issue has recently gained considerable attention [11]. In the binary case, the two classes have unequal distributions; imbalance across multiple classes is a further issue [11].

In addition, monitoring systems generate continuous data streams, such as electrocardiograms, which are hard to store. Storing all of these Big Data is too expensive, which leads to incomplete data [1].
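
One common remedy for class imbalance, which is not taken from [11], is random oversampling of the minority classes. The sketch below is a minimal illustration in plain Python under that assumption; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: four healthy records and one disease record.
SAMPLES = [[1], [2], [3], [4], [5]]
LABELS = ["healthy", "healthy", "healthy", "healthy", "disease"]

balanced_samples, balanced_labels = random_oversample(SAMPLES, LABELS)
```

Oversampling only repeats existing minority records, so it should be applied to the training split alone; otherwise duplicated records can leak into the evaluation data.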

2.9 IoT

IoT can improve our lives and save time. There have recently been many research studies on IoT, so it is a hot topic that attracts considerable investment. IoT is composed of billions (even trillions) of connected objects that share data with each other. These objects are called smart objects (SOs) [54,55,56,57,58,59]. Security and privacy are the main challenges in IoT. In the healthcare sector, there are huge concerns about privacy due to the sensitivity of patients’ data [60]. IoT devices produce continuous streams of data, so it is important to develop tools for analyzing IoT data [33].

Wearable devices are very important in IoT. Wearable sensors can be utilized for measuring different types of signals, and they play a crucial role for patients by measuring various physiological parameters. There are different types of sensors with various functions that take care of patients or help them avoid risks, and these sensors help doctors pay closer attention to patients [61, 62]. An electrocardiogram (ECG) sensor is utilized for monitoring the activity of the heart muscle [63, 64]. ECG is one of the easiest tests for obtaining vital data about a patient’s cardiovascular system [64]. An electromyography sensor is employed for monitoring muscle activity [63]; this measurement is employed for detecting neuromuscular abnormalities [65]. An electroencephalogram is utilized for monitoring the electrical activity of the brain via electrodes placed at various locations on the scalp [63, 64]. An electroencephalogram machine is composed of electrodes, amplifiers, filters, and a recording unit [64]. A blood pressure sensor is used for measuring the force exerted by circulating blood on the walls of blood vessels [63]. The device employed for measuring blood pressure is called a sphygmomanometer. Blood pressure is usually expressed as systolic pressure (when the heart beats) over diastolic pressure (when the heart is at rest between two heartbeats) in the cardiac cycle [64]. A breathing sensor monitors respiration [63]. Motion sensors are employed for estimating the level of activity [63].

The normal body temperature of an individual depends on gender, time of measurement, and recent activities. The typical range of body temperature for a healthy individual is from 36.5 °C to 37.2 °C [61]. A high body temperature is a sign that a person has an infection or fever [64]. A temperature sensor gathers temperature data from a source and converts them into a form that can be understood by another device or an individual. It is one of the most commonly used sensors in the healthcare sector. There are two essential types of temperature sensors: contact sensors and non-contact sensors. Contact sensors differ from non-contact sensors in that they need direct physical contact with the object being sensed [61]. The level of oxygen in the blood is also a very important factor. A pulse oximeter is employed for monitoring a person’s oxygen saturation [64].

Wearable devices are usually networked in order to perform powerful tasks. Wireless Sensor Networks (WSNs) have potential applications in many industries, including healthcare monitoring. This application aims to guarantee continuous monitoring of patients’ status while allowing them freedom of movement [66, 67]. A Body Sensor Network (BSN) is a network composed of wireless, wearable (programmable) sensor nodes that communicate with a local personal device. The essential elements of this emerging technology are sensors, communication protocols, and coordinators [62, 68]. BSNs have many applications, such as healthcare, fitness, smart cities, and many other IoT applications. A BSN is considered a branch of WSN. However, there are many differences between the two networks. A WSN has a larger number of nodes than a BSN and covers a larger geographical range, whereas a BSN uses the smallest possible number of nodes with high accuracy. Moreover, the batteries of BSN nodes can be recharged or replaced [69]. BSN applications require higher sensor sampling and data transmission rates, as well as continuous monitoring [69].
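
As a small illustration of how readings from a temperature sensor might be screened, the sketch below flags values outside the typical healthy range of 36.5 °C to 37.2 °C quoted above from [61]. It is an assumed example and not clinical guidance: the text itself notes that normal temperature varies with gender, time of measurement, and activity, and the function names and sample readings are hypothetical.

```python
# Typical healthy range quoted in the text from [61], in degrees Celsius.
NORMAL_TEMP_C = (36.5, 37.2)


def classify_temperature(reading_c, normal_range=NORMAL_TEMP_C):
    """Label one reading relative to the given (low, high) range, bounds included."""
    low, high = normal_range
    if reading_c < low:
        return "below range"
    if reading_c > high:
        return "above range"
    return "within range"


def flag_readings(readings):
    """Return only the (timestamp, value, status) entries that fall outside the range."""
    flagged = []
    for timestamp, value in readings:
        status = classify_temperature(value)
        if status != "within range":
            flagged.append((timestamp, value, status))
    return flagged


# Hypothetical readings from a wearable temperature sensor.
READINGS = [("08:00", 36.8), ("12:00", 38.4), ("16:00", 36.1)]
alerts = flag_readings(READINGS)
```

In a BSN, a check of this kind could run on the local personal device so that only out-of-range readings need to be transmitted promptly, although the source does not prescribe such a design.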

2.10 Sensors Challenges

Researchers are carrying out careful studies to design intelligent body sensors that monitor patients continuously with good accuracy. However, they face many challenges. The first challenge is fabrication and implementation. Second, there are both hardware and software limitations. Many factors should be taken into account when designing these sensors, such as weight, cost, size, energy consumption, and safety. Another challenge that sensor networks may face is packet loss or fading during data transmission, which leads to latency. One solution to this issue is to allocate bandwidth at the medium access control (MAC) layer [70]. Wireless Body Area Networks (WBANs) also face challenges at the physical, MAC, network, transport, and application layers. During the implementation of WBANs, the physical layer faces challenges such as bandwidth limitations, receiver complexity, and power consumption [66]. For health monitoring applications, the Quality of Service (QoS) needs of emergency traffic should be taken into account [66]. One of the hot topics in wireless healthcare monitoring networks is the ability of these systems to reduce the energy consumption of the computing and communication infrastructure [66]. Moreover, dependable data transmission is a significant requirement of a wireless healthcare network, because frame or packet loss may delay the delivery of information [66]. The application layer is at the top of the stack, so it is expected to have a coordination role. In this case, data management is important, and an effective machine learning algorithm is needed to permit autonomous system replacement [66].

2.11 Integrating IoT with BDA Challenges

  1. Privacy

    Privacy is the most significant challenge in BDA because many users are unwilling to work on a system that does not provide any agreement to protect their personal data. Some temporary strategies are used to protect the personal information of the user, but these techniques do not address privacy itself. In IoT analytics, security issues arise from the heterogeneity of the devices that interact and share data with one another [19, 20]. So, appropriate authentication is needed [19].

  2. Security

    There are concerns about the safety of the devices due to the possibility of physical damage. So, these devices need to be protected [15].

  3. Data Mining

    Analyzing the huge amount of data produced by IoT faces many challenges, such as information extraction, data visualization, and integration [19].

  4. Energy Consumption

    The energy consumed by the devices can be large. So, solutions are needed to reduce this consumption [15].

  5. Integration

    Data aggregated from several devices can be structured, unstructured, or semi-structured. Integrating these data is a complex process [19].
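The integration challenge can be made concrete with a small sketch that maps one structured source (CSV), one semi-structured source (JSON), and one unstructured source (a free-text note) onto a common record layout. The field names, the JSON nesting, and the note pattern are all hypothetical; real clinical data would need far more robust parsing and validation.

```python
import csv
import io
import json
import re

# Hypothetical pattern for notes such as "Patient p3 ... heart rate 110 bpm".
NOTE_PATTERN = re.compile(r"patient\s+(\w+).*?heart rate\s+(\d+)", re.IGNORECASE)


def from_csv(text):
    """Structured input: one row per reading with named columns."""
    return [
        {"patient_id": row["patient_id"], "heart_rate": int(row["heart_rate"]), "source": "csv"}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text):
    """Semi-structured input: a nested document from a device gateway."""
    doc = json.loads(text)
    return [{"patient_id": doc["patient"]["id"], "heart_rate": int(doc["vitals"]["hr"]), "source": "json"}]


def from_note(text):
    """Unstructured input: extract what the pattern can find, else nothing."""
    match = NOTE_PATTERN.search(text)
    if match is None:
        return []
    return [{"patient_id": match.group(1), "heart_rate": int(match.group(2)), "source": "note"}]


def integrate(csv_text, json_text, note_text):
    """Merge all three sources into one list of uniform records."""
    return from_csv(csv_text) + from_json(json_text) + from_note(note_text)
```

Keeping a `source` field in every record is a deliberate choice here: it lets later analysis weigh a value extracted from free text differently from one reported directly by a device.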

2.12 Recent Advances in IoT-Based Big Data

IoT plays a significant role in the healthcare sector by aggregating and analyzing medical information to minimize medical errors [71]. Emerging BDA techniques can be used to improve the health sector, since BDA can assist in analyzing a massive amount of health data [18]. IoT, which is based on connecting several intelligent devices to the internet, is one of the most significant Big Data sources [72]. In [18], real-time big medical data were aggregated from patients using sensors. These data were then transferred to a cloud server to be analyzed, and the analyzed data were transferred to the associated individuals. This research merged the technologies of Big Data, IoT, and cloud computing. Figure 2.2 illustrates the methodology that was used.

Fig. 2.2 The used methodology

The authors in [73] confirmed the significance of Big Data in the healthcare sector. They found that body sensors produce huge volumes of health data. Two tasks were performed: merging these Big Data points with the EHR and displaying the data to supervising doctors in real time. The work proposed a sensor integration framework that presents a scalable cloud architecture and offers a holistic approach to the EHR sensor system. Apache Kafka and Spark were employed for processing the real-time Big Data. Using this system, a patient's health can be visualized in real time, which can assist in detecting urgent cases.
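The two tasks described above, merging sensor readings with EHR data and surfacing urgent cases as the data arrive, can be sketched in plain Python. This is not the Kafka and Spark pipeline of [73]; a generator stands in for the stream, and the `enrich_stream` function, the record fields, and the 40 beats-per-minute deviation threshold are assumptions made for illustration.

```python
def enrich_stream(readings, ehr, max_deviation=40):
    """Yield each sensor reading merged with the matching EHR record.

    A reading is flagged as urgent when the heart rate deviates from the
    patient's recorded baseline by more than max_deviation (hypothetical rule).
    """
    for reading in readings:
        record = ehr.get(reading["patient_id"], {})
        baseline = record.get("baseline_hr")
        urgent = baseline is not None and abs(reading["hr"] - baseline) > max_deviation
        yield {**reading, "name": record.get("name", "unknown"), "urgent": urgent}
```

Because the function is a generator, each reading is processed as soon as it is available rather than after the whole batch has been collected, which mirrors the real-time behavior the framework aims for.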

Cardiovascular disease is becoming a major concern worldwide. Many people all over the world have chronic heart diseases, and the number of deaths due to heart disease is increasing constantly. So, there is an urgent need for an ECG monitoring system that monitors the patient continuously and remotely [74]. The work in [74] proposed a system that merges the concepts of nanoelectronics, the Internet of Things (IoT), and Big Data. The nanomaterial reduces the cost of ECG sensors. IoT allows the ECG signals to be sent from the sensor via a gateway using communication protocols such as Bluetooth, Zigbee, 3G/4G, Wi-Fi, and LAN. The data are then transmitted to the doctor's end or to cloud storage for analysis or processing. The patient's data can be displayed on several intelligent devices, so urgent situations can be detected. BDA plays a significant role in analyzing the data and extracting useful information.

In [75], a BDA-based IoT healthcare system was proposed. This system contains several sensors to monitor patients' health. It is difficult to handle the massive amount of medical data generated by the sensors. So, an Intel Galileo Gen 2 board served as an IoT agent and was employed to upload the patients' health information to the ThingSpeak cloud, and these Big Data were then analyzed with the Hadoop framework. The system can alert doctors in case of emergencies.

Obstructive sleep apnea (OSA) is a sleep disorder that has a negative effect on the patient's life. So, systems that can detect OSA are needed. In [76], an effective system that can assist in detecting OSA and supporting treatment is proposed. This system monitors several factors such as sleep environment, sleep condition, physical activities, and physiological parameters. The system performs two kinds of processing: preprocessing and batch data processing. Preprocessing is done at the edge of the network, which improves the efficiency of the system. The proposed system demonstrated 93.3% effectiveness in predicting the air quality index used to guide OSA treatment.
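The division of work between edge preprocessing and batch processing can be sketched as follows. This is only an illustration of the general pattern, not the system of [76]; the function names, the valid range for oxygen saturation readings, and the window size are assumptions.

```python
from statistics import mean


def edge_preprocess(samples, window=4, valid_range=(70.0, 100.0)):
    """Run on the edge device: drop implausible readings, then average
    each window so that fewer values are sent upstream."""
    low, high = valid_range
    valid = [s for s in samples if low <= s <= high]
    return [mean(valid[i:i + window]) for i in range(0, len(valid), window)]


def batch_summary(uploads):
    """Run in the cloud: summarize everything the edge devices uploaded."""
    values = [v for upload in uploads for v in upload]
    return {"count": len(values), "mean": mean(values), "min": min(values)}
```

Filtering and averaging at the edge reduces the volume of data that has to cross the network, which is one plausible reason edge preprocessing improves efficiency in systems of this kind.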

Wearable devices are uncomfortable, especially when they are used for a long time. So, another solution is needed [77]. In [77], smart clothes were designed for health monitoring. The wearable sensor data are visualized using a mobile application. These data are also stored on a "health cloud," which is merged with a machine learning library for diagnostic and predictive analytics purposes [77, 78].

Due to the increasing number of elderly and disabled individuals, there is a significant need for efficient health monitoring systems. In [79], the authors proposed an IoT-based health monitoring system. ECG and other health data collected from wearable devices and sensors are sent to the cloud, where they can be accessed by professionals. Signal enhancement, watermarking, and analytics were employed to avoid identity theft [78, 79].

IoT has proved its significance in remote health organizations [80]. The authors in [80] discussed how IoT in healthcare produces a massive amount of data. The paper also presents an IoT architecture composed of five layers: physical layer, network layer, application layer, middleware layer, and business layer. The physical layer consists of the sensors that are employed to aggregate data. The network layer is the interface between the sensors and the servers. The application layer provides services to the users by defining several applications of IoT. The middleware layer is employed to store and process the data. The business layer deals with system management.
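The responsibilities of the five layers can be illustrated with a toy pipeline in which each layer is one function. This is a sketch under strong simplifying assumptions, not the architecture of [80]: the readings are hard-coded in place of real sensors, the data flow from network to middleware to application is one plausible ordering, and the heart-rate limit of 120 is hypothetical.

```python
def physical_layer():
    """Sensors aggregate raw readings (hard-coded here instead of hardware)."""
    return [{"sensor": "hr", "value": 72}, {"sensor": "hr", "value": 131}]


def network_layer(readings):
    """Interface between sensors and servers: wrap readings for transport."""
    return [{"payload": r, "delivered": True} for r in readings]


def middleware_layer(packets, store):
    """Store and process the data that actually arrived."""
    for packet in packets:
        if packet["delivered"]:
            store.append(packet["payload"])
    return store


def application_layer(store, limit=120):
    """User-facing service: list the readings above a hypothetical limit."""
    return [r for r in store if r["value"] > limit]


def business_layer(store, alerts):
    """System management: report how much was stored and flagged."""
    return {"stored": len(store), "alerts": len(alerts)}


def run_pipeline():
    """Pass one batch of readings through all five layers."""
    store = []
    middleware_layer(network_layer(physical_layer()), store)
    alerts = application_layer(store)
    return business_layer(store, alerts)
```

Separating the layers into functions with narrow inputs and outputs shows why such an architecture is attractive: any one layer can be replaced, for example by swapping the hard-coded sensors for real ones, without touching the others.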

Recently, there have been concerns about sleep apnea (SA) owing to the increasing number of patients and deaths, as well as the high costs of patient care. Some solutions were proposed to support the treatment of elderly people. However, these solutions have some defects. In [81], an effective system to support SA patients was introduced. This system combines fog computing, cloud computing, an IoT platform, and a Big Data platform. The IoT Big Data are analyzed using a Big Data analyzer. The system was tested using a 30 GB dataset and a questionnaire filled in by specialists. The outcomes show that data analytics help specialists make the right decisions and guide patients in treatment, which positively affects the lives of elderly people.

In [82], the significance of IoT and BDA in supporting precision medicine was discussed. The computational flow of in silico knowledge data discovery was introduced and analyzed, and useful results were reported for a case study of genome mapping based on a computer model of RNA.

There is an urgent need to provide better healthcare services. IoT and BDA can significantly improve the healthcare sector, which is one of the biggest sectors, so that better services can be provided. BDA for IoT health data can assist in detecting diseases at an early stage and in making suitable decisions. In [83], the recent trends in healthcare systems were discussed. The authors focused on cloud computing, fog computing, BDA, and IoT. In addition, the challenges in providing enhanced healthcare systems were discussed. The authors concluded that the biggest challenges in healthcare systems are privacy and security problems.

IoT can be employed in many applications such as smart home, smart farming, and smart healthcare. The massive amount of data produced by deployed sensors needs to be analyzed to make appropriate decisions and improve accuracy. Deep learning plays a crucial role in making IoT intelligent [84]. In [85], the role of deep learning analytics in IoT was investigated. In that paper, an IoT platform was implemented to analyze and classify real-time ECG data using a Convolutional Neural Network. In [2], an IoT architecture was proposed for storing and processing big sensor data from healthcare applications. MapReduce-based logistic regression was implemented with the aid of Apache Mahout to develop a model for predicting heart diseases.
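How logistic regression fits the MapReduce pattern can be shown with a minimal pure-Python sketch. It does not use Apache Mahout or Hadoop and is not the implementation of [2]; the function names, the learning rate, and the number of epochs are assumptions. The map step computes a partial gradient on each data partition, and the reduce step sums the partial gradients before the weights are updated.

```python
import math
from functools import reduce


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def map_gradient(partition, weights):
    """Map step: partial log-loss gradient over one data partition."""
    grad = [0.0] * len(weights)
    for features, label in partition:
        error = sigmoid(sum(w * x for w, x in zip(weights, features))) - label
        for j, x in enumerate(features):
            grad[j] += error * x
    return grad


def reduce_gradients(a, b):
    """Reduce step: element-wise sum of two partial gradients."""
    return [x + y for x, y in zip(a, b)]


def train(partitions, n_features, epochs=500, lr=0.1):
    """Batch gradient descent where each epoch is one map/reduce round."""
    weights = [0.0] * n_features
    n = sum(len(p) for p in partitions)
    for _ in range(epochs):
        partials = [map_gradient(p, weights) for p in partitions]
        total = reduce(reduce_gradients, partials)
        weights = [w - lr * g / n for w, g in zip(weights, total)]
    return weights


def predict(weights, features):
    """Predicted probability of the positive class."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))
```

The approach works because the gradient of the log-loss is a sum over samples, so it can be computed independently on each partition and combined afterwards. In a real cluster the map calls would run in parallel on different machines; here they run sequentially in a list comprehension.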

2.13 Conclusion

IoT refers to objects that are connected to each other and have the capability to share data among themselves. The amount of data produced by sensors and healthcare applications is continuously increasing. These data can be structured, unstructured, or semi-structured, and at this scale they are called Big Data. Storing, processing, and analyzing these data with conventional methods is difficult. So, new technologies are needed to extract useful information for analysis. There is an increased demand for BDA in the healthcare field owing to its crucial role. BDA provides an overview of patient data collected from several sources. Using BDA, decisions can be made correctly and at the right time. BDA can also be employed to manage and control large datasets. Thus, BDA can enhance the quality of our life. The advancement of Big Data and IoT has a significant impact on all fields, including the healthcare sector, and integrating IoT with BDA can improve the healthcare field. In this chapter, Big Data characteristics, Big Data sources, and sources of Big Data in healthcare were reviewed. BDA-related work and the significance of Big Data in healthcare were highlighted. BDA challenges, Big Data platforms and tools, sensors, and the challenges of integrating BDA with IoT were also reviewed. Finally, the recent advances in IoT-based Big Data were highlighted.

2.14 Future Scope

More effort is needed to develop machine learning approaches for BDA in healthcare and to integrate them with hardware. Future research is also required to solve IoT challenges such as security and privacy. In addition, most of the research studies on BDA in the healthcare sector have come from developed countries. So, it is important to encourage research on BDA in healthcare in developing countries in order to deliver better quality of care. Finally, future research is required to solve the sensor challenges.