1 Introduction

The use of the Internet of Things (IoT) in healthcare has increased sharply across a variety of Internet use cases. The Medical Internet of Things aims to improve care itself, with remote monitoring as the primary application of remote medicine. The synergy between medicine and technology has advanced rapidly around the world. For example, IoT data analytics is gaining popularity and underpins the next generation of electronic and mobile healthcare services. These services are transforming traditional institutions by managing large volumes of data with blockchain solutions. The evolution of the chain is supported by technical means that sustain the strategic applications needed for its growth. In a cloud-enabled blockchain network, trading and mining nodes run both in the cloud and on-premises, and a node may be an enterprise-level server. At the highest level of collaboration, cloud and blockchain applications form the basis for operating distributed chains. Devices are configured to use public blockchain services and private blockchain nodes in the cloud, and they communicate securely via APIs. IoT devices that adopt blockchain technology as a security framework, together with secure distributed key management, can discover each other and encrypt machine-to-machine transactions.

Fig. 1 shows the communication model of the cloud blockchain network. Machine learning techniques are widely used in the medical field. Most medical data resources can be used to supply valuable knowledge for scientific decision making. However, the data are stored separately in each hospital or medical institution, which poses a major problem for the quality and efficiency of prediction models built on them. Medical professionals and researchers can draw on several machine learning techniques throughout disease research. Medical databases keep a history of patient records that would otherwise go unused; these records can be analyzed and considered in future research. Healthcare staff analyze this large database with machine learning techniques to predict patient risk.

Fig. 1 Cloud blockchain network model

Classification is the most commonly used machine learning technique. It employs a set of pre-classified examples to build a model that can classify the wider population of records. Data classification involves two steps, learning and classification. In the learning step, a classification algorithm analyzes the training data. In the classification step, test data are used to estimate the accuracy of the classification rules. If the accuracy is acceptable, the rules can be applied to new data tuples. The classifier training algorithm uses the pre-classified examples to determine the set of parameters required for proper discrimination. This paper discusses the classification of disease data using IoT clinical data and machine learning methods.

2 Related Work

The literature on IoT blockchain security methods and on disease classification strategies used for prediction is surveyed below.

Co-clustering has been used to group clinical features and patients simultaneously in order to characterize block-wise data loss patterns [1], enabling (a) group-based feature selection and data imputation for specific patient subgroups and (b) predictive models that take data availability into account. Machine learning (ML) has proven highly predictive and supportive for decision making on medical data, and the latest developments in ML technology are being applied in the Internet of Things (IoT) [2]. For the task of estimating the work of breathing, [3] compares the performance of different machine learning techniques. The problem is divided into two categories, high and low work of breathing, based on information extracted from pressure, volume and flow signals recorded during mechanical ventilation.

To address the feature selection problem, a novel and fast method based on mutual information has been proposed [4]. The feature selection algorithm improves classification accuracy and reduces the execution time of the classification system. A modified neural network that uses both structured and unstructured data is recommended in [5], in the form of a convolutional neural network (CNN)-based disease risk prediction algorithm. To the best of our knowledge, existing work focuses on these two types of data in medical big data analytics. Machine learning has also been used for vegetation parameter estimation and disease detection [6], where the effect of disease symptoms on performance has been small. Predicting an epileptic seizure before it actually occurs makes prevention through therapeutic intervention possible [7]. Studies have found that abnormal activity in the brain begins minutes before the onset of a seizure, a period called the preictal state.

The proposed DualFog-IoT is compared with the existing IoT architecture based on a centralized data center. Inheriting the characteristics of blockchain, the model decreases the system drop rate and offloads the cloud data center with minimal upgrades to the existing IoT ecosystem [8, 9]. Because of the complexity of blockchain technology, it is usually expensive and difficult for most developers or teams to build, maintain and monitor a blockchain network that supports their applications. Quantum algorithms threaten public-key cryptography and hash functions, forcing blockchains to be redesigned around cryptosystems that resist quantum attacks, known as post-quantum, quantum-proof, quantum-safe or quantum-resistant cryptosystems [10]. If an adversary stores all copies of a segment [11], the adversary can leave the system and cause a permanent loss of that segment, leading to a malfunction of the blockchain system.

A PVC detection and data visualization method has been proposed that exploits the unique geometry of high-dimensional data using manifold learning and a support vector machine (SVM) [12]. Several blockchain-based storage systems have been proposed in recent years [13, 14]. In most cases, the blockchain acts as a “witness of the agreement” with publishers. These approaches do not reduce the size of the blockchain itself and therefore cannot avoid the storage model of Bitcoin. Maintaining a complete record of the blockchain in an IoT environment also runs into the insufficient storage capacity of edge devices, even though such a system does not require a significant number of transactions per second [15]. Segment blockchain has also been used to improve blockchain sharing by separating transaction storage from transaction validation.

The main contribution of [16] is to link the failure probability of a group to each epoch, using as the probability bound the sum of upper-bounded distributions of two types, one of them hypergeometric. BlockTDM, a reliable data management scheme based on blockchain in edge computing [17], has been proposed to solve the above problems. It includes a flexibly configurable blockchain architecture, a mutual authentication protocol, flexible consensus, a smart contract management module, and the management and deployment of transaction data and blockchain nodes. Algorithms considered for prediction include artificial neural networks (ANN) [18], support vector machine (SVM), naïve Bayesian classifier (NBC), boosted decision trees (BDT) and multivariable logistic regression (MLR).

However, scalability is one of the important problems in integrating fog computing and blockchain [19]. Groupchain has been proposed as a new type of scalable public blockchain with a double-chain structure for fog computing of IoT services. Despite its potential, some urgent problems must be solved before IoT services can be widely used. Managing such a system requires IoT devices to coordinate the operations of various loosely coupled, distributed intelligent components [20]. Blockchain has also been analyzed from the developer's point of view, as one component of a larger software system, highlighting key concepts and considerations for data storage and combination [21].

Two methods have been introduced that are based on a heterogeneous network built from protein interactions, genotype-phenotype correlations and phenotypic similarity. In HeteSim_MultiPath (HSMP) [22], the HeteSim scores of different paths are combined with a constant that dampens the contribution of longer paths. A non-invasive diagnostic system based on machine learning (ML) has been developed to solve these problems. An expert decision-making system based on machine learning classifiers and artificial fuzzy logic gives effective diagnostic results, and the mortality rate has decreased [23]. However, there are no clear requirements for identifying the most appropriate parameters during predictive analysis.

Instead of extracting features, the time-series EEG is converted to a time-frequency distribution (TFD) [24], and a gradient-boosting ensemble is trained directly on the entire TFD. In [25], a machine learning method is proposed for predicting particulate matter concentrations from wind and precipitation levels, based on two years of weather and air pollution data.

In [26], synthetic minority oversampling is used to generate new synthetic samples. BLPSO [27] is a novel hybrid algorithm that combines Baldwinian learning with PSO to increase particle diversity and prevent premature convergence of PSO. In [28], heart rate variability (HRV) calculated before the start of antibiotics (LOS group) or during a randomly selected period (control group) is compared with a baseline from the calibration period.

In [29], a statistical model is developed to predict fire activity 1–5 days into the future using Moderate Resolution Imaging Spectroradiometer satellite fire counts and meteorological data from the interim reanalysis. In [30], the first use of EIT boundary voltage data from infants for apnea detection with machine learning is proposed, automatically identifying the segments that contain an apnea event. Factorial switching linear dynamical systems are a common framework for addressing this issue [31].

This analysis shows that previous methods have low classification accuracy and weak security performance, which motivates the new method introduced in the next section.

3 Implementation of the Proposed Method

The main motivation of this IoT application model is to provide computationally secure key generation for data protected by encryption with blockchain technology. A master key is a secret key agreed between the communicating parties before a communication protocol begins. An essential characteristic of machine learning in this setting is the design of applications for heterogeneous data with different dimensionalities. IoT applications consist of different kinds of information, collected by recording data that produce diverse data representations. Data collection is carried out under distributed and decentralized control with autonomous data sources.

The Correlation Blockchain Matrix Factorization Classifier (CBMFC) algorithm classifies the given data into several columns and secures medical data. Blockchain HMAC encryption enforces access control mechanisms, digital signatures, routing controls, notarization and similar services to protect data against attacks. The digest of each ciphertext message is calculated using the HMAC algorithm. HMAC stands for Hash-based Message Authentication Code; it applies a hash function to a secret key together with the content of the message. As shown in Fig. 2, the data are first preprocessed using Correlation Matrix Data Preprocessing, which removes noise and inconsistent data obtained from various sources. Interval-based measures act as discretization factors for large data applications, mapping measured values to the closest interval values. Preprocessing also checks whether the dataset contains redundant information. All attributes are initially treated as separate subsets, and the final combination of attributes is marked as the best feature subset. A hash value is stored in each block; when a new record enters the storage, it is linked to the previous block through that hash. The proposed framework reduces the clustering task by grouping identical attributes when predicting disease from a given disease dataset.

Fig. 2 Block diagram of the proposed CBMFC method

3.1 Correlation Matrix Data Preprocessing

The first stage of Correlation Matrix Data Preprocessing is to find unrelated data and normalize the dataset. While processing the data, it is essential to calculate univariate statistics such as the mean, standard error and frequency of occurrence in order to inspect the amount of missing data. With complete information, maximum likelihood estimates the population parameters that would most probably have generated the values in the analyzed sample. Classification accuracy depends on the accuracy of the data, so the data should be unambiguous, correct and complete. Data collection methods are often loosely controlled, resulting in inappropriate values such as out-of-range or missing values. Correlation matrices are useful for showing the correlation coefficient (or degree of relevance) between variables. The correlation matrix is symmetric, because the correlation between V1 and V2 is the same as the correlation between V2 and V1.

Let the data matrix V, of size [R × n], hold n observations of R variables, one variable per row. Assume that each row of V has been centered over its observations i = 1, …, n (and, for a correlation rather than a covariance matrix, scaled to unit variance). The correlation matrix \(\sum\) of the rows of V is

$$ \sum = cor\left( V \right) = {\rm E}\left[ {VV^{T} } \right] $$
(1)

The correlation matrix is estimated from the data as

$$ {\rm E}\left[ {VV^{T} } \right] \approx \frac{{VV^{T} }}{n} $$
(2)

From Eq. (2), the matrix can be written as a product of simpler matrices E and L, using a procedure known as eigenvalue decomposition.

$$ \sum = ELE^{{ - 1}} $$
(3)

The matrix E is an [R × C] matrix in which each column is an eigenvector, and L is the diagonal matrix of the corresponding eigenvalues.

From Eqs. (2) and (3), each entry of the correlation matrix, for a pair of variables R and C with means \(\mu_{R}, \mu_{C}\) and standard deviations \(\sigma_{R}, \sigma_{C}\), is derived as follows,

$$ \rho \left( {R,C} \right) = {\text{corr}}\left( {R,C} \right) = \frac{{{\text{cov}}\left( {R,C} \right)}}{{\sigma _{R} \sigma _{C} }} = \frac{{E\left[ {\left( {R - \mu _{R} } \right)\left( {C - \mu _{C} } \right)} \right]}}{{\sigma _{R} \sigma _{C} }} $$
(4)

Such data discrepancies can lead to wrong research results, so the data are processed before any algorithm is applied. Preprocessing eases the whole data pipeline and improves the efficiency of machine learning techniques. The main preprocessing steps are data cleaning, data integration, data conversion and data reduction. Here, unknown attribute values are filled with the correlation value, and all disease databases are converted to a single file format.
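The correlation step of this section can be illustrated with a minimal sketch. It assumes one variable per row, uses only the Python standard library, and runs on made-up readings; the function names `center` and `correlation_matrix` and the sample values are illustrative assumptions, not part of the proposed method.

```python
import math


def center(row):
    """Subtract the row mean so that the row has zero mean."""
    mu = sum(row) / len(row)
    return [x - mu for x in row]


def correlation_matrix(V):
    """Pearson correlation between every pair of rows of V (one variable per row).

    Assumes every row has non-zero variance; a constant row would divide by zero.
    """
    centered = [center(row) for row in V]
    n = len(V[0])
    size = len(V)
    # Covariance estimate on the centered rows: inner products divided by n.
    cov = [[sum(a * b for a, b in zip(ri, rj)) / n for rj in centered]
           for ri in centered]
    std = [math.sqrt(cov[i][i]) for i in range(size)]
    return [[cov[i][j] / (std[i] * std[j]) for j in range(size)]
            for i in range(size)]


# Hypothetical readings: temperature (degrees C), heart rate (bpm),
# systolic blood pressure (mmHg), five observations each.
V = [
    [36.6, 37.1, 38.0, 36.9, 37.5],
    [72.0, 80.0, 95.0, 76.0, 88.0],
    [118.0, 121.0, 135.0, 120.0, 128.0],
]
corr = correlation_matrix(V)
for row in corr:
    print([round(c, 3) for c in row])
```

The diagonal entries are 1 and the matrix is symmetric, which matches the property stated above that the correlation between V1 and V2 equals the correlation between V2 and V1.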

3.2 Medical Feature Selection

Feature subset selection is a preprocessing step used in machine learning. It aims to increase learning accuracy by reducing and eliminating irrelevant and worthless data dimensions. It also indicates which features are useful for a specific problem and type of prediction.

The input dataset is fed into the feature selection block, where features are selected according to the given dataset. This reduces the number of attributes selected for a given number of dependent attributes.

The medical feature selection algorithm randomly samples instances from the training data; the selection process is shown in Fig. 3. For each sampled instance, it finds the nearest neighbor of the same class (nearest hit) and the nearest neighbor of the opposite class (nearest miss). The weight of a feature is updated according to how well its values distinguish the sampled instance from its nearest hit and nearest miss.

$$ {\text{Weight}}~(W_{m} ) = \mathop \sum \limits_{{i,j = 0}}^{w} f_{m} - \frac{{\left( {U,~V,~S~} \right)~^{2} \left( i \right)}}{N} $$
(5)
$$ {\text{Gain}}~\left( {G_{m} } \right) = \mathop \sum \limits_{{i = o}}^{n} S_{i} log_{2} $$
(6)
Fig. 3 Best feature selection process

where \(f_{m}\) is the weight for attribute m, U and V are randomly sampled instances, S is the nearest hit and N is the number of randomly sampled instances. The function diff calculates the difference between two instances for a given attribute; for a continuous attribute, it is the actual difference normalized to the interval [0, 1].
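The description of random sampling, hits and a normalized diff function resembles a Relief-style weighting scheme. The sketch below assumes the classic Relief update with squared differences divided by the number of samples; it is not a reconstruction of Eq. (5), and the names `diff`, `relief_weights` and the toy data are hypothetical.

```python
import random


def diff(a, x, y, lo, hi):
    """Difference of attribute a between instances x and y, normalized to [0, 1]."""
    span = hi[a] - lo[a]
    return abs(x[a] - y[a]) / span if span else 0.0


def relief_weights(X, y, n_samples, seed=0):
    """Relief-style attribute weights (assumed variant: squared diffs, averaged)."""
    rng = random.Random(seed)
    n_attr = len(X[0])
    lo = [min(row[a] for row in X) for a in range(n_attr)]
    hi = [max(row[a] for row in X) for a in range(n_attr)]

    def distance(p, q):
        return sum(diff(a, p, q, lo, hi) for a in range(n_attr))

    weights = [0.0] * n_attr
    for _ in range(n_samples):
        i = rng.randrange(len(X))
        same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        other = [j for j in range(len(X)) if y[j] != y[i]]
        hit = min(same, key=lambda j: distance(X[i], X[j]))    # nearest hit
        miss = min(other, key=lambda j: distance(X[i], X[j]))  # nearest miss
        for a in range(n_attr):
            # Reward attributes that separate classes, penalize ones that
            # differ within the same class.
            weights[a] += (diff(a, X[i], X[miss], lo, hi) ** 2
                           - diff(a, X[i], X[hit], lo, hi) ** 2) / n_samples
    return weights


# Toy data: attribute 0 separates the two classes, attribute 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [3.0, 4.8], [3.1, 1.2], [2.9, 3.1]]
y = [0, 0, 0, 1, 1, 1]
w = relief_weights(X, y, n_samples=20)
print(w)
```

On this toy data the discriminating attribute receives the larger weight, which is the behavior a weight-based selector relies on when ranking features.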

Algorithm steps

Input: training data D and feature subsets \(f_{{n - 1}}\)

figure a

Feature selection measures the amount of information, in bits, that a feature provides about the class prediction when only the presence of the feature and the corresponding class distribution are available.

3.3 Optimized Pairwise Coupling Classification (OPCC)

OPCC is both a supervised learning method and a statistical method for classification. It assumes an underlying probabilistic model, which allows uncertainty about the model to be captured in a principled way by determining the probabilities of the outcomes. Features are ranked by information gain, and the best-ranked features are selected as the attributes used in the classification.

Algorithm steps:

figure b

The information gain of each attribute is computed in order to classify a set of data segments, and the attribute with the maximum information gain is selected. The classification result is then stored in the database using HMAC encryption with blockchain.
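Selecting the attribute with maximum information gain can be sketched as follows. The sketch assumes the standard entropy-based definition of information gain over categorical attributes; the attribute names, labels and helper functions are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical categorical records and class labels.
rows = [
    {"bp": "high", "fever": "yes"},
    {"bp": "high", "fever": "no"},
    {"bp": "normal", "fever": "yes"},
    {"bp": "normal", "fever": "no"},
]
labels = ["risk", "risk", "ok", "ok"]

gains = {attr: information_gain(rows, labels, attr) for attr in ("bp", "fever")}
best = max(gains, key=gains.get)
print(gains, best)
```

Here `bp` splits the records into pure groups and carries one full bit of information about the class, while `fever` carries none, so `bp` would be ranked first.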

3.4 HMAC Encryption and Blockchain Security Model

The Hash-based Message Authentication Code (HMAC) algorithm is implemented using bitwise operations and hash functions. HMAC can be calculated with any cryptographic hash function, and the resulting MAC algorithm is named after that function, for example HMAC-MD5 when MD5 is used. The security strength provided by the HMAC algorithm depends on the HMAC key, the underlying hash algorithm and the length of the MAC tag. The MAC is calculated by applying the HMAC function to the data, and the following operation is performed:

$$ {\text{HMAC~}}\left( {k_{1} ,k_{2} ,~{\text{data}}} \right) = {\text{~hash~}}(\left( {k_{1} ,{\text{~inner}}} \right){\text{~}}||{\text{~}}\left( {~k_{2} ,{\text{~outer}}} \right)){\text{~t}} $$
(8)

where t is the time, and \(k_{1}\) and \(k_{2}\) are a pair of keys (encryption and authentication).

Algorithm steps

Input: document, MK- master key, HMAC encryption key (k1, k2).

figure c
figure d

It should be easy to compute the MAC value \(h_{k}(x)\) for a given key k and an arbitrary input x (Fig. 4).

Fig. 4 Process of key generation

\(H_{k_{1}}(x)\) maps a message x of arbitrary length to a value with a fixed length of n bits.

It should be computationally infeasible to calculate the MAC value \(h_{k}(m)\) of a new message m if the key k is unknown, even when the MAC values of other messages are available.

The security of HMAC can be related to the cryptographic strength of the embedded hash function: the strength of the HMAC function depends, in a definite way, on the strength of the underlying hash function. The security of a MAC function is generally expressed in terms of the probability of a successful forgery, given the time spent by the forger and the number of message-MAC pairs created with the same key.
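The MAC properties described above can be illustrated with the `hmac` and `hashlib` modules from the Python standard library. This is a minimal sketch that assumes SHA-256 as the underlying hash and a single shared key; it does not reproduce the two-key, time-dependent construction of Eq. (8), and the key and message values are made up.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> str:
    """Return the HMAC-SHA256 tag of message under key as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


# Hypothetical key and record, for illustration only.
key = b"demo-master-key"
record = b"patient=42;heart_rate=88;temperature=37.5"

tag = make_tag(key, record)
print(len(tag))                                        # fixed-length output
print(verify_tag(key, record, tag))                    # unmodified record
print(verify_tag(key, record + b";bp=120", tag))       # modified record
print(verify_tag(b"other-key", record, tag))           # wrong key
```

The tag has a fixed length regardless of the message length, and verification fails when either the message or the key changes. Note that an HMAC tag provides integrity and authenticity; confidentiality of the record itself would need a separate cipher.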

3.5 Blockchain Network Construction Algorithm:

figure e
figure f

Following the algorithm steps above, the network group is formed, and users can upload documents to a centralized server in encrypted form with the help of HMAC encryption. HMAC authenticates the user with the master key, which is used to encrypt and decrypt the documents. The blockchain network incorporates patient medical data to diagnose and predict disease, and the resulting data are stored securely.
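The idea of storing authenticated records in hash-linked blocks can be sketched in a few lines. This is a simplified illustration under stated assumptions (SHA-256 block hashes, HMAC-SHA256 record tags, an in-memory list as the chain); the block layout and function names are hypothetical and do not reproduce the network construction algorithm of this section.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first block


def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, record, key):
    """Append a block holding the record, its HMAC tag and the previous hash."""
    prev_hash = block_hash(chain[-1]) if chain else GENESIS
    tag = hmac.new(key, record.encode(), hashlib.sha256).hexdigest()
    chain.append({"index": len(chain), "record": record,
                  "tag": tag, "prev_hash": prev_hash})


def verify_chain(chain, key):
    """Check every hash link and every record tag."""
    for i, block in enumerate(chain):
        expected_prev = block_hash(chain[i - 1]) if i else GENESIS
        expected_tag = hmac.new(key, block["record"].encode(),
                                hashlib.sha256).hexdigest()
        if block["prev_hash"] != expected_prev:
            return False
        if not hmac.compare_digest(block["tag"], expected_tag):
            return False
    return True


key = b"demo-master-key"  # hypothetical key
chain = []
for rec in ("heart_rate=72", "heart_rate=95", "heart_rate=88"):
    append_block(chain, rec, key)
print(verify_chain(chain, key))
```

Changing a stored record invalidates its tag and the hash link of the following block, so `verify_chain` returns `False` for a tampered chain.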

4 Result and Discussion

Statistics provide a strong background for quantifying and assessing results. However, statistics-based algorithms need to be modified and tuned before being applied to the IoT blockchain method. This section presents the results of the proposed technique for predicting disease using machine learning. The analysis uses cardiology disease data, and several parameters (temperature, heart rate and blood pressure) are considered to predict the disease level. Table 1 lists the simulation parameters of the proposed method.

Table 1 Simulation parameters

Classification accuracy is defined as the number of files with patient data that are correctly classified relative to the total number of files. The proposed method is evaluated with Eqs. (9)–(12) for classification accuracy, precision, recall and F1 score, together with security and authentication time analysis. The prediction accuracy of the Correlation Blockchain Matrix Factorization Classifier (CBMFC) is compared with the random forest, Bayesian classifier and SVM methods. Similarly, the security of the proposed Enhanced HMAC Blockchain (EHMACB) is compared with the existing Blowfish, MD5 and MidChain methods (Fig. 5).

Fig. 5 Execution time analysis

Data of different sizes (in bytes) are used to estimate the execution time of the proposed method. The proposed EHMAC is compared with the existing Blowfish, MD5 and MidChain methods. The proposed method authenticates users and stores the data on the network in 270 ms, a shorter execution time than the HMAC blockchain method.

Figure 6 compares the proposed and existing methods. In this security analysis, the proposed EHMACB provides 93.5% security, compared with 91.3% for MidChain, 85.6% for MD5 and 83.4% for Blowfish in the medical blockchain network. The machine learning performance of the clinical dataset classification is evaluated using the equations below.

$$ CA = \frac{{{\text{Number~of~correctly~classified~files}}}}{{{\text{Total~number~of~files}}}}*100 $$
(9)
$$ {\text{Precision}} = \frac{{{\text{true~positive~}}}}{{{\text{true~positive}} + {\text{false~positive}}}}*100 $$
(10)
$$ {\text{Recall}} = \frac{{{\text{true~positive~}}}}{{{\text{true~positive}} + {\text{false~negative}}}}*100 $$
(11)
$$ F1 = \frac{{2*{\text{Precision}}*{\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}} $$
(12)
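As a worked illustration of these metrics, the sketch below computes them from confusion-matrix counts. The counts are made-up numbers, not results from this paper, and F1 is taken as the harmonic mean of precision and recall, 2PR/(P + R); the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1, all expressed as percentages."""
    accuracy = (tp + tn) / (tp + fp + fn + tn) * 100
    precision = tp / (tp + fp) * 100
    recall = tp / (tp + fn) * 100
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for 100 classified files.
metrics = classification_metrics(tp=45, fp=5, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

With these counts, accuracy is 85%, precision is 90%, recall is about 81.8% and F1 is about 85.7%, which shows how F1 falls between precision and recall.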
Fig. 6 Comparison of security analysis

Table 2 compares the disease prediction performance of the existing and proposed methods. It reports the classification accuracy, precision and recall of the proposed CBMFC method and of the existing SVM, random forest and Bayesian classifier methods (Fig. 7).

Table 2 Analysis of proposed method prediction performance
Fig. 7 Prediction performance of the proposed method

Using a pairwise coupling method to evaluate a two-pair feature matrix improves classification and prediction accuracy. The proposed machine learning method performs better than the SVM, random forest and Bayesian classifier methods.

5 Conclusion

A patient's cardiology disease needs to be diagnosed accurately and in time. The proposed Correlation Blockchain Matrix Factorization Classifier (CBMFC) analyzes IoT data for disease prediction. First, correlation matrix preprocessing is applied to remove noise from the IoT dataset. Then the CBMFC method learns from the training data and predicts the disease. Blockchain Enhanced Hash-based Message Authentication Code (EHMAC) encryption provides security: it encrypts the user request, and records are stored on blocks. In simulation, the proposed method achieves 90% classification accuracy, 86% precision, 87% recall and 93.5% security with an execution time of 270 ms, performing more efficiently than the existing methods. The proposed method can be implemented in hospitals to analyze disease and to secure the data from unknown persons.