1 Introduction

The 4th Industrial Revolution has placed substantial focus on information and communications technologies (ICT), such as artificial intelligence (AI), the Internet of Things (IoT), big data, and blockchain. In the healthcare industry, there has been a renewed interest in telemedicine and other smart healthcare services [1,2,3]. AI is a technology through which machines have the ability to understand, perceive, and judge like humans [1,2,3]. There have already been several studies suggesting that AI can perform as well as or better than humans at key healthcare tasks, such as diagnosing disease. IoT technology is a hyper-network in which people and objects are connected to the Internet, thus allowing them to create, collect, share, and use information. Even without human intervention, many intelligent devices are capable of making decisions, working as groups, and sending information to the cloud automatically [1, 2]. Big Data refers to the vast amounts of data generated and collected in everyday activities. In the medical field, big data is used for the development of new drug and treatment recommendation services [4]. Blockchain is an innovative technology that does not store transaction books called blocks on a central server, but instead distributes them on personal computers which are then connected like chains for public storage. This is useful for personal health information because it is secure and because it enables individual transactions without the need for intermediaries [5, 6]. These ICT technologies collect, integrate, and store medical data such as Electronic Medical Records (EMR) and genomic data to make it big data, thus making it very easy to be used for various analyses and other purposes. The convergence of big data and artificial intelligence technology is expected to be highly useful in the development of digital health care and smart services based on vast amounts of medical and health data.

Advances in medicine have prolonged life expectancy, leading to an increase in the elderly population. The World Health Organization (WHO) reported that, with a rapidly ageing population, the number of chronic diseases can be expected to increase as well. Stroke is among the most prevalent of these diseases; according to the WHO, 5.7 million people died from stroke in 2016, ranking it third after malignant neoplasm (cancer) and heart disease [7]. Stroke occurs when cerebral arteries become clogged or burst, resulting in necrosis of brain cells and, in the worst case, death [8, 9]. Depending on the location and type of stroke, brain dysfunction such as hemiplegia and speech or consciousness disorders may be observed. Stroke is known to cause disability in elderly and younger adults alike [9, 10]. Patients can recover from a stroke if they are identified quickly and transferred to a medical institution for treatment. Therefore, a quick and accurate diagnosis is essential.

Because its symptoms are so varied, stroke is difficult to detect early. While there has been substantial research into stroke risk factors, differences in definitions and methodology affect prediction accuracy. The main risk factors that are consistently used for stroke prediction are smoking, diabetes, high blood pressure, and obesity [10]. The Framingham Heart Study developed a stroke risk prediction model based on cardiovascular disease and cerebrovascular disease [11]. However, it is difficult to apply such models to elderly Koreans, whose social and behavioral characteristics differ considerably from those of the participants in those studies. In response, Jee et al. [13] and Yu et al. [14] developed stroke prediction models for Koreans. Jee et al. [13] developed a model that predicts the probability of stroke within 10 years (a 10-year average stroke risk prediction model) using EMR items such as age, diabetes, alcohol consumption, smoking, total cholesterol, and body mass index. Meanwhile, Yu et al. [14] attempted a semantic interpretation of stroke patients using the NIHSS (National Institutes of Health Stroke Scale) values of elderly patients at Chungnam National University Hospital. However, these initial studies used the same methodology as the Framingham Heart Study, which does not consider the possibility of death from, or the competing risk of, causes other than stroke. While past medical records and health check-up information are important, there is still a need to assess the degree of risk in daily life. In other words, comprehensive analysis and prediction systems using patterns and biosignals are urgently required.

Walking is the most basic exercise for the human body [15, 16]. Gait is an alternating motion in which many skeletal muscles move the body in a constant direction through cooperation with various joints of the upper and lower extremities [15, 16]. Research has shown that stroke patients may suffer from a loss of symmetric posture, decreased walking ability, and impaired balance response and motor ability [17, 18]. Limited walking ability often leads to social isolation [17] and reduces the patient’s daily activities [19]. For stroke patients, recovering walking ability is an important factor that determines quality of life. Consequently, walking patterns have emerged as a key component of both the early prediction of stroke onset and post-stroke rehabilitation [20]. Ideally, stroke should be detectable as soon as a person’s walking pattern changes, which calls for a precise and quick diagnostic method that detects gait asymmetry.

In this paper, we designed and implemented a new AI system that automatically extracts important features from collected motion data and predicts stroke disease in real time. Motion sensors were attached to all of the study participants, and data were collected from elderly stroke patients and elderly controls during walking. The collected data were sent to a server, which predicts and analyzes stroke using machine learning or deep learning algorithms. Depending on the algorithm, our system was able to predict stroke using the angle, angular velocity, and angular acceleration of the shoulders and quadriceps. The performance of the proposed system was verified by prediction accuracies of 98.25% for the C4.5 decision tree model, 98.72% for RandomForest, 96.60% for XGBoost, and 98.99% for LSTM. Altogether, this study presents a machine learning-based model that can accurately detect and predict the prognostic symptoms of stroke disease in real time. This system overcomes the limitations of previous models that require medical history and EMR data.

The rest of this paper is structured as follows. Section 2 describes stroke and previous research methods. Section 3 describes the stroke disease prediction system using motion biosignals obtained while walking, and Section 4 describes the experimental and analysis results in detail. Finally, Section 5 presents the conclusion and future research topics.

2 Related work

2.1 Stroke disease

Stroke is a brain disorder caused by the sudden onset of neurological deficiency due to the blockage or bursting of blood vessels in the brain [21, 22]. Stroke patients generally have permanent dysfunction or complications affecting motor, language, sensory, perceptual-cognitive, and visual function, as well as quadriplegia [22, 23]. There are two types of stroke: cerebral infarction (ischemic), caused by the blockage of blood vessels, and cerebral hemorrhage (hemorrhagic), caused by the rupture of blood vessels [23]. Cerebral infarction is further divided into thrombosis, wherein blood clots form in the blood vessels damaged by arteriosclerosis, and cerebral embolism, wherein blood clots that have formed in the large arteries block the blood vessels to the brain. Meanwhile, cerebral hemorrhage can be further divided into intracerebral hemorrhage and subarachnoid hemorrhage. Intracerebral hemorrhage, which is bleeding into the brain parenchyma, occurs spontaneously without external shock, and hypertension is its main cause. Subarachnoid hemorrhage is a disease in which a part of a blood vessel wall that has lost elasticity swells into a sac-like shape (a cerebral aneurysm), after which the aneurysm ruptures and blood leaks under the arachnoid membrane surrounding the brain [23, 24]. In particular, subarachnoid hemorrhage has been reported to be fatal enough to cause death before reaching the hospital in more than 30% of stroke patients [24].

Acute stroke disease causes disorders in the autonomic nervous system and central nervous system, which are accompanied by heart problems such as arrhythmia visible in the electrocardiogram [25]. Acute stroke is a potentially fatal disease that can also cause permanent dysfunction and complications, which in turn cause difficulties in social or economic activities [26]. In particular, 85% of elderly patients in the early stages of stroke show upper limb disability, while 55–75% report upper limb disability more than six months after onset [27]. The weakening of upper limb muscle strength is caused by decreased nerve response and changes in the median nerve boundary, as well as muscle weakness due to inactivity [28]. Most social activities are restricted by physical dysfunction, resulting in a poor quality of life. In addition, low motivation leads to mental problems such as depression, anger, and loss of pleasure, all of which adversely affect functional recovery in stroke patients [29]. It is therefore necessary to detect stroke symptoms that occur in the daily lives of the elderly at an early stage, respond quickly, and obtain a thorough diagnosis at an appropriate medical institution. Research is also needed to help minimize the social and economic damage caused by the aftereffects of stroke. Multilateral studies on the immediate detection and diagnosis of stroke diseases and on various rehabilitation treatments are urgently needed to improve the quality of life of the elderly.

2.2 Previous stroke studies on motion

The human autonomic nervous system, sympathetic nervous system, and musculoskeletal system are integral to walking, and they presuppose human body movements [16]. Energy consumption under a walking load decreases as the load approaches the center of the body. In walking, the feet are the only part of the human body in contact with the supporting ground, and they therefore play a crucial role in bearing all weight loads [30]. The foot not only provides the driving force necessary for body movement when walking, but it also has a mechanism that adapts to irregular ground by absorbing the physical impact that occurs at such times. Walking and upright posture are inherently unstable because a high center of pressure is maintained over the relatively small base of support provided by the feet. To provide additional stability, the muscles of the waist and legs are activated. This posture adjustment is very complex when moving in an unstable state, and it shows high muscle activity because it uses many muscles [31]. Walking ability can be measured in both qualitative and quantitative terms; gait measurement (gait cycle, cadence, swing period, stance period, step length, stride length, and walking speed) provides a quantitative perspective [20, 32, 33]. Regarding qualitative measurements, walking asymmetry is a typical measure, and it occurs when the base and raised angles of both feet do not match each other during walking. Walking speed is a typical quantitative measure, and it has the advantage of being relatively easy and simple to measure while ensuring reliability.

For stroke patients, walking asymmetry can typically be observed as a significant difference in the degree of asymmetry in the normal state through changes in the movement of the upper and lower extremities [32, 33]. Akay et al. [34] developed a technique to measure and quantify acceleration information comparing between stroke patients and healthy elderly while walking. They introduced a maximum likelihood estimator (MLE)-based fractal analysis for the complexity of body motion, and the results showed that people with stroke had significantly higher values than healthy people. Chen et al. [35] were able to classify stroke patients based on data from daily living activities collected using wearable devices. In that paper, daily activity data were analyzed, and 11 stroke subjects were selected. The collected data were subjected to machine learning algorithms such as Decision Tree, RandomForest, Support Vector Machine (SVM), and eXtreme Gradient Boosting (XGBoost). The results indicated that XGBoost can accurately predict seven daily activities. Carmo et al. [36] reported a statistical analysis of the changes in arm and leg movement as well as motion values during walking in stroke patients and control groups. The experimental results confirmed that stroke patients had a reduced range of motion in the glenohumeral and elbow joints when walking. Motion data measured during walking focuses on statistical and correlation analysis. Therefore, it can determine whether there is asymmetry due to stroke based on motion data collected in real time while walking. It is essential to research methods that can quickly diagnose stroke before receiving treatment from medical staff.

2.3 AI-based stroke research

Several researchers have applied AI technology to the field of predicting and diagnosing stroke [14, 37, 38, 39, 40, 41, 42, 43, 45]. For example, Yu et al. [14] published the results of an analytic and prediction stroke model based on the severity of NIHSS using a C4.5 decision tree algorithm, a representative predictive model of data mining. The rules of the operating principle provided by the C4.5 decision tree were analyzed in detail. However, there is still a need for an in-depth analysis of the data, as the rationale of the decision tree algorithm only provides a partial resolution. Shanti et al. [37] were able to detect the risk of stroke using an artificial neural network (ANN) algorithm. The paper showed that ANN's Backpropagation algorithm was used for learning, and that the consistency and diagnostic accuracy of the prediction was improved. A similar method by Hanifa et al. [38] used an ANN-based predictive model that can detect stroke patients with high accuracy of 95.33% from the experimental data of 300 people. Nevertheless, the classification and prediction of stroke diseases based on the ANN algorithm is still difficult to interpret, and its operating principles only emphasize accuracy. Hanifa et al. [39] presented the results of predicting and verifying stroke risk factors by adjusting the parameter values of the support vector machine (SVM) prediction model. A relatively accurate model which detailed the stroke risk situation was presented using the RBF kernel function of the SVM. However, the focus of that model was on determining the severity and prognosis after the onset of stroke, rather than detecting and predicting symptoms before a stroke. Chiun et al. [40] developed a system for detecting ischemic stroke using the extended patch images of CT scans as the input of the CNN model; it obtained greater than 90% prediction accuracy. Liu et al. [41] proposed a Res-CNN model that automatically classifies acute ischemic stroke lesions in multi-modality MRIs. The results confirmed that this model solved the problem of performance degradation using the residual unit, and that the model performance can be improved through data expansion. Chantamit et al. [43] introduced a method of integrating ICD-10 [44] codes into health records and integrating potential risk factors into predictive patterns and models in EMR information. Based on the integrated EMR information, a deep-learning LSTM-RNN model was applied for stroke prediction.

Recent studies have used different human biosignals to determine and predict stroke diseases in the elderly [23, 42, 43, 44, 45]. For example, Choi et al. [23] used Electroencephalogram (EEG) biosignals to detect and predict stroke precursor symptoms. First, the signal at each timepoint was decomposed from raw EEG data with the Fast Fourier Transform (FFT), giving values for alpha (α), beta (β), gamma (γ), delta (δ), and theta (θ) for six EEG channels. In addition, the ratio values between low β, high β, and θ were extracted and used with the RandomForest algorithm to obtain a stroke prediction accuracy of up to 92.51%. Yu et al. [42] proposed an early stroke detection method based on machine learning and deep learning that measures the electromyography (EMG) of thighs and calves during walking in daily life. They presented a methodology that can accurately detect and predict stroke precursor symptoms with EMG data collected in real time, as opposed to past information such as EMR or CT scans. The experimental results for elderly daily activities were verified with accuracies of 90.38% and 98.958% using the RandomForest algorithm. Choi et al. [45] proposed a deep learning model that can predict stroke diseases using raw EEG data that has only been subjected to basic preprocessing, without extracting frequency attributes. These studies confirmed that the LSTM model of deep learning is suitable for the time series analysis of biosignals. However, for biosignals measured and generated in daily life, such as during walking or driving, the convenience of wearing the sensors and the reliability of the data are paramount. Therefore, there is a need for a new alternative for predicting stroke diseases in the elderly during daily activities, one that can overcome the limitations of the traditional methodology as well as the disadvantages of biosignal measurement and collection.

3 Stroke disease prediction system using motion information while walking

We propose an AI-based stroke disease prediction system that uses motion information collected during the daily activities of the elderly. Figure 1 shows the system, which consists of five modules: (1) measures and collects motion data in real time; (2) preprocesses the collected motion data and extracts important attributes; (3) integrates the collected motion data with ECG, EMG, EEG, and individual EMR data and stores them in a distributed manner; (4) predicts stroke disease; and (5) visualizes the results. For the motion data and other biosignals (such as ECG, EMG, and EEG) collected from the elderly during daily activities, the proposed system applies preprocessing and attribute extraction algorithms optimized for each signal and stores the results. Data for each of the biosignals are used as input values for both machine learning and deep learning models to provide optimal real-time prediction and analysis results for stroke diseases.

Fig. 1

The overall structure of the real-time stroke disease prediction system using motion biosignals

3.1 A module for measuring and collecting motion data while walking

In addition to the motion data, other biosignals such as ECG, EEG, and EMG were measured to predict stroke diseases in the elderly. All biosignals were measured in real time using the Captiv motion analyzer [46, 47]. The collected data were used to verify the prediction and analysis for stroke in the elderly. Figure 2 shows the attachment locations of all the wearable sensors on each subject.

Fig. 2

The location of each sensor for collection of biosignals in real time

Figure 3 shows the specific locations of sensors for motion biosignals. The Captiv motion analyzer was chosen for its high accuracy (gyroscope ± 2000°/s, accelerometer ± 16 g, and magnetometer ± 2.5 gauss) and because it has been used in many medical and rehabilitation fields. To maximize the measurement accuracy, the sampling rate was set to 128 Hz. Raw data of four quaternions (qx, qy, qz, and qw) were collected from six locations on the body: (1) left and right shoulders, (2) left and right quadriceps, (3) back, and (4) waist.

Fig. 3

Real-time measurement and collection of motion biosignals

3.2 Pre-processing of motion data and extracting important attributes module

The motion data measured at the six locations (Fig. 3) are transmitted to a smartphone or gateway through BLE communication. Other methods, such as Wi-Fi, LTE, and 5G, can also be used depending on the system settings. Messages are formatted as JSON (JavaScript Object Notation). Raw data received from the measurement module that were incomplete or contained missing values were removed. Since the ranges between the minimum and maximum values differ for each type of data, a normalization process was also performed. The system was designed to extract the attribute values of angle, angular velocity, and angular acceleration based on the quaternion values of qx, qy, qz, and qw for each position. Table 1 summarizes the raw motion data extracted and collected in this paper with some additional explanations.

Table 1 Description of 24 attributes and meanings extracted from motion raw data
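The cleaning and normalization steps described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper specifies only that messages are JSON and that values are normalized, so the flat message layout keyed by `qx`, `qy`, `qz`, `qw`, the function names, and the choice of min-max scaling are all hypothetical.

```python
import json
import math

QUAT_KEYS = ("qx", "qy", "qz", "qw")  # quaternion components named in the paper


def parse_samples(messages):
    """Decode JSON messages and drop incomplete or malformed samples.

    The flat message layout is an assumption made for this sketch.
    """
    samples = []
    for msg in messages:
        try:
            record = json.loads(msg)
        except json.JSONDecodeError:
            continue  # skip messages that are not valid JSON
        if not isinstance(record, dict):
            continue
        values = [record.get(key) for key in QUAT_KEYS]
        # Keep a sample only if all four components are finite numbers.
        if all(
            isinstance(v, (int, float))
            and not isinstance(v, bool)
            and math.isfinite(v)
            for v in values
        ):
            samples.append([float(v) for v in values])
    return samples


def min_max_normalize(samples):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    if not samples:
        return []
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ]
        for row in samples
    ]
```

In a deployment, the minimum and maximum values would presumably be fixed from training data rather than recomputed per batch, so that real-time samples are scaled consistently.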

By using the quaternion value for each motion data listed in Table 1, the angle, angular velocity, and angular acceleration properties were extracted for each measurement position and used in the experiment (Table 2). Since the waist and back are reference positions, separate attribute values were not extracted for these.

Table 2 Important attributes and descriptions extracted from motion data
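To make the attribute extraction concrete, the sketch below derives an angle, an angular velocity, and an angular acceleration from a sequence of quaternions. The paper does not give the formulas it used, so two assumptions are made here: the angle is taken as the rotation angle encoded by the normalized quaternion, θ = 2·arccos(|qw|), and the derivatives are first-order finite differences at the 128 Hz sampling rate. The function names are hypothetical.

```python
import math

SAMPLING_RATE_HZ = 128  # sampling rate stated in the paper


def rotation_angle_deg(qx, qy, qz, qw):
    """Rotation angle in degrees encoded by a quaternion (assumed definition)."""
    norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if norm == 0.0:
        raise ValueError("zero-length quaternion")
    # Clamp to guard against rounding slightly above 1.0.
    cos_half = min(1.0, abs(qw) / norm)
    return math.degrees(2.0 * math.acos(cos_half))


def finite_difference(values, rate_hz=SAMPLING_RATE_HZ):
    """First-order difference scaled by the sampling rate (units per second)."""
    return [(b - a) * rate_hz for a, b in zip(values, values[1:])]


def motion_features(quaternions, rate_hz=SAMPLING_RATE_HZ):
    """Return (angles, angular velocities, angular accelerations) for one sensor."""
    angles = [rotation_angle_deg(*q) for q in quaternions]
    velocities = finite_difference(angles, rate_hz)
    accelerations = finite_difference(velocities, rate_hz)
    return angles, velocities, accelerations
```

Applied to the four non-reference sensors (left and right shoulders, left and right quadriceps), three attributes per sensor give 4 × 3 = 12 attributes, which matches the number of motion attributes used in Sect. 4.3.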

3.3 Module for integrating and distributed storage of data by multiple biosignals

The motion data-based stroke disease prediction system proposed in this paper can collect and manage various types of biosignals, such as motion, ECG, EEG, EMG, and PPG. These biosignals, including motion, are time-ordered sequences of values. From these time-series data, the attribute values suitable for predicting stroke in the elderly are extracted according to Sect. 3.2. For the real-time storage of raw data for each biosignal, the system was designed and developed to use integrated, distributed storage based on MongoDB, a representative Not Only SQL (NoSQL) database. Depending on the biosignal, statistical information and important attributes are extracted from the raw or processed data. These are distributed and stored in a Relational Database (RDB) and NoSQL according to their characteristics. The raw data and important attribute values stored in the RDB and NoSQL can be used for disease prediction and multifaceted analysis based on machine learning and deep learning.

3.4 Machine learning and deep learning-based learning and real-time prediction modules

Real-time motion data collected from the elderly are time-series data that have sequential values over time, so their time-series properties must be taken into account. Therefore, the LSTM model of deep learning, which can express the properties of time-series data, was used to predict stroke diseases. The LSTM method has the advantage of solving the long-term dependence problem of RNN in deep learning. Further, predictive models based on machine learning were built from the angle, angular velocity, and angular acceleration attributes extracted at each measurement location. The machine learning approach has the additional advantage of providing interpretive information about stroke disease alongside predictive performance.

The machine learning- and deep learning-based learning and prediction module proposed in this paper constitutes the following two subblocks (see Fig. 4): first, in the batch processing block, machine learning and deep learning are performed by storing and preprocessing motion data collected in real time to extract important attributes. In the real-time processing block, stroke diseases are predicted as soon as motion data are measured and collected during walking of the elderly, and the risk values are provided to the medical staff, the patient themselves, their next-of-kin, and the visualization module described in Sect. 3.5. In the real-time processing block, raw motion data of more than 3 s were collected, features were extracted and predictions were executed. Through the experiments, it was confirmed that the extracted features for machine learning-based algorithms and raw data for deep learning-based algorithms, provided prediction results within 0.05 s.
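The real-time processing block can be illustrated with a small buffering sketch. At the stated 128 Hz sampling rate, 3 s of raw motion data corresponds to 3 × 128 = 384 samples. The class name, the `predict_fn` callable, and the sliding-window behavior (a new prediction for every sample once the buffer is full) are assumptions made for illustration; the paper does not describe how its buffer is implemented.

```python
from collections import deque

SAMPLING_RATE_HZ = 128  # stated in the paper
WINDOW_SECONDS = 3      # minimum amount of raw data before a prediction
WINDOW_SIZE = SAMPLING_RATE_HZ * WINDOW_SECONDS  # 384 samples


class RealTimePredictor:
    """Buffers incoming samples and predicts once a full window is available."""

    def __init__(self, predict_fn, window_size=WINDOW_SIZE):
        self._predict_fn = predict_fn  # hypothetical trained-model callable
        self._buffer = deque(maxlen=window_size)

    def push(self, sample):
        """Add one sample; return a prediction, or None while still buffering."""
        self._buffer.append(sample)  # the oldest sample is dropped when full
        if len(self._buffer) < self._buffer.maxlen:
            return None
        return self._predict_fn(list(self._buffer))
```

For a machine learning model, `predict_fn` would first extract features from the window; for a deep learning model, it would pass the raw window to the network directly.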

Fig. 4

Machine learning and deep learning-based learning and real-time prediction modules

3.5 A module that visualizes and provides prediction and analysis results

The visualization and prediction information of stroke created using the motion and the various biosignals obtained during walking are provided (see module on the right). The screen is designed to provide predictive results and semantic analysis information of stroke to the medical staff, the patients themselves, and their next of kin. According to the judgment of stroke risk analysis, medical staff or hospitals may support the patient by contacting the patient and quickly transporting them to the hospital to conduct precise examination and diagnosis services. However, various clinical difficulties and risks exist in predicting and determining stroke diseases using raw data and the important attributes of motion. The judgment of an experienced medical staff with professional medical knowledge is essential in distinguishing whether an elderly person is having a stroke based on various biosignals, including motion. Consequently, this study conducted experiments and verification to provide scientific and meaningful information that enables the faster and more accurate judgment of medical staff and hospitals, rather than a system that directly predicts the precursors and risk of stroke in elderly patients.

4 Experiment and analysis

4.1 Data set and experimental environment

This section describes the verification and performance of the system for the prediction of stroke using motion. Sensors were placed at six locations on each subject. The waist and back sensors were used as reference points, while real-time data were collected from the left and right shoulders and quadriceps at a sampling rate of 128 Hz. These positions were chosen because they allow the joint movement and the movement speed of the upper and lower body to be measured in a patient with stroke symptoms during walking. To elaborate, the movement and range of motion of the upper body can be checked both while standing and during walking, and the movement speed can be determined from changes in the main joints and muscles around the pelvis in the lower body. The raw data were sent to the server for processing, where the angle, angular velocity, and angular acceleration were calculated automatically. These motion biosignals are widely used in the study of gait changes and rehabilitation exercises in patients with hemiplegia. In the case of stroke, patients’ walking patterns and related factors are analyzed based on subtle symptoms, such as a decrease in walking speed and an increase in the time the foot spends on the ground compared with healthy individuals. The angle, angular velocity, and angular acceleration values of the shoulders and quadriceps are important indicators of the precursor symptoms of stroke. Therefore, in this paper, we analyze walking disorders caused by subtle symptoms or balance abnormalities during walking using motion data that can serve as medical and kinematic parameters.

The motion data of elderly stroke patients and healthy elderly controls were obtained at the Chungnam National University Emergency Center and Department of Rehabilitation Medicine. From 2017 to 2018, various biosignals such as ECG, EEG, and EMG, as well as motion, were measured and collected from elderly participants over 65 years old [42, 45]. Patients who were confirmed to have had a stroke within one month and who were undergoing rehabilitation treatment were selected as stroke subjects. In 2017, data were obtained from 48 elderly stroke patients and 75 controls; in 2018, data were obtained from 13 stroke patients and 137 controls. To balance the numbers of stroke patients and controls, all 61 stroke patients and 61 randomly selected controls were used. All subjects repeated each scenario (walking, standing, sitting in a chair, raising and lowering their arms, speaking, and sleeping) a total of five times. The first and last measurements were excluded from the experimental data, as the subjects may have experienced discomfort from wearing the sensors, tension, or fatigue from repetition. The model development in this paper was conducted in an environment using Ubuntu 18.04.4 LTS, an Intel Core i9-10900X CPU, an NVIDIA Quadro RTX 8000 GPU, and 256 MB RAM.
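The subject selection above amounts to random undersampling of the control group (48 + 13 = 61 stroke patients against 75 + 137 = 212 controls) followed by trimming the first and last of the five repetitions. A minimal sketch is given below; the function names and the fixed seed are assumptions, since the paper does not state how the random selection was performed.

```python
import random


def balance_by_undersampling(stroke_ids, control_ids, seed=0):
    """Randomly draw as many controls as there are stroke subjects."""
    stroke_ids = list(stroke_ids)
    control_ids = list(control_ids)
    if len(control_ids) < len(stroke_ids):
        raise ValueError("not enough controls to match the stroke group")
    rng = random.Random(seed)  # the seed is an assumption; the paper gives none
    return stroke_ids, rng.sample(control_ids, len(stroke_ids))


def drop_first_and_last(repetitions):
    """Discard the first and last repetition of a scenario."""
    return list(repetitions[1:-1])
```

With five repetitions per scenario, trimming leaves the three middle measurements for each subject.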

4.2 Performance indicators of experimental results

This section defines the performance evaluation indicators used to verify the performance of the stroke system with motion data [14, 23, 42, 45, 48].

  • Accuracy: The percentage of all subjects who are classified correctly, that is, stroke patients who test positive and control subjects who test negative.

  • F1-score: The harmonic mean of precision and recall.

  • Recall: The percentage of stroke patients that have tested positive.

  • Precision: The proportion of people who are stroke patients among those who have tested positive.

When a stroke patient is misclassified as a normal person, the untreated disease can have an impact on that person’s daily life. Therefore, accuracy is the most important performance indicator in the field of disease prediction and healthcare. A predictive model should minimize the rate of misclassification of stroke patients as normal people (see Table 3).

$$ {\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{FN}} + {\text{FP}} + {\text{TN}}}} $$
(1)
$$ {\text{F1 - Score}} = 2 \times \frac{{{\text{Precision}} \times {\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}} $$
(2)
$$ {\text{Recall}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}} $$
(3)
$$ {\text{Precision}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}} $$
(4)
Table 3 Confusion matrix of performance evaluation for stroke prediction
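Equations (1) to (4) can be computed directly from the four confusion-matrix counts. The sketch below is illustrative only: the function name is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the paper specifies.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, recall, precision, and F1-score from confusion counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0           # Eq. (1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0            # Eq. (3)
    precision = tp / (tp + fp) if (tp + fp) else 0.0         # Eq. (4)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0    # Eq. (2)
    return {
        "accuracy": accuracy,
        "f1": f1,
        "recall": recall,
        "precision": precision,
    }
```

For example, with made-up counts TP = 90, FN = 10, FP = 5, and TN = 95, accuracy is 185/200 = 0.925 and recall is 90/100 = 0.9. The 10 false negatives are the stroke patients misclassified as normal, which is the error the text above singles out.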

4.3 Experiment and analysis based on machine learning

In this section, the attribute values of angle, angular velocity, and angular acceleration for each measurement location were put through a machine learning methodology. This experiment used C4.5 decision tree, C5.0 decision tree, alternating decision tree, RandomForest, logistic regression, naïve Bayes, multi-layer perceptron, SVM, C&RT, XGBoost, and QUEST. Each machine learning-based stroke disease prediction model attempted to achieve the highest accuracy based on different learning and prediction datasets. Two different experiments were conducted. In these experiments, 67% of the randomly extracted data were used for learning and the rest of the data were used for testing. Likewise, data extracted by the same method at a rate of 80% were used for learning, and the remaining 20% of data were used for testing. Finally, the entire dataset was divided into 5-Fold, 10-Fold, and 20-Fold Cross-Validation, and the predictive model verification and experiment were conducted.
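The data partitioning schemes described above (67%/33% and 80%/20% hold-out splits, and k-fold cross-validation) can be sketched with the standard library alone. The function names and the seed are assumptions. The sketch splits by sample index; the paper states only that the data were randomly extracted and does not say whether the split was made per sample or per subject.

```python
import random


def holdout_split(n_samples, train_fraction, seed=0):
    """Return (train_indices, test_indices) for a random hold-out split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Calling `holdout_split(n, 0.67)` and `holdout_split(n, 0.80)` reproduces the two hold-out ratios, and `k_fold_splits(n, k)` with k = 5, 10, or 20 reproduces the three cross-validation settings.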

In total, 12 motion attribute values (Sect. 3.2) were used in this experiment. Since the sensors sample motion data at 128 Hz, attribute values were extracted in units of 0.1 s. In total, 283,734 attribute-value cases (141,867 per group) were extracted from 122 subjects, comprising 61 stroke patients and 61 elderly controls, and tested. The four performance indicators defined in Sect. 4.2 were used to verify the performance in this experiment. Tables 4 and 5 summarize, for each data partition and each machine learning algorithm using the 12 motion attribute values, the prediction accuracy, F1-score, recall, and precision.

Table 4 Predictive accuracy and F1-score (%) for each algorithm using 12 attribute values of motion
Table 5 Recall and Precision (%) for each algorithm using 12 attribute values of motion

Next, the 12 attributes of motion data were tested after normalization using the Z-score method (Tables 6 and 7). Normalization was applied because the difference between the maximum and minimum values is very large across the 12 attributes. This creates a bias arising from the measurement units, which adversely affects learning. Converting every attribute to a common scale (zero mean and unit standard deviation) through normalization gives each attribute the same weight, which leads to more stable learning and prediction results. In Eq. (5) below, σ and μ refer to the standard deviation and mean of the attribute x, respectively, and α is a weight value, which was set to 1.0 in the experiments conducted in this paper.

$$ \overrightarrow {{x_{i} }} = \frac{{x_{i} - \mu }}{\sigma } \times \alpha $$
(5)
Table 6 Predictive accuracy and F1-score (%) for each algorithm applying Z-score to 12 attributes of motion
Table 7 Recall and precision (%) by algorithm applying Z-score to 12 attributes of motion
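
A minimal Python sketch of the normalization in Eq. (5) is given below. The use of the population standard deviation, the handling of constant attributes, and the example values are assumptions made for this sketch.

```python
from statistics import fmean, pstdev


def z_score(values, alpha=1.0):
    """Eq. (5): (x_i - mu) / sigma * alpha, applied to one attribute."""
    mu = fmean(values)
    sigma = pstdev(values)  # assumed: population standard deviation
    if sigma == 0:
        # A constant attribute carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma * alpha for x in values]


# Hypothetical attribute values with mean 5 and standard deviation 2.
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(z_score(raw))
```

With α = 1.0, the transformed attribute has zero mean and unit standard deviation, so attributes measured in different units become directly comparable.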

After normalization, the overall prediction accuracy for stroke in the elderly improved by about 1.0% (Table 6). The most accurate prediction model was RandomForest, with 98.72% accuracy. For the RandomForest algorithm, the number of trees was set to 50, the random seed to 1, the tree depth to 30, and the minimum number of node splits to 2.

4.4 Experiment and analysis based on deep learning

Deep learning experiments to predict stroke were performed with LSTM models because of their proven track record in time series analysis [42, 45]. LSTM overcomes a structural shortcoming of the conventional RNN: it is designed to mitigate the vanishing gradient problem, in which the gradient shrinks as the error is backpropagated through many time steps [42, 45]. An LSTM cell consists of a cell state together with input, forget, and output gates, and its structure transfers past information to the next state. The vector output by each gate is generated through sigmoid and tanh layers. As a result, the LSTM learns long-term dependencies, with the cell state carrying information from past steps on to the next state. All 24 raw data points from Sect. 3.2 were used. To generate a predictive model, learning data were randomly extracted, and the remaining data not used for learning were used for testing. To verify the experimental and prediction results, the data were split at ratios of 67% to 33% and 80% to 20%.
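
The gate structure described above can be illustrated with a single time step of a standard LSTM cell. The sketch below uses scalar input and state for readability; the weight layout, the parameter names, and the numerical values are hypothetical and do not correspond to the trained model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a standard LSTM cell with scalar input and state.

    params maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to admit
    g = gate("candidate", math.tanh)   # candidate values for the cell state
    o = gate("output", sigmoid)        # how much of the cell state to expose

    c = f * c_prev + i * g             # updated cell state
    h = o * math.tanh(c)               # updated hidden state
    return h, c


# Hypothetical parameters and a short hypothetical input sequence.
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, -0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.4, 0.2, 0.0),
}
h, c = 0.0, 0.0
for x in [0.1, 0.4, -0.3]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Because the cell state is updated additively (f · c_prev + i · g) rather than being repeatedly squashed, information from earlier time steps can persist over long sequences.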

The main parameters used to verify the performance of the motion data-based LSTM are described as follows. Among the parameters, nUnit is the number of cells in the LSTM network, and Iteration is the total number of learning iterations. The learning rate is a scheduling value that adjusts the speed of learning. It is important to set an appropriate value, because an unsuitable learning rate fails to reduce the learning error: overfitting or underfitting may occur when learning progresses too fast or too slowly. "1st Decay LR" (learning rate) helps prevent overfitting and induces a stable reduction of the learning error by multiplying the initial learning rate by 0.1 once the corresponding number of learning iterations is reached. "2nd Decay LR" was used for stable convergence toward the optimal predictive model by multiplying the learning rate already reduced by "1st Decay LR" by 0.1 again. Hidden node is a parameter that sets the number of hidden nodes included in one LSTM cell. The Adam optimizer, which showed stable prediction accuracy and loss among the optimizers tested with the LSTM, was selected to predict stroke in the elderly. As presented in Table 8, when 80% of the data were randomly extracted for learning and the remaining 20% were used for validation, a stroke prediction accuracy of 98.994% was obtained.
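
The two-stage learning rate decay described above can be sketched as a simple step schedule. The function name and the decay iterations used in the example (3000 and 4000) are hypothetical placeholders; only the decay factor of 0.1 is taken from the description.

```python
def learning_rate(iteration, initial_lr, first_decay_at, second_decay_at, factor=0.1):
    """Step schedule: multiply the rate by `factor` at each decay point reached."""
    lr = initial_lr
    if iteration >= first_decay_at:   # "1st Decay LR"
        lr *= factor
    if iteration >= second_decay_at:  # "2nd Decay LR"
        lr *= factor
    return lr


# Hypothetical decay points, for illustration only.
for it in (0, 2999, 3000, 4000, 4999):
    print(it, learning_rate(it, 0.005, first_decay_at=3000, second_decay_at=4000))
```

Large early steps reduce the error quickly, while the two reductions let the optimizer settle near a minimum instead of oscillating around it.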

Table 8 Accuracy of predicting stroke diseases based on LSTM using motion raw data (%)

Figure 5 shows the prediction accuracy and error rate for the 80/20 data split in the 7th experiment in Table 8. In Fig. 5b, the error decreases with each iteration, indicating stable stroke prediction. The LSTM model showed its best predictive performance with 5000 learning iterations, a learning rate of 0.005, and the number of hidden neurons (hidden nodes) in the cell set to three times the nUnit parameter.

Fig. 5 Changes in prediction accuracy and error rate through the LSTM model. a Trend of prediction accuracy according to iterations. b Trend of error rate according to iterations

Figure 6 shows the ROC (receiver operating characteristic) curve of the LSTM models in Table 8. The ROC curve expresses the performance of the binary classification of elderly stroke disease across decision thresholds, where the x-axis indicates 1 - specificity (the false positive rate) and the y-axis indicates sensitivity (the true positive rate).

Fig. 6 The ROC curve of the LSTM model using raw motion biosignals
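
For reference, a ROC curve can be traced by sweeping a decision threshold over the predicted scores and recording the false positive rate (1 - specificity) and the true positive rate (sensitivity) at each threshold. The sketch below illustrates this with the standard library; the function names, labels, and scores are hypothetical and are not outputs of the LSTM model.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present to compute a ROC curve")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = stroke) and predicted scores.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
curve = roc_points(y_true, scores)
print(curve, auc(curve))
```

A curve that hugs the upper-left corner, with an area close to 1.0, indicates that high sensitivity is reached at a low false positive rate.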

5 Conclusion

In this paper, we propose a new system that can predict stroke in the elderly using machine learning and deep learning algorithms based on real-time motion biosignals. The proposed system overcomes the limitations of current predictive models for stroke occurrence, which are based on the past 10 years of medical history and EMR data, or on the stroke risk factors suggested in the Framingham Heart Study. With attribute values obtained during walking, the proposed system achieved prediction accuracies of 98.25% for the C4.5 decision tree model, 98.72% for RandomForest, and 96.60% for XGBoost. In the deep learning experiments, the LSTM model achieved 98.99% accuracy for stroke disease prediction. To conclude, we showed that impairment of the upper and lower motor functions can be used as a prognostic symptom of stroke disease for early detection. The use of AI can reduce the number of false positives and allow a preemptive response to stroke.

In future work, we intend to develop a stroke prediction service that combines various biosignals with EMR data by measuring and collecting real-time biosignals during sleep or driving.