INTRODUCTION

The cascade of adverse health effects resulting from falls is an important public health concern worldwide.1,2 Although the age-standardized incidence rate of falls was lower in 2017 than in 1990,1 falls still account for a large proportion of injuries in older adults.1,2 A recent study reported that approximately 30% of older adults experienced falls annually and that fallers were more likely to suffer from disability due to fall-related injuries.3 Thus, fall prevention is important to prevent the deterioration of personal health, including physical and mental health, and to reduce social burdens, such as healthcare costs.4

Multiple factors are considered to contribute to fall risk. Systematic reviews have reported that poor physical function,5,6 cognitive decline,7 malnutrition,8 obesity,8,9 physical inactivity,10 polypharmacy,11 and urinary incontinence12 are risk factors for falls among older adults. In addition to these risk factors, psychosocial factors13,14,15,16,17 have gained increasing attention as important factors. These include the fear of falling,13 socioeconomic status,14,15,16 social participation,16 and fall efficacy.17 These studies suggest that there may still be unknown risk factors, related in particular to psychosocial aspects. In addition, because existing studies do not directly compare the various risk factors, the risk factors that are most likely to predict falls in older adults remain unclear.

In recent years, studies using machine learning approaches have gained increasing attention.18,19,20,21,22 However, previous machine learning studies of fall risk have failed to obtain an accurate predictive model.21,22 This situation may be explained by the following reasons: (1) Frailty was more prevalent in each previous study’s target population than in the average population of the studied age range,21,22 and the risk of falling is relatively elevated in frail, older people.23 Thus, the predictors used in those studies may have had poor predictive ability for future fall risk because the characteristics of the fallers and nonfallers in those studies were similar. (2) Previous studies lacked information on the variables related to psychosocial aspects owing to the nature of the medical claim data. In general, medical claim data mainly involve physical data, including diagnosis, medication use, and physical examination data; therefore, they lack psychosocial information. (3) The machine learning algorithm used in previous studies (random forest) was less powerful, leading to lower performance metrics. In general, the eXtreme Gradient Boosting (XGBoost) algorithm,24 a newer algorithm, performs better than the random forest algorithm25 because it learns serially: it builds trees one at a time, and each new tree helps to correct the errors made by the previously trained trees. By contrast, the random forest algorithm builds each tree independently.25

Machine learning approaches as a whole are advancing rapidly. In addition, new frameworks are becoming available to make complex machine learning models more interpretable, enabling the extraction of actionable insights from models.26,27 Therefore, the exploration of new predictors and their behavior in predicting falls with an interpretable machine learning approach can help in developing preventive strategies for community-dwelling older adults. Recently, psychosocial factors have been considered especially important for older adults because of these factors’ bidirectional relationships.13,14,15,16,17,28 For example, a longitudinal study targeting the general population reported that depressive symptoms and weak handgrip strength were mutually associated.28

In this study, we take advantage of the Japan Gerontological Evaluation Study (JAGES), a large-scale ongoing panel survey of older adults across Japan. The JAGES questionnaire is multidimensional and covers health, psychological, and functional factors.29,30 We aim to construct an interpretable machine learning model for predicting falls among community-dwelling older adults.

METHODS

Study Population

We constructed panel data from the JAGES, which is an ongoing nationwide study targeting community-dwelling older adults in Japan. The baseline survey was conducted from August 2010 to January 2012. Self-administered questionnaires were mailed to 141,407 individuals aged ≥65 years who were not certified in the long-term care insurance system across 24 municipalities in nine of the 47 prefectures (provinces) of Japan. A follow-up survey was conducted from October to December 2013, and self-administered questionnaires for the follow-up survey were mailed to the same respondents (Figure 1).

Figure 1

Participant selection for analysis, Japan, 2010–2013.

Variables

The experience of falls during the past year, as reported in the follow-up survey, was the outcome of interest. Self-reported falls were evaluated by asking the participants, “Have you had any falls over the past year?” with possible responses of “multiple times,” “once,” or “none.” Participants who gave either of the latter two responses (“once” or “none”) were combined into a single group (nonfallers/annual fallers) because previous studies have reported that annual fallers exhibit health characteristics similar to those of nonfallers.31,32
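The outcome coding described above can be sketched as follows. This is a minimal illustration rather than the study's code: the function name `binarize_falls` is hypothetical, and the label strings are assumed to match the response options quoted in the text.

```python
def binarize_falls(response: str) -> int:
    """Map a questionnaire response to the binary outcome.

    Returns 1 for multiple fallers and 0 for the combined
    nonfaller/annual faller group.
    """
    # Assumed label strings, taken from the response options in the text.
    mapping = {"multiple times": 1, "once": 0, "none": 0}
    key = response.strip().lower()
    if key not in mapping:
        raise ValueError(f"unexpected response: {response!r}")
    return mapping[key]
```

Raising an error on unexpected labels, rather than silently coding them as 0, keeps invalid responses from being counted as nonfallers.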

First, variables with more than 30% missing information were excluded from the analysis.33 We used all the remaining 142 variables included in the baseline survey of the JAGES as candidate predictors (Supplementary Table 1). Some baseline variables were aggregated to calculate the scores of the following corresponding scales: Japanese Geriatric Depression Scale,34 Tokyo Metropolitan Institute of Gerontology Index of Competence,35 and sense of coherence (SOC) scale (Supplementary Table 2).36 We also calculated the study period and included it as a candidate feature.
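The missing-data screen described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name `drop_sparse_variables`, the column-oriented dictionary layout, and the use of `None` to mark a missing answer are all assumptions.

```python
from typing import Dict, List, Optional


def drop_sparse_variables(
    columns: Dict[str, List[Optional[float]]],
    threshold: float = 0.30,
) -> Dict[str, List[Optional[float]]]:
    """Keep variables whose share of missing values does not exceed `threshold`.

    Missing values are assumed to be encoded as None.
    """
    kept = {}
    for name, values in columns.items():
        if not values:
            continue  # an empty column carries no information
        missing_share = sum(v is None for v in values) / len(values)
        # "More than 30% missing" is excluded, so exactly 30% is kept.
        if missing_share <= threshold:
            kept[name] = values
    return kept
```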

In the present study, the Japanese Geriatric Depression Scale was a 15-item scale. The total score, which ranges from 0 to 15, was calculated by counting negative responses, with higher scores indicating a higher probability of depression. We categorized respondents into the following three groups: no depression (0–4 points), mild depression (5–9 points), and severe depression (10–15 points).34
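The three-level grouping above amounts to a simple threshold rule, sketched here. The function name `gds15_category` is hypothetical; the cut-offs are those stated in the text.

```python
def gds15_category(score: int) -> str:
    """Return the depression group for a 15-item GDS total score (0 to 15)."""
    if not 0 <= score <= 15:
        raise ValueError("GDS-15 total score must lie between 0 and 15")
    if score <= 4:
        return "no depression"
    if score <= 9:
        return "mild depression"
    return "severe depression"
```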

The Tokyo Metropolitan Institute of Gerontology Index of Competence is a 13-item scale (in yes/no format) to assess the abilities regarding physical functions, effectance, and social roles.35 In this scale, a high score indicates high ability.35 Each of the three domains was aggregated separately and used as a candidate feature.

In this study, SOC was measured using six questions (two questions from each of the three subdomains employed in the SOC Scale).36 SOC, which is defined as the ability to cope with stressful life experiences, is considered to reflect personal health behavior.37 According to a previous study,38 the responses were summed to create a score that ranged from 6 to 30, with a high score indicating a high level of SOC. We categorized the respondents into the following three groups, in keeping with a previous study: low SOC (6–20 points), middle SOC (21–23 points), and high SOC (24–30 points).38
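The SOC scoring can be sketched in the same way. This is a minimal sketch under one assumption that the text does not state explicitly: each of the six items is scored from 1 to 5, which is inferred from the reported total range of 6 to 30. The function name `soc_category` is hypothetical.

```python
from typing import Sequence


def soc_category(item_scores: Sequence[int]) -> str:
    """Sum six SOC item scores and return the group used in the text."""
    if len(item_scores) != 6:
        raise ValueError("expected six item scores")
    # Assumption: items are scored 1 to 5, giving a total of 6 to 30.
    if any(not 1 <= s <= 5 for s in item_scores):
        raise ValueError("each item score is assumed to lie between 1 and 5")
    total = sum(item_scores)
    if total <= 20:
        return "low"
    if total <= 23:
        return "middle"
    return "high"
```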

Analytic Strategy

The random forest imputation algorithm was used in this study to handle missing values.33 In the imputation procedure, the variables assessed in the follow-up survey were used as explanatory variables in addition to the 142 candidate features (measured in the baseline survey). After imputation, the follow-up variables other than our outcome (i.e., experience of falls in the past year) were excluded.

Our machine learning procedure comprised three steps: feature selection, modeling, and model evaluation with SHapley Additive exPlanations (SHAP) value calculation. In a high-dimensional dataset such as ours, selection of features is an important procedure in machine learning to improve interpretability, avoid overfitting, and prevent performance degradation.39 The random-forest-based Boruta algorithm was used for feature selection.39 This algorithm is reported to be one of the most robust feature selection algorithms and is recommended for the analysis of high-dimensional datasets.40 For modeling and model evaluation, we used a nested k-fold cross-validation procedure to prevent overfitting and overly optimistic estimates of model performance.41,42 First, the dataset was randomly split into 10 mutually exclusive folds (outer split). Nine of the 10 folds were used as training data to train the model, while the remaining fold was used as test data for model evaluation. The training data from the outer split were further divided into 5 folds (inner split). Four of these 5 folds were used for hyperparameter optimization (training set), and the remaining fold was used for validation (validation set). This process was iterated until each fold in the outer split had served as the test data (10 iterations). Then, the entire nested k-fold cross-validation procedure was repeated 10 times, evaluating 100 independent models (Figure 2).
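The index bookkeeping of the repeated nested cross-validation described above can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the function names are hypothetical, the splits are plain random splits (whether the study stratified folds by outcome is not stated), and model fitting is left out entirely.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_indices(indices: List[int], k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle `indices` and deal them into k mutually exclusive folds."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(
    n_samples: int, outer_k: int = 10, inner_k: int = 5, repeats: int = 10, seed: int = 0
) -> Iterator[Tuple[int, int, List[int], List[int], List[Split]]]:
    """Yield (repeat, outer_fold, train_idx, test_idx, inner_splits).

    Each inner split is a (fit_idx, validation_idx) pair drawn from
    train_idx only, so the outer test fold never informs tuning.
    """
    rng = random.Random(seed)
    for rep in range(repeats):
        outer_folds = k_fold_indices(list(range(n_samples)), outer_k, rng)
        for i, test_idx in enumerate(outer_folds):
            train_idx = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]
            inner_folds = k_fold_indices(train_idx, inner_k, rng)
            inner_splits = []
            for v, val_idx in enumerate(inner_folds):
                fit_idx = [j for f, fold in enumerate(inner_folds) if f != v for j in fold]
                inner_splits.append((fit_idx, val_idx))
            yield rep, i, train_idx, test_idx, inner_splits
```

With the defaults, the generator yields 10 × 10 = 100 outer train/test pairs, matching the 100 evaluated models mentioned in the text.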

Figure 2

Modeling step of proposed machine learning approach (10 repetitions of nested k-fold cross-validation).

We used a random search strategy to identify the optimal hyperparameters for the training model43 and applied the XGBoost algorithm, which is based on a decision tree framework, for the learning procedure.24 Mean performance metrics were calculated from the 100 evaluated models. Model performance metrics included accuracy score, F1 score, and area under the receiver operating characteristic curve (AUC). The formulas for each performance metric are summarized in Supplementary Figure 1.
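For readers without access to the supplementary material, the three metrics can be sketched with their standard binary definitions. Whether Supplementary Figure 1 uses exactly these forms (for example, a class-weighted F1 score) is not stated here, so the code below is an assumption, and the function names are illustrative.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form of the AUC is quadratic in the sample size; it is shown for clarity rather than efficiency.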

The combination of an oversampling method (synthetic minority oversampling technique) and under-sampling method (edited nearest neighbor)44 was applied to handle the imbalanced class distribution of the data.45,46 The sampling method we used in this study forms new samples using k-nearest neighbors and then cleans the oversampled data.44
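The two stages of this resampling scheme can be sketched in plain Python. This is a simplified illustration, not the implementation used in the study: it assumes binary labels, Euclidean distance, at least two minority samples, and a majority-vote cleaning rule, and the function names and default values of `k` are assumptions.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def smote(X: Sequence[Point], y: Sequence[int], minority: int = 1,
          k: int = 5, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Add synthetic minority samples until both classes are the same size."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    if len(minority_pts) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    n_needed = (len(y) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_pts)
        others = [p for p in minority_pts if p is not base]
        # Pick one of the k nearest minority neighbours of the base point.
        nearest = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(nearest)
        gap = rng.random()  # position along the segment base -> neighbor
        X_new.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
        y_new.append(minority)
    return X_new, y_new


def edited_nearest_neighbors(X: Sequence[Point], y: Sequence[int],
                             k: int = 3) -> Tuple[List[Point], List[int]]:
    """Drop samples whose label disagrees with the majority of their k neighbours."""
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(x, X[j]))[:k]
        majority_label = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if majority_label == label:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]
```

In a cross-validation setting, resampling of this kind is normally applied to the training folds only, so that the test fold keeps the original class distribution.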

To compare the magnitude of the contribution of each predictor, we computed SHAP values, a novel framework based on game theory.27 The SHAP value quantifies the contribution of each feature to the prediction results. This framework allowed us to verify whether each feature contributed positively or negatively to the predicted probability of falling. We calculated SHAP values for the XGBoost model with the highest predictive capacity (i.e., highest AUC).
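The game-theoretic idea behind SHAP can be illustrated with an exact Shapley value computation on a toy model. This sketch is a simplification: it uses a single baseline input instead of a background dataset, enumerates all feature orderings (feasible only for a handful of features), and is not the tree-specific algorithm that SHAP libraries use for XGBoost models. The toy model `toy_risk` and its coefficients are invented for illustration and have no connection to the study's results.

```python
from itertools import permutations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    Features outside the current coalition keep their baseline value.
    Each feature's value is its marginal contribution averaged over
    all orderings in which features can be switched on.
    """
    n = len(x)
    contrib = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]
            new = model(current)
            contrib[j] += new - prev
            prev = new
    return [c / factorial(n) for c in contrib]


def toy_risk(features: Sequence[float]) -> float:
    """Hypothetical risk score with an interaction term (illustration only)."""
    prior_fall, fear_of_falling, older_age = features
    return 0.1 + 0.3 * prior_fall + 0.1 * fear_of_falling + 0.2 * prior_fall * older_age
```

By construction, the values sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is split equally between the two features involved. The sign of each value indicates whether the feature pushes the prediction up or down.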

For comparison, we calculated the prediction performance scores with imbalanced data and with random under-sampling with a 1:1 ratio. Furthermore, we used random forest models25 as a baseline for performance comparison with XGBoost models. We also implemented a conventional logistic regression model using the selected features via the random-forest-based Boruta algorithm. All analyses were performed using Python 3.8.3. The study protocol was reviewed and approved by ethics committees at Tohoku University.
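The 1:1 random under-sampling used as a comparison above can be sketched as follows. The function name `random_undersample` and the seed handling are assumptions; the sketch simply draws, without replacement, as many samples from each class as the smallest class contains.

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_undersample(X: Sequence, y: Sequence[int],
                       seed: int = 0) -> Tuple[List, List[int]]:
    """Randomly drop samples so that every class has the size of the smallest class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    # Sample without replacement within each class, then restore original order.
    keep = sorted(i for idx in by_class.values() for i in rng.sample(idx, n_min))
    return [X[i] for i in keep], [y[i] for i in keep]
```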

RESULTS

Characteristics of the Analyzed Participants

At the baseline, 92,272 individuals responded to the questionnaire (65.3% response rate). We excluded the participants with invalid baseline information (n = 14,558); thus, 77,714 participants remained. In total, 63,462 participants responded to the follow-up survey (81.7% follow-up rate). We then excluded participants who had invalid follow-up information (n = 1,024) or were functionally dependent at baseline (n = 555). Consequently, 61,883 participants were analyzed in our main analysis (Figure 1). The mean ages of the nonfaller/annual faller group and the multiple faller group were 72.8 (SD = 5.5) and 75.4 (SD = 6.1), respectively. The baseline demographic characteristics of nonfallers/annual fallers and multiple fallers in the follow-up survey are summarized in Table 1.
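The participant flow reported above can be re-derived from the stated counts. The short check below uses only numbers given in the text; the variable names are illustrative.

```python
mailed = 141_407             # questionnaires mailed at baseline
responded = 92_272           # baseline respondents
invalid_baseline = 14_558    # excluded for invalid baseline information
followed_up = 63_462         # respondents to the follow-up survey
invalid_follow_up = 1_024    # excluded for invalid follow-up information
dependent_at_baseline = 555  # functionally dependent at baseline

eligible = responded - invalid_baseline
analyzed = followed_up - invalid_follow_up - dependent_at_baseline

response_rate = round(100 * responded / mailed, 1)
follow_up_rate = round(100 * followed_up / eligible, 1)

print(eligible, analyzed, response_rate, follow_up_rate)
```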

Table 1 Baseline characteristics according to experience of falls in follow-up survey, Japan, 2010–2013

Selected Features

Fourteen features measured at the time of the baseline survey were selected from the 142 candidates based on the random-forest-based Boruta algorithm (Figure 3): the experience of falling (multiple times, once, or none), self-rated health (poor, fair, good, or excellent), age (continuous), fear of falling (no or yes), ability to stand up from chairs without using one’s hands (no or yes), depressive symptoms (none, mild, or severe), choking experience (no or yes), dry mouth (no or yes), arthrosis (no or yes), difficulties in eating hard foods (no or yes), ability to climb stairs without a handrail (no or yes), SOC (low, middle, or high), incontinence (no or yes), and number of remaining teeth (≥20, 10–19, 1–9, or edentulous).

Figure 3

Feature importance and SHapley Additive exPlanations (SHAP) values for each selected feature. (A) Global feature importance. (B) Local explanation summary, showing the behavior of each local prediction; each dot represents an individual prediction. Each dot represents the direction of effects (positive = red, negative = blue) at different levels of each predictor. When multiple dots are in the same x position, they accumulate to represent density.

Prediction of Fall Risk

The mean accuracy, F1 score, and AUC of our primary model were 0.88 (SD = 0.02), 0.89 (SD = 0.02), and 0.88 (SD = 0.02), respectively. The scores of each of the 100 models are presented in Supplementary Table 3. Our primary model achieved better predictive scores than the models trained on the imbalanced data or on randomly under-sampled data (Table 2). Moreover, XGBoost obtained better model evaluation scores than the random forest algorithm (Table 2). The results of the conventional logistic regression model using the selected features are presented in Supplementary Table 4.

Table 2 Prediction performance of XGBoost and random forest algorithms with and without resampling

Figure 3 presents the calculated SHAP values of our primary model. Figure 3(A) presents the global feature importance, demonstrating that the experience of falling as of the baseline survey was the most important feature, followed by self-rated health and age. Figure 3(B) presents the behavior of each local prediction, where each dot represents an individual prediction. Each dot reveals the direction of effects for different levels of each predictor; for example, the lower values of experience of falling (i.e., multiple fallers; blue dot) were associated with a higher risk of falling than higher values of the experience of falling (i.e., none; red dot).

DISCUSSION

This study is the first to successfully determine long-term predictors of fall risk among community-dwelling older adults using a machine learning approach. In the present study, 14 features were selected as predictors of falls in older adults, and SOC was selected as an important new potential predictor.

Many previous systematic reviews have identified risk factors for falling, such as physical functions.5,6,7,8,9,10,11,12 In addition to these findings, old age,47 fear of falling,13 depression,5 urinary incontinence,12 and low self-rated health48 were reported as risk factors for falling. In keeping with these studies, age, fear of falling, depressive symptoms, self-rated health, and physical function were selected as important features in this study.

In our study, predictors considered to reflect psychosocial aspects were identified as important features for fall prediction (Figure 3(A)). For example, we found that depressive symptoms were an important predictor of fall risk. A possible mechanism underlying the association between depressive symptoms and physical function is the physical inactivity and unintentional weight loss that accompany depressive symptoms: a previous study reported that individuals with depressive symptoms were more likely to be physically inactive.49 Moreover, other studies reported that depression is associated with anorexia and weight loss in older people.50,51 Furthermore, both physical inactivity and unintentional weight loss are associated with poor physical function.52,53 Additionally, SOC, which is defined as the ability to cope with stressful life experiences,37 was selected as one of the important features among the 142 candidates, a finding that has not been reported in previous studies. In general, individuals with high SOC are able to self-evaluate their social roles by accepting, approving, coping with, and even changing them as required. These individuals are able to engage in specific, sensible, and rational behaviors, and this ability may contribute to healthier behavior.36 In addition, a longitudinal study reported that lower SOC was a predictor for the onset of depression.54 Indeed, our conventional regression model showed that the association of SOC with fall risk was somewhat attenuated by depressive symptoms (Supplementary Table 4). This suggests that a low level of SOC may predict the 3-year fall risk partly by increasing depressive symptoms. However, future studies are warranted to examine the possible mechanism underlying the associations between psychosocial factors and fall risk.

Although further studies investigating the contributions of those psychosocial factors to fall risk are warranted, we consider the assessment of psychosocial factors in clinical practice to be important for the following reasons. The relationship between psychosocial factors and physical functions has recently been reported to be bidirectional. For example, a longitudinal study targeting the general population in China reported that depressive symptoms were associated with weak handgrip strength.28 Moreover, a longitudinal study targeting older women in the USA reported that depressive symptoms were associated with new-onset physical frailty.55 Therefore, psychosocial factors could be theoretically associated with falling because physical function is one of the risk factors for falling.5,6

An observational study that analyzed medical claim data of older patients receiving home healthcare services obtained a moderately accurate predictive model for fall risk: the AUC value of their best model was 0.67.21 Another study that analyzed community-dwelling older adults who were at elevated risk for disability also reported a moderately accurate prediction model: the AUC value of their best model was 0.66.22 The poorer performance metrics in those studies compared with ours might be explained by three factors. First, frailty was more prevalent in each previous study’s target population than in the average population of the studied age range,21,22 and the risk of falling is relatively elevated in frail, older people.23 Thus, we consider that within a frail population, previous experience of falling may not accurately predict future falls. By contrast, our results showed that previous experience of falling was the strongest predictor of falling, which suggests that it is the most important information for assessing fall risk in functionally independent, community-dwelling older people. Second, the set of candidate features for predicting falls in previous studies was biased toward somatic or functional information and did not fully capture psychosocial factors, such as mental health or socioeconomic status, which are considered more important in older adults.13,14,15,16,17 By contrast, among the 14 features selected from the 142 candidate features in our study, three psychosocial features were selected as important predictors of fall risk. These particular features were not captured in previous studies that constructed predictive models using machine learning approaches.21,22 Third, the machine learning algorithm used in previous studies (random forest) was less powerful, which may have led to lower performance metrics. In general, the XGBoost method used in this study outperforms the random forest method used in previous studies.21,22 Thus, we consider that the XGBoost method used in our predictive model for fall risk in community-dwelling older adults is more powerful than the random forest method.

This study has several limitations. First, we cannot infer causality from the results, as the use of machine learning does not protect against bias due to unmeasured confounders. Nevertheless, a study has argued that highly accurate predictive models perform better than traditional parametric causal models.56 Second, most of the features were time-variant variables, and we used only those variables that were assessed in the baseline survey. Third, we could not identify the participants who experienced falling during the intermediate period. Therefore, our estimates of fall risk may underestimate the actual risk. Fourth, the JAGES respondents are not nationally representative. Thus, the difference in the demographic characteristics between the target population and our study population should be considered when applying the current results to other populations.

In conclusion, our machine learning approaches for 3-year fall prediction yielded high prediction performance with useful actionable interpretations. The machine learning approach is thus useful for exploring potential factors and providing new insights for fall prevention strategies.