1 Introduction

The number of people aged over 65 is projected to grow from an estimated 524 million in 2010 to nearly 1.5 billion in 2050 worldwide [1]. This trend has a direct impact on the sustainability of health systems, in maintaining both public policies and the required budgets.

This growing population group represents an unprecedented challenge for healthcare systems. In developed countries, older adults already account for 12 to 21 % of all ED visits and it is estimated that this will increase by around 34 % by 2030 [14].

Older patients have increasingly complex medical conditions in terms of their number of morbidities and other conditions, such as the number of medications they use, existence of geriatric syndromes, their degree of physical or mental disability, and the interplay of social factors influencing their condition [9]. Recent studies have shown that adults above 75 years of age have the highest rates of ED readmission, and the longest stays, demanding around 50 % more ancillary tests [15]. Notwithstanding the intense use of resources, these patients often leave the ED unsatisfied, and with poorer clinical outcomes, and higher rates of misdiagnosis and medication errors [16] compared to younger patients. Additionally, once they are discharged from the hospital, they have a high risk of adverse outcomes, such as functional worsening, ED readmission, hospitalization, death and institutionalization [17].

In this paper we present our recent work on ED readmission risk prediction. We use historic patient information, including demographic data, clinical characteristics and drug treatment information, among others. Our work focuses on high-risk patients (the two highest strata) according to the Kaiser Permanente Risk Stratification Model [11]. This includes patients with prominent specific organ disease (heart failure, chronic obstructive pulmonary disease and diabetes mellitus) and patients with high multi-morbidity. Predictive models are built for each of the stratified groups using different classifiers, such as Support Vector Machines (SVM) and Random Forests. In order to deal with class imbalance and the high-dimensional feature space, different filtering techniques are applied in our experiments.

The main contributions of this work are:

  • We extend the work by Besga et al. [2] applying well-known machine learning techniques such as class balancing and feature selection in order to obtain better sensitivity.

  • We compare two well-established supervised classification algorithms, Random Forests and SVM, and analyze their performance in different scenarios.

  • We make use of a wrapper feature selection method that maximizes the prediction ability while minimizing the models’ complexity.

The paper is organized as follows. In Sect. 2 we present related work on predictive modelling for readmission risk estimation. In Sect. 3 we present the dataset as well as the methodological approach followed to build our models. In Sect. 4 we describe the evaluation methodology and the experimental results. In Sect. 5 we discuss the conclusions and future work.

2 Related Work

Readmission risk modelling is a research topic that has been extensively studied in recent years. The main objective is usually to reduce readmission costs by identifying those patients with higher risk of coming back soon. Patients with higher risk can be followed-up after discharge, checking their health status by means of interventions such as phone calls, home visits or online monitoring, which are resource intensive. Predictive systems generally try to model the probability of unplanned readmission (or death) of a patient within a given time period.

In a recent work, Kansagara et al. [9] presented a systematic review of risk prediction models for hospital readmission. Many of the analyzed models target subpopulations with specific conditions or diseases, such as Acute Myocardial Infarction (AMI) or heart failure (HF), while others address the general population.

One of the most popular models that focus on general populations is LACE [3]. The LACE index is based on a model that predicts the risk of death or urgent readmission (within 30 days) after leaving the hospital. The algorithm used to build the model is commonly used in the literature (logistic regression analysis) and, according to the published results, the model has a high discriminative ability. The model uses information of 48 variables collected from 4812 patients from several Canadian hospitals.

A variant called LACE +  [4] is an extension of the previous model that makes use of variables drawn from administrative data.

A similar approach is followed by Health Quality Ontario (HQO) with their system called HARP (Hospital Admission Risk Prediction) [10]. The system aims to determine patients’ risk of hospitalization in the short and long term. HARP defines two periods, 30 days and 15 months, for which the model infers the probability of hospitalization, relying on several variables. From an initial set of variables in 4 different categories (demographic, community features, disease and condition, and encounters with the hospital system) the system identifies two sets, a complex one and a simpler one, containing the most predictive variables. Using these sets of variables and a dataset containing approximately 382,000 episodes, two models (for 30 days and for 15 months) were implemented using multivariate regression analysis. According to the committee of experts involved in the development of HARP, the most important metric was sensitivity (i.e. the ability to detect hospitalizations). Regarding this metric, the claimed results suggest that both the simple and the complex model achieve high sensitivity, although the complex model obtains better results. The authors suggest that the simple model could be a good substitute when certain hospitalization data are not available (e.g. to perform stratification outside the hospital).

A recent work by Yu et al. [5] presents an institution-specific readmission risk prediction framework. The idea behind this approach is that most readmission prediction models do not have sufficient accuracy due to differences in patient characteristics between hospitals. In this work an experimental study is performed in which a classification method (SVM) as well as regression (Cox) analysis are applied.

In [2] Besga et al. analyzed patients who attended the Emergency Department of the Araba University Hospital (AUH) during June 2014. We exploit this dataset, improving their results with further experiments.

3 Materials and Methods

The dataset, presented by Besga et al. in [2], is composed of 360 patients divided into four groups, namely: case management (CM), patients with chronic obstructive pulmonary disease (COPD), heart failure (HF) and diabetes mellitus (DM). For each patient a set of 97 variables was collected, divided into four main groups: (i) Sociodemographic data and baseline status, (ii) Personal history, (iii) Reasons for consultation/Diagnoses made at ED and (iv) Regular medications and other treatments. The dataset contains missing values.

In order to build our model following a binary classification approach, the target variable was set to readmitted/not readmitted. Patients returning to the ED within 30 days after being discharged are considered readmitted (value = 1); otherwise they are considered not readmitted (value = 0).
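The labelling rule can be sketched as follows; the `days_to_readmission` input is hypothetical, since the dataset only records the binary outcome:

```python
# Minimal sketch of the 30-day binary labelling rule described above.
# `days_to_readmission` is hypothetical: None means the patient never returned.

def label_readmission(days_to_readmission, threshold=30):
    """Return 1 (readmitted) if the patient returned within `threshold` days."""
    if days_to_readmission is None:
        return 0
    return 1 if days_to_readmission <= threshold else 0
```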

It is noteworthy that a patient returning on the first day and another returning on the 30th day are both considered readmitted. On the other hand, a patient returning on the 31st day is considered not readmitted, although in practice they underwent a readmission. We believe that having the number of days elapsed before readmission would have been much more meaningful and would have permitted a more accurate prediction, including the predicted time of readmission.

All the tests were conducted using 10-fold cross-validation. The evaluation metrics that we have used are: sensitivity, specificity and accuracy. In order to avoid any random number generation bias, we have conducted 10 independent executions with different random generating seeds and averaged the results obtained.
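The evaluation protocol can be sketched as follows. This is an illustrative scikit-learn version (the experiments in this work were run with Weka) on a synthetic dataset of the same shape as ours:

```python
# Repeated 10-fold cross-validation with different random seeds, averaging
# the results, as described above. The dataset here is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=116, n_features=97, random_state=0)

scores = []
for seed in range(10):  # 10 independent executions with different seeds
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    clf = RandomForestClassifier(n_estimators=50, random_state=seed)
    # scoring="recall" is sensitivity for the positive (readmitted) class
    scores.append(cross_val_score(clf, X, y, cv=cv, scoring="recall").mean())

mean_sensitivity = float(np.mean(scores))
```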

According to the data shown in Table 1 our dataset has a high dimensional feature space. In this scenario we have carried out some feature selection techniques. The goal is to find a feature subset that would reduce the complexity of the model, so that it would be easier to interpret by physicians, while improving the prediction performance and reducing overfitting.

Table 1. Distribution of variables by category

We are going to use the following approaches: filter methods and wrapper methods. Filter algorithms are general preprocessing algorithms that do not assume the use of a specific classification method. Wrapper algorithms, on the other hand, “wrap” the feature selection around a specific classifier and select a subset of features based on the classifier’s accuracy using cross-validation [18]. Wrapper methods evaluate subsets of variables; that is, unlike filter methods, they do not compute the worth of a single feature but of a whole subset of features.

  • Filter method: We have used Correlation-based Feature Selection (CBFS) method since it evaluates the usefulness of individual features for predicting the class along with the level of inter-correlation among them [19]. In this work we have used the implementation provided by Weka [8].

  • Wrapper method: We have selected SVM as the specific classification algorithm and Area Under the Curve (AUC) as evaluation measure. Since an exhaustive search is impractical due to space dimensionality, we used heuristics, following a greedy stepwise approach. In this work we have used the implementation provided by Weka.
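A minimal sketch of the wrapper approach (greedy forward selection with an SVM and AUC as the evaluation measure). The paper used Weka’s implementation, so this scikit-learn version on synthetic data is only illustrative:

```python
# Greedy forward (stepwise) wrapper feature selection: at each step, add the
# feature that most improves cross-validated AUC; stop when nothing improves.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=116, n_features=20, n_informative=4,
                           random_state=0)

def auc_of(feature_idx):
    clf = SVC(kernel="rbf", gamma="auto", C=1.0)
    return cross_val_score(clf, X[:, feature_idx], y, cv=5,
                           scoring="roc_auc").mean()

selected, remaining, best_auc = [], list(range(X.shape[1])), 0.0
while remaining:
    # Evaluate each candidate subset (current selection plus one feature)
    auc, f = max((auc_of(selected + [f]), f) for f in remaining)
    if auc <= best_auc:
        break  # no candidate improves the score: greedy search stops
    best_auc, selected = auc, selected + [f]
    remaining.remove(f)
```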

3.1 Support Vector Machine

Support vector machines (SVM) are supervised learning models that have been widely used in bioinformatics research and many other fields since their introduction in 1995 [7]. An SVM is often described as a non-probabilistic binary linear classifier, as it assigns new cases to one of two possible classes. In the readmission prediction problem, the model predicts whether a new case (a patient) will be readmitted within 30 days.

This algorithm is based on the idea that input vectors are non-linearly mapped into a very high dimensional space. In this new feature space it constructs a hyperplane which separates instances of both classes. Since there exist many decision hyperplanes that might classify the data, SVM tries to find the maximum-margin hyperplane, i.e. the one that represents the largest separation (margin) between the two classes.

In this work we have used the LIBSVM implementation of the algorithm, which is commonly used for experimentation and can be easily integrated into Weka [8] using a wrapper. We have used a radial basis kernel function, exp(−γ·|u−v|²), with γ = 1/num_features and C = 1.
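An equivalent configuration in scikit-learn terms (illustrative; the experiments used LIBSVM through Weka). Here `gamma="auto"` corresponds to γ = 1/num_features:

```python
# RBF-kernel SVM with gamma = 1/num_features and C = 1, matching the
# configuration described above, evaluated with 10-fold cross-validation
# on a synthetic dataset of the same shape as ours.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=116, n_features=97, random_state=0)

clf = SVC(kernel="rbf", gamma="auto", C=1.0)  # kernel: exp(-gamma * |u - v|^2)
acc = cross_val_score(clf, X, y, cv=10).mean()
```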

3.2 Random Forest

Random Forest [6] is a classifier consisting of multiple decision trees trained on randomly selected feature subspaces. The method builds multiple decision trees in the training phase. To predict the class of a new instance, the instance is passed down each of these trees; each tree gives a prediction (a vote), and the class receiving the most votes over all the trees of the forest is selected. The algorithm uses bagging, i.e. each tree is trained on a random subset (drawn with replacement) of the original dataset. In addition, each split considers a random subset of the features.

One of the advantages of random forests is that they generally generalize better than single decision trees, which tend to overfit, and they naturally perform some feature selection. They can also be run on large datasets and can handle thousands of attributes without attribute deletion. In this work we have used Weka’s implementation of the algorithm.
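The ingredients described above (bagging, random feature subsets at each split, majority voting) map onto scikit-learn’s implementation as follows; this is an illustrative sketch, not the Weka configuration used in the experiments:

```python
# Random Forest in scikit-learn: bagged trees, each split drawn from a
# random feature subset, class decided by majority vote across trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=116, n_features=97, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100,     # number of trees that vote on the prediction
    max_features="sqrt",  # random feature subset considered at each split
    bootstrap=True,       # each tree trained on a bootstrap sample (bagging)
    random_state=0,
).fit(X, y)

pred = clf.predict(X[:5])
```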

4 Results

In this section we analyze the prediction performance of different models on the emergency department short-term readmission dataset presented in [2]. As shown in Table 2, besides the original four subpopulations we have considered a fifth dataset that encompasses all of them.

Table 2. Comparative information about the subpopulations of the dataset

4.1 Class Balancing

In readmission prediction analysis, as in any other supervised classification problem, an imbalanced class distribution leads to important performance evaluation issues and difficulties in achieving the desired results. The underlying problem with imbalanced datasets is that classification algorithms are often biased towards the majority class and, hence, there is a higher misclassification rate for minority-class instances (which are usually the most interesting ones from a practical point of view) [13].

As shown in Table 3, class imbalance causes an accuracy paradox. If we only look at the accuracy of the model we get 83.62 %, although the SVM merely uses the greater a priori probability to make the classification decision.
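The paradox can be reproduced from the class counts alone: on the diabetes mellitus subpopulation (97 not readmitted vs. 19 readmitted), a trivial majority-class predictor already attains the quoted 83.62 % accuracy with zero sensitivity:

```python
# Confusion matrix of a classifier that predicts "not readmitted" for every
# patient in the diabetes mellitus subpopulation (97 negatives, 19 positives).
tn, fp, fn, tp = 97, 0, 19, 0

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # recall of the readmitted class
specificity = tn / (tn + fp)

print(round(accuracy * 100, 2))  # 83.62
```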

Table 3. Confusion matrix of SVM on the diabetes mellitus dataset

Resampling.

There are several methods that can be used in order to tackle the class imbalance problem. Building a more balanced dataset is one of the most intuitive approaches. In our experiment we have used under-sampling as a preliminary approach and continued with an over-sampling using synthetic samples.

Under-sampling with random subsample.

Given the low number of minority-class samples, which are also the most relevant for classification, we can anticipate that reducing the number of majority-class samples to a level comparable to the minority class, in order to avoid the class imbalance, will lead to a model with poor generalization capability.

Focusing on the diabetes mellitus subpopulation, the dataset is composed of 97 instances of the not-readmitted class and only 19 of the readmitted class. An experiment consisting of subsampling the dataset to a 1:1.5 distribution between the minority and majority classes, and then applying a Random Forest classifier, yields the results shown in Table 4.
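A minimal sketch of the under-sampling step, using a hypothetical helper `undersample` that keeps all minority instances and draws a random majority subset at the 1:1.5 ratio:

```python
# Random under-sampling of the majority class to a 1:1.5 minority:majority
# ratio, as in the experiment above (19 readmitted -> keep 28 not-readmitted).
import random

def undersample(indices_minority, indices_majority, ratio=1.5, seed=0):
    """Keep all minority instances plus a random majority subset."""
    rng = random.Random(seed)
    k = min(len(indices_majority), int(len(indices_minority) * ratio))
    return indices_minority + rng.sample(indices_majority, k)

minority = list(range(19))        # 19 readmitted patients
majority = list(range(19, 116))   # 97 not-readmitted patients
kept = undersample(minority, majority)
```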

Table 4. Comparison of performance evaluation metrics for RF over original and under-sampled versions of diabetes mellitus dataset

As seen in Table 4, although the classification sensitivity has increased, it is still low (31.57 %), despite sacrificing both accuracy and specificity. Taking into account the low number of instances in our dataset, we do not consider under-sampling an effective approach.

Oversampling with SMOTE.

We used the Synthetic Minority Over-sampling Technique (SMOTE) [20] to oversample the minority class. In order to avoid overfitting, we applied SMOTE (with a percentage of new instances equal to 200) within each fold of the 10-fold cross-validation. If oversampling is done before the 10-fold cross-validation, it is very likely that some of the newly created instances and the original ones end up in both the training and testing sets, making the performance metrics optimistic.
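The per-fold application can be sketched as follows. The `smote_like` helper is a simplified stand-in for SMOTE (each synthetic sample is interpolated between a minority instance and its nearest minority neighbour); the key point is that synthetic instances are generated from the training split of each fold only:

```python
# SMOTE-style oversampling applied inside each fold, never before splitting,
# so no synthetic instance can leak into the corresponding test set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def smote_like(X_min, n_new, rng):
    """Generate n_new synthetic minority samples by interpolation."""
    nn = NearestNeighbors(n_neighbors=2).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        neighbour = X_min[idx[i, 1]]  # nearest other minority sample
        out.append(X_min[i] + rng.random() * (neighbour - X_min[i]))
    return np.array(out)

# Synthetic imbalanced dataset mimicking the DM subpopulation (~97 vs ~19)
X, y = make_classification(n_samples=116, n_features=20, weights=[0.84],
                           random_state=0)
rng = np.random.default_rng(0)
recalls = []
for train, test in StratifiedKFold(10, shuffle=True, random_state=0).split(X, y):
    X_min = X[train][y[train] == 1]
    X_new = smote_like(X_min, 2 * len(X_min), rng)  # "200 %" new instances
    X_bal = np.vstack([X[train], X_new])
    y_bal = np.concatenate([y[train], np.ones(len(X_new), dtype=int)])
    pred = SVC(kernel="rbf", gamma="auto", C=1.0).fit(X_bal, y_bal).predict(X[test])
    tp = np.sum((pred == 1) & (y[test] == 1))
    fn = np.sum((pred == 0) & (y[test] == 1))
    recalls.append(tp / max(tp + fn, 1))
sensitivity = float(np.mean(recalls))
```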

Our approach is to test the performance of two classifiers, namely SVM and Random Forest, on the over-sampled dataset, in order to compare it with the results obtained on the original imbalanced dataset. The choice of these two classifiers was based on the fact that both SVM and RF have been widely used in the literature, achieving good results [6]. On the one hand, SVM has the advantage of being able to deal with data that is difficult to separate directly in the feature space; on the other hand, Random Forest has the advantage of an embedded feature selection process, which is helpful in high-dimensional feature spaces. The experiment is carried out by generating a model for each of the subpopulations in each of the specified scenarios. Table 5 shows the results of our experiment.

Table 5. Performance comparison using SVM and RF classifiers on original and over-sampled datasets

Results show that the class-balanced dataset achieved better sensitivity than the original dataset. Nevertheless, both accuracy and specificity are worse. It is worth noting that while performance is similar for both classifiers on the original dataset, SVM performs much better (in terms of sensitivity) on the over-sampled version. Finally, we observe that the sensitivity improvement is rather small and is obtained mainly at the expense of worsening both specificity and accuracy.

4.2 Feature Selection

Our dataset has a high dimensional feature space. With the use of feature selection algorithms we want to find a feature subset that would reduce the complexity of the model (so that it would be easier to interpret by the physicians) while improving the prediction performance and reducing overfitting. For that purpose we are using a filter method, with Correlation-based Feature Selection [19] as metric and a wrapper method, with SVM as the specific classifier, both presented in Sect. 3.

The experiment consists of training an SVM and an RF classifier using the original feature set and the generated feature subsets. The performance of the classifiers is compared in terms of sensitivity, specificity and accuracy for each of the subpopulations.

It is worth noting that the feature selection must be done within cross-validation. If the full training set is used during the attribute selection process, the generalization ability of the model can be compromised.
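In scikit-learn terms this is achieved by placing the selector inside a `Pipeline`, so it is re-fitted on the training split of every fold. The `SelectKBest` filter here is only an illustrative stand-in for the CBFS and wrapper methods used in this work:

```python
# Feature selection inside each cross-validation fold: the Pipeline fits the
# selector on the training split only, so the held-out fold never influences
# which features are kept.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=116, n_features=97, n_informative=5,
                           random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),    # illustrative filter selector
    ("svm", SVC(kernel="rbf", gamma="auto", C=1.0)),
])
acc = cross_val_score(pipe, X, y, cv=10).mean()
```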

In Table 6 the results of the experiment are shown. According to these results, although in some cases the sensitivity has increased, overall the results are not as promising as expected. Actually, even though the models are much simpler than the original model (i.e. the one using the full feature set), the prediction performance has been reduced. Moreover, both feature selection methods have performed similarly, even though the selected feature subsets differ considerably.

Table 6. Performance comparison of both feature selection methods

5 Conclusions and Future Work

This paper has presented work on the prediction of 30-day readmission risk in the Emergency Department. Several contributions have been presented regarding the enhancement of the predictors’ performance, with special focus on sensitivity, i.e. the predictive power for the critical class of readmitted patients. First, we conducted an experiment that shows the performance variations produced by class-balancing techniques. Second, we analyzed different feature selection methods and metrics and evaluated their performance. Two classification algorithms (SVM and Random Forest) were used to evaluate the different approaches.

According to the results of our analysis, we conclude that although class balancing improves sensitivity, the dataset does not seem to have enough minority-class instances. In addition, setting 30 days as the arbitrary threshold for assigning the binary class label may cause situations such as labelling a patient readmitted on the 30th day as “readmitted” and another readmitted on the 31st day as “not readmitted”. This imposes a clear limitation on any generated model, since both patients should actually be treated as similar (in terms of readmission).

Future work will include addressing the problem with a regression approach instead of supervised classification, in order to avoid the aforementioned arbitrary labelling problem. With a regression analysis approach we plan to predict not only the readmission risk but also the approximate readmission window (i.e. the time interval between hospital discharge and readmission).

We also plan to increase the size of the dataset, including more instances of the minority class. Extending the samples of the readmission class we expect to achieve better predictions and ultimately generate a better-generalizing model.