Introduction

The conditions for hemodialysis initiation and their association with outcomes in patients with end-stage kidney disease (ESKD) remain unsettled. Among the unfavorable circumstances, studies have mainly focused on late referral, which was consistently associated with prolonged initial hospitalization and elevated risk of mortality [1,2,3,4,5,6,7].

In contrast, the consequences of emergency start are currently poorly explored. According to the REIN registry, in France 30% of chronic hemodialysis patients were initiated in emergency conditions [8, 9]. Available retrospective cohort studies showed that emergency start was associated with higher mortality than planned dialysis [10, 11]. The emergency starters had lower health status, a higher rate of comorbidities, prolonged initial hospitalization and frequently were initiated on a central venous catheter (CVC) [12, 13]. CVC at initiation has been independently associated with infectious complications and excess risk of 1-year mortality [14,15,16,17]. Besides, the benefit of early referral to the nephrologist was reduced when the dialysis was started in a "suboptimal condition" with a CVC or as unplanned treatment [18, 19].

We took advantage of the prospective nationwide French REIN registry to define discrete groups of patients according to the conditions of hemodialysis initiation (follow-up, symptoms or timely inception, planned or emergency start, fistula or CVC) and their effects on survival. Two methods were used; the first summarized the clinical experience and epidemiological study results; the second was based on data mining constructed on the classification and regression tree (CART). A further objective was to evaluate the discriminative performance of these two methods on survival and infer a classification that could be applied in clinical practice.

Methods

Population

We used the data of the “Renal Epidemiology Information Network” (REIN) database and of the National Health Data System (SNDS), a French nationwide medico-administrative healthcare database that includes all ambulatory care and hospital stay reimbursement data as well as death-related data [8]. REIN is a government-sponsored, nationwide, prospective registry which exhaustively includes all patients on kidney replacement therapy (either dialysis or transplantation) living in France. Its organizational principles and quality control have been reported in detail elsewhere [8]. The study was nested in the REIN registry approved by the CCTIRS, the CNIL, and the Scientific Council of the Agence de Biomedecine. All patients gave their informed consent. In our study, all adult (> 18 years) patients who started hemodialysis as first-line therapy for ESKD between 2010 and 2013 were eligible. Exclusion criteria were acute kidney injury and first treatment by peritoneal dialysis or renal transplantation. The patients were regrouped using the two following methods.

First regrouping method based on the clinical pattern

All patients were divided into four groups according to the condition of initiation of dialysis (emergency vs planned, symptoms, previous follow-up):

  • "Followed planned starters" began their dialysis without an initial hospital stay or with a short initial stay (≤ 72 h) in a nephrology department and had preemptive creation of a native arteriovenous fistula (AVF).

  • "Followed symptomatic non-urgent starters" represent the patients who had been scheduled for a planned dialysis initiation but started earlier because of any non-urgent symptomatic event.

  • "Followed urgent starters" had consultations with a nephrologist in the year prior to the first dialysis but had not been referred for planned renal replacement therapy and started dialysis in an emergency condition (mainly acute pulmonary edema or hyperkalemia).

  • "Unknown urgent starters" regrouped patients who had no follow-up or consultation with a nephrologist, had not received erythropoietin in the year before dialysis initiation, and had a CVC for their first hemodialysis session.

The variable "emergency" is directly reported in REIN, allowing the "urgent starters" to be selected. The urgent starters were considered "followed urgent" if they had had at least one nephrology consultation or had received erythropoietin treatment, or had an AVF created before the first hemodialysis. If none of these conditions was met, the urgent starters were defined as "unknown urgent".
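The rule separating the two urgent groups can be written as a short sketch. The field names below are hypothetical, since the REIN variable names are not given here; only the logic (at least one nephrology consultation, or erythropoietin treatment, or an AVF created before the first session) comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class UrgentStarter:
    # Hypothetical field names; the actual REIN variables are not named in the text.
    nephrology_consultations: int  # consultations in the year before the first dialysis
    received_epo: bool             # erythropoietin treatment before initiation
    avf_created: bool              # AVF created before the first hemodialysis session


def classify_urgent_starter(patient: UrgentStarter) -> str:
    """Label an urgent starter as 'followed urgent' or 'unknown urgent'."""
    followed = (
        patient.nephrology_consultations >= 1
        or patient.received_epo
        or patient.avf_created
    )
    return "followed urgent" if followed else "unknown urgent"
```

A single positive criterion is enough to place a patient in the "followed urgent" group, which is why this group can still include patients with little actual nephrology care.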

REIN and SNDS are two anonymized databases without a common identifier. They were matched by the indirect "deterministic" method, as validated by Raffray et al., on age, sex, place of residence, place of treatment, date of treatment and death date if relevant [20]. Followed patients without any initial hospital stay, with a short initial hospital stay (≤ 72 h) in a nephrology department, or with a hospital stay for a reason other than a renal disease were identified as "followed planned", and those with an initial hospital stay > 72 h in a nephrology department as "followed symptomatic non-urgent starters". The patients who did not match any group identification after these two stages were excluded from the analysis.
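To illustrate the principle of indirect deterministic matching, the sketch below links two record sets on an exact composite key. It is a single-pass simplification under stated assumptions: the field names are hypothetical, and accepting a link only when exactly one candidate shares the key is an assumption of this example, not a description of the validated multi-step procedure [20].

```python
from collections import defaultdict

# Hypothetical field names standing in for the matching variables listed in the text.
KEY_FIELDS = ("age", "sex", "residence", "treatment_place", "treatment_date", "death_date")


def link_key(record: dict) -> tuple:
    """Build the composite key; a missing death date is simply None."""
    return tuple(record.get(field) for field in KEY_FIELDS)


def deterministic_match(rein_records: list, snds_records: list):
    """Return (matched pairs, unmatched REIN records).

    A REIN record is linked only when exactly one SNDS record shares its key
    (an assumption of this sketch, to avoid ambiguous links).
    """
    index = defaultdict(list)
    for snds in snds_records:
        index[link_key(snds)].append(snds)

    matched, unmatched = [], []
    for rein in rein_records:
        candidates = index.get(link_key(rein), [])
        if len(candidates) == 1:
            matched.append((rein, candidates[0]))
        else:
            unmatched.append(rein)
    return matched, unmatched
```

In practice, successive passes with progressively relaxed keys would be needed to recover records that differ on one variable, which is consistent with the large number of steps reported for the actual algorithm.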

Second regrouping method based on data mining (Classification and Regression Tree (CART) method)

The REIN database was divided into two samples. The first one, created by a random selection of 25% of REIN patients, was used to construct the CART. For data mining, we used only the variables meeting the following criteria: the variable must describe the baseline characteristics of patients and parameters of HD start; the rate of missing values must be ≤ 10%; for binary variables, the rate of one of the two possible values must be ≥ 10% (without restriction for categorical variables). The objective of data mining was to identify the most informative variables that could predict the 2-year overall survival rate. In this analysis, the clinical proxies for an emergency start were "initiation coded as urgent", "first initiation in ICU", and "first hemodialysis on CVC". The parameters of the CART model were determined by default (minsplit = 20, cp = 0.1). The data concerning the remaining 75% of patients (test-sample) were used to test the CART-based classification.
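The variable screening and the 25%/75% split can be sketched as follows. This is an illustration only: the function names are hypothetical, missing values are represented by None, and computing the rate of the rarer binary value over non-missing observations is an assumption, as the denominator is not specified above.

```python
import random


def is_eligible(values, max_missing=0.10, min_minority=0.10):
    """Screen one candidate variable (None marks a missing value).

    Keeps the variable if at most 10% of values are missing and, when exactly
    two distinct values are observed, the rarer one reaches at least 10%.
    """
    n = len(values)
    if n == 0:
        return False
    observed = [v for v in values if v is not None]
    if (n - len(observed)) / n > max_missing:
        return False
    levels = set(observed)
    if len(levels) == 2:  # treated as a binary variable in this sketch
        rarest = min(observed.count(level) for level in levels)
        if rarest / len(observed) < min_minority:
            return False
    return True


def split_sample(patient_ids, train_fraction=0.25, seed=0):
    """Randomly split patients into a construction sample and a test sample."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]
```

Fixing the seed makes the split reproducible, although a different seed would yield a different construction sample and possibly a different tree.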

Statistical analysis

The position and dispersion parameters were expressed as mean ± standard deviation, median, maximum and minimum for numerical data, and as number and percentage for categorical and binary variables. Overall survival rates from the first hemodialysis date to 2 years after initiation and 95% confidence intervals were assessed by the Kaplan–Meier method, and survival probabilities were compared using the Log-rank test. Multivariate Cox proportional hazards models were constructed to determine the effect of grouping on patient survival adjusted for comorbidities and baseline characteristics (gender, emergency start, biochemical parameters and causes of ESKD). Missing data were not imputed. All statistical analyses were performed with R software version 3.3.2.
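For readers unfamiliar with the Kaplan–Meier method, the product-limit estimate S(t) = ∏ (1 − d_i / n_i), taken over the death times t_i ≤ t, can be sketched in a few lines. This is a minimal illustration with hypothetical names; it is not the R code used for the analyses and omits the confidence intervals.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up duration for each patient
    events: 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= leaving
    return curve
```

Censored patients contribute to the denominator until their last follow-up time, which is what distinguishes this estimate from a crude proportion of survivors.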

Results

First regrouping in four groups according to the clinical conditions

Matching results

We could match the data of 27,905 (81%) of the 34,306 patients recorded in the REIN registry. We collected information about length of hospital stay up to 3 months before dialysis initiation to separate followed planned from followed symptomatic non-urgent starters. We excluded 3742 patients due to missing information about a hospital stay, which made the grouping impossible. For 353 patients initially identified as unknown urgent starters in REIN, after matching with SNDS, we found information about a hospital stay in a nephrology department before their first HD; these patients were also excluded. The final dataset contained 23,810 patients of whom 15,177 (63.7%) were "followed planned", 3023 (12.7%) "followed symptomatic non-urgent", 3722 (15.6%) "followed urgent" and 1888 (8%) "unknown urgent" starters (Sup Fig. 1).
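The patient flow reported above can be checked with simple arithmetic. The sketch below only reuses the counts given in the text; the variable names are illustrative.

```python
# Counts taken from the matching results reported in the text.
registry_total = 34306
matched = 27905
excluded_missing_stay = 3742   # no usable hospital stay information
excluded_inconsistent = 353    # "unknown urgent" in REIN but nephrology stay found in SNDS

final_dataset = matched - excluded_missing_stay - excluded_inconsistent

groups = {
    "followed planned": 15177,
    "followed symptomatic non-urgent": 3023,
    "followed urgent": 3722,
    "unknown urgent": 1888,
}

# Share of each group in the final dataset, in percent.
shares = {name: round(100 * n / final_dataset, 1) for name, n in groups.items()}

# Urgent starters are the sum of the two urgent groups.
urgent_starters = groups["followed urgent"] + groups["unknown urgent"]

# Proportion of registry patients retained in the clinical classification.
retained_share = round(100 * final_dataset / registry_total, 1)
```

The four groups sum to the final dataset, and the two urgent groups together give the number of emergency starts.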

Pattern of patients according to the groups of HD initiation

The mean age of the selected patients was 68.3 ± 15.1 years (36.2% women). The predominant age class was 71 years old and over across the four groups (Table 1). A total of 5610 (23.6%) patients initiated hemodialysis in an emergency condition. The most common comorbidities for all groups were diabetes, coronary heart disease, congestive heart failure and heart rhythm disorders. Diabetes and hypertension were the two leading causes of ESKD.

Table 1 Demographic, clinical and laboratory characteristics at inception in the four clinically-defined groups of hemodialysis initiation

The "followed urgent starters" were likely to have more comorbidities than other groups: 49.6% had diabetes, and 72.4% of them received insulin treatment, 40.4% had congestive heart failure, 32.6% coronary heart disease and 30.3% heart rhythm disorders.

"Unknown urgent starters" were less likely to have diabetes and insulin treatment. Still, they included a higher proportion of young patients (20.1% < 50 years, 9–13.2% in other groups), active smokers, cirrhosis, progressing cancers and HIV infection compared to followed patients. This group had lower serum albumin, hemoglobin and eGFR (Table 1); they initiated dialysis mostly with a CVC (97.5%), and only 5.5% had previously received erythropoietin (vs 52.7–65.3% in followed patients). Among the causes of ESKD, chronic pyelonephritis and vascular nephropathy were diagnosed more frequently than in followed patients.

"Followed planned starters" began dialysis treatment in the most favorable conditions compared to other patients. Compared to "followed planned", "symptomatic non-urgent starters" were slightly older, more frequently had diabetes but had fewer comorbidities or insulin treatment. They more often started on a CVC in line with their symptomatic profile.

Survival

"Followed urgent" and "unknown urgent starters" had lower 6-month survival probability compared with "followed planned" patients (Figure 1). Afterwards, the four curves became divergent and at 2 years, each group displayed a different probability of survival (Log-rank test p < 0.001) with 77.3% (95% confidence interval 76.6–78.0) for "followed planned", 79.2% (95% confidence interval 77.8–80.7) for "followed symptomatic non-urgent", 66.8% (95% confidence interval 65.3–68.3) for "followed urgent" and 71.7% (95% confidence interval 69.6–73.8) for "unknown urgent starters".

Fig. 1

2-year survival (Kaplan Meier curve) in four groups of hemodialysis starters

In a multivariate Cox proportional hazard model, the risk of mortality at 6 months was similar among the four groups. Still, at 2 years of follow-up, the risk of mortality was lower in "followed symptomatic" (hazard ratio 0.86; 95% confidence interval 0.75–0.99) and higher in "followed urgent starters" (hazard ratio 1.05; 95% confidence interval 0.94–1.18) (Sup Tables 1 and 2). The cause of ESKD was significantly associated with the 2-year mortality risk.

Second regrouping of patients in five data mining categories

A total of 8,576 (25%) REIN patients were included in the data mining sample. Using the criteria described in the "Methods" section, we selected 25 variables (Sup Table 4). The results showed that age was the most informative variable for mortality risk, followed by CVC use for the first hemodialysis, and progressing cancer (Fig. 2). The relative risk of 2-year mortality varied between 0.21 (under 57-year-old patients without cancer) and 1.9 (over 70-year-old patients with CVC utilization during the first HD). We also tested two other models (Sup Figs. 2 and 3). However, further regrouping of patients and statistical analysis were performed using the principal model elements because this model had the optimal statistical parameters, was consistent with clinical experience, and could be easily interpreted by nephrologists. Thus, the remaining 75% of patients of the REIN database were divided into five categories: age ≤ 57 years, without cancer; age 57–70 years, without cancer; age ≥ 70 years and first hemodialysis without CVC (with permanent vascular access); age < 70 years, with cancer; age ≥ 70 years, first hemodialysis with CVC. After the exclusion of 912 patients without any category identification, the final test sample included 24,818 patients.
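The assignment of a patient to one of the five categories amounts to descending a small decision tree. The sketch below is an illustration under assumptions: the function and label names are hypothetical, the order of the questions is inferred from the category definitions above, and returning no category when a required value is missing is an assumed explanation for patients left without category identification.

```python
from typing import Optional


def data_mining_category(age: Optional[float],
                         cancer: Optional[bool],
                         cvc: Optional[bool]) -> Optional[str]:
    """Assign one of the five categories, or None if a needed value is missing."""
    if age is None:
        return None
    if age >= 70:
        # For older patients, only the first vascular access matters.
        if cvc is None:
            return None
        return "age >= 70, CVC" if cvc else "age >= 70, no CVC"
    # For patients under 70, cancer status is asked first, then age.
    if cancer is None:
        return None
    if cancer:
        return "age < 70, cancer"
    return "age <= 57, no cancer" if age <= 57 else "age 57-70, no cancer"
```

Note that cancer status does not enter the two categories aged 70 years or more, and vascular access does not enter the three younger ones, so each patient needs at most two variables besides age to be classified.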

Fig. 2

Classification and regression tree: essential model. KT: central venous catheter (CVC). For other variables, cf. Table S1. This tree represents the recursive partition of the sample extracted from REIN. It should be read from top to bottom. The order in which variables are presented depends on their capacity to predict the risk of 2-year mortality. Each leaf of the tree represents a sub-group of the partition attached to a simple question. Each question refers to a single attribute and has a "yes" (left branch) or "no" (right branch) answer. The values presented in each leaf are, respectively, the relative risk of mortality, the number of deaths over the total number of patients in this sub-group and the proportion (%) of the sub-group in the total sample. To determine which sub-group a patient belongs to, one must start at the root and descend by asking a sequence of questions until the desired leaf is reached.

Pattern of patients according to the five data mining categories of HD initiation

Overall, "under 57-year-old patients without cancer" were more autonomous and displayed the lowest rate of associated chronic diseases. Viral infections (HIV, Hepatitis B and C) were particularly prevalent in this category (Tables 2, 3), and the most common causes of ESKD were glomerulonephritis and diabetes. In all other categories, diabetes and hypertension were more likely to have contributed to ESKD.

Table 2 Demographic, clinical and laboratory characteristics at inception according to data mining categories
Table 3 Results of crossing between four groups and five categories

"Under 70-year-old patients with cancer" had a much lower proportion of diabetes (3%) and viral infection compared to under 57-year-old patients without cancer. Among patients with cancer, 41.9% initiated hemodialysis as urgent starters; the majority (70.4%) with a CVC and more frequently (16.9%) than other patients in intensive care units.

Compared to other categories, over 70-year-old starters with or without CVC had more comorbidities, mainly a higher proportion of peripheral arterial disease, abdominal aortic aneurysms, stroke, coronary heart diseases, congestive heart failure, heart rhythm disorders and chronic respiratory insufficiency. Over 70-year-old patients starting with a CVC had the highest proportion of diabetes, behavioral impairment, paraplegia, or hemiplegia, walking disabilities and handicap. More than half of these patients began hemodialysis in emergency conditions.

"Over 70-year-old patients without CVC" were more likely to start the treatment in favorable conditions. The majority initiated as planned starters (only 10.9% in emergency conditions, 2.4% in intensive care unit), had been more frequently consulted by a nephrologist (5.4 ± 2.8 consultations), and had more often received erythropoietin (62.2%) in the year prior to the first dialysis. Their biochemical profile was also better than in other patients: hemoglobin (10.5 g/dl), serum albumin (34.4 g/l), plasma creatinine (508 μmol/l).

The clinical profile of the 57–70-year-old patients was close to both of the over-70-year-old categories; however, they had a slightly higher rate of cirrhosis (3.7%) compared to other categories.

Survival

The differences in survival rates between the five categories were apparent within the first 6 months and persisted throughout the two years after dialysis initiation (Fig. 3). Compared to other patients, the "under 57-year-old patients without cancer" displayed the highest survival rate (93.2%) and "over 70-year-old patients with CVC" had the poorest (68.5%) 2-year survival rates. The other categories had 82.5%, 72.4% and 61.4% of 2-year survival. Besides age, initiation on CVC was an essential element of outcomes in the patients aged 70 years or more.

Fig. 3

2-year survival (Kaplan Meier curve) in five data mining categories of HD starters

The results of the Cox model are presented in Supplemental Table 5. The data mining classification in five categories had a significant effect on overall survival. It was also significantly influenced by the cause of ESKD and emergency start that slightly increased the risk of mortality (hazard ratio 1.13, 95% confidence interval 1.03–1.24).

Discussion

To dissect the respective effect on survival of emergency hemodialysis start, type of vascular access, previous follow-up, and symptoms at the inception of dialysis, we developed two different classifications regrouping incident hemodialysis patients.

In the first classification, two groups of planned and two groups of urgent starters were defined based on clinical patterns. Differences in survival were not significant within the first 6 months of inception but diverged afterwards, suggesting that outcomes were more related to the patients' characteristics than to initiation conditions per se. The emergency start was globally associated with poorer outcomes but, among the emergency starters, patients with predialysis follow-up had a lower survival rate than those who remained unknown until inception. These intriguing results may pertain to the clinical profile associated with a followed urgent start: older patients (more than 60% ≥ 70 years old) with more comorbidities, including congestive heart failure and diabetes. We hypothesized that these patients could have started dialysis after the acute decompensation of a coexistent disease or acute metabolic disorders associated with ESKD, underlying poorer outcomes. Alternatively, frail patients with multiple comorbidities could have been identified as having a high risk of dying before dialysis, so that inception was postponed until there was an absolute indication related to a vital risk [21]. Conversely, patients starting when symptomatic or those unknown were younger, slightly better nourished, had fewer comorbidities and presumably had better tolerance to the uremic milieu, or a shorter course of chronic kidney disease, explaining a paradoxically better prognosis.

"Unknown urgent starters" were patients with more heterogeneous socio-demographic and clinical profiles. The regrouping algorithm may lack specificity since, for instance, the absence of consultations or hospital stay in a nephrology department before inception could not strictly be interpreted as an absence of follow-up or referral to a nephrologist. In some cases, the lack of information in hospital databases may be related to coding rules as the management of ESKD was "masked" by acute complications or another disease defined as the principal diagnosis. Also, "unknown urgent starters" may have had difficulties in accessing health care related to organizational barriers (late referral, geographical distance, language barrier, etc.) or due to personal reasons (migrants, very elderly persons, precarious situations, etc.). Of note, both urgent groups had a very high percentage of CVC at initiation (75% and 98%), which was a strong and independent predictor of 6-month and 24-month mortality as suggested by a few studies [22, 23].

Regarding planned hemodialysis, the difference in 6-month and 2-year survival rates between "followed planned" and "followed symptomatic non-urgent starters" was minimal, an observation which is reminiscent of the results from the only randomized trial in incident dialysis patients comparing late symptomatic versus early preemptive start, and which found similar outcomes in the two groups [4]. In a retrospective study, Rivara et al. reported a 26% excess mortality risk when symptoms were present at inception [24]. The excess risk was 40% when symptoms were related to volume overload, although the start of dialysis was probably urgent in those cases. In our study, the persistent difference in mortality across the four groups over the 2-year follow-up period suggests that the patients' profiles actually did contribute significantly to the outcomes, in addition to the condition of dialysis initiation.

In the Cox model, the effect of the first regrouping (urgent, symptomatic, follow-up) was not discriminating. We found a marginal excess of risk associated with the emergency start per se. However, inception with a CVC increased the 6-month relative risk of death by 74% independently of the first hemodialysis context. The harmful effect of CVC at inception persisted at 24-month follow-up, suggesting that start with a CVC was not random but associated with comorbidities and possibly decision-making processes. Among other factors, the excess of risk related to the causes of ESKD was significant, which may be explained by associated diabetes and other cardiovascular comorbidities. In this analysis, our results suggest that functional vascular access at initiation conveys most of the prognosis information related to emergency start.

The data mining algorithm determined age, CVC at inception and active cancer as the most informative variables to predict survival. The highest risk of mortality was observed in over 70-year-old patients starting with a CVC. As older age and CVC were independently associated with frail status, their combination, resulting from clinical practices, may have contributed strongly to the lowest survival rate.

The age variable was involved in the essential CART in the first and third segmentation. Increasing age is a well-known predictor of mortality and may confound the role of other factors. To investigate the behavior of the dataset independently of age, we performed a model excluding age from the data mining process. The causes of ESKD thus emerged as the most discriminative variable, which is not surprising as this variable was closely related to age. However, the discriminative capacity of cancer, which always remained in the second position on the tree, depended little on age at hemodialysis start.

The five data mining categories seemed to discriminate better in terms of hazard ratio than the four clinical groups. For example, the over-70-year-old patients starting with a CVC had an approximately fourfold higher mortality risk than under 57-year-old patients regardless of the vascular access. Hemodialysis initiation in emergency conditions slightly increased the relative risk of death. Regarding the other factors, the excess of risk related to the causes of ESKD was significant in both Cox models adjusted on groups or categories and other variables. Furthermore, compared to the four groups, Kaplan Meier curves of data mining categories were more likely to be visually distinct at all time points.

Importantly, in both clinical and data mining models, we found that CVC was a robust independent predictor of mortality, whereas emergency start only had a marginal effect. Thus, CVC may affect outcomes directly through an increased risk of infection and bleeding but may also serve as a proxy for patients in poor clinical condition, for whom dialysis was at first deemed inappropriate. This explanation is reinforced by the mortality curves still diverging after a 2-year follow-up, long after the initial risks due to CVC insertion.

Some limitations of our study should be kept in mind. The main advantage of the clinical classification is to rely on daily practice and fit well with known prognosis factors. To our knowledge, the comparison of characteristics and survival between these four groups has never been reported. Nevertheless, this classification has two significant drawbacks. The first relates to the complexity of the method, as the algorithm matching the two databases included more than 60 successive steps. The second limitation is the exclusion of patients without clear group identification. In addition, the reduction of the sample size by 30% may not remain without consequences on statistical power and representativity.

The advantage of CART is the possibility to explore the complete dataset without exclusion of patients or establishing prior hypotheses. Furthermore, this method clarifies the relationship between different baseline parameters and mortality risk, generating hypotheses to guide prognosis research. The CART was performed automatically by R software, but the nephrologists controlled the processes by selecting the clinically pertinent variables. Surprisingly, the CART model was relevant for clinical use, as three criteria alone (age, cancer under 70 years and CVC over 70 years) drove most of the outcomes.

Data mining exhibits two limitations. The first concerns the reproducibility and stability of the obtained results. The CART was created using 25% of the REIN database extracted at a given time. Although we tested the classification on the remaining portion of the nationwide database, we cannot be sure that another sample of 25% or a sample of another size would provide the exact same CART structure. The probability of obtaining unexpected results that are difficult to interpret from a clinical perspective is the second potential limitation and can be explained by confounding factors. The covariations used by the CART algorithm to select the predictive variables do not measure a causal link, so that some variables may hide much more fundamental ones. The hidden covariates significantly interfere in the segmentation process but do not appear later in the tree structure.

The comparison performed on the test sample revealed that these two methods were different. In general, all five data mining categories were represented within each group of the clinical classification. However, elderly patients were more likely to start planned hemodialysis if they had an arteriovenous fistula in place.

Conclusions

In conclusion, classifications based on data mining are quite innovative and may contribute to unraveling the determinants of survival after starting hemodialysis. Although each classification did capture different prognostic information, both analyses showed that starting hemodialysis on a CVC was associated with poorer outcomes than emergency start per se. Future studies should address the reasons for delayed creation of an AV fistula and identify strategies for optimizing the availability of a functional vascular access at inception of hemodialysis.