Initial correspondence from Drs. Maofeng and Li

Dear Editor,

We read with great interest the study by Yehya and colleagues, which investigated risk factors for mortality in pediatric patients with acute respiratory distress syndrome (ARDS) [1]. It was a good idea to investigate the interaction between direct/indirect ARDS and immunocompromised status, as well as the interaction between infectious/non-infectious ARDS and immunocompromised status. However, because the model was validated in the same data set that was used to develop it, overfitting cannot be fully excluded. It is well known that including interaction terms increases the degrees of freedom, so the apparent fit (i.e., the fit obtained when the same data set is used to develop and validate the model) can appear good. Another issue is that the authors opted for a logistic regression model because the proportional hazards assumption of the Cox model was not fulfilled. An alternative is to model immunocompromised status with a time-varying coefficient [2]. In other words, immunocompromised status can have different effects (hazard ratios) at 7 versus 21 days (Fig. 1). In this way, the information in the survival outcome can be fully utilized. Finally, the authors used the Fine-Gray model to account for competing risks; we suggest reporting both the subdistribution hazard ratio (sHR) and the cause-specific hazard ratio. The sHR has a direct association with cumulative incidence [3]. It would also be interesting to investigate subphenotypes of pediatric ARDS using latent profile analysis or a model-based regression tree [4].

Fig. 1

Schematic illustration of a variable with a time-varying coefficient in a Cox proportional hazards regression model. A covariate x can have different effects on mortality risk at different times; in other words, its coefficient varies over time. Time-stratified effect of a fixed baseline covariate on survival. In the example, immunocompromised status is a fixed covariate determined at baseline. Note that the effects of the baseline covariate differ between time windows, resulting in a series of hazard ratios.
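To make the idea of window-specific hazard ratios concrete, the following is a minimal sketch only. It does not fit a Cox model; as a simplifying assumption it treats the hazard as constant within each window and estimates it as deaths per person-day, so the ratios are crude. The function name `window_hazard_ratios`, the cut point at day 7, and the nine patient records are hypothetical and are not taken from the study.

```python
from math import inf

def window_hazard_ratios(records, cuts):
    """Crude hazard ratio (exposed vs. unexposed) within each time window.

    records: (days_followed, died, exposed) tuples, died/exposed coded 0 or 1
    cuts:    increasing window boundaries in days, e.g. [7] or [7, 21]
    Assumption: constant hazard inside each window (deaths / person-days).
    """
    edges = [0, *cuts, inf]
    ratios = {}
    for start, end in zip(edges, edges[1:]):
        events = [0, 0]            # deaths in window, indexed by exposure
        person_days = [0.0, 0.0]   # time at risk in window, by exposure
        for days, died, exposed in records:
            if days <= start:
                continue           # no longer at risk when the window opens
            person_days[exposed] += min(days, end) - start
            if died and days <= end:
                events[exposed] += 1
        if events[0] == 0 or person_days[1] == 0:
            ratios[(start, end)] = None   # ratio undefined in this window
        else:
            rate_unexposed = events[0] / person_days[0]
            rate_exposed = events[1] / person_days[1]
            ratios[(start, end)] = rate_exposed / rate_unexposed
    return ratios

# Hypothetical data: exposed = 1 stands for immunocompromised at baseline.
records = [
    (3, 1, 1), (5, 1, 1), (20, 1, 1), (28, 0, 1), (28, 0, 1),
    (4, 1, 0), (15, 1, 0), (28, 0, 0), (28, 0, 0),
]
ratios = window_hazard_ratios(records, cuts=[7])
for window, hr in ratios.items():
    print(window, hr)
```

In this toy data set the same baseline covariate yields a ratio above 1 in the first week and below 1 afterwards, which is the pattern a single time-averaged hazard ratio would conceal.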

Reply from Drs. Yehya, Keim and Thomas

We appreciate Drs. Maofeng’s and Li’s interest in our manuscript investigating subtypes of pediatric acute respiratory distress syndrome (ARDS). They raise concerns about overfitting, the use of logistic (rather than Cox) regression, and presentation of cause-specific hazard ratios (CSHRs) in addition to subdistribution hazard ratios (SHRs). We agree overfitting is an issue and that external validation is a must. We were limited by the lack of other pediatric ARDS cohorts of adequate size for validation. We eagerly await the eventual publication of the multicenter, multinational pediatric ARDS Incidence and Epidemiology Study.

We chose logistic regression over Cox for two reasons. First, after consultation with a statistician, we were pointed to commentary by Dr. David Schoenfeld, who argues that in critical care research prolonged survival among patients who die during their stay does not benefit those patients [5]. Arguably, it would be preferable to die on day 1 rather than prolong an inevitable death until day 28, as such a patient is usually not extubated or physiologically able to enjoy the benefits of survival during the interim. Second, Cox regression assumes proportional hazards, and not all variables tested (specifically immunocompromised status) met this assumption. We considered using time-varying coefficients, but chose not to, as this would have created a three-way interaction term (immunocompromised status, time, and infectious/non-infectious ARDS). We felt this was unnecessarily confusing. Therefore, we opted for the logistic model, which required fewer assumptions about our data.

However, we agree that time-varying coefficients are appropriate. Thus, we present here the Cox proportional hazard analysis of the interaction between immunocompromised status and direct/indirect and infectious/non-infectious ARDS, stratified by separate time horizons (Table 1), as suggested. In every model, the interaction term for immunocompromised*infectious/non-infectious is significant, consistent with our logistic model.

Table 1 Effect sizes for different subsets of immunocompromised status when analyzed using Cox proportional hazard modeling

It is suggested that we report both SHR and CSHR for probability of extubation. We worry that this would be misleading as SHR and CSHR convey fundamentally different information. By censoring observations > 28 days, competing risk regression becomes comparable to ventilator-free days (VFD) at 28 days [6]. SHR reports the association between a variable and the probability of extubation given that non-survivors are assigned ventilation > 28 days (similar to assigning non-survivors VFD = 0). By contrast, CSHR reports the association between a variable and probability of extubation treating deaths as censored. This negates the effect of death on the outcome of interest (extubation), which we feel is misleading.
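The distinction can be illustrated with a small sketch. It estimates the probability of extubation by day 28 in two ways: counting death as a competing event (the cumulative incidence that the SHR relates to) and treating death as censoring (the handling that underlies the CSHR). It computes probabilities only and fits no regression model; the function name `prob_extubated` and the ten patient records are hypothetical, and the ventilator-free-day rule used here (0 for anyone who died or was still ventilated at day 28) is a simplification.

```python
def prob_extubated(records, horizon=28, deaths_censored=False):
    """Estimated probability of extubation by `horizon` days.

    records: (day, outcome) tuples, outcome in {"extubated", "died", "censored"}
    deaths_censored=False: death is a competing event (cumulative incidence)
    deaths_censored=True:  death is treated as censoring (1 - Kaplan-Meier)
    """
    at_risk = len(records)
    event_free = 1.0   # estimated probability of no counted event so far
    cum_inc = 0.0
    for day in sorted({d for d, _ in records if d <= horizon}):
        extubated = sum(1 for d, o in records if d == day and o == "extubated")
        died = sum(1 for d, o in records if d == day and o == "died")
        leaving = sum(1 for d, _ in records if d == day)
        cum_inc += event_free * extubated / at_risk
        counted = extubated if deaths_censored else extubated + died
        event_free *= 1 - counted / at_risk
        at_risk -= leaving   # everyone observed on this day leaves the risk set
    return cum_inc

# Hypothetical cohort: 4 extubated, 2 died, 4 still ventilated at day 28.
records = [
    (3, "died"), (5, "extubated"), (8, "extubated"), (10, "died"),
    (12, "extubated"), (20, "extubated"),
    (28, "censored"), (28, "censored"), (28, "censored"), (28, "censored"),
]
with_competing = prob_extubated(records)
deaths_censored = prob_extubated(records, deaths_censored=True)

# Simplified ventilator-free days: 0 unless extubated before day 28.
vfd = [28 - d if o == "extubated" else 0 for d, o in records]
share_vfd_positive = sum(1 for v in vfd if v > 0) / len(vfd)

print(with_competing, deaths_censored, share_vfd_positive)
```

With complete follow-up to day 28, the competing-risk estimate equals the share of patients with at least one ventilator-free day, whereas censoring deaths yields a higher figure because patients who died are treated as if they could still have been extubated.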

Finally, Drs. Maofeng and Li suggest that we consider a latent profile analysis or model-based regression tree for subtype identification. We completely agree. Because we anticipate similar conclusions from latent profile analysis of the existing data set, we are reserving this for identifying endotypes using biomarkers, which may identify subgroups with shared biologic rather than clinical characteristics. As for model-based regression classification, we performed classification and regression tree analysis in our original manuscript (Fig. 3), resulting in a simplified version of our logistic model. As before, this will also require validation in an external data set prior to widespread adoption.