Introduction

Idiopathic pulmonary fibrosis (IPF) is a disease with a severe prognosis; retrospective studies estimate a mean survival of 3–5 years from the time of diagnosis [1,2,3,4,5]. However, the natural history of IPF is highly variable and it is very difficult to predict the disease course in individual patients [6,7,8,9]. Placebo arms of recent phase II and III clinical trials provide an opportunity to investigate the natural history of lung function decline in patients with IPF [10,11,12,13,14,15,16,17,18,19,20]. However, these results should be regarded with caution because follow-up times are relatively short (1–2 years); moreover, patients enrolled in clinical trials are not a random sample of the whole IPF population, since many patients are excluded because of age, disease severity or comorbidities. For this reason, the probability of death estimated among patients enrolled in placebo arms of clinical trials has proved to be substantially lower than in real practice, while estimated survival appeared to be longer [21, 22].

In this context, healthcare administrative databases can be a useful tool for conducting historical cohort studies on unselected populations, describing the real progression of a rare disease and investigating the role of health determinants on the main disease outcomes. In the literature, many researchers have used this approach to estimate incidence and prevalence rates of IPF in specific geographic areas and their time trends [6, 23,24,25,26,27,28,29,30]. In these studies, the diagnostic code-based algorithms used to identify IPF cases differ according to the specific information available in the databases. Although some algorithms are clinically validated and their performance is well known, few studies have used this approach to analyse the main outcomes of the disease [31].

In this study, we used the healthcare administrative databases of the Lombardy Region (10 million inhabitants) to identify a retrospective cohort of unselected patients with IPF and to analyse disease progression in terms of mortality and hospitalizations following onset. The secondary objective was to evaluate the impact of patients' demographic characteristics and clinical complexity on these outcomes.

Materials and methods

Study design, data collection and study population

We conducted a retrospective observational study using healthcare administrative data of the Lombardy Region, which provides universal coverage within the Italian National Health Service (NHS).

At the regional level, data on the main health services provided are stored in several large databases for administrative and epidemiological purposes. A data warehouse (called DENALI) has been created that collects and organizes health information for the entire population of the Lombardy Region [28].

A distinctive feature of DENALI is the use of probabilistic record linkage to match data that belong to the same individual but are stored in different datasets. This linkage technique uses properties of variables common to different databases to determine the probability that two records refer to the same person [32, 33], and it therefore provides accurate matching when datasets do not share a single common identifier or when the identifier contains errors or omissions [34].
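As an illustration only, the principle of probabilistic linkage can be sketched with the Fellegi–Sunter formulation, in which each compared field contributes a log-likelihood ratio to an overall match weight. This is one common formulation and is not necessarily the one implemented in DENALI; the field names, the m- and u-probabilities and the prior are hypothetical values chosen for the example.

```python
import math

# Hypothetical per-field parameters (illustrative values, not from DENALI):
# m = P(field agrees | records belong to the same person)
# u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "sex": (0.99, 0.5),
    "municipality": (0.90, 0.05),
}


def match_weight(record_a, record_b, params=FIELD_PARAMS):
    """Sum of log2 likelihood ratios over the fields present in both records."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = record_a.get(field), record_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def match_probability(weight, prior=1e-6):
    """Posterior probability of a true match, given a hypothetical prior."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2 ** weight
    return posterior_odds / (1 + posterior_odds)
```

Record pairs whose probability exceeds a chosen threshold would be treated as the same individual; the threshold itself is a design choice that trades false matches against missed matches.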

To perform our study, we used data collected in DENALI from 2000 to 2010 on demographics (sex, date of birth, date of death, date of emigration) and on the main characteristics of inpatient and outpatient claims from public or private hospitals funded by the public healthcare system.

Since there is no single accepted IPF case definition based on the diagnosis codes reported in health claims, we ran multiple analyses, identifying IPF cases through three algorithms similar to those reported in recent studies, which have been clinically validated in the literature and can be adapted to the data available in Italy [23, 35, 36] (Table 1). The first definition of IPF case was based on the algorithm proposed by Raghu et al. [23, 35] and clinically validated by Esposito et al. [31], in patients at least 50 years old; the second was based on the algorithm proposed and validated by Ley [36]. The third case definition was also based on Ley’s algorithm, but we removed the inclusion criterion “at least 2 claims for either ICD-9 code 516.3 at least 1 month apart” in order to include patients who did not have a second claim for IPF because (1) mild disease did not require further access to the NHS, or (2) severe clinical conditions led to death within a short time.
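The difference between the second and third definitions can be illustrated with a minimal sketch of the single criterion quoted above. It covers only that criterion; the remaining inclusion and exclusion rules of Table 1 are not reproduced. The claim layout (date, code) and the reading of "1 month" as 30 days are assumptions made for the example.

```python
from datetime import date, timedelta

IPF_CODE = "516.3"            # ICD-9 code quoted in the text
MIN_GAP = timedelta(days=30)  # assumption: "1 month" interpreted as 30 days


def has_two_claims_one_month_apart(claims):
    """claims: iterable of (claim_date, icd9_code) tuples for one patient."""
    dates = sorted(d for d, code in claims if code == IPF_CODE)
    # The earliest and latest IPF claims give the widest possible gap.
    return len(dates) >= 2 and dates[-1] - dates[0] >= MIN_GAP


def has_any_ipf_claim(claims):
    """Relaxed version: a single IPF claim is enough."""
    return any(code == IPF_CODE for _, code in claims)


# Hypothetical patient with two IPF claims recorded 50 days apart
example_claims = [(date(2006, 1, 10), "516.3"), (date(2006, 3, 1), "516.3")]
print(has_two_claims_one_month_apart(example_claims))
```

A patient with a single IPF claim, or with two claims only a few days apart, fails the first check but passes the relaxed one, which is the group the third definition is meant to recover.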

Table 1 Algorithms used to identify IPF cases in Italian healthcare administrative databases

We excluded patients who had been living in Lombardy for less than 5 years at the time of the first medical claim with a diagnosis code for IPF (index event), to identify newly diagnosed IPF cases (incident cases). We chose a washout period of 5 years because the median survival of IPF patients is 2–5 years from diagnosis [1, 5]. Thus, for each case definition, we finally obtained the cohort of Lombardy dwellers with a new diagnosis of IPF between January 1st 2005 and December 31st 2010 (study period).
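The selection of incident cases described above can be sketched as follows. The function and argument names are hypothetical; the sketch assumes that each patient's IPF claim dates cover the whole period available in the data warehouse and that the start of residency in the region is known.

```python
from datetime import date

STUDY_START = date(2005, 1, 1)
STUDY_END = date(2010, 12, 31)
WASHOUT_YEARS = 5


def years_before(d, years):
    """Same calendar day `years` earlier (29 February falls back to 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def is_incident_case(residency_start, ipf_claim_dates):
    """True if the first IPF claim falls in the study period and the patient
    had lived in the region for at least the washout period before it."""
    if not ipf_claim_dates:
        return False
    index_date = min(ipf_claim_dates)  # index event = first IPF claim
    if not (STUDY_START <= index_date <= STUDY_END):
        return False
    return residency_start <= years_before(index_date, WASHOUT_YEARS)
```

Because the index event is the earliest IPF claim on record, a patient with an IPF claim before 2005 is treated as a prevalent case and is not enrolled.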

After the index event, we assessed vital status, time to death and hospitalizations occurring in the study period for each enrolled patient. Among hospitalizations, we also identified the subgroup of admissions due to an acute respiratory event, defined as those recorded with a first diagnosis of acute respiratory failure (ICD-9-CM code 518.81), acute lung oedema (518.84), pneumonia (486) or congestive heart failure (CHF) (428.0).
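A minimal sketch of how admissions could be flagged as acute respiratory events is given below. The codes and labels are copied from the text; the record layout (a list of diagnosis codes with the first diagnosis in position 0) is an assumption made for the example.

```python
# Codes and labels as listed in the text
ACUTE_RESPIRATORY_CODES = {
    "518.81": "acute respiratory failure",
    "518.84": "acute lung oedema",
    "486": "pneumonia",
    "428.0": "congestive heart failure",
}


def is_acute_respiratory_admission(admission):
    """admission: dict with a 'diagnoses' list; only the first diagnosis counts."""
    diagnoses = admission.get("diagnoses") or []
    return bool(diagnoses) and diagnoses[0].strip() in ACUTE_RESPIRATORY_CODES


def acute_respiratory_subset(admissions):
    """Keep only the admissions whose first diagnosis is in the code list."""
    return [adm for adm in admissions if is_acute_respiratory_admission(adm)]
```

An admission in which pneumonia appears only as a secondary diagnosis is not counted, which matches the restriction to the first diagnosis stated above.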

From all inpatient claims recorded during the five years preceding the index event, we identified coexisting chronic conditions, using ICD-9-CM diagnostic codes according to the algorithms proposed by Quan [37]. The coexisting pathologies were also aggregated into a comorbidity score, known as the Charlson comorbidity index (CCI) [38].
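The aggregation into a comorbidity score can be sketched as a weighted sum over the conditions identified for each patient. The sketch assumes the original Charlson weights and the common convention that a severe form of a condition supersedes its mild form; the text does not state which weighting variant or hierarchy was applied. The category names are hypothetical labels, and the mapping from ICD-9-CM codes to categories is not reproduced. The three classes follow the grouping used in the figure (0, 1 or 2, more than 2).

```python
# Original Charlson weights (assumed variant)
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "rheumatic_disease": 1,
    "peptic_ulcer_disease": 1,
    "mild_liver_disease": 1,
    "diabetes_without_complications": 1,
    "diabetes_with_complications": 2,
    "hemiplegia_or_paraplegia": 2,
    "renal_disease": 2,
    "any_malignancy": 2,
    "moderate_or_severe_liver_disease": 3,
    "metastatic_solid_tumour": 6,
    "aids_hiv": 6,
}

# Assumed hierarchy: the severe form replaces the mild form when both are present
SUPERSEDES = {
    "diabetes_with_complications": "diabetes_without_complications",
    "moderate_or_severe_liver_disease": "mild_liver_disease",
    "metastatic_solid_tumour": "any_malignancy",
}


def charlson_index(conditions):
    """conditions: set of category names found in the look-back period."""
    present = set(conditions)
    for severe, mild in SUPERSEDES.items():
        if severe in present:
            present.discard(mild)
    return sum(CHARLSON_WEIGHTS[c] for c in present)


def cci_class(score):
    """Three classes: 0, 1 or 2, more than 2."""
    if score == 0:
        return "0"
    return "1-2" if score <= 2 else ">2"
```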

Statistical analysis

At the index event, we evaluated demographic characteristics (age at onset, sex) and coexisting chronic conditions. We compared data of patients selected based on the three IPF case definitions: differences were evaluated with Pearson χ2 test for nominal and discrete variables and one-way ANOVA for continuous variables. Non-parametric tests were used if the variable distribution was not normal [39]. Bonferroni’s correction was used to assess differences between cohorts.
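For illustration, the Bonferroni correction can be sketched as multiplying each raw p-value by the number of comparisons. The sketch assumes that the correction was applied to the three pairwise comparisons between cohorts, which the text does not state explicitly.

```python
from itertools import combinations

DEFINITIONS = ("definition 1", "definition 2", "definition 3")
PAIRS = list(combinations(DEFINITIONS, 2))  # the three pairwise comparisons


def bonferroni_adjust(p_values):
    """p_values: dict mapping a comparison label to its raw p-value.
    Returns adjusted p-values, capped at 1."""
    m = len(p_values)
    return {label: min(1.0, p * m) for label, p in p_values.items()}
```

An adjusted p-value is then compared with the usual significance threshold; equivalently, each raw p-value can be compared with the threshold divided by the number of comparisons.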

For each cohort identified through the three IPF case definitions, we estimated mean annual mortality rate (per 1,000 person-years), mean annual incidence rate for hospitalization and for acute respiratory hospitalization (per person-year), presenting them with exact 95% confidence intervals (95% CI). We carried out overall and sex-stratified survival analyses using the Kaplan–Meier approach to estimate mean survival time, mean time to the first hospitalization and mean time to first admission for acute respiratory event after the index event. The log-rank test was used to assess the significance of the difference between male and female patients. Cox proportional hazards models were applied to investigate the relationship between potential risk factors (age at onset, sex, CCI and comorbidities) and outcomes (death and hospital admission). The test proposed by Harrell and Lee based on Schoenfeld residuals was used to check the proportional hazards assumption.
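The Kaplan–Meier approach can be sketched in a few lines. This is a didactic sketch, not the SAS or R procedures used for the analyses. It assumes that the mean survival time is computed as the area under the estimated survival curve up to a fixed horizon (a restricted mean), since the text does not specify how censoring at the end of follow-up was handled.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.
    times: follow-up durations; events: 1 if the event was observed, 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all subjects whose follow-up ends at the same time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored subjects leave the risk set
    return curve


def restricted_mean_survival(curve, horizon):
    """Area under the Kaplan-Meier step function from 0 to `horizon`."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)
```

Applying the same functions separately to male and female patients gives the sex-stratified curves; comparing them formally requires a log-rank test, which is not included in this sketch.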

For all statistical tests, a two-sided p-value below the pre-specified α of 0.05 was regarded as statistically significant.

All analyses were performed using SAS software, version 9.4 (SAS Institute, Cary, NC, USA) and R, version 3.5.2 (R Project for Statistical Computing, http://www.R-project.org).

Results

According to the aforementioned definitions of IPF cases, we identified 2,338 (definition 1), 460 (definition 2), and 1,704 (definition 3) incident IPF cases with a mean age at diagnosis of about 72 years and a proportion of male patients varying between 59 and 62% (Table 2). At the index event, between 26% and 30% of patients had no chronic comorbidity, depending on the IPF case definition. The most prevalent coexisting diseases among IPF patients were: other pulmonary diseases (41–42%), CHF (19–23%), diabetes (17–19%), cerebrovascular diseases (13–15%), tumour (12–15%), and myocardial infarction (11–12%). Overall, the complexity of female patients at diagnosis was significantly lower than that of male patients, regardless of the IPF case definition: 22% of males had no comorbidities, while the percentage in females varied between 33 and 41% (Tables e1–e3). Based on definition 1, 43.4% of patients died during the follow-up period, with a mean annual mortality rate of 213.6 (per 1,000 person-years) (95% CI 213.2–214.0) (Table 3). This rate increased to 234.8 (95% CI 234.2–235.3) and to 267.5 (95% CI 266.5–268.6) applying definition 3 and definition 2, respectively.

Table 2 Characteristics of the study population at index event, according to IPF case definition
Table 3 Outcomes of interest occurred during follow-up period, according to IPF case definition

In IPF patients, factors associated with mortality were age, male sex and CCI (Table e4, Fig. 1a–c, Figure e2A–e2C). When definition 2 was used, CCI was not a statistically significant predictor (Table e4, Figure e1A–e1C).

Table 4 Multivariable Cox proportional hazards models

Using definition 1, about 70% of the study population had a hospital admission for any cause and the mean annual rate (per person-year) was 1.67 (95% CI 1.63–1.71) (Table 3). Regarding hospital admissions due to an acute respiratory event, 24% of patients were hospitalized and the mean annual rate (per person-year) was 0.18 (95% CI 0.17–0.19). Based on definition 2, hospitalization rates and proportions of patients with at least one hospital admission were statistically significantly higher than those obtained using definition 1. Using definition 3, the results for hospitalizations during follow-up were similar to those obtained with definition 1 and statistically significantly lower than those obtained with definition 2. Based on definitions 1 and 3, the predictors associated with hospitalization for any cause and for an acute respiratory event were age, male sex and CCI (Table e4, Figs. 1d–I, e2D–e2I). Using definition 2, age and male sex were statistically significant predictors for acute respiratory hospitalization (Table e4, Figure e1D–e1I).

According to definition 1 and definition 3, chronic conditions associated with mortality in IPF patients were CHF, tumour and diabetes (Table 4). As for predictors of hospitalization after IPF onset, CHF and COPD were associated with this outcome.

Discussion

Our study considered incident IPF cases in the Lombardy region (about 10 million inhabitants), the most populous region in Italy, in the period between 2005 and 2010. The study describes survival, hospitalizations and frequency of comorbidities in an unselected IPF cohort with typical demographic and clinical characteristics, followed up for a long period (five years) before the introduction into clinical practice of the antifibrotic therapies pirfenidone and nintedanib. We used data from the healthcare administrative databases of the regional Health System; this approach allowed us to analyse a large unselected cohort of IPF patients and to follow them for a long period after disease onset. Using different algorithms proposed and validated in the literature [23, 35, 36], we observed that mortality and hospitalization rates are high in patients with IPF and that age, sex and comorbidities significantly affect clinical outcomes after onset. The median survival of IPF has been reported to be between 3.5 and 4.4 years in the US and 3.1 years in the UK, and the cause of death was often the disease itself [7, 26]. Subgroup analyses consistently indicate that the elderly are the most affected demographic subgroup [6, 7, 23, 26, 40, 41, 42, 43] and the one with the poorest prognosis [44]; regarding sex, men have been shown to be more susceptible to IPF [7, 23, 24, 26], even if women seem to have a higher rate of increase in IPF mortality [6]. Our results confirm the severe prognosis of the disease, with a mean survival from diagnosis of 3–3.4 years. Comparisons are difficult because of the different methods used to identify study populations and the different study designs. In this unselected IPF population, males are more frequently represented and show a greater prevalence of comorbidity than females. As in a previous study [45], in our population, females show a significant survival advantage over males, even after adjusting for age and chronic comorbidities.

Hospitalizations are common events in patients with IPF [46, 47]. Studies on IPF patients have analysed the significance of hospitalization [48,49,50]; non-elective hospitalization is usually clinically significant for the patient and is associated with a high risk of subsequent mortality [51]. For this reason, hospitalization has been used as an end-point in IPF clinical trials [15,16,17]. Hospitalization is common in our patients with IPF during the course of the disease, with a mean annual rate (per person-year) varying between 1.67 (95% CI 1.63–1.71) and 2.60 (95% CI 2.49–2.70). Hospitalization is also reported in clinical trials, but occurs with a much lower frequency [52]. This may be due to better case selection, enrolment of patients who generally have less severe disease, have fewer comorbidities and are not elderly, and/or a shorter follow-up period.

In the literature, prevalence data on comorbidities in IPF are sparse. The reported prevalence of comorbidities is variable; it depends on the type of study and on how comorbidities were assessed. Comorbidities may affect the quality of life of IPF patients and potentially influence survival; it is debatable whether treating comorbidities influences the clinical outcome [5, 42], even if it is reasonable to assume that improved treatment of comorbidities may have a favourable impact on the clinical course of these patients [42, 53]. Our results confirm that comorbidities are frequently observed in patients with IPF. The high prevalence of COPD comes as no surprise, since both diseases are related to smoking. In our cohort, we found a high prevalence of neoplasms (12–15%). Previous single center retrospective studies reported a high prevalence of these comorbidities and showed that some of them influenced survival [42, 54]. We confirmed previous observations and highlighted that the number of comorbidities is important for survival estimation [41, 42].

Population-based studies of IPF, such as ours, allow large populations to be analysed over long periods of time, but have been limited by reliance on diagnostic code-based algorithms that lack clinical validation. The poor positive predictive value (PPV) of the IPF algorithm is likely due to a combination of misdiagnosis at the clinical level and miscoding at the administrative level [31]. A modified IPF algorithm was derived and validated to optimize the PPV in Ley’s study [36]. When the Ley et al. [36] criteria were applied to our population, the results were identical regarding both the prevalence of comorbidities and the effect of age and sex on prognosis (death and hospitalization). With the Ley et al. [36] criteria, the number of comorbidities does not influence the patient’s prognosis: this could be because these criteria considerably reduce the size of the population, which does not allow statistical significance to be reached. The Ley et al. criteria require at least two distinct IPF diagnosis codes at least one month apart; we also introduced a modified version of the Ley criteria, believing that the requirement for at least two IPF diagnosis codes can exclude both the most severe patients and patients with very mild disease. Even with this change, the results were substantially identical.

The consistency of the results obtained with the different criteria supports the robustness of, and confidence in, the results themselves.

In agreement with previous results, in our study the chronic conditions associated with mortality in IPF patients were CHF, tumour and diabetes [41, 42]. We also observed that CHF and COPD were predictors of hospitalization.

There are some limitations in our study linked to the use of administrative healthcare data, as discussed in previous studies [28, 55]. First, the case definitions used to identify IPF patients in this study were not validated against medical record review, and it is unknown whether diagnoses were based on multidisciplinary discussion. Second, pertinent clinical data such as smoking and occupational status, environmental exposures and pulmonary function tests are not captured in our database, and their effects on our findings are therefore unknown. Finally, we were not able to establish the cause of death (whether due to the disease itself or to comorbidities) because death certificate data were not available.

Despite these limitations, our study analysed a very large population of unselected incident IPF cases over a long period of 5 years and our results were confirmed using different algorithms of IPF case definitions.

Our findings represent a retrospective observation of real-life data from a period prior to the era of antifibrotic therapy and therefore reflect the natural history of the disease. The mortality data, hospitalizations and comorbidities observed in our study probably come closer to the real situation than the same data obtained from clinical trials, in which the patient population is definitely more homogeneous and well studied but also more selected and less severe. Differences between the data obtained from clinical trials and real life also emerge from registries such as the German INSIGHT-IPF registry, where the yearly mortality, for example, turned out to be higher than the rate observed in clinical trials (observed mortality rate 14.2% per year vs. 7–8% in the placebo group of various clinical trials) [55, 56, 57]. The data collection process is different from ours, but the German study and our experience both highlight the marked difference between the clinical history of IPF observed in practice and the data provided by large randomized controlled clinical trials.

It should be emphasized that the data in our study relate to a time when the currently approved antifibrotic drugs for IPF therapy were not yet available. It will be interesting to see whether the introduction of these drugs, which may slow the progression of the disease, will in the coming years lead to a change in survival that can be detected in studies on real life patients like ours.

Lastly, one important finding from our study is the significant effect of sex on the clinical history of the disease. If these observations are confirmed by further studies, we believe that sex should be taken into account in patient stratification when designing future clinical trials to avoid possible confounding factors in interpreting the results.

Conclusions

Since the Italian healthcare system is universal, our data source provided us with one of the largest samples of patients ever considered, without age limitations, and with a long follow-up period. Our results suggest that the burden of IPF could be considerable, potentially requiring substantial health care resources, as IPF patients present many comorbidities and are likely to be hospitalized.

These real-life data confirm the poor prognosis of IPF, the high mortality and risk of hospitalization, and the frequent presence of comorbidities related partly to the age of the affected population and partly to smoking. Our data provide evidence that the disease prognosis is significantly worse in men, in terms of both higher and earlier mortality and earlier hospitalization. From the methodological point of view, our results were confirmed after application of Ley’s criteria and definition, which supports confidence in our results.

Fig. 1

Results of Cox proportional hazards models in incident IPF cases identified by definition 1. Survival function and probability of hospitalization during follow-up time. a–c Survival functions estimated by sex, for a 70-year-old patient (mean age at IPF onset) and no chronic comorbidities (CCI equal to 0) (panel a), CCI equal to 1 or 2 (panel b) and CCI more than 2 (panel c). d–f Probability of first hospitalization (ordinary or day hospital) estimated by sex, for a 70-year-old patient (mean age at IPF onset) and no chronic comorbidities (CCI equal to 0) (panel d), CCI equal to 1 or 2 (panel e) and CCI more than 2 (panel f). g–l Probability of first acute respiratory hospitalization estimated by sex, for a 70-year-old patient (mean age at IPF onset) and no chronic comorbidities (CCI equal to 0) (panel j), CCI equal to 1 or 2 (panel k) and CCI more than 2 (panel l)