Introduction

Computed tomography (CT) is the “gold standard” method for assessing structural changes in the lungs of patients with cystic fibrosis (CF). Although CT is more sensitive than spirometry for detecting mild disease progression [1], its role in disease monitoring remains controversial, notably because of the cumulative radiation dose. However, very-low-dose CT protocols delivering doses close to those of combined posteroanterior and lateral chest X-rays were recently described [2,3,4].

Numerous visual CT scoring methods have been proposed to quantify pulmonary structural changes, the most recent being the CF-CT score [5], derived from the Brody-II score [6]. Visual CT scores have higher sensitivity than pulmonary function tests for the detection of early changes in the CF lung [7, 8], and have been shown to correlate with clinical endpoints such as survival [9], quality of life [1, 10] and the rate of exacerbations [8, 11]. However, visual methods have questionable repeatability, require extensive training and are time-consuming [12], preventing their use in clinical practice, as well as for clinical studies. It is noteworthy that none of the recent trials of ivacaftor and/or lumacaftor, the recently released therapies aimed at improving function of the defective cystic fibrosis transmembrane conductance regulator (CFTR) protein, used CT structural changes as an endpoint [13, 14]. Automated methods could overcome the limitations of visual scoring.

Most CF-related lung morphological changes exhibit higher CT attenuation values than normal lung parenchyma. This is especially the case of bronchial wall thickening and mucus plugging, both of which are dramatically improved by ivacaftor [15].

By analogy with automated quantification of CT low-attenuation areas in emphysema [16], we postulated that automated quantification of high-attenuation structures in CF might objectively reflect disease severity and improvement under the newly released therapies.

The purpose of this study was to develop an automated density-based CT scoring method for evaluating high-attenuating lung structural abnormalities in patients with CF. To keep the CT radiation dose as low as possible, we focused on a scoring method requiring only inspiratory images.

Materials and methods

Patients

This multicentre two-phase retrospective study, based on adult outpatients with CF, was approved by the Paris Ile de France I ethics committee (ref. 13.652). The need for informed consent was waived.

Patient characteristics are presented in Table 1.

Table 1 Patient characteristics

The development phase involved CF patients from six French CF centres who had at least one gating (class 3) mutation in the gene encoding cystic fibrosis transmembrane conductance regulator (CFTR) protein and who were treated with ivacaftor. Patients were eligible if they had had at least two volumetric unenhanced chest CT exams, one before starting ivacaftor and at least one during treatment, plus spirometric measurements performed within 1 month before or after each CT exam, performed from November 2010 through September 2015. A total of 45 CT exams (17 at baseline + 28 during follow-up) from 17 patients were included. The improvement of visual score in the patients of the development cohort has been reported in a previous paper [15].

The validation phase involved an independent cohort of CF patients not treated with ivacaftor. This unpublished cohort included all 53 CF outpatients who had unenhanced chest CT and spirometric measurements performed on the same day at our nationally designated adult CF centre in 2013. These combined examinations were performed as part of their routine follow-up.

Exclusion criteria were the unavailability of CT images reconstructed with a soft kernel or a slice thickness of more than 2 mm, or the use of contrast injection. Only CT examinations performed as routine follow-up and outside an exacerbation phase were taken into account.

CT examinations

Depending on the centre, inspiratory chest CT examinations of the whole lungs were obtained with eight different 16-128 multislice CT devices from four different vendors (Somatom Sensation 16 and Somatom Definition DS, Siemens Healthcare, Erlangen, Germany; Lightspeed VCT, Optima CT660 and Discovery HD750, GE Healthcare, Milwaukee, WI, USA; Ingenuity CT, Philips Healthcare, Best, The Netherlands; Aquilion, Toshiba Medical System, Otawara, Japan).

Regarding the acquisition protocol, the tube voltage was 80, 100 or 120 kV, depending on body weight, in 3 (3%), 37 (38%) and 58 (59%) cases, respectively. Images were reconstructed with a slice thickness of 0.625-2 mm; for each case, at least one set of images was reconstructed with a standard reconstruction algorithm. This was true for both pre-treatment and follow-up CT scans. Median dose length product was 148 mGy.cm [interquartile range (IQR) = 128-185 mGy.cm]. Iterative reconstructions were used for some CT examinations. All CT acquisitions targeted the whole lung volume. Because it was not standard of care in the participating CF centres, spirometric control of the inspiratory CT acquisition was not performed, and additional expiratory images were not acquired.

Image analysis

Image analysis was performed by two radiologists (G.C. and M.P.R.) with 5 and 16 years of experience in thoracic imaging, respectively.

CT images were scored blindly to clinical information and to the date of CT. Both automated and visual scores were calculated.

Automated CT scoring

Quantification was performed after whole-lung segmentation, which consisted of separating the lungs from the chest wall and mediastinum. Two commercially available lung-segmentation software programs were systematically used for each CT, so that the influence of the segmentation software on the score could later be evaluated: Myrian XP-lung software version 1.19.1 (Intrasense, Montpellier, France) and Syngo.via Pulmo CT software version VB1B (Siemens Healthcare, Erlangen, Germany). The lung segmentations were completely automated, with no user interaction. The central airways were excluded, whereas intrapulmonary bronchi and pulmonary vessels were included.

Myrian lung segmentation was improved by systematically applying a sequence of basic morphological operators, in automated mode, and was optionally improved by additional manual editing with 3D tools, especially to include peripheral consolidations when they had been excluded from the initial lung segmentation (Fig. E1, Supplementary material).

Three sets of data were thus obtained for the development cohort:

  1. Lung segmentation with XP-lung software without manual editing (segmentation 1)

  2. Lung segmentation with XP-lung software with manual editing (segmentation 2)

  3. Lung segmentation with Pulmo CT software without manual editing (segmentation 3)

Only segmentation 1 was used for the validation cohort.

Structural changes with high attenuation values (e.g. bronchial wall thickening, mucus plugging/bronchiolar nodules, consolidation, atelectasis) were quantified with a thresholding method. Fourteen threshold values were tested for their correlation with FEV1%pred. Four fixed threshold values were tested [-300, -400, -500 and -600 Hounsfield units (HU)], as well as ten adapted threshold values, taking into account, for each CT examination, individual histogram features, namely the mode (the most highly represented attenuation value), mean lung density (MLD) and standard deviation (SD), which are known to be influenced by the inspiratory level. Expiration flattens the density distribution curve and also shifts it towards higher density values [17, 18] (see Supplementary material, Fig. E2). We hypothesised that adapted thresholds based on Mode or MLD or integrating SD might compensate for the changes of density distribution related to the level of inspiration.

Various combinations were tested, as shown in Table 2.

Table 2 Cross-sectional correlations between CT-density scores and FEV1%pred

The automated CT-density score was expressed as the ratio, in percent, between the high-attenuating (diseased) lung volume and total lung volume. For instance, a CT-density score value of 4 indicated that 4% of the total lung volume had an attenuation value greater than or equal to the threshold.
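As an illustration of the score defined above, the following minimal sketch computes a CT-density score from the attenuation values of already-segmented lung voxels. It is not the implementation used in the study: the function names, the toy data, the use of the population SD and the assumption that all voxels have the same volume (so that the voxel fraction equals the volume fraction) are ours.

```python
from collections import Counter
from statistics import pstdev


def histogram_mode(lung_hu):
    """Most frequent attenuation value (HU) among the segmented lung voxels."""
    return Counter(lung_hu).most_common(1)[0][0]


def density_score(lung_hu, threshold_hu):
    """Percentage of lung voxels with attenuation >= threshold_hu.

    Assumes equal voxel volumes, so the voxel fraction stands in for
    the ratio of high-attenuating volume to total lung volume.
    """
    high = sum(1 for hu in lung_hu if hu >= threshold_hu)
    return 100.0 * high / len(lung_hu)


# Hypothetical toy "lung": 100 voxels, mostly aerated parenchyma.
lung_hu = [-900] * 90 + [-850] * 6 + [-100] * 4

mode = histogram_mode(lung_hu)
sd = pstdev(lung_hu)  # population SD; the SD variant is an assumption

score_mode_300 = density_score(lung_hu, mode + 300)     # adapted: Mode + 300 HU
score_mode_3sd = density_score(lung_hu, mode + 3 * sd)  # adapted: Mode + 3 SD
score_fixed = density_score(lung_hu, -500)              # fixed: -500 HU

print(mode, score_mode_300, score_mode_3sd, score_fixed)
```

In this toy example the three thresholds select the same four dense voxels; on real data they would generally differ, which is what the comparison of thresholds is meant to explore.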

Additional details on image processing are provided in the online Supplementary material.

Visual scoring

The visual method used the CF-CT score, except that air trapping was not assessed because expiratory images were not available [5]. The visual scores in the two cohorts were calculated by one of the two radiologists (G.C.) who had received 1 week of intensive training in a reference centre (Lung Analysis, Erasmus Medical Centre, Rotterdam, The Netherlands) in order to achieve good interobserver agreement in scoring.

Scoring repeatability and time required

Twenty-five CT scans from the development cohort were randomly selected to assess repeatability.

Intraobserver repeatability (G.C.) was evaluated for the visual score and the automated score based on segmentation 2, which included manual editing, by two reading sessions at a 1-month interval.

Interobserver repeatability (M.P.R.) was evaluated for the automated score, based on segmentation 2. Repeatability was not assessed for the automated scores based on segmentation 1 or 3, which did not include manual editing.

The time required to obtain the visual and automated scores was measured in the development cohort.

Spirometric measurements

The percentage of predicted forced expiratory volume in 1 s (FEV1%pred) measured at the time of CT was used as an endpoint to assess the CT score performance for cross-sectional evaluation in the two cohorts. Longitudinal evaluation was also performed in the development cohort. Changes in FEV1%pred (ΔFEV1%pred) contemporaneous with each follow-up CT scan were used to assess the performance of the visual and automated scores for patient follow-up. In patients with more than two follow-up CT scans, the comparison was always performed with the closest preceding simultaneous CT and functional examinations.
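The pairing rule described above (each follow-up examination compared with the closest preceding one) can be sketched as follows. This is a hypothetical illustration: the tuple layout, the function name and the example values are assumptions, not data from the study.

```python
def consecutive_changes(exams):
    """Return (date, delta_fev1, delta_score) for each follow-up exam.

    `exams` is a list of (iso_date, fev1_pct_pred, ct_score) tuples for one
    patient. Each follow-up is compared with the closest preceding exam,
    i.e. the previous entry once the exams are sorted by date.
    """
    ordered = sorted(exams, key=lambda exam: exam[0])  # ISO dates sort as text
    return [
        (cur[0], cur[1] - prev[1], cur[2] - prev[2])
        for prev, cur in zip(ordered, ordered[1:])
    ]


# Hypothetical patient with a baseline and two follow-up examinations.
patient = [
    ("2013-06-01", 45.0, 5.0),
    ("2011-01-10", 38.0, 6.5),  # baseline, listed out of order on purpose
    ("2012-03-15", 42.0, 5.5),
]
for date, d_fev1, d_score in consecutive_changes(patient):
    print(date, d_fev1, d_score)
```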

Statistical analysis

Spearman's rank correlation coefficient (R) was used to judge the correlation between the CT scores and FEV1%pred, and the correlation between changes in the CT scores (Δscores) and changes in FEV1%pred (ΔFEV1%pred). Spearman R values were interpreted as follows: <0.40 = absent to weak correlation, 0.40-0.59 = moderate correlation, 0.60-0.79 = good correlation, ≥0.80 = strong correlation.
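For readers who wish to reproduce this type of analysis, the sketch below computes Spearman's R as the Pearson correlation of average ranks and maps its absolute value onto the interpretation scale given above. The analyses in this study were run in SAS; this pure-Python version, its function names, its example values and the choice of applying the cut-offs to the absolute value of R are illustrative assumptions.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's R: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def interpret(r):
    """Verbal label for |R|, using cut-offs at 0.40, 0.60 and 0.80."""
    strength = abs(r)
    if strength < 0.4:
        return "absent to weak"
    if strength < 0.6:
        return "moderate"
    if strength < 0.8:
        return "good"
    return "strong"


# Hypothetical values: FEV1%pred and CT-density score for five examinations.
fev1 = [30, 45, 60, 75, 90]
score = [14, 10, 11, 6, 3]
r = spearman_r(fev1, score)
print(round(r, 2), interpret(r))
```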

The statistical significance of changes in the CT scores and FEV1%pred values between baseline and last follow-up was evaluated with the Wilcoxon test.

The intraclass correlation coefficient (ICC) and Bland-Altman plots were used to assess repeatability. Excellent repeatability was assumed when the ICC was 0.8 or more.
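The quantities summarised by a Bland-Altman plot can be sketched as below: the bias is the mean difference between two reading sessions, and the 95% limits of agreement are the bias ± 1.96 SD of the differences. The function name and the example scores are hypothetical, and the ICC is not reproduced here because its exact form is not specified in the text.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    The limits of agreement are bias +/- 1.96 times the sample standard
    deviation of the paired differences.
    """
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical CT-density scores from two reading sessions of four scans.
session_1 = [4.0, 5.0, 6.0, 7.0]
session_2 = [4.5, 4.5, 6.5, 6.5]
bias, lower, upper = bland_altman(session_1, session_2)
print(round(bias, 3), round(lower, 3), round(upper, 3))
```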

SAS software version 9.4 (SAS Institute, Cary, NC, USA) was used for all analyses.

Results

Patient characteristics

A total of 70 patients were evaluated: 17 in the development cohort and 53 in the validation cohort (Table 1).

The development cohort included a median of two scans per patient (IQR = 2-3) with a median interval of 17 months between consecutive scans (IQR = 12.5-21.7). The median FEV1%pred was 38% before treatment (IQR = 33-77) and increased by a median of +3.9% (predicted) between two consecutive CT scans on treatment (IQR = -3 to +8; range = -17 to +28% predicted).

In the validation cohort, a single CT scan and the corresponding FEV1%pred value were evaluated for each patient. The median FEV1%pred was 52% (IQR = 38-68).

Cross-sectional correlation in the development cohort

All CT scans (2-4 per patient) with contemporary FEV1%pred values were analysed for cross-sectional correlations between the CT scores and FEV1%pred in the development cohort (Table 2). Using segmentation 1, the median values of mode and SD were -912 HU (IQR = -899 to -912 HU) and 170 HU (IQR = 150-183 HU), respectively. All automated CT-density scores based on adapted thresholds showed moderate to good correlations with FEV1%pred (R = -0.55 to -0.68, p < 0.001), while those based on fixed thresholds tended to show weaker correlations (R = -0.43 to -0.57, p ≤ 0.004). The highest correlation coefficient values were obtained when using Mode + 3 SD as the threshold (R = -0.61 to -0.68, depending on the segmentation method, p < 0.001). This was also true when considering only the initial CT for each patient (R = -0.71 to -0.85, p ≤ 0.005).

The correlation coefficient value with FEV1%pred was slightly higher for the visual score (R = -0.72, p < 0.001).

The correlations between the visual and automated CT scores were good to strong (R = 0.68-0.89, p < 0.001) (Table E1, Supplementary material). Median CT-density scores when using Mode + 3 SD or Mode + 300 HU as the threshold were 4.0 (IQR = 3.5-4.4) and 7.5 (IQR = 5.4-9.6) respectively, based on segmentation 1 (Table E2, Supplementary material).

Cross-sectional correlation in the validation cohort

Correlations in the validation cohort were close to those obtained in the development cohort. CT-density scores based on adapted thresholds (Mode + 300, 400 or 500 HU and Mode + 1.5, 2 or 3 SD) showed good correlations with FEV1%pred (R = -0.60 to -0.61, p < 0.001) (Fig. 1).

Fig. 1

Automated CT scoring in the validation cohort of patients with various disease severities. a Axial CT image in a patient with mild lung disease (FEV1%pred = 77%). Bronchiectasis and bronchial wall thickening are seen in the posterior segment of the right upper lobe (white arrow). These lesions are included in areas of high attenuation (pink areas). Scoring with Mode + 300 HU yielded a CT-density score of 4.4. This means that 4.4% of the total lung volume had an attenuation value greater than or equal to mode (-899 HU) + 300 HU. b Axial CT image in a patient with moderate disease (FEV1%pred = 56%) shows bilateral mucus plugging (yellow arrowheads). The CT-density score was 9.8. c Axial CT image in a patient with severe disease (FEV1%pred = 31%) shows diffuse bronchiectasis and bronchial wall thickening (yellow arrows). The CT-density score was 14.5.

The correlation between the visual CF-CT score and FEV1%pred was also good (R = -0.64, p < 0.001).

Longitudinal correlations in the development cohort

Longitudinal correlations between the ΔCT-density score and ΔFEV1%pred based on the 28 follow-up CT scans in the development cohort are summarised in Table 3. The ΔCT-density scores obtained with adapted thresholds tended to better correlate with ΔFEV1%pred than did scores based on fixed thresholds. The highest correlation coefficient values were obtained with Δ(Mode + 3 SD) (R = -0.55 to -0.61, p ≤ 0.008) and Δ(Mode + 300 HU) (R = -0.51 to -0.60, p ≤ 0.008) depending on the segmentation method.

Table 3 Correlations between longitudinal changes in CT-density scores (Δ scores) and in FEV1%pred (Δ FEV1%pred) in the development cohort

By contrast, the Δvisual CF-CT score showed only a moderate correlation with ΔFEV1%pred (R = -0.49, p = 0.008).

FEV1%pred and visual CF-CT score showed discordant evolution in 32% of cases overall, with either improvement of FEV1%pred but increase of CF-CT score or worsening of FEV1%pred but decrease of CF-CT score. Changes in the CT-density score and FEV1%pred, with automated scores based on Mode + 3 SD or Mode + 300 HU, were discordant in 24% (5/21) to 32% (9/28) of cases, depending on the segmentation method used (Fig. 2).

Fig. 2

Development cohort: longitudinal changes in FEV1%pred versus changes in a the visual CF-CT score, and in the CT-density scores based on b Mode + 300 HU and c Mode + 3 SD with segmentation 2. The changes are concordant in the left upper quadrant (improvement in both FEV1%pred and the CT score) and in the right lower quadrant (worsening of both FEV1%pred and the CT score).
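The discordance rate discussed above can be expressed as a simple sign comparison: a pair of changes is discordant when FEV1%pred and the CT score move in the same direction (for example, FEV1%pred improves while the score increases). The sketch below is illustrative only; the function name and example values are hypothetical, and counting pairs with a zero change as not discordant is our assumption.

```python
def discordance_rate(delta_fev1, delta_score):
    """Percentage of paired changes in which FEV1%pred and the CT score
    move in the same direction (both increase or both decrease).

    Pairs in which either change is exactly zero are counted as not
    discordant; this convention is an assumption.
    """
    pairs = list(zip(delta_fev1, delta_score))
    discordant = sum(1 for d_fev1, d_score in pairs if d_fev1 * d_score > 0)
    return 100.0 * discordant / len(pairs)


# Hypothetical changes for four follow-up examinations.
delta_fev1 = [5, -3, 8, 2]
delta_score = [-1.0, 0.5, 0.7, -0.2]
print(discordance_rate(delta_fev1, delta_score))  # third pair is discordant
```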

FEV1%pred values improved significantly between the pretherapeutic examination and last follow-up on ivacaftor [+6.3% predicted; 95% confidence interval (95% CI), 0-14.5; p = 0.045]. Significant improvements were also noted in the visual CF-CT score (p = 0.016) and in the CT-density scores based on Mode + 3 SD and Mode + 300 HU when calculated from segmentations 1 and 2 (p < 0.05). The CT-density score calculated from segmentation 3 also improved, but not significantly (p > 0.05) (Table E3, Supplementary material).

Repeatability and time required

The intraobserver repeatability of both the visual CF-CT score and the automated CT-density score based on segmentation 2 was excellent (ICC > 0.8). However, intraobserver repeatability was higher with the automated score (ICCs ≥ 0.947), regardless of the threshold.

The interobserver repeatability of the automated score was also excellent (ICC, 0.947-0.997) (Table E4, Supplementary material).

The average time required to obtain the automated scores was 2.0 ± 0.5 min and 0.8 ± 0.2 min when based on segmentations 1 and 3, respectively (no manual editing). It was 6.6 ± 2.4 min when based on segmentation 2, with manual editing. The visual CF-CT score took an average of 17.8 ± 7.8 min.

Discussion

We report a good cross-sectional correlation between a new automated density-based CT score for high-attenuating lung structural abnormalities and FEV1%pred in adults with CF. The automated score tended to better correlate with changes in FEV1%pred among patients treated with ivacaftor than did the visual score. The developed score was validated in a larger independent cohort of unselected adult CF patients, with similar results for two different commercially available lung segmentation software.

Although the CT attenuation of pulmonary structural abnormalities associated with CF differs from that of normal lung, few attempts have been made to quantify CF-related pulmonary lesions in terms of CT density distribution. Quantification of low-attenuation areas in order to assess air trapping showed a good correlation with residual volume (RV) and maximum mid-expiratory flow [17, 19,20,21]. However, most of these correlations were weak, and the need for both expiratory and inspiratory images raises concerns as to the radiation dose.

Our aim was to develop a score requiring only inspiratory images, to keep the CT radiation dose as low as possible. Indeed, the report by O’Connell et al. [22] highlighted the increasing exposure of patients with CF to ionising radiation, mainly attributable to CT scanning. This is why expiratory CT is not routinely performed in our centre.

Even though Palumbo et al. [23] described a flattening of the lung parenchyma CT density distribution in CF patients with severe lung impairment, we are the first to propose an automated CT scoring method based on quantification of high-attenuating lung structures.

Quantitative evaluation of airway disease in CF has been previously performed with other approaches, focusing on the analysis of airway size and geometry [24,25,26]. Wielpütz et al. [24], performing automated airway analysis, reported high negative correlations between enlarged airway dimensions and FEV1 in adults.

Quantifying high-attenuating lung structures, instead of irreversible bronchial lumen dilatation, allows changes to be monitored under the newly developed targeted therapies, and this explains why the score correlated well with FEV1 on longitudinal follow-up. Normal high-attenuating lung structures such as pulmonary vessels are also included. However, differences in pulmonary vessel volume among patients probably had little influence on score variations compared to those due to the bronchial disease.

Compared to fixed thresholds, adapted thresholds taking into account CT acquisition-dependent variations in lung density distribution improved the correlation with FEV1%pred. Indeed, lung density is known to be influenced by the level of inspiration, the scanning parameters, the quality of CT calibration and even the CT device manufacturer [27,28,29,30]. Whereas expiration flattens the density distribution curve with a shift towards higher density values [17, 18], most other parameters mainly shift the density distribution towards higher or lower values [30, 31]. Various correction methods have been proposed to quantify emphysema and air trapping, but none had previously been proposed for quantifying high-attenuating structures [17, 30,31,32]. We suspected that adapting thresholds based on Mode or MLD might partially correct the shift in density distribution, whereas the use of SD might partially correct the flattening of the density distribution due to a low level of inspiration. The benefit of adapting the threshold was supported by the stronger FEV1%pred correlations obtained with adapted thresholds. Mode + 3 SD or Mode + 300 HU offered the best compromise, optimising the correlations in both the cross-sectional and longitudinal analyses.

Adapted thresholds based on Mode also make it possible to take into account the attenuation variations due to different tube voltage settings, since Mode represents the most frequent attenuation value observed in the lung histogram. De Lavernhe et al. [33] studied the correlation between lung function and other histogram characteristics and found a moderate cross-sectional correlation of Log-iKurtosis with FEV1%pred in CF patients. This approach is different from ours, which is based on the quantification of high-attenuating structures.

The cross-sectional correlation of our automated score with FEV1%pred was slightly weaker than that of the visual CF-CT score in both the development and validation cohorts in our study, while remaining in the upper range of correlation values previously reported for visual scores (-0.33 to -0.78) [6, 34,35,36]. In contrast, as already mentioned, longitudinal correlation with FEV1%pred was slightly better for the automated score. This may be because subtle changes are more difficult to assess visually than to detect by objective measurements of attenuation, or because the visual score also takes into account irreversible changes such as bronchial dilatation.

The rates of discordance with changes in FEV1%pred were similar with the visual and automated CT scores, and in the range of those previously reported (31%, vs 24-32% in our series) [37].

Median CT-density scores calculated with lung segmentation 1 or 2 improved significantly in the CF patients treated with ivacaftor. An improvement was also noted for scores obtained with segmentation 3, even though statistical significance was reached for fewer thresholds. The similar results obtained with and without manual editing of small segmentation errors (segmentations 1 and 2) demonstrate that the process can be fully automated.

Automated CT scoring has the potential to overcome the main limitations of visual scoring methods, one of which is the training required to attain and maintain adequate repeatability. Calder et al. [12] suggested that central reading by highly trained scorers might be an option for CT research studies [20]. Automated methods do not suffer from repeatability issues and, even with manual editing, repeatability was excellent regardless of the threshold (ICC > 0.90). However, we only evaluated repeatability on the same set of scans; thus, only the influence of variation in the manually edited segmentations was evaluated, and not other potential sources of variation, whose assessment would require repeating scans within a short time frame.

Lastly, the time required for automated scoring was far shorter than for visual scoring and is compatible with daily practice.

Our study has several limitations. First, owing to the retrospective and multicentre design, the scanning techniques were not standardised. Contrary to visual scoring, density-based automated scoring is highly dependent on the scanning technique. The CT examinations used in our development cohort often had different slice thicknesses and/or tube voltages. This heterogeneity may have been detrimental to our scoring method. However, it corresponds to routine practice and did not prevent us from obtaining good correlations with pulmonary function. The use of standardised scanning protocols and calibrated breath-holds, as previously suggested [38, 39], would probably further improve the performance of our automated CT scoring method. Another limitation is that the number of CT scans per patient in the development cohort was uneven. However, our results were confirmed in the validation cohort, with only one CT scan and one FEV1%pred value per patient. A third limitation is that FEV1%pred was the only CF outcome measure. However, FEV1%pred measurement is the only surrogate for mortality to be considered as a primary endpoint by the European Medicines Agency for clinical trials in CF [40]. Correlations with other clinical endpoints such as quality of life, the exacerbation rate, and survival are important for the validation of chest CT as a surrogate outcome and should be assessed in further studies [39].

In conclusion, our results demonstrate that CT density-based automated scoring of lung structural abnormalities is feasible in CF patients. The use of adapted thresholds such as Mode + 3 SD or Mode + 300 HU to quantify high-attenuating CF-related lesions yielded a good correlation with FEV1%pred, in both cross-sectional and longitudinal analyses. This scoring method, validated in a second, independent cohort, is much less time-consuming than visual scoring and could prove suitable both in daily practice and as an objective endpoint for clinical trials.