1 Introduction

Chronic lung disease is one of the leading causes of morbidity and mortality across the world. As smoking rates in the developing world increase, the prevalence of chronic lung disease is set to rise. Interstitial lung diseases (ILD) are characterised by inflammation and scarring of the lung and the incidence of ILD continues to increase [25].

A subset of ILDs are characterised by lung fibrosis, with idiopathic pulmonary fibrosis (IPF) having the worst prognosis of all the fibrosing ILDs [4]. In IPF the airways are pulled open by fibrotic contraction of the surrounding connective tissue. Computed tomography (CT) imaging is used to visualise airway structure. In IPF the presence of dilated airways in the lung periphery on CT, termed traction bronchiectasis, is a disease hallmark.

When assessing disease severity in IPF, physiologic measurements are typically used. However, these are associated with a degree of measurement variability. It has been postulated that combining imaging measures of airway abnormality with lung function measurements could help improve estimation of disease severity in IPF [18]. Importantly, better measures of disease severity would benefit cohort enrichment of subjects into therapeutic trials.

Lung damage in IPF progresses from the distal lung towards the centre of the lung [15]. As a result, the earliest signs of lung damage are seen in the smaller airways. Yet these airways are typically the most challenging to quantify. Airway measurement is complicated by partial volume effects that result in smaller airways having a blurred contour to their walls. Measurement challenges are compounded by variations in CT image acquisition including different reconstruction kernels, scan parameters and scanner models as well as the underlying pathology affecting the lung.

Physics-based airway measurement algorithms tend to perform suboptimally when measuring the lumens of small airways [3, 12]. Identifying airway walls can also be challenging. Airway paths often run in tandem with those of the pulmonary artery. Consequently, in regions where the pulmonary artery abuts the airway wall, identification of the contour of the outer airway wall is compromised.

1.1 Related Work

Deep learning frameworks have been applied to the measurement of airways in the lung in a bid to improve measurement accuracy. However, these machine learning methods are extremely data hungry and can be biased towards the training data sample [10]. Synthetic data by way of generative models has been employed to improve the training of deep learning models. This helps overcome the data limitations that are ubiquitous to medical imaging studies [24].

A state-of-the-art method for measuring airway lumen radius and wall thickness on CT imaging, simGAN [16, 21], takes labelled simplistic representations of airway patches (synthetic images) and aims to transform them into emulations of real airways by generative adversarial network (GAN) training [6]. These are then used for supervised training of a convolutional neural regressor (CNR), which learns to measure airway radius and wall thickness and is ultimately used to run inference on real CT images.

The driving loss for realism in simGAN is cross-entropy loss computed on the classifications of the discriminator. For successful synthetic refinement by image transformation, the synthetic and refined images must have good correspondence in their shared label. To this end, a per-pixel \(\Vert l\Vert _{1}\) regularisation loss is applied between input and output of the refiner.

GAN training is inherently unstable with mode collapse complicating and lengthening training times. As an alternative strategy, in this paper we propose the first use of perceptual losses to generate labelled synthetic airway images. Perceptual loss functions have been applied to image style transfer and super-resolution tasks [11]. We explore the clinical benefits of learning from perceptual loss generated synthetic data in mortality prediction.

2 Methods

In the first part of our study we generate synthetic airway patches that demonstrate realistic airway characteristics. In tandem, we segment the airways on clinical CT scans of a cohort of IPF patients. We train our Airway Transfer Network (ATN) to transform our synthetic images to refined images across our synthetic and real datasets by optimising for perceptual losses. We then compare the results of ATN with simGAN. A CNR is trained on the resultant refined datasets for the purpose of inference on real CT airways. We compare the two refiner models qualitatively. We compare ATN and simGAN against the full width at half maximum edge-cued segmentation limited (FWHMesl) technique, originally proposed by [12] and as implemented in [20]. The FWHMesl technique is widely used in the literature as the reference for comparison of previous airway measurement methods [7, 16, 26]. In our clinical comparison, we examine which of the three methods of airway measurement provides the best and most consistent association with mortality on CT scans of patients with IPF.

Airway segmentation was performed using a 2D dilated U-Net [27] trained on CT scans in 25 IPF and healthy individuals [17]. We extract orthogonal airway patches for all segmented airways. We parameterise airway labels as two ellipses that share centre and rotation, resulting in 7 parameters for each patch: inner airway wall major and minor axis radii \(R_{A}\) and \(R_{B}\); outer airway wall major and minor axis radii \(W_{A}\) and \(W_{B}\); centre coordinates \(C_{x}\) and \(C_{y}\); and rotation \(\theta \). Due to the phase in \(\theta \), for the purposes of CNR training the rotation angle is converted into a double angle representation [13].
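The double-angle conversion can be sketched as follows. An ellipse is unchanged by a rotation of \(\pi\), so \(\theta \) and \(\theta + \pi \) describe the same label; encoding the pair \((\cos 2\theta , \sin 2\theta )\) removes this ambiguity and gives the regressor a continuous target. The function names are illustrative and are not taken from the released code.

```python
import math

def encode_double_angle(theta):
    """Map an ellipse rotation (radians) to a continuous 2-vector."""
    return math.cos(2.0 * theta), math.sin(2.0 * theta)

def decode_double_angle(c, s):
    """Recover a rotation in the half-turn range from its double-angle encoding."""
    # atan2 returns the doubled angle; halve it and wrap modulo pi.
    return (0.5 * math.atan2(s, c)) % math.pi
```

Rotations that differ by \(\pi \) map to the same encoded pair, so the regression target no longer jumps at the wrap-around point.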

Once the refiner model has been trained, its output is used to train a CNR by supervised learning to regress to target airway labels. The inner and outer airway wall measures are then derived. All deep learning methods were implemented in PyTorch [19] and CT image processing was done using the open-source airway analysis framework known as AirQuant [17]. We release our code as open source.

2.1 Airway Synthesis

Details of airway parameters and synthesis pipeline have been previously described [16]. Airway characteristics are sampled from a set of distribution parameters informed by [23]. We deviate from these parameters in two ways. First, we use an airway lumen radius (LR) interval of [0.3, 6] mm to permit measurement of smaller airways. Second, we use an airway wall thickness interval of [\(0.1 \cdot LR + 0.2\), \(0.3\cdot LR+0.8\)] mm to reflect the lack of airway wall thickening in IPF. We add four further parameters: (i) An airway centre offset drawn from a normal distribution \(X\sim N(0,1)\) mm, to account for airway skeletons that are not perfectly positioned within the centre of the airway lumen. (ii) A probability \(p=0.4\) that an adjacent airway of similar diameter is added. This is performed to accommodate airway patches close to airway bifurcations and to train the CNR to correctly identify the airway in the centre of the patch. (iii) An ellipsoidness characteristic, sampled from a uniform distribution \(X\sim U(0.9,1)\), which determines the ratio between the major and minor radii of the ellipse so that our airways are modelled as ellipsoids. (iv) A uniformly random rotation applied to the airway in the horizontal axis. We include our synthetic airway generator and configuration parameters in the open-source code repository.
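As an illustration of the sampling described above, the following minimal sketch draws one set of airway parameters using only the Python standard library. The intervals and probabilities are those stated in the text; the choice of uniform sampling for lumen radius and wall thickness, the rotation range, and all names are assumptions made for this example and may differ from the released generator.

```python
import math
import random

def sample_airway_parameters(rng):
    """Draw one set of synthetic airway parameters (lengths in mm)."""
    # Lumen radius in [0.3, 6] mm; uniform sampling is an assumption.
    lumen_radius = rng.uniform(0.3, 6.0)
    # Wall thickness bounds depend linearly on the lumen radius.
    wt_low = 0.1 * lumen_radius + 0.2
    wt_high = 0.3 * lumen_radius + 0.8
    wall_thickness = rng.uniform(wt_low, wt_high)  # uniform is an assumption
    # (i) Centre offset: one N(0, 1) mm draw per axis.
    centre = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    # (ii) An adjacent airway is added with probability 0.4.
    has_adjacent = rng.random() < 0.4
    # (iii) Ratio between the major and minor radii, drawn from U(0.9, 1).
    ellipsoidness = rng.uniform(0.9, 1.0)
    # (iv) Rotation; the range [0, pi) is an assumption.
    rotation = rng.uniform(0.0, math.pi)
    return {
        "lumen_radius": lumen_radius,
        "wall_thickness": wall_thickness,
        "centre": centre,
        "has_adjacent": has_adjacent,
        "ellipsoidness": ellipsoidness,
        "rotation": rotation,
    }

params = sample_airway_parameters(random.Random(0))
```

Passing an explicit `random.Random` instance keeps the draws reproducible for a given seed.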

2.2 Perceptual Losses

We implement perceptual losses for computing high level perceptual differences between synthetic and real images as described by [11]. These losses are computed by comparing the activations in particular layers, j of a pretrained convolutional neural network (CNN), \(\phi \) between a pair of images. Different activation layers of a trained CNN learn to represent different image features on the same sampled patch. In minimising for perceptual losses we are looking to reduce the differences in the activation of these layers between the refiner output and some objective image. For each calculation of perceptual losses on a synthetic input image, x we have a refiner prediction, \(\hat{y}\). As a modification of the original style transfer implementation [11], a randomly chosen real image is selected as the style target, \(y_{s}\). Perceptual losses are then calculated and summed for different layers \(\phi _{j}\).

We utilise feature reconstruction loss. This is defined as the mean euclidean distance between activations of the input and output images of the refiner, where C, H, and W are the number of channels, height and width of layer j respectively. We use a VGG-16 [22] network pretrained on the ImageNet dataset [2] in our calculations of style and feature losses.

$$\begin{aligned} l^{\phi ,j}_{feat}(\hat{y},x) = \frac{1}{C_{j}H_{j}W_{j}} \Vert \phi _{j}(\hat{y}) - \phi _{j}(x)\Vert _{1} \end{aligned}$$
(1)

We also employ style reconstruction loss, which considers those features that tend to be activated together between the refiner output and the given style target image, a random real airway, where \(G^{\phi }_{j}\) is the gram matrix for a given layer j of \(\phi \) as described in [5].

$$\begin{aligned} l^{\phi ,j}_{style}(\hat{y},y_{s}) = \frac{1}{C_{j}H_{j}W_{j}} \Vert G^{\phi }_{j}(\hat{y}) - G^{\phi }_{j}(y_{s})\Vert _{1} \end{aligned}$$
(2)
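Equations (1) and (2) can be illustrated with a small, framework-free sketch. Here the activations of a layer are represented as a list of C channels, each a flat list of H·W values; in practice they would be tensors taken from the pretrained VGG-16. The function names and the unnormalised Gram matrix are assumptions made for this example, so the released code may scale these terms differently.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equally shaped nested lists."""
    return sum(abs(x - y)
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

def feature_loss(feat_pred, feat_input):
    """Eq. (1): L1 distance between activations, divided by C*H*W."""
    n = len(feat_pred) * len(feat_pred[0])  # C * (H * W)
    return l1_distance(feat_pred, feat_input) / n

def gram_matrix(feat):
    """C x C matrix of channel co-activations (unnormalised: an assumption)."""
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in feat] for ci in feat]

def style_loss(feat_pred, feat_style):
    """Eq. (2): L1 distance between Gram matrices, divided by C*H*W."""
    n = len(feat_pred) * len(feat_pred[0])
    return l1_distance(gram_matrix(feat_pred), gram_matrix(feat_style)) / n

# Toy activations with C = 2 channels and H*W = 2 positions.
feat_a = [[1.0, 2.0], [0.0, 1.0]]
feat_b = [[1.0, 0.0], [0.0, 1.0]]
```

Because the Gram matrix sums over spatial positions, the style term compares which channels fire together rather than where they fire, which is why it can be computed against a randomly chosen real patch that shares no label with the synthetic input.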

2.3 Clinical Data

We examined CT images from 113 IPF patients diagnosed at the University Hospitals Leuven, Belgium. CTs were evaluated by an experienced chest radiologist (author JJ) for quality, i.e. absence of breathing artefacts and infection. The quality of the automated segmentation was also visually inspected to ensure contiguous airway segmentations without oversegmentation blowouts. Airway segmentations were also required to reach the sixth airway generation in the upper and lower lobes to be selected for analysis. Pulmonary function tests were considered if they occurred within 90 days of the CT scan: forced vital capacity (FVC, n = 111); diffusing capacity of the lung for carbon monoxide (DLco, n = 103).

The trachea and first generation bronchi were excluded from analysis. We define an airway segment as the length of airway that runs between airway branching points or an airway endpoint. All airway segments were pruned by 1 mm at either end to avoid bifurcating patches. \(80\times 80\) pixel size orthogonal airway patches were linearly interpolated with a pixel size of \(0.5\times 0.5\) mm from the CT at 0.5 mm intervals along each segment. This resulted in a final set of 546,790 real CT-derived airway patches. A synthetic dataset of 375,000 patches was generated to train our refiner.
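The pruning and sampling scheme along a segment can be sketched as below. The 1 mm prune and 0.5 mm interval come from the text; the function name, the inclusion of both end positions, and the handling of segments shorter than 2 mm are assumptions made for this example.

```python
def patch_positions(segment_length_mm, prune_mm=1.0, interval_mm=0.5):
    """Arc-length positions (mm) at which patches are sampled along a segment."""
    usable = segment_length_mm - 2.0 * prune_mm
    if usable < 0:
        # Segment is too short to survive pruning at both ends.
        return []
    # Small epsilon guards against floating-point round-down.
    n = int(usable / interval_mm + 1e-9) + 1
    return [prune_mm + i * interval_mm for i in range(n)]

positions = patch_positions(5.0)
```

Under these assumptions a 5 mm segment yields seven patches, from 1.0 mm to 4.0 mm along its length. Each \(80\times 80\) patch at 0.5 mm pixel size covers a \(40\times 40\) mm field of view.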

27% of patients were female. 74% of patients had smoked previously. The median patient age was 71, with 57% of patients having died. All patients had received antifibrotic drug treatment.

Measures of intertapering, intratapering [14] and absolute airway volume were derived from the airway measurements for each airway segment. Segmental intertapering represents the relative difference in diameter of an airway segment when compared to its parent segment. Segmental intertapering is calculated as the difference in mean diameter, \(\bar{d}\), of an airway segment and its parent segment, \(\bar{d_{p}}\), divided by the mean diameter of the parent segment. Segmental intratapering is the gradient of change in diameter of the airway segment relative to the diameter of the origin of the segment. Segmental intratapering is computed by dividing the gradient, m, by the zero-intercept, c, of a line \(y=mx+c\) fitted to the diameter measurements of an airway segment. Segmental volume is computed by summing area measurements along an airway segment, and multiplying this value by the measurement interval, i.e. an integration of area along the segment’s length.

$$\begin{aligned} intertapering = \frac{\bar{d_{p}}-\bar{d}}{\bar{d_{p}}} \end{aligned}$$
(3)
$$\begin{aligned} intratapering = \frac{-m}{c} \end{aligned}$$
(4)
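A minimal sketch of the three segmental measures, following Eqs. (3) and (4) and the volume definition above. The ordinary least-squares fit and the function names are assumptions for illustration; positions are taken to be distances in mm from the segment origin, so that the intercept c estimates the diameter at the origin.

```python
def intertapering(mean_diameter, parent_mean_diameter):
    """Eq. (3): relative reduction in mean diameter from parent to child."""
    return (parent_mean_diameter - mean_diameter) / parent_mean_diameter

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x

def intratapering(positions_mm, diameters_mm):
    """Eq. (4): negative gradient divided by the zero-intercept."""
    m, c = fit_line(positions_mm, diameters_mm)
    return -m / c

def segment_volume(areas_mm2, interval_mm=0.5):
    """Sum of cross-sectional areas multiplied by the sampling interval."""
    return sum(areas_mm2) * interval_mm

# A segment narrowing from 4.0 mm to 3.7 mm over 1.5 mm.
taper = intratapering([0.0, 0.5, 1.0, 1.5], [4.0, 3.9, 3.8, 3.7])
```

With this sign convention, an airway that narrows distally has positive tapering, while an airway that fails to narrow (as expected in traction bronchiectasis) has tapering near zero or negative.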

Univariable and multivariable Cox proportional hazards models were used to examine patient survival. Multivariable models included patient age (years), gender, smoking status (never vs ever) and either FVC or DLco (as measures of disease severity) as covariates. The goodness of fit of the model was denoted by the concordance index [8]. A p-value of <0.05 was considered statistically significant.
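For readers unfamiliar with the concordance index, the following is a simplified sketch of a Harrell-style estimate for right-censored data. It is not the analysis code used in the study: the function name is hypothetical, tied survival times are ignored, and the risk score would in practice be the linear predictor of the fitted Cox model.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs ordered correctly by the risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk died earlier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    return concordant / comparable

times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]   # 0 marks a censored subject
```

A value of 0.5 corresponds to a model that ranks patients no better than chance, and 1.0 to a model that ranks every comparable pair correctly.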

2.4 Implementation Details

We use the same refiner architecture as in [16, 21], the refiner is a purely convolutional network with four repeating 3 \(\times \) 3, 64 feature ResNet blocks [9]. The measurement CNR, described in [16], is a convolutional network that feeds into two fully connected layers to learn the airway ellipse parameters. Instead of the custom CNR loss described in [16], we implemented a mean square error (MSE) loss for regressing to the airway ellipse parameters.

Synthetic images were generated to \(0.5\times 0.5\) mm pixel size making \(80\times 80\) pixel patches, corresponding to the real patch generation noted in Sect. 2.3. All images were standardised and augmented on the fly, adding random Gaussian noise [25, 25] Hounsfield units, random levels of Gaussian blurring with standard deviation scaled in the interval [0.5, 0.875] and random flipping (\(p=0.2\)). We apply random scaling on real images only, in the interval [0.75, 1.25], to increase diversity in airway size. Finally, a centre crop was applied to make a \(32\times 32\) pixel input patch.
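The final centre-crop step can be sketched on a plain nested list as follows. The function name is illustrative, and the noise, blur, flip and scaling augmentations are omitted because their exact implementation is not specified here.

```python
def centre_crop(image, size=32):
    """Return the central size x size window of a 2D nested list."""
    height, width = len(image), len(image[0])
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]

# An 80 x 80 test image whose pixel value encodes its own position.
image = [[r * 80 + c for c in range(80)] for r in range(80)]
patch = centre_crop(image)
```

For an \(80\times 80\) input the crop starts 24 pixels in from each edge. At the nominal 0.5 mm pixel size, the resulting \(32\times 32\) patch spans \(16\times 16\) mm.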

Both simGAN and ATN models were trained for 10000 steps, where the simGAN refiner had 50 training iterations and the discriminator 1 iteration for every 1 step. The simGAN discriminator was implemented as described in the original method, with a memory buffer and local patch discrimination [21]. We used Weights & Biases for experiment tracking [1].

Figure 1 demonstrates the overall method employed here as well as the ATN and CNR architecture.

Fig. 1. Schematic demonstrating the data flows and model architectures. Also included is the architecture of the Airway Transfer Network (ATN) and Convolutional Neural Regressor (CNR). \(y_{c}\), \(y_{s}\) and \(\hat{y}\) refer to the notation used for calculating feature, \(l_{feat}\), and style, \(l_{style}\), losses from the particular activation layers of the pretrained VGG-16 model. AirQuant is an open-source airway analysis framework that can extract airway patches. The CNR model feeds measurements of the real airways back to AirQuant for final analysis.

3 Results

We implemented all training on an NVIDIA GeForce RTX 2070 graphics processing unit with a batch size of 256, a learning rate of 0.001 and an \(\Vert l\Vert _{1}\) regularisation factor in the range [0.0001, 0.1]. simGAN and ATN took 14 and 0.6 h respectively to converge during training. We qualitatively found that both simGAN and ATN produced refined images of optimal quality with an \(\Vert l\Vert _{1}\) regularisation factor of 0.01.

Style transfer from paintings to natural images shows that larger-scale structure is transferred from the target image when training on losses of higher layers [11]. In order to maintain label correspondence between refiner input and output, we similarly compute the feature loss using only the relu3_3 activation layer. Style loss is computed from the two lower activation layers, relu1_2 and relu2_2, only. Figure 2 demonstrates qualitative results of our airway refinement method.

Fig. 2. Uncurated set of synthetic images x and output \(\hat{y}\) of our airway transformation network in the same relative position below. Our model was trained to minimise perceptual losses. Airways are all represented at different scales.

The CNR was trained with batch size in the interval [256, 2000] and a learning rate of 0.001. A batch size of 2000 was chosen for its speed, with training converging at around 40 epochs within one hour. The CNR achieves comparable results on ATN and simGAN refined images.

Figure 3 demonstrates qualitative results of our ATN method on real CT data. Table 1 shows results of the Cox regression survival analyses. The CNR when regressing to an airway feature demonstrated a strong association with mortality. This was despite the CNR label not perfectly aligning to the exact airway boundary.

Table 1. Cox proportional hazards results comparing mortality prediction of airway biomarkers derived by different measurement methods.

Fig. 3. Uncurated inference on real airway patches performed by our airway measurement regressor network. The network was trained on refined synthetic data from our proposed airway transformation network, which minimises perceptual losses. The inner red ellipse delineates the inner airway wall and the outer blue ellipse, the outer airway wall. Airways are all presented at different scales. (Color figure online)

4 Conclusion

We present a learning based airway measurement method trained on a transformation network that refines synthetic data using perceptual losses. Our model ATN was compared with a state-of-the-art model simGAN [16] and a physics based method FWHMesl. When assessing the clinical utility of ATN, we found that it was the strongest predictor of survival across all three airway biomarkers. We found that our method trains faster and with minimal complications, unlike a GAN framework. We expect future work to consider the generalisation of such a method, for example examining airways in patients with different diseases, images acquired on different scanner parameters and potentially on higher scale imaging such as micro-CT studies of the lungs.