Introduction

Lung cancer is the leading cause of cancer-related death worldwide [1]; thus, lung cancer screening trials have been performed widely in an attempt to intervene in early-stage disease and thereby reduce mortality. The NLST (National Lung Screening Trial) showed a 20 % reduction in lung cancer mortality as well as a 6.7 % decrease in all-cause mortality with low-dose computed tomography (CT) screening versus conventional chest radiography [2]. However, the benefit-to-harm ratio has still not been fully investigated, and false positive screens remain a worrying issue [3]. Targeting lung cancer screening and maximising screening benefits are the focus of future research. In this respect, selecting high-risk individuals and identifying high-risk pulmonary nodules are crucial elements of any lung cancer screening program. Various risk prediction models using patient characteristics as well as clinical risk factors have been suggested, some focusing on pre-screen or early-screen risk for selection of high-risk individuals (where no nodules have been found yet) [4–8], others focusing on the risk of malignancy of pulmonary nodules detected in screening [9–12].

For a model to prove clinically useful, it must perform well not only in the cohort from which it was developed but also on external data sets, thus documenting generalizability. Therefore, validation of suggested risk prediction models is essential.

The recently proposed risk prediction models for the probability of cancer in pulmonary nodules detected on first screening CT, suggested by McWilliams et al. [12], were developed on the cohort from the Pan-Canadian Early Detection of Lung Cancer Study (PanCan). For validation, data from participants in chemoprevention trials from the British Columbia Cancer Agency (BCCA) were used. Parsimonious and full multivariable logistic regression models were fitted. The parsimonious model included variables with p less than 0.05: sex, nodule size and nodule location. In the full model, variables with p values between 0.05 and 0.25 were added: age, family history of lung cancer, visually assessed emphysema, nodule type (nonsolid or with ground glass opacity, part-solid or solid) and nodule count per scan. Both the parsimonious and the full model were fitted with and without the variable spiculation, because this variable was not available in the BCCA data set. In the study, receiver operating characteristic (ROC) curves with areas under the curve (AUC) of at least 0.90 were achieved.

The Danish Lung Cancer Screening Trial (DLCST) is a population-based prospective randomised controlled trial with extended data on patient characteristics and both nodule characteristics and visual evaluations of emphysema [13]. In this study, we assess the discriminative performance of the PanCan model using the DLCST cohort.

Method

Study population

We used data from the DLCST, in which participants between the ages of 50 and 70 years with a smoking history of at least 20 pack-years, and a lung function of at least 30 % of the predicted, were randomised to either five annual low-dose CT examinations or annual control visits without imaging. Participants were either continuous smokers or ex-smokers who had quit after the age of 50 years and no more than 10 years prior to entering the study. A detailed description of the study has been previously published [14].

The DLCST was approved by the Ethics Committee of Copenhagen County and fully funded by the Danish Ministry of Interior and Health. Approval of data management in the trial was obtained from the Danish Data Protection Agency. The trial is registered in the ClinicalTrials.gov Protocol Registration System (identification no. NCT00496977). All participants provided written informed consent.

Imaging

The screening group was examined on CT annually over a period of 5 years, using a multi-slice CT system (16-row Philips Mx 8000, Philips Medical Systems). Examinations were performed supine after full inspiration using a low dose technique (120 kV and 40 mAs) with the following specifications: section collimation 16 × 0.75 mm, pitch 1.5 and rotation time 0.5 s. Participants were instructed to first hyperventilate three times and then inhale maximally and hold their breath during imaging. Images were reconstructed at two slice thicknesses, thick (3 mm) and thin (1 mm), using soft and hard reconstruction algorithms (kernels C and D), respectively. Nodule analyses and visual assessments were performed on thin slices.

Nodule data

In this study, participants randomised to the screening arm of DLCST with at least one nodule without benign calcification pattern were included. Benign calcification pattern was defined as central, laminated, popcorn or diffuse calcification.

Two experienced chest radiologists (KB and HH) were responsible for the initial assessments of the images during the course of the screening, in which nodule size was manually measured and given as an average of the two observations. A nodule diameter of 3 mm was considered the lower limit of a positive finding in the initial evaluation in DLCST. One experienced chest radiologist (ES), blinded to diagnoses of lung cancer, recorded spiculation and patterns of benign or potentially malignant calcification, and categorised the nodules according to type: perifissural, solid, part-solid or non-solid (pure ground glass). All nodules found throughout the study period were included with the image on which they were first seen. Perifissural nodules were not included in the prediction analysis using PanCan model coefficients, as no coefficient was stated for this nodule subtype owing to the very low risk of malignancy. In this study, nodule count during the whole period of observation was used as opposed to nodule count per scan used in the PanCan study; the DLCST data registration only allowed for this analysis. One nodule which was classified as a benign calcification turned out to be malignant. This nodule was removed from the analysis as a result of the benign classification by the blinded radiologist (ES).

Data on emphysema

Baseline study images were evaluated for emphysema by one observer (MW, radiology resident with 3 years of chest radiology experience) blinded to study identifiers as well as presence or absence of lung cancer in the individual participant. Thus, images were selected so no lung cancer findings were revealed, and in case a cancer was present in the baseline scan, the second round scan, in which the cancer had been removed, was chosen instead. Consequently, no lung cancers were shown to the observer, who recorded emphysema as present or not present. If scans without nodules were unavailable, this participant was not included in the emphysema assessment. A detailed description of the method for assessment of emphysema has been previously published [13].

Statistical analysis

Basic comparative statistics were performed with the use of Student’s t test for continuous data and Fisher’s exact test for categorical data.

To estimate the lung-cancer-risk predictive ability of the PanCan models, risks were calculated with coefficients and intercepts in accordance with the PanCan risk models, using both the parsimonious and full models with spiculation (1b and 2b). For nodule size, transformations were performed as a result of non-linearity in the relationship between size and risk of lung cancer [12]. The ability of the PanCan risk prediction models to separate persons in the DLCST cohort who developed lung cancer from those who did not was assessed by measuring discriminative accuracy with the use of ROC and AUC. Secondly, multivariable logistic regression analyses were performed using DLCST data and covariates from the PanCan parsimonious and full models, thereby estimating new regression coefficients for comparison. Because some participants had more than one nodule, the variances of effect estimates were adjusted for correlated responses (clustering of data within persons, Huber–White method). ROC and AUC were used to evaluate risk prediction. Difference between AUCs was tested by use of paired bootstrapping based on 1,000 bootstrapped samples. Statistical programming and figures were performed using R, version 3.1.1, packages rms, pROC and ROCR.
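As a minimal sketch of the two steps described above, the Python snippet below converts a linear predictor into a per-nodule risk with the inverse logit and then summarises discrimination with the AUC, computed as the probability that a malignant nodule receives a higher predicted risk than a benign one. The intercept, all coefficients, the binary `location` indicator, the uncentred 1/√diameter size term and the four example nodules are hypothetical placeholders chosen for illustration; they are not the published PanCan values [12] or DLCST data.

```python
import math

# Hypothetical placeholder values for illustration only; they are NOT the
# published PanCan intercept or coefficients.
INTERCEPT = 1.0
COEF = {"female": 0.5, "location": 0.6, "spiculation": 0.8, "size_term": -9.0}


def size_term(diameter_mm):
    """Non-linear size transformation (assumed here: 1 / sqrt(diameter))."""
    return 1.0 / math.sqrt(diameter_mm)


def predicted_risk(nodule):
    """Inverse logit of the linear predictor for a single nodule."""
    lp = (INTERCEPT
          + COEF["female"] * nodule["female"]
          + COEF["location"] * nodule["location"]
          + COEF["spiculation"] * nodule["spiculation"]
          + COEF["size_term"] * size_term(nodule["diameter_mm"]))
    return 1.0 / (1.0 + math.exp(-lp))


def auc(scores, labels):
    """AUC as P(score of a case > score of a non-case); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Four made-up nodules (cancer = 1 means malignant).
nodules = [
    {"female": 1, "location": 0, "spiculation": 0, "diameter_mm": 4, "cancer": 0},
    {"female": 0, "location": 1, "spiculation": 0, "diameter_mm": 6, "cancer": 0},
    {"female": 0, "location": 1, "spiculation": 1, "diameter_mm": 12, "cancer": 1},
    {"female": 1, "location": 0, "spiculation": 0, "diameter_mm": 20, "cancer": 1},
]
risks = [predicted_risk(n) for n in nodules]
labels = [n["cancer"] for n in nodules]
print([round(r, 3) for r in risks])
print("AUC:", auc(risks, labels))
```

Because the size coefficient is negative and the size term shrinks as the diameter grows, the predicted risk in this sketch rises with nodule diameter, which mirrors the direction of the size effect reported in the study.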

Results

In DLCST, 823 persons were diagnosed with 1,385 nodules of which 233 nodules were classified as benign calcifications and excluded, leaving a total of 718 persons and 1,152 nodules to be included in the analyses.

Table 1 shows nodule characteristics by lung cancer status, in PanCan, BCCA and DLCST, respectively. The smallest nodule diameter in DLCST was 3 mm; hence, nodules are generally larger than in the PanCan and BCCA cohorts, in which nodules with a diameter of only 1 mm were included. Mean diameters differ markedly from median diameters, and the distributions of nodule sizes are skewed. Therefore, mean nodule sizes and standard deviations are of limited interpretive value.

Table 1 Nodule characteristics in the three different cohorts

Malignancy was significantly related to nodule type (Table 1); thus, perifissural nodules had a very low risk (OR 0.1, p = 0.017), and part-solid nodules were associated with increased risk (OR 2.9, p = 0.009).

Included participants had a mean age of 59.0 years (SD 4.9); participants with cancer were slightly older than participants without cancer (mean age 61.6 (SD 4.9) versus 58.8 (SD 4.8) years, p < 0.001).

There were 338 women (47.1 %) and 380 men (52.9 %) included in the study. Of these, 66 were diagnosed with lung cancer; six participants had more than one malignant nodule. Lung cancer tended to be less frequent in women compared to men (OR 0.71, p = 0.205), and emphysema was a co-finding of 36.9 % of benign nodules and of 38.7 % of malignant nodules (OR 1.1, p = 0.788). Thirteen participants (four with lung cancer and nine without) with nodules included in this study lacked emphysema assessment and were thus excluded from the full model logistic regression.
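The emphysema odds ratio can be checked roughly from the two percentages alone. The sketch below computes the crude odds ratio implied by 38.7 % versus 36.9 %; it assumes nothing beyond those two proportions, so it ignores any adjustment (for example for clustering of nodules within participants) that the reported estimate may include.

```python
def odds(proportion):
    """Convert a proportion into odds."""
    return proportion / (1.0 - proportion)


def crude_odds_ratio(p_malignant, p_benign):
    """Odds of the co-finding among malignant nodules relative to benign ones."""
    return odds(p_malignant) / odds(p_benign)


# Emphysema as a co-finding: 38.7 % of malignant and 36.9 % of benign nodules.
crude_or = crude_odds_ratio(0.387, 0.369)
print(round(crude_or, 2))  # prints 1.08, consistent with the reported OR of 1.1
```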

Tables 2 and 3 show results from the parsimonious and full models, respectively, comparing PanCan and DLCST results. In PanCan, female sex implied a higher risk (parsimonious model: OR 1.91 (CI 1.19–3.07), p = 0.008; full model: OR 1.82 (CI 1.12–2.97), p = 0.02), whereas in DLCST, female sex tended to lower the risk (parsimonious model: OR 0.55 (CI 0.31–0.96), p = 0.047; full model: OR 0.49 (CI 0.26–0.91), p = 0.040). Furthermore, in DLCST age (OR 1.10 (CI 1.03–1.17), p = 0.001) and family history (OR 2.61 (CI 1.37–4.98), p = 0.013) were significant predictor variables in the full model (Table 3). Spiculation is a major predictor of malignancy in DLCST (parsimonious model: OR 3.77 (CI 1.72–8.30), p = 0.002; full model: OR 3.40 (CI 1.36–8.46), p = 0.013); and nodule size is by far the most important determinant in both the parsimonious and full models (regression coefficients −4.1909 and −3.8075, respectively, p < 0.001). ROC curve analysis using exclusively nodule size as the determinant variable, with DLCST coefficients, resulted in an AUC of 0.829.

Table 2 PanCan parsimonious prediction model 1b for the probability of lung cancer
Table 3 PanCan full prediction model 2b for the probability of lung cancer

For the parsimonious model with DLCST data (Fig. 1), AUCs were 0.826 and 0.853 using PanCan and DLCST coefficients, respectively; and for the full model (Figs. 2 and 3), AUCs were 0.834 and 0.870 using PanCan and DLCST coefficients, respectively. Using DLCST coefficients, AUC of the full model was not significantly larger than AUC of the parsimonious model (AUC 0.870 vs. 0.853, p = 0.064). AUC of the full model was significantly larger than AUC with nodule size as the only determinant (AUC 0.870 vs. 0.829, p = 0.015), but AUC of the parsimonious model was not significantly larger than AUC with nodule size as the only determinant (AUC 0.853 vs. 0.829, p = 0.065).
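The AUC comparisons above rest on paired bootstrapping. As a hedged illustration of the general idea, the Python sketch below resamples cases and non-cases with replacement, recomputes both AUCs on each resample so that the pairing is kept, and compares the observed AUC difference with the bootstrap standard deviation of the differences. The stratified resampling, the z-type test statistic and the simulated scores are assumptions made for this example; the study itself used R packages, and its exact settings are not stated in the text.

```python
import math
import random


def auc(scores, labels):
    """AUC as P(score of a case > score of a non-case); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auc_test(scores_a, scores_b, labels, n_boot=1000, seed=0):
    """Two-sided test of AUC(a) - AUC(b) using paired, stratified resampling."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    observed = auc(scores_a, labels) - auc(scores_b, labels)
    diffs = []
    for _ in range(n_boot):
        # The same resampled indices are used for both models (paired design).
        idx = (rng.choices(pos_idx, k=len(pos_idx))
               + rng.choices(neg_idx, k=len(neg_idx)))
        y = [labels[i] for i in idx]
        diffs.append(auc([scores_a[i] for i in idx], y)
                     - auc([scores_b[i] for i in idx], y))
    mean = sum(diffs) / n_boot
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n_boot - 1))
    z = observed / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return observed, sd, p_value


# Simulated scores only: 20 cases and 80 non-cases, not DLCST data.
sim = random.Random(1)
labels = [1] * 20 + [0] * 80
size_only = [y + sim.gauss(0.0, 1.0) for y in labels]
full = [s + 0.5 * y + sim.gauss(0.0, 0.3) for s, y in zip(size_only, labels)]

diff, sd, p = paired_bootstrap_auc_test(full, size_only, labels)
print(f"AUC difference {diff:.3f}, bootstrap SD {sd:.3f}, p = {p:.3f}")
```

Resampling the same indices for both sets of scores is what makes the comparison paired: the two AUCs are correlated because they are computed on the same nodules, and the bootstrap standard deviation of the difference reflects that correlation.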

Fig. 1 ROC curves with AUC using PanCan parsimonious model 1b on DLCST data

Fig. 2 ROC curves with AUC using PanCan full model 2b on DLCST data

Fig. 3 Malignant solitary pulmonary nodule with spiculation and part-solid opacity. Low-dose chest screening CT from DLCST

The non-linear transformation of nodule size used in the PanCan study (1/√nodule diameter) resulted in the same AUCs as did use of the transformation 1/log(nodule diameter).
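One possible reading of this finding is that the AUC depends only on how the nodules are ranked, and both transformations are strictly decreasing in diameter for diameters above 1 mm (the DLCST minimum was 3 mm). The sketch below shows this for a single predictor with made-up diameters and outcomes; for a multivariable model refitted after each transformation, the AUCs need not be exactly identical, so this is an illustration of the mechanism rather than a reproduction of the study's analysis.

```python
import math


def auc(scores, labels):
    """AUC as P(score of a case > score of a non-case); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up diameters (mm) and outcomes (1 = malignant); not DLCST data.
diameters = [3, 4, 5, 6, 8, 9, 12, 15, 20]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1]

# Negate each transformed value so that a larger score means a larger nodule.
score_sqrt = [-1.0 / math.sqrt(d) for d in diameters]
score_log = [-1.0 / math.log(d) for d in diameters]

print(auc(score_sqrt, labels), auc(score_log, labels))  # both 0.9
```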

Discussion

The high risk discrimination ability of the PanCan risk prediction models was largely validated by DLCST data from Denmark. However, AUCs did not reach 0.90. The specific AUCs of the different PanCan models were not reported in the original paper, and thus, exact comparisons are not possible.

Nodule size and spiculation are confirmed to be very significant risk predictor variables in both PanCan and DLCST (Tables 2, 3 and Fig. 3).

An important difference between the PanCan and the DLCST cohorts is the difference in nodule size of the included nodules; nodules with diameters of only 1–2 mm are included in the PanCan cohort, whereas 3 mm was the lower limit in DLCST. Thus, the mean diameter of benign nodules differs by almost 3 mm. We hypothesise that the stronger performance in the original PanCan risk prediction model cohort is mainly due to the inclusion of many very small nodules in the PanCan study. A nodule of 1–2 mm is hardly ever malignant, and this causes the discriminative power of the risk model to appear stronger than it actually is when it comes to cases of more doubt (bigger benign nodules).
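The mechanism behind this hypothesis can be illustrated with a toy calculation. In the sketch below, nodule diameter alone is used as the risk score, and the AUC is computed first for a cohort without very small nodules and then again after four benign nodules of 1–2 mm are added. All diameters are invented for the example and are not taken from PanCan or DLCST.

```python
def auc(scores, labels):
    """AUC as P(score of a case > score of a non-case); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented diameters in mm, used directly as the risk score.
malignant = [8, 12, 15, 20]
benign = [4, 6, 9, 13]
tiny_benign = [1, 1, 2, 2]  # very small nodules, all benign

sizes = malignant + benign
labels = [1] * len(malignant) + [0] * len(benign)
auc_without_tiny = auc(sizes, labels)

sizes_with_tiny = sizes + tiny_benign
labels_with_tiny = labels + [0] * len(tiny_benign)
auc_with_tiny = auc(sizes_with_tiny, labels_with_tiny)

print(auc_without_tiny, auc_with_tiny)  # 0.8125 0.90625
```

The ranking of the original eight nodules is unchanged; the AUC rises only because every malignant nodule is trivially larger than each added tiny nodule, which adds easy case versus non-case pairs to the comparison.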

Given recent advances in automated nodule volumetry and volume doubling time measurement, future work could test whether using these for size estimation, instead of subjective, visual diameter measurement, improves risk prediction.

When using DLCST data, age and family history were both significant predictors, and they should probably be added to a future parsimonious model. Age is a well-known lung cancer risk factor, and it is generally recognised that the incidence of lung cancer increases significantly with age [15]. Familial aggregation of lung cancer—across histological subtypes—has previously been documented, an effect that remains present after adjustment for confounders such as socio-economic status and smoking habits [16].

According to our logistic models, the effect of sex in DLCST seems to be opposite to that in the original PanCan models: in the DLCST cohort, female sex appears to lower the risk of lung cancer. It is still controversial whether women have a different susceptibility to tobacco carcinogens. It is also important to consider both sex-related (i.e. biological) and gender-related (i.e. socially constructed) differences between men and women; the latter differ substantially between cultures and countries, and thus a generalised cross-cultural risk prediction based on sex may be difficult [17]. Complex interactions of genetic, environmental and social factors complicate conclusions [18], and these contradictory results suggest that sex should probably be removed from the model, as no true sex-related and culture-independent difference in lung cancer risk can be concluded, and such a difference is certainly not supported by our results. The total number of malignant nodules was 102 in PanCan, 42 in BCCA and 66 in DLCST, and some nodules appeared in the same participants; therefore, given the limited total numbers of included participants with lung cancer in PanCan, BCCA and DLCST, the observed sex differences could be a matter of chance. Emphysema was not significantly associated with nodule malignancy in this validation cohort, nor was it in the PanCan study. It has, however, previously been shown that emphysema is significantly associated with lung cancer [19, 20]; perhaps emphysema is associated with increased risk of both benign and malignant nodules and is thus less useful in nodule-malignancy determination.

Regarding nodule count, it is possible that the different ways of counting nodules in DLCST and PanCan (cumulative over the whole screening period versus nodule count in the first scan only) account for the differences in OR and p values observed; no effect was seen in DLCST, whereas a significant reduction in lung cancer risk was seen with increasing nodule count in PanCan. Because we included all nodules found during all screening rounds in DLCST, and because the data in DLCST did not systematically include time of nodule disappearance, nodule count per scan could not be accurately calculated in DLCST, and thus the accumulated nodule count was used instead.

In the PanCan risk prediction model paper it is stated that smoking history was not independently associated with lung cancer in the fully adjusted model and was thus left out. It is unclear how smoking history was defined (pack-years, smoking duration, etc.); also, there are other non-significant predictor variables, which indeed were included in the PanCan full model. Both tobacco exposure (pack-years) and lung function have previously been shown in several studies to be important predictors [8, 19, 20], and they should probably be included as predictor variables in future studies. Asbestos exposure and inhalation of other harmful particles could also be tested for risk predictive potential.

Lastly, we suggest further validation of the risk prediction models in cohorts with higher prevalence of malignant nodules and where small, benign nodules are less common, as we see in daily clinical practice.

Conclusion

The PanCan risk prediction models show high lung cancer risk discrimination for solitary pulmonary nodules in the DLCST cohort, the prediction being mainly based on nodule size. However, we suggest inclusion of age and family history of lung cancer as predictor variables in the parsimonious model as well; these variables were significant predictors in DLCST. In addition, we propose the variable sex be removed from the models, as our results did not support the PanCan conclusion that female sex is associated with increased risk; further work confirming the role of sex in risk stratification in different populations is needed.