Introduction

Persistent pulmonary ground-glass nodules (GGNs) have been a growing concern in the last few years because many of them turned out to be pulmonary adenocarcinomas or their precursors [1, 2]. In the 2011 IASLC/ATS/ERS classification [3], it was recommended to report the size of the solid component and entire tumour size separately for part-solid GGNs as the invasive component of pulmonary adenocarcinoma is closely related to a patient’s prognosis. A subsequent study showed that the diameter of the solid component was a better prognostic predictor than the largest diameter of the whole nodule in adenocarcinomas appearing as part-solid GGNs [4]. Therefore, the diameter of the solid component as well as the size of the whole nodule is a key element in the recent recommendations for the management of subsolid nodules [2]. However, when the variability of manual measurements was assessed, 95 % limits of interobserver agreement were reported to be 1.72 mm and 1.73 mm for GGNs and solid nodules, respectively [5, 6]. For a part-solid GGN, the interobserver measurement variability of the solid component may be even higher because solid components are usually very small, with largest diameters of no more than a few millimetres. Furthermore, given their slow growth rate, it may not be easy to determine interval changes in GGNs with manual measurements, especially when the solid component is small.

As computer-aided volumetry was reported to reduce interobserver variability in solid nodules, a similar approach may be applicable to GGNs [7–9]. While several articles have explored semi-automated segmentation in GGNs, most of them focused on the feasibility of volumetry and mass measurements as well as their intra/interobserver and intrascan variability [10–13]. However, these studies did not compare the software measurements with manual measurements of maximal diameters of ground-glass and solid components of nodules, which are still the reference standard for GGNs [2]. More importantly, no studies have compared the software measurements with pathology measurements of invasive components and the entire tumours in pulmonary adenocarcinomas [3].

Therefore, the purpose of our study was to evaluate the performance of computer-aided segmentation of ground-glass and solid components in subsolid nodules and to compare the software and pathology measurements in pulmonary adenocarcinomas manifesting as subsolid nodules.

Materials and methods

This study was approved by the Institutional Review Board of our institution, and the requirement for written informed consent was waived owing to the retrospective design.

Patients

We retrospectively reviewed our thoracic surgery database for GGNs surgically resected between 2013 and 2015. Of 143 eligible patients with 152 resected nodules, we excluded 79 nodules in 71 patients on the basis of the following exclusion criteria: (1) unavailability of the exact size of either the entire tumour or its invasive component on pathology (75 nodules in 67 patients) (in particular, the size of the invasive component of pulmonary adenocarcinoma is reported as “larger than 10 mm” when it exceeds this threshold in our hospital), and (2) CT parameters and/or protocols judged inadequate for proper analysis of subsolid nodules (four nodules in four patients) (one HRCT with intersection gap, three CT scans with slice thickness >1.25 mm).

Of the remaining 73 nodules in 72 patients, six nodules in six patients were excluded from the final analysis due to inadequate segmentation by the software (see the segmentation accuracy paragraph for detailed information). Therefore, the final population included 66 patients (37 females and 29 males) with a mean age of 59 years (range, 33 − 76 years).

CT technique

CT images were obtained using one of the following four CT scanners: Sensation 16 (Siemens Medical Solutions, Forchheim, Germany), Somatom Definition (Siemens Medical Solutions), LightSpeed Ultra (GE Healthcare, Milwaukee, WI), or Brilliance 64 (Philips Medical Systems, Best, The Netherlands). Given the retrospective design of this study, different CT protocols were used, including CT with (n = 18) or without (n = 48) intravenous contrast material, and CT with standard- (n = 39) or low-dose technique (n = 27). In all patients, CT images were reconstructed using a high-frequency algorithm with a section thickness of 1 mm or 1.25 mm.

Image analysis

For each nodule, two radiologists (J.M.G. and J.C. with 24 and 4 years of experience, respectively) independently measured the maximal transverse diameter of the entire nodule on lung windows (window width, 1500 HU; level, -700 HU), and the maximal transverse diameter of the solid component on both lung and mediastinal windows (window width, 400 HU; level, 30 HU) using electronic calipers.

All nodules were then processed by a radiologist (J.C.) with Veolity software (version 1.1, MeVis Medical Solutions, Bremen, Germany). This software, which was previously documented and tested on both solid and subsolid nodules [14, 15], segments a targeted nodule into ground-glass and solid components (if present), and allows the user to choose different thresholds for both components. The software also allows further manual adjustments of the following parameters for both ground-glass and solid components: placement of seed points and lesion roundness. The software then provides various measures including the maximal transverse diameters of the entire nodule and its solid component. For this study, we chose to use five different thresholds for the ground-glass component (-850 HU, -800 HU, -750 HU, -700 HU, and -650 HU) and nine different thresholds for the solid component (-500 HU, -450 HU, -400 HU, -350 HU, -300 HU, -250 HU, -200 HU, -160 HU, and -130 HU). These thresholds were chosen according to a previous study using the same software [14], which had a different purpose. In particular, thresholds of -160 HU and -130 HU corresponded to the range of attenuation potentially recognizable by a human observer in a mediastinal window setting. Software measurements at these thresholds were closest to the radiologists’ manual measurements on mediastinal windows [14].
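As a rough illustration of what a pair of attenuation thresholds does, the sketch below labels voxels with a simple two-threshold rule. This is not the Veolity algorithm, whose internals are not described here; the function name `classify_voxel`, the inclusive comparisons, and the sample attenuation profile are all assumptions made for illustration.

```python
# Thresholds tested in this study (HU).
GROUND_GLASS_THRESHOLDS = [-850, -800, -750, -700, -650]
SOLID_THRESHOLDS = [-500, -450, -400, -350, -300, -250, -200, -160, -130]


def classify_voxel(hu, gg_threshold, solid_threshold):
    """Label one voxel by attenuation (hypothetical two-threshold rule)."""
    if hu >= solid_threshold:
        return "solid"
    if hu >= gg_threshold:
        return "ground-glass"
    return "background"


# Hypothetical attenuation values along a line through a nodule (not study data).
profile = [-900, -780, -720, -600, -340, -100, -360, -760, -880]

# Label the profile with one threshold pair from the lists above.
labels = [classify_voxel(hu, -750, -350) for hu in profile]

# Number of threshold pairs that could be combined per nodule.
n_pairs = len(GROUND_GLASS_THRESHOLDS) * len(SOLID_THRESHOLDS)
```

Under this assumed rule, raising the solid threshold shrinks the region labelled solid, which is why the measured solid diameter depends on the threshold chosen.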

Segmentation accuracy

As in a previous study on solid nodule segmentation [16], the segmentation accuracy for each nodule was evaluated by the radiologist who processed the nodules with the software using the following visual scale: (1) excellent segmentation of both GGN and solid components (Figs. 1 and 2); (2) good segmentation in which the proportion of correct segmentation was 80 % or greater for both components with no vessel incorrectly segmented as a solid component (Fig. 3); (3) insufficient segmentation in which the proportion was less than 80 % for either component or with a vessel incorrectly segmented as a solid component (Fig. 4); and (4) failure in which a nodule could not be segmented. Segmentation scores were then further classified into two groups: adequate segmentation (1 and 2) and inadequate segmentation (3 and 4).

Fig. 1

Example of an excellent segmentation for a part-solid GGN in the right upper lobe before (a) and after (b) segmentation with thresholds of -750 HU and -350 HU for ground-glass and solid components, respectively. Software measurements were 16.8 mm for the ground-glass component and 8.7 mm for the solid component. Pathology obtained with wedge resection revealed a lepidic predominant adenocarcinoma with a maximal tumour size of 17 mm and an invasive component size of 9 mm

Fig. 2

Example of an excellent segmentation for a part-solid GGN in which there was a moderate correlation between software and pathology measurements (a. axial view of the segmented nodule, b. sagittal). Software measurements for the ground-glass and solid component were 8.7 mm and 3.5 mm, respectively. Manual measurement for the ground-glass component was 10 mm and those for the solid component were 5 mm and 3.8 mm on lung and mediastinal windows, respectively. Pathology obtained with wedge resection revealed an acinar predominant invasive adenocarcinoma with a tumour size of 12 mm and an invasive component size of 8 mm

Fig. 3

Example of a good segmentation (a. axial view of the segmented nodule, b. sagittal). Pathology obtained with segmentectomy revealed an MIA with an invasive component size of 2 mm. In this case, the best visual segmentation of the ground-glass component was obtained using a -850 HU threshold, but small portions of the ground-glass component on the medial and inferior aspects of the nodule (arrows) could not be included in the segmentation even after adjustments

Fig. 4

Example of an insufficient nodule segmentation (a. axial view of the segmented nodule, b. sagittal). Pathology obtained with segmentectomy revealed an MIA with an invasive component size of 1 mm. In this case, even after further adjustments (e.g., increasing the roundness) to the default segmentation of the software, it was not possible to separate its solid component from adjacent pulmonary vessels

When segmentation accuracy was inadequate, the observer was allowed to retry segmentation using manual adjustments of seed points and/or roundness parameter for either ground-glass or solid components.

Surgery and pathologic evaluation

Of the 67 nodules included in the final analysis, 15 were resected by lobectomy and 52 by sublobar resection (16 by segmentectomy and 36 by wedge resection).

For each resected nodule, the longest diameter of the tumour was mostly measured on a gross specimen using a ruler placed along the tumour, but when the whole tumour could be mounted on pathologic slides, it was measured under light microscopy on a representative slide containing the largest cross-section of the tumour. The invasive component of adenocarcinoma was measured with a ruler after drawing the border of the invasive component under light microscopy.

Statistics

For the results of manual measurements, mean values from two observers were used. In regard to the ground-glass components, the manual measurements on lung windows, as well as software measurements using the five aforementioned attenuation thresholds, were compared with the tumour sizes on pathology. As for the solid components, the manual measurements on lung and mediastinal windows, as well as software measurements using the nine aforementioned thresholds, were compared with the sizes of invasive components on pathology. We also evaluated the difference between the software and manual measurements.

Finally, the diagnostic accuracies of the software and manual measurements (on lung and mediastinal windows) in predicting adenocarcinoma in situ (AIS) or minimally invasive adenocarcinoma (MIA) by applying a 5-mm threshold for the solid components were evaluated. A threshold of 5 mm was chosen as it has been suggested as a threshold for invasive components in MIA, for which the disease-free survival has been shown to be nearly 100 %.
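The diagnostic accuracy calculation can be sketched as follows. The sketch assumes that a nodule is predicted to be AIS or MIA when its measured solid component is at most 5 mm (an inclusive cut-off, which the text does not specify); the function name and the sample values are hypothetical and are not study data.

```python
def sensitivity_specificity(solid_sizes_mm, is_ais_or_mia, cutoff_mm=5.0):
    """Return (sensitivity, specificity) for predicting AIS or MIA.

    A nodule is predicted to be AIS or MIA when its measured solid
    component is at most cutoff_mm (assumed inclusive).
    """
    tp = fn = tn = fp = 0
    for size, truth in zip(solid_sizes_mm, is_ais_or_mia):
        predicted = size <= cutoff_mm
        if truth and predicted:
            tp += 1          # AIS/MIA correctly predicted
        elif truth:
            fn += 1          # AIS/MIA missed
        elif predicted:
            fp += 1          # invasive adenocarcinoma called AIS/MIA
        else:
            tn += 1          # invasive adenocarcinoma correctly excluded
    # Assumes at least one nodule in each pathology group.
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical solid component sizes (mm) and pathology labels.
sizes = [0.0, 3.2, 4.9, 6.1, 5.5, 8.0, 2.0, 7.4]
truth = [True, True, True, True, False, False, False, False]
sens, spec = sensitivity_specificity(sizes, truth)
```

The same function can be applied to each measurement technique in turn (software at a given threshold, manual on lung windows, manual on mediastinal windows) to compare them on equal terms.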

Analysis was done using the Bland-Altman method and paired t-test with a software program (R for Windows, version 3.2.0). Results with P values less than 0.05 were considered statistically significant.
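For readers unfamiliar with these methods, the sketch below shows the conventional form of both calculations in Python rather than R. It assumes the usual Bland-Altman definition of the 95 % limits of agreement (mean difference ± 1.96 standard deviations of the paired differences); the function names and sample diameters are hypothetical and are not study data. Only the t statistic is computed here; a P value additionally requires the t distribution (for example via `scipy.stats.ttest_rel`).

```python
import math
import statistics


def bland_altman(measured, reference):
    """Return (mean difference, lower LOA, upper LOA) for paired data."""
    diffs = [m - r for m, r in zip(measured, reference)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return mean_diff, mean_diff - 1.96 * sd, mean_diff + 1.96 * sd


def paired_t_statistic(measured, reference):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    diffs = [m - r for m, r in zip(measured, reference)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical CT and pathology diameters (mm).
ct = [12.0, 9.5, 15.0, 7.0, 11.0]
pathology = [11.0, 10.0, 13.0, 7.5, 10.0]

mean_diff, loa_low, loa_high = bland_altman(ct, pathology)
t, df = paired_t_statistic(ct, pathology)
```

A positive mean difference indicates that the CT measurement is larger than pathology on average, and narrower limits of agreement indicate better agreement.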

Results

Segmentation accuracy

Of the 73 initially included nodules, 67 were segmented adequately (excellent segmentation [n = 65], good segmentation [n = 2]) and were included in the further analysis. Of these adequately segmented nodules, 21 nodules (31 %) required manual adjustment of segmentation parameters in addition to determining attenuation thresholds: in 19 nodules (28 %), adjustments improved the segmentation accuracy from insufficient to excellent, and in two nodules (3 %) from insufficient to good.

Specific causes for inadequate segmentation in six nodules (insufficient segmentation [n = 6], failure [n = 0]) were as follows: (1) misclassification of all or part of a blood vessel passing through the nodule as a solid component (n = 4); (2) inclusion of part of the chest wall (n = 1); and (3) incomplete segmentation of the ground-glass component due to its very low attenuation and lobulated border in addition to inclusion of part of the chest wall (n = 1).

CT and pathologic findings

There were 47 part-solid GGNs and 20 pure GGNs. The mean maximal transverse diameter of the nodules was 13.7 mm (range, 5.9 − 25.2 mm), and the mean maximal diameter of the solid components was 3.1 mm (range, 0 − 14.9 mm) and 4.4 mm (range, 0 − 16.7 mm) on mediastinal and lung windows, respectively.

There were ten AIS, 30 MIA and 27 invasive adenocarcinomas. The mean maximal diameter of the tumours on pathology was 11.4 mm (range, 5 − 25 mm) and the mean maximal diameter of the invasive components was 4.2 mm (range, 0 − 10 mm).

Difference between the software, manual measurements and the tumour sizes on pathology

The mean differences between the software measurements of ground-glass components at each of the five thresholds, their manual measurements on lung windows and the tumour sizes on pathology are provided in Table 1.

Table 1 Mean differences between software, manual measurements and tumour sizes on pathology (n = 67)

The software measurement at a threshold of -750 HU showed a good performance with no significant difference (p = 0.83; mean difference, +0.06 mm) and the narrowest 95 % limits of agreement (LOA) with pathology (-4.51 to +4.67 mm, Fig. 5). In comparison, the manual measurements on lung windows were significantly larger than the tumour sizes on pathology (p < 0.001) with a mean difference of +2.38 mm, and showed wider 95 % LOA, that is, lower agreement (-3.23 to +7.98 mm, Fig. 5).

Fig. 5

Bland-Altman plots of absolute differences of software measurements at a threshold of -750 HU and manual measurements on lung windows of the ground-glass components with tumour sizes on pathology

Difference between software, manual measurements and invasive component sizes on pathology

The mean differences between the solid component sizes, measured by the software at each of the nine thresholds and manually on lung and mediastinal windows, and the invasive component sizes on pathology are summarized in Table 2. The software measurements were not significantly different from the invasive component sizes on pathology at the thresholds of -400 HU and -350 HU (+0.54 mm [p = 0.14] and -0.12 mm [p = 0.74], respectively). Bland-Altman plots for those thresholds are presented in Fig. 6.

Table 2 Mean differences between software, manual measurements on lung and mediastinal windows and invasive component sizes on pathology (n = 67)

Fig. 6

Bland-Altman plots of absolute differences of software measurements at two thresholds (-400 HU and -350 HU) and manual measurements on lung windows of the solid components with invasive component sizes on pathology

As for the manual measurements, the measurements of the solid components on lung windows were also not significantly different from the pathology measurements (p = 0.73) with a mean difference of 0.15 mm but showed a lower agreement than software measurements (Fig. 6). However, the measurements on mediastinal windows were significantly smaller than pathology measurements (p < 0.001), with a mean difference of -1.14 mm and also showed a slightly lower agreement compared to software measurements.

Diagnostic accuracies of software and manual measurements (on lung and mediastinal windows) in predicting AIS or MIA

The sensitivity and specificity of the software measurements (at a threshold of -350 HU) and manual measurements (on lung and mediastinal windows) in predicting AIS or MIA by applying a 5-mm threshold for the solid components were 88 % and 70 %, 83 % and 63 %, and 93 % and 41 %, respectively (Table 3).

Table 3 Diagnostic accuracies of software and manual measurements (on lung and mediastinal windows) in predicting AIS or MIA (n = 40)

Differences between software and manual measurements of the ground-glass components

At a threshold of -800 HU, the software measurements of the ground-glass components were not significantly different from the manual measurements on lung windows (mean difference, -0.55 mm, p = 0.14; 95 % LOA, -6.41 to 5.31 mm) (Table 4). The software measurements at a threshold of -750 HU were significantly smaller than the manual measurements with a mean difference of -2.32 mm (p < 0.001) and 95 % LOA of -8.27 to 3.63 mm.

Table 4 Mean differences of software measurements at various thresholds and manual measurements of ground-glass components (n = 67)

Differences between software and manual measurements of the solid components on lung and mediastinal windows

There was no significant difference between the software measurements at a threshold of -250 HU and the manual measurements on mediastinal windows (mean difference, -0.02 mm, p = 0.96; 95 % LOA, -4.74 to 4.71 mm) (Table 5).

Table 5 Mean differences of software measurements at various thresholds and manual measurements of solid components (n = 67)

With regard to measurements on lung windows, there were strong correlations between the software measurements at thresholds of -400 HU and -350 HU and the manual measurements with no significant differences between the two (0.39 mm, p = 0.29; 95 % LOA, -5.49 to 6.28 mm and -0.27 mm, p = 0.44; 95 % LOA, -5.72 to 5.18 mm, respectively).

Discussion

Accurate and reproducible measurements of solid components as well as nodule sizes are crucial in the management of subsolid nodules [2]. This information is also important in surgical planning and predicting prognosis of patients as subsolid nodules with small or no solid components can be candidates for sublobar resection and have an excellent prognosis [1].

In our study, software measurements at optimal thresholds of -750 and -350 HU for ground-glass and solid components, respectively, did not significantly differ from the corresponding pathology measurements (p > 0.05). Specifically, as for the ground-glass component, software measurements at a threshold of -750 HU were found to have the highest agreement with tumour sizes on pathology with 95 % LOA of -4.51 to +4.67 mm and a mean difference of +0.06 mm. In comparison, manual measurements on lung windows were significantly larger than tumour sizes on pathology with a mean difference of +2.38 mm, and showed a slightly lower agreement (95 % LOA, -3.23 to +7.98 mm). It is known that manual measurements on CT overestimate tumour sizes on pathology, and these measurement differences are believed to be due to the difference between the in-vivo inflated state and the ex-vivo post-resection deflated state of the lung tissue, as well as the tissue processing performed before pathology analysis [17].

In regard to the solid component, software measurements at a threshold of -350 HU showed the highest agreement with invasive component sizes on pathology among the tested measures, without a significant difference, with 95 % LOA of -5.73 to +5.5 mm and a mean difference of -0.12 mm (p > 0.05). While manual measurements on mediastinal windows were significantly smaller than the actual sizes of invasive components (mean difference, -1.14 mm; p < 0.001), manual measurements on lung windows were not significantly different from pathology measurements, with a mean difference of 0.15 mm (p = 0.73). However, both manual measurement techniques had a slightly lower agreement with pathology than software measures, with limits of agreement of -7.93 to +5.64 mm and -6.92 to +7.22 mm for mediastinal and lung windows, respectively. Our results concerning the manual measurements are in line with a previous study by Lee et al. on 59 manually delineated GGNs, which showed that the differences between manual and pathology measurements were -0.18 mm and -0.22 mm for two readers, respectively [17].

Interestingly, the sensitivity and specificity in predicting AIS or MIA by applying a 5-mm threshold for the solid components were 88 % and 70 % for the software, as compared with 83 % and 63 %, and 93 % and 41 % for the manual measurements on lung and mediastinal windows, respectively. There may have been some errors in estimating the size of invasive components with CT, because areas other than invasive components in the tumours, such as alveolar collapse or fibrotic focus, can also manifest as solid components [18] and some invasive components can be underestimated due to the limited spatial resolution of CT [19]. However, software segmentation may improve the diagnostic accuracy and reproducibility by selecting appropriate thresholds.

Our study had several limitations. First, owing to the retrospective design of the study, as well as the decision to include only patients who had undergone surgery, there may have been a selection bias since a sample of surgically resected lesions may contain more invasive lesions than the general population of GGNs. However, such selection was mandatory in this study for comparison of measurements with pathology. Second, pathology measurement itself, which was used as the reference standard in this study, can be inaccurate and variable in some cases. Inadequate inflation of the lung tissue after resection and tissue processing may have resulted in a relatively smaller size on pathology, as compared with manual measurements [17]. Moreover, routine pathology measurements are not necessarily done on axial planes comparable to CT scans, and they are subject to a significant interobserver variability with a low agreement between expert pathologists for invasive component sizes in difficult cases [20]. Third, the exact sizes of invasive components were not available for tumours with an invasive component greater than 10 mm, and therefore these cases were excluded from the analysis. However, as the differentiation between invasive and pre- or minimally invasive adenocarcinomas is more challenging in lesions with small invasive components on both CT and pathology [20], we do not think that inclusion of tumours with larger invasive components would have changed the conclusion of our study. Fourth, there were limitations related to software segmentation. Indeed, six nodules had to be excluded from the final analysis due to inadequate segmentation. In 67 % of these cases, this was due to incorrect segmentation of a vessel passing through the nodule as the solid component, which was previously reported as an issue of concern in GGN segmentation [14]. However, such inadequate segmentations are commonplace in semi-automatic segmentation software, even in solid nodules, which pose fewer technical problems. For instance, in a study which compared six software packages for solid nodules, the adequate segmentation rate ranged from 71 % to 86 % before any manual modification and from 71 % to 98 % after adjustments [16]. We speculate that further improvements in software algorithms will reduce the occurrence rate of inadequate segmentation. Finally, we evaluated only two window settings (lung and mediastinal windows) for manual measurements while various thresholds were applied for software segmentation. However, selecting optimal window settings for the measurement of tumour size and the invasive component is not an easy task [17] and a specific software threshold value cannot be simply translated into a window setting.

In conclusion, when segmentation results were adequate, software measurements of pulmonary adenocarcinomas manifesting as subsolid nodules showed no significant difference from the sizes of the entire tumour and invasive components on pathology, and showed better agreement with pathology than manual measurements made at the single set of lung and mediastinal window settings used as the standard at the authors' institution.