Introduction

Pulmonary subsolid nodules (SSNs) represent a histological spectrum of entities from preinvasive lesions to invasive pulmonary adenocarcinomas (IPAs) [1]. However, the management strategy differs significantly across this spectrum: preinvasive lesions can be followed up conservatively with CT scans or treated with limited resection without lymph node dissection [2, 3], whereas IPAs require surgical resection, for which the standard procedure is lobectomy with lymph node dissection [4]. Therefore, discrimination of preinvasive lesions from IPAs is critical in the evaluation of SSNs. Prognosis also differs substantially between preinvasive lesions and IPAs; it is well known that patients with adenocarcinomas in situ (AIS) or minimally invasive adenocarcinomas (MIAs) who undergo complete surgical resection should have 100 % or near 100 % 5-year disease-free survival [2].

The differentiation between preinvasive lesions and IPAs is currently based on their CT morphological features, sizes and clinical history [3, 5,6,7,8,9,10]. Quantitative imaging features such as mass, kurtosis, entropy and volume or mass doubling time can also be used for the differentiation of SSNs [5,6,7]. However, the most fundamental and practical method of differentiation is diameter measurement of nodules and their internal solid components. Specifically, previous studies have suggested size criteria of 8–10 mm for the diagnosis of malignant SSNs [3, 8, 9]. In current everyday practice, diameter measurements are performed manually using electronic calipers on the axial plane of a CT scan. Alternatively, semi-automatic segmentation software can be applied to calculate the nodule size. Semi-automatic measurement has the advantage of being more accurate and reproducible [11, 12]. In addition, semi-automatic measurement is inherently three-dimensional rather than confined to an axial plane, and is thus potentially closer to the ground truth of a tumour’s dimensions. Past publications that have investigated semi-automatic measurements have focused on measurement accuracy and variability. However, to our knowledge, the actual merit of semi-automatic measurement in the diagnostic process of SSNs has not yet been analysed. It remains unclear whether the diagnostic performance for SSNs could be improved when semi-automatic measurements are used instead of conventional manual measurements.

Thus, in this study, we hypothesized that semi-automatic diameter measurements would improve the accuracy and reproducibility of discriminating preinvasive lesions and MIAs from IPAs appearing as SSNs, and would increase the reproducibility of SSN classification. We then performed manual and semi-automatic measurements using a commercial software program and compared the two datasets to test this hypothesis.

Materials and methods

This retrospective analysis was approved by the Institutional Review Board of Seoul National University Hospital, and the requirement for written informed consent was waived.

Study population

Our study population comprised 104 consecutive patients who underwent preoperative non-enhanced CT using an SSN evaluation protocol and subsequent surgical resection between November 2014 and July 2016. Among the 104 patients, three patients whose pathological diagnoses did not belong to the adenocarcinoma spectrum including precursors (atypical adenomatous hyperplasia (AAH), AIS, MIA and IPA) were excluded. Seven additional patients were excluded owing to semi-automatic segmentation failure. In addition, five patients in whom nodules were classified as solid by at least one reader (E.J.H. and H.K., with 6 and 7 years of experience in chest CT, respectively) were also excluded. Consequently, 89 patients (39 men and 50 women; mean age ± standard deviation, 61.9±9.7 years for all individuals, 64.3±8.5 years for men, and 60.1±10.3 years for women; range 45–77 years for men and 34–79 years for women; independent t-test, p=0.044) with 102 SSNs were included in our study (Fig. 1).

Fig. 1 Flow chart of the patient selection process including the inclusion and exclusion criteria

Seventy-eight of the patients had a single SSN; nine patients had two SSNs; and two patients had three SSNs. Pathology revealed two cases of AAH, three AIS, 27 MIAs and 70 IPAs. The median interval between CT and surgical resection was 2.0±5.4 days (range 0–36 days). Part of the study population (36/89) had participated in a previously published study [13].

CT image acquisition

All patients were scanned in the supine position during maximum inspiration. Non-enhanced CT scans were performed with a 64-row multi-detector CT scanner (Discovery CT750 HD; GE Healthcare, Waukesha, WI, USA). Detailed scanning parameters were as follows: detector configuration, 64 × 0.625 mm; tube voltage, 120 kVp; noise index setting, 70.44; beam pitch, 0.984:1; rotation time, 0.5 s; field-of-view, 350 mm; matrix, 512 × 512; bone kernel; filtered back projection; reconstruction increment, 0.625 mm; and slice thickness, 0.625 mm.

For radiation dose assessment, the volume CT dose index (CTDIvol) and dose-length product (DLP) were recorded from the scanner. The effective diameter, defined as the square root of the anteroposterior diameter times the transverse diameter [14], was obtained for each patient. Size-specific dose estimates (SSDE) were calculated to provide an approximation of the absorbed dose [14].
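
The dose calculation described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in this study: the function names are hypothetical, and the exponential conversion-factor fit (coefficients for the 32-cm body phantom as given in AAPM Report 204) is an assumption, since the text does not state which phantom or look-up table was applied.

```python
import math


def effective_diameter_cm(ap_cm, transverse_cm):
    """Effective diameter: square root of AP diameter times transverse diameter."""
    return math.sqrt(ap_cm * transverse_cm)


def ssde_mgy(ctdi_vol_mgy, eff_diam_cm, a=3.704369, b=0.03671937):
    """Size-specific dose estimate = CTDIvol x size-dependent conversion factor.

    a and b are assumed here to be the 32-cm body phantom fit coefficients
    from AAPM Report 204; replace them if another phantom applies.
    """
    factor = a * math.exp(-b * eff_diam_cm)
    return ctdi_vol_mgy * factor


# Hypothetical patient: 22 cm anteroposterior, 30 cm transverse, CTDIvol 2.2 mGy.
d_eff = effective_diameter_cm(22.0, 30.0)
print(round(d_eff, 1), round(ssde_mgy(2.2, d_eff), 2))
```

Because the conversion factor exceeds 1 for patients smaller than the reference phantom, the SSDE for such patients is larger than the CTDIvol reported by the scanner.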

Semi-automatic and manual measurement of SSNs

First, semi-automatic measurements of the diameter of SSNs (DSSN) and of their internal solid portions (Dsolid) were performed independently by two radiologists (E.J.H. and H.K.) using commercial software (Veolity version 1.2, MeVis Medical Solutions, Bremen, Germany) [13, 15, 16]. Readers initially determined and clicked one of the three nodule types (pure ground-glass, part-solid or solid) based on visual assessment according to Fleischner Society recommendations [2]. Thereafter, readers defined a target by dragging a line traversing the nodule on an axial plane. Based on the predefined attenuation thresholds for the ground-glass [-750 Hounsfield units (HU)] and solid portions (-350 HU) [15], the software determined the segmentation boundary [17]. Readers were allowed to adjust the segmentation boundary by tuning the attenuation thresholds or the roundness of each component (ground-glass and/or solid portion), if necessary. Successful segmentation was evaluated subjectively by each reader (Electronic Supplementary Material 1) and was defined as a correctly segmented proportion of 80 % or greater, in accordance with previous literature [13, 15, 18]. After segmentation, the software provided the effective diameters (the diameter of a sphere whose volume equals the segmented volume) for both DSSN and Dsolid. Thus, the nodule type, DSSN and Dsolid assessed by both readers were recorded for the successfully segmented SSNs.

Second, manual measurements were recorded using the longest diameter with its maximum perpendicular diameter so as to obtain the average diameter of SSNs (DSSN) and their solid portions (Dsolid). The radiologists were asked to classify the nodule type and then to measure the diameters on the lung window (window width and level of 1,500 and -700 HU) using the electronic caliper of the picture archiving and communication system.
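
The two diameter definitions described above can be contrasted with a minimal sketch. The function names and example values are hypothetical and are not part of the software used in this study; the code only restates the geometric definitions.

```python
import math


def effective_diameter_mm(volume_mm3):
    """Diameter of a sphere whose volume equals the segmented volume."""
    if volume_mm3 < 0:
        raise ValueError("volume must be non-negative")
    # V = pi * d^3 / 6  =>  d = (6 * V / pi)^(1/3)
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


def average_diameter_mm(longest_mm, perpendicular_mm):
    """Manual average diameter: mean of the longest axial diameter and
    its maximum perpendicular diameter."""
    return (longest_mm + perpendicular_mm) / 2.0


# Hypothetical nodule: 1,500 mm^3 segmented volume, 18 x 12 mm on the axial plane.
print(round(effective_diameter_mm(1500.0), 1))
print(average_diameter_mm(18.0, 12.0))
```

The effective diameter depends on the whole segmented volume, whereas the average diameter depends only on the single axial slice chosen by the reader.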

Finally, one of the readers (H.K.) conducted the overall image analysis twice to perform intrareader variability analysis. Therefore, we obtained three sets of measurement data (reader 1, reader 2–1 and reader 2–2). Each image reading session was separated by an interval of 4 weeks and the readers were unaware of the pathological diagnosis. The order of image reading was randomized by patient.

Pathological diagnosis

In our institution, all surgical specimens containing subsolid nodules were fixed by infusing 10 % buffered formalin through a transpleural and transbronchial approach to produce an inflated state and to precisely measure the invasive adenocarcinoma component [19]. Haematoxylin and eosin were used for staining the specimens. All pathological diagnoses were established based on the surgical specimens by attending pulmonary pathologists of Seoul National University Hospital according to the 2011 International Association for the Study of Lung Cancer (IASLC)/the American Thoracic Society (ATS)/the European Respiratory Society (ERS) pulmonary adenocarcinoma classification [20]. Pathological diagnoses were made postoperatively as part of the routine clinical process, and specimens were not re-reviewed specifically for the present study.

Statistical analysis

Semi-automatic and manual measurements of DSSN and Dsolid were compared using either the paired t-test or Wilcoxon’s test, as appropriate, after testing for normality. Thereafter, inter- and intrareader measurement variability of DSSN and Dsolid was calculated using the Bland-Altman method by calculating the 95 % confidence interval (CI) of the relative differences (differences in the two measurements divided by the mean) [21]. Percentage relative differences (Rdiff) were also compared between the semi-automatic and manual measurements.
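
The relative-difference calculation can be illustrated with a minimal sketch. The function names and measurement values are hypothetical. Two choices below are assumptions, as the text does not specify them: differences are kept signed, and a pair in which both measurements are 0 mm is assigned a relative difference of 0.

```python
import statistics


def relative_differences(first, second):
    """Percentage relative difference per pair: difference divided by the pair mean."""
    result = []
    for x, y in zip(first, second):
        pair_mean = (x + y) / 2.0
        # Assumption: a pair of two zero measurements counts as no difference.
        result.append(0.0 if pair_mean == 0 else 100.0 * (x - y) / pair_mean)
    return result


def limits_of_agreement(differences):
    """Bland-Altman style summary: (lower limit, mean, upper limit)."""
    mean = statistics.mean(differences)
    sd = statistics.stdev(differences)
    return mean - 1.96 * sd, mean, mean + 1.96 * sd


# Hypothetical Dsolid measurements (mm) of five nodules by two readers.
reader_1 = [6.6, 4.7, 0.0, 9.1, 12.3]
reader_2 = [6.6, 7.6, 0.0, 8.8, 12.9]
print(limits_of_agreement(relative_differences(reader_1, reader_2)))
```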

Diagnostic performance based on the diameter measurements was tested using binary logistic regression analysis with subsequent receiver operating characteristic curve (ROC) analysis. First, DSSN and the solid proportion (%) within a nodule (Psolid) were used as input variables for logistic regression analysis using the pathological diagnosis (preinvasive lesion (AAH, AIS) and MIA vs. IPA) as a dependent variable in order to evaluate the statistical significance of the two variables. After confirmation of the significance, probability of the regression model using both DSSN and Psolid was obtained for each case. The probability was then used as an independent variable for ROC analysis using binary pathological diagnosis as a dependent variable. Furthermore, the area under the curve (AUC) was calculated and compared between the semi-automatic and manual measurements [22]. Psolid was used instead of Dsolid as there was a linear correlation between Dsolid and DSSN. The pathological diagnosis was divided into two groups (AAH, AIS and MIA vs. IPA) based on the distinct prognosis between them [2].
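
The two-step procedure (logistic model probability, then ROC analysis) can be sketched as follows. The coefficients and cases are made up for illustration and are not the fitted values of this study. The AUC is computed here as the proportion of IPA/non-IPA pairs in which the IPA receives the higher probability, with ties counted as one half; this is a standard equivalent of the area under the empirical ROC curve, not necessarily the routine of the statistical packages used.

```python
import math


def predicted_probability(d_ssn_mm, p_solid_pct, b0, b1, b2):
    """Logistic model probability of IPA from DSSN (mm) and Psolid (%)."""
    z = b0 + b1 * d_ssn_mm + b2 * p_solid_pct
    return 1.0 / (1.0 + math.exp(-z))


def auc(probabilities, labels):
    """Area under the ROC curve via pairwise comparison (label 1 = IPA)."""
    pos = [p for p, y in zip(probabilities, labels) if y == 1]
    neg = [p for p, y in zip(probabilities, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical coefficients and cases: (DSSN, Psolid, label).
B0, B1, B2 = -4.0, 0.2, 0.04
cases = [(20.0, 50.0, 1), (15.0, 40.0, 1), (9.0, 30.0, 1),
         (12.0, 35.0, 0), (8.0, 10.0, 0), (6.0, 0.0, 0)]
probs = [predicted_probability(d, p, B0, B1, B2) for d, p, _ in cases]
labels = [y for _, _, y in cases]
print(round(auc(probs, labels), 3))
```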

Inter- and intrareader reproducibility (agreement) of diagnosis, which was determined using a probability cut-off of 0.5 from logistic regression analysis, was investigated for each measurement method using Cohen’s κ statistics. The resulting Cohen’s κ values were then compared between the semi-automatic and manual measurements using Z statistics, with the null hypothesis H0: κ1-κ2=0 and the alternative hypothesis Ha: κ1- κ2≠0 [23, 24].
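
As an illustration of the agreement statistic, the following sketch dichotomizes model probabilities at 0.5 and computes Cohen's κ for two readers. The probabilities are hypothetical, and treating a probability of exactly 0.5 as IPA is an assumption, since the handling of the boundary is not specified.

```python
from collections import Counter


def cohen_kappa(ratings_1, ratings_2):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(ratings_1)
    if n == 0 or n != len(ratings_2):
        raise ValueError("need two non-empty rating lists of equal length")
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    counts_1, counts_2 = Counter(ratings_1), Counter(ratings_2)
    # Agreement expected by chance from the marginal frequencies of each rater.
    expected = sum(counts_1[k] * counts_2[k] for k in counts_1) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical model probabilities of IPA for six nodules, one list per reader.
probs_reader_1 = [0.90, 0.70, 0.40, 0.20, 0.55, 0.10]
probs_reader_2 = [0.80, 0.45, 0.30, 0.60, 0.70, 0.20]
dx_1 = [p >= 0.5 for p in probs_reader_1]  # assumption: >= 0.5 means IPA
dx_2 = [p >= 0.5 for p in probs_reader_2]
print(round(cohen_kappa(dx_1, dx_2), 3))
```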

Inter- and intrareader reproducibility of SSN classification was also analysed using Cohen’s κ statistics and was compared between the two measurement methods using Z statistics. Nodule classification was based on the Fleischner Society’s recommendations for SSNs as follows: (1) pure ground-glass, (2) part-solid with a solid component 5 mm or larger or (3) less than 5 mm [2]. In addition, sub-classification of part-solid nodules was performed with Dsolid of each reader.
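
The three-category rule can be written as a short function. This is a sketch with hypothetical names; the presence of a solid component is passed in as a flag because it was determined visually by the readers rather than from the measurement. The example values are the solid-portion diameters shown in Fig. 2.

```python
def classify_ssn(has_solid_component, d_solid_mm):
    """Three-category SSN classification using a 5-mm solid-component threshold."""
    if not has_solid_component:
        return "pure ground-glass"
    if d_solid_mm >= 5.0:
        return "part-solid, solid component >= 5 mm"
    return "part-solid, solid component < 5 mm"


# Manual Dsolid of the nodule in Fig. 2: 4.7 mm (reader 1) vs 7.6 mm (reader 2).
print(classify_ssn(True, 4.7), "|", classify_ssn(True, 7.6))
# Semi-automatic Dsolid of the same nodule: 6.6 mm for both readers.
print(classify_ssn(True, 6.6))
```

With the manual values the two readers fall on opposite sides of the threshold, whereas the semi-automatic value places the nodule in the same category for both.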

The first measurement data of reader 2 (reader 2–1) was used for the calculation of inter-reader measurement variability and reproducibility of diagnosis and classification. All statistical analyses were performed using two commercial software programs (MedCalc version 12.3.0, MedCalc Software, Mariakerke, Belgium; and SPSS 19.0, IBM SPSS Statistics, Armonk, NY, USA). A p-value < 0.05 was considered to indicate statistical significance.

Results

Semi-automatic and manual diameter measurement

For reader 1, mean DSSN was 14.6±6.0 mm (mean±standard deviation) using semi-automatic measurement and 14.4±6.9 mm using manual measurement (p=0.405). Mean Dsolid was 6.3±4.9 mm using semi-automatic measurement and 5.4±4.5 mm using manual measurement (p<0.001). For reader 2–1, mean DSSN was 14.5±5.8 mm using semi-automatic measurement and 14.6±6.9 mm using manual measurement (p=0.821). Mean Dsolid was 6.3±4.9 mm using semi-automatic measurement and 6.5±5.0 mm using manual measurement (p=0.147). For reader 2–2, mean DSSN was 14.5±5.8 mm using semi-automatic measurement and 14.1±6.5 mm using manual measurement (p=0.069). Mean Dsolid was 6.5±5.0 mm using semi-automatic measurement and 5.9±4.5 mm using manual measurement (p<0.001). Detailed data are provided in Table 1.

Table 1 Descriptive statistics for the diameter measurement

Inter- and intrareader measurement variability

For semi-automatic measurement, inter-reader variability ranged from -1.9 mm (95 % CI: -2.3 to -1.6) to 2.1 mm (95 % CI: 1.7–2.4) for DSSN and from -2.1 mm (95 % CI: -2.5 to -1.8) to 2.1 mm (95 % CI: 1.7–2.5) for Dsolid. Mean Rdiff was 4.11±8.21 % for DSSN and 9.03±34.02 % for Dsolid. Intrareader measurement variability ranged from -1.5 mm (95 % CI: -1.8 to -1.3) to 1.5 mm (95 % CI: 1.3–1.8) for DSSN and from -2.2 mm (95 % CI: -2.6 to -1.9) to 1.7 mm (95 % CI: 1.4–2.0) for Dsolid. Mean Rdiff was 2.31±4.88 % for DSSN and 8.89±34.21 % for Dsolid.

For manual measurement, inter-reader variability ranged from -2.8 mm (95 % CI: -3.3 to -2.4) to 2.4 mm (95 % CI: 2.0–2.9) for DSSN and from -5.1 mm (95 % CI: -5.7 to -4.4) to 2.8 mm (95 % CI: 2.1–3.5) for Dsolid. Mean Rdiff was 7.56±7.14 % for DSSN and 31.27±47.91 % for Dsolid. Intrareader measurement variability ranged from -2.1 mm (95 % CI: -2.6 to -1.7) to 3.1 mm (95 % CI: 2.7–3.6) for DSSN and from -2.1 mm (95 % CI: -2.5 to -1.6) to 3.3 mm (95 % CI: 2.8–3.8) for Dsolid. Mean Rdiff was 7.00±6.57 % for DSSN and 17.39±34.33 % for Dsolid. Inter- and intrareader variability of semi-automatic measurement was significantly lower than that of manual measurement for both DSSN and Dsolid (all, p<0.001).

Diagnostic performance

Input variables of DSSN and Psolid were statistically significant in the logistic regression models for both semi-automatic and manual measurements of both readers (p<0.05; Table 2). For reader 1, the AUC was 0.907 (95 % CI: 0.834–0.956) for semi-automatic measurement and 0.833 (95 % CI: 0.747–0.900) for manual measurement (p=0.001). For reader 2–1, AUC was 0.905 (95 % CI: 0.831–0.954) for semi-automatic measurement and 0.862 (95 % CI: 0.780–0.922) for manual measurement (p=0.019). For reader 2–2, AUC was 0.923 (95 % CI: 0.853–0.967) for semi-automatic measurement and 0.864 (95 % CI: 0.782–0.924) for manual measurement (p=0.001). Detailed results of logistic regression analysis and ROC analysis are described in Tables 2 and 3, respectively. DSSN and Dsolid according to the pathological diagnosis, and comparisons between them can be found in Table 4.

Table 2 Results of the logistic regression analysis for differentiating preinvasive lesions and minimally invasive adenocarcinomas from invasive pulmonary adenocarcinomas
Table 3 Diagnostic performance based on semi-automatic and manual measurements
Table 4 Diameter measurements and comparisons between pathological subtypes

Reproducibility of diagnosis

Inter-reader reproducibility (κ) of the diagnosis was 0.924 (95 % CI: 0.840–1.000) for semi-automatic measurement and 0.690 (95 % CI: 0.527–0.853) for manual measurement (p=0.012). Percentage inter-reader agreement was 97.1 % (99/102) for semi-automatic measurement and 88.2 % (90/102) for manual measurement.

Intrareader reproducibility (κ) of the diagnosis was 0.897 (95 % CI: 0.797–0.997) for semi-automatic measurement and 0.841 (95 % CI: 0.718–0.964) for manual measurement (p=0.490). Percentage intrareader agreement was 96.1 % (98/102) for semi-automatic measurement and 94.1 % (96/102) for manual measurement.
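
As a consistency check, the reported inter-reader comparison can be approximately reproduced from the κ values and confidence intervals given above. The sketch rests on assumptions that are not stated in the text: standard errors are recovered from the lower half-width of each 95 % CI under a normal approximation (the upper bound for semi-automatic measurement is capped at 1.000), and the two κ estimates are treated as independent even though they derive from the same nodules. The function names are hypothetical.

```python
import math


def se_from_ci(kappa, lower_bound, z_crit=1.96):
    """Standard error recovered from the lower half-width of a 95 % CI."""
    return (kappa - lower_bound) / z_crit


def compare_kappas(kappa_1, se_1, kappa_2, se_2):
    """Z test of H0: kappa_1 - kappa_2 = 0; returns (z, two-sided p)."""
    z = (kappa_1 - kappa_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p


# Inter-reader reproducibility of the diagnosis, values as reported above.
se_semi = se_from_ci(0.924, 0.840)
se_manual = se_from_ci(0.690, 0.527)
z, p = compare_kappas(0.924, se_semi, 0.690, se_manual)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the calculation gives z of about 2.50 and p of about 0.012, in line with the reported inter-reader p-value.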

Reproducibility of nodule classification

Inter-reader reproducibility (κ) of nodule classification was 0.861 (95 % CI: 0.769–0.953) for semi-automatic measurement and 0.683 (95 % CI: 0.561–0.805) for manual measurement (p=0.022) (Fig. 2). Percentage inter-reader agreement was 92.2 % (94/102) for semi-automatic measurement and 80.4 % (82/102) for manual measurement.

Fig. 2 Measurement of the internal solid portion of an invasive pulmonary adenocarcinoma appearing as a subsolid nodule in a 77-year-old man. (a) Manual average diameter of the internal solid portion was measured as 4.7 mm and 7.6 mm by readers 1 and 2, respectively, on an axial plane of a non-enhanced CT scan. (b) Semi-automatic effective diameter of the internal solid portion was 6.6 mm for both readers. The measurement as well as nodule classification was consistent between the two readers. (c) Semi-automatic segmentation boundary was overlaid on the coronal CT image

Intrareader reproducibility (κ) of nodule classification was 0.894 (95 % CI: 0.812–0.976) for semi-automatic measurement and 0.750 (95 % CI: 0.632–0.868) for manual measurement (p=0.049). Percentage intrareader agreement was 94.1 % (96/102) for semi-automatic measurement and 85.3 % (87/102) for manual measurement. Detailed data can be found in Table 5.

Table 5 Subsolid nodule classification of the two readers

Radiation dosage

Mean CTDIvol and DLP were 2.22±0.90 mGy (range 0.74–5.17 mGy) and 85.27±35.61 mGy∙cm (range 25.04–192.37 mGy∙cm), respectively. Mean effective diameter was 25.5±2.0 cm (range 20.3–31.3 cm) and mean SSDE was 3.14±1.08 mGy (range 1.12–6.82 mGy).

Discussion

Our results demonstrated that semi-automatic measurements were superior to manual measurements in terms of diagnostic accuracy for SSNs. We also demonstrated that inter-reader reproducibility of the diagnosis and classification of SSNs was significantly higher using semi-automatic measurements.

In terms of diameter measurement, DSSN was similar between the semi-automatic and manual measurement methods, although there were significant differences in Dsolid between the methods for reader 1 and reader 2–2. Accordingly, the differences in Dsolid between the pathological diagnoses were greater with semi-automatic measurements than with manual measurements. In other words, Dsolid of preinvasive lesions and MIAs was measured smaller and/or Dsolid of IPAs was measured larger using semi-automatic measurement, which is probably attributable to the smaller variability of semi-automatic measurement of solid portions. Consequently, diagnostic performance (AUC) based on the semi-automatic effective diameter was significantly greater than that based on the manual average diameter. To our knowledge, no studies to date have elucidated the impact of the diameter measurement method on the diagnostic performance for SSNs. Past studies focused on the measurement accuracy and variability of semi-automatic measurement. However, measurement is distinct from diagnosis: diagnosis is based on measurement but constitutes a subsequent step in the clinical process. Therefore, the plausible link between reduced measurement variability and improved diagnostic accuracy needs to be validated in its own right. The key findings of our study are: (1) the size measurement method can affect the radiological diagnosis and (2) the accuracy of diagnosis can be improved by using a semi-automatic segmentation tool. We must note that in real clinical practice, a diagnosis is made not only using diameters but also using multiple morphological features and clinical factors. Nonetheless, diameter measurement is still the most fundamental quantitative feature in the evaluation of lung nodules.

We also hypothesized prior to our study that semi-automatic measurements would help increase the inter-reader reproducibility of a diagnosis. Given the results of previous publications on the marked improvement of measurement variability through semi-automatic segmentation [25,26,27], it was deemed reasonable that semi-automatic measurement would reduce inter-reader variability of a diagnosis. As expected, the inter-reader agreement of a diagnosis was shown to be significantly higher using the semi-automatic effective diameter (κ=0.924) than with the manual average diameter (κ=0.690) in our study.

With respect to the classification of SSNs, recent studies reported substantial inter-reader variability [28,29,30,31]. Indeed, according to Riel et al. [29], who categorized nodules according to Fleischner Society guidelines, inter- and intrareader agreement was only moderate (κ=0.51 and 0.57, respectively). Penn et al. [30] and Yoo et al. [31] also reported a similar degree of inter-reader agreement (κ=0.56 and 0.57, respectively). Riel et al. [29] pointed out that the majority of disagreements they encountered were related to either the presence of a solid component in part-solid nodules or the size of the solid component relative to the 5-mm threshold. In our study, we demonstrated that semi-automatic measurement can help reduce inter-reader disagreement stemming from the latter factor. However, as the differentiation between pure ground-glass and part-solid nodules was performed visually in both measurement methods, we were not able to address the issue of determining the presence of a solid portion in this study. Nevertheless, for the sub-classification of part-solid nodules according to Dsolid using a 5-mm threshold, semi-automatic segmentation significantly improved inter-reader reproducibility. Sub-classification of part-solid nodules according to Dsolid has clinical implications, considering that the management recommendations for part-solid nodules suggested by the Lung CT Screening Reporting and Data System (Lung-RADS) or the Fleischner Society, i.e. whether to suggest surveillance CT or invasive procedures (biopsy or surgical resection), are based on the solid component size (6-mm and 5-mm cut-offs, respectively) [2, 32].

It must be re-emphasized that the software package we used in our study was not fully automatic and thus needed human input to determine the nodule type and segmentation seeding point. Nevertheless, the outcome was more accurate and reproducible than manual human measurement alone. In the era of lung cancer CT screening, the frequency of detected SSNs will undoubtedly increase. As pure ground-glass nodules and part-solid nodules are identified in approximately 4 % and 5 % of these baseline CT screenings [33], the need for standardization of practice for patients with SSNs will also become greater. Management guidelines such as Lung-RADS [32] or the Fleischner Society recommendations [2] contribute to relieving these concerns, and we believe that semi-automatic measurement will play an additional role in further promoting the reproducibility of nodule characterization and clinical standardization.

There are several limitations to this study. First, we only included patients who underwent surgical resection, as we used their pathological diagnosis as the reference standard, and this inclusion criterion may have introduced selection bias. The proportion of pure ground-glass nodules among all SSNs was 20.6 % (reader 1) in our study, which is lower than that expected in a screening population [33]. As the variability of classification was mainly caused by the solid component measurement, the proportion of pure ground-glass and part-solid nodules might have affected the analysis results. Second, our study results may be software package-specific. As the performance of any semi-automatic measurement software depends on the segmentation algorithm used, diagnostic performance and classification reproducibility may also depend on the semi-automatic segmentation tool used. The segmentation success rate was 93.6 % (102/109) in our study; optimal segmentation was not achievable in seven cases.

In conclusion, semi-automatic effective diameter measurement offers an opportunity to improve the diagnostic accuracy and reproducibility as well as classification reproducibility of SSNs, helping to facilitate a more accurate and standardized clinical practice for patients with SSNs.