Introduction

Recent randomized controlled trials have shown that accurate presurgical staging of colon cancer is important for selecting patients with locally advanced disease who will benefit from neoadjuvant chemotherapy [1] and for identifying patients with advanced disease who are not eligible for laparoscopic surgery [2]. These studies used CT to stratify high-risk patients for differentiated treatment. The role of CT in primary staging of colon cancer will therefore increasingly extend from mainly distant staging towards locoregional staging. Although CT has for years been the standard imaging workup, most published data address its performance in staging liver and extrahepatic metastases, and the evidence on locoregional staging accuracy is weaker [3]. One recent meta-analysis showed that the sensitivity and specificity of CT for detecting T3 or higher tumors were 90% and 69%, respectively, with lower performance for detecting tumor invasion of more than 5 mm beyond the muscularis propria (T3cd-T4), with sensitivity and specificity of 77% and 70%, respectively. Moreover, the sensitivity and specificity of CT for detecting lymph node involvement were relatively low (71% and 67%, respectively) [4]. The majority of these studies included readers with more than 2 years of experience [5].

Little is known about the level of radiologist expertise required to stage the local tumor accurately enough for reliable clinical decision making. A learning curve describes the improvement in performance with repetition and underlines the importance of practice experience for acquiring expertise [6]. Locoregional CT staging of colon cancer can be especially challenging for anatomical reasons, such as the presence of haustra, the peculiar orientation of the colon within the abdominal cavity and geometric distortion caused by peristaltic movements. These factors require visualization of the tumor in three dimensions, using techniques such as multiplanar reconstruction (MPR). Familiarizing oneself with viewing colon cancer from different perspectives can therefore demand a longer period and more repetition to achieve a certain level of expertise. Previous studies in other cancer types have demonstrated significant improvement in reader performance through continuous feedback [7]; however, data for colon cancer staging are lacking. Considering the increasingly important role of CT in selecting high-risk patients and stratifying treatment, it is important to know whether a learning curve for locoregional staging exists and whether senior trainees with minimal practice experience would benefit from interactive feedback.

Materials and methods

Study population

This retrospective study was approved by our institutional review board (IRBd18-119). The requirement for written informed consent was waived. An institutional database search identified six hundred patients with a pathological diagnosis of colon cancer between January 2010 and December 2016. We consecutively included patients who underwent presurgical CT followed by surgical resection of colon cancer within one month. Exclusion criteria were rectal cancer, mucinous cancer, small colon tumors that could not be visualized on CT, poor image quality and neoadjuvant chemotherapy. The included cohort comprised 88 patients. Eighty cases were divided consecutively into four batches of 20 cases each for the learning curve analysis. The remaining 8 cases were used for introductory training and were not included in the batches given to the readers for the learning curve analysis [8]. Table 1 summarizes the demographic and tumor characteristics of the final included cohort.
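The case allocation described above (88 = 4 × 20 + 8) can be illustrated with a minimal sketch. The case identifiers, the function name and the choice of taking the training cases from the end of the consecutive list are assumptions made for illustration; the text does not state which eight cases were set aside.

```python
def split_into_batches(case_ids, batch_size=20, n_batches=4):
    """Split consecutively ordered cases into reading batches plus a remainder.

    Hypothetical helper: the remainder is taken from the end of the list,
    which is an assumption, not something the study specifies.
    """
    n_study = batch_size * n_batches
    if len(case_ids) < n_study:
        raise ValueError("not enough cases for the requested batches")
    batches = [
        case_ids[i * batch_size:(i + 1) * batch_size] for i in range(n_batches)
    ]
    training = case_ids[n_study:]  # leftover cases, used for introductory training
    return batches, training


case_ids = list(range(1, 89))  # hypothetical identifiers 1..88
batches, training = split_into_batches(case_ids)
print([len(b) for b in batches], len(training))
```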

Table 1 Patient demographic and tumor characteristics

Computed tomography

All patients were kept nil per os for 2–4 h before the CT scan and drank 20 cc of Omnipaque 300 mg/ml (GE Healthcare, Milwaukee, WI, USA) diluted in 1 l of water 1 h before the examination. Bowel preparation was not performed before CT. One of three different 16–64 slice CT scanners (Gemini TF (Philips, Amsterdam, Netherlands), Sensation (Siemens, Erlangen, Germany) and Aquilion (Toshiba, Tokyo, Japan)) was used to perform the CT scan. Patients underwent preoperative abdominal CT with iodine-based intravenous contrast (3 ml/s, total amount of 90–130 ml, followed by a bolus injection of 30 cc of normal saline) in the portal-venous phase at a 70 s delay. Images from all CT scanners were reconstructed at 1–2 mm slice thickness.

Reader interpretation

Four senior radiology residents, trained in different centers in Europe, participated in the study. All four readers had been exposed to abdominopelvic CT during their training and had a specific interest in abdominal imaging, but had almost no previous experience in precise staging of colon cancer on CT. All readers were provided with imaging literature on colon cancer staging and recent developments in colon cancer treatment. Thereafter, the four readers received a 90-min interactive introductory lecture on colon cancer CT staging from an experienced board-certified radiologist (E.H., with over 6 years of experience in abdominal CT imaging), under the supervision of a senior faculty member (R.B., with over 20 years of experience in abdominal CT imaging). The lecture covered the anatomy of the colon and T and N staging of colon cancer on CT, with images demonstrating each staging category. Subsequently, the readers reviewed 8 exemplary colon cancer CTs that were not included in the study on a separate workstation. Readers used multiplanar reconstruction software, which allowed images to be viewed in three dimensions (Fig. 1). This exercise helped the readers to familiarize themselves with interpreting images, staging colon cancer on CT, and using the local PACS system and software (Carestream Health, Rochester, NY, USA).

Fig. 1

Example of radiologic stage T4N1 colon cancer depicted on the designated local PACS system. Readers were advised to use multiplanar reconstruction software to analyze the images in three-dimensional views

The readers were blinded to all clinical and histological information except the location of the tumor, which was disclosed to ensure consistency and accurate measurement of interpretation time. The four readers reviewed the images independently on separate workstations, interpreted them in sequence and recorded their findings in an online survey tool created for the analysis. The following items were scored: (1) T stage of the tumor and the reader’s confidence (on a 5-point scale, with 5 as the most confident and 1 as the least confident), (2) N stage of the tumor and the reader’s confidence, and (3) time required for interpretation of each case in seconds. T staging was divided into 3 categories. A T1-2 tumor was defined as a mass in the colon lumen limited to the bowel wall with clear pericolic fat on CT, or a luminal projection of a mass without any bowel wall distortion. A T3 tumor was defined as a tumor with smooth or nodular extension beyond the normal contour of the bowel wall. A T4 tumor was defined as a tumor extending into the adjacent peritoneum or growing into other adjacent tissues or organs (Fig. 2). For N staging, a lymph node was considered malignant when it measured over 9 mm in short axis. Additionally, a cluster of more than 3 lymph nodes along the locoregional vascular pedicle was considered malignant [9,10,11].
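The scoring rules above reduce to two binary decisions, which can be written out as a minimal sketch. The class, field and function names are hypothetical, and the sketch assumes that the reader has already measured the nodes and counted the largest cluster; it is not the survey tool used in the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeFindings:
    """Hypothetical container for one reader's nodal measurements on one CT."""
    short_axes_mm: List[float]   # short-axis diameter of each regional node
    largest_cluster_size: int    # nodes in the largest cluster along the vascular pedicle


def t_group(t_category: str) -> str:
    """Collapse the three scored T categories into the binary outcome analyzed."""
    if t_category not in ("T1-2", "T3", "T4"):
        raise ValueError(f"unknown T category: {t_category}")
    return "<=T2" if t_category == "T1-2" else ">=T3"


def is_node_positive(findings: NodeFindings) -> bool:
    """Apply the size and cluster criteria described in the text."""
    enlarged = any(d > 9.0 for d in findings.short_axes_mm)  # over 9 mm short axis
    clustered = findings.largest_cluster_size > 3            # more than 3 nodes
    return enlarged or clustered


print(t_group("T3"), is_node_positive(NodeFindings([6.0, 9.5], 2)))
```

Note that both thresholds are strict: a 9 mm node or a cluster of exactly 3 nodes does not meet the criteria as worded.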

Fig. 2

Axial (a) and coronal (b) reformatted CT images demonstrating pathologic T2 stage sigmoid colon cancer, depicting bowel wall thickening with clear pericolic fat. Axial (c) and coronal (d) reformatted CT images demonstrating pathologic T3 stage sigmoid colon cancer, with nodular extension beyond the normal contour of the bowel wall (white arrow)

The four readers were randomly divided into two groups. One group of 2 readers received interactive feedback from an experienced radiologist after each batch of 20 CTs. They were able to review the radiological findings and pathological information and to discuss their findings with the experienced radiologist. Feedback was given for each case, and the readers were allowed to ask any questions regarding image and pathological data interpretation. In the other group, readers were only provided with the pathological data after interpreting each batch and were left to compare their findings on their own.

Reference standard

Specimens were processed according to the local institutional protocol, which is based on the guidelines of the College of American Pathologists and the Royal College of Pathologists. Reporting of the histopathologic information followed the synoptic protocol of the Pathologisch Anatomisch Landelijk Geautomatiseerd Archief [12].

Statistics

The readers’ accuracies, sensitivities, specificities, and positive and negative predictive values (PPV and NPV) in differentiating between ≤ T2 and ≥ T3 tumors (the criterion used to select patients eligible for neoadjuvant chemotherapy [1]) and in assessing lymph node involvement were evaluated. Learning curves for each reader were constructed by smoothing the scatter plots of correct assessment (0/1) against the number of evaluated tumors using loess regression. Generalized estimating equations logistic regression models were used to model the probability of a correct assessment, with reader and the number of evaluated tumors as predictors. The model also incorporated an interaction effect between the number of evaluated tumors and feedback group. The effect of the number of evaluated tumors was assumed to plateau, with the same point of inflection for readers with and without feedback. The correlation between multiple evaluations of the same tumor was modeled with an exchangeable correlation structure, i.e., the same assumed correlation between all pairs of evaluations of the same tumor. The optimal point of inflection was determined as the integer point (2–79) associated with the smallest QICu statistic.
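One possible way to write this model down is sketched here. The explicit parameterization is not given in the text, so the use of $\min(n_{ij}, k)$ to encode the plateau and all symbols are assumptions made for illustration only.

```latex
\operatorname{logit}\,\Pr(Y_{ij} = 1)
  = \alpha_j
  + \beta_1 \min(n_{ij}, k)
  + \beta_2\, F_j \min(n_{ij}, k),
\qquad
\operatorname{Corr}(Y_{ij}, Y_{ij'}) = \rho \quad (j \neq j')
```

In this notation, $Y_{ij} = 1$ if reader $j$ assessed tumor $i$ correctly, $\alpha_j$ is a reader-specific intercept, $n_{ij}$ is the number of tumors reader $j$ had evaluated up to and including tumor $i$, $F_j = 1$ for readers in the feedback group and $0$ otherwise, and $k$ is the point of inflection shared by both groups. Under this sketch, $k$ would be chosen among the integers 2 to 79 as the value giving the smallest QICu, and $\rho$ is the single exchangeable correlation between evaluations of the same tumor by different readers. A separate main effect of feedback does not appear because $F_j$ is constant within a reader and would be absorbed by $\alpha_j$.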

Averages (median for the ordinal outcomes and mean for the continuous outcome) and standard deviations were assessed for T- and N-staging confidence (ordinal 1–5) and for reading time (continuous). Smoothed curves of confidence and reading time against the number of evaluated tumors were constructed using loess regression. For all three outcomes, linear generalized estimating equations models were fitted with reader ID and the number of evaluated tumors as predictors. The models further incorporated an interaction effect between the number of evaluated tumors and feedback group, and an exchangeable correlation structure between multiple evaluations of the same tumor.

Different combinations of sequential batches were used (e.g., batch 1 vs. 2, 3 and 4; batches 1, 2 vs. 3, 4; and batches 1, 2, 3 vs. 4) to analyze and compare diagnostic performance and reader improvement according to the location of the primary tumor. Reader improvement was evaluated using ANOVA, and heterogeneous error variances were permitted rather than assuming homogeneity across groups (e.g., groups of batches, feedback vs. no feedback). When comparing groups to identify differences in the change of diagnostic performance with increased experience, diagnostic performance according to the number of reviewed cases was considered independently within each group.
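As an illustration of the quantities compared across these batch groupings, the following minimal sketch computes the five diagnostic measures from paired reference and reader calls and enumerates the sequential batch splits. The function names and the toy data are hypothetical; the study's analyses were run in R and MedCalc, not with this code.

```python
def diagnostic_metrics(truth, calls):
    """Compute accuracy, sensitivity, specificity, PPV and NPV.

    truth and calls are equal-length sequences of booleans, where True means
    the positive class (e.g., >= T3 on pathology or on the reader's CT call).
    """
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have the same length")
    pairs = [(bool(t), bool(c)) for t, c in zip(truth, calls)]
    tp = sum(t and c for t, c in pairs)
    tn = sum((not t) and (not c) for t, c in pairs)
    fp = sum((not t) and c for t, c in pairs)
    fn = sum(t and (not c) for t, c in pairs)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def batch_splits(n_batches=4):
    """List the sequential 'earlier vs. later' batch groupings."""
    return [
        (list(range(1, k + 1)), list(range(k + 1, n_batches + 1)))
        for k in range(1, n_batches)
    ]


# Toy example with made-up calls for five cases.
truth = [True, True, True, False, False]
calls = [True, True, False, False, True]
metrics = diagnostic_metrics(truth, calls)
splits = batch_splits()
print(metrics, splits)
```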

Reader confidence and reading time were compared between the groups of readers who did and did not receive feedback using the t test (for normally distributed variables) and the Mann–Whitney test (for variables that were not normally distributed).

All statistical analyses were performed using R version 3.6.1 and MedCalc version 19.1.3. P values < 0.05 were considered statistically significant.

Results

T staging (≤ T2 vs. ≥ T3)

The learning curve for accuracy in distinguishing between ≤ T2 and ≥ T3 colon cancer is shown in Fig. 3. All readers improved significantly, and accuracy showed a statistically significant association with the number of cases reviewed (P < 0.001) in the first 40 cases in both groups (with and without feedback). The four readers reached a plateau in accuracy after 38 cases, which is presented as an inflection point in Fig. 3. There was no significant difference in the slopes between the groups that did and did not receive feedback. Sensitivity of T staging also demonstrated a significant association with experience (P < 0.001) and displayed an inflection point at 32 cases. Specificity and NPV also showed significant associations with an increased number of reviewed cases (P = 0.030 and 0.002, respectively), whereas PPV did not. The presence of feedback showed a significant association with NPV (P = 0.003), but not with accuracy, sensitivity, specificity or PPV (P = 0.093–0.780).

Fig. 3

Learning curves for T staging (≤ T2 vs. ≥ T3) of colon cancer with increasing number of reviewed cases. Accuracy of T staging in readers who received feedback (dashed line) and in readers who did not receive feedback (solid line) is presented. Accuracy was plotted and fitted using generalized estimating equations logistic regression models

The diagnostic performance of all four readers in distinguishing between ≤ T2 and ≥ T3 colon cancer is shown in Table 2 and Supplementary Table S1. At baseline, there was no significant difference in accuracy, sensitivity, specificity, PPV or NPV between readers or between the feedback and no-feedback groups (P = 0.268–0.826). The improvement in accuracy was significant between batch 1 vs. 2–4 (P < 0.001), batches 1, 2 vs. 3, 4 (P = 0.001) and batches 1, 2, 3 vs. 4 (P = 0.003). Sensitivity and NPV improved significantly with an increased number of examined cases: batch 1 vs. 2–4 (Ps < 0.001), batches 1, 2 vs. 3, 4 (Ps = 0.001) and batches 1, 2, 3 vs. 4 (P = 0.042 for sensitivity and P = 0.005 for NPV). Specificity differed significantly between batches 1, 2, 3 vs. 4 (P = 0.014), and PPV did not differ significantly in any combination of batch groups. There was no statistically significant difference in the improvement of diagnostic performance between the groups that did and did not receive feedback (Ps > 0.05) (Supplementary Table S1).

Table 2 Diagnostic performance of all readers in distinguishing between ≤ T2 and ≥ T3 colon cancer and lymph node involvement with increasing number of evaluated cases

The accuracy of locoregional colon cancer staging according to the location of the primary tumor is shown in Table 3. There were statistically significant differences in accuracy, sensitivity, specificity and PPV according to primary tumor location (Ps < 0.05). T staging of tumors located in the hepatic/splenic flexures showed the highest accuracy and PPV (P = 0.001 and < 0.001, respectively), and T staging of tumors located in the cecum/ascending colon showed higher sensitivity and specificity (P < 0.001 and 0.004, respectively). NPV for T staging did not differ significantly according to the location of the primary tumor. Importantly, the improvement in T staging accuracy differed significantly according to the location of the primary tumor (P = 0.002). Accuracy improved between batches 1 vs. 2, 3, 4 and batches 1, 2 vs. 3, 4 for T staging of tumors in the transverse/descending colon, sigmoid colon and hepatic/splenic flexures (Ps < 0.05), whereas no statistically significant improvement in accuracy with increased reader experience was noted for tumors located in the cecum/ascending colon (Ps > 0.05).

Table 3 Improvement of diagnostic performance of all readers in distinguishing between ≤ T2 and ≥ T3 colon cancer according to the location of primary colon tumor

Reader’s confidence in distinguishing between ≤ T2 and ≥ T3 colon cancer showed a statistically significant association with feedback as the number of reviewed cases increased (P = 0.002) (Fig. 4). Confidence increased between batches 1, 2 vs. 3, 4 (P = 0.006) and batches 1, 2, 3 vs. 4 (P = 0.020) in the readers who received feedback, but did not increase in the readers who did not receive feedback (P > 0.05). There was a significant difference in overall reader confidence between the two groups (P < 0.001). Comparison by batch showed significantly higher confidence in the readers who received feedback for batches 1, 3 and 4 (P = 0.045, 0.004 and < 0.001, respectively) (Table 4), implying that the readers in the feedback group had higher confidence initially and that feedback enhanced the increase in their confidence compared with the readers who did not receive feedback.

Fig. 4

Reader confidence with increasing number of reviewed cases for T staging of colon cancer, on a scale of 1–5, in readers with feedback (dashed line) and without feedback (solid line)

Table 4 Comparison of reader confidence in distinguishing between ≤ T2 and ≥ T3 colon cancer and lymph node involvement

Evaluation of lymph node involvement

Analysis of the learning curve for accuracy in assessing lymph node involvement revealed no significant improvement with an increased number of reviewed cases (P = 0.850) (Table 2, Fig. 5). Sensitivity, specificity, PPV and NPV likewise did not improve with increased reader experience (P > 0.05). There was no statistically significant difference in diagnostic performance for identifying lymph node involvement between the groups with and without feedback (P > 0.05).

Fig. 5

Learning curve for N staging (N0 vs. ≥ N1) of colon cancer with increasing number of reviewed cases. Accuracy of N staging in readers who received feedback (dashed line) and in readers who did not receive feedback (solid line) is presented. Accuracy was plotted and fitted using generalized estimating equations logistic regression models

There was no significant increase in reader confidence for N staging in either group of readers, with or without feedback (Ps > 0.05), and there was no significant association with feedback as the number of reviewed cases increased (P = 0.070) (Fig. 6). Reader confidence in identifying lymph node involvement was higher in the readers who received feedback in batches 2, 3, 4 and in all examinations combined (Ps < 0.05) (Table 4).

Fig. 6

Reader confidence with increasing number of reviewed cases for N staging of colon cancer, on a scale of 1–5, in readers with feedback (dashed line) and without feedback (solid line)

Reading time

There was a statistically significant decrease in reading time for all readers as the number of reviewed cases increased (batch 1 vs. 2, 3, 4; batches 1, 2 vs. 3, 4; and batches 1, 2, 3 vs. 4; all P < 0.001), and reading time showed a significant negative correlation with the number of reviewed cases (P < 0.001) (Fig. 7). Readers who received feedback had significantly shorter reading times than those who did not in batches 1, 3, 4 and in all cases combined (Ps < 0.05) (Table 5).

Fig. 7

Reading time in seconds with increasing number of reviewed cases in readers with feedback (dashed line) and without feedback (solid line)

Table 5 Comparison of reading time with increased number of reviewed cases

Discussion

This study evaluated the learning curve for T staging (≤ T2 vs. ≥ T3) and N staging (N0 vs. ≥ N1) of colon cancer on primary CT in four senior radiology trainees from various countries and different radiology training programs. All four readers, regardless of the feedback given after every 20 cases, demonstrated a significant improvement in T staging performance and reached a plateau after 38 cases. Although accuracy was higher for tumors located in the right hemicolon and the hepatic and splenic flexures, there was no significant difference in performance between readers who did and did not receive feedback. Interestingly, a steep improvement in accuracy was found in the initial 38 cases in all readers, with or without feedback, and the improvement slowed after this point. Nevertheless, the diagnostic performance of the readers continued to improve after this point and, after all cases in the study had been reviewed, reached a level comparable to the reported diagnostic performance of experienced readers [4]. This finding suggests that further experience after the inflection point helps to improve the diagnostic accuracy of colon cancer staging; further study with a larger number of readers and cases could help characterize the learning curve of radiologists after the inflection point. Reader’s confidence, however, differed significantly between the two groups: the group that received feedback showed a significant increase in confidence with an increasing number of cases. For all readers, regardless of whether feedback was given, diagnostic performance and reader’s confidence for N staging did not improve with an increased number of cases. The time required to read the images decreased for all readers with an increased number of cases. Readers who received feedback achieved shorter reading times than those who did not.

With the feasibility of neoadjuvant chemotherapy and immunotherapy in colon cancer patients demonstrated in clinical trials [1, 13], there is a need for a dedicated training program for radiologists to reach a competent level of locoregional CT staging, especially when the decision for neoadjuvant chemotherapy or immunotherapy will be based on CT imaging findings.

Previous studies used a cutoff of ≥ T3c for high-risk colon cancer [14, 15], whereas our study applied the cutoff between ≤ T2 and ≥ T3, following published clinical trials that considered patients with T3 disease as high-risk regardless of the degree of extension beyond the muscularis propria [1, 15]. In our own experience, precise measurement of tumor extension beyond the bowel wall is not reproducible because of the limited soft tissue contrast of CT, although further studies are needed to confirm this hypothesis.

We found that readers performed better in T staging of tumors located in the cecum and ascending colon (right hemicolon) than of those located in the sigmoid colon (tumors in these two locations were sufficiently numerous for statistical analysis in this study). This can be explained by the anatomical orientation of the right colon, which runs vertically in the craniocaudal direction and therefore shows the tumor in perpendicular cross-section on axial images, the plane with which most radiologists are most familiar. The sigmoid colon varies in length and alignment between individuals, which makes tumor stage harder to evaluate and requires multiplanar reconstruction tools for better staging. Interestingly, tumors in the hepatic and splenic flexures were staged more accurately than those in the transverse and descending colon, but because of the small number of cases it is too early to draw any conclusions. Further research with a large number of tumors in various locations within the colon should be done to validate the findings of this study.

We found no significant improvement in N staging in either group. Reader’s confidence also did not increase with more experience, regardless of the presence of feedback. This is in concordance with a previous meta-analysis that revealed high variability in CT performance for nodal staging, with sensitivities and specificities varying according to the criteria chosen [4]. In this study, we used a simplified version of the criteria used in the FOxTROT trial, based on the size of lymph nodes, to identify malignant nodes [9, 11] and found high variability among readers and no improvement with more experience.

A previous report showed no effect of feedback on inexperienced readers in detecting tumor in prostate cancer [7]. The authors suggested that radiologists, as trained visual learners, are capable of improving in a self-directed manner and with experience [7]. Although there was no significant difference in diagnostic performance between the readers with and without feedback, the readers who received feedback gained significantly more confidence with experience than those who did not. This could be explained by the reassurance that readers perceive when they experience positive feedback [1]. The findings of the current study suggest that radiologists should be trained with exposure to at least 38 cases of colon cancer to ensure the quality of the report. Unfortunately, for N staging, training did not provide a significant increase in performance or in reader’s confidence.

There are several limitations to this study. First, this is a retrospective study. Second, only senior radiology residents participated. The learning curve of general radiologists, who would be more likely to perform locoregional CT staging in the clinical setting, has not been investigated. However, we believe that our findings give a useful indication of how many CTs young radiologists should be exposed to in order to reach an acceptable accuracy for therapeutic decision making. Prospective studies involving radiologists with diverse expertise should be performed in the future. Third, we only evaluated colon cancer staging in a binary fashion, discriminating between ≤ T2 and ≥ T3 for T staging and between N0 and ≥ N1 for N staging. This approach allowed us to identify the learning curve for the most robust criteria that will result in different treatment approaches, taking into consideration current and upcoming changes in the treatment guidelines for colon cancer [1]. Fourth, feedback was given after each batch of 20 cases, a different scenario from the clinical setting, where trainees would more likely be given feedback after each case. However, during the course of the study, the readers in the feedback group were allowed to ask the expert radiologists for further information at any time. Fifth, our cohort included a rather large number of right-sided colon cancers. The suspected reason is that the center involved in the study specializes in treating MSI tumors, which tend to be right-sided. Further study with a cohort of colon cancers with evenly distributed tumor locations should be done.

In conclusion, the diagnostic performance of CT T staging of colon cancer improved significantly in the first 38 cases in senior radiology residents and continued to improve at a slower rate after this point, whereas CT N staging did not show a significant improvement. Colon cancers located in the right hemicolon were easier for radiologists to stage accurately than those in the sigmoid colon, and feedback on tumor staging did not lead to an improvement in performance, suggesting that experience itself would be sufficient for radiologists to become accurate.