Introduction

Salivary gland tumors represent approximately 5% of all head and neck tumors [15, 31]. In Western countries, the annual incidence of salivary gland tumors ranges from 3.3 to 10.3 cases per 100,000, and that of malignant salivary gland tumors from 0.8 to 1.7 cases per 100,000 [1, 3, 4, 9]. The majority (55–75%) of these lesions are benign, mainly pleomorphic adenomas. Mucoepidermoid carcinoma is the most common malignant tumor, and the parotid gland is the most frequently affected salivary gland [13, 34, 37]. Some studies have predicted an increase in the incidence of salivary gland tumors of as much as 5% per year [22].

A correct preoperative diagnosis of salivary gland tumors is therefore particularly important for appropriate and timely treatment [12]. Early diagnosis improves the prognosis of salivary gland tumors [17] and is essential for the successful treatment of malignant tumors: the 5-year survival rate is up to 90% for low-grade tumors, whereas it reaches 50% in patients with high-grade tumors [11, 26].

The choice of diagnostic method depends primarily on the location of the lesion: fine-needle aspiration cytology (FNAC) and ultrasound are mainly applied to lesions of the superficial glands, whereas lesions of the minor salivary glands or the deep lobe, as well as suspected malignancies, should be examined with other methods such as MRI or CT. One advantage of MRI and CT is that they show the extent and invasion of the tumor, which additionally contributes to the diagnosis [5, 6, 20]. CT is better suited for assessing bone infiltration, inflammation and vascular injury. When a tumor is suspected, however, MRI should be preferred, as it allows a more accurate assessment of the extent of infiltration and of tumor demarcation [35].

The value of MRI and CT in the diagnosis of salivary gland tumors has been determined in several studies [18, 30], as has the importance of multidetector CT (MDCT) for assessing inflammatory pathologies and tumor extension [24, 25]. Furthermore, the contrast enhancement of certain tumors is best assessed by dynamic CT or MRI [21]. However, the influence of radiological experience on diagnostic accuracy and inter-observer agreement has not yet been investigated.

In current oncology, salivary gland tumors are treated by interdisciplinary teams. CT and MR scans are routinely reviewed before surgery, including by ear, nose and throat (ENT) physicians, who have access to the imaging data but varying levels of experience in radiology. Furthermore, the experience of the radiologists who write the initial reports varies considerably within radiology departments.

In this retrospective study, MR and CT images were evaluated by three radiologists with different levels of experience in order to determine the influence of experience on the radiological diagnosis of salivary gland tumors.

Materials and methods

Patients

This study was approved by the local ethics committee, which waived the requirement for written informed consent. From 2006 to 2012, 203 patients with suspected salivary gland tumors were examined at our institute; 75 of them were excluded because of missing histopathological findings or because their MRI/CT images showed alterations resulting from a prior iatrogenic procedure. The remaining 128 cases were retrospectively analysed (Table 1). Diagnoses were based on 116 MRI and 12 CT examinations. Histopathological evaluation was based on resection in 127 cases (99%) and on FNAC in one case (1%). The mean time between radiological examination and surgery was 28 days (range 1–243 days).

Table 1 Histological findings, with morphology codes of the International Classification of Diseases for Oncology (ICD-O) [14]

Magnetic resonance imaging (MRI)

MRI examinations were performed on 1.5-T scanners (Magnetom Avanto, Siemens, Erlangen, Germany) using a combined head and neck coil. T1-weighted turbo spin-echo (TSE; TR/TE 539/13 ms) and spin-echo (SE; TR/TE 600/17 ms) sequences were acquired in the transverse plane before and after contrast administration. Additional fat-suppressed images were acquired in the coronal (TR/TE 773/17 ms) and sagittal planes. Fat-suppressed T2-weighted TSE images were acquired in the transverse plane (TR/TE 4200/110 ms). Of the 116 patients, 113 received Gd-DOTA (Dotarem, Guerbet, France) as contrast agent at a dose of 0.1 ml/kg body weight; in three cases, no contrast agent was administered.

Computed tomography (CT)

CT examinations were performed on a dual-source CT scanner (Somatom Definition Flash, Siemens Healthcare, Forchheim, Germany) with a rotation time of 0.28 s, a collimation of 64 × 0.6 mm and a temporal resolution of 75 ms. Images were obtained with a tube voltage of 80–140 kV and a slice thickness of 4 mm. Contrast agent (80 ml iomeprol, Imeron 400, Bracco, Konstanz, Germany) was injected intravenously at a flow rate of 2 ml/s.

Radiological performance

All imaging data were reviewed by three radiologists with different levels of experience in head and neck imaging: more than 20 years (R1), 11 years (R2) and 7 years (R3). Readers were blinded to histopathological results, medical history and prior radiological reports. To improve the comparability of the readings, the evaluation time was limited to a maximum of 10 min per case.

Image analysis

Imaging data were evaluated on standard PACS workstations (Centricity 4.2, GE Healthcare, Dornstadt, Germany); in total, 116 MRI (91%) and 12 CT (9%) examinations were read.

To assess tumor dignity, i.e., whether a lesion was benign or malignant, the reviewers rated each case on the basis of the presented images, using criteria for differentiating malignant from benign lesions that have been described in detail elsewhere [2, 8, 16, 23, 27, 36]. Cases were presented to the reviewers in random order, and their diagnoses were documented on evaluation sheets. The sheets contained a free-text field for the diagnosis, without predefined diagnoses, as well as fields with predefined values for localization and affected gland.

For tumor classification, the entity was regarded as correct when the radiological diagnosis corresponded to the histopathological findings; differential diagnoses were not considered. For dignity assessment, any diagnosis of a benign entity was rated as benign, and any diagnosis of a malignant entity as malignant. If an incorrect gland or localization was reported, the diagnosis was considered wrong.
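The scoring rule above can be sketched in Python as follows. This is a hypothetical illustration, not the procedure or software used in the study: the `Reading` structure, the `DIGNITY` lookup (limited to three entities named in this paper) and the example site labels are assumptions, and we assume that a wrong gland or localization invalidates both the entity and the dignity rating.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical entity-to-dignity lookup; only three entities from this paper are listed.
DIGNITY = {
    "warthin tumor": "benign",
    "pleomorphic adenoma": "benign",
    "squamous cell carcinoma": "malignant",
}


@dataclass
class Reading:
    entity: str        # free-text diagnosis, assumed normalised to lower case
    gland: str         # e.g. "left parotid" (illustrative label)
    localization: str  # e.g. "superficial lobe" (illustrative label)


def score(reading: Reading, truth: Reading) -> tuple[bool, bool]:
    """Return (entity_correct, dignity_correct) for one case.

    Assumption: an incorrect gland or localization makes both ratings wrong.
    """
    site_ok = (reading.gland == truth.gland
               and reading.localization == truth.localization)
    if not site_ok:
        return False, False
    entity_ok = reading.entity == truth.entity
    # Unknown entities map to None and therefore never match the reference dignity.
    dignity_ok = DIGNITY.get(reading.entity) == DIGNITY[truth.entity]
    return entity_ok, dignity_ok


truth = Reading("warthin tumor", "left parotid", "superficial lobe")
print(score(Reading("pleomorphic adenoma", "left parotid", "superficial lobe"), truth))
# (False, True): wrong entity, but the benign dignity is still correct
```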

Statistical analysis

Statistical analysis was performed using commercially available software (MedCalc statistical software version 12.7.2; MedCalc Software bvba, Ostend, Belgium; and BiAS 8.6.0, Epsilon Verlag, Frankfurt am Main, Germany). A p value < 0.05 was considered statistically significant. The patient distribution was assessed with a t test for age and a chi-square test for gender. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated for each of the three raters (R1, R2, R3) with regard to dignity and entity. Inter-reader agreement was measured with Cohen’s kappa (κ), interpreted as follows: κ < 0.2, slight; 0.2 ≤ κ < 0.4, fair; 0.4 ≤ κ < 0.6, moderate; 0.6 ≤ κ < 0.8, substantial; κ ≥ 0.8, almost perfect agreement [19].
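For reference, the diagnostic measures named above, together with the accuracy reported in Table 2, follow directly from a 2 × 2 table of rater diagnoses against histopathology. The sketch below is a minimal illustration with a hypothetical function name; the analysis in this study was performed with MedCalc and BiAS, and the counts in the usage example are invented, not taken from Table 2.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity, PPV, NPV and accuracy from a 2x2 table.

    "Positive" is whichever class is under study, e.g. malignant for
    malignancy assessment or benign for benignity assessment.
    """
    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Invented counts for illustration only.
m = diagnostic_metrics(tp=9, fp=1, fn=1, tn=17)
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.9, 'specificity': 0.944, 'ppv': 0.9, 'npv': 0.944, 'accuracy': 0.929}
```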

Results

Dignity

R1 achieved the highest values of all raters for malignancy/benignity assessment with both MRI and CT, reaching 100% for sensitivity, specificity, NPV and PPV; the only exception was benignity assessment with MRI, where R1’s values ranged from 91.6 to 98.7%. For benignity assessment with MRI, R3 obtained a higher sensitivity than R2 (difference: 7.3 percentage points). For benignity assessment with CT, R2 and R3 reached the same values; for malignancy assessment, however, they differed slightly in specificity (2.0 percentage points) and PPV (5.8 percentage points) (Table 2).

Table 2 CT and MRI sensitivity (sens.), specificity (spec.), positive predictive value (PPV), negative predictive value (NPV) and accuracy (Acur.) for malignancy/benignity, achieved by observers R1 (> 20 years of experience), R2 (11 years) and R3 (7 years)

For malignancy assessment with CT, radiological performance increased with radiological experience, except for sensitivity and NPV, for which R2 reached the same results as R1 (100%).

For MRI, the agreement (κ) among the readers ranged from 0.45 to 0.50 (p < 0.001). For CT, the kappa values were more widely spread: agreement was highest between R1 and R2 (κ = 0.74; p < 0.001), lowest between R2 and R3 (κ = 0.28; p < 0.001) and intermediate between R1 and R3 (κ = 0.50; p < 0.001).
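Cohen’s kappa compares the observed agreement between two readers, p_o, with the agreement expected by chance from each reader’s label frequencies, p_e: κ = (p_o - p_e)/(1 - p_e). The following minimal standard-library sketch, with hypothetical function names and invented ratings, computes κ and maps it to the bands defined in the Statistical analysis section; it does not compute the p values reported above. Applied to the values above, κ = 0.74 falls in the “substantial” band and κ = 0.28 in the “fair” band.

```python
import math
from collections import Counter


def cohens_kappa(r1: list[str], r2: list[str]) -> float:
    """Cohen's kappa for two readers who labelled the same cases."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("both readers must label the same non-empty case list")
    n = len(r1)
    # Observed agreement: share of cases with identical labels.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement from the product of each reader's label frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[label] * c2[label] for label in c1) / n**2
    if p_e == 1.0:
        return math.nan  # both readers used one identical label: kappa is undefined
    return (p_o - p_e) / (1 - p_e)


def agreement_band(kappa: float) -> str:
    """Map kappa to the verbal bands used in the Statistical analysis section."""
    if kappa < 0.2:
        return "slight"
    if kappa < 0.4:
        return "fair"
    if kappa < 0.6:
        return "moderate"
    if kappa < 0.8:
        return "substantial"
    return "almost perfect"


# Invented dignity ratings for four cases (b = benign, m = malignant).
k = cohens_kappa(["b", "b", "m", "m"], ["b", "m", "m", "m"])
print(k, agreement_band(k))  # 0.5 moderate
```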

Classification of tumors

Table 3 shows a comparison between histopathological findings and the diagnoses of the three radiologists. R1 achieved the best results of all three reviewers. In the classification of Warthin’s tumors with MRI, R1 achieved values above 95.0%, followed by R2 (range: 66.6–84.4%) and R3 (range: 43.1–73.6%). With CT, R1 and R3 achieved equal results (100%), followed by R2 (range: 50.0–100%).

Table 3 Radiological diagnoses in comparison to histopathological findings, sorted by radiologists, most experienced R1 (> 20 years), R2 (11), R3 (7)

For the classification of pleomorphic adenomas, R1 reached 100% with both MRI and CT. With CT, R2 achieved a sensitivity of 100%, and its specificity was 11.1 percentage points lower than that of R1; R3 differed from R1 by 33.3 percentage points. With MRI, the gap between R1 and R2 in sensitivity/specificity increased to 23.0/11.1 percentage points, and that between R1 and R3 to 56.4/32.4 percentage points. For other benign tumors, R2 and R3 had a sensitivity of 0% with MRI. For other malignant tumors, R2 exceeded R3 by 12.5 percentage points in sensitivity and by 11.1 percentage points in PPV. For the group without neoplasm, R3 reached a sensitivity of 8.0% with MRI and 0.0% with CT. In the classification of squamous cell carcinoma (SCC) with MRI and CT, R1 achieved 100% for all values, and R2 and R3 had the same sensitivity; with CT, the specificity of R2 was 22.2 percentage points higher than that of R3.

For classification with MRI, the highest agreement was found between R1 and R2 (κ = 0.62; p < 0.001), followed by R2 and R3 (κ = 0.30; p < 0.001) and R1 and R3 (κ = 0.28; p < 0.001). With CT, the highest agreement was again found between R1 and R2 (κ = 0.49; p < 0.001), followed by R1 and R3 (κ = 0.38; p < 0.001) and R2 and R3 (κ = 0.37; p < 0.001).

Discussion

In this study, we investigated the influence of reviewer experience on the accuracy of the radiological evaluation of salivary gland tumors with MRI and CT.

Our results indicate that radiological performance appears to increase with the experience of the radiologist, both for the classification of malignant tumors and for dignity assessment.

The observer with more than 20 years of experience (R1) reached the highest values for sensitivity, specificity, PPV and NPV (Fig. 1a, b). For malignancy assessment with CT, R2 achieved the same sensitivity and NPV as R1 (100%/100%), which is reflected in the high agreement between these two reviewers (Fig. 2). The lowest agreement for CT was found between R2 and R3, although R2 and R3 achieved the same results in benignity assessment. This may be explained by the fact that R2 and R3 made the same number of incorrect diagnoses, but in different cases: identical summary values do not imply agreement on individual cases, which is what kappa measures. For benignity assessment with MRI, R3’s values were up to 11 percentage points higher than those of R2. The difference in sensitivity was 6.10 percentage points between R1 and R3 and 7.31 percentage points between R3 and R2. A study with two radiologists who each had at least 5 years of experience reported similar values [39]. In our study, however, there was always a difference (6–33 percentage points) between R1 and the other two reviewers.
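The explanation that equal error counts on different cases lower agreement can be illustrated with invented data. In the sketch below (two hypothetical readers; b = benign, m = malignant), each reader makes 2 errors in 10 cases and thus has the same accuracy, yet because the errors fall on different cases their mutual kappa is only about 0.17. Had both readers erred on the same cases, their kappa would be 1.0.

```python
from collections import Counter


def cohens_kappa(r1: list[str], r2: list[str]) -> float:
    """Cohen's kappa for two readers rating the same cases."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n      # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1) / n**2        # chance agreement
    if p_e == 1.0:
        return float("nan")  # kappa undefined when both use one identical label
    return (p_o - p_e) / (1 - p_e)


def accuracy(reader: list[str], truth: list[str]) -> float:
    """Share of cases in which the reader matches the reference standard."""
    return sum(a == b for a, b in zip(reader, truth)) / len(truth)


# Invented reference standard: 6 benign and 4 malignant cases.
truth = ["b"] * 6 + ["m"] * 4
# Reader A errs on cases 0 and 6; reader B errs on cases 1 and 7.
reader_a = ["m", "b", "b", "b", "b", "b", "b", "m", "m", "m"]
reader_b = ["b", "m", "b", "b", "b", "b", "m", "b", "m", "m"]

print(accuracy(reader_a, truth), accuracy(reader_b, truth))  # 0.8 0.8
print(round(cohens_kappa(reader_a, reader_b), 2))           # 0.17
```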

Fig. 1

Calculated values for dignity assessment with MRI (a) and CT (b), by observer (R1 > 20, R2 = 11, R3 = 7 years of radiological experience)

Fig. 2

Inter-observer agreement for classification and dignity with MRI/CT, assessed by Cohen’s kappa (κ), between the most experienced observer R1 (> 20 years), the less experienced R2 (11 years) and the least experienced R3 (7 years)

For malignancy assessment with MRI, R2 and R3 achieved the same sensitivity and almost the same specificity. These results are similar to those of a study with two observers who had 5 and 10 years of experience, respectively [7]; in contrast to our study, however, the results were not calculated separately for each radiologist. R1’s sensitivity for malignancy assessment with MRI or CT is similar to that reported in a study with experienced radiologists [29]. Furthermore, our results are based on histopathology, which provides a more accurate reference standard [32].

For tumor classification with MRI, we noted confusion between Warthin’s tumors and pleomorphic adenomas. R3 classified 17 of 38 Warthin’s tumors as pleomorphic adenomas (R2: 6; R1: 0) and 17 of 39 pleomorphic adenomas as Warthin’s tumors (R2: 4; R1: 0) (Figs. 3, 4). This may also explain the relatively good results of R3 for dignity assessment: because both entities are benign, confusing them does not affect dignity, so R3 distinguished benign from malignant tumors correctly but classified the tumor entities less precisely. R1 reached values higher than 95% for the assessment of Warthin’s tumors with MRI. A study in which two radiologists with 6 months and 8 years of experience performed the preoperative diagnosis of Warthin’s tumors showed similar results [10]. Our study furthermore indicates that diagnostic performance is higher for experienced radiologists than for less experienced ones. Thus, our results suggest that experienced radiologists in particular should be involved in reporting CT and MRI examinations of salivary gland tumors.
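Why confusion between two benign entities leaves dignity assessment untouched can be shown with a toy example. The cases, the entity labels and the `DIGNITY` mapping below are illustrative assumptions, not study data: swapping Warthin’s tumor and pleomorphic adenoma lowers entity accuracy but not dignity accuracy.

```python
# Illustrative mapping of entities to dignity (assumption, not from Table 1).
DIGNITY = {
    "warthin tumor": "benign",
    "pleomorphic adenoma": "benign",
    "squamous cell carcinoma": "malignant",
}


def accuracy(predicted: list[str], reference: list[str]) -> float:
    """Share of cases in which prediction and reference agree."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


# Invented cases: the reader swaps the two benign entities in the first and third case.
truth = ["warthin tumor", "warthin tumor", "pleomorphic adenoma",
         "pleomorphic adenoma", "squamous cell carcinoma"]
reader = ["pleomorphic adenoma", "warthin tumor", "warthin tumor",
          "pleomorphic adenoma", "squamous cell carcinoma"]

entity_acc = accuracy(reader, truth)
dignity_acc = accuracy([DIGNITY[e] for e in reader],
                       [DIGNITY[e] for e in truth])
print(f"entity accuracy: {entity_acc:.2f}, dignity accuracy: {dignity_acc:.2f}")
# entity accuracy: 0.60, dignity accuracy: 1.00
```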

Fig. 3

A 69-year-old man with a histologically confirmed Warthin’s tumor of the left parotid gland. R1 correctly diagnosed a Warthin’s tumor, whereas R2 and R3 diagnosed it incorrectly (as pleomorphic adenoma and malignancy, respectively). a axial T1; b fat-saturated T2; c contrast-enhanced fat-saturated T1 image

Fig. 4

MRI of a pleomorphic adenoma in the left parotid gland of a 35-year-old woman, correctly diagnosed by R1 and incorrectly diagnosed as a cyst by R2 and as a Warthin’s tumor by R3. a axial turbo spin-echo (TSE) T1; b sagittal T2-weighted TSE; c axial contrast-enhanced spin-echo sequence

For the classification of pleomorphic adenomas with CT, R1 reached values of 100%, R2 more than 88.8%, and R3 values ranging from 66.6 to 88.8%. These results are comparable to those of other studies [38].

A similar observation can be made for the groups “other benign tumors” and “diseases without neoplasm”. All three reviewers incorrectly diagnosed some cases of “other benign tumors” as Warthin’s tumor (R1: 1 of 5; R2 and R3: 2 of 5 each). This was even more pronounced in the group “diseases without neoplasm”: with MRI, R1 misdiagnosed 1 of 13 cases as Warthin’s tumor (R2: 6; R3: 9).

As in other studies, the most common benign tumor in our study was pleomorphic adenoma [33]. However, the most common malignant tumor in our study was SCC, whereas in other studies it is mucoepidermoid carcinoma or adenoid cystic carcinoma [28]. The absence of mucoepidermoid carcinoma may have biased the results somewhat and should be addressed in future studies.

Among malignant tumors, some SCCs were incorrectly diagnosed as pleomorphic adenomas by R3 (n = 3) and as pleomorphic adenoma or Warthin’s tumor by R2 (n = 1 each) (Fig. 5). Notably, the less experienced radiologists misdiagnosed SCC as pleomorphic adenoma mainly on MRI, not on CT. R3 did not correctly diagnose any of the patients without neoplasm.

Fig. 5

A 40-year-old man with a histologically confirmed pleomorphic adenoma in the left parotid gland, correctly diagnosed as pleomorphic adenoma by all observers (R1–R3). a axial T1; b axial T2 TSE; c contrast-enhanced axial T1 image

The highest inter-observer agreement was found between R1 and R2, ranging from moderate to substantial. The agreement between R3 and the two more experienced radiologists, however, was only fair to moderate.

This study has several limitations. The ratio of benign to malignant tumors was unbalanced, which limits the comparison between the results for malignant and benign tumors. In addition, the ratio of CT to MRI examinations was approximately 1:10 (12 vs 116), so the CT results have limited reliability and should be verified in a larger study. In three MRI examinations, no contrast agent was administered, which may have slightly influenced our results. The majority of cases consisted of Warthin’s tumors, pleomorphic adenomas and carcinomas; future investigations should include more cases with other tumor entities to obtain more accurate and reliable results. Because each experience level was represented by a single reader, the results depend heavily on the expertise of the individual raters. Future studies could also include more reviewers with a finer gradation of radiological experience.

In conclusion, our results indicate that increasing radiological experience leads to higher accuracy in the classification of salivary gland tumors with both CT and MRI. Furthermore, long radiological experience (> 20 years) is required for the accurate radiological diagnosis of uncommon benign salivary gland tumors.