Introduction

Global concern about a shortage of healthcare workers is growing. World Health Organization projections show that by 2030, the physician shortage in countries of the Organisation for Economic Co-operation and Development (OECD) may amount to 1.2 million [1]. Many countries seek to remedy this shortage, in part by employing international medical graduates (IMGs). For instance, the proportion of IMGs in Germany is 11.8% and rising [2]. However, employing IMGs may also come at a cost, since language barriers and cultural differences can lead to difficulties in communication [3,4,5,6,7]. This matters because a deficit in communication between healthcare workers has been shown to be one of the major risk factors for patient safety [8, 9]. Apart from limitations in verbal communication, language barriers could also lead to difficulties in creating and understanding written documents, such as radiology reports.

In the field of radiology, structured reporting is increasingly viewed as a tool that could potentially enable an automatic translation of reports into foreign languages [10,11,12]. Structured reporting templates with predefined text elements in multiple languages could allow radiologists to report in their mother tongue and then translate the report into another language automatically. While multiple studies of various exams have shown that structured reporting leads to improved completeness and comprehensibility, and results in higher satisfaction by referring clinicians [12,13,14,15,16,17,18,19,20,21,22,23], the challenge arising from reports created by non-native speakers has so far been addressed by only a few studies [24, 25].

In 2012, Stramare et al established a system for bilingual structured reporting (English/Italian) and tested this tool for the purpose of multidisciplinary meetings [24]. The authors further shared the multilingual reports on two international databases, demonstrating that multilingual structured reports (SRs) can contribute to international research and exchange of knowledge [24]. However, the generated reports had text output in bullet point form only, with an average length of 22.3 terms, and were not evaluated individually [24].

Being able to translate reports into different languages could not only facilitate teleconferences and multidisciplinary meetings, as demonstrated by Stramare et al, but also serve the growing demand for telemedicine and teleradiology. Teleradiology has the potential to improve workload distribution, on-call services, and specialist consultation, to shorten reporting times, and to reduce costs [25]. However, there is one major obstacle to international teleradiology: the language barrier. In 2010, Ross et al examined the potential of multilingual structured reporting tools to overcome this obstacle and build a cross-border teleradiology network. In this project, a multilingual structured reporting tool for knee x-rays was used to create reports in Estonian or Lithuanian, which were then automatically translated into Danish [25]. The findings of these reports were then compared to those from reports created by a Danish radiologist, revealing consistent findings in 80% of reports [25].

Neither of these studies evaluated qualitative parameters of the individual reports or the satisfaction of referring clinicians. To be beneficial to patient care, reports should not only concur in their main findings, but also fulfill high-quality standards. We believe that further research evaluating the risks and potential of multilingual SRs is required to justify more widespread implementation.

We therefore aimed to compare the quality of reports created by German-speaking and English-speaking radiologists (GS and ES) with multilingual structured reporting templates, as well as the satisfaction of referring clinicians with the content, comprehensibility, and clinical relevance of these reports.

Materials and methods

Patient selection and study design

German-speaking and English-speaking radiologists created free-text reports (FTRs) and SRs in German and English. SRs were created and translated with multilingual structured reporting tools. Reports were evaluated with a standardized, self-designed questionnaire by clinicians.

To achieve better generalizability, we chose three different study types with varying levels of complexity: chest x-ray, shoulder x-ray, and CT pulmonary angiogram (CTPA) for pulmonary embolism.

A retrospective search was performed in our database to identify potential exams for inclusion in the present study. Image acquisition dates ranged from September 2013 to September 2016. Selection criteria were as follows: Supine chest x-rays required adequate image quality and at least one prior image. Shoulder x-rays were included if the patient had reported shoulder pain without a history of trauma or tumor. CTPA studies were included if they confirmed an acute pulmonary embolism. From a larger pool of exams that fulfilled the eligibility criteria, 24 exams of a randomly selected subpopulation (14 men, 10 women, age 25–89 years) were included in the study.

The study was approved by our institutional review board. Informed consent was waived due to the retrospective and anonymous nature of our study.

Templates for structured reporting

Radiology reports were generated using online reporting templates which were designed using online software (Smart Reporting GmbH). The templates consist of checklists containing point-and-click menus. As the user selects different options, full sentences are generated accordingly (see Fig. 1).

Fig. 1

Screenshot of structured reporting tool. The report is generated by selecting predefined options or entering values into text fields. The translation tool allows switching the language of the template and report flexibly [retrieved from Smart Radiology software, “CT pulmonary embolism” template, 4 Nov 2018]

Structured templates for each of the three study types were designed by experienced radiology residents under the supervision of faculty members and based on recommendations by the RSNA Reporting Initiative for CTPA [26], shoulder [27], and chest x-ray [28] (see Fig. 1 and Supplement 1). The CTPA and shoulder x-ray templates had both been evaluated in two separate, previously published studies that compared the quality of German SRs and FTRs [16, 29].

While the templates were originally created in German, they were subsequently translated into English by a medical student who is a native speaker of both German and English, under the supervision of a radiology resident. The online template included a feature that enabled the reporting radiologist to instantly switch the report language, enabling ES to create a German report (and vice versa). Before these bilingual templates were used for reporting purposes, slight adjustments were made according to feedback by radiology residents from a British teaching hospital. While the templates contained an option to enter free text if necessary, the primary aim was to evaluate the structured templates. Therefore, readers were encouraged to limit the use of the free-text option wherever possible. In the instances where additional information was entered by the ES, the same native speaker mentioned above subsequently translated the sentences. A total of 1,848 words were translated, of which 154 words (8.3%) were translated manually.
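The language-switching mechanism described above can be illustrated with a minimal sketch. It assumes that every selectable option stores one predefined sentence per language, so that switching the language only changes which stored sentence is emitted. The option keys, the sentence wording, and the function name `render_report` are hypothetical and are not taken from the Smart Reporting software.

```python
# Hypothetical template: each selectable option carries one predefined
# sentence per language. Keys and wording are invented for illustration.
TEMPLATE = {
    "pe_present": {
        "en": "There is evidence of acute pulmonary embolism.",
        "de": "Es zeigt sich eine akute Lungenarterienembolie.",
    },
    "no_effusion": {
        "en": "No pleural effusion.",
        "de": "Kein Pleuraerguss.",
    },
}


def render_report(selected, language, free_text=None):
    """Build a report from the selected option keys in the requested language.

    Free text has no predefined translation, so it is appended unchanged
    and flagged for manual handling (an assumption of this sketch).
    """
    lines = [TEMPLATE[key][language] for key in selected]
    if free_text:
        lines.append(f"[free text, not translated] {free_text}")
    return "\n".join(lines)


# The same selection yields structurally identical reports in both languages.
selection = ["pe_present", "no_effusion"]
report_en = render_report(selection, "en")
report_de = render_report(selection, "de")
```

Because the translation is a lookup rather than a machine translation, the German output depends only on which options were selected, not on the language in which the reader worked.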

Creation of radiology reports

The readers were radiologists with 4 to 8 years of clinical experience including the reporting of the three exams in this study. Two were from the Department of Radiology at the University Hospital, LMU Munich (Germany) and two from the Department of Radiology at King’s College London (Great Britain).

Every German reader created 8 reports for each of the three examination types, resulting in 24 reports per reader. Half of these 8 reports were SRs (SR_GS) and the other half FTRs (FTR_GS). The English readers, in contrast, only created SRs—4 reports for each of the three examination types, leading to a total of 12 reports per reader (Fig. 2a). SRs by English readers were automatically translated into German using the integrated translation tool (SR_ES).

Fig. 2

Study design. a Chest x-ray, shoulder x-ray, and CT pulmonary angiogram (CTPA) images were randomized and examined by two German-speaking and two English-speaking radiologists. German-speaking radiologists created 3 × 8 reports each (4 SRs + 4 FTRs), whereas English-speaking radiologists created only 3 × 4 each (4 SRs), leading to a total number of 72 reports. b Reports were evaluated by German clinicians. While chest x-ray and CTPA reports were reviewed by two internists, shoulder x-ray reports were assessed by two orthopedists each

Readers were randomly assigned to group A or B. These groups defined which 4 reports would be SRs and which 4 would be FTRs. After completing a report, the reader saved the original report and an additional, automatically translated version.

Evaluation of the reports

To avoid confounding the aptitude of the templates for creating reports in a foreign language with the language proficiency of the reporting radiologist, we evaluated the German reports, since the ES did not speak German.

The number of report evaluations for this study was based on previously published sample size calculations from prior studies that compared report quality between report types [14, 20, 23]. Following those considerations, our study sample consisted of 24 reports per report type (FTR_GS, SR_GS, SR_ES), resulting in a total of 72 reports and 144 evaluations by referring physicians.
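The report and evaluation counts follow directly from the study design. The short sketch below reproduces that arithmetic from the numbers given in the text (two readers per language group, three exam types, four reports per report type and exam type, two raters per report). The constant and function names are illustrative only.

```python
from collections import Counter

EXAM_TYPES = ("chest x-ray", "shoulder x-ray", "CTPA")

# Reports per reader and exam type, by reader group, as described in the text.
REPORTS_PER_EXAM_TYPE = {
    "GS": {"SR_GS": 4, "FTR_GS": 4},  # German-speaking readers
    "ES": {"SR_ES": 4},               # English-speaking readers
}
READERS_PER_GROUP = 2
RATERS_PER_REPORT = 2


def count_reports():
    """Return the number of reports per report type across all readers."""
    counts = Counter()
    for per_type in REPORTS_PER_EXAM_TYPE.values():
        for report_type, n in per_type.items():
            counts[report_type] += READERS_PER_GROUP * len(EXAM_TYPES) * n
    return counts


counts = count_reports()
total_reports = sum(counts.values())
total_evaluations = total_reports * RATERS_PER_REPORT
```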

Reports were anonymized, uniformly formatted, and inserted into a PDF document in random sequence. The age of the patient and the clinical question were included, but no clues to who had created the report.

Each report was evaluated by two German clinicians. Reports on chest x-rays (N = 24) and CTPA for pulmonary embolism (N = 24) were rated by two specialists for internal medicine each (6 and 22 years of clinical experience), whereas shoulder x-ray reports (N = 24) were rated by two specialists for orthopedics (9 and 12 years of clinical experience). The total number of evaluations amounted to 144 (Fig. 2b).

Reports were rated with an online evaluation tool called LimeSurvey [30]. Our questionnaire was created by conducting a literature review and researching the most important qualities of a radiology report [10, 21, 31,32,33,34,35,36,37,38,39]. We concluded that these were content, comprehensibility, and clinical consequences. These main qualities were rated on 6-point Likert scales. Additionally, we included a 6-point rating scale for overall quality and asked our clinicians to guess the nationality of the reader.

The questionnaire for report evaluation (see Fig. 3) was created in close collaboration with the LMU’s Institute for Medical Education and the physicians who later evaluated the reports. An initial draft of the online survey was discussed and modified based on their input on the wording and content of the evaluation.

Fig. 3

Report evaluation questionnaire. This figure shows the report evaluation questionnaire using checkboxes and 6-point Likert scale questions for content, comprehensibility, clinical consequences, and overall report quality

Rater’s survey

After a referring physician completed the report evaluations, a follow-up survey was conducted the same day. It contained 10 free-text questions regarding the clinician’s opinion about potential advantages and disadvantages of SRs and FTRs and which type of report the clinician would personally prefer (see Supplement 4).

Statistical analysis

Results are summarized as medians with interquartile range or frequencies and percentages, as appropriate. Statistical analysis was performed using the non-parametric Friedman test for paired data, comparing the three different report types. Post hoc analysis with Wilcoxon signed rank tests was conducted with a Bonferroni correction applied for nine pairwise comparisons (three pairs of report types for each of the three items: quality, comprehensibility, and clinical consequences), resulting in a significance level of α = 0.05/9 = 0.0056.
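As an illustration of this procedure, the sketch below computes a tie-corrected Friedman statistic for three paired ratings and the Bonferroni-adjusted significance level. It is a plain-Python sketch under stated assumptions, not the analysis code used in the study: the ratings are invented toy data, each block is assumed to hold the ratings of one exam by one rater in the order (FTR_GS, SR_GS, SR_ES), and the p-value relies on the chi-square approximation, which for two degrees of freedom reduces to exp(-Q/2).

```python
import math


def average_ranks(values):
    """Rank values from 1..k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_test(blocks):
    """Tie-corrected Friedman statistic Q and approximate p-value for k = 3."""
    n, k = len(blocks), len(blocks[0])
    if k != 3:
        raise ValueError("the closed-form p-value used here assumes k = 3")
    rank_sums = [0.0] * k
    ties = 0
    for block in blocks:
        for j, rank in enumerate(average_ranks(block)):
            rank_sums[j] += rank
        for value in set(block):
            t = block.count(value)
            ties += t * (t * t - 1)
    correction = 1 - ties / (k * (k * k - 1) * n)
    if correction == 0:
        raise ValueError("all blocks are fully tied; the statistic is undefined")
    q = (12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
         - 3 * n * (k + 1)) / correction
    return q, math.exp(-q / 2)  # chi-square survival function for 2 df


# Invented toy ratings on the 6-point scale: (FTR_GS, SR_GS, SR_ES) per block.
toy_blocks = [(4, 6, 5), (3, 5, 4), (4, 6, 5), (2, 5, 3)]
q_stat, p_value = friedman_test(toy_blocks)

# Bonferroni correction for nine pairwise post hoc comparisons.
ALPHA_ADJUSTED = 0.05 / 9
```

With real data, an established statistics package would be preferable to this hand-written version, particularly for small samples where the chi-square approximation is coarse.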

Results

Overall quality

In most cases, SR_GS (N = 48 ratings; 100%) received either very good (N = 23; 47.9%) or good ratings (N = 11; 22.9%). Likewise, SR_ES (N = 48 ratings; 100%) were mostly rated as very good (N = 15; 31.3%) or good (N = 12; 25.0%). In contrast, FTR_GS (N = 48; 100%) were predominantly found to be good (N = 19; 39.6%) or satisfactory (N = 12; 25.0%). While SR_GS received only one barely acceptable rating (N = 1; 2.1%) and one unacceptable rating (N = 1; 2.1%), these numbers were higher for SR_ES (barely acceptable: N = 5; 10.4%, unacceptable: N = 6; 12.5%) and FTR_GS (barely acceptable: N = 5; 16.7%, unacceptable: N = 1; 2.1%) (see Fig. 4).

Fig. 4

Overall report quality. The majority of SR_GS and SR_ES received very good or good ratings

Although SR_GS received a higher number of very good or good ratings (N = 34; 70.8%) and a lower number of barely acceptable or unacceptable ones (N = 2; 4.2%) than SR_ES (very good or good: N = 27; 56.3%, barely acceptable or unacceptable: N = 11; 22.9%), an overall comparison of the report quality revealed no significant difference (Z = − 2.560; p = 0.010; α = 0.0056). In the same manner, no significant differences were found between SR_ES and FTR_GS (Z = − 1.117; p = 0.264). However, SR_GS exhibited a significantly higher overall report quality than FTR_GS (Z = − 3.489; p < 0.001).

A descriptive subanalysis showed differences in overall quality ratings based on the exam type. Whereas the majority of both SR_GS and SR_ES received very good or good ratings in the shoulder x-ray (87.5% of SR_GS, 93.8% of SR_ES) and CTPA exam (87.5% of SR_GS and 62.5% of SR_ES), a markedly smaller percentage of SR_GS and SR_ES received such ratings in the chest x-ray exam (37.5% of SR_GS, 12.5% of SR_ES). For chest x-ray exams, a particularly high number of SR_ES was rated as barely acceptable or unacceptable (56.3%), compared to only 6.3% of SR_GS and 6.3% of FTR_GS (see Supplement 2).

Comprehensibility

Median comprehensibility ratings were 5.50 for SR_GS (IQR = 1.75), 5.00 for SR_ES (IQR = 2), and 4.00 for FTR_GS (IQR = 1.75), each on a 6-point Likert scale (“The language of the report is clear and easy to understand”; 1 = strongly disagree; 6 = strongly agree). While the comprehensibility was significantly better for SR_GS compared to that for FTR_GS (Z = − 2.848; p = 0.004), there were no significant differences for either of the other two combinations (see Table 1).

Table 1 Comprehensibility and clinical consequences. Median values and interquartile ranges of each report type for comprehensibility and clinical consequences are illustrated. SR_GS received superior median values for both criteria compared to SR_ES, but the difference was not significant (α = 0.0056). SR_GS received significantly better ratings for comprehensibility compared to FTR_GS

Clinical consequences

Median ratings for positive impact on further clinical management amounted to 6.00 for SR_GS (IQR = 2), 5.00 for SR_ES (IQR = 2), and 5.00 for FTR_GS (IQR = 2), each on a 6-point Likert scale (“Based on the report a decision on further clinical management can be made, e.g., further diagnostics or treatment”; 1 = strongly disagree; 6 = strongly agree, see Supplement 3 for samples of reports that scored the highest possible ratings for clinical decision-making). None of the comparisons revealed a significant difference (see Table 1).

Native language of readers

For SR_GS, the evaluating clinicians were able to guess the native language of the reader correctly in 50.0% of cases (N = 24), while they guessed incorrectly in 18.8% (N = 9) and chose “I don’t know” in 31.3% (N = 15) of cases.

In contrast, there were only 22.9% of correct guesses (N = 11) for SR_ES that were automatically translated from English into German. The referring clinicians guessed incorrectly in 41.7% of the cases (N = 20) and were not sure in 35.4% (N = 17).
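The reported percentages can be reproduced from the counts above, each based on 48 ratings per report type. The sketch below uses `decimal` with half-up rounding, which is assumed here because it matches the reported figures (for example, 15/48 = 31.25% is reported as 31.3%), whereas Python's built-in `round` rounds halves to the nearest even digit. The dictionary layout and names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

RATINGS_PER_TYPE = 48

# Guess tallies taken from the text above.
GUESSES = {
    "SR_GS": {"correct": 24, "incorrect": 9, "unsure": 15},
    "SR_ES": {"correct": 11, "incorrect": 20, "unsure": 17},
}


def percent(count, total):
    """Percentage with one decimal place, rounding halves up (assumed)."""
    value = Decimal(count) * 100 / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


percentages = {
    report_type: {
        outcome: percent(n, RATINGS_PER_TYPE) for outcome, n in tallies.items()
    }
    for report_type, tallies in GUESSES.items()
}

# Share of SR_ES ratings in which the native language was not guessed correctly.
sr_es = GUESSES["SR_ES"]
not_correct_sr_es = percent(sr_es["incorrect"] + sr_es["unsure"], RATINGS_PER_TYPE)
```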

Rater’s survey

All raters preferred SRs over FTRs, pointing out completeness, improved readability, and structure as the main advantages. Reduced flexibility of reports and increased length were perceived as potential drawbacks.

Discussion

Multilingual structured templates can help radiologists overcome language barriers

Our study revealed no significant differences between SR_ES (automatically translated from English to German) and SR_GS regarding overall quality, comprehensibility, and clinical consequences. Although the differences were not significant, SR_GS still received a higher percentage of very good and good reports, in addition to higher median ratings for comprehensibility and clinical consequences.

At first glance, these results seem rather surprising: From a technical point of view, the final German report of SR_GS would be identical to that of SR_ES provided the same options were selected, regardless of the input language. Additional information stated in free-text elements is also unlikely to have affected these results, since there were only a few cases where free text entered by the reader had to be translated manually by a native speaker.

Importantly, the differences in quality seem to predominantly concern chest x-ray reports. Nine out of 11 SR_ES with a barely acceptable or unacceptable rating were reports on chest x-ray exams. When asked what the main reason for the given grade was, reviewing clinicians most often criticized lack of pertinent information. This reason was given eight times for SR_ES and three times for SR_GS. Therefore, it seems that the ES specified fewer items, leading to less detailed reports. A possible explanation could be differences in reporting between medical centers, which a number of studies have found to be extensive [37, 40, 41]. From our personal experience, German clinicians may also expect rather detailed chest x-ray reports, whereas in surveys in other EU countries, almost 50% of clinicians thought that “no abnormal findings” could suffice as a chest x-ray report [42]. Our results on chest x-ray reports support findings by Johnson et al that in some cases, structured reporting does not necessarily lead to higher report quality [43]. Thus, it seems advisable to test each template before implementation, and the individual preferences of referring clinicians and reporting radiologists need to be taken into consideration, especially in international research collaborations.

Irrespective of these considerations, for all three parameters, the SR_ES received ratings at least equal to those of FTR_GS, which are still used in the clinical workflow of most radiology departments [44, 45]. In addition, in most cases (77.1%), the referring physicians could not correctly guess the native language of the SR_ES, highlighting the effectiveness of the automatic translation tool. Since the utilized structured reporting software does not machine-translate free text but instead adds predefined translations for each text segment to the report, translation errors or awkward wording can be avoided reliably. However, fully structured reporting may not always be feasible (e.g., due to specific conclusions for the individual patient such as atypical findings [46]). To account for this need, combining SR with a couple of free-text sentences at the end, as a conclusion or impression, may prove beneficial for successful clinical implementation. Therefore, the templates used in this study included elements allowing for free-text entries when deemed necessary by the reporting radiologist.

Structured reporting leads to improved report quality and comprehensibility

SR_GS had a significantly higher overall quality and significantly improved comprehensibility compared to FTR_GS. All raters stated SRs as their preferred report type. These observations are in line with findings from previous publications demonstrating the advantages of SRs compared to FTRs with regard to completeness, clarity, overall quality, and satisfaction by referring physicians [14, 16,17,18,19,20,21,22,23]. Importantly, these qualities have been confirmed in highly standardized examinations (e.g., videofluoroscopic exams [14] and cranial MRI scans for multiple sclerosis [47]) as well as in exams with a high grade of complexity (e.g., staging of rectal cancer [48] and hepatocellular carcinoma [49]).

At the same time, potential disadvantages of structured reporting need to be taken into consideration. For instance, radiologists might spend less time looking at the exam (eye-dwell) and more time looking at the template, which might lead to missed findings [50]. Another common concern is that the time spent creating structured reporting templates, adapting to them, and adhering to their rigid structure may lead to inefficiency [51].

Limitations

The findings of this study need to be viewed in the light of several limitations.

First, since the study had a retrospective design and reports were created in a study setting, the quality of SRs still needs to be validated in clinical routine.

Next, the assigned ratings may to some extent also reflect inter-individual differences due to the small number of readers and raters (four each). The overall quality as assessed by the raters depended both on the level of detail specified by the readers and on the personal preference of the raters. Additional prospective multi-center studies including a larger number of reporting subjects and raters may provide a more balanced view.

Also, the multilingual feature of the structured templates was only available for English and German, two closely related languages from the same language family with similar linguistic features. Therefore, the findings of this study need to be tested for additional combinations of languages, particularly ones with more extensive differences in grammar (e.g., English and Chinese).

Finally, the translation of free-text fields poses an essential challenge. Even in an SR, free-text fields are indispensable as not every potential finding can be covered by predefined elements. Although free-text fields played a minor role in the present work, the general necessity of free-text fields is a critical limitation of multilingual templates regarding their implementation in actual clinical practice. In recent years, machine translation has emerged as a viable tool to overcome language barriers in the field of healthcare [52,53,54] and could potentially be used to translate free-text fields in SRs. Numerous machine translators are already publicly available, including commercial ones (e.g., IBM Websphere Translation Server) and free, web-based ones (e.g., Bing Microsoft Translator, Google Translate). To date, there is only very limited evidence on the translation accuracy of these applications for medical purposes. As a high degree of accuracy is crucial in a clinical setting, those tools warrant further evaluation before they can be used as part of the clinical workflow.

Conclusion

Altogether, these findings have important implications for the international radiological community. Due to migration and globalization, there is an increasing need for radiologists to be able to create reports in different languages in order to improve communication with international patients and colleagues. Multilingual templates could serve as an effective tool for multidisciplinary conferences, international specialist consultation, and scientific exchange in the increasingly globalized radiology community. Furthermore, the importance of telemedicine is growing and transnational teleradiology networks among countries with different languages could be facilitated, thus providing more cost-effective out-of-hour radiology services and broader coverage.