Introduction

Communication of diagnostic radiology results is heavily, although not exclusively, reliant on the written radiology report [1]. As such, it is not surprising that scholarship regarding radiology reports has existed for decades [2–5]. Following the widespread implementation of speech recognition beginning in the late 1990s, structured reporting templates gained supporters, as the clarity and comprehensiveness implied by their itemized format were preferred by both radiologists and referring physicians [6–9]. Speech recognition has myriad benefits; however, 4.8–22 % of reports created via speech recognition may have errors [10–13], and 1.9 % of these errors may alter report interpretation [13]. When every error is counted, no matter how minor, up to 60 % of reports may contain speech recognition errors [14]. Although few existing studies correlate use of structured reports with quality [6, 14], usage of pre-created templates should decrease speech recognition time and thus may reduce the error rate. The written radiology report also serves as a billing document, where the comprehensive and standardized nature of structured reports can add value.

In the emergency department, as well as in other settings, clinicians prefer itemized structured reports [8, 15, 16], presumably because of their readability and clarity. For these reasons, our academic emergency radiology practice transitioned to itemized structured reporting approximately 3 years prior to this study, and all radiologists were encouraged to use pre-created reporting templates. Previous work in subspecialty divisions has shown that (with incentives) radiologist compliance with structured reports can approach 100 % [16]. This study was undertaken to examine template compliance in our emergency radiology division, the frequency with which an itemized radiology report was substantially altered, and the effect that structured reporting template use had on radiologist-specific parameters such as audio dictation time, report length, and total radiologist study time. We hypothesized that template-driven reports would have shorter audio durations, leading to a decrease in speech recognition errors, and decreased overall radiologist study time, resulting in increased radiologist efficiency.

Materials and methods

This retrospective study was approved by our institutional review board. Data was acquired from our departmental Powerscribe 360 (Nuance Communications, Burlington, MA) database. All examinations were ordered from the Emergency Departments (ED) of three university-affiliated hospitals. Interpretations were provided by a dedicated 24-7-365 Emergency Radiology Division consisting of ten fellowship-trained attending radiologists during the study period.

Data collection

For this analysis, seven common imaging studies performed in the Emergency Department were selected: computed tomography (CT) of the chest with intravenous (IV) contrast, CT head without IV contrast, CT abdomen and pelvis with IV contrast, CT abdomen and pelvis without IV contrast, one-view chest radiographs, two-view (posteroanterior and lateral) chest radiographs, and right upper quadrant ultrasound. Consecutive ED occurrences of these examinations during a 2-month period (July 1, 2014 through August 31, 2014) identified 3449 diagnostic radiology reports. Corresponding patient and examination data were extracted into a database.

Using the Powerscribe 360 database for each examination, we documented the individual audio dictation time, dictated words, total words, and the total time the radiologist spent on a study, from the time the report was created until it was completed and signed. We excluded examinations with resident involvement or when more than one study was linked and dictated in a single radiology report.

Template analysis

For every radiology report, we recorded whether a basic template was used by comparing the final radiology report to base templates in our system. If a basic template was used, we assessed if there were missing elements from that template or if the template was complete.

We used the following guidelines for the template qualifications (a heuristic classification sketch follows the list):

  1. Every standard template in our system has capitalized FINDINGS and IMPRESSION section headings. If a report did not have the findings and impression headings capitalized, then no basic template was used.

  2. Our template reports all contain itemized subheadings under the FINDINGS section. If some of these subheadings were missing, the report was considered a basic template with missing elements.

  3. Even if criterion 1 was fulfilled, if the findings section of the template was entirely erased and composed as free text, then no basic template was considered to have been used.
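The following is a minimal sketch, in Python, of how a classifier following these three guidelines might be implemented. The function name, subheading lists, and exam-type keys are illustrative assumptions, not the actual mechanism used in this study (which compared each final report against the base templates in our system).

```python
# Hypothetical sketch of the template-classification guidelines above.
# Subheading sets and names are assumed examples, not the study's real templates.

TEMPLATE_SUBHEADINGS = {
    "ct_abdomen_pelvis": ["Liver:", "Gallbladder:", "Spleen:", "Pancreas:",
                          "Kidneys:", "Bowel:", "Bones:"],  # assumed example set
}

def classify_report(report_text: str, exam_type: str) -> str:
    """Return 'no_template', 'template_missing_elements', or 'template_complete'."""
    # Criterion 1: capitalized FINDINGS and IMPRESSION section headings must be present.
    if "FINDINGS" not in report_text or "IMPRESSION" not in report_text:
        return "no_template"

    subheadings = TEMPLATE_SUBHEADINGS.get(exam_type, [])
    present = [h for h in subheadings if h in report_text]

    # Criterion 3: findings section erased and free-texted -> no subheadings survive.
    if subheadings and not present:
        return "no_template"

    # Criterion 2: some, but not all, itemized subheadings retained.
    if len(present) < len(subheadings):
        return "template_missing_elements"

    return "template_complete"
```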

Statistical methods

For numeric covariates, the mean and standard deviation of the outcomes were calculated and presented. For categorical variables, frequency and percentage were calculated and presented. Fisher’s exact test was employed to test for associations between providers and categorical report characteristics. For univariate analysis of study type and template usage, ANOVA, the Chi-square test, and the Kruskal-Wallis test were employed based on the characteristics of the data set, as detailed in the results section. The significance level was set at 0.05. SAS 9.4 was used for data analysis and management.
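As a hedged illustration only, the univariate tests described above could be reproduced in Python with pandas and scipy (the study itself used SAS 9.4); the file name and column names (study_type, template_used, audio_sec) are assumptions for the sketch.

```python
# Illustrative re-creation of the univariate analysis; not the study's actual SAS code.
import pandas as pd
from scipy import stats

reports = pd.read_csv("reports.csv")  # hypothetical export of the report-level data

# ANOVA (parametric) and Kruskal-Wallis (non-parametric) for a numeric covariate
# compared across study types.
groups = [g["audio_sec"].values for _, g in reports.groupby("study_type")]
f_stat, p_anova = stats.f_oneway(*groups)
h_stat, p_kw = stats.kruskal(*groups)

# Chi-square test for the categorical template-use variable versus study type.
table = pd.crosstab(reports["study_type"], reports["template_used"])
chi2, p_chi2, dof, _ = stats.chi2_contingency(table)

print(f"ANOVA p={p_anova:.4f}, Kruskal-Wallis p={p_kw:.4f}, Chi-square p={p_chi2:.4f}")
```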

Results

Among the 3449 cases, 32 cases were excluded from analysis due to missing data. In 81.2 % (n = 2772) of all cases, a basic template was used. In 2.8 % (n = 78) of these cases with template usage, the radiologist removed key elements from the structured template. Descriptive statistics of the 3417 reports with complete data are presented in Table 1.

Table 1 Descriptive variables of 3417 radiology reports

Table 2 presents the reports by study type. For each study type, reports are displayed by template usage, as well as audio duration, total words, and total radiologist time. For all covariates, there was a significant association with study type (p < 0.001). That these parameters (audio duration, word usage, and radiologist time) should be associated with study type is not surprising, as more complex cross-sectional imaging examinations take longer to dictate and use more total words than radiographs. That template usage is significantly associated with study type implies that for certain examinations (such as the right upper quadrant ultrasound, with 88.3 % template usage and no missing elements) radiologists find it easier to deploy templates.

Table 2 Analysis of template use by study type

Table 2 Analysis of template use by study type: univariate association with study type and whether or not a template was used. *The parametric p value is calculated by ANOVA for numeric covariates; for the categorical variable of template use (template used, template with missing elements, no template), the p value is calculated by the Chi-square test. **The non-parametric p value is calculated by the Kruskal-Wallis test because of non-homogeneity of variance.

Subsequently, the audio duration, total words, and total radiologist time were analyzed by individual attending to see if these reporting properties varied by attending. These covariates were all significantly associated with the individual attending (p < 0.001). For total words, the mean among attendings varied from 86.2 for the most succinct attending to 170.5 for the most verbose, while mean audio duration ranged from 18 to 71.6 s. Interestingly, the attendings with the longest mean reports were not those with the longest audio durations, indicating that some radiologists either typed a significant amount of additional material, used pre-created macros that populated sentences, or dictated at a much faster words-per-second rate (fitting more words into a shorter audio duration).

Returning to the overall data set, Table 3 examines the impact of template usage across all reports. Template use resulted in a significant decrease in audio duration (p < 0.001) but did not affect the total number of words or the total radiologist time spent per examination.

Table 3 Association of report parameters with template use

Table 3 Association of report parameters with template use: values are displayed as mean ± SD. *The parametric p value is calculated by ANOVA. **The non-parametric p value is calculated by the Kruskal-Wallis test.

Given that template use did not have an impact on total words or total radiologist study time across all reports, the data were grouped by attending and reanalyzed. We sought to determine whether the total word length and time per study for an individual attending were affected by whether or not a template was used. Two of the ten attendings (20 %) had significant differences in total word length when templates were used: one (10 %) had a higher word count with template usage (p < 0.001) and one (10 %) had a lower word count with template usage (p < 0.001). These adjusted p values were calculated by the Satterthwaite test because of non-homogeneity of variance. For total study time, only one attending (10 %) showed a significant difference, with longer total study time when templates were used (p < 0.05).

A regression analysis was performed to determine which variables predicted longer audio duration or longer total radiologist time. The type of study and basic template usage were significant predictors of audio duration (p < 0.001). Total radiologist time per study was not significantly associated with any variable.

Discussion

More than 3 years after implementation, template compliance in our practice is strong, with 81 % use of structured reporting templates. However, our compliance does not approach the 100 % reported by Larson et al., perhaps due to the constant feedback and financial incentives in that organization [16]. This comparison may itself be an argument for radiologist incentivization if 100 % template use is desired. When radiologists deploy templates, they delete major template elements in only 2.8 % of cases. Thus, the readability, completeness, clarity, and billing utility that are the hallmarks of structured itemized reports [8] are unaffected in the majority of cases. Template use significantly decreased audio duration, with a mean dictation time of 32 s with template use and 60 s without, representing a 47 % reduction. In the era of speech recognition, the radiologist has replaced the transcriptionist as report editor, resulting in increased errors of syntax, grammar, and semantics [17, 18]. Given that speech recognition results in errors in up to 9.7 % of reports [13], we hypothesize that decreasing audio duration would proportionally decrease speech recognition errors. Although this is a logical extension of our data, we did not specifically examine whether errors decreased with template use.

Total radiologist time spent on examinations was not significantly associated with template use. To be clear, the use of speech recognition software (SRS) improves report turnaround time [19, 20]; we show here that, within the context of SRS, template deployment does not further improve radiologist time per report. It stands to reason that, in the same overall time per study, the non-template-using radiologist is performing different tasks, given the marked difference in audio duration between these two groups. Specifically, template users dictate for 32 s out of 348 s, meaning they are dictating for 9.2 % of the time they have a study open; for non-template users, this fraction is higher at 16.8 %. How does this difference in activity impact study interpretation and error rate? Our study does not answer this, but it raises interesting questions about the impact of multi-tasking on the diagnostic radiologist. The authors note that all users in our system were familiar with our structured templates, which had been in place for just over 3 years, and these numbers likely represent a plateau for our group. When individual radiologists were examined, one radiologist (10 % of the total pool) showed a statistically significant increase in total study time with template use. This amounted to a mean increase of 60 s when a template was used; examination of more granular data did not readily explain why this was the case for this individual, and the reason remains unclear to the authors.
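For clarity, the dictation fractions quoted above work out as follows; the approximately 357 s denominator for non-template users is inferred from the reported 16.8 % and is not stated directly in our tables:

\[
\frac{32\ \text{s}}{348\ \text{s}} \approx 9.2\,\%,
\qquad
\frac{60\ \text{s}}{\sim 357\ \text{s}} \approx 16.8\,\%
\]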

Total report word length was not significantly associated with template use. Yet, when we examined radiologists on an individual level to see how templating affected their dictation length, 10 % (1/10) of radiologists had significantly longer dictations with templates (p < 0.001), 10 % (1/10) had significantly shorter dictations (p < 0.001), and the remaining 80 % showed no effect. However, we note that in our study the internal contents of the unstructured, non-template reports were not examined for completeness. Additionally, there may be some degree of selection bias on the part of the radiologist: the decision to deploy or not deploy a template may be based on study complexity. We know the structured itemized reports are comprehensive; it could be that the unstructured reports, while the same overall length, do not contain all necessary elements in all cases. Notably, the ultimate in standardized language is point-and-click structured reporting, which has not been shown to improve report accuracy or completeness [21, 22].

There is significant individual variability among attendings in all parameters [23]: audio duration, total report length, and total radiologist study time (all p < 0.001). Many radiology groups, including ours, report turnaround time metrics to individual radiologists in a quality effort to track and improve efficient and timely report availability. Our analysis shows cumulative radiologist time per study to be an overall basket of many reporting tasks. As a specific example, individual radiologist time per study ranged from 86 to 701 s on average. The radiologist with the longest time per study also had the highest mean total words at 170, versus a low of 86 among all radiologists. However, a separate radiologist had the longest audio duration. The radiologist with the shortest time per study was not the radiologist with the shortest reports or the shortest audio duration. This type of in-depth, individualized reporting analysis could help coach both radiologists and trainees to improve speed through efficiency in dictation and report creation by identifying the specific tasks on which they spend more time than their peers.
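As an illustration of such an individualized analysis, the following sketch aggregates report-level data into a per-radiologist profile. It assumes a hypothetical report-level export with columns such as attending, audio_sec, total_words, study_time_sec, and template_used; it is not the analysis pipeline used in this study.

```python
# Hypothetical per-radiologist reporting profile; column names are assumed.
import pandas as pd

reports = pd.read_csv("reports.csv")  # one row per signed report (hypothetical export)

profile = (
    reports.groupby("attending")
    .agg(
        mean_audio_sec=("audio_sec", "mean"),
        mean_total_words=("total_words", "mean"),
        mean_study_time_sec=("study_time_sec", "mean"),
        template_rate=("template_used", "mean"),
        n_reports=("audio_sec", "size"),
    )
    .round(1)
)

# Flag attendings whose mean study time exceeds the group mean, as candidates
# for individualized coaching on dictation and report-creation efficiency.
profile["above_group_mean_time"] = (
    profile["mean_study_time_sec"] > profile["mean_study_time_sec"].mean()
)
print(profile.sort_values("mean_study_time_sec", ascending=False))
```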

Limitations

Our study has limitations. First, our data arise from a single academic institution, from radiologists who were familiar with our templates, and apply to an Emergency Department (ED) examination set. Thus, generalization must be done with care. Specifically, the familiarity of our attendings with our templates likely made them easier to use and increased compliance. Although we had a large number of reports, there were only ten radiologists, and individual factors among these radiologists could conceivably affect the data. Finally, our attending radiologists range in age and training background: some attendings used templates in training, while others had significant practice or training exposure to transcription-based reporting or free-text speech recognition use. These effects are difficult to ascertain.

Conclusions

Standard itemized template usage is accepted by radiologists, with greater than 80 % compliance even in an organization without incentives for template use. Template use does not affect radiologist time per study and does not affect report length. However, template use significantly decreases audio dictation time; dictation is the major source of errors in the era of speech recognition software.