Introduction

On a daily basis, imaging plays a central role in medical decision-making. At tertiary referral centers, the ability to render second-opinion interpretations of imaging studies performed elsewhere becomes important. Referring physicians may rely on the expertise of their subspecialty radiology colleagues to provide more detailed re-interpretation, thereby increasing confidence in management decisions. In many institutions, pathology departments offer similar services, where surgical pathology and cytopathology slides are submitted for second-opinion interpretations [1] and therapeutic interventions are delayed until the official in-house review is performed. Despite the fact that discrepancy rates with primary reports are low after second-opinion radiology interpretations, several studies have demonstrated direct benefit in patient care as the second-opinion interpretations are more accurate in cases of discrepancies, particularly in oncologic patients [2,3,4,5,6,7]. Our Musculoskeletal Radiology (MSK) division is part of a tertiary referral center with a high volume of orthopedic oncology patients.

There is wide variety in how academic radiology institutions handle the review of imaging studies performed at other practices [8]. At our institution, outside studies submitted by referring clinicians either can be uploaded and stored in the picture archiving and communication system (PACS) without re-interpretation, or can be uploaded to PACS with an accompanying request for second-opinion interpretation. The outside studies that are not submitted for second-opinion interpretation are stored as reference studies in PACS, in an effort to optimize and centralize patient care data and produce a more-complete electronic medical record (EMR). In our radiology department, more than 20,000 outside studies per year are submitted to the film library to be stored in PACS, and more than 5000 are additionally submitted for second-opinion interpretation. There is substantial work associated with the performance of second-opinion interpretations, including processing of the images by the Image Processing Department (our “film” library) and increased workload for the interpreting radiologists. A volume analysis of second-opinion interpretations in a tertiary cancer center has demonstrated that the increase in daily work can be as high as 18% [9].

Recent changes in health care policy call for cost constraints, added-value analysis in medical procedures/imaging studies, strict and complete documentation, and implementation of institutional relative value units [10]. Analyzing and prospectively tracking the added value of second-opinion consultations is challenging, as large amounts of data are generated via these re-interpretations, with limited known impact on medical care and/or degree of discrepancy with the original primary report. The ability to mine this data using an automated process would be ideal in order to track discrepancies and assess impact on patient care. The purpose of this study was to assess the patient care impact and accuracy of subspecialty MSK radiology interpretation through implementation of an automated quality assurance tool designed to prospectively track discrepancies between the primary and secondary interpretations.

Materials and methods

Institutional Review Board approval was obtained to review patient data as part of a quality improvement/assurance activity, waiving the requirement for informed consent. The study was compliant with the Health Insurance Portability and Accountability Act.

In the MSK radiology division, all the studies submitted for second-opinion interpretation are reviewed by subspecialty-trained musculoskeletal radiologists who exclusively read MSK studies, and whose experience ranges from 1 to 40 years after MSK fellowship training. Starting in July 2013, a standardized structured reporting template called MOI-RADS (Musculoskeletal Outside Interpretation) was created and integrated easily into the existing radiology workflow (Table 1). At the time of second-opinion interpretation, radiologists were instructed to insert the template at the end of each report, with a pick-list to choose the level of concordance or discordance compared with the primary report and the potential effect in clinical management on the basis of their subjective assessment. Staff radiologists, trainee radiologists, and referring physicians were trained in the use and interpretation of the template with lectures and electronic communications. For an initial 1-year period (July 2013 to July 2014), mandatory implementation of the standardized template was put in place, and after this initial period, long-term optional implementation was performed.

Table 1 Scale for indicating level of concordance or discordance with original report

As described in Table 1, all studies were coded based on the level of concordance or discordance with the primary report. Depending on the likelihood to impact patient care, the discordance was subclassified as not clinically relevant (category B) or clinically relevant (category C) and whether the finding was not detected (subcategory d) or detected but incorrectly interpreted (subcategory i). In cases where additional information was available at the time of second-opinion interpretation that was not likely available at the time of primary interpretation, category D was used with subsequent level of concordance and discordance subcategory listing (A, Bi, Bd, Ci, Cd).
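The coding scheme described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `MoiRadsCode`, its field names, and the validation rules are hypothetical and are not part of the reporting template itself; only the category letters (A, B, C, D) and subcategories (i, d) come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MoiRadsCode:
    """Illustrative container for one MOI-RADS assignment (names are assumptions)."""
    level: str                        # "A" concordant; "B" or "C" discordant
    error_type: Optional[str] = None  # "i" interpretation, "d" detection; None for A
    extra_info: bool = False          # True when the category D prefix applies

    def __post_init__(self):
        # Reject combinations that the scheme in the text does not describe.
        if self.level not in ("A", "B", "C"):
            raise ValueError(f"unknown level: {self.level!r}")
        if self.level == "A" and self.error_type is not None:
            raise ValueError("concordant reports carry no error subcategory")
        if self.level != "A" and self.error_type not in ("i", "d"):
            raise ValueError("discordant reports need subcategory 'i' or 'd'")

    @property
    def clinically_relevant(self) -> bool:
        # Category C is the clinically relevant discordance level.
        return self.level == "C"

    def label(self) -> str:
        # Build labels such as "A", "Bi", "Cd", or "D, Cd".
        base = self.level + (self.error_type or "")
        return f"D, {base}" if self.extra_info else base
```

For example, `MoiRadsCode("C", "d", extra_info=True).label()` yields the label "D, Cd", the form used for a clinically relevant missed finding when additional information was available to the second reader.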

The standardized template inserted at the end of each second-opinion radiology report had a unique identifiable code “MOI RADS Category “X”” to enable fast and reliable automated search. Using the mPower® search engine (Nuance®, USA), second-opinion MSK interpretations containing the standardized template between July 2013 and March 2020 were identified. The request rate of second-opinion interpretations by referring providers and discordance rates compared with primary reports were analyzed.
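The search step can be illustrated with a plain regular expression in place of the commercial search engine used in the study. This is a minimal sketch, not the actual query: the exact wording of the template line is an assumption (for example "MOI RADS Category Ci" or "MOI RADS Category D, Cd"), and the function names `extract_code` and `tally_codes` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumed template wording; the real report text may be formatted differently.
CODE_PATTERN = re.compile(r"MOI[- ]RADS\s+Category\s+(D,\s*)?(A|[BC][id])\b")


def extract_code(report_text: str) -> Optional[str]:
    """Return the category label found in a report, or None if no code is present."""
    match = CODE_PATTERN.search(report_text)
    if match is None:
        return None  # template missing or manually altered
    prefix = "D, " if match.group(1) else ""
    return prefix + match.group(2)


def tally_codes(reports: Iterable[str]) -> Counter:
    """Count reports per category; uncoded reports are counted under 'uncoded'."""
    return Counter(extract_code(text) or "uncoded" for text in reports)
```

Counting uncoded reports separately mirrors the situation described in the Results, where some templates were manually changed and no category was assigned.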

To analyze the long-term utilization rate of the standardized template by radiologists, a search in our radiology information system (RIS) Image Processing Department (“film” library) was performed to identify all MSK studies submitted for second-opinion interpretation. The film library processes all the second-opinion interpretation requests, and as part of the institutional protocol, the primary report is stored in the EMR. Before March 2015, in our health system, the processing of outside studies was based on manual import of images from compact discs (CDs), other electronic devices, or by manually digitizing the film. After March 2015, the film library implemented a centralized digital format to upload the studies and process requests for second-opinion interpretation. This new system allowed the centralization of all the images from the health system and the ability to analyze and keep track of all the outside studies submitted to be stored in PACS. Therefore, reliable and complete data regarding the number of MSK studies submitted for second-opinion interpretations is available to be analyzed only from March 2015 to March 2020, during the long-term optional implementation period of the standardized template.

Of the interpretations that demonstrated likely clinically relevant discordance with the primary report (category Ci; category Cd; category D, Ci; and category D, Cd), a sample of 30 studies was randomly selected and the EMR was reviewed to evaluate the impact on patient care and change in medical management. The EMR was reviewed by a senior radiology resident (PGY5) with more than 7 years of experience in MSK research and 10 months of musculoskeletal-concentration training as a PGY5. Clinical impact on patient care and change in medical management was defined as whether the second-opinion report by our subspecialty-trained MSK radiologists resulted in a change in diagnosis that affected the patient’s prognosis, follow-up (tissue sampling, observation, or additional studies), treatment (surgical, chemotherapy, medical, antibiotics), and/or referral to other specialists. In addition, a comprehensive review of the EMR was performed to identify which report was correct in the final diagnosis or best recommendation (primary report vs. second-opinion interpretation vs. both). Final diagnosis was identified based on pathology results, surgical findings, additional imaging, and clinical long-term follow-up. In cases of clinical long-term follow-up, at least 2 years of available clinical or imaging follow-up without growth of the mass was required for the mass to be categorized as benign.

Descriptive statistics were used to analyze the request rate of second-opinion interpretations by referring providers, utilization rate of the standardized template by radiologists, discordance rates compared with primary reports, and impact on patient care. Percentages were used for descriptive statistics, and the chi-squared test for trend in proportions was used to evaluate the utilization rate of the standardized template by the interpreting radiologists. Statistical analyses were performed using IBM SPSS (version 20; IBM SPSS, New York, NY, USA) software.
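A chi-squared test for trend in proportions can be sketched as follows. This is a Cochran-Armitage style formulation written in plain Python, not the SPSS procedure used in the study, and statistical packages may apply slightly different variance conventions. The function name `trend_test` and the equally spaced year scores are assumptions, and the counts in the usage note are made-up illustrative values, not the Table 2 data.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple


def trend_test(successes: Sequence[int], totals: Sequence[int]) -> Tuple[float, float]:
    """Cochran-Armitage style test for a linear trend in proportions.

    Returns (z, two_sided_p); z**2 is the 1-degree-of-freedom chi-squared statistic.
    Groups are scored 0, 1, 2, ... in the order given (for example, by year).
    """
    if len(successes) != len(totals) or len(totals) < 2:
        raise ValueError("need matching sequences with at least two groups")
    scores = range(len(totals))
    n_all = sum(totals)
    p_bar = sum(successes) / n_all  # pooled proportion across all groups

    # Score-weighted deviation of observed from expected successes.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, successes, totals))
    # Variance of t_stat under the null hypothesis of no trend.
    sum_ns = sum(n * s for s, n in zip(scores, totals))
    sum_ns2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_all)
    if variance <= 0:
        raise ValueError("variance is zero; trend cannot be tested")

    z = t_stat / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p_value
```

With hypothetical yearly counts such as `trend_test([90, 80, 60, 40, 20], [100] * 5)`, the returned z is negative, which corresponds to utilization falling over successive years.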

Results

From March 2015 to March 2020, a total of 9500 outside MSK studies were processed by the film library. Of those, 1012 (10.7%) were submitted for second-opinion interpretation and 8488 (89.3%) were stored as reference films in our PACS. During this time, the overall utilization rate of the standardized template by radiologists was 65.0% (658/1012), with a marked decrease in utilization rate over time (p < 0.005); the template was utilized in only 21.6% (53/245) of secondary-interpretation reports during the last year (March 2019 to March 2020) (Table 2).

Table 2 Utilization rate of the standardized template by year

From June 2013 to March 2020, a total of 1052 second-opinion interpretations were identified using the standardized template. The primary report was available in 1037 studies (99.5%), to enable assessment of the level of concordance or discordance. In 14 second-opinion reports (1.4%), the standardized template was manually changed by the interpreting radiologist and a category was not assigned. The services with the most requests for second-opinion interpretation were oncology (n = 351, 33.4%) and orthopedic surgery (n = 255, 24.3%) (Fig. 1). At the time of the second-opinion interpretation request, 647 patients (61.5%) were seen in the outpatient clinic, 61 patients were inpatients (5.8%), 9 patients were seen in the emergency room (0.9%), and 335 (31.8%) had an unknown location. The modality with the most requests for second-opinion interpretation was magnetic resonance imaging (n = 753, 71.6%), followed by radiographs (n = 231, 21.9%), computed tomography (n = 68, 6.5%), and ultrasound (n = 1, 0.1%).

Fig. 1

Services of referring providers requesting second-opinion interpretation

Overall, 67.9% (n = 714) of the examinations demonstrated concordance and 29.4% (n = 309) of the examinations demonstrated discordance between the primary report and the second-opinion interpretation (Fig. 2). There were clinically important differences between the primary report and the second-opinion interpretation in 184 studies (17.5% of the total studies and 59.5% of the discrepancies). As seen in Fig. 3, there were approximately twice as many discrepancies in the interpretation of the abnormalities (n = 217, 70.2% of the 309 discrepancies) as in the detection of the abnormality (n = 92, 29.8% of the 309 discrepancies). There were clinically unimportant changes in 125 studies (11.9% of the total and 40.5% of the 309 discrepancies).
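The percentages in this paragraph follow directly from the stated counts, and the arithmetic can be checked with a short script. This is only a consistency check on the numbers quoted above; the dictionary keys and the helper `pct` are illustrative names.

```python
# Counts taken from the paragraph above.
TOTAL_CODED = 1052
counts = {
    "concordant": 714,
    "discordant": 309,
    "clinically_important": 184,
    "clinically_unimportant": 125,
    "interpretation_error": 217,
    "detection_error": 92,
}


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * numerator / denominator, 1)


# The two ways of splitting the discordant studies should each add up to 309.
assert counts["clinically_important"] + counts["clinically_unimportant"] == counts["discordant"]
assert counts["interpretation_error"] + counts["detection_error"] == counts["discordant"]

share_of_total = {name: pct(n, TOTAL_CODED) for name, n in counts.items()}
share_of_discordant = {
    name: pct(counts[name], counts["discordant"])
    for name in ("clinically_important", "clinically_unimportant",
                 "interpretation_error", "detection_error")
}
```

Here `share_of_total["clinically_important"]` corresponds to the 17.5% figure and `share_of_discordant["clinically_important"]` to the 59.5% figure.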

Fig. 2

Distribution of level of concordance and discordance in comparison to primary report

Fig. 3

Discordant categories. Distribution of studies that were classified as discordant in comparison to primary report

Review of the EMR of the randomly selected sample of discordant cases categorized as likely to be clinically relevant revealed a change in management in 63.3% of the cases (19/30, 95% confidence interval of 43.9–80.1%). After the second-opinion interpretation, the following 3 patterns of changes in medical management were identified: (1) “downgrading” or “de-escalation” in management, with a procedure not being performed as it was deemed unnecessary based on the secondary interpretation; (2) “upgrading” or “escalation” of management, with a procedure being performed for findings not identified or incorrectly interpreted in the primary report; or (3) a non-diagnostic finding in the primary report being changed to a diagnostic finding in the secondary interpretation, changing management from additional sampling to appropriate treatment.

Examples of de-escalation in management included changing biopsy of an indeterminate lesion in the primary report to imaging follow-up for a nonaggressive lesion on the secondary interpretation, as well as changing the recommendation for surgery for a full-thickness tendon tear to physical therapy for a low-grade tendon tear (Fig. 4). This change in medical management was seen in 11 cases (57.9% of the 19 studies with change in medical management; 95% confidence interval of 33.5–79.7%).

Fig. 4

A 24-year-old female presents with a lump in the medial knee. Sagittal fat saturated proton density (a) and T1w (b) images of the knee demonstrated an irregular subcutaneous isointense lesion (arrows). Primary report described the lesion as indeterminate and biopsy was recommended. Images were submitted for second-opinion interpretation and lesion was interpreted as hematoma or fat necrosis (category Ci). Follow-up imaging 3 months later, sagittal fat saturated proton density (c) and T1w (d) images of the knee, demonstrated interval decrease in size (arrowheads) consistent with non-aggressive lesion likely hematoma or fat necrosis

Examples of escalation included recommending surgery or biopsy in suspicious/indeterminate lesions, meniscal tears, tendon tears, or calcaneal coalitions that were not identified or incorrectly interpreted in the primary report (Figs. 5 and 6). This change in medical management was seen in 7 cases (36.8% of the 19 studies with change in medical management; 95% confidence interval of 16.3–61.6%).

Fig. 5

A 31-year-old male presents with knee pain after trauma. Sagittal fat saturated proton density (a) and fat saturated coronal T2W (b) images of the knee. Primary report described the horizontal lateral meniscal tear (arrowhead); however, it failed to detect the full-thickness tear of the posterior cruciate ligament (PCL) (arrow) (category Cd). Full-thickness PCL tear was confirmed on subsequent surgery

Fig. 6

A 26-year-old male with history of pigmented villonodular synovitis (PVNS) status post synovectomy 3 years ago, presents with worsening knee swelling. Lateral knee radiographs performed in 2011 (a) and 2014 (b). Radiograph performed in 2014 was initially interpreted in the primary report as decreased knee effusion and possible distal femoral fracture. Radiograph performed in 2014 was submitted for second-opinion interpretation, which described worsening soft tissue abnormalities in the suprapatellar joint recess (arrow) and popliteal fossa (arrowhead), suggestive of worsening PVNS; MRI was recommended for further characterization (category Ci). No fracture was described. MRI was performed (axial proton density, c) and confirmed worsening nodular synovial thickening consistent with worsening PVNS in the suprapatellar recess and popliteal fossa. No fracture was visualized on MRI

Finally, medical therapy was initiated in a patient with multiple lytic lesions consistent with multiple myeloma rather than the outside recommendation for additional diagnostic sampling for a single lesion identified in the primary report (1 case, 5.3% of the 19 studies with change in medical management; 95% confidence interval of 0.1–26.0%) (Fig. 7).

Fig. 7

A 72-year-old male with monoclonal gammopathy presents with pelvic pain. CT pelvis was initially interpreted in the primary report as single expansile lesion in the left iliac bone with associated pathologic fracture (arrow); however, it failed to detect additional smaller lytic lesions in the right iliac bone (arrowheads). Because the information regarding monoclonal gammopathy was not known by the primary institution (outside report), the second-opinion interpretation categorized this case as category D, Cd and medical management for multiple myeloma was started

Although the discrepancies were potentially clinically relevant, no change in medical management was identified in 11 studies (36.7% of the 30 medical record reviews; 95% confidence interval of 19.9–56.1%); an example is shown in Fig. 8.

Fig. 8

A 66-year-old female with known history of multiple myeloma presents with shoulder pain. Fat saturated proton density coronal image of the shoulder was initially interpreted in the primary report as normal bone marrow signal. Second-opinion interpretation demonstrated patchy abnormal bone marrow signal consistent with bone marrow infiltration changes of multiple myeloma (arrows), category Cd. However, patient had known history of disseminated bone marrow changes in the spine and lower extremities from prior studies and no change in medical management was identified

All 30 cases had final diagnoses or at least 2 years of clinical follow-up in cases of presumably benign masses that were used as reference standard to evaluate which report was correct. Second-opinion interpretation by subspecialized MSK radiologists in our institution was correct in 80.0% of the studies (24/30; 95% confidence interval of 61.4–92.3%), noting that our institution was correct in all 19 second-opinion interpretations that demonstrated change in medical management.
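The confidence intervals reported in this section are for proportions from small samples. The text does not state which interval method was used; the sketch below assumes an exact (Clopper-Pearson) binomial interval, computed in plain Python by bisection. The function names are illustrative.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g: Callable[[float], float], target: float) -> float:
    """Find p in [0, 1] with g(p) = target, for g increasing in p (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    if not 0 <= k <= n or n <= 0:
        raise ValueError("need 0 <= k <= n and n > 0")
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper
```

If the exact method was indeed used, `clopper_pearson(19, 30)` should give bounds close to the 43.9–80.1% interval quoted for the change-in-management rate, and `clopper_pearson(24, 30)` bounds close to the 61.4–92.3% interval quoted for accuracy.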

Discussion

Requests for second-opinion interpretation of outside imaging studies are common in tertiary care referral centers and may either take the form of a “curbside consult” or a formal reinterpretation request. Benefits from providing this service include strengthening the professional relationship and trust between the referring physicians and the radiologists, centralization of patient data with storage of images in the PACS, and reduction in healthcare cost by avoiding unnecessary reimaging of the patient [11]. Nevertheless, the clinical impact of secondary interpretations in patient management has been difficult to evaluate despite the large amount of data generated by requests from referring providers. We have demonstrated that with the implementation of a standardized template coding this data, the discrepancy rate and classification can be easily extracted, facilitating the analysis of patient care impact of second-opinion interpretations.

Our study comprised a large series of challenging studies from a tertiary referral center with high volume of orthopedic oncology patients; nevertheless, our rate of clinically important discrepancies was 1 in every 5–6 studies submitted for official second-opinion interpretation, with a review of a sample of studies demonstrating that the final diagnoses favored the second-opinion consultation in 80% of the cases. Similar findings have been described in second-opinion interpretations of MSK studies by subspecialized radiologists [4, 7], where clinically important discrepancies were identified in 1 of 4–4.5 studies, with re-interpretation being correct in 82–93% of the cases. Comparable findings have been demonstrated in second-opinion interpretation of neuroradiology studies by sub-specialized radiologists, where clinically important discrepancies were found in 1 of 13 studies, with final diagnosis favoring second-opinion interpretations in 84% of the cases [2]. Several studies have demonstrated that in tertiary oncologic care centers, there are clinically important benefits on accurate diagnosis, staging, management, and prognostication by primary or second-opinion interpretations of cross-sectional images by subspecialized radiologists [2,3,4, 6, 7, 12].

The rate of discrepancies between radiologists is variable depending on the subspecialty, clinical setting (emergency room, outpatient, oncologic ward, etc.), and the radiologist’s level of training [2,3,4,5,6,7,8,9, 12,13,14], with a wide range from 0.1 to 25%. In a large study conducted by the American College of Radiology (ACR), through the RADPEER program, the overall disagreement rate was 2.9% after the review of more than 20,000 second reviews in 14 facilities [15]. Higher rates of discrepancy are seen in our study, as well as prior studies that have analyzed the discrepancy rate of second-opinion interpretations [2,3,4,5,6, 14]; we hypothesize that this is due to the fact that in second-opinion consultations, radiologists are reviewing a set of images with a higher rate of abnormalities than would be typically encountered in regular practice.

In our study, the majority of the discrepancies were described as a failure of interpretation of the abnormality rather than failure to detect the abnormality. This is consistent with prior studies of second-opinion interpretation in MSK imaging [4, 6, 7] but different in comparison to neuroradiology studies, where failure of detection of the abnormality by the primary interpreter was more frequent than failure of interpretation [2]. In MSK imaging, particularly in cases of bone and soft-tissue lesions, the accurate interpretation of aggressive or non-aggressive imaging features is challenging for less-experienced or general radiologists. Rozenberg et al. [6] demonstrated that in orthopedic oncology patients, there is a higher rate of clinically significant discrepancies when studies are initially interpreted by non-MSK radiologists as opposed to subspecialty-trained MSK radiologists, 27.9% versus 9.2%, respectively. In MSK imaging, the distinction between an aggressive and non-aggressive neoplasm is vital to decide the next step in management and the need for diagnostic or therapeutic interventions, such as biopsies, particularly in the setting of “do not touch” lesions.

To the best of our knowledge, there is limited literature regarding implementation of a quality assurance tool in the radiologist workflow to prospectively track and analyze the discrepancy rate of second-opinion interpretations with primary interpretations in MSK imaging. Tracking this data is important for quality assurance purposes, assessing the utility of a second-opinion consultation service, evaluating added-value analysis in current changing healthcare policies, supporting the need for adequate reimbursement, and for educational purposes in training programs and peer learning. Utilization of the standardized template is paramount in this process. As shown in our study, the utilization of the standardized template decreased with time; this could be related to changes in staffing during the study period and non-mandatory implementation of the template. Prior studies have demonstrated high compliance with the utilization of a similar coding standardized template at the time of study interpretation, using mandatory implementation with active monitoring systems and direct email notification to interpreting radiologists in cases of non-compliance [16].

Several limitations are present in this study, including selection bias, since studies that are submitted for second-opinion interpretations are more likely to show abnormal results; in addition, there could be clues for accurate diagnosis by additional laboratory findings, pathology results, or the type of referral service (oncology vs. internal medicine vs. traumatology). To decrease this type of selection bias, the category D (additional information not available by primary report) was added in the standardized template. Only reports that were encoded with the standardized template were analyzed in this study; therefore, the level of concordance or discordance of a large number of second-opinion interpretations is still unknown. Although a limitation, this was needed to evaluate the utilization rate of the standardized template without a mandatory implementation. This study evaluated the effect of second-opinion interpretations on clinical management in the routine clinical environment; therefore, interobserver variability was not measured as code selection was made based on subjective assessment by the interpreting radiologist. As an academic institution, most of the second-opinion interpretation studies are reviewed by a trainee (“second set of eyes”) and by the attending radiologist, which in effect could represent a double read that may not be possible in the primary practice. Lastly, the ability to look at the primary imaging report may have further biased the interpreter towards or away from the primary reader’s conclusion.

In conclusion, implementation of a quality assurance tool embedded in the radiology workflow of second-opinion interpretations can facilitate the analysis of patient care impact by subspecialty musculoskeletal radiologists; however, stricter and mandatory implementation is necessary to maintain sufficient utilization of the tool. Oncologic studies were the most common indication for second-opinion interpretation. Although the original and second interpretations in the majority of cases were in agreement, subspecialty musculoskeletal radiology interpretation was shown to be more accurate than primary interpretations and impacted clinical management in cases of discrepancy.