Introduction

Total shoulder arthroplasty (TSA) was introduced in the 1950s [1]. Since then, the number of patients undergoing TSA has been increasing [2], with more than 50,000 TSAs now estimated to be performed per year [3]. The spatial resolution of computed tomography (CT) and its ability to image calcium provide an option for evaluation of bone cortex, bone-metal interface, and adjacent soft tissues [4]. Therefore, following replacement, it is not uncommon for periprosthetic shoulder pathologies to be evaluated with CT. TSA-related indications for CT include evaluation of periprosthetic fractures and prosthesis loosening and assessment of bone stock [5, 6]; however, artifacts arising from these metallic implants degrade CT images following TSA and potentially limit the evaluation of adjacent tissues [4, 7].

One method to mitigate the artifact associated with metallic implants is to obtain high keV virtual monoenergetic images by using dual-energy CT (DECT) [8, 9]. This method requires knowledge of a history of TSA prior to scanning and image acquisition with a DECT system. Projection-based metal artifact reduction algorithms such as iterative metal artifact reduction (iMAR) are also effective in reducing metal artifacts, improving the image quality and diagnostic value of CT images [10, 11]. iMAR reconstructions can be generated from the CT projection data after a patient has been scanned, so knowledge of the TSA is not required before image acquisition [12,13,14]. Several studies have investigated the utility of iMAR in metal artifact reduction, including a phantom study on metal hip implants [12], a clinical study on instrumented spinal fusion [13], and a clinical study on dental hardware [14], all with encouraging results. In the current study, we sought to determine the utility of iMAR and 130 keV dual-energy virtual monoenergetic images (DE-VMI) to improve bone and soft tissue visualization in CT scans affected by metal artifacts, and by TSA in particular.

Material and methods

Institutional review board approval was obtained for this study, which was in compliance with the Health Insurance Portability and Accountability Act. All patients agreed to the use of existing medical and imaging records for research purposes. Inclusion criteria for this retrospective study were patients 18 years and older with a TSA or a reverse TSA and archived imaging and projection data corresponding to the dual-energy CT protocol described below. Patients with metallic implants other than a shoulder prosthesis were excluded from the study. All patients underwent CT imaging of the shoulder between January 2016 and August 2016. Nineteen unique patients (13 females [68.4%] and 6 males [31.6%]) with a median age of 67 years (45 to 91 years) with 19 TSA (10 standard TSA and 9 reverse TSA) met inclusion criteria. Out of the 19 patients, 12 had unilateral TSA and 7 had bilateral TSA. Only one TSA underwent dual-energy CT imaging in each patient. The type of TSA alloy was cobalt chrome in 14 cases and titanium in the remaining 5 cases.

CT technique

Patients were positioned with the affected arm placed down and the contralateral arm above the head. Shoulder CTs were all performed on a dual-source 192 slice CT scanner (Somatom Force, Siemens Healthcare, Forchheim, Germany) using a dual-energy scanning protocol, which uses a tube voltage pair of 100/150 kV, with an added tin filter on the 150-kV beam. The quality reference mAs were 415 and 208 for the low and high energies, respectively. The nominal volume CT dose index (CTDIvol) was 24.14 mGy, which matches the scanner radiation output (i.e., CTDIvol) for the single-energy CT exam at our institution. Automatic exposure control was on (CAREDose4D, Siemens Healthcare) so the actual CTDIvol for each patient varied according to the patient size. Detector configuration was 192 × 0.6 mm (physical collimation 96 × 0.6 mm with a z-flying focal spot). Helical pitch was 1.0. The maximum field of view of the 150-kV beam was 352 mm.

Image reconstruction

For all patients, four sets of images were reconstructed: (1) original polychromatic kV images (linear blend = 0.5) approximately equivalent to 120 kV from the DECT scan with weighted filtered back projection (wFBP); (2) polychromatic kV images reconstructed with iMAR; (3) dual-energy virtual monoenergetic images (DE-VMI) at 130 keV; (4) DE-VMI at 130 keV reconstructed with iMAR. For each set of reconstructed images, axial, oblique sagittal, and oblique coronal planes (appropriately oriented to the plane of the scapula) were reconstructed using both a sharp kernel (Br64) for visualization of the bone cortex and trabeculae and smooth soft tissue kernels (Br44 and Qr44). Slice thickness was set at 1 mm for axial plane, and 2 mm for oblique axial, oblique sagittal, and oblique coronal planes. Figure 1 illustrates these 4 different sets of images. iMAR was performed as previously described by Kotsenas et al. [13]. Briefly, vendor-provided iMAR software with a predetermined set of reconstruction parameters for each anatomical region was used for metal artifact reduction. First, the software segments the image into pixels affected or unaffected by the metal artifact. Using the affected pixels, the software creates an image of the metal artifact and detects the metal projections. iMAR then uses this information to correct the images. The correction step is repeated multiple times to minimize the effects of the metal artifact on the image.
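The linear blend used for reconstruction (1) can be illustrated as a per-pixel weighted average of the low- and high-kV images. The following is a minimal sketch only, assuming that the blend factor of 0.5 is the weight applied to the low-kV image; the function name and the sample pixel values are hypothetical, and this is not the vendor's implementation.

```python
def linear_blend(low_kv, high_kv, weight=0.5):
    """Per-pixel weighted average of two equally sized images (lists of rows).

    `weight` is the assumed contribution of the low-kV image;
    the high-kV image contributes (1 - weight).
    """
    if len(low_kv) != len(high_kv):
        raise ValueError("images must have the same number of rows")
    return [
        [weight * lo + (1.0 - weight) * hi for lo, hi in zip(row_lo, row_hi)]
        for row_lo, row_hi in zip(low_kv, high_kv)
    ]


# Hypothetical 2 x 2 images in Hounsfield units (illustrative values only).
low = [[100.0, 60.0], [40.0, -20.0]]
high = [[80.0, 40.0], [20.0, -40.0]]
blended = linear_blend(low, high)
```

With a weight of 0.5, each blended pixel is simply the mean of the corresponding low- and high-kV pixels.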

Fig. 1

Illustrative example of four types of reconstructed images evaluated in this study. Axial soft tissue windows in an 80-year-old female with right reverse total shoulder arthroplasty. a Original polychromatic kV images reconstructed with weighted filtered back projection. b 130 keV dual-energy virtual monoenergetic images (DE-VMI). c Polychromatic kV images reconstructed with iterative metal artifact reduction (iMAR). d 130 keV dual-energy virtual monoenergetic images (DE-VMI) reconstructed with iMAR (DE-VMI + iMAR). Bone windows in the same patient are shown in figure parts e to h. e Original polychromatic kV images reconstructed with weighted filtered back projection. f 130 keV dual-energy virtual monoenergetic images (DE-VMI). g Polychromatic kV images reconstructed with iMAR. h 130 keV DE-VMI reconstructed with iMAR (DE-VMI + iMAR)

Image analysis

Three readers reviewed each image set independently in four reading sessions, with 4 weeks between sessions: two musculoskeletal radiologists with 23 and 24 years of experience interpreting musculoskeletal studies, and a first-year musculoskeletal radiology fellow who had a training session with non-study patients. All readers were blinded to the reconstruction technique. For each patient, only one reconstruction type was reviewed during each reading session, with the different reconstructions randomly assigned for each patient. During a fifth reading session, each reader reviewed all 4 techniques for each patient and submitted a rank order preference.
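The session design described above, in which each patient's four reconstructions are spread in random order over the four reading sessions, can be sketched as follows. This is an illustration only: the randomization procedure actually used is not described, and the function name, labels, and seed are hypothetical.

```python
import random

# Short labels for the four reconstruction types (hypothetical naming).
RECONSTRUCTIONS = ["polychromatic", "polychromatic+iMAR", "DE-VMI", "DE-VMI+iMAR"]


def assign_sessions(patient_ids, seed=0):
    """Return {patient_id: [recon for session 1, ..., recon for session 4]}.

    Each patient gets an independent random permutation, so every
    reconstruction is read exactly once per patient and no session
    shows two reconstructions of the same patient.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    schedule = {}
    for pid in patient_ids:
        order = RECONSTRUCTIONS[:]
        rng.shuffle(order)
        schedule[pid] = order
    return schedule


schedule = assign_sessions(range(1, 20))  # 19 patients, numbered 1 to 19
```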

The following anatomic structures were evaluated for extent of artifact and image quality in soft tissue windows: deltoid muscle and the supraspinatus and subscapularis tendons. The following bony structures were evaluated for extent of artifact and image quality in bone windows: glenoid and proximal humeral cortices at the level of the bone prosthesis interfaces and trabecular bone, and the glenoid and proximal humeral bone-metal interfaces.

Each anatomic structure was subsequently evaluated for worst degree of artifact severity and overall diagnostic image quality. Artifact evaluation was performed using a 5-point Likert scale: 1 = no artifact, 2 = minimal streak artifact that does not affect the structure in question, 3 = mild streak artifact that somewhat affects the structure, 4 = moderate streak artifact that considerably affects the structure, 5 = severe streak artifact that significantly affects the structure. Image quality was also scored from 1 to 5: 1 = fully diagnostic, 2 = diagnostic without impairment, 3 = diagnostic with little impairment, 4 = diagnostic with relevant impairment, 5 = non-diagnostic.

During the fifth reading session, rank order preference was established by asking readers to review all 4 sets of reconstruction side by side and rank their preferred sets of images from 1 (best) to 4 (worst). Ties were allowed for rank order preference, and readers were blinded to reconstruction type.
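One way to tally such rankings for a single case, with ties allowed, is to count how many readers gave each reconstruction their best rank. The sketch below assumes that a tie for a reader's best rank gives one vote to each tied reconstruction; the function names and the example ranks are hypothetical and are not study data.

```python
from collections import Counter


def top_choice_votes(rankings):
    """Count, per reconstruction, how many readers gave it their best rank.

    `rankings` is a list with one dict per reader, mapping
    reconstruction label -> rank (1 = best, 4 = worst; ties allowed).
    """
    votes = Counter()
    for reader_ranks in rankings:
        best = min(reader_ranks.values())
        for recon, rank in reader_ranks.items():
            if rank == best:
                votes[recon] += 1
    return votes


def preferred(rankings):
    """Return the reconstruction(s) with the most top-rank votes."""
    votes = top_choice_votes(rankings)
    most = max(votes.values())
    return sorted(recon for recon, count in votes.items() if count == most)


# Hypothetical ranks from three readers for one case (reader 2 reports a tie).
case = [
    {"poly": 4, "poly+iMAR": 2, "DE-VMI": 3, "DE-VMI+iMAR": 1},
    {"poly": 4, "poly+iMAR": 1, "DE-VMI": 3, "DE-VMI+iMAR": 1},
    {"poly": 4, "poly+iMAR": 2, "DE-VMI": 3, "DE-VMI+iMAR": 1},
]
votes = top_choice_votes(case)
winner = preferred(case)
```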

Furthermore, CT number and noise (defined as the standard deviation [SD] of the CT number within the region of interest) were recorded for each case at 4 different regions of interest (ROI) placed by one of the readers (SB, fellow) after the blinded reading sessions: (1) trabecular bone at a region most affected by artifacts, (2) trabecular bone at a normal-appearing region not affected by artifacts, (3) deltoid muscle at a region most affected by artifacts, (4) deltoid muscle at a normal-appearing region not affected by artifacts. All CT numbers were determined using the Visage Imaging (Visage Imaging Inc., Richmond, Australia) radiology PACS system used routinely in our institute.
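The two ROI measurements reduce to the mean and the standard deviation of the pixel values inside the ROI. A minimal sketch follows, assuming the sample standard deviation is used (the text does not state whether the PACS reports the sample or the population form); the function name and pixel values are hypothetical.

```python
import statistics


def roi_stats(pixels_hu):
    """Return (mean CT number, noise) for the pixel values of one ROI.

    Noise is taken as the sample standard deviation of the pixel values;
    this choice is an assumption, not something stated in the text.
    """
    mean_hu = statistics.mean(pixels_hu)
    noise = statistics.stdev(pixels_hu)
    return mean_hu, noise


# Hypothetical pixel values (Hounsfield units) from a small muscle ROI.
roi = [38.0, 42.0, 40.0, 44.0, 36.0]
mean_hu, noise = roi_stats(roi)
```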

Statistical analysis

All data were analyzed using the IBM SPSS Statistics software package for Windows, version 25 (IBM Corp, Armonk, USA). The Shapiro-Wilk test was used to determine whether continuous variables were normally distributed. Continuous variables are presented as mean ± SD if normally distributed or as median (range) otherwise. The average scores from the 3 readers were calculated for artifact assessment and diagnostic quality. For artifact scores, the average score for all 7 regions of interest was then calculated. For normally distributed variables, repeated-measures ANOVA was used to assess the statistical significance of differences between the four image sets. Mauchly’s test of sphericity was used to test the assumption of sphericity (equal variances of the differences between image sets). Post hoc analysis with Bonferroni correction was then performed to compare each pair of image sets, and a P value of < 0.008 was considered statistically significant. For non-normally distributed variables, the Friedman test for K related samples was used to assess the statistical significance of differences between the four image sets, followed by the Wilcoxon signed-rank test to compare each pair of image sets. To correct for multiple comparisons, Bonferroni correction was again applied and P < 0.008 was considered statistically significant. To identify the set of images that the readers preferred most, we determined the set with the greatest number of top-rank choices across the 3 readers. To evaluate inter-rater reliability, the intraclass correlation coefficient (ICC) was calculated separately for artifact and image quality scores. CT numbers within the bone and soft tissue were compared between areas most affected by artifacts and normal areas not affected by artifacts.
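The significance threshold of 0.008 is consistent with dividing a family-wise alpha of 0.05 by the six possible pairwise comparisons among four image sets. The sketch below reproduces that arithmetic; the alpha of 0.05 is an assumption (it is not stated explicitly in the text), and the labels and function name are hypothetical.

```python
from itertools import combinations

IMAGE_SETS = ["polychromatic", "polychromatic+iMAR", "DE-VMI", "DE-VMI+iMAR"]


def bonferroni_threshold(groups, alpha=0.05):
    """Return all pairwise comparisons and the Bonferroni-corrected threshold.

    alpha=0.05 is an assumed family-wise error rate.
    """
    pairs = list(combinations(groups, 2))
    return pairs, alpha / len(pairs)


pairs, threshold = bonferroni_threshold(IMAGE_SETS)
print(len(pairs), round(threshold, 4))
```

Four image sets give 4 × 3 / 2 = 6 pairs, so the corrected threshold is 0.05 / 6 ≈ 0.0083, which matches the reported cutoff of 0.008.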

Results

Artifact score

Figure 2a shows the artifact scores of the 4 different sets of images for all patients. Figure 3a shows the average and 95% confidence interval (95% CI) of the artifact scores across anatomic sites for each reader. Mean artifact scores were 4.28 ± 0.42 for original polychromatic images, 3.30 ± 0.47 for polychromatic kV with iMAR, 3.97 ± 0.54 for 130 keV DE-VMI alone, and 3.02 ± 0.47 for 130 keV DE-VMI with iMAR. The mean artifact scores differed significantly between the four sets of images (P < 0.001). DE-VMI had less artifact compared with the original polychromatic kV images (P = 0.001). DE-VMI with iMAR produced less artifact than either original polychromatic kV or DE-VMI (P < 0.001 for both).

Fig. 2

Scatter plots for average artifact scores (a), overall image quality scores (b), and preference rank (c) for 4 different sets of images (original polychromatic kV, polychromatic kV with iterative metal artifact reduction [iMAR], 130 keV dual-energy virtual monoenergetic images [DE-VMI], 130 keV DE-VMI with iMAR [DE-VMI + iMAR]) in all patients. Patients are labeled by number along the x-axis.

Fig. 3

Mean and 95% confidence interval of artifact scores (a), overall image quality scores (b), and preference rank (c) for 4 different sets of images (original polychromatic kV, polychromatic kV with iterative metal artifact reduction [iMAR], 130 keV dual-energy virtual monoenergetic images [DE-VMI], 130 keV DE-VMI with iMAR [DE-VMI + iMAR]) as recorded by each reader. Note that, for every reader, original polychromatic kV images performed the worst and 130 keV DE-VMI with iMAR performed the best, on average.

Overall image quality

Figure 2b compares the overall image quality scores for the four types of reconstructed CT images for all patients. Figure 3b shows the average and 95% confidence interval (95% CI) of the overall image quality scores for each reader for each type of reconstruction. The median (range) image quality was 4.33 (3.33 to 5) for original polychromatic kV, 3.33 (2.33 to 4.33) for polychromatic kV with iMAR, 3.67 (3 to 5) for 130 keV DE-VMI, and 3 (2.33 to 4) for 130 keV DE-VMI with iMAR. There was a statistically significant difference between the four image sets with regard to image quality scores (P < 0.001). DE-VMI had a higher image quality compared with original polychromatic kV images (P = 0.005). Adding iMAR to the reconstruction process significantly improved the image quality of both original polychromatic images and DE-VMI (P < 0.001 and P = 0.002, respectively).

Preference order

DE-VMI with iMAR was the most preferred type of reconstructed image and was the top choice of at least 2 out of the 3 readers in 18 out of 19 cases. In one case, there was a tie in readers’ preference between original and monoenergetic images both reconstructed with iMAR. Differences in reader preference across reconstructions were statistically significant (P < 0.001). Figure 2c demonstrates the preference order of the 4 different sets of images for all patients. Figure 3c shows the average preference ranking for each reader.

CT number and noise

The median (range) CT number and noise in the soft tissue and bone ROIs most affected by artifact, and in the normal-tissue ROIs not affected by artifact, are summarized in Tables 1 and 2, respectively.

Table 1 Median (range) of CT numbers (Hounsfield units) in bone and soft tissue regions of interest within the soft tissue or bony anatomic structures of the shoulder most affected by artifact and in an adjacent normal region not affected by artifact
Table 2 Median (range) of noise in bone and soft tissue regions of interest within the soft tissue or bony anatomic structures of the shoulder most affected by artifact and in an adjacent normal region not affected by artifact

In the normal soft tissue regions, there was no statistically significant difference in CT numbers between the 4 different sets of images (P = 0.153). In the soft tissue and bony structures in regions of artifact, CT numbers were significantly higher and noise was significantly lower in the iMAR images compared with those in the original images and in DE-VMI with iMAR compared with those in DE-VMI without iMAR (P ≤ 0.003). A similar trend was observed for monoenergetic images compared with original polychromatic kV images, except for CT numbers in regions of bony artifact.

Inter-rater reliability

ICC, an indicator of inter-rater reliability, was 0.88 (95% CI 0.82 to 0.92) for artifact score, 0.79 (95% CI 0.69 to 0.86) for the overall image quality score, and 0.92 (95% CI 0.88 to 0.95) for readers’ preference order.

Discussion

This study sought to compare several methods of reducing metallic artifact in the presence of TSA. The results suggest that, while DE-VMI and iMAR independently reduce artifact and improve image quality, concurrent use of these techniques results in significantly less visual artifact and improved image quality. Significant quantitative reduction in image noise in regions of artifact substantiates these subjective findings.

The findings of the current study are in line with previous studies on the utility of iMAR in reducing metal artifacts in patients with a variety of metal implants, from dental hardware to hip prostheses. In one study, Subhas et al. showed that, in patients with either shoulder or hip prostheses, subjective measures such as the degree of streak artifacts and the diagnostic quality of the images are better in iMAR images compared with those in original wFBP images [10]. Moreover, they showed that the difference between pre- and post-operative CT attenuation numbers was significantly smaller in iMAR images compared with that in original wFBP images, suggesting that iMAR does not result in substantial CT number distortion. Although preoperative CT numbers were not obtained in the current study, evaluation of areas not obscured by artifact serves as a surrogate for preoperative imaging. As in the study by Subhas et al., the difference in attenuation numbers between areas affected by artifact and those unaffected was smaller in the reconstructions that utilized iMAR.

Similar to our findings, previous studies have shown significant improvement in artifact and image quality scores when DE-VMI reconstructions are compared with original single polychromatic energy kV images reconstructed with wFBP [8, 9, 15]. In one study, Yoo et al. showed that increasing the energy of DE-VMI significantly reduces artifact index, improves image quality, and increases delineation scoring of fracture lines [15] in patients with implants placed for the management of distal radius fractures.

In our study, readers preferred DE-VMI at 130 keV with iMAR to both DE-VMI without iMAR and original polychromatic kV images with iMAR. This is in line with the findings of Bongers et al., who reported the highest degree of artifact reduction in combined monoenergetic and iMAR images in patients with dental implants or hip prostheses [16]. In contrast, in a study on patients with total ankle prosthesis implants by Khodarahmi et al., the highest degree of metal artifact reduction was achieved in polychromatic kV images with iMAR compared with other sets of images including monoenergetic images with iMAR and monoenergetic images without iMAR [8]. There could be two potential reasons behind the discrepancy between the findings of these studies. First, the techniques used in these studies are different. Similar to the current study, Bongers et al. used 130 keV DE-VMI and, in line with our findings, concluded that a combination of monoenergetic images with iMAR reconstruction results in the highest level of artifact reduction. In contrast, Khodarahmi et al. used 150 keV and 190 keV images and concluded that a combination of monoenergetic images with iMAR reconstruction achieves less artifact reduction compared with iMAR alone. Second, the anatomy of the evaluated region of interest could lead to different results for the combination of monoenergetic images with iMAR reconstruction, as these studies were performed on different body parts and prostheses. Khodarahmi et al. reported a decrease in high-density artifacts when they used a combination of monoenergetic images with iMAR reconstruction, but polychromatic kV was better at reducing low-density streaks. It is possible that different types of artifact respond differently to a combination of monoenergetic images with iMAR reconstruction.

The limitations of the current study include the relatively small sample size, which limits subgroup analysis based on the type of prosthesis. Future studies might consider different types of prostheses and constructs, whose geometry and elemental composition may result in varying degrees of artifact. Moreover, the majority of patients enrolled in this study were female. Males on average have wider shoulders, and the iMAR technique in these patients could be limited because the X-ray beam traverses more tissue. Next, the study focused on readers’ perception of the effects of artifacts on images and the overall image quality. While we have examined the role of 130 keV virtual monoenergetic imaging and iMAR in reducing artifacts in these patients, we did not examine the diagnostic benefit of these reconstructions for different diagnostic tasks; this will require further study. Additionally, we used a single monoenergetic reconstruction, 130 keV. Different monoenergetic energy levels will likely have varying effects on different artifacts [15]. Finally, iMAR is vendor-specific. Different vendors provide their own artifact reduction algorithms.

In patients with metal artifacts due to shoulder replacement surgery, the combined use of DE-VMI and iMAR reconstruction significantly improves both subjective and objective indicators of image quality. In patients who are undergoing shoulder CT following TSA, consideration should be given to scheduling the CT exam on a DECT system with metal artifact reduction.