Introduction

Total knee arthroplasty (TKA) is the treatment of choice for patients with end-stage knee osteoarthritis and intractable pain, but various complications may occur after TKA, such as aseptic loosening, infection, osteolysis, and periprosthetic fracture [1, 2]. CT plays an important role in the assessment of TKA and can reveal abnormal changes in the periprosthetic bone that are occult on plain radiographs. However, severe metal artifacts caused by large TKA components obscure surrounding bone and soft tissue, limiting the utility of CT [3].

Over the past several decades, various metal artifact reduction (MAR) methods have been developed to address these artifacts in CT images. Projection-based MAR (PB-MAR) algorithms are the most commonly used form of MAR, and many previous studies have reported effective metal artifact reduction with projection-based methods in various target regions [4,5,6,7]. Projection completion algorithms detect and segment the projection data corrupted by metal prostheses and then replace the corrupted region using various inpainting or interpolation methods [4, 5]. In recent years, virtual monochromatic imaging (VMI) using dual-energy CT (DECT) acquisition has gained increasing recognition. DECT can be an effective solution for artifact reduction because it can synthesize virtual monoenergetic images from two separate polychromatic spectra by linear energetic extrapolation [6,7,8]. VMI can effectively reduce beam hardening artifacts, which result from the polychromatic nature of the X-ray beam used in conventional CT [5].
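The projection completion step described above can be illustrated on a single projection profile (one view angle). The sketch below is the classic linear-interpolation form of projection completion, not O-MAR or any other vendor algorithm, which are more elaborate. It assumes the metal trace (which detector samples pass through metal) is already known as a boolean mask; in practice such a trace is usually obtained by segmenting metal in an initial reconstruction and forward-projecting it. The function name `interpolate_metal_trace` is hypothetical.

```python
def interpolate_metal_trace(profile, metal_mask):
    """Replace metal-corrupted samples of one projection profile by linear
    interpolation between the nearest uncorrupted neighbours.

    profile:    list of projection values (line integrals) for one view angle
    metal_mask: list of booleans, True where the ray passes through metal
    """
    if len(profile) != len(metal_mask):
        raise ValueError("profile and mask must have the same length")
    out = list(profile)
    n = len(out)
    i = 0
    while i < n:
        if not metal_mask[i]:
            i += 1
            continue
        start = i                      # first corrupted sample in this run
        while i < n and metal_mask[i]:
            i += 1
        end = i                        # first clean sample after the run (or n)
        left = out[start - 1] if start > 0 else None
        right = out[end] if end < n else None
        gap = end - start + 1          # steps from the left anchor to the right anchor
        for j in range(start, end):
            if left is not None and right is not None:
                out[j] = left + (j - start + 1) / gap * (right - left)
            elif left is not None:     # run touches the right edge: hold the last clean value
                out[j] = left
            elif right is not None:    # run touches the left edge: hold the first clean value
                out[j] = right
            else:                      # entire profile is metal: nothing to anchor on
                out[j] = 0.0
    return out
```

For example, `interpolate_metal_trace([1.0, 2.0, 9.0, 9.0, 5.0], [False, False, True, True, False])` replaces the two metal samples with points on the straight line between 2.0 and 5.0 (3.0 and 4.0). Linear interpolation removes the inconsistent metal data but also discards any real anatomy along those rays, which is one reason projection completion can blur or delete structures next to the implant.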

Despite these technological advances, many problems remain in cases with large metal objects and multiple implants, such as hip and knee prostheses [7]. Because each technique has its own advantages and disadvantages, hybrid methods that combine MAR techniques applied at different steps of image acquisition and reconstruction can further reduce metal artifacts [9]. Therefore, this study aimed to evaluate the effect of the combined use of VMI and a PB-MAR algorithm on artifact reduction and image quality in TKA patients.

Materials and methods

Our institutional review board approved this retrospective study with a waiver of informed consent.

Study population

Between March 2018 and October 2018, 164 patients underwent lower extremity CT examinations on a dual-layer spectral detector CT (SDCT) scanner at our institution, and 63 of these CT scans were performed using MAR protocols. Of these, 27 patients who had undergone knee surgery other than TKA were excluded: high tibial osteotomy (n = 13), open reduction and internal fixation (n = 6), or other orthopedic surgeries such as ligament repair (n = 8). Finally, 36 patients with a history of unilateral or bilateral TKA were included in this study. All patients were referred for CT to evaluate postoperative complications or for preoperative planning of revision surgery or of TKA of the contralateral joint. The study population comprised 2 men (mean age, 54.0 ± 2.8 years; range, 52–56 years) and 34 women (mean age, 73.6 ± 7.4 years; range, 53–84 years). Fifteen patients had unilateral TKA (6 right and 9 left) and 21 had bilateral TKA, giving a total of 57 knee joints for analysis.

Image acquisition

All CT examinations were performed with a dual-layer SDCT system (IQon Spectral CT, Philips Healthcare). The scanning parameters were as follows: detector configuration, 64 × 0.625 mm; tube voltage, 120 kVp; tube current, 78–93 reference mAs; pitch, 0.391; rotation time, 0.4 s; matrix, 512 × 512; slice thickness, 2 mm; and reconstruction increment, 2 mm. The reconstruction field of view (FOV) was 320 × 320 mm and covered both knee joints. Conventional CT images were reconstructed using iDose4 (Philips Healthcare) at level 2 with a sharp reconstruction kernel (YC filter). Spectral-based images (SBIs) were reconstructed using a dedicated spectral image reconstruction algorithm at strength level 2 (Spectral, Philips Healthcare). VMIs were then generated from the SBIs at 120 keV, the optimal energy level reported in a previous study [10]. To investigate the combined effect of PB-MAR (O-MAR, Philips Healthcare) and VMI on artifact reduction, four sets of images were reconstructed: (1) conventional polychromatic images without a MAR algorithm (non-MAR), (2) polychromatic images with PB-MAR (PB-MAR), (3) VMI without PB-MAR (VMI), and (4) VMI with PB-MAR (VMI + PB-MAR).

Quantitative analysis

To investigate how the severity of metal artifacts affects the performance of each MAR protocol, metal artifacts were measured at two different levels for each protocol. For each joint, metal artifacts were evaluated on axial images at the femoral epicondyle and tibial tuberosity levels, which represent regions containing large and small cross-sectional areas of metal, that is, the femoral component and the tibial stem, respectively (Fig. 1a). The area of dark streak artifacts was measured using a semiautomated method. First, an outline was drawn along the boundary of the knee using the automatic global thresholding function of ImageJ (ver. 1.52; National Institutes of Health). Where the knee surface was indistinct because of severe streak artifacts, the outline was manually adjusted along the expected contour of the knee surface. Then, a region of interest (ROI) for low-attenuation streak artifacts was created to include all pixels within this outline with attenuation values less than or equal to −200 Hounsfield units (HU) (Fig. 1b). This threshold was chosen by visual inspection, while gradually changing the threshold, as the value that best reflected the low-attenuation streak artifacts. The area, mean attenuation, and standard deviation (SD) of the streak artifacts were calculated within the ROI. To assess the soft tissue contrast of each MAR protocol, two circular ROIs were placed within muscle and adjacent subcutaneous fat, and an additional ROI was placed outside the patient to measure background image noise. All ROIs were approximately 100 mm² in size and were placed in areas with relatively few metal artifacts. The contrast-to-noise ratio (CNR) was calculated as follows: \( \mathrm{CNR}=\frac{\mathrm{mean\ HU}_{\mathrm{muscle}}-\mathrm{mean\ HU}_{\mathrm{fat}}}{\mathrm{SD}_{\mathrm{background}}} \).
Because the attenuation values of muscle and fat in the femoral region were contaminated by severe artifacts, the CNR was not calculated at this level. An artifact index (AI) was calculated as follows: \( \mathrm{AI}=\sqrt{\mathrm{SD}_{\mathrm{artifact}}^{2}-\mathrm{SD}_{\mathrm{background}}^{2}} \), where \( \mathrm{SD}_{\mathrm{artifact}} \) and \( \mathrm{SD}_{\mathrm{background}} \) are the SDs of the streak artifact ROI and the background ROI, respectively.
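The measurements above can be summarized in a short sketch. It assumes the knee outline has already been produced (by ImageJ thresholding plus manual correction in the study) and is supplied as a boolean mask; the function names are hypothetical, and the sample SD (n − 1 denominator) is an assumption about the software's convention. The pixel area is derived from the stated 320 mm FOV on a 512 × 512 matrix (0.625 mm pixels). The example image values are synthetic, not study data.

```python
import math
import statistics

# 320 mm field of view on a 512 x 512 matrix -> 0.625 mm pixel spacing.
PIXEL_MM = 320 / 512
PIXEL_AREA_MM2 = PIXEL_MM * PIXEL_MM      # 0.390625 mm^2 per pixel
STREAK_THRESHOLD_HU = -200

def streak_roi_stats(image, knee_mask, threshold=STREAK_THRESHOLD_HU):
    """Area (mm^2), mean HU and SD of pixels <= threshold inside the knee outline.

    image and knee_mask are equally sized 2D lists (HU values / booleans).
    """
    values = [
        hu
        for img_row, mask_row in zip(image, knee_mask)
        for hu, inside in zip(img_row, mask_row)
        if inside and hu <= threshold
    ]
    if len(values) < 2:
        raise ValueError("need at least two streak pixels to estimate an SD")
    area_mm2 = len(values) * PIXEL_AREA_MM2
    # Sample SD (n - 1) is assumed here.
    return area_mm2, statistics.fmean(values), statistics.stdev(values)

def cnr(mean_muscle_hu, mean_fat_hu, sd_background):
    """CNR = (mean HU muscle - mean HU fat) / SD background."""
    return (mean_muscle_hu - mean_fat_hu) / sd_background

def artifact_index(sd_artifact, sd_background):
    """AI = sqrt(SD_artifact^2 - SD_background^2); math.sqrt raises ValueError
    if SD_artifact < SD_background."""
    return math.sqrt(sd_artifact ** 2 - sd_background ** 2)

# Synthetic 3 x 3 example: the -400 HU pixel lies outside the outline and is ignored.
image = [[-300, -250, 40], [-100, -500, 50], [60, -210, -400]]
mask = [[True, True, True], [True, True, True], [True, True, False]]
area, mean_hu, sd_hu = streak_roi_stats(image, mask)
print(area, mean_hu)                   # 4 streak pixels -> 1.5625 mm^2, -315.0 HU
print(cnr(60.0, -100.0, 20.0))         # 8.0
print(artifact_index(50.0, 30.0))      # 40.0
```

Subtracting the background variance in the AI isolates the noise attributable to artifacts from the noise that is present everywhere in the image.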

Fig. 1

a Metal artifacts were evaluated on axial images at the femoral epicondyle and tibial tuberosity levels, which represent the regions containing large and small cross-sectional areas of metal, respectively. b The area of the dark streak artifact was measured at a region of interest including all pixels with attenuation values less than or equal to the threshold value of − 200 HU within the boundary of the knee surface

Qualitative analysis

Qualitative analysis was performed by two board-certified musculoskeletal radiologists (M.S. and H.D.C., with 1 and 4 years of experience in musculoskeletal radiology, respectively). The same axial image data sets used for the quantitative analysis were reviewed. Images were graded on three criteria: the degree of overall metal artifacts, the depiction of soft tissue, and the depiction of bony structure. We used relative visual grading analysis to detect small differences in image quality between MAR protocols with greater sensitivity [11]. For each comparison, two images were presented simultaneously in random order. The reviewers were blinded to the MAR protocol and were allowed to adjust magnification and window settings as necessary. In total, the relative visual grading study comprised 342 comparisons: 3 pairwise comparisons among the three MAR protocols (PB-MAR, VMI, and VMI + PB-MAR) for each of the 57 knee joints at both the femoral epicondyle and tibial tuberosity levels (3 × 57 × 2 = 342). Each comparison was rated on a 5-point Likert scale: grade −2, certainly worse in the second image; grade −1, probably worse in the second image; grade 0, equivalent; grade +1, probably better in the second image; and grade +2, certainly better in the second image.
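The reading design can be sketched as follows. The text states only that the two images were shown in random order; this sketch assumes an independent coin flip for which protocol appears second in each pair, followed by a shuffle of all trials, with a fixed seed so the reading list is reproducible. Because each grade is relative to the second image, grades from trials shown in reversed order must have their sign flipped before they are pooled for a given protocol pair. All names here are hypothetical.

```python
import itertools
import random

PROTOCOLS = ("PB-MAR", "VMI", "VMI + PB-MAR")
LEVELS = ("femoral epicondyle", "tibial tuberosity")
N_JOINTS = 57

def build_comparisons(n_joints=N_JOINTS, seed=0):
    """One trial per joint x level x protocol pair, in randomized order."""
    rng = random.Random(seed)             # fixed seed: reproducible reading list
    trials = []
    for joint in range(1, n_joints + 1):
        for level in LEVELS:
            for a, b in itertools.combinations(PROTOCOLS, 2):
                # Randomly decide which protocol is shown as the second image.
                first, second = (a, b) if rng.random() < 0.5 else (b, a)
                trials.append({"joint": joint, "level": level,
                               "first": first, "second": second})
    rng.shuffle(trials)                   # randomize the overall reading order
    return trials

def grade_for_pair(trial, grade, reference, other):
    """Re-express a -2..+2 grade (positive = second image better) as a grade of
    `other` relative to `reference` (positive = `other` better)."""
    shown = (trial["first"], trial["second"])
    if shown == (reference, other):
        return grade
    if shown == (other, reference):
        return -grade
    raise ValueError("trial does not compare these two protocols")

trials = build_comparisons()
print(len(trials))                        # 342 = 3 pairs x 57 joints x 2 levels
```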

Statistical analysis

Because more than one joint was obtained from some patients, the observations were not independent, and an ordinary linear model was therefore not appropriate [12]. Instead, to reflect the nested structure of the data, we used linear mixed-effects models to estimate the differences in quantitative and qualitative measurements between MAR protocols. The MAR protocol was entered as a fixed effect, with random patient-specific slopes and random joint-specific intercepts. P values were calculated by likelihood-ratio tests of the full model against a null model without the fixed effect. After fitting the models, post hoc pairwise comparisons between MAR protocols were performed by calculating estimated marginal means with Tukey's adjustment for multiple testing. For the relative visual grading results, we tested whether grade scores differed from grade 0 (equivalence) using a linear mixed-effects model with random joint-specific intercepts and Bonferroni correction for multiple comparisons. The frequency distribution of responses was displayed using stacked bar charts. Cohen's weighted kappa statistics were used to evaluate interobserver agreement on the visual grading scores between the two radiologists.
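The mixed models themselves were fitted in R with lme4 and are not reproduced here. As a minimal, dependency-free sketch, the code below shows the final steps: turning the log-likelihoods of a null and a full model into a likelihood-ratio statistic with a chi-square p value, and applying a Bonferroni correction. It assumes both models were fitted by maximum likelihood rather than REML (a requirement for likelihood-ratio tests of fixed effects) and that the degrees of freedom equal the number of extra fixed-effect coefficients, which is 3 when a four-level protocol factor is added. Tukey-adjusted comparisons from emmeans are not reproduced, and the log-likelihoods in the example are invented.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2               # Q(x; 2) = exp(-x/2)
    else:
        q, k = math.erfc(math.sqrt(half)), 1    # Q(x; 1) = erfc(sqrt(x/2))
    while k < df:
        # Recurrence: Q(x; k+2) = Q(x; k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1)
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q

def likelihood_ratio_test(loglik_null, loglik_full, df):
    """LR statistic 2 * (llf_full - llf_null) and its chi-square p value."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf(stat, df)

def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Invented log-likelihoods: protocol factor with 4 levels adds 3 coefficients.
stat, p = likelihood_ratio_test(loglik_null=-100.0, loglik_full=-90.0, df=3)
print(stat, p)        # 20.0 and a p value well below .001
```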

Statistical analyses were performed using R statistical software (ver. 3.3.3; R Foundation for Statistical Computing), and we used the lme4 [13] and emmeans [14] packages for the linear mixed-effect model and estimated marginal means, respectively.
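The interobserver agreement described above used Cohen's weighted kappa on the −2 to +2 grades. The text does not state the weighting scheme, so the sketch below assumes linear disagreement weights by default and also offers quadratic weights; the function name is hypothetical. Weighted kappa is 1 minus the ratio of the observed to the chance-expected weighted disagreement, where chance is estimated from each rater's marginal distribution.

```python
from collections import Counter

GRADES = (-2, -1, 0, 1, 2)

def weighted_kappa(rater1, rater2, categories=GRADES, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    n, k = len(rater1), len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    pairs = Counter((index[a], index[b]) for a, b in zip(rater1, rater2))
    marg1 = Counter(index[a] for a in rater1)
    marg2 = Counter(index[b] for b in rater2)

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)          # 0 on the diagonal, 1 for the extremes
        return d if weights == "linear" else d * d

    observed = sum(disagreement(i, j) * pairs[(i, j)] / n
                   for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * (marg1[i] / n) * (marg2[j] / n)
                   for i in range(k) for j in range(k))
    # expected is 0 only if both raters used the same single category throughout.
    return 1.0 - observed / expected

print(weighted_kappa([-2, -1, 0, 1, 2], [-2, -1, 0, 1, 2]))   # 1.0 (perfect agreement)
```

With only two categories, the weighted and unweighted kappa coincide, which is a convenient sanity check.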

Results

Quantitative analysis

The results of the quantitative image analysis are summarized in Table 1 and Fig. 2. All quantitative measurements, including the area of streak artifacts, mean attenuation, AI, and CNR, differed significantly among the four reconstruction methods (p < .001). In the femoral region, where streak artifacts were severe, the VMI + PB-MAR protocol showed the best MAR performance in area, mean attenuation, and AI. In the pairwise comparison, the area of dark streak artifacts was smallest with VMI + PB-MAR, with an estimated marginal mean of 985.5 mm², followed by PB-MAR (1691.8 mm²) (p < .001). VMI alone (2001.3 mm²) yielded the smallest decrease in artifact area, a reduction of only approximately 10%. For mean attenuation, VMI + PB-MAR yielded the highest value (−350.4 HU), followed by PB-MAR (−431.1 HU) (p < .001). Again, VMI (−635.4 HU) showed the lowest mean CT number among the three MAR protocols, an improvement of only approximately 13% compared with non-MAR images (−731.6 HU). The AI was lowest with VMI + PB-MAR (164.6 HU) (p < .001), whereas the AI with VMI (300.8 HU) did not differ significantly from that of non-MAR images (298.0 HU) (p = .751).

Table 1 Quantitative measurement of metal artifacts
Fig. 2

Results of quantitative analysis. a The area of streak artifacts was smallest when VMI and PB-MAR were used in combination. b The mean attenuation was highest with the VMI + PB-MAR protocol in the femoral region and with PB-MAR in the tibial region. c The artifact index was lowest with the VMI + PB-MAR protocol in the femur and with PB-MAR in the tibia. d The contrast-to-noise ratio was lower in protocols using VMI (VMI and VMI + PB-MAR) than in protocols without VMI (non-MAR and PB-MAR). VMI, virtual monochromatic imaging; PB-MAR, projection-based metal artifact reduction

In the tibial region, where metal artifacts were less prominent, VMI + PB-MAR (56.3 mm²) also yielded the smallest artifact area (p < .001), an approximately 91% reduction compared with the non-MAR protocol (616.9 mm²). However, the areas with PB-MAR (234.0 mm²) and VMI (293.7 mm²) did not differ significantly (p = .137). The PB-MAR protocol (−267.2 HU) showed the highest mean attenuation (p < .001), whereas the mean attenuation with VMI (−308.7 HU) did not differ significantly from that with VMI + PB-MAR (−286.9 HU) (p = .176). All three MAR protocols significantly reduced the AI (non-MAR, 174.8 HU; PB-MAR, 100.8 HU; VMI, 109.7 HU; VMI + PB-MAR, 103.8 HU; p < .001), with no significant differences among the three MAR protocols (p = .744). Finally, in the tibial region, the protocols with VMI (VMI and VMI + PB-MAR) showed significantly lower CNRs than those without VMI (non-MAR and PB-MAR) (non-MAR, 8.50; PB-MAR, 8.70; VMI, 6.82; VMI + PB-MAR, 6.9; p < .001).
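The approximate percentages quoted above can be checked directly from the estimated marginal means given in the text. This small sketch uses a hypothetical helper that expresses each change as a percentage of the magnitude of the non-MAR value; the femoral non-MAR area is reported only in Table 1, so the approximately 10% femoral area figure is not recomputed here.

```python
def percent_change(reference, value):
    """Change from `reference` to `value` as a percentage of |reference|.

    Negative = decrease (e.g. smaller artifact area); for the mean attenuation
    of dark streaks, positive = values moved toward 0 HU (less severe streaks).
    """
    return 100.0 * (value - reference) / abs(reference)

# Estimated marginal means quoted in the Results text.
tibial_area_mm2 = {"non-MAR": 616.9, "PB-MAR": 234.0, "VMI": 293.7, "VMI + PB-MAR": 56.3}
femoral_mean_hu = {"non-MAR": -731.6, "PB-MAR": -431.1, "VMI": -635.4, "VMI + PB-MAR": -350.4}

for name in ("PB-MAR", "VMI", "VMI + PB-MAR"):
    area_pct = percent_change(tibial_area_mm2["non-MAR"], tibial_area_mm2[name])
    hu_pct = percent_change(femoral_mean_hu["non-MAR"], femoral_mean_hu[name])
    print(f"{name:>13}: tibial area {area_pct:+.1f}%, femoral mean HU {hu_pct:+.1f}%")
```

This gives about −90.9% for the tibial area with VMI + PB-MAR and about +13.1% for the femoral mean attenuation with VMI, matching the approximately 91% and 13% figures in the text.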

Qualitative analysis

Interobserver agreement between the two radiologists on the relative visual grading scores was good for overall artifacts (κ = 0.75), depiction of soft tissue (κ = 0.62), and depiction of cortical bone (κ = 0.61). In the pairwise comparisons of the three MAR protocols, VMI + PB-MAR showed fewer overall artifacts than the other two protocols, and PB-MAR showed fewer overall artifacts than VMI (p < .001). Soft tissue was best depicted with VMI + PB-MAR, followed by PB-MAR, in both the femoral and tibial regions (p < .001). For the depiction of cortical bone in the distal femur, PB-MAR performed best, followed by VMI + PB-MAR (p < .001). However, in the tibial stem region, where metal artifacts were relatively mild, all three MAR protocols depicted cortical bone to a similar extent (Fig. 3). Representative cases are shown in Figs. 4, 5, and 6.

Fig. 3

Results of qualitative analysis. VMI + PB-MAR showed fewer overall artifacts than the other two protocols in both the femoral and tibial regions (b, c). In the femoral region, where metal artifacts were severe, soft tissue was best depicted with the VMI + PB-MAR protocol, followed by PB-MAR, whereas PB-MAR performed best for the depiction of bone structure, followed by VMI + PB-MAR. In the tibial region, where metal artifacts were modest, the choice of MAR protocol had relatively little effect on the depiction of bone structure. The relative visual grading scores are as follows: grade −2, certainly worse in the second image; grade −1, probably worse in the second image; grade 0, equivalent; grade +1, probably better in the second image; and grade +2, certainly better in the second image. VMI, virtual monochromatic imaging; PB-MAR, projection-based metal artifact reduction

Fig. 4

Axial CT images of the right knee of a 76-year-old woman who underwent bilateral total knee arthroplasty, reconstructed with the (a) non-MAR, (b) PB-MAR, (c) VMI, and (d) VMI + PB-MAR protocols. All images are shown in the soft tissue window setting (window width = 400 HU, window level = 30 HU). Without MAR (a), severe metal artifacts from the large components severely limited evaluation, and substantial artifacts remained even when VMI was used alone (c). When VMI and PB-MAR were used in combination (d), metal artifacts were reduced the most. VMI, virtual monochromatic imaging; PB-MAR, projection-based metal artifact reduction

Fig. 5

Axial CT images of a 77-year-old female patient who underwent total knee arthroplasty of the left knee joint, reconstructed with the (a) non-MAR, (b) PB-MAR, (c) VMI, and (d) VMI + PB-MAR protocols. All images are shown in the bone window setting (window width = 2000 HU, window level = 500 HU). When metal artifacts were not severe, all three MAR protocols reduced artifacts sufficiently, but the metal-bone interface was most apparent with the VMI + PB-MAR protocol. VMI, virtual monochromatic imaging; PB-MAR, projection-based metal artifact reduction

Fig. 6

Axial CT images of a 77-year-old female patient who underwent total knee arthroplasty of the left knee joint, reconstructed with the (a) non-MAR, (b) PB-MAR, (c) VMI, and (d) VMI + PB-MAR protocols. All images are shown in the bone window setting (window width = 2000 HU, window level = 500 HU). With severe metal artifacts, PB-MAR performed best for the depiction of bone structure (arrow), followed by the VMI + PB-MAR protocol. VMI, virtual monochromatic imaging; PB-MAR, projection-based metal artifact reduction

Discussion

In our study, the combined use of VMI and PB-MAR reduced metal artifacts significantly more than either technique alone, particularly in areas with severe artifacts. VMI alone showed only modest MAR performance when metal artifacts were severe, whereas all three MAR protocols performed favorably where artifacts were relatively mild. In the qualitative analysis, VMI + PB-MAR was best in terms of overall artifacts and soft tissue evaluation, and PB-MAR performed best in the depiction of cortical bone in the distal femur. However, both VMI alone and VMI + PB-MAR had the disadvantage of a decreased CNR.

Although metal artifacts arise from several mechanisms, the two main causes are beam hardening and photon starvation [15, 16]. DECT primarily overcomes beam hardening artifacts by using virtual high-monoenergetic images extrapolated from two polychromatic spectra [7, 17], whereas projection-based MAR algorithms are effective in reducing photon starvation artifacts [5, 18, 19]. Therefore, combining DECT in image acquisition with a MAR algorithm in postprocessing is expected to reduce artifacts further [7, 9, 18]. In this study, the combined use of VMI and PB-MAR resulted in the smallest area of dark streaks, and in the tibial region, where artifacts were relatively modest, streak artifacts nearly disappeared, with a 91% reduction in area. However, VMI alone performed worse than PB-MAR alone and VMI + PB-MAR in both the quantitative and visual grading analyses, especially in areas with severe artifacts. This result is consistent with previous reports that DECT alone is not sufficient to remove the severe artifacts created by metallic devices of larger size or higher atomic number [19,20,21]. Additional MAR algorithms may therefore be required under severe photon starvation conditions, such as in the postoperative evaluation of hip and knee arthroplasty.

Several previous studies have shown that combining DECT with projection-based MAR algorithms reduces artifacts further than DECT alone. Lee et al found that combining gemstone spectral imaging with metal artifact reduction software (MARS, GE Healthcare) reduced metal artifacts more effectively and improved visualization of the periprosthetic region compared with gemstone spectral imaging without MARS [22]. Cha et al reported that DECT with VMI and MARS reduced dental metallic artifacts, with a decreased AI, in the head and neck region [23]. However, many previous studies merely compared the combination of DECT and MAR software with DECT alone or with no MAR algorithm. Because DECT alone has limited ability to reduce severe artifacts caused by large implants, the incremental gain of combining DECT with projection completion algorithms cannot be fairly evaluated by comparison with DECT alone. Long et al compared MAR performance in spine CT for iterative metal artifact reduction (iMAR, Siemens Healthineers), VMI, and a combination of iMAR and VMI using a second-generation dual-source DECT scanner, and, consistent with our study, the combination of VMI and iMAR showed the best MAR performance in the spine [24]. However, in a phantom study by Long et al [4], hip and knee phantom images reconstructed using the projection-based method alone showed fewer metal artifacts on radiologists' visual assessment than images reconstructed using projection- and DE-based methods in combination. Although phantom results are not directly comparable with clinical findings, this discrepancy indicates that more clinical studies are needed to confirm the utility of this hybrid MAR method in various body parts.

In this study, the DECT-based protocols (VMI and VMI + PB-MAR) showed a lower CNR than the projection-based protocol, and the projection completion method alone was most useful for evaluating bony structures in cases with severe artifacts. The reduction in CNR can be explained by the higher photon energy used for VMI (120 keV) compared with the average energy of the polychromatic beam used in conventional CT at 120 kVp, which is considered equivalent to 77 keV [25]; at higher photon energies, the attenuation difference between muscle and fat decreases, lowering soft tissue contrast. Bone deletion has been reported as a pitfall of PB-MAR in several studies [26,27,28], and in a recent study of patients with total hip replacement, Laukamp et al [29] concluded that PB-MAR is advantageous for the evaluation of soft tissue, whereas VMI is superior for the depiction of periprosthetic bone structure. In our study, however, when VMI + PB-MAR and PB-MAR were compared, the former was better for the evaluation of soft tissue, and PB-MAR alone was more useful for the evaluation of bone structure, especially in cases with severe artifacts. Further studies are needed to confirm the clinical utility of these protocols in diagnosing postoperative complications in bone and soft tissue, respectively.

This investigation had several limitations. First, we measured the area of dark streak artifacts at or below −200 HU to evaluate the extent of metal artifacts. However, metal artifacts also include bright streak artifacts, which also substantially degrade image quality. Several phantom studies have measured the area of metal artifacts using threshold-based segmentation, as in our study [15, 30]. In a phantom study, the high-attenuation region occupied by metal can be selectively excluded using a priori knowledge of the metal shape. In clinical images, however, it is difficult to precisely exclude metal from the high-attenuation region, because the exact size of the metal is unknown and bony structures coexist with artifacts. Further studies using more sophisticated segmentation techniques are needed to accurately evaluate the full extent of both dark and bright streak artifacts. Second, we evaluated only VMI from a DECT system with a dual-layer spectral detector. Comparisons with other DECT systems using dual-source or fast kVp-switching techniques, and ultimately with photon-counting detector CT, are needed.

In conclusion, the combined use of VMI and PB-MAR showed better MAR performance than PB-MAR or VMI alone. In cases with severe metal artifacts, the VMI + PB-MAR and PB-MAR protocols were useful for the evaluation of soft tissue and bone structure, respectively.