Introduction

Periprosthetic joint infection (PJI) remains a challenging complication following total knee arthroplasty (TKA). It is associated with substantial patient morbidity and mortality as well as an increased economic burden on the healthcare system [19, 22], and its burden on patients' health increases with each episode of recurrence [8]. Therefore, preoperative identification of patients at risk of periprosthetic joint reinfection following revision TKA has the potential to assist in managing patient expectations and in preoperative counseling [17].

Prior literature has identified numerous risk factors for failure of the different PJI treatment strategies [1, 18]. Because these retrospective risk factor analyses did not quantify the weight of each risk factor for recurrence of PJI following revision surgery, Klemt et al. developed a preoperative risk calculator for recurrent infection following revision TKA [20]. However, risk calculators are cumbersome to use in clinical practice and have demonstrated limited accuracy compared with novel approaches utilizing artificial intelligence (AI) [15].

Artificial intelligence algorithms, including machine-learning (ML) methods, can analyze large and complex datasets within seconds and with high accuracy [11]. To date, machine-learning algorithms have been used in a variety of non-orthopedic studies to assist clinical decision-making [7, 24]. Recently, machine-learning models have also been successfully applied in orthopedics to predict clinical outcomes and complications, including length of stay and discharge disposition [29]. However, to the best of our knowledge, few studies have attempted to predict PJI following hip and knee arthroplasty. Only one prior study has used machine-learning algorithms to identify PJI in total joint arthroplasty (TJA) patients [28], and it did not attempt to predict recurrent infection, despite the clinical significance of recurrent PJI in terms of increased morbidity and mortality [21]. Therefore, this study aimed to predict recurrent infection in patients following revision TKA for periprosthetic joint infection. The authors hypothesized that artificial neural networks can accurately predict recurrent infection in patients following revision TKA for PJI.

Methods

Patient data

After approval was obtained from the Institutional Review Board (IRB), a retrospective review of 618 revision total knee arthroplasty procedures for periprosthetic joint infection was performed at a tertiary academic center. The cohort included 165 patients with confirmed recurrent periprosthetic knee joint infection. Eradication of PJI following revision TKA was assessed based on postoperative microbiology and the Musculoskeletal Infection Society (MSIS) criteria [26]. All patients had a minimum follow-up of 3 years. Exclusion criteria were (1) prior revision surgery and (2) missing or incomplete data.

The mean follow-up for the cohort was 6.6 ± 2.2 years. Patient demographics and surgical variables for the revision TKA cohort are summarized in Table 1. At revision surgery, 98 infections were caused by Staphylococcus aureus, 84 by Streptococcus species, 72 by mixed growth, 61 by Staphylococcus species and 31 by methicillin-resistant Staphylococcus aureus (MRSA), while 94 infections were culture-negative. Re-revision surgery was performed for culture-negative infection in 26 patients, Staphylococcus aureus in 24 patients, Streptococcus species in 21 patients and mixed growth in 17 patients. Of the 165 patients with recurrent infection, 29 (17.6%) were infected with the same organism that was present at revision TKA for PJI.

Table 1 Patient cohort characteristics

The diagnosis of PJI at revision TKA was based on the Musculoskeletal Infection Society (MSIS) criteria [25]. PJI was considered present if a sinus tract communicating with the TKA was observed or if at least four of the following minor criteria were met: (1) elevated serum inflammatory markers, i.e., erythrocyte sedimentation rate (ESR; ≥ 30 mm/h) or C-reactive protein (CRP; ≥ 10 mg/L); (2) elevated synovial white blood cell (WBC) count (≥ 3000 WBC/μL); (3) elevated synovial neutrophil percentage (PMN%; ≥ 80%); (4) purulence in the joint space; and (5) more than five neutrophils per high-power field (HPF) on histopathologic analysis. Revision TKA was performed as irrigation and debridement without modular component exchange in 23 patients, irrigation and debridement with modular component exchange in 121 patients, single-stage revision in 186 patients and two-stage revision in 288 patients. In agreement with previous studies, the general indications for two-stage revision TKA included the ability to tolerate two separate surgeries and controlled medical comorbidities and systemic conditions [13], as well as poor bone stock and poor soft-tissue condition [14]. The antibiotic treatment protocol for PJI was determined in collaboration with infectious diseases specialists. In all PJI cases, broad-spectrum antibiotic therapy was initiated after intra-operative samples had been taken, and empirical antibiotic therapy was continued when the definitive tissue cultures yielded no pathogen growth.
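The diagnostic rule described above can be expressed as a short decision function. This is a minimal sketch that encodes only the thresholds listed in this section; the names `MsisFindings`, `count_minor_criteria` and `meets_msis_as_described` are hypothetical, the "ESR or CRP" criterion follows the wording used here, and the published MSIS definition contains additional culture-based criteria that are not modeled.

```python
from dataclasses import dataclass


@dataclass
class MsisFindings:
    """Hypothetical container for the clinical findings named in the text."""
    sinus_tract: bool            # sinus tract communicating with the TKA
    esr_mm_h: float              # erythrocyte sedimentation rate (mm/h)
    crp_mg_l: float              # C-reactive protein (mg/L)
    synovial_wbc_per_ul: float   # synovial white blood cell count (cells/μL)
    synovial_pmn_pct: float      # synovial neutrophil percentage (%)
    purulence: bool              # purulence in the joint space
    neutrophils_per_hpf: float   # neutrophils per high-power field on histology


def count_minor_criteria(f: MsisFindings) -> int:
    """Count how many of the five minor criteria listed in the text are met."""
    criteria = [
        f.esr_mm_h >= 30 or f.crp_mg_l >= 10,  # (1) ESR or CRP elevated, as worded here
        f.synovial_wbc_per_ul >= 3000,         # (2) synovial WBC count
        f.synovial_pmn_pct >= 80,              # (3) synovial PMN%
        f.purulence,                           # (4) purulence
        f.neutrophils_per_hpf > 5,             # (5) more than five neutrophils per HPF
    ]
    return sum(criteria)  # True counts as 1, False as 0


def meets_msis_as_described(f: MsisFindings, min_minor: int = 4) -> bool:
    """PJI if a sinus tract is present or at least `min_minor` minor criteria are met."""
    return f.sinus_tract or count_minor_criteria(f) >= min_minor


if __name__ == "__main__":
    # Made-up patient: elevated ESR, WBC count and PMN%, but no purulence and only
    # two neutrophils per HPF, i.e. three minor criteria and no sinus tract.
    example = MsisFindings(False, 45, 5, 3500, 85, False, 2)
    print(count_minor_criteria(example), meets_msis_as_described(example))
```

With three of five minor criteria and no sinus tract, the example patient would not be classified as having PJI under this rule.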

The diagnosis of PJI at re-revision TKA was also based on the MSIS criteria [25]. All patients with recurrent PJI were treated with two-stage revision TKA.

Clinical variables

Data on potential risk factors for recurrent PJI were collected through chart review using our institution's electronic medical record system [23]. These risk factors included patient demographics, medical comorbidities, surgical factors, preoperative laboratory findings and microbiology (Table 2). The surgical factors included the number of surgical interventions prior to revision surgery for PJI and the type of revision surgery performed to treat the PJI at index revision TKA.

Table 2 Potential risk factors for recurrent periprosthetic joint infection

Machine learning model development

For the classification analysis, we employed three state-of-the-art supervised machine-learning methods: (1) artificial neural networks (ANN), (2) stochastic gradient boosting (SGB) and (3) elastic-net penalized logistic regression (ENP). These candidate models were chosen based on prior studies demonstrating their ability to accurately predict functional and clinical outcomes of hip and knee total joint arthroplasty patients [28]. To investigate the ability of the machine-learning models to predict recurrent periprosthetic joint infection, an 80:20 train-test split was used: 80% of the data (494 TKAs) were randomly selected to train the algorithms, and the remaining 20% (124 TKAs) were used for internal validation and testing [10]. The subset of potential risk factors included in the final models was selected using recursive feature elimination with random forest algorithms [9]. Five-fold cross-validation, repeated five times, was performed to assess each algorithm's ability to generalize to previously unseen data. In brief, cross-validation involves randomly partitioning the data into complementary subsets, developing the algorithm on one subset (the training set) and validating it on the other (the testing set). To reduce variability, the procedure is repeated for multiple rounds, and the validation results are averaged over the rounds to generate final estimates of the algorithm's performance [28]. We applied a coarse-grained grid-search algorithm with repeated random sub-sampling to tune each algorithm's hyperparameters during the training phase of each cross-validation round (ANN: number of hidden-layer nodes; SGB: number of trees and boosting parameter; ENP: mixing parameter α (ridge regularization α = 0; lasso regularization α = 1) and regularization penalty λ).
The grid search was constrained to pre-defined lower bounds, upper bounds and step sizes for each hyperparameter.
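The partitioning scheme described above can be sketched in plain Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the random seed is arbitrary, and the ENP bounds and step sizes in the usage example are placeholders because the actual bounds are not reported. Model fitting and scoring are omitted; in practice each hyperparameter combination would be trained on the `train` indices and scored on the `validation` indices of every round.

```python
import random
from itertools import product


def train_test_split(n, test_fraction=0.2, seed=0):
    """Randomly assign n case indices to a training set and a held-out test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def repeated_kfold(indices, k=5, repeats=5, seed=0):
    """Yield (repeat, fold, train, validation); each repeat reshuffles the data."""
    rng = random.Random(seed)
    for r in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]  # k complementary subsets
        for i, held_out in enumerate(folds):
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield r, i, sorted(train), sorted(held_out)


def frange(lower, upper, step):
    """Inclusive grid from lower to upper with a fixed step (rounded to curb float drift)."""
    n_steps = int(round((upper - lower) / step))
    return [round(lower + s * step, 10) for s in range(n_steps + 1)]


def hyperparameter_grid(bounds):
    """Cartesian product of per-hyperparameter grids given as (lower, upper, step)."""
    names = list(bounds)
    axes = [frange(*bounds[name]) for name in names]
    return [dict(zip(names, values)) for values in product(*axes)]


if __name__ == "__main__":
    train_idx, test_idx = train_test_split(618)
    print(len(train_idx), len(test_idx))  # 494 124
    splits = list(repeated_kfold(train_idx))
    print(len(splits))  # 25 rounds (5 folds x 5 repeats)
    # Placeholder ENP bounds, for illustration only.
    enp_grid = hyperparameter_grid({"alpha": (0.0, 1.0, 0.2), "lam": (0.2, 1.0, 0.2)})
    print(len(enp_grid))  # 30 combinations
```

Applied to 618 procedures, a 20% hold-out yields the 494/124 split reported above, and five repeats of five folds give 25 training/validation rounds.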

Model assessment was performed through discrimination, calibration and Brier score [10]. The area under the receiver operating characteristic curve (AUC) was used to assess discrimination. A perfect model has an AUC of 1, while an AUC greater than 0.8 is considered excellent [20]. A calibration plot was used to examine overall predictive performance by plotting observed against predicted risk by decile and fitting a non-parametric LOESS smoother [6]. The calibration intercept, which indicates whether a model overestimates (< 0) or underestimates (> 0) risk, and the calibration slope, which reflects the spread of the estimated risks, were additionally evaluated for each model. A perfectly calibrated model has a calibration intercept of 0 and a calibration slope of 1 [6]. The Brier score, the mean squared difference between predicted probabilities and observed outcomes, was used as a measure of overall performance for each algorithm; a perfect model has a Brier score of 0. The five-fold cross-validation scheme used for model development is shown in Fig. 1.
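The discrimination and overall-performance measures described above can be computed directly from predicted probabilities. The following is a minimal sketch that uses the rank-based (Mann-Whitney) interpretation of the AUC, with ties counted as one half, and the Brier score as a mean squared difference; the function names and toy data are made up for illustration, and the calibration intercept and slope, which require fitting a logistic recalibration model, are not shown.

```python
def auc(y_true, y_prob):
    """AUC as the probability that a random positive case outranks a random negative one."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive correctly ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and observed 0/1 outcomes."""
    if not y_true:
        raise ValueError("Brier score needs at least one case")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


if __name__ == "__main__":
    y = [1, 0, 1, 0, 0]            # observed recurrence (1) or not (0), made-up
    p = [0.8, 0.3, 0.6, 0.6, 0.1]  # predicted probabilities, made-up
    print(round(auc(y, p), 3), round(brier_score(y, p), 3))  # 0.917 0.132
```

The pairwise loop is quadratic in cohort size, which is adequate for a few hundred cases; larger datasets would typically use a rank-based formula instead.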

Fig. 1

Five-fold cross validation

Statistical analysis

A decision curve was created by plotting the net benefit across a range of threshold probabilities, allowing a user to determine which threshold best suits individual and management needs and to assess the predicted net benefit of using the model at that threshold [32]. Interpretability of the revision TKA machine-learning models was assessed at local and global levels. Local explanations were provided for individual patients to show which variables contributed to the model prediction for a given patient. Global explanations were provided through variable importance plots (normalized to 100 points). All statistical analyses were performed using Matlab (MathWorks Inc., Natick, MA, USA), SPSS (SPSS Version 18.0, IBM Corp., Armonk, NY, USA), Anaconda (Anaconda Inc., Austin, TX, USA) and Python (Python Software Foundation, Wilmington, DE, USA).
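Net benefit, the quantity plotted on a decision curve, weighs true positives against false positives using the odds of the threshold probability p_t: NB = TP/n − (FP/n) × p_t / (1 − p_t). The sketch below assumes this standard definition and treats a patient as "positive" when the predicted risk is at or above the threshold; the function names and toy data are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of changing management for patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Default strategy of changing management for all patients."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def decision_curve(y_true, y_prob, thresholds):
    """Rows of (threshold, model, treat-all, treat-none); treat-none is always 0."""
    return [
        (t, net_benefit(y_true, y_prob, t), net_benefit_treat_all(y_true, t), 0.0)
        for t in thresholds
    ]


if __name__ == "__main__":
    y = [1, 0, 1, 0, 0]            # made-up outcomes
    p = [0.8, 0.3, 0.6, 0.6, 0.1]  # made-up predicted risks
    for t, model, treat_all, treat_none in decision_curve(y, p, [0.2, 0.5]):
        print(t, round(model, 3), round(treat_all, 3), treat_none)
```

A model adds value at a given threshold when its curve lies above both the treat-all and treat-none lines, which is the comparison summarized for the neural network in the Results.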

Results

Model hyperparameters were optimized using a coarse-grained grid-search algorithm with repeated random sub-sampling validation. The optimal ANN had two hidden layers with 20 neurons each. The optimal SGB consisted of 120 trees, with the number of predictors for each node set to the default, a boosting learning rate of 0.3 and a sub-sampling coefficient of 0.85. The optimal ENP used a mixing parameter α = 0.4 and a regularization penalty λ = 0.6.
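To make the reported ENP hyperparameters concrete, the penalized logistic objective can be written out. This assumes the glmnet-style parameterization, in which α = 0 gives ridge and α = 1 gives lasso regularization; the article does not state which software implementation or penalty scaling was used.

```latex
\begin{aligned}
&\min_{\beta_0,\,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\,(\beta_0 + x_i^{\top}\beta) - \log\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big] + \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Big] \\
&\alpha = 0.4,\ \lambda = 0.6:\quad 0.6\Big[\tfrac{0.6}{2}\,\lVert\beta\rVert_2^2 + 0.4\,\lVert\beta\rVert_1\Big] = 0.18\,\lVert\beta\rVert_2^2 + 0.24\,\lVert\beta\rVert_1
\end{aligned}
```

Because α > 0, the penalty retains an L1 component that can shrink some coefficients exactly to zero, while the L2 component stabilizes estimates for correlated predictors.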

In the training set, the AUC of the candidate models ranged from 0.79 to 0.85, with the highest AUC for neural networks (Table 3). The calibration intercept ranged from −0.10 to 0.13, with the best intercept for neural networks (0.06; Table 3; Fig. 2). The lowest Brier score was achieved by neural networks (0.052). In the testing set of revision TKA patients, all machine-learning models demonstrated excellent discrimination: the AUC of the three candidate models ranged from 0.81 to 0.84 (Table 4), with the highest AUC for neural networks (0.84). Brier scores in the testing set ranged from 0.053 to 0.056, with the lowest achieved by neural networks (0.053; Table 4).

Table 3 Discrimination and calibration of machine-learning algorithms on the training set for revision TKA patients
Fig. 2

Calibration plot for the neural network model for the prediction of recurrent infections in patients following revision total knee arthroplasty for periprosthetic joint infection

Table 4 Discrimination and calibration of machine-learning algorithms on the testing set for revision TKA patients

The variables determined to be significantly associated with recurrent infection following revision total knee arthroplasty for periprosthetic joint infection were: previous irrigation and debridement with or without modular component exchange during revision surgery, > 4 prior open surgeries, metastatic disease, drug abuse, HIV/AIDS, extremity status 3, presence of Enterococcus species, obesity, renal failure, diabetes, depression, alcohol abuse, male gender, age, smoking, Medicare insurance and presence of methicillin-resistant Staphylococcus aureus (MRSA) (Fig. 3). The strongest predictors were previous irrigation and debridement with or without modular component exchange and > 4 prior open surgeries.

Fig. 3

Global variable importance plot for the prediction of recurrent infections in patients following revision total knee arthroplasty for periprosthetic joint infection

Decision curve analysis showed that all machine-learning models achieved a higher net benefit for the prediction of recurrent PJI than the default strategies of changing management for all patients or for no patients (Fig. 4). An example of a local, patient-level explanation of the neural network predictions is shown in Fig. 5. For a 68-year-old male patient (> 4 prior open surgeries, Medicare insurance, smoker) who underwent irrigation and debridement with modular component exchange for an infected TKA with Enterococcus, the predicted probability of recurrent PJI was 37.3% (Fig. 5).

Fig. 4

Decision curve analysis for revision TKA patients using neural network models showing the net benefit of the neural network model (yellow) relative to the default strategies of changing management for all patients (blue) or for no patients (red)

Fig. 5

Example of individual patient-specific explanation generated by the neural network model for a revision TKA patient. Green bars demonstrate an increase in the probability of recurrent infection, whereas red bars represent a decrease in the probability of recurrent infection

Discussion

The most pertinent finding of the present study was that the machine-learning models demonstrated excellent performance in discrimination, calibration and decision curve analysis for the prediction of recurrent infection following revision total knee arthroplasty for PJI. The strongest predictors of recurrent PJI were previous irrigation and debridement with or without modular component exchange and more than four prior open surgeries.

Prior studies evaluated risk calculators to estimate the probability of PJI following hip and knee arthroplasty [4, 5, 20]. Although risk calculators can assist clinical decision-making by incorporating numerous risk factors to predict the likelihood of postoperative PJI, these prior works reported limited accuracy. Bozic et al. developed a PJI risk calculator for patients undergoing primary TJA using the Medicare claims database and reported an AUC of 0.78 on validation [5]. Bilimoria et al. created a PJI risk calculator using the American College of Surgeons National Surgical Quality Improvement Program database, with an AUC of 0.76 on validation for the prediction of PJI following primary hip and knee total joint arthroplasty [4]. Only a single risk calculator exists for the prediction of recurrent PJI, and it achieved an AUC of 0.75 on validation [20]. Despite the theoretical utility of these risk calculators, their application in clinical practice has been limited to date, mainly due to their limited accuracy and cumbersome nature. In contrast, machine-learning models can analyze large and complex datasets with high accuracy and within seconds by automatically modeling complex, non-linear relationships among numerous patient and surgical variables [11]. The present study represents one of the first attempts to predict recurrent infection following revision total knee arthroplasty for periprosthetic joint infection using multiple machine-learning models. The performance of the three candidate models on internal validation was verified through rigorous evaluation in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement [16].
The study findings demonstrate high accuracy for all candidate models, in particular for the artificial neural network, with an AUC of 0.84. In addition to excellent discrimination, performance was verified with calibration and decision curve analysis. This is essential for clinical utility [30] and was frequently not addressed in prior studies using machine learning to predict clinical outcomes following total joint arthroplasty [12], highlighting a technical strength of the present study.

The findings of the present study demonstrate that surgical variables (previous irrigation and debridement with or without modular component exchange; > 4 prior open surgeries) and microbiology (Enterococcus, MRSA) were strong predictors of recurrent infection following revision TKA for PJI. Similar observations were made in previous retrospective studies that did not use machine learning [20, 31]. In a retrospective study of 199 patients following revision TJA, Shohat et al. demonstrated higher failure rates for patients treated with debridement, antibiotics and implant retention (DAIR) than for patients treated with single- or two-stage revision TJA [27]. With regard to the number of prior open surgeries, previous retrospective studies have shown that numerous prior open surgeries are associated with an increased risk of failure following surgical treatment for PJI [31]. In terms of microbiology, Enterococcal PJI has been identified as a risk factor for treatment failure following surgery for PJI after hip and knee arthroplasty [20, 31]. In addition, the risk calculator for recurrent PJI following revision TJA developed by Klemt et al. highlighted Enterococcal PJI as one of the more significant risk factors for recurrent PJI [20]. In contrast to that risk calculator [20], the current study also identified MRSA as a risk factor for recurrent PJI. This discrepancy may be due to the greater accuracy of machine-learning algorithms compared with logistic regression models, which stems from their ability to identify complex and non-linear relationships among multiple clinical parameters, even in noisy data and data with missing information [11].

In addition to surgical factors and microbiology, the present study identified numerous patient factors strongly associated with a high risk of recurrent infection following revision TKA for PJI. Consistent with previous retrospective studies, metastatic disease, HIV/AIDS and drug abuse were among the strongest predictors of recurrent PJI [20, 31]. Because of this increased risk, the International Consensus Meeting on PJI advised that drug abusers should not be offered TJA [2]. This recommendation is supported by the findings of the present machine-learning study, which identified drug abuse as a significant risk factor for recurrent PJI.

The study findings need to be interpreted in light of their limitations. First, this study used a retrospective design, which is associated with inherent limitations including recall and reporting bias, potentially leading to reduced capture rates [3]. Furthermore, the retrospective design may have introduced management- and treatment-related biases over the course of the study period. Additionally, all patients were from a single large tertiary referral center, which limits the generalizability of the machine-learning models to clinical practice, as patient populations may differ between our institution and, for instance, community hospitals. External validation in independent populations, ideally from multiple patient cohorts across the country, has the potential to increase the clinical applicability of the presented machine-learning models. Second, most of the risk factors were binary, so this study did not evaluate the effect of disease severity; future work could assess the effect of disease severity on the risk of recurrent PJI following revision TKA. Finally, to account for the effect of time between revision and re-revision surgery on reinfection rates, the present study only considered patients with a minimum follow-up of 5 years. However, we acknowledge that a larger percentage of recurrent infections and re-revisions may occur with longer follow-up, so future studies with long-term follow-up may be needed.

In conclusion, this study developed and validated three machine-learning models for the prediction of recurrent infection following revision total knee arthroplasty for periprosthetic joint infection. The strongest predictors were previous irrigation and debridement with or without modular component exchange and prior open surgeries. All machine-learning models showed excellent performance in discrimination, calibration and decision curve analysis, highlighting the potential of these computational tools to quantify the risk of recurrent periprosthetic joint infection and thereby optimize patient outcomes.