Introduction

Despite advancements in total hip arthroplasty (THA) and the increased utilization of tranexamic acid (TXA) [1], acute blood loss anemia necessitating allogeneic blood transfusion persists as a post-operative complication [2]. THA remains a major source of blood transfusion burden [3]. Rates of blood transfusion vary widely by hospital system, largely due to inexact recommendations and a lack of consensus regarding appropriate transfusion thresholds [4]. Nonetheless, the rate of blood transfusion following primary THA has been estimated to be as high as 9% [4].

Propelled by the removal of THA from the Centers for Medicare and Medicaid Services (CMS) inpatient-only list, an increasing percentage of primary THA procedures will be performed at outpatient centers, including ambulatory surgery centers [5]. As opposed to hospitals with large blood banks, outpatient centers are less equipped to deal with significant blood loss necessitating transfusion [6, 7]. While transfusion is an infrequent event in primary THA, previous studies have identified it as a reason why patients fail to be discharged on the day of surgery [8]. Additionally, it represents the most significant predictor of an extended length of stay [9]. Pre-operative identification of patients at high risk for transfusion would allow surgeons to direct these patients to the hospital setting or to coordinate with the blood bank at the ambulatory surgery center to have blood available [6]. To this end, prior works have identified numerous risk factors for needing a blood transfusion following THA [10,11,12]. These studies identified patient and surgical variables that increase a patient’s odds of requiring blood transfusion following THA; nonetheless, they did not evaluate the weight of each of these risk factors with regard to the risk of blood transfusion.

Recently, there has been an increase in the use of machine learning and artificial neural networks as predictive models within hip and knee arthroplasty [13]. Machine learning models are able to effectively analyze non-linear relationships between risk factors in a large dataset and achieve accuracies for risk factor predictions that outperform previously used statistical methods [14]. The promise of accurate and reliable predictive modeling has invaluable potential in helping with risk stratification, pre-operative planning, and optimization [15]. However, there is a paucity of studies on the application of machine learning technology to predict transfusion rates in arthroplasty. Therefore, this study aimed to develop and validate novel machine learning models for the prediction of transfusion rates following primary total hip arthroplasty.

Methods

Database

Upon obtaining approval from the Institutional Review Board (IRB), we performed a retrospective review of 7411 primary total hip arthroplasty procedures at a single tertiary referral center between 2016 and 2019. THA patients with the following criteria were excluded [16]: (1) simultaneous bilateral surgery, (2) partial hip joint arthroplasty, (3) malignancy, (4) incomplete data, and (5) less than 2 years of follow-up. After exclusions, a total of 7265 primary total hip arthroplasty patients remained for evaluation.

Measures

Our main aim was to develop an accurate predictive model for transfusion rates for patients following primary THA. Even so, in selecting predictors, we incorporated consideration of cause-and-effect relationships. Therefore, we performed a variable selection for candidate predictors in stages—patient factors first, followed by surgical variables. In concordance with these classic principles of causal modeling, we retained and in the final model controlled for predictors chosen in the first stage.

Using our institution’s electronic medical record system for patient chart review, patient and surgical variables associated with transfusion rates in THA were identified [2, 17]. Demographic variables included age, gender, body mass index (BMI), ethnicity, marital status, insurance status, American Society of Anesthesiologists Physical Status score (ASA score), medical comorbidities, and Charlson comorbidity index (CCI). With regard to smoking and drinking as medical comorbidities, patients classified as smokers or drinkers were those actively consuming cigarettes or alcohol when admitted to hospital prior to primary THA. Surgical variables included for analysis were laterality, anesthesia type, tranexamic acid usage (1 mg intravenously at beginning of surgery and additional 1 mg at the time of closing), component fixation method (cemented vs non-cemented), surgical approach (anterolateral vs posterior), transfusion rates, and operation time.

Predictive models

For the classification analysis, we employed four state-of-the-art supervised machine learning methods: (1) artificial neural networks (ANN), (2) stochastic gradient boosting (SGB), (3) support vector machines (SVM), and (4) elastic-net penalized logistic regression (ENP). These machine learning methods were selected based on prior work showing their ability to accurately predict clinical outcomes of patients following hip and knee arthroplasty surgery [18]. To investigate the ability of these machine learning methods to predict the aforementioned outcome, we used an 80:20 train-test split: 80% of the data (5812 THAs) were randomly selected and utilized to train the algorithms, and the remaining 20% (1453 THAs) were utilized for internal validation and testing. For training the different machine learning models, fivefold cross-validation was repeated five times to assess each algorithm’s ability to generalize to previously unseen data.
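The data partitioning described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the study's code: the function names `train_test_split_indices` and `repeated_kfold`, the seeds, and the absence of stratification by outcome are all assumptions, since the text does not state how the random selection was implemented.

```python
import random

def train_test_split_indices(n, train_pct=80, seed=0):
    """Shuffle indices 0..n-1 and split them into train and test lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = n * train_pct // 100  # integer arithmetic avoids float rounding
    return idx[:n_train], idx[n_train:]

def repeated_kfold(indices, k=5, repeats=5, seed=0):
    """Yield (fit, validate) index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)  # a fresh shuffle gives different folds each repeat
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            validate = folds[i]
            fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield fit, validate

# 7265 THAs after exclusions, as reported in the Database section
train_idx, test_idx = train_test_split_indices(7265)
splits = list(repeated_kfold(train_idx))
print(len(train_idx), len(test_idx), len(splits))  # 5812 1453 25
```

Cross-validation is run on the training indices only, so the 1453 held-out cases play no part in model fitting or tuning.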

Machine learning candidate model discrimination was assessed using the area under the receiver operating characteristic curve (AUC). Excellent candidate models exceed an AUC of 0.8. A calibration plot was utilized to ascertain machine learning candidate model calibration. The Brier score was used to assess the overall model performance in concordance with prior literature (perfect candidate models have a Brier score of 0). Machine learning model interpretability and explanation were provided at the global and local levels.
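For readers less familiar with these metrics, the following sketch shows how the AUC and the Brier score can be computed from binary outcomes and predicted probabilities. It uses the standard definitions (the AUC as the probability that a randomly chosen transfused patient receives a higher predicted risk than a randomly chosen non-transfused patient, and the Brier score as the mean squared difference between prediction and outcome). The function names and the six-patient toy data are hypothetical and are not taken from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 is perfect)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def auc(y_true, y_prob):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = transfused, 0 = not transfused
y_true = [0, 0, 0, 1, 0, 1]
y_prob = [0.05, 0.10, 0.30, 0.20, 0.15, 0.60]

print(auc(y_true, y_prob))                    # 0.875 (7 of 8 pairs ranked correctly)
print(round(brier_score(y_true, y_prob), 4))  # 0.1542
```

The pairwise AUC loop is quadratic in the number of patients, which is acceptable for illustration; a rank-based formulation would be preferable for larger datasets.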

Statistical analysis

All data analysis was performed using Matlab (MathWorks Inc., Natick, MA, USA), Anaconda (Anaconda Inc., Austin, TX, USA), Python (Python Software Foundation, Wilmington, DE, USA), and SPSS (SPSS Version 18.0, IBM Corp., Armonk, NY, USA).

Results

Overall, a total of 7265 patients underwent primary total hip arthroplasty. Blood transfusions were observed in 703 patients (9%), including 557 patients in the training set and 146 patients in the testing set. Patient demographics and surgical variables are summarized in Table 1. Variables identified for machine learning development through recursive feature selection were tranexamic acid usage (p < 0.001), bleeding disorders (p < 0.001), pre-operative hematocrit (< 33%; p < 0.01), pre-operative platelet count (< 109/L; p < 0.01), anesthesia type (p < 0.01), diabetes (p = 0.01), and gender (p = 0.02; Fig. 1). The strongest predictors for transfusion rates in primary total hip arthroplasty include tranexamic acid usage, bleeding disorders, and pre-operative hematocrit (Fig. 1).

Table 1 Patient cohort characteristics
Fig. 1 Global variable importance plot for the prediction of transfusion rates following primary total hip arthroplasty

Machine learning model performance in the independent testing set resulted in AUC values ranging from 0.78 to 0.82 (Table 2; Table 3). The calibration intercept ranged from 0.10 to 0.26 (Table 2; Table 3; Fig. 2). The Brier score errors in the testing set varied between 0.052 and 0.056. The best model performance was achieved by neural networks with an AUC of 0.82, calibration slope of 0.10, calibration intercept of 1.12, and Brier score error of 0.052 (Table 2; Table 3). The machine learning models provided a prediction for blood loss that was within 15 ml of the actual blood loss observed during primary THA in 97.8% of the cases.
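One way to put the reported Brier scores in context is to compare them with the score of a non-informative model that assigns every patient the observed transfusion prevalence. For a constant prediction equal to the prevalence, the Brier score reduces to prevalence × (1 − prevalence). The sketch below applies this to the testing set counts given above (146 transfusions among 1453 THAs). The helper name `null_brier` is hypothetical, and this comparison is an illustration rather than an analysis reported by the study.

```python
def null_brier(n_events, n_total):
    """Brier score of a model that predicts the observed prevalence for everyone."""
    prevalence = n_events / n_total
    return prevalence * (1 - prevalence)

# Testing set counts from the Results: 146 transfusions among 1453 THAs
reference = null_brier(146, 1453)
print(round(reference, 4))  # 0.0904
```

Under this reference, the reported testing set Brier scores of 0.052 to 0.056 are lower than the score a prevalence-only prediction would obtain.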

Table 2 Discrimination and calibration of machine learning algorithms on training set for THA patients
Table 3 Discrimination and calibration of machine learning algorithms on testing set for THA patients
Fig. 2 Calibration plot for the neural network model for the prediction of transfusion rates following primary total hip arthroplasty

The decision curve analysis demonstrated that all four machine learning candidate models achieved higher net benefits for the prediction of intraoperative blood loss and transfusion rates than the default strategies of changing management for all patients or for no patients. Using the artificial neural network algorithm, local explanations of predictions for transfusion rates in primary THA patients were generated to assess the fidelity of the artificial neural network model (Fig. 3). For a female patient with a history of diabetes and an elevated pre-operative hematocrit (36%) who underwent primary THA without tranexamic acid, the predicted probability of needing a blood transfusion was 21.1%.
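The net benefit compared in a decision curve analysis can be illustrated as follows. At a threshold probability pt, the standard definition is true positives / n minus false positives / n weighted by pt / (1 − pt). The "change management for all patients" strategy flags everyone, and the "no patients" strategy has a net benefit of zero. The function names, the threshold of 0.2, and the ten-patient toy data are assumptions made for illustration only.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of flagging every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical toy data: 1 = transfused, 0 = not transfused
y_true = [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
y_prob = [0.05, 0.10, 0.30, 0.20, 0.15, 0.60, 0.02, 0.08, 0.04, 0.12]

pt = 0.2
model_nb = net_benefit(y_true, y_prob, pt)     # about 0.175 (2 TP, 1 FP)
all_nb = net_benefit_treat_all(y_true, pt)     # about 0 at this threshold
none_nb = 0.0                                  # flagging nobody, by definition
print(model_nb > all_nb and model_nb > none_nb)  # True
```

A decision curve repeats this calculation over a range of thresholds and plots the three strategies against each other.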

Fig. 3 Example of individual patient-specific explanation generated by the neural network model for a primary THA patient. Green bars demonstrate an increase in the probability of blood transfusions, whereas orange bars represent a decrease in the probability of blood transfusions

Discussion

The ability to accurately predict those patients at highest risk for perioperative blood product transfusion has the potential to improve clinical outcomes and efficiency and to reduce cost. In this study, machine learning models were developed and validated to accurately predict the need for blood transfusion following primary total hip arthroplasty. Among the variables included in the model, tranexamic acid usage and a history of a bleeding disorder provided the strongest predictive value. Charges for type and screen and type and cross have been reported to be between $37 and $139.86, with additional charges for cross-matching additional units of blood [17]. In one large academic US-based hospital system, orthopedics was reported to be associated with the highest ratio of type and screen and type and cross tests ordered relative to the number of transfusions administered, thus representing a cost inefficiency [9]. Recent publications have reported that routine blood typing is unnecessary for primary total hip and knee arthroplasty [16]. Recommendations on blood preparation for revision total hip and knee arthroplasty are more limited, with only one study providing such recommendations [17]. Previous efforts to refine ordering for blood products have demonstrated efficacy in reducing costs [2]. The current machine learning models represent an accurate method for predicting patient-specific blood transfusion and thus for improving blood typing and management, which represents a potential cost-saving initiative.

The clinical utility of the machine learning methods is based on their high accuracy (predictions of blood loss are within 15 ml of actual blood loss during primary THA in 97.8% of patients) as well as their ability to provide predictions for blood transfusion in primary THA within seconds [19, 20], which distinguishes machine learning models from prior retrospective studies utilizing conventional statistical methods, including multivariate regression analysis [9, 21]. Additionally, when compared to prior retrospective studies utilizing conventional statistical methods to identify risk factors for blood transfusion in primary THA, machine learning models possess the unique ability to quantify the weight that each risk factor has on the probability of blood transfusion in primary THA [19, 20]. Therefore, machine learning models represent a computational tool that is capable of predicting the need for blood transfusion in primary THA on a patient-by-patient basis, an aspect that cannot be achieved from the research findings presented by prior retrospective studies.

The presented machine learning models incorporated many variables that have previously been identified as risk factors for blood transfusions. Age [22,23,24], pre-operative hematocrit [22, 24], and gender [17, 23, 24] have previously been identified as risk factors for post-operative transfusion following primary total joint arthroplasty. Low pre-operative hematocrit has often been cited as the most readily modifiable risk factor to prevent transfusion [22, 24]. Enrolling patients in dedicated blood management programs to optimize anemia prior to arthroplasty has resulted in reductions in transfusions and post-operative complications [25]. The machine learning models identified tranexamic acid usage and a history of bleeding disorder to be the strongest predictors of post-operative blood transfusion. The widespread use of tranexamic acid has changed blood management strategies and is largely responsible for the sharp decrease in transfusion rates following total hip and knee arthroplasty over the past 20 years [26,27,28,29]. Reductions in blood loss, post-operative anemia, and transfusion rates have also been demonstrated in revision total hip and knee arthroplasty [30, 31]. Many authors have investigated the impact of different dosing regimens and routes of administration, only to find that they are nearly equal in their blood-sparing properties and consistently outperform placebo controls [32,33,34,35].

Despite the agreement on risk factors for blood transfusion in primary THA between the present machine learning study and prior retrospective work [21, 36, 37], there are differences in research findings with regard to the threshold for pre-operative hematocrit. The threshold for low pre-operative hematocrit (< 36%) identified in the present machine learning study is higher than that of previous retrospective studies (< 30%). This may be due to the increased accuracy of machine learning methods in the analysis of large and complex datasets, with machine learning models possessing great strength in the identification of complex and non-linear relationships between numerous clinical parameters [19].

The findings of the current investigation should be interpreted within the context of its limitations and strengths. While the algorithms were developed based on data collected from a sizeable cohort of patients, their predictive capabilities may not be entirely generalizable. External validation of the algorithms using independent populations has the potential to increase clinical applicability. Another limitation inherent to machine learning is the “black box” nature of the algorithms, in which associations between variables are not explicitly known. However, a notable strength of the current machine learning models is their accuracy, despite the relatively limited number of THA patients and thus of blood transfusions administered following THA. The AUC exceeded 0.8 for neural networks, which is the threshold for excellent model performance. Machine learning has demonstrated excellent performance in predicting blood transfusions in the general surgical population and in those undergoing total knee arthroplasty [4, 17]. In addition, although there were no institutional changes in the protocol for blood transfusions over the study period, due to the retrospective nature of this study, it remains unclear how much weight and emphasis each operating surgeon placed on each patient and surgical factor when deciding on blood transfusion during primary THA surgery. However, this represents a common limitation in retrospective studies on this topic [2, 17]. Furthermore, this is a retrospective study and thus is subject to all inherent limitations of retrospective study designs, including reporting and recall bias. Finally, most of the risk factors were binary, and thus, this study did not evaluate the effect of disease severity. However, this represents a common limitation of machine learning studies [38, 39].

In summary, the current study developed and validated machine learning models to accurately predict patient-specific blood transfusion following primary THA. The results represent a novel application of machine learning and have the potential to improve outcomes and pre-operative planning while also reducing costs. Further investigations are needed to validate these models and expand their generalizability.