Introduction

With the continued growth in utilization of total knee arthroplasty (TKA), healthcare workers and administrators have begun to scrutinize operating room (OR) efficiency in this procedure [1,2,3]. Allocation of OR time and appropriate scheduling of case length are closely associated with OR efficiency [4], which is vital for delivering efficient and cost-effective care to arthroplasty patients. Because primary TKA tends to be a relatively predictable and reproducible procedure, accurate prediction of its surgical operative time may be an ideal starting point for analyzing and improving OR efficiency.

Historically, individual surgical operative time has been estimated preoperatively either by the surgeon or by an electronic medical record (EMR)-based system that averages previous case durations. However, prior studies have shown that both surgeon estimates and EMR scheduling systems predict surgical operative time inaccurately because of variation in preoperative data [5, 6]. Surgical operative time is determined by multiple perioperative factors, which may exceed the capabilities of these estimation models. Estimating surgical operative time with machine learning (ML) algorithms may circumvent these limitations. Recently, a number of studies have demonstrated the feasibility of using ML models to improve OR planning and efficiency by accurately predicting case duration in non-arthroplasty patients [7]. Therefore, this study aimed to develop and validate machine learning models to predict surgical operative time for patients undergoing primary total knee arthroplasty.

Materials and methods

Patient cohort

With Institutional Review Board (IRB) approval, we identified 10,089 consecutive primary TKA procedures performed at a single tertiary institution. Patients with missing perioperative data, simultaneous bilateral TKA, or less than 2 years of follow-up were excluded, leaving 10,021 primary TKA patients for the development and validation of the machine learning algorithms. Surgical operative time was defined as the time from first incision to completion of wound closure; this interval was selected because of its medical and economic importance, in concordance with prior literature [8]. Surgical operative time did not include anesthesia time or OR turnover time between cases.

Variables

Electronic medical records were manually reviewed for patient and procedural variables associated with prolonged surgical operative time [9, 10]. Collected patient data included: (1) age, (2) gender, (3) body mass index (BMI), (4) ethnicity, (5) American Society of Anesthesiologists Physical Status score (ASA score), (6) medical comorbidities, and (7) Charlson comorbidity index (CCI; Table 1). Procedural variables included: (1) indication for primary TKA (post-traumatic vs primary osteoarthritis), (2) anesthesia type, (3) tranexamic acid usage, (4) component fixation method (cemented vs non-cemented), (5) tourniquet use, (6) tourniquet time and pressure, (7) implant type, and (8) prior knee surgeries.

Table 1 Baseline characteristics of study population

Model development

We developed three supervised machine learning algorithms in concordance with prior literature [11,12,13]: (1) artificial neural networks (ANN), (2) random forests (RF), and (3) k-nearest neighbors (KNN). The TKA dataset shown in Table 1 was randomly divided using an 80:20 split ratio [14, 15] into a training dataset (8,016 TKAs) and a testing dataset for evaluating model performance (2,005 TKAs). Recursive feature elimination, a popular technique for selecting the features most relevant to predicting the target variable, was used to determine the patient and surgical factors for final modeling [16, 17]. Five-fold cross-validation, repeated five times, was used to develop and assess all candidate models in the training set. A grid-search algorithm was applied during the training phase to tune each algorithm's hyperparameters [18,19,20]: (1) ANN: number of hidden layer nodes; (2) RF: number of trees and boosting parameter; (3) KNN: mixing parameter α (Ridge regularization α = 0; Lasso regularization α = 1) and number of nearest neighbors.
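The partitioning scheme above can be sketched in standard-library Python. This is a minimal illustration under our own assumptions (arbitrary random seeds, fold assignment by striding through a shuffled list, hypothetical function names); it is not the authors' code. In a full pipeline, recursive feature elimination and the hyperparameter grid search would run inside each cross-validation fit, so the held-out fold never influences model selection.

```python
import random

def train_test_split_indices(n, train_frac=0.8, seed=42):
    """Randomly partition row indices 0..n-1 into training and testing sets.
    Hypothetical helper; the seed is an arbitrary assumption."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(n * train_frac)  # floor: 10,021 rows -> 8,016 training rows
    return idx[:n_train], idx[n_train:]

def repeated_kfold(indices, k=5, repeats=5, seed=0):
    """Yield (fit, validate) index lists for k-fold CV repeated `repeats` times,
    reshuffling before each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]  # k near-equal folds
        for i in range(k):
            validate = folds[i]
            fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield fit, validate

train_idx, test_idx = train_test_split_indices(10021)
print(len(train_idx), len(test_idx))  # 8016 2005

splits = list(repeated_kfold(train_idx))
print(len(splits))  # 25 fits: 5 folds x 5 repeats
```

Each of the 25 fits would train a candidate model on roughly four fifths of the 8,016 training cases and score it on the remaining fifth; the 2,005 testing cases stay untouched until final evaluation.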

Model accuracy was defined as discrimination, quantified by the area under the receiver operating characteristic curve (AUC) [21]. A model no better than chance has an AUC of 0.5, whereas a perfect model has an AUC of 1 [22]. Model calibration was assessed using a calibration plot. The Brier score was used to assess overall model performance [23]; defined as the mean squared difference between predicted probabilities and observed outcomes in a given population, it is 0 for a perfect model [24].
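AUC and the Brier score are defined for a binary outcome; the patient-level example in the Results refers to the probability of an operative time greater than 85 min. Assuming such a dichotomized outcome, both metrics can be computed directly from predicted probabilities, as in the following sketch. The function names are illustrative, and AUC is computed here by the pairwise (Mann-Whitney) formulation rather than by integrating a plotted ROC curve; the two give the same value.

```python
def auc(y_true, y_score):
    """AUC as the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case (ties count one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

y = [0, 0, 1, 1]                  # 1 = long operative time (toy data)
pred = [0.1, 0.4, 0.35, 0.8]      # toy predicted probabilities
print(auc(y, pred))               # 0.75
print(brier(y, pred))
```

A constant prediction for every patient yields an AUC of 0.5, the chance level described above, and predicting each observed outcome exactly yields a Brier score of 0.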

Statistical analysis

All data analyses were performed using MATLAB (MathWorks Inc., Natick, MA, USA), Anaconda (Anaconda Inc., Austin, TX, USA), and Python (Python Software Foundation, Wilmington, DE, USA) [25].

Results

A total of 10,021 patients underwent primary TKA. The mean age of the cohort was 74.2 ± 22.7 years, and the mean body mass index was 32.3 ± 6.4 kg/m2. The mean follow-up was 2.8 ± 1.1 years. Patient demographics and surgical variables are summarized in Table 1. The mean surgical operative time was 98.9 ± 32.6 min. Compared with the conventionally used electronic medical record system, the machine learning models reduced the absolute difference between predicted and actual surgical operative time by an average of 11 min (SD: 3.4 min).
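The improvement reported above is a difference in mean absolute error. A minimal sketch, using made-up operative times rather than study data and assuming the EMR estimate is a single historical average booked for every case, shows how it is computed (the helper name is hypothetical):

```python
def mean_abs_error(actual, predicted):
    """Average absolute difference between actual and predicted minutes."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical operative times in minutes; NOT study data.
actual = [95, 120, 80, 105]
emr_estimate = [100, 100, 100, 100]   # assumed: one historical average for every case
model_estimate = [97, 112, 84, 101]   # assumed model predictions

mae_emr = mean_abs_error(actual, emr_estimate)
mae_model = mean_abs_error(actual, model_estimate)
print(mae_emr, mae_model, mae_emr - mae_model)  # 12.5 4.5 8.0
```

In this toy case the model narrows the average gap between booked and actual time by 8 min; the study reports 11 min on its own data.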

Model parameters were optimized using a coarse-grained grid-search algorithm with repeated random sub-sampling validation. The optimal ANN had two hidden layers with 18 neurons each. The optimal RF consisted of 110 trees, with the number of predictors sampled at each node set to the default. The optimal KNN had a learning rate of 0.3, a sub-sampling coefficient of 0.80, and 24 nearest neighbors.
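The neighbor count is the defining hyperparameter of KNN. The sketch below estimates a patient's probability of a long operative time as the share of long cases among the k most similar training patients. It assumes features have already been scaled to comparable ranges and uses Euclidean distance; the toy data and k = 3 are purely illustrative (the study reports 24 neighbors for its much larger training set), and the function name is hypothetical.

```python
import math

def knn_probability(train_X, train_y, x, k=24):
    """Estimate P(long operative time) for case x as the fraction of positive
    labels among its k nearest training cases (Euclidean distance)."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Toy, pre-scaled features, e.g. (age / 100, BMI / 50); NOT study data.
train_X = [(0.40, 0.90), (0.42, 0.85), (0.45, 0.80),
           (0.70, 0.55), (0.75, 0.50), (0.80, 0.60)]
train_y = [1, 1, 0, 0, 0, 0]   # 1 = long operative time

print(knn_probability(train_X, train_y, (0.43, 0.86), k=3))  # about 0.667
```

A grid search would simply repeat this for several values of k on each cross-validation fold and keep the value with the best validation performance.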

In the training dataset, all machine learning models demonstrated excellent discrimination, with AUCs ranging from 0.77 for k-nearest neighbors to 0.83 for neural networks (Table 2). The calibration intercept ranged from −0.19 to 0.22, with the best intercept for neural networks (0.05; Table 2). The calibration slope ranged from 0.92 to 1.18 across the three candidate models (Table 2). Neural networks achieved the lowest Brier score (0.053). In the testing set, the AUC of the three candidate models ranged from 0.78 to 0.82, with the highest AUC achieved by neural networks (0.82; Table 3). Brier scores in the testing set ranged from 0.053 to 0.055, again lowest for neural networks (0.053; Table 3).
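Calibration intercept and slope are commonly obtained by logistic recalibration: regressing the observed outcome on the logit of the predicted probability, so that a perfectly calibrated model gives an intercept of 0 and a slope of 1. The sketch below fits that two-parameter model by Newton-Raphson in plain Python. It shows one common definition (some authors instead estimate the intercept with the slope fixed at 1); the paper does not state which was used, and the grouped toy data and helper names are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def calibration_intercept_slope(y_true, y_prob, iters=25):
    """Fit logit P(y = 1) = a + b * logit(p) by Newton-Raphson.
    Perfect calibration gives a = 0 and b = 1; b < 1 means predictions
    are more extreme than the observed outcomes."""
    xs = [logit(p) for p in y_prob]
    a, b = 0.0, 0.0
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = mu * (1.0 - mu)
            g_a += y - mu            # gradient of the log-likelihood
            g_b += (y - mu) * x
            h_aa += w                # entries of the information matrix
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve [[h_aa, h_ab], [h_ab, h_bb]] * step = gradient
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

def grouped(probs, events, n=10):
    """Toy data: n patients per predicted probability, `events` of them positive."""
    y_true, y_prob = [], []
    for p, e in zip(probs, events):
        y_true += [1] * e + [0] * (n - e)
        y_prob += [p] * n
    return y_true, y_prob

# Observed rates equal predictions: intercept near 0, slope near 1
print(calibration_intercept_slope(*grouped([0.2, 0.5, 0.8], [2, 5, 8])))
# Predicted 10% / 90% but observed 30% / 70%: slope below 1
print(calibration_intercept_slope(*grouped([0.1, 0.5, 0.9], [3, 5, 7])))
```

The second toy case is the pattern a calibration slope below 1 flags: predicted risks spread further from 50% than the observed event rates.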

Table 2 Discrimination and calibration of machine learning algorithms on training set for TKA patients
Table 3 Discrimination and calibration of machine learning algorithms on testing set for TKA patients

Decision curve analysis demonstrated that all three machine learning models achieved a higher net benefit for TKA patients than the default strategies of changing management for all patients or for no patients. The variables significantly associated with surgical operative time were younger age (< 45 years), female gender, ASA score, Charlson comorbidity index (CCI), high BMI (> 40 kg/m2), indication for TKA (post-traumatic), tranexamic acid non-usage, and operating surgeon (Fig. 1). The strongest predictors of surgical operative time were younger age (< 45 years), tranexamic acid non-usage, and high BMI (> 40 kg/m2; Fig. 2).
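Net benefit, the quantity plotted in a decision curve, weighs true positives against false positives at a chosen threshold probability p_t: NB = TP/n − (FP/n) × p_t/(1 − p_t). The treat-none strategy always has a net benefit of 0. In the minimal sketch below, "acting on" a patient stands for whatever change in management is being evaluated (for example, booking additional OR time, which is our illustrative assumption), and the data and helper names are invented.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Acting on every patient: all events are true positives and all
    non-events are false positives."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

y = [1, 1, 0, 0, 0]                 # 1 = long operative time (toy data)
risk = [0.9, 0.6, 0.4, 0.2, 0.1]    # toy model-predicted probabilities
for pt in (0.2, 0.5):
    # threshold, model, treat-all, treat-none
    print(pt, net_benefit(y, risk, pt), net_benefit_treat_all(y, pt), 0.0)
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all and treat-none values, which is the comparison the decision curve analysis makes across a range of thresholds.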

Fig. 1 Artificial intelligence algorithm for the prediction of surgical operative time following primary total knee arthroplasty

Fig. 2 Global variable importance plot for the prediction of surgical operative time following primary TKA

An example of a local, patient-level explanation of the neural network's predictions is shown in Fig. 3. For a 43-year-old non-obese (BMI = 28 kg/m2) male TKA patient with an ASA score of 3 and a Charlson comorbidity index of 3.36 who received tranexamic acid, the predicted probability of an operative time greater than 85 min was 17.6%. Younger age (< 45 years), a higher Charlson comorbidity index, an ASA score of 3, and a post-traumatic TKA indication increased the probability of a longer operative time, whereas tranexamic acid usage, lower BMI (< 40 kg/m2), and male gender decreased it.

Fig. 3 Example of individual patient-specific explanation generated by the neural network model for a TKA patient

Discussion

The efficient use of operating room (OR) time is significantly associated with health care spending [26]. With OR time estimated to cost $36 per minute, under- and overestimation resulting from inaccurate prediction of surgical operative time can cause inefficiency in OR utilization and staffing. Accurate estimation of surgical duration is therefore critical for enhancing OR efficiency and for identifying patients at risk of prolonged surgical operative time. Several efforts have been made to improve the accuracy of operative time estimation [27]. Traditionally, a common approach relied on the surgeon's personal experience. However, according to a previous study by Laskin et al., surgeons overestimate surgical operative time up to 32% of the time and underestimate it 42% of the time [28]. Alternatively, electronic medical record (EMR)-based approaches calculate surgical operative time from previous data for the same procedure and/or surgeon. Despite modestly higher accuracy (12%) compared with surgeons' estimates [29], prior studies have shown that this approach cannot account for multiple significant influencing factors [30, 31]. These works also demonstrated poor performance of EMR-based predictions for patients with non-standard medical histories, for whom the EMR system may struggle to provide comparative data [30, 31]. Furthermore, incomplete or unreliable information in EMR systems poses a significant risk of poor estimation of surgical operative time [32].

Because OR time depends on diverse factors, including patient characteristics and surgical environments [33], recent studies have investigated using machine learning (ML) techniques to improve the accuracy of these estimations. In a pilot study of 990 operative cases across a variety of non-orthopedic specialties, Tuwatananurak et al. found that ML models reduced the absolute difference between predicted and actual case duration by 7 min compared with conventional electronic medical record systems [7]. Similarly, a recent large retrospective study by Bartek et al. demonstrated a high predictive capability of ML models for case-time duration across a broad spectrum of non-orthopedic departments in a tertiary medical center [34]. Whereas these studies involved non-orthopedic patient populations, only a small retrospective study by Wu et al. has described improved accuracy of ML-based predictions compared with surgeons' own predictions in revision THA [35]. In the present study, we report an 11 min improvement in the absolute difference between predicted and actual surgical operative time compared with the conventionally used electronic medical record system, which highlights the strong potential of machine learning models for predicting surgical operative time. The clinical utility of these models is further supported by the opportunity to reduce healthcare costs, given an estimated OR cost of $36 per minute of surgical operative time. As prior modeling studies did not develop computational tools that significantly reduce the absolute difference between predicted and actual surgical operative time [36], the presented machine learning models have the potential to assist in clinical practice.
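As a rough illustration of scale only, and assuming (our assumption, not a result of the study) that each minute of reduced prediction error avoids one minute of misallocated OR time at the quoted cost, the 11 min average improvement would correspond to roughly:

```latex
11~\text{min} \times \$36/\text{min} = \$396 \text{ per case}
```

In practice, savings depend on how scheduling errors translate into idle time, overtime, and staffing, so this figure is an order-of-magnitude estimate rather than a measured saving.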

All candidate ML models showed excellent performance in discrimination, calibration, and decision curve analysis for surgical operative time. ML algorithms can become more accurate and predictive as additional data become available, because they learn and improve from repeated exposure to complex nonlinear data [37]. The excellent predictive ability of these ML models may thus directly reflect the multifactorial etiology of surgical operative time. In a recent modeling study spanning all subspecialties, Bartek et al. found that an ML-based model predicted case duration better than a linear regression model [34]. In the present study, the ANN model provided superior predictions (AUC = 0.82) compared with the two other ML algorithms: random forest (RF) and k-nearest neighbors (KNN). Consistent with this, a series of previous studies has demonstrated the accuracy of ANN models for complex medical decision-making [36]. Although direct comparison is limited, the predictive value of this ANN model was comparable to that of recent ANN-based predictive models applied to other aspects of primary TKA, including length of stay, charges, and disposition [38].

Based on the ANN algorithm, the strongest predictors of surgical operative time were younger age (< 45 years), high BMI (> 40 kg/m2), and tranexamic acid non-usage. Because of the many potential variables involved [39], independently evaluating the effect of individual variables on surgical operative time has been challenging. A retrospective database study by Sodhi et al. found that younger age and obesity were predictors of longer surgical operative time in TKA patients (p < 0.001) [40]. A retrospective study by Liabaud et al. reported that surgical operative time for TKA patients increased by 0.933 min for every 1 kg/m2 increase in BMI [41]. Additionally, in a large database study, Wang et al. reported that TKA patients with higher BMI required significantly longer surgical operative time [42]. Regarding tranexamic acid, a retrospective cohort study by Mufarrih et al. found that in unilateral TKA patients, tranexamic acid usage led to a significant reduction in total surgical operative time (p < 0.001) [43]. Similarly, Stoicea et al. reported that tranexamic acid usage was associated with a significant reduction in surgical operative time in a cohort of 564 primary and revision TKA patients [44]. Likewise, in a prospective study of 43 primary TKA patients, Guerreiro et al. showed a beneficial effect of tranexamic acid on both surgical operative time and blood loss [45].
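Assuming the linear relationship reported by Liabaud et al. holds across this BMI range (an extrapolation made here for illustration only), the difference in expected operative time between patients with BMIs of 30 and 40 kg/m2 would be about:

```latex
\Delta t \approx 0.933~\tfrac{\text{min}}{\text{kg/m}^2} \times (40 - 30)~\text{kg/m}^2 \approx 9.3~\text{min}
```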

Compared with prior retrospective studies, the present ML study demonstrated a stronger impact of high body mass index on surgical operative time in primary TKA. Both Sodhi et al. and Liabaud et al. demonstrated a moderately strong effect of obesity on surgical operative time, with other patient factors being of greater significance [40, 41]. This discrepancy may be due to the use of ML algorithms in the present study, as ML models have been shown to provide more accurate analysis than conventional statistical approaches [46] in large, complex datasets with noisy or incomplete information.

With the current focus on quality improvement initiatives, the ability of ML to predict which patients will require a longer surgical operative time can enhance patient care and improve healthcare resource utilization and overall efficiency. The US National Healthcare Safety Network (NHSN) risk index predicts the risk of surgical site infection based on three factors, one of which is surgical operative time [47]. Periprosthetic joint infection (PJI), one of the most devastating complications following total joint arthroplasty (TJA), is strongly associated with prolonged surgical operative time [33]. Increased surgical operative time is also an independent risk factor for a multitude of other postoperative complications following TJA [33, 48]. Thus, more accurate prediction of case duration with ML models might enable proactive measures to mitigate complications and undesirable sequelae in at-risk TKA patients. In addition, Sodhi et al. demonstrated that increased surgical operative time had the greatest effect on length of stay in primary TKA patients [40]. Together with the higher risk of complications, prolonged surgical operative time therefore has a significant effect on the utilization of healthcare resources associated with hospital stays. The increased resource utilization associated with lengthy surgical operative time is also linked to inaccurate operating room (OR) scheduling. More accurate prediction of surgical operative time with these ML models may help optimize surgical case scheduling for TKA patients, because the models incorporate and interpret multiple diverse patient and procedural factors, with the potential to optimize perioperative planning among patient, surgeon, and hospital.
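The index referred to above is the NHSN basic surgical site infection risk index (derived from the earlier NNIS index), which assigns one point each for an ASA score of 3 or higher, a contaminated or dirty/infected wound (class III or IV), and an operative duration exceeding the procedure-specific cutoff T, the 75th percentile of duration for that procedure. A minimal sketch follows; the function name is hypothetical, and the cutoff passed in the example is a placeholder, not the published NHSN value for knee arthroplasty.

```python
def basic_ssi_risk_index(asa_score, wound_class, duration_min, t_cutoff_min):
    """Return a 0-3 point score: one point each for ASA >= 3, wound class III/IV,
    and operative duration above the procedure-specific cutoff T."""
    points = 0
    if asa_score >= 3:
        points += 1
    if wound_class >= 3:               # III = contaminated, IV = dirty/infected
        points += 1
    if duration_min > t_cutoff_min:    # T: 75th percentile of duration for the procedure
        points += 1
    return points

# Hypothetical cutoff of 120 min, for illustration only
print(basic_ssi_risk_index(asa_score=3, wound_class=1, duration_min=135, t_cutoff_min=120))  # 2
```

Because duration is one of only three inputs, an operative time predicted in advance to exceed T flags a patient whose infection-risk score is likely to rise, which is one way earlier prediction could feed into perioperative planning.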

The present study has potential limitations. First, our ML models were developed using data from a single tertiary institution, and the findings may not be generalizable to other practice settings. Specifically, patients had an average BMI greater than 30 kg/m2, which falls in the obese range as defined by the Centers for Disease Control and Prevention. Nonetheless, similar BMI ranges were reported in comparable studies on this topic [12, 21, 49]. Second, this study has the inherent limitations of a retrospective design, such as bias and an inability to control for confounding factors. Third, the accuracy (AUC) values did not exceed 0.90, which warrants the development of more refined ML-based prediction models. Despite these limitations, the homogeneous sampling and consistent definition of surgical operative time support the clinical feasibility of this single-institution study, as all patients underwent the same healthcare protocols.

In conclusion, this study shows excellent performance of machine learning models in predicting surgical operative time in primary total knee arthroplasty. Accurate estimation of surgical duration is important for enhancing OR efficiency and identifying patients at risk of prolonged operative time. These models have the potential to improve OR utilization and efficiency and may assist in allocating the healthcare resources associated with TKA.