Development and internal validation of machine learning algorithms for predicting complications after primary total hip arthroplasty

Kunze, Kyle N.; Karhade, Aditya V.; Polce, Evan M.; Schwab, Joseph H.; Levine, Brett R.

doi:10.1007/s00402-022-04452-y


Hip Arthroplasty
Published: 04 May 2022

Volume 143, pages 2181–2188, (2023)

Archives of Orthopaedic and Trauma Surgery


Kyle N. Kunze ORCID: orcid.org/0000-0002-0363-3482^1,4,
Aditya V. Karhade²,
Evan M. Polce³,
Joseph H. Schwab² &
…
Brett R. Levine¹


Abstract

Introduction

Complications after total hip arthroplasty (THA) may result in readmission or reoperation and impose a significant cost on the healthcare system. Understanding which patients are at risk for complications can potentially allow for targeted interventions to decrease complication rates through preoperative health optimization. The purpose of the current study was to develop and internally validate machine learning (ML) algorithms capable of performing patient-specific predictions of all-cause complications within two years of primary THA.

Methods

This was a retrospective case–control study of clinical registry data from 616 primary THA patients from one large academic and two community hospitals. The primary outcome was all-cause complications at a minimum of 2-years after primary THA. Recursive feature elimination was applied to identify preoperative variables with the greatest predictive value. Five ML algorithms were developed on the training set using tenfold cross-validation and internally validated on the independent testing set of patients. Algorithms were assessed by discrimination, calibration, Brier score, and decision curve analysis to quantify performance.

Results

The observed complication rate was 16.6%. The stochastic gradient boosting model achieved the best performance with an AUC = 0.88, calibration intercept = 0.1, calibration slope = 1.22, and Brier score = 0.09. The most important factors for predicting complications were age, drug allergies, prior hip surgery, smoking, and opioid use. Individual patient-level explanations were provided for the algorithm predictions and incorporated into an open access digital application: https://sorg-apps.shinyapps.io/tha_complication/

Conclusions

The stochastic gradient boosting algorithm demonstrated good discriminatory capacity for identifying patients at high risk of experiencing a postoperative complication and provided proof-of-concept for creating office-based applications from ML that can perform real-time prediction. However, the clinical utility of the current algorithm is unknown and the definitions of complications were broad. Further investigation on larger data sets and rigorous external validation are necessary prior to the assessment of clinical utility with respect to risk stratification of patients undergoing primary THA.

Level of evidence

III, therapeutic study.


Introduction

The implementation of bundled payment models as an economic strategy to limit financially undesirable health care costs associated with total joint arthroplasty procedures has become commonplace among hospitals throughout the United States [1, 2]. A subsequent result of such policy has been increasing focus on the ability to predict and anticipate patient outcomes, the selective recruitment of patients, and promotion of preoperative health optimization based on risk factors. Procedure-associated complications after total hip arthroplasty (THA) are a significant potential source of increased health care expenditures secondary to hospital and emergency room readmissions in addition to the possible need for revision arthroplasty [2]. Therefore, understanding which patients are at risk for complications during the preoperative period would potentially allow for targeted intervention at a time when health optimization could be performed to decrease this complication risk.

Much of the current knowledge regarding risk factors associated with complications after THA is limited to associations identified in studies that are not hypothesis-driven but found through testing many random variables [3,4,5,6,7,8]. Some of these risk factors include age [7, 9], number of comorbidities [7, 9], body mass index [5, 10], smoking [6, 11], sex [12], and diabetes mellitus [11]. Few studies have sought to develop risk models incorporating sets of such known preoperative risk factors in order to understand and predict which patients are at higher risk for complications in the postoperative period [13], while other prediction models have been limited by using intraoperative data and therefore do not allow for preoperative optimization [14]. As such, there is currently a large pool of potential risk factors and a limited number of prediction models utilizing solely preoperative and modifiable risk factors. Developing and cross-validating models with the smallest number of the most important factors would be of great clinical utility to hip and knee surgeons. Such a stratification model would benefit patients by providing them with the opportunity to optimize their health prior to undergoing THA.

Machine learning is a powerful statistical instrument capable of determining patient-specific factors which influence the probability of a patient experiencing a complication after primary THA. Furthermore, machine learning allows for the development of clinical decision-making tools, which can be used in office-based settings to help discuss risk stratification with patients [15,16,17]. This may assist orthopaedic surgeons to better determine which patients may need further optimization prior to undergoing THA. The purposes of the current study were to (1) develop and internally validate machine learning algorithms capable of predicting all-cause complications within two years of primary THA, and (2) use these algorithms to determine which preoperative factors are important in predicting all-cause complications after primary THA. The authors hypothesized that the best-performing machine learning algorithm would allow for both excellent prediction and an interpretable explanation of how factors specific to individual patients influenced the model decision making.

Methods

Patient selection

Following institutional board approval, data were obtained retrospectively from the electronic medical records of patients who underwent primary total hip arthroplasty by one fellowship-trained surgeon at one large academic and two community hospitals. The timeframe for patient inclusion was between January 2014 and January 2016. Exclusion criteria included etiology of degenerative hip osteoarthritis that was inflammatory, infectious, post-traumatic, acute femoral neck fracture, or related to osteonecrosis, patients undergoing revision THA, and patients with less than two-year follow-up. Overall, 616 patients met the inclusion criteria and had a median age of 62 (interquartile range [IQR] 54–70) years. A total of 352 (57.1%) patients were female. Additional demographic and clinical outcome information is displayed in Table 1. A minimum of 100 patients has been demonstrated to be an appropriate sample size for machine learning analyses and associated predictive analytics, and therefore the current sample of 616 patients was deemed valid [18, 19].

Table 1 Characteristics of study population, n = 616


Primary outcome

The primary outcome was all-cause complications within the two-year follow-up period. Complications were considered all events classified to be either medical or orthopaedic (Table 2). Medical complications included post-operative myocardial infarction, pulmonary embolism, deep vein thrombosis, atrial fibrillation, and anemia requiring blood transfusion. Orthopaedic complications included nerve palsy, hematoma formation, heterotopic ossification, hip squeaking, wound abscesses or dehiscence, periprosthetic infection, intra- and post-operative fractures, dislocations, leg-length discrepancy, aseptic loosening, and atraumatic, return to the emergency department or readmission for any complaint related to the operative hip, and reoperations.

Table 2 Surgical and medical complications


Candidate variables

Candidate variables were collected prospectively before THA and stored in a secure clinical repository. Candidate variables are listed in Table 1, with the rates of missing data as follows: preoperative opioid use (n = 2, 0.32%), smoking history (n = 2, 0.32%), diabetes at time of surgery (n = 2, 0.32%), drug allergies (n = 86, 14.0%), presence of one or more comorbidities (n = 2, 0.32%), preoperative health state (n = 26, 4.2%), preoperative modified Harris Hip Score (mHHS) (n = 17, 2.8%), and preoperative hip flexion (n = 197, 32.0%). Hip flexion was the only variable with greater than 30% missing data and was consequently excluded [20, 21]. The patient-reported outcome measures (PROMs) among these variables were the patient-reported health state (PRHS) [22] and the modified Harris Hip Score (mHHS) [23]. Prior to analysis, missingness of data was explored and determined to be missing at random and appropriate for multiple imputation. The current analysis applied multiple imputation and predictive mean matching with the “mice” package in R (R Foundation for Statistical Computing, Vienna, Austria) [24]. Following imputation, recursive feature elimination (RFE) with random forest algorithms was used to determine the combination of variables with the highest predictive value that optimized algorithm performance through a process of backwards elimination.
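The missing-data percentages above follow directly from the reported counts and the cohort size of 616. A minimal sketch in Python of that arithmetic and of the 30% exclusion rule is given below; the dictionary keys and function names are illustrative assumptions and are not taken from the study's own code.

```python
# Counts of missing values as reported in the text (cohort size n = 616).
N_PATIENTS = 616
MISSING_COUNTS = {
    "preoperative opioid use": 2,
    "smoking history": 2,
    "diabetes at time of surgery": 2,
    "drug allergies": 86,
    "one or more comorbidities": 2,
    "preoperative health state": 26,
    "preoperative mHHS": 17,
    "preoperative hip flexion": 197,
}
EXCLUSION_THRESHOLD = 0.30  # variables with more than 30% missing are dropped


def missing_rates(counts, n_patients):
    """Return the fraction of missing values for each variable."""
    return {name: count / n_patients for name, count in counts.items()}


def split_by_threshold(rates, threshold):
    """Separate variables kept for imputation from those excluded."""
    kept = [name for name, rate in rates.items() if rate <= threshold]
    dropped = [name for name, rate in rates.items() if rate > threshold]
    return kept, dropped


rates = missing_rates(MISSING_COUNTS, N_PATIENTS)
kept, dropped = split_by_threshold(rates, EXCLUSION_THRESHOLD)

for name, rate in rates.items():
    print(f"{name}: {100 * rate:.2f}% missing")
print("excluded:", dropped)
```

Under these assumptions only preoperative hip flexion exceeds the threshold, which matches the exclusion described in the text.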

Algorithm construction and performance assessment

The machine learning algorithm development methodology and data analysis have been previously described in detail [25, 26]. Briefly, five novel algorithms were constructed on a training set of patients (80% of initial cohort) using three iterations of tenfold cross-validation. Standardized metrics of model performance, including (1) calibration (calibration plot, intercept, slope) [27, 28], (2) decision curve analysis [29, 30], (3) Brier score [31], and (4) discrimination (area under the receiver operating characteristic curve, AUC), were used to comparatively evaluate model performance on both the training and testing sets.
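Two of the metrics named above have short closed-form definitions. The following Python sketch illustrates the Brier score (mean squared difference between predicted probability and observed outcome) and the AUC (the probability that a randomly chosen patient with a complication receives a higher predicted risk than a randomly chosen patient without one). The toy outcomes and probabilities are hypothetical and are not study data, and the function names are assumptions made for illustration.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def null_brier(prevalence):
    """Brier score of a model that predicts the prevalence for every patient."""
    return prevalence * (1.0 - prevalence)


# Hypothetical toy data: 1 = complication, 0 = no complication.
y_true = [0, 0, 0, 0, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.6, 0.4, 0.9]

print("Brier score:", brier_score(y_true, y_prob))
print("AUC:", auc(y_true, y_prob))
print("Null Brier at 16.6% prevalence:", null_brier(0.166))
```

The null Brier score gives a reference point for interpretation: at the 16.6% complication rate reported in the abstract, a model that predicts 0.166 for every patient would score approximately 0.138, so lower values indicate predictions that are more informative than the base rate.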

Exploration of patient-specific model explanations

Local interpretable model-agnostic explanations (LIME) depict the decision-making process of machine learning algorithms and were used to demonstrate how the best performing algorithm explained prediction on a patient-by-patient basis [32]. Using LIME, an open access digital web application was developed with the capacity to provide both predictions and explanations at the individual patient level [33]. This application is freely accessible: https://sorg-apps.shinyapps.io/tha_complication/. Given that external validation was not performed in the present study, the application in its current form merely constitutes an educational tool and open-access source to the developed algorithms.

The Anaconda Distribution (Anaconda, Inc., Austin, Texas), R (The R Foundation, Vienna, Austria), RStudio (RStudio, Boston, MA), and Python (Python Software Foundation, Wilmington, Delaware) were used for data analysis. Predictive model development and testing followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) guidelines and the Guidelines for Developing and Reporting Machine Learning Models in Biomedical Research [34, 35].

Results

Final variable selection

The combination of variables identified through recursive feature elimination that optimized predictive performance was comorbidities, preoperative opioid use greater than three months, current smoking, prior hip surgery, drug allergies, and age (Fig. 1B).
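The backwards-elimination logic behind recursive feature elimination can be sketched generically. The Python example below is an illustration only: the study used random forest algorithms to rank variables, whereas here the ranking step is a pluggable function, and the feature names, weights, and scoring rule in the toy example are entirely hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A ranking function returns (performance score, importance per feature).
RankFn = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def recursive_feature_elimination(
    features: List[str], fit_and_rank: RankFn
) -> Tuple[List[str], float]:
    """Drop the least important feature each round; keep the best subset seen."""
    current = list(features)
    best_subset, best_score = list(current), float("-inf")
    while current:
        score, importances = fit_and_rank(current)
        if score > best_score:
            best_subset, best_score = list(current), score
        weakest = min(current, key=lambda name: importances[name])
        current.remove(weakest)
    return best_subset, best_score


# Hypothetical stand-in for "fit a model and score it by cross-validation":
# each feature contributes a fixed weight and every feature costs 0.05.
TOY_WEIGHTS = {"x1": 0.30, "x2": 0.20, "noise1": 0.01, "noise2": 0.02}


def toy_fit_and_rank(subset: List[str]) -> Tuple[float, Dict[str, float]]:
    score = sum(TOY_WEIGHTS[name] for name in subset) - 0.05 * len(subset)
    return score, {name: TOY_WEIGHTS[name] for name in subset}


selected, selected_score = recursive_feature_elimination(
    list(TOY_WEIGHTS), toy_fit_and_rank
)
print("selected features:", selected)
```

In practice the ranking function would refit the chosen model on each candidate subset and return a cross-validated performance estimate together with the model's variable importances.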

Algorithm selection and model performance

Cross-validation of the training set (n = 494) demonstrated that the AUC ranged from 0.81 to 0.92, the calibration intercept ranged from − 0.21 to 5.09, the calibration slope ranged from 0.8 to 3.49, and the Brier score ranged from 0.08 to 0.12 (Table 3).

Table 3 Algorithm performance on cross-validation of training set, n = 494, mean (95% confidence interval)


In the testing set, the AUC ranged from 0.77 to 0.93, the calibration intercept ranged from − 0.50 to 2.11, the calibration slope ranged from 0.89 to 1.22, and the Brier score ranged from 0.08 to 0.16 (Table 4). The algorithm with the best performance was the stochastic gradient boosting model with AUC 0.88, calibration intercept 0.103, calibration slope 1.22, and Brier score 0.09. The most important factors for prediction of complications were age, documented drug allergies, prior ipsilateral hip surgery, smoking, and preoperative opioid use (Figs. 1A and 2A). The stochastic gradient boosting model resulted in greater net benefit compared to the default strategies of changes for all patients, for no patients, or changes based on age alone as demonstrated by the decision curve analysis (Fig. 2B).
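For readers less familiar with the calibration intercept and slope reported here, the sketch below shows one common way to estimate them: a logistic regression of the observed outcome on the logit of the predicted probability, fitted by Newton-Raphson. This is an assumption about method, not a description of the study's code; conventions differ (the intercept is sometimes estimated with the slope fixed at 1), and the toy data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def calibration_intercept_slope(y_true, y_prob, n_iter=50, eps=1e-6):
    """Fit outcome ~ a + b * logit(predicted) and return (a, b)."""
    x = [logit(min(max(p, eps), 1.0 - eps)) for p in y_prob]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = expit(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu              # gradient w.r.t. intercept
            g1 += (yi - mu) * xi       # gradient w.r.t. slope
            h00 += w                   # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < 1e-10:
            break
    return a, b


# Hypothetical overconfident model: predicts 10% and 90%,
# but 30% and 70% of the patients in each group had the event.
y_prob = [0.1] * 10 + [0.9] * 10
y_true = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3

intercept, slope = calibration_intercept_slope(y_true, y_prob)
print("intercept:", intercept, "slope:", slope)
```

In this toy case the slope falls below 1 because the predictions are too extreme. A slope above 1, such as the 1.22 reported for the stochastic gradient boosting model, is usually read as the opposite pattern, with predictions that are somewhat less extreme than the observed risks.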

Table 4 Algorithm performance in independent testing set (95% confidence interval), n = 122


Potential application and utility of machine learning using patient specific explanations

An individual patient-specific risk explanation demonstrating the potential utility of using this risk stratification model towards preoperative health optimization applications is depicted in Fig. 3.

Discussion

The main finding of the current proof-of-concept study was that the best performing machine learning algorithm conferred good predictive capability with regards to risk of experiencing a complication after primary THA at the senior author’s institution. This model incorporated various modifiable and patient-specific risk factors that can be the target of optimization during the preoperative period that may decrease the risk of complications prior to undergoing surgical intervention. Furthermore, the development of a clinical decision-making tool which depicts and calculates patient-specific risk for complications can theoretically augment patient care by providing teachable and real-time data in clinic settings which may be used towards health optimization purposes. Such data may also be critical for establishing tiers for alternative payment models so as not to promote the potential for “cherry-picking” behavior and access to care problems.

There are several limitations that should be considered within the context of the current study results. Although the current machine learning algorithms had good prediction capabilities, the incidence of complications was not high enough to perform prediction of individual complication categories such as periprosthetic joint infection or myocardial infarction. The definition of complications was also broad so as to capture a wide variety of potential postoperative events, which may have inflated the complication rate. However, we believe that the included complications represent primarily major events which would be relevant to associated penalties in alternative payment models. This study is retrospective in design and therefore is subject to biases inherent in such data collection methods; however, there was a high completion rate of included data and multiple imputation methods were employed to mitigate the effect of missing data. In addition, the relatively small sample size limited the holdout (testing) dataset for assessment of model performance to only 122 patients. As such, the present study represents a proof-of-concept design and the open-access tool presented herein is merely for educational purposes until rigorous external validation is performed. It is yet to be known if the same predictive value in assessing risk will hold in other patient populations. Finally, this analysis is limited in that though we investigated the predictive importance of all variables available in this specific institutional repository, other variables important for predicting complications likely exist. Future studies are warranted that incorporate other clinically relevant variables into the current model to determine whether or not these variables confer beneficial changes to the overall performance of the model in predicting all-cause complications.

The current study demonstrated that the stochastic gradient boosting algorithm was the best performing machine learning algorithm of the five that were developed and internally validated. This particular model had an AUC of 0.88, which is considered good discriminatory capability, and demonstrated appropriate predictive probabilities relative to observed events as the model did not overfit the data (Fig. 2A). This is particularly important in reference to standard AUC values, as in real-life practice complications are better described at the patient level as probabilities of experiencing an event as opposed to a binary all-or-nothing event. Although the random forest (Supplement 1) and elastic-net penalized logistic regression (Supplement 2) also performed well, they had inferior calibration compared to the stochastic gradient boosting model. Furthermore, the decision curve analysis in the current study (Fig. 2B) demonstrated that using the stochastic gradient boosting model for risk stratification of postoperative complications was superior to considering all patients as high risk, none of the patients as high risk, and when considering age alone as a risk factor. Put simply, for patients undergoing primary THA, the stochastic gradient boosting model conferred greater utility in terms of preoperative risk stratification in comparison to alternate strategies of determining complication risk. This model provides concise assessment of complication risk in patients undergoing primary THA based on synthesized preoperative patient data. In addition, the model was incorporated into a proof-of-concept application that is user friendly and patient-specific while requiring fewer variables than prior prediction risk models [13, 14].
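The comparison made by decision curve analysis rests on the net benefit statistic of Vickers and Elkin [29], which weighs true positives against false positives at a chosen threshold probability. The Python sketch below illustrates the calculation for a model and for the two default strategies; the toy predictions are hypothetical and the function names are assumptions made for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit when every patient is considered high risk."""
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


def net_benefit_treat_none():
    """Net benefit when no patient is considered high risk."""
    return 0.0


# Hypothetical toy data: 1 = complication, 0 = no complication.
y_true = [0, 0, 0, 0, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.6, 0.4, 0.9]
threshold = 0.35
prevalence = sum(y_true) / len(y_true)

model_nb = net_benefit(y_true, y_prob, threshold)
all_nb = net_benefit_treat_all(prevalence, threshold)
print("model:", model_nb, "treat all:", all_nb, "treat none:", 0.0)
```

A decision curve repeats this calculation over a range of thresholds. The treat-all strategy has zero net benefit exactly where the threshold equals the event rate, so at the 16.6% complication rate observed here, a model adds value over both defaults only where its curve lies above zero and above the treat-all line.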

The current study found that the most important patient-specific factors contributing to complications in the institutional data set under consideration were age, medication allergies, opioid use, smoking, comorbidities, and prior hip surgery. It is of note, however, that the primary outcome in this study was all-cause complications. Using this all-encompassing definition of complications was necessary as the incidence of individual complications, such as myocardial infarction, was too infrequent to develop a meaningful model to predict each specific complication. In this context, some of the model interpretability is lost. For example, though a greater patient age may be predictive of all-cause complications, it is possible that increased age was simultaneously protective against some specific complications, such as dislocation. Few studies have also used this all-encompassing definition of complications in attempting to determine associations between preoperative variables and complications after THA. Harris et al. [36] used the American College of Surgeons-National Surgical Quality Improvement Program (ACS-NSQIP) database and least absolute shrinkage and selection operator (LASSO) methods to predict 30-day complications and mortality after total hip and knee arthroplasty procedures. The authors also found that age and various comorbidities were important contributors to experiencing complications after total knee or total hip arthroplasty. However, limitations to their study include: (1) combining total hip and knee arthroplasty patients, which are representative of potentially different patients with distinct risk profiles; (2) using LASSO methodology with less than excellent discriminatory capabilities (all models with AUCs less than 0.8, and for all complications, equal to 0.68); and (3) utilization of a national database with inherent limitations such as overrepresentation of specific populations and the limitation of 30-day complication data. Although the current study was not able to externally validate the best performing algorithm as Harris et al. did on the Veterans Affairs Surgical Quality Improvement Program (VASQIP) database, the model in the current study has the benefits of (1) using institutional data from a single surgeon; (2) capturing complications within two years of primary THA; and (3) having rigorously tested five independent classification-type machine learning models. Nonetheless, there remain limitations to both the current study and that performed by Harris et al. [36] which will need to be improved upon prior to creating a meaningful tool amenable to confidently predicting complications in diverse populations.

The model in the current study provides a rapid method for combining various pertinent clinical data points to accurately quantify patient risk at the individual level. Although risk stratification has been previously investigated in elective THA, research has primarily focused on global assessment of risk. The novelty of the methodology developed in the present study is the ability to efficiently determine risk at the individual patient-level and receive real-time feedback on the patient factors influencing the risk calculation. As the dataset in the current study is small and requires external validation, the presented patient scenario (Fig. 3) represents a proof-of-concept for machine learning capabilities and how machine learning can potentially impact clinical workflow and patient outcomes in the future. Though the clinical utility of this algorithm remains questionable, it is important to demonstrate how the machine learning algorithm and online application function. Future studies are warranted to externally validate the model in the current study and determine if additional variables could be of clinical utility, as well as to determine if patients undergoing total knee arthroplasty require a separate risk model. This open-source tool is for educational purposes only and should not be used in clinical settings at this time due to its generalizability being unknown outside of the authors’ institution.

Conclusion

The stochastic gradient boosting algorithm demonstrated good discriminatory capacity for identifying patients at high risk of experiencing a postoperative complication and provided proof-of-concept for creating office-based applications from machine learning that can perform real-time prediction. However, the clinical utility of the current algorithm is unknown and the definitions of complications were broad. Further investigation on larger data sets and rigorous external validation are necessary prior to the assessment of clinical utility with respect to risk stratification of patients undergoing primary THA.

References

Doran JP, Beyer AH, Bosco J, Naas PL, Parsley BS, Slover J, Zabinski SJ, Zuckerman JD, Iorio R (2016) Implementation of bundled payment initiatives for total joint arthroplasty: decreasing cost and increasing quality. Instr Course Lect 65:555–566
Siddiqi A, White PB, Mistry JB, Gwam CU, Nace J, Mont MA, Delanois RE (2017) Effect of bundled payments and health care reform as alternative payment models in total joint arthroplasty: a clinical review. J Arthroplasty 32(8):2590–2597
Kwon YM, Rossi D, MacAuliffe J, Peng Y, Arauz P (2018) Risk factors associated with early complications of revision surgery for head-neck taper corrosion in metal-on-polyethylene total hip arthroplasty. J Arthroplasty 33(10):3231–3237
Miettinen SS, Makinen TJ, Kostensalo I, Makela K, Huhtala H, Kettunen JS, Remes V (2016) Risk factors for intraoperative calcar fracture in cementless total hip arthroplasty. Acta Orthop 87(2):113–119
Triantafyllopoulos GK, Soranoglou VG, Memtsoudis SG, Sculco TP, Poultsides LA (2018) Rate and risk factors for periprosthetic joint infection among 36,494 primary total hip arthroplasties. J Arthroplasty 33(4):1166–1170
Shetty T, Nguyen JT, Wu A, Sasaki M, Bogner E, Burge A, Cogsil T, Kim EU, Cummings K, Su EP, Lyman S (2019) Risk factors for nerve injury after total hip arthroplasty: a case-control study. J Arthroplasty 34(1):151–156
Inneh IA, Lewis CG, Schutzer SF (2014) Focused risk analysis: regression model based on 5314 total hip and knee arthroplasty patients from a single institution. J Arthroplasty 29(10):2031–2035
Gausden EB, Parhar HS, Popper JE, Sculco PK, Rush BNM (2018) Risk factors for early dislocation following primary elective total hip arthroplasty. J Arthroplasty 33(5):1567–1571 (e1562)
Badarudeen S, Shu AC, Ong KL, Baykal D, Lau E, Malkani AL (2017) Complications after revision total hip arthroplasty in the medicare population. J Arthroplasty 32(6):1954–1958
Jeschke E, Citak M, Gunster C, Halder AM, Heller KD, Malzahn J, Niethard FU, Schrader P, Zacher J, Gehrke T (2018) Obesity increases the risk of postoperative complications and revision rates following primary total hip arthroplasty: an analysis of 131,576 total hip arthroplasty cases. J Arthroplasty 33(7):2287–2292 (e2281)
Courtney PM, Boniello AJ, Berger RA (2017) Complications following outpatient total joint arthroplasty: an analysis of a national database. J Arthroplasty 32(5):1426–1430
Schilling PL, Bozic KJ (2016) Development and validation of perioperative risk-adjustment models for hip fracture repair, total hip arthroplasty, and total knee arthroplasty. J Bone Joint Surg Am 98(1):e2
Kunze KN, Li J, Movassaghi K, Wiggins AB, Sporer SM, Levine BR (2018) Internal validation of a predictive model for complications after total hip arthroplasty. J Arthroplasty 33(12):3759–3767
Oldmeadow LB, McBurney H, Robertson VJ (2003) Predicting risk of extended inpatient rehabilitation after hip or knee arthroplasty. J Arthroplasty 18(6):775–779
Kunze KN, Polce EM, Patel A, Courtney PM, Levine BR (2021) Validation and performance of a machine-learning derived prediction guide for total knee arthroplasty component sizing. Arch Orthop Trauma Surg 141(12):2235–2244
Kunze KN, Polce EM, Clapp I, Nwachukwu BU, Chahla J, Nho SJ (2021) Machine learning algorithms predict functional improvement after hip arthroscopy for femoroacetabular impingement syndrome in athletes. J Bone Joint Surg Am 103(12):1055–1062
Kunze KN, Karhade AV, Sadauskas AJ, Schwab JH, Levine BR (2020) Development of machine learning algorithms to predict clinically meaningful improvement for the patient-reported health state after total hip arthroplasty. J Arthroplasty 35(8):2119–2123
Ramkumar PN, Karnuta JM, Haeberle HS, Rodeo SA, Nwachukwu BU, Williams RJ 3rd (2021) Effect of preoperative imaging and patient factors on clinically meaningful outcomes and quality of life after osteochondral allograft transplantation: a machine learning analysis of cartilage defects of the knee. Am J Sports Med 49(8):2177–2186
Ramkumar PN, Karnuta JM, Haeberle HS, Owusu-Akyaw KA, Warner TS, Rodeo SA, Nwachukwu BU, Williams RJ 3rd (2021) Association between preoperative mental health and clinically meaningful outcomes after osteochondral allograft for cartilage defects of the knee: a machine learning analysis. Am J Sports Med 49(4):948–957
University of Wisconsin-Madison, Social Science Computing Cooperative (2013) Multiple Imputation in Stata: Deciding to Impute. Published 01/13/2013. Available at https://www.ssc.wisc.edu/sscc/pubs/stata_mi_decide.html. Accessed Mar 2021
Lee JH, Huber Jr, J (2011) Multiple imputation with large proportions of missing data: How much is too much? In: United Kingdom Stata Users' Group Meetings 2011. Stata Users Group, https://ideas.repec.org/p/boc/usug11/23.html. Accessed Mar 2021
Herdman M, Gudex C, Lloyd A, Janssen M, Kind P, Parkin D, Bonsel G, Badia X (2011) Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L). Qual Life Res 20(10):1727–1736
Byrd JW (2003) Hip arthroscopy: patient assessment and indications. Instr Course Lect 52:711–719
van Buuren S, Groothuis-Oudshoorn K (2011) mice: multivariate imputation by chained equations in R. J Stat Softw 45(3):1–67
Karhade AV, Thio Q, Ogink PT, Bono CM, Ferrone ML, Oh KS, Saylor PJ, Schoenfeld AJ, Shin JH, Harris MB, Schwab JH (2019) Predicting 90-day and 1-year mortality in spinal metastatic disease: development and internal validation. Neurosurgery 85(4):E671–E681
Thio Q, Karhade AV, Ogink PT, Raskin KA, De Amorim BK, Lozano Calderon SA, Schwab JH (2018) Can machine-learning techniques be used for 5-year survival prediction of patients with chondrosarcoma? Clin Orthop Relat Res 476(10):2040–2048
Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW (2010) Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 21(1):128–138
Steyerberg EW, Vergouwe Y (2014) Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J 35(29):1925–1931
Vickers AJ, Elkin EB (2006) Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 26(6):565–574
Vickers AJ, Cronin AM, Elkin EB, Gonen M (2008) Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers. BMC Med Inform Decis Mak 8:53
Brier GW, Allen RA (1951) Verification of weather forecasts. In: Malone TF (ed) Compendium of meteorology. American Meteorological Society, Boston, pp 841–848
Ribeiro MT, Singh S, Guestrin C (2016) "Why should I trust you?": explaining the predictions of any classifier. In: Proceedings of the 22nd SIGKDD international conference on knowledge discovery and data mining. pp 1135–1144. https://arxiv.org/abs/1602.04938. Accessed March 2021
Ribeiro MT, Singh S, Guestrin C (2019) Model-agnostic interpretability of machine learning. In: Cornell University. https://arxiv.org/abs/1606.05386
Collins GS, Reitsma JB, Altman DG, Moons KG (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Br J Surg 102(3):148–158
Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, Shilton A, Yearwood J, Dimitrova N, Ho TB, Venkatesh S, Berk M (2016) Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res 18(12):e323
Harris AHS, Kuo AC, Weng Y, Trickey AW, Bowe T, Giori NJ (2019) Can machine learning methods produce accurate and easy-to-use prediction models of 30-day complications and mortality after knee or hip arthroplasty? Clin Orthop Relat Res 477(2):452–460

Funding

No funding was obtained for this study.

Author information

Authors and Affiliations

Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, IL, USA
Kyle N. Kunze & Brett R. Levine
Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
Aditya V. Karhade & Joseph H. Schwab
School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA
Evan M. Polce
Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, NY, USA
Kyle N. Kunze

Authors

Kyle N. Kunze
Aditya V. Karhade
Evan M. Polce
Joseph H. Schwab
Brett R. Levine

Corresponding author

Correspondence to Kyle N. Kunze.

Ethics declarations

Conflict of interest

KNK, AK, EMP, JHS, and BRL have no conflicts of interest.

Ethical approval

Institutional review board approval was obtained from Rush University Medical Center prior to conducting this study.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplement Figure 1. Calibration of random forest in testing set, n = 122 (TIFF 1976 KB)

Supplement Figure 2. Calibration of elastic-net penalized logistic regression in testing set, n = 122 (TIFF 2001 KB)

About this article

Cite this article

Kunze, K.N., Karhade, A.V., Polce, E.M. et al. Development and internal validation of machine learning algorithms for predicting complications after primary total hip arthroplasty. Arch Orthop Trauma Surg 143, 2181–2188 (2023). https://doi.org/10.1007/s00402-022-04452-y

Received: 22 October 2021
Accepted: 15 April 2022
Published: 04 May 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s00402-022-04452-y
