Introduction

The concept of «artificial intelligence (AI)» was introduced by Prof. J. McCarthy at Dartmouth College as early as 1956. Artificial intelligence is defined as a field of computer science that designs systems to perform tasks that typically require human intelligence [1, 2]. Today, AI has already infiltrated numerous aspects of our lives, from search engines, spam filters, and translation software to autonomous vehicles, among others. Impressive progress has been made in AI in recent years, driven by exponential increases in computing power, following Moore’s law, and virtually limitless data storage, along with the connection of billions of individuals through mobile devices.

Terminology related to «AI» is complex and can be a source of confusion for non-expert readers. Machine learning (ML) is a subdivision of AI in which computers learn from data without being explicitly programmed [2]. Deep learning (DL) is a class of ML algorithms that uses multiple layers of nodes, called neurons, to progressively extract higher-level features from the raw input [2, 3]. The depth of the network is reportedly a critical component for good performance. Most modern deep learning models are based on artificial neural networks (ANNs), inspired by the organization of the human brain. An enormous variety of deep neural network architectures has been designed, such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs). CNNs transform the input data using spatial filters that perform convolutional operations [4] and are particularly suitable for image processing (Fig. 1). RNNs are well suited for sequential data such as signals, speech, and language.

Fig. 1

Simplified view of a convolutional neural network (CNN) applied to a standard AP X-ray of a THA, with two 3 by 3 convolutions extracting n1 and then n2 local features, one 2 by 2 pooling layer reducing the complexity, and a final decision layer. This is a classical architecture for image classification.
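To make the convolution and pooling steps of such an architecture more concrete, the following is a minimal sketch in plain Python. It is not the network shown in Fig. 1: the 6 by 6 toy image, the hand-written edge filter, and the function names (`conv2d_valid`, `relu`, `max_pool_2x2`) are illustrative assumptions, and a trained CNN would learn its filter weights from data rather than use fixed ones.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Slide the kernel over the image (no padding, stride 1) and sum the products.

    As in most deep learning libraries, the kernel is not flipped
    (strictly speaking this is a cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map: Image) -> Image:
    """Set negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map: Image) -> Image:
    """Keep the largest value in each non-overlapping 2 by 2 block."""
    h, w = len(feature_map) // 2, len(feature_map[0]) // 2
    return [
        [max(feature_map[2 * i][2 * j], feature_map[2 * i][2 * j + 1],
             feature_map[2 * i + 1][2 * j], feature_map[2 * i + 1][2 * j + 1])
         for j in range(w)]
        for i in range(h)
    ]


# Hypothetical 6x6 "image": dark left half, bright right half (a vertical edge).
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
# Hand-written 3x3 vertical-edge filter (an assumption for illustration).
vertical_edge = [[-1.0, 0.0, 1.0] for _ in range(3)]

features = relu(conv2d_valid(image, vertical_edge))  # 4x4 feature map
pooled = max_pool_2x2(features)                      # 2x2 map after pooling
print(features[0], pooled)
```

The filter responds only where the dark-to-bright transition falls under it, and pooling then halves each spatial dimension while keeping the strongest response. In a real network, the pooled maps would be passed to further layers and finally to a decision layer.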

In the healthcare field, two broad categories of learning strategies are employed, namely, « supervised learning » and « unsupervised learning » [5]. Supervised ML is based on algorithms designed to learn by example. Put another way, supervised ML is the process of teaching a model by supplying it with corresponding input and output data. These input/output pairs are usually referred to as « labeled data ». The process consists of two phases. The first is training, in which the output is available to the learning model, allowing the model to fit the data. During this phase, a validation step is performed regularly to monitor how well the extracted knowledge generalizes. The second phase is testing, in which labels are hidden from the model; it measures the final performance of the learned system. The performance of classification models is quantified in confusion matrices that distinguish false-positive and false-negative errors from true-positive and true-negative predictions [6]. Supervised ML optimizes a performance criterion using past examples. It is suited to analyzing and classifying data from different sources, such as medical images and clinical features. Based on the latter, Oh et al. [7] evaluated the performance of different ML classifiers in predicting pathological femoral fractures in patients with lung cancer. Also, Ashinsky et al. [8] showed that ML was effective in predicting the progression to symptomatic osteoarthritis (OA) from T2-weighted in vivo MRI knee images. In contrast, unsupervised learning is the training of a machine using information that is neither classified nor labeled. In this case, the algorithms discover hidden patterns or data groupings without the need for teaching. Unsupervised ML models are used mainly to group unlabeled data based on their similarities or differences (clustering) or to find relationships between variables in a given dataset (association). Features engineered by unsupervised learning may also be incorporated into supervised learning models. For example, an unsupervised ML technique has revealed the presence of 3 distinct subtypes of type 2 diabetes, by analyzing electronic health records in combination with genetic data [9].
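As an illustration of how a confusion matrix summarizes the test phase of a binary classifier, the short sketch below counts the four outcome types and derives sensitivity, specificity, and accuracy. The labels are made up for the example (1 = disease present, 0 = absent) and do not come from any of the cited studies; the function names are likewise hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def summary_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Derive the usual summary measures (assumes both classes are present)."""
    return {
        "sensitivity": c["tp"] / (c["tp"] + c["fn"]),  # share of true cases detected
        "specificity": c["tn"] / (c["tn"] + c["fp"]),  # share of healthy cases cleared
        "accuracy": (c["tp"] + c["tn"]) / sum(c.values()),
    }


# Made-up test-set labels and model predictions, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = summary_metrics(counts)
print(counts, metrics)
```

Reporting sensitivity and specificity separately matters in a clinical setting, because a missed diagnosis (false negative) and a false alarm (false positive) usually carry different costs, which a single accuracy figure hides.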

ML technologies have been evaluated in various fields of medicine such as radiology, ophthalmology, dermatology, and cardiology. They were shown to perform as well as, or even outperform, human specialists or traditional logistic regression models [10, 11]. In the orthopaedic field, AI and ML are still in early development. Thus far, their applications in clinical practice are rather limited, the most common being imaging-based diagnosis [12, 13] and the advancement of value-based care [14, 15]. Automated image processing systems used to classify fractures, notably pretrained CNN systems, have been reported to perform as well as or better than clinicians [13]. Also, ML models appeared to consistently improve the performance of clinicians in detecting radiological abnormalities for wrist fractures [16], scaphoid fractures [17], and calcaneus fractures [18], and for ACL tears and meniscal tears on knee MRI examinations [19, 20]. These systems would be helpful in the automated detection, classification, and prediction of osteoarticular pathologies, thereby providing efficient assistance to both general radiologists and non-radiologist clinicians.

Given the exponential growth of information in the field, we aimed to provide an update on the specific role of AI/ML in hip and knee reconstruction surgery through a literature review and an analysis of the most relevant papers.

Hip and knee degenerative disease

The spread of picture archiving and communication systems (PACSs) in our centers has made the large-scale analysis of medical images by automated systems easier. Different applications of ML, from diagnostic screening to prediction of degenerative articular disease progression, have been proposed. Image analysis by an AI system takes several steps that differ from the human approach. A human observer is able to see an entire image as a single object and to recognize specific features, such as osteophytes or joint space narrowing, to make a diagnosis of OA from a standard radiograph. Conversely, the computer has to collect data pixel by pixel, resulting in a definition of the texture of each pertinent structure. For OA, the most important features, such as pixel variation around articular bone (sclerosis) and the area of mean pixel intensity in the joint space, are selected according to a predefined model. Finally, the algorithm is tested on its ability to separate healthy from OA radiographs. This relatively simple approach to the diagnosis of OA on standard radiographs introduces the concept of « fitting » the data with a good model to obtain the best results. Xue et al. [21] trained and tested a previously validated deep CNN with 420 hip X-ray images to detect hip OA. The X-ray images were labeled by two experienced physicians and separated into « normal » or « OA », according to the presence of osteophytes. The CNN model achieved a sensitivity of 95% and a specificity of 90.7% in the detection of hip OA, similar to those of experienced physicians.

Deep learning systems are well suited for the diagnosis of osteoarticular diseases, based on the interpretation of medical images (Fig. 2) combined with data from other sources, such as medical reports. Ashinsky et al. [8] developed an ML algorithm to predict symptomatic knee OA progression based on MRI. Specifically, they trained an automated system to classify T2-weighted MRI images of the medial femoral condyle in 68 subjects. In this study, image classification was based on inherent image texture and intensity information, rather than measurements such as cartilage volume or thickness, in association with a nonlinear image registration. In correlation with the WOMAC score, a severity index of the condition of patients with hip or knee OA, the authors found the system was able to classify T2 images of cartilage to predict the development of clinical OA with 75% accuracy. Pedoia et al. [22] used a regression method to model articular cartilage degeneration in order to predict OA progression. For this purpose, the model was supplied with a combination of clinical, biomechanical, and MRI data, integrated within a topological data analysis (TDA) and visualization framework. Subjects with similar profiles could thus be grouped into clusters, allowing the system to model progression of OA. The sensitivity and specificity of the model were 91.1% and 86.8%, respectively. These multimodal approaches are probably the most promising applications of ML in medicine. In the near future, image processing AI systems are very likely to assist orthopaedic surgeons in the diagnosis of OA, specifically in its early radiographic stages.

Fig. 2

a Automatic detection of a cartilage lesion of the lateral femoral condyle of a left knee by a deep learning system, as seen on coronal and sagittal MRI views. The model had been previously trained on MRI images annotated by a bounding box technique. This technique assists in the preparation of an algorithm to identify various types of articular lesions. b Automatic segmentation of knee structures (bone, cartilage, meniscus) from MRI images. This technique allows for fast and reproducible extraction of the cartilage volume of the knee joint. Courtesy of Incepto-Medical, Paris, France

Identification of hip and knee implants

The development of automated implant identification systems is desirable in several aspects of orthopaedic surgical practice. Identifying implants is recommended, if not mandatory, when a revision hip or knee arthroplasty is planned. Implant identification is needed to prevent unnecessary component removal, or to prepare the bearing options. It is an essential preparatory step in preventing delays in care, perioperative morbidity, and associated costs. Nowadays, this task is commonly achieved using the patient’s X-rays. Alternatively, the orthopaedic surgeon may refer to the hospital operative records, office records, operative dictation records, and implant sheets/labels. However, these traditional methods fail to identify the device pre-operatively in up to 10% of cases [23, 24]. Also, this information may be missing or unavailable in emergency situations. An image-based ML program for implant identification may offer an opportunity to mitigate delays in care and associated morbidity and costs. Several teams have developed machine learning-based programs to automatically detect and identify hip and knee implants on standard X-rays [25, 26]. In this context, most studies used a supervised ML technique, based on samples of 170 to 1972 images, and required 49 to 1000 epochs (the number of complete passes through the training dataset) (Table 1). The number of different implant designs employed to train the models ranged from two to 29. Overall, these studies showed the effectiveness of appropriately trained CNNs in identifying hip and knee prosthetic implants on standard X-rays, with a reported accuracy > 99% (Fig. 3). The distinction between specific arthroplasty anatomic designs represents a slightly more difficult task for automated systems, but can still be achieved accurately [27]. ML/DL algorithms have a strong potential to identify, integrate, and analyze features within numerous dimensions that may not be apparent to humans. For instance, Borjali et al. [28] found a DL system was able to « learn » to identify the design of 3 types of stems by « looking » at the tip of the stem, without being programmed to look specifically at this region. In the future, systems should also provide technical implant information such as taper size, stem size, and angle to make surgical planning easier and more accurate.

Table 1. Summary of studies evaluating automated knee/hip implant identification systems based on standard X-ray
Fig. 3

The different steps of training, validation, and testing of a neural network (e.g. convolutional neural network) to identify the implant manufacturer and model are presented (after [25]). From left to right, the model is trained on a set of labeled X-ray images, which have been previously annotated and augmented by cropping, zooming in, upscaling, adding noise, flipping horizontally, etc. The model is supplied with this training dataset and validated with an external image set through several passes (the epochs). Once the model has been validated, the algorithm is tested on never-before-seen X-ray images, which are then classified to evaluate the system’s performance in real conditions. The result is expressed as a prediction interval. CNN = convolutional neural network
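The data preparation described in this caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline of the cited studies: images are tiny nested lists of pixel intensities in [0, 1], the 70/15/15 split and the noise level are arbitrary choices, only two of the listed augmentations (horizontal flip and added noise) are shown, and all function names are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[float]]
Sample = Tuple[Image, int]  # (pixel grid, implant class label)


def flip_horizontal(img: Image) -> Image:
    """Mirror each pixel row left to right."""
    return [list(reversed(row)) for row in img]


def add_noise(img: Image, rng: random.Random, sigma: float = 0.05) -> Image:
    """Add Gaussian noise to every pixel and clip the result to [0, 1]."""
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in img]


def split_dataset(samples: List[Sample], rng: random.Random,
                  train_pct: int = 70, val_pct: int = 15):
    """Shuffle once, then cut into training, validation, and test subsets."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def augment(samples: List[Sample], rng: random.Random) -> List[Sample]:
    """Return each original plus a flipped and a noisy copy, labels unchanged."""
    out: List[Sample] = []
    for img, label in samples:
        out.append((img, label))
        out.append((flip_horizontal(img), label))
        out.append((add_noise(img, rng), label))
    return out


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Twenty made-up 2x2 "radiographs" with alternating class labels.
samples = [([[i / 20, 0.5], [0.2, 0.8]], i % 2) for i in range(20)]

train, val, test = split_dataset(samples, rng)
train_aug = augment(train, rng)  # only the training subset is augmented
print(len(train), len(val), len(test), len(train_aug))
```

Splitting before augmenting is a deliberate design choice in this sketch: if augmented copies of the same radiograph ended up in both the training and the test subsets, the measured test performance would be optimistically biased.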

These studies raise the potential of AI/ML image analysis systems for more difficult tasks, such as the identification of post-operative complications, including periprosthetic osteolysis and osteoporosis, and, eventually, the evaluation of the risk of implant loosening or periprosthetic fracture. Shah et al. [29] tested four different publicly available CNN architectures (such as ResNet, AlexNet) to determine whether a knee or hip prosthesis implant was loose or well-fixed. Pre-operative radiographs from 697 patients undergoing a revision total hip arthroplasty (THA) or total knee arthroplasty (TKA) were obtained from the local PACS. The findings of fixed or loose implants at surgery were considered the gold-standard diagnosis of fixation. The performance of the different pretrained models ranged from 88.2 to 95.3% on the validation sets, with DenseNet being the best performing model. Interestingly, when historical patient data were combined with radiological images, overall accuracy, sensitivity, and specificity reached 88.3%, 70.2%, and 95.6% on the test dataset, respectively, with the model performing better in THA revision than in TKA revision. This study illustrates the value of using pretrained and publicly available CNN architectures to develop ML algorithms with limited computing resources. It will also help to guide future efforts in using automated systems to predict complications of orthopaedic implants, with the ability to analyze both images and data. Last, one potential application for automated implant detection systems would be to help collect imaging data for arthroplasty registries.

Prediction of patient outcome after total joint arthroplasty (TJA)

Clinical outcome

Although ML can provide reliable support to the clinician in medical image analysis, one of its most promising aspects in the health system lies in the prediction of clinical outcome. Being able to risk-stratify a patient scheduled for a hip or knee arthroplasty could help surgeons to give clear pre-operative information, monitor their patients closely, and intervene sooner in case of complication. In addition, a clinically meaningful system would ideally help to predict length of hospital stay and associated costs. Contrary to classical statistical techniques, ML systems are implicitly able to perform variable selection and weighting among a large pool of available variables. Therefore, the analysis of large databases, such as national joint arthroplasty registries, using ML approaches constitutes a unique opportunity to move to the next step of personalized care and, eventually, to develop value-based payment models. For example, naïve Bayesian models were found to have excellent capacity to predict length of hospital stay based on individual factors in patients undergoing total knee or total hip arthroplasty [14, 15]. The development of algorithms using pre-operative patient-specific comorbidity data constitutes an innovative solution to reliably predict post-operative complications and associated expenditure in lower extremity arthroplasty patients [30]. Using an ML approach known as Logic Forest, Hyer et al. [31] evaluated the impact of pre-operative risk factors on the utilization of health care resources in the year following elective surgery, such as TKA or THA. Out of more than one million patients included in this study, 4.8% incurred almost 32% of the expenditures postoperatively. The ML system identified hemiplegia/paraplegia, weight loss, congestive heart failure with chronic kidney disease as the main predictive factors of health care use in the year following surgery. In an attempt to predict early clinical outcome in TJA patients, Bini et al. [32] evaluated a system combining ML with individual data collected from a commercially available wearable device. They showed the ML algorithm could accurately predict the six-week patient-reported outcome measure (PROM) data as early as 11 days following a total hip or knee arthroplasty. Fontana et al. [33] trained three ML algorithms (logistic least absolute shrinkage and selection operator (LASSO), random forest, a type of ML classification algorithm consisting of many decision trees, and linear support vector machine) to predict changes in PROMs, such as SF-36, Hip Disability and Osteoarthritis Outcome Score (HOOS) or KOOS, two years after a total knee or total hip replacement. The authors found the three ML models performed equally well for a given PROM. Also, they reported a consistent improvement in algorithm predictive capability when the information was obtained before surgery as opposed to before decision. The authors warned of the risk of applying their model to any sample without a first step of sample-specific calibration. These results illustrate the potential of ML systems to improve clinical decision-making and patient care by helping clinicians identify, in advance, patients who are less likely to achieve meaningful clinical improvements after TJA.

Post-operative complications

Complications and unplanned hospital readmissions following TJA impose considerable burdens on the health care system. General surgical risk prediction models lack accuracy for specific procedures, such as THA and TKA [34]. TJA-specific pre-operative risk prediction models, such as the American Joint Replacement Registry Risk Calculator, which estimates risk for 90-day mortality and 2-year prosthetic joint infection, also have substantial limitations. Specifically, the performance of the latter was found to be poor in an external validation study, notably for 90-day mortality [35]. An ML strategy is an opportunity to develop and validate specific prediction models for mortality and major complications after elective TJA. For example, an ML model has been found to predict one-year mortality following hip fracture better than a logistic model [36]. An ML regression strategy, namely LASSO (least absolute shrinkage and selection operator) regression, was used by Harris et al. [37] to select and classify important variables to predict mortality and major complications after primary THA or TKA. The model was developed and internally validated on a database of more than 100,000 THA/TKAs. The model had good accuracy in predicting the risk of 30-day mortality and of renal or cardiac complications. When tested on a cohort of more than 70,000 cases (external validation), the model was found to be robust in its predictions of mortality and cardiac complications, but not of renal complications. The authors underlined the difficulty of transposing ML algorithms trained in a specific context to the real world. Although a number of the reported ML-based tools are preliminary, some of them are already available for clinical practice in orthopaedics, such as specific risk calculators following THA/TKA available online [38].
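The way LASSO performs variable selection can be illustrated with a small sketch. LASSO adds a penalty proportional to the sum of the absolute values of the coefficients, which drives the coefficients of weak predictors exactly to zero. The example below assumes a plain least-squares LASSO fitted by coordinate descent on four made-up observations; it is not the model of Harris et al., who worked with clinical outcomes on a far larger dataset, and the function names and the penalty value are hypothetical.

```python
from typing import List


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam; return exactly 0.0 when |rho| <= lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float],
                             lam: float, n_iter: int = 100) -> List[float]:
    """Minimize (1/2n) * sum((y - Xw)^2) + lam * sum(|w|), one coefficient at a time.

    Assumes no intercept and that no column of X is entirely zero.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w


# Made-up data: the outcome depends strongly on feature 0, weakly on feature 1.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.1, -2.9, 2.9, -3.1]  # equals 3.0 * feature 0 + 0.1 * feature 1

weights = lasso_coordinate_descent(X, y, lam=0.5)
print(weights)
```

With this penalty, the weak predictor is removed from the model entirely (its coefficient is exactly zero), while the strong predictor is kept with a coefficient shrunk from 3.0 toward zero. A larger penalty removes more variables, which is why the penalty strength is usually tuned on validation data.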

When it comes to predicting the success or failure of treatment, ML has the ability to find patterns within considerable amounts of patient data. Specifically, Shohat et al. [39] used a random forest analysis model to predict outcome following irrigation and debridement for periprosthetic joint infection (PJI). Data were collected retrospectively from 1,174 revision THAs and TKAs, and 52 variables were analyzed. Random forest analysis identified ten important factors associated with failure, including higher CRP levels, positive blood cultures, indication for index surgery other than OA, and absence of modular component exchange. Interestingly, the failure predictions of the random forest algorithm showed a high level of agreement with observed outcomes.

Patient monitoring

Last, the widespread use of mobile devices in the general population has made communication simpler and now makes remote monitoring of patients possible. Such technologies offer the possibility of collecting data related to physical activity and post-operative rehabilitation in real time. Above all, a remote patient monitoring system may help surgeons to detect patients who are not progressing as expected and, eventually, lead them to modify exercise programs or increase the frequency of clinic visits [32].

Conclusion

Through this literature review, we aimed to report the potential applications of ML/DL in the field of hip and knee prosthetic reconstruction. Ideally, ML algorithms should help physicians to make pre-operative diagnosis more reliable, predict post-operative outcome, improve the detection of post-operative complications, and, ultimately, determine the optimal therapeutic strategy. AI/ML systems have the capacity to leverage data from various sources, such as medical imaging, clinical records, patient monitoring systems, and arthroplasty registries; machine learning models have simplified the combination of observations from all of these sources to provide reliable information. Specifically, studies have shown the accuracy of AI systems in the detection and prediction of progression of osteoarticular degenerative diseases, in the identification of hip and knee implant models, in the prediction of function and complications after THA/TKA, and in the evaluation of costs related to these procedures. Interestingly, these studies have shown that machines are able to mimic and, in many instances, surpass the human observer, and have the potential to make accurate predictions, notably of the outcome after treatment in a given individual.

A number of challenges are emerging with the gradual introduction of artificial intelligence-based methods in clinical medicine. The long-term effects of AI on the healthcare field are uncertain. Although most studies use publicly available pretrained models, one major limitation of AI/ML development is the need for enormous amounts of medical data, which are difficult to find and share. Another, related limitation is the difficulty of reliably transposing results from one medical centre to another. More generally, AI lacks several features of the human mind, such as « commonsense reasoning », empathy, compassion, emotion, creativity, judgement, and responsibility. Therefore, even well-trained models may lead to errors with potentially harmful consequences for the patient. This aspect illustrates the need for us, as clinicians, to stay in control of these processes.

In the near future, AI/ML will probably provide the orthopaedic surgeon with key tools in an increasingly data-driven and data-dependent world. As the amount of patient-related data continues to grow, it is becoming evident that medical decisions will increasingly have recourse to AI/ML. The latter will need to be incorporated into daily practice, with the help of automated algorithms. Also, it is probable that advanced ML systems will overcome the problem of missing data. Advances in unsupervised learning will enable far greater characterization of patients’ risk factors for complications or failure following hip or knee reconstruction. Ultimately, this will lead to better surgical technique selection, improved outcomes, and lower healthcare costs. Internal and external validation is essential to transpose prediction models into real life and to generalize them. There is still a need for evidence obtained from large cohorts, randomized controlled studies, and external validation before AI and ML can be used as daily tools in orthopaedics.