Introduction

The increasing demand for high-quality, readily accessible health care has led health systems to devote more and more effort to finding ways of improving efficiency [1]. Many experts consider the Operating Room (OR) the financial center of the hospital, as it accounts for about 35% to 40% of costs [1, 2].

Complexity is one of the words that best describe the OR. High patient expectations, interactions between different professional figures, unpredictability and complex surgical case scheduling are just some of the elements that make it difficult to manage [3]. There have been attempts to apply industrial principles to increase efficiency, but the particular characteristics of the OR do not always make this easy. Analyzing the large amount of data generated by the operating block could yield interpretative models and precise predictions. This could lead to better use of resources, limit the waste of capital and optimize the system to deliver a better and safer service.

Machine Learning (ML) is a subtype of Artificial Intelligence that uses algorithms which learn iteratively from large amounts of data without being explicitly programmed to do so [4]. These algorithms can extract patterns from diverse data sources, explain them and build prediction models [5]. When analyzing huge amounts of data, these machines do not suffer from fatigue, loss of attention or careless mistakes [6].

The accelerating digitization of health care data, the growth in storage capacity and the application of powerful analysis systems will surely be fundamental to improving medical care, even in a context as complex as perioperative medicine.

In this systematic review, we analyzed how ML is applied in perioperative medicine. The goal was to understand whether and how these technologies can improve OR management by reducing costs and maximizing revenue and quality of care.

Methods

This review followed the PRISMA statement guidelines. It is the result of a collaboration between the Department of Management Engineering and the Anesthesiology, Critical Care and Pain Medicine Division of the University of Parma. The authors conducted a systematic search of the Scopus and PubMed databases and of other verified sources, specifically the Cochrane Library and MeSH. Franklin Dexter’s annotated bibliography on predicting operating room task durations was also included [7]. All relevant studies published between 2015 and February 2019 that used ML in the operating block were considered. The search string combined the terms “machine learning”, “anesthesia”, “perioperative”, “pacu”, “operating room”, “recovery room” and “robotic assisted surgery” in various ways.

We chose not to include studies published before 2015. This year was selected after a publication analysis performed together with our statisticians. In 2015 we observed the start of an exponential increase in research on ML in medicine. In previous years, ML was still mainly theoretical and used sporadically, with few applications in real trials.

The following were excluded: papers concerning children or animals, papers published before 2015 or after February 2019, abstracts, and studies written in languages other than English. All papers concerning the OR, anesthesia, the Recovery Room (RR) and the Post Anesthesia Care Unit (PACU) were included in the review.

The following search syntax was used on Scopus and PubMed, respectively:

  • (ABS "machine learning") AND (ABS (anesthesia) OR ABS (perioperative) OR ABS (PACU) OR ABS ("operating room") OR ABS ("recovery room") OR ABS ("robotic assisted surgery") AND (LIMIT-TO (PUBYEAR,2019)) OR LIMIT-TO (PUBYEAR,2018) OR LIMIT-TO (PUBYEAR,2017) OR LIMIT-TO (PUBYEAR,2016) OR LIMIT-TO (PUBYEAR,2015) AND (LIMIT-TO (LANGUAGE,"English"))

  • ((((((((((((Machine Learning) AND anesthesia) OR operating room) OR perioperative) OR pacu) OR robotic assisted surgery) OR recovery room) AND ("2015"[Date - Publication]: "2019/02/27"[Date - Publication])) AND English [Language]) AND ( ( Clinical Trial [ptyp] OR systematic [sb]) AND free full text [sb] AND Humans [Mesh] AND English [lang] AND adult [MeSH]))) AND Clinical Trial [ptyp] AND free full text [sb] AND Humans [Mesh] AND English [lang] AND adult [MeSH])

Results

The search yielded 932 results, excluding duplicates. In the first screening, we excluded all reviews and conference papers, which were considered not eligible for our research. This removed 906 records in total.

Of the 26 remaining full texts, we excluded all veterinary and pediatric studies, which left 22 studies. A further screening excluded 3 more studies that were not strictly related to ML applications.

In the final selection, 19 studies were included in the analysis [8-26]. Fig. 1 shows the PRISMA selection flowchart.
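
The screening counts above can be checked with a few lines of arithmetic. This is a minimal Python sketch. The text does not state the number of veterinary and pediatric exclusions (4); it is inferred here from the drop from 26 to 22.

```python
# Record counts at each PRISMA stage, as reported in the Results section.
retrieved = 932                 # search results, excluding duplicates
excluded_first_screening = 906  # reviews, conference papers and other ineligible records
excluded_vet_pediatric = 4      # inferred: 26 full texts -> 22 studies
excluded_not_ml = 3             # studies not strictly connected to ML

full_texts = retrieved - excluded_first_screening
after_population_filter = full_texts - excluded_vet_pediatric
included = after_population_filter - excluded_not_ml

print(full_texts, after_population_filter, included)  # 26 22 19
```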

Fig. 1. Literature search flow diagram based on PRISMA

The exponential increase in the number of studies over the last four years shows that the scientific community has only recently realized the power of these instruments. Many more studies on ML in the medical field are expected in the coming years (Fig. 2).

Fig. 2. Publications per year since 2015

Papers are summarized in Tables 1 and 2. Study characteristics relating to ML methods, populations, trial settings, variables and outcomes were extracted to build our comparison table. We then split this table. The first part isolates the papers strictly related to administration (5/19), which propose organizational or predictive models of the duration and cancellation of surgical cases [8,9,10,11,12]. The remaining studies (14/19) instead analyze outcomes that could be used as indirect parameters in OR management [13,14,15,16,17,18,19,20,21,22,23,24,25,26].

Table 1. Selected studies strictly pertinent to organizational models
Table 2. Selected studies analyzing variables that can be integrated into OR planning

Regarding the type of ML, only one study employed an unsupervised technique [21], and supervised techniques were by far the most used. The most frequently used algorithms were decision trees and random forests (ensembles of multiple decision trees) [10,12,13,14,19,20,26].

All studies showed a significant improvement in performance with ML compared with traditional models.

Discussion

From a purely managerial perspective, the analysis of the selected articles allowed us to make important observations on the potential of ML in the medical field and, in particular, in the OR.

Estimation of Surgical Case Duration

The excellent prediction results have made it possible to estimate the duration of many types of procedures. This could improve OR scheduling as a whole and the management of hospital resources.

An example of a prediction model in this context is the study by Tuwatananurak et al., in which a proprietary ML algorithm, leap Rail®, was used to estimate the duration of individual surgical cases [9]. The algorithm learned from a dataset of 15,000 surgical cases, divided into a training dataset and a test dataset. It was asked to predict the duration of each individual case. The algorithm was free to use any information in the training dataset and identified patterns in the data. The leap Rail® system built multiple models, each using a different algorithm to search for different patterns. Once the best model was identified and chosen (often a random forest), it was used to make future predictions. Performance was then measured on the test dataset, that is, on cases the machine had never seen. Of 1059 cases, the algorithm made predictions for 93.5% (990). The average difference between the predicted and the actual duration was 20 min with ML, compared with 27 min with traditional models. This was a statistically significant improvement of almost 7 min. Taking 15 min as the threshold for a clinically significant prediction error, predictive accuracy was 31.2% for the traditional method and 41.1% for leap Rail® (p < 0.0001). In addition, the use of this algorithm led to a 70% reduction in overall scheduling inaccuracy.
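
The two metrics used above can be sketched in Python. The first is the average absolute difference between predicted and actual duration. The second is the share of cases predicted within a 15-min threshold. This is an illustrative sketch, not the leap Rail® implementation. The function names and the five case durations are hypothetical. Counting an error of exactly 15 min as accurate (`<=`) is also an assumption.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference (minutes) between actual and predicted durations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def accuracy_within(actual, predicted, threshold_min=15):
    """Share of cases whose prediction error does not exceed the threshold (assumed <=)."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= threshold_min)
    return hits / len(actual)


# Hypothetical durations in minutes for five cases (not data from [9]).
actual = [120, 90, 60, 180, 45]
traditional = [100, 110, 75, 140, 60]
ml_model = [115, 95, 70, 165, 50]

print(mean_absolute_error(actual, traditional), accuracy_within(actual, traditional))  # 22.0 0.4
print(mean_absolute_error(actual, ml_model), accuracy_within(actual, ml_model))        # 8.0 1.0

# Coverage reported in the text: predictions were made for 990 of 1059 cases.
print(round(990 / 1059 * 100, 1))  # 93.5
```

The two metrics answer different questions. The average error measures how far off predictions are overall, while the threshold share measures how many cases a scheduler can trust.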

Another example is the application of these new technologies to robotic surgery. Although robotic surgery offers great advantages, it is associated with considerable costs [27]. Childers et al. estimated a cost per procedure of $3568, of which $1866 was for instruments and accessories [28]. In a retrospective analysis of abdominal surgery, Khorgami et al. found an average cost of $12,340 ± $5880 for robotic cases, compared with $10,227 ± $4986 for laparoscopic surgery (p < 0.001) [29]. To maximize profits, it is therefore essential to make the most of every robotic unit purchased. An important step toward improving efficiency could be accurate scheduling of surgical procedures, achieved by improving the accuracy of individual case duration predictions. Zhao et al. selected 500 consecutive robot-assisted surgical cases performed between 1 January 2014 and 30 June 2017 and analyzed 28 variables [8]. For comparison, a simple univariable linear regression model was used. Several supervised ML techniques, including random forests and neural networks, were then implemented, and their performance was measured with 10-fold cross-validation. The new models, in particular the boosted regression tree, raised case duration prediction accuracy from 34.9% to 51.7% (p < 0.001).
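
The 10-fold cross-validation used by Zhao et al. can be illustrated with a small, dependency-free Python sketch. It is not a reproduction of their study. The data are synthetic, with one hypothetical predictor (a booked duration) instead of 28 variables. A mean-only predictor and a univariable least-squares line stand in for the compared models, and a boosted regression tree is not implemented here. Each fold is held out once while the model is fitted on the other nine, and the held-out errors are averaged.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline: ignore the predictor and always return the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Univariable ordinary least squares: y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def cross_validated_mae(xs, ys, fit, k=10, seed=0):
    """Mean absolute error on held-out folds, averaged over the k folds."""
    folds = k_fold_indices(len(xs), k, seed)
    fold_errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(sum(abs(ys[i] - model(xs[i])) for i in test) / len(test))
    return sum(fold_errors) / len(fold_errors)


# Synthetic cases: hypothetical booked duration (min) and noisy actual duration.
rng = random.Random(42)
booked = [rng.uniform(60, 300) for _ in range(100)]
actual = [20 + 0.9 * x + rng.gauss(0, 10) for x in booked]

print(round(cross_validated_mae(booked, actual, fit_mean), 1))
print(round(cross_validated_mae(booked, actual, fit_linear), 1))
```

Fixing the seed keeps the fold assignment reproducible. Different models are therefore compared on exactly the same splits.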

However, reliable data sources and a thorough knowledge of the data are imperative in this context, as the study by Shahabikargar et al. shows [12]. Using data from the Gold Coast Hospital, the authors developed a predictive model of surgery duration. They showed that careful analysis of the data, with a filtering phase of surgical episodes, improved the overall performance of the random forest. Predictions proved even more accurate with ensemble methods.

PACU

According to Fairley et al., more than 30% of total health care costs are due to wasted time or space. Much of this waste could be eliminated by improving the organization and internal logistics of hospitals [10].

After surgery, patients are usually admitted to the PACU, the area used for recovery after anesthesia. Often, however, coordination between the OR and the PACU is deficient, and the PACU is frequently congested. When no PACU bed is available, the patient must remain in the OR until a bed becomes free, with the resulting costs (much higher than those of the PACU). Over 6 months, Fairley et al. estimated more than 20 h during which the PACU was full. This cost more than $44,000 in total and was primarily due to inadequate surgical planning [10]. Their study analyzed 5371 procedures, 4350 of which used the PACU for patient recovery. The ML model was built on historical data and then used to predict OR occupation, in terms of numbers and times, for new clinical cases. New schedules were then generated from the historical set and the data predicted by the ML model. With the previous OR schedule, before ML techniques were used, 480 min of PACU unavailability were observed. The new optimized schedule reduced this to 113 min, a 76% reduction, without reducing OR utilization [10].
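
A small Python sketch makes the cost driver described above concrete: patients held in the OR because every PACU bed is occupied. It is a simplification, not the model of Fairley et al. The function name, the first-come, first-served bed assignment, the two-bed PACU and all case times are hypothetical. Given predicted PACU stays, staggering OR end times can remove holds without changing the number of cases.

```python
import heapq


def pacu_hold_minutes(cases, beds):
    """Total minutes patients wait in the OR because all PACU beds are occupied.

    cases: (or_end_min, pacu_stay_min) pairs, served first come, first served.
    beds:  number of PACU beds (hypothetical).
    """
    bed_free_at = [0] * beds  # min-heap: time at which each bed next becomes free
    total_hold = 0
    for or_end, stay in sorted(cases):
        earliest_free = heapq.heappop(bed_free_at)
        admit = max(or_end, earliest_free)  # patient stays in the OR until a bed frees up
        total_hold += admit - or_end
        heapq.heappush(bed_free_at, admit + stay)
    return total_hold


# Hypothetical day: four 90-min PACU stays, two beds, times in minutes after midnight.
clustered = [(600, 90), (610, 90), (620, 90), (630, 90)]
staggered = [(600, 90), (610, 90), (690, 90), (700, 90)]

print(pacu_hold_minutes(clustered, beds=2))  # 140
print(pacu_hold_minutes(staggered, beds=2))  # 0

# Reduction reported in [10]: from 480 to 113 min of PACU unavailability.
print(round((480 - 113) / 480 * 100))  # 76
```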

These studies were chosen as the most representative of the managerial and economic advantages that ML techniques offer in hospital logistics [8,9,10,11,12]. By implementing these techniques, managers of healthcare organizations could make the most of the available resources. Gaining even a few minutes of prediction accuracy on individual cases translates into savings of tens of thousands of euros each quarter. The cost of an OR minute has been estimated at between $22 and $133 [9]. A better perception of the duration of individual cases allows managers to schedule more effectively. Distributing procedures across days is a difficult task that can only be based on predictions, so their accuracy is obviously extremely important. A hospital that makes the most of its space, resources and time offers better care at lower cost and with higher profits [10].
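
The claim that a few minutes per case add up to tens of thousands each quarter is simple arithmetic. The sketch below works it through using the per-minute OR cost range from [9]. Three inputs are hypothetical: the case volume, the 2 min saved per case, and the assumption that each saved minute frees one paid OR minute. The result is in dollars, like the cost range it uses.

```python
def quarterly_savings(minutes_saved_per_case, cases_per_quarter, cost_per_or_minute):
    """Cost avoided, assuming each saved minute frees one paid OR minute (hypothetical)."""
    return minutes_saved_per_case * cases_per_quarter * cost_per_or_minute


# Hypothetical volume: 300 cases per quarter, 2 min saved per case.
low = quarterly_savings(2, 300, 22)    # lower bound of OR cost per minute [9]
high = quarterly_savings(2, 300, 133)  # upper bound of OR cost per minute [9]
print(low, high)  # 13200 79800
```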

Surgical Cases Cancellation Detection

Day-of-surgery cancellations remain one of the major causes of inefficient use of OR time and a waste of limited health care resources. The average cost of a surgery cancellation is $336, and some surgeries cost more to cancel than others. Neurosurgical procedures had the highest cancellation cost ($619 per cancelled case), and ENT procedures had the lowest ($215) [30].

In everyday practice, identifying surgeries at high risk of cancellation is difficult but fundamental. It calls for automatic classification methods and techniques that can detect such surgeries in large databases.

According to Luo et al., the global cancellation rate (CR) generally ranges from 4.65% to 30.3%. ML algorithms, and in particular random forests, made it possible to identify surgeries at high risk of cancellation, giving managers a new tool. The results indicate that such surgeries can be identified effectively and with stable performance. Surgical managers could apply these technologies to plan preventive measures that reduce the CR. As mentioned earlier, a lower CR would lead to higher utilization of institutional resources such as ORs, improving the cost efficiency of the healthcare system [11].
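
The plain-Python sketch below shows how a random forest turns many weak trees into a cancellation-risk score. It is a heavily simplified stand-in, not the model of Luo et al. Each tree is a one-split decision stump grown on a bootstrap sample and restricted to one randomly chosen feature. The risk score is the share of trees voting "cancelled". The feature names (days on the waiting list, previous cancellations, ASA class) and all training rows are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels, default):
    """Most frequent label, or the default when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, feature):
    """Single split on one feature, chosen by the lowest weighted Gini impurity."""
    default = majority(y, 0)
    best = None
    for t in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, t, majority(left, default), majority(right, default))
    _, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagged stumps: bootstrap the rows and pick a random feature for each tree."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(X)) for _ in range(len(X))]
        feature = rng.randrange(len(X[0]))
        trees.append(fit_stump([X[i] for i in sample], [y[i] for i in sample], feature))
    return trees


def cancellation_risk(trees, row):
    """Share of trees voting 'cancelled' (label 1)."""
    return sum(tree(row) for tree in trees) / len(trees)


# Hypothetical rows: [days on waiting list, previous cancellations, ASA class]
X = [[90, 2, 4], [120, 3, 3], [100, 1, 4], [80, 2, 3],
     [10, 0, 1], [20, 0, 2], [15, 0, 1], [30, 0, 2]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = cancelled on the day of surgery

forest = fit_forest(X, y)
print(cancellation_risk(forest, [110, 2, 4]))
print(cancellation_risk(forest, [5, 0, 1]))
```

In practice, surgeries whose score exceeds a chosen cut-off would be flagged for preventive action. The cut-off is a local policy choice.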

Other Variables

This systematic review also identified other variables studied with ML that could be used indirectly to improve surgical case planning. In a study published in 2016, a Multiple Criteria Decision Analysis (MCDA) method was successfully used to assess the American Society of Anesthesiologists (ASA) physical status score, using a database of 898 patients [25]. Accurate risk stratification is not only extremely important in a clinical setting but also essential from an organizational point of view. A complex patient often requires extra attention during anesthesia maneuvers and more extensive monitoring. Wu et al. analyzed the factors that prolong anesthetic induction time and found, not surprisingly, that an ASA score ≥ III is among them [31]. Introducing this information into the scheduling system could therefore increase its accuracy. A similar example is the Alex Difficult Laryngoscopy Software (ADLS), designed to predict difficult laryngoscopy, with a positive predictive value of 76% and a negative predictive value of 76% [23].
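
Positive and negative predictive values are simple ratios of confusion-matrix counts: PPV = TP / (TP + FP) and NPV = TN / (TN + FN). The Python sketch below computes both. The counts are hypothetical, chosen only to show how both values can equal 76%; they are not the ADLS validation data.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts."""
    ppv = tp / (tp + fp)  # share of predicted difficult laryngoscopies that were difficult
    npv = tn / (tn + fn)  # share of predicted easy laryngoscopies that were easy
    return ppv, npv


# Hypothetical counts, not from [23].
print(predictive_values(tp=38, fp=12, tn=76, fn=24))  # (0.76, 0.76)
```

Unlike sensitivity and specificity, both values depend on how common difficult laryngoscopy is in the population being tested.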

A further application is the use of ML during perioperative procedures. New technologies have been tested to guide the needle during anesthetic maneuvers. Hadjerci et al. presented the first automatic system that identifies the anatomical target and provides the needle insertion trajectory during ultrasound-guided regional anesthesia [24]. In another study, a hybrid machine learning method was proposed to assist operators during ultrasound-guided epidural injections [21]. These advanced artificial intelligence methods could increase the accuracy of the maneuvers, reduce possible complications and, at the same time, speed up the procedure. In the foreseeable future, whether, how and when they are used should become a factor to consider during planning.

Statistics and Machine Learning Algorithms

Before the advent of ML, all predictions were made using statistical methods alone. These methods must still be considered a valid and strong instrument [32, 33].

Statistical forecasting originates in classical statistics, whereas ML originates in computer science. Although the two approaches share common properties, each has its own peculiarities. Choong et al. summarized the main differences between classical statistical analysis and medical big data analysis well. The former requires a priori assumptions, whereas in the latter the system learns from data, generating hypotheses and identifying predictive patterns [5, 34].

A specific question in the literature concerns linear regression. In a recent systematic review, the authors summarized the experts' positions on this issue [35]. Basic regression models rely on assumptions and benefit from human intervention, so they would not fit into the narrow definition of ML. Examples of this interpretation are emerging in medical studies [35, 36].

There are different types of ML. Classically, they are divided into supervised and unsupervised, depending on whether or not the expected output is established a priori [5]. In our analysis, with the exception of one study [21], the main type of ML used was supervised, and among supervised methods, decision trees and random forests were the most exploited. One goal is to help hospital managers better understand the principle behind the classification method. Decision trees combine good forecasting capacity with an easily interpretable output that hospital managers can read and understand, something techniques such as neural networks cannot provide. This is probably the main reason they were preferred. Understanding how these tools work is certainly essential to interface with these new technologies and exploit all their capabilities. Obtaining certifications in information technology and big data management could therefore be crucial. Such certifications would also help overcome barriers and possible mistrust from the competent authorities and administrations [37].
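
The interpretability argument can be illustrated by flattening a decision tree into plain IF ... THEN rules that a manager can read. The Python sketch below uses a small, hand-built tree. Its features, thresholds and leaf labels are hypothetical, loosely inspired by the ASA and difficult-airway examples discussed earlier. It is not a model from any of the reviewed studies.

```python
def tree_to_rules(node, conditions=()):
    """Flatten a nested-dict decision tree into readable IF ... THEN ... rules."""
    if "leaf" in node:
        premise = " AND ".join(conditions) or "always"
        return [f"IF {premise} THEN {node['leaf']}"]
    feature, threshold = node["feature"], node["threshold"]
    left = tree_to_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    right = tree_to_rules(node["right"], conditions + (f"{feature} > {threshold}",))
    return left + right


# Hypothetical tree for planning the anesthesia induction slot.
tree = {
    "feature": "asa_class", "threshold": 2,
    "left": {"leaf": "standard induction slot"},
    "right": {
        "feature": "difficult_airway_predicted", "threshold": 0,
        "left": {"leaf": "extended induction slot"},
        "right": {"leaf": "extended slot with difficult-airway setup"},
    },
}

for rule in tree_to_rules(tree):
    print(rule)
```

Each path from the root to a leaf becomes one rule. The first line printed is "IF asa_class <= 2 THEN standard induction slot", which a manager can check without any knowledge of the fitting procedure.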

Although these technologies have great potential in OR organization, their applicability in this area currently has limits. One, already mentioned, is the certification of healthcare professionals who use these systems. Another is their use for new procedures [38]. As shown in the tables, ML requires a sufficient amount of data to produce valid results. This means it can analyze an immense number of variables and data. When that amount is not reached, however, as may happen with estimates based on only a few cases, the results may not be significant.

Limitations

The study includes a good number of articles, and both Scopus and PubMed are complete and structured databases. Even so, a clear limitation of this study is that it was carried out on just two databases and a few other sources. Moreover, when building the tables, it was not always possible to obtain the data necessary for the analysis, because the data had not been shared.

In addition, heterogeneity between studies, for instance in outcome definitions, analysis methods and study endpoints, precluded a meta-analysis.

Conclusion

ML models have huge potential to improve hospital medical services. They make it possible to perform precise perioperative risk assessment or to better anticipate the recovery time each patient needs. Medical staff can then develop different, personalized services for patients, increasing the safety and quality of the perioperative period. From an administrative and managerial perspective, ML systems allow accurate prediction of the time of use of the most expensive structures, such as the OR and PACU, on which most of the profits depend. However, further studies are needed to assess the actual role of these new technologies in perioperative medicine and OR management.

Authors’ Contributions

The manuscript has been read, reviewed and approved by all of the authors. The authors take responsibility for and agree with the data presented.