Strategies to maximize efficiency in the operating room (OR) have been reported extensively in the literature.1-4 Efficient OR utilization must account for the cost of both underutilized and overutilized OR hours.1,5 From the accounting perspective, the staffing expense during scheduled hours is a sunk cost, so the savings for finishing cases early is effectively zero.6 Therefore, OR efficiency has two competing priorities, i.e., using all available time to perform cases and controlling overutilization.2,3,6,7 As the duration of cases increases (for example cardiac surgery), this effect is heightened; both underutilization and overutilization in any one room can exceed three hours.

Table. Average surgical time and standard deviation for the 13 categories of surgery (time in hours). The table also shows the lognormal parameters and Kolmogorov-Smirnov P values for generic surgical times.

Overutilization is defined by the ratio of non-budgeted time to the total budgeted OR time for a series of surgical cases.1 Regardless of who pays, a case cost analysis showed that it is more cost effective to proceed rather than to postpone the surgery, suggesting that a policy of “zero tolerance for overtime” may be too rigid.8 The challenge is to book surgical cases to reduce the probability of requiring overtime.2

The causes of overutilization are multifactorial and include a delayed start of the first case, add-on urgent cases, longer-than-expected case durations due to the random nature of surgical times, and a mismatch between OR workload and allocated OR time.7,9,10 The Operating Room Coordinator plays a pivotal role. Risk-averse coordinators are cost-effective since they minimize unused OR capacity without increasing the risk of overtime.11

Overutilization in a single OR can be minimized if we assume that: 1) the first case starts on time; 2) there are no add-on cases; 3) allocated OR time was calculated to match the typical OR workload; and 4) we can accurately determine the time required to complete each case.12 For lengthy cardiac surgical procedures, the start times and the ability to add on cases can be controlled easily through operational policy. Often, the allocated OR time is historical and based on nursing shift patterns rather than on OR workload.

The estimation of the time to complete a series of surgical cases in a single OR is challenging due to the variability of both surgical procedures and turnover times. Scheduling based on the surgeons’ own estimated operating times is a popular technique; however, on average, surgeons tend to underestimate the time needed to perform a procedure and the time required to fit cases into the time available.13-15 Although, epidemiologically, this practice accounts for the majority of underestimations of OR times, the bias can be corrected through choices made on the day of surgery.16,17

Operating room schedulers were twice as likely to underestimate as to overestimate case durations, supporting the use of a surgeon-specific database of mean procedure times to assist scheduling.18 Historical data combined with the surgeons’ own estimates improved the schedule time estimates.19 A computerized scheduling system providing the surgeons with mean procedure durations improved the relationship between expected and observed schedule times. A linear prediction model that combined objective factors with the surgeons’ estimates of operative times was a strong predictor of total OR time.20

A prospective trial of thoracic and spinal surgical cases was carried out to explore the use of the trimmed mean duration of the final ten cases combined with an updated surgeon case duration. The results would not improve OR management decisions substantively unless a significant change to the procedure, surgical approach, or anesthetic technique was required.21

Historical data from the sum of the mean case durations and turnover times can be used to derive the time to complete a series of surgical cases in a single OR.2,6,22

It is widely considered that individual cases and turnover times follow lognormal distributions.23-27, Footnote 1, Footnote 2 Prediction error could be reduced by the use of a three-parameter lognormal distribution.11 Some authors have stated that the statistical distribution of the sum of the surgeries and turnover times that compose the schedule for a single OR can follow either a normal or a Weibull probability distribution.5

Techniques to sum independent lognormal variables have been studied extensively. Although exact closed-form expressions for the lognormal sum probability density function are unknown, several analytical approximation methods exist in the literature.28-30, Footnote 3 The different methods have been compared to determine the parameters of the resulting lognormal distribution.31 While each method had its own advantages and disadvantages, the Fenton-Wilkinson approach gives the most accurate estimate, particularly in the cumulative distribution function tail. The three other methods differ considerably in their complexity, and only the simpler Fenton-Wilkinson method offers a closed-form solution for approximating the underlying parameters of the lognormal distribution.C

Alternative theories exist regarding the distribution for the sum of lognormal random variables. These include a Type IV Pearson distribution.32

It was proposed that the sum of multiple lognormal surgical and lognormal turnover times would result in a lognormal distribution of the total time. This proposal was based on the assumption that the sum of lognormal variables can itself be approximated by a new lognormally distributed variable.A In order to find a good estimator for the duration of the schedule for a single cardiac OR and thus reduce the chance of overutilization, this project used this novel technique to compare: 1) the second tertile cut-off point of the lognormal distribution of the sum of surgical cases and turnover times with 2) the sum of the average duration of surgical cases and turnover times.

Methods

The Research Ethics Board of St. Michael’s Hospital approved the use of routinely collected data to create and test the model. The need for consent was waived as the data were collated anonymously. As well, this was a quality improvement study involving minimal risk, and there was no change to normal OR scheduling practice. Data were derived from the perioperative information system, ORSOS One-Call 2.6 (McKesson Provider Technologies, Alpharetta, GA, USA).

We studied 6,090 cases performed by nine different cardiovascular surgeons from January 1, 2004 to January 30, 2009 at St. Michael’s Hospital, located in Toronto, Ontario, Canada. The cases were grouped clinically into 13 different categories (Table). Coronary artery bypass graft surgery (CABG) accounted for 63.33% of the cases. Surgical time was defined as the interval from the time the patient entered the OR until the time the patient departed the OR. This interval included several subsets of time: 1) Anesthesia time: The interval from the time the patient entered the OR to the time the surgery started. Previous initiatives in our institution have resulted in prompt starts for the first case of each day and consistent practice amongst the anesthesia staff. The average induction time was 46.69 min with a standard deviation of 16.70 min for all cases. The lack of significant variation allowed us to include this subset of time in the surgical time; 2) Operating time: The interval from the time the procedure began (i.e., first incision is made) to the time the procedure ended (i.e., the wound has been closed and the surgeons have completed all procedure-related activities on the patient); and 3) Transfer time: The interval from the time the procedure ended to the time the patient departed from the OR. For cardiovascular surgery, where the patient is cared for in a cardiovascular intensive care unit, transfer time occurred when the patient was to be transferred from the operating table to a bed and to vacate the OR.

We collected turnover time data (the interval from the time the patient departed the OR to the time the next patient on the schedule entered the same OR) during a five-month period (January 2009 to May 2009). The average turnover time was 0.50 hr, with a standard deviation of 0.23 hr. The five-month period that we selected was more representative of current practice in the cardiac ORs. Initiatives had been introduced to standardize the setup and equipment needed for each case, and, consequently, the turnover times for all cases were grouped together. Therefore, and in line with the literature, we assumed that cases and turnover times were statistically independent.22

Using SAS (SAS Institute Inc., Cary, NC, USA), we fitted three-parameter lognormal distributions to surgical and turnover times. The minimum times or thresholds were calculated directly by SAS using the maximum likelihood method.33 To study the lognormal goodness of fit, we conducted Kolmogorov-Smirnov tests and performed two graphical analyses: 1) we compared time histograms with the fitted lognormal distributions, and 2) we constructed probability plots to compare the quantiles of the real data with those of the fitted lognormal distributions.
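The fitting step can be illustrated with a minimal sketch, which is not the SAS procedure used in the study. It assumes the threshold is already known, whereas SAS estimated it by maximum likelihood, and it uses synthetic data in place of the hospital records. The function names, the threshold, and the simulated parameters are all hypothetical. The sketch estimates the two remaining lognormal parameters from the log-transformed times and computes the Kolmogorov-Smirnov distance between the empirical and fitted distributions.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(times, threshold):
    """ML estimates of (mu, sigma) for a lognormal shifted by a known threshold."""
    logs = [math.log(t - threshold) for t in times]
    # pstdev divides by n, which is the maximum likelihood estimate of sigma.
    return fmean(logs), pstdev(logs)


def ks_statistic(times, threshold, mu, sigma):
    """Largest gap between the empirical CDF and the fitted lognormal CDF."""
    normal = NormalDist(mu, sigma)
    ordered = sorted(times)
    n = len(ordered)
    distance = 0.0
    for i, t in enumerate(ordered):
        cdf = normal.cdf(math.log(t - threshold))
        # Compare the fitted CDF with the empirical step just after and just before t.
        distance = max(distance, (i + 1) / n - cdf, cdf - i / n)
    return distance


# Synthetic stand-in for surgical times in hours (assumption, not study data).
rng = random.Random(0)
THRESHOLD = 2.0  # hypothetical minimum surgical time in hours
times = [THRESHOLD + rng.lognormvariate(0.9, 0.35) for _ in range(500)]

mu_hat, sigma_hat = fit_lognormal(times, THRESHOLD)
d_stat = ks_statistic(times, THRESHOLD, mu_hat, sigma_hat)
print(f"mu={mu_hat:.3f} sigma={sigma_hat:.3f} KS distance={d_stat:.3f}")
```

A small Kolmogorov-Smirnov distance indicates that the fitted curve tracks the empirical distribution closely. Converting the distance to a P value when the parameters are estimated from the same data requires additional care and is left out of this sketch.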

We defined two estimators of the schedule duration (the interval from the time the first patient entered the OR to the time the last patient on the schedule departed the same OR). The first estimator was the “estimated average duration of the schedule” calculated as the sum of the average surgical times and turnover times in the schedule. This “empirical average” was equivalent to the mean value of the lognormal probability distribution of the schedule duration. The second estimator was the second tertile cut-off point of the lognormal distribution (i.e., the duration within which two-thirds of schedules would be expected to be completed). The use of this prediction point was based on an economic assessment that determined that OR allocations were not being planned appropriately if more than one-third of the ORs had overruns.7
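The two estimators can be sketched with a moment-matching (Fenton-Wilkinson style) approximation for the sum of independent three-parameter lognormal times. This is an illustration under stated assumptions and not the authors' scheduling tool: the function names and every parameter value below are hypothetical. The thresholds are added, the means and variances of the lognormal parts are added, and a single lognormal is matched to those two moments. Because the means are added, the mean of the approximating distribution equals the sum of the component averages.

```python
import math
from statistics import NormalDist


def fenton_wilkinson(components):
    """Moment-matched lognormal for a sum of independent shifted lognormals.

    components: list of (threshold, mu, sigma) tuples.
    Returns (threshold, mu, sigma) of the approximating distribution.
    """
    threshold = sum(t for t, _, _ in components)
    mean = sum(math.exp(m + s * s / 2) for _, m, s in components)
    var = sum((math.exp(s * s) - 1) * math.exp(2 * m + s * s)
              for _, m, s in components)
    sigma2 = math.log(1 + var / mean ** 2)
    mu = math.log(mean) - sigma2 / 2
    return threshold, mu, math.sqrt(sigma2)


def schedule_mean(threshold, mu, sigma):
    """Estimated average duration of the schedule."""
    return threshold + math.exp(mu + sigma * sigma / 2)


def schedule_quantile(threshold, mu, sigma, p):
    """Duration within which a fraction p of schedules would finish."""
    return threshold + math.exp(mu + sigma * NormalDist().inv_cdf(p))


def schedule_cdf(threshold, mu, sigma, t):
    """Probability that the schedule finishes within t hours."""
    if t <= threshold:
        return 0.0
    return NormalDist(mu, sigma).cdf(math.log(t - threshold))


# Hypothetical parameters in hours: (threshold, mu, sigma). Not study data.
case_a = (2.0, 0.60, 0.30)
turnover = (0.1, -1.00, 0.50)
case_b = (2.0, 0.60, 0.30)

params = fenton_wilkinson([case_a, turnover, case_b])
average = schedule_mean(*params)
tertile = schedule_quantile(*params, 2 / 3)
mean_percentile = schedule_cdf(*params, average)
print(f"average={average:.2f} h, second tertile={tertile:.2f} h, "
      f"average sits at percentile {100 * mean_percentile:.1f}")
```

Because the lognormal distribution is skewed to the right, the average lies somewhat above the median, so its percentile is greater than 50.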

A scheduling tool programmed in Microsoft Office Excel (Microsoft Corporation, Redmond, WA, USA) was created utilizing a database of the lognormal parameters for surgical times and turnover times. Two procedures and surgeons were selected, and the scheduling tool automatically calculated the three parameters of the resulting lognormal probability distribution of the schedule, as previously described.A The estimated average and second tertile cut-off point were displayed. When this cut-off point was greater than the staffed block time, a red warning box was displayed. In those cases where there were fewer than 20 records, the tool used the parameters of the “generic distribution” for the type of surgery using pooled surgeons’ data. From June 1 to August 31, 2009, simultaneous tracking of the scheduling program was performed. The average estimated schedule duration and the second tertile cut-off point were compared with the actual schedule duration. To determine the best predictor of the schedule duration, we excluded blocks in which cancellations took place and analyzed only blocks in which the first case started on time and there were no add-on cases. We did not intend to use this methodology to predict cancellations.

Results

Within the weekly cyclic master schedule, the cardiovascular service was allocated 13 block times in three cardiac ORs. From Monday to Thursday, four of these block times were 10.25 hr and seven block times were 8.25 hr. On Fridays the remaining two blocks had block times of 9.25 and 7.25 hr, respectively. The assumption was that each block could accommodate two surgeries.

For each procedure, the lognormal parameters and Kolmogorov-Smirnov goodness-of-fit P values for generic (non-surgeon-specific) operating times are shown in the Table. The surgical time histogram and the fitted lognormal distribution for aortic valve replacement/repair and CABG (non surgeon specific) are illustrated in Fig. 1a. The probability plot for the same procedures is shown in Fig. 1b. Both the histogram and the probability plot support the use of the lognormal distribution to describe surgical times.

Fig. 1. 1a) Histogram and fitted lognormal distribution for aortic valve replacement/repair and coronary artery bypass graft (CABG) (non surgeon specific); 1b) Probability plot for aortic valve replacement/repair and CABG (non surgeon specific)

We collected data from 138 scheduled blocks at the end of the three months. Forty-three blocks were excluded due to last minute changes to the schedule resulting in unpredicted delays or case cancellations. Ninety-five schedules were analyzed. Forty-two (44.2%) comprised two sequential coronary artery bypass graft (one to three bypasses) surgeries.

For each schedule, the estimated average duration was located, on average, at the 52.74th percentile point (standard deviation 1.90%) due to the skewness of the lognormal distribution, with a range from 51.24% to 59.99%.

A histogram of the differences between the estimated average schedule duration and the actual duration is shown in Fig. 2a. The mean is 0.19 hr (standard deviation 0.98 hr). A histogram of the differences between the estimated second tertile cut-off point and the actual duration of the schedule is shown in Fig. 2b. The mean is 0.59 hr (standard deviation 0.93 hr). In both cases, the normal test statistic (based on the Shapiro-Wilk test), the Anderson-Darling P value, and the Cramér-von Mises P value suggested that these differences follow a normal distribution. A paired Student’s t test was performed to determine if the mean values of the differences were statistically different (H0: there is no difference between the mean values). The t statistic was -2.83 (188 degrees of freedom) with a one-tail P value of 0.0026 providing evidence against H0.
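As a rough arithmetic check, the reported 188 degrees of freedom equal 95 + 95 − 2, which is the value a pooled two-sample t statistic would give for two groups of 95 schedules. The sketch below assumes that form and uses only the rounded summary values quoted above, so it is not a reanalysis of the study data. The variable names are hypothetical.

```python
import math

# Rounded summary values quoted in the text (hours); 95 schedules per group.
n = 95
mean_avg, sd_avg = 0.19, 0.98          # estimated average minus actual duration
mean_tertile, sd_tertile = 0.59, 0.93  # second tertile minus actual duration

# Pooled two-sample t statistic (assumed form, see the note above).
pooled_var = ((n - 1) * sd_avg ** 2 + (n - 1) * sd_tertile ** 2) / (2 * n - 2)
std_error = math.sqrt(pooled_var * (2 / n))
t_stat = (mean_avg - mean_tertile) / std_error
dof = 2 * n - 2

print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

With these rounded inputs the statistic comes out at about −2.89, close to the reported −2.83. The small gap is consistent with rounding of the published means and standard deviations.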

Fig. 2. 2a) Differences between the estimated average duration and the real duration; 2b) Differences between the estimated second tertile cut-off point and the real duration of the schedule

There were 37 (38.95%) schedules with overtime during the three months. The average overtime was 65.81 min (standard deviation 50.33 min), with a range from five minutes to 170 min. These overtime schedules were analyzed to determine the predictive capacity of the methodology. The estimated average duration predicted that 44 schedules would require overtime, 32 of which overran. The second tertile cut-off point predicted that 61 schedules would require overtime, only 35 of which overran. On average, the real duration of the schedule in these false positives was located at the 26.67 percentile point (standard deviation 17.53%).
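The counts above can be turned into simple predictive measures. The sketch below uses only the figures reported in this paragraph; the function name and the choice of sensitivity and positive predictive value as summary measures are illustrative assumptions, not part of the original analysis.

```python
def predictive_summary(flagged, flagged_and_overran, total_overran):
    """Sensitivity, positive predictive value, and false positives for one estimator."""
    return {
        "sensitivity": flagged_and_overran / total_overran,
        "ppv": flagged_and_overran / flagged,
        "false_positives": flagged - flagged_and_overran,
    }


TOTAL_OVERRAN = 37  # schedules with overtime among the 95 analyzed

average_rule = predictive_summary(44, 32, TOTAL_OVERRAN)
tertile_rule = predictive_summary(61, 35, TOTAL_OVERRAN)

for name, result in (("estimated average", average_rule),
                     ("second tertile", tertile_rule)):
    print(f"{name}: sensitivity {result['sensitivity']:.1%}, "
          f"PPV {result['ppv']:.1%}, "
          f"false positives {result['false_positives']}")
```

On these counts the second tertile cut-off point catches more of the overtime schedules, at the price of roughly twice as many false positives as the estimated average.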

Two overtime schedules were not predicted by either the estimated average duration or the second tertile cut-off point. The actual schedule durations were located at the 88.91 and the 93.6 percentile points.

Discussion

On the day of surgery, management decisions based on an accurate estimation of the duration of an operating schedule can reduce OR overutilization and help to maximize OR utilization.22 We used the Fenton-Wilkinson approximation to sum the lognormal procedure times and turnover times to help predict the duration of an operating schedule and the likelihood of overtime when the first case starts on time and there are no add-on cases.A

When fitting lognormal distributions to surgical times and turnover times, 50.9% of the P values were ≥ 0.05, suggesting a good fit. It has been suggested that real distributions may not fit any particular theoretical distribution and that graphical analysis may be an alternative powerful statistical tool.34, Footnote 4 We used a graphic criterion to fit lognormal distributions to surgical times and successfully tested the results using numeric simulation.

Our results demonstrate a difference of 11.64 min (2.28% for an 8.5 hr session) between the actual duration and the estimated schedule duration, validating the use of the average duration of a series of surgical cases and turnover times to estimate total schedule duration. We found the average value to be located between the 51st and 53rd percentile points for 74.74% of the cases. This implies that use of the sum of average times captures slightly more than half of the possible schedule durations.

The duration of the schedule is random by nature; therefore, there is always a probability that procedures will take more or less time than the average. Our results suggest that neither the estimated averages nor the second tertile cut-off points alone are able to predict the need for overtime without considerable false positive results.

The combined use of the estimated average schedule duration and the second tertile cut-off point may help to limit overtime expense.A Suppose the average duration of two scheduled CABGs is estimated at 8.12 hr. If these cases were allocated an 8.25 hr block time, there is a 47% chance of exceeding the time available. The second tertile cut-off point for this schedule duration is 8.68 hr. Therefore, the schedule has a 66.7% chance of being completed within 30 overtime minutes.
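The arithmetic in this example can be sketched as a simple decision rule: accept the schedule when the overtime implied by the second tertile cut-off point does not exceed an agreed tolerance. The exact form of the rule, the function names, and the 30-minute tolerance (taken from the example above) are assumptions for illustration.

```python
def overtime_minutes(duration_hr, block_hr):
    """Minutes by which a duration exceeds the staffed block time (0 if it fits)."""
    return max(0.0, (duration_hr - block_hr) * 60)


def accept_schedule(tertile_hr, block_hr, tolerance_min):
    """True when two-thirds of such schedules would finish within the tolerance."""
    return overtime_minutes(tertile_hr, block_hr) <= tolerance_min


BLOCK_HR = 8.25    # staffed block time from the example
TERTILE_HR = 8.68  # second tertile cut-off point from the example

print(f"{overtime_minutes(TERTILE_HR, BLOCK_HR):.1f} min")
print(accept_schedule(TERTILE_HR, BLOCK_HR, tolerance_min=30))
print(accept_schedule(TERTILE_HR, BLOCK_HR, tolerance_min=0))
```

With a 30-minute tolerance the schedule is accepted, since the second tertile cut-off point lies about 26 minutes beyond the block. Under a zero-tolerance policy the same schedule would be rejected.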

Our results suggest a mismatch between the typical OR workload for the cardiovascular service and the allocated OR time for the service. First, only 35.71% of the assigned block times are long enough to perform the most common (44.21%) combination of procedures, CABG + CABG, reliably. Second, 100% of the schedules allocated to the 7.25 hr block were predicted to require overtime. Psychological biases and/or lack of optimization may apply.1,4,7,35 Facilities can calculate appropriate OR allocation using the statistical analysis of Strum et al. (1997), as adapted by McIntosh et al. (2006) and Pandit and Dexter (2009).1,4,7

An analysis of the cancellation data where the second case was cancelled due to insufficient time showed that most of the first cases exceeded the second tertile cut-off point. This is an expected effect of lognormal distributed operating times and cannot be prevented using this methodology.

This study has several limitations. First, our results apply only under the following conditions: 1) the first case starts on time; 2) there are no add-on cases; and 3) the existing OR time allocation is used. Second, the grouping of the different surgeries into 13 categories was based on clinical criteria; therefore, it is subject to errors and omissions. Third, due to insufficient data, we could not fit distributions for re-operation cases, although these cases often take longer. The use of primary case data to estimate the duration of these redo cases could underestimate the real duration. The final limitation could occur if the performed case differed from the booked case.

The data and analysis were based entirely on the cardiac OR setting, though they may be transferable to other operating rooms, such as neurosurgery, where the normal workload includes a small number of procedures of long duration.

In conclusion, we measured the ability of a modified version of a methodology proposed by Alvarez et al. (2010) to estimate the probability distribution of the duration of the schedule in the Cardiovascular Service at St. Michael’s Hospital, Toronto. We used two estimators of the schedule duration, i.e., the estimated average schedule duration and the second tertile cut-off point. We demonstrated that the estimated average schedule duration is indeed a good estimator of the real duration of the schedule. We also validated the decision rule proposed by Alvarez et al. (2010) to reduce overtime by combining the second tertile cut-off point with the overtime tolerance.A This study tries to improve the utilization of an existing time allocation for a cardiac OR. Using the data collected and a simulation model for days with different mixed cases, it may be possible to determine the optimal time allocation required for cardiac surgery at our institution.