Intraoperative testing is common in anesthesia practice and is typically used to guide drug and fluid therapy, electrolyte replacement, and transfusion practices. Point-of-care testing (POCT), defined as medical testing at or near the site of patient care,1 is frequently used and has the advantage of rapid acquisition of laboratory data. In the perioperative environment where patient physiology and clinical state are in constant and often rapid flux, information available in context allows for timely decision-making and intervention. Given this, POCT is of particular relevance to the anesthesiologist and is rapidly becoming a standard of care.2,3 Nevertheless, because of its cost and the lack of evidence supporting advantages over central laboratory testing, POCT is not available in all perioperative environments.

There is a paucity of evidence-based guidelines for intraoperative testing practice in the setting of noncardiac surgery, particularly in terms of appropriate indications for testing, types and number of tests that should be obtained, application of results, or the role of specific POCT. In addition, there is a lack of studies evaluating patient and surgical factors associated with the use of intraoperative testing, and its impact on patient outcome remains unknown. Several studies have examined predictors of testing in the intensive care unit (ICU), a setting that is similar to the operating room. In this context, the patient’s severity of illness has been related to testing, though, next to hospital length of stay, the strongest predictor identified is hospital teaching status.4,5

Given these identified gaps in the literature and a desire to obtain more information before investing locally in POCT technology, we sought to establish a regional baseline of intraoperative testing practice, focusing on the patient and surgical factors associated with its use. We posed the following question: In adult patients undergoing noncardiac surgery, what patient and surgical factors are associated with the administration of at least one intraoperative test? Given the absence of supporting perioperative literature, we generated our hypothesis based on the assumption that predictors associated with greater intraoperative physiologic derangement would predict testing. As such, we hypothesized that the number of comorbidities, duration of surgery, and emergency surgery would be positively related to intraoperative testing. In particular, we expected that undergoing high-risk surgery (i.e., a procedure with a high risk of bleeding) would be a robust predictor of intraoperative testing. We reasoned that large-volume blood loss and the resultant transfusion of blood products are associated with significant and potentially life-threatening physiologic derangement, and testing is necessary to monitor and guide treatment.

Methods

We designed and report our study according to the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guidelines.6

This historical cohort study was conducted at Hamilton Health Sciences (HHS), a medical group of five hospitals and a cancer centre located in Hamilton, Ontario, Canada. Three of these hospitals (McMaster University Medical Centre, Hamilton General Hospital [HGH], and Juravinski Hospital [JH]) contain operating rooms providing daily surgical care. Point-of-care testing, including arterial blood gas and electrolyte monitoring, is in place at only one of these sites (HGH), but available for use only by perfusionists working in the setting of cardiac surgery.

After obtaining local institutional Research Ethics Board approval on October 18, 2013, we obtained a random sample of 1,000 noncardiac, nonobstetric operations performed across HHS from January 1-December 31, 2012—during which 23,803 surgical procedures were conducted.7 The sample size of 1,000 was based on a desired sample with 100 testing events and, in the absence of any supporting literature, on an anecdotal estimate of 10% testing frequency. The desire to obtain 100 outcomes was chosen according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines, which are recommended to minimize overfitting.8
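The sample-size reasoning above reduces to one line of arithmetic, sketched here in Python. The function name `required_sample_size` is hypothetical and introduced only for illustration; the two inputs (100 target events and a 10% anticipated testing frequency) are the values stated in the text.

```python
import math


def required_sample_size(target_events: int, expected_event_rate: float) -> int:
    """Smallest number of cases expected to yield `target_events` outcomes.

    Hypothetical helper: it only restates the arithmetic described in the
    text and is not the authors' own code.
    """
    if not 0 < expected_event_rate <= 1:
        raise ValueError("expected_event_rate must be in (0, 1]")
    return math.ceil(target_events / expected_event_rate)


# Values stated in the text: 100 testing events at an anecdotal 10% frequency.
n_required = required_sample_size(target_events=100, expected_event_rate=0.10)
print(n_required)
```

Because the 10% figure was an anecdotal estimate, the realized number of events could have fallen short of the target; rerunning the helper with a lower assumed rate shows how quickly the required sample grows.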

Two independent reviewers extracted patient, surgical, and testing details for each case. Patient information included age, sex, American Society of Anesthesiologists class, body mass index, history of chronic pain, activity tolerance (classified as < 4 or ≥ 4 metabolic equivalents), alcohol abuse, other substance abuse, and comorbidities (i.e., cardiovascular, respiratory, endocrine, gastrointestinal, neurologic, musculoskeletal, genitourinary/renal, psychiatric, hematologic, immune/infectious, and integumentary comorbidities). Surgical details included date of surgery, procedural start and stop times, case priority (classified as elective or urgent [booking priority 1, 2, or 3]), name of the procedure as booked, and hospital where the surgery took place. For each case, we examined the anesthesia chart and laboratory orders in our electronic medical record to identify whether or not testing was performed between the start and end times of the procedure. When tests were performed, we documented the test type and case priority with which it was requested (classified as stat, urgent, or routine), as well as the times when the test was ordered, collected, received in the lab, and results were available.

The primary outcome variable was the administration of at least one intraoperative test. Secondary outcomes included test type and frequency, and the time from ordering the tests to receiving the results.

Statistical analysis

Stata statistical software (Stata 12.0, College Station, TX, USA) was used for all analyses, with P < 0.05 (two-sided) considered statistically significant. Patient, surgical, and testing details were summarized using descriptive statistics. Categorical variables were described using counts and percentages. Continuous variables were described using mean (standard deviation [SD]).

The primary outcome variable of administering at least one intraoperative test was evaluated using multivariable logistic regression. We report our model according to the TRIPOD statement.9

Covariates included in the model

Predictors considered for inclusion were selected based on their clinical relevance, anticipated relationship with testing, and previous literature derived from the ICU setting (Table 1).4,5 Patient factors included age, male sex, and number of comorbidities recorded in the preoperative anesthesia assessment. Surgical factors included procedure duration, type of anesthesia (general or regional), use of a laparoscopic surgical approach, non-elective surgery, and procedure start time after 5 PM. We also evaluated the impact of resident physician participation, as hospital teaching status has been shown to be the greatest predictor of testing in the ICU.4,5 Finally, to evaluate institutional predictors of testing, we also included the hospital in which the surgery was performed.

Table 1 Patient and surgical predictors considered for model inclusion

In characterizing surgery, surgical subtype by specialty (i.e., vascular, general, neurologic, orthopedic, plastic, thoracic, spinal, urologic, gynecologic) was considered. Nevertheless, the risk of intraoperative bleeding and perioperative death varies widely within surgical subtype. Vascular surgery, for example, includes both open repair of ascending aortic aneurysms, with a 2-5% risk of death,10 and varicose vein ligation, a procedure with negligible risk often performed as day surgery. As such, we elected to use the Johns Hopkins surgical risk classification,11 which classifies surgeries as minimal, moderate, and high risk according to their potential for perioperative bleeding.

Model selection

We identified 14 predictors (see Table 1) and, based on the estimated event rate, ten covariates could be included in the model. A priori, we anticipated that many predictors would be collinear and assessed collinearity using variance inflation factors, with values < 4 considered acceptable. Forced simultaneous entry was used rather than automated stepwise selection, as simulation studies have shown a higher risk of overfitting with the latter approach.12,13
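As a rough illustration of the two screening steps in this paragraph, the Python sketch below computes a covariate budget and a variance inflation factor (VIF). Two assumptions are ours rather than the source's: the budget uses the common rule of thumb of ten events per variable (implied by 100 expected events allowing ten covariates, but not stated explicitly), and the VIF is computed with the two-predictor shortcut 1 / (1 - r^2). In a full model, each predictor's VIF comes from regressing it on all of the other predictors. The indicator data are hypothetical.

```python
import math


def max_covariates(expected_events: int, events_per_variable: int = 10) -> int:
    """Covariate budget under an assumed events-per-variable rule of thumb."""
    return expected_events // events_per_variable


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF when the model has only two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical indicators for 20 cases (not study data): most non-elective
# cases also start in the evening, so the two variables overlap heavily.
non_elective = [1] * 10 + [0] * 10
evening_start = [1] * 9 + [0] * 11

budget = max_covariates(expected_events=100)
vif = vif_two_predictors(non_elective, evening_start)
print(budget, round(vif, 2), vif < 4)  # threshold of 4 is the one in the text
```

In this made-up example the VIF exceeds the threshold of 4, which is the situation in which one of the two overlapping predictors would be dropped.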

Model fit and discrimination

Model calibration was evaluated using a calibration curve that plots the observed outcome against the predicted use of intraoperative testing.9 Model discrimination was evaluated by generating a classification table based on a testing probability of 0.5. The classification table was used to create a receiver operating characteristic (ROC) curve, and the concordance statistic (C-statistic) was calculated. We examined the optimism-corrected C-statistic, which adjusts for statistical overfitting, by calculating the C-statistic in a bootstrapped population.14 We decided a priori that, consistent with convention,14 we would consider our model overfitted if there was a > 10% difference between the C-statistic of our original and bootstrapped samples.
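For readers less familiar with the C-statistic, the following Python sketch computes it from its rank-based definition: the proportion of (event, non-event) pairs in which the event received the higher predicted probability, with ties counted as one half. This is the standard definition and not necessarily the routine the authors' software used; the outcomes and predicted probabilities are hypothetical.

```python
def c_statistic(outcomes, predicted):
    """Concordance statistic (area under the ROC curve) by pairwise comparison.

    outcomes  -- sequence of 0/1 values (1 = at least one intraoperative test)
    predicted -- model-predicted probabilities, in the same order
    """
    events = [p for y, p in zip(outcomes, predicted) if y == 1]
    non_events = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0   # pair ranked correctly
            elif p_event == p_non_event:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Hypothetical data: three tested and four untested patients.
outcomes = [1, 1, 1, 0, 0, 0, 0]
predicted = [0.90, 0.60, 0.30, 0.40, 0.20, 0.10, 0.05]
c_example = c_statistic(outcomes, predicted)
print(round(c_example, 3))
```

An optimism-corrected value would repeat this calculation for models refitted in bootstrap resamples and subtract the average drop in performance from the apparent C-statistic; that refitting step is omitted here.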

Results

Descriptive data pertaining to all variables considered for model inclusion are shown in Table 1. Study findings showed that 110/1,000 (11.0%) patients underwent a total of 413 intraoperative tests. The most common test performed was the complete blood count (CBC), and the mean (SD) time for test results to become available was 29.0 (19.9) min. Further detail regarding the number and types of tests is shown in Table 2.

Table 2 Characteristics of intraoperative testing (total number of tests administered = 413)

Model building

In evaluating for collinearity, site and surgical risk were highly correlated. As well, non-elective procedures and those beginning after 5 PM were collinear. As such, the variables for site, moderate-risk surgery, and evening start were removed from the model. Collinearity of all remaining variables was minimal. We also evaluated the distribution of procedure duration by surgical risk category and, though there was a positive relationship, the distributions of procedure duration by surgical risk category overlapped. As such, both high-surgical risk and procedure duration were left in the final model.

Recognizing that they may have a non-linear relationship with intraoperative testing, we evaluated the impact of including continuous variables (i.e., length of procedure, number of comorbidities) in the model as multivariable fractional polynomials.15 Nevertheless, doing so did not significantly improve the performance of the model and, as such, these variables were left in their native form.

Clinical reasoning suggests that both regional anesthesia and laparoscopic surgery may be related to some of the other predictors included in our model. Regional anesthesia is not an option for spine or neurosurgical procedures and, similarly, laparoscopic approaches are not possible for some procedure types such as orthopedic surgery. To ensure that this did not significantly affect model estimates, we undertook an interaction analysis wherein we created interaction terms for regional anesthesia and laparoscopic surgery with each of the other variables included in the model. We evaluated the impact of including each interaction term separately using both the general likelihood ratio (one degree of freedom) and Wald tests for the interaction term, and we found all terms to be non-significant at a threshold of P < 0.05. In addition, when we removed either laparoscopic surgery or regional anesthesia, the discrimination of the model decreased (optimism-corrected C-statistic for model without laparoscopic surgery = 0.89; optimism-corrected C-statistic for model without regional anesthesia = 0.88) without significantly impacting the values of the odds ratios for the other predictors. Therefore, given that our objective was to build a predictive, rather than an explanatory, model, we decided to keep both terms in the final model.

Logistic regression model

The coefficients and odds ratios (ORs) for the final logistic regression model are shown in Table 3. The only statistically significant patient-related predictor of intraoperative testing was the number of comorbidities (OR, 1.1; 95% confidence interval [CI], 1.0 to 1.2; P = 0.03). Three surgical variables were predictive of testing, namely, emergency surgery, duration of surgery, and high-risk surgery. For patients undergoing emergency surgery, the odds of intraoperative testing were 3.8 (95% CI, 2.0 to 7.2; P < 0.001) times greater than the odds for patients undergoing elective procedures. For every additional hour of procedure time, the odds of testing increased more than twofold (OR, 2.3; 95% CI, 1.8 to 2.9; P < 0.001). Undergoing high-risk surgery was associated with an odds of intraoperative testing that was 12.3 (95% CI, 8.3 to 18.2; P < 0.001) times greater than the odds for undergoing low- or moderate-risk surgery. In contrast to testing in the ICU, the involvement of resident physicians in patient care did not have a statistically significant association with intraoperative testing (OR, 1.3; 95% CI, 0.7 to 2.4; P = 0.42).
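The odds ratios above can be unpacked with a few lines of arithmetic, sketched here in Python. All function names are hypothetical. The compounding of the per-hour odds ratio assumes the linear log-odds relationship of the fitted model, the standard error is only approximate because the published confidence limits are rounded, and the 5% baseline probability is an illustrative assumption rather than a figure from the study.

```python
import math


def odds_multiplier(odds_ratio: float, units: float) -> float:
    """Multiplicative change in odds for `units` increments of a predictor."""
    return odds_ratio ** units


def standard_error_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Approximate SE of the log odds ratio, recovered from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def probability_after_or(baseline_probability: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline probability (via the odds scale)."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


# Duration of surgery: OR 2.3 per additional hour (value from the text).
three_hours = odds_multiplier(2.3, 3)

# Emergency surgery: OR 3.8 (95% CI, 2.0 to 7.2), values from the text.
se_emergency = standard_error_from_ci(2.0, 7.2)

# Assumed 5% baseline probability of testing (illustrative only).
p_emergency = probability_after_or(0.05, 3.8)

print(round(three_hours, 2), round(se_emergency, 3), round(p_emergency, 3))
```

The last calculation illustrates why an odds ratio of 3.8 should not be read as a 3.8-fold increase in probability: from an assumed 5% baseline, the probability rises to roughly one in six rather than to 19%.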

Table 3 Results of multivariable logistic regression model

Model fit and discrimination

Results of the Hosmer-Lemeshow test (P = 0.374) did not lead us to reject the null hypothesis that there was no difference between our model and the data. Inspection of the calibration curve (Fig. 1) suggested that there was a reasonable fit between our model and the data. A classification table was created (available as Electronic Supplementary Material) to assess the model’s sensitivity and specificity using a cut-point of 0.5. The sensitivity and specificity of the model for predicting intraoperative testing were 65.5% and 97.3%, respectively, with an overall rate of 93.8% for correct classification. Based on this classification, a ROC curve was created (Fig. 2) and found to have a C-statistic of 0.96 and an optimism-corrected C-statistic of 0.92. Based on the guidelines for interpretation of an area under the curve suggested by Kleinbaum and Klein,16 this result is consistent with an excellent predictive value.
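The reported classification figures can be checked with a short Python sketch. The cell counts below are our back-calculation from the reported percentages and the 110/1,000 event rate; they are an assumption and were not taken from the supplementary classification table. The overfitting check treats the prespecified 10% criterion as a relative difference, which is also an assumption because the text does not say whether the difference is relative or absolute.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity, specificity, and overall accuracy from a 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Back-calculated counts (assumption): 110 tested and 890 untested patients.
TP, FN = 72, 38    # tested patients classified as tested / not tested
TN, FP = 866, 24   # untested patients classified as not tested / tested

sens, spec, acc = classification_metrics(TP, FN, TN, FP)
print(round(sens * 100, 1), round(spec * 100, 1), round(acc * 100, 1))

# Overfitting criterion: > 10% difference between apparent and
# optimism-corrected C-statistics (values from the text).
apparent_c, corrected_c = 0.96, 0.92
relative_drop = (apparent_c - corrected_c) / apparent_c
overfitted = relative_drop > 0.10
print(round(relative_drop * 100, 1), overfitted)
```

Under these assumptions the counts reproduce the published sensitivity, specificity, and overall classification rate, and the drop in the C-statistic after correction is about 4%, below the 10% criterion whether it is read as a relative or an absolute difference.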

Fig. 1 Calibration plot (size of the marker corresponds to the number of observations)

Fig. 2 Receiver operating characteristic (ROC) curve for multivariable logistic regression model

Discussion

In this study, we evaluated the incidence and predictors of intraoperative testing, which was performed in 11% of our sample of adult patients undergoing noncardiac surgery. The most commonly administered test was a CBC, and in a region that does not employ POCT, the time from ordering the tests to receiving the results was 29.0 (19.9) min. We developed a logistic regression model to examine the patient and surgical characteristics associated with testing and found that the only patient-related predictor was the number of comorbidities, with an 11% increase in the odds of testing for every additional diagnosis. The remaining predictors were surgical, with emergency surgery, longer surgery, and surgery associated with an elevated risk of bleeding all significantly associated with an increase in the odds of intraoperative testing.

Though they are germane to anesthesia practice, there is a lack of previously reported studies evaluating intraoperative testing as an outcome as well as an absence of previously identified predictors of testing. Information about when surgical patients are more likely to undergo intraoperative testing can help with resource allocation and planning, particularly in centres where emergency or high-risk surgeries are not routine. This information may also influence administrative decision-making regarding the purchase of POCT technology. For example, hospitals that do not routinely offer high-risk surgery may opt to collaborate with central laboratory staff to ensure that, when high-risk surgery is carried out, intraoperative tests are processed on a priority basis. Centres considering the purchase of POCT technology may choose to evaluate their frequency of high-risk surgeries prior to deciding if the expense is justified.

Outside of the operating room setting, studies have previously examined testing practice in the ICU. Zimmerman et al. identified the factors influencing laboratory blood testing in over 14,000 consecutively admitted patients. The authors found that the number of samples drawn for testing was determined primarily by the patient’s severity of illness and admission diagnosis. Nevertheless, after adjusting for patient-related factors, they found that the number of tests ordered in teaching ICUs was 2.3 times that of non-teaching ICUs, with no associated difference in outcome.5 More recently, Spence et al. used data derived from more than 10,000 patients to examine ICU testing practices across a regional healthcare system in Canada. Similar to Zimmerman, they found that, after adjusting for patient and illness characteristics, the most influential factor on number of tests, after ICU length of stay, was ICU teaching status and that this factor was not associated with a difference in mortality.4

Interestingly, though our study did not examine teaching status (as all hospitals within HHS are teaching hospitals) but rather the involvement of a resident in patient care, we did not find a relationship between intraoperative testing and the presence of trainees. This may be because, in Canada, trainees in the operating room typically care for patients while working one-to-one with faculty, and they are more closely supervised than those working in the ICU. It may also be because the previously stated increases in testing in teaching hospital ICUs were not related to the presence of trainees but rather to another aspect of the teaching hospital structure.

In our hospital system without access to POCT, we also examined the time from ordering the tests to receiving the results. While a mean time of 29 min to obtain a test result is acceptable in a non-surgical setting, a delay in obtaining results that may change management is concerning in the context of the operating room. When 29 min is compared with the time required for a point-of-care arterial blood gas test or hemoglobin measurement—typically from two to five minutes—the latter seems much more appropriately suited to the dynamic physiology and hemodynamics of the operating room. Nevertheless, there is a lack of randomized or observational studies comparing patient outcomes using POCT with central laboratory testing in the setting of noncardiac surgery.

Evidence from cardiac surgery, however, has suggested that the use of POCT is associated with better outcomes for patients. Multiple studies have evaluated the use of POC viscoelastic testing to inform the need for transfusion and the appropriate blood product to administer in patients undergoing cardiac surgery.17 A recent meta-analysis found that POCT-guided transfusion management significantly decreased the odds that patients would receive allogeneic blood products (OR, 0.6; 95% CI, 0.6 to 0.7; P < 0.001) and require re-exploration due to postoperative bleeding (OR, 0.6; 95% CI, 0.5 to 0.7; P < 0.001).18 Beyond outcomes directly related to bleeding, the incidence of postoperative acute kidney injury (OR, 0.8; 95% CI, 0.6 to 0.9; P = 0.03) and thromboembolic events (OR, 0.4; 95% CI, 0.3 to 0.7; P < 0.001) was significantly decreased in the viscoelastic testing group, though there was no difference in hospital mortality, cerebrovascular accident, or length of ICU and hospital stay.18 Nevertheless, as these studies evaluated tests that are not available via central laboratory testing, it cannot be inferred that this benefit is related to test timeliness rather than to the unique information provided to clinicians.

Point-of-care testing devices were first used in the operating room in the 1980s.2 Though they are increasingly common, arguments against their routine use are continually raised because of concerns about inaccuracy,19 increased cost,19 and the possible increase in the frequency of unnecessary testing.19,20 Nevertheless, POCT technology has been refined such that the results of most near-patient tests have been found to be acceptable approximations of their central laboratory equivalents—particularly when frontline staff are sufficiently trained.21-23 Arguments based on cost have been countered with more complex economic analyses that take into account patient outcomes and hospital efficiency.24-26 In terms of increases in unnecessary testing, Wax et al. conducted a study examining intraoperative test utilization in 38,115 surgical procedures. They found that the introduction of point-of-care hematocrit, biochemistry, and arterial blood gas testing did not affect the proportion of cases in which testing was utilized or the number of tests conducted.27

Though it has been argued that POCT constitutes a standard of care,3 before investing in this technology, it is important to evaluate whether it will be routinely used. Our results may inform this process by identifying the types of surgery where intraoperative testing is typically employed. Our study does, however, have several weaknesses that must be taken into account. Data were collected by retrospective chart review, and it is unclear whether all patient information was elicited in a consistent manner for all patients in the cohort. As a result, the absence of information about a patient characteristic or comorbidity may relate either to its absence or to its not having been documented. We evaluated only 1,000 patients, included a limited number of variables in our model, did not consider the effect that individual anesthesia providers may have on whether or not a test is performed, and did not assess the appropriateness of testing when it was ordered. As such, there may be important patient, surgical, institutional, or practitioner characteristics associated with intraoperative testing that we did not identify. Provider characteristics in particular have been shown to be more important than patient characteristics in predicting testing in the preoperative setting.28 Nevertheless, given that the predictive ability of the model was “excellent”,16 it is unlikely that missed predictors would have improved the model’s discrimination and quite likely that they would have contributed to overfitting. Finally, our sample was derived from a single region and may not generalize to other operating room settings. Nevertheless, given the large number and wide-ranging type of procedures sampled,7 in our view, it provides a reasonable approximation of Canadian anesthesia practice.

Our study shows that the odds of intraoperative testing increase for patients with a greater number of comorbidities undergoing longer, emergent procedures. These odds increase substantially if the procedure involves a high risk of bleeding.11 Our results suggest that intraoperative tests are ordered for clinically appropriate reasons. This is supported by the fact that the predictors included in our model are clinically appropriate and provide a high level of outcome discrimination based on the optimism-corrected C-statistic of 0.92. We found that the time from ordering the tests to receiving the results can be substantial when tests are ordered from a centralized laboratory, particularly when compared with the speed and efficiency of POCT. In the context of the operating room environment where patient physiology is rapidly changing, this timeliness may impact on patient outcomes. Nevertheless, this has yet to be confirmed in randomized-controlled trials comparing intraoperative POCT with centralized laboratory testing. For such a trial to be efficient, patients at an elevated likelihood of needing intraoperative testing would need to be identified a priori. Our study provides preliminary data that may inform the development of inclusion criteria for such a randomized-controlled trial. It also provides information that may inform administrative decision-making with regard to POCT implementation in the absence of rigorous supporting evidence.