Research into surgical outcomes has primarily focused on the role of patient patho-physiological risk factors and on the skills of the individual surgeon. However, this approach neglects a wide range of factors that have been found to be of importance in achieving safe, high-quality performance in other high-risk environments.1 In recent papers our group and others have argued for a much wider assessment of factors that may be relevant to surgical outcome, including such factors as equipment design, communication, team performance and factors affecting individual performance and the working environment.1–3 To carry out such an evaluation we must have an understanding of the many factors that may influence surgical outcome, and we need to have reliable and valid measures for all the relevant ones.

Teamwork is fundamental to effective surgery, yet there are currently no measures of teamwork to help evaluate team interventions, guide training or assess the impact of teamwork on outcomes. Formal team training is not offered routinely in most institutions even though teams in theater are expected to function to a high standard. In practice, operating theater (OT) teams differ markedly with respect to many of the factors that may influence surgical outcome, with very little apparent standardization. To date, research that has addressed team performance in surgery has remained focused within-discipline, namely: anesthetists,4 nursing,5 surgical,6 or their students.7 The little research that has addressed interdisciplinary teamwork has tended to focus on a single behavior, most often communication, in isolation from other behaviors.8–11 While it is important to describe and assess specific, individual team skills, this approach will never capture the characteristics of the whole surgical team. Furthermore, team training currently relies on informal methods of assessment and measurement, often derived from the aviation industry.12

The development of measures of team performance in other high-risk environments has proved to be a complex undertaking. Research in this field shows that developing effective teamwork measures requires a framework or model of team performance. Given the routine, structured nature of the surgical process, we chose to work from a basic input-process-output model of team performance (Fig. 1). This model is established in aviation,13 the UK National Health Service (NHS)14 and in prominent team theory literature.15–18 The diagram depicted in Fig. 1 indicates that effective team function depends on input factors, such as team structure and skills, on the environment in which the team works and on the processes and guidelines underpinning teamwork.

Figure 1. A model of surgical team performance. (Adapted from Healey et al. 23; A.N. Healey, S. Undre and C.A. Vincent, Qual Saf Health Care, 2004)

While there are a number of methods of assessing teams, we chose to rely primarily on observation for the development of measures. Observational research has been used effectively in many other high-risk domains and, more recently, for assessing communication and errors in the operating theater.19,20 While team assessment measures from aviation, military and naval settings provided important guidelines, they were not directly applicable to surgery. Rather than simply adapt these measures, we sought to derive measures from guidelines of best surgical practice and combine broader dimensions of behavior with the assessment of specific tasks. Although we recognize that crises, such as severe bleeding, place particular demands on surgical teams, and hence require particular team skills, we have not addressed these in the current assessment instrument. We have, in the first instance, focused on assessing the team skills required for relatively routine surgery, while recognizing that more complex team skills may need to be incorporated at a later date.

The aim of this work was to develop a practical method of assessing teamwork in the theater that is able to capture the most important behavioral dimensions of surgical teamwork and task completion. We aimed to test the feasibility and practicality of systematic observations in the OT, evaluate a framework for measuring team performance and report preliminary data using the OTAS (observational teamwork assessment for surgery) instrument.

METHOD

Design

This was an observational study of surgical team performance using specifically developed measures of surgical team performance.

Sample

Data were collected from 50 general surgery operations [29 open and 21 laparoscopic/minimal access surgery (MAS)] in a single operating theater. The patient cohort comprised 24 female patients and 26 male patients, with ages ranging from 20 to 91 years; admissions were both elective and emergency. In keeping with the objective of this study, detailed analysis of the various patient and operation types was not carried out.

Thirty-three (66%) operations were the first operation of the day, and the remaining 17 (34%) were either the second or third operation of the day. The typical mix of operations contained hernia repairs, laparoscopic cholecystectomies, colectomies, anterior resections, ileostomy reversals, hemorrhoidectomies, appendicectomies, gastrectomy, laparoscopic fundoplication, laparoscopic banding procedure and Hartmann’s procedure.

The identity of the anesthetists, nurses and surgeons varied from case to case and sometimes within one case. However, there was a reasonable consistency of personnel in the sample. Particular nurses and OT assistants were allocated to the OT, and there was some tendency for anesthetists to work with particular surgeons, though not as a strict rule. For this sample, we limited data collection to operations lasting between 30 and 240 minutes.

Measures

OTAS has two elements, each completed in the current format by a separate observer: a task checklist, completed by a surgical observer, and an assessment of team behavior on five dimensions, completed by a post-doctoral psychologist. The general surgical process was divided into phases and stages (Table 1). Each phase consists of distinct stages. We use the abbreviations PRE, OP and POST to refer to the pre-operative, intra-operative and post-operative phases, respectively.

Table 1. The structure of OTAS is determined by critical points which mark the transition from one stage or phase of the surgical process to another

Task checklist

The task list was constructed for each stage and phase of the operation with the help of theater protocols, recommendations for good practice, domain knowledge and expert advice. Additional interviews were conducted regarding the appropriateness of the task list and the contributions of the items on the task list to teamwork and outcome. The results of these will be reported in another paper (A.N. Healey, S. Undre and C.A. Vincent, submitted). Tasks were placed into three categories: patient, equipment and communication tasks. Patient-centered tasks comprised either actions or information associated directly with the patient, such as safe transfer to the operating table and patient notes present. Equipment-centered tasks included the checking and counting of surgical instruments. Communication-centered tasks included information such as operative site laterality confirmation. Each item on the checklist was marked yes or no against criteria that depended on the nature of the task. For example, under the category of equipment preparation, diathermy machine preparation was scored positive if the machine was switched on and tested prior to the operation. Likewise, the anesthetic machines were deemed checked if the anesthetist on duty was observed running through the standard test list. If the operation was the second case of the day, all of the machines were scored as checked on the presumption that they had been working appropriately for the previous case. However, if the equipment had not been used for the first case, then the same criteria as the first case would apply.
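As an illustration of how yes/no checklist items can be turned into completion rates per task category, the following is a minimal sketch in Python. It is not the scoring procedure used in the study: the item names are taken from the examples in the text, but the yes/no values, the data layout and the function name completion_by_category are hypothetical.

```python
from collections import defaultdict

# Hypothetical checklist for one stage: (category, item, completed?).
# The yes/no values are invented for illustration only.
checklist = [
    ("patient", "safe transfer to operating table", True),
    ("patient", "patient notes present", True),
    ("equipment", "diathermy switched on and tested", False),
    ("equipment", "instruments checked and counted", True),
    ("communication", "operative site laterality confirmed", False),
]


def completion_by_category(items):
    """Return the percentage of items marked yes, per task category."""
    done = defaultdict(int)
    total = defaultdict(int)
    for category, _item, completed in items:
        total[category] += 1
        done[category] += int(completed)
    return {c: 100.0 * done[c] / total[c] for c in total}


rates = completion_by_category(checklist)
for category, rate in rates.items():
    print(f"{category}: {rate:.1f}% of tasks completed")
```

Keeping the category alongside each item allows the same records to be summarized per phase, per stage or per task type without re-coding the observations.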

Team behaviors

Team performance was also assessed on a set of teamwork behaviors comprising shared-monitoring, communication, co-operation, co-ordination and shared leadership, all adapted from Dickinson and McIntyre’s model of teamwork.21 Further support for using these behavioral dimensions came from preliminary interviews by Undre et al. 3 and from other measures of teamwork, such as those used by Fletcher et al., who modified a scale used in aviation (NOTECHS) to rate anesthetists’ non-technical skills.22 Their team working dimension consisted of coordination, extracting information, using authority, supporting others and assessing capabilities. For the purposes of this study sub-teams (nursing, surgical and anesthetic teams) were not scored individually, but an aggregate score for the whole team was used. Behavioral summary scales on a seven-point Likert scale were used, with each scale-point relating to a certain level of quality and quantity of a given teamwork component, as determined by various descriptive elements (see example of leadership scale; Fig. 2). Notes were also taken on effective and ineffective behavioral exemplars/markers during each case, which provided support for the behavioral ratings given. Inter-observer reliability is currently being explored and will be reported in future studies. Preliminary data suggest a good level of agreement, with exact agreement on ratings on a seven-point scale on over 75% of occasions for all five behavioral dimensions. A full account of the development of the measures can be found in Healey et al., 23 and copies of the measures themselves can be obtained at http://www.csru.org.uk.
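The exact-agreement figure quoted above is the proportion of occasions on which two observers give an identical scale point. A minimal sketch of that calculation in Python follows; the ratings and the function name exact_agreement are hypothetical and are not data from this study.

```python
def exact_agreement(ratings_a, ratings_b):
    """Proportion of paired ratings on which two observers agree exactly."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


# Hypothetical seven-point ratings of one behavior over eight occasions.
observer_1 = [5, 6, 4, 5, 7, 5, 6, 4]
observer_2 = [5, 6, 4, 6, 7, 5, 6, 3]

agreement = exact_agreement(observer_1, observer_2)
print(f"exact agreement: {agreement:.0%}")
```

Exact agreement does not correct for agreement expected by chance, so a chance-corrected statistic for ordinal scales, such as a weighted kappa, may be a useful complement when reliability is reported in full.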

Figure 2. Shared leadership rating scale used to rate performance in shared leadership and assertion in ‘getting the job done’.

Clinical data

Clinical data were collected at the time of the operation, and a retrospective analysis of patient notes was carried out 6 months later to assess the immediate, peri-operative and late complications and follow-up for these patients.

Procedure

A surgeon of registrar level (observer 1) and a post-doctoral chartered psychologist (observer 2) collected data on tasks and behaviors, respectively. Other measures taken during observation included operative stage times, team composition in the theater and a record of any critical incidents.

Data Analysis

A mix of parametric and non-parametric tests was employed to analyze the data. We carried out ANOVAs to assess the differences of task completion and behavior ratings across the operative stages. In addition, we calculated Spearman’s rho rank order correlation coefficients (r s) for rates of task completion and behavior rating across stages. Finally, we used chi-square (χ2) tests for categorical analysis to explore the possible relations between behavior ratings, type and duration of the operation and post-operative outcomes.
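For readers unfamiliar with the rank-order correlation mentioned above, the sketch below shows one way Spearman’s rho can be computed: both variables are converted to ranks (tied values share the mean of their ranks) and the Pearson correlation of the ranks is taken. It uses only the Python standard library. The function names and the example values are hypothetical, and this is not the software used for the analysis reported here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are zero-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: communication rating and task completion (%) per case.
communication = [3, 4, 4, 5, 6, 6, 7]
task_completion = [55.0, 60.0, 72.0, 70.0, 80.0, 85.0, 90.0]
rho = spearman_rho(communication, task_completion)
print(f"Spearman's rho = {rho:.3f}")
```

A rank-based coefficient suits these data because the behavior ratings are ordinal scale points rather than interval measurements.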

RESULTS

Operation Duration

The overall mean duration of the operations was 136 minutes (range: 61–240 minutes). The breakdown of the operation duration into the various phases and stages is outlined in Table 2, and the means are depicted in Fig. 3. The mean durations (in minutes) for the stages were: PRE2A = 28.78 [open surgery (open) = 29.86, minimal access surgery (MAS) = 27.28], PRE3 = 10.8 (open = 11.68, MAS = 9.57), OP1 = 8.94 (open = 8.27, MAS = 9.85), OP2 = 39.1 (open = 44.62, MAS = 31.47), OP3 = 15.18 (open = 17.31, MAS = 12.23) and POST1 = 9.72 (open = 9.41, MAS = 10.14). A two-way ANOVA showed that there was no difference in operative duration between the types of operation for any stage of the procedure.
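The overall stage means quoted above can be cross-checked against the open and MAS means. The Python sketch below pools the two subgroup means, weighting by the 29 open and 21 MAS cases reported in the Sample section. It assumes that every case contributed a duration to every stage, which the text does not state explicitly; the variable and function names are illustrative.

```python
N_OPEN, N_MAS = 29, 21  # case counts from the Sample section

# stage: (reported overall mean, open mean, MAS mean), all in minutes
stage_means = {
    "PRE2A": (28.78, 29.86, 27.28),
    "PRE3": (10.8, 11.68, 9.57),
    "OP1": (8.94, 8.27, 9.85),
    "OP2": (39.1, 44.62, 31.47),
    "OP3": (15.18, 17.31, 12.23),
    "POST1": (9.72, 9.41, 10.14),
}


def pooled_mean(mean_open, mean_mas, n_open=N_OPEN, n_mas=N_MAS):
    """Case-weighted mean of the two subgroup means."""
    return (n_open * mean_open + n_mas * mean_mas) / (n_open + n_mas)


pooled = {
    stage: pooled_mean(open_mean, mas_mean)
    for stage, (_overall, open_mean, mas_mean) in stage_means.items()
}

# Largest gap between the pooled value and the reported overall mean.
max_gap = max(abs(pooled[s] - stage_means[s][0]) for s in stage_means)

for stage, value in pooled.items():
    print(f"{stage}: pooled {value:.2f}, reported {stage_means[stage][0]}")
```

Under that assumption the pooled values differ from the reported overall means by less than 0.01 minutes at every stage, which is consistent with rounding of the published figures.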

Table 2. ANOVA of operative type and operative duration for any stage/phase of a surgical operation

Figure 3. The average duration of the Observational Teamwork Assessment for Surgery (OTAS) operative stages. PRE2A signifies the duration of anesthesia (adapted from Healey et al. 23, 2004)

Task Completion

Table 3 summarizes task-completion, with the total number of tasks checked (n) and the mean, minimum and maximum number of tasks completed per operative phase. Task completion was high (92%) in the post-operative phases, lower intra-operatively (76%) and only 69% pre-operatively. Completion of the communication tasks was the lowest [68.64±1.44 (SE)], followed by equipment tasks (75.9±0.656), and the completion of patient tasks was the highest (93.48±0.639). Figure 4 shows that patient tasks were consistently high across phases, whereas communication remained lower in both the PRE and OP phases compared to the POST phase, while equipment task completion increased across phases. There was no significant difference between open and closed22 operations on task completion for any phase or stage.

Table 3. Summary of task completion per phase and task type

Figure 4. Average task completion per phase of the operation.

An example of a completed task list for PRE2 is given in Fig. 5.

Figure 5. Example of task list from PRE2.

Team Behavior Ratings

Overall mean ratings of all team behaviors were reasonably high (>4 on a seven-point scale) and did not vary greatly across the different phases of the operation (Fig. 6), although team behaviors were rated slightly more highly in the OP phase (mean = 5.4) than in the PRE (5.2) and POST phases (5.1). Significant differences were, however, observed in the ratings of the different kinds of behavior: communication (4.56) was rated lowest, followed by leadership (5.20), shared-monitoring (5.41) and co-ordination (5.48), and co-operation (5.77) was rated highest. A two-way repeated measures ANOVA, conducted on behavior (5) and phase (3), confirmed that these behaviors differed significantly from each other overall and across phases [F (4, 46) = 54.45, P < 0.001]. Communication and co-ordination were rated higher in the OP phase than in PRE and POST phases, whereas leadership, co-operation and shared-monitoring were comparatively more consistent across phases. As with task-completion, there was no significant difference between open or closed operations with respect to behaviors.

Figure 6. Average behavior ratings per phase of the operation.

Relations Between Behavior and Task Completion

After aggregating tasks into mean percentage scores, we tested whether any or all of the rated behaviors correlated with overall task completion. In the PRE and POST phases, ratings of communication correlated with overall task completion (r s = 0.468, P < 0.001 and r s = 0.345, P = 0.007, respectively), but this was not the case in the OP phase. We also tested whether there was any correlation between completion of the separate task-types and ratings of separate behaviors. In the OP phase there was no correlation between behavior ratings and tasks. However, pre-operatively there was a highly significant positive correlation between communication tasks and rating on communication behavior (r s = 0.415, P = 0.001) and a marginally significant positive correlation between communication tasks and rating of leadership (r s = 0.233, P = 0.05). Post-operatively, communication again was positively correlated with communication task (r s = 0.308, P = 0.01), and co-ordination was positively correlated with equipment task completion (r s = 0.321, P = 0.01). These results suggest overall that there is some relation between broad-based team behaviors and task completion but that, fundamentally, they are addressing different aspects of team performance.

Post-operative Complications

Of the 19 cases with complications, only four were in the MAS operation category, whereas 15 were in the open operation category. Principal complications included pain, pyrexia, wound infection, urinary retention, splenic injury, bladder injury and two post-operative deaths (one cardiac arrest and one post-operative sepsis with multi-organ failure leading to death). We found no significant relationships between rates of task-completion and the occurrence of complications. We did find a relationship between team behaviors and complication occurrence. However, while the results show that there may be an association between teamwork and post-operative complications, this analysis cannot be interpreted at face value at this time. A full risk-stratified analysis and other factors, such as technical skills, will have to be taken into account before any such association can be made; this area will be explored further in future studies.

DISCUSSION

We have described the first full trial of an observational method of team assessment in surgery. The assessment covers both tasks and behaviors in the PRE, OP and POST phases of surgery. Team performance was measured against a structured protocol of tasks and teamwork behaviors which were assessed on a set of ordinal scales. Ratings of behaviors were global and based on the observation of effective and ineffective behaviors in the OT. This framework of measurements therefore covers both specific tasks carried out by the team and also the team’s overall performance. This distinction is important because different teams may complete a similar number of routine tasks, but vary in the quality of their communication and co-ordination.

Our findings suggest that observational assessment in operating theaters is feasible, purposeful and informative. The ratings of overall team performance were reasonably high, though variable, but the completion of operative tasks was some way below best-practice guidelines and certainly below the standard of performance expected of high-reliability teams. Clinically, the lapses in task completion did not appear to affect outcome, but there was evidence that clinically significant steps were being missed, which at the very least eroded safety margins. There was, for instance, a frequent failure to check both surgical and anesthetic equipment and a failure to confirm the procedure verbally; patient notes were absent in about one-eighth of the cases, and in over a third of the cases there was no verbal confirmation of readiness to start the operation. There were also incomplete notes, lack of equipment, lack of blood results and patients not being starved on wards.

Considerable variation in stage duration was also found, which reflected, in part, variations in procedures and patient-specific problems. However, the variation pre-operatively was also attributed in many cases to delays. Delays and changes to the case-lists occurred in over 70% of the cases. The reasons for delays were numerous, including the patient journey to theater, busy ward staff or porters and bed allocation problems. The time that elapsed in the anesthetic room once the patient had arrived also varied, for reasons that included the patient’s condition and the absence of the surgeon or anesthetist. Other delaying factors included the staff being unfamiliar with stock locations, coupled with a lack of compensatory supervision. The potential risk of an error and/or accident is the highest when these delays and associated deviations from best practice mount up and are compounded by the additional pressures of workload.24

Communication was rated lower than other behaviors, particularly in the PRE and POST phases. This was due in part to the fact that inter-disciplinary communication is less formalized and more distributed before and after the actual operation. There was a positive correlation between communication rating and overall task completion pre-operatively and post-operatively, but not intra-operatively. There was also a positive correlation between communication tasks and the rating on communication behavior and leadership. These results suggest overall that there is some relation between broad-based team behaviors and task completion. However, importantly, the results also address different aspects of team performance in providing both information about protocol and deviations from established protocol and evidence-based information on ad hoc variations in team behavior.

A crucial issue to be explored in future studies is whether team performance can be shown to affect outcome. In our study, we found no significant relationships between task-completion and complication. However, behavior ratings were associated with the occurrence of complications. For instance, we found that verbal communication confirming antibiotics was observed in only 53% of cases, which may have influenced infection outcome. On further investigation only four patients had a post-operative infection, and of these only two had a lack of verbal confirmation of antibiotics being given. However, a lack of verbal confirmation does not necessarily mean that the antibiotics were not written up already or that the patient did not already receive them with the induction of anesthesia. Outcome data have to be interpreted with caution as appropriate risk stratification must be applied for a proper analysis and various other factors, such as technical skills, have to be taken into consideration. This study was a feasibility study to assess teamwork at this stage and not to relate teamwork to outcome, although this may be possible in the future once other factors have been accounted for. The main point of this study is simply to illustrate that different forms of team performance data may be used in the general modeling of the system and its relationship to outcome. It will be the aim of future studies to test which form of measurement and data are the most effective for analyzing team performance in general.

Another sensitive subject to be taken into account while undertaking research of this nature is the fear of blame and disciplinary action. As pointed out by Vincent et al. 1 in their paper, fears may be expressed by members of surgical teams that observation may be used for ‘surveillance’, checking up and, possibly, as a basis for disciplinary action. We have stressed to the team that our data will only be used as a research tool. Most importantly, it is necessary to emphasize that the purpose of such observations is not to study individuals, but processes, procedures and team performance in general. The aim is to observe common patterns over a series of operations to help improve teamwork and efficiency, not to examine individual deficiencies.

We are currently in the process of refining the observational measures and testing them in a further sample of operations in a different theater setting. A particularly important development is to provide ratings of the three sub-teams (nursing, anesthetic and surgical) as we believe this will give a more accurate reflection of theater performance than the overall team ratings used in this study. We also intend to develop a short version suitable for use in training and simulation and, crucially, for direct comparison of the team performance during training with that actually observed in the OT. Further assessment of reliability and validity is also required and is being addressed in on-going studies. We need to pay particular attention to delineating the process of observation of behaviors and specifying how these should be rated so that the measures may be more widely used. A clearer specification of scoring may provide the necessary detail to show why certain teams perform at certain levels on each behavioral dimension and how and why those performance elements affect outcome. A closer analysis of tasks, particularly communication, may provide some indication of those relations. Moreover, a consideration of team composition and its relationship to team co-ordination is also important. Indeed, we see the potential to construct specific behavior ratings that are more closely related to the activity of particular sections of the team at particular stages.

Downloads

The measures used in this study will be freely available from The Clinical Safety Research website: http://www.csru.org.uk