Introduction

Postoperative pulmonary complications (PPCs) are common after surgical procedures, with a variable incidence reported in the literature [1,2,3]. PPCs have been shown to increase hospital cost, length of stay, and in-hospital mortality [4,5,6]. Efforts have been made to identify patients at highest risk of these complications in hopes of performing earlier interventions and targeted therapy [7, 8]. Several risk assessment algorithms have been developed to assist with identification of patients at risk for PPCs. The utility of some early risk indices developed out of the Veterans Affairs system in the United States was limited by poor generalizability due to the patient populations in these studies [9, 10].

The Assess Respiratory Risk in Surgical Patients in Catalonia (ARISCAT) score, the Prospective Evaluation of a Risk Score for Postoperative Pulmonary Complications in Europe (PERISCOPE) study, which validated the ARISCAT tool, and the Gupta pulmonary risk index, which utilizes data from the National Surgical Quality Improvement Program (NSQIP) database, are commonly used models for predicting PPCs [3, 11, 12]. These models have been validated in a large number of patients and are routinely used in the preoperative setting to determine the risk of pulmonary complications. While these tools were developed using large surgical populations in both Europe and the United States, few otolaryngology cases were included in their development. In fact, only 6% of the cases in the ARISCAT cohort were otolaryngology surgeries, and just 0.3% of the cases evaluated in the Gupta study were otolaryngology surgeries. Additionally, these studies include otolaryngology as a single category and do not specify the type of surgery performed. This is problematic, as otolaryngology encompasses a broad range of surgeries spanning minor outpatient procedures to major head and neck cancer surgeries.

Head and neck cancer resections are prolonged surgeries that carry risk of PPCs, with a reported incidence as high as 33% [13]. There are important differences in these procedures when compared to other major surgeries, such as distortion of the upper airway and routine use of tracheostomy. As free flap reconstruction has become the gold standard for these patients, tracheostomy is regularly included in surgical planning due to expected post-operative upper airway obstruction [14]. Furthermore, the average surgical and anesthesia time in these cases is longer than most abdominal or thoracic cases, but the impact of case length without a concomitant abdominal or thoracic incision is unknown. Given these unique factors, it is uncertain if ARISCAT and the Gupta index provide accurate estimates of patients’ risk when undergoing head and neck cancer surgery. Importantly, misclassification of complication risk could lead to unnecessary delays in delivery of care. In this study, we sought to externally validate these risk indices in a large cohort of head and neck surgery patients. We hypothesized that these models would perform poorly in this population.

Methods

Study design and cohort

This is a retrospective cohort study of all head and neck oncologic surgeries with free flap reconstruction performed at a tertiary care center in the Southeastern United States. Following approval by the local Institutional Review Board, cases matching the above description between 2005 and 2017 were identified through use of CPT codes (20969 and 15757) and manually reviewed to ensure free flap reconstruction was performed after major head and neck resection. Given the retrospective nature of the study, informed consent was not required by the ethics review board. Baseline characteristics were recorded for each case including age, gender, body mass index (BMI), pre-operative metabolic equivalents (METs), and the American Society of Anesthesiologists Physical Status Classification (ASA PS). Pre-operative variables assessed included baseline hemoglobin level, pre-operative oxygen saturation, and serum albumin. Other patient characteristics included pertinent past medical history, home oxygen requirement, and whether patients were treated for an upper respiratory infection (URI) within 30 days prior to surgery. Risk scores based on ARISCAT and the Gupta index were calculated for each patient [3, 11, 12]. Factors used for calculation of the ARISCAT score included age, pre-operative oxygen saturation, whether the patient was anemic pre-operatively (hemoglobin less than 10 g/dL), whether the patient had an upper respiratory infection within a month prior to surgery, site of surgery, length of surgery, and whether the surgery was emergent. Factors used for calculation of the Gupta index included age, American Society of Anesthesiologists class, chronic obstructive pulmonary disease, dependent functional status, preoperative sepsis, smoking before operation, and type of operation [3]. Given the primary surgical location of the head and neck, each case was considered ‘peripheral’ and ‘non-emergent’ for the purpose of calculating the ARISCAT score and ‘ENT’ for the purpose of calculating the Gupta index.
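To make the ARISCAT calculation concrete, the following is a minimal sketch of how the seven factors listed above could be combined into a score. It is not the code used in this study. The point weights are those commonly reported for the published ARISCAT score and are an assumption here; they should be checked against the original publication [11] before any use. The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Incision-site points as commonly reported for ARISCAT (assumption, verify against [11]).
INCISION_POINTS = {"peripheral": 0, "upper_abdominal": 15, "intrathoracic": 24}


@dataclass
class AriscatInputs:
    age_years: float
    spo2_percent: float          # pre-operative oxygen saturation
    respiratory_infection: bool  # within the month before surgery
    anemia: bool                 # pre-operative anemia as defined in the text
    incision: str                # key of INCISION_POINTS
    duration_hours: float        # length of surgery
    emergency: bool


def ariscat_score(p: AriscatInputs) -> int:
    """Sum the ARISCAT points for one patient (weights are assumed, see above)."""
    score = 0
    if p.age_years > 80:
        score += 16
    elif p.age_years > 50:
        score += 3
    if p.spo2_percent <= 90:
        score += 24
    elif p.spo2_percent <= 95:
        score += 8
    if p.respiratory_infection:
        score += 17
    if p.anemia:
        score += 11
    score += INCISION_POINTS[p.incision]  # raises KeyError for an unknown site
    if p.duration_hours > 3:
        score += 23
    elif p.duration_hours >= 2:
        score += 16
    if p.emergency:
        score += 8
    return score


# Hypothetical patient: 62 years old, SpO2 98%, peripheral, non-emergent, 585-minute case.
example = AriscatInputs(62, 98, False, False, "peripheral", 585 / 60, False)
print(ariscat_score(example))
```

Under these assumed weights, a peripheral, non-emergent case lasting more than three hours contributes 23 points from duration alone, so most patients in a cohort with long operative times would be expected to cluster in a narrow band of scores.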

Outcomes

The primary outcome was development of a PPC, which comprised post-operative pneumonia (PNA) and post-operative respiratory failure (PORF). These two endpoints were defined based on the criteria outlined in the PERISCOPE study as shown in Table 1 [12]. Billing code diagnoses were initially used to identify patients with PNA and PORF, and each case was then manually reviewed to ensure that PERISCOPE criteria were met. Primary outcomes for this study were defined at the initiation of study design. No subgroup analyses were conducted in this study.

Table 1 Criteria for diagnosis of primary endpoints

Statistical analysis

Patient demographics and characteristics were summarized using descriptive statistics. Frequencies were used for categorical variables, and continuous variables were summarized by the sample median and interquartile range. Functional status as classified in the Gupta pulmonary risk index (independent, partially dependent, or fully dependent) was not readily available in the patient record. Instead, we incorporated preoperative metabolic equivalents (METs), a less subjective measure of functional capacity, to ensure that the Gupta pulmonary risk index was not unduly penalized. Every potential categorization of METs was assessed, and the performance of the Gupta index reported in the study was the best performance observed over all the potential categorizations. Data completion rate was excellent for all variables with the exception of METs. Missing data were handled using multiple imputation. The discrimination of each risk index was evaluated using the area under the receiver operating characteristic (ROC) curve. Calibration was assessed via a plot of the predicted risk versus the nonparametric regression estimated risk. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and percentage correctly classified were also calculated for each potential cutoff. Statistical analyses were conducted using R (Boston MA, USA).
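As an illustration of the performance measures named above, the sketch below computes the area under the ROC curve as the proportion of case/non-case pairs in which the case has the higher score (ties counted as one half), together with the cutoff-based measures. It is a simplified Python sketch, not the R code used for the analysis; the function names and the toy data are hypothetical, and it does not cover confidence intervals, calibration, or multiple imputation.

```python
import math


def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    controls = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def _ratio(numerator, denominator):
    # Return NaN rather than raising when a measure is undefined at this cutoff.
    return numerator / denominator if denominator else math.nan


def cutoff_metrics(scores, outcomes, cutoff):
    """Classify score >= cutoff as 'at risk' and summarize the 2x2 table."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, outcomes):
        flagged = s >= cutoff
        if flagged and y:
            tp += 1
        elif flagged and not y:
            fp += 1
        elif not flagged and y:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "correctly_classified": _ratio(tp + tn, tp + fp + tn + fn),
    }


# Toy data for illustration only (not study data): risk scores and PPC status.
toy_scores = [10, 26, 26, 30, 45, 20]
toy_outcomes = [0, 0, 1, 1, 1, 0]
print(auc(toy_scores, toy_outcomes))
print(cutoff_metrics(toy_scores, toy_outcomes, cutoff=26))
```

Repeating `cutoff_metrics` over every observed score value gives the kind of per-cutoff summary reported for each index.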

Results

Seven hundred ninety-four patients underwent free flap reconstruction between 2005 and 2017. Baseline characteristics are summarized in Table 2. The median age was 62 (IQR 41–83) with 519 males (65%) and 275 females (35%). Tobacco and alcohol use were reported by 65.9% and 46.5% of patients, respectively. Prior to the operation, 77 patients (9.7%) had a diagnosis of COPD, 22 (2.8%) had asthma, and 9 patients (1.1%) had a history of lung cancer. Thirteen patients used home oxygen (1.6%), with 10 requiring 1–3 L/min by nasal cannula. Other co-morbidities included obstructive sleep apnea with use of CPAP (n = 16, 2%), congestive heart failure (n = 33, 4.2%), prior cerebrovascular accident (n = 51, 6.4%), and liver disease (n = 35, 4.4%).

Table 2 Patient Characteristics

The mean ASA PS class was 2.92, with 77.7% of patients having an ASA of III (n = 617). Median preoperative oxygen saturation was 98% (IQR 95–100) and median hematocrit was 40% (IQR 30–47). Six percent of patients had a URI within 30 days prior to surgery (n = 45). All operations were elective and considered peripheral, with a median operative time of 585 min (IQR 412–870) and a transfusion rate of 35.4% (n = 281). Thirteen percent of patients (n = 106) required reoperation during the same hospital admission. Eight percent of patients developed myocardial infarction (n = 61). Perioperative information is summarized in Table 3.

Table 3 Pre-operative, surgical, and post-operative data

Post-operatively, mean mechanical ventilation time was 0.73 days (range 0–41), with 17% of patients (n = 135) requiring one or more days on the ventilator and one patient being discharged from the hospital requiring mechanical ventilation. Ten percent of patients (n = 81) developed PNA, 7.7% of patients (n = 61) developed PORF, and 4.8% of patients (n = 38) developed both PORF and PNA. The overall incidence of PPCs in this cohort was 13.1% (n = 104).
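The overall PPC count follows from the component counts because patients who developed both PNA and PORF are counted only once. The short check below reproduces that arithmetic from the figures reported above; the variable names are illustrative.

```python
# Counts reported in the text.
n_patients = 794
n_pna = 81    # post-operative pneumonia
n_porf = 61   # post-operative respiratory failure
n_both = 38   # patients with both PNA and PORF

# Inclusion-exclusion: subtract the overlap so it is not counted twice.
n_ppc = n_pna + n_porf - n_both
incidence = n_ppc / n_patients

print(n_ppc, f"{incidence:.1%}")  # prints "104 13.1%"
```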

ARISCAT

The ARISCAT score displayed poor ability to discriminate between patients with and without a PPC, with an area under the curve (AUC) of 0.596 (95% CI: [0.542, 0.649]). It demonstrated reasonable calibration to a point, but a ceiling effect became apparent in the mid 20s, beyond which the score consistently overestimated risk. The ARISCAT score is often implemented clinically using a cutoff of 26 [11]. Using this cutoff, the ARISCAT score displayed good sensitivity of 92% (95% CI: [85, 97]) and negative predictive value of 94% (95% CI: [88, 97]), but poor specificity and positive predictive value of 17% (95% CI: [14, 20]) and 14% (95% CI: [12, 17]), respectively. The performances at other potential cutoffs are given in Table 4.

Table 4 Performance measures of post-operative pulmonary indices

Gupta index

The Gupta index was substantially limited in that it separated participants into only 7 levels of risk. The discrimination of the Gupta index was estimated to be slightly higher than that of the ARISCAT score, with an AUC of 0.649 (95% CI: [0.589, 0.701]), but the difference failed to attain statistical significance (p = 0.08). Despite its discriminatory ability, the index displayed almost no calibration. The optimal cutoff for the Gupta index was at approximately 30% risk. At this cutoff, the index displayed adequate specificity of 88% (95% CI: [85, 90]) and negative predictive value of 91% (95% CI: [88, 93]), but mediocre sensitivity of 41% (95% CI: [32, 51]) and positive predictive value of 33% (95% CI: [25, 42]). The performances at other potential cutoffs are given in Table 4. The receiver operating characteristic (ROC) curves and linear prediction analyses for both indices can be seen in Fig. 1 and Fig. 2, respectively.
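The reported predictive values can be related to sensitivity, specificity, and the 13.1% PPC incidence (104 of 794) through Bayes' rule. The sketch below is a consistency check under that assumption rather than a reproduction of the study's analysis; the function name is hypothetical, and because the inputs are rounded percentages the results should only be expected to agree with the reported values to within roughly one percentage point.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


prevalence = 104 / 794  # overall PPC incidence reported in the Results

# Rounded sensitivity and specificity as reported at each index's cutoff.
ariscat_ppv, ariscat_npv = predictive_values(0.92, 0.17, prevalence)
gupta_ppv, gupta_npv = predictive_values(0.41, 0.88, prevalence)

print(f"ARISCAT at 26: PPV {ariscat_ppv:.2f}, NPV {ariscat_npv:.2f}")
print(f"Gupta at 30%:  PPV {gupta_ppv:.2f}, NPV {gupta_npv:.2f}")
```

The calculation also makes the trade-off explicit: at a fixed incidence, the low specificity of the ARISCAT cutoff drives its low positive predictive value, while the low sensitivity of the Gupta cutoff has only a modest effect on its negative predictive value because PPCs are relatively uncommon.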

Fig. 1 ROC analysis for both ARISCAT (solid line) and Gupta index (dotted line) of post-operative pulmonary complications

Fig. 2 Linear prediction for ARISCAT (solid line) and Gupta index (dotted line) for post-operative pulmonary complications

Discussion

Post-operative pulmonary complications are known to increase cost, length of hospital stay, and the risk of 30-day mortality in surgical patients [4, 6, 15, 16]. Given the substantial burden of these complications, predictive risk assessments have become an integral component of perioperative medicine. Obtaining an accurate prognosis allows for appropriate treatment planning and implementation of strategies to reduce the risk of PPCs, including lung-protective ventilation, aggressive pulmonary rehabilitation, fluid status assessment, and aggressive hemodynamic monitoring [17, 18]. Although the ARISCAT score and the Gupta pulmonary risk index are widely used and accurately stratify a variety of surgical patients into risk categories for the development of PPCs, we observed poor predictive performance of these instruments in major head and neck surgery at our institution. Recognizing the poor performance of these tools is important, as there is consensus that tools such as ARISCAT should be used in evaluating the majority of patients undergoing surgery [19]. Based on our results, these tools have the potential to lead to unnecessary delays in care delivery or encourage inappropriate changes to intra-operative and post-operative care for patients undergoing head and neck surgery with free flap reconstruction.

The observed predictive performance of these instruments in major head and neck surgery is low, and this is likely due to a combination of factors. First, while the development of these tools involved an impressive review of greater than 250,000 surgical cases in total, there were relatively few otolaryngology cases. Specifically, otolaryngology accounted for only 5% (n = 133) of the 1627 cases evaluated in ARISCAT and 6% (n = 307) of the 5099 cases in PERISCOPE, which validated the ARISCAT score. Despite the large number of patients in the NSQIP dataset used to develop the Gupta pulmonary risk index, only 0.3% (n = 646) were otolaryngology procedures. In fact, prior to the current study, Loeffelbein and colleagues performed the largest study evaluating PPCs in major head and neck surgery. They evaluated 648 patients and found an overall rate of PPCs of 18%, and they determined that patients with ASA class 3 or greater, obesity, or history of alcohol abuse should be monitored closely for development of PPCs [20].

Additionally, otolaryngology surgery spans a wide range of cases from minor outpatient procedures, with low inherent risk of PPCs, to major head and neck surgeries with a high risk of PPCs. Without knowing the specific types of otolaryngology cases included, it is unclear whether these risk indices apply to the present patient population. It should be noted that the above-mentioned instruments perform very well in patients undergoing intra-abdominal, open chest, or vascular surgery, which may be because these procedures produce predictable post-operative alterations in respiratory function. Specifically, chest wall and diaphragm manipulation does not occur in head and neck surgery, and such manipulation in other types of surgery likely contributes significantly to the risk of developing PPCs [21, 22].

Other characteristics unique to major head and neck surgery may also contribute to the low discrimination observed in the perioperative risk assessment tools. In head and neck oncologic surgery, patients often have tumors that compromise the airway, and surgical resection frequently alters upper aerodigestive anatomy. This alteration leads to difficulty in swallowing and potential risk for aspiration. While patients are routinely kept NPO following surgery to allow for healing, they are still at risk of aspirating oral cavity secretions [23]. Additionally, free flap reconstruction requires substantial operative time for vascular anastomosis of the flap pedicle, and these patients frequently have comorbid poor pulmonary function. Due to the extended length of surgery, combined with a high frequency of comorbid conditions, most head and neck surgical reconstructive patients would fall into high-risk categories for the existing risk assessments. However, tracheostomy is commonly utilized in the immediate post-operative period as a reliable means to prevent post-operative airway obstruction caused by free flap reconstruction, and it does not appear to lengthen hospital stay [24, 25]. The presence of a tracheostomy may improve secretion clearance and thereby reduce the risk of post-operative PNA. Tracheostomy also decreases airway dead space and resistance, while simultaneously aiding in ease of post-operative positive pressure ventilation and lung expansion [26, 27].

Despite the differences between head and neck surgery and other types of major open surgical procedures, these patients are still at considerable risk for development of PPCs [13]. There is significant variation within the literature defining PPCs and the factors determined to be associated with pulmonary complications in head and neck free flap reconstruction. Moreover, the prevalence of PPCs has been widely variable within each population studied. Xu and colleagues found an incidence of 11.6% of post-operative pneumonia in patients undergoing head and neck surgery with free flap reconstruction, which was significantly higher than patients not undergoing free flap reconstruction. In this study, it was determined that prolonged hospital stay was associated with pulmonary complications, though it is unclear as to whether the prolonged hospitalization was causative in the development of PPCs [28]. Other studies have found much higher rates of development of PPCs and varying risk factors that are associated with their development. Damian and colleagues studied a cohort of 110 patients and found a rate of PPCs of 33%, but they were unable to identify specific risk factors in regard to pre-operative pulmonary status and risk of PPC development [13]. Similarly, Pohlenz and colleagues found a 34% incidence of respiratory complications but did not define the criteria necessary to constitute a complication. They determined that operative time and ASA status were associated with the development of any medical complication, including respiratory complications [29]. Forty-five percent of patients developed PPCs in a study by Petrar and colleagues [30]. The higher incidence of PPCs reported was likely reflective of the fact that atelectasis was included in their analysis. We elected not to include atelectasis as a pulmonary complication as this finding is common following prolonged surgery and can persist for weeks without significant consequences [31]. 
While the above studies are valuable in identifying rates and risk factors for development of PPCs, they did not specifically evaluate the performance of risk assessment tools in this population.

This study has several strengths, which contribute to the significance of the result. First, we chose to define major PPCs as they were previously defined in the PERISCOPE study, which allowed comparative analysis of patients undergoing free flap reconstruction to commonly used preoperative pulmonary indices [12]. Second, the patient population on which we report comprises a greater number of patients undergoing major head and neck surgery with free flap reconstruction than either the ARISCAT or Gupta pulmonary risk studies. Finally, the overall incidence of PPCs in our study was 13%, which represented the most common postoperative complication for the cohort in this study.

This study has several limitations as well. First, as this is a retrospective review, the collection of data was limited to the variables that were already available within the medical record. Since functional status as defined by Gupta and colleagues was not available, the index had to be modified to use a categorization of preoperative METs. By using the best performing categorization, we attempted to portray the Gupta index optimistically. However, it is possible that the index’s performance without the change may have differed from what is reported here. Consequently, comparisons between the two indices in this manuscript should be interpreted tentatively. Additionally, the population in this study is limited to a single center. While this factor could affect generalizability of the results, patients undergoing the procedure in question (head and neck cancer resection with free flap reconstruction) are typically similar in terms of risk factors and exposures both across the United States and globally [32]. Finally, there are some differences between the populations from which the Gupta index and ARISCAT score were developed and the current study’s population. For example, both of the aforementioned indices evaluated populations with a roughly 1:1 ratio of male and female patients, whereas the current study shows a male predominance (65% male patients). While it is possible that this and other differences could affect the scoring of the Gupta index and ARISCAT, we feel that these differences may actually serve as an argument for not using these indices in head and neck surgery populations.

This is the largest study evaluating the performance of perioperative risk prediction models for pulmonary complications in patients undergoing major head and neck surgery with free flap reconstruction. The existing predictive models performed poorly, likely due to the unique aspects of head and neck oncologic surgery and the heterogeneity of the surgical populations in which these tools were developed. While these tools are useful for a wide range of surgical procedures, their utility in head and neck surgery appears to be limited. A risk assessment tool developed explicitly for major head and neck surgery with free flap reconstruction is needed for accurate perioperative risk assessment.

Conclusions

Patients undergoing major head and neck surgery with free flap reconstruction are at risk for development of post-operative pulmonary complications. While perioperative risk assessment tools have been developed to identify patients most at risk, the indices examined here did not perform well in head and neck free flap reconstructions performed at our institution. Although both the Gupta index and ARISCAT are commonly employed in pre-operative risk assessment prior to major surgeries, utilizing these indices in this particular patient population may lead to delay in care delivery secondary to misclassification of risk. As such, there is a need for a predictive model that specifically addresses the risk of development of PPCs in patients undergoing major head and neck surgery.