Introduction

Head and neck squamous cell carcinoma (HNSCC) is the sixth most common cancer worldwide, accounting for more than 900,000 cases and 400,000 deaths annually [1, 2]. The majority of patients present with loco-regionally advanced disease (Stage III–IV), which carries a poor prognosis: more than 50% of these patients develop local or regional recurrence with or without distant metastases within 3 years of their initial treatment [3,4,5]. Around 13,800 HNSCC recurrences occur each year in the United States alone; median survival for these patients is between 6 and 15 months, depending on patient- and disease-related factors [6, 7].

Historically, locally recurrent disease has been treated with combined modality approaches, including salvage surgery and reirradiation with or without chemotherapy (CT) [8, 9]. Selective neck dissection, elective neck dissection with more extensive lymph node removal, and prophylactic neck radiotherapy (RT) have also been shown to decrease the risk of recurrence [10]. Based on landmark phase III trials [11], the Food and Drug Administration (FDA) and European Medicines Agency (EMA) approved pembrolizumab, an anti-programmed cell death 1 (PD-1) therapy, as a first-line treatment for eligible patients with recurrent or metastatic HNSCC [12, 13]. However, only a minority of patients with HNSCC realize significant clinical benefit from these novel immunotherapies, and despite advancements in surgical and systemic treatments, survival rates have not significantly increased over the past 30 years [1, 14].

Recurrences are difficult to treat for a number of reasons, including the effects of primary or initial treatments on malignant cells, and the infiltrative and multifocal nature of recurrent HNSCC lesions [15]. Multiple national and international guidelines have been developed to improve and guide treatment management in patients with recurrent head and neck cancer (HNC), but to date, no study has systematically reviewed the quality and rigor in development of these guidelines. The Appraisal of Guidelines for Research and Evaluation (AGREE II) tool was established to assess the quality of clinical practice guidelines (CPGs) and its use has been validated and proven effective in a variety of fields [16,17,18], including otorhinolaryngology [19,20,21,22,23,24]. In this study, we aimed to assess existing recommendations for the management of recurrent HNC using the AGREE II instrument, in order to evaluate their clinical applicability and the methodological rigor and transparency with which they were developed.

Methods

Literature search strategy

The search methodology, study selection, and reporting in this study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) criteria [25]. An electronic search of the PubMed, Embase, and Scopus databases for all clinical practice guidelines for the management of recurrent HNC was conducted using the following search terms and their synonyms: “recurrent” or “metastatic” with “head and neck cancer” and “guideline”. Searches were also conducted in Google Scholar and the gray literature to capture any additional articles not retrieved from the first three databases.
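
As a rough illustration of how the stated terms combine, the sketch below assembles a generic Boolean search string in Python. The helper `build_query` and the variable names are hypothetical, the synonyms mentioned in the text are not listed and so are not included, and the exact field tags and syntax accepted by PubMed, Embase, and Scopus differ and are not reproduced here.

```python
def build_query(term_groups):
    """Join the terms inside each group with OR, then join the groups with AND."""
    clauses = []
    for group in term_groups:
        quoted = " OR ".join(f'"{term}"' for term in group)
        # Parenthesize only groups that hold more than one alternative
        clauses.append(f"({quoted})" if len(group) > 1 else quoted)
    return " AND ".join(clauses)


# Only the terms quoted in the text; synonyms are not given and are left out
search_terms = [
    ["recurrent", "metastatic"],
    ["head and neck cancer"],
    ["guideline"],
]

query = build_query(search_terms)
print(query)
```

Keeping each concept as its own group makes it straightforward to append synonyms to a group without touching the AND structure.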

Guideline selection

Two independent raters (EDR, NS) reviewed the titles and abstracts of the identified guidelines for relevance. Inclusion criteria were English-language consensus statements, both national and international, published from the time of database inception to November 10, 2021. Systematic reviews, textbook chapters, and primary studies were excluded. The remaining articles were then reviewed in full by the same two independent reviewers (EDR, NS). An article was excluded upon full-text review if (1) recurrent or metastatic HNC was not the primary focus of the guideline, (2) it did not mention recurrent or metastatic HNC, (3) it was an older version of an existing guideline, or (4) it was not a guideline or consensus statement. Any discrepancies were resolved by consensus between the two raters or via discussion with a third rater (KR).

Data extraction

Included CPGs were reviewed and one author (EDR) extracted the following data from each article: first author and year of publication, guideline developer group, country, funding source (if any), evidence base/methodology, and target user population. Missing data are labeled as “—” or “None specified”/ “Not specified” in the provided tables.

AGREE II scoring

Each included CPG was reviewed independently by four authors (JL, SV, DR, MS) who were trained in the AGREE II methodology and evaluation criteria (www.agreetrust.org) and had implemented the AGREE II instrument in prior publications. Guidelines were scored on each of the 23 items in the six domains of the AGREE II instrument: (1) Scope and purpose, (2) Stakeholder involvement, (3) Rigor of development, (4) Clarity of presentation, (5) Applicability, and (6) Editorial independence. A summary of the items within each domain can be found in Table 1. Each item was rated on a 7-point Likert scale according to how many of the necessary criteria for that item were met by the guideline. CPGs that did not address a given item within a domain received a score of ‘1’ for that item. Scaled domain scores (range 0–100%) were calculated in accordance with the formula provided in the AGREE II instrument methodology manual [26]:

$$\mathrm{Scaled}\;\mathrm{domain}\;\mathrm{score}=\frac{\mathrm{Obtained}\;\mathrm{score}-\mathrm{Minimum}\;\mathrm{possible}\;\mathrm{score}}{\mathrm{Maximum}\;\mathrm{possible}\;\mathrm{score}-\mathrm{Minimum}\;\mathrm{possible}\;\mathrm{score}}\times100$$

Table 1 Summary of the 23 components within the six quality domains of the AGREE II instrument

Scaled scores in each of the six domains were used to calculate an overall numeric score (the mean of all six scaled domain scores) and an overall quality rating. The quality rating was based on the number of domains in which the CPG demonstrated adequacy, defined as a scaled domain score of ≥ 60%, a threshold previously established in the literature [20, 21, 27]. Guidelines with ≥ 5 adequate domains were designated ‘high’ quality; those with 3–4 adequate domains, ‘average’ quality; and those with ≤ 2 adequate domains, ‘low’ quality.
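
The scoring rules described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' analysis code: the function names are hypothetical, and the ratings and domain scores below are invented values rather than data from Table 3. It assumes, per the AGREE II manual, that the obtained score is the sum of all raters' item scores within a domain, and that the minimum and maximum possible scores are 1 and 7 multiplied by the number of items and the number of raters.

```python
def scaled_domain_score(ratings):
    """Scaled score (0-100) for one domain.

    ratings: one list of item scores (each 1-7) per rater.
    """
    n_raters = len(ratings)
    n_items = len(ratings[0])
    obtained = sum(sum(rater_scores) for rater_scores in ratings)
    minimum = 1 * n_items * n_raters  # every rater gives every item a 1
    maximum = 7 * n_items * n_raters  # every rater gives every item a 7
    return (obtained - minimum) / (maximum - minimum) * 100


def quality_rating(domain_scores, threshold=60.0):
    """'high' for >= 5 adequate domains, 'average' for 3-4, 'low' for <= 2."""
    adequate = sum(1 for score in domain_scores if score >= threshold)
    if adequate >= 5:
        return "high"
    if adequate >= 3:
        return "average"
    return "low"


# Hypothetical ratings: four raters scoring a three-item domain
example_ratings = [[5, 6, 5], [4, 4, 4], [5, 4, 4], [4, 4, 4]]
example_score = scaled_domain_score(example_ratings)
print(round(example_score, 1))  # 56.9, just under the 60% adequacy threshold

# Hypothetical scaled scores for the six domains of a single guideline
example_domains = [61.0, 65.0, 30.0, 70.0, 15.0, 100.0]
overall_score = sum(example_domains) / len(example_domains)
print(quality_rating(example_domains))  # average (four adequate domains)
```

Note that the overall numeric score is a plain mean of the six scaled domain scores, so a guideline can sit near 60% overall while still falling short of a ‘high’ rating, which depends on the count of adequate domains.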

Data analysis

Interrater reliability and scoring agreement among the four independent raters were assessed by a fifth independent author (EDR) via two-way random effects intraclass correlation coefficient (ICC) analysis (Stata 15.1, StataCorp, College Station, TX). ICC was classified as poor (< 0.20), fair (0.21–0.40), moderate (0.41–0.60), good (0.61–0.80), or very good (0.81–1.00) as previously established in the literature.
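
For readers unfamiliar with the statistic, the following Python sketch computes one common form of the two-way random effects ICC and applies the classification bands above. Several points are assumptions: the text does not state which ICC form was reported, so the single-rater, absolute-agreement form, often written ICC(2,1), is used here; the function names are hypothetical; the rating matrix is illustrative and is not data from this study; and values falling exactly on a band boundary (such as 0.20) are assigned to the lower band, which the text does not specify.

```python
def icc_2_1(scores):
    """Two-way random effects, absolute agreement, single-rater ICC.

    scores: one row per rated subject, one column per rater.
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of raters
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def classify_icc(icc):
    """Map an ICC value to the reliability bands used in the text."""
    if icc <= 0.20:
        return "poor"
    if icc <= 0.40:
        return "fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "good"
    return "very good"


# Illustrative matrix: six subjects (rows) rated by four raters (columns)
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_value = icc_2_1(ratings)
print(round(icc_value, 2), classify_icc(icc_value))  # 0.29 fair
print(classify_icc(0.74))  # good
```

The absolute-agreement form penalizes systematic differences between raters (one rater scoring consistently lower than another), which is why the illustrative matrix yields a low value even though the raters rank the subjects similarly.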

Results

Literature search and guideline selection

Our comprehensive search of the literature yielded an initial set of 1799 records, leaving 1157 articles for screening after duplicates were removed. After iterative title/abstract and full text screening, five CPGs ultimately met inclusion criteria and were included for data extraction and quality assessment: (1) ACR Appropriateness Criteria® Retreatment of Recurrent Head and Neck Cancer after Prior Definitive Radiation—Expert Panel on Radiation Oncology—Head and Neck Cancer from the American College of Radiology (ACR) [28], (2) AHNS series: Do you know your guidelines? Guideline recommendations for recurrent and persistent head and neck cancer after primary treatment from the American Head & Neck Society (AHNS) [29], (3) Guidelines for treatment of recurrent or metastatic head and neck cancer from the Indian Cooperative Oncology Network (ICON) [30], (4) Selection of systemic therapy in patients with locally advanced and recurrent/metastatic head and neck cancer: RAND-based expert opinion by an Italian multidisciplinary panel (Benasso et al.) [31], and (5) the Recurrent head and neck cancer: United Kingdom National Multidisciplinary Guidelines (Mehanna et al.) [32]. Studies that were excluded upon full text screening included Systemic Targeted Therapy for Recurrent and Metastatic Head and Neck Squamous Cell Cancer, which is a chapter in the textbook “Head and Neck Cancer: A Multidisciplinary Approach” [33], and Management of the Neck in Squamous Cell Carcinoma of the Oral Cavity and Oropharynx: ASCO Clinical Practice Guideline [34], which did not focus on recurrent or metastatic HNC. Figure 1 demonstrates the PRISMA flow diagram with our full literature search and selection process.

Fig. 1 Flow diagram demonstrating literature search results and strategy for guideline inclusion

Guideline characteristics

Characteristics of the included guidelines can be found in Table 2. Included guidelines were published between 2011 and 2019. Two of the guidelines were published by American professional medical societies (ACR, AHNS) [28, 29], two were written by multidisciplinary panels/symposiums [30, 31], and one did not specify its guideline developing group. None of the guidelines specified a funding source. None of the CPGs were supported by a systematic literature review; the most common evidence base was expert consensus supplemented by a clinical case review or non-systematic literature review (n = 3, 60.0%). Guidelines were intended for low-volume community practice oncologists [30], physicians in pediatrics [32], and clinicians [31]; two remaining guidelines did not specify their target user population.

Table 2 Recurrent head and neck cancer management clinical practice guideline characteristics

Guideline evaluation via the AGREE II instrument

Appraisal scores from the four independent reviewers were used to calculate scaled domain scores for each guideline in each of the six quality domains of the AGREE II instrument; results can be found in Table 3. Overall, the quality of the available consensus statements on management of recurrent or treatment-resistant HNC was poor. The average overall scaled domain score was 40.9% (± 11.0), with four of the guidelines (80.0%) receiving an overall quality rating of ‘low’ (ACR, AHNS, Benasso et al., Mehanna et al.), and one receiving an ‘average’ quality rating (ICON). The AHNS guideline recommendations did not receive satisfactory scores in any of the domains.

Table 3 AGREE II Scaled domain scores and CPG quality assessment

Of the six AGREE II domains, CPGs received the lowest scores by far in Domains 5 (‘Applicability’, 12.9%) and 3 (‘Rigor of development’, 22.3%); none of the guidelines achieved an adequate score in either domain. The highest average scaled domain scores were found in Domain 4 (‘Clarity of presentation’, 59.2%) and Domain 1 (‘Scope and purpose’, 57.2%); three and two CPGs achieved adequate scores in these domains, respectively. Domain 6, ‘Editorial independence’, had the largest variation in scores with a standard deviation of 35.7%, and ranged from a low of 0.0% (Mehanna et al.) to a high of 100.0% (ICON).

Interrater consistency

Interrater agreement within each domain was evaluated via ICC analysis, and can be found in Table 4. Mean ICC score was 0.74 across the six domains, indicating ‘good’ interrater reliability. Consistency between reviewers was ‘very good’ in three domains (‘Scope and purpose’, ‘Applicability’, and ‘Editorial independence’) and ‘good’ in one domain (‘Rigor of development’). Interrater agreement in ‘Stakeholder involvement’ and ‘Clarity of presentation’ was moderate.

Table 4 Intraclass correlation coefficients (ICC) for interrater reliability across the six AGREE II domains

Discussion

HNC remains a significant public health concern, with an estimated 66,360 new diagnoses expected this year in the United States alone [35]. When recurrence occurs, local disease can be managed via salvage surgery, or, if unresectable, via reirradiation with or without CT [3]. More recently, immunotherapies, specifically immune checkpoint inhibitors (ICIs), have emerged as a promising fourth treatment modality for these patients. Not only has our scientific understanding of the biology of these tumors evolved, but new data have emerged that specify the role and efficacy of salvage surgery, reirradiation, and systemic therapies (including newly developed ICIs). Given this complexity in the treatment decision-making process, there is a need for clear guidance regarding the clinical indications and comparative efficacy of these treatment modalities. Evidence-based guideline recommendations for the management of recurrent or treatment-resistant HNC would benefit both clinicians and patients by optimizing care and thereby improving clinical outcomes. Despite the importance and significant disease burden of recurrent HNC worldwide, no study thus far has systematically evaluated the quality and methodological rigor of existing guidelines on the matter. Using the AGREE II tool for CPG quality assessment, we conducted a systematic review and quality appraisal of guidelines addressing the clinical management of recurrent HNC. Our secondary aim was to highlight the salient recommendations found among the guidelines to provide clarity on the subject.

Overall, our systematic review revealed that evidence-based guidelines in the literature are lacking, and the quality of the existing recommendation statements is poor. None of the five guidelines identified met the AGREE II threshold for a ‘high’ quality clinical recommendation document, and four CPGs achieved an overall AGREE II quality rating of ‘low’. Only the remaining guideline met the criteria for an ‘average’ quality rating: the Guidelines for treatment of recurrent or metastatic head and neck cancer published by the ICON expert panel in 2014. The ICON guideline received satisfactory scores in four of the six AGREE II domains, but had notable weaknesses in applicability and rigor of development, leading to an overall scaled domain score of 60%. This finding suggests that the ICON guideline may be the most clearly presented, evidence-based CPG currently published. However, as it was published in 2014, the ICON guideline was the second oldest of the included guidelines. Its recommendations thus predate much of the clinical progress surrounding PD-1 inhibitors such as pembrolizumab and nivolumab, which were approved by the FDA for the treatment of platinum-refractory recurrent or metastatic HNSCC in 2016, and have since become a first-line regimen for those patients eligible for immunotherapy [3, 11, 36]. In such a rapidly progressing field, it is imperative that guidelines are updated to reflect current research and have explicit systems in place to maintain this status.

Two of the lowest scoring CPGs were the AHNS guideline recommendation, which received an inadequate score in all six AGREE II domains, and the Mehanna et al. United Kingdom National Multidisciplinary Guidelines, which had the lowest mean scaled domain score overall (32%), with a minimum score of 0% in Domain 6 (‘Editorial independence’). A score of 0% in Domain 6 indicates either that the recommendations were unduly biased by competing interests or that the guideline did not explicitly declare any funding or potential competing interests.

The highest average scaled domain scores were in Domains 1 (‘Scope and purpose’) and 4 (‘Clarity of presentation’). This suggests that the guidelines adequately and clearly presented their overall aim, specific health questions addressed, and target population, and that key recommendations were specific, unambiguous, and easily identifiable. This is in line with the prior literature: Domains 1 and 4 are most frequently the highest scoring AGREE II domains, regardless of the subject material of a practice guideline [20, 22, 37, 38]. It may be easier to achieve higher scores in these domains because they do not require methodological rigor or transparency in guideline development, only more basic elements that are essential to any scientific publication, such as the specificity and clarity of research questions and recommendations made.

The lowest scoring domains in this systematic appraisal were Domains 3 (‘Rigor of development’) and 5 (‘Applicability’), which received mean scores of 22% and 13%, respectively, among the five CPGs reviewed. As Domain 5 is well established as one of the most predictive of usability and relevance to clinical practice [23], future CPG development groups must address barriers and facilitators to guideline implementation (and strategies to overcome said barriers), and integrate cost analyses to address any potential resource limitations to guideline implementation. Low scores in ‘Rigor of development’ indicate fundamental issues with existing recurrent HNC guidelines and the processes they used to synthesize evidence and formulate recommendations. Future guidelines can be strengthened in this domain by making specific improvements in items 7, 13, and 14, which were consistently the lowest scoring items in the domain. Guideline development groups should: (1) employ systematic literature searches to develop their evidence base, ideally in accordance with pre-established, validated criteria such as the PRISMA checklist, (2) send out completed guidelines for external peer/expert review prior to publication, and (3) designate a specific timeline and procedure for updating the guideline recommendations, so that they remain up-to-date with the current literature and clinical trials.

Recommendations

Employing a rigorous review of evidence-based guidelines from all five CPGs, the authors and independent reviewers have summarized applicable recommendations regarding the management of recurrence in head and neck cancers as follows:

  • Follow-up: close surveillance and follow-up after primary treatment is strongly recommended. Current guidelines recommend follow up every 1–3 months in the first year, 2–6 months in the second year, 4–8 months in the third to fifth year, and every 12 months thereafter.

    • If recurrence is suspected, the main imaging modalities are computed tomography and magnetic resonance imaging (MRI). Positron emission tomography with computed tomography (PET-CT) is warranted to evaluate for metastatic disease.

  • Surgery: if resectable, salvage surgery is first line. It should be performed by experienced surgical teams with input from reconstructive expertise. Factors precluding surgical resection include:

    • Tumor factors: extensive skin/soft tissue or infratemporal involvement, prevertebral fascia or skull base invasion, > 270° internal carotid artery encasement

    • Patient factors: high perioperative mortality risk, anticoagulants that cannot be stopped, patient declines surgery

    • Operative factors: ability to achieve R0 resection and good reconstruction, acceptable level of morbidity (speech, swallowing, cosmesis)

  • RT: reirradiation is the primary treatment modality with curative intent in patients for whom surgery is contraindicated. Clinicians should consider the following guidelines:

    • To limit toxicity, target recurrent gross disease with limited margins. Do not add elective nodal reirradiation.

    • Administer a dose of at least 50–60 Gy, but keep target volumes tight. Limit cumulative spinal cord dose to < 50–60 Gy.

    • Prefer modern RT techniques via three-dimensional conformal RT (3DCRT)/intensity-modulated RT (IMRT). Image guidance can be helpful.

  • Chemotherapy (CT): can be used in combination with RT, as primary treatment when surgery is not an option, in the adjuvant setting after salvage surgery, or in the palliative setting.

    • Select patients carefully based upon performance status (PS), comorbidities, frailty, nutritional status, and renal function.

      • CRT with standard dose cisplatin is inappropriate in patients with creatinine clearance < 60 mL/min.

    • For patients with non-resectable recurrence with good PS and deemed fit, triple therapy with cetuximab, platinum-based CT, and 5-fluorouracil (5-FU), appears to provide the best outcomes. If not fit, consider combinations of platinum and cetuximab or platinum and 5-FU.

  • Immunotherapy (i.e. with ICIs) represents the newest treatment option for recurrent or persistent disease and has demonstrated favorable results in clinical trials. Patients with inoperable recurrent and/or metastatic disease should be offered the opportunity to participate in clinical trials with new therapeutic agents. Combined positive score (CPS) testing should be performed on the biopsy specimen to determine if patients are suitable candidates for immunotherapy.

Limitations

Though systematic and validated search and appraisal methodologies were implemented, this study is not without its limitations. The literature search methodology we utilized may have omitted pertinent CPGs if the keywords utilized differed from those associated with other relevant guidelines. There are also some inherent limitations to the AGREE II methodology. The AGREE II methodology is intended to assess the objectivity, quality, and rigor with which consensus statements are developed, and does not incorporate domains that assess the validity or level of evidence of recommendations, which are imperative in assessing their clinical validity. Finally, while our reviewers achieved ‘good’ or ‘very good’ interrater reliability in the majority of the AGREE II domains, only ‘moderate’ agreement was found in two of the six domains, indicating subjectivity and imprecision between reviewers, and potentially minimizing the significance of our findings.

Conclusion

Using the AGREE II instrument, we systematically appraised the quality of clinical recommendation statements regarding the management of recurrent HNC, and found significant variability and overall lack of quality among available CPGs. Of the five included consensus statements, four received an overall ‘low’ quality rating per the AGREE II criteria, and the ICON guideline was the only one to receive an overall ‘average’ rating. The lowest scores were achieved in the ‘Applicability’ and ‘Rigor of development’ domains, and no guideline statement received an adequate score in either domain. Key recommendations from the five CPGs are summarized above. Guideline developers can address the weaknesses of current guidelines by employing systematic literature searches, employing external peer review, and ensuring the maintenance of updated guidelines. Future groups developing recommendations for the management of recurrent HNSCC can improve the quality and standardization of their guidelines by implementing the AGREE II framework.