Introduction

Head and neck squamous cell carcinoma of unknown primary (HNSCCUP) is the regional metastatic disease of head and neck squamous cell carcinoma (HNSCC) without an obvious primary tumor [1,2,3,4]. Despite initial clinical and radiologic examinations, the index tumor may remain unidentified for reasons including minute size, tumor involution, occult location, and slow growth [5, 6]. Head and neck carcinoma of unknown primary (HNCUP) accounts for 3 to 7% of all head and neck cancers, which together are the 7th most common form of cancer worldwide [7, 8]. Among HNCUP, squamous cell carcinomas comprise 53–77% of cases and are most often associated with masses in the upper two thirds of the neck with an occult index tumor in the head and neck region [5, 9,10,11,12]. Because the location of the primary lesion is unknown, patients with HNSCCUP may experience significant distress due to uncertainty about prognosis and tumor recurrence [13].

The diagnostic evaluation for HNSCCUP in a patient with a neck mass suspicious for squamous cell carcinoma focuses initially on identifying the primary lesion via a thorough history and physical exam, fiberoptic laryngoscopy, and mucosal imaging by contrast-enhanced computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), and/or PET/CT [14]. Fine-needle aspiration (FNA) may be used to obtain histologic confirmation of squamous cell carcinoma [15]. If the FNA is nondiagnostic, it may be repeated, or a core needle biopsy may be used [15]. Treatment methods include radical neck dissection and bilateral tonsillectomy followed by radiotherapy, or radiotherapy with or without chemotherapy followed by surgery [1,2,3,4, 13, 15].

Management of HNSCCUP varies among centers but is generally coordinated by a multidisciplinary team composed of radiologists, oncologists, pathologists and head and neck surgeons. Several professional societies and organizations have created clinical practice guidelines (CPGs) to coordinate multidisciplinary care, as well as standardize and establish best practices for the management of HNSCCUP. CPGs are systematically developed statements designed to facilitate practitioner and patient decisions in the management of a particular clinical condition. When developed and implemented appropriately, these guidelines can significantly influence patient care and outcomes [16]. Despite the potential impact of CPGs on disease management, there has not yet been a systematic quality appraisal of CPGs addressing the topic of HNSCCUP.

Given the potentially variable management of HNSCCUP, our team sought to evaluate CPGs using the Appraisal of Guidelines for Research and Evaluation (AGREE II) tool, which is an effective instrument for guiding the development and evaluating the quality of CPGs [17, 18]. The AGREE II tool is composed of 23 scored items across the following 6 domains: scope and purpose, stakeholder involvement, rigor of development, clarity of presentation, applicability, and editorial independence [17]. In recent literature, the systematic appraisal of CPGs has underscored the need for quality guidelines across specialties. In otorhinolaryngology, studies have evaluated CPGs for several conditions including temporomandibular joint disorders and thyroid cancer, among others [19,20,21,22]. In this study, we perform a systematic literature review to identify CPGs for the management of HNSCCUP and evaluate them using the AGREE II instrument.

Materials and methods

Search strategy and guideline selection

A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) criteria using EMBASE, MEDLINE via PubMed, and Scopus (Fig. 1) [23]. The search terms were [(“head and neck” AND “squamous cell carcinoma” AND “unknown primary” OR “occult primary”) AND (“guideline” OR “consensus” OR “recommendation”)] in the title, abstract, or keywords.

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram demonstrating the literature search process and final guideline selection

Data extraction and quality appraisal

Full articles were reviewed for the following: primary author, year of publication, development group, region of origin, funding, intended users, evidence base, and guideline content. Four authors (JN, JH, DR, AP) successfully completed training in the AGREE II methodology and evaluation criteria (www.agreetrust.org). Each author then independently performed quality appraisals on the included guidelines. A standardized form was made available to reviewers. This form included the six AGREE II quality domains and 23 individual line items (Table 1). Each of these 23 items was scored on a scale from 1 (strongly disagree) to 7 (strongly agree) based upon how many elements of the AGREE II criteria for that domain were met by each guideline. After appraisals were complete, these data were collectively analyzed by the primary author. Each domain was weighted equally. Domain scores and intraclass correlation coefficients (ICCs) were calculated. The “scaled domain score” (between 0 and 100%) was calculated as per the original formula provided in the AGREE II instrument methodology manual [17]:

$$\text{Scaled domain score} = \frac{\text{Obtained score} - \text{Minimum possible score}}{\text{Maximum possible score} - \text{Minimum possible score}} \times 100.$$
Table 1 Six domains of quality and accompanying line items provided in AGREE II

A scaled domain score equal to or above the threshold of 60% indicated adequacy in a given domain [24,25,26]. As recommended by AGREE II methodology, each CPG was given an overall numeric score and a categorical quality rating. The categorical rating was based on the number of domains in which the CPG scored ≥ 60%: ‘high quality’ if ≥ 5 domains, ‘average quality’ if 3–4 domains, and ‘low quality’ if ≤ 2 domains scored ≥ 60%.
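The scoring steps described above can be illustrated with a minimal Python sketch. It assumes the usual AGREE II convention that the obtained score is the sum of all appraisers' item scores within a domain, so that the minimum and maximum possible scores are 1 and 7 multiplied by the number of items and the number of appraisers. The function names and the example ratings are hypothetical and are not study data.

```python
from typing import List

MIN_ITEM_SCORE = 1  # "strongly disagree"
MAX_ITEM_SCORE = 7  # "strongly agree"


def scaled_domain_score(ratings: List[List[int]]) -> float:
    """Return the scaled score (0-100%) for one AGREE II domain.

    ratings[a][i] is appraiser a's 1-7 score on item i of the domain.
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    obtained = sum(sum(row) for row in ratings)
    minimum = MIN_ITEM_SCORE * n_items * n_appraisers
    maximum = MAX_ITEM_SCORE * n_items * n_appraisers
    return (obtained - minimum) / (maximum - minimum) * 100


def quality_rating(domain_scores: List[float], threshold: float = 60.0) -> str:
    """Classify a guideline by how many domains reach the 60% threshold."""
    adequate = sum(score >= threshold for score in domain_scores)
    if adequate >= 5:
        return "high"
    if adequate >= 3:
        return "average"
    return "low"


# Hypothetical ratings: four appraisers scoring the 8 items of domain 3.
example_ratings = [
    [5, 6, 5, 4, 6, 5, 4, 5],
    [6, 6, 5, 5, 6, 4, 4, 5],
    [5, 5, 4, 4, 6, 5, 5, 6],
    [6, 5, 5, 4, 5, 5, 4, 5],
]
domain3 = scaled_domain_score(example_ratings)  # (160 - 32) / (224 - 32) * 100
print(f"Domain 3 scaled score: {domain3:.1f}%")
```

In this hypothetical case the obtained score is 160, against a minimum of 32 and a maximum of 224, so the domain would just clear the 60% adequacy threshold.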

Data analysis

Descriptive statistical analyses were performed using SPSS (version 24, IBM). A two-way random effects intraclass correlation coefficient (ICC) analysis with a 95% confidence interval was performed using Python 3.8 and the pingouin package [27]. Consistent with previous literature [28], ICC classifications were defined as poor (< 0.20), fair (0.21–0.40), moderate (0.41–0.60), good (0.61–0.80), or very good (0.81–1.00).
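For readers unfamiliar with the statistic, the following is a minimal pure-Python sketch of a two-way random effects, absolute-agreement, single-rater ICC computed from ANOVA mean squares, together with the classification bands listed above. It is not the authors' pingouin call: the text does not state whether the single-rater or average-rater form was used, the function names are hypothetical, the ratings are illustrative values rather than study data, and the confidence interval is omitted. Treating the band limits as inclusive upper cutoffs is also an assumption, since the stated ranges leave small gaps (for example between 0.20 and 0.21).

```python
from typing import List


def icc2_single(scores: List[List[float]]) -> float:
    """Two-way random effects, absolute agreement, single-rater ICC.

    scores[t][r] is rater r's score for target (guideline) t.
    """
    n = len(scores)      # number of targets
    k = len(scores[0])   # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[t][r] for t in range(n)) / n for r in range(k)]

    # Sums of squares for targets, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-target mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def icc_band(icc: float) -> str:
    """Map an ICC to the bands used in the text (upper limits inclusive)."""
    for upper, label in [(0.20, "poor"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "good")]:
        if icc <= upper:
            return label
    return "very good"


# Illustrative ratings (not study data): 6 targets scored by 4 raters.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc2_single(ratings)
print(f"ICC = {icc:.2f} ({icc_band(icc)})")
```

In these illustrative data the raters rank the targets similarly but differ systematically in how high they score, which lowers an absolute-agreement ICC even when relative ordering is consistent.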

Results

A systematic literature review identified 703 documents across the three databases surveyed. The resulting articles were sorted and indexed, and duplicate entries found in multiple databases were removed. This process yielded 515 unique articles, which were then evaluated for possible inclusion based on a title and abstract review. A total of 42 full-text articles involving the diagnosis and/or management of HNSCCUP were selected for review. Seven guidelines were ultimately selected for appraisal [2, 29,30,31,32,33,34]. The steps of the literature review process are summarized in a flow diagram (Fig. 1).

Of the 7 CPGs identified, 2 were developed in the United States. Four United Kingdom groups along with one European working group developed guidelines. Most guidelines were published within the last 2 years. Upon review, these guidelines were authored by multidisciplinary teams, which often included otolaryngologists, clinical oncologists, radiation oncologists, and pathologists. The content of these guidelines ranged from broad to specific topics: general head and neck cancer management (n = 2); assessing and managing cancers of the upper aerodigestive tract (n = 2), and management of HNSCCUP in particular (n = 3). These CPG characteristics are summarized in Table 2.

Table 2 General characteristics abstracted from seven available clinical practice guidelines for HNSCCUP management

Quality appraisal by domain score

Independent appraisals of CPGs undertaken by our reviewers enabled calculation of scaled domain scores for the six quality domains of the AGREE II (Table 3). For all appraised guidelines combined, the lowest average scaled domain score was 53.5 ± 34.4% (domain 3: rigor of development). The highest average domain score was 73.6 ± 18.0% (domain 4: clarity of presentation). Among individual guideline domain scores, the highest calculated was 99.5% for domain 3 (rigor of development) of the guideline developed by the American Society of Clinical Oncology. No guideline received a perfect score of 100% in an AGREE II quality domain. The lowest individual guideline domain score was 4.27% in domain 3 (rigor of development) for the British Association of Head and Neck Oncologists guideline. Scaled domain scores varied between the 7 appraised CPGs, as evidenced by the average standard deviation of 26.4% (18.0–34.04%). According to the evaluation criteria described in the Methods, CPGs with five or more scaled domain scores ≥ 60% were classified as ‘high’ quality. This ‘high’-quality threshold was met by two guidelines, developed by the National Institute for Health and Care Excellence (NICE) and the American Society of Clinical Oncology (ASCO). A single guideline, from the ENT UK Head and Neck Society Council, achieved the ‘average’ quality designation by scoring ≥ 60% in four domains.

Table 3 Scaled AGREE II domain scores for the 7 identified HNSCCUP CPGs

Quality appraisal by overall AGREE II score

The highest-scoring CPG across all six quality domains was produced by Pilling et al. for the National Institute for Health and Care Excellence, which achieved a mean score of 94.9 ± 4.3%. For this guideline, domain 2 (stakeholder involvement) scored the lowest at 87.5% while domain 3 (rigor of development) scored the highest at 99.5%. In contrast, the CPG with the lowest average score across all six domains received an overall mean score of 33.6 ± 17.9% and was produced by the British Association of Head and Neck Oncologists. This low overall score was attributable to limited evidence of developmental rigor (domain 3), low applicability of guideline content (domain 5), and uncertain editorial independence (domain 6).

Intraclass correlation coefficient

To quantify agreement between appraisers for each AGREE II domain, intraclass correlation coefficients (ICCs) were calculated (Table 4). There was ‘very good’ inter-rater reliability (ICC of 0.81 or higher) in the stakeholder involvement domain, indicating similar quality scores and thus strong agreement among raters. Three domains demonstrated a ‘good’ ICC score (ICC of 0.61–0.80). Two domains fell below this range, indicating some disagreement between reviewers on scope and purpose (domain 1) and clarity of presentation (domain 4).

Table 4 Intraclass correlation coefficients for inter-rater reliability across the 6 AGREE II domains

Discussion

CPGs have been increasingly employed in medical practice and can impact practitioner and patient shared decision making. Despite comprising a small proportion of head and neck cancers, HNSCCUP has caught the attention of several professional societies and organizations worldwide, which have produced CPGs on the management of this condition [7]. Though CPGs can influence patient care and outcomes, no prior studies have systematically appraised the content, methodologic rigor, and applicability of HNSCCUP guidelines. In the present study, we offer a systematic literature review and appraisal of seven CPGs using the AGREE II instrument, which evaluates guidelines in six domains:

  • Domain 1: ‘scope and purpose’ assesses the overall purpose, intended target population, and health questions of the guideline. This domain is important as it helps to define the reach and utility of a CPG, facilitating a user in applying the guideline to appropriate patient populations.

  • Domain 2: ‘stakeholder involvement’ considers the professional makeup of the guideline development panel, perspectives of the target patient population, as well as the intended users of the CPG. Given the multidisciplinary management of HNSCCUP, it is appropriate for relevant stakeholders to guide the development of clinical guidelines.

  • Domain 3: ‘rigor of development’ evaluates the methodologic rigor of CPG development, including the use of systematic methods for literature search, evidence review, and guideline development. Overall, this domain assesses the degree to which recommendations are evidence-based and appropriately formulated.

  • Domain 4: ‘clarity of presentation’ considers the conciseness and readability of recommendations, and thus provides insight into how easily the CPG can be implemented in clinical practice.

  • Domain 5: ‘applicability’ evaluates the degree to which CPGs consider application barriers and provide tools to facilitate guideline implementation.

  • Domain 6: ‘editorial independence’ assesses the disclosure and management of funding sources and conflicts of interest [17].

Among the seven CPGs in our systematic appraisal, the NICE and ASCO guidelines met the AGREE II threshold for a “high”-quality CPG by achieving a scaled domain score ≥ 60% in ≥ 5 quality domains. This finding suggests that these two guidelines may possess the highest degree of methodologic rigor and clinical utility among included CPGs. The ASCO guideline scored high in all domains with no significant areas of weakness. It clearly defined its scope and purpose, involved multidisciplinary stakeholders (including medical, surgical, and radiation oncologists, a patient representative, and a pathologist), used methodologically rigorous approaches to formulate evidence-based recommendations, and did not have any significant conflicts of interest among its expert panel. Moreover, the ASCO guideline had a dedicated member of the expert panel focus on guideline application in the clinical setting and provided resources to facilitate guideline implementation.

While the NICE guideline also met the threshold of a high-quality CPG, it scored lower across most domains relative to the ASCO guideline. Nonetheless, the NICE guideline performed strongly in the scope and purpose domain by clearly outlining clinical questions, guideline objectives, and applicable patient populations. Additionally, the NICE guideline performed strongly in the applicability domain by providing tools and resources for guideline implementation.

The ENT UK Head and Neck Society Council guideline achieved an “average” rating by scoring ≥ 60% in 3–4 quality domains. This guideline performed poorly in the applicability domain, with limited apparent consideration of the clinical utilization of the guideline. The remaining four guidelines achieved a “low” quality rating by scoring ≥ 60% in ≤ 2 domains. Most notably, these guidelines performed poorly in the rigor of development domain, which may be considered the most important and comprehensive of the six domains since it contains 8 of the 23 scored items and assesses the quality of evidence and soundness of methodology used to formulate recommendations. The four “low”-quality guidelines also scored weakly in the applicability domain and may not have explicitly stated considerations regarding guideline implementation or provided tools that can readily be used in clinical settings. In addition to the discussion above, the main issues with each guideline are summarized in Table 3.

Inter-rater reliability, as assessed through ICC scores, was favorable in most of the six domains. Four of six domains had ‘very good’ or ‘good’ ICC scores, indicating that there was agreement among reviewers in scoring these domains. Although there was some disagreement among our reviewers, this does not necessarily suggest a lack of internal validity. Rather, this may be a consequence of valuable content being presented in a manner that was not readily apparent to our expert reviewers.

Given that four of the seven included CPGs were appraised to be low quality, the authors urge future CPG development groups to continue refining the guideline development process. Overall, CPGs should aim to clearly state the target population, health questions addressed, and guideline purpose. The expert panel should be composed of relevant multidisciplinary members who appropriately represent stakeholders in patient management of the medical condition of interest. Target users of the guideline should be explicitly stated, and patient perspectives should be accounted for and incorporated into the guideline development process. During editing stages, attention should be paid to the readability and conciseness of recommendations. CPGs should also consider potential resource limitations and other application barriers and provide implementation tools when feasible. Additionally, CPGs should clearly state funding sources and consider their potential impact on guideline development, as well as disclose methods for soliciting and addressing conflicts of interest.

In support of improving future guidelines, our panel of reviewers identified several key recommendations. First, preoperative evaluation should include a thorough history and physical examination, response to informational needs, discussion of smoking cessation, biopsy of clinically suspicious masses, and consideration of HPV testing. In general, these patients require dedicated imaging of the head and neck, which can be accomplished through contrast-enhanced CT imaging or FDG PET/CT if a primary site is not visible on CT imaging. Multiple expert panels strongly recommend that all unknown primary patients undergo management with a neck dissection. With regard to other treatment considerations, large-volume bilateral neck disease and/or gross (macroscopic) extranodal extension (ENE) favor definitive chemoradiotherapy. Adjuvant radiotherapy should not be offered to patients with no primary tumor and a single pathologically positive node without ENE following a high-quality neck dissection.

Furthermore, future guidelines would benefit from discussing the emerging role of robotic surgery in the management of carcinoma of unknown primary in the head and neck. Transoral robotic surgery (TORS) provides a high-definition magnified view of the oropharynx, which permits visualization of small mucosal lesions otherwise difficult to identify without magnification. In comparison to standard diagnostic panendoscopy, the improved visualization and freedom of motion offered by TORS may also facilitate targeted resection of submucosal tissue in the oropharynx. Recent reports suggest that TORS base of tongue resection may identify primary tumors in up to 90% of HNSCCUP patients [35]. Multiple case series further suggest that this approach offers low complication rates, reduced morbidity, and improved tumor identification [36,37,38]. Although TORS mucosectomy is an expanding surgical technique with a key role in head and neck surgery, larger studies are necessary to fully describe its utility in the setting of head and neck cancer diagnosed without an identifiable primary.

Comprehensive guidelines should also mention the importance of HPV and EBV (EBER) detection in lymph node fine-needle aspirates. Recent reports have demonstrated their potential role in distinguishing between primary SCC subsites [39, 40]. Even though only a subset of oropharyngeal SCCs are HPV related, it may be of diagnostic value to pursue HPV and EBER detection as indicators of oropharyngeal and nasopharyngeal primary SCC, respectively.

This study has several limitations. Though the AGREE II tool assesses the development and organization of CPGs, it cannot evaluate the validity and strength of cited evidence and recommendations. Additionally, the tool utilizes a grading scale of “strongly agree” to “strongly disagree”, which is inherently subject to bias and individual rater interpretation. Though we sought to limit subjectivity using four independent expert raters, inter-rater disagreement was evidenced by low ICC scores in two of six domains. Additionally, we did not include non-English or subscription-based publications in our study, which may have excluded some available CPGs on HNSCCUP.

Conclusion

The majority of clinical practice guidelines for the diagnosis and management of head and neck cancer of unknown primary site were of poor quality. Only two of seven CPGs demonstrated high-quality content, according to a comprehensive assessment using the AGREE II instrument. Future guidelines should explain the methodological rigor of development and take steps to clearly delineate clinical applicability. This would serve to optimize the quality and utility of guidelines for the benefit of patients with HNSCCUP.