1 Introduction

Medical literature is classified by the strength of study design to assist in evaluating the impact of a particular study; in these classification schemes, randomized controlled trials (RCTs) or systematic reviews of RCTs provide the highest level of evidence, with observational studies considered less cogent [1]. Recently, there has been increasing emphasis on comparative effectiveness research (CER), the role of which is to identify and validate diagnostic and treatment options for physicians, patients, payers, and policymakers in an attempt to provide the best medical care for patients while containing costs [2, 3]. The Institute of Medicine defines CER as “the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition, or to improve the delivery of care” [4]. CER aims to achieve this in a way that applies to the general population; fundamental to this is that the study population is diverse and assembled from a primary care practice setting, with outcomes that include decisions based on patients’ values [5]. With these goals in mind, RCTs do not necessarily represent the best study design; some authors argue that RCTs determine efficacy, not effectiveness [5].

In head and neck oncology, RCTs do not generally compare different treatment modalities. However, one pivotal RCT changed the approach to advanced laryngeal cancer by comparing surgery and postoperative radiation to induction chemotherapy followed by radiation. Finding no difference in overall survival, this study promoted organ preservation approaches to head and neck cancer [6]. Subsequently, the organ preservation approach to hypopharyngeal cancers was evaluated in an RCT comparing surgery with postoperative radiation to induction chemotherapy followed by radiation; both groups were found to have similar median survivals [7]. These studies led to widespread acceptance of organ preservation management of advanced laryngeal and hypopharyngeal cancers, as reflected by longitudinal clinical registry data [8, 9]. Surprisingly, this change in treatment paradigm has been accompanied by a decrease in survival, especially at low-volume community medical centers [8, 9]. This potentially reflects the danger of generalizing the findings of RCTs, which have clearly outlined patient eligibility criteria. For example, an RCT comparing induction chemotherapy to concurrent chemotherapy for advanced laryngeal cancers excluded patients who presented with cartilage destruction [10]; this criterion may not be appropriately recognized when recommending organ-preserving treatments. Additionally, patients included in most head and neck oncology RCTs are younger and healthier than the general patient population [2]. Most of the otolaryngology literature provides a low level of evidence, with the majority of studies containing level 4 evidence, but this landscape is changing [11]. Carefully designed observational studies, which represent lower echelons of evidence strength, may complement RCTs and provide meaningful CER in head and neck oncology [2].

2 Major Barriers to Comparative Effectiveness Research

2.1 Powering Meaningful Studies

In 2014, the projected incidence of head and neck cancer is 55,070 people or 3.3 % of all cancers, and the projected mortality is 12,000 deaths or 2.0 % of all cancer deaths [12]. With such a small portion of the general population affected, it is difficult to accrue enough patients into studies to yield meaningful results, especially as compared to a more prevalent medical condition, such as otitis media [13]. Without enough patients for appropriate power, negative results do not necessarily mean that a significant difference does not exist.
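To make the link between accrual and power concrete, the following is a minimal sketch of a sample size estimate for comparing two proportions, using the standard normal-approximation formula with equal allocation. The function name `patients_per_arm` and the survival rates are hypothetical and chosen only for illustration; they are not drawn from any study cited in this chapter.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(p_control: float, p_treatment: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed per arm to detect p_control vs. p_treatment.

    Normal-approximation formula for two independent proportions,
    two-sided test, equal allocation between arms.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    variance_sum = (p_control * (1 - p_control)
                    + p_treatment * (1 - p_treatment))
    effect = p_treatment - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / effect ** 2)


# Hypothetical survival rates (assumptions, not data from the literature).
n_large_effect = patients_per_arm(0.60, 0.70)  # 10-percentage-point difference
n_small_effect = patients_per_arm(0.60, 0.65)  # 5-percentage-point difference
print(n_large_effect, n_small_effect)
```

Under these assumptions, halving the difference to be detected roughly quadruples the number of patients required per arm, which illustrates why trials in a low-incidence disease are so often underpowered.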

A meta-analysis of prophylactic antibiotic use in head and neck surgery patients identified 7 RCTs between 1981 and 2003 that compared 24 h of peri-operative antibiotics to a longer course (3–4 days in some trials, 5 days in others). Each of these studies was underpowered, so the result of no difference between the treatment groups was not particularly reliable; by pooling these results in a meta-analysis, the authors were able to achieve adequate power to conclude that no difference exists [14]. Even within this group of studies, the surgical procedure ranged from upper aerodigestive tract surgery to pedicled myocutaneous flaps to combined composite resections with free flap reconstructions. In addition, little information was provided about patient characteristics that might affect outcomes, such as smoking history, comorbidities, or previous radiation therapy, for a subset analysis [14]. This meta-analysis evaluated perioperative antibiotic use without addressing specific subsites of disease; considering that the treatment for head and neck cancer varies by subsite (e.g., paranasal sinuses, nasopharynx, oropharynx, oral cavity, hypopharynx, larynx, skin, and thyroid), assessing head and neck cancer patients by subsite of disease even further reduces the ability to achieve adequate power.
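The gain in precision from pooling can be sketched with a fixed-effect, inverse-variance meta-analysis of log odds ratios. This is only one common pooling approach and is not necessarily the method used in the meta-analysis cited above; the trial counts below are invented for illustration and are not taken from reference [14].

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Hypothetical trials: (infections_short, n_short, infections_long, n_long).
# These counts are invented for illustration only.
trials = [
    (6, 40, 5, 38),
    (9, 55, 10, 57),
    (4, 30, 5, 32),
    (11, 70, 9, 68),
]


def log_odds_ratio(events_a, n_a, events_b, n_b):
    """Return the log odds ratio and its approximate variance for one trial."""
    a, b = events_a, n_a - events_a
    c, d = events_b, n_b - events_b
    estimate = log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return estimate, variance


def pool_fixed_effect(trials, confidence=0.95):
    """Inverse-variance weighted pooling of per-trial log odds ratios."""
    estimates = [log_odds_ratio(*t) for t in trials]
    weights = [1 / var for _, var in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)


single_trial_ses = [sqrt(var) for _, var in (log_odds_ratio(*t) for t in trials)]
pooled, pooled_se, (low, high) = pool_fixed_effect(trials)
print(f"smallest single-trial SE = {min(single_trial_ses):.2f}, pooled SE = {pooled_se:.2f}")
print(f"pooled OR = {exp(pooled):.2f}, 95% CI {exp(low):.2f} to {exp(high):.2f}")
```

The pooled standard error is always smaller than that of any single trial, so the confidence interval around the pooled estimate is narrower. Pooling does not, however, resolve differences in procedures or patient characteristics among the contributing trials.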

Multicenter clinical trials may accrue enough patients to answer CER questions prospectively, whereas clinical registry data may be appropriate to evaluate existing gaps. A common limitation of tumor registries is the lack of detailed information, such as clinical indications, tobacco history, TNM staging, test results, and treatment-related complications [2]. Despite these limitations, there are certain questions within head and neck oncology that could be appropriately addressed. The Longitudinal Outcomes Registry of Head and Neck Carcinoma was built to address these shortcomings [15], but has since closed due to insufficient funding; developing similarly motivated registries would be worthwhile. As more sophisticated electronic health records bridge medical centers and health systems, such detailed data may become accessible for adequate numbers of patients.

2.2 Selection Bias

Selection bias occurs when patients are assigned to one intervention or another in a way that confounds the study outcomes. In retrospective cohort studies, for example, patients were likely chosen to receive one treatment or another based upon patient or tumor characteristics. Majoufre et al. [16] evaluated a historical cohort of patients who presented with clinically N0 oral cavity cancer and underwent either a type 3 modified radical neck dissection or a supraomohyoid neck dissection, finding no significant difference in recurrence or survival. Interestingly, however, the group that underwent supraomohyoid neck dissections had better 2-year and 5-year survival than the modified radical neck dissection group (85.8 % vs. 73.6 %, and 70.2 % vs. 57.2 %, respectively) [16]; although these differences did not reach statistical significance, they suggest that selection bias was involved in surgical planning, such that the patients who underwent the less extensive neck dissection had a more favorable 5-year survival. This same question of whether a modified radical neck dissection or supraomohyoid neck dissection is more appropriate for N0 oral cavity cancer patients was addressed by the Brazilian Head and Neck Cancer Study Group in an RCT [17]. Randomizing patients to one type of neck dissection or another removes selection bias; accordingly, the 5-year survival was 63 % for the modified radical neck dissection group and 67 % for the supraomohyoid neck dissection group (p = 0.72) [17]. Although the majority of the otolaryngology literature has a low level of evidence [11], careful study design and data analyses can adjust for biases inherent in observational studies to generate meaningful CER.

2.3 Evaluating Clinical and Functional Outcomes

Head and neck cancer and its treatment can be functionally debilitating. However, most studies focus either on clinical outcomes, such as survival and recurrence, or on functional outcomes and quality of life; rarely do studies prioritize both outcomes. To further complicate this issue, few studies use accepted, validated instruments to evaluate patients’ function.

When comparing endoscopic resection versus radiation therapy for early (T1) glottic cancer, a recent systematic review identified 1,045 studies, 888 of which were dismissed after a review of their abstracts [18]. After reviewing the complete manuscripts for the remaining 146 studies, 127 were subsequently excluded. The review then focused on 2 systematic reviews and 17 articles, the majority of which were retrospective comparative and cross-sectional studies. After reviewing this literature, the authors were unable to pool the data because of poor study designs, heterogeneity among study populations, and inherent period bias from the years covered (e.g., changes in radiation technique and dosing). Of the 17 primary studies, 3 did not report length of follow-up. Only 11 of the 17 studies reported survival outcomes; of these, 2 did not report overall survival, 9 did not report disease-free survival, and 6 did not report disease-specific survival. Only 7 studies reported a functional evaluation, which ranged from clinician ratings to patient perception ratings to acoustic and aerodynamic analysis. Of the validated patient perception instruments used, 2 studies used the Voice Handicap Index, 1 used head and neck quality of life questionnaires, and 2 used the voice-related quality of life scale [18]. In an attempt to organize these best available data for clinical practice guideline recommendations, the authors concluded that there is not enough evidence to demonstrate a difference between these treatment modalities [19]. The issues faced by these authors are fairly representative of the quality of head and neck surgical oncology literature.

Standards surrounding clinical and functional outcomes need to be established for successful and meaningful CER; these might be best determined by specialty society efforts. Ideally, both types of outcomes would be evaluated and reported in the same study, with the use of validated instruments to assess patient function at baseline and in short- and long-term post-treatment intervals. One such example is in the realm of laryngeal preservation. The premise of laryngeal preservation is to achieve locoregional control but maintain a functioning larynx for natural breathing, speaking, and swallowing. Landmark RCTs (as previously discussed in this chapter) established equivalent survival after frontline chemoradiation in lieu of complete surgical removal of the larynx (i.e., total laryngectomy) for locally advanced stage laryngeal cancer. With broad application of nonsurgical laryngeal preservation, it became clear that structural preservation of the larynx does not equate to functional laryngeal preservation. A pooled analysis of three RTOG chemoradiation trials reported an alarming crude rate of 43 % of patients with adequate baseline functioning developing late grade 3–4 laryngopharyngeal dysfunction after aggressive nonsurgical therapy [20]. This largely constituted chronic gastrostomy dependence related to dysphagia (difficulty swallowing). Bearing in mind these outcomes, an international consensus panel developed a combined endpoint to account for both survival and functioning in phase III clinical trials of laryngeal preservation strategies: “laryngoesophageal dysfunction (LED)-free survival, which includes the events of death, local relapse, total or partial laryngectomy, tracheotomy at ≥2 years, or feeding tube at ≥2 years”. Secondary endpoints were also defined, including patient-centered outcomes contributing to QOL in survivorship [21].
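The composite endpoint quoted above can be expressed as a simple event definition. The sketch below is illustrative only: the `FollowUp` record, its field names, and the four-patient cohort are hypothetical, and the crude counts ignore censoring and timing, which a real analysis would handle with time-to-event methods.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    """Hypothetical per-patient summary of the events named in the endpoint."""
    died: bool = False
    local_relapse: bool = False
    laryngectomy: bool = False                 # total or partial
    tracheotomy_at_2y_or_later: bool = False
    feeding_tube_at_2y_or_later: bool = False


def has_led_event(patient: FollowUp) -> bool:
    """True if any component event of the composite endpoint has occurred."""
    return any((
        patient.died,
        patient.local_relapse,
        patient.laryngectomy,
        patient.tracheotomy_at_2y_or_later,
        patient.feeding_tube_at_2y_or_later,
    ))


# Invented cohort for illustration only.
cohort = [
    FollowUp(),                                      # alive, no component event
    FollowUp(feeding_tube_at_2y_or_later=True),      # alive, feeding tube dependent
    FollowUp(local_relapse=True, laryngectomy=True), # alive after salvage surgery
    FollowUp(),
]

alive = sum(not p.died for p in cohort)
led_free = sum(not has_led_event(p) for p in cohort)
print(f"alive: {alive}/{len(cohort)}, alive and LED-free: {led_free}/{len(cohort)}")
```

In this invented cohort every patient is alive, yet only half are free of a component event, which is the gap between structural and functional preservation that the combined endpoint is meant to capture.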

Functional outcomes are considered a key measure of success in contemporary management of head and neck malignancies. Among these outcomes, swallowing emerges as a top functional priority of patients and a driver of post-treatment quality of life [22, 23]. When rated subjectively in the clinical setting (e.g., per CTCAE), grade 3 dysphagia is essentially a marker of feeding tube dependence. The clinical literature has a preponderance of studies using grade 3 dysphagia (i.e., feeding tube-dependent dysphagia) as the sole functional outcome. It is clear, however, that feeding tube dependence alone is not a sensitive marker of swallowing impairment. Many survivors with substantial and clinically meaningful levels of swallowing impairment (such as tracheal aspiration) continue to eat without a feeding tube, albeit with great effort and risk of secondary complications (e.g., aspiration pneumonia). For instance, we have previously demonstrated in observational studies that only 33–45 % of chronic aspirators are feeding tube dependent [24]. Looking beyond gastrostomy-dependent dysphagia, swallowing abilities can be quantified from the patient’s perspective using a validated patient-reported outcome inventory developed specifically for the head and neck population, the MD Anderson Dysphagia Inventory (MDADI) [25]. Opportunities for CER using meta-analysis or pooled datasets are ripe now that the MDADI has been widely adopted in published single-institution series using various treatment modalities (e.g., MDADI scores after transoral robotic surgery for oropharyngeal cancer [26–30] and MDADI scores after nonsurgical therapy for oropharyngeal cancer [31–34]). Consistent reporting of confounding factors like precise tumor subsite, TNM, and therapeutic details will be required to pool data for comparative purposes.

3 Important Target Areas for Comparative Effectiveness Research

3.1 Pre-treatment Evaluation

Human papillomavirus (HPV)-associated head and neck cancers have been found to confer favorable survival [35] and are presenting with increasing incidence [36]. Given the higher treatment response rate, lower risk oropharyngeal squamous cell carcinomas may respond just as well to deescalated therapy, which may limit treatment-associated morbidity while providing similar clinical outcomes. The Radiation Therapy Oncology Group has an ongoing RCT (1016) that is evaluating how concurrent cetuximab and radiation compare to the traditional regimen of cisplatin and radiation in patients with HPV-positive tumors [37]. The Eastern Cooperative Oncology Group has also conducted an RCT offering induction chemotherapy followed by concurrent cetuximab and radiation, with patients randomized to receive either high or low doses of radiotherapy [38]. Prospective studies and RCTs address these questions well, but long-term survival and functional outcomes take longer to obtain. These studies may be complemented by carefully designed observational studies.

Just as HPV-positivity is associated with a favorable prognosis and response to treatment, other biomarkers reflecting etiology or molecular expression hold promise as important predictive markers, including the epidermal growth factor receptor, p53, B cell lymphoma-2 (Bcl-2), cyclin D1, and vascular endothelial growth factor, to name a few [39]. None of these have yet been established in routine clinical management because of problems with consistency and study design [40]. CER has great potential in evaluating the clinical utility of these markers for personalized treatment approaches [41]. Identifying the predictive capabilities of biomarkers may lead to more targeted treatment choices with possible reduction in treatment-related toxicity.

3.2 Treatment

Subsites of head and neck cancer in which radiation and surgery are both considered valid approaches are in need of meaningful CER to compare treatment modalities. As illustrated by the earlier discussion of endoscopic surgery versus radiotherapy for T1 glottic cancers [18, 19], the literature that exists on this subject is of questionable quality. Although most data indicate that radiation and minimally invasive surgery have similar effectiveness for early glottic cancers, there has not been an adequate prospective trial allowing for direct comparison of clinical and functional outcomes.

There has been renewed interest in surgery for oropharyngeal cancers with the advent of robotic surgery; transoral robotic surgery (TORS) is becoming more commonly accepted for early stage oropharyngeal cancers. CER comparing TORS to traditional open surgical approaches and to radiation-based therapy is needed. Currently, there are five independent TORS trials ongoing, each with a single arm of TORS at a single institution [42]. The feasibility of TORS at multiple institutions was previously reported by Weinstein et al. [43]. Given the low incidence of TORS-appropriate cases, a multicenter trial with standardized functional assessments and multiple study arms has been opened, although the primary study group consists of intermediate-risk patients, who are randomized to either standard (60 Gy) or low-dose (50 Gy) adjuvant radiotherapy. A more robust RCT that directly compares TORS to radiotherapy is necessary.

CER would also be helpful in overcoming the barriers to fully evaluating the clinical benefit of neoadjuvant therapy in head and neck cancers. Induction chemotherapy is used in the management of many solid tumors, but its role in head and neck cancer is less clear. In 2000, the Meta-Analysis of Chemotherapy on Head and Neck Cancer (MACH-NC) collaborative group evaluated 31 clinical trials, finding no improvement in survival. However, this group also reported that there was a small but significant benefit when analysis was limited to trials using cisplatin and fluorouracil (FU) [44]. Subsequent clinical trials have been hampered largely by low accrual, which translates into underpowered studies; unfortunately, this makes it difficult to determine whether there is an actual clinical benefit from the addition of neoadjuvant chemotherapy despite negative results [45, 46]. Larger multicenter trials may address this issue, although clinical registries may need to be developed in order to achieve adequate power for conclusive findings.

3.3 Post-treatment Surveillance

Clinical practice guidelines for post-treatment surveillance of head and neck cancer patients lack strong evidence in the medical literature [47]. The clinical effectiveness of imaging strategies (e.g., one post-treatment imaging and then as-needed for symptoms vs. only as-needed for symptoms vs. routine imaging) with regard to identifying asymptomatic recurrences and second primary tumors is an area that warrants CER. Broad variability exists in the oncology community vis-à-vis the interval and type of surveillance imaging (PET-CT, CT, MRI, chest X-ray) necessary in the post-treatment setting. Most challenging is that these strategies and imaging choices may differ by disease subsite and treatment modality utilized.

Additionally, although the National Comprehensive Cancer Network (NCCN) formally prescribes a post-treatment surveillance schedule of office visits [48], this is not evidence-based. In fact, the United Kingdom’s National Institute for Health and Care Excellence (NICE), which also creates evidence-based guidelines, simply emphasizes that follow-up is important in the first two post-treatment years, since the risk of recurrence is higher during that time, with increasing intervals between visits as time goes on [49]. As with post-treatment imaging, these follow-up strategies may be impacted by disease subsite and treatment modality.

3.4 Quality of Care Metrics

There has been a great deal of interest in identifying quality metrics for head and neck cancer care [50–52]; in most cases, these metrics are identified from the best available evidence [53], but their impact on clinical and functional outcomes is largely unknown. With appropriate statistical modeling, CER could identify which process metrics impact patient outcomes. These standards would include technical aspects of radiation therapy, treatment breaks, and peri-operative complications. Endpoints would include patient-reported outcomes, functional outcomes, and clinical outcomes, such as survival and recurrence, both in short- and long-term follow-up.

4 Conclusion

CER has the potential to reform the care of head and neck cancer patients. Current barriers include low-powered studies limited by the low incidence of this disease, selection bias in clinical trials, and few established standards for reporting clinical and functional outcomes in comparative studies. All of these have workable solutions that will improve the quality of head and neck cancer studies in the areas of pretreatment evaluation, treatment, posttreatment surveillance, and the identification and validation of quality metrics.