Introduction

Increasing evidence exists for the role of primary gross tumor volume (pGTV) and nodal gross tumor volume (nGTV) as prognostic markers in head and neck squamous cell carcinoma (HNSCC), especially in oropharyngeal squamous cell carcinoma (OPSCC). Higher pGTV has been shown to correlate with increased risk of recurrence and poorer survival [1,2,3,4,5,6,7,8,9,10]. The suggested explanation for this phenomenon is that a larger tumor contains a higher number of clonogenic cells, which may impair the response to radiotherapy [11,12,13]. Patients with larger tumor volumes may need a higher radiotherapy dose to achieve a favorable treatment response, whereas patients with smaller tumor volumes may need lower doses and could thus avoid toxic side effects of the treatment [10, 14].

The tumor, node, metastasis (TNM) classification is a validated standard tool for prognostic evaluation of OPSCC management according to disease stage. It is based on tumor dimensions and extension with metric and anatomic criteria [15, 16]. However, previous studies have indicated that tumor volume as a prognostic factor may be superior to TNM staging among patients treated with radiotherapy [3, 6, 9, 10, 17]. The limitation of TNM classification and staging is that the system may classify tumors with different volumes into the same stage because tumor size is determined in one dimension [3, 6, 10, 18]. The benefit of volume measurement is that it captures the exact three-dimensional structure of the tumor and gives an estimate of the total tumor mass [1, 3, 6]. Furthermore, the recent 8th edition of the TNM classification divides tumors by p16 status into HPV-positive and HPV-negative OPSCC, and takes into account extracapsular spread of nodal metastases [19, 20].

Previous studies have indicated that pGTV may have prognostic impact on local control in OPSCC [1, 6, 13, 18]. In addition, it has been observed to carry prognostic value for overall survival and disease-free survival [1, 3, 6]. A clear majority of OPSCCs in Western countries are HPV positive, with more favorable outcome and response to (chemo)radiotherapy compared with HPV-negative tumors [21,22,23,24,25]. However, there is still unexplained variation in OPSCC prognosis and treatment outcome irrespective of HPV status. Tumor volume as a prognostic factor has been studied in OPSCC patients in general [17], and separately only in p16-positive patients [17]; studies comparing p16-negative and p16-positive tumors are still missing. In addition, clear cut-off values for volumes predicting poorer survival remain unknown, and the previous studies used the 7th edition of the TNM classification. Our aim was to study the prognostic value of primary and nodal tumor volume in p16-positive and p16-negative OPSCC tumors by using the 8th edition of the TNM classification.

Materials and methods

Study population

The study included patients with newly diagnosed, biopsy-proven OPSCC and available tumor p16 status who were treated with radiochemotherapy or radiotherapy using intensity-modulated radiotherapy (IMRT). Patients with distant metastases, patients who underwent neck dissection and primary tumor resection before radiotherapy, and patients treated with palliative intent were excluded. A CONSORT diagram of patient selection is shown in Fig. 1. Altogether, 91 consecutive OPSCC patients met the inclusion criteria and had been treated at the Departments of Oncology and Otorhinolaryngology – Head and Neck Surgery at the Helsinki University Hospital, Helsinki, Finland, during the 3-year period from January 2012 to December 2014. Institutional study permission was granted for the study design.

Fig. 1

A CONSORT diagram of patient selection. OPSCC oropharyngeal squamous cell carcinoma, IMRT intensity-modulated radiotherapy

Clinical data were collected from electronic medical records. Data included gender, age at the date of OPSCC diagnosis, tobacco smoking status, and the date of radiation treatment termination. History of tobacco smoking was classified as “non-smoker” or “ex-smoker/current smoker”. Ex-smoker was defined as a user who had more than a 1-year history of tobacco smoking. Tumor spread and stage were evaluated according to the 8th edition of the TNM classification.

Primary and nodal tumor volume evaluation

The pGTV and nGTV were calculated three-dimensionally from CT scans by an experienced head and neck radiation oncologist using Varian Eclipse® radiotherapy treatment planning system versions 10.0 and 11.0 (Varian Medical Systems Inc., Palo Alto, CA, USA). The volume results are given in cubic centimeters (cm3). We often use combined MRI and CT imaging for radiotherapy dose planning. MRI was, however, not done for all patients, and we therefore used CT for volume measurements. Different imaging modalities may produce significantly different GTVs [26, 27]. Thus, using MRI measures in the patients for whom it was available and CT-based volumes for the rest of the patients could have resulted in non-comparable GTVs. The radiotherapy planning CT was done with contrast enhancement in all patients using Omnipaque™ injection 350 mg I/ml solution. Nodal involvement was defined according to generally accepted criteria, where a minimal axial diameter of 10 mm was the size criterion in the digastric region for the lymph nodes and the size criterion in the subdigastric region was 11 mm [28]. In addition, lymph node necrosis and cystic lymph node metastases were defined according to generally accepted criteria [28, 29].
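As an illustration of how a volume in cm3 can be derived from contours drawn on CT slices, the following minimal sketch sums the contour area on each slice and multiplies by the slice thickness. This is a generic slice-summation approximation, not the algorithm used internally by the treatment planning system, which is not described here. The function names, the example contour, and the 0.3 cm slice thickness are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) coordinates in centimeters


def polygon_area_cm2(contour: List[Point]) -> float:
    """Area of a closed planar contour, computed with the shoelace formula."""
    twice_area = 0.0
    n = len(contour)
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def contour_volume_cm3(contours: List[List[Point]], slice_thickness_cm: float) -> float:
    """Approximate a volume as the sum of (slice area x slice thickness)."""
    return sum(polygon_area_cm2(c) for c in contours) * slice_thickness_cm


# Hypothetical example: the same 2 cm x 2 cm square contour on three slices.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
volume = contour_volume_cm3([square, square, square], slice_thickness_cm=0.3)
print(f"{volume:.1f} cm3")
```

Thinner slices and finer contours make this approximation closer to the true three-dimensional volume.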

p16-INK4a immunohistochemistry

We detected the p16-INK4a status by immunohistochemistry on paraffin-embedded formalin-fixed tissue samples. p16 expression was defined as positive when over 70% of tumor cells stained positive. Xylene was used for deparaffinization and rehydration was done in a graded series of alcohol. “PreTreatment module” (Lab Vision Corp., UK) was used to treat the tissue slides in Tris-HCl buffer (pH 8.5), and endogenous peroxidase was blocked with 0.3% Dako REAL Peroxidase-Blocking Solution. Monoclonal mouse anti-human p16-INK (9517 CINtec Histology Kit, MTM laboratories, Germany) was used as primary antibody.

Radiotherapy and chemotherapy

All tumors were treated with curative intent by radiochemotherapy or radiotherapy using IMRT up to 70 Gray (Gy), in 2 Gy fractions per day with or without platinum-based chemotherapy. In addition, elective nodal areas were treated up to 50 Gy. Chemotherapy consisted of cisplatin 40 mg/m2 given weekly during the radiotherapy.

Follow-up time

Follow-up time was defined as the time in months from the date of treatment termination to the date of last follow-up. Follow-up appointments were planned every 3 months in the first year after completion of treatment, every 3 to 4 months in the second year, and every 6 months in the next 3 years.
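The follow-up time definition can be sketched as simple date arithmetic. The conversion from days to months below uses an average month length of 365.25/12 days, which is an assumption made for illustration; the exact conversion used in the analysis is not stated. The function name and the example dates are hypothetical.

```python
from datetime import date

# Assumption: one month is taken as the average Gregorian month length.
DAYS_PER_MONTH = 365.25 / 12


def follow_up_months(treatment_end: date, last_follow_up: date) -> float:
    """Follow-up time in months from treatment termination to last follow-up."""
    if last_follow_up < treatment_end:
        raise ValueError("last follow-up precedes the end of treatment")
    return (last_follow_up - treatment_end).days / DAYS_PER_MONTH


# Hypothetical patient: treatment ended 30 June 2014, last seen 31 January 2017.
months = follow_up_months(date(2014, 6, 30), date(2017, 1, 31))
print(f"{months:.1f} months")
```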

Statistical analysis

Data were analyzed with a statistical software program (IBM SPSS Statistics 24, IBM, Somers, IL, USA). The primary endpoints were overall survival (OS), disease-free survival (DFS), and locoregional control (LRC) separately for p16-positive and p16-negative patients. Survival time was defined as time in months from the completion of treatment to the date of last follow-up or death. Locoregional failure was defined as tumor residual or a biopsy-proven recurrence locally or regionally during follow-up.

All analyses were conducted separately with pGTV and nGTV in p16-positive and p16-negative patients. Survival curves of dichotomized groups were calculated by the Kaplan–Meier method and compared using Breslow tests. Cox regression hazard models were used for multivariate analyses evaluating OS, LRC, and DFS. Each of the potential prognostic factors (pGTV, nGTV, T category, N category, and stage) was analyzed in a separate Cox regression multivariate model. The models were adjusted for age and gender because these are generally clinically relevant factors. The results are presented as mean, median, or number of patients when appropriate. Relationships between continuous variables and dichotomized groups were assessed with the eta correlation coefficient. A two-sided p-value of less than 0.05 was considered statistically significant.
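To make the survival comparison concrete, the following minimal sketch implements a Kaplan–Meier estimator and a two-group Breslow (generalized Wilcoxon) test, in which each event time is weighted by the number of patients at risk. It is a plain-Python illustration under stated assumptions, not the SPSS implementation used in the study; the function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the endpoint occurred at times[i], 0 if censored.
    """
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def breslow_test(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-group Breslow test: returns (chi-square with 1 df, p-value)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    score, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)                        # at risk, pooled
        n_a = sum(1 for u, a in zip(times, in_a) if u >= t and a)  # at risk, group A
        d = sum(1 for u, e in zip(times, events) if u == t and e)  # events, pooled
        d_a = sum(1 for u, e, a in zip(times, events, in_a) if u == t and e and a)
        weight = n  # Breslow weighting: number at risk at this time point
        score += weight * (d_a - d * n_a / n)
        if n > 1:
            variance += weight ** 2 * d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = score ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Toy data (hypothetical): months to event; 1 = event, 0 = censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
chi2, p_value = breslow_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(curve, round(chi2, 2), round(p_value, 3))
```

Because early time points have the most patients at risk, this weighting emphasizes early differences between the curves more than an unweighted log-rank test would.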

Results

Clinical patient and tumor data of 91 newly diagnosed oropharyngeal cancer cases were evaluated. The majority of the patients were male (72, 79.1%) and the mean age was 62 years (range 41.4–84.7 years). All patients had a minimum follow-up time of 31 months.

Immunohistochemical p16 status was determined for all cases. There were 72 (79.1%) p16-positive and 19 (20.9%) p16-negative cases. All except 11 (12.1%) patients underwent definitive radiochemotherapy using IMRT. The reasons for the 11 patients receiving IMRT only without chemotherapy were as follows: small primary tumor size without regional metastases for two patients, age over 80 years for three, and high serum creatinine levels or other comorbidities for six. All the patient- and tumor-related demographic parameters are summarized in Table 1.

Table 1 Patient- and tumor-related demographic parameters according to p16 status

p16 status and survival

OS (p < 0.001), LRC (p < 0.001), and DFS (p < 0.001) rates were significantly more favorable among p16-positive patients compared with p16-negative patients during the whole follow-up. For p16-positive patients, the estimated 2‑year OS was 93.1% (67/72), DFS 87.5% (63/72), and LRC 93.1% (67/72), whereas for p16-negative patients, OS was 63.2% (12/19), DFS 57.9% (11/19), and LRC 68.4% (13/19). Distant metastases occurred only in two p16-positive patients and in two p16-negative patients during the whole follow-up and therefore the effect of this phenomenon on survival could not be evaluated statistically.

Tumor volume and prognosis in p16-negative patients

The mean pGTV for p16-negative tumors was 38 cm3 (standard deviation, SD, 45.87) and median 19 cm3 (range 1–147). pGTV as a continuous variable in a multivariate model in p16-negative patients showed an independent statistically significant impact on OS (p = 0.020; hazard ratio, HR, 1.02; 95% confidence interval, CI95%, 1.00–1.04) but not on LRC or DFS after adjustment for age and gender (Table 2). Furthermore, for different analyses, T categories of p16-negative tumors were divided into low (T1–T2) and high (T3–T4b) T category groups. The low and high T category groups were analyzed and compared in a similar multivariate model as above, without showing any significant prognostic value in any of the endpoints (Table 2). Due to high reciprocal correlation (eta correlation coefficient 0.978), it was not possible to present different T categories and pGTV in the same multivariate Cox regression model.
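A hazard ratio for a continuous variable refers to a one-unit increase, here 1 cm3. Because the Cox model is log-linear in the covariate, the hazard ratio for a larger volume difference is the per-unit hazard ratio raised to the power of that difference. The sketch below illustrates this for a hypothetical 10 cm3 difference; the function name is an assumption, and because the reported hazard ratios are rounded to two decimals the scaled values are only approximate.

```python
def hazard_ratio_for_increase(hr_per_cm3: float, increase_cm3: float) -> float:
    """Scale a per-cm3 hazard ratio to a larger volume difference.

    Relies on the log-linear form of the Cox model:
    HR(delta) = exp(beta * delta) = HR(1) ** delta.
    """
    return hr_per_cm3 ** increase_cm3


# Reported per-cm3 estimate and confidence limits for pGTV and OS
# in p16-negative patients, scaled to a hypothetical 10 cm3 difference.
hr_10 = hazard_ratio_for_increase(1.02, 10)       # about 1.22
lower_10 = hazard_ratio_for_increase(1.00, 10)    # 1.00
upper_10 = hazard_ratio_for_increase(1.04, 10)    # about 1.48
print(f"HR per 10 cm3: {hr_10:.2f} ({lower_10:.2f}-{upper_10:.2f})")
```

Read this way, a per-cm3 hazard ratio that looks close to 1 can still correspond to a clinically noticeable difference across the observed range of volumes.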

Table 2 Impact of primary gross tumor volume (pGTV), nodal gross tumor volume (nGTV), T category, N category, and stage on overall survival (OS), disease-free survival (DFS), and locoregional control (LRC) in p16-positive and p16-negative patients

For the survival analyses and in order to assess clear survival cut-off values, pGTV of p16-negative tumors was also dichotomized by its mean value into small (≤38 cm3) and large volumes (>38 cm3) and by its median value into small (≤19 cm3) and large volumes (>19 cm3). The group of large pGTV (>38 cm3) had statistically significantly poorer OS (p = 0.005) and DFS (p = 0.028), and a poorer LRC that approached but did not reach significance (p = 0.062), compared with the small pGTV group. Furthermore, when dichotomized and compared by its median value, large pGTV (>19 cm3) was associated with significantly poorer DFS (p = 0.049) and LRC (p = 0.029), and with a poorer OS that approached but did not reach significance (p = 0.064), when compared with the small pGTV (≤19 cm3) group. As with the p16-positive patients, we also dichotomized T categories of p16-negative patients into low (T1–T2) and high (T3–T4b) T category groups. We found no significant differences in OS (p = 0.263), DFS (p = 0.458), or LRC (p = 0.371) between these two groups by Breslow test in Kaplan–Meier curves (Fig. 2) or in multivariate models (Table 2).
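The dichotomization rule described above (small if the volume is at or below the cut-off, large otherwise) can be sketched as follows. The volumes in the example are hypothetical and are not patient data from this study; the function name is likewise an assumption.

```python
from statistics import mean, median
from typing import List, Sequence


def dichotomize(volumes_cm3: Sequence[float], cutoff_cm3: float) -> List[str]:
    """Label each volume 'small' (<= cut-off) or 'large' (> cut-off)."""
    return ["small" if v <= cutoff_cm3 else "large" for v in volumes_cm3]


# Hypothetical, right-skewed volumes: the mean lies well above the median.
volumes = [1.0, 5.0, 19.0, 40.0, 147.0]
by_mean = dichotomize(volumes, mean(volumes))      # cut-off 42.4 cm3
by_median = dichotomize(volumes, median(volumes))  # cut-off 19.0 cm3
print(by_mean)
print(by_median)
```

With a right-skewed distribution such as tumor volume, the mean cut-off places fewer patients in the large group than the median cut-off does, which is one reason the two cut-offs can give different survival comparisons.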

Fig. 2

Impact of mean primary gross tumor volume (pGTV) (a), T category (b), N category (c), and stage (d) on overall survival (OS) in p16-negative patients

The mean nGTV for p16-negative tumors was 13 cm3 (SD 34.22) and median 0 cm3 (range 0–142). As a continuous variable, nGTV in p16-negative patients showed an independent statistically significant impact on OS (p = 0.027, HR 1.02, CI95% 1.01–1.04), LRC (p = 0.017, HR 1.02, CI95% 1.00–1.04), and DFS (p = 0.022, HR 1.02, CI95% 1.00–1.04) in a multivariate model (Table 2). nGTV dichotomized by its mean value to small and large volume groups did not show any differences in any of the endpoints. nGTV could not be dichotomized by its median value (0 cm3).

For further analyses, N categories were dichotomized into low (N0–N2a) and high (N2b–N3b) N category groups. We found no significant differences in OS (p = 0.548), DFS (p = 0.685), or LRC (p = 0.284) between these two groups by using Breslow test or in multivariate models. Clinical stage of p16-negative tumors was dichotomized into low (I–II) and high (III–IVc) stages. The high-stage group (III–IVc) was associated with statistically significantly poorer OS (p = 0.046, HR 9.86, CI95% 1.05–93.03) but not with poorer LRC or DFS in multivariate models (Table 2). In the Kaplan–Meier comparison, however, the high-stage (III–IVc) group showed no significant difference in OS (p = 0.169), DFS (p = 0.362), or LRC (p = 0.234) when compared with the low-stage (I–II) group. All the results of multivariate analyses are shown in Table 2. Survival differences between different groups of pGTV, T category, N category, and stage on OS in p16-negative patients are shown as Kaplan–Meier curves in Fig. 2 and on LRC in Fig. 3.

Fig. 3

Impact of median primary gross tumor volume (pGTV) (a), T category (b), N category (c), and stage (d) on locoregional control (LRC) in p16-negative patients

Tumor volume and prognosis in p16-positive patients

The mean pGTV for p16-positive tumors was 23 cm3 (SD 29.03) and median 14 cm3 (range 1–190). As a continuous variable, pGTV showed no independent statistically significant impact on any of the endpoints (OS, LRC, or DFS) in p16-positive patients in a Cox regression multivariate model adjusted for age and gender (Table 2). Furthermore, for different analyses, T categories of p16-positive tumors were divided into low (T1–T2) and high (T3–T4) T category groups. Dichotomized T categories were analyzed in a similar multivariate model as above and a significant prognostic value was not found in any of the endpoints (Table 2). Due to high reciprocal correlation (eta correlation coefficient 1.000), it was not possible to present different T categories and pGTV in the same multivariate Cox regression model.
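The eta correlation coefficient relates a continuous variable to a categorical grouping: it is the square root of the between-group sum of squares divided by the total sum of squares. A value near 1 means that the grouping explains almost all of the variance in the continuous variable, which is why both cannot sensibly enter the same regression model. The following is a minimal sketch under that standard definition; the function name and data are hypothetical, and the exact SPSS procedure used in the study is not described.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence


def eta_coefficient(values: Sequence[float], groups: Sequence[Hashable]) -> float:
    """Correlation ratio (eta) between a continuous variable and a grouping."""
    if len(values) != len(groups) or not values:
        raise ValueError("values and groups must be non-empty and equally long")
    grand_mean = sum(values) / len(values)
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0.0:
        raise ValueError("eta is undefined when all values are identical")
    return math.sqrt(ss_between / ss_total)


# Hypothetical volumes (cm3) for a low and a high category group.
example_values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
example_groups = ["low", "low", "low", "high", "high", "high"]
eta = eta_coefficient(example_values, example_groups)
print(round(eta, 3))
```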

For the survival analyses and in order to assess clear survival cut-off values, pGTV was dichotomized into two different volume groups by its mean value into small (≤23 cm3) and large volumes (>23 cm3) and by its median value into small (≤14 cm3) and large volumes (>14 cm3). We found no significant differences in OS (p = 0.676), DFS (p = 0.662), or LRC (p = 0.601) between small and large pGTV when dichotomized by its mean value or for OS (p = 0.507), DFS (p = 0.752), or LRC (p = 0.887) when dichotomized by its median value in p16-positive patients. Furthermore, T categories were divided into low (T1–T2) and high (T3–T4) groups and compared in a multivariate model without showing any statistically significant impact on any of the endpoints (Table 2). Low and high T categories were also compared using Kaplan–Meier curves and the Breslow test, but no significant prognostic differences were found for OS (p = 0.895), DFS (p = 0.901), or LRC (p = 0.603).

The mean nGTV for p16-positive tumors was 26 cm3 (SD 34.12) and median 15 cm3 (range 0–200). As a continuous variable, nGTV showed an independent statistically significant impact on DFS (p = 0.005, HR 1.02, CI95% 1.01–1.03) in p16-positive patients in a Cox regression multivariate model adjusted for age and gender (Table 2). Furthermore, large nGTV was associated with statistically significantly poorer DFS (p = 0.046) when dichotomized by its mean value into small (≤26 cm3) and large (>26 cm3) volumes. Statistically significant differences were not found in OS (p = 0.508) or LRC (p = 0.159) between small and large nGTV when dichotomized by its mean value. nGTV was also dichotomized and analyzed by its median value, but significant prognostic differences in OS (p = 0.625), DFS (p = 0.627), or LRC (p = 0.673) were not found between small and large volumes. For comparison, we also dichotomized N categories into low (N0–N1) and high (N2–N3) N categories and clinical stages into low (I–II) and high (III–IV) stages. Different N categories or clinical stages showed no prognostic differences in any of the endpoints in multivariate models adjusted for age and gender (Table 2). In addition, comparison of low and high N categories showed no significant prognostic differences in OS (p = 0.124), DFS (p = 0.328), or LRC (p = 0.806), whereas the high clinical stage (III–IV) group had significantly poorer DFS (p = 0.041) but not OS (p = 0.056) or LRC (p = 0.123) when compared with the low-stage (I–II) group. Multivariate analyses are shown in Table 2. Survival differences in DFS between different groups of nGTV, T categories, N categories, and stages in p16-positive patients are shown as Kaplan–Meier curves in Fig. 4. Finally, an example of a treatment planning CT scan of a p16-positive tumor is illustrated in Fig. 5.

Fig. 4

Impact of mean nodal gross tumor volume (nGTV) (a), T category (b), N category (c) and stage (d) on disease-free survival (DFS) in p16-positive patients

Fig. 5

P16-positive tumor (T2N2) of the base of tongue: primary gross tumor volume (pGTV) yellow contour, nodal gross tumor volume (nGTV; left) brown contour; clinical target volume (CTV) light blue contour; planning target volume (PTV) red contour; right submandibular gland blue contour; left submandibular gland dark blue contour; spinal canal purple contour. Metastasis at the right side of neck is not visualized at this level

Discussion

This single-institution study on 91 OPSCC patients treated with definitive radiochemotherapy or radiotherapy using IMRT provides new evidence that pGTV and nGTV may have specific roles as independent prognostic factors for the subgroups of p16-negative and p16-positive cancers.

pGTV was found to be an independent prognostic factor for OS in p16-negative patients in multivariate analysis adjusted for age and gender. Large p16-negative tumors with the mean pGTV (38 cm3) as the cut-off value were associated with significantly poorer OS and DFS, and the trend was similar with LRC but not significant when compared with smaller tumors. p16-negative tumors larger than the median pGTV (>19 cm3) were associated with significantly poorer DFS and LRC, and the trend was similar for OS but not statistically significant. For comparison, T category, N category, and stage of p16-negative patients were dichotomized into low and high groups, but none showed differences in any of the endpoints. In multivariate analyses of T category, N category, and stage, only stage had prognostic impact on OS, but the confidence interval was relatively wide. pGTV, T category, N category, or clinical stage showed no significant impact on any of the endpoints in p16-positive patients in multivariate analyses. Our findings of the prognostic value of pGTV in OPSCC are in line with previous studies [1, 3, 6, 17, 18], but we are the first to report these findings separately in p16-positive and p16-negative tumors and according to the newest TNM classification [19].

nGTV of p16-negative tumors had significant prognostic value in all endpoints in multivariate analyses, but significant differences were not found between large and small nGTV groups when dichotomized by its mean value, and dichotomization by its median value (0 cm3) was not possible. In conclusion, nGTV may serve as a prognostic factor for p16-negative patients but clear cut-off values remain unknown. nGTV of p16-positive tumors was an independent prognostic factor for DFS and LRC in multivariate analyses and DFS was significantly poorer among large tumors when dichotomized by mean nGTV (26 cm3) into large and small tumors. Differences in survival were not found between large and small tumors when the cut-off value was assessed by nGTV median value (15 cm3). Davis et al. 2016 had similar results of nGTV for DFS in p16-positive tumors in multivariate analysis but they did not analyze the impact of nGTV on LRC. To our knowledge, this is the first report to study the prognostic value of pGTV and nGTV both in multivariate analyses and with different volume cut-off values in p16-positive tumors.

As shown in various studies [21, 25, 30], p16-positive primary tumors are in general smaller compared with p16-negative tumors by T category but have a tendency to spread more often to regional lymph nodes. Despite regional spreading, their response to (chemo)radiotherapy, and thus also locoregional control, are more favorable [21, 23, 25]. Our findings are in line with previous studies showing p16-negative tumors to be larger by T category, whereas p16-positive tumors have a tendency to spread regionally more often. In addition, the prognosis of p16-positive patients was significantly more favorable compared with p16-negative tumors. pGTV of p16-negative tumors was clearly larger and nGTV smaller when compared with p16-positive tumors, which correlates with their T and N categories defined with the new 8th TNM classification. However, a few cases had different T categories or N categories despite similar pGTV and nGTV values (data not shown). The definition of T category and N category is based on diametric and anatomic criteria [15, 16], whereas volume is measured three-dimensionally, which seems to give a more accurate dimension for the tumor and thus explain our findings. Larger tumor mass (volume) may contain a larger number of malignant tumor cells [11,12,13], which in certain cut-off values may impair treatment response to concurrent radiochemotherapy or radiotherapy using IMRT in OPSCC. In addition, the differences in treatment response may be explained by the differences in biological and genetic background of p16-positive and p16-negative tumors according to the viral status [24, 31]. We found a significant impact of pGTV on all endpoints in p16-negative patients depending on mean or median cut-off values as shown above.

There are clear etiological, biological, clinical, and treatment planning differences between p16-positive and p16-negative tumors. Furthermore, the majority of newly diagnosed OPSCCs are currently p16-positive in Western countries [23]. Therefore, the recently launched UICC (8th edition) TNM classification includes specific criteria for p16-positive patients [20]. The new classification also includes extracapsular spread as a parameter for nodal extension estimation and thus further aids in treatment planning and prognostication [19, 32]. In our study, TNM classification of the tumors was made with the newest 8th edition, which was a clear strength of the study design compared with previous OPSCC studies. Another strength was a homogeneous study group with a long follow-up time without dropouts. The clear limitations were the lack of HPV detection with PCR or in situ hybridization and the relatively small number of p16-negative patients. In addition, there was a rather small number of events in survival analyses, which is the consequence of successful treatment planning and favorable response to (chemo)radiotherapy.

In previous studies of HNSCC [6, 9, 10, 13, 33], pGTV has shown higher prognostic value than TNM classification, as it did in p16-negative patients in the present study. Additionally, nGTV showed higher prognostic value than TNM classification in our multivariate analyses of both p16-negative and p16-positive patients. Whether this result is partly due to the fact that all the patients in the present series were treated with definitive radiochemotherapy or radiotherapy using IMRT and not with a combination of surgery and postoperative radiotherapy remains to be elucidated.

In the present study, the mean pGTV as a cut-off value was closer than the median volume to the point that clearly divides tumors into groups with either favorable or non-favorable prognosis. In previous studies, pGTV cut-off values have been assessed with a large range from 6 cm3 to 70 cm3 by median or mean volume without a clear consensus [2,3,4, 6, 9, 18]. In the present study, p16-negative tumors larger than the mean pGTV (>38 cm3) were associated with poorer OS and DFS, whereas p16-negative tumors larger than the median pGTV (>19 cm3) were associated with poorer DFS and LRC. To our knowledge, this is the first study to assess and compare cut-off values based on mean and median volumes in the same study cohort.

Conclusion

pGTV may serve as an independent prognostic factor in p16-negative patients in order to predict poor treatment response to radiochemotherapy or radiotherapy using IMRT. nGTV may serve as an independent prognostic factor both in p16-positive and p16-negative OPSCC treated with radiochemotherapy or radiotherapy using IMRT. There are many ongoing trials where the aim is to establish specific patient selection criteria for de-escalation protocols. Our findings on tumor-related factors might have an impact in this field in the future. For this reason, future studies are warranted with larger patient cohorts to confirm these results and in order to assess clear cut-off values for favorable and non-favorable treatment response.