Introduction

Head and neck cancer was estimated to newly affect approximately 53,000 patients and to be responsible for 10,750 deaths in the USA in 2020 [1]. Head and neck cancer arises in the epithelial layers of the upper aerodigestive tract (the oral cavity, the nasopharynx, the oropharynx, the hypopharynx, and the larynx) [2]. Ninety percent of head and neck cancers are squamous cell carcinomas (HNSCC).

Panendoscopy of the upper aerodigestive tract and/or oral inspection are used to investigate patients with suspected head and neck cancer [3, 4]. Classically, panendoscopy is conducted under cold-light illumination. It is used (1) to obtain biopsies for diagnostic confirmation of HNSCC or precancerous lesions and/or for simultaneous detection of other lesions; (2) to assess tumor extension by inspection and palpation; and (3) to guide therapeutic decisions [4].

The potential of optical fluorescence imaging for tumor tissue identification and optimization of locoregional control by early detection and removal of subclinical disease has been described over the past decade [5]. The development of optical fibers over the last 20 years has boosted the emergence of new optical instruments and techniques for clinical use. One such technique is probe-based confocal laser endomicroscopy (pCLE), a noninvasive and real-time imaging technique allowing acquisition of in situ images of the tissue microarchitecture. The cellular scale images are similar to those provided by conventional histopathology of ex vivo samples [6]. The technique is mainly used in gastroenterology to make in vivo clinical diagnoses and monitor explorations (Barrett’s esophagus, pancreatic cysts, biliary stenosis, etc.) [7, 8]. pCLE is also in development for characterizing head and neck cancer [9,10,11].

We have previously reported the ex vivo evaluation of five fluorescent dyes for analysis of head and neck cancer tissue by pCLE after topical application [12]. The combination of pCLE imaging with these dyes provided interpretable images similar to conventional histology images and offered a noninvasive way of staining tissues in vivo. Patent blue V was selected for this clinical study because of the high image quality it provided ex vivo after topical application. This is an off-label use of patent blue V. In the last decade, head and neck cancer microimaging by pCLE has typically been performed following intravenous administration of fluorescein, which offers only a short effective imaging window [13]. To date, pCLE has mostly targeted the oral cavity and vocal folds, and ear, nose, and throat (ENT) physicians and pathologists have interpreted the data postoperatively. Reported sensitivity ranges from 45.5 to 100% and specificity from 60 to 100% [14,15,16]. In medical practice, pCLE could allow for in vivo histological diagnosis without invasive biopsy, helping to better characterize early cancers and preinvasive forms, and could identify the tumor margin prior to endoscopic laser resection.

For this study, we aimed to validate the in vivo value of pCLE after topical application of patent blue V in the management of patients with head and neck cancers. pCLE imaging was considered to be a “virtual biopsy” that we compared to the gold standard: the final diagnosis based on histological sections.

Material and methods

Patient population

The study population consisted of 44 consecutive patients over 18 years of age, who were eligible if they had early (T1–T2) cancer of the larynx, oropharynx/hypopharynx, or oral cavity. Patients with a history of allergy to patent blue V and pregnant or breastfeeding women were excluded. Included patients were treated at the Gustave Roussy Cancer Center (Villejuif, France) from 2013 to 2016 (n = 34) and at the Oncopôle Hôpital régional de Toulouse from 2016 to 2017 (n = 10). The institutional review board approved the prospective study, and informed consent was obtained from all patients. The trial was registered at clinicaltrials.gov under the number NCT01626638.

Forty-four patients underwent white-light examination or panendoscopy according to the lesion’s location. After examination, 19 patients had biopsies taken and 33 patients underwent endoscopic laser resection or intraoral resection as planned therapeutic procedures. Two pCLE examinations were canceled due to technical failure, so data from 42 patients were ultimately analyzed (Fig. 1).

Fig. 1

Synopsis of the study from image collection to image interpretation. A total of 42 patients were included in the clinical study. A Patent blue V dye was applied topically after induction of anesthesia and panendoscopy or oral examination. B Probe-based confocal laser endomicroscopy (pCLE) images were acquired intraoperatively on tumor (T) and corresponding margins (M) but interpreted postoperatively. Laser resections or biopsies were processed after the imaging session for further conventional histology. C pCLE images were blindly interpreted in a random order by three pathologists (random assignment of 50% of the images for the third pathologist) and then compared to the final histology diagnosis (hematoxylin, eosin, and saffron [HES] staining). D Flowchart of included patients and diagnoses for pathologist 1

Fluorescent dye administration

In the operating room, after induction of anesthesia, the fluorescent dye patent blue V (sodium patent blue V 2.5% injection, Guerbet, Villepinte, France) was diluted to 0.025% in 0.9% NaCl. Off-label patent blue V was applied topically to the lesion for 30 s after white-light examination or panendoscopy, and then the imaging session was conducted (Fig. 1A–B).
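The dilution described above (2.5% stock to 0.025% working solution) corresponds to a 1:100 dilution. As a minimal sketch of the arithmetic, assuming the usual relation C1 × V1 = C2 × V2, the snippet below computes the volumes for a given batch. The 10 mL final volume and the function name are hypothetical and are not taken from the study protocol.

```python
def stock_volume_ml(stock_pct: float, target_pct: float, final_volume_ml: float) -> float:
    """Volume of stock solution needed, from C1 * V1 = C2 * V2."""
    if not 0 < target_pct <= stock_pct:
        raise ValueError("target concentration must be positive and no higher than the stock")
    return target_pct * final_volume_ml / stock_pct


STOCK_PCT = 2.5          # patent blue V stock concentration (from the text)
TARGET_PCT = 0.025       # working concentration (from the text)
FINAL_VOLUME_ML = 10.0   # hypothetical batch volume, chosen only for illustration

dye_ml = stock_volume_ml(STOCK_PCT, TARGET_PCT, FINAL_VOLUME_ML)
saline_ml = FINAL_VOLUME_ML - dye_ml
dilution_factor = STOCK_PCT / TARGET_PCT

print(f"stock dye: {dye_ml:.2f} mL, 0.9% NaCl: {saline_ml:.2f} mL, "
      f"dilution factor: {dilution_factor:.0f}")
```

Under these assumptions, a 10 mL batch would combine about 0.1 mL of stock with about 9.9 mL of saline.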

Instruments

pCLE imaging of the head and neck lesion and its margins was performed for all patients (Fig. 1A). Fluorescence images were recorded using a fibered imaging system, Cellvizio® (Mauna Kea Technologies, Paris, France). The pCLE system consists of a flexible, fibered miniprobe connected to a laser unit, which is equipped with a laser diode operating at 660 nm, a rapid scanning laser (frame rate from 8 to 12 images per second), and an avalanche photodiode to detect the fluorescence signal. For larynx examination, the probe was introduced through an Albarran lever; for oral cavity lesions, it was applied directly to the lesion. Unlike conventional histopathology sections, which are cut perpendicular to the tissue surface, pCLE images are en face, that is, parallel to the surface. The data were acquired using a confocal UHD miniprobe (Mauna Kea Technologies, Paris, France), which has a 240-µm diameter field of view. This fiber bundle, made of 30,000 optical fibers, has a transverse resolution of 1 µm and an axial resolution of 10 µm. The images were taken at depths varying between 55 and 65 µm below the surface. Image reconstruction using a video mosaicing technique (Mauna Kea Technologies, Paris, France) provided a larger field of view, depending on the number of recorded images needed to scan a tissue zone.

Imaging protocol

During examination, the lesion was first detected macroscopically. Then, both the lesion and, when accessible, its margins were imaged with pCLE by the head and neck surgeon (maximum of five zones imaged) as follows:

  • Four videos were recorded in the tumor zone.

  • One to four zones in the tumor’s margin (anterior, posterior, medial, and lateral) were filmed.

After the imaging session, a biopsy or laser resection of the tumor was carried out by ENT surgeons.

Finally, fresh head and neck biopsies or specimens were transferred to the pathology department.

After surgery, separately from the examination, a maximum of five images and one video per zone were selected for blinded interpretation by pathologists, based on limited motion artifacts and visible structures.

Videos are provided in Supplement 1.

Histopathology

Head and neck biopsies or specimens were fixed in formalin, embedded in paraffin, sectioned at a thickness of 3 μm perpendicular to the mucosal surface, stained with hematoxylin, eosin, and saffron (HES), and interpreted by pathologists from the Gustave Roussy pathology department (n = 32) or from the Toulouse Oncopôle pathology department (n = 10) according to standard guidelines. The histological diagnosis based on HES was considered to be the gold standard for the study.

Interpretation and classification of pCLE images from each zone

Three pathologists retrospectively interpreted pCLE images and videos from each zone according to their individual level of expertise in optical biopsy.

  • PT1: A senior pathologist experienced in histological assessment of HNSCC and with prior experience in optical biopsies [12, 17]

  • PT2: A junior pathologist experienced in histological assessment of HNSCC and with prior experience in optical biopsies [17]

  • PT3: A junior pathologist experienced in histological assessment of HNSCC and without prior experience in optical biopsies

Clinical data, including macroscopic imaging and patient records, were not available to the pathologists during the study. All pCLE images and videos were presented to the pathologists in a random order, without any patient-specific information or information on the location of the tumor. The pathologists classified each imaged zone in two steps to assess whether pCLE provides enough information to characterize head and neck tissues.

First, the pathologists classified each zone imaged into three groups: noncancerous tissue; precancerous and cancerous tissue; and unclassifiable tissue. Subsequently, when possible, they further refined the histological diagnosis based on the pCLE characteristics defined in previous work [12] into squamous epithelium; respiratory epithelium; necrosis, fibrosis, inflammation, benign tumor; low-grade dysplasia; high-grade dysplasia/in situ carcinoma; microinvasive HNSCC; invasive poorly/moderately or well-differentiated HNSCC; and other types of tumor.

For each interpreted zone, the pathologists retained the worst-scenario diagnosis (with consequences on therapeutic management).
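The worst-scenario rule can be illustrated with a short sketch: given several readings for one zone, keep the most severe one. The numeric severity ranking below is an assumption made for illustration only; the text lists the diagnostic categories but does not publish an explicit ordering, and "other types of tumor" is left out because its rank is unclear. The function and variable names are hypothetical.

```python
# Hypothetical severity ranking (higher = worse). The ordering is an assumption,
# not a scale defined in the study.
SEVERITY = {
    "squamous epithelium": 0,
    "respiratory epithelium": 0,
    "necrosis, fibrosis, inflammation, benign tumor": 1,
    "low-grade dysplasia": 2,
    "high-grade dysplasia/in situ carcinoma": 3,
    "microinvasive HNSCC": 4,
    "invasive HNSCC": 5,
}


def worst_diagnosis(readings):
    """Return the most severe diagnosis among the readings for one zone.

    An empty list is reported as "unclassifiable"; an unknown label raises KeyError.
    """
    if not readings:
        return "unclassifiable"
    return max(readings, key=lambda label: SEVERITY[label])


zone_readings = ["squamous epithelium", "low-grade dysplasia", "squamous epithelium"]
print(worst_diagnosis(zone_readings))
```

A lookup table keeps the ranking explicit and easy to revise if a different ordering of categories is preferred.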

Statistical analysis

Quantitative variables were described using median and range. Qualitative variables were described using frequency and percentage. The sample size was estimated at n = 140 images with corresponding HES sections, for a targeted concordance rate of 90% and a precision of 10%. Diagnostic performance was estimated using proportions and their two-sided 95% confidence intervals (CIs) adjusted for the correlation between multiple observations within each patient [18]. Ignoring this correlation leads to misleadingly narrow CIs, although the point estimates of sensitivity and specificity remain unbiased; the adjustment affects only the width of the CIs. Interpretable pCLE images in each zone with corresponding HES sections were included in the statistical analyses. Missing data were not replaced. Statistical analyses were performed with SAS software (v9.4; SAS Institute, Cary, NC).
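The two calculations in this paragraph can be sketched as follows. Both parts rest on assumptions: the sample-size function uses the standard normal-approximation formula n = z² p(1 − p)/d², and the clustered CI uses a ratio-estimator variance with one cluster per patient. The text does not state which formula or estimator was actually used (the analysis was run in SAS following [18]), so this is an illustration and not the authors' procedure. The example counts are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile, about 1.96


def sample_size(p: float, half_width: float) -> int:
    """Observations needed so that a normal-approximation 95% CI for a
    proportion p has the given half-width: n = z^2 * p * (1 - p) / d^2."""
    return math.ceil(Z95 ** 2 * p * (1 - p) / half_width ** 2)


def clustered_ci(clusters):
    """Proportion and 95% CI for clustered binary data (ratio estimator).

    `clusters` is a list of (successes, observations) pairs, one per patient.
    The variance is k/(k-1) * sum((x_i - p*n_i)^2) / (sum n_i)^2.
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters")
    total = sum(n for _, n in clusters)
    p = sum(x for x, _ in clusters) / total
    resid_ss = sum((x - p * n) ** 2 for x, n in clusters)
    var = k / (k - 1) * resid_ss / total ** 2
    half = Z95 * math.sqrt(var)
    return p, max(0.0, p - half), min(1.0, p + half)


# Reading "precision of 10%" as a total CI width of 10% (half-width 0.05)
print(sample_size(0.90, 0.05))

# Hypothetical per-patient counts: (concordant zones, interpreted zones)
print(clustered_ci([(3, 4), (1, 2), (2, 2)]))
```

If the 10% precision is read as the total width of the interval, the formula gives 139, close to the 140 stated; reading it as a half-width of 0.10 would give a much smaller number, so the first reading appears more consistent with the text.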

Results

Patient characteristics and pCLE protocol

Forty-four patients met the inclusion criteria, but only 42 patients were imaged by pCLE due to technical issues at the time of two medical examinations. Table 1 details the characteristics of the included patients, tumor location, and intervention type; Fig. 1D details PT1’s flowchart.

Table 1 Clinical characteristics of the 42 patients included in the study

Topical application of patent blue V dye was not associated with any adverse effect or allergy. The quality of staining with patent blue V varied and depended on tumor location and/or the macroscopic appearance of the lesion (ulceration, flat form, necrosis, erythroplakia, or leucoplakia). When low staining was observed, topical application was repeated twice to enhance image contrast.

Conducting pCLE was difficult when examining the larynx of the fifteen corresponding patients (Table 1), as the optical probe was not optimized for ENT specialty care. The flexibility of the optical probe made it difficult to access the lesion and identify the margins accurately. After the first five larynx examinations, we introduced the optical probe through an Albarran lever to enhance the imaging protocol.

Zones imaged by pCLE

For 42 patients, 96 zones could be imaged (2.9 images per zone) and compared to final histology (HES section). The videos were of heterogeneous quality, mainly due to difficulties stabilizing the miniprobe for image acquisition (8 images/s) during patient examination. As a consequence, 7.1% of the data acquired in tumor zones could not be presented to pathologists. In these cases, images were mainly composed of artifacts. However, 92.9% of recorded zones were presented to PT1 and PT2 and 64.3% were presented to PT3. The pathologists’ evaluation of the pCLE images in each zone was then based on recognition of architectural and cellular criteria. Importantly, PT1 considered that poor quality prevented 26.8% of the zones from vocal folds from being informative. When our pathologists considered the pCLE images in a zone to be of sufficient quality for evaluation, it was because the fluorescent dye had allowed morphological analysis based on architecture, cell size, and nuclear shape. Heterogeneous patent blue V staining and random dye loading resulted in limited visualization of cells and/or nuclei in some images. Despite this, images supplied information on the heterogeneous distribution of cancerous cells surrounded by inflammatory and/or fibrous stroma in precancerous or cancerous tissue (Fig. 2A and C). In contrast, regular cell distribution and homogeneous nuclear size were visible in noncancerous zones (Fig. 2E). Keratinization abnormalities such as keratin pearls could be recognized by pathologists on pCLE images, helping to distinguish well-differentiated SCC from normal squamous epithelium.

Fig. 2

Typical pCLE images of invasive well-differentiated head and neck squamous cell carcinoma (HNSCC) (A and C) with corresponding HES sections (B and D) and squamous epithelium (E) with corresponding HES section (F)

pCLE interpretation

A total of 96 zones with corresponding HES sections from 42 patients were analyzed by pCLE by PT1 and PT2. PT3 interpreted 54 zones with corresponding HES sections. The data available to PT3 were randomly selected by the statistical team who were not involved in the data acquisition so as not to bias the results.

All pathologists had equivalent concordance rates in the tumor zone (from 69.6 to 74.8%) and anterior margin zone (50%), independently of previous experience in optical biopsy (Table 2). Figure 3 presents typical pCLE images of cancerous tissue (Fig. 3A and B), and squamous (Fig. 3C and D) and respiratory epithelium (Fig. 3E) that PT1 interpreted in concordance with the histological diagnosis.

Table 2 Concordance per zone and overall—pCLE diagnosis versus final histology
Fig. 3

Typical pCLE images interpreted by pathologist 1 in concordance with histological diagnosis: A and B invasive well-differentiated HNSCC; C normal squamous epithelium from floor of mouth; D dorsum of the tongue; and E normal respiratory epithelium from larynx

We next compared concordance rates by anatomical location of tumor (Table 3). For PT1 and PT2, the concordance between pCLE diagnosis and final histology diagnosis was higher in images from the oral cavity (from 68.6 to 77.1%) than in those from the oropharynx + hypopharynx + larynx (from 40.5 to 59.1%); this was in accordance with the complexity of obtaining high-quality images of the pharynx and larynx.

Table 3 Concordance rate—pCLE diagnosis versus final histology by anatomical location of tumor

We analyzed discrepancies according to tumor location for PT1, the most experienced pathologist in our study (Table 4). False negatives and false positives mainly occurred when imaging vocal folds or the tongue. Figure 4 presents pCLE images for which there was a discrepancy between PT1’s interpretation and the histological diagnosis. A poorly differentiated HNSCC (histological diagnosis) was misclassified as respiratory epithelium (Fig. 4A) and a well-differentiated HNSCC was misclassified as noncancerous tissue (Fig. 4B). Respiratory epithelium was misclassified as low-grade dysplasia (Fig. 4C) and high-grade dysplasia/in situ carcinoma was misclassified as squamous epithelium (Fig. 4D).

Table 4 Discrepancies according to tumor location for experienced PT1
Fig. 4

Typical pCLE images interpreted by pathologist 1 that had discrepancies between interpretation and final histological diagnosis: A invasive poorly differentiated HNSCC wrongly interpreted as respiratory epithelium; B respiratory epithelium wrongly interpreted as low-grade dysplasia; C noncancerous tissue wrongly interpreted as invasive well-differentiated HNSCC; and D high-grade dysplasia/in situ carcinoma wrongly interpreted as squamous epithelium

Prior experience was not a critical point for the sensitivity scores (Table 5), as results were comparable between pathologists (from 73.2 to 75%). However, an obvious difference was observed for specificity scores, with specificity increasing with the pathologists’ experience (from 30% for PT3 to 57.4% for PT1). Although previous experience in ex vivo optical biopsies improved specificity scores, inaccurate interpretation of normal tissue imaged by pCLE was nevertheless a serious drawback with this technology.

Table 5 Sensitivity and specificity of pCLE imaging in head and neck cancer based on 96 interpreted zones
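As a reminder of how these two scores are derived, the sketch below computes sensitivity and specificity from zone-level pairs of pCLE call and final histology. Consistent with the statistical methods, zones rated unclassifiable are excluded. The function name, the label strings, and the example data are hypothetical and do not reproduce the study data.

```python
def sensitivity_specificity(zones):
    """Compute (sensitivity, specificity) from zone-level readings.

    `zones` holds (pcle_call, histology_positive) pairs, one per zone.
    pcle_call is "cancerous", "noncancerous", or "unclassifiable";
    histology_positive is True for precancerous or cancerous histology.
    Either score is None when it has no eligible zones.
    """
    tp = fn = tn = fp = 0
    for pcle_call, histology_positive in zones:
        if pcle_call == "unclassifiable":
            continue  # only interpretable zones enter the analysis
        called_positive = pcle_call == "cancerous"
        if histology_positive and called_positive:
            tp += 1
        elif histology_positive:
            fn += 1
        elif called_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else None
    specificity = tn / (tn + fp) if tn + fp else None
    return sensitivity, specificity


# Hypothetical readings, for illustration only
example_zones = [
    ("cancerous", True),
    ("cancerous", True),
    ("noncancerous", True),    # false negative
    ("cancerous", False),      # false positive
    ("noncancerous", False),
    ("unclassifiable", True),  # excluded
]
sens, spec = sensitivity_specificity(example_zones)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

A low specificity, as reported here, means that many histologically noncancerous zones were called cancerous on pCLE, which corresponds to the false-positive branch of this tally.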

Discussion

Acquiring pCLE videos during patient examination was the main challenge of this clinical study: 7.1% of the data acquired in tumor zones could not be presented to pathologists. These videos contained motion artifacts and/or lacked visible structures due to low staining. Manual exclusion of these videos could be improved by the deep learning-based detection of motion artifacts recently described by Aubreville et al. [19] for head and neck cancer pCLE imaging. The optical probe of the Cellvizio system was developed for gastrointestinal imaging [8] and was not well suited to head and neck imaging. The flexibility of the probe greatly limited its precise positioning on the margins of vocal fold tumors. To improve positioning during panendoscopy, we stabilized the probe with a cystoscope combined with an Albarran lever. Despite this, the poor quality of a quarter of the zones imaged from the vocal folds rendered them uninterpretable by the pathologists. In particular, images of the anterior and lateral margins were difficult to acquire even though the surgeons’ expertise increased during the clinical study. For the oral cavity, zones imaged from the ventral surface of the tongue and the floor of the mouth were difficult to record: because of the softness of the tissue, it was difficult to maintain adequate contact between the probe and the mucosa during video acquisition. We included patients with small lesions and therefore selected patients with early (T1–T2) tumors. Patients presented to the hospital with early lesions when they reported voice alteration (which occurs even with small lesions of the vocal cords) or noticed small persistent and/or progressive lesions in the mouth. Moreover, the oral cavity and larynx were preferred because lesions at these sites could be reached with the probe. Oropharyngeal lesions were difficult to reach well enough to obtain good images, and few T1–T2 oropharyngeal tumors are seen in our hospitals. In the hypopharynx, T1–T2 tumors are rare, and the probe with the Albarran lever is not easy to use during laryngoscopy. For these reasons, we included only two patients with oropharyngeal and hypopharyngeal cancer in this study and focused our attention on the oral cavity and larynx.

Routine patient examination followed by pCLE imaging extended the time spent in the operating room by 30–45 min. In our previous ex vivo study, we produced rapid high-quality pCLE images of head and neck tissues [12]. However, in this study under clinical conditions, we encountered technical issues, staining heterogeneity, and anatomical constraints. Five head and neck surgeons participated in the study and a learning curve was also highlighted; the surgeon who had the highest proportion of interpretable zones included in the study improved her ability to acquire good-quality images during the clinical study. A training program for pCLE management by surgeons could increase the quality of video acquisition.

To date, all published clinical studies in head and neck endomicroscopy have used fluorescein as the contrast agent [14, 15, 20,21,22,23]. This dye, administered intravenously, has also been used extensively in gastrointestinal endomicroscopy [24]. The fluorescent dye diffuses from the capillaries to the interstitial space, allowing observation of cell borders. Image interpretation is then based on cell distribution and cell size [16]. Recently, Sievert et al. compared healthy tissue and squamous cell carcinoma based on blood vessels and intraepithelial capillary loops after fluorescein injection [25]. Given the risk of allergy that exists with systemic injection, we previously evaluated the topical application of several dyes and selected patent blue V for the clinical study [12]. Patent blue V has regulatory approval in our country for sentinel lymph node detection [26]. Ex vivo imaging gave high-quality images of samples irrespective of the anatomical source. Topical application of patent blue V resulted in ex vivo fluorescence in the cell cytoplasm with mostly unstained nuclei [12]. However, in clinical practice, we noticed heterogeneous dye loading after the first topical application, which could be explained by saliva or mucus limiting penetration of the dye. As a consequence, we repeated topical applications to improve video quality. We also observed random loading of the dye into tissue structures, which may have disrupted pathologists’ interpretations, as fluorescent labeling of tissue varied among patients. Hyperkeratosis could also be one of the major limitations of analyzing head and neck mucosa by pCLE, a drawback also described by Just et al. [27] and Muldoon et al. [28]. This phenomenon is associated with heavy smoking. Hyperkeratosis limits penetration of the fluorescent dye in the case of topical application and reflects excitation light. Hyperkeratosis was found in 28/42 patients in our study. Blinded pCLE image interpretation without access to clinical information was also more difficult than routine practice. However, we wanted to focus on the pCLE images in each zone themselves and to exclude any outside influence on their interpretation.

Herein, with 42 patients imaged, we present the largest series to date of patients imaged by pCLE for detection of head and neck cancer. Other studies described results based on patient numbers ranging from 7 to 23 individuals and following intravenous fluorescein injection. Our prospective clinical trial imaged mainly tumors of the tongue (n = 14) and the vocal folds (n = 14). As in other published studies, we found specific constraints to acquiring pCLE images in the oral cavity and pharynx, such as soft tissues or difficult access. Our results were not as good as expected. The sensitivity was higher than the specificity, as Goncalves et al. [15] and Volgger et al. [14] also found. Furthermore, although both sensitivity and specificity were lower than in other studies, they were based on a larger cohort. This study also focused on pCLE images not only from the tumor area but also from the corresponding tumor margins. The poorer performance was mainly in the interpretation of images of the posterior, medial, and lateral margins; these areas were difficult to image, and this affected the specificity scores.

In this study, pathologists interpreted multiple fields of view (maximum of five images per zone) and a video per zone to reduce the risk of not detecting the invasive areas of HNSCC and more generally to reduce the risk of misdiagnosis, particularly the risk of false negatives. Although the recognition of architectural and cellular abnormalities in pCLE images made it possible to distinguish between cancerous and noncancerous tissue, the degree of invasion could not be specified with certainty, as the imaging is achieved at a depth of 60 µm. Based on the results of this clinical study, pathologists could, in most cases, determine a diagnosis of “precancerous or cancerous tissue” and then refine the histological diagnosis to HNSCC. However, conclusions on whether the HNSCC was invasive or not and on the level of differentiation could not be made with confidence by the pathologists based on pCLE. pCLE may be combined with other optical imaging technologies to support the identification of invasive HNSCC. Multimodal technologies such as optical coherence tomography, narrow band imaging, or Raman spectroscopy combined with pCLE could overcome pCLE limits [14, 29, 30].

One of the main questions in pCLE imaging remains: Who should interpret in vivo microimaging? On the one hand, real-time interpretation by the physician in charge of the examination has been used in other medical applications of pCLE [31]; on the other hand, telepathology could make it possible for the pathologist to make the diagnosis, given that pathologists are trained in and expert at interpreting cellular-level data [32]. In our study, pCLE images were interpreted by three pathologists after, rather than during, the examination. ENT surgeons did not interpret the data, to avoid biasing the results; they had participated in data acquisition and had knowledge of the medical case and macroscopic imaging. Although our results do not favor clinical adoption of pCLE in head and neck cancer imaging, we believe that deep-learning assistance could support pathologist interpretation and improve sensitivity and specificity scores, as described by Dittberner et al. [20].

Conclusion

Practical and technical adjustments were needed to analyze head and neck lesions in various anatomical sites by pCLE. Image interpretation by three pathologists resulted in reasonable sensitivity scores but low specificity scores, mainly due to difficulties in acquiring relevant images in tumor margin zones. Difficulty identifying the invasive areas of HNSCC was also one of the main limitations of the study; it is possible that this could be overcome by multimodal imaging.