Gastric cancer is an important cause of cancer mortality. Screening for precursor lesions is a cost-effective measure to detect early gastric cancer and precancerous lesions [1] and potentially improves the prognosis of this disease. Esophagogastroduodenoscopy is widely utilised for early detection of gastric lesions. Gastric intestinal metaplasia (GIM) constitutes an important step in gastric carcinogenesis [2]. Unlike Barrett’s esophagus, GIM is not easily diagnosed by white-light endoscopy (WLE) alone [3]. Autofluorescence imaging (AFI) has also been studied for the diagnosis of intestinal metaplasia, but the results were suboptimal [4]. Magnifying narrow-band imaging (mNBI) was reported to be accurate for the diagnosis of gastric lesions [5] but does not allow real-time visualization at the subcellular level, which is needed for definitive diagnosis. Confocal endomicroscopy allows virtual histological diagnosis and targeted biopsy. Our group has reported that experienced confocal endoscopists can competently diagnose GIM and gastric cancer on confocal laser endomicroscopy (CLE) images [6], and Guo et al. [7] have shown that endoscopy-mounted confocal endomicroscopy was more accurate than WLE for the diagnosis of GIM. There are no corresponding data for probe-based confocal laser endomicroscopy (pCLE) in the diagnosis of GIM. pCLE demonstrated higher sensitivity with similar specificity compared with virtual chromoendoscopy in the classification of colorectal polyps [8], but there are no data on the accuracy of pCLE compared with virtual chromoendoscopy for diagnosing GIM.

We aimed to determine the sensitivity, specificity, and accuracy of pCLE for diagnosing GIM and to compare the accuracy for diagnosing GIM between pCLE and WLE, between pCLE and AFI, and between pCLE and mNBI, using histology as the “gold standard.”

Methods

Patients

Patients were included if they were at high risk for gastric cancer, namely Chinese subjects aged 50 years and older with a history of histologically confirmed GIM diagnosed on a previous gastroscopy performed at least 1 year earlier. Patients were excluded if they were unable to give informed consent, had an allergy to fluorescein, had renal impairment with serum creatinine above the upper limit of normal, were pregnant or breastfeeding, or had uncorrected coagulopathy or severe thrombocytopenia precluding biopsy. All subjects gave informed consent before the study. This study was approved by the institutional review board.

Randomization

Patients were randomized into two groups. The randomization sequence was generated by a computer programme in blocks of 10 and was concealed. The randomization assignment was sealed in an opaque envelope, which was opened immediately before each procedure. Subjects in group A had WLE followed by AFI, mNBI, and pCLE in that sequence, whilst group B subjects received WLE followed by mNBI, AFI, and pCLE in that sequence. The only difference between the two groups was the order of two endoscopic imaging modes: AFI and mNBI. Diagnoses made during mNBI and AFI were to be based on predefined criteria and were not to be influenced by the preceding imaging modality. To control for possible bias on the third imaging modality caused by influence from the preceding imaging modality, patients were randomized to receive mNBI before AFI or AFI before mNBI.
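The allocation scheme described above can be sketched in Python. This is a minimal illustration, not the programme used in the study: the function name `block_randomize`, the seed, and the assumption of equal allocation within each block of 10 (five subjects per group) are hypothetical choices made for the example.

```python
import random

# Modality order for each group, as described in the text.
SEQUENCES = {
    "A": ["WLE", "AFI", "mNBI", "pCLE"],
    "B": ["WLE", "mNBI", "AFI", "pCLE"],
}

def block_randomize(n_subjects, block_size=10, arms=("A", "B"), seed=None):
    """Return a list of group labels generated in permuted blocks.

    Assumes equal allocation within each block (hypothetical; the source
    states only that blocks of 10 were used).
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_subjects:
        # Build one balanced block, then shuffle it in place.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_subjects]

# Illustrative seed only; 20 subjects gives two complete blocks of 10.
allocation = block_randomize(20, seed=2011)
for subject_id, group in enumerate(allocation, start=1):
    print(subject_id, group, " -> ".join(SEQUENCES[group]))
```

With 20 subjects and blocks of 10, any sequence generated this way assigns exactly ten subjects to each group, which matches the group sizes reported in the Results.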

Endoscopic procedure

All procedures in the study were performed by a single endoscopist (LGL) who was trained in image-enhanced endoscopy and confocal laser endomicroscopy (both the endoscopy-mounted version and the probe-based version). The endoscopist was blinded to the sites of GIM diagnosed on the previous gastroscopy. Before the procedure, all patients were given oral acetylcysteine 1,200 mg in 100 ml of water as a mucolytic agent. All patients had conscious sedation with intravenous midazolam and intravenous fentanyl for the gastroscopy, which was performed under continuous monitoring of vital signs, with supplemental oxygen. Intravenous hyoscine 10 mg was administered as an antispasmodic agent. WLE was first performed in the usual fashion up to the descending duodenum. During WLE, the endoscopist informed the research nurse of the real-time endoscopic diagnosis for each of the following sites: lesser curvature of the antrum within 2–3 cm of the pylorus, greater curvature of the antrum within 2–3 cm of the pylorus, incisura angularis, lesser curvature of the corpus 4 cm proximal to the angulus, middle portion of the greater curvature of the corpus 8 cm from the cardia, and cardia within 1 cm below the oesophagogastric junction (defined as the point where gastric folds disappear), in addition to any lesion. Suction was applied immediately to the left of the examined site to leave a suction polyp, which was used as a marker to guide subsequent examinations with the other modalities.

Following this, the endoscopist pressed a button on the endoscope to switch to AFI (for group A) or mNBI (for group B), and all of the above-mentioned sites were again examined and the AFI or mNBI diagnosis was recorded in real time by the research nurse. The endoscopist then switched to the third modality (mNBI for group A and AFI for group B) by pressing another button, and the same sites were again examined with the real-time diagnosis recorded. Subsequently, fluorescein sodium 5 ml (10 %; Pharmalab, NSW, Australia) was injected intravenously and the pCLE probe was inserted via the working channel of the endoscope. pCLE examination was performed for all sites examined by the previous modalities (as stated above), confocal videos were viewed in real time, and the pCLE diagnosis was recorded on-site by the research nurse. After examination with all four modalities, forceps biopsies were obtained from all examined sites and sent for histology.

Histopathology

All tissue biopsies were interpreted by a gastrointestinal pathologist (SS) and verified by a second gastrointestinal pathologist (MT). Both of them were blinded to the endoscopist’s diagnosis of each biopsy site. Gastric biopsy specimens were fixed in 10 % neutral buffered formalin, processed, embedded in paraffin, and cut in 4-μm sections. Slides from each specimen were stained with haematoxylin and eosin for routine histopathologic examination. Histology was the “gold standard” for the diagnosis of GIM.

Off-site pCLE video review

After all 20 recruited subjects had undergone endoscopy, the pCLE videos were reviewed off-site in a random order by the endoscopist (LGL) blinded to the endoscopy findings and histology. Sequences were viewed in their original format (.mkt, proprietary format, Mauna Kea Technologies, Paris, France). This removed the bias of WLE, AFI, and mNBI findings on the interpretation of pCLE videos.

Endoscopic equipment

The GIF-FQ260Z Evis Lucera Gastrointestinal Videoscope (Olympus, Tokyo, Japan), which combines AFI, NBI, and optical magnification (up to 85× zoom) modes into an endoscope, was used. pCLE was performed using the Cellvizio-GI system (Mauna Kea Technologies). It consists of a laser scanning unit that combines laser light illumination and rapid laser scanning, confocal miniprobes (GastroFlex; Mauna Kea Technologies) that can be inserted through the working channel of an endoscope, and control and acquisition software for real-time image reconstruction, immediate sequences display, and post-procedure analysis with editing tools.

Diagnostic criteria

WLE appearance of intestinal metaplasia was defined as ash-coloured nodular change, which might be solitary, multiple but localized, or diffuse, as described by Kaminishi et al. [3]. GIM was diagnosed on AFI when there was a homogeneous green appearance. Inoue et al. reported that AFI showed homogeneous green areas in regions where intestinal metaplasia was prevalent and that purple areas had little intestinal metaplasia, yielding a per-biopsy accuracy of 76 % for diagnosing GIM [4]. mNBI diagnosis of GIM was defined as the presence of the light blue crest representing the presence of a histological brush border [5]. Our group had previously described the confocal endomicroscopic appearance of GIM, with goblet cells appearing as dark, cup-shaped shadows interspersed between the epithelial cells. The dark shadows are slightly larger than the neighbouring epithelial cells and have a definite orientation around a glandular lumen. The columnar-lined epithelium exhibits villous architecture [6].

Sample size calculation

The analysis was performed per biopsy site. WLE had been reported as having a sensitivity of 10–40 % for diagnosing intestinal metaplasia and gastric neoplasia [3, 7]. There are no equivalent data for pCLE for the diagnosis of GIM. Endoscope-mounted confocal laser endomicroscopy had a sensitivity of 91 % for the diagnosis of GIM [7]. In per-biopsy analysis, the sensitivity of pCLE for diagnosing neoplasia in Barrett’s oesophagus was 75–88 % and the corresponding sensitivity was 94–98 % for endoscope-mounted confocal laser endomicroscopy [9–12]. Extrapolating this difference in performance between pCLE and endoscope-mounted confocal laser endomicroscopy from dysplastic Barrett’s oesophagus to GIM, we estimated a sensitivity of 80 % for pCLE in the diagnosis of GIM. For the primary outcome measure of comparing the sensitivity of pCLE versus WLE for detecting GIM, using a sensitivity of 80 % for pCLE and 40 % for WLE, a power of 80 %, and a significance level of 0.05, the sample size required was 116 biopsies, which were matched with corresponding pCLE video sequences. Because a minimum of 6 biopsies were obtained per patient, 20 patients were recruited.
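The final step of this calculation, converting the required number of biopsies into a number of patients, is simple arithmetic and can be written out as follows. The sketch takes the figure of 116 biopsies as given; it does not attempt to reproduce the power calculation itself, whose exact method is not specified here. The variable names are illustrative.

```python
import math

REQUIRED_BIOPSIES = 116        # taken from the sample size calculation above
MIN_BIOPSIES_PER_PATIENT = 6   # six protocol sites examined per patient

# Round up so that the minimum yield per patient still reaches the target.
patients_needed = math.ceil(REQUIRED_BIOPSIES / MIN_BIOPSIES_PER_PATIENT)
minimum_biopsies = patients_needed * MIN_BIOPSIES_PER_PATIENT

print(patients_needed, minimum_biopsies)
```

Twenty patients at six sites each give at least 120 biopsies, which covers the 116 required.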

Data analysis

Statistical analysis was performed by the study statistician (YHC) using SPSS 17.0 (SPSS Inc, Chicago, IL). Sensitivity, specificity, and accuracy were computed for each method (pCLE, mNBI, AFI, WLE), along with exact binomial 95 % confidence intervals, with histology diagnosis serving as the “gold standard.” Accuracy was defined as (number of true positives + number of true negatives)/(number of true positives + number of true negatives + number of false positives + number of false negatives). McNemar’s test was used to compare sensitivity, specificity, and accuracy between modalities. Tango’s score interval for the difference of dependent proportions was used to obtain a 95 % confidence interval for the difference in sensitivity, specificity, and accuracy between modalities. A two-tailed p value <0.05 was considered statistically significant.
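The measures defined above can be sketched in a few lines of Python. This is a minimal illustration and not the SPSS procedure used in the study. The function names and the example counts are hypothetical. The exact binomial interval is computed here as a Clopper-Pearson interval by bisection, and McNemar's test is shown in its exact binomial form, which is an assumption because the variant used is not stated. Tango's score interval is not implemented.

```python
from math import comb

def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from a 2x2 table vs histology."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_decreasing(f, target, iterations=60):
    """Bisection on [0, 1] for f(p) = target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson (exact) two-sided interval for k successes in n."""
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)

# Hypothetical counts for illustration only (not study data).
metrics = diagnostic_metrics(tp=45, fp=10, tn=40, fn=5)
sens_ci = exact_binomial_ci(45, 50)
p_value = mcnemar_exact(b=1, c=9)
print(metrics, sens_ci, p_value)
```

McNemar's test uses only the discordant pairs, that is, sites classified correctly by one modality and incorrectly by the other, which is why it suits a design in which every site is examined with every modality.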

Results

Patient characteristics

From September 2011 to July 2012, 25 subjects were approached and 20 patients agreed to participate. The 20 recruited subjects had a mean age of 62.5 ± 6.6 years. There were 15 males and 5 females. One patient had family history of gastric cancer. Ten patients were randomized to each group. None of the subjects had any endoscopic complication or adverse reaction to sodium fluorescein, other than transient yellow discoloration of the skin and urine.

Endoscopic and histologic findings

For the 20 participants, a total of 125 sites were examined, yielding 66 sites (52.8 %) with GIM, which was found predominantly in the antrum greater curve (n = 16; 24.2 %), antrum lesser curve (n = 20; 30.3 %), and incisura (n = 15; 22.7 %). GIM was found less commonly at the body lesser curve (n = 6; 9.1 %), body greater curve (n = 4; 6.1 %), and cardia (n = 5; 7.6 %). No dysplasia or cancer was detected. In addition to the protocol sites, five lesions were identified, consisting of four Paris 0-IIc lesions (3 at incisura, 1 at antral greater curve), and one Paris 0-IIa lesion at antral lesser curve. Of these five lesions, three had GIM and two sites were normal on histology.

Diagnostic characteristics of endoscopic modalities

The endoscopic appearances consistent with a positive diagnosis of GIM on WLE, AFI, and mNBI are shown in Fig. 1. The pCLE and histological features of GIM are shown in Fig. 2. The diagnostic characteristics of WLE, AFI, mNBI, real-time pCLE, and off-site pCLE are shown in Table 1. The comparisons between the performance characteristics of the various modalities are shown in Table 2. Real-time pCLE had significantly better sensitivity (90.9 vs. 37.9 %, p < 0.001) and accuracy (88 vs. 64.8 %, p < 0.001) for the diagnosis of GIM compared with WLE. The sensitivity (90.9 vs. 68.2 %, p = 0.001), specificity (84.7 vs. 69.5 %, p = 0.042), and accuracy (88 vs. 68.8 %, p < 0.001) of real-time pCLE were significantly better than those of AFI. However, the sensitivity, specificity, and accuracy of real-time pCLE for the diagnosis of GIM were numerically but not statistically significantly better than those of mNBI.

Fig. 1

GIM seen on WLE as ash-coloured nodular change (A), on AFI as homogeneous green appearance (B), and on mNBI as the light blue crest representing the presence of a histological brush border (C)

Fig. 2

pCLE (A) and histological (B) appearance of GIM

Table 1 Performance characteristics of WLE, AFI, mNBI, real-time pCLE and off-site pCLE
Table 2 Comparison of the performance characteristics between real-time pCLE versus WLE, AFI, and mNBI, as well as between off-site pCLE versus WLE, AFI, and mNBI

During real-time pCLE interpretation, there was no real-time video-scrolling function to allow convenient and immediate review of the pCLE videos. Therefore, real-time pCLE was interpreted with only first-pass visualization of the pCLE videos. Off-site pCLE reading removed the bias from WLE, AFI, and mNBI and allowed repeated viewing of pCLE videos until satisfactory visualization was achieved for interpretation. Off-site pCLE had significantly better accuracy for the diagnosis of GIM compared with WLE, AFI, and mNBI. Off-site pCLE had superior specificity (94.9 vs. 84.7 %, p = 0.031) and accuracy (95.2 vs. 88 %, p = 0.012) compared with real-time pCLE. Although the sensitivity of off-site pCLE (95.5 %) for the diagnosis of GIM was also numerically better than that of real-time pCLE (90.9 %), the difference did not reach statistical significance.
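As a consistency check, the percentages reported in the text can be converted back to whole-site counts. The Python sketch below assumes 66 GIM-positive sites and 59 GIM-negative sites (125 examined sites minus 66). The counts it produces are back-calculated for illustration and are not taken from Table 1. WLE and mNBI are omitted because not all three measures are quoted in the text for them.

```python
N_POS = 66           # sites with GIM on histology
N_NEG = 125 - N_POS  # sites without GIM on histology (assumed: 59)

# (sensitivity %, specificity %, accuracy %) as quoted in the text.
reported = {
    "real-time pCLE": (90.9, 84.7, 88.0),
    "AFI": (68.2, 69.5, 68.8),
    "off-site pCLE": (95.5, 94.9, 95.2),
}

def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

results = {}
for name, (sens, spec, acc) in reported.items():
    # Nearest whole number of correctly classified sites in each stratum.
    tp = round(sens / 100 * N_POS)
    tn = round(spec / 100 * N_NEG)
    recomputed = (pct(tp, N_POS), pct(tn, N_NEG), pct(tp + tn, N_POS + N_NEG))
    results[name] = (tp, tn, recomputed)
    print(f"{name}: TP={tp}/{N_POS}, TN={tn}/{N_NEG}, "
          f"recomputed={recomputed}, reported={(sens, spec, acc)}")
```

Under these assumptions, real-time pCLE corresponds to 60 of 66 GIM-positive and 50 of 59 GIM-negative sites classified correctly, that is, 110 of 125 sites (88.0 %), and off-site pCLE to 63 of 66 and 56 of 59, that is, 119 of 125 sites (95.2 %). The reported sensitivity, specificity, and accuracy are therefore mutually consistent.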

Discussion

pCLE is a new technology that allows real-time histological diagnosis during endoscopy. The pCLE probe can be passed through the working channel of most endoscopes. This allows pCLE to be used in conjunction with various forms of virtual chromoendoscopy, including AFI and mNBI (Olympus), FICE (Fujinon), and i-scan (Pentax). This study aimed to ascertain the performance characteristics of pCLE in comparison with WLE, AFI, and mNBI. Now that this study has shown good performance characteristics of pCLE in the diagnosis of GIM, subsequent studies can be planned to determine the role of pCLE with virtual chromoendoscopy in various gastrointestinal pathologies. Our study was not designed to determine the population who would benefit from screening or surveillance gastroscopy. However, there is currently an ongoing, prospective, multicentre study in Singapore that aims to determine the risk factors for gastric cancer and the subset of high-risk patients who would benefit from surveillance gastroscopy.

To our knowledge, this was the first study to compare pCLE with virtual chromoendoscopy for the diagnosis of GIM. Although Guo et al. [7] described the role of endoscopy-mounted CLE in the diagnosis of gastric pathologies, and Buchner et al. [8] compared pCLE with virtual chromoendoscopy for the diagnosis of neoplastic versus nonneoplastic colon polyps, no comparable data are available for the role of pCLE in diagnosing GIM compared with virtual chromoendoscopy. We showed that pCLE was superior to WLE and AFI for the diagnosis of GIM. The sensitivity, specificity, and accuracy of real-time pCLE were numerically better than those of mNBI, but the differences were not statistically significant. The lack of significance is likely to be due to the small sample size, because this study was powered to compare WLE and pCLE, and not to compare pCLE and mNBI. An appropriately powered study is recommended to determine whether real-time pCLE is better than mNBI in the diagnosis of GIM.

Off-site review of pCLE videos gave better specificity and accuracy compared with real-time pCLE interpretation. This is likely due to the repeated viewing of off-site pCLE videos, which allowed subtle GIM changes to be detected. The Cellvizio system we used was not equipped with on-site scrolling capability. Therefore, there was no real-time review of the videos in this study. Incorporation of a video scrolling function in the pCLE system could potentially aid the real-time interpretation of pCLE videos, similar to pathologists scrolling through the slides as they view their specimens under the microscope. Off-site pCLE had significantly better accuracy than mNBI, suggesting that pCLE with on-site review capabilities may potentially perform better than mNBI. The new generation of the Cellvizio system, which became available to us after the completion of this study, is equipped with on-site scrolling and review capabilities.

The sensitivity of pCLE for diagnosing GIM in our study appeared lower than that reported by Guo et al., who used the endoscopy-mounted system. The results reported for the diagnosis of Barrett’s esophagus and neoplasia were also worse for pCLE compared with the endoscopy-mounted system [9, 11, 12]. The reason for this could be the faster image appearance frame rate and the lack of on-site review capability in pCLE, in contrast to the endoscopy-mounted system. On the other hand, the off-site pCLE review yielded results comparable to those reported by Guo et al.

Fluorescein-enhanced AFI [13] involves the visualization of mucosa after intravenous injection of fluorescein and is a viable complementary modality that can be used in tandem with pCLE, as both require prior intravenous fluorescein. Fluorescein-enhanced AFI has been shown to differentiate neoplastic from nonneoplastic colon polyps, and it would be interesting to determine its performance characteristics for diagnosing gastric pathology.

This study involved sequential examination using various modalities and therefore presented the possibility of bias of the preceding procedure on the subsequent one. This potential weakness was minimized by the randomization of patients to undergo AFI before mNBI and vice versa. Blinded off-site interpretation of randomly arranged pCLE videos independent of the other modalities removed the bias of these modalities on off-site pCLE reading.

In summary, pCLE performed better than WLE and AFI for the diagnosis of GIM. Off-site review, which is akin to pathologists reviewing their histology slides, improved the diagnostic characteristics of pCLE. Thus, pCLE is a useful tool for screening gastric precancerous lesions in patients at high risk for gastric cancer.