Introduction

For over 150 years, light microscopy has been the pathologist’s tool of choice for tissue diagnosis. Developments in imaging technology over recent decades have introduced an alternative, variably referred to as virtual microscopy, whole slide imaging (WSI) or digital pathology [1, 2]. The digital modality offers a variety of advantages over glass slide viewing. Depending on the setting, these include cheaper and more convenient image storage for ready review at clinical meetings or at follow-up biopsy; rapid, remote and multiple second opinions; primary reporting, including frozen section reporting, at sites remote from a pathology service; and use as an educational resource. In referral practice, the receiving laboratory can maintain a permanent archive of cases after returning any glass slides to the host institution.

Despite this plethora of potential advantages, pathologists have been slow to adopt primary digital reporting in routine practice. Some of the reasons are practical, for example the initial investment required and limited bandwidth hindering local implementation in healthcare establishments. In addition, there are many potential legal and regulatory obstacles, such as medical device regulation, licensure, credentialing and privileging, jurisdiction, malpractice insurance, reimbursement, privacy and security [3].

Several WSI systems have been licensed in Canada and Europe. This is not the case in the USA, where the United States Food and Drug Administration (FDA) currently regards WSI as a class III (high risk) medical device [4], requiring the highest level of validation; this has likely hindered implementation in that jurisdiction. Recently, however, the FDA has approved its use in restricted circumstances, such as the scoring of HER2 immunostaining in breast cancer [5]. The authors are aware that industry pre-market approval trials are currently underway on large cohorts to demonstrate the safety of digital pathology; results are awaited.

There is a relative paucity of high quality published studies validating primary digital reporting against conventional glass slide reporting [6–20]. The College of American Pathologists (CAP) Pathology and Laboratory Quality Center has recently produced a set of 12 recommendations for validating WSI systems for use in diagnostic pathology [21], and the Digital Pathology Association has generated detailed guidance on digital pathology validation in the healthcare environment [22]. These may increase the number and quality of future digital pathology validation studies.

In this study, we evaluate primary digital pathology reporting in the setting of routine subspecialist gastrointestinal pathology, which is commonplace in tissue pathology laboratories and one of the highest volume specialties in most of them. The aim was to compare individual digital and glass slide diagnoses by three pathologists reporting in a gastrointestinal subspecialty team, in a prospective series of 100 consecutive diagnostic cases from routine practice in a large teaching hospital laboratory. Discordant diagnoses were classified, and the study design was evaluated against the CAP recommendations.

Materials and methods

The study setting was the Department of Histopathology at the Royal Victoria Hospital, Belfast, the largest teaching hospital in Northern Ireland. Three pathologists (MBL, PJK and OPH) evaluated slides for the study; all participated in a subspecialist adult gastrointestinal reporting service at the time. Fifty consecutive cases were selected prospectively from the routine diagnostic practice of each of two of these pathologists (MBL and PJK), providing 100 study cases. As in most laboratories, large surgical resections and hepatopancreaticobiliary cases are handled separately in routine practice, and none were included among the study cases. Most cases originated from upper or lower gastrointestinal endoscopy or from appendicectomy, representing the gastrointestinal specimens most commonly handled by pathology laboratories.

Each of the three study pathologists independently evaluated all haematoxylin and eosin (H&E)-stained glass slides from each case by routine light microscopy, together with the patient demographic details and clinical information provided on the specimen request forms. No case required additional H&E-stained levels, histochemical or immunohistochemical stains for any of the three pathologists to reach a diagnosis on glass slide evaluation. All glass slides from each case were then scanned using a Hamamatsu Nanozoomer (Hamamatsu, United Kingdom) at ×40 magnification, providing high resolution whole slide images for review. Scanning included a quality assurance step, to ensure optimal image exposure and focus, before final images were approved for review. All digital slides were transferred to the PathXL (PathXL, United Kingdom) cloud server and made available to the study pathologists via the web using the PathXL image viewer.

After a washout period of at least 6 months, each of the three study pathologists independently evaluated the whole H&E-stained digital slide images for each of the 100 cases, with the same clinical information as provided for glass slide evaluation. Digital viewing was performed on a variety of computer monitors that were available to the pathologists in their own offices. No attempt was made to control for monitor size or resolution.

The glass slide and digital diagnoses from each pathologist for all 100 cases were compiled into a single Microsoft Excel™ database for comparison. A multiheader microscope review was then arranged for the three study pathologists, with access to both glass slides and digital images and to all six diagnoses for each case (one glass slide and one digital diagnosis from each pathologist). Each case was discussed and classified as concordant, if all diagnoses were deemed identical or equivalent (differing only in the descriptive terminology applied), or discordant, if any of the diagnoses were deemed to differ significantly. Only discordant cases were reviewed simultaneously by all three study pathologists, on multiheader glass slide microscopy and digitally, to ensure that representative diagnostic areas were present in both modalities, and a consensus diagnosis was reached for each. For every discordant case, the nature of the discordance was determined and classified as ‘viewing modality independent’ if the discordant diagnosis was the same on glass and digital viewing by the same pathologist (i.e. interobserver discordance), or ‘viewing modality dependent’ if the discordant diagnosis differed between glass and digital evaluation by the same pathologist (i.e. intraobserver discordance). The likely clinical significance of each discordance was then evaluated: discordances were classified as minor if considered unlikely to significantly alter patient investigation or treatment, and major if considered likely to do so.
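The classification rules above can be expressed as a small decision procedure. The following Python sketch is illustrative only: it assumes that each pathologist’s glass and digital diagnoses have already been mapped, at review, to a single normalised label (so that ‘equivalent’ wording compares equal) and that a consensus label is available. The names `classify_case`, `Finding` and the pathologist keys are hypothetical, and the judgements of equivalence and of clinical significance (minor or major) remain with the reviewing pathologists, not the code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical input: pathologist -> (glass label, digital label), labels already
# normalised at review so that equivalent wording maps to the same string.
Readings = Dict[str, Tuple[str, str]]


@dataclass
class Finding:
    pathologist: str
    kind: str                # "modality dependent" or "modality independent"
    favoured: Optional[str]  # "glass", "digital" or None


def classify_case(readings: Readings, consensus: str) -> Tuple[str, List[Finding]]:
    """Label a case concordant/discordant and describe each discordance.

    The consensus label is only used when the six diagnoses are not all identical.
    """
    labels = {label for pair in readings.values() for label in pair}
    if len(labels) == 1:
        return "concordant", []
    findings: List[Finding] = []
    for name, (glass, digital) in readings.items():
        if glass == digital:
            if glass != consensus:
                # Same non-consensus call on both modalities: interobserver discordance.
                findings.append(Finding(name, "modality independent", None))
        else:
            # Calls differ between modalities for one pathologist: intraobserver discordance.
            if glass == consensus:
                favoured: Optional[str] = "glass"
            elif digital == consensus:
                favoured = "digital"
            else:
                favoured = None
            findings.append(Finding(name, "modality dependent", favoured))
    return "discordant", findings
```

For example, the focal active colitis case described in the Results, entered with hypothetical keys P1 to P3 as `{"P1": ("FAC", "FAC"), "P2": ("FAC", "normal"), "P3": ("normal", "normal")}` and consensus `"FAC"`, would be classified as discordant with one modality dependent finding (glass favoured) and one modality independent finding.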

For classification purposes, some grouping of diagnoses was required, particularly for lower gastrointestinal tract inflammatory conditions, most commonly chronic idiopathic inflammatory bowel disease (chronic ulcerative colitis and Crohn’s disease) and so-called diverticular disease-associated colitis. These conditions show considerable morphological overlap, and a final diagnosis requires detailed clinical and endoscopic information; pathology reports are therefore typically descriptive rather than diagnostically definitive. For these reasons, on case review and discussion, such cases were grouped together as ‘inflammatory bowel disease’ (Table 1), and diagnoses were deemed concordant unless there was a clear oversight, such as inflamed mucosa reported as normal or missed granulomata. Similarly, because intestinal metaplasia is generally considered important and morphologically distinctive in upper gastrointestinal tract pathology, a missed finding of intestinal metaplasia, in either Barrett’s oesophagus or chronic gastritis, was categorised as discordant. The relationship between the anatomical site of the case and the likelihood of discordance was assessed using the chi-square test.
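A chi-square test of independence between anatomical site and discordance can be sketched in plain Python as follows. This is an illustrative sketch, not the analysis software used in the study (which the text does not name): it assumes Pearson’s statistic without continuity correction on a sites × (discordant, concordant) contingency table, and computes the upper-tail p-value from the closed-form chi-square survival function for integer degrees of freedom. The counts in the example are placeholders, not the study data.

```python
import math
from typing import List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (i + 0.5)
    return total


def chi_square_independence(table: List[List[int]]) -> Tuple[float, int, float]:
    """Pearson chi-square test (no continuity correction); rows/columns must be non-empty."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Placeholder counts only (NOT the study data):
# rows = anatomical sites, columns = [cases with discordance, cases without]
example = [[2, 8], [3, 7], [1, 9]]
stat, df, p = chi_square_independence(example)
```

With five anatomical segments, as in this study, the table would have five rows and therefore 4 degrees of freedom.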

Table 1 Summary of consensus diagnoses and intraobserver and interobserver discordant diagnoses between three pathologists using glass and digital imaging in 100 study cases

Results

Table 1 details the full range of consensus diagnoses for all 100 study cases, and the number of cases with each diagnosis, to illustrate the repertoire of routine gastrointestinal pathology practice. Fifty-two patients were male and 48 female. The age range was 15 to 91 years (mean ± SD, 55.3 ± 19.5 years). As expected, diagnoses covered a wide spectrum of inflammatory and neoplastic upper and lower gastrointestinal tract disease. The 100 study cases generated 300 pairs of diagnoses: 100 glass slide diagnoses and 100 digital diagnoses from each of the three pathologists. Figure 1 shows the breakdown of cases with discordant diagnoses and their clinical significance.

Fig. 1 Breakdown of the 100 study cases, indicating the number and nature of discordant diagnoses and their classification as major or minor, based on likelihood of significantly altering patient investigation or treatment

Intraobserver variation (glass vs. digital viewing)

Table 2 provides the clinical information and the pathologists’ diagnoses for all 19 cases showing any discordance. For intraobserver comparisons, 286 (95.3 %) of 300 pairs of diagnoses were concordant, and intraobserver concordance was similar for each pathologist (all >90 %). In ten of the 14 discordant diagnostic pairs the glass slide diagnosis was favoured, and in four the digital diagnosis was favoured (Table 2). The likelihood of discordance was not related to anatomical segment of the gastrointestinal tract (oesophagus, stomach, duodenum, colorectum or appendix), with discordant diagnoses spread evenly across these sites (p = 0.47). One case, with a consensus diagnosis on review of focal active colitis (FAC), showed both intraobserver and interobserver discordance: one pathologist diagnosed FAC on glass and digital viewing, one diagnosed FAC on glass alone, and one diagnosed the case as normal on both. On review, diagnostic material was evident in both viewing modalities. All intraobserver discordances were considered minor, being unlikely to significantly alter patient investigation or treatment.
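As a worked check of the intraobserver figures, using only the counts reported in this section (one glass and one digital diagnosis per case from each pathologist):

$$
\begin{aligned}
N_{\text{pairs}} &= 3 \times 100 = 300,\\
N_{\text{discordant}} &= 10\ (\text{glass favoured}) + 4\ (\text{digital favoured}) = 14,\\
\text{intraobserver concordance} &= \frac{300 - 14}{300} = \frac{286}{300} \approx 95.3\,\%.
\end{aligned}
$$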

Table 2 Details of all cases with discordant diagnoses (n = 19)

Interobserver variation only

Five cases showed discordance that was purely interobserver in nature, i.e. viewing modality independent. One of these was considered major, because clinical management of the two diagnoses would likely differ: an adenoma of the duodenal ampulla was misinterpreted as an inflammatory polyp on both glass and digital viewing by one study pathologist.

Discussion

Validation aims to demonstrate that a new methodology performs at least as well as the existing gold standard before it is adopted for clinical use. In digital pathology, validation must show that a pathologist can make diagnoses from scanned whole slide images at least as accurately as with light microscopy. This requires an accurate digital reproduction of the original glass slide that can be saved, safely stored and later retrieved for viewing on a suitable monitor without image degradation.

There is a lack of validation studies reflecting ever-improving standards in digital pathology. Furthermore, many earlier validation studies involved a broad range of specialties and specimens and do not reflect current trends towards subspecialty reporting. Validating by specialty is considered more appropriate, as there are clear differences between specialties in specimen types and case complexity, which are likely to influence the applicability of the digital modality. Reported concordance rates between glass slide and WSI diagnoses range from 73 to 98 %, across a wide variety of clinical settings and study designs [6–20].

We have studied the feasibility of digital slide viewing for primary reporting in the common subspecialty setting of non-resection luminal gastrointestinal pathology, evaluating and categorising interobserver and intraobserver discordance between the glass and digital diagnoses of three study pathologists for 100 consecutive, prospectively selected cases. The study design met all 12 of the recent CAP recommendations for validating WSI systems for diagnostic use, including those concerning the size and appropriateness of the selected study cases, emulation of the real-world clinical environment, attention to scanning detail to ensure the quality of the final digital image for reporting, and inclusion of a sufficient washout period between glass and digital viewing [21]. CAP recommends a washout period of at least 2 weeks. In our opinion, this is insufficient to avoid recall bias from recollection of interesting or unusual cases, even with a routine case mix rather than a consultation caseload.

We found intraobserver concordance in 286 (95.3 %) of 300 pairs of diagnoses and interobserver concordance in 94 (94 %) of the 100 study cases. This result supports the use of digital pathology for primary diagnostic reporting in this setting. The concordance rate is broadly comparable to rates previously published in gastrointestinal pathology [8, 16, 19], although comparison between studies is difficult because of differences, sometimes subtle, in setting or study design, and because subclassifying discordance by ‘clinical significance’ is inherently subjective. For this study, we adopted a simple dichotomous classification (concordant or discordant), with details provided for all discordant cases, which were further classified as major or minor according to the likely implications for patient investigation and treatment.

In all cases of intraobserver discordance in this study, the discordance was considered minor, at most leading to possible additional investigations (blood tests and/or repeat endoscopy), as with diagnoses of focal active colitis, possible collagenous colitis or duodenal intraepithelial lymphocytosis. The clinical significance of intestinal metaplasia is contentious in both Barrett’s oesophagus and chronic gastritis, so missing this finding may or may not change the endoscopic follow-up strategy, depending on the gastroenterologist involved. Missed granulomata may represent a missed opportunity to make a definitive diagnosis of Crohn’s disease, but in the relevant study case that diagnosis was already established. An overdiagnosis of acute appendicitis may provide false reassurance and an inaccurate explanation of clinical symptoms, and may prevent or delay further appropriate investigation if symptoms persist.

One interobserver discordance was considered major: an adenoma of the duodenal ampulla misinterpreted as an inflammatory polyp. Follow-up and management of these two diagnoses are likely to differ, as an adenoma typically requires assurance of complete removal. All other interobserver discordances were considered minor.

It is difficult to compare discordance rates between published studies because of variation in the criteria used to classify discordant diagnoses. For example, Molnar et al. [16], comparing digital with glass slide diagnoses in routine gastric and colonic biopsies, reported a discordance rate of 7.8 % (8 of 103 cases) but attributed three of these discordances to ‘insufficient clinical information’. Provision of identical clinical information in both arms should be a basic tenet of any glass versus digital validation study.

The study closest in design to ours is that of Al-Janabi et al., who examined 100 gastrointestinal biopsy and resection specimens with a washout period of 6–12 months and described concordance between glass and digital diagnoses in 95 % of cases [8]. As in our study, the discordant cases (5 %) were mainly inflammatory, relating to differing interpretation of mucosal inflammatory activity. A larger study by van der Post et al. [19], of 295 exclusively colonic biopsies, reported slightly lower concordance, but its different design limits comparison. Other studies have addressed specific fields of gastrointestinal pathology, including the reporting of polyps in bowel cancer screening [18] and of Barrett’s neoplasia in oesophageal biopsy material from a clinical trial [23], the latter a notoriously challenging area of diagnostic practice. Both studies concluded that virtual microscopy compares favourably with conventional microscopy.

Most discordant cases described in all of these studies, including ours, reflect typical borderline calls in gastrointestinal pathology, which are considered similarly likely to occur in digital or glass slide practice. Among our intraobserver discordances, the digital diagnosis was the less favoured one more than twice as often as the glass diagnosis (10 vs. 4), but the overall concordance rate was high (95.3 %), and these discordance rates fall well within the range of ‘non-inferiority’ defined by another large study [9]. Importantly, none of the intraobserver discordances was considered likely to be of more than minor clinical significance. Further, the interobserver arm of our study found viewing modality independent concordance of 94 %. In other words, the overall discordance rate is probably little or no higher than would be expected on second viewing of any such set of cases by the same or a second pathologist [14, 24].

Our study is limited by the absence of cases with an identifiable infectious agent, such as fungal or viral oesophagitis, Helicobacter gastritis or duodenal giardiasis. Such microorganisms typically require high power examination for confident diagnosis, and, because of the unselected nature of the study cases, their detection on digital examination was not evaluated. However, such cases were included in one similar study; although the study pathologists acknowledged that confident detection of microorganisms on digital viewing at the scanned magnification was problematic, these cases generated no discordance [8].

Another potential criticism of our study is that the participating pathologists had no prior formal training in digital slide viewing, although all three were familiar with the PathXL viewing platform and digital slide navigation. Digital viewing may be more likely to miss mild and/or focal inflammation or focal findings such as intestinal metaplasia or granulomatous inflammation, and missing focal findings is likely related to the strategies used to screen the entire image. This could be addressed by training in the use of the digital viewing platform, hardware designed to facilitate screening and/or tracking software to ensure that the whole slide has been examined [25]. Jukic et al. described a study design similar to ours, applied to general pathology reporting, but provided each pathologist with a training set of 500 cases before the study [14]; their reported discordance rate was 4.4 %. As with glass slide microscopy, pathologists’ ability to review digital images may improve over time, resulting in greater confidence, higher reproducibility and fewer discordances with glass. However, the lack of formal training did not appear to result in significant discordance in our study.
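The idea of tracking software can be illustrated with a simple coverage calculation. This Python sketch is hypothetical and is not based on the PathXL platform or on the system described in [25]: it assumes the slide is divided into a grid of square tiles, that the tiles containing tissue are already known, and that the viewer logs each viewport as (x, y, width, height, magnification) in slide pixel coordinates. Tissue tiles never viewed at or above a chosen magnification are reported so that they could be reviewed before sign-out.

```python
from typing import Iterable, List, Set, Tuple

Tile = Tuple[int, int]                       # (column, row) in a hypothetical tile grid
Viewport = Tuple[int, int, int, int, float]  # x, y, width, height (slide pixels), magnification


def tiles_in_viewport(x: int, y: int, width: int, height: int, tile_size: int) -> Set[Tile]:
    """Grid tiles overlapped by a viewport covering pixels [x, x+width) x [y, y+height)."""
    first_col, first_row = x // tile_size, y // tile_size
    last_col = (x + width - 1) // tile_size
    last_row = (y + height - 1) // tile_size
    return {(c, r)
            for c in range(first_col, last_col + 1)
            for r in range(first_row, last_row + 1)}


def coverage_report(tissue_tiles: Set[Tile], viewports: Iterable[Viewport],
                    tile_size: int, min_magnification: float) -> Tuple[float, List[Tile]]:
    """Fraction of tissue tiles viewed at >= min_magnification, and the tiles never viewed."""
    seen: Set[Tile] = set()
    for x, y, width, height, magnification in viewports:
        if magnification >= min_magnification:  # low-power overviews are not counted
            seen |= tiles_in_viewport(x, y, width, height, tile_size)
    missed = sorted(tissue_tiles - seen)
    if not tissue_tiles:
        return 1.0, missed
    return 1 - len(missed) / len(tissue_tiles), missed
```

Excluding low-power viewports means that panning across an overview does not count as examining a region; the magnification threshold and tile size are arbitrary parameters in this sketch.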

All three study pathologists also anecdotally reported image underexposure in some cases, which made it difficult, for example, to distinguish dysplastic from non-dysplastic colonic mucosa and to count duodenal or colonic mucosal intraepithelial lymphocytes. This relates to generation of the digital image at source, specifically the scanner settings, rather than to the viewing software, which has settings to digitally adjust brightness and sharpness. Despite these concerns, such cases did not result in significant discordance. The quality of the scanned image, in terms of focus, resolution and exposure, is of paramount importance, and any digital pathology practice should include an essential quality assurance step before images are released to the pathologist for reporting, with regular pathologist feedback to the quality assurance team crucial to optimising image quality.
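A pre-release quality assurance check of the kind recommended above might combine an exposure measure with a focus measure. The Python sketch below is illustrative only and is not the quality assurance procedure used in the study: it assumes a grayscale tile given as a list of rows of intensities from 0 (black) to 255 (white), uses mean intensity as a crude exposure measure and the variance of a discrete Laplacian (a common focus metric) as a sharpness measure, and its threshold values are hypothetical placeholders that would need calibrating against pathologist feedback.

```python
from statistics import mean, pvariance
from typing import List

Image = List[List[float]]  # grayscale intensities, 0 (black) to 255 (white); at least 3 x 3


def mean_intensity(img: Image) -> float:
    """Average pixel intensity; low values suggest underexposure in brightfield images."""
    return mean(v for row in img for v in row)


def laplacian_variance(img: Image) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels (higher = sharper)."""
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, len(img) - 1)
        for x in range(1, len(img[0]) - 1)
    ]
    return pvariance(responses)


def qa_flags(img: Image, min_mean: float = 150.0, min_sharpness: float = 50.0) -> List[str]:
    """Reasons to hold a tile back for rescanning (empty list = pass); thresholds hypothetical."""
    flags = []
    if mean_intensity(img) < min_mean:
        flags.append("underexposed")
    if laplacian_variance(img) < min_sharpness:
        flags.append("possibly out of focus")
    return flags
```

A uniformly dark, featureless tile would be flagged on both counts, whereas a bright tile with strong local contrast would pass.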

Although this was not formally assessed in the study, all three study pathologists found screening an entire slide more time-consuming digitally than by glass viewing. This may be a greater concern for larger resection specimens and could prove a major hindrance to implementing digital viewing for routine primary reporting once validation is accepted. High resolution screens and navigation tools that more closely mimic light microscopy navigation may overcome this problem. Image storage requirements, cost and integration with laboratory information management systems, not addressed in this study, are among the additional challenges facing a laboratory attempting to become ‘fully digital’ [26].

In conclusion, our study provides further evidence supporting the validation of digital slide viewing as an alternative to light microscopy for primary reporting in gastrointestinal pathology. Its particular strengths are the ‘real world’ setting, adherence to all 12 CAP recommendations, a long washout period and the inclusion of both intraobserver and interobserver discordance data. Developments in hardware and software that bring digital reporting closer to current light microscopy reporting strategies will allow greater focus on the added value of digital pathology and facilitate its implementation in routine practice.