Introduction

Whole slide imaging (WSI) systems consist of devices that ‘convert’ glass slides into multiple high-resolution digital images scanned by a camera. Software assembles all the images and enables them to be visualised as a single large image, similar to a low-power microscope view. It is also possible to magnify the image, analogous to changing objective lenses [1]. The introduction of WSI is bringing about a paradigm shift in the way that we practice pathology. Over the last decade, WSI has been used for research, teaching, telepathology, and remote real-time interpretation of frozen sections and immunohistochemistry [2,3,4].

Major advantages of WSI are the possibility of analysing a slide via remote access, sharing cases with experts, and the inherent portability. In addition, WSI enables visualisation of much more detail than the human eye is able to see by means of a conventional light microscope (CLM) [5]. WSI systems are more ergonomic, provide a larger field of vision and easy navigation, allow a wider range of magnifications and make it possible to easily perform measurements and annotations [2]. These systems also provide high-quality digital images, which enable conservation of cases and may prevent loss of data. Cloud storage eliminates storage problems, allows easy searching for case retrieval and puts an end to the problems of broken glass slides and the inevitable fading of stains [1, 6].

However, there are still barriers to be overcome. Image quality, impediments to workflow, cost, threats to job security and the need for fast, high-capacity servers are some commonly cited disadvantages. Staining and focus may also be sensitive to variations in glass slide preparation. Badly positioned sections, chatter artefact, tissue folds and bubbles formed during coverslipping may result in poor focus and require a re-scan. A lack of familiarity with the technology increases time to diagnosis and hinders the workflow through slow performance [7,8,9]. Most studies have concluded that there is a learning curve, in which pathologists progressively improve their diagnosis time as they become familiarised with the technology [10, 11].

Due to the absence of recommendations to guide validation studies, the College of American Pathologists Pathology and Laboratory Quality Center (CAP-PLQC) established guidelines for the validation of WSI systems [12]. Subsequently, the Canadian Association of Pathologists released guidelines for establishing a telepathology service for anatomic pathology using WSI [13], and the Digital Pathology Association (DPA) has also provided additional criteria in this context [14]. The US Food and Drug Administration (FDA) is responsible for regulating device manufacturers and has approved limited use of WSI for some tissues, stains and reagents used in immunohistochemistry [7]. Recently, the FDA approved a WSI system via de novo classification, which is the only digital pathology system cleared for primary diagnostic use so far [15].

Given this scenario, it is necessary to validate specific WSI systems before clinical use [16] and to re-validate them when any significant change occurs [12]. Some groups are already using WSI in routine diagnostic services [5, 17, 18]. The most common problems in previous validation studies were the lack of research involving a large range of subspecialty specimens, of comparisons of the WSI diagnosis with a ‘gold standard’ [7] and of samples containing known malignant diagnoses or challenging material [19, 20]. Regarding the current status of WSI system validation, there have been studies on cytopathology, dermatopathology, neuropathology and gastrointestinal, breast, genitourinary, gynaecological, paediatric, pulmonary, renal, head and neck [2] and liver pathology areas [21]. However, there are still no published studies on oral pathology, hematopathology or endocrine, bone and soft-tissue pathologies. This lack of validation leads to reluctance to accept the use of WSI [7].

Therefore, this study was designed based on the CAP-PLQC guidelines [12] and DPA suggestions [14] and proposes to evaluate intra-observer variability between CLM and WSI systems, as a measure of the performance of WSI systems, for the diagnosis of oral diseases in clinical practice, routine pathology and primary diagnosis. This study tested the hypothesis that WSI systems are a reliable method for the diagnosis of oral diseases.

Materials and methods

Study design

This cross-sectional, retrospective study was approved by the Piracicaba Dental School/University of Campinas Ethics Committee on 05/06/2017 (registration: CAAE: 66762817.0.0000.5418). The sample consisted of 70 (n = 70) H&E-stained glass slides of oral biopsies, randomly selected between the years 2002 and 2017 from a series of previously stipulated diagnoses, which aimed to cover the most common diseases in a routine oral pathology service, with a broad range of entities, oral sites and tissue sources. This approach aimed to avoid bias related to the intrinsic diversity of cases and to improve variability while maintaining equitability. The glass slides were scanned using the Aperio Digital Pathology System (Aperio Technologies Inc., Vista, CA, USA) with spatial sampling of 0.47 μm per pixel, automated focusing and magnification at × 20. All of the tissue present on the glass slides was scanned and included in the digital images [12]. The monitor (Samsung, Seoul, South Korea) used for slide viewing and interpretation had a screen resolution of 1600 × 900 pixels. Two pathologists, with extensive previous experience in digital microscopy, blindly and independently analysed and provided a diagnosis for all cases with CLM and, after a 3-month washout period, with the WSI system. To meet the recommendation of reproducibility [14], clinical information (age and sex of patients, anatomical site and clinical aspects of the lesions) was provided along with the cases. The diagnoses were compared between the two methods and classified as (1) concordant: diagnoses in both methods are the same; (2) slightly discordant: no clinical or prognostic implications; or (3) discordant: with clinical/prognostic implications for the patient.
Discordant cases were re-assessed to establish a preferred diagnosis between CLM and WSI and to determine the reason for the disagreement, in particular whether discrepancies were due to factors in the method of preparation or to differences in the pathologists’ interpretation of the slides or images [21].

The pathologists involved descriptively pointed out technical problems in glass slides with the potential to affect the quality of the scanned images. The quality of the glass slides and digital slides was rated as (1) poor: region of interest is compromised, making diagnosis difficult or impossible; (2) diagnostic: insufficient tissue quantity, altered stain and/or deficiencies (artefacts or folds); (3) good: minor deficiencies (artefacts or folds); or (4) excellent: sufficient tissue quantity, appropriate stain and no artefacts or folds/whole material in focus, good colour fidelity and no artefacts or folds [22]. Discordant cases were assessed in terms of quality to verify whether this was an interfering factor for diagnostic concordance. The time taken to render a diagnosis was measured for each case as an indicator of workflow, since this factor is often cited as a barrier to the acceptance of digital methods [23].

Statistics

This study focused on intraobserver agreement as the primary form of analysis and preferred measurement [12, 14]. We used Cohen’s κ statistic to establish the agreement between CLM and WSI (κ values < 0.0 were considered to indicate poor agreement, 0.0–0.2 slight agreement, 0.2–0.4 fair agreement, 0.4–0.6 moderate agreement, 0.6–0.8 substantial or good agreement and > 0.8 excellent or almost perfect agreement) [24]. Interobserver variability was not explored. Statistical analyses were performed using the VassarStats Website for Statistical Computation [25].
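As an illustration of the statistic described above, the following is a minimal sketch of how Cohen’s κ corrects observed agreement for agreement expected by chance. The diagnosis labels are purely hypothetical examples (the study itself used the VassarStats website for the actual computation):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two sets of categorical ratings,
    e.g. one pathologist's CLM vs. WSI diagnoses for the same cases."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed proportion of agreement
    po = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal frequencies per category
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    pe = sum(counts_a[c] * counts_b[c] for c in set(labels_a) | set(labels_b)) / n**2
    return (po - pe) / (1 - pe)

# Hypothetical example: 8 of 10 diagnoses concordant between methods
clm = ["fibroma", "fibroma", "scc", "scc", "pa", "acc", "pa", "fibroma", "scc", "pa"]
wsi = ["fibroma", "fibroma", "scc", "scc", "pa", "pa", "acc", "fibroma", "scc", "pa"]
print(round(cohen_kappa(clm, wsi), 3))  # → 0.722
```

Note how κ (0.722) is lower than the raw 80% agreement, because some concordance is expected by chance alone; this is why κ, rather than simple percent agreement, is the preferred validation measure.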

Results

The oral diseases and oral sites are summarised in Table 1. Both pathologists had 68 concordant cases out of the 70 cases included in this validation study. The intraobserver agreement between CLM and WSI diagnoses was considered excellent (κ = 0.967; 95% CI 0.876–1 for pathologist 1 and κ = 0.967; 95% CI 0.877–1 for pathologist 2) with 97% agreement for both pathologists.

Table 1 Included cases according to diagnoses and topography of the oral biopsies

There were two discordant cases (with clinical/prognostic implications) for each observer, which were carefully analysed to elucidate the main reasons for disagreement. For pathologist 1, the WSI diagnosis was considered correct in one case, whereas the CLM diagnosis was judged correct in the other. For pathologist 2, the CLM diagnosis was preferred in both cases (Table 2).

Table 2 Intraobserver discordant cases between methods, technical problems and correspondents’ preferred diagnoses

Technical problems used to measure the quality of the glass slides and digital slides are presented in Table 3. Discordant cases were assessed in terms of quality to determine whether this was an interfering factor for diagnostic concordance. Among the four overall discordances, three presented insufficient quantity of tissue. Moreover, the discordant cases involved the same pairs of diagnoses for both pathologists, albeit in different cases, and the spectrum of the cases allowed individual interpretations, which led to discordances. The discordances were also influenced by the complexity of the cases.

Table 3 Quality of glass slides and WSI

The time to render a proper diagnosis was measured (Fig. 1). Similar median times were seen in both methods for pathologist 1 and in WSI for pathologist 2. Pathologist 2 showed a higher median time for CLM diagnoses and an associated reduction in median time to render diagnoses by means of WSI. Among the six cases with the highest time values for diagnosis, three were discordant cases (out of four overall discordances). The outlier time values occurred more frequently in cases of leukoplakia and adenoid cystic carcinoma (ACC) and were correlated with the inherent diagnostic difficulty of the cases (Table 4).

Fig. 1

Box plot with maximum, minimum, median and interquartile range of the time needed for diagnosis by each pathologist with each method

Table 4 Time to diagnosis outliers
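The box plot summaries and outlier detection reported above can be sketched as follows. This is a hypothetical illustration only (the time values below are invented, not the study’s data); it shows the conventional 1.5 × IQR rule under which a slow, difficult case is flagged as an outlier:

```python
import statistics

def box_plot_stats(times):
    """Summary statistics behind a box plot: median, quartiles,
    IQR, and outliers under the conventional 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(times, n=4)  # exclusive method
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "median": median, "q1": q1, "q3": q3, "iqr": iqr,
        "outliers": [t for t in times if t < lo or t > hi],
    }

# Hypothetical per-case times to diagnosis (seconds) for one pathologist/method;
# the 240 s case stands in for a difficult lesion such as ACC.
times = [35, 40, 42, 45, 48, 50, 52, 55, 60, 240]
stats = box_plot_stats(times)
print(stats["median"], stats["outliers"])
```

Here the median stays representative of typical workflow while the single difficult case is isolated as an outlier, which is why median (rather than mean) times were compared between methods.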

Discussion

This study represents the first validation of a WSI system used for the histopathological diagnosis of oral diseases. The sample size (n = 70) is sufficient to cover the spectrum and complexity of lesions usually observed in routine practice, according to the recommendation of the CAP-PLQC, which suggests that a sample set of at least 60 cases should be included in the validation process [12, 14]. Clinical information was provided along with the cases to reproduce the practice context [12, 14], as in most well-designed published studies [21, 23, 26,27,28,29]. Additional H&E-stained slides and histochemical or immunohistochemical staining were not provided in any studied case to reach a final diagnosis. Although clinical information was provided, both pathologists pointed out that the absence of clinical photographs and clinical diagnostic hypotheses represented limitations in the diagnostic process. The washout period chosen was 3 months, to minimise ‘memorisation bias’. This is a frequent variation in study design, with most previously published studies establishing a washout period of between 2 weeks and 1 year [23, 26, 28, 30,31,32,33,34].

The best parameter for evaluating the performance of a WSI system against CLM is intraobserver agreement, rather than accuracy [12, 14, 34]. Intraobserver agreement refers to the percentage of diagnostic concordance when one observer assesses both methods separated by an interval of time, whereas accuracy indicates the degree of agreement between the WSI diagnosis and the ‘true diagnosis’ (the one that is accepted, since it is established by definition or consensus) [12] as a ‘gold standard’. Some studies only compared WSI with a gold standard [28], which represents a major problem in validation studies. The present study did not compare WSI with a gold standard.

Kappa statistics expressed the level of agreement between the methods and indicated excellent concordance for both pathologists, similar to previously published studies [17, 21, 26, 31, 33, 35]. This may reflect the high quality of the digital slides and the better workflow of WSI [1]. Other studies were designed with different observers assessing each method (in effect measuring interobserver variability), introducing an inevitable bias rather than evaluating only the performance of the method [36, 37]. Interobserver variability was not explored in this study, since it is considered an expected variable due to the distinct interpretations of each pathologist and infers more about the pathologists’ experience than about the method [36].

In discordant cases, the preferred diagnoses were established by review of CLM and WSI to verify which provided the more coherent or ‘correct’ diagnosis. In the present study, most of the preferred diagnoses (3/4) for discordant cases were obtained by CLM. However, we recognise the need to analyse each case to assess whether the discordances are related to the quality of the WSI [6], intrinsic to the technology or due to other factors, since intraobserver variability can increase in difficult cases, even when using the same glass slide over time [34].

Glass slide and corresponding digital slide quality were classified according to the presence of artefacts and folds, quantity of tissue, altered stain, blurred focus and colour fidelity. Most cases were considered ‘excellent’, and those classified as ‘diagnostic’ were determined to provide enough material to render the diagnoses. Some studies did not consider the quality of digital images a prominent cause of discordance [23], while others point to it as the main reason for diagnostic failure [6]. It is almost impossible to achieve optimum focus across an entire digital image, since tissue sections on a glass slide are very rarely planar [38]. In this study, three of the four discordant cases presented insufficient quantity of tissue for analysis.

Two cases (43 and 65) presented discordant diagnoses between actinic cheilitis and squamous cell carcinoma (SCC), presenting areas of hyperkeratosis, acanthosis, atrophic epithelium, epithelial dysplasia, solar elastosis and microinvasion of epithelial cells into the lamina propria. In this context, the discordances may have occurred because the pathologists did not observe the microinvasive areas or because these alterations may be interpreted as reactive epithelial atypia secondary to the lesion’s inflammation rather than genuine dysplasia [39]. In these cases, the preferred diagnosis was judged correct by CLM in one case and by WSI in the other, ruling out the diagnostic method as the reason for the disagreement.

The other two discordant cases (55 and 58) involved discordances between pleomorphic adenoma and ACC, which are biphasic tumours that may present similar or overlapping morphological characteristics [40, 41]. These tumours often result in controversial interpretations and were considered difficult cases in the current study, since both pathologists struggled to determine whether the lesions were benign or malignant. The fact that both pathologists had the same discordances in different cases reinforces the possibility that these divergences are due to the difficulty of the cases and the variations in interpretation of each pathologist, rather than to the diagnostic methods. In this study, the intrinsic difficulty of the cases influenced the occurrence of diagnostic discordances, rather than the method of preparation. In addition, there was a limited amount of tissue in these specimens, which is known to be a potential diagnostic pitfall in the differential diagnosis between these tumour types [42].

WSIs offer a flexible viewing facility, requiring less time to identify histological structures and providing good definition [3], but operation is influenced by the difficulty of, and experience with, handling and navigation, making time an important factor that reflects workflow. In this study, the measurement of time to diagnosis was discrepant between pathologists. To allow a more coherent comparison, we assessed median values and concluded that median time was higher only in CLM for pathologist 2, not necessarily related to any difficulty of the method. This result, when compared with the WSI time for the same pathologist, indicates a reduction in the time needed to render diagnoses using WSI, showing an improvement in workflow. This may be related to better ergonomics, a larger field of vision and full visualisation as soon as the WSI was opened, instead of glass slide handling [43]. This finding disagrees with most published studies, which reported that 1 to 2 extra minutes were required to render a diagnosis using virtual slides [4, 29, 44, 45]. Pathologist 1 presented a similar median time in both methods, also similar to the median WSI time of pathologist 2.

For both pathologists, the time outliers occurred more frequently in cases of leukoplakia and ACC. The discordant ACC cases presented insufficient quantity of tissue, and the other outliers presented minimal technical problems (faded staining and tissue folding), insufficient to justify this excess of time. The outlier time values were higher for pathologist 1, with reduced time in two cases during the WSI evaluation. Pathologist 2 presented a notable reduction in time to render the diagnoses with WSI.

In conclusion, this study provides original evidence of the high performance of a WSI system in the histopathological diagnosis of oral diseases. Most importantly, the combination of a high concordance level between the studied methods and an outstanding workflow suggests that WSI is suitable for the diagnosis of oral diseases in clinical practice, routine pathology and primary diagnosis in the field of oral pathology. Therefore, this study accepted the hypothesis that a WSI system is a reliable method in oral diagnosis.