1 Introduction

In situ hybridization (ISH) can be used to look for the presence of a genetic abnormality or condition, such as amplification of cancer-causing genes, specifically in cells that appear morphologically malignant when viewed under a microscope. Unique nucleic acid sequences occupy precise positions in chromosomes, cells, and tissues, and ISH allows the presence, absence, and/or amplification/expression status of such sequences to be determined without major disruption of the sequences. ISH employs labeled deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) probe molecules that bind to a target gene sequence or transcript to enable detection or localization of the targeted nucleic acids within a cell or tissue sample [1].

Historically, the clinical evaluation of proteins and nucleic acids in tissue has relied upon in situ immunoenzymatic detection (staining) methods. For example, detection of B cell clonality is useful for assisting in the diagnosis of B cell lymphomas, and such an assessment can be accomplished through the evaluation of KAPPA and LAMBDA light chain expression. As seen in Fig. 1, in stained tonsil tissue KAPPA mRNA may be detected using a black chromogen (silver, Ag) and LAMBDA mRNA using a purple chromogen (tyramide-sulforhodamine). The signal of interest appears as tiny spots (e.g., discrete dots), and these spots may accumulate to form larger regions of aggregate signal (hereinafter “signal aggregate blobs” or “blobs”) depending on the expression level (copy number) of each targeted mRNA in B cells. By way of example, plasma cells have approximately 100,000 mRNA copies per cell, and therefore signal in those cells may appear as blobs.

Fig. 1. Example of tonsil tissue stained using in situ hybridization (ISH), showing KAPPA mRNA detected with silver (Ag) in black and LAMBDA mRNA detected with tyramide-SRB in purple: (a) the whole-slide image with six tonsil regions and (b) a field-of-view image at 40X. (Color figure online)

Quantitative ISH analysis will likely be useful in clinical evaluation of a variety of RNA biomarkers; however, its utility remains uncertain due to limitations of existing technologies. An automated technique for estimating an amount of isolated dot signal and signal aggregate blob may facilitate enhanced clinical interpretation of stained biological samples, enable samples to be interpreted more quickly and accurately, and empower evaluation of RNA biomarker clinical utility. In this study, we have developed an image-analysis system and method that enables the detection and quantification of the number of nucleic acid signals present in stained samples.

2 Methods

The proposed image-analysis framework for detecting and quantifying the expression of the RNA targets (biomarkers) used in our study is shown in Fig. 2.

Fig. 2. Image-analysis flowchart illustrating the steps to detect and quantify the expression of RNA targets (biomarkers) in a whole slide image (WS – wholeslide).

In this study, we propose a method of estimating an amount of signal corresponding to at least one biomarker in an image of a biological sample, comprising: (1) detecting isolated spots in an image (e.g., an unmixed image channel image corresponding to signals from a biomarker); (2) deriving an optical density value of a representative isolated spot (e.g., based on computed signal features or characteristics from the detected isolated spots); and (3) estimating the number of predictive spots in signal aggregates in each of the sub-regions based on the derived optical density value of the representative isolated spot. The method further includes calculating the total number of spots in a sub-region by combining the number of detected isolated spots with the estimated number of predictive spots in signal aggregates in that sub-region. Finally, the total number of detected isolated spots combined (i.e., summed) with the estimated number of predictive spots for each sub-region of signal aggregates can be calculated for the entire tissue slide and stored in a database [1].

2.1 Tissue Staining and Digital Images

Using 2.5-μm formalin-fixed, paraffin-embedded (FFPE) tissue sections, a total of 189 field-of-view (FOV) microscope images and a total of 31 tissue slides of tonsil, lymphoma, and Calu-3 (xenograft) were included in the algorithm development. Tissue slides were stained with simplex (one-color) and duplex (two-color) ISH protocols using probes targeting GAPDH, KAPPA, MALAT1, and KAPPA/LAMBDA RNA transcripts. The staining process was performed using a VENTANA Benchmark Ultra autostainer. All slides were counterstained with hematoxylin (HTX), which appears blue. The 31 slides were scanned using a VENTANA DP 200 scanner. RGB images were obtained with a resolution of 0.25 × 0.25 μm² and a typical size of 3 billion pixels or 20 × 20 mm².

2.2 Pre-processing of Color Unmixing

As a preprocessing step, color unmixing is performed using a conventional color-deconvolution method to separate the different chromogens, e.g., black, purple, and blue. In our study, the approach proposed by Ruifrok et al. [2] was selected. The unmixing method can be applied to singleplex stained images with one chromogen and a counterstain, or to multiplex stained images with more than one chromogen and a counterstain, as shown in the examples in Fig. 3.
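The color-deconvolution idea can be illustrated with a minimal NumPy sketch. It assumes the usual Beer-Lambert formulation (RGB intensities are converted to optical density, which is then expressed as a linear mixture of per-stain optical-density vectors). The function name `unmix` and the stain vectors in `STAIN_OD` are hypothetical placeholders, not values taken from this study; in practice the vectors would be measured from single-stain control slides.

```python
import numpy as np

# Placeholder RGB optical-density vectors, one row per stain
# (black/silver, purple, hematoxylin). These values are assumptions
# for illustration only.
STAIN_OD = np.array([
    [0.58, 0.58, 0.58],   # black chromogen (assumed)
    [0.35, 0.80, 0.30],   # purple chromogen (assumed)
    [0.65, 0.70, 0.29],   # hematoxylin counterstain (assumed)
])

def unmix(rgb, stain_od, background=255.0):
    """Color deconvolution: RGB image -> per-stain amount maps.

    rgb      : (H, W, 3) array of intensities in [0, background].
    stain_od : (n_stains, 3) array of stain optical-density vectors.
    Returns an (H, W, n_stains) array of stain amounts.
    """
    # Beer-Lambert: optical density is linear in the amount of stain.
    clipped = np.clip(rgb.astype(float), 1.0, background)
    od = -np.log10(clipped / background)
    # Normalize each stain vector to unit length.
    m = stain_od / np.linalg.norm(stain_od, axis=1, keepdims=True)
    # Solve od = c @ m for c in the least-squares sense.
    c = od.reshape(-1, 3) @ np.linalg.pinv(m)
    return c.reshape(rgb.shape[:2] + (m.shape[0],))
```

Each output plane then plays the role of one unmixed image channel (for example, the black channel used for spot detection).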

Fig. 3. (a) A portion of a whole slide image stained using an in situ hybridization assay to detect KAPPA mRNA (black color) and LAMBDA mRNA (purple color) with counterstain hematoxylin; (b, c) an example of an image channel image after unmixing, showing only signal corresponding to KAPPA mRNA (black color) and LAMBDA mRNA (purple color), respectively, and (d) an example of an image channel after unmixing, showing the hematoxylin channel (blue). (Color figure online)

2.3 Isolated Spot Detection

Following image acquisition and/or unmixing, an image having a single biomarker channel is provided to the spot detection module such that isolated spots within the image may be detected (as opposed to the “blobs” or aggregate dot signals). An unmixed image channel image is used as input to the spot and blob detection module. A morphological operation is performed to detect isolated spots, i.e., dots, within the image.

As seen in Fig. 4, following the detection of each of the isolated spots in the input image, the detected isolated spots are separated from the blobs in the input image, providing an “isolated spots image channel” and a “blob image channel”. The detected spots are masked out of the blob image channel. In the isolated spots image channel, small objects or blurred point sources can be detected using a multiscale Difference of Gaussians (DoG) approach. Multiple spot sizes are configured in ascending order (small to large), but they are processed from large to small. In each iteration, a DoG filter is created from the given inner and outer filter sizes [3]. The respective detections are collected in a resulting seed/annotation object, which gives the (x, y) location of each detected isolated spot; this location corresponds to the seed center of the spot. A seed center can be calculated by determining the centroid or center of mass of each detected isolated spot.
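The multiscale DoG step can be sketched as follows. This is a simplified illustration rather than the implementation used in the study: the function names, the `threshold` parameter, the use of Gaussian sigmas as the "inner" and "outer" filter sizes, and the choice of the local maximum (instead of a centroid) as the seed are all assumptions made to keep the example short.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur using NumPy only (edge-padded)."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="same")
    both = np.apply_along_axis(np.convolve, 0, rows, k, mode="same")
    return both[radius:-radius, radius:-radius]

def detect_spots(od, scales, threshold):
    """Return (x, y) seeds of isolated spots in an optical-density image.

    scales    : list of (inner_sigma, outer_sigma), ascending in size.
    threshold : minimum DoG response for a detection (assumed parameter).
    """
    od = od.astype(float)
    h, w = od.shape
    taken = np.zeros((h, w), dtype=bool)   # pixels claimed by earlier seeds
    seeds = []
    for inner, outer in reversed(scales):  # process large spots first
        dog = gaussian_blur(od, inner) - gaussian_blur(od, outer)
        # A pixel is a peak if it is the maximum of its 3x3 neighborhood.
        p = np.pad(dog, 1, mode="constant", constant_values=-np.inf)
        neigh = np.stack([p[1 + dy:h + 1 + dy, 1 + dx:w + 1 + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)])
        is_peak = (dog >= neigh.max(axis=0)) & (dog > threshold)
        r = int(np.ceil(2 * outer))        # exclusion radius around a seed
        for y, x in zip(*np.nonzero(is_peak)):
            if taken[y, x]:
                continue                   # already found at a larger scale
            seeds.append((int(x), int(y)))
            taken[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1] = True
    return seeds
```

Processing the largest scale first and masking its detections prevents a large spot from being reported again at a smaller scale.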

Fig. 4. (a) An example of a portion of a whole slide image stained in an ISH assay; (b) the result of unmixing (a) into a single channel (black channel); (c) a blob channel image in which the signals from the detected isolated spots in (d) are masked out; (d) the result of dot detection (a spot channel image) on the unmixed image channel image of (b); (e, f) the derived (x, y) locations of the detected isolated spots in the spot channel image; and (g, h) an overlay of the detected isolated spots superimposed on the portion of the whole slide image.

2.4 Descriptive Signal Features for Each Detected Isolated Spot

With reference to Fig. 5, the optical density derivation module first computes descriptive signal features for each of the detected isolated spots in the image. The signal feature derivation module uses a Gaussian fitting technique to analyze and parameterize certain characteristics of the detected isolated spots. The fit assumes that the optical density of a spot is normally distributed as a function of radius. A 1D Gaussian function is fitted to estimate the spot parameters within a pre-defined patch surrounding each detected isolated spot. The patch size is 7 × 7 pixels, which was determined to give the most suitable feature histograms in this application.

Fig. 5. Characteristics of the isolated spots: (a) the isolated spots shown as red dots and (b) the feature histograms of intensity, blurriness, size, and roundness, respectively.

The characteristics derived from the Gaussian fitting technique include the size, intensity, blurriness, and roundness of the detected isolated dots, and each of these characteristics is computed using parameters of the Gaussian function. The parameters are estimated by solving the linear system Ax = b and consist of the mean, the standard deviation (SD), and the full width at half maximum (FWHM).
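One common way to reduce a 1D Gaussian fit to a linear system Ax = b is to take the logarithm of the profile, which turns the Gaussian into a quadratic in the radial coordinate. The sketch below assumes this log-linearization; the text does not state which formulation was actually used, and the function name `fit_gaussian_1d` is hypothetical.

```python
import numpy as np

def fit_gaussian_1d(r, od):
    """Fit od(r) = amp * exp(-(r - mean)^2 / (2 sigma^2)) by least squares.

    Taking logs gives ln(od) = a2*r^2 + a1*r + a0, which is linear in
    (a2, a1, a0) and can be solved as A x = b.
    Returns (mean, sigma, fwhm, amplitude).
    """
    r = np.asarray(r, dtype=float)
    od = np.asarray(od, dtype=float)
    keep = od > 0                        # log is undefined for od <= 0
    A = np.column_stack([r[keep]**2, r[keep], np.ones(keep.sum())])
    b = np.log(od[keep])
    (a2, a1, a0), *_ = np.linalg.lstsq(A, b, rcond=None)
    if a2 >= 0:
        raise ValueError("profile is not peak-shaped; Gaussian fit failed")
    sigma = np.sqrt(-1.0 / (2.0 * a2))   # a2 = -1 / (2 sigma^2)
    mean = a1 * sigma**2                 # a1 = mean / sigma^2
    amplitude = np.exp(a0 + mean**2 / (2.0 * sigma**2))
    fwhm = 2.0 * np.sqrt(2.0 * np.log(2.0)) * sigma
    return mean, sigma, fwhm, amplitude
```

For a 7 × 7 patch, `r` could be the seven pixel offsets of a row or column through the seed center, and `od` the corresponding optical-density values.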

By fitting the parameters using the Gaussian model, the descriptive signal features of each isolated spot were computed as follows:

  1. Intensity: the 98th percentile of the values within a 5-pixel radius around the center of the detected spot [no unit].

  2. Blurriness: the standard deviation (σ) of the fitted Gaussian function.

  3. Size: the full width at half maximum (FWHM), computed as

    $$ FWHM = 2\sqrt{2\ln 2}\,\sigma \approx 2.355\,\sigma $$
    (1)

  4. Roundness: a measure of how closely the actual optical density distribution within a patch matches the ideal Gaussian model computed from the estimated parameters. The concordance correlation coefficient (CCC), which measures the agreement between two variables (e.g., to evaluate reproducibility or inter-rater reliability), was used for this comparison: CCC = 1 indicates that the measured distribution is in perfect agreement with the ideal Gaussian model, whereas CCC = 0 indicates no agreement between the measured distribution and the ideal Gaussian model [no unit].
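The roundness feature can be illustrated with a short sketch that compares a patch against an ideal Gaussian using Lin's concordance correlation coefficient. The assumption that the ideal model is an isotropic 2D Gaussian centered on the patch, and the function names, are choices made for this example only.

```python
import numpy as np

def concordance_cc(x, y):
    """Lin's concordance correlation coefficient between two arrays."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    return 2.0 * sxy / (x.var() + y.var() + (x.mean() - y.mean())**2)

def roundness(patch, sigma, amplitude):
    """CCC between a patch and an ideal isotropic Gaussian.

    The model is centered on the patch (assumed to be centered on the
    seed) and uses the fitted sigma and amplitude.
    """
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    model = amplitude * np.exp(
        -((xx - cx)**2 + (yy - cy)**2) / (2.0 * sigma**2))
    return concordance_cc(patch, model)
```

A perfectly round Gaussian spot scores 1, and an elongated or irregular spot scores lower.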

Next, histograms can be generated for each computed signal feature characteristic, as shown in Fig. 5.

2.5 Estimation of a Number of Predictive Spots in Signal Aggregates in Each of the Subregions

The generated histograms show how many of the detected isolated spots have particular values of each characteristic. They therefore provide insight into the characteristics of a representative or typical detected isolated spot. For example, from the intensity histogram (e.g., Fig. 5), it is possible to determine the intensity value that occurs most often among the detected isolated spots (i.e., the mode of the intensity values). The representative or typical detected isolated spot is then assigned that intensity value.
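Taking the mode of a feature histogram can be sketched as follows. The number of bins and the optional value range are assumptions for illustration; the text does not specify how the histograms were binned.

```python
import numpy as np

def histogram_mode(values, bins=32, value_range=None):
    """Return the center of the most populated histogram bin."""
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    i = int(np.argmax(counts))           # first bin with the highest count
    return 0.5 * (edges[i] + edges[i + 1])
```

Applied to the per-spot intensity values, this yields the intensity assigned to the representative isolated spot; the same function can be applied to the size or radius values.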

The characteristics of the representative isolated spot are used to estimate the number of spots in the aggregate signals. The estimation assumes a linear relationship between the summed optical density of a single spot and that of the aggregate signal, as follows:

$$ N = \frac{{\sum OD_{A} }}{{\sum OD_{S} }}, $$
(2)

where N is the number of spots within an aggregate signal region, \( OD_{A} \) is the optical density of the aggregate signals, and \( OD_{S} \) is the optical density of the representative isolated spot signals.

Using the feature histograms of the isolated spots from the previous step, the properties of an individual spot can be used to calculate its summed optical density. For example, the mode of the intensity (optical density) and the mode of the radius in the feature histograms can be selected to calculate the summed optical density of a representative individual spot:

$$ \sum OD_{S} = Area \times \overline{{OD_{S} }} , $$
(3)

where Area refers to the area of the shape assumed for a spot, either a circle (πr²) or a rectangle (w × h), and \( \overline{{OD_{\text{S}} }} \) refers to the representative optical density of a single dot. This can be the mode of the intensity histogram, the average intensity of all detected isolated spots, or a weighted intensity, etc.
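Equations (2) and (3) can be combined in a few lines. The sketch below assumes a circular spot and takes the representative optical density and radius as inputs (for example, the histogram modes); the function name and the numbers in the usage note are hypothetical.

```python
import math
import numpy as np

def estimate_spot_count(aggregate_od, spot_od, spot_radius):
    """Estimate N = sum(OD_A) / sum(OD_S) for one aggregate region.

    aggregate_od : array of optical-density values in the aggregate region.
    spot_od      : representative optical density of a single spot.
    spot_radius  : representative spot radius in pixels (circle assumed).
    """
    area = math.pi * spot_radius**2          # Eq. (3): Area
    sum_od_spot = area * spot_od             # Eq. (3): summed OD of one spot
    return float(np.sum(aggregate_od)) / sum_od_spot   # Eq. (2)
```

For instance, a region of 100 pixels with a uniform optical density of 0.5, a representative spot optical density of 0.5, and a radius of 1 pixel gives N = 50 / (π × 0.5) ≈ 31.8 spots.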

2.5.1 Segmentation and Residual Image Generation

Prior to estimating the number of predictive spots in signal aggregates, the input image is segmented into a plurality of sub-regions. Working on sub-regions reduces the computation error, because the computations are based on smaller local regions rather than on the entire image. It also reduces the complexity of estimating the spot count in the aggregate signal blobs, and the sub-regions are useful for quality-control verification by an observer.

As shown in Fig. 6, the residual signal is computed by masking the isolated spot image out of the black-channel image. On the residual image, irregularly sized sub-regions are created by a superpixel segmentation method [4], which groups the clumped signals of the residual channel image into smaller regions whose pixels are substantially uniform and perceptually meaningful. The superpixel sub-regions support efficient estimation of the number of signals. Because some sub-regions contain little aggregate signal, it is easy to verify the estimated spot count within those segments. Other sub-regions consist entirely of aggregated signal, which yields a consistent approximation of the spot count within those segments.

Fig. 6. (a) The unmixed image in a single black channel; (b) the detected isolated dot image; (c) the residual image after masking the detected isolated spots out of the black-channel image; and (d) the result of applying the superpixel segmentation method to the residual image (c).

Finally, the derived intensity parameter is multiplied by the area to give the optical density of a representative isolated spot. The computed optical density of a representative spot is then supplied to the spot estimation module. Once the number of predictive spots in each sub-region is estimated, the data may be stored in a database or other storage module.
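The per-sub-region bookkeeping described in this section can be sketched as follows. The sketch assumes that a label image from the superpixel step is already available (the segmentation itself is not reproduced here) and that a boolean mask of isolated-spot pixels exists; the function and argument names are hypothetical.

```python
import numpy as np

def count_per_subregion(channel_od, spot_mask, labels, seeds, sum_od_spot):
    """Combine isolated and predicted spot counts for each sub-region.

    channel_od  : (H, W) optical-density image of one unmixed channel.
    spot_mask   : (H, W) boolean mask of pixels in detected isolated spots.
    labels      : (H, W) non-negative integer sub-region labels.
    seeds       : list of (x, y) seed centers of isolated spots.
    sum_od_spot : summed optical density of the representative spot.
    Returns (isolated, predicted, total), each indexed by label.
    """
    # Residual image: isolated spots masked out of the channel image.
    residual = np.where(spot_mask, 0.0, channel_od)
    n = int(labels.max()) + 1
    # Summed aggregate optical density within each sub-region.
    aggregate_od = np.bincount(labels.ravel(), weights=residual.ravel(),
                               minlength=n)
    predicted = aggregate_od / sum_od_spot
    # Number of isolated spots whose seed falls in each sub-region.
    pts = np.asarray(seeds, dtype=int).reshape(-1, 2)
    isolated = np.bincount(labels[pts[:, 1], pts[:, 0]], minlength=n)
    return isolated, predicted, isolated + predicted
```

Summing the `total` array over all sub-regions gives the slide-level count that would be stored in the database.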

3 Results

3.1 Verification of Detected Isolated Spot Counts

Quality control was performed using a graphical user interface (GUI) in which the detected isolated spots were overlaid on the original image and the observer could correct them, e.g., add, delete, or move spots. The verification was performed by a trained observer on 31 FOVs of the simplex silver microscope images. The agreement plot is shown below, with R² = 0.99 and CCC = 0.99. An example of the spot counting results before and after the correction is shown in Fig. 7. The correspondence between the total spot count identified by the observer (115,154) and by the algorithm (112,809) is given in Table 1.

Table 1. The total spot counts between the algorithm and the observer
Fig. 7. Overall scatter of the spot count correspondence between the expert observer and the algorithm results (R² = 0.99, CCC = 0.99), verified on 31 FOVs.

3.2 Individual Spot Feature Characteristics and Number of Predictive Spots in Signal Aggregates

We characterized and compared the dots generated by a single probe (i.e., Kappa 01, Kappa 02, or Kappa 03) versus a cocktail of three probes (i.e., Kappa 01, 02, 03) and a no-probe control using tonsil tissue. As seen in Fig. 8, the intensity of the spots generated by three probes shows a wider range than in the one-probe images, whereas the blurriness, size, and roundness of the spots generated by one probe do not differ from those of the spots generated by three probes. Fig. 9 shows the analysis result image overlaid with superpixel outlines (green), red dots indicating the isolated spots detected by the algorithm, and a red number indicating the additional spots estimated for the aggregate signal within each superpixel.

Fig. 8. Histograms of the spot characteristics of (a) intensity, (b) blurriness, (c) size, and (d) roundness generated by a single probe (i.e., Kappa 01, Kappa 02, or Kappa 03) versus a cocktail of three probes (Kappa 01, 02, 03).

Fig. 9. The analysis result image overlaid with superpixel outlines (green), red dots indicating the isolated spots detected by the algorithm, and a red number indicating the additional spots estimated for the aggregate signal within each superpixel. (Color figure online)

4 Conclusions

In this study, we have leveraged the unique detection features of the RNA ISH technology to develop a new method to quantify the RNA signal in FFPE tissue, while maintaining tissue context. It is anticipated that this method will enable analysis of gene expression changes in heterogeneous cancer and normal cells and tissues, with single cell resolution, thereby enabling evaluation of the clinical utility of the plethora of RNA biomarkers encoded in the human genome.