Fast Regions-of-Interest Detection in Whole Slide Histopathology Images

Li, Ruoyu; Huang, Junzhou

doi:10.1007/978-3-319-28194-0_15

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9467)

Included in the following conference series:

International Workshop on Patch-based Techniques in Medical Imaging

Abstract

In this paper, we present a novel superpixel-based region of interest (ROI) search and segmentation algorithm. The proposed superpixel generation method differs from earlier work in that it combines a boundary-only update with coarse-to-fine refinement for superpixel clustering. The former maintains segmentation accuracy while avoiding most unnecessary revisits to non-boundary pixels. The latter reduces complexity by localizing boundary blocks faster. The paper applies the superpixel algorithm of [10] to the problem of ROI detection and segmentation, together with a coarse-to-fine refinement scheme over a set of images of different magnifications. Extensive experiments indicate that the proposed method gives better accuracy and efficiency than other superpixel-based methods on lung cancer cell images. Moreover, the block-wise coarse-to-fine scheme enables a quick search and segmentation of ROIs in whole slide images, which other methods still cannot do.

Z. Huang—This work was partially supported by U.S. NSF IIS-1423056, CMMI-1434401, CNS-1405985.

1 Introduction

The detection and segmentation of regions of interest (ROIs) is a crucial intermediate step between histopathology image acquisition [4] and computer-aided automated diagnosis [11, 12] for serious diseases, such as infectious diseases and cancers, which remain major threats to both personal and public health.

Clinical application scenarios and pathophysiological requirements impose challenging but natural technical requirements, e.g. a low time and energy cost for the ROI search process, as well as high fidelity and trustworthiness of the segmented ROIs. Whole slide images (WSIs) are here the digitized histopathology images of the highest resolution (e.g. $10^6\times 10^6$). A typical WSI in the original lung cancer slide data is roughly 1.5 GByte in size. We need a novel, efficient solution that handles such a large volume of data without losing too much accuracy. Our main task is to accelerate the search for specific patches or patch clusters, e.g. ROIs, and then to increase the accuracy of classifying ROI and background pixels via a much improved segmentation.

Fortunately, we are not alone in addressing this problem with the latest machine learning and computer vision techniques. In [2], a multi-scale superpixel classification approach was proposed for efficient detection of ROIs in WSIs. However, that method does not account for the fact that wrong labels assigned in an early classification stage may not be compensated by a later, more accurate classification. Its classifier works on different scales of magnification, so it has to be trained multiple times with samples extracted from superpixels at different magnifications. The work in [7] reduced the workload of labeling and grading in two ways: by excluding areas of definitely normal tissue within a single specimen, or by excluding entire specimens that do not contain any tumor cells. In addition, [7] presented a multi-resolution cancer detection algorithm to boost the latter. Another automated superpixel-based segmentation method is [8], which trains a classifier to predict where mitochondrial boundaries occur using diverse cues from a superpixel graph. However, because it relies on the older superpixel algorithm [1], the low speed and low accuracy of the superpixels limit the overall performance. The superpixel generation algorithm used in this paper is entirely different from [1], where the superpixels are clustered pixel-wise. Combining the coarse-to-fine scheme [3] with the boundary-only update policy [9], our method manipulates rectangular blocks of pixels to construct a coarse superpixel segmentation before a more accurate refinement using the boundary-only update (see Fig. 1).

The proposed approach is able to generate better superpixels that snap closely to the actual boundaries between the foreground and the background. The improvement that the algorithm brings to patch classification and image annotation accuracy has been verified in [10]. We, for the first time, apply the method to histopathology images, e.g. lung cancer H&E-stained WSIs, and quantitatively verify the improvement in ROI detection accuracy. The paper is organized as follows: Sect. 2 introduces the new superpixel generation algorithm and the coarse-to-fine strategy for reducing the dimensionality of the optimization, together with the details of the algorithm and its mathematical and optimization background. Sect. 3 presents the experimental results and analysis, and Sect. 4 concludes the paper.

2 Methodology

Our method for detection and segmentation of ROIs has two components: superpixel generation and superpixel classification. We first obtain an initial identification of ROIs by clustering the superpixels at low magnification. The superpixels are then mapped to the image of higher magnification by labeling the corresponding pixels. This process is repeated several times until the segmentation is stable. Finally, the classifier labels the superpixels, each represented by selected features. Different from previous classic superpixel-based segmentation methods [1, 9], the proposed algorithm gives a topology-preserving segmentation of the image. The better the segmentation that the superpixels define, the more accurate the classification of ROIs will be.

2.1 ROIs in Lung Cancer Histopathology Images

The main idea of superpixel-based segmentation methods is to cluster pixels with similar spatial, color and topological properties, so that each superpixel groups similar pixels. Previous methods may not be suitable for building a fast and efficient search technique for regions of interest in lung cancer histopathology WSIs, which usually contain trillions of pixels, because they neglect some important features of cancer cell histopathology images. The tumor cells of lung cancer patients grow as cell masses (this holds not only for lung cancer but generally for other subtypes of cancer as well). If we treat the regions where tumor cell masses appear as ROIs, it is easy to observe directly from H&E-stained histopathology images that those tumor cells are more deeply colored, due to the massive reproduction of genetic material inside tumor nuclei (see Fig. 2).

2.2 Superpixel Clustering

As the criterion for superpixel generation, we minimize the following objective function at each round of updating the labels of pixels (or blocks):

$$\begin{aligned} E(s,\mu ,c)&= \sum _{p} E_{col}(s_p, c_{s_p}) + \lambda _{pos}\sum _{p} E_{pos}(s_p, \mu _{s_p}) \\&\quad + \lambda _b \sum _{p}\sum _{q\in N_8} E_b(s_p, s_q) + E_{topo}(s) + E_{size}(s). \end{aligned} \tag{1}$$

Here $c= (c_1,c_2,\dots ,c_M)$ and $\mu = (\mu _1, \mu _2,\dots , \mu _M)$ are the mean colors and mean positions of the $M$ superpixels, and $s_p$ is the superpixel label of pixel $p$. $N_8$ denotes the 8 neighbors surrounding pixel $p$ in a $3\times 3$ block. $E_{col}(s_p, c_{s_p}) = (I(p) - c_{s_p})^2$ is the squared difference between the color intensity of a pixel, with values in [0, 255], and the average intensity of its superpixel; summed over a superpixel, it is proportional to the variance of the color intensity inside that superpixel, also known as appearance coherence. The shape regularization is the energy term measuring the squared distance between each pixel and the mean position of its superpixel, $E_{pos} (s_p, \mu _{s_p}) = \Vert p - \mu _{s_p} \Vert _2^2$, where $\mu _{s_p}$ is the center of the superpixel. The regularization terms on size and connectivity, $E_{size}$ and $E_{topo}$, penalize superpixels that are too small or disconnected by setting the objective function to positive infinity. Note that we only consider the labels of the 4 neighbors (up, down, left and right) of a pixel (block) when we minimize the energy: $\hat{s}_{b^l_i} = \arg \min _{s^l_i\in N_4} E(s, \mu , c)$.
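As an illustration of Eq. (1), the following minimal Python sketch evaluates the color, position and boundary terms for a small grayscale image and a given label map. It is a sketch under stated assumptions, not the authors' implementation: the function name `superpixel_energy`, the list-of-lists image layout and the default weights are hypothetical, and the topology and size terms are treated as zero, i.e. the labeling is assumed to be valid.

```python
# Offsets of the 8 neighbors (N_8) around a pixel.
N8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def superpixel_energy(image, labels, lam_pos=1.0, lam_b=1.0):
    """Evaluate the color, position and boundary terms of Eq. (1).

    image  : 2-D list of gray intensities I(p)
    labels : 2-D list of superpixel labels s_p, same shape as image
    Assumption: E_topo and E_size are 0 (the labeling is valid).
    """
    h, w = len(image), len(image[0])

    # Accumulate per-superpixel sums to obtain c_k and mu_k.
    sums = {}
    for r in range(h):
        for c in range(w):
            n, si, sr, sc = sums.get(labels[r][c], (0, 0.0, 0.0, 0.0))
            sums[labels[r][c]] = (n + 1, si + image[r][c], sr + r, sc + c)
    mean_col = {k: si / n for k, (n, si, _, _) in sums.items()}
    mean_pos = {k: (sr / n, sc / n) for k, (n, _, sr, sc) in sums.items()}

    e_col = e_pos = e_b = 0.0
    for r in range(h):
        for c in range(w):
            k = labels[r][c]
            e_col += (image[r][c] - mean_col[k]) ** 2
            mr, mc = mean_pos[k]
            e_pos += (r - mr) ** 2 + (c - mc) ** 2
            for dr, dc in N8:
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and labels[rr][cc] != k:
                    e_b += 1  # E_b(s_p, s_q) = 1 when the labels differ
    return e_col + lam_pos * e_pos + lam_b * e_b


image = [[0, 0, 10, 10]]
good = [[0, 0, 1, 1]]  # boundary aligned with the intensity edge
bad = [[0, 0, 0, 1]]   # boundary shifted by one pixel
print(superpixel_energy(image, good), superpixel_energy(image, bad))
```

On this toy input, the labeling whose boundary coincides with the intensity edge has the lower energy, which is the behavior the minimization relies on.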

2.3 Boundary-Only Update

The proposed superpixel generation method is more cost-efficient owing to its strategy of boundary-only update at each round of pixel clustering. The boundary-only update scheme updates only those blocks that lie on the boundary of a superpixel.

$$\begin{aligned} E_b(s_p, s_q)= {\left\{ \begin{array}{ll} 1, &{}s_p \ne s_q, \\ 0, &{}\text{otherwise}. \end{array}\right. } \end{aligned} \tag{2}$$

Define $E(s_p) = \sum _{q\in N_4} E_b(s_p,s_q)$. Block $p$ is not a boundary block if and only if $E(s_p) = 0$. Otherwise, it is a boundary block and has at least one neighbor that belongs to another superpixel. When using the boundary-only update, there are two key points: (1) updating the label of any block may change the list of boundary blocks; (2) we need to append each new boundary block to the end of the list and follow the FIFO principle when deciding the order in which blocks are considered for a label change, in order to avoid the risk of divergence caused by correlated dimensions in coordinate descent optimization.
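The boundary test and the FIFO list can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the names `is_boundary`, `local_energy` and `boundary_only_update` are hypothetical, the superpixel means are assumed to stay fixed during the sweep, the position, topology and size terms are left out for brevity, and `max_steps` is an added safety cap.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def neighbors(labels, r, c, offsets):
    """Yield the in-bounds neighbor coordinates of block (r, c)."""
    h, w = len(labels), len(labels[0])
    for dr, dc in offsets:
        rr, cc = r + dr, c + dc
        if 0 <= rr < h and 0 <= cc < w:
            yield rr, cc


def is_boundary(labels, r, c):
    """E(s_p) > 0: at least one N_4 neighbor carries another label."""
    return any(labels[rr][cc] != labels[r][c]
               for rr, cc in neighbors(labels, r, c, N4))


def local_energy(image, labels, means, r, c, k, lam_b):
    """Part of the energy that depends on giving label k to block (r, c).

    The factor 2 reflects that each neighboring pair is counted twice
    in the double sum over p and q of the boundary term.
    """
    cut = sum(labels[rr][cc] != k for rr, cc in neighbors(labels, r, c, N8))
    return (image[r][c] - means[k]) ** 2 + 2 * lam_b * cut


def boundary_only_update(image, labels, means, lam_b=1.0, max_steps=10_000):
    """Relabel boundary blocks in FIFO order; labels is modified in place."""
    h, w = len(labels), len(labels[0])
    queue = deque((r, c) for r in range(h) for c in range(w)
                  if is_boundary(labels, r, c))
    queued = set(queue)
    steps = 0
    while queue and steps < max_steps:
        steps += 1
        r, c = queue.popleft()
        queued.discard((r, c))
        if not is_boundary(labels, r, c):
            continue  # an earlier update turned it into an interior block
        best = labels[r][c]
        best_e = local_energy(image, labels, means, r, c, best, lam_b)
        # Candidate labels come only from the 4 neighbors.
        for rr, cc in neighbors(labels, r, c, N4):
            k = labels[rr][cc]
            e = local_energy(image, labels, means, r, c, k, lam_b)
            if e < best_e:
                best, best_e = k, e
        if best != labels[r][c]:
            labels[r][c] = best
            # Key point (1): the boundary list may have changed.
            # Key point (2): new boundary blocks go to the end (FIFO).
            for rr, cc in neighbors(labels, r, c, N4):
                if (rr, cc) not in queued and is_boundary(labels, rr, cc):
                    queue.append((rr, cc))
                    queued.add((rr, cc))
    return labels


image = [[10, 10, 200, 200] for _ in range(3)]
labels = [[0, 0, 0, 1] for _ in range(3)]  # boundary one column off
print(boundary_only_update(image, labels, means={0: 10, 1: 200}))
```

Interior blocks are never visited in this sketch; only blocks that are on a boundary, or become part of one after a neighbor changes label, enter the queue.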

2.4 Coarse-to-Fine Refinement

In this paper, we use the coarse-to-fine strategy not only in the generation of superpixels, but also in the mapping to images of higher magnification. When generating superpixels, the fundamental unit of manipulation is not a single pixel but a series of rectangular blocks whose size goes from large to small. We start by clustering coarse superpixels using the biggest blocks. Based on the result of the previous round, we then manipulate smaller blocks to form boundaries with more detail. Combined with the boundary-only update strategy, the coarse-to-fine refinement constructs superpixels with irregular boundaries more efficiently. In addition, we construct superpixels over multiple layers of images of different magnifications. In this way, boundary blocks are much easier to localize, and the boundary update happens only for blocks that fall into the boundary regions of the superpixels mapped to the higher magnification. The acceleration becomes more significant as the size of the whole slide image increases.
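A minimal sketch of the block bookkeeping behind this coarse-to-fine scheme is given below. It rests on assumptions that the text does not state: block sizes are halved each round, image dimensions are divisible by the block size, and the helper names `block_means`, `split_blocks` and `block_schedule` are hypothetical. Under the further assumption of an integer magnification ratio, the same label replication could also carry a superpixel mask to an image of higher magnification.

```python
def block_means(image, size):
    """Mean intensity of every size x size block of a 2-D list image."""
    h, w = len(image), len(image[0])
    return [[sum(image[r + i][c + j]
                 for i in range(size) for j in range(size)) / size ** 2
             for c in range(0, w, size)]
            for r in range(0, h, size)]


def split_blocks(labels, factor=2):
    """Pass each block's label on to its factor x factor child blocks."""
    fine = []
    for row in labels:
        fine_row = [k for k in row for _ in range(factor)]
        fine.extend(list(fine_row) for _ in range(factor))
    return fine


def block_schedule(largest, smallest=1):
    """Block sizes from coarse to fine, halving each round."""
    sizes = []
    size = largest
    while size >= max(smallest, 1):
        sizes.append(size)
        size //= 2
    return sizes


coarse_labels = [[0, 1]]                   # two large blocks
fine_labels = split_blocks(coarse_labels)  # four blocks per parent
print(block_schedule(8), fine_labels)
```

After each split, only the child blocks that sit next to a differently labeled block need to be reconsidered; all other children simply keep the label of their parent.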

2.5 Complexity Analysis

Following a philosophy similar to that of sparse learning [5, 6], we are able to reduce the total computational complexity from $\mathcal {O}(\sum _l N_l\times nMaxIter)$, where $N_l$ is the size of the image at magnification $l$, to $\mathcal {O}(\sum _l \sum _i B_l^i)$. $N_l$ is usually extremely large, since a WSI has trillions of pixels. For pixel-wise methods, nMaxIter must be large enough to guarantee convergence. In this algorithm, however, at each iteration we manipulate blocks instead of pixels in the low-resolution image (whose size also shrinks to the $10^3\times 10^3$ level), and then map the result to the high-resolution image and refine the boundaries. The boundary length $B_l^i$ in the image of magnification $l$ at iteration $i$ is much smaller than the size $N_l$ of the current image. Owing to the reduced dimensionality, convergence is reached faster than with pixel-wise methods.
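To make the gap between $B_l^i$ and $N_l$ concrete, the sketch below counts boundary pixels in a synthetic label map made of square superpixels. The grid layout, the image size and the helper names `grid_labels` and `boundary_fraction` are illustrative assumptions, not data or code from the experiments.

```python
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def grid_labels(n, g):
    """n x n label map tiled with square superpixels of side g."""
    per_row = (n + g - 1) // g
    return [[(r // g) * per_row + c // g for c in range(n)] for r in range(n)]


def boundary_fraction(labels):
    """Return (B, N): number of boundary pixels and total pixel count."""
    h, w = len(labels), len(labels[0])
    b = 0
    for r in range(h):
        for c in range(w):
            for dr, dc in N4:
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and labels[rr][cc] != labels[r][c]:
                    b += 1
                    break  # one differing neighbor is enough
    return b, h * w


for g in (10, 25, 50):
    b, n = boundary_fraction(grid_labels(100, g))
    print(f"superpixel side {g}: B = {b}, N = {n}, B/N = {b / n:.3f}")
```

In this toy setting the fraction of pixels touched by a boundary-only pass drops as the superpixels grow, which is the effect the complexity argument relies on; the ratios on real slides depend on the actual segmentation.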

3 Experiments

3.1 Experimental Setup

In the experimental stage, a random forest and an SVM classifier were built, which operated on the regions defined by the superpixels generated by Algorithm 1. A total of 384 features were extracted from 100 WSIs, including local binary patterns and statistics derived from the histogram of the three-channel HSD color model, as well as texture features, e.g. color SIFT. The proposed method was compared with superpixels generated by SLIC [1] and with tetragonum (non-superpixel). The experiments used the adenocarcinoma and squamous cell carcinoma lung cancer images from the NLST (National Lung Screening Trial) Data Portal (see Note 1). We conducted 10-fold cross-validation before recording the results and performed all experiments on a workstation with an Intel i7-4770 CPU.
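For readers unfamiliar with the protocol, a 10-fold split can be sketched as below. The shuffling, the seed and the function name `k_fold_indices` are assumptions made for illustration; the text does not specify how the folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)        # assumed: random fold assignment
    folds = [idx[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


for fold, (train, test) in enumerate(k_fold_indices(25, k=10)):
    print(fold, len(train), len(test))
```

Each sample appears in exactly one test fold, so every superpixel is scored by a classifier that did not see it during training.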

Table 1. Comparison of the proposed superpixels, SLIC and tetragonum (non-superpixel) in terms of classification statistics: classification error rate, precision and recall. Tetragonum: sliding rectangular windows.
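The three statistics named in the caption can be computed from predicted and reference labels as in the following sketch. The binary ROI/background encoding (1 for ROI) and the function name `classification_stats` are assumptions, and the sample labels are made up for illustration; they are not values from Table 1.

```python
def classification_stats(y_true, y_pred, positive=1):
    """Error rate, precision and recall for binary ROI labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    errors = sum(t != p for t, p in pairs)
    return {
        "error_rate": errors / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels for five superpixels: 1 = ROI, 0 = background.
print(classification_stats([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```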

3.2 Numerical Results

Owing to the high fidelity of our superpixels, the classifier operating on the regions segmented by the proposed superpixel algorithm delivers better classification accuracy (see Table 1). Since the feature descriptors are built on the patches delineated by the superpixel contours, the better the superpixels fit the natural boundaries, the better the extracted features characterize the sample patches. In Fig. 3, we show a typical process of recursive coarse-to-fine refinement over the multi-resolution image set for lung cancer histopathology images. We first perform coarse-to-fine superpixel generation on the low-magnification image (Steps 1 and 2), then map the superpixel mask (Fig. 3) to an image of higher magnification (Step 3) and repeat Steps 1 and 2. The recursive refinement does not stop until it reaches the image of highest resolution (the WSI) with a converged energy function [10]. Owing to the reduced complexity of superpixel construction, the patch-feature extraction finishes in a much shorter time. Our method can shrink the processing time to a fraction of about B/N of the pixel-wise cost, where N $\gg $ B in a WSI. The ROC curves indicate the improvement in ROI detection accuracy brought by the new superpixel algorithm.

4 Conclusion

In this paper, we presented a novel solution for the fast detection of ROIs in whole slide lung cancer histopathology images. We integrated the novel superpixel generation algorithm with a multi-level block-wise optimization scheme. Our algorithm performs a faster and finer ROI detection and segmentation process, which enables a more accurate classification of ROIs. The effectiveness and efficiency of our algorithm have been verified on a large histopathology WSI database, e.g. NLST.

Notes

1. https://biometry.nci.nih.gov/cdas/studies/nlst/.

References

1. Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Susstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2274–2282 (2012)
2. Bejnordi, B.E., Litjens, G., Hermsen, M., Karssemeijer, N., van der Laak, J.A.: A multi-scale superpixel classification approach to the detection of regions of interest in whole slide histopathology images. In: SPIE Medical Imaging, pp. 94200H–94200H. International Society for Optics and Photonics (2015)
3. Van den Bergh, M., Roig, G., Boix, X., Manen, S., Van Gool, L.: Online video seeds for temporal window objectness. In: 2013 IEEE International Conference on Computer Vision (ICCV), pp. 377–384. IEEE (2013)
4. Huang, J., Huang, X., Metaxas, D.: Simultaneous image transformation and sparse representation recovery. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8. IEEE (2008)
5. Huang, J., Huang, X., Metaxas, D.: Learning with dynamic group sparsity. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 64–71. IEEE (2009)
6. Huang, J., Zhang, S., Li, H., Metaxas, D.: Composite splitting algorithms for convex optimization. Comput. Vis. Image Underst. 115(12), 1610–1622 (2011)
7. Litjens, G., Bejnordi, B.E., Timofeeva, N., Swadi, G., Kovacs, I., Hulsbergen-van de Kaa, C., van der Laak, J.: Automated detection of prostate cancer in digitized whole-slide images of H and E-stained biopsy specimens. In: SPIE Medical Imaging. International Society for Optics and Photonics (2015)
8. Lucchi, A., Smith, K., Achanta, R., Lepetit, V., Fua, P.: A fully automated approach to segmentation of irregularly shaped cellular structures in EM images. In: Jiang, T., Navab, N., Pluim, J.P.W., Viergever, M.A. (eds.) MICCAI 2010, Part II. LNCS, vol. 6362, pp. 463–471. Springer, Heidelberg (2010)
9. Yamaguchi, K., McAllester, D., Urtasun, R.: Efficient joint segmentation, occlusion labeling, stereo and flow estimation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part V. LNCS, vol. 8693, pp. 756–771. Springer, Heidelberg (2014)
10. Yao, J., Boben, M., Fidler, S., Urtasun, R.: Real-time coarse-to-fine topologically preserving segmentation. Energy 2, 2–3 (2015)
11. Zhang, X., Liu, W., Dundar, M., Badve, S., Zhang, S.: Towards large-scale histopathological image analysis: hashing-based image retrieval. IEEE Trans. Med. Imaging 34(2), 496–506 (2015)
12. Zhang, X., Su, H., Yang, L., Zhang, S.: Fine-grained histopathological image analysis via robust segmentation and large-scale retrieval. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, 76019, USA
Ruoyu Li & Junzhou Huang

Corresponding author

Correspondence to Junzhou Huang.

Editor information

Editors and Affiliations

University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
Guorong Wu
Bordeaux University, Bordeaux, France
Pierrick Coupé
Siemens Healthcare, Malvern, Pennsylvania, USA
Yiqiang Zhan
College of Charleston, Charleston, South Carolina, USA
Brent Munsell
Imperial College London, London, United Kingdom
Daniel Rueckert

Cite this paper

Li, R., Huang, J. (2015). Fast Regions-of-Interest Detection in Whole Slide Histopathology Images. In: Wu, G., Coupé, P., Zhan, Y., Munsell, B., Rueckert, D. (eds) Patch-Based Techniques in Medical Imaging. Patch-MI 2015. Lecture Notes in Computer Science, vol 9467. Springer, Cham. https://doi.org/10.1007/978-3-319-28194-0_15

DOI: https://doi.org/10.1007/978-3-319-28194-0_15
Published: 08 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28193-3
Online ISBN: 978-3-319-28194-0
eBook Packages: Computer Science, Computer Science (R0)
