
1 Introduction

Automatic 3D organ localization is essential in a wide range of clinical applications. It provides seed points to initialize subsequent segmentation algorithms. It is also useful for visual navigation, automatic windowing, semantic tagging, and organ-based lesion grouping.

Accurate organ localization remains a challenging task. From the local perspective, the size, shape, and appearance of organs vary significantly across patients, even more so in the presence of pathologies or prior surgeries. The global context around each organ also varies considerably, although the context within the entire field of view, such as the spatial arrangement of multiple anatomical structures, provides a cue for localizing an individual organ. For example, in the abdominal region, organs such as the kidney can “float” with large degrees of freedom, leading to varying surrounding appearance. The varying fields of view and body regions encountered in clinical practice further increase the variation of global appearance.

Data-driven, learning-based approaches have shown success and are widely deployed in object localization tasks. A typical search strategy in such methods uses a scanning-window scheme: a model/classifier is trained on annotations to determine the likelihood that a patch (sub-volume) contains the target object. During online testing, the classifier is applied to each sub-volume by scanning through the entire volume, and the target location is obtained by consolidating the evidence collected from all scanned patches. The conventional scanning-window, patch-based approach is well suited to capturing local appearance variations given its limited field of view (voxels within the sub-volume), but not global appearance variations. Many methods have been proposed in this paradigm; some focus on improving the classifiers, while others improve the scanning strategy [11] or integrate additional modeling methods such as conditional random fields [3] and recursive context propagation networks [12].

Another category of methods is based on long-range regression and voting. In [1], a regression forest is trained to learn the non-linear mapping from voxels to the desired anatomy location; it extracts features globally from the volume and has been shown to be effective at resolving local ambiguities. However, it was shown in [8] that the precision of such regression methods is lower than that of patch-based classification methods due to large context variations.

We propose a framework that models both local and global context without a patch-based scanning scheme, in which two emerging learning architectures are exploited to complement each other. We use a convolutional neural network (CNN) [7] to capture global context [13] and a fully convolutional network (FCN) [10] to capture local context. The local context drives localization precision, while the global context improves robustness, e.g., by resolving ambiguities and eliminating false detections. The global context and local appearance information are integrated through a probabilistic graphical model; we call this learning scheme the dual learning architecture. Our experiments show that, by explicitly modeling and fusing both local and global contextual information, our approach is more robust and achieves higher accuracy than state-of-the-art algorithms. In addition to the object location, a significant number of positive seeds (within the target organ) are generated, which are useful for subsequent processes such as graph-cut segmentation. Furthermore, because both the CNN and the FCN support multi-label tasks, our algorithm generalizes to simultaneous multi-organ localization with limited additional run-time cost.

2 Methodology

2.1 Context Modeling with Dual Learning Architectures

The organ localization task is formulated as a probabilistic graphical model [6], as shown in Fig. 1. Random variable I denotes a 2D image, E represents the existence (E = 1) or absence (E = 0) of the organ of interest within image I, and L is the organ location within image I. Both E and L are hidden variables, while I is an observed variable. The joint distribution factors according to the probabilistic graphical model as follows:

$$\begin{aligned} P(I,E,L)=P(L|I,E)P(E|I)P(I). \end{aligned}$$
(1)
Fig. 1. Probabilistic graphical models describing the relationship between the image I, the existence (E) of the organ in the image, and the location (L) of the organ in the image. From left to right: global image classification (slice-level), local (pixel-level) classification, and the proposed global-local classification.

Our goal is to query the organ location given the image, i.e., P(L|I). This can be expressed as

$$\begin{aligned} P(L|I) = P(L,I) / P(I) = \sum _E{P(I,E,L)} / P(I) &= \sum _E{P(L|I,E)P(E|I)} P(I) / P(I) \\ &= \sum _E{P(L|I,E)P(E|I)}. \end{aligned}$$
(2)

By definition, \(P(L|I,E=0) = 0\) for all valid locations and \(P(L=\text{empty}|I,E=0)=1\); conversely, \(P(L=\text{empty}|I,E=1)=0\) since the organ is then present in the image. Therefore

$$\begin{aligned} P(L|I) = P(L|I,E=1) P(E=1|I) \end{aligned}$$
(3)

for all valid pixel locations, and

$$\begin{aligned} P(L=\text{empty}|I) &= P(L=\text{empty}|I, E=1) P(E=1|I) + P(L=\text{empty}|I,E=0) P(E=0|I) \\ &= P(L=\text{empty}|I, E=0) P(E=0|I). \end{aligned}$$
(4)

The probability distribution \(P(E=0 \text{ or } 1 \,|\, I)\) poses an image categorization problem, depicted in Fig. 1(a). It has often been addressed by extracting global image features and training a classifier on those features. In recent years, deep neural networks have shown superior performance on this task; in this paper, we use a convolutional neural network (CNN) [7].

The probability distribution \(P(L|I, E=1)\) poses a pixel classification task. In contrast to P(E|I), which is a global image classification problem, \(P(L|I,E=1)\) is a local pixel or patch classification problem, where the patch is centered at pixel location L. One could again use a CNN to classify each patch, but recent literature has shown that fully convolutional networks (FCNs) have advantages over CNNs for pixel-level classification. We therefore adopt the FCN for this local classification problem. To the best of our knowledge, this is the first time an FCN is used in conjunction with a CNN in a “dual learning” architecture to solve the global-local pixel classification problem.

While the FCN is described above as a local pixel classifier, it has been used in the literature to classify pixels into multi-label masks in which “background” is one of the possible labels. This means we could have used the FCN directly to classify all pixels without the global CNN classifier at all. However, as we show in the experiments, there are significant advantages to combining the FCN with the CNN: the FCN’s limited receptive field [9, 15] is compensated by the CNN’s response. This is also evident from the probabilistic formulation above: an FCN-only pixel classifier would directly model P(L|I), as shown in Fig. 1(b), without considering the hidden variable E. Our global-local model therefore imposes a stronger structural assumption than a typical FCN-only classifier, which has no explicit knowledge of the presence of the organ. For multi-organ localization tasks [4], the proposed method can be extended through multi-label training with the same architectures.
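To make the combination concrete, the following minimal sketch (not the authors' code; cnn_model, fcn_model, and their methods are hypothetical placeholders) applies Eq. (3) to a single 2D slice: the FCN's per-pixel map P(L|I, E=1) is simply scaled by the CNN's slice-level probability P(E=1|I).

def slice_posterior(slice_2d, cnn_model, fcn_model):
    """Fuse global and local classifiers for one slice, following Eq. (3)."""
    p_exist = cnn_model.organ_probability(slice_2d)      # scalar  P(E=1 | I)
    p_local = fcn_model.pixel_probability_map(slice_2d)  # H x W   P(L | I, E=1)
    return p_exist * p_local                             # H x W   P(L | I)

For instance, with P(E=1|I) = 0.9 for a slice and an FCN response of 0.8 at some pixel, the fused posterior at that pixel is 0.72; a slice for which the CNN reports P(E=1|I) close to zero suppresses all FCN responses in that slice, which is how false alarms outside the organ's slice range are eliminated.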

Compared with patch-based sub-window scanning in conventional object localization, our method uses one entire slice (not a sub-patch) as one input sample to either the CNN or the FCN. During online testing on a given volume, for each CNN or FCN model, the total number of image samples passed through the network equals the number of slices along one orientation.
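As an illustration of this slice-wise testing scheme, the sketch below (our hypothetical helper, building on slice_posterior above) scores every slice along one axis of the volume and writes the 2D posteriors back into a volumetric score map; the sagittal and coronal maps are obtained by iterating over the other two axes.

import numpy as np

def score_map_along_axis(volume, cnn_model, fcn_model, axis=0):
    """Score every full 2D slice along `axis` and assemble a 3D probability map."""
    score = np.zeros(volume.shape, dtype=np.float32)
    for i in range(volume.shape[axis]):
        sl = np.take(volume, i, axis=axis)                 # one entire slice
        post = slice_posterior(sl, cnn_model, fcn_model)   # P(L | I) for the slice
        idx = [slice(None)] * volume.ndim
        idx[axis] = i
        score[tuple(idx)] = post                           # write back into the 3D map
    return score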

2.2 Cross-Sectional Fusion and Clustering

The dual learning architectures, with their respective models, operate along each of the three orthogonal orientations, i.e., axial, sagittal, and coronal, resulting in three volumetric probability/score maps. These maps are generated from different orientations with different image context and therefore provide complementary information for the localization decision. Standard ensemble or information-fusion schemes, such as majority voting or the sum rule [5], can be applied to obtain a consolidated score for each voxel. We call this scheme cross-sectional fusion.
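A minimal sketch of the sum-rule variant, assuming a simple voxel-wise average of the three maps (majority voting would instead threshold each map and vote):

def fuse_sum_rule(axial_map, sagittal_map, coronal_map):
    """Voxel-wise sum-rule fusion [5] of the three orientation score maps."""
    return (axial_map + sagittal_map + coronal_map) / 3.0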

After the consolidated probability/score map is computed, three-dimensional connected component analysis is conducted. The centroid of the largest cluster is computed as the estimated object location.
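A sketch of this clustering step using SciPy's connected-component tools (the 0.5 threshold is an illustrative assumption, not a value reported here):

import numpy as np
from scipy import ndimage

def localize(fused_map, threshold=0.5):
    """Return the centroid (z, y, x) of the largest 3D cluster, or None if empty."""
    binary = fused_map > threshold
    labels, n = ndimage.label(binary)                          # 3D connected components
    if n == 0:
        return None                                            # no detection
    sizes = ndimage.sum(binary, labels, index=range(1, n + 1))
    largest = int(np.argmax(sizes)) + 1                        # label id of the largest cluster
    return ndimage.center_of_mass(binary, labels, largest)     # estimated organ location

The voxels inside the retained cluster can also serve as the positive seeds mentioned in the introduction, e.g., to initialize graph-cut segmentation.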

3 Experiments

Among all the organs with available expert annotations, the right kidney is one of the most challenging [2]. We therefore use the right kidney as the exemplar case in our experiments. We collected 450 patient CT body scans, one scan per patient, and manually delineated the right kidney in each scan. At the training stage, 405 scans were selected at random for training and the remaining 45 scans (10%) were used for validation. The training data covers large variations in population, contrast phase, scanning range, and pathology. The axial slice resolution ranges between 0.5 mm and 1.5 mm, and the inter-slice distance varies from 0.5 mm to 7.0 mm. Scan coverage includes the abdominal region but can extend to the head/neck and knees. After all models were trained, we collected another 49 patient CT scans from clinical sites for independent testing. The right kidney was also manually delineated in these 49 test cases to compute quantitative measurements for performance evaluation. Typical test scan samples are shown in Fig. 2.

Fig. 2. Coronal slice samples from the test set. Note the large context variations with respect to the right kidney. The red cross indicates the right kidney location automatically detected by the proposed method. (Color figure online)

Table 1. Number of training images for each model.

Each CT scan contains a stack of axial slices, which were used to reconstruct a 3D volume at an isotropic resolution of \(2\times 2\times 2\,mm^3\). All algorithms/models in the subsequent experiments operate at this resolution. Three orthogonal orientations (axial, sagittal, and coronal) are considered for cross-sectional analysis. Only the right-hand side of the body is considered in the experiments (training and testing), as the right kidney is the target object. The centroid of the delineated right kidney was used as the ground-truth location. A volumetric mask was generated from the annotations, in which right-kidney voxels are labeled one and all background voxels are labeled zero. This mask provides the labels for FCN training. For CNN training, a two-class classification is defined: whether or not an image slice contains the right kidney.
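The following sketch illustrates this preprocessing under the assumption that SciPy is used for resampling; spacing is the original (z, y, x) voxel spacing in mm, and the helper names are ours, not the authors'.

import numpy as np
from scipy import ndimage

def resample_isotropic(volume, spacing, target=2.0, order=1):
    """Resample to `target` mm isotropic voxels (use order=0 for label masks)."""
    zoom = [s / target for s in spacing]
    return ndimage.zoom(volume, zoom, order=order)

def cnn_slice_labels(mask_iso, axis=0):
    """Per-slice CNN label along `axis`: 1 if the slice intersects the kidney mask."""
    other_axes = tuple(a for a in range(mask_iso.ndim) if a != axis)
    return (mask_iso.sum(axis=other_axes) > 0).astype(np.int64)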

Table 2. Statistics of Euclidean distance from the automatic localization result to the ground-truth position at \(2\,mm\) resolution. Sum rule is applied in cross-sectional fusion. CS: cross-sectional fusion.
Fig. 3. Euclidean distance between the computed right kidney location and the ground-truth location for each of the 49 test cases (horizontal axis is case index), in number of voxels at the isotropic \(2\,mm\) resolution. A negative distance (case 6, top) indicates that the corresponding localization algorithm did not generate any detection result; the absolute value plotted in this case is nominal, for visualization purposes only. Top: comparison of the proposed method (blue) and MSL (yellow); a red cross indicates that the localization result falls outside the actual kidney boundary. Bottom: comparison of the proposed method (blue), CNN only (green), and FCN only (yellow). Results of CNN, FCN, and CNN+FCN are all computed with cross-sectional fusion. (Color figure online)

Slice-level modeling (CNN): the AlexNet architecture [7], which contains 5 convolutional layers and 3 fully connected layers, is adopted. One CNN model is trained for each cross-sectional orientation using the same learning architecture. Pixel-level modeling (FCN): the VGG-FCN8s architecture [10] is adopted, an end-to-end network with 7 convolutional levels, 5 pooling layers, and 3 deconvolution layers. One FCN model is learned for each cross-sectional orientation with the same network architecture. Table 1 lists the number of training images/slices used for each model.
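As a concrete, hypothetical illustration of the slice-level model, the sketch below instantiates a two-class AlexNet with torchvision; the original work may well have used a different framework, and the single-channel CT slice is replicated to three channels to fit the ImageNet-style input.

import torch
import torchvision

cnn = torchvision.models.alexnet(num_classes=2)   # 5 conv + 3 fully connected layers
cnn.eval()

def classify_slice(slice_2d):
    """slice_2d: float tensor (H, W), intensity-normalized; returns P(E=1 | slice)."""
    x = slice_2d.unsqueeze(0).repeat(3, 1, 1).unsqueeze(0)   # shape (1, 3, H, W)
    with torch.no_grad():
        logits = cnn(x)
    return torch.softmax(logits, dim=1)[0, 1].item()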

For comparison, we implemented a 3D patch-based scanning window approach based on the method proposed by Zheng et al. [14], and applied it on the same test set. We refer to their approach as marginal space learning (MSL). Quantitative performance evaluation against the ground-truth is provided in Table 2 and Fig. 3. Figure 4 presents an example to demonstrate complementary information extraction from the dual learning architectures.

Although the focus of the proposed method is organ localization, one typical use case of organ localization is organ segmentation, so we evaluate the impact of our kidney localization on the accuracy of kidney segmentation. Since MSL combined with active shape models has been shown to provide good cardiac segmentation results [14], we adopt it for right kidney segmentation. Our automatic localization leads to segmentation errors similar to those obtained with the ground-truth locations. Using our automatic localization results as input for segmentation, the [mean, std., median, 80th percentile] point-to-mesh errors (as used in [14]) in mm are [2.32, 1.23, 1.91, 2.22], while the ground-truth locations yield [2.00, 0.48, 1.85, 2.20].

Fig. 4. Example of model responses (color overlay) from the FCNs (a) and the CNNs (b) after cross-sectional fusion. Responses are shown after fusion across the three orientations; each group contains one sagittal view and one coronal view. Red arrows indicate false alarms detected by the FCNs. The CNN response maps show lower localization precision on the same cluster. Combining both responses through fusion leads to successful right kidney localization. (Color figure online)

4 Conclusions

We have presented a robust 3D organ localization algorithm. We approach the 3D localization task through cross-sectional 2D modeling, exploiting two learning architectures that model different types of context for localizing the target organ. The contextual information extracted by the two learning schemes is complementary and is integrated for improved robustness. Because the FCN and the CNN are capable of learning multiple targets/labels, our method can be extended to simultaneous multi-organ localization. Although CT body scans are used in the experiments, the proposed method is not limited to a specific imaging modality.