1 Introduction

Automatic localization and labeling of vertebrae in 3D spinal imaging, e.g. computed tomography (CT) or magnetic resonance imaging (MRI), has become an essential tool for clinical tasks, including pathological diagnosis, surgical planning and post-operative assessment. Specific applications such as vertebra segmentation, fracture detection, tumor detection, registration and statistical shape analysis can also benefit from effective vertebra detection and labeling algorithms. However, designing an accurate and automatic algorithm poses many challenges, which arise from pathologies, image artifacts, and the limited field-of-view (FOV). For example, as shown in Fig. 1, abnormal spine curvature and surgical metal implants significantly alter the appearance of vertebrae and reduce image contrast. Spine-focused scans with a small FOV add further difficulty to the identification task due to the lack of global spatial and contextual information.

Fig. 1. Demonstration of pathological cases: (a) surgical metal implants; (b) spine curvature; (c) limited FOV.

To address these challenges, many approaches have been proposed for automatic localization and identification of vertebrae. Glocker et al. [1] presented a method based on regression forests and probabilistic graphical models. However, their method is likely to suffer from a narrow field-of-view because broad contextual information is not always available. To overcome this limitation, Glocker et al. [2] proposed an approach based on randomized classification forests, which achieved reasonable localization and identification performance on pathological cases and those with limited FOV. Recently, deep learning has been employed for spine detection. Chen et al. [3] presented a joint convolutional neural network (J-CNN). This hybrid approach used a random forest classifier to coarsely localize candidates before the J-CNN scanned the input CT volume for the final results. By incorporating pairwise information of neighboring vertebrae in the J-CNN, it outperformed other methods [2]. Suzani et al. [4] proposed a deep feed-forward neural network to detect whether an input image contained a specific vertebra. Although this work achieved high detection rates, it reported a large mean localization error compared with other works. Moreover, instead of taking the 3D volume directly as input, it extracted 1D features based on local voxel intensities to feed the deep feed-forward network, and no convolution or pooling operation was applied in the network. Payer et al. [5] proposed a composite neural network that builds full connections between the response maps of all landmarks with convolutional kernels; the spatial relationship of the landmarks is implicitly embedded in the CNN model.

Fig. 2. Overview of the proposed method, which consists of three major components: the deep image-to-image network (DI2IN), message passing, and sparsity regularization.

In order to overcome these limitations and to take advantage of deep neural networks, we present an approach, shown in Fig. 2, with the following contributions:

  (a)

    Deep Image-to-Image Network (DI2IN) for Voxel-Wise Regression

    Instead of extracting hand-crafted features from input images, the proposed deep image-to-image architecture directly takes a 3D CT volume as input. The training of the proposed network is formulated as multichannel voxel-wise regression (refer to Sect. 2.1). It generates multichannel probability maps associated with different vertebra centers, which intuitively illustrate the locations and labels of the vertebrae. Our neural network requires no coarse classifier to remove outliers in preprocessing; instead, it extracts contextual and spatial information by itself. By taking advantage of a fully convolutional implementation, the proposed network is significantly more time-efficient than sliding-window approaches.

  (b)

    Response Enhancement with Message Passing

    Although the proposed deep image-to-image network generates confident probability maps, it cannot guarantee the absence of false positives (outliers), due to the complexity of vertebra appearance (shown in Fig. 1). To resolve this problem, we adopt a message passing scheme over the probability maps of vertebra centers, which leverages the mutual relation of vertebrae. A chain-structure graphical model is introduced to depict the spatial relationship, where each node represents the probability distribution of one vertebra center. During message passing, the probability map of each vertebra center iteratively receives messages (encoded as convolution operations) from its neighboring vertebrae (nodes) and absorbs them to update itself. The collected messages not only enhance the response at the correct location, but also suppress the responses of false positives.

  (c)

    Refinement Using Sparse Representation

    To further refine the coordinates of vertebrae, we incorporate a dictionary learning and sparse representation approach which utilizes the holistic structure of the spine and identifies the important set of coordinates. Instead of learning a regression model to fit the spinal shape, we simply adopt the coordinates of the spine in the training samples to construct a data dictionary and formulate this problem as an \(\ell _1\) norm optimization to learn the best sparse representation. Based on the regularity of the spine shape, ambiguous coordinates are removed and the sparse representation is optimized in a subspace instead of all coordinates (refer to Sect. 2.2). Finally, the refined coordinates in each axis are reconstructed from the same subspace jointly, which further improves the localization and identification performance.

The rest of the paper is organized as follows: In Sect. 2, we introduce our deep image-to-image network architecture with message passing and refinement approach. In Sect. 3, the proposed framework is compared to previous state-of-the-art methods based on a public spine dataset. In Sect. 4, we present the conclusion and discussion.

Fig. 3. Proposed deep image-to-image network (DI2IN). The front part is a convolutional encoder-decoder network with feature concatenation; the back end is a multi-level deep supervision network. Numbers next to the convolutional layers are the channel numbers.

2 Methodology

2.1 Deep Image-to-Image Network (DI2IN) for Multiple Landmark Localization

In this section, we present the proposed deep image-to-image network, a multi-layer convolutional network that localizes the vertebra centroids. As shown in Fig. 3, the proposed network is deployed in a symmetric manner and can be treated as a convolutional encoder-decoder network. It is implemented in the fashion of voxel-wise end-to-end learning to enable efficient inference. The multichannel ground truth data is specially designed from the coordinates of the vertebra centroids: a Gaussian distribution \(I_{gt} = \frac{1}{\sigma \sqrt{2\pi }}e^{-\left\| x-\mu \right\| ^{2}/{2\sigma ^{2}}}\) is defined in each channel to represent one vertebra location, where the vector \(x\in \mathbb {R}^{3}\) denotes the voxel coordinates in the volume, the vector \(\mu \) is the ground truth location of the vertebra centroid, and the pre-defined variance \(\sigma ^{2}\) controls the scale of the Gaussian distribution. Each channel’s prediction \(I_{prediction}\) corresponds to a unique vertebra centroid and has the same size as the input image. The whole learning problem is therefore formulated as multichannel voxel-wise regression. During training, we apply the squared loss \(\left| I_{prediction} - I_{gt} \right| ^{2}\) to each voxel at the output layer. We define centroid detection as a regression task instead of classification, because a classification formulation would inevitably suffer from highly imbalanced data, leading to misleading classification accuracy.
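
As an illustration, the following minimal NumPy sketch (our own, not code from the paper) generates the ground-truth map of one channel from a centroid \(\mu \) and a pre-defined \(\sigma \), following the formula above:

```python
import numpy as np

def gt_heatmap(shape, mu, sigma):
    """Ground-truth Gaussian map for one vertebra centroid.

    shape: (Z, Y, X) size of the CT volume; mu: centroid in voxel
    coordinates; sigma: pre-defined scale, as in Sect. 2.1.
    """
    zz, yy, xx = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]),
                             np.arange(shape[2]), indexing="ij")
    d2 = (zz - mu[0]) ** 2 + (yy - mu[1]) ** 2 + (xx - mu[2]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))
```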

Convolution, rectified linear unit (ReLU), and max-pooling layers are used in the encoder part of the proposed network. Pooling is critical as it increases the receptive field of the neurons and lowers GPU memory consumption. With a larger receptive field, more contextual information is taken into account by each neuron in the different layers, so the relative spatial positions of the vertebra centroids are better captured in the prediction. The decoder part is composed of convolution, ReLU and upsampling layers. The upsampling layers are implemented with bilinear interpolation to enlarge and densify the activations, which further enables end-to-end voxel-wise training. The convolutional filter size is \(1\times 1\times 1\) in the final output layer and \(3\times 3\times 3\) in the other convolution layers. The max-pooling filter size is \(2\times 2\times 2\). The stride in the convolution layers is set to 1 to maintain the same size in each channel, and the pooling factor is set to 2, downsampling by half in each dimension. The number of channels in each layer is marked next to the layer in Fig. 3. In the upsampling layers, the input features are upsampled by a factor of 2 in each of the x, y and z directions. The network takes a 3D CT image (volume) as input and directly outputs multiple probability maps, each associated with one vertebra landmark (equivalent to a vertebra centroid). The framework computes the probability maps, and hence the centroid locations, more efficiently than the patch-wise classification or regression methods in [3, 4].

Our DI2IN adopts several prevailing techniques [6,7,8, 10, 11] with necessary modifications. We utilize feature layer concatenation in DI2IN, analogous to the one described in [7]: shortcut bridges are built directly from the encoder layers to the decoder layers, passing the encoder feature maps forward to be concatenated with the decoder feature layers, and the concatenated features serve as input to the subsequent convolution layers. Through this concatenation, high- and low-level features are combined explicitly so that the network benefits from both local and global contextual information. Deep supervision during end-to-end training has been shown in [8, 10, 11] to achieve excellent boundary detection and segmentation results. In our network, we introduce a more sophisticated deep supervision method to improve performance. Several branches bifurcate from the intermediate layers of the decoder part of the main network. With proper upsampling factors and convolution operations, the output size of each channel of every branch matches the size of the input image. Supervision is introduced at the end of each branch i by computing a loss term \(l_{i}\) against the same ground truth data. To further leverage the results from the different branches, the final output is obtained by a convolution, with ReLU, over the concatenation of all branch outputs. The total loss \(l_{total}\) combines the loss terms from all output layers, i.e., the output layers of all branches and the final output layer, as shown here:

$$\begin{aligned} l_{total} = \sum _{i}l_{i} + l_{final} \end{aligned}$$
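
For concreteness, the following PyTorch sketch (our illustration, with placeholder depth and channel numbers rather than the exact configuration of Fig. 3) shows a two-level encoder-decoder with skip concatenation, one deeply supervised branch, and the combined loss \(l_{total}\); trilinear interpolation is used as the 3D analogue of the bilinear upsampling described above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out):
    # 3x3x3 convolution with stride 1 and padding 1 keeps the spatial size
    return nn.Sequential(nn.Conv3d(c_in, c_out, 3, padding=1),
                         nn.ReLU(inplace=True))

class DI2INSketch(nn.Module):
    """Simplified DI2IN: encoder-decoder with feature concatenation and
    one deeply supervised branch (placeholder channels, 26 vertebrae)."""

    def __init__(self, n_vertebrae=26):
        super().__init__()
        self.enc1, self.enc2 = conv_block(1, 16), conv_block(16, 32)
        self.pool = nn.MaxPool3d(2)                  # 2x2x2 max-pooling
        self.bottom = conv_block(32, 64)
        self.dec2 = conv_block(64 + 32, 32)          # skip concat from enc2
        self.dec1 = conv_block(32 + 16, 16)          # skip concat from enc1
        self.branch = nn.Conv3d(32, n_vertebrae, 1)  # supervised branch
        self.main = nn.Conv3d(16, n_vertebrae, 1)    # main output head
        self.fuse = nn.Conv3d(2 * n_vertebrae, n_vertebrae, 1)

    def forward(self, x):                            # x: (N, 1, Z, Y, X)
        up = lambda t: F.interpolate(t, scale_factor=2, mode="trilinear",
                                     align_corners=False)
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottom(self.pool(e2))
        d2 = self.dec2(torch.cat([up(b), e2], dim=1))
        d1 = self.dec1(torch.cat([up(d2), e1], dim=1))
        side = up(self.branch(d2))    # branch output at input resolution
        main = self.main(d1)
        # final output: convolution over the ReLU'd branch concatenation
        final = self.fuse(F.relu(torch.cat([main, side], dim=1)))
        return final, [main, side]

def total_loss(final, branches, gt):
    # l_total = sum_i l_i + l_final, voxel-wise squared loss throughout
    return F.mse_loss(final, gt) + sum(F.mse_loss(b, gt) for b in branches)
```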

2.2 Probability Map Enhancement with Message Passing Scheme

Given the image I, the DI2IN generates a probability map \(P\left( v_i | I \right) \) for the center of each individual vertebra i with high confidence, and each vertebra is located at the peak position \(v_i\) of its probability map. However, these probability maps are not yet perfect: some have little or no response at the ground truth locations because several vertebrae (e.g. T1–T12) have similar image appearance. In order to handle this missing-response problem, we propose a message passing scheme that effectively enhances the probability maps by utilizing prior knowledge of the spine structure.

The concept of message passing was first introduced in the context of probabilistic graphical models. It is used in the sum-product and max-product algorithms for exact inference of the marginal probabilities of nodes, or of the distribution mode, in a tree-structured graph. Messages are passed iteratively between neighboring nodes to exchange information and optimize the overall probability distribution. Similarly, we introduce an MRF-like model, the chain-structure graph shown in Fig. 4, to express the spatial relationship among vertebrae, where each node represents one vertebra center \(v_i\). We then propose the following formulation to update \(P\left( v_i | I \right) \) at iteration t of message passing:

$$\begin{aligned} P_{t+1}\left( v_i | I \right)&=\frac{1}{Z}\left[ \alpha \cdot \frac{\sum _{j\in \partial i }m_{j\rightarrow i}}{\left| \partial i \right| } + P_{t}\left( v_i | I \right) \right] \qquad (1)\\&=\frac{1}{Z}\left[ \alpha \cdot \frac{\sum _{j\in \partial i }P_{t}\left( v_j | I \right) *k\left( v_i | v_j \right) }{\left| \partial i \right| } + P_{t}\left( v_i | I \right) \right] \qquad (2) \end{aligned}$$

where \(\partial i\) denotes the neighbors of vertebra i in the graph, Z is a normalization constant, and \(\alpha \in \left( 0,1 \right) \) is a discount factor. The messages \(m_{j\rightarrow i}\), defined as \(P_{t}\left( v_j | I \right) *k\left( v_i | v_j \right) \), are passed along the chain shown in Fig. 4, where \(*\) is the convolution operation and \(k\left( v_i | v_j \right) \) is a single convolution kernel learned from the ground truth distributions of vertebrae i and j. A multi-dimensional convolution can shift the mass of the probability map \(P_{t}\left( v_j | I \right) \) towards its neighborhood with a fixed orientation encoded in the kernel. If \(P_{t}\left( v_j | I \right) \) is confident at its correct location, the message \(m_{j\rightarrow i}\) provides a strong prior for \(P_{t+1}\left( v_i | I \right) \) at the correct location of vertebra i. After several iterations of message passing, a vertebra with missing response is compensated by the aggregated messages from its neighboring vertebrae. The underlying assumption is that the majority of the vertebra probability maps are confident and well distributed around the true locations, which is guaranteed by the powerful DI2IN in our method. An advantage of the proposed scheme is that, once the iteration number is fixed, it can be appended to the DI2IN for further end-to-end training (fine-tuning). The location of each vertebra centroid is then simply determined by the location of the maximum value in the corresponding probability map.
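
A minimal sketch of one passing iteration, written with NumPy/SciPy under the assumption that the learned kernels are stored in a dictionary keyed by the edge (j, i) (a naming convention of our own, not from the paper):

```python
import numpy as np
from scipy.ndimage import convolve

def message_passing_step(P, kernels, alpha=0.5):
    """One iteration of Eq. (2) along the chain-structure graph.

    P: list of 3D probability maps, one per vertebra center.
    kernels: dict mapping an edge (j, i) to the learned kernel k(v_i | v_j).
    """
    P_next = []
    for i in range(len(P)):
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(P)]
        # each message m_{j->i} is a convolution of the neighbor's map
        msg = sum(convolve(P[j], kernels[(j, i)]) for j in neighbors)
        new = alpha * msg / len(neighbors) + P[i]
        P_next.append(new / new.sum())  # 1/Z normalization
    return P_next
```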

Fig. 4. (a) Chain-structure model for vertebra centers; (b) one iteration of message passing (landmarks represent vertebra centers): the neighbors’ landmark probability maps help compensate the missing response of landmark i.

Several recent works have deployed the message passing concept for different landmark detection tasks. Chu et al. [12] proposed a passing scheme between feature maps instead of landmark probability maps. Yang et al. [9] introduced a fully connected graphical model for message passing between probability maps, with hand-crafted features adopted in the pairwise terms of the messages. Payer et al. [5] also used a fully connected graphical model, applying a single passing step with a pixel-wise dot-product for noise cancelling. In our proposed method, the passing takes place directly among the response maps along the chain-structure model, and the response maps are gradually enhanced over several passing iterations, since a single pass is not enough to make the necessary adjustments to the probability maps. Compared to hand-crafted features, a single convolution kernel is sufficient to generate the messages between neighbors because the designed neighborhood is compact. In our framework, the major issue is missing response rather than noisy output, so the dot-product operation is not applicable and may even hurt the output probabilities.

2.3 Sparse Representation for Landmark Refinement

As shown in Fig. 5, the DI2IN with message passing generates a clear probability map, where high probability indicates a potential location of the landmark (the centroid of a vertebra). However, due to image artifacts and low image resolution, it is still difficult to guarantee the absence of false positives. In [3], a shape regression optimization model was used to refine the predicted vertebral centroids along the vertical axis: by minimizing an energy function, optimized parameters are learned for each test sample to determine the final coordinates of the vertebrae. However, their model assumes that the coordinate distribution can be described in a quadratic form, and it was only applied to coordinates along the vertical axis.

Fig. 5. Left: CT image. Middle: output of one channel from the network. Right: overlaid display. The prediction in (a) is close to the ground truth location. In (b), a false positive response appears far from the response at the correct location.

Inspired by previous works on sparse representation, we propose an \(\ell _1\)-norm approach to refine the coordinates along all three axes (x, y and z). Given a pre-generated shape-based dictionary \(\mathbf {D}\) and the predicted coordinate vector \(\mathbf {v}\) of all centroids in a testing sample, we adopt \(\ell _1\)-norm optimization to solve for the sparse coefficient vector \(\mathbf {a}\); the refined coordinates \(\hat{\mathbf {v}}\) are defined as \(\hat{\mathbf {v}}=\mathbf {D}\mathbf {a}\). In particular, the shape-based dictionary is learned from the training samples. For example, the dictionary \(\mathbf {D}_z\) associated with the vertical axis is constructed from the z coordinates of all centroids of each sample in the training database, and \(\mathbf {v}_z\) denotes the predicted z coordinates of one sample in the testing database. The dictionaries \(\mathbf {D}_x\) and \(\mathbf {D}_y\) associated with the other axes are learned in the same way.

Algorithm 1.

The details are shown in Algorithm 1. First, we use dynamic programming to find the longest descending subsequence in the predicted coordinates \(\mathbf {v}_z\), since the vertical axis of the spine produces the most stable results. We define the subspace \(\mathcal {S}\) of the dictionary and of the predicted coordinate vector based on the indices in this subsequence: for example, we choose only the atoms of \(\mathbf {D}_z\) and the entries of \(\mathbf {v}_z\) associated with these indices to generate the sub-dictionary \(\mathbf {D}_{z,\mathcal {S}}\) and sub-vector \(\mathbf {v}_{z,\mathcal {S}}\). Then we solve the optimization problem in Step 3 for the x, y and z axes individually in the subspace \(\mathcal {S}\) instead of the original space \(\mathcal {S}_0\). Finally, all coordinates are reconstructed from the original dictionary (e.g., \(\mathbf {D}_z\)) and the sparse vector (e.g., \(\mathbf {a}_z\)). Intuitively, we remove the ambiguous outliers from the preliminary predicted coordinates and define a subspace without them; based on this subspace, we find the best sparse combination in the corresponding sub-dictionary. By taking advantage of the original dictionary, all coordinates are then reconstructed and refined simultaneously, as shown in Fig. 6.
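
The following Python sketch illustrates the two main steps under our own simplifying assumptions: the longest-descending-subsequence search via an O(n²) dynamic program, and the per-axis \(\ell _1\) refinement, here solved with the LASSO formulation from scikit-learn as a stand-in (the paper does not specify its solver):

```python
import numpy as np
from sklearn.linear_model import Lasso

def longest_descending_subsequence(v):
    """Indices of the longest descending subsequence of v (O(n^2) DP)."""
    n = len(v)
    length, prev = [1] * n, [-1] * n
    for i in range(n):
        for j in range(i):
            if v[j] > v[i] and length[j] + 1 > length[i]:
                length[i], prev[i] = length[j] + 1, j
    idx, i = [], int(np.argmax(length))
    while i != -1:
        idx.append(i)
        i = prev[i]
    return idx[::-1]           # subspace S of "inlier" vertebra indices

def refine_axis(D, v, idx, lam=0.1):
    """Fit the sparse code a on the sub-dictionary D_S / sub-vector v_S,
    then reconstruct all coordinates with the full dictionary D."""
    lasso = Lasso(alpha=lam, fit_intercept=False)
    lasso.fit(D[idx, :], v[idx])   # min ||D_S a - v_S||^2 + lam * ||a||_1
    return D @ lasso.coef_         # v_hat = D a, refined over all vertebrae
```

Here \(\mathbf {D}\) has one row per vertebra and one column (atom) per training spine; the subspace \(\mathcal {S}\) found on the z axis is reused when refining the x and y axes, so the three axes are reconstructed from the same subspace jointly.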

Fig. 6. Maximum errors of vertebra localization before and after the \(\ell _1\) norm refinement.

3 Experiments

First, we evaluate the proposed method on the database introduced in [2], which consists of 302 CT scans of patients with varying types of pathologies. The database contains several unusual appearances, such as abnormal spine curvature and bright visual artifacts caused by metal implants from post-operative procedures. In addition, the field-of-view (FOV) of each CT image varies widely in terms of vertical cropping, image noise and physical resolution [1]. Most cases contain only a portion of the vertebrae, and the global spine structure is visible in only a few cases. The large variations in pathology and the limited FOV increase the complexity of vertebra appearance and thus the difficulty of accurate spine localization and identification. The ground truth is the centroid of each vertebra, annotated by clinical experts. Previous works [1, 3, 4] used two different settings on these 302 CT images: the first uses 112 of the images for training and another 112 for testing; the second uses all 224 images of the first setting plus 18 extra images (242 in total) for training, and an additional 60 images for testing. For a fair comparison, we follow the same database settings in our experiments, denoted as “Set 1” and “Set 2” respectively. We follow the evaluation metrics described in [2], namely the Euclidean distance error (in mm) and the identification rate (Id.Rate) defined in [1]. Table 1 compares our performance with the numbers reported by previous approaches [2,3,4]. We obtain overall mean errors of 9.1 mm and 8.6 mm and identification rates of 80% and 85% on the two sets, respectively. Overall, our method outperforms the state-of-the-art methods on the same datasets in terms of mean error and identification rate.
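
As a reference, a simplified evaluation sketch follows; note that the exact Id.Rate criterion of [1] additionally requires a prediction to be the closest one to its ground truth, so the plain 20 mm threshold used here is our reading of that definition rather than a verbatim reimplementation:

```python
import numpy as np

def evaluate(pred, gt, tol_mm=20.0):
    """pred, gt: (n_vertebrae, 3) arrays of centroid coordinates in mm."""
    err = np.linalg.norm(pred - gt, axis=1)  # per-vertebra Euclidean error
    return err.mean(), float((err < tol_mm).mean())  # mean error, Id.Rate
```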

It is well known that deep neural networks can represent the variations of a large amount of data, and with more annotated training data they usually achieve better performance on various tasks. In order to validate whether more training data boosts the performance of the proposed method, we add 1000+ additional CT scans of patients to the training samples and train our proposed model again from scratch. These scans, collected for various purposes, cover large variations in population and contrast phase. Most cases have a large FOV and include all the vertebrae; some scans extend to the knee or the head. The testing data is unchanged in all experiments. This pipeline is denoted as “Our Method+1000 training data”. As shown in Table 1, the experimental results demonstrate that the large number of training samples further improves the performance significantly. Our approach achieves the best performance in almost all metrics. On “Set 1”, the Id.Rate of our method is 13% higher than that of the state-of-the-art method [2]. We also achieve more than 90% Id.Rate on “Set 2”, which is 6% higher than the state-of-the-art method [3].

Table 1. Comparison of localization errors (in mm) and identification rates among different methods. “Set 1” has 112 CT images for training and 112 for testing. “Set 2” uses all data in “Set 1” plus 18 extra images for training, and 60 images for testing. Our method (DI2IN+MP+Sparsity) is trained and tested using the default data setting in “Set 1” and “Set 2”, while “+1000” indicates that the model is trained with 1000 additional images and evaluated on the same testing data. Results after each step are also listed for comparison, showing that each step improves the performance. “MP” and “Sparsity” denote the message passing scheme and the sparsity regularization, respectively.

All experiments are conducted on a workstation equipped with an Intel 3.50 GHz CPU and a 12 GB Nvidia Titan X GPU. During evaluation, the response maps of all output channels are compared element-wise with a heuristic threshold constant in order to distinguish valid responses from random noise. Only channels whose response maps contain elements greater than the threshold are considered, and the vertebra centroids associated with these channels are identified as present in the image; the landmarks corresponding to the other response maps are considered absent. Localization and identification of all vertebrae in one case is thus achieved simultaneously and efficiently. The testing time of our method is around three seconds per case on average with GPU acceleration. The experimental results demonstrate that our proposed method for spine centroid localization and identification is not only accurate but also significantly time-efficient.
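
This inference step reduces to a per-channel peak search, sketched below (the threshold value itself is the heuristic constant mentioned above and is not specified here):

```python
import numpy as np

def extract_centroids(prob_maps, thresh):
    """prob_maps: (n_vertebrae, Z, Y, X) array of response maps.
    Returns {vertebra label: peak voxel coordinates} for present vertebrae."""
    found = {}
    for label, p in enumerate(prob_maps):
        if p.max() > thresh:                 # channel passes the threshold
            found[label] = np.unravel_index(np.argmax(p), p.shape)
    return found                             # absent vertebrae are omitted
```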

4 Conclusion

In this paper, we proposed an effective and fast automatic method to localize and label vertebra centroids in 3D CT volumes. Our method outperforms other state-of-the-art spine labeling methods in terms of various evaluation metrics. In future work, we plan to investigate other DI2IN architectures (e.g. ResNet) and more sophisticated refinement approaches to further improve the localization and identification performance.

Disclaimer: This feature is based on research, and is not commercially available. Due to regulatory reasons its future availability cannot be guaranteed.