
1 Introduction

Automated lesion detection is an important yet challenging task in medical image analysis, as explored by [8, 16, 19, 22, 23, 27, 29] on the public NIH DeepLesion dataset. Its aims include improving physicians' reading efficiency and increasing the sensitivity for localizing/reporting small but vital tumors, which are more prone to be missed; e.g., human-reader sensitivity is reported at 48–57% for small-sized hepatocellular carcinoma (HCC) liver lesions [1]. Automated lesion detection remains difficult due to the tremendously large appearance variability, unpredictable locations, and frequently small size of lesions of interest [12, 22]. In particular, two key aspects requiring further research are (1) how to effectively process 3D volumetric data, since small and critical tumors require 3D imaging context to be differentiated, and (2) how to more accurately regress the tumor's 3D bounding box. This work makes significant contributions towards both aims.

Computed tomography (CT) scans are volumetric, so incorporating 3D context is key to recognizing lesions. As a direct solution, 3D convolutional neural networks (CNNs) have achieved good performance for lung nodule detection [5, 6]. However, due to GPU memory constraints, shallower networks and smaller input dimensions are used [5, 6], which may limit performance on more complicated detection problems. For instance, universal lesion detection (ULD) [16, 17, 21, 29], which aims to detect many lesion types with diverse appearances from the whole body, demands wider and deeper networks to extract more comprehensive image features. To resolve this issue, 2.5D networks have been designed [2, 16, 17, 20, 21, 29] that use deep 2D CNNs with ImageNet pre-trained weights and fuse image features of multiple consecutive axial slices. Nevertheless, these methods do not fully exploit 3D information, since their 3D-related operations act sparsely at only selected network layers via convolutional-layer inner products. 2.5D models are also inefficient because they process CT volumes in a slice-by-slice manner. Partially inspired by [3, 14, 24], we propose applying pseudo 3D convolution (P3DC) backbones to efficiently process 3D images. This allows our volumetric lesion detector (VLD) framework to fully exploit 3D context while re-purposing off-the-shelf deep 2D network structures and inheriting their large capacities to cope with lesion variability.

Fig. 1. Overview of VLD. We show (a) the complete workflow; (b) the detailed pseudo 3D convolution (P3DC) backbone; (c) the 3D lesion center regression head; and (d) the surface point regression (SPR) head for bounding box generation.

Good lesion detection performance also relies on accurate bounding box regression. However, some lesions, e.g., liver lesions, frequently present vague boundaries that are hard to distinguish from the background. Most existing anchor-based [15] and anchor-free [18, 28] algorithms rely on features extracted from the proposal center to predict the lesion's extent. This is sub-optimal, since lesion boundary features should intuitively be crucial for this task. To this end, we adopt and enhance the RepPoints algorithm [25], which generates a point set to estimate bounding boxes, with each point fixating on a representative part. Such a point set can drive more finely-tuned bounding box regression than traditional strategies, which is crucial for accurately localizing small lesions. Different from RepPoints, we propose surface point regression (SPR), which uses a novel triplet-based appearance regularization to force regressed points to move towards lesion boundaries, allowing for even more accurate regression.

In this work, we advance both volumetric detection and bounding box regression using deep volumetric P3DCs and effective SPR, respectively. We demonstrate that our P3DC backbone can outperform state-of-the-art 2.5D and 3D detectors on the public large-scale NIH DeepLesion dataset [22], e.g., we increase the strongest baseline's sensitivity for detecting small lesions from \(22.4\%\) to \(30.3\%\) at 1 false positive (FP) per CT volume. When incorporating SPR, our VLD outperforms the best baseline [2] by >4% sensitivity at all operating points on the free-response receiver operating characteristic (FROC) curve. We also evaluate VLD on an extremely challenging dataset (574 patient studies) of HCC liver lesions collected from the archives of Chang Gung Memorial Hospital. Many patients suffer from cirrhosis, which makes HCC detection extremely difficult. P3DC alone achieves \(63.6\%\) sensitivity at 1 FP per CT volume; adding SPR boosts this sensitivity to \(69.2\%\). Importantly, for both the DeepLesion and in-house HCC datasets, our complete VLD framework provides the largest performance gains for small lesions, which are the easiest for human readers to miss and thus should be the focus of any detection system.

2 Method

VLD follows a one-stage anchor-free detection workflow  [2, 28], which is simple but has yielded state-of-the-art performance on DeepLesion  [2]. As shown in Fig. 1, VLD takes volumetric CT scans as inputs and extracts deep convolutional features with its P3DC backbone. The extracted features are then fed into VLD’s 3D center regression and SPR heads to generate center coordinates and surface points, respectively.

2.1 P3DC Backbone

VLD relies on a deep volumetric P3DC backbone, which we build off of DenseNet-121 [7]. Specifically, we first remove the fourth dense block, as we found this truncated version performs better on DeepLesion. The core strategy of VLD is to keep front-end processing in 2D and only convert the third dense block of the truncated DenseNet-121 to 3D using P3DCs. This strategy is consistent with [21], which found that introducing 3D information at higher layers is preferable to doing so at lower layers. Using N to denote convolutional kernel sizes throughout, for the first two dense blocks the weight parameters, \((c_{o},c_{i},N,N)\), are reshaped to \((c_{o},c_{i},1,N,N)\) to process volumetric data slice-by-slice. When processing dynamic CTs with multiple contrast phases, e.g., our in-house dataset, we stack the multi-phase input and inflate the weight of the first convolutional kernel along its second dimension [3].
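
To make the conversion concrete, the following is a minimal PyTorch sketch (not the authors' released code) of the slice-wise reshaping and of inflating the first layer for stacked multi-phase input; the function names and the rescaling by the number of phases are our own assumptions.

```python
import torch.nn as nn


def to_slicewise_3d(conv2d: nn.Conv2d) -> nn.Conv3d:
    """Re-use a pre-trained 2D conv as a slice-wise pseudo-3D conv:
    the (c_o, c_i, N, N) weight is viewed as (c_o, c_i, 1, N, N),
    so the kernel processes one axial slice at a time."""
    c_o, c_i, n, _ = conv2d.weight.shape
    conv3d = nn.Conv3d(
        c_i, c_o, kernel_size=(1, n, n),
        stride=(1, *conv2d.stride), padding=(0, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    conv3d.weight.data.copy_(conv2d.weight.data.unsqueeze(2))
    if conv2d.bias is not None:
        conv3d.bias.data.copy_(conv2d.bias.data)
    return conv3d


def inflate_first_conv(conv3d: nn.Conv3d, num_phases: int) -> nn.Conv3d:
    """Inflate the first conv along its second (input-channel) dimension so
    stacked multi-phase CT volumes can be fed in; the tiled weights are
    rescaled by the number of phases (our assumption, in the spirit of [3])."""
    w = conv3d.weight.data.repeat(1, num_phases, 1, 1, 1) / num_phases
    new_conv = nn.Conv3d(
        w.shape[1], w.shape[0], kernel_size=conv3d.kernel_size,
        stride=conv3d.stride, padding=conv3d.padding,
        bias=conv3d.bias is not None,
    )
    new_conv.weight.data.copy_(w)
    if conv3d.bias is not None:
        new_conv.bias.data.copy_(conv3d.bias.data)
    return new_conv
```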

Fig. 2. Options to transfer the 2D convolutional layer (a) to volumetric 3D convolutions: (b) inflated 3D [3], (c) spatio-temporal 3D [14], and (d) axial-coronal-sagittal 3D [24].

To implement 3D processing, we convert the third dense block and the task-specific heads, and investigate several different options for P3DCs: inflated 3D (I3D) [3], spatio-temporal 3D (ST-3D) [14], and axial-coronal-sagittal 3D (ACS-3D) [24]. These options are depicted in Fig. 2. I3D [3] simply duplicates 2D kernels along the axial (3D) direction and downscales the weight values by the number of duplications; thus, I3D produces true 3D kernels. ST-3D [14] first reshapes \((c_o,c_i,N,N)\) kernels into \((c_o,c_i,1,N,N)\) to act as "spatial" kernels and introduces an extra \((c_o,c_i,N,1,1)\) kernel as the "temporal" kernel. The resulting features from both are fused using channel-wise concatenation. There are alternative ST-3D configurations; however, the parallel structure of Fig. 2(c) was shown to be best in a liver segmentation study [27]. ACS-3D [24] splits the kernel \((c_o,c_i,N,N)\) into axial \((c_{oa},c_i,N,N)\), coronal \((c_{oc},c_i,N,N)\), and sagittal \((c_{os},c_i,N,N)\) kernels, where \(c_o=c_{oa}+c_{oc}+c_{os}\). Thereafter, it reshapes the view-specific kernels correspondingly into \((c_{oa},c_i,1,N,N)\), \((c_{oc},c_i,N,1,N)\), and \((c_{os},c_i,N,N,1)\). Like ST-3D, ACS-3D fuses the resulting features using channel-wise concatenation. Compared to the extra temporal kernels introduced by ST-3D, ACS-3D requires no extra model parameters, keeping the converted model light-weight. In our implementation, we empirically set the ratio \(c_{oa}:c_{oc}:c_{os}\) to 8:1:1, as the axial plane usually has the highest resolution.
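
To make the ACS-3D option concrete, below is a minimal PyTorch sketch of the kernel splitting and channel-wise fusion described above; the class name and the omission of bias/stride handling are simplifying assumptions and not the implementation of [24]. Because the split only re-views the existing 2D weights, the parameter count is unchanged relative to the original 2D layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ACSConv3d(nn.Module):
    """Sketch of an axial-coronal-sagittal (ACS) 3D convolution: a single
    (c_o, c_i, N, N) 2D weight is split 8:1:1 along the output channels and
    viewed as axial (1, N, N), coronal (N, 1, N), and sagittal (N, N, 1)
    kernels, whose outputs are fused by channel-wise concatenation."""

    def __init__(self, conv2d: nn.Conv2d, ratio=(8, 1, 1)):
        super().__init__()
        c_o, _, n, _ = conv2d.weight.shape
        total = sum(ratio)
        self.c_oa = c_o * ratio[0] // total
        self.c_oc = c_o * ratio[1] // total
        self.c_os = c_o - self.c_oa - self.c_oc
        self.weight = nn.Parameter(conv2d.weight.data.clone())
        self.pad = n // 2

    def forward(self, x):  # x: (B, c_i, D, H, W)
        w_a, w_c, w_s = torch.split(
            self.weight, [self.c_oa, self.c_oc, self.c_os], dim=0)
        p = self.pad
        y_a = F.conv3d(x, w_a.unsqueeze(2), padding=(0, p, p))  # axial view
        y_c = F.conv3d(x, w_c.unsqueeze(3), padding=(p, 0, p))  # coronal view
        y_s = F.conv3d(x, w_s.unsqueeze(4), padding=(p, p, 0))  # sagittal view
        return torch.cat([y_a, y_c, y_s], dim=1)                # channel-wise fusion
```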

VLD has two task-specific network heads, one to locate lesion centers and one to regress surface points. Before feeding the deep volumetric features into the heads, we use a feature pyramid network (FPN) [10] with three \((c_o,c_i,1,1,1)\) convolutional layers to fuse the outputs of the dense blocks, which helps VLD be robust to lesions of different sizes. Focusing first on the center regression head, it takes the output of the FPN (i.e., "deep feature" in Fig. 1) and processes it with an ACS-3D convolutional layer followed by a \((1,c_i,1,1,1)\) convolutional layer. Both layers are randomly initialized. Like CenterNet [28], the output is a 3D heat map, \(\hat{Y}\), that predicts lesion centers. The ground-truth heat map, Y, is generated as a Gaussian heat map with the radius in each dimension set to half of the target lesion's width, height, and depth. We use the focal loss [2, 11, 28] to train the center regression head:

$$\begin{aligned} \mathcal {L}_{ctr} = \frac{-1}{m} \sum _{xyz} \left\{ \begin{array}{ll} (1 - \hat{Y}_{xyz})^{\alpha } \log (\hat{Y}_{xyz}) & \text {if}\ Y_{xyz}=1 \\ (1-Y_{xyz})^{\beta } (\hat{Y}_{xyz})^{\alpha } \log (1-\hat{Y}_{xyz}) & \text {otherwise,} \end{array}\right. \end{aligned}$$
(1)

where m is the number of lesions in the CT and \(\alpha =2\) and \(\beta =4\) are focal-loss hyper-parameters [28]. The ground-truth heat map takes values <1 everywhere except at the lesion center voxels. Like recent work [2], when possible we also exploit hard negatives by assigning negative values to the corresponding regions of Y, which magnifies their loss contributions relative to 0-valued regions. See Cai et al. [2] for more details.
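
For reference, a minimal PyTorch sketch of Eq. (1) is given below; it covers only the standard positive/negative terms and omits the hard-negative weighting of [2], and the function name is our own.

```python
import torch


def center_focal_loss(y_hat, y, alpha=2.0, beta=4.0, eps=1e-6):
    """Sketch of Eq. (1): penalty-reduced focal loss for the center heat map.

    y_hat, y: tensors of the same shape, e.g. (D, H, W); voxels with y == 1
    are lesion centers, all other voxels are negatives down-weighted by
    (1 - y)^beta."""
    pos = (y == 1).float()
    m = pos.sum().clamp(min=1.0)  # number of lesion centers in the CT
    pos_loss = pos * (1 - y_hat).pow(alpha) * torch.log(y_hat.clamp(min=eps))
    neg_loss = (1 - pos) * (1 - y).pow(beta) * y_hat.pow(alpha) \
        * torch.log((1 - y_hat).clamp(min=eps))
    return -(pos_loss + neg_loss).sum() / m
```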

2.2 Surface Point Regression

The P3DC backbone and center regression head are effective at locating lesions. However, once a lesion is located, its extent must also be determined. To do this, we directly regress a 3D point set (specifically, offsets from the center point) using backbone features located at the center point:

$$\begin{aligned} \mathcal {P} = \{(x_k, y_k, z_k)\}_{k=1}^{n}, \end{aligned}$$
(2)

where n is the total number of points. This requires a \(1\times 1 \times 1\) convolution with 3n outputs. Empirically, we find \(n=16\) delivers the best results. Because \(\mathcal {P}\) is computed from center-point features, it may suffer from inaccuracies. Thus, we also compute offsets to refine \(\mathcal {P}\):

$$\begin{aligned} \mathcal {P}_r = \{(x_k + \varDelta x_k, y_k + \varDelta y_k, z_k + \varDelta z_k)\}_{k=1}^{n}, \end{aligned}$$
(3)

where \(\{(\varDelta x_k, \varDelta y_k, \varDelta z_k)\}\) are the predicted offsets of the refined surface points. To do this, for each location in \(\mathcal {P}\), we bilinearly interpolate corresponding backbone features and regress location-specific offsets. This only requires a \(1\times 1 \times 1\) convolution with 3 outputs. To actually supervise the \(\mathcal {P}\) and \(\mathcal {P}_{r}\) regression, we compute their minimum and maximum coordinates and ensure they match with the ground-truth bounding box. More formally, if we denote the ground-truth box using its top-right-front and bottom-left-rear corners \(\{(x_{trf}, y_{trf}, z_{trf}),\) \( (x_{blr}, y_{blr}, z_{blr})\}\), the regression of \(\mathcal {P}\) and \(\mathcal {P}_{r}\) can be trained using the following loss:

$$\begin{aligned} \mathcal {L}_{pts} = \sum _{i \in (x, y, z)} |i_{blr} - \min _{1\le k \le n}(i_k)| + |i_{trf} - \max _{1\le k \le n}(i_k)| \\ +|i_{blr} - \min _{1\le k \le n}(i_k + \varDelta i_k)| + |i_{trf} - \max _{1\le k \le n}(i_k + \varDelta i_k)| \mathrm {.} \end{aligned}$$
(4)
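
The following is a small PyTorch sketch of Eq. (4), assuming the point sets have already been decoded from the head's \(3n\)-channel output into explicit coordinates; the tensor shapes and the function name are our own assumptions.

```python
import torch


def surface_point_loss(pts, pts_refined, box_min, box_max):
    """Sketch of Eq. (4): the extremes of the point sets should match the
    ground-truth box corners under an L1 penalty.

    pts, pts_refined: (n, 3) tensors of (x, y, z) coordinates for P and P_r.
    box_min, box_max: (3,) tensors holding the bottom-left-rear and
    top-right-front corners of the ground-truth box."""
    loss = pts.new_zeros(())
    for p in (pts, pts_refined):
        loss = loss + (box_min - p.min(dim=0).values).abs().sum() \
                    + (box_max - p.max(dim=0).values).abs().sum()
    return loss
```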

One important limitation of (4) is that ellipsoid lesions do not fit cuboid boxes perfectly. As a result, regressed points may still satisfy (4) even if they lie outside the lesion but inside the box. Such points may be more prone to produce inaccurate offsets, i.e., (3), during inference. To address this, we propose an appearance-based similarity constraint that encourages points to fixate only on lesion surfaces, so that the point set correctly represents fine-grained lesion geometry. The idea is to force surface-point appearance to be more similar to regions inside the lesion than to those outside it. This constraint is realized by adding a triplet loss with the lesion center as the positive anchor (inside) and the box corners as negative anchors (outside). Specifically, we compute point-wise features from the center and eight corners of the bounding box with bilinear sampling and denote them as \(a^p\) and \(\{a^n_{j}\}_{j=1}^8\), respectively. We also extract point-wise features from \(\mathcal{P}_r\): \(\{a_{k}\}_{k=1}^n\). The triplet loss is then formulated as

$$\begin{aligned} \mathcal {L}_{tri} = \frac{1}{m} \sum _{k=1}^{n}\sum _{j=1}^{8} \max (0, \Vert a^p - a_k\Vert _2 - \Vert a^p - a^n_j\Vert _2 + 1). \end{aligned}$$
(5)
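
A minimal PyTorch sketch of this per-lesion triplet term follows; the feature shapes and margin follow Eq. (5), while the function name and where the \(1/m\) normalization is applied are our own assumptions.

```python
import torch


def surface_triplet_loss(a_pos, a_neg, a_pts, margin=1.0):
    """Sketch of the per-lesion term of Eq. (5).

    a_pos: (c,) feature sampled at the lesion center (positive anchor).
    a_neg: (8, c) features sampled at the box corners (negative anchors).
    a_pts: (n, c) features sampled at the refined surface points."""
    d_pos = (a_pts - a_pos[None, :]).norm(dim=1)   # ||a^p - a_k||, shape (n,)
    d_neg = (a_neg - a_pos[None, :]).norm(dim=1)   # ||a^p - a^n_j||, shape (8,)
    # Hinge over all (k, j) pairs; dividing by the number of lesions m is
    # assumed to happen when aggregating this term over the lesions in the CT.
    return (d_pos[:, None] - d_neg[None, :] + margin).clamp(min=0.0).sum()
```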

With the supervision of \(\mathcal {L}_{pts}\) and \(\mathcal {L}_{tri}\), we expect surface points to either move toward lesion surfaces or toward the center. This constitutes our surface point regression (SPR). The extracted point-wise features are designed to be semantic in nature (healthy versus lesion tissue); thus, complex lesion appearances, e.g., cavitations, should be mapped to a similar semantic space. We optimize the SPR head together with the center regression head by minimizing a joint loss function:

$$\begin{aligned} \mathcal {L} = \mathcal {L}_{ctr} + 0.1(\mathcal {L}_{pts} + \mathcal {L}_{tri}). \end{aligned}$$
(6)

2.3 Implementation Details

We implement our system in PyTorch [13] on four NVIDIA Quadro RTX 6000 GPUs. The P3DC backbone weights were initialized with the pre-trained Lesion Harvester weights [2], which were trained using the official DeepLesion data split, so there is no data leakage. We also tried ImageNet pre-trained weights and random initialization, but performance was not as good. All other layers were randomly initialized. The FPN's output, i.e., "deep feature" in Fig. 1, has 512 channels. In the task-specific heads, each ACS-3D layer consists of an ACS-3D convolutional layer with a kernel size of 3 and \(c_{oa}+c_{oc}+c_{os}=256\). The output channels of the lesion center heat map, \(\mathcal {P}\), \(\mathcal {P}_{r}\), and the point-wise features are 1, 48 (16 points), 3, and 128, respectively. We adopt the Adam [9] optimizer with a base learning rate of 0.0001, which was reduced by a factor of 10 after the validation loss reached its minimum value.
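
As an illustration of this optimization schedule, here is a self-contained PyTorch sketch; the tiny stand-in model, fake input, and toy loop are placeholders of our own, not the actual VLD training code.

```python
import torch
import torch.nn as nn

model = nn.Conv3d(1, 1, kernel_size=3, padding=1)          # stands in for VLD
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # base lr 0.0001
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1)                       # x10 reduction on plateau

for step in range(2):                                        # toy training loop
    x = torch.randn(1, 1, 8, 32, 32)                         # fake CT sub-volume
    loss = model(x).pow(2).mean()                            # stands in for Eq. (6)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())                              # pass the validation loss here
```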

3 Experimental Results

Datasets. We evaluate our approach on two datasets. DeepLesion [23] is a large-scale benchmark for ULD that comprises 32,735 retrospectively clinically annotated lesions from 10,594 CT scans of 4,427 unique patients. Many works report performance on DeepLesion, but most are either 2D [16, 17, 29] or 2.5D [20, 21]. We use the 3D annotations and hard negatives from [2] to both train and evaluate on DeepLesion. The volumetric test set of DeepLesion [2] includes 272 fully-annotated sub-volumes and more accurately reflects 3D lesion detection performance. HCC Liver Dataset: We also evaluate on our in-house dataset of 574 dynamic CT studies of patients with HCC liver lesions. HCC is one of the most fatal cancers, and detection at early stages is crucial. However, HCC often co-occurs with liver fibrosis, which complicates lesion discovery. Human sensitivities have been reported to be 48–57% for small-sized lesions [1]. We randomly split the dataset patient-wise into 384, 92, and 98 studies for training, validation, and testing, respectively.

Evaluation and Comparison Methods. A detected bounding box is regarded as correct when the 3D intersection-over-union (IoU) between the detected box and a ground-truth box exceeds 0.3. The FROC is used for evaluation. We first evaluate the different P3DC backbones: ST-3D, I3D, and ACS-3D. We also test a shallow fully 3D UNet [4] backbone within the CenterNet [28] framework, as well as the 2.5D Lesion Harvester [2], which reports the highest performance to date on the DeepLesion dataset. These two competitors directly regress a lesion's size using features sampled from the predicted lesion center and can also naturally learn from hard negatives [2]. In addition, we report results for CenterNet (2D) [28], Faster R-CNN (2.5D) [15], and MULAN (2.5D) [21], drawn from Cai et al.'s experiments [2]. This represents a comprehensive comparison across many different detector variants. To measure the impact of our proposed SPR, we also implement VLD with deep representative points (DRP) [26], which foregoes the appearance-based triplet loss. Finally, we evaluate our complete VLD framework: P3DC + SPR.
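
For clarity, the 3D IoU criterion can be computed as in the following small Python sketch (our own illustrative code, not part of the evaluation toolkit).

```python
def iou_3d(box_a, box_b):
    """3D intersection-over-union of two axis-aligned boxes given as
    (x1, y1, z1, x2, y2, z2), min corner first; a detection counts as
    correct when its IoU with a ground-truth box exceeds 0.3."""
    def volume(b):
        return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])

    inter = 1.0
    for lo_a, lo_b, hi_a, hi_b in zip(box_a[:3], box_b[:3], box_a[3:], box_b[3:]):
        inter *= max(min(hi_a, hi_b) - max(lo_a, lo_b), 0.0)
    return inter / (volume(box_a) + volume(box_b) - inter)


# Example: iou_3d((0, 0, 0, 10, 10, 10), (5, 5, 5, 15, 15, 15)) ~= 0.067
```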

Table 1. Sensitivities (%) at various FPs per CT volume.

Results. In Table 1, we compare our proposed approach against the alternative approaches. Using FROC analysis, the average sensitivities on DeepLesion are: CenterNet-2D \(27.9\%\), CenterNet-3D \(18.7\%\), Faster R-CNN \(25.4\%\), MULAN \(27.9\%\), Lesion Harvester \(31.9\%\), and our strongest P3DC variant \(36.4\%\). As can be seen, P3DC significantly outperforms the previous SOTA Lesion Harvester and MULAN methods, by \(4.5\%\) and \(8.5\%\) respectively, which validates the effectiveness of P3DC over its 2.5D counterparts. We choose ACS-3D over I3D because it produces comparable performance to I3D while keeping VLD light-weight.

From Table 1, we also observe that adding the original DRP method actually underperforms the baseline P3DC. This, in fact, motivated our development of SPR. The DRP method lacks explicit constraints on point locations, making it challenging to automatically learn effective point-wise features from CT images. In contrast, SPR introduces surface constraints to force the regressed points to distribute onto lesion surfaces. Tests on our in-house dataset also confirm that our proposed SPR improves sensitivities for HCC liver lesion detection.

While these results demonstrate the value of our P3DC backbone and SPR bounding-box regression, even more convincing conclusions can be drawn when analyzing performance based on lesion size. In DeepLesion, we use 2 cm and 5 cm as cut-off sizes. However, our HCC liver dataset has hardly any lesions smaller than 2 cm, so we only stratify based on a 5 cm cut-off. As Table 2 indicates, compared to Lesion Harvester, our P3DC backbone yields improvements of \(\sim 7\%\) sensitivity for small-size lesions in DeepLesion. These are the most critical lesions to detect, since they are the easiest for human observers to miss. Adding SPR boosts small-size performance even further, indicating that SPR's aggregation of boundary features can produce improved fine-grained bounding boxes. Moving to the HCC dataset, our SPR produces boosts in sensitivity of over \(4\%\) compared to direct CenterNet-style regression, further validating our SPR regression strategy. These are clinically significant performance improvements. Visual examples in Fig. 3 and our supplementary material depict SPR's more refined regression of bounding-box extents.

Table 2. Size-stratified sensitivities (%) at FP \(=1\) per CT volume. \(^{\dagger }\): P3DC+DRP produces FPs with high confidence; thus, at FP \(=1\), it has lower sensitivity than P3DC+SPR on HCC Liver.
Fig. 3. Visualization of different methods. We show an instance of a liver tumor overlaid with its ground-truth box in the 1\(^{st}\) column. In the 2\(^{nd}\), 3\(^{rd}\), and 4\(^{th}\) columns, we show the detection results from P3DC with general box regression, P3DC\(+\)DRP, and P3DC\(+\)SPR, respectively. For each example, we display the result in 3D and show three representative axial slices. We render the ground-truth box in green, the detection results in blue, and the regressed surface points, when applicable, in red. Best viewed in color. (Color figure online)

4 Conclusion

In this work, we tackle the challenges of lesion detection in CT scans by proposing VLD, a very deep volumetric lesion detection model. It processes CT scans directly in 3D to fully incorporate 3D context for better performance. It has a very deep backbone with large capacity, allowing it to handle lesions with large appearance variability, and its surface point regression head effectively estimates 3D lesion spatial extents. It also generalizes well to small-scale medical datasets, as it is light-weight and can be initialized with pre-trained 2D networks. Compared with 2D, 2.5D, and fully 3D variants, our method is superior in accuracy, model size, and speed (see our supplementary material). The proposed VLD achieves new state-of-the-art performance on the large-scale NIH DeepLesion dataset, and its generalization capability is further validated on our in-house HCC liver dataset.