Abstract
Computed tomography (CT) reconstruction from X-ray projections acquired within a limited angle range is challenging, especially when the angle range is extremely small. Both analytical and iterative models need more projections for effective modeling. Deep learning methods have gained prevalence due to their excellent reconstruction performance, but such success is mainly limited to within-dataset settings and does not generalize across datasets with different distributions. Hereby we propose ExtraPolationNetwork for limited-angle CT reconstruction via the introduction of a sinogram extrapolation module, which is theoretically justified. The module complements extra sinogram information and boosts model generalizability. Extensive experimental results show that our reconstruction model achieves state-of-the-art performance on the NIH-AAPM dataset, on par with existing approaches. More importantly, we show that using such a sinogram extrapolation module significantly improves the generalization capability of the model on unseen datasets (e.g., COVID-19 and LIDC datasets) when compared to existing approaches.
1 Introduction and Motivation
In healthcare, Computed Tomography (CT) based on X-ray projections is an indispensable imaging modality for clinical diagnosis. Limited-angle (LA) CT acquisition is common in many scenarios, e.g., when reducing radiation dose in low-dose CT, or when projections can only be taken within a restricted range of angles, as in C-arm CT [17] and dental CT. However, the deficiency of projection angles poses a significant challenge to image reconstruction and may lead to severe artifacts in the reconstructed images.
Many CT image reconstruction algorithms have been proposed in the literature to improve image quality; they can be categorized as model-based and deep-learning-based methods. For example, Filtered Back Projection (FBP) [20], a representative analytical method, is widely used to reconstruct a high-quality image efficiently. However, FBP prefers acquisitions with a full range of views, which makes it sub-optimal for LACT. The (sometimes extreme) reduction of the projection angle range decreases the effectiveness of commercial CT reconstruction algorithms. To overcome this challenge, iterative regularization-based algorithms [6, 12, 15, 18, 21, 27] have been proposed to leverage prior knowledge about the image to be reconstructed, achieving better reconstruction performance for LACT. However, these iterative algorithms are often computationally expensive and require careful case-by-case hyperparameter tuning.
In recent years, deep learning (DL) techniques have been widely adopted in CT and have demonstrated promising reconstruction performance [2, 8, 24, 28, 32, 33]. By further combining iterative algorithms with DL, a series of iterative frameworks with correspondingly designed neural-network-based modules have been proposed [1, 3, 5, 7, 13, 19, 29]. ADMMNet [25] introduces a neural-network-based module into the reconstruction problem and achieves remarkable performance. Furthermore, DuDoNet [10, 11], ADMM-CSNet [26], and LEARN++ [31] improve reconstruction results with an enhancement module in the projection domain, which inspires us to fuse dual-domain learning into our model design.
Although deep-learning-based algorithms have achieved state-of-the-art performance, they are also known to easily overfit the training data, which is undesirable in practice. MetaInvNet [30] was then proposed to improve reconstruction performance with sparse-view projections, demonstrating good model generalizability. It attempts to find a better initialization for the iterative HQS-CG [6] model with a U-Net [16] and achieves better generalization performance in such scenarios. However, it still focuses on the case with a large range of acquired projections, which limits its application in practice. Obtaining a highly generalizable model when learning from practical data remains difficult.
To retain model generalizability in LACT reconstruction, we propose a model, called ExtraPolationNetwork (EPNet), for recovering high-quality CT images. In this model, we utilize dual-domain learning to emphasize data consistency between the image domain and the projection domain, and introduce an extrapolation module. The proposed extrapolation module helps complement missing information in the projection domain and provides extra details for reconstruction. Extensive experimental results show that the model achieves state-of-the-art performance on the NIH-AAPM dataset [14]. Furthermore, we also achieve better generalization performance on additional datasets, COVID-19 and LIDC [4]. This empirically verifies the effectiveness of the proposed extrapolation module. We make our implementation available at https://github.com/mars11121/EPNet.
2 Problem Formulation
CT reconstruction aims to recover a clean image u from the projection data Y corrupted by unknown noise n, whose mathematical formulation is:

\(Y = Au + n,\)
where A is the Radon transform. For LACT, the projection data Y is incomplete as a result of the reduced angle range (the view angle \(\alpha \in [0, \alpha _{max}]\) with \(\alpha _{max} < 180^\circ \)). The reduced sinogram information limits the performance of current reconstruction methods. Therefore, estimating a more complete sinogram \(\widetilde{Y}\) is necessary to enhance reconstruction performance. To yield such an accurate estimate, consistency between the incomplete projection Y and the complete projection \(\widetilde{Y}\) is crucial. We assume Y is obtained from \(\widetilde{Y}\) by some operation (e.g., a downsampling operation). Besides, \(\widetilde{Y}\) and the clean image u should also be consistent under the corresponding transformation matrix \(\widetilde{A}\). Consequently, we propose the following constraints:

\(P\widetilde{Y} = Y, \qquad \widetilde{A}u = \widetilde{Y},\)
where P is the downsampling matrix.
In this way, the final model becomes the following optimization problem:

\(\min _{u,\widetilde{Y}}\ \frac{\lambda }{2}\Vert P\widetilde{Y}-Y\Vert _2^2+\frac{1}{2}\Vert \widetilde{A}u-\widetilde{Y}\Vert _2^2+R(u), \qquad (2)\)
where \(\lambda >0\) balances data fidelity and \(R(\cdot )\) is a regularization term incorporating image priors.
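To make the downsampling constraint concrete, here is a minimal sketch of the operator P acting on a sinogram, under assumed sizes (180 full views, 800 detector elements as in the experiments, and a hypothetical \(\alpha _{max}\) of 60 views); P is simply a row-selection matrix that keeps the acquired view angles.

```python
import numpy as np

# Hypothetical sizes: 180 full views, 800 detector elements, 60 acquired views.
n_views_full, n_det = 180, 800
alpha_max = 60

Y_full = np.random.rand(n_views_full, n_det)  # stands in for the complete sinogram

# P keeps the rows of the full sinogram whose view angle lies in [0, alpha_max).
P = np.zeros((alpha_max, n_views_full))
P[np.arange(alpha_max), np.arange(alpha_max)] = 1.0

Y_limited = P @ Y_full  # the observed limited-angle sinogram Y

assert Y_limited.shape == (alpha_max, n_det)
assert np.allclose(Y_limited, Y_full[:alpha_max])
```

With this view, estimating \(\widetilde{Y}\) amounts to inverting a row selection, which is why extrapolation beyond the acquired band is needed.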
3 Proposed Method
In this section, we introduce full details on the proposed method, which is depicted in Fig. 1. Our model is built by unrolling the HQS-CG [6] algorithm with N iterations. The HQS-CG algorithm is briefly introduced in Sect. 3.1. Specifically, we utilize the Init-CNN module [30] to search for a better initialization for Conjugate Gradient (CG) algorithm in each iteration. The input of the module is composed of reconstructed images from the image domain and projection domain. In the image domain, we retrain the basic HQS-CG model and use the CG module for reconstruction. In the projection domain, we first use our proposed Extrapolation Layer (EPL) to estimate extra sinograms. Then, we use Sinogram Enhancement Network (SENet) to inpaint the extrapolated sinograms and reconstruct them with Radon Inversion Layer (RIL) [10], which is capable of backpropagating gradients to the previous layer. Section 3.2 introduces the details of the involved modules.
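The data flow of one unrolled iteration described above can be sketched as follows. Every module here is a hypothetical placeholder (the real EPL, SENet, RIL, Init-CNN, and CG modules are learned or analytic); only the wiring between the projection-domain branch and the image-domain branch is meaningful.

```python
import numpy as np

def epl(y):
    # Extrapolation Layer placeholder: pad 30 extra views on each side.
    pad = np.zeros((30, y.shape[1]))
    return np.vstack([pad, y, pad])

def senet(y):
    # Sinogram Enhancement Network placeholder (identity).
    return y

def ril(y, img_shape):
    # Radon Inversion Layer placeholder: maps a sinogram to an image.
    return np.zeros(img_shape)

def init_cnn(u_img, u_proj):
    # Init-CNN placeholder: fuses the two reconstructions into a CG initialization.
    return 0.5 * (u_img + u_proj)

def cg_module(u_init):
    # Conjugate Gradient placeholder: refines the initialization in the image domain.
    return u_init

def one_iteration(y_limited, u_prev, img_shape=(256, 256)):
    u_proj = ril(senet(epl(y_limited)), img_shape)  # projection-domain branch
    u_init = init_cnn(u_prev, u_proj)               # fuse both domains
    return cg_module(u_init)                        # image-domain refinement

u = np.zeros((256, 256))
y = np.random.rand(60, 800)
for _ in range(3):  # N unrolled iterations
    u = one_iteration(y, u)
assert u.shape == (256, 256)
```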
3.1 HQS-CG Algorithm
Traditionally, many effective algorithms exist for solving objective (2). One such algorithm is Half Quadratic Splitting (HQS) [6]. Taking \(R(u)=\sum _{i=1}^{M}{\gamma }_{i}\Vert {W}_{i}u\Vert _1\) and introducing the auxiliary variable \(z=Wu\), HQS solves the following relaxed problem:

\(\min _{u,\widetilde{Y},z}\ \frac{\lambda }{2}\Vert P\widetilde{Y}-Y\Vert _2^2+\frac{\beta _1}{2}\Vert \widetilde{A}u-\widetilde{Y}\Vert _2^2+\frac{\beta _2}{2}\Vert Wu-z\Vert _2^2+\sum _{i=1}^{M}{\gamma }_{i}\Vert {z}_{i}\Vert _1,\)

where \(W=({W}_{1},{W}_{2},\ldots ,{W}_{M})\) is an \(M\)-channel operator, \(z=({z}_{1},{z}_{2},\ldots ,{z}_{M})\), \(\lambda >0\), \(\beta _{1}>0\), \(\beta _{2}>0\), and \(\gamma =({\gamma }_{1},{\gamma }_{2},\ldots ,{\gamma }_{M})\) with \(\{{\gamma }_{i}\}_{i=1}^{M}>0\). The operator W is chosen as the highpass components of the piecewise linear tight wavelet frame transform. With alternating optimization among \(\widetilde{Y}\), u, and z, the updates can be derived as follows:

\(\widetilde{Y}^{k+1}=(\lambda {P}^{\top }P+\beta _1 I)^{-1}(\lambda {P}^{\top }Y+\beta _1\widetilde{A}{u}^{k}),\)

\({u}^{k+1}=\arg \min _u\ \frac{\beta _1}{2}\Vert \widetilde{A}u-\widetilde{Y}^{k+1}\Vert _2^2+\frac{\beta _2}{2}\Vert Wu-{z}^{k}\Vert _2^2 \quad \text {(solved with CG)},\)

\({z}_{i}^{k+1}={\tau }_{{\gamma }_{i}/\beta _2}({W}_{i}{u}^{k+1}),\quad i=1,\ldots ,M,\)
where \({\tau }_{\lambda }(x) = \mathrm {sgn}(x)\max \{|x|-\lambda , 0\}\) is the element-wise soft-thresholding operator.
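The soft-thresholding operator is the only closed-form nonlinearity in the updates above; a direct numpy implementation:

```python
import numpy as np

def soft_threshold(x, lam):
    """tau_lambda(x) = sgn(x) * max(|x| - lambda, 0), applied element-wise."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([-2.0, -0.3, 0.0, 0.5, 3.0])
# Values inside [-1, 1] are set to zero; the rest shrink toward zero by 1.
assert np.allclose(soft_threshold(x, 1.0), [-1.0, 0.0, 0.0, 0.0, 2.0])
```

It shrinks each wavelet coefficient toward zero, which is exactly the proximal operator of the weighted \(\ell _1\) penalty on z.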
3.2 Dual-Domain Reconstruction Pipelines
Init-CNN and RIL. We realize the Init-CNN module with a heavy U-Net architecture with skip connections, which stabilizes training. Besides, the heavy U-Net shares parameters across different steps, which proves more powerful for the final reconstruction. The Radon Inversion Layer (RIL) was first introduced in DuDoNet [10], which builds dual-domain learning for reconstruction. We use this module to obtain the reconstructed image from the projection domain.
EPL. As introduced, the reduction of the angle range is the main bottleneck in the limited-angle scenario, and the usual interpolation techniques are not applicable here, since the missing views lie outside the acquired range. Yet few researchers have considered extrapolating sinograms with CNNs, even though sinograms provide additional image details: they contain both spatial and temporal (i.e., view-angle) information about the corresponding images. Moreover, compared with the differences observed in the image domain, sinograms from different data distributions share similarities along the temporal dimension. To exploit this, we propose a module called the "Extrapolation Layer (EPL)" to extrapolate sinograms before SENet.
As shown in Fig. 2, the EPL module is composed of three parallel convolutional neural networks, where the left and right networks predict neighboring sinograms on the corresponding sides and the middle one denoises the input. The outputs of the three networks are then concatenated, followed by the proposed supervision defined as follows:

\(\mathcal{L}_{EPL}=\Vert mask\odot ({Y}_{out}-{Y}_{gt})\Vert _2^2+\Vert f_{RIL}({Y}_{out})-{u}_{gt}\Vert _2^2,\)
where \({Y}_{out}\) is the predicted sinogram, \({Y}_{gt}\) is the corresponding ground truth, \({u}_{gt}\) is the ground-truth image, \(f_{RIL}(\cdot )\) denotes the Radon Inversion Layer, and mask is a binary matrix that emphasizes the bilateral prediction. Here, we utilize RIL to enforce dual-domain consistency for the prediction, which makes the module's estimate more accurate when embedded into the whole model.
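A minimal sketch of the masked supervision, under assumed details not specified in the text: hypothetical shapes (a 60-view input extrapolated by 30 views per side, 800 detectors) and an assumed weighting in which the binary mask re-penalizes the extrapolated bands on top of a plain fidelity term.

```python
import numpy as np

# Hypothetical shapes: 60 input views + 30 extrapolated views per side, 800 detectors.
n_in, n_extra, n_det = 60, 30, 800
rng = np.random.default_rng(0)
y_out = rng.random((n_in + 2 * n_extra, n_det))  # predicted sinogram
y_gt = rng.random((n_in + 2 * n_extra, n_det))   # ground-truth sinogram

# Binary mask marking the bilateral (extrapolated) bands.
mask = np.zeros_like(y_gt)
mask[:n_extra] = 1.0   # left extrapolated band
mask[-n_extra:] = 1.0  # right extrapolated band

# Plain fidelity term plus a masked term, so errors in the extrapolated
# regions are penalized twice (an assumed weighting scheme).
loss = np.mean((y_out - y_gt) ** 2) + np.mean((mask * (y_out - y_gt)) ** 2)
assert loss >= 0.0
```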
SENet. With the extrapolated sinograms, we then use SENet to enhance sinogram quality; it is designed as a light CNN, as shown in Fig. 3. Finally, the enhanced sinograms are mapped to the image domain via RIL, which helps align the optimization directions of our dual-domain learning. The objective for SENet is as follows:

\(\mathcal{L}_{SE}=\Vert {Y}_{se}-{Y}_{gt}\Vert _2^2+\Vert f_{RIL}({Y}_{se})-{u}_{gt}\Vert _2^2,\)
where \({Y}_{se}\) is the enhanced sinogram, \({Y}_{gt}\) and \({u}_{gt}\) are the corresponding ground-truth sinogram and image, respectively.
Loss function. With the above modules, the full objective function of EPNet is defined by:

\(\mathcal{L}=\sum _{i=1}^{N}\left( \Vert {u}_{i}-{u}_{gt}\Vert _2^2+\mathcal{L}_{ssim}({u}_{i},{u}_{gt})\right) +\mathcal{L}_{EPL}+\mathcal{L}_{SE},\)
where N is the total number of iterations of the unrolled backbone HQS-CG model, \(\{{u}_{i}\}_{i=1}^{N}\) are the reconstructed images of the iterations, and \(\mathcal{L}_{ssim}\) is the SSIM loss.
4 Experimental Results
4.1 Datasets and Experimental Settings
Datasets. We first train and test models on the "2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge" dataset [14]. Specifically, we choose 1,746 slices of five patients for training and 1,716 slices of another five patients for testing. To further show our models' generalization capability, we test them on 1,958 slices of four patients chosen from the COVID-19 dataset, and 1,635 slices of six patients from the LIDC dataset [4]. These two datasets are also composed of chest CT images, but from different scenarios and machines, which makes them good choices for testing generalization capability. All experiments are conducted with fan-beam geometry, and the number of detector elements is set to 800. Besides, we add mixed noise, composed of 5% Gaussian noise and Poisson noise with an intensity of \(5\times 10^{6}\), to all simulated sinograms [30].
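A sketch of the mixed-noise simulation, under an assumed noise model (Poisson noise applied in the photon-count domain at the stated intensity, plus additive Gaussian noise at 5% of the sinogram's scale); the exact formulation in [30] may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_mixed_noise(sinogram, gaussian_level=0.05, photon_count=5e6):
    """Assumed model: Poisson noise at a given photon intensity, applied in the
    count domain, plus additive Gaussian noise at 5% of the signal scale."""
    # Poisson noise on simulated photon counts, then back to line integrals.
    counts = rng.poisson(photon_count * np.exp(-sinogram)).astype(float)
    counts = np.maximum(counts, 1.0)  # avoid log(0)
    noisy = -np.log(counts / photon_count)
    # Additive Gaussian noise scaled to the sinogram's standard deviation.
    noisy += gaussian_level * sinogram.std() * rng.standard_normal(sinogram.shape)
    return noisy

clean = np.abs(rng.standard_normal((60, 800)))  # stand-in limited-angle sinogram
noisy = add_mixed_noise(clean)
assert noisy.shape == clean.shape
```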
Implementations and Training Settings. All the compared models are trained and tested with the corresponding number of angles (15, 30, 60, 90), except MetaInvNet_ori, which is trained with 180 angles, as Zhang et al. [30] do. Our models are implemented in the PyTorch framework. We use the Adam optimizer [9] with \(({\beta }_{1}, {\beta }_{2})\) = (0.9, 0.999) to train these models. The learning rate starts from 0.0001. All models are trained on an NVIDIA 3090 GPU for 10 epochs with a batch size of 1.
Evaluation Metric. Quantitative results are measured by the multi-scale structural similarity index (SSIM) (with level = 5, Gaussian kernel size = 11, and standard deviation = 1.5) [23] and the peak signal-to-noise ratio (PSNR) [22] (Table 2).
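For reference, the PSNR metric used above can be computed as follows (a standard definition; the `data_range` default assumes images scaled to [0, 1]).

```python
import numpy as np

def psnr(x, ref, data_range=1.0):
    """Peak signal-to-noise ratio in dB between an image x and a reference."""
    mse = np.mean((x - ref) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

ref = np.zeros((8, 8))
approx = ref + 0.1  # uniform error of 0.1 -> MSE = 0.01 -> PSNR = 20 dB
assert np.isclose(psnr(approx, ref), 20.0)
```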
4.2 Ablation Study
To investigate the effectiveness of the different modules and hyperparameters, we first conduct an ablation study with the following configurations, where the number of input sinogram angles is fixed to \(\alpha _{max}=60\):
a) EPL30: our model with the pretrained EPL fixed, extrapolating 30 angles;
b) EPL30\(_{re}\): our model with the pretrained EPL not fixed, extrapolating 30 angles;
c) EPL60: our model with the pretrained EPL fixed, extrapolating 60 angles;
d) EPL120: our model with the pretrained EPL fixed, extrapolating 120 angles;
e) DuDoEPL30: DuDoNet with our proposed EPL, extrapolating 30 angles.
The quantitative comparison is shown in Table 1. Comparing (a) and (b), retraining the parameters of the pretrained EPL module reduces the generalizability of our model, so we fix the EPL parameters in later experiments. We also investigate the most suitable number of extrapolated angles for EPL. Comparing models (a), (c), and (d), increasing the number of extrapolated angles from 30 to 120 leaves the reconstruction performance on AAPM-test unaffected but gradually degrades generalization performance. Therefore, we fix the number of extrapolated angles at 30 in all later experiments. Finally, inserting the module into DuDoNet [10] reduces reconstruction performance considerably, but the module still improves the generalization result by about 1.5 dB.
4.3 Quantitative and Qualitative Results Comparison
Quantitative Results Comparison. We then quantitatively compare our models with model-based and data-driven models. Results on the AAPM-test set show that our models and the retrained MetaInvNet [30] perform best. The original training setting of MetaInvNet achieves better generalization performance on the COVID-test and LIDC-test sets, but it requires more projections for training; our models achieve better generalizability than it in all cases except \(\alpha _{max}\) = 15, where the sinogram information fed into the extrapolation layer is extremely limited. On the other hand, HQS-CG maintains its performance across different data distributions; however, its prior-knowledge modeling limits its reconstruction performance on the AAPM-test set, and its tuning and computation time are too expensive.
Qualitative Results Comparison. We also visualize the reconstruction results of these methods on the AAPM-test and COVID-test datasets. As shown in the first three rows of Fig. 4, the reconstructed images from our method and the retrained MetaInvNet show the best visual quality on the AAPM-test set across different numbers of angles. Besides, our results show sharper details thanks to the additional use of \(\mathcal{L}_{SE}\) in the projection domain. On the COVID-test set, our result also gives sharper details, but with more artifacts, since the data distribution is very different. Although HQS-CG achieves better quantitative results on the COVID-test dataset, its reconstructed image in the fourth row is even smoother than FBP's.
5 Conclusion
We propose the novel EPNet for limited-angle CT image reconstruction; the model achieves strong generalization performance. We utilize dual-domain learning for data consistency in the two domains and propose an EPL module to estimate extra sinograms, which provide useful information for the final reconstruction. Quantitative and qualitative comparisons with competing methods verify the reconstruction performance and the generalizability of our model. This effectiveness encourages us to explore better architectures for EPL in future work.
References
Adler, J., Öktem, O.: Learned primal-dual reconstruction. IEEE Trans. Med. Imaging 37(6), 1322–1332 (2018)
Chen, H., et al.: Low-dose CT with a residual encoder-decoder convolutional neural network. IEEE Trans. Med. Imaging 36(12), 2524–2535 (2017)
Cheng, W., Wang, Y., Li, H., Duan, Y.: Learned full-sampling reconstruction from incomplete data. IEEE Trans. Comput. Imag. 6, 945–957 (2020)
Clark, K., et al.: The cancer imaging archive (TCIA): maintaining and operating a public information repository. J. Digit. Imaging 26(6), 1045–1057 (2013)
Ding, Q., Chen, G., Zhang, X., Huang, Q., Ji, H., Gao, H.: Low-dose CT with deep learning regularization via proximal forward backward splitting. Phys. Med. Biol. 65, 125009 (2020)
Geman, D., Yang, C.: Nonlinear image recovery with half-quadratic regularization. IEEE Trans. Image Process. 4(7), 932–946 (1995)
Gupta, H., Jin, K.H., Nguyen, H.Q., McCann, M.T., Unser, M.: CNN-based projected gradient descent for consistent CT image reconstruction. IEEE Trans. Med. Imaging 37(6), 1440–1453 (2018)
Jin, K.H., McCann, M.T., Froustey, E., Unser, M.: Deep convolutional neural network for inverse problems in imaging. IEEE Trans. Image Process. 26(9), 4509–4522 (2017)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Lin, W.A., et al.: DuDoNet: dual domain network for CT metal artifact reduction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10512–10521 (2019)
Lyu, Y., Lin, W.-A., Liao, H., Lu, J., Zhou, S.K.: Encoding metal mask projection for metal artifact reduction in computed tomography. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12262, pp. 147–157. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59713-9_15
Mahmood, F., Shahid, N., Skoglund, U., Vandergheynst, P.: Adaptive graph-based total variation for tomographic reconstructions. IEEE Signal Process. Lett. 25(5), 700–704 (2018)
Mardani, M., et al.: Neural proximal gradient descent for compressive imaging. arXiv preprint arXiv:1806.03963 (2018)
McCollough, C.: TU-FG-207A-04: overview of the low dose CT grand challenge. Med. Phys. 43(6Part35), 3759–3760 (2016)
Rantala, M., et al.: Wavelet-based reconstruction for limited-angle x-ray tomography. IEEE Trans. Med. Imaging 25(2), 210–217 (2006)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Schafer, S., et al.: Mobile C-arm cone-beam CT for guidance of spine surgery: image quality, radiation dose, and integration with interventional guidance. Med. Phys. 38(8), 4563–4574 (2011)
Sidky, E.Y., Pan, X.: Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization. Phys. Med. Biol. 53(17), 4777 (2008)
Solomon, O., et al.: Deep unfolded robust PCA with application to clutter suppression in ultrasound. IEEE Trans. Med. Imaging 39(4), 1051–1063 (2019)
Wang, G., Zhang, Y., Ye, X., Mou, X.: Machine Learning for Tomographic Imaging. IOP Publishing, Bristol (2019)
Wang, T., Nakamoto, K., Zhang, H., Liu, H.: Reweighted anisotropic total variation minimization for limited-angle CT reconstruction. IEEE Trans. Nucl. Sci. 64(10), 2742–2760 (2017)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image quality assessment. In: The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, vol. 2, pp. 1398–1402. IEEE (2003)
Yang, Q., et al.: Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss. IEEE Trans. Med. Imaging 37(6), 1348–1357 (2018)
Yang, Y., Sun, J., Li, H., Xu, Z.: Deep ADMM-Net for compressive sensing MRI. Adv. Neural. Inf. Process. Syst. 29, 10–18 (2016)
Yang, Y., Sun, J., Li, H., Xu, Z.: ADMM-CSNet: a deep learning approach for image compressive sensing. IEEE Trans. Pattern Anal. Mach. Intell. 42(3), 521–538 (2018)
Zeng, D., et al.: Spectral CT image restoration via an average image-induced nonlocal means filter. IEEE Trans. Biomed. Eng. 63(5), 1044–1057 (2015)
Zhang, H.M., Dong, B.: A review on deep learning in medical image reconstruction. J. Oper. Res. Soc. China, 1–30 (2020)
Zhang, H., Dong, B., Liu, B.: JSR-Net: a deep network for joint spatial-radon domain CT reconstruction from incomplete data. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3657–3661. IEEE (2019)
Zhang, H., Liu, B., Yu, H., Dong, B.: MetaInv-Net: meta inversion network for sparse view CT image reconstruction. IEEE Trans. Med. Imaging 40(2), 621–634 (2021)
Zhang, Y., et al.: LEARN++: recurrent dual-domain reconstruction network for compressed sensing CT. arXiv preprint arXiv:2012.06983 (2020)
Zhou, S.K., et al.: A review of deep learning in medical imaging: imaging traits, technology trends, case studies with progress highlights, and future promises. In: Proceedings of the IEEE (2021)
Zhou, S.K., Rueckert, D., Fichtinger, G.: Handbook of Medical Image Computing and Computer Assisted Intervention. Academic Press, San Diego (2019)
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 11831002, in part by the Beijing Natural Science Foundation under Grant 180001, in part by the NSFC under Grant 12090022, and in part by the Beijing Academy of Artificial Intelligence (BAAI).
© 2021 Springer Nature Switzerland AG
Wang, C. et al. (2021). Improving Generalizability in Limited-Angle CT Reconstruction with Sinogram Extrapolation. In: de Bruijne, M., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. MICCAI 2021. Lecture Notes in Computer Science(), vol 12906. Springer, Cham. https://doi.org/10.1007/978-3-030-87231-1_9
DOI: https://doi.org/10.1007/978-3-030-87231-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87230-4
Online ISBN: 978-3-030-87231-1