
1 Introduction

Image reconstruction from down-sampled or limited measurements, e.g., in low-dose and limited-angle CT, is a typical ill-posed inverse problem, which can be formulated as estimating the image \(u\in X\) from the measurement \(g\in Y\),

$$\begin{aligned} g = Au + n, \end{aligned}$$
(1)

where the reconstruction space X and the data space Y are typically Hilbert spaces, \(A:X\rightarrow Y\) is the projection (system) matrix, and \(n\in Y\) is the random noise generated during the imaging process. The goal of CT reconstruction is to recover the image u from the acquired projection data g. For sparse-view CT, the system matrix, denoted by \(A_S\), has fewer rows than columns, so it has a nontrivial null space and the reconstruction problem has infinitely many solutions. Even if the solution of the inverse problem exists and is unique, the linear operator A may still be ill-conditioned, i.e., its condition number \(\Vert A\Vert \Vert A^{-1}\Vert \) is large, so that the linear system (1) is sensitive to perturbations in the data.
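Both difficulties can be illustrated with a minimal NumPy sketch. The matrices are small hypothetical stand-ins, not CT system matrices: a random 4×10 matrix plays the role of an under-determined \(A_S\), whose null space lets two different images produce identical data, and a nearly singular 2×2 matrix plays the role of a uniquely solvable but ill-conditioned A, where a tiny data perturbation produces a large change in the solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Under-determined case: 4 "rays" and 10 "pixels" (hypothetical sizes).
A_S = rng.standard_normal((4, 10))
nullity = A_S.shape[1] - np.linalg.matrix_rank(A_S)   # dimension of the null space
_, _, Vt = np.linalg.svd(A_S)                          # full SVD, Vt is 10 x 10
z = Vt[-1]                                             # unit vector in null(A_S)
u = rng.standard_normal(10)
u_other = u + 5.0 * z                                  # a different image ...
same_data = np.allclose(A_S @ u, A_S @ u_other)        # ... with identical data

# Uniquely solvable but ill-conditioned case.
A = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-6]])
kappa = np.linalg.cond(A)                              # ||A||_2 * ||A^{-1}||_2
u_true = np.array([1.0, 1.0])
g = A @ u_true
dg = np.array([0.0, 1e-6])                             # tiny data perturbation
u_pert = np.linalg.solve(A, g + dg)                    # moves to about [0, 2]

rel_data_change = np.linalg.norm(dg) / np.linalg.norm(g)
rel_image_change = np.linalg.norm(u_pert - u_true) / np.linalg.norm(u_true)
```

The relative change in the solution is about \(10^6\) times the relative change in the data, which stays below the bound given by \(\kappa (A)\approx 4\times 10^6\).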

A common approach to an ill-posed inverse problem is to introduce regularization that guarantees the existence, uniqueness and stability of the solution. The general regularization method leads to the following energy minimization problem

$$\begin{aligned} \min _{u\in X} ~ \mathcal D(Au,g)+\mathcal R(u), \end{aligned}$$
(2)

where \(\mathcal D(Au,g)\) is the data fidelity term and \(\mathcal R(u)\) is the regularization term. Solving (2) therefore involves two modeling questions: how to define the data fidelity so that it describes the relationship between g and u, and how to design the regularization according to prior information about u. In the case of additive Gaussian noise and a piecewise-constant image u, this yields the well-known total variation (TV) minimization model for CT reconstruction [7]. Although TV regularization improves reconstruction quality compared with analytical methods such as filtered back-projection (FBP), choosing the data fidelity and regularization terms judiciously by hand remains difficult.
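As a concrete instance of (2), the sketch below takes the Gaussian data term \(\mathcal D(Au,g)=\frac12\Vert Au-g\Vert ^2\) and a 1D total variation regularizer \(\lambda \Vert Du\Vert _1\), with D the forward-difference operator. To keep the code short it replaces \(|t|\) by the smoothed \(\sqrt{t^2+\varepsilon ^2}\) and uses plain gradient descent with step 1/L; this solver, the 1D toy signal, the identity forward operator and the values of \(\lambda \) and \(\varepsilon \) are assumptions for illustration, not the algorithm of [7].

```python
import numpy as np

def tv_reconstruct(A, g, lam=0.1, eps=1e-2, n_iter=3000):
    """Gradient descent on 0.5*||A u - g||^2 + lam * sum_i sqrt((D u)_i^2 + eps^2).

    D is the 1D forward-difference operator; eps smooths |.| so the objective
    is differentiable (a simplification of the exact TV model).
    """
    n = A.shape[1]
    D = np.diff(np.eye(n), axis=0)                 # (n-1) x n, (Du)_i = u_{i+1} - u_i
    # Upper bound on the gradient's Lipschitz constant, using ||D||_2^2 <= 4.
    L = np.linalg.norm(A, 2) ** 2 + lam * 4.0 / eps
    u = A.T @ g                                    # simple initialisation
    for _ in range(n_iter):
        Du = D @ u
        grad = A.T @ (A @ u - g) + lam * (D.T @ (Du / np.sqrt(Du**2 + eps**2)))
        u = u - grad / L
    return u

rng = np.random.default_rng(1)
# Piecewise-constant toy signal of length 64 with two jumps.
u_true = np.concatenate([np.zeros(20), np.ones(24), 0.5 * np.ones(20)])
A = np.eye(64)                                     # toy forward operator (denoising)
g = A @ u_true + 0.1 * rng.standard_normal(64)     # additive Gaussian noise
u_tv = tv_reconstruct(A, g)

err_noisy = np.linalg.norm(g - u_true)
err_tv = np.linalg.norm(u_tv - u_true)
```

Increasing \(\lambda \) flattens the estimate further at the cost of shrinking the jump heights, which is the usual TV trade-off that must be tuned by hand.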

Owing to the success of deep convolutional neural networks (CNNs) in a broad range of computer vision tasks, deep learning techniques are being actively used in the medical imaging community. The pioneering work of Yang et al. [8] unrolled an ADMM algorithm for compressive sensing MR imaging into a deep network whose parameters are learned end-to-end in the training phase. Jin et al. [4] used a deep CNN as a post-processing step after FBP reconstruction to mitigate noise and artifacts. Adler and Öktem [1] proposed the learned primal-dual algorithm for CT reconstruction by unrolling the proximal primal-dual optimization method and replacing the proximal operators with CNNs. Liu, Kuang and Zhang [6] used a deep learning regularization structure to learn data consistency from the observed data. Dong, Li and Shen [3] proposed a joint spatial-Radon domain reconstruction (JSR) model for sparse-view CT imaging, which was recently reformulated into a feed-forward deep network [9]. Learning-based models have already proven effective for image reconstruction problems.

In this work, we aim to reconstruct sparse-view CT images by making use of the full-sampling system matrix; we call the approach learned full sampling reconstruction (FSR). Instead of modeling the data fidelity term according to the noise distribution and the regularization term according to prior information, we take advantage of deep CNNs to learn both the relationship between the observed data and the reconstructed image and the prior information directly from the data. Since a full-sampling system matrix can be obtained from the sufficient-sampling conditions in [5], we introduce an additional fidelity term that enforces the closeness of the reconstructed image and the full-sampling projection data. In this way, we learn the prior information of the completed Radon domain data from the training data, which is then used to approximate the full-sampling projection at test time. We use the alternating direction method to derive an iterative scheme and use CNNs to find the best update in each iteration. Numerical experiments demonstrate that the proposed FSR-net achieves better sparse-view CT reconstruction quality than the compared learned methods.

2 Our Approach

In CT reconstruction, the system matrix A describes the relationship between the projections on the detector and the reconstructed object. For circular fan-beam CT, the system matrix A has dimensions \(M\times N_{\mathrm {pix}}\), where \(N_{\mathrm {pix}}\) denotes the total number of pixels and M is the number of ray integrals, given by

$$ M = N_{\mathrm {views}}\times N_{\mathrm {bins}} $$

with \(N_{\mathrm {views}}\) being the number of views (i.e., the \(2\pi \) arc is divided into \(N_{\mathrm {views}}\) equally spaced angular intervals) and \(N_{\mathrm {bins}}\) the number of detector bins (i.e., the detector is divided into \(N_{\mathrm {bins}}\) equal cells). Before discussing sparse-view CT, we define full sampling based on the four sufficient-sampling conditions (SSCs) in [5], which specify the sampling parameters \(N_{\mathrm {views}}\) and \(N_{\mathrm {bins}}\) for a given \(N_{\mathrm {pix}}\) so as to characterize the invertibility and stability of the system matrix. The first pair of SSCs characterizes the invertibility of A:

$$\begin{aligned} \mathrm {SSC1:} ~M\ge N_{\mathrm {pix}}~~\hbox {and}\quad \mathrm {SSC2:} ~\sigma _{\mathrm {min}}\ne 0, \end{aligned}$$

where \(\sigma _{\mathrm {min}}\) is the smallest singular value of A. The second pair of SSCs characterizes the numerical stability of inverting A:

$$\begin{aligned} \mathrm {SSC3:} ~\frac{\kappa (A)}{\kappa _{DC}}<r_{\hbox {samp}}~~~\hbox {and}\quad \mathrm {SSC4:} ~N_{\mathrm {views}}=N_{\mathrm {bins}}=2N, \end{aligned}$$

where \(\kappa (A)\) is the condition number of A, \(r_{\hbox {samp}}\) is a finite ratio parameter greater than 1, and N is the length of the field of view (FOV) of the detector. The relationship between N and \(N_{\mathrm {pix}}\) is \( N_{\mathrm {pix}}\approx \frac{\pi }{4}N^2\), and we simply let \(N\approx \sqrt{N_{\mathrm {pix}}}\). Both SSC1 and SSC4 are simple to evaluate and are used in our work.
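SSC1 and SSC4 reduce to simple arithmetic on the sampling parameters. The helpers below are a sketch with hypothetical function names: they compute the smallest \(N_{\mathrm {views}}\) satisfying SSC1 for a given \(N_{\mathrm {bins}}\), the SSC4 values under the simplification \(N\approx \sqrt{N_{\mathrm {pix}}}\), and a sparse-view test that, as a simplifying assumption, checks only SSC1. For a 512×512 image they reproduce \(N_{\mathrm {views}}=512\) (SSC1 with \(N_{\mathrm {bins}}=512\)) and \(N_{\mathrm {views}}=N_{\mathrm {bins}}=1024\) (SSC4), the values used for the piglet data in Sect. 3.3.

```python
import math

def ssc1_min_views(n_pix: int, n_bins: int) -> int:
    """Smallest N_views such that M = N_views * N_bins >= N_pix (SSC1)."""
    return math.ceil(n_pix / n_bins)

def ssc4_sampling(n_pix: int) -> tuple[int, int]:
    """SSC4: N_views = N_bins = 2N, with the simplification N ~ sqrt(N_pix)."""
    n = round(math.sqrt(n_pix))
    return 2 * n, 2 * n

def is_sparse_view(n_views: int, n_bins: int, n_pix: int) -> bool:
    """Assumption: call the setup sparse-view when SSC1 (M >= N_pix) fails."""
    return n_views * n_bins < n_pix

n_pix = 512 * 512
print(ssc1_min_views(n_pix, 512))        # 512
print(ssc4_sampling(n_pix))              # (1024, 1024)
print(is_sparse_view(64, 512, n_pix))    # True
```

SSC2 and SSC3 depend on the singular values of the actual matrix A, so they cannot be checked from the sampling parameters alone.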

When \(N_{\mathrm {views}}\) is not large enough to satisfy the SSCs for a fixed \(N_{\mathrm {bins}}\), we regard the problem as sparse-view CT. Our goal is to develop efficient reconstruction methods for this ill-posed inverse problem. Since the full-sampling system matrix can be constructed according to the SSCs, we directly bridge the completed Radon domain data \(f\in Z\) and the reconstructed image \(u\in X\) through the full-sampling system matrix:

$$\begin{aligned} f =A_F u, \end{aligned}$$

where \(A_F: X\rightarrow Z\) is the full-sampling projection matrix and Z is a Hilbert space. Therefore, we propose the following minimization model to jointly reconstruct the spatial and Radon domain data for sparse-view CT

$$\begin{aligned} \min _{u\in X,f\in Z} \mathcal D(A_Su,g) + \mathcal R(u) + \mathcal F(A_Fu,f), \end{aligned}$$
(3)

where \(\mathcal F(A_Fu,f)\) measures the distance between \(A_Fu\) and f. Since the unknowns u and f are coupled in (3), we introduce an auxiliary variable \(\tilde{u}\) and rewrite (3) by adding the fitting term \(\Vert \tilde{u} - u \Vert ^2\):

$$\begin{aligned} \min _{u\in X, f\in Z,\tilde{u}\in X} \mathcal D(A_S\tilde{u}, g) + \mathcal R(\tilde{u}) + \mathcal F(A_Fu,f) + \frac{1}{2r}\Vert \tilde{u} - u \Vert ^2_X, \end{aligned}$$
(4)

where r is a positive parameter that balances the under-sampled data g and the full-sampling projection data f. The first term in (4) contains the linear operator \(A_S\) and can be reformulated via the Legendre-Fenchel conjugate [2] as

$$\begin{aligned} \min _{u\in X, f\in Z, \tilde{u}\in X} \max _{p\in Y} ~\langle A_S\tilde{u},p\rangle -\mathcal D^*(p,g)+\mathcal R(\tilde{u})+ \mathcal F(A_Fu,f)+ \frac{1}{2r}\Vert \tilde{u} -u\Vert _X^2, \end{aligned}$$
(5)
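As a worked example, assume the Gaussian data term \(\mathcal D(v,g)=\frac12\Vert v-g\Vert _Y^2\) (an illustrative choice; the model keeps \(\mathcal D\) general). Its conjugate with respect to v, and the inner maximization over p in (5), can then be computed in closed form:

$$\begin{aligned} \mathcal D^*(p,g) &= \sup _{v\in Y}~\langle v,p\rangle -\tfrac12\Vert v-g\Vert _Y^2 = \langle g,p\rangle +\tfrac12\Vert p\Vert _Y^2 \quad (\text{attained at } v=g+p),\\ \max _{p\in Y}~\langle A_S\tilde{u},p\rangle -\mathcal D^*(p,g) &= \max _{p\in Y}~\langle A_S\tilde{u}-g,p\rangle -\tfrac12\Vert p\Vert _Y^2 = \tfrac12\Vert A_S\tilde{u}-g\Vert _Y^2 \quad (\text{attained at } p=A_S\tilde{u}-g). \end{aligned}$$

Hence, for this choice, the saddle-point form (5) recovers the least-squares data term of (4) at the optimal dual variable \(p=A_S\tilde{u}-g\).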

where \(\mathcal D^*\) denotes the convex conjugate of \(\mathcal D\) with respect to its first argument. Applying the classical alternating direction method to the multi-variable problem (5) gives

$$\begin{aligned} \left\{ \begin{array}{ll} p^{k+1} = \arg \min \limits _{p\in Y} ~~ \mathcal D^*(p,g) - \langle A_S\tilde{u}^k,p\rangle + \frac{1}{2\tau }\Vert p-p^k\Vert _Y^2,\\ \tilde{u}^{k+1} = \arg \min \limits _{\tilde{u}\in X} ~~ \mathcal R(\tilde{u})+\langle A_S\tilde{u},p^{k+1}\rangle + \frac{1}{2r}\Vert \tilde{u}-u^k\Vert _X^2,\\ f^{k+1} = \arg \min \limits _{f\in Z} ~~ \mathcal F(A_Fu^k,f) + \frac{1}{2\sigma } \Vert f-f^k\Vert _Z^2,\\ u^{k+1} = \arg \min \limits _{u\in X} ~~ \mathcal F(A_Fu,f) + \frac{1}{2r}\Vert u-\tilde{u}^{k+1}\Vert _X^2, \end{array} \right. \end{aligned}$$
(6)

where \(\tau \) and \(\sigma \) are positive parameters. Proximal terms are used in the subproblems for p and f because the likelihood functionals \(\mathcal D(\cdot ,\cdot )\) and \(\mathcal F(\cdot ,\cdot )\) may be non-smooth. The solutions of the subproblems can be written as

$$\begin{aligned} \left\{ \begin{array}{ll} p^{k+1} = (\mathcal I + \tau \partial \mathcal D^*)^{-1}(p^k, \tau A_S\tilde{u}^k, g),\\ \tilde{u}^{k+1} = (\mathcal I +r\partial \mathcal R)^{-1}(u^k, rA^*_Sp^{k+1}),\\ f^{k+1} = (\mathcal I +\sigma \partial \mathcal F)^{-1}(f^k, \sigma A_Fu^k),\\ u^{k+1} = (\mathcal I+r\partial \mathcal F)^{-1} (\tilde{u}^{k+1}, rA^*_Ff^{k+1}). \end{array} \right. \end{aligned}$$
(7)
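To make the updates in (6) and (7) concrete, the sketch below assumes quadratic choices for which every proximal step has a closed form: \(\mathcal D(v,g)=\frac12\Vert v-g\Vert ^2\), \(\mathcal F(v,f)=\frac12\Vert v-f\Vert ^2\) and a Tikhonov stand-in \(\mathcal R(u)=\frac{\lambda }{2}\Vert u\Vert ^2\). These choices and the function names are hypothetical; in FSR-net the four maps are replaced by learned CNNs. The update order follows (6), with the f-update using \(\tilde{u}^{k+1}\) as in Remark 1.

```python
import numpy as np

# Hypothetical quadratic choices (not the learned operators of FSR-net):
#   D(v, g) = 0.5||v - g||^2  =>  D*(p, g) = 0.5||p||^2 + <g, p>
#   F(v, f) = 0.5||v - f||^2,  R(u) = 0.5 * lam * ||u||^2 (Tikhonov stand-in)

def update_p(p, A_S, u_tilde, g, tau):
    """argmin_p D*(p,g) - <A_S u_tilde^k, p> + ||p - p^k||^2 / (2 tau)."""
    return (p + tau * (A_S @ u_tilde - g)) / (1.0 + tau)

def update_u_tilde(u, A_S, p_new, r, lam):
    """argmin_x R(x) + <A_S x, p^{k+1}> + ||x - u^k||^2 / (2 r)."""
    return (u - r * (A_S.T @ p_new)) / (1.0 + r * lam)

def update_f(f, A_F, u_tilde_new, sigma):
    """argmin_f F(A_F u_tilde^{k+1}, f) + ||f - f^k||^2 / (2 sigma)."""
    return (f + sigma * (A_F @ u_tilde_new)) / (1.0 + sigma)

def update_u(u_tilde_new, A_F, f_new, r):
    """argmin_u F(A_F u, f^{k+1}) + ||u - u_tilde^{k+1}||^2 / (2 r), via normal equations."""
    n = A_F.shape[1]
    lhs = A_F.T @ A_F + np.eye(n) / r
    rhs = A_F.T @ f_new + u_tilde_new / r
    return np.linalg.solve(lhs, rhs)

def one_iteration(state, A_S, A_F, g, tau, sigma, r, lam):
    """One pass of scheme (6) in the order p, u_tilde, f, u."""
    p, u_tilde, f, u = state
    p = update_p(p, A_S, u_tilde, g, tau)
    u_tilde = update_u_tilde(u, A_S, p, r, lam)
    f = update_f(f, A_F, u_tilde, sigma)      # uses u_tilde^{k+1}, as in Remark 1
    u = update_u(u_tilde, A_F, f, r)
    return p, u_tilde, f, u
```

Iterating this fixed map is not guaranteed to converge for arbitrary \(\tau \), \(\sigma \) and r; in FSR-net both the maps and these parameters are learned instead.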

Motivated by the success of deep learning, we unroll this iterative scheme and replace the inverse operators in (7) by CNNs, so that the network learns how to combine the variables in the objective functional; the result is a deep feed-forward neural network. The alternating direction algorithm with I iterations is outlined in Algorithm 1.

Remark 1

In the algorithm, we assume that the constraint \(\tilde{u}=u\) holds unconditionally. Therefore, \(f^{k+1}\) is computed from \(A_F\tilde{u}^{k+1}\) rather than \(A_Fu^k\), since \(\tilde{u}^{k+1}\) has already been updated in the previous step. Moreover, instead of selecting specific values for \(\tau \), \(\sigma \) and r, we let the network learn appropriate values.

Algorithm 1. Alternating direction algorithm with I iterations.

3 Experiments and Results

In this section, we evaluate the proposed algorithm on the ellipse data [1] and on piglet data (Footnote 1), comparing it with state-of-the-art methods, i.e., FBP-Unet denoising [4] and the Learned Primal-Dual network (PD-net) [1].

3.1 Implementation

The methods are implemented in Python using the Operator Discretization Library (ODL) and TensorFlow. We set the number of channels that persist between iterates to \(N_{u}=N_{\tilde{u}} = 6\) and \(N_{p}=N_f = 7\). All convolutions use \(3\times 3\) kernels, and the numbers of channels in each iteration are \(9\rightarrow 32 \rightarrow 32\rightarrow 7\) for p, \(7 \rightarrow 32 \rightarrow 32 \rightarrow 6\) for \(\tilde{u}\), \(8 \rightarrow 32 \rightarrow 32 \rightarrow 7\) for f, and \(7 \rightarrow 32 \rightarrow 32 \rightarrow 6\) for u. The network structure of one iteration is illustrated in Fig. 1; the full network contains 10 iterations. Each iteration involves four 3-layer CNNs, so the network depth is 120 layers. Our FSR-net has approximately \(4.9\times 10^5\) parameters, while FBP-Unet and PD-net have \(10^{7}\) and \(2.4\times 10^5\) parameters, respectively.
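The input channel counts are consistent with the following bookkeeping, which is our reading rather than a statement in the text: each 3-layer block receives the memory channels of the variable it reads in (7) plus one channel per operator output or datum it consumes, namely \(A_S\tilde{u}\) and g for p, \(A_S^*p\) for \(\tilde{u}\), \(A_F\tilde{u}\) for f (cf. Remark 1) and \(A_F^*f\) for u. The sketch below (all names hypothetical) reproduces the four channel sequences and the depth of 120 layers.

```python
# Hypothetical bookkeeping of FSR-net channel widths (our reading of Sect. 3.1).
N_MEM = {"p": 7, "u_tilde": 6, "f": 7, "u": 6}   # N_p, N_u_tilde, N_f, N_u
HIDDEN = 32            # width of the two hidden 3x3 convolution layers
LAYERS_PER_BLOCK = 3
N_ITER = 10

# Single-channel extra inputs each block consumes, following (7) and Remark 1.
EXTRA_INPUTS = {
    "p": ["A_S u_tilde", "g"],
    "u_tilde": ["A_S^* p"],
    "f": ["A_F u_tilde"],
    "u": ["A_F^* f"],
}
# Memory tensor fed to each block in (7): p^k, u^k, f^k and u_tilde^{k+1}.
INPUT_MEMORY = {"p": "p", "u_tilde": "u", "f": "f", "u": "u_tilde"}

def block_channels(var):
    """Channel sequence (c_in, 32, 32, c_out) of the 3-layer CNN updating `var`."""
    c_in = N_MEM[INPUT_MEMORY[var]] + len(EXTRA_INPUTS[var])
    return (c_in, HIDDEN, HIDDEN, N_MEM[var])

channels = {v: block_channels(v) for v in N_MEM}
depth = N_ITER * len(N_MEM) * LAYERS_PER_BLOCK
print(channels)
print(depth)  # 120
```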

Fig. 1.

Network architecture for solving the tomography problem. Each box corresponds to the update of one variable; all boxes share the same architecture.

We use the Xavier initialization scheme for the convolution parameters and initialize all biases to zero. Let \(\varTheta = \{\theta ^p, \theta ^{\tilde{u}}, \theta ^f, \theta ^u\}\) and let \(\mathcal T^\dagger _\varTheta \) be the learned pseudo-inverse of the minimization process (3), defined as

$$\begin{aligned} \mathcal T^{\dagger }_\varTheta (g)\approx (u_{\mathrm {true}},f_{\mathrm {true}}) \quad \text{for data } g \text{ satisfying (1)}. \end{aligned}$$

Let \((\mathcal T_\varTheta (u),\mathcal T_\varTheta (f)) = \mathcal T^\dagger _\varTheta (g)\) denote the network outputs, and let \((g_1,u^*_1), (g_2,u^*_2),\ldots ,(g_L, u^*_L)\) be L training samples. We use the ADAM optimizer in TensorFlow to minimize the empirical loss function

$$\begin{aligned} \mathcal L (\varTheta )= \frac{1}{2L}\sum _{i=1}^L\Big (\left\| \mathcal T_\varTheta (u_i)-u^*_i\right\| _{X}^{2}+\left\| \mathcal T_\varTheta (f_i)-A_Fu^*_i\right\| _{Z}^{2}\Big ). \end{aligned}$$
(8)
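A minimal NumPy sketch of the empirical loss (8) follows. It assumes flattened images, plain Euclidean norms for \(\Vert \cdot \Vert _X\) and \(\Vert \cdot \Vert _Z\) (a discretized space may carry additional cell-size weights), and a dense matrix standing in for \(A_F\); the function name is hypothetical. The second term ties the estimated sinogram to the full-sampling projection of the ground truth.

```python
import numpy as np

def fsr_loss(u_pred, f_pred, u_true, A_F):
    """Empirical loss (8): (1/2L) * sum_i ||u_pred_i - u*_i||^2 + ||f_pred_i - A_F u*_i||^2.

    u_pred, u_true: arrays of shape (L, n_pix) holding flattened images;
    f_pred: array of shape (L, m_full); A_F: dense (m_full, n_pix) matrix.
    """
    L = u_true.shape[0]
    f_true = u_true @ A_F.T                  # row i is A_F u*_i
    img_term = np.sum((u_pred - u_true) ** 2)
    sino_term = np.sum((f_pred - f_true) ** 2)
    return (img_term + sino_term) / (2.0 * L)
```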

Most parameters are set as for PD-net in [1]. We train for \(2\times 10^5\) batches on each problem with a cosine annealing learning rate schedule, i.e., the learning rate at step t is \(\eta ^t=\frac{\eta ^0}{2}\big (1+\cos (\pi \frac{t}{t_{max}})\big ),\) where \(t_{max}\) is the total number of training steps and the initial learning rate is \(\eta ^0=10^{-3}\) for the ellipse data and \(\eta ^0=10^{-4}\) for the piglet phantom. We set the parameter \(\beta _2\) of the ADAM optimizer to 0.99 and limit the gradient norms to 1 to improve training stability. The batch size is 5 for the ellipse data and 1 for the piglet phantom, and the number of epochs is 22 for both datasets.
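The schedule and the gradient limiting can be sketched in a few lines of Python. The cosine formula is taken directly from the text; interpreting "limit the gradient norms to 1" as clipping by the global norm over all parameter tensors is our assumption, and the function names are hypothetical.

```python
import math
import numpy as np

def cosine_lr(t, t_max, eta0):
    """Cosine annealing: eta_t = eta0/2 * (1 + cos(pi * t / t_max))."""
    return 0.5 * eta0 * (1.0 + math.cos(math.pi * t / t_max))

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a list of gradient arrays so that their joint L2 norm is at most max_norm."""
    total = math.sqrt(sum(float(np.sum(g * g)) for g in grads))
    scale = min(1.0, max_norm / total) if total > 0 else 1.0
    return [g * scale for g in grads]
```

In TensorFlow, `tf.keras.optimizers.schedules.CosineDecay` (with its default `alpha=0.0`) implements the same schedule, and `tf.clip_by_global_norm` performs global-norm clipping.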

Fig. 2.

Reconstruction comparison on the ellipse data, with the display window set to [0.1, 0.4].

3.2 Results on Ellipse Phantoms

Similar to [1], we randomly generate ellipses on a \(128\times 128\) pixel domain and use a parallel-beam projection geometry with \(N_{\mathrm {bins}}=128\) and \(N_{\mathrm {views}}=15\) or \(N_{\mathrm {views}}=30\). Additive white Gaussian noise of 5% and 10% is added to the projection data. We use the full-sampling system matrix provided by ODL for parallel-beam CT as \(A_F\) in our model (3). Table 1 presents the PSNR and SSIM obtained by the CNN-based models. The best PSNR values are always achieved by our FSR-net, with PD-net ranking second; both are significantly better than the Unet-based post-processing method. In particular, the advantage of FSR-net over PD-net is more pronounced for \(N_{\mathrm {views}}=15\) and 5% Gaussian noise, where the improvement exceeds 1 dB, which demonstrates the effectiveness of our model for sparse-view reconstruction. Comparing the PSNR and SSIM of FBP applied to g with those of FBP applied to the reconstructed projection data f (Table 1) also shows that our model recovers the Radon domain data to a certain quality. The reconstruction results for 15 views with 5% Gaussian noise are displayed in Fig. 2, which shows that our reconstruction preserves geometry and details better than the other two methods.
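Since the comparison is reported in PSNR, the sketch below shows the usual definition \(10\log _{10}(\text{peak}^2/\text{MSE})\). Taking the peak as the dynamic range of the reference image is an assumption, because the text does not state the convention. Under any fixed convention, a 1 dB gain corresponds to reducing the MSE by a factor of \(10^{0.1}\approx 1.26\).

```python
import numpy as np

def psnr(x, ref, data_range=None):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE).

    If data_range is not given, the dynamic range of the reference is used
    (an assumed convention).
    """
    x = np.asarray(x, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if data_range is None:
        data_range = ref.max() - ref.min()
    mse = np.mean((x - ref) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)
```

scikit-image provides `skimage.metrics.peak_signal_noise_ratio` and `skimage.metrics.structural_similarity`, both with an explicit `data_range` argument, for PSNR and SSIM.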

Table 1. Comparison results for the ellipse phantom in terms of PSNR, SSIM and Runtime.

Moreover, we apply the model trained with 30 views and 5% Gaussian noise to test data of different sparsity, i.e., g obtained with \(N_{\mathrm {views}}=30, 25, 20\). Table 2 compares the results with the learned PDHG net, primal-net and PD-net from [1] in terms of PSNR and SSIM. Our model adapts more stably to the different test data, which we attribute to the fidelity term that fits the reconstructed image to the full-sampling projection data.

3.3 Results on Piglet Phantom Data

We test the proposed model on simulated CT data of a deceased piglet, scanned with a 64-slice multi-detector CT scanner (Discovery CT750 HD, GE Healthcare) at 100 kV with 0.625 mm slice thickness. We use 896 images of size \(512\times 512\) as ground truth for training and 10 for evaluation. We adopt a fan-beam geometry with \(N_{\mathrm {bins}}=512\) or \(N_{\mathrm {bins}}=1024\), a source-to-axis distance of 500 mm and an axis-to-detector distance of 500 mm. The numbers of views are set as follows:

  • For \(N_{\mathrm {bins}}=512\), the observed data g are generated from 64 views uniformly distributed over the \(2\pi \) arc, with Poisson noise at two levels, \(10^4\) and \(5\times 10^5\) incident photons per pixel before attenuation. The full-sampling system matrix \(A_F\) is constructed according to SSC1, i.e., \(N_{\mathrm {views}}=512\);

  • For \(N_{\mathrm {bins}}=1024\), the observed data g are generated with either 120 or 60 views and Poisson noise with \(10^4\) incident photons. The full-sampling system matrix \(A_F\) is defined according to SSC4, i.e., \(N_{\mathrm {views}}=1024\). Because \(N_{\mathrm {views}}=1024\) imposes too high a computational burden, we use \(N_{\mathrm {views}}=720\) in practice.
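The sampling and noise settings above can be simulated as follows. This is a sketch under stated assumptions: the views are equally spaced over \([0,2\pi )\) without a repeated endpoint, the photon count is applied per ray through the Beer-Lambert model, zero counts are clipped to one before the logarithm, and the function names are hypothetical. It also checks that 120 and 60 views are exact subsets of the 720 views used for \(A_F\), and likewise 64 views of 512.

```python
import numpy as np

def view_angles(n_views):
    """Equally spaced source angles over [0, 2*pi); the endpoint 2*pi is excluded."""
    return np.linspace(0.0, 2.0 * np.pi, n_views, endpoint=False)

def add_poisson_noise(sino, photons, rng):
    """Poisson noise on line integrals: counts ~ Poisson(photons * exp(-sino))."""
    counts = rng.poisson(photons * np.exp(-sino))
    counts = np.maximum(counts, 1)              # avoid log(0) under photon starvation
    return -np.log(counts / photons)

# Sparse views are subsets of the full-sampling views used for A_F.
full = view_angles(720)
print(np.allclose(full[::6], view_angles(120)), np.allclose(full[::12], view_angles(60)))
print(np.allclose(view_angles(512)[::8], view_angles(64)))

rng = np.random.default_rng(0)
clean = np.linspace(0.0, 3.0, 1024)             # hypothetical line integrals
noisy = add_poisson_noise(clean, 1e4, rng)      # 10^4 incident photons
```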

As shown in Tables 3 and 4, our model outperforms the other methods in reconstruction quality. In particular, when the parameters trained with \(N_{\mathrm {views}}=64\) are used to reconstruct sparser data such as \(N_{\mathrm {views}}=32,28,24\), our model achieves a PSNR 0.5 to 3 dB higher than PD-net. The reconstructed images and error maps are displayed in Fig. 3; the first column shows the FBP reconstruction from the observed data g (1st row) and from our estimated full-sampling measurement f (2nd row). These results show that our model effectively inpaints the Radon domain data and improves reconstruction quality.

Table 2. Reconstruction comparison on the ellipse phantom for different sparsities.
Table 3. Comparison results for the piglet phantom in terms of PSNR, SSIM and Runtime.
Table 4. Reconstruction comparison on the piglet phantom for different sparsities.

4 Conclusion

We proposed a novel iterative reconstruction model that fits the reconstructed image to its corresponding Radon domain measurements through the full-sampling system matrix. The new algorithm belongs to the family of deep-learning-based iterative reconstruction schemes. The application to sparse-view CT image reconstruction demonstrates the effectiveness of the proposed model, and the method could also be applied to other problems such as limited-angle CT reconstruction and compressed-sensing MR reconstruction.

Fig. 3.

Reconstruction comparison of a piglet phantom with \(N_{\mathrm {views}}=64\) and \(N_{\mathrm {bins}}=512\).