1 Introduction

X-ray computed tomography (CT) has been widely used in clinical and preclinical applications and plays a central role in the examination of diseases and procedures [1, 2]. X-ray radiation dose is harmful for patients, and low-dose CT reconstruction technique is a research hot spot in the current medical CT field. To reduce the X-ray radiation dose, medical CT systems can decrease the X-ray intensity or the number of scanning projection angles. However, these two strategies result in a reconstructed image with a low signal-to-noise ratio (SNR), indicating the negative impact of noise and artifacts.

In comparison with traditional CT-reconstructed algorithms [3,4,5], the iterative algorithms based on compressive sensing (CS) [6] can be used to reconstruct high-quality CT images with incomplete or low SNR projection data. In the CT reconstruction, some advanced exemplary algorithms were employed, such as total variation (TV) minimization [7, 8], soft-thresholding algorithms [9, 10], adaptive-weighted total variation (AWTV) [11], dictionary learning [12], multi-direction anisotropic total variation (MDATV) [13], and split-Bregman reconstruction [14]. Among them, TV using the x- and y-coordinate gradient operators as the sparse representation approach during the iteration process is one of the most popular algorithms. However, it still can be improved by combining with the diagonal total variation (DTV) [15,16,17,18,19], which accelerates the iterative convergence and reconstructs the high-quality image from incomplete projection data.

In this paper, the research goal was to reconstruct high-quality CT images with a low dose. There are many methods to reduce the delivered dose in CT scanning, but we focused on a sparse-view reconstruction strategy. To handle the noise and artifacts in the reconstructed images from few-view projections, we proposed a hybrid reconstruction approach that combined TV- and DTV-constraints aimed at exploring the sparse-view reconstruction. We introduce the methodology including the proposed algorithm in Sect. 2, analyze the reconstructed results in Sect. 3, and discuss the issues related to our reconstruction process and the corresponding results in Sect. 4.

2 Methodology

2.1 Total variation and diagonal total variation

The reconstruction algorithm with TV-constraint can be defined as:

$$ \hbox{min} \left\| {\vec{f}} \right\|_{\text{TV}} \quad {\text{s}} . {\text{t}} .\quad \left\| {A\vec{f} - \vec{p}} \right\|_{2}^{2} < \sigma^{2} , $$
(1)

where \( \vec{\varvec{f}} \) is the reconstructed image, \( \varvec{A} \) is the projection matrix, \( \vec{\varvec{p}} \) is the projection data, σ is permissible error, and TV of \( \vec{\varvec{f}} \) can be expressed as:

$$ \left\| {\vec{f}} \right\|_{\text{TV}} = \sum\limits_{s,t} {\left| {\vec{\nabla }f_{s,t} } \right| = \sqrt {\left[ {D_{s} (f_{s,t} )} \right]^{2} + \left[ {D_{t} (f_{s,t} )} \right]^{2} } } , $$
(2)

where \( \vec{\nabla } \) represents the local gradient operator, \( f_{s,t} \) is a pixel value of \( \vec{f} \) at position (s, t), \( D_{s} (\cdot ) \) and \( D_{t} (\cdot ) \) are the discrete differential operators along the s- and t-axes, respectively, defined as:

$$ D_{s} (f_{s,t} ) = f_{s,t} - f_{s - 1,t} $$
(3)
$$ D_{t} (f_{s,t} ) = f_{s,t} - f_{s,t - 1} . $$
(4)

The positional relationship between \( f_{s,t} \) and its neighboring pixels is shown in Fig. 1.

Fig. 1
figure 1

Illustration of the pixel positions in a reconstructed image

The DTV of \( \vec{\varvec{f}} \) is defined as \( \left\| {\vec{\varvec{f}}} \right\|_{\text{DTV}} \):

$$ \left\| {\vec{f}} \right\|_{\text{DTV}} = \sum\limits_{s,t} {\left| {\vec{\nabla }_{D} f_{s,t} } \right| = \sqrt {[D_{ds} (f_{s,t} )]^{2} + [D_{dt} (f_{s,t} )]^{2} } } , $$
(5)

where \( \vec{\nabla }_{D} \) represents the local diagonal gradient operator and \( D_{ds} (\cdot ) \) and \( D_{dt} (\cdot ) \) are the discrete differential operators along the diagonal s- and t-axes, respectively, given by

$$ D_{ds} (f_{s,t} ) = f_{s,t} - f_{s - 1,t - 1} $$
(6)
$$ D_{dt} (f_{s,t} ) = f_{s,t} - f_{s - 1,t + 1} . $$
(7)

2.2 Proposed algorithm

In this paper, we used DTV to sparsely represent images when iterative convergence of the reconstructed algorithm with TV-constraint showed no change during the reconstruction process. To solve the optimization problem, we employed the steepest-descent method [20]. Our proposed algorithm can be defined as follows:

$$ \hbox{min} \left\| {\vec{f}} \right\|_{\text{TV}} \quad {\text{s}} . {\text{t}} .\quad \left\| {A\vec{f} - \vec{p}} \right\|_{2}^{2} < \sigma_{1}^{2} $$
(8)
$$ \hbox{min} \left\| {\vec{f}} \right\|_{\text{DTV}} \quad {\text{s}} . {\text{t}} .\quad \left\| {A\vec{f} - \vec{p}} \right\|_{2}^{2} < \sigma_{2}^{2} , $$
(9)

where \( \sigma_{1} \) and \( \sigma_{2} \) are the permissible errors of TV and DTV, respectively.

The steepest-descent method was applied to solve Eqs. (8) and (9), and we obtained the following formulas:

$$ \vec{f}^{l + 1} = \vec{f}^{l} - \alpha \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g}_{\text{TV}}^{l} $$
(10)
$$ \vec{f}^{k + 1} = \vec{f}^{k} - \beta \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g}_{\text{DTV}}^{k} , $$
(11)

where \( l \) and \( k \) denote the iteration indices of TV and DTV in the steepest-descent method, respectively, and α and β are the gradient descent step sizes of TV and DTV, respectively. \( \varvec{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g} }_{\text{TV}} \) is the normalized TV gradient, while \( \vec{\varvec{g}}_{\text{TV}} \) is the TV gradient and is related by \( \varvec{\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g} }_{\text{TV}} = {{\vec{\varvec{g}}_{\text{TV}} } \mathord{\left/ {\vphantom {{\vec{\varvec{g}}_{\text{TV}} } {\left\| {\vec{\varvec{g}}_{\text{TV}} } \right\|_{2} }}} \right. \kern-0pt} {\left\| {\vec{\varvec{g}}_{\text{TV}} } \right\|_{2} }} \). The individual elements of \( \vec{\varvec{g}}_{\text{TV}} \) can be defined as follows:

$$ \begin{aligned} \frac{{\partial \left\| {\vec{f}} \right\|_{\text{TV}} }}{{\partial f_{s,t} }} & \approx \frac{{(f_{s,t} - f_{s - 1,t} ) + (f_{s,t} - f_{s,t - 1} )}}{{\sqrt {\varepsilon + (f_{s,t} - f_{s - 1,t} )^{2} + (f_{s,t} - f_{s,t - 1} )^{2} } }} \\ & \quad - \frac{{(f_{s + 1,t} - f_{s,t} )}}{{\sqrt {\varepsilon + (f_{s + 1,t} - f_{s,t} )^{2} + (f_{s + 1,t} - f_{s + 1,t - 1} )^{2} } }} \\ & \quad - \frac{{(f_{s,t + 1} - f_{s,t} )}}{{\sqrt {\varepsilon + (f_{s,t + 1} - f_{s,t} )^{2} + (f_{s,t + 1} - f_{s - 1,t + 1} )^{2} } }}. \\ \end{aligned} $$
(12)

\( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g}_{\text{DTV}} \) is the normalized DTV gradient, while \( \vec{g}_{\text{DTV}} \) is DTV gradient and related by \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g}_{\text{DTV}} = \vec{g}_{\text{DTV}} /\left\| {\vec{g}_{\text{DTV}} } \right\|_{2} \). The individual elements of \( \vec{g}_{\text{DTV}} \) can be defined as follows:

$$ \begin{aligned} \frac{{\partial \left\| {\vec{f}} \right\|_{\text{DTV}} }}{{\partial f_{s,t} }} & \approx \frac{{(f_{s,t} - f_{s - 1,t - 1} ) + (f_{s,t} - f_{s - 1,t + 1} )}}{{\sqrt {\varepsilon + (f_{s,t} - f_{s - 1,t - 1} )^{2} + (f_{s,t} - f_{s - 1,t + 1} )^{2} } }} \\ & \quad - \frac{{(f_{{s + 1,t{ - }1}} - f_{s,t} )}}{{\sqrt {\varepsilon + (f_{{s + 1,t{ - }1}} - f_{s,t} )^{2} + (f_{{s + 1,t{ - }1}} - f_{s,t - 2} )^{2} } }} \\ & \quad - \frac{{(f_{{s{ + }1,t + 1}} - f_{s,t} )}}{{\sqrt {\varepsilon + (f_{{s{ + }1,t + 1}} - f_{s,t} )^{2} + (f_{{s{ + }1,t + 1}} - f_{s,t + 2} )^{2} } }}, \\ \end{aligned} $$
(13)

where ε is a known positive integer. In our study, we selected ε = 10−8.

We will now describe the iterative steps of the proposed algorithm. The whole iteration process contained two loops, the outside and inside loops operated the algebraic reconstruction techniques (ART) [4] and TV gradient descent, respectively, and then, the DTV was used as a substitute when the iterative convergence of the reconstructed algorithm was stable after the \( N_{\text{TV}} \)th iteration. The flow chart is shown in Table 1, where \( m = 2, \ldots ,M \) denotes the projection angles, \( \vec{\varvec{A}}_{m} \) is the mth row vector of the projection matrix \( \varvec{A} = (\vec{\varvec{A}}_{1} ,\vec{\varvec{A}}_{2} , \ldots ,\vec{\varvec{A}}_{m} , \ldots ,\vec{\varvec{A}}_{M} ),\,\vec{\varvec{p}} = (p_{1} ,p_{2} , \ldots ,p_{m} , \ldots ,p_{M} ) \) is the projection, and \( \lambda \) is the convergence parameter of the ART method. The inside loop is labeled by \( k \), and \( K \) is the iteration count for the TV and DTV minimizations.

Table 1 Implementation steps of the TV + DTV reconstruction

3 Numerical and experimental studies

In this section, we present our numerical and experimental studies. There were two sets of experiments; FORBILD head phantom [21] was used in the numerical study, and real projection data were used to reconstruct the aspirin images in the experimental study. The image quality was assessed with relative root-mean-square error (RMSE) and structure similarity (SSIM) [22].

The RMSE is defined as

$$ {\text{RMSE}} = \sqrt {{{\left( {\sum\limits_{0 \le s < N} {\sum\limits_{0 \le t < M} {(f_{s,t} - f_{s,t}^{*} )} }^{2} } \right)} \mathord{\left/ {\vphantom {{\left( {\sum\limits_{0 \le s < N} {\sum\limits_{0 \le t < M} {(f_{s,t} - f_{s,t}^{*} )} }^{2} } \right)} {(M \times N)}}} \right. \kern-0pt} {(M \times N)}}} , $$
(14)

where \( f_{s,t} \) is the pixel value of the original image at position (s, t) and \( f_{s,t}^{*} \) is the pixel value of the reconstructed image at position (s, t).

SSIM is defined as

$$ {\text{SSIM}} = l(\vec{f},\vec{f}^{*} )^{\delta } \cdot c(\vec{f},\vec{f}^{*} )^{\gamma } \cdot v(\vec{f},\vec{f}^{*} )^{\eta } , $$
(15)

where

$$ l(\vec{\varvec{f}},\vec{\varvec{f}}^{*} ) = {{(2\bar{\vec{f}} \times \bar{\vec{f}}^{*} + c1)} \mathord{\left/ {\vphantom {{(2\bar{\vec{f}} \times \bar{\vec{f}}^{*} + c1)} {((\bar{\vec{f}})^{2} + (\bar{\vec{f}}^{*} )^{2} + c1)}}} \right. \kern-0pt} {((\bar{\vec{f}})^{2} + (\bar{\vec{f}}^{*} )^{2} + c1)}} $$
(16)
$$ c(\vec{\varvec{f}},\vec{\varvec{f}}^{*} ) = {{(2\sigma_{{\vec{f}}} \sigma_{{\vec{f}^{*} }} { + }c 2 )} \mathord{\left/ {\vphantom {{(2\sigma_{{\vec{f}}} \sigma_{{\vec{f}^{*} }} { + }c 2 )} {(\sigma_{{\vec{f}}}^{2} + \sigma_{{\vec{f}^{*} }}^{2} + c2)}}} \right. \kern-0pt} {(\sigma_{{\vec{f}}}^{2} + \sigma_{{\vec{f}^{*} }}^{2} + c2)}} $$
(17)
$$ v(\vec{\varvec{f}},\vec{\varvec{f}}^{*} ) = {{(\sigma_{{\vec{f}\vec{f}^{*} }} { + }c3)} \mathord{\left/ {\vphantom {{(\sigma_{{\vec{f}\vec{f}^{*} }} { + }c3)} {(\sigma_{{\vec{f}}} \sigma_{{\vec{f}^{*} }} + c3}}} \right. \kern-0pt} {(\sigma_{{\vec{f}}} \sigma_{{\vec{f}^{*} }} + c3}}), $$
(18)

where \( \bar{\vec{f}} \) and \( \bar{\vec{f}}^{*} \) are the mean of \( \vec{\varvec{f}} \) and \( \vec{\varvec{f}}^{*} \), respectively, \( \sigma_{{\vec{f}}} \) and \( \sigma_{{\vec{f}^{*} }} \) are the variances of \( \vec{\varvec{f}} \) and \( \vec{\varvec{f}}^{*} , \) respectively, \( \sigma_{{\vec{f}\vec{f}^{*} }} \) is the covariance of \( \vec{\varvec{f}} \) and \( \vec{\varvec{f}}^{*} \), and \( c1,\,{\text{c2}} \), and \( c3 \) are three small positive constants used to avoid instability if the values of the denominators in Eqs. (16), (17) and (18) are very close to zero respectively. \( \delta ,\,\gamma \), and \( \eta \) are used to adjust the weights of the luminance \( l(\vec{\varvec{f}},\vec{\varvec{f}}^{*} ) \), contrast \( c(\vec{\varvec{f}},\vec{\varvec{f}}^{*} ) \), and structures \( v(\vec{\varvec{f}},\vec{\varvec{f}}^{*} ) \). In our study, we selected \( \delta = \gamma = \eta = 1,\,c1 = 2 \times 10^{ - 8} ,\,c2 = 1 \times 10^{ - 8} , \) and \( c3 = 0.5 \times c2 = 0.5 \times 10^{ - 7} \), and the value of SSIM was between − 1 and 1. When two images were the same, the SSIM value was 1.

The selection of the optimal parameters is very important. According to Table 1, there are three parameters: λ, α, and β. Previous studies reported a range for λ of [0, 2], and the steepest-descent parameter (α or β) and descent iterative number (K) should be set so that α * K (or β * K) is of the order of, but larger than one [7]. In the cases considered here, the descent iterative numbers were 20, and the ranges of α and β were both [0.05, 1]. The following optimal parameters were selected from these ranges. We use the FORBILD head phantom with 30 projections as an example to determine the optimal parameters. We changed the value of λ to reconstruct images with the ART method and calculated the reconstruction error. Figure 2 shows that λ = 1 was the optimal parameter where RMSE was the lowest and SSIM was approaching 1. When λ was fixed, we changed the value of α to reconstruct images with the TV method and calculated the reconstruction error. Figure 3 shows that α = 0.55 was the optimal parameter. When λ and α were fixed, we changed the value of β to reconstruct images with the TV + DTV method and calculated the reconstruction error. Figure 4 shows that β = 0.28 was the optimal parameter. Then we used λ = 1, α = 0.55, and β = 0.28 in the next numerical simulation using the FORBILD head phantom. A similar search was conducted for real data, and the optimal values of these parameters are shown in Table 2.

Fig. 2
figure 2

RMSE and SSIM of the analyses to find the optimum regularization parameter \( \lambda \). a RMSE, b SSIM

Fig. 3
figure 3

RMSE and SSIM of the analyses to find the optimum regularization parameter \( \alpha \). a RMSE, b SSIM

Fig. 4
figure 4

RMSE and SSIM of analyses to find the optimum regularization parameter β. a RMSE, b SSIM

Table 2 Optimum parameter selections for each dataset

3.1 Numerical simulation study

In the numerical simulation, a FORBILD head phantom (256 × 256 × 8 bits), shown in Fig. 5a, was used to analyze the performance of the proposed algorithm. The scanning range of the CT system was from 0 to 2π with a θ angular increment. The projection number was set to 30 so that θ = π/15. The scanned angle can be specified by ψ = θ * i (1 ≤ i ≤ 30). The iteration number was set to 1000, and the proposed algorithm (TV + DTV) included 600 TV iterations and 400 DTV iterations. Figure 5b, c is reconstructed by the TV and TV + DTV methods, respectively.

Fig. 5
figure 5

Reconstructed FORBILD head phantom for comparison. a Original FORBILD head phantom, b TV, c TV + DTV

Although the reconstructed images obtained using the TV and TV + DTV methods were not significantly different in Fig. 5, the positions of the arrows in Fig. 6 showed that the profile of the reconstructed image using the TV + DTV method was more stable than that of the TV method. Furthermore, the zoomed-in part of the reconstructed images is shown in Fig. 7, and Fig. 7c has fewer artifacts, clearer edges, and a more uniform distribution compared with Fig. 7b reconstructed by TV method.

Fig. 6
figure 6

(Color online) Profile of line 180 in the different reconstructed methods for the FORBILD head phantom. a TV, b TV + DTV

Fig. 7
figure 7

Magnified part of the reconstructed FORBILD head phantom for comparison. a Original FORBILD head phantom, b TV, c TV + DTV

In Fig. 8, the RMSE and SSIM are plotted with the iteration number. For both criteria, there was a similar trend in which the RMSE and SSIM of the reconstructed images became better when converted from TV-constraint to DTV-constraint after 600 iterations. Table 3 lists the RMSE and SSIM calculated from the reconstructed FORBILD head phantom with the TV and TV + DTV methods. It was evident that the RMSE of the reconstructed images using the TV + DTV method was considerably smaller than those obtained by the TV method, and the SSIM was significantly larger. Both the RMSE and SSIM showed that the TV + DTV method can be used to reconstruct images with higher quality.

Fig. 8
figure 8

RMSE and SSIM lines of reconstructed FORBILD head phantoms with iteration number (from 100 to 1000) for the TV and TV + DTV methods. a RMSE, b SSIM

Table 3 RMSE and SSIM of reconstruction images

3.2 Real projection data study

We applied our proposed algorithm to a set of real projection data, acquired from scanned aspirin. The source-to-object distance was 48.1551 cm, and the object-to-detector distance was 100.4449 cm. The detector pixel size was 0.0098 cm, and the number of detector elements was 1472. X-ray CT geometrical calibration via locally linear embedding [23] was used to calibrate the geometry, and the calibrated rotation center and detector offsets were − 0.3318 and − 0.5364 cm, respectively. The number of projection angles was 360, equally spaced in the angular range [0, 2π]. To demonstrate the performance of our proposed approach, we reduced the number of views to 90 by setting the angular increment to θ = π/45. As shown in Fig. 9a, the reconstructed full-view image was considered to be the standard image. Figure 9b, c were reconstructed by the TV and TV + DTV methods with 90 projection views, respectively. The TV-constraint was converted into the DTV-constraint after 40 iteration numbers, and the lines of the RMSE and SSIM with respect to iteration number are shown in Fig. 10. The RMSE and SSIM of reconstructed images are listed in Table 3. It was observed that the result of the TV + DTV method was considerably better than that of the TV method.

Fig. 9
figure 9

Reconstructed aspirin images for comparison. a Standard image, b TV, c TV + DTV

Fig. 10
figure 10

RMSE and SSIM lines of reconstructed aspirin projection data with respect to iteration number for the TV and TV + DTV methods. a RMSE, b SSIM

4 Discussions and conclusion

There are several issues worth further discussion. Although the proposed hybrid method can be used to reconstruct high-quality images from sparse-view data, it should be noted that the DTV-constraint had no obvious advantages over the TV-constraint with a small number of iterations, due to the gradient operators for the sparse representation. Therefore, it is vital to find the appropriate iteration number to convert the TV-constraint into the DTV-constraint to accelerate the iterative convergence. In the reconstruction process, the iteration numbers were 600 and 40 in the numerical and experimental studies, respectively, according to the characteristics of the low- and high-frequency components in the reconstructed image.

Another issue regarding the feasibility of the reconstruction algorithm is whether the run time is acceptable. The run time depends on the computational environment. MATLAB R2014b on a computer with the Intel(R) Core(TM) i5-4590 CPU @3.30 GHz, RAM 8.00 GB, 64-bit OS was used here. The implementation of the TV + DTV algorithm took 179 s for the FORBILD head phantom (total number of iterations was 1000) and 285 s for the real aspirin projection data (the total iterative number was 90). The run time was acceptable, but could be further improved by many methods, such as parallel computing and CUDA acceleration.

For our proposed algorithm, the selection of the weights of the TV- and DTV-constraints was also an important issue in the reconstruction. If they are too small, the algorithm based on the TV- and DTV-constraints would not be able to reduce the artifacts and noise of the reconstructed image. If they are too large, the TV- and DTV-constraints would over-smooth the CT images. Thus, the weight parameter selection for the TV- and DTV-constraints depends on the levels of artifacts and noise. In this paper, the used parameters are shown in Table 2 according to the experimental analysis.

In conclusion, we proposed a hybrid reconstruction approach by combining the TV- and DTV-constraints, by operating the DTV-constraint to sparsely represent the images when the iterative convergence of the reconstructed algorithm with the TV-constraint did not vary during the reconstruction process. The numerical and experimental studies demonstrated that the proposed hybrid method can be used to reconstruct high-quality images from sparse-view data, and the RMSE and SSIM were improved when the TV-constraint was converted to the DTV-constraint after a set number of iterations. Further research will be performed to explore the directional TV optimization problem in the CT reconstruction.