Keywords

1 Introduction

Position Emission Tomography (PET) is a unique in vivo functional imaging technique which provides the most sensitive non-invasive molecular assay of human body. The photons emitted from radioactively labelled molecules (tracer) in a subject are collected by the PET detectors. With the photon data, the spatio-temporal distribution of the tracer can be estimated by image reconstruction. The PET image reconstruction can be considered as a problem of fitting a high-dimensional model (in terms of the number of unknowns, for example the intensity values of millions of voxels in a modern PET scanner) to noisy projection data where the photon emission is a random process greatly affected by the amount of tracer reached to the imaging target. The measured projection data can be highly undersampled due to the detector configuration when there are only a fraction of counts collected by the scanner (which is the motivation for developing the ultrasensitive total-body PET scanner http://explorer.ucdavis.edu/). The fitting of a high-dimensional forward model to this undersampled and noisy data would result in overfitting of the noise and unreliable image reconstruction.

Given this ill-posed problem of PET image reconstruction with low-count projection data, in the maximum likelihood (ML) PET reconstruction framework, penalised likelihood (PL) reconstruction (or equivalently maximum a posteriori, MAP) has been extensively studied [17]. Such methods involve adding a regularisation (penalty) term to the log likelihood function, and thus the effective forward model complexity can be controlled by changing the weight of this regularisation. Ideally the effective complexity should be reduced to match the size of the measured data to avoid overfitting. However the appropriate model complexity is usually unknown for a particular clinical task, and this is a general problem of using penalised likelihood framework. Two other problems are how to formulate the penalty model and how to solve the corresponding MAP problem. When there is no regularisation, the ML problem can be solved by the expectation maximisation (EM) algorithm iteratively with a closed form update equation [8]. For some of the penalty models, the optimisation transfer technique can be applied to derive a closed form update from the surrogate functions of the original penalised likelihood function [9]. However such penalty models are usually not edge-preserving and would result in undesirable oversmooth on the edges and fine features in the image. When the closed-form solution does not exist, a modified EM solution may be found for some of the penalty models. The one-step-late (OSL) algorithm [1] gives one such solution by using the gradient of the penalty term evaluated at the previous image estimation, however OSL does not guarantee convergence or non-negativity, depending on the penalty and penalty weight. Realistic penalty models which preserve the image details are usually non-smooth and non-convex, making the optimisation mathematically and computationally challenging. Among the penalty models, anatomical images acquired by high-resolution magnetic resonance (MR) or X-ray computed tomography (CT) from the same subject are considered useful, as they provide the prior information on the underlying structures. Recent reviews on using anatomical prior information for PET image reconstruction can be found in [5, 6, 10], and the Bowsher method [4] which encourages PET image smoothness over the neighbour voxels selected from the anatomical image, was found to achieve better performance while being relatively efficient computationally compared to other methods [5]. Apart from the penalised likelihood PET reconstruction frameworks, very recently an alternative perspective of using the image-derived prior was proposed in [11], by incorporating the prior information into the image representation via kernel functions, and the regularisation was applied to the PET forward model. This leads to a very elegant kernelised EM solution and achieves better performance.

In this work, we propose a novel perspective on constraining the ill-posed PET reconstruction with low-count projection data by exploring the sparsity in natural images [12, 13]. Instead of using a regular voxel grid, the PET image is represented by a reduced number of supervoxels of various sizes and shapes so that the complexity of the PET forward model (in terms of the number of unknowns) is reduced to match the low-count data. The sparse image representation is jointly derived from the anatomical prior image and the PET data, preserving the edges and anatomical details without degradation of the structures only present in the PET image. This approach can be considered as a segmentation-based reconstruction method which is potentially able to eliminate the partial volume effect and it is more flexible than the method proposed in [3]. The prior information is integrated in the image representation as in [11], and therefore the regularisation operates in the forward model instead of in the additive penalty term, so the reconstruction is directly solved by the EM algorithm. Experiments using simulated and clinical data show promising results.

2 Methods

2.1 Sparse Image Representation Using Supervoxels

Supervoxels can be defined as an over-segmentation of an image into perceptually meaningful regions that preserves the image details. Using such supervoxels instead of the voxels defined on a regular grid leads to a more compact and sparse image representation which greatly reduces the image redundancy with little loss of the details. A visual demonstration of this concept can be found in [14].

In this work the supervoxel clustering was conducted using the simple linear iterative clustering (SLIC) method proposed in [15] for its computational efficiency and good performance to adhere to boundaries. The supervoxels were generated by an adapted k-means clustering of the voxels (with limited search region instead of the whole image) based on multi-dimensional Euclidean distance \(D_{i,j}\) between two points i and j in the image domain, defined by

$$\begin{aligned} D_{i,j} =&\ \sqrt{d_{f}^2 + \left( \frac{d_{s}}{S}\right) ^2 m^2}, \\ d_{s}=&\ \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}, \nonumber \\ d_{f} =&\ \sqrt{(f_i - f_j)^2}, \nonumber \end{aligned}$$
(1)

where \(d_{s}\) is the spatial Euclidean distance, \(d_{f}\) is the image intensity similarity, and these two different measures are combined into a single one with \(S= \root 3 \of {(N/K)}\) (N the number of voxels and K the number of supervoxels) being the mean spatial distance within a supervoxel as a normalisation factor, and m being a weight between the intensity similarity and spatial proximity. Note that the intensity similarity \(d_{f}\) can be extended to include additional dimensions when there are a group of images or multi-channel information for clustering.

It can be seen that SLIC does not explicitly enforce connectivity, therefore in this work, connected-component labelling [16] was performed after SLIC supervoxel clustering to assign the originally disconnected groups of voxels within the same supervoxel to new supervoxels where all the voxels within the same supervoxel are spatially connected. Also the supervoxels generated by SLIC of extremely small size due to image noise were considered as “orphans” and were merged into the nearest supervoxels.

The over-segmentation generated by the supervoxel clustering leads to a sparse image representation when the number of supervoxels K is greatly smaller than the number of voxels N. Let \(\mathbf {A}\) denote the representation matrix in the image domain, \(\mathbf {A}\) is binary and sparse, which determines whether a voxel i belongs to a given supervoxel j, that is

$$\begin{aligned} A_{i,j}=\bigg \{ \begin{array}{ccc} 1, &{} &{} i \in j \\ 0, &{} &{} i \notin j. \end{array} \end{aligned}$$

Then from the supervoxel intensity values \(\varvec{s}\), the original image \(\varvec{f}\) on the voxel grid can be established by \(\varvec{f} = \mathbf {A}\varvec{s}\) where \(\varvec{f}\in \mathbb {R}^{N \times 1}\) and \(\varvec{s}\in \mathbb {R}^{K \times 1}\). In PET reconstruction, using the image representation \(\varvec{f} = \mathbf {A}\varvec{s}\) with a given \(\mathbf {A}\) (not a square matrix) transfers the reconstruction of the original image \(\varvec{f}\) to the estimation of \(\varvec{s}\) with less number of the unknowns when \(K<N\), without losing the image details preserved in \(\mathbf {A}\). The joint determination of \(\mathbf {A}\) from both the anatomical prior images and the PET data will be discussed in Sect. 2.3. Notably, in contrary to embedding the anatomical information within a Bayesian reconstruction framework based on solely the image intensity such as joint entropy or mutual information, the proposed method avoids the potential bias by using the image geometry instead.

2.2 PET Reconstruction with Sparse Image Representation

The sparse image representation can be directly integrated into the forward model of PET reconstruction

$$\begin{aligned} \bar{\varvec{g}} = \mathbf {P}\mathbf {A}\varvec{s} + \varvec{r}, \end{aligned}$$
(2)

where \(\bar{\varvec{g}}\) is the expected projection data, \(\mathbf {P}\) is the system information matrix of the detection probabilities, and \(\varvec{r}\) is the expected scatter and random events.

Within the maximum likelihood (ML) reconstruction framework, the estimate of the image \(\varvec{f}\) (here \(\varvec{f} = \mathbf {A}\varvec{s}\)) is found by maximising the Poisson log likelihood [17]

$$\begin{aligned} L(\varvec{g}| \bar{\varvec{g}})=\sum _i g_i\log \bar{g_i}-\bar{g_i} \end{aligned}$$
(3)

with observed projection data \(\varvec{g}\)

$$\begin{aligned} \hat{\varvec{s}} = \arg \max _{\varvec{s}\ge 0}L(\varvec{g}| \mathbf {A}\varvec{s}). \end{aligned}$$
(4)

The iterative update to find the solution can be directly derived by the expectation-maximisation (EM) algorithm [8]

$$\begin{aligned} \varvec{s}^{n+1}=\frac{\varvec{s}^{n}}{\mathbf {A}^{T}\mathbf {P}^{T}\mathbf {1}}\mathbf {A}^{T}\mathbf {P}^{T}\frac{\varvec{g}}{ \mathbf {P}\mathbf {A}\varvec{s}^n + \varvec{r}}, \end{aligned}$$
(5)

where T denotes the matrix transpose and n denotes the iteration number.

With prior images, it is possible to have the sparse image representation matrix \(\mathbf {A}\) defined on a denser voxel grid that does not match the PET imaging system characterised by \(\mathbf {P}\). Instead of downsampling the prior images to match the PET imaging resolution, in this work a resampling operator is introduced into the forward model to maintain the image at higher spatial resolution to avoid the loss of the edges and other image details. Let \(\mathbf {R}\) denote the matrix form of the resampling operator in the image domain, then the forward model becomes \( \bar{\varvec{g}} = \mathbf {P}\mathbf {R}\mathbf {A}\varvec{s} + \varvec{r}\), and the iterative update becomes

$$\begin{aligned} \varvec{s}^{n+1}=\frac{\varvec{s}^{n}}{\mathbf {A}^{T}\mathbf {R}^{T}\mathbf {P}^{T}\mathbf {1}}\mathbf {A}^{T}\mathbf {R}^{T}\mathbf {P}^{T}\frac{\varvec{g}}{ \mathbf {P}\mathbf {R}\mathbf {A}\varvec{s}^n + \varvec{r}}. \end{aligned}$$
(6)

So far it has been demonstrated the use of the sparse image representation in reconstructing PET images. For dynamic PET data, directly reconstruct the parametric images from the raw projection data can achieve improved accuracy and robustness [9, 18, 19]. The sparse image representation is directly applicable to dynamic PET data as it is a linear operation in the image domain. For dynamic PET data, the sparse representation matrix \(\mathbf {A}\) is consistent for all time frames, and in \(\varvec{f} = \mathbf {A}\varvec{s}\), \(\varvec{f}\) and \(\varvec{s}\) are expanded so that \(\varvec{f}\in \mathbb {R}^{N \times nt}\) and \(\varvec{s}\in \mathbb {R}^{K \times nt}\) where nt is the number of time frames. Using a linearised kinetic model [20], \(\varvec{s}\) can be described as \(\varvec{s}=\varvec{\theta }\mathbf {B}\), where \(\mathbf {B}\in \mathbb {R}^{nk \times nt}\) are the temporal basis functions and \(\varvec{\theta }\in \mathbb {R}^{K \times nk}\) are the kinetic parameters for all supervoxels, with nk being the number of kinetic parameters for each supervoxel. The direct estimation of the kinetic parameters \(\varvec{\theta }\) can be solved by applying the optimisation transfer technique [9] to obtain a closed-form update equation with improved convergence performance.

2.3 Aggregation of Multi-layer Supervoxels and Joint Clustering

A single layer of supervoxels provides a sparse representation of the image which is affected by the algorithm and parameters used to generated the over-segmentation. As suggested in [21], aggregation of multi-layer supervoxels generated by different algorithms with varying parameters can improve the performance of capturing the diverse and multi-scale visual features in a natural image. For PET reconstruction, to eliminate the bias introduced by a specific algorithm or parameter, the aggregation can be performed as an average of multiple PET images reconstructed from the same projection data and prior images using different over-segmentations generated by varying the supervoxel clustering algorithm and/or the parameters. In this work the multi-layer supervoxels were generated by varying the number of supervoxels N and the weight m between the intensity similarity and spatial proximity in the SLIC algorithm.

One contribution of this work is the joint determination of the sparse image representation \(\mathbf {A}\) from both the PET data and the prior image. It is widely acknowledged that the use of an anatomical prior can introduce bias and artefacts in PET image reconstruction when there is signal mismatch between the prior image and true PET image. In this work the over-segmentations derived from the prior images and from the PET data are combined to avoid missing the PET signal absent from the prior image. Also since the proposed sparse image representation is an image geometry constraint rather than an image intensity one, the structures shown in the prior images but not in the true PET image will not explicitly bias the PET image reconstruction. The additional over-segmentation information from PET data is derived from the gradient of the log likelihood in Eq. 3 with respect to the image \(\varvec{f}\) at \(\varvec{f}=\varvec{f}_{prior} = \mathbf {A}_{prior}\varvec{s}_{prior}\), \(\varvec{s}_{prior} = \arg \max _{\varvec{s}\ge 0}L(\varvec{g}| \mathbf {A}_{prior}\varvec{s})\), where \(\varvec{f}_{prior}\) is the PET image determined by only the sparse representation matrix \(\mathbf {A}_{prior}\) derived from the prior image. The gradient \(\frac{\partial L}{\partial \varvec{f}}\) is derived as

$$\begin{aligned} \frac{\partial L}{\partial \varvec{f}} = \frac{\partial L}{\partial \bar{\varvec{g}}} \frac{\partial \bar{\varvec{g}}}{\partial \varvec{f}}, \frac{\partial L}{\partial f_j} =\sum _i p_{i,j}(\frac{g_i}{\bar{g}_i}-1), \end{aligned}$$
(7)

and it is the back-projection of the mismatch of the measured projection data \(\varvec{g}\) and the expected projection data \(\bar{\varvec{g}}\) generated by the forward model with the current image estimation, so apart from the noise it indicates the difference between the true image and the estimated image, and can be used to modify the over-segmentation to account for the PET-only structures. In this work, the supervoxel clustering method SLIC was used to create from this gradient image new voxel clusters that were then separated from the clusters generated from the prior images as new supervoxels to update the sparse representation matrix \(\mathbf {A}\).

3 Experiements

3.1 Simulation Study

Firstly the proposed approach was validated using simulated PET data and compared with the several existing algorithms. [18F]FDG scans were simulated using a 3D brain phantom from the BrainWeb database (http://brainweb.bic.mni.mcgill.ca/brainweb/). The associated T1-weighted MR image was used as the anatomical prior with no additional noise. Theoretical [18F]FDG uptake value was used for the activity in white matter (8452.8 Bq / cc) and grey matter (22986.2 Bq / cc). 6 lesions with various sizes and uptake values were added to the PET phantom which were absent from the MR image, as shown in Fig. 1.

Fig. 1.
figure 1

Simulated brain phantom with lesions (voxel size \(1\times 1\times 1\) \(mm^{3}\)). (a) the BrainWeb brain phantom composed of grey matter, white matter and 6 lesions (various sizes and uptake values); (b) the corresponding T1-weighted MR image used as the anatomical prior.

The activity image was defined on a grid of \(218\times 218\times 187\) voxels and used to generate PET projection data (5  mm FWHM resolution). A \(20\,\%\) uniform background was added as the mean scatter and randoms, and Poisson noise was then introduced. Simulations were conducted at both low-count level (10 million) and high-count level (100 million), with 10 noise realisations for each.

PET reconstruction was performed with pre-reconstruction correction for attenuation, and the estimated scatter and randoms were incorporated in the forward model. The maximum likelihood-expectation maximisation method with no prior [8] (MLEM), the bowsher method [4] (Bowsher), the kernelised EM method [11] (KEM) and the proposed reconstruction algorithm using sparse image representation (SIR) were performed for reconstruction. All the methods used the same resampling operator \(\mathbf {R}\) in Eq. 6, and the same T1-weighted MR image shown in Fig. 1(b) as the anatomical prior except MLEM. For KEM and Bowsher, 50 nearest neighbours in a \(5\times 5\times 5\) local neighbourhood were used. The reconstructed PET images are shown in Fig. 2.

Fig. 2.
figure 2

PET images reconstructed by MLEM (no prior), Bowsher, KEM and the proposed SIR at low-count level (1e7) with the same colour scale. Bowsher oversmooths the lesions, whereas KEM and SIR reduce the noise while recovering the lesion contrast and the proposed SIR shows improved edge preservation.

For quantitative comparison of the reconstruction methods, the ensemble mean squared bias \(Bias^2\) and variance Var were calculated from the reconstructed images. In addition, the contrast recovery coefficient (CRC) was calculated for the lesions. The definitions of \(Bias^2\), Var and CRC in [11] were used. The bias and variance images are shown in Fig. 3.

Fig. 3.
figure 3

Bias images (top, with the same colour scale) and variance images (bottom, with the same colour scale) achieved by MLEM (no prior), Bowsher, KEM and the proposed SIR at low-count level (1e7). SIR achieved reduced bias. Note that the mismatch between the PET phantom and MR image, and the MR partial volume effect also contribute to the bias.

Figure 4 compares the ensemble mean squared bias verse ensemble variance trade-off of the four PET reconstruction algorithms, achieved either by changing the iteration number (MLEM, KEM and SIR) or by varying the penalty weight (Bowsher).

Fig. 4.
figure 4

Ensemble mean squared bias verse ensemble variance trade-off achieved by MLEM (no prior), Bowsher, KEM and the proposed SIR at low-count level (1e7).

Figure 5 compares the contrast recovery coefficient (CRC) ratios versus the background noise trade-off for the 6 lesions. The white matter was used to compute background noise standard deviation. The proposed SIR method achieved higher CRCs with lower background noise.

Fig. 5.
figure 5

Contrast recovery coefficient (CRC) ratios of the lesions verse background noise. A CRC ratio is the CRC value normalised to the ground truth CRC, and 1 is the perfect recovery.

3.2 Clinical Data

The proposed algorithm was also applied to reconstruct clinical [18F]Choline data from a patient scanned with a Siemens Biograph mMR scanner. The T1-weighted MR image (3T 3D GR/IR TR = 1800 ms TE = 2.67 ms FA = 9\(^{\circ }\)) shown in Fig. 6(a) was used as the anatomical prior image after bias field correction using FreeSurfer (http://freesurfer.net/) and denoising with a non-local mean filter [22]. For PET reconstruction, the attenuation map was calculated from a pseudo CT image synthesised from the subject’s T1-weighted MR image acquired in the same imaging session using http://cmictig.cs.ucl.ac.uk/niftyweb/ based on the work in [23]. [18F]Choline PET projection data of 27 s was used to evaluate the reconstruction performance. The data started from 150 s after [18F]Choline injection, and was corrected for dead-time, scatter (based on the synthesised CT), randoms and normalised using the manufacturer’s software. Figure 6 shows the PET images reconstructed by the MLEM, kernelised EM (KEM) and the proposed sparsity-based method, along with the T1-weighted MR image used as the anatomical prior image, and the gadolinium-enhanced T1-weighted MR image which supported the identification of a lesion and it was not used as the anatomical prior. It shows that the image reconstructed by MLEM is too noisy for lesion detection, and the KEM and the proposed methods reduced noise and improved the lesion identification from low-count PET projection data. Note that for the proposed method, the image reconstructed using a single layer of supervoxels is presented here to show its potential.

Fig. 6.
figure 6

Reconstruction of low-count clinical [18F]Choline data by (b) MLEM (no prior), (c) KEM and (d) the proposed SIR method shown with the same colour scale. (a) the T1w MR used as the image prior and (e) the gadolinium-enhanced T1w MR showing the lesion (not used as the prior).

4 Conclusion and Discussion

In this work we have provided a novel perspective of solving the ill-posed problem in PET reconstruction, by exploring the sparse nature of images to reduce the complexity of PET forward projection model to fit noisy photon count data. The PET image is reconstructed on the basis of supervoxels of various sizes and shapes instead of the voxels defined on a regular grid, using a sparse image representation derived from an over-segmentation in the image domain which provides a lower-dimensional representation with little loss of image details. The supervoxel clustering is derived from the anatomical prior images and the log likelihood gradient image computed from the PET data. Multiple layers of supervoxels are used to eliminate the bias introduced by the clustering algorithm and parameters. Unlike in the MAP framework, this regularisation is directly integrated into the PET forward projection model and achieves a very efficient EM solution, and it is directly applicable to direct parametric reconstruction of dynamic PET data. The results of experiments on simulated data show improved bias versus variance trade-off and higher contrast recovery over the current state-of-the-art methods. The application to clinical [18F]Choline data shows promising results of identifying a brain lesion from low-count projection data.

The proposed approach is readily applicable for incorporating multiple prior images, such as MR images (for example T1-weighted, T2-weighted, FLAIR). It can be seen that the over-segmentation is a key factor that affects the final reconstructed PET image, in this work a state-of-the-art supervoxel clustering algorithm was used, and the optimisation specific to a given clinical problem and the strategy of aggregating multi-layer supervoxels will be explored to further improve the performance. Applications to whole-body imaging will also be explored in the future.