Introduction

Positron emission tomography (PET) of the brain usually takes several minutes. Because of this long acquisition time, motion of the patient’s head is quite common [1]. Head movement during acquisition results in blurred images, which in turn may cause misinterpretation or force repetition of the acquisition. The severity of image degradation depends on the amplitude of the motion, the spatial resolution of the scanner, and the duration of frame acquisition [2]. Several approaches have been used to confront this problem. Restraints such as thermoplastic masks and neoprene caps are commonly used to fix the head during acquisition. Although such restraints limit the amplitude of motion, they do not block it completely: translational motions in the range of 5–20 mm and rotational motions of 1°–4° have been reported even when such devices are used [1, 3]. Alternatively, motion-tracking hardware can be used to monitor the patient’s movement during image acquisition.

These devices provide the information needed to correct for motion in the acquired images. Such systems are usually composed of video cameras that monitor light sources or reflective markers attached to the patient’s head [4, 5]. Although the technique has several advantages, its main disadvantage is that the light source must be affixed in a precisely known geometry [6, 7]. Furthermore, it is not always possible to place the monitoring device in the correct geometry because of the small bore diameter of brain PET scanners, especially in hybrid PET-CT systems [8]. The most advanced devices use the Polaris tracking system [1, 2, 9, 10]. Assuming accurate measurement of patient movement, a number of methods for motion correction have been developed:

  1. The first is the multiple acquisition frames (MAF) method [4, 6]. In this method, each frame is reconstructed individually, motion compensation is applied, and the compensated reconstructions are summed. The patient’s movement is measured by an external monitoring system: every time the motion exceeds a specified threshold, the PET data are saved into a new frame, and reconstructions from the motion-free frames are then summed. The greatest disadvantage of the MAF method is that, with a high motion threshold, it neglects motion within frames; decreasing the threshold, however, increases the number of low-statistics frames that must be reconstructed.

  2. The second method is image-driven correction [11, 15], in which unconstrained or constrained deconvolution operators are used to post-process motion-blurred reconstructed images. This approach is of limited interest, primarily because the deconvolution process amplifies the noise in the PET data.

  3. The third method is event-driven correction, based on correction of the individual lines of response (LORs) [5, 11]. In this technique, motion correction is performed by transforming the LORs along which the events are measured to where they would have been measured had the object not moved [1, 5, 11]. However, implementing this approach requires list-mode data acquisition capability or specialized hardware to modify LORs on the fly. Moreover, the technique introduces normalization errors, because sinograms are likely to be multiplied by normalization factors corresponding to the wrong detector pairs; a number of works have attempted to address this [12–14]. The latter techniques, however, further add to the computational expense, as they modify the sensitivity correction factors.

Rahmim et al. [16] proposed a motion-compensated reconstruction method that is applicable to any scanner in the field (i.e., without list-mode acquisition capability), on the assumption of externally tracked motion information; it incorporates the measured motion into the system matrix of the EM algorithm. This paper introduces a new technique for detecting and correcting head motion during PET image acquisition, using six point sources of activity mounted on a plastic cap together with multi-frame data acquisition. The technique does not require any external motion-tracking device. Monte Carlo-simulated images were used to examine and evaluate the technique.

Materials and methods

GATE/GEANT

The GATE (Geant4 Application for Tomographic Emission) Monte Carlo package (version 4.0.0) was used to simulate a PET scanner [17]. This version of GATE is built on the GEANT4 Monte Carlo package (version 4.9.1.p02). The GATE/GEANT simulator can describe time-dependent phenomena such as phantom movements or source decay kinetics, thus allowing the simulation of time curves under realistic acquisition conditions. GATE/GEANT has already been validated for simulations of the whole-body ECAT EXACT HR+ PET scanner [18].

PET scanner

The ECAT EXACT HR+ PET scanner, consisting of four rings, each with 72 BGO block detectors in a circular arrangement (diameter 82.7 cm), was simulated. Each block detector was cut into 8 × 8 crystal elements of 4.4 × 4.1 × 30.0 mm, giving a total of 32 crystal rings, each with 576 crystal elements, covering an axial FOV of 15.5 cm. The transaxial full width at half maximum (FWHM) at the center of the scanner was 4.3 and 4.5 mm for 2-dimensional and 3-dimensional acquisition, respectively [19].

The event rate for brain imaging usually reaches a maximum at the center of the FOV and a minimum at its edge. Our simulations included normalization, attenuation, random, and scatter corrections. In general, all real data are corrupted to some degree by attenuation; therefore, Chang’s first-order correction method was used to compensate for photon attenuation. This method assumes constant attenuation throughout the whole brain, and each pixel within the frame of interest is simply multiplied by a correction coefficient [20]. For scatter correction in 3D mode, the Gaussian-fit method was applied. This method, which fits a Gaussian profile to the scatter tails found at the edge of each projection, works well in brain scanning, where the activity and the scattering medium are almost uniformly distributed and concentrated at the center of the field of view, resulting in a simple, slowly varying scatter distribution [21].
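
To make Chang’s first-order correction concrete, the following minimal sketch (in Python, with NumPy) computes the per-pixel correction coefficient for a uniform circular object by averaging the attenuation factor over projection angles. The attenuation coefficient, object radius, pixel size, and number of angles are illustrative assumptions, not parameters taken from our simulation.

```python
import numpy as np

def chang_first_order(image, mu_cm=0.096, radius_mm=100.0, pixel_mm=2.0, n_angles=64):
    """First-order Chang attenuation correction for a uniform circular object.

    Each pixel inside the object is divided by its attenuation factor
    averaged over projection angles. All parameter values are illustrative.
    """
    n = image.shape[0]
    c = (n - 1) / 2.0
    ys, xs = np.mgrid[0:n, 0:n]
    x = (xs - c) * pixel_mm          # physical x-coordinate of each pixel (mm)
    y = (ys - c) * pixel_mm          # physical y-coordinate of each pixel (mm)
    r2 = x ** 2 + y ** 2
    inside = r2 <= radius_mm ** 2    # pixels within the attenuating object
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    mean_att = np.zeros_like(image, dtype=float)
    for th in angles:
        # path length from each pixel to the circle boundary along direction th
        b = x * np.cos(th) + y * np.sin(th)
        path_mm = np.sqrt(np.maximum(b ** 2 + radius_mm ** 2 - r2, 0.0)) - b
        mean_att += np.exp(-mu_cm * path_mm / 10.0)   # convert mm to cm
    mean_att /= n_angles
    return np.where(inside, image / np.maximum(mean_att, 1e-6), image)
```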

Brain phantom

A voxelized Hoffman brain phantom was used to simulate a typical patient’s head [4, 6]. The phantom was defined on a 128 × 128 × 64 array with a voxel size of 2 × 2 × 2 mm. It contained 70 MBq of 18F, distributed according to the normal-brain distribution of 2-deoxy-2-[18F]fluoro-d-glucose (FDG) [22]. The phantom was positioned at the center of both the axial and transaxial fields of view (Fig. 1).

Fig. 1

HR+ geometry as modeled in GATE. Three-dimensional (3D) acquisition with the Hoffman phantom

Data acquisition

Imaging was performed in multi-frame acquisition mode at a sampling rate of 2 frames per second; a frame duration of 0.5 s is roughly the minimum required for tracking head motion in ordinary situations [1]. In our study, 0.2 million disintegrations were recorded per frame in 3D acquisition mode. The data were rebinned and sinograms were generated in a 128 × 128 matrix with a slice thickness of 2 mm. Slice images were reconstructed using the filtered back-projection (FBP) algorithm with a 6.0-mm Gaussian filter and attenuation correction, as is usual in routine clinical practice [1, 22]. All simulations were performed on a 2.7-GHz quad-core PC with 512 MB of RAM and an 80-GB system disk, running Fedora 12 Linux. Each simulation took about 8 h.
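
As a side note, the 6.0-mm FWHM Gaussian post-filter mentioned above can be applied to a reconstructed slice as in the following sketch; the FWHM-to-sigma conversion is standard, the 2-mm pixel size follows the text, and the function name is our own.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_slice(recon_slice, fwhm_mm=6.0, pixel_mm=2.0):
    """Apply a Gaussian post-filter of the given FWHM to a reconstructed slice."""
    # FWHM = 2 * sqrt(2 * ln 2) * sigma; convert to pixel units
    sigma_px = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / pixel_mm
    return gaussian_filter(recon_slice, sigma=sigma_px)
```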

Motion tracking

To track the head motion, six point sources, each consisting of a capillary 1 mm in diameter and 1 mm in length, were placed around the phantom (Fig. 2a).

Fig. 2

a Illustration of the arrangement of the six point sources around a 3D head model. b The relative distances of the six point sources, used to accurately track the head position (all distances are constant during the scan and are in mm)

Because of the short frame duration (0.5 s), the minimum activity required for tracking each point source in each frame is 0.5 MBq. Increasing the activity of the point sources may lead to better performance, though at the cost of an increased absorbed dose to the patient. Given the solid angle of the radiation emitted from these point sources, the patient-absorbed dose they contribute is negligible compared with the absorbed dose from the injected activity in a normal PET study.

A 4 × 4 transformation matrix representing a combination of 3 rotations and 3 translations has 12 unknown parameters (the 4th row is fixed at 0, 0, 0, 1); since each point source provides three equations, at least four point sources are required to solve for this matrix. However, because an iterative algorithm is used to estimate the transformation matrix, increasing the number of point sources to six significantly improves the accuracy. In practice, the patient’s absorbed dose is an important factor limiting the number of point sources. Detecting these point sources requires no external devices, as the scanner itself can track their motion. We assumed that the motion of the point sources completely reflected the movement of the subject’s head [1]. In practice, the point sources should be securely attached to the subject’s head and should preferably not overlap the head image.
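
For clarity, such a rigid transformation has the standard homogeneous block form, where R is the 3 × 3 rotation matrix and t the 3 × 1 translation vector; the 12 unknown parameters are the entries of R and t:

$$ T_{\text{tran}} = \left[ {\begin{array}{*{20}c} {R_{3 \times 3} } & {t_{3 \times 1} } \\ {0\quad 0\quad 0} & 1 \\ \end{array} } \right] $$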

Motion detection and correction

The most important steps in implementing this algorithm were the detection of the point sources in the images and the determination of their order and configuration. Detection was based on searching the 3-dimensional reconstructed images for the six maximum-count voxels that are not in close vicinity to one another. The order and configuration of the point sources were determined from their relative distances: the sources were initially placed in a configuration with known pairwise distances (Fig. 2b), and since translation and rotation do not change these relative distances, the relative distances measured in the image were used to establish the order.
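
A minimal sketch of the detection step is given below, under the assumption that the point sources appear as the brightest voxels in the reconstructed volume: it repeatedly takes the brightest remaining voxel and masks out its neighborhood, so that the six detected maxima are well separated. The exclusion radius is an assumed value, and the subsequent distance-based ordering is omitted.

```python
import numpy as np

def detect_point_sources(volume, n_sources=6, min_sep_vox=5):
    """Find the n_sources brightest voxels that are at least min_sep_vox apart."""
    work = volume.astype(float).copy()
    coords = []
    for _ in range(n_sources):
        idx = np.unravel_index(np.argmax(work), work.shape)
        coords.append(idx)
        # mask a cube around the detected maximum so nearby voxels are skipped
        sl = tuple(slice(max(i - min_sep_vox, 0), i + min_sep_vox + 1) for i in idx)
        work[sl] = -np.inf
    return np.array(coords, dtype=float)
```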

The coordinates of the point sources in the first frame were considered as the reference values and constituted the reference matrix (A ref.). The coordinates of the point sources in the test frame were used to constitute the test matrix (A test).

$$ A_{\text{ref}} = \left[ {\begin{array}{*{20}c} {x_{1} } & {x_{2} } & {x_{3} } & {x_{4} } & {x_{5} } & {x_{6} } \\ {y_{1} } & {y_{2} } & {y_{3} } & {y_{4} } & {y_{5} } & {y_{6} } \\ {z_{1} } & {z_{2} } & {z_{3} } & {z_{4} } & {z_{5} } & {z_{6} } \\ 1 & 1 & 1 & 1 & 1 & 1 \\ \end{array} } \right]\quad A_{\text{test}} = \left[ {\begin{array}{*{20}c} {x'_{1} } & {x'_{2} } & {x'_{3} } & {x'_{4} } & {x'_{5} } & {x'_{6} } \\ {y'_{1} } & {y'_{2} } & {y'_{3} } & {y'_{4} } & {y'_{5} } & {y'_{6} } \\ {z'_{1} } & {z'_{2} } & {z'_{3} } & {z'_{4} } & {z'_{5} } & {z'_{6} } \\ 1 & 1 & 1 & 1 & 1 & 1 \\ \end{array} } \right] $$
(1)

The transformation matrix (T tran.) was determined from the following equation, which was solved using a nonlinear least-squares iterative algorithm [23, 24]:

$$ A_{\text{ref}} = T_{\text{tran}} \times A_{\text{test}} \quad T_{\text{tran}} = \left[ {\begin{array}{*{20}c} {t_{11} } & {t_{12} } & {t_{13} } & {t_{14} } \\ {t_{21} } & {t_{22} } & {t_{23} } & {t_{24} } \\ {t_{31} } & {t_{32} } & {t_{33} } & {t_{34} } \\ {t_{41} } & {t_{42} } & {t_{43} } & {t_{44} } \\ \end{array} } \right] $$
(2)
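
The following sketch illustrates how T tran. can be obtained from the two 4 × 6 coordinate matrices; for simplicity it uses a linear least-squares solve rather than the iterative nonlinear algorithm of [23, 24].

```python
import numpy as np

def estimate_transform(a_ref, a_test):
    """Estimate the 4x4 transform T such that a_ref ≈ T @ a_test.

    a_ref, a_test: 4 x 6 homogeneous coordinate matrices (rows x, y, z, 1).
    Transposing gives a_test.T @ T.T ≈ a_ref.T, an ordinary least-squares problem.
    """
    t_transposed, *_ = np.linalg.lstsq(a_test.T, a_ref.T, rcond=None)
    return t_transposed.T
```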

The transformation matrix T tran. was then checked for movement by comparing it with the 4 × 4 identity matrix (I), as below:

$$ {\text{CF}} = \sqrt {\sum\limits_{i = 1}^{4} {\sum\limits_{j = 1}^{4} {(T_{{\text{tran}},ij} - I_{ij} )^{2} } } } $$
(3)

CF measures the discrepancy between these two matrices (the Frobenius distance of T tran. from the identity). If there was significant movement (CF > 3, which corresponded to a resolution of one voxel), the test frame was corrected by multiplying it with the transformation matrix:

$$ {\text{Image}}_{\text{corr}} = T_{\text{tran}} \times {\text{Image}}_{\text{test}} $$
(4)
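
Equations (3) and (4) together can be sketched as below. Note that scipy.ndimage.affine_transform maps output coordinates back to input coordinates, so the inverse of T tran. is supplied; voxel-index coordinates, linear interpolation, and the function name are our own assumptions, with the CF threshold taken from the text.

```python
import numpy as np
from scipy.ndimage import affine_transform

def correct_frame(frame, t_tran, cf_threshold=3.0):
    """Apply Eq. (4) to a 3D frame when Eq. (3) indicates significant motion."""
    cf = np.sqrt(np.sum((t_tran - np.eye(4)) ** 2))   # Eq. (3)
    if cf <= cf_threshold:
        return frame                                   # no significant motion
    # affine_transform maps output voxel o to input voxel M @ o + offset, so the
    # inverse transform is supplied to realize Image_corr = T_tran * Image_test.
    t_inv = np.linalg.inv(t_tran)
    return affine_transform(frame, t_inv[:3, :3], offset=t_inv[:3, 3], order=1)
```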

The procedure was repeated for all frames in this study (300 frames over 2.5 min), and all the corrected frames were then added together to create the data for a single image. A flowchart of the image-based motion correction (IBMC) procedure is shown in Fig. 3.

Fig. 3

Flowchart of the IBMC procedure. The image matrix is corrected by applying the transformation T tran., according to the motion data from the motion-tracking system

Head motion

Five types of motions were considered for the phantom:

Rotational motion of 2°, 4°, 7°, 10°, 15° around the x-axis.

Rotational motion of 2°, 4°, 7°, 10°, 15° around the y-axis.

Rotational motion of 2°, 4°, 7°, 10°, 15°, 30°, 90° around the z-axis.

Random combined motion in the ranges of ±7 mm translation along the x-axis, ±10° rotation around the x-axis, ±5 mm translation along the y-axis, ±15° rotation around the y-axis, and ±30° rotation around the z-axis. The motion for each frame was produced using a set of five random numbers in the above-mentioned ranges (a sketch of this sampling is given after this list).

Gradual combined motion in the ranges of ±15 mm translation along the x-axis, ±10° rotation around the x-axis, ±10 mm translation along the y-axis, ±20° rotation around the y-axis, ±15 mm translation along the z-axis, and ±15° rotation around the z-axis.

The duration of each of these five motions was 2.5 min.
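
A minimal sketch of how one random combined motion can be drawn per frame and assembled into a 4 × 4 homogeneous matrix is given below; the rotation-composition order, sign conventions, and the random generator are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def rotation(axis, deg):
    """4x4 homogeneous rotation about the x-, y- or z-axis (illustrative signs)."""
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    m = np.eye(4)
    i, j = {"x": (1, 2), "y": (0, 2), "z": (0, 1)}[axis]
    m[i, i], m[i, j], m[j, i], m[j, j] = c, -s, s, c
    return m

def random_motion_matrix():
    """Draw one random combined motion within the ranges listed above."""
    tx = rng.uniform(-7.0, 7.0)     # mm, translation along x
    ty = rng.uniform(-5.0, 5.0)     # mm, translation along y
    rx = rng.uniform(-10.0, 10.0)   # deg, rotation around x
    ry = rng.uniform(-15.0, 15.0)   # deg, rotation around y
    rz = rng.uniform(-30.0, 30.0)   # deg, rotation around z
    t = np.eye(4)
    t[0, 3], t[1, 3] = tx, ty
    return t @ rotation("z", rz) @ rotation("y", ry) @ rotation("x", rx)
```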

Evaluation of the motion-corrected images

The method was evaluated by comparing the corrected and non-corrected images with the motion-free reference images. Corresponding slices were extracted and displayed at 128 × 128 size for visual comparison. The difference between corresponding transaxial slice images was assessed by calculating the mean-squared difference (MSD) [6]:

$$ {\text{MSD}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {[f(i) - f_{\text{ref}} (i)]^{2} } $$
(5)

where n is the number of pixels in each slice, f(i) is the distance between pixel i in the test image (corrected or non-corrected slice) and the reference point source (x 1, y 1, z 1), and f ref.(i) is the corresponding f(i) for the motion-free slice.

In addition, the Pearson correlation coefficient, which measures the strength and direction of a linear relationship between two sets of variables, was calculated [25]. Each transaxial corrected and non-corrected slice was compared with the corresponding motion-free slice using the following equation:

$$ r = \frac{{\sum\nolimits_{i = 1}^{n} {x_{i} y_{i} } - \frac{1}{n}\sum\nolimits_{i = 1}^{n} {x_{i} } \sum\nolimits_{i = 1}^{n} {y_{i} } }}{{\sqrt {\sum\nolimits_{i = 1}^{n} {x_{i}^{2} } - \frac{1}{n}\left( {\sum\nolimits_{i = 1}^{n} {x_{i} } } \right)^{2} } \sqrt {\sum\nolimits_{i = 1}^{n} {y_{i}^{2} } - \frac{1}{n}\left( {\sum\nolimits_{i = 1}^{n} {y_{i} } } \right)^{2} } }} $$
(6)

where n is the number of pixels in each slice, x i is the pixel value in the transaxial corrected or non-corrected slice, and y i is the pixel value in the corresponding motion-free slice.
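
Both evaluation metrics are straightforward to compute. In the sketch below, msd implements Eq. (5) for precomputed distance maps f and f ref., and Eq. (6) is evaluated directly by NumPy’s corrcoef; the function names are our own.

```python
import numpy as np

def msd(f, f_ref):
    """Mean-squared difference of Eq. (5) between two per-pixel distance maps."""
    return np.mean((f - f_ref) ** 2)

def pearson_r(test_slice, ref_slice):
    """Pearson correlation coefficient of Eq. (6) between two slices."""
    return np.corrcoef(test_slice.ravel(), ref_slice.ravel())[0, 1]
```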

Results

Simple rotations

Table 1 shows the MSDs and Pearson correlations between non-corrected images and the corresponding reference images (without motion), and between motion-corrected images and the reference images. The values are averages over the 64 slices of the Hoffman phantom, with corresponding standard deviations. As can be seen, for 2° rotations (around the x-, y- and z-axes), the difference between non-corrected and corrected images was not significant, probably because the amplitude of the motion was small compared with the scanner resolution. For rotations of ≥4°, the differences between the corrected and non-corrected images were remarkable. A statistical t test showed that, after correction, the differences between motion-corrected images and the corresponding reference images were not significant, demonstrating the success of the algorithm in correcting the images for motion.

Table 1 Calculation of MSD and Pearson correlation coefficient for 17 types of motion

A representative slice image of the Hoffman brain phantom is presented in Fig. 4. The figure shows the same slice after 10°, 7° and 90° of rotation around the x-, y- and z-axes, respectively, together with the corresponding images after motion correction.

Fig. 4

A representative slice of Hoffman brain phantom (R). Images of the same slice after 10° rotation around the x-axis (a), 7° rotation around the y-axis (b) and 90° rotation around the z-axis (c). The corresponding images after corrections are marked as a’, b’ and c’, respectively

Combined motion

Random motions of the phantom during the 2.5-min study were simulated. A representative slice image of the Hoffman phantom (without motion), the corresponding slice image with combined random motion, and the same slice after motion correction are presented in Fig. 5, together with the counts along a line profile drawn on these slices. As can be seen, the line profile over the motion-corrected image is very close to the corresponding line profile over the reference image. The correlation graphs of the same images are shown in Fig. 6; the figure shows pixel-by-pixel comparisons of the image before and after motion correction with the reference image. The correlation coefficients before and after motion correction were 0.67 and 0.98, respectively, and the corresponding MSDs were 197.2 and 0.84.

Fig. 5

A representative slice image of Hoffman phantom; the reference slice image (without motion) is shown on the left, the corresponding slice image with combined random motion in the middle and the same image after motion correction on the right. The counts along a line profile drawn on these slices are shown below. As can be seen, the line profile over the motion-corrected image is very close to the corresponding line profile over the reference image

Fig. 6

Pixel-by-pixel comparison. Graphical representation of correlation between a slice image of Hoffman phantom and the corresponding reference image before and after random motion correction

In the gradual combined motion study, the continuously moving phantom was scanned for 2.5 min, with motion ranges of ±15 mm and ±10° in the x-axis, ±10 mm and ±20° in the y-axis, and ±15 mm and ±15° in the z-axis. A slice image of the Hoffman phantom (without motion) and the corresponding image with gradual motion, before and after motion correction, are presented in Fig. 7, together with count profiles of these slices. As can be seen, the line profile over the motion-corrected image is very close to the corresponding line profile over the reference image.

Fig. 7

A slice image of the stationary phantom is shown on the left, the corresponding slice image after adding gradual motion to the phantom in the middle, and the same slice after motion correction on the right. Below are line profiles over the same images

The corresponding correlation graphs are shown in Fig. 8, with pixel-by-pixel comparisons of the image before and after motion correction with the reference image. The correlation coefficient was 0.66 before correction and 0.97 after correction; the MSDs were 193.1 and 0.85 before and after motion correction, respectively. Table 2 shows the MSDs and correlation coefficients averaged over the 64 slices of the Hoffman phantom image for random and gradual motions separately. All simulations were also performed with a longer frame duration (2.5 min, 75 frames, each of 2 s) and the same random and gradual motions; Table 3 shows the corresponding MSDs and correlation coefficients, averaged over the 64 slices. From Tables 2 and 3, it can be inferred that, for the same movements, increasing the frame duration from 0.5 to 2 s reduced the effectiveness of the proposed technique in motion compensation.

Fig. 8

Pixel-by-pixel comparison. Graphical representation of correlation between a slice image of Hoffman phantom and the corresponding reference image before and after gradual motion correction

Table 2 Calculation of MSD and Pearson correlation coefficient for two types of phantom motion for 2.5 min (300 frames, each frame of 0.5 s)
Table 3 Calculation of MSD and Pearson correlation coefficient for two types of phantom motion for 2.5 min (75 frames, each frame of 2 s)

Discussion and conclusion

We presented an image-based method for correcting head-motion artifacts in brain PET acquisitions that uses six positron-emitting point sources and requires no external motion-tracking system. Because the proposed technique uses the same coordinate system for motion tracking and image acquisition, the complicated steps of compensating for relative motion between tracker and scanner are eliminated. Furthermore, the method requires no additional hardware or software for communication between tracker and scanner.

Alternatively, it is possible to compensate for motion by realigning each LOR in the list-mode data according to the motion data [1, 11]. However, list-mode data sets are large, and realignment of the LORs requires more computation than the proposed image-based method. In addition, our technique can be applied to PET systems that cannot perform list-mode data acquisition.

Very effective motion compensation was achieved in the Monte Carlo-simulated images, and the method was shown to be successful in correcting different types of motion. However, the motion-corrected slices appeared slightly poorer in resolution and contrast than the corresponding reference images, mainly for two reasons.

First, the present method includes intra-frame motion, because frames are compared with each other as a whole. The efficiency of the technique may therefore be improved by reducing the frame duration: as can be seen from Tables 2 and 3, for the same movements, increasing the frame duration from 0.5 to 2 s reduced the effectiveness of the motion compensation. Theoretically, the duration can be reduced as far as desired, though at the cost of disk space and processing time. Increasing the temporal resolution may also require higher point-source activity, which may be limited by radiation protection concerns.

The second reason for the reduced quality of the motion-corrected image is the spatial resolution of the reconstructed images. In a digital image, correction is limited to the pixel level. In our study, the pixel size was 2 × 2 mm; if a pixel required a correction of 1–2 mm, only a one-pixel correction would be possible, which may cause a relatively large interpolation error. Moreover, rotational motion affects an image non-uniformly: areas close to the center of rotation are less affected than those far from it. This results in some image degradation for any type of motion correction. To reduce this effect, the image resolution may be increased by using smaller voxels and super-resolution methods; in any case, image resolution is ultimately limited by the intrinsic spatial resolution of the imaging system.

Further investigation is required to determine the impact of the technique in a clinical context.