Introduction

The value of combined positron emission tomography and computed tomography (PET/CT) for diagnosis and staging of malignant disease is well accepted. In recent years, two additional applications of PET/CT have grown in importance: radiotherapy planning (RTP) and therapy monitoring. Both applications pose new challenges for PET/CT, namely very precise quantification of tumour localisation and tracer uptake.

In RTP, the exact localisation of the tumour must be known to delineate the target volume that will be irradiated, often called the biological target volume when PET is used [1, 2]. Correct volume delineation is therefore essential to achieve local tumour control. Additionally, the volume of irradiated healthy tissue can be reduced, and hence, the dose in malignant tissue can be escalated.

In therapy monitoring, tumour volume is also of interest, but it is more important to measure the tracer uptake of the lesion with high accuracy. The measured uptake may be the decisive factor in the choice of therapy scheme and may influence future therapy decisions and, finally, the outcome of the patient [3–5].

Regardless of discussions about algorithms for tumour volume definition in PET [6], it is clear that motion of tumours during the PET acquisition, which takes several minutes per bed position, is one of the limiting factors. PET presents data averaged over many breathing cycles, which leads to smearing artefacts. Another problem in PET/CT measurements is the possible misalignment between PET and CT data, since CT data are acquired as a snapshot of one breathing state [7]. The lung is of particular interest, as non-small-cell lung cancer is a major target for the use of PET in RTP [1, 8]. Studies evaluating tumour motion in the lung using 4D MRI found cranio-caudal motion amplitudes of 24 ± 17 mm in the lower regions of the lung and reported significantly higher mobility of tumours in the lower region than in the middle and upper regions [9]. Such tumour motion can lead to errors in the quantification of activity uptake and tumour size [10].

Among the attempts to reduce the effects of respiration artefacts, respiratory gating (RG) is currently the most commonly used method. In RG, the acquired events are sorted into different respiratory gates depending on the breathing state of the patient. A separate image, free of respiration artefacts, is then reconstructed for each state [11]. This method improves quantification: Nehmeh et al. [10] reported a reduction of tumour volume of up to 34% and an increase in the standardised uptake values (SUV) of up to 159% when comparing gated images to non-gated images.

The disadvantage of RG is the reduction of signal-to-noise ratio, as for each gate, only a fraction of the acquired counts is used for image reconstruction. This can be improved by increasing the scan time [12, 13]. On the other hand, because of patient discomfort and for economic reasons, scan time cannot be increased by a factor of four or more, four being the minimum number of gates into which a breathing cycle needs to be divided [14]. Hence, for clinical routine, algorithms are needed that reduce motion artefacts without reducing the image quality. There are various approaches fulfilling this requirement, each with its own advantages and disadvantages.

The first of these correction methods was reported for patient shifts in PET of the brain, where a rigid transformation is sufficient to correct for the motion [15]. Outside the cranium, rigid transformation was used successfully for correction of single organ motion, e.g. for cardiac studies [16]. For motion correction of large fields of view covering several organs, non-rigid methods seem to be superior and are therefore investigated by several groups. Schaefers et al. [17] recently reported a correction based on optical flow methods, and Lamare et al. [18] implemented an affine transformation of list-mode data for the correction of respiratory motion over the thorax.

While all these methods correct the whole data set, we investigated an alternative approach based on local motion correction. This means that only the volume around the tumour is corrected, while the rest of the data set remains unchanged. Such an approach is based on the assumption that, as long as only a limited volume around the tumour is considered, a local rigid transformation may be sufficient to compensate for the effects of respiratory motion. This method aims at using all counts for the reconstructed image, combined with a short computational time.

Materials and methods

PET/CT scanner

PET data were acquired with the PET/CT Biograph Sensation 16 scanner (Siemens Medical Solutions, Erlangen, Germany). The PET component of this tomograph consists of 24 detector rings of LSO detectors and is a 3D-only tomograph. The axial and transversal fields of view are 16.2 and 58.5 cm, respectively. The transverse resolution of the scanner is 6.5 mm and the axial resolution is 6.0 mm, both at a radius of 1 cm. The CT component of the tomograph is a 16-slice spiral CT with a variable slice thickness of 0.6–10.0 mm and a 50-cm transverse field of view that can be extended to 70 cm by means of a fitting algorithm. A detailed characterisation of the scanner can be found elsewhere [19]. The scanner is equipped with a research package for list-mode acquisition (Siemens Medical Solutions).

Data processing and correction algorithm

For both phantom and patient studies, list-mode data were taken for 10 min. In a first step, a summed image using all data was reconstructed with an attenuation-weighted ordered-subset expectation maximisation algorithm using four iterations and eight subsets. Attenuation correction based on CT data was performed as well as scatter correction. This data set is called “uncorrected image” in the following. A rectangular volume of interest (VOI) was defined manually around the lesion; this could be done using sagittal, coronal and transversal views. Within this VOI, the cranio-caudal motion of the tumour over the whole list-mode acquisition was registered by a centre of mass (COM) approach previously described by our group [20]. The resulting curve represented the tumour position in units of plane numbers (0 is the most cranial plane of the acquired bed position) over time. By dividing the interval between the two maximum end positions of the tumour position into n frames (n ranging from 4 to 40 for the phantom studies and n = 12 for the patient studies), n frames for n different tumour positions were obtained. In the next step, the acquired events were sorted into bins corresponding to the tumour position. This resembles respiratory gating, but in our case, the real tumour position was used as trigger signal and not only a time signal obtained from, for example, the point of maximum inspiration of the patient.
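The position-based sorting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the tumour-position curve is available as one sample per time step and that the interval between the two extreme positions is split into n equal-width bins. The function name `assign_position_frames` and the sinusoidal test curve are hypothetical.

```python
import numpy as np

def assign_position_frames(position, n_frames):
    """Assign each sample of a tumour-position curve to one of n_frames bins.

    The interval between the two extreme positions is divided into n_frames
    equal-width bins, so sorting depends on position rather than on time.
    """
    position = np.asarray(position, dtype=float)
    lo, hi = position.min(), position.max()
    if hi == lo:
        # No detectable motion: everything belongs to a single frame.
        return np.zeros(position.shape, dtype=int)
    edges = np.linspace(lo, hi, n_frames + 1)
    # Use interior edges only, so the maximum position falls into the last bin.
    return np.digitize(position, edges[1:-1])

# Hypothetical position curve in plane numbers, sampled every 0.1 s for 60 s.
t = np.linspace(0.0, 60.0, 601)
curve = 20.0 + 3.0 * np.sin(2 * np.pi * t / 4.0)
frames = assign_position_frames(curve, n_frames=12)
```

Each list-mode event would then be assigned the frame index of the time sample in which it was recorded; with equal-width position bins, the frames near the turning points of the breathing curve typically collect more events than those in between.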

The sorted events were rebinned into sinograms and reconstructed with the same algorithm as the summed image before. Then, in each of these reconstructed images, the COM of the tumour was estimated within the VOI defined before. To avoid influence of the background activity on the value of the COM of the tumour, only voxels which contain activity concentrations higher than 60% of the maximum concentration within the selected VOI were used for this calculation.
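As an illustration of the thresholded COM estimate, the following sketch computes an activity-weighted centre of mass inside a rectangular VOI, using only voxels above 60% of the VOI maximum. The function name `thresholded_com` and the representation of the VOI as a tuple of slices are assumptions made for this example.

```python
import numpy as np

def thresholded_com(image, voi, fraction=0.6):
    """Activity-weighted centre of mass inside a rectangular VOI.

    image:    3D array of activity concentrations.
    voi:      tuple of three slices defining the VOI (hypothetical convention).
    fraction: only voxels above fraction * (VOI maximum) contribute.
    Returns the COM in voxel coordinates of the full image.
    """
    sub = np.asarray(image, dtype=float)[voi]
    weights = np.where(sub > fraction * sub.max(), sub, 0.0)
    total = weights.sum()
    if total <= 0.0:
        raise ValueError("VOI contains no activity above the threshold")
    grids = np.indices(sub.shape)            # one index grid per axis
    offsets = [s.start or 0 for s in voi]    # VOI origin in the full image
    return tuple(float((g * weights).sum() / total) + o
                 for g, o in zip(grids, offsets))

# Example: uniform background with one hot voxel inside the VOI.
img = np.ones((10, 10, 10))
img[4, 5, 6] = 10.0
voi = (slice(2, 8), slice(2, 8), slice(2, 8))
com = thresholded_com(img, voi)
```

Because the background voxels fall below the threshold, they do not pull the COM towards the centre of the VOI.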

Before the next step, a reference frame was chosen. This reference frame reflected the respiratory state of the patient to which the images were corrected. We arbitrarily decided to use the end-expiration state; any other state can be chosen as well, depending on the clinical needs. The difference of the COM between the reference frame and all other frames was calculated. This resulted in n − 1 3D vectors, one for each frame except the reference frame. Then, for each frame, the activity distribution inside the VOI was shifted according to this vector, representing the measured tumour motion. The activity distribution outside the VOI was not changed. The shift of the activity distribution was done for each direction separately, according to the following formulas:

$$\operatorname{Im}_k^{\text{cor}}\left[x,y,z\right] = \left(1 - \xi_X^k\right) \times \operatorname{Im}_k\left[x - \Delta\text{COM}^k_{X,\text{down}},\, y,\, z\right] + \xi_X^k \times \operatorname{Im}_k\left[x - \Delta\text{COM}^k_{X,\text{up}},\, y,\, z\right]$$
(1)
$$\operatorname{Im}_k^{\text{cor}}\left[x,y,z\right] = \left(1 - \xi_Y^k\right) \times \operatorname{Im}_k\left[x,\, y - \Delta\text{COM}^k_{Y,\text{down}},\, z\right] + \xi_Y^k \times \operatorname{Im}_k\left[x,\, y - \Delta\text{COM}^k_{Y,\text{up}},\, z\right]$$
(2)
$$\operatorname{Im}_k^{\text{cor}}\left[x,y,z\right] = \left(1 - \xi_Z^k\right) \times \operatorname{Im}_k\left[x,\, y,\, z - \Delta\text{COM}^k_{Z,\text{down}}\right] + \xi_Z^k \times \operatorname{Im}_k\left[x,\, y,\, z - \Delta\text{COM}^k_{Z,\text{up}}\right]$$
(3)

where \(\operatorname{Im}_k\) represents the image matrix of the k-th frame before the correction, while \(\operatorname{Im}_k[x, y, z]\) is the element of this matrix at position x, y, z. \(\operatorname{Im}_k^{\text{cor}}\) is the image matrix after the correction. \(\Delta\text{COM}^k_{d,\text{up}}\) and \(\Delta\text{COM}^k_{d,\text{down}}\) are the values for shifting the activity distribution in frame k in direction d (X, Y or Z), rounded up and rounded down, respectively. The weighting factor \(\xi_d^k\) is given by:

$$\xi _d^k = \left| {\Delta {\text{COM}}_d^k \bmod 1} \right|,$$
(4)

where \(\Delta {\text{COM}}_d^k \) represents the value for shifting the activity distribution in frame k into direction d.

With this formula, shifts of the activity distribution in fractions of planes are performed equivalent to a trilinear interpolation. Voxels at the border of the VOI are treated as illustrated in Fig. 1.
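The fractional shift of Eqs. 1 to 4 can be sketched for a single direction as below. This is an illustrative sketch under stated assumptions: "rounded down" and "rounded up" are taken as floor and ceiling, the weight is taken as the shift minus its floor (which for positive shifts equals the fractional part in Eq. 4), and the border is handled by simply clamping source indices to the array edge, a simplification of the scheme in Fig. 1. The function name `shift_along_axis` is hypothetical.

```python
import numpy as np

def shift_along_axis(volume, delta, axis):
    """Shift `volume` by `delta` voxels (possibly fractional) along `axis`.

    Implements out[x] = (1 - xi) * in[x - down] + xi * in[x - up],
    i.e. a linear interpolation between the two neighbouring integer shifts.
    """
    volume = np.asarray(volume, dtype=float)
    down = int(np.floor(delta))   # shift rounded down
    up = int(np.ceil(delta))      # shift rounded up
    xi = delta - down             # weight of the rounded-up term, in [0, 1)
    n = volume.shape[axis]
    idx = np.arange(n)
    # Source indices, clamped to the array (edge replication; simplification).
    src_down = np.clip(idx - down, 0, n - 1)
    src_up = np.clip(idx - up, 0, n - 1)
    return ((1.0 - xi) * np.take(volume, src_down, axis=axis)
            + xi * np.take(volume, src_up, axis=axis))

# Example: a single hot plane shifted by half a plane is split equally
# between the two neighbouring planes.
profile = np.array([0.0, 0.0, 4.0, 0.0, 0.0])
half = shift_along_axis(profile, 0.5, axis=0)   # [0, 0, 2, 2, 0]
```

Applying the function once per axis gives the separable, per-direction shift described in the text. Away from the borders, the total activity is preserved.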

Fig. 1

Example of voxel shift in VOI: the left side represents the uncorrected and the right side the corrected frame; the VOI is given by the thick rectangle. All voxels within the VOI are shifted upwards by one voxel. Voxels which are shifted out of the VOI are not taken into account for the corrected image (e.g. number 7). Missing voxels at the opposite border are filled by a copy of the closest voxel outside of the VOI (number 2). No voxel outside of the VOI is changed

After performing this step for all frames except the reference frame, the images for all frames were summed to end up with one data set, in the following called “corrected image”. When the maximum difference of the position of the lesion in one direction was smaller than the dimension of one voxel, i.e. 2.7 mm in transversal directions and 3.4 mm in axial direction, this was considered as noise, and hence, no correction in this direction was done. A flow diagram of the whole method is shown in Fig. 2.
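The computation of the per-frame shift vectors, including the one-voxel noise criterion, might look as follows. This sketch assumes that the COM of each frame is available in millimetres and that the shift is defined as reference COM minus frame COM, so that shifting a frame by this vector moves its lesion onto the reference position. The voxel sizes are those given in the text; the function name `shifts_to_reference` and the example COM values are hypothetical.

```python
import numpy as np

# Voxel size in mm for x, y (transversal) and z (axial), as stated in the text.
VOXEL_SIZE_MM = np.array([2.7, 2.7, 3.4])

def shifts_to_reference(com_mm, reference):
    """Shift vectors (in voxels) that move each frame's COM onto the reference COM.

    com_mm:    array of shape (n_frames, 3) with COM positions in mm.
    reference: index of the reference frame.
    Directions whose maximum position difference is below one voxel are
    treated as noise and receive no correction.
    """
    com_mm = np.asarray(com_mm, dtype=float)
    delta_mm = com_mm[reference] - com_mm
    excursion = com_mm.max(axis=0) - com_mm.min(axis=0)
    delta_mm[:, excursion < VOXEL_SIZE_MM] = 0.0
    return delta_mm / VOXEL_SIZE_MM

# Hypothetical COMs for three frames: 2 mm of jitter in x, 20 mm motion in z.
com = [[0.0, 0.0, 0.0], [1.0, 0.0, 10.0], [2.0, 0.0, 20.0]]
shifts = shifts_to_reference(com, reference=0)
```

In this example, only the z direction is corrected, because the 2-mm excursion in x is smaller than the 2.7-mm voxel size. The corrected image would then be the sum of the reference frame and all shifted frames.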

Fig. 2

Flow diagram of motion correction algorithm

After applying the correction algorithm, the lesions in the uncorrected and the corrected image were analysed for activity concentration and size of the lesions. For the latter, a threshold of 45% of the maximum activity concentration of the lesion was used to delineate the volume. According to [21], this threshold was found to be appropriate for the typical lesion volumes and signal-to-noise ratios found in our patients’ scans. Additionally, the mean activity concentration within the delineated volume was measured. For the patient data, instead of activity concentration, standardised uptake values normalised to body mass were used (SUV).
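The threshold-based analysis can be illustrated with the sketch below. It assumes a simple global threshold inside the VOI without any connectivity check, and a body-mass-normalised SUV computed from a decay-corrected activity concentration with a tissue density of 1 g/ml. The function names and the example values are hypothetical.

```python
import numpy as np

def delineate_lesion(voi_image, voxel_volume_ml, fraction=0.45):
    """Lesion volume and mean concentration from a relative threshold.

    All voxels at or above fraction * maximum are counted as lesion;
    connectivity is not checked in this sketch.
    """
    voi_image = np.asarray(voi_image, dtype=float)
    mask = voi_image >= fraction * voi_image.max()
    volume_ml = float(mask.sum()) * voxel_volume_ml
    mean_concentration = float(voi_image[mask].mean())
    return volume_ml, mean_concentration

def suv_body_mass(conc_kbq_per_ml, injected_mbq, body_mass_kg):
    """SUV normalised to body mass, assuming decay-corrected input and 1 g/ml."""
    injected_kbq = injected_mbq * 1000.0
    body_mass_g = body_mass_kg * 1000.0
    return conc_kbq_per_ml / (injected_kbq / body_mass_g)

# Voxel volume for 2.7 x 2.7 x 3.4 mm voxels, converted from mm^3 to ml.
voxel_ml = 2.7 * 2.7 * 3.4 / 1000.0

# Hypothetical VOI: background of 1 kBq/ml with a 2 x 2 x 2 block at 10 kBq/ml.
voi_img = np.ones((6, 6, 6))
voi_img[2:4, 2:4, 2:4] = 10.0
volume, mean_conc = delineate_lesion(voi_img, voxel_ml)
suv = suv_body_mass(5.0, injected_mbq=400.0, body_mass_kg=80.0)
```

Because the threshold is relative to the lesion maximum, motion blurring affects the result twice: it lowers the maximum and widens the region above the threshold.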

Motion phantom

To define the optimal number of frames for the correction algorithm and to validate the method, a phantom was built that simulates respiratory motion in cranio-caudal direction in the presence of background activity. The phantom consisted of a Plexiglas cylinder with a diameter of 30 cm. Inside the cylinder, a moveable holder for fillable Plexiglas spheres was attached. The motion of the holder was controlled by a linear stepper motor, which was attached to it at the top of the cylinder. The stepper motor was controlled by a computer to simulate the respiratory motion. The motion amplitude can be varied between 0 and 30 mm with an accuracy of 0.1 mm. For this study, the phantom was set up with a Plexiglas sphere with a volume of 23 ml filled with 18F-FDG solution. Activity concentrations were chosen to represent a signal-to-background ratio of 8:1, with absolute concentration in the sphere between 20 and 28 kBq/ml.

Four measurements were performed with amplitudes of 15 and 28 mm and a breathing frequency of 13 and 20 cycles per minute for each amplitude. For each of these settings, CT for attenuation correction was acquired, followed by PET acquisition in list-mode format for 10 min. As standard of reference, a data set without motion was acquired. All data were transferred to a workstation for further processing.

Patients

List-mode data were acquired for nine patients with solitary lung lesions. All patients underwent a routine clinical 18F-FDG protocol for oncological staging or restaging including a diagnostic CT or a low-dose CT. Standard protocol during CT requires the patients to hold their breath without extreme inhalation. For three patients (patients 1, 2 and 4), list-mode data were acquired after the clinical protocol was finished (87–137 min after injection of 399–507 MBq depending on the weight of the patient), and for six patients, list-mode data were acquired before the clinical protocol (58–112 min after injection of 404–549 MBq). The bed position of the PET was centred on the expected lesion, and list-mode data were acquired for 10 min. The list-mode data were transferred to a workstation for further processing.

All patients gave written informed consent for the PET and CT studies.

Results

Motion phantom

The maximum detected movement in cranio-caudal direction, the measured lesion volume and the mean activity concentration within this volume were used as quality measures of the correction algorithm to find the optimal number of frames (n) into which the respiration cycle is divided. In all measurements, frame numbers of 12 or higher gave the best values, meaning that the curves of lesion volume as a function of n showed minimum values and the curves of detected movement and mean activity concentration showed maximum values. The curves for the measurements with 28-mm amplitude are shown in Fig. 3. Although the noise in the curves differed, the tendency was the same for all curves: starting from the uncorrected data, the improvement was most pronounced at four and six frames. Above 16 frames, the results did not improve further and fluctuated increasingly. Thus, for further analysis, n = 12 was chosen as the minimum number of frames that achieved the best values for lesion volume and activity concentration.

Fig. 3

Measured movement in cranio-caudal direction (a), lesion volume (b) and mean activity concentration in this volume (c) as function of the number of the frames (n). Phantom with 28-mm amplitude and 20/min frequency (solid line) and 13/min frequency (dotted line)

For this number of frames, the acquired phantom data were used to verify the local motion correction algorithm. Values for measured volume and for maximum and mean activity concentration before and after correction of the data are shown in Table 1. For the largest motion amplitude of 28 mm, the measured lesion volume was reduced by up to 50% and reached the true sphere volume of 23 ml within 3.5%, while the measured maximum activity concentration increased by up to 14% and the mean activity concentration by up to 26%.

Table 1 Results measured for motion phantom (23-ml hot sphere) for different amplitude and frequency settings; comparison between corrected and uncorrected data

Patient studies

Six patients had solitary lesions in the lung surrounded only by lung tissue. In two patients, the lesion was attached to the posterior thorax wall, and in one patient, the lesion was at the border liver–lung. Visual analysis of co-registered PET and CT data did not show noticeable mismatch in any of the nine patients.

Table 2 shows, for each data set, the maximum motion amplitude in each direction, the lesion volume after the correction, and the change in lesion volume, maximum SUV and mean SUV due to the correction. Changes in mean SUV and volume are visualised in Fig. 4. The tumour volume was between 2.0 and 17.4 cm³ (mean 10.9) before and between 1.7 and 15.6 cm³ (mean 9.6) after the correction. The maximum change in measured lesion volume was 27% (patient 1). The maximum SUV (mean SUV) was between 5.5 and 12.4 (3.3 and 8.0) before and between 5.5 and 14.3 (3.3 and 8.7) after the correction. The maximum change in maximum SUV was 15% (patient 4) and in mean SUV 13% (patient 1).

Fig. 4

Change of measured lesion volume and mean SUV

Table 2 Results for patient examinations

Figure 5 shows the lesion in patient 1 in a sagittal view before and after applying the correction algorithm.

Fig. 5

CT fused with uncorrected (a) and corrected (b) PET image

Discussion

Due to the long acquisition time, motion is a well-known problem in PET. Especially periodic motion, as it is caused by respiration, can influence the quantification of PET data in areas like thorax or upper abdomen [10]. In this study, we focused on lung lesions, but other organs are also affected by respiratory motion [20], and local motion correction may be applied.

While non-rigid transformations for motion correction [17, 18] are currently under investigation by others, we present a method that might offer advantages over such transformations. The proposed method is based on the assumption that exact quantification is needed only in a limited area around the lesion itself. This assumption holds true for applications like RTP or therapy monitoring. Secondly, the lesion is assumed to be a rigid structure; hence, a translation along the three coordinate axes is sufficient to correct for the motion. Although this assumption might not be fully true, it can be considered correct within the limits of the spatial resolution of PET. One big advantage of the rigid correction is that the volume of the lesion in each frame is fully maintained. For non-rigid algorithms, it may be unclear how the correction changes the volume of a lesion. Another point is that the proposed method will also work, most likely even better, with new, more specific tracers. Firstly, the accuracy of the measured COM of a lesion may depend on the signal-to-background ratio, which is higher for more specific tracers. Secondly, our method does not depend on the detection of lung–liver or lung–mediastinum borders, as some other methods do. Thus, it will still work with more specific tracers, although they may not delineate these borders as well as FDG.

We have shown that with local motion correction, the motion artefacts induced by periodic motion like respiration can be reduced. Measured lesion volume was reduced by up to 50% and was accurate within 3% in phantom studies. The value of 50% was found for a simulated motion of 28 mm, the maximum in our phantom. For smaller amplitudes, the effect of motion is less pronounced, and hence, the improvement of the motion correction is less. This is reflected by a maximum change of 7% for 15-mm motion amplitude. Relative changes in mean activity concentration are lower than changes in lesion volume, but are still up to 26% for an amplitude of 28 mm.

In patient data, we found changes of up to 27% for lesion volume, which is lower than what we measured in the phantom studies. In addition, we found that in patient studies, the improvement by motion correction is less dependent on the amplitude of motion of the lesion. Other important factors are lesion size and tracer uptake. For example, in patient no. 6, who showed lesion motion of only 5 mm in cranio-caudal direction, a difference of 22% in lesion volume was found. This, however, needs to be investigated in a larger patient population.

In RTP, precise tumour localisation, represented by the biological target volume, is essential to deliver a high dose to the malignant tissue and keep the dose to surrounding tissue low. Therefore, differences found in lesion volume assessment after motion correction may influence treatment planning. Another advantage for RTP is the possibility offered by our method to correct to the particular respiratory state which radiotherapists use for their treatment. For example, if respiratory-gated radiotherapy is done in end-expiration phase, the planning CT is acquired in this phase as well. In the PET data, the lesion is corrected to the end-expiration phase by our method. As CT and PET data are acquired in the same machine, they are co-registered and the data can be transferred to the treatment planning system.

Finally, two PET data sets can be created (one in maximum inspiration and the other in maximum expiration), which show the area over which the tumour is moving. Since this represents an average of many respiration cycles, it may be superior to the motion of the tumour measured by 4DCT. The latter is acquired over only one or a few breathing cycles in order to save radiation dose, so it is less likely to capture representative breathing cycles.

We used a method of intrinsic motion assessment based on list-mode data [20] to derive the respiration curve. From this curve, the different frames used by the correction algorithm are determined. This ensures that real tumour motion is used for the correction, including even non-periodic motion, e.g. motion caused by muscle relaxation.

In principle, externally gated studies can also be used as input for the correction algorithm presented here. The result will be affected by the fact that only an indicator such as the expansion of the thorax is measured. Advantages of the data-driven method are discussed in [20]. Since all measured events are included in the final image after correction, the signal-to-noise ratio is not reduced compared to an uncorrected image. Therefore, scan time does not need to be increased, as would be the case when analysing individual frames of gated studies. Thus, this local motion correction method can easily be implemented in the clinical workflow.
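As a rough illustration of this argument, assume that image noise is dominated by Poisson counting statistics (a first-order approximation that ignores reconstruction effects), so that the signal-to-noise ratio scales with the square root of the number of counts N, and assume n equally populated gates. Both N and the equal-population assumption are introduced here for illustration only. A single gate then has

$$\text{SNR}_{\text{gate}} \propto \sqrt{N/n} = \frac{\text{SNR}_{\text{all}}}{\sqrt{n}}, \qquad n = 12:\ \frac{1}{\sqrt{12}} \approx 0.29,$$

so restoring the original signal-to-noise ratio in one gate would take roughly n times the scan time (about 120 min instead of 10 min for n = 12), whereas the summed corrected image uses all N counts.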

Although no additional acquisition time is necessary, the presented method needs some additional processing time and a minimum of manual interaction. The latter is the definition of the VOI in the non-corrected image, which can be done by the technologist performing the acquisition. The reconstruction of the 12 frames then takes about 10 min on a standard PC, and the correction process itself takes about 1 min more. When the data-driven motion detection is used to create the respiration curve, about 17 h are necessary for a 10-min PET acquisition and about 5 h for a 3-min acquisition, which corresponds to routine scans in our department. All these steps except the initial VOI definition can be done automatically, for example overnight. The processing time for the data-driven motion detection can be reduced by the use of faster computers or parallel processing on several processors.

Besides this longer processing time, local motion correction has limitations in some settings. It will fail in situations where the detection of lesions is of primary interest. Its major advantage lies in known lesions that are referred for PET to obtain additional information for RTP. In patients in whom lesion uptake is used for therapy monitoring, quantification can be considerably improved for moving lesions. The impact of such a correction on local tumour control or patient outcome needs to be investigated in further studies.

Another issue in the context of motion correction is inaccurate attenuation correction due to misalignment between PET and CT data. If only one CT is used, which corresponds to one breathing state, PET frames that correspond to another breathing state are corrected with an inappropriate attenuation map. Although beyond the scope of this study, especially because we are not able to acquire 4DCTs with our current hardware, it would be of interest to investigate the additional effect of using optimised attenuation maps for reconstruction of the different frames (e.g. by using 4DCT). To avoid additional radiation dose, an alternative would be to modify the attenuation map by an algorithm that assigns the attenuation value of soft tissue to each voxel considered to be within the tumour in the PET image. Such an emission-driven correction was used by Martínez-Möller and colleagues in [22] for cardiac PET/CT studies.

Conclusion

We presented a method of local motion correction that improved the accuracy of the quantification of lesion volume and activity uptake, while the image quality was equivalent to that of non-corrected images and the acquisition time was not increased. Hence, this method is well suited as a preprocessing step for radiation therapy planning, especially when high precision is required, as in stereotactic radiation therapy. Furthermore, for therapy monitoring, the accuracy of uptake quantification, which is essential for therapy decisions, is improved.