Introduction

Over the past several decades, [18F]fluorodeoxyglucose positron emission tomography ([18F]FDG PET) imaging has been widely used in both clinical practice and research [1,2,3]. Recently, PET has attracted growing interest in pediatrics, where it has proven to be a valuable diagnostic tool for evaluating both pediatric tumors and non-tumor diseases [3, 4]. [18F]FDG PET has been employed in the diagnosis of pediatric solid tumors, hematological tumors, post-transplant lymphoproliferative disease (PTLD), and fever of unknown origin (FUO) [5,6,7]. It is routinely used for staging, evaluating treatment response, monitoring disease progression, and detecting recurrence in numerous malignancies [8].

However, compared with adult PET imaging, the process of pediatric PET examination is more intricate [9, 10]. Immobilization in pediatric patients can be accomplished using body wraps or specialized holding devices. Parents are often allowed to stay with the child throughout the study to provide emotional support [11]. Maintaining stillness during the examination is crucial in pediatric imaging, as minor movements can lead to artifacts. Sedation is typically recommended to inhibit movement during the examination and ensure satisfactory image quality [12]. However, sedation comes with certain drawbacks: (1) It may potentially affect the neurodevelopment of children [13]; (2) despite sedation, some children may not cooperate or may exhibit involuntary movements during the examination; and (3) the potential effects of sedation on the biodistribution of the FDG tracer must be taken into account [11, 14]. Consequently, research efforts are increasingly focusing on expedited PET scanning and accurate diagnosis without the need for sedation.

Various methods were employed to compensate or correct for motion during PET imaging, such as external devices that proved successful in detecting motion [15]. However, routine clinical use of these devices was impeded by challenges including limited and complex integration with the scanner, additional patient setup time, patient discomfort, and a relatively high failure rate in some cases. An alternative approach extracted motion information directly from the PET data by reconstructing a time series of short-duration PET images [16]. Notably, current data-driven methods for PET were limited by their reliance on the CT image used for attenuation and scatter correction.

Ultrafast total-body PET imaging offered a balance between acquisition time and image quality [17]. Total-body PET/CT enabled simultaneous PET acquisition of the entire human body in a single bed position. Such a scanner offered a high sensitivity of 174 kcps/MBq and a spatial resolution for human imaging of \(\le\) 3.0 mm FWHM near the center of the axial field of view (AFOV). As a result, the sensitivity of PET was greatly improved, and the acquisition time could be significantly shortened [18]. In the study of [19], although total-body PET images acquired in 30 s and 300 s differed noticeably in appearance, there was no substantial difference in diagnostic accuracy between them.

Deep learning–based frameworks emerged as powerful tools for various medical imaging tasks, including disease diagnosis, tumor segmentation, and image post-reconstruction [20,21,22,23]. Several generative models [9], e.g., those employing convolutional neural networks (CNNs) or generative adversarial networks (GANs), drew researchers’ attention because they enhanced PET image quality more effectively than traditional approaches while maintaining low computational costs. However, to the best of our knowledge, many deep learning–based methods focused on brain PET images rather than on total-body PET data of pediatric patients [3, 9]. Additionally, popular GAN-based frameworks often suffered from training instability and mode collapse, because reaching a Nash equilibrium was difficult and GAN training was complex [24].

In this prospective study, we developed a deformable 3D U-Net for total-body PET to support sedation-free pediatric PET imaging (acquisition time of about 6 s, a \(\ge\) 50-fold reduction in scan duration compared with the standard clinical routine). The proposed model was trained on 245 adult subjects and first tested on 16 sedated pediatric patients (weight 4.7–17.0 kg, age 0.4–3.5 years). Five rapid scans (acquisition times of approximately 3 s, 6 s, 15 s, 30 s, and 75 s) were retrospectively simulated by selecting the reconstruction time window. A variety of metrics, including quantitative comparisons of PET signal recovery and clinical reading scores, were adopted to compare physical image quality, preservation of clinical information, and lesion detectability between the synthesized PET images and the ground truth, i.e., full-time PET scans (300-s acquisition time). Finally, two experienced radiologists evaluated the ultrafast PET images (6-s scan time), enhanced by the proposed approach, of five children scanned without sedation (weight 10.0–16.8 kg, age 1.2–2.9 years), which validated the detectability and clinical feasibility of the synthesized PET images. The experimental results also demonstrated that the proposed method, based on ultrafast PET scanning, has the potential to mitigate the known side effects of sedation in pediatric patients.

Materials and methods

Patients

This was a prospective single-center imaging study encompassing three distinct datasets. The first dataset comprised a large cohort of 245 adult subjects used for training and hyperparameter tuning of the proposed deep neural network. The second dataset (test set 1), which served as the primary test set, consisted of a smaller, independent group of 16 pediatric subjects who underwent sedation because of suspected tumors. The third dataset (test set 2) comprised a prospective group of five children scanned without sedation. All patients underwent total-body PET/CT at the Department of Nuclear Medicine, Ren Ji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China, and were enrolled for tumor diagnosis and staging from January 2022 to January 2023. The pediatric subjects met the following inclusion criteria: (1) age under 4 years, (2) a total-body PET scan, and (3) a definite surgical pathology or follow-up diagnosis. Of these, 16 children were orally sedated 20 min before the PET scan, whereas five children received no sedation. The study was approved by the institutional review board of Ren Ji Hospital, Shanghai Jiao Tong University School of Medicine, and informed consent was obtained from all patients’ legal guardians. The characteristics of all patients are presented in Table 1.

Table 1 Adult and pediatric patient characteristics

PET imaging protocol

All adult patients fasted for at least 6 h before [18F]FDG administration. Imaging was performed 1 h after [18F]FDG injection on a total-body PET/CT scanner with an axial field of view (AFOV) of 194 cm (uEXPLORER, United Imaging Healthcare, Shanghai, China). As shown in Fig. 1, list-mode PET data were acquired for 300 s, and all PET images, including those in test sets 1 and 2, were reconstructed with acquisition windows of 75 s, 30 s, 15 s, 6 s, and 3 s (in addition to the full 300 s) using OSEM (ordered-subset expectation maximization) with the following parameters: time-of-flight (TOF) and point spread function (PSF) modeling; two iterations and 20 subsets; matrix 256 × 256; slice thickness 2.89 mm; voxel size 2.34 × 2.34 × 2.89 mm³; and a 3-mm Gaussian post-filter. Standard corrections, including attenuation and scatter correction, were applied. All images were assessed on a commercial medical image processing workstation (uWS-MI, United Imaging Healthcare) to ensure a consistent comparison.
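Because the reduced-duration images were simulated by selecting reconstruction time windows from the 300-s list-mode data, the window selection itself can be sketched as below. This is a minimal illustration, not the vendor's reconstruction pipeline: the event-timestamp array, the helper names `select_time_window` and `reduction_factor`, and the assumption that each window starts at the beginning of the acquisition are all hypothetical.

```python
import numpy as np

FULL_DURATION_S = 300.0  # full list-mode acquisition used as the reference


def select_time_window(event_times_s, duration_s, start_s=0.0):
    """Boolean mask of list-mode events inside [start_s, start_s + duration_s).

    Hypothetical helper: event_times_s is assumed to be a 1D array of event
    timestamps in seconds from the start of acquisition.
    """
    event_times_s = np.asarray(event_times_s, dtype=float)
    end_s = start_s + duration_s
    if end_s > FULL_DURATION_S:
        raise ValueError("window extends beyond the full acquisition")
    return (event_times_s >= start_s) & (event_times_s < end_s)


def reduction_factor(duration_s):
    """Nominal scan-time reduction relative to the 300-s reference."""
    return FULL_DURATION_S / duration_s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic, uniformly distributed event times (decay ignored), illustration only
    times = rng.uniform(0.0, FULL_DURATION_S, size=100_000)
    for d in (75, 30, 15, 6, 3):
        n_events = int(select_time_window(times, d).sum())
        print(f"{d:>3} s window: {n_events} events, {reduction_factor(d):.0f}-fold shorter")
```

The 6-s window corresponds to the 50-fold reduction quoted in the text; in a real pipeline the selected events would then be passed to the OSEM reconstruction with the parameters listed above.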

Fig. 1

[18F]FDG PET images of a 3-year-old female patient weighing 15 kg with enlarged lymph nodes in the neck, reconstructed with 3-s, 6-s, 15-s, 30-s, 75-s, and 300-s acquisition windows and shown as maximum intensity projections (MIP) and axial views. This sedated patient was randomly selected from test set 1

Data processing

In this study, the adult dataset of 245 subjects was used exclusively for training, with the primary aim of training the proposed model and optimizing its parameters to achieve superior image reconstruction quality. During training, the adult dataset was divided into ten folds, each containing approximately 10% of the PET images, and tenfold cross-validation was performed over ten iterations. In each iteration, one fold served as the validation set, another fold as the test set, and the remaining eight folds were used for model training. This procedure was repeated ten times, and performance was averaged across iterations to obtain a stable estimate. Once the optimal training strategy and hyperparameters were found, tuning was stopped. Subsequently, all subjects in the adult dataset were used to train the final model with the optimized hyperparameters. Finally, all subjects from the two pediatric datasets were fed into the trained model to generate synthesized PET images, in order to assess the effectiveness and routine feasibility of intelligent ultrafast total-body PET for sedation-free pediatric [18F]FDG imaging.
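The fold rotation described above can be sketched as follows. The text does not say which fold plays which role in a given iteration, so the rule used here (fold k for testing, fold k + 1 for validation) and the subject-level random shuffling are assumptions for illustration only.

```python
import numpy as np


def tenfold_splits(n_subjects, n_folds=10, seed=0):
    """Yield (train, val, test) subject-index arrays for each iteration.

    Assumption: in iteration k, fold k is the test fold and fold (k + 1) % n_folds
    is the validation fold; the remaining folds are used for training.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_subjects)           # shuffle subjects, not slices
    folds = np.array_split(indices, n_folds)        # fold sizes differ by at most 1
    for k in range(n_folds):
        val_k = (k + 1) % n_folds
        test = folds[k]
        val = folds[val_k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j not in (k, val_k)])
        yield train, val, test


if __name__ == "__main__":
    for k, (tr, va, te) in enumerate(tenfold_splits(245)):
        print(f"iteration {k}: train={len(tr)}, val={len(va)}, test={len(te)}")
```

Splitting at the subject level keeps all images of one patient in a single fold, which avoids leakage between training and test data.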

Deep neural network model

In this study, a deep convolutional neural network named deformable 3D U-Net was devised to synthesize ultrafast total-body PET images for sedation-free pediatric [18F]FDG imaging. Inspired by DeepPET [25], the proposed framework adopted a U-Net-like architecture consisting of an encoding path and a decoding path, as shown in Fig. 2. In the encoding path, we designed a novel spatial deformable aggregation block (SDAB) with residual projection to adaptively extract spatial information from a larger receptive field. In the decoding path, SDAB was also employed to reduce the adverse effects of image noise, reconstruct textures and boundaries, and enhance the brightness and contrast of total-body [18F]FDG PET images. In addition, we proposed a novel multi-scale self-attention module (MSSAM) in the decoding path to exploit multi-scale self-similarity priors, capture long-range dependencies, and break scale constraints, producing richer and more faithful details. The model was trained on paired adult PET images, each pairing an ultrafast scan (3 s, 6 s, 15 s, 30 s, or 75 s) with the standard reference (300 s), and then tested on the pediatric datasets with and without sedation. Further details of the architecture of the proposed deformable 3D U-Net and the training procedure are provided in the Supplementary material.
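The SDAB itself is specified only in the Supplementary material, so the sketch below illustrates just the generic deformable-convolution operation such blocks build on, and it is not the authors' SDAB. It is a simplified 2D, single-channel, 3 × 3 version in which each kernel tap samples the input at its regular grid position plus a per-pixel offset, using bilinear interpolation; in a network the offsets would be predicted by a small convolutional branch, whereas here they are passed in directly. A 3D version would add a third offset component and trilinear interpolation.

```python
import numpy as np


def bilinear_sample(img, y, x):
    """Bilinearly sample a 2D image at real-valued (y, x); zero outside the image."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    value = 0.0
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * img[yy, xx]
    return value


def deformable_conv2d(img, weight, offsets):
    """Single-channel 3x3 deformable convolution (cross-correlation), same-size output.

    img:     (H, W) input image
    weight:  (3, 3) kernel
    offsets: (H, W, 9, 2) offsets (dy, dx) for each of the 9 taps at each pixel
    """
    h, w = img.shape
    taps = [(i - 1, j - 1) for i in range(3) for j in range(3)]  # regular 3x3 grid
    out = np.zeros((h, w))
    for py in range(h):
        for px in range(w):
            acc = 0.0
            for t, (ky, kx) in enumerate(taps):
                dy, dx = offsets[py, px, t]
                # Sample at the regular tap position shifted by the (learned) offset
                acc += weight[ky + 1, kx + 1] * bilinear_sample(img, py + ky + dy, px + kx + dx)
            out[py, px] = acc
    return out
```

With all offsets set to zero the operation reduces to an ordinary zero-padded 3 × 3 convolution, which is a convenient sanity check; non-zero offsets let the sampling grid bend to follow anatomical structures.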

Fig. 2

An illustration of the proposed deformable 3D U-Net

Evaluation metrics and statistical analysis

For qualitative assessment, the quality of all synthesized PET images with different acquisition times was evaluated by two experienced nuclear radiologists (Ruohua Chen, with 8 years of experience, and Xiang Zhou, with 20 years of experience in PET-related diagnosis), who assigned opinion scores to assess clinical feasibility, e.g., metabolic details in different body regions. The images in test sets 1 and 2 were presented to the two radiologists for independent reading in randomized order to minimize potential bias, and the readers were blinded to patient history and acquisition time. Following a reading standard widely used in recent studies [3, 26, 27], each radiologist independently assigned an image quality score on a five-point Likert scale, with the evaluation criteria shown in Table 2. The scale was used to evaluate three aspects: (1) the conspicuity of organ anatomical structures, (2) the conspicuity of the major suspected malignant lesions, and (3) image noise. Paired-sample t-tests were used to assess whether the opinion scores of the two radiologists differed significantly.
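The paired-sample t-test between the two readers works on the per-image score differences. The sketch below computes the t statistic and degrees of freedom with the Python standard library only; the function name and the example scores are hypothetical, and the two-sided p value would then be read from a t distribution with n − 1 degrees of freedom (for example, `scipy.stats.ttest_rel` returns both the statistic and the p value).

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired-sample t statistic for two readers scoring the same images.

    t = mean(d) / (sd(d) / sqrt(n)), where d = a - b and sd uses n - 1.
    Returns (t, degrees_of_freedom).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    d = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(d)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(d) / (sd / math.sqrt(len(d))), len(d) - 1


if __name__ == "__main__":
    # Hypothetical Likert scores from two readers on the same five images
    reader_1 = [4, 3, 5, 2, 4]
    reader_2 = [3, 3, 4, 2, 3]
    t, df = paired_t_statistic(reader_1, reader_2)
    print(f"t = {t:.3f}, df = {df}")
```

Pairing removes between-image variability, so the test asks only whether one reader systematically scores higher than the other.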

Table 2 The five-point scale for the evaluation of overall image quality

For quantitative evaluation, three classical quantitative metrics were included in this study to assess the reconstruction performance of these [18F]FDG PET images with different acquisition times: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and feature similarity (FSIM). The PSNR was defined as

$${\text{PSNR}}=10{{\text{log}}}_{10}\left(\frac{V{R}^{2}}{{\Vert y-x\Vert }_{2}^{2}}\right)$$

where \(V\) was the total number of voxels, \(R\) was the intensity range of the PET image with 300-s acquisition time (\(x\)), and \({\Vert y-x\Vert }_{2}^{2}\) was the sum of squared errors between \(x\) and the reconstructed PET image (\(y\)), so that \({\Vert y-x\Vert }_{2}^{2}/V\) was the mean squared error. Such voxel-wise quantities are easy to calculate and compare and have straightforward interpretations. However, they correspond poorly to the errors that humans perceive, particularly blurring and smearing artifacts, so measures that better reflect perceived image quality were also needed. The SSIM was defined as

$${\text{SSIM}}\left(x,y\right)=\frac{\left(2{\mu }_{x}{\mu }_{y}+{C}_{1}\right)\left(2{\sigma }_{xy}+{C}_{2}\right)}{\left({\mu }_{x}^{2}+{\mu }_{y}^{2}+{C}_{1}\right)\left({\sigma }_{x}^{2}+{\sigma }_{y}^{2}+{C}_{2}\right)}$$

where \({\mu }_{x}\) and \({\mu }_{y}\) were the means of images \(x\) (the PET image with 300-s acquisition time) and \(y\) (the corresponding reconstructed PET image), \({\sigma }_{x}\) and \({\sigma }_{y}\) were their standard deviations, and \({\sigma }_{xy}\) was their covariance. \({C}_{1}\) and \({C}_{2}\) were small positive constants that prevent a zero denominator; they were fixed at \(1\times {10}^{-6}\) and \(3\times {10}^{-6}\), respectively. Analogously, the FSIM was defined as

$${\text{FSIM}}\left(x,y\right)=\frac{\sum G\left(x\right)\cdot G\left(y\right)\cdot S\left(x\right)\cdot S\left(y\right)}{\sum {G\left(x\right)}^{2}\sum {G\left(y\right)}^{2}}$$

where \(x\) represented the PET image with 300-s acquisition time, \(y\) the corresponding reconstructed PET image, \(G(x)\) and \(G(y)\) the gradient magnitudes of images \(x\) and \(y\), and \(S\left(x\right)\) and \(S\left(y\right)\) the contrast information of images \(x\) and \(y\), respectively. The numerator summed, over all voxels, the product of the gradient magnitudes and contrast information of the two images, and the denominator was the product of the sums of the squared gradient magnitudes of the two images. Images with higher PSNR and with SSIM and FSIM values closer to 1 were considered to be of higher quality in the context of image synthesis and reconstruction.
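The PSNR and SSIM definitions above can be written directly in NumPy. This is a sketch that follows the formulas as printed: SSIM is evaluated once over the whole volume with the stated constants, whereas library implementations such as `skimage.metrics.structural_similarity` average SSIM over local windows with different default constants, so their values will not match. Whether the authors used a global or windowed SSIM is not stated, and FSIM is omitted because the gradient and contrast operators \(G\) and \(S\) are not specified in enough detail to implement faithfully.

```python
import numpy as np


def psnr(x, y):
    """PSNR of y against the 300-s reference x: 10*log10(V * R^2 / ||y - x||_2^2)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    sse = np.sum((y - x) ** 2)          # squared L2 norm = sum of squared errors
    if sse == 0:
        return float("inf")             # identical images
    v = x.size                          # V: total number of voxels
    r = x.max() - x.min()               # R: intensity range of the reference image
    return float(10.0 * np.log10(v * r ** 2 / sse))


def global_ssim(x, y, c1=1e-6, c2=3e-6):
    """SSIM evaluated once over the whole image, following the formula above."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()                 # sigma_x^2, sigma_y^2
    cov_xy = np.mean((x - mu_x) * (y - mu_y))       # sigma_xy
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.gamma(2.0, 1.0, size=(32, 32, 32))               # toy "300-s" volume
    noisy = reference + rng.normal(0.0, 0.3, size=reference.shape)   # toy "ultrafast" volume
    print(f"PSNR = {psnr(reference, noisy):.2f} dB, SSIM = {global_ssim(reference, noisy):.3f}")
```

Because \(R\) is taken from the reference image, PSNR values are comparable across reconstructions of the same patient but depend on each patient's uptake range.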

To assess lesion detectability and the clinical feasibility of the proposed method, the mean and maximum standardized uptake values (SUVmean and SUVmax) were used as measures of tracer uptake in selected lesions. These lesions were manually delineated and reviewed by a nuclear medicine physician under the supervision of a radiologist. Paired t-tests were used to compare the objective image values (SUVmean and SUVmax) and the quality metrics (PSNR, SSIM, and FSIM) between the ground truth (the standard reference, i.e., PET images with 300-s acquisition time) and the outputs of the proposed deformable 3D U-Net. A p value < 0.05 was considered statistically significant.
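The lesion measurements can be sketched as below. The body-weight SUV conversion is the standard definition; the assumptions that activity is already decay-corrected and that tissue density is 1 g/mL are ours, as is the definition of the error ratio (absolute deviation from the 300-s value, in percent), since the text does not give the exact formula. The helper names are hypothetical.

```python
import numpy as np


def suv_bw(activity_bq_per_ml, injected_dose_bq, body_weight_kg):
    """Body-weight SUV = tissue activity / (injected dose / body weight).

    Assumes decay-corrected activity and a tissue density of 1 g/mL,
    so body weight is converted to grams.
    """
    return np.asarray(activity_bq_per_ml, dtype=np.float64) / (
        injected_dose_bq / (body_weight_kg * 1000.0))


def lesion_suv(suv_image, lesion_mask):
    """SUVmean and SUVmax inside a manually delineated binary lesion mask."""
    values = np.asarray(suv_image, dtype=np.float64)[np.asarray(lesion_mask, dtype=bool)]
    if values.size == 0:
        raise ValueError("lesion mask is empty")
    return float(values.mean()), float(values.max())


def error_ratio_percent(value, reference_value):
    """Absolute deviation from the 300-s reference value, in percent (assumed definition)."""
    return abs(value - reference_value) / reference_value * 100.0


if __name__ == "__main__":
    # Toy 3x3 SUV map and mask, for illustration only
    suv = np.arange(1.0, 10.0).reshape(3, 3)
    mask = suv >= 5
    mean_val, max_val = lesion_suv(suv, mask)
    print(mean_val, max_val, error_ratio_percent(max_val, 10.0))
```

The same mask must be applied to the ultrafast, synthesized, and 300-s images of a patient so that the error ratios compare identical voxels.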

Results

Objective image quality

The quantitative and qualitative results for test set 1 are shown in Fig. 3 and Tables 3 and 4. As shown in Fig. 3, the synthesized PET images had high visual quality (less noise) and showed marked improvements over the original ultrafast total-body PET images. Here, [18F]FDG PET images with 300-s acquisition time served as the ground truth. As shown in Table 3, as the acquisition time decreased to 30 s, 15 s, 6 s, and 3 s, the PSNR of the original PET images decreased to \(35.71\pm 7.63\), \(32.24\pm 7.65\), \(29.13\pm 7.81\), and \(28.92\pm 7.70\), respectively. Correspondingly, SSIM decreased to \(0.917\pm 0.07\), \(0.913\pm 0.07\), \(0.906\pm 0.07\), and \(0.888\pm 0.07\), and FSIM decreased to \(0.925\pm 0.06\), \(0.918\pm 0.06\), \(0.911\pm 0.06\), and \(0.895\pm 0.05\). These trends revealed a positive correlation between PSNR, SSIM, and FSIM and the acquisition time. Results are presented as a ± b, where a and b are the mean and standard deviation of each metric based on tenfold cross-validation experiments.
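As a quick check of the stated trend, the four mean PSNR values reported above for the original ultrafast images can be correlated with acquisition time. With only four group means this is descriptive rather than a significance test, and also correlating against log-time is our own choice, reflecting that noise in count-limited images scales roughly with the inverse square root of counts rather than linearly with time.

```python
import numpy as np

# Mean PSNR (dB) of the original ultrafast images in test set 1, as reported above
acq_time_s = np.array([30.0, 15.0, 6.0, 3.0])
psnr_db = np.array([35.71, 32.24, 29.13, 28.92])

r_linear = np.corrcoef(acq_time_s, psnr_db)[0, 1]          # Pearson r vs. time
r_log = np.corrcoef(np.log(acq_time_s), psnr_db)[0, 1]     # Pearson r vs. log(time)
print(f"Pearson r (time): {r_linear:.3f}; Pearson r (log time): {r_log:.3f}")
```

The same two lines can be reused with the SSIM and FSIM means to confirm that all three metrics move in the same direction as acquisition time.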

Fig. 3

Visualized results of the original full-time [18F]FDG PET image, the [18F]FDG PET images with 75-s, 30-s, 15-s, 6-s, and 3-s acquisition times, and the corresponding synthesized PET images. The figure shows a randomly selected sedated child from test set 1

Table 3 Quantitative results of the original ultrafast total-body PET images with different acquisition times from 16 pediatric scans in test set 1
Table 4 Quantitative results of the synthesized total-body PET images with different acquisition times from 16 pediatric scans in test set 1

The quantitative results of the corresponding synthesized PET images, including the mean PSNR, SSIM, and FSIM, are presented in Table 4, together with the p values from comparing the ultrafast PET images at each acquisition time with their corresponding synthesized PET images (i.e., Table 3 vs. Table 4). The proposed deformable 3D U-Net yielded substantial improvements over the original ultrafast PET images, increasing PSNR by 23.06%, 27.33%, 18.08%, and 11.65%; SSIM by 2.48%, 1.66%, 1.53%, and 1.64%; and FSIM by 2.57%, 1.87%, 1.31%, and 1.62%. These improvements were statistically significant (p < 0.01, t-tests). Notably, for ultrafast PET images with longer acquisition times (e.g., 75 s), the relative gains were less pronounced than for shorter acquisition times (e.g., 3 s or 6 s). These findings underscore the considerable potential of the proposed model for ultrafast PET reconstruction.
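The percentage gains above are relative changes of a metric after synthesis. The text does not state whether they were computed per subject and then averaged, or from the group means, so the sketch below (with an assumed definition and toy numbers, not the study's data) shows that the two conventions can give different values.

```python
import numpy as np


def relative_improvement_percent(original, synthesized):
    """Relative change of a metric after synthesis, in percent.

    Assumed definition: (synthesized - original) / original * 100.
    Works element-wise on arrays of per-subject values.
    """
    original = np.asarray(original, dtype=np.float64)
    synthesized = np.asarray(synthesized, dtype=np.float64)
    return (synthesized - original) / original * 100.0


if __name__ == "__main__":
    orig = np.array([25.0, 30.0, 35.0])    # toy per-subject PSNR of ultrafast images (dB)
    synth = np.array([32.0, 36.0, 40.0])   # toy per-subject PSNR after synthesis (dB)
    per_subject = relative_improvement_percent(orig, synth).mean()
    of_means = relative_improvement_percent(orig.mean(), synth.mean())
    print(f"mean of per-subject gains: {per_subject:.2f}%; gain of the means: {of_means:.2f}%")
```

Subjects with low baseline PSNR weigh more heavily in the per-subject average, which is why the two numbers differ.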

Figure 5 shows the synthesized PET images based on short scans with various acquisition times (3 s, 6 s, 15 s, 30 s, and 75 s) alongside the original full-time PET images for the same patient as in Fig. 3. The proposed method generated images that preserved tumors (highlighted by red numbers) and tissue structure; local regions of interest are magnified in red boxes. For each PET image, the two radiologists independently assigned an image quality score on the five-point Likert scale described in Table 2. Because the two radiologists gave identical scores for every sub-image in Fig. 5, a single score is displayed for each sub-image. The middle row of Fig. 5 shows axial views of the abdominal lesions at different acquisition times together with PSNR, SSIM, and FSIM, indicating that the benefit of the proposed method for diagnosis was evident even for short scans (6-s acquisition time: PSNR 36.37, SSIM 0.960, FSIM 0.964).

For lesion detection, Table 7 lists the errors in SUVmean and SUVmax of the ultrafast PET images and of the synthesized images relative to the full-time images for the same patient as in Fig. 5. After enhancement by the proposed approach, the average SUVmean error ratios for ultrafast PET images with 75 s, 30 s, 15 s, 6 s, and 3 s were reduced from 2.0 to 0.3% (p > 0.05), 4.7 to 0.9% (p > 0.05), 13.6 to 3.42% (p < 0.05), 12.9 to 4.1% (p < 0.05), and 19.3 to 8.1% (p < 0.05), respectively. Similarly, the average SUVmax error ratios for ultrafast PET images with 75 s, 30 s, 15 s, 6 s, and 3 s were reduced from 11.2 to 5.6% (p > 0.05), 8.7 to 2.8% (p > 0.05), 12.2 to 4.4% (p < 0.05), 17.4 to 6.2% (p < 0.05), and 20.1 to 6.6% (p < 0.05), respectively. Thus, the synthesized lesion areas based on the shortest scans (3 s, 6 s, or 15 s) showed significantly reduced SUVmean and SUVmax error ratios (p < 0.05), whereas for acquisition times of 30 s and longer the reductions were not statistically significant. These results demonstrate the strong reconstruction ability of the proposed framework, particularly for PET images acquired with very short scan times (15 s or less), highlighting its suitability for ultrafast PET image reconstruction.

Subjective image quality

Table 5 presents the mean opinion scores of the synthesized total-body PET images with various acquisition times. Images from 3-s (\(1.8\pm 0.4\)) and 6-s (\(3.2\pm 0.4\)) scans had significantly lower scores than those from 75-s scans (\(4.6\pm 0.7\); p < \(1\times {10}^{-3}\)), whereas images generated from 15-s (\(3.6\pm 0.4\)) and 30-s (\(3.9\pm 0.4\)) scans received higher scores. Notably, only five cases at 6 s and two cases at 15 s were scored 2. These subjective image quality scores highlight the potential of the proposed model for ultrafast PET reconstruction; specifically, the proposed deformable 3D U-Net may significantly improve opinion scores for ultrafast PET images with acquisition times of 6 s or longer. Figure 5 further shows the corresponding opinion scores for the brain, abdomen, and pelvic cavity, giving a more detailed view of the clinical reading. The brain images shown in the top row of Fig. 5 were further preprocessed and registered to the MNI brain atlas for more accurate quantitative analysis. The opinion scores of these synthesized brain PET images from 3-s, 6-s, 15-s, 30-s, and 75-s scans were 1, 2, 3, 4, and 5, respectively, indicating that synthesized brain PET images from 6-s scans did not meet clinical diagnostic requirements.

Table 5 Summary of the quality scores of synthesized images from 16 pediatric scans in test set 1

Sedation-free synthesis

Figure 4 shows the five original [18F]FDG PET images (6-s acquisition time) from test set 2 and the corresponding synthesized PET images. The pediatric subjects imaged without sedation were gently guided into sleep in the presence of their parents and then scanned with the careful assistance of technologists and nurses; list-mode data were acquired for 300 s, from which the 6-s images were reconstructed. The proposed deformable 3D U-Net consistently produced high-quality reconstructions across acquisition times. On closer inspection of regional details in the synthesized PET images, such as the liver in Figs. 3 and 4, the reconstructions from longer acquisitions (6 s) were better than those from shorter acquisitions (3 s). This disparity arises because ultrafast PET images with shorter acquisition times (e.g., 3 s) inherently contain much less metabolic detail across the entire body, which makes reconstruction considerably more challenging. The two radiologists showed a high degree of agreement on the clinical feasibility of the 6-s synthesized PET images.

Fig. 4

Visualized results of the original [18F]FDG PET image (6-s acquisition time) and the corresponding generated PET images. All of the five children from test set 2, who are imaged without sedation, are displayed in this figure

Discussion

Total-body PET imaging was known for its high sensitivity and had the potential to reduce examination time. However, rapid total-body PET scanning with shorter acquisition times often compromised image quality, leading to a markedly reduced signal-to-noise ratio and an increased risk of missing small tumors. In previous works [28,29,30], GANs were adopted for ultrafast total-body PET imaging and achieved satisfactory performance with a 30-s acquisition time. For instance, Hosch et al. [31] proposed a modified pix2pixHD deep-learning network, trained on data from 387 patients who underwent ultra-low-count FDG PET/CT scans (whole-body acquisition time approximately 30 s) and tested on data from 200 patients, to generate synthesized full-dose PET images on a digital PET/CT scanner. Additionally, Zhang et al. [19] investigated the diagnostic value of total-body PET/CT images with 30-s acquisition time in 88 oncology patients, using post-surgical pathological diagnosis as the reference. To the best of our knowledge, no previous study specifically focused on intelligent ultrafast total-body PET for sedation-free pediatric [18F]FDG imaging, particularly involving brain function analysis, abdominal lesion detection, and pelvic tumor classification.

Evidence from prior research [32,33,34] suggested that sedation might alter the distribution of radiotracers such as FDG in brain PET imaging, which could lead to diagnostic inaccuracies or errors. Furthermore, sedation can induce respiratory depression in children, because all sedative drugs suppress the central nervous system in a dose-dependent manner, potentially resulting in loss of airway control. Allergic reactions to light oral sedation, while rare, can also occur, with symptoms such as skin rash, itching, and swelling of the face, tongue, or lips. Moreover, after sedation, pediatric patients are typically monitored until they are near their baseline level of consciousness and no longer at increased risk of cardiorespiratory depression; some children may require extended monitoring for up to 12 h. The sedation nurse also plays a critical role: ensuring that the necessary monitoring equipment is available and functioning properly, verifying that informed consent has been obtained before sedation, and confirming that the patient will be accompanied by a responsible adult upon discharge. These measures are integral to patient safety during sedation, and the associated risks underscore the need for careful consideration and medical supervision when sedating children. In addition, because of emotional or stress reactions, some children do not cooperate with doctors and parents in taking sedation, and involuntary movement may still occur after sedation. Thus, ultrafast imaging with a 30-s acquisition time on total-body PET/CT may not be suitable for pediatric patients, and there is a pressing need to explore the lower limit of acquisition time for ultrafast scanning without sedation.

A sedation-free approach to ultrafast PET reconstruction could reduce the risks associated with sedation, such as respiratory depression and allergic reactions, as well as costs and recovery time.

In this study, we used an artificial intelligence framework to investigate the feasibility of acquiring ultrafast PET images at the shortest feasible acquisition time for sedation-free pediatric patients while preserving image quality for subsequent lesion detection and analysis. We proposed a novel deformable 3D U-Net framework to generate high-quality total-body PET images from ultrafast PET scans with a 50-fold reduced acquisition time (6 s), obtained on the uEXPLORER PET/CT system. In our preliminary performance comparisons, our approach offered three main advantages over the standard 3D U-Net: (1) Adaptive convolutional layers: the deformable convolutional layers in our framework can adaptively modify the shape of the convolutional kernel, which was crucial for synthesizing total-body PET images given the high variability of anatomical structures. (2) Enhanced feature capture: deformable convolutions were more proficient at capturing complex spatial relationships and morphological variations, leading to more accurate and robust feature extraction than standard convolutional layers. (3) Superior quantitative performance: our results suggested that the deformable 3D U-Net significantly outperformed the standard 3D U-Net, particularly in cases with substantial anatomical variation among patients. To our knowledge, only a few previous approaches [35,36,37] used GANs or U-Nets to enhance pediatric [18F]FDG ultrafast scans with total-body PET imaging. Some previously published approaches were restricted to abdominal or pelvic PET scans, which are more tractable because of their limited anatomical variance compared with whole-body images [38]. We aimed to evaluate the qualitative and clinical results of ultrafast total-body imaging for pediatric patients without sedation.

The quantitative and semiquantitative results showed that the quality of the synthesized PET images gradually increased with scan time, and that, for the same short scan duration, the quantitative (SSIM, PSNR, FSIM) and semiquantitative (SUVmax, SUVmean) values of the images synthesized by the proposed deformable 3D U-Net were the closest to those of the full-time reference images. Significant improvements in quantitative metrics were observed for 6-s and 15-s acquisition times (Table 4 and Fig. 5). For ultrafast brain PET, the acquisition time could be shortened to 15 s while still achieving satisfactory qualitative and quantitative results (Figs. 5 and 6). For abdominal and pelvic diseases, an extremely short scan time of 6 s was sufficient to produce images of diagnostic quality (Figs. 5 and 6). The proposed method thus compensated for the reduced signal-to-noise ratio caused by rapid scanning and enhanced image quality. This is particularly beneficial for investigating body tumors in children, as more pediatric patients can complete a PET examination without sedation.

Fig. 5

Visualization of the original full-time PET images and the synthesized PET images based on ultrafast scans with 3 s, 6 s, 15 s, 30 s, and 75 s. The top, middle, and bottom rows correspond to the brain, abdomen, and pelvic cavity, respectively. The values of the related evaluation metrics, including PSNR, SSIM, FSIM, and the average opinion scores provided by the two radiologists, are listed. For each PET image, the two radiologists independently assigned an image quality score on the five-point Likert scale described in Table 2. Local regions of interest are magnified using red boxes. The brain images were further preprocessed and registered to the MNI brain atlas for more accurate quantitative analysis. Tumors preserved in the synthesized PET images of the abdomen and pelvic cavity are highlighted by red numbers. This figure presents the same patient as Fig. 3

Fig. 6

Clinical opinion scores of two radiologists for PET images synthesized from ultrafast scans of 3 s, 6 s, 15 s, 30 s, and 75 s and for the original full-time PET images. a Opinion scores for the brain. b Opinion scores for the abdomen. c Opinion scores for the pelvic cavity. All opinion scores are derived from slice-level clinical readings and are reported as mean \(\pm\) standard deviation

The clinical readings of the two radiologists for all PET images of the brain, abdomen, and pelvic cavity synthesized by our proposed framework were analyzed, as shown in Fig. 6 and Table 6. Opinion scores are reported as mean \(\pm\) SD. As shown in Fig. 6a–c and Table 6, the two radiologists' opinion scores did not differ significantly in most comparisons (paired t-test, p > 0.05); the only significant inter-rater difference was found for brain images synthesized from 3-s scans (p < 0.05). The opinion scores of PET images synthesized from 3-s, 6-s, 15-s, 30-s, and 75-s scans increased progressively (Table 6). Images synthesized from 6-s scans met the clinical requirements for diagnosing diseases of both the abdomen and the pelvic cavity. All synthesized PET images of pediatric patients with and without sedation from the two test datasets were further used for lesion segmentation, which likewise indicated that 6-s ultrafast PET imaging is feasible for the initial investigation of body diseases. Across regions, the opinion scores for the brain were significantly lower, indicating that the brain is relatively more difficult to reconstruct. Therefore, for uncooperative young children undergoing body examinations, the acquisition time can be further shortened to 6 s with the assistance of our method.
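The inter-rater comparison above rests on a paired t-test of the two radiologists' scores. The minimal standard-library sketch below computes the mean ± SD summary and the paired t statistic with n − 1 degrees of freedom. It assumes the scores are paired slice by slice, and the helper names `mean_sd` and `paired_t` are hypothetical. For the two-sided p-value, `scipy.stats.ttest_rel(rater_a, rater_b)` returns the same statistic together with its p-value.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_sd(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, as reported in the form mean ± SD."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t(rater_a: Sequence[float], rater_b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for two raters' scores.

    Scores are assumed to be paired by slice (same slice scored by both raters).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the paired differences
    n = len(diffs)
    if sd_d == 0:
        # All differences identical: t is undefined (0/0) or infinite.
        t = math.nan if mean_d == 0 else math.copysign(math.inf, mean_d)
    else:
        t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1
```

Note that a non-significant paired t-test shows only that no systematic offset between raters was detected; it is not by itself a measure of agreement.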

Table 6 Opinion scores of the two radiologists for PET images synthesized from ultrafast scans with different acquisition times in three body regions
Table 7 Errors in mean standardized uptake value (SUVmean) and maximum standardized uptake value (SUVmax) between the original PET images and the PET images generated from different acquisition times. The lesion data are taken from the same patient as in Fig. 5
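Table 7 reports SUV errors within a lesion. The sketch below shows one way such errors can be obtained. It computes body-weight SUV from activity concentration, extracts SUVmax and SUVmean inside a lesion mask, and reports absolute and relative errors against the original image. Several choices here are assumptions, not the definition used for Table 7: the body-weight SUV convention, a dose decay-corrected to scan time, the flattened-mask representation, and the percentage-error formula. All function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def suv_bw(activity_kbq_per_ml: float, injected_dose_mbq: float,
           body_weight_kg: float) -> float:
    """Body-weight SUV; the dose is assumed decay-corrected to scan time."""
    # kBq/mL * kg / MBq reduces to g/mL, i.e. unitless for a 1 g/mL tissue density.
    return activity_kbq_per_ml * body_weight_kg / injected_dose_mbq


def lesion_suv(suv_image: Sequence[float], lesion_mask: Sequence[bool]) -> Dict[str, float]:
    """SUVmax and SUVmean over voxels inside a (flattened) lesion mask."""
    roi = [v for v, inside in zip(suv_image, lesion_mask) if inside]
    if not roi:
        raise ValueError("lesion mask selects no voxels")
    return {"SUVmax": max(roi), "SUVmean": mean(roi)}


def suv_errors(original: Sequence[float], generated: Sequence[float],
               lesion_mask: Sequence[bool]) -> Dict[str, Dict[str, float]]:
    """Absolute and relative (%) SUV errors of a generated image vs. the original."""
    ref = lesion_suv(original, lesion_mask)
    gen = lesion_suv(generated, lesion_mask)
    errors = {}
    for key in ("SUVmax", "SUVmean"):
        abs_err = abs(gen[key] - ref[key])
        errors[key] = {"abs": abs_err, "rel_pct": 100.0 * abs_err / ref[key]}
    return errors
```

Using the same lesion mask for the original and generated images keeps the comparison voxel-for-voxel. This assumes the two images are already co-registered.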

This study had several limitations worth mentioning. First, our analysis included a limited cohort of pathologically diagnosed pediatric patients, only five of whom underwent PET scans without sedation. The small sample size could introduce bias into the results and restrict the generalizability of the findings. Second, for ethical reasons, we could not expose the same pediatric patients to both ultrafast imaging without sedation and conventional imaging with sedation, which would have allowed a more accurate comparison of the two types of scans. This lack of direct comparison may limit the robustness of the study's conclusions. Third, this study primarily concentrated on examinations for body diseases, with only a limited number of cases involving the diagnosis of brain function. Consequently, additional research is required to optimize ultrafast imaging techniques specifically for evaluating brain function.

Conclusion

This paper introduces a novel deep neural network framework designed to evaluate the feasibility of substantially reducing the acquisition time of pediatric [18F]FDG total-body PET/CT scans performed without sedation. The visualized results and quantitative analysis demonstrate that our approach achieves excellent reconstruction performance on commonly used metrics, such as PSNR, SSIM, and FSIM, for total-body, brain, abdomen, and pelvic cavity scans. Compared with reconstructions from longer acquisitions (e.g., 75 s), our approach exhibits a marked improvement in ultrafast reconstruction scenarios (e.g., 3 s), with increases of 23.06%, 2.48%, and 2.57% in PSNR, SSIM, and FSIM, respectively. The clinical readings of two radiologists also support the clinical feasibility and better visual discrimination of the full-time PET images synthesized by our framework. In conclusion, this study demonstrates the feasibility of a framework for ultrafast, sedation-free pediatric whole-body PET imaging that can support clinical decision-making.