Purpose

Machine learning (ML) is a subfield of artificial intelligence (AI) primarily aimed at identifying patterns in data. Several ML algorithms can be applied, such as support vector machines (SVM), decision trees and Bayesian networks, but deep learning has achieved the most remarkable performance and success. In particular, convolutional neural networks (CNNs) dominate image-based tasks. This approach requires neither the calculation of handcrafted features nor operator input. The convolution operation applies multiple filters to the input image to extract features (feature maps). During training, CNNs learn the features that are critical for successful performance [1]. Figure 1 summarizes the main types of networks for imaging applications. Basic principles and definitions in ML are provided in the Supplementary material.
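The convolution step described above can be sketched in a few lines. The example below is a minimal pure-Python illustration, not taken from any cited work: it slides one 3 × 3 filter over a small image with stride 1 and no padding, then applies a ReLU to obtain a single feature map. As in common deep learning frameworks, the operation is computed as a cross-correlation (the kernel is not flipped). The image and the vertical-edge filter values are hypothetical; in a trained CNN the filter weights are learned from data rather than chosen by hand.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    As in CNN layers of common frameworks, this is a cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw neighbourhood currently under the kernel.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


def relu(feature_map: Matrix) -> Matrix:
    """Element-wise non-linearity applied after the convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical 5 x 5 image: zeros in columns 0-1, ones in columns 2-4 (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked vertical-edge filter; a trained CNN would learn its own weights.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

fmap = relu(conv2d_valid(image, kernel))
for row in fmap:
    print(row)  # [3.0, 3.0, 0.0]
```

Every output row is [3.0, 3.0, 0.0]: the filter responds where the window straddles the edge and gives zero over the uniform region. A convolutional layer repeats this for many learned filters and stacks the resulting maps.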

Fig. 1 Main types of artificial neural networks for imaging applications

The use of CNNs has led to the development of image analysis algorithms for several radiological applications: tuberculosis detection on chest X-rays, assessment of lung nodules and interstitial lung disease on chest computed tomography (CT), identification of pulmonary embolism on CT angiography, detection of breast masses on mammography, and detection of intracranial haemorrhage on head CT. Indeed, as of March 1st, 2020, radiology was the field with the highest number of FDA-approved tools based on ML technology [2].

Nuclear medicine is also expected to benefit from CNN-based algorithms, particularly from tools for clinical decision support, examination scheduling, choice of the proper imaging protocol, image quality improvement, interpretation and reporting. Therefore, the purpose of the present review was to summarize the available literature on the developing field of deep learning, and in particular on the application of CNNs, in PET/CT and PET/MR.

Materials and methods

Eligibility criteria, search strategy and study selection

Using the PubMed/MEDLINE database, we performed a comprehensive literature search for potentially relevant articles published up to July 24th, 2020. No lower limit on publication date was applied. The search strategy combined terms (text words) referring to “convolutional neural networks” and “positron emission tomography”. In particular, the following search strategy was applied: “convolutional neural network” OR “CNN” AND “positron emission tomography” OR “PET” OR “PET/CT” OR “PET/MR”. Titles and abstracts of the retrieved records were screened independently by MK and AC. Exclusion criteria were: non-original articles, review articles, book chapters, editorials, case reports, non-English-language papers, duplicates, non-human studies and studies outside the field of interest. Subsequently, we screened the reference lists of the selected studies to identify additional eligible articles. Because this scoping review aimed at a comprehensive assessment of the early stage of development of deep learning applications in PET, no additional quality-based exclusion criteria were applied; consequently, early-stage and proof-of-concept investigations were included.

Data extraction and analysis

We summarized the characteristics of all selected papers, including: title, authors, year of publication, abstract, study design, population (public dataset or not) and sample size, application (technical or clinical), medical field (oncology, neurology, cardiology or other) and disease/condition, imaging modality (PET/CT or PET/MRI), radiopharmaceutical, aim, and type of input data. According to their objective (technical vs diagnosis/prognostication), the articles were categorized as Image Quality and Technical applications or Clinical studies. Main results and performance metrics were recorded. Descriptive statistics were used to summarize the data.

Results

Study selection

The search of the PubMed/MEDLINE database returned a total of 381 studies. After the removal of duplicates, 110 records were left. After abstract review and application of the inclusion/exclusion criteria, 47 of the 110 studies were excluded. The screening process is summarized in Fig. 2. Sixty-three articles were finally included.

Fig. 2 Article selection process

Study characteristics

The 63 included studies covered both the technical (n = 23) and the clinical field (n = 40). Technical studies investigated the role of CNN-based methods in image quality (n = 11) and in technical issues (n = 12), mainly attenuation correction.

Clinical studies explored CNN applications in lung cancer (n = 7), head and neck cancer (n = 4), oesophageal cancer (n = 2), lymphoma (n = 3), prostate cancer (n = 4), cervical cancer (n = 1), sarcoma (n = 1), multiple cancer types (n = 4), neurology (n = 10) and cardiology (n = 1). Three clinical studies belonged to the “other” category, investigating CNN-based strategies for sex determination, measurement of cerebellar tracer uptake and improvement of cerebral blood flow measurement. A summary of the characteristics of the selected studies is provided in Table 1.

Table 1 Summary of included studies’ characteristics

The input modalities were: PET (n = 29, of which n = 1 PET maximum intensity projection (MIP) and n = 3 PET sinograms), PET and CT (n = 13), CT (n = 6), PET and MR (n = 4), MR (n = 5), floodmaps (n = 1), coincidence waveforms (n = 1), MLAA-based activity and attenuation maps (n = 2), polar maps (n = 1) and simulated low-resolution PET sinograms (n = 1).
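The counts reported in this section can be cross-checked with simple arithmetic. The snippet below uses only the numbers stated in the text and confirms that both the application-area breakdown and the input-type breakdown account for all 63 included studies.

```python
# Counts copied from the text above (application areas and input types).
technical = {"image quality": 11, "technical issues": 12}
clinical = {
    "lung cancer": 7, "head and neck cancer": 4, "oesophageal cancer": 2,
    "lymphoma": 3, "prostate cancer": 4, "cervical cancer": 1, "sarcoma": 1,
    "multiple cancer types": 4, "neurology": 10, "cardiology": 1, "other": 3,
}
inputs = {
    "PET": 29, "PET and CT": 13, "CT": 6, "PET and MR": 4, "MR": 5,
    "floodmaps": 1, "coincidence waveforms": 1, "MLAA-based maps": 2,
    "polar maps": 1, "simulated low-resolution PET sinograms": 1,
}

n_technical = sum(technical.values())
n_clinical = sum(clinical.values())
print(n_technical, n_clinical, n_technical + n_clinical)  # 23 40 63
print(sum(inputs.values()))  # 63
```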

Image quality and technical applications

A summary of the main characteristics and findings of the technical studies is provided in Table 2.

Table 2 CNN-based studies focussed on image quality and technical PET aspects

Image quality

Radiation exposure is a central issue in nuclear medicine practice, and balancing a reduction in tracer activity against image quality is challenging. On these grounds, Zhou et al. developed a supervised deep learning (DL) model (CycleWGANs) to improve the quality of low-dose PET images. The proposed method was compared with existing image denoising methods (non-local means (NLM), block-matching 3D (BM3D), RED-CNN and 3D-cGAN). The model accurately estimated full-dose PET images from low-dose input images at a count level of 1 million true counts. Additionally, it preserved SUVmean and SUVmax values and suppressed image noise in low-dose PET imaging [3].
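SUVmean and SUVmax preservation is typically assessed by computing both values in the same region of interest (ROI) on the reference (full-dose) and processed images. The sketch below assumes body-weight SUV normalization, decay-corrected activity values and a tissue density of 1 g/mL; the ROI concentrations, injected activity and body weight are hypothetical, and the percent-difference convention is one common choice rather than the metric used in [3].

```python
from statistics import mean
from typing import List, Sequence, Tuple


def suv_bw(conc_kbq_per_ml: float, injected_mbq: float, weight_kg: float) -> float:
    """Body-weight SUV (assumes decay-corrected values and 1 g/mL tissue density)."""
    injected_kbq = injected_mbq * 1000.0
    weight_g = weight_kg * 1000.0
    return conc_kbq_per_ml / (injected_kbq / weight_g)


def roi_suv_stats(roi_conc: Sequence[float], injected_mbq: float,
                  weight_kg: float) -> Tuple[float, float]:
    """Return (SUVmean, SUVmax) over the voxel concentrations of one ROI."""
    suvs: List[float] = [suv_bw(c, injected_mbq, weight_kg) for c in roi_conc]
    return mean(suvs), max(suvs)


def percent_diff(test: float, reference: float) -> float:
    """Relative difference of an estimate with respect to the reference, in %."""
    return 100.0 * (test - reference) / reference


# Hypothetical ROI concentrations (kBq/mL) for a 300 MBq injection in a 75 kg patient.
full_dose = [8.0, 10.0, 12.0]
denoised_low_dose = [8.0, 10.0, 11.6]
ref_mean, ref_max = roi_suv_stats(full_dose, injected_mbq=300.0, weight_kg=75.0)
dn_mean, dn_max = roi_suv_stats(denoised_low_dose, injected_mbq=300.0, weight_kg=75.0)
print(round(ref_max, 2), round(dn_max, 2), round(percent_diff(dn_max, ref_max), 1))  # 3.0 2.9 -3.3
```

In this toy case SUVmax drops by about 3.3% while SUVmean changes by about 1.3%, which illustrates why studies report both indices.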

Xiang et al. developed a CNN-based method to accurately estimate the standard-dose PET image by combining the low-quality low-dose PET (LPET) image with a T1-weighted MR acquisition; the method was fast and achieved competitive image quality [4].

Spuhler et al. developed a denoising CNN-based method (dCNN) to recover full-count images from low-count images. dCNN was compared with a conventional U-Net and achieved better results in terms of mean absolute percent error (MAPE): 4.99 ± 0.68 vs 5.31 ± 0.76; peak signal-to-noise ratio (PSNR): 31.55 ± 1.31 dB vs 31.05 ± 1.39 dB; and structural similarity index metric (SSIM): 0.9513 ± 0.0154 vs 0.9447 ± 0.0178 [5].
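The three image-quality metrics quoted above can be written compactly. The following sketch assumes flattened voxel lists, computes MAPE over voxels with a non-zero reference, takes the PSNR peak as an explicitly supplied dynamic range (conventions vary between studies, for example the reference maximum), and uses a single-window (global) SSIM with the usual constants K1 = 0.01 and K2 = 0.03. Standard SSIM averages the same expression over local sliding windows, so values from this simplified version are not directly comparable to those reported in [5]. The voxel values are hypothetical.

```python
import math
from typing import Sequence


def mape(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute percent error over voxels with a non-zero reference value."""
    terms = [abs(p - r) / abs(r) for p, r in zip(pred, ref) if r != 0]
    return 100.0 * sum(terms) / len(terms)


def psnr(pred: Sequence[float], ref: Sequence[float], peak: float) -> float:
    """Peak signal-to-noise ratio in dB; `peak` is the assumed dynamic range."""
    mse = sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)
    return 10.0 * math.log10(peak ** 2 / mse)


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float) -> float:
    """Single-window SSIM over the whole image (simplification of sliding-window SSIM)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2   # K1 = 0.01
    c2 = (0.03 * data_range) ** 2   # K2 = 0.03
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


full_count = [1.0, 2.0, 4.0, 8.0]   # hypothetical reference (full-count) voxels
denoised = [1.1, 1.9, 4.2, 7.8]     # hypothetical network output
print(mape(denoised, full_count))
print(psnr(denoised, full_count, peak=max(full_count)))
print(global_ssim(denoised, full_count, data_range=max(full_count)))
```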

Song et al. [6, 7] addressed the image quality degradation and inaccurate image-based quantification caused by the intrinsically low spatial resolution of PET, in two investigations aimed at improving PET image resolution.

In the first, they developed a self-supervised super-resolution (SSSR) technique for PET based on dual generative adversarial networks (GANs). The SSSR inputs were a low-resolution PET image, a high-resolution anatomical magnetic resonance (MR) image, spatial information (axial and radial coordinates), and a high-dimensional feature set from an additional CNN. Good performance was achieved in terms of image quality, peak signal-to-noise ratio, structural similarity index and contrast-to-noise ratio.

Subsequently, the group designed, implemented and validated several CNN architectures for super-resolution (SR) PET imaging, including shallow and deep variants. The low-resolution PET image and its high-resolution anatomical counterpart (e.g. a T1-weighted MR image) were used as input images. The CNNs outperformed penalized deconvolution and partial volume correction, both qualitatively (edge and contrast recovery) and quantitatively (PSNR, SSIM and contrast-to-noise ratio (CNR)).

Whiteley et al. proposed a CNN-based sinogram repair method to mitigate the effects of malfunctioning block detectors, which usually lead to decreased sensitivity. The proposed method outperformed previously tested methods [8].

PET systems with finely pixelated (small) crystals provide high spatial resolution but are not widely available. Hong et al. proposed a data-driven single-image super-resolution (SISR) approach to enhance the image resolution and noise properties of PET scanners with large pixelated crystals. The approach yielded fair results in terms of image resolution and noise properties, achieving comparable image quality with crystals four times larger [9].

Low spatial resolution in pre-clinical and clinical PET scanners with an extended field of view (FOV) can be related to parallax error, which increases the uncertainty in estimating the annihilation position. Zatcepin et al. developed two DL-based algorithms (a dense neural network and a CNN) as well as multiple linear regression (MLR)-based methods to estimate the depth of interaction (DOI) in depolished PET detector arrays. Tests were performed on an 8 × 8 array of 1.53 × 1.53 × 15 mm³ crystals and a 4 × 4 array of 3.1 × 3.1 × 15 mm³ crystals, both coupled to a 4 × 4 array of 3 × 3 mm² silicon photomultipliers. The DL-based methods performed better than the MLR-based methods and other conventional linear methods, achieving an average DOI resolution of 2.99 mm (8 × 8 array) and 3.14 mm (4 × 4 array) full width at half maximum (FWHM) [10].
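DOI resolution is reported as a full width at half maximum (FWHM). Assuming the DOI estimation errors are Gaussian, FWHM = 2√(2 ln 2) σ ≈ 2.355 σ, where σ is the standard deviation of the errors. The sketch below applies this conversion to hypothetical estimated and true depths; the cited study may have measured the FWHM differently, for example by fitting the error histogram.

```python
import math
from statistics import pstdev
from typing import Sequence

# For a Gaussian distribution, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 * sigma).
GAUSS_FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def doi_resolution_fwhm(estimated_mm: Sequence[float], true_mm: Sequence[float]) -> float:
    """DOI resolution as the FWHM of the estimation errors, assuming they are Gaussian."""
    errors = [e - t for e, t in zip(estimated_mm, true_mm)]
    return GAUSS_FWHM_FACTOR * pstdev(errors)


# Hypothetical estimates scattered by +/- 1 mm around a true depth of 7.5 mm.
print(round(doi_resolution_fwhm([6.5, 8.5], [7.5, 7.5]), 3))  # 2.355
```

Under the same Gaussian assumption, the reported 2.99 mm FWHM corresponds to σ ≈ 1.27 mm.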

Incomplete projection data lead to artefacts in the reconstructed image. Liu et al. developed a CNN-based method for the recovery of partial-ring PET images. In this study, 20 digital brain phantoms were used with the Monte Carlo simulation toolkit SimSET to simulate full-ring PET scans. The CNN performed well in terms of mean squared error (MSE), structural similarity (SSIM) index and recovery coefficient (RC), showing its potential to successfully recover partial-ring PET images [11].
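Mean squared error and recovery coefficient can be computed as follows. This is a minimal sketch on a hypothetical one-dimensional activity profile, assuming RC is defined as the ratio of mean recovered to mean true activity inside a region of interest; the partial-ring and network outputs are invented for illustration.

```python
from typing import Sequence


def mse(recon: Sequence[float], truth: Sequence[float]) -> float:
    """Mean squared error between a recovered image and the ground truth."""
    return sum((a - b) ** 2 for a, b in zip(recon, truth)) / len(truth)


def recovery_coefficient(recon: Sequence[float], truth: Sequence[float],
                         roi: Sequence[int]) -> float:
    """Mean recovered / mean true activity inside an ROI (1.0 = full recovery)."""
    rec_mean = sum(recon[i] for i in roi) / len(roi)
    true_mean = sum(truth[i] for i in roi) / len(roi)
    return rec_mean / true_mean


# Hypothetical 1D profiles: ground truth, partial-ring image and network output.
truth = [0.0, 4.0, 4.0, 0.0]
partial_ring = [0.5, 2.0, 2.4, 0.3]
cnn_output = [0.1, 3.8, 3.9, 0.0]
roi = [1, 2]  # indices of the "hot" region
for name, img in [("partial ring", partial_ring), ("CNN", cnn_output)]:
    print(name, round(mse(img, truth), 3), round(recovery_coefficient(img, truth, roi), 3))
```

A perfect recovery gives RC = 1 and MSE = 0; in this toy example the hypothetical network output moves RC from 0.55 to about 0.96.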

Regarding PET image reconstruction, Kim et al. proposed a denoising CNN integrated within the iterative PET reconstruction framework. The algorithm outperformed conventional methods based on total variation (TV) and non-local means (NLM) penalties [12].

Finally, Gong et al. trained a deep residual CNN to improve PET image quality by exploiting the inter-patient information embedded in the network. The network was then integrated into the iterative reconstruction framework. The proposed approach outperformed neural network denoising methods and conventional methods (Gaussian filtering and penalized reconstruction) [13].
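Both Kim et al. and Gong et al. embed a learned model within iterative reconstruction. The sketch below shows the general idea in its simplest form, assuming a tiny hypothetical system matrix and alternating a standard MLEM update, x ← x / (A^T 1) · A^T (y / Ax), with a denoising step after each iteration. A 3-point moving average stands in for the trained CNN. This plain alternation is an illustration only; the cited works may integrate the network differently (for example, as a regularization penalty), and all names here are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def forward(A: Matrix, x: Sequence[float]) -> Vector:
    """Forward projection: expected counts in each detector bin, A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def back(A: Matrix, v: Sequence[float]) -> Vector:
    """Back projection with the transposed system matrix, A^T v."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]


def smooth3(x: Sequence[float]) -> Vector:
    """Placeholder denoiser (3-point moving average) standing in for a trained CNN."""
    out = []
    for j in range(len(x)):
        window = x[max(0, j - 1):j + 2]
        out.append(sum(window) / len(window))
    return out


def mlem_with_denoiser(A: Matrix, y: Sequence[float], n_iter: int,
                       denoise: Callable[[Sequence[float]], Vector]) -> Vector:
    """MLEM updates alternated with a denoising step after every iteration.

    Assumes every column of A has a positive sum (no zero-sensitivity voxels).
    """
    x = [1.0] * len(A[0])              # uniform, strictly positive initial image
    sens = back(A, [1.0] * len(A))     # sensitivity image A^T 1
    for _ in range(n_iter):
        proj = forward(A, x)
        ratio = [yi / pi if pi > 0 else 0.0 for yi, pi in zip(y, proj)]
        corr = back(A, ratio)
        x = [xj * cj / sj for xj, cj, sj in zip(x, corr, sens)]   # MLEM update
        x = denoise(x)                                           # denoising step
    return x


# Hypothetical 3-bin x 3-voxel system matrix and noise-free data from a known image.
A = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5],
     [0.5, 0.0, 1.0]]
y = forward(A, [1.0, 2.0, 3.0])
plain = mlem_with_denoiser(A, y, n_iter=1000, denoise=list)   # identity "denoiser"
regularized = mlem_with_denoiser(A, y, n_iter=50, denoise=smooth3)
print([round(v, 3) for v in plain], [round(v, 3) for v in regularized])
```

With the identity in place of the denoiser and noise-free, consistent data, the estimate approaches the true image [1, 2, 3]; with the smoothing step it trades some accuracy for smoothness, which is the role a learned denoiser plays when the data are noisy.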

Technical applications

One of the most critical technical challenges in PET/MR is the accurate estimation of PET attenuation correction (AC). Seven [14,15,16,17,18,19,20] of the twelve included technical studies investigated the potential role of CNN-based methods in AC.
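The size of the attenuation correction follows from the line integral of the linear attenuation coefficient μ along each line of response (LOR): ACF = exp(∫μ dl). The sketch assumes μ ≈ 0.096 cm⁻¹ for water at 511 keV and a hypothetical LOR crossing 20 cm of water-equivalent tissue.

```python
import math
from typing import Sequence

MU_WATER_511_PER_CM = 0.096  # approximate linear attenuation coefficient of water at 511 keV


def attenuation_correction_factor(mu_per_cm: Sequence[float], step_cm: float) -> float:
    """ACF for one line of response: exp of the line integral of mu.

    In PET the two annihilation photons together traverse the whole object along
    the LOR, so the factor does not depend on where the annihilation occurred.
    """
    line_integral = sum(mu * step_cm for mu in mu_per_cm)
    return math.exp(line_integral)


# Hypothetical LOR crossing 20 cm of water-equivalent tissue, sampled every 0.5 cm.
acf = attenuation_correction_factor([MU_WATER_511_PER_CM] * 40, step_cm=0.5)
print(round(acf, 2))  # 6.82
```

The result, about 6.8, means only roughly one in seven coincidences along that LOR is detected, so errors in the μ-map (for example, misclassified bone in MR-based AC) translate directly into quantification errors.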

Blanc-Durand et al. [14] proposed generating AC maps from zero echo time (ZTE) MR images. Three methods were compared with the reference CT-based AC map: a single-head atlas-based method, a ZTE segmentation-based method and a CNN-based method with a U-Net architecture. The U-Net AC method performed best, showing the lowest bias and the lowest inter-individual and inter-regional variability, with a negligible impact on the estimation of brain metabolism.

Leynes et al. [15] proposed a DL model, named ZEDD-CT, to directly synthesize pseudo-CT images from patient-specific multiparametric MRI (Dixon MRI and proton-density-weighted ZTE MRI). The CNN-based method achieved a 4-fold and 1.5-fold reduction in the root-mean-square error (RMSE) of PET quantification in bone and soft-tissue lesions, respectively.

Bradshaw et al. [16] evaluated the potential use of DL for PET/MR attenuation correction in the pelvis using diagnostic MRI. The DL-based approach outperformed the approach based on dedicated attenuation correction MRI sequences, which could shorten the scanning time.

In 2018 and 2019, Hwang et al. [17, 18] investigated different DL-based approaches to improve the simultaneous reconstruction of activity and attenuation in PET imaging based on the maximum likelihood reconstruction of activity and attenuation (MLAA) algorithm. In the first study, they proposed three CNN architectures to learn the CT-based attenuation map from the MLAA-generated activity distribution and attenuation map: a convolutional autoencoder (CAE), a U-Net, and a hybrid of CAE and U-Net. The hybrid architecture yielded the best results, with a Dice similarity coefficient of 0.79 in bone and 0.72 in air cavities.
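The Dice similarity coefficient used here, and in several of the segmentation studies below, measures the overlap between two masks: DSC = 2|A ∩ B| / (|A| + |B|). A minimal sketch, with voxel masks represented as sets of hypothetical (z, y, x) indices:

```python
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]  # (z, y, x) index of a voxel inside a mask


def dice(seg_a: Iterable[Voxel], seg_b: Iterable[Voxel]) -> float:
    """Dice similarity coefficient: 2 * |A intersect B| / (|A| + |B|)."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0  # convention: two empty masks are treated as perfect agreement
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical predicted and reference masks sharing 3 of their 4 voxels each.
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
reference = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 1)}
print(dice(predicted, reference))  # 0.75
```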

The second study aimed to improve total-body PET/MRI attenuation correction, comparing the CNN approach with the Dixon-based four-segment method. The average Dice similarity coefficient in bone regions between the CNN-derived (μ-CNN) and CT-derived (μ-CT) attenuation maps was 0.77, indicating a reliable attenuation map.

Arabi et al. trained a CNN to generate attenuation-corrected PET images (PET-DLAC) from non-attenuation-corrected PET images. They evaluated quantification accuracy in four datasets (18F-FDG, 18F-DOPA, 18F-Flortaucipir and 18F-Flutemetamol), using CT-based attenuation-corrected PET (PET-CTAC) images as reference. DLAC achieved an absolute SUV bias of less than 9% within each tracer dataset but appeared susceptible to outliers [19].

Spuhler et al. developed a CNN-based method to generate patient-specific transmission data from T1-weighted MRI for PET/MRI neuroimaging and assessed both static and dynamic reconstructions. The DL approach showed good accuracy for both, with a mean bias of -1.06 ± 0.81% for the generated transmission data [20].

Berg et al. proposed a CNN-based method to estimate time of flight (TOF) in PET, using pairs of digitized detector waveforms from a coincident event as input. Time resolution improved by 20% and 23% compared with leading-edge discrimination and constant fraction discrimination, respectively [21].
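The two conventional baselines mentioned above differ in how the time stamp is read from the rising edge of the pulse. The sketch below is a simplified digital version of each, assuming sampled waveforms and linear interpolation between samples: leading-edge discrimination uses a fixed absolute threshold, while the constant-fraction variant uses a threshold set to a fraction (here an assumed 0.2) of the pulse peak. The pulse shape is hypothetical, and analogue constant fraction discriminators implement the idea differently (with a delayed, inverted copy of the pulse). The CNN of Berg et al. instead learns the timing directly from the waveform pair.

```python
from typing import Optional, Sequence


def crossing_time(samples: Sequence[float], dt_ns: float, level: float) -> Optional[float]:
    """First time (ns) the sampled waveform rises through `level`, linearly interpolated."""
    for k in range(1, len(samples)):
        lo, hi = samples[k - 1], samples[k]
        if lo < level <= hi:
            return (k - 1 + (level - lo) / (hi - lo)) * dt_ns
    return None


def leading_edge_time(samples: Sequence[float], dt_ns: float,
                      threshold: float) -> Optional[float]:
    """Leading-edge discrimination: fixed absolute threshold."""
    return crossing_time(samples, dt_ns, threshold)


def cfd_time(samples: Sequence[float], dt_ns: float,
             fraction: float = 0.2) -> Optional[float]:
    """Simplified digital constant-fraction timing: threshold = fraction * pulse peak."""
    return crossing_time(samples, dt_ns, fraction * max(samples))


# Hypothetical pulse sampled every 1 ns, and the same shape at half the amplitude.
large = [0.0, 0.0, 2.0, 6.0, 10.0, 8.0, 5.0, 2.0]
small = [0.5 * v for v in large]
print(leading_edge_time(large, 1.0, 1.5), leading_edge_time(small, 1.0, 1.5))  # 1.75 2.25
print(cfd_time(large, 1.0), cfd_time(small, 1.0))                              # 2.0 2.0
```

With two pulses of identical shape but different amplitude, the leading-edge times differ by 0.5 ns, whereas the constant-fraction times coincide; this amplitude walk is what constant-fraction timing is designed to remove.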

Xu et al. [22] explored the potential of a 3D CNN-based method for dual-tracer PET image reconstruction, developing a hybrid loss-guided DL framework that uses sinogram data. The proposed algorithm outperformed the comparison methods and successfully recovered the tracer distributions even at lower total counts. The approach appeared promising for simultaneous imaging of two tracers, even when they are labelled with the same isotope.

Kumar et al. proposed a CNN-based method to improve PET–CT fusion. The method encoded modality-specific features and then used them to derive a spatially varying fusion map quantifying the relative importance of each modality's features across different anatomical regions. The fusion maps were then multiplied by the modality-specific feature maps to obtain a representation of the complementary multi-modality information at different locations. The ability of the method to detect and segment multiple regions was evaluated and compared with reference techniques for multi-modality image fusion (fused inputs, multi-branch and multi-channel techniques) and segmentation. The CNN achieved significantly higher foreground detection accuracy and Dice score [23].
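The fusion step described above can be sketched as a per-location weighted sum of modality-specific features. In this minimal example the fusion weights come from a softmax across modalities applied to per-location importance scores; the softmax normalization, the scores and the feature values are assumptions for illustration and are not taken from Kumar et al.

```python
import math
from typing import List

Map = List[float]  # a feature map flattened to one value per spatial location


def softmax(values: List[float]) -> List[float]:
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features: List[Map], scores: List[Map]) -> Map:
    """Weight each modality's feature by a per-location fusion weight and sum.

    `scores[m][i]` is a hypothetical importance score of modality m at location i;
    the softmax across modalities turns the scores into weights that sum to 1.
    """
    fused = []
    for i in range(len(features[0])):
        weights = softmax([s[i] for s in scores])
        fused.append(sum(w * f[i] for w, f in zip(weights, features)))
    return fused


pet_feat = [1.0, 1.0, 1.0]     # hypothetical PET feature values
ct_feat = [0.0, 0.0, 0.0]      # hypothetical CT feature values
pet_score = [2.0, 0.0, -2.0]   # hypothetical: PET favoured at location 0, CT at location 2
ct_score = [0.0, 0.0, 0.0]
fused = fuse([pet_feat, ct_feat], [pet_score, ct_score])
print([round(v, 3) for v in fused])  # [0.881, 0.5, 0.119]
```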

As a first step towards an automated method to quantify skeletal tumour burden on PET/CT, Belal et al. developed a CNN-based method for bone segmentation and compared its performance with manual segmentations by an experienced physician. The Sørensen-Dice index (SDI) was used to measure segmentation accuracy. The average volume difference (volume difference/mean volume) between the two segmentations was 5–6% for the vertebral column and ribs and < 3% for the other bones [24].

Lee et al. proposed a CNN-based method that predicts voxel dose from PET and CT image patches in the radiotherapy planning setting. The voxel dose rate maps predicted by the CNN were compared with (a) the ground truth from direct Monte Carlo simulation and (b) dose rate maps generated with the voxel S-value (VSV) kernel convolution method. The results showed good agreement with the ground truth (voxel dose rate errors = 2.54% ± 2.09%) and significant improvements over the conventional dosimetry approaches [25].
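The voxel S-value (VSV) method used as a comparator above computes the dose-rate map as a convolution of the activity map with a kernel of S-values, each giving the dose rate to a target voxel per unit activity in a source voxel at a given offset. The sketch below uses a hypothetical 7-element kernel (self-dose plus six face neighbours) with arbitrary values; real VSV kernels are tabulated for a given radionuclide and voxel size, typically for a homogeneous medium, which is one reason they can deviate from direct Monte Carlo in heterogeneous tissue.

```python
from typing import Dict, List, Tuple

Offset = Tuple[int, int, int]
Volume = List[List[List[float]]]  # indexed [z][y][x]


def vsv_dose_rate(activity: Volume, kernel: Dict[Offset, float]) -> Volume:
    """Dose-rate map = activity map convolved with a voxel S-value kernel.

    kernel[(dz, dy, dx)] is the dose rate to a target voxel per unit activity in the
    source voxel located at target - (dz, dy, dx); sources outside the volume are ignored.
    """
    nz, ny, nx = len(activity), len(activity[0]), len(activity[0][0])
    dose = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                total = 0.0
                for (dz, dy, dx), s_value in kernel.items():
                    sz, sy, sx = z - dz, y - dy, x - dx  # source voxel
                    if 0 <= sz < nz and 0 <= sy < ny and 0 <= sx < nx:
                        total += activity[sz][sy][sx] * s_value
                dose[z][y][x] = total
    return dose


# Hypothetical kernel (arbitrary units): self-dose plus the six face neighbours.
kernel: Dict[Offset, float] = {(0, 0, 0): 1.0}
for axis in range(3):
    for step in (-1, 1):
        offset = [0, 0, 0]
        offset[axis] = step
        kernel[(offset[0], offset[1], offset[2])] = 0.1

# A single "hot" voxel at the centre of a 3 x 3 x 3 activity map.
activity = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
activity[1][1][1] = 2.0
dose = vsv_dose_rate(activity, kernel)
print(dose[1][1][1], dose[1][1][2], dose[0][0][0])
```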

Clinical studies

A summary of the features and main results of the clinical studies is provided in Table 3.

Table 3 Summary of the characteristics of the 40 selected clinical studies

Brain and head and neck cancer

In medical imaging, segmentation is a common task, used for radiotherapy planning, treatment response assessment and calculation of prognostic parameters. An automated approach (a full 3D U-Net CNN) for brain lesion segmentation on 18F-FET PET images of patients with different glial tumours was tested. It showed promising performance, with a Dice similarity coefficient (DSC) of up to 0.8231 [26].

Radiation therapy is one of the most effective therapeutic strategies in head and neck cancer patients. Treatment success strongly relies on precise delineation of the gross tumour volume (GTV) on medical images. Huang et al. developed and verified an automated GTV segmentation method based on a CNN and PET-CT images. The Dice similarity coefficient (DSC) for the GTV obtained with the proposed method was higher than that of previously described automated approaches [27].

Olin et al. described further steps towards CNN-based radiotherapy planning, testing the feasibility of an automated “one-stop-shop” radiotherapy planning framework using PET/MR data. All dosimetric parameters of the synthetic CT-based dose plans were within ± 1% of those of the conventional dose plans [28].

Accurate lymph node staging is crucial, since nodal involvement influences both overall survival and the probability of distant metastases. Chen et al. combined radiomics and DL approaches to classify lymph nodes. They designed a “many-objective radiomics” (MaO-radiomics) model and a three-dimensional convolutional neural network (3D-CNN) that fully exploited spatial contextual information, and fused the outputs of the two models through an evidential reasoning approach. The hybrid method achieved an accuracy of 0.88 [29].

Lymphoma

In lymphoma patients, CNN-based methods performed well in the detection and characterization of 18F-FDG-avid lesions. In particular, Capobianco et al. compared the CNN-based total metabolic tumour volume (TMTV) with the reference TMTV in terms of prognostic value for progression-free survival (PFS) and overall survival (OS). The CNN-derived TMTV was significantly correlated with the reference TMTV (ρ = 0.76; p < 0.001). In 280 patients, the CNN-based method (PET Assisted Reporting System, PARS) yielded 6737 regions of interest (ROIs), versus 7996 reference ROIs. The CNN yielded 3317 true negatives, 2399 true positives, 589 false negatives and 432 false positives. Both TMTVs were predictive of PFS and OS [30].
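Standard metrics follow directly from the ROI counts reported above, which sum to 6737, the number of CNN-derived ROIs. The sketch applies the usual definitions; the resulting values are arithmetic on the quoted counts, not figures reported by the authors. The F1 score computed here is the same kind of F-score quoted for the sFEPU study below.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # recall, true-positive rate
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),                  # precision
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),      # harmonic mean of precision and recall
    }


# ROI-level counts quoted for the CNN in the lymphoma study above.
metrics = confusion_metrics(tp=2399, fp=432, tn=3317, fn=589)
print({name: round(value, 3) for name, value in metrics.items()})
```

This gives an ROI-level sensitivity of about 0.80 and a specificity of about 0.88.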

Sadik et al. developed a DL-based method to automatically quantify the uptake in the liver and mediastinal blood pool needed to determine the Deauville score, as a first step towards automated treatment response evaluation. Good agreement between the proposed method and experienced radiologists was achieved [31].

Sites of physiological 18F-FDG uptake and normal excretion (sFEPU) can interfere with the interpretation of abnormal PET findings and reduce sensitivity. Bi et al. investigated a CNN-based method, multiscale superpixel-based encoding (MSE), for sFEPU identification. Their method outperformed other existing methods in the classification of sFEPU, with an average F-score of 0.9173 [32].

Lung cancer

Among the included articles, 7/40 clinical studies investigated the potential of CNN-based methods in lung cancer patients. In lung lesion detection, reducing false positives (FPs) was a central issue [33, 34]. Teramoto et al. developed an FP reduction method by incorporating a CNN into an FP reduction technique that used shape features from CT and metabolic features from PET images. The proposed ensemble technique achieved 90% sensitivity with 4.9 FPs per case [33].

Zhao et al. developed a multi-modality segmentation method relying on FDG uptake and CT information for tumour delineation. They demonstrated that the proposed PET/CT CNN-based method achieved a significant performance gain in tumour segmentation compared to other traditional and ML-based methods [35].

CNN-based methods were also explored as a tool to assist staging in lung cancer: Kirienko et al. tested a CNN developed using both PET and CT to classify the T parameter (T1–T2 vs T3–T4). The model achieved an area under the ROC curve (AUC) of 0.83 [36].

For nodal staging, Wang et al. developed a CNN and compared it with four classical ML methods. The CNN showed a sensitivity, specificity, accuracy and AUC of 84%, 88%, 86% and 0.91, respectively. Diagnostic performance was not significantly different among the tested algorithms [37].
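The AUC reported by many of the studies in this review can be interpreted as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half (the normalized Mann-Whitney U statistic). The quadratic-time sketch below uses hypothetical CNN outputs for node-positive and node-negative cases; for large datasets a rank-based or library implementation would be preferable.

```python
from typing import Sequence


def auc_mann_whitney(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """ROC AUC as P(score of a positive > score of a negative), ties counting 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


node_positive = [0.9, 0.8, 0.6, 0.4]   # hypothetical CNN outputs for metastatic nodes
node_negative = [0.7, 0.3, 0.2, 0.1]   # hypothetical CNN outputs for benign nodes
print(auc_mann_whitney(node_positive, node_negative))  # 0.875
```

Here 14 of the 16 positive-negative pairs are correctly ordered, giving AUC = 0.875.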

The CNN-based method developed by Tau et al. aimed to predict disease spread to nodal and distant sites in non-small cell lung cancer. Accuracy was higher for predicting nodal than distant metastases (80% vs 63%) [38].

Finally, Baek et al. showed that CNNs trained to perform tumour segmentation (with no other information than physician contours) identified survival-related image features with remarkable prognostic value. The estimated AUC was 0.88 (95% CI: 0.80–0.96) to predict 2-year OS [39].

Oesophageal cancer

CNN-based methods for predicting therapy response and outcome in oesophageal cancer were evaluated by Ypsilantis et al. [40] and Yang et al. [41]. In the former, a three-slice (3S) CNN outperformed other competing predictive parameters (e.g. SUVmax and radiomic indexes), achieving an accuracy of 73% in distinguishing non-responders from responders on pre-treatment 18F-FDG PET/CT images [40]. In the latter, CNN-based methods provided promising results in identifying patients who died within 1 year of the initial diagnosis, suggesting that the prediction model could identify tumours with more aggressive behaviour [41]. Both studies thus lay the groundwork for further investigations supporting the personalized management of patients with oesophageal cancer.

Prostate cancer

The included prostate cancer studies applied CNN-based molecular imaging methods to four topics: prognosis, tumour delineation, nodal staging and recurrence.

Polymeri et al. evaluated a DL algorithm on 18F-choline PET/CT images of 145 patients for automated cancer assessment (versus manual segmentation) and OS prediction. Automated PET/CT biomarkers showed good agreement with manual measurements and were significantly associated with OS (p = 0.02) [42].

Mortensen et al. compared manual and automated prostate cancer assessment in terms of 18F-choline PET-derived parameters. The correlation between automated and manual measurements was significant, and CNN segmentation provided volume and conventional PET measures similar to the manually derived ones. Mean differences (95% CI) were 1.40 (− 2.26 to 5.06), 0.37 (− 0.01 to 0.75), -0.08 (− 0.30 to 0.14) and 9.61 (− 3.95 to 23.17) for volume, SUVmax, SUVmean and total lesion uptake, respectively [43].

Hartenstein et al. trained three CNNs to determine 68Ga-PSMA PET/CT lymph node status from CT alone. The best CNN outperformed two experienced radiologists (AUC 0.95 vs 0.81) [44].

Finally, Lee et al. [45] evaluated deep learning approaches for detecting abnormal 18F-FACBC uptake in patients with biochemical recurrence of prostate cancer. Two CNN architectures were used: a 2D-CNN (ResNet-50) operating on single slices (slice-based approach) and a 3D-CNN (ResNet-14) operating on one hundred slices per PET image (case-based approach). The slice-based approach performed much better than the case-based approach (AUC = 0.971 vs 0.699).

Multiple cancer types

Nobashi et al. evaluated CNN-based approaches to classify 18F-FDG PET/CT brain scans of cancer patients as abnormal or normal, with convincing results. An ensemble model that averaged the probabilities of all individual models achieved the best accuracy (82%) [46].
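The ensemble strategy described above, averaging the output probabilities of several models, can be sketched as follows. The three models' outputs and the 0.5 decision threshold are hypothetical.

```python
from typing import List, Sequence


def ensemble_average(prob_per_model: Sequence[Sequence[float]]) -> List[float]:
    """Average, scan by scan, the abnormality probabilities produced by several models."""
    n_models = len(prob_per_model)
    return [sum(scan_probs) / n_models for scan_probs in zip(*prob_per_model)]


def classify(probs: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Dichotomize averaged probabilities with an assumed decision threshold."""
    return ["abnormal" if p >= threshold else "normal" for p in probs]


# Hypothetical outputs of three models for four brain scans.
model_a = [0.9, 0.4, 0.2, 0.6]
model_b = [0.8, 0.6, 0.1, 0.3]
model_c = [0.7, 0.7, 0.3, 0.3]
avg = ensemble_average([model_a, model_b, model_c])
print([round(p, 2) for p in avg], classify(avg))
```

Averaging lets the models outvote each other: scan 4 is called abnormal by one model alone but normal by the ensemble.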

Shaish et al. investigated whether a CNN can predict the SUVmax of lymph nodes in patients with cancer. The predicted SUVmax was associated with the reference SUVmax (p < 0.0001) [47].

Sibille et al. tested the performance of multiple CNN configurations on a large cohort of lung cancer and lymphoma patients to localize uptake patterns on total-body 18F-FDG PET/CT images and classify them as suspicious or non-suspicious for cancer. The classification AUC varied considerably depending on the imaging input: CT alone, AUC = 0.78 (95% confidence interval [CI]: 0.72, 0.83); 18F-FDG PET alone, AUC = 0.97 (95% CI: 0.97, 0.98); 18F-FDG PET/CT, AUC = 0.98 (95% CI: 0.97, 0.99); 18F-FDG PET/CT maximum intensity projection (MIP), AUC = 0.98 (95% CI: 0.98, 0.99); and 18F-FDG PET/CT MIP atlas, AUC = 0.99 (95% CI: 0.98, 1.00) [48].

Kawauchi et al. tested two CNN-based methods (A and B) to classify lesions into benign, malignant and equivocal. A total of 76,785 MIP images were analysed. In the total-body analysis, Algorithm A achieved 91% (benign), 100% (malignant) and 57.5% (equivocal) accuracy; while Algorithm B showed 99.4% (benign), 99.4% (malignant) and 87.5% (equivocal) accuracy. In the region-based analysis, the accuracy in the prediction of malignant uptake regions was 97.3% (head-and-neck), 96.6% (chest), 92.8% (abdomen) and 99.6% (pelvis) [49].
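A maximum intensity projection (MIP), the input used by Kawauchi et al. and one of the inputs compared in [48], keeps for each ray the brightest voxel along the projection direction. A minimal sketch, assuming a volume indexed [z][y][x] and projection along y (taken here, by assumption, as the anterior-posterior direction, which yields a coronal MIP); the volume values are hypothetical:

```python
from typing import List

Volume = List[List[List[float]]]  # indexed [z][y][x]


def mip_along_y(volume: Volume) -> List[List[float]]:
    """Maximum intensity projection along y; returns a [z][x] image."""
    return [[max(plane[y][x] for y in range(len(plane))) for x in range(len(plane[0]))]
            for plane in volume]


# Hypothetical 2 x 2 x 3 volume (z, y, x).
volume = [[[1, 5, 2], [3, 0, 4]],
          [[0, 0, 9], [7, 1, 1]]]
print(mip_along_y(volume))  # [[3, 5, 4], [7, 1, 9]]
```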

Cervical cancer

Chen et al. evaluated the performance of a spatial-information-embedded CNN (S-CNN) for the detection of cervical cancer, a challenging task because of the tumour's proximity to the bladder. The S-CNN output was post-processed with a thresholding method combined with prior information, reaching a mean DSC of 0.84 [50].

Sarcoma

The high mortality rate related to distant metastases calls for early prediction of disease spread in sarcoma patients. Peng et al. compared their deep multi-modality collaborative learning method with state-of-the-art methods and achieved the best overall performance in predicting the risk of distant metastases: their method ranked first in AUC (0.84), accuracy (85%), sensitivity (92%) and F1 score (86%), second in precision (81%) and third in specificity (79%) [51].

Neurology

CNN-based methods were investigated in Alzheimer’s disease (AD) and Parkinson’s disease (PD): 7/10 studies [52,53,54,55,56,57,58] focussed on AD, 1/10 on both [59] and 2/10 on PD only [60, 61].

For AD diagnosis, Ding et al. studied the performance of a CNN based on the InceptionV3 architecture. The algorithm achieved an AUC of 0.98 (95% confidence interval: 0.94, 1.00) in predicting the clinical diagnosis of AD, outperforming the readers' evaluation [52].

Liu et al. used CNNs to classify patients with AD [53]. They built multiple deep 3D-CNNs and an upper high-level 2D-CNN to automatically learn generic multi-level and multimodal features from multiple imaging modalities. Accuracy was high (93%) for classifying AD versus controls and lower (83%) for mild cognitive impairment (MCI) versus controls, showing that classifying MCI is more challenging.

The same group tested the performance of a classification framework combining a 2D CNN and recurrent neural networks (RNNs). The algorithm showed an AUC of 0.95 for AD vs normal controls (NC) and 0.84 for MCI vs NC [54].

Huang et al. proposed a CNN that integrated the multi-modality information from the hippocampal area of both T1-MR and 18F-FDG PET images. The accuracy was 90% and 87% for controls vs AD, and for controls vs MCI, respectively [55].

Kim et al. investigated amyloid quantification methods via a DL model. They aimed at developing a one-step quantification algorithm for amyloid PET, using images acquired from multiple institutions with different radiopharmaceuticals. The mean absolute errors of the composite SUV ratio of test sets for 18F-Florbetapir and 18F-Florbetaben PET were 0.06 and 0.05, respectively [56].

Choi et al. [57] developed a CNN-based method trained on 18F-Fluorodeoxyglucose and 18F-Florbetapir PET images to predict future cognitive decline in MCI patients. The method achieved an accuracy of 84% in predicting conversion to AD in MCI patients and of 96% in classifying AD patients versus healthy subjects. The same group developed a DL-based evaluation of cognitive dysfunction (a “cognitive signature”) in both PD and AD. The algorithm discriminated between AD and controls on 18F-FDG PET/CT with an AUC of 0.94. When this model was transferred directly to images from MCI subjects to identify those most likely to progress to AD, the AUC was 0.82; when it was applied to images from PD patients to identify those with dementia, the AUC was 0.81 [59].

Yee et al. proposed a CNN-based method to generate a probability score along the AD continuum. Based on 18F-FDG PET images, the method showed limited prognostic value in predicting future conversion to dementia of Alzheimer type [58].

Zhao et al. proposed a 3D deep CNN for automated early differential diagnosis on 18F-FDG PET/CT images, to discriminate idiopathic Parkinson's disease (IPD) from multiple system atrophy (MSA) and progressive supranuclear palsy (PSP). The CNN-based method achieved 98% sensitivity, 94% specificity, 95.5% positive predictive value (PPV) and 97% negative predictive value (NPV) for IPD classification; 97% sensitivity, 99.5% specificity, 99% PPV and 99% NPV for MSA; and 83% sensitivity, 98% specificity, 90% PPV and 98% NPV for PSP. Saliency maps were also generated. Notably, the saliency features identified by the deep learning method included the midbrain, a widely accepted pathological region in movement disorders that had not previously been considered in the analysis of 18F-FDG PET/CT images [60].

Manzanera et al. [61] investigated the potential role of a 3D-CNN model in differentiating PD patients from controls on 18F-FDG PET/CT images, achieving good performance (AUC of 0.94 on the test set).

Cardiology

Hirata et al. [62] developed a CNN-based method to differentiate, retrospectively, cardiac sarcoidosis (CS) from non-CS on 18F-FDG PET/CT images of 85 patients (CS = 33). An accurate diagnosis could help prevent potentially fatal cardiac events in these patients, such as complete heart block, ventricular or atrial arrhythmias, congestive heart failure and sudden cardiac death. The CNN-based method combined with the ReliefF algorithm achieved a sensitivity and specificity of 84% and 87%, respectively, outperforming the standardized uptake value (SUV)-based and coefficient of variation (CoV)-based classification methods.

Other applications

Cerebral blood flow (CBF) is altered in many neurological diseases. Guo et al. [63] developed a CNN-based method that integrates single- and multi-delay arterial spin labelling (ASL) and structural MR to predict gold-standard 15O-water PET CBF maps. Image quality and quantification accuracy improved significantly, with a structural similarity index of 0.732 for multi-delay and 0.854 for single-delay ASL.

Xiong et al. evaluated the performance of three 3D deep CNNs (U-Net, V-Net, and a modified U-Net) for automated measurement of 18F-FDG uptake in the cerebellum. U-Net yielded the best performance, with a Dice coefficient of 0.911, and showed no significant slope or intercept error in SUV measurement compared with an independent reference standard [64]. This study demonstrated the potential of deep CNNs for automated SUV measurement in reference regions.
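The Dice coefficient quantifies the overlap between an automated segmentation and the reference one; reference-region uptake can then be summarised as the mean SUV inside the mask. A minimal sketch of both steps on flattened voxel lists follows. The body-weight SUV formula assumes decay-corrected activity concentrations in kBq/mL, injected activity in MBq and a tissue density of 1 g/mL. The function names and numbers are hypothetical and not taken from Xiong et al.

```python
from typing import Sequence


def dice(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, r in zip(pred, ref) if p and r)
    size = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    return 1.0 if size == 0 else 2.0 * inter / size  # two empty masks agree perfectly


def suv_bw(conc_kbq_per_ml: float, injected_mbq: float, weight_kg: float) -> float:
    """Body-weight SUV, assuming decay-corrected activities and 1 g/mL tissue density."""
    return conc_kbq_per_ml * weight_kg / injected_mbq


def mean_suv_in_mask(conc_kbq_per_ml: Sequence[float], mask: Sequence[int],
                     injected_mbq: float, weight_kg: float) -> float:
    """Mean SUV over the voxels selected by a segmentation mask (e.g. cerebellum)."""
    values = [suv_bw(c, injected_mbq, weight_kg) for c, m in zip(conc_kbq_per_ml, mask) if m]
    if not values:
        raise ValueError("mask selects no voxels")
    return sum(values) / len(values)


if __name__ == "__main__":
    pred = [0, 1, 1, 1, 0, 0]   # hypothetical CNN mask
    ref = [0, 1, 1, 0, 1, 0]    # hypothetical manual mask
    conc = [1.0, 5.0, 5.5, 4.5, 6.0, 0.5]  # kBq/mL
    print(dice(pred, ref))
    print(mean_suv_in_mask(conc, pred, injected_mbq=350.0, weight_kg=70.0))
```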

To prevent patient misidentification, Kawauchi et al. [65] developed a CNN-based method to predict patients' sex from 18F-FDG PET/CT images, achieving an accuracy of 99%. The pelvic region was the most important region for correct classification. The deep learning method was also able to predict patients' age and body weight.

Discussion

The nuclear medicine field has experienced a rapid development of AI-based applications in the last 2 years.

The vast majority of the included articles (45/63, 71%) were published in 2019–2020. CNN-based algorithms have been proposed for a wide range of PET imaging purposes, encompassing both technical and clinical objectives. Indeed, machine learning algorithms have proven valuable for image quality improvement, attenuation correction (particularly for PET/MR systems), and automatic extraction of more information from raw and processed images. Clinical applications comprised oncology (detection, diagnosis, and prognostication in many cancer types), neurology and cardiology, in line with the main PET/CT indications.

One of the main challenges in developing CNN-based algorithms is the scarcity of data. Augmentation strategies are generally used to improve model performance and to counter overfitting, a common problem with machine learning algorithms. Augmenting the data adds variability to the dataset and improves the generalization of predictions [66]. The selected technical studies included up to 180 subjects, while the clinical study populations ranged from 11 to 6462 patients (median, 209). Deep learning methods require much larger populations: on the one hand, to minimize the effects of overfitting; on the other hand, to train the algorithm on a cohort representative of the “real-world” population for which the model is developed. Model performance has been shown to improve significantly with dataset enlargement: for a retinopathy classifier, the weighted error was 13% with a 1000-sample dataset vs 7% with > 100,000 samples [67]. Because of their limited dataset size, several studies can be considered proof-of-concept or preliminary investigations, which restricts their applicability in clinical practice. Large and representative study cohorts are challenging to enrol because of ethical limitations, expense, time requirements, or lack of ground truth. Indeed, most of the selected studies had a retrospective rather than prospective design.
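A minimal sketch of the kind of augmentation described above, for a single 2D slice stored as nested lists, follows. It applies a random left-right flip, a random rotation by a multiple of 90 degrees and a small global intensity scaling. The transforms, the ±10% scaling range and the function names are illustrative assumptions. The included studies used their own (often 3D) augmentation schemes, and transforms should be chosen so that the augmented images remain anatomically and quantitatively plausible.

```python
import random
from typing import List

Image = List[List[float]]


def hflip(img: Image) -> Image:
    """Mirror the image left-right."""
    return [row[::-1] for row in img]


def rot90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def scale_intensity(img: Image, factor: float) -> Image:
    """Multiply every pixel by a global factor (e.g. to mimic calibration variability)."""
    return [[v * factor for v in row] for row in img]


def augment(img: Image, rng: random.Random) -> Image:
    """Return one randomly transformed copy of a 2D slice."""
    out = hflip(img) if rng.random() < 0.5 else img
    for _ in range(rng.randrange(4)):  # rotate by 0, 90, 180 or 270 degrees
        out = rot90(out)
    return scale_intensity(out, rng.uniform(0.9, 1.1))


if __name__ == "__main__":
    rng = random.Random(42)
    slice_ = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
    for copy in (augment(slice_, rng) for _ in range(3)):
        print(copy)
```

Each call yields a new variant of the same labelled image, so a small dataset can be presented to the network in many slightly different forms during training.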

Alternative strategies have been implemented to overcome this challenge. Most of the studies focused on neurological diseases (8/10 selected studies [52,53,54,55,56,57,58,59]) used image datasets from the Alzheimer's Disease Neuroimaging Initiative (ADNI) (http://adni.loni.usc.edu). ADNI started in 2004 with the initial ADNI-1 project, followed by ADNI-2 and ADNI-3, to detect and track AD using genetic, biochemical, clinical, and imaging biomarkers. The availability of this open database allows investigators worldwide to study and develop alternative strategies for early identification of the disease and risk stratification. Similar public datasets for a wide range of conditions could accelerate research. Some initiatives already exist: The Cancer Imaging Archive (TCIA), a project funded by the Cancer Imaging Programme of the National Cancer Institute, hosts datasets of different medical imaging modalities from patients with cancer, accessible for public download [68].

Virtual clinical trials (VCTs), also called “in silico” imaging trials or virtual imaging trials, may constitute an alternative approach to evaluating medical imaging technologies and performing clinical trials. In medical imaging VCTs, investigators may create models of humans, generate synthetic datasets, simulate imaging scanners, and design and apply interpretation models; such approaches emerged from our review especially in technical investigations [9, 11, 13]. This technology is at an early stage for clinical applications. These approaches are challenged by computational complexity, simulation realism, and difficulties in validation, but may soon represent an alternative or at least a companion strategy for research in medical imaging [69].
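To make the idea of a virtual imaging trial concrete, the toy sketch below builds a digital phantom with known ground truth (a uniform disc containing one hot lesion) and "acquires" it with a crude scanner model that only adds Poisson counting noise. Everything here (phantom geometry, the 4:1 lesion-to-background ratio, the count level, the function names) is a hypothetical illustration. Realistic VCTs rely on anatomical phantoms, physics-based simulation of the scanner (attenuation, scatter, spatial resolution) and image reconstruction.

```python
import math
import random
from typing import List, Tuple

Image = List[List[float]]


def make_phantom(size: int = 32, body_radius: float = 12.0,
                 lesion_center: Tuple[float, float] = (16.0, 10.0), lesion_radius: float = 3.0,
                 background: float = 1.0, lesion: float = 4.0) -> Image:
    """Digital 2D phantom: a uniform disc with one hot circular lesion (ground truth)."""
    c = (size - 1) / 2.0
    img = []
    for r in range(size):
        row = []
        for col in range(size):
            if math.hypot(r - c, col - c) > body_radius:
                row.append(0.0)         # outside the body
            elif math.hypot(r - lesion_center[0], col - lesion_center[1]) <= lesion_radius:
                row.append(lesion)      # lesion uptake
            else:
                row.append(background)  # background uptake
        img.append(row)
    return img


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson sample via Knuth's multiplication method (adequate for small means)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_acquisition(phantom: Image, counts_per_unit: float,
                         rng: random.Random) -> List[List[int]]:
    """Toy 'scanner': expected counts proportional to uptake, plus Poisson counting noise."""
    return [[poisson(v * counts_per_unit, rng) for v in row] for row in phantom]


if __name__ == "__main__":
    rng = random.Random(1)
    truth = make_phantom()
    counts = simulate_acquisition(truth, counts_per_unit=20.0, rng=rng)
    print(counts[16][8:13])  # a few simulated pixels across the lesion
```

Because the phantom is known exactly, any reconstruction or AI method applied to the simulated data can be scored against the true uptake, which is the main attraction of VCTs.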

Distributed learning approaches, such as the NVIDIA Clara platform, have been launched to promote collaboration among institutions while complying with policy and regulatory requirements (https://developer.nvidia.com/clara).
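In distributed (federated) learning, each institution trains on its own images and only model parameters are shared and aggregated. The sketch below shows the aggregation step of federated averaging (FedAvg), in which parameter vectors are averaged with weights proportional to each site's number of training samples. It is a generic illustration under that assumption, not the Clara API, and the function name and numbers are hypothetical.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Federated averaging: mean of client parameter vectors weighted by local sample counts."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one parameter vector per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


if __name__ == "__main__":
    # Hypothetical: hospital A trained on 100 scans, hospital B on 300.
    print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [100, 300]))  # [2.5, 3.5]
```

In a full training loop the averaged parameters are sent back to every site, each site trains locally for a few more iterations, and the cycle repeats; patient images never leave the institution.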

Deep learning applied to images attempts to identify image features that could be predictive of the outcome of interest (e.g., diagnosis or patient survival) without explicit human programming. Radiomics, on the other hand, is based on calculating many parameters (e.g., histogram and texture features) defined by mathematical formulas, which are subsequently analysed using appropriate statistical methods or ML algorithms to assess their potential diagnostic or predictive value. Interestingly, image mining tools may combine these two strategies (radiomics and CNNs), as in the study by Peng et al. [51]. They applied and compared handcrafted features (with random forest for classification and prediction), PET-derived 2D and 3D CNNs, and an algorithm integrating deep features with texture features to predict the development of distant metastases in patients with soft-tissue sarcoma. The multi-modality (PET/CT) collaborative (radiomics and CNN) learning approach demonstrated the best performance, suggesting that such combined strategies may outperform single approaches.
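The "parameters defined by mathematical formulas" used in radiomics include first-order (histogram) statistics of the voxel values inside a region of interest. The minimal sketch below computes a few of them. It is a simplified illustration rather than a standardized (e.g., IBSI-compliant) implementation; the fixed-bin-number discretisation (`n_bins=16`), the function name and the example values are assumptions. Texture features and the deep features combined by Peng et al. are not covered.

```python
import math
from statistics import fmean, pstdev
from typing import Dict, Sequence


def first_order_features(values: Sequence[float], n_bins: int = 16) -> Dict[str, float]:
    """Histogram-based (first-order) features of the voxel values in an ROI."""
    n = len(values)
    mean = fmean(values)
    sd = pstdev(values, mu=mean)  # population standard deviation
    if sd > 0:
        skewness = sum((v - mean) ** 3 for v in values) / n / sd ** 3
        kurtosis = sum((v - mean) ** 4 for v in values) / n / sd ** 4  # not excess kurtosis
    else:
        skewness = kurtosis = 0.0
    # Discretise into n_bins equal-width bins spanning [min, max].
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    entropy = -sum(c / n * math.log2(c / n) for c in counts if c)  # in bits
    return {"mean": mean, "variance": sd ** 2, "skewness": skewness,
            "kurtosis": kurtosis, "entropy": entropy}


if __name__ == "__main__":
    roi_suv = [1.2, 1.5, 2.1, 3.8, 4.0, 2.2, 1.9]  # hypothetical SUVs inside a lesion ROI
    print(first_order_features(roi_suv))
```

Feature values depend on the discretisation choice (bin number or bin width), which is one reason standardisation matters when radiomic features are compared across studies.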

Knowledge of the basic principles, and awareness of the advantages and limitations of deep learning methods, should become part of the skill set of radiologists and nuclear medicine physicians. Restructured training programmes are under development [70]. Educational resources for practising professionals from national and international scientific societies and academia are increasingly available: books [1, 71], journal articles [72,73,74], meetings [75], webinars, and online resources [76]. The integration of AI-based tools into the medical workflow is a growing trend, and all professionals working in imaging departments should embrace the innovation coming from AI, attend training initiatives, and stay up to date.

This review has some limitations. First, we may have missed some papers from technical and engineering sources; however, our aim was to identify research trends towards the clinical arena. Second, we did not perform a quality assessment of the studies, since we intended to include preliminary investigations to identify early trends in CNN-based approaches in PET imaging.

In conclusion, CNN applications for PET/CT and PET/MR are growing exponentially for both technical and clinical purposes. ML algorithms have demonstrated promising results, with performance equal to or even better than that of conventional approaches. Novel research strategies have emerged to face the challenges of ML algorithm development. The introduction of AI-based methods into clinical practice requires dedicated educational initiatives for professionals in the medical imaging field, to enable a critical appraisal of the advantages and limitations of AI-based tools.