Introduction

Nuclear medicine and radiology have evolved greatly over the last 20 years, thanks mainly to hardware developments such as the deployment of multimodality imaging devices in the 2000s [1] and the development of fast detector technologies [1, 2]. Software innovations have also led to substantial improvements in the spatial resolution and signal-to-noise ratio of reconstructed images, for example through the incorporation of point-spread function and time-of-flight (ToF) information into PET image reconstruction [3]. Despite being quantitative by nature, nuclear medicine images are exploited in a very restrictive manner (i.e. analysed mostly visually or semiquantitatively) in most clinical publications and clinical trials, and obviously in routine clinical practice. There is growing interest in more automatic analysis of medical images coupled with the extraction of multiple features, including some that may not be accessible to the naked eye, even the trained eye of an expert [4, 5]. The main objective of this change in paradigm is to better harness the information provided by imaging studies so that it can influence patient management within the context of precision medicine. Within this new framework, medical imaging should play an enhanced role beyond diagnosis, covering therapy planning, therapy monitoring and assessment, and predictive modelling and stratification, and should ultimately become an integral part of future clinical decision-making systems.

The aim of the present paper is to provide definitions of artificial intelligence (AI, machine/deep learning) and radio(geno)mics, as well as some insights into their potential applications in nuclear medicine imaging.

Definitions of artificial intelligence, machine (deep) learning and radio(geno)mics

Artificial intelligence

The term artificial intelligence is a ‘fuzzy concept’ with a number of possible definitions depending on context, time and application. As an academic discipline, it is considered to have been founded in 1956 at the Dartmouth conference [6]. A rather general definition is “intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals”. However, in the present context of medical imaging, a more specific definition may be more appropriate: “a system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation” [7]. As algorithms tackle increasingly complex tasks, those previously considered to require ‘intelligence’ are sometimes removed from the field of AI, leading to the assertion that “AI is whatever has not been done yet” [8]. Character recognition is an example: it may no longer be considered ‘artificial intelligence’ because it is now a standard technology routinely used, for example, by postal services. Well-known capabilities of today’s algorithms usually considered as AI include speech recognition and, more importantly, speech understanding, language translation, mastering complex games such as Go [9] and, more recently, even more complex strategy video games, as well as autonomously driving cars.

AI systems can be classified as analytical, human-inspired or humanized AI [7]. Analytical systems possess only characteristics related to cognitive intelligence, using learning based on past experience to make predictions. Human-inspired AI systems possess emotional intelligence and understanding in addition to cognitive elements. Humanized AI systems are able to demonstrate cognitive, emotional and social intelligence, with self-consciousness and self-awareness in their interactions with others. In the twenty-first century, AI techniques have benefited from improved theoretical understanding (e.g. in neural network mathematics), advances in computing power (e.g. graphical processing units, GPUs), the wider availability of ever larger quantities of data for learning (e.g. through the development of social networks and other platforms, cloud storage/computing, etc.), and the availability of the algorithms and libraries themselves. As a result, older concepts and theories can now actually be applied to real-life problems and tasks, even by nonexperts on commercially available systems.

Regarding medical imaging, a number of the image-based tasks that clinicians perform could therefore theoretically be carried out by AI, including, but not limited to, lesion detection, disease classification, diagnosis and staging, quantification, treatment planning (delineation of targets and organs at risk, and dosimetry optimization), response assessment and prognosis [10]. Automation is expected to allow these tasks to be achieved with much higher robustness and reproducibility, potentially with fewer errors, and in a much shorter time. Obviously, there are other aspects of medical imaging in which AI could provide solutions to improve practice, such as operational workflow, finance management and quality improvement, amongst others [11]. Most, if not all, AI systems developed for medical image analysis tasks belong to the class of analytical systems, and can therefore be classified as machine/deep-learning techniques.

Machine (deep) learning

Machine learning is the study of algorithms that learn and improve through experience, and as such is a fundamental concept of AI. Learning is usually considered to be unsupervised or (semi)supervised. Unsupervised learning consists of finding patterns in unlabelled data [12], whereas supervised learning uses labels to infer classification or regression, and semisupervised learning is typically carried out with a small amount of labelled data and a large amount of unlabelled data [13]. In the context of medical imaging, the standard workflow or machine learning pipeline is usually directly applied to address most tasks (Fig. 1).

Fig. 1 The radiomics pipeline, in comparison with the usual machine learning workflow and the deep learning workflow
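To make these learning paradigms concrete, the following minimal sketch (in Python, using scikit-learn; the data and variable names are synthetic placeholders rather than real image-derived features) contrasts unsupervised clustering, which searches for structure without labels, with supervised classification, which learns from labelled examples:

```python
# A minimal sketch contrasting unsupervised and supervised learning with
# scikit-learn, using synthetic data as a stand-in for image-derived features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))           # 200 samples, 16 features (placeholder)
labels = features[:, 0] + features[:, 1] > 0    # hypothetical binary endpoint

# Unsupervised learning: look for structure without using the labels.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# Supervised learning: use the labels to learn a classification rule.
clf = LogisticRegression().fit(features, labels)
predictions = clf.predict(features)
```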

Data (e.g. images or parts of images) are fed into a feature extractor, whose goal is to compute from the input data a number of ‘handcrafted’ (or ‘engineered’) features. These can be based partly on expert knowledge, can be more or less relevant to the task at hand, more or less complex, and small or large in number. They are then fed into a classification (or regression) algorithm whose goal is to map the features to the task at hand, for example differentiating between two types of tumour or stratifying patients based on clinical outcome. This part is often divided into two steps: feature selection and actual modelling. Feature selection consists of identifying a smaller, relevant subset amongst the features calculated by the extractor. This is usually done to facilitate the subsequent modelling step by reducing training times, avoiding dimensionality issues and limiting overfitting, as well as to simplify the resulting models and improve their interpretability. There are several techniques for performing feature selection, which can be either linked to or independent of the chosen classifier that will subsequently combine the selected features into some sort of multiparametric model. These techniques fall into three main categories: filter, wrapper and embedded methods [14]. Filter methods identify variables independently of the model that will combine them, based only on metrics such as the correlation between each variable and the endpoint, suppressing the least valuable ones. Although not prone to overfitting, filter methods tend to retain redundant features, as they do not take into consideration the correlations between them [15]. Wrapper approaches evaluate subsets of features, which, in contrast to filter methods, allows relationships between features to be taken into account. Their main drawbacks are higher computation time and a risk of overfitting with small samples. Embedded methods combine the advantages of wrapper and filter approaches by ‘embedding’ the feature selection process within the learning algorithm, performing feature selection and classification simultaneously. Actual modelling (i.e. mapping a combination of features to the endpoint) is performed with a classification or regression algorithm, and a large number of such techniques have been developed in the field of machine learning; popular examples include random forests, support vector machines and artificial neural networks [16].
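As an illustration of the feature selection and modelling steps described above, the following hedged sketch (Python/scikit-learn) combines a filter-type selection method with a random forest classifier inside a cross-validated pipeline; the feature matrix and clinical endpoint are random placeholders standing in for real radiomic data:

```python
# A sketch of filter-based feature selection followed by classification,
# using scikit-learn; all data are random placeholders.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 50))      # 100 patients, 50 handcrafted features (placeholder)
y = rng.integers(0, 2, size=100)    # binary clinical endpoint (placeholder)

model = Pipeline([
    # Filter-type selection: rank each feature individually against the endpoint.
    ("select", SelectKBest(score_func=f_classif, k=10)),
    # Classifier combining the selected features into a multiparametric model.
    ("classify", RandomForestClassifier(n_estimators=200, random_state=0)),
])

# Keeping selection and training inside each cross-validation fold limits
# the optimistic bias (overfitting) discussed above.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(scores.mean())
```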

Deep learning (as opposed to the ‘shallow’ learning methods described above) is a category of methods within the machine learning field that are mostly based on specific types of artificial neural networks, sometimes with a very large number of layers and nodes. Thus, deep learning is a specific type of machine learning, which is itself part of AI (see Fig. 2) [17].

Fig. 2 Deep learning is a specific type of machine learning, and both are AI concepts

These techniques rely on a cascade of multiple layers of nonlinear processing units for feature extraction and transformation, where the input to each successive layer is the output of the previous layer, so that multiple levels of representation, corresponding to different levels of abstraction, are learned [18]. Even though neural networks were designed much earlier, ‘deep’ networks, with their ability to learn efficiently through a general purpose procedure, are more recent. The impact of convolutional neural networks (CNNs) on computer vision and imaging applications became a real breakthrough in 2011 and 2012. CNNs trained using the backpropagation algorithm had existed for decades, and GPU implementations for years. However, in 2012, Cireşan et al. showed how a max-pooling CNN implemented on a GPU could provide improved results on a number of vision benchmarks. The same year, Krizhevsky et al. won the ImageNet competition with a similar CNN design, achieving much better performance than shallow machine learning methods [19], and Cireşan et al. won both the ICPR and MICCAI Grand Challenges on mitosis detection [20]. Over the following years, the performance obtained by challengers in the ImageNet competition steadily improved thanks to new CNN designs and techniques [21]. Deep learning methods, especially CNNs but also other types of network (e.g. recurrent neural networks and generative adversarial networks), have since been used to address existing challenges in a number of medical imaging tasks, including, but not limited to, image registration, reconstruction, classification, pattern recognition, segmentation, denoising and super-resolution. In most of these tasks they have achieved unprecedented performance (in terms of computational efficiency and/or performance in the specific task), becoming de facto the new standard and benchmark to beat [22].
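The following minimal sketch (Python/PyTorch) illustrates the cascade of layers described above, stacking convolution, nonlinearity and max-pooling stages followed by a classification head; the input size, channel counts and number of classes are arbitrary choices for illustration, not a recommended architecture:

```python
# A minimal sketch of a layered CNN: each stage feeds the next, producing
# increasingly abstract representations before a final classification layer.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # first level of representation
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # second, more abstract level
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)

    def forward(self, x):
        x = self.features(x)                  # output of each layer is input to the next
        return self.classifier(x.flatten(1))

logits = SmallCNN()(torch.randn(4, 1, 64, 64))  # 4 single-channel 64x64 patches (placeholder)
```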

One of the major differences between these techniques and the ‘older’ machine learning approaches described above is that the aim of these networks is to learn specific patterns relevant to a given task (e.g. segmentation or endpoint prediction) from the data (i.e. images) themselves, instead of relying on ‘engineered’ or ‘handcrafted’ features (including expert knowledge) [22, 23]. In that regard, these methods can be considered a paradigm shift, as they may provide an ‘end-to-end’ workflow, relying on a general purpose learning procedure (Fig. 1).

User intervention, for example the detection and selection of objects of interest for further characterization, could therefore be significantly simplified or even become unnecessary. On the other hand, a number of challenges need to be well understood when considering the use of these techniques. Deep neural networks have a large number of hyperparameters, and exploring the parameter space to identify optimal values is usually not feasible given limited computational resources and time. Tricks such as computing the gradient on several examples at once (batch processing) can help speed up computation, and the large processing capabilities of GPUs have enabled significant improvements in training speed. Deep neural networks are also prone to overfitting, due in part to their ability, through their numerous layers, to model rare dependencies observed in the training data. Various approaches such as regularization and dropout are usually implemented to limit overfitting [24]. Training data can also be augmented, for example by zooming and rotation, to increase the effective size of the required training sets [25]. Finally, transfer learning is an important component, in which networks pretrained on different, although much larger, datasets are fine-tuned using smaller datasets more specific to the task at hand [26, 27].
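The sketch below (Python, with PyTorch and torchvision) illustrates, under illustrative settings, three of the measures mentioned above: data augmentation by random geometric transforms, dropout as regularization, and transfer learning by freezing a backbone pretrained on a much larger nonmedical dataset (here ImageNet) and fine-tuning only a small task-specific head; all parameter values are placeholders:

```python
# A hedged sketch of data augmentation, dropout and transfer learning.
import torch.nn as nn
from torchvision import models, transforms

# Data augmentation: random geometric transforms enlarge the effective training set.
augment = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

# Transfer learning: reuse a backbone pretrained on ImageNet
# (use pretrained=True on older torchvision versions).
backbone = models.resnet18(weights="IMAGENET1K_V1")
for p in backbone.parameters():
    p.requires_grad = False          # freeze the generic, pretrained filters

# Fine-tune only a small task-specific head, with dropout as regularization.
backbone.fc = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(backbone.fc.in_features, 2),   # e.g. a binary clinical endpoint
)
```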

Radio(geno)mics

In parallel with the improvements in PET/CT hardware and reconstruction software over the last two decades, several developments have been made in the field of PET/CT image processing and analysis: noise filtering [28, 29] and partial volume effect correction [30] methods can further improve both the visual quality and the quantitative accuracy of PET images. In addition, (semi)automated image analysis algorithms can detect lesions of interest [31] and delineate them with accuracy similar to, and reproducibility and robustness higher than, those of human experts [32,33,34]. This has opened the way to a more comprehensive characterization of organs and tumours, by extracting quantitative metrics (‘handcrafted’ or ‘engineered’ image features) from preprocessed and segmented PET/CT images. In this context, most of the current work on PET/CT imaging has concentrated on the radiotracer most commonly used clinically, namely 18F-fluorodeoxyglucose (18F-FDG), with very few studies considering other tracers [35]. The four methodological components shown in Fig. 1 (preprocessing, segmentation, feature extraction, modelling) are the key building blocks of the scientific field known today as radiomics. The term ‘radiomics’ first appeared in 2010 [35], and the fully formalized radiomics framework was described in 2012 [36]. As can be understood from the previous sections, radiomics is simply a translation of the standard machine learning pipeline (Fig. 1) to medical images. The rationale behind the development of this field of research is that medical images contain features of the tumour phenotype that can reflect at least part of the underlying pathophysiological processes at smaller scales, down to the genetic level. This is why radiomics is often associated with genomics, as reflected in the term ‘radiogenomics’. Radiogenomics actually has two different meanings. The first, older one is related to radiobiology and is not relevant in the present context. The second concerns the association/combination of radiomics and genomics, which can be divided into two different methodological approaches. The first investigates the links between the two, i.e. what part of the genomic information can be explained or ‘decoded’ by radiomics; this has been described as ‘imaging genomics’ [37, 38] and investigated in a number of studies [39, 40]. The alternative approach concerns the development of methodology that combines the two sources of information, making use of their complementary value to build more efficient predictive models.
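As a simple illustration of ‘handcrafted’ feature extraction from a segmented PET volume, the following sketch (Python/NumPy) computes a few first-order features; in practice, dedicated libraries implementing standardized feature definitions would be used, and the image, mask and voxel size here are placeholders:

```python
# A minimal sketch of handcrafted feature extraction from a segmented PET volume.
import numpy as np

rng = np.random.default_rng(1)
pet = rng.gamma(2.0, 1.5, size=(64, 64, 32))     # SUV values (placeholder volume)
mask = np.zeros_like(pet, dtype=bool)            # tumour segmentation mask (placeholder)
mask[20:40, 20:40, 10:20] = True

voxels = pet[mask]
voxel_volume_ml = 4.0 * 4.0 * 4.0 / 1000.0       # assumed 4 mm isotropic voxels

features = {
    "SUVmax": voxels.max(),
    "SUVmean": voxels.mean(),
    "MTV_ml": voxels.size * voxel_volume_ml,     # metabolic tumour volume
    "TLG": voxels.mean() * voxels.size * voxel_volume_ml,
    "skewness": ((voxels - voxels.mean()) ** 3).mean() / voxels.std() ** 3,
}
```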

Nuclear medicine imaging applications of artificial intelligence, deep learning and radio(geno)mics

The applications of AI in nuclear medicine are extremely wide and promising, and may impact several different aspects of the field [41]. The first concerns the use of AI for data processing at the detector level for image reconstruction, including corrections for the different physical processes associated with detection (e.g. attenuation and scatter). Beyond the image reconstruction step, AI may be useful for different image processing tasks, including denoising, segmentation and fusion. Finally, AI can be used to construct models, based on information extracted from images, that would help achieve predictive, personalized medicine.

At the detector level, recent efforts include the use of CNNs to enhance PET image resolution and improve the noise properties of PET scanners with large pixelated crystals [42], and to estimate ToF directly from pairs of coincident digitized detector waveforms [43]. Integrating a deep neural network into the iterative image reconstruction process may also improve final image quality [44, 45]. Deep learning methods have already been proposed for attenuation correction and registration in PET/CT and PET/MR, and have been shown to generate attenuation maps with high accuracy [46,47,48,49,50]. In the same context, deep learning has been used to improve maximum-likelihood reconstruction of activity and attenuation (MLAA) with ToF PET data [51]. Denoising is one of the most popular image processing applications for which deep learning techniques have been successfully used, for example to generate full-dose PET images from low-dose acquisitions [52] or to directly filter reconstructed PET images [29].
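In the spirit of the denoising applications cited above, the following hedged sketch (Python/PyTorch) trains a small image-to-image CNN to map low-dose to full-dose PET images; the paired image tensors are random placeholders, and the network is deliberately simplistic compared with published architectures:

```python
# A schematic training loop for a low-dose to full-dose PET mapping.
import torch
import torch.nn as nn

denoiser = nn.Sequential(                 # small image-to-image CNN (illustrative)
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

low_dose = torch.randn(8, 1, 128, 128)    # placeholder paired training batch
full_dose = torch.randn(8, 1, 128, 128)

for step in range(10):                    # training loop (schematic)
    optimizer.zero_grad()
    loss = loss_fn(denoiser(low_dose), full_dose)
    loss.backward()
    optimizer.step()
```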

Automated detection, counting and segmentation/characterization of lesions in images may have wide applications in diagnosis, as well as in treatment planning and response monitoring, and more broadly in all radio(geno)mics applications. For a long time, methods relying on older, shallow machine learning frameworks were not able to reach the levels of automation and accuracy required for full transfer to clinical practice or for the fast processing of hundreds of patients in radiomics analyses. Some recent developments still involve the use of ‘older’ machine learning techniques [53], but a growing number rely on deep learning methods with the hope of greatly improving both automation and performance. Indeed, CNNs have been especially successful for medical image segmentation tasks [22]. This is explained by the fact that, contrary to classification tasks (one label per image), segmentation learning occurs at the voxel level (one label per voxel). The resulting amount of learning data thus allows efficient training of the network parameters. For example, despite the few training examples available in the recent MICCAI challenge on PET functional volume segmentation, the method based on a pretrained CNN achieved the highest score (although not significantly higher than the scores of some of the more conventional techniques) [32]. CNNs have also been applied to the problem of multimodal PET/CT cosegmentation [34, 54, 55]. Pipelines for tumour detection and segmentation based on a deep learning framework [31, 55, 56] are likely to provide fully automated solutions for this step of the radiomics pipeline, thereby eliminating this important bottleneck.
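To illustrate the voxel-wise nature of segmentation learning, the sketch below (Python/PyTorch) implements a soft Dice loss, a criterion commonly used when training segmentation CNNs (not necessarily the one used in the cited studies); the predicted probability map and ground-truth mask are placeholders:

```python
# A minimal soft Dice loss: every voxel contributes to the training signal.
import torch

def soft_dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """pred: probabilities in [0, 1]; target: binary mask; both (batch, 1, D, H, W)."""
    intersection = (pred * target).sum(dim=(1, 2, 3, 4))
    union = pred.sum(dim=(1, 2, 3, 4)) + target.sum(dim=(1, 2, 3, 4))
    dice = (2.0 * intersection + eps) / (union + eps)
    return 1.0 - dice.mean()

pred = torch.rand(2, 1, 32, 64, 64)                   # predicted tumour probability map (placeholder)
target = (torch.rand(2, 1, 32, 64, 64) > 0.5).float()  # ground-truth mask (placeholder)
loss = soft_dice_loss(pred, target)
```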

Predictive modelling and radio(geno)mics studies already rely heavily on machine learning methods [16, 57,58,59], although most of them are in the field of radiology rather than nuclear medicine. Evaluations of machine learning and deep learning methods have shown improved feature selection, more robust model building and better harmonization of radiomic PET features [59,60,61,62,63]. However, only a limited number of studies have explored the potential of deep CNNs to reach higher levels of automation by using them as an end-to-end methodology, and most of these have been in the field of CT or MR imaging [64,65,66,67,68,69,70], with only a few examples of their use in nuclear medicine imaging, such as FDG PET [71,72,73] and SPECT [74].

Discussion

Although most currently available studies on the use of deep features, and their combination with the usual radiomic features, have been carried out in the fields of CT and MRI, the same concepts can be applied to nuclear medicine imaging. Replacing the usual machine learning/radiomics pipeline with one based on end-to-end deep learning (Fig. 1) may be an attractive solution to some of the issues and limitations of radiomics. In this approach, all the steps previously performed separately and sequentially (segmentation, feature extraction, modelling) are performed by one (or several) neural network(s). However, this approach actually replaces the previous challenges with others more specific to the use of deep learning. First, these methods are data-hungry, and datasets much larger than those usually available in radiomics studies are therefore needed for efficient training. Techniques and tricks such as transfer learning and data augmentation, or reliance on segmentation networks to build classifiers [75], thus become crucially important. Second, providing interpretable models is also important in clinical applications. There is therefore a need to provide feedback and explanation to end-users regarding a network’s decision, using, for example, network visualization techniques [76] to generate heat maps highlighting the areas of the input image, or even of the tumour, that were the most relevant in reaching the decision. This is also important for understanding and correcting the remaining errors that the algorithms make, and for trying to address other issues, including regulatory, legal and accountability issues [77].
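As an example of such visualization, the following hedged sketch (Python/PyTorch) computes a simple gradient-based heat map (‘saliency map’), one possible technique among those referred to above; the network is an untrained placeholder standing in for a trained classifier:

```python
# A gradient-based saliency map: the gradient of the predicted score with
# respect to the input highlights the most influential pixels/voxels.
import torch
import torch.nn as nn

net = nn.Sequential(                       # stand-in for a trained classifier
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
)

image = torch.randn(1, 1, 64, 64, requires_grad=True)  # input image (placeholder)
score = net(image)[0, 1]                   # score of the class of interest
score.backward()                           # backpropagate to the input

heat_map = image.grad.abs().squeeze()      # high values = most relevant pixels
```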

A major paradigm shift is thus occurring in the design of most computational approaches intended for use in the clinic. It is unclear how much time will be needed for deep learning methods to be integrated into clinical nuclear medicine practice and to achieve full automation of most clinical tasks. So far, these developments have focused on tackling the most common clinical problems for which sufficient data are available.

The aim of most of the methods developed is to solve one problem within one specific task. While they may excel at interpreting image and contextual information, they are usually not able to make associations the way a human brain does, and cannot replace clinicians for all the tasks they perform. In addition, they may not yet have reached the same level of performance as an expert in all situations, and a fully artificial nuclear medicine physician therefore still belongs to the domain of science fiction. On the other hand, the role of nuclear medicine physicians is likely to evolve as these new techniques are integrated into their practice, and it is therefore important that acquiring a basic understanding of these methods and concepts becomes part of their training. They will themselves probably also contribute to the training of AI systems, bringing additional expert knowledge and experience to the tools they will then use.

The availability of data remains a crucial bottleneck for the training of AI systems, because curated data (i.e. training data that conform to a number of quality criteria, the curation of which usually involves experts and is time-consuming) are simply not yet available for all tasks and in sufficient amounts. On the other hand, deep learning software platforms are open-source and, because of this, experimentation and the sharing of innovations have been fast and massive in scale, which may eventually also help in terms of data processing and availability. An additional concern for the proper training of machine and deep learning models is the lack of standardization both in image acquisition and reconstruction (despite long-term efforts by societies such as the EANM, SNMMI and RSNA, amongst others) and in machine (deep) learning techniques themselves (including, but not limited to, radiomics definitions, nomenclature, implementation and software, as well as machine learning methodology, implementation and optimization). The strong heterogeneity and variability in scanner models, vendors, acquisition protocols and reconstruction settings pose a considerable challenge for training generalizable models, one that remains to be fully addressed. Ongoing efforts, for example the Image Biomarker Standardization Initiative (IBSI) for radiomics standardization [78,79,80] and harmonization/normalization techniques [63, 81], should clearly be emphasized and supported to further improve these aspects in the future.