
1 Introduction

Artificial Intelligence (AI) approaches in medical imaging have evolved considerably over the past years. The reasons for this are manifold: the field of computer vision has arguably seen the most drastic advance in its state of the art, driven by the increasingly widespread application of deep learning [1]; the introduction of large, curated data sets has enabled transfer learning approaches [2]; the domain attracts substantial research and industry interest; and both hardware accelerators (mainly graphics processing units) and software frameworks offering pretrained algorithms and approachable application programming interfaces have lowered the barrier to entry to the field. Furthermore, medical imaging represents an excellent target for machine learning applications, as it is widely available in standardized data exchange formats and stored electronically [3]. In addition, the availability of images alongside medical/radiological reports provides built-in human ground-truth assessments of relevant findings.

The trend of large dataset accrual has increasingly manifested in the medical field as well, with large databases of medical imaging data being assembled as national efforts attempting to provide a cross-sectional assessment of large populations of both healthy volunteers and patients. The German National Cohort Health Study (NAKO Gesundheitsstudie, www.nako.de) and the United Kingdom Biobank [4] are examples of this development, providing researchers and practitioners in the field with access to thousands of imaging data sets, which can be used for the development of machine learning algorithms. These efforts supplement initiatives such as the one described in [5], curated collections of oncology-specific material including not only medical imaging but also digital histopathology and genomic sequence data. The increasing roll-out of partially or fully electronic patient records is a further important step towards the collection of relevant metadata, which can be included in predictive models alongside image-based information. However, such data repositories are not without specific challenges: large-scale data collection heightens the importance of privacy protection, for which next-generation methods have only recently been introduced [6]. Moreover, data quality is paramount for the development of predictive algorithms, so care must be taken that images and clinical metadata are generated and expertly curated with high standards of quality assurance. Algorithms need to be trained and validated on diverse and representative patient collectives, not only to ascertain their validity when applied to unseen data from new sources, but also to ensure their fairness, control their bias, and render them reproducible and interpretable. The deployment of machine learning algorithms into clinical routine poses great challenges of its own, necessitating interdisciplinary cooperation as well as continuous monitoring and improvement. Finally, the reimbursement of algorithm-based diagnostic services remains largely unresolved. Issues such as these represent but a limited subset of the parameters which need to be taken into account in the design of artificial intelligence algorithms for medical use; they are discussed in other parts of this book and touched upon later in this chapter.

As might be expected for a novel field, most of the literature published on artificial intelligence applications in medical imaging has focused on diagnostic tasks in oncology, such as the prediction of tumor subtypes, genetic features, metastatic behavior, or patient survival. Algorithms targeted at diagnosis often provide objectively verifiable outputs (e.g. by comparison of the algorithm's prediction to a histopathologic result) and can be compared to the performance of human experts (e.g. true/false positive/negative rates), facilitating their validation. The field of therapy monitoring and theranostics, that is, the image-based quantification of the expression of relevant therapeutic targets, has however not yet witnessed the same level of research activity. Several reasons for this emerge, including the following:

  1. Treatment represents a heterogeneous clinical process characterized by the application of several therapeutic approaches, often simultaneously. For instance, oncologic therapy consists of surgical, pharmacologic, radiotherapeutic, and other supplemental interventions. Establishing causal relationships between a certain treatment and its effect is therefore often a difficult undertaking.

  2. The interplay between treatment and disease is hard to accurately quantify. For example, tumors demonstrate therapy escape phenomena leading to treatment resistance, which can be hard to distinguish from inefficacy or primary failure of the treatment.

  3. Cancer imaging is influenced by systemic effects such as individual toxicity or comorbidities that can have a modulating effect on local findings (e.g. perfusion effects of anti-vascular agents versus decrease in cardiovascular output causing tissue mal-perfusion) and which are hard to deconvolve from specific treatment outcomes.

  4. Effects mediating treatment response are also functions of the complex genetic, transcriptomic, epigenetic, and environmental tumor landscape in which causes and effects can be impossible to distinguish.

  5. Novel treatments are continuously introduced; thus, retrospectively collected data, often the bedrock of oncological machine learning applications, might not be applicable as algorithm training material.

  6. Finally, cancer is insufficiently understood and represents a disease as individual as the patients themselves. Intra- and inter-tumoral heterogeneity thus pose hindrances to the applicability of algorithmic tools aimed foremost at generalization, drastically increasing the difficulty of training such algorithms.

In attempting to taxonomically classify the current literature about machine learning and artificial intelligence approaches for treatment response prediction and assessment as well as theranostics, two patterns emerge:

  • The majority of studies focus on the prediction of therapy response from a single timepoint and a single surrogate. Such studies attempt to capture information from a single imaging study, often the baseline examination, to predict differences in treatment outcome by characterizing a specific tumor phenotype.

  • A smaller number of studies focus on longitudinal/integrative monitoring of findings, for example, integrating tumor features alongside relevant metadata and/or their evolution over the treatment period to predict the course of therapy.

With respect to the defining tumor features, research can be stratified into studies aiming at the quantification of tumor volume, either purely morphological or both morphological and metabolic, for example, through the definition and automated tracking of metabolic tumor volume, and studies concerned with higher-order descriptors of disease features or treatment targets. Such features can be derived from the tumor itself, for example, histogram metrics or texture features, and/or incorporate other data, such as clinical record information.

Finally, from a methodological point of view, research can be divided into studies applying traditional computer vision techniques, utilizing predefined mathematical descriptors of the image (features) together with machine learning methods typically used for tabular data analysis such as regression models or tree-based algorithms, and studies applying deep neural networks directly to the imaging data. For the former, the term radiomics is often used. We would like to point out that this distinction is not formal, and the term radiomics is used for deep-neural-network-based algorithms as well. Because the term is thus ill-defined, we eschew its usage altogether and instead refer to the techniques and algorithms in question by their technical description, which we believe to be both clearer and more informative.
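To illustrate the former approach, the following sketch fits a tree-based classifier to a table of precomputed, handcrafted image features using cross-validation; the file name, feature columns, and outcome label are hypothetical placeholders rather than a recommended configuration.

```python
# Sketch of the "traditional" pipeline: handcrafted image features modeled with
# tabular machine learning. File name and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("tumor_features.csv")      # one row per lesion/patient (assumed layout)
X = df.drop(columns=["response"])           # e.g. histogram and texture features
y = df["response"]                          # e.g. responder vs. non-responder

clf = GradientBoostingClassifier(random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {auc.mean():.2f} +/- {auc.std():.2f}")
```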

The methodological considerations applicable to a study are also a function of the data used for algorithm development. Unlike purely anatomic imaging, which typically takes the form of a three-dimensional stack of grayscale images, hybrid and functional imaging usually provides at least two spatially congruent images for the same anatomical location. In the case of dynamic acquisitions, such as multiple contrast media phases, the dimensionality of the data increases further. These data are often heterogeneous with respect to spatial resolution, owing, for example, to the technical resolution of the scanner or to interactions of radionuclides with the tissue, which cause the effective resolution of PET to differ from the nominal resolution of the detector elements. These factors need to be taken into account and potentially corrected for in quantitative imaging studies.
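As a minimal illustration of handling such resolution heterogeneity, the following sketch resamples a PET volume onto the voxel grid of the corresponding CT using SimpleITK; the file names are hypothetical, and the assumptions of already co-registered scans and linear interpolation are illustrative choices.

```python
# Resample a PET volume onto the CT voxel grid so that both modalities share one
# spatial grid before quantitative analysis. File names are placeholders.
import SimpleITK as sitk

ct = sitk.ReadImage("patient01_ct.nii.gz")
pet = sitk.ReadImage("patient01_pet.nii.gz")

pet_on_ct_grid = sitk.Resample(
    pet,                 # image to resample
    ct,                  # reference image defining spacing, size, origin, direction
    sitk.Transform(),    # identity transform (assumes the scans are already co-registered)
    sitk.sitkLinear,     # interpolation; nearest-neighbor would be used for label masks
    0.0,                 # default value outside the PET field of view
    pet.GetPixelID(),
)
sitk.WriteImage(pet_on_ct_grid, "patient01_pet_resampled.nii.gz")
```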

In the following sections we will highlight and contrast relevant literature findings regarding the application of machine learning to therapy response evaluation with a focus on hybrid oncological imaging and provide recommendations and future directions for practitioners and researchers in the field.

2 Literature Review

2.1 Morphological and Metabolic Tumor Volume Tracking

2.1.1 Volumetry-Based Oncological Response Assessment Frameworks

The conceptually simplest automated therapy surveillance approaches rely on quantifying the change in tumor volume using automated methods, thus mirroring human evaluation, for example, by application of the Response Evaluation Criteria in Solid Tumors (RECIST). RECIST was among the first attempts to quantify tumor response to treatment in imaging. However, it relies on diameter measurements performed on two-dimensional images and on the definition of so-called target lesions, which necessarily limits its scope and potential representativeness, since individual tumor manifestations are employed as surrogates of disease burden. RECIST evaluation suffers from further notable limitations, mainly in tumor entities with ill-defined margins (e.g. pancreatic cancer), and can be a poor correlate of therapy response due to phenomena such as pseudo-progression, whereby tumor volume initially increases in response to therapy due to inflammatory changes. The 2009 position paper by Wahl et al. introduced PERCIST (PET Response Criteria in Solid Tumors), a systematic framework combining previous guidelines for incorporating metabolic and functional imaging-derived information into tumor response assessment. The PERCIST framework defines the categories of complete and partial metabolic response, stable metabolic disease, and progressive metabolic disease on the basis of the lean body mass-adjusted standardized uptake value (SUL). Similar frameworks have been proposed by other working groups, such as the EORTC, as well as combined functional/morphologic criteria such as the Lugano criteria proposed in 2014, which incorporate elements of both RECIST and radionuclide uptake information.
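As a simplified illustration, the following sketch maps the change in a lesion's SULpeak to the commonly cited PERCIST thresholds (a change of at least 30% and at least 0.8 SUL units); the full criteria additionally consider complete resolution of uptake relative to liver background, new lesions, and measurability rules, which are omitted here.

```python
def percist_category(sul_baseline: float, sul_followup: float) -> str:
    """Simplified PERCIST-style classification from the SULpeak of the hottest lesion.

    Thresholds follow the commonly cited PERCIST 1.0 rules (>=30% and >=0.8-unit
    change); complete metabolic response, new lesions, and liver background are
    not modeled in this sketch.
    """
    delta = sul_followup - sul_baseline
    pct = 100.0 * delta / sul_baseline
    if pct <= -30.0 and delta <= -0.8:
        return "partial metabolic response"
    if pct >= 30.0 and delta >= 0.8:
        return "progressive metabolic disease"
    return "stable metabolic disease"
```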

The quantitative nature of PET allows the calculation of absolute radionuclide activity per tissue volume, offering benefits over the standardized uptake value, which has been shown to depend on several extraneous parameters. Thus, more recently, parameters such as total lesion glycolysis (TLG) and metabolic tumor volume (MTV) have been proposed as more precise biomarkers of disease activity. These, however, require a delineation of the tumor volume itself, also termed segmentation.
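For concreteness, the following sketch computes MTV and TLG from a SUV volume and a binary tumor mask, with TLG defined as the product of MTV and the mean SUV within the mask; the array names and the single voxel-volume argument are illustrative assumptions.

```python
import numpy as np

def mtv_and_tlg(suv: np.ndarray, mask: np.ndarray, voxel_volume_ml: float):
    """Metabolic tumor volume (ml) and total lesion glycolysis from a SUV volume
    and a binary tumor mask; TLG = MTV * mean SUV within the mask."""
    voxels = mask.astype(bool)
    mtv = voxels.sum() * voxel_volume_ml       # number of tumor voxels times voxel volume
    tlg = mtv * float(suv[voxels].mean())      # mean SUV inside the segmented volume
    return mtv, tlg
```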

2.1.2 Automated Segmentation-Based Volumetry Techniques

The evolution of automated volumetry methods thus closely follows the evolution of automated tumor segmentation methods. Earlier studies [7,8,9] rely on legacy segmentation techniques such as region-growing, nearest-neighbor, or probabilistic graphical methods [10]. Hybrid imaging is beneficial in this regard, as the high-SUV tumor region provides a form of pre-segmentation mask that helps guide algorithm behavior. Such iso-contour-based segmentation methods [11] have been demonstrated, for example, in sarcoma. Similar approaches can also be applied directly to metabolic tumor volume (MTV) tracking without the associated morphological imaging. This approach has shown promise in several tumor entities such as rectal cancer [12], lymphoma [13], gynecological tumors [14], and esophageal cancer [15]. However, it has been noted that MTV lacks standardization and large-scale external validation and thus cannot be assumed to be a universal gold standard for therapy surveillance in comparison to, for example, the standardized uptake value (SUV) [16].
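A minimal sketch of such an iso-contour approach is given below, thresholding at a fixed fraction of SUVmax within a sub-volume cropped around a single lesion; the 41% value is one commonly used choice rather than a universal standard, and the resulting mask could, for example, be passed to the MTV/TLG computation sketched above.

```python
import numpy as np

def isocontour_mask(suv_voi: np.ndarray, fraction: float = 0.41) -> np.ndarray:
    """Binary lesion mask from a fixed-percentage-of-SUVmax iso-contour.

    suv_voi is assumed to be a SUV sub-volume cropped around a single lesion, so
    that physiological uptake elsewhere does not dominate SUVmax."""
    return suv_voi >= fraction * suv_voi.max()
```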

2.1.3 Evolution of Automated Segmentation Using Neural Networks

Automated segmentation has witnessed a substantial evolution with the introduction of neural network-based segmentation methods. Earlier methods based on fully convolutional neural networks [17] have more recently been superseded by encoder–decoder architectures with transverse skip connections, such as the U-Net architecture proposed by Ronneberger et al. in 2015 [18], and their conceptual evolutions such as Feature Pyramid Networks (FPNs) [19]. A common trait of these architectures is the utilization of image information captured at multiple scales and the transmission of high spatial frequency (i.e. high detail) image information from early to late parts of the network with corresponding feature map sizes. Encoder–decoder architectures have dominated the segmentation literature since ca. 2015 and can be applied in both two and three dimensions. Fully automatic segmentation has been proposed as a solution to the aforementioned standardization problem [20] and has been successfully applied both to treatment response assessment, for example, in breast cancer [21], where it has been shown to outperform dynamic contrast-enhanced MRI, and to treatment planning, for example, for brain tumor radiotherapy [22].
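To make the encoder–decoder principle tangible, the following is a deliberately minimal two-level PyTorch sketch, not the original U-Net configuration, showing how full-resolution encoder features are concatenated back into the decoder via a skip connection; the channel counts and the two-channel input (e.g. PET and CT) are illustrative assumptions.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Two-level encoder-decoder with one skip connection, illustrating how
    high-resolution encoder features are reinjected into the decoder."""
    def __init__(self, in_ch=2, n_classes=2):   # in_ch=2: e.g. co-registered PET and CT channels
        super().__init__()
        self.enc1 = conv_block(in_ch, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)           # 64 = 32 upsampled + 32 skip channels
        self.head = nn.Conv2d(32, n_classes, 1)

    def forward(self, x):
        s1 = self.enc1(x)                        # full-resolution features (skip path)
        s2 = self.enc2(self.pool(s1))            # downsampled, higher-level features
        d1 = self.up(s2)                         # upsample back to full resolution
        d1 = self.dec1(torch.cat([d1, s1], dim=1))   # concatenate the skip connection
        return self.head(d1)                     # per-pixel class logits
```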

3 Quantitative Image and Texture Analysis in Oncological Therapy Response Monitoring

The advent of quantitative image analysis workflows within the past 5 years has generated significant interest in the utilization of image-derived data for tumor characterization. Such approaches rely either on the bulk extraction of tumor-related image features followed by their preprocessing and modeling using machine learning (also termed radiomics), or on the end-to-end analysis of image data using neural networks. As discussed above, we will not terminologically differentiate between these approaches, believing them not to be mutually exclusive. However, it is expected that the numerous shortcomings of the so-called radiomics workflow will eventually lead to its replacement by algorithms based on more robust techniques and models that are not susceptible to the technical limitations we describe below. The typical workflow of quantitative image analysis studies is common to both approaches, consisting of a volume of interest definition step and a modeling step. For volume of interest definition, i.e. segmentation, both manual approaches and all of the above-mentioned automatic methods are applicable and commonly used. For details on the various techniques, we refer to the chapters in Part I of this book.
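As an illustration of this two-step workflow, the sketch below extracts features from an image and a volume-of-interest mask using the pyradiomics package and assembles them into a feature table suitable for the tabular modeling sketched earlier in this chapter; the cohort list, file paths, and extractor settings are assumptions rather than a prescribed configuration.

```python
# Step 1 of the workflow: VOI-based feature extraction with pyradiomics.
# Paths, outcome labels, and the bin-width setting are placeholders.
import pandas as pd
from radiomics import featureextractor

cohort = [
    ("patient01_pet.nii.gz", "patient01_voi.nii.gz", 1),   # 1 = responder (hypothetical label)
    ("patient02_pet.nii.gz", "patient02_voi.nii.gz", 0),   # 0 = non-responder
    # ...
]

extractor = featureextractor.RadiomicsFeatureExtractor(binWidth=0.25)  # bin width for SUV images (assumption)

rows, labels = [], []
for image_path, mask_path, outcome in cohort:
    features = extractor.execute(image_path, mask_path)
    # keep only the numeric image features, dropping the diagnostic metadata entries
    rows.append({k: float(v) for k, v in features.items() if k.startswith("original_")})
    labels.append(outcome)

feature_table = pd.DataFrame(rows)
# Step 2, the modeling step, then proceeds on feature_table and labels,
# e.g. with a cross-validated tree-based classifier as sketched earlier.
```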

The research developments in the field of treatment supervision in hybrid imaging have closely followed the main oncologic application areas of PET.

3.1 Neuro-Oncology

In neuro-oncologic applications, for example, studies have focused on the identification of molecular phenotypes with relevance for therapy and prognosis, such as isocitrate dehydrogenase status [23] from amino acid (fluoroethyl tyrosine, FET) PET scans in gliomas. The authors found that the inclusion of radiomic parameters improved diagnostic accuracy compared to PET-derived metrics alone. Similarly, a recent study by Hotta et al. found image texture parameters derived from 11C-methionine PET to yield excellent discriminative performance between recurrence of malignant brain tumors and radiation necrosis [24], a topic of critical relevance for steering treatment decisions. A multitude of works (see e.g. overview in [25]) have focused on brain metastases, amongst others for differentiation of primary brain tumors from metastases, pinpointing the origin of metastatic lesions to the brain and for differentiating treatment-related changes from recurrence. Recent studies have also focused specifically on treatment, with studies by Cha et al. demonstrating strong performance of convolutional neural network ensembles in the prediction of metastatic lesion response to radiotherapy [26] from baseline imaging examinations.

3.2 Head and Neck Cancers

In head and neck cancers, several studies have demonstrated the benefits of integrating quantitative imaging features with morphological tumor descriptors in predictive modeling workflows. For instance, Fujima et al. showed that in patients who underwent chemoradiation treatment for pharyngeal cancer, tumor shape and texture features were highly predictive of progression-free and overall patient survival [27]. They note that clinical parameters alone were not sufficient for discriminating survival subgroups in their study. Feliciani et al. employed texture metrics derived from pretherapeutic FDG-PET and found these imaging biomarkers highly predictive of local chemoradiation therapy failure [28]. Crispin-Ortuzar and colleagues aimed to predict head and neck tumor hypoxia, usually assessed with dedicated hypoxia radiotracers such as 18F-FMISO, from FDG-PET-derived texture parameters. They report substantial improvements over the performance of baseline FDG-PET alone and note that quantitative imaging biomarkers can provide an alternative to hypoxia-specific radiotracers where these are unavailable [29].

3.3 Lung Cancer

In lung cancer, the relevance of including FDG-PET in the patient workup was shown in the 2002 PLUS trial [30], which demonstrated a 20% reduction in unnecessary surgical interventions. Consequently, several studies have investigated quantitative imaging features, for example, in the prediction of histological subtypes [31] or posttreatment survival [32]. Oikonomou et al. studied the association of quantitative image features with several outcomes, including local and distant disease control, recurrence-free probability, and survival metrics, and found image-derived features to represent the only predictors of overall survival, disease-specific survival, and regional disease control [33]. A recent multicenter trial by Dissaux et al. demonstrated that FDG-PET-derived texture features predict local disease control in patients undergoing stereotactic radiotherapy for early-stage non-small-cell lung cancer and highlighted the potential value of such algorithms for therapeutic decision-making. The large body of research into machine learning and quantitative imaging biomarker applications in lung cancer has also provided insight into key challenges associated with such applications. Yang et al. note that the widespread application of texture-derived image features as prognostic predictors is impeded by a lack of quality control and robustness, and proceed to demonstrate high inter-rater variability impacting the reproducibility of texture parameters [34]. Such challenges are of course not unique to thoracic imaging workflows and have been repeatedly noted in previous studies irrespective of the imaging modality applied [35, 36], with PET-specific solutions recently proposed [37].
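One common way to quantify the inter-rater reproducibility issues noted by Yang et al. is to compute, per feature, an intraclass correlation coefficient across repeated delineations and discard features with low values; the sketch below implements the two-way random-effects, absolute-agreement, single-measurement form (ICC(2,1)) under the assumption of a complete targets-by-raters matrix.

```python
import numpy as np

def icc_2_1(x: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    x has shape (n_targets, k_raters), e.g. one texture feature value per lesion
    and per delineating reader; features with low ICC are typically discarded."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    # mean squares from the two-way ANOVA decomposition
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # between targets
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # between raters
    sse = ((x - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```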

3.4 Prostate Cancer

The role of hybrid imaging in prostate cancer is continuously evolving and expanding with the application of gallium- or fluorine-labeled PSMA ligands, which is supported by recent meta-analyses [38, 39] and has been demonstrated to impact patient management in a majority of cases [40]. The first randomized prospective trial testing the influence of PSMA PET/CT on prostate cancer patient outcomes was announced in early 2019 [41]. Quantitative imaging feature studies applied to PSMA PET have recently provided promising results. For example, Zamboglou et al. demonstrated PSMA-PET-derived quantitative features to discriminate between cancer- and non-cancer-affected prostate tissue, as well as to differentiate between Gleason scores of 7 and ≥8 and between patients with and without nodal involvement [42]. PSMA expression is an excellent example of a theranostic application, i.e. the specific expression monitoring of a therapy-relevant target: since Lutetium-PSMA can be used for radioligand treatment in advanced prostate cancer [43], machine learning applications predicting response to such therapy directly from the images could represent a promising next step.

3.5 Breast Cancer

The field of breast cancer research has witnessed among the strongest advances in the utilization of quantitative imaging workflows and the application of machine intelligence, likely owing to the high quality of image acquisition enabled by the lack of motion artifacts, the universal implementation of standardized reporting in the form of BI-RADS, and the high incidence of the disease. Hence, several studies have proposed image-derived features for the noninvasive characterization of breast cancer. For example, Antunovic et al. utilized pretreatment FDG-PET/CT of breast cancer and found histogram features to be associated with histopathological, molecular, and receptor expression subtypes [44]. Similarly, Huang et al. found image features derived from PET/MRI data to be associated with tumor grading, stage, subtype, recurrence, and survival [45]. Ou et al. utilized machine learning to differentiate between breast carcinoma and breast lymphoma based on texture features derived from FDG-PET/CT [46]. Focusing on therapy response prediction, Antunovic and colleagues noted the association of molecular breast cancer subtypes with distinct responses to neoadjuvant chemotherapy and developed machine learning algorithms on FDG-PET/CT to predict pathological complete response in locally advanced breast cancer [47]. Ha et al. also utilized FDG-PET/CT to develop machine learning-derived metabolic signatures of breast cancer associated with Ki67 gene expression, pathological complete response to neoadjuvant chemotherapy, and recurrence risk [48]. As noted above, however, such workflows are not without challenges, and it was recently noted in the work by Sollini et al. that most evidence on the utility […] is at the feasibility level. The authors recommend harmonization, validation on representative datasets, and the establishment of guidelines for the application of quantitative imaging parameters in breast imaging [49].

3.6 Gastrointestinal Oncology

The largest body of work regarding therapy prediction using quantitative image-derived parameters in hybrid imaging has arguably been produced in the area of gastrointestinal oncology. In esophageal cancer, for instance, several studies on radiomics workflows have highlighted the significance of heterogeneity-related image features and have derived models predictive of prognosis and therapy response [50,51,52]. Yip et al. included longitudinally acquired datasets in their model and found a decrease in tumor heterogeneity-related texture and histogram features to be associated with tumor response and patient survival [53]. Ypsilantis et al. employed convolutional neural networks on PET scans and found them to outperform radiomics models in the prediction of therapy response in esophageal cancer [54]. Furthermore, sub-regional analyses taking into account intra-tumoral heterogeneity are being assessed for their value in predicting the survival of esophageal cancer patients treated with chemoradiation, as shown, for example, in the study by Xie et al. [55].

In pancreatic cancer, multiparametric imaging and machine learning have been investigated for the differentiation of inflammatory and neoplastic processes [56]. The added utility of hybrid fusion imaging for the delineation of tumors has been noted by Belli et al. in a recent study [57], with applications in quantitative imaging workflows. In our own work, we note the importance and potential benefits of multiparametric data integration for accurate prognostic prediction in pancreatic cancer [58]. Cui et al. identified quantitative parameters from FDG-PET/CT imaging prognostic of outcome after stereotactic radiation therapy in pancreatic cancer [59]. With the evolving role of hybrid imaging for therapy planning in pancreatic cancer [60, 61], especially with respect to neoadjuvant treatment regimens, as well as advances in molecular subtyping, including the distinction of differentially activated metabolic pathways [62,63,64], it must be assumed that the scope of quantitative imaging workflows in this entity will soon expand further into hybrid imaging.

In rectal cancer, several studies have investigated the utility of pretreatment quantitative imaging biomarkers in the prediction of therapy response. The study by Lovinfosse and colleagues found texture parameters derived from pretreatment FDG-PET/CT to be predictive of survival in a cohort of patients with locally advanced rectal cancer treated with neoadjuvant chemoradiation, noting that these features outperformed volume-based parameters in predictive performance [65]. Amorim et al. compared FDG-PET- and diffusion-weighted MRI-derived parameters and observed the information gained from these modalities to be independent and complementary, underscoring the relevance of multiparametric hybrid imaging workflows in oncology [66]. The importance of tumor heterogeneity was highlighted by Bundschuh et al., who note that heterogeneity-related image features are relevant both early in the course of therapy and after its completion [67]. A similar dual-timepoint study was conducted by Jeon et al., who performed multiparametric modeling including clinical parameters and MRI-derived texture features and observed changes in these features to be associated with distinct risk phenotypes. The authors note that their results would be applicable to and would benefit from the inclusion of functional imaging [68].

4 Discussion and Outlook

In this chapter, we review the applications of machine learning and artificial intelligence to therapy monitoring in the domain of molecular and hybrid imaging, as well as theranostics. Despite its somewhat earlier stage of evolution compared to applications purely focused on diagnosis, such as tumor detection or subtype classification, the multitude of studies presented showcases the intense research interest in the field and provides an outlook on the main objectives of techniques, algorithms, and applications aimed at therapy monitoring and response prediction. Evidently, diagnostic and theranostic applications are closely related. For example, specific tumor subtypes are associated with distinct therapy responses, providing space for the exploration of novel therapy targets and specific therapeutic agents. The clinical utilization of theranostic radiotracers is also expected to expand beyond the current main routine application of prostate imaging with PSMA: initial studies report successes, for example, in the application of texture analysis in neuroendocrine tumors [69]. The combined application of diagnostic and theranostic radiotracers has also been reported, with very recent results showcasing their complementary value in the outcome prediction of pancreatic neuroendocrine neoplasms [70], expanding on previous studies reporting on combined radiotracer application [71]. We believe machine learning techniques to herald a transition towards integrated theranostic applications, which will likely blur the current borders between diagnosis- and therapy-response-focused studies. This evolution will, of course, not be without challenges. Foremost, it will be predicated on the development and availability of emerging and novel theranostic radiotracers beyond the above-mentioned fields of prostate cancer and neuroendocrine tumors, as well as on the understanding of their interaction with biological targets and of their unique challenges and pitfalls [72], to enable their utilization in AI-guided and precision medicine applications [73].

A review of the current literature reveals a clear trend from tumor tissue and metabolic volume tracking applications towards image texture analysis, which can be ascribed to the above-mentioned rise of quantitative imaging workflows [74] within the past few years. We nevertheless still observe specific challenges, several of which remain unaddressed in the current literature:

Nearly all of the studies outlined above utilize hybrid imaging-based texture analysis workflows. However, a more thorough investigation of the differential contribution of each modality to the predictive model, or an analysis of the added benefit of hybrid imaging over a single modality, was not routinely performed. Anatomic and functional imaging have been shown to present specific and individual challenges with respect to texture analysis, rendering such a differentiated assessment necessary [75].

Furthermore, the difficulties of harmonizing quantitative imaging workflows and rendering them robust towards variances between diagnostic equipment vendors, differences in human performance, and unstandardized texture feature specifications have been noted extensively in the literature [36], mostly with respect to anatomical imaging modalities. However, recent works have focused on harmonizing texture features specifically in functional imaging [37], alongside efforts towards protocol standardization and guidelines aimed at hybrid imaging studies [76].

Ultimately, we believe handcrafted quantitative imaging features and the field of radiomics to represent an intermediate step in the evolution of machine learning applications in medical imaging towards deep learning-based workflows. The latter offer greater representational flexibility and robustness, obviating post-processing and harmonization requirements in favor of data diversity and larger patient cohorts, and rendering them inherently more suitable for multicentric studies [77,78,79,80]. The advent of deep learning and the associated advances in image registration [81] will also make it easier to integrate additional information from studies acquired at multiple timepoints. Longitudinal imaging has been shown to offer deeper insight into therapy-related changes in tumor biology [82]; however, it was performed in only a small fraction of the studies presented above, owing to the difficulties of acquiring multi-timepoint imaging and the more stringent requirements for the selection of time-stable and reproducible image features [83].

Lastly, many of the studies presented base their assessment of therapy response on surrogate measurements, for example, on tumor volume decrease or on associations between therapy response and a decrease in image heterogeneity believed to mirror biological phenomena, which cannot always be objectively validated. Furthermore, therapy response is a multifactorial process greatly dependent on clinical parameters, which should be included in the modeling process [58]. The introduction of algorithms enabling the direct prediction of patient survival from images and the associated clinical data [84] will thus improve the capabilities for pre-therapeutic risk stratification and provide higher confidence for guiding therapy decisions.
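As a sketch of the modality-level ablation analysis called for above, the following compares cross-validated models trained on PET-derived features only, CT-derived features only, and both combined; the feature table, column prefixes, and outcome label are hypothetical.

```python
# Hedged sketch: quantify the differential contribution of each modality by
# comparing cross-validated models on PET-only, CT-only, and combined features.
# File name, column prefixes ("pet_", "ct_"), and the "response" label are assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("hybrid_features.csv")
y = df["response"]
feature_sets = {
    "PET only": [c for c in df.columns if c.startswith("pet_")],
    "CT only": [c for c in df.columns if c.startswith("ct_")],
    "PET + CT": [c for c in df.columns if c.startswith(("pet_", "ct_"))],
}

for name, cols in feature_sets.items():
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    auc = cross_val_score(model, df[cols], y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC {auc.mean():.2f} +/- {auc.std():.2f}")
```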

In conclusion, this chapter discusses machine learning-based medical image analysis workflows, their applications to therapy response monitoring and theranostics in a hybrid imaging setting, and current and future research directions. We believe that the concurrent evolution and innovations in the fields of oncologic hybrid imaging, theranostics, and computer vision will fuel scientific discovery in the field and provide the opportunity for clinical translation and improvements to patient care.