
1.1 Introduction

The underlying molecular basis of cancer is complex, and deciphering it has been the focus of many decades of research. The revolution in available techniques that occurred in the 1970s led to the first in-depth studies of the molecular basis of cancer. Understanding the changes that give rise to cancer at the molecular level revealed not only how events in the body give rise to the disease and how it progresses, but also how these events could be targeted for the development of therapies. These initial studies enabled the development of the first drugs that could target molecules and signalling pathways driving oncogenic processes, such as uncontrolled proliferation and resistance to apoptosis. When microarrays were first developed, their ability to profile gene transcription in cancer (reviewed in Govindarajan et al., 2012) prompted the first discussions of precision oncology. Precision oncology, a type of precision medicine, involves tailoring screening or treatment to an individual or specific population group based on the molecular profiles specific to that individual or group (Batch et al., 2022). The understanding of the molecular biology underlying cancer has been advanced in recent decades by the development of high-throughput techniques such as next-generation sequencing (NGS) and advanced proteomics techniques such as SWATH. The data generated by these techniques have been used to decipher the molecular mechanisms of tumour initiation and progression, and to construct database resources that integrate and analyse the molecular mechanisms underlying cancer.

The ability of scientists to use these large datasets and databases to make useful observations and predictions concerning cancer is due to the advent and application of artificial intelligence. Artificial intelligence (AI) refers to analytical or predictive operations performed by computers to emulate the decision-making processes of human beings. It has intensive problem-solving capabilities and can be used to perform tasks such as making predictions, scaling data, integrating different datasets, and reducing the dimensionality of data. Most importantly for precision oncology, AI can associate patterns within data with real-world diagnoses, prognoses, or disease-monitoring capabilities. The ability of AI to analyse large sets of data and transform them into clinically actionable knowledge relies on its ability to learn from previous data or model training datasets. This learning ability is based on machine learning (ML) and deep learning (DL) approaches (Jiang et al., 2017; Saltz et al., 2018; Huang et al., 2020; Ibrahim et al., 2020). The growth of interest in AI, precision oncology and precision medicine can be seen in the number of entries these topics return when used as search terms in PubMed. As standalone terms, AI and precision medicine have entries going back to the 1950s, while the earliest entries for precision oncology date to the 1970s. Interest in AI has been considerable since the 1990s, while interest in the other two topics increased rapidly from 2010 onwards. The combined terms, precision medicine AND AI and precision oncology AND AI, have only been topics of interest since 2017 (Fig. 1.1).

Fig. 1.1

PubMed entries on AI and precision medicine/oncology. (a) Using the terms independently, the earliest references to AI or precision medicine come from the 1950s, while the earliest reference to precision oncology comes from 1977. All terms show an increase in the number of entries in PubMed. (b) The number of entries in PubMed for AI AND precision medicine and AI AND precision oncology since 2015. For both combined terms, there are only regular entries after 2015, and the number of papers increases dramatically over time, demonstrating that these are topics of growing interest to researchers

1.2 AI in Medicine

For AI to accurately make predictions regarding a patient's health and treatment requirements, it must be able to learn from previous data and analyses. In this way, it emulates the human clinician learning from past experience. The ability of computing algorithms to learn and adjust their performance to better recognise patterns in data is known as machine learning (ML). Initially, an AI does this using training data to create or fine-tune mathematical models (Hakenberg et al., 2012). Deep learning (DL) is a specific type of ML that uses labelled data (supervised learning) and unlabelled data (unsupervised learning) in the training process. It integrates these different types of data using multi-layer, non-linear analysis and classification. Applications of DL include natural language processing and reinforcement learning (Falk et al., 2019; Kaelbling et al., 1996).

Natural language processing (NLP) algorithms take two terms and establish whether they are linked by counting the number of times they occur together. If the terms occur together more frequently than expected by chance, they are considered associated (Cheng et al., 2008; Santus et al., 2019). This technique is used to search large bodies of literature or databases for articles or cases of interest, which is important given the vast amount of literature relating to cancer research. One of these algorithms, known as MEDscape, uses NLP to search and organise medical patient notes; the useful data retrieved from these notes is used to automatically update patient records (Morin et al., 2021). AI using NLP algorithms has been used to accurately predict patient outcomes using a variety of data, including imaging reports and oncologist notes from thousands of patients with multiple different tumour types. The predictions the AI was able to make included cancer progression, treatment response, and the likelihood and speed of metastasis (Kehl et al., 2021).
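As a minimal sketch of this co-occurrence idea, the snippet below counts how often pairs of terms appear in the same document and scores their association; the toy corpus and the Jaccard-style score are illustrative assumptions, not the method of the cited studies.

```python
from itertools import combinations
from collections import Counter

# Toy corpus of "documents" (e.g., sentences from abstracts); purely illustrative.
corpus = [
    "EGFR mutation predicts erlotinib response",
    "erlotinib response in EGFR mutant lung cancer",
    "KRAS mutation and prognosis in lung cancer",
]

term_counts = Counter()
pair_counts = Counter()
for doc in corpus:
    terms = set(doc.lower().split())
    term_counts.update(terms)
    pair_counts.update(frozenset(p) for p in combinations(sorted(terms), 2))

def association(t1, t2):
    """Co-occurrence count relative to individual term counts (Jaccard-like score)."""
    together = pair_counts[frozenset((t1, t2))]
    return together / (term_counts[t1] + term_counts[t2] - together)

print(association("egfr", "erlotinib"))  # higher score = stronger association
```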

AI makes use of neural networks to emulate the way humans think and interpret data, but without user bias and human error. These neural networks allow AI to reach logical conclusions similar to those that could be reached by humans (Joshi et al., 2021). These networks use multiple fundamental computing units (neurons) to convert raw input data into classified, annotated and analysed output data. The neurons (nodes) are connected to form a network that contains multiple layers, including an input layer, multiple functional or hidden layers, and an output layer (Kuwahara et al., 2021). There are multiple types of neural networks. Artificial neural networks (ANNs) use multiple interconnected computational neurons that distribute data analysis tasks, and are useful for analysing multidimensional, complex data. The distribution and the initial decisions the network makes regarding these data are based on what the algorithm has learned. The algorithm also evaluates its data-sorting decisions by assessing whether they worsen or improve the output (Baskin et al., 2016). Convolutional neural networks (CNNs), a type of ANN, contain neurons that are self-optimised through learning (O'Shea et al., 2021). They are classed as deep neural networks because they have multiple layers (Alquraishi & Sorger, 2021). A recurrent neural network (RNN) remembers previous analyses, both the inputs and the resulting outcomes, and then treats all future inputs and outputs as related (Dupond, 2019).
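The layered structure described above can be sketched in a few lines of NumPy. This is a minimal, untrained example, with random weights and arbitrary layer sizes chosen for illustration; a real network would learn its weights from training data.

```python
import numpy as np

rng = np.random.default_rng(0)

# A minimal feed-forward network: one input layer, one hidden layer, one output layer.
def relu(x):
    return np.maximum(0.0, x)

def forward(x, w_hidden, w_out):
    hidden = relu(x @ w_hidden)        # hidden layer: non-linear transformation
    logits = hidden @ w_out            # output layer: raw class scores
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()             # softmax: scores -> class probabilities

x = rng.normal(size=10)                # e.g., 10 input features for one sample
w_hidden = rng.normal(size=(10, 32))   # input layer -> 32 hidden neurons
w_out = rng.normal(size=(32, 2))       # hidden layer -> 2 classes (e.g., tumour/normal)
print(forward(x, w_hidden, w_out))     # two probabilities summing to 1
```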

AI must be able to make decisions in order to perform its analyses and select useful features. The decision tools used are generally decision trees, so named because the graphical representation of the decision-making process resembles a flowchart. The AI performs a test or analysis on each piece of data, which gives rise to separate results. Each decision is represented as a node and each result as a branch. Each result can then lead to a further analysis along its branch, giving rise to the branched tree structure, with some branches proving to be dead ends. The final terminal nodes are known as classifications or labels (Kamiński et al., 2018). There are different types of trees, as shown in Table 1.1 and Fig. 1.2.

Table 1.1 Decision tree techniques
Fig. 1.2

Depictions of common decision tree methods. (a) Random forest trees use multiple trees and then select the most common outcome. (b) Neighbour-joining trees group nodes by similarity and select between these similar nodes. (c) In regression trees, the nodes are the means of the previous nodes. (d) Binary trees make sequential decisions based on one of two possible outcomes

AI has used decision trees to improve diagnosis. One study used lung cancer samples from the Lung Image Database Consortium (LIDC) dataset. The data were split 90% for training and 10% for testing, and a labelled subset of the training set was used to train a CNN-based random decision tree. Once trained, the CNN random decision tree was tested on the test data and was able to accurately assign labels to the unlabelled data (Zheng et al., 2019). The tissue of origin of a cancer has also been predicted from miRNA profiling using AI based on two types of decision tree: neighbour-joining methods and binary decision tree analyses. The neighbour-joining method achieved an accuracy of 93.9%, while the prediction accuracy of the binary decision tree method was 84.8% (Park et al., 2021).
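To illustrate the general workflow (a 90/10 train/test split, a tree ensemble voting on the most common outcome, and accuracy measured on held-out data), here is a hedged sketch using scikit-learn. The features and labels are synthetic stand-ins, not the LIDC data or the CNN-based trees of the cited studies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in for labelled image-derived features.
X = rng.normal(size=(1000, 20))              # 1000 samples, 20 features each
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy benign/malignant label

# 90% training / 10% testing, as in the study described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

# A random forest trains many decision trees and takes the most common prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(accuracy_score(y_test, forest.predict(X_test)))
```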

Guidelines have been established to assist in the validation of the analyses provided by AI. These guidelines, known as the Critical Assessment of Genome Interpretation (CAGI), were formulated using variants experimentally validated to cause disease, by assessing whether predictions made according to the guidelines match these validated results (Andreoletti et al., 2019). The fifth edition of CAGI, created in 2021, consists of 14 questions or criteria known as challenges (Andreoletti et al., 2019).

1.3 Biomarker Discovery and Application

An ideal strategy to improve the screening, diagnosis, classification, staging and treatment of various cancers is the identification of molecules, or molecular patterns or profiles, that can serve as biomarkers. These biomarkers can be genomic mutations, transcripts, non-coding RNAs, proteins, metabolites, or even epigenetic markers. When patients present with symptoms indicating that they may have cancer, the standard procedure is a physical examination and radiographic imaging, which may be followed by biopsy examination; many cancer screening procedures require invasive or expensive procedures. Biomarkers are normally classified as prognostic or predictive. Prognostic biomarkers are used to categorise patients by their risk of developing disease (screening), to diagnose the disease, and to assess the risk of disease progression, the severity of disease and the risk of death from the disease (Echle et al., 2021). Predictive biomarkers can be used to select a targeted treatment, and can also be used in drug discovery or in clinical trials for new treatments (Echle et al., 2021). The discovery of these biomarkers relies on the use of large omics datasets and the identification of patterns in the presence or absence of molecules in these datasets that can be associated with disease. AI is a vital tool in this discovery process, as its machine and deep learning algorithms allow these large datasets to be rapidly and accurately analysed and associations with diseases to be identified. Indeed, DL-based image analysis has broad applications in multiple fields of modern medicine that involve image data, such as radiology (Echle et al., 2021).

Liquid biopsies involve the identification of biomarkers in body fluids such as blood, urine, saliva, or even cerebrospinal fluid. They are more suitable for diagnosis and prognosis than conventional biopsies, as they are less invasive and less traumatic for the patient. This is also an important consideration for precision medicine, as these samples can be obtained and analysed rapidly to give a current view of the patient's health and status (Kaur et al., 2017). These biomarkers can be transcripts, genomic markers in the form of DNA, proteins, or metabolites. In the case of RNA and DNA, these markers appear in biological fluids in the form of circulating cell-free nucleic acids (ccfNAs). ccfNAs have already been used as biomarkers in cancer diagnosis, prognosis, and monitoring (reviewed in Pös et al., 2018). It has also been established that ccfNAs appear in higher amounts in disorders such as cancer (Pös et al., 2018).

These nucleic acids can take the form of cell-free DNA, which is fragmented DNA usually no longer than 450 bp in size and of either genomic or mitochondrial origin (Thierry et al., 2016). Circulating cell-free RNAs include mRNA transcripts; non-coding RNAs such as microRNAs (miRNAs), long non-coding RNAs (lncRNAs) and circular RNAs; transfer RNAs; and ribosomal RNAs (reviewed in Pös et al., 2018). These nucleic acids are normally released into body fluids as a result of cell death or, in the case of many of the RNA molecules, through active secretion (Vita et al., 2022).

1.4 Multi-omics Data

High-throughput techniques like NGS allow in-depth analysis of the mutational landscapes, gene expression patterns and epigenetic modifications of large numbers of samples. Integration of "multi-omics" data (genomics, epigenomics, transcriptomics, proteomics, and metabolomics) with "non-omics" data (medical/mass-spectrometry imaging, patient clinical history, treatments, and disease endemicity) could help overcome the challenges in the accurate detection, characterisation, and monitoring of cancers. The complex analysis, annotation and combination of various omics data is sometimes only possible following data simplification, although it is important to note that simplification may lead to the loss of information. The complexity of data is normally measured by the number of dimensions (variables) it has (Pezoulas et al., 2021). Dimensionality reduction allows for increased ease and speed of analysis, as well as a reduction in the space needed to store the data (Meng et al., 2016).
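As a simple illustration of dimensionality reduction, the sketch below applies principal component analysis (PCA) to a synthetic high-dimensional matrix; the sample and variable counts are arbitrary assumptions, and the explained variance ratio quantifies the information-loss trade-off noted above.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic multi-omics-style matrix: 100 samples x 5000 variables (dimensions).
X = rng.normal(size=(100, 5000))

# Reduce 5000 dimensions to 10 principal components.
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                   # (100, 10): easier to store and analyse
# Fraction of the original variance retained by the 10 components.
print(pca.explained_variance_ratio_.sum())
```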

1.4.1 Genomics

The generation of large genomic datasets is due to advances in next-generation sequencing (NGS) (Paolillo et al., 2016) and in silico computational algorithms. Whole genome sequencing (WGS) allows for the analysis of all genomic alterations in cancer: it gives information regarding the number and identity of driver mutations and allows the mutational signature of the tumour to be identified. WGS has led to multiple sequencing projects and the establishment of databases containing the DNA sequence profiles of many cancers. These databases are listed in Table 1.2. To be truly useful, genomic data must be integrated with clinical data, patient demographics, survival data and treatment status (Robinson et al., 2017). This is needed to link genomic events to specific cancers, prognoses, and treatment responses (Robinson et al., 2017). AI has immense potential to contribute significantly at every stage of cancer management, from reliable early detection, stratification and determination of infiltrative tumour margins during surgical treatment, to predicting response to drugs/therapy, tracking tumour evolution and potential acquired resistance to treatments over time, and predicting tumour aggressiveness, metastasis pattern and recurrence (Bi et al., 2019).

Table 1.2 DNA sequence databases and their applications

1.4.2 Transcriptomics

The transcriptome includes the transcribed mRNAs and their alternatively spliced isoforms, as well as non-coding RNAs such as miRNAs. A study examining all these transcripts aims to identify the transcripts involved in metabolic processes and how they interact to produce gene expression, while studies that only examine specific sets of transcripts, i.e., mRNAs or miRNAs, answer more specific questions. The results of the epigenetic changes that occur in cancer can be, and have been, studied by examining the transcriptomes of cancers in which these changes have occurred. Such studies have been undertaken in breast cancer (Robinson et al., 2015), prostate cancer (Varambally et al., 2002; Bhasin et al., 2015) and head and neck squamous cell carcinoma (HNSCC) (Kelley et al., 2017).

1.4.3 Proteomics

Proteomic profiles reveal the actual cellular response to the conditions a cell faces. Changes in protein expression also provide information regarding processes that affect protein modification, transport, and stability. Datasets of protein expression profiles are created using mass spectrometry and have been used to profile protein expression changes in response to therapy, monitor drug toxicity, and support diagnosis using specific biomarkers. These biomarker profiles, identified through protein expression signatures, can also be used to monitor disease progression, establish metastatic risk, follow up treatment to check for recurrence, and stratify patients according to subtype (Keyl et al., 2022). Once again, these large datasets require AI to interpret them accurately, reliably, rapidly and consistently. Many AI algorithms have been used to infer protein–protein interaction networks from proteomic datasets (Keyl et al., 2022). Another significant role for AI in proteomics is predicting docking capabilities between drugs and their target compounds.

AI can also be used to combine and integrate proteomic and genomic data to identify DNA mutations related to protein signalling; such genetic changes can then be identified as genetic drivers of cancer. This has been performed in breast cancer, where signalling pathways specifically altered in different breast cancer subtypes were identified, along with SKP1 and CETN3 as two new markers for basal-like breast cancer (Mertins et al., 2016). Proteomic and transcriptomic data can be integrated to identify changes in mRNA splicing and the generation of different protein isoforms that may be characteristic of different cancers (Liu et al., 2017). Proteomic data can show a much stronger association with the clinical characteristics of a patient, and this is reflected by the close association of integrated proteomics data with clinical outcomes, for example MS analysis integrated with histopathological diagnosis (Huber et al., 2014). This can be done with very small amounts of extracted protein: in one study, very small amounts of protein were analysed using LC-MS, yielding deep coverage of the entire proteomes of specific cell types (Kulak et al., 2017). A recent development has been the use of single-cell proteomics, which has gained importance since it can give insights into cancer heterogeneity and the metastatic ability of single cells compared to colonies, and can provide information concerning rare/mutated cells (Doerr, 2019). This approach has been successfully used to grade and rank the acute myeloid leukaemia hierarchy (Schoof et al., 2021b).

1.4.4 Metabolomics

Metabolomics is the analysis of small molecules, such as amino acids, lipids, nucleotides, carbohydrates and organic acids, that are produced by primary or secondary metabolic processes. The populations of these molecules change during growth, in response to stress, and consequently during the development and progression of cancer (Bertini et al., 2009; Lin et al., 2011; Veselkov et al., 2011). Metabolomics can therefore be used as an indicator of the molecular mechanisms underlying tumorigenesis.

It can also be used to monitor disease progression and the response of the tumour to drugs and other treatments. As with proteomics, the profiling of metabolites relies on mass spectrometry, with the additional use of nuclear magnetic resonance (NMR) spectroscopy (Merz & Serkova, 2009). Traditionally, samples had to be separated or fractionated to achieve the best results, but separation-free MS techniques have been developed that reduce the volume of sample required and reduce variation in the data generated through the analysis. These include direct infusion-MS, MALDI-MS, mass spectrometry imaging (MSI), and direct analysis in real-time mass spectrometry (Dettmer et al., 2007). The Global Natural Product Social Molecular Networking (GNPS) platform is a small-molecule mass spectrometry networking hub: researchers can deposit their own MS data for small molecules, and the repository is available for other users to search and use. GNPS has been shown to be very useful for cataloguing and organising MS/MS data using AI in the form of correlation and visualisation approaches, which can be used to identify spectra from related molecules (Wang, Carver, et al., 2016). Techniques such as principal component analysis or hierarchical clustering can be used in conjunction with ML to mine these repositories and enhance the identification of spectra (Bertini et al., 2009; Duan et al., 2005). These techniques have been used to identify metabolic biomarkers for multiple cancers, including colorectal (Yamazaki, 2015), pancreatic (Zhang et al., 2012), lung (Zhuang et al., 2016), breast (Li et al., 2020), gastric (Ikeda et al., 2012), ovarian (Zhang et al., 2013) and prostate (Kelly et al., 2016).
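As an illustration of the hierarchical clustering mentioned above, the following sketch clusters synthetic spectrum-like vectors so that related "molecules" share a cluster. The fixed-length vector encoding and the cosine distance are illustrative choices, not the GNPS implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Synthetic stand-in for MS/MS spectra encoded as fixed-length intensity vectors.
spectra = np.vstack([
    rng.normal(loc=0.0, size=(10, 50)),   # one group of related molecules
    rng.normal(loc=3.0, size=(10, 50)),   # a second, distinct group
])

# Agglomerative (hierarchical) clustering on pairwise spectral distances.
distances = pdist(spectra, metric="cosine")
tree = linkage(distances, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")  # cut the tree into 2 clusters
print(labels)  # spectra from related molecules should share a cluster label
```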

1.4.5 Microbiomics

It has been estimated that the microbiota of the average human contains 40 trillion microbial cells (Sender et al., 2016). This microbiota is now known to play a role in the development and progression of cancer, especially through interactions with the nervous system and what is known as the gut–brain axis (reviewed in Hull et al., 2021). The profile of all the microbial genes, metabolites, proteins and transcripts within a single patient is known as the patient's microbiome (Sepich-Poore et al., 2021). The microbiota's role in cancer can partly be explained by its interaction with the immune system, which may favour the development of cancer (Mangani et al., 2017). Microbiomes have become so closely associated with cancer that specific populations of microorganisms and microbial metabolites are now known to be associated with specific cancers. Different microbial signatures can therefore be used as biomarkers to diagnose or monitor cancer, and can affect the safety, tolerability and efficacy of specific treatments. Microbiomics studies use the same high-throughput techniques, such as NGS and mass spectrometry, as the other omics. Once again, this gives rise to large databases that require AI and machine or deep learning to analyse and interpret, and any attempt to integrate microbiomic data with other "omics" data would require the use of AI (reviewed in Cammarota et al., 2020). AI can also be used to identify and evaluate microbiome community interactions with other microbes or the host. This is done using network analysis and is useful for identifying changes in these interactions that may be caused by microbial community structure, environmental factors, metabolites, or clinical variables. These networks can be constructed based on similarity or correlation coefficients between pairwise variables, and extended relationships can then be inferred from these pairwise interactions. This is done using algorithms such as SparCC (Sparse Correlations for Compositional data) (Friedman & Alm, 2012) and tools for compositionally robust inference of microbial ecological networks (Faust et al., 2012), as sketched below.
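The following is a much-simplified sketch of correlation-based network construction on a synthetic abundance table. It uses plain Pearson correlation with an arbitrary threshold, whereas SparCC and related tools add corrections for the compositional nature of microbiome data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic abundance table: 50 samples x 8 microbial taxa (purely illustrative).
abundances = rng.poisson(lam=20, size=(50, 8)).astype(float)
abundances[:, 1] = abundances[:, 0] * 1.5 + rng.normal(size=50)  # taxa 0 and 1 co-vary

# Pairwise correlation between taxa across samples; SparCC builds on this basic
# idea with compositional corrections that this sketch omits.
corr = np.corrcoef(abundances.T)

# Keep only strong pairwise relationships as network edges.
threshold = 0.6
edges = [(i, j, corr[i, j])
         for i in range(corr.shape[0]) for j in range(i + 1, corr.shape[1])
         if abs(corr[i, j]) > threshold]
print(edges)  # (taxon_i, taxon_j, correlation) triples define the interaction network
```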

1.5 Imaging

Medical imaging techniques such as magnetic resonance imaging (MRI), CT scans, and positron emission tomography (PET) are commonly used in the diagnosis of cancer because they provide good soft-tissue contrast, making them well suited to locating tumours and monitoring tumour progression. They are also non-invasive and have a high resolution (Magadza & Viriri, 2021; Menze et al., 2014). A key aim in imaging cancer or suspected cancer tissue is tumour segmentation: the act of distinguishing between normal and cancerous tissue. This is a vital procedure for the use of imaging techniques in diagnosis, treatment planning, and monitoring treatment response and disease progression (Bousselham et al., 2019). AI has been successfully used to automate the interpretation of medical imaging. It has been shown to be able to analyse stained sections of tumour tissue and segment these images, allowing for the identification and quantification of various parameters. These include the rate and amount of mitosis (Romo-Bucheli et al., 2017), the presence and abundance of mutations (Coudray et al., 2018), the differentiation between nuclei from benign cells and those from cancer cells (Sirinukunwattana et al., 2016; Xu et al., 2016), and the spatial localisation of proteins (Saltz et al., 2018). AI-based image analysis is more reproducible, objective and quantitative than manual assessment. Convolutional neural networks (CNNs) are most commonly used for image analysis (Muhammad et al., 2020). There are two types of automated segmentation: generative and discriminative methods (Magadza & Viriri, 2021). Both use the same stages of analysis: image acquisition; image preprocessing (denoising/enhancement/restoration); image segmentation/feature extraction; and object recognition (Pan, 2007).

Image segmentation techniques are all based on pixel-based selection to discern a region of interest (ROI), but different methods are used to achieve this. In the region-based method, a pixel in the ROI is selected as the reference or seed pixel, and neighbouring pixels are compared to it to establish whether they are similar enough to be included (Punitha et al., 2018). In the edge-based method, the image is reduced to only its important structural characteristics. This decreases the image size and allows the image's background to be separated from the object (Farag, 1992). The fuzzy theory-based method is an amalgamation of the region- and edge-based methods (Basir et al., 2003). The partial differential equation (PDE) method calculates an energy function of the image, uses a PDE to describe the parametric curve evolution based on that energy, and then uses this equation to find similar pixels (Sliž & Mikulka, 2016). In the threshold-based method, a grayscale binary image is created to reduce image complexity, which makes it easier to classify pixels (Bhargavi & Jyothi, 2014). Finally, the semantic segmentation network method classifies every individual pixel as either tumour or normal (Chen et al., 2017). When it comes to performing whole slide image (WSI) segmentation, some of these methods consume more time and computing power than others; the semantic method is the slowest and requires the most computing power (Guo et al., 2019).
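As a minimal sketch of the threshold-based method, the snippet below converts a synthetic grayscale image into a binary tumour/background mask; the image, the bright region standing in for a tumour, and the threshold value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic grayscale "scan": dim background with a brighter tumour-like region.
image = rng.normal(loc=0.2, scale=0.05, size=(128, 128))
image[40:70, 50:90] += 0.5  # bright rectangle standing in for a tumour

# Threshold-based segmentation: build a binary image, so every pixel is
# classified as region-of-interest (True) or background (False).
threshold = 0.45
mask = image > threshold

print(mask.sum(), "pixels classified as tumour")
```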

1.5.1 Radiogenomics

Histopathological images have been integrated with genomics data to enhance feature selection based on cancer tissue architecture (López de Maturana et al., 2019). In a similar way, multi-omics data have been associated with features in medical images to develop predictive models using AI algorithms. This has been successfully performed for prostate cancer (Robinson et al., 2015), renal cell carcinoma (Schoof et al., 2021a), low-grade glioma (Brat et al., 2015), non-small cell lung cancer (Yu et al., 2016) and breast cancer (Yuan et al., 2012). This technique was initially named imaging genomics, since it associated image features with genomic data; however, the terms radiomics and radiogenomics have since been used to cover all the different omics data that can be associated with image features (Bodalal et al., 2019). Image features that can be associated with omics data include structures, shapes, lines, points, colours and boundaries, and can even be extended to regions of the image associated with these features (Bi et al., 2019). To carry out a radiogenomics analysis, the AI must extract features identified on an image and link these features with phenotypes, which result from protein expression and can in turn be associated with genomic, transcriptomic, epigenomic or other omics changes (Rutman & Kuo, 2009). The appearance of these features on an image can then indicate that these omics changes are present in the patient and the tumour. In the same way that omics profiles can be used as indicators of, for instance, patient survival or disease progression, the associated image features can now be used to do the same (Berger & Mardis, 2018). AI is also necessary in radiogenomics because some of the features or changes in the cancer tissue may be so subtle that they are missed by the human eye. Computer-assisted image analysis will accurately and consistently detect these changes based on what the algorithm has learned from previous data, thanks to the application of machine and deep learning. These changes can then be associated accurately and without bias with any genomic, proteomic, transcriptomic, epigenomic, metabolomic or other feature within the patient records. This is due to the analysis the AI can conduct on these data to extract unique features and then associate them with the unique image features. As previously stated, this integration would be too complex for a human being to complete accurately and timeously (Hussein et al., 2017). This end-to-end, automated analysis pipeline can compute and discriminate a vast number of features in both the image analysis and the patient records or omics data to achieve the most accurate selection of associated features, and these models' ability to learn means that they optimise their analytical performance while integrating these datasets and looking for associations (Jansen et al., 2018).

AI and radiogenomics have been shown to be able to predict the neoadjuvant therapy response in oesophageal cancer using a convolutional neural network to analyse fluorodeoxyglucose positron emission tomography (18F-FDG PET) images. The network was able to associate features from these images with transcriptomic data and make highly specific and accurate predictions (Ypsilantis et al., 2015). In another study, AI algorithms were used to identify image features within PET images of breast cancer patients and associate these features with a genetic biomarker (Fujishima et al., 2017). Studies have also shown that image features can be associated with tumour mutational burden, the average number of genetic mutations per megabase (Angus et al., 2019), and with the metastatic ability of the tumour (Trivizakis et al., 2019).

1.6 Drugs, AI and Precision Oncology

1.6.1 Drug Discovery and Re-purposing

The design or discovery of new drugs is a time-consuming and expensive undertaking, and many potential compounds fail in the final stages after large amounts of money, $314 million to $2.8 billion, have already been spent on them, meaning that all the time and money spent was essentially wasted (Waring et al., 2015). It is estimated that 90% of drug candidates that enter clinical trials fail to reach regulatory approval (Fleming, 2018). AI can be used to remove the compounds most likely to fail from further development and prevent resources being wasted on them (Gawehn et al., 2016). This can be achieved using modelling to design better drugs by assessing a compound's binding abilities, identifying binding partners that may be biologically significant, and establishing whether the compound may have any toxic interactions. Modelling algorithms that have been developed and are already in use include quantitative structure–activity relationship (QSAR) models. These models still face problems, since they need to learn from experimental datasets: if those datasets are small, the accuracy of the model may decrease, and if the data are not validated, errors in the data will lead to errors in the final model because the algorithm learns from incorrect data (Roy & Pratim Roy, 2009; Zhao et al., 2017). AI can also be used to search chemical databases to identify compounds whose structure may indicate an ability to bind a specific target; the searching of these large libraries is known as high-throughput screening (HTS) (Inglese et al., 2006; Zhu et al., 2016).
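A hedged sketch of the QSAR idea follows: a model learns activity as a function of structure-derived descriptors. The descriptor matrix and activity values below are synthetic placeholders, not real chemical data, and the cross-validation score illustrates why small or unvalidated training sets limit such models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical descriptor matrix: each row is a compound, each column a numeric
# structure-derived descriptor (e.g., molecular weight, logP); values are synthetic.
descriptors = rng.normal(size=(200, 6))
activity = descriptors[:, 0] - 0.5 * descriptors[:, 2] + rng.normal(scale=0.1, size=200)

# QSAR idea: learn biological activity from structural descriptors.
model = RandomForestRegressor(n_estimators=200, random_state=0)

# Cross-validation estimates how well the model generalises; with small or
# noisy training sets this estimate becomes unreliable, as noted above.
print(cross_val_score(model, descriptors, activity, cv=5).mean())
```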

AI can also be used to predict how a drug will behave with respect to its physicochemical properties, bioactivity and toxicity. Physicochemical properties can be predicted using AI-based tools such as the Quantitative Structure–Property Relationship (QSPR) workflow, an algorithm originally designed to predict the physicochemical properties of environmental toxins (Zang et al., 2017). Other algorithms able to perform functions such as predicting the solubility of a drug include undirected graph recursive neural networks and graph-based convolutional neural networks (Kumar et al., 2017). The efficacy of drugs can be predicted by establishing their affinity for their target molecule, and toxicity and side effects may be predicted by identifying any unintended interactions they may have. AI is able to accomplish this by calculating the binding affinities of the drug for a large number of molecules, identifying features or structures the drug shares with similar molecules and targets similar to the intended target (Öztürk et al., 2018). Screening for the most effective treatment for a specific patient is also possible using AI; one way this can be done is through the use of a digital twin.

1.6.2 Digital Twins

An important concept in the use of AI in medicine is the creation of a digital twin: a virtual copy of the patient created from patient-specific data. An accurate digital twin requires accurate, detailed and up-to-date information about the patient (Batch et al., 2022). Deciding on the best treatment for an individual patient is one of the primary uses of the digital twin. This process is demonstrated in Fig. 1.3. As much data as possible concerning the patient is gathered, including various omics data, patient records, medical imaging and imaging reports, and any data concerning demographic or risk factors. AI then creates the digital twin. The twin is duplicated, and each copy is given a virtual treatment. Using information regarding the molecular basis of these treatments, their side effects, and case reports and studies of these treatments, an AI algorithm can then run simulations for each individual treatment on the digital twin. The results can be used to select the best treatment option (Björnsson et al., 2019).

Fig. 1.3

The use of digital twins in drug discovery. Various types of data from a patient are used to create the most accurate digital twin possible. This twin is then duplicated and treated virtually with all available drugs. Artificial intelligence then calculates treatment outcomes based on drug molecular interactions and the molecular environment of the digital twin

There are many ways drugs can be tested in these simulations. One example is the use of protein–protein interaction (PPI) networks, constructed using a patient's proteomic or transcriptomic data, as a map. Changes in protein expression caused by a treatment can then be mapped onto the patient's PPI network to identify the changes in pathways the drug could cause when used to treat the patient (Barabási et al., 2011; Zhou et al., 2014), as sketched below. Another example could involve genetic changes detected in a patient: alterations that lead to transcript and protein changes can be used to create a twin with the altered protein and protein expression patterns. A treatment targeting the affected protein can be used to treat the digital twin, and the resulting effects on PPIs and pathways can then be simulated in the twin.
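The following sketch illustrates the PPI-mapping idea on a toy network using networkx. The proteins, interactions, and fold-change values are hypothetical examples, not a curated interactome or a validated twin simulation.

```python
import networkx as nx

# Toy PPI network; node and edge choices are hypothetical, not a curated interactome.
ppi = nx.Graph()
ppi.add_edges_from([("EGFR", "GRB2"), ("GRB2", "SOS1"), ("SOS1", "KRAS"),
                    ("KRAS", "BRAF"), ("BRAF", "MAP2K1")])

# Simulated drug effect: fold changes in protein expression from the digital twin.
fold_change = {"EGFR": 0.3, "GRB2": 0.8, "SOS1": 1.0, "KRAS": 1.1,
               "BRAF": 0.9, "MAP2K1": 0.7}

# Map the changes onto the network and flag the perturbed subnetwork.
nx.set_node_attributes(ppi, fold_change, name="fold_change")
perturbed = [n for n, fc in fold_change.items() if fc < 0.75]
affected = ppi.subgraph(perturbed + [nb for p in perturbed for nb in ppi.neighbors(p)])
print(affected.nodes())  # pathway region the simulated treatment would alter
```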

1.7 Conclusion

The integration and analysis of data from various sources, such as different "omics", medical images and medical imaging reports, electronic medical records, or hand-written doctor's notes, is only practical using AI and machine learning. The need for AI has been driven by the advancements in multidimensional "omics" technologies, and the application of AI to biological data enables the understanding of complex biological systems. AI is already used in the automated extraction of information as well as the automated integration of health records. It is also currently used to organise, annotate and store data in big data storage systems such as cloud scaling. AI can outperform human clinicians and pathologists in all these tasks, and it has enabled us to develop new techniques to study cancer, detect cancer at an early stage, more accurately predict patient outcomes, decide on the correct treatment, monitor disease progression and treatment effectiveness, design new drugs and therapies, and stratify and classify tumours (Fig. 1.4).

Fig. 1.4

A summary of the applications of AI in precision oncology

This chapter has served as a brief introduction to the various topics that will be covered in the following chapters of this book. The initial chapters will examine the use of AI in the identification and application of novel biomarkers for precision oncology, covering the use of these biomarkers in diagnosis, screening, monitoring drug resistance, and choosing the most appropriate treatment regimen. They will also discuss the novel use of ccfNAs as biomarkers in precision medicine. The last of these initial chapters will discuss the use of digital pathology in accomplishing these tasks and how the new field of radiogenomics allows image features to be associated with molecular signatures. The book will then discuss some of the less commonly discussed "omics" studied to identify biomarkers for use in precision oncology: epigenomics, metabolomics and microbiomics. The final chapters of the book will discuss the practical and clinical application of AI to precision oncology in detail. The first of these applications is the use of nanotechnology in AI-based precision oncology, followed by the use of AI-based devices in cancer screening, a chapter describing the use of AI to design new drugs, and a chapter describing the application of AI to increase the efficiency of immunotherapy. Finally, the book will discuss the role AI can play in helping clinicians and oncologists choose the correct treatment for an individual patient using various AI tools and techniques. The concluding chapter will summarise the topics covered and offer insights into the future of AI in precision oncology.