
1 Introduction

This chapter provides a state-of-the-art review of artificial intelligence in medical image analysis. We start with a brief introduction to computer vision and an overview of deep learning architectures. We then highlight relevant progress in clinical development and translation across the medical specialties of dermatology, pathology, ophthalmology, radiology, and cardiology, focusing on the domains of computer vision and machine learning. Furthermore, we introduce some of the challenges that the disciplines of computer vision and machine learning face within a traditional regulatory environment. This chapter highlights the developments of computer vision and machine learning in medicine through a breadth of powerful examples that give the reader an understanding of the potential impact of artificial intelligence in the clinical environment and of the challenges it faces.

2 Medical Imaging Modalities

The number of available imaging modalities has grown over the past few decades. From the early days of plain fluoroscopic images and simple light microscopy, imaging modalities that are now commonplace in well-resourced hospitals include computed tomography, magnetic resonance imaging, ultrasound imaging, positron emission tomography and digital microscopy, among others [1]. Beyond the various modalities of image acquisition, the digitization of images and technology of storing, transferring and manipulating image data has accelerated the pace of the medical imaging field to a point where ample opportunities exist to take advantage of this data for the benefit of patient care. In this chapter, we will provide examples of computer vision applications in medical problems.

3 General Principles of Computer Vision and Machine Learning

Computer vision is a field of computer science focused on identifying, analyzing, and decomposing images into meaningful elements in a process that emulates the inner workings of the human visual system. Essentially, it tasks a machine with something that parallels the higher cognitive processing our brains perform, from the mere visual capturing of an image to its processing, interpretation and response.

There has been exponential growth in computer vision over the past decade due to gains in computational power, data storage and sharing capabilities, and the development of innovative machine learning models that have transformed the performance of artificial intelligence. In particular, deep learning has been a transformative approach in the field of computer vision. Whilst the early development [2] and application [3] of neural networks occurred in the 1970s and 1980s, advances in the necessary processing power with the use of graphics processing units (GPUs) happened in the late 2000s, enabling deep learning models, such as convolutional neural networks (CNNs), to be trained at an acceptable speed.

A critical step in a typical computer vision task is the ability to recognize patterns automatically. Underpinning good pattern recognition is access to a large, reliable and high-quality dataset. Typically, a dataset is split into training, validation, and testing components. The training and validation datasets allow for model selection and parameter adjustment, and the test dataset enables the assessment of that model. Thus, we find that many examples of high-performing artificial intelligence (AI) algorithms are underpinned by large, high-quality datasets.
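As an illustration of this split, the minimal Python sketch below divides a hypothetical labelled image dataset into training, validation and test sets using scikit-learn; the array names and proportions are placeholders rather than a prescription.

```python
# A minimal sketch of the train/validation/test split described above,
# using scikit-learn. The arrays `images` and `labels` are hypothetical.
import numpy as np
from sklearn.model_selection import train_test_split

images = np.random.rand(1000, 224, 224, 3)   # placeholder image data
labels = np.random.randint(0, 2, size=1000)  # placeholder binary labels

# Hold out 20% of the data as the final test set.
x_trainval, x_test, y_trainval, y_test = train_test_split(
    images, labels, test_size=0.20, stratify=labels, random_state=42)

# Split the remainder into training and validation sets, used for
# parameter tuning and model selection respectively.
x_train, x_val, y_train, y_val = train_test_split(
    x_trainval, y_trainval, test_size=0.20, stratify=y_trainval, random_state=42)
```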

Similarly, humans often rely on pattern recognition in everyday life, albeit in a more implicit way. When doctors assess images, whether from X-rays, MRI, ultrasound, or histopathology slides, pattern recognition is critical. The more images of a particular disease a doctor has seen, the better equipped they are to identify it the next time.

Medical image analysis is an exciting field for the meaningful application of computer vision and machine learning techniques. Diagnoses and clinical decisions often rely heavily on the acquisition and interpretation of images by a clinician. There is a growing number of applications of AI in this field, and in this chapter, we will mention a few of them across a broad range of specialties. However, to better understand how computer vision and machine learning can be applied, we need to break down the individual tasks that AI models perform within medical image analysis, namely object classification, object detection and image segmentation.

4 Computer Vision Tasks and CNN Architectures

4.1 Object Classification, Object Detection and Image Segmentation in Medical Image Analysis

There are nuanced differences between the concepts of object classification, detection and segmentation. With classification, the goal is to identify the objects within an image, i.e. an image of a pack of wolves would be classified as an image of wolves (Fig. 1). Object detection would involve creating bounding boxes around each wolf to ‘locate’ the wolves within the image. Segmentation would involve analysis of the image at the pixel level to determine which pixels belong to the ‘wolf’ object and which do not, so that each ‘wolf’ object is outlined more closely. Whereas object detection may produce bounding boxes that overlap, segmentation is mutually exclusive: a pixel in the image can be attributed to only a single object.

Fig. 1

Examples of various computer vision tasks. (a) Classification: determining an image does indeed contain a wolf. (b) Detection: identifying wolves within an image. (c) Segmentation (semantic): outlining wolves in the image. (d) Segmentation (instance): outlining each wolf within an image.

In a conventional sense, the concept of pattern recognition is most easily compared to a classification task. In the domain of machine learning aided medical image analysis, classification tasks take the form of making global or study level diagnoses from available medical images or videos.

Extending beyond classification, object detection and segmentation techniques are often employed when clinical decisions involve localization or complete tracing of lesions down to the pixel level. The main difference lies in the output form: object detection CNNs give a bounding box indicating the location of a detected lesion, whereas segmentation CNNs provide a precise pixel-level delineation of the lesion. This also means the annotation cost differs between the two tasks. For object detection, a bounding box enclosing a lesion is comparatively easy to draw. For segmentation, the annotation must be drawn carefully along the lesion boundary, with every pixel in the image assigned to a class. Naturally, the annotation cost of segmentation is significantly higher, and segmentation is generally more time consuming than classification or detection.
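To make the difference in output forms concrete, the following sketch shows, with entirely made-up values, the kind of output each task produces: a single label for classification, a list of bounding boxes with scores for detection, and a per-pixel class map for segmentation.

```python
# Illustrative output forms only; all values below are invented.
import numpy as np

# Classification: one label (or class probabilities) for the whole image.
classification_output = {"class": "melanoma", "probability": 0.93}

# Detection: a bounding box (x_min, y_min, x_max, y_max), class and
# confidence score for each detected lesion; boxes may overlap.
detection_output = [
    {"box": (120, 80, 210, 170), "class": "lesion", "score": 0.88},
    {"box": (300, 45, 360, 110), "class": "lesion", "score": 0.72},
]

# Segmentation: a per-pixel class map with the same height and width as
# the image; each pixel belongs to exactly one class (0 = background, 1 = lesion).
segmentation_output = np.zeros((512, 512), dtype=np.uint8)
segmentation_output[80:170, 120:210] = 1  # pixels assigned to the lesion class
```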

4.2 CNN Architectures: A Deep Dive

4.2.1 Origin of CNNs

The origin of CNNs dates back to the “neocognitron” proposed by Kunihiko Fukushima in 1980 [3]. The “neocognitron” was a self-organizing neural network model inspired by the human visual pattern recognition system. Here, the basic CNN layers were introduced: convolution (generating responses to useful spatial patterns) and down-sampling (reducing the spatial size of input or convolved feature maps). LeNet-5, a seven-layer CNN introduced by LeCun et al., is a simple CNN model by today’s standards [4]. LeNet-5 was successfully applied to the MNIST database (Modified National Institute of Standards and Technology database) of handwritten digits, achieving an error rate of about 1% [5]. Its success inspired the various CNN architectures seen today. In 2012, the development of CNNs rapidly expanded when AlexNet [6], a GPU-accelerated deep CNN architecture, won the ImageNet ILSVRC 2012 image recognition challenge [7].

4.2.2 CNN Design

A CNN architecture defines a stack of functional layers, each performing some form of mathematical computation to transform its input features into output features in a specific order. Taking LeNet-5 as an example, each convolution (conv) layer consists of a convolution operation and a nonlinear activation operation. In LeNet-5 the activation function is the hyperbolic tangent (tanh); commonly used alternatives are the sigmoid and the more popular Rectified Linear Unit (ReLU) [8]. The average pooling (avg-pool) operation is used in LeNet-5 to perform the down-sampling, whereas max-pooling is favored in more recent CNN architectures. The seven layers of LeNet-5 can be described as a sequence of (1) conv; (2) avg-pool; (3) conv; (4) avg-pool; (5) conv; (6) fc (fully connected); (7) softmax. The fully connected layer applies a linear transformation to its input and is sometimes referred to as a linear or dense layer in the literature. The final softmax layer computes the probability of each class for the classification task. Architecture-wise, modern CNNs such as the GoogLeNets [9, 10] and ResNets [11] discussed below are variants of LeNet-5 with more sophisticated and well-designed functional layers, a longer stack of layers, and millions of trainable model parameters, providing much larger learning capacity.
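As an illustration, a minimal PyTorch sketch of this seven-layer sequence is given below. It follows the description above (conv, avg-pool, conv, avg-pool, conv, fc, softmax) with tanh activations; it is a simplification of the original LeNet-5, which also includes an 84-unit fully connected layer before the output.

```python
# A simplified sketch of the seven-layer LeNet-5 sequence described above.
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),     # (1) conv
            nn.AvgPool2d(kernel_size=2),                   # (2) avg-pool
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),    # (3) conv
            nn.AvgPool2d(kernel_size=2),                   # (4) avg-pool
            nn.Conv2d(16, 120, kernel_size=5), nn.Tanh(),  # (5) conv
        )
        self.classifier = nn.Linear(120, num_classes)      # (6) fc
        self.softmax = nn.Softmax(dim=1)                   # (7) softmax

    def forward(self, x):                    # x: (batch, 1, 32, 32) grayscale digits
        x = self.features(x)
        x = torch.flatten(x, 1)              # (batch, 120)
        return self.softmax(self.classifier(x))

probs = LeNet5()(torch.randn(4, 1, 32, 32))  # class probabilities, shape (4, 10)
```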

The original GoogLeNet, often referred to as Inception v1, was designed by Google researchers [10]. The v1 network is a 22-layer deep learning architecture that utilizes Inception modules for multi-scale and multi-path processing of features. GoogLeNet was trained on the ImageNet dataset containing approximately 1.28 million images and won the ILSVRC 2014 challenge [12]. Over the years, the Inception networks went through several modifications from v1 to v4 [9, 13], exploring structural variations of the Inception module for better learning capacity. The high performance of the GoogLeNets is reflected in their frequent use in medical image analysis.
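The sketch below illustrates the core idea of an Inception-style module in a simplified form, assuming parallel 1 × 1, 3 × 3 and 5 × 5 convolutional branches plus a pooled branch whose outputs are concatenated; this gives the multi-scale, multi-path processing described above, although it is not the exact GoogLeNet module.

```python
# A simplified Inception-style module: several parallel branches operate on
# the same input and their outputs are concatenated along the channel axis.
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, out_ch_per_branch=16):
        super().__init__()
        c = out_ch_per_branch
        self.branch1 = nn.Conv2d(in_ch, c, kernel_size=1)
        self.branch3 = nn.Sequential(nn.Conv2d(in_ch, c, kernel_size=1),
                                     nn.Conv2d(c, c, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(nn.Conv2d(in_ch, c, kernel_size=1),
                                     nn.Conv2d(c, c, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                                         nn.Conv2d(in_ch, c, kernel_size=1))

    def forward(self, x):
        # Each branch preserves the spatial size; channels are concatenated.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

out = InceptionModule(64)(torch.randn(1, 64, 28, 28))  # shape (1, 64, 28, 28)
```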

ResNets, designed by Microsoft researchers [11], are a family of deep learning architectures that utilize residual connections to allow information to optionally bypass layers of computation. These connections enable efficient backpropagation of the loss signal to early network layers, allowing much deeper networks to be built. Members of the ResNet family are named after their number of layers; the commonly used ones are ResNet-18, 34, 50, 101, and 152. The ResNets achieved extraordinary performance in the ILSVRC 2015 and COCO 2015 challenges [14].
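The defining component is the residual (shortcut) connection, sketched below in simplified form: the block's input is added back to the output of its convolutions, so information can bypass the layers and gradients can flow more easily to the earlier parts of a very deep network.

```python
# A minimal residual block: two convolutions plus an identity shortcut.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                              # shortcut path
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)          # residual addition

y = ResidualBlock(64)(torch.randn(1, 64, 56, 56)) # same shape as the input
```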

4.2.3 Architectures for Object Detection

Object detection networks are commonly classified into two streams: (1) one-stage networks, e.g., YOLO [15], SSD [16], and RetinaNet [17], which simultaneously localize objects (i.e., where the lesions are) and predict the object class (i.e., what type of lesion it is); and (2) two-stage networks, e.g., the R-CNN [18] detector family, from the original R-CNN to Fast R-CNN [19] and Faster R-CNN [20], which first find regions of interest and then, in a second stage, perform the object classification. Although one-stage networks compute faster, two-stage networks often perform better.
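As a hedged illustration of running a two-stage detector in practice, the sketch below applies a Faster R-CNN model from the torchvision library to a placeholder image; in a medical setting the model would first be fine-tuned on lesion bounding-box annotations rather than used with natural-image weights.

```python
# Running a pretrained two-stage detector (Faster R-CNN) from torchvision.
import torch
import torchvision

# On older torchvision versions, use pretrained=True instead of weights="DEFAULT".
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 512, 512)          # placeholder for a preprocessed image
with torch.no_grad():
    predictions = model([image])[0]      # the model accepts a list of images

boxes = predictions["boxes"]             # (N, 4) bounding boxes: x1, y1, x2, y2
labels = predictions["labels"]           # (N,) predicted class indices
scores = predictions["scores"]           # (N,) confidence scores
keep = scores > 0.5                      # keep confident detections only
```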

4.2.4 Architectures for Image Segmentation

FCN (Fully Convolutional Network) [21], U-Net [22], DeepLab [23], and Mask R-CNN [24] are commonly used architectures for image segmentation. FCN, U-Net and DeepLab are end-to-end, one-stage solutions generating pixel-level object classification directly from an input image; the raw network outputs may therefore not be accurate at the object boundary. These raw predictions can be further refined by Dense-CRF [25] to better align with the object boundary. Mask R-CNN builds on top of Faster R-CNN by adding an FCN in the second detection stage to generate a pixel-level segmentation in addition to the object bounding box. This effectively achieves the goal of instance segmentation: separating multiple objects (even from the same class, i.e., identifying each wolf in the pack) in the same image from each other.

In the domain of medical image segmentation, U-Net has been widely adopted. This can be credited to its simplicity, widely available online implementations, and high performance, as evidenced by its wins in medical image segmentation challenges such as the Dental X-Ray Image Segmentation challenge and the Cell Tracking Challenge at ISBI 2015.
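A heavily simplified U-Net-style sketch is shown below: a contracting encoder path, an expanding decoder path, and a skip connection that concatenates encoder features with up-sampled decoder features before the per-pixel output. The real U-Net uses several such levels; this toy version keeps only one.

```python
# A toy U-Net-style encoder-decoder with a single skip connection.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.enc1 = double_conv(1, 32)
        self.enc2 = double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec1 = double_conv(64, 32)               # 32 upsampled + 32 skip channels
        self.head = nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)                             # full-resolution features
        e2 = self.enc2(self.pool(e1))                 # down-sampled features
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return self.head(d1)                          # per-pixel class logits

logits = TinyUNet()(torch.randn(1, 1, 128, 128))      # shape (1, 2, 128, 128)
```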

5 Clinical Applications

5.1 Dermatology

Dermatologists are all too familiar with the diagnosis of skin lesions based on visual inspection, often aided by a dermatoscope. Clinicians consider additional patient demographic and clinical information to aid the diagnosis with the possibility of further downstream confirmatory testing and therapy. Nonetheless, diagnosis largely relies on visual inspection and, therefore, image analysis in the diagnostic process invites an opportunity for the use of AI techniques.

Researchers around the world have seized this opportunity. For example, along with academic institutions, large corporations such as Google and Microsoft have research divisions involved in developing high performing deep learning architectures and fine-tuning them for application in the field of Dermatology.

Esteva et al. [26] retrained the GoogLeNet Inception v3 CNN architecture with 127,463 images of epidermal and melanocytic lesions in the training and validation set (Fig. 2). They were able to classify benign and malignant lesions in a testing set of 1942 biopsy-labelled images with an AUC exceeding 91%. This result was better than the performance of the 25 dermatologists to whom the model was compared [26].

Fig. 2

Layout of the Google Inception v3 deep CNN architecture as applied to skin lesion analysis. (Reproduced from Ref. [26])

In a focused head-to-head task comparison of diagnosing melanoma from benign naevi, Brinker et al. compared the performance of the ResNet-50 deep learning architecture to 157 dermatologists from 12 different university hospitals in Germany [27, 28]. The CNN was trained on a labelled dermatoscopic dataset from the International Skin Imaging Collaboration (ISIC) archive, which contained 2169 melanomas and 18,566 atypical naevi. The CNN and specialists were tested on a dataset of 100 images. The CNN outperformed 136 of the 157 dermatologists in terms of sensitivity and specificity [28]. When the ResNet-50 CNN was trained on a different biopsy-proven dataset of 4204 dermatoscopic images with a 1:1 ratio of melanoma to naevi, it conclusively outperformed junior and senior dermatologists on a testing dataset of 804 biopsy-proven images [27].
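A hedged sketch of the transfer-learning recipe these studies share, retraining an ImageNet-pretrained CNN for a two-class melanoma versus naevus task, is shown below; the data, labels and hyperparameters are placeholders, not those of the cited studies.

```python
# Transfer learning: replace the final layer of a pretrained ResNet-50 with a
# two-class head and fine-tune it on labelled dermatoscopic images (placeholders).
import torch
import torch.nn as nn
import torchvision

# On older torchvision versions, use pretrained=True instead of weights=...
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 2)   # melanoma vs. benign naevus

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

images = torch.randn(8, 3, 224, 224)     # placeholder batch of dermatoscopic images
targets = torch.randint(0, 2, (8,))      # placeholder labels (0 = naevus, 1 = melanoma)

logits = model(images)                   # one fine-tuning step
loss = criterion(logits, targets)
loss.backward()
optimizer.step()
```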

Similarly, a team from Microsoft Research Asia used ResNet-152 to classify 12 different skin diseases and achieved performance comparable to that of 16 dermatologists [29].

These studies demonstrate the potential of AI to assist in diagnosing dermatological conditions from a predetermined known set of diagnoses. However, whilst seemingly obvious, one must remind themselves that AI cannot classify an image to a clinical condition that it has not been trained against. Once again, this highlights the importance of a high quality and broad training dataset relevant to the task in question.

The commercial industry has capitalized on general optimism about the capability of AI. SkinVision, based in the Netherlands, allows users to upload photos of skin moles or spots. It subsequently classifies the lesion as benign or malignant and provides a risk assessment for the patient [30]. It has reported a sensitivity of approximately 95% at a specificity of 78% in detecting pre-malignant conditions. The application makes predictions on photos uploaded by users via their personal device. To make a meaningful prediction, the software first processes the image. This involves noise removal to eliminate minor irregularities (e.g., freckles), image segmentation to separate the lesion of interest from the surrounding skin, and feature extraction to obtain geometric, texture and colour parameters. Rather than using a CNN, the hallmark of deep learning models, this application feeds the extracted features into a Support Vector Machine (SVM) classifier, a well-established machine learning method [30].
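To illustrate this classical, non-deep-learning pipeline, the sketch below feeds a handful of hypothetical hand-crafted lesion features into an SVM classifier with scikit-learn; the feature names and values are invented for illustration and are not the product's actual features.

```python
# Hand-crafted lesion features fed to a Support Vector Machine classifier.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each row: [asymmetry, border_irregularity, colour_variance, diameter_mm, texture_contrast]
features = np.array([
    [0.10, 0.20, 0.05, 3.0, 0.15],   # benign-looking lesion (toy values)
    [0.80, 0.90, 0.70, 7.5, 0.60],   # suspicious lesion (toy values)
    [0.15, 0.25, 0.10, 2.5, 0.20],
    [0.75, 0.85, 0.65, 8.0, 0.55],
])
labels = np.array([0, 1, 0, 1])      # 0 = benign, 1 = malignant (toy labels)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(features, labels)

new_lesion = np.array([[0.70, 0.80, 0.60, 6.0, 0.50]])
risk = clf.predict_proba(new_lesion)[0, 1]   # estimated probability of malignancy
```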

Whilst the clinical applications of AI in dermatology are undoubtedly exciting, there are fundamental limitations that need to be considered. For example, the conditions under which images are taken, both in training and in testing, are particularly important.

Dermatology is a highly specialized field, and it is well-known that the diagnostic accuracy of lesions assessed by family practitioners or physicians not specializing in the field is comparatively low. Dutch and British studies have estimated an underwhelming 40–60% of skin lesions are accurately diagnosed by general practitioners [31, 32]. This makes a compelling case for the use of smartphone applications to aid bedside assessment of skin lesions.

5.2 Pathology

Pathologists provide a critical perspective on the diagnosis of many medical conditions, and histological diagnoses often serve as the gold standard test in many situations. Much histopathological assessment is undertaken via light microscopy with additional tissue staining and immunohistochemistry to enhance diagnostic capability. As the diagnosis is often contingent on image analysis, the field of pathology lends itself to AI technology. Critical to image analysis by AI has been the advancement in technology that now enables digitizing histology slides with whole slide imaging scanners [33, 34].

In 2016, a competition called CAMELYON16, hosted in the Netherlands [35], tasked researchers with developing automated solutions to detect the presence of lymph node metastases in tissue biopsies of women with breast cancer. It consisted of two tasks: (a) identification of individual metastatic foci within the whole-slide image; (b) classification of whether metastasis was present within the whole-slide image. Thirty-two submitted algorithms were trained on a set of 270 images. Deep learning algorithms performed the best overall, and the best performing algorithms had an AUC comparable to that of experienced pathologists in both tasks. An example of the results of the top three performing teams is shown in Fig. 3 [35]. The top-performing model, developed by contributors from Harvard Medical School and the Massachusetts Institute of Technology, used a deep learning model with a 22-layer GoogLeNet architecture [10].

Fig. 3

Top performing models in the CAMELYON16 competition to automate detection of lymph node metastases in tissue biopsies of women with breast cancer. The left-most column (a) shows 4 annotated metastatic lesions of breast cancer. These were identified by the algorithms, with results shown in the following three columns (b-d) as probability colour maps. (Image obtained from Ref. [35])

Similarly, a group from New York University School of Medicine applied deep learning algorithms to detect the presence of lung cancer and classify its subtype from histopathology whole-slide images (WSI) [36]. Coudray et al. trained their model on WSIs from The Cancer Genome Atlas in order to classify them into normal lung, adenocarcinoma or squamous cell carcinoma. They not only achieved diagnostic performance similar to that of pathologists, with an average AUC of 0.97, but were also able to train the network to predict the mutation status of genes commonly mutated in adenocarcinoma. The prediction of mutation status had AUCs between 0.733 and 0.856. A summary of their approach can be seen in Fig. 4.

Fig. 4

Process, workflow and strategy for classification of lung tissue into normal, adenocarcinoma and squamous cell carcinoma using whole-slide images from The Cancer Genome Atlas. (a) Number of WSIs per class. (b) Training strategy: (b, i) image download; (b, ii) slides separated into training, validation and testing sets; (b, iii) slides tiled into non-overlapping 512 × 512 pixel windows; (b, iv) Inception v3 architecture trained with training and validation tiles; (b, v) classification performed on the independent test set and aggregated into heat maps. (c) Size distribution of image widths and heights. (d) Number of tiles per slide. (Reproduced from Ref. [36])
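A minimal sketch of the tiling step described in (b, iii) of the caption above is given below, cutting a placeholder slide array into non-overlapping 512 × 512 pixel tiles; real pipelines read tiles from a pyramidal WSI file rather than from a single in-memory array.

```python
# Cutting a (placeholder) whole-slide image into non-overlapping 512 x 512 tiles.
import numpy as np

slide = np.random.randint(0, 256, size=(4096, 6144, 3), dtype=np.uint8)  # placeholder WSI region
tile_size = 512

tiles = []
for y in range(0, slide.shape[0] - tile_size + 1, tile_size):
    for x in range(0, slide.shape[1] - tile_size + 1, tile_size):
        tile = slide[y:y + tile_size, x:x + tile_size]
        if tile.mean() < 240:              # crude background filter: skip mostly-white tiles
            tiles.append(((y, x), tile))   # keep each tile with its slide coordinates
```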

5.3 Ophthalmology

In the field of ophthalmology, fundus photography is a routine part of the clinical examination. Fundoscopy allows assessment of the retina, its vasculature and the optic nerve head. Fundoscopic findings can reveal a great deal about other systemic conditions such as diabetes, hypertension and raised intracranial pressure [37].

Google researchers developed and validated a deep learning algorithm to detect diabetic retinopathy in retinal fundus photographs [38]. Armed with a dataset of 128,175 fundoscopic images from patients presenting for diabetic retinopathy screening, they trained a deep learning model to detect the presence of diabetic retinopathy and diabetic macular edema. Their algorithm had an AUC of greater than 0.99 for detecting referable diabetic retinopathy or macular oedema.

A group of researchers from the University of Iowa worked on developing algorithms to automate the screening of diabetic retinopathy. They demonstrated that IDx-DR version X2.1, a system underpinned by a deep learning AI algorithm based on the AlexNet [6] and Oxford Visual Geometry Group [39] network architectures, was able to achieve 96.8% sensitivity, 87% specificity and an AUC of 0.980 in the screening of diabetic retinopathy [40]. In a prospective clinical trial with 900 enrolled patients, the AI system exceeded its pre-specified superiority endpoints when compared to the Wisconsin Fundus Photograph Reading Centre (FPRC), the typical gold standard; the IDx-DR system achieved a sensitivity of 87% and a specificity of 91% [41]. The use of deep learning enabled the high performance seen here, a marked improvement over a previous algorithm, also developed by Abramoff et al., that did not incorporate deep learning methods [42]. IDx-DR has since obtained approval from the United States Food and Drug Administration (FDA), becoming the first device authorized for marketing that provides a screening decision without the need for a clinician to also interpret the image or results [43].
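For reference, the metrics quoted throughout this section (AUC, sensitivity, specificity) can be computed from a model's predicted probabilities as in the small sketch below; the labels and scores are illustrative only.

```python
# Computing AUC, sensitivity and specificity from toy predictions.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])          # 1 = referable retinopathy (toy labels)
y_score = np.array([0.92, 0.10, 0.85, 0.40, 0.30, 0.05, 0.76, 0.65])

auc = roc_auc_score(y_true, y_score)                  # area under the ROC curve

y_pred = (y_score >= 0.5).astype(int)                 # threshold the probabilities
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                          # true positive rate
specificity = tn / (tn + fp)                          # true negative rate
```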

5.4 General Radiology

As a specialty primarily working with imaging modalities that include x-ray, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound and nuclear imaging, there is an abundance of opportunity for AI to aid diagnostics. Radiologists may specialize further and focus on limited sections of the body, defined by anatomical regions or specific modalities.

A commonly ordered diagnostic test is the chest X-ray. Even with this common examination, technical and patient-related factors may hinder standardization of the resulting image. Researchers at Stanford University developed CheXNet, a 121-layer densely connected CNN (DenseNet) trained on a database of 112,120 frontal-view chest X-ray images, each labelled from a set of 14 different diagnoses [44]. They found the model to exceed the diagnostic performance of four radiologists.
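A hedged sketch in the spirit of this approach is shown below: a DenseNet-121 backbone from torchvision with a 14-output head and a binary cross-entropy loss, reflecting the multi-label nature of chest X-ray findings (several can co-occur on one film). It is not the authors' implementation.

```python
# DenseNet-121 with a 14-output multi-label head (placeholder data).
import torch
import torch.nn as nn
import torchvision

# On older torchvision versions, use pretrained=True instead of weights=...
model = torchvision.models.densenet121(weights="IMAGENET1K_V1")
model.classifier = nn.Linear(model.classifier.in_features, 14)  # 14 possible diagnoses

criterion = nn.BCEWithLogitsLoss()            # independent sigmoid per finding

images = torch.randn(4, 3, 224, 224)          # placeholder frontal chest X-rays
targets = torch.randint(0, 2, (4, 14)).float()  # placeholder multi-label targets

logits = model(images)
loss = criterion(logits, targets)
probabilities = torch.sigmoid(logits)         # per-finding probabilities in [0, 1]
```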

The enthusiasm of AI in the field of radiology has led to a number of image challenges, one of which is the Radiological Society of North America (RSNA) Paediatric Bone Age Machine Learning Challenge. This challenge provided a dataset of 14,236 paediatric hand radiographs and received 105 submissions globally. Most of these submissions used deep learning algorithms. With a mean patient age of 127 months in the dataset, the winning submission had an average deviation in age estimation of only 4.2 months [45].

Diagnosis of fractures on skeletal X-rays can also benefit from AI assistance. Researchers from the National University of Singapore used the Faster R-CNN architecture [46], a high performing object detection network, to identify radius and ulna wrist fractures. They trained the model on 7356 AP and lateral X-ray images of the wrist, and the model correctly detected and localized the fracture in 91% and 96% of cases in the AP and lateral views, respectively [47]. This can be seen in Fig. 5.

Fig. 5

Radius and ulna wrist fracture detection method designed by researchers from the National University of Singapore. Green boxes are marks made by the Faster R-CNN model. Percentages reflect the confidence score of a fracture within the box. (Reproduced from Ref. [47])

There are a growing number of AI applications in the field of advanced imaging. Within the MRI modality, there have been deep learning models created to perform analyses across organ systems including the brain, kidneys, prostate and spine among others [48]. Some of the many applications of deep learning models include brain lesion quantification and segmentation [49], diagnostic improvement for multiple sclerosis [50] and Alzheimer’s disease [51, 52].

Artificial Intelligence has also been used to improve workflow and image interpretation, aiding the radiologist. The US FDA has approved the use of an AI-based tool that automates brain segmentation and volumetric analysis. This tool has been studied and shown to assist in the diagnosis of Alzheimer’s dementia [52].

Artificial intelligence models can also improve the efficiency of imaging, creating safer environments for patients. For instance, a deep learning model has been developed to reduce the amount of gadolinium contrast used in brain MRI ten-fold without significant image quality degradation [53]. This may create opportunities for patients with severe renal impairment, who are often denied gadolinium-based imaging due to the nephrotoxicity of gadolinium and the possibility of an adverse reaction called nephrogenic systemic fibrosis [54].

5.5 Neuro-Radiology

Artificial intelligence can uniquely contribute to intelligent medical assessment and clinical workflow optimization.

The use of AI in imaging can enhance the speed of medical image interpretation and help prioritize images. In the outpatient setting, AI has been shown to significantly reduce the time to diagnosis of intracranial haemorrhage (ICH). Arbabshirani et al. [55] showed that a deep learning model trained on over 37,000 ICH-protocol CT brain studies can lead to earlier diagnosis of ICH. They evaluated the model prospectively for three months and found that the mean time to diagnosis of an ICH on an outpatient CT brain study was significantly reduced from 512 min to 19 min. The model achieves this impact by flagging and prioritizing scans it deems to contain an ICH. This form of AI-based prioritization can have a significant impact on the clinical workflow.

Some early success has also been demonstrated by Viz.ai, a startup whose US FDA-cleared software uses deep learning methods to diagnose large vessel occlusions on CT angiogram imaging. After the diagnosis of a large vessel stroke, the software quickly notifies a stroke response team of its findings. On average, the AI software alerts the on-call physicians within 6 min of CT angiography completion via a built-in ringtone, and physicians can access the images through the mobile application. Experience with this software has been positive: in a small sample of 43 patients, it reduced the time to treatment and the overall hospital length of stay [56]. The software can lead to a 20-min reduction in door-to-puncture times and an improvement in the mean modified Rankin Score [57]. Although it is still early days for this technology, it provides a glimpse of how AI-driven alteration of clinical workflow can improve patient outcomes and be worthy of reimbursement.

5.6 Radiation Therapy

Radiation therapy is a field that lends itself to a constructive partnership with artificial intelligence techniques. Microsoft Research Cambridge have developed methods to automate the segmentation of abnormal, malignant tissues, an integral part of the planning process in radiation therapy. As part of project “InnerEye”, researchers have successfully developed an 11-layer deep CNN called “DeepMedic” [58], for the task of brain tumour segmentation [59]. Automating parts of the planning process can save a significant amount of time for radiation oncologists as it is often a repetitive and arduous task. Segmenting out abnormal tissue that needs irradiation from benign tissue where irradiation needs to be minimized is crucial but time-consuming.

Similar adaptations of AI-based tissue segmentation have been explored. Prostate cancer segmentation has been performed on MRI images [60], creating opportunities to assist radiation therapy planning [61]. This concept has also been investigated for treatments of breast, lung and abdominal cancers [62, 63].

5.7 Cardiology Application

Cardiologists also have a unique combination of imaging tools to aid decision making. These tools include the electrocardiogram (ECG), echocardiography, CT coronary angiography (CTCA) and cardiac MRI.

The standard electrocardiogram (ECG) is an incredibly informative tool and often the sole piece of information from which many important clinical decisions are made. Digitization of ECG data has allowed large scale data collection, opening up possibilities of using AI to help diagnose rhythms. Hannun et al. developed a deep neural network trained on 91,232 single-lead, 30-second ECG strips from 53,549 patients who wore a patch monitoring device [64]. They were able to classify ten different arrhythmias, in addition to sinus rhythm and noise, to a level of accuracy that exceeded that of a group of cardiologists.
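A much-simplified sketch of this kind of model is shown below: a 1D convolutional network that takes a single-lead strip (assuming a 200 Hz sampling rate, so 30 seconds gives 6000 samples) and outputs 12 classes. Hannun et al. used a far deeper residual architecture; this is only a structural illustration.

```python
# A toy 1D CNN for single-lead ECG rhythm classification.
import torch
import torch.nn as nn

class ECGNet(nn.Module):
    def __init__(self, num_classes=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=15, stride=2, padding=7), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=15, stride=2, padding=7), nn.ReLU(),
            nn.MaxPool1d(4),
        )
        self.pool = nn.AdaptiveAvgPool1d(1)           # global average over time
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                             # x: (batch, 1, 6000)
        x = self.pool(self.features(x)).squeeze(-1)   # (batch, 64)
        return self.classifier(x)                     # per-class logits

logits = ECGNet()(torch.randn(2, 1, 6000))            # shape (2, 12)
```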

Echocardiography is a popular tool to assess the function of the heart. In addition to ejection fraction, there are a number of important measurements that have therapeutic implications. The US FDA has approved several AI-powered tools to assist with the estimation of ejection fraction and other measurements (Ultromics EchoGo [65], Caption Health [66]).

CT coronary angiography [67] and CT calcium scoring [68] are rapidly gaining popularity as tests to rule out the presence of severe coronary artery disease and to predict the risk of cardiovascular events, respectively. Technology that uses AI to determine coronary calcium scores has already been approved by the US FDA [69, 70].

Overall, AI has been able to impact all steps of the cardiovascular imaging chain, namely decision support, examination, reconstruction, post-processing, diagnosis and prognostication [71]. Thus far, AI tools have been task-focused and do not span the entire imaging chain, nor has that been the aim. Limitations to the implementation of some of these tools in cardiovascular healthcare relate to regulatory approvals, uncertain added value to the clinician or patient, and a clear scarcity of traditional randomized controlled trials proving efficacy [71].

6 Challenges

Successful integration of AI into daily clinical workflow presents numerous challenges. These challenges span the entire process of dataset management, algorithm development, regulatory approval and implementation. The process from data acquisition to model development and ultimate clinical use is depicted in Fig. 6.

Fig. 6

Steps in AI model development from conception to clinical implementation. (Image reproduced from Ref. [72])

6.1 Dataset Management

Medical data are generated in an environment where information is held in confidence and data sharing is highly restricted. As described above, the performance of AI algorithms is closely tied to their training datasets. Therefore, great care must be taken to de-identify data prior to its use in model building.

Datasets have their limitations, and recognition of these limitations is necessary to minimize data bias. The population from which the dataset used to develop an AI model is drawn must be similar to the population to which the model will be applied. Dataset size also has implications for model performance: large datasets may tolerate some inaccuracies, whereas smaller datasets require data of high quality. With supervised learning, there must also be a widely accepted gold standard, as this becomes the premise for the ground truth [72]. AI models are trained to match the results of the presented ground truth. Therefore, as technology improves and gold standards for diagnosis evolve, models must also be updated and retrained.

6.2 Algorithm Development and Maintenance

In the fields of computer vision and machine learning, most successful models have utilized deep learning techniques. Whilst traditional AI techniques are more interpretable, deep learning models have limited transparency despite achieving good quantitative performance. Because the step-by-step processes that lead the model from input to output are difficult to inspect, these models are often considered to operate within a “black box” [73]. This phenomenon can impact the ability to generalize the model to situations it has not been directly developed against, that is, the adaptation of the model to a new testing environment. For example, a model developed on imaging data from one radiology centre may not achieve the same performance when applied to images acquired at another centre, despite evaluating the same region of interest [74].

In addition to the development of an algorithm, there are challenges in its maintenance. An advantage of AI-based methods is that, as the availability of data grows, the algorithm can be retrained and continually updated. However, this may lead to situations where the prediction for an individual case changes because the training dataset has been modified. This conflict would require reconciliation, most likely by the physician.

Model security against adversarial attacks is another concern that will need to be addressed [73]. Scientists and physicians need to be mindful that deliberate alteration of data inputs can bias a model resulting in suboptimal or erroneous decisions [75]. This is also a consideration for policymakers and regulators alike.

6.3 Regulatory Approval

There are many barriers to the routine uptake of AI by clinicians. Often, clinicians and healthcare providers gain greater confidence in a technology if it is approved by the US FDA. However, the regulation of AI by the US FDA poses challenges not seen with hardware-based medical devices or pharmaceuticals.

To be approved by the FDA, a technology currently needs to obtain one of three broad categories of clearance: 510(k) clearance [76], premarket approval (PMA) [77] or the de novo pathway [78]. The FDA determines which category of clearance is necessary for an AI tool based on three considerations: the risk to patient safety, the existence of a predicate algorithm, and the degree of human input [79].

Risk to patient safety is determined by the duration and size of the impact caused by false positives or false negatives from a particular technology. This can be classified as low (Class I), intermediate (Class II) or high (Class III). For high-risk and certain intermediate-risk scenarios, a PMA, the most stringent process of the three, is required.

Technology that is incremental with an existing predicate technology benefits from the notion that its safety and efficacy must be at least comparable to that of existing technology. Therefore, if the technology can be shown to be at least as safe as another FDA-cleared technology, it may be eligible for a 510(k) clearance [79, 80]. For lower risk novel technology or one with a novel application and no legally marketed counterparts, clearance via the de novo pathway may be sought [80].

The degree of clinician input also affects the regulatory process. A distinction is made between computer-aided detection (CAD) and computer-aided diagnosis (CADx). CAD technology alerts clinicians to relevant findings, whereas CADx technology provides an assessment of the disease by offering a diagnosis or differential list [79]. CAD involves greater clinician input, and therefore a clinical decision support system powered by AI may pose a lesser risk as a CAD than as a CADx.

7 Conclusion

In this chapter, we have discussed the various elements that go into a deep learning AI model of the kind typically used in the fields of computer vision and machine learning. We have shown several representative use cases from the many examples across a range of medical specialties and highlighted the importance of the training dataset in model success and application.

If AI is to be incorporated into routine clinical practice, the collaboration between computer scientists and physicians is essential. Computer scientists require physician expertise to identify problems, provide relevant datasets and determine the appropriateness of the clinical application of the model. Physicians rely on computer scientists for model development, refinement and maintenance. The arrival of AI as an entirely new category of technology in the field of medicine has necessitated special attention from regulatory bodies such as the US FDA.

There is still much to do before AI becomes commonplace in clinical practice, but the response of the scientific community and regulatory bodies has been promising.