1 Introduction

Machine learning (ML), a subset of “artificial intelligence” (AI), can be described as a class of algorithms that automatically identify patterns in data and use them for prediction or decision making on future data [1] with minimal human intervention. The use of ML has increased rapidly in medical imaging [2,3,4,5,6,7,8,9] and medical image analysis, since the objects in medical images are often too intricate to be represented correctly by a simple function. Representing such objects requires complex models with many parameters, and determining these parameters manually from the data is nearly impossible. ML therefore plays an essential role in detection and diagnosis in medical imaging. Computer-aided diagnosis (CAD), one of the most prevalent uses of ML [2, 10], involves classifying objects into pre-defined classes (e.g., lesion or non-lesion, benign or malignant) based on the input data.

Towards the end of the 1990s, ML techniques, both supervised (which use labelled training data to learn patterns in the data) and unsupervised (which learn patterns without labelled training data), became very popular in medical image analysis. Examples include segmentation of brain regions, extraction of representative features and classification for CAD. ML is the foundation of several successful commercial medical image analysis products. It has brought a paradigm shift from manually defined systems to systems that are trained on historical data: representative features are extracted from the data, and the ML algorithm then derives the optimal decision function. Extracting discriminative features from the input medical images is therefore a critical step in designing an ML system. In traditional ML frameworks, feature extraction is defined by human experts. Such features are called handcrafted features, and the resulting system performance depends on the developers’ expertise.

Deep learning (DL) is a specific branch of ML. It is based on the artificial neural network (ANN), which mimics the multilayered human perception system. DL has brought a further shift from conventional ML systems to a class of self-taught systems that learn features capturing an optimal representation of the data for the given problem, thereby eliminating the need for handcrafted features. For decades after their introduction, ANNs found little acceptance because of severe hindrances in training deep architectures to solve real problems. The main causes were vanishing gradients and overfitting, the lack of powerful computing machines, and the scarcity of data for training. Most of these constraints have now been eased by improvements in the availability of data, in training algorithms and in computing power through graphics processing units (GPUs). DL frameworks have shown promising results in matching human performance in several arenas, including medical imaging. Applying DL to medical imaging is becoming increasingly popular; however, several limitations impede its progress.

We believe that ML/DL-based medical image analysis tools will ultimately be deployed in radiology practice at a commercial scale. However, we also believe that they will not completely replace radiologists; rather, a specific subset of manual tasks will be taken over by complementary modules that augment the overall medical imaging system. In this review, we discuss various state-of-the-art applications of ML and DL in medical image analysis. We also discuss some common DL frameworks, the strengths of DL and the challenges to be overcome in future.

2 Machine Learning

2.1 Types of Machine Learning Algorithms

Based on system training, ML algorithms are categorized as supervised and unsupervised.

Supervised Training.

Supervised learning determines a function that infers the output from the input. The input is represented as vectors of numerical or nominal features (the independent variables), each paired with a corresponding output variable, also called the dependent or target variable. If the output is a numerical variable, the task is known as regression. When the output is a categorical variable, the task is called classification.
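The distinction between regression and classification can be illustrated with a minimal sketch. The data, the function names and the choice of a least-squares line and a nearest-centroid classifier are assumptions made purely for illustration; they are not methods taken from the works reviewed here.

```python
# Toy illustration only: all data and function names are hypothetical.

def fit_line(xs, ys):
    """Regression (numerical output): least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def fit_centroids(features, labels):
    """Classification (categorical output): store the mean feature vector per class."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(centroids, vec):
    """Assign the label of the closest class centroid (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))

# Regression: the targets follow y = 2x + 1.
a, b = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# Classification: two made-up feature dimensions per sample.
centroids = fit_centroids(
    [[1.0, 1.0], [1.5, 0.5], [8.0, 9.0], [9.0, 8.0]],
    ["benign", "benign", "malignant", "malignant"],
)
label = predict(centroids, [8.5, 8.5])
```

In both cases the labelled pairs of input and output drive the fit; only the type of the output variable differs.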

Unsupervised Training.

Unsupervised learning processes data without labels: it is trained to uncover hidden patterns in unlabelled data, for example by grouping the data into segments whose samples are similar with respect to some attributes. Dimensionality reduction methods such as principal component analysis and clustering methods such as k-means are examples of unsupervised learning algorithms.
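As a rough sketch of how k-means groups unlabelled samples, the following pure-Python version alternates between assigning points to the nearest centroid and recomputing the centroids. The deterministic initialisation (evenly spaced samples) and the toy points are simplifying assumptions; practical implementations usually use random or k-means++ initialisation.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on lists of floats; returns (centroids, assignments)."""
    # Simplified, deterministic initialisation: pick k evenly spaced samples.
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

# Six unlabelled 2D samples forming two visually obvious groups.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
centroids, assignments = kmeans(points, k=2)
```

No labels are supplied at any point; the grouping emerges only from the distances between samples.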

2.2 Applications of Machine Learning in Medical Image Analysis

A large number of research papers have described applications of ML in medical imaging, such as CAD of Parkinson’s disease from brain MRI using graph-theory-based spectral features [4] and 3D local binary pattern features [7], and CAD of schizophrenia from functional MRI using non-linear fuzzy kernel-based feature extraction [8] and fuzzy-rough-set feature selection [9]. Other applications include detection of lung nodules in chest X-ray (CXR) [11,12,13,14] and thoracic CT [13,14,15,16,17,18], classification of lung nodules as malignant or benign in CXR [19], detection of microcalcifications in mammography [20,21,22,23], and detection of masses [24] and their classification as benign or malignant [25,26,27] in mammography. Recently, Aggarwal et al. [28] employed the surfacelet transform, a 3D multi-resolution transform, to capture complex directional details while extracting features from brain MRI volumes for CAD of Alzheimer’s disease. Further, modified and enhanced intuitionistic fuzzy c-means clustering methods have been suggested for segmenting human brain MRI scans [29]. A survey of ML in CAD is presented in [30], and further work on ML in CAD and medical image analysis is available in the literature [10, 31].

2.3 Machine Learning Algorithms – Towards Deep Learning

The support vector machine (SVM), a very prevalent classification method, often shows superior performance in classification problems owing to its merits of convex optimization and regularization [32, 33]. Lately, ensemble learning, including algorithms based on boosting and bagging, has been commonly used to enhance classification performance [34].

In ML, numerical or nominal representations of the input data are called features. The performance of an ML algorithm depends largely on the quality of the features fed to it, so determining informative features is an important aspect of ML systems. Many domain experts create handcrafted features with the help of various evaluation techniques, such as performance tests and statistical analysis. To this end, various feature extraction and selection algorithms have been developed for obtaining high-performing features.

The artificial neural network (ANN) is one of the popular algorithms in ML. An ANN models the computational units of the human brain, its neurons and their synapses, and comprises layers of interconnected artificial neurons. Every neuron implements a basic classifier that generates a decision based on the weighted sum of its inputs. The weights of the edges in the network are trained by the back-propagation algorithm, in which pairs of input and expected output are provided, imitating the way the brain relies on external stimuli to learn how to achieve certain goals [35]. ANNs performed notably well in several areas but suffered from limitations such as getting trapped in local minima while learning the network weights, and overfitting to the given dataset.
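The weighted-sum neuron and its training from pairs of input and expected output can be sketched as follows. This is the degenerate single-neuron case of back-propagation (one application of the chain rule), with a sigmoid activation, a squared-error loss, a learning rate and a logical-OR toy task, all chosen here as assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(weights, bias, inputs):
    """Decision of one artificial neuron: activation of the weighted input sum."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_step(weights, bias, inputs, target, lr=0.5):
    """One gradient-descent update for the loss 0.5 * (output - target)**2."""
    out = neuron(weights, bias, inputs)
    # Chain rule: dLoss/dz = (out - target) * sigmoid'(z), with sigmoid'(z) = out * (1 - out).
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias

# Hypothetical training pairs (input, expected output): the logical OR function.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
weights, bias = [0.0, 0.0], 0.0
for _ in range(2000):
    for inputs, target in data:
        weights, bias = train_step(weights, bias, inputs, target)

predictions = [round(neuron(weights, bias, x)) for x, _ in data]
```

In a multi-layer network the same error signal is propagated backwards through every layer, which is where local minima and overfitting become practical concerns.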

3 Overview of Deep Learning

An ANN is an ML framework that derives inspiration from the functioning of human neurons. For a long time, however, ANNs did not gain much success. The field has recently re-emerged in the form of DL, which was developed in the field of computer vision and is becoming immensely useful in several other fields. It gained immense popularity in 2012, when a convolutional neural network (CNN) won the ImageNet classification challenge [36]. Thereafter, researchers in most domains, including medical imaging, started implementing DL. With rapid improvements in computational power and the availability of massive amounts of data, DL is becoming the preferred ML technique, since it can learn more high-level, abstract and complex patterns from data than conventional ML techniques. Unlike conventional ML algorithms, DL methods simplify the feature engineering process significantly and can even be applied directly to raw input data, which lets more researchers explore novel ideas. For instance, DL has been successfully applied to CAD of Alzheimer’s disease [37], Parkinson’s disease [38] and schizophrenia [39] using MRI, as well as to segmentation of brain regions in MRI and ultrasound [40].

Common DL frameworks include convolutional neural networks (CNNs), recurrent neural networks (RNNs), auto-encoders (AEs) and stacked auto-encoders (SAEs), among others. CNNs and RNNs are supervised frameworks, while AEs and SAEs are unsupervised frameworks. These are explained as follows.

3.1 Convolutional Neural Network

A CNN is a neural network proposed by Fukushima [41] in 1980 for simulating the human visual system. It comprises several layers of connected, neuron-like computation units that process the input step by step, and it has driven substantial advances in the field of computer vision. A CNN can derive hierarchical information, going from edge-level features to object representations in images.

A CNN comprises interconnected convolutional, pooling and fully connected layers. The input is an image, and the outputs are class categories. The main task of a convolutional layer is to detect characteristic edges, lines, objects and other visual elements. Propagating data forward through such a layer is equivalent to performing a shift-invariant convolution. The result of the convolution is passed to the corresponding unit in the next layer, which applies an activation function and computes the subsequent output. The activation function is a non-linear unit such as the sigmoid or the rectified linear unit (ReLU). To capture an increasing field of view in deeper layers, the feature maps computed at every layer are gradually reduced in spatial size by a pooling layer, which takes the maximum over the pixels within each local region. The convolutional and pooling layers are repeated a few times. After this, fully connected layers integrate the features from the whole of the feature maps. Finally, a softmax layer is generally used as the output layer. Figure 1 depicts the architecture of a CNN.

Fig. 1. Convolutional neural network architecture
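The layer sequence described above (convolution, non-linear activation, max pooling, fully connected layer, softmax) can be traced on a tiny example. The sketch below is a hand-wired forward pass only: the 5x5 image, the edge-detecting kernel and the fully connected weights are hypothetical values chosen for illustration, whereas in a real CNN they would be learnt from data.

```python
import math

def conv2d(image, kernel):
    """'Valid' sliding-window filtering (cross-correlation, as used in DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fmap):
    """Rectified linear unit applied element-wise."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size regions."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def dense(vec, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, biases)]

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 5x5 image with a vertical edge: dark on the left, bright on the right.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
kernel = [[-1.0, 1.0], [-1.0, 1.0]]          # hypothetical vertical-edge detector

feature_map = relu(conv2d(image, kernel))    # 4x4 map, responds at the edge
pooled = max_pool(feature_map)               # 2x2 map after 2x2 max pooling
flat = [v for row in pooled for v in row]

# Hypothetical fully connected weights for two output classes.
fc_weights = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
probs = softmax(dense(flat, fc_weights, [0.0, 0.0]))
```

The feature map is non-zero only where the window straddles the edge, and pooling halves its spatial size while keeping that response.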

CNNs differ from simple deep neural networks [42]: in a CNN the network weights are shared so that the network carries out convolution on the input image. The network therefore does not need to learn separate detectors for an object that occurs at multiple positions within the image. This makes the network largely invariant to input translations and considerably decreases the number of parameters to be learnt.
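The reduction in parameters from weight sharing can be made concrete with a short calculation. The layer sizes below (a 256x256 single-channel image, 64 output units or feature maps, 3x3 kernels) are assumed values chosen only to show the order of magnitude, and the two layers are not functionally equivalent.

```python
def dense_params(in_h, in_w, out_units):
    """Fully connected layer: one weight per (pixel, unit) pair plus one bias per unit."""
    return in_h * in_w * out_units + out_units

def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Convolutional layer: each kernel is shared across all image positions."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

dense_count = dense_params(256, 256, 64)   # 4,194,368 parameters
conv_count = conv_params(3, 3, 1, 64)      # 640 parameters
```

The convolutional count does not depend on the image size at all, because the same small kernel is reused at every position.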

3.2 Recurrent Neural Networks

An RNN is a special category of ANN well suited to sequence modelling of temporal data such as speech and text. A conventional ANN assumes that all inputs (and outputs) are independent of each other. This assumption does not hold for many types of problems. For example, in machine translation, a natural language processing task, the prediction of the next word in a sentence depends on the words preceding it. RNNs are called “recurrent” because the same computation is performed for each element of the given sequence, with the outcome depending on the prior computations. An alternative way to view RNNs is that they have a “memory”, an internal state of the network that stores information about the computations done so far. In this way, RNNs yield substantial performance gains in natural language processing, speech recognition and generation, machine translation and handwriting recognition [43, 44].
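The recurrence and the role of the internal state can be sketched with a scalar RNN. The update rule used here, h_t = tanh(w_xh * x_t + w_hh * h_(t-1) + b_h), is the standard simple (Elman-style) form; the weights and input sequences are hypothetical, untrained values.

```python
import math

def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """One recurrent update: the new state depends on the input and the old state."""
    return math.tanh(w_xh * x + w_hh * h_prev + b_h)

def run_rnn(sequence, w_xh=1.0, w_hh=0.5, b_h=0.0):
    """Apply the same computation to every element, carrying the state forward."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b_h)
        states.append(h)
    return states

# The two sequences differ only in their first element.
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])
```

Although the last two inputs are identical in both sequences, the final states differ, because the state of the first sequence still carries a decaying trace of its first element.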

Although primarily introduced for 1-D input, RNNs are increasingly being used for images as well. In medical applications, RNNs have been employed successfully for segmentation [45] in the MRBrainS challenge.

3.3 Auto-encoders and Stacked Auto-encoders

AEs are networks with one hidden layer that are trained to reconstruct the input at the output layer. The parameters of an AE are (1) the weight matrix and bias from the input to the hidden layer and (2) the corresponding weight matrix and bias from the hidden layer to the reconstructed output. A non-linear activation function is employed at the hidden layer. Further, the dimension of the hidden layer is kept smaller than that of the input layer. This projection of the input data onto a lower-dimensional space captures a dominant latent structure of the data.
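The two parameter sets and the bottleneck can be seen in a minimal forward pass. The sketch below assumes a tanh hidden activation, a linear output layer and hand-picked, untrained weights for a 4-dimensional input and a 2-dimensional hidden layer; training would adjust these weights to minimise the reconstruction error.

```python
import math

def affine(weights, bias, vec):
    """Weight matrix times vector, plus bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def encode(x, w_enc, b_enc):
    """Input -> hidden layer, with a non-linear (tanh) activation."""
    return [math.tanh(z) for z in affine(w_enc, b_enc, x)]

def decode(h, w_dec, b_dec):
    """Hidden layer -> reconstructed output (linear output assumed here)."""
    return affine(w_dec, b_dec, h)

def reconstruction_error(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# Hypothetical weights: 4 inputs are compressed to 2 hidden units and expanded back.
w_enc = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
b_enc = [0.0, 0.0]
w_dec = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
b_dec = [0.0, 0.0, 0.0, 0.0]

x = [0.2, 0.2, -0.4, -0.4]
h = encode(x, w_enc, b_enc)      # 2-dimensional code
x_hat = decode(h, w_dec, b_dec)  # 4-dimensional reconstruction
err = reconstruction_error(x, x_hat)
```

The input here has only two degrees of freedom, so the 2-unit hidden layer can represent it with a small error; inputs without such structure would be reconstructed less faithfully.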

The denoising auto-encoder (DAE) [46] helps prevent the network from learning a trivial solution. A DAE is trained to reconstruct the input from a corrupted variant of it, obtained by adding noise (for example, salt-and-pepper noise). Initially, deep neural networks (DNNs) were difficult to train efficiently. They did not gain popularity until 2006 [47,48,49], when it was demonstrated that unsupervised training of DNNs in a layer-wise ‘greedy’ fashion (pre-training), followed by supervised fine-tuning of the whole network, could achieve better results. The stacked auto-encoder (SAE), also called a deep AE, is one such popular architecture. SAEs are constructed by placing layers of AEs on top of each other.
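The corruption step used when training a denoising auto-encoder can be sketched as below. The function name, the corruption fraction and the assumption that pixel intensities lie in [0, 1] are illustrative choices; the network would receive the noisy vector as input and the clean vector as the reconstruction target.

```python
import random

def salt_and_pepper(pixels, fraction, seed=None):
    """Return a copy of `pixels` with a given fraction set to 0.0 (pepper) or 1.0 (salt)."""
    rng = random.Random(seed)
    noisy = list(pixels)
    n_corrupt = int(round(fraction * len(pixels)))
    # Choose distinct positions so exactly n_corrupt pixels are overwritten.
    for idx in rng.sample(range(len(pixels)), n_corrupt):
        noisy[idx] = rng.choice((0.0, 1.0))
    return noisy

clean = [0.5] * 100                      # flattened toy image, mid-grey everywhere
noisy = salt_and_pepper(clean, fraction=0.2, seed=42)
changed = sum(1 for a, b in zip(clean, noisy) if a != b)
```

Because the target remains the clean input, the network cannot simply copy its input to the output and must instead learn structure that survives the corruption.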

4 Deep Learning in Medical Image Analysis

DL is particularly important for medical image analysis because it can take years of training for a human to reach the level of domain expertise needed to design the hand-crafted features that traditional ML algorithms require. DL has been extensively implemented in various medical image analysis tasks, and the results demonstrate its immense potential for achieving higher accuracy in automated systems in future. Some of these studies, carried out mainly in two application areas, are discussed as follows:

4.1 Classification

Image classification was the first area in which DL made a significant contribution to medical imaging. Early reports of DL-based CAD systems for breast cancer [50], lung cancer [51, 52, 58] and Alzheimer’s disease (AD) [53,54,55] demonstrate good performance in detecting these diseases. DL has been implemented in the detection, diagnosis and risk analysis of breast cancer [50, 56, 57]. Some works have used DL for AD diagnosis, in which the disease is detected from multi-modal brain data owing to effective feature computation by DL [53,54,55]. Numerous studies have surveyed the usage of DL in medicine [59,60,61].

In such settings, datasets are comparatively small (hundreds or thousands of samples) relative to those in computer vision (millions of samples). Transfer learning, therefore, becomes a natural choice for such applications. Fundamentally, transfer learning is the use of networks pre-trained on a large dataset of natural images for tasks on smaller medical datasets. The two main transfer learning strategies are (1) using a pre-trained network to extract features and (2) fine-tuning a pre-trained network on medical data. Both strategies are prevalent and have been implemented extensively. The first strategy allows the computed features to be plugged into an image analysis system without the need to train a deep network. The second has also proved effective: fine-tuning Google’s Inception v3 network, pre-trained on natural images, on medical data attained performance close to that of human experts [62]. CNNs pre-trained on natural images have shown surprisingly strong results in some tasks, matching the performance of human experts.
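The first strategy, a frozen feature extractor followed by a small trainable classifier, can be outlined schematically. In the sketch below, `pretrained_features` is a hypothetical stand-in for the activations of a pre-trained CNN (it merely computes two fixed statistics), and the 2x2 "images" and labels are made up; only the logistic-regression head is trained.

```python
import math

def pretrained_features(image):
    """Stand-in for a frozen pre-trained network: a fixed, non-trainable feature map.
    A real system would return the activations of a CNN layer instead."""
    flat = [v for row in image for v in row]
    return [sum(flat) / len(flat), max(flat) - min(flat)]  # mean intensity, contrast

def predict_proba(weights, bias, feats):
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_head(feature_vectors, labels, lr=0.5, epochs=500):
    """Train only a logistic-regression head; the feature extractor is never updated."""
    weights, bias = [0.0] * len(feature_vectors[0]), 0.0
    for _ in range(epochs):
        for feats, y in zip(feature_vectors, labels):
            grad = predict_proba(weights, bias, feats) - y  # gradient of the log loss w.r.t. z
            weights = [w - lr * grad * f for w, f in zip(weights, feats)]
            bias -= lr * grad
    return weights, bias

# Tiny hypothetical dataset: two low-contrast and two high-contrast 2x2 images.
images = [
    [[0.1, 0.1], [0.1, 0.1]],
    [[0.2, 0.1], [0.2, 0.2]],
    [[0.1, 0.9], [0.9, 0.5]],
    [[0.0, 1.0], [0.8, 1.0]],
]
labels = [0, 0, 1, 1]

feature_vectors = [pretrained_features(img) for img in images]
weights, bias = train_head(feature_vectors, labels)
predictions = [int(predict_proba(weights, bias, f) > 0.5) for f in feature_vectors]
```

Fine-tuning, the second strategy, would differ in that the parameters inside the feature extractor are also updated on the medical data, usually with a small learning rate.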

Initially, medical imaging researchers focused on network architectures such as SAEs. Some studies [63,64,65] have applied SAEs for CAD of Alzheimer’s disease using brain MRI. Lately, a move towards CNNs is evident, with applications in retinal imaging, brain MRI and lung computed tomography (CT), to name a few.

Some researchers also train their networks from scratch rather than using pre-trained networks. Menegola et al. [66] compared training from scratch with fine-tuning of pre-trained networks and demonstrated that fine-tuning performed better on a small dataset of about 1000 skin lesion images.

In a nutshell, DL, along with transfer learning, is capable of improving the performance of existing CAD systems to the level of commercial use.

4.2 Segmentation

Medical image segmentation, that is, separating objects from the background in medical images to capture meaningful information about their volumes and shapes, is a major challenge in medical image analysis. The potential of DL approaches has made them a primary choice for medical image segmentation [67].

CNNs have been applied to medical image segmentation in various studies. The common approach is to take a 2D image as input and apply 2D filters to it [68]. In an experiment by Bar et al. [69], low-level features were extracted from a model pre-trained on ImageNet; high-level features were then extracted from PiCoDes [70], and the two were fused for segmentation. 2.5D approaches [71,72,73] are motivated by the fact that they offer more spatial information than 2D approaches at a lower computational cost than 3D approaches. Usually, they extract three orthogonal 2D patches, one in each of the three planes.
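Extracting three orthogonal 2D patches around a voxel, as in the 2.5D setting, can be sketched as follows. The indexing convention `volume[z][y][x]`, the naming of the planes as axial, coronal and sagittal, and the synthetic volume are assumptions made for this illustration; the actual orientation depends on how a scan is stored.

```python
def orthogonal_patches(volume, z, y, x, half):
    """Return three 2D patches of side 2*half+1 centred on voxel (z, y, x).
    `volume` is indexed as volume[z][y][x]; the caller must keep the patch in bounds."""
    zs = range(z - half, z + half + 1)
    ys = range(y - half, y + half + 1)
    xs = range(x - half, x + half + 1)
    axial = [[volume[z][j][i] for i in xs] for j in ys]      # plane with z fixed
    coronal = [[volume[k][y][i] for i in xs] for k in zs]    # plane with y fixed
    sagittal = [[volume[k][j][x] for j in ys] for k in zs]   # plane with x fixed
    return axial, coronal, sagittal

# Synthetic 5x5x5 volume whose voxel value encodes its own coordinates (100*z + 10*y + x).
size = 5
volume = [[[100 * k + 10 * j + i for i in range(size)]
           for j in range(size)]
          for k in range(size)]

axial, coronal, sagittal = orthogonal_patches(volume, 2, 2, 2, half=1)
```

Each patch can then be fed to a 2D CNN (for example as three input channels or three parallel branches), so the classifier sees context from all three planes without processing the full 3D neighbourhood.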

The benefit of a 3D CNN is that it computes a powerful representation along all three dimensions. The 3D network is trained to predict the label of a volume element (voxel) based on the content of the surrounding 3D patch. The availability of 3D medical imaging and vast developments in hardware have enabled full use of 3D spatial information.

5 Limitations of Deep Learning in Medical Image Analysis

Deep learning applied to medical imaging has the potential to become the most disruptive technology the field has witnessed since the introduction of digital imaging. Some researchers suggest that DL systems will take over from humans not only in diagnosis but also in predicting disease, prescribing medicine and guiding treatment. Although several research studies report promising results, a few limitations and challenges must be overcome before DL can become part of mainstream radiological practice. Some of these challenges are discussed below.

5.1 Training Data

DL requires enormous amounts of training data, since the performance of a DL classifier depends to a large extent on the size and quality of the dataset. Scarcity of data is therefore one of the main barriers to the success of DL in medical imaging. Constructing large medical imaging datasets is a challenge because annotating the data requires much time and effort from not just one but multiple experts to rule out human error. Furthermore, annotation may not always be feasible owing to the non-availability of experts. Class imbalance is another major challenge that is common in the health sector, especially for rare diseases, where the number of samples in the disease class is far smaller than the number of controls. In addition, because disease prevalence, imaging machines, protocols and standards vary between hospitals across the world, it is difficult to build generalized DL systems that work for varied datasets from different sources.

5.2 Interpretability

Present DL methods are black-box in nature. That is, even when a DL method demonstrates excellent performance, it is usually impossible to derive a logical or technical explanation of how the system functions. This raises the question of whether it is acceptable, in a critical domain like healthcare, to use a system that cannot provide reasoning for the decisions it takes.

5.3 Legal and Ethical Aspects

Issues may arise concerning the use of clinical imaging data for developing DL systems at a commercial scale, since subjects’ private and sensitive information is captured in their medical image data. With the growth of healthcare data, medical image analysis practitioners also face the challenge of anonymizing patient information to prevent its disclosure. Additional legal issues would arise if a DL system were implemented in clinical practice independently of a clinician’s intervention. This raises the question of who would be accountable for any error that causes harm to patients.

6 Discussion and Conclusion

Currently, clinicians face a growing number of complex imaging standards in radiology, which makes it harder for them to complete analyses on time and compose accurate reports. Meanwhile, DL is showing impressive performance in a wide range of medical imaging applications, and in many of them DL methods have outperformed traditional ML methods. In future, DL is expected to support clinicians and radiologists in providing more accurate diagnoses at a commercial scale. However, the penetration of DL in medicine is not as fast as in other real-world problems such as automated speech recognition, machine translation and object recognition.

Innovation and development in DL have disrupted ubiquitous technologies, including virtual digital assistants and autonomous driving, to name a few. With these ground-breaking technological advancements, it is not unreasonable to foresee critical changes in radiology in the coming years owing to DL.

In this review, we have discussed DL from the perspective of medical image analysis. The usage of DL in medical image analysis is currently at a very early stage. Despite recent technological innovations, larger, fully annotated databases are required for DL-based medical imaging to progress; they will be essential for training DL networks and evaluating their performance. Several other problems and challenges remain to be resolved, such as the legal and ethical issues arising from the presence of patient-identifying information in the data used for developing commercial DL systems.

As we contemplate the use of DL in medical imaging, we see this revolution more as a collaborative aid that reduces the load of repetitive tasks and brings objective computation to presently subjective processes, rather than as a replacement for clinical experts. The active involvement of experts is indispensable in this critical and sensitive area of medical imaging.