Abstract
Deep learning has attracted substantial research interest, and deep learning algorithms for medical image processing have proven highly effective in many medical imaging tasks, supporting illness identification and diagnosis. Despite the effectiveness of these models, the shortage of large, adequately annotated datasets remains a key barrier to the continued advancement of deep learning in medical image analysis. Over the previous 5 years, a great number of studies have concentrated on finding solutions to this problem. In this work, we present a complete overview of the use of deep learning techniques in a variety of medical image analysis tasks by reviewing and summarizing the current research in this area. In particular, we emphasize the most recent developments and contributions of state-of-the-art semi-supervised and unsupervised deep learning in medical image analysis. These advancements and contributions are summarized by application scenario, including image registration, segmentation, classification and detection. In addition, we explore the significant technological obstacles that lie ahead and suggest some potential directions for ongoing research.
1 Introduction
In current clinical practice, there is a large amount of inter-reader variability in reading and interpreting medical images, because the accuracy of diagnosis and detection of many diseases depends on the expertise of individual clinicians (e.g., pathologists and radiologists). Computer-aided detection and diagnosis (CAD) schemes aim to assist clinicians in reading medical images in a more time-efficient manner and in making diagnostic decisions more accurately and objectively. The scientific rationale for this approach is computer-aided quantitative image feature analysis. According to this line of thinking, such an approach can help overcome a number of debilitating factors in clinical practice, such as the wide range of expertise possessed by clinicians, the potential exhaustion of human experts, and the shortage of adequate medical resources [1].
Target segmentation, feature computation and disease classification are the three stages that make up a typical development approach for traditional computer-aided diagnosis (CAD) systems. For the purpose of achieving mass classification on digital mammograms, for instance, [2] devised a CAD method. The process began by utilizing a modified active contour method to separate the regions of interest (ROIs) containing the target masses from the background. Following this, a wide range of image features was employed to quantitatively analyze the characteristics of the lesion, encompassing its size, shape, margin structure, texture, and other pertinent factors; this analysis transformed the original pixel data into a vector of these distinctive characteristics. Ultimately, a classification model utilizing linear discriminant analysis (LDA) was employed to evaluate the feature vector and determine whether the mass was malignant or not.
The first few layers of the model may capture some fundamental information about the lesion, such as the tumor’s shape, location, and orientation. The following layers may recognize and retain the characteristics that are consistently associated with the malignancy of the lesion (e.g., shape and edge irregularity), while discarding variations that are unrelated to this relationship (e.g., location). The pertinent characteristics are then further processed and composed by higher layers in a much more abstract way. As the number of layers increases, higher-level feature representations can be created. Throughout this process, important characteristics concealed inside the raw image are discovered by a generic neural network based model in a self-taught way; hence, manual feature engineering is not required.
Deep learning applications in medical image processing have previously been covered in many outstanding review publications. Several early deep learning strategies were evaluated by [3, 4], both of which focused on supervised methods. [12] have recently studied the use of generative adversarial networks in medical imaging tasks, with promising results. For diagnostic and segmentation tasks, [5] reviewed how to employ semi-supervised and multiple-instance learning techniques. In the context of medical image segmentation, [6,7,8] studied a number of approaches for dealing with dataset restrictions (e.g., limited or poor annotations). This review, by contrast, aims to shed light on how recent advancements in deep learning could improve the medical image analysis field, which is typically slowed down by a lack of annotated data.
Our study differs from prior review papers in its comprehensiveness and technical orientation. First, while emphasizing promising “not-so-supervised” techniques such as self-supervised, unsupervised, and semi-supervised learning, we do not discount the value of more traditional supervised methods. Second, rather than focusing on a single task, we demonstrate how these learning methodologies may be applied to four common medical image analysis scenarios (classification, segmentation, detection, and registration). Deep learning-based object detection receives particular attention, since it has received little coverage in previous reviews (after 2019). We concentrated on chest X-rays, mammograms, CTs, and MRIs in our research; physicians in the same department (radiology) can evaluate a wide variety of images of this sort, since they share many qualities. Some of these techniques, applied here to radiographic or MRI images, may also benefit other image domains (e.g., histology). Third, the most up-to-date models and architectures for accomplishing these goals are described. Even though it is rapidly developing as a promising area, self-supervised learning in the context of medical vision is still being rigorously studied. The results of this survey may be useful to a broad range of people, including medical researchers and researchers with skills in deep learning, artificial intelligence, and big data.
The rest of the paper is organized as follows: deep learning improvements in unsupervised and semi-supervised techniques are examined in Sect. 2. In addition, attention mechanisms, domain knowledge, and uncertainty estimation are presented as three significant tactics for performance enhancement. Classification, segmentation, detection, and registration are covered in detail in Sect. 3. Section 4 analyses hurdles for further model improvement and proposes some views on future research directions.
2 Deep Learning Models: An Overview
Deep learning comprises a variety of learning paradigms, such as supervised, semi-supervised, and unsupervised learning, depending on the presence of labeled data in the training dataset. In supervised learning tasks like image classification, images are paired with associated labels to train the model, which refines its parameters accordingly. As a result, during testing, the model assigns probability scores to each image based on its previously acquired correlations. Unsupervised learning, on the other hand, is the discovery of hidden patterns or structures in data that have not been labeled; here, the model derives correlations solely from the input data, and unlabeled data frequently contains rich information. This study provides an overview of recent advances in these fields, with the goal of addressing the issues caused by the scarcity of annotated data in medical imaging tasks. The presentation moves through common frameworks within each learning paradigm, emphasizing their potential contributions to medical imaging, as seen in Fig. 1.
2.1 Supervised Learning
Convolutional neural networks (CNNs) are the most commonly used deep learning architecture for medical image processing [9]. Convolutional and pooling layers make up the bulk of a CNN. In a typical classification task on a medical image, the CNN uses convolutional, pooling and fully connected layers to map the image to a probability for each class, which is then output. A pooling layer is added after the convolutional layer to reduce the size of the feature maps and, as a result, the number of parameters; two typical pooling operations are average pooling and max pooling. The same procedure is repeated for the remaining layers.
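As an illustration, the two core operations (convolution and max pooling) can be sketched in a few lines of NumPy. This is a toy, framework-free sketch: the 6 × 6 "scan" and the edge-detecting kernel are made up for illustration, and real CNNs learn their kernel values.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as used in CNNs)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; halves each spatial dimension for size=2."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    fm = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return fm.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "scan"
edge_kernel = np.array([[1., 0., -1.]] * 3)        # simple vertical-edge filter
features = conv2d(image, edge_kernel)              # 4x4 feature map
pooled = max_pool(features)                        # 2x2 map after pooling
```

Note how pooling shrinks the 4 × 4 feature map to 2 × 2, which is exactly the parameter-reduction effect described above.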
2.2 Unsupervised Learning
2.2.1 Auto Encoders
Auto-encoders are commonly used for dimensionality reduction and feature learning [9, 10]. Simple auto-encoders have limited representational power because of their small structural depth [11, 12]; more complex auto-encoders with additional hidden layers can boost representational strength. To learn more complex non-linear patterns, stacked auto-encoders (SAEs) stack numerous auto-encoders and optimize them greedily layer-wise. As a result, SAEs generalize better beyond the training data than shallow auto-encoders [13, 14]. A typical SAE has an encoder and a decoder network, which are usually symmetric to one another. The learned representations can be further shaped by adding regularization terms, such as the sparsity constraints found in sparse auto-encoders [15], to the original reconstruction loss. Auto-encoders designed to be insensitive to input perturbations include the denoising auto-encoder and the contractive auto-encoder.
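To make the encoder–decoder idea concrete, here is a minimal tied-weight linear auto-encoder trained by gradient descent on random data. This is a toy sketch under simplifying assumptions (no non-linearity, a single layer, synthetic data); real SAEs stack several non-linear layers.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))           # 100 unlabeled samples, 16-D features

# Tied-weight linear auto-encoder: encode to a 4-D code, decode back to 16-D.
W = rng.normal(scale=0.1, size=(16, 4))  # encoder weights (decoder uses W.T)
lr = 0.01

losses = []
for _ in range(300):
    code = X @ W                          # encoder: bottleneck representation
    recon = code @ W.T                    # decoder: reconstruction
    err = recon - X                       # reconstruction error
    losses.append(float(np.mean(err ** 2)))
    # Gradient of the mean squared reconstruction loss w.r.t. the tied weights
    grad = (err.T @ code + X.T @ (err @ W)) * 2 / X.size
    W -= lr * grad
```

The reconstruction loss falls as the 4-D bottleneck learns the directions of largest variance, which is the dimensionality-reduction behaviour described above.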
2.2.2 GAN (Generative Adversarial Networks)
An unconditional generative model lacks the ability to directly control the modes of the data being synthesized. The conditional GAN (cGAN) therefore conditions both the generator and the discriminator on extra information (e.g., the class labels) to steer the process of data generation. To be more specific, a noise vector z and a class label c are both passed to G at the same time; likewise, the real/fake data and the class label c are both passed to D as inputs. Class labels are not the only kind of information that may be used for conditioning; images and other attributes are also acceptable alternatives. Additionally, the auxiliary classifier GAN (ACGAN) is another technique that may be used to enhance image synthesis via label conditioning. D in ACGAN is no longer given access to the class conditional information, in contrast to the discriminator in cGAN, which still has this access. In addition to determining which images are real and which are fake, D is responsible for reconstructing the class labels. Forced to perform this extra classification task, ACGAN readily generates high-quality images. The model class probabilities for estimating the sum of the losses of each class are represented in Eq. (1),

$$L=-\sum_{i}{y}_{i}\mathrm{log}\left({p}_{i}\right)$$
(1)

where \({y}_{i}\) is the class label and \({p}_{i}\) is the predicted probability of class \(i\). The entropy loss on the probabilities of all classes is represented in Eq. (2),

$$H=-\sum_{i}{p}_{i}\mathrm{log}\left({p}_{i}\right)$$
(2)
Generative adversarial networks (GANs) are a subcategory of deep neural networks that were first proposed for generative modelling by [16, 17]. This architecture includes a built-in framework for estimating generative models that can draw samples directly from the underlying data distribution, which produces more accurate results; as a consequence, it is no longer necessary to explicitly specify a probability distribution. A GAN consists of two separate models: a generator G and a discriminator D. Through the adversarial process, G progressively estimates the underlying data distribution and creates realistic samples. The gradient with respect to the generator parameters is represented in Eq. (3),

$${\nabla }_{{\theta }_{g}}\frac{1}{m}\sum_{i=1}^{m}\mathrm{log}\left(1-D\left(G\left({z}^{\left(i\right)}\right)\right)\right)$$
(3)

where \(m\) is the minibatch size and \({z}^{\left(i\right)}\) is a noise vector sampled from the prior.
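The adversarial objective can be illustrated numerically. The sketch below uses toy discriminator outputs rather than a trained model: it computes the binary cross-entropy form of the discriminator's value function and the common non-saturating generator loss.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy form of the GAN value function for D:
    D maximizes log D(x) + log(1 - D(G(z))) over real/fake batches."""
    return float(-np.mean(np.log(d_real) + np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    """Non-saturating generator loss: G maximizes log D(G(z))."""
    return float(-np.mean(np.log(d_fake)))

# Toy discriminator outputs (probabilities that a sample is real).
d_real = np.array([0.9, 0.8, 0.95])   # D is confident on real images
d_fake = np.array([0.1, 0.2, 0.05])   # D is confident the fakes are fake
loss_d = discriminator_loss(d_real, d_fake)
loss_g = generator_loss(d_fake)
```

With these probabilities the discriminator is "winning", so its loss is small while the generator's is large; training alternates updates to drive the two losses toward balance.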
2.3 Semi-Supervised Learning
Semi-supervised learning (SSL) exploits labelled and unlabeled data together, in contrast to unsupervised learning, which can only work on unlabeled data. In particular, SSL applies to the situation in which there is a small amount of labelled data and a large amount of unlabeled data. These two kinds of data need to be related for the extra information conveyed by the unlabeled data to be beneficial in complementing the labelled data. When completing tasks with a restricted amount of labelled data, it is fair to anticipate that the addition of unlabeled data will improve overall performance, and the more unlabeled data included, the better. In fact, this objective has been investigated for decades, and the 1990s saw a rise in interest in applying SSL approaches to text categorization. The book titled “Semi-Supervised Learning” by Chapelle et al. (2009) is an excellent resource for readers interested in understanding the link between SSL and traditional machine learning techniques (Figs. 2, 3).
The authors present empirical evidence for an interesting finding: although unlabeled data can have positive value, it can occasionally lead to a decline in performance. The current deep learning literature, however, suggests that this empirical result is shifting, with more and more works, particularly in computer vision, reporting consistent gains from unlabeled data. In addition, deep semi-supervised learning has been effectively applied to medical image analysis to lower the cost of annotation and improve performance. We classify common SSL techniques into three types: (1) those based on consistency regularization, (2) those based on pseudo labelling, and (3) those based on generative models (Table 1).
In pseudo labelling, pseudo annotations for unlabeled instances are created by the SSL model itself, and the resulting examples are used in conjunction with real labels to train the SSL model. Multiple cycles of this method are used to improve the quality of the pseudo labels and the overall performance of the model. Naive pseudo-labeling and mixup augmentation [18, 19] may be combined to boost the performance of the SSL model [20]. Multi-view co-training and pseudo labelling are also effective together [21, 22]. The consistency cost is defined mathematically in Eq. (4),

$$J\left(\theta \right)={E}_{x,\eta ,{\eta }^{\prime}}\left[{\Vert f\left(x,{\theta }^{\prime},{\eta }^{\prime}\right)-f\left(x,\theta ,\eta \right)\Vert }^{2}\right]$$
(4)

where \(f\) is the model prediction, \(\theta \) and \({\theta }^{\prime}\) are the student and teacher weights, and \(\eta \) and \({\eta }^{\prime}\) are input perturbations.
The teacher model holds its previous weights in proportion α and takes a (1 − α) portion of the student weights, as represented in Eq. (5),

$${\theta }_{t}^{\prime}=\alpha {\theta }_{t-1}^{\prime}+\left(1-\alpha \right){\theta }_{t}$$
(5)
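The mean-teacher consistency cost and exponential-moving-average update described above are simple to state in code. This is a NumPy sketch with toy weight vectors; `alpha` plays the role of α.

```python
import numpy as np

def consistency_cost(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions
    on the same (differently perturbed) unlabeled input."""
    return float(np.mean((student_probs - teacher_probs) ** 2))

def ema_update(teacher_w, student_w, alpha=0.99):
    """Teacher keeps an exponential moving average of the student weights:
    alpha of its own previous weights plus (1 - alpha) of the student's."""
    return alpha * teacher_w + (1.0 - alpha) * student_w

teacher = np.zeros(4)                # toy teacher weights
student = np.ones(4)                 # toy (fixed) student weights
for _ in range(100):
    teacher = ema_update(teacher, student, alpha=0.99)

cost = consistency_cost(np.array([0.7, 0.3]), np.array([0.6, 0.4]))
```

After 100 updates the teacher has drifted a fraction 1 − 0.99¹⁰⁰ of the way toward the student, illustrating how the teacher smooths the student's trajectory over many steps.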
2.4 Strategies for Performance Enhancement
2.4.1 Domain Knowledge
When applied directly to medical imaging problems, most well-established deep learning models are likely to yield inferior results, since they were built to analyze natural images [23, 24]. This is due to the fundamental differences between natural scenes and medical imagery. First, medical images often display considerable inter-class similarity, making it difficult to extract the fine-grained visual characteristics necessary for understanding small distinctions and producing accurate predictions. Second, natural benchmark datasets often comprise tens of thousands to millions of images, while medical image databases are typically considerably smaller; this prevents very complicated computer vision models from being used in the medical field. The question of how to tailor models for medical image analysis therefore persists. Learning helpful features may be aided by incorporating appropriate domain knowledge or task-specific priors.
2.4.2 Uncertainty Estimation
In highly regulated clinical environments (e.g., cancer diagnosis), trustworthiness is of paramount importance. There are several sources of inaccuracy in model predictions, such as noisy data and inference mistakes; thus, it is preferable to measure uncertainty and ensure the findings can be trusted [25,26,27]. Bayesian methods approximate the posterior distribution over the parameters of neural networks. Ensemble methods quantify uncertainty by combining different models. Readers who want to learn more about uncertainty estimation are referred to [28,29,30].
2.4.3 Attention Mechanisms
The visual processing system of primates, which generates attention, chooses a subset of relevant sensory information for complex scene interpretation instead of employing all available information [31,32,33]. The concept of zeroing in on certain parts of the input has inspired deep learning researchers to include attention in cutting-edge models across a variety of domains. Attention mechanisms may broadly be classified as soft or hard based on how attended locations in an image are selected: the former deterministically learns a weighted average of features over all locations, while the latter stochastically selects a subset of feature locations to focus on [34]. Since hard attention is not differentiable, researchers have focused on soft attention, even though it is more computationally costly.
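Soft attention's deterministic weighted average can be sketched in NumPy. The features and query here are toy values; in a real model the scoring function (and often the query itself) is learned.

```python
import numpy as np

def soft_attention(features, query):
    """Deterministic soft attention: score every location against a query,
    softmax the scores, and return the attention-weighted feature average."""
    scores = features @ query                  # one relevance score per location
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()                   # weights over all locations sum to 1
    return weights, weights @ features         # weighted average of the features

# 4 spatial locations with 3-D features each (toy stand-in for a CNN feature map).
features = np.array([[1., 0., 0.],
                     [0., 1., 0.],
                     [0., 0., 1.],
                     [1., 1., 0.]])
query = np.array([2., 0., 0.])                 # "attend to the first channel"
weights, context = soft_attention(features, query)
```

Every location contributes to the output, just with different weights; hard attention would instead sample a single location, which is why it cannot be differentiated through.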
3 Deep Learning Applications
3.1 Classification
Computer-assisted diagnosis (CADx) seeks to classify medical images so that malignant lesions may be distinguished from benign ones, or so that specific illnesses can be identified from input images [35,36,37]. Over the last decade, deep learning-based CADx methods have shown tremendous success. However, effective performance from deep neural networks often requires a large number of labelled images, a condition that may be difficult to meet for many medical imaging datasets. Many methods have been explored to compensate for the dearth of large annotated datasets, with transfer learning emerging as the clear frontrunner. While transfer learning has been shown to improve performance with little annotated data, alternative learning paradigms such as unsupervised image synthesis, self-supervised learning, and semi-supervised learning have also shown promising results. In the following sections, we introduce the use of these learning paradigms in the classification of medical images.
3.1.1 Supervised Classification
Attention heat maps of the global X-ray image were employed to mask large, unimportant areas and emphasize small, key ones that hold diagnostic clues for thorax illnesses. The suggested model was able to successfully combine global and local information, leading to improved classification accuracy. Input images include salient elements beneficial to the target task, and each attention module is trained to focus on a specific subset of these local structures.
However, the effectiveness of deep learning models is very sensitive to the quantity and quality of training datasets and image annotations. It can be difficult to build a sufficiently large and high-quality training dataset in many medical image analysis tasks, particularly 3D ones, because of challenges in data gathering and annotation [38,39,40]. Pre-trained convolutional neural networks (CNNs) with sufficient fine-tuning outperformed CNNs trained from scratch, as shown by [41,42,43].
3.1.2 Unsupervised Methods
3.1.2.1 Unsupervised Image Synthesis
Simple but effective, traditional data augmentation (e.g., rotation, scaling, flipping, translation) can provide more training examples for improved performance. However, it adds little to the knowledge contained in the existing training examples. In the medical field, GANs have been employed as a more involved strategy for data augmentation because of their ability to learn the underlying data distribution and generate realistic images. To enhance liver lesion classification on a small dataset, [41] used DCGAN to synthesize high-quality instances. Only 182 lesions, including cysts, metastases, and hemangiomas, were included in the dataset; the authors used standard data augmentation techniques (e.g., rotation, flip, translation, and scaling) to generate over 90,000 samples, which is necessary for training the GAN. Classification performance was greatly enhanced by the GAN-based synthetic data augmentation, with sensitivity and specificity rising from 78.6 and 88.4 percent, respectively, to 85.7 and 92.4 percent. More recently, the authors expanded lesion synthesis from an unconditional setting (DCGAN) to a conditional one (ACGAN) [42]. Besides judging real versus synthesized instances, ACGAN's discriminator also predicted lesion classes based on the auxiliary information (lesion classes). ACGAN-based synthetic augmentation, however, showed weaker classification performance than the unconditional baseline.
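The traditional, label-preserving augmentations mentioned above (rotation and flipping) are easy to sketch. The 4 × 4 lesion patch below is synthetic and stands in for a real ROI; real pipelines would also apply translation and scaling with interpolation.

```python
import numpy as np

def augment(image, rng):
    """Classic label-preserving augmentations: a random 90-degree rotation
    followed by random horizontal and vertical flips."""
    image = np.rot90(image, k=rng.integers(0, 4))
    if rng.random() < 0.5:
        image = np.fliplr(image)
    if rng.random() < 0.5:
        image = np.flipud(image)
    return image

rng = np.random.default_rng(0)
lesion = np.arange(16, dtype=float).reshape(4, 4)   # toy lesion patch
augmented = [augment(lesion, rng) for _ in range(8)]
```

Each augmented patch contains exactly the original pixel values rearranged, which is why these transforms multiply the number of training examples without adding genuinely new information, unlike GAN-based synthesis.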
3.1.2.2 Self-Supervised Learning Based Classification
This approach works well when many medical images are available but only a fraction of them are labelled. The process of optimizing the model then consists of two phases: unsupervised self-training and supervised fine-tuning. As a first step, the model is trained on unlabeled images to efficiently learn features that represent the image semantics. Supervised fine-tuning is then applied to the self-trained models for improved performance on downstream classification tasks. In practice, self-supervision may be implemented using pretext tasks or contrastive learning [44,45,46,47,48,49,50].
3.1.2.3 Self-Supervised Pretext Tasks
Pretext tasks, such as rotation prediction [51, 52] and Rubik’s cube recovery [53,54,55], are used in self-supervised pretext-task based classification. One such novel pretext task has two phases: image corruption (by disorganizing patches) and image restoration. Classification accuracy for medical images was enhanced by using this context restoration pre-training technique; after the pre-training period, the models were trained with labelled examples. To learn feature representations, contrastive approaches focus on maximizing agreement between positive image pairs, which may be two augmented versions of the same image or multiple images from the same patient. The pre-trained models were then fine-tuned with smaller numbers of labelled dermatology and chest X-ray images. In terms of mean area under the curve (AUC) for chest X-ray classification, these models fared better than their ImageNet-pre-trained counterparts by 1.1%, and in terms of top-1 accuracy for dermatology classification, they performed better by 6.7%.
3.1.3 Semi-Supervised Learning
Semi-supervised learning, in contrast to self-supervised methods, integrates unlabeled data with labelled data through diverse strategies to train models for improved performance. As labelled data was scarce, [57, 58] used a GAN trained in a semi-supervised manner [59, 60] to classify cardiac illness in chest X-rays. This semi-supervised GAN, in contrast to the vanilla GAN [61, 62], was trained on both unlabeled and labelled data. In addition to predicting whether or not an image is fake, its discriminator also classifies input images as normal or abnormal. The semi-supervised GAN-based classifier outperformed the supervised CNN as the number of labelled instances grew.
3.2 Segmentation
Medical image analysis requires segmenting lesions, organs, and other substructures from the background. Segmentation requires more supervision than classification and detection. Recent research has focused on utilizing Transformers for supervised medical image segmentation, so we place them in the supervised segmentation section; this categorization does not preclude Transformer-based designs in semi-supervised or unsupervised settings or in other medical imaging applications.
3.2.1 Supervised Learning Based Segmenting Models
3.2.1.1 U-Net and its Variants
In a convolutional network, lower-level fine-grained features carry vital information for exact localization (i.e., labelling each pixel) that is necessary for image segmentation, whereas higher-level coarse-grained features capture semantics relevant for overall image classification. By employing skip connections, the network’s output can match its input’s spatial resolution. U-Net takes two-dimensional images and creates segmentation maps with a category for each pixel. [63] examined the underlying structures to understand how long and short skip connections affect image segmentation; short skip connections turn out to be required for training deep segmentation networks. Simple skip connections between the U-Net encoder and decoder subnetworks fuse semantically disparate feature maps, so bridging the semantic gap before fusing feature maps was advised. In U-Net++, nested and dense skip connections replaced the basic ones, and in four medical image segmentation tasks the design beat U-Net and wide U-Net [66,67,68,69,70,71,72,73,74,75].
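The role of a skip connection can be sketched with plain arrays. Nearest-neighbour upsampling stands in here for U-Net's learned up-convolutions, and the channel counts and resolutions are toy values.

```python
import numpy as np

def max_pool_2x(x):
    """Encoder step: halve the spatial resolution of a (channels, H, W) map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample_2x(x):
    """Decoder step: nearest-neighbour upsampling back to the finer resolution."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

encoder_feat = np.random.default_rng(0).normal(size=(8, 16, 16))  # fine features
bottleneck = max_pool_2x(encoder_feat)                            # (8, 8, 8)
decoder_feat = upsample_2x(bottleneck)                            # (8, 16, 16)
# Skip connection: concatenate the fine encoder features with the upsampled
# decoder features along the channel axis, as U-Net does.
fused = np.concatenate([encoder_feat, decoder_feat], axis=0)      # (16, 16, 16)
```

The concatenation is what lets the decoder recover the pixel-level localization information that was lost in the pooled bottleneck.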
Attention gates (AGs) were proposed by [64] for inclusion in the U-Net design in order to suppress irrelevant data and highlight the most important salient features conveyed through skip connections. Attention U-Net often outperformed U-Net on CT pancreas segmentation. To assess uncertainty in segmenting magnetic resonance (MR) images of the prostate and computed tomography (CT) images of the chest, [65] developed a hierarchical probabilistic model. The authors employed variational auto-encoders and a number of latent variables to model segmentation variations at different resolutions and to infer the uncertainties or ambiguities in the experts’ annotations.
3.2.1.2 Transformers for Segmentation
The field of natural language processing (NLP) makes use of a class of encoder–decoder network designs called “Transformers” to perform sequence-to-sequence processing. Multi-head self-attention (MSA) is a crucial sub-module that employs several parallel self-attention layers to produce multiple attention vectors for each input at once. In contrast to U-Net and its derivatives, which rely on convolutional neural networks, Transformers use self-attention mechanisms, which have the benefit of learning complicated, long-range relationships from input images. Both a hybrid approach and a Transformer-only approach exist for implementing Transformers in medical image segmentation [74]; the Transformer-only approach does not use convolutional neural networks, whereas the hybrid method combines Transformers with CNNs.
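MSA reduces to a handful of matrix operations. The NumPy sketch below uses random toy weights and four heads, and omits the output projection and layer normalization that full Transformer blocks add around this core.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single self-attention head: softmax(Q K^T / sqrt(d)) V."""
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # each row attends over all tokens
    return weights @ v

def multi_head_self_attention(x, heads):
    """MSA: run several self-attention heads in parallel on the same input
    and concatenate their outputs."""
    return np.concatenate([self_attention(x, wq, wk, wv)
                           for wq, wk, wv in heads], axis=-1)

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 16))                       # 5 patch embeddings, 16-D each
heads = [tuple(rng.normal(size=(16, 4)) for _ in range(3)) for _ in range(4)]
out = multi_head_self_attention(seq, heads)          # (5, 4 heads x 4 dims)
```

Because every token's score row spans all tokens, each output position mixes information from the entire sequence in a single layer, which is the long-range-dependency advantage noted above.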
As we have seen, U-Net and its convolution-based derivatives have produced effective outcomes. Skip connections allow the decoder to take advantage of the encoder’s low-level/high-resolution CNN features, which provide accurate localization information. These models are often poor at capturing long-range relations because of the locality inherent to convolutions. Although Transformers based on self-attention mechanisms are adept at capturing long-range dependencies, the authors discovered that this technique on its own was unable to produce desirable outcomes [76,77,78,79], because it emphasizes only the global context while paying no attention to the finer, local details. Therefore, the authors suggest integrating global context from the Transformer with fine-grained spatial information from CNN features. TransUNet employs a skip-connected encoder–decoder architecture, as illustrated in Fig. 4.
The embedded image patch sequence is fed to the multi-layered Transformer branch, which extracts global context. The output of the last layer is transformed into 2D feature maps, which are upsampled to greater resolutions at three distinct scales in order to recover finer local information. In a similar vein, the CNN branch employs three ResNet-based blocks to extract features at three distinct spatial scales, working from the local level up to the global level. A separate module is utilized to selectively fuse features from both branches at the same resolution scale; the combined features can capture both the local and global contexts [80, 81]. The final segmentation mask is created using the multi-level fused features. TransFuse was able to successfully segment prostate MRI scans [82,83,84,85,86,87,88,89,90].
3.2.1.3 Self-Supervised Pretext Tasks
When just a limited number of annotated examples are available, self-supervision through pretext tasks and contrastive learning is often used to pre-train the model so that downstream tasks (such as medical image segmentation) can be tackled more precisely and effectively. The pretext tasks may be designed around the application circumstances, or selected from those already employed in computer vision. To address the former, [66] developed a unique pretext task for segmenting cardiac MR images based on predicting anatomical positions. Accurate ventricle segmentation was then performed using the features the system learned autonomously in the pretext task. Even when only a small number of annotations were given, the suggested technique outperformed the traditional U-Net trained from scratch in terms of segmentation accuracy. More trustworthy instruction for the student model may be generated by an uncertainty-aware teacher model, and the student model may in turn enhance the teacher model; this method may also be used to improve the mean-teacher model, in which the teacher is likewise adaptively updated by the student. The suggested technique outperformed state-of-the-art methods in segmenting pneumonia lesions and showed great resilience to label noise [91, 92].
3.3 Detection
Here, we first take a quick look back at a number of recent milestone detection frameworks, including one- and two-stage methods. Importantly, we present these detection frameworks within the supervised and semi-supervised learning paradigms, since these contexts are where they are most often used. We then discuss the use of these frameworks for locating both common and rare lesions. Last but not least, we present GAN- and VAE-based unsupervised lesion detection.
3.3.1 Supervised Lesion Detection
These frameworks need to be modified in many ways, such as by integrating domain-specific features, estimating uncertainty, or adopting a semi-supervised learning technique, in order to provide high detection performance in the medical domain [97,98,99,100,101,102,103].
To propose candidate nodule locations from 2D axial slices, an FPN was applied to the deconvolution feature map. The authors recommended allowing the classification network to take in the complete context of the nodule candidates to bring down the false positive rate. They opted for a 3D CNN over a 2D one so that more distinctive characteristics could be captured for nodule detection by exploiting the 3D context of candidate locations.
For the purpose of training 3D CNN-based models, [70] created a large dataset (PN9) including more than 40,000 annotated lung nodules. Through the use of correlations between successive CT slices, the authors enhanced the model’s capacity to identify both large and small lung nodules. Long-range relationships between locations and channels in the feature map were captured using a non-local operation-based module applied to a slice group.
In recent years, semi-supervised techniques have been applied to medical object detection in order to boost its accuracy. Before any lesion annotations were added to the CT scans, an FPN was applied to the images to create pseudo labels for the objects in them. Afterward, mixup augmentation was used to combine the pseudo-labeled instances with those that had ground truth annotations. Notably, the authors adapted the mixup augmentation approach, originally developed for classification tasks with image-level class labels, to the lesion detection problem with bounding box annotations. Compared to supervised learning baselines, the semi-supervised method improved lung nodule detection ability significantly.
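Mixup itself is a one-line convex combination of two examples and their labels. The sketch below uses synthetic 8 × 8 patches and soft two-class labels to show how a ground-truth-annotated example and a pseudo-labelled one can be blended; adapting it to bounding boxes, as the detection work above does, requires additional design choices not shown here.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: a convex combination of two training examples and their labels,
    with the mixing coefficient drawn from a Beta(alpha, alpha) distribution."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

rng = np.random.default_rng(0)
scan_a, label_a = rng.normal(size=(8, 8)), np.array([1.0, 0.0])  # annotated nodule
scan_b, label_b = rng.normal(size=(8, 8)), np.array([0.0, 1.0])  # pseudo-labelled
mixed_x, mixed_y = mixup(scan_a, label_a, scan_b, label_b, rng=rng)
```

Because both inputs carry one-hot labels, the mixed label is a soft distribution that still sums to one, which regularizes the model toward linear behaviour between examples.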
3.3.2 Universal Lesion Detection
There is growing interest in research that seeks to identify and localize various types of lesions from the entire human body at once, whereas traditional lesion detectors have focused on a single type of lesion. Different forms of lesions, such as lung nodules, liver tumors, abdominal masses, and pelvic masses, are represented in the DeepLesion dataset of over 32K annotated lesions. To lower the number of false positives produced by the model, it was retrained using negative examples. Using a multitask detector (MULAN) to simultaneously carry out lesion detection, tagging, and segmentation, [50] were able to significantly enhance the performance of universal lesion detection. It has been shown that combining tasks may improve performance on a single task, since the tasks can supply complementary information to each other. MULAN is an adaptation of the three-head-branch variant of Mask R-CNN. Each proposed region’s lesion status and bounding-box regressions are predicted by the detection branch, while 185 tags (such as body part, lesion type, intensity, shape, etc.) are predicted by the tagging branch for each lesion proposal [104,105,106,107,108,109,110].
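The three-branch idea can be sketched as a shared ROI feature feeding three task heads. This toy NumPy version uses dense matrices in place of MULAN's actual convolutional heads; only the 185-tag output width is taken from the text, and all other dimensions are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def multitask_heads(roi_feat, w_shared, w_det, w_tag, w_seg):
    """MULAN-style multi-task heads on one shared ROI feature.

    detection head -> lesion score + box regression offsets,
    tagging head   -> 185 tag scores,
    seg head       -> mask logits.
    The shared trunk lets the tasks exchange complementary information.
    """
    h = relu(roi_feat @ w_shared)
    return h @ w_det, h @ w_tag, h @ w_seg

rng = np.random.default_rng(0)
feat = rng.random(64)                       # one ROI feature vector
det, tags, mask = multitask_heads(
    feat,
    rng.normal(size=(64, 32)),
    rng.normal(size=(32, 5)),               # score + 4 box offsets
    rng.normal(size=(32, 185)),             # 185 tags, as in MULAN
    rng.normal(size=(32, 49)),              # 7x7 mask logits, flattened
)
print(det.shape, tags.shape, mask.shape)    # (5,) (185,) (49,)
```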
3.3.3 Unsupervised Lesion Detection
Whether the goal is to identify a particular type of lesion or universal lesions, some level of supervision is needed to train a one-stage or two-stage detector, as discussed above. Before training the detectors, we must set up the supervision by specifying the types of lesions to be detected. Once the detectors have been trained, they will be unable to recognize lesions that were not included in the original training set. Unsupervised lesion identification, on the other hand, does not need ground-truth annotations, so the kinds of lesions do not have to be specified in advance. Even so, it may be utilized for coarse anomaly identification and to identify potential imaging-biomarker candidates.
The effectiveness of these unsupervised models has mostly been shown in MRI, where VAE and GAN are often utilized to estimate the normative distribution. The authors do an in-depth analysis of the differences between these models and provide several insightful examples of effective uses. This research concludes, among other things, that restoration-based methods outperform reconstruction-based methods in situations when runtime is not a factor [111,112,113,114,115,116,117,118,119]. In contrast to the aforementioned survey publication, we will just provide a quick overview of reconstruction-based methods and zero in on the most current research concerning restoration-based detection.
3.3.3.1 Reconstruction-Based Paradigm
The original image is reconstructed from its latent representation using an AE- or VAE-based model. Only healthy images are used to train the model, which is tuned to yield low pixel-wise reconstruction error. When unhealthy images are processed by the model, reconstruction errors are expected to be small for healthy image regions and large for abnormal ones. When these two measures were combined, the CVAE-based model produced acceptable tumor segmentation results in MRIs. It is important to note that the authors incorporated local context into the CVAE by using patch locations as conditions. The location-related condition can provide extra background knowledge about healthy and diseased tissues, enhancing performance.
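The reconstruction-based paradigm can be illustrated with the simplest possible autoencoder, a linear one (equivalent to PCA), fitted on healthy data only; a pixel injected with an abnormal value then yields a large per-pixel reconstruction error. This is a didactic stand-in for the AE/VAE models in the text, not any specific published model.

```python
import numpy as np

def fit_linear_ae(healthy, k=4):
    """Fit a linear autoencoder (PCA basis) on healthy patches only."""
    mean = healthy.mean(axis=0)
    _, _, vt = np.linalg.svd(healthy - mean, full_matrices=False)
    return mean, vt[:k]                    # k-dimensional latent space

def anomaly_map(x, mean, basis):
    """Per-pixel reconstruction error; large where x deviates from normal."""
    recon = mean + (x - mean) @ basis.T @ basis
    return (x - recon) ** 2

rng = np.random.default_rng(0)
# Healthy data lives on a 4-dimensional subspace of a 16-pixel "patch".
healthy = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 16))
mean, basis = fit_linear_ae(healthy)

lesion = healthy[0].copy()
lesion[3] += 5.0                           # inject an abnormal pixel
err = anomaly_map(lesion, mean, basis)
print(err.argmax())                        # index of the most anomalous pixel
```

The error map plays the role of the pixel-wise anomaly score described above.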
3.3.3.2 Restoration-Based Paradigm
Restoration aims to recover either an ideal latent representation or the original, non-anomalous version of an input abnormal image. Both GAN-based and VAE-based approaches have been applied, with the former often using a GAN during latent-representation restoration. Given an input image (normal or anomalous), the ideal latent representation was restored by performing gradient descent in the latent space (with respect to the latent variable). More specifically, the optimization is governed by a loss function that combines a residual loss and a discrimination loss. Similar to the reconstruction error, the residual loss evaluates the degree to which the images generated from the latent variable differ from the originals. Meanwhile, the discriminator network receives both kinds of images and uses a single intermediate layer to extract features from them.
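The latent-space gradient descent can be sketched with a toy linear "generator". Only the residual term is optimized here; a real GAN-based setup, as the text notes, adds a discriminator feature-matching term, and the linear generator is purely an assumption for illustration.

```python
import numpy as np

def restore_latent(x, W, steps=200, lr=0.05):
    """Restore the latent code of x by gradient descent in latent space.

    Toy linear generator G(z) = z @ W; the loss is the residual term
    ||G(z) - x||^2, minimized with respect to z (not the generator).
    """
    z = np.zeros(W.shape[0])
    for _ in range(steps):
        residual = z @ W - x
        z -= lr * 2.0 * (W @ residual)   # gradient of the residual loss w.r.t. z
    return z, np.sum((z @ W - x) ** 2)

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 16)) / 4.0       # latent dim 4 -> image dim 16
x_normal = rng.normal(size=4) @ W        # lies on the generator's range
z, loss = restore_latent(x_normal, W)
print(round(loss, 5))                    # near zero: x is restorable
```

For an anomalous input off the generator's range, the residual loss stays large, which is exactly the signal used for detection.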
3.4 Registration
Deformable registration seeks to estimate a spatially varying correspondence between images, as opposed to rigid registration, in which all of the image pixels uniformly undergo a simple transform (such as rotation). In recent years, deep learning has been applied to this field increasingly often, particularly in the area of deformable image registration. Our assessment of deep learning-based medical image registration techniques follows the same three-fold structure as the review paper (Haskins et al., 2020): (1) deep iterative registration; (2) supervised registration; and (3) unsupervised registration. More detailed information on registration strategies is available in many other excellent review studies [50, 51].
3.4.1 Deep Iterative Registration
To accomplish deformable registration, for instance, [50] employed a 5-layer convolutional neural network (CNN) to learn a measure that assesses the similarity of aligned 3D brain MRI T1–T2 image pairs. For multimodal registration, this deep learning-based measure performed better than manually specified similarity metrics like mutual information. The closest comparable studies are [41, 42], which used an FCN pre-trained with a stacked denoising autoencoder to assess the similarity of 2D CT–MR patch pairs.
3.4.2 Supervised Registration
Warp/deformation fields may be synthesized/simulated, manually annotated, or obtained through registration. [48] built a multiscale CNN-based model to directly predict 3D DVFs. To enhance their training dataset, they created DVFs with varied spatial frequency and amplitude, augmenting the data to 1 million training samples. After training, deformed images were registered in a single pass, outperforming B-spline registration. Image similarity assessments may also aid registration. [68] established dual-supervised training for brain MR image registration: the difference between the predicted deformation field and the ground truth provided one form of guidance, while image similarity between the template and the warped subject image provided the other. The first improved network convergence, and the second further improved training and registration.
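The DVF-synthesis idea can be sketched as follows: generate a smooth random displacement field and warp an image with it, yielding a (moving, fixed, DVF) training triple with known ground truth. The sinusoidal field and nearest-neighbour warp are simplifying assumptions, not the cited authors' generator.

```python
import numpy as np

def random_smooth_dvf(shape, amplitude=2.0, freq=0.1, rng=None):
    """Synthesize a smooth 2D displacement vector field (DVF).

    Low-frequency sinusoids stand in for the varied-frequency,
    varied-amplitude fields used to build the augmented training set.
    """
    rng = rng or np.random.default_rng()
    ys, xs = np.mgrid[:shape[0], :shape[1]]
    phase = rng.uniform(0, 2 * np.pi, size=2)
    dy = amplitude * np.sin(freq * xs + phase[0])
    dx = amplitude * np.cos(freq * ys + phase[1])
    return np.stack([dy, dx])              # (2, H, W)

def warp(img, dvf):
    """Nearest-neighbour backward warp of img by the DVF."""
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    src_y = np.clip(np.round(ys + dvf[0]).astype(int), 0, img.shape[0] - 1)
    src_x = np.clip(np.round(xs + dvf[1]).astype(int), 0, img.shape[1] - 1)
    return img[src_y, src_x]

rng = np.random.default_rng(0)
fixed = rng.random((32, 32))
dvf = random_smooth_dvf(fixed.shape, rng=rng)
moving = warp(fixed, dvf)                  # (moving, fixed, dvf) training triple
print(moving.shape)                        # (32, 32)
```

A network trained on many such triples learns to predict the DVF directly from the image pair.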
3.4.3 Unsupervised Registration
With traditional registration procedures it is difficult to collect ground-truth warp fields, and the limited deformations available for model training lead to unsatisfactory outcomes on unseen images. Wu et al. (2016) used a convolutional stacked autoencoder to improve registration performance. The encoder takes a moving and a fixed image as input, and the decoder outputs the registration field. A spatial transformer network [56] warped the moving image with the registration field to reconstruct the fixed image. VoxelMorph produces deformation fields by minimizing the difference between reconstructed and fixed images. This unsupervised registration approach was orders of magnitude faster than symmetric normalization (SyN).
All of the loss functions used by the aforementioned unsupervised registration methods are designed using a combination of user-defined similarity metrics and specific regularization terms. Although traditional similarity measures perform well in mono-modal registration, they are not as successful in multi-modal instances as deep similarity metrics. In order to get optimal outcomes in multi-modal registration, it has been suggested to use sophisticated deep similarity metrics learnt in unsupervised settings.
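The combination described above — a user-defined similarity metric plus a regularization term — can be sketched as a single loss function. MSE stands in for the similarity metric and a finite-difference gradient penalty for the regularizer; both choices are illustrative (VoxelMorph-style), not a specific published implementation.

```python
import numpy as np

def registration_loss(warped, fixed, dvf, lam=0.01):
    """Unsupervised registration loss: similarity + smoothness regularizer.

    similarity: mean squared error between warped moving and fixed images.
    smoothness: squared spatial gradients of the DVF (diffusion regularizer),
    which discourages implausibly jagged deformations.
    """
    similarity = np.mean((warped - fixed) ** 2)
    dy = np.diff(dvf, axis=1)              # finite-difference gradients
    dx = np.diff(dvf, axis=2)
    smoothness = np.mean(dy ** 2) + np.mean(dx ** 2)
    return similarity + lam * smoothness

rng = np.random.default_rng(0)
fixed = rng.random((16, 16))
warped = fixed + 0.1 * rng.standard_normal((16, 16))
dvf = np.zeros((2, 16, 16))                # identity deformation
print(round(registration_loss(warped, fixed, dvf), 3))  # ~0.01, similarity term
```

For multi-modal pairs, the MSE term would be replaced by a learned deep similarity metric, as the text suggests.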
4 Discussions
4.1 On the Task-Specific Perspective
4.1.1 Classification
When compared to the development of computer vision, the use of deep learning in medical image analysis has lagged behind. Nonetheless, the direct use of computer vision techniques may not provide desirable outcomes owing to the differences between medical images and natural images. Good performance requires overcoming obstacles specific to medical imaging tasks. The key to success in the classification challenge is to extract highly discriminative characteristics with regard to particular classes. Domains with high inter-class similarity make this challenging, whereas domains with considerable inter-class variation make it quite straightforward. It is challenging to capture discriminative characteristics for breast cancers, which contributes to poor mammography classification performance. Given the high degree of similarity across classes, it may be appropriate to learn fine-grained visual categorization (FGVC) characteristics that distinguish one class from another. However, it is important to keep in mind that all of the image samples in benchmark FGVC datasets were intentionally gathered to display significant inter-class similarity. Thus, methods developed and tested on such data may not translate well to medical datasets, where only a subset of images exhibits strong inter-class similarity.
4.1.2 Detection
As the procedure of bounding-box prediction demonstrates, detecting medical objects is more involved than classifying them. The difficulties of classification naturally manifest themselves in detection. Meanwhile, there are further difficulties, such as class imbalance and the identification of small-scale objects, such as tiny lung nodules. One-stage detectors are often as effective as two-stage detectors in detecting large objects but have greater trouble with small ones. Using multiscale features has been shown to help with this problem in both one- and two-stage detectors. Featurized image pyramids (Liu et al., 2020b) are a simple but successful method in which features are extracted from different image scales separately. While there is no one right way to construct a feature pyramid, it is generally accepted that robust, high-level semantics and high-resolution feature maps must be fused for optimal performance. As shown by FPN, this is crucial for the detection of small objects (Lin et al., 2017a).
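The fusion of high-level semantics with high-resolution maps can be sketched as the FPN top-down pathway. For simplicity this assumes equal channel counts at every level, omitting the 1x1 lateral convolutions and 3x3 smoothing of the actual FPN.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_merge(bottom_up):
    """Top-down pathway of an FPN-style feature pyramid.

    bottom_up: list of (C, H, W) maps, fine to coarse. Each level fuses
    its own (lateral) features with upsampled, semantically stronger
    features from the coarser level, giving small objects both
    resolution and high-level semantics.
    """
    merged = [bottom_up[-1]]
    for feat in reversed(bottom_up[:-1]):
        merged.append(feat + upsample2x(merged[-1]))
    return merged[::-1]                    # fine to coarse again

rng = np.random.default_rng(0)
levels = [rng.random((8, 32, 32)), rng.random((8, 16, 16)), rng.random((8, 8, 8))]
pyramid = fpn_merge(levels)
print([p.shape for p in pyramid])          # [(8, 32, 32), (8, 16, 16), (8, 8, 8)]
```

Detection heads are then attached to every level of the merged pyramid, so small nodules are predicted on the high-resolution, semantically enriched maps.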
4.1.3 Segmentation
Segmenting medical images is a difficult task that combines the challenges of classification and detection, and these difficulties often appear intertwined. Structures, shapes, and contours that are crucial to a proper diagnosis and prognosis may be lost as a result. For this reason, we think it is important to improve segmentation performance by creating non-region-based indicators that can supplement region-based data.
The power of transformers lies in their capacity to accurately capture long-range dependencies, and it is this feature that we want to highlight here. Most CNN-based approaches do not emphasize long-range dependencies, despite their usefulness for obtaining accurate segmentation. Both intra-slice dependencies (relationships between pixels within a single CT or MRI slice) and inter-slice dependencies (relationships between pixels in different slices) matter for volumetric segmentation.
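The transformer mechanism behind these long-range dependencies is scaled dot-product self-attention. Below is a minimal NumPy sketch over tokens pooled from several slices (the token/patch layout is an illustrative assumption); in one layer, any token can attend to any other, within or across slices.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    """Scaled dot-product self-attention over tokens from several slices.

    Every token attends to every other token, so a region in one slice
    can directly use information from any other slice (inter-slice) or
    any region of its own slice (intra-slice) in a single layer.
    """
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))   # (N, N) weights
    return attn @ v

d = 16
rng = np.random.default_rng(0)
tokens = rng.normal(size=(3 * 8, d))      # 3 slices x 8 patch tokens each
out = self_attention(tokens, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)                          # (24, 16)
```

A CNN, by contrast, needs many stacked layers before its receptive field spans the same distances.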
4.1.4 Registration
The goal of medical image registration is to identify the pixel- or voxel-level correspondence between two images, which is a very different challenge from those discussed above. Gaining access to trustworthy ground-truth registrations, whether created synthetically or by traditional registration techniques, presents a distinct obstacle. Unsupervised techniques have shown much promise in resolving this problem. However, many unsupervised registration systems [50] consist of numerous stages that register images in a coarse-to-fine fashion. The improved performance may be offset by the increased computational complexity and training difficulty introduced by multi-stage frameworks. To this end, it is preferable to create registration frameworks with as few stages as possible, so that they may be trained end-to-end.
4.1.5 Incorporating Domain Knowledge
Most medical vision models are adopted from their counterparts in the natural imaging community; nevertheless, medical images present their own set of issues that make them harder to work with. If used properly, domain knowledge may reduce the amount of time and effort needed to solve these problems computationally. However, we find that it is sometimes challenging to successfully incorporate the extensive domain knowledge that radiologists already possess. In mammographic breast-cancer screening, for example, unilateral correspondence and bilateral difference are important cues that radiologists use to discover suspicious areas and diagnose malignancy. As it stands, there are few effective ways to put this knowledge to use, so further study is required to fully exploit the benefits of such domain expertise.
4.2 More Widespread use of Deep Learning in Medical Contexts
Despite its widespread usage in academic and industrial research institutes for interpreting medical images, deep learning has not had the profound effect on clinical practice that was anticipated. Researchers across the globe quickly applied deep learning to patient chest X-rays and CT scans in an effort to make more precise and timely diagnoses and prognoses of the condition. However, the positive outcomes were greatly skewed by model overfitting, poor assessment, inappropriate data sources, and similar flaws. Another review paper (Roberts et al., 2021) came to a similar conclusion after analyzing 62 studies chosen from 415.
4.2.1 Image Datasets
Deep learning relies heavily on data to function. For the purpose of training and testing new algorithms, the field of medical vision has generated, and continues to develop, medical image datasets of increasing size (typically at least several hundred images). When benchmark datasets for various illnesses (such as cancer) are provided annually as part of the MICCAI challenges, for instance, this tremendously promotes the development of medical vision. Nonetheless, because everyone in the community is working toward the same goal of perfect performance, overfitting is likely to occur if such a dataset is used exclusively (Roberts et al., 2021). Many academics have recognised this issue, so it is now common practice to employ multiple public and/or private datasets to thoroughly verify the effectiveness of a new algorithm. Although this helps lessen community-wide bias, it is not enough for widespread clinical use.
More data should be used to train and test models to further reduce community-wide bias. Data curation, or the ongoing creation of vast, varied databases through collaborative effort with experts, is one straightforward approach to adding new data. Our alternative recommendation, a more roundabout approach, is to integrate dispersed private datasets as ethical and legal restrictions permit. Large, representative, labelled data may always appear to be insufficient, at least in the eyes of the medical image analysis community. The reality is more nuanced than that, however. It is true that many well-known public databases have restricted quantity and diversity due to time and expense constraints. Most current data sources are private and dispersed across several agencies and nations due to concerns about privacy and the complexity of the relevant political climates. As a result, it is preferable to harness the combined power of private datasets and even personal data without compromising patients’ privacy. Federated learning, which provides models with encrypted access to private information, is a promising strategy for attaining this objective (Li et al., 2020f). Without the need to exchange data, federated learning enables the training of deep learning algorithms on data from many institutions. Although this technology introduces new difficulties, it allows for the training of algorithms that are less biased, more generalizable, more resilient, and better performing than ever before, thereby better serving the demands of clinical applications.
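The core aggregation step of federated learning can be sketched as federated averaging (FedAvg): each institution trains locally and ships only model weights, which the server combines weighted by local dataset size. This is a minimal sketch of the aggregation step only, omitting the local training loop and any encryption layer mentioned above.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One FedAvg aggregation round.

    Each institution trains locally and sends only its model weights;
    the server averages them weighted by local dataset size, so raw
    patient data never leaves its institution.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    return sum(c * w for c, w in zip(coeffs, client_weights))

# Three hospitals with different dataset sizes and locally trained weights.
weights = [np.full(4, 1.0), np.full(4, 2.0), np.full(4, 4.0)]
global_w = fedavg(weights, [100, 100, 200])
print(global_w)                            # [2.75 2.75 2.75 2.75]
```

The aggregated `global_w` is then broadcast back to the institutions for the next round of local training.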
4.2.2 Performance Evaluation
While it is simple to assess the technical performance of proposed methodologies using these criteria, this does not always reflect clinical utility. Clinicians are more interested in whether or not using algorithms will improve patient care than in the performance increases stated in articles (Kelly et al., 2019). Consequently, we think it is crucial for research teams to interact with physicians for algorithmic assessment, in addition to using appropriate criteria. We briefly discuss two approaches that may be used to institutionalize cooperative assessment. First, have clinicians participate in the peer-review process for conferences and publications by submitting papers and giving their perspectives on open clinical problems. Second, evaluate whether the use of deep learning algorithms can enhance physicians’ performance and/or efficiency. Some research has looked at the possibility of using model findings as a “second opinion” to help guide physicians’ ultimate interpretation. For instance, McKinney et al. (2020) assessed the supplementary function of a deep learning model for the task of predicting breast cancer from mammograms. They discovered that the model could detect cancer in numerous situations where radiologists had failed to do so. In addition, the model greatly decreased the burden of the second reader in the “double-reading procedure” (common practice in the UK), while keeping performance levels close to the consensus opinion.
Overall, deep learning is an emerging topic of study that shows great promise in several areas of medical image analysis, such as illness classification, segmentation, detection, and image registration. To construct deep learning-based CAD schemes that can reach high scientific rigour, we currently face various technological problems or hazards (Roberts et al., 2021). Therefore, these challenges must be addressed via further study before deep learning-based CAD methods can gain widespread acceptance among doctors. The choice of the best deep learning method depends on the specific requirements and constraints of the medical imaging task at hand. It is often beneficial to experiment with different architectures, pre-processing techniques, and training strategies to determine the most effective approach for a particular application. Additionally, incorporating domain knowledge, collaborating with medical experts, and conducting rigorous evaluations are essential for developing reliable and clinically relevant deep learning solutions in medical image analysis.
Data Availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
Code Availability
The code generated and/or analysed during the current study is available from the corresponding author on reasonable request.
References
Abdar, M., Pourpanah, F., Hussain, S., Rezazadegan, D., Liu, L., Ghavamzadeh, M., Fieguth, P., Cao, X., Khosravi, A., & Acharya, U. R. (2021). A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Information Fusion, 76, 243–297.
Abraham, N., Khan, N.M., (2019). A novel focal tversky loss function with improved attention u-net for lesion segmentation. In Proceedings of the 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019). IEEE, pp. 683–687.
Akselrod-Ballin, A., Karlinsky, L., Hazan, A., Bakalo, R., Horesh, A.B., Shoshan, Y., Barkan, E., et al. (2017). Deep learning for automatic detection of abnormal findings in breast mammography. In Cardoso, M.J., Arbel, T., Carneiro, G., Syeda-Mahmood, T., Tavares, J.M.R.S., Moradi, M., et al. (Eds.), Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support (DLMIA and ML-CDS workshops, pp. 321–329). Cham: Springer International Publishing.
Alom, M.Z., Hasan, M., Yakopcic, C., Taha, T.M., Asari, V.K., (2018). Recurrent residual convolutional neural network based on u-net (r2U-net) for medical image segmentation. arXiv preprint arXiv:1802.06955.
Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., Zhang, L., (2018). Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6077–6086.
Anwar, S. M., Majid, M., Qayyum, A., Awais, M., Alnowami, M., & Khan, M. K. (2018). Medical image analysis using convolutional neural networks: A review. Journal of Medical Systems, 42, 226.
Arazo, E., Ortego, D., Albert, P., O’Connor, N.E., McGuinness, K., (2020). Pseudo-labeling and confirmation bias in deep semi-supervised learning. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), pp. 1–8.
Arjovsky, M., Chintala, S., Bottou, L., (2017). Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning. PMLR, pp. 214–223.
Avants, B. B., Epstein, C. L., Grossman, M., & Gee, J. C. (2008). Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis, 12, 26–41.
Azizi, S., Mustafa, B., Ryan, F., Beaver, Z., Freyberg, J., Deaton, J., Loh, A., Karthikesalingam, A., Kornblith, S., Chen, T., (2021). Big self-supervised models advance medical image classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3478–3488.
Bahdanau, D., Cho, K., Bengio, Y., (2015). Neural machine translation by jointly learning to align and translate. 3rd International Conference on Learning Representations (ICLR).
Bai, W., Chen, C., Tarroni, G., Duan, J., Guitton, F., Petersen, S. E., Guo, Y., Matthews, P. M., Rueckert, D., et al. (2019). Self-supervised learning for cardiac MR image segmentation by anatomical position prediction. In D. Shen, T. Liu, T. M. Peters, L. H. Staib, C. Essert, S. Zhou, et al. (Eds.), Medical image computing and computer assisted intervention – MICCAI 2019 (pp. 541–549). Cham: Springer International Publishing.
Bai, F., Xing, X., Shen, Y., Ma, H., Meng, M.Q.H., (2022). Discrepancy-based active learning for weakly supervised bleeding segmentation in wireless capsule endoscopy images, In Medical Image Computing and Computer Assisted Intervention – MICCAI 2022, (pp. 24–34). Springer.
Balakrishnan, G., Zhao, A., Sabuncu, M. R., Guttag, J., & Dalca, A. V. (2019). VoxelMorph: A learning framework for deformable medical image registration. IEEE Transactions on Medical Imaging, 38, 1788–1800.
Baltatzis, V., Bintsi, K. M., Folgoc, L. L., Martinez Manzanera, O. E., Ellis, S., Nair, A., Desai, S., Glocker, B., & Schnabel, J. A. (2021). The pitfalls of sample selection: A case study on lung nodule classification. In I. Rekik, E. Adeli, S. H. Park, & J. Schnabel (Eds.), Predictive intelligence in medicine (pp. 201–211). Cham: Springer International Publishing.
Billot, B., Magdamo, C., Cheng, Y., Arnold, S. E., Das, S., & Iglesias, J. E. (2023). Robust machine learning segmentation for large-scale analysis of heterogeneous clinical brain MRI datasets. Proceedings of the National Academy of Sciences, 120, e2216399120.
Baumgartner, C. F., Tezcan, K. C., Chaitanya, K., Hötker, A. M., Muehlematter, U. J., Schawkat, K., Becker, A. S., Donati, O., Konukoglu, E., et al. (2019). PHiSeg: Capturing uncertainty in medical image segmentation. In D. Shen, T. Liu, T. M. Peters, L. H. Staib, C. Essert, S. Zhou, et al. (Eds.), Medical image computing and computer assisted intervention – MICCAI 2019 (pp. 119–127). Cham: Springer International Publishing.
Baur, C., Wiestler, B., Albarqouni, S., Navab, N., (2018). Deep autoencoding models for unsupervised anomaly segmentation in brain MR images. In International MICCAI Brainlesion Workshop. (pp. 161–169). Springer.
Baur, C., Denner, S., Wiestler, B., Navab, N., & Albarqouni, S. (2021). Autoencoders for unsupervised anomaly segmentation in brain MR images: A comparative study. Medical Image Analysis, 69, 101952.
Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. Advances in Neural Information Processing Systems, 19, 153.
Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., & Raffel, C. (2019). MixMatch: A holistic approach to semi-supervised learning. Advance in Neural Information Processing Systems, 32, 1–11.
Bourlard, H., & Kamp, Y. (1988). Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 59, 291–294.
Cai, J., Yan, K., Cheng, C. T., Xiao, J., Liao, C. H., Lu, L., Harrison, A. P., et al. (2020). Deep volumetric universal lesion detection using light-weight pseudo 3D convolution and surface point regression. In A. L. Martel, P. Abolmaesumi, D. Stoyanov, D. Mateus, M. A. Zuluaga, S. K. Zhou, et al. (Eds.), Medical image computing and computer assisted intervention – MICCAI 2020 (pp. 3–13). Cham: Springer International Publishing.
Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M., (2021). Swin-unet: U-net-like pure transformer for medical image segmentation. arXiv preprint arXiv:2105.05537.
Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O., (2016). 3D U-Net: learning dense volumetric segmentation from sparse annotation. In Proceedings of the International Conference On Medical Image Computing And Computer-Assisted Intervention. (pp. 424–432). Springer.
Plis, S. M., Hjelm, D. R., Salakhutdinov, R., Allen, E. A., Bockholt, H. J., Long, J. D., Johnson, H. J., Paulsen, J. S., Turner, J. A., & Calhoun, V. D. (2014). Deep learning for neuroimaging: A validation study. Frontiers in Neuroscience. https://doi.org/10.3389/fnins.2014.00229
Chaitanya, K., Erdil, E., Karani, N., & Konukoglu, E. (2020). Contrastive learning of global and local features for medical image segmentation with limited annotations. Advance in Neural Information Processing Systems, 33, 12546–12558.
Chen, Y., Mancini, M., Zhu, X., & Akata, Z. (2022). Semi-supervised and unsupervised deep visual learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46, 1327–1347.
Moeskops, P., Viergever, M. A., Mendrik, A. M., de Vries, L. S., Benders, M. J. N. L., & Isgum, I. (2016). Automatic segmentation of MR brain images with a convolutional neural network. IEEE Transaction on Medical Imaging, 35(5), 1252–1262. https://doi.org/10.1109/TMI.2016.2548501
Chaudhari, S., Mithal, V., Polatkan, G., & Ramanath, R. (2021). An attentive survey of attention models. ACM Transactions on Intelligent Systems and Technology (TIST), 12, 1–32.
Chen, L., Yang, Y., Wang, J., Xu, W., Yuille, A.L., (2016). Attention to scale: scale-aware semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3640–3649.
Chen, X., Fan, H., Girshick, R., He, K., (2020b). Improved baselines with momentum contrastive learning, arXiv preprint arXiv:2003.04297.
Chen, Y., Li, J., Xiao, H., Jin, X., Yan, S., Feng, J., (2017). Dual path networks. In Proceedings of the 31st International Conference on Neural Information Processing Systems. (pp. 4470–4478). Curran Associates Inc., Long Beach, California, USA.
Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder–decoder with atrous separable convolution for semantic image segmentation. In V. Ferrari, M. Hebert, C. Sminchisescu, & Y. Weiss (Eds.), Computer vision – ECCV 2018 (pp. 833–851). Cham: Springer International Publishing.
Chen, S., Tan, X., Wang, B., & Hu, X. (2018). Reverse attention for salient object detection. In V. Ferrari, M. Hebert, C. Sminchisescu, & Y. Weiss (Eds.), Computer vision – ECCV 2018 (pp. 236–252). Cham: Springer International Publishing.
Prasoon, A., Petersen, K., Igel, C., Lauze, F., Dam, E., Nielsen, M., (2013). Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 8150, pp. 246–253. https://doi.org/10.1007/978-3-642-40763-5_31.
Chen, S., Ma, K., Zheng, Y., (2019). Med3D: Transfer learning for 3D medical image analysis, arXiv preprint arXiv:1904.00625.
Chen, L., Bentley, P., Mori, K., Misawa, K., Fujiwara, M., & Rueckert, D. (2019). Self-supervised learning for medical image analysis using image context restoration. Medical Image Analysis, 58, 101539.
Rajkomar, A., Lingam, S., Taylor, A. G., Blum, M., & Mongan, J. (2017). High-throughput classification of radiographs using deep convolutional neural networks. Journal of Digital Imaging, 30, 95–101. https://doi.org/10.1007/s10278-016-9914-9
Chen, S., Bortsova, G., García-Uceda Juárez, A., Tulder, G.v., Bruijne, M.d., (2019c). Multi-task attention-based semi-supervised learning for medical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. (pp. 457–465). Springer.
Chen, T., Kornblith, S., Norouzi, M., Hinton, G., (2020a). A simple framework for contrastive learning of visual representations. In Hal, D., Aarti, S. (Eds.), Proceedings of the 37th International Conference on Machine Learning. PMLR, Proceedings of Machine Learning Research, pp. 1597–1607.
Ravishankar, H., Prabhu, S.M., Vaidya, V., Singhal, N., (2016a). Hybrid approach for automatic segmentation of fetal abdomen from ultrasound images using deep learning. In Proceedings of the IEEE International Symposium on Biomedical Imaging, pp. 779–782. https://doi.org/10.1109/ISBI.2016.7493382.
Chen, T., Liu, S., Chang, S., Cheng, Y., Amini, L., Wang, Z., (2020c). Adversarial robustness: From self-supervised pre-training to fine-tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 699–708.
Chen, X., You, S., Tezcan, K. C., & Konukoglu, E. (2020). Unsupervised lesion detection via image restoration with a normative prior. Medical Image Analysis, 64, 101713.
Dai, C., Wang, S., Mo, Y., Angelini, E., Guo, Y., & Bai, W. (2022). Suggestive annotation of brain mr images with gradient-guided sampling. Medical Image Analysis, 77, 102373.
Akram, S.U., Kannala, J., Eklund, L., Heikkilä, J., (2016). Cell segmentation proposal network for microscopy image analysis. In Proceedings of the Deep Learning in Medical Image Analysis (DLMIA). In Lecture Notes in Computer Science, 10008, pp. 21–29. https://doi.org/10.1007/978-3-319-46976-8_3.
Akselrod-Ballin, A., Karlinsky, L., Alpert, S., Hasoul, S., Ben-Ari, R., Barkan, E., (2016). A region based convolutional network for tumor detection and classification in breast mammography. In Proceedings of the Deep Learning in Medical Image Analysis (DLMIA). In Lecture Notes in Computer Science, 10008, pp. 197–205. https://doi.org/10.1007/978-3-319-46976-8_21.
Alansary, A., Kamnitsas, K., Davidson, A., Khlebnikov, R., Rajchl, M., Malamateniou, C., Rutherford, M., Hajnal, J.V., Glocker, B., Rueckert, D., Kainz, B., (2016). Fast fully automatic segmentation of the human placenta from motion corrupted MRI. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 9901, pp. 589–597. https://doi.org/10.1007/978-3-319-46723-8_68.
Albarqouni, S., Baur, C., Achilles, F., Belagiannis, V., Demirci, S., & Navab, N. (2016). Ag–gnet: Deep learning from crowds for mitosis detection in breast cancer histology images. IEEE Transactions on Medical Imaging, 35, 1313–1321.
Anavi, Y., Kogan, I., Gelbart, E., Geva, O., Greenspan, H., (2015). A comparative study for chest radiograph image retrieval using binary texture and deep learning classification. In Proceedings of the IEEE Engineering in Medicine and Biology Society, pp. 2940–2943. https://doi.org/10.1109/EMBC.2015.7319008.
Yao, J., Wang, S., Zhu, X., Huang, J., (2016). Imaging biomarker discovery for lung cancer survival prediction. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 9901, pp. 649–657. https://doi.org/10.1007/978- 3- 319- 46723- 8 _ 75.
Anavi, Y., Kogan, I., Gelbart, E., Geva, O., Greenspan, H., (2016). Visualizing and enhancing a deep learning framework using patients age and gender for chest X-ray image retrieval. In Proceedings of the SPIE on Medical Imaging, 9785, p. 978510.
Andermatt, S., Pezold, S., Cattin, P., (2016). Multi-dimensional gated recurrent units for the segmentation of biomedical 3D-data. In Proceedings of the Deep Learning in Medical Image Analysis (DLMIA). In Lecture Notes in Computer Science, 10008, pp. 142–151.
Zhao, J., Zhang, M., Zhou, Z., Chu, J., & Cao, F. (2016). Automatic detection and classification of leukocytes using convolutional neural networks. Medical & Biological Engineering & Computing. https://doi.org/10.1007/s11517-016-1590-x
Anthimopoulos, M., Christodoulidis, S., Ebner, L., Christe, A., & Mougiakakou, S. (2016). Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE Transactions on Medical Imaging, 35(5), 1207–1216. https://doi.org/10.1109/TMI.2016.2535865
Antony, J., McGuinness, K., Connor, N.E.O., Moran, K., (2016). Quantifying radiographic knee osteoarthritis severity using deep convolutional neural networks. arXiv:1609.02469.
Zhang, H., Li, L., Qiao, K., Wang, L., Yan, B., Li, L., Hu, G., (2016a). Image prediction for limited-angle tomography via deep learning with convolutional neural network. arXiv:1607.08707.
Apou, G., Schaadt, N. S., Naegel, B., Forestier, G., Schönmeyer, R., Feuerhake, F., Wemmert, C., & Grote, A. (2016). Detection of lobular structures in normal breast tissue. Computers in Biology and Medicine, 74, 91–102. https://doi.org/10.1016/j.compbiomed.2016.05.004
Zeiler, M.D., Fergus, R., (2014). Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision, pp. 818–833.
Arevalo, J., González, F. A., Ramos-Pollán, R., Oliveira, J. L., & Guevara Lopez, M. A. (2016). Representation learning for mammography mass lesion classification with convolutional neural networks. Computer Methods and Programs in Biomedicine, 127, 248–257. https://doi.org/10.1016/j.cmpb.2015.12.014
Yu, L., Yang, X., Chen, H., Qin, J., Heng, P.A., (2017c). Volumetric convnets with mixed residual connections for automated prostate segmentation from 3D MR images. In Proceedings of the thirty-first AAAI Conference on Artificial Intelligence.
Baumgartner, C.F., Kamnitsas, K., Matthew, J., Smith, S., Kainz, B., Rueckert, D., (2016). Real-time standard scan plane detection and localisation in fetal ultrasound using fully convolutional neural networks. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 9901, pp. 203–211. https://doi.org/10.1007/978-3-319-46723-8_24.
Ben-Cohen, A., Diamant, I., Klang, E., Amitai, M., Greenspan, H., (2016). Deep learning and data labeling for medical applications. In Proceedings of the International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis. In Lecture Notes in Computer Science, 10008, pp. 77–85. https://doi.org/10.1007/978-3-319-46976-8_9.
Bengio, Y., (2012). Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the Trade (pp. 437–478). Springer, Berlin Heidelberg.
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828. https://doi.org/10.1109/TPAMI.2013.50
Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H., (2007). Greedy layer-wise training of deep networks. In Proceedings of the Advances in Neural Information Processing Systems, pp. 153–160.
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5, 157–166.
Benou, A., Veksler, R., Friedman, A., Raviv, T.R., (2016). De-noising of contrast-enhanced MRI sequences by an ensemble of expert deep neural networks. In Proceedings of the Deep Learning in Medical Image Analysis (DLMIA). In Lecture Notes in Computer Science, 10008, pp. 95–110.
BenTaieb, A., Hamarneh, G., (2016). Topology aware fully convolutional networks for histology gland segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 9901, pp. 460–468. https://doi.org/10.1007/978-3-319-46723-8_53.
BenTaieb, A., Kawahara, J., Hamarneh, G., (2016). Multi-loss convolutional networks for gland analysis in microscopy. In Proceedings of the IEEE International Symposium on Biomedical Imaging, pp. 642–645. https://doi.org/10.1109/ISBI.2016.7493349.
Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(1), 281–305.
Birenbaum, A., Greenspan, H., (2016). Longitudinal multiple sclerosis lesion segmentation using multi-view convolutional neural networks. In Proceedings of the Deep Learning in Medical Image Analysis (DLMIA). In Lecture Notes in Computer Science, 10008, pp. 58–67. https://doi.org/10.1007/978-3-319-46976-8_7.
Cheng, X., Zhang, L., & Zheng, Y. (2015). Deep similarity learning for multimodal medical images. Computer Methods in Biomechanics and Biomedical Engineering. https://doi.org/10.1080/21681163.2015.1135299
Cicero, M., Bilbily, A., Colak, E., Dowdell, T., Gray, B., Perampaladas, K., & Barfett, J. (2017). Training and validating a deep convolutional neural network for computer-aided detection and classification of abnormalities on frontal chest radiographs. Investigative Radiology, 52(5), 281–287. https://doi.org/10.1097/RLI.0000000000000341
Günhan Ertosun, M., Rubin, D.L., (2015). Automated grading of gliomas using deep learning in digital pathology images: a modular approach with ensemble of convolutional neural networks. In Proceedings of the AMIA Annual Symposium, pp. 1899–1908.
Guo, Y., Gao, Y., & Shen, D. (2016). Deformable MR prostate segmentation via deep feature learning and sparse patch matching. IEEE Transactions on Medical Imaging, 35(4), 1077–1089. https://doi.org/10.1109/TMI.2015.2508280
Guo, Y., Wu, G., Commander, L.A., Szary, S., Jewells, V., Lin, W., Shen, D., (2014). Segmenting hippocampus from infant brains by sparse patch matching with deep-learned features. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 8674, pp. 308–315. https://doi.org/10.1007/978-3-319-10470-6_39.
Han, X.-H., Lei, J., Chen, Y.-W., (2016). HEp-2 cell classification using K-support spatial pooling in deep CNNs. In Proceedings of the Deep Learning in Medical Image Analysis (DLMIA). In Lecture Notes in Computer Science, 10008, pp. 3–11. https://doi.org/10.1007/978-3-319-46976-8_1.
Haugeland, J. (1985). Artificial intelligence: The very idea. Cambridge: The MIT Press.
Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin, P.-M., & Larochelle, H. (2016). Brain tumor segmentation with deep neural networks. Medical Image Analysis, 35, 18–31. https://doi.org/10.1016/j.media.2016.05.004
Havaei, M., Guizard, N., Chapados, N., Bengio, Y., (2016b). HeMIS: Hetero-modal image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 9901, pp. 469–477. https://doi.org/10.1007/978-3-319-46723-8_54.
Jung, S., Kim, S., Lee, J., (2023). A simple yet powerful deep active learning with snapshots ensembles, In International Conference on Learning Representations. URL: https://openreview.net/forum?id=IVESH65r0
Janowczyk, A., & Madabhushi, A. (2016). Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. Journal of Pathology Informatics, 7, 29. https://doi.org/10.4103/2153-3539.186902
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T., (2014). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the Twenty-Second ACM International Conference on Multimedia, pp. 675–678. https://doi.org/10.1145/2647868.2654889.
Kainz, P., Pfeiffer, M., Urschler, M., (2015). Semantic segmentation of colon glands with deep convolutional neural networks and total variation segmentation. arXiv:1511.06919.
Källén, H., Molin, J., Heyden, A., Lundström, C., Åström, K., (2016). Towards grading Gleason score using generically trained deep convolutional neural networks. In Proceedings of the IEEE International Symposium on Biomedical Imaging, pp. 1163–1167. https://doi.org/10.1109/ISBI.2016.7493473.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86, 2278–2324. https://doi.org/10.1109/5.726791
Lekadir, K., Galimzianova, A., Betriu, A., Del Mar Vila, M., Igual, L., Rubin, D. L., Fernandez, E., Radeva, P., & Napel, S. (2017). A convolutional neural network for automatic characterization of plaque composition in carotid ultrasound. IEEE Journal of Biomedical and Health Informatics, 21, 48–55. https://doi.org/10.1109/JBHI.2016.2631401
Li, R., Zhang, W., Suk, H.-I., Wang, L., Li, J., Shen, D., Ji, S., (2014). Deep learning based imaging data completion for improved brain disease diagnosis. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 8675, pp. 305–312.
Lou, W., Li, H., Li, G., Han, X., & Wan, X. (2023). Which pixel to annotate: A label-efficient nuclei segmentation framework. IEEE Transactions on Medical Imaging, 42, 947–958.
Miao, S., Wang, Z. J., & Liao, R. (2016). A CNN regression approach for real-time 2D/3D registration. IEEE Transactions on Medical Imaging, 35(5), 1352–1363. https://doi.org/10.1109/TMI.2016.2521800
Pinaya, W. H. L., Gadelha, A., Doyle, O. M., Noto, C., Zugman, A., Cordeiro, Q., Jackowski, A. P., Bressan, R. A., & Sato, J. R. (2016). Using deep belief network modelling to characterize differences in brain morphometry in schizophrenia. Scientific Reports, 6, 38897. https://doi.org/10.1038/srep38897
Poudel, R. P. K., Lamata, P., Montana, G., (2016). Recurrent fully convolutional neural networks for multi-slice MRI cardiac segmentation. arXiv:1608.03974.
Ravi, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., & Yang, G.-Z. (2017). Deep learning for health informatics. IEEE Journal of Biomedical and Health Informatics, 21, 4–21. https://doi.org/10.1109/JBHI.2016.2636665
Sahiner, B., Chan, H.-P., Petrick, N., Wei, D., Helvie, M. A., Adler, D. D., & Goodsitt, M. M. (1996). Classification of mass and normal breast tissue: A convolution neural network classifier with spatial domain and texture images. IEEE Transactions on Medical Imaging, 15, 598–610. https://doi.org/10.1109/42.538937
Samala, R.K., Chan, H.-P., Hadjiiski, L., Cha, K., Helvie, M.A., (2016a). Deep-learning convolution neural network for computer-aided detection of microcalcifications in digital breast tomosynthesis. In Proceedings of the SPIE on Medical Imaging, 9785, p. 97850Y.
Samala, R. K., Chan, H.-P., Hadjiiski, L., Helvie, M. A., Wei, J., & Cha, K. (2016). Mass detection in digital breast tomosynthesis: Deep convolutional neural network with transfer learning from mammography. Medical Physics, 43(12), 6654–6666.
Sarraf, S., Tofighi, G., (2016). Classification of Alzheimer’s disease using fMRI data and deep learning convolutional neural networks. arXiv:1603.08631.
Schaumberg, A.J., Rubin, M.A., Fuchs, T.J., (2016). H&E-stained whole slide deep learning predicts SPOP mutation state in prostate cancer. bioRxiv 064279. https://doi.org/10.1101/064279.
Schlegl, T., Waldstein, S.M., Vogl, W.-D., Schmidt-Erfurth, U., Langs, G., (2015). Predicting semantic descriptions from medical images with convolutional neural networks. In Proceedings of the Information Processing in Medical Imaging. In Lecture Notes in Computer Science, 9123, pp. 437–448. https://doi.org/10.1007/978-3-319-19992-4_34.
Spampinato, C., Palazzo, S., Giordano, D., Aldinucci, M., & Leonardi, R. (2017). Deep learning for automated skeletal bone age assessment in X-ray images. Medical Image Analysis, 36, 41–51. https://doi.org/10.1016/j.media.2016.10.010
Springenberg, J. T., Dosovitskiy, A., Brox, T., Riedmiller, M., (2014). Striving for simplicity: The all convolutional net. arXiv:1412.6806.
Štern, D., Payer, C., Lepetit, V., Urschler, M., (2016). Automated age estimation from hand MRI volumes using deep learning. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 9901, pp. 194–202. https://doi.org/10.1007/978-3-319-46723-8_23.
Suk, H.-I., Shen, D., (2013). Deep learning-based feature representation for AD/MCI classification. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 8150, pp. 583–590. https://doi.org/10.1007/978-3-642-40763-5_72.
Sun, W., Tseng, T.-L.B., Zhang, J., & Qian, W. (2016). Enhancing deep convolutional neural network scheme for breast cancer diagnosis with unlabeled data. Computerized Medical Imaging Graphics. https://doi.org/10.1016/j.compmedimag.2016.07.004
Sun, W., Zheng, B., Qian, W., (2016b). Computer aided lung cancer diagnosis with deep learning algorithms. In Proceedings of the SPIE Medical Imaging, 9785, p. 97850Z.
Teikari, P., Santos, M., Poon, C., Hynynen, K., (2016). Deep learning convolutional networks for multiphoton microscopy vasculature segmentation. arXiv:1606.02382.
Tran, P.V., (2016). A fully convolutional neural network for cardiac segmentation in short-axis MRI. arXiv:1604.00494.
Xie, Y., Xing, F., Kong, X., Su, H., Yang, L., (2015b). Beyond classification: Structured regression for robust cell detection using convolutional neural network. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 9351, pp. 358–365. https://doi.org/10.1007/978-3-319-24574-4_43.
Xie, Y., Zhang, Z., Sapkota, M., Yang, L., (2016b). Spatial clockwork recurrent neural network for muscle perimysium segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 9901, pp. 185–193. Springer. https://doi.org/10.1007/978-3-319-46723-8_22.
Xu, T., Zhang, H., Huang, X., Zhang, S., Metaxas, D.N., (2016c). Multimodal deep learning for cervical dysplasia diagnosis. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 9901, pp. 115–123. https://doi.org/10.1007/978-3-319-46723-8_14.
Xu, Y., Mo, T., Feng, Q., Zhong, P., Lai, M., Chang, E.I.C., (2014). Deep learning of feature representation with multiple instance learning for medical image analysis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1626–1630. https://doi.org/10.1109/ICASSP.2014.6853873.
Xu, Z., Huang, J., (2016). Detecting 10,000 cells in one second. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 9901, pp. 676–684. https://doi.org/10.1007/978-3-319-46723-8_78.
Yang, D., Zhang, S., Yan, Z., Tan, C., Li, K., Metaxas, D., (2015). Automated anatomical landmark detection on distal femur surface using convolutional neural network. In Proceedings of the IEEE International Symposium on Biomedical Imaging, pp. 17–21. https://doi.org/10.1109/ISBI.2015.7163806.
Yang, H., Sun, J., Li, H., Wang, L., Xu, Z., (2016a). Deep fusion net for multi-atlas segmentation: Application to cardiac MR images. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 9901, pp. 521–528. https://doi.org/10.1007/978-3-319-46723-8_60.
Wang, S., Yao, J., Xu, Z., Huang, J., (2016d). Subtype cell detection with an accelerated deep convolution neural network. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention. In Lecture Notes in Computer Science, 9901, pp. 640–648. https://doi.org/10.1007/978-3-319-46723-8_74.
Yang, X., Kwitt, R., Niethammer, M., (2016d). Fast predictive image registration. In Proceedings of the Deep Learning in Medical Image Analysis (DLMIA). In Lecture Notes in Computer Science, 10008, pp. 48–57. https://doi.org/10.1007/978-3-319-46976-8_6.
Zhang, Q., Xiao, Y., Dai, W., Suo, J., Wang, C., Shi, J., & Zheng, H. (2016). Deep learning based classification of breast tumors with shear-wave elastography. Ultrasonics, 72, 150–157. https://doi.org/10.1016/j.ultras.2016.08.004
Funding
No funding was received to assist with the preparation of this manuscript.
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by K. Balasamy, V. Seethalakshmi, S. Suganyadevi. The first draft of the manuscript was written by K. Balasamy, S. Suganyadevi and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflicts of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Balasamy, K., Seethalakshmi, V. & Suganyadevi, S. Medical Image Analysis Through Deep Learning Techniques: A Comprehensive Survey. Wireless Pers Commun 137, 1685–1714 (2024). https://doi.org/10.1007/s11277-024-11428-1