1 Introduction

Neuroradiology has often been at the forefront of radiological imaging advances, such as the advent of diffusion-weighted MRI [1], due to the high stakes associated with diseases of the brain and spine as well as pragmatic factors such as the small field of view required for brain imaging and the sparing of the brain from respiratory motion artifact. With advances in computer vision in recent years, much interest has centered on the application of these technologies to neuroimaging; however, this presents a challenge due to the cross-sectional and, in the case of MRI, multiparametric nature of brain and spine imaging. The hardware demands associated with training deep learning networks using large numbers of three-dimensional image volumes are significant [2], although newer techniques [3] in combination with the availability of increasingly powerful GPU chips are beginning to overcome these challenges. AI applications to neuroimaging involve all aspects of image acquisition and interpretation and include study protocoling, image reconstruction, segmentation, and detection of disease processes (i.e., image classification).

2 Preprocessing of Brain Imaging

When utilizing supervised training for any task, the quality of the labeled training data has a profound impact on the success of the trained network. Accordingly, brain imaging data typically undergo several preprocessing steps before being utilized in AI applications. These steps include brain extraction (i.e., skull stripping), histogram normalization, and coregistration.

For many brain imaging AI applications, the removal of non-brain tissues from imaging data, including the skull, orbital contents, and soft tissues of the head and neck, leads to better performance [4,5,6]. The most commonly used tools for these tasks include the FMRIB Software Library (FSL) Brain Extraction Tool (BET) [7,8,9] and BET 2 [10], Brain Surface Extractor (BSE) [11], FreeSurfer [12], Robust Learning-based Brain Extraction System (ROBEX) [13], and Brain Extraction based on nonlocal Segmentation Technique (BEaST) [14]. For pediatric brain imaging, Learning Algorithm for Brain Extraction and Labeling (LABEL) has shown superior brain extraction performance as compared with several other commonly used tools [15]. Newer approaches for brain extraction that have utilized 3D convolutional neural networks (CNNs) have demonstrated superiority when used specifically for brain tumor studies [16] and have outperformed several older conventional non-CNN approaches [17].
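
As an illustration of how such a tool might be invoked programmatically, the sketch below wraps the FSL BET command line through its nipype interface; the file paths and the fractional intensity threshold are illustrative assumptions rather than recommended settings.

```python
# Hypothetical skull-stripping example using FSL BET via the nipype wrapper.
# Assumes FSL is installed and "subject_T1.nii.gz" exists; both are placeholders.
from nipype.interfaces import fsl

bet = fsl.BET()
bet.inputs.in_file = "subject_T1.nii.gz"          # input T1-weighted volume (assumed path)
bet.inputs.out_file = "subject_T1_brain.nii.gz"   # skull-stripped output
bet.inputs.frac = 0.5                             # fractional intensity threshold
bet.inputs.mask = True                            # also write a binary brain mask
result = bet.run()
print(result.outputs.mask_file)
```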

Many big data applications utilize MR images acquired from multiple centers and scanners, which introduces challenges related to source heterogeneity. For example, MR imaging is prone to various artifacts that may degrade the performance of AI applications. Variations in image intensity that occur due to inhomogeneities of the scanner's magnetic field, certain image acquisition artifacts, and patient motion may be addressed with bias field correction [18]. Commonly used tools for bias correction include nonparametric nonuniform intensity normalization (N3) [19] and N4ITK [20]. Another issue specific to MR imaging, not encountered with radiographs or CT, is that variations in MRI scanner hardware and sequence designs frequently result in differences in image intensities for a given tissue class. Image histogram normalization is a common technique for standardizing these intensities across a heterogeneously acquired dataset. The most common methods include creating and applying an average histogram for the dataset [21] or matching individual images’ histograms to that of a chosen reference image [22].
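
To make these two steps concrete, the following sketch applies N4 bias field correction and then histogram matching to a reference scan using SimpleITK; the file names, mask generation, and parameter values are illustrative assumptions, not the specific implementations cited above.

```python
# Illustrative preprocessing sketch using SimpleITK (file names are assumptions).
import SimpleITK as sitk

image = sitk.ReadImage("subject_T1.nii.gz", sitk.sitkFloat32)
reference = sitk.ReadImage("reference_T1.nii.gz", sitk.sitkFloat32)

# N4 bias field correction to reduce low-frequency intensity nonuniformity.
mask = sitk.OtsuThreshold(image, 0, 1, 200)          # rough foreground mask
corrected = sitk.N4BiasFieldCorrection(image, mask)

# Histogram matching to standardize intensities against a chosen reference scan.
matcher = sitk.HistogramMatchingImageFilter()
matcher.SetNumberOfHistogramLevels(256)
matcher.SetNumberOfMatchPoints(15)
matcher.ThresholdAtMeanIntensityOn()
normalized = matcher.Execute(corrected, reference)

sitk.WriteImage(normalized, "subject_T1_preprocessed.nii.gz")
```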

For many AI applications, it is desirable to coregister brain images from different patients (and sequence acquisitions, when using MRI) to a standard geometry, commonly the Montreal Neurological Institute (MNI) space. Many software tools exist for coregistration, such as FMRIB’s Linear Image Registration Tool (FLIRT) [23, 24] and Non-linear Image Registration Tool (FNIRT) [25], Advanced Normalization Tools (ANTs) [26], and FreeSurfer. A newer CNN-based approach dubbed Quicksilver has shown promising results and may outperform traditional methods [27].
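
A minimal affine registration to MNI space might look like the sketch below, which drives FSL FLIRT through nipype and uses the MNI152 template bundled with FSL; the input path is an assumption, and a nonlinear refinement step (e.g., FNIRT) is omitted for brevity.

```python
# Sketch of affine coregistration to MNI space with FSL FLIRT via nipype.
# Assumes FSL is installed with the FSLDIR environment variable set.
import os
from nipype.interfaces import fsl

template = os.path.join(os.environ["FSLDIR"],
                        "data/standard/MNI152_T1_2mm_brain.nii.gz")

flirt = fsl.FLIRT()
flirt.inputs.in_file = "subject_T1_brain.nii.gz"   # skull-stripped input (assumed path)
flirt.inputs.reference = template
flirt.inputs.out_file = "subject_T1_mni.nii.gz"
flirt.inputs.out_matrix_file = "subject_to_mni.mat"
flirt.inputs.dof = 12                              # full affine transform
flirt.run()
```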

Data augmentation is a technique for artificially increasing the number of training samples in situations where large volumes of labeled data are unavailable [28]. Data augmentation has been described both as a way to mitigate overfitting of deep networks and as a method of handling class imbalance by increasing the proportion of the minority (often disease-positive) class. Pereira et al. performed augmentation using image rotation and reported a tumor segmentation mean performance gain of 2.6% [29]. Akkus et al. achieved an 8.8% accuracy gain for classifying 1p/19q codeletion status in low-grade gliomas after augmentation by image rotation, translation, and flipping [30].
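
A minimal augmentation sketch using numpy and scipy is shown below, applying the rotation, translation, and flipping operations mentioned above to a 2D slice; the angle and shift ranges are arbitrary illustrative choices.

```python
# Simple 2D slice augmentation: rotation, translation, and horizontal flipping.
import numpy as np
from scipy import ndimage

def augment_slice(img, rng):
    """Return a randomly rotated, shifted, and possibly flipped copy of a 2D slice."""
    angle = rng.uniform(-10, 10)                    # small random rotation (degrees)
    out = ndimage.rotate(img, angle, reshape=False, order=1, mode="nearest")
    shift = rng.uniform(-5, 5, size=2)              # random in-plane translation (pixels)
    out = ndimage.shift(out, shift, order=1, mode="nearest")
    if rng.random() < 0.5:                          # random left-right flip
        out = np.fliplr(out)
    return out

rng = np.random.default_rng(42)
slice_2d = np.random.rand(240, 240).astype(np.float32)  # placeholder for an MR slice
augmented = [augment_slice(slice_2d, rng) for _ in range(8)]
```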

3 Applications

Applications of AI to neuroimaging address all stages of image acquisition and interpretation, ranging from narrow, well-defined tasks to broad, complex ones.

3.1 Protocoling, Acquisition, and Image Construction

Once an imaging study is ordered by a referring clinician, an imaging protocol must be assigned that is appropriate for the indication and the patient’s medical history. Given the importance of cross-sectional imaging in neuroradiology, protocoling may be a complicated task (particularly in the case of MRI) and is typically performed by the radiologist, interrupting workflow [31] and in so doing potentially contributing to diagnostic errors [32]. In addition to unburdening the radiologist, automated protocoling has the potential to increase MR scanner throughput by including only the sequences pertinent to the given patient. Expanding on previous work applying AI to radiological protocoling [33], Brown and Marotta used natural language processing (NLP) to extract labeled data from radiology information system records, which were then used to train a gradient boosting machine to generate custom MRI brain protocols with high accuracy [34].
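
As a hedged sketch of this general approach (not the authors' implementation), free-text order information can be vectorized and passed to a gradient boosting classifier with scikit-learn; the order texts and protocol labels below are invented for illustration.

```python
# Illustrative protocol-assignment sketch: TF-IDF features from order text feeding
# a gradient boosting classifier. Orders and labels are invented, not study data.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier

orders = [
    "headache, rule out mass lesion",
    "known glioblastoma, assess treatment response",
    "seizures, evaluate for mesial temporal sclerosis",
    "pituitary adenoma follow-up",
]
protocols = ["routine brain", "brain tumor", "epilepsy", "pituitary"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    GradientBoostingClassifier(n_estimators=200, random_state=0),
)
model.fit(orders, protocols)
print(model.predict(["follow-up of treated glioblastoma"]))
```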

Once MR data is obtained from the scanner, it must first be processed into images for the radiologist to review. This initial raw data is processed by a series of modules that require expert oversight to mitigate image noise and other artifacts, adding time and introducing variance to the image acquisition process. Building on previous deep learning approaches for shortening MR acquisition times through undersampling [35, 36], a network trained on brain MRI called Automated Transform by Manifold Approximation (AUTOMAP) performs image reconstruction rapidly and with less artifact than conventional methods [37] (Fig. 15.1). Since AUTOMAP is implemented as a feed-forward system, it completes image reconstruction almost instantly, enabling acquisition issues to be identified and addressed immediately, potentially reducing the need for patient callbacks.

Fig. 15.1

Axial and sagittal MR image reconstructions performed using AUTOMAP (middle column) and using conventional methods (right column), with the ground truth images (left column) included for reference. AUTOMAP, which employs deep learning, results in improved signal-to-noise. Reprinted by permission from Springer Nature: Nature, “Image reconstruction by domain-transform manifold learning,” Zhu et al. [37]
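
The sketch below is a schematic of this domain-transform idea in PyTorch, not the published AUTOMAP architecture: fully connected layers learn a mapping from flattened sensor-domain data (real and imaginary channels) to the image domain, and convolutional layers then refine the result. All dimensions and layer choices are illustrative assumptions.

```python
# Schematic domain-transform reconstruction network (illustrative only; layer
# sizes and details do not reproduce the published AUTOMAP model).
import torch
import torch.nn as nn

class DomainTransformNet(nn.Module):
    def __init__(self, n=64):                      # n x n image, kept small here
        super().__init__()
        self.n = n
        self.fc = nn.Sequential(                   # learned sensor-to-image transform
            nn.Linear(2 * n * n, n * n), nn.Tanh(),
            nn.Linear(n * n, n * n), nn.Tanh(),
        )
        self.conv = nn.Sequential(                 # convolutional refinement stage
            nn.Conv2d(1, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 1, 5, padding=2),
        )

    def forward(self, kspace):                     # kspace: (batch, 2, n, n) real/imag
        x = self.fc(kspace.flatten(1))
        x = x.view(-1, 1, self.n, self.n)
        return self.conv(x)

net = DomainTransformNet()
dummy_kspace = torch.randn(1, 2, 64, 64)           # placeholder raw data
image = net(dummy_kspace)                          # (1, 1, 64, 64) reconstructed image
```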

Deep learning also shows promise for increasing the accessibility of specialized neuroimaging studies by shortening the acquisition time or enabling the generation of entire simulated imaging modalities. For example, diffusion tensor imaging (DTI), which provides information about white matter anatomy in the brain and spine, may be challenging to obtain in young or very sick patients due to the acquisition time and degree of patient cooperation required. Applying deep learning to DTI can achieve a 12-fold reduction in acquisition time by predicting DTI parameters from fewer data points than conventionally utilized [38]. Similarly, a reduction in acquisition time for arterial spin labeling (ASL) perfusion imaging was achieved using a trained CNN to predict the final perfusion maps from fewer subtraction images [39].

Seven Tesla MR scanners can reveal a level of detail far beyond that of 1.5 or 3 T scanners [40]; however, 7 T magnets are generally confined to academic imaging centers and may be less well tolerated by patients due to the high magnetic field strength [41]. By performing canonical correlation analysis on 3 T and 7 T brain MRI from the same patients, Bahrami et al. [42] were able to artificially generate simulated 7 T images from the 3 T images of test patients. Furthermore, these simulated 7 T images yielded better performance in subsequent segmentation tasks than the original 3 T images.

Because all radiological imaging modalities are, at their essence, anatomical abstractions, the ability to synthesize one MRI sequence from another, or even an entirely different imaging modality, presents an intriguing target for AI. Using deep learning, brain MRI T1 images can be generated from T2 images and vice versa [43]. PET–MRI, which holds several advantages over PET–CT, including superior soft tissue contrast, has the disadvantage that in the absence of a CT acquisition it does not readily allow for attenuation correction of the PET images. However, supervised training of a deep network has enabled the generation of synthetic CT head images from contrast-enhanced gradient echo brain MRI, and these synthesized images achieve greater accuracy than existing methods when used to perform attenuation correction on the accompanying PET images [44]. A similar approach was used to train a CNN to utilize a single T1 sequence to generate synthetic CT images with greater speed and lower error rates than conventional methods (Fig. 15.2) [45].

Fig. 15.2

Using a single MRI brain sequence as input (contrast-enhanced T1 gradient echo; left column), a trained CNN can generate synthetic CT (sCT) head images (middle column). Ground truth CT images (right column) are presented for comparison. Reprinted by permission from John Wiley and Sons: Medical Physics, “MR-based synthetic CT generation using a deep convolutional neural network method,” Xiao Han [45]

3.2 Segmentation

Accurate, fast segmentation of brain imaging, which can be broadly divided into either anatomical (e.g., subcortical structure) or lesion (pathology-specific) segmentation, is an important prerequisite step for a number of clinical and research tasks, including monitoring progression of white matter disease [46, 47] and neurodegenerative disease [48, 49] and assessing tumor treatment response [50]. However, since manual segmentation is tedious, time-consuming, and subject to inter- and intra-observer variance, there is great interest in developing AI solutions. To facilitate the comparison of segmentation algorithms, several open competitions exist featuring public datasets and standardized evaluation methodology, several of which are described in this section.

Anatomical brain imaging segmentation entails the delineation of either basic tissue components (e.g., gray matter, white matter, and cerebrospinal fluid) or atlas-based substructures. For the former, commonly utilized brain tissue segmentation datasets include the Medical Image Computing and Computer Assisted Intervention (MICCAI) 2012 Multi-Atlas Labelling Challenge [51] and the Internet Brain Segmentation Repository (IBSR). Two more specialized MICCAI challenges exist: MRBrainS13 [52], which contains brain MRIs from adults aged 65–80, and NeoBrainS12, which comprises neonatal brain MRIs.

The most common brain lesion segmentation tasks addressed by AI are tumor and multiple sclerosis (MS) lesion segmentation. The MICCAI Brain Tumor Segmentation (BRATS) challenges have occurred annually since 2012, with the datasets growing over the years to include 243 preoperative multimodal brain MRIs of gliomas in the 2018 challenge [53, 54]. The winner of the BRATS 2017 segmentation challenge, as determined by the best overall Dice scores and Hausdorff distances for complete tumor, core tumor, and enhancing tumor segmentation, employed an ensemble of several existing CNN architectures [55]. The underlying principle is that, through a majority voting system, the ensemble inherits the strengths of its best performing individual networks, resulting in greater generalizability across tasks.
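
Dice overlap, the headline metric in these challenges, is simple to compute for binary masks, as in the sketch below; the toy masks are placeholders, and the Hausdorff distance would typically come from a library such as SimpleITK or scipy.

```python
# Dice similarity coefficient for binary segmentation masks.
import numpy as np

def dice_score(pred, truth, eps=1e-7):
    """Dice = 2 * |intersection| / (|pred| + |truth|) for arrays of the same shape."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)

# Toy example on synthetic 3D masks.
truth = np.zeros((64, 64, 64), dtype=bool)
truth[20:40, 20:40, 20:40] = True
pred = np.zeros_like(truth)
pred[22:42, 20:40, 20:40] = True
print(f"Dice: {dice_score(pred, truth):.3f}")
```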

Additional deep learning segmentation applications target stroke (described subsequently), multiple sclerosis [56, 57], and cerebral small vessel disease (leukoaraiosis) [58] lesions. Anatomical Tracings of Lesions After Stroke (ATLAS-1) is a publicly available annotated dataset containing over 300 brain MRIs with manually segmented stroke lesions [59]. For MS lesion segmentation, the major public datasets are MICCAI 2008 [60], International Symposium on Biomedical Imaging (ISBI) 2015 [61], and MS Lesion Segmentation Challenge (MSSEG) 2016 [62].

Due to the limited numbers of training and test subjects generally available within existing public annotated datasets, several of the best performing networks for various segmentation tasks have pooled multiple public datasets, supplemented them with their own data, or employed data augmentation techniques [63,64,65,66]. A study by AlBadawy et al. demonstrated the importance of such measures, finding that the source(s) of tumor segmentation training data had a significant impact on performance during network validation (Fig. 15.3) [67].

Fig. 15.3

Two example brain tumor segmentations generated by separate models trained on data from the same, different, or both institutions. Accuracy was greater when the model was trained with data from the same or both institutions as compared with a model trained only using data from a different institution. The enhancing region (Class 2) is segmented in green, necrotic region (Class 3) in yellow, area of T1 abnormality excluding the enhancing and necrotic regions (Class 4) in red, and the area of FLAIR signal abnormality excluding classes 2–4 (Class 5) in blue. Reprinted by permission from John Wiley and Sons: Medical Physics, “Deep learning for segmentation of brain tumors: Impact of cross-institutional training and testing,” AlBadawy et al. [67]

3.3 Stroke

Stroke represents a major cause of morbidity and mortality worldwide. For example, in the United States stroke afflicts an estimated 795,000 people each year [68], accounting for 1 in every 20 deaths [69]. With over 1.9 million neurons lost each minute in the setting of an acute stroke [70], it is critical to quickly diagnose and triage stroke patients.

The Alberta Stroke Program Early Computed Tomography Score (ASPECTS) is a validated and widely used method for triaging patients with suspected anterior circulation acute stroke. ASPECTS divides the middle cerebral artery territories into ten regions of interest bilaterally [71]. The resulting score obtained from a patient’s non-contrast-enhanced CT head correlates with functional outcomes and helps guide management. e-ASPECTS, an ML-based software tool with CE-mark approval for use in Europe, has demonstrated non-inferiority (10% threshold for sensitivity and specificity) for ASPECTS scoring as compared with neuroradiologists from multiple stroke centers [72]. Deep learning networks have also achieved high accuracy at quantifying infarct volumes using DWI [73] and FLAIR [74] MR sequences.

Once a patient is diagnosed with an acute stroke, there is a need to quantify the volume of infarcted (unsalvageable) tissue and of ischemic but not yet infarcted (salvageable) tissue. This latter salvageable tissue is referred to as the ischemic penumbra. Quantification of the infarct core and ischemic penumbra is generally performed with either CT or MR brain perfusion. In the latter approach, the diffusion-perfusion mismatch is used to guide thrombolysis and thrombectomy decision-making [75]. Using acute DWI and perfusion imaging in concert with follow-up T2/FLAIR as training data, Nielsen et al. developed a deep CNN to distinguish infarcted tissue from the ischemic penumbra using only acute MR perfusion data. They achieved an AUC of 0.88 for predicting the final infarct and demonstrated an ability to predict the effect of thrombolysis treatment [76]. Additional studies have investigated the prediction of long-term language [77, 78] and motor [79] outcomes using ML evaluation of stroke territory volumes and locations.

3.4 Tumor Classification

The ability to classify brain tumor type and World Health Organization grade using MRI has long been a goal of machine learning research. As early as 1998, Poptani et al. used an artificial neural network to differentiate normal brain MR spectroscopy studies from those with infectious and neoplastic diseases, achieving diagnostic accuracies of 73% and 98% for low- and high-grade gliomas, respectively [80]. More recent work has commonly employed support vector machines (SVMs) for tumor classification tasks, perhaps due to evidence that SVMs may perform better than neural networks with small training datasets [81]. In 2008, Emblem et al. applied an SVM approach to the differentiation of low- and high-grade gliomas using MR perfusion imaging, achieving true positive and true negative rates of 0.76 and 0.82, respectively [82]. Subsequent efforts have shown promising results for differentiating among glioma grades and other tumor classes using SVM analysis of conventional MRI without [83] or with [84, 85] the addition of perfusion MRI. Survival of patients with glioblastoma can also be predicted using SVM analysis of features derived from MR perfusion [86], conventional [87], and combined conventional, DTI, and perfusion [88] imaging. SVM [88] and other [89] machine learning techniques have also been employed in radiomics research to investigate imaging markers for prediction of tumor molecular subtypes.
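
A generic sketch of this style of analysis with scikit-learn is shown below; the feature matrix (standing in for perfusion- or texture-derived measurements) and the grade labels are synthetic placeholders, not data from the cited studies.

```python
# Generic SVM grading sketch on radiomic/perfusion-style features (synthetic data).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 12))          # e.g., rCBV statistics, texture features (placeholder)
y = rng.integers(0, 2, size=80)        # 0 = low grade, 1 = high grade (placeholder labels)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"Cross-validated AUC: {scores.mean():.2f}")
```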

Differentiating glioblastoma, primary central nervous system lymphoma, and solitary brain metastasis is a common neuroradiological challenge due to the relatively high prevalence of these tumor classes and the potential for overlapping imaging characteristics. A multilayer perceptron trained using MR perfusion and permeability imaging was able to differentiate these tumor classes with accuracy (AUC 0.77) comparable to that of neuroradiologists [90].

In the setting of chemoradiation therapy for glioblastoma, differentiating viable tumor from treatment-related necrosis (pseudoprogression) on follow-up brain imaging is a common challenge in clinical neuro-oncology [91]. The application of SVMs to differentiating these entities has shown high accuracy using conventional MR imaging in combination with either perfusion [92] or permeability [93] data. A study evaluating the use of only conventional MRI sequences found that the best SVM accuracy was obtained using the FLAIR sequence (AUC 0.79), which achieved better accuracy than the neuroradiologist reviewers involved in the study [94].

3.5 Disease Detection

Applications of AI for neuroimaging disease detection exist within a spectrum of task complexity. On one end, there are applications that perform identification of a specific disease process, which often result in a binary classification (i.e., “normal” vs. “disease”). For example, several applications have been described for differentiating normal brain MRIs from those containing epileptogenic foci [95,96,97]. On the other end of the spectrum are broader surveillance applications designed to diagnose multiple critical pathologies, which one may envision as ultimately integrating within a real-world clinical radiology workflow. This latter, nascent category has been the source of much excitement [98,99,100,101].

In light of the importance and urgency of diagnosing intracranial hemorrhage, a disease process requiring neurosurgical evaluation and representing a contraindication for thrombolysis in the setting of acute stroke, the use of AI for identification of hemorrhage on head CT has been investigated in several studies. Whereas earlier attempts demonstrated promising results employing preprocessing algorithms heavily tailored for isolating hemorrhage [102,103,104], more recent efforts have investigated whether existing deep CNNs that have shown success at identifying everyday (nonmedical) images could be applied to head CTs. Desai et al. [105] compared two existing 2D deep CNNs for the identification of basal ganglia hemorrhage and found that GoogLeNet [106] outperformed AlexNet [28], noting that data augmentation and pre-training with the ImageNet repository [107] of everyday images improved diagnostic performance (AUC 1.0 for the best performing network). Transfer learning was similarly employed by Phong et al. [108], who achieved comparably high accuracies for identifying intracranial hemorrhage.
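
A minimal transfer-learning sketch in PyTorch is shown below: an ImageNet-pretrained GoogLeNet has its final classification layer replaced with a binary hemorrhage/no-hemorrhage head. This illustrates the general strategy rather than any cited study's exact configuration; the windowing and resizing of CT slices into three-channel 224 × 224 tensors is assumed to happen upstream.

```python
# Transfer learning sketch: ImageNet-pretrained GoogLeNet adapted to a binary task.
# Preprocessing of CT slices into 3-channel 224x224 tensors is assumed upstream.
import torch
import torch.nn as nn
from torchvision import models

model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False                   # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 2)     # new head: hemorrhage vs. no hemorrhage

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

dummy_batch = torch.randn(4, 3, 224, 224)         # placeholder for windowed CT slices
labels = torch.tensor([0, 1, 0, 1])               # placeholder labels
logits = model(dummy_batch)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```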

A study by Arbabshirani et al. [109] using CNNs to diagnose intracranial hemorrhage differed in several important ways. Whereas the above-described studies utilized relatively small datasets (<200 CT head studies), Arbabshirani et al. included over 46,000 CT head studies. To generate labels for this large number of studies, the authors expanded on other work investigating NLP applications to radiology reports [110, 111] and employed NLP to extrapolate a subset of human-annotated labels to generate machine-readable labels for the remainder of the radiology report dataset. The trained image classification model, which achieved an AUC of 0.846 for diagnosing intracranial hemorrhage, was then prospectively validated in a clinical workflow to flag new studies as either “routine” or “stat” in real time depending on the presence of intracranial hemorrhage. During this 3-month validation period, the network reclassified 94 of 347 CT head studies from “routine” to “stat.” Of the 94 studies flagged, 60 were confirmed by the interpreting radiologist as positive for intracranial hemorrhage. An additional four flagged studies were later reevaluated by a blinded overreader and deemed likely to reflect hemorrhage; in other words, the trained network had found hemorrhage that was missed by the interpreting radiologist.

Seeking to diagnose a broader range of intracranial pathologies, Prevedello et al. [112] trained a pair of CNNs using several hundred labeled head CTs for the purpose of identifying a number of critical findings. A CNN for processing images using brain tissue windows was able to diagnose hemorrhage, mass effect, and hydrocephalus with an AUC of 0.90, while a separately trained CNN evaluating images using a narrower “stroke window” achieved an AUC of 0.81 for the diagnosis of an acute ischemic stroke.

Approaching this challenge of simultaneously surveilling for multiple critical findings, Titano et al. [113] utilized a larger dataset of over 37,000 head CTs, first employing NLP to derive machine-readable labels from the radiology reports. These labels were then used for weakly supervised training of a 3D CNN modeled on ResNet-50 architecture to differentiate head CTs containing one or more critical findings (including acute fracture, intracranial hemorrhage, stroke, mass effect, and hydrocephalus) from those with only noncritical findings, achieving a sensitivity matching that of radiologists (sensitivity 0.79, specificity 0.48, AUC 0.73 for the model). To validate the clinical utility of the trained network, the authors performed a prospective double-blinded randomized controlled trial comparing how quickly the model versus radiologists could evaluate a head CT for critical findings, demonstrating that the model performed this task 150 times faster than the radiologists (mean 1.2 s vs. 177 s). Pending further multicenter prospective validation, such a tool could be used in a clinical radiology workflow to automatically triage head CTs for review.

4 Conclusion

Having already demonstrated success at a diverse range of neuroradiology tasks, artificial intelligence is poised to move beyond the proof-of-concept stage and impact many facets of clinical practice. The continued advancement of AI for neuroradiology depends in part on overcoming hurdles both technical and logistical in nature. The need for large-scale training data can be addressed by releasing more public annotated datasets, developing applications that facilitate the creation of labels from existing radiology reports and DICOM metadata, launching crowdsourcing initiatives, and improving data augmentation methodologies. The high computational costs of applying deep learning to volumetric data may be overcome by advances in GPU hardware and new techniques that better leverage multicore GPU architectures. Several open-source platforms now exist that facilitate deep learning efforts, including Keras, Caffe, and Theano, and the arrival of turnkey AI development applications is likely imminent. Similarly, while deep neural network architectures currently vary widely in design, standards may arise for specific classes of neuroimaging tasks. Finally, once a deep learning application is developed it must undergo validation, which faces its own regulatory and practical hurdles. For example, the opacity of deep networks, which traditionally function as “black boxes,” can make auditing a challenge, although this may be partially addressed through technical means such as generating saliency overlays (i.e., “heat maps”), as sketched below. Regulatory bodies are considering new programs that would allow a vendor to make minor modifications to its existing application without requiring a full resubmission for approval [114], potentially enabling AI tools to continue improving during the postmarket phase.
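
One simple form of saliency overlay is the input-gradient map sketched below, in which the gradient of the predicted class score with respect to the input highlights the pixels most influential to the prediction; the tiny classifier and the input slice are placeholders.

```python
# Gradient-based saliency sketch: highlight input pixels that most affect the score.
import torch
import torch.nn as nn

model = nn.Sequential(                 # stand-in for a trained classifier
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
)
model.eval()

image = torch.randn(1, 1, 128, 128, requires_grad=True)  # placeholder input slice
scores = model(image)
top_class = scores.argmax(dim=1)
scores[0, top_class].sum().backward()  # backpropagate the winning class score
saliency = image.grad.abs().squeeze()  # (128, 128) map, to be overlaid as a heat map
```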

These advancements, coupled with the tremendous interest in AI applications to neuroradiology, ensure that the field’s pace of evolution will continue to accelerate. Whether we will witness an AI application that is able to pass the neuroradiology equivalent of the Turing Test (that is, AI possessing diagnostic abilities truly comparable to those of a neuroradiologist) remains a point of considerable debate. It is clear, however, that AI will become an increasingly important part of clinical neuroradiology and will carry with it the accompanying benefits to both patients and physicians.

5 Take-Home Points

  • Neuroimaging represents an intriguing target for AI applications due to the high morbidity and mortality associated with neurological diseases.

  • Technical challenges remain due to the volumetric and multiparametric nature of neuroradiological imaging; however, advances in GPU power and the development of novel deep learning architectures may enable these challenges to be overcome.

  • AI applications to neuroimaging have shown success at handling a range of tasks involving all stages from an imaging study’s acquisition through its interpretation, including study protocoling; shortening image acquisition times of conventional, DTI, and arterial spin labeling (ASL) MRI; synthesizing images of one sequence or modality from another; and lesion segmentation.

  • Newer applications successfully identify and quantify specific disease processes including infarcts, tumors, and intracranial hemorrhage, and more robust approaches have shown success in surveilling for multiple acute neurological diseases.